
# resubLoss

Resubstitution classification loss for multiclass error-correcting output codes (ECOC) model

## Syntax

`L = resubLoss(Mdl)`

`L = resubLoss(Mdl,Name,Value)`

## Description


`L = resubLoss(Mdl)` returns the classification loss by resubstitution (`L`) for the multiclass error-correcting output codes (ECOC) model `Mdl` using the training data stored in `Mdl.X` and the corresponding class labels stored in `Mdl.Y`. By default, `resubLoss` uses the classification error to compute `L`.

The classification loss (`L`) is a generalization or resubstitution quality measure. Its interpretation depends on the loss function and weighting scheme, but in general, better classifiers yield smaller classification loss values.


`L = resubLoss(Mdl,Name,Value)` returns the classification loss with additional options specified by one or more name-value pair arguments. For example, you can specify the loss function, decoding scheme, and verbosity level.

## Examples


Compute the resubstitution loss for an ECOC model with SVM binary learners.

Load Fisher's iris data set. Specify the predictor data `X` and the response data `Y`.

```
load fisheriris
X = meas;
Y = species;
```

Train an ECOC model using SVM binary classifiers. Standardize the predictors using an SVM template, and specify the class order.

```
t = templateSVM('Standardize',true);
classOrder = unique(Y)
```

```
classOrder = 3x1 cell array
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
```

```
Mdl = fitcecoc(X,Y,'Learners',t,'ClassNames',classOrder);
```

`t` is an SVM template object. During training, the software uses default values for empty properties in `t`. `Mdl` is a `ClassificationECOC` model.

Estimate the resubstitution classification error, which is the default classification loss.

```
L = resubLoss(Mdl)
```

```
L = 0.0267
```

The ECOC model misclassifies 2.67% of the training-sample irises (4 of the 150 observations).

Determine the quality of an ECOC model by using a custom loss function that considers the minimal binary loss for each observation.

Load Fisher's iris data set. Specify the predictor data `X`, the response data `Y`, and the order of the classes in `Y`.

```
load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y) % Class order
```

```
classOrder = 3x1 categorical array
     setosa
     versicolor
     virginica
```

```
rng(1); % For reproducibility
```

Train an ECOC model using SVM binary classifiers. Standardize the predictors using an SVM template, and specify the class order.

```
t = templateSVM('Standardize',true);
Mdl = fitcecoc(X,Y,'Learners',t,'ClassNames',classOrder);
```

`t` is an SVM template object. During training, the software uses default values for empty properties in `t`. `Mdl` is a `ClassificationECOC` model.

Create a function that takes the minimal loss for each observation and then averages the minimal losses over all observations. `S` corresponds to the `NegLoss` output of `resubPredict`, which holds negated losses, so `-S` recovers the losses themselves.

`lossfun = @(~,S,~,~)mean(min(-S,[],2));`

Compute the custom classification loss for the training data.

```
resubLoss(Mdl,'LossFun',lossfun)
```

```
ans = 0.0065
```

The average minimal binary loss for the training data is `0.0065`.
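To see what `lossfun` computes, you can apply it to a small matrix in place of the model output. The matrix `S` below is hypothetical, not output from `Mdl`. The row minima of `-S` are 0.10 and 0.20, so the result is their mean, 0.15.

```matlab
% Hypothetical negated losses for n = 2 observations and K = 3 classes,
% laid out like the NegLoss output of resubPredict (not taken from Mdl).
S = -[0.10 0.60 0.90
      0.80 0.70 0.20];

% Same handle as in the example: the unused inputs (C, W, Cost) are ignored.
lossfun = @(~,S,~,~)mean(min(-S,[],2));

% -S recovers the losses; min(...,[],2) keeps each observation's smallest loss,
% and mean averages those minima over the observations.
value = lossfun([], S, [], [])
```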

## Input Arguments


`Mdl` is the full, trained multiclass ECOC model, specified as a `ClassificationECOC` model trained with `fitcecoc`.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `resubLoss(Mdl,'BinaryLoss','hamming','LossFun',@lossfun)` specifies `'hamming'` as the binary learner loss function and the custom function handle `@lossfun` as the overall loss function.

Binary learner loss function, specified as the comma-separated pair consisting of `'BinaryLoss'` and a built-in loss function name or function handle.

• This table describes the built-in functions, where $y_j$ is a class label for a particular binary learner (in the set $\{-1,1,0\}$), $s_j$ is the score of binary learner $j$ for an observation, and $g(y_j,s_j)$ is the binary loss formula.

| Value | Description | Score Domain | $g(y_j,s_j)$ |
|---|---|---|---|
| `'binodeviance'` | Binomial deviance | $(-\infty,\infty)$ | $\log[1+\exp(-2y_js_j)]/[2\log(2)]$ |
| `'exponential'` | Exponential | $(-\infty,\infty)$ | $\exp(-y_js_j)/2$ |
| `'hamming'` | Hamming | $[0,1]$ or $(-\infty,\infty)$ | $[1-\operatorname{sign}(y_js_j)]/2$ |
| `'hinge'` | Hinge | $(-\infty,\infty)$ | $\max(0,1-y_js_j)/2$ |
| `'linear'` | Linear | $(-\infty,\infty)$ | $(1-y_js_j)/2$ |
| `'logit'` | Logistic | $(-\infty,\infty)$ | $\log[1+\exp(-y_js_j)]/[2\log(2)]$ |
| `'quadratic'` | Quadratic | $[0,1]$ | $[1-y_j(2s_j-1)]^2/2$ |

The software normalizes binary losses so that the loss is 0.5 when $y_j = 0$. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example `customFunction`, specify its function handle `'BinaryLoss',@customFunction`.

`customFunction` has this form:

`bLoss = customFunction(M,s)`
where:

• `M` is the K-by-L coding matrix stored in `Mdl.CodingMatrix`.

• `s` is the 1-by-L row vector of classification scores.

• `bLoss` is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.

• K is the number of classes.

• L is the number of binary learners.

For an example of passing a custom binary loss function, see Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function.

The default `BinaryLoss` value depends on the score ranges returned by the binary learners. This table describes some default `BinaryLoss` values based on the given assumptions.

| Assumption | Default Value |
|---|---|
| All binary learners are SVMs, or linear or kernel classification models of SVM learners. | `'hinge'` |
| All binary learners are ensembles trained by `AdaboostM1` or `GentleBoost`. | `'exponential'` |
| All binary learners are ensembles trained by `LogitBoost`. | `'binodeviance'` |
| All binary learners are linear or kernel classification models of logistic regression learners, or you set `'FitPosterior',true` in `fitcecoc` to predict class posterior probabilities. | `'quadratic'` |

To check the default value, use dot notation to display the `BinaryLoss` property of the trained model at the command line.

Example: `'BinaryLoss','binodeviance'`

Data Types: `char` | `string` | `function_handle`

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of `'Decoding'` and `'lossweighted'` (default) or `'lossbased'`. For more information, see Binary Loss.

Example: `'Decoding','lossbased'`

Loss function, specified as the comma-separated pair consisting of `'LossFun'` and `'classiferror'` or a function handle.

• Specify the built-in function `'classiferror'`. In this case, the loss function is the classification error, which is the proportion of misclassified observations.

• Or, specify your own function using function handle notation.

Assume that `n = size(X,1)` is the sample size and `K` is the number of classes. Your function must have the signature `lossvalue = lossfun(C,S,W,Cost)`, where:

• The output argument `lossvalue` is a scalar.

• You specify the function name (`lossfun`).

• `C` is an `n`-by-`K` logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in `Mdl.ClassNames`.

Construct `C` by setting `C(p,q) = 1` if observation `p` is in class `q`, for each row. Set all other elements of row `p` to `0`.

• `S` is an `n`-by-`K` numeric matrix of negated loss values for the classes. Each row corresponds to an observation. The column order corresponds to the class order in `Mdl.ClassNames`. The input `S` resembles the output argument `NegLoss` of `resubPredict`.

• `W` is an `n`-by-1 numeric vector of observation weights. If you pass `W`, the software normalizes its elements to sum to `1`.

• `Cost` is a `K`-by-`K` numeric matrix of misclassification costs. For example, `Cost = ones(K) - eye(K)` specifies a cost of 0 for correct classification and 1 for misclassification.

Specify your function using `'LossFun',@lossfun`.
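A minimal sketch of a custom loss that uses all four inputs, assuming the toolbox convention that `Cost(i,j)` is the cost of predicting class `j` for an observation whose true class is `i`. The labels, the matrix `S`, and the helper names `rowArgmax` and `lossfun` are hypothetical. The code builds `C` from labels as described above and returns the weighted average cost of predicting the class with the largest negated loss; with `Cost = ones(K) - eye(K)`, this reduces to the classification error.

```matlab
% Hypothetical labels and class order (n = 4 observations, K = 3 classes).
classNames = {'setosa'; 'versicolor'; 'virginica'};
Y = {'setosa'; 'virginica'; 'versicolor'; 'setosa'};
K = numel(classNames);

% Build C: C(p,q) is true when observation p belongs to class q.
[~, idx] = ismember(Y, classNames);   % class index of each observation
C = idx == (1:K);                     % n-by-K logical via implicit expansion

% Hypothetical negated losses (n-by-K); a larger entry means a smaller loss.
S = -[0.1 0.6 0.9
      0.8 0.7 0.2
      0.9 0.3 0.5
      0.6 0.2 0.9];
W = ones(4,1) / 4;          % equal observation weights
Cost = ones(K) - eye(K);    % 0 for correct, 1 for incorrect classification

% Column index of the largest entry in each row (first index wins on ties).
rowArgmax = @(A) arrayfun(@(i) find(A(i,:) == max(A(i,:)), 1), (1:size(A,1))');

% Weighted average of Cost(true class, predicted class) over observations.
lossfun = @(C,S,W,Cost) sum(W .* Cost(sub2ind(size(Cost), ...
    rowArgmax(C), rowArgmax(S)))) / sum(W);

lossvalue = lossfun(C, S, W, Cost)
```

Only the fourth observation is misclassified here (largest negated loss in column 2, true class 1), so `lossvalue` is 0.25. Because `lossfun` is already a function handle, you would pass it as `'LossFun',lossfun`.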

Data Types: `char` | `string` | `function_handle`

Estimation options, specified as the comma-separated pair consisting of `'Options'` and a structure array returned by `statset`.

To invoke parallel computing:

• You need a Parallel Computing Toolbox™ license.

• Specify `'Options',statset('UseParallel',true)`.

Verbosity level, specified as the comma-separated pair consisting of `'Verbose'` and `0` or `1`. `Verbose` controls the number of diagnostic messages that the software displays in the Command Window.

If `Verbose` is `0`, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: `'Verbose',1`

Data Types: `single` | `double`

## More About

### Classification Error

The classification error has the form

`$L=\frac{\sum _{j=1}^{n}{w}_{j}{e}_{j}}{\sum _{j=1}^{n}{w}_{j}},$`

where:

• $w_j$ is the weight for observation $j$. The software renormalizes the weights to sum to 1.

• $e_j = 1$ if the predicted class of observation $j$ differs from its true class, and 0 otherwise.

In other words, the classification error is the proportion of observations misclassified by the classifier.
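A small worked instance of the formula shows how the weights enter. The weights and error indicators below are hypothetical. Only observation 2 is misclassified, and it carries twice the weight of each other observation, so $L = 2/6 = 1/3$ instead of the unweighted $1/5$.

```matlab
% Hypothetical weights w_j and misclassification indicators e_j (n = 5).
w = [1; 2; 1; 1; 1];   % observation 2 counts twice as much as the others
e = [0; 1; 0; 0; 0];   % only observation 2 is misclassified

% Renormalize the weights to sum to 1, then apply the formula.
w = w / sum(w);
L = sum(w .* e) / sum(w)
```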

### Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:

• $m_{kj}$ is element $(k,j)$ of the coding design matrix $M$ (that is, the code corresponding to class $k$ of binary learner $j$).

• $s_j$ is the score of binary learner $j$ for an observation.

• $g$ is the binary loss function.

• $\hat{k}$ is the predicted class for the observation.

In loss-based decoding [Escalera et al.], the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

`$\hat{k}=\underset{k}{\text{argmin}}\sum _{j=1}^{L}|{m}_{kj}|g\left({m}_{kj},{s}_{j}\right).$`

In loss-weighted decoding [Escalera et al.], the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

`$\hat{k}=\underset{k}{\text{argmin}}\frac{\sum _{j=1}^{L}|{m}_{kj}|g\left({m}_{kj},{s}_{j}\right)}{\sum _{j=1}^{L}|{m}_{kj}|}.$`

Allwein et al. suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.
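The two decoding rules can be sketched in base MATLAB for a single observation. The coding matrix, the scores, and the variable names are hypothetical, and the code transcribes the two formulas above with the hinge loss rather than reproducing how `resubLoss` computes them internally.

```matlab
% Hypothetical one-versus-one coding matrix M (K = 3 classes, L = 3 learners).
% Row k holds the codes m_kj of class k; 0 means learner j ignores class k.
M = [ 1  1  0
     -1  0  1
      0 -1 -1];

% Hypothetical scores s_j of the three binary learners for one observation.
s = [0.8 1.5 -0.4];

% Hinge binary loss g(y,s) from the binary loss table.
g = @(y, s) max(0, 1 - y .* s) / 2;

% |m_kj| * g(m_kj, s_j) for every class k and learner j (K-by-L);
% implicit expansion applies the row vector s to each row of M.
G = abs(M) .* g(M, s);

lossBased    = sum(G, 2);                    % sum over binary learners
lossWeighted = sum(G, 2) ./ sum(abs(M), 2);  % average over learners using class k

[~, kBased]    = min(lossBased);      % predicted class, loss-based decoding
[~, kWeighted] = min(lossWeighted);   % predicted class, loss-weighted decoding
```

Here `lossBased` is `[0.1; 1.6; 1.55]` and `lossWeighted` is `[0.05; 0.8; 0.775]`, so both rules predict class 1. Every row of a one-versus-one coding matrix has the same number of nonzero codes, so the two rules always agree for this design; they can disagree when classes take part in different numbers of binary learners.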

This table summarizes the supported loss functions, where $y_j$ is a class label for a particular binary learner (in the set $\{-1,1,0\}$), $s_j$ is the score of binary learner $j$ for an observation, and $g(y_j,s_j)$ is the binary loss formula.

| Value | Description | Score Domain | $g(y_j,s_j)$ |
|---|---|---|---|
| `'binodeviance'` | Binomial deviance | $(-\infty,\infty)$ | $\log[1+\exp(-2y_js_j)]/[2\log(2)]$ |
| `'exponential'` | Exponential | $(-\infty,\infty)$ | $\exp(-y_js_j)/2$ |
| `'hamming'` | Hamming | $[0,1]$ or $(-\infty,\infty)$ | $[1-\operatorname{sign}(y_js_j)]/2$ |
| `'hinge'` | Hinge | $(-\infty,\infty)$ | $\max(0,1-y_js_j)/2$ |
| `'linear'` | Linear | $(-\infty,\infty)$ | $(1-y_js_j)/2$ |
| `'logit'` | Logistic | $(-\infty,\infty)$ | $\log[1+\exp(-y_js_j)]/[2\log(2)]$ |
| `'quadratic'` | Quadratic | $[0,1]$ | $[1-y_j(2s_j-1)]^2/2$ |

The software normalizes binary losses such that the loss is 0.5 when $y_j = 0$, and aggregates using the average of the binary learners [Allwein et al.].
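The normalization can be checked numerically. This minimal sketch, assuming only base MATLAB, transcribes each formula in the table as an anonymous function (the struct field names reuse the table's `Value` names for readability) and evaluates each one at $y_j = 0$ with an arbitrary score of 0.7, which lies in every listed score domain.

```matlab
% Binary loss formulas g(y,s) from the table, as anonymous functions of the
% class code y (in {-1,1,0}) and the binary learner score s.
g = struct( ...
    'binodeviance', @(y,s) log(1 + exp(-2*y.*s)) / (2*log(2)), ...
    'exponential',  @(y,s) exp(-y.*s) / 2, ...
    'hamming',      @(y,s) (1 - sign(y.*s)) / 2, ...
    'hinge',        @(y,s) max(0, 1 - y.*s) / 2, ...
    'linear',       @(y,s) (1 - y.*s) / 2, ...
    'logit',        @(y,s) log(1 + exp(-y.*s)) / (2*log(2)), ...
    'quadratic',    @(y,s) (1 - y.*(2*s - 1)).^2 / 2);

names = fieldnames(g);
lossAtZero = zeros(numel(names), 1);
for i = 1:numel(names)
    f = g.(names{i});
    % y = 0: the learner does not use this class, so each loss should be 0.5
    lossAtZero(i) = f(0, 0.7);
    fprintf('%-13s g(0, 0.7) = %.4f\n', names{i}, lossAtZero(i));
end
```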

Do not confuse the binary loss with the overall classification loss (specified by the `'LossFun'` name-value pair argument of the `resubLoss` and `loss` object functions), which measures how well an ECOC classifier performs as a whole.

## References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recogn. Vol. 30, Issue 3, 2009, pp. 285–297.