resubEdge

Class: ClassificationNaiveBayes

Classification edge for naive Bayes classifiers by resubstitution

Syntax

`e = resubEdge(Mdl)`

Description

`e = resubEdge(Mdl)` returns the resubstitution classification edge (`e`) for the naive Bayes classifier `Mdl` using the training data stored in `Mdl.X` and the corresponding class labels stored in `Mdl.Y`.

Input Arguments

`Mdl` is a fully trained naive Bayes classifier, specified as a `ClassificationNaiveBayes` model trained by `fitcnb`.

Output Arguments

`e` is the classification edge, returned as a scalar. If you supplied observation weights when training the classifier, then `e` is the weighted classification edge. The software normalizes the weights so that, within each class, they sum to the prior probability of that class.
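As a hedged sketch of what the weighted edge means, it can be written as a weighted mean of the per-observation margins. The symbols below are notation assumed here for illustration, not taken from this page: $m_j$ is the margin of observation $j$, $s_j(k)$ is the classifier's score for class $k$ (for naive Bayes, typically the posterior probability), $y_j$ is the true class, and $w_j$ is the normalized weight.

$$
e = \sum_{j=1}^{n} w_j\, m_j, \qquad m_j = s_j(y_j) - \max_{k \neq y_j} s_j(k), \qquad \sum_{j=1}^{n} w_j = 1 .
$$

The weights sum to 1 overall because each class's weights sum to its prior probability and the priors sum to 1. When no weights are specified and the priors are the empirical class frequencies, every $w_j$ equals $1/n$ and the edge reduces to the plain mean of the margins.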

Examples

Load Fisher's iris data set.

```
load fisheriris
X = meas;    % Predictors
Y = species; % Response
rng(1);
```

Train a naive Bayes classifier. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its label.

`Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'});`

`Mdl` is a trained `ClassificationNaiveBayes` classifier.

Estimate the resubstitution edge.

```
e = resubEdge(Mdl)
```

```
e = 0.8944
```

The mean of the training sample margins is approximately `0.9`, which indicates that the classifier classifies in-sample observations with high confidence.
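To make the link between the edge and the margins concrete, the following minimal sketch recomputes the edge from the per-observation margins returned by `resubMargin`. It assumes the default setup used above (no observation weights, empirical priors), in which case the edge should equal the unweighted mean of the margins. The variable names `m` and `eFromMargins` are illustrative only.

```matlab
% Sketch: recover the resubstitution edge from the training margins.
% Assumes no observation weights were supplied to fitcnb.
load fisheriris
Mdl = fitcnb(meas,species,'ClassNames',{'setosa','versicolor','virginica'});

m = resubMargin(Mdl);        % one margin per training observation
eFromMargins = mean(m);      % unweighted mean of the margins

e = resubEdge(Mdl);          % edge reported by the classifier
gap = abs(e - eFromMargins); % should be close to zero under the stated assumption
disp(gap)
```

If weights were supplied during training, a weighted mean of the margins would be needed instead of `mean`.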

The classifier edge measures the average of the classifier margins. One way to perform feature selection is to compare training sample edges from multiple models. Based solely on this criterion, the classifier with the highest edge is the best classifier.

Load Fisher's iris data set. Define two data sets:

• `fullX` contains all predictors.

• `partX` contains the last two predictors.

```
load fisheriris
X = meas;    % Predictors
Y = species; % Response
fullX = X;
partX = X(:,3:4);
```

Train naive Bayes classifiers for each predictor set.

```
FullMdl = fitcnb(fullX,Y);
PartMdl = fitcnb(partX,Y);
```

Estimate the training sample edge for each classifier.

```
fullEdge = resubEdge(FullMdl)
```

```
fullEdge = 0.8944
```

```
partEdge = resubEdge(PartMdl)
```

```
partEdge = 0.9169
```

The edge for the classifier trained on predictors 3 and 4 is greater, suggesting that the classifier trained using only those predictors has a better in-sample fit.
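The same comparison can be extended to more than two candidate predictor sets. The sketch below loops over a few column subsets and records the resubstitution edge for each. The particular subsets in `subsets` are hypothetical choices made for illustration, not recommendations from this page.

```matlab
% Sketch: compare resubstitution edges across candidate predictor subsets.
load fisheriris
X = meas;    % Predictors
Y = species; % Response

% Hypothetical candidate subsets (column indices into X).
subsets = {1:4, 3:4, 1:2, [1 3]};

edges = zeros(numel(subsets),1);
for k = 1:numel(subsets)
    Mdl = fitcnb(X(:,subsets{k}),Y);  % train on the selected columns only
    edges(k) = resubEdge(Mdl);        % in-sample edge for this subset
end

[~,best] = max(edges);                % subset with the highest edge
fprintf('Highest resubstitution edge: predictors [%s]\n', ...
    num2str(subsets{best}));
```

Because these are in-sample edges, the ranking reflects training fit only; it says nothing by itself about performance on new data.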