resubMargin

Find classification margins for support vector machine (SVM) classifier by resubstitution

Description

example

m = resubMargin(SVMModel) returns the resubstitution classification margins (m) for the support vector machine (SVM) classifier SVMModel using the training data stored in SVMModel.X and the corresponding class labels stored in SVMModel.Y.

m is returned as a numeric vector with the same length as Y. The software estimates each entry of m using the trained SVM classifier SVMModel, the corresponding row of X, and the true class label Y.

Examples

collapse all

Load the ionosphere data set.

load ionosphere

Train an SVM classifier. Standardize the data and specify that 'g' is the positive class.

SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'},'Standardize',true);

SVMModel is a ClassificationSVM classifier.

Estimate the in-sample classification margins.

m = resubMargin(SVMModel);
m(10:20)
ans = 11×1

    5.5625
    4.2918
    1.9990
    4.5521
   -1.4904
    3.2817
    4.0261
    4.5421
   16.4460
    2.0006
      ⋮

An observation margin is the observed (true) class score minus the maximum false class score among all scores in the respective class. Classifiers that yield relatively large margins are preferred.

Perform feature selection by comparing in-sample margins from multiple models. Based solely on this comparison, the model with the highest margins is the best model.

Load the ionosphere data set. Define two data sets:

  • fullX contains all predictors (except the removed column of 0s).

  • partX contains the last 20 predictors.

load ionosphere
fullX = X;
partX = X(:,end-20:end);

Train SVM classifiers for each predictor set.

FullSVMModel = fitcsvm(fullX,Y);
PartSVMModel = fitcsvm(partX,Y);

Estimate the in-sample margins for each classifier.

fullMargins = resubMargin(FullSVMModel);
partMargins = resubMargin(PartSVMModel);
n = size(X,1);
p = sum(fullMargins < partMargins)/n
p = 0.2251

Approximately 22% of the margins from the full model are less than those from the model with fewer predictors. This suggests that the model trained with all the predictors is better.

Input Arguments

collapse all

Full, trained SVM classifier, specified as a ClassificationSVM model trained with fitcsvm.

More About

collapse all

Classification Edge

The edge is the weighted mean of the classification margins.

The weights are the prior class probabilities. If you supply weights, then the software normalizes them to sum to the prior probabilities in the respective classes. The software uses the renormalized weights to compute the weighted mean.

One way to choose among multiple classifiers, for example, to perform feature selection, is to choose the classifier that yields the highest edge.

Classification Margin

The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class.

The software defines the classification margin for binary classification as

m=2yf(x).

x is an observation. If the true label of x is the positive class, then y is 1, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x).

If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Classification Score

The SVM classification score for classifying observation x is the signed distance from x to the decision boundary ranging from -∞ to +∞. A positive score for a class indicates that x is predicted to be in that class. A negative score indicates otherwise.

The positive class classification score f(x) is the trained SVM classification function. f(x) is also the numerical predicted response for x, or the score for predicting x into the positive class.

f(x)=j=1nαjyjG(xj,x)+b,

where (α1,...,αn,b) are the estimated SVM parameters, G(xj,x) is the dot product in the predictor space between x and the support vectors, and the sum includes the training set observations. The negative class classification score for x, or the score for predicting x into the negative class, is –f(x).

If G(xj,x) = xjx (the linear kernel), then the score function reduces to

f(x)=(x/s)β+b.

s is the kernel scale and β is the vector of fitted linear coefficients.

For more details, see Understanding Support Vector Machines.

Algorithms

For binary classification, the software defines the margin for observation j, mj, as

mj=2yjf(xj),

where yj ∊ {-1,1}, and f(xj) is the predicted score of observation j for the positive class. However, mj = yjf(xj) is commonly used to define the margin.

References

[1] Christianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

Introduced in R2014a