Leave-one-out cross-validation with svmtrain gives 'impossible' accuracy results
I am using svmtrain to perform leave-one-out cross-validation on some data I have access to, and I noticed that some of the generated SVM models achieve 0% accuracy on a binary classification problem involving hundreds of examples.
Picking the wrong binary choice that many times in a row is essentially impossible by chance, so I figured something was wrong with my SVM implementation. I therefore wrote a test program that generates a random feature matrix as training input and random binary values as training output. Even with this setup, some SVM models generated by svmtrain give 0% accuracy, despite the output being totally random and uncorrelated with the input.
Can anyone explain what I am doing wrong? I have included the test program source below:
% clear workspace
clear;
clc;
pause on;

% seed random number generator
rng('default');

% initialize variables
n_sets = 1000;
n_pairs = 20;
n_features = 2;

% initialize classification accuracy
accuracy = zeros(1, n_sets);

for i = 1:n_sets
    fprintf('\nSet #%i\n', i);
    % generate random feature matrix
    training_input = single(rand(n_pairs, n_features));
    % generate random classification vector
    training_output = single(rand(n_pairs, 1) > 0.5);
    % initialize correct counter
    correct = 0;
    % perform leave-one-out cross-validation
    for j = 1:n_pairs
        % define inputs for SVM model
        model_training_input = training_input;
        model_training_output = training_output;
        % blind training to the jth row by setting its label to NaN
        % (svmtrain drops observations whose group label is NaN)
        model_training_output(j) = NaN;
        % train SVM on every row of the feature matrix except the jth
        svm_model = svmtrain(model_training_input, model_training_output, 'autoscale', false);
        % test the model on the held-out jth row
        prediction = svmclassify(svm_model, training_input(j,:));
        % check if the prediction was correct
        if prediction == training_output(j), correct = correct + 1; end
    end
    accuracy(i) = correct/n_pairs;
    fprintf('Accuracy = %g\n', accuracy(i));
    if accuracy(i) == 0 || accuracy(i) == 1
        fprintf('WTF\n');
        pause;
    end
end
Accepted Answer
More Answers (1)
Ilya
on 17 Apr 2012
I don't know what went wrong with your real data. In this mock-up exercise, you are trying to separate two classes that are essentially inseparable. SVM often fails to find any good decision boundary and classifies everything into the majority class. You can see, for instance, that all 20 observations are used as support vectors; this is an indication that SVM is not doing anything useful. When you generate 20 observations, 10 in each class, and remove one of them, the majority class is opposite to the class of the observation you have removed. That's why the incorrect class is predicted more often than the correct class.
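The counting argument above can be sanity-checked with a tiny simulation. This is a sketch in Python/NumPy rather than MATLAB, purely so the arithmetic is easy to verify; `loocv_majority_accuracy` is an illustrative helper, not part of any toolbox. With 10 observations in each class, every leave-one-out split leaves 9 examples of the held-out class against 10 of the other, so a classifier that falls back to the majority class is wrong on every single fold:

```python
import numpy as np

def loocv_majority_accuracy(labels):
    """LOOCV accuracy of a classifier that always predicts the
    majority class of the remaining n-1 training labels."""
    n = len(labels)
    correct = 0
    for j in range(n):
        rest = np.delete(labels, j)
        # majority vote over the n-1 remaining binary labels
        majority = int(rest.sum() * 2 >= len(rest))
        correct += (majority == labels[j])
    return correct / n

# perfectly balanced labels: 10 zeros and 10 ones
labels = np.array([0] * 10 + [1] * 10)
print(loocv_majority_accuracy(labels))  # prints 0.0
```

Every held-out observation belongs to the (temporary) minority class, so the majority-vote fallback scores exactly 0% — matching the "impossible" accuracies in the question.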
Generally, leave-one-out CV is not a good choice. 10-fold CV is usually better.
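For comparison, the k-fold scheme suggested above can be sketched language-agnostically as follows (again in Python/NumPy for checkability; `kfold_accuracy` and the stand-in `majority_fit_predict` classifier are illustrative names, not library functions). Each fold holds out a block of observations rather than a single one, so removing a fold perturbs the class balance of the training set far less than leave-one-out does:

```python
import numpy as np

rng = np.random.default_rng(0)

def kfold_accuracy(X, y, k, fit_predict):
    """k-fold CV: shuffle the indices, split them into k folds, train on
    the other k-1 folds, test on the held-out fold, average the accuracy."""
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        preds = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accs.append(np.mean(preds == y[test_idx]))
    return float(np.mean(accs))

# stand-in classifier: always predicts the training majority class
def majority_fit_predict(Xtr, ytr, Xte):
    maj = int(ytr.sum() * 2 >= len(ytr))
    return np.full(len(Xte), maj)

X = rng.random((20, 2))
y = np.array([0] * 10 + [1] * 10)
print(kfold_accuracy(X, y, 10, majority_fit_predict))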