What exactly can the ROC curve tell us, and what can be inferred from it?
Hi Smart Guys,
I wrote some code to run a classification based on linear discriminant analysis (LDA):
%% Construct an LDA classifier from the selected features and the ground-truth labels
LDAClassifierObject = ClassificationDiscriminant.fit(featureSelcted, groundTruthGroup, 'DiscrimType', 'linear');
LDAClassifierResubError = resubLoss(LDAClassifierObject);
This gives:
Resubstitution Error of LDA (Training Error): 1.7391e-01
Resubstitution Accuracy of LDA: 82.61%
Confusion Matrix of LDA:
14 3
1 5
Then I ran an ROC analysis on the LDA classifier:
% Predict resubstitution response of LDA classifier
[LDALabel, LDAScore] = resubPredict(LDAClassifierObject);
% ROC analysis of the scores (groundTruthGroup contains the labels 'Good' or 'Bad'; the score column passed to perfcurve must be the one for 'Good' in LDAClassifierObject.ClassNames)
[FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthGroup(:,1), LDAScore(:,1), 'Good');
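Conceptually, you build an ROC curve by sweeping a threshold over the classifier's scores and recording the false positive rate and true positive rate at each step. AUC is the area under the resulting curve. The Python sketch below illustrates that idea outside MATLAB. It is not perfcurve's implementation. `roc_points` and `auc` are hypothetical helper names, and the scores and labels are made-up toy values.

```python
def roc_points(scores, labels, positive):
    """(FPR, TPR) pairs from sweeping a threshold over the distinct scores.

    An observation is predicted positive when its score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == positive)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy scores for the 'Good' class.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = ["Good", "Good", "Bad", "Good", "Bad", "Bad"]
pts = roc_points(scores, labels, "Good")
print(round(auc(pts), 4))  # 0.8889
```

Each point corresponds to one candidate threshold. An "optimal operating point" is one of these points, and the threshold that produced it determines the resulting confusion matrix.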
I got:
OPTROCPT =
0.1250 0.8667
Using the threshold at this optimal operating point, I get:
Accuracy of LDA after ROC analysis: 86.96%
Confusion Matrix of LDA after ROC analysis:
13 1
2 7
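You can check the reported figures directly from the two confusion matrices. The column sums (15 and 8) match the 15/8 class split implied by OPTROCPT (0.1250 = 1/8, 0.8667 = 13/15). The sketch below therefore assumes that rows are predicted classes and columns are true classes, with 'Good' (the positive class) first. That layout is an inference from the numbers, not something the post states, and `metrics` is a hypothetical helper.

```python
def metrics(cm):
    """Accuracy, TPR and FPR from a 2x2 matrix [[TP, FP], [FN, TN]].

    Assumed layout: rows = predicted (Good, Bad), columns = true (Good, Bad).
    """
    (tp, fp), (fn, tn) = cm
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    tpr = tp / (tp + fn)  # fraction of true 'Good' predicted 'Good'
    fpr = fp / (fp + tn)  # fraction of true 'Bad' predicted 'Good'
    return accuracy, tpr, fpr


default_cm = [[14, 3], [1, 5]]  # LDA, largest-posterior rule
roc_cm = [[13, 1], [2, 7]]      # LDA, threshold at OPTROCPT

for name, cm in [("default", default_cm), ("ROC", roc_cm)]:
    acc, tpr, fpr = metrics(cm)
    print(f"{name}: accuracy={acc:.2%} TPR={tpr:.4f} FPR={fpr:.4f}")
```

Under this layout, the ROC threshold trades one extra missed 'Good' (TP 14 to 13) for two fewer false 'Good' predictions (FP 3 to 1). That gives 20/23 = 86.96% instead of 19/23 = 82.61%. Its (FPR, TPR) = (0.1250, 0.8667) is exactly OPTROCPT.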
My questions are:
1. The ROC analysis gave a better accuracy. When we report the accuracy of the classifier, which value should we use? What exactly can the ROC curve tell us, and what can be inferred from it? Can we say that the ROC analysis found a better accuracy for the LDA classifier?
2. Why can the ROC analysis produce a better accuracy for the classifier when the original ClassificationDiscriminant.fit cannot?
3. I have also cross-validated the LDA classifier, like this:
cvLDAClassifier = crossval(LDAClassifierObject, 'leaveout', 'on');
How do I get the ROC analysis for the cross-validated model? The resubPredict method seems to accept only a discriminant object as input, so how can I get the scores?
4. MATLAB's classperf function is very handy for gathering all the performance information of a classifier, e.g.
%%Get the performance of the classifier
LDAClassifierPerformace = classperf(groundTruthGroup, resubPredict(LDAClassifierObject));
However, does anyone know how to gather this information (accuracy, FPR, etc.) for the cross-validation results?
Thanks very much. I am really looking forward to seeing replies to the questions above.
A.
Ilya
on 18 Mar 2013
1-2. You can use either accuracy. The accuracy obtained by LDA comes from assigning every observation to the class with the largest posterior probability. For two classes, this is equivalent to setting the threshold on the posterior probability of the positive class to 0.5. ROC analysis lets you optimize this threshold and therefore obtain a better accuracy.
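Both statements can be written out. With two classes the posteriors sum to one, so the largest-posterior rule is a 0.5 threshold. The optimal-point part assumes perfcurve's defaults (equal misclassification costs, empirical priors). Under those defaults, the documented slope of the iso-performance line is S = N/P, where P and N are the numbers of positive and negative observations. The OPTROCPT point then maximizes TPR - S*FPR, which is the same as maximizing training accuracy:

$$
\hat{y} = \text{Good} \iff \hat{P}(\text{Good}\mid x) > 1 - \hat{P}(\text{Good}\mid x) \iff \hat{P}(\text{Good}\mid x) > 0.5
$$

$$
\arg\max_{t}\left[\mathrm{TPR}(t) - \frac{N}{P}\,\mathrm{FPR}(t)\right]
= \arg\max_{t} \frac{\mathrm{TP}(t) - \mathrm{FP}(t)}{P}
= \arg\max_{t} \frac{\mathrm{TP}(t) + N - \mathrm{FP}(t)}{P + N}
= \arg\max_{t}\,\mathrm{Accuracy}(t)
$$

With P = 15 and N = 8 (implied by OPTROCPT = 0.1250 = 1/8 and 0.8667 = 13/15), TP - FP is 13 - 1 = 12 at OPTROCPT versus 14 - 3 = 11 under the 0.5 threshold. The threshold is chosen on the same data it is evaluated on, so the improved accuracy is still a resubstitution figure and is optimistically biased.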
The improvement obtained by the ROC analysis in your case is not statistically significant. With a sample as small as yours, it is hard to demonstrate convincingly that one classifier is better than another. Look up the sign test. Let n01 be the number of observations misclassified by the 1st model and correctly classified by the 2nd model, and let n10 be the number correctly classified by the 1st model and misclassified by the 2nd. Then min(1, 2*binocdf(min(n01,n10),n01+n10,0.5)) gives you a p-value for the two-sided test of the null hypothesis that the two models are equally accurate.
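The sign-test recipe is easy to reproduce outside MATLAB. The Python sketch below computes the same quantity as 2*binocdf(min(n01,n10), n01+n10, 0.5) using only the standard library. It caps the result at 1, because the doubled tail exceeds 1 when n01 equals n10. `binocdf` here is a hypothetical stand-in that mirrors MATLAB's function, and `sign_test_p` is a hypothetical name. For the question's numbers, assume the two rules differ only by one threshold moved on the same score, so observations switch class in one direction only. The confusion matrices then imply n01 = 2 (two true 'Bad' cases fixed) and n10 = 1 (one true 'Good' case lost).

```python
from math import comb


def binocdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test_p(n01, n10):
    """Two-sided sign-test p-value for two classifiers evaluated on the same data.

    n01: observations misclassified by model 1 but correct under model 2.
    n10: observations correct under model 1 but misclassified by model 2.
    """
    n = n01 + n10
    if n == 0:
        return 1.0  # the models never disagree
    return min(1.0, 2 * binocdf(min(n01, n10), n, 0.5))


print(sign_test_p(2, 1))  # 1.0
```

A p-value of 1.0 means these 23 observations give no evidence that either rule is more accurate than the other.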
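3. Type methods(cvLDAClassifier) to see all methods of the cross-validated object (use properties(cvLDAClassifier) to see its properties), or read the class description in the documentation. The kfoldPredict method is what you want: [label, score] = kfoldPredict(cvLDAClassifier) returns, for every observation, the predicted label and scores from the model trained without that observation.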
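For a leave-one-out model, kfoldPredict gives a held-out label and score for each observation. The held-out scores can then go to perfcurve in place of the resubstitution scores, using the score column for 'Good', and the held-out labels can go to classperf. To make the procedure concrete outside MATLAB, the Python sketch below hand-rolls leave-one-out for a one-feature, two-class LDA: Gaussian classes with a shared variance and empirical priors. It illustrates the procedure and is not MATLAB's implementation. The function names and toy data are hypothetical.

```python
import math


def lda_posterior(x, train_x, train_y, positive):
    """P(positive | x) from a 1-D, two-class LDA fitted to the training data.

    Each class is Gaussian with its own mean and a pooled (shared) variance;
    priors are the empirical class frequencies.
    """
    classes = sorted(set(train_y))
    assert len(classes) == 2 and positive in classes
    groups = {c: [v for v, y in zip(train_x, train_y) if y == c] for c in classes}
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    # Pooled within-class variance with n - K degrees of freedom.
    ss = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    var = ss / (len(train_x) - len(classes))
    # log(prior * likelihood); the shared Gaussian normalizing constant cancels.
    log_score = {
        c: math.log(len(groups[c]) / len(train_x)) - (x - means[c]) ** 2 / (2 * var)
        for c in classes
    }
    other = next(c for c in classes if c != positive)
    return 1.0 / (1.0 + math.exp(log_score[other] - log_score[positive]))


def leave_one_out_scores(xs, ys, positive):
    """Held-out posterior for each observation, each from a model that never saw it."""
    scores = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        scores.append(lda_posterior(xs[i], train_x, train_y, positive))
    return scores


# Hypothetical toy data: one feature, labels 'Good' / 'Bad'.
xs = [2.0, 2.5, 3.0, 3.5, -1.0, -0.5, 0.0, 0.5]
ys = ["Good"] * 4 + ["Bad"] * 4
loo = leave_one_out_scores(xs, ys, "Good")
predicted = ["Good" if s > 0.5 else "Bad" for s in loo]
accuracy = sum(p == y for p, y in zip(predicted, ys)) / len(ys)
print(accuracy)  # 1.0
```

The held-out scores (`loo`) and labels (`predicted`) play the same role as the score and label outputs of kfoldPredict. Metrics computed from them estimate performance on unseen data rather than training performance.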