fitcdiscr bug: Why does "ClassNames" now have to be provided in alphanumerical order otherwise accuracy is terrible?
[Update: There is a known bug in kfoldLoss. See the answer and workaround from MathWorks technical support in the answers section.]
I couldn't work out why I was getting terrible results (as if completely random) with fitcdiscr(), and I've found that it is because I wasn't specifying the ClassNames argument in alphabetical order. Comparing MATLAB 2024a to 2022, this is new behaviour and presumably a bug. One reason to specify ClassNames is to change the order in which the classes appear in the results summary, etc.
Example code that gives terrible accuracy:
load fisheriris
Mdl = fitcdiscr(meas, species, "ClassNames", flip(unique(species)), "KFold", 10);
validationAccuracy = 1 - kfoldLoss(Mdl)
Simply removing the call to flip() makes the above code give the expected accuracy of 0.98; with flip() it gives 0.32 (which is basically chance for three classes). A side effect of unique() is that it sorts its output into alphanumerical order. That sorting makes no difference for fisheriris, because the observations already happen to be in alphabetical order.
Here is some code that will give terrible accuracies if the order is randomised and the "stable" parameter is used to keep the random order:
load fisheriris
r = randperm(size(meas, 1)); % Randomise the order of the observations
Mdl = fitcdiscr(meas(r,:), species(r), "ClassNames", unique(species(r), "stable"), "KFold", 10);
validationAccuracy = 1 - kfoldLoss(Mdl)
Running this a few times, I get results like:
validationAccuracy =
0.3267
validationAccuracy =
0.0067
validationAccuracy =
0.0067
validationAccuracy =
0.3267
validationAccuracy =
0.9800
Update: I have now been able to test the code on another computer that still has MATLAB 2022 installed, and it gives the correct accuracy with the above code, so this appears to be a bug in the latest version of MATLAB! I have reported it to MathWorks.
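One way to check whether the problem lies in kfoldLoss rather than in the fitted model is to compute the cross-validated accuracy by hand from the out-of-fold predictions and compare the two numbers. The following is only a diagnostic sketch: it assumes that kfoldPredict is not affected by the same issue (not verified here), and the variable names lossAccuracy, predicted and manualAccuracy are introduced just for illustration.

```matlab
% Diagnostic sketch: compare the kfoldLoss-based accuracy with an accuracy
% computed directly from the out-of-fold predictions.
load fisheriris
rng(0); % arbitrary seed, only so that repeated runs use the same folds

Mdl = fitcdiscr(meas, species, "ClassNames", flip(unique(species)), "KFold", 10);

% Accuracy as derived from kfoldLoss (default loss is classification error)
lossAccuracy = 1 - kfoldLoss(Mdl);

% Accuracy computed by hand: fraction of observations whose out-of-fold
% predicted label matches the true label.
% Assumption: kfoldPredict is unaffected by the reported issue.
predicted = kfoldPredict(Mdl);
manualAccuracy = mean(strcmp(predicted, species));

fprintf("kfoldLoss-based: %.4f, manual: %.4f\n", lossAccuracy, manualAccuracy);
```

If the two values disagree noticeably, that points at the loss computation rather than at the discriminant model itself.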
Answers (2)
Athanasios Paraskevopoulos
on 17 May 2024
Edited: Athanasios Paraskevopoulos
on 17 May 2024
- Code with Issue:
load fisheriris
Mdl = fitcdiscr(meas, species, "ClassNames", flip(unique(species, "stable")), "KFold", 10);
validationAccuracy = 1 - kfoldLoss(Mdl)
- Correct Code:
load fisheriris
Mdl = fitcdiscr(meas, species, "ClassNames", unique(species, "stable"), "KFold", 10);
validationAccuracy = 1 - kfoldLoss(Mdl)
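Your observation indicates a potential bug in the latest version of MATLAB that should be reported to MathWorks. Until the issue is resolved, always specify the ClassNames argument in alphabetical order to ensure correct behavior. Note that unique(species, "stable") yields alphabetical order here only because the fisheriris observations are already sorted by species.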
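A minimal sketch of that interim advice, which also covers data whose observations are not already sorted: calling unique() without "stable" returns the class names in sorted order regardless of how the observations are arranged. The variable classNames and the assert check are illustrative additions, not part of the original code.

```matlab
% Sketch: pass ClassNames in sorted order, independent of observation order.
load fisheriris
r = randperm(size(meas, 1));            % shuffle the observations

classNames = unique(species(r));        % no "stable": output is sorted
% Guard against accidentally passing an unsorted list
assert(isequal(classNames, sort(classNames)), "ClassNames is not sorted");

Mdl = fitcdiscr(meas(r,:), species(r), "ClassNames", classNames, "KFold", 10);
validationAccuracy = 1 - kfoldLoss(Mdl)
```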