How to get probabilities of each class which is classified with RUSBoost for an imbalanced data set

I have a data set with 7 classes and 3 features, and it is hugely imbalanced. I followed https://www.mathworks.com/help/stats/classification-with-imbalanced-data.html to classify the data, and I get a prediction accuracy of 94%. However, I need the probability of each class for a given feature or set of features. How can I get the probability of each class for a given set of feature values?
[Nt Mt] = size(y); % Number of observations in the training sample
t = templateTree('MaxNumSplits',Nt);
rusTree = fitcensemble(X,y,'Method','RUSBoost', 'NumLearningCycles',1000,'Learners',t,'LearnRate',0.1,'nprint',100);
[~,scores] = predict(rusTree,[1 16 3 5])
I get the following scores for the above code: 0.7345, 3.5105, 1.1893, 0, 0, 0, 0.0082
But these scores are not probabilities. How can I get values between 0 and 1 such that the probabilities across all classes sum to 1?

Accepted Answer

Raunak Gupta on 29 Apr 2020
Edited: Raunak Gupta on 29 Apr 2020
Hi,
predict does not return the scores as probability estimates because the RUSBoost algorithm used in the model does not treat scores as probabilities. Instead, a score represents the confidence of a classification into a class, with a higher score indicating more confidence, as explained in the fitcensemble documentation.
If you would like probability estimates, you can set 'ScoreTransform' to 'logit' in fitcensemble. This name-value pair transforms the scores to probability estimates. This is explained here. Calling predict on the model then returns the scores as probability values for each class.
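As a minimal sketch of the two ways to set the transform described above: the synthetic X and y, the query point, and the smaller 'NumLearningCycles' value are assumptions for illustration, not the asker's data. The sketch prints the row sum so it can be inspected rather than assumed.

```matlab
% Sketch only: synthetic stand-in data (assumption), 3 features, 3 imbalanced classes.
rng(0);                                        % reproducible random numbers
X = [randn(300,3); randn(40,3)+2; randn(15,3)-2];
y = [ones(300,1); 2*ones(40,1); 3*ones(15,1)];

t = templateTree('MaxNumSplits', numel(y));

% Option 1: request the transform when training the ensemble.
rusTree = fitcensemble(X, y, 'Method','RUSBoost', ...
    'NumLearningCycles',100, 'Learners',t, 'LearnRate',0.1, ...
    'ScoreTransform','logit');

% Option 2: change the transform on an already trained model.
rusTree.ScoreTransform = 'logit';

% 'logit' applies 1/(1+exp(-x)) to each score, so every entry lies in [0,1].
[label, scores] = predict(rusTree, [0.5 0.2 -0.1]);
disp(scores)
disp(sum(scores, 2))   % inspect the row sum instead of assuming it equals 1
```

The transform is applied to each score separately, so it bounds the values but does not by itself normalize a row.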
  2 Comments
Siddharth Arora on 27 Feb 2022
Hi Raunak,
I have tried the suggested approach of setting 'ScoreTransform' to 'logit' in fitcensemble (for a binary classification problem), and the scores are still not probabilistic estimates. I have tried specifying 'ScoreTransform' as 'logit' in fitcensemble, and I have also tried setting Mdl.ScoreTransform = 'logit' before calling predict, but the scores in any given row do not add to 1. I have tried 'doublelogit' for AdaBoost and that works fine, but it does not work for RUSBoost. Please let me know how else I could convert scores from RUSBoost to probabilistic estimates. Is it right to use scores from RUSBoost as inputs to perfcurve to get AUC values, or should the scores be transformed first? Thank you
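On the perfcurve part of this question, a small sketch may help. AUC depends only on how the scores rank the observations, so a strictly increasing transform such as the logistic function should leave it unchanged. The labels and scores below are synthetic stand-ins (an assumption), not output from a RUSBoost model.

```matlab
% Sketch only: synthetic binary labels and a stand-in positive-class score column.
rng(1);
labels   = [ones(50,1); zeros(200,1)];      % 1 = positive (minority) class
rawScore = randn(250,1) + 1.5*labels;       % assumption: replaces a real score column

% AUC from the raw scores.
[~,~,~,aucRaw] = perfcurve(labels, rawScore, 1);

% AUC after a strictly increasing transform (the logistic function).
[~,~,~,aucLogit] = perfcurve(labels, 1./(1+exp(-rawScore)), 1);

fprintf('AUC raw: %.4f, AUC after logit: %.4f\n', aucRaw, aucLogit);
```

With a real model, the score passed to perfcurve would be the column of the predict output that corresponds to the positive class in Mdl.ClassNames.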
Louis on 6 Nov 2023
I am experiencing exactly the same issue as Siddharth Arora above. Setting 'ScoreTransform' to 'logit' ensures that the score outputs are below 1, but the score outputs do not sum to 1.
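One possible workaround, sketched below, is to rescale each row of scores so that it sums to 1. This is a heuristic and an assumption, not a documented RUSBoost feature: the results are normalized confidence values, not calibrated probabilities. The example reuses the raw scores reported in the question and relies on implicit expansion (R2016b or later).

```matlab
% Raw RUSBoost scores reported in the question (one row per observation).
rawScores = [0.7345 3.5105 1.1893 0 0 0 0.0082];

% Heuristic A: divide each row by its sum.
% Only sensible when all scores are nonnegative and the row sum is not zero.
pNorm = rawScores ./ sum(rawScores, 2);

% Heuristic B: softmax. Subtracting the row maximum avoids overflow in exp.
shifted = rawScores - max(rawScores, [], 2);
pSoft   = exp(shifted) ./ sum(exp(shifted), 2);

disp(pNorm)
disp(pSoft)
disp([sum(pNorm, 2) sum(pSoft, 2)])   % both row sums equal 1 up to rounding
```

Both heuristics keep the class ranking within a row. If calibrated probabilities are needed, a separate calibration step on held-out data would be a more defensible choice.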
