ClassificationTree with unequal costs
Hello,
I have a question about how predict behaves on a classification tree when the misclassification costs are not equal. With unequal costs, the resulting decision tree contains leaves whose node class is not the class with the maximum probability; it is instead the class that minimizes the cost. If I use this tree to predict outcomes for a data set, it should return the node class derived from the unequal costs, right? Below is a simple example that illustrates the problem (I am using MATLAB R2011a). Does predict return only the class with the maximum posterior probability, without taking the costs into account?
Thanks, Wes
% simple example
load fisheriris
% unequal cost function for illustration
costMat = [0, 1, 1; 1, 0, 10; 1, 1, 0;];
tree = ClassificationTree.fit(meas,species,'Cost', ...
costMat, 'ClassNames', {'setosa','versicolor','virginica'});
view(tree, 'mode', 'graph');
% look at node 8 (should be the rightmost node labeled 'versicolor')
tree.ClassProb(8,:)
tree.NodeClass(8)
% Note that class prob indicates that virginica is the most likely
% class, but the NodeClass is actually versicolor, because of the
% costs, so far so good!
% Use the tree to predict the results
% (l = labels, s = scores, n = node numbers, c = class numbers)
[l,s,n,c] = predict(tree, meas);
% Look at the labels for examples that ended in node 8
% We expect versicolor based on the label for this node,
% however, they all show virginica
l(n==8)
Answers (1)
Ilya
on 15 Sep 2011
Yes, ClassificationTree always predicts class labels based on posterior probabilities. In this respect, ClassificationTree/predict differs from the classregtree/eval method. Unfortunately, the documentation does not explain this in sufficient detail.
If you want to predict labels based on costs, you can do what you described: take the node numbers returned by predict and look up the class of each node:
[~,~,n] = predict(tree, meas);
tree.NodeClass(n)
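To see why a node's class can differ from its most probable class, here is a minimal sketch of the minimum-expected-cost rule applied to a single vector of class probabilities. It assumes the convention that costMat(i,j) is the cost of predicting class j when the true class is i. The probabilities in p are made-up values for illustration, not the actual contents of node 8, and the variable names are my own.

```matlab
% Hypothetical class probabilities for one leaf (illustrative values only)
classNames = {'setosa','versicolor','virginica'};
p = [0, 0.3, 0.7];

% Cost matrix from the question; assumed convention:
% costMat(i,j) = cost of predicting class j when the true class is i
costMat = [0, 1, 1; 1, 0, 10; 1, 1, 0];

% Expected cost of predicting each class j: sum over i of p(i)*costMat(i,j)
expCost = p * costMat;

[~, iMaxProb] = max(p);        % index of the most probable class
[~, iMinCost] = min(expCost);  % index of the class with the lowest expected cost

labelMaxProb = classNames{iMaxProb};
labelMinCost = classNames{iMinCost};
fprintf('max probability: %s, min expected cost: %s\n', labelMaxProb, labelMinCost);
```

With these values the most probable class is virginica, but versicolor has the lowest expected cost (0.7 versus 3.0 for virginica), because mistaking a versicolor for a virginica costs 10.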
Alternatively, you can apply the "average cost" correction yourself before growing the tree, here by passing the row sums of the cost matrix as the prior. A tree grown with costs uses this same correction. Compare:
t1 = ClassificationTree.fit(meas,species,'Cost',costMat,'ClassNames',{'setosa' 'versicolor' 'virginica'})
and
t2 = ClassificationTree.fit(meas,species,'prior',sum(costMat,2),'ClassNames',{'setosa' 'versicolor' 'virginica'})
The two trees are identical, but the second tree predicts 'versicolor' for observations that land on node 8.
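For reference, the prior used for t2 can be worked out by hand. This is a minimal sketch, assuming the textbook form of the correction: each class prior is scaled by the corresponding row sum of the cost matrix, and the result is renormalized. The names basePrior and adjPrior are introduced here for illustration only.

```matlab
costMat = [0, 1, 1; 1, 0, 10; 1, 1, 0];

% fisheriris has 50 observations per class, so the empirical prior is uniform
basePrior = [1/3; 1/3; 1/3];

% Scale each prior by the total cost of misclassifying that class,
% then renormalize so the entries sum to one
adjPrior = basePrior .* sum(costMat, 2);
adjPrior = adjPrior / sum(adjPrior);

disp(adjPrior')
```

The result is proportional to sum(costMat,2) = [2; 11; 2], the vector passed as 'prior' above. Versicolor receives most of the weight because misclassifying it as virginica is expensive.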