What's the difference between the DeltaCritDecisionSplit property and Gini's Diversity Index?
I'm implementing Random Forests code to select the most important predictors for my application. The TreeBagger webinar shows two ways of estimating predictor importance (DeltaCritDecisionSplit and OOBPermutedVarDeltaError). Is DeltaCritDecisionSplit the same as the Gini diversity index (used by predictorImportance)? If not, how do they differ?
Answers (2)
Ilya
on 19 Jan 2012
Yes, the DeltaCritDecisionSplit property of TreeBagger is the equivalent of the predictorImportance method for an ensemble produced by the fitensemble function. It is obtained by summing the impurity gain over all splits on a given predictor. Gini is the default impurity measure for classification trees.
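A minimal sketch of computing both importance estimates (assumes the Statistics Toolbox of that era, ca. R2011b; in later releases the property was renamed, e.g. to OOBPermutedPredictorDeltaError, and fitensemble was superseded by fitcensemble):

```matlab
% Sketch: impurity-based vs. permutation-based predictor importance
load fisheriris                       % meas: 150x4 predictors, species: class labels

% Bagged classification trees; 'oobvarimp','on' is required for the
% permutation-based measure
b = TreeBagger(100, meas, species, 'Method', 'classification', ...
               'oobvarimp', 'on');

% Impurity-based importance: Gini gain summed over all splits on each
% predictor, then averaged over trees
impGini = b.DeltaCritDecisionSplit;

% Permutation-based importance from the out-of-bag observations
impPerm = b.OOBPermutedVarDeltaError;

% The equivalent impurity-based measure for a fitensemble model
ens = fitensemble(meas, species, 'Bag', 100, 'Tree', ...
                  'Type', 'classification');
impEns = predictorImportance(ens);
```

The two vectors impGini and impEns estimate the same quantity; impPerm measures something different (loss increase when a predictor's out-of-bag values are permuted), so its values are on another scale.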
Ilya
on 19 Jan 2012
Predictor importance estimates for every tree in an ensemble are added together, and the sum is then divided by the number of trees. This means the estimates are comparable only if the two ensembles are composed of trees of roughly the same depth (that is, trees using roughly the same number of splits). Boosted trees by default use stumps (one-split trees), so many predictors may never be split on; bagged trees by default are deep, and most predictors get many splits.
In general, comparing predictor importance estimates across ensembles of different types does not produce anything useful. The estimates only tell you which predictors are important for that particular ensemble.
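A sketch of the pitfall described above, using the impurity-based measure on two ensemble types (assumes the same-era Statistics Toolbox API; dataset and learner defaults are illustrative):

```matlab
% Sketch: the same importance measure on boosted stumps vs. deep bagged trees
load ionosphere                       % X: 351x34 predictors, Y: class labels

% Boosted trees (stumps by default): only a handful of predictors are
% ever split on, so most importances come out zero
boosted  = fitensemble(X, Y, 'AdaBoostM1', 100, 'Tree');
impBoost = predictorImportance(boosted);

% Bagged deep trees: most predictors are split on many times
bagged = fitensemble(X, Y, 'Bag', 100, 'Tree', 'Type', 'classification');
impBag = predictorImportance(bagged);

% The two vectors are on different scales and rank predictors
% differently; compare importances only within a single ensemble
[~, rankBoost] = sort(impBoost, 'descend');
[~, rankBag]   = sort(impBag, 'descend');
```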