What's the difference between the DeltaCritDecisionSplit property and Gini's Diversity Index?

I'm implementing Random Forests code for selecting the most important predictors for my application. The TreeBagger webinar shows two ways of estimating predictor importance (DeltaCritDecisionSplit, OOBPermutedVarDeltaError). Is DeltaCritDecisionSplit the same as the Gini diversity index (of predictorImportance)? If not, how are they different?

Answers (2)

Ilya on 19 Jan 2012
Yes, the DeltaCritDecisionSplit property of TreeBagger is the equivalent of the predictorImportance method for an ensemble produced by the fitensemble function. It is obtained by summing the impurity gain over all splits on a given predictor. Gini is the default impurity measure for classification trees.
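The idea above can be sketched in a few lines. This is a minimal Python analog (not the MATLAB implementation itself): the Gini diversity index is 1 minus the sum of squared class proportions, a split's contribution is the weighted decrease in impurity, and a DeltaCritDecisionSplit-style score sums those contributions over every split made on each predictor. The splits shown are hypothetical hand-built examples.

```python
from collections import Counter

def gini(labels):
    """Gini diversity index: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_gain(parent, left, right):
    """Decrease in Gini impurity from splitting parent into left/right,
    with each child's impurity weighted by its share of the observations."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# Hypothetical splits: (predictor index, parent labels, left child, right child).
splits = [
    (0, list("aaabbb"), list("aaa"), list("bbb")),  # perfect split on predictor 0
    (1, list("aabb"),   list("aab"), list("b")),    # weaker split on predictor 1
]

# DeltaCritDecisionSplit-style importance: sum the impurity gain of
# every split made on each predictor.
importance = {}
for pred, parent, left, right in splits:
    importance[pred] = importance.get(pred, 0.0) + impurity_gain(parent, left, right)
```

The perfect split removes all 0.5 of the parent's impurity, so predictor 0 scores 0.5, while predictor 1's weaker split scores about 0.167.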

Ilya on 19 Jan 2012
Predictor importance estimates for every tree in an ensemble are added together, and the sum is then divided by the number of trees. This means the estimates are comparable only if the two ensembles are composed of trees of roughly the same depth (that is, trees using roughly the same number of splits). Boosted trees by default use stumps (one-split trees), so many predictors may never be split on. Bagged trees by default are deep, and most predictors get many splits.
In general, comparing predictor importance estimates across ensembles of different types is unlikely to produce anything useful. These estimates can only tell you which predictors are important for that particular ensemble.
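The averaging step and the scale mismatch can be illustrated with a toy sketch (in Python rather than MATLAB; the per-tree numbers below are made up, not from a real fit). Deep bagged trees accumulate gain from many splits per tree, while boosted stumps contribute at most one split's worth, so the two ensembles' averaged importances sit on very different scales:

```python
# Per-tree impurity-gain totals for three predictors in two hypothetical
# three-tree ensembles. Values are illustrative only.
bagged  = [[0.9, 0.7, 0.4], [0.8, 0.6, 0.5], [1.0, 0.5, 0.3]]  # deep trees: many splits
boosted = [[0.2, 0.0, 0.0], [0.0, 0.3, 0.0], [0.1, 0.0, 0.0]]  # stumps: one split each

def ensemble_importance(per_tree):
    """Sum each predictor's importance over all trees, then divide by
    the number of trees, mirroring the averaging described above."""
    n_trees = len(per_tree)
    n_pred = len(per_tree[0])
    return [sum(tree[j] for tree in per_tree) / n_trees for j in range(n_pred)]
```

Here `ensemble_importance(bagged)` gives roughly [0.9, 0.6, 0.4] while `ensemble_importance(boosted)` gives roughly [0.1, 0.1, 0.0]: the rankings within each ensemble are meaningful, but comparing the raw numbers across the two ensembles is not.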
