B = TreeBagger(NTrees,X,Y)
B = TreeBagger(NTrees,X,Y,'param1',val1,'param2',val2,...)
B = TreeBagger(NTrees,X,Y) creates an ensemble B of NTrees decision trees for predicting response Y as a function of predictors X. By default, TreeBagger builds an ensemble of classification trees. The function can build an ensemble of regression trees by setting the optional input argument 'Method' to 'regression'.
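For instance, a minimal sketch (X and Y stand in for your own predictor matrix and response array):

```matlab
% Classification ensemble (the default method) with 100 trees
B = TreeBagger(100,X,Y);

% Regression ensemble; Y must be numeric in this case
B = TreeBagger(100,X,Y,'Method','regression');
```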
X is a numeric matrix of training data. Each row represents an observation and each column represents a predictor or feature. Y is an array of true class labels for classification or numeric function values for regression. True class labels can be a numeric vector, character matrix, cell array of strings, or categorical vector. TreeBagger converts labels to a cell array of strings for classification. For more information on grouping variables, see Grouping Variables.
B = TreeBagger(NTrees,X,Y,'param1',val1,'param2',val2,...) specifies optional parameter name/value pairs:
'FBoot'  Fraction of the input data to sample with replacement for growing each new tree. Default value is 1. 
'Cost'  Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The default value is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. 
'SampleWithReplacement'  'on' to sample with replacement or 'off' to sample without replacement. If you sample without replacement, you need to set 'FBoot' to a value less than one. Default is 'on'. 
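For example, to grow each tree on a subsample drawn without replacement (the data set and fraction here are illustrative):

```matlab
load fisheriris
% Draw 60% of the observations without replacement for each tree
B = TreeBagger(50,meas,species,'SampleWithReplacement','off','FBoot',0.6);
```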
'OOBPred'  'on' to store information on which observations are out of bag for each tree. oobPredict can use this information to compute the predicted class probabilities for each tree in the ensemble. Default is 'off'. 
'OOBVarImp'  'on' to store out-of-bag estimates of feature importance in the ensemble. Default is 'off'. Specifying 'on' also sets the 'OOBPred' value to 'on'. 
'Method'  Either 'classification' or 'regression' .
Regression requires a numeric Y . 
'NVarToSample'  Number of variables to select at random for each decision split.
Default is the square root of the number of variables for classification
and one third of the number of variables for regression. Valid values
are 'all' or a positive integer. Setting this argument
to any valid value but 'all' invokes Breiman's
'random forest' algorithm. 
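For example, to override the default of sqrt(4) = 2 variables per split on the four-predictor iris data (the choice of 3 is illustrative):

```matlab
load fisheriris
% Consider 3 of the 4 predictors at random for each decision split
B = TreeBagger(50,meas,species,'NVarToSample',3);
```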
'NPrint'  Number of training cycles (grown trees) after which TreeBagger displays
a diagnostic message showing training progress. Default is no diagnostic
messages. 
'MinLeaf'  Minimum number of observations per tree leaf. Default is 1 for classification and 5 for regression. 
'Options'  A structure that specifies options that govern the computation
when growing the ensemble of decision trees. One option requests that
the computation of decision trees on multiple bootstrap replicates
uses multiple processors, if the Parallel Computing Toolbox™ is
available. Two options specify the random number streams to use in
selecting bootstrap replicates. You can create this argument with
a call to statset . You can retrieve values of the
individual fields with a call to statget .
Applicable statset parameters are 'UseParallel', 'UseSubstreams', and 'Streams'.
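For example, a sketch of requesting parallel tree growing (requires the Parallel Computing Toolbox; depending on your release, 'UseParallel' may expect the logical true or the string 'always'):

```matlab
load fisheriris
% Build an options structure asking for parallel computation
paroptions = statset('UseParallel',true);
B = TreeBagger(100,meas,species,'Options',paroptions);
```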

'Prior'  Prior probabilities for each class. Specify as 'Empirical' (the default, with probabilities determined from the class frequencies in Y), 'Uniform' (all class probabilities equal), a numeric vector with one element per class, or a structure with fields ClassNames and ClassProbs. 
'CategoricalPredictors'  Categorical predictors list, specified as the comma-separated pair consisting of 'CategoricalPredictors' and a numeric vector of predictor indices, a logical vector, or 'all' to treat all predictors as categorical. 

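For example, assuming (hypothetically) that columns 2 and 4 of X hold categorical predictors:

```matlab
% Treat the 2nd and 4th columns of X as categorical predictors
B = TreeBagger(50,X,Y,'CategoricalPredictors',[2 4]);
```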
In addition to the optional arguments above, this method accepts all optional fitctree and fitrtree arguments with the exception of 'minparent'. Refer to the documentation for fitctree and fitrtree for more detail.
Load Fisher's iris data set.
load fisheriris
Train a bagged ensemble of classification trees using the data and specifying 50
weak learners. Store which observations are out of bag for each tree.
rng(1); % For reproducibility
BaggedEnsemble = TreeBagger(50,meas,species,'OOBPred','On')
BaggedEnsemble = 

  TreeBagger
Ensemble with 50 bagged decision trees:
               Training X: [150x4]
               Training Y: [150x1]
                   Method: classification
                    Nvars: 4
             NVarToSample: 2
                  MinLeaf: 1
                    FBoot: 1
    SampleWithReplacement: 1
     ComputeOOBPrediction: 1
         ComputeOOBVarImp: 0
                Proximity: []
               ClassNames: 'setosa'  'versicolor'  'virginica'
BaggedEnsemble is a TreeBagger ensemble. BaggedEnsemble.OOBIndices stores the out-of-bag indices as a matrix of logical values.
Plot the out-of-bag error over the number of grown classification trees.
oobErrorBaggedEnsemble = oobError(BaggedEnsemble);
plot(oobErrorBaggedEnsemble)
xlabel 'Number of grown trees';
ylabel 'Out-of-bag classification error';
The out-of-bag error decreases with the number of grown trees. To label out-of-bag observations, pass BaggedEnsemble to oobPredict.
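For example, a sketch of labeling the out-of-bag observations and tabulating them against the true labels:

```matlab
% Predicted class labels for out-of-bag observations, one per training row
oobLabels = oobPredict(BaggedEnsemble);
% Confusion matrix of true versus out-of-bag predicted classes
confusionmat(species,oobLabels)
```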
TreeBagger generates in-bag samples by oversampling classes with large misclassification costs and undersampling classes with small misclassification costs. Consequently, out-of-bag samples have fewer observations from classes with large misclassification costs and more observations from classes with small misclassification costs. If you train a classification ensemble using a small data set and a highly skewed cost matrix, then the number of out-of-bag observations per class might be very low. Therefore, the estimated out-of-bag error might have a large variance and might be difficult to interpret. The same phenomenon can occur for classes with large prior probabilities.
Avoid large estimated out-of-bag error variances by setting a more balanced misclassification cost matrix or a less skewed prior probability vector.
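For instance, a hypothetical three-class cost matrix with balanced off-diagonal entries (the same structure as the default) rather than a highly skewed one:

```matlab
load fisheriris
% Rows are true classes, columns are predicted classes;
% equal off-diagonal costs keep the in-bag sampling balanced
C = [0 1 1; 1 0 1; 1 1 0];
B = TreeBagger(50,meas,species,'Cost',C);
```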