RegressionPartitionedEnsemble

Cross-validated regression ensemble

Description

`RegressionPartitionedEnsemble` is a set of regression ensembles trained on cross-validated folds. Estimate the quality of regression by cross-validation using one or more “kfold” methods: `kfoldfun`, `kfoldLoss`, or `kfoldPredict`. Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations.

For example, suppose you cross-validate using five folds. In this case, every training fold contains roughly 4/5 of the data and every test fold contains roughly 1/5 of the data. The first model, stored in `Trained{1}`, was trained on `X` and `Y` with the first 1/5 excluded; the second model, stored in `Trained{2}`, was trained on `X` and `Y` with the second 1/5 excluded; and so on. When you call `kfoldPredict`, it computes predictions for the first 1/5 of the data using the first model, for the second 1/5 of the data using the second model, and so on. In short, `kfoldPredict` computes the response for every observation using the model trained without that observation.
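The fold-by-fold behavior described above can be sketched by hand. The snippet below is an illustrative sketch, not the toolbox's internal implementation: it assumes the `carsmall` sample data set is available, removes rows with a missing response up front so that row indices stay aligned (a convenience chosen for this example), and uses the `Partition`, `Trained`, and `X` properties to rebuild the out-of-fold predictions.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

% Assumption for this sketch: drop rows with a missing response first,
% so that the stored data and the partition refer to the same rows.
ok = ~isnan(Y);
X = X(ok,:);
Y = Y(ok);

rng(1) % For reproducibility
cvens = fitrensemble(X,Y,'KFold',5);

% Out-of-fold predictions computed by the toolbox.
yhat = kfoldPredict(cvens);

% The same predictions rebuilt manually: each observation is predicted
% by the model whose training fold excluded it.
yhatManual = nan(size(yhat));
for k = 1:cvens.KFold
    idx = test(cvens.Partition,k);            % Rows held out in fold k
    yhatManual(idx) = predict(cvens.Trained{k},cvens.X(idx,:));
end

% The two vectors are expected to agree up to round-off.
maxDiff = max(abs(yhat - yhatManual));
```

If `maxDiff` is not negligible, check whether the stored data and the partition still refer to the same rows.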

Creation

Syntax

`cvens = crossval(ens)`
`cvens = fitrensemble(X,Y,Name,Value)`

Description

`cvens = crossval(ens)` creates a cross-validated ensemble from `ens`, a regression ensemble. For syntax details, see the `crossval` reference page.

`cvens = fitrensemble(X,Y,Name,Value)` creates a cross-validated ensemble when `Name` is one of `'crossval'`, `'kfold'`, `'holdout'`, `'leaveout'`, or `'cvpartition'`. For syntax details, see the `fitrensemble` function reference page.
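As a brief illustration of the second syntax, the following sketch passes a cross-validation name-value argument directly to `fitrensemble`. The data set (`carsmall`), the variable names, and the specific fold count and holdout fraction are choices made for this example only.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

rng(1) % For reproducibility

% Five-fold cross-validation requested at training time.
cvKfold = fitrensemble(X,Y,'KFold',5);

% Holdout validation, reserving 25% of the observations for testing.
cvHoldout = fitrensemble(X,Y,'Holdout',0.25);

% Both calls return a cross-validated (partitioned) ensemble rather
% than a plain regression ensemble.
class(cvKfold)
cvKfold.KFold
```

Specify only one of the cross-validation arguments at a time; see the `fitrensemble` reference page for how they interact.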

Input Arguments

Regression ensemble, specified as the output of `fitrensemble`.

Properties

This property is read-only.

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the `'NumBins'` name-value argument as a positive integer scalar when training a model with tree learners. The `BinEdges` property is empty if the `'NumBins'` value is empty (default).

You can reproduce the binned predictor data `Xbinned` by using the `BinEdges` property of the trained model `mdl`.

```
X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end
```
`Xbinned` contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. `Xbinned` values are 0 for categorical predictors. If `X` contains `NaN`s, then the corresponding `Xbinned` values are `NaN`s.

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty (`[]`).

Data Types: `single` | `double`

Name of the cross-validated model, returned as a character vector.

Data Types: `char`

Number of folds in the cross-validated ensemble, returned as a positive integer.

Data Types: `double`

Parameters of the cross-validated ensemble, returned as an object.

This property is read-only.

Number of observations in the training data, returned as a positive integer. `NumObservations` can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: `double`
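To see how `NumObservations` can differ from the number of input rows, consider the sketch below. It assumes the `carsmall` sample data set, whose `MPG` response contains some missing values; the variable names are illustrative.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];

rng(1) % For reproducibility
cvens = fitrensemble(X,MPG,'KFold',5);

nRows = size(X,1);             % Rows passed to fitrensemble
nUsed = cvens.NumObservations; % Rows actually used for training

% nUsed is smaller than nRows when rows are dropped because of
% missing values.
fprintf('%d of %d rows used\n',nUsed,nRows)
```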

Number of weak learners used in training each fold of the ensemble, returned as a positive integer.

Data Types: `double`

Partition used in cross-validation, returned as a `CVPartition` object.

Predictor names in order of their appearance in the predictor data `X`, specified as a cell array of character vectors. The length of `PredictorNames` is equal to the number of columns in `X`.

Data Types: `cell`

Response variable name, specified as a character vector.

Data Types: `char`

Function for transforming the raw response values (mean squared error), specified as a function handle or `'none'`. The default `'none'` means no transformation; equivalently, `'none'` means `@(x)x`. A function handle must accept a matrix of response values and return a matrix of the same size.

Add or change a `ResponseTransform` function using dot notation:

`cvens.ResponseTransform = @function`

Data Types: `char` | `function_handle`

The trained learners, returned as a cell array of full ensembles trained on cross-validation folds. Every ensemble is full, meaning it contains its training data and weights.

Data Types: `cell`

The trained learners, returned as a cell array of compact ensembles trained on cross-validation folds.

Data Types: `cell`

This property is read-only.

Scaled weights in the ensemble, returned as a numeric vector. `W` has length `n`, the number of rows in the training data. The sum of the elements of `W` is `1`.

Data Types: `double`

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of `X` represents one variable (predictor), and each row represents one observation.

Data Types: `double` | `table`

This property is read-only.

Response values corresponding to the rows of `X`, returned as a numeric column vector. Each row of `Y` represents the response for the corresponding row of `X`.

Data Types: `single` | `double`

Object Functions

`gather`: Gather properties of Statistics and Machine Learning Toolbox object from GPU

`kfoldLoss`: Loss for cross-validated partitioned regression model

`kfoldPredict`: Predict responses for observations in cross-validated regression model

`kfoldfun`: Cross-validate function for regression

`resume`: Resume training of cross-validated regression ensemble model

Examples

Construct a partitioned regression ensemble, and examine the cross-validation losses for the folds.

Load the `carsmall` data set.

`load carsmall;`

Create a subset of variables.

```
XX = [Cylinders Displacement Horsepower Weight];
YY = MPG;
```

Construct the ensemble model.

`rens = fitrensemble(XX,YY);`

Create a cross-validated ensemble from `rens`.

```
rng(10,'twister') % For reproducibility
cvrens = crossval(rens);
```

Examine the cross-validation losses.

`L = kfoldLoss(cvrens,'mode','individual')`
```
L = 10×1

   21.4489
   48.4388
   28.2560
   17.5354
   29.9441
   49.5254
   51.2372
   31.0152
   31.6388
    8.9607
```

`L` is a vector containing the cross-validation loss for each fold, that is, the loss of each trained ensemble evaluated on the observations held out from its training.
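Per-fold metrics other than the default loss can be computed with `kfoldfun`. The following sketch is self-contained and rebuilds the cross-validated ensemble from the example; the mean absolute error metric and the name `maeFun` are choices made here for illustration. The function handle follows the seven-argument form that `kfoldfun` expects: the compact model, the training data, responses, and weights, then the test data, responses, and weights.

```matlab
load carsmall
XX = [Cylinders Displacement Horsepower Weight];
YY = MPG;

rens = fitrensemble(XX,YY);
rng(10,'twister') % For reproducibility
cvrens = crossval(rens);

% Mean absolute error on the held-out observations of each fold.
% Only the compact model and the test data are used here; the other
% inputs are required by the kfoldfun calling convention.
maeFun = @(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest) ...
    mean(abs(predict(CMP,Xtest) - Ytest));

mae = kfoldfun(cvrens,maeFun)
```

`mae` contains one value per fold, in the same fold order as the vector returned by `kfoldLoss` with `'mode','individual'`.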

Version History

Introduced in R2011a