This example shows how to train multiple models in Regression Learner and determine the best-performing models based on their validation metrics. It then checks the test metrics for the best-performing models, each trained on the full data set that includes both the training and validation data.
In the MATLAB® Command Window, load the carbig data set and create a table containing most of the variables. Separate the table into training and test sets.
```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
rng('default') % For reproducibility of the data split
n = length(MPG);
partition = cvpartition(n,'Holdout',0.15);
idxTrain = training(partition); % Indices for the training set
cartableTrain = cartable(idxTrain,:);
cartableTest = cartable(~idxTrain,:);
```
Open Regression Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
On the Regression Learner tab, in the File section, click New Session and select From Workspace.
In the New Session from Workspace dialog box, select the cartableTrain table from the Data Set Variable list.
As shown in the dialog box, the app selects the response and predictor variables. The default response variable is MPG. To protect against overfitting, the default validation option is 5-fold cross-validation. For this example, do not change the default settings.
To accept the default options and continue, click Start Session.
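To make the default 5-fold validation concrete, the following is a minimal command-line sketch, assuming Statistics and Machine Learning Toolbox. It rebuilds the same training split, fits a Gaussian process regression model with a rational quadratic kernel using `fitrgp` with `'KFold',5`, and computes a validation RMSE from the out-of-fold predictions returned by `kfoldPredict`. Converting `Origin` to categorical, dropping rows with missing values with `rmmissing`, and standardizing the predictors are assumptions of this sketch rather than a description of what the app does internally, so the value will not exactly match the app. The names `T`, `cvMdl`, `yhat`, and `valRMSE` are illustrative.

```matlab
% Rebuild the training split exactly as in the data-preparation code.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
% Assumption: encode the space-padded char array Origin as categorical.
cartable.Origin = categorical(cellstr(cartable.Origin));
rng('default')
partition = cvpartition(length(MPG),'Holdout',0.15);
% Assumption: drop rows with NaN values (MPG and Horsepower contain NaNs).
T = rmmissing(cartable(training(partition),:));

% 5-fold cross-validated GPR model with a rational quadratic kernel.
rng('default')                                  % reproducible folds
cvMdl = fitrgp(T,'MPG','KernelFunction','rationalquadratic', ...
    'Standardize',true,'KFold',5);

% Out-of-fold predictions: each observation is predicted by the model
% trained on the other four folds, so it is never used to fit its own model.
yhat = kfoldPredict(cvMdl);
valRMSE = sqrt(mean((T.MPG - yhat).^2));
fprintf('5-fold validation RMSE: %.3f\n',valRMSE)
```

Because every observation gets exactly one out-of-fold prediction, this RMSE summarizes performance on data that each fold's model did not see, which is why it serves as a guard against overfitting.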
Train all preset models. On the Regression Learner tab, in the Model Type section, click the arrow to open the gallery. In the Get Started group, click All. In the Training section, click Train. The app trains one of each preset model type and displays the models in the Models pane.
If you have Parallel Computing Toolbox™, you can train all the models (All) simultaneously by selecting the Use Parallel button in the Training section before clicking Train. After you click Train, the Opening Parallel Pool dialog box opens and remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, the app trains the models simultaneously.
Sort the trained models based on the validation root mean squared error (RMSE). In the Models pane, open the Sort by list and select RMSE (Validation).
In the Models pane, click the star icons next to the three models with the lowest validation RMSE. The app highlights the lowest validation RMSE by outlining it in a box. In this example, the trained Rational Quadratic GPR model has the lowest validation RMSE.
The app displays a response plot of the car data. Blue points are true values, and yellow points are predicted values. The Models pane on the left shows the validation RMSE for each model.
Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
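Sorting by validation RMSE can also be sketched at the command line. This is a minimal illustration, not the app's implementation: it cross-validates three model types on one shared 5-fold partition, computes each validation RMSE from `kfoldPredict`, and sorts the results with `sortrows`. The specific settings (a tree with `'MinLeafSize',4`, a Gaussian-kernel SVM whose kernel scale is assumed to be the square root of the number of predictor variables, and a rational quadratic GPR model) are assumptions chosen for illustration and are not guaranteed to match the app's presets; the categorical encoding of `Origin` and the removal of rows with missing values are likewise assumptions.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin));  % assumption
rng('default')
partition = cvpartition(length(MPG),'Holdout',0.15);
T = rmmissing(cartable(training(partition),:));           % assumption

% Use the same 5 folds for every model so the comparison is fair.
cvp = cvpartition(height(T),'KFold',5);
ks = sqrt(width(T) - 1);   % assumed kernel scale: sqrt(# predictor variables)

cvTree = fitrtree(T,'MPG','MinLeafSize',4,'CVPartition',cvp);
cvSVM = fitrsvm(T,'MPG','KernelFunction','gaussian', ...
    'KernelScale',ks,'Standardize',true,'CVPartition',cvp);
cvGPR = fitrgp(T,'MPG','KernelFunction','rationalquadratic', ...
    'Standardize',true,'CVPartition',cvp);

cvModels = {cvTree; cvSVM; cvGPR};
Model = ["Tree (MinLeafSize 4)"; "Gaussian SVM"; "Rational Quadratic GPR"];
ValidationRMSE = zeros(numel(cvModels),1);
for k = 1:numel(cvModels)
    yhat = kfoldPredict(cvModels{k});            % out-of-fold predictions
    ValidationRMSE(k) = sqrt(mean((T.MPG - yhat).^2));
end

% Command-line analogue of "Sort by RMSE (Validation)": lowest error first.
results = sortrows(table(Model,ValidationRMSE),'ValidationRMSE');
disp(results)
```

Sharing one `cvpartition` object across all models means the models are scored on identical folds, so differences in RMSE reflect the models rather than the luck of the split.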
Check the test set performance of the best-performing models. Begin by importing test data into the app.
On the Regression Learner tab, in the Testing section, click Test Data and select From Workspace.
In the Import Test Data dialog box, select the cartableTest table from the Test Data Set Variable list.
As shown in the dialog box, the app identifies the response and predictor variables.
Compute the RMSE of the best preset models on the cartableTest data. For convenience, compute the test set RMSE for all models at once. On the Regression Learner tab, in the Testing section, click Test All and select Test All. For each model, the app computes the test set performance of the version trained on the full data set, including both the training and validation data.
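The distinction between validation and test metrics can be sketched at the command line. In this minimal example, the final model is fit once on all training observations with no folds held out, and the test RMSE is computed from `predict` on the held-out rows. The rational quadratic GPR settings, the categorical encoding of `Origin`, and dropping rows with missing values (including test rows with a missing `MPG`) are assumptions of this sketch; `Ttrain`, `Ttest`, `mdl`, and `testRMSE` are illustrative names.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin));  % assumption
rng('default')
partition = cvpartition(length(MPG),'Holdout',0.15);
idxTrain = training(partition);
Ttrain = rmmissing(cartable(idxTrain,:));                 % assumption
Ttest = rmmissing(cartable(~idxTrain,:));                 % assumption

% Final model: fit once on all training observations (no folds held out).
mdl = fitrgp(Ttrain,'MPG','KernelFunction','rationalquadratic', ...
    'Standardize',true);

% Test RMSE: predict the unseen test rows and compare with the true MPG.
yhatTest = predict(mdl,Ttest);
testRMSE = sqrt(mean((Ttest.MPG - yhatTest).^2));
fprintf('Test RMSE: %.3f\n',testRMSE)
```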
Sort the models based on the test set RMSE. In the Models pane, open the Sort by list and select RMSE (Test). The app still outlines the metric for the model with the lowest validation RMSE, even though the pane now displays the test RMSE.
Visually check the test set performance of the models. On the Regression Learner tab, in the Plots section, click Predicted vs. Actual and select Test Data. You can toggle between models to compare their performance.
In this example, the trained Medium Gaussian SVM performs better on the test set data than the other two starred models.
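A predicted-versus-actual plot like the app's can be approximated with base MATLAB graphics. This minimal sketch fits a Gaussian-kernel SVM with `fitrsvm` on the training rows, predicts the test rows, and plots each test car as one point against a dashed perfect-prediction line. The kernel scale (square root of the number of predictor variables), standardization, the categorical encoding of `Origin`, and the removal of rows with missing values are assumptions; the model is not guaranteed to match the app's Medium Gaussian SVM preset.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin));  % assumption
rng('default')
partition = cvpartition(length(MPG),'Holdout',0.15);
idxTrain = training(partition);
Ttrain = rmmissing(cartable(idxTrain,:));                 % assumption
Ttest = rmmissing(cartable(~idxTrain,:));                 % assumption

ks = sqrt(width(Ttrain) - 1);   % assumed kernel scale
mdl = fitrsvm(Ttrain,'MPG','KernelFunction','gaussian', ...
    'KernelScale',ks,'Standardize',true);
yTest = Ttest.MPG;
yhat = predict(mdl,Ttest);

figure
plot(yTest,yhat,'.','MarkerSize',12)   % one point per test car
hold on
lims = [min([yTest; yhat]) max([yTest; yhat])];
plot(lims,lims,'k--')                  % points on this line are predicted exactly
hold off
xlabel('True response (MPG)')
ylabel('Predicted response (MPG)')
title('Predicted vs. actual: test data')
```

Points far from the dashed line are the test cars with the largest errors, which are the observations that dominate the test RMSE.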
Compare the validation and test RMSE for the trained Medium Gaussian SVM model. In the Current Model Summary pane, compare the RMSE (Validation) value under Training Results to the RMSE (Test) value under Test Results. In this example, the two values are close, which indicates that the validation RMSE is a good estimate of the test RMSE for this model.
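The same comparison can be sketched at the command line by computing both numbers for one model specification. Here the validation RMSE comes from 5-fold out-of-fold predictions on the training rows, and the test RMSE comes from a model refit on all training rows and scored on the held-out rows. The SVM settings (Gaussian kernel, assumed kernel scale, standardization), the categorical encoding of `Origin`, and dropping rows with missing values are assumptions of this sketch, so the numbers will differ from the app's.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin));  % assumption
rng('default')
partition = cvpartition(length(MPG),'Holdout',0.15);
idxTrain = training(partition);
Ttrain = rmmissing(cartable(idxTrain,:));                 % assumption
Ttest = rmmissing(cartable(~idxTrain,:));                 % assumption

ks = sqrt(width(Ttrain) - 1);   % assumed kernel scale
svmArgs = {'KernelFunction','gaussian','KernelScale',ks,'Standardize',true};

% Validation RMSE: 5-fold out-of-fold predictions on the training rows.
rng('default')
cvSVM = fitrsvm(Ttrain,'MPG',svmArgs{:},'KFold',5);
valRMSE = sqrt(mean((Ttrain.MPG - kfoldPredict(cvSVM)).^2));

% Test RMSE: model refit on all training rows, scored on held-out rows.
mdl = fitrsvm(Ttrain,'MPG',svmArgs{:});
testRMSE = sqrt(mean((Ttest.MPG - predict(mdl,Ttest)).^2));

fprintf('Validation RMSE: %.3f   Test RMSE: %.3f\n',valRMSE,testRMSE)
```

Passing the same `svmArgs` cell array to both calls keeps the two fits identical except for the data they see, so any gap between the two RMSE values reflects generalization rather than a change in settings.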