# loss

Regression loss for Gaussian kernel regression model

## Syntax

``L = loss(Mdl,X,Y)``
``L = loss(Mdl,X,Y,Name,Value)``

## Description


`L = loss(Mdl,X,Y)` returns the mean squared error (MSE) for the Gaussian kernel regression model `Mdl` using the predictor data in `X` and the corresponding responses in `Y`.


`L = loss(Mdl,X,Y,Name,Value)` uses additional options specified by one or more name-value pair arguments. For example, you can specify a regression loss function and observation weights. Then, `loss` returns the weighted regression loss using the specified loss function.
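The two syntaxes can be sketched on a small synthetic problem. The data, the seed, and the variable names below are assumptions made for illustration and do not come from the examples on this page; the manual computation is only a sanity check that the default loss is the plain mean of squared residuals.

```matlab
% Synthetic data (hypothetical, for illustration only)
rng(1)                                     % For reproducibility
X = randn(200,3);                          % 200 observations, 3 predictors
Y = X(:,1) - 2*X(:,2) + 0.1*randn(200,1);  % Noisy linear response

Mdl = fitrkernel(X,Y);                     % Default Gaussian kernel regression model

L = loss(Mdl,X,Y);                         % First syntax: default loss is the MSE
Lmanual = mean((Y - predict(Mdl,X)).^2);   % Same quantity computed from predictions

% Second syntax: choose a loss function with a name-value pair argument
Lei = loss(Mdl,X,Y,'LossFun','epsiloninsensitive');
```

With default (uniform) observation weights, `L` and `Lmanual` should agree up to floating-point rounding.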

## Examples


Train a Gaussian kernel regression model for a tall array, then calculate the resubstitution mean squared error and epsilon-insensitive error.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the `mapreducer` function.
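As a minimal sketch of the environment switch described above, calling `mapreducer` with the input `0` directs subsequent tall array calculations to the local MATLAB session instead of a parallel pool. The tiny tall array is hypothetical and only confirms that evaluation still works after the switch.

```matlab
mapreducer(0)            % Use the local MATLAB session for tall array calculations

t = tall([1; 2; 3]);     % Small in-memory tall array (illustration only)
s = gather(sum(t));      % Evaluated in the local session
```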

Create a datastore that references the folder location with the data. The data can be contained in a single file, a collection of files, or an entire folder. Treat `'NA'` values as missing data so that `datastore` replaces them with `NaN` values. Select a subset of the variables to use. Create a tall table on top of the datastore.

```
varnames = {'ArrTime','DepTime','ActualElapsedTime'};
ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'SelectedVariableNames',varnames);
t = tall(ds);
```
```
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).
```

Specify `DepTime` and `ArrTime` as the predictor variables (`X`) and `ActualElapsedTime` as the response variable (`Y`). Select the observations for which `ArrTime` is later than `DepTime`.

```
daytime = t.ArrTime>t.DepTime;
Y = t.ActualElapsedTime(daytime);     % Response data
X = t{daytime,{'DepTime' 'ArrTime'}}; % Predictor data
```

Standardize the predictor variables.

`Z = zscore(X); % Standardize the data`

Train a default Gaussian kernel regression model with the standardized predictors. Set `'Verbose',0` to suppress diagnostic messages.

`[Mdl,FitInfo] = fitrkernel(Z,Y,'Verbose',0)`
```
Mdl = 
  RegressionKernel
            PredictorNames: {'x1'  'x2'}
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 64
               KernelScale: 1
                    Lambda: 8.5385e-06
             BoxConstraint: 1
                   Epsilon: 5.9303

  Properties, Methods
```
```
FitInfo = struct with fields:
                  Solver: 'LBFGS-tall'
            LossFunction: 'epsiloninsensitive'
                  Lambda: 8.5385e-06
           BetaTolerance: 1.0000e-03
       GradientTolerance: 1.0000e-05
          ObjectiveValue: 30.7814
       GradientMagnitude: 0.0191
    RelativeChangeInBeta: 0.0228
                 FitTime: 103.3689
                 History: []
```

`Mdl` is a trained `RegressionKernel` model, and the structure array `FitInfo` contains optimization details.

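Determine how well the trained model fits the training data by estimating the resubstitution mean squared error and epsilon-insensitive error. Because these errors are computed on the same observations used for training, they tend to be optimistic estimates of the error on new data.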

`lossMSE = loss(Mdl,Z,Y) % Resubstitution mean squared error`
```
lossMSE =

  MxNx... tall array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
```
`lossEI = loss(Mdl,Z,Y,'LossFun','epsiloninsensitive') % Resubstitution epsilon-insensitive error`
```
lossEI =

  MxNx... tall array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
```

Evaluate the tall arrays and bring the results into memory by using `gather`.

`[lossMSE,lossEI] = gather(lossMSE,lossEI)`
```
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 3 sec
Evaluation completed in 3.2 sec
```
```
lossMSE = 2.8851e+03
```
```
lossEI = 28.0050
```

Specify a custom regression loss (Huber loss) for a Gaussian kernel regression model.

Load the `carbig` data set.

`load carbig`

Specify the predictor variables (`X`) and the response variable (`Y`).

```
X = [Weight,Cylinders,Horsepower,Model_Year];
Y = MPG;
```

Delete rows of `X` and `Y` where either array has `NaN` values. Removing rows with `NaN` values before passing data to `fitrkernel` can speed up training and reduce memory usage.

```
R = rmmissing([X Y]);
X = R(:,1:4);
Y = R(:,end);
```

Reserve 10% of the observations as a holdout sample. Extract the training and test indices from the partition definition.

```
rng(10) % For reproducibility
N = length(Y);
cvp = cvpartition(N,'Holdout',0.1);
idxTrn = training(cvp); % Training set indices
idxTest = test(cvp);    % Test set indices
```

Standardize the training data and train the regression kernel model.

```
Xtrain = X(idxTrn,:);
Ytrain = Y(idxTrn);
[Ztrain,tr_mu,tr_sigma] = zscore(Xtrain); % Standardize the training data
tr_sigma(tr_sigma==0) = 1;
Mdl = fitrkernel(Ztrain,Ytrain)
```
```
Mdl = 
  RegressionKernel
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 128
               KernelScale: 1
                    Lambda: 0.0028
             BoxConstraint: 1
                   Epsilon: 0.8617

  Properties, Methods
```

`Mdl` is a `RegressionKernel` model.

Create an anonymous function that measures Huber loss $\left(\delta =1\right)$, that is,

`$L=\frac{1}{\sum {w}_{j}}\sum _{j=1}^{n}{w}_{j}{\ell }_{j},$`

where

`$\ell_j=\begin{cases}0.5\,\hat{e}_j^{2}, & |\hat{e}_j|\le 1\\ |\hat{e}_j|-0.5, & |\hat{e}_j|>1.\end{cases}$`

$\hat{e}_j$ is the residual for observation j. Custom loss functions must be written in a particular form. For rules on writing a custom loss function, see the `'LossFun'` name-value pair argument.

```
% The 0.5 offset applies only to observations with |residual| > 1
huberloss = @(Y,Yhat,W)sum(W.*((0.5*(abs(Y-Yhat)<=1).*(Y-Yhat).^2) + ...
    ((abs(Y-Yhat)>1).*(abs(Y-Yhat)-0.5))))/sum(W);
```
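The piecewise definition of the Huber loss can be checked by hand on a few residuals. The helper names `huberElem` and `huberCheck` and the toy vectors are hypothetical and used only for this check; the sketch assumes $\delta = 1$ as in the formula above.

```matlab
% Element-wise Huber loss with delta = 1 (hypothetical helper)
huberElem  = @(e) 0.5*e.^2.*(abs(e)<=1) + (abs(e)-0.5).*(abs(e)>1);
% Weighted mean of the element-wise losses, in the lossfun(Y,Yhat,W) form
huberCheck = @(Y,Yhat,W) sum(W.*huberElem(Y-Yhat))/sum(W);

Y    = [0; 0; 0];
Yhat = [0.5; 2; -3];     % Residuals: -0.5, -2, 3
W    = [1; 1; 1];

% Per-observation losses: 0.5*0.25 = 0.125, 2 - 0.5 = 1.5, 3 - 0.5 = 2.5
Lcheck = huberCheck(Y,Yhat,W)   % (0.125 + 1.5 + 2.5)/3 = 1.375
```

A residual inside the unit band contributes a quadratic term, and residuals outside it contribute a linear term, which is what makes the loss less sensitive to outliers than the MSE.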

Estimate the training set regression loss using the Huber loss function.

`eTrain = loss(Mdl,Ztrain,Ytrain,'LossFun',huberloss)`
```
eTrain = 1.7210
```

Standardize the test data using the same mean and standard deviation of the training data columns. Estimate the test set regression loss using the Huber loss function.

```
Xtest = X(idxTest,:);
Ztest = (Xtest-tr_mu)./tr_sigma; % Standardize the test data
Ytest = Y(idxTest);
eTest = loss(Mdl,Ztest,Ytest,'LossFun',huberloss)
```
```
eTest = 1.3062
```

## Input Arguments


`Mdl`: Kernel regression model, specified as a `RegressionKernel` model object. You can create a `RegressionKernel` model object using `fitrkernel`.

`X`: Predictor data, specified as an n-by-p numeric matrix, where n is the number of observations and p is the number of predictors. p must be equal to the number of predictors used to train `Mdl`.

Data Types: `single` | `double`

`Y`: Response data, specified as an n-dimensional numeric vector. The length of `Y` and the number of observations in `X` must be equal.

Data Types: `single` | `double`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `L = loss(Mdl,X,Y,'LossFun','epsiloninsensitive','Weights',weights)` returns the weighted regression loss using the epsilon-insensitive loss function.

`'LossFun'`: Loss function, specified as the comma-separated pair consisting of `'LossFun'` and a built-in loss function name or a function handle. If you do not specify `'LossFun'`, `loss` returns the MSE.

• The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar. Also, in the table, $f\left(x\right)=T\left(x\right)\beta +b.$

• x is an observation (row vector) from p predictor variables.

• $T(\cdot)$ is a transformation of an observation (row vector) for feature expansion. T(x) maps x in $\mathbb{R}^{p}$ to a high-dimensional space ($\mathbb{R}^{m}$).

• β is a vector of m coefficients.

• b is the scalar bias.

| Value | Description |
| --- | --- |
| `'epsiloninsensitive'` | Epsilon-insensitive loss: $\ell\left[y,f(x)\right]=\max\left[0,\lvert y-f(x)\rvert-\epsilon\right]$ |
| `'mse'` | MSE: $\ell\left[y,f(x)\right]=\left[y-f(x)\right]^{2}$ |

`'epsiloninsensitive'` is appropriate for SVM learners only.

• Specify your own function by using function handle notation.

Let `n` be the number of observations in `X`. Your function must have this signature:

``lossvalue = lossfun(Y,Yhat,W)``

• The output argument `lossvalue` is a scalar.

• You choose the function name (`lossfun`).

• `Y` is an n-dimensional vector of observed responses. `loss` passes the input argument `Y` in for `Y`.

• `Yhat` is an n-dimensional vector of predicted responses, which is similar to the output of `predict`.

• `W` is an `n`-by-1 numeric vector of observation weights.

Specify your function using `'LossFun',@lossfun`.

Data Types: `char` | `string` | `function_handle`
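As an illustration of the required signature, the following sketch defines a weighted mean absolute error as a custom loss. The name `maeloss` and the toy vectors are hypothetical; the function is evaluated directly on small inputs so that the result can be verified by hand.

```matlab
% Hypothetical custom loss: weighted mean absolute error
% Inputs follow the lossfun(Y,Yhat,W) form; the output is a scalar
maeloss = @(Y,Yhat,W) sum(W.*abs(Y-Yhat))/sum(W);

Y    = [1; 2; 3];        % Observed responses
Yhat = [1.5; 2; 2];      % Predicted responses; absolute errors are 0.5, 0, 1
W    = [1; 1; 2];        % Observation weights

lossvalue = maeloss(Y,Yhat,W)   % (0.5 + 0 + 2*1)/4 = 0.625
```

With a trained model, the same handle would be passed as `L = loss(Mdl,X,Y,'LossFun',maeloss)`.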

`'Weights'`: Observation weights, specified as the comma-separated pair consisting of `'Weights'` and a numeric vector of positive values. `loss` weights the observations in `X` with the corresponding values in `Weights`. The length of `Weights` must equal n, the number of observations (rows of `X`). If you supply observation weights, `loss` computes the weighted regression loss, that is, the weighted mean squared error or the weighted epsilon-insensitive loss (see Weighted Mean Squared Error and Epsilon-Insensitive Loss Function).

`loss` normalizes `Weights` to sum to 1.

Data Types: `double` | `single`
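Because `loss` normalizes `Weights` to sum to 1, rescaling all weights by a common factor should not change the result. The sketch below checks this on synthetic data; the data, the seed, and the weight vector `w` are assumptions made for illustration.

```matlab
% Synthetic data (hypothetical, for illustration only)
rng(1)                                     % For reproducibility
X = randn(200,3);
Y = X(:,1) - 2*X(:,2) + 0.1*randn(200,1);
Mdl = fitrkernel(X,Y);

w = rand(200,1) + 0.1;                     % Arbitrary positive observation weights

Lw      = loss(Mdl,X,Y,'Weights',w);       % Weighted MSE
Lscaled = loss(Mdl,X,Y,'Weights',10*w);    % Same weights multiplied by 10

% Weighted MSE computed directly from the predictions
Lmanual = sum(w.*(predict(Mdl,X) - Y).^2)/sum(w);
```

`Lw`, `Lscaled`, and `Lmanual` should agree up to floating-point rounding.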

## Output Arguments


`L`: Regression loss, returned as a numeric scalar. The interpretation of `L` depends on `Weights` and `LossFun`. For example, if you use the default observation weights and specify `'epsiloninsensitive'` as the loss function, then `L` is the epsilon-insensitive loss.


### Weighted Mean Squared Error

The weighted mean squared error is calculated as follows:

`$\text{mse}=\frac{\sum_{j=1}^{n}w_j\left(f(x_j)-y_j\right)^{2}}{\sum_{j=1}^{n}w_j},$`

where:

• n is the number of observations.

• $x_j$ is the jth observation (row of predictor data).

• $y_j$ is the observed response to $x_j$.

• $f(x_j)$ is the response prediction of the Gaussian kernel regression model `Mdl` to $x_j$.

• w is the vector of observation weights.

By default, w is `ones(n,1)/n`, so each observation weight equals 1/n. You can specify different values for the observation weights by using the `'Weights'` name-value pair argument. `loss` normalizes `Weights` to sum to 1.
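The weighted MSE formula above can be worked through on a few numbers. The vectors `y`, `f`, and `w` are hypothetical stand-ins for the observed responses, the model predictions, and the observation weights.

```matlab
% Hypothetical responses, predictions, and weights for illustration
y = [1; 2; 4];          % Observed responses y_j
f = [1; 3; 2];          % Model predictions f(x_j)
w = [1; 1; 2];          % Unnormalized observation weights w_j

% Squared errors are 0, 1, 4, so the weighted sum is 0 + 1 + 8 = 9
wmse = sum(w.*(f - y).^2)/sum(w)        % 9/4 = 2.25

% Normalizing the weights to sum to 1 leaves the result unchanged
wn = w/sum(w);
wmseNormalized = sum(wn.*(f - y).^2)    % Also 2.25
```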

### Epsilon-Insensitive Loss Function

The epsilon-insensitive loss function ignores errors that are within the distance epsilon (ε) of the function value. The function is formally described as:

`$\text{Loss}_{\epsilon}=\begin{cases}0, & \text{if } |y-f(x)|\le\epsilon\\ |y-f(x)|-\epsilon, & \text{otherwise.}\end{cases}$`

The mean epsilon-insensitive loss is calculated as follows:

`$\text{Loss}=\frac{\sum_{j=1}^{n}w_j\max\left(0,|y_j-f(x_j)|-\epsilon\right)}{\sum_{j=1}^{n}w_j},$`

where:

• n is the number of observations.

• $x_j$ is the jth observation (row of predictor data).

• $y_j$ is the observed response to $x_j$.

• $f(x_j)$ is the response prediction of the Gaussian kernel regression model `Mdl` to $x_j$.

• w is the vector of observation weights.

By default, w is `ones(n,1)/n`, so each observation weight equals 1/n. You can specify different values for the observation weights by using the `'Weights'` name-value pair argument. `loss` normalizes `Weights` to sum to 1.
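The mean epsilon-insensitive loss can likewise be checked by hand. The vectors and the value of `epsilon` are hypothetical; for a trained model, the relevant value is presumably the `Epsilon` property shown in the model display.

```matlab
% Hypothetical responses, predictions, weights, and epsilon for illustration
y = [1; 2; 4];          % Observed responses y_j
f = [1.2; 3; 2];        % Model predictions f(x_j)
w = [1; 1; 2];          % Observation weights w_j
epsilon = 0.5;          % Half-width of the insensitive band (assumed value)

% Absolute errors are 0.2, 1, 2; the first falls inside the band and is ignored
lj = max(0, abs(y - f) - epsilon);      % Per-observation losses: 0, 0.5, 1.5

eiLoss = sum(w.*lj)/sum(w)              % (0 + 0.5 + 2*1.5)/4 = 0.875
```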