
Fit a support vector machine regression model

`fitrsvm` trains or cross-validates a support vector machine (SVM) regression model on a low- through moderate-dimensional predictor data set. `fitrsvm` supports mapping the predictor data using kernel functions. For objective-function minimization, it supports sequential minimal optimization (SMO), the iterative single data algorithm (ISDA), or *L*1 soft-margin minimization via quadratic programming.
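The kernel and the solver are both chosen through name-value arguments. The following is a minimal sketch on synthetic data. The data and variable names are illustrative assumptions. It fits the same Gaussian-kernel model with two of the solvers and compares their resubstitution error. The solver value names `'SMO'`, `'ISDA'`, and `'L1QP'` follow the `'Solver'` argument of `fitrsvm`. The `'L1QP'` call is left commented out because it is assumed here to need `quadprog` from Optimization Toolbox.

```matlab
% Hypothetical data: one predictor with a nonlinear (sine-shaped) response.
rng(0);                                   % reproducible random numbers
X = sort(10*rand(100, 1));
y = sin(X) + 0.1*randn(100, 1);

% The same Gaussian-kernel model, minimized with two different solvers.
MdlSMO  = fitrsvm(X, y, 'KernelFunction', 'gaussian', 'Solver', 'SMO');
MdlISDA = fitrsvm(X, y, 'KernelFunction', 'gaussian', 'Solver', 'ISDA');

% Resubstitution mean squared error of each fit.
mseSMO  = resubLoss(MdlSMO);
mseISDA = resubLoss(MdlISDA);

% Quadratic-programming solver (assumed to require Optimization Toolbox):
% MdlQP = fitrsvm(X, y, 'KernelFunction', 'gaussian', 'Solver', 'L1QP');
```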

To train a linear SVM regression model on a high-dimensional data set, that is, a data set that includes many predictor variables, use `fitrlinear` instead.

To train an SVM model for binary classification, see `fitcsvm` for low- through moderate-dimensional predictor data sets, or `fitclinear` for high-dimensional data sets.

`Mdl = fitrsvm(Tbl,ResponseVarName)` returns a full, trained support vector machine (SVM) regression model `Mdl` trained using the predictor values in the table `Tbl` and the response values in `Tbl.ResponseVarName`.
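As a minimal sketch of the table syntax, the example below builds a small synthetic table. The variable names `Area`, `Age`, and `Price` are hypothetical. It trains on it with the response named by a character vector. Because `Mdl` is a full model that keeps its training data, `compact` is used at the end to obtain a smaller `CompactRegressionSVM` for prediction only.

```matlab
% Hypothetical table: two numeric predictors and a response variable 'Price'.
rng(0);
Area  = 50 + 100*rand(80, 1);
Age   = 30*rand(80, 1);
Price = 2*Area - 1.5*Age + 5*randn(80, 1);
Tbl = table(Area, Age, Price);

% All variables except 'Price' become predictors; Tbl.Price is the response.
Mdl = fitrsvm(Tbl, 'Price');

% Predict from a table that contains the same predictor variables.
yfit = predict(Mdl, Tbl(1:5, {'Area', 'Age'}));

% Mdl stores the training data; compact returns a model without it.
CMdl = compact(Mdl);
```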

`Mdl = fitrsvm(___,Name,Value)` returns an SVM regression model with additional options specified by one or more name-value pair arguments, using any of the previous syntaxes. For example, you can specify the kernel function or train a cross-validated model.
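The two options named above, a kernel function and cross-validation, look like this in a sketch on hypothetical synthetic data. The `'KFold'` form returns a `RegressionPartitionedSVM` object rather than a single model. `kfoldLoss` then reports the average mean squared error over the held-out folds.

```matlab
% Hypothetical table with two predictors and a response 'Price'.
rng(0);
Area  = 50 + 100*rand(80, 1);
Age   = 30*rand(80, 1);
Price = 2*Area - 1.5*Age + 5*randn(80, 1);
Tbl = table(Area, Age, Price);

% Gaussian kernel with a heuristically chosen scale, on standardized predictors.
MdlGauss = fitrsvm(Tbl, 'Price', 'KernelFunction', 'gaussian', ...
    'KernelScale', 'auto', 'Standardize', true);

% 5-fold cross-validated model (RegressionPartitionedSVM).
CVMdl = fitrsvm(Tbl, 'Price', 'KFold', 5);
mseCV = kfoldLoss(CVMdl);   % mean squared error averaged over held-out folds
```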

`fitrsvm` supports low- through moderate-dimensional data sets. For high-dimensional data sets, use `fitrlinear` instead.
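For the high-dimensional case, the example below is a minimal sketch with a hypothetical sparse problem that has far more predictors than observations. It relies on `fitrlinear` defaulting to an SVM learner (`'Learner','svm'`), which is worth confirming for your release.

```matlab
% Hypothetical high-dimensional data: 5000 sparse predictors, 200 observations.
rng(0);
n = 200; p = 5000;
X = sprandn(n, p, 0.01);                 % sparse predictor matrix
coef = zeros(p, 1); coef(1:10) = 1;      % only the first 10 predictors matter
y = X*coef + 0.1*randn(n, 1);            % full response vector

MdlLinear = fitrlinear(X, y);            % linear SVM regression (default learner)
yfit = predict(MdlLinear, X(1:5, :));
```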

Unless your data set is large, always try to standardize the predictors (see `Standardize`). Standardization makes predictors insensitive to the scales on which they are measured.

It is good practice to cross-validate using the `KFold` name-value pair argument. The cross-validation results estimate how well the SVM model generalizes.

Sparsity in support vectors is a desirable property of an SVM model. To decrease the number of support vectors, set the `BoxConstraint` name-value pair argument to a large value. This action also increases the training time.

For optimal training time, set `CacheSize` as high as the memory limit on your computer allows.

If you expect many fewer support vectors than observations in the training set, then you can significantly speed up convergence by shrinking the active set using the name-value pair argument `'ShrinkagePeriod'`. It is good practice to use `'ShrinkagePeriod',1000`.
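The tips above on standardization, the box constraint, the kernel cache, active-set shrinkage, and cross-validation can be combined in one training call. The data and the `BoxConstraint` value of 10 are hypothetical choices for the sketch, not recommendations. The support-vector count is read from the `IsSupportVector` property so you can see how the box constraint affects sparsity on your own data.

```matlab
% Hypothetical data with predictors on very different scales.
rng(0);
X = [randn(300, 1), 1000*randn(300, 1)];
y = X(:, 1) + 0.002*X(:, 2) + 0.1*randn(300, 1);

Mdl = fitrsvm(X, y, ...
    'Standardize', true, ...        % remove the effect of measurement scale
    'BoxConstraint', 10, ...        % larger box constraint (example value only)
    'CacheSize', 'maximal', ...     % let the kernel cache use all available memory
    'ShrinkagePeriod', 1000);       % shrink the active set every 1000 iterations

numSV = nnz(Mdl.IsSupportVector);   % number of support vectors
CVMdl = crossval(Mdl, 'KFold', 5);  % cross-validate the trained model
mseCV = kfoldLoss(CVMdl);
```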

Duplicate observations that are far from the regression line do not affect convergence. However, just a few duplicate observations that occur near the regression line can slow down convergence considerably. To speed up convergence, specify `'RemoveDuplicates',true` if:

- Your data set contains many duplicate observations.
- You suspect that a few duplicate observations can fall near the regression line.

However, to maintain the original data set during training, `fitrsvm` must temporarily store separate data sets: the original and one without the duplicate observations. Therefore, if you specify `true` for data sets containing few duplicates, then `fitrsvm` consumes close to double the memory of the original data.

After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.
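To judge whether `'RemoveDuplicates',true` pays for its extra memory on a given data set, you can train with and without it and compare the solver work. The sketch below uses hypothetical data in which every observation is repeated 20 times. It reads the `NumIterations` property of each model. No particular speedup is implied; the outcome depends on where the duplicates fall relative to the regression line.

```matlab
% Hypothetical data set in which each observation appears 20 times.
rng(0);
X0 = randn(50, 2);
y0 = X0*[1; -2] + 0.1*randn(50, 1);
X = repmat(X0, 20, 1);
y = repmat(y0, 20, 1);

MdlDup   = fitrsvm(X, y);                          % duplicates kept
MdlDedup = fitrsvm(X, y, 'RemoveDuplicates', true); % duplicates merged for training

% Iterations the solver needed in each case.
iters = [MdlDup.NumIterations, MdlDedup.NumIterations];
```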

For the mathematical formulation of linear and nonlinear SVM regression problems and the solver algorithms, see Understanding Support Vector Machine Regression.

`NaN`, `<undefined>`, empty character vector (`''`), empty string (`""`), and `<missing>` values indicate missing data values. `fitrsvm` removes entire rows of data corresponding to a missing response. When normalizing weights, `fitrsvm` ignores any weight corresponding to an observation with at least one missing predictor. Consequently, observation box constraints might not equal `BoxConstraint`.

`fitrsvm` removes observations that have zero weight.

If you set `'Standardize',true` and `'Weights'`, then `fitrsvm` standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, `fitrsvm` standardizes predictor *j* ($x_j$) using

$${x}_{j}^{\ast}=\frac{{x}_{j}-{\mu}_{j}^{\ast}}{{\sigma}_{j}^{\ast}},$$

where

$${\mu}_{j}^{\ast}=\frac{1}{{\displaystyle \sum _{k}{w}_{k}}}{\displaystyle \sum _{k}{w}_{k}{x}_{jk}},$$

$x_{jk}$ is observation *k* (row) of predictor *j* (column), and

$${\left({\sigma}_{j}^{\ast}\right)}^{2}=\frac{{v}_{1}}{{v}_{1}^{2}-{v}_{2}}{\displaystyle \sum _{k}{w}_{k}{\left({x}_{jk}-{\mu}_{j}^{\ast}\right)}^{2}},$$

with

$${v}_{1}={\displaystyle \sum _{k}{w}_{k}},$$

$${v}_{2}={\displaystyle \sum _{k}{\left({w}_{k}\right)}^{2}}.$$
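The weighted standardization formulas translate directly into a few matrix operations. The sketch below is an illustration of the formulas only, not `fitrsvm`'s internal code. The data and weights are hypothetical. Rows of `X` are observations *k* and columns are predictors *j*.

```matlab
% Weighted standardization written out from the formulas above.
% Illustrative sketch only; X and w are hypothetical.
rng(0);                                   % reproducible example data
n = 20;
X = [randn(n, 1), 5 + 3*randn(n, 1)];     % two predictors on different scales
w = rand(n, 1);                           % nonnegative observation weights

v1 = sum(w);                              % v1 = sum over k of w_k
v2 = sum(w.^2);                           % v2 = sum over k of w_k^2

mu = (w' * X) / v1;                       % weighted mean of each predictor (1-by-p)
dev = bsxfun(@minus, X, mu);              % x_jk - mu_j*
sigma = sqrt(v1 / (v1^2 - v2) * (w' * dev.^2));  % weighted std of each predictor

Xstar = bsxfun(@rdivide, dev, sigma);     % standardized predictors x_j*
```

As a check on the formulas, with all weights equal to 1 you get v1 = v2 = n. The factor v1/(v1^2 - v2) then reduces to 1/(n - 1), so the formulas give the ordinary mean and the usual (n - 1)-normalized sample standard deviation.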

If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.

- The `PredictorNames` property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then `PredictorNames` is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.
- The `ExpandedPredictorNames` property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then `ExpandedPredictorNames` is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.
- Similarly, the `Beta` property stores one beta coefficient for each predictor, including the dummy variables.
- The `SupportVectors` property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are *m* support vectors and three predictors, one of which is a categorical variable with three levels. Then `SupportVectors` is an *m*-by-5 matrix.
- The `X` property stores the training data as originally input. It does not include the dummy variables. When the input is a table, `X` contains only the columns used as predictors.
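The dummy-variable bookkeeping described in the list can be checked by training on a table with one three-level categorical predictor and inspecting the properties. In this sketch the data and the names `x1`, `x2`, `grp`, and `y` are hypothetical. The default linear kernel is used so that `Beta` is populated. The sizes in the comments are the ones the description above predicts for this setup.

```matlab
% Hypothetical table: two numeric predictors and a 3-level categorical one.
n  = 60;
x1 = linspace(-1, 1, n)';
x2 = cos(linspace(0, 3, n))';
grp = categorical(mod((1:n)', 3) + 1);          % levels 1, 2, 3
y  = x1 + 2*x2 + double(grp) + 0.05*sin(1:n)';
Tbl = table(x1, x2, grp, y);

Mdl = fitrsvm(Tbl, 'y');                        % default linear kernel

nOrig = numel(Mdl.PredictorNames);              % original predictors (3)
nExp  = numel(Mdl.ExpandedPredictorNames);      % x1, x2, one dummy per level (5)
nSVcols = size(Mdl.SupportVectors, 2);          % support vectors include dummies (5)
nBeta = numel(Mdl.Beta);                        % one coefficient per expanded predictor (5)
```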

For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.

- For a variable having *k* ordered levels, the software creates *k* – 1 dummy variables. The *j*th dummy variable is -1 for levels up to *j*, and +1 for levels *j* + 1 through *k*.
- The names of the dummy variables stored in the `ExpandedPredictorNames` property indicate the first level with the value +1. The software stores *k* – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., *k*.
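The ordinal encoding rule can be written as a *k*-by-(*k* – 1) matrix whose row *l* holds the dummy values for level *l*. The sketch below builds that matrix and uses it to encode a few observations. It illustrates the rule as stated, not `fitrsvm`'s internal code, and the value of *k* and the level indices are hypothetical. Column *j* switches from -1 to +1 at level *j* + 1, which is why the dummy names refer to levels 2 through *k*.

```matlab
% Ordinal encoding: dummy j is -1 for levels 1..j and +1 for levels j+1..k.
k = 4;                                      % number of ordered levels (hypothetical)
D = 2*bsxfun(@gt, (1:k)', 1:(k-1)) - 1;     % k-by-(k-1); D(l,j) = +1 when l > j

% Encode observations given by their level indices (1 = lowest level).
levelIdx = [1; 3; 4; 2];
encoded = D(levelIdx, :);                   % one row of k-1 dummy values each
```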

All solvers implement *L*1 soft-margin minimization.

Let `p` be the proportion of outliers that you expect in the training data. If you set `'OutlierFraction',p`, then the software implements *robust learning*. In other words, the software attempts to remove 100`p`% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.
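The following sketch shows robust learning on synthetic linear data in which about 5% of the responses are deliberately contaminated. The data and the choice `'OutlierFraction',0.05` are hypothetical. The sketch stores the linear coefficients of an ordinary fit and a robust fit side by side for comparison. It does not assert which fit is closer to the true slope of 2 and intercept of 1, because that depends on the data.

```matlab
% Hypothetical linear data (slope 2, intercept 1) with gross outliers.
rng(1);
n = 200;
X = linspace(0, 10, n)';
y = 2*X + 1 + 0.5*randn(n, 1);
y(1:10) = y(1:10) + 30;                 % contaminate about 5% of the responses

MdlPlain  = fitrsvm(X, y);                           % no outlier handling
MdlRobust = fitrsvm(X, y, 'OutlierFraction', 0.05);  % expect about 5% outliers

% Linear kernel, so Beta (slope) and Bias (intercept) are available.
slopes     = [MdlPlain.Beta, MdlRobust.Beta];
intercepts = [MdlPlain.Bias, MdlRobust.Bias];
```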


See also: `CompactRegressionSVM` | `predict` | `RegressionPartitionedSVM` | `RegressionSVM`