# lasso

Regularized least-squares regression using lasso or elastic net algorithms

## Syntax

```
B = lasso(X,Y)
[B,FitInfo] = lasso(X,Y)
[B,FitInfo] = lasso(X,Y,Name,Value)
```

## Description

`B = lasso(X,Y)` returns fitted least-squares regression coefficients for a set of regularization parameters `Lambda`.

`[B,FitInfo] = lasso(X,Y)` also returns a structure containing information about the fits.

`[B,FitInfo] = lasso(X,Y,Name,Value)` fits regularized regressions with additional options specified by one or more `Name,Value` pair arguments.

## Input Arguments

`X`

Numeric matrix with `n` rows and `p` columns. Each row represents one observation, and each column represents one predictor (variable).

`Y`

Numeric vector of length `n`, where `n` is the number of rows of `X`. `Y(i)` is the response to row `i` of `X`.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.
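For instance, the following call (a sketch; it assumes `X` and `Y` are defined as described under Input Arguments) combines two name-value pairs to fit an elastic net with ten-fold cross validation:

```
% Elastic net fit (Alpha = 0.5) with 10-fold cross validation.
% Assumes X (n-by-p matrix) and Y (n-by-1 vector) already exist.
[B,FitInfo] = lasso(X,Y,'Alpha',0.5,'CV',10);
```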

`'Alpha'`

Scalar value from `0` to `1` (excluding `0`) representing the weight of lasso (L1) versus ridge (L2) optimization. `Alpha = 1` represents lasso regression, `Alpha` close to `0` approaches ridge regression, and intermediate values represent elastic net optimization. See Definitions.

Default: `1`

`'CV'`

Method `lasso` uses to estimate mean squared error:

- `K`, a positive integer — `lasso` uses `K`-fold cross validation.
- `cvp`, a `cvpartition` object — `lasso` uses the cross-validation method expressed in `cvp`. You cannot use a `'leaveout'` partition with `lasso`.
- `'resubstitution'` — `lasso` uses `X` and `Y` to fit the model and to estimate the mean squared error, without cross validation.

Default: `'resubstitution'`

`'DFmax'`

Maximum number of nonzero coefficients in the model. `lasso` returns results only for `Lambda` values that satisfy this criterion.

Default: `Inf`

`'Lambda'`

Vector of nonnegative `Lambda` values. See Definitions.

- If you do not supply `Lambda`, `lasso` calculates the largest value of `Lambda` that gives a nonnull model. In this case, `LambdaRatio` gives the ratio of the smallest to the largest value of the sequence, and `NumLambda` gives the length of the vector.
- If you supply `Lambda`, `lasso` ignores `LambdaRatio` and `NumLambda`.

Default: Geometric sequence of `NumLambda` values, the largest just sufficient to produce `B = 0`

`'LambdaRatio'`

Positive scalar, the ratio of the smallest to the largest `Lambda` value when you do not set `Lambda`. If you set `LambdaRatio = 0`, `lasso` generates a default sequence of `Lambda` values and replaces the smallest one with `0`.

Default: `1e-4`

`'MCReps'`

Positive integer, the number of Monte Carlo repetitions for cross validation.

- If `CV` is `'resubstitution'` or a `cvpartition` of type `'resubstitution'`, `MCReps` must be `1`.
- If `CV` is a `cvpartition` of type `'holdout'`, `MCReps` must be greater than `1`.

Default: `1`

`'NumLambda'`

Positive integer, the number of `Lambda` values `lasso` uses when you do not set `Lambda`. `lasso` can return fewer than `NumLambda` fits if the residual error of the fits drops below a threshold fraction of the variance of `Y`.

Default: `100`

`'Options'`

Structure that specifies whether to cross validate in parallel, and specifies the random stream or streams. Create the `Options` structure with `statset`. Option fields:

- `UseParallel` — Set to `true` to compute in parallel. Default is `false`.
- `UseSubstreams` — Set to `true` to compute in parallel in a reproducible fashion. To compute reproducibly, set `Streams` to a type allowing substreams: `'mlfg6331_64'` or `'mrg32k3a'`. Default is `false`.
- `Streams` — A `RandStream` object or cell array consisting of one such object. If you do not specify `Streams`, `lasso` uses the default stream.

`'PredictorNames'`

Cell array of strings representing names of the predictor variables, in the order in which they appear in `X`.

Default: `{}`

`'RelTol'`

Convergence threshold for the coordinate descent algorithm (see Friedman, Tibshirani, and Hastie [3]). The algorithm terminates when successive estimates of the coefficient vector differ in the L2 norm by a relative amount less than `RelTol`.

Default: `1e-4`

`'Standardize'`

Boolean value specifying whether `lasso` scales `X` before fitting the models. This choice determines whether the regularization is applied to the coefficients on the standardized scale or on the original scale. The results are always presented on the original data scale. `X` and `Y` are always centered.

Default: `true`

`'Weights'`

Observation weights, a nonnegative vector of length `n`, where `n` is the number of rows of `X`. `lasso` scales `Weights` to sum to `1`.

Default: `1/n * ones(n,1)`
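As a sketch of how the `Options` structure combines with the other name-value pairs (the predictor names are hypothetical, and `UseParallel` takes effect only when Parallel Computing Toolbox is available):

```
% Reproducible parallel cross validation (a sketch; assumes X has
% five predictors and Parallel Computing Toolbox is installed).
opts = statset('UseParallel',true,'UseSubstreams',true, ...
               'Streams',RandStream('mlfg6331_64'));
[B,FitInfo] = lasso(X,Y,'CV',5,'Options',opts, ...
                    'PredictorNames',{'x1','x2','x3','x4','x5'});
```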

## Output Arguments

`B`

Fitted coefficients, a `p`-by-`L` matrix, where `p` is the number of predictors (columns) in `X`, and `L` is the number of `Lambda` values.

`FitInfo`

Structure containing information about the model fits.

| Field in `FitInfo` | Description |
| --- | --- |
| `Intercept` | Intercept term β0 for each linear model, a `1`-by-`L` vector |
| `Lambda` | Lambda parameters in ascending order, a `1`-by-`L` vector |
| `Alpha` | Value of the `Alpha` parameter, a scalar |
| `DF` | Number of nonzero coefficients in `B` for each value of `Lambda`, a `1`-by-`L` vector |
| `MSE` | Mean squared error (MSE), a `1`-by-`L` vector |

If you set the `CV` name-value pair to cross validate, the `FitInfo` structure contains additional fields.

| Field in `FitInfo` | Description |
| --- | --- |
| `SE` | The standard error of MSE for each `Lambda`, as calculated during cross validation, a `1`-by-`L` vector |
| `LambdaMinMSE` | The `Lambda` value with minimum MSE, a scalar |
| `Lambda1SE` | The largest `Lambda` such that MSE is within one standard error of the minimum, a scalar |
| `IndexMinMSE` | The index of `Lambda` with value `LambdaMinMSE`, a scalar |
| `Index1SE` | The index of `Lambda` with value `Lambda1SE`, a scalar |
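A common use of these fields is to extract the sparsest model whose error is within one standard error of the minimum. The following sketch assumes a cross-validated fit has already been computed:

```
% Assumes [B,FitInfo] = lasso(X,Y,'CV',10) has already been run.
idx   = FitInfo.Index1SE;        % index of the one-standard-error Lambda
coef  = B(:,idx);                % coefficients at that Lambda
coef0 = FitInfo.Intercept(idx);  % matching intercept
yhat  = X*coef + coef0;          % predictions on the original scale
```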

## Examples


### Remove Redundant Predictors

Construct a data set with redundant predictors, and identify those predictors using cross-validated `lasso`.

Create a matrix `X` of 100 five-dimensional normal variables and a response vector `Y` from just two components of `X`, with small added noise.

```
X = randn(100,5);
r = [0;2;0;-3;0];           % only two nonzero coefficients
Y = X*r + randn(100,1)*.1;  % small added noise
```

Construct the default lasso fit.

`B = lasso(X,Y);`

Find the coefficient vector for the 25th value in `B`.

`B(:,25)`
```
ans =

         0
    1.6093
         0
   -2.5865
         0
```

`lasso` identifies and removes the redundant predictors.

### Plot a Regularized Fit with Cross Validation

Visually examine the cross-validated error of various levels of regularization.

Load the `acetylene` data and prepare the data with interactions for fitting.

```
load acetylene
Xs = [x1 x2 x3];
X = x2fx(Xs,'interaction');
X(:,1) = [];  % No constant term
```

Construct the lasso fit using ten-fold cross validation. Include the `FitInfo` output so you can plot the result.

`[B,FitInfo] = lasso(X,y,'CV',10);`

Plot the cross-validated fits.

`lassoPlot(B,FitInfo,'PlotType','CV');`

## Definitions

### Lasso

For a given value of λ, a nonnegative parameter, `lasso` solves the problem

$\underset{{\beta }_{0},\beta }{\mathrm{min}}\left(\frac{1}{2N}\sum _{i=1}^{N}{\left({y}_{i}-{\beta }_{0}-{x}_{i}^{T}\beta \right)}^{2}+\lambda \sum _{j=1}^{p}|{\beta }_{j}|\right),$

where

- $N$ is the number of observations.
- $y_i$ is the response at observation $i$.
- $x_i$ is the data, a vector of $p$ values at observation $i$.
- $\lambda$ is a nonnegative regularization parameter corresponding to one value of `Lambda`.
- The parameters $\beta_0$ and $\beta$ are a scalar and a $p$-vector, respectively.

As λ increases, the number of nonzero components of β decreases.

The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
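The objective above can be evaluated directly for any candidate intercept and coefficient vector. The following sketch uses hypothetical variable names (`beta0`, `beta`, `lambda`) for the quantities in the formula:

```
% Evaluate the lasso objective for a candidate beta0 (scalar) and
% beta (p-by-1 vector); X is N-by-p, y is N-by-1, lambda >= 0.
N   = size(X,1);
res = y - beta0 - X*beta;                              % residuals
objective = (1/(2*N))*sum(res.^2) + lambda*sum(abs(beta));
```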

### Elastic Net

For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

$\underset{{\beta }_{0},\beta }{\mathrm{min}}\left(\frac{1}{2N}\sum _{i=1}^{N}{\left({y}_{i}-{\beta }_{0}-{x}_{i}^{T}\beta \right)}^{2}+\lambda {P}_{\alpha }\left(\beta \right)\right),$

where

${P}_{\alpha }\left(\beta \right)=\frac{\left(1-\alpha \right)}{2}{‖\beta ‖}_{2}^{2}+\alpha {‖\beta ‖}_{1}=\sum _{j=1}^{p}\left(\frac{\left(1-\alpha \right)}{2}{\beta }_{j}^{2}+\alpha |{\beta }_{j}|\right).$

Elastic net is the same as lasso when α = 1. As α shrinks toward 0, elastic net approaches `ridge` regression. For other values of α, the penalty term Pα(β) interpolates between the L1 norm of β and the squared L2 norm of β.
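The two forms of Pα(β) given above agree term by term; the following sketch checks this numerically for arbitrary (hypothetical) values of α and β:

```
% Check that the vector-norm and elementwise forms of the elastic
% net penalty agree (alpha and beta are hypothetical values).
alpha = 0.4;
beta  = [1.5; -2; 0; 0.25];
P1 = (1-alpha)/2 * norm(beta,2)^2 + alpha*norm(beta,1);
P2 = sum((1-alpha)/2 * beta.^2 + alpha*abs(beta));
% P1 and P2 are equal up to rounding error.
```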

## References

[1] Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.

[2] Zou, H. and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.

[3] Friedman, J., R. Tibshirani, and T. Hastie. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, Vol. 33, No. 1, 2010. `http://www.jstatsoft.org/v33/i01`

[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition. Springer, New York, 2008.