Add terms to linear regression model

## Syntax

``NewMdl = addTerms(mdl,terms)``

## Description


`NewMdl = addTerms(mdl,terms)` returns a linear regression model fitted using the input data and settings in `mdl` with the terms `terms` added.

## Examples


Create a linear regression model of the `carsmall` data set without any interactions, and then add an interaction term.

Load the `carsmall` data set and create a model of `MPG` as a function of weight (including a squared weight term) and model year.

```
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2')
```
```
mdl = 
Linear regression model:
    MPG ~ 1 + Weight + Year + Weight^2

Estimated Coefficients:
                    Estimate          SE         tStat       pValue  
                   __________    __________    _______    __________

    (Intercept)        54.206        4.7117     11.505    2.6648e-19
    Weight          -0.016404     0.0031249    -5.2493    1.0283e-06
    Year_76            2.0887       0.71491     2.9215     0.0044137
    Year_82            8.1864       0.81531     10.041    2.6364e-16
    Weight^2       1.5573e-06    4.9454e-07      3.149     0.0022303


Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.78
R-squared: 0.885,  Adjusted R-Squared: 0.88
F-statistic vs. constant model: 172, p-value = 5.52e-41
```

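The model includes five terms, `Intercept`, `Weight`, `Year_76`, `Year_82`, and `Weight^2`, where `Year_76` and `Year_82` are indicator variables for the categorical variable `Year`, which has three distinct values (70, 76, and 82). The first category, `70`, is the reference level, so the model has no indicator variable for it.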
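The reference level is the first category of `Year`. As a sketch, assuming the Statistics and Machine Learning Toolbox and the same `carsmall` data, you can list the categories with `categories` and choose a different reference level with `reordercats` before fitting. The names `tbl2` and `mdl82` are illustrative only, so the example does not overwrite `tbl` or `mdl`.

```matlab
% Requires Statistics and Machine Learning Toolbox (fitlm)
load carsmall
tbl2 = table(MPG,Weight);
tbl2.Year = categorical(Model_Year);

% Categories are listed in order; the first one is the reference level
categories(tbl2.Year)

% Make 82 the reference level, so the model uses Year_70 and Year_76 instead
tbl2.Year = reordercats(tbl2.Year,{'82','70','76'});
mdl82 = fitlm(tbl2,'MPG ~ Year + Weight^2');
mdl82.CoefficientNames
```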

Add an interaction term between the `Year` and `Weight` variables to `mdl`.

```
terms = 'Year*Weight';
NewMdl = addTerms(mdl,terms)
```
```
NewMdl = 
Linear regression model:
    MPG ~ 1 + Weight*Year + Weight^2

Estimated Coefficients:
                         Estimate           SE          tStat        pValue  
                        ___________    __________    ________    __________

    (Intercept)              48.045         6.779      7.0874    3.3967e-10
    Weight                -0.012624     0.0041455     -3.0454     0.0030751
    Year_76                  2.7768        3.0538     0.90931        0.3657
    Year_82                  16.416        4.9802      3.2962     0.0014196
    Weight:Year_76      -0.00020693    0.00092403    -0.22394       0.82333
    Weight:Year_82       -0.0032574     0.0018919     -1.7217      0.088673
    Weight^2             1.0121e-06      6.12e-07      1.6538       0.10177


Number of observations: 94, Error degrees of freedom: 87
Root Mean Squared Error: 2.76
R-squared: 0.89,  Adjusted R-Squared: 0.882
F-statistic vs. constant model: 117, p-value = 1.88e-39
```

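Because `Weight` and `Year` are already in `mdl`, the formula `'Year*Weight'` contributes only its interaction term. `NewMdl` therefore includes two additional coefficients, `Weight:Year_76` and `Weight:Year_82`, one for each indicator variable of `Year`.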
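To confirm which coefficients `addTerms` added, you can compare the `CoefficientNames` of the two models. This sketch assumes the Statistics and Machine Learning Toolbox; it rebuilds `mdl` and `NewMdl` so that it runs on its own, and uses the `NumCoefficients` and `DFE` (error degrees of freedom) properties of `LinearModel`.

```matlab
% Requires Statistics and Machine Learning Toolbox (fitlm, addTerms)
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');
NewMdl = addTerms(mdl,'Year*Weight');

% Coefficients that NewMdl has and mdl does not
added = setdiff(NewMdl.CoefficientNames,mdl.CoefficientNames)

% Each new coefficient uses one error degree of freedom
numNew = NewMdl.NumCoefficients - mdl.NumCoefficients
dfLost = mdl.DFE - NewMdl.DFE
```

The difference in error degrees of freedom matches the two displays above (89 versus 87).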

## Input Arguments


`mdl`: Linear regression model, specified as a `LinearModel` object created using `fitlm` or `stepwiselm`.

`terms`: Terms to add to the regression model `mdl`, specified as one of the following:

• Character vector or string scalar formula in Wilkinson Notation representing one or more terms. The variable names in the formula must be valid MATLAB® identifiers.

• Terms matrix `T` of size t-by-p, where t is the number of terms and p is the number of predictor variables in `mdl`. The value of `T(i,j)` is the exponent of variable `j` in term `i`.

For example, suppose `mdl` has three variables `A`, `B`, and `C` in that order. Each row of `T` represents one term:

• `[0 0 0]`: Constant term or intercept

• `[0 1 0]`: `B`; equivalently, `A^0 * B^1 * C^0`

• `[1 0 1]`: `A*C`

• `[2 0 0]`: `A^2`

• `[0 1 2]`: `B*(C^2)`
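The exponent interpretation of a terms matrix can be sketched in base MATLAB (no toolbox needed): each row of `T` produces one column whose values are the product of the variables raised to that row's exponents. The data in `A`, `B`, and `C` are hypothetical, and this loop only illustrates the meaning of `T`; it is not how `addTerms` builds its design matrix, and it applies to continuous variables only (categorical predictors expand into groups of indicator variables).

```matlab
% Hypothetical values for three continuous variables A, B, and C
A = [1; 2; 3];
B = [4; 5; 6];
C = [7; 8; 9];
X = [A B C];

% Terms matrix: one row per term, one column per variable (A, B, C)
T = [0 0 0    % intercept
     0 1 0    % B
     1 0 1    % A*C
     2 0 0    % A^2
     0 1 2];  % B*(C^2)

% Column i of D is the product over j of X(:,j).^T(i,j)
% (X.^T(i,:) relies on implicit expansion across the columns of X)
D = zeros(size(X,1),size(T,1));
for i = 1:size(T,1)
    D(:,i) = prod(X.^T(i,:),2);
end
disp(D)
```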

`addTerms` treats a group of indicator variables for a categorical predictor as a single variable. Therefore, you cannot specify an indicator variable to add to the model. If you specify a categorical predictor to add to the model, `addTerms` adds a group of indicator variables for the predictor in one step. See Modify Linear Regression Model Using step for an example that describes how to create indicator variables manually and treat each one as a separate variable.
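For instance, starting from a model that contains only `Weight`, adding the categorical predictor `Year` brings in both of its indicator variables at once. This is a sketch that assumes the Statistics and Machine Learning Toolbox and the `carsmall` table from the example; `mdl0` and `mdl1` are illustrative names. `Year` must be a variable in the data used to fit `mdl0`, even though the initial formula does not use it.

```matlab
% Requires Statistics and Machine Learning Toolbox (fitlm, addTerms)
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);

mdl0 = fitlm(tbl,'MPG ~ Weight');   % Year is in tbl but not in the model
mdl1 = addTerms(mdl0,'Year');       % adds Year_76 and Year_82 together

mdl1.CoefficientNames
```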

## Output Arguments


`NewMdl`: Linear regression model with additional terms, returned as a `LinearModel` object. `NewMdl` is a newly fitted model that uses the input data and settings in `mdl` with the additional terms specified in `terms`.

To overwrite the input argument `mdl`, assign the newly fitted model to `mdl`:

`mdl = addTerms(mdl,terms);`

## More About

### Wilkinson Notation

Wilkinson notation describes which terms are present in a model. It does not specify the multipliers (coefficients) of those terms.

Wilkinson notation uses these symbols:

• `+` means include the next variable.

• `-` means do not include the next variable.

• `:` defines an interaction, which is a product of terms.

• `*` defines an interaction and all lower-order terms.

• `^` raises the predictor to a power, exactly as in `*` repeated, so `^` includes lower-order terms as well.

• `()` groups terms.

This table shows typical examples of Wilkinson notation.

| Wilkinson Notation | Terms in Standard Notation |
|---|---|
| `1` | Constant (intercept) term |
| `x1^k`, where `k` is a positive integer | `x1`, `x1^2`, ..., `x1^k` |
| `x1 + x2` | `x1`, `x2` |
| `x1*x2` | `x1`, `x2`, `x1*x2` |
| `x1:x2` | `x1*x2` only |
| `-x2` | Do not include `x2` |
| `x1*x2 + x3` | `x1`, `x2`, `x3`, `x1*x2` |
| `x1 + x2 + x3 + x1:x2` | `x1`, `x2`, `x3`, `x1*x2` |
| `x1*x2*x3 - x1:x2:x3` | `x1`, `x2`, `x3`, `x1*x2`, `x1*x3`, `x2*x3` |
| `x1*(x2 + x3)` | `x1`, `x2`, `x3`, `x1*x2`, `x1*x3` |
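Several rows of the table can be checked against each other by fitting equivalent formulas and comparing the resulting coefficient names. This sketch assumes the Statistics and Machine Learning Toolbox and the `carsmall` data from the example; `m1` through `m4` are illustrative names. It shows that `*` expands to the main effects plus the `:` interaction, that `-` removes a term, and that `addTerms` with a `:` formula adds only the interaction.

```matlab
% Requires Statistics and Machine Learning Toolbox (fitlm, addTerms)
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);

% x1*x2 is shorthand for x1 + x2 + x1:x2
m1 = fitlm(tbl,'MPG ~ Weight*Year');
m2 = fitlm(tbl,'MPG ~ Weight + Year + Weight:Year');

% "-" removes a term: x1*x2 - x1:x2 keeps only the main effects
m3 = fitlm(tbl,'MPG ~ Weight*Year - Weight:Year');

% Starting from the main effects, ":" adds only the interaction
m4 = addTerms(m3,'Weight:Year');

sameStarColon = isequal(sort(m1.CoefficientNames),sort(m2.CoefficientNames))
sameAfterAdd  = isequal(sort(m1.CoefficientNames),sort(m4.CoefficientNames))
```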

For more details, see Wilkinson Notation.

## Algorithms

• `addTerms` treats a categorical predictor as follows:

• A model with a categorical predictor that has L levels (categories) includes L – 1 indicator variables. The model uses the first category as a reference level, so it does not include the indicator variable for the reference level. If the data type of the categorical predictor is `categorical`, then you can check the order of categories by using `categories` and reorder the categories by using `reordercats` to customize the reference level. For more details about creating indicator variables, see Automatic Creation of Dummy Variables.

• `addTerms` treats the group of L – 1 indicator variables as a single variable. If you want to treat the indicator variables as distinct predictor variables, create indicator variables manually by using `dummyvar`. Then use the indicator variables, except the one corresponding to the reference level of the categorical variable, when you fit a model. For the categorical predictor `X`, if you specify all columns of `dummyvar(X)` and an intercept term as predictors, then the design matrix becomes rank deficient.

• Interaction terms between a continuous predictor and a categorical predictor with L levels consist of the element-wise product of the L – 1 indicator variables with the continuous predictor.

• Interaction terms between two categorical predictors with L and M levels consist of the (L – 1)*(M – 1) indicator variables to include all possible combinations of the two categorical predictor levels.

• You cannot specify higher-order terms for a categorical predictor because the square of an indicator is equal to itself.
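The indicator-variable rules above can be checked numerically. The sketch below assumes the Statistics and Machine Learning Toolbox (`dummyvar`, `fitlm`, `addTerms`) and the `carsmall` data, whose `Model_Year` categories are 70, 76, and 82. It shows that an intercept plus all `dummyvar` columns is rank deficient, that dropping the reference-level column restores full rank, that manually built indicator variables can be added one at a time, and that a continuous-by-categorical interaction equals the element-wise products of the continuous predictor with the L - 1 indicators. The table variable names `Y76`, `Y82`, `WxY76`, and `WxY82` are hypothetical.

```matlab
% Requires Statistics and Machine Learning Toolbox
load carsmall
Year = categorical(Model_Year);

% One 0/1 column per category of Year (70, 76, 82)
D = dummyvar(Year);
n = size(D,1);

% Each row of D sums to 1, so with an intercept the columns are collinear
Xfull = [ones(n,1) D];
isDeficient = rank(Xfull) < size(Xfull,2)

% Dropping the reference-level column (first category) restores full rank
Xref = [ones(n,1) D(:,2:end)];
isFullRank = rank(Xref) == size(Xref,2)

% Manual indicators and interaction columns (hypothetical variable names)
tbl = table(Weight,D(:,2),D(:,3),Weight.*D(:,2),Weight.*D(:,3),MPG, ...
    'VariableNames',{'Weight','Y76','Y82','WxY76','WxY82','MPG'});

% Manual indicators are separate variables, so they can be added one at a time
m = fitlm(tbl,'MPG ~ Weight + Y76');
m = addTerms(m,'Y82');

% Interaction built from element-wise products versus the categorical form
manual = fitlm(tbl,'MPG ~ Weight + Y76 + Y82 + WxY76 + WxY82');
auto = fitlm(table(MPG,Weight,Year),'MPG ~ Weight*Year');

% Both fits list coefficients in the same order, so compare them directly
maxDiff = max(abs(manual.Coefficients.Estimate - auto.Coefficients.Estimate))
```

The two interaction fits should agree up to floating-point error, because they use the same design matrix columns in the same order.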

## Version History

Introduced in R2012a