GeneralizedLinearMixedModel Class

Generalized linear mixed-effects model class

Description

A GeneralizedLinearMixedModel object represents a regression model of a response variable that contains both fixed and random effects. The object comprises data, a model description, fitted coefficients, covariance parameters, design matrices, residuals, residual plots, and other diagnostic information for a generalized linear mixed-effects (GLME) model. You can predict model responses with the predict function and generate random data at new design points using the random function.

Construction

You can fit a generalized linear mixed-effects (GLME) model to sample data using fitglme(tbl,formula). For more information, see fitglme.

Input Arguments

`tbl` — Input data
table | dataset array

Input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables (see Grouping Variables). You must specify the model for the variables using formula.

Data Types: table

`formula` — Formula for model specification
character vector or string scalar of the form `'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'`

Formula for model specification, specified as a character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'. For a full description, see Formula.

Example: 'y ~ treatment +(1|block)'

Properties

`Coefficients` — Estimates of fixed-effects coefficients
dataset array

Estimates of fixed-effects coefficients and related statistics, stored as a dataset array that has one row for each coefficient and the following columns:

Name — Name of the coefficient
Estimate — Estimated coefficient value
SE — Standard error of the estimate
tStat — t-statistic for a test that the coefficient is equal to 0
DF — Degrees of freedom associated with the t statistic
pValue — p-value for the t-statistic
Lower — Lower confidence limit
Upper — Upper confidence limit

To obtain any of these columns as a vector, index into the property using dot notation.

Use the coefTest method to perform other tests on the coefficients.
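As a minimal sketch of the dot-notation indexing and the coefTest call described above, the following refits the Poisson model from the Examples section of this page (it assumes the mfr sample data set that ships with Statistics and Machine Learning Toolbox) and extracts two columns as vectors. The variable names est, pvals, and pJoint are illustrative only.

```matlab
% Fit the Poisson GLME model used in the Examples section of this page.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

% Pull single columns out of the Coefficients property with dot notation.
est   = glme.Coefficients.Estimate;   % estimated coefficient values
pvals = glme.Coefficients.pValue;     % p-values for the t-statistics

% Joint test that all fixed-effects coefficients except the intercept are 0.
pJoint = coefTest(glme);
```

Called with only the model as input, coefTest tests all fixed-effects coefficients other than the intercept jointly; pass a contrast matrix as a second argument to test other hypotheses.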

`CoefficientCovariance` — Covariance of estimated fixed-effects coefficient
matrix

Covariance of the estimated fixed-effects coefficients, stored as a matrix.

Data Types: single | double

`CoefficientNames` — Names of fixed-effects coefficients
cell array of character vectors

Names of fixed-effects coefficients, stored as a cell array of character vectors. The label for the coefficient of the constant term is (Intercept). The labels for other coefficients indicate the terms that they multiply. When the term includes a categorical predictor, the label also indicates the level of that predictor.

Data Types: cell

`DFE` — Degrees of freedom for error
positive integer value

Degrees of freedom for error, stored as a positive integer value. DFE is the number of observations minus the number of estimated coefficients.

DFE contains the degrees of freedom corresponding to the 'Residual' method of calculating denominator degrees of freedom for hypothesis tests on fixed-effects coefficients. If n is the number of observations and p is the number of fixed-effects coefficients, then DFE is equal to n – p.

Data Types: double
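The relationship DFE = n – p can be checked on the model from the Examples section of this page, where n = 100 and p = 6. This is only a sketch; it assumes the mfr sample data set is available, and the names n, p, and dfe are illustrative.

```matlab
% Fit the Poisson GLME model used in the Examples section of this page.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

n   = glme.NumObservations;            % observations used in the fit
p   = glme.NumEstimatedCoefficients;   % estimated fixed-effects coefficients
dfe = glme.DFE;                        % residual degrees of freedom, n - p
```

The result matches the DF column of the fixed-effects table in the example display, which uses the 'Residual' method.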

`Dispersion` — Model dispersion parameter
scalar value

Model dispersion parameter, stored as a scalar value. The dispersion parameter defines the conditional variance of the response.

For observation i, the conditional variance of the response y_i, given the conditional mean μ_i and the dispersion parameter σ², in a generalized linear mixed-effects model is

$\operatorname{var}(y_{i} \mid μ_{i}, σ^{2}) = \frac{σ^{2}}{w_{i}} v(μ_{i}),$

where w_i is the ith observation weight and v is the variance function for the specified conditional distribution of the response. The Dispersion property contains an estimate of σ² for the specified GLME model. The value of Dispersion depends on the specified conditional distribution of the response. For binomial and Poisson distributions, the theoretical value of the dispersion parameter is σ² = 1.0.

If FitMethod is MPL or REMPL and the 'DispersionFlag' name-value pair argument in fitglme is true, then a dispersion parameter is estimated from data for all distributions, including binomial and Poisson distributions.
If FitMethod is ApproximateLaplace or Laplace, then the 'DispersionFlag' name-value pair argument in fitglme does not apply, and the dispersion parameter is fixed at 1.0 for binomial and Poisson distributions. For all other distributions, Dispersion is estimated from data.

Data Types: double

`DispersionEstimated` — Flag indicating if dispersion parameter was estimated
`true` | `false`

Flag indicating whether the dispersion parameter was estimated, stored as a logical value.

If FitMethod is ApproximateLaplace or Laplace, then the dispersion parameter is fixed at its theoretical value of 1.0 for binomial and Poisson distributions, and DispersionEstimated is false. For other distributions, the dispersion parameter is estimated from the data, and DispersionEstimated is true.
If FitMethod is MPL or REMPL, and the 'DispersionFlag' name-value pair argument in fitglme is specified as true, then the dispersion parameter is estimated for all distributions, including binomial and Poisson distributions, and DispersionEstimated is true.
If FitMethod is MPL or REMPL, and the 'DispersionFlag' name-value pair argument in fitglme is specified as false, then the dispersion parameter is fixed at its theoretical value for binomial and Poisson distributions, and DispersionEstimated is false. For distributions other than binomial and Poisson, the dispersion parameter is estimated from the data, and DispersionEstimated is true.

Data Types: logical
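The rules above for Dispersion and DispersionEstimated can be illustrated by fitting the same Poisson model twice. This is a sketch under the assumption that the mfr sample data set is available; glmeLap and glmeMPL are illustrative names.

```matlab
load mfr
formula = 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)';

% Laplace fit: for a Poisson response the dispersion is fixed at 1.0.
glmeLap = fitglme(mfr,formula,'Distribution','Poisson','FitMethod','Laplace');

% MPL fit with DispersionFlag set to true: the dispersion is estimated from data.
glmeMPL = fitglme(mfr,formula,'Distribution','Poisson', ...
    'FitMethod','MPL','DispersionFlag',true);

flags = [glmeLap.DispersionEstimated, glmeMPL.DispersionEstimated];
disps = [glmeLap.Dispersion, glmeMPL.Dispersion];
```

Comparing the second element of disps with 1.0 gives a rough indication of over- or underdispersion relative to the Poisson assumption.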

`Distribution` — Response distribution name
`'Normal'` | `'Binomial'` | `'Poisson'` | `'Gamma'` | `'InverseGaussian'`

Response distribution name, stored as one of the following:

'Normal' — Normal distribution
'Binomial' — Binomial distribution
'Poisson' — Poisson distribution
'Gamma' — Gamma distribution
'InverseGaussian' — Inverse Gaussian distribution

`FitMethod` — Method used to fit the model
`'MPL'` | `'REMPL'` | `'ApproximateLaplace'` | `'Laplace'`

Method used to fit the model, stored as one of the following.

'MPL' — Maximum pseudo likelihood
'REMPL' — Restricted maximum pseudo likelihood
'ApproximateLaplace' — Maximum likelihood using the approximate Laplace method, with fixed effects profiled out
'Laplace' — Maximum likelihood using the Laplace method

`Formula` — Model specification formula
object

Model specification formula, stored as an object. The model specification formula uses Wilkinson’s notation to describe the relationship between the fixed-effects terms, random-effects terms, and grouping variables in the GLME model. For more information, see Formula.

`Link` — Link function characteristics
structure

Link function characteristics, stored as a structure containing the following fields. The link is a function G that links the distribution parameter MU to the linear predictor ETA as follows: G(MU) = ETA.

Field	Description
`Name`	Name of the link function
`Link`	Function that defines `G`
`Derivative`	Derivative of `G`
`SecondDerivative`	Second derivative of `G`
`Inverse`	Inverse of `G`

Data Types: struct
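The fields of Link hold function handles, so G and its inverse can be evaluated directly. The sketch below assumes the log-link Poisson model from the Examples section of this page and the mfr sample data set; eta and muBack are illustrative names.

```matlab
% Fit the Poisson GLME model (log link) used in the Examples section.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

mu     = 2.5;
eta    = glme.Link.Link(mu);       % G(MU): map a mean to the linear predictor
muBack = glme.Link.Inverse(eta);   % inverse of G: map ETA back to the mean
```

For the log link, eta equals log(mu), and applying Inverse recovers the original mean.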

`LogLikelihood` — Log of likelihood function
scalar value

Log of likelihood function evaluated at the estimated coefficient values, stored as a scalar value. LogLikelihood depends on the method used to fit the model.

If you use 'Laplace' or 'ApproximateLaplace', then LogLikelihood is the maximized log likelihood.
If you use 'MPL', then LogLikelihood is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration.
If you use 'REMPL', then LogLikelihood is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.

Data Types: double

`ModelCriterion` — Model criterion
table

Model criterion to compare fitted generalized linear mixed-effects models, stored as a table with the following fields.

Field	Description
`AIC`	Akaike information criterion
`BIC`	Bayesian information criterion
`LogLikelihood`	For a model fit using `'Laplace'` or `'ApproximateLaplace'`, `LogLikelihood` is the maximized log likelihood. For a model fit using `'MPL'`, `LogLikelihood` is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration. For a model fit using `'REMPL'`, `LogLikelihood` is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.
`Deviance`	–2 times `LogLikelihood`

`NumCoefficients` — Number of fixed-effects coefficients
positive integer value

Number of fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

`NumEstimatedCoefficients` — Number of estimated fixed-effects coefficients
positive integer value

Number of estimated fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

`NumObservations` — Number of observations
positive integer value

Number of observations used in the fit, stored as a positive integer value. NumObservations is the number of rows in the table or dataset array tbl, minus rows excluded using the 'Exclude' name-value pair of fitglme or rows containing NaN values.

Data Types: double

`NumPredictors` — Number of predictors
positive integer value

Number of variables used as predictors in the generalized linear mixed-effects model, stored as a positive integer value.

Data Types: double

`NumVariables` — Total number of variables
positive integer value

Total number of variables, including the response and predictors, stored as a positive integer value. If the sample data is in a table or dataset array tbl, then NumVariables is the total number of variables in tbl, including the response variable. NumVariables includes variables, if any, that are not used as predictors or as the response.

Data Types: double

`ObservationInfo` — Information about the observations
table

Information about the observations used in the fit, stored as a table.

ObservationInfo has one row for each observation and the following columns.

Name	Description
`Weights`	The weight value for the observation. The default value is 1.
`Excluded`	If the observation was excluded from the fit using the `'Exclude'` name-value pair argument in `fitglme`, then `Excluded` is `true`, or `1`. Otherwise, `Excluded` is `false`, or `0`.
`Missing`	If the observation was excluded from the fit because any response or predictor value is missing, then `Missing` is `true`. Otherwise, `Missing` is `false`. Missing values include `NaN` for numeric variables, empty cells for cell arrays, blank rows for character arrays, and the `<undefined>` value for categorical arrays.
`Subset`	If the observation was used in the fit, then `Subset` is `true`. If the observation was not used in the fit because it is missing or excluded, then `Subset` is `false`.
`BinomSize`	Binomial size for each observation. This column only applies when fitting a binomial distribution.

Data Types: table
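To see how the Excluded and Subset columns relate to NumObservations, the following sketch excludes the first three rows of the mfr sample data (an arbitrary choice made only for illustration) and counts the flags. The names glmeEx, info, nExcluded, and nUsed are illustrative.

```matlab
load mfr
formula = 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)';

% Exclude rows 1 to 3 from the fit.
glmeEx = fitglme(mfr,formula,'Distribution','Poisson','Exclude',1:3);

info      = glmeEx.ObservationInfo;
nExcluded = sum(info.Excluded);   % rows removed through 'Exclude'
nUsed     = sum(info.Subset);     % rows actually used in the fit
```

Here nUsed should agree with glmeEx.NumObservations, because Subset is true only for rows that are neither excluded nor missing.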

`ObservationNames` — Names of observations
cell array of character vectors

Names of observations used in the fit, stored as a cell array of character vectors.

If the data is in a table or dataset array tbl that contains observation names, then ObservationNames uses those names.
If the data is provided in matrices, or in a table or dataset array without observation names, then ObservationNames is an empty cell array.

Data Types: cell

`PredictorNames` — Names of predictors
cell array of character vectors

Names of the variables used as predictors in the fit, stored as a cell array of character vectors with NumPredictors elements.

Data Types: cell

`ResponseName` — Name of response variable
character vector

Name of the variable used as the response variable in the fit, stored as a character vector.

Data Types: char

`Rsquared` — Proportion of variability in the response explained by the fitted model
structure

Proportion of variability in the response explained by the fitted model, stored as a structure. Rsquared contains the R-squared value of the fitted model, also known as the coefficient of determination. Rsquared contains the following fields.

Field	Description
`Ordinary`	R-squared value, stored as a scalar value in a structure. `Rsquared.Ordinary = 1 - SSE./SST`
`Adjusted`	R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure. `Rsquared.Adjusted = 1 - (SSE./SST)*(DFT./DFE)`, where `DFE = n - p`, `DFT = n - 1`, `n` is the total number of observations, and `p` is the number of fixed-effects coefficients.

Data Types: struct
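The two formulas in the table can be evaluated from the SSE, SST, and DFE properties. This sketch assumes the model from the Examples section of this page and the mfr sample data set; r2 and r2adj are illustrative names.

```matlab
% Fit the Poisson GLME model used in the Examples section of this page.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

n     = glme.NumObservations;
r2    = 1 - glme.SSE/glme.SST;                        % ordinary R-squared
r2adj = 1 - (glme.SSE/glme.SST)*((n - 1)/glme.DFE);   % adjusted R-squared
```

If the formulas above hold as stated, r2 and r2adj agree with glme.Rsquared.Ordinary and glme.Rsquared.Adjusted up to rounding.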

`SSE` — Sum of squared errors
positive scalar

Sum of squared errors, stored as a positive scalar. SSE is the weighted sum of the squared conditional residuals, and is calculated as

$\mathrm{SSE} = \sum_{i = 1}^{N} w_{i}^{\mathrm{eff}} {(y_{i} - f_{i})}^{2},$

where N is the number of observations, w_i^eff is the ith effective weight, y_i is the ith response, and f_i is the ith fitted value.

The ith effective weight is calculated as

$w_{i}^{\mathrm{eff}} = \frac{w_{i}}{v_{i}(f_{i}(\hat{β}, \hat{b}))},$

where w_i is the ith observation weight, v_i is the variance term for the ith observation, and $\hat{β}$ and $\hat{b}$ are estimated values of β and b, respectively.

The ith fitted value is calculated as

$f_{i} = g^{- 1} (x_{i}^{T} \hat{β} + z_{i}^{T} \hat{b} + δ_{i}),$

where g is the link function, x_i^T is the ith row of the fixed-effects design matrix X, z_i^T is the ith row of the random-effects design matrix Z, and δ_i is the ith offset value.

Data Types: double

`SSR` — Regression sum of squares
positive scalar

Regression sum of squares, stored as a positive scalar. SSR is the sum of squares explained by the generalized linear mixed-effects regression, and is equal to the weighted sum of the squared deviations between the fitted values and the mean of the response. SSR is calculated as

$\mathrm{SSR} = \sum_{i = 1}^{N} w_{i}^{\mathrm{eff}} {(f_{i} - \bar{y})}^{2},$

where N is the number of observations, w_i^eff is the ith effective weight, f_i is the ith fitted value, and $\bar{y}$ is the weighted average of the response.

The ith effective weight is calculated as

$w_{i}^{\mathrm{eff}} = \frac{w_{i}}{v_{i}(f_{i}(\hat{β}, \hat{b}))},$

where w_i is the ith observation weight, v_i is the variance term for the ith observation, and $\hat{β}$ and $\hat{b}$ are estimated values of β and b, respectively.

The ith fitted value is calculated as

$f_{i} = g^{- 1} (x_{i}^{T} \hat{β} + z_{i}^{T} \hat{b} + δ_{i}),$

where g is the link function, x_i^T is the ith row of the fixed-effects design matrix X, z_i^T is the ith row of the random-effects design matrix Z, and δ_i is the ith offset value.

Data Types: double

`SST` — Total sum of squares
positive scalar

Total sum of squares, stored as a positive scalar.

For a GLME model with an intercept, SST is calculated as

SST = SSE + SSR,

where SST is the total sum of squares, SSE is the error sum of squares, and SSR is the regression sum of squares.

For a GLME model without an intercept, SST is calculated as

$\mathrm{SST} = \sum_{i = 1}^{N} w_{i}^{\mathrm{eff}} {(y_{i} - \bar{y})}^{2},$

where N is the number of observations, w_i^eff is the ith effective weight, y_i is the ith response value, and $\bar{y}$ is the weighted average of the response.

Data Types: double
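For a model with an intercept, the decomposition SST = SSE + SSR can be checked numerically. The sketch below uses the model from the Examples section of this page (which includes an intercept) and assumes the mfr sample data set; gap is an illustrative name.

```matlab
% Fit the Poisson GLME model used in the Examples section of this page.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

% Difference between the stored total and the sum of its two parts.
gap = glme.SST - (glme.SSE + glme.SSR);
```

A value of gap near zero, relative to the size of SST, is consistent with the intercept-model definition given above.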

`VariableInfo` — Information about the variables
table

Information about the variables used in the fit, stored as a table. VariableInfo has one row for each variable and contains the following columns.

Column Name	Description
`Class`	Class of the variable (`'double'`, `'cell'`, `'nominal'`, and so on).
`Range`	Value range of the variable. For a numerical variable, `Range` is a two-element vector of the form `[min,max]`. For a cell or categorical variable, `Range` is a cell or categorical array containing all unique values of the variable.
`InModel`	If the variable is a predictor in the fitted model, `InModel` is `true`. If the variable is not in the fitted model, `InModel` is `false`.
`IsCategorical`	If the variable type is treated as a categorical predictor (such as cell, logical, or categorical), then `IsCategorical` is `true`. If the variable is a continuous predictor, then `IsCategorical` is `false`.

Data Types: table
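As a brief sketch of how VariableInfo can be combined with VariableNames, the following lists the variables flagged by InModel for the model from the Examples section of this page. It assumes the mfr sample data set; vi and usedVars are illustrative names.

```matlab
% Fit the Poisson GLME model used in the Examples section of this page.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

vi       = glme.VariableInfo;               % one row per variable in the data
usedVars = glme.VariableNames(vi.InModel);  % names of variables flagged InModel
nRows    = size(vi,1);                      % should equal glme.NumVariables
```

Because VariableInfo has one row per variable, logical columns such as InModel and IsCategorical can index directly into VariableNames.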

`VariableNames` — Names of the variables
cell array of character vectors

Names of all the variables contained in the table or dataset array tbl, stored as a cell array of character vectors.

Data Types: cell

`Variables` — Variables
table

Variables, stored as a table. If the fit is based on a table or dataset array tbl, then Variables is identical to tbl.

Data Types: table

Object Functions

`anova`	Analysis of variance for generalized linear mixed-effects model
`coefCI`	Confidence intervals for coefficients of generalized linear mixed-effects model
`coefTest`	Hypothesis test on fixed and random effects of generalized linear mixed-effects model
`compare`	Compare generalized linear mixed-effects models
`covarianceParameters`	Extract covariance parameters of generalized linear mixed-effects model
`designMatrix`	Fixed- and random-effects design matrices
`fitted`	Fitted responses from generalized linear mixed-effects model
`fixedEffects`	Estimates of fixed effects and related statistics
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`plotResiduals`	Plot residuals of generalized linear mixed-effects model
`predict`	Predict response of generalized linear mixed-effects model
`random`	Generate random responses from fitted generalized linear mixed-effects model
`randomEffects`	Estimates of random effects and related statistics
`refit`	Refit generalized linear mixed-effects model
`residuals`	Residuals of fitted generalized linear mixed-effects model
`response`	Response vector of generalized linear mixed-effects model

Examples

Fit a Generalized Linear Mixed-Effects Model

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

Flag to indicate whether the batch used the new process (newprocess)
Processing time for each batch, in hours (time)
Temperature of the batch, in degrees Celsius (temp)
Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier)
Number of defects in the batch (defects)

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution:

$\text{defects}_{ij} \sim \text{Poisson}(μ_{ij})$

This corresponds to the generalized linear mixed-effects model

$\log(μ_{ij}) = β_{0} + β_{1}\,\text{newprocess}_{ij} + β_{2}\,\text{time\_dev}_{ij} + β_{3}\,\text{temp\_dev}_{ij} + β_{4}\,\text{supplier\_C}_{ij} + β_{5}\,\text{supplier\_B}_{ij} + b_{i},$

where

$\text{defects}_{ij}$ is the number of defects observed in batch $j$ produced by factory $i$.
$μ_{ij}$ is the mean number of defects corresponding to factory $i$ (where $i = 1, 2, \ldots, 20$) during batch $j$ (where $j = 1, 2, \ldots, 5$).
$\text{newprocess}_{ij}$, $\text{time\_dev}_{ij}$, and $\text{temp\_dev}_{ij}$ are the measurements for each variable that correspond to factory $i$ during batch $j$. For example, $\text{newprocess}_{ij}$ indicates whether batch $j$ produced by factory $i$ used the new process.
$\text{supplier\_C}_{ij}$ and $\text{supplier\_B}_{ij}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether supplier C or B, respectively, provided the process chemicals for batch $j$ produced by factory $i$.
$b_{i} \sim N(0, σ_{b}^{2})$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Display the model.

disp(glme)

Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations             100
    Fixed effects coefficients           6
    Random effects coefficients         20
    Covariance parameters                1
    Distribution                    Poisson
    Link                            Log   
    FitMethod                       Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35  

Fixed effects coefficients (95% CIs):
    Name                   Estimate     SE          tStat       DF    pValue        Lower        Upper    
    {'(Intercept)'}           1.4689     0.15988      9.1875    94    9.8194e-15       1.1515       1.7864
    {'newprocess' }         -0.36766     0.17755     -2.0708    94      0.041122     -0.72019    -0.015134
    {'time_dev'   }        -0.094521     0.82849    -0.11409    94       0.90941      -1.7395       1.5505
    {'temp_dev'   }         -0.28317      0.9617    -0.29444    94       0.76907      -2.1926       1.6263
    {'supplier_C' }        -0.071868    0.078024     -0.9211    94       0.35936     -0.22679     0.083051
    {'supplier_B' }         0.071072     0.07739     0.91836    94       0.36078    -0.082588      0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1                  Name2                  Type           Estimate
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.31381 

Group: Error
    Name                        Estimate
    {'sqrt(Dispersion)'}        1

The Model information table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson distribution, the link function is Log, and the fit method is Laplace.

Formula indicates the model specification using Wilkinson’s notation.

The Model fit statistics table displays statistics used to assess the goodness of fit of the model: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the log likelihood (LogLikelihood), and the deviance (Deviance).
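The displayed fit statistics can be cross-checked by hand. The sketch below copies the rounded values from the display and assumes that the parameter count k is the number of fixed-effects coefficients plus the number of covariance parameters (6 + 1); that count is inferred from the displayed numbers, not stated by the display itself.

```matlab
% Rounded values copied from the "Model fit statistics" display.
logL = -201.17;
dev  = 402.35;
aic  = 416.35;
bic  = 434.58;

n = 100;     % number of observations
k = 6 + 1;   % assumed: fixed-effects coefficients + covariance parameters

devCheck = -2*logL;          % deviance is -2 times the log likelihood
aicCheck = dev + 2*k;        % AIC as deviance plus 2 per parameter
bicCheck = dev + k*log(n);   % BIC as deviance plus log(n) per parameter
```

Each recomputed value agrees with the display to within the rounding of the printed digits, which is consistent with the assumed parameter count.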

The Fixed effects coefficients table indicates that fitglme returned 95% confidence intervals. It contains one row for each fixed-effects coefficient, and each column contains statistics corresponding to that coefficient. Column 1 (Name) contains the name of each fixed-effects coefficient, column 2 (Estimate) contains its estimated value, and column 3 (SE) contains the standard error of the coefficient. Column 4 (tStat) contains the $t$-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF) and column 6 (pValue) contain the degrees of freedom and $p$-value that correspond to the $t$-statistic, respectively. The last two columns (Lower and Upper) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.

Random effects covariance parameters displays a table for each grouping variable (here, only factory), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std indicates that fitglme returns the standard deviation of the random effect associated with the factory grouping variable, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.

The standard display generated by fitglme does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters.
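A minimal sketch of that call, assuming the mfr sample data set and the model fitted above, is shown here. The output names psi, dispersion, and stats are illustrative.

```matlab
% Fit the Poisson GLME model used in this example.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

% psi holds one covariance matrix per grouping variable, and stats holds
% the covariance parameter estimates with their confidence limits.
[psi,dispersion,stats] = covarianceParameters(glme);

factoryStats = stats{1};      % statistics for the factory grouping variable
factoryStd   = sqrt(psi{1});  % standard deviation of the random intercept
```

With a single random intercept, psi{1} is a scalar variance, so its square root corresponds to the std estimate shown in the display.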

Fit Generalized Linear Mixed-Effects Model to Binary Data

Load the carbig sample data set.

load carbig

The variables Acceleration, Model_Year, and Cylinders contain data for car acceleration, year of manufacture, and number of engine cylinders, respectively. The data was collected from cars built between 1970 and 1982.

Create a variable named CylinderCats that indicates whether a car has more than four cylinders. Use the table function to create a table from the data in Acceleration, Model_Year, and CylinderCats.

CylinderCats = Cylinders>4;
tbl = table(Acceleration,Model_Year,CylinderCats);

Fit a generalized linear mixed-effects model to the data, using CylinderCats as the response variable, Acceleration as a fixed-effects predictor, and a random intercept and random Acceleration slope grouped by Model_Year. Specify the response data distribution as binomial.

glme = fitglme(tbl,"CylinderCats~Acceleration+(Acceleration|Model_Year)",Distribution="Binomial");

glme is a GeneralizedLinearMixedModel object that contains information about the fitted model.

Inspect the statistics for the fixed effect Acceleration by using the fixedEffects object function with the default 95% confidence level.

[~,~,statsFixed] = fixedEffects(glme)

statsFixed = 
    Fixed effect coefficients: DFMethod = 'residual', Alpha = 0.05

    Name                    Estimate    SE          tStat      DF     pValue        Lower       Upper  
    {'(Intercept)' }          4.3838      1.2374     3.5428    404    0.00044213      1.9513     6.8163
    {'Acceleration'}        -0.29673    0.077896    -3.8093    404    0.00016104    -0.44986    -0.1436

The small p-value for the Acceleration term indicates that car acceleration has a statistically significant effect on whether a car has more than four cylinders.
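To make the sign and size of the fixed-effects estimates more concrete, the following sketch converts them to probabilities. It copies the rounded estimates from the table above, sets the random effects to zero, and assumes the logit link (the canonical link for a binomial response, used here because the fit did not specify another). The acceleration values 10, 15, and 20 are arbitrary illustration points.

```matlab
% Rounded fixed-effects estimates copied from the statsFixed display.
b0 = 4.3838;     % (Intercept)
b1 = -0.29673;   % Acceleration

% Inverse of the logit link: maps the linear predictor to a probability.
invLogit = @(eta) 1./(1 + exp(-eta));

acc = [10 15 20];              % acceleration values chosen for illustration
p   = invLogit(b0 + b1*acc);   % probability of more than four cylinders
```

The probabilities decrease as acceleration increases, which matches the negative Acceleration coefficient; at an acceleration value of 15 the probability is roughly 0.48.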

Inspect the statistics for the random effect Model_Year by using the randomEffects object function with the default 95% confidence level.

[~,~,statsRandom] = randomEffects(glme)

statsRandom = 
    Random effect coefficients: DFMethod = 'residual', Alpha = 0.05

    Group                 Level         Name                    Estimate    SEPred     tStat       DF     pValue      Lower        Upper   
    {'Model_Year'}        {'70'}        {'(Intercept)' }           3.041     2.1322      1.4262    404     0.15457      -1.1506      7.2326
    {'Model_Year'}        {'70'}        {'Acceleration'}        -0.16836    0.13906     -1.2107    404     0.22672     -0.44173     0.10501
    {'Model_Year'}        {'71'}        {'(Intercept)' }          3.4715     2.3452      1.4802    404     0.13959      -1.1389      8.0818
    {'Model_Year'}        {'71'}        {'Acceleration'}        -0.21721    0.15106     -1.4378    404     0.15125     -0.51418    0.079764
    {'Model_Year'}        {'72'}        {'(Intercept)' }          4.2634     2.4382      1.7486    404    0.081124     -0.52977      9.0566
    {'Model_Year'}        {'72'}        {'Acceleration'}        -0.28827    0.15892     -1.8139    404    0.070435      -0.6007    0.024149
    {'Model_Year'}        {'73'}        {'(Intercept)' }          3.7951     2.1976      1.7269    404    0.084949     -0.52512      8.1153
    {'Model_Year'}        {'73'}        {'Acceleration'}        -0.21079    0.14182     -1.4864    404     0.13796     -0.48958    0.067996
    {'Model_Year'}        {'74'}        {'(Intercept)' }        -0.77693     2.6678    -0.29123    404     0.77103      -6.0214      4.4675
    {'Model_Year'}        {'74'}        {'Acceleration'}        0.056863    0.16571     0.34314    404     0.73167      -0.2689     0.38263
    {'Model_Year'}        {'75'}        {'(Intercept)' }         -3.2681     2.1531     -1.5178    404     0.12984      -7.5008     0.96463
    {'Model_Year'}        {'75'}        {'Acceleration'}         0.24151    0.13346      1.8096    404    0.071093    -0.020847     0.50387
    {'Model_Year'}        {'76'}        {'(Intercept)' }        -0.28228     2.0922    -0.13492    404     0.89274      -4.3952      3.8306
    {'Model_Year'}        {'76'}        {'Acceleration'}        0.045966    0.13069     0.35171    404     0.72524     -0.21096     0.30289
    {'Model_Year'}        {'77'}        {'(Intercept)' }        -0.78239     2.2806    -0.34305    404     0.73174      -5.2658       3.701
    {'Model_Year'}        {'77'}        {'Acceleration'}        0.052519    0.14498     0.36226    404     0.71735     -0.23249     0.33752
    {'Model_Year'}        {'78'}        {'(Intercept)' }        -0.46307     2.2693    -0.20406    404     0.83841      -4.9242      3.9981
    {'Model_Year'}        {'78'}        {'Acceleration'}        0.050014    0.14243     0.35114    404     0.72567     -0.22999     0.33002
    {'Model_Year'}        {'79'}        {'(Intercept)' }         -2.5181     2.0134     -1.2507    404     0.21178      -6.4762        1.44
    {'Model_Year'}        {'79'}        {'Acceleration'}         0.19051     0.1257      1.5156    404      0.1304    -0.056591     0.43761
    {'Model_Year'}        {'80'}        {'(Intercept)' }         -2.6168     2.4053     -1.0879    404     0.27728      -7.3452      2.1117
    {'Model_Year'}        {'80'}        {'Acceleration'}         0.10117    0.14903     0.67883    404     0.49763     -0.19181     0.39414
    {'Model_Year'}        {'81'}        {'(Intercept)' }         -1.8396     2.4268    -0.75801    404     0.44888      -6.6103      2.9312
    {'Model_Year'}        {'81'}        {'Acceleration'}         0.08723    0.15145     0.57596    404     0.56497      -0.2105     0.38496
    {'Model_Year'}        {'82'}        {'(Intercept)' }         -2.0238     2.5531    -0.79267    404     0.42843      -7.0428      2.9953
    {'Model_Year'}        {'82'}        {'Acceleration'}        0.058853    0.15948     0.36903    404      0.7123     -0.25467     0.37237

The large p-values in the table output indicate that not enough evidence exists to conclude that any of the random effect terms have a statistically significant effect on whether a car has more than four cylinders.

More About

Formula

In general, a formula for model specification is a character vector or string scalar of the form 'y ~ terms'. For generalized linear mixed-effects models, this formula is in the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms, respectively, and R is the number of grouping variables in the model.

Suppose a table tbl contains the following:

A response variable, y
Predictor variables, X_j, which can be continuous or grouping variables
Grouping variables, g₁, g₂, ..., g_R,

where the grouping variables in X_j and g_r can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.

Then, in a formula of the form 'y ~ fixed + (random₁|g₁) + ... + (random_R|g_R)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random₁ is a specification of the random-effects design matrix Z₁ corresponding to grouping variable g₁, and similarly random_R is a specification of the random-effects design matrix Z_R corresponding to grouping variable g_R. You can express the fixed and random terms using Wilkinson notation.
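
As a rough sketch of how these design matrices enter the model, the linear predictor of a generalized linear mixed-effects model can be written as follows. The symbols β (fixed-effects coefficients), b_r (random-effects vector for grouping variable g_r), η (linear predictor), μ (conditional mean of y given the random effects), and g (link function) are notation assumed here for illustration. Any offset term is omitted.

```latex
\eta = X\beta + \sum_{r=1}^{R} Z_r b_r,
\qquad
g(\mu) = \eta,
\qquad
\mu = \mathrm{E}\left[\, y \mid b_1, \ldots, b_R \,\right]
```

Each parenthesized term (random_r|g_r) in the formula therefore contributes one block Z_r b_r to the linear predictor.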

Wilkinson notation describes the factors present in a model, not the multipliers (coefficients) of those factors.

Wilkinson Notation	Factors in Standard Notation
`1`	Constant (intercept) term
`X^k`, where `k` is a positive integer	`X`, `X²`, ..., `X^k`
`X1 + X2`	`X1`, `X2`
`X1*X2`	`X1`, `X2`, `X1.*X2` (elementwise multiplication of `X1` and `X2`)
`X1:X2`	`X1.*X2` only
`- X2`	Do not include `X2`
`X1*X2 + X3`	`X1`, `X2`, `X3`, `X1*X2`
`X1 + X2 + X3 + X1:X2`	`X1`, `X2`, `X3`, `X1*X2`
`X1*X2*X3 - X1:X2:X3`	`X1`, `X2`, `X3`, `X1*X2`, `X1*X3`, `X2*X3`
`X1*(X2 + X3)`	`X1`, `X2`, `X3`, `X1*X2`, `X1*X3`
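
The expansion rules in the table can be checked on a fitted model. The following is a minimal sketch on simulated Poisson data. The variable names (y, X1, X2, g) and the simulated coefficient values are hypothetical and chosen only for illustration. It fits the same model once with `X1*X2` and once with the expanded form `X1 + X2 + X1:X2`, then displays the fixed-effects coefficient names of each fit.

```matlab
% Illustrative sketch on simulated data; all names and values are hypothetical.
rng(0)                                  % make the simulated data reproducible
n   = 200;
X1  = randn(n,1);
X2  = randn(n,1);
gid = randi(10,n,1);                    % group index for each observation
g   = categorical(gid);                 % grouping variable
b   = 0.3*randn(10,1);                  % simulated random intercepts, one per group
y   = poissrnd(exp(0.5 + 0.3*X1 - 0.2*X2 + 0.1*X1.*X2 + b(gid)));
tbl = table(y, X1, X2, g);

% X1*X2 is shorthand for X1 + X2 + X1:X2
glmeStar  = fitglme(tbl, 'y ~ X1*X2 + (1|g)', 'Distribution', 'Poisson');
glmeColon = fitglme(tbl, 'y ~ X1 + X2 + X1:X2 + (1|g)', 'Distribution', 'Poisson');

% Both fits should report the same fixed-effects terms
disp(glmeStar.CoefficientNames)
disp(glmeColon.CoefficientNames)
```

If the two formulas are equivalent as the table states, both models report the same fixed-effects terms, including the implicit intercept.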

Statistics and Machine Learning Toolbox™ notation always includes a constant term unless you explicitly remove the term using -1. Here are some examples for generalized linear mixed-effects model specification.

Examples:

Formula	Description
`'y ~ X1 + X2'`	Fixed effects for the intercept, `X1` and `X2`. This is equivalent to `'y ~ 1 + X1 + X2'`.
`'y ~ -1 + X1 + X2'`	No intercept and fixed effects for `X1` and `X2`. The implicit intercept term is suppressed by including `-1`.
`'y ~ 1 + (1 | g1)'`	Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable `g1`.
`'y ~ X1 + (1 | g1)'`	Random intercept model with a fixed slope.
`'y ~ X1 + (X1 | g1)'`	Random intercept and slope, with possible correlation between them. This is equivalent to `'y ~ 1 + X1 + (1 + X1|g1)'`.
`'y ~ X1 + (1 | g1) + (-1 + X1 | g1)'`	Independent random effects terms for intercept and slope.
`'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)'`	Random intercept model with independent main effects for `g1` and `g2`, plus an independent interaction effect.
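
The difference between correlated and independent random-effects terms shows up in the estimated covariance parameters. The following sketch uses simulated Poisson data. The variable names (y, X1, g1) and the simulation settings are hypothetical. It fits `'y ~ X1 + (X1|g1)'` and `'y ~ X1 + (1|g1) + (-1 + X1|g1)'`, then inspects the output of covarianceParameters for each model.

```matlab
% Illustrative sketch on simulated data; all names and values are hypothetical.
rng(1)                                  % make the simulated data reproducible
n   = 400;
gid = randi(20,n,1);                    % group index for each observation
g1  = categorical(gid);                 % grouping variable
X1  = randn(n,1);
b0  = 0.4*randn(20,1);                  % simulated random intercepts
b1  = 0.3*randn(20,1);                  % simulated random slopes
y   = poissrnd(exp(0.5 + b0(gid) + (0.3 + b1(gid)).*X1));
tbl = table(y, X1, g1);

% Random intercept and slope in one term: possibly correlated
glmeCorr  = fitglme(tbl, 'y ~ X1 + (X1|g1)', 'Distribution', 'Poisson');

% Random intercept and slope in separate terms: independent
glmeIndep = fitglme(tbl, 'y ~ X1 + (1|g1) + (-1 + X1|g1)', 'Distribution', 'Poisson');

psiCorr  = covarianceParameters(glmeCorr);    % cell array, one block per random term
psiIndep = covarianceParameters(glmeIndep);

disp(size(psiCorr{1}))    % size of the joint intercept-slope covariance block
disp(numel(psiIndep))     % number of separate covariance blocks
```

The single-term model should yield one joint covariance block for the intercept and slope, whereas the two-term model should yield a separate block for each term, with no covariance between them.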
