# GeneralizedLinearMixedModel class

Generalized linear mixed-effects model class

## Description

A `GeneralizedLinearMixedModel` object represents a regression model of a response variable that contains both fixed and random effects. The object comprises data, a model description, fitted coefficients, covariance parameters, design matrices, residuals, residual plots, and other diagnostic information for a generalized linear mixed-effects (GLME) model. You can predict model responses with the `predict` function and generate random data at new design points using the `random` function.

## Construction

You can fit a generalized linear mixed-effects (GLME) model to sample data using `fitglme(tbl,formula)`. For more information, see `fitglme`.

### Input Arguments

Input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables (see Grouping Variables). You must specify the model for the variables using `formula`.

Data Types: `table`

Formula for model specification, specified as a character vector or string scalar of the form ```'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'```. For a full description, see Formula.

Example: `'y ~ treatment +(1|block)'`
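
As a minimal sketch of this construction syntax, the following fits a small model to the `mfr` sample data set used in the Examples section. The particular formula (one fixed effect and a random intercept per factory) is chosen only for illustration and is not a recommended model.

```matlab
% Load the sample manufacturing data (provides the table mfr).
load mfr

% Fixed effect for newprocess, random intercept grouped by factory.
formula = 'defects ~ newprocess + (1|factory)';
glme = fitglme(mfr,formula,'Distribution','Poisson');

% The returned object is a GeneralizedLinearMixedModel.
class(glme)
```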

## Properties

Estimates of fixed-effects coefficients and related statistics, stored as a dataset array that has one row for each coefficient and the following columns:

• `Name` — Name of the coefficient

• `Estimate` — Estimated coefficient value

• `SE` — Standard error of the estimate

• `tStat` — t-statistic for a test that the coefficient is equal to 0

• `DF` — Degrees of freedom associated with the t statistic

• `pValue` — p-value for the t-statistic

• `Lower` — Lower confidence limit

• `Upper` — Upper confidence limit

To obtain any of these columns as a vector, index into the property using dot notation.

Use the `coefTest` method to perform other tests on the coefficients.
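
The sketch below illustrates dot-notation access to the coefficient columns and one additional test with `coefTest`. It assumes the `mfr` sample data and the model from the Examples section. The contrast matrix `H` is an illustrative choice that jointly tests the two supplier coefficients, and it assumes the coefficient order shown in that example's display.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

% Extract individual columns of the coefficient table as vectors.
est = glme.Coefficients.Estimate;
se  = glme.Coefficients.SE;

% The t-statistic is the estimate divided by its standard error.
t = est ./ se;

% Jointly test that both supplier coefficients (rows 5 and 6) equal 0.
H = [0 0 0 0 1 0;
     0 0 0 0 0 1];
pVal = coefTest(glme,H)
```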

Covariance of estimated fixed-effects vector, stored as a matrix.

Data Types: `single` | `double`

Names of fixed-effects coefficients, stored as a cell array of character vectors. The label for the coefficient of the constant term is `(Intercept)`. The labels for other coefficients indicate the terms that they multiply. When the term includes a categorical predictor, the label also indicates the level of that predictor.

Data Types: `cell`

Degrees of freedom for error, stored as a positive integer value. `DFE` is the number of observations minus the number of estimated coefficients.

`DFE` contains the degrees of freedom corresponding to the `'Residual'` method of calculating denominator degrees of freedom for hypothesis tests on fixed-effects coefficients. If n is the number of observations and p is the number of fixed-effects coefficients, then `DFE` is equal to n – p.

Data Types: `double`
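
As a quick check of this relationship, the sketch below recomputes `DFE` from the observation and coefficient counts. It assumes the `mfr` sample data and the model from the Examples section, which has 100 observations and 6 fixed-effects coefficients.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

n = glme.NumObservations;   % number of observations used in the fit
p = glme.NumCoefficients;   % number of fixed-effects coefficients

% Residual degrees of freedom, compared with the stored property.
dfe = n - p;
isequal(dfe,glme.DFE)
```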

Model dispersion parameter, stored as a scalar value. The dispersion parameter defines the conditional variance of the response.

For observation i, the conditional variance of the response $y_i$, given the conditional mean $\mu_i$ and the dispersion parameter $\sigma^2$, in a generalized linear mixed-effects model is

`$\mathrm{var}\left({y}_{i}|{\mu }_{i},{\sigma }^{2}\right)=\frac{{\sigma }^{2}}{{w}_{i}}v\left({\mu }_{i}\right)\text{\hspace{0.17em}},$`

where $w_i$ is the ith observation weight and $v$ is the variance function for the specified conditional distribution of the response. The `Dispersion` property contains an estimate of $\sigma^2$ for the specified GLME model. The value of `Dispersion` depends on the specified conditional distribution of the response. For binomial and Poisson distributions, the theoretical value of `Dispersion` is $\sigma^2 = 1.0$.

• If `FitMethod` is `MPL` or `REMPL` and the `'DispersionFlag'` name-value pair argument in `fitglme` is `true`, then a dispersion parameter is estimated from data for all distributions, including binomial and Poisson distributions.

• If `FitMethod` is `ApproximateLaplace` or `Laplace`, then the `'DispersionFlag'` name-value pair argument in `fitglme` does not apply, and the dispersion parameter is fixed at 1.0 for binomial and Poisson distributions. For all other distributions, `Dispersion` is estimated from data.

Data Types: `double`

Flag indicating estimated dispersion parameter, stored as a logical value.

• If `FitMethod` is `ApproximateLaplace` or `Laplace`, then the dispersion parameter is fixed at its theoretical value of 1.0 for binomial and Poisson distributions, and `DispersionEstimated` is `false`. For other distributions, the dispersion parameter is estimated from the data, and `DispersionEstimated` is `true`.

• If `FitMethod` is `MPL` or `REMPL`, and the `'DispersionFlag'` name-value pair argument in `fitglme` is specified as `true`, then the dispersion parameter is estimated for all distributions, including binomial and Poisson distributions, and `DispersionEstimated` is `true`.

• If `FitMethod` is `MPL` or `REMPL`, and the `'DispersionFlag'` name-value pair argument in `fitglme` is specified as `false`, then the dispersion parameter is fixed at its theoretical value for binomial and Poisson distributions, and `DispersionEstimated` is `false`. For distributions other than binomial and Poisson, the dispersion parameter is estimated from the data, and `DispersionEstimated` is `true`.

Data Types: `logical`
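
The following sketch contrasts two of the cases above for a Poisson response, assuming the `mfr` sample data from the Examples section. The variable names `glmeLaplace` and `glmeMPL` are illustrative.

```matlab
load mfr
formula = 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)';

% Laplace fit: dispersion is fixed at 1 for a Poisson response.
glmeLaplace = fitglme(mfr,formula,'Distribution','Poisson','FitMethod','Laplace');

% MPL fit with DispersionFlag set to true: dispersion is estimated from the data.
glmeMPL = fitglme(mfr,formula,'Distribution','Poisson', ...
    'FitMethod','MPL','DispersionFlag',true);

fprintf('Laplace: Dispersion = %g, estimated = %d\n', ...
    glmeLaplace.Dispersion,glmeLaplace.DispersionEstimated);
fprintf('MPL:     Dispersion = %g, estimated = %d\n', ...
    glmeMPL.Dispersion,glmeMPL.DispersionEstimated);
```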

Response distribution name, stored as one of the following:

• `'Normal'` — Normal distribution

• `'Binomial'` — Binomial distribution

• `'Poisson'` — Poisson distribution

• `'Gamma'` — Gamma distribution

• `'InverseGaussian'` — Inverse Gaussian distribution

Method used to fit the model, stored as one of the following.

• `'MPL'` — Maximum pseudo likelihood

• `'REMPL'` — Restricted maximum pseudo likelihood

• `'ApproximateLaplace'` — Maximum likelihood using the approximate Laplace method, with fixed effects profiled out

• `'Laplace'` — Maximum likelihood using the Laplace method

Model specification formula, stored as an object. The model specification formula uses Wilkinson’s notation to describe the relationship between the fixed-effects terms, random-effects terms, and grouping variables in the GLME model. For more information, see Formula.

Log of likelihood function evaluated at the estimated coefficient values, stored as a scalar value. `LogLikelihood` depends on the method used to fit the model.

• If you use `'Laplace'` or `'ApproximateLaplace'`, then `LogLikelihood` is the maximized log likelihood.

• If you use `'MPL'`, then `LogLikelihood` is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration.

• If you use `'REMPL'`, then `LogLikelihood` is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.

Data Types: `double`

Model criterion to compare fitted generalized linear mixed-effects models, stored as a table with the following fields.

| Field | Description |
| --- | --- |
| `AIC` | Akaike information criterion |
| `BIC` | Bayesian information criterion |
| `LogLikelihood` | For a model fit using `'Laplace'` or `'ApproximateLaplace'`, the maximized log likelihood. For a model fit using `'MPL'`, the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration. For a model fit using `'REMPL'`, the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration. |
| `Deviance` | –2 times `LogLikelihood` |
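
As an illustration, the sketch below reads the criteria from a fitted model and recomputes the deviance from the log likelihood. It assumes the `mfr` sample data and the model from the Examples section.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

% Model comparison criteria for the fitted model.
mc = glme.ModelCriterion;

% Deviance is -2 times the log likelihood.
devianceCheck = -2*mc.LogLikelihood;
fprintf('Stored deviance: %.4f, recomputed: %.4f\n',mc.Deviance,devianceCheck);
```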

Number of fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.

Data Types: `double`

Number of estimated fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.

Data Types: `double`

Number of observations used in the fit, stored as a positive integer value. `NumObservations` is the number of rows in the table or dataset array `tbl`, minus rows excluded using the `'Exclude'` name-value pair of `fitglme` or rows containing `NaN` values.

Data Types: `double`

Number of variables used as predictors in the generalized linear mixed-effects model, stored as a positive integer value.

Data Types: `double`

Total number of variables, including the response and predictors, stored as a positive integer value. If the sample data is in a table or dataset array `tbl`, then `NumVariables` is the total number of variables in `tbl`, including the response variable. `NumVariables` includes variables, if any, that are not used as predictors or as the response.

Data Types: `double`

Information about the observations used in the fit, stored as a table.

`ObservationInfo` has one row for each observation and the following columns.

| Name | Description |
| --- | --- |
| `Weights` | The weight value for the observation. The default value is 1. |
| `Excluded` | If the observation was excluded from the fit using the `'Exclude'` name-value pair argument in `fitglme`, then `Excluded` is `true`, or `1`. Otherwise, `Excluded` is `false`, or `0`. |
| `Missing` | If the observation was excluded from the fit because any response or predictor value is missing, then `Missing` is `true`. Otherwise, `Missing` is `false`. Missing values include `NaN` for numeric variables, empty cells for cell arrays, blank rows for character arrays, and the `<undefined>` value for categorical arrays. |
| `Subset` | If the observation was used in the fit, then `Subset` is `true`. If the observation was not used in the fit because it is missing or excluded, then `Subset` is `false`. |
| `BinomSize` | Binomial size for each observation. This column only applies when fitting a binomial distribution. |

Data Types: `table`
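
The sketch below shows how these columns relate to one another when some rows are excluded. It assumes the `mfr` sample data from the Examples section. Excluding the first five rows is an arbitrary choice for illustration.

```matlab
load mfr

% Exclude the first five rows from the fit (illustrative choice).
glme = fitglme(mfr,'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution','Poisson','Exclude',1:5);

info = glme.ObservationInfo;

% Rows used in the fit are those that are neither excluded nor missing.
used    = info.Subset;
dropped = info.Excluded | info.Missing;

fprintf('Rows used: %d, rows dropped: %d\n',nnz(used),nnz(dropped));
```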

Names of observations used in the fit, stored as a cell array of character vectors.

• If the data is in a table or dataset array `tbl` that contains observation names, then `ObservationNames` uses those names.

• If the data is provided in matrices, or in a table or dataset array without observation names, then `ObservationNames` is an empty cell array.

Data Types: `cell`

Names of the variables used as predictors in the fit, stored as a cell array of character vectors. The number of elements in the cell array is equal to `NumPredictors`.

Data Types: `cell`

Name of the variable used as the response variable in the fit, stored as a character vector.

Data Types: `char`

Proportion of variability in the response explained by the fitted model, stored as a structure. `Rsquared` contains the R-squared value of the fitted model, also known as the coefficient of determination. `Rsquared` contains the following fields.

| Field | Description |
| --- | --- |
| `Ordinary` | R-squared value, stored as a scalar value in a structure. `Rsquared.Ordinary = 1 - SSE./SST` |
| `Adjusted` | R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure. `Rsquared.Adjusted = 1 - (SSE./SST)*(DFT./DFE)`, where `DFE = n - p`, `DFT = n - 1`, `n` is the total number of observations, and `p` is the number of fixed-effects coefficients. |

Data Types: `struct`
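
The two formulas above can be evaluated directly from the sums-of-squares properties, as in the following sketch. It assumes the `mfr` sample data and the model from the Examples section. The names `r2` and `r2adj` are illustrative.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

n   = glme.NumObservations;
p   = glme.NumCoefficients;
DFT = n - 1;    % total degrees of freedom
DFE = n - p;    % error degrees of freedom

% Ordinary and adjusted R-squared from SSE and SST.
r2    = 1 - glme.SSE/glme.SST;
r2adj = 1 - (glme.SSE/glme.SST)*(DFT/DFE);

fprintf('Ordinary: %.4f (stored %.4f)\n',r2,glme.Rsquared.Ordinary);
fprintf('Adjusted: %.4f (stored %.4f)\n',r2adj,glme.Rsquared.Adjusted);
```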

Error sum of squares, stored as a positive scalar value. `SSE` is the weighted sum of the squared conditional residuals, and is calculated as

`$SSE=\sum _{i=1}^{n}{w}_{i}^{eff}{\left({y}_{i}-{f}_{i}\right)}^{2}\text{\hspace{0.17em}},$`

where n is the number of observations, $w_i^{eff}$ is the ith effective weight, $y_i$ is the ith response, and $f_i$ is the ith fitted value.

The ith effective weight is calculated as

`${w}_{i}^{eff}=\left\{\frac{{w}_{i}}{{v}_{i}\left({\mu }_{i}\left(\stackrel{^}{\beta },\stackrel{^}{b}\right)\right)}\right\}\text{\hspace{0.17em}},$`

where $v_i$ is the variance term for the ith observation, and $\stackrel{^}{\beta }$ and $\stackrel{^}{b}$ are estimated values of β and b, respectively.

The ith fitted value is calculated as

`${f}_{i}={g}^{-1}\left({x}_{i}^{T}\stackrel{^}{\beta }+{z}_{i}^{T}\stackrel{^}{b}+{\delta }_{i}\right)\text{\hspace{0.17em}},$`

where $g^{-1}$ is the inverse link function, $x_i^T$ is the ith row of the fixed-effects design matrix X, $z_i^T$ is the ith row of the random-effects design matrix Z, and $\delta_i$ is the ith offset value.

Data Types: `double`
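
As a rough sketch of the calculation above, the following code evaluates the weighted sum by hand for a Poisson model, for which the variance function is $v(\mu) = \mu$. It assumes the `mfr` sample data and the model from the Examples section, with no observation weights or exclusions. The hand-computed value is printed next to the stored `SSE` for comparison, on the assumption that the two should agree when the formulas are applied as documented.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

y = response(glme);                  % observed responses
f = fitted(glme);                    % conditional fitted values
w = glme.ObservationInfo.Weights;    % observation weights (all 1 here)

% Effective weights: observation weight divided by the Poisson variance v(mu) = mu.
wEff = w ./ f;

% Weighted sum of squared conditional residuals.
sseManual = sum(wEff .* (y - f).^2);
fprintf('Hand-computed SSE: %.4f, stored SSE: %.4f\n',sseManual,glme.SSE);
```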

Regression sum of squares, stored as a positive scalar value. `SSR` is the sum of squares explained by the generalized linear mixed-effects regression, or equivalently the weighted sum of the squared deviations of the conditional fitted values from their weighted mean. `SSR` is calculated as

`$SSR=\sum _{i=1}^{n}{w}_{i}^{eff}{\left({f}_{i}-\overline{f}\right)}^{2}\text{\hspace{0.17em}},$`

where n is the number of observations, $w_i^{eff}$ is the ith effective weight, $f_i$ is the ith fitted value, and $\overline{f}$ is a weighted average of the fitted values.

The ith effective weight is calculated as

`${w}_{i}^{eff}=\left\{\frac{{w}_{i}}{{v}_{i}\left({\mu }_{i}\left(\stackrel{^}{\beta },\stackrel{^}{b}\right)\right)}\right\}\text{\hspace{0.17em}},$`

where $\stackrel{^}{\beta }$ and $\stackrel{^}{b}$ are estimated values of β and b, respectively.

The ith fitted value is calculated as

`${f}_{i}={g}^{-1}\left({x}_{i}^{T}\stackrel{^}{\beta }+{z}_{i}^{T}\stackrel{^}{b}+{\delta }_{i}\right)\text{\hspace{0.17em}},$`

where $g^{-1}$ is the inverse link function, $x_i^T$ is the ith row of the fixed-effects design matrix X, $z_i^T$ is the ith row of the random-effects design matrix Z, and $\delta_i$ is the ith offset value.

The weighted average of fitted values is calculated as

`$\overline{f}=\frac{\left[\sum _{i=1}^{n}{w}_{i}^{eff}{f}_{i}\right]}{\sum _{i=1}^{n}{w}_{i}^{eff}}\text{\hspace{0.17em}}.$`

Data Types: `double`

Total sum of squares, stored as a positive scalar value. For a GLME model, `SST` is defined as ```SST = SSE + SSR```.

Data Types: `double`

Information about the variables used in the fit, stored as a table. `VariableInfo` has one row for each variable and contains the following columns.

| Column Name | Description |
| --- | --- |
| `Class` | Class of the variable (`'double'`, `'cell'`, `'nominal'`, and so on). |
| `Range` | Value range of the variable. For a numerical variable, `Range` is a two-element vector of the form `[min,max]`. For a cell or categorical variable, `Range` is a cell or categorical array containing all unique values of the variable. |
| `InModel` | If the variable is a predictor in the fitted model, `InModel` is `true`. If the variable is not in the fitted model, `InModel` is `false`. |
| `IsCategorical` | If the variable type is treated as a categorical predictor (such as cell, logical, or categorical), then `IsCategorical` is `true`. If the variable is a continuous predictor, then `IsCategorical` is `false`. |

Data Types: `table`

Names of all the variables contained in the table or dataset array `tbl`, stored as a cell array of character vectors.

Data Types: `cell`

Variables, stored as a table. If the fit is based on a table or dataset array `tbl`, then `Variables` is identical to `tbl`.

Data Types: `table`

## Methods

| Method | Description |
| --- | --- |
| `anova` | Analysis of variance for generalized linear mixed-effects model |
| `coefCI` | Confidence intervals for coefficients of generalized linear mixed-effects model |
| `coefTest` | Hypothesis test on fixed and random effects of generalized linear mixed-effects model |
| `compare` | Compare generalized linear mixed-effects models |
| `covarianceParameters` | Extract covariance parameters of generalized linear mixed-effects model |
| `designMatrix` | Fixed- and random-effects design matrices |
| `fitted` | Fitted responses from generalized linear mixed-effects model |
| `fixedEffects` | Estimates of fixed effects and related statistics |
| `plotResiduals` | Plot residuals of generalized linear mixed-effects model |
| `predict` | Predict response of generalized linear mixed-effects model |
| `random` | Generate random responses from fitted generalized linear mixed-effects model |
| `randomEffects` | Estimates of random effects and related statistics |
| `refit` | Refit generalized linear mixed-effects model |
| `residuals` | Residuals of fitted generalized linear mixed-effects model |
| `response` | Response vector of generalized linear mixed-effects model |
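
The sketch below calls a few of these methods with their default options, assuming the `mfr` sample data and the model from the Examples section. With default options, the residuals are raw conditional residuals, so they equal the response minus the conditional fitted values.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

y    = response(glme);     % response vector
yhat = fitted(glme);       % conditional fitted values
r    = residuals(glme);    % raw conditional residuals (default)

% Largest gap between the residuals and response minus fitted values.
maxGap = max(abs(r - (y - yhat)))

% Estimated fixed effects and the random intercept for each factory.
beta = fixedEffects(glme);
b    = randomEffects(glme);
```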

## Examples

Load the sample data.

`load mfr`

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

• Flag to indicate whether the batch used the new process (`newprocess`)

• Processing time for each batch, in hours (`time`)

• Temperature of the batch, in degrees Celsius (`temp`)

• Categorical variable indicating the supplier (`A`, `B`, or `C`) of the chemical used in the batch (`supplier`)

• Number of defects in the batch (`defects`)

The data also includes `time_dev` and `temp_dev`, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using `newprocess`, `time_dev`, `temp_dev`, and `supplier` as fixed-effects predictors. Include a random-effects term for intercept grouped by `factory`, to account for quality differences that might exist due to factory-specific variations. The response variable `defects` has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as `'effects'`, so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution

`${\text{defects}}_{ij}\sim \text{Poisson}\left({\mu }_{ij}\right)$`

This corresponds to the generalized linear mixed-effects model

`$\mathrm{log}\left({\mu }_{ij}\right)={\beta }_{0}+{\beta }_{1}{\text{newprocess}}_{ij}+{\beta }_{2}{\text{time}\text{_}\text{dev}}_{ij}+{\beta }_{3}{\text{temp}\text{_}\text{dev}}_{ij}+{\beta }_{4}{\text{supplier}\text{_}\text{C}}_{ij}+{\beta }_{5}{\text{supplier}\text{_}\text{B}}_{ij}+{b}_{i},$`

where

• ${\text{defects}}_{ij}$ is the number of defects observed in the batch produced by factory $i$ during batch $j$.

• ${\mu }_{ij}$ is the mean number of defects corresponding to factory $i$ (where $i=1,2,...,20$) during batch $j$ (where $j=1,2,...,5$).

• ${\text{newprocess}}_{ij}$, ${\text{time}\text{_}\text{dev}}_{ij}$, and ${\text{temp}\text{_}\text{dev}}_{ij}$ are the measurements for each variable that correspond to factory $i$ during batch $j$. For example, ${\text{newprocess}}_{ij}$ indicates whether the batch produced by factory $i$ during batch $j$ used the new process.

• ${\text{supplier}\text{_}\text{C}}_{ij}$ and ${\text{supplier}\text{_}\text{B}}_{ij}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether supplier `C` or `B`, respectively, supplied the process chemicals for the batch produced by factory $i$ during batch $j$.

• ${b}_{i}\sim N\left(0,{\sigma }_{b}^{2}\right)$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.

```
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');
```

Display the model.

`disp(glme)`
```
Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations             100
    Fixed effects coefficients           6
    Random effects coefficients         20
    Covariance parameters                1
    Distribution                    Poisson
    Link                            Log
    FitMethod                       Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35

Fixed effects coefficients (95% CIs):
    Name                   Estimate     SE          tStat       DF    pValue
    {'(Intercept)'}          1.4689     0.15988      9.1875     94    9.8194e-15
    {'newprocess' }        -0.36766     0.17755     -2.0708     94      0.041122
    {'time_dev'   }       -0.094521     0.82849    -0.11409     94       0.90941
    {'temp_dev'   }        -0.28317      0.9617    -0.29444     94       0.76907
    {'supplier_C' }       -0.071868    0.078024     -0.9211     94       0.35936
    {'supplier_B' }        0.071072     0.07739     0.91836     94       0.36078

    Lower        Upper
       1.1515       1.7864
     -0.72019    -0.015134
      -1.7395       1.5505
      -2.1926       1.6263
     -0.22679     0.083051
    -0.082588      0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1                  Name2                  Type           Estimate
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.31381

Group: Error
    Name                        Estimate
    {'sqrt(Dispersion)'}        1
```

The `Model information` table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a `Poisson` distribution, the link function is `Log`, and the fit method is `Laplace`.

`Formula` indicates the model specification using Wilkinson’s notation.

The `Model fit statistics` table displays statistics used to assess the goodness of fit of the model: the Akaike information criterion (`AIC`), the Bayesian information criterion (`BIC`), the log likelihood (`LogLikelihood`), and the deviance (`Deviance`).

The `Fixed effects coefficients` table indicates that `fitglme` returned 95% confidence intervals. It contains one row for each fixed-effects coefficient, and each column contains statistics corresponding to that coefficient. Column 1 (`Name`) contains the name of each fixed-effects coefficient, column 2 (`Estimate`) contains its estimated value, and column 3 (`SE`) contains the standard error of the coefficient. Column 4 (`tStat`) contains the $t$-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (`DF`) and column 6 (`pValue`) contain the degrees of freedom and $p$-value that correspond to the $t$-statistic, respectively. The last two columns (`Lower` and `Upper`) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.

`Random effects covariance parameters` displays a table for each grouping variable (here, only `factory`), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, `std` indicates that `fitglme` returns the standard deviation of the random effect associated with the `factory` grouping variable, which has an estimated value of 0.31381. The display also includes a table containing the error parameter type (here, the square root of the dispersion parameter) and its estimated value of 1.

The standard display generated by `fitglme` does not provide confidence intervals for the random-effects parameters. To compute and display these values, use `covarianceParameters`.
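
A minimal sketch of that step follows, assuming the `mfr` sample data and the same model as above. The third output of `covarianceParameters` is a cell array with one table of statistics per grouping variable, including the confidence limits.

```matlab
load mfr
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

% psi: estimated covariance of the random effects, one cell per grouping variable.
% dispersion: dispersion parameter.
% stats: covariance parameter estimates with confidence limits.
[psi,dispersion,stats] = covarianceParameters(glme);

% Statistics for the factory random intercept, including Lower and Upper.
factoryStats = stats{1}

% The standard deviation estimate is the square root of the variance in psi.
sdFactory = sqrt(psi{1});
```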