Analysis of deviance for generalized linear regression model
Perform a deviance test on a generalized linear regression model.
Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2).
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
Create a generalized linear regression model of Poisson data.
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson')
mdl = 
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE        tStat     pValue
                   ________    _________    ______    ______

    (Intercept)     1.0405      0.022122    47.034      0
    x1              0.9968      0.003362    296.49      0
    x2               1.987     0.0063433    313.24      0

100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0
Test whether the model differs from a constant in a statistically significant way.
tbl = devianceTest(mdl)
tbl=2×4 table
                            Deviance     DFE     chi2Stat     pValue
                           __________    ___    __________    ______

    log(y) ~ 1             2.9544e+05    99
    log(y) ~ 1 + x1 + x2        107.4    97     2.9533e+05      0
The small p-value indicates that the model differs significantly from a constant model. Note that the model display of mdl includes the statistics shown in the second row of the table.
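As a check on how the two rows of tbl relate, the chi-squared statistic is the drop in deviance from the constant model to mdl, and its degrees of freedom are the drop in DFE. The sketch below regenerates the example data and recomputes both from the Deviance and DFE columns; it assumes the Statistics and Machine Learning Toolbox and uses the upper-tail form chi2cdf(x,nu,'upper') rather than 1 - chi2cdf(x,nu) for numerical accuracy.

```matlab
% Recompute the chi-squared statistic and p-value in tbl from its
% Deviance and DFE columns (assumes Statistics and Machine Learning Toolbox).
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');
tbl = devianceTest(mdl);

% Row 1 is the constant model, row 2 is mdl.
D = tbl.Deviance(1) - tbl.Deviance(2);  % drop in deviance from adding x1 and x2
v = tbl.DFE(1) - tbl.DFE(2);            % number of added coefficients, p - 1
pval = chi2cdf(D,v,'upper');            % upper-tail chi-squared probability
fprintf('D = %g, v = %d, p-value = %g\n',D,v,pval)
```

D should match the chi2Stat entry in the second row of tbl, and v equals 2 because mdl adds the coefficients of x1 and x2 to the intercept.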
tbl: Analysis of deviance summary statistics
Analysis of deviance summary statistics, returned as a table.
tbl contains analysis of deviance statistics for both a constant model and the model mdl. The table includes these columns for each model.
Deviance: Twice the difference between the loglikelihood of the saturated model and the loglikelihood of the corresponding model (mdl or the constant model).
DFE: Degrees of freedom for the error (residuals), equal to n – p, where n is the number of observations and p is the number of estimated coefficients in the corresponding model.
chi2Stat or FStat: Chi-squared statistic or F-statistic, depending on whether the dispersion is fixed (chi-squared statistic) or estimated (F-statistic).
pValue: p-value associated with the test: chi-squared statistic with p – 1 degrees of freedom, or F-statistic with p – 1 numerator degrees of freedom and DFE denominator degrees of freedom, where p is the number of estimated coefficients in mdl.
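The DFE column follows n – p for each row: p = 1 for the constant model and p = 3 for mdl in the example. The sketch below checks this against the NumObservations, NumEstimatedCoefficients, and Deviance properties of the fitted GeneralizedLinearModel object; it assumes the Statistics and Machine Learning Toolbox and regenerates the example data.

```matlab
% Check the DFE and Deviance columns of tbl against properties of mdl.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');
tbl = devianceTest(mdl);

n = mdl.NumObservations;           % number of observations
p = mdl.NumEstimatedCoefficients;  % estimated coefficients in mdl
dfeConst = n - 1;                  % expected DFE for the constant model (row 1)
dfeFull = n - p;                   % expected DFE for mdl (row 2)
disp([tbl.DFE, [dfeConst; dfeFull]])
fprintf('Deviance of mdl: %g (tbl row 2: %g)\n',mdl.Deviance,tbl.Deviance(2))
```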
The deviance of a model M1 is twice the difference between the loglikelihood of the saturated model Ms and the loglikelihood of the model M1. A saturated model is the model with the maximum number of parameters that you can estimate.
For example, if you have n observations ($y_i$, i = 1, 2, ..., n) with potentially different values for $X_i^T\beta$, then you can define a saturated model with n parameters. Let L(b,y) denote the maximum value of the likelihood function for a model with the parameters b. Then the deviance of the model M1 is
$$D = -2\left(\log L(b_1,y) - \log L(b_s,y)\right),$$
where $b_1$ and $b_s$ contain the estimated parameters for the model M1 and the saturated model, respectively. The deviance has an approximate chi-square distribution with n – p degrees of freedom, where n is the number of parameters in the saturated model and p is the number of parameters in the model M1.
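For Poisson data, the saturated model sets each fitted mean equal to its observation, so the definition above reduces to $D = 2\sum_i \left[y_i\log(y_i/\hat\mu_i) - (y_i - \hat\mu_i)\right]$, with $y_i\log(y_i/\hat\mu_i)$ taken as 0 when $y_i = 0$. The sketch below evaluates this sum for the example model and compares it with the Deviance property; it assumes the Statistics and Machine Learning Toolbox, and that predict returns fitted means on the response scale for a GeneralizedLinearModel.

```matlab
% Evaluate the Poisson deviance directly from its definition.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');

muhat = predict(mdl,X);        % fitted means for the model M1
t = zeros(size(y));
pos = y > 0;                   % y*log(y/mu) is taken as 0 when y = 0
t(pos) = y(pos).*log(y(pos)./muhat(pos));
D = 2*sum(t - (y - muhat));    % 2*(log L(b_s,y) - log L(b_1,y)) for Poisson data
fprintf('From formula: %.4f   mdl.Deviance: %.4f\n',D,mdl.Deviance)
```

The log(y!) terms of the Poisson loglikelihood are identical in both models, so they cancel and do not appear in the sum.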
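Assume you have two different generalized linear regression models M1 and M2, and M1 has a subset of the terms in M2. You can assess the fit of the models by comparing the deviances D1 and D2 of the two models. The difference of the deviances is
$$D = D_1 - D_2 = -2\left(\log L(b_1,y) - \log L(b_s,y)\right) + 2\left(\log L(b_2,y) - \log L(b_s,y)\right) = -2\left(\log L(b_1,y) - \log L(b_2,y)\right),$$
where $b_2$ contains the estimated parameters for the model M2. The saturated-model terms cancel, so D depends only on the loglikelihoods of M1 and M2.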
Asymptotically, the difference D has a chi-square distribution with degrees of freedom v equal to the difference in the number of parameters estimated in M1 and M2. You can obtain the p-value for this test by using 1 - chi2cdf(D,v).
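The comparison of two nested models can be sketched with the example data. Here, as an illustrative choice, M1 uses only x1 and M2 uses x1 and x2, so v = 1. The sketch computes D both as D1 - D2 and from the loglikelihoods, which should agree because the saturated-model terms cancel. It assumes the Statistics and Machine Learning Toolbox and the Deviance, LogLikelihood, and NumEstimatedCoefficients properties of GeneralizedLinearModel objects.

```matlab
% Likelihood ratio (deviance difference) test between nested Poisson models.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);

M1 = fitglm(X,y,'y ~ x1','Distribution','poisson');       % smaller model
M2 = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');  % M1 plus the term x2

D = M1.Deviance - M2.Deviance;                                 % D = D1 - D2
v = M2.NumEstimatedCoefficients - M1.NumEstimatedCoefficients; % here v = 1
pval = 1 - chi2cdf(D,v);                                       % p-value as in the text
Dll = -2*(M1.LogLikelihood - M2.LogLikelihood);                % same D from loglikelihoods
fprintf('D = %g (from loglikelihoods: %g), v = %d, p-value = %g\n',D,Dll,v,pval)
```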
Typically, you examine D using a model M1 with a constant term and no predictors, and a model M2 with p estimated coefficients. Therefore, D has a chi-square distribution with p – 1 degrees of freedom. If the dispersion is estimated, the statistic $F = \frac{D/(p-1)}{\hat\phi}$, where $\hat\phi$ is the estimated dispersion, has an F distribution with p – 1 numerator degrees of freedom and n – p denominator degrees of freedom.
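When the dispersion is estimated, the deviance difference is scaled by its degrees of freedom and by the estimated dispersion before comparing it with an F distribution. The sketch below computes F = (D/(p – 1))/φ̂ for the example data with p – 1 = 2 and n – p = 97; this scaling is the usual one for an F test in a GLM and is an assumption here, not a transcription of the devianceTest source. It also assumes the Statistics and Machine Learning Toolbox, the 'DispersionFlag' option of fitglm, the 'constant' model specification, and the Dispersion property of the fitted model.

```matlab
% F test of mdl against a constant model when the dispersion is estimated.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);

% Estimate the dispersion instead of fixing it at 1.
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson','DispersionFlag',true);
M1 = fitglm(X,y,'constant','Distribution','poisson');  % constant-only model

n = mdl.NumObservations;
p = mdl.NumEstimatedCoefficients;
D = M1.Deviance - mdl.Deviance;      % deviance difference with p - 1 degrees of freedom
F = (D/(p - 1))/mdl.Dispersion;      % scale by degrees of freedom and estimated dispersion
pval = fcdf(F,p - 1,n - p,'upper');  % upper-tail F probability
fprintf('F = %g, p-value = %g\n',F,pval)
```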