Multinomial Models for Ordinal Responses

The outcome of a response variable might be one of a restricted set of possible values. If there are only two possible outcomes, such as male and female for gender, these responses are called binary responses. If there are more than two possible outcomes, then they are called polytomous responses. Some examples of polytomous responses include levels of a disease (mild, medium, severe), the preferred district of a city to live in, the species of a certain flower type, and so on. Sometimes there is a natural order among the response categories. These responses are called ordinal responses.

The ordering might be inherent in the category choices, such as an individual being not satisfied, satisfied, or very satisfied with an online customer service. The ordering might also be introduced by categorization of a latent (continuous) variable, such as in the case of an individual being in the low risk, medium risk, or high risk group for developing a certain disease, based on a quantitative medical measure such as blood pressure.

You can specify a multinomial regression model that uses the natural ordering among the response categories. This ordinal model describes the relationship between the cumulative probabilities of the categories and predictor variables.

Different link functions can describe this relationship; logit and probit are the most commonly used.

Logit: By default, the fitmnr function uses the logit link function to create a MultinomialRegression model object with ordinal categories. (You can specify a different link function using the Link name-value argument in fitmnr.) The resulting MultinomialRegression model object models the log cumulative odds, that is, the logarithm of the ratio of the probability that a response belongs to a category with a value less than or equal to category j, P(y ≤ c_j), to the probability that a response belongs to a category with a value greater than category j, P(y > c_j).
Ordinal models are usually based on the assumption that the effects of predictor variables are the same for all categories on the logarithmic scale. That is, the model has different intercepts but common slopes (coefficients) among categories. This model is called a parallel regression or proportional odds model, and is the default for ordinal responses.
The proportional odds model is
$\begin{array}{l} \ln (\frac{P (y \leq c_{1})}{P (y > c_{1})}) = \ln (\frac{π_{1}}{π_{2} + \dots + π_{k}}) = α_{1} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{p} X_{p}, \\ \ln (\frac{P (y \leq c_{2})}{P (y > c_{2})}) = \ln (\frac{π_{1} + π_{2}}{π_{3} + \dots + π_{k}}) = α_{2} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{p} X_{p}, \\ ⋮ \\ \ln (\frac{P (y \leq c_{k - 1})}{P (y > c_{k - 1})}) = \ln (\frac{π_{1} + π_{2} + \dots + π_{k - 1}}{π_{k}}) = α_{k - 1} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{p} X_{p}, \end{array}$
where π_j, j = 1, 2, ..., k, are the category probabilities.
For example, for a response variable with three categories, there are 3 – 1 = 2 equations as follows:

$\begin{array}{l} \ln (\frac{π_{1}}{π_{2} + π_{3}}) = α_{1} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{p} X_{p}, \\ \ln (\frac{π_{1} + π_{2}}{π_{3}}) = α_{2} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{p} X_{p}. \end{array}$
Under the proportional odds assumption, the partial effect of a predictor variable X is invariant to the choice of the response variable category, j. For example, if there are three categories, then the coefficients express the impact of a predictor variable on the log cumulative odds of the response value being in category 1 versus categories 2 or 3, or in categories 1 or 2 versus category 3.
Thus, a unit change in variable X₂ changes the cumulative odds of the response value being in category 1 versus categories 2 or 3, or in categories 1 or 2 versus category 3, by a factor of exp(β₂), all else being equal.
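The following Python sketch evaluates the proportional odds equations numerically for a three-category response. It is an illustration of the formulas only, not the fitmnr implementation. The intercepts, slopes, and predictor values are hypothetical numbers chosen for the example, and the helper function names are assumptions introduced here.

```python
import math

def cumulative_probs_logit(alphas, betas, x):
    """Return P(y <= c_j), j = 1..k-1, from ln(odds_j) = alpha_j + beta . x."""
    eta = sum(b * xi for b, xi in zip(betas, x))
    return [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas]

def category_probs(cum):
    """Convert cumulative probabilities into category probabilities pi_1..pi_k."""
    padded = [0.0] + list(cum) + [1.0]
    return [padded[j + 1] - padded[j] for j in range(len(padded) - 1)]

def cumulative_odds(cum):
    """Return P(y <= c_j) / P(y > c_j) for each cumulative probability."""
    return [p / (1.0 - p) for p in cum]

# Hypothetical parameters (not estimated from data): k = 3 categories, p = 2 predictors.
alphas = [-1.0, 0.5]   # intercepts alpha_1 < alpha_2, one per equation
betas = [0.8, -0.3]    # common slopes beta_1, beta_2 shared by both equations

x0 = [1.0, 2.0]        # baseline predictor values
x1 = [1.0, 3.0]        # same point with X2 increased by one unit

cum0 = cumulative_probs_logit(alphas, betas, x0)
cum1 = cumulative_probs_logit(alphas, betas, x1)
pi0 = category_probs(cum0)

# Ratio of cumulative odds after and before the unit change in X2.
odds_ratios = [o1 / o0 for o0, o1 in zip(cumulative_odds(cum0), cumulative_odds(cum1))]

print("category probabilities at x0:", pi0)
print("cumulative odds ratios:", odds_ratios)
print("exp(beta_2):", math.exp(betas[1]))
```

Both cumulative odds ratios equal exp(β₂), regardless of which cut between categories is used, which is what the proportional odds assumption states.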
You can alternatively fit a model with different intercepts and slopes among the categories by using the 'interactions','on' name-value pair argument. However, using this option for ordinal models when the equal slopes model is true causes a loss of efficiency (you lose the advantage of estimating fewer parameters).
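To make the difference in parameter counts concrete, the short Python sketch below counts the parameters in each model for a response with k categories and p predictors. The function names and the values of k and p are hypothetical, chosen only for illustration.

```python
def n_params_equal_slopes(k, p):
    """k - 1 intercepts plus p slopes shared by all k - 1 equations."""
    return (k - 1) + p

def n_params_separate_slopes(k, p):
    """Each of the k - 1 equations has its own intercept and its own p slopes."""
    return (k - 1) * (1 + p)

k, p = 3, 4  # hypothetical: three response categories, four predictors
equal_slopes = n_params_equal_slopes(k, p)
separate_slopes = n_params_separate_slopes(k, p)
print(equal_slopes, separate_slopes)
```

With these values, the equal slopes model estimates 6 parameters and the model with separate slopes estimates 10, and the gap grows with both k and p.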
Probit: The 'link','probit' name-value pair argument uses the probit link function, which is based on a normally distributed latent variable assumption. For ordinal response variables, this model is also called an ordered probit model. Consider the regression model that describes the relationship between a latent variable y* of an ordinal process and a vector of predictor variables, X,
$y^{*} = β X + ε,$
where the error term ε has a standard normal distribution. Suppose there is the following relationship between the latent variable y* and the observed variable y:
$\begin{array}{l} y = c_{1} \text{ if } α_{0} < y^{*} \leq α_{1}, \\ y = c_{2} \text{ if } α_{1} < y^{*} \leq α_{2}, \\ ⋮ \\ y = c_{k} \text{ if } α_{k - 1} < y^{*} \leq α_{k}, \end{array}$
where α₀ = – ∞ and α_k = ∞. Then, the cumulative probability of y being in category j or one of the earlier categories, P(y ≤ c_j), is equal to
$P (y \leq c_{j}) = P (y^{*} \leq α_{j}) = P (β X + ε \leq α_{j}) = P (ε \leq α_{j} - β X) = Φ (α_{j} - β X),$
where Φ is the standard normal cumulative distribution function. Thus,

$Φ^{- 1} (P (y \leq c_{j})) = α_{j} - β X,$
where α_j corresponds to the cut points of the latent variable and the intercept in the regression model. This only holds under the assumptions of a normal latent variable and parallel regression. More generally, for a response variable with k categories and multiple predictors, the ordered probit model is
$\begin{array}{l} Φ^{- 1} (P (y \leq c_{1})) = α_{1} + β_{1} X_{1} + \dots + β_{p} X_{p}, \\ Φ^{- 1} (P (y \leq c_{2})) = α_{2} + β_{1} X_{1} + \dots + β_{p} X_{p}, \\ ⋮ \\ Φ^{- 1} (P (y \leq c_{k - 1})) = α_{k - 1} + β_{1} X_{1} + \dots + β_{p} X_{p}, \end{array}$
where P(y ≤ c_j) = π₁+ π₂ + ... + π_j.
The coefficients indicate the impact of a unit change in the predictor variable on the probability of each response category. In the latent variable formulation y* = βX + ε, a positive coefficient, β₁ for example, indicates an increase in the underlying latent variable with an increase in the corresponding predictor variable, X₁. Hence, it causes a decrease in the probability of the lowest category, P(y ≤ c₁) = P(y = c₁), and an increase in the probability of the highest category, P(y = c_k).
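The latent variable derivation can be checked numerically. The Python sketch below, which uses only the standard library, simulates y* = βX + ε with standard normal errors, applies the cut points, and compares the simulated cumulative frequencies with Φ(α_j − βX). It then shows how a positive coefficient shifts probability from the lowest category to the highest. The cut points, coefficient, and predictor values are hypothetical, and the code illustrates the formulas rather than how fitmnr or predict works internally.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def cumulative_probs_probit(cuts, beta, x):
    """Return P(y <= c_j) = Phi(alpha_j - beta . x) for each finite cut point."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return [PHI(a - eta) for a in cuts]

def simulate_cumulative(cuts, beta, x, n=50000, seed=0):
    """Estimate P(y <= c_j) by simulating the latent variable y* = beta . x + eps."""
    rng = random.Random(seed)
    eta = sum(b * xi for b, xi in zip(beta, x))
    counts = [0] * len(cuts)
    for _ in range(n):
        y_star = eta + rng.gauss(0.0, 1.0)
        for j, a in enumerate(cuts):
            if y_star <= a:      # y falls in category j or an earlier one
                counts[j] += 1
    return [c / n for c in counts]

# Hypothetical values: k = 3 categories, so two finite cut points alpha_1 < alpha_2.
cuts = [-0.5, 1.0]
beta = [0.7]        # positive coefficient on the single predictor X1
x_low = [0.0]
x_high = [1.0]      # X1 increased by one unit

exact_low = cumulative_probs_probit(cuts, beta, x_low)
exact_high = cumulative_probs_probit(cuts, beta, x_high)
simulated_low = simulate_cumulative(cuts, beta, x_low)

print("exact P(y <= c_j) at x_low:    ", exact_low)
print("simulated P(y <= c_j) at x_low:", simulated_low)
print("P(y = c_1): %.4f -> %.4f" % (exact_low[0], exact_high[0]))
print("P(y = c_k): %.4f -> %.4f" % (1 - exact_low[-1], 1 - exact_high[-1]))
```

In this sketch, increasing X₁ lowers P(y = c₁) and raises P(y = c_k), in line with the interpretation of a positive coefficient in the latent variable formulation.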

After estimating the model coefficients by using fitmnr to create a MultinomialRegression model object, you can estimate the cumulative probabilities in each category by using predict with the name-value argument ProbabilityType="cumulative". predict accepts the MultinomialRegression model object returned by fitmnr, and estimates the category labels, categorical probabilities, and confidence bounds for each categorical probability. You can specify whether predict returns category, cumulative, or conditional probabilities using the ProbabilityType name-value argument.

