This section focuses on using likelihood-based methods for multivariate normal regression. The parameters of the regression model are estimated via maximum likelihood estimation. For multiple series, this requires iteration until convergence. The complication due to the possibility of missing data is incorporated into the analysis with a variant of the EM algorithm known as the ECM algorithm.

The underlying theory of maximum likelihood estimation and the definition and significance of the Fisher information matrix can be found in Caines [1] and Cramér [2]. The underlying theory of the ECM algorithm can be found in Meng and Rubin [8] and Sexton and Swensen [9].


Suppose that you have a multivariate normal linear regression model in the form

$$\left[\begin{array}{c}{Z}_{1}\\ \vdots \\ {Z}_{m}\end{array}\right]\sim N\left(\left[\begin{array}{c}{H}_{1}b\\ \vdots \\ {H}_{m}b\end{array}\right],\left[\begin{array}{ccc}C& & 0\\ & \ddots & \\ 0& & C\end{array}\right]\right),$$

where the model has *m* observations of *n*-dimensional random variables *Z*_{1}, ..., *Z*_{m} with a linear regression model that has a *p*-dimensional model parameter vector *b*. In addition, the model has a sequence of *m* design matrices *H*_{1}, ..., *H*_{m}, where each design matrix is a known *n*-by-*p* matrix.

Given a parameter vector *b* and a collection of design matrices, the collection of *m* independent variables *Z*_{k} is assumed to have independent identically distributed multivariate normal residual errors *Z*_{k} − *H*_{k}*b* with *n*-vector mean `0` and *n*-by-*n* covariance matrix *C*. A concise way to write this model is

$${Z}_{k}\sim N\left({H}_{k}b,\text{\hspace{0.17em}}C\right)$$

for *k* = 1, ..., *m*.
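As a concrete illustration, the model *Z*_{k} ∼ *N*(*H*_{k}*b*, *C*) can be simulated directly. The numpy sketch below is purely illustrative; all sizes, seeds, and parameter values are hypothetical:

```python
import numpy as np

# Hypothetical sizes: m observations of n-dimensional responses with
# a p-dimensional parameter vector b (all values are illustrative).
rng = np.random.default_rng(0)
m, n, p = 200, 3, 2

b = np.array([1.0, -0.5])                  # true model parameters
C = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.0]])            # true n-by-n residual covariance

H = rng.normal(size=(m, n, p))             # m known n-by-p design matrices
# Draw Z_k ~ N(H_k b, C) for k = 1, ..., m
Z = np.einsum('knp,p->kn', H, b) + rng.multivariate_normal(np.zeros(n), C, size=m)
print(Z.shape)   # (200, 3)
```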

The goal of multivariate normal regression is to obtain maximum likelihood estimates for
*b* and *C*
given a collection of *m* observations *z*_{1}, ..., *z*_{m} of the random variables *Z*_{1}, ..., *Z*_{m}.

**Note**

Quasi-maximum likelihood estimation works with the same models but with a relaxation of the assumption of normally distributed residuals. In this case, however, the parameter estimates remain asymptotically optimal.

To estimate the parameters of the multivariate normal linear regression model using maximum
likelihood estimation, it is necessary to maximize the log-likelihood function over
the estimation parameters given observations *z*_{1}, ..., *z*_{m}.

Given the multivariate normal model to characterize residual errors in the regression model, the log-likelihood function is

$$\begin{array}{c}L\left({z}_{1},\dots ,{z}_{m};\text{\hspace{0.17em}}b,\text{\hspace{0.17em}}C\right)=-\frac{1}{2}mn\mathrm{log}\left(2\pi \right)-\frac{1}{2}m\mathrm{log}\left(\mathrm{det}\left(C\right)\right)\\ -\frac{1}{2}{\displaystyle \sum _{k=1}^{m}{\left({z}_{k}-{H}_{k}b\right)}^{T}{C}^{-1}\left({z}_{k}-{H}_{k}b\right)}.\end{array}$$

Although the cross-sectional residuals must be independent, you can use this log-likelihood
function for quasi-maximum likelihood estimation. In this case, the estimates for
the parameters *b* and *C* provide estimates to characterize the first and second moments of
the residuals. See Caines [1] for details.
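The log-likelihood function can be transcribed into numpy directly. The sketch below is a minimal illustration (with the conventional negative signs, so the function is to be maximized); the function name and array layout are assumptions:

```python
import numpy as np

def mvn_loglik(Z, H, b, C):
    """Log-likelihood of the multivariate normal regression model.

    Z : (m, n) observations; H : (m, n, p) design matrices;
    b : (p,) parameters; C : (n, n) residual covariance.
    """
    m, n = Z.shape
    Cinv = np.linalg.inv(C)
    resid = Z - np.einsum('knp,p->kn', H, b)            # z_k - H_k b
    quad = np.einsum('kn,nj,kj->', resid, Cinv, resid)  # sum of quadratic forms
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (m * n * np.log(2 * np.pi) + m * logdet + quad)
```

For *n* = 1 with identity designs, this reduces to the familiar scalar normal log-likelihood, which provides a quick sanity check.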

Except for a special case (see Special Case of Multiple Linear Regression Model), if both the model parameters in *b* and the
covariance parameters in *C* are to be estimated,
the estimation problem is intractably nonlinear and a solution must use iterative
methods. Denote estimates for the parameters *b* and *C* for iteration *t* = 0, 1, ... with the superscript notation *b*^{(t)} and *C*^{(t)}.

Given initial estimates *b*^{(0)} and
*C*^{(0)} for the
parameters, the maximum likelihood estimates for *b* and *C* are obtained using a
two-stage iterative process with

$${b}^{\left(t+1\right)}={\left({\displaystyle \sum _{k=1}^{m}{H}_{k}{}^{T}{\left({C}^{\left(t\right)}\right)}^{-1}{H}_{k}}\right)}^{-1}\left({\displaystyle \sum _{k=1}^{m}{H}_{k}{}^{T}{\left({C}^{\left(t\right)}\right)}^{-1}{z}_{k}}\right)$$

and

$${C}^{\left(t+1\right)}=\frac{1}{m}{\displaystyle \sum _{k=1}^{m}\left({z}_{k}-{H}_{k}{b}^{\left(t+1\right)}\right){\left({z}_{k}-{H}_{k}{b}^{\left(t+1\right)}\right)}^{T}}$$

for *t* = 0, 1, ... .
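The two-stage iterative process can be sketched in numpy as follows. This is a minimal sketch for the complete-data case, so it amounts to iterated generalized least squares rather than the full ECM algorithm; the function name, tolerances, and starting values are assumptions:

```python
import numpy as np

def mvn_regress(Z, H, tol=1e-8, max_iter=100):
    """Two-stage ML iteration for Z_k ~ N(H_k b, C) with complete data.

    Z : (m, n) observations; H : (m, n, p) design matrices.
    Returns the estimates (b, C).
    """
    m, n = Z.shape
    p = H.shape[2]
    b, C = np.zeros(p), np.eye(n)          # illustrative initial estimates
    for _ in range(max_iter):
        Cinv = np.linalg.inv(C)
        # b update: (sum H_k' Cinv H_k)^{-1} (sum H_k' Cinv z_k)
        A = np.einsum('knp,nj,kjq->pq', H, Cinv, H)
        r = np.einsum('knp,nj,kj->p', H, Cinv, Z)
        b_new = np.linalg.solve(A, r)
        # C update: mean outer product of residuals at b_new
        resid = Z - np.einsum('knp,p->kn', H, b_new)
        C_new = resid.T @ resid / m
        done = (np.linalg.norm(b_new - b) < tol and
                np.linalg.norm(C_new - C) < tol)
        b, C = b_new, C_new
        if done:
            break
    return b, C
```

On simulated data with a known *b* and *C*, the iteration recovers the true parameters to within sampling error.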

The special case mentioned in Maximum Likelihood Estimation occurs if
*n* = 1 so that the sequence of observations is a sequence of
scalar observations. This model is known as a multiple linear regression model. In
this case, the covariance matrix *C* is a `1`-by-`1` matrix that drops out of the
maximum likelihood iterates so that a single-step estimate for *b* and *C* can be
obtained with converged estimates *b*^{(1)} and *C*^{(1)}.

Another simplification of the general model is called least-squares regression. If *b*^{(0)} = `0` and *C*^{(0)} = *I*, then *b*^{(1)} and *C*^{(1)} from the two-stage iterative process
are least-squares estimates for *b* and *C*, where

$${b}^{LS}={\left({\displaystyle \sum _{k=1}^{m}{H}_{k}{}^{T}{H}_{k}}\right)}^{-1}\left({\displaystyle \sum _{k=1}^{m}{H}_{k}{}^{T}{z}_{k}}\right)$$

and

$${C}^{LS}=\frac{1}{m}{\displaystyle \sum _{k=1}^{m}\left({z}_{k}-{H}_{k}{b}^{LS}\right){\left({z}_{k}-{H}_{k}{b}^{LS}\right)}^{T}}.$$
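A quick numerical check of this claim: with *C*^{(0)} = *I*, the first *b* iterate reduces to the ordinary least-squares solution of the stacked system. A numpy sketch (data and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 50, 2, 2
H = rng.normal(size=(m, n, p))   # m known n-by-p design matrices
Z = rng.normal(size=(m, n))      # arbitrary observations

# First b iterate with C^(0) = I, i.e. the least-squares formulas above
A = np.einsum('knp,knq->pq', H, H)   # sum H_k' H_k
r = np.einsum('knp,kn->p', H, Z)     # sum H_k' z_k
b_ls = np.linalg.solve(A, r)
resid = Z - np.einsum('knp,p->kn', H, b_ls)
C_ls = resid.T @ resid / m
```

Stacking the *m* blocks into one tall regression and solving by ordinary least squares gives the same *b*, confirming the equivalence.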

A final simplification of the general model is to estimate the mean and covariance of a
sequence of *n*-dimensional observations *z*_{1}, ..., *z*_{m}. In this case, the number of series is equal to the number of model parameters, with *n* = *p* and design matrices *H*_{k} that are *n*-by-*n* identity matrices, so that *b* is an estimate for the mean and *C* is an estimate for the covariance of the collection of observations.

If the iterative process continues until the log-likelihood function increases by no more than
a specified amount, the resultant estimates are said to be maximum likelihood
estimates *b*^{ML} and *C*^{ML}.

If *n* = 1 (which implies a single data series),
convergence occurs after only one iterative step, which, in turn,
implies that the least-squares and maximum likelihood estimates are
identical. If, however, *n* > 1, the least-squares
and maximum likelihood estimates are usually distinct.

In Financial Toolbox™ software, both the changes in the log-likelihood function and the norm of the change in parameter estimates are monitored. Whenever both changes fall below specified tolerances (which should be something between machine precision and its square root), the toolbox functions terminate under an assumption that convergence has been achieved.

Since maximum likelihood estimates are formed from samples of random variables, their estimators are random variables; an estimate derived from such samples has an uncertainty associated with it. To characterize these uncertainties, which are called standard errors, two quantities are derived from the total log-likelihood function.

The Hessian of the total log-likelihood function is

$${\nabla}^{2}L\left({z}_{1},\dots ,{z}_{m};\text{\hspace{0.17em}}\theta \right)$$

and the Fisher information matrix is

$$I\left(\theta \right)=-E\left[{\nabla}^{2}L\left({z}_{1},\dots ,{z}_{m};\text{\hspace{0.17em}}\theta \right)\right],$$

where the partial derivatives of the $${\nabla}^{2}$$ operator are taken with respect to the combined parameter vector *θ*
that contains the distinct components of *b* and
*C*, with a total of *q* =
*p* + *n*(*n* + 1)/2
parameters.

Since maximum likelihood estimation is concerned with large-sample estimates, the central limit theorem applies to the estimates and the Fisher information matrix plays a key role in the sampling distribution of the parameter estimates. Specifically, maximum likelihood parameter estimates are asymptotically normally distributed such that

$$\left({\theta}^{\left(t\right)}-\theta \right)\sim N\left(0,\text{\hspace{0.17em}}{I}^{-1}\left({\theta}^{\left(t\right)}\right)\right)\text{ as }t\to \infty ,$$

where *θ* is the combined parameter vector and *θ*^{(t)} is
the estimate for the combined parameter vector at iteration *t* =
0, 1, ... .

The Fisher information matrix provides a lower bound, called a Cramér-Rao lower bound, for the standard errors of estimates of the model parameters.

Given an estimate for the combined parameter vector *θ*, the squared standard errors are the diagonal elements of the inverse of the Fisher information matrix

$${s}^{2}\left({\widehat{\theta}}_{i}\right)={\left({I}^{-1}\left(\widehat{\theta}\right)\right)}_{ii}$$

for *i* = 1, ..., *q*.

Since the standard errors are estimates for the standard deviations of the parameter estimates, you can construct confidence intervals so that, for example, a 95% interval for each parameter estimate is approximately

$${\widehat{\theta}}_{i}\pm 1.96s\left({\widehat{\theta}}_{i}\right)$$

for *i* = 1, ..., *q*.
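For the mean-estimation special case (*H*_{k} = *I*), the Fisher information for *b* is *m* *C*^{−1}, so standard errors and 95% intervals can be computed in a few lines. A numpy sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 400, 2
C_true = np.array([[1.0, 0.4],
                   [0.4, 2.0]])
Z = rng.multivariate_normal([0.5, -0.5], C_true, size=m)

b_hat = Z.mean(axis=0)                        # ML estimate of the mean
C_hat = np.cov(Z, rowvar=False, bias=True)    # ML estimate of the covariance
info_b = m * np.linalg.inv(C_hat)             # Fisher information for b
se = np.sqrt(np.diag(np.linalg.inv(info_b)))  # squared SEs = diagonal of I^-1
ci = np.column_stack([b_hat - 1.96 * se, b_hat + 1.96 * se])
```

Here the standard errors reduce to the familiar sqrt(diag(*C*)/​*m*), which serves as a check on the Fisher-information calculation.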

Error ellipses at a level of significance α ∈ [0, 1] for the parameter estimates satisfy the inequality

$${\left(\theta -\widehat{\theta}\right)}^{T}I\left(\widehat{\theta}\right)\left(\theta -\widehat{\theta}\right)\le {\chi}_{1-\alpha ,q}^{2}$$

and follow a $${\chi}^{2}$$ distribution with *q* degrees-of-freedom.
Similar inequalities can be formed for any subcollection of the parameters.
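As a sketch of how the ellipse inequality can be applied, the check below tests whether a candidate *θ* falls inside the 95% error ellipse of a two-parameter estimate, using the standard critical value χ²_{0.95,2} = 5.991; the information matrix and estimates are illustrative:

```python
import numpy as np

CHI2_95_2 = 5.991   # chi-square critical value, 95%, q = 2 degrees of freedom

def in_error_ellipse(theta, theta_hat, info):
    """True if theta satisfies (theta - theta_hat)' I (theta - theta_hat) <= chi2."""
    d = theta - theta_hat
    return bool(d @ info @ d <= CHI2_95_2)

theta_hat = np.array([0.5, -0.5])
info = np.array([[400.0, 0.0],
                 [0.0, 100.0]])   # illustrative Fisher information matrix
print(in_error_ellipse(np.array([0.5, -0.5]), theta_hat, info))  # True (center)
```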

In general, given parameter estimates, the computed Fisher information matrix, and the log-likelihood function, you can perform numerous statistical tests on the parameters, the model, and the regression.

`convert2sur` | `ecmlsrmle` | `ecmlsrobj` | `ecmmvnrfish` | `ecmmvnrmle` | `ecmmvnrobj` | `ecmmvnrstd` | `ecmnfish` | `ecmnhess` | `ecmninit` | `ecmnmle` | `ecmnobj` | `ecmnstd` | `mvnrfish` | `mvnrmle` | `mvnrobj` | `mvnrstd`