The multivariate time series models used in Econometrics Toolbox™ functions are based on linear, autoregressive models. The basic models are:
Model Name | Abbreviation | Equation |
---|---|---|
Vector Autoregressive | VAR(p) | $${y}_{t}=a+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\epsilon}_{t}$$ |
Vector Moving Average | VMA(q) | $${y}_{t}=a+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{\epsilon}_{t}$$ |
Vector Autoregressive Moving Average | VARMA(p, q) | $${y}_{t}=a+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{\epsilon}_{t}$$ |
Vector Autoregressive Moving Average with eXogenous inputs | VARMAX(p, q, r) | $${y}_{t}=a+{X}_{t}\cdot b+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{\epsilon}_{t}$$ |
Structural Vector Autoregressive Moving Average with eXogenous inputs | SVARMAX(p, q, r) | $${A}_{0}{y}_{t}=a+{X}_{t}\cdot b+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{B}_{0}{\epsilon}_{t}$$ |
The following variables appear in the equations:
y_{t} is the vector of response time series variables at time t. y_{t} has n elements.
a is a constant vector of offsets, with n elements.
A_{i} are n-by-n matrices for each i. The A_{i} are autoregressive matrices. There are p autoregressive matrices.
ε_{t} is a vector of serially uncorrelated innovations, vectors of length n. The ε_{t} are multivariate normal random vectors with a covariance matrix Q, where Q is an identity matrix, unless otherwise specified.
B_{j} are n-by-n matrices for each j. The B_{j} are moving average matrices. There are q moving average matrices.
X_{t} is an n-by-r matrix representing exogenous terms at each time t. r is the number of exogenous series. Exogenous terms are data (or other unmodeled inputs) in addition to the response time series y_{t}.
b is a constant vector of regression coefficients of size r. So the product X_{t}·b is a vector of size n.
Generally, the time series y_{t} and X_{t} are
observable. In other words, if you have data, it represents one or
both of these series. You do not always know the offset a,
coefficient b, autoregressive matrices A_{i},
and moving average matrices B_{j}.
You typically want to fit these parameters to your data. See the vgxvarx
function reference page for ways
to estimate unknown parameters. The innovations ε_{t} are
not observable, at least in data, though they can be observable in
simulations.
There is an equivalent representation of the linear autoregressive equations in terms of lag operators. The lag operator L moves the time index back by one: Ly_{t} = y_{t–1}. The operator L^{m} moves the time index back by m: L^{m}y_{t} = y_{t–m}.
In lag operator form, the equation for a SVARMAX(p, q, r) model becomes
$$\left({A}_{0}-{\displaystyle \sum _{i=1}^{p}{A}_{i}{L}^{i}}\right){y}_{t}=a+{X}_{t}b+\left({B}_{0}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{L}^{j}}\right){\epsilon}_{t}.$$
This equation can be written as
$$A(L){y}_{t}=a+{X}_{t}b+B(L){\epsilon}_{t},$$
where
$$A(L)={A}_{0}-{\displaystyle \sum _{i=1}^{p}{A}_{i}{L}^{i}}\text{and}B(L)={B}_{0}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{L}^{j}}.$$
$$\mathrm{det}\left({I}_{n}-{A}_{1}z-{A}_{2}{z}^{2}-\mathrm{...}-{A}_{p}{z}^{p}\right)\ne 0\text{for}\left|z\right|\le 1,$$
This condition implies that, with all innovations equal to zero, the VAR process converges to a as time goes on. See Lütkepohl [69] Chapter 2 for a discussion.
$$\mathrm{det}\left({I}_{n}+{B}_{1}z+{B}_{2}{z}^{2}+\mathrm{...}+{B}_{q}{z}^{q}\right)\ne 0\text{for}\left|z\right|\le 1.$$
This condition implies that the pure VAR representation of the process is stable. For an explanation of how to convert between VAR and VMA models, see Changing Model Representations. See Lütkepohl [69] Chapter 11 for a discussion of invertible VMA models.
A VARMA model is stable if its VAR part is stable. Similarly, a VARMA model is invertible if its VMA part is invertible.
There is no well-defined notion of stability or invertibility for models with exogenous inputs (e.g., VARMAX models). An exogenous input can destabilize a model.
To understand a multiple time series model, or multiple time series data, you generally perform the following steps:
Import and preprocess data.
Specify a model.
Specifying Models to
set up a model using vgxset
:
Specification Structures with Known Parameters to specify a model with known parameters
Specification Structures with No Parameter Values to specify a model when you want MATLAB^{®} to estimate the parameters
Specification Structures with Selected Parameter Values to specify a model where you know some parameters, and want MATLAB to estimate the others
Determining an Appropriate Number of Lags to determine an appropriate number of lags for your model
Fit the model to data.
Fitting Models to Data to use vgxvarx
to estimate the unknown parameters
in your models. This can involve:
Changing Model Representations to change your model to a
type that vgxvarx
handles
Estimating Structural Matrices
Analyze and forecast using the fitted model. This can involve:
Checking Stability to determine whether your model is stable and invertible
Forecasting with vgxpred to forecast directly from models
Forecasting with vgxsim to simulate a model
Generate Impulse Responses for a VAR model to calculate impulse responses, which give forecasts based on an assumed change in an input to a time series
Comparing Forecasts with Forecast Period Data to compare the results of your model's forecasts to your data
Your application need not involve all of the steps in this workflow. For example, you might not have any data, but want to simulate a parameterized model. In that case, you would perform only steps 2 and 4 of the generic workflow.
You might iterate through some of these steps.
Often, the first step in creating a multiple time series model is to obtain data. There are two types of multiple time series data:
Response data. Response data corresponds to y_{t} in the multiple time series models defined in Types of VAR Models.
Exogenous data. Exogenous data corresponds to X_{t} in the multiple time series models defined in Types of VAR Models. For details and examples on structuring preparing exogenous data for multivariate model analysis, see Multivariate Time Series Models with Regression Terms.
Before using Econometrics Toolbox functions with the data, put the data into the required form. Use standard MATLAB commands, or preprocess the data with a spreadsheet program, database program, PERL, or other utilities.
There are several freely available sources of data sets, such
as the St. Louis Federal Reserve Economics Database
(known as FRED): http://research.stlouisfed.org/fred2/
.
If you have a license, you can use Datafeed Toolbox™ functions
to access data from various sources.
Response data for multiple time series models must be in the form of a matrix. Each row of the matrix represents one time, and each column of the matrix represents one time series. The earliest data is the first row, the latest data is the last row. The data represents y_{t} in the notation of Types of VAR Models. If there are T times and n time series, put the data in the form of a T-by-n matrix:
$$\left[\begin{array}{cccc}Y{1}_{1}& Y{2}_{1}& \cdots & Y{n}_{1}\\ Y{1}_{2}& Y{2}_{2}& \cdots & Y{n}_{2}\\ \vdots & \vdots & \ddots & \vdots \\ Y{1}_{T}& Y{2}_{T}& \cdots & Y{n}_{T}\end{array}\right]$$
Y1_{t} represents time series 1,..., Yn_{t} represents time series n, 1 ≤ t ≤ T.
Multiple Paths. Response
time series data can have an extra dimension corresponding to separate,
independent paths. For this type of data, use a three-dimensional
array Y(t,j,p)
, where:
t
is the time index of an observation,
1 ≤ t
≤ T
.
j
is the index of a time series,
1 ≤ j
≤ n
.
p
is the path index, 1 ≤ p
≤ P
.
For any path p
, Y(t,j,p)
is
a time series.
The file Data_USEconModel
ships with Econometrics Toolbox software.
It contains time series from the St. Louis Federal Reserve Economics
Database (known as FRED).
Enter
load Data_USEconModel
to load the data into your MATLAB workspace. The following items load into the workspace:
Data
, a 249-by-14 matrix containing
the 14 time series,
DataTable
, a 249-by-14 tabular
array that packages the data,
dates
, a 249-element vector containing
the dates for Data
,
Description
, a character array
containing a description of the data series and the key to the labels
for each series,
series
, a 1-by-14 cell array of
labels for the time series.
Examine the data structures:
firstPeriod = dates(1) lastPeriod = dates(end)
firstPeriod = 711217 lastPeriod = 733863
dates
is a vector containing MATLAB serial
date numbers, the number of days since the putative date January 1,
0000. (This "date" is not a real date, but is convenient
for making date calculations; for more information, see Date Formats in the Financial Toolbox™ User's
Guide.)
The Data
matrix contains 14 columns.
These represent the time series labeled by the cell vector of strings series
.
FRED Series | Description |
---|---|
COE | Paid compensation of employees in $ billions |
CPIAUCSL | Consumer Price Index |
FEDFUNDS | Effective federal funds rate |
GCE | Government consumption expenditures and investment in $ billions |
GDP | Gross Domestic Product |
GDPDEF | Gross domestic product in $ billions |
GDPI | Gross private domestic investment in $ billions |
GS10 | Ten-year treasury bond yield |
HOANBS | Non-farm business sector index of hours worked |
M1SL | M1 money supply (narrow money) |
M2SL | M2 money supply (broad money) |
PCEC | Personal consumption expenditures in $ billions |
TB3MS | Three-month treasury bill yield |
UNRATE | Unemployment rate |
DataTable
is a tabular array containing the
same data as in Data
, but you can call variables
using the tabular array and the name of the variable. Use dot notation
to access a variable, for example, DataTable.UNRATE
calls
the unemployment rate time series.
Your data might have characteristics that violate assumptions for linear multiple time series models. For example, you can have data with exponential growth, or data from multiple sources at different periodicities. You must preprocess your data to convert it into an acceptable form for analysis.
For time series with exponential growth, you can preprocess the data by taking the logarithm of the growing series. In some cases you then difference the result. For an example, see Transforming Data for Stationarity.
For data from multiple sources, you must decide how best to fill in missing values. Commonly, you take the missing values as unchanged from the previous value, or as interpolated from neighboring values.
Note: If you take a difference of a series, the series becomes 1 shorter. If you take a difference of only some time series in a data set, truncate the other series so that all have the same length, or pad the differenced series with initial values. |
Testing Data for Stationarity. You can test each time series data column for stability using unit root tests. For details, see Unit Root Nonstationarity.
To fit a lagged model to data, partition the response data in up to three sections:
Presample data
Estimation data
Forecast data
The purpose of presample data is to provide initial values for
lagged variables. When trying to fit a model to the estimation data,
you need to access earlier times. For example, if the maximum lag
in a model is 4, and if the earliest time in the estimation data is
50, then the model needs to access data at time 46 when fitting the
observations at time 50. vgxvarx
assumes the
value 0 for any data that is not part of the presample data.
The estimation data contains the observations y_{t}. Use the estimation data to fit the model.
Use the forecast data for comparing fitted model predictions against data. You do not have to have a forecast period. Use one to validate the predictive power of a fitted model.
The following figure shows how to arrange the data in the data matrix, with j presample rows and k forecast rows.
Econometrics Toolbox functions require a model specification structure as an input before they simulate, calibrate, forecast, or perform other calculations. You can specify a model with or without time series data. If you have data, you can fit models to the data as described in VAR Model Estimation. If you do not have data, you can specify a model with parameters you provide, as described in Specification Structures with Known Parameters.
Create an Econometrics Toolbox multiple time series model
specification structure using the vgxset
function.
Use this structure for calibrating, simulating, predicting, analyzing,
and displaying multiple time series.
There are three ways to create model structures using the vgxset
function:
Specification Structures with Known Parameters. Use this method when you know the values of all relevant parameters of your model.
Specification Structures with No Parameter Values. Use this method when you know the size, type, and number of lags in your model, but do not know the values of any of the AR or MA coefficients, or the value of your Q matrix.
Specification Structures with Selected Parameter Values.
Use this method when you know the size, type, and number of lags in
your model, and also know some, but not all, of the values of AR or
MA coefficients. This method includes the case when you want certain
parameters to be zero. You can specify as many parameters as you like.
For example, you can specify certain parameters, request that vgxvarx
estimate
others, and specify other parameters with []
to
indicate default values.
If you know the values of model parameters, create a model structure
with the parameters. The following are the name-value pairs you can
pass to vgxset
for known parameter values:
Model Parameters
Name | Value |
---|---|
a | An |
AR0 | An |
AR | An |
MA0 | An |
MA | An |
b | An |
Q | An |
ARlag | A monotonically increasing |
MAlag | A monotonically increasing |
vgxset
infers the model dimensionality,
given by n, p, and q in Types of VAR Models, from the input parameters. These parameters
are n
, nAR
, and nMA
respectively
in vgxset
syntax. For more information, see Specification Structures with No Parameter Values.
The ARlag
and MAlag
vectors
allow you to specify which lags you want to include. For example,
to specify AR lags 1 and 3 without lag 2, set ARlag
to [1 3]
. This setting corresponds to nAR
= 2 for two specified lags, even though
this is a third order model, since the maximum lag is 3.
The following example shows how to create a model structure when you have known parameters. Consider a VAR(1) model:
$${y}_{t}=a+\left[\begin{array}{ccc}.5& 0& 0\\ .1& .1& .3\\ 0& .2& .3\end{array}\right]{y}_{t-1}+{\epsilon}_{t},$$
Specifically, a = [0.05, 0, –.05]'
and w_{t} are
distributed as standard three-dimensional normal random variables.
Create a model specification structure with vgxset
:
A1 = [.5 0 0;.1 .1 .3;0 .2 .3]; Q = eye(3); Mdl = vgxset('a',[.05;0;-.05],'AR',{A1},'Q',Q)
Mdl = Model: 3-D VAR(1) with Additive Constant n: 3 nAR: 1 nMA: 0 nX: 0 a: [0.05 0 -0.05] additive constants AR: {1x1 cell} stable autoregressive process Q: [3x3] covariance matrix
vgxset
identifies this model as a stable
VAR(1) model with three dimensions and additive constants.
By default, vgxvarx
fits all unspecified
additive (a), AR, regression coefficients (b),
and Q parameters. You must specify the number of time series and the
type of model you want vgxvarx
to fit. The following
are the name-value pairs you can pass to vgxset
for
unknown parameter values:
Model Orders
Name | Value |
---|---|
n | A positive integer specifying the number of time series.
The default is |
nAR | A nonnegative integer specifying the number of AR lags
(corresponds to p in Types of VAR Models). The default is |
nMA | A nonnegative integer specifying the number of MA lags
(corresponds to q in Types of VAR Models). The default is Currently, |
nX | A nonnegative integer specifying the number regression
parameters (corresponds to r in Types of VAR Models). The default is |
Constant | Additive offset logical indicator. The default is |
The following example shows how to specify the model in Specification Structures with Known Parameters, but without explicit parameters.
Mdl = vgxset('n',3,'nAR',1,'Constant',true)
Mdl = Model: 3-D VAR(1) with Additive Constant n: 3 nAR: 1 nMA: 0 nX: 0 a: [] AR: {} Q: []
You can create a model structure with some known parameters,
and have vgxvarx
fit the unknown parameters to
data. Here are the name-value pairs you can pass to vgxset
for
requested parameter values:
Model Parameter Estimation
Name | Value |
---|---|
asolve | An |
ARsolve | An |
AR0solve | An |
MAsolve | An |
MA0solve | An |
bsolve | An |
Qsolve | An |
Specify a logical 1 (true) for every parameter that you want vgxvarx
to
estimate.
Currently, vgxvarx
cannot fit the AR0
, MA0
,
or MA
matrices. Therefore, vgxvarx
ignores
the AR0solve
, MA0solve
, and MAsolve
indicators.
However, you can examine the Example_StructuralParams.m
file
for an approach to estimating the AR0
and MA0
matrices.
Enter help Example_StructuralParams
at the MATLAB command
line for information. See Lütkepohl [69] Chapter 9 for
algorithms for estimating structural models.
Currently, vgxvarx
also ignores the Qsolve
matrix. vgxvarx
can
fit either a diagonal or a full Q
matrix; see vgxvarx
.
This example shows how to specify the model in Specification Structures with Known Parameters, but
with requested AR parameters with a diagonal autoregressive structure.
The dimensionality of the model is known, as is the parameter vector a
,
but the autoregressive matrix A1
and covariance
matrix Q
are not known.
Mdl = vgxset('ARsolve',{logical(eye(3))},'a',... [.05;0;-.05])
Mdl = Model: 3-D VAR(1) with Additive Constant n: 3 nAR: 1 nMA: 0 nX: 0 a: [0.05 0 -0.05] additive constants AR: {} ARsolve: {1x1 cell of logicals} autoregressive lag indicators Q: []
After you set up a model structure, you can examine it in several ways:
Call the vgxdisp
function.
Double-click the structure in the MATLAB Workspace browser.
Call the vgxget
function.
Enter Spec
at the MATLAB command
line, where Spec
is the name of the model
structure.
Enter Spec.ParamName
at the MATLAB command
line, where Spec
is the name of the model
structure, and ParamName
is the name of
the parameter you want to examine.
You can change any part of a model structure named, for example, Spec
,
using the vgxset
as follows:
Spec = vgxset(Spec,'ParamName',value,...)
This syntax changes only the 'ParamName'
parts
of the Spec
structure.
There are two Econometrics Toolbox functions that can help you determine an appropriate number of lags for your models:
The lratiotest
function
performs likelihood ratio tests to help identify the appropriate number
of lags.
The aicbic
function
calculates the Akaike and Bayesian information criteria to determine
the minimal appropriate number of required lags.
Example: Using the Likelihood Ratio Test to Calculate the Minimal
Requisite Lag. lratiotest
requires inputs
of the loglikelihood of an unrestricted model, the loglikelihood of
a restricted model, and the number of degrees of freedom (DoF). DoF
is the difference in the number of active parameters between the unrestricted
and restricted models. lratiotest
returns a Boolean: 1
means
reject the restricted model in favor of the unrestricted model, 0
means
there is insufficient evidence to reject the restricted model.
In the context of determining an appropriate number of lags,
the restricted model has fewer lags, and the unrestricted model has
more lags. Otherwise, test models with the same type of fitted parameters
(for example, both with full Q
matrices, or both
with diagonal Q
matrices).
Obtain the loglikelihood (LLF) of a model as the third
output of vgxvarx
:
[EstSpec,EstStdErrors,LLF,W] = vgxvarx(...)
Obtain the number of active parameters in a model
as the second output of vgxcount
:
[NumParam,NumActive] = vgxcount(Spec)
For example, suppose you have four fitted models with varying
lag structures. The models have loglikelihoods LLF1
, LLF2
, LLF3
,
and LLF4
, and active parameter counts n1p
, n2p
, n3p
,
and n4p
. Suppose model 4 has the largest number
of lags. Perform likelihood ratio tests of models 1, 2, and 3 against
model 4, as follows:
reject1 = lratiotest(LLF4,LLF1,n4p - n1p) reject2 = lratiotest(LLF4,LLF2,n4p - n2p) reject3 = lratiotest(LLF4,LLF3,n4p - n3p)
If reject1
= 1
, you reject
model 1 in favor of model 4, and similarly for models 2 and 3. If
any of the models have rejectI
= 0
,
you have an indication that you can use fewer lags than in model 4.
Example: Using Akaike Information Criterion to Calculate the
Minimal Requisite Lag. aicbic
requires inputs of the loglikelihood
of a model and of the number of active parameters in the model. It
returns the value of the Akaike information criterion. Lower values
are better than higher values. aicbic
accepts
vectors of loglikelihoods and vectors of active parameters, and returns
a vector of Akaike information criteria, which makes it easy to find
the minimum.
Obtain the loglikelihood (LLF) of a model as the third
output of vgxvarx
:
[EstSpec,EstStdErrors,LLF,W] = vgxvarx(...)
Obtain the number of active parameters in a model
as the second output of vgxcount
:
[NumParam,NumActive] = vgxcount(Spec)
For example, suppose you have four fitted models with varying
lag structures. The models have loglikelihoods LLF1
, LLF2
, LLF3
,
and LLF4
, and active parameter counts n1p
, n2p
, n3p
,
and n4p
. Calculate the Akaike information criteria
as follows:
AIC = aicbic([LLF1 LLF2 LLF3 LLF4],[n1p n2p n3p n4p])
The most suitable model has the lowest value of the Akaike information criterion.
To create a model of multiple time series data, decide on a parametric form of the model, and fit parameters to the data. When you have a calibrated (fitted) model, check if the model fits the data adequately.
To fit a model to data, you must have:
Time series data, as described in Multivariate Time Series Data
At least one time series model specification structure, as described in Model Specification Structures
There are several Econometrics Toolbox functions that aid these tasks, including:
vgxvarx
, which
fits VARX models.
vgxar
and vgxma
, which convert models to pure AR or
MA models; vgxar
enables you to
fit VARMA models with vgxvarx
,
as described in Fit a VARMA Model
lratiotest
, lmtest
, waldtest
,
and aicbic
, which can help determine
the number of lags to include in a model.
vgxqual
, which
examines the stability of models, as described in Checking Model Adequacy.
vgxpred
, which
creates forecasts that can be used to check the adequacy of the fit,
as described in VAR Model Forecasting, Simulation, and Analysis
Structural Matrices. The structural matrices in SVARMAX models are the A_{0} and B_{0} matrices.
See Types of VAR Models for definitions
of these terms. Currently, vgxvarx
cannot fit
these matrices to data. However, you can examine the Example_StructuralParams.m
file
for an approach to estimating the AR0
and MA0
matrices.
Enter help Example_StructuralParams
at the MATLAB command
line for information. See Lütkepohl [69] Chapter 9 for
algorithms for estimating structural models.
You can convert a VARMA model to an equivalent VAR model using
the vgxar
function. (See Types of VAR Models for definitions of
these terms.) Similarly, you can convert a VARMA model to an equivalent
VMA model using the vgxma
function.
These functions are useful in the following situations:
Calibration of models
The vgxvarx
function can
calibrate only VAR and VARX models. So to calibrate a VARMA model,
you could first convert it to a VAR model. However, you can ask vgxvarx
to
ignore VMA terms and fit just the VAR structure. See Fit a VARMA Model for a comparison of conversion versus
no conversion.
Forecasting
It is straightforward to generate forecasts for VMA models.
In fact, vgxpred
internally converts models to
VMA models to calculate forecast statistics.
Analyzing models
Sometimes it is easier to define your model using one structure, but you want to analyze it using a different structure.
The algorithm for conversion between models involves series
that are, in principle, infinite. The vgxar
and vgxma
functions
truncate these series to the maximum of nMA
and nAR
,
introducing an inaccuracy. You can specify that the conversion give
more terms, or give terms to a specified accuracy. See [69] for more information
on these transformations.
Convert a VARMA Model to a VAR Model.
This example creates a VARMA model, then converts it to a pure VAR model.
Create a VARMA model specification structure.
A1 = [.2 -.1 0;.1 .2 .05;0 .1 .3]; A2 = [.3 0 0;.1 .4 .1;0 0 .2]; A3 = [.4 .1 -.1;.2 -.5 0;.05 .05 .2]; MA1 = .2*eye(3); MA2 = [.3 .2 .1;.2 .4 0;.1 0 .5]; Spec = vgxset('AR',{A1,A2,A3},'MA',{MA1,MA2})
Spec = Model: 3-D VARMA(3,2) with No Additive Constant n: 3 nAR: 3 nMA: 2 nX: 0 AR: {3x1 cell} stable autoregressive process MA: {2x1 cell} invertible moving average process Q: []
Convert the structure to a pure VAR structure:
SpecAR = vgxar(Spec)
SpecAR = Model: 3-D VAR(3) with No Additive Constant n: 3 nAR: 3 nMA: 0 nX: 0 AR: {3x1 cell} unstable autoregressive process Q: []
The converted process is unstable; see the AR row. An unstable model could yield inaccurate predictions. Taking more terms in the series gives a stable model:
specAR4 = vgxar(Spec,4)
specAR4 = Model: 3-D VAR(4) with No Additive Constant n: 3 nAR: 4 nMA: 0 nX: 0 AR: {4x1 cell} stable autoregressive process Q: []
Convert a VARMA Model to a VMA Model.
This example uses a VARMA model and converts it to a pure VMA model.
Create a VARMA model specification structure.
A1 = [.2 -.1 0;.1 .2 .05;0 .1 .3]; A2 = [.3 0 0;.1 .4 .1;0 0 .2]; A3 = [.4 .1 -.1;.2 -.5 0;.05 .05 .2]; MA1 = .2*eye(3); MA2 = [.3 .2 .1;.2 .4 0;.1 0 .5]; Spec = vgxset('AR',{A1,A2,A3},'MA',{MA1,MA2})
Spec = Model: 3-D VARMA(3,2) with No Additive Constant n: 3 nAR: 3 nMA: 2 nX: 0 AR: {3x1 cell} stable autoregressive process MA: {2x1 cell} invertible moving average process Q: []
Convert the structure to a pure VAR structure:
SpecAR = vgxar(Spec)
SpecAR = Model: 3-D VAR(3) with No Additive Constant n: 3 nAR: 3 nMA: 0 nX: 0 AR: {3x1 cell} unstable autoregressive process Q: []
Convert the model specification structure Spec
to a pure MA structure:
SpecMA = vgxma(Spec)
SpecMA = Model: 3-D VMA(3) with No Additive Constant n: 3 nAR: 0 nMA: 3 nX: 0 MA: {3x1 cell} non-invertible moving average process Q: []
Notice that the pure VMA process has more MA terms than the original process. The number is the maximum of nMA
and nAR
, and nAR = 3
.
The converted VMA model is not invertible; see the MA
row. A noninvertible model can yield inaccurate predictions. Taking more terms in the series results in an invertible model.
specMA4 = vgxma(Spec,4)
specMA4 = Model: 3-D VMA(4) with No Additive Constant n: 3 nAR: 0 nMA: 4 nX: 0 MA: {4x1 cell} invertible moving average process Q: []
Converting the resulting VMA model to a pure VAR model results in the same VAR(3) model as SpecAR
.
SpecAR2 = vgxar(SpecMA); vgxdisp(SpecAR,SpecAR2)
Model 1: 3-D VAR(3) with No Additive Constant Conditional mean is not AR-stable and is MA-invertible Model 2: 3-D VAR(3) with No Additive Constant Conditional mean is not AR-stable and is MA-invertible Parameter Model 1 Model 2 -------------- -------------- -------------- AR(1)(1,1) 0.4 0.4 (1,2) -0.1 -0.1 (1,3) -0 -0 (2,1) 0.1 0.1 (2,2) 0.4 0.4 (2,3) 0.05 0.05 (3,1) -0 -0 (3,2) 0.1 0.1 (3,3) 0.5 0.5 AR(2)(1,1) 0.52 0.52 (1,2) 0.22 0.22 (1,3) 0.1 0.1 (2,1) 0.28 0.28 (2,2) 0.72 0.72 (2,3) 0.09 0.09 (3,1) 0.1 0.1 (3,2) -0.02 -0.02 (3,3) 0.6 0.6 AR(3)(1,1) 0.156 0.156 (1,2) -0.004 -0.004 (1,3) -0.18 -0.18 (2,1) 0.024 0.024 (2,2) -0.784 -0.784 (2,3) -0.038 -0.038 (3,1) -0.01 -0.01 (3,2) 0.014 0.014 (3,3) -0.17 -0.17 Q(:,:) [] []
Conversion Types and Accuracy. Some conversions occur when explicitly requested, such as those
initiated by calls to vgxar
and vgxma
.
Other conversions occur automatically as needed for calculations.
For example, vgxpred
internally converts models
to VMA models to calculate forecast statistics.
By default, conversions give terms up to the largest lag present in the model. However, for more accuracy in conversion, you can specify that the conversion use more terms. You can also specify that it continue until a residual term is below a threshold you set. The syntax is
SpecAR = vgxar(Spec,nAR,ARlag,Cutoff) SpecMA = vgxma(Spec,nMA,MAlag,Cutoff)
nMA
and nAR
represent
the number of terms in the series.
ARlag
and MAlag
are
vectors of particular lags that you want in the converted model.
Cutoff
is a positive parameter
that truncates the series if the norm of a converted term is smaller
than Cutoff
. Cutoff
is 0 by
default.
For details, see the function reference pages for vgxar
and vgxma
.
The vgxvarx
function performs
parameter estimation. vgxvarx
only estimates
parameters for VAR and VARX models. In other words, vgxvarx
does
not estimate moving average matrices, which appear, for example, in
VMA and VARMA models. For definitions of these terms, see Types of VAR Models.
The vgxar
function converts
a VARMA model to a VAR model. Currently, it does not handle VARMAX
models.
You have two choices in fitting parameters to a VARMA model or VARMAX model:
Set the vgxvarx
'IgnoreMA'
parameter
to 'yes'
(the default is 'no'
).
In this case vgxvarx
ignores VMA parameters,
and fits the VARX parameters.
Convert a VARMA model to a VAR model using vgxar
.
Then fit the resulting VAR model using vgxvarx
.
Each of these options is effective on some data. Try both if you have VMA terms in your model. See Fit a VARMA Model for an example showing both options.
Fit a VAR Model.
This example uses two series: the consumer price index (CPI) and the unemployment rate (UNRATE) from the data set Data_USEconmodel
.
Obtain the two time series, and convert them for stationarity:
load Data_USEconModel
cpi = DataTable.CPIAUCSL;
cpi = log(cpi);
dCPI = diff(cpi);
unem = DataTable.UNRATE;
Y = [dCPI,unem(2:end)];
Create a VAR model:
Spec = vgxset('n',2,'nAR',4,'Constant',true)
Spec = Model: 2-D VAR(4) with Additive Constant n: 2 nAR: 4 nMA: 0 nX: 0 a: [] AR: {} Q: []
Fit the model to the data using vgxvarx
:
[EstSpec,EstStdErrors,logL,W] = vgxvarx(Spec,Y); vgxdisp(EstSpec)
Model : 2-D VAR(4) with Additive Constant Conditional mean is AR-stable and is MA-invertible a Constant: 0.00184568 0.315567 AR(1) Autoregression Matrix: 0.308635 -0.0032011 -4.48152 1.34343 AR(2) Autoregression Matrix: 0.224224 0.00124669 7.19015 -0.26822 AR(3) Autoregression Matrix: 0.353274 0.00287036 1.48726 -0.227145 AR(4) Autoregression Matrix: -0.0473456 -0.000983119 8.63672 0.0768312 Q Innovations Covariance: 3.51443e-05 -0.000186967 -0.000186967 0.116697
Fit a VARMA Model.
This example uses artificial data to generate a time series, then shows two ways of fitting a VARMA model to the series.
Specify the model:
AR1 = [.3 -.1 .05;.1 .2 .1;-.1 .2 .4]; AR2 = [.1 .05 .001;.001 .1 .01;-.01 -.01 .2]; Q = [.2 .05 .02;.05 .3 .1;.02 .1 .25]; MA1 = [.5 .2 .1;.1 .6 .2;0 .1 .4]; MA2 = [.2 .1 .1; .05 .1 .05;.02 .04 .2]; Spec = vgxset('AR',{AR1,AR2},'Q',Q,'MA',{MA1,MA2})
Spec = Model: 3-D VARMA(2,2) with No Additive Constant n: 3 nAR: 2 nMA: 2 nX: 0 AR: {2x1 cell} stable autoregressive process MA: {2x1 cell} invertible moving average process Q: [3x3] covariance matrix
Generate a time series using vgxsim
:
YF = [100 50 20;110 52 22;119 54 23]; % starting values rng(1); % For reproducibility Y = vgxsim(Spec,190,[],YF);
Fit the data to a VAR model by calling vgxvarx
with the 'IgnoreMA'
option:
estSpec = vgxvarx(Spec,Y(3:end,:),[],Y(1:2,:),'IgnoreMA','yes');
Compare the estimated model with the original:
vgxdisp(Spec,estSpec)
Model 1: 3-D VARMA(2,2) with No Additive Constant Conditional mean is AR-stable and is MA-invertible Model 2: 3-D VAR(2) with No Additive Constant Conditional mean is AR-stable and is MA-invertible Parameter Model 1 Model 2 -------------- -------------- -------------- AR(1)(1,1) 0.3 0.723964 (1,2) -0.1 0.119695 (1,3) 0.05 0.10452 (2,1) 0.1 0.0828041 (2,2) 0.2 0.788177 (2,3) 0.1 0.299648 (3,1) -0.1 -0.138715 (3,2) 0.2 0.397231 (3,3) 0.4 0.748157 AR(2)(1,1) 0.1 -0.126833 (1,2) 0.05 -0.0690256 (1,3) 0.001 -0.118524 (2,1) 0.001 0.0431623 (2,2) 0.1 -0.265387 (2,3) 0.01 -0.149646 (3,1) -0.01 0.107702 (3,2) -0.01 -0.304243 (3,3) 0.2 0.0165912 MA(1)(1,1) 0.5 (1,2) 0.2 (1,3) 0.1 (2,1) 0.1 (2,2) 0.6 (2,3) 0.2 (3,1) 0 (3,2) 0.1 (3,3) 0.4 MA(2)(1,1) 0.2 (1,2) 0.1 (1,3) 0.1 (2,1) 0.05 (2,2) 0.1 (2,3) 0.05 (3,1) 0.02 (3,2) 0.04 (3,3) 0.2 Q(1,1) 0.2 0.193553 Q(2,1) 0.05 0.0408221 Q(2,2) 0.3 0.252461 Q(3,1) 0.02 0.00690626 Q(3,2) 0.1 0.0922074 Q(3,3) 0.25 0.306271
The estimated Q
matrix is close to the original Q
matrix. However, the estimated AR terms are not close to the original AR terms. Specifically, nearly all the AR(2) coefficients are the opposite sign, and most AR(1) coefficients are off by about a factor of 2.
Alternatively, before fitting the model, convert it to a pure AR model. To do this, specify the model and generate a time series as above. Then, convert the model to a pure AR model:
specAR = vgxar(Spec);
Fit the converted model to the data:
estSpecAR = vgxvarx(specAR,Y(3:end,:),[],Y(1:2,:));
Compare the fitted model to the original model:
vgxdisp(specAR,estSpecAR)
Model 1: 3-D VAR(2) with No Additive Constant Conditional mean is AR-stable and is MA-invertible Model 2: 3-D VAR(2) with No Additive Constant Conditional mean is AR-stable and is MA-invertible Parameter Model 1 Model 2 -------------- -------------- -------------- AR(1)(1,1) 0.8 0.723964 (1,2) 0.1 0.119695 (1,3) 0.15 0.10452 (2,1) 0.2 0.0828041 (2,2) 0.8 0.788177 (2,3) 0.3 0.299648 (3,1) -0.1 -0.138715 (3,2) 0.3 0.397231 (3,3) 0.8 0.748157 AR(2)(1,1) -0.13 -0.126833 (1,2) -0.09 -0.0690256 (1,3) -0.114 -0.118524 (2,1) -0.129 0.0431623 (2,2) -0.35 -0.265387 (2,3) -0.295 -0.149646 (3,1) 0.03 0.107702 (3,2) -0.17 -0.304243 (3,3) 0.05 0.0165912 Q(1,1) 0.2 0.193553 Q(2,1) 0.05 0.0408221 Q(2,2) 0.3 0.252461 Q(3,1) 0.02 0.00690626 Q(3,2) 0.1 0.0922074 Q(3,3) 0.25 0.306271
The model coefficients between the pure AR models are closer than between the original VARMA model and the fitted AR model. Most model coefficients are within 20% or the original. Notice, too, that estSpec
and estSpecAR
are identical. This is because both are AR(2) models fitted to the same data series.
How vgxvarx Works. vgxvarx
finds maximum likelihood
estimators of AR
and Q
matrices
and the a
and b
vectors if present.
For VAR models and if the response series do not contain NaN
values, vgxvarx
uses
a direct solution algorithm that requires no iterations. For VARX
models or if the response data contain missing values, vgxvarx
optimizes
the likelihood using the expectation-conditional-maximization (ECM)
algorithm. The iterations usually converge quickly, unless two or
more exogenous data streams are proportional to each other. In that
case, there is no unique maximum likelihood estimator, and the iterations
might not converge. You can set the maximum number of iterations with
the MaxIter
parameter, which has a default value
of 1000. vgxvarx
does not support exogenous series
containing NaN
values.
vgxvarx
calculates the loglikelihood of
the data, giving it as an output of the fitted model. Use this output
in testing the quality of the model. For example, see Determining an Appropriate Number of Lags and Examining the Stability of a Fitted Model.
When you enter the name of a fitted model at the command line, you obtain a summary. This summary contains a report on the stability of the VAR part of the model, and the invertibility of the VMA part. You can also find whether a model is stable or invertible by entering:
[isStable,isInvertible] = vgxqual(Spec)
This gives a Boolean value of 1
for isStable
if
the model is stable, and a Boolean value of 1
for isInvertible
if
the model is invertible. This stability or invertibility does not
take into account exogenous terms, which can disrupt the stability
of a model.
Stable, invertible models give reliable results, while unstable or noninvertible ones might not.
Stability and invertibility are equivalent to all eigenvalues
of the associated lag operators having modulus less than 1. In fact vgxqual
evaluates
these quantities by calculating eigenvalues. For more information,
see the vgxqual
function reference
page or Hamilton [48]
When you have models with parameters (known or estimated), you can examine the predictions of the models. For information on specifying models, see Model Specification Structures. For information on calibrating models, see VAR Model Estimation.
The main methods of forecasting are:
Generating forecasts with error bounds using vgxpred
Generating simulations with vgxsim
Generating sample paths with vgxproc
These functions base their forecasts on a model specification and initial data. The functions differ in their innovations processes:
vgxpred
assumes zero innovations.
Therefore, vgxpred
yields a deterministic forecast.
vgxsim
assumes the innovations
are jointly normal with covariance matrix Q. vgxsim
yields
pseudorandom (Monte Carlo) sample paths.
vgxproc
uses a separate input
for the innovations process. vgxproc
yields a
sample path that is deterministically based on the innovations process.
vgxpred
is faster and takes less memory
than generating many sample paths using vgxsim
.
However, vgxpred
is not as flexible as vgxsim
.
For example, suppose you transform some time series before making
a model, and want to undo the transformation when examining forecasts.
The error bounds given by transforms of vgxpred
error
bounds are not valid bounds. In contrast, the error bounds given by
the statistics of transformed simulations are valid.
Forecast a VAR Model.
This example shows how to use vgxpred
to forecast a VAR model.
vgxpred
enables you to generate forecasts with error estimates. vgxpred
requires:
A fully-specified model (for example , impSpec
in what follows)
The number of periods for the forecast (for example, FT
in what follows)
vgxpred
optionally takes:
An exogenous data series
A presample time series (e.g., Y(end-3:end,:)
in what follows)
Extra paths
Load the Data_USEconModel
data set. This example uses two time series: the logarithm of real GDP, and the real 3-month T-bill rate, both differenced to be approximately stationary. Suppose that a VAR(4) model is appropriate to describe the time series.
load Data_USEconModel DEF = log(DataTable.CPIAUCSL); GDP = log(DataTable.GDP); rGDP = diff(GDP - DEF); % Real GDP is GDP - deflation TB3 = 0.01*DataTable.TB3MS; dDEF = 4*diff(DEF); % Scaling rTB3 = TB3(2:end) - dDEF; % Real interest is deflated Y = [rGDP,rTB3];
Fit a VAR(4) model specification:
Spec = vgxset('n',2,'nAR',4,'Constant',true); impSpec = vgxvarx(Spec,Y(5:end,:),[],Y(1:4,:)); impSpec = vgxset(impSpec,'Series',... {'Transformed real GDP','Transformed real 3-mo T-bill rate'});
Predict the evolution of the time series:
FDates = datenum({'30-Jun-2009'; '30-Sep-2009'; '31-Dec-2009'; '31-Mar-2010'; '30-Jun-2010'; '30-Sep-2010'; '31-Dec-2010'; '31-Mar-2011'; '30-Jun-2011'; '30-Sep-2011'; '31-Dec-2011'; '31-Mar-2012'; '30-Jun-2012'; '30-Sep-2012'; '31-Dec-2012'; '31-Mar-2013'; '30-Jun-2013'; '30-Sep-2013'; '31-Dec-2013'; '31-Mar-2014'; '30-Jun-2014' }); FT = numel(FDates); [Forecast,ForecastCov] = vgxpred(impSpec,FT,[],... Y(end-3:end,:));
View the forecast using vgxplot
:
vgxplot(impSpec,Y(end-10:end,:),Forecast,ForecastCov);
Forecast a VAR Model Using Monte Carlo Simulation.
This example shows how to use Monte Carlo simulation via vgxsim
to forecast a VAR model.
vgxsim
enables you to generate simulations of time series based on your model. If you have a trustworthy model structure, you can use these simulations as sample forecasts.
vgxsim requires:
A model (impSpec
in what follows)
The number of periods for the forecast (FT
in what follows)
vgxsim optionally takes:
An exogenous data series
A presample time series (Y(end-3:end,:)
in what follows)
Presample innovations
The number of realizations to simulate (2000
in what follows)
Load the Data_USEconModel
data set. This example uses two time series: the logarithm of real GDP, and the real 3-month T-bill rate, both differenced to be approximately stationary. For illustration, a VAR(4) model describes the time series.
load Data_USEconModel DEF = log(DataTable.CPIAUCSL); GDP = log(DataTable.GDP); rGDP = diff(GDP - DEF); % Real GDP is GDP - deflation TB3 = 0.01*DataTable.TB3MS; dDEF = 4*diff(DEF); % Scaling rTB3 = TB3(2:end) - dDEF; % Real interest is deflated Y = [rGDP,rTB3];
Fit a VAR(4) model specification:
Spec = vgxset('n',2,'nAR',4,'Constant',true); impSpec = vgxvarx(Spec,Y(5:end,:),[],Y(1:4,:)); impSpec = vgxset(impSpec,'Series',... {'Transformed real GDP','Transformed real 3-mo T-bill rate'});
Define the forecast horizon.
FDates = datenum({'30-Jun-2009'; '30-Sep-2009'; '31-Dec-2009'; '31-Mar-2010'; '30-Jun-2010'; '30-Sep-2010'; '31-Dec-2010'; '31-Mar-2011'; '30-Jun-2011'; '30-Sep-2011'; '31-Dec-2011'; '31-Mar-2012'; '30-Jun-2012'; '30-Sep-2012'; '31-Dec-2012'; '31-Mar-2013'; '30-Jun-2013'; '30-Sep-2013'; '31-Dec-2013'; '31-Mar-2014'; '30-Jun-2014' }); FT = numel(FDates);
Simulate the model for 10 steps, replicated 2000 times:
rng(1); %For reproducibility
Ysim = vgxsim(impSpec,FT,[],Y(end-3:end,:),[],2000);
Calculate the mean and standard deviation of the simulated series:
Ymean = mean(Ysim,3); % Calculate means Ystd = std(Ysim,0,3); % Calculate std deviations
Plot the means +/- 1 standard deviation for the simulated series:
subplot(2,1,1) plot(dates(end-10:end),Y(end-10:end,1),'k') hold('on') plot([dates(end);FDates],[Y(end,1);Ymean(:,1)],'r') plot([dates(end);FDates],[Y(end,1);Ymean(:,1)]+[0;Ystd(:,1)],'b') plot([dates(end);FDates],[Y(end,1);Ymean(:,1)]-[0;Ystd(:,1)],'b') datetick('x') title('Transformed real GDP') subplot(2,1,2) plot(dates(end-10:end),Y(end-10:end,2),'k') hold('on') axis([dates(end-10),FDates(end),-.1,.1]); plot([dates(end);FDates],[Y(end,2);Ymean(:,2)],'r') plot([dates(end);FDates],[Y(end,2);Ymean(:,2)]+[0;Ystd(:,2)],'b') plot([dates(end);FDates],[Y(end,2);Ymean(:,2)]-[0;Ystd(:,2)],'b') datetick('x') title('Transformed real 3-mo T-bill rate')
How vgxpred and vgxsim Work. vgxpred
generates two quantities:
A deterministic forecast time series based on 0 innovations
Time series of forecast covariances based on the Q matrix
The simulations for models with VMA terms uses presample innovation
terms. Presample innovation terms are values of ε_{t} for
times before the forecast period that affect the MA terms. For definitions
of the terms MA, Q, and ε_{t},
see Types of VAR Models. If you
do not provide all requisite presample innovation terms, vgxpred
assumes
the value 0 for missing terms.
vgxsim
generates random time series based
on the model using normal random innovations distributed with Q covariances.
The simulations of models with MA terms uses presample innovation
terms. If you do not provide all requisite presample innovation terms, vgxsim
assumes
the value 0 for missing terms.
If you scaled any time series before fitting a model, you can unscale the resulting time series to understand its predictions more easily.
If you scaled a series with log
,
transform predictions of the corresponding model with exp
.
If you scaled a series with diff(log)
,
transform predictions of the corresponding model with cumsum(exp)
. cumsum
is
the inverse of diff
; it calculates
cumulative sums. As in integration, you must choose an appropriate
additive constant for the cumulative sum. For example, take the log
of the final entry in the corresponding data series, and use it as
the first term in the series before applying cumsum
.
You can examine the effect of impulse responses to models with the vgxproc
function. An impulse response is
the deterministic response of a time series model to an innovations
process that has the value of one standard deviation in one component
at the initial time, and zeros in all other components and times. vgxproc
simulates
the evolution of a time series model from a given innovations process.
Therefore, vgxproc
is appropriate for examining
impulse responses.
The only difficulty in using vgxproc
is
determining exactly what is "the value of one standard deviation
in one component at the initial time." This value can mean
different things depending on your model.
For a structural model, B_{0} is usually a known diagonal matrix, and Q is an identity matrix. In this case, the impulse response to component i is the square root of B(i,i).
For a nonstructural model, there are several choices. The simplest choice, though not necessarily the most accurate, is to take component i as the square root of Q(i,i). Other possibilities include taking the Cholesky decomposition of Q, or diagonalizing Q and taking the square root of the diagonal matrix.
Generate Impulse Responses for a VAR model.
This example shows how to generate impulse responses of an interest rate shock on real GDP using vgxproc
.
Load the Data_USEconModel
data set. This example uses two time series: the logarithm of real GDP, and the real 3-month T-bill rate, both differenced to be approximately stationary. Suppose that a VAR(4) model is appropriate to describe the time series.
load Data_USEconModel DEF = log(DataTable.CPIAUCSL); GDP = log(DataTable.GDP); rGDP = diff(GDP - DEF); % Real GDP is GDP - deflation TB3 = 0.01*DataTable.TB3MS; dDEF = 4*diff(DEF); % Scaling rTB3 = TB3(2:end) - dDEF; % Real interest is deflated Y = [rGDP,rTB3];
Define the forecast horizon.
FDates = datenum({'30-Jun-2009'; '30-Sep-2009'; '31-Dec-2009'; '31-Mar-2010'; '30-Jun-2010'; '30-Sep-2010'; '31-Dec-2010'; '31-Mar-2011'; '30-Jun-2011'; '30-Sep-2011'; '31-Dec-2011'; '31-Mar-2012'; '30-Jun-2012'; '30-Sep-2012'; '31-Dec-2012'; '31-Mar-2013'; '30-Jun-2013'; '30-Sep-2013'; '31-Dec-2013'; '31-Mar-2014'; '30-Jun-2014' }); FT = numel(FDates);
Fit a VAR(4) model specification:
Spec = vgxset('n',2,'nAR',4,'Constant',true); impSpec = vgxvarx(Spec,Y(5:end,:),[],Y(1:4,:)); impSpec = vgxset(impSpec,'Series',... {'Transformed real GDP','Transformed real 3-mo T-bill rate'});
Generate the innovations processes both with and without an impulse (shock):
W0 = zeros(FT, 2); % Innovations without a shock W1 = W0; W1(1,2) = sqrt(impSpec.Q(2,2)); % Innovations with a shock
Generate the processes with and without the shock:
Yimpulse = vgxproc(impSpec,W1,[],Y); % Process with shock Ynoimp = vgxproc(impSpec,W0,[],Y); % Process with no shock
Undo the scaling for the GDP processes:
Yimp1 = exp(cumsum(Yimpulse(:,1))); % Undo scaling
Ynoimp1 = exp(cumsum(Ynoimp(:,1)));
Compute and plot the relative difference between the calculated GDPs:
RelDiff = (Yimp1 - Ynoimp1) ./ Yimp1; plot(FDates,100*RelDiff);dateaxis('x',12) title(... 'Impact of Interest Rate Shock on Real Gross Domestic Product') ylabel('% Change')
The graph shows that an increased interest rate causes a dip in the real GDP for a short time. Afterwards the real GDP begins to climb again, reaching its former level in about 1 year.
Incorporate feedback from exogenous predictors, or study their linear associations to the response series by including a regression component in multivariate time series models. By order of increasing complexity, examples of multivariate, time series, regression models include:
Modeling the effects of an intervention or to include shared intercepts among several responses. In these cases, the exogenous series are indicator variables.
Modeling the contemporaneous linear associations between a subset of exogenous series to each response. Applications include CAPM analysis and studying the effects of prices of items on their demand. These applications are examples of seemingly unrelated regression (SUR). For more details, see Implement Seemingly Unrelated Regression Analyses and Estimate the Capital Asset Pricing Model Using SUR.
Modeling the linear associations between contemporaneous and lagged, exogenous series and the response as part of a multivariate, distributed lag model. Applications include determining how a change in monetary growth affects real gross domestic product (GDP) and gross national income (GNI).
Any combination of SUR and the distributed lag model that includes the lagged effects of responses, also known as simultaneous equation models. VARMAX modeling is an example (see Types of VAR Models).
The general equation for a multivariate, time series, regression model is
$${y}_{t}=a+{X}_{t}\cdot b+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{\epsilon}_{t},$$
where, in particular,
X_{t} is an n-by-r design matrix.
Row j of X_{t} contains the observations of the regression variables that correspond to the period t observation of response series j.
Column k of X_{t} corresponds to the period t observations of regression variable k. (There are r regression variables composed from the exogenous series. For details, see Design Matrix Structure for Including Exogenous Data.)
X_{t} can contain lagged exogenous series.
b is an r-by-1 vector of regression coefficients corresponding to the r regression variables. The column entries of X_{t} share a common regression coefficient for all t. That is, the regression component of the response series (y_{t} = [y_{1t},y_{2t},...,y_{nt}]′) for period t is
$$\left[\begin{array}{c}X{(1,1)}_{t}{b}_{1}+\cdots +X{(1,r)}_{t}{b}_{r}\\ X{(2,1)}_{t}{b}_{1}+\cdots +X{(2,r)}_{t}{b}_{r}\\ \vdots \\ X{(n,1)}_{t}{b}_{1}+\cdots +X{(n,r)}_{t}{b}_{r}\end{array}\right].$$
a is an n-by-1 vector of intercepts corresponding to the n response series.
Overview. For maximum flexibility, construct a design matrix that linearly associates the exogenous series to each response series. It helps to think of the design matrix as a vector of T smaller, block design matrices. The rows of block design matrix t correspond to observation t of the response series, and the columns correspond to the regression coefficients of the regression variables.
vgxvarx
estimates the regression
component of multivariate time series models using the Statistics and Machine Learning Toolbox™ function mvregress
. Therefore, you must pass the
design matrix as a T-by-1 cell vector, where cell t is
the n-by-r numeric, block, design
matrix at period t, n is the
number of response series, and r is the number
of regression variables in the design. That
is, the structure of the entire design matrix is
$$\begin{array}{c}\text{Regressionvariables}{X}_{t}\\ \left\{\begin{array}{c}\left[\begin{array}{ccc}X{(1,1)}_{1}& \cdots & X{(1,r)}_{1}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{1}& \cdots & X{(n,r)}_{1}\end{array}\right]\\ \left[\begin{array}{ccc}X{(1,1)}_{2}& \cdots & X{(1,r)}_{2}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{2}& \cdots & X{(n,r)}_{2}\end{array}\right]\\ \vdots \\ \left[\begin{array}{ccc}X{(1,1)}_{T}& \cdots & X{(1,r)}_{T}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{T}& \cdots & X{(n,r)}_{T}\end{array}\right]\end{array}\right\}\end{array}$$
At each time t, the n-by-r matrix X_{t} multiplies the r-by-1 vector b, yielding an n-by-1 vector of linear combinations. This setup implies suggests that:
The number of regression variables might differ from the number of exogenous series. That is, you can associate different sets of exogenous series among response series.
Each block design matrix in the cell vector must have
the same dimensionality. That is, the multivariate time series framework
does not accommodate time-varying models. The state-space framework
does accommodate time-varying, multivariate time series models. For
details, see ssm
.
vgxinfer
, vgxpred
, vgxproc
,
and vgxsim
accommodate multiple
response paths. You can associate a common design matrix for all response
paths by passing in a cell vector of design matrices. You can also
associate a different design matrix to each response path by passing
in a T-by-M cell matrix of design
matrices, where M is the number of response paths
and cell (t,m) is an n-by-r numeric,
design matrix at period t (denoted X_{t}^{(m)}).
That is, the structure of the entire design matrix for all paths is
$$\begin{array}{c}\begin{array}{ccc}\text{Path}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}& \cdots \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}& \text{Path}M\end{array}\\ \left\{\begin{array}{ccc}\left[\begin{array}{ccc}X{(1,1)}_{1}^{(1)}& \cdots & X{(1,r)}_{1}^{(1)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{1}^{(1)}& \cdots & X{(n,r)}_{1}^{(1)}\end{array}\right],& \cdots & ,\left[\begin{array}{ccc}X{(1,1)}_{1}^{(M)}& \cdots & X{(1,r)}_{1}^{(M)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{1}^{(M)}& \cdots & X{(n,r)}_{1}^{(M)}\end{array}\right]\\ \left[\begin{array}{ccc}X{(1,1)}_{2}^{(1)}& \cdots & X{(1,r)}_{2}^{(1)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{2}^{(1)}& \cdots & X{(n,r)}_{2}^{(1)}\end{array}\right],& \cdots & ,\left[\begin{array}{ccc}X{(1,1)}_{2}^{(M)}& \cdots & X{(1,r)}_{2}^{(M)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{2}^{(M)}& \cdots & X{(n,r)}_{2}^{(M)}\end{array}\right]\\ \vdots & \ddots & \vdots \\ \left[\begin{array}{ccc}X{(1,1)}_{T}^{(1)}& \cdots & X{(1,r)}_{T}^{(1)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{T}^{(1)}& \cdots & X{(n,r)}_{T}^{(1)}\end{array}\right],& \cdots & ,\left[\begin{array}{ccc}X{(1,1)}_{T}^{(M)}& \cdots & X{(1,r)}_{T}^{(M)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{T}^{(M)}& \cdots & X{(n,r)}_{T}^{(M)}\end{array}\right]\end{array}\right\}\end{array}.$$
For more details on how to structure design matrices for mvregress
, see Set Up Multivariate Regression Problems.
Examples of Design Matrix Structures.
Intervention model — Suppose a tariff is imposed over some time period. You suspect that this tariff affects GNP and three other economic time series. To determine the effects of the tariff, use an intervention model, where the response series are the four econometric series, and the exogenous, regression variables are indicator variables representing the presence of the tariff in the system. Here are two ways of including the exogenous tariffs.
Responses share a regression coefficient —
Each block design matrix (or cell) consists of either ones(4,1)
or zeros(4,1)
,
where a 1
indicates that the tariff is in the system,
and 0
otherwise. That is, at period t,
a cell of the entire design matrix contains one of
$$\left[\begin{array}{c}1\\ 1\\ 1\\ 1\end{array}\right]\text{or}\left[\begin{array}{c}0\\ 0\\ 0\\ 0\end{array}\right].$$
Responses do not share regression coefficients —
Each block matrix (or cell) consists of either eye(4)
or zeros(4)
.
That is, at period t, a cell of the entire design
matrix contains one of
$$\left[\begin{array}{cccc}1& 0& 0& 0\\ 0& 1& 0& 0\\ 0& 0& 1& 0\\ 0& 0& 0& 1\end{array}\right]\text{or}\left[\begin{array}{cccc}0& 0& 0& 0\\ 0& 0& 0& 0\\ 0& 0& 0& 0\\ 0& 0& 0& 0\end{array}\right].$$
In this case, the sole exogenous, indicator variable expands
to four regression variables. The advantage of the larger (replicated)
formulation is that it allows for vgxvarx
to
estimate the influence of the tariff on each response series separately.
The resulting estimated regression coefficient vector $$\widehat{b}$$ can
have differing values for each component. The different values reflect
the different direct influences of the tariff on each time series.
Once you have the entire design matrix (denoted Design
),
you must put each block design matrix that composes Design
into
the respective cells of a T-by-1 cell vector (or
cell matrix for multiple paths). To do this, use mat2cell
. Specify to break up Design
into T
, 4
-by-size(Design,2)
block
design matrices using
DesignCell = mat2cell(Design,4*ones(T,1),size(Design,2))
DesignCell
is the properly structured regression
variables that you can now pass into vgxvarx
to
estimate the model parameters.
SUR that associates all exogenous series to each response
series — If the columns of a X
are exogenous
series, then, to associate all exogenous series to each response,
Create the entire design matrix by expanding X
using
its Kronecker product with the n-by-n identity
matrix, e.g., if there are four responses, then the entire design
matrix is
Design = kron(X,eye(4));
Put each block design matrix into the cells of a T-by-1
cell vector using mat2cell
. Each block matrix has
four rows and size(Design,2)
columns.
Linear trend — You can model linear trends
in your data by including the exogenous matrix eye(n)*t
in
cell t
of the entire design matrix.
Before you estimate a multivariate, time series, regression
model using vgxvarx
, specify the
number of regression variables in the created model. (For details
on specifying a multivariate time series model using vgxset
, see Model Specification Structures). Recall from Design Matrix Structure for Including Exogenous Data that
the number of regression variables in the model is the number of columns
in each design matrix denoted r. You can indicate
the number of regression variables several ways:
For a new model,
Specify the nX
name-value pair
argument as the number of regression variables when you create the
model using vgxset
.
Specify the bsolve
name-value
pair argument as the logical vector true(
r
,1)
vgxset
creates a multivariate time
series model object, and fills in the appropriate properties. In what
follows, Mdl
denotes a created multivariate time
aeries model in the Workspace.
For a model in the Workspace, set either of the nX
or bsolve
properties
to r or true(
,
respectively, using dot notation, e.g., r
,1)Mdl.nX =
.r
You can also exclude a subset of regression coefficient
from being estimated. For example, to exclude the first regression
coefficient, set 'bsolve',[false(1);true(
.
Be aware that if the model is new (i.e, r
-1,1)]Mdl.b
= []
),
then the software sets any coefficient it doesn't estimate
to 0
. To fix a coefficient to a value:
Enter
Mdl.b = ones(r,1);
Specify values for the elements you want to hold fixed
during estimation in the b
property. For example,
to specify that the first regression coefficient should be held at 2
during
estimation, enter
Mdl.b(1) = 2;
Enter
Mdl.bsolve = [false(1);true(r-1,1)];
The software does not estimate regression intercepts (a)
by default. To include a different regression intercept for each response
series, specify 'Constant',true
when you create
the model using vgxset
, or set the Constant
property
of a model in the Workspace to true
using dot notation.
Alternatively, you can specify 'asolve',true(n,1)
or
set the asolve
property to true(n,1)
.
To exclude a regression intercept from estimation, follow the same
steps as for excluding a regression coefficient.
To estimate the regression coefficients, pass the model, response
data, and the cell vector of design matrices (see Design Matrix Structure for Including Exogenous Data) to vgxvarx
. For details on how vgxvarx
works
when it estimates regression coefficients, see How vgxvarx Works.
Be aware that the presence of exogenous series in a multivariate time series model might destabilized the fitted model.
This example shows how to prepare exogenous data for several seemingly unrelated regression (SUR) analyses. The response and exogenous series are random paths from a standard Gaussian distribution.
In seemingly unrelated regression (SUR), each response variable is a function of a subset of the exogenous series, but not of any endogenous variable. That is, for and , the model for response at period is
The indices of the regression coefficients and exogenous predictors indicate that:
You can associate each response with a different subset of exogenous predictors.
The response series might not share intercepts or regression coefficients.
SUR accommodates intra-period innovation heteroscedasticity and correlation, but inter-period innovation independence and homoscedasticity, i.e.,
Simulate Data from the True Model
Suppose that the true model is
where , are multivariate Gaussian random variables each having mean zero and jointly having covariance matrix
Suppose that the paths represent different econometric measurements, e.g. stock returns.
Simulate four exogenous predictor paths from the standard Gaussian distribution.
rng(1); % For reproducibility n = 3; % Number of response series nExo = 4; % Number of exogenous series T = 100; X = randn(100,nExo);
The multivariate time series analysis functions of Econometrics Toolbox™ require you to input the exogenous data in a T
-by-1 cell vector. Cell
of the cell vector is a design matrix indicating the linear relationship of the exogenous variables with each response series at period
. Specifically, each design matrix in the cell array:
Has rows, each corresponding to a response series.
Has columns since, in this example, all exogenous variables are in the regression component of each response series.
To create the cell vector of design matrices for this case, first expand the exogenous predictor data by finding its Kronecker product with the -by- identity matrix.
ExpandX1 = kron(X,eye(n));
r1 = size(ExpandX1,2); % Number of regression variables
ExpandX1
is an
-by-
numeric matrix formed by multiplying each element of X
to the
-by-
identity matrix, and then putting the product in the corresponding position of a
-by-1 block matrix of
-by-
matrices.
Create the cell vector of design matrices by putting each consecutive
-by-
block matrices of ExpandX1
into the cells of a
-by-1 cell vector. Verify that one of the cells contains the expected design matrix (e.g. the third cell)).
CellX1 = mat2cell(ExpandX1,n*ones(T,1)); CellX1{3} X(3,:)
ans = Columns 1 through 7 -0.7585 0 0 1.9302 0 0 1.8562 0 -0.7585 0 0 1.9302 0 0 0 0 -0.7585 0 0 1.9302 0 Columns 8 through 12 0 0 1.3411 0 0 1.8562 0 0 1.3411 0 0 1.8562 0 0 1.3411 ans = -0.7585 1.9302 1.8562 1.3411
In period 3, all observed predictors are associated with each response series.
Create a multivariate time series model object that characterizes the true model using vgxset
.
aTrue = [1; -1; 0.5]; bTrue = [2; 4; -2; -1.5; 2.5; 0.5; 0.5; -1.75; -1.5; 0.75; -0.05; 0.7]; InnovCov = [1 0.5 -0.05; 0.5 1 0.25; -0.05 0.25 1]; TrueMdl = vgxset('n',n,'b',bTrue,'a',aTrue,'Q',InnovCov) Y = vgxsim(TrueMdl,100,CellX1);
TrueMdl = Model: 3-D VARMAX(0,0,12) with Additive Constant n: 3 nAR: 0 nMA: 0 nX: 12 a: [1 -1 0.5] additive constants b: [12x1] regression coefficients Q: [3x3] covariance matrix
SUR Using All Predictors for Each Response Series
Create a multivariate time series model suitable for SUR using vgxset
. You must specify the number of response series ('n'
), the number of regression variables ('nX'
), and whether to include different regression intercepts for each response series ('Constant'
).
Mdl1 = vgxset('n',n,'nX',r1,'Constant',true)
Mdl1 = Model: 3-D VARMAX(0,0,12) with Additive Constant n: 3 nAR: 0 nMA: 0 nX: 12 a: [] b: [] Q: []
Mdl1
is a multivariate time series model object. Unlike TrueMdl
, none of the coefficients, intercepts, and intra-period covariance matrix have values. Therefore, Mdl1
is suitable for estimation.
Estimate the regression coefficients using vgxvarx
. Extract the residuals. Display the estimated model using vgxdisp
[EstMdl1,~,~,W] = vgxvarx(Mdl1,Y,CellX1); vgxdisp(EstMdl1)
Model : 3-D VARMAX(0,0,12) with Additive Constant a Constant: 0.978981 -1.06438 0.453232 b Regression Parameter: 1.76856 3.85757 -2.20089 -1.55085 2.44067 0.464144 0.69588 -1.71386 -1.6414 0.670357 -0.0564374 0.565809 Q Innovations Covariance: 1.38503 0.667301 -0.159136 0.667301 0.973123 0.216492 -0.159136 0.216492 0.993384
EstMdl
is a multivariate time series model containing the estimated parameters. W
is a
-by-
matrix of residuals. By default, vgxvarx
models a full, intra-period innovations covariance matrix.
Alternatively, and in this case, you can use the backslash operator on X
and Y
. However, you must include a column of ones in X
for the intercepts.
coeff = [ones(T,1) X]\Y
coeff = 0.9790 -1.0644 0.4532 1.7686 3.8576 -2.2009 -1.5508 2.4407 0.4641 0.6959 -1.7139 -1.6414 0.6704 -0.0564 0.5658
coeff
is a nExo + 1
-by- n
matrix of estimated regression coefficients and intercepts. The estimated intercepts are in the first row, and the rest of the matrix contains the estimated regression coefficients
Compare all estimates to their true values.
fprintf('\n'); fprintf(' Intercepts \n'); fprintf(' True | vgxvarx | backslash\n'); fprintf('--------------------------------------\n'); for j = 1:n fprintf(' %8.4f | %8.4f | %8.4f\n',aTrue(j),EstMdl1.a(j),coeff(1,j)); end cB = coeff'; cB = cB(:); fprintf('\n'); fprintf(' Coefficients \n'); fprintf(' True | vgxvarx | backslash\n'); fprintf('--------------------------------------\n'); for j = 1:r1 fprintf(' %8.4f | %8.4f | %8.4f\n',bTrue(j),... EstMdl1.b(j),cB(n + j)); end fprintf('\n'); fprintf(' Innovations Covariance\n'); fprintf(' True | vgxvarx\n'); fprintf('----------------------------------------------------------\n'); for j = 1:n fprintf('%8.4f %8.4f %8.4f | %8.4f %8.4f %8.4f\n',... InnovCov(j,:),EstMdl1.Q(j,:)); end
Intercepts True | vgxvarx | backslash -------------------------------------- 1.0000 | 0.9790 | 0.9790 -1.0000 | -1.0644 | -1.0644 0.5000 | 0.4532 | 0.4532 Coefficients True | vgxvarx | backslash -------------------------------------- 2.0000 | 1.7686 | 1.7686 4.0000 | 3.8576 | 3.8576 -2.0000 | -2.2009 | -2.2009 -1.5000 | -1.5508 | -1.5508 2.5000 | 2.4407 | 2.4407 0.5000 | 0.4641 | 0.4641 0.5000 | 0.6959 | 0.6959 -1.7500 | -1.7139 | -1.7139 -1.5000 | -1.6414 | -1.6414 0.7500 | 0.6704 | 0.6704 -0.0500 | -0.0564 | -0.0564 0.7000 | 0.5658 | 0.5658 Innovations Covariance True | vgxvarx ---------------------------------------------------------- 1.0000 0.5000 -0.0500 | 1.3850 0.6673 -0.1591 0.5000 1.0000 0.2500 | 0.6673 0.9731 0.2165 -0.0500 0.2500 1.0000 | -0.1591 0.2165 0.9934
The estimates from implementing vgxvarx
and the backslash operator are the same, and are fairly close to their corresponding true values.
One way to check the relationship strength between the predictors and responses is to compute the coefficient of determination (i.e., the fraction of variation explained by the predictors), which is
where is the estimated variance of residual series , and is the estimated variance of response series .
R2 = 1 - sum(diag(cov(W)))/sum(diag(cov(Y)))
R2 = 0.9118
The SUR model and predictor data explain approximately 91% of the variation in the response data.
SUR Using a Unique Predictor for Each Response Series
For each period , create block design matrices such that response series is linearly associated to predictor series , . Put the block design matrices in cells of a -by-1 cell vector in chronological order.
CellX2 = cell(T,1); for j = 1:T CellX2{j} = diag(X(j,1:n)); end r2 = size(CellX2{1},2);
Create a multivariate time series model by using vgxset
and specifying the number of response series, the number of regression variables, and whether to include different regression intercepts for each response series.
Mdl2 = vgxset('n',n,'nX',r2,'Constant',true);
Estimate the regression coefficients using vgxvarx
. Display the estimated parameters. Compute the coefficient of determination.
[EstMdl2,~,~,W2] = vgxvarx(Mdl2,Y,CellX2); vgxdisp(EstMdl2) R2 = 1 - sum(diag(cov(W2)))/sum(diag(cov(Y)))
Model : 3-D VARMAX(0,0,3) with Additive Constant a Constant: 1.07752 -1.43445 0.674376 b Regression Parameter: 1.01491 3.83837 -2.71834 Q Innovations Covariance: 4.96205 4.91571 -1.86546 4.91571 20.8263 -11.0945 -1.86546 -11.0945 7.75392 R2 = 0.1177
The model and predictors explain approximately 12% of the variation in the response series. This should not be surprising since the model is not the same as the response-generating process.
SUR Using a Shared Intercept for All Response Series
Create block design matrices such that each response series is linearly associated to all predictor series . Prepend the resulting design matrix with a vector of ones representing the common intercept.
ExpandX3 = [ones(T*n,1) kron(X,eye(n))]; r3 = size(ExpandX3,2);
Put the block design matrices into the cells of a -by-1 cell vector in chronological order.
CellX3 = mat2cell(ExpandX3,n*ones(T,1));
Create a multivariate time series model by using vgxset
and specifying the number of response series and the number of regression variables. By default, vgxset
excludes regression intercepts.
Mdl3 = vgxset('n',n,'nX',r3);
Estimate the regression coefficients using vgxvarx
. Display the estimated parameters. Compute the coefficient of determination.
[EstMdl3,~,~,W3] = vgxvarx(Mdl3,Y,CellX3); vgxdisp(EstMdl3) a = EstMdl3.b(1) R2 = 1 - sum(diag(cov(W3)))/sum(diag(cov(Y)))
Model : 3-D VARMAX(0,0,13) with No Additive Constant b Regression Parameter: 0.388833 1.73468 3.94099 -2.20458 -1.56878 2.48483 0.462187 0.802394 -1.97614 -1.62978 0.63972 0.0190058 0.562466 Q Innovations Covariance: 1.72265 -0.164059 -0.122294 -0.164059 3.02031 0.12577 -0.122294 0.12577 0.997404 a = 0.3888 R2 = 0.9099
The shared, estimated regression intercept is 0.389
, and the other coefficients are similar to the first SUR implementation. The model and predictors explain approximately 91% of the variation in the response series. This should not be surprising since the model almost the same as the response-generating process.
This example shows how to implement the capital asset pricing model (CAPM) using the Econometric Toolbox™ multivariate time series framework.
The CAPM model characterizes comovements between asset and market prices. Under this framework, individual asset returns are linearly associated with the return of the whole market (for details, see [1], [2], and [3]). That is, given the return series of all stocks in a market ( ) and the return of a riskless asset ( ), the CAPM model for return series ( ) is
for all assets in the market.
is an -by-1 vector of asset alphas that should be zero, and it is of interest to investigate assets whose asset alphas are significantly away from zero. is a -by-1 vector of asset betas that specify the degree of comovement between the asset being modeled and the market. An interpretation of element of is
If , then asset moves in the same direction and with the same volatility as the market, i.e., is positively correlated with the market .
If , then asset moves in the opposite direction, but with the same volatility as the market, i.e., is negatively correlated with the market.
If , then asset is uncorrelated with the market.
In general:
determines the direction that the asset is moving relative to the market as described in the previous bullets.
is the factor that determines how much more or less volatile asset is relative to the market. For example, if , then asset is 10 times as volatile as the market.
Load and Process the Data
Load the CAPM data set included in the Financial Toolbox™.
load CAPMuniverse
varWithNaNs = Assets(any(isnan(Data),1))
dateRange = datestr([Dates(1) Dates(end)])
varWithNaNs = 'AMZN' 'GOOG' dateRange = 03-Jan-2000 07-Nov-2005
The variable Data
is a 1471-by-14 numeric matrix containing the daily returns of a set of 12 stocks (columns 1 through 12), one rickless asset (column 13), and the return of the whole market (column 14). The returns were measured from 03Jan2000 through 07Nov2005. AMZN
and GOOG
had their IPO during sampling, and so they have missing values.
Assign variables for the response and predictor series.
Y = bsxfun(@minus,Data(:,1:12),Data(:,14)); X = Data(:,13) - Data(:,14); [T,n] = size(Y)
T = 1471 n = 12
Y
is a 1471-by-12 matrix of the returns adjusted by the riskless return. X
is a 1471-by-1 vector of the market return adjusted by the riskless return.
Create block design matrices for each period, and put them into cells of a -by-1 cell vector. You can specify that each response series has its own intercept when you create the multivariate time series model object. Therefore, do not consider the intercept when you create the block design matrices.
Design = kron(X,eye(n)); CellX = mat2cell(Design,n*ones(T,1)); nX = size(Design,2);
Create the Multivariate Time Series Model
Create a multivariate time series model that characterizes the CAPM model. You must specify the number of response series, whether to give each response series equation an intercept, and the number of regression variables.
Mdl = vgxset('n',n,'Constant',true,'nX',nX);
Mdl
is a multivariate time series model object that characterizes the desired CAPM model.
Estimate the Multivariate Time Series Model
Pass the CAPM model specification (Mdl
), the response series (Y
), and the cell vector of block design matrices (CellX
) to vgxvarx
. Request to return the estimated multivariate time series model and the estimated coefficient standard errors. vgxvarx
maximizes the likelihood using the expectation-conditional-maximization (ECM) algorithm. ECM accommodates missing response values directly (i.e., without imputation), but at the cost of computation time.
[EstMdl,EstCoeffSEMdl] = vgxvarx(Mdl,Y,CellX);
EstMdl
and EstCoeffSEMdl
have the same structure as Mdl
, but EstMdl
contains the parameter estimates and EstCoeffSEMdl
contains the estimated standard errors of the parameter estimates. EstCoeffSEMdl
:
Contains the biased maximum likelihood standard errors.
Does not include the estimated standard errors of the intra-period covariances. To include the standard errors of the intra-period covariances, specify the name-value pair 'StdErrType','all'
in vgxvarx
.
Analyze Coefficient Estimates
Display the regression estimates, their standard errors, and their t statistics. By default, the software estimates, stores, and displays standard errors from maximum likelihood. Specify to use the unbiased least squares standard errors.
dispMdl = vgxset(EstMdl,'Q',[]) % Suppress printing covariance estimates vgxdisp(dispMdl,EstCoeffSEMdl,'DoFAdj',true)
dispMdl = Model: 12-D VARMAX(0,0,12) with Additive Constant n: 12 nAR: 0 nMA: 0 nX: 12 a: [12x1] additive constants asolve: [12x1 logical] additive constant indicators b: [12x1] regression coefficients bsolve: [12x1 logical] regression coefficient indicators Q: [] Qsolve: [12x12 logical] covariance matrix indicators Model : 12-D VARMAX(0,0,12) with Additive Constant Standard errors with DoF adjustment (least-squares) Parameter Value Std. Error t-Statistic -------------- -------------- -------------- -------------- a(1) 0.00116454 0.000869904 1.3387 a(2) 0.000715822 0.00121752 0.587932 a(3) -0.000223753 0.000806185 -0.277546 a(4) -2.44513e-05 0.000689289 -0.0354732 a(5) 0.00140469 0.00101676 1.38153 a(6) 0.00412219 0.000910392 4.52793 a(7) 0.000116143 0.00068952 0.168441 a(8) -1.37697e-05 0.000456934 -0.0301351 a(9) 0.000110279 0.000710953 0.155114 a(10) -0.000244727 0.000521036 -0.469693 a(11) 3.2336e-05 0.000861501 0.0375346 a(12) 0.000128267 0.00103773 0.123603 b(1) 1.22939 0.0741875 16.5714 b(2) 1.36728 0.103833 13.1681 b(3) 1.5653 0.0687534 22.7669 b(4) 1.25942 0.0587843 21.4245 b(5) 1.34406 0.0867116 15.5003 b(6) 0.617321 0.0776404 7.95103 b(7) 1.37454 0.0588039 23.375 b(8) 1.08069 0.0389684 27.7326 b(9) 1.60024 0.0606318 26.3928 b(10) 1.1765 0.0444352 26.4767 b(11) 1.50103 0.0734709 20.4303 b(12) 1.65432 0.0885002 18.6928
To determine whether the parameters are significantly away from zero, suppose that a t statistic of 3 or more indicates significance.
Response series 6 has a significant asset alpha.
sigASymbol = Assets(6)
sigASymbol = 'GOOG'
As a result, GOOG
has exploitable economic properties.
All asset betas are greater than 3. This indicates that all assets are significantly correlated with the market.
However, GOOG
has an asset beta of approximately 0.62
, whereas all other asset betas are greater than 1. This indicates that the magnitude of the volatility of GOOG
is approximately 62% of the market volatility. The reason for this is that GOOG
steadily and almost consistently appreciated in value while the market experienced volatile horizontal movements.
For more details and an alternative analysis, see Capital Asset Pricing Model with Missing Data.
This example shows how to estimate a multivariate time series model that contains lagged endogenous and exogenous variables, and how to simulate responses. The response series are the quarterly:
Changes in real gross domestic product (rGDP) rates ( )
Real money supply (rM1SL) rates ( )
Short-term interest rates (i.e., three-month treasury bill yield, )
from March, 1959 through March, 2009 . The exogenous series is the quarterly changes in the unemployment rates ( ).
Suppose that a model for the responses is this VARX(4,3) model
Preprocess the Data
Load the U.S. macroeconomic data set. Flag the series and their periods that contain missing values (indicated by NaN
values).
load Data_USEconModel varNaN = any(ismissing(DataTable),1); % Variables containing NaN values seriesWithNaNs = series(varNaN)
seriesWithNaNs = Columns 1 through 3 '(FEDFUNDS) Effec...' '(GS10) Ten-year ...' '(M1SL) M1 money ...' Columns 4 through 5 '(M2SL) M2 money ...' '(UNRATE) Unemplo...'
In this data set, the variables that contain missing values entered the sample later than the other variables. There are no missing values after sampling started for a particular variable.
vgxvarx
accommodates missing values for responses, but not for regression variables. Flag all periods corresponding to a missing regression variable value.
idx = ~isnan(DataTable.UNRATE);
For the rest of the example, consider only those values that of the series indicated by a true
in idx
.
Compute rGDP and rM1SL, and the growth rates of rGDP, rM1SL, short-term interest rates, and the unemployment rate. Description
contains a description of the data and the variable names. Reserve the last three years of data to investigate the out-of-sample performance of the estimated model.
rGDP = DataTable.GDP(idx)./(DataTable.GDPDEF(idx)/100); rM1SL = DataTable.M1SL(idx)./(DataTable.GDPDEF(idx)/100); dLRGDP = diff(log(rGDP)); % rGDP growth rate dLRM1SL = diff(log(rM1SL)); % rM1SL growth rate d3MTB = diff(DataTable.TB3MS(idx)); % Change in short-term interest rate (3MTB) dUNRATE = diff(DataTable.UNRATE(idx)); % Change in unemployment rate T = numel(d3MTB); % Total sample size oosT = 12; % Out-of-sample size estT = T - oosT; % Estimation sample size estIdx = 1:estT; % Estimation sample indices oosIdx = (T - 11):T; % Out-of-sample indices dates = dates((end - T + 1):end); EstY = [dLRGDP(estIdx) dLRM1SL(estIdx) d3MTB(estIdx)]; % In-sample responses estX = dUNRATE(estIdx); % In-sample exogenous data n = size(EstY,2); OOSY = [dLRGDP(oosIdx) dLRM1SL(oosIdx) d3MTB(oosIdx)]; % Out-of-sample responses oosX = dUNRATE(oosIdx); % Out-of-sample exogenous data
Create the Design Matrices
Create an estT
-by-1 cell vector of block design matrices that associate the predictor series with each response such that the responses do not share a coefficient.
EstExpandX = kron(estX,eye(n)); EstCellX = mat2cell(EstExpandX,n*ones(estT,1)); nX = size(EstExpandX,2);
Specify the VARX Model
Specify a multivariate time series model object that characterizes the VARX(4,3) model using vgxset
.
Mdl = vgxset('n',n,'nAR',4,'nX',nX,'Constant',true);
Estimate the VAR(4) Model
Estimate the parameters of the VARX(4,3) model using vgxvarx
. Display the parameter estimates.
EstMdl = vgxvarx(Mdl,EstY,EstCellX); vgxdisp(EstMdl)
Model : 3-D VARMAX(4,0,3) with Additive Constant Conditional mean is AR-stable and is MA-invertible a Constant: 0.00811792 0.000709263 0.0465824 b Regression Parameter: -0.0162116 -0.00163933 -1.50115 AR(1) Autoregression Matrix: -0.0375772 -0.0133236 0.00108218 -0.00519697 0.177963 -0.00501432 -0.873992 -6.89049 -0.165888 AR(2) Autoregression Matrix: 0.0753033 0.0775643 -0.001049 0.00282857 0.29064 -0.00159574 4.00724 0.465046 -0.221024 AR(3) Autoregression Matrix: -0.0927688 -0.0240239 -0.000549057 0.0627837 0.0686179 -0.00212185 -7.52241 10.247 0.227121 AR(4) Autoregression Matrix: 0.0646951 -0.0792765 -0.000176166 0.0276958 0.00922231 -0.000183861 1.38523 -11.8774 0.0518154 Q Innovations Covariance: 3.57524e-05 7.05807e-06 -4.23542e-06 7.05807e-06 9.67992e-05 -0.00188786 -4.23542e-06 -0.00188786 0.777151
EstMdl
is a multivariate time series model object containing the estimated parameters.
Simulate Out-Of-Sample Response Paths Using the Same Exogenous Data per Path
Simulate 1000, 3 year response series paths from the estimated model assuming that the exogenous unemployment rate is a fixed series. Since the model contains 4 lags per endogenous variable, specify the last 4 observations in the estimation sample as presample data.
OOSExpandX = kron(oosX,eye(n));
OOSCellX = mat2cell(OOSExpandX,n*ones(oosT,1));
numPaths = 1000;
Y0 = EstY((end-3):end,:);
rng(1); % For reproducibility
YSim = vgxsim(EstMdl,oosT,OOSCellX,Y0,[],numPaths);
YSim
is a 12-by-3-by-1000 numeric array of simulated responses. The rows of YSim
correspond to out-of-sample periods, the columns correspond to the response series, and the leaves correspond to paths.
Plot the response data and the simulated responses. Identify the 5%, 25%, 75% and 95% percentiles, and the mean and median of the simulated series at each out-of-sample period.
YSimBar = mean(YSim,3); YSimQrtl = quantile(YSim,[0.05 0.25 0.5 0.75 0.95],3); RepDates = repmat(dates(oosIdx),1,1000); respNames = {'dLRGDP' 'dLRM1SL' 'd3MTB'}; figure; for j = 1:n; subplot(3,1,j); h1 = plot(dates(oosIdx),squeeze(YSim(:,j,:)),'Color',0.75*ones(3,1)); hold on; h2 = plot(dates(oosIdx),YSimBar(:,j),'.-k','LineWidth',2); h3 = plot(dates(oosIdx),squeeze(YSimQrtl(:,j,:)),':r','LineWidth',1.5); h4 = plot(dates((end - 30):end),[EstY((end - 18):end,j);OOSY(:,j)],... 'b','LineWidth',2); title(sprintf('%s',respNames{j})); datetick; axis tight; hold off; end legend([h1(1) h2(1) h3(1) h4],{'Simulated Series','Simulation Mean',... 'Simulation Quartile','Data'},'Location',[0.4 0.1 0.01 0.01],... 'FontSize',8);
Simulate Out-Of-Sample Response Paths Using Random Exogenous Data
Suppose that the change in the unemployment rate is an AR(4) model, and fit the model to the estimation sample data.
MdlUNRATE = arima('ARLags',1:4); EstMdlUNRATE = estimate(MdlUNRATE,estX,'Display','off');
EstMdlUNRATE
is an arima
model object containing the parameter estimates.
Simulate 1000, 3 year paths from the estimated AR(4) model for the change in unemployment rate. Since the model contains 4 lags, specify the last 4 observations in the estimation sample as presample data.
XSim = simulate(EstMdlUNRATE,oosT,'Y0',estX(end-3:end),... 'NumPaths',numPaths);
XSim
is a 12-by-1000 numeric matrix of simulated exogenous paths. The rows correspond to periods and the columns correspond to paths.
Create a cell matrix of block design matrices to organize the exogenous data, where each column corresponds to a path.
ExpandXSim = kron(XSim,eye(n)); size(ExpandXSim) CellXSim = mat2cell(ExpandXSim,n*ones(oosT,1),n*ones(1,numPaths)); size(CellXSim) CellXSim{1,1}
ans = 36 3000 ans = 12 1000 ans = 0.7901 0 0 0 0.7901 0 0 0 0.7901
ExpandXSim
is a 36-by-3000 numeric matrix, and CellXSim
is a 12-by-1000 cell matrix of mutually exclusive, neighboring, 3-by-3 block matrices in ExpandXSim
.
Simulate 3 years of 1000 future response series paths from the estimated model using the simulated exogenous data. Since the model contains 4 lags per endogenous variable, specify the last 4 observations in the estimation sample as presample data.
YSimRX = vgxsim(EstMdl,oosT,CellXSim,Y0,[],numPaths);
YSimRX
is a 12-by-3-by-1000 numeric array of simulated responses.
Plot the response data and the simulated responses. Identify the 5%, 25%, 75% and 95% percentiles, and the mean and median of the simulated series at each out-of-sample period.
YSimBarRX = mean(YSimRX,3); YSimQrtlRX = quantile(YSimRX,[0.05 0.25 0.5 0.75 0.95],3); figure; for j = 1:n; subplot(3,1,j); h1 = plot(dates(oosIdx),squeeze(YSimRX(:,j,:)),'Color',0.75*ones(3,1)); hold on; h2 = plot(dates(oosIdx),YSimBarRX(:,j),'.-k','LineWidth',2); h3 = plot(dates(oosIdx),squeeze(YSimQrtlRX(:,j,:)),':r','LineWidth',1.5); h4 = plot(dates((end - 30):end),[EstY((end - 18):end,j);OOSY(:,j)],... 'b','LineWidth',2); title(sprintf('%s with Simulated Unemployment Rate',respNames{j})); datetick; axis tight; hold off; end legend([h1(1) h2(1) h3(1) h4],{'Simulated Series','Simulation Mean',... 'Simulation Quartile','Data'},'Location',[0.4 0.1 0.01 0.01],... 'FontSize',8)
This section contains an example of the workflow described in Building VAR Models. The example uses three time series: GDP, M1 money supply, and the 3-month T-bill rate. The example shows:
Loading and transforming the data for stationarity
Partitioning the transformed data into presample, estimation, and forecast intervals to support a backtesting experiment
Making several models
Fitting the models to the data
Deciding which of the models is best
Making forecasts based on the best model
Loading Data. The file Data_USEconModel
ships with Econometrics Toolbox software.
The file contains time series from the St. Louis Federal Reserve Economics
Database. This example uses three of the time series: GDP, M1 money
supply (M1SL), and 3-month T-bill rate (TB3MS). Load the data into
a time series matrix Y
as follows:
load Data_USEconModel
gdp = DataTable.GDP;
m1 = DataTable.M1SL;
tb3 = DataTable.TB3MS;
Y = [gdp,m1,tb3];
Transforming Data for Stationarity. Plot the data to look for trends:
figure subplot(3,1,1) plot(dates,Y(:,1),'r'); title('GDP') datetick('x') grid on subplot(3,1,2); plot(dates,Y(:,2),'b'); title('M1') datetick('x') grid on subplot(3,1,3); plot(dates, Y(:,3), 'k') title('3-mo T-bill') datetick('x') grid on hold off
Not surprisingly, both the GDP and M1 data appear to grow, while
the T-bill returns show no long-term growth. To counter the trends
in GDP and M1, take a difference of the logarithms of the data. Taking
a difference shortens the time series, as described in Transforming Data for Stationarity.
Therefore, truncate the T-bill series and date series X
so
that the Y
data matrix has the same number of rows
for each column:
Y = [diff(log(Y(:,1:2))), Y(2:end,3)]; % Transformed data X = dates(2:end); figure subplot(3,1,1) plot(X,Y(:,1),'r'); title('GDP') datetick('x') grid on subplot(3,1,2); plot(X,Y(:,2),'b'); title('M1') datetick('x') grid on subplot(3,1,3); plot(X, Y(:,3),'k'), title('3-mo T-bill') datetick('x') grid on
You see that the scale of the first two columns is about 100 times smaller than the third. Multiply the first two columns by 100 so that the time series are all roughly on the same scale. This scaling makes it easy to plot all the series on the same plot. More importantly, this type of scaling makes optimizations more numerically stable (for example, maximizing loglikelihoods).
Y(:,1:2) = 100*Y(:,1:2); figure plot(X,Y(:,1),'r'); hold on plot(X,Y(:,2),'b'); datetick('x') grid on plot(X,Y(:,3),'k'); legend('GDP','M1','3-mo T-bill'); hold off
Selecting Models. You can choose many different models for the data. This example rather arbitrarily chooses four models:
VAR(2) with diagonal autoregressive and covariance matrices
VAR(2) with full autoregressive and covariance matrices
VAR(4) with diagonal autoregressive and covariance matrices
VAR(4) with full autoregressive and covariance matrices
Make the series the same length, and transform them to be stationary and on a similar scale.
dGDP = 100*diff(log(gdp(49:end))); dM1 = 100*diff(log(m1(49:end))); dT3 = diff(tb3(49:end)); Y = [dGDP dM1 dT3];
Create the four models as follows:
dt = logical(eye(3)); VAR2diag = vgxset('ARsolve',repmat({dt},2,1),... 'asolve',true(3,1),'Series',{'GDP','M1','3-mo T-bill'}); VAR2full = vgxset(VAR2diag,'ARsolve',[]); VAR4diag = vgxset(VAR2diag,'nAR',4,'ARsolve',repmat({dt},4,1)); VAR4full = vgxset(VAR2full,'nAR',4);
The matrix dt
is a diagonal logical matrix. dt
specifies
that both the autoregressive matrices for VAR2diag
and VAR4diag
are
diagonal. In contrast, the specifications for VAR2full
and VAR4full
have
empty matrices instead of dt
. Therefore, vgxvarx
fits
the defaults, which are full matrices for autoregressive and correlation
matrices.
Choosing Presample, Estimation, and Forecast Periods. To assess the quality of the models, divide the response data Y
into
three periods: presample, estimation, and forecast. Fit the models
to the estimation data, using the presample period to provide lagged
data. Compare the predictions of the fitted models to the forecast
data. The estimation period is in-sample, and the forecast period
is out-of-sample (also known as backtesting).
For the two VAR(4) models, the presample period is the first
four rows of Y
. Use the same presample period for
the VAR(2) models so that all the models are fit to the same data.
This is necessary for model fit comparisons. For both models, the
forecast period is the final 10% of the rows of Y
.
The estimation period for the models goes from row 5 to the 90% row.
The code for defining these data periods follows:
Ypre = Y(1:4,:); T = ceil(.9*size(Y,1)); Yest = Y(5:T,:); YF = Y((T+1):end,:); TF = size(YF,1);
Fitting with vgxvarx. Now that the models and time series exist, you can easily fit the models to the data:
[EstSpec1,EstStdErrors1,LLF1,W1] = ... vgxvarx(VAR2diag,Yest,[],Ypre,'CovarType','Diagonal'); [EstSpec2,EstStdErrors2,LLF2,W2] = ... vgxvarx(VAR2full,Yest,[],Ypre); [EstSpec3,EstStdErrors3,LLF3,W3] = ... vgxvarx(VAR4diag,Yest,[],Ypre,'CovarType','Diagonal'); [EstSpec4,EstStdErrors4,LLF4,W4] = ... vgxvarx(VAR4full,Yest,[],Ypre);
The EstSpec
structures are the
fitted models.
The EstStdErrors
structures contain
the standard errors of the fitted models.
The LLF
are the loglikelihoods
of the fitted models. Use these to help select the best model, as
described in Checking Model Adequacy.
The W
are the estimated innovations
(residuals) processes, the same size as Yest
.
The specification for EstSpec1
and EstSpec3
includes
diagonal covariance matrices.
Checking Stability. You can check whether the estimated models are stable and invertible
with the vgxqual
function. (There are no MA terms
in these models, so the models are necessarily invertible.) The test
shows that all the estimated models are stable:
[isStable1,isInvertible1] = vgxqual(EstSpec1); [isStable2,isInvertible2] = vgxqual(EstSpec2); [isStable3,isInvertible3] = vgxqual(EstSpec3); [isStable4,isInvertible4] = vgxqual(EstSpec4); [isStable1,isStable2,isStable3,isStable4]
ans = 1 1 1 1
You can also look at the estimated specification structures. Each contains a line stating whether the model is stable:
EstSpec4
EstSpec4 = Model: 3-D VAR(4) with Additive Constant Series: {'GDP' 'M1' '3-mo T-bill'} n: 3 nAR: 4 nMA: 0 nX: 0 a: [0.524224 0.106746 -0.671714] additive constants asolve: [1 1 1 logical] additive constant indicators AR: {4x1 cell} stable autoregressive process ARsolve: {4x1 cell of logicals} autoregressive lag indicators Q: [3x3] covariance matrix Qsolve: [3x3 logical] covariance matrix indicators
Likelihood Ratio Tests. You can compare the restricted (diagonal) AR models to their
unrestricted (full) counterparts using the lratiotest
function.
The test rejects or fails to reject the hypothesis that the restricted
models are adequate, with a default 5% tolerance. This is an in-sample
test. To perform the test:
Count the parameters in the models using the vgxcount
function:
[n1,n1p] = vgxcount(EstSpec1); [n2,n2p] = vgxcount(EstSpec2); [n3,n3p] = vgxcount(EstSpec3); [n4,n4p] = vgxcount(EstSpec4);
Compute the likelihood ratio tests:
reject1 = lratiotest(LLF2,LLF1,n2p - n1p) reject3 = lratiotest(LLF4,LLF3,n4p - n3p) reject4 = lratiotest(LLF4,LLF2,n4p - n2p)
reject1 = 1 reject3 = 1 reject4 = 0
The 1
results indicate that the likelihood
ratio test rejected both the restricted models, AR(1) and AR(3), in
favor of the corresponding unrestricted models. Therefore, based on
this test, the unrestricted AR(2) and AR(4) models are preferable.
However, the test does not reject the unrestricted AR(2) model in
favor of the unrestricted AR(4) model. (This test regards the AR(2)
model as an AR(4) model with restrictions that the autoregression
matrices AR(3) and AR(4) are 0.) Therefore, it seems that the unrestricted
AR(2) model is best among the models fit.
Akaike Information Criterion. To find the best model among a set, minimize the Akaike information criterion. This is an in-sample calculation. Here is how to calculate the criterion for the four models:
AIC = aicbic([LLF1 LLF2 LLF3 LLF4],[n1p n2p n3p n4p])
AIC = 1.0e+03 * 1.5129 1.4462 1.5122 1.4628
The best model according to this criterion is the unrestricted AR(2) model. Notice, too, that the unrestricted AR(4) model has lower Akaike information than either of the restricted models. Based on this criterion, the unrestricted AR(2) model is best, with the unrestricted AR(4) model coming next in preference.
Comparing Forecasts with Forecast Period Data. To compare the predictions of the four models against the forecast
data YF
, use the vgxpred
function.
This function returns both a prediction of the mean time series, and
an error covariance matrix that gives confidence intervals about the
means. This is an out-of-sample calculation.
[FY1,FYCov1] = vgxpred(EstSpec1,TF,[],Yest); [FY2,FYCov2] = vgxpred(EstSpec2,TF,[],Yest); [FY3,FYCov3] = vgxpred(EstSpec3,TF,[],Yest); [FY4,FYCov4] = vgxpred(EstSpec4,TF,[],Yest);
A plot shows the predictions in the shaded region to the right:
figure vgxplot(EstSpec2,Yest,FY2,FYCov2)
It is now straightforward to calculate the sum of squares error
between the predictions and the data, YF
:
error1 = YF - FY1; error2 = YF - FY2; error3 = YF - FY3; error4 = YF - FY4; SSerror1 = error1(:)' * error1(:); SSerror2 = error2(:)' * error2(:); SSerror3 = error3(:)' * error3(:); SSerror4 = error4(:)' * error4(:); figure bar([SSerror1 SSerror2 SSerror3 SSerror4],.5) ylabel('Sum of squared errors') set(gca,'XTickLabel',... {'AR2 diag' 'AR2 full' 'AR4 diag' 'AR4 full'}) title('Sum of Squared Forecast Errors')
The predictive performance of the four models is similar.
The full AR(2) model seems to be the best and most parsimonious fit. Its model parameters are:
vgxdisp(EstSpec2)
Model : 3-D VAR(2) with Additive Constant Conditional mean is AR-stable and is MA-invertible Series : GDP Series : M1 Series : 3-mo T-bill a Constant: 0.687401 0.3856 -0.915879 AR(1) Autoregression Matrix: 0.272374 -0.0162214 0.0928186 0.0563884 0.240527 -0.389905 0.280759 -0.0712716 -0.32747 AR(2) Autoregression Matrix: 0.242554 0.140464 -0.177957 0.00130726 0.380042 -0.0484981 0.260414 0.024308 -0.43541 Q Innovations Covariance: 0.632182 0.105925 0.216806 0.105925 0.991607 -0.155881 0.216806 -0.155881 1.00082
This example shows two ways to make predictions or forecasts
based on the EstSpec4
fitted model:
Running vgxpred
based on the
last few rows of YF
.
Simulating several time series with the vgxsim
function.
In both cases, transform the forecasts so they are directly comparable to the original time series.
Forecasting with vgxpred. This example shows predictions 10 steps into the future.
Generate the prediction time series from the fitted model beginning at the latest times:
[ypred,ycov] = vgxpred(EstSpec2,10,[],YF);
Transform the predictions by undoing the scaling and
differencing applied to the original data. Make sure to insert the
last observation at the beginning of the time series before using cumsum
to
undo the differencing. And, since differencing occurred after taking
logarithms, insert the logarithm before using cumsum
:
yfirst = [gdp,m1,tb3]; yfirst = yfirst(49:end,:); % Remove NaNs dates = dates(49:end); endpt = yfirst(end,:); endpt(1:2) = log(endpt(1:2)); ypred(:,1:2) = ypred(:,1:2)/100; % Rescale percentage ypred = [endpt; ypred]; % Prepare for cumsum ypred(:,1:3) = cumsum(ypred(:,1:3)); ypred(:,1:2) = exp(ypred(:,1:2)); lastime = dates(end); timess = lastime:91:lastime+910; % Insert forecast horizon figure subplot(3,1,1) plot(timess,ypred(:,1),':r') hold on plot(dates,yfirst(:,1),'k') datetick('x') grid on title('GDP') subplot(3,1,2); plot(timess,ypred(:,2),':r') hold on plot(dates,yfirst(:,2),'k') datetick('x') grid on title('M1') subplot(3,1,3); plot(timess,ypred(:,3),':r') hold on plot(dates,yfirst(:,3),'k') datetick('x') grid on title('3-mo T-bill') hold off
The plot shows the extrapolations in dotted red, and the original data series in solid black.
Look at the last few years in this plot to get a sense of how the predictions relate to the latest data points:
ylast = yfirst(170:end,:); timeslast = dates(170:end); figure subplot(3,1,1) plot(timess,ypred(:,1),'--r') hold on plot(timeslast,ylast(:,1),'k') datetick('x') grid on title('GDP') subplot(3,1,2); plot(timess,ypred(:,2),'--r') hold on plot(timeslast,ylast(:,2),'k') datetick('x') grid on title('M1') subplot(3,1,3); plot(timess,ypred(:,3),'--r') hold on plot(timeslast,ylast(:,3),'k') datetick('x') grid on title('3-mo T-bill') hold off
The forecast shows increasing GDP, little growth in M1, and a slight decline in the interest rate. However, the forecast has no error bars. For a forecast with errors, see Forecasting with vgxsim.
Forecasting with vgxsim. This example shows forecasting 10 steps into the future, with a simulation replicated 2000 times, and generates the means and standard deviations.
Simulate a time series from the fitted model beginning at the latest times:
rng(1); % For reproducibility
ysim = vgxsim(EstSpec2,10,[],YF,[],2000);
Transform the predictions by undoing the scaling and
differencing applied to the original data. Make sure to insert the
last observation at the beginning of the time series before using cumsum
to
undo the differencing. And, since differencing occurred after taking
logarithms, insert the logarithm before using cumsum
:
yfirst = [gdp,m1,tb3]; endpt = yfirst(end,:); endpt(1:2) = log(endpt(1:2)); ysim(:,1:2,:) = ysim(:,1:2,:)/100; ysim = [repmat(endpt,[1,1,2000]);ysim]; ysim(:,1:3,:) = cumsum(ysim(:,1:3,:)); ysim(:,1:2,:) = exp(ysim(:,1:2,:));
Compute the mean and standard deviation of each series, and plot the results. The plot has the mean in black, with ±1 standard deviation in red.
ymean = mean(ysim,3); ystd = std(ysim,0,3); figure subplot(3,1,1) plot(timess,ymean(:,1),'k') datetick('x') grid on hold on plot(timess,ymean(:,1)+ystd(:,1),'--r') plot(timess,ymean(:,1)-ystd(:,1),'--r') title('GDP') subplot(3,1,2); plot(timess,ymean(:,2),'k') hold on datetick('x') grid on plot(timess,ymean(:,2)+ystd(:,2),'--r') plot(timess,ymean(:,2)-ystd(:,2),'--r') title('M1') subplot(3,1,3); plot(timess,ymean(:,3),'k') hold on datetick('x') grid on plot(timess,ymean(:,3)+ystd(:,3),'--r') plot(timess,ymean(:,3)-ystd(:,3),'--r') title('3-mo T-bill') hold off
The series show increasing growth in GDP, moderate to little growth in M1, and uncertainty about the direction of T-bill rates.
[1] Sharpe, W. F. "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk." Journal of Finance.Vol. 19, 1964, pp. 425–442.
[2] Linter, J. "The Valuation of Risk Assets and the Selection of Risky Investments in Stocks." Review of Economics and Statistics. Vol. 14, 1965, pp. 13–37.
[3] Jarrow, A. Finance Theory. Prentice-Hall, Inc., 1988.
vgxinfer
| vgxplot
| vgxpred
| vgxproc
| vgxset
| vgxvarx