
Topic 3

Vector Autoregressions

John Stapleton



Table of Contents

3.1 Introduction
3.2 Stationary Vector Autoregressions (VARs)
3.3 Estimating Stationary VARs
3.4 Forecasting Stationary VARs
3.5 Likelihood Ratio Tests in VARs
3.6 VAR Order Selection
3.6.1 Introduction
3.6.2 A sequential test for selecting the VAR order
3.6.3 Selecting the VAR order using the FPE
3.6.4 Selecting the VAR order using information criteria
3.7 Testing for Granger-causality
3.7.1 An alternative representation of VARs
3.7.2 Testing for Granger-causality using an LR test
3.7.3 Limitations of Granger-causality tests
3.8 Example 3.1


3.9 Structural versus reduced form VARs
3.10 Impulse Response Analysis
3.10.1 Introduction
3.10.2 Orthogonalized impulse responses
3.11 Advantages and Disadvantages of VARs
3.11.1 Advantages of VARs
3.11.2 Disadvantages of VARs
3.12 Vector Autoregressive Moving Average (VARMA) Models



3.1 Introduction

In topics 1 and 2 we used ARIMA models to represent the data generating process for a single time series. ARIMA models have the virtues of being relatively simple and of capturing the dynamic nature of the behavior of economic variables. That is, the value of a variable in the present period depends on past values of the variable.
While such univariate analysis is often very useful, it is important to recognize that variables in an economic system are interrelated, with the value of one economic variable being determined simultaneously with the values of many other economic variables.


For example, macroeconomic theory suggests that, in addition to being dynamic, variables such as interest rates, exchange rates, prices and income are also simultaneously determined. Consequently, a univariate model of the exchange rate of the form

$$x_t = \alpha_0 + \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \dots + \alpha_p x_{t-p} + e_t,$$

where $x_t$ denotes the value of the exchange rate in period t, may not be a very good representation of the DGP for $x_t$, since the exchange rate depends on many other macroeconomic variables such as interest rates, inflation, the balance of payments etc.
Consequently, a multivariate framework which explicitly recognizes
the inter-relationships between economic variables as well as their
dynamic character, may be required to adequately capture the
underlying data generating process.


Vector autoregressions or VARs, which were first proposed by Sims (1980), attempt to capture both the simultaneous and dynamic character of economic relationships.
Consider the following system of equations, which describes the data generating process for the random variables $x_{1t}$ and $x_{2t}$:

$$x_{1t} = \alpha_{10} + \alpha_{11,1} x_{1,t-1} + \alpha_{12,1} x_{2,t-1} + \alpha_{11,2} x_{1,t-2} + \alpha_{12,2} x_{2,t-2} + \varepsilon_{1t} \qquad (3.1)$$
$$x_{2t} = \alpha_{20} + \alpha_{21,1} x_{1,t-1} + \alpha_{22,1} x_{2,t-1} + \alpha_{21,2} x_{1,t-2} + \alpha_{22,2} x_{2,t-2} + \varepsilon_{2t} \qquad (3.2)$$

Each of these equations postulates that the dependent variable in time period t depends on a constant, two lags of itself and two lags of the other variable in the system.
Notice that, apart from the intercepts, each coefficient has three subscripts.

The first subscript identifies the equation to which the coefficient belongs.
The second subscript identifies the variable to which it is attached.
The third subscript identifies the lag to which it pertains.
For example, $\alpha_{22,1}$ is the coefficient in the second equation, attached to the second variable lagged once.
We can represent (3.1) and (3.2) more compactly by using matrix notation. Define

$$X_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix},\quad X_{t-1} = \begin{pmatrix} x_{1,t-1} \\ x_{2,t-1} \end{pmatrix},\quad \varepsilon_t = \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix},$$

$$A_0 = \begin{pmatrix} \alpha_{10} \\ \alpha_{20} \end{pmatrix},\quad A_1 = \begin{pmatrix} \alpha_{11,1} & \alpha_{12,1} \\ \alpha_{21,1} & \alpha_{22,1} \end{pmatrix},\quad A_2 = \begin{pmatrix} \alpha_{11,2} & \alpha_{12,2} \\ \alpha_{21,2} & \alpha_{22,2} \end{pmatrix}.$$
Using these definitions, (3.1) and (3.2) may be written as

$$\begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} = \begin{pmatrix} \alpha_{10} \\ \alpha_{20} \end{pmatrix} + \begin{pmatrix} \alpha_{11,1} & \alpha_{12,1} \\ \alpha_{21,1} & \alpha_{22,1} \end{pmatrix}\begin{pmatrix} x_{1,t-1} \\ x_{2,t-1} \end{pmatrix} + \begin{pmatrix} \alpha_{11,2} & \alpha_{12,2} \\ \alpha_{21,2} & \alpha_{22,2} \end{pmatrix}\begin{pmatrix} x_{1,t-2} \\ x_{2,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix},$$

or, even more compactly, as

$$X_t = A_0 + A_1 X_{t-1} + A_2 X_{t-2} + \varepsilon_t. \qquad (3.3)$$

Notice that in equation (3.3) the variables on both the right-hand side and the left-hand side are vectors rather than scalars.
The number of elements in the x vector is referred to as the dimension of the VAR, which we denote by m.

The number of lags in the VAR is referred to as the order of the VAR, which we denote by p.
The autoregressive coefficients are m×m matrices. For example, $A_j$ is the coefficient matrix (or matrix of coefficients) at lag j.
Therefore, (3.3) is a two dimensional VAR of order 2, or a two dimensional VAR(2), since it contains two x variables and each is lagged twice.
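To make the notation concrete, here is a minimal simulation of a bivariate VAR(2) of the form (3.3). The coefficient values below are illustrative assumptions, not estimates from any data set; Python/numpy is used in this and the later sketches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (illustrative) coefficients for a bivariate VAR(2)
A0 = np.array([0.1, 0.2])            # (2x1) intercept vector
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])          # (2x2) coefficient matrix at lag 1
A2 = np.array([[0.2, 0.0],
               [0.0, 0.1]])          # (2x2) coefficient matrix at lag 2
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])       # contemporaneous error covariance

T = 200
X = np.zeros((T, 2))
for t in range(2, T):
    eps = rng.multivariate_normal([0.0, 0.0], Sigma)
    X[t] = A0 + A1 @ X[t - 1] + A2 @ X[t - 2] + eps
```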
Since $\varepsilon_t$ is a random vector, it has an associated covariance matrix, which we denote by $\Sigma$, and which is defined as

$$\Sigma = \begin{pmatrix} \operatorname{var}(\varepsilon_{1t}) & \operatorname{cov}(\varepsilon_{1t}, \varepsilon_{2t}) \\ \operatorname{cov}(\varepsilon_{2t}, \varepsilon_{1t}) & \operatorname{var}(\varepsilon_{2t}) \end{pmatrix}. \qquad (3.4)$$

We will assume that

$$\varepsilon_t \sim (0, \Sigma)\ \forall t \quad \text{and} \quad E(\varepsilon_t \varepsilon_{t+s}') = 0\ \forall s \neq 0. \qquad (3.5)$$


The covariance structure assumed in (3.5) implies the following:

The error terms in the two equations may be contemporaneously correlated. That is, $\operatorname{cov}(\varepsilon_{1t}, \varepsilon_{2t}) \neq 0$.

There is no non-contemporaneous cross-equation correlation. That is, $\operatorname{cov}(\varepsilon_{1t}, \varepsilon_{2,t-j}) = 0\ \forall j \neq 0$.

The error term in each equation is serially uncorrelated. That is, $\operatorname{cov}(\varepsilon_{1t}, \varepsilon_{1,t-j}) = 0$ and $\operatorname{cov}(\varepsilon_{2t}, \varepsilon_{2,t-j}) = 0\ \forall j \neq 0$.


The errors in each equation are homoskedastic. That is, $\operatorname{var}(\varepsilon_{1t}) = \operatorname{var}(\varepsilon_{1s})$ and $\operatorname{var}(\varepsilon_{2t}) = \operatorname{var}(\varepsilon_{2s})\ \forall t, s$.

An error vector that satisfies

$$\varepsilon_t \sim (0, \Sigma)\ \forall t \quad \text{and} \quad E(\varepsilon_t \varepsilon_{t+s}') = 0\ \forall s \neq 0 \qquad (3.5)$$

is referred to as vector white noise.

In the general case of an m dimensional VAR(p) process the model may be written as

$$X_t = A_0 + A_1 X_{t-1} + A_2 X_{t-2} + \dots + A_p X_{t-p} + \varepsilon_t, \qquad (3.6)$$

where $X_t$, $A_0$ and $\varepsilon_t$ are m×1 vectors, each $A_j$ is an m×m matrix, and we assume that

$$\varepsilon_t \sim (0, \Sigma)\ \forall t \quad \text{and} \quad E(\varepsilon_t \varepsilon_{t+s}') = 0\ \forall s \neq 0.$$

The contemporaneous covariance matrix of the error vector is given by

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \cdots & \sigma_{2m} \\ \vdots & & \ddots & & \vdots \\ \sigma_{m1} & \sigma_{m2} & \cdots & & \sigma_m^2 \end{pmatrix}, \qquad (3.7)$$
where $\sigma_i^2 = \operatorname{Var}(\varepsilon_{it})\ \forall t$ and $\sigma_{ij} = \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{jt})\ \forall t$.

Note that in the general case:
Each equation in the VAR contains the same regressors (p lags of each x variable plus an intercept), so each equation contains mp + 1 regressors.
Each equation in the VAR contains mp + 1 coefficients, so the VAR contains a total of m(mp + 1) coefficients.
For example, if m = 4 and p = 3, then there are $mp + 1 = 4 \times 3 + 1 = 13$ regressors in each equation in the VAR, and $m(mp + 1) = 4(12 + 1) = 52$ coefficients to estimate.
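These counts are easy to mechanize; a small (hypothetical, purely illustrative) helper:

```python
def var_param_count(m: int, p: int) -> tuple[int, int]:
    """Regressors per equation and total coefficients in an m-dimensional VAR(p)."""
    per_equation = m * p + 1      # p lags of each of m variables, plus intercept
    total = m * per_equation      # the same regressors appear in each of the m equations
    return per_equation, total

print(var_param_count(4, 3))      # (13, 52), matching the example above
```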

Increasing either the order or the dimension of the VAR can


substantially increase the number of parameters to be estimated.
If we increase the number of lags in the VAR by one
the number of parameters will increase by M2 .
In the preceding example, if we increase p from 3 to 4 (with m = 4)
then the number of parameters to estimate will increase by

42 = 16.

.
If we increase the dimension of the VAR by one (from m to m+1)
the number of parameters in the VAR will increase by

p (m + 1) + 1 + mp = 2mp + p + 1
= p (2m + 1) + 1


In the preceding example, if we increase the dimension of the VAR from 4 to 5 (with p = 3), the number of parameters to estimate will increase by

$$p(2m+1) + 1 = 3(8+1) + 1 = 28.$$

In principle, we could have different lag lengths for each variable in each equation. However, in that case the OLS estimator would be inefficient. If the number of lags of each variable is not uniform across equations, efficient estimation requires the use of the seemingly unrelated regressions (SUR) estimator. Consequently, it is common practice to have the same number of lags of each variable in each equation in the VAR.

3.2 Stationary VAR(p) Processes
Definition 3.1. If $x_t \sim VAR(p)$, then $x_t$ is stationary if:

The mean of $x_t$ is finite and time invariant, i.e. $E(x_t) = \mu < \infty\ \forall t$.

The contemporaneous covariance matrix is finite and time invariant, i.e. $\operatorname{Cov}(x_t) = E[(x_t - \mu)(x_t - \mu)'] = \Gamma_x < \infty\ \forall t$.

The covariance matrix between $x_t$ and $x_{t-j}$ is finite and time invariant, i.e. $\operatorname{Cov}(x_t, x_{t-j}) = E[(x_t - \mu)(x_{t-j} - \mu)'] = \Gamma_j < \infty\ \forall t, j$.
For example, if m = 2, then

$$E(x_t) = \begin{pmatrix} E(x_{1t}) \\ E(x_{2t}) \end{pmatrix},$$

$$\operatorname{Cov}(x_t) = \Gamma_x = \begin{pmatrix} \operatorname{var}(x_{1t}) & \operatorname{cov}(x_{1t}, x_{2t}) \\ \operatorname{cov}(x_{2t}, x_{1t}) & \operatorname{var}(x_{2t}) \end{pmatrix},$$

$$\operatorname{Cov}(x_t, x_{t-j}) = \Gamma_j = \begin{pmatrix} \operatorname{cov}(x_{1t}, x_{1,t-j}) & \operatorname{cov}(x_{1t}, x_{2,t-j}) \\ \operatorname{cov}(x_{2t}, x_{1,t-j}) & \operatorname{cov}(x_{2t}, x_{2,t-j}) \end{pmatrix}.$$



Note that, using the lag operator,

$$X_t = A_0 + A_1 X_{t-1} + A_2 X_{t-2} + \dots + A_p X_{t-p} + \varepsilon_t \qquad (3.6)$$

may be written as

$$X_t - A_1 X_{t-1} - A_2 X_{t-2} - \dots - A_p X_{t-p} = A_0 + \varepsilon_t,$$

or

$$X_t - A_1 L X_t - A_2 L^2 X_t - \dots - A_p L^p X_t = A_0 + \varepsilon_t,$$

or

$$(I_m - A_1 L - A_2 L^2 - \dots - A_p L^p) X_t = A_0 + \varepsilon_t,$$

or

$$\Phi(L) X_t = A_0 + \varepsilon_t,$$

where

$$\Phi(L) = I_m - A_1 L - A_2 L^2 - \dots - A_p L^p. \qquad (3.8)$$

$\Phi(L)$ is referred to as the autoregressive matrix polynomial, because the coefficients attached to the various powers of L are matrices rather than scalars.
The roots of $\Phi(L)$ are the values $r_i$ that satisfy the equation

$$|I_m - A_1 r_i - A_2 r_i^2 - \dots - A_p r_i^p| = 0, \qquad (3.9)$$

where $|I_m - A_1 r_i - A_2 r_i^2 - \dots - A_p r_i^p|$ denotes the determinant of the m×m matrix $I_m - A_1 r_i - A_2 r_i^2 - \dots - A_p r_i^p$.



When we expand the left-hand side of (3.9) we obtain a polynomial of order mp. Consequently, (3.9) has mp roots. The condition for $x_t$ to be a stationary VAR(p) is expressed as a restriction on these mp roots and is stated in Theorem 3.1 below.

Theorem 3.1. The VAR(p) process given by (3.6) is stationary if the roots of $\Phi(L)$ all have modulus greater than one. That is,

$$\operatorname{mod}(r_i) > 1$$

for every $r_i$ that satisfies the equation

$$|I_m - A_1 r_i - A_2 r_i^2 - \dots - A_p r_i^p| = 0.$$
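A practical way to check Theorem 3.1 numerically is to stack the VAR into companion (stacked VAR(1)) form: the roots $r_i$ of (3.9) are the reciprocals of the eigenvalues of the companion matrix, so mod$(r_i) > 1$ for all roots is equivalent to all companion eigenvalues lying strictly inside the unit circle. A minimal sketch, with the coefficient matrices taken as given inputs:

```python
import numpy as np

def is_stationary(A: list[np.ndarray]) -> bool:
    """A = [A1, ..., Ap], each m x m. True if the VAR(p) is stationary."""
    m, p = A[0].shape[0], len(A)
    top = np.hstack(A)                               # m x mp block of coefficients
    bottom = np.eye(m * (p - 1), m * p)              # shifted identity below
    companion = np.vstack([top, bottom])             # eigenvalues are 1 / r_i
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1))
```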


There is a vector generalization of the Wold representation theorem for a univariate time series, which was stated in Topic 2. This generalization is stated in Theorem 3.2 below.


Theorem 3.2. Every stationary VAR(p) process has an infinite vector moving average representation given by

$$x_t = \mu + \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \dots,$$

where

$$\mu = (I_m - A_1 - A_2 - \dots - A_p)^{-1} A_0$$

and

$$\varepsilon_t \sim (0, \Sigma)\ \forall t \quad \text{and} \quad E(\varepsilon_t \varepsilon_{t+s}') = 0\ \forall s \neq 0,$$

i.e. $\varepsilon_t$ is vector white noise.


It follows from Theorem 3.2 that:

i) $E(X_t) = E(\mu + \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \dots) = \mu$;

ii) $\operatorname{Cov}(X_t) = \Gamma_x = \sum_{j=0}^{\infty} \Psi_j \Sigma \Psi_j'$ (with $\Psi_0 = I_m$);

iii) $\operatorname{Cov}(X_t, X_{t-j}) = \Gamma_j = \sum_{s=0}^{\infty} \Psi_{j+s} \Sigma \Psi_s'$.

Proof: See tutorial exercise 5.


3.3 Estimating Stationary VARs

Each time we lag a variable we lose an observation. For convenience, we will assume that our sample consists of T + p observations, so that we have T observations available for estimating our VAR(p).
A VAR(p) is an example of what is referred to as a system of seemingly unrelated regressions (SURE).
Normally, it is inefficient to estimate a SURE system equation by equation, because such a procedure fails to take into account the fact that the errors in the different equations are contemporaneously correlated. However, in the special case in which the regressors in each equation are the same, it can be shown that equation-by-equation estimation by OLS is efficient.


Consequently, we can obtain an efficient estimator of the parameters of a VAR(p) by estimating each equation separately by conditional least squares (CLS). That is, given p initial values for each of the m variables in the VAR, we run m separate OLS regressions. In the ith regression we regress $x_i$ on a constant, p lags of itself and p lags of each of the other variables in the VAR.
For example, in the case of the 2 dimensional VAR(2) described in section 3.1, we would estimate the following two equations separately by OLS:

$$x_{1t} = \alpha_{10} + \alpha_{11,1} x_{1,t-1} + \alpha_{12,1} x_{2,t-1} + \alpha_{11,2} x_{1,t-2} + \alpha_{12,2} x_{2,t-2} + \varepsilon_{1t} \qquad (3.1)$$
$$x_{2t} = \alpha_{20} + \alpha_{21,1} x_{1,t-1} + \alpha_{22,1} x_{2,t-1} + \alpha_{21,2} x_{1,t-2} + \alpha_{22,2} x_{2,t-2} + \varepsilon_{2t} \qquad (3.2)$$
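A minimal numpy sketch of this equation-by-equation OLS procedure, assuming the data sit in a (T+p) × m array X (the helper name and layout are illustrative):

```python
import numpy as np

def estimate_var_ols(X: np.ndarray, p: int) -> tuple[np.ndarray, np.ndarray]:
    """Equation-by-equation OLS for a VAR(p).

    X is (T+p) x m. Returns (B, E): B is (mp+1) x m coefficients
    (intercept first), E is the T x m residual matrix.
    """
    T_full, m = X.shape
    T = T_full - p
    # Regressor matrix: constant, then lags 1..p of all m variables
    Z = np.hstack([np.ones((T, 1))] +
                  [X[p - j: T_full - j] for j in range(1, p + 1)])
    Y = X[p:]                                   # left-hand-side observations
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # one OLS fit per column of Y
    E = Y - Z @ B
    return B, E
```

Because every equation has the same regressors, a single least-squares call with Y holding all m dependent variables reproduces the m separate OLS regressions.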


Because the regressors are random variables, the finite sample properties of the OLS estimator are generally unknown and we have to rely on asymptotic results to justify our inference procedures. The following asymptotic results can be shown to hold under fairly mild assumptions.

Provided that the errors in each equation are not autocorrelated, the OLS estimator of the parameters in (3.1) and (3.2) is biased but consistent.

The conventional t-statistic for testing the individual significance of the regressors in (3.1) and (3.2) has a standard normal distribution asymptotically. For example, in (3.1),

$$\frac{\hat\alpha_{12,1}}{SE(\hat\alpha_{12,1})} \xrightarrow{d} N(0,1), \qquad (3.10)$$


where $\hat\alpha_{12,1}$ is the OLS estimator of $\alpha_{12,1}$ and $SE(\hat\alpha_{12,1})$ is its standard error. Consequently, we can test the individual significance of the regressors in (3.1) and (3.2) by using the usual test statistic and taking the critical value for the test from the standard normal table.

We can test a set of linear restrictions on one or both of the equations by performing a likelihood ratio test. However, these tests are valid only asymptotically and may give misleading results in small samples.

Note that autocorrelation in the error term in the ith equation is potentially a serious problem since, depending on the order of the autocorrelation, it can render the OLS estimator inconsistent.


Recall that

$$\operatorname{cov}(\varepsilon_t) = \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \cdots & \sigma_{2m} \\ \vdots & & \ddots & & \vdots \\ \sigma_{m1} & \sigma_{m2} & \cdots & & \sigma_m^2 \end{pmatrix}\ \forall t. \qquad (3.7)$$

Let $\hat\varepsilon_i$ denote the T×1 vector of residuals obtained when the ith equation in the VAR(p) is estimated by OLS. That is,

$$\hat\varepsilon_i = (\hat\varepsilon_{i1}, \hat\varepsilon_{i2}, \dots, \hat\varepsilon_{iT})'.$$

A consistent estimator of $\Sigma$ is given by

$$\hat\Sigma = \begin{pmatrix} \hat\sigma_1^2 & \hat\sigma_{12} & \hat\sigma_{13} & \cdots & \hat\sigma_{1m} \\ \hat\sigma_{21} & \hat\sigma_2^2 & \hat\sigma_{23} & \cdots & \hat\sigma_{2m} \\ \vdots & & \ddots & & \vdots \\ \hat\sigma_{m1} & \hat\sigma_{m2} & \cdots & & \hat\sigma_m^2 \end{pmatrix}, \qquad (3.11)$$

where

$$\hat\sigma_i^2 = \frac{\hat\varepsilon_i'\hat\varepsilon_i}{T - mp - 1}, \qquad (3.12)$$

$$\hat\sigma_{ij} = \frac{\hat\varepsilon_i'\hat\varepsilon_j}{T - mp - 1}. \qquad (3.13)$$

For example,

$$\hat\sigma_{12} = \frac{\hat\varepsilon_1'\hat\varepsilon_2}{T - mp - 1}.$$
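Given the residual matrix E returned by the estimation sketch above, (3.11)–(3.13) reduce to a single matrix product with the degrees-of-freedom correction T − mp − 1:

```python
def estimate_sigma(E: np.ndarray, m: int, p: int) -> np.ndarray:
    """Consistent estimator of the error covariance matrix, as in (3.11)-(3.13)."""
    T = E.shape[0]
    return (E.T @ E) / (T - m * p - 1)   # (i, j) entry is eps_i' eps_j / (T - mp - 1)
```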

3.4 Forecasting a Stationary VAR(p)

The results on forecasting a stationary univariate time series derived in Topic 1.10 generalize in an obvious way to a stationary VAR(p). In particular, if

$$\varepsilon_t \sim i.i.d.(0, \Sigma), \qquad (3.14)$$

then $E_T(x_{T+k})$ is the optimal linear predictor of $x_{T+k}$ in the sense that it minimizes the MSE of the forecasts.
Updating

$$X_t = A_0 + A_1 X_{t-1} + A_2 X_{t-2} + \dots + A_p X_{t-p} + \varepsilon_t \qquad (3.6)$$

to period T + 1 we obtain

$$X_{T+1} = A_0 + A_1 X_T + A_2 X_{T-1} + \dots + A_p X_{T-p+1} + \varepsilon_{T+1}. \qquad (3.15)$$
Then

$$P_{T+1|T} = E_T(X_{T+1}) = A_0 + A_1 X_T + A_2 X_{T-1} + \dots + A_p X_{T-p+1}. \qquad (3.16)$$

Similarly,

$$X_{T+2} = A_0 + A_1 X_{T+1} + A_2 X_T + \dots + A_p X_{T-p+2} + \varepsilon_{T+2},$$

and therefore

$$P_{T+2|T} = E_T(X_{T+2}) = A_0 + A_1 E_T(X_{T+1}) + A_2 X_T + \dots + A_p X_{T-p+2} = A_0 + A_1 P_{T+1|T} + A_2 X_T + \dots + A_p X_{T-p+2}. \qquad (3.16a)$$

Consequently, once we obtain $P_{T+1|T}$ from (3.16) we can obtain $P_{T+2|T}$ from (3.16a).

In general, the forecasts from the VAR(p) obey the same VAR(p) recursion. That is,

$$P_{T+k|T} = A_0 + A_1 P_{T+k-1|T} + A_2 P_{T+k-2|T} + \dots + A_p P_{T+k-p|T}, \qquad (3.17)$$

where $P_{T+j|T} = X_{T+j}$ for $j \le 0$. When the unknown coefficient matrices are replaced by their OLS estimates we obtain

$$\hat P_{T+k|T} = \hat A_0 + \hat A_1 \hat P_{T+k-1|T} + \hat A_2 \hat P_{T+k-2|T} + \dots + \hat A_p \hat P_{T+k-p|T}. \qquad (3.18)$$

Equation (3.18) can be used to recursively generate forecasts of the variables in the VAR(p) for any forecast horizon k.
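A sketch of the recursion (3.18), assuming A0 and A = [A1, ..., Ap] hold the estimated coefficient matrices:

```python
def var_forecast(X: np.ndarray, A0: np.ndarray, A: list[np.ndarray], k: int) -> np.ndarray:
    """Recursive VAR(p) forecasts as in (3.18). X is the (T+p) x m sample."""
    p = len(A)
    history = [X[-j] for j in range(1, p + 1)]   # X_T, X_{T-1}, ..., X_{T-p+1}
    forecasts = []
    for _ in range(k):
        f = A0 + sum(Aj @ h for Aj, h in zip(A, history))
        forecasts.append(f)
        history = [f] + history[:-1]             # newest forecast enters at lag 1
    return np.array(forecasts)                   # k x m: P_{T+1|T}, ..., P_{T+k|T}
```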

3.5 Likelihood Ratio Tests in VARs

In the context of VARs, hypotheses of interest to economists can often be expressed as exclusion restrictions on a subset of the coefficients of the VAR. Such hypotheses can easily be tested by performing a likelihood ratio test.
Likelihood ratio tests are based upon the following well known result:

$$LR = 2(L_u - L_R) \overset{asy}{\sim} \chi^2(q), \qquad (3.19)$$

where:
LR is the likelihood ratio test statistic;
$L_u$ is the maximized log-likelihood of the unrestricted model;
$L_R$ is the maximized log-likelihood of the restricted model;
q is the number of restrictions imposed under the null hypothesis.


The likelihood ratio test statistic for testing exclusion restrictions may also be written as

$$LR = T\left[\log|\hat\Sigma_R| - \log|\hat\Sigma_u|\right] \overset{asy}{\sim} \chi^2(q), \qquad (3.20)$$

where:
T is the number of observations;
$\hat\Sigma_R$ is the estimated error covariance matrix from the restricted model;
$\hat\Sigma_u$ is the estimated error covariance matrix from the unrestricted model.


Sims (1980) suggested a modification to the likelihood ratio test which corrects for finite sample bias. The modified LR statistic is given by

$$LR = (T - k)\left[\log|\hat\Sigma_R| - \log|\hat\Sigma_u|\right] \overset{asy}{\sim} \chi^2(q), \qquad (3.21)$$

where $k = 1 + mp$ is the number of regressors per equation in the unrestricted model.
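A sketch of (3.21), assuming the two covariance matrices have been computed as in section 3.3; scipy supplies only the chi-squared critical value:

```python
import numpy as np
from scipy.stats import chi2

def sims_lr_test(Sigma_R, Sigma_u, T, k, q, alpha=0.05):
    """Modified LR test (3.21): k = regressors per unrestricted equation,
    q = number of restrictions. Returns (statistic, critical value, reject)."""
    lr = (T - k) * (np.log(np.linalg.det(Sigma_R)) -
                    np.log(np.linalg.det(Sigma_u)))
    crit = chi2.ppf(1 - alpha, df=q)
    return lr, crit, lr > crit
```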

3.6 VAR Order Selection
3.6.1 Introduction

We say that the m dimensional VAR $x_t$ has order p if the following condition is satisfied:

$$A_p \neq 0 \quad \text{and} \quad A_i = 0\ \forall i > p.$$

We rarely, if ever, know the true order of the VAR. In practice, we must use the data to determine p.
Making an erroneous choice of p may have serious consequences. More specifically:
Choosing a value for p which is smaller than the true value (under-parameterizing the VAR) may cause the OLS estimator to be inefficient and/or inconsistent.
Choosing a value of p which is greater than the true value (over-parameterizing the VAR) may reduce the efficiency of the OLS estimator and the precision of forecasts based on the estimated VAR.

Adding an extra lag to an m dimensional VAR increases the number of parameters to be estimated by $m^2$, while decreasing by 1 the number of observations available for estimating each equation.
For example, if m = 5, adding an extra lag increases the number of parameters to be estimated by 25, but reduces the number of observations available by 1. Adding an extra 2 lags increases the number of parameters to be estimated by 50 and reduces the number of observations available by 2.
There is no consensus on the best way to choose the VAR order. Below, we briefly discuss several alternative data-based approaches to choosing p.

3.6.2 A sequential test for selecting the VAR order

One approach to selecting the VAR order is to start with an arbitrarily chosen upper bound for p and conduct a sequence of hypothesis tests which terminates when a statistically significant coefficient matrix is encountered. The sequential testing approach chooses the VAR order by executing the following sequence of steps:

S1 Arbitrarily choose an initial VAR order p.

S2 Test

$$H_0: A_p = 0 \qquad (3.22)$$
$$H_1: A_p \neq 0.$$


The unrestricted model is given by

$$X_t = A_0 + A_1 X_{t-1} + \dots + A_{p-2} X_{t-(p-2)} + A_{p-1} X_{t-(p-1)} + A_p X_{t-p} + \varepsilon_t, \qquad (3.23)$$

and the restricted model is given by

$$X_t = A_0 + A_1 X_{t-1} + \dots + A_{p-2} X_{t-(p-2)} + A_{p-1} X_{t-(p-1)} + \varepsilon_t. \qquad (3.24)$$

Notice that (3.22) requires that every element of the m×m matrix $A_p$ is zero ($A_p$ is a null matrix). Therefore (3.22) imposes $m^2$ linear restrictions on the coefficients of (3.23), so under the null hypothesis

$$LR = 2(L_u - L_R) \overset{asy}{\sim} \chi^2(m^2),$$


or, using (3.21),

$$LR = (T - k)\left[\log|\hat\Sigma_R| - \log|\hat\Sigma_u|\right] \overset{asy}{\sim} \chi^2(m^2).$$

S3 Estimate (3.23) and (3.24) and obtain $L_u$ and $L_R$ respectively.

S4 If $LR_{calc} > LR_{crit}$ we reject the null and conclude that the VAR order is p. If $LR_{calc} < LR_{crit}$ we do not reject the null and proceed to step 5.


S5 Test

$$H_0: A_{p-1} = 0 \qquad (3.25)$$
$$H_1: A_{p-1} \neq 0.$$

The unrestricted model for the test is

$$X_t = A_0 + A_1 X_{t-1} + \dots + A_{p-2} X_{t-(p-2)} + A_{p-1} X_{t-(p-1)} + \varepsilon_t, \qquad (3.26)$$

and the restricted model is

$$X_t = A_0 + A_1 X_{t-1} + \dots + A_{p-2} X_{t-(p-2)} + \varepsilon_t. \qquad (3.27)$$

If $LR_{calc} > LR_{crit}$ we reject the null and conclude that the VAR order is p − 1.

If $LR_{calc} < LR_{crit}$ we do not reject the null and proceed to test

$$H_0: A_{p-2} = 0$$
$$H_1: A_{p-2} \neq 0.$$

The sequential testing procedure concludes as soon as we reject the null hypothesis.
Note:


At each stage of the testing procedure, the unrestricted model contains more lags, and therefore fewer observations, than the restricted model. However, both the unrestricted and restricted models should be estimated using the same number of observations. Therefore, when estimating the restricted model the sample should be adjusted so that it coincides with that over which the unrestricted model is estimated.
This sequential testing procedure is automated in Eviews. The automated Eviews procedure uses the modified likelihood ratio test statistic

$$LR = (T - k)\left[\log|\hat\Sigma_R| - \log|\hat\Sigma_u|\right] \overset{asy}{\sim} \chi^2(m^2). \qquad (3.21)$$

3.6.3 Using the Final Prediction Error criterion (FPE) to select the VAR order

A VAR is often used for the purpose of forecasting future values of the variables in the VAR. In such cases, we may be more interested in obtaining a good model for forecasting than in determining the true order of the underlying DGP. If forecasting is the primary objective, it makes sense to choose the VAR order so that it minimizes some measure of forecast imprecision, such as the mean squared error (MSE) of the forecast.
The FPE is a generalization of the MSE of the one-step-ahead forecast and is defined as

$$FPE(p) = \left(\frac{T + mp + 1}{T - mp - 1}\right)^{m} \det\hat\Sigma(p), \qquad (3.28)$$

where $\hat\Sigma(p)$ is the estimated contemporaneous covariance matrix of the error vector. The criterion is applied by executing the following steps:

S1 Choose an upper bound for p, which we denote by B.

S2 Estimate VAR models of orders p = 0, 1, 2, ..., B and compute the corresponding FPE(p).

S3 Choose as the estimate of p that value which minimizes FPE(p) over the set of integers (0, 1, 2, ..., B).
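A sketch of this search, reusing the illustrative helpers (and numpy import) from section 3.3; for simplicity the search starts at p = 1:

```python
def select_order_fpe(X: np.ndarray, B: int) -> int:
    """Return the order p in 1..B minimizing FPE(p) as defined in (3.28)."""
    m = X.shape[1]
    best_p, best_fpe = 1, np.inf
    for p in range(1, B + 1):
        _, E = estimate_var_ols(X, p)
        T = E.shape[0]
        sigma = estimate_sigma(E, m, p)
        fpe = ((T + m * p + 1) / (T - m * p - 1)) ** m * np.linalg.det(sigma)
        if fpe < best_fpe:
            best_p, best_fpe = p, fpe
    return best_p
```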

3.6.4 Using information criteria to select the VAR order

Information criteria are often used for VAR order selection. The two most popular information criteria are

$$AIC(p) = \ln\left[\det\hat\Sigma(p)\right] + (pm^2)\frac{2}{T}, \qquad (3.29)$$

$$BIC(p) = \ln\left[\det\hat\Sigma(p)\right] + (pm^2)\frac{\ln T}{T}. \qquad (3.30)$$

The AIC and BIC (the latter referred to in Eviews as the Schwarz criterion) attempt to achieve a compromise between choosing a model which fits the data well and one which is parsimonious (i.e. does not have too many parameters to estimate).


Goodness of fit is measured by

$$\ln\left[\det\hat\Sigma(p)\right], \qquad (3.31)$$

since $\det\hat\Sigma(p)$ will tend to be small if the model fits well. However, it is always possible to make $\ln[\det\hat\Sigma(p)]$ smaller by adding additional lags.
We therefore add a penalty term: $(pm^2)\frac{2}{T}$ in the case of the AIC and $(pm^2)\frac{\ln T}{T}$ in the case of the BIC.
p is then chosen to minimize either (3.29) or (3.30).
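Both criteria are a one-line computation once $\hat\Sigma(p)$ is available; a sketch reusing the earlier illustrative helpers:

```python
def aic_bic(X: np.ndarray, p: int) -> tuple[float, float]:
    """AIC(p) and BIC(p) as in (3.29)-(3.30)."""
    m = X.shape[1]
    _, E = estimate_var_ols(X, p)
    T = E.shape[0]
    log_det = np.log(np.linalg.det(estimate_sigma(E, m, p)))
    return (log_det + p * m**2 * 2 / T,          # AIC penalty: 2/T per parameter block
            log_det + p * m**2 * np.log(T) / T)  # BIC penalty: ln(T)/T per parameter block
```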


Since ln(T) > 2 for T ≥ 8, the BIC penalizes additional lags more severely than the AIC, thereby encouraging the selection of a more parsimonious model.
Because it penalizes additional lags more severely than does the AIC, the BIC will never select a larger value of p than that selected by the AIC.

3.6.5 Diagnostic checks

After estimating the selected VAR one can perform various diagnostic
tests to determine whether or not the error term in each equation is
"well behaved".
One can test the hypothesis that the error term is normally
distributed in each equation in the VAR.
One can test for evidence of autocorrelation in the error term in each
equation in the VAR.
These tests are automated in Eviews.

3.7 Testing for Granger-causality
3.7.1 An alternative representation of a VAR(p)

We usually write an m dimensional VAR(p) process by arranging the VAR by lag. In the case of an m dimensional VAR(p) process this representation is given by

$$X_t = A_0 + A_1 X_{t-1} + A_2 X_{t-2} + \dots + A_{p-1} X_{t-(p-1)} + A_p X_{t-p} + \varepsilon_t, \qquad (3.32)$$

where $X_t$, $A_0$ and $\varepsilon_t$ are m×1 column vectors and $A_j$ is an m×m matrix for all j = 1, 2, ..., p. Using summation notation, equation (3.32) may be written more compactly as

$$X_t = A_0 + \sum_{i=1}^{p} A_i X_{t-i} + \varepsilon_t. \qquad (3.33)$$


However, for some purposes, including testing for Granger-causality, it is more convenient to arrange the VAR by variable. For example, we may write the individual equations of (3.32) as


$$x_{1t} = \alpha_{10} + \sum_{j=1}^{p}\alpha_{11,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{12,j}x_{2,t-j} + \dots + \sum_{j=1}^{p}\alpha_{1m,j}x_{m,t-j} + \varepsilon_{1t}$$
$$x_{2t} = \alpha_{20} + \sum_{j=1}^{p}\alpha_{21,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{22,j}x_{2,t-j} + \dots + \sum_{j=1}^{p}\alpha_{2m,j}x_{m,t-j} + \varepsilon_{2t}$$
$$\vdots$$
$$x_{mt} = \alpha_{m0} + \sum_{j=1}^{p}\alpha_{m1,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{m2,j}x_{2,t-j} + \dots + \sum_{j=1}^{p}\alpha_{mm,j}x_{m,t-j} + \varepsilon_{mt} \qquad (3.34)$$
3.7.2 Testing for Granger-causality using a likelihood ratio test

Economists are often interested in whether or not one variable "causes" another variable. For example, does an increase in interest rates "cause" an appreciation of the exchange rate?
In a famous paper in 1969 the Nobel Laureate Clive Granger defined a concept of causality which has since become known as Granger-causality.
Granger argues that since a cause must precede an effect, if the variable $x_2$ "causes" the variable $x_1$, past and present values of $x_2$ should help to forecast future values of $x_1$.
Granger proposed a test of the hypothesis that one variable (or set of variables) in a VAR Granger-causes another variable (or set of variables) in the VAR. Testing for Granger-causality is very common in empirical studies in macroeconomics and finance.


Informally, the time series $x_2$ is said to Granger-cause the time series $x_1$ if using information about current and past values of $x_2$ improves the accuracy of forecasts of $x_1$. That is, $x_2$ Granger-causes $x_1$ if $x_2$ is useful in predicting future values of $x_1$.
A formal definition of Granger-causality is given below.


Definition 3.2. Let $I_t$ denote the information set containing all the relevant information in the universe up to and including period t. Let

$$\bar I_t = I_t - \{x_{2s} \mid s \le t\}$$

denote the information set containing all relevant information in the universe excluding past and present values of $x_2$. Then we say that $x_2$ Granger-causes $x_1$ if

$$mse(\hat x_{1,t+k} \mid I_t) < mse(\hat x_{1,t+k} \mid \bar I_t)$$

for at least one k = 1, 2, ..., where $mse(\hat x_{1,t+k} \mid I_t)$ and $mse(\hat x_{1,t+k} \mid \bar I_t)$ denote the mean squared error of a forecast of $x_1$ in period t + k conditional on $I_t$ and $\bar I_t$ respectively.


In the 2 dimensional VAR(2)

$$x_{1t} = \alpha_{10} + \alpha_{11,1} x_{1,t-1} + \alpha_{12,1} x_{2,t-1} + \alpha_{11,2} x_{1,t-2} + \alpha_{12,2} x_{2,t-2} + \varepsilon_{1t}, \qquad (3.1)$$
$$x_{2t} = \alpha_{20} + \alpha_{21,1} x_{1,t-1} + \alpha_{22,1} x_{2,t-1} + \alpha_{21,2} x_{1,t-2} + \alpha_{22,2} x_{2,t-2} + \varepsilon_{2t}, \qquad (3.2)$$

it can be shown that Definition 3.2 implies that $x_2$ does not Granger-cause $x_1$ if and only if

$$\alpha_{12,1} = \alpha_{12,2} = 0$$

in (3.1). That is, lags of $x_2$ do not belong in (3.1).


Therefore, testing the null hypothesis that $x_2$ does not Granger-cause $x_1$ is equivalent to testing

$$H_0: \alpha_{12,1} = \alpha_{12,2} = 0 \qquad (3.36)$$

against

$$H_1: \alpha_{12,j} \neq 0 \text{ for at least one value of } j.$$

Under the null hypothesis,

$$LR = 2(L_u - L_r) \overset{asy}{\sim} \chi^2(2).$$

We can test (3.36) by executing the following steps:


S1 Estimate

$$x_{1t} = \alpha_{10} + \alpha_{11,1} x_{1,t-1} + \alpha_{12,1} x_{2,t-1} + \alpha_{11,2} x_{1,t-2} + \alpha_{12,2} x_{2,t-2} + \varepsilon_{1t} \qquad (3.1)$$

by OLS and obtain $L_u$.

S2 Impose the restrictions on (3.1) and obtain the restricted equation

$$x_{1t} = \alpha_{10} + \alpha_{11,1} x_{1,t-1} + \alpha_{11,2} x_{1,t-2} + \varepsilon_{1t}. \qquad (3.37)$$

S3 Estimate (3.37) by OLS and obtain $L_r$.

S4 Reject the null hypothesis that $x_2$ does not Granger-cause $x_1$ if

$$LR_{calc} = 2(L_u - L_r) > LR_{crit}, \qquad (3.38)$$

where $LR_{crit}$ is the (1 − α) percentile of a $\chi^2$ variable with 2 degrees of freedom.
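A single-equation sketch of steps S1–S4. Under Gaussian errors, the equation-level LR statistic $2(L_u - L_r)$ reduces to the equivalent RSS-based form $T\ln(RSS_r/RSS_u)$, so the test needs only two OLS fits; the variable layout is assumed as in the earlier sketches:

```python
import numpy as np
from scipy.stats import chi2

def granger_lr_2dim(x1: np.ndarray, x2: np.ndarray, alpha=0.05):
    """LR test that x2 does not Granger-cause x1 in a bivariate VAR(2)."""
    T = len(x1) - 2
    y = x1[2:]
    const = np.ones(T)
    Zu = np.column_stack([const, x1[1:-1], x2[1:-1], x1[:-2], x2[:-2]])  # (3.1)
    Zr = np.column_stack([const, x1[1:-1], x1[:-2]])                     # (3.37)
    rss = lambda Z: np.sum((y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]) ** 2)
    lr = T * np.log(rss(Zr) / rss(Zu))       # equals 2(Lu - Lr) under normality
    return lr, lr > chi2.ppf(1 - alpha, df=2)
```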

Note:
The null hypothesis for the test is that $x_2$ does not Granger-cause $x_1$. That is, the null hypothesis is that there is no Granger-causality. If we reject the null, we conclude that $x_2$ does Granger-cause $x_1$.
The test is an asymptotic test, since only the asymptotic distribution of the test statistic is known. Consequently, the test may be unreliable in small samples.
To test the null hypothesis that $x_1$ does not Granger-cause $x_2$, we test

$$H_0: \alpha_{21,1} = \alpha_{21,2} = 0.$$

The unrestricted model is

$$x_{2t} = \alpha_{20} + \alpha_{21,1} x_{1,t-1} + \alpha_{21,2} x_{1,t-2} + \alpha_{22,1} x_{2,t-1} + \alpha_{22,2} x_{2,t-2} + \varepsilon_{2t},$$

and the restricted model is

$$x_{2t} = \alpha_{20} + \alpha_{22,1} x_{2,t-1} + \alpha_{22,2} x_{2,t-2} + \varepsilon_{2t}.$$


Granger-causality testing can easily be extended to higher dimensional VARs. For example, consider the four dimensional VAR(p) given by


$$x_{1t} = \alpha_{10} + \sum_{j=1}^{p}\alpha_{11,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{12,j}x_{2,t-j} + \sum_{j=1}^{p}\alpha_{13,j}x_{3,t-j} + \sum_{j=1}^{p}\alpha_{14,j}x_{4,t-j} + \varepsilon_{1t}$$
$$x_{2t} = \alpha_{20} + \sum_{j=1}^{p}\alpha_{21,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{22,j}x_{2,t-j} + \sum_{j=1}^{p}\alpha_{23,j}x_{3,t-j} + \sum_{j=1}^{p}\alpha_{24,j}x_{4,t-j} + \varepsilon_{2t}$$
$$x_{3t} = \alpha_{30} + \sum_{j=1}^{p}\alpha_{31,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{32,j}x_{2,t-j} + \sum_{j=1}^{p}\alpha_{33,j}x_{3,t-j} + \sum_{j=1}^{p}\alpha_{34,j}x_{4,t-j} + \varepsilon_{3t}$$
$$x_{4t} = \alpha_{40} + \sum_{j=1}^{p}\alpha_{41,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{42,j}x_{2,t-j} + \sum_{j=1}^{p}\alpha_{43,j}x_{3,t-j} + \sum_{j=1}^{p}\alpha_{44,j}x_{4,t-j} + \varepsilon_{4t}$$

We can test the null hypothesis that $x_1$ and $x_2$ do not Granger-cause either $x_3$ or $x_4$; that is, that the set of variables consisting of $x_1$ and $x_2$ does not Granger-cause the set of variables consisting of $x_3$ and $x_4$. The null hypothesis for the test is

$$\alpha_{31,1} = \dots = \alpha_{31,p} = 0,$$
$$\alpha_{32,1} = \dots = \alpha_{32,p} = 0,$$
$$\alpha_{41,1} = \dots = \alpha_{41,p} = 0,$$
$$\alpha_{42,1} = \dots = \alpha_{42,p} = 0.$$

Notice that the null hypothesis imposes 4p restrictions on the model: 2p restrictions on the equation for $x_3$ and 2p restrictions on the equation for $x_4$.

The unrestricted model for the test is

$$x_{3t} = \alpha_{30} + \sum_{j=1}^{p}\alpha_{31,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{32,j}x_{2,t-j} + \sum_{j=1}^{p}\alpha_{33,j}x_{3,t-j} + \sum_{j=1}^{p}\alpha_{34,j}x_{4,t-j} + \varepsilon_{3t}$$
$$x_{4t} = \alpha_{40} + \sum_{j=1}^{p}\alpha_{41,j}x_{1,t-j} + \sum_{j=1}^{p}\alpha_{42,j}x_{2,t-j} + \sum_{j=1}^{p}\alpha_{43,j}x_{3,t-j} + \sum_{j=1}^{p}\alpha_{44,j}x_{4,t-j} + \varepsilon_{4t}, \qquad (3.39)$$


and the restricted model is

$$x_{3t} = \alpha_{30} + \sum_{j=1}^{p}\alpha_{33,j}x_{3,t-j} + \sum_{j=1}^{p}\alpha_{34,j}x_{4,t-j} + \varepsilon_{3t}$$
$$x_{4t} = \alpha_{40} + \sum_{j=1}^{p}\alpha_{43,j}x_{3,t-j} + \sum_{j=1}^{p}\alpha_{44,j}x_{4,t-j} + \varepsilon_{4t}. \qquad (3.40)$$

We perform the test by executing the following steps:
S1 Estimate (3.39) and obtain $L_u$.
S2 Estimate (3.40) and obtain $L_r$.


S3 Reject the null hypothesis if

$$LR_{calc} = 2(L_u - L_r) > LR_{crit},$$

where $LR_{crit}$ is the (1 − α) percentile of a $\chi^2$ variable with 4p degrees of freedom.

A test of the hypothesis that one of the variables in the system does not Granger-cause the remaining variables is referred to as a block-exogeneity test. In the four dimensional VAR(p) above, a test of the hypothesis that $x_1$ does not Granger-cause $x_2$, $x_3$ or $x_4$ is an example of a block-exogeneity test.
Bi-causality between two time series is quite common. That is, we may reject the null hypothesis that $x_1$ does not Granger-cause $x_2$ (implying that $x_1$ does Granger-cause $x_2$), and also reject the null hypothesis that $x_2$ does not Granger-cause $x_1$ (implying that $x_2$ does Granger-cause $x_1$).
3.7.3 Limitations of Granger-causality tests

One must be careful to distinguish between Granger-causality and causality in the conventional sense of cause and effect. The fact that $x_2$ Granger-causes $x_1$ does not necessarily mean that $x_2$ is the cause of $x_1$. It merely means that $x_2$ is useful in forecasting $x_1$. Even if there is no causal relation between $x_1$ and $x_2$, but they are highly correlated, we would expect each variable to be useful for forecasting the other.
Changing the nature of the information set can alter the outcome of Granger-causality tests. For example, including an additional variable in the VAR may change the outcome of causality tests between $x_1$ and $x_2$.
Granger-causality tests are often not robust with respect to the choice of VAR order. That is, changing the order of the VAR may change the outcome of a Granger-causality test.


The standard Granger-causality test described above is invalid if the variables in the VAR are not I(0).

3.8 Example 3.1

Suppose that we wish to estimate a VAR to explore the relationships between the following three variables:

dTB = change in the treasury bill rate
dR3 = change in the 3 year bond rate
dR10 = change in the 10 year bond rate

We have a sample consisting of quarterly observations on the three variables from 1960Q1 to 1991Q4.


The VAR orders selected by the various VAR order selection criteria that we have discussed are reported in the table below.

        ST   FPE   AIC   BIC
   p     7     2     7     0

Since there is evidence of autocorrelation in the errors at lag 2 but not at lag 7, we decided to estimate a VAR(7). We include a constant in the VAR(7) to allow for non-zero means in dTB, dR3 and dR10. The unrestricted VAR(7) is
$$dTB_t = \alpha_{10} + \sum_{j=1}^{7}\alpha_{11,j}\,dTB_{t-j} + \sum_{j=1}^{7}\alpha_{12,j}\,dR3_{t-j} + \sum_{j=1}^{7}\alpha_{13,j}\,dR10_{t-j} + \varepsilon_{1t}$$
$$dR3_t = \alpha_{20} + \sum_{j=1}^{7}\alpha_{21,j}\,dTB_{t-j} + \sum_{j=1}^{7}\alpha_{22,j}\,dR3_{t-j} + \sum_{j=1}^{7}\alpha_{23,j}\,dR10_{t-j} + \varepsilon_{2t}$$
$$dR10_t = \alpha_{30} + \sum_{j=1}^{7}\alpha_{31,j}\,dTB_{t-j} + \sum_{j=1}^{7}\alpha_{32,j}\,dR3_{t-j} + \sum_{j=1}^{7}\alpha_{33,j}\,dR10_{t-j} + \varepsilon_{3t}$$

We next test the null hypothesis that dTB does not Granger-cause dR3 or dR10. The null hypothesis is

$$\alpha_{21,j} = 0,\quad j = 1, 2, \dots, 7,$$
$$\alpha_{31,j} = 0,\quad j = 1, 2, \dots, 7. \qquad (3.41)$$


The unrestricted model for dR3 and dR10 is

$$dR3_t = \alpha_{20} + \sum_{j=1}^{7}\alpha_{21,j}\,dTB_{t-j} + \sum_{j=1}^{7}\alpha_{22,j}\,dR3_{t-j} + \sum_{j=1}^{7}\alpha_{23,j}\,dR10_{t-j} + \varepsilon_{2t}$$
$$dR10_t = \alpha_{30} + \sum_{j=1}^{7}\alpha_{31,j}\,dTB_{t-j} + \sum_{j=1}^{7}\alpha_{32,j}\,dR3_{t-j} + \sum_{j=1}^{7}\alpha_{33,j}\,dR10_{t-j} + \varepsilon_{3t}$$


The restricted model for dR3 and dR10 is

$$dR3_t = \alpha_{20} + \sum_{j=1}^{7}\alpha_{22,j}\,dR3_{t-j} + \sum_{j=1}^{7}\alpha_{23,j}\,dR10_{t-j} + \varepsilon_{2t}$$
$$dR10_t = \alpha_{30} + \sum_{j=1}^{7}\alpha_{32,j}\,dR3_{t-j} + \sum_{j=1}^{7}\alpha_{33,j}\,dR10_{t-j} + \varepsilon_{3t}$$

Under the null hypothesis,

$$LR = 2(L_u - L_r) \overset{asy}{\sim} \chi^2(14).$$

$$LR_{calc} = 2[60.92432 - 47.4175] = 27.01364.$$

$$LR_{crit} = 23.68.$$
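The critical value is the 95th percentile of a $\chi^2(14)$ distribution, which can be reproduced with scipy:

```python
from scipy.stats import chi2

lr_calc = 2 * (60.92432 - 47.4175)   # 27.01364
lr_crit = chi2.ppf(0.95, df=14)      # approx. 23.68
print(lr_calc > lr_crit)             # True: reject the null
```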


Since $LR_{calc} > LR_{crit}$, we reject the null hypothesis and conclude that dTB does Granger-cause dR3 and/or dR10.

3.9 Structural versus reduced form VARs

The m dimensional VAR(p)

$$X_t = A_0 + A_1 X_{t-1} + A_2 X_{t-2} + \dots + A_p X_{t-p} + \varepsilon_t \qquad (3.6)$$

is called a reduced form VAR.
The model given by (3.6) can be interpreted as the reduced form of an associated system of structural equations called a structural VAR (SVAR).


To illustrate the relationship between structural and reduced form VARs, consider the system of simultaneous structural equations

$$x_{1t} = a_0 + a_1 x_{2t} + a_2 x_{1,t-1} + a_3 x_{2,t-1} + a_4 x_{1,t-2} + a_5 x_{2,t-2} + u_{1t} \qquad (3.42)$$

and

$$x_{2t} = b_0 + b_1 x_{1t} + b_2 x_{1,t-1} + b_3 x_{2,t-1} + b_4 x_{1,t-2} + b_5 x_{2,t-2} + u_{2t}, \qquad (3.43)$$

where

$$\operatorname{cov}(u_{1t}, u_{2t}) = 0\ \forall t.$$

Notice that what distinguishes the system of equations given by (3.42) and (3.43) from a reduced form VAR is the presence of the contemporaneous regressors $x_{2t}$ and $x_{1t}$ in equations (3.42) and (3.43) respectively.

Because (3.42) and (3.43) simultaneously determine the equilibrium values of $x_{1t}$ and $x_{2t}$, the regressor $x_{2t}$ is endogenous in (3.42) and the regressor $x_{1t}$ is endogenous in (3.43). Consequently, (3.42) and (3.43) cannot be consistently estimated by least squares.
Moving the contemporaneous regressors in (3.42) and (3.43) to the left-hand side we obtain

$$x_{1t} - a_1 x_{2t} = a_0 + a_2 x_{1,t-1} + a_3 x_{2,t-1} + a_4 x_{1,t-2} + a_5 x_{2,t-2} + u_{1t},$$
$$-b_1 x_{1t} + x_{2t} = b_0 + b_2 x_{1,t-1} + b_3 x_{2,t-1} + b_4 x_{1,t-2} + b_5 x_{2,t-2} + u_{2t},$$

or, using matrix notation,

$$\begin{pmatrix} 1 & -a_1 \\ -b_1 & 1 \end{pmatrix}\begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} = \begin{pmatrix} a_0 \\ b_0 \end{pmatrix} + \begin{pmatrix} a_2 & a_3 \\ b_2 & b_3 \end{pmatrix}\begin{pmatrix} x_{1,t-1} \\ x_{2,t-1} \end{pmatrix} + \begin{pmatrix} a_4 & a_5 \\ b_4 & b_5 \end{pmatrix}\begin{pmatrix} x_{1,t-2} \\ x_{2,t-2} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}. \qquad (3.44)$$

Equation (3.44) may be written more compactly as

$$S X_t = S_0 + S_1 X_{t-1} + S_2 X_{t-2} + u_t, \qquad (3.45)$$

where

$$X_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix},\quad X_{t-1} = \begin{pmatrix} x_{1,t-1} \\ x_{2,t-1} \end{pmatrix},\quad u_t = \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix},$$

$$S = \begin{pmatrix} 1 & -a_1 \\ -b_1 & 1 \end{pmatrix},\quad S_0 = \begin{pmatrix} a_0 \\ b_0 \end{pmatrix},\quad S_1 = \begin{pmatrix} a_2 & a_3 \\ b_2 & b_3 \end{pmatrix},\quad S_2 = \begin{pmatrix} a_4 & a_5 \\ b_4 & b_5 \end{pmatrix}.$$
(3.45) is an example of an SVAR.
Because of the endogeneity problem alluded to above,

$$S X_t = S_0 + S_1 X_{t-1} + S_2 X_{t-2} + u_t \qquad (3.45)$$

cannot be consistently estimated by least squares.
However, premultiplying both sides of (3.45) by $S^{-1}$ and using the fact that $S^{-1}S = I_2$, we obtain

$$S^{-1}S X_t = S^{-1}S_0 + S^{-1}S_1 X_{t-1} + S^{-1}S_2 X_{t-2} + S^{-1}u_t,$$

or

$$X_t = A_0 + A_1 X_{t-1} + A_2 X_{t-2} + \varepsilon_t, \qquad (3.46)$$

where

$$A_0 = S^{-1}S_0,\quad A_1 = S^{-1}S_1,\quad A_2 = S^{-1}S_2,\quad \varepsilon_t = S^{-1}u_t. \qquad (3.47)$$

Under the assumption that

$$E(\varepsilon_t \varepsilon_{t+s}') = 0\ \forall s \neq 0,$$

all the regressors on the right-hand side of (3.46) are exogenous, and (3.46) may be interpreted as the reduced form of the SVAR

$$S X_t = S_0 + S_1 X_{t-1} + S_2 X_{t-2} + u_t. \qquad (3.45)$$

The relationships between the structural and the reduced form parameters, and between the structural and reduced form errors, are given by

$$A_0 = S^{-1}S_0,\quad A_1 = S^{-1}S_1,\quad A_2 = S^{-1}S_2,\quad \varepsilon_t = S^{-1}u_t. \qquad (3.47)$$

Notice that the reduced form parameters and the reduced form errors are nonlinear functions of their structural counterparts.
Notice also that even though there is no cross-equation correlation in the structural errors, that is,

$$E(u_t u_t') = I_2,$$

there is cross-equation correlation in the reduced form errors since, using (3.47),

$$E(\varepsilon_t \varepsilon_t') = E[(S^{-1}u_t)(S^{-1}u_t)'] = E[S^{-1}u_t u_t'(S^{-1})'] = S^{-1}E(u_t u_t')(S^{-1})' = S^{-1}I_2(S^{-1})' = S^{-1}(S^{-1})' \neq I_2.$$

The fact that there is cross-equation correlation in the reduced form errors but not in the structural errors will prove to be important when we discuss impulse response analysis in section 3.10 below.

3.10 Impulse Response Analysis
3.10.1 Introduction

Researchers are often interested in the response over time of one or more variables in a VAR to an impulse or shock to another variable in the VAR. The analysis of such responses is known as impulse response analysis or, less commonly, multiplier analysis.
In the univariate context the impulse response function traces out the effect on current and future values of $x_1$ of a shock to $x_1$ in the current period. However, in an m dimensional VAR with time series $x_1, x_2, \dots, x_m$, a shock to $x_1$ in period t will affect not just current and future values of $x_1$, but also current and future values of $x_2, \dots, x_m$.
For ease of exposition, in our discussion of impulse response analysis we will confine our attention to a stationary m dimensional VAR of order 1. It is straightforward to generalize the results to an m dimensional VAR of order p.

Let

$$X_t = A_1 X_{t-1} + \varepsilon_t, \qquad (3.46)$$

where $\varepsilon_t \sim VWN(0, \Sigma)$. Rearranging (3.46) and exploiting the stationarity assumption we obtain

$$X_t - A_1 X_{t-1} = \varepsilon_t$$
$$(I_m - A_1 L) X_t = \varepsilon_t$$
$$X_t = (I_m - A_1 L)^{-1}\varepsilon_t$$
$$X_t = (I_m + \Psi_1 L + \Psi_2 L^2 + \Psi_3 L^3 + \dots)\varepsilon_t$$
$$X_t = \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \Psi_3 \varepsilon_{t-3} + \dots \qquad (3.47)$$


It immediately follows from (3.47) that

$$\frac{\partial X_t}{\partial \varepsilon_{t-j}'} = \Psi_j. \qquad (3.48)$$

Equation (3.48) states that the response of the Xs in time period t to shocks or impulses j periods earlier is captured by the matrix $\Psi_j$. Therefore, the estimated response of the Xs in time period t to shocks or impulses j periods earlier is captured by the matrix $\hat\Psi_j$, where $\hat\Psi_j$ is an estimate of $\Psi_j$. That is,

$$\widehat{\frac{\partial X_t}{\partial \varepsilon_{t-j}'}} = \hat\Psi_j. \qquad (3.49)$$


Since we can't estimate the regression

$$X_t = \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \Psi_3 \varepsilon_{t-3} + \dots,$$

how do we estimate $\Psi_j$? We can estimate $\Psi_j$ by using the following recursive algorithm.

S1 Multiplying both sides of

$$(I_m - A_1 L)^{-1} = I_m + \Psi_1 L + \Psi_2 L^2 + \Psi_3 L^3 + \dots$$

by $(I_m - A_1 L)$ we obtain

$$I_m = (I_m - A_1 L)(I_m + \Psi_1 L + \Psi_2 L^2 + \Psi_3 L^3 + \dots),$$

so that

$$0_m = \Psi_1 L + \Psi_2 L^2 + \Psi_3 L^3 + \dots - A_1 L - A_1\Psi_1 L^2 - A_1\Psi_2 L^3 - A_1\Psi_3 L^4 - \dots \qquad (3.50)$$

S2 Collecting terms in L in (3.50) we obtain

$$0_m = (\Psi_1 - A_1)L + (\Psi_2 - A_1\Psi_1)L^2 + (\Psi_3 - A_1\Psi_2)L^3 + \dots \qquad (3.51)$$

It immediately follows from (3.51) that all the coefficient matrices on the right-hand side must be null matrices. Therefore,

$$\Psi_1 - A_1 = 0_m \Rightarrow \Psi_1 = A_1,$$
$$\Psi_2 - A_1\Psi_1 = 0_m \Rightarrow \Psi_2 = A_1\Psi_1 = A_1(A_1) = A_1^2,$$
$$\Psi_3 - A_1\Psi_2 = 0_m \Rightarrow \Psi_3 = A_1\Psi_2 = A_1(A_1^2) = A_1^3,$$

and so on. In summary, using this recursive algorithm we obtain

$$\Psi_1 = A_1,\quad \Psi_2 = A_1^2,\quad \Psi_3 = A_1^3, \dots \qquad (3.52)$$


Of course, when we estimate

$$X_t = A_1 X_{t-1} + \varepsilon_t \qquad (3.46)$$

we obtain estimates of the A matrices, which can then be used to construct estimates of the $\Psi$ matrices. That is,

$$\hat\Psi_1 = \hat A_1,\quad \hat\Psi_2 = \hat A_1^2,\quad \hat\Psi_3 = \hat A_1^3, \dots \qquad (3.53)$$

To consolidate ideas, consider a special case of (3.46) given by

$$\begin{pmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \end{pmatrix} = \begin{pmatrix} 0.5 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.3 \\ 0.3 & 0.2 & 0.3 \end{pmatrix}\begin{pmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix} \qquad (3.54)$$


with the contemporaneous covariance matrix of $\varepsilon_t$, which we denote by $\Sigma$, given by

$$\Sigma = \begin{pmatrix} 4.0 & 0.5 & 0.0 \\ 0.5 & 1.0 & 0.5 \\ 0.0 & 0.5 & 0.74 \end{pmatrix}. \qquad (3.55)$$

The model given by (3.54) and (3.55) is a special case of the general 3 dimensional VAR(1) process

$$X_t = A_1 X_{t-1} + \varepsilon_t,$$

where $\varepsilon_t \sim VWN(0, \Sigma)$,


with

$$A_1 = \begin{pmatrix} 0.5 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.3 \\ 0.3 & 0.2 & 0.3 \end{pmatrix},\quad \Sigma = \begin{pmatrix} 4.0 & 0.5 & 0.0 \\ 0.5 & 1.0 & 0.5 \\ 0.0 & 0.5 & 0.74 \end{pmatrix}.$$

Since

$$X_t = A_1 X_{t-1} + \varepsilon_t \ \Rightarrow\ X_t - A_1 X_{t-1} = \varepsilon_t \ \Rightarrow\ (I_3 - A_1 L)X_t = \varepsilon_t,$$

the autoregressive polynomial associated with (3.54) is

$$\Phi(L) = I_3 - A_1 L. \qquad (3.56)$$


The roots of (3.56) are the values of r that satisfy

$$|I_3 - rA_1| = 0,$$

where

$$A_1 = \begin{pmatrix} 0.5 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.3 \\ 0.3 & 0.2 & 0.3 \end{pmatrix}.$$

It is straightforward to show that

$$r_1 = 1.37,\quad r_2 = 5,\quad r_3 = -36.$$

Since these roots are all greater than one in absolute value (since each root is a real number, the modulus is the absolute value), it follows from Theorem 3.1 above that (3.54) is a stationary vector time series.


Suppose that we wish to trace the effects on the system of a one unit shock to $x_1$ in period 1, assuming that no other shocks occur in any time period. That is, we assume the following with respect to the shocks impacting on the system:

$$\varepsilon_{11} = 1,\quad \varepsilon_{21} = 0,\quad \varepsilon_{31} = 0,$$
$$\varepsilon_{1t} = \varepsilon_{2t} = \varepsilon_{3t} = 0,\quad t > 1. \qquad (3.57)$$

We established above that for a stationary VAR(1)

$$\frac{\partial X_t}{\partial \varepsilon_{t-j}'} = \Psi_j = A_1^j. \qquad (3.58)$$

Setting t = 1, j = 0

in (3.58) we obtain

$$\frac{\partial X_1}{\partial \varepsilon_1'} = \Psi_0 = A_1^0 = I_3. \qquad (3.59)$$

Using (3.59),

$$dX_1 = \frac{\partial X_1}{\partial \varepsilon_1'}\,d\varepsilon_1 = I_3\,d\varepsilon_1 = d\varepsilon_1.$$

That is,

$$\begin{pmatrix} dx_{11} \\ dx_{21} \\ dx_{31} \end{pmatrix} = \begin{pmatrix} d\varepsilon_{11} \\ d\varepsilon_{21} \\ d\varepsilon_{31} \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}. \qquad (3.60)$$

Equation (3.60) captures the impact effect on each of the x variables of the shock to $x_1$ in period 1.
Notice that the shock to $x_1$ in period 1 has no effect on $x_2$ or $x_3$ in period 1 (no impact effect).
However, because of the autoregressive structure of the VAR, the shock to $x_1$ in period 1 does affect all three variables in subsequent periods.
To compute the change in the x variables in period 2 caused by the shock to $x_1$ in time period 1 we set t = 2, j = 1 in

$$\frac{\partial X_t}{\partial \varepsilon_{t-j}'} = \Psi_j = A_1^j \qquad (3.58)$$


and obtain

$$\frac{\partial X_2}{\partial \varepsilon_1'} = \Psi_1 = A_1^1 = A_1. \qquad (3.61)$$

Therefore

$$dX_2 = \frac{\partial X_2}{\partial \varepsilon_1'}\,d\varepsilon_1 = A_1\,d\varepsilon_1.$$

That is,

$$\begin{pmatrix} dx_{12} \\ dx_{22} \\ dx_{32} \end{pmatrix} = \begin{pmatrix} 0.5 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.3 \\ 0.3 & 0.2 & 0.3 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.1 \\ 0.3 \end{pmatrix}. \qquad (3.62)$$


It is left as an exercise to show that

$$\begin{pmatrix} dx_{13} \\ dx_{23} \\ dx_{33} \end{pmatrix} = \begin{pmatrix} 0.30 \\ 0.15 \\ 0.26 \end{pmatrix}. \qquad (3.63)$$

Notice the following features of this exercise:
The shock to $x_1$ in period 1 has no impact effect on $x_2$ or $x_3$.
Because of the autoregressive structure of the VAR, the shock to $x_1$ in period 1 affects the values of all three variables in periods 2 and 3.
The effect of the shock on $x_1$ and $x_3$ has begun to decline by period 3. That is,

$$dx_{13} = 0.30 < dx_{12} = 0.50,\quad dx_{33} = 0.26 < dx_{32} = 0.30.$$
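These responses are just the first column of the matrices $A_1^j$, so the whole exercise can be reproduced in a few lines of numpy (a sketch using the $A_1$ from (3.54)):

```python
import numpy as np

A1 = np.array([[0.5, 0.2, 0.1],
               [0.1, 0.1, 0.3],
               [0.3, 0.2, 0.3]])
dX = np.array([1.0, 0.0, 0.0])     # one unit shock to x1 in period 1

for t in range(1, 4):              # periods 1, 2, 3
    print(t, np.round(dX, 2))      # (1,0,0), (0.5,0.1,0.3), (0.30,0.15,0.26)
    dX = A1 @ dX                   # Psi_j = A1**j for a VAR(1)
```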


We can also calculate the cumulative effect of the shock after n periods. For example, after three periods the cumulative effect of a one unit shock to $x_1$ in period 1 is

$$dX_1 + dX_2 + dX_3 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0.5 \\ 0.1 \\ 0.3 \end{pmatrix} + \begin{pmatrix} 0.30 \\ 0.15 \\ 0.26 \end{pmatrix} = \begin{pmatrix} 1.80 \\ 0.25 \\ 0.56 \end{pmatrix}.$$

Because the VAR(1) in this example is stationary, the effect of the shock to $x_1$ in period 1 on subsequent values of $x_1$, $x_2$ and $x_3$ will eventually die out.


In fact, it can be shown that for a stationary m dimensional VAR(p)

$$X_t = A_0 + A_1 X_{t-1} + A_2 X_{t-2} + \dots + A_{p-1} X_{t-(p-1)} + A_p X_{t-p} + \varepsilon_t,$$

the long run effect of a shock to $x_i$ in period 1 is given by the ith column of the m×m matrix

$$(I_m - A_1 - A_2 - \dots - A_p)^{-1}. \qquad (3.64)$$


For example, in our three dimensional VAR(1) the long run effect on all the variables in the VAR of a shock to $x_1$ in period 1 is given by the first column of the matrix

$$(I_3 - A_1)^{-1} = \left[\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 0.5 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.3 \\ 0.3 & 0.2 & 0.3 \end{pmatrix}\right]^{-1} = \begin{pmatrix} 0.5 & -0.2 & -0.1 \\ -0.1 & 0.9 & -0.3 \\ -0.3 & -0.2 & 0.7 \end{pmatrix}^{-1} = \begin{pmatrix} 2.5 & 0.7 & 0.7 \\ 0.7 & 1.4 & 0.7 \\ 1.3 & 0.7 & 1.9 \end{pmatrix}.$$

Therefore, in the long run, a one unit shock to $x_1$ in period 1 causes $x_1$ to increase by 2.5, $x_2$ to increase by 0.7 and $x_3$ to increase by 1.3.
The second and third columns of the matrix

$$(I_3 - A_1)^{-1} = \begin{pmatrix} 2.5 & 0.7 & 0.7 \\ 0.7 & 1.4 & 0.7 \\ 1.3 & 0.7 & 1.9 \end{pmatrix}$$

respectively contain the long run effects on $x_1$, $x_2$ and $x_3$ of a one unit shock to $x_2$ and to $x_3$ in period 1.
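This matrix is easy to verify numerically (a sketch using the $A_1$ from (3.54)):

```python
import numpy as np

A1 = np.array([[0.5, 0.2, 0.1],
               [0.1, 0.1, 0.3],
               [0.3, 0.2, 0.3]])
long_run = np.linalg.inv(np.eye(3) - A1)   # the matrix (3.64) for p = 1
print(np.round(long_run, 1))               # columns: long run effects of unit shocks
```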


In practice, the matrix

$$(I_m - A_1 - A_2 - \dots - A_p)^{-1} \qquad (3.64)$$

is unknown. However, we can estimate the long run effects of shocks to the x variables by replacing the unknown A matrices by the OLS estimates we get when we estimate the VAR(p). The estimated matrix of long run effects is

$$(I_m - \hat A_1 - \hat A_2 - \dots - \hat A_p)^{-1}.$$

When the variables in the VAR are of very different orders of magnitude, it makes more sense to consider a one standard deviation shock to a variable rather than a one unit shock.


Recall that in our VAR(1) example

$$\Sigma = \begin{pmatrix} 4.0 & 0.5 & 0.0 \\ 0.5 & 1.0 & 0.5 \\ 0.0 & 0.5 & 0.74 \end{pmatrix}. \qquad (3.65)$$

Therefore, a one standard deviation shock to $x_1$ in period 1 corresponds to a shock of 2 units, since

$$SD(\varepsilon_{11}) = \sqrt{4} = 2.$$

In this case the analysis proceeds as before except that now

$$\begin{pmatrix} d\varepsilon_{11} \\ d\varepsilon_{21} \\ d\varepsilon_{31} \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix}$$

in the above calculations.


3.10.2 Orthogonalized impulse responses

In section 3.10.1 we discussed conducting impulse response analysis using the errors from the VAR. Unfortunately, using the VAR errors to conduct impulse response analysis is problematic. The nature of the problem is easily seen by considering the trivariate VAR in the previous section.
In our trivariate VAR we traced the effect over time of a shock to $x_1$ in period 1 under the assumption that

$$\varepsilon_{21} = \varepsilon_{31} = 0.$$

However, we see from the covariance matrix of the error vector,

$$\Sigma = \begin{pmatrix} 4.0 & 0.5 & 0.0 \\ 0.5 & 1.0 & 0.5 \\ 0.0 & 0.5 & 0.74 \end{pmatrix},$$

that

$$\operatorname{cov}(\varepsilon_{11}, \varepsilon_{21}) = 0.5.$$

Therefore, assuming that $\varepsilon_{11} \neq 0$ but $\varepsilon_{21} = 0$ does not make sense, since it ignores the fact that $\varepsilon_{11}$ and $\varepsilon_{21}$ are correlated.
In general, impulse response analysis which ignores the contemporaneous correlation between the errors in the VAR is unlikely to provide an accurate description of the dynamic relationships between the variables.
The problem of contemporaneous correlation of the errors can be resolved by using the errors from the SVAR rather than the errors from the VAR to conduct impulse response analysis.

To illustrate the basic idea, suppose that the structural model associated with our three dimensional VAR(1) is

$$x_{1t} = a_0 + a_1 x_{2t} + a_2 x_{3t} + a_3 x_{1,t-1} + a_4 x_{2,t-1} + a_5 x_{3,t-1} + u_{1t},$$
$$x_{2t} = b_0 + b_1 x_{1t} + b_2 x_{3t} + b_3 x_{1,t-1} + b_4 x_{2,t-1} + b_5 x_{3,t-1} + u_{2t}, \qquad (3.66)$$
$$x_{3t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 x_{1,t-1} + c_4 x_{2,t-1} + c_5 x_{3,t-1} + u_{3t},$$

where

$$\operatorname{cov}(u_{1t}, u_{2t}) = \operatorname{cov}(u_{1t}, u_{3t}) = \operatorname{cov}(u_{2t}, u_{3t}) = 0\ \forall t. \qquad (3.67)$$


As discussed in section 3.9, the system of equations given by (3.66) may be written in matrix notation as

$$S X_t = S_0 + S_1 X_{t-1} + u_t, \qquad (3.68)$$

where

$$X_t = \begin{pmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \end{pmatrix},\quad X_{t-1} = \begin{pmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \end{pmatrix},\quad u_t = \begin{pmatrix} u_{1t} \\ u_{2t} \\ u_{3t} \end{pmatrix},$$

$$S = \begin{pmatrix} 1 & -a_1 & -a_2 \\ -b_1 & 1 & -b_2 \\ -c_1 & -c_2 & 1 \end{pmatrix},\quad S_0 = \begin{pmatrix} a_0 \\ b_0 \\ c_0 \end{pmatrix},\quad S_1 = \begin{pmatrix} a_3 & a_4 & a_5 \\ b_3 & b_4 & b_5 \\ c_3 & c_4 & c_5 \end{pmatrix}.$$

As before, the VAR is obtained by premultiplying both sides of (3.68) by $S^{-1}$. Doing so we obtain

$$S^{-1}S X_t = S^{-1}S_0 + S^{-1}S_1 X_{t-1} + S^{-1}u_t,$$

or

$$X_t = A_0 + A_1 X_{t-1} + \varepsilon_t, \qquad (3.69)$$

where

$$A_0 = S^{-1}S_0,\quad A_1 = S^{-1}S_1,\quad \varepsilon_t = S^{-1}u_t. \qquad (3.70)$$


Substituting

$$\varepsilon_t = S^{-1}u_t \qquad (3.71)$$

into the VMA representation of our VAR given by

$$X_t = \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \Psi_3 \varepsilon_{t-3} + \dots \qquad (3.72)$$

we obtain

$$X_t = S^{-1}u_t + \Psi_1 S^{-1}u_{t-1} + \Psi_2 S^{-1}u_{t-2} + \Psi_3 S^{-1}u_{t-3} + \dots \qquad (3.73)$$

It immediately follows from (3.73) that

$$\frac{\partial X_t}{\partial u_{t-j}'} = \Psi_j S^{-1}. \qquad (3.74)$$

For example,

$$\frac{\partial X_t}{\partial u_{t-1}'} = \Psi_1 S^{-1}.$$

Impulse response analysis based on (3.74) is called orthogonal impulse response analysis, because the errors in the vector $u_{t-j}$ are uncorrelated or orthogonal.
Because the errors in the vector $u_t$ are uncorrelated, it is reasonable to consider the effect on future values of the x variables of a change in $u_{1t}$ holding $u_{2t}$ and $u_{3t}$ constant.
The estimated orthogonal impulse responses are given by

$$\widehat{\frac{\partial X_t}{\partial u_{t-j}'}} = \hat\Psi_j \hat S^{-1}. \qquad (3.75)$$


We saw in section 3.10.1 that for a VAR(1)

$$\hat\Psi_1 = \hat A_1,\quad \hat\Psi_2 = \hat A_1^2,\quad \hat\Psi_3 = \hat A_1^3, \dots,$$

so we have no problem estimating the $\Psi_j$ matrices in

$$\frac{\partial X_t}{\partial u_{t-j}'} = \Psi_j S^{-1}. \qquad (3.74)$$

However, estimating

$$S = \begin{pmatrix} 1 & -a_1 & -a_2 \\ -b_1 & 1 & -b_2 \\ -c_1 & -c_2 & 1 \end{pmatrix}$$

in (3.74) is problematic.


We cannot consistently estimate the parameters in S by estimating
the SVAR

$$x_{1t} = a_0 + a_1 x_{2t} + a_2 x_{3t} + a_3 x_{1,t-1} + a_4 x_{2,t-1} + a_5 x_{3,t-1} + u_{1t},$$
$$x_{2t} = b_0 + b_1 x_{1t} + b_2 x_{3t} + b_3 x_{1,t-1} + b_4 x_{2,t-1} + b_5 x_{3,t-1} + u_{2t},$$
$$x_{3t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 x_{1,t-1} + c_4 x_{2,t-1} + c_5 x_{3,t-1} + u_{3t} \quad (3.76)$$

by least squares, since (3.76) suffers from simultaneity bias.
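A small Monte Carlo sketch of this bias, stripped down to two variables with no lags and hypothetical structural coefficients: OLS applied to an equation containing a contemporaneous endogenous regressor does not recover the structural parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
a1_true, b1_true = 0.5, 0.4                 # hypothetical structural values
S_inv = np.linalg.inv(np.array([[1.0, -a1_true],
                                [-b1_true, 1.0]]))

slopes = []
for _ in range(1000):
    u = rng.normal(size=(500, 2))           # orthogonal structural shocks
    X = u @ S_inv.T                         # X_t = S^{-1} u_t
    x1, x2 = X[:, 0], X[:, 1]
    slopes.append((x1 @ x2) / (x2 @ x2))    # OLS slope of x1 on x2

print(np.mean(slopes))   # about 0.78 here, well away from a1_true = 0.5
```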


The most commonly adopted solution to the problem of consistently
estimating S is to impose a recursive structure on the
contemporaneous relationships among the x variables. This entails
rewriting (3.66) as

$$x_{1t} = a_0 + a_3 x_{1,t-1} + a_4 x_{2,t-1} + a_5 x_{3,t-1} + u_{1t},$$
$$x_{2t} = b_0 + b_1 x_{1t} + b_3 x_{1,t-1} + b_4 x_{2,t-1} + b_5 x_{3,t-1} + u_{2t}, \quad (3.77)$$
$$x_{3t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 x_{1,t-1} + c_4 x_{2,t-1} + c_5 x_{3,t-1} + u_{3t}.$$

Imposing this recursive structure removes the simultaneity bias
problem, and the parameters in (3.77) can be consistently estimated
by least squares.
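A minimal equation-by-equation OLS sketch for a recursive system like (3.77), assuming one lag and a $(T \times 3)$ data array ordered as $(x_1, x_2, x_3)$; the function name is illustrative.

```python
import numpy as np

def estimate_recursive_svar(X):
    """OLS, one equation at a time, for a recursive SVAR with one lag.
    Equation i regresses x_i on a constant, one lag of every variable,
    and the contemporaneous values of the variables ordered before it."""
    Y, Xlag = X[1:], X[:-1]
    const = np.ones((Y.shape[0], 1))
    coefs = []
    for i in range(X.shape[1]):
        Z = np.hstack([const, Xlag, Y[:, :i]])
        beta, *_ = np.linalg.lstsq(Z, Y[:, i], rcond=None)
        coefs.append(beta)
    return coefs
```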


Notice that imposing this recursive structure is equivalent to requiring
that S is a lower triangular matrix of the form

$$S = \begin{bmatrix} 1 & 0 & 0 \\ -b_1 & 1 & 0 \\ -c_1 & -c_2 & 1 \end{bmatrix}. \quad (3.78)$$

For the VAR(1) model

$$X_t = A_0 + A_1 X_{t-1} + \varepsilon_t$$

the orthogonalized impulse responses are given by

$$\frac{\partial X_t}{\partial u_{t-j}'} = \Psi_j S^{-1} = A_1^j S^{-1}, \quad (3.79)$$

where S is a lower triangular matrix.



Unfortunately, orthogonalized impulse response analysis based on
imposing a recursive structure on

$$x_{1t} = a_0 + a_1 x_{2t} + a_2 x_{3t} + a_3 x_{1,t-1} + a_4 x_{2,t-1} + a_5 x_{3,t-1} + u_{1t},$$
$$x_{2t} = b_0 + b_1 x_{1t} + b_2 x_{3t} + b_3 x_{1,t-1} + b_4 x_{2,t-1} + b_5 x_{3,t-1} + u_{2t},$$
$$x_{3t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 x_{1,t-1} + c_4 x_{2,t-1} + c_5 x_{3,t-1} + u_{3t} \quad (3.76)$$

suffers from a major limitation: the orthogonalized impulse
responses depend on the ordering of the variables in the VAR.


To see this, note that for our trivariate VAR(1)

$$x_{1t} = a_0 + a_3 x_{1,t-1} + a_4 x_{2,t-1} + a_5 x_{3,t-1} + u_{1t},$$
$$x_{2t} = b_0 + b_1 x_{1t} + b_3 x_{1,t-1} + b_4 x_{2,t-1} + b_5 x_{3,t-1} + u_{2t}, \quad (3.77)$$
$$x_{3t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 x_{1,t-1} + c_4 x_{2,t-1} + c_5 x_{3,t-1} + u_{3t},$$

it follows from

$$\frac{\partial X_t}{\partial u_{t-j}'} = \Psi_j S^{-1} = A_1^j S^{-1} \quad (3.79)$$

that

$$\frac{\partial X_t}{\partial u_t'} = \Psi_0 S^{-1} = A_1^0 S^{-1} = S^{-1}. \quad (3.80)$$

Substituting

$$\frac{\partial X_t}{\partial u_t'} = \begin{bmatrix}
\frac{\partial x_{1t}}{\partial u_{1t}} & \frac{\partial x_{1t}}{\partial u_{2t}} & \frac{\partial x_{1t}}{\partial u_{3t}} \\
\frac{\partial x_{2t}}{\partial u_{1t}} & \frac{\partial x_{2t}}{\partial u_{2t}} & \frac{\partial x_{2t}}{\partial u_{3t}} \\
\frac{\partial x_{3t}}{\partial u_{1t}} & \frac{\partial x_{3t}}{\partial u_{2t}} & \frac{\partial x_{3t}}{\partial u_{3t}}
\end{bmatrix}$$

and

$$S^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -b_1 & 1 & 0 \\ -c_1 & -c_2 & 1 \end{bmatrix}^{-1}
= \begin{bmatrix} 1 & 0 & 0 \\ b_1 & 1 & 0 \\ b_1 c_2 + c_1 & c_2 & 1 \end{bmatrix}$$

into (3.80) we obtain

$$\begin{bmatrix}
\frac{\partial x_{1t}}{\partial u_{1t}} & \frac{\partial x_{1t}}{\partial u_{2t}} & \frac{\partial x_{1t}}{\partial u_{3t}} \\
\frac{\partial x_{2t}}{\partial u_{1t}} & \frac{\partial x_{2t}}{\partial u_{2t}} & \frac{\partial x_{2t}}{\partial u_{3t}} \\
\frac{\partial x_{3t}}{\partial u_{1t}} & \frac{\partial x_{3t}}{\partial u_{2t}} & \frac{\partial x_{3t}}{\partial u_{3t}}
\end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ b_1 & 1 & 0 \\ b_1 c_2 + c_1 & c_2 & 1 \end{bmatrix}. \quad (3.81)$$

Notice that:


The impact effect of an orthogonalized shock to $x_1$ is given by the
elements in column 1 of the matrix on the right-hand side of (3.81).
Therefore an orthogonalized shock to $x_1$ in period t can affect the
values of all three x variables in period t.
The impact effect of an orthogonalized shock to $x_2$ is given by the
elements in column 2 of the matrix on the right-hand side of (3.81).
Therefore an orthogonalized shock to $x_2$ in period t has no impact
effect on $x_1$. That is, it cannot affect $x_1$ in period t.
The impact effect of an orthogonalized shock to $x_3$ is given by the
elements in column 3 of the matrix on the right-hand side of (3.81).
Therefore an orthogonalized shock to $x_3$ in period t has no impact effect
on either $x_1$ or $x_2$. That is, it cannot affect either $x_1$ or $x_2$ in period t.
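A quick numerical check of this triangular algebra, with hypothetical values for $b_1$, $c_1$ and $c_2$:

```python
import numpy as np

b1, c1, c2 = 0.5, 0.2, 0.3
S = np.array([[1.0, 0.0, 0.0],
              [-b1, 1.0, 0.0],
              [-c1, -c2, 1.0]])
S_inv = np.linalg.inv(S)
print(S_inv)
# [[1.   0.   0. ]
#  [0.5  1.   0. ]
#  [0.35 0.3  1. ]]   i.e. [[1, 0, 0], [b1, 1, 0], [b1*c2 + c1, c2, 1]],
# reproducing the zero impact pattern in (3.81).
```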


Now suppose we change the ordering of the variables in the VAR and
rewrite our VAR as

$$x_{2t} = b_0 + b_1 x_{1t} + b_2 x_{3t} + b_3 x_{1,t-1} + b_4 x_{2,t-1} + b_5 x_{3,t-1} + u_{2t},$$
$$x_{1t} = a_0 + a_1 x_{2t} + a_2 x_{3t} + a_3 x_{1,t-1} + a_4 x_{2,t-1} + a_5 x_{3,t-1} + u_{1t},$$
$$x_{3t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 x_{1,t-1} + c_4 x_{2,t-1} + c_5 x_{3,t-1} + u_{3t}. \quad (3.82)$$

Imposing a recursive structure on (3.82) we obtain

$$x_{2t} = b_0 + b_3 x_{1,t-1} + b_4 x_{2,t-1} + b_5 x_{3,t-1} + u_{2t},$$
$$x_{1t} = a_0 + a_1 x_{2t} + a_3 x_{1,t-1} + a_4 x_{2,t-1} + a_5 x_{3,t-1} + u_{1t}, \quad (3.83)$$
$$x_{3t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 x_{1,t-1} + c_4 x_{2,t-1} + c_5 x_{3,t-1} + u_{3t}.$$


For the model given by (3.83),

$$X_t = \begin{bmatrix} x_{2t} \\ x_{1t} \\ x_{3t} \end{bmatrix}, \quad
u_t = \begin{bmatrix} u_{2t} \\ u_{1t} \\ u_{3t} \end{bmatrix}, \quad
S = \begin{bmatrix} 1 & 0 & 0 \\ -a_1 & 1 & 0 \\ -c_2 & -c_1 & 1 \end{bmatrix},$$

$$S^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ a_1 & 1 & 0 \\ a_1 c_1 + c_2 & c_1 & 1 \end{bmatrix}.$$


Therefore, for the VAR(1) given by (3.83),

$$\frac{\partial X_t}{\partial u_t'} = \Psi_0 S^{-1} = A_1^0 S^{-1} = S^{-1}$$

implies that

$$\begin{bmatrix}
\frac{\partial x_{2t}}{\partial u_{2t}} & \frac{\partial x_{2t}}{\partial u_{1t}} & \frac{\partial x_{2t}}{\partial u_{3t}} \\
\frac{\partial x_{1t}}{\partial u_{2t}} & \frac{\partial x_{1t}}{\partial u_{1t}} & \frac{\partial x_{1t}}{\partial u_{3t}} \\
\frac{\partial x_{3t}}{\partial u_{2t}} & \frac{\partial x_{3t}}{\partial u_{1t}} & \frac{\partial x_{3t}}{\partial u_{3t}}
\end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ a_1 & 1 & 0 \\ a_1 c_1 + c_2 & c_1 & 1 \end{bmatrix}. \quad (3.84)$$

It is evident from (3.84) that changing the ordering of the variables in
the VAR has dramatically changed the orthogonalized impulse
responses. Now:
An orthogonalized shock to $x_2$ in period t can affect the values of all
three variables in period t.

An orthogonalized shock to $x_1$ in period t cannot affect $x_2$ in period t.
An orthogonalized shock to $x_3$ in period t cannot affect either $x_1$ or $x_2$
in period t.
Notice that, in general, a shock to the variable $x_j$ has no impact
effect on any variable that precedes $x_j$ in the VAR ordering.
The dependence of the orthogonalized impulse responses on the ordering of the
variables in the VAR is a serious limitation of orthogonalized impulse
response analysis, as the choice of ordering is often arbitrary; the
numerical sketch after this list illustrates the dependence.
In light of this limitation, there are two circumstances in which
orthogonalized impulse response analysis is useful:
When the pattern of the impulse responses is not very sensitive to the
ordering of the variables.
When economic theory and/or common sense strongly suggests that a
particular ordering of the variables is appropriate.
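The ordering dependence is easy to demonstrate numerically. In practice the impact matrix is typically obtained from the Cholesky factor of the estimated error covariance matrix (the "Cholesky One S.D. Innovations" reported by EViews, a scaled version of $S^{-1}$); the sketch below, with a hypothetical covariance matrix, shows that permuting the ordering changes the implied impact matrix.

```python
import numpy as np

Sigma = np.array([[1.0, 0.5, 0.3],      # hypothetical error covariance
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])

P = np.linalg.cholesky(Sigma)           # impact matrix, ordering (x1, x2, x3)

order = [1, 0, 2]                       # reorder as (x2, x1, x3)
P2 = np.linalg.cholesky(Sigma[np.ix_(order, order)])
inv = np.argsort(order)                 # map back to the original order
print(P)
print(P2[np.ix_(inv, inv)])             # differs from P except in special cases
```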


For example, assume that we have a trivariate VAR containing the
exchange rate, a price index and a wage index as variables. Since, in
the short run, the exchange rate is more flexible than prices and
wages, and prices are more flexible than wages, the following ordering
might be reasonable:

$$X = (\text{wages}, \text{prices}, \text{exchange rate}). \quad (3.85)$$

The ordering in (3.85) permits wage and price shocks to have an
immediate effect on the exchange rate, but does not allow an
exchange rate shock to have an immediate impact on prices and
wages.
After estimating a VAR in EViews one can obtain:
Tables of estimated impulse responses.
Line graphs of estimated impulse responses.



3.10.3 Example 3.2

Consider the trivariate VAR(7) discussed in section 3.8, given by

$$x_t = A_0 + A_1 x_{t-1} + \dots + A_7 x_{t-7} + \varepsilon_t,$$

where

$$x_t' = (dtb_t, dr3_t, dr10_t)$$

and

$$\varepsilon_t \sim VWN(0, \Sigma).$$

Table 1 below shows the response of dtb over four time periods to
orthogonalized shocks to each of the variables in time period 1 (which
is 1960q1).

Table 1: Response of dtb

Period   Shock to dtb   Shock to dr3   Shock to dr10
  1        0.761033       0.000000       0.000000
  2        0.272200       0.001457       0.050738
  3        0.249042       0.168280       0.006059
  4        0.091407       0.121443       0.174631

Notice that, because they come after dtb in the VAR ordering, shocks
to dr3 and dr10 have no impact effect on dtb.


Table 2 below shows the response of dtb over four time periods to
orthogonalized shocks to each of the variables in time period 1 when
we change the VAR ordering to

$$x_t' = (dr3_t, dr10_t, dtb_t).$$

Table 2: Response of dtb

Period   Shock to dtb   Shock to dr3   Shock to dr10
  1        0.340359       0.658571       0.172081
  2        0.097719       0.236282       0.106259
  3        0.015845       0.299844       0.014800
  4        0.213461       0.018240       0.087759

Notice that shocks to dr3 and dr10 now affect dtb in period 1.


Tables of impulse responses are generally difficult to interpret. It is
usually more useful to present impulse responses in the form of a
graph.
The graphs below illustrate the responses over 10 time periods of dtb,
dr3 and dr10 to various shocks.
The blue line is the impulse response function and the red lines are
95% confidence intervals.
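The same output can be produced in Python with statsmodels; the sketch below uses placeholder random data standing in for the dtb, dr3 and dr10 series, with the Cholesky ordering following the column order.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)),
                    columns=["dtb", "dr3", "dr10"])   # placeholder data

results = VAR(data).fit(7)    # a VAR(7), as in section 3.8
irf = results.irf(10)         # impulse responses over 10 periods
irf.plot(orth=True)           # orthogonalized IRFs with error bands
```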



[Figure: Response to Cholesky One S.D. Innovations ± 2 S.E. Nine panels showing, over 10 periods, the responses of D(TBILL), D(R3) and D(R10) to shocks in D(TBILL), D(R3) and D(R10).]
3.11 Advantages and Disadvantages of VARs
3.11.1 Advantages of VARs

In a VAR, all of the variables are treated as being endogenous. No


arbitrary decision is made to treat some variables as exogenous.
VARs capture the dynamic and simultaneous nature of economic
relationships.
Unlike structural econometric models, VARs do not require the
imposition of any dubious identifying restrictions.
VARs provide a very useful framework for forecasting, since they do
not require us to provide future values for exogenous variables.
VARs provide a framework for causality testing and impulse response
analysis.



3.11 Advantages and Disadvantages of VARs
3.11.2 Disadvantages of VARs

The estimated coefficients of a VAR generally have no interesting
economic interpretation. For example, in the equation

$$dR3_t = \alpha_{20} + \sum_{j=1}^{7} \alpha_{21,j} \, dTB_{t-j} + \sum_{j=1}^{7} \alpha_{22,j} \, dR3_{t-j} + \varepsilon_{2t},$$

$$\alpha_{21,2} = \frac{\partial \, dR3_t}{\partial \, dTB_{t-2}},$$

which has no intrinsic economic interest.
While they don't require the imposition of identifying restrictions,
VARs ignore valid restrictions suggested by economic theory.
The number of parameters to be estimated can increase substantially
as the number of lags and/or variables included in the VAR
increases. In a high order, high dimensional VAR we may have a large
number of parameters to estimate relative to the sample size.
The decision as to which variables to include in the VAR is somewhat
arbitrary.

Forecasts, Granger-causality tests and impulse response analysis may


be very sensitive to the VAR order, p, which must be chosen by the
researcher.
The VAR framework is designed for the analysis of I(0) variables only.
A vector error correction model (VECM) is the appropriate
multivariate framework for the analysis of I(1) variables, provided they
are cointegrated.



3.12 Vector Autoregressive Moving Average (VARMA) Models

The DGP for the m-dimensional vector $x_t$ is a vector autoregressive
moving average (VARMA) process of order (p,q) if

$$x_t = A_0 + A_1 x_{t-1} + \dots + A_p x_{t-p} + \varepsilon_t + B_1 \varepsilon_{t-1} + B_2 \varepsilon_{t-2} + \dots + B_q \varepsilon_{t-q}, \quad (3.86)$$

where

$$\varepsilon_t \sim VWN(0, \Sigma),$$

$A_i$, $i = 1, 2, \dots, p$, and $B_j$, $j = 1, 2, \dots, q$, are $m \times m$ matrices and $A_0$ is an $m \times 1$ vector.


Equation (3.86) may be written more compactly as

$$\Phi(L) x_t = A_0 + \Theta(L) \varepsilon_t, \quad (3.87)$$

where

$$\Phi(L) = I_m - A_1 L - A_2 L^2 - \dots - A_p L^p,$$
$$\Theta(L) = I_m + B_1 L + B_2 L^2 + \dots + B_q L^q.$$

A VAR(p) is a special case of a VARMA(p,q) in which the moving
average coefficient matrices, $B_j$, $j = 1, 2, \dots, q$, are null matrices.
Just as a VAR(p) is a multivariate generalization of an AR(p) process,
a VARMA(p,q) is a multivariate generalization of an ARMA(p,q)
process.


The condition for a VARMA(p,q) process to be weakly stationary is
the same as that for weak stationarity of a VAR(p) process, namely
that the roots of $\det \Phi(z) = 0$, where $\Phi(L)$ is the autoregressive
matrix polynomial, must all have modulus greater than 1.
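This condition involves only the AR part and is easy to check numerically. The sketch below uses the standard equivalence that all eigenvalues of the companion matrix built from $A_1, \dots, A_p$ must lie strictly inside the unit circle; names and values are illustrative.

```python
import numpy as np

def is_weakly_stationary(A_list):
    """Check the stationarity condition on the AR part: all eigenvalues
    of the companion matrix lie inside the unit circle (equivalently,
    all roots of det(Phi(z)) = 0 lie outside it)."""
    m, p = A_list[0].shape[0], len(A_list)
    companion = np.zeros((m * p, m * p))
    companion[:m, :] = np.hstack(A_list)
    companion[m:, :-m] = np.eye(m * (p - 1))
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1))

# Hypothetical bivariate AR(1) part:
print(is_weakly_stationary([np.array([[0.5, 0.1],
                                      [0.2, 0.3]])]))   # True
```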
A VARMA(p,q) model can be estimated by maximum likelihood.
However, because of the presence of the moving average terms in
(3.87), the first-order conditions for maximizing the log-likelihood
function are highly nonlinear in the parameters and have to be solved
by numerical methods. Perhaps for this reason, VARs are more
commonly used than VARMAs to model weakly stationary
multivariate time series.

