
Massachusetts Institute of Technology
Department of Economics

Time Series 14.384

Guido Kuersteiner

Lecture 6: Multivariate Time Series and VARs

The theory of linear time series models developed for the univariate case extends in a natural way to the multivariate case. Let $x_t = (x_{1,t}, \ldots, x_{k,t})'$ be a $k$-dimensional vector of univariate time series. Then $x_t$ is weakly stationary if $Ex_t = \mu$ and $E(x_t - \mu)(x_{t+h} - \mu)' = \Gamma(h)$ exists with $\|\Gamma(0)\| < \infty$, where $\|A\| = (\operatorname{tr} AA')^{1/2}$ is the Euclidean matrix norm. It follows immediately that for any $n$ and any vectors $a_1, \ldots, a_n$,
$$\sum_{i=1}^{n}\sum_{l=1}^{n} a_i'\Gamma(i-l)a_l \geq 0.$$

The spectral density matrix of $x_t$ is defined as
$$f(\lambda) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\Gamma(h)e^{-i\lambda h},$$
provided $\sum_{h=-\infty}^{\infty}\|\Gamma(h)\| < \infty$. Note that the diagonal elements of $f(\lambda)$ are the univariate spectral densities of the $x_{i,t}$. The off-diagonal elements of $f(\lambda)$ are called the cross spectra between $x_{l,t}$ and $x_{m,t}$. Using $\gamma_{l,m}(h) = E(x_{l,t} - \mu_l)(x_{m,t+h} - \mu_m)$,
$$[f(\lambda)]_{l,m} = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma_{l,m}(h)e^{-i\lambda h}.$$
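As a concrete illustration, for a stationary VAR(1) $x_t = \Phi_1 x_{t-1} + \varepsilon_t$ the spectral density matrix has the closed form $f(\lambda) = \frac{1}{2\pi}\Psi(e^{i\lambda})\,\Omega\,\Psi(e^{i\lambda})^*$ with $\Psi(z) = (I - \Phi_1 z)^{-1}$; this is a standard consequence of the MA($\infty$) representation introduced below rather than a formula stated in the notes. A minimal numerical sketch in Python, with purely illustrative parameter values:

```python
import numpy as np

def var1_spectral_density(Phi1, Omega, lam):
    """Spectral density matrix f(lam) of a VAR(1) x_t = Phi1 x_{t-1} + eps_t,
    using f = (1/2pi) Psi(e^{i lam}) Omega Psi(e^{i lam})^*, Psi(z) = (I - Phi1 z)^{-1}."""
    k = Phi1.shape[0]
    Psi = np.linalg.inv(np.eye(k) - Phi1 * np.exp(1j * lam))
    return (Psi @ Omega @ Psi.conj().T) / (2 * np.pi)

Phi1 = np.array([[0.5, 0.1], [0.2, 0.3]])    # assumed coefficients, not from the notes
Omega = np.array([[1.0, 0.2], [0.2, 0.5]])   # assumed innovation covariance
f0 = var1_spectral_density(Phi1, Omega, lam=0.5)
print(np.diag(f0).real)   # diagonal elements: the univariate spectral densities (real, nonnegative)
print(f0[0, 1])           # off-diagonal element: the (complex-valued) cross spectrum
```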

Note that $\gamma_{l,m}(h) \neq \gamma_{l,m}(-h)$ in general, so that the off-diagonal elements of $f(\lambda)$ are in general complex valued. As in the univariate case there is an infinite moving average representation of $x_t$. Assume that $x_t$ is purely non-deterministic and weakly stationary. Then
$$x_t = \mu + \sum_{j=0}^{\infty}\Psi_j\varepsilon_{t-j} = \mu + \Psi(L)\varepsilon_t,$$
where $\Psi(L) = \sum_{j=0}^{\infty}\Psi_jL^j$ and $\varepsilon_t$ is a multivariate white-noise process such that
$$E\varepsilon_t = 0, \qquad E\varepsilon_t\varepsilon_t' = \Omega, \qquad E\varepsilon_t\varepsilon_s' = 0 \text{ for } t \neq s.$$
The coefficient matrices $\Psi_j$ of dimension $k \times k$ satisfy $\sum_{j=0}^{\infty}\|\Psi_j\|^2 < \infty$. If the polynomial $\Psi(L)$ can be approximated by a rational matrix polynomial $\Phi(L)^{-1}\Theta(L)$, then the model has an ARMA representation

$$\Phi(L)(x_t - \mu) = \Theta(L)\varepsilon_t.$$
The vector ARMA model is causal if $\Phi(L)^{-1}$ is well defined, i.e., if it has a convergent power series expansion. This is the case if $\Phi(z)$ is invertible for $|z| \leq 1$, or equivalently if $\det\Phi(z) \neq 0$ for $|z| \leq 1$. In the same way the ARMA representation is invertible if $\det\Theta(z) \neq 0$ for $|z| \leq 1$. We can then write
$$\Theta(L)^{-1}\Phi(L)(x_t - \mu) = \varepsilon_t$$
or
$$(x_t - \mu) = \sum_{i=1}^{\infty}\Pi_i(x_{t-i} - \mu) + \varepsilon_t,$$
where $I - \Pi(L) = I - \sum_{i=1}^{\infty}\Pi_iL^i = \Theta(L)^{-1}\Phi(L)$. In practice it is usually assumed that $\Pi(L)$ can be approximated by a finite order polynomial. This leads to the VAR($p$) model
$$y_t = \Phi_1y_{t-1} + \ldots + \Phi_py_{t-p} + \varepsilon_t,$$
where $y_t = x_t - \mu$. The VAR($p$) model can be represented in companion form by stacking the vectors $y_t$ in the following way:
$$\begin{pmatrix}y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1}\end{pmatrix} = \begin{pmatrix}\Phi_1 & \cdots & \Phi_{p-1} & \Phi_p \\ I & & & 0 \\ & \ddots & & \vdots \\ 0 & & I & 0\end{pmatrix}\begin{pmatrix}y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p}\end{pmatrix} + \begin{pmatrix}\varepsilon_t \\ 0 \\ \vdots \\ 0\end{pmatrix},$$
or compactly $Y_t = AY_{t-1} + e_t$.
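To make the companion form concrete, the sketch below builds the stacked matrix $A$ for an assumed bivariate VAR(2) and checks the stationarity condition that all eigenvalues of $A$ lie inside the unit circle. The coefficient values and function name are illustrative, not taken from the notes.

```python
import numpy as np

def companion(Phi_list):
    """Stack the VAR(p) coefficient matrices [Phi_1, ..., Phi_p] (each k x k)
    into the kp x kp companion matrix A of Y_t = A Y_{t-1} + e_t."""
    k = Phi_list[0].shape[0]
    p = len(Phi_list)
    A = np.zeros((k * p, k * p))
    A[:k, :] = np.hstack(Phi_list)        # first block row: Phi_1, ..., Phi_p
    A[k:, :-k] = np.eye(k * (p - 1))      # identity blocks shifting y_{t-1}, ..., y_{t-p+1} down
    return A

# Illustrative bivariate VAR(2) coefficients (assumed values)
Phi1 = np.array([[0.5, 0.1], [0.0, 0.3]])
Phi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
A = companion([Phi1, Phi2])

# Covariance stationarity: all eigenvalues of A strictly inside the unit circle
print(np.abs(np.linalg.eigvals(A)).max() < 1)
```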

The autocovariance function of $y_t$ can be found by considering the Yule-Walker equations:
$$\Gamma(0) = Ey_ty_t' = \sum_{i=1}^{p}\Phi_iEy_{t-i}y_t' + E\varepsilon_ty_t' = \sum_{i=1}^{p}\Phi_i\Gamma(i) + \Omega$$
and
$$\Gamma(h) = \sum_{i=1}^{p}\Gamma(h-i)\Phi_i'.$$
Stacking $\Phi = [\Phi_1, \ldots, \Phi_p]$ and $\Gamma_p = [\Gamma(1)', \ldots, \Gamma(p)']'$, these equations can be written as
$$\Gamma(0) = \Phi\Gamma_p + \Omega \qquad\text{and}\qquad \Gamma_p = \tilde\Gamma_p\Phi',$$
where $\tilde\Gamma_p$ is the block matrix with $[\tilde\Gamma_p]_{ij} = \Gamma(i-j)$, i.e.
$$\tilde\Gamma_p = \begin{pmatrix}\Gamma(0) & \cdots & \Gamma(1-p) \\ \vdots & \ddots & \vdots \\ \Gamma(p-1) & \cdots & \Gamma(0)\end{pmatrix}.$$
For the AR(1) case we have
$$\Gamma(0) = \Phi_1\Gamma(1) + \Omega \qquad\text{and}\qquad \Gamma(h) = \Gamma(h-1)\Phi_1',$$
so that $\Gamma(1) = \Gamma(0)\Phi_1'$ (equivalently $\Gamma(-1) = \Phi_1\Gamma(0)$) and
$$\Gamma(0) = \Phi_1\Gamma(0)\Phi_1' + \Omega, \qquad \operatorname{vec}\Gamma(0) = (\Phi_1\otimes\Phi_1)\operatorname{vec}\Gamma(0) + \operatorname{vec}\Omega.$$
Solving for $\operatorname{vec}\Gamma(0)$ gives us
$$\operatorname{vec}\Gamma(0) = (I - \Phi_1\otimes\Phi_1)^{-1}\operatorname{vec}\Omega.$$
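For the VAR(1) case, the fixed-point relation $\operatorname{vec}\Gamma(0) = (I - \Phi_1\otimes\Phi_1)^{-1}\operatorname{vec}\Omega$ can be evaluated directly. A minimal Python sketch, again with assumed illustrative parameter values:

```python
import numpy as np

def var1_gamma0(Phi1, Omega):
    """Solve vec(Gamma0) = (I - Phi1 kron Phi1)^{-1} vec(Omega) for a stationary VAR(1)."""
    k = Phi1.shape[0]
    vec_gamma0 = np.linalg.solve(np.eye(k * k) - np.kron(Phi1, Phi1),
                                 Omega.flatten(order="F"))     # column-major vec
    return vec_gamma0.reshape((k, k), order="F")

Phi1 = np.array([[0.5, 0.1], [0.2, 0.3]])     # assumed coefficients
Omega = np.array([[1.0, 0.2], [0.2, 0.5]])    # assumed innovation covariance
Gamma0 = var1_gamma0(Phi1, Omega)

# Check the Yule-Walker identity Gamma(0) = Phi1 Gamma(0) Phi1' + Omega
print(np.allclose(Gamma0, Phi1 @ Gamma0 @ Phi1.T + Omega))
```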

The best linear predictor for the VAR($p$) model can be found in the same way as in the univariate case:
$$\hat y_{t+1} = P_{M_t}y_{t+1} = \Phi_1y_t + \ldots + \Phi_py_{t-p+1} = \Phi(L)y_{t+1} = \Phi(L)(I - \Phi(L))^{-1}\varepsilon_{t+1} = (\Psi(L) - I)\varepsilon_{t+1} = \sum_{s=1}^{\infty}\Psi_s\varepsilon_{t-s+1},$$
where $\Psi(L) = \sum_{s=0}^{\infty}\Psi_sL^s = (I - \Phi(L))^{-1}$. For the $h$-step ahead prediction we have
$$\hat y_{t+h} = P_{M_t}y_{t+h} = \sum_{s=h}^{\infty}\Psi_s\varepsilon_{t-s+h},$$
with prediction error
$$y_{t+h} - \hat y_{t+h} = \sum_{s=0}^{h-1}\Psi_s\varepsilon_{t-s+h}.$$
The prediction error therefore has variance
$$\operatorname{var}(y_{t+h} - \hat y_{t+h}) = \Omega + \sum_{s=1}^{h-1}\Psi_s\Omega\Psi_s'.$$
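The $h$-step forecast error variance can be computed from the MA coefficients, which satisfy the recursion $\Psi_0 = I$, $\Psi_j = \sum_{i=1}^{\min(j,p)}\Phi_i\Psi_{j-i}$ implied by $(I - \Phi(L))\Psi(L) = I$. The sketch below uses assumed coefficient values; the helper names are not from the notes.

```python
import numpy as np

def ma_coeffs(Phi_list, n):
    """MA(infinity) coefficients Psi_0, ..., Psi_{n-1} of a VAR(p), from the
    recursion Psi_j = sum_{i=1}^{min(j,p)} Phi_i Psi_{j-i} with Psi_0 = I."""
    k = Phi_list[0].shape[0]
    Psi = [np.eye(k)]
    for j in range(1, n):
        Psi.append(sum(Phi_list[i - 1] @ Psi[j - i]
                       for i in range(1, min(j, len(Phi_list)) + 1)))
    return Psi

def forecast_error_variance(Phi_list, Omega, h):
    """MSE of the h-step ahead forecast: sum_{s=0}^{h-1} Psi_s Omega Psi_s'."""
    return sum(P @ Omega @ P.T for P in ma_coeffs(Phi_list, h))

Phi1 = np.array([[0.5, 0.1], [0.0, 0.3]])     # assumed coefficients
Phi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Omega = np.array([[1.0, 0.2], [0.2, 0.5]])    # assumed innovation covariance
print(forecast_error_variance([Phi1, Phi2], Omega, h=4))
```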

6.1. Estimation of a VAR(p)

We stack the regressors and coefficients as
$$x_t' = \left(y_{t-1}', y_{t-2}', \ldots, y_{t-p}'\right) \quad (1\times kp) \qquad\text{and}\qquad \Pi' = \left(\Phi_1, \ldots, \Phi_p\right) \quad (k\times kp),$$
so that the model can be written as $y_t = \Pi'x_t + \varepsilon_t$. We stack the variables into matrices
$$Y = \begin{pmatrix}y_1' \\ \vdots \\ y_T'\end{pmatrix}, \qquad X = \begin{pmatrix}x_1' \\ \vdots \\ x_T'\end{pmatrix}, \qquad \varepsilon = \begin{pmatrix}\varepsilon_1' \\ \vdots \\ \varepsilon_T'\end{pmatrix},$$
such that the model can be written as $Y = X\Pi + \varepsilon$. In vectorized form this is
$$\operatorname{vec} Y = (I \otimes X)\operatorname{vec}\Pi + \operatorname{vec}\varepsilon.$$
Note that $E(\operatorname{vec}\varepsilon)(\operatorname{vec}\varepsilon)' = \Omega \otimes I_T$. The likelihood is then approximately proportional to
$$-\frac{Tk}{2}\log(2\pi) - \frac{T}{2}\log|\Omega| - \frac{1}{2}(\operatorname{vec}\varepsilon)'\left(\Omega^{-1}\otimes I_T\right)\operatorname{vec}\varepsilon,$$

where
$$(\operatorname{vec}\varepsilon)'\left(\Omega^{-1}\otimes I_T\right)\operatorname{vec}\varepsilon = \operatorname{tr}\left(\Omega^{-1}\varepsilon'\varepsilon\right) = \operatorname{tr}\left(\Omega^{-1}(Y - X\Pi)'(Y - X\Pi)\right).$$
The ML estimator is now immediately seen to be
$$\operatorname{vec}\hat\Pi = \left[(I\otimes X')\left(\Omega^{-1}\otimes I_T\right)(I\otimes X)\right]^{-1}(I\otimes X')\left(\Omega^{-1}\otimes I_T\right)\operatorname{vec} Y = \left(\Omega\otimes(X'X)^{-1}\right)\left(\Omega^{-1}\otimes X'\right)\operatorname{vec} Y = \left(I\otimes(X'X)^{-1}X'\right)\operatorname{vec} Y,$$
which shows that the ML estimator is equivalent to OLS carried out equation by equation. Now
$$\operatorname{vec}(\hat\Pi - \Pi) = \left(I_k\otimes(X'X)^{-1}X'\right)\operatorname{vec}\varepsilon = \left(I_k\otimes(X'X)^{-1}\right)\left(I_k\otimes X'\right)\operatorname{vec}\varepsilon,$$
where
$$\left(I_k\otimes X'\right)\operatorname{vec}\varepsilon = \begin{pmatrix}\sum_t x_t\varepsilon_{1t} \\ \vdots \\ \sum_t x_t\varepsilon_{kt}\end{pmatrix}$$
and
$$\operatorname{cov}\left(x_t\varepsilon_{jt},\, x_s\varepsilon_{ls}\right) = \begin{cases}0 & \text{if } t\neq s, \\ \omega_{jl}\Gamma_p & \text{otherwise,}\end{cases}$$
with $\Gamma_p = Ex_tx_t'$. From
$$\left(I\otimes\tfrac{1}{T}X'X\right)^{-1}\xrightarrow{p} I\otimes\Gamma_p^{-1} \qquad\text{and}\qquad \frac{1}{\sqrt{T}}\left(I\otimes X'\right)\operatorname{vec}\varepsilon\xrightarrow{d} N\left(0,\,\Omega\otimes\Gamma_p\right),$$
it follows that the distribution of the parameter estimates is asymptotically
$$\sqrt{T}\operatorname{vec}(\hat\Pi - \Pi)\xrightarrow{d} N\left(0,\,\left(I\otimes\Gamma_p^{-1}\right)(\Omega\otimes\Gamma_p)\left(I\otimes\Gamma_p^{-1}\right)\right) = N\left(0,\,\Omega\otimes\Gamma_p^{-1}\right).$$
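Equation-by-equation OLS is straightforward to implement. The following sketch estimates a VAR($p$) without an intercept (the notes work with demeaned data; adding a constant column is a simple extension). The data here are simulated placeholders and the function names are illustrative.

```python
import numpy as np

def estimate_var(y, p):
    """OLS estimation of y_t = Phi_1 y_{t-1} + ... + Phi_p y_{t-p} + eps_t.
    y is (T x k); returns Pi_hat (kp x k), with block rows [Phi_1', ..., Phi_p'], and Omega_hat."""
    T, k = y.shape
    X = np.hstack([y[p - i:T - i] for i in range(1, p + 1)])   # rows are x_t' = (y_{t-1}', ..., y_{t-p}')
    Y = y[p:]
    Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)             # OLS, equation by equation
    resid = Y - X @ Pi_hat
    Omega_hat = resid.T @ resid / (T - p)
    return Pi_hat, Omega_hat

rng = np.random.default_rng(0)
y_sim = rng.standard_normal((500, 2))             # placeholder data; replace with the actual series
Pi_hat, Omega_hat = estimate_var(y_sim, p=2)
Phi1_hat, Phi2_hat = Pi_hat[:2].T, Pi_hat[2:].T   # recover Phi_1, Phi_2 for this bivariate VAR(2)

# The covariance of vec(Pi_hat) can be estimated by Omega_hat kron inv(X'X), per the limit result above.
```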

If we have blockwise restrictions, as in the case of Granger non-causality, then we can still estimate the system equation by equation. If we have more general restrictions then we need to estimate the full system.

6.2. Prediction error variance decomposition

If we want to analyze the contributions of the error terms $\varepsilon_t$ to the total forecast error variance then we need to orthogonalize the system. Let $E\varepsilon_t\varepsilon_t' = \Omega$ and let $R$ be lower triangular with $R\Omega R' = I$. Then $\eta_t = R\varepsilon_t$ satisfies $E\eta_t\eta_t' = ER\varepsilon_t\varepsilon_t'R' = I$. We now look at the transformed model
$$y_t = \sum_{j=0}^{\infty}\Psi_jR^{-1}R\varepsilon_{t-j} = \sum_{j=0}^{\infty}C_j\eta_{t-j}, \qquad C_j = \Psi_jR^{-1}.$$
The forecast error variance of an $h$-step ahead forecast now is
$$\operatorname{var}(y_{t+h} - \hat y_{t+h}) = \Omega + \sum_{j=1}^{h-1}C_jC_j' = \Omega + \sum_{j=1}^{h-1}\Psi_jR^{-1}R^{-1\prime}\Psi_j'.$$
The coefficients $C_j$ can be obtained from
$$C_j = J'A^jJR^{-1},$$
where $J' = [I_k, 0, \ldots, 0]$ is $k\times kp$ and
$$A = \begin{pmatrix}\Phi_1 & \cdots & \Phi_{p-1} & \Phi_p \\ I & & & 0 \\ & \ddots & & \vdots \\ 0 & & I & 0\end{pmatrix}$$
is the companion matrix defined above. Then, according to Sims (1981), the proportion of the $h$-step ahead forecast error variance in variable $l$ accounted for by innovations in $\eta_{i,t}$ is given by
$$\frac{r_{l,i}^2 + \sum_{j=1}^{h-1}c_{li,j}^2}{\operatorname{var}(y_{t+h} - \hat y_{t+h})_l},$$
where $r_{li} = [R^{-1}]_{li}$ and $c_{li,j} = [C_j]_{li}$. To see this, note that the forecast error is given by
$$y_{t+h} - \hat y_{t+h} = R^{-1}\eta_{t+h} + \sum_{j=1}^{h-1}C_j\eta_{t+h-j},$$
while the part of the forecast error resulting from the innovations $\eta_{i,t+1}, \ldots, \eta_{i,t+h}$ is
$$[R^{-1}]_{\cdot,i}\,\eta_{i,t+h} + \sum_{j=1}^{h-1}[C_j]_{\cdot,i}\,\eta_{i,t+h-j},$$
where $[R^{-1}]_{\cdot,i}$ is the $i$th column of $R^{-1}$ and $[C_j]_{\cdot,i}$ is the $i$th column of $C_j$. Taking variances, the relative forecast error variance of variable $l$ due to $\eta_i$ is exactly the ratio above, where $\operatorname{var}(y_{t+h} - \hat y_{t+h})_l$ denotes the $l$th diagonal element of $\operatorname{var}(y_{t+h} - \hat y_{t+h})$.

If one is interested in innovations in the original variables rather than in the orthogonalized innovations $\eta_t$, then an identification scheme, as discussed in the section on structural VARs below, is needed. In particular, since $\eta_t = R\varepsilon_t$ with $R$ lower triangular, the first element of $\eta_t$ is proportional to the first element of $\varepsilon_t$. Since the ordering of the vectors $y_t$ is arbitrary, this identification scheme applies to all elements of $\varepsilon_t$.
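A numerical sketch of the decomposition, using the lower-triangular Cholesky factor of $\Omega$ as $R^{-1}$ (so that $R\Omega R' = I$) and reusing the `ma_coeffs` helper and the illustrative `Phi1`, `Phi2`, `Omega` from the forecasting sketch above; none of these names come from the notes.

```python
import numpy as np

def fevd(Phi_list, Omega, h):
    """Share of the h-step forecast error variance of variable l attributable to the
    orthogonalized innovation i: entry [l, i] of the returned k x k matrix."""
    R_inv = np.linalg.cholesky(Omega)                  # lower triangular, R_inv @ R_inv.T == Omega
    C = [P @ R_inv for P in ma_coeffs(Phi_list, h)]    # C_j = Psi_j R^{-1}; helper defined earlier
    contrib = sum(c ** 2 for c in C)                   # r_{li}^2 + sum_j c_{li,j}^2
    return contrib / contrib.sum(axis=1, keepdims=True)

shares = fevd([Phi1, Phi2], Omega, h=4)
print(shares)     # each row sums to one
```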

6.3. Impulse response functions

Closely related to the concept of the error variance decomposition is the concept of an impulse response function. We are interested in the effect of a shock $\varepsilon_{i,t}$ on the variable $y_{l,t+h}$. Using the MA($\infty$) representation for $y_{t+h}$ we find
$$y_{t+h} = \sum_{j=0}^{\infty}\Psi_j\varepsilon_{t+h-j},$$
so the impact of $\varepsilon_t$ on $y_{t+h}$ is $\Psi_h\varepsilon_t$. If we are interested in a unit variance shock to $\varepsilon_{it}$ then we need to take into account the fact that $\varepsilon_{it}$ is correlated with the other shocks. This is again done by orthogonalizing the innovations,
$$\eta_t = R\varepsilon_t,$$
where $R$ is lower triangular such that $R\Omega R' = I$ and $\eta_t$ has uncorrelated elements. Then $\varepsilon_t = R^{-1}\eta_t$ and in particular $\eta_{1t} = r_{11}\varepsilon_{1t}$. Since the ordering of the variables is arbitrary we can restrict attention to the first innovation without loss of generality. Note that once the value of $\eta_{1t}$ is fixed, the values of the other innovations follow from $\varepsilon_t = R^{-1}\eta_t$, where we set $\eta_{2t} = \ldots = \eta_{kt} = 0$. The impact of $\eta_{1t}$ on $y_{t+h}$ is therefore $\Psi_h[R^{-1}]_{\cdot,1}$, the first column of $\Psi_hR^{-1}$. The impact on variable $l$ is then $[\Psi_hR^{-1}]_{l,1}$.
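The same objects give the orthogonalized impulse responses numerically: the columns of $\Psi_hR^{-1}$ trace out the effect of a unit shock to each orthogonalized innovation. A brief sketch, again reusing the illustrative helpers and parameter values defined above:

```python
import numpy as np

# Orthogonalized impulse responses Psi_h R^{-1} for horizons h = 0, ..., 8
# (ma_coeffs, Phi1, Phi2 and Omega are the illustrative objects from the earlier sketches).
R_inv = np.linalg.cholesky(Omega)
irf = np.array([P @ R_inv for P in ma_coeffs([Phi1, Phi2], 9)])

print(irf[:, 0, 0])   # effect of a unit shock in the first orthogonalized innovation on variable 1
print(irf[:, 1, 0])   # effect of the same shock on variable 2
```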

Typically the ordering of the variables is chosen to reflect a certain structure of shocks in the economy. For example, if we want to model monetary policy shocks as the original source of randomness then we would place a monetary variable in the first equation.

6.4. Granger causality

Assume that $y_t = (y_{1t}, y_{2t})$ is partitioned into two subvectors. Granger defined the concept of causality in terms of forecast performance. In this sense $y_{1t}$ does not g-cause $y_{2t}$ if it does not help in predicting $y_{2t}$. Formally,

Definition 6.1 (Granger Causality). Let $y_t$ be a stationary process. Define the linear subspaces
$$M^1_t = \operatorname{sp}\{y_{1s},\, s\leq t\}, \qquad M^2_t = \operatorname{sp}\{y_{2s},\, s\leq t\}.$$
Then $y_{1t}$ causes $y_{2t}$ if $P_{M^1_{t-1}\vee M^2_{t-1}}(y_{2t}) \neq P_{M^2_{t-1}}(y_{2t})$, and $y_{1t}$ causes $y_{2t}$ instantaneously if $P_{M^1_t\vee M^2_{t-1}}(y_{2t}) \neq P_{M^2_{t-1}}(y_{2t})$.

It follows at once from the definition of Granger causality and the projection theorem that $y_{1t}$ does not g-cause $y_{2t}$ if $\operatorname{var}\varepsilon^1_t = \operatorname{var}\varepsilon^2_t$, where $\varepsilon^1_t = y_{2t} - P_{M^1_{t-1}\vee M^2_{t-1}}(y_{2t})$ and $\varepsilon^2_t = y_{2t} - P_{M^2_{t-1}}(y_{2t})$. Another way to characterize Granger noncausality is by noting that
$$\operatorname{cov}\left(y_{2t},\, y_{1,t-h} - P_{M^2_{t-1}}(y_{1,t-h})\right) = 0 \quad\text{for } h > 0.$$
To see this, note that by Granger noncausality $y_{2t} - P_{M^2_{t-1}}(y_{2t}) \perp M^1_{t-1}\vee M^2_{t-1}$, which implies
$$\operatorname{cov}\left(y_{2t} - P_{M^2_{t-1}}(y_{2t}),\, y_{1,t-h} - P_{M^2_{t-1}}(y_{1,t-h})\right) = 0 \quad\text{for } h > 0,$$
since $y_{1,t-h} - P_{M^2_{t-1}}(y_{1,t-h}) \in M^1_{t-1}\vee M^2_{t-1}$. But by the projection theorem it follows that $y_{1,t-h} - P_{M^2_{t-1}}(y_{1,t-h}) \perp M^2_{t-1}$, such that
$$\operatorname{cov}\left(P_{M^2_{t-1}}(y_{2t}),\, y_{1,t-h} - P_{M^2_{t-1}}(y_{1,t-h})\right) = 0.$$

It has to be emphasized that this notion of causality is strongly related to the notion of sequentiality, in the sense that an event causing another event has to precede it in time. Moreover, the definition really is in terms of correlation rather than causation. Finding evidence of Granger causality can be an artifact of a spurious correlation. On the other hand, lack of Granger causality can be misleading too if the true causal link is of nonlinear form. An alternative definition of causality is due to Sims.

Definition 6.2 (Sims Causality). For $y_{1t}$ and $y_{2t}$ stationary we say that $y_{1t}$ does not cause $y_{2t}$ if $\operatorname{cov}(y_{2,t+j},\, y_{1t} - P_{M^2_t}(y_{1t})) = 0$ for all $j \geq 1$.

It can be seen immediately that this definition implies that all the coefficients $d_j$ for $j < 0$ in the projection
$$y_{1t} = \sum_{j=-\infty}^{\infty}d_jy_{2,t-j} + w_t$$
are zero and that the projection residual $w_t$ is uncorrelated with all future values $y_{2,t+j}$.

Theorem 6.3. Granger Causality and Sims Causality are equivalent.
Proof. Assume $y_{1t}$ does not Granger cause $y_{2t}$. Then
$$\operatorname{cov}\left(y_{2t},\, y_{1,t-h} - P_{M^2_{t-1}}(y_{1,t-h})\right) = 0 \quad\text{for } h > 0.$$
Note that by Granger noncausality and iterated projections
$$P_{M^2_{t-h}}(y_{2t}) = P_{M^2_{t-h}}P_{M^1_{t-1}\vee M^2_{t-1}}(y_{2t}) = P_{M^1_{t-h}\vee M^2_{t-h}}(y_{2t}),$$
such that $y_{2t} - P_{M^2_{t-h}}(y_{2t}) \perp M^1_{t-h}\vee M^2_{t-h}$ for $h > 0$, and therefore
$$\operatorname{cov}\left(y_{2t},\, y_{1,t-h} - P_{M^2_{t-h}}(y_{1,t-h})\right) = 0 \quad\text{for } h > 0.$$
By stationarity this is equivalent to
$$\operatorname{cov}\left(y_{2,t+h},\, y_{1t} - P_{M^2_t}(y_{1t})\right) = 0 \quad\text{for } h > 0. \qquad (6.1)$$
The reverse implication follows from the fact that by (6.1), $y_{2t} - P_{M^2_{t-1}}(y_{2t}) \perp M^1_{t-1}\vee M^2_{t-1}$, which corresponds to Granger noncausality.

6.5. Granger causality in a VAR

Let
$$\begin{pmatrix}\Phi_{11}(L) & \Phi_{12}(L) \\ \Phi_{21}(L) & \Phi_{22}(L)\end{pmatrix}\begin{pmatrix}y_{1t} \\ y_{2t}\end{pmatrix} = \begin{pmatrix}\varepsilon_{1t} \\ \varepsilon_{2t}\end{pmatrix},$$

where $\varepsilon_{1t}$ and $\varepsilon_{2s}$ are uncorrelated for all $t$ and $s$. Then $y_{1t}$ fails to Granger cause $y_{2t}$ if $\Phi_{21}(L) = 0$. This follows from the fact that
$$P_{M^1_{t-1}\vee M^2_{t-1}}(y_{2t}) = \left(I - \Phi_{22}(L)\right)y_{2t} - \Phi_{21}(L)y_{1t} = P_{M^2_{t-1}}(y_{2t})$$
if and only if $\Phi_{21}(L) = 0$. If $\Phi(L)^{-1}$ exists then the MA($\infty$) representation of the system is
$$\begin{pmatrix}y_{1t} \\ y_{2t}\end{pmatrix} = \Phi(L)^{-1}\begin{pmatrix}\varepsilon_{1t} \\ \varepsilon_{2t}\end{pmatrix} = \Psi(L)\begin{pmatrix}\varepsilon_{1t} \\ \varepsilon_{2t}\end{pmatrix}.$$
Thus $\Phi(L)\Psi(L) = I$, so in particular $\Phi_{21}(L)\Psi_{11}(L) + \Phi_{22}(L)\Psi_{21}(L) = 0$, which implies $\Psi_{21}(L) = 0$ if $\Phi_{22}(L) \neq 0$. We see that
$$\begin{pmatrix}y_{1t} \\ y_{2t}\end{pmatrix} = \begin{pmatrix}\Psi_{11}(L) & \Psi_{12}(L) \\ 0 & \Psi_{22}(L)\end{pmatrix}\begin{pmatrix}\varepsilon_{1t} \\ \varepsilon_{2t}\end{pmatrix}$$
if $y_{1t}$ fails to Granger cause $y_{2t}$. Thus we have $y_{1t} = \Psi_{11}(L)\varepsilon_{1t} + \Psi_{12}(L)\varepsilon_{2t}$ and $y_{2t} = \Psi_{22}(L)\varepsilon_{2t}$, so
$$y_{1t} = \Psi_{11}(L)\varepsilon_{1t} + \Psi_{12}(L)\Psi_{22}(L)^{-1}y_{2t}.$$
Then $P_{M^2_t}(y_{1t}) = \Psi_{12}(L)\Psi_{22}(L)^{-1}y_{2t}$, since $\Psi_{11}(L)\varepsilon_{1t}$ is orthogonal to $M^2_t$. It now follows immediately that
$$\operatorname{cov}\left(y_{2,t+j},\, y_{1t} - \Psi_{12}(L)\Psi_{22}(L)^{-1}y_{2t}\right) = \operatorname{cov}\left(y_{2,t+j},\, \Psi_{11}(L)\varepsilon_{1t}\right) = 0 \quad\text{for } j > 0.$$
This establishes that Granger causality implies Sims causality. It can also be shown that Sims causality implies Granger causality; it thus follows that the two concepts are equivalent.

We can test the null of Granger non-causality by estimating the unrestricted VAR equation by equation by OLS and then testing whether the coefficients $\phi_{21,1}, \ldots, \phi_{21,p}$ in
$$y_{2t} = c_2 + \phi_{21,1}y_{1,t-1} + \ldots + \phi_{21,p}y_{1,t-p} + \phi_{22,1}y_{2,t-1} + \ldots + \phi_{22,p}y_{2,t-p} + \varepsilon_t \qquad (*)$$
are jointly significantly different from zero. For a bivariate system this can be carried out by a standard $F$-test. Calculate the unrestricted residual sum of squares $RSS_1 = \sum\hat\varepsilon_t^2$ from $(*)$, and the residual sum of squares $RSS_0 = \sum\hat e_t^2$ from the restricted regression
$$y_{2t} = c_2 + \phi_{22,1}y_{2,t-1} + \ldots + \phi_{22,p}y_{2,t-p} + e_t.$$
Then under normality
$$\frac{RSS_0 - RSS_1}{RSS_1}\cdot\frac{T - 2p - 1}{p} \sim F(p,\, T - 2p - 1).$$
An asymptotically equivalent test is $T(RSS_0 - RSS_1)/RSS_1 \xrightarrow{d} \chi^2_p$ under the null hypothesis of no Granger causality.
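The $F$-statistic can be computed from the two residual sums of squares. A hedged sketch for the bivariate case, using scipy only for the $F$ distribution; the series here are placeholders and $T$ is taken as the effective number of observations in the regressions.

```python
import numpy as np
from scipy import stats

def granger_f_test(y1, y2, p):
    """Test H0: y1 does not Granger-cause y2, via the restricted/unrestricted RSS comparison."""
    T = len(y2)
    lags = lambda z: np.column_stack([z[p - i:T - i] for i in range(1, p + 1)])
    X1 = np.column_stack([np.ones(T - p), lags(y1), lags(y2)])   # unrestricted regressors (with constant)
    X0 = np.column_stack([np.ones(T - p), lags(y2)])             # restricted regressors
    Y = y2[p:]
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    rss1, rss0 = rss(X1), rss(X0)
    n = T - p                                                    # effective sample size used as T in the formula
    f_stat = (rss0 - rss1) / rss1 * (n - 2 * p - 1) / p
    return f_stat, stats.f.sf(f_stat, p, n - 2 * p - 1)

rng = np.random.default_rng(1)
y1_sim, y2_sim = rng.standard_normal(300), rng.standard_normal(300)   # placeholder series
print(granger_f_test(y1_sim, y2_sim, p=2))
```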

6.6. Structural VARs

Assume we have a structural economic model which is driven by behavioral sources of variation collected in a vector process $\eta_t$. The structural model connects economic variables to current and past values of the driving shocks:
$$\sum_{s=0}^{\infty}A_sy_{t-s} = \sum_{s=0}^{\infty}B_s\eta_{t-s}. \qquad (6.2)$$
If the number of elements in $\eta_t$ is equal to the number of elements in $y_t$, and if knowledge of $A_s$ and $B_s$ is enough to solve for $\eta_t$ in terms of current and lagged $y_t$, then we can write
$$B(L)^{-1}A(L)y_t = B_0\eta_t \qquad (6.3)$$
with $B(L) = I + \sum_{i=1}^{\infty}B_iB_0^{-1}L^i$, where $B(L)^{-1}$ has a polynomial expansion $B(L)^{-1} = \sum_{i=0}^{\infty}\tilde C_iL^i$ and
$$B(L)^{-1}A(L) = B(L)^{-1}\left[A_0 + \left(A(L) - A_0\right)\right].$$
We also assume that $y_t$ has an equivalent VAR($\infty$) representation
$$y_t = \sum_{s=1}^{\infty}\Phi_sy_{t-s} + u_t. \qquad (6.4)$$
Since $B(L)^{-1}$ satisfies $\left(\sum_{i=0}^{\infty}\tilde C_iL^i\right)\left(I + \sum_{i=1}^{\infty}B_iB_0^{-1}L^i\right) = I$, it must hold that $\tilde C_0 = I$. This establishes that (6.3) can be written as
$$A_0y_t + \sum_{s=1}^{\infty}C_sy_{t-s} = B_0\eta_t, \qquad\text{with } C_s \text{ such that}\qquad \sum_{s=1}^{\infty}C_sL^s = \sum_{i=1}^{\infty}\tilde C_iA_0L^i + B(L)^{-1}\left(A(L) - A_0\right).$$
Substitution from (6.4) then gives
$$A_0u_t + \sum_{s=1}^{\infty}\left(A_0\Phi_s + C_s\right)y_{t-s} = B_0\eta_t.$$
If the structural and reduced forms are identical it has to hold that $A_0\Phi_s = -C_s$. Then the unrestricted innovations of the VAR are related to the behavioral innovations $\eta_t$ by
$$u_t = A_0^{-1}B_0\eta_t.$$
Note that the $\Phi_s$ are unrestricted reduced form parameters that can always be estimated from the data. If the theoretical model (6.2) does not restrict the dynamics of the system then we can always set $\Phi_s = -A_0^{-1}C_s$. Identification of the system then reduces to finding the matrices $A_0$ and $B_0$. Since we can consistently estimate the reduced form residuals $u_t$, we can estimate $\Omega = \operatorname{var}(u_t)$ by
$$\hat\Omega = \frac{1}{T}\sum_t\hat u_t\hat u_t'.$$

If we now impose the restrictions that the policy disturbances $\eta_t$ be uncorrelated, i.e. that $\Lambda = \operatorname{var}(\eta_t)$ is diagonal, and that $B_0 = I$, then
$$\Omega = A_0^{-1}\Lambda A_0^{-1\prime}.$$
The matrix $A_0$ can then be identified by imposing that it is lower triangular. In other words, if the only restrictions on the system are that $A_0$ is lower triangular and that $\Lambda$ is diagonal, then the structural VAR is just identified. It is clear that the just identified case with a triangular matrix is only one of many possibilities to identify $A_0$.

Another interesting example is Blanchard and Quah's decomposition. Their goal is to decompose GNP into permanent and transitory shocks. They postulate that demand side shocks have only temporary effects on GNP while supply side or technology shocks have permanent effects. Unemployment, on the other hand, is affected by both shocks. They postulate
$$\begin{pmatrix}\Delta Y_t \\ u_t\end{pmatrix} = \begin{pmatrix}c_{11}(L) & c_{12}(L) \\ c_{21}(L) & c_{22}(L)\end{pmatrix}\begin{pmatrix}\eta_{dt} \\ \eta_{st}\end{pmatrix},$$
with $c_{11}(1) = 0$ such that $\eta_{dt}$ has no long-run effect on the level of $Y_t$. Also assume $E\eta_t\eta_t' = I$. The VAR($p$) representation of the system is
$$\begin{pmatrix}\Delta Y_t \\ u_t\end{pmatrix} = \begin{pmatrix}a_{11}(L) & a_{12}(L) \\ a_{21}(L) & a_{22}(L)\end{pmatrix}\begin{pmatrix}\Delta Y_{t-1} \\ u_{t-1}\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t} \\ \varepsilon_{2t}\end{pmatrix}.$$
Since in this case $A_0 = I$, it follows that
$$\begin{pmatrix}\varepsilon_{1t} \\ \varepsilon_{2t}\end{pmatrix} = \begin{pmatrix}c_{11}(0) & c_{12}(0) \\ c_{21}(0) & c_{22}(0)\end{pmatrix}\begin{pmatrix}\eta_{dt} \\ \eta_{st}\end{pmatrix}.$$
The goal is to estimate the structural residuals $\eta_t$, which can be done if we know the coefficients of the matrix $C_0 = C(0)$. From $E\varepsilon_t\varepsilon_t' = \Omega$ we have $\Omega = C_0C_0'$, i.e.
$$\operatorname{var}\varepsilon_1 = c_{11}(0)^2 + c_{12}(0)^2, \qquad \operatorname{var}\varepsilon_2 = c_{21}(0)^2 + c_{22}(0)^2, \qquad \operatorname{cov}(\varepsilon_1, \varepsilon_2) = c_{11}(0)c_{21}(0) + c_{12}(0)c_{22}(0).$$
These are three restrictions for the four unknown coefficients. The fourth restriction can be obtained from the long-run restriction $c_{11}(1) = 0$. Note that $(I - A(L)L)^{-1}C_0 = C(L)$ by the MA($\infty$) representation of the VAR and $\varepsilon_t = C_0\eta_t$. So in particular $(I - A(1))^{-1}C_0 = C(1)$. Now
$$(I - A(1))^{-1} = \frac{1}{D}\begin{pmatrix}1 - a_{22}(1) & a_{12}(1) \\ a_{21}(1) & 1 - a_{11}(1)\end{pmatrix},$$
where $D = \det(I - A(1))$. The upper left corner of $C(1)$ is zero by the long-run restriction, such that we have an additional equation to determine the coefficients:
$$(1 - a_{22}(1))c_{11}(0) + a_{12}(1)c_{21}(0) = 0.$$
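A numerical sketch of how $C_0$ can be recovered from the reduced-form quantities $A(1)$ (the sum of the VAR lag matrices) and $\Omega$: since $C(1)C(1)' = (I - A(1))^{-1}\Omega(I - A(1))^{-1\prime}$, one can build a $C(1)$ with a zero $(1,1)$ element from that long-run covariance and map it back via $C_0 = (I - A(1))C(1)$. This construction, the sign normalizations, and the parameter values are illustrative assumptions, not steps spelled out in the notes.

```python
import numpy as np

def blanchard_quah_C0(A1, Omega):
    """Recover C0 = C(0) imposing the long-run restriction c_11(1) = 0
    (demand shock ordered first, as in the text)."""
    IA = np.eye(2) - A1
    IA_inv = np.linalg.inv(IA)
    S = IA_inv @ Omega @ IA_inv.T            # long-run covariance C(1) C(1)'
    b = np.sqrt(S[0, 0])                     # parametrize C(1) = [[0, b], [c, d]]; signs are a normalization
    d = S[0, 1] / b
    c = np.sqrt(S[1, 1] - d ** 2)
    C1 = np.array([[0.0, b], [c, d]])
    return IA @ C1                           # C0, since C(1) = (I - A(1))^{-1} C0

A1 = np.array([[0.4, 0.1], [0.2, 0.5]])      # assumed A(1): sum of the VAR lag coefficient matrices
Omega = np.array([[1.0, 0.3], [0.3, 0.8]])   # assumed reduced-form innovation covariance
C0 = blanchard_quah_C0(A1, Omega)

print(np.allclose(C0 @ C0.T, Omega))         # reproduces the three covariance restrictions
print(np.linalg.inv(np.eye(2) - A1) @ C0)    # C(1): its (1,1) element is ~0 (long-run restriction)
```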

