Summary When analysing macroeconomic data it is often of relevance to allow for structural breaks in the statistical analysis. In particular, cointegration analysis in the presence of
structural breaks could be of interest. We propose a cointegration model with piecewise linear
trend and known break points. Within this model it is possible to test cointegration rank,
restrictions on the cointegrating vector as well as restrictions on the slopes of the broken linear
trend.
1. Introduction
In the analysis of economic time series it is often necessary to allow breaks in the deterministic
components. When allowing for breaks the timing is important: the break points could either be known in
advance or be located by an algorithm searching for breaks. While both issues are discussed
in the literature and mainly in a univariate setting, this paper focuses on cointegration analysis in
a multivariate setting in the presence of breaks at known points in time. The suggested approach
is a slight generalization of the likelihood-based cointegration analysis in vector autoregressive
models suggested by Johansen (1988, 1996). There are only a few conceptual differences and
the major issue for the practitioner is that new asymptotic tables are needed. This paper concerns
only asymptotic analysis. Finite sample properties of the rank test and tests for restrictions on the
cointegrating vector have been discussed by Johansen (2000a,b,c) in the situation of no breaks
using an analytic correction factor to the likelihood ratio tests.
© Royal Economic Society 2000. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street,
Malden, MA, 02148, USA.
Cointegration analysis
Structural breaks have been discussed intensively in the context of univariate autoregressive
time series with a unit root. An important finding is that a time series given by stationary
fluctuations around a broken constant level is better described by a random walk than a stationary
time series, see Perron (1989, 1990) and Rappoport and Reichlin (1989). Addressing this issue,
these authors suggested various univariate models allowing for breaks in the deterministic term.
In particular, Perron (1989) suggested three models: (A) crash model with change in intercept
but unaffected slope of the linear trend, (B) changing growth model with no change in intercept
but changing slope of trend function, and (C) where both intercept and slope are changed at
the time of the break. The model presented here generalizes model (C) and allows for testing
hypotheses corresponding to model (A).
A related concern in econometric models is parameter stability which is investigated by
methods related to those for known break points. These methods typically allow for structural
breaks at unknown times, and have been discussed, for instance, in the special issues of the Journal
of Business & Economic Statistics, volume 10, 1990 and the Journal of Econometrics, volume
70, 1996. More recently a test of this type has been suggested by Inoue (1999) in connection with
cointegration testing in a vector autoregressive setting. While those authors and also this paper
are concerned with breaks in the deterministic terms, some procedures for analysing breaks in the
cointegration parameter have been presented by Kuo (1998), Seo (1998), Hansen and Johansen
(1999) and Hansen (2000).
The approach taken here is to analyse cointegration in a Gaussian vector autoregressive model
with a broken linear trend and known break points. Likelihood analysis of cointegration is then
given in terms of reduced rank regression, a combination of least squares regression analysis and
canonical correlation analysis. The model and the rank hypothesis are discussed in Section 2.
Section 3 presents tests for cointegration rank for models with broken trend and broken level.
The asymptotic distributions have been simulated and the results described by response surface
analysis. Next, in Section 4 various tests for linear restrictions on the slopes for the broken trend
are given. Most of these tests are asymptotically $\chi^2$-distributed. In Section 5 the suggested
procedures are illustrated using data for inflation rates, interest rates and exchange rate for Italy
and Germany.
Throughout the paper the following notational convention is used. For a matrix, $a$, with full
column rank let $\bar a = a(a'a)^{-1}$. Further, let $a_\perp$ satisfy $a_\perp' a = 0$ and have the property that $(a, a_\perp)$
has full rank.
2. The Model
The cointegrated vector autoregressive model with no breaks is analysed in detail in Johansen
(1996). A simple example of the basic model is
$$\Delta X_t = \Pi X_{t-1} + \Pi_1 t + \mu + \varepsilon_t, \tag{2.1}$$
where we have left out more lags. Cointegration will appear if $\Pi$ has reduced rank in which case
we can write $\Pi = \alpha\beta'$. In that case the process generated by (2.1) has a quadratic trend, which is
eliminated if we assume $\Pi_1 = \alpha\gamma'$, as will be done throughout this paper. Note that the reduced
rank involves the combined matrix $(\Pi, \Pi_1) = \alpha(\beta', \gamma')$. We therefore consider
$$\Delta X_t = \alpha(\beta' X_{t-1} + \gamma' t) + \mu + \varepsilon_t, \tag{2.2}$$
S. Johansen et al.
as the starting point for this paper. The role of the deterministic terms, see Johansen (1996, Ch. 5),
is briefly summarized as follows. The model defined by (2.2) will generate a process with a linear
trend and is therefore called $H_l(r)$. Even $\beta' X_t$ has a trend and is hence trend stationary. A number
of sub-models are defined by successively restricting the parameters $\mu$ and $\gamma$. If $\gamma = 0$ then
$$\Delta X_t = \alpha\beta' X_{t-1} + \mu + \varepsilon_t, \tag{2.3}$$
and the process still has a linear trend, but $\beta' X_t$ does not and becomes stationary with a constant
level. This model is denoted $H_{lc}(r)$. Finally, if $\gamma = 0$ and also $\mu = \alpha\rho'$ then
$$\Delta X_t = \alpha(\beta' X_{t-1} + \rho') + \varepsilon_t, \tag{2.4}$$
and the process has no linear trend in any direction. This model is denoted $H_c(r)$.
In the rest of this section we formulate a model for the observed time series X t , t = 1, . . . , T,
which is divided into sub-samples according to the position of break points. For each sub-sample
a vector autoregressive model is chosen, so the parameters of the stochastic components are the
same for all sub-samples, while the deterministic trend may change between sub-samples. In that
case the process can be given rather simple representations and interpretations in each period and
statistical analysis akin to that of the usual vector autoregressive models.
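$$\Delta X_t = (\Pi, \Pi_j)\begin{pmatrix} X_{t-1} \\ t \end{pmatrix} + \mu_j + \sum_{i=1}^{k-1}\Gamma_i \Delta X_{t-i} + \varepsilon_t \tag{2.5}$$
$$H_l(r):\quad \operatorname{rank}(\Pi, \Pi_1, \ldots, \Pi_q) \le r \quad\text{or}\quad (\Pi, \Pi_1, \ldots, \Pi_q) = \alpha\begin{pmatrix}\beta \\ \gamma_1 \\ \vdots \\ \gamma_q\end{pmatrix}'$$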
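$$H_c(r):\quad \operatorname{rank}(\Pi, \mu_1, \ldots, \mu_q) \le r \quad\text{and}\quad \Pi_1, \ldots, \Pi_q = 0.$$
As an alternative to the models $H_l$ and $H_c$ a rank hypothesis could be formulated for $\Pi$ alone as
in example (2.3),
$$H_{lc}(r):\quad \operatorname{rank}\,\Pi \le r \quad\text{and}\quad \Pi_1, \ldots, \Pi_q = 0.$$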
$$E_{j,t} = \sum_{i=k+1}^{T_j - T_{j-1}} D_{j,t-i} = \begin{cases} 1 & \text{for } T_{j-1} + k + 1 \le t \le T_j, \\ 0 & \text{otherwise,} \end{cases}$$
is the effective sample of the jth period. It is convenient to gather the sample dummies and the
drift parameters for the different sample periods
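$$E_t = (E_{1,t}, \ldots, E_{q,t})', \qquad \mu = (\mu_1, \ldots, \mu_q), \qquad \gamma = (\gamma_1', \ldots, \gamma_q')',$$
$$\Delta X_t = \alpha\begin{pmatrix}\beta \\ \gamma\end{pmatrix}'\begin{pmatrix}X_{t-1} \\ tE_t\end{pmatrix} + \mu E_t + \sum_{i=1}^{k-1}\Gamma_i\Delta X_{t-i} + \sum_{i=1}^{k}\sum_{j=2}^{q}\kappa_{j,i}D_{j,t-i} + \varepsilon_t, \tag{2.6}$$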
where the dummy parameters $\kappa_{j,i}$ are $p$-vectors and the observations $X_1, \ldots, X_k$ are held fixed
as initial observations. Note that the effect of the dummy variables $D_{j,t-1}, \ldots, D_{j,t-k}$ corresponding to the observations $X_{T_{j-1}+1}, \ldots, X_{T_{j-1}+k}$ is to render the corresponding residuals zero,
thereby essentially eliminating the corresponding factors from the likelihood function, and hence
producing the conditional likelihood function given the initial values in each period.
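The sample dummies can be made concrete with a short sketch. It assumes, in line with the later statement that $D_{2,t} = 1_{(t = T_1)}$, that $D_{j,t}$ marks the last observation of period $j - 1$. The function names and the 1-based list layout are illustrative choices and do not come from the paper.

```python
def break_dummies(T, breaks, k):
    """Build D[j][t] and E[j][t] for j = 1..q and t = 1..T (index 0 is unused).

    breaks = [T_1, ..., T_q] with T_q = T; T_0 = 0 is implied.
    Assumption: D_{j,t} = 1 when t = T_{j-1}, and 0 otherwise.
    """
    bounds = [0] + list(breaks)
    q = len(breaks)
    D = {j: [0] * (T + 1) for j in range(1, q + 1)}
    E = {j: [0] * (T + 1) for j in range(1, q + 1)}
    for j in range(1, q + 1):
        start, end = bounds[j - 1], bounds[j]
        if start >= 1:
            D[j][start] = 1
        # Effective sample of period j: the first k observations are initial values.
        for t in range(start + k + 1, end + 1):
            E[j][t] = 1
    return D, E


def effective_sample_from_D(D_j, t, k, period_length):
    """Evaluate sum_{i=k+1}^{T_j - T_{j-1}} D_{j,t-i}, skipping dates before t = 1."""
    return sum(D_j[t - i] for i in range(k + 1, period_length + 1) if t - i >= 1)


# Break points of the empirical illustration: T = 91, T_1 = 27, T_2 = 77, k = 2.
T, breaks, k = 91, [27, 77, 91], 2
D, E = break_dummies(T, breaks, k)
effective_lengths = [sum(E[j]) for j in (1, 2, 3)]
```

Each period loses its first $k$ observations to conditioning, so the effective lengths add up to $T - qk$.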
2.3. Interpretation
A process satisfying the hypothesis $H_l(r)$ can be interpreted using Granger's representation
theorem. That is, linear combinations of the process, given by $\beta$, cointegrate while the process
exhibits a linear trend in each of the sub-samples. As usual it is necessary to assume that the
process is actually an I(1) process.
Assumption 1. Assume that the roots of the characteristic polynomial,
$$A(z) = (1 - z)I_p - \alpha\beta' z - \sum_{i=1}^{k-1}\Gamma_i(1 - z)z^i,$$
.
(2.7)
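$$X_t = C\sum_{i=T_{j-1}+k+1}^{t}\varepsilon_i + Y_{j,t} + \tau_{c,j} + \tau_{l,j}\,t \tag{2.8}$$
for $j = 1, \ldots, q$, $T_{j-1} + k < t \le T_j$ and $C = \beta_\perp(\alpha_\perp'\Gamma\beta_\perp)^{-1}\alpha_\perp'$, with $\Gamma = I_p - \sum_{i=1}^{k-1}\Gamma_i$. The processes $Y_{j,t}$ are
stationary, identically distributed and have zero expectation. The slope parameters, $\tau_{l,j}$, can be
expressed as
$$\tau_{l,j} = C\mu_j + (C\Gamma - I_p)\bar\beta\gamma_j',$$
whereas the level coefficient $\tau_{c,j}$ depends on initial values in such a way that $\beta'\tau_{c,j}$ is an identified
function of the parameters
$$\beta'\tau_{c,j} = \bar\alpha'(\Gamma C - I_p)\mu_j + \bar\alpha'(\Gamma C\Gamma - \Gamma)\bar\beta\gamma_j' - \gamma_j'.$$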
For each sample period the process $\beta' X_t + \gamma' E_t t$ is stationary and hence it has no trending
behaviour. The common stochastic trends are $\alpha_\perp'\sum_{s=T_{j-1}+k+1}^{t}\varepsilon_s$ and the slopes of the common
deterministic trends are $\alpha_\perp'\Gamma\tau_{l,j} = \alpha_\perp'\mu_j$.
In some situations it is convenient to have linear combinations of the data representing the
non-stationary trends. Examples are $\alpha_\perp'\Gamma X_t$ and $\beta_\perp' X_t$, both of which are combinations that do
not cointegrate.
The representation shows that in each sub-sample all linear combinations of the process are
allowed to have a linear trend, which generalizes model (C) suggested by Perron (1989). Tests
for linear restrictions on the slope parameter $\tau_l = C\mu + (C\Gamma - I_p)\bar\beta\gamma'$ are discussed in Section
4. The Granger representation shows that the slope for the cointegrating vector, $\beta'\tau_l = -\gamma'$, has
to be treated separately from the slope of the common deterministic trend, $\alpha_\perp'\Gamma\tau_l = \alpha_\perp'\mu$.
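An example is the two-period model, $q = 2$, with common slopes, $\tau_{l,1} = \tau_{l,2}$, corresponding
to Perron's model (A). In general these hypotheses are of the form
$$H_l(r):\quad \beta'\tau_l G_\perp = -\gamma' G_\perp = 0 \quad\text{or}\quad \gamma = G\varphi \tag{2.9}$$
$$H_l(r):\quad \alpha_\perp'\Gamma\tau_l M_\perp = \alpha_\perp'\mu M_\perp = 0$$
for the common deterministic trends. Here $G$ and $M$ are known matrices of dimension $(q \times g)$
and $(q \times m)$, respectively, with $g, m < q$ and full column rank. In particular, Perron's model (A)
is given by $G = M = (1, 1)'$, which means that $\gamma_1 = \gamma_2$ and $\alpha_\perp'\mu_1 = \alpha_\perp'\mu_2$.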
A more subtle question is how the transition happens from one sample period to the next.
The suggested conditioning on k initial values in each period allows for great flexibility and these
transition periods can be extended if necessary. An alternative approach would be to use a specific
transition function of some kind. Suppose it is of interest to model an instantaneous break in the
level. As a simple example of such a problem, with one lag, consider an unobserved components
formulation
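$$X_t = \tau_{c,1}1_{(t \le T_1)} + \tau_{c,2}1_{(t > T_1)} + Z_t, \qquad \Delta Z_t = \alpha\beta' Z_{t-1} + \varepsilon_t,$$
such that for $2 \le t \le T$,
$$\Delta X_t = \alpha\beta'\{X_{t-1} - \tau_{c,1}1_{(t-1 \le T_1)} - \tau_{c,2}1_{(t-1 > T_1)}\} + \tau_{c,1}\Delta 1_{(t \le T_1)} + \tau_{c,2}\Delta 1_{(t > T_1)} + \varepsilon_t.$$
Using the definitions of $D_{j,t}$ and $E_{j,t}$, in particular that $D_{2,t} = 1_{(t = T_1)}$, $E_{1,t} = 1_{(t \le T_1)}$ and
$E_{2,t} = 1_{(t \ge T_1 + 2)}$, it follows that
$$\Delta X_t = \alpha\beta'(X_{t-1} - \tau_{c,1}E_{1,t} - \tau_{c,2}E_{2,t}) + \{\tau_{c,2} - (I_p + \alpha\beta')\tau_{c,1}\}D_{2,t-1} + \varepsilon_t.$$
Comparing with equation (2.6) or rather (3.6) it is found to be of the form
$$\Delta X_t = \alpha\begin{pmatrix}\beta \\ \rho\end{pmatrix}'\begin{pmatrix}X_{t-1} \\ E_t\end{pmatrix} + \kappa_{2,1}D_{2,t-1} + \varepsilon_t$$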
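The step from the unobserved components formulation to the dummy-variable form can be checked numerically. The sketch below treats the scalar case $p = 1$, writes `pi` for $\alpha\beta'$, and uses arbitrary illustrative values for the levels, the break date and the seed; none of these numbers come from the paper.

```python
import random


def simulate_level_break(T=40, T1=15, pi=-0.5, tau1=1.0, tau2=3.0, seed=0):
    """Simulate X_t = tau1*1(t<=T1) + tau2*1(t>T1) + Z_t with dZ_t = pi*Z_{t-1} + eps_t."""
    rng = random.Random(seed)
    eps = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(T)]  # eps[1..T], index 0 unused
    Z = [0.0] * (T + 1)
    Z[1] = eps[1]
    for t in range(2, T + 1):
        Z[t] = Z[t - 1] + pi * Z[t - 1] + eps[t]
    X = [0.0] + [(tau1 if t <= T1 else tau2) + Z[t] for t in range(1, T + 1)]
    return X, eps


def max_identity_error(X, eps, T1, pi, tau1, tau2):
    """Largest gap between dX_t and the dummy-variable form, over t = 2..T (one lag, k = 1)."""
    T = len(X) - 1
    worst = 0.0
    for t in range(2, T + 1):
        E1 = 1 if t <= T1 else 0          # E_{1,t}
        E2 = 1 if t >= T1 + 2 else 0      # E_{2,t}
        D2_lag = 1 if t - 1 == T1 else 0  # D_{2,t-1}
        rhs = (pi * (X[t - 1] - tau1 * E1 - tau2 * E2)
               + (tau2 - (1.0 + pi) * tau1) * D2_lag + eps[t])
        worst = max(worst, abs((X[t] - X[t - 1]) - rhs))
    return worst


X, eps = simulate_level_break()
error = max_identity_error(X, eps, T1=15, pi=-0.5, tau1=1.0, tau2=3.0)
```

The only date at which the dummy term is active is $t = T_1 + 1$, where it absorbs both the jump in level and the lagged level correction.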
Another type of restriction of interest is co-breaking, see Hendry (1997) or Clements and
Hendry (1999, pp. 249–252). In the model $H_l(r)$, the slope and the intercept of the linear trend
of the cointegrating relation $\beta' X_t$ are in general different from period to period. An $r$-vector
is an equilibrium slope co-breaking vector if the slope of the deterministic trend X t does
not change from period to period; that is, must satisfy ( l ) span(1, . . . , 1) . Granger's
representation Theorem 2.1 shows that $\beta'\tau_l = -\gamma'$ and hence co-breaking is a linear restriction
on the row space of the $(q \times r)$-matrix $\gamma$. When $q < r$ or when the rank of $\gamma$ otherwise is
smaller than $r$ then there are at least $(r - \operatorname{rank}\gamma)$ co-breaking vectors, given by the orthogonal
complement, $(\gamma')_\perp$, to the matrix $\gamma'$. Note that the hypothesis in (2.9), $\gamma = G\varphi$, is formulated
in terms of the column space of $\gamma$ and is therefore not directly linked to co-breaking, although it
gives an upper bound for the rank of $\gamma$ and thereby a lower bound for the number of co-breaking
vectors. When $g$ is smaller than $r$ there are at least $(r - g)$ co-breaking vectors given by $(\varphi')_\perp$.
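$$\operatorname{CanCor}\left\{\Delta X_t,\ \begin{pmatrix}X_{t-1} \\ tE_t\end{pmatrix}\ \middle|\ \mathcal{F}_t\right\}, \qquad \mathcal{F}_t:\ \ \Delta X_{t-i}\ (i = 1, \ldots, k-1),\ \ D_{j,t-i}\ (i = 1, \ldots, k;\ j = 2, \ldots, q),$$
where $\mathcal{F}_t$ is shorthand notation for the $\sigma$-field generated by the regressors. The likelihood ratio
test statistic for the hypothesis of at most $r$ cointegrating relations, $H_l(r)$, against $H_l(p)$ is given
by
$$LR\{H_l(r) \mid H_l(p)\} = -T\sum_{i=r+1}^{p}\log(1 - \hat\lambda_i). \tag{3.1}$$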
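Once the squared sample canonical correlations are available, the statistic in (3.1) is a one-line computation. The sketch below assumes the eigenvalues have already been obtained from the reduced rank regression; the function name and the numerical eigenvalues are hypothetical and serve only to show the arithmetic.

```python
import math


def trace_statistic(eigenvalues, r, T):
    """Return -T * sum_{i=r+1}^{p} log(1 - lambda_i), with eigenvalues sorted in decreasing order."""
    lam = sorted(eigenvalues, reverse=True)
    if not 0 <= r <= len(lam):
        raise ValueError("r must lie between 0 and p")
    # lam[r:] holds the p - r smallest eigenvalues, i.e. indices r+1, ..., p.
    return -T * sum(math.log(1.0 - value) for value in lam[r:])


# Hypothetical eigenvalues for p = 3 and T = 100 (not taken from the paper).
example_eigenvalues = [0.5, 0.2, 0.1]
statistics = [trace_statistic(example_eigenvalues, r, T=100) for r in range(4)]
```

The statistic decreases in $r$ and equals zero for $r = p$, since the sum is then empty.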
For the asymptotic results the following notation is convenient. The relative break points
$v_j = T_j/T$ satisfy $0 = v_0 < v_1 < \cdots < v_q = 1$. Let $\Delta v_j = v_j - v_{j-1}$ and define a $q$-dimensional vector of indicator functions for the sample periods, $e_u = \{\ldots, 1_{(v_{j-1} < u \le v_j)}, \ldots\}'$.
This function is the limit of $E_{[Tu]}$ as $T$ increases. For any two vector valued continuous functions
$f_u$ and $g_u$ on $[0,1]$ we use the notation
$$(f_u \mid g_u) = f_u - \int_0^1 f_s g_s'\,ds\left(\int_0^1 g_s g_s'\,ds\right)^{-1} g_u,$$
$$\operatorname{tr}\left\{\int_0^1 dW_u\,F_u'\left(\int_0^1 F_u F_u'\,du\right)^{-1}\int_0^1 F_u\,dW_u'\right\} \tag{3.2}$$
as $T \to \infty$ and for fixed relative break points, $v_j$. Here $W$ is a standard Brownian motion of
dimension $(p - r)$ and $F$ is a $(p - r + q)$-dimensional process,
$$F_u = \left(\begin{pmatrix}W_u \\ ue_u\end{pmatrix}\ \middle|\ e_u\right). \tag{3.3}$$
The asymptotic distribution has been simulated and the results analysed by a response surface
analysis presented in Section 3.4.
For analytic reasoning and for computer simulations it is convenient to rewrite the representation of the distribution given by (3.2). This is based on two ideas. First, the distribution is
invariant with respect to linear transformations of the vector process $F$. Thus if the first $(p - r)$
components are regressed on the last $q$ components giving $F_{0,u} = (W_u \mid ue_u, e_u)$, the transformed
version of the matrix $(\int_0^1 F_u F_u'\,du)^{-1}$ is block diagonal. It follows that expression (3.2) can be
rewritten as the sum of two terms which do not involve the levels of the Brownian motion, see
(3.4) below. We find
$$\operatorname{tr}\left\{\int_0^1 dW_u\,F_{0,u}'\left(\int_0^1 F_{0,u}F_{0,u}'\,du\right)^{-1}\int_0^1 F_{0,u}\,dW_u'\right\} + \operatorname{tr}\left\{\int_0^1 dW_u\,(ue_u \mid e_u)'\left(\int_0^1 (ue_u \mid e_u)(ue_u \mid e_u)'\,du\right)^{-1}\int_0^1 (ue_u \mid e_u)\,dW_u'\right\}.$$
Secondly, when regressing a Brownian motion on the level in each of the two or more sub-samples,
the sub-sample Brownian motions will be independent. These considerations lead to a second
representation of the asymptotic distribution (3.2).
Theorem 3.2. Let W (1) , . . . , W (q) be independent ( p r )-dimensional standard Brownian motions and define
Jj =
0
1
( j)
(u|1){d Wu } ,
( j)
( j)
{Wu |1, u}{d Wu } ,
Kj =
1/2
(u|1)2 du
0
1
Lj =
( j)
( j)
j=1
j=1
(3.4)
j=1
From representation (3.4) of the limit distribution it is seen that the asymptotic distribution
only depends on the relative length of the sample periods, not on their ordering. For instance, in
the case of one break point, the asymptotic distribution is the same if T1 = T /3 as if T1 = 2T /3.
Moreover, the first term in (3.4) is the trace of a ( p r )-dimensional square matrix while the
second term is the sum of inner products of ( p r )-dimensional vectors. This reflects the degrees
of freedom arising from the matrix and the vectors j , respectively.
This representation also shows another feature of the limit distribution. In the case of $q + 1$
sample periods let $DF_{q+1}(\Delta v_1, \ldots, \Delta v_{q+1})$ denote the asymptotic distribution. When the length
of one of the sample periods, $\Delta v_j$, tends to zero it follows that the contributions $K_j$, $L_j$ vanish
in the first term of (3.4). In the second term we can isolate $J_j J_j'$ and find
$$\lim_{\Delta v_j \to 0} DF_{q+1}(\Delta v_1, \ldots, \Delta v_{q+1}) = DF_q(\Delta v_1, \ldots, \Delta v_{j-1}, \Delta v_{j+1}, \ldots, \Delta v_{q+1}) + J_j J_j', \tag{3.5}$$
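$$\Delta X_t = \alpha\begin{pmatrix}\beta \\ \rho\end{pmatrix}'\begin{pmatrix}X_{t-1} \\ E_t\end{pmatrix} + \sum_{i=1}^{k-1}\Gamma_i\Delta X_{t-i} + \sum_{i=1}^{k}\sum_{j=2}^{q}\kappa_{j,i}D_{j,t-i} + \varepsilon_t. \tag{3.6}$$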
$$F_u = \begin{pmatrix}W_u \\ e_u\end{pmatrix}. \tag{3.7}$$
d Wu Fu N
N
1
0
Fu Fu du N
1
N
1
0
0
Iq
Fu d Wu
(3.8)
.
(3.9)
(iii) Suppose n < min( p r, q). Then there exist matrices , of rank n and dimensions
= . The asymptotic distribution is then
{( p r ) n}, (q n), respectively, so
given by (3.8) where N now depends on
I pr n 0( pr n)n 0
N =
.
(3.10)
0
0
The test is not as attractive as the previously considered tests. The limit distribution is a
complicated function of $\alpha_\perp'\mu$. The test is therefore not asymptotically similar with respect to the
slope parameters for the broken trend. Although the third situation in Theorem 3.3 only occurs
on a null subset of the parameter space, the issue ought to be addressed in the statistical analysis.
Had there been no breaks in the trend the test strategy suggested by Johansen (1996, Section
12.2) could be used. A generalization of that idea is not simple for two reasons. First, the limit
distribution depends continuously on the nuisance parameter. Secondly, in many applications it
could be of interest subsequently to test hypotheses corresponding to rank restrictions on $\alpha_\perp'\mu$.
Such tests are discussed in Section 4.
2
4
4
4
m +
=
im xi +
i jm xi x j +
i jkm xi x j xk dm
m=0
i=1
i=1 j i
i=1 j i k j
[Figure 1. The pairs (a, b) used in the simulations.]
the same role as the dummies for dimensions 1, 2 and 3 used in Doornik (1998), but give a better
fit. Note that the regression includes the inverse of the sample size, $T^{-1}$. The role of the sample
size in fitting response surfaces for the trace test is discussed in Doornik (1998). The asymptotic
moments are easily calculated from (3.11) by letting $T \to \infty$. Note also that $x_1 d_1 = d_0 = 1$ and
$x_1 d_2 = d_1$. Some of the parameters in (3.11) are therefore not identified, and are set to zero. The
remaining 75 parameters have been estimated by ordinary least squares, adding an error term to
(3.11) and minimizing the sum of squared residuals.
The moments of the asymptotic distribution were simulated for various values of $(p - r)$, $a$,
$b$ and $T$. The involved Brownian motions can be discretized in several ways. One possibility is to
mimic the representation (3.2) and generate one random walk with $T$ steps in each simulation, and
associate a percentage of this to each sample period. In order to avoid poor approximations for
cases with relatively short sample periods, representation (3.4) was used. The idea is to generate
three random walks each with $T$ steps and then scale them according to the relative lengths of the
sample periods. The values of $T$ were the integer part of $500/t$ for $t = 1, \ldots, 10$. The considered
number of non-stationary relations was $(p - r) = 1, \ldots, 8$. Finally 20 different values of $a$ and
$b$ were chosen as illustrated in Figure 1, to be representative of all pairs $(a, b)$ such that $a < b$ and
$b < (1 - a - b)$. Note that there is a more dense sampling when $a = 0$, corresponding to one
single break. This gives 1600 cases which were repeated $N = 100\,000$ times.
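The size of the simulation design follows directly from the description above and can be tallied as below. The 20 pairs $(a, b)$ themselves are only shown in Figure 1, so the sketch counts them without listing them; the helper `is_admissible` simply encodes the stated inequalities and is an illustrative name.

```python
# Sample sizes: integer part of 500/t for t = 1, ..., 10.
sample_sizes = [500 // t for t in range(1, 11)]

# Number of non-stationary relations (p - r) = 1, ..., 8.
dimensions = list(range(1, 9))

# Number of (a, b) pairs; the pairs themselves are given in Figure 1 only.
n_ab_pairs = 20

n_cases = len(sample_sizes) * len(dimensions) * n_ab_pairs


def is_admissible(a, b):
    """Check the ordering a < b < 1 - a - b of the relative period lengths."""
    return a < b and b < 1.0 - a - b
```

With the break points of the empirical illustration in Section 5, $(a, b) = (0.154, 0.297)$ satisfies the ordering.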
The fit of (3.11) is excellent, even when the number of parameters is dramatically reduced
starting from the least significant, as illustrated in Table 1. Standard errors are about 0.2% for the
mean and 0.9% for the variance, an order of magnitude actually very close to the Monte Carlo
sampling variation in log(moment). Note that such small errors in the moments are virtually
negligible for all practical purposes when computing the quantiles and tail probabilities of the
$\Gamma$-distribution. This point is illustrated in Tables 2 and 3, where quantiles and tail probabilities
are computed for the values mean = 80 and variance = 125 and small variations thereof. These
Table 1.

                          Unrestricted                              Restricted
                          # Par.   R²         Std. error × 10³      # Par.   R²         Std. error × 10³
$H_c$, log(mean)          75       0.999997   1.78                  31       0.999996   2.06
$H_c$, log(variance)      75       0.999963   5.86                  19       0.999936   7.59
$H_l$, log(mean)          75       0.999998   1.15                  31       0.999996   1.77
$H_l$, log(variance)      75       0.999940   6.76                  24       0.999894   8.86
Table 2. 95% quantile of the $\Gamma$-distribution.

                 Variance
Mean             125/1.009   125      125*1.009
80/1.002         98.99       99.08    99.17
80               99.15       99.24    99.33
80*1.002         99.31       99.40    99.49
Table 3. Right-hand tail probability of the $\Gamma$-distribution for the value 99.24.

                 Variance
Mean             125/1.009   125      125*1.009
80/1.002         0.0480      0.0487   0.0494
80               0.0493      0.0500   0.0507
80*1.002         0.0505      0.0513   0.0520
values of mean and variance are approximately equal to the average of the values found in our
simulations.
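The quantiles and tail probabilities in Tables 2 and 3 can be reproduced approximately by matching a $\Gamma$-distribution to a given mean and variance. The sketch below is a plain standard-library implementation under that assumption: it evaluates the distribution function by the power series of the incomplete gamma function and inverts it by bisection. The function names are illustrative, and this is not the authors' code.

```python
import math


def gamma_cdf(x, shape, scale):
    """Distribution function of a Gamma(shape, scale) variable via the incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 0
    # Series: sum_{n>=0} z^n / (shape * (shape+1) * ... * (shape+n)).
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= z / (shape + n)
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def gamma_parameters(mean, variance):
    """Moment matching: shape = mean^2 / variance, scale = variance / mean."""
    return mean * mean / variance, variance / mean


def gamma_tail(x, mean, variance):
    """Right-hand tail probability P(X > x) for the moment-matched Gamma distribution."""
    shape, scale = gamma_parameters(mean, variance)
    return 1.0 - gamma_cdf(x, shape, scale)


def gamma_quantile(prob, mean, variance):
    """Quantile of the moment-matched Gamma distribution, found by bisection."""
    shape, scale = gamma_parameters(mean, variance)
    lo, hi = 0.0, mean + 20.0 * math.sqrt(variance)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q95 = gamma_quantile(0.95, mean=80.0, variance=125.0)
tail = gamma_tail(99.24, mean=80.0, variance=125.0)
```

For mean 80 and variance 125 the two results should land close to the central entries of Tables 2 and 3, namely 99.24 and 0.0500.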
It is important to remark that the residuals of (3.11) are approximately homoscedastic in all
cases, which means that there are no values of $(p - r)$, $a$, $b$ and $T$ in which the errors are much
bigger as a percentage of the moment. This is what motivated the choice to model the log of
the moments rather than the moments themselves. We also tried to model the moments directly,
without taking logarithms, using weighted least squares with ( p r )2 as weights to account
for heteroscedasticity, along the lines of Doornik (1998). However, the fit appears to be slightly
poorer with that specification. Similar results can be achieved using PcGets, an algorithm for
model selection constructed by Krolzig and Hendry (2000).
The estimated coefficients are reported in Table 4 where the coefficients referring to the variable
$x_4 = T^{-1}$ are not reported, since they are irrelevant for computing the asymptotic moments.
In the case of $q = 2$ sample periods instead of $q = 3$ the mean and variance can still be
computed by using (3.11) by letting $b = 0$ and subtracting the mean and variance of a $\chi^2(p - r)$
variable, see (3.5). In the case of just $q = 1$ sample period let $a = b = 0$ and subtract an
additional $\chi^2(p - r)$ variable. Thus for $q = 1, 2, 3$ we have the approximation
mean exp{ f mean ( p r, a, b, )} (3 q)( p r )
(3.12)
(3.13)
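The adjustment for fewer than three sample periods amounts to simple arithmetic on the fitted moments. The sketch below assumes that the response surface has already been evaluated for $q = 3$ (with $b = 0$, or $a = b = 0$, as appropriate) and uses the fact that a $\chi^2(p - r)$ variable has mean $(p - r)$ and variance $2(p - r)$. The function name and the input numbers are hypothetical.

```python
def adjust_moments_for_q(mean_q3, variance_q3, p_minus_r, q):
    """Subtract the moments of (3 - q) independent chi-squared(p - r) variables.

    mean_q3, variance_q3: moments from the three-period response surface,
    evaluated with b = 0 when q = 2 and with a = b = 0 when q = 1.
    """
    if q not in (1, 2, 3):
        raise ValueError("the response surface covers q = 1, 2, 3 only")
    n_removed = 3 - q
    mean = mean_q3 - n_removed * p_minus_r
    variance = variance_q3 - 2 * n_removed * p_minus_r
    return mean, variance


# Hypothetical surface values, used only to show the arithmetic.
two_periods = adjust_moments_for_q(80.0, 125.0, p_minus_r=4, q=2)
one_period = adjust_moments_for_q(80.0, 125.0, p_minus_r=4, q=1)
```

The adjusted mean and variance can then be passed to a $\Gamma$-approximation to obtain quantiles or p-values.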
Hl
log(variance)
log(mean)
log(variance)
Constant
2.80
3.78
3.06
(p r)
0.501
0.346
0.456
3.97
0.314
1.43
0.859
1.47
1.79
0.399
0.993
0.256
( p r )2
0.0309
0.0106
0.0269
0.00898
( p r )a
0.0600
0.0339
0.0363
0.0688
( p r )b
0.0195
a2
5.72
ab
1.12
b2
1.70
( p r )3
0.000974
( p r )a 2
0.168
a3
6.34
ab2
1.89
2.35
4.08
2.35
0.000840
3.95
a2 b
b3
4.21
6.01
4.75
1.33
1.85
0.282
2.04
0.587
( p r )1
2.19
a( p r )1
0.438
0.874
0.304
1.62
b( p r )1
1.79
2.36
1.06
3.13
a 2 ( p r )1
6.03
2.88
9.35
4.52
ab( p r )1
3.08
b2 ( p r )1
1.97
a 3 ( p r )1
8.08
ab2 ( p r )1
5.79
b3 ( p r )1
( p r )2
4.44
0.717
1.29
a 2 ( p r )2
1.52
b2 ( p r )2
2.87
a 3 ( p r )2
2.05
2.47
3.82
1.21
2.12
5.87
22.8
7.15
4.31
b( p r )2
b3 ( p r )2
2.73
1.02
0.807
4.95
4.89
0.681
0.874
0.828
0.865
5.43
13.1
2.03
1.50
These formulas allow comparison with published approximations of the limit distribution for the
case with no breaks. Table 5 compares the mean and variance based on formulas (3.12) and
(3.13) with the expressions obtained by Doornik (1998) as well as comparing percentiles based
on these two sets of formulas with those reported by Johansen (1996). Formulas (3.12) and (3.13)
agree quite well with Doornik's approximation both with respect to moments and in particular the
rightmost percentiles. Our approximation is based on response surfaces with $(p - r) \le 8$ while
Table 5. Approximate mean, variance and 95th percentile of the asymptotic distribution of the rank test in
model $H_c$ for $a = b = 0$, $q = 1$. Comparison of RS, the response surface analysis, D, Doornik (1998),
and J, Johansen (1996, Table 15.2).

           mean              variance          95th percentile
(p − r)    RS       D        RS       D        RS       D        J
1          4.1      4.1      7.0      6.7      9.2      9.1      9.1
2          12.0     12.1     19.6     20.0     20.1     20.3     20.0
3          24.2     24.0     38.5     38.6     35.2     35.0     34.8
4          40.2     40.0     63.2     63.2     54.1     53.9     53.4
5          60.2     60.1     94.0     93.8     77.0     76.9     75.7
6          84.1     84.1     131.1    130.4    103.8    103.7    101.8
7          111.9    112.1    174.2    173.6    134.5    134.6    132.0
8          142.8    144.1    222.6    221.6    169.2    169.4    165.7
9          180.3    180.1    274.9    276.2    208.4    208.3    203.3
10         222.5    220.1    329.3    336.8    253.2    251.1    244.6
Doornik considered $(p - r) \le 15$. It is seen that our formula is not suited for extrapolation to
$(p - r) > 10$. The percentiles tabulated by Johansen are based on a discretization of $T = 400$,
whereas Doornik's and our formulas are based on response surfaces in $T$ combined with the $\Gamma$-approximation. As expected there is agreement for lower dimensions whereas Johansen's figures
are less accurate for higher dimensions. Doornik gives a more detailed comparison of percentiles
found from the $\Gamma$-approximation and directly by simulation.
Note that it is not straightforward to do these two tests in the opposite order. In that case the
test for slope of the common deterministic trend is burdened with nuisance parameters. This is
related to the issue that the non-stationary trends are not uniquely defined.
$$H_l(r):\quad \gamma = G\varphi,$$
where $G$ is a known $(q \times g)$-matrix of rank $g$, where $g \le q$, and the parameter $\varphi$ is a $(g \times r)$-matrix.
Under the hypothesis the slope for the cointegrating relations is therefore
$$\beta'\tau_l E_t = -\varphi' G' E_t.$$
As an example suppose $q = 2$. By the choice $G = (1, 0)'$ the linear trend is absent in the second
period whereas if $G = (1, 1)'$ then the slope is not altered by the break. Note, that when there is
canonical correlations of the residuals, 1 > 1 > > p > 0, are given by
X t1
CanCor X t ,
F ,
t G Et
t
L R{ Hl (r )
Hl (r )} = T
log{(1 i )/(1 i )},
i=1
Theorem 4.1. Suppose Hl (r ) and Assumption 1 are satisfied. Then the likelihood ratio test
Hl (r ) : = M + M
,
l
former of these is
l =
C M +
C( )1 ,
showing that restrictions on are necessary to interpret Hl in terms of the non-stationary trends.
In the following we therefore discuss likelihood ratio tests for
Hl
(r ) : = G, = M + M
the hypothesis Hl ( p) entails no restrictions as compared with Hl ( p). The squared sample
X t1
CanCor X t , t G E t
Ft ,
(4.1)
E
M
t
where the notation Ft indicates that the regressor E t is replaced by M E t . The likelihood ratio
L R{ Hl
p
(r )
Hl (r )} = T
log{(1 i )/(1 i )},
L R{ Hl
i=r +1
(r ) Hl (r )} = L R{ Hl (r ) Hl (r )} + L R{ Hl (r ) Hl (r )},
see Johansen (1996, Theorem 6.2), where it is explained that it is convenient to express the first
of these statistics in terms of the small eigenvalues, using the fact that Hl ( p) = Hl ( p).
For the asymptotic analysis the restriction $\operatorname{span}(G) \subseteq \operatorname{span}(M)$ is crucial for avoiding nuisance
parameters.
5. Empirical Illustration
This section illustrates the suggested statistical analysis, applied to a five-dimensional data set with
variables relevant for analysing the Uncovered Interest Parity (UIP) hypothesis between Germany
Table 6. Determination of the lag length k.

k    Akaike    Hannan–Quinn    Schwarz    Godfrey AR 5, $\chi^2(125)$ (p-value)
1    −53.63    −53.00          −52.06     0.001
2    −54.29    −53.26          −51.72     0.261
3    −54.37    −52.93          −50.80     0.231
4    −54.46    −52.62          −49.89     0.758
5    −54.80    −52.56          −49.24     0.846
and Italy. The economic model is very simple, and should be regarded as an illustration rather than
a contribution to the ongoing economic debate. The analysis has been done using MALCOLM
2.4 (Mosconi, 1998), where all the techniques illustrated in this paper are implemented in a
user-friendly, menu-driven environment.
Let us consider the vector
$$Y_t = (\Delta p_t^I, \Delta p_t^D, \Delta e_{t+1}, i_t^I, i_t^D)'$$
where $\Delta p_t^I$ and $\Delta p_t^D$ are first differences of log Consumer Price Index and represent inflation rates
in Italy and Germany. The variable $\Delta e_{t+1}$ is the first difference of the log nominal exchange rate
between Italian Lira and German Mark (LIT/DM) and represents the rational expectation of future
exchange rates. Finally, $i_t^I$ and $i_t^D$ are Italian and German nominal interest rates on long-term
treasury bonds, given as annual rates divided by 4, to make them dimensionally consistent with
the other variables. As for the sources, prices are from EUROSTAT (except 1973–1975, where
prices are from the UN Monthly Bulletin of Statistics); note that, after October 1990, German prices
refer to unified Germany. Exchange rates are from the Bank of Italy (average quarterly exchange
rates). Interest rates are from IMF, International Financial Statistics. The data are available from
the Econometrics Journal website.
The data, which are shown in Figure 2, are quarterly, ranging from 1973.2 to 1995.4 (T = 91).
To model these data, based on prior knowledge of relevant historical events, we introduce two
breaks. The last observation of the first period is 1979.4, while the last observation of the second
period is 1992.2 (T1 = 27, T2 = 77; v1 = 0.297, v2 = 0.846; a = 0.154, b = 0.297). The first
break coincides with the creation of the EMS, but it is also supposed to catch the oil shock and
the modification of the US monetary policy. The second break corresponds to the exit of Italy
from the EMS, but also to the unification of Germany. The plot clearly shows the presence of
trends: the trend in inflation and interest rates in both countries in the second period is apparent,
but one might suspect trends in some of the variables also in the first and third period. This
suggests modelling the data using model Hl . In fact, the presence of trends in the variables may
be explained within model Hc only by random walks, whereas model Hl allows for interpreting
trends either as related to random walks or trend stationarity. Within model Hl , the nature of the
trend may be decided according to appropriate tests once the cointegration rank is determined in
a setting which is robust with respect to trend stationarity.
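The relative break points and the quantities $a$ and $b$ quoted above follow from the break dates by simple arithmetic. The sketch below assumes, as the reported numbers suggest, that $a$ and $b$ are the smallest and second smallest relative lengths of the three sample periods; the function name is illustrative.

```python
def relative_periods(T, breaks):
    """Return the relative break points v_j = T_j / T and the sorted relative period lengths."""
    bounds = [0] + list(breaks) + [T]
    v = [b / T for b in breaks]
    lengths = sorted((bounds[i + 1] - bounds[i]) / T for i in range(len(bounds) - 1))
    return v, lengths


# Sample 1973.2 to 1995.4 (T = 91) with last observations of the first two
# periods at 1979.4 (T_1 = 27) and 1992.2 (T_2 = 77).
v, lengths = relative_periods(91, [27, 77])
a, b = lengths[0], lengths[1]  # assumed reading: smallest and second smallest lengths
```

The third and longest period length, $1 - a - b$, is the one left out of the response surface arguments.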
The analysis to determine the maximum lag k is reported in Table 6. The information criteria
suggest different values of k, in which case it is common practice to prefer the Hannan–Quinn
criterion. Therefore, k = 2 has been selected, since it is also the first lag to give approximately
white noise residuals, according to the Godfrey test.
Jarque–Bera normality tests, reported in Table 7, show some problems with skewness in the
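The lag choice can be traced through the numbers of Table 6. The sketch below assumes that the criteria are negative values to be minimised; the minus signs are an assumption, chosen because this reading makes the Hannan–Quinn criterion select k = 2 as stated in the text. The dictionary layout and function name are illustrative.

```python
# Criterion values for k = 1, ..., 5, read from Table 6 with assumed negative signs.
criteria = {
    "Akaike": [-53.63, -54.29, -54.37, -54.46, -54.80],
    "Hannan-Quinn": [-53.00, -53.26, -52.93, -52.62, -52.56],
    "Schwarz": [-52.06, -51.72, -50.80, -49.89, -49.24],
}
godfrey_p_values = [0.001, 0.261, 0.231, 0.758, 0.846]


def best_lag(values):
    """Return the lag k (1-based) that minimises an information criterion."""
    return min(range(len(values)), key=lambda i: values[i]) + 1


selected = {name: best_lag(values) for name, values in criteria.items()}

# First lag whose Godfrey test does not reject white noise residuals at the 5% level.
first_white_noise_lag = next(k for k, p in enumerate(godfrey_p_values, start=1) if p > 0.05)
```

Under this reading the three criteria disagree, and the Hannan–Quinn choice coincides with the first lag passing the Godfrey test.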
[Figure 2. Plots of the data; the horizontal axes run from 1975 to 1995.]
Table 7. Jarque–Bera normality tests (p-values).

                     Skewness   Kurtosis   Sk + Kur
$\Delta p_t^I$       0.789      0.155      0.351
$\Delta p_t^D$       0.188      0.152      0.151
$\Delta e_{t+1}$     0.053      0.162      0.058
$i_t^I$              0.541      0.085      0.188
$i_t^D$              0.941      0.015      0.053
System               0.376      0.001      0.004
Table 8. Tests for cointegration rank.

Hypothesis    Test      p-value
r = 0         256.46    0.000
r ≤ 1         157.43    0.000
r ≤ 2         71.96     0.043
r ≤ 3         29.50     0.621
r ≤ 4         10.42     0.745
r =5
r =3
1.00
0.82
0.12 + 0.72i
1.00
0.12 0.72i
0.11 0.71i
0.67 + 0.12i
0.11 + 0.71i
0.67 0.12i
0.45 0.27i
$\Delta e_t$ equation and kurtosis in the $i_t^D$ equation, so that, at the system level, normality is rejected. Due
to the illustrative aim of this analysis we did not try to analyse these problems any further. Note,
however, that all residual-based misspecification tests, like Godfrey and Jarque–Bera, should be
modified in the present setting to take into account that the first $k$ residuals of each period are set
to zero by the presence of the dummies $D_{j,t-i}$, whose purpose is to condition upon the first $k$
observations of each period. This might partly explain the problems with kurtosis.
Coming to cointegration analysis, UIP implies that
$$i_t^I - \{i_t^D + E_t(\Delta e_{t+1})\} = 0 \tag{5.1}$$
so that for an Italian investor the return from investing in Italy equals the expected return from
investing in Germany. The interpretation of the relation in the context of a vector autoregressive
model is
$$i_t^I - (i_t^D + \Delta e_{t+1}) = \text{zero mean stationary}. \tag{5.2}$$
Therefore, the cointegration rank $r$ is expected to be at least equal to one, but of course it may be
higher, since we do not have theoretical reasons to exclude more stationarity in the data.
The tests for cointegration rank are reported in Table 8. The analysis supports $r = 3$, which
is consistent with our prior expectation. Therefore, we estimate the model with $r = 3$. Table
9 reports the five largest characteristic roots for both the unrestricted and the restricted models,
which seem to be consistent with the I(1) assumption $\operatorname{rank}(\alpha_\perp'\Gamma\beta_\perp) = (p - r)$.
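The choice $r = 3$ corresponds to the usual sequential reading of the rank tests: the hypotheses are examined from $r = 0$ upwards and the first one that is not rejected is retained. The sketch below applies this rule to the p-values of Table 8, assuming a 5% level; the function name and the level are illustrative choices.

```python
def select_rank(p_values, level=0.05):
    """Return the smallest r whose rank hypothesis is not rejected at the given level.

    p_values[r] is the p-value of the test of rank at most r; if every
    hypothesis is rejected the full rank p = len(p_values) is returned.
    """
    for r, p_value in enumerate(p_values):
        if p_value > level:
            return r
    return len(p_values)


# p-values for r = 0, r <= 1, ..., r <= 4 as reported in Table 8.
table8_p_values = [0.000, 0.000, 0.043, 0.621, 0.745]
selected_rank = select_rank(table8_p_values)
```

The hypothesis $r \le 2$ is a borderline rejection at the 5% level; at a 1% level the same rule would stop at $r = 2$.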
Before trying to set up identifying restrictions on the cointegration space, let us illustrate
some interesting tests on the deterministic components. The slopes of the deterministic trends for
the cointegrating relations are given by the elements of the $(3 \times 3)$-matrix $\gamma$, whose $i$th column
represents the trend coefficients of the $i$th stationary relation in the three different periods. A
suggested routine analysis consists in testing for the exclusion of the linear trend in the stationary
Table 10.

Hypothesis      $\chi^2(n)$   degrees of freedom   p-value
$H_{l,1}(3)$    7.98          3                    0.046
$H_{l,2}(3)$    9.66          3                    0.021
$H_{l,3}(3)$    27.05         3                    0.000
$H_{l,1}(3)$    12.87         2+3                  0.025
$H_{l,2}(3)$    10.48         2+3                  0.063
H I (3)         18.00         3+4+5                0.116
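components in each period. In our example, this is done using the matrices
$$G_1 = \begin{pmatrix}0 & 0 \\ 1 & 0 \\ 0 & 1\end{pmatrix}, \qquad G_2 = \begin{pmatrix}1 & 0 \\ 0 & 0 \\ 0 & 1\end{pmatrix}, \qquad G_3 = \begin{pmatrix}1 & 0 \\ 0 & 1 \\ 0 & 0\end{pmatrix}$$
and the restrictions
$$H_{l,i}(3):\quad \gamma = G_i\varphi.$$
Note that these hypotheses may be also written as
$$H_{l,i}(3):\quad \begin{pmatrix}\beta \\ \gamma\end{pmatrix} = \begin{pmatrix}I_p & 0 \\ 0 & G_i\end{pmatrix}\begin{pmatrix}\beta \\ \varphi\end{pmatrix}$$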
which is easily implemented in standard cointegration software. The results are given in Table 10,
which shows that a trend stationary component cannot be removed from any of the three periods.
However, the test takes on borderline values for Hl,1 (3) and Hl,2 (3), whereas the rejection is
Hl,i (3) : = Mi , = Mi + Mi
with
$$M_1 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
M_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
Referring to the discussion in Section 4.2, note that, in this case, M = G, so that the condition
span(G) $\subset$ span(M) of Theorem 4.2 is fulfilled. When Hl,1 (3) holds, then l M1 = 0 (i.e.
l = M1 ), so the slopes of both the stationary and the non-stationary components are zero in
the first period. The interpretation of the other restriction is similar. We perform these tests for
illustrative purposes only, since part of Hl,i (3), i.e. Hl,i (3), involving the stationary components,
has already been tested and rejected although not very strongly. As shown in Table 10, the joint
test also takes on borderline values, rejecting Hl,1 (3) and accepting Hl,2 (3). Strictly speaking,
this means that, according to the joint test, linear trends may be excluded from both the stationary
and non-stationary components in the second period.
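The p-values in Table 10 can be checked against the χ² distribution. The sketch below is only an illustration, not the authors' computation: it assumes 3 degrees of freedom for each single-period exclusion test (one slope coefficient removed from each of the r = 3 relations), 5 = 2 + 3 for the joint tests and 12 = 3 + 4 + 5 for the identifying restrictions. The helper `chi2_sf` is a hypothetical name and uses the closed-form tail probability for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k=0}^{df/2 - 1} h^k / k!
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{k=0}^{(df-3)/2} h^(k+1/2) / Gamma(k+3/2)
    tail = sum(h ** (k + 0.5) / math.gamma(k + 1.5) for k in range((df - 1) // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail

# (label, test statistic from Table 10, assumed degrees of freedom)
table10 = [
    ("trend excluded, period 1", 7.98, 3),
    ("trend excluded, period 2", 9.66, 3),
    ("trend excluded, period 3", 27.05, 3),
    ("joint test, period 1", 12.87, 2 + 3),
    ("joint test, period 2", 10.48, 2 + 3),
    ("identifying restrictions", 18.00, 3 + 4 + 5),
]

for label, stat, df in table10:
    print(f"{label:28s} stat={stat:6.2f} df={df:2d} p={chi2_sf(stat, df):.3f}")
```

Under these assumptions the computed tail probabilities agree with the reported p-values up to rounding of the test statistics.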
In order to (over)-identify the cointegration space, we suggest the following stationary linear
combinations
$$\begin{aligned}
z_{1t} &= i_t^I - (i_t^D + \Delta e_{t+1}) \\
z_{2t} &= i_t^D - \Delta p_t^D \\
z_{3t} &= (i_t^I - \Delta p_t^I) - (i_t^D - \Delta p_t^D).
\end{aligned}$$
The equations represent the UIP hypothesis, the German real interest rate, and the real interest
rate differential, respectively. Note that, if these linear combinations are stationary, then also
$$\begin{aligned}
z_{4t} &= z_{1t} - z_{3t} = (\Delta p_t^I - \Delta p_t^D) - \Delta e_{t+1} \\
z_{5t} &= z_{2t} + z_{3t} = i_t^I - \Delta p_t^I
\end{aligned}$$
are stationary, and could be used to find an alternative and equivalent basis of the cointegration
space. Identifying restrictions may be written as
H (3) : =
= (B1 b1 , B2 b2 , B3 b3 ).
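The statement that the fourth and fifth combinations lie in the space spanned by the first three can be verified with coefficient vectors. The sketch below is an illustration under an assumed variable ordering (Italian inflation, German inflation, exchange-rate change, Italian interest rate, German interest rate); the ordering and the helper name `combine` are assumptions made for the example, not taken from the text.

```python
# Assumed ordering (illustrative): (dp_I, dp_D, de, i_I, i_D)
z1 = [0, 0, -1, 1, -1]    # i_I - (i_D + de): UIP deviation
z2 = [0, -1, 0, 0, 1]     # i_D - dp_D: German real interest rate
z3 = [-1, 1, 0, 1, -1]    # (i_I - dp_I) - (i_D - dp_D): real rate differential

def combine(a, u, b, v):
    """Return the coefficient vector of a*u + b*v."""
    return [a * ui + b * vi for ui, vi in zip(u, v)]

z4 = combine(1, z1, -1, z3)   # z1 - z3
z5 = combine(1, z2, 1, z3)    # z2 + z3

print("z4 =", z4)   # coefficients of (dp_I - dp_D) - de
print("z5 =", z5)   # coefficients of i_I - dp_I
```

Because z4 and z5 are exact linear combinations of z1, z2 and z3, replacing two of the original relations by them changes the basis but not the cointegration space.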
In order to test the local trend stationarity of z 1t ,z 2t and z 3t , together with some plausible
restrictions on the deterministic part, we set up the following identifying restrictions:
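$$B_1 = \begin{pmatrix}
0 & 0 & 0 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1
\end{pmatrix}, \quad
B_2 = \begin{pmatrix}
0 & 0 \\ -1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1
\end{pmatrix}, \quad
B_3 = \begin{pmatrix}
-1 \\ 1 \\ 0 \\ 1 \\ -1 \\ 0 \\ 0 \\ 0
\end{pmatrix},$$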
which exclude the linear trend from $z_{1t}$ in the second period, from $z_{2t}$ in the first and second
periods, and from $z_{3t}$ in all periods. The degrees of freedom for testing $H^I(r)$ against $H(r)$
are given by $\sum_{j=1}^{r} (p + q - r - \dim B_j + 1)$, see Johansen (1996, Theorem 7.5). As shown
in Table 10, $H^I(3)$ cannot be rejected. Figure 3 represents $z_{1t}$, $z_{2t}$ and $z_{3t}$, together with their
deterministic components estimated under $H^I(3)$.
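The degrees-of-freedom count can be reproduced directly from the dimensions of the restriction matrices. The sketch below is illustrative only: it assumes p = 5 variables, q = 3 periods and r = 3 cointegrating relations, with B1, B2 and B3 having 3, 2 and 1 columns respectively, and `overidentifying_df` is a hypothetical helper name.

```python
def overidentifying_df(p, q, r, dims):
    """Per-relation terms (p + q - r - dim B_j + 1) for j = 1, ..., r."""
    if len(dims) != r:
        raise ValueError("one restriction matrix is needed per cointegrating relation")
    return [p + q - r - d + 1 for d in dims]

# Assumed dimensions for the empirical example (illustrative).
terms = overidentifying_df(p=5, q=3, r=3, dims=[3, 2, 1])
print(terms, sum(terms))
```

With these inputs the three terms are 3, 4 and 5, which matches the entry 3+4+5 in the degrees-of-freedom column of Table 10.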
This shows that in the period (79.1, 92.2), in which Italy belonged to the EMS, $z_{1t}$ is approximately zero on average, which is evidence in favour of the UIP in that period. Conversely,
the mean of $z_{1t}$ is negative and quite large in the first period, although trending towards zero.
This means that the interest rate in Italy was much lower than predicted by UIP plus rational
expectations. An interpretation could be that the extreme devaluation of the Italian lira in the 1970s
was unexpected, or in other words (5.2) should be replaced in the first period by
i tI (i tD + et+1 ) = zero mean stationary + t ,
where t represents a systematic bias in expectations. An alternative interpretation of the low
Italian interest rate could be related to the presence of a negative risk premium on Italy: if Italy
is perceived as less risky than Germany, then $i_t^I$ should be lower than $(i_t^D + \Delta e_{t+1})$. However,
this interpretation seems implausible in the 1970s.
[Figure 3. Three panels titled STATIONARY COMPONENT # 1, # 2 and # 3, showing $z_{1t}$, $z_{2t}$ and $z_{3t}$ with their estimated deterministic components; horizontal axis: years 73 to 95.]
A similar argument holds in the third period: the mean of $z_{1t}$ is negative at the beginning,
immediately after Italy left the EMS, but is upward trending, turning positive in 1994. However,
the deviation from the UIP would be unlikely to continue trending beyond the relatively short third
sample period. Thus, in this situation, forecasting should be done cautiously.
Acknowledgements
Comments from Maria Cristina Leali, Pieter Omtzigt, Anders Rahbek, Graziano Viganò and the
referees are gratefully acknowledged. This paper replaces a manuscript by Johansen and Nielsen
from 1993 with the title 'Asymptotics for cointegration rank tests in the presence of intervention
dummies - manual for the simulation program DisCo'.
REFERENCES
Anderson, T. W. (1951). Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals of Mathematical Statistics 22, 327-351. Correction in Annals of Statistics 8, 1400 (1980).
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proceedings of the Cambridge Philosophical Society 34, 33-40.
Chan, N. H., and C. Z. Wei (1988). Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics 16, 367-410.
Clements, M. P., and D. F. Hendry (1999). Forecasting Non-stationary Economic Time Series. MIT Press, Cambridge MA, USA.
Doornik, J. A. (1998). Approximations to the asymptotic distribution of cointegration tests. Journal of Economic Surveys 12, 573-593.
Doornik, J. A., and D. F. Hendry (1994). Modelling linear dynamic econometric systems. Scottish Journal of Political Economy 41, 1-33.
Doornik, J. A., D. F. Hendry, and B. Nielsen (1998). Inference in cointegrating models: UK M1 revisited. Journal of Economic Surveys 12, 533-572.
Hansen, H., and S. Johansen (1999). Some tests for parameter constancy in the cointegrated VAR. Econometrics Journal 2, 25-52.
Hansen, P. R. (2000). Structural changes in cointegrated processes. Ph.D. Thesis, University of California, San Diego.
Hendry, D. F. (1997). The econometrics of macroeconomic forecasting. Economic Journal 107, 1330-1357.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28, 321-77.
Inoue, A. (1999). Tests of cointegrating rank with a trend-break. Journal of Econometrics 90, 215-237.
Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12, 231-254.
Johansen, S. (1996). Likelihood-based inference in Cointegrated Vector Autoregressive Models. 2nd printing. Oxford University Press.
Johansen, S. (2000a). A small sample correction for test of hypotheses on the cointegrating vectors. To appear in Journal of Econometrics.
Johansen, S. (2000b). A Bartlett correction factor for tests on the cointegrating relations. Econometric Theory 16, 740-778.
Johansen, S. (2000c). A small sample correction of the test for cointegrating rank in the vector autoregressive model. EUI working paper ECO no. 2000/15.
Krolzig, H.-M., and D. F. Hendry (2000). Computer automation of general-to-specific model selection procedures. To appear in Journal of Economic Dynamics and Control.
Kuo, B. (1998). Test for partial parameters stability in regressions with I(1) processes. Journal of Econometrics 86, 337-368.
Mosconi, R. (1998). MALCOLM: The Theory and Practice of Cointegration Analysis in RATS. Venice: Ca Foscarina. Available online at http://www.greta.it/malcolm.
Nielsen, B. (1997). Bartlett correction of the unit root test in autoregressive models. Biometrika 84, 500-504.
Nielsen, B., and A. Rahbek (2000). Similarity issues in cointegration models. Oxford Bulletin of Economics and Statistics 6, 5-22.
Perron, P. (1989). The great crash, the oil price shock, and the unit root hypothesis. Econometrica 57, 1361-1401. Erratum (1993) Econometrica 61, 248-249.
Perron, P. (1990). Testing for a unit root in a time series with a changing mean. Journal of Business & Economic Statistics 8, 153-162. Corrections and Extensions by Perron, P., and Vogelsang, T., 1992, Journal of Business & Economic Statistics 10, 467-470.
Rappoport, P., and L. Reichlin (1989). Segmented trends and non-stationary time series. Economic Journal 99 supplement, 168-177.
Seo, B. (1998). Tests for structural change in cointegrated systems. Econometric Theory 14, 222-259.
E(X t ) = l, j ,
and taking expectation in (2.6) we obtain, since the dummies are zero,
l, j = { c, j + l, j (t 1) + j t} + j +
k1
i l, j
i=1
k1
i=1 i
l, j = ( c, j l, j ) + j ,
or
j
j
=
+
l, j + j = 0,
l, j
c,t
.
= I p+r .
T
s=1
Us Vs
T
1
Vs Vs
Vt .
s=1
We work under the assumption Hl (r ): = G, see Section 4.1, and define the extended parameter
, t E G) enters. Since X exhibits a linear
= ( , ) . In the reduced rank problem the process (X t1
t
t
trend according to Theorem 2.1 it is convenient to detrend the levels of the process using the last component
, t E G) by {(X
of the vector, t E t G, and replace (X t1
t1 |t G E t ) , t E t G}. Thus, define
t
"
Ip
QT =
T
t=1 X t1 E t Gt
so that
Ig
X t1
t G Et
#1
T
E E Gt
t
G
t
t
t=1
,
# "
X t1 |t G E t
t G Et
= QT
and hence
X t1
t G Et
=
X t1
t G Et
=
Q T
X t1 |t G E t
t G Et
.
X t ,
X t1 |t G E t
t G Et
Ft
(A.1)
S00
S10
S01
S11
=
T
1
R0,t
R0,t
.
R1,t
T t=1 R1,t
Note that S11 is block diagonal because the residuals (X t1 |t G E t ) and t G E t are orthogonal. The squared
1
S01 vi .
S11 i vi = S10 S00
The matrix of the r first eigenvectors is denoted 0 and the parameters and are estimated from the
equation
(A.2)
= 0 .
Q T = Q T
A corresponding asymptotic relation can be established for the parameters using the representation in
Theorem 2.1
1
T
T
0
( X t1 + t E t G)E t Gt
t G E t E t Gt
( ) = ,
t=1
t=1
de f
= { , O P (T 1 )} ( , 0) = ( 0 ) ,
since X t + t E t G is stationary.
(A.3)
In order to formulate the asymptotic results for the residuals in model Hl(r) some notation is needed.
Below we will choose an $\{(p - r) \times n\}$-matrix with full column rank. We can then obtain three independent
standard Brownian motions of dimensions $r$, $n$, $(p - r - n)$, respectively,
int(T
u)
( 1 )1/2 1 T 1/2
)1/2 T 1/2
( )1/2 (
t=1
int(T
u)
t=1
int(T
u)
Further define
Wu =
W,u
W ,u
t W,u ,
D
t W ,u .
t=1
t Vu ,
Fu =
Wu
e .
ueu
u
To motivate the choice of consider the representation of X given in Theorem 2.1 and use the identity
Ig = G G + GG to find
X = C
t
i=T j1 +k+1
G G E + t GG E + O (1),
i + t
t
l t
P
l
(A.4)
for T j1 +k < t T j . The residuals are found by correcting for t G E t and the variables in Ft . Since t G E t
X |t G E , F ) has a linear trend G G (t E |G E , E ).
is eliminated by regression the residual (
t
t
t
t
t
t
l
G .
It turns out that the limit distribution depends on the row space of the {( p r ) (q g)}-matrix
l
If it has rank n, say, its row space is spanned by a {(q g) n}-matrix of full rank. Note that also spans
G .
the row space of
In order to utilize the Brownian motion W in the asymptotic analysis of (A.4) we will transform the
G . By Assumption 1 the matrices and have full rank. Choose a
column space of
)1/2 ( )1/2 T 1/2
(
C
int(T
u)
t
W ,u .
t=1
)1/2
(
B2 =
0
Ig
.
1
2
Lemma A.1. Asymptotic properties of BT R1 . Suppose hypothesis Hl (r ) and Assumption 1 are satisfied.
Define the matrix
I pr n
N = 0n( pr n)
0g( pr n)
0( pr n)n
0nn
0gn
0( pr n)q
G {Iq Z G(G Z G)1 G } ,
G
(A.5)
where $Z = \int_0^1 (ue_u|e_u)(ue_u|e_u)'\,du$. Then as $T \to \infty$
W ,u
1
D
BT R1,int(T u)
G {Iq Z G(G Z G)1 G }ueu
T
G ueu
eu = N Fu .
(A.6)
t
t
has no trend and behaves asymptotically like a random walk corrected for E t so
D
)1/2 B R
T 1/2 (
1 int(T u) (W ,u |eu ).
(A.7)
)1/2 (X
It also follows from (A.4) that (
t1 |t G E t , Ft ) behaves asymptotically
(A.8)
Finally we find
P
(A.9)
For each period the processes X t and X t1 + t G E t can be given stationary initial distributions,
see Theorem 2.1. Apart from a changing level the stationary distributions are identical. Thus define
"00 "0
X t
X t1 , . . . , X tk+1 .
= Var
"0 "
X t1 + t G E t
Lemma A.2. Asymptotic behaviour of $S_{ij}$. Suppose hypothesis Hl(r) and Assumption 1 are satisfied. Then,
as $T \to \infty$,
"00 "0
P
S00
S01 0
.
(A.10)
"0 "
0 S10 0 S11 0
This asymptotic covariance matrix satisfies the identity
1
1
1
1
)1 .
"00
"00
"0 ("0 "00
"0 )1 "0 "00
= (
(A.11)
T 1 BT S11 BT N
1
0
Fu Fu du N ,
1
( 1 )1/2 1
D
d Vu
S1 BT
( )1/2 ( )1/2
Fu N ,
dW
u
0
)1/2 ( )1/2
(
(A.12)
(A.13)
(A.14)
The proof of equations (A.12)-(A.14) follows from Lemma A.1. Equation (A.10) follows by noting
that for $T_{j-1} + k < t \le T_j$, representation (2.8) implies
X t = Ct + Y j,t + l E t ,
X t = (Y j,t + c E t + tl E t ) = Y j,t + c E t t G E t ,
since the dummies are all zero. The distribution of Y j,t is the same in all periods. Consequently, the
processes X t1 + t G E t and X t are stationary and the conditional variance, (A.10), is the same in
all periods. Further,
( X t1
t G E t , Ft ) = ( X t1 + t G E t
t G E t , Ft )
and hence the limit in (A.10) is expressed in terms of the variance of X t1 + t G E t . Finally, equation
(A.11) follows from Lemma 10.1 (J).
The joint convergence of (A.12), (A.13) follows from Chan and Wei (1988).
I pr q
0
0
0
0
=
I pr q
0
0
I pr q
0
0
0
0
Iq
implies that
1
D
)1/2 S B
(
M
dWu Fu N .
1 T
0
As in the proof of Theorem 3.1 the limit distribution of L R{Hlc (r )|Hlc ( p)} is
tr
M
1
0
dWu Fu N
0
N Fu Fu N du
1
0
N Fu dWu M
which reduces to (3.8) since M is orthogonal. Noting that is regular the diagonal matrices diag(I pr q , )
cancel.
(3) If $n < \min(q, p - r)$ we arrive at expression (3.8) as in case (2).
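The reduction to (3.8) in case (2) rests on the fact that the trace is unchanged when a matrix is pre- and post-multiplied by an orthogonal matrix and its transpose. The sketch below checks this numerically for a 2 × 2 rotation; the matrix A is an arbitrary stand-in for the inner stochastic-integral expression, and all names and values are assumptions chosen for illustration.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

theta = 0.7  # arbitrary rotation angle (hypothetical value)
M = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]   # orthogonal: M'M = I
A = [[2.0, 0.5],
     [0.5, 1.0]]                           # stand-in for the inner matrix

rotated = matmul(matmul(M, A), transpose(M))   # M A M'
print(trace(rotated), trace(A))
```

The two traces coincide up to floating-point error because tr(M A M') = tr(A M'M) = tr(A) whenever M'M = I.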
The proof of Theorem 4.1 uses the asymptotic properties of 0 , see (A.2). To discuss these it is convenient
to apply the normalization
0 = 0 ( 0 0 )1 .
Since the matrix ( 0 , BT ) has full rank and orthogonal blocks, 0 BT = 0, it follows that I p+g =
0 0 + BT B . Consequently,
T
0 = 0 + BT B T 0 = 0 + BT UT ,
where UT = B T 0 .
Correspondingly, define = 0 0 so 0 = 0 .
Lemma A.3. Asymptotic behaviour of Si j . Suppose hypothesis Hl (r ) and Assumption 1 are satisfied. Then
in model H (r ) are consistent and
the estimators 0 , ,
and
l
0 S10 = 0 S10 + oP (T 1/2 ).
N
1
0
Fu Fu du N
1
N
1
0
Fu (d Vu ) ( 1 )1/2 ,
where F and N are given by (3.3) and (A.5). Note, that V and F are independent since V and W are
independent by construction.
In Lemma A.5 below the test for a simple hypothesis on is discussed. Both of the parameters and
0
are used in the proof. Thus recall the above definition of the residual product moment matrices Si j where
the levels of the process are detrended. Correspondingly, let Si j be the residual product moment matrices
where the levels are not detrended, hence,
S00 S01
Ip
=
0
S10 S11
0
QT
S00
S10
S01
S11
Ip
0
0
Q T
.
Lemma A.4. Detrending of levels residuals. Suppose hypothesis Hl (r ) and Assumption 1 are satisfied.
Then
S10 = 0 S10 ,
S11 = 0 S11 0 ,
(A.15)
(A.16)
The proof. The two identities in (A.15) follow from Q T = 0 , see (A.2). For (A.16) combine approximation (A.3) and Lemma A.2. For instance,
Lemma A.5. Test for simple hypothesis on in model Hl (r ). Suppose Assumption 1 and the hypothesis
are satisfied. Then
1
1
1
1
D
d Vu Fu N N
Fu Fu du N
N
Fu d Vu .
L R{ |Hl (r )} tr
0
0
0
This variable is $\chi^2\{r(p - r + g)\}$-distributed since V and F are independent.
The proof relies on an expansion of the likelihood functions around . Using Lemma A.4 it is possible
to replace , and Si j with 0 , 0 and Si j without changing the asymptotic results. The remaining
arguments of the proof of Lemma 13.8 (J) can then be followed using Lemma A.3. The number of degrees of
freedom is given by the product of dim V = $r$ and dim(N F) = $p - r + g$.
M )1 M Z }.
tr(Z Z ) tr{Z M(M M)1 M Z } = tr{Z M (M
Since M is a function of W which is independent of V this variable is $\chi^2$ and independent of W. The number of degrees
of freedom is the product of dim V = $r$ and dim(N F) = $(q - g)$.
(r )
For the asymptotic analysis of Hl(r) the assumption span(G) $\subset$ span(M) is important. Had this not been
satisfied nuisance parameters would appear. The easiest example of that phenomenon is seen in a model
without breaks, q = 1, one lag, k = 1, and a linear trend for the cointegrating relation, G = I1 = 1,
X t = ( X t1 + t) + + t .
The test for absence of the common deterministic slope, = or M = 0, is burdened with nuisance
parameters.
The likelihood ratio test statistic for Hl (r ) in Hl (r ) is based on the sample canonical correlations
given in (4.1). Due to the invariance properties of canonical correlations it is equivalent to consider canonical
correlations of the residuals
E
X t1 |t G E t , M
E
Ft .
t G E t |M
(R0,t , R1,t ) = X t ,
(A.17)
t
E
M
t
Here Ft indicates that we have only corrected for M E t rather than for E t . Let Si j denote the corresponding
sample product moment matrices.
S00
S10
S01
S11
1
S00 + s01 s11
s10
=
S10
s10
$
S01
S11
0
%
s01
,
0
s11
The expression for S00 follows because the regressor set Ft is smaller than Ft . Further, the first two
components of R1,t are orthogonal to the last component giving the orthogonal structure of S11 and can
be represented as
X t1 |t G E t
X t1 |t G E t
M
E
,
F
=
t t
Ft = R1,t ,
t G Et
t G Et
see (A.1), implying that the upper left blocks of S11 , S10 are S11 , S10 . The asymptotic properties of Si j
are given in Lemma A.2 and we only need to describe si j .
Lemma A.6. Asymptotics for $s_{ij}$. Suppose hypothesis Hl(r), Assumption 1 and the condition span(G) $\subset$
span(M) are satisfied. Define the $(q - m)$-dimensional function $f_u = M_\perp'(e_u | M'e_u)$. For $T \to \infty$ the
following results hold jointly with those in Lemma A.2
T 1/2 s01 = OP (1),
1
P
f u f u du,
s11
(A.18)
(A.19)
)1/2
T 1/2 s10 (
1
0
f u dWu .
(A.20)
)1 ,
(
(A.21)
E t E t = O(T ),
t=1
T
t=1
E t Z t = OP (T 1/2 ),
T
Z t1 t = OP (T 1/2 ).
(A.22)
t=1
In order to use those results the differenced process needs to be analysed carefully. Theorem 2.1 and Hl
imply the representation
X t = Ct + Yt E t + l E t ,
where
l = C M + (C I p ) G .
(A.23)
(A.19): For the asymptotic analysis of s11 the regression on lagged differences can be ignored because
of (A.22), (A.23), hence
s11 = T 1
T
t=1
E
M E , D
( M
t
t
j,ti )( M E t M E t , D j,ti ) + oP (1).
(A.24)
For asymptotic purposes the regression on the dummies also can be ignored and (A.19) follows.
(A.18): For the analysis of s10 note that (A.23) implies that
s10 = T 1
T
E )( C + Y E | M E , C
(M
t
t
t t
t
ti + Yti E ti , D j,ti ) .
(A.25)
t=1
T
E )(
M E , D
(M
t
t
j,ti ) + oP (1),
t
(A.26)
t=1
(A.21): Since s10 = OP (T 1/2 ) while S00 , s11 are of order one, then S00 = S00 + oP (1). Using that
(S00 )1 (S00 )1 S01 { S10 (S00 )1 S01 }1 S10 (S00 )1
1
1
1
1
= S00
S00
S01 0 ( 0 S10 S00
S01 0 )1 0 S10 S00
+ oP (1)
)1 .
which by (A.10), (A.11) converges to (
= 0. Result (A.23) then fails and s is of
When l M = 0 then span(G) span(M) and l M
10
order one. In particular, s01 = l s11 + oP (1) converges to a non-zero value.
1
1 1
1
N
N
N
N
F
F
F
F
u
u
u
u
dWu
du
dWu ,
tr
fu
fu
fu
fu
0
0
0
where f is defined in Lemma A.6. The proof is the same as that of Theorem 3.1 with BT replaced by
0
BT
.
BT =
0
T 1/2 Iqm
The same type of argument shows that the likelihood ratio test statistic for Hl (r ) against Hl ( p) converges
in distribution to
1
1
1
1
tr
dWu Fu N
N Fu Fu N du
N Fu dWu .
0
0
0
Since F and f are orthogonal we find
L R{Hl
1
1
1
1
D
tr
dWu f u
f u f u du
f u dWu
0
0
0
L R{Hl
(r )|Hl (r )} = L R{Hl
From the proof of Theorem 4.1 we have that LR{Hl(r)|Hl(r)} is asymptotically $\chi^2$ and independent of W.
The asymptotic distribution of LR{Hl(r)|Hl(r)} is therefore a sum of two independent $\chi^2$-distributions.
If span(G) $\not\subset$ span(M) then (A.21) fails and the above arguments do not hold. It follows that nuisance
parameters are involved in the asymptotic distributions.