
Econometrics Journal (2000), volume 3, pp. 216-249.

Cointegration analysis in the presence of structural breaks in the deterministic trend

Søren Johansen, Rocco Mosconi, Bent Nielsen

Economics Department, European University Institute, Via Roccettini 9, 50016 San Domenico di Fiesole, Italy
E-mail: sjohanse@iue.it
http://www.iue.it/Personal/Johansen

Dipartimento di Economia e Produzione, Politecnico di Milano, Piazza L. da Vinci 32, 20133 Milano, Italy
E-mail: rocco.mosconi@polimi.it
http://www.ecopro.polimi.it/home/Rocco.Mosconi

Department of Economics, University of Oxford & Nuffield College, Oxford OX1 1NF, UK
E-mail: bent.nielsen@nuf.ox.ac.uk
http://www.nuff.ox.ac.uk/users/nielsen

Summary When analysing macroeconomic data it is often of relevance to allow for structural breaks in the statistical analysis. In particular, cointegration analysis in the presence of
structural breaks could be of interest. We propose a cointegration model with piecewise linear
trend and known break points. Within this model it is possible to test cointegration rank,
restrictions on the cointegrating vector as well as restrictions on the slopes of the broken linear
trend.

Keywords: Break points, Cointegration, Common trend, Deterministic trend, Piecewise linear trend, Stochastic trend, Structural breaks, Vector autoregressive model.

1. Introduction
In the analysis of economic time series it is often necessary to allow breaks in the deterministic
components. When allowing for breaks the timing is important: the break dates could either be known in
advance or be located by an algorithm that searches for breaks. While both issues are discussed
in the literature and mainly in a univariate setting, this paper focuses on cointegration analysis in
a multivariate setting in the presence of breaks at known points in time. The suggested approach
is a slight generalization of the likelihood-based cointegration analysis in vector autoregressive
models suggested by Johansen (1988, 1996). There are only a few conceptual differences and
the major issue for the practitioner is that new asymptotic tables are needed. This paper concerns
only asymptotic analysis. Finite sample properties of the rank test and tests for restrictions on the
cointegrating vector have been discussed by Johansen (2000a,b,c) in the situation of no breaks
using an analytic correction factor to the likelihood ratio tests.
© Royal Economic Society 2000. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA, 02148, USA.


Structural breaks have been discussed intensively in the context of univariate autoregressive
time series with a unit root. An important finding is that a time series given by stationary
fluctuations around a broken constant level is better described by a random walk than a stationary
time series, see Perron (1989, 1990) and Rappoport and Reichlin (1989). Addressing this issue,
these authors suggested various univariate models allowing for breaks in the deterministic term.
In particular, Perron (1989) suggested three models: (A) crash model with change in intercept
but unaffected slope of the linear trend, (B) changing growth model with no change in intercept
but changing slope of trend function, and (C) where both intercept and slope are changed at
the time of the break. The model presented here generalizes model (C) and allows for testing
hypotheses corresponding to model (A).
A related concern in econometric models is parameter stability which is investigated by
methods related to those for known break points. These methods typically allow for structural
breaks at unknown times, and have been discussed, for instance, in the special issues of the Journal
of Business & Economic Statistics, volume 10, 1990 and the Journal of Econometrics, volume
70, 1996. More recently a test of this type has been suggested by Inoue (1999) in connection with
cointegration testing in a vector autoregressive setting. While those authors and also this paper
are concerned with breaks in the deterministic terms, some procedures for analysing breaks in the
cointegration parameter have been presented by Kuo (1998), Seo (1998), Hansen and Johansen
(1999) and Hansen (2000).
The approach taken here is to analyse cointegration in a Gaussian vector autoregressive model
with a broken linear trend and known break points. Likelihood analysis of cointegration is then
given in terms of reduced rank regression, a combination of least squares regression analysis and
canonical correlation analysis. The model and the rank hypothesis are discussed in Section 2.
Section 3 presents tests for cointegration rank for models with broken trend and broken level.
The asymptotic distributions have been simulated and the results described by response surface
analysis. Next, in Section 4 various tests for linear restrictions on the slopes for the broken trend
are given. Most of these tests are asymptotically $\chi^2$-distributed. In Section 5 the suggested
procedures are illustrated using data for inflation rates, interest rates and exchange rate for Italy
and Germany.
Throughout the paper the following notational convention is used. For a matrix $a$ with full column rank let $\bar a = a(a'a)^{-1}$. Further, let $a_\perp$ satisfy $a_\perp' a = 0$ and have the property that $(a, a_\perp)$ has full rank.
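
The notational convention can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `bar` and `perp` and the example matrix are illustrative choices and do not come from the paper. The orthogonal complement is taken from a singular value decomposition, which is one of several valid ways to construct it.

```python
import numpy as np

def bar(a):
    """Return a (a'a)^{-1}; then bar(a)' a is the identity matrix."""
    return a @ np.linalg.inv(a.T @ a)

def perp(a):
    """Return a basis for the orthogonal complement of the column space of a.

    Assumes a has full column rank r. In a full SVD the last p - r left
    singular vectors are orthogonal to every column of a.
    """
    r = a.shape[1]
    u, _, _ = np.linalg.svd(a, full_matrices=True)
    return u[:, r:]

# Illustrative 3 x 2 matrix with full column rank (hypothetical numbers).
a = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, -1.0]])
a_bar = bar(a)
a_perp = perp(a)
```

Any other basis of the same orthogonal complement would serve equally well, since only the space spanned by the columns matters in the results that follow.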

2. The Model
The cointegrated vector autoregressive model with no breaks is analysed in detail in Johansen
(1996). A simple example of the basic model is
$$\Delta X_t = \Pi X_{t-1} + \Pi_1 t + \mu + \varepsilon_t, \tag{2.1}$$

where we have left out more lags. Cointegration will appear if $\Pi$ has reduced rank, in which case we can write $\Pi = \alpha\beta'$. In that case the process generated by (2.1) has a quadratic trend, which is eliminated if we assume $\Pi_1 = \alpha\gamma'$, as will be done throughout this paper. Note that the reduced rank involves the combined matrix $(\Pi, \Pi_1) = \alpha(\beta', \gamma')$. We therefore consider

$$\Delta X_t = \alpha(\beta' X_{t-1} + \gamma' t) + \mu + \varepsilon_t \tag{2.2}$$


as the starting point for this paper. The role of the deterministic terms, see Johansen (1996, Ch. 5), is briefly summarized as follows. The model defined by (2.2) will generate a process with a linear trend and is therefore called $H_l(r)$. Even $\beta' X_t$ has a trend and is hence trend stationary. A number of sub-models are defined by successively restricting the parameters $\gamma$ and $\mu$. If $\gamma = 0$ then

$$\Delta X_t = \alpha\beta' X_{t-1} + \mu + \varepsilon_t, \tag{2.3}$$

and the process still has a linear trend, but $\beta' X_t$ does not and becomes stationary with a constant level. This model is denoted $H_{lc}(r)$. Finally, if $\gamma = 0$ and also $\mu = \alpha\rho'$ then

$$\Delta X_t = \alpha(\beta' X_{t-1} + \rho') + \varepsilon_t, \tag{2.4}$$

and the process has no linear trend in any direction. This model is denoted $H_c(r)$.
In the rest of this section we formulate a model for the observed time series $X_t$, $t = 1, \ldots, T$, which is divided into sub-samples according to the position of the break points. For each sub-sample a vector autoregressive model is chosen such that the parameters of the stochastic components are the same for all sub-samples, while the deterministic trend may change between sub-samples. In that case the process can be given rather simple representations and interpretations in each period, and the statistical analysis is akin to that of the usual vector autoregressive models.

2.1. Formulation of model and rank hypothesis


The model allows for any pre-specified number of sample periods, $q$ say, of length $T_j - T_{j-1}$ for $j = 1, \ldots, q$ and $0 = T_0 < T_1 < T_2 < \cdots < T_q = T$. It follows that the last observation in the $j$th sample is at time $T_j$, while $T_j + 1$ is the time of the first observation in sample period number $(j + 1)$. A vector autoregressive model of order $k$ is considered. In analogy with the usual models without structural breaks, the model is formulated conditionally on the first $k$ observations of each sub-sample, $X_{T_{j-1}+1}, \ldots, X_{T_{j-1}+k}$, and it is given by the equations

$$\Delta X_t = (\Pi, \Pi_j)\begin{pmatrix} X_{t-1} \\ t \end{pmatrix} + \mu_j + \sum_{i=1}^{k-1} \Gamma_i \Delta X_{t-i} + \varepsilon_t \tag{2.5}$$

for $j = 1, \ldots, q$ and $T_{j-1} + k < t \le T_j$. The innovations are assumed to be independently, identically normally distributed with mean zero and variance $\Omega$. The parameters vary freely, so $\Pi, \Gamma_i, \Omega$, which relate to the stochastic component of the time series, are the same in all sub-samples and of dimension $(p \times p)$ with $\Omega$ being symmetric and positive definite, while the $p$-vectors $\Pi_j, \mu_j$ relate to the deterministic component and could be different in different sample periods.
A cointegration hypothesis can be formulated in terms of the rank of either $\Pi$ alone or in conjunction with $\Pi_1, \ldots, \Pi_q$, as we saw in the discussion of example (2.1). The latter gives nicer interpretations and some advantageous similarity properties and is given by

$$H_l(r): \quad \operatorname{rank}(\Pi, \Pi_1, \ldots, \Pi_q) \le r \quad \text{or} \quad (\Pi, \Pi_1, \ldots, \Pi_q) = \alpha \begin{pmatrix} \beta \\ \gamma_1 \\ \vdots \\ \gamma_q \end{pmatrix}',$$

where the parameters vary freely so $\alpha, \beta$ are of dimension $(p \times r)$ and $\gamma_j$ is of dimension $(1 \times r)$. The notation $H_l$ indicates that in each sub-sample the deterministic component is linear both for non-stationary and cointegrating relations. This feature will become evident from the Granger representation below. A related hypothesis arises in the case of no linear trend but a broken constant level as in example (2.4),

$$H_c(r): \quad \operatorname{rank}(\Pi, \mu_1, \ldots, \mu_q) \le r \quad \text{and} \quad \Pi_1, \ldots, \Pi_q = 0.$$

As an alternative to the models $H_l$ and $H_c$ a rank hypothesis could be formulated for $\Pi$ alone as in example (2.3),

$$H_{lc}(r): \quad \operatorname{rank} \Pi \le r \quad \text{and} \quad \Pi_1, \ldots, \Pi_q = 0.$$

The hypotheses are nested as $H_c(r) \subset H_{lc}(r) \subset H_l(r)$.



For the purpose of determining the cointegration rank the hypothesis $H_{lc}$ is less attractive than $H_c$ and $H_l$ for two reasons. First, as indicated by the sub-index, the hypothesis $H_{lc}$ implies that the non-stationary relations have a broken linear trend while the cointegrating relations have broken constant levels. Thus under the hypothesis $H_{lc}$ the deterministic trend of a component depends on the cointegrating properties, whereas in testing $H_l$ the deterministic behaviour of the process is the same regardless of the cointegrating properties, namely a linear trend in all directions.
Secondly, in Section 3.3 it will be demonstrated that the asymptotic analysis is heavily burdened
with nuisance parameters. These issues are discussed in further detail by Nielsen and Rahbek
(2000).

2.2. Another formulation


The above description involves writing $q$ model equations of type (2.5). In order to write these as one equation, which is more conformable with standard econometric computer packages, some dummy variables are introduced. Let

$$D_{j,t} = \begin{cases} 1 & \text{for } t = T_{j-1}, \\ 0 & \text{otherwise,} \end{cases} \qquad \text{for } j = 2, \ldots, q;\ t = \ldots, -1, 0, 1, \ldots,$$

so $D_{j,t-i}$ is an indicator function for the $i$th observation in the $j$th period; that is, $D_{j,t-i} = 1$ if $t = T_{j-1} + i$. Further,

$$E_{j,t} = \sum_{i=k+1}^{T_j - T_{j-1}} D_{j,t-i} = \begin{cases} 1 & \text{for } T_{j-1} + k + 1 \le t \le T_j, \\ 0 & \text{otherwise,} \end{cases}$$

is the effective sample of the $j$th period. It is convenient to gather the sample dummies and the drift parameters for the different sample periods

$$E_t = (E_{1,t}, \ldots, E_{q,t})', \qquad \mu = (\mu_1, \ldots, \mu_q), \qquad \gamma = (\gamma_1', \ldots, \gamma_q')',$$

of dimensions $(q \times 1)$, $(p \times q)$, $(q \times r)$, respectively. The model equation becomes

$$\Delta X_t = \alpha \begin{pmatrix} \beta \\ \gamma \end{pmatrix}' \begin{pmatrix} X_{t-1} \\ t E_t \end{pmatrix} + \mu E_t + \sum_{i=1}^{k-1} \Gamma_i \Delta X_{t-i} + \sum_{i=1}^{k} \sum_{j=2}^{q} \kappa_{j,i} D_{j,t-i} + \varepsilon_t, \tag{2.6}$$

where the dummy parameters $\kappa_{j,i}$ are $p$-vectors and the observations $X_1, \ldots, X_k$ are held fixed as initial observations. Note that the effect of the dummy variables $D_{j,t-1}, \ldots, D_{j,t-k}$ corresponding to the observations $X_{T_{j-1}+1}, \ldots, X_{T_{j-1}+k}$ is to render the corresponding residuals zero, thereby essentially eliminating the corresponding factors from the likelihood function, and hence producing the conditional likelihood function given the initial values in each period.
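
As an illustration of how the two kinds of dummies line up, here is a minimal sketch in plain Python. The function names, the break dates and the lag length are hypothetical choices made for the example; the only thing taken from the text is the definition of the impulse dummy and of the effective-sample dummy. For convenience the impulse dummy is also evaluated for the first period, where it flags $t = T_0 = 0$.

```python
def impulse_dummy(breaks, j, t):
    """D_{j,t}: 1 if t equals T_{j-1}, else 0. breaks = [T_0, T_1, ..., T_q]."""
    return 1 if t == breaks[j - 1] else 0

def effective_sample_dummy(breaks, j, t, k):
    """E_{j,t}: 1 if T_{j-1} + k + 1 <= t <= T_j, else 0."""
    return 1 if breaks[j - 1] + k + 1 <= t <= breaks[j] else 0

# Hypothetical setting: T = 10 observations, one break after T_1 = 6, k = 2 lags.
breaks = [0, 6, 10]
k = 2
times = range(1, breaks[-1] + 1)

E1 = [effective_sample_dummy(breaks, 1, t, k) for t in times]
E2 = [effective_sample_dummy(breaks, 2, t, k) for t in times]

# D_{2,t-i}, i = 1, ..., k, flags the first k observations of the second period.
D2_lags = {i: [impulse_dummy(breaks, 2, t - i) for t in times] for i in range(1, k + 1)}
```

In this example the first period contributes $t = 3, \ldots, 6$ to the effective sample and the second period contributes $t = 9, 10$, while the lagged impulse dummies pick out $t = 7$ and $t = 8$, the initial values of the second period.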

2.3. Interpretation
A process satisfying the hypothesis $H_l(r)$ can be interpreted using Granger's representation theorem. That is, linear combinations of the process, given by $\beta$, cointegrate while the process exhibits a linear trend in each of the sub-samples. As usual it is necessary to assume that the process is actually an I(1) process.

Assumption 1. Assume that the roots of the characteristic polynomial,

$$A(z) = (1 - z) I_p - \alpha\beta' z - \sum_{i=1}^{k-1} \Gamma_i (1 - z) z^i,$$

are outside the complex unit circle or at 1 and that the matrices $\alpha$ and $\beta$ have full column rank $r$. Further, define $\Gamma = I_p - \sum_{i=1}^{k-1} \Gamma_i$ and assume full rank of the matrix

$$\alpha_\perp' \Gamma \beta_\perp. \tag{2.7}$$

Theorem 4.2 of Johansen (1996) can be generalized as follows.


Theorem 2.1. Granger's Representation Theorem. Suppose Assumption 1 is satisfied. Then, for each period the initial values $X_{T_{j-1}+1}, \ldots, X_{T_{j-1}+k}$ can be given a distribution such that $\beta' X_t + \gamma_j' t$ and $\Delta X_t$ are stationary processes. In particular,

$$X_t = C \sum_{i=T_{j-1}+k+1}^{t} \varepsilon_i + Y_{j,t} + \tau_{c,j} + \tau_{l,j} t \tag{2.8}$$

for $j = 1, \ldots, q$, $T_{j-1} + k < t \le T_j$ and $C = \beta_\perp (\alpha_\perp' \Gamma \beta_\perp)^{-1} \alpha_\perp'$. The processes $Y_{j,t}$ are stationary, identically distributed and have zero expectation. The slope parameters, $\tau_{l,j}$, can be expressed as

$$\tau_{l,j} = C \mu_j + (C\Gamma - I_p) \bar\beta \gamma_j',$$

whereas the level coefficient $\tau_{c,j}$ depends on initial values in such a way that $\beta' \tau_{c,j}$ is an identified function of the parameters

$$\beta' \tau_{c,j} = \bar\alpha' (\Gamma C - I_p) \mu_j + \bar\alpha' (\Gamma C \Gamma - \Gamma) \bar\beta \gamma_j' - \gamma_j'.$$

For each sample period the process $\beta' X_t + t \gamma' E_t$ is stationary and hence it has no trending behaviour. The common stochastic trends are $\alpha_\perp' \sum_{s=T_{j-1}+k+1}^{t} \varepsilon_s$ and the slopes of the common deterministic trends are $\alpha_\perp' \Gamma \tau_{l,j} = \alpha_\perp' \mu_j$.

In some situations it is convenient to have linear combinations of the data representing the non-stationary trends. Examples are $\beta_\perp' X_t$ and $\alpha_\perp' \Gamma X_t$, both of which are combinations that do not cointegrate.
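
The two slope identities implied by the representation, $\beta'\tau_{l,j} = -\gamma_j'$ and $\alpha_\perp'\Gamma\tau_{l,j} = \alpha_\perp'\mu_j$, can be verified numerically. The sketch below assumes NumPy and uses made-up parameter values with $p = 3$, $r = 1$ and $k = 1$ (so that $\Gamma = I_p$); the helper names are illustrative and nothing here reproduces estimates from the paper.

```python
import numpy as np

def perp(a):
    """Basis of the orthogonal complement of the column space of a (full column rank)."""
    u, _, _ = np.linalg.svd(a, full_matrices=True)
    return u[:, a.shape[1]:]

def bar(a):
    """a (a'a)^{-1}."""
    return a @ np.linalg.inv(a.T @ a)

# Hypothetical parameter values for one sample period j.
p, r = 3, 1
alpha = np.array([[-0.5], [0.2], [0.0]])
beta = np.array([[1.0], [-1.0], [0.5]])
Gamma = np.eye(p)                          # k = 1, so there are no lagged differences
gamma_j = np.array([[0.3]])                # (1 x r) trend coefficient
mu_j = np.array([[0.1], [0.4], [-0.2]])    # p-vector of unrestricted constants

a_perp, b_perp = perp(alpha), perp(beta)
C = b_perp @ np.linalg.inv(a_perp.T @ Gamma @ b_perp) @ a_perp.T

# Slope of the deterministic trend in period j.
tau_l = C @ mu_j + (C @ Gamma - np.eye(p)) @ bar(beta) @ gamma_j.T

slope_in_cointegrating_direction = beta.T @ tau_l          # should equal -gamma_j'
slope_of_common_trends = a_perp.T @ Gamma @ tau_l          # should equal alpha_perp' mu_j
```

The check works because $\beta' C = 0$ and $\alpha_\perp' \Gamma C = \alpha_\perp'$, which is why the trend in the cointegrating relations is governed by $\gamma_j$ alone and the trend of the common trends by $\mu_j$ alone.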
The representation shows that in each sub-sample all linear combinations of the process are allowed to have a linear trend, which generalizes model (C) suggested by Perron (1989). Tests for linear restrictions on the slope parameter $\tau_l = C\mu + (C\Gamma - I_p)\bar\beta\gamma'$ are discussed in Section 4. The Granger representation shows that the slope for the cointegrating vector, $\beta'\tau_l = -\gamma'$, has to be treated separately from the slope of the common deterministic trend, $\alpha_\perp'\Gamma\tau_l = \alpha_\perp'\mu$.

An example is the two-period model, $q = 2$, with common slopes, $\tau_{l,1} = \tau_{l,2}$, corresponding to Perron's model (A). In general these hypotheses are of the form

$$H_l(r): \quad \beta'\tau_l G = -\gamma' G = 0 \quad \text{or} \quad \gamma = G_\perp \phi \tag{2.9}$$

for the cointegrating relation and

$$H_l(r): \quad \alpha_\perp'\Gamma\tau_l M = \alpha_\perp'\mu M = 0$$

for the common deterministic trends. Here $G$ and $M$ are known matrices of dimension $(q \times g)$ and $(q \times m)$, respectively, with $g, m < q$ and full column rank. In particular, Perron's model (A) is given by $G = M = (1, -1)'$, which means that $\gamma_1 = \gamma_2$ and $\alpha_\perp'\mu_1 = \alpha_\perp'\mu_2$.
A more subtle question is how the transition happens from one sample period to the next.
The suggested conditioning on k initial values in each period allows for great flexibility and these
transition periods can be extended if necessary. An alternative approach would be to use a specific
transition function of some kind. Suppose it is of interest to model an instantaneous break in the
level. As a simple example of such a problem, with one lag, consider an unobserved components
formulation
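$$X_t = \tau_{c,1} 1_{(t \le T_1)} + \tau_{c,2} 1_{(t > T_1)} + Z_t,$$
$$\Delta Z_t = \alpha\beta' Z_{t-1} + \varepsilon_t,$$

such that for $2 \le t \le T$,

$$\Delta X_t = \alpha\beta' \{X_{t-1} - \tau_{c,1} 1_{(t-1 \le T_1)} - \tau_{c,2} 1_{(t-1 > T_1)}\} + \tau_{c,1} \Delta 1_{(t \le T_1)} + \tau_{c,2} \Delta 1_{(t > T_1)} + \varepsilon_t.$$

Using the definitions of $D_{j,t}$ and $E_{j,t}$, in particular that $D_{2,t} = 1_{(t = T_1)}$, $E_{1,t} = 1_{(t \le T_1)}$ and $E_{2,t} = 1_{(t \ge T_1 + 2)}$, it follows that

$$\Delta X_t = \alpha\beta' (X_{t-1} - \tau_{c,1} E_{1,t} - \tau_{c,2} E_{2,t}) + \{\tau_{c,2} - (I_p + \alpha\beta') \tau_{c,1}\} D_{2,t-1} + \varepsilon_t.$$

Comparing with equation (2.6) or rather (3.6) it is found to be of the form

$$\Delta X_t = \alpha \begin{pmatrix} \beta \\ \rho \end{pmatrix}' \begin{pmatrix} X_{t-1} \\ E_t \end{pmatrix} + \kappa_{2,1} D_{2,t-1} + \varepsilon_t$$

with $\rho_j' = -\beta'\tau_{c,j}$, for $j = 1, 2$, and $\kappa_{2,1} = \tau_{c,2} - (I_p + \alpha\beta')\tau_{c,1}$ satisfying the restriction

$$\beta'\kappa_{2,1} = (I_r + \beta'\alpha)\rho_1' - \rho_2'.$$

This restriction on $\kappa_{2,1}$ is related to only one observation and is therefore difficult to test. For models of higher order the conditions for instantaneous breaks would similarly involve all transition parameters $\kappa_{j,i}$. This issue is discussed in further detail for the univariate case by Perron (1990).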

Another type of restriction of interest is co-breaking, see Hendry (1997) or Clements and Hendry (1999, pp. 249-252). In the model $H_l(r)$, the slope and the intercept of the linear trend of the cointegrating relation $\beta' X_t$ are in general different from period to period. An $r$-vector $\psi$ is an equilibrium slope co-breaking vector if the slope of the deterministic trend of $\psi'\beta' X_t$ does not change from period to period; that is, $\psi$ must satisfy $(\psi'\beta'\tau_l)' \in \operatorname{span}(1, \ldots, 1)'$. Granger's representation Theorem 2.1 shows that $\beta'\tau_l = -\gamma'$ and hence co-breaking is a linear restriction on the row space of the $(q \times r)$-matrix $\gamma$. When $q < r$ or when the rank of $\gamma$ otherwise is smaller than $r$ then there are at least $(r - \operatorname{rank}\gamma)$ co-breaking vectors, given by the orthogonal complement, $(\gamma')_\perp$, to the matrix $\gamma'$. Note that the hypothesis in (2.9), $\gamma = G_\perp\phi$, is formulated in terms of the column space of $\gamma$ and is therefore not directly linked to co-breaking, although it gives an upper bound for the rank of $\gamma$ and thereby a lower bound for the number of co-breaking vectors. When $g$ is smaller than $r$ there are at least $(r - g)$ co-breaking vectors given by $(\phi')_\perp$.

3. Test for Rank


The cointegration rank can be tested by modifying the procedures suggested by Johansen (1996).
Whereas the statistical analysis is hardly changed the asymptotic results are related but different.
New asymptotic distributions arise. First, these are described formally for the three different cases: $H_l$, $H_c$, and $H_{lc}$. The analysis of the latter hypothesis is burdened with nuisance parameters and less useful than the first two. Secondly, the asymptotic distributions related to $H_l$ and $H_c$ are described by response surface analysis which can easily be programmed.
For the suggested model the likelihood function can be maximized using canonical correlation
methods as developed by Hotelling (1936), Bartlett (1938), Anderson (1951), and implemented in
cointegration analysis by Johansen (1996, Ch. 6). In particular, in the case of model $H_l$ inference is based on the squared sample canonical correlations, $1 > \hat\lambda_1 > \cdots > \hat\lambda_p > 0$, of $\Delta X_t$ and $(X_{t-1}', t E_t')'$ corrected for the regressors

$$E_t, \qquad \Delta X_{t-i}\ (i = 1, \ldots, k-1), \qquad D_{j,t-i}\ (i = 1, \ldots, k;\ j = 2, \ldots, q).$$

These will be denoted

$$\operatorname{CanCor}\left\{ \Delta X_t, \begin{pmatrix} X_{t-1} \\ t E_t \end{pmatrix} \,\Big|\, \mathcal{F}_t \right\},$$

where $\mathcal{F}_t$ is shorthand notation for the $\sigma$-field generated by the regressors. The likelihood ratio test statistic for the hypothesis of at most $r$ cointegrating relations, $H_l(r)$, against $H_l(p)$ is given by

$$LR\{H_l(r) | H_l(p)\} = -T \sum_{i=r+1}^{p} \log(1 - \hat\lambda_i). \tag{3.1}$$
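
Once the squared canonical correlations are available, the statistic in (3.1) is a one-line computation. The sketch below uses only the Python standard library; the function name and the eigenvalues in the example are hypothetical, and the reduced rank regression that produces the eigenvalues is not shown.

```python
import math

def trace_statistic(eigenvalues, T, r):
    """LR statistic (3.1): -T * sum over i = r+1, ..., p of log(1 - lambda_i).

    eigenvalues : squared sample canonical correlations, each in (0, 1)
    T           : sample size used in (3.1)
    r           : cointegration rank under the null hypothesis
    """
    ordered = sorted(eigenvalues, reverse=True)   # lambda_1 > ... > lambda_p
    return -T * sum(math.log(1.0 - lam) for lam in ordered[r:])

# Hypothetical eigenvalues for a system with p = 3 variables.
example_eigenvalues = [0.50, 0.20, 0.05]
stat_r1 = trace_statistic(example_eigenvalues, T=100, r=1)
```

For $r = p$ the sum is empty and the statistic is zero, which reflects that $H_l(p)$ is the unrestricted model.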

3.1. Asymptotic distribution: a broken linear trend


Inference should ideally be based on the exact distribution of the test statistic (3.1). Unfortunately
this is not feasible so some kind of asymptotic distribution approximation is needed. In order to
ensure a good approximation the breaks need to be treated with care. The approach taken here is
that the relative break points given by $v_j = T_j/T$ are fixed while an asymptotic argument in $T$ is made.

For the asymptotic results the following notation is convenient. The relative break points $v_j = T_j/T$ satisfy $0 = v_0 < v_1 < \cdots < v_q = 1$. Let $\Delta v_j = v_j - v_{j-1}$ and define a $q$-dimensional vector of indicator functions for the sample periods, $e_u = \{\ldots, 1_{(v_{j-1} < u \le v_j)}, \ldots\}'$. This function is the limit of $E_{[Tu]}$ as $T$ increases. For any two vector valued continuous functions $f_u$ and $g_u$ on $[0,1]$ we use the notation

$$(f_u | g_u) = f_u - \int_0^1 f_s g_s'\, ds \left( \int_0^1 g_s g_s'\, ds \right)^{-1} g_u,$$

for the residual of $f$ after correcting for $g$.


Theorem 3.1. Suppose $H_l(r)$ and Assumption 1 are satisfied. Then the asymptotic distribution of the likelihood ratio test statistic for $H_l(r)$ against $H_l(p)$ is given by

$$\operatorname{tr}\left\{ \int_0^1 dW_u F_u' \left( \int_0^1 F_u F_u'\, du \right)^{-1} \int_0^1 F_u\, dW_u' \right\} \tag{3.2}$$

as $T \to \infty$ and for fixed relative break points, $v_j$. Here $W$ is a standard Brownian motion of dimension $(p - r)$ and $F$ is a $(p - r + q)$-dimensional process,

$$F_u = \left( \begin{matrix} W_u \\ u e_u \end{matrix} \,\Big|\, e_u \right). \tag{3.3}$$

The asymptotic distribution has been simulated and the results analysed by a response surface
analysis presented in Section 3.4.
For analytic reasoning and for computer simulations it is convenient to rewrite the representation of the distribution given by (3.2). This is based on two ideas. First, the distribution is invariant with respect to linear transformations of the vector process $F$. Thus if the first $(p - r)$ components are regressed on the last $q$ components giving $F_{0,u} = (W_u | u e_u, e_u)$, the transformed version of the matrix $(\int_0^1 F_u F_u'\, du)^{-1}$ is block diagonal. It follows that expression (3.2) can be rewritten as the sum of two terms which do not involve the levels of the Brownian motion, see (3.4) below. We find

$$\operatorname{tr}\left\{ \int_0^1 dW_u F_{0,u}' \left( \int_0^1 F_{0,u} F_{0,u}'\, du \right)^{-1} \int_0^1 F_{0,u}\, dW_u' \right\} + \operatorname{tr}\left\{ \int_0^1 dW_u (u e_u | e_u)' \left( \int_0^1 (u e_u | e_u)(u e_u | e_u)'\, du \right)^{-1} \int_0^1 (u e_u | e_u)\, dW_u' \right\}.$$

The second term is $\chi^2\{q(p - r)\}$-distributed since

$$\int_0^1 (u e_u | e_u)\, dW_u' \stackrel{D}{=} N_{q \times (p-r)}\left\{ 0, \int_0^1 (u e_u | e_u)(u e_u | e_u)'\, du \otimes I_{p-r} \right\}.$$

Secondly, when regressing a Brownian motion on the level in each of the two or more sub-samples, the sub-sample Brownian motions will be independent. These considerations lead to a second representation of the asymptotic distribution (3.2).

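Theorem 3.2. Let $W^{(1)}, \ldots, W^{(q)}$ be independent $(p - r)$-dimensional standard Brownian motions and define

$$J_j = \left\{ \int_0^1 (u|1)^2\, du \right\}^{-1/2} \int_0^1 (u|1) \{dW_u^{(j)}\}',$$

$$K_j = \int_0^1 \{W_u^{(j)} | 1, u\} \{dW_u^{(j)}\}',$$

$$L_j = \int_0^1 \{W_u^{(j)} | 1, u\} \{W_u^{(j)} | 1, u\}'\, du.$$

Then the limiting variable (3.2) can be expressed as

$$\operatorname{tr}\left\{ \left( \sum_{j=1}^{q} K_j \Delta v_j \right)' \left( \sum_{j=1}^{q} L_j (\Delta v_j)^2 \right)^{-1} \left( \sum_{j=1}^{q} K_j \Delta v_j \right) \right\} + \sum_{j=1}^{q} J_j J_j'. \tag{3.4}$$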
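
Representation (3.4) lends itself to simulation, because each sample period gets its own independent Brownian motion on $[0,1]$. The following is a rough sketch, assuming NumPy; it replaces each Brownian motion by a scaled Gaussian random walk and each integral by a sum, which is one possible discretization and not necessarily the one used for the published tables. The function names, the number of steps and the number of draws are arbitrary illustrative choices.

```python
import numpy as np

def residual(y, x):
    """Least-squares residual of the columns of y after regressing on the columns of x."""
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return y - x @ coef

def draw_limit_variable(dv, dim, steps, rng):
    """One draw from a discretized version of the limiting variable (3.4).

    dv    : relative sample lengths (Delta v_1, ..., Delta v_q), summing to one
    dim   : p - r, the dimension of the Brownian motions
    steps : random-walk steps per sample period (a tuning choice)
    rng   : a numpy.random.Generator
    """
    u = (np.arange(steps) / steps).reshape(-1, 1)    # left end points of the grid on [0, 1)
    ones = np.ones((steps, 1))
    u_demeaned = residual(u, ones)                   # discretized (u | 1)
    sum_K = np.zeros((dim, dim))
    sum_L = np.zeros((dim, dim))
    sum_JJ = 0.0
    for dv_j in dv:
        dW = rng.standard_normal((steps, dim)) / np.sqrt(steps)   # increments of W^(j)
        W = np.cumsum(dW, axis=0) - dW                            # W^(j) at left end points
        R = residual(W, np.hstack([ones, u]))                     # {W^(j) | 1, u}
        K = R.T @ dW                                              # approximates K_j
        L = R.T @ R / steps                                       # approximates L_j
        J = (u_demeaned.T @ dW) / np.sqrt((u_demeaned ** 2).sum() / steps)   # 1 x dim
        sum_K += K * dv_j
        sum_L += L * dv_j ** 2
        sum_JJ += float((J ** 2).sum())                           # J_j J_j'
    first_term = np.trace(sum_K.T @ np.linalg.solve(sum_L, sum_K))
    return float(first_term) + sum_JJ

rng = np.random.default_rng(0)
draws = [draw_limit_variable((1 / 3, 2 / 3), dim=2, steps=200, rng=rng) for _ in range(200)]
```

Sample moments of `draws` then give crude estimates of the mean and variance of the limit distribution for one break at a third of the sample; many more replications and steps would be needed for usable critical values.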

From representation (3.4) of the limit distribution it is seen that the asymptotic distribution only depends on the relative length of the sample periods, not on their ordering. For instance, in the case of one break point, the asymptotic distribution is the same if $T_1 = T/3$ as if $T_1 = 2T/3$. Moreover, the first term in (3.4) is the trace of a $(p - r)$-dimensional square matrix while the second term is the sum of inner products of $(p - r)$-dimensional vectors. This reflects the degrees of freedom arising from the matrix $\Pi$ and the vectors $\Pi_j$, respectively.

This representation also shows another feature of the limit distribution. In the case of $q + 1$ sample periods let $DF_{q+1}(\Delta v_1, \ldots, \Delta v_{q+1})$ denote the asymptotic distribution. When the length of one of the sample periods, $\Delta v_j$, tends to zero it follows that the contributions from $K_j$, $L_j$ vanish in the first term of (3.4). In the second term we can isolate $J_j J_j'$ and find

$$\lim_{\Delta v_j \to 0} DF_{q+1}(\Delta v_1, \ldots, \Delta v_{q+1}) = DF_q(\Delta v_1, \ldots, \Delta v_{j-1}, \Delta v_{j+1}, \ldots, \Delta v_{q+1}) + J_j J_j', \tag{3.5}$$

where $DF_q$ and $J_j J_j'$ are independent and $J_j J_j'$ is $\chi^2(p - r)$-distributed. The additional $\chi^2$ term arises because the dimension of the vector $(X_{t-1}', t E_t')'$ is preserved although one of the relative sample lengths vanishes, and hence the dimension of the restrictions imposed by the rank hypothesis is unaltered. If the dummies with the vanishing sample length are taken out of the statistical analysis the additional $\chi^2$-distributed element disappears.

The asymptotic distribution given above does not depend on the parameters for the deterministic component. The test is therefore asymptotically similar with respect to these parameters provided that Assumption 1 is satisfied, see also Nielsen and Rahbek (2000).
In order to estimate the rank a sequential testing procedure is necessary. One suggestion is to
test the hypotheses
$$H_l(0), H_l(1), \ldots, H_l(p - 1)$$
sequentially against the unrestricted model $H_l(p)$. If $H_l(r)$ is the first hypothesis to be accepted
then the cointegrating rank is estimated by r. For consistency properties of this procedure see
Johansen (1996, Section 12.1).
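
The sequential procedure can be sketched in a few lines of plain Python. The function name and the numbers in the example are hypothetical; in practice the statistics come from (3.1) and the critical values from the asymptotic distribution for the relevant break configuration.

```python
def estimate_rank(trace_stats, critical_values):
    """Sequential rank determination.

    trace_stats[r]     : LR statistic for H_l(r) against H_l(p), r = 0, ..., p-1
    critical_values[r] : critical value used for the test of H_l(r)
    Returns the first r whose hypothesis is not rejected, or p if all are rejected.
    """
    p = len(trace_stats)
    for r in range(p):
        if trace_stats[r] <= critical_values[r]:
            return r      # H_l(r) is the first hypothesis to be accepted
    return p              # every H_l(r) with r < p was rejected

# Hypothetical statistics and critical values for a system with p = 3.
rank_hat = estimate_rank([60.0, 20.0, 3.0], [45.0, 25.0, 10.0])
```

In this made-up example $H_l(0)$ is rejected and $H_l(1)$ is not, so the estimated rank is 1 and $H_l(2)$ is never examined.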

3.2. A broken constant level


In some applications the level of the data may change from time to time but the data do not exhibit a linear trend. Then the model is given by

$$\Delta X_t = (\Pi, \mu) \begin{pmatrix} X_{t-1} \\ E_t \end{pmatrix} + \sum_{i=1}^{k-1} \Gamma_i \Delta X_{t-i} + \sum_{i=1}^{k} \sum_{j=2}^{q} \kappa_{j,i} D_{j,t-i} + \varepsilon_t. \tag{3.6}$$

The hypothesis of reduced cointegration rank is given by $H_c(r)$: $\operatorname{rank}(\Pi, \mu) \le r$, or equivalently that $(\Pi, \mu)$ can be written as $\alpha(\beta', \rho')$, while the likelihood ratio test statistic for $H_c(r)$ against a general alternative, $H_c(p)$, is of form (3.1). The result of Theorem 3.1 applies with $F$ replaced by a $(p - r + q)$-dimensional process with components

$$F_u = \begin{pmatrix} W_u \\ e_u \end{pmatrix}. \tag{3.7}$$

3.3. Models with unrestricted parameters for the broken trend


The hypothesis $H_c(r)$ in model (3.6) imposes rank restrictions on the first-order autoregressive parameter as well as the parameter for the broken deterministic trend. In some situations it may seem reasonable to analyse a rank hypothesis which only involves the autoregressive parameter for levels

$$H_{lc}(r): \quad \operatorname{rank} \Pi \le r \quad \text{or} \quad \Pi = \alpha\beta',$$

while $\mu$ is left unrestricted. This hypothesis is analysed by correcting $\Delta X_t$ and $X_{t-1}$ for the remaining components in the model and subsequently performing a canonical correlation analysis of the residuals. The likelihood ratio test statistic for $H_{lc}(r)$ against $H_{lc}(p) = H_c(p)$ is of form (3.1). Its asymptotic distribution is given as follows.
Theorem 3.3. Suppose $H_{lc}(r)$ and Assumption 1 are satisfied. Let $W$ be a $(p - r)$-dimensional standard Brownian motion and $F$ the $(p - r + q)$-dimensional process given in Theorem 3.1. The asymptotic distribution as $T \to \infty$ of the likelihood ratio test statistic for $H_{lc}(r)$ against $H_{lc}(p)$ depends on $\alpha_\perp'\mu$; in particular, let $n = \operatorname{rank}(\alpha_\perp'\mu) \le \min(p - r, q)$.

(i) Suppose $n = (p - r) \le q$. Then the asymptotic distribution is $\chi^2\{(p - r)^2\}$.

(ii) Suppose $n = q < (p - r)$. Then the asymptotic distribution is given by

$$\operatorname{tr}\left\{ \int_0^1 dW_u F_u' N \left( N' \int_0^1 F_u F_u'\, du\, N \right)^{-1} N' \int_0^1 F_u\, dW_u' \right\} \tag{3.8}$$

where $N$ is the $\{(p - r + q) \times (p - r)\}$-matrix

$$N = \begin{pmatrix} I_{p-r-q} & 0_{(p-r-q) \times q} \\ 0 & 0 \\ 0 & I_q \end{pmatrix}. \tag{3.9}$$


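(iii) Suppose $n < \min(p - r, q)$. Then there exist matrices $\xi$, $\eta$ of rank $n$ and dimensions $\{(p - r) \times n\}$, $(q \times n)$, respectively, so $\alpha_\perp'\mu = \xi\eta'$. The asymptotic distribution is then given by (3.8) where $N$ now depends on $\eta$,

$$N' = \begin{pmatrix} I_{p-r-n} & 0_{(p-r-n) \times n} & 0 \\ 0 & 0 & \eta' \end{pmatrix}. \tag{3.10}$$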

The test is not as attractive as the previously considered tests. The limit distribution is a complicated function of $\alpha_\perp'\mu$. The test is therefore not asymptotically similar with respect to the slope parameters for the broken trend. Although the third situation in Theorem 3.3 only occurs on a null subset of the parameter space, the issue ought to be addressed in the statistical analysis. Had there been no breaks in the trend the test strategy suggested by Johansen (1996, Section 12.2) could be used. A generalization of that idea is not simple for two reasons. First, the limit distribution depends continuously on the nuisance parameter. Secondly, in many applications it could be of interest subsequently to test hypotheses corresponding to rank restrictions on $\alpha_\perp'\mu$. Such tests are discussed in Section 4.

3.4. Critical values for rank tests


Exact analytic expressions for the asymptotic distributions are not known and the quantiles have to be determined by simulation. The asymptotic distributions depend on a number of factors: the number of non-stationary relations, the location of break points and the trend specification. The moments of these distributions have been approximated using a large number of simulations and a subsequent response surface analysis based on these factors. Then, the quantiles can be approximated using the empirical observation that the shape of rank test distributions is typically approximated rather well by $\Gamma$-distributions, see Nielsen (1997) and Doornik (1998). Since the parameters of a $\Gamma$-distribution are given by the first two moments, it suffices to report adequate approximations to the asymptotic mean and variance of the trace test distributions. The quantiles can then be determined using a numerical routine for the incomplete $\Gamma$-integral or a $\chi^2$-distribution with non-integer degrees of freedom, which is available in most statistical computer packages.
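
To make the moment-matching step concrete, the sketch below converts a mean and a variance into the shape and scale of a $\Gamma$-distribution and inverts its distribution function by bisection. It uses only the Python standard library; the power series for the incomplete $\Gamma$-integral and the bisection bounds are implementation choices made here for illustration, and a statistical package routine would normally be preferred.

```python
import math

def gamma_cdf(x, shape, scale):
    """Distribution function of a Gamma(shape, scale) variable.

    Uses the power series of the regularized lower incomplete gamma function,
    P(a, z) = z^a e^{-z} / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n)).
    Adequate for moderate arguments; not tuned for extreme tails.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))

def gamma_quantile(prob, mean, variance):
    """Quantile of the Gamma distribution matched to the given mean and variance."""
    shape = mean ** 2 / variance       # shape * scale = mean
    scale = variance / mean            # shape * scale^2 = variance
    lo, hi = 0.0, mean + 20.0 * math.sqrt(variance)
    for _ in range(100):               # bisection on the distribution function
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q95 = gamma_quantile(0.95, mean=80.0, variance=125.0)
tail = 1.0 - gamma_cdf(99.24, shape=80.0 ** 2 / 125.0, scale=125.0 / 80.0)
```

With mean 80 and variance 125, the values used for illustration in Tables 2 and 3, `q95` should land close to the tabulated 95th percentile of 99.24 and `tail` close to 0.05.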
In the following the cases with a broken trend or a broken level are considered with up to three sample periods, $q = 3$. The cases with $q = 1, 2, 3$ can be described jointly. Let $v_j = T_j/T$ denote the break points as a proportion of the full sample. For the case $q = 3$ there are three relative sample lengths, $v_1 - 0$, $v_2 - v_1$, $1 - v_2$. Let $a$ and $b$ denote the smallest and the second smallest of these. For the case $q = 2$ there are two relative sample lengths $v_1 - 0$, $1 - v_1$. Let $b$ denote the smallest of these and let $a = 0$. Finally, for $q = 1$ let $a = b = 0$.
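
The mapping from break points to the pair $(a, b)$ can be written down directly. The sketch below is plain Python; the function name and the example break points are hypothetical.

```python
def smallest_relative_lengths(rel_breaks):
    """Return (a, b) for q = 1, 2 or 3 sample periods.

    rel_breaks : interior relative break points v_1 < ... < v_{q-1}, each in (0, 1)
    """
    points = [0.0] + sorted(rel_breaks) + [1.0]
    lengths = sorted(points[i + 1] - points[i] for i in range(len(points) - 1))
    q = len(lengths)
    if q == 1:
        return 0.0, 0.0                  # no break: a = b = 0
    if q == 2:
        return 0.0, lengths[0]           # one break: a = 0, b = smallest length
    if q == 3:
        return lengths[0], lengths[1]    # two breaks: smallest and second smallest
    raise ValueError("only q = 1, 2, 3 sample periods are covered")

a_two_breaks, b_two_breaks = smallest_relative_lengths([0.2, 0.7])
a_one_break, b_one_break = smallest_relative_lengths([0.4])
```

For breaks at 20% and 70% of the sample the relative lengths are 0.2, 0.5 and 0.3, so $a = 0.2$ and $b = 0.3$; the longest period never enters.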
The moments of the asymptotic distributions are unknown functions of $(p - r)$, $a$, $b$. We have found that such functions are very accurately approximated by

$$\log(\text{moment}) \approx f_{\text{moment}}(p - r, a, b, T) = \sum_{m=0}^{2} \left\{ \theta_m + \sum_{i=1}^{4} \theta_{im} x_i + \sum_{i=1}^{4} \sum_{j \ge i} \theta_{ijm} x_i x_j + \sum_{i=1}^{4} \sum_{j \ge i} \sum_{k \ge j} \theta_{ijkm} x_i x_j x_k \right\} d_m \tag{3.11}$$

where $x_1 = (p - r)$, $x_2 = a$, $x_3 = b$, $x_4 = T^{-1}$, $d_m = (p - r)^{-m}$. This function is essentially a third-order polynomial in $(p - r)$, $a$, $b$ and $T^{-1}$, where the terms in $(p - r)^{-1}$ and $(p - r)^{-2}$ play the same role as the dummies for dimensions 1, 2 and 3 used in Doornik (1998), but give a better fit. Note that the regression includes the inverse of the sample size, $T^{-1}$. The role of the sample size in fitting response surfaces for the trace test is discussed in Doornik (1998). The asymptotic moments are easily calculated from (3.11) by letting $T \to \infty$. Note also that $x_1 d_1 = d_0 = 1$ and $x_1 d_2 = d_1$. Some of the parameters in (3.11) are therefore not identified, and are set to zero. The remaining 75 parameters have been estimated by ordinary least squares, adding an error term to (3.11) and minimizing the sum of squared residuals.

Figure 1. Values of $a$ and $b$ used in the simulations. (Scatter plot: horizontal axis $a$ from 0.0 to 0.3, vertical axis $b$ from 0.0 to 0.5.)

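
The count of 75 identified parameters can be reproduced by enumerating the regressors of (3.11) and merging those that coincide because $x_1 d_1 = d_0$ and $x_1 d_2 = d_1$. The sketch below is plain Python; representing each regressor by its exponents in $(p - r)$, $a$, $b$ and $T^{-1}$ is a device chosen here for the illustration.

```python
from itertools import combinations_with_replacement

def distinct_regressors(n_vars=4, max_degree=3, max_m=2):
    """Distinct regressors x_i x_j x_k d_m of (3.11), keyed by exponent tuples.

    Each regressor is a monomial of degree at most max_degree in x_1, ..., x_4
    multiplied by d_m = (p - r)^(-m). Because x_1 = (p - r), multiplying by d_m
    lowers the exponent of x_1 by m, so different (monomial, m) pairs can coincide.
    """
    regressors = set()
    raw_count = 0
    for m in range(max_m + 1):
        for degree in range(max_degree + 1):
            for combo in combinations_with_replacement(range(n_vars), degree):
                exponents = [combo.count(i) for i in range(n_vars)]
                exponents[0] -= m
                regressors.add(tuple(exponents))
                raw_count += 1
    return raw_count, regressors

raw_count, regressors = distinct_regressors()
n_identified = len(regressors)   # 75
```

There are 35 monomials of degree at most three in four variables, hence $3 \times 35 = 105$ raw terms, of which 30 duplicate another term and 75 remain.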
The moments of the asymptotic distribution were simulated for various values of $(p - r)$, $a$, $b$ and $T$. The involved Brownian motions can be discretized in several ways. One possibility is to mimic the representation (3.2) and generate one random walk with $T$ steps in each simulation, and associate a percentage of this to each sample period. In order to avoid poor approximations for cases with relatively short sample periods, representation (3.4) was used. The idea is to generate three random walks each with $T$ steps and then scale them according to the relative lengths of the sample periods. The values of $T$ were the integer part of $500/t$ for $t = 1, \ldots, 10$. The considered number of non-stationary relations was $(p - r) = 1, \ldots, 8$. Finally 20 different values of $a$ and $b$ were chosen as illustrated in Figure 1, to be representative of all pairs $(a, b)$ such that $a < b$ and $b < (1 - a - b)$. Note that there is a more dense sampling when $a = 0$, corresponding to one single break. This gives 1600 cases which were repeated $N = 100\,000$ times.
The fit of (3.11) is excellent, even when the number of parameters is dramatically reduced
starting from the least significant, as illustrated in Table 1. Standard errors are about 0.2% for the
mean and 0.9% for the variance, an order of magnitude actually very close to the Monte Carlo
sampling variation in log(moment). Note that such small errors in the moments are virtually
negligible for all practical purposes when computing the quantiles and tail probabilities of the
-distribution. This point is illustrated in Tables 2 and 3, where quantiles and tail probabilities
are computed for the values mean = 80 and variance = 125 and small variations thereof. These
c Royal Economic Society 2000


228

S. Johansen et al.

Table 1. Goodness of fit measures for the response surface.

                       Unrestricted                           Restricted
Model                  # Par.   R^2        std. error x 10^3   # Par.   R^2        std. error x 10^3
Hc, log(mean)          75       0.999997   1.78                31       0.999996   2.06
Hc, log(variance)      75       0.999963   5.86                19       0.999936   7.59
Hl, log(mean)          75       0.999998   1.15                31       0.999996   1.77
Hl, log(variance)      75       0.999940   6.76                24       0.999894   8.86

Table 2. 95th percentiles of the $\Gamma$-distribution.

                Variance
Mean            125/1.009    125      125*1.009
80/1.002        98.99        99.08    99.17
80              99.15        99.24    99.33
80*1.002        99.31        99.40    99.49

Table 3. Right-hand tail probability of the $\Gamma$-distribution for the value 99.24.

                Variance
Mean            125/1.009    125       125*1.009
80/1.002        0.0480       0.0487    0.0494
80              0.0493       0.0500    0.0507
80*1.002        0.0505       0.0513    0.0520

values of mean and variance are approximately equal to the average of the values found in our
simulations.
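The centre cells of Tables 2 and 3 can be approximately reproduced by matching a Gamma distribution to a given mean and variance. The sketch below is ours and is not the program used for the paper. It assumes the usual moment matching (shape = mean^2/variance, scale = variance/mean), evaluates the distribution function through a standard series and continued-fraction expansion of the incomplete gamma function, and inverts it by bisection. All function names are hypothetical.

```python
import math

def gamma_cdf(x, shape, scale):
    """Distribution function of a Gamma(shape, scale) variable at x."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    log_front = shape * math.log(z) - z - math.lgamma(shape)
    if z < shape + 1.0:
        # Series expansion of the regularized lower incomplete gamma function.
        term = 1.0 / shape
        total = term
        denom = shape
        for _ in range(10000):
            denom += 1.0
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_front)
    # Continued fraction (modified Lentz method) for the upper tail.
    tiny = 1e-300
    b = z + 1.0 - shape
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - shape)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_front) * h

def gamma_from_moments(mean, variance):
    """Moment matching (an assumption of this sketch): returns (shape, scale)."""
    return mean * mean / variance, variance / mean

def gamma_quantile(prob, mean, variance):
    """Quantile by bisection on the distribution function."""
    shape, scale = gamma_from_moments(mean, variance)
    lo, hi = 0.0, mean + 50.0 * math.sqrt(variance)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_tail(x, mean, variance):
    """Right-hand tail probability at x."""
    shape, scale = gamma_from_moments(mean, variance)
    return 1.0 - gamma_cdf(x, shape, scale)

q95 = gamma_quantile(0.95, 80.0, 125.0)
tail = gamma_tail(99.24, 80.0, 125.0)
print(round(q95, 2), round(tail, 4))
```

With mean = 80 and variance = 125 the two printed numbers should be close to the centre entries of Tables 2 and 3; perturbing the inputs by the factors 1.002 and 1.009 gives a feel for how little the moment errors matter.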
It is important to note that the residuals of (3.11) are approximately homoscedastic in all
cases, which means that there are no values of $(p - r)$, $a$, $b$ and $T$ for which the errors are much
larger as a percentage of the moment. This is what motivated the choice to model the log of
the moments rather than the moments themselves. We also tried to model the moments directly,
without taking logarithms, using weighted least squares with ( p r )2 as weights to account
for heteroscedasticity, along the lines of Doornik (1998). However, the fit appears to be slightly
poorer with that specification. Similar results can be achieved using PcGets, an algorithm for
model selection constructed by Krolzig and Hendry (2000).
The estimated coefficients are reported in Table 4, where the coefficients of the variable
$x_4 = T^{-1}$ are not reported, since they are irrelevant for computing the asymptotic moments.
In the case of $q = 2$ sample periods instead of $q = 3$ the mean and variance can still be
computed from (3.11) by letting $b = 0$ and subtracting the mean and variance of a $\chi^2(p - r)$
variable, see (3.5). In the case of just $q = 1$ sample period let $a = b = 0$ and subtract an
additional $\chi^2(p - r)$ variable. Thus for $q = 1, 2, 3$ we have the approximation
\[ \text{mean} \approx \exp\{f_{\text{mean}}(p - r, a, b, \infty)\} - (3 - q)(p - r), \tag{3.12} \]
\[ \text{variance} \approx \exp\{f_{\text{variance}}(p - r, a, b, \infty)\} - 2(3 - q)(p - r). \tag{3.13} \]
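The correction in (3.12) and (3.13) is simple arithmetic once the fitted log-moments are available. The following sketch takes those fitted values as inputs rather than evaluating the response surface itself, because the coefficients of Table 4 are not reproduced in the code. The function name and arguments are our own and purely illustrative.

```python
import math

def asymptotic_moments(log_mean_fit, log_var_fit, p_minus_r, q):
    """Apply the corrections of (3.12)-(3.13).

    log_mean_fit, log_var_fit: fitted log-moments from the response surface,
        evaluated at the asymptotic limit (assumed to be supplied by the caller).
    p_minus_r: number of non-stationary relations (p - r).
    q: number of sample periods (1, 2 or 3).
    """
    if q not in (1, 2, 3):
        raise ValueError("q must be 1, 2 or 3")
    # Each missing sample period removes one chi-square(p - r) variable,
    # which has mean (p - r) and variance 2 (p - r).
    correction = (3 - q) * p_minus_r
    mean = math.exp(log_mean_fit) - correction
    variance = math.exp(log_var_fit) - 2 * correction
    return mean, variance

# Hypothetical fitted values, chosen only to show the arithmetic.
m, v = asymptotic_moments(math.log(10.0), math.log(20.0), p_minus_r=2, q=1)
print(m, v)
```

For q = 3 no correction is applied and the function simply exponentiates the fitted log-moments.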


Table 4. Estimated response surface.


Hc
log(mean)

Hl

log(variance)

log(mean)

log(variance)

Constant

2.80

3.78

3.06

(p r)

0.501

0.346

0.456

3.97
0.314

1.43

0.859

1.47

1.79

0.399

0.993

0.256

( p r )2

0.0309

0.0106

0.0269

0.00898

( p r )a

0.0600

0.0339

0.0363

0.0688

( p r )b

0.0195

a2

5.72

ab

1.12

b2

1.70

( p r )3

0.000974

( p r )a 2

0.168

a3

6.34

ab2

1.89

2.35

4.08

2.35
0.000840
3.95

a2 b
b3

4.21

6.01

4.75

1.33
1.85

0.282

2.04

0.587

( p r )1

2.19

a( p r )1

0.438

0.874

0.304

1.62

b( p r )1

1.79

2.36

1.06

3.13

a 2 ( p r )1

6.03

2.88

9.35

4.52

ab( p r )1

3.08

b2 ( p r )1

1.97

a 3 ( p r )1

8.08

ab2 ( p r )1

5.79

b3 ( p r )1
( p r )2

4.44

0.717
1.29

a 2 ( p r )2

1.52

b2 ( p r )2

2.87

a 3 ( p r )2

2.05

2.47

3.82

1.21

2.12

5.87

22.8
7.15
4.31

b( p r )2

b3 ( p r )2

2.73

1.02
0.807

4.95

4.89

0.681

0.874

0.828

0.865

5.43
13.1

2.03

1.50

These formulas allow comparison with published approximations of the limit distribution for the
case with no breaks. Table 5 compares the mean and variance based on formulas (3.12) and
(3.13) with the expressions obtained by Doornik (1998), and compares percentiles based
on these two sets of formulas with those reported by Johansen (1996). Formulas (3.12) and (3.13)
agree quite well with Doornik's approximation, both with respect to moments and in particular the
rightmost percentiles. Our approximation is based on response surfaces with $(p - r) \le 8$ while

Table 5. Approximate mean, variance and 95th percentile of the asymptotic distribution of the rank test in
model Hc for a = b = 0, q = 1. Comparison of RS, the response surface analysis, D, Doornik (1998),
and J, Johansen (1996, Table 15.2).

            mean              variance          95th percentile
(p - r)     RS       D        RS       D        RS       D        J
1           4.1      4.1      7.0      6.7      9.2      9.1      9.1
2           12.0     12.1     19.6     20.0     20.1     20.3     20.0
3           24.2     24.0     38.5     38.6     35.2     35.0     34.8
4           40.2     40.0     63.2     63.2     54.1     53.9     53.4
5           60.2     60.1     94.0     93.8     77.0     76.9     75.7
6           84.1     84.1     131.1    130.4    103.8    103.7    101.8
7           111.9    112.1    174.2    173.6    134.5    134.6    132.0
8           142.8    144.1    222.6    221.6    169.2    169.4    165.7
9           180.3    180.1    274.9    276.2    208.4    208.3    203.3
10          222.5    220.1    329.3    336.8    253.2    251.1    244.6

Doornik considered $(p - r) \le 15$. It is seen that our formula is not suited for extrapolation to
$(p - r) > 10$. The percentiles tabulated by Johansen are based on a discretization with $T = 400$,
whereas Doornik's and our formulas are based on response surfaces in $T$ combined with the
$\Gamma$-approximation. As expected there is agreement for lower dimensions, whereas Johansen's figures
are less accurate for higher dimensions. Doornik gives a more detailed comparison of percentiles
found from the $\Gamma$-approximation and directly by simulation.

4. Restrictions on the slope parameters


When the cointegrating rank is known it is usually desirable to test further restrictions on the
parameters. In this section hypotheses on the slope of the deterministic trend are considered.
Recall from Theorem 2.1 that the slope parameter is
\[ \tau_{l,j} = C\mu_j + (C\Gamma - I_p)\beta(\beta'\beta)^{-1}\gamma_j' \]
in the $j$th sample period. In particular, the deterministic slope for the cointegrating relations,
$\beta' X_t$, is therefore $\beta'\tau_{l,j} = -\gamma_j'$. In brief, the results are
1. tests for linear restrictions on the slope for the cointegrating relation, $\beta'\tau_l G_\perp = 0$, are
asymptotically $\chi^2$-distributed.
2. tests for linear restrictions on the slope for the entire process, $\tau_l M_\perp = 0$, are asymptotically
$\chi^2$-distributed.
The two tests can be performed sequentially, by first imposing restrictions on the slope for
the cointegrating relation, $\beta'\tau_l G_\perp = 0$, and then, if accepted, imposing restrictions on the slope
for the common deterministic trend, $\alpha_\perp'\Gamma\tau_l M_\perp = 0$, provided $\mathrm{span}(G) \subset \mathrm{span}(M)$. In this way
it is possible to impose more restrictions on the slope for the cointegrating relation, $\beta'\tau_l$, than on
the slope in general, $\tau_l$.

Note that it is not straightforward to do these two tests in the opposite order. In that case the
test for slope of the common deterministic trend is burdened with nuisance parameters. This is
related to the issue that the non-stationary trends are not uniquely defined.

4.1. Slope of the cointegrating relation


The slope for the linear trend in the cointegrating relation is given by the parameter $\gamma$. Linear
restrictions on this parameter can be formulated as
\[ H_l^{\gamma}(r): \quad \gamma = G\varphi, \]
where $G$ is a known $(q \times g)$-matrix of rank $g$, where $g \le q$, and the parameter $\varphi$ is a $(g \times r)$-matrix.
Under the hypothesis the slope for the cointegrating relations is therefore
\[ \beta'\tau_l E_t = -\varphi' G' E_t. \]
As an example suppose $q = 2$. By the choice $G = (1, 0)'$ the linear trend is absent in the second
period, whereas if $G = (1, 1)'$ then the slope is not altered by the break. Note that when there is
no cointegration, $r = 0$, then $\gamma$ vanishes, hence $H_l^{\gamma}(0) = H_l(0)$.

As before the likelihood is maximized by canonical correlation analysis. The squared sample
canonical correlations of the residuals, $1 > \hat\lambda_1^{\gamma} > \cdots > \hat\lambda_p^{\gamma} > 0$, are given by
\[ \mathrm{CanCor}\left\{ \Delta X_t, \begin{pmatrix} X_{t-1} \\ t G' E_t \end{pmatrix} \,\Big|\, F_t \right\}, \]
and the likelihood ratio test for the hypothesis $H_l^{\gamma}(r)$ in $H_l(r)$ is
\[ LR\{H_l^{\gamma}(r) \mid H_l(r)\} = T \sum_{i=1}^{r} \log\{(1 - \hat\lambda_i^{\gamma})/(1 - \hat\lambda_i)\}, \]
see Johansen (1996, Theorem 7.2). The asymptotic distribution is as follows.

Theorem 4.1. Suppose $H_l^{\gamma}(r)$ and Assumption 1 are satisfied. Then the likelihood ratio test
statistic for $H_l^{\gamma}(r)$ in $H_l(r)$ is asymptotically $\chi^2\{r(q - g)\}$-distributed.

The restriction on the linear term of the cointegrating vector, $\gamma = G\varphi$, could be combined
with, for instance, a linear restriction on the cointegrating vector itself, $\beta = H\psi$, where $H$ is a
known, full-rank, $(p \times h)$-matrix and the parameter $\psi$ is of dimension $(h \times r)$. The likelihood ratio
test statistic for this hypothesis is also asymptotically $\chi^2$. The degrees of freedom are $\{r(p - h)\}$
if the alternative is $H_l^{\gamma}(r)$ and $\{r(p - h + q - g)\}$ if $H_l(r)$ is the alternative.
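Once the two sets of squared canonical correlations have been computed, the test of this subsection is a short calculation. The sketch below assumes that the eigenvalues of the restricted and the unrestricted problems are already available as lists sorted in decreasing order. The function names and the numerical eigenvalues are hypothetical and serve only to show the arithmetic.

```python
import math

def lr_slope_test(T, lam_restricted, lam_unrestricted, r):
    """LR statistic T * sum_{i=1}^{r} log{(1 - lam_restricted_i) / (1 - lam_unrestricted_i)}."""
    pairs = zip(lam_restricted[:r], lam_unrestricted[:r])
    return T * sum(math.log((1.0 - lr_i) / (1.0 - lu_i)) for lr_i, lu_i in pairs)

def df_slope_test(r, q, g):
    """Degrees of freedom r (q - g) of the limiting chi-square distribution."""
    return r * (q - g)

# Hypothetical eigenvalues, for illustration only.
lam_unrestricted = [0.50, 0.40, 0.30, 0.10, 0.05]
lam_restricted = [0.48, 0.37, 0.26, 0.09, 0.04]

stat = lr_slope_test(T=91, lam_restricted=lam_restricted,
                     lam_unrestricted=lam_unrestricted, r=3)
df = df_slope_test(r=3, q=3, g=2)
print(round(stat, 2), df)
```

The statistic is non-negative whenever each restricted eigenvalue is no larger than its unrestricted counterpart, as in this illustration.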


4.2. Slope of the process
Restrictions on the slope $\tau_l$ for the process $X_t$ are now studied. This slope is a linear combination
of the slope for the cointegrating relation, $\beta'\tau_l = -\gamma'$, and that of the common deterministic
trends, $\alpha_\perp'\Gamma\tau_l = \alpha_\perp'\mu$. Restrictions on $\gamma$ were studied above and we now turn to restricting $\alpha_\perp'\mu$.

Linear restrictions on $\alpha_\perp'\mu$ will be expressed in terms of a known $(q \times m)$-matrix $M$ where
$m \le q$. The formulation
\[ H_l^{\mu}(r): \quad \mu = \delta M' + \alpha\rho' M_\perp', \]
where $\delta$, $\rho$ are of dimension $(p \times m)$ and $\{(q - m) \times r\}$, respectively, leaves $\alpha'\mu$ unrestricted,
whereas $\alpha_\perp'\mu$ is restricted so $\alpha_\perp'\mu M_\perp = 0$. If the slope of the cointegrating relation is restricted
correspondingly as $\beta'\tau_l M_\perp = 0$, or more generally as $\beta'\tau_l G_\perp = 0$ for some $G$ satisfying
$\mathrm{span}(G) \subset \mathrm{span}(M)$, we have that the slope of the entire process satisfies $\tau_l M_\perp = 0$.
As opposed to the concept of cointegrating relations the non-stationary trends are linear
combinations of the levels of the time series which do not cointegrate. They could be chosen in
various ways, for instance as $\beta_\perp' X_t$ or $\alpha_\perp'\Gamma X_t$. Under the above restriction, $H_l^{\mu}$, the slope of the
former of these is
\[ \beta_\perp'\tau_l = \beta_\perp' C\delta M' + \beta_\perp' C\Gamma\beta(\beta'\beta)^{-1}\gamma', \]
showing that restrictions on $\gamma$ are necessary to interpret $H_l^{\mu}$ in terms of the non-stationary trends.
In the following we therefore discuss likelihood ratio tests for
\[ H_l^{\gamma\mu}(r): \quad \gamma = G\varphi, \quad \mu = \delta M' + \alpha\rho' M_\perp' \]
in $H_l^{\gamma}(r)$ and $H_l(r)$. Note that in the unrestricted model, with up to $p$ cointegrating relations,
the hypothesis $H_l^{\gamma\mu}(p)$ entails no restrictions as compared with $H_l^{\gamma}(p)$. The squared sample
canonical correlations, $1 > \hat\lambda_1^{\gamma\mu} > \cdots > \hat\lambda_p^{\gamma\mu} > 0$, are now based on
\[ \mathrm{CanCor}\left\{ \Delta X_t, \begin{pmatrix} X_{t-1} \\ t G' E_t \\ M_\perp' E_t \end{pmatrix} \,\Big|\, F_t^{M} \right\}, \tag{4.1} \]
where the notation $F_t^{M}$ indicates that the regressor $E_t$ is replaced by $M' E_t$. The likelihood ratio
test statistics for $H_l^{\gamma\mu}(r)$ are
\[ LR\{H_l^{\gamma\mu}(r) \mid H_l^{\gamma}(r)\} = T \sum_{i=r+1}^{p} \log\{(1 - \hat\lambda_i^{\gamma})/(1 - \hat\lambda_i^{\gamma\mu})\}, \]
\[ LR\{H_l^{\gamma\mu}(r) \mid H_l(r)\} = LR\{H_l^{\gamma\mu}(r) \mid H_l^{\gamma}(r)\} + LR\{H_l^{\gamma}(r) \mid H_l(r)\}, \]
see Johansen (1996, Theorem 6.2), where it is explained that it is convenient to express the first
of these statistics in terms of the small eigenvalues, using the fact that $H_l^{\gamma\mu}(p) = H_l^{\gamma}(p)$.
For the asymptotic analysis the restriction $\mathrm{span}(G) \subset \mathrm{span}(M)$ is crucial for avoiding nuisance
parameters.

Theorem 4.2. Suppose $H_l^{\gamma\mu}(r)$ and Assumption 1 are satisfied. If in addition
$\mathrm{span}(G) \subset \mathrm{span}(M)$ then $LR\{H_l^{\gamma\mu}(r) \mid H_l^{\gamma}(r)\}$ and $LR\{H_l^{\gamma\mu}(r) \mid H_l(r)\}$ are asymptotically
$\chi^2$-distributed with $\{(p - r)(q - m)\}$ and $\{(p - r)(q - m) + r(q - g)\}$ degrees of freedom, respectively.
If $\mathrm{span}(G) \not\subset \mathrm{span}(M)$, the asymptotic distributions of the test statistics involve nuisance
parameters.
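The degrees of freedom in Theorem 4.2 depend only on the dimensions of the problem, so they are easy to tabulate. The helper below is a minimal sketch with a hypothetical name. The example dimensions (five variables, rank three, three sample periods, and restriction matrices with two columns) are those of the empirical illustration in Section 5.

```python
def df_joint_slope_tests(p, r, q, m, g):
    """Degrees of freedom for the two likelihood ratio tests of Theorem 4.2.

    Returns (df_1, df_2), where
      df_1 = (p - r)(q - m)              for the test against the model with
                                         only the cointegrating slope restricted,
      df_2 = (p - r)(q - m) + r(q - g)   for the test against the unrestricted model.
    Valid only when span(G) is contained in span(M).
    """
    df_common_trend = (p - r) * (q - m)
    df_cointegrating_slope = r * (q - g)
    return df_common_trend, df_common_trend + df_cointegrating_slope

df_1, df_2 = df_joint_slope_tests(p=5, r=3, q=3, m=2, g=2)
print(df_1, df_2)
```

With these dimensions the second figure splits as 2 + 3, which is the form in which the degrees of freedom of the joint tests appear in Table 10.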

5. Empirical Illustration
This section illustrates the suggested statistical analysis, applied to a five-dimensional data set with
variables relevant for analysing the Uncovered Interest Parity (UIP) hypothesis between Germany

Table 6. Maximum lag analysis (p-value for the Godfrey test).

k    Akaike    Hannan-Quinn    Schwarz    Godfrey AR(5), chi^2(125)
1    -53.63    -53.00          -52.06     0.001
2    -54.29    -53.26          -51.72     0.261
3    -54.37    -52.93          -50.80     0.231
4    -54.46    -52.62          -49.89     0.758
5    -54.80    -52.56          -49.24     0.846

and Italy. The economic model is very simple, and should be regarded as an illustration rather than
a contribution to the ongoing economic debate. The analysis has been done using MALCOLM
2.4 (Mosconi, 1998), where all the techniques illustrated in this paper are implemented in a
user-friendly, menu-driven environment.
Let us consider the vector
\[ Y_t = (\Delta p_t^I, \Delta p_t^D, \Delta e_{t+1}, i_t^I, i_t^D)' \]
where $\Delta p_t^I$ and $\Delta p_t^D$ are first differences of the log Consumer Price Index and represent inflation rates
in Italy and Germany. The variable $\Delta e_{t+1}$ is the first difference of the log nominal exchange rate
between the Italian Lira and the German Mark (LIT/DM) and represents the rational expectation of future
exchange rates. Finally, $i_t^I$ and $i_t^D$ are Italian and German nominal interest rates on long-term
treasury bonds, given as annual rates divided by 4, to make them dimensionally consistent with
the other variables. As for the sources, prices are from EUROSTAT (except 1973-1975, where
prices are from the UN Monthly Bulletin of Statistics); note that, after October 1990, German prices
refer to unified Germany. Exchange rates are from the Bank of Italy (average quarterly exchange
rates). Interest rates are from IMF, International Financial Statistics. The data are available from
the Econometrics Journal website.
The data, which are shown in Figure 2, are quarterly, ranging from 1973.2 to 1995.4 (T = 91).
To model these data, based on prior knowledge of relevant historical events, we introduce two
breaks. The last observation of the first period is 1979.4, while the last observation of the second
period is 1992.2 (T1 = 27, T2 = 77; v1 = 0.297, v2 = 0.846; a = 0.154, b = 0.297). The first
break coincides with the creation of the EMS, but it is also supposed to catch the oil shock and
the modification of the US monetary policy. The second break corresponds to the exit of Italy
from the EMS, but also to the unification of Germany. The plot clearly shows the presence of
trends: the trend in inflation and interest rates in both countries in the second period is apparent,
but one might suspect trends in some of the variables also in the first and third period. This
suggests modelling the data using model Hl . In fact, the presence of trends in the variables may
be explained within model Hc only by random walks, whereas model Hl allows for interpreting
trends either as related to random walks or trend stationarity. Within model Hl , the nature of the
trend may be decided according to appropriate tests once the cointegration rank is determined in
a setting which is robust with respect to trend stationarity.
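The break fractions quoted above follow directly from the sample sizes. The sketch below recomputes them. It assumes, consistently with the reported values but not stated explicitly in this section, that a and b denote the two smallest relative sample lengths. Variable names are our own.

```python
# Recompute the break fractions of the empirical illustration.
T = 91                      # quarterly observations, 1973.2 to 1995.4
last_obs = [27, 77, 91]     # last observation of periods 1, 2 and 3 (T1, T2, T)

# Relative break points v_j = T_j / T for the two breaks.
v = [t_j / T for t_j in last_obs[:-1]]

# Relative lengths of the three sample periods.
starts = [0] + last_obs[:-1]
lengths = [(end - start) / T for start, end in zip(starts, last_obs)]

# Assumption of this sketch: a and b are the two smallest relative lengths.
a, b = sorted(lengths)[:2]

print([round(x, 3) for x in v])
print([round(x, 3) for x in lengths])
print(round(a, 3), round(b, 3))
```

The rounded values of v, a and b agree with those given in the text, and the longest period has relative length 1 - a - b.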
The analysis to determine the maximum lag $k$ is reported in Table 6. The information criteria
suggest different values of $k$, in which case it is common practice to prefer the Hannan-Quinn
criterion. Therefore, $k = 2$ has been selected, since it is also the first lag to give approximately
white noise residuals, according to the Godfrey test.
Jarque-Bera normality tests, reported in Table 7, show some problems with skewness in the

Figure 2. The data. Panels: inflation in Italy (dashed) and Germany; interest rates in Italy (dashed) and Germany; log difference of the LIT/DM exchange rate. Horizontal axes: 1973 to 1995.

Table 7. Jarque-Bera normality tests (p-values).

Equation             Skewness    Kurtosis    Sk + Kur
$\Delta p_t^I$       0.789       0.155       0.351
$\Delta p_t^D$       0.188       0.152       0.151
$\Delta e_{t+1}$     0.053       0.162       0.058
$i_t^I$              0.541       0.085       0.188
$i_t^D$              0.941       0.015       0.053
System               0.376       0.001       0.004


Table 8. Rank tests.

Hypothesis     Test      p-value
r = 0          256.46    0.000
r <= 1         157.43    0.000
r <= 2         71.96     0.043
r <= 3         29.50     0.621
r <= 4         10.42     0.745

Table 9. Characteristic roots of the models.

Root    r = 5            r = 3
1       0.82             1.00
2       0.12 + 0.72i     1.00
3       0.12 - 0.72i     0.11 - 0.71i
4       0.67 + 0.12i     0.11 + 0.71i
5       0.67 - 0.12i     0.45 - 0.27i

$\Delta e_t$ equation and kurtosis in the $i_t^D$ equation, so that, at the system level, normality is rejected. Given
the illustrative aim of this analysis we did not try to analyse these problems any further. Note,
however, that all residual-based misspecification tests, like Godfrey and Jarque-Bera, should be
modified in the present setting to take into account that the first $k$ residuals of each period are set
to zero by the presence of the dummies $D_{j,t-i}$, whose purpose is to condition upon the first $k$
observations of each period. This might partly explain the problems with kurtosis.
Coming to cointegration analysis, UIP implies that
\[ i_t^I - \{i_t^D + E_t(\Delta e_{t+1})\} = 0 \tag{5.1} \]
so that for an Italian investor the return from investing in Italy equals the expected return from
investing in Germany. The interpretation of the relation in the context of a vector autoregressive
model is
\[ i_t^I - (i_t^D + \Delta e_{t+1}) = \text{zero mean stationary}. \tag{5.2} \]
Therefore, the cointegration rank $r$ is expected to be at least equal to one, but of course it may be
higher, since we do not have theoretical reasons to exclude more stationarity in the data.
The tests for cointegration rank are reported in Table 8. The analysis supports $r = 3$, which
is consistent with our prior expectation. Therefore, we estimate the model with $r = 3$. Table
9 reports the five largest characteristic roots for both the unrestricted and the restricted models,
which seem to be consistent with the I(1) assumption $\mathrm{rank}(\alpha_\perp'\Gamma\beta_\perp) = (p - r)$.
Before trying to set up identifying restrictions on the cointegration space, let us illustrate
some interesting tests on the deterministic components. The slopes of the deterministic trends for
the cointegrating relations are given by the elements of the $(3 \times 3)$-matrix $\gamma$, whose $i$th column
represents the trend coefficients of the $i$th stationary relation in the three different periods. A
suggested routine analysis consists in testing for the exclusion of the linear trend in the stationary

Table 10. Test statistics on several hypotheses, all tested against $H_l(r)$.

Hypothesis                    chi^2(n)    degrees of freedom    p-value
$H_{l,1}^{\gamma}(3)$         7.98        3                     0.046
$H_{l,2}^{\gamma}(3)$         9.66        3                     0.021
$H_{l,3}^{\gamma}(3)$         27.05       3                     0.000
$H_{l,1}^{\gamma\mu}(3)$      12.87       2+3                   0.025
$H_{l,2}^{\gamma\mu}(3)$      10.48       2+3                   0.063
$H^{I}(3)$                    18.00       3+4+5                 0.116

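components in each period. In our example, this is done using the matrices
\[ G_1 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
   G_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
   G_3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \]
and the restrictions
\[ H_{l,i}^{\gamma}(3): \quad \gamma = G_i\varphi. \]
Note that these hypotheses may be also written as
\[ H_{l,i}^{\gamma}(3): \quad \begin{pmatrix} \beta \\ \gamma \end{pmatrix}
   = \begin{pmatrix} I_p & 0 \\ 0 & G_i \end{pmatrix} \begin{pmatrix} \beta \\ \varphi \end{pmatrix}, \]
which is easily implemented in standard cointegration software. The results are given in Table 10,
which shows that a trend stationary component cannot be removed from any of the three periods.
However, the test takes on borderline values for $H_{l,1}^{\gamma}(3)$ and $H_{l,2}^{\gamma}(3)$, whereas the rejection is
much stronger for $H_{l,3}^{\gamma}(3)$.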
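The p-values in Table 10 can be checked from the test statistics and their degrees of freedom. The sketch below uses the closed-form tail probability of a chi-square variable with integer degrees of freedom, so no external library is needed. The degrees of freedom for the first three hypotheses are taken as r(q - g) = 3, following Theorem 4.1. The function name is our own.

```python
import math

def chi2_sf(x, df):
    """Right-hand tail probability of a chi-square variable with integer df."""
    if x <= 0.0:
        return 1.0
    half = 0.5 * x
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = math.exp(-half)
        total = term
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j>=1} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return total

# (statistic, degrees of freedom) pairs read from Table 10.
tests = [(7.98, 3), (9.66, 3), (27.05, 3), (12.87, 5), (10.48, 5), (18.00, 12)]
p_values = [chi2_sf(stat, df) for stat, df in tests]
print([round(p, 3) for p in p_values])
```

Up to rounding, the computed tail probabilities should agree with the p-value column of Table 10.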


Let us now illustrate some joint hypotheses about the common trends and the stationary
components, namely
\[ H_{l,i}^{\gamma\mu}(3): \quad \gamma = M_i\varphi, \quad \mu = \delta M_i' + \alpha\rho' M_{i\perp}' \]
with
\[ M_1 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
   M_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}. \]
Referring to the discussion in Section 4.2, note that, in this case, $M = G$, so that the condition
$\mathrm{span}(G) \subset \mathrm{span}(M)$ of Theorem 4.2 is fulfilled. When $H_{l,1}^{\gamma\mu}(3)$ holds, then $\tau_l M_{1\perp} = 0$ (i.e.
$\tau_l = \kappa M_1'$ for some matrix $\kappa$), so the slopes of both the stationary and the non-stationary components are zero in
the first period. The interpretation of the other restriction is similar. We perform these tests for
illustrative purposes only, since part of $H_{l,i}^{\gamma\mu}(3)$, i.e. $H_{l,i}^{\gamma}(3)$, involving the stationary components,
has already been tested and rejected, although not very strongly. As shown in Table 10, the joint
test also takes on borderline values, rejecting $H_{l,1}^{\gamma\mu}(3)$ and accepting $H_{l,2}^{\gamma\mu}(3)$. Strictly speaking,
this means that, according to the joint test, linear trends may be excluded from both the stationary
and non-stationary components in the second period.
In order to (over)-identify the cointegration space, we suggest the following stationary linear
combinations

\[ z_{1t} = i_t^I - (i_t^D + \Delta e_{t+1}) \]
\[ z_{2t} = (i_t^D - \Delta p_t^D) \]
\[ z_{3t} = (i_t^I - \Delta p_t^I) - (i_t^D - \Delta p_t^D). \]
The equations represent the UIP hypothesis, the German real interest rate, and the real interest
rate differential, respectively. Note that, if these linear combinations are stationary, then also
\[ z_{4t} = z_{1t} - z_{3t} = (\Delta p_t^I - \Delta p_t^D) - \Delta e_{t+1} \]
\[ z_{5t} = z_{2t} + z_{3t} = (i_t^I - \Delta p_t^I) \]
are stationary, and could be used to find an alternative and equivalent basis of the cointegration
space. Identifying restrictions may be written as
\[ H^{I}(3): \quad \begin{pmatrix} \beta \\ \gamma \end{pmatrix} = (B_1 b_1, B_2 b_2, B_3 b_3). \]
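The identities linking the five stationary combinations can be verified mechanically by writing each relation as a coefficient vector on the variables of the system. The short sketch below does this with plain lists. The variable ordering follows the data vector of this section (inflation in Italy, inflation in Germany, exchange rate change, Italian interest rate, German interest rate); the list names are ours.

```python
# Coefficient vectors on (dp_I, dp_D, de, i_I, i_D).
z1 = [0, 0, -1, 1, -1]    # i_I - (i_D + de): UIP relation
z2 = [0, -1, 0, 0, 1]     # i_D - dp_D: German real interest rate
z3 = [-1, 1, 0, 1, -1]    # (i_I - dp_I) - (i_D - dp_D): real rate differential

# Derived combinations.
z4 = [x - y for x, y in zip(z1, z3)]   # z1 - z3
z5 = [x + y for x, y in zip(z2, z3)]   # z2 + z3

print(z4)   # coefficients of (dp_I - dp_D) - de
print(z5)   # coefficients of i_I - dp_I
```

Since the derived vectors are linear combinations of the first three, any three linearly independent vectors among the five span the same cointegration space.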

In order to test the local trend stationarity of $z_{1t}$, $z_{2t}$ and $z_{3t}$, together with some plausible
restrictions on the deterministic part, we set up the following identifying restrictions:
\[ B_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
   B_2 = \begin{pmatrix} 0 & 0 \\ -1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
   B_3 = \begin{pmatrix} -1 \\ 1 \\ 0 \\ 1 \\ -1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \]
which exclude the linear trend from $z_{1t}$ in the second period, from $z_{2t}$ in the first and second
periods, and from $z_{3t}$ in all periods. The degrees of freedom for testing $H^{I}(r)$ against $H_l(r)$
are given by $\sum_{j=1}^{r} (p + q - r - \dim B_j + 1)$, see Johansen (1996, Theorem 7.5). As shown
in Table 10, $H^{I}(3)$ cannot be rejected. Figure 3 represents $z_{1t}$, $z_{2t}$ and $z_{3t}$, together with their
deterministic components estimated under $H^{I}(3)$.
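The degrees of freedom formula quoted above can be evaluated directly for this example. In the sketch below, the dimension of each restriction matrix is taken to be its number of columns (3, 2 and 1), which is an interpretation on our part, and the function name is hypothetical.

```python
def identification_df_terms(p, q, r, dims):
    """Terms (p + q - r - dim B_j + 1) for j = 1, ..., r."""
    if len(dims) != r:
        raise ValueError("one dimension per cointegrating relation is required")
    return [p + q - r - d + 1 for d in dims]

# Empirical illustration: p = 5 variables, q = 3 periods, r = 3 relations,
# restriction matrices with 3, 2 and 1 columns.
terms = identification_df_terms(p=5, q=3, r=3, dims=[3, 2, 1])
total_df = sum(terms)
print(terms, total_df)
```

The three terms correspond to the entry 3+4+5 in the degrees of freedom column of Table 10.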
This shows that in the period (79.1, 92.2), in which Italy belonged to the EMS, $z_{1t}$ is
approximately zero on average, which is evidence in favour of the UIP in that period. Conversely,
the mean of $z_{1t}$ is negative and quite large in the first period, although trending towards zero.
This means that the interest rate in Italy was much lower than predicted by UIP plus rational
expectations. An interpretation could be that the extreme devaluation of the Italian Lira in the 1970s
was unexpected, or in other words (5.2) should be replaced in the first period by
\[ i_t^I - (i_t^D + \Delta e_{t+1}) = \text{zero mean stationary} + \eta_t, \]
where $\eta_t$ represents a systematic bias in expectations. An alternative interpretation of the low
Italian interest rate could be related to the presence of a negative risk premium on Italy: if Italy
is perceived as less risky than Germany, then $i_t^I$ should be lower than $(i_t^D + \Delta e_{t+1})$. However,
this interpretation seems implausible in the 1970s.

Figure 3. The stationary components. Panels: stationary component # 1, stationary component # 2 and stationary component # 3. Horizontal axes: 1973 to 1995.

A similar argument holds in the third period: the mean of $z_{1t}$ is negative at the beginning,
immediately after Italy left the EMS, but is upward trending, turning positive in 1994. However,
the deviation from the UIP would be unlikely to continue trending beyond the relatively short third
sample period. Thus in this situation forecasting should be done cautiously.

Acknowledgements
Comments from Maria Cristina Leali, Pieter Omtzigt, Anders Rahbek, Graziano Viganò and the
referees are gratefully acknowledged. This paper replaces a manuscript by Johansen and Nielsen
from 1993 with the title 'Asymptotics for cointegration rank tests in the presence of intervention
dummies: manual for the simulation program DisCo'.

REFERENCES
Anderson, T. W. (1951). Estimating linear restrictions on regression coefficients for multivariate normal
distributions. Annals of Mathematical Statistics 22, 327–351. Correction in Annals of Statistics 8, 1400
(1980).
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proceedings of the Cambridge
Philosophical Society 34, 33–40.
Chan, N. H., and C. Z. Wei (1988). Limiting distributions of least squares estimates of unstable autoregressive
processes. Annals of Statistics 16, 367–401.
Clements, M. P., and D. F. Hendry (1999). Forecasting Non-stationary Economic Time Series. MIT Press,
Cambridge MA, USA.
Doornik, J. A. (1998). Approximations to the asymptotic distribution of cointegration tests. Journal of
Economic Surveys 12, 573–593.
Doornik, J. A., and D. F. Hendry (1994). Modelling linear dynamic econometric systems. Scottish Journal
of Political Economy 41, 1–33.
Doornik, J. A., D. F. Hendry, and B. Nielsen (1998). Inference in cointegrating models: UK M1 revisited.
Journal of Economic Surveys 12, 533–572.
Hansen, H., and S. Johansen (1999). Some tests for parameter constancy in the cointegrated VAR.
Econometrics Journal 2, 25–52.
Hansen, P. R. (2000). Structural changes in cointegrated processes. Ph.D. Thesis, University of California,
San Diego.
Hendry, D. F. (1997). The econometrics of macroeconomic forecasting. Economic Journal 107, 1330–1357.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28, 321–77.
Inoue, A. (1999). Tests of cointegrating rank with a trend-break. Journal of Econometrics 90, 215–237.
Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control
12, 231–254.
Johansen, S. (1996). Likelihood-based Inference in Cointegrated Vector Autoregressive Models. 2nd printing.
Oxford University Press.
Johansen, S. (2000a). A small sample correction for test of hypotheses on the cointegrating vectors. To
appear in Journal of Econometrics.
Johansen, S. (2000b). A Bartlett correction factor for tests on the cointegrating relations. Econometric Theory
16, 740–778.
Johansen, S. (2000c). A small sample correction of the test for cointegrating rank in the vector autoregressive
model. EUI working paper ECO no. 2000/15.
Krolzig, H.-M., and D. F. Hendry (2000). Computer automation of general-to-specific model selection
procedures. To appear in Journal of Economic Dynamics and Control.
Kuo, B. (1998). Test for partial parameters stability in regressions with I(1) processes. Journal of
Econometrics 86, 337–368.
Mosconi, R. (1998). MALCOLM: The Theory and Practice of Cointegration Analysis in RATS. Venice: Ca
Foscarina. Available online at http://www.greta.it/malcolm.
Nielsen, B. (1997). Bartlett correction of the unit root test in autoregressive models. Biometrika 84, 500–504.
Nielsen, B., and A. Rahbek (2000). Similarity issues in cointegration models. Oxford Bulletin of Economics
and Statistics 6, 5–22.
Perron, P. (1989). The great crash, the oil price shock, and the unit root hypothesis. Econometrica 57,
1361–1401. Erratum (1993) Econometrica 61, 248–249.
Perron, P. (1990). Testing for a unit root in a time series with a changing mean. Journal of Business &
Economic Statistics 8, 153–162. Corrections and Extensions by Perron, P., and Vogelsang, T., 1992,
Journal of Business & Economic Statistics 10, 467–470.
Rappoport, P., and L. Reichlin (1989). Segmented trends and non-stationary time series. Economic Journal
99 supplement, 168–177.
Seo, B. (1998). Tests for structural change in cointegrated systems. Econometric Theory 14, 222–259.

A. Appendix: Mathematical Details


The techniques in the proofs are those of Johansen (1996), which we will refer to in the following as
(J). One major difference is that the monograph focuses on models, $H_{lc}$, where the parameters for the
deterministic terms are unrestricted under the rank hypothesis, whereas here the focus is on models, $H_l$,
where the deterministic component of the process is not affected by the rank hypothesis.

A.1. Proof of Theorem 2.1


Each sub-sample can be considered separately because of the conditioning on the first $k$ observations in
each period. Representation (2.8) therefore follows from Theorem 4.2 (J). In each period the initial value
of the stationary component is given its invariant distribution, so the stationary components of (2.8), $Y_{j,t}$,
are identically distributed with zero expectation. To derive the representations for the slope parameter $\tau_{l,j}$
and the intercept parameter $\beta'\tau_{c,j}$ we find from representation (2.8) for $T_{j-1} + k < t \le T_j$,
\[ E(\beta' X_t) = \beta'\tau_{c,j} + \beta'\tau_{l,j} t, \qquad E(\Delta X_t) = \tau_{l,j}, \]
and taking expectation in (2.6) we obtain, since the dummies are zero,
\[ \tau_{l,j} = \alpha\{\beta'\tau_{c,j} + \beta'\tau_{l,j}(t - 1) + \gamma_j' t\} + \mu_j + \sum_{i=1}^{k-1} \Gamma_i \tau_{l,j}. \]
Identifying coefficients, we find for $\Gamma = I_p - \sum_{i=1}^{k-1} \Gamma_i$
\[ \Gamma\tau_{l,j} = \alpha(\beta'\tau_{c,j} - \beta'\tau_{l,j}) + \mu_j, \]
or

j
j


=

 + 


 l, j + j = 0,


l, j
 c,t

The expressions in the theorem then follow by noting that




C
(C I p )
 + 



(C I p ) (C ) Ir


.


= I p+r .

A.2. Some asymptotic results


The statistical analysis is based on canonical correlation analysis. It is important to note that the sample
canonical correlations are invariant with respect to linear transformations of the data, whereas the
corresponding canonical vectors used for estimating the cointegrating vector transform linearly.
For any two processes $U_t$ and $V_t$ we define the residuals
\[ (U_t \mid V_t) = U_t - \sum_{s=1}^{T} U_s V_s' \left( \sum_{s=1}^{T} V_s V_s' \right)^{-1} V_t. \]
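The residual operation defined above is an ordinary least squares projection without intercept. The following sketch illustrates it in the simplest special case where both processes are scalar, so that the matrix inverse reduces to a division. This restriction to scalars, the function name and the numbers are our own simplifications for illustration.

```python
def residuals(u, v):
    """Residuals (U_t | V_t) for scalar series u and v of equal length.

    Computes U_t - (sum_s U_s V_s) / (sum_s V_s^2) * V_t, the scalar special
    case of the general definition.
    """
    if len(u) != len(v):
        raise ValueError("series must have the same length")
    s_uv = sum(us * vs for us, vs in zip(u, v))
    s_vv = sum(vs * vs for vs in v)
    coef = s_uv / s_vv
    return [ut - coef * vt for ut, vt in zip(u, v)]

# Hypothetical data: correct a series for a linear trend regressor.
u = [1.0, 2.0, 4.0, 7.0]
v = [1.0, 2.0, 3.0, 4.0]
res = residuals(u, v)

# By construction the residuals are orthogonal to the regressor.
print(sum(r * vt for r, vt in zip(res, v)))
```

The orthogonality checked in the last line is the property that makes product moment matrices of such residuals block diagonal when one block is the regressor itself.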

We work under the assumption Hl (r ): = G, see Section 4.1, and define the extended parameter
 , t E  G) enters. Since X exhibits a linear
= (  ,  ) . In the reduced rank problem the process (X t1
t
t
trend according to Theorem 2.1 it is convenient to detrend the levels of the process using the last component
 , t E  G) by {(X



of the vector, t E t G, and replace (X t1
t1 |t G E t ) , t E t G}. Thus, define
t


"

Ip

QT =

T

t=1 X t1 E t Gt

so that

Ig


X t1
t G  Et

#1 
T
 E E  Gt
t
G
t
t
t=1
,

# "

X t1 |t G  E t
t G  Et

= QT

and hence


X t1
t G  Et


=

 

X t1
t G  Et


=

Q T

 

X t1 |t G  E t
t G  Et


.

We define the residual processes



(R0,t , R1,t ) =


X t ,

X t1 |t G  E t
t G  Et

 

Ft

(A.1)

and the residual product moment matrices




S00
S10

S01
S11


=



T 
1 
R0,t
R0,t
.
R1,t
T t=1 R1,t

Note that S11 is block diagonal because the residuals (X t1 |t G  E t ) and t G  E t are orthogonal. The squared

sample canonical correlations, 1 1 p 0 and i = 0 say, for i = p + 1, . . . , p + g of R0


and R1 are then given as solutions to the eigenvalue problem




1
S01 = 0.
S11 S10 S00
The corresponding eigenvectors, vi , satisfy

1
S01 vi .
S11 i vi = S10 S00

The matrix of the r first eigenvectors is denoted 0 and the parameters and are estimated from the
equation



(A.2)
= 0 .
Q T = Q T

A corresponding asymptotic relation can be established for the parameters using the representation in
Theorem 2.1


1
T
T



0








( X t1 + t E t G)E t Gt
t G E t E t Gt
( ) = ,

t=1

t=1

de f

= {  , O P (T 1 )} (  , 0) = ( 0 ) ,
since  X t + t  E t G is stationary.


(A.3)


In order to formulate the asymptotic results for the residuals in model Hl (r ) some notation is needed.
Below we will choose an {( p r )n}-matrix with full column rank. We can then obtain three independent
standard Brownian motions of dimensions r, n, ( p r n), respectively,
int(T
u)

(  1 )1/2  1 T 1/2
  )1/2  T 1/2
(  )1/2  (

t=1
int(T
u)

 )1/2  (   )1/2  T 1/2


(

t=1
int(T
u)

Further define


Wu =

W,u
W ,u

t W,u ,
D

t W ,u .

t=1

t Vu ,

Fu =


Wu
e .
ueu u

To motivate the choice of consider the representation of X given in Theorem 2.1 and use the identity
Ig = G G  + GG  to find
 X =  C

t

i=T j1 +k+1

 G G  E + t  GG  E + O (1),
i + t
t
l t
P
l

(A.4)

for T j1 +k < t T j . The residuals are found by correcting for t G  E t and the variables in Ft . Since t G  E t
 X |t G  E , F ) has a linear trend  G G  (t E |G  E , E ).
is eliminated by regression the residual (
t
t
t
t
t
t
l
 G .
It turns out that the limit distribution depends on the row space of the {( p r ) (q g)}-matrix
l
If it has rank n, say, its row space is spanned by a {(q g) n}-matrix of full rank. Note that also spans
 G .
the row space of

In order to utilize the Brownian motion W in the asymptotic analysis of (A.4) we will transform the
 G . By Assumption 1 the matrices   and   have full rank. Choose a
column space of

  )1/2    G =  . It follows that


{( p r ) n}-matrix so (


 )1/2  (   )1/2   T 1/2
(

 C

int(T
u)


t

W ,u .

t=1

In correspondence with this we proceed to define the directions



B1 =

  )1/2
 (


B2 =

0
Ig


.

These constitute an orthogonal complement of 0 = (  , 0) . Finally, define


 )1/2 , B T 1/2 , B T 1/2 }.
BT = {B1 (

1
2

Lemma A.1. Asymptotic properties of BT R1 . Suppose hypothesis Hl (r ) and Assumption 1 are satisfied.
Define the matrix

I pr n
N  = 0n( pr n)
0g( pr n)

0( pr n)n
0nn
0gn

0( pr n)q
 G  {Iq Z G(G  Z G)1 G  } ,
G

(A.5)



where $Z = \int_0^1 (ue_u|e_u)(ue_u|e_u)'\,du$. Then, as $T \to \infty$,

W ,u
1 
D  
BT R1,int(T u)
G {Iq Z G(G  Z G)1 G  }ueu
T
G  ueu




eu = N  Fu .

(A.6)

If G = Iq then Hl (r ) reduces to Hl (r ), g = q, n = 0 and N = I pr +q .

If G = 0 then Hl (r ) reduces to Hlc (r ), q = 0 and N is given in Theorem 3.3.


The proof follows from (A.4). It is seen from (A.4) that the residual
 (   )1/2    (X |t G  E , F )

t
t

has no trend and behaves asymptotically like a random walk corrected for E t so
D

 )1/2  B  R
T 1/2 (

1 int(T u) (W ,u |eu ).

(A.7)

  )1/2    (X

It also follows from (A.4) that (

t1 |t G E t , Ft ) behaves asymptotically

like  G  (t E t |t G  E t , E t ) and hence




T 1 B1 Rint(T u)  G  (ueu |G  ueu , eu ).

(A.8)

Finally we find
P

T 1 B2 Rint(T u) = T 1 G  {int(T u)E int(T u) |Fint(T u) } G  (ueu |eu ).

(A.9)


The proof is completed by combining (A.7)–(A.9).

For each period the processes X t and  X t1 + t  G  E t can be given stationary initial distributions,
see Theorem 2.1. Apart from a changing level the stationary distributions are identical. Thus define






"00 "0
X t
X t1 , . . . , X tk+1 .
= Var



"0 "
X t1 + t G E t

Lemma A.2. Asymptotic behaviour of Si j . Suppose hypothesis Hl (r ) and Assumption 1 are satisfied. Then,
as $T \to \infty$,




"00 "0
P
S00
S01 0

.
(A.10)
"0 "
0 S10 0 S11 0
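This asymptotic covariance matrix satisfies the identity
$$\Sigma_{00}^{-1} - \Sigma_{00}^{-1}\Sigma_{0\beta}(\Sigma_{\beta 0}\Sigma_{00}^{-1}\Sigma_{0\beta})^{-1}\Sigma_{\beta 0}\Sigma_{00}^{-1} = \alpha_\perp(\alpha_\perp'\Omega\alpha_\perp)^{-1}\alpha_\perp'.$$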

(A.11)
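
Identity (A.11) is purely algebraic once the covariance blocks have the structure Sigma_00 = alpha Sigma_bb alpha' + Omega and Sigma_0b = alpha Sigma_bb. The following numerical check is a minimal sketch under that assumed structure. The dimensions, the random matrices and the helper `orth_complement` are hypothetical and serve only as an illustration.

```python
import numpy as np

def orth_complement(a):
    """Orthonormal basis of the orthogonal complement of span(a); a needs full column rank."""
    _, r = a.shape
    u, _, _ = np.linalg.svd(a, full_matrices=True)
    return u[:, r:]

def a11_gap(p=4, r=2, seed=0):
    """Largest absolute difference between the two sides of the identity."""
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal((p, r))
    # Random positive definite Omega (p x p) and Sigma_bb (r x r).
    a = rng.standard_normal((p, p))
    omega = a @ a.T + p * np.eye(p)
    b = rng.standard_normal((r, r))
    sigma_bb = b @ b.T + r * np.eye(r)
    # Assumed block structure of the covariance matrix.
    sigma_00 = alpha @ sigma_bb @ alpha.T + omega
    sigma_0b = alpha @ sigma_bb
    inv00 = np.linalg.inv(sigma_00)
    middle = np.linalg.inv(sigma_0b.T @ inv00 @ sigma_0b)
    lhs = inv00 - inv00 @ sigma_0b @ middle @ sigma_0b.T @ inv00
    a_perp = orth_complement(alpha)
    rhs = a_perp @ np.linalg.inv(a_perp.T @ omega @ a_perp) @ a_perp.T
    return float(np.max(np.abs(lhs - rhs)))

print(a11_gap())
```

Both sides agree up to rounding error. The right-hand side does not depend on which basis of the orthogonal complement of alpha is used, so the orthonormal basis returned by the SVD is as good as any other.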

The product moment matrices satisfy a joint convergence result


D

T 1 BT S11 BT N 

 1
0

Fu Fu du N ,


 1
(  1 )1/2  1
D
d Vu
S1 BT
(  )1/2  (   )1/2 
Fu N ,

dW
u
0
 )1/2  (   )1/2 
(

(BT S10 , BT S11 0 ) = OP (1),

where S1 = S01  Q T S11 .




(A.12)

(A.13)
(A.14)


The proof of equations (A.12)–(A.14) follows from Lemma A.1. Equation (A.10) follows by noting
that for $T_{j-1} + k < t \le T_j$, representation (2.8) implies
X t = Ct + Y j,t + l E t ,
 X t =  (Y j,t + c E t + tl E t ) =  Y j,t +  c E t t  G  E t ,
since the dummies are all zero. The distribution of Y j,t is the same in all periods. Consequently, the
processes  X t1 + t  G  E t and X t are stationary and the conditional variance, (A.10), is the same in
all periods. Further,


(  X t1 t G  E t , Ft ) = (  X t1 + t  G  E t t G  E t , Ft )
and hence the limit in (A.10) is expressed in terms of the variance of  X t1 + t  G  E t . Finally, equation
(A.11) follows from Lemma 10.1 (J).
The joint convergence of (A.12), (A.13) follows from Chan and Wei (1988).


A.3. Proof of Theorem 3.1


Under Hl (r ) we have G = Ig , so n = 0, N = I pr +g and W = W . The asymptotic result is then found
as in the proof of Theorem 11.1 (J). Note that the proof relies on identity (A.11).


A.4. Proof of Theorem 3.3


The proof resembles that of Theorem 3.1 but G = 0, so g = 0, G = Iq . The asymptotic theory therefore
 .
depends on the n-dimensional row space of

(1) If n = ( p r ) q we find N = {0n( pr ) ,  } and W = W . As in the proof of Theorem 3.1
we find that the asymptotic distribution is of form (3.8). Since N  Fu = (ueu |eu ) is deterministic, see (A.6),
then (3.8) is $\chi^2$-distributed.
(2) If $n = q < (p - r)$ then
N =

I pr q
0

0
0

0



=

I pr q
0

0




I pr q
0

0
0

0
Iq

 )1/2 }. Result (A.13) then


while Wu has components W,u , W ,u . Define M = {(  )1/2 , (

implies that
 1
D
  )1/2  S B

(
M
dWu Fu N .

1 T
0

As in the proof of Theorem 3.1 the limit distribution of L R{Hlc (r )|Hlc ( p)} is

$$\mathrm{tr}\left\{ M \int_0^1 dW_u F_u' N \left( \int_0^1 N' F_u F_u' N\,du \right)^{-1} \int_0^1 N' F_u\,dW_u'\, M' \right\}$$

which reduces to (3.8) since M is orthogonal. Noting that is regular the diagonal matrices diag(I pr q ,  )
cancel.
(3) If $n < \min(q, p - r)$ we arrive at expression (3.8) as in case (2).
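
Case (1) uses the fact that a stochastic integral of a deterministic function with respect to Brownian motion is Gaussian, so that the trace statistic is exactly chi-squared. The sketch below illustrates this by Monte Carlo on a grid. The break fraction 0.4, the dimensions, the grid size and the function names are hypothetical, and the discretised sums only stand in for the integrals.

```python
import numpy as np

def broken_trend_regressor(n_grid=200, breaks=(0.4,)):
    """Grid values of (u e_u | e_u): the broken trend corrected for the period indicators."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    edges = np.concatenate(([0.0], breaks, [1.0]))
    # e[:, j] equals one when u lies in the j-th relative sub-sample.
    e = np.stack([(u > lo) & (u <= hi) for lo, hi in zip(edges[:-1], edges[1:])],
                 axis=1).astype(float)
    ue = u[:, None] * e
    # Residual from regressing u e_u on e_u.
    coef, *_ = np.linalg.lstsq(e, ue, rcond=None)
    return ue - e @ coef

def simulate_trace(dim_w=2, n_rep=4000, seed=1):
    """Draws of tr{(int dW f')(int f f' du)^{-1}(int f dW')} for deterministic f."""
    rng = np.random.default_rng(seed)
    f = broken_trend_regressor()
    n_grid, q = f.shape
    du = 1.0 / n_grid
    ff_inv = np.linalg.inv(f.T @ f * du)
    draws = np.empty(n_rep)
    for i in range(n_rep):
        dw = rng.standard_normal((n_grid, dim_w)) * np.sqrt(du)  # Brownian increments
        a = dw.T @ f                                             # (dim_w x q) stochastic integral
        draws[i] = np.trace(a @ ff_inv @ a.T)
    return draws, dim_w * q

draws, dof = simulate_trace()
print(dof, draws.mean(), draws.var())
```

With a deterministic regressor the simulated mean and variance should be close to the degrees of freedom and twice the degrees of freedom, respectively, as expected for a chi-squared variable.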


A.5. Asymptotic properties of under Hl (r )

The proof of Theorem 4.1 uses the asymptotic properties of 0 , see (A.2). To discuss these it is convenient
to apply the normalization
0 = 0 ( 0 0 )1 .
Since the matrix ( 0 , BT ) has full rank and orthogonal blocks, 0 BT = 0, it follows that I p+g =
0 0 + BT B  . Consequently,
T

0 = 0 + BT B T 0 = 0 + BT UT ,

where UT = B T 0 .

Correspondingly, define = 0 0 so 0 = 0 .

Lemma A.3. Asymptotic behaviour of Si j . Suppose hypothesis Hl (r ) and Assumption 1 are satisfied. Then
in model H (r ) are consistent and
the estimators 0 , ,
and 
l
0 S10 = 0 S10 + oP (T 1/2 ).

0 S11 0 = 0 S11 0 + oP (1)

Moreover, the estimator 0 is asymptotically mixed Gaussian in the sense that


D
T UT = B T ( 0 0 )


N

 1
0

Fu Fu du N

1

N

 1
0

Fu (d Vu ) (  1 )1/2 ,

where $F$ and $N$ are given by (3.3) and (A.5). Note that $V$ and $F$ are independent since $V$ and $W$ are
independent by construction.

The proof follows those of Lemma 13.1 and 13.2 (J).

In Lemma A.5 below the test for a simple hypothesis on is discussed. Both of the parameters and
0
are used in the proof. Thus recall the above definition of the residual product moment matrices Si j where

the levels of the process are detrended. Correspondingly, let Si j be the residual product moment matrices
where the levels are not detrended, hence,



S00 S01
Ip
=

0
S10 S11

0
QT



S00
S10

S01
S11



Ip
0

0
Q T


.

Lemma A.4. Detrending of levels residuals. Suppose hypothesis Hl (r ) and Assumption 1 are satisfied.
Then

 S10 = 0 S10 ,

 S10 = 0 S10 + oP (1),

 S11 = 0 S11 0 ,

 S11 = 0 S11 0 + oP (1).

(A.15)
(A.16)

The proof. The two identities in (A.15) follow from Q T = 0 , see (A.2). For (A.16) combine approximation (A.3) and Lemma A.2. For instance,

 S10 =  Q T S10 = (Q T 0 ) S10 + 0 S10


where Q T 0 is O P (T 1 ), see (A.3), and S10 is of stochastic order O P (T 1/2 ) by (A.10) and (A.14).


Lemma A.5. Test for simple hypothesis on in model Hl (r ). Suppose Assumption 1 and the hypothesis
are satisfied. Then

 
1
 1
 1

1
D

d Vu Fu N N 
Fu Fu du N
N
Fu d Vu .
L R{ |Hl (r )} tr
0

0
0
This variable is $\chi^2\{r(p - r + g)\}$-distributed since $V$ and $F$ are independent.
The proof relies on an expansion of the likelihood functions around . Using Lemma A.4 it is possible

to replace , and Si j with 0 , 0 and Si j without changing the asymptotic results. The remaining
arguments of the proof of Lemma 13.8 (J) can then be followed using Lemma A.3. The number of degrees of freedom
is given by the product of $\dim V = r$ and $\dim(N'F) = p - r + g$.

A.6. Proof of Theorem 4.1





Using Lemma A.5 and defining $Z = \int_0^1 dV_u F_u' \left(\int_0^1 F_u F_u'\,du\right)^{-1/2}$, $M = \left(\int_0^1 F_u F_u'\,du\right)^{1/2} N$, the asymptotic
distribution is found to be

L R{Hl (r )|Hl (r )} = L R{ |Hl (r )} L R{ |Hl (r )}


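$$\xrightarrow{D} \mathrm{tr}(ZZ') - \mathrm{tr}\{ZM(M'M)^{-1}M'Z'\} = \mathrm{tr}\{ZM_\perp(M_\perp'M_\perp)^{-1}M_\perp'Z'\}.$$

Since $M$ is a function of $W$, which is independent of $V$, this variable is $\chi^2$ and independent of $W$. The number of degrees
of freedom is the product of $\dim V = r$ and $\dim(N_\perp'F) = (q - g)$.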
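
The last equality in the display above is the usual decomposition of the identity into the projection on span(M) and the projection on its orthogonal complement. A minimal numerical sketch follows, in which random matrices with arbitrary, hypothetical dimensions stand in for Z and M.

```python
import numpy as np

def projection(m):
    """Orthogonal projection matrix onto the column space of m (full column rank)."""
    return m @ np.linalg.inv(m.T @ m) @ m.T

def trace_decomposition(rows=3, dim=6, rank=4, seed=2):
    """Return both sides of tr(ZZ') - tr(Z P_M Z') = tr(Z P_{M_perp} Z')."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((rows, dim))
    m = rng.standard_normal((dim, rank))
    # Orthonormal basis of the orthogonal complement of span(m) from the full SVD.
    u, _, _ = np.linalg.svd(m, full_matrices=True)
    m_perp = u[:, rank:]
    lhs = np.trace(z @ z.T) - np.trace(z @ projection(m) @ z.T)
    rhs = np.trace(z @ projection(m_perp) @ z.T)
    return lhs, rhs

print(trace_decomposition())
```

The two numbers coincide up to rounding error, whatever the choice of basis for the orthogonal complement.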


A.7. Asymptotic results under the hypothesis Hl

(r )

For the asymptotic analysis of this hypothesis the assumption $\mathrm{span}(G) \subset \mathrm{span}(M)$ is important. Had it not been
satisfied, nuisance parameters would appear. The simplest example of this phenomenon is a model
without breaks, $q = 1$, with one lag, $k = 1$, and a linear trend in the cointegrating relation, $G = I_1 = 1$,
X t = (  X t1 +  t) + + t .
The test for absence of the common deterministic slope, =  or M = 0, is burdened with nuisance
parameters.

The likelihood ratio test statistic for Hl (r ) in Hl (r ) is based on the sample canonical correlations
given in (4.1). Due to the invariance properties of canonical correlations it is equivalent to consider canonical
correlations of the residuals

 E
X t1 |t G  E t , M

 E
Ft .
t G  E t |M
(R0,t , R1,t ) = X t ,
(A.17)
t

 E

M
t

Here Ft indicates that we have only corrected for M  E t rather than for E t . Let Si j denote the corresponding
sample product moment matrices.

The sample product moment matrices Si j can be expressed as




S00

S10

S01

S11

1
S00 + s01 s11
 s10
=
S10

s10

$
 S01
S11
0

%
s01 
,
0

s11

where si j are the sample product moment matrices of the residuals



 E F ).
(r0,t , r1,t ) = ( X t , M
t
t

The expression for S00 follows because the regressor set Ft is smaller than Ft . Further, the first two

components of R1,t are orthogonal to the last component giving the orthogonal structure of S11 and can
be represented as



 
X t1 |t G  E t 
X t1 |t G  E t

M
E
,
F
=
t t
Ft = R1,t ,
t G  Et
t G  Et

see (A.1), implying that the upper left blocks of S11 , S10 are S11 , S10 . The asymptotic properties of Si j
are given in Lemma A.2 and we only need to describe si j .

Lemma A.6. Asymptotics for si j . Suppose hypothesis Hl (r ), Assumption 1 and the condition span(G)
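$\subset$ span(M) are satisfied. Define the $(q - m)$-dimensional function $f_u = M_\perp'(e_u | M'e_u)$. For $T \to \infty$ the
following results hold jointly with those in Lemma A.2:
$$T^{1/2} s_{01} = O_P(1), \tag{A.18}$$
$$s_{11} \xrightarrow{P} \int_0^1 f_u f_u'\,du, \tag{A.19}$$
$$T^{1/2} s_{10}\,\alpha_\perp(\alpha_\perp'\Omega\alpha_\perp)^{-1/2} \xrightarrow{D} \int_0^1 f_u\,dW_u'. \tag{A.20}$$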

It then follows that S00 "00 and hence

(S00 )1 (S00 )1 S01 {  S10 (S00 )1 S01 }1  S10 (S00 )1


P

  )1  ,
(

(A.21)

where is the {( p + g + q m) r }-matrix (  , 0) .


When span(G)  span(M) and the slope parameter satisfies l M  = 0, then s01 = OP (1) and the
limit in (A.21) involves nuisance parameters.
For the proof the main observation is that for a stationary process, $Z_t$, with zero mean and finite variance
and adapted to the natural filtration for $\varepsilon_t$,
$$\sum_{t=1}^T E_t E_t' = O(T), \qquad \sum_{t=1}^T E_t Z_t' = O_P(T^{1/2}), \qquad \sum_{t=1}^T Z_{t-1}\varepsilon_t' = O_P(T^{1/2}). \tag{A.22}$$
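
The second order statement in (A.22) can be illustrated by simulation: for a stationary zero-mean process, the sum over a sub-sample grows like the square root of T, so the sum divided by the square root of T has a spread that does not depend on T. The sketch below assumes an AR(1) process with coefficient 0.5 and a single indicator for the first 40% of the sample. These choices, and the function name, are hypothetical.

```python
import numpy as np

def scaled_sums(T, n_rep=400, rho=0.5, break_frac=0.4, seed=5):
    """Draws of T^{-1/2} sum_t E_{1,t} Z_t for an AR(1) process and a first-period indicator."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_rep, T))
    z = np.zeros((n_rep, T))
    for t in range(1, T):
        # Started at zero, so the process is only approximately stationary.
        z[:, t] = rho * z[:, t - 1] + eps[:, t]
    e1 = (np.arange(1, T + 1) <= break_frac * T).astype(float)  # first sub-sample indicator
    return (z * e1).sum(axis=1) / np.sqrt(T)

small, large = scaled_sums(200), scaled_sums(3200)
print(small.std(), large.std())
```

The two standard deviations are of similar size although the sample length differs by a factor of 16, in line with the stated order of magnitude.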

In order to use those results the differenced process needs to be analysed carefully. Theorem 2.1 and Hl
imply the representation
X t = Ct + Yt E t + l E t ,
where

l = C M  + (C I p )  G  .

If span(G) span(M) then G  M = 0 and l = M  for some parameter and


( X t | M  E t ) = ( Ct + Yt E t | M  E t ).

(A.23)

(A.19): For the asymptotic analysis of $s_{11}$ the regression on lagged differences can be ignored because
of (A.22), (A.23), hence
$$s_{11} = T^{-1}\sum_{t=1}^T (M_\perp'E_t \,|\, M'E_t, D_{j,t-i})(M_\perp'E_t \,|\, M'E_t, D_{j,t-i})' + o_P(1). \tag{A.24}$$


For asymptotic purposes the regression on the dummies also can be ignored and (A.19) follows.
(A.18): For the analysis of s10 note that (A.23) implies that
s10 = T 1

T

 E )( C + Y E | M  E , C

(M
t
t
t t
t
ti + Yti E ti , D j,ti ) .

(A.25)

t=1

Then (A.22) shows that $T^{1/2} s_{10} = O_P(1)$.




 X F ) = (  F ). Using (A.22), (A.23) as
(A.20): The model equation (2.6) implies that (
t
t
t
t
above it follows that
T 1/2 s10 = T 1/2

T


 E )(  M  E , D

(M
t
t
j,ti ) + oP (1),
t

(A.26)

t=1

and this leads to (A.20).

(A.21): Since s10 = OP (T 1/2 ) while S00 , s11 are of order one, then S00 = S00 + oP (1). Using that

 S10 = 0 S10 it then follows that


(S00 )1 (S00 )1 S01 {  S10 (S00 )1 S01 }1  S10 (S00 )1
1
1
1
1
= S00
S00
S01 0 ( 0 S10 S00
S01 0 )1 0 S10 S00
+ oP (1)
  )1  .
which by (A.10), (A.11) converges to (


  = 0. Result (A.23) then fails and s is of
When l M = 0 then span(G)  span(M) and  l M
10


order one. In particular, s01 = l s11 + oP (1) converges to a non-zero value.


A.8. Proof of Theorem 4.2


Suppose span(G) span(M). The proof follows in two steps. First it is established that the likelihood ratio

test statistic for Hl (r ) against Hl ( p) = Hl ( p) converges in distribution to

 
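$$\mathrm{tr}\left\{\int_0^1 dW_u \begin{pmatrix} N'F_u \\ f_u \end{pmatrix}' \left[\int_0^1 \begin{pmatrix} N'F_u \\ f_u \end{pmatrix}\begin{pmatrix} N'F_u \\ f_u \end{pmatrix}' du\right]^{-1} \int_0^1 \begin{pmatrix} N'F_u \\ f_u \end{pmatrix} dW_u'\right\},$$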
where f is defined in Lemma A.6. The proof is the same as that of Theorem 3.1 with BT replaced by


0
BT

.
BT =
0
T 1/2 Iqm

The same type of argument shows that the likelihood ratio test statistic for Hl (r ) against Hl ( p) converges
in distribution to


$$\mathrm{tr}\left\{\int_0^1 dW_u F_u' N \left(\int_0^1 N'F_uF_u'N\,du\right)^{-1} \int_0^1 N'F_u\,dW_u'\right\}.$$
Since F and f are orthogonal we find

L R{Hl

(r )|Hl (r )} = L R{Hl (r )|Hl ( p)} L R{Hl (r )|Hl ( p)}


$$\xrightarrow{D} \mathrm{tr}\left\{\int_0^1 dW_u f_u' \left(\int_0^1 f_uf_u'\,du\right)^{-1} \int_0^1 f_u\,dW_u'\right\},$$

which is $\chi^2$ with $\dim W = (p - r)$ times $\dim f = (q - m)$ degrees of freedom.
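
The step "since F and f are orthogonal" uses that the projection on two mutually orthogonal blocks of regressors is the sum of the two separate projections, so that the difference of the two trace statistics leaves only the term in f. The sketch below checks this finite-dimensional analogue. The matrices are random, hypothetical stand-ins, and `b` is made orthogonal to `a` by construction.

```python
import numpy as np

def projection(x):
    """Orthogonal projection matrix onto the column space of x (full column rank)."""
    return x @ np.linalg.inv(x.T @ x) @ x.T

def split_check(n=50, ka=3, kb=2, ky=2, seed=4):
    """Return (joint trace minus trace for a alone, trace for b alone)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, ka))
    b = rng.standard_normal((n, kb))
    b = b - projection(a) @ b          # make b orthogonal to a
    y = rng.standard_normal((n, ky))
    joint = np.trace(y.T @ projection(np.hstack([a, b])) @ y)
    only_a = np.trace(y.T @ projection(a) @ y)
    only_b = np.trace(y.T @ projection(b) @ y)
    return joint - only_a, only_b

print(split_check())
```

The two returned numbers agree up to rounding error. Without the orthogonalisation step they would in general differ, which is the finite-dimensional counterpart of the role played by the orthogonality of F and f.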



For the joint test we use

L R{Hl

(r )|Hl (r )} = L R{Hl

(r )|Hl (r )} + L R{Hl (r )|Hl (r )}.

From the proof of Theorem 4.1 we have that L R{Hl (r )|Hl (r )} is asymptotically $\chi^2$ and independent of $W$.

The asymptotic distribution of L R{Hl (r )|Hl (r )} is therefore a sum of two independent $\chi^2$-distributions.
If $\mathrm{span}(G) \not\subset \mathrm{span}(M)$ then (A.21) fails and the above arguments do not hold. It follows that nuisance
parameters are involved in the asymptotic distributions.
