
MULTICOLLINEARITY, AUTOCORRELATION, AND RIDGE REGRESSION

by

JACKIE JEN-CHY HSU

B.A. in Econ., The National Taiwan University, 1977

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE (BUSINESS ADMINISTRATION)

in
THE FACULTY OF GRADUATE STUDIES
THE FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

February 1980

(c) Jackie Jen-Chy Hsu, 1980

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Commerce & Business Admin.

The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada
V6T 1W5

Date: Feb. 8, 1980

ABSTRACT

The presence of multicollinearity can induce large variances in the ordinary least-squares estimates of regression coefficients. It has been shown that ridge regression can reduce this adverse effect on estimation. The presence of serially correlated error terms can also cause serious estimation problems. Various two-stage methods have been proposed to obtain good estimates of the regression coefficients in this case. Although the multicollinearity and autocorrelation problems have long been recognized in regression analysis, they are usually dealt with separately.

This thesis explores the joint effects of these two conditions on the mean square error properties of the ordinary ridge estimator as well as the ordinary least-squares estimator. We show that ridge regression is doubly advantageous when multicollinearity is accompanied by autocorrelation in both the errors and the principal components. We then derive a new ridge-type estimator that is adjusted for autocorrelation. Finally, using simulation experiments with different degrees of multicollinearity and autocorrelation, we compare the mean square error properties of various estimators.

TABLE OF CONTENTS

1. INTRODUCTION
2. NOTATION AND PRELIMINARIES
3. MULTICOLLINEARITY
   3.1 Sources
   3.2 Effects
   3.3 Detection
4. AUTOCORRELATION
   4.1 Sources
   4.2 Effects
   4.3 Detection
5. JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION
   5.1 Mean Square Error of the OLS Estimates of β
   5.2 Mean Square Error of the Ridge Estimates of β
   5.3 When Will Ridge Estimates Be Better than the OLS Estimates?
   5.4 Use of the "Ridge Trace"
6. RIDGE REGRESSION: ESTIMATES, MEAN SQUARE ERROR AND PREDICTION
   6.1 Derivation of the Ridge Estimator for a CLR Model
   6.2 Derivation of the Ridge Estimator for an ALR Model
   6.3 Mean Square Error of the "Generalized Estimates"
   6.4 Estimation
   6.5 Prediction
7. THE MONTE CARLO STUDY
   7.1 Design of the Experiments
   7.2 Sampling Results
       7.2a Results assuming ρ is known
       7.2b Results assuming ρ is unknown
       7.2c Forecasting
8. CONCLUSIONS
REFERENCES

1. INTRODUCTION

Multicollinearity and autocorrelation are two very common problems in regression analysis. As is well known, the presence of some degree of multicollinearity results in estimation instability and model misspecification, while the presence of serially correlated errors leads to underestimation of the variances of parameter estimates and inefficient prediction. Because these two conditions have adverse effects on estimation and prediction, a wide range of tests and methods have been developed to reduce their impact. Invariably, the multicollinearity and autocorrelation problems are dealt with separately in most if not all of the preceding work.

In this thesis we address the question "What are the joint effects of multicollinearity and autocorrelation on estimation and prediction?" Thereafter we study analytically the possible changes in the effectiveness of various estimation methods in the joint presence of these two conditions. As a result of these new findings, a new ridge estimator adjusted for autocorrelation is then proposed and its properties are investigated by conducting a simulation study.

We briefly outline this thesis. Section 2 provides the setting for our analysis. Sections 3 and 4 give a general discussion of the problems of multicollinearity and autocorrelation. In addition, we comment on the validity of various existing diagnostic tests. The analytical study of the joint effects of multicollinearity and autocorrelation is presented in Section 5. In Section 6, a new ridge estimator adjusted for autocorrelation is derived and its mean square error properties are analyzed. Also, we discuss how these new estimates can be obtained in practice. The methodology and the results of sampling experiments appear in Section 7. The thesis concludes with the presentation of several two-stage methods that can be used with the new ridge rule and that hopefully will achieve better estimates and predictions.

2. NOTATION AND PRELIMINARIES

The Classical Linear Regression (CLR) model can be represented by the equation

(2.1)    Y = Xβ + e

where Y is an n×1 vector of observations on the dependent variable, X is an n×p matrix of observations on the explanatory variables, β is a p×1 vector of regression coefficients to be estimated, and e is an n×1 vector of true error terms. The standard assumptions of the linear regression model are:

(1) E(e) = 0, where 0 is the zero vector.
(2) E(eeᵀ) = σ²I, where I is the identity matrix.
(3) The explanatory variables are non-stochastic, hence they are independent of the error terms.
(4) Rank(X) = p < n.

The Ordinary Least-squares (OLS) estimator of β is given by

(2.2)    β̂_OLS = (XᵀX)⁻¹XᵀY

with variance-covariance matrix

(2.3)    Var(β̂_OLS) = σ²(XᵀX)⁻¹.

For simplicity, we will assume that (XᵀX) is in correlation form. Let P be the p×p orthogonal matrix such that PᵀXᵀXP = Λ, where Λ is a diagonal matrix with the eigenvalues of (XᵀX), λ_1, ..., λ_p, displayed on the diagonal. We assume further that λ_1 ≥ λ_2 ≥ ... ≥ λ_p.

After applying an orthogonal rotation P, it follows from (2.1) that

(2.4)    E(Y) = XPPᵀβ = X*α

where X* = XP is the data matrix represented in the rotated coordinates, and the columns of X* are linearly independent. The vector α = Pᵀβ is the vector of regression coefficients of the principal components. It follows that the OLS estimator of α is given by

(2.5)    α̂_OLS = Pᵀβ̂_OLS.

We will consider ridge estimators for β of the form

(2.6)    β̂_R(k) = (XᵀX + kI)⁻¹XᵀY,    0 < k < 1.

Where kI is replaced by a symmetric nonnegative definite matrix independent of β̂_OLS, the estimator is said to be a "generalized ridge estimator" [11:p.63]. When k is a function of β̂_OLS, β̂_R(k) is called an "adaptive ridge estimator" [12]. Expressed in the rotated coordinates, the ridge estimator for α is

(2.7)    α̂_R(k) = Z α̂_OLS,    where Z = (Λ + kI)⁻¹Λ.

By substituting X* = XP in (2.6),

β̂_R(k) = (PΛPᵀ + kI)⁻¹PΛ α̂_OLS = P(Λ + kI)⁻¹Λ α̂_OLS.

It follows from (2.7) that

(2.8)    β̂_R(k) = P α̂_R(k).
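As a numerical illustration of (2.6)-(2.8) (a minimal sketch with simulated data and NumPy, not part of the thesis itself), the ridge estimate computed directly agrees with the one computed in the rotated principal-component coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 3, 0.1

# Simulated CLR data: Y = X beta + e with independent errors.
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / (X.std(0) * np.sqrt(n))   # put X'X in correlation form
beta = np.array([1.0, 2.0, -1.0])
Y = X @ beta + 0.5 * rng.standard_normal(n)

# Ridge estimate directly from (2.6).
ridge_direct = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)

# Same estimate via the rotated coordinates (2.7)-(2.8).
lam, P = np.linalg.eigh(X.T @ X)                 # P'X'XP = Lambda
alpha_ols = P.T @ np.linalg.solve(X.T @ X, X.T @ Y)
alpha_ridge = (lam / (lam + k)) * alpha_ols      # Z = (Lambda + kI)^{-1} Lambda
ridge_rotated = P @ alpha_ridge

assert np.allclose(ridge_direct, ridge_rotated)
```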

For the CLR model, assumption (2) that the errors are uncorrelated is often violated in practice. This leads to the formulation of the Autoregressive Linear Regression (ALR) model. Mathematically, the ALR model is given by replacing assumptions (2) and (3) by assumptions (2') and (3') below.

(2') E(eeᵀ) = σ²Ω, where Ω is a nondiagonal positive definite matrix.
(3') x_t, the t-th observation on the explanatory variables, is independent of the contemporaneous and succeeding errors e_t, e_{t+1}, ...

We assume that the error term e_t follows a first-order autoregressive scheme, that is,

(2.9)    e_t = ρ e_{t-1} + U_t

where ρ is the autocorrelation coefficient and U_t satisfies the following for all t:

(2.10)    E(U_t) = 0,    E(U_t²) = σ_u²,    E(U_t U_{t+s}) = 0 for s ≠ 0,

and

(2.11)    E(eeᵀ) = σ_u² V

where V is the n×n matrix with (i,j) element ρ^|i−j|/(1 − ρ²), that is,

V = (1/(1 − ρ²)) [ρ^|i−j|],    i, j = 1, ..., n.

We require that |ρ| < 1.
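The structure of (2.11) can be checked by simulation; a sketch assuming NumPy, with illustrative parameter values, generating many AR(1) error series and comparing their empirical covariance with σ_u²V:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, sigma_u = 5, 0.6, 1.0

# E(ee') = sigma_u^2 * V with V[i, j] = rho^|i-j| / (1 - rho^2)  as in (2.11).
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)

# Monte Carlo check: simulate the AR(1) scheme (2.9) with a stationary start.
reps = 200_000
e = np.empty((reps, n))
e[:, 0] = rng.standard_normal(reps) * sigma_u / np.sqrt(1 - rho ** 2)
for t in range(1, n):
    e[:, t] = rho * e[:, t - 1] + sigma_u * rng.standard_normal(reps)

emp_cov = e.T @ e / reps
assert np.allclose(emp_cov, sigma_u ** 2 * V, atol=0.05)
```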

For an ALR model, "Generalized Least-squares" (GLS) will give the "Best Linear Unbiased Estimator" (BLUE) of β, denoted as β̂_GLS. The matrix Ω can be written

Ω = QQᵀ

where Q is nonsingular. Hence

Q⁻¹Ω(Q⁻¹)ᵀ = I

and β̂_GLS can be obtained by making the following substitution in the ALR model:

Ỹ = Q⁻¹Y,    X̃ = Q⁻¹X,    ẽ = Q⁻¹e.

Then it follows that

(2.12)    Ỹ = X̃β + ẽ.

Since (2.12) satisfies all the assumptions of a CLR model, OLS will give the BLUE of β. Hence it follows that

(2.13)    β̂_GLS = (XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹Y.

For prediction, formula (2.15) gives the "Best Linear Unbiased Predictor" (BLUP) in a first-order ALR model:

(2.15)    Ŷ_{t+1} = X_{t+1}β̂_GLS + ρ ê_t

where ê_t is the t-th GLS residual.
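For first-order autocorrelation, the transformation Q⁻¹ of (2.12) is the familiar quasi-differencing (Prais-Winsten) transform. A minimal sketch assuming NumPy (simulated data, illustrative names) showing that the transformed OLS fit reproduces the explicit GLS formula (2.13) with Ω = V:

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 200, 0.7

x = rng.standard_normal(n)
e = np.empty(n)
e[0] = rng.standard_normal() / np.sqrt(1 - rho ** 2)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.standard_normal()
y = 2.0 * x + e
X = x[:, None]

# GLS by transforming the model so its errors satisfy the CLR assumptions:
# first row scaled by sqrt(1-rho^2), remaining rows quasi-differenced.
def prais_winsten(X, y, rho):
    Xs = np.vstack([np.sqrt(1 - rho ** 2) * X[:1], X[1:] - rho * X[:-1]])
    ys = np.concatenate([np.sqrt(1 - rho ** 2) * y[:1], y[1:] - rho * y[:-1]])
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

beta_gls = prais_winsten(X, y, rho)

# Same estimate from (2.13) with Omega = V as in (2.11).
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)
Vinv = np.linalg.inv(V)
beta_formula = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
assert np.allclose(beta_gls, beta_formula)
```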

3. MULTICOLLINEARITY

In applying multiple regression models, some degree of interdependence among explanatory variables can be expected. As this interdependence grows and the correlation matrix (XᵀX) approaches singularity, multicollinearity constitutes a problem. Therefore it is preferable to think of multicollinearity in terms of its "severity" rather than its "existence" or "nonexistence".

3.1 Sources

In general, multicollinearity can be considered to be a symptom of poor experimental design. The sources of severe multicollinearity may be classified as follows [20:p.99-101].

(i) Not enough data or too many variables

In many cases large data sets only contain a few basic factors. As the number of variables extracted from the data increases, each variable tends to measure different nuances of the same basic factors, and each highly collinear variable has only little information content. In this case, deleting some variables or collecting more data can usually solve the problem.

(ii) Physical or structural singularity

Sometimes highly collinear variables, due to mathematical or physical constraints, are inadvertently included in the model.

(iii) Sampling singularity

Due to expense, accident or mistake, sampling was conducted in only a small region of the design space.

3.2 Effects

The major effects are:

(i) Estimation instability

As the correlation matrix (XᵀX) becomes ill-conditioned, the elements of the inverse matrix (XᵀX)⁻¹ explode. (2.3) shows that the variances of the OLS estimates of the regression coefficients are affected by the diagonal elements of the inverse matrix (XᵀX)⁻¹. As a result of serious multicollinearity, the inverse matrix may be numerically impossible to obtain. In any case, the OLS estimates are quite sensitive to small changes in the data set.

(ii) Structure misspecification

The increase in the explained variance of Y depends on the information content of each explanatory variable, rather than on the size of the variable set of X. As each member of a relatively large variable set becomes collinear, the sample significance of each explanatory variable decreases, thereby decreasing each variable's apparent contribution to the explained variance of Y, even though Y really depends on the variables in X. As asserted by many authors [6:p.94][13:p.160][15], data limitation rather than theoretical limitation is responsible for the tendency to underspecify models in the process of model-building. Therefore, erroneous deletion of variables may happen.

(iii) Forecast inaccuracy

If an important variable is omitted because it is highly collinear, but in the later prediction period its behavior changes and it moves independently of the other variables, then any forecasting under this oversimplified model will be very inaccurate.

(iv) Numerical problems

The correlation matrix (XᵀX) is not invertible if the columns of X are linearly dependent. With the matrix (XᵀX) being singular, the OLS estimates of β, represented by (2.2), are completely indeterminate. In the case of an almost singular set of variables, the numerical instability in calculating the inverse matrix (XᵀX)⁻¹ still remains.

3.3 Detection
Tests for the presence and location of serious multicollinearity are briefly outlined and followed by comments.

(i) Tests based on various correlation coefficients

Here, harmful multicollinearity is generally recognized by rules of thumb. For instance, an admitted rule of thumb requires the simple pair-wise correlation coefficients of explanatory variables to be less than 0.8. Certainly, more extended and sophisticated rules of thumb with prudent use of various correlation coefficients will give more satisfactory results. The following rule of thumb is generally considered to be superior to other rules: a variable is said to be highly multicollinear if its coefficient of multiple correlation, R_i², with the remaining (p−1) variables is greater than the coefficient of multiple correlation, R², with all the explanatory variables [14:p.101]. The variance of the estimate of β_i can be expressed as follows [9]:

(3.1)    Var(β̂_i) = (σ_y² / σ_{x_i}²) · (1 − R²)/(1 − R_i²)

where σ_y² is the variance of the dependent variable Y and σ_{x_i}² is the variance of the explanatory variable x_i. From (3.1) it is obvious that multicollinearity constitutes a problem only when R_i² is high relative to R². Unfortunately, the geometric interpretation of this rule of thumb is apparent only when there are two explanatory variables [6:p.98].

(ii) Three-stage hierarchy test

This test is proposed by Farrar and Glauber [6]. At the first stage, if the null hypothesis H₀: |XᵀX| = 1 is rejected based on the Wilks-Bartlett's test, we may assert that multicollinearity is severe and move toward the second stage. There the F statistic is computed for each R_i²:

F_i = (R_i²/(p−1)) / ((1 − R_i²)/(n−p)),    i = 1, ..., p.

A statistically significant F_i implies that x_i is collinear. At the third stage, inspection of the partial correlation coefficients between x_i and the remaining (p−1) variables, and of the associated t-ratios, can show the pattern of interdependency among the explanatory variables. Farrar and Glauber claimed that detecting, localizing and learning the pattern of severe multicollinearity among explanatory variables can thus be respectively achieved at the three different stages of their test.

(iii) Haitovsky Chi Square test

In 1969, Haitovsky [9] proposed a heuristic statistic for the test of the hypothesis of severe multicollinearity. This heuristic statistic is a function of the determinant of the correlation matrix (XᵀX), and is approximately distributed as Chi Square. Applications to Farrar and Glauber's data show that this test gives more satisfactory results than the Wilks-Bartlett's test adopted at the first stage of the Farrar and Glauber three-stage test. Therefore Haitovsky claimed the superiority of his test and suggested replacing the Wilks-Bartlett's test by his test in the Farrar and Glauber three-stage test.

However, any test based on the determinant of the correlation matrix has some built-in deficiencies. As will be shown later, the mean square error properties depend only on the eigenvalues of the matrix (XᵀX). Only when (XᵀX) has a broad eigenvalue spectrum, that is to say the ratio of the largest eigenvalue to the smallest one, λ_1/λ_p, is large, may the performance of the OLS estimates deteriorate. Since the determinant of the correlation matrix is equal to the product of all the eigenvalues, this test will treat a matrix having a relatively broad eigenvalue spectrum equivalently to those having narrow eigenvalue spectra, so long as they have the same or nearly the same determinants. The relative magnitude of the eigenvalues is difficult if not impossible to infer from the results of any test that is based on the determinant of the correlation matrix. However, the Haitovsky test gives a fairly good indication in the presence of severe multicollinearity in our simulation study.
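The R_i² quantities used in (i) and (ii) are obtained from auxiliary regressions of each explanatory variable on the remaining (p−1) variables. A minimal sketch with NumPy and simulated data (illustrative helper names, not part of the thesis):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)   # nearly collinear with x1
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

def r_squared(y, Z):
    # R^2 of an OLS regression of y on Z (with intercept).
    Z1 = np.column_stack([np.ones(len(y)), Z])
    resid = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# R_i^2: multiple correlation of variable i with the remaining (p-1) variables.
p = X.shape[1]
R2_aux = [r_squared(X[:, i], np.delete(X, i, axis=1)) for i in range(p)]
print([round(r, 3) for r in R2_aux])   # x1 and x2 are flagged, x3 is not
```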


(iv) Examining the spectrum of the matrix (XᵀX)

If the matrix (XᵀX) has a broad eigenvalue spectrum, that is, λ_1/λ_p is large, then the mean square error of the OLS estimates of β becomes very large. Since the trace of the correlation matrix (XᵀX) is equal to the number of explanatory variables p, an arbitrary rule of thumb may consider λ_1/λ_p large if λ_1/λ_p > p. Besides, the minimax index MMI = Σ_i λ_i/λ_1 is a useful indicator too. A small MMI, say less than two, implies the presence of multicollinearity [21:p.13-14].

Among all these tests and methods proposed, examining the eigenvalue spectrum of the matrix (XᵀX) provides not only a sound theoretical basis but also the lightest computation burden.
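The eigenvalue-spectrum check of (iv) takes only a few lines; a sketch assuming NumPy, with two deliberately near-collinear simulated regressors:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

# Two nearly collinear explanatory variables plus an unrelated one.
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(0)) / (X.std(0) * np.sqrt(n))    # correlation form

lam = np.linalg.eigvalsh(X.T @ X)[::-1]          # eigenvalues, largest first
p = X.shape[1]
assert np.isclose(lam.sum(), p)                  # trace of a correlation matrix is p
print("condition ratio lambda_1/lambda_p:", lam[0] / lam[-1])
print("broad spectrum (rule of thumb):", lam[0] / lam[-1] > p)
```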

4. AUTOCORRELATION

One of the basic assumptions of the CLR model is that the error terms are independent of each other. However, when regression analysis is applied to time series data, the residuals are often found to be serially correlated. Like multicollinearity, autocorrelation is another widespread problem in applying regression models. For simplicity, first-order autocorrelation is assumed in our study.

4.1 Sources

The sources are mainly the following:

(i) Omission of variables

The time-ordered effects of the omitted variables will be included in the error terms. This prevents the errors from displaying random behavior. In this case, finding the missing variables and identifying the correct relationship can solve the problem.

(ii) Systematic measurement error in the dependent variable

Again, the error terms absorb the systematic measurement error in the dependent variable and then display non-random behavior.

(iii) Error structure is time dependent

The great impacts of some random events or shocks, such as wars, strikes, floods, etc., are spread over several periods of time, causing the error terms to be serially correlated. This is the so-called "true autocorrelation".

4.2 Effects

When the OLS technique is still used for estimation, the major effects are:

(i) Unbiased but inefficient estimation of β

GLS provides the BLUE of β when the dispersion matrix of e, σ²Ω, is nondiagonal. That is to say, on the average the sampling variances of the GLS estimates of β are less than those of the OLS estimates of β; hence OLS is inefficient compared with GLS.

(ii) Underestimation of the variances of the estimates of β

As an illustration, consider the very simple model

y_t = βx_t + e_t,    e_t = ρ_e e_{t-1} + U_t

where U_t satisfies assumptions (2.10). It has been shown that the variance of the OLS estimate of β is [13:p.247]

(4.1)    Var(β̂_OLS) = (σ_e² / Σ_{t=1}^n x_t²) [ 1 + 2ρ_e (Σ_{t=1}^{n-1} x_t x_{t+1}) / (Σ_{t=1}^n x_t²) + 2ρ_e² (Σ_{t=1}^{n-2} x_t x_{t+2}) / (Σ_{t=1}^n x_t²) + ... + 2ρ_e^{n-1} (x_1 x_n) / (Σ_{t=1}^n x_t²) ].

The OLS formula (2.3) ignores the bracketed term in (4.1) and gives the variance of the estimate of β as σ_e²/Σx_t². If both e and x are positively autocorrelated, the expression in brackets is almost certainly greater than unity; therefore the OLS formula will underestimate the true variance of β̂_OLS.

(iii) Inefficient prediction of Y

When autocorrelation is present, the error made at one point in time gives information about the error made at a subsequent point in time. The OLS predictor fails to take this information into account; hence it is not the BLUP of Y [13:p.265-266].
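The bracketed correction factor in (4.1) is easy to evaluate; a sketch assuming NumPy, with a positively autocorrelated regressor (illustrative values), showing the factor exceeds one so the naive formula understates the variance:

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho = 100, 0.6

# A positively autocorrelated regressor x_t.
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

sxx = x @ x
# Bracketed factor in (4.1): 1 + 2 sum_s rho^s (sum_t x_t x_{t+s}) / sum_t x_t^2.
factor = 1 + 2 * sum(rho ** s * (x[:-s] @ x[s:]) / sxx for s in range(1, n))
print("correction factor:", factor)
assert factor > 1   # sigma_e^2 / sum x_t^2 understates the true variance
```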


4.3 Detection

The tests which are commonly used to recognize the existence of first-order autocorrelation are the following.

(i) Eye-ball tests

The plot of the OLS residuals ê_t against time t can be informative. Any nonrandom behavior of ê_t can be considered as an indication of autocorrelation. We may also plot the OLS residual ê_t against its lagged value ê_{t-1}. If the observations are not evenly spread over the four quadrants, we may conclude that first-order autocorrelation is present. These eye-ball tests are quite effective; however, they are imprecise and do not lend themselves to classical inferential methods.

(ii) von Neumann ratio

In 1941, the ratio of the mean square successive difference to the variance was proposed by von Neumann as a test statistic for the existence of first-order autocorrelation [22]. Though various applications have proven the usefulness of the von Neumann ratio, we emphasize that this test is applicable only when the e values are independently distributed and the sample size is large. In practice, the OLS residuals used to compute the von Neumann ratio are usually not independently distributed even when the true error terms are.
(iii) Durbin-Watson test

This test, named after its originators Durbin and Watson, is widely used for small sample sizes [4][5]. There are some shortcomings of the Durbin-Watson test. First, there exist two regions of indeterminacy. Though an exact test was suggested by Henshaw in 1966, its heavy computational burden prevents the test from wide application [10]. Secondly, the Durbin-Watson test is derived for non-stochastic explanatory variables only. It has been shown that if lagged dependent variables are present, either in single regression equation models or in systems of simultaneous regression equations, the Durbin-Watson test is biased towards the value for a random error, that is, biased towards 2, thereby giving very misleading information [17]. It is both necessary and important to test for serial correlation in models containing lagged dependent variables, since autocorrelated models are usually repaired by inserting lagged Y values into the right-hand side of the regression equation. To this end, Durbin developed a test based on the h statistic in 1970 [3]. "h" is defined as the following:

h = ρ̂ √( n / (1 − n·Var(b')) )

where ρ̂ is the estimated first-order autocorrelation of the OLS residuals and Var(b') is the estimated sampling variance of b', the coefficient of Y_{t-1}. This test is computationally cheap but only applicable for large sample sizes. The small sample properties of the "h" statistic are still unknown.
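The Durbin-Watson statistic itself is a one-line computation on the OLS residuals, d = Σ(ê_t − ê_{t-1})² / Σê_t², which is roughly 2(1 − ρ̂). A sketch assuming NumPy, with simulated AR(1) errors (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.standard_normal(n)
e = np.empty(n)
e[0] = rng.standard_normal()
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()
y = 1.5 * x + e

# OLS residuals from a regression of y on an intercept and x.
X = np.column_stack([np.ones(n), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Durbin-Watson statistic; values well below 2 signal positive autocorrelation.
d = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
rho_hat = 1 - d / 2
print("DW statistic:", d)
```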

5. JOINT EFFECTS OF MULTICOLLINEARITY AND AUTOCORRELATION

In statistical analysis, a point estimate is usually of little use unless accompanied by an estimate of its accuracy. In this connection, Mean Square Error (MSE) is widely used as a measure of accuracy. Since accurate parameter estimates constitute an effective model, MSE can be used to determine the model's effectiveness when the underlying objective is simply to obtain good parameter estimates.

In their 1970 paper, Hoerl and Kennard presented the MSE properties of the OLS and ridge estimates of β [11]. Thereafter the results of various studies have confirmed that ridge regression will improve the MSE of estimation and prediction in the presence of severe multicollinearity.

In this section, we will present expressions for the MSE of β̂_OLS and β̂_R(k) when the error terms follow the first-order autocorrelated pattern (2.9). These expressions will enable us to examine the effect of these two conditions on the ridge and the OLS estimates. Our analysis can be reduced to that of Hoerl and Kennard by setting ρ = 0.

5.1 Mean Square Error of the OLS Estimates of β

We begin with the analysis for the OLS estimates for a first-order ALR model. Let

L_1 = Distance from β̂_OLS to β,
L_1² = (β̂_OLS − β)ᵀ(β̂_OLS − β).

We define the MSE of β̂_OLS to be E(L_1²).

Proposition 5.1

(5.1)    E(L_1²) = σ_u² Σ_{j=1}^n Σ_{i=1}^n D_ij V_ji

where D = X(XᵀX)⁻²Xᵀ.
Proof:

( 5 . 2 )

From (2.1)

^OLS "

(2.2)

? P ? I ~

- 1

= (X X) X (X+e) - 3T

-1

= (X X)
l

-1

X. e

By d e f i n i t i o n and (5.2) i f follows that

E(L ) = E [ ( 6
2

-B) (i
T

O L S

O L S

-3)]

= gt^xonp X e] .
Noting that E(e) = 0 i t follows from Theorem 4.6.1

G r a y b i l l [7:p.l39]

that
(5.3)

E(L ) = a t . [ X ( X X ) " X V ]
2

From the d e f i n i t i o n of V and D, (5.1) follows.


(5.1) does not give much insight into the effect of multicollinearity and autocorrelation on the MSE of β̂_OLS. By rotating axes (using principal components) the effect can be more clearly demonstrated.

Proposition 5.2

(5.4)    E(L_1²) = (σ_u²/(1 − ρ²)) [ Σ_{i=1}^p 1/λ_i + 2 Σ_{i=1}^p (1/λ_i²) Σ_{l=1}^{n-1} Σ_{j=l+1}^{n} ρ^{j−l} x*_{ji} x*_{li} ]

where x*_{ji} is the j-th observation on the i-th principal component. (For a matrix A we use the notation tr(A) to denote the trace of A.)

Proof: From (2.4) and (2.5),

(5.5)    α̂_OLS − α = (X*ᵀX*)⁻¹X*ᵀY − α = (X*ᵀX*)⁻¹X*ᵀ(X*α + e) − α = (X*ᵀX*)⁻¹X*ᵀe.

By definition and (5.5) it follows that

E(L_1²) = E[(β̂_OLS − β)ᵀPPᵀ(β̂_OLS − β)] = E[(α̂_OLS − α)ᵀ(α̂_OLS − α)] = E[eᵀX*(X*ᵀX*)⁻²X*ᵀe].

Hence, by the same argument used in proving Proposition 5.1,

E(L_1²) = σ_u² tr[X*Λ⁻²X*ᵀV].

Expanding this trace, using the fact that the i-th column of X* has squared length Σ_j x*_{ji}² = λ_i and that V has elements ρ^{|i−j|}/(1 − ρ²), (5.4) follows from the definitions of Λ and V.


After orthogonal rotation, the effect of multicollinearity and autocorrelation becomes apparent from (5.4). First, if ρ_e is positive and most of the principal components are also positively autocorrelated, the second term in (5.4) will almost certainly be positive. That is to say, the MSE of β̂_OLS will be larger than when these effects are not present; moreover, the difference will be in proportion to the magnitude of ρ_e. Secondly, we obtain a cross term of the eigenvalues, λ_i, and the autocorrelation coefficient, ρ_e. If the matrix (XᵀX) is ill-conditioned, that is, λ_p is close to zero, and there is a high degree of positive autocorrelation both in the p-th component and in the error terms, then the second term in (5.4) dominates and the MSE of β̂_OLS can be very large. It is then extremely dangerous to apply OLS to data with the above characteristics. However, the problem will not be that serious if ρ_e is negative or if the principal components, especially the weak components, are not autocorrelated. Finally, from (5.4) we are able to tell by how much the MSE of β̂_OLS changes because of the existence of first-order autocorrelated errors in general regression models containing p explanatory variables. Note that when ρ_e = 0, (5.4) reduces to

E(L₁²) = σ_u² tr(X*Λ⁻²X*ᵀ) = σ_u² Σ_{i=1}^{p} 1/λ_i.
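The rotation in the proof, and the ρ_e = 0 reduction above, can both be checked numerically. In this sketch the data are illustrative, and P is taken to be the matrix of eigenvectors of XᵀX:

```python
# With X* = X P (P the eigenvectors of X'X), X*'X* = diag(lambda_i) and,
# for rho_e = 0, E(L1^2) = sigma_u^2 * sum_i 1/lambda_i.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
lam, P = np.linalg.eigh(X.T @ X)   # eigenvalues lambda_i, rotation P
Xstar = X @ P                      # principal-component scores

# X*'X* is diagonal with the eigenvalues on the diagonal
assert np.allclose(Xstar.T @ Xstar, np.diag(lam))

# rho_e = 0  =>  tr[X* Lam^-2 X*'] = sum_i 1/lambda_i
D = Xstar @ np.diag(1 / lam**2) @ Xstar.T
assert np.isclose(np.trace(D), np.sum(1 / lam))
print(np.sum(1 / lam))
```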

5.2

Mean Square Error of the Ridge Estimates of β

In parallel with 5.1, we define

L₁(k) = distance from β̂_R(k) to β.

The MSE of β̂_R(k) is given by

E[L₁²(k)] = E[(β̂_R(k) − β)ᵀ(β̂_R(k) − β)].

Proposition 5.3

(5.6)  E[L₁²(k)] = γ₁(k) + γ₂(k) + γ₃(k),

where

γ₁(k) = σ_u² Σ_{i=1}^{p} λ_i/(λ_i + k)²,

γ₂(k) = k² Σ_{i=1}^{p} α_i²/(λ_i + k)²,

γ₃(k) = 2σ_u² Σ_{j=1}^{n−1} ρ_e^j Σ_{i=1}^{p} [1/(λ_i + k)²] Σ_{t=1}^{n−j} x*_{t,i} x*_{t+j,i}.

Proof:

From (2.7) and (2.8), the MSE of β̂_R(k) can be written as

(5.7)  E[L₁²(k)] = E[(β̂_R(k) − β)ᵀ(β̂_R(k) − β)]
       = E[(Zα̂_OLS − α)ᵀ(Zα̂_OLS − α)]
       = E[(α̂_OLS − α)ᵀZᵀZ(α̂_OLS − α)] + (Zα − α)ᵀ(Zα − α).

Since the first term in (5.7) is a scalar, from (2.7) and Proposition 5.2 it follows that

(5.8)  E[(α̂_OLS − α)ᵀZᵀZ(α̂_OLS − α)] = σ_u² tr[X*(Λ + kI)⁻²X*ᵀV]
       = σ_u² Σ_{i=1}^{p} λ_i/(λ_i + k)² + 2σ_u² Σ_{j=1}^{n−1} ρ_e^j Σ_{i=1}^{p} [1/(λ_i + k)²] Σ_{t=1}^{n−j} x*_{t,i} x*_{t+j,i}
       = γ₁(k) + γ₃(k).

Since the matrix (Z − I) can be written as

(5.9)  Z − I = Z(I − Z⁻¹) = Z(−kΛ⁻¹) = −k(Λ + kI)⁻¹,

the second term in (5.7) can be expressed as follows:

(Zα − α)ᵀ(Zα − α) = αᵀ(Z − I)ᵀ(Z − I)α = k² αᵀ(Λ + kI)⁻²α = k² Σ_{i=1}^{p} α_i²/(λ_i + k)² = γ₂(k),

completing the proof.
The MSE of β̂_R(k) consists of three parts, γ₁(k), γ₂(k) and γ₃(k). γ₁(k) can be considered to be the total variance of the parameter estimates and is a monotonically decreasing function of k; γ₂(k) is the square of the bias brought in by the augmented matrix kI and is a monotonically increasing function of k; while γ₃(k) is related to the autocorrelation in the error terms.
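The opposing behavior of γ₁(k) and γ₂(k) near the origin can be illustrated numerically. The eigenvalues and coefficients below are assumed values chosen to mimic an ill-conditioned system:

```python
# Behavior of gamma_1(k) (total variance) and gamma_2(k) (squared bias)
# near k = 0 for an ill-conditioned system; illustrative numbers only.
import numpy as np

lam = np.array([5.0, 1.0, 0.01])      # lambda_3 near zero: ill-conditioned
alpha = np.array([1.0, 1.0, 1.0])
sigma2 = 1.0

def gamma1(k):      # total variance, monotonically decreasing in k
    return sigma2 * np.sum(lam / (lam + k) ** 2)

def gamma2(k):      # squared bias, monotonically increasing in k
    return k**2 * np.sum(alpha**2 / (lam + k) ** 2)

g1_0, g1_k = gamma1(0.0), gamma1(0.05)
g2_0, g2_k = gamma2(0.0), gamma2(0.05)
assert g1_k < g1_0 and g2_k > g2_0
print(g1_0 - g1_k, g2_k - g2_0)   # large drop in variance, small rise in bias
```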

Hoerl and Kennard claim that in the presence of severe multicollinearity it is possible to reduce the MSE substantially by taking a little bias, that is, by choosing k > 0. This is because in the neighborhood of the origin, γ₁(k) will drop sharply while γ₂(k) will only increase slightly as k increases [11: p.60-61]. After incorporating autocorrelation in the context of ridge regression, their assertion will still be true only if certain conditions are satisfied. From (5.6) we see that the effects of multicollinearity and autocorrelation on the analysis are the following.
(i) If ρ_e is positive and the principal components, especially the weak components, are also positively autocorrelated, then the ridge method will be even more desirable than OLS. This is because a substantial decrease in both γ₁(k) and γ₃(k) can be achieved by choosing k > 0, while the increase in γ₂(k) is relatively small as one moves to k > 0.

(ii) If ρ_e is negative or almost all of the principal components are not autocorrelated, then on the average γ₃(k) is close to zero; hence the ridge and the OLS estimates will perform relatively the same as in the uncorrelated case.

(iii) Since ridge regression is similar to shrinking the model by dropping the least important component [21: p.24-28], (5.6) gives a theoretical justification to shrink the model if both the last component and the error terms are positively autocorrelated.

From the point of view of estimation stability, the ridge method will be helpful when severe multicollinearity is accompanied by a high degree of positive autocorrelation both in the weakest component and in the error terms.

5.3

When will Ridge Estimates be better than the OLS Estimates?

Taking the derivatives of γ₁(k) and γ₂(k), Hoerl and Kennard found a condition on k such that ridge regression gives better parameter estimates than OLS in terms of MSE: when k is smaller than σ_u²/α²_max, where α_max is the largest regression coefficient in magnitude, the MSE of β̂_R(k) will be less than that of β̂_OLS [11]. When autocorrelation is present, the conditions on k such that ridge regression will perform better than OLS regression are described below. Consider the derivatives of γ₂(k) and γ₃(k):

dγ₂/dk = 2k Σ_{i=1}^{p} λ_i α_i²/(λ_i + k)³,

(5.10)  dγ₃/dk = −4σ_u² Σ_{j=1}^{n−1} ρ_e^j Σ_{i=1}^{p} [1/(λ_i + k)³] Σ_{t=1}^{n−j} x*_{t,i} x*_{t+j,i}.

When (XᵀX) approaches singularity, which implies that λ_p → 0, the values of the first two derivatives in the neighborhood of the origin are given by

(5.11)  lim_{λ_p→0} lim_{k→0} (dγ₁/dk) = −∞,

(5.12)  lim_{λ_p→0} lim_{k→0} (dγ₂/dk) = 0.

As k increases, a huge drop in γ₁ with a slight increase in γ₂ may be expected. However, (5.10) shows that the behavior of γ₃ depends on the degree of autocorrelation both in the principal components and in the error terms. γ₃ may increase or decrease at various rates as k increases. Therefore the use of ridge regression is most favourable when there is a high degree of positive autocorrelation both in the components and in the error terms. We now formalize these arguments and present a condition on k such that ridge regression will be better than OLS regression in the MSE criterion.

Let

F(k) = E(L₁²) − E[L₁²(k)]
     = σ_u² Σ_{i=1}^{p} [1/λ_i − λ_i/(λ_i + k)²]
       + 2σ_u² Σ_{j=1}^{n−1} ρ_e^j Σ_{i=1}^{p} [1/λ_i² − 1/(λ_i + k)²] Σ_{t=1}^{n−j} x*_{t,i} x*_{t+j,i}
       − k² Σ_{i=1}^{p} α_i²/(λ_i + k)².

Then

(5.13)  dF/dk = 2σ_u² Σ_{i=1}^{p} [1/(λ_i + k)³][ λ_i + 2 Σ_{j=1}^{n−1} ρ_e^j Σ_{t=1}^{n−j} x*_{t,i} x*_{t+j,i} ] − 2k Σ_{i=1}^{p} λ_i α_i²/(λ_i + k)³.

Assume that γ₃(k) is a non-increasing function of k in the neighborhood of the origin. From (5.11) and (5.12) we may then expect F(k) to increase as we move towards k > 0, i.e. (dF/dk) > 0; there exists a k > 0 such that the OLS estimates have higher MSE than the ridge estimates.

Theorem 5.1. If

(5.14)  k < (σ_u²/α²_max) [ 1 + (2/λ_max) min_i Σ_{j=1}^{n−1} ρ_e^j Σ_{t=1}^{n−j} x*_{t,i} x*_{t+j,i} ],

where α_max is the largest regression coefficient in magnitude and λ_max is the largest eigenvalue, then E(L₁²) − E[L₁²(k)] > 0. In other words, the OLS estimates have higher MSE than the ridge estimates.

Again, (5.14) will reduce to Hoerl and Kennard's result if ρ_e = 0. When positive autocorrelation exists in the error terms and the principal components, the second term in (5.14) may well be positive; hence the range of k for which the ridge estimates are better than the OLS estimates in the MSE criterion will be larger than what Hoerl and Kennard asserted in the uncorrelated case. (5.14) shows that the extension in the range of k is positively related to the magnitude of ρ_e. However, (5.14) is just a sufficient condition on k for E(L₁²) to be greater than E[L₁²(k)], since F(k) is increasing in k over the range shown by (5.14). It is possible that for some values of k, F(k) is decreasing in k while the function value is still positive, that is, E(L₁²) is still greater than E[L₁²(k)]. Therefore, we may consider (5.14) as a stringent condition on k for the ridge estimates to be better than the OLS estimates in the MSE criterion.

If either ρ_e is negative or the principal components, especially those weak ones, are not autocorrelated, the behavior of γ₃(k), and thereby of F(k), as k increases will be hard to predict. The effect of autocorrelation on the range of k depends on the data set we gathered. In practice, the true parameters are unknown; the range of k shown by (5.14) can be approximated by conducting a principal component analysis and substituting the estimates for the parameters.

analysis and substituting the estimates for the


5.4

Use of the "Ridge Trace"

In ridge regression the augmented matrix (kI) is used to cause the system to have the general characteristics of an orthogonal system. Hoerl and Kennard claimed that at a certain value of k the system will stabilize [11: p.65]. They proposed the usage of a "Ridge Trace" as a diagnostic tool for selecting a single value of k and a unique ridge estimate of β in practice. The "Ridge Trace" portrays the behavior of all the parameter estimates as k varies. Therefore, instead of suppressing dimensions either by deleting collinear variables or by dropping principal components of small importance, the Ridge Trace will show how singularity is causing instability, over/under-estimation and incorrect signs. In connection with autocorrelation, where ridge regression is even more desirable, the "Ridge Trace" will certainly be of great help in getting better point estimates and thereby better predictions. Even when ρ_e is negative or the principal components are not autocorrelated, the merits and usefulness of the "Ridge Trace" and of ridge regression are still preserved in dealing with the problem of multicollinearity.
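A minimal version of such a trace can be computed directly; the data below are illustrative, and in practice one looks for the value of k at which the coefficient paths settle down:

```python
# A minimal "ridge trace": the path of the ridge estimates as k varies.
import numpy as np

rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n)])  # collinear pair
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

trace = {}
for k in (0.0, 0.01, 0.1, 0.5, 1.0, 5.0):
    bk = np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ y)
    trace[k] = bk
    print(f"k={k:5.2f}  b={bk.round(3)}")

# As k grows, the unstable OLS pair shrinks toward stable values.
assert np.linalg.norm(trace[5.0]) < np.linalg.norm(trace[0.0])
```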

6.

RIDGE REGRESSION: ESTIMATES, MEAN SQUARE ERROR AND PREDICTION

The MSE of the OLS estimates of β can be written as the difference in length between two vectors, β̂_OLS and β [11: p.56]:

(6.1)  E(L₁²) = E(β̂_OLSᵀ β̂_OLS) − βᵀβ.

(6.1) shows that in the presence of severe multicollinearity, the MSE can be improved by shortening the OLS estimates of β. In this section we will show that this reasoning is compatible with the derivation of the ridge estimator of β. Hence ridge regression can be expected to be better in terms of MSE.

6.1

Derivation of the Ridge Estimator for a CLR Model

Let B be any estimate of β. Its residual sum of squares, φ, can be written as the value of the minimum sum of squares, φ_min, plus the distance from B to β̂_OLS weighted through (XᵀX):

(6.2)  φ = (Y − XB)ᵀ(Y − XB)
         = (Y − Xβ̂_OLS)ᵀ(Y − Xβ̂_OLS) + (B − β̂_OLS)ᵀXᵀX(B − β̂_OLS)
         = φ_min + φ(B).

For a specific value of φ(B), say φ₀, the ridge estimator is found by choosing a B to

(6.3)  Minimize BᵀB
       subject to (B − β̂_OLS)ᵀXᵀX(B − β̂_OLS) = φ₀.

This problem can be solved by use of Lagrange multiplier techniques,

where (1/k) is the multiplier corresponding to the constraint (6.3). The problem is to minimize

(6.4)  F = BᵀB + (1/k)[(B − β̂_OLS)ᵀXᵀX(B − β̂_OLS) − φ₀].

A necessary condition for B to minimize (6.4) is that

∂F/∂B = 2B + (1/k)[2(XᵀX)B − 2(XᵀX)β̂_OLS] = 0.

Hence

[I + (1/k)(XᵀX)]B = (1/k)(XᵀX)β̂_OLS,

and

B* = β̂_R(k) = (XᵀX + kI)⁻¹XᵀY,

where k is chosen to satisfy constraint (6.3). In practice, we usually work the other way round, since it is easier to choose a k > 0 and then compute the additional residual sum of squares, φ₀. It is clear that for a fixed increment φ₀ there is a continuum of values of B that will satisfy the relationship φ = φ_min + φ₀, and the ridge estimate so derived is the one with the minimum length.

Therefore, we may well expect the ridge estimates to yield less MSE in the presence of multicollinearity, since they are originally derived by minimizing the length of the regression vector. It is true to a certain extent that minimizing the length of the regression vector is equivalent to reducing the MSE of the parameter estimates. In addition, (6.2) shows that it is possible to move further away from β̂_OLS without an appreciable increase in the residual sum of squares as (XᵀX) is approaching singularity. That is to say, ridge regression may achieve a large reduction in MSE at virtually no cost in terms of the residual sum of squares if the conditioning of (XᵀX) is poor enough.
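The decomposition (6.2) and the length-reducing property of the ridge estimate can be verified numerically; the sketch below uses assumed, ill-conditioned data:

```python
# (6.2): rss(B) = rss_min + (B - b_ols)' X'X (B - b_ols) for any B,
# and the ridge estimate is a much shorter vector at small cost in RSS.
import numpy as np

rng = np.random.default_rng(4)
n = 30
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.02 * rng.normal(size=n)])  # ill-conditioned
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)

def rss(b):
    r = y - X @ b
    return r @ r

# The exact quadratic decomposition (6.2), for an arbitrary B
B = np.array([2.0, -1.0])
d = B - b_ols
assert np.isclose(rss(B), rss(b_ols) + d @ (X.T @ X) @ d)

# Ridge: shorter regression vector, only slightly larger RSS
b_ridge = np.linalg.solve(X.T @ X + 0.1 * np.eye(2), X.T @ y)
assert np.linalg.norm(b_ridge) < np.linalg.norm(b_ols)
print(rss(b_ols), rss(b_ridge))
```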

In 1971, Newhouse and Oman [18] used MSE as the evaluation criterion in their Monte Carlo studies of ridge regression. Since then it has become the standard way to evaluate proposals for ridge estimators. From the above derivation of the ridge estimator, we obviously realize that ridge estimates are designed to be better in the MSE criterion.

Now we would like to study the implications of the constraint used in deriving the ridge estimator. Since orthogonalization can ease interpretation, we represent (6.2) in the rotated axes. Let A = PB. Then
φ = (Y − Xβ̂_OLS)ᵀ(Y − Xβ̂_OLS) + (A − α̂_OLS)ᵀΛ(A − α̂_OLS)
  = φ_min + Σ_{i=1}^{p} λ_i (A_i − α̂_OLS,i)²,

where A_i is the estimate of the regression coefficient for the i-th component and α̂_OLS,i is the OLS estimate of the regression coefficient for the i-th component.
The problem is

(6.5)  Minimize AᵀA
       subject to (A − α̂_OLS)ᵀΛ(A − α̂_OLS) = φ₀,

or, equivalently,

(6.6)  Minimize AᵀA
       subject to Σ_{i=1}^{p} λ_i (A_i − α̂_OLS,i)² = φ₀.

(6.5) shows that the vector [A − α̂_OLS] is normed through Λ to have length equal to φ₀. Since the eigenvalue λ_i can be considered as an indicator of the information content and explanatory power of the i-th principal component, we may well conclude that the derivation of the ridge estimator has already taken the relative information content and explanatory power of the explanatory variables into account. (6.6) shows that the constraint has incorporated the concept of a squared-error-loss function as well. It increases the length of A the most when the parameter estimates of the important components deviate from the OLS estimates, since it is formed by taking the square of the deviations multiplied by their corresponding eigenvalues. This implies that it is best to shrink the estimates of α for those components that have small eigenvalues, i.e. the ones most subject to instability.

6.2

Derivation of the Ridge Estimator for an ALR Model

In the presence of autocorrelated error terms, the OLS estimator of β will no longer have the minimum-variance property; the GLS estimator will be the BLUE of β. Our derivation of a new type of ridge estimator adjusted for autocorrelation, β̂_GR(k), will parallel the derivation of β̂_R(k) in the previous section. Again let B be any estimate of β. Its residual sum of squares, φ, can be written as the value of the minimum sum of squares, φ_min, plus the distance from B to β̂_GLS weighted through (XᵀΩ⁻¹X). (See (2.10) for notation.)

φ = (Y* − X*B)ᵀ(Y* − X*B)
  = (Y* − X*β̂_GLS)ᵀ(Y* − X*β̂_GLS) + (B − β̂_GLS)ᵀX*ᵀX*(B − β̂_GLS)
  = φ_min + φ(B).

For a specific value of φ(B), say φ₀, β̂_GR(k) is derived by choosing a B to minimize BᵀB

(6.7)  subject to (B − β̂_GLS)ᵀX*ᵀX*(B − β̂_GLS) = φ₀.

The Lagrangian is given by

F = BᵀB + (1/k)[(B − β̂_GLS)ᵀX*ᵀX*(B − β̂_GLS) − φ₀].

A necessary condition for a minimum is that

2B + (1/k)[2(X*ᵀX*)B − 2(X*ᵀX*)β̂_GLS] = 0.

This reduces to

(6.8)  B* = β̂_GR(k) = (X*ᵀX* + kI)⁻¹X*ᵀY* = (XᵀΩ⁻¹X + kI)⁻¹XᵀΩ⁻¹Y,

where k is chosen to satisfy (6.7). The characterization of the ridge estimator β̂_GR(k) will be essentially the same as that of β̂_R(k): for a specific increment φ₀, the β̂_GR(k) so derived is the regression vector with the minimum length among a continuum of values of B that will satisfy the relationship φ = φ_min + φ₀.
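A numerical sketch of (6.8) for an AR(1) error structure with ρ_e assumed known; all data values are illustrative:

```python
# beta_GR(k) = (X' Om^-1 X + kI)^-1 X' Om^-1 Y, the ridge estimator
# adjusted for AR(1) autocorrelation, alongside the GLS estimator.
import numpy as np

rng = np.random.default_rng(5)
n, rho, k = 30, 0.8, 0.1
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=n)])
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal()
y = X @ np.array([1.0, 1.0]) + e

idx = np.arange(n)
Om = rho ** np.abs(np.subtract.outer(idx, idx)) / (1 - rho**2)  # AR(1) Omega
Om_inv = np.linalg.inv(Om)
b_gls = np.linalg.solve(X.T @ Om_inv @ X, X.T @ Om_inv @ y)
b_gr = np.linalg.solve(X.T @ Om_inv @ X + k * np.eye(2), X.T @ Om_inv @ y)
assert np.linalg.norm(b_gr) < np.linalg.norm(b_gls)   # ridge shrinks GLS
print(b_gls.round(3), b_gr.round(3))
```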

However, multicollinearity may no longer be a substantial problem after transforming X into X* in some rare cases. For instance, this may happen in time series studies where multicollinearity is a result of the explanatory variables increasing together over time. It is then possible that the transformed variables are not close to being collinear with each other. If that is the case, the reduction in MSE cannot be obtained with only a slight increase in the residual sum of squares. This is because of the low MSE of β̂_GLS already achieved and the non-singularity of (XᵀΩ⁻¹X). In most cases, if not all, the matrix (XᵀΩ⁻¹X) is very likely to have a broad eigenvalue spectrum if (XᵀX) does.

Based on the motivation of minimizing the length of the regression vector, the interpretation and implications of the constraint in the derivation of the ridge estimator of β for a CLR model will be applicable to the derivation of β̂_GR(k).

6.3

Mean Square Error of the "Generalized Estimators"

Since Y* = X*β + e* satisfies all the assumptions of a CLR model, the previous discussion and (5.3) give the MSE of β̂_GLS as follows. Let L₂ = distance from β̂_GLS to β. Then

(6.9)  E(L₂²) = σ_u² tr(X*ᵀX*)⁻¹ = σ_u² tr(XᵀΩ⁻¹X)⁻¹.
Setting ρ_e = 0 in (5.8), (6.10) gives the MSE of β̂_GR(k). Let L₂(k) = distance from β̂_GR(k) to β. Then

(6.10)  E[L₂²(k)] = σ_u² tr[(XᵀΩ⁻¹X + kI)⁻²(XᵀΩ⁻¹X)] + k²βᵀ(XᵀΩ⁻¹X + kI)⁻²β.

The effect of autocorrelation is difficult to infer from (6.9) and (6.10), since Ω is not a diagonal matrix; however, normally we may expect E(L₂²) and E[L₂²(k)] to be less than E(L₁²) and E[L₁²(k)] respectively.
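Formulas (6.9) and (6.10) are easy to evaluate numerically. The sketch below uses assumed data and checks that (6.10) collapses to (6.9) at k = 0:

```python
# (6.9)/(6.10): MSE of the GLS and the generalized ridge estimates.
import numpy as np

rng = np.random.default_rng(6)
n, rho, sigma2 = 30, 0.8, 1.0
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n)])
beta = np.array([1.0, 1.0])
idx = np.arange(n)
Om = rho ** np.abs(np.subtract.outer(idx, idx))
A = X.T @ np.linalg.inv(Om) @ X          # X' Om^-1 X

def mse_gls():                            # (6.9)
    return sigma2 * np.trace(np.linalg.inv(A))

def mse_gr(k):                            # (6.10)
    M = np.linalg.inv(A + k * np.eye(2))
    return sigma2 * np.trace(M @ M @ A) + k**2 * beta @ M @ M @ beta

assert np.isclose(mse_gr(0.0), mse_gls())   # (6.10) reduces to (6.9) at k = 0
print(mse_gls(), mse_gr(0.1))
```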

6.4

Estimation

Theoretically, the GLS gives the BLUE of β for an ALR model. But usually in practice, neither the order of the autocorrelation structure nor the value of the parameter ρ_e is known. Hence the GLS or GR estimates cannot be computed directly. Many two-stage methods have been proposed to approximate the GLS estimates and have proven to be quite effective. These include the Cochrane-Orcutt iterative process [1] and Durbin's two-step method [2].

In the joint presence of multicollinearity and autocorrelation, it is actually quite straightforward to combine ridge regression with one of the two-stage regression methods in the hope of achieving better estimates of β. We illustrate how ridge regression can be incorporated in Durbin's two-step method for a simple model with only two collinear explanatory variables:

(6.11)  Y_t = β₀ + β₁X_{t1} + β₂X_{t2} + e_t,   t = 1, 2, ..., n,
        e_t = ρ_e e_{t−1} + u_t,

with

E(u_t) = 0 for all t,
E(u_t u_{t+s}) = σ_u² for s = 0,
E(u_t u_{t+s}) = 0 for s ≠ 0.

The transformed relation is given by

(6.12)  Y_t − ρ_e Y_{t−1} = β₀(1 − ρ_e) + β₁(X_{t1} − ρ_e X_{t−1,1}) + β₂(X_{t2} − ρ_e X_{t−1,2}) + u_t.

Combining (6.11) and (6.12) gives

(6.13)  Y_t = β₀(1 − ρ_e) + β₁X_{t1} − β₁ρ_e X_{t−1,1} + β₂X_{t2} − β₂ρ_e X_{t−1,2} + ρ_e Y_{t−1} + u_t.

The first step is to estimate the parameters of (6.13) using OLS. Then use the estimated coefficient of Y_{t−1}, ρ̂_e, to compute the transformed variables (Y_t − ρ̂_e Y_{t−1}), (X_{t1} − ρ̂_e X_{t−1,1}) and (X_{t2} − ρ̂_e X_{t−1,2}). At the second step, ridge regression is highly recommended to be used in place of OLS and applied to relationship (6.12) containing those transformed variables. The coefficient estimate of (X_{ti} − ρ̂_e X_{t−1,i}) is our approximation of β_i, and the intercept term divided by (1 − ρ̂_e) is our approximation of β₀.

It might seem reasonable to apply ridge regression at the first step of Durbin's method, since X_{t1} and X_{t2} are collinear. As stated earlier in Section 3, a high pair-wise correlation coefficient of explanatory variables does not necessarily result in estimation instability. Besides, the lagged values of X_{t1} and X_{t2} are inserted into the explanatory variable set. If X_{t1} and X_{t2} are not autocorrelated, the conditioning of the enlarged (XᵀX) may be satisfactory. Moreover, u_t in (6.12) has a scalar dispersion matrix, therefore OLS gives consistent estimates of the regression coefficients. Also, among these estimates, only the coefficient estimate of Y_{t−1} will be used to compute the transformed variables. Hence the OLS technique is recommended at the first step even when X_{t1} and X_{t2} are collinear.

This combination of ridge regression and Durbin's two-step method can easily be extended to a p-variable model with a higher order of autocorrelation.
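The two steps can be sketched as follows. The sample size, the true parameter values, and the choice k = 0.1 are illustrative assumptions, and no standardization of the regressors is performed:

```python
# Durbin's two-step method with ridge regression at the second step,
# for the two-variable model (6.11)-(6.13); illustrative data only.
import numpy as np

rng = np.random.default_rng(7)
n, rho = 200, 0.7
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)              # collinear pair
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal(scale=0.5)
y = 5.0 + 1.1 * x1 + 1.0 * x2 + e

# Step 1: OLS on (6.13); the coefficient of Y_{t-1} estimates rho_e.
Z = np.column_stack([np.ones(n - 1), x1[1:], x1[:-1],
                     x2[1:], x2[:-1], y[:-1]])
rho_hat = np.linalg.lstsq(Z, y[1:], rcond=None)[0][-1]

# Step 2: ridge regression on the transformed relation (6.12).
k = 0.1
W = np.column_stack([np.ones(n - 1),
                     x1[1:] - rho_hat * x1[:-1],
                     x2[1:] - rho_hat * x2[:-1]])
v = y[1:] - rho_hat * y[:-1]
b = np.linalg.solve(W.T @ W + k * np.eye(3), W.T @ v)
b0_hat = b[0] / (1 - rho_hat)    # intercept / (1 - rho_hat) approximates beta_0
print(round(rho_hat, 2), round(b0_hat, 2), b[1:].round(2))
```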

6.5

Prediction

Consider a first-order ALR model; (2.15) gives the minimum variance predictor (BLUP). In practice, both β and ρ_e are replaced by their estimated values. If ridge regression is used in conjunction with some other methods to cope with the joint problem of multicollinearity and autocorrelation, the prediction is given by

(6.14)  Ŷ_{t+1} = X_{t+1} β̂_GR(k) + ρ̂_e ê_t,

where X_{t+1} is the 1 × p vector of the (t+1)-st observation on the explanatory variables, β̂_GR(k) is the p × 1 vector of approximated regression coefficients, ρ̂_e is an estimate of the autocorrelation coefficient, and ê_t is the ridge residual at time t.
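With toy numbers standing in for the estimates, (6.14) amounts to:

```python
# One-step-ahead prediction (6.14); all numerical values are made up.
import numpy as np

b_gr = np.array([5.0, 1.1, 1.0])      # approximated coefficients (with intercept)
rho_hat = 0.7
X_hist = np.array([1.0, 2.0, 3.0])    # observation at time t: (1, x1, x2)
y_t = 10.0
e_t = y_t - X_hist @ b_gr             # ridge residual at time t
X_next = np.array([1.0, 2.5, 2.0])    # observation at time t+1

y_pred = X_next @ b_gr + rho_hat * e_t
print(y_pred)                         # 9.75 + 0.7*(-0.2) = 9.61
```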

7.

THE MONTE CARLO STUDY

Consider a first-order ALR model with two explanatory variables; the error terms in the transformed relation, as shown by (6.12), have a scalar dispersion. The residual sum of squares from (6.12) is given by

(7.1)  Σ_t [Y_t − ρ_e Y_{t−1} − β₀(1 − ρ_e) − β₁(X_{t1} − ρ_e X_{t−1,1}) − β₂(X_{t2} − ρ_e X_{t−1,2})]².

If the initial values of the X's and of Y are given, the summation can run from 1 to n; otherwise it can only run from 2 to n. The direct minimization of (7.1) with respect to β̂₀, β̂₁, β̂₂ and ρ̂_e leads to non-linear equations; therefore, the analytic expressions for β̂₀, β̂₁, β̂₂ and ρ̂_e cannot be obtained.

As mentioned before, many two-stage methods have been proposed to approximate these parameters. Usually in practice, not only the parameters and the true error terms but also the order of the autocorrelation structure is unknown. As indicated previously, the joint presence of autocorrelation and multicollinearity will further complicate the situation. Under this circumstance, the relative effectiveness of those two-stage methods can best be studied by Monte Carlo experiments [19].

7.1

Design of the Experiments

The main purpose of the experiments is to give empirical support to the inferences drawn from our analytic studies. The sampling experiments are conducted in the following manner. Basically, the sampling experiments comprise nine different experiments with different degrees of multicollinearity and autocorrelation. They are summarized in Table 1.

Table 1

Experiment    γ₁₂    ρ_e
1             .05    .05
2             .05    .50
3             .05    .90
4             .50    .05
5             .50    .50
6             .50    .90
7             .95    .05
8             .95    .50
9             .95    .90

In our experiments, γ₁₂ and ρ_e are used to indicate the severity of multicollinearity and autocorrelation respectively. Usually in practice, multicollinearity constitutes a problem only when γ₁₂ is as high as 0.8 or 0.9. In addition, the error terms are normally considered to be independent, moderately autocorrelated and highly autocorrelated when ρ_e = .05, .50 and .90 respectively. As shown by Table 1, the experiments are set up to have different characteristics. Through this design, we can study the effects of autocorrelation on estimation and prediction for a given degree of multicollinearity. Moreover, we can observe how these effects of autocorrelation change as the degree of multicollinearity varies.
The data are generated as follows: first, values are assigned to β₀, β₁, β₂ and to the probability characteristics of the error terms u_t in (2.9). Three series of e_t in (2.9) are subsequently generated, given the values of u_t and the different values of ρ_e. The probability characteristics of the joint distribution of X_{t1} and X_{t2} are chosen to generate the series of X_{t1} and X_{t2} that are suitable for the first three experiments. By varying the correlation coefficient of X_{t1} and X_{t2}, another two series of X_{t1} and X_{t2} are generated for the remaining six experiments. We have also assured that there is no first-order autocorrelation in X_{t1} and X_{t2}, so that the error structure is first-order. Solving for Y_t based on the data, nine different sets of Y_t's can be generated for the experiments. For each experiment, ten samples of forty observations are generated. In each sample, the first thirty observations, from t = 1 to t = 30, are employed to estimate the equation by appropriate methods. Observations 31 to 40 are used to study the prediction properties of the estimators. The BLUP is used in the presence of significant autocorrelation in the error terms.
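The generation of one sample can be sketched as follows. The distributional details (joint normality of X_{t1} and X_{t2}, and the specific seed) are assumptions for illustration, using means, variances and parameter values consistent with those reported in Section 7.2:

```python
# Generating one sample for an experiment in Table 1 (a sketch of the
# design in Section 7.1; distributional details are illustrative).
import numpy as np

rng = np.random.default_rng(8)
n, gamma12, rho = 40, 0.95, 0.50          # e.g. experiment 8 in Table 1
b0, b1, b2 = 5.0, 1.1, 1.0                # true parameter values

# X_t1, X_t2: jointly normal, correlation gamma12, no autocorrelation
cov12 = gamma12 * np.sqrt(18.0 * 15.0)
cov = [[18.0, cov12], [cov12, 15.0]]
Xmat = rng.multivariate_normal([10.0, 8.0], cov, size=n)

u = rng.normal(scale=np.sqrt(6.0), size=n)   # u_t ~ N(0, 6)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]             # first-order error structure
y = b0 + Xmat @ [b1, b2] + e

est, pred = slice(0, 30), slice(30, 40)      # estimation / prediction split
print(round(float(np.corrcoef(Xmat[est, 0], Xmat[est, 1])[0, 1]), 2))
```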

Special care has to be exercised in controlling the serial correlation properties of the error terms. In this connection, an OLS regression has to be run on

(7.2)  e_{jt} = ρ_e e_{j,t−1} + u_{jt},   j = 1, 2, ..., 10;  t = 1, 2, ..., 40,

to determine whether the estimated regression coefficient is consistent with the ρ_e which was used to generate the series. However, as is well known, the OLS estimates of parameters for small samples may be badly biased if some of the regressors are lagged dependent variables [23]. This is because the error terms u_{jt} will no longer be independent of the regressors e_{j1}, ..., e_{j,40}: E(e_{jt} u_{j,t+s}) ≠ 0 for s ≠ 0 and all t; hence the OLS estimate for the coefficient of e_{j,t−1} is biased. The usual t test on the estimate of the regression coefficient may be quite misleading; therefore we can only ascertain that the desired serial correlation properties are obtained by assuring that the u_{jt} are randomly distributed. For each sample, we first test whether the series of u_t is consistent with the probability characteristics chosen to generate them; then we use a run test to determine whether u_t is randomly distributed. Only those series of u_t that passed all the tests are adopted in our simulation study.

We are now ready to estimate the regression equation. First, for each experiment, the OLS principle is applied to estimate the parameters. The Durbin-Watson statistic is used as a filter to test for the existence of autocorrelation. Whenever the Durbin-Watson statistic computed from the fitted model is less than the corresponding upper critical value d_U (α = 0.05), autocorrelation is assumed to be present in the error terms; then Durbin's two-step method is used in conjunction with ridge regression for estimation as described in Section 6.4. Otherwise, autocorrelation is assumed to be absent, and only the OLS and ridge regression techniques are employed for estimation purposes.
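The filter statistic itself is simple to compute from a residual series; the series below are illustrative:

```python
# Durbin-Watson statistic: d = sum_t (r_t - r_{t-1})^2 / sum_t r_t^2,
# near 2 for uncorrelated residuals, well below 2 under positive
# autocorrelation.
import numpy as np

def durbin_watson(resid):
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(9)
white = rng.normal(size=500)              # uncorrelated series
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()  # strongly autocorrelated series

print(durbin_watson(white).round(2), durbin_watson(ar).round(2))
assert durbin_watson(ar) < durbin_watson(white)
```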

In addition, whenever the existence of autocorrelation is recognized, β̂_GLS and β̂_GR(k) are computed for comparison purposes. Since the true value of the autocorrelation coefficient is known for each experiment, the calculations of β̂_GLS and β̂_GR(k) will simply be the straightforward multiplication of matrices as shown by (2.13) and (6.8). The methods adopted for estimation in each experiment are recorded in Table 2.

Table 2

Experiment    Method
1             OLS, RR
2             OLS, Durb., Durb.+RR, GLS, GR
3             OLS, Durb., Durb.+RR, GLS, GR
4             OLS, RR
5             OLS, Durb., Durb.+RR, GLS, GR
6             OLS, Durb., Durb.+RR, GLS, GR
7             OLS, RR
8             OLS, Durb., Durb.+RR, GLS, GR
9             OLS, Durb., Durb.+RR, GLS, GR

OLS:        Ordinary Least-squares Regression
RR:         Ridge Regression
Durb.:      Durbin's Two-step Method
Durb.+RR:   Durbin's Two-step Method in conjunction with Ridge Regression
GLS:        Generalized Least-squares Regression
GR:         Ridge Regression adjusted for Autocorrelation

As is expected, no correction for autocorrelation is necessary for experiments 1, 4 and 7. Whenever the ridge method is applied, seven or eight values of k have been used in our study. In order to minimize the effects of subjectivity resulting from selecting the value of k in ridge regressions, we compute the mean β̂_R of the samples for every specific value of k in each experiment. The value of k is then selected based on a "Mean Ridge Trace". That is to say, a unique value of k is selected that will generally be the best for all ten samples. Obviously, the value of k so selected may well not be the best for every individual sample. Therefore, the minimum of the MSE of the ridge estimates of β achieved for each experiment is slightly upward biased. Certainly this way of selecting the value of k cannot be used in practice.
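The "Mean Ridge Trace" idea can be sketched as follows: the ridge estimates are averaged over the samples at each k, and a single k is then read off the averaged trace. The data below are purely illustrative, not the experimental design:

```python
import numpy as np

def ridge(X, y, k):
    """Ordinary ridge estimate (X'X + kI)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

def mean_ridge_trace(samples, ks):
    """Average the ridge estimates over all samples for each k.
    samples: list of (X, y) pairs; returns array of shape (len(ks), p)."""
    return np.array([np.mean([ridge(X, y, k) for X, y in samples], axis=0)
                     for k in ks])
```

One then picks the k at which the averaged coefficients stabilize; because a single k serves all samples, the minimum MSE so achieved is slightly optimistic relative to per-sample selection.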

7.2  Sampling Results

For each method for each experiment, the MSE of the estimates of β, the adjusted R², the residual sum of squares, the MSE of forecast, and the Durbin-Watson statistic are averaged over ten samples. In addition, the mean estimate of ρ and the mean Haitovsky heuristic statistic are also computed. The disturbance u is assumed to follow a normal distribution with mean zero and variance equal to 6, i.e., u ~ N(0, 6). The true mean of X_t1 is 10 and that of X_t2 is 8. The respective variances of X_t1 and X_t2 are 18 and 15. The … is chosen to be 3 for each sample. The true value of β_0 is 5, β_1 is 1.1 and β_2 is 1.

7.2a  Results assuming ρ is Known

First we assume ρ is known. The results here will indicate whether the methods described in Section 6.4 can show promise in the best of situations. Table 3 contains the average MSE of β̂_GLS and β̂_GR for experiments 2, 3, 5, 6, 8 and 9.

Table 3

         Exp. 2      Exp. 3      Exp. 5      Exp. 6      Exp. 8      Exp. 9
         (γ12=.05,   (γ12=.05,   (γ12=.50,   (γ12=.50,   (γ12=.95,   (γ12=.95,
   k      ρ=.50)      ρ=.90)      ρ=.50)      ρ=.90)      ρ=.50)      ρ=.90)
 0.0*     .4824      3.2913       .0834      2.3940       .1011      2.1681
 .025     .0591      1.8084       .0018      1.4991
 .05      .0363       .8073       .1215       .8337       .0561       .9405
 .075     .3603       .2274       .3033       .4263
 .1       .9849       .0156       .8871       .1092       .4929       .2901
 .2      5.7786      2.0100      4.0851       .6153      2.4426       .4242
 .3      2.9771      7.0896      9.3743      4.0521      5.4924      1.0113
 .5     30.1494     21.7887     21.1581     11.4165     13.7628      4.8285
 .7     47.5367     36.6241     35.2063     22.2489     23.7003     10.9524
 1.0    75.6732     63.4542     56.5869     39.9762     38.5830

* GLS regression can be considered as a special case of Ridge regression,
adjusted for autocorrelation, with k = 0.
In Section 5 it was shown that the MSE of β̂ will increase rapidly if significantly positive autocorrelation exists both in the disturbances and in the principal components. Correction for autocorrelation will then be necessary in estimating the regression equation.

Though the GLS regression yields the BLUE of β, the behavior of the MSE of β̂_GLS is very difficult to infer from (6.10). From the MSE in experiments 3, 6 and 9 when k = 0, we observe that the MSE of β̂_GLS decreases as the degree of multicollinearity increases, for a sufficiently high degree of autocorrelation. On the other hand, for a given degree of multicollinearity, the MSE of β̂_GLS will increase as the degree of autocorrelation increases. But the magnitude of the increase in the MSE of β̂_GLS decreases as the relation among the explanatory variables increases. For instance, the difference in the MSE of β̂_GLS between experiments 8 and 9 is less than that between experiments 5 and 6.

Moreover, Table 3 shows that there exists at least one value of k for each experiment such that the MSE of β̂_GR is less than that of β̂_GLS. Note that k = .1 obtains the minimum MSE of the estimates of β in experiment 9. This also implies that the transformed matrix (X*'X*) is still ill-conditioned. In Section 5.3 we showed that the range of k such that the MSE of β̂_R is less than that of β̂_OLS will be larger if multicollinearity is accompanied by a high degree of autocorrelation. Now, with parameter estimates fully adjusted for autocorrelation (since ρ is known), experiment 9 still has the largest admissible range of k. That is, the range of k such that the MSE of β̂_GR is less than that of β̂_GLS is still larger if multicollinearity is accompanied by a high degree of autocorrelation, even when autocorrelation has been fully adjusted for.

We also observe that as the degree of autocorrelation increases, a larger reduction in the MSE of the estimates of β can be obtained by replacing β̂_GLS with β̂_GR. For instance, the difference in the MSE of β̂_GLS and β̂_GR(.05) in experiment 8 is less than that in experiment 9.

However, the behavior of the MSE of β̂_GR is very difficult, if not impossible, to predict. (6.10) shows that the MSE of β̂_GR is comprised of two terms. How each term behaves will depend not only on the data matrix X and the degree of autocorrelation, but also on the way the matrix X* is linked with the matrix Ω⁻¹.


7.2b  Results assuming ρ is Unknown

In practice ρ is unknown. We assume that the true autocorrelation coefficient is unknown and we try to fit the equation using heuristic techniques akin to Durbin's two-step method, which has been shown by Griliches and Rao [8] to perform well when there is autocorrelation. We apply these techniques as described in Section 7.1. Tables 4, 5 and 6 report the mean adjusted R² and the mean Durbin-Watson statistic for each experiment (d_U(α = 5%) = 1.57 for experiments 1, 4 and 7; d_U(α = 5%) = 1.56 for the remaining six experiments).
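The heuristic can be sketched as follows; this is a minimal reading of Durbin's two-step method combined with ridge, assuming a model with an intercept and AR(1) errors (the thesis's exact centering and scaling conventions for the ridge step are not reproduced):

```python
import numpy as np

def durbin_two_step_ridge(X, y, k=0.0):
    """Step 1: regress y_t on y_{t-1}, x_t and x_{t-1}; the coefficient on
    y_{t-1} estimates rho.  Step 2: apply (ridge) least squares to the
    quasi-differenced data.  X excludes the intercept column; the intercept
    of the transformed equation estimates beta_0 * (1 - rho)."""
    n = len(y)
    Z = np.column_stack([np.ones(n - 1), y[:-1], X[1:], X[:-1]])
    a, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    rho_hat = a[1]                       # coefficient on the lagged y
    Xs = np.column_stack([np.ones(n - 1), X[1:] - rho_hat * X[:-1]])
    ys = y[1:] - rho_hat * y[:-1]
    beta = np.linalg.solve(Xs.T @ Xs + k * np.eye(Xs.shape[1]), Xs.T @ ys)
    return rho_hat, beta
```

With k = 0 this is plain Durbin two-step; with k > 0 it is the Durb.+RR combination studied here.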
Table 4 (γ12 = .05)

        Exp. 1 (ρ=.05)      Exp. 2 (ρ=.50)      Exp. 3 (ρ=.90)
   k     R²a*     d**        R²a      d           R²a      d
 0.0    .8640   2.0791      .8979   1.8766       .9149   1.8380
 .025   .8640   2.0903      .8884   1.8713       .9144   1.8269
 .05    .8620   2.1001      .8868   1.8728       .9128   1.8286
 .075   .8579   2.1083      .8844   1.8805       .9104   1.8409
 .1     .8567   2.1154      .8813   1.8916       .9071   1.8613
 .2     .8401   2.1353      .8619   1.9619       .8888   1.9809
 .3     .8167   2.1463      .8399   2.0383       .8648   2.1030
 .5     .7651   2.1536      .7865   2.1563       .8102   2.2789
 1.0    .6394   2.1527      .6571   2.2960       .6780   2.4740

*  R²a: the mean adjusted R²
** d:   the mean Durbin-Watson statistic

Table 5 (γ12 = .50)

        Exp. 4 (ρ=.05)      Exp. 5 (ρ=.50)      Exp. 6 (ρ=.90)
   k     R²a      d          R²a      d           R²a      d
 0.0    .8973   2.0984      .9178   1.8958       .9475   2.0754
 .025   .8970   2.1040      .9175   1.9054       .9472   2.0865
 .05    .8962   2.1095      .9168   1.9194       .9464   2.1059
 .075   .8950   2.1147      .9155   1.9371       .9451   2.1317
 .1     .8933   2.1198      .9138   1.9576       .9434   2.1622
 .2     .8832   2.1381      .9032   2.0540       .9336   2.3024
 .3     .8702   2.1592      .8895   2.1504       .9184   2.4532
 .5     .8351   2.1721      .8546   2.2988       .8834   2.6086
 .7     .7970   2.1826      .8187   2.4203       .8440   2.7084
 1.0    .7412   2.1900      .7572   2.4732       .7846   2.7887

Table 6 (γ12 = .95)

        Exp. 7 (ρ=.05)      Exp. 8 (ρ=.50)      Exp. 9 (ρ=.90)
   k     R²a      d          R²a      d           R²a      d
 0.0    .9208   2.0511      .9391   1.8785       .9575   1.8707
 .05    .9201   2.0873      .9380   1.9058       .9565   1.8883
 .1     .9181   2.1128      .9359   1.9437       .9544   1.9335
 .2     .9116   2.1473      .9293   2.0280       .9677   2.0562
 .3     .9024   2.1659      .9199   2.1084       .9382   2.1660
 .5     .8786   2.1768      .8957   2.2312       .9136   2.8381
 .7     .8505   2.1725      .8673   2.3082       .8847   2.4417
 1.0    .8056   2.1594      .8217   2.3739       .8399   2.5259

From Tables 4-6 we observe that the adjusted R² increases as the degree of autocorrelation increases, for a given value of k and a given degree of multicollinearity. This is intuitively plausible, since autocorrelation can account for part of the variation in the errors, thereby decreasing the residual sum of squares and increasing the adjusted R². For each experiment, the adjusted R² decreases as k increases; the reason is obvious from the derivation of the ridge estimators. Besides, the best R²a achieved for each experiment is quite high; that is, the estimated model can explain most of the variation in Y_t. This also implies that the estimation methods adopted in our experiments are fairly efficient and powerful. The mean Durbin-Watson statistic computed for each method for each experiment is high enough to ascertain that the fitted model has successfully removed the problem of autocorrelation. Since the model is reasonably well fitted, simulation comparisons of the experimental results should be meaningful as well as informative.
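The in-sample effect of k described above is easy to verify numerically: the ridge residual sum of squares is non-decreasing in k, so the adjusted R² falls as k rises. A toy illustration (assuming the usual definition R²a = 1 − (1 − R²)(n − 1)/(n − p − 1); the data are made up):

```python
import numpy as np

def adjusted_r2_ridge(X, y, k):
    """Fit ridge with penalty k and return the adjusted R-squared of the fit."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / tss
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

Sweeping k over a grid reproduces the monotone decline of R²a seen down every column of Tables 4-6.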
The average MSE of the estimates of β is computed for each method for each experiment and recorded in Table 7.

Table 7

        Exp.1     Exp.2     Exp.3     Exp.4     Exp.5     Exp.6     Exp.7     Exp.8     Exp.9
   k
 0.0    .1101     .4824     .9594     .0030     .0342     .3996     .0180     .0720     .6951
 .025   .0192     .0570     .2691     .1104     .0210     .0945
 .05    .2865     .0390     .0087     .4158     .1965     .0024     .1833     .0719     .0939
 .075   .8820     .3744     .1200     .8973     .5778     .1026
 .1    1.7430    1.0167     .5559    1.5366    1.1097     .3765     .8307     .5643     .0041
 .2    7.3539    5.9115    4.7694    5.3823    4.5690    2.7540    3.2001    1.2549     .5822
 .3   15.1383   13.2456   11.5962   10.8432    9.5949    6.8219    6.6531    5.7624    3.3012
 .5   33.5067   31.1559   28.8369   23.9694   22.3449   18.6255   15.6804   14.2377   10.0251
 .7                                 38.6334   36.9396   32.0214   26.3049   24.3789   18.5115
 1.0  79.3134   76.8708   73.8630   60.7620   50.3176   52.8276   43.3464   40.2351   32.3991

As in the known-autocorrelation case, when k = 0 the MSE of the estimates of β increases as the degree of autocorrelation increases, given the degree of multicollinearity. However, unlike the known-autocorrelation case, the MSE of the estimates of β first decreases and then increases as the degree of multicollinearity increases, for k = 0 and a given degree of autocorrelation. Table 7 shows that, except for experiments 4 and 7, better estimates of β in the MSE criterion can be obtained if Durbin's two-step method is combined with ridge regression for estimation. Besides, remarkably, we have found that we are able to obtain better estimates of β in terms of MSE when the true autocorrelation coefficient ρ is unknown. For clarity, we shall compare only the minimum of the average MSE of the estimates of β achieved for each experiment in the ρ-unknown case with that in the ρ-known case. Table 8 reports the minima of the average MSE of the estimates of β achieved for each experiment in both the known and unknown cases. In addition, the estimation method and the characteristics of each experiment are also tabulated.

Table 8

                                   (ρ unknown)              (ρ known)
Experiment    Estimation
 (γ12, ρ)     Method           k     Min. MSE of β̂      k     Min. MSE of β̂_GR
 (.05,.05)    RR              .025      .0192
 (.05,.50)    Durb.+RR        .05       .0390           .05       .0363
 (.05,.90)    Durb.+RR        .05       .0087           .1        .0156
 (.50,.05)    OLS             0.0       .0030
 (.50,.50)    Durb.+RR        .025      .0210           .025      .0018
 (.50,.90)    Durb.+RR        .05       .0024           .1        .1092
 (.95,.05)    OLS             0.0       .0180
 (.95,.50)    Durb.+RR        .05       .0719           .05       .0561
 (.95,.90)    Durb.+RR        .1        .0441           .1        .2901

A couple o f i n t e r e s t i n g o b s e r v a t i o n s can be made from T a b l e 8.


if

First

the degree o f m u l t i c o l l i n e a r i t y i s h e l d c o n s t a n t , the minimum o f the

average

MSE o f parameter e s t i m a t e s w i l l

first

the degree o f a u t o c o r r e l a t i o n i n c r e a s e s .

i n c r e a s e then decrease as

On the o t h e r hand, g i v e n the

degree o f a u t o c o r r e l a t i o n , t h e minimum o f the average


estimates w i l l

first

decrease

c o l l i n e a r i t y increases.

MSE o f the parameter

then i n c r e a s e as the degree o f m u l t i -

These a r e i n t u i t i v e l y p l a u s i b l e s i n c e s u f f i c i e n t

h i g h degree o f a u t o c o r r e l a t i o n s h o u l d - l e a d to more s t a b l e parameter


e s t i m a t e s w h i l e s u f f i c i e n t h i g h degree o f m u l t i c o l l i n e a r i t y

usually

r e s u l t s i n v e r y u n s t a b l e parameter e s t i m a t e s .

observe

t h a t the v a l u e o f r i d g e parameter k, used


of

MSE

Secondly, we

t o a c h i e v e the minimum MSE

the e s t i m a t e s o f 8, i n c r e a s e s w i t h the degree o f m u l t i c o l l i n e a r i t y

and a u t o c o r r e l a t i o n .

T h i s i s c o n s i s t e n t w i t h our a n a l y t i c

findings

shown i n Section 5.3. Moreover, we have found that knowing

does not

give better estimates of 6 for s u f f i c i e n t high degree of autocorrelation.


This may r e s u l t from sample sizes being small.
Table 9 contains the mean estimates of p^ obtained i n the f i r s t step
of Durbin's method and the mean Haitovsky h e u r i s t i c s t a t i s t i c for each
experiment.

Table 9

Experiment    Mean ρ̂     Bias in ρ̂     H (χ², df = 3)
    1                                      125.7
    2          .3581        .1419          123.1
    3          .7182        .1818          111.7
    4                                       38.7
    5          .3586        .1414           37.4
    6          .7231        .1769           39.1
    7                                        2.78
    8          .3849        .1151            2.53
    9          .7498        .1502            2.40

In all cases, Durbin's two-step method tends to underestimate the true autocorrelation coefficient. This results from the presence of the lagged Y values among the explanatory variables [16]. If the degree of multicollinearity is held constant, the bias of the estimate of ρ increases as the degree of autocorrelation increases; while, given the degree of autocorrelation, the bias decreases as the degree of multicollinearity increases. In our simulation study, the Haitovsky heuristic statistic recognizes the existence of severe multicollinearity in experiments 7, 8 and 9. However, it gives no warning when a fairly high degree of multicollinearity exists; i.e., based on the Haitovsky test, multicollinearity is insignificant in experiments 4, 5 and 6. Since the Haitovsky test is based on the determinant of the correlation matrix, it has some built-in deficiencies (see Section 3.3 for details). Our experiments have disclosed these deficiencies to a certain extent; hence we suggest that special care be exercised in applying this test.
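For reference, the Haitovsky chi-square heuristic is commonly stated as χ²_H = −[n − 1 − (2p + 5)/6] ln(1 − |R|) with p(p − 1)/2 degrees of freedom, testing the null that the correlation matrix R is singular; a small value means perfect multicollinearity cannot be rejected. A sketch under that form (the exact variant used in the thesis may differ in detail):

```python
import numpy as np

def haitovsky(R, n):
    """Haitovsky heuristic chi-square for the p x p correlation matrix R
    of the regressors, observed on n cases.  Small values: the hypothesis
    of a singular R (severe multicollinearity) cannot be rejected."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(1.0 - np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, df
```

With two regressors correlated at .9 and n = 20, |R| = .19 and χ²_H ≈ 3.69 on 1 df; this is below the .05 critical value 3.84, so near-singularity of R is flagged.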
7.2c  Forecasting

Tables 10, 11 and 12 report the average residual sums of squares and the mean square error of prediction from the given values for the forecast period of each experiment, under the assumption that ρ is unknown.

Table 10 (γ12 = .05, σ² = 6)

        Exp. 1               Exp. 2               Exp. 3
   k    σ̂²*     MSE_F/C**    σ̂²      MSE_F/C     σ̂²       MSE_F/C
 0.0    5.9700   8.1132      5.7055   9.1966      5.6351   10.939
 .025   5.9924   8.0343      5.7332   9.6623      5.6721   10.952
 .05    6.0554   7.9961      5.8113   9.0620      5.7762   11.034
 .075   6.1536   7.9986      5.9328   9.1072      5.9382   11.173
 .1     6.2824   8.0343      6.0918   9.1913      6.1501   11.360
 .2     7.0250   8.4293      7.0074   9.8181      7.3716   12.470
 .3     8.0022   9.0877      8.2087  10.739       8.9754   13.932
 .5    10.245   10.755      10.957   12.944      12.6521   17.249
 1.0   15.733   15.075      17.656   18.419      21.649    25.164

*  σ̂²: the average of the residual sums of squares over ten samples.
** MSE_F/C: the average MSE of predictions from the given values for
   the forecast period.

Table 11 (γ12 = .50, σ² = 6)

        Exp. 4               Exp. 5               Exp. 6
   k    σ̂²      MSE_F/C      σ̂²      MSE_F/C     σ̂²       MSE_F/C
 0.0    6.0169   8.2093      5.8625   9.3838      5.7757   10.733
 .025   6.0331   8.1743      5.8828   9.3691      5.8038   10.731
 .05    6.0797   8.1690      5.9409   9.3874      5.8838   10.672
 .075   6.1541   8.1905      6.0330   9.4351      6.0109   10.850
 .1     6.2518   8.2360      6.1559   9.5093      6.1803   10.961
 .2     6.8444   8.6142      6.8849  10.021       7.1976   11.679
 .3     7.2030   9.2476      7.9231  10.783       8.6992   12.482
 .5     9.7190  10.851      10.470   12.735      12.128    15.327
 .7    11.933   12.726      13.070   14.851      15.759    18.018
 1.0   15.429   15.620      17.566   18.301      21.929    22.680

Table 12 (γ12 = .95, σ² = 6)

        Exp. 7               Exp. 8               Exp. 9
   k    σ̂²      MSE_F/C      σ̂²      MSE_F/C     σ̂²       MSE_F/C
 0.0    6.0443   8.3699      5.6725   9.8589      5.4033   11.335
 .05    6.1186   8.1165      5.7759   9.4141      5.5390   10.758
 .1     6.2753   8.1220      6.0473   9.3568      5.8110   10.754
 .2     6.7890   8.4120      6.6160   9.5945      6.7433   11.172
 .3     7.5180   8.9408      7.5161  10.116       7.9187   12.001
 .5     9.4101   9.8466     10.445   11.683      11.120    14.301
 .7    11.643   12.290      12.584   13.651      14.881    17.123
 1.0   15.198   15.343      16.973   16.918      20.713    21.591

Though the BLUP is adopted for forecast purposes, the MSE of prediction will still increase as the degree of autocorrelation increases. The main point, however, is that the presence of multicollinearity will adversely affect the predictive performance if the disturbances are highly serially correlated. The commonly held belief that the predictive power of the model is not affected by the existence of multicollinearity is only true if the problem of autocorrelation is not serious. In the ninth experiment, the model fitted by Durbin's method gives satisfactory results on various diagnostic tests, yet the prediction of the BLUP leaves much to be desired. Fortunately, with Durbin's method combined with Ridge regression, we are able to determine a model which will yield less MSE of prediction and still perform well on various diagnostic tests in the joint presence of multicollinearity and autocorrelation. We also observed that the value of k giving the best estimate of the residual sum of squares will usually give the minimum MSE of prediction for each experiment. However, the value of k that yields the best estimates of β in the MSE criterion may not yield the least MSE of prediction. Hence, the MSE of prediction cannot replace the MSE criterion in the evaluation of parameter estimates.
To avoid confusion, we have not reported the MSE of prediction based on β̂_GLS and the true ρ. However, we found that the MSE of prediction so obtained is less than that based on the parameter estimates obtained by Durbin's two-step method in conjunction with ridge regression. Though Durbin's two-step method combined with ridge regression gives better estimates of β, in general it underestimates ρ. Therefore, the BLUP based on β̂_GLS and the true ρ gives the minimal MSE of prediction in each of the experiments 2, 3, 5, 6, 8 and 9.
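For AR(1) disturbances the BLUP adds the geometrically decayed last residual to the regression forecast, ŷ_{T+h} = x'_{T+h} β̂ + ρ^h e_T; a minimal sketch (this h-step form is the standard result, stated here as an assumption rather than quoted from the thesis):

```python
import numpy as np

def blup_forecast(x_future, beta, rho, last_resid, h=1):
    """h-step-ahead BLUP under AR(1) errors: the regression forecast plus
    rho**h times the last in-sample residual e_T."""
    return float(np.dot(x_future, beta) + rho ** h * last_resid)
```

Underestimating ρ shrinks the carried-over residual too quickly, which is why the BLUP built on β̂_GLS and the true ρ forecasts best.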

CONCLUSIONS

It has been shown that in the presence of multicollinearity with a sufficiently high degree of autocorrelation, the OLS estimates of the regression coefficients can be highly inaccurate. Improving the estimation procedure is obviously necessary. Combining GLS and Ridge regression, we derived a new estimator,

    β̂_GR(k) = (X'Ω⁻¹X + kI)⁻¹ X'Ω⁻¹Y,    0 < k < 1,

where Ω is defined in (2'). β̂_GR(k), though biased, is expected to perform well in the joint presence of multicollinearity and autocorrelation. However, since Ω is unknown, parameter estimates based on the biased estimator β̂_GR(k) cannot be obtained in practice. Therefore, we combined Durbin's two-step method with ordinary Ridge regression to approximate those parameters. The effectiveness of our approximation can then best be examined by Monte Carlo simulation.

Our study has confirmed that, for a given degree of multicollinearity, the MSE of the GLS estimates of β is directly proportional to the degree of autocorrelation. This agrees with conventional wisdom. Unexpectedly, we found that the MSE of the GLS estimates of β is inversely proportional to the degree of multicollinearity for a sufficiently high degree of autocorrelation. This implies that in the application of the GLS technique, the symptom of the existence of multicollinearity may be disguised.

However, since in practice neither the true error terms nor the autocorrelation coefficient is known, no GLS estimates can possibly be obtained. We were pleased to find that, in the joint presence of multicollinearity and autocorrelation, whatever the degree, Durbin's two-step method in conjunction with Ridge regression (ρ unknown) yields even better estimates of β than the GLS technique (ρ known) does in the MSE criterion.

Though the value of k giving better estimates of β tends to yield less MSE of prediction, the GLS technique still gives the minimal MSE of prediction in all the cases.

Besides, our experimental results have shown that the Durbin-Watson test for detecting the existence of first-order autocorrelation remains powerful in the presence of multicollinearity, while the Haitovsky heuristic statistic gives relatively limited information about the existence of multicollinearity, either with or without the presence of autocorrelated error terms.

Our results also suggest that it might be possible to find an "optimal estimation package" that deals with the joint problem of multicollinearity and autocorrelated errors. Empirical research has hitherto been confined to the search for optimal estimation techniques dealing with multicollinearity and autocorrelated errors as separate, independent phenomena. Ordinary ridge regression (i.e., adding a constant k to the diagonal of the correlation matrix X'X) and Durbin's two-step method have been shown to be very powerful techniques in handling the multicollinearity and autocorrelation problems. Even though satisfactory estimation and prediction are obtained by the combination of Durbin's method and ordinary ridge regression, there may still exist other, even more efficient, approaches to the joint problem of multicollinearity and autocorrelation.

For instance, the combination of the Cochrane-Orcutt procedure with Generalized Ridge regression is a more flexible estimation technique and thereby should lead to better estimation and prediction. Allowing for higher-order and mixed-order autocorrelation would be a good direction to pursue as well.

BIBLIOGRAPHY

[1]  Cochrane, D. and Orcutt, G. H. (1949). Application of least-squares regressions to relationships containing autocorrelated error terms. J. Am. Statist. Assoc., 44, 32-61.

[2]  Durbin, J. (1960). Estimation of parameters in time-series regression models. J. Royal Statist. Soc., Series B, 139-153.

[3]  Durbin, J. (1970). Testing for serial correlation in least-squares regression when some of the regressors are lagged dependent variables. Econometrica, 38, 410-421.

[4]  Durbin, J. and Watson, G. S. (1950). Testing for serial correlation in least-squares regression (part 1). Biometrika, 37, 409-428.

[5]  Durbin, J. and Watson, G. S. (1951). Testing for serial correlation in least-squares regression (part 2). Biometrika, 38, 159-178.

[6]  Farrar, D. E. and Glauber, R. R. (1967). Multicollinearity in regression analysis: the problem revisited. Rev. Economics and Statistics, 49, 92-107.

[7]  Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press, North Scituate, Mass.

[8]  Griliches, Z. and Rao, P. (1969). Small-sample properties of several two-stage regression methods in the context of autocorrelated errors. J. Am. Statist. Assoc., 64, 253-272.

[9]  Haitovsky, Y. (1969). Multicollinearity in regression analysis: comment. Rev. Economics and Statistics, 486-489.

[10] Henshaw, R. C., Jr. (1966). Testing single-equation least-squares regression models for autocorrelated disturbances. Econometrica, 34, 646-660.

[11] Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.

[12] Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge regression: some simulations. Comm. Statist., 4, 105-123.

[13] Johnston, J. (1972). Econometric Methods. 2nd edn., McGraw-Hill.

[14] Klein, L. R. (1962). An Introduction to Econometrics. Prentice-Hall.

[15] Liu, T. C. (1960). Underidentification, structural estimation and forecasting. Econometrica, 28, 856.

[16] Marriott, F. H. C. and Pope, J. A. (1954). Bias in the estimation of autocorrelations. Biometrika, 41, 390-402.

[17] Nerlove, M. and Wallis, K. F. (1966). Use of the Durbin-Watson statistic in inappropriate situations. Econometrica, 34, 235-238.

[18] Newhouse, J. P. and Oman, S. D. (1971). An evaluation of ridge estimators. Rand Report No. R-716-PR.

[19] Smith, V. K. (1973). Monte Carlo Methods. Lexington, Mass.

[20] Thisted, R. A. (1976). Ridge regression, minimax estimation and empirical Bayes methods. Ph.D. thesis, Tech. Report 28, Biostatistics Dept., Stanford University.

[21] Thisted, R. A. (1978). Multicollinearity, information and ridge regression. Statistics Dept., University of Chicago.

[22] von Neumann, J. (1941). Distribution of the ratio of the mean square successive difference to the variance. Ann. Math. Statist., 12, 367-395.

[23] White, J. S. (1961). Asymptotic expansions for the mean and variance of the serial correlation coefficient. Biometrika, 48, 85-94.
