
1/32

EC114 Introduction to Quantitative Economics


20. Further Regression Topics II
Marcus Chambers
Department of Economics
University of Essex
20/22 March 2012
2/32
Outline
1. Introduction
2. Heteroskedasticity
3. Autocorrelation
4. Dynamic Models
Reference: R. L. Thomas, Using Statistics in Economics,
McGraw-Hill, 2005, sections 15.3, 15.4 and 16.1.
Introduction 3/32
Many applications in Econometrics are based around the
multiple linear regression model
Y = β₁ + β₂X₂ + . . . + βₖXₖ + ε,
which can be used to test hypotheses arising in economic
models using observed data.
A popular method of estimating the unknown parameters β₁, . . . , βₖ is Ordinary Least Squares (OLS), resulting in the estimators b₁, . . . , bₖ.
Under the Classical assumptions on the regressors (X₂, . . . , Xₖ) and the disturbance (ε) it follows that the OLS estimators are unbiased and have minimum variance;
these are desirable properties for an estimator to possess.
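As a concrete illustration (my own sketch, not part of the lecture), the following Stata commands simulate data satisfying the Classical assumptions and estimate the unknown parameters by OLS; the variable names and parameter values are hypothetical:

* simulate a model with true parameters 1, 2 and 3
clear
set obs 100
set seed 101
generate x2 = 10*runiform()
generate x3 = 10*runiform()
generate y  = 1 + 2*x2 + 3*x3 + rnormal(0, 1)

* OLS produces the estimators b1, b2 and b3
regress y x2 x3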
Introduction 4/32
Let's just recall what the Classical assumptions actually are:
Assumptions concerning the explanatory variables
IA (non-random): X₂, . . . , Xₖ are non-stochastic;
IB (fixed): The values of X₂, . . . , Xₖ are fixed in repeated samples;
ID (no collinearity): There exist no exact linear relationships between the sample values of any two or more of the explanatory variables.
Introduction 5/32
Assumptions concerning the disturbances
IIA (zero mean): E(εᵢ) = 0, for all i;
IIB (constant variance): V(εᵢ) = σ² = constant for all i;
IIC (zero covariance): Cov(εᵢ, εⱼ) = 0 for all i ≠ j;
IID (normality): each εᵢ is normally distributed.
But what happens to the properties of OLS if one or more
of the Classical assumptions does not hold?
We shall look at two cases where the assumptions fail:
heteroskedasticity and autocorrelation.
Heteroskedasticity 6/32
Heteroskedasticity (sometimes written heteroscedasticity) occurs when the variance of ε is not constant throughout the sample.
Recall that Assumption IIB requires
V(εᵢ) = σ² (i.e. constant for all i = 1, . . . , n),
ruling out the possibility that
V(εᵢ) = σᵢ² (i.e. differing across i = 1, . . . , n).
When the variance is constant, as it is under IIB, the
disturbances are said to be homoskedastic.
When the variance is allowed to change, the disturbances
are said to be heteroskedastic.
Heteroskedasticity 7/32
What are the consequences for OLS estimation of
heteroskedasticity?
Assuming the other assumptions (apart from IIB) hold, we find that:
1. the OLS estimators remain unbiased; but
2. the OLS estimators are no longer BLUE; and
3. the usual formulae for estimating the standard errors
of the OLS estimators are incorrect.
The fact that OLS is no longer a best linear unbiased estimator means that we can find other linear unbiased estimators with smaller variance, which are therefore preferred to OLS.
Heteroskedasticity 8/32
A serious consequence of heteroskedasticity is that the
formulae for the OLS standard errors are incorrect.
This is because the formulae are derived assuming a constant variance, σ², which we estimate using s²; this is no longer correct if the variance isn't constant.
This means that, for example, any t-statistics we construct
will be using the wrong standard errors, and so we may
draw the wrong inferences from our hypothesis tests.
For example, we may conclude that a variable Xⱼ is a significant determinant of Y when in fact it isn't, i.e. we reject H₀: βⱼ = 0 when in fact the null is true!
Conversely, we may conclude that a variable Xⱼ is not a significant determinant of Y when in fact it is, i.e. we do not reject H₀: βⱼ = 0 when in fact the null is false!
Heteroskedasticity 9/32
It is often the case that the variance is associated with one
or more of the regressors.
In the two-variable model, suppose that
E(Yᵢ) = 7 + 0.8Xᵢ.
If X₃ = 10 then E(Y₃) = 7 + (0.8 × 10) = 15, whereas if X₃₀ = 100 then E(Y₃₀) = 7 + (0.8 × 100) = 87.
Now, suppose that the actual Y₃ varies (in repeated samples) in a band of width 10 around the value 15, i.e. Y₃ = 15 ± 5.
Is it likely that Y₃₀ will also fluctuate in a band of the same size, i.e. is it likely that Y₃₀ = 87 ± 5?
Often, for larger values, the range of variation will also be larger, e.g. Y₃₀ = 87 ± 29, as illustrated on the next slide:
Heteroskedasticity 10/32
[Figure: scatter of Y against X, with the band of variation around the regression line widening as X increases.]
The range of variation of Y is greater the larger the value of X, i.e. the variance increases with X.
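A small simulation along these lines (a sketch of my own, with arbitrary numbers) generates data with E(Y) = 7 + 0.8X but a disturbance standard deviation that grows with X:

* heteroskedastic disturbances: sd proportional to x
clear
set obs 100
set seed 202
generate x = 100*runiform()
generate e = rnormal(0, 0.05*x)   // V(e) rises with x
generate y = 7 + 0.8*x + e
regress y x
scatter y x                       // spread of y widens as x grows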
Heteroskedasticity 11/32
Given the potentially serious consequences of
heteroskedasticity, particularly for inferences, it is important
to be able to test whether it is present (or absent).
Given that we don't observe ε we have to base any test on the residuals e₁, . . . , eₙ.
If we suspect that the variance is associated with a
particular regressor, X, one thing we can do is plot the
squared residuals against X.
Why squared residuals? Note that E(εᵢ²) = σᵢ², and so eᵢ² is taken as a measure of the variation in the residuals.
An example is given in the following diagram:
Heteroskedasticity 12/32
[Figure: scatter of squared residuals eᵢ² against X, with dispersion increasing in X.]
Note that the dispersion of the squared residuals increases with X, suggesting heteroskedasticity.
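Continuing the simulation sketch above, such a diagram can be produced in Stata by saving the residuals after regress, squaring them and plotting them against X:

* after: regress y x
predict ehat, residuals     // save the OLS residuals
generate e2 = ehat^2        // squared residuals
scatter e2 x                // dispersion should increase with x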
However, although such diagrams can be informative, what
is really required is a statistical test. . .
Heteroskedasticity 13/32
A very general test is the Breusch-Pagan test, or BP test,
which is an example of what is known as a Lagrange
multiplier test.
The test that is used in Stata assumes that
V(εᵢ) = α₁ + α₂Ŷᵢ², i = 1, . . . , n,
where Ŷᵢ denotes the fitted value from the regression.
The null and alternative hypotheses are:
H₀: α₂ = 0, Hₐ: α₂ ≠ 0.
Note that, under H₀, V(εᵢ) = α₁ (constant), while under Hₐ the variance depends on i through Ŷᵢ².
Heteroskedasticity 14/32
Although we shall not go into details of how the statistic is
computed, we do need to know how to use it.
In fact, the test statistic, TS, has a χ²₁ distribution under H₀.
If c0.05 denotes the (upper-tail) 5% critical value from the χ²₁ distribution, then the decision rule is:
if TS > c0.05, reject H₀ in favour of Hₐ;
if TS < c0.05, do not reject H₀, i.e. reserve judgment.
Let's return to the money demand example of 30 countries in 1985, and estimate the equation
Mᵢ = β₁ + β₂Gᵢ + εᵢ, i = 1, . . . , 30,
where M denotes money stock and G denotes GDP.
The Stata output for the regression and the BP test is as
follows:
Heteroskedasticity 15/32
. regress m g
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 1, 28) = 94.88
Model | 20.3862321 1 20.3862321 Prob > F = 0.0000
Residual | 6.01600434 28 .214857298 R-squared = 0.7721
-------------+------------------------------ Adj R-squared = 0.7640
Total | 26.4022364 29 .910421946 Root MSE = .46353
------------------------------------------------------------------------------
m | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g | .1748489 .0179502 9.74 0.000 .1380795 .2116182
_cons | .0212579 .1157594 0.18 0.856 -.2158645 .2583803
------------------------------------------------------------------------------
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of m
chi2(1) = 14.10
Prob > chi2 = 0.0002
We find that TS = 14.10 > 3.841 (the 5% critical value for the χ²₁ distribution) and so we reject H₀, i.e. there is evidence of heteroskedasticity.
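The critical value of 3.841 can be checked in Stata itself, since invchi2tail(df, p) returns the upper-tail critical value of a χ² distribution:

. display invchi2tail(1, 0.05)
3.8414588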
Heteroskedasticity 16/32
The regression output tells us that there is only 0.02% of the distribution to the right of 14.10, so the test statistic is highly significant.
Suppose, however, that we run the regression in logarithms, i.e. we estimate
ln(M)ᵢ = β₁ + β₂ ln(G)ᵢ + εᵢ, i = 1, . . . , 30.
Does this affect the outcome of the test?
The relevant output is:
Heteroskedasticity 17/32
. regress lm lg
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 1, 28) = 232.11
Model | 57.4853608 1 57.4853608 Prob > F = 0.0000
Residual | 6.93446511 28 .247659468 R-squared = 0.8924
-------------+------------------------------ Adj R-squared = 0.8885
Total | 64.4198259 29 2.22137331 Root MSE = .49765
------------------------------------------------------------------------------
lm | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lg | 1.04467 .068569 15.24 0.000 .9042126 1.185127
_cons | -1.912253 .104309 -18.33 0.000 -2.12592 -1.698586
------------------------------------------------------------------------------
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of lm
chi2(1) = 0.19
Prob > chi2 = 0.6603
Here we find that TS = 0.19 < 3.841 and so we do not reject H₀, i.e. there is no significant evidence of heteroskedasticity.
Heteroskedasticity 18/32
The output also tells us that there is 66.03% of the distribution to the right of 0.19, so this statistic is highly insignificant.
This example indicates that the regression in levels may be misspecified, in that there is evidence of non-constant variance in the residuals.
Specifying the relationship between the variables in logarithmic form appears to result in a better specified equation (at least, one in which we are unable to reject the null hypothesis of homoskedasticity).
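For reference, the logarithmic specification can be produced from the levels data with a few lines of Stata (a sketch, assuming the levels variables are named m and g, as in the first regression):

generate lm = ln(m)    // log of money stock
generate lg = ln(g)    // log of GDP
regress lm lg
hettest                // BP test on the log-log regression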
We shall now examine another situation in which one of
the Classical assumptions is violated. . .
Autocorrelation 19/32
Autocorrelation (or serial correlation) occurs when there is
correlation between the disturbances at different points in
the sample.
Recall that Assumption IIC requires
Cov(εᵢ, εⱼ) = 0 for all i ≠ j.
However, if it is the case that
Cov(εᵢ, εⱼ) ≠ 0 for at least one pair i ≠ j,
then the disturbances are said to be autocorrelated.
Such a situation is common in time series in which effects
from one period persist into the next.
If the disturbance this period, εₜ, is affected by the disturbance last period, εₜ₋₁, then Cov(εₜ, εₜ₋₁) ≠ 0.
Autocorrelation 20/32
What are the consequences for OLS estimation of
autocorrelation?
Assuming the other assumptions (apart from IIC) hold, we find that:
1. the OLS estimators remain unbiased; but
2. the OLS estimators are no longer BLUE; and
3. the usual formulae for estimating the standard errors
of the OLS estimators are incorrect.
The fact that OLS is no longer a best linear unbiased estimator means that we can find other linear unbiased estimators with smaller variance, which are therefore preferred to OLS.
Autocorrelation 21/32
A serious consequence of autocorrelation is that the
formulae for the OLS standard errors are incorrect.
This is because the formulae are derived assuming Cov(εᵢ, εⱼ) = 0 for all i ≠ j; this is no longer correct under autocorrelation.
This means that, for example, any t-statistics we construct
will be using the wrong standard errors, and so we may
draw the wrong inferences from our hypothesis tests.
For example, we may conclude that a variable Xⱼ is a significant determinant of Y when in fact it isn't, i.e. we reject H₀: βⱼ = 0 when in fact the null is true!
Conversely, we may conclude that a variable Xⱼ is not a significant determinant of Y when in fact it is, i.e. we do not reject H₀: βⱼ = 0 when in fact the null is false!
Autocorrelation 22/32
In view of the potential for making incorrect inferences it is
important to be able to test for autocorrelation.
Most tests are concerned with autocorrelation being of an
autoregressive form.
For example, a first-order autoregressive process is
εₜ = ρεₜ₋₁ + uₜ,
where −1 < ρ < 1 and uₜ is a disturbance that satisfies the Classical assumptions.
As εₜ depends on εₜ₋₁ it follows that Cov(εₜ, εₜ₋₁) ≠ 0.
When ρ > 0 we have positive autocorrelation: positive values of εₜ₋₁ tend to be followed by positive values of εₜ.
When ρ < 0 we have negative autocorrelation: positive values of εₜ₋₁ tend to be followed by negative values of εₜ.
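To see what such a process looks like, the following sketch (my own, with ρ = 0.7 chosen arbitrarily) simulates an AR(1) disturbance in Stata; the recursive replace works because observations are processed in order:

* simulate an AR(1) disturbance with rho = 0.7
clear
set obs 200
set seed 303
generate t = _n
tsset t
generate u = rnormal()              // Classical disturbance
generate e = u                      // starting value for t = 1
replace  e = 0.7*L.e + u if _n > 1  // AR(1) recursion
line e t                            // smooth swings: positive autocorrelation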
Autocorrelation 23/32
Note that, when ρ = 0, there is no autocorrelation because εₜ = uₜ, which is Classical and, hence, uncorrelated.
A test for first-order autocorrelation can be regarded as a test of H₀: ρ = 0 against Hₐ: ρ ≠ 0.
More generally, an autoregressive process of order p is
εₜ = ρ₁εₜ₋₁ + . . . + ρₚεₜ₋ₚ + uₜ,
so that εₜ depends on its own value in the p preceding periods.
When all the parameters (ρ₁, . . . , ρₚ) are zero there is no autocorrelation.
A test for pth-order autocorrelation can be regarded as a test of H₀: ρ₁ = . . . = ρₚ = 0 against Hₐ: at least one ρⱼ ≠ 0.
Autocorrelation 24/32
How can we test for serial correlation in disturbances?
A widely used test is the Breusch-Godfrey test, or BG test,
which is another example of a Lagrange multiplier test.
Under the null hypothesis of no autocorrelation, the statistic has a χ²ₚ distribution, where p denotes the highest order of autocorrelation being tested for.
If c0.05 denotes the (upper-tail) 5% critical value from the χ²ₚ distribution, then the decision rule is:
if TS > c0.05, reject H₀ in favour of Hₐ;
if TS < c0.05, do not reject H₀, i.e. reserve judgment.
In other words, a significant value of TS indicates autocorrelation in the disturbances.
Autocorrelation 25/32
Example. Dataset 7 in Thomas contains US annual
observations from 1959 to 1997 on the following variables:
RY: real personal disposable income;
RC: real non-durable consumption expenditure;
NC: nominal non-durable consumption expenditure.
The real variables are in constant 1996 dollars; the
nominal variable is in current dollars; and all variables are
in per capita terms.
Let's estimate the following equation:
ln(RC)ₜ = β₁ + β₂ ln(RY)ₜ + εₜ, t = 1959, . . . , 1997,
and test for autocorrelation using the BG statistic.
The output from Stata is as follows:
Autocorrelation 26/32
. regress lrc lry
Source | SS df MS Number of obs = 39
-------------+------------------------------ F( 1, 37) = 9563.63
Model | 2.7811551 1 2.7811551 Prob > F = 0.0000
Residual | .010759799 37 .000290805 R-squared = 0.9961
-------------+------------------------------ Adj R-squared = 0.9960
Total | 2.7919149 38 .073471445 Root MSE = .01705
------------------------------------------------------------------------------
lrc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lry | 1.005177 .0102785 97.79 0.000 .9843504 1.026003
_cons | -.1666405 .0988785 -1.69 0.100 -.3669873 .0337063
------------------------------------------------------------------------------
. estat bgodfrey, lags(1)
Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
lags(p) | chi2 df Prob > chi2
-------------+-------------------------------------------------------------
1 | 23.776 1 0.0000
---------------------------------------------------------------------------
H0: no serial correlation
We find that TS = 23.776 > 3.841 (the 5% critical value for the χ²₁ distribution) and so we reject H₀, i.e. there is evidence of first-order autocorrelation.
Dynamic Models 27/32
What do we do if we find evidence of autocorrelation?
A possible cause of autocorrelation in the disturbances is that genuine dynamic factors have been ignored in the regression.
Suppose, in the consumption example, that current consumption depends not only on current income but also on the values of these variables in the previous period.
This means that we should be estimating the model
ln(RC)ₜ = β₁ + β₂ ln(RY)ₜ + β₃ ln(RY)ₜ₋₁ + β₄ ln(RC)ₜ₋₁ + εₜ,
and because we ignored the lagged values, the autocorrelation showed up in the residuals instead.
The resulting regression output is:
Dynamic Models 28/32
. regress lrc lry L.lry L.lrc
Source | SS df MS Number of obs = 38
-------------+------------------------------ F( 3, 34) = 8252.01
Model | 2.55683972 3 .852279906 Prob > F = 0.0000
Residual | .00351157 34 .000103281 R-squared = 0.9986
-------------+------------------------------ Adj R-squared = 0.9985
Total | 2.56035129 37 .069198683 Root MSE = .01016
------------------------------------------------------------------------------
lrc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lry |
--. | .8875018 .1034194 8.58 0.000 .6773282 1.097675
L1. | -.7417126 .1410369 -5.26 0.000 -1.028334 -.4550912
|
lrc |
L1. | .8614589 .1065743 8.08 0.000 .644874 1.078044
|
_cons | -.0828006 .0666774 -1.24 0.223 -.2183054 .0527042
------------------------------------------------------------------------------
. estat bgodfrey, lags(1)
Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
lags(p) | chi2 df Prob > chi2
-------------+-------------------------------------------------------------
1 | 0.030 1 0.8631
---------------------------------------------------------------------------
H0: no serial correlation
Dynamic Models 29/32
We find that TS = 0.030 < 3.841 and so we do not reject H₀, i.e. there is no evidence of first-order autocorrelation.
Incorporating lagged values of the dependent variable and
the regressor has allowed us to model (or account for) the
autocorrelation in a more direct way.
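For completeness, a sketch of the Stata workflow behind this output: declare the data as a time series, then use the lag operator L. to include lagged regressors (the name of the year variable is an assumption):

tsset year                     // declare annual time-series data
regress lrc lry L.lry L.lrc    // add one lag of income and of consumption
estat bgodfrey, lags(1)        // re-test for first-order autocorrelation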
It is usually a good idea to try to model dynamic features rather than let the unobserved part of the model (the disturbance) account for them; this can be particularly important when using the model for forecasting.
One of the implications of a dynamic model is that the
effects of changes in the regressors are different in the
short run and the long run.
Dynamic Models 30/32
Suppose we have the model
Yₜ = β₁ + β₂Xₜ + β₃Xₜ₋₁ + β₄Yₜ₋₁ + εₜ.
The immediate impact of a unit change in Xₜ is represented by the coefficient β₂, but it will also affect Yₜ₊₁ via the coefficient β₃.
What is the long run effect?
Let's define the long run as a situation in which all variables are in equilibrium, so that
Yₜ = Yₜ₋₁ = Y and Xₜ = Xₜ₋₁ = X.
Plugging these values into the model (and ignoring ε) gives
Y = β₁ + β₂X + β₃X + β₄Y.
Dynamic Models 31/32
Collecting terms we obtain
(1 − β₄)Y = β₁ + (β₂ + β₃)X,
which results in
Y = β₁/(1 − β₄) + [(β₂ + β₃)/(1 − β₄)] X.
Hence the long run effect of a unit change in X is equal to (β₂ + β₃)/(1 − β₄), which can be compared to the impact effect of β₂.
Dynamic models therefore allow for different short run and
long run properties.
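Using the estimates reported in the output above, the impact effect is b₂ = 0.8875, while the long run effect is (0.8875 − 0.7417)/(1 − 0.8615) ≈ 1.05. As a sketch, the long run effect and a delta-method standard error can be obtained in Stata after the dynamic regression with nlcom:

* long run effect = (b2 + b3)/(1 - b4)
regress lrc lry L.lry L.lrc
nlcom (_b[lry] + _b[L.lry]) / (1 - _b[L.lrc])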
Summary 32/32
Summary
heteroskedasticity
autocorrelation
dynamic models
Next term:
revision classes