Further Regression Topics

EC114 Introduction to Quantitative Economics

19. Further Regression Topics I
Marcus Chambers
Department of Economics
University of Essex
13/15 March 2012
Outline
1. Dummy Variables
2. Chow Tests
Reference: R. L. Thomas, Using Statistics in Economics,
McGraw-Hill, 2005, sections 14.1 and 14.2.
Dummy Variables
Sometimes the variables we want to use can't be
measured in a precise quantitative way.
We therefore have to introduce qualitative factors into the
analysis.
For example, suppose we want to study the demand for
beef over time.
We might consider the regression model
$Q_t = \alpha + \beta Y_t + \epsilon_t, \quad t = 1, \ldots, n,$
where Q is per-capita demand for beef, Y is per-capita
disposable income, and the t subscript denotes the time
period which indexes observations.
Suppose that we suspect the nature of demand for beef to
change in certain periods because of a scare over mad
cow disease and its link to Creutzfeldt-Jakob disease (CJD) in humans.
How can we deal with such a qualitative factor?
The approach we will follow introduces a dichotomous or
dummy variable such that, in period t,
$D_t = 1$ when consumers fear getting CJD,
$D_t = 0$ when consumers do not fear getting CJD.
We shall see how such variables can be used in a
regression model to allow the parameters of the model to
change in certain periods.
We shall also consider tests of whether the parameters are
constant over the entire sample or whether they change in
the way described above.
Suppose we have monthly data on the demand for beef for
5 years (n = 60).
Furthermore, suppose that we know that the CJD fear is
particularly relevant for the 12 months of the second year
and for the first 6 months of the third year.
We can then define a dummy variable for the relevant
period as:
$D_t = 1$ for $t = 13, \ldots, 30$,
$D_t = 0$ for all other $t$.
This variable enters the data set as a series of zeros and
ones and is treated like any other variable.
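As a minimal sketch (Python, standard library only; the variable names are illustrative), the CJD dummy for the 60 monthly observations can be built as a list of zeros and ones, with months indexed from t = 1 as in the text:

```python
# Monthly time index t = 1, ..., 60 (five years of data)
n = 60
t_index = list(range(1, n + 1))

# D_t = 1 during the CJD scare (t = 13, ..., 30), 0 for all other t
d = [1 if 13 <= t <= 30 else 0 for t in t_index]

print(sum(d))                       # 18 scare months
print(d[11], d[12], d[29], d[30])   # values at t = 12, 13, 30, 31: 0 1 1 0
```

Python lists are indexed from 0, so month t sits at position t - 1.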
Note that we can only construct $D_t$ if we have information
on the period affected by the CJD scare.
We can allow for the CJD scare to affect the intercept, $\alpha$,
simply by including $D_t$ as an additional explanatory
variable:
$Q_t = \alpha + \gamma D_t + \beta Y_t + \epsilon_t, \quad t = 1, \ldots, n.$
When $D_t = 0$ (no CJD scare) the model reduces to
$Q_t = \alpha + \beta Y_t + \epsilon_t.$
But during the CJD scare, $D_t = 1$ and so
$Q_t = (\alpha + \gamma) + \beta Y_t + \epsilon_t,$
the intercept becoming $\alpha + \gamma$ (and we would probably
expect $\gamma < 0$, so that demand falls for given Y during the
CJD scare).
Suppose we run this regression and obtain
$\hat{Q}_t = 25 - 12D_t + 0.02Y_t.$
This implies that during the CJD scare (setting $D_t = 1$)
$\hat{Q}_t = 13 + 0.02Y_t,$
while when there is no CJD scare (setting $D_t = 0$)
$\hat{Q}_t = 25 + 0.02Y_t.$
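A small sketch can make the parallel shift concrete: it evaluates the fitted equation $\hat{Q}_t = 25 - 12D_t + 0.02Y_t$ in both regimes. The income levels are hypothetical, chosen only for illustration.

```python
def fitted_beef_demand(y, d):
    """Fitted per-capita beef demand from Q_hat = 25 - 12*D + 0.02*Y."""
    return 25 - 12 * d + 0.02 * y

# Compare the two regimes at a few hypothetical income levels
for y in (500, 1000, 1500):
    normal = fitted_beef_demand(y, 0)   # no CJD scare
    scare = fitted_beef_demand(y, 1)    # CJD scare
    print(y, normal, scare, scare - normal)
```

The last column is the same (-12) at every income level: the dummy moves the intercept but leaves the slope 0.02 unchanged.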
This effect is illustrated in the following diagram:

The presence of the dummy variable enables the
estimated demand equation to shift downwards during the
CJD scare.
Note, however, that the slope remains unaffected.
The shift in the demand equation, allowing different
demand equations in the different periods, was obtained by
estimating a single equation.
This means that we can test whether the shift is significant by
carrying out a t-test for the significance of the dummy
variable $D_t$.
So, to test $H_0: \gamma = 0$ against $H_A: \gamma \neq 0$ we could use
$TS = \hat{\gamma}/s_{\hat{\gamma}} \sim t_{n-3}$ under $H_0$,
where $\hat{\gamma}$ denotes the estimated coefficient on $D_t$ and $s_{\hat{\gamma}}$ is
its standard error.
It is also possible, however, that the CJD scare affects
the marginal propensity to consume beef out of income,
i.e. the parameter $\beta$.
This situation might appear to be a bit more complicated
but it is also straightforward to handle.
Instead of including the variable $D_t$ by itself in the
regression, we now include the product of $D_t$ with $Y_t$ in the
regression, i.e. we include the variable $Y_tD_t$.
It is easy to construct a new variable in regression software
that is the product of two variables.
For example, in Stata, if the two variables are named d and y
(Stata variable names are case-sensitive), we can use the command:
gen yd = y*d
to generate the product.
The starting point is now given by the equation
$Q_t = \alpha + \beta Y_t + \delta Y_tD_t + \epsilon_t.$
When $D_t = 0$ it follows that $Y_tD_t = 0$ and we have the
original equation
$Q_t = \alpha + \beta Y_t + \epsilon_t,$
while when $D_t = 1$ we have $Y_tD_t = Y_t$ and we obtain
$Q_t = \alpha + (\beta + \delta)Y_t + \epsilon_t.$
If $\delta < 0$ and $\beta + \delta > 0$ the intercept remains unchanged but
the slope falls while remaining positive.
Suppose we run this regression and obtain
$\hat{Q}_t = 25 + 0.02Y_t - 0.01Y_tD_t.$
This implies that during the CJD scare (setting $D_t = 1$)
$\hat{Q}_t = 25 + 0.01Y_t,$
while when there is no CJD scare (setting $D_t = 0$)
$\hat{Q}_t = 25 + 0.02Y_t.$
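The same kind of sketch for the product dummy shows the rotation instead of the shift. It builds the interaction $Y_tD_t$ explicitly (the Python analogue of `gen yd = y*d`) and evaluates $\hat{Q}_t = 25 + 0.02Y_t - 0.01Y_tD_t$; the function names are illustrative.

```python
def fitted_slope_dummy(y, d):
    """Fitted beef demand from Q_hat = 25 + 0.02*Y - 0.01*(Y*D)."""
    yd = y * d                       # product variable Y_t * D_t
    return 25 + 0.02 * y - 0.01 * yd

def regime_slope(d):
    """Slope dQ_hat/dY: 0.02 when D = 0, 0.02 - 0.01 = 0.01 when D = 1."""
    return 0.02 - 0.01 * d

# Both regimes share the intercept 25 (at Y = 0), but the slopes differ
print(fitted_slope_dummy(0, 0), fitted_slope_dummy(0, 1))
print(regime_slope(0), regime_slope(1))
```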
This effect is illustrated in the following diagram:

The presence of the product dummy $Y_tD_t$ results in the
slope of the estimated equation changing.
Note that the intercept remains unchanged.
We can, of course, allow both the intercept and the slope
to change at the same time.
This requires adding both $D_t$ and $Y_tD_t$ to the regression:
$Q_t = \alpha + \gamma_1 D_t + \beta Y_t + \gamma_2 Y_tD_t + \epsilon_t.$
When $D_t = 0$ we also have $Y_tD_t = 0$, and so we obtain the
original equation
$Q_t = \alpha + \beta Y_t + \epsilon_t,$
while when $D_t = 1$ we have $Y_tD_t = Y_t$ and we obtain
$Q_t = (\alpha + \gamma_1) + (\beta + \gamma_2)Y_t + \epsilon_t.$
We can generalise this to any multiple linear regression.
Example. Consider the logarithmic money demand model
$\ln(M) = \beta_1 + \beta_2 \ln(G) + \epsilon,$
where M denotes the money stock and G denotes GDP.
Data set 9.1 in Thomas contains observations for 30
countries in 1985 and we shall split the sample into those
countries with GDP per head below $4,000 and those with GDP per head above $4,000.
We can define an appropriate dummy variable:
D = 1 if GDP per head < $4,000,
D = 0 if GDP per head > $4,000.
We shall run regressions to see if the intercept and/or
slope are different for countries in these two ranges of
GDP per head.
Including the dummy variable yields:
. regress lm lg d
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 2, 27) = 119.10
Model | 57.8613015 2 28.9306507 Prob > F = 0.0000
Residual | 6.5585244 27 .242908311 R-squared = 0.8982
-------------+------------------------------ Adj R-squared = 0.8906
Total | 64.4198259 29 2.22137331 Root MSE = .49286
------------------------------------------------------------------------------
lm | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lg | .9030628 .132545 6.81 0.000 .6311029 1.175023
d | -.4534076 .3644604 -1.24 0.224 -1.201219 .2944033
_cons | -1.519285 .3323406 -4.57 0.000 -2.201191 -.8373782
------------------------------------------------------------------------------
The dummy variable is statistically insignificant ($t = -1.24$,
p-value 0.224) and so there does not appear to be any
difference in the intercept between these two groups.
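The t-ratio in the output is just the coefficient divided by its standard error. A minimal check using the reported figures for d; the 5% two-sided critical value of t with $n - 3 = 27$ degrees of freedom (about 2.052) is taken from standard tables and supplied as an assumed input rather than computed.

```python
coef_d = -0.4534076     # estimated coefficient on d (Stata output above)
se_d = 0.3644604        # its standard error
n, k = 30, 3            # observations and estimated coefficients (lg, d, _cons)

t_stat = coef_d / se_d  # TS = gamma_hat / s_gamma_hat
t_crit = 2.052          # approx. 5% two-sided critical value, t with 27 df (from tables)

print(round(t_stat, 2))         # matches the t column of the output
print(abs(t_stat) > t_crit)     # False: do not reject H0: gamma = 0
```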
Including the product variable $D\ln(G)$ (named dlg) instead of D yields:
. regress lm lg dlg
-------------+------------------------------ F( 2, 27) = 116.31
Model | 57.7200283 2 28.8600142 Prob > F = 0.0000
-------------+------------------------------ Adj R-squared = 0.8883
Total | 64.4198259 29 2.22137331 Root MSE = .49814
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
lg | 1.089832 .0828707 13.15 0.000 .9197953 1.259869
dlg | -.1650309 .1697023 -0.97 0.339 -.5132313 .1831695
_cons | -1.958399 .1146874 -17.08 0.000 -2.193718 -1.72308
------------------------------------------------------------------------------
The product variable is also statistically insignificant ($t = -0.97$,
p-value 0.339), suggesting no difference in the slope parameter (income
elasticity of money demand) between the two groups of
countries.
Dummy variables are widely used in Econometrics.
In cross-sections they can be used to represent things
such as:
gender: D = 1 if female, D = 0 if male;
employment status: D = 1 if employed,
D = 0 if unemployed;
marital status: D = 1 if married,
D = 0 if unmarried.
In time series dummies can be used to represent things
such as:
season: $D_j = 1$ if quarter j,
$D_j = 0$ otherwise ($j = 1, \ldots, 4$);
particular event: D = 1 during wartime,
D = 0 otherwise.
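The seasonal dummies above can be generated from a quarter index. A minimal sketch, assuming a hypothetical sample of two full years starting in quarter 1:

```python
def quarterly_dummies(quarters):
    """Return {j: [D_j for each observation]} for j = 1, ..., 4."""
    return {j: [1 if q == j else 0 for q in quarters] for j in range(1, 5)}

# Hypothetical quarterly sample: two full years starting in quarter 1
quarters = [1, 2, 3, 4, 1, 2, 3, 4]
dummies = quarterly_dummies(quarters)

print(dummies[1])   # [1, 0, 0, 0, 1, 0, 0, 0]
print(dummies[4])   # [0, 0, 0, 1, 0, 0, 0, 1]
```

Each observation falls in exactly one quarter, so in every period exactly one of the four $D_j$ equals 1.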
Chow Tests
It is also possible to test whether coefficients in a
regression remain constant over two pre-specified
sub-samples using the Chow test for parameter stability.
Suppose we split the sample of n observations into two
sub-samples, the first containing $n_1$ observations, the
second containing $n_2$ observations, where $n_1 + n_2 = n$.
Suppose, in the first sub-sample, we have the population
regression
$E(Y) = \alpha_1 + \alpha_2 X_2 + \ldots + \alpha_k X_k,$
while in the second sub-sample we have the population
regression
$E(Y) = \beta_1 + \beta_2 X_2 + \ldots + \beta_k X_k.$
The null hypothesis of interest is
$H_0: \alpha_1 = \beta_1, \alpha_2 = \beta_2, \ldots, \alpha_k = \beta_k,$
i.e. that the coefficients are the same in the two
sub-samples.
The alternative hypothesis is
$H_A: \alpha_j \neq \beta_j$ for at least one j.
We need to conduct three regressions and obtain the sum
of squared residuals, SSR, from each regression.
The required regressions are as follows:
Regression 1: use the $n_1$ observations, obtain $SSR_1$;
Regression 2: use the $n_2$ observations, obtain $SSR_2$;
Regression 3: use all n observations, obtain $SSR_p$.
Regression 3 is sometimes known as a pooled regression
because all the observations have been pooled together.
There are two versions of the Chow test.
The first version compares all three values of SSR, while
the second compares either $SSR_1$ or $SSR_2$ with $SSR_p$.
The first test statistic is
$TS = \dfrac{(SSR_p - SSR_1 - SSR_2)/k}{(SSR_1 + SSR_2)/(n_1 + n_2 - 2k)} \sim F_{k,\, n_1+n_2-2k}$ under $H_0$.
The usual decision rule applies:
if $TS > F_{0.05}$ reject $H_0$ in favour of $H_A$;
if $TS < F_{0.05}$ do not reject $H_0$,
where $F_{0.05}$ is the 5% critical value from the $F_{k,\, n_1+n_2-2k}$
distribution.
This test assesses whether the gain from estimating two
separate regressions, rather than the pooled regression, is
statistically significant (as measured by comparing $SSR_p$
with $SSR_1 + SSR_2$).
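The first statistic is simple arithmetic on the three SSR values. A minimal sketch (the function name is illustrative), applied to the figures from the money demand example later in this section:

```python
def chow_first(ssr_p, ssr_1, ssr_2, n1, n2, k):
    """First version of the Chow test.

    Returns (TS, df1, df2); under H0, TS ~ F(df1, df2) with
    df1 = k and df2 = n1 + n2 - 2k.
    """
    df1 = k
    df2 = n1 + n2 - 2 * k
    gain = ssr_p - ssr_1 - ssr_2      # reduction in SSR from splitting the sample
    ts = (gain / df1) / ((ssr_1 + ssr_2) / df2)
    return ts, df1, df2

# Money demand example: SSR_p, SSR_1, SSR_2 with n1 = 19, n2 = 11, k = 2
ts, df1, df2 = chow_first(6.9345, 4.6831, 1.8268, n1=19, n2=11, k=2)
print(round(ts, 4), df1, df2)   # 0.8479 2 26
```

The critical value $F_{0.05}$ still has to come from F tables (or a statistics library); the function only forms the statistic and its degrees of freedom.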
The second test statistic is
$TS = \dfrac{(SSR_p - SSR_1)/n_2}{SSR_1/(n_1 - k)} \sim F_{n_2,\, n_1-k}$ under $H_0$.
The usual decision rule applies:
if $TS > F_{0.05}$ reject $H_0$ in favour of $H_A$;
if $TS < F_{0.05}$ do not reject $H_0$,
where $F_{0.05}$ is the 5% critical value from the $F_{n_2,\, n_1-k}$
distribution.
This test is sometimes known as a predictive failure test,
because it assesses whether the additional $n_2$
observations included in the pooled regression result in a
statistically significant proportionate change in SSR.
In time series applications the second sub-sample of $n_2$
observations follows chronologically from the first
sub-sample of $n_1$ observations, so there is a clear ordering
of the observations.
In cross-sections, however, there is often no unique
ordering, and so the roles of the two sub-samples can be
reversed, resulting in the test statistic
$TS = \dfrac{(SSR_p - SSR_2)/n_1}{SSR_2/(n_2 - k)} \sim F_{n_1,\, n_2-k}$ under $H_0$.
In this case the $n_2$ observations are taken as the reference
point, which wouldn't make sense with time series.
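Both orderings of the predictive failure test use the same formula with the reference sub-sample swapped. A minimal sketch (the function and parameter names are illustrative), applied to the money demand figures that appear later in this section:

```python
def chow_predictive(ssr_p, ssr_ref, n_ref, n_extra, k):
    """Predictive-failure (second) version of the Chow test.

    ssr_ref and n_ref belong to the reference sub-sample; n_extra is the
    number of additional observations included in the pooled regression.
    Under H0, TS ~ F(n_extra, n_ref - k).
    """
    df1 = n_extra
    df2 = n_ref - k
    ts = ((ssr_p - ssr_ref) / df1) / (ssr_ref / df2)
    return ts, df1, df2

# Sub-sample 1 as reference (the natural choice with time series)
res_12 = chow_predictive(6.9345, 4.6831, n_ref=19, n_extra=11, k=2)
# Roles reversed: sub-sample 2 as reference (cross-sections only)
res_21 = chow_predictive(6.9345, 1.8268, n_ref=11, n_extra=19, k=2)
print(res_12)   # TS close to 0.7430 with df (11, 17)
print(res_21)   # TS close to 1.3244 with df (19, 9)
```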
Another way to think of the Chow test is in terms of dummy
variables.
Let D denote the following dummy variable:
D = 1 if sub-sample 1 ($n_1$ observations),
D = 0 if sub-sample 2 ($n_2$ observations).
We can then define the population regression
$E(Y) = \beta_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \gamma_1 D + \gamma_2 (DX_2) + \ldots + \gamma_k (DX_k).$
In sub-sample 1, we have
$E(Y) = (\beta_1 + \gamma_1) + (\beta_2 + \gamma_2) X_2 + \ldots + (\beta_k + \gamma_k) X_k,$
while in sub-sample 2 we have
$E(Y) = \beta_1 + \beta_2 X_2 + \ldots + \beta_k X_k.$
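In practice this model just needs an extended set of regressors for each observation. A minimal sketch of how one row could be assembled (the helper name is hypothetical): the original regressors, including the intercept column, followed by each of them multiplied by D, giving 2k columns in total.

```python
def interacted_row(x, d):
    """Regressors for one observation in the dummy-variable form of the Chow test.

    x is [X_2, ..., X_k] (the intercept column is added here); d is 0 or 1.
    Returns [1, X_2, ..., X_k, D, D*X_2, ..., D*X_k], i.e. 2k regressors.
    """
    base = [1.0] + [float(v) for v in x]
    return base + [d * v for v in base]   # D * 1 = D supplies the intercept shift

print(interacted_row([2.0, 5.0], 1))  # [1.0, 2.0, 5.0, 1.0, 2.0, 5.0]
print(interacted_row([2.0, 5.0], 0))  # [1.0, 2.0, 5.0, 0.0, 0.0, 0.0]
```

For sub-sample 2 (D = 0) the interaction columns are all zero, which is why only the $\beta_j$ remain in that sub-sample.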
The differences in the coefficients between the two
sub-samples are captured by the $\gamma_j$ parameters.
If the $\gamma_j$ are all zero then there are no differences.
We can therefore consider our test of parameter stability as
being a test of
$H_0: \gamma_1 = 0, \gamma_2 = 0, \ldots, \gamma_k = 0$
against $H_A$: at least one $\gamma_j \neq 0$.
We can carry out an F-test of these k restrictions by
running just two regressions, as follows:
Regression 1 is the unrestricted regression
$Y = \beta_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \gamma_1 D + \gamma_2 (DX_2) + \ldots + \gamma_k (DX_k) + \epsilon;$
from this we need the SSR, denoted $SSR_U$.
Regression 2 is the restricted regression
$Y = \beta_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \epsilon;$
from this we need $SSR_R$.
We then construct the test statistic
$TS = \dfrac{(SSR_R - SSR_U)/k}{SSR_U/(n - 2k)} \sim F_{k,\, n-2k}$ under $H_0$
and apply the usual decision rule.
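This is the standard F statistic for linear restrictions, with q = k restrictions and 2k coefficients in the unrestricted regression. A minimal sketch (the function name is illustrative), using the restricted and unrestricted SSR values from the money demand example later in this section:

```python
def f_restrictions(ssr_r, ssr_u, q, n, p_u):
    """F statistic for q linear restrictions.

    ssr_r: SSR of the restricted regression; ssr_u: SSR of the unrestricted
    regression, which has p_u estimated coefficients.
    Under H0, TS ~ F(q, n - p_u).
    """
    df2 = n - p_u
    ts = ((ssr_r - ssr_u) / q) / (ssr_u / df2)
    return ts, q, df2

# Dummy-variable form of the Chow test: q = k restrictions, p_u = 2k coefficients
k, n = 2, 30
ts, df1, df2 = f_restrictions(ssr_r=6.9345, ssr_u=6.5099, q=k, n=n, p_u=2 * k)
print(round(ts, 4), df1, df2)   # 0.8479 2 26
```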
Example. Let's return to the money demand example,
where we divided the countries into those with
GDP > $4,000 per head and those with GDP < $4,000 per
head.
We shall conduct Chow tests of whether the regression
parameters differ between these two groups; we
begin with the pooled regression:
. regress lm lg
-------------+------------------------------ F( 1, 28) = 232.11
Model | 57.4853608 1 57.4853608 Prob > F = 0.0000
-------------+------------------------------ Adj R-squared = 0.8885
Total | 64.4198259 29 2.22137331 Root MSE = .49765
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
lg | 1.04467 .068569 15.24 0.000 .9042126 1.185127
_cons | -1.912253 .104309 -18.33 0.000 -2.12592 -1.698586
------------------------------------------------------------------------------
The two sub-sample regressions are:
. regress lm lg if g<4
-------------+------------------------------ F( 1, 17) = 38.52
Model | 10.6102646 1 10.6102646 Prob > F = 0.0000
-------------+------------------------------ Adj R-squared = 0.6758
Total | 15.2933819 18 .849632328 Root MSE = .52486
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
lg | .9226822 .148673 6.21 0.000 .6090096 1.236355
_cons | -1.970365 .1216961 -16.19 0.000 -2.227121 -1.713609
------------------------------------------------------------------------------
. regress lm lg if g>4
-------------+------------------------------ F( 1, 9) = 3.52
Model | .714288023 1 .714288023 Prob > F = 0.0934
-------------+------------------------------ Adj R-squared = 0.2012
Total | 2.54105312 10 .254105312 Root MSE = .45053
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
lg | .7237502 .3858088 1.88 0.093 -.1490099 1.59651
_cons | -1.117129 .8758755 -1.28 0.234 -3.098497 .864239
------------------------------------------------------------------------------
For the first test we find that
$SSR_p = 6.9345$, $SSR_1 = 4.6831$, $SSR_2 = 1.8268$
with $n_1 = 19$, $n_2 = 11$ and $k = 2$.
The test statistic is
$TS = \dfrac{(SSR_p - SSR_1 - SSR_2)/k}{(SSR_1 + SSR_2)/(n_1 + n_2 - 2k)} = \dfrac{(6.9345 - 4.6831 - 1.8268)/2}{(4.6831 + 1.8268)/(19 + 11 - 4)} = 0.8479.$
Under $H_0$ (that the coefficients are the same in the two
sub-samples) TS has an $F_{2,26}$ distribution, and so
$F_{0.05} = 3.3690$.
As TS = 0.8479 < 3.3690 we are unable to reject $H_0$.
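The SSR values are not printed directly in the (truncated) Stata tables above, but each one equals the residual sum of squares, Total SS minus Model SS. A minimal sketch reproducing them, and the first Chow statistic, from the reported sums of squares:

```python
# Residual SS = Total SS - Model SS, from the three Stata tables above
ssr_p = 64.4198259 - 57.4853608    # pooled regression, n = 30
ssr_1 = 15.2933819 - 10.6102646    # sub-sample 1 (g < 4), n1 = 19
ssr_2 = 2.54105312 - .714288023    # sub-sample 2 (g > 4), n2 = 11

print(round(ssr_p, 4), round(ssr_1, 4), round(ssr_2, 4))  # 6.9345 4.6831 1.8268

# First Chow statistic with k = 2, using the unrounded values
k, n1, n2 = 2, 19, 11
ts = ((ssr_p - ssr_1 - ssr_2) / k) / ((ssr_1 + ssr_2) / (n1 + n2 - 2 * k))
print(round(ts, 4))
```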
For the second test we have
$SSR_p = 6.9345$, $SSR_1 = 4.6831$, $SSR_2 = 1.8268$
with $n_1 = 19$, $n_2 = 11$ and $k = 2$.
$TS = \dfrac{(SSR_p - SSR_1)/n_2}{SSR_1/(n_1 - k)} = \dfrac{(6.9345 - 4.6831)/11}{4.6831/(19 - 2)} = 0.7430.$
Under $H_0$, $TS \sim F_{11,17}$ and $F_{0.05} \approx 2.4153$, interpolated from
the values for $F_{10,17}$ and $F_{12,17}$, which are 2.4499
and 2.3807, respectively.
As TS = 0.7430 < 2.4153 we are unable to reject $H_0$.
We can also conduct the second test by reversing the roles
of the two sub-samples and computing
$TS = \dfrac{(SSR_p - SSR_2)/n_1}{SSR_2/(n_2 - k)} = \dfrac{(6.9345 - 1.8268)/19}{1.8268/(11 - 2)} = 1.3244.$
Under $H_0$, $TS \sim F_{19,9}$ and $F_{0.05} \approx 2.9365$ (this is actually the value for $F_{20,9}$).
As TS = 1.3244 < 2.9365 we are once more unable to reject
$H_0$.
Finally, let's compute the alternative version of the first test
using dummy variables, defining D = 1 for countries with
GDP < $4,000 per head.
The unrestricted regression is:
. regress lm lg d dlg
-------------+------------------------------ F( 3, 26) = 77.10
Model | 57.9099435 3 19.3033145 Prob > F = 0.0000
-------------+------------------------------ Adj R-squared = 0.8873
Total | 64.4198259 29 2.22137331 Root MSE = .50038
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
lg | .7237502 .4285011 1.69 0.103 -.1570464 1.604547
d | -.8532358 .979691 -0.87 0.392 -2.867019 1.160548
dlg | .198932 .4513347 0.44 0.663 -.7287999 1.126664
_cons | -1.117129 .9727969 -1.15 0.261 -3.116742 .8824836
------------------------------------------------------------------------------
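Because D = 0 for the higher-income countries, the lg and _cons estimates above coincide with the g > 4 sub-sample regression, and adding the d and dlg coefficients should reproduce the g < 4 sub-sample estimates. A quick arithmetic check using the reported coefficients:

```python
# Unrestricted (fully interacted) estimates from the regression above
b_cons, b_lg = -1.117129, .7237502      # D = 0 group (same as the g > 4 regression)
g_d, g_dlg = -.8532358, .198932         # intercept and slope shifts for D = 1

# Implied intercept and slope for the D = 1 group (GDP per head < $4,000)
cons_low = b_cons + g_d
slope_low = b_lg + g_dlg
print(round(cons_low, 6), round(slope_low, 7))  # compare with the g < 4 regression
```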
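This unrestricted regression gives $SSR_U = 6.5099$ (Total SS minus Model SS),
while the pooled regression estimated earlier is the restricted regression and
gives $SSR_R = 6.9345$; also, $k = 2$ and $n = 30$.
$TS = \dfrac{(SSR_R - SSR_U)/k}{SSR_U/(n - 2k)} = \dfrac{(6.9345 - 6.5099)/2}{6.5099/(30 - 4)} = 0.8479;$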
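this is exactly the same value as the first Chow statistic we
computed!
In fact, it is possible to prove that the two statistics are
identical: the fully interacted regression fits each sub-sample
separately, so $SSR_U = SSR_1 + SSR_2$ (here 4.6831 + 1.8268 = 6.5099),
while $SSR_R = SSR_p$.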
Because TS has an $F_{2,26}$ distribution under $H_0$ (as before),
we know the 5% critical value is 3.3690 and the test result
is the same (do not reject $H_0$).
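The identity $SSR_U = SSR_1 + SSR_2$ can also be checked numerically. The sketch below (Python standard library only) simulates hypothetical data for two groups, fits the two sub-sample regressions and the fully interacted regression by ordinary least squares via the normal equations, and compares the SSRs. The data-generating values and the helper `ols_ssr` are illustrative assumptions, not part of the example above.

```python
import random

def ols_ssr(rows, y):
    """OLS by solving the normal equations X'X b = X'y; returns (b, SSR)."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(a[i][c]))
        a[c], a[piv] = a[piv], a[c]
        for i in range(c + 1, p):
            f = a[i][c] / a[c][c]
            for j in range(c, p + 1):
                a[i][j] -= f * a[c][j]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    ssr = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    return b, ssr

# Hypothetical data: two groups with different intercepts and slopes
random.seed(1)
n1, n2 = 19, 11
x = [random.uniform(0, 3) for _ in range(n1 + n2)]
d = [1] * n1 + [0] * n2
y = [(-2.0 + 0.9 * xi if di else -1.1 + 0.7 * xi) + random.gauss(0, 0.5)
     for xi, di in zip(x, d)]

# Separate sub-sample regressions (k = 2: intercept and slope)
_, ssr_1 = ols_ssr([[1.0, xi] for xi, di in zip(x, d) if di == 1],
                   [yi for yi, di in zip(y, d) if di == 1])
_, ssr_2 = ols_ssr([[1.0, xi] for xi, di in zip(x, d) if di == 0],
                   [yi for yi, di in zip(y, d) if di == 0])

# Fully interacted regression on all n observations: 1, X, D, D*X
_, ssr_u = ols_ssr([[1.0, xi, di, di * xi] for xi, di in zip(x, d)], y)

print(ssr_u, ssr_1 + ssr_2)   # equal up to rounding error
```

Solving the normal equations directly is adequate for a tiny, well-conditioned design like this one; regression software typically uses more numerically stable decompositions.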
We have actually carried out exactly the same test but by a
slightly different approach.
Summary
Dummy variables
Chow tests
Next week:
Heteroskedasticity, autocorrelation and dynamic models
