Econometrics - Slides
2010/2011
João Nicolau
1 Introduction
Econometrics is based upon the development of statistical methods for estimating economic relationships, testing economic theories, and evaluating and implementing government and business policy. Applications of econometrics:
- forecasting (e.g. interest rates, inflation rates, and gross domestic product);
- evaluating and implementing government and business policy. For example, what are the effects of political campaign expenditures on voting outcomes? What is the effect of school spending on student performance in the field of education?
Build the economic model. An economic model consists of mathematical equations that
describe various relationships. Formal economic modeling is sometimes the starting point
for empirical analysis, but it is more common to use economic theory less formally, or
even to rely entirely on intuition.
Cross-sectional data is closely aligned with the applied microeconomics fields, such as labor economics, state and local public finance, industrial organization, urban economics, demography, and health economics.
Models based on cross-sectional data usually satisfy the assumptions covered by the chapter “Finite-Sample Properties of OLS”.
A time series data set consists of observations on a variable or several variables over time. E.g.: stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, automobile sales figures, etc.
Time series data cannot be assumed to be independent across time. For example, knowing
something about the gross domestic product from last quarter tells us quite a bit about the
likely range of the GDP during this quarter ...
The analysis of time series data is more difficult than that of cross-sectional data. Reasons:
- time-series data exhibit unique features such as trends over time and seasonality;
- models based on time-series data rarely satisfy the assumptions covered by the chapter “Finite-Sample Properties of OLS”. The most adequate assumptions are covered by the chapter “Large-Sample Theory”, which is theoretically more advanced.
An example of a time series (scatterplots cannot in general be used here, but there are
exceptions):
Ceteris Paribus: “other (relevant) factors being equal”. Plays an important role in causal
analysis.
Example. Suppose that wages depend on education and labor force experience. Your goal is to measure the “return to education”. If your analysis involves only wages and education you may not uncover the ceteris paribus effect of education on wages. Consider the following data:
Example. In a totalitarian regime, how could you measure the ceteris paribus effect of another year of education on wages? You might create 100 clones of a “normal” individual, give each person a different amount of education, and then measure their wages.

In economics you have nonexperimental data, so in principle it is difficult to estimate ceteris paribus effects. However, we will see that econometric methods can simulate a ceteris paribus experiment. We will be able to do in nonexperimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed.
This chapter covers the finite- or small-sample properties of the OLS estimator, that is, the statistical properties of the OLS estimator that are valid for any given sample size.
The dependent variable is related to several other variables (called the regressors or the
explanatory variables).
Let (x_{i1}, x_{i2}, ..., x_{iK}) be the i-th observation of the K regressors. The sample or data is a collection of those n observations.

β's: regression coefficients. They represent the marginal and separate effects of the regressors.

Example (1.1). (Consumption function): Consider

y_i = β_1 + β_2 x_{i2} + ε_i.
Partial Effects

What is the impact on the conditional expected value of y, E(y_i | x_{i1}, x_{i2}), when x_{i2} is increased by a small amount Δx_{i2},

x_i' = (x_{i1}, x_{i2}) → (x_{i1}, x_{i2} + Δx_{i2}) (holding the other variable fixed)?

Let

ΔE(y_i | x_i) ≡ E(y_i | x_{i1}, x_{i2} + Δx_{i2}) − E(y_i | x_{i1}, x_{i2}).

Equation | Interpretation of β_2
(level-level) y_i = β_1 + β_2 x_{i2} + ε_i | ΔE(y_i|x_i) = β_2 Δx_{i2}
(level-log) y_i = β_1 + β_2 log(x_{i2}) + ε_i | ΔE(y_i|x_i) ≈ (β_2/100) × 100 Δx_{i2}/x_{i2}
(log-level) log(y_i) = β_1 + β_2 x_{i2} + ε_i | 100 ΔE(y_i|x_i)/E(y_i|x_i) ≈ (100 β_2) Δx_{i2}  (100β_2: semi-elasticity)
(log-log) log(y_i) = β_1 + β_2 log(x_{i2}) + ε_i | 100 ΔE(y_i|x_i)/E(y_i|x_i) ≈ β_2 × 100 Δx_{i2}/x_{i2}  (β_2: elasticity)
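These approximations can be checked numerically. A minimal sketch (with a hypothetical semi-elasticity β_2 = 0.05 in a log-level equation) comparing the approximation 100·β_2·Δx_2 with the exact percentage change in y:

```python
import math

# Hypothetical log-level model: log(y) = b1 + b2*x, with b2 = 0.05.
# Exact proportional change in y when x rises by dx: exp(b2*dx) - 1.
# The approximation in the table: b2*dx, i.e. 100*b2 percent per unit of x.
b2, dx = 0.05, 1.0
exact_pct = (math.exp(b2 * dx) - 1) * 100   # exact % change in y
approx_pct = 100 * b2 * dx                  # semi-elasticity approximation
print(round(exact_pct, 3), approx_pct)      # ~5.127 vs 5.0
```

The approximation is accurate for small β_2·Δx_2 and deteriorates as the change grows, which is why the table uses ≈ rather than =.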
Exercise 2.1. Suppose, for example, that the marginal effect of experience on wages declines with the level of experience. How can this be captured?

Exercise 2.2. Provide an interpretation of β_2 in the following equations:
(a) con_i = β_1 + β_2 inc_i + ε_i, where inc: income, con: consumption (both measured in dollars). Assume that β_2 = 0.8;
(b) log(wage_i) = β_1 + β_2 educ_i + β_3 tenure_i + β_4 expr_i + ε_i. Assume that β_2 = 0.05;
(c) log(price_i) = β_1 + β_2 log(dist_i) + ε_i, where price = housing price and dist = distance from a recently built garbage incinerator. Assume that β_2 = 0.6.
We have

y_i = β_1 x_{i1} + β_2 x_{i2} + ... + β_K x_{iK} + ε_i = [x_{i1} x_{i2} ... x_{iK}] (β_1, β_2, ..., β_K)' + ε_i = x_i'β + ε_i,

where

x_i = (x_{i1}, x_{i2}, ..., x_{iK})' and β = (β_1, β_2, ..., β_K)' are K×1 vectors, so that

y_i = x_i'β + ε_i.
More compactly, stacking the n observations,

y = (y_1, y_2, ..., y_n)' (n×1),  X = the n×K matrix with i-th row x_i' (elements x_{ik}),  ε = (ε_1, ε_2, ..., ε_n)',

so that

y = Xβ + ε.

Important: y and X (or y_i and x_{ik}) may be random variables or observed values. We use the same notation for both cases.
E "ij xj = 0; 8i; j i 6= j
It remains to be analyzed whether or not
?
E ( "ij xi) = 0:
(Time series, static models) There may be feedback from y_i onto future values of x_i.

Example (measurement error in the regressor). Suppose the correctly measured regressor is w_i, but we observe x_{i2} = w_i + u_i. Assume: E(u_i) = 0, Cov(w_i, u_i) = 0, Cov(v_i, u_i) = 0. Now substituting w_i = x_{i2} − u_i into y_i = β_1 + β_2 w_i + v_i we obtain y_i = β_1 + β_2 x_{i2} + ε_i with ε_i = v_i − β_2 u_i, and Cov(x_{i2}, ε_i) = −β_2 Var(u_i) ≠ 0.

Example (Feedback from y on future values of x). Consider a simple static time-series model to explain a city's murder rate (y_t) in terms of police officers per capita (x_t):

y_t = β_1 + β_2 x_t + ε_t.

Suppose that the city adjusts the size of its police force based on past values of the murder rate. This means that, say, x_{t+1} might be correlated with ε_t (since a higher ε_t leads to a higher y_t).
Example (There is a lagged dependent variable as a regressor). See section 2.1.5.

Exercise 2.3. Let kids denote the number of children ever born to a woman, and let educ denote years of education for the woman. A simple model relating fertility to years of education is

kids_i = β_1 + β_2 educ_i + ε_i,

where ε_i is the unobserved error. (i) What kinds of factors are contained in ε_i? Are these likely to be correlated with level of education? (ii) Will a simple regression analysis uncover the ceteris paribus effect of education on fertility? Explain.
E ("i) = 0; 8i:
E "ij xj = 0; 8i; j:
E xjk "i = 0; 8i; j; k or E xj "i = 0; 8i; j The regressors are orthogonal to the
error term for all observations
For time-series models where strict exogeneity can be rephrased as: the regressors are or-
thogonal to the past, current, and future error terms. However, for most time-series models,
strict exogeneity is not satis…ed.
Example. Consider

y_i = β y_{i−1} + ε_i,  E(ε_i | y_{i−1}) = 0 (thus E(y_{i−1} ε_i) = 0).

Let x_i = y_{i−1}. By construction we have

E(x_{i+1} ε_i) = E(y_i ε_i) = E((β y_{i−1} + ε_i) ε_i) = E(ε_i²) ≠ 0.

The regressor is not orthogonal to the past error term, which is a violation of strict exogeneity. However, the estimator may possess good large-sample properties without strict exogeneity.
None of the K columns of the data matrix X can be expressed as a linear combination of
the other columns of X.
Example (1.4 - continuation of Example 1.2). If no individuals in the sample ever changed jobs, then tenure_i = expr_i for all i, in violation of the no-multicollinearity assumption. There is no way to distinguish the tenure effect on the wage rate from the experience effect. Remedy: drop tenure_i or expr_i from the wage equation.
Example (Dummy Variable Trap). Consider

wage_i = β_1 + β_2 educ_i + β_3 female_i + β_4 male_i + ε_i

where

female_i = 1 if i corresponds to a female and 0 if i corresponds to a male;  male_i = 1 − female_i.

In vector notation we have

wage = β_1 1 + β_2 educ + β_3 female + β_4 male + ε.

It is obvious that 1 = female + male. Therefore the above model violates Assumption 1.3. One may also justify this using scalar notation: x_{i1} = female_i + male_i, because this relationship implies 1 = female + male. Can you overcome the dummy variable trap by removing x_{i1} ≡ 1 from the equation?
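The rank failure can be verified numerically. A minimal sketch with hypothetical data (the educ and female values below are made up for illustration):

```python
import numpy as np

# Hypothetical data: 8 individuals, a constant, years of education,
# and the two dummies female and male = 1 - female.
female = np.array([1., 0., 1., 1., 0., 0., 1., 0.])
male = 1.0 - female
educ = np.array([8., 10., 11., 12., 12., 14., 15., 16.])
const = np.ones(8)

# const = female + male, so the 4 columns are linearly dependent:
X = np.column_stack([const, educ, female, male])
print(np.linalg.matrix_rank(X))          # 3, not 4: Assumption 1.3 is violated
# Dropping one of the dependent columns restores full column rank:
print(np.linalg.matrix_rank(X[:, :3]))   # 3
```

Dropping the constant (or one dummy) removes the exact linear dependence, which is the usual remedy for the trap.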
Exercise 2.4. In a study relating college grade point average to time spent in various activ-
ities, you distribute a survey to several students. The students are asked how many hours
they spend each week in four activities: studying, sleeping, working, and leisure. Any activity
is put into one of the four categories, so that for each student the sum of hours in the four
activities must be 168. (i) In the model
Exercise 2.5. Under Assumptions 1.2 and 1.4, show that Cov(y_i, y_j | X) = 0 for i ≠ j.
E ""0 X = 2I:
Note
2 3
E "21 X E ( " 1 "2 j X ) E ( "1 "n j X )
6 7
6 ( " " j X) 7
0 6 E 1 2 E "22 X E ( "2 "n j X ) 7
E "" X =6
6 ... ... ... ... 7:
7
4 5
E ( " 1 "n j X ) E ( " 2 " n j X ) E "2n X
The sample (y, X) is a random sample if {(y_i, x_i)} is i.i.d. (independently and identically distributed) across observations. A random sample automatically implies:
This is a simplifying (and generally an unrealistic) assumption to make the statistical analysis
tractable. It means that X is exactly the same in repeated samples. Sampling schemes that
support this assumption:
a) Experimental situations. For example, suppose that y represents the yields of a crop
grown on n experimental plots, and let the rows of X represent the seed varieties, irrigation
and fertilizer for each plot. The experiment can be repeated as often as desired, with the
same X. Only y varies across plots.
The residual for observation i, evaluated at a hypothetical coefficient vector β̃, is y_i − x_i'β̃. The vector of residuals (evaluated at β̃) is y − Xβ̃. The sum of squared residuals (SSR) is

SSR(β̃) = Σ_{i=1}^n (y_i − x_i'β̃)² = (y − Xβ̃)'(y − Xβ̃).

The OLS (Ordinary Least Squares) estimator minimizes SSR(β̃). Simplest case: K = 1, y_i = β x_i + ε_i.

Example. Consider y_i = β_1 + β_2 x_{i2} + ε_i. The data:

y    X
1    1 1
3    1 3
2    1 1
8    1 3
12   1 8

Verify that SSR(β̃) = 42 when β̃ = (0, 1)'.
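The claim can be checked directly; a minimal sketch using the five observations above:

```python
import numpy as np

# Data from the example above.
y = np.array([1.0, 3.0, 2.0, 8.0, 12.0])
X = np.column_stack([np.ones(5), np.array([1.0, 3.0, 1.0, 3.0, 8.0])])

def ssr(beta):
    """Sum of squared residuals SSR(beta) = (y - X beta)'(y - X beta)."""
    r = y - X @ beta
    return float(r @ r)

print(ssr(np.array([0.0, 1.0])))   # 42.0, as claimed

# The OLS estimator solves the normal equations and minimizes the SSR:
b = np.linalg.solve(X.T @ X, X.T @ y)
print(ssr(b) < 42.0)               # True
```

Any other coefficient vector gives a strictly larger SSR, which is exactly what the minimization definition of OLS says.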
∂²SSR(β̃)/(∂β̃ ∂β̃') is a positive definite matrix ⇒ b is a global minimum point.
X'Xb = X'y, or equivalently X'(y − Xb) = 0.
This is a system with K equations and K unknowns. These equations are called the normal equations. If

rank(X) = K ⇒ X'X is nonsingular ⇒ (X'X)⁻¹ exists.

Therefore, if rank(X) = K we have a unique solution:

b = (X'X)⁻¹ X'y   (the OLS estimator).

The SOC is

∂²SSR(β̃)/(∂β̃ ∂β̃') = 2X'X.

If rank(X) = K then 2X'X is a positive definite matrix, thus SSR(β̃) is strictly convex in R^K. Hence b is a global minimum point.

e = y − Xb

is called the vector of OLS residuals (or simply residuals).
b = (X'X)⁻¹ X'y = (X'X/n)⁻¹ (X'y/n) = S_{xx}⁻¹ S_{xy}, where

S_{xx} = X'X/n = (1/n) Σ_{i=1}^n x_i x_i'   (sample average of x_i x_i'),
S_{xy} = X'y/n = (1/n) Σ_{i=1}^n x_i y_i   (sample average of x_i y_i).

Example (continuation). Compute b for the data:

y    X
1    1 1
3    1 3
2    1 1
8    1 3
12   1 8
Define the projection matrices P = X(X'X)⁻¹X' and M = I − P.

Properties:

Exercise 2.7. Show that P and M are symmetric and idempotent and

PX = X,  MX = 0,
ŷ = Py,  e = My = Mε,
SSR = e'e = y'My = ε'Mε.
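The identities in Exercise 2.7 can be verified numerically; a minimal sketch with simulated data (a numerical check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 20, 3
X = rng.standard_normal((n, K))
y = rng.standard_normal(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
M = np.eye(n) - P                       # residual maker

# Symmetry and idempotency
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
assert np.allclose(M, M.T) and np.allclose(M @ M, M)
# PX = X, MX = 0
assert np.allclose(P @ X, X) and np.allclose(M @ X, 0)
# Fitted values and residuals
b = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(P @ y, X @ b)        # y_hat = Py
assert np.allclose(M @ y, y - X @ b)    # e = My
print("all identities hold")
```

These identities are the workhorse of the partitioned-regression results used later in the chapter.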
The OLS estimate of σ² (the variance of the error term), denoted s², is

s² = SSR/(n − K) = e'e/(n − K).

s = √s² is called the standard error of the regression.
[Figure: three scatterplots of y and the fitted values ŷ against x, illustrating fits with R² = 0.96, R² = 0.19, and R² = 0.00.]
“The most important thing about R² is that it is not important” (Goldberger). Why? We are concerned with parameters in a population, not with goodness of fit in the sample.
Assumptions:
1.1 - Linearity: y_i = β_1 x_{i1} + β_2 x_{i2} + ... + β_K x_{iK} + ε_i.
1.2 - Strict exogeneity: E(ε_i | X) = 0.
1.3 - No multicollinearity.
1.4 - Spherical error variance: E(ε_i² | X) = σ², E(ε_i ε_j | X) = 0 for i ≠ j.
Matrix P = X(X'X)⁻¹X':
Py → fitted values from the regression of y on X
Pz → ?

Matrix M = I − P = I − X(X'X)⁻¹X':
My → residuals from the regression of y on X
Mz → ?

Consider a partition of X as follows: X = [X_1 X_2].

Matrix P_1 = X_1(X_1'X_1)⁻¹X_1':
P_1 y → ?

Matrix M_1 = I − P_1 = I − X_1(X_1'X_1)⁻¹X_1':
M_1 y → ?
Partition X as

X = [X_1 X_2],  X_1: n×K_1,  X_2: n×K_2,  K_1 + K_2 = K.

Long Regression

We have

y = ŷ + e = Xb + e = [X_1 X_2](b_1', b_2')' + e = X_1 b_1 + X_2 b_2 + e.

Short Regression

Suppose that we shorten the list of explanatory variables and regress y on X_1. We have

y = ŷ* + e* = X_1 b_1* + e*,

where

b_1* = (X_1'X_1)⁻¹ X_1'y,
e* = M_1 y,  M_1 = I − X_1(X_1'X_1)⁻¹X_1'.
b_1* vs. b_1

We have

b_1* = (X_1'X_1)⁻¹ X_1'y
     = (X_1'X_1)⁻¹ X_1'(X_1 b_1 + X_2 b_2 + e)
     = b_1 + (X_1'X_1)⁻¹ X_1'X_2 b_2 + (X_1'X_1)⁻¹ X_1'e   (the last term is 0, since X_1'e = 0)
     = b_1 + (X_1'X_1)⁻¹ X_1'X_2 b_2
     = b_1 + F b_2,  F ≡ (X_1'X_1)⁻¹ X_1'X_2.

Thus, in general, b_1* ≠ b_1. Exceptional cases: b_2 = 0 or X_1'X_2 = O ⇒ b_1* = b_1.
e* vs. e

We have

e* = M_1 y
   = M_1(X_1 b_1 + X_2 b_2 + e)
   = M_1 X_1 b_1 + M_1 X_2 b_2 + M_1 e
   = M_1 X_2 b_2 + e   (since M_1 X_1 = 0 and M_1 e = e)
   = v + e,  v ≡ M_1 X_2 b_2.

Thus,

e*'e* = e'e + v'v ≥ e'e.

Thus the SSR of the short regression (e*'e*) exceeds the SSR of the long regression (e'e), and e*'e* = e'e iff v = 0, that is, iff b_2 = 0.
Consider

y = Xβ + ε = X_1 β_1 + X_2 β_2 + ε.

Premultiplying both sides by M_1 and using M_1 X_1 = 0, we obtain

M_1 y = M_1 X_1 β_1 + M_1 X_2 β_2 + M_1 ε,
i.e.  ỹ = X̃_2 β_2 + M_1 ε,  where ỹ ≡ M_1 y and X̃_2 ≡ M_1 X_2.

The OLS gives

b_2 = (X̃_2'X̃_2)⁻¹ X̃_2'ỹ = (X̃_2'X̃_2)⁻¹ X̃_2'M_1 y = (X̃_2'X̃_2)⁻¹ X̃_2'y.

Thus

b_2 = (X̃_2'X̃_2)⁻¹ X̃_2'y.
Another way to prove b_2 = (X̃_2'X̃_2)⁻¹ X̃_2'y (you may skip this proof). We have

(X̃_2'X̃_2)⁻¹ X̃_2'y = (X̃_2'X̃_2)⁻¹ X̃_2'(X_1 b_1 + X_2 b_2 + e)
 = (X̃_2'X̃_2)⁻¹ X̃_2'X_1 b_1 + (X̃_2'X̃_2)⁻¹ X̃_2'X_2 b_2 + (X̃_2'X̃_2)⁻¹ X̃_2'e
 = 0 + b_2 + 0 = b_2,

since:

(X̃_2'X̃_2)⁻¹ X̃_2'X_1 b_1 = (X̃_2'X̃_2)⁻¹ X_2'M_1 X_1 b_1 = 0   (M_1 X_1 = 0);

(X̃_2'X̃_2)⁻¹ X̃_2'X_2 b_2 = (X̃_2'X̃_2)⁻¹ X_2'M_1 X_2 b_2
 = (X_2'M_1'M_1 X_2)⁻¹ X_2'M_1 X_2 b_2
 = (X_2'M_1 X_2)⁻¹ X_2'M_1 X_2 b_2 = b_2;

X̃_2'e = X_2'M_1 e = X_2'e = 0.
The conclusion is that we can obtain b_2 = (X̃_2'X̃_2)⁻¹ X̃_2'y = (X̃_2'X̃_2)⁻¹ X̃_2'ỹ as follows:

1) Regress X_2 on X_1 to get the residuals X̃_2 = M_1 X_2.
2) Regress y on X̃_2 to get the coefficient b_2 of the long regression.

OR:

1') Same as 1).
2'a) Regress y on X_1 to get the residuals ỹ = M_1 y.
2'b) Regress ỹ on X̃_2 to get the coefficient b_2 of the long regression.

Example. Consider X = [1 exper tenure IQ educ] and

X_1 = [1 exper tenure IQ],  X_2 = educ.
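The two-step procedure can be checked numerically; a minimal sketch with simulated data (the partition into X_1 and X_2 here is arbitrary, chosen only to mimic the example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X1 = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # e.g. 1, exper, tenure
X2 = rng.standard_normal((n, 1))                                  # e.g. educ
y = rng.standard_normal(n)

# Long regression: coefficient on X2
X = np.hstack([X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ y)
b2_long = b[-1]

# Two-step (residual) regression:
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
X2t = M1 @ X2                                             # step 1: residuals of X2 on X1
b2_fwl = np.linalg.solve(X2t.T @ X2t, X2t.T @ y).item()   # step 2: regress y on X2~

print(np.isclose(b2_long, b2_fwl))   # True
```

The agreement holds for any data with rank(X) = K, which is exactly what the algebraic argument above establishes.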
Suppose that y_t and x_t have a linear trend. Should the trend term be included in the regression, as in

y_t = β_1 + β_2 x_{t2} + β_3 x_{t3} + ε_t,  x_{t3} = t,

or should the variables first be “detrended” and then used without the trend term included, as in

ỹ_t = β_2 x̃_{t2} + ε̃_t ?

According to the previous results, the OLS coefficient b_2 is the same in both regressions. In the second regression b_2 is obtained from the regression of ỹ = M_1 y on x̃_2 = M_1 x_2, where

X_1 = [1 x_3] is the n×2 matrix with rows (1, t), t = 1, ..., n.
EQ01 — Dependent Variable: TXDES; Method: Least Squares; Sample: 1948 2003
Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         4.463068     0.425856    10.48023     0.0000
INF       0.104712     0.063329    1.653473     0.1041
@TREND    0.027788     0.011806    2.353790     0.0223

EQ02 — Dependent Variable: TXDES; Method: Least Squares; Sample: 1948 2003
C         4.801316     0.379453    12.65325     0.0000
@TREND    0.030277     0.011896    2.545185     0.0138

EQ03 — Dependent Variable: INF; Method: Least Squares; Sample: 1948 2003
C         3.230263     0.802598    4.024758     0.0002
@TREND    0.023770     0.025161    0.944696     0.3490

EQ04 — Dependent Variable: TXDES_; Method: Least Squares; Sample: 1948 2003
INF_      0.104712     0.062167    1.684382     0.0978
Suppose that we have data on the variable y, quarter by quarter, for m years. A way to deal with (deterministic) seasonality is the following:

y_t = β_1 Q_{t1} + β_2 Q_{t2} + β_3 Q_{t3} + β_4 Q_{t4} + β_5 x_{t5} + ε_t,

where

Q_{ti} = 1 in quarter i, 0 otherwise.

Let

X = [Q_1 Q_2 Q_3 Q_4 x_5],  X_1 = [Q_1 Q_2 Q_3 Q_4].

Previous results show that b_5 can be obtained from the regression of ỹ = M_1 y on x̃_5 = M_1 x_5. It can be proved that

ỹ_t = y_t − ȳ_{Q_i} in quarter i (i = 1, 2, 3, 4),

where ȳ_{Q_i} is the seasonal mean of quarter i.
so that

ŷ = x_1 b_1 + X_2 b_2.

1) Regress X_2 on x_1 (the vector of ones) to get the residuals X̃_2 = M_1 X_2, where

M_1 = I − x_1(x_1'x_1)⁻¹x_1' = I − x_1 x_1'/n.
As we know,

X̃_2 = M_1 X_2 = M_1[x_2 ... x_K] = [M_1 x_2 ... M_1 x_K],

i.e. the matrix whose (i, k) element is x_{ik} − x̄_k: each column of X_2 is demeaned.

Consider:

E(y | X) = X_1 β_1 + X_2 β_2 = Xβ,
Var(y | X) = σ² I, etc.
b_1* is a biased estimator of β_1.

Given that

b_1* = (X_1'X_1)⁻¹ X_1'y = b_1 + F b_2,  F = (X_1'X_1)⁻¹ X_1'X_2,

we have

E(b_1* | X) = E(b_1 + F b_2 | X) = β_1 + F β_2,
Var(b_1* | X) = Var((X_1'X_1)⁻¹ X_1'y | X) = (X_1'X_1)⁻¹ X_1' Var(y | X) X_1 (X_1'X_1)⁻¹
             = σ² (X_1'X_1)⁻¹,

thus, in general,

b_1* is a biased estimator of β_1 (“omitted-variable bias”)

unless β_2 = 0 or X_1'X_2 = O.

Consider b_1 = b_1* − F b_2.

In practice there may be a bias-variance trade-off between short and long regression when the target is β_1.
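The bias formula E(b_1* | X) = β_1 + Fβ_2 can be illustrated by simulation; a minimal sketch with hypothetical parameter values β_1 = 1 and β_2 = 2, where the short regression keeps only the constant:

```python
import numpy as np

# Monte Carlo sketch of omitted-variable bias (hypothetical values
# beta1 = 1, beta2 = 2): the true model is y = beta1 + beta2*x2 + eps,
# but the short regression of y on a constant alone omits x2.
rng = np.random.default_rng(3)
n, reps = 200, 2000
x2 = rng.standard_normal(n) + 0.5          # held fixed across replications
X1 = np.ones((n, 1))                        # short regression: constant only
X2 = x2.reshape(-1, 1)
beta1, beta2 = 1.0, 2.0

F = np.linalg.solve(X1.T @ X1, X1.T @ X2).item()   # (X1'X1)^{-1} X1'X2 = mean(x2)
draws = []
for _ in range(reps):
    eps = rng.standard_normal(n)
    y = beta1 + beta2 * x2 + eps
    draws.append(np.linalg.solve(X1.T @ X1, X1.T @ y).item())

print(abs(np.mean(draws) - (beta1 + F * beta2)) < 0.02)   # True: centered at beta1 + F*beta2
print(abs(np.mean(draws) - beta1) > 0.5)                  # True: far from beta1 itself
```

The Monte Carlo mean of b_1* lands on β_1 + Fβ_2, not on β_1, matching the conditional-expectation result above.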
Exercise 2.9. Consider the standard simple regression model y_i = β_1 + β_2 x_{i2} + ε_i under Assumptions 1.1 through 1.4. Thus, the usual OLS estimators b_1 and b_2 are unbiased for their respective population parameters. Let b_2* be the estimator of β_2 obtained by assuming the intercept is zero, i.e. β_1 = 0. (i) Find E(b_2* | X). Verify that b_2* is unbiased for β_2 when the population intercept β_1 is zero. Are there other cases where b_2* is unbiased? (ii) Find the variance of b_2*. (iii) Show that Var(b_2* | X) ≤ Var(b_2 | X). (iv) Comment on the trade-off between bias and variance when choosing between b_2 and b_2*.

Exercise 2.10. Suppose that average worker productivity at manufacturing firms (avgprod) depends on two factors, average hours of training (avgtrain) and average worker ability (avgabil):

avgprod_i = β_1 + β_2 avgtrain_i + β_3 avgabil_i + ε_i.

Assume that this equation satisfies Assumptions 1.1 through 1.4. If grants have been given to firms whose workers have less than average ability, so that avgtrain and avgabil are negatively correlated, what is the likely bias in b_2* obtained from the simple regression of avgprod on avgtrain?
Let's see now that the omission of explanatory variables leads to an increase in the expected SSR. We have, by R5,

E(e*'e* | X) = E(y'M_1 y | X) = tr(M_1 Var(y | X)) + E(y | X)' M_1 E(y | X)
            = σ² tr(M_1) + β_2' X̃_2'X̃_2 β_2 = σ² (n − K_1) + β_2' X̃_2'X̃_2 β_2,

and E(e'e | X) = σ² (n − K), thus

E(e*'e* | X) − E(e'e | X) = σ² K_2 + β_2' X̃_2'X̃_2 β_2 > 0.

Notice that e*'e* − e'e = b_2' X̃_2'X̃_2 b_2 ≥ 0 (check that E(b_2' X̃_2'X̃_2 b_2 | X) = σ² K_2 + β_2' X̃_2'X̃_2 β_2).
C) Residual Regression

It follows that

Var(b_K | X) = σ² / (x_K'M_1 x_K),

and x_K'M_1 x_K is the sum of the squared residuals in the auxiliary regression

x_K = γ_1 x_1 + γ_2 x_2 + ... + γ_{K−1} x_{K−1} + error.

One can conclude (assuming that x_1 is the vector of ones):

R_K² = 1 − x_K'M_1 x_K / Σ(x_{iK} − x̄_K)².

Solving this equation for x_K'M_1 x_K we have

x_K'M_1 x_K = (1 − R_K²) Σ(x_{iK} − x̄_K)².

We get

Var(b_K | X) = σ² / [(1 − R_K²) Σ(x_{iK} − x̄_K)²] = σ² / [(1 − R_K²) S²_{x_K} n].
Var(b_K | X) = σ² / [(1 − R_K²) Σ(x_{iK} − x̄_K)²] = σ² / [(1 − R_K²) S²_{x_K} n].

We can conclude that the precision of b_K is high (i.e. Var(b_K | X) is small) when:
- σ² is low;
- the sample variance of x_K, S²_{x_K}, is high;
- n is large;
- R_K² is low.
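The variance decomposition can be verified numerically; a minimal sketch comparing the (K, K) element of σ²(X'X)⁻¹ with the expression based on R_K² (simulated data, hypothetical σ² = 1):

```python
import numpy as np

# Numerical check of Var(b_K | X) = sigma^2 / ((1 - R_K^2) * sum((x_iK - xbar_K)^2)),
# where R_K^2 comes from regressing x_K on the remaining regressors (incl. the constant).
rng = np.random.default_rng(4)
n = 100
const = np.ones(n)
x2 = rng.standard_normal(n)
xK = 0.8 * x2 + rng.standard_normal(n)   # correlated with x2
X = np.column_stack([const, x2, xK])
sigma2 = 1.0

# Direct formula: last diagonal element of sigma^2 (X'X)^{-1}
direct = sigma2 * np.linalg.inv(X.T @ X)[-1, -1]

# Auxiliary regression of xK on [const, x2]
X1 = X[:, :2]
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
ssr_aux = float(xK @ M1 @ xK)
tss = float(((xK - xK.mean()) ** 2).sum())
R2_K = 1 - ssr_aux / tss
via_R2 = sigma2 / ((1 - R2_K) * tss)

print(np.isclose(direct, via_R2))   # True
```

Raising the correlation between x_K and the other regressors pushes R_K² toward 1 and inflates the variance, which is the multicollinearity problem discussed below.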
Exercise 2.11. Consider: sleep: minutes slept at night per week; totwrk: hours worked per week; educ: years of schooling; female: binary variable equal to one if the individual is female. Do women sleep more than men? Explain the difference between the estimates 32.18074 and -90.96919.

Regression 1 (dependent variable: sleep)
Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         3252.407     22.22211    146.3591     0.0000
FEMALE    32.18074     33.75413    0.953387     0.3407
R-squared 0.001289; Adjusted R-squared -0.000129; S.E. of regression 444.4422; Sum squared resid 1.39E+08; Mean dependent var 3266.356; S.D. dependent var 444.4134; Akaike info criterion 15.03435; Schwarz criterion 15.04726

Regression 2 (dependent variable: sleep)
Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         3838.486     86.67226    44.28737     0.0000
TOTWRK    -0.167339    0.017937    -9.329260    0.0000
EDUC      -13.88479    5.657573    -2.454196    0.0144
FEMALE    -90.96919    34.27441    -2.654143    0.0081
R-squared 0.119277; Adjusted R-squared 0.115514; S.E. of regression 417.9581; Sum squared resid 1.23E+08; Mean dependent var 3266.356; S.D. dependent var 444.4134; Akaike info criterion 14.91429; Schwarz criterion 14.94012
Example. The goal is to analyze the impact of another year of education on wages. Consider: wage: monthly earnings; KWW: knowledge of world work score (KWW is a general test of work-related abilities); educ: years of education; exper: years of work experience; tenure: years with current employer.

Regression 1 — Dependent Variable: LOG(WAGE); Method: Least Squares; Sample: 1 935; White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         5.973062     0.082272    72.60160     0.0000
EDUC      0.059839     0.006079    9.843503     0.0000
R-squared 0.097417; Adjusted R-squared 0.096449; S.E. of regression 0.400320; Sum squared resid 149.5186; Mean dependent var 6.779004; S.D. dependent var 0.421144; Akaike info criterion 1.009029; Schwarz criterion 1.019383

Regression 2 — Dependent Variable: LOG(WAGE); Method: Least Squares; Sample: 1 935; White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         5.496696     0.112030    49.06458     0.0000
EDUC      0.074864     0.006654    11.25160     0.0000
EXPER     0.015328     0.003405    4.501375     0.0000
TENURE    0.013375     0.002657    5.033021     0.0000
R-squared 0.155112; Adjusted R-squared 0.152390; S.E. of regression 0.387729; Sum squared resid 139.9610; Mean dependent var 6.779004; S.D. dependent var 0.421144; Akaike info criterion 0.947250; Schwarz criterion 0.967958
Exercise. Consider the long regression

y = X_1 b_1 + X_2 b_2 + e,

and the following coefficients (obtained from the short regressions):

b_1* = (X_1'X_1)⁻¹ X_1'y,  b_2* = (X_2'X_2)⁻¹ X_2'y.

Decide if you agree or disagree with the following statement: if Cov(b_1*, b_2* | X_1, X_2) = O (zero matrix) then b_1* = b_1 and b_2* = b_2.
2.5 Multicollinearity

If rank(X) < K then b is not defined. This is called strict multicollinearity. When this happens, the statistical software will be unable to construct (X'X)⁻¹. Since the error is discovered quickly, this is rarely a problem for applied econometric practice.

The more relevant situation is near multicollinearity, which is often called “multicollinearity” for brevity. This is the situation where X'X is nearly singular, i.e. the columns of X are close to linearly dependent.

Consequence: the individual coefficient estimates will be imprecise. We have shown that

Var(b_K | X) = σ² / [(1 − R_K²) S²_{x_K} n],

where R_K² is the coefficient of determination in the auxiliary regression

x_K = γ_1 x_1 + γ_2 x_2 + ... + γ_{K−1} x_{K−1} + error.
Exercise 2.14. Do you agree with the following quotations: (a) “But more data is no remedy for multicollinearity if the additional data are simply ‘more of the same.’ So obtaining lots of small samples from the same population will not help” (Johnston, 1984); (b) “Another important point is that a high degree of correlation between certain independent variables can be irrelevant as to how well we can estimate other parameters in the model.”

Exercise 2.15. Suppose you postulate a model explaining final exam score in terms of class attendance. Thus, the dependent variable is final exam score, and the key explanatory variable is number of classes attended. To control for student abilities and efforts outside the classroom, you include among the explanatory variables cumulative GPA, SAT score, and measures of high school performance. Someone says, “You cannot hope to learn anything from this exercise because cumulative GPA, SAT score, and high school performance are likely to be highly collinear.” What should be your answer?
Assumption 1.5 together with Assumptions 1.2 and 1.4 implies that

ε | X ~ N(0, σ²I) and y | X ~ N(Xβ, σ²I).

1. z ~ N(0, I) ⇒ z'z ~ χ²(n).
2. w_1 ~ χ²(m), w_2 ~ χ²(n), w_1 and w_2 independent ⇒ (w_1/m)/(w_2/n) ~ F(m, n).
3. w ~ χ²(n), z ~ N(0, 1), w and z independent ⇒ z/√(w/n) ~ t(n).
4. Asymptotic results:
v ~ F(m, n) ⇒ mv →d χ²(m) as n → ∞;
u ~ t(n) ⇒ u →d N(0, 1) as n → ∞.

w = (y − Xβ)'(σ²I)⁻¹(y − Xβ) ~ χ²(n).
"0M" X 2 :
(r)
1
8. bj X N ; 2 X0 X :
9. Let r = R (Rp K ) with rank (R) = p (in Hayashi’s notation p is equal to #r):
Then,
1
Rbj X N r; 2R X0 X R0 :
10. Let b_k be the kth element of b and q^{kk} the (k, k) element of (X'X)⁻¹. Then

b_k | X ~ N(β_k, σ² q^{kk}), or z_k = (b_k − β_k)/(σ√q^{kk}) ~ N(0, 1).

11. w = (Rb − r)'[σ² R(X'X)⁻¹R']⁻¹(Rb − r) ~ χ²(p).
12. w_k = (b_k − β_k)²/(σ² q^{kk}) ~ χ²(1).
13. w_0 = e'e/σ² ~ χ²(n − K).
16. t_k = (b_k − β_k)/σ̂_{b_k} ~ t(n − K), where σ̂²_{b_k} is the (k, k) element of s²(X'X)⁻¹.
17. (Rb − Rβ)/(s√(R(X'X)⁻¹R')) ~ t(n − K), where R is of type 1×K.
18. F = (Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r)/(p s²) ~ F(p, n − K).

Exercise 2.16. Prove results #8, #9, #16 and #18 (take the other results as given).

P(|t| < t_{α/2}) = 1 − α.
P(F ≤ F_α) = 1 − α.
(1 − α)·100% confidence region for the parameter vector β (consider R = I in the previous case):

{β : (b − β)' X'X (b − β)/s² ≤ p F_α}.
Exercise 2.17. Consider y_i = β_1 x_{i1} + β_2 x_{i2} + ε_i where y_i = wages_i − mean(wages), x_{i1} = educ_i − mean(educ), x_{i2} = exper_i − mean(exper). The results are

Dependent Variable: Y; Method: Least Squares; Sample: 1 526
Variable  Coefficient  Std. Error  t-Statistic  Prob.
X1        0.644272     0.053755    11.98541     0.0000
X2        0.070095     0.010967    6.391393     0.0000

X'X = [4025.4297, 5910.064; 5910.064, 96706.846],
(X'X)⁻¹ = [2.7291×10⁻⁴, −1.6678×10⁻⁵; −1.6678×10⁻⁵, 1.1360×10⁻⁵].

(c) Build the 95% confidence region for the parameter vector β.
[Figure: 95% confidence ellipse for (β_1, β_2), with β_1 on the horizontal axis (0.50 to 0.80) and β_2 on the vertical axis (0.04 to 0.10).]
Suppose that we have a hypothesis about the kth regression coefficient:

H_0: β_k = β_k⁰

(β_k⁰ is a specific value, e.g. zero), and that this hypothesis is tested against the alternative hypothesis

H_1: β_k ≠ β_k⁰.

We do not reject H_0 at the α·100% level if β_k⁰ lies within the (1 − α)·100% CI for β_k, i.e., b_k ± t_{α/2} σ̂_{b_k}; reject H_0 otherwise. Equivalently, calculate the test statistic

t_obs = (b_k − β_k⁰)/σ̂_{b_k}

and:
if |t_obs| > t_{α/2}, then reject H_0;
if |t_obs| ≤ t_{α/2}, then do not reject H_0.
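A minimal sketch of the two-sided decision rule, using hypothetical values b_k = 0.076, σ̂_{b_k} = 0.031, β_k⁰ = 0 and the large-sample 5% two-sided critical value t_{α/2} ≈ 1.96:

```python
# Two-sided t-test decision rule (all numbers here are hypothetical).
b_k, se_bk, beta0 = 0.076, 0.031, 0.0
t_crit = 1.96   # t_{0.025} for large n - K (the exact value depends on n - K)

t_obs = (b_k - beta0) / se_bk
reject = abs(t_obs) > t_crit
print(round(t_obs, 3), reject)   # 2.452 True
```

With a small n − K the critical value should instead come from the t(n − K) tables, which is slightly larger than 1.96.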
Other cases:

H_0: β_k = β_k⁰ vs. H_1: β_k > β_k⁰:
if t_obs > t_α, reject H_0 at the α·100% level; otherwise do not reject H_0.

H_0: β_k = β_k⁰ vs. H_1: β_k < β_k⁰:
if t_obs < −t_α, reject H_0 at the α·100% level; otherwise do not reject H_0.
p-value

The p-value (or p) is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. p is an informal measure of evidence against the null hypothesis.

Example. Consider H_0: β_k = β_k⁰ vs. H_1: β_k ≠ β_k⁰.

When the null is rejected we say that b_k (not β_k) is significantly different from β_k⁰ at the α·100% level. Some authors also say “the variable (associated with b_k) is statistically significant at the α·100% level”.

When the null isn't rejected we say that b_k (not β_k) is not significantly different from β_k⁰ at the α·100% level, or that the variable is not statistically significant at the α·100% level.
More remarks:

Rejection of the null is not proof that the null is false. Why?

Acceptance of the null is not proof that the null is true. Why? We prefer to use the language “we fail to reject H_0 at the x% level” rather than “H_0 is accepted at the x% level.”

The statistical significance of a variable is determined by the size of t_obs = b_k/se(b_k), whereas the economic significance of a variable is related to the size and sign of b_k.

Example. Suppose that in a business activity we have the estimated equation

log(wage_i)^ = .1 + 0.01 female_i + ...,  n = 600,

with standard error 0.001 on the female coefficient. H_0: β_2 = 0 vs. H_1: β_2 ≠ 0. We have:

t_k⁰ = b_2/σ̂_{b_2} ~ t(600 − K) ≈ N(0, 1) (under the null),
t_obs = 0.01/0.001 = 10,
p-value = P(|t_k⁰| > 10 | H_0 is true) ≈ 0.

Discuss statistical versus economic significance.
Exercise 2.18. Can we say that students at smaller schools perform better than those at larger schools? To discuss this hypothesis we consider data on 408 high schools in Michigan for the year 1993 (see Wooldridge, chapter 4). Performance is measured by the percentage of students receiving a passing score on a tenth grade math test (math10). School size is measured by student enrollment (enroll). We will control for two other factors, average annual teacher compensation (totcomp) and the number of staff per one thousand students (staff). Teacher compensation is a measure of teacher quality, and staff size is a rough measure of how much attention students receive. The figure below reports the results. Answer the initial question.

Exercise 2.19. We want to relate the median housing price (price) in the community to various community characteristics: nox is the amount of nitrous oxide in the air, in parts per million; dist is a weighted distance of the community from five employment centers, in miles; rooms is the average number of rooms in houses in the community; and stratio is the average student-teacher ratio of schools in the community. Can we conclude that the elasticity of price with respect to nox is -1? (Sample: 506 communities in the Boston area - see Wooldridge, chapter 4.)
H_0: Rβ = r vs. H_1: Rβ ≠ r,

where r is p×1 and R is p×K. The test statistic is

F⁰ = (Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r)/(p s²),

with F⁰ ~ F(p, n − K) under H_0. If we observe F⁰ > F_α and H_0 is true, then a low-probability event has occurred.
In the case p = 1 (a single linear combination of the elements of β) one may use the test statistic

t⁰ = (Rb − Rβ)/(s√(R(X'X)⁻¹R')) ~ t(n − K).

Example. We consider a simple model to compare the returns to education at junior colleges and four-year colleges; for simplicity, we refer to the latter as “universities” (see Wooldridge, chap. 4). The model is

log(wage_i) = β_1 + β_2 jc_i + β_3 univ_i + β_4 exper_i + ε_i,

with

(X'X)⁻¹ =
[ 0.0023972      9.4121×10⁻⁵   8.50437×10⁻⁵  1.6780×10⁻⁵
  9.41217×10⁻⁵  0.0002520      1.04201×10⁻⁵  9.2871×10⁻⁸
  8.50437×10⁻⁵  1.0420×10⁻⁵   2.88090×10⁻⁵  2.12598×10⁻⁷
  1.67807×10⁻⁵  9.2871×10⁻⁸   2.1259×10⁻⁷   1.3402×10⁻⁷ ].

Under the null, the test statistic is

t⁰ = (Rb − Rβ)/(s√(R(X'X)⁻¹R')) ~ t(n − K).
We have

R = [0 1 −1 0],
√(R(X'X)⁻¹R') = 0.016124827,
s√(R(X'X)⁻¹R') = 0.430138 × 0.016124827 = 0.006936,
Rb = [0 1 −1 0](1.472326, 0.066697, 0.076876, 0.004944)' = 0.066697 − 0.076876 = −0.01018,
Rβ = β_2 − β_3 = 0 (under H_0),
t_obs = −0.01018/0.006936 = −1.467,
−t_{0.05} = −1.645.

Since t_obs > −1.645, we do not reject H_0 at the 5% level. There is no evidence against β_2 = β_3 at the 5% level.
H_0: Rβ = r vs. H_1: Rβ ≠ r

(where r is p×1 and R is p×K). It can be proved that

F⁰ = (Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r)/(p s²)
   = [(e*'e* − e'e)/p] / [e'e/(n − K)]
   = [(R² − R*²)/p] / [(1 − R²)/(n − K)] ~ F(p, n − K),

where * refers to the short regression, i.e. the regression subject to the constraint Rβ = r.
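The equivalence of the three expressions for F⁰ (Wald form, SSR form, R² form) can be verified numerically for an exclusion restriction; a minimal sketch with simulated data:

```python
import numpy as np

# Check numerically that the three expressions for F^0 agree, for the
# restriction "the coefficients on X2 are zero" (short regression drops X2).
rng = np.random.default_rng(5)
n, K1, p = 60, 2, 2
K = K1 + p
X1 = np.column_stack([np.ones(n), rng.standard_normal((n, K1 - 1))])
X2 = rng.standard_normal((n, p))
X = np.hstack([X1, X2])
y = rng.standard_normal(n)

def fit(Z):
    b = np.linalg.solve(Z.T @ Z, Z.T @ y)
    e = y - Z @ b
    return b, float(e @ e)

b, ssr_long = fit(X)
_, ssr_short = fit(X1)
s2 = ssr_long / (n - K)

# Wald form: R picks out the coefficients on X2, r = 0
R = np.hstack([np.zeros((p, K1)), np.eye(p)])
Rb = R @ b
F_wald = float(Rb @ np.linalg.solve(R @ np.linalg.inv(X.T @ X) @ R.T, Rb)) / (p * s2)

# SSR form
F_ssr = ((ssr_short - ssr_long) / p) / (ssr_long / (n - K))

# R^2 form (both regressions include the constant)
tss = float(((y - y.mean()) ** 2).sum())
R2_long, R2_short = 1 - ssr_long / tss, 1 - ssr_short / tss
F_r2 = ((R2_long - R2_short) / p) / ((1 - R2_long) / (n - K))

print(np.allclose([F_wald, F_ssr], F_r2))   # True
```

The SSR and R² forms are convenient in practice because they only require fitting the restricted and unrestricted regressions.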
Example. Consider once again the equation log(wage_i) = β_1 + β_2 jc_i + β_3 univ_i + β_4 exper_i + ε_i and H_0: β_2 = β_3 against H_1: β_2 ≠ β_3. The results of the regression subject to the constraint β_2 = β_3 are:

In the case “all slopes zero” (test of significance of the complete regression), it can be proved that F⁰ equals

F⁰ = [R²/(K − 1)] / [(1 − R²)/(n − K)].
Having specified the distribution of the error vector, we can use the maximum likelihood (ML) principle to estimate the model parameters θ = (β', σ²)'.

ML principle: choose the parameter estimates to maximize the probability of obtaining the data. Maximizing the joint density associated with the data, f(y, X; θ̃), leads to the same solution. Therefore:
[Figure: the joint density plotted against θ̃ ∈ [0, 1]; it rises from near 0 to a maximum of about 0.0012 near θ̃ = 0.6 and falls back to 0.]
d[θ̃⁶(1 − θ̃)⁴]/dθ̃ = 0 ⇔ θ̂ = 6/10,

and since

d²[θ̃⁶(1 − θ̃)⁴]/dθ̃² < 0 at θ̃ = θ̂,

θ̂ = 0.6 maximizes f(y; θ̃). θ̂ is the “most likely” value of θ, that is, the value that maximizes the probability of observing (y_1, ..., y_10). Notice that the ML estimator is ȳ.

In most cases we prefer to solve max log f(y, X; θ̃) rather than max f(y, X; θ̃), since the log transformation greatly simplifies the likelihood (products become sums).
Assumption 1.5 (the normality assumption) together with Assumptions 1.2 and 1.4 implies that the distribution of ε conditional on X is N(0, σ²I). Thus,

ε | X ~ N(0, σ²I) ⇒ y | X ~ N(Xβ, σ²I) ⇒
f(y | X; θ) = (2πσ²)^(−n/2) exp(−(y − Xβ)'(y − Xβ)/(2σ²)) ⇒
log f(y | X; θ) = −(n/2) log(2πσ²) − (y − Xβ)'(y − Xβ)/(2σ²).

It can be proved that

log f(y | X; θ) = Σ_{i=1}^n log f(y_i | x_i) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (y_i − x_i'β)².
Proposition (1.5 - ML Estimator of β and σ²). Suppose Assumptions 1.1-1.5 hold. Then,

ML estimator of β = (X'X)⁻¹ X'y = b;
ML estimator of σ² = e'e/n ≠ s² = e'e/(n − K).
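A minimal sketch with simulated data (hypothetical coefficient values), contrasting the ML estimator e'e/n with s² = e'e/(n − K):

```python
import numpy as np

# Under normality the ML estimator of beta coincides with OLS, while the
# ML estimator of sigma^2 divides the SSR by n instead of n - K.
rng = np.random.default_rng(6)
n, K = 30, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(n)   # hypothetical betas

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
sigma2_ml = float(e @ e) / n          # ML estimator: biased downward in finite samples
s2 = float(e @ e) / (n - K)           # unbiased under Assumptions 1.1-1.4

print(sigma2_ml < s2)                             # True: the ML estimator is smaller
print(np.isclose(sigma2_ml * n, s2 * (n - K)))    # both rescale the same SSR
```

The two estimators differ only by the divisor, so their ratio (n − K)/n tends to 1 as n grows, matching the limit result below.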
E(e'e/n | X) ≠ σ², but lim_{n→∞} E(e'e/n) = σ².
Proposition (1.6 - b is the Best Unbiased Estimator BUE). Under Assumptions 1.1-1.5,
the OLS estimator b of is BUE in that any other unbiased (but not necessarily linear)
estimator has larger conditional variance in the matrix sense.
This result should be distinguished from the Gauss-Markov Theorem that b is minimum
variance among those estimators that are unbiased and linear in y. Proposition 1.6 says
that b is minimum variance in a larger class of estimators that includes nonlinear unbiased
estimators. This stronger statement is obtained under the normality assumption (Assumption
1.5) which is not assumed in the Gauss-Markov Theorem. Put differently, the Gauss-Markov
Theorem does not exclude the possibility of some nonlinear estimator beating OLS, but this
possibility is ruled out by the normality assumption.
Exercise 2.22. Suppose yi = xi′β + εi where εi|X ~ t(v). Assume that Assumptions 1.1-1.4 hold. Use your intuition to answer "true" or "false" to the following statements:
(c) the BUE estimator can only be obtained numerically (i.e. there is no closed formula for the BUE estimator).
The model y = Xβ + ε based on Assumptions 1.1-1.3 and E(εε′|X) = σ²V is called the generalized regression model.
Example (case where E(εi²|X) depends on X). Consider the following model
yi = β1 + β2 xi2 + εi
to explain household expenditure on food (y) as a function of household income. Typical behavior: low-income households do not have the option of extravagant food tastes: they have few choices and are almost forced to spend a particular portion of their income on food; high-income households could have simple food tastes or extravagant food tastes: income by itself is likely to be relatively less important as an explanatory variable.
[Figure: scatter plot of y (Expenditure, 0-20) against x (Income, 6-13); the dispersion of y widens as income increases.]
If e accurately reflects the behavior of ε, the information in the previous figure suggests that the variability of yi increases as income increases; thus it is reasonable to suppose that Var(εi|xi) increases with income. Consequences:
1. The Gauss-Markov Theorem no longer holds for the OLS estimator. The BLUE is some other estimator.
2. The t-ratio is not distributed as the t distribution. Thus, the t-test is no longer valid. The same comments apply to the F-test. Note that Var(b|X) is no longer σ²(X′X)⁻¹. In effect,
Var(b|X) = Var((X′X)⁻¹X′y | X) = (X′X)⁻¹X′ Var(y|X) X(X′X)⁻¹
         = σ²(X′X)⁻¹X′VX(X′X)⁻¹.
On the other hand,
E(s²|X) = E(e′e|X)/(n − K) = tr(Var(e|X))/(n − K) = σ² tr(MVM)/(n − K) = σ² tr(MV)/(n − K).
The conventional standard errors are incorrect when Var(y|X) ≠ σ²I. Confidence region and hypothesis test procedures based on the classical regression model are not valid.
3. However, the OLS estimator is still unbiased, because the unbiasedness result (Proposition 1.1 (a)) does not require Assumption 1.4. In effect,
E(b|X) = (X′X)⁻¹X′ E(y|X) = (X′X)⁻¹X′Xβ = β,  E(b) = β.
Use b to estimate β and Var(b|X) = σ²(X′X)⁻¹X′VX(X′X)⁻¹ for inference purposes. Note that y|X ~ N(Xβ, σ²V) implies
b|X ~ N(β, σ²(X′X)⁻¹X′VX(X′X)⁻¹).
This is not a good solution: if you know V you may use a more efficient estimator, as we will see below. Later on, in the chapter "Large Sample Theory" we will find that σ²V may be replaced by a consistent estimator.
If the value of the matrix function V is known, a BLUE estimator for β, called generalized least squares (GLS), can be deduced. The basic idea of the derivation is to transform the generalized regression model into a model that satisfies all the assumptions, including Assumption 1.4, of the classical regression model. Consider
y = Xβ + ε,  E(εε′|X) = σ²V.
We should multiply both sides of the equation by a nonsingular matrix C (depending on X),
Cy = CXβ + Cε
ỹ = X̃β + ε̃
such that the transformed error ε̃ verifies E(ε̃ε̃′|X) = σ²I, i.e. CVC′ = I.
Given CVC′ = I, how to find C? Since V is by construction symmetric and positive definite, there exists a nonsingular n × n matrix C such that
V = C⁻¹(C′)⁻¹  or  V⁻¹ = C′C.
Note
CVC′ = C C⁻¹ (C′)⁻¹ C′ = I.
It is easy to see that if y = Xβ + ε satisfies Assumptions 1.1-1.3 and Assumption 1.5 (but not Assumption 1.4), then
ỹ = X̃β + ε̃,  where ỹ = Cy, X̃ = CX,
satisfies Assumptions 1.1-1.5. Let
β̂_GLS = (X̃′X̃)⁻¹X̃′ỹ = (X′V⁻¹X)⁻¹X′V⁻¹y.
E(β̂_GLS | X) = β.
(b) (expression for the variance) Under Assumptions 1.1-1.3 and the assumption E(εε′|X) = σ²V that the conditional second moment is proportional to V,
Var(β̂_GLS | X) = σ²(X′V⁻¹X)⁻¹.
(c) (the GLS estimator is BLUE) Under the same set of assumptions as in (b), the GLS estimator is efficient in that the conditional variance of any unbiased estimator that is linear in y is greater than or equal to Var(β̂_GLS | X) in the matrix sense.
We have
V = diag(v1, v2, ..., vn)  ⇒  V⁻¹ = diag(1/v1, 1/v2, ..., 1/vn)  ⇒
C = diag(1/√v1, 1/√v2, ..., 1/√vn).
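With diagonal V, the two routes to the GLS estimator — OLS on the transformed data (Cy, CX) and the closed form (X′V⁻¹X)⁻¹X′V⁻¹y — coincide. A minimal numerical check (the heteroskedastic design below is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 5, n)
v = x**2                                   # known weights: Var(eps_i|x_i) proportional to v_i
X = np.column_stack([np.ones(n), x])
y = X @ np.array([2.0, 0.5]) + np.sqrt(v) * rng.normal(size=n)

# Route 1: closed form (X'V^-1 X)^-1 X'V^-1 y with V = diag(v)
Vinv = np.diag(1.0 / v)
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Route 2: OLS on the transformed data, C = diag(1/sqrt(v))
C = np.diag(1.0 / np.sqrt(v))
b_tilde = np.linalg.lstsq(C @ X, C @ y, rcond=None)[0]
# the two routes give the same estimate
```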
Now
ỹ = Cy = (y1/√v1, y2/√v2, ..., yn/√vn)′
and X̃ = CX is the matrix whose i-th row is
(1/√vi, xi2/√vi, ..., xiK/√vi).
Notice (taking vi = e^{xi2} as the example):
Var(ε̃i | X) = Var( εi/√(e^{xi2}) | xi2 ) = (1/e^{xi2}) Var(εi | xi2) = (1/e^{xi2}) σ² e^{xi2} = σ².
Efficient estimation under a known form of heteroskedasticity is called weighted regression (or weighted least squares (WLS)).
Example. Consider wagei = β1 + β2 educi + β3 experi + εi.
[Figure: scatter plots of WAGE (0-30) against EXPER (0-60) and against EDUC (0-20).]
Dependent Variable: WAGE
Method: Least Squares
Sample: 1 526

Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         -3.390540    0.766566    -4.423023    0.0000
EDUC       0.644272    0.053806    11.97397     0.0000
EXPER      0.070095    0.010978     6.385291    0.0000

[Figure: scatter plot of squared residuals RES2 (0-300) against a regressor.]
Exercise 2.23. Let {yi, i = 1, 2, ...} be a sequence of independent random variables with distribution N(μ, σi²), where σi² is known (note: we assume σ1² ≠ σ2² ≠ ...). When the variances are unequal, the sample mean ȳ is not the best linear unbiased estimator (BLUE). The BLUE has the form ỹ = Σ_{i=1}^n wi yi where the wi are nonrandom weights. (a) Find a condition on wi such that E(ỹ) = μ; (b) Find the optimal weights wi that make ỹ the BLUE. Hint: you may translate this problem into an econometric framework: if {yi} is a sequence of independent random variables with distribution N(μ, σi²), then yi can be represented by the equation yi = μ + εi, where εi ~ N(0, σi²). Then find the GLS estimator of μ.
Exercise 2.26. A researcher first ran an OLS regression. Then she was given the true V matrix. She transformed the data appropriately and obtained the GLS estimator. For several coefficients, standard errors in the second regression were larger than those in the first regression. Does this contradict Proposition 1.7? See the previous exercise.
Finite-sample properties of GLS rest on the assumption that the regressors are strictly exogenous. In time-series models the regressors are not strictly exogenous and the error is serially correlated.
V can be estimated from the sample. This approach is called Feasible Generalized Least Squares (FGLS). But if the function V is estimated from the sample, its value V̂ becomes a random variable, which affects the distribution of the GLS estimator. Very little is known about the finite-sample properties of the FGLS estimator. We need to use the large-sample properties ...
3 Large-Sample Theory
The finite-sample theory breaks down if one of the following three assumptions is violated:
This chapter develops an alternative approach based on large-sample theory (n is "sufficiently large").
Convergence in Probability
Example. Consider a fair coin. Let zi = 1 if the i-th toss results in heads and zi = 0 otherwise. Let z̄n = (1/n) Σ_{i=1}^n zi. The following graph suggests that z̄n →p 1/2.
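The coin-toss experiment is easy to replicate; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.integers(0, 2, size=100_000)          # fair coin: 1 = heads, 0 = tails
zbar = z.cumsum() / np.arange(1, z.size + 1)  # running sample mean z_bar_n

# the running mean settles near 1/2 as n grows
deviation = abs(zbar[-1] - 0.5)
```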
A sequence of random scalars {zn} converges in mean square (or in quadratic mean) to α if
lim_{n→∞} E[(zn − α)²] = 0.
Convergence in Distribution
Let {zn} be a sequence of random scalars and Fn be the cumulative distribution function (c.d.f.) of zn, i.e. zn ~ Fn. We say that {zn} converges in distribution to a random scalar z if the c.d.f. Fn of zn converges to the c.d.f. F of z at every continuity point of F. We write
zn →d z, where z ~ F;
F is the asymptotic (or limiting) distribution of z. If F is well known — for example, if F is the cumulative normal N(0,1) distribution — we prefer to write
zn →d N(0,1)  (instead of zn →d z and z ~ N(0,1)).
Example. Consider zn ~ t(n). We know that zn →d N(0,1).
(a) if zn →p α then f(zn) →p f(α);
(b) if zn →d z then f(zn) →d f(z).
An immediate implication of Lemma 2.3 (a) is that the usual arithmetic operations preserve convergence in probability:
xn →p α, yn →p β  ⇒  xn + yn →p α + β.
xn →p α, yn →p β  ⇒  xn yn →p αβ.
xn →p α, yn →p β  ⇒  xn/yn →p α/β, β ≠ 0.
Yn →p Γ  ⇒  Yn⁻¹ →p Γ⁻¹ (Γ is invertible).
(a) xn →d x, yn →p α  ⇒  xn + yn →d x + α.
(b) xn →d x, yn →p 0  ⇒  yn′xn →p 0.
(c) xn →d x, An →p A  ⇒  An xn →d Ax. In particular, if x ~ N(0, Σ), then
An xn →d N(0, AΣA′).
(d) xn →d x, An →p A  ⇒  xn′An⁻¹xn →d x′A⁻¹x (A is nonsingular).
If xn →p 0 we write xn = op(1).
If xn − yn →p 0 we write xn = yn + op(1).
In part (c) we may write An xn =d A xn (An xn and A xn have the same asymptotic distribution).
Wooldridge’s quotation:
While not all useful estimators are unbiased, virtually all economists agree that
consistency is a minimal requirement for an estimator. The famous econometrician
Clive W.J. Granger once remarked: "If you can't get it right as n goes to infinity,
you shouldn’t be in this business.” The implication is that, if your estimator of a
particular population parameter is not consistent, then you are wasting your time.
The variance matrix Σ is called the asymptotic variance and is denoted Avar(θ̂n), i.e.
lim_{n→∞} Var(√n (θ̂n − θ)) = Avar(θ̂n) = Σ.
Some authors use the notation Avar(θ̂n) to mean Σ/n (which is zero in the limit).
Consider
z̄n = (1/n) Σ_{i=1}^n zi.
We say that z̄n obeys the LLN if z̄n →p μ, where μ = E(zi) or μ = lim_n E(z̄n).
(Kolmogorov's Second Strong LLN) If {zi} is i.i.d. with E(zi) = μ, then z̄n →p μ.
Theorem 1 (Lindeberg-Levy CLT). Let {zi} be i.i.d. with E(zi) = μ and Var(zi) = Σ. Then
√n (z̄n − μ) = (1/√n) Σ_{i=1}^n (zi − μ) →d N(0, Σ).
Notice that
E(√n (z̄n − μ)) = 0  ⇒  E(z̄n) = μ,
Var(√n (z̄n − μ)) = Σ  ⇒  Var(z̄n) = Σ/n.
Given the previous equations, some authors write
z̄n ~a N(μ, Σ/n).
Example. Let {zi} be i.i.d. with distribution χ²(1). By the Lindeberg-Levy CLT (scalar case) we have
z̄n = (1/n) Σ_{i=1}^n zi ~a N(μ, σ²/n)
where
E(z̄n) = (1/n) Σ_{i=1}^n E(zi) = E(zi) = μ = 1,
Var(z̄n) = Var((1/n) Σ_{i=1}^n zi) = (1/n) Var(zi) = σ²/n = 2/n.
Example. In random sampling with sample size n = 30 on a variable z with E(z) = 10 and Var(z) = 9 but unknown distribution, obtain an approximation to P(z̄n < 9.5). We do not know the exact distribution of z̄n. However, from the Lindeberg-Levy CLT we have
√n (z̄n − μ)/σ →d N(0,1)  or  z̄n ~a N(μ, σ²/n).
Hence,
P(z̄n < 9.5) = P( √n (z̄n − μ)/σ < √30 (9.5 − 10)/3 )
≈ Φ(−0.9128)  [Φ is the cdf of N(0,1)]
= 0.1807.
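The normal approximation above can be reproduced from the standard normal cdf, which the Python standard library gives via the error function; a minimal sketch:

```python
import math

def norm_cdf(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, mu, sigma = 30, 10.0, 3.0                 # Var(z) = 9
zstat = math.sqrt(n) * (9.5 - mu) / sigma    # = -0.9128...
p = norm_cdf(zstat)                          # close to the 0.1807 reported above
```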
Stochastic process (SP): a sequence of random variables. For this reason, it is more adequate to write a SP as {zi} (meaning a sequence of random variables) rather than zi (meaning the random variable at time i).
The definition implies that any transformation (function) of a stationary process is itself stationary; that is, if {zi} is stationary, then {g(zi)} is. For example, if {zi} is stationary then {zi zi′} is also stationary.
Definition (Covariance Stationary Processes). A stochastic process {zi} is weakly (or covariance) stationary if: (i) E(zi) does not depend on i, and (ii) Cov(zi, zi−j) exists, is finite, and depends only on j but not on i.
Example. It can be proved that {zi}, zi = √(α0 + α1 zi−1²) εi, where {εi} is i.i.d. with mean zero and unit variance, α0 > 0 and 1/√3 ≤ α1 < 1, is a covariance stationary process. However, wi = zi² is not a covariance stationary process, as E(wi²) does not exist.
Exercise 3.3. Consider the SP {ut} where
ut = εt if t ≤ 2000,
ut = √((k − 2)/k) ηt if t > 2000,
where εt and ηs are independent for all t and s, εt ~iid N(0,1) and ηs ~iid t(k). Explain why {ut} is weakly (or covariance) stationary but not strictly stationary.
Definition (White Noise Processes). A white noise process {zi} is a covariance stationary process with zero mean and no serial correlation: E(zi) = 0 and Cov(zi, zi−j) = 0 for j ≠ 0.
[Figure: four simulated sample paths Y over time, illustrating processes with and without white-noise behavior.]
In the literature there is not a unique definition of ergodicity. We prefer to call "weakly dependent process" what Hayashi calls "ergodic process".
Definition. A stationary process {zi} is said to be a weakly dependent process (= ergodic in Hayashi's definition) if, for any two bounded functions f: R^{k+1} → R and g: R^{s+1} → R,
lim_{n→∞} | E[ f(zi, ..., zi+k) g(zi+n, ..., zi+n+s) ] | = | E[ f(zi, ..., zi+k) ] | | E[ g(zi+n, ..., zi+n+s) ] |.
Serial dependence, which is ruled out by the i.i.d. assumption in Kolmogorov's LLN, is allowed in this theorem, provided that it disappears in the long run. Since, for any function f, {f(zi)} is S&WD (stationary and weakly dependent) whenever {zi} is, this theorem implies that any moment of a S&WD process (if it exists and is finite) is consistently estimated by the sample moment.
For example, suppose {zi} is a S&WD process and E(zi zi′) exists and is finite. Then
z̄n = (1/n) Σ_{i=1}^n zi zi′ →p E(zi zi′).
Definition (Martingale). A vector process {zi} is called a martingale with respect to {zi} if E(zi | zi−1, ..., z1) = zi−1.
By definition,
Var(ḡn) = (1/n²) Var(Σ_{t=1}^n gt) = (1/n²) [ Σ_{t=1}^n Var(gt) + 2 Σ_{j=1}^{n−1} Σ_{i=j+1}^{n} Cov(gi, gi−j) ].
However, if {gi} is a stationary MDS with finite second moment, then
Σ_{t=1}^n Var(gt) = n Var(gt),  Cov(gi, gi−j) = 0,
so
Var(ḡn) = (1/n) Var(gt).
Definition (Random Walk). Let {gi} be a vector independent white noise process. A random walk, {zi}, is a sequence of cumulative sums:
zi = gi + gi−1 + ... + g1.
Exercise 3.4. Show that the random walk can be written as
zi = zi−1 + gi, z1 = g1.
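The cumulative-sum definition and the recursion of Exercise 3.4 describe the same object; a minimal numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
g = rng.normal(size=50)          # independent white noise

z_cumsum = np.cumsum(g)          # z_i = g_i + g_{i-1} + ... + g_1

# recursion z_i = z_{i-1} + g_i, z_1 = g_1
z_rec = np.empty_like(g)
z_rec[0] = g[0]
for i in range(1, g.size):
    z_rec[i] = z_rec[i - 1] + g[i]
# the two constructions coincide
```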
We have three formulations of a lack of serial dependence for zero-mean covariance stationary
processes:
The model presented in this section has probably the widest range of economic applications:
No specific distributional assumption (such as normality of the error term) is required;
The requirement in finite-sample theory that the regressors be strictly exogenous or fixed is replaced by the much weaker requirement that they be "predetermined."
Assumption (2.5 - {gi} is a martingale difference sequence with finite second moments). {gi}, where gi = xi εi, is a martingale difference sequence (so, a fortiori, E(gi) = 0). The K × K matrix of cross moments, E(gi gi′), is nonsingular. We use S for Avar(ḡ) (the variance of √n ḡ, where ḡ = (1/n) Σ gi). By Assumption 2.2 and the S&WD martingale differences CLT, S = E(gi gi′).
Remarks:
1. (S&WD) A special case of S&WD is that {(yi, xi)} is i.i.d. (a random sample in cross-sectional data).
3. (E(xi εi) = 0 vs. E(εi|xi) = 0) The condition E(εi|xi) = 0 is stronger than E(xi εi) = 0. In effect,
4. (Predetermined vs. strictly exogenous regressors) Assumption 2.3 restricts only the contemporaneous relationship between the error term and the regressors. The exogeneity assumption (Assumption 1.2) implies that, for any regressor k, E(xjk εi) = 0 for all i and j, not just for i = j. Strict exogeneity is a strong assumption that does not hold in general for time series models.
6. (A sufficient condition for {gi} to be a MDS) Since a MDS is zero-mean by definition, Assumption 2.5 is stronger than Assumption 2.3 (the latter is redundant given Assumption 2.5). We will need Assumption 2.5 to prove the asymptotic normality of the OLS estimator. A sufficient condition for {gi} to be an MDS is E(εi|Fi) = 0, since then
E(xi εi | gi−1, ..., g1) = E[ E(xi εi | Fi) | gi−1, ..., g1 ] = E[0 | gi−1, ..., g1] = 0;
thus E(εi|Fi) = 0 ⇒ {gi} is a MDS.
E(εi | εi−1, ..., ε1) = E( E(εi | gi−1, ..., g1) | εi−1, ..., ε1 ) = 0.
Assumption 2.5 implies that the error term itself is a MDS and hence is serially uncorrelated.
Proposition (2.1 - asymptotic distribution of the OLS Estimator). (a) (Consistency of b for β) Under Assumptions 2.1-2.4,
b →p β.
(b) (Asymptotic Normality of b) If Assumption 2.3 is strengthened to Assumption 2.5, then
√n (b − β) →d N(0, Avar(b))
where
Avar(b) = Σxx⁻¹ S Σxx⁻¹.
(c) (Consistent Estimate of Avar(b)) Suppose there is available a consistent estimator Ŝ of S. Then, under Assumption 2.2, Avar(b) is consistently estimated by
Avar̂(b) = Sxx⁻¹ Ŝ Sxx⁻¹
where
Sxx = X′X/n = (1/n) Σ_{i=1}^n xi xi′.
Proposition (2.2 - consistent estimation of error variance). Under Assumptions 2.1-2.4,
s² = (1/(n − K)) Σ_{i=1}^n ei² →p E(εi²).
Under conditional homoskedasticity, E(εi²|xi) = σ² (we will see this in detail later), we have
S = E(gi gi′) = E(εi² xi xi′) = ... = σ² E(xi xi′) = σ² Σxx
and Avar(b) = Σxx⁻¹ S Σxx⁻¹ = σ² Σxx⁻¹.
Derivation of the distribution of test statistics is easier than in finite-sample theory because we are only concerned with the large-sample approximation to the exact distribution.
Proposition (2.3 - robust t-ratio and Wald statistic). Suppose Assumptions 2.1-2.5 hold, and suppose there is available a consistent estimator Ŝ of S. As before, let Avar̂(b) = Sxx⁻¹ Ŝ Sxx⁻¹. Then
Remarks
The differences from the finite-sample t-test are: (1) the way the standard error is calculated is different, (2) we use the table of N(0,1) rather than that of t(n − K), and (3) the actual size or exact size of the test (the probability of Type I error given the sample size) equals the nominal size (i.e., the desired significance level α) only approximately, although the approximation becomes arbitrarily good as the sample size increases. The difference between the exact size and the nominal size of a test is called the size distortion.
How to select an estimator for a population parameter? One of the most important methods is the analog estimation method, or the method of moments. The method of moments principle: to estimate a feature of the population, use the corresponding feature of the sample.
Population feature     Sample analog
E(yi)                  ȳ
Var(yi)                S²y
σxy                    Sxy
σ²x                    S²x
P(yi ≤ c)              (1/n) Σ_{i=1}^n I{yi ≤ c}
median(yi)             sample median
max(yi)                max_{i=1,...,n}(yi)
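The sample analogs in the table are one-liners; a minimal sketch with simulated data (the N(2, 1.5²) population is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=2.0, scale=1.5, size=10_000)

mean_hat = y.mean()            # analog of E(y_i)
var_hat = y.var(ddof=1)        # analog of Var(y_i)
p_hat = (y <= 2.0).mean()      # analog of P(y_i <= 2.0), i.e. (1/n) sum I{y_i <= c}
median_hat = np.median(y)      # analog of median(y_i)
# each sample feature sits near its population counterpart
```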
The analogy principle suggests that E(εi² xi xi′) can be estimated using the estimator
(1/n) Σ_{i=1}^n εi² xi xi′.
Since εi is not observable we need another one:
Ŝ = (1/n) Σ_{i=1}^n ei² xi xi′.
Assumption (2.6 - finite fourth moments for regressors). E[(xik xij)²] exists and is finite for all k and j (k, j = 1, ..., K).
Proposition (2.4 - consistent estimation of S). Suppose S = E(εi² xi xi′) exists and is finite. Then, under Assumptions 2.1-2.4 and 2.6, Ŝ is consistent for S.
b ~a N(β, Avar̂(b)/n) = N(β, Sxx⁻¹ Ŝ Sxx⁻¹/n) = N(β, (X′X)⁻¹X′BX(X′X)⁻¹),
where B = diag(e1², ..., en²), and
W = n (Rb − r)′ (R Avar̂(b) R′)⁻¹ (Rb − r)
  = n (Rb − r)′ (R Sxx⁻¹ Ŝ Sxx⁻¹ R′)⁻¹ (Rb − r)
  = (Rb − r)′ (R (X′X)⁻¹X′BX(X′X)⁻¹ R′)⁻¹ (Rb − r) →d χ²(p).
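The heteroskedasticity-robust ("White") covariance Sxx⁻¹ Ŝ Sxx⁻¹/n can be coded directly from these formulas. A minimal sketch with a simulated heteroskedastic sample (the design is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 0.5]) + x * rng.normal(size=n)   # Var(eps|x) grows with x

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

Sxx = X.T @ X / n
S_hat = (X * (e**2)[:, None]).T @ X / n     # (1/n) sum e_i^2 x_i x_i'
Sxx_inv = np.linalg.inv(Sxx)
avar_hat = Sxx_inv @ S_hat @ Sxx_inv        # estimate of Avar(b)
robust_cov = avar_hat / n                   # covariance of b itself
robust_se = np.sqrt(np.diag(robust_cov))    # robust standard errors
```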
(b) (Consistent estimation of asymptotic variance) Under the same set of assumptions, Avar(b) is consistently estimated by
Avar̂(b) = s² Sxx⁻¹ = n s² (X′X)⁻¹.
Under H0: βk = β̄k we have
t0k = (bk − β̄k)/σ̂_bk →d N(0,1), where σ̂²_bk = Avar̂(bk)/n = s² [(X′X)⁻¹]kk.
Notice
pF0 = (ẽ′ẽ − e′e)/(e′e/(n − K)) →d χ²(p),
where ˜ refers to the short regression, i.e. the regression subject to the constraint Rβ = r.
Remark (No need for the fourth-moment assumption). By S&WD and Assumptions 2.1-2.4, s² Sxx →p σ² Σxx = S. We do not need the fourth-moment assumption (Assumption 2.6) for consistency.
With the advent of robust standard errors, allowing us to do inference without specifying the conditional second moment, testing conditional homoskedasticity is not as important as it used to be. This section presents only the most popular test, due to White (1980), for the case of random samples.
The statistic is nR² →d χ²(m), where R² is the R² from the auxiliary regression of ei² on a constant and ψi, and m is the dimension of ψi.
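The nR² statistic can be computed by hand. A minimal sketch in which ψi is taken to contain the regressor and its square — a common choice, but an assumption here, since the slides do not fix ψi:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + np.abs(x) * rng.normal(size=n)  # heteroskedastic errors

b = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ b) ** 2                      # squared OLS residuals

# auxiliary regression of e_i^2 on a constant and psi_i = (x_i, x_i^2); m = 2
Z = np.column_stack([np.ones(n), x, x**2])
g = np.linalg.lstsq(Z, e2, rcond=None)[0]
fitted = Z @ g
r2 = 1 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
white_stat = n * r2                        # compare with chi-square(m) critical values
```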
Test Equation:
Dependent Variable: RESID^2
Even when the error is found to be conditionally heteroskedastic, the OLS estimator is still
consistent and asymptotically normal, and valid statistical inference can be conducted with
robust standard errors and robust Wald statistics. However, in the (somewhat unlikely) case
of a priori knowledge of the functional form of the conditional second moment, it should be
possible to obtain sharper estimates with smaller asymptotic variance.
To simplify the discussion, throughout this section we strengthen Assumptions 2.2 and 2.5 by assuming that {(yi, xi)} is i.i.d.
The parametric functional form for the conditional second moment we consider is
E(εi²|xi) = zi′α
where zi is a function of xi, e.g.
zi′ = (1, xi2²).
The WLS (also GLS) estimator can be obtained by applying OLS to the regression
ỹi = x̃i′β + ε̃i
where
ỹi = yi/√(zi′α),  x̃ik = xik/√(zi′α),  ε̃i = εi/√(zi′α),  i = 1, 2, ..., n.
We have
β̂_GLS = β̂(V) = (X̃′X̃)⁻¹X̃′ỹ = (X′V⁻¹X)⁻¹X′V⁻¹y.
Note that
E(ε̃i | x̃i) = 0.
Therefore, provided that E(x̃i x̃i′) is nonsingular, Assumptions 2.1-2.5 are satisfied for the equation ỹi = x̃i′β + ε̃i. Furthermore, by construction, the error ε̃i is conditionally homoskedastic: E(ε̃i² | x̃i) = 1. So Proposition 2.5 applies: the WLS estimator is consistent and asymptotically normal, and the asymptotic variance is
Avar(β̂(V)) = (E(x̃i x̃i′))⁻¹
           = plim( (1/n) Σ_{i=1}^n x̃i x̃i′ )⁻¹   (by the S&WD theorem)
           = plim( (1/n) X′V⁻¹X )⁻¹.
Thus ((1/n) X′V⁻¹X)⁻¹ is a consistent estimator of Avar(β̂(V)).
εi² = E(εi²|xi) + ηi
where by construction E(ηi|xi) = 0. This suggests that the following regression can be considered:
εi² = zi′α + ηi.
Provided that E(zi zi′) is nonsingular, Proposition 2.1 is applicable to this auxiliary regression: the OLS estimator of α is consistent and asymptotically normal. However, we cannot run this regression, as εi is not observable. In the previous regression we should replace εi by the consistent estimate ei (despite the presence of conditional heteroskedasticity). In conclusion, we may obtain a consistent estimate of α by considering the regression of ei² on zi to get
α̂ = ( Σ_{i=1}^n zi zi′ )⁻¹ Σ_{i=1}^n zi ei².
Step 1: Estimate the equation yi = xi′β + εi by OLS and compute the OLS residuals ei.
β̂(V̂) →p β,
√n (β̂(V̂) − β) →d N(0, Avar(β̂(V))),
and ((1/n) X′V̂⁻¹X)⁻¹ is a consistent estimator of Avar(β̂(V)).
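The FGLS steps — OLS residuals, auxiliary regression for α̂, then weighted regression — can be sketched end-to-end. The linear-variance specification zi = (1, xi²) below is an assumption for illustration, as is the clipping of negative fitted variances:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
x = rng.uniform(1, 3, n)
X = np.column_stack([np.ones(n), x])
sigma2_i = 0.5 + 1.0 * x**2                       # true conditional variance (assumed)
y = X @ np.array([1.0, 2.0]) + np.sqrt(sigma2_i) * rng.normal(size=n)

# Step 1: OLS residuals
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_ols

# Step 2: regress e_i^2 on z_i = (1, x_i^2) to estimate alpha
Z = np.column_stack([np.ones(n), x**2])
alpha_hat = np.linalg.lstsq(Z, e**2, rcond=None)[0]
v_hat = np.clip(Z @ alpha_hat, 1e-6, None)        # guard against negative fitted variances

# Step 3: WLS = OLS on the data divided by sqrt(v_hat)
w = 1.0 / np.sqrt(v_hat)
b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
```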
n
The specification εi² = zi′α + ηi may lead to zi′α̂ < 0. To overcome this problem, a popular specification for E(εi²|xi) is
E(εi²|xi) = exp(α′xi)
(it guarantees that Var(yi|xi) > 0 for all α ∈ Rʳ). It implies log E(εi²|xi) = α′xi. This suggests the following procedure:
c) Transform the data ỹi = yi/σ̂i, x̃ij = xij/σ̂i.
d) Regress ỹ on X̃ and obtain β̂(V̂).
Based on the information below, are the standard errors reported in the first table reliable?
cigs: number of cigarettes smoked per day; log(income): log of annual income; log(cigprice): log of the per-pack price of cigarettes in cents; educ: years of education; age; and restaurn: binary indicator equal to unity if the person resides in a state with restaurant smoking restrictions.
Calculate σ̂i² = exp(α̂′xi) = exp( (log ei²)^ ).
Notice: (log e1²)^, ..., (log en²)^ are the fitted values of the above regression.
Both b and β̂(V̂) are consistent.
Assuming that the functional form of the conditional second moment is correctly specified, β̂(V̂) is asymptotically more efficient than b.
It is not clear which estimator is better (in terms of efficiency) in the following situations:
– in finite samples, even if the functional form is correctly specified, the large-sample approximation will probably work less well for the WLS estimator than for OLS because of the estimation of the extra parameters (α) involved in the WLS procedure.
Because the issue of serial correlation arises almost always in time-series models, we use the subscript "t" instead of "i" in this section. Throughout this section we assume that the regressors include a constant. The issue is how to deal with
E(εt εt−j | xt−j, xt) ≠ 0.
When the regressors include a constant (true in virtually all known applications), Assumption 2.5 implies that the error term is a scalar martingale difference sequence, so if the error is found to be serially correlated (or autocorrelated), that is an indication of a failure of Assumption 2.5.
Assumptions 2.1-2.4 may hold under serial correlation, so the OLS estimator may be consistent even if the error is autocorrelated. However, the large-sample properties of b, t, and F of Proposition 2.5 are not valid. To see why, consider
√n (b − β) = Sxx⁻¹ √n ḡ.
We have
Avar(b) = Σxx⁻¹ S Σxx⁻¹,
Avar̂(b) = Sxx⁻¹ Ŝ Sxx⁻¹.
Consider the regression yt = xt′β + εt. We want to test whether or not εt is serially correlated. Consider
ρj = Cov(εt, εt−j)/√(Var(εt) Var(εt−j)) = Cov(εt, εt−j)/Var(εt) = γj/γ0 = E(εt εt−j)/E(εt²).
Proposition. If {εt} is a stationary MDS with E(εt² | εt−1, εt−2, ...) = σ², then
√n γ̃j →d N(0, σ⁴)  and  √n ρ̃j →d N(0, 1).
Proposition. Under the assumptions of the previous proposition,
Box-Pierce Q statistic = Q_BP = Σ_{j=1}^p (√n ρ̃j)² = n Σ_{j=1}^p ρ̃j² →d χ²(p).
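The Q statistic is just n times the sum of squared sample autocorrelations of the residuals; a minimal sketch (white-noise residuals simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(size=500)                 # residuals; white noise here, so Q should be modest

def acf(series, j):
    # sample autocorrelation at lag j (denominator: full-sample sum of squares)
    d = series - series.mean()
    return (d[j:] * d[:-j]).sum() / (d * d).sum()

p = 10
rho = np.array([acf(e, j) for j in range(1, p + 1)])
q_bp = e.size * (rho**2).sum()           # compare with chi-square(p) critical values
```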
Run the regression of et on et−1, ..., et−p.
We calculate the F statistic for the hypothesis that the p coefficients of et−1, ..., et−p are all zero.
Given
et = γ1 + γ2 xt2 + ... + γK xtK + δ1 et−1 + ... + δp et−p + errort,
the null hypothesis can be formulated as
H0: δ1 = ... = δp = 0.
Use the F test:
EVIEWS
Example. Consider, chnimp: the volume of imports of barium chloride from China, chempi:
index of chemical production (to control for overall demand for barium chloride), gas: the
volume of gasoline production (another demand variable), rtwex: an exchange rate index
(measures the strength of the dollar against several other currencies).
Equation 1
Dependent Variable: LOG(CHNIMP)
Method: Least Squares
Sample: 1978M02 1988M12
Included observations: 131
Equation 2
Breusch-Godfrey Serial Correlation LM Test:
Test Equation:
Dependent Variable: RESID
Method: Least Squares
Sample: 1978M02 1988M12
Included observations: 131
Presample missing value lagged residuals set to zero.
If you conclude that the errors are serially correlated you have a few options:
(a) You know (at least approximately) the form of autocorrelation, so you use a feasible GLS estimator.
(b) The second approach parallels the use of the White estimator for heteroskedasticity: you don't know the form of autocorrelation, so you rely on OLS but use a consistent estimator for Avar(b).
(c) You are concerned only with the dynamic specification of the model and with forecasting. You may try to convert your model into a dynamically complete model.
(d) Your model may be misspecified: you respecify the model and the autocorrelation disappears.
There are many forms of autocorrelation and each one leads to a different structure for the error covariance matrix V. The most popular form is known as the first-order autoregressive process. In this case the error term in
yt = xt′β + εt
is assumed to follow the AR(1) model.
Initial Model:
yt = xt′β + εt,  εt = φ εt−1 + vt,  |φ| < 1.
The GLS estimator is the OLS estimator applied to the transformed model
ỹt = x̃t′β + vt
where
ỹt = √(1 − φ²) y1 for t = 1,  ỹt = yt − φ yt−1 for t > 1;
x̃t = √(1 − φ²) x1 for t = 1,  x̃t = xt − φ xt−1 for t > 1.
Without the first observation, the transformed model is
yt − φ yt−1 = (xt − φ xt−1)′β + vt.
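The AR(1) quasi-differencing transformation is a few lines of code. A minimal sketch, with φ assumed known (in practice it is estimated from the OLS residuals, as in feasible GLS):

```python
import numpy as np

def ar1_transform(y, X, phi):
    """Prais-Winsten transform: first observation scaled by sqrt(1-phi^2),
    remaining observations quasi-differenced (y_t - phi*y_{t-1})."""
    y_t = np.empty_like(y)
    X_t = np.empty_like(X)
    y_t[0] = np.sqrt(1 - phi**2) * y[0]
    X_t[0] = np.sqrt(1 - phi**2) * X[0]
    y_t[1:] = y[1:] - phi * y[:-1]
    X_t[1:] = X[1:] - phi * X[:-1]
    return y_t, X_t

rng = np.random.default_rng(4)
n, phi = 200, 0.6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.zeros(n)
for t in range(1, n):                    # AR(1) errors
    eps[t] = phi * eps[t - 1] + rng.normal()
y = X @ np.array([1.0, 0.5]) + eps

y_t, X_t = ar1_transform(y, X, phi)
b_gls = np.linalg.lstsq(X_t, y_t, rcond=None)[0]  # OLS on the transformed data
```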
Example (continuation of the previous example). Let’s consider the residuals of Equation 1:
Equation 3
Dependent Variable: LOG(CHNIMP)
Method: Least Squares
Sample (adjusted): 1978M03 1988M12
Included observations: 130 after adjustments
Convergence achieved after 8 iterations
For sake of generality, assume that you have also a problem of heteroskedasticity.
Given
p 1 nX1 X n
0 0
S = Var ng = Var (gt) + E gtgt j + E gt j gt
n j=1 t=j+1
nX1 X n
1
= E "2t xtx0t + 0 0
E "t"t j xtxt j + E "t j "txt j xt ;
n j=1 t=j+1
a possible estimator of S based on the analogy principle would be
n 0
nX1 X n
1X 1
e2t xtx0t + etet j xtx0t j + et j etxt j x0t ; n0 < n:
n t=1 n j=1 t=j+1
A major problem with this estimator is that it is not necessarily positive semi-definite and hence cannot be a well-defined variance-covariance matrix.
Newey and West show that, with a suitable weighting function ω(j), the estimator below is consistent and positive semi-definite:
Ŝ_HAC = (1/n) Σ_{t=1}^n et² xt xt′ + (1/n) Σ_{j=1}^{L} ω(j) Σ_{t=j+1}^{n} [ et et−j xt xt−j′ + et−j et xt−j xt′ ]
where the weighting function ω(j) is
ω(j) = 1 − j/(L + 1).
The maximum lag L must be determined in advance. Autocorrelations at lags longer than L are ignored. For a moving-average process, this value is in general a small number.
This estimator is known as the HAC (heteroskedasticity and autocorrelation consistent) covariance matrix estimator and is valid when both conditional heteroskedasticity and serial correlation are present but of unknown form.
For example, with L = 3:
ω(1) = 1 − 1/4 = 0.75
ω(2) = 1 − 2/4 = 0.50
ω(3) = 1 − 3/4 = 0.25
EVIEWS:
[Figure: the selected lag L plotted against the sample size n, for n from 0 to 5000.]
EViews selects L = floor( 4 (n/100)^(2/9) ).
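Both the Bartlett weights and the bandwidth rule are one-liners; a minimal sketch:

```python
import math

def bartlett_weight(j, L):
    # Newey-West weighting function omega(j) = 1 - j/(L+1)
    return 1 - j / (L + 1)

def eviews_lag(n):
    # EViews rule: L = floor(4 * (n/100)^(2/9))
    return math.floor(4 * (n / 100) ** (2 / 9))

weights_L3 = [bartlett_weight(j, 3) for j in (1, 2, 3)]  # 0.75, 0.50, 0.25 as above
lag_131 = eviews_lag(131)  # matches "lag truncation=4" in Equation 4 (n = 131)
```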
Equation 4
Dependent Variable: LOG(CHNIMP)
Method: Least Squares
Sample: 1978M02 1988M12
Included observations: 131
Newey-West HAC Standard Errors & Covariance (lag truncation=4)
Consider
yt = x̃t′β + ut
such that E(ut | x̃t) = 0. This condition, although necessary for consistency, does not preclude autocorrelation. You may try to increase the number of regressors to xt and get a new regression model
yt = xt′β + εt such that
Proposition. If a model is DC (dynamically complete), then the errors are not correlated. Moreover, {gi} is a MDS.
E(yt | x̃t) = E(yt | xt2) = β1 + β2 xt2.
ut = yt − (β1 + β2 xt2)  ⇒
ut−1 = yt−1 − (β1 + β2 xt−1,2);
we have
yt = β1 + β2 xt2 + ut
   = β1 + β2 xt2 + φ ut−1 + εt
   = β1 + β2 xt2 + φ (yt−1 − β1 − β2 xt−1,2) + εt.
This equation can be written in the form
Equation 5
Dependent Variable: LOG(CHNIMP)
Method: Least Squares
Sample (adjusted): 1978M03 1988M12
Included observations: 130 after adjustments

Variable          Coefficient  Std. Error  t-Statistic  Prob.
C                 -11.30596    23.24886    -0.486302    0.6276
LOG(CHEMPI)       -7.193799    3.539951    -2.032175    0.0443
LOG(GAS)           1.319540    1.003825     1.314513    0.1911
LOG(RTWEX)        -0.501520    2.108623    -0.237842    0.8124
LOG(CHEMPI(-1))    9.618587    3.602977     2.669622    0.0086
LOG(GAS(-1))      -1.223681    1.002237    -1.220950    0.2245
LOG(RTWEX(-1))     0.935678    2.088961     0.447915    0.6550
LOG(CHNIMP(-1))    0.270704    0.084103     3.218710    0.0016

R-squared 0.394405            Mean dependent var 6.180590
Adjusted R-squared 0.359658   S.D. dependent var 0.699063
S.E. of regression 0.559400   Akaike info criterion 1.735660
Sum squared resid 38.17726    Schwarz criterion 1.912123
Log likelihood -104.8179      Hannan-Quinn criter. 1.807363
F-statistic 11.35069          Durbin-Watson stat 2.059684
Prob(F-statistic) 0.000000

Equation 6
Breusch-Godfrey Serial Correlation LM Test:
Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 05/12/10 Time: 19:13
Sample: 1978M03 1988M12
Included observations: 130
Presample missing value lagged residuals set to zero.

Variable          Coefficient  Std. Error  t-Statistic  Prob.
C                  1.025127    26.26657     0.039028    0.9689
LOG(CHEMPI)        1.373671    3.968650     0.346130    0.7299
LOG(GAS)          -0.279136    1.055889    -0.264361    0.7920
LOG(RTWEX)        -0.074592    2.234853    -0.033377    0.9734
LOG(CHEMPI(-1))   -1.878917    4.322963    -0.434636    0.6647
LOG(GAS(-1))       0.315918    1.076831     0.293378    0.7698
LOG(RTWEX(-1))    -0.007029    2.224878    -0.003159    0.9975
LOG(CHNIMP(-1))    0.151065    0.293284     0.515082    0.6075
RESID(-1)         -0.189924    0.307062    -0.618520    0.5375
RESID(-2)          0.088557    0.124602     0.710715    0.4788
RESID(-3)          0.154141    0.098337     1.567475    0.1199
RESID(-4)         -0.125009    0.098681    -1.266795    0.2079
RESID(-5)         -0.035680    0.099831    -0.357407    0.7215
RESID(-6)          0.048053    0.098008     0.490291    0.6249
RESID(-7)          0.129226    0.097417     1.326523    0.1874
RESID(-8)          0.052884    0.099891     0.529420    0.5976
RESID(-9)         -0.122323    0.102670    -1.191423    0.2361
RESID(-10)         0.022149    0.099419     0.222788    0.8241
RESID(-11)         0.034364    0.099973     0.343738    0.7317
RESID(-12)        -0.038034    0.102071    -0.372628    0.7101

R-squared 0.081251             Mean dependent var -9.76E-15
Adjusted R-squared -0.077442   S.D. dependent var 0.544011
S.E. of regression 0.564683    Akaike info criterion 1.835533
Sum squared resid 35.07532     Schwarz criterion 2.276692
Log likelihood -99.30962       Hannan-Quinn criter. 2.014790
F-statistic 0.512002           Durbin-Watson stat 2.011429
Prob(F-statistic) 0.952295
In many cases the finding of autocorrelation is an indication that the model is misspecified. If this is the case, the most natural route is not to change your estimator (from OLS to GLS) but to change your model. Several types of misspecification may lead to a finding of autocorrelation in your OLS residuals:
yt = β1 + β2 log t + εt.
In the following figure we estimate a misspecified functional form, yt = β1 + β2 t + εt. The residuals are clearly autocorrelated.