Panel DEEQA

DEEQA,Ecole Doctorale MPSE
Academic year 2003-2004
Advanced Econometrics
Panel data econometrics
and GMM estimation
Alban Thomas
MF 102, thomas@toulouse.inra.fr
Purpose of the course
Present recent developments in econometrics, that allow for

a consistent treatment of the impact of unobserved heterogeneity
on model predictions:
Panel data analysis.
Present a convenient econometric framework for dealing with

restrictions imposed by theory:
Method of Moments estimation.
Deal with discrete-choice models with unobserved hetero-
geneity.
Two keywords: unobserved heterogeneity and endogeneity.
Methods:
- Fixed Eects Least Squares
- Generalized Least Squares
- Instrumental Variables
- Maximum Likelihood estimation for Panel Data models
- Generalized Method of Moments for Times Series

- Generalized Method of Moments for Panel Data
- Heteroskedasticity-consistent estimation
- Dynamic Panel Data models
- Logit and Probit models for Panel Data

- Simulation-based inference
- Nonparametric and Semiparametric estimation
Statistical software: SAS, GAUSS, STATA (?)
Contents
I
Panel Data Models
Introduction
1.1
Gains in pooling cross section and time series . . .
1.1.1
Discrimination between alternative models .
1.1.2
Examples . . . . . . . . . . . . . . . . . . .
10
1.1.3
Less colinearity between explanatory variables 11
1.1.4
May reduce bias due to missing or unobserved variables
. . . . . . . . . . . . . . .
11
1.2
Analysis of variance . . . . . . . . . . . . . . . . .
12
1.3
Some denitions . . . . . . . . . . . . . . . . . . .
15
The linear model
17
2.1
Notation . . . . . . . . . . . . . . . . . . . . . . .
17
2.1.1
Model notation
. . . . . . . . . . . . . . .
18
2.1.2
Standard matrices and operators . . . . . .
19
2.1.3
Important properties of operators
. . . . .
20
The One-Way Fixed Eects model . . . . . . . . .
21
2.2
2.2.1
The estimator in terms of the Frisch-WaughLovell theorem . . . . . . . . . . . . . . . .
21
2.2.2
Interpretation as a covariance estimator
. .
23
2.2.3
Comments . . . . . . . . . . . . . . . . . .
24
2.2.4
Testing for poolability and individual eects
25
CONTENTS
2.3
The Random Eects model . . . . . . . . . . . . .
26
2.3.1
Notation and assumptions
. . . . . . . . .
26
2.3.2
GLS estimation of the Random-eect model
27
2.3.3
Comparison between GLS, OLS and Within
29
2.3.4
Fixed individual eects or error components? 29
2.3.5
Example: Wage equation, Hausman (1978)
2.3.6
Best Quadratic Unbiased Estimators (BQU)

of variances
31
Extensions
33
3.1
The Two-way panel data model . . . . . . . . . . .
33
3.1.1
The Two-way xed-eect model
33
3.1.2
Example: Production function (Hoch 1962)
3.2
3.3
. . . . . . . . . . . . . . . . .
30
More on non-spherical disturbances
. . . . . .
. . . . . . . .
36
37
3.2.1
Heteroskedasticity in individual eect
. . .
37
3.2.2
`Typical heteroskedasticity . . . . . . . . .
38
Unbalanced panel data models
. . . . . . . . . . .
39
3.3.1
Introduction . . . . . . . . . . . . . . . . .
39
3.3.2
Fixed eect models for unbalanced panels .
40
Augmented panel data models
47
4.1
Introduction . . . . . . . . . . . . . . . . . . . . .
47
4.2
Choice between Within and GLS . . . . . . . . . .
48
4.3
An important test for endogeneity
49
4.4
Instrumental Variable estimation: Hausman-Taylor
. . . . . . . . .
GLS estimator . . . . . . . . . . . . . . . . . . . .
51
4.4.1
Instrumental Variable estimation . . . . . .
51
4.4.2
IV in a panel-data context
51
4.4.3
Exogeneity assumptions and a rst instru-
. . . . . . . . .
ment matrix . . . . . . . . . . . . . . . . .
52
CONTENTS
4.4.4
More ecient procedures: Amemiya-MaCurdy

and Breusch-Mizon-Schmidt
4.5
4.5.1
. . . . . . . . . . . . . . . . . . . . . .
Full IV-GLS estimation procedure
Example: Wage equation

4.6.1
4.7
55
. . . . .
56
. . . . . . . . . . . . . .
56
. . . . . . . . . . . . .
56
Model specication
Application: returns to education
. . . . . . . . .
4.7.1
Variables related to job status
4.7.2
Variables related to characteristics of households heads
53
Computation of variance-covariance matrix for IV

estimators
4.6
. . . . . . . .
. . . . . . .
. . . . . . . . . . . . . . . . .
58
58
58
Dynamic panel data models
63
5.1
63
Motivation . . . . . . . . . . . . . . . . . . . . . .
5.1.1
5.2
5.3
Dynamic formulations from dynamic programming problems . . . . . . . . . . . . .
63
5.1.2
Euler equations and consumption . . . . . .
65
5.1.3
Long-run relationships in economics
. . . .
67
The dynamic xed-eect model . . . . . . . . . . .
69
5.2.1
Bias in the Fixed-Eects estimator . . . . .
70
5.2.2
Instrumental-variable estimation . . . . . .
73
The Random-eects model
. . . . . . . . . . . . .
75
5.3.1
Bias in the ML estimator . . . . . . . . . .
75
5.3.2
An equivalent representation
. . . . . . . .
76
5.3.3
The role of initial conditions
. . . . . . . .
77
5.3.4
Possible inconsistency of GLS . . . . . . . .
78
5.3.5
Example: The Balestra-Nerlove study
78
. . .
8
II
6
CONTENTS
Generalized Method of Moments estimation

The GMM estimator
6.1
6.2
6.3
85
Moment conditions and the method of moments
85
. . . . . . . . . . . . .
85
6.1.1
Moment conditions
6.1.2
Example: Linear regression model
6.1.3
Example: Gamma distribution
. . . . .
86
. . . . . . .
87
6.1.4
Method of moments estimation . . . . . . .
87
6.1.5
Example: Poisson counting model
. . . . .
88
6.1.6
Comments . . . . . . . . . . . . . . . . . .
89
The Generalized Method of Moments (GMM) . . .
91
6.2.1
Introduction . . . . . . . . . . . . . . . . .
91
6.2.2
Example: Just-identied IV model . . . . .
91
6.2.3
A denition
92
6.2.4
Example: The IV estimator again
. . . . . . . . . . . . . . . . .
. . . . .
92
Asymptotic properties of the GMM estimator . . .
93
6.3.1
Consistency
. . . . . . . . . . . . . . . . .
94
6.3.2
Asymptotic normality . . . . . . . . . . . .
95
6.4
Optimal and two-step GMM
. . . . . . . . . . . .
97
6.5
Inference with GMM
. . . . . . . . . . . . . . . .
99
6.6
Extension: optimal instruments for GMM . . . . .
102
6.6.1
Conditional moment restrictions
. . . . . .
102
6.6.2
A rst feasible estimator
. . . . . . . . . .
104
6.6.3
Nearest-neighbor estimation of optimal instruments
6.6.4
. . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . .
GMM estimators for time series models

7.1
GMM and Euler equation models

7.1.1
106
Generalizing the approach: other nonparametric estimators
83
109
115
. . . . . . . . .
115
Hansen and Singleton framework . . . . . .
115
CONTENTS
7.1.2
7.2
7.3
7.4
GMM estimation
. . . . . . . . . . . . . .
117
GMM Estimation of MA models . . . . . . . . . .
118
7.2.1
A simple estimator
. . . . . . . . . . . . .
118
7.2.2
A more ecient estimator . . . . . . . . . .
120
7.2.3
Example: The Durbin estimator
. . . . . .
121
. . . . . . . .
122
. . . . . . . . . . .
122
. . . . . . . . . . . . . . . .
123
Covariance matrix estimation . . . . . . . . . . . .
125
7.4.1
Example 1: Conditional homoskedasticity .
126
7.4.2
Example 2: Conditional heteroskedasticity .
126
7.4.3
Example 3: Covariance stationary process .
127
7.4.4
The Newey-West estimator . . . . . . . . .
128
7.4.5
Weighted autocovariance estimators
. . . .
130
7.4.6
Weighted periodogram estimators
. . . . .
133
GMM Estimation of ARMA models

7.3.1
The ARMA(1,1) model
7.3.2
IV estimation
GMM estimators for dynamic panel data
135
8.1
Introduction . . . . . . . . . . . . . . . . . . . . .
135
8.2
The Arellano-Bond estimator . . . . . . . . . . . .
136
8.2.1
Model assumptions
136
8.2.2
Implementation of the GMM estimator
. . . . . . . . . . . . .
. .
137
More ecient procedures (Ahn-Schmidt) . . . . . .
139
8.3.1
Additional assumptions . . . . . . . . . . .
139
8.4
The Blundell-Bond estimator . . . . . . . . . . . .
140
8.5
Dynamic models with Multiplicative eects
. . . .
141
8.5.1
Multiplicative individual eects . . . . . . .
141
8.5.2
Mixed structure
143
8.3
8.6
. . . . . . . . . . . . . . .
Example: Wage equation
. . . . . . . . . . . . . .
145
10
III
9
CONTENTS
Discrete choice models
149
Nonlinear panel data models

9.1
9.2
151
Brief review of binary discrete-choice models
. . .
151
. . . . . . . . . .
151
9.1.1
Linear Probability model
9.1.2
Logit model
. . . . . . . . . . . . . . . . .
152
9.1.3
Probit model . . . . . . . . . . . . . . . . .
152
Logit models for panel data . . . . . . . . . . . . .
153
9.2.1
Sucient statistics . . . . . . . . . . . . . .
153
9.2.2
Conditional probabilities
. . . . . . . . . .
155
9.2.3
Example:
. . . . . . . . . . . . . . .
156
. . . . . . . . . . . . . . . . . . . .
157
T =2
9.3
Probit models
9.4
Semiparametric estimation of discrete-choice models 158
9.5
9.4.1
The binary choice model
. . . . . . . . . .
159
9.4.2
The IV estimator
. . . . . . . . . . . . . .
162
SML estimation of selection models
. . . . . . . .
164
9.5.1
The GHK simulator . . . . . . . . . . . . .
164
9.5.2
Example
168
. . . . . . . . . . . . . . . . . . .
Appendix 1. Maximum-Likelihood estimation of the

Random-eect model
Appendix 2. The two-way random eects model
171
173
Appendix 3. The one-way unbalanced random eects

model
179
Appendix 4. ML estimation of dynamic panel models181

Appendix 5. GMM estimation of static panel models185
11
CONTENTS
Appendix 6. A framework for simulation-based inference
194
c Software
c
Appendix 8. A crash course in Gauss
c
Appendix 9. Example: The Gauss software
Appendix 7. Example: the SAS
203
211
219
c 224
Appendix 10. IV and GMM estimation with Gauss
Appendix 11. DPD estimation with Gauss
232
References
238
12
CONTENTS
Part I
Panel Data Models
13
Chapter 1
Introduction
Panel data: Sequential observations on a number of
units (individuals, rms).
cross-sections over time, longitudinal data

cross-section time-series data.
Also called
or
pooled
1.1 Gains in pooling cross section and time series

1.1.1
Discrimination between alternative models
Many economic models in the form:
F (Y; X; Z; ) = 0;
where
Y:
individual control variables (workers, rms);
policy or principal's) variables;
:
Z:
(public
(xed) individual attributes;
parameters.
Linear model:
Y = 0 + xX + z Z + u:
15
X:
16
CHAPTER 1. INTRODUCTION
Alternative views concerning this model:
Policy variables have a signicant impact whatever individual

characteristics, or
Dierences across individuals are due to idiosyncratic individual

features,
not included in Z .
In practice, observed dierences across individuals may be due

to both inter-individual dierences
and the impact of policy vari-
ables.
1.1.2
Examples
a) W AGE = 0 + 1EDUCAT ION + 2Z .
People with higher education level have higher wages because

rms value those people more;
People have higher education because they have higher ability

(expected productivity) anyway, and rms value worker ability
more.
b)
SALES = 0 + 1ADV ERT ISEMENT + 2Z .
Advertisement expenditures boost sales;

More ecient rms enjoy more sales, and thus have more money
for advertisement expenditures.
c)
OUT P UT = 0 + 1REGULAT ION + 2Z .
Regulatory control aects rm output;

Firms with higher output are more regulated on average.
d) W AGE = 0 + 11I(UNION ) + 2Z .
Belonging to a union signicantly raises wages;
1.1.
GAINS IN POOLING CROSS SECTION AND TIME SERIES
17
Firms react to higher wages imposed by unions by hiring higherquality workers, and
1.1.3
1I(UNION ) is a proxy for worker quality.
Less colinearity between explanatory variables
In consumer or production economics, input, output or consumer

prices are dicult to use, because:
Time-series:
Aggregated macro price indexes are highly cor-
related;
Cross-sections: Not enough price variation across individuals

or rms.
With panel data, variations across individuals and across time periods are accounted for.
Time-series: no information on the impact of individual characteristics (socioeconomic variables,...);
Cross-sections: no information on adjustment dynamics. Estimates may reect inter-individual dierences inherent in comparisons of
1.1.4
dierent people or rms.
May reduce bias due to missing or unobserved

variables
With panel data, easy to control for unobserved heterogeneity

across individuals. This is critical in practice, explains why panel
data models are now so popular in micro- and macro-econometrics.
Point related to endogeneity and omitted variables issues.
18
Example: Output supply function under perfect competition
max = pQ C (; Q) where C (; Q) = c(Q)

(Q)
, p = @c@Q
= AQ 1 (Cobb-Douglas)
= (0 + 1Q) (Quadratic).
1
Cobb-Douglas case: log Q = 1 (log p
log A ). From
equilibrium condition to estimable equation: Observations (Qit ; pit ),
unobserved heterogeneity i , rm i, period t.
1
(log pit log i A )
log Qit =
1
Identication issue: estimable equation is
Q~ it = a0 + a1p~it + uit; i = 1; 2; : : : ; N; t = 1; 2; : : : ; T;
~ it = log Qit, p~it = log pit, a1 = 1=( 1),
where Q
a0 = ( A E log i) =( 1), Euit = 0.
Model identied if E log i = 0, i.e., Ei = 1, otherwise A is biased if i is overlooked and E log i 6= 0.
Empirical issue: possible correlation between output price
and eciency term
i.
pit
1.2 Analysis of variance

Consider the model
yit = i + xiti + "it;

where
xit
is scalar,
and
i = 1; 2; : : : ; N; t = 1; 2; : : : ; Ti;
i
are parameters, and
time periods available for individual
i.
Ti:
number of
1.2.
19
ANALYSIS OF VARIANCE
Useful rst-order empirical moments are
Ti
1X
y ;
yi =
T t=1 it
Sxxi =
Ti
X
t=1
x )2;
(xit
and
Syyi =
Ti
X
t=1
(yit
Ti
1X
x ;
xi =
T t=1 it
Sxyi =
yi)2;
Ti
X
t=1
(xit
xi)(yit
yi);
i = 1; 2; : : : ; N:
Least-square parameter estimates are computed as
^ i = Sxyi=Sxxi
and
xi^
^ i = y i
and the Residual Sum of Squares (RSS) for individual
2 =S ;
Sxyi
xxi
RSSi = Syyi
with
(Ti
i is
2) degrees of freedom:
Consider now a restricted model with constant slopes and constant intercepts:
yit = + xit + "it;
which obtains by imposing the following restrictions
1 = 2 = = N (= )
1 = 2 = = N (= ):
Under these restrictions, least-squares parameter estimates would

be
^ =
PN PTi
)(yit
i=1 t=1(xit x
PN PTi
)2
i=1 t=1 (xit x
y)
20
and
^ = y x^ , where
y =
Ti
N X
X
1
P
i Ti i=1 t=1
yit; x =
1
P
Ti
N X
X
i Ti i=1 t=1
xit:
The Residual Sum of Squares is
RSS =
hP
Ti
N X
X
i=1 t=1
(yit
y)2
with as number of degrees of
N PTi
i=1 t=1(yit y)(xit
PN PTi
)2
i=1 t=1(xit x
PN
freedom:
i=1 Ti 2.
i2
x)
For a majority of applications, the rst model is too general and

estimation would require a great number of time observations. If
unobserved heterogeneity is additive in the model, we might consider the following specication with constant slope and dierent
intercepts:
Minimizing
P P
i t (yit
yit = i + xit + "it:

i xit )2 with respect to i and , we
have
XX
t
(yit
xit ) = 0;
XX
i
xit(yit
xit ) = 0;
so that
P P
x (y y )
^ i = yi xi and ^ = P i P t it it i :
i )
i t xit (xit x
P
Residual Sum of Squares has now
i Ti (N + 1) degrees of
N + 1 parameters are estimated).
free-
dom (
This is the most popular model encountered in empirical applications.
1.3.
21
SOME DEFINITIONS
1.3 Some denitions

Typical panel: when number of units (individuals) N
and number of time periods (
T ) is small.
is large,
Short (long) panel: when # periods T is small (large).

Balanced panel: same # periods for every unit (individual).
Rotating panel: A subset of individuals is replaced every period. Rotating panels can be balanced or unbalanced.
Pseudo panel:
when one is pooling cross-sections made of
dierent individuals for every period.
Attrition: with long panels, the probability that an individual

remains in the sample decreases as the number of periods increases
(non response, moving, death, etc.)
22
Chapter 2
The linear model
2.1 Notation
yit = xit + uit; i = 1; 2; : : : ; N; t = 1; 2; : : : ; T;
where
xit is a K
vector,
is a (K 1) vector of parameters, and
uit is the residual term.

yit and components of xit are both time-varying and varying across
individuals.
Component of dependent variable that is unexplained by
xit:
uit = i + t + "it;
i is the time-invariant individual
eect, and "it is the i.i.d. component.
where
t is the time
uit = i + "it.
error-component model: uit = i + t + "it .
One-way error-component model:

Two-way
eect,
23
24
CHAPTER 2. THE LINEAR MODEL
Allows several predictions of
yit given Xit:
E (yitjxit) = xit across i and t,

E (yitjxit; i) = xit + i for ind. i, across periods,
E (yitjxit; t) = xit + t for period t, across individuals,
E (yitjxit; i; t) = xit + i + t for ind. i and period t.
2.1.1
Model notation
2.1.1.1 Model in matrix form
Y = X + + + ";
Y; ; and " are (NT 1), X is (NT K ).

Convention: index t runs faster, index i runs slower:
where
0
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
@
y11
..
.
y1T
y21
..
.
y2T
..
.
yit
..
.
yN 1
..
.
yNT
1
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
A
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
4
(1)
X11
..
.
X1(1)T
(1)
X21
..
.
X2(1)T
..
.
Xit(1)
..
.
XN(1)1
..
.
(1)
XNT

(K )
X11
..
.
X1(KT )
(K )
X21
..
.
X2(KT )
..
.
Xit(K )
..
.
XN(K1)
..
.
(K )
XNT
3
7
7
7
7
70
7
7
7B
7B
7B
7B
7B
7B
7B
7@
7
7
7
7
7
7
7
5
1
2
C
C
..
C
.
C
+++"
k C
C
C
..
A
.
2.1.
25
NOTATION
2.1.1.2 Model in vector form
yi = Xi + i + + "i; i = 1; 2; : : : ; N;
0
where yi is T 1, Xi is T K . Note: = (1 ; 2 ; : : : ; T ) and
i = (i; i; : : : ; i)0 are (T 1).
2.1.2
Standard matrices and operators
INT : identity matrix w/ NT rows and NT columns;

eT : T -vector of ones;
B = IN
(1=T )eT e0T :
B = (1=N )eN e0N
IT :
Q = INT
(Between-individual operator);
(Between-period operator);
IN
(1=T )eT e0T = INT
(Within-individual operator);
Q = INT
(1=N )eN e0N

IT = INT
B
(Within-period operator;)
B B = (1=NT )eNT e0NT

(Computes full population mean).
Important assumption: No intercept term in the

model (otherwise, use B B to demean all variables).
The
B operators are used to compute, from NT
vectors and ma-
trices, individual- or time-specic means of variables which are
26
stored in matrices of row dimension

The
NT .
operators are used to compute deviations from these
means.
2.1.3
Important properties of operators
Symmetry, idempotency and orthogonality
Q0 = Q; B 0 = B; Q2 = Q; B 2 = B; BQ = QB = 0;
Rank of idempotent matrix = its trace
rank(Q) = N (T 1) and rank(B ) = N:

Decomposition of the Q operator with N = T = 2:
02
3
1
1 0 0 0

B6 0 1 0 0 7
C
1
1
0
1
1
6
7
Cy
Qy = B
@4 0 0 1 0 5
0 1
2 1 1 A
0 0 0 1
0
1
2
30
1
1 1 0 0
y11
y11
B y12 C 1 6 1 1 0 0 7 B y12 C
6
7B
C
C
=B
@ y21 A 2 4 0 0 1 1 5 @ y21 A
0 0 1 1
y22
y22
0
1
0
1
y11
y11 + y12
B y12 C 1 B y11 + y12 C
C
B
C
=B
@ y21 A 2 @ y21 + y22 A
y22
y21 + y22
We will also use
BT = (1=T )eT e0T : Between operator for a single individual;

QT = IT (1=T )eT e0T = IT BT : Within operator for a single
individual.
2.2.
27
THE ONE-WAY FIXED EFFECTS MODEL
2.2 The One-Way Fixed Eects model

Terminology: the xed-eects model does not mean that individual eects
estimation is
are not random in the true model !
conditional
Rather,
on unobserved heterogeneity: the
i 's
are treated as parameters to be estimated.
2.2.1
The estimator in terms of the Frisch-Waugh-Lovell

theorem
Inference is conditional on individual eects: estimates obtain by
Y on X and on individual dummies.

Let E the NT N matrix of individual dummy variables:
2
3
1
0
0 0
61
7
0
0 0
6
7
61
7
0
0 0
6
7
60
7
1
0

0
6
7
60
7
1
0 0
6
7
60
7
1
0 0
6
7
E = 6 ..
7
..
regressing
6
6
6
6
6
6
6
6
4
"
"
0
0
0
0
0
0
(i = 1) (i = 2)

0
0
0

1
1
1
"
(i = N )
and consider the model
Y = X + E + " = W + u
0 00
where W = [X; E ], = ( ; ) , u = + ".
7
7
7
7
7
7
7
7
5
28
Frish-Waugh-Lovell theorem:
Parameter estimates
are numeri-
cally identical in the 2 following procedures:
^ from ^0OLS = (^ 00; ^0)0 = (W 0W )

^ = (X X ) 1X Y ; where
X = [I E (E 0E )
Y = [I E (E 0E )
1W 0Y
1E 0 ]X = PE X;
1E 0]Y = PE Y
(residuals from least-square regression of
and
on
0
0
But E = IN
eT , E E = IN
eT eT = IN T
, PE = I E (E 0E ) 1E 0 = I T1 E (IN )E 0
= I 1 (IN
eT )(IN
eT )0 = I IN
1 eT e0 = Q.
T
E ).
^ = (X 0 X ) 1(X 0 Y ) = (X 0PE0 PE X ) 1(X 0PE0 PE Y )

= (X 0QX ) 1(X 0QY ).
Hence
Idea behind the xed-eect estimation procedure:

Eliminate individual eects
Eliminate individualspecic deviations

from variables
Transformation of the linear model as follows:
yit
1=T
X
t
yit = (xit
BY = (X
1=T
X
t
xit) + uit
BX ) + u Bu
1=T
X
t
uit
QY = QX + Qu:
Least square parameter estimate:
^ = [(QX )0(QX )] 1 (QX )0QY = [X 0 Q0QX ] 1 (X 0Q0QY )

= (X 0QX ) 1X 0 QY and V ar(^ ) = "2(X 0QX ) 1.
2.2.
29
2.2.2
Interpretation as a covariance estimator
The model is, in vector form:
y1
6y 7
6 2 7
6 .. 7
4. 5
yN
x1
6x 7
2 7
=6
6 .. 7
4. 5
xN
eT
0T
60 7
6e 7
T 7
6 T 7
+6
6 .. 7 1 + 6 .. 7 2
4. 5
4. 5
0T
60
6 T
+ + 6 ..
4
eT
0T
7
7
7 N
5
0T
+ 6 ..
7
7
7;
5
"1
6"
6 2
4
"N
with assumptions:
E ("i) = 0; E ("i"0i) = "2IT ; E ("i"0j ) = 0 i 6= j:

OLS estimates of and i obtain by
N
X
X
min "0i"i = (yi
i=1
i=1
, ^ i = yi
i
xi )0(yi
i
xi )
i = 1; 2; : : : ; N;
and substituting in partial derivative wrt. , we have
^ =
" N;T
X
i;t
(xit
xi)(xit
xi;
xi)0
# 1 " N;T
X
i;t
(xit
xi)(yit
yi)
covariance estimator, or the LSDV (Least-Square

Dummy-Variable) estimator. ^ is unbiased, is consistent when N
This is called the
or
tends to innity. Its covariance matrix is

V ar ^ = ^ 2"
" N
X
i=1
xiQT x0i
# 1
30
where
QT = IT
(1=T )eT e0T .
^ i is unbiased but consistent only when T

2.2.3
! 1.
Comments
Model transformation by ltering out individual components

) Coecients associated with time-invariant regressors are not
identied.
Fixed-eect procedure uses variation within periods for each

unit, hence the name.
Another possibility is the Between procedure, using variation between individuals.
BY = BX + B + B";
^ = [(BX )0(BX )] 1 (BX )0BY = [X 0 BX ] 1 X 0BY:

This alternative estimator uses variation between individual means
for model variables.
If
X1 is time-varying only, BX1 = f T1
PT
t x1it i;t
= x1 8i, and
the intercept term is not identied.
A word of caution in computing variance estimates. In the
QY = QX + Qu, statistical software would divide RSS

by NT
K (individual eects not included). But in the model
Y = X + E + + ", the RSS would be divided by N (T 1) K .
model
Parameter variance estimates in the Within regression model must

be multiplied by
(NT
K )=[N (T
1) K ].
2.2.
31
Y
..........
..........
.
.
.
.
.
.
.
.
.
.
..........
..........
.
........
........
.
.
.
.
.
.
.
......
Within
Between
1...........
X
2.2.4
Testing for poolability and individual eects
Poolability
As before:
yit = i + xiti + "it

versus
yit = i + xit + "it;
xit is a K vector.
H0 : 1 = 2 = = N (= ) (K (N
but now
1) constraints).
Fisher test statistic is
(RRSS URSS )=K (N 1)

v F (K (N
URSS=N (T K 1)
where RRSS: from Within regression

and URSS:
PN
i=1 RSSi where RSSi
Testing for individual eects

H0 : 1 = = N (= ).
1); N (T
1)) ;
2 =Sxxi (see 1.2).

= Syyi Sxyi
32
yit = + xit + "it
(OLS)
versus
yit = i + xit + "it
(Within)
Fisher test statistic is
(RRSS URSS )=(N 1)

v F ((N
URSS=(NT N K )
1); NT
K )) ;
where RRSS: from OLS regression on pooled data

and URSS: from Within (LSDV) regression.
2.3 The Random Eects model

2.3.1
Notation and assumptions
Problem with Fixed-eect model: degrees of freedom are lost when
! 1.
Dierent approach: assume individual eects are ran-
dom, i.e., model inference is drawn marginally (unconditionally

upon the
i 's) wrt.
the population of all eects.
Assumptions:
i v IID(0; 2 ); "it v IID(0; "2); E (i"it) = E (ixit) = 0;

with
E (ij ) =

2
0
if
i = j;
otherwise
"2 if i = j and t = s;
E ("it"sj ) = 0 otherwise:
2
2
2
Hence cov (uit ; ujs ) = + " if i = j and t = s, and if i = j
and t 6= s.
2.3.
33
THE RANDOM EFFECTS MODEL
Let
2 + "2 2
6 2
2 + "2
6
0
T = E (uiui) = 6 ..
4.

2
2

2
2
..
.
2 + "2
T T ) matrix, for every individual i, i = 1; 2; : : : ; N .
a (
E (uu0) =
= IN

T = IN
2 (eT e0T ) + "2IT
3
7
7
7;
5
We have
= IN
2 (T BT ) + "2(QT + BT )
since QT = IT
BT and BT = (1=T )eT e0T . Therefore
= IN
2 (T BT ) + "2(QT + BT ) = T 2 B + "2INT
= "2Q + (T 2 + "2)B .
or equivalently:
2.3.2
GLS estimation of the Random-eect model
General model form:
Y = X + U;
with
E (UU 0) =
.
Generalized Least Squares (GLS) produce ecient parameter estimates of
, 2
and
"2,
based on known structure of variance-
.
^ GLS = X 0
1X 1 X 0
1Y
^ GLS ) = "2 X 0
1X 1.
and V ar (
covariance matrix
Computation of
1:
use of the formula
r = ("2)r Q + (T 2 + "2)r B
for an arbitrary scalar
r.
Based on properties of
potency and orthogonality).
Q and B (idem-
34
Hence useful matrices are
1
1
B
1 = 2Q + 2
"
T + "2
and
1
1
1=2 = Q +
B:
2
"
(T + "2)1=2
^ GLS = X 0
1X 1 X 0
1Y
We have
"
= X0
"2
1
# 1"
X0
"2
1
Y :
i 1h
i
1
1
0
0
= X (Q + B ) X
X (Q + B ) Y ;
2
2 2
2 2
where = (T + )= = 1 + T = .

"
"
"
GLS as Weighted Least Squares. Premultiply the model by
"
1=2 and use OLS: Y = X + u, where

"

1
=
2
Y = "
Y = Q +
B Y
(" + T )1=2

"

1
=
2
X = "
X = Q +
B X;
(" + T )1=2
so that
Y = (Q + 1=2B )Y; X = (Q + 1=2B )X;
scalar form:
fyit g = (yit
yi) + 1=2yi = yit
(1
fxitg = (xit
xi) + 1=2xi = xit
(1
and in
p1
)yi

p1 )xi:

See Appendix 1 for Maximum Likelihood Estimation of the randomeects model.
2.3.
35
2.3.3
Comparison between GLS, OLS and Within
1
1
^ GLS = X 0 QX + 1 X 0 BX
X 0QY + X 0 BY

^ W ithin = (X 0QX ) 1X 0 QY; ^ Between = (X 0BX ) 1X 0BY;
so that
^ GLS = S1^ W ithin + S2^ Between;

1 0
0
1 0
where S1 = [X QX + X BX ] X QX and

0
S2 = [X 0QX + 1 X 0 BX ] 1 X BX
.
(i) If 2 = 0, then 1= = 1 and ^ GLS = ^ OLS .

(ii) If T ! 1, then 1= ! 0 and ^ GLS ! ^ W ithin.
(iii) If 1= ! 1, then ^ GLS ! ^ Between.
(iv) V ar(^ W ithin) V ar(^ GLS ) is a s.d.p. matrix.
(v) If 1= ! 0 then V ar(^ W ithin) ! V ar(^ GLS ).
2.3.4
Fixed individual eects or error components?
Crucial issue in panel data econometrics: how should we treat effects
i's ?
As parameters or as random variables ?
) If inference is restricted to the specic units (individuals)

in the sample: conditional inference, use Fixed eects. Example:
Individuals are not selected as random, or all rms in a given industry are selected.
) If inference on the whole population:
marginal (uncondi-
tional) inference, use Random eects. Example: Individuals are

selected randomly from a huge population (consumers).
36
2.3.4.1 Some practical choice criteria
Interpretation of eects in the (economic) model;

Sampling process: purely random or not;
Number of units (countries, regions, households,...);
Interchangeability of units;
Endogeneity of Xit (see later).
2.3.4.2 Terminology
When xed individual eects are considered, Fixed-Eects or
Within estimation procedure. When random individual eects,

GLS (Generalized Least Squares) estimation procedure.
2.3.5
Example: Wage equation, Hausman (1978)
629 high-school graduates, Michigan income dynamics study. 3774

observations (
N = 629, T = 6).
Dependent variable: log wage
The GLS estimator is a weighted-average of the Within and Between estimators, where the weight is the inverse of the corresponding variance.
The Within estimator neglects the variation between individuals,

the Between estimator neglects the
variation within individuals, and the OLS gives equal weight to
both Within and Between variations.
Note. If the model contains an intercept:
yit = + xit + i + "it;
2.3.
37
Table 2.1:
Within and GLS estimation results
Variable
Within
GLS
Constant
0.8499
Age in [20,35]
0.0557
0.0393
Age in [35,45]
0.0351
0.0092
Age in [45,55]
0.0209
-0.0007
Age in [55,65]
0.0209
-0.0097
Age 65 over
-0.0171
-0.0423
Unemployed prev. year
-0.0042
-0.0277
Poor health prev. year
-0.0204
-0.0250
Self-employed
-0.2190
-0.2670
South
-0.1569
-0.0324
Rural
-0.0101
-0.1215
we use
2.3.6
B B instead of B (to eliminate ) in the formulae.
Best Quadratic Unbiased Estimators (BQU) of

variances
If errors are normal, BQU estimates of
^ 2 = u0Qu=tr(Q) =
"
and
because
2 and "2 are found from
PN PT
i=1 t=1(uit
N (T
1)
ui)2
X
"2 + T 2 = u0Bu=tr(B ) = T
u2i =N;
i=1
tr(Q) = N (T
But in practice, the

variances from the
1) and tr(B ) = N .
uit's
are unknown and we must estimates
uît's instead.
38
1/ Wallace and Hussain (1969):

true
u's;
2/ Amemiya (1971):

p
2
pNT (^2"

Use OLS residuals in place of
Use LSDV residuals estimates. We have
N (^
"2)
2 )
where
^ 2 = "2 + T 2
vN
0;
2"4 0
0 24

^ 2" =T .
3/ Swamy and Arora (1972):
Use mean square errors of the
Within and the Between regressions.

Mean square error from Within regression:
^ 2" = Y 0QY

Y 0QX (X 0QX ) 1X 0 QY =[N (T
1) K ]
and from the Between regression:

Y 0BX (X 0BX ) 1X 0BY =[N
"2 + T 2 = Y 0BY
Note: Intercept term in the Between regressors (
X ),
1]:
not in the
Within regression.
4/ Nerlove (1971):
Compute
^ 2 = N1 1
PN
i
i=1(^
î)2, where ^ i
are parameter estimates associated to individual dummies from

LSDV regression. And
"2 is estimated from Within regression.
Estimation methods above with covariance components replaced

by consistent estimates:
Feasible GLS.
Chapter 3
Extensions
3.1 The Two-way panel data model
Error component structure of the form:
uit = i + t + "it
i = 1; 2; : : : ; N; t = 1; 2; : : : ; T;
or in matrix form
U = (IN
eT ) + (eN
IT ) + ";
where
= (1; : : : ; N )0 and = (1; : : : ; T )0.
3.1.1
The Two-way xed-eect model
t are treated as xed parameters, conditional

on the N individuals over the period 1 ! T .
and
inference
3.1.1.1 Notation
Fixed-eect estimates of
Q = IN
IT
obtain by using the new operator:
IN
(eT e0T =T ) (eN e0N =N )
IT ;
39
40
CHAPTER 3.
so that
Qu = fuit
ui
EXTENSIONS
utgit :
Averaging over individuals, we have
yt = xt + t + "t
N
X
with restriction
i=1
i = 0:
and averaging over time periods:
yi = xi + i + "i
T
X
with restriction
t=1
t = 0;
OLS on model in deviations yields
^ = (X 0QX ) 1X 0 QY;
^
^ i = yi xi;
^
^t = yt xt:
If the model contains an intercept, operator
Q = IN
IT
so that
Qu = fuit
Q becomes
IN
(eT e0T =T ) (eN e0N =N )
IT
+(eN e0N =N )
(eT e0T =T )
ui ut + ugit, and Within estimates are
^ = (X 0QX )
^ i = (yi y)
^ t = (yt y)
1X 0 QY;
(xi
(xt
^
x);
^
x):
3.1.1.2 Testing for eects
1/ H0 : 1 = = N = 1 = = T = 0.
3.1.
41
THE TWO-WAY PANEL DATA MODEL
Fisher test statistic:
(RRSS URSS )=(N + T 2)

v F (k1; k2);
URSS=[(N 1)(T 1) K ]
where
k1 = N + T
2; k2 = (N
1)(T
1) K );
and
URSS (Unrestricted RSS): from Within model,

RRSS: (Restricted RSS): from pooled OLS.
2/ H0 : 1 = = N = 0 given t 6= 0; t T
1.
(RRSS URSS )=(N 1)

v F (k1; k2);
URSS=[(N 1)(T 1) K ]
where
k1 = N
1; k2 = (N
1)(T
1) K );
and
URSS: from Within model,

RRSS: from regression w/ time dummies only:
(yit
yt) = (xit
xt) + (uit
ut):
3/ H0 : 1 = = T 1 = 0 given i 6= 0; i N
1.
(RRSS URSS )=(T 1)

v F (k1; k2);
URSS=[(N 1)(T 1) K ]
where
k1 = T
1; k2 = (N
1)(T
1)
K );
and
42
CHAPTER 3.
EXTENSIONS
URSS: from Within model,

RRSS: from Within regression as in one-way model:
(yit
yi) = (xit
xi) + (uit
ui):
See Appendix 2 for the two-way random eects model.
3.1.2
Example: Production function (Hoch 1962)
Sample: 63 Minnesota farms over the period 1946-1951.

Estimation of a Cobb-Douglas production function:
log Outputit = 0 + 1 log Laborit + 2 log Real estateit

+3 log Machineryit + 4 log F ertilizerit:
uit):
Climatic conditions, identical across farms (t);
Motivation for adding specic eects (into
Farm-specic factors (soil, managerial quality) (i).

Table 3.1:
Least square estimates of Cobb-Douglas production func-
tion
Assumption
Estimate
1 (Labor)
2 (Real estate)
3 (Machinery)
4 (Fertilizer)
Sum of 's
R 2
(I)
(II)
(III)
0.256
0.166
0.043
0.135
0.230
0.199
0.163
0.261
0.194
0.349
0.311
0.289
0.904
0.967
0.726
0.721
0.813
0.884
i = t = 0 i = 0 t = 0
3.2.
43
MORE ON NON-SPHERICAL DISTURBANCES
3.2 More on non-spherical disturbances

Panel data: in the random-eect context, heteroskedasticity due
to panel data structure.
But variances
2
"2
and
are assumed
constant.
Heteroskedasticity and serial correlation:
V ar(i) = i2
V ar("i) = i2
E ("it"is) 6= 0
Individual-specic heteroskedasticity
t 6= s
Typical heteroskedasticity
Serial correlation
We present here the rst two cases only.
3.2.1
Heteroskedasticity in individual eect
Mazodier and Trognon (1978):
V ar(i) = i2 "it v IID(0; "2);
or
i = 1; 2; : : : ; N;
E (0) = diag[i2] = and " v IID(0; "2).
= E (UU 0) = diag[i2]
(eT e0T ) + diag["2]
IT ;
where
diag["2] is N N .
We have
e e0
eT e0T
= diag[T i2 + "2]

T T + diag["2]
IT
T
T

eT e0T
eT e0T
r
2
2
r
2
r
= diag[(T i + " ) ]
+ diag[(" ) ]
IT
:
T
T
Transformation of the heteroskedastic model:
multiply both sides by
"
1=2
"
= diag
2
(T i + "2)1=2
eT e0T
+ IN
IT
T
eT e0T
:
T
44
CHAPTER 3.
Transformed variables in scalar form:
yit = yit
"
"
p
T i2 + "2
EXTENSIONS
!#
yi:
Same form as in the homoskedastic case, only here
is individual-
specic:
i = (T i2 + "2)="2
and
yit = yit
Feasible GLS:
p1 yi:
i
Step 1. Estimate "2 consistently from usual Within regression;

2
2
2
2
Step 2. Noting
that V ar (uit ) = wi = i + " , estimate wi by
PT
1=(T
ûi)2, where uit is OLS residual;

Compute
^ 2i = wî2 ^ 2" ;
Form T
^ 2i + ^ 2" , î and compute yît ; xît;
Regress y
ît on xît to get ^ .
1)
Step 3.
Step 4.
Step 5.
uit
t=1 (^
Important: consistency of variance components estimates
1; 2; : : : ; N
3.2.2
requires
T >> N .
wî2; i =
`Typical heteroskedasticity
Assumptions:
i v IID(0; i2) and V ar("it) = i2.
= E (UU 0) = diag[2 ]
(eT e0T ) + diag[i2]
IT
= diag[T 2 + i2]
(eT e0T =T ) + diag[i2]
(IT
Transformed model uses
1
]
(eT e0T =T )
2
2
T + i
1=2 = diag[ p
eT e0T =T ) :
3.3.
45
UNBALANCED PANEL DATA MODELS
+diag[1=i]
(IT
eT e0T =T ) ;
Y =
1=2 has typical element
y y
y
yit = it i + p 2i 2
i
T + i
y iyi
i
p
= it
where i = 1
i
T 2 + i2
E (u2it) = wi2 = 2 + i2 8i, hence
OLS residuals u
ît can be used to
P
T
2 ^ 2 = 1=(T 1)
estimate wi : w
uit ûi)2.
i
t (^
Within residuals u
~ are then used to compute
PTit
2
^ i = 1=(T 1) t (~uit u~i)2.
so that
A consistent estimate of
2 is ^ 2 = (1=N )
PN 2
î
i (w
^ 2i ).
3.3 Unbalanced panel data models

3.3.1
Introduction
Denition: number of time periods is dierent from one unit (indi-
i
Ti periods, and total
PN
number of observations is now
i=1 Ti (instead of NT previously).
vidual) to another. For individual , we have
Examples
Firms: may close down or new intrants in an industry;

Consumers: may move, die or refuse to answer anymore;
Workers: may become unemployed,...
Problem of attrition: probability of a unit staying in the sample
decreases as the # of periods increases.
46
CHAPTER 3.
3.3.2
EXTENSIONS
Fixed eect models for unbalanced panels
3.3.2.1 The one-way unbalanced xed-eect model

Consider the unbalanced model with
y11
B y12
B
B y13
B
@ y21
y22
To eliminate
T1 = 3 and T2 = 2:
x11
1
C B x12 C
B 1
C B
C
B
C = B x13 C + B 1
C B
C
B
A @ x21 A
@ 2
x22
2
"11
C B "12
C B
C + B "13
C B
A @ "21
"22
C
C
C:
C
A
, we need a new Within operator

Q =
2
6
6
6
6
4
2=3
1=3
1=3
0
0
I3
e3e03=3
0
I2
1=3
2=3
1=3
0
0
1=3
1=3
2=3
0
0
0
e2e02=2
0
0
0
1=2
1=2

3
0
07
7
07
7;
1=2 5
1=2
and the same procedure as in the balanced case is applied:
^ W ithin = (X 0 QX ) 1 X 0 QY

where
Q = diag(ITi
eTi e0Ti =Ti)ji=1;2;:::;N .
3.3.2.2 The two-way unbalanced xed-eect model

The model is
yit = xit + i + t + "it i = 1; 2; : : : ; Nt; t = 1; 2; : : : ; T;
3.3.
47
where
Nt:
# of units observed in period
Total number of observations is
n.
t, and n =
PT
t=1 Nt .
A bit more complex to extend the Within approach here.
Important: We now assume that observations are ordered differently:
i runs fast and t runs slowly.
Consider a
N
matrix at time
from which we delete rows
corresponding to missing individuals at
t.
N = 3, N1 = 3, N2 = 2, N3 = 2, and observations are

(y11; y21; y31) (y12; y32) (y13; y23).
Example:
1 0 0
40 1 05
0 0 1
8
>
>
>
>
>
>
>
>
>
>
>
>
>
<
1 0 0
D1 = 4 0 1 0 5
0 0 1
1 0 0
D
=
>
2
>
>
0 0 1
>
>
>
>
>
>
>
>
>
>
:
1 0 0
D3 =
0 1 0
We have 3 (Nt N ) matrices Dt , t = 1; 2; 3 constructed from I3
above.
as (1; 2), where 1 = (D10 ; : : : ; DT0 )0,

a (n N ) matrix, and 2 = diag (Dt eN ), a (n T ) matrix:
2
3
D1 D1eN
0
6 D
0
0 7
6 2
7
= 6 ..
7:
..
..
4 .
5
0
.
.
DT 0 DT eN
Now dene a new matrix
48
CHAPTER 3.
EXTENSIONS
Dt eN 's provide the number of units present for each period t

(the Nt 's).
The
Matrix
is
n (N + T ),
and corresponds to the matrix of all
dummies (units and periods) present in the sample. Part
1
in
is the equivalent ot matrix E (containing individual dummies)

before.
011 = diag(Ti) (number of periods in the sample for

0
unit i), and 2 2 = diag (Nt ) (number of individuals for period
t).
0
Also, 2 1 is a T N matrix of dummy variables for the presence
in the sample of unit i at time t.
Note that
Fixed-eect estimator could be implemented by considering the

model
yit = xit + Dit + "it i = 1; 2; : : : ; Nt; t = 1; 2; : : : ; T;
where
and
Dit:
t's.
particular row of matrix
, and contains all the i's
1 = (eT
2 = (IT
eN ), and would be NT (N + T ).
In the balanced panel case, we would have
IN ) and
3.3.
49
n = 3 + 2 + 2 = 7 and N = 3:
In example above,
=
vector
1
60
6
60
6
61
6
40
0
0
0
1
1
0
0
1
0
0
0
1
0
1
0
0
1
0
1
0
0
1
0
0
0
0
1
0
0
1
0
1
0
0
1
1
1
0
0
0
0
0
0
0
1
1
0
0
0
07
7
07
7
07
7;
07
7
7
15
1
(1; 2; 3; 1; 2; 3), and 0Y =
would be
0
1
0
1
0
0
6
6
6
6
6
6
6
6
6
4
0
0
1
0
1
0
1
0
0
0
0
1
0
y
0 B 11 C
y11 + y12 + y13
y
B
21
C
17
y21 + y23
C B
7B
y
B
C B
31
7
B
0 7B
C B y31 + y32
y
B
C=B
12
07
y11 + y21 + y31
C
7B
B y32 C B
5
@
C
0 B
y12 + y32
@ y13 A
1
y13 + y23
y23
3
1
C
C
C
C
C
C
A
would compute the sums of variables over periods and inviduals.
Easier method if
and
are large: use deviations from indi-
vidual and time means, as in the balanced two-way Within case.
Let
N = 011
T = 022
NT = 021
= 2 1N10NT

P = T NT N10NT = 02
(N N );
(T T );
(T N );
(n T );
(T T ):
50
CHAPTER 3.
EXTENSIONS
Wansbeek and Kapteyn (1989): The required Within operator for

such unbalanced two-way panel is
Q = In
where
1N101
: generalized inverse of
P
0;
P.
QY , say, is also written as

P
0Y = Y 1N11
QY = Y 1N101Y
0
0Y .
where 1 = 1 Y and = P
Transformed variable
PTi
t=1 yti .
1 compute the individual sums

Typical transformed element:
(QY )ti = yti

where
;

1i
a0i
+
Ti
Ti
t;
ai : i-th column of NT .
Example
Y = (y11; y21; y31; y12; y32; y13; y23) = (1; 2; 3; 2; 6; 3; 4), n = 7,

N = 3, T = 3.
Let
We have
3 0 0
1 1 1
N = T = 4 0 2 0 5 ; NT = 4 1 0 1 5 ;
0 0 2
1 1 0
2
P =4
1:6666
0:8333
0:8333
0:8333
1:1666
0:3333
0:8333
0:3333 5
1:1666
3.3.
51
QY =
B
B
B
B
B
B
B
B
B
@
0:4582
0:1875 C
C
0 1
0
1
0:5000 C
6
0:3383
C
0:5418 C
C ; 1 = @ 6 A = @ 1:6618 A
0:5000 C
C
9
2:0368
C
0:0832 A
0:1875
For example,
Qy11 = 1
6 1
+ ( ) (1 1 1 ) @
3 3
0
Qy31 = 3
9 1
+ ( ) (1 1 0 ) @
2 2
0:3383
1:6618 A + 0:3383 = 0:4582:
2:0368
1
0:3383
1:6618 A + 0:3383 = 0:5:
2:0368
See Appendix 3 for the unbalanced random-eects model.
52
CHAPTER 3.
EXTENSIONS
Chapter 4
Augmented panel data models
What are augmented panel models ? Implication for estimation ?
Special estimation techniques when GLS are not feasible.
4.1 Introduction
Consider the model
yit = xit + zi + i + "it; i = 1; 2; : : : ; N; t = 1; 2; : : : ; T;

xit a 1 K vector of time- and individual-varying regressors,
and zi a 1 G vector of individual-specic (time-invariant) rewith
gressors.
Example:
log W AGE = 1HOURS + 1EDUC + 2SEX + i + "it:

Estimation method:
Within: is not identiable because

QY = QX + (I
B )Z + Q + Q" = QX + Q";
53
54
since
CHAPTER 4.
BZ = Z .
Only
AUGMENTED PANEL DATA MODELS
identiable.
But two-step procedure is
feasible:
1/ Run Within regression
) ^ ;
2/ Run Between regression on
xi^ = i + Zi + "i;
to estimate the 's.
yi
i = 1; 2; : : : ; N;
GLS: Both and are identiable.

4.2 Choice between Within and GLS
One of the choice criterion between Within and GLS: presence of
zi's in the model.
Recall: GLS is a consistent and ecient estimator provided regressors are exogenous:
E (izi) = 0
8i; t:
Consider the non-augmented model yit = xit + i + "it .
If xit is endogenous in the sense E (i xit ) 6= 0, then GLS are not
E (ixit) = 0
consistent:
and

^ GLS = + X 0
1X 1 X 0
1U

= + X 0 Q + 1B X 1 X 0 Q + 1B U ;
2 2
where = 1 + T =" , so that
0
X

Q + 1B U = [X 0Q" + X 0(B + B")=]
4.3.
55
AN IMPORTANT TEST FOR ENDOGENEITY
= 0 + X 0B= + 0 = X 0= 6= 0;
because
E (X 0") = 0 and B = .
Same problem with the augmented model, if
E (Z 0) 6= 0.
Important consequence in practice:
E (X 0) 6= 0 and/or
If (some of the) re-
gressors are endogenous, GLS estimates are not consistent, but

Within estimates are consistent because
is ltered out.
Another criterion of choice between Within and GLS:
If endogenous regressors ) Choose Within estimation (but

not identiable);
If all regressors are exogenous, use GLS (the most ecient).

Three problems remain:
still not identied, because in the Between regression

xi^ = zi + i + "i,
zi still correlated with i.
yi
If one uses Within, all regressors are treated as endogenous (no

distinction between exogenous and endogenous
Within estimates not ecient.
xit's).
4.3 An important test for endogeneity

Null hypothesis:
H0 : E (X 0) = E (Z 0) = 0 (exogeneity).
Comparison between two estimators:
56
CHAPTER 4.
H0
Alternative
^ GLS
^ W ithin
Consistent,
Consistent,
ecient
not ecient
Not consistent
Consistent
Hausman (1978): Even if the

mates of
xit's
are exogenous, GLS esti-
are not consistent in the augmented model.
Therefore,
one can test for exogeneity using parameter estimates for
Hausman test statistic: Under
HT = ^ W ithin

Notes
^ GLS
^ GLS

0 h
^ W ithin
H0,
only.
i 1
V ar(^ W ithin) V ar(^ GLS )

^ GLS
v 2(K ):
^ W ithin hmust have the same dimension.

i
^
^
Weighting matrix V ar( W ithin) V ar( GLS ) is positive: GLS
and
more ecient than Within under the null.
Recall that
V ar(^ GLS ) = "2(X 0QX + X 0 BX ) 1 and V ar(^ w ) =
"2(X 0QX ) 1.
Interpretation of # of degrees of freedom of the test:

0
Within estimator is based on the condition E (X QU ) = 0, whereas
0 1
0
0
GLS is based on E (X
U ) = 0 ) E (X QU ) = 0 and E (X BU ) =
0.
For GLS, we add
of
X.
later).
additional conditions (in terms of
B ):
rank
Hausman test uses these additional restrictions (see GMM
4.4. INSTRUMENTAL VARIABLE ESTIMATION: HAUSMAN-TAYLOR GLS ESTIMATOR57
4.4 Instrumental Variable estimation: HausmanTaylor GLS estimator

4.4.1
Instrumental Variable estimation
Alternative method:
Instrumental-variable estimation.
cross-section context with
In the
observations:
Y = X + "; E (X 0") 6= 0; E (W 0") = 0;

W is a N L matrix of instruments.
If K = L,
where
[W 0(Y
If L > K ,
[W 0(Y
X )] = 0
^ = (W 0X ) 1W 0Y
X )] = 0
(W 0Y ) = (W 0X )
(IV estimator)
L conditions on K
parameters)
(Y X )0W (W 0W ) 1W 0
X ) where PW = W (W 0W ) 1W 0
and construct quadratic form
(Y
) ^ = (X 0PW0 X )
Note:
in general, instruments
1 (X 0 P Y ):
W
originate from or outside the
equation.
4.4.2
IV in a panel-data context
Account for variance-covariance structure (

);
Find relevant instruments, not correlated with .
58
CHAPTER 4.
Consider the general, augmented model:
Y = X11 + X22 + Z1 1 + Z2 2 + + ";
where
X1 :
X2 :
Z1 :
Z2 :
and let
N K1
N K2
N G1
N G2
i and t;
endogenous, varying across i and t;
exogenous, varying across i;
endogenous, varying across i;
exogenous, varying across
= (X10 ; X20 ; Z10 ; Z20 ) and = (10 ; 20 ; 10 ; 20 )0.
General form of the Instrumental-variable estimator for panel

data: Let
have
Y =
1=2Y , X =
1=2X ,
h
=
1=2.
We
1 0
0
^ IV = PW
PW Y
h
i 1h
i
0
1
=
2
1
=
2
0
1
=
2
1
=
2
=
PW

PW
Y :
Computation of
4.4.3
and
1=2:
as in the usual GLS case.
Exogeneity assumptions and a rst instrument matrix
E (X10 ) = E (Z10 ) = 0
) Obvious instruments are X1 and Z1, not sucient because
K1 + G1 < K1 + K2 + G1 + G2.
Additional instruments: must not be correlated with .
Because is the source of endogeneity, every variable not correlated with is a valid instrument. Best valid instruments are
highly correlated with X2 and Z2 .
QX1 and QX2 are valid instruments: E [(QX1)0] = E [X10 Q] =
Exogeneity assumptions:
4.4. INSTRUMENTAL VARIABLE ESTIMATION: HAUSMAN-TAYLOR GLS ESTIMATOR59
0 and E [(QX2)0] = E [X20 Q] = 0.

As for
X1, equivalent to use BX1 because we need
E [X10
1U ] = E [X10 (Q + 1B )U ] = E [X10 B (Q + 1B )U ]
since
BQ = 0 and BB = B .
Hausman-Taylor (1981) matrix of instruments:
WHT = [QX1; QX2; BX1; Z1] = [QX1; QX2; X1; Z1]:

Identication condition: We have K1 + K2 + G1 + G2 parameters
to estimate, using K1 + K1 + K2 + G1 instruments (K1 + K2 instruments in
4.4.4
QX ).
Therefore, identication condition is
K1 G2.
More ecient procedures: Amemiya-MaCurdy and

Breusch-Mizon-Schmidt
4.4.4.1 Amemiya and MaCurdy (1986)
xit is exogenous, we can use the following conE (xiti ) = 0 8i; 8t instead of E (x0ii) = 0.
Use the fact that if

ditions:
Amemiya and MaCurdy (1986) suggest to use matrix
X1
in
60
CHAPTER 4.
the list of instruments:
x11
6x
6 11
6 :::
6
6
6 x21
6
x21

X1 = 6
6
6 :::
6
6 xN 1
6
6 xN 1
6
4 :::
xN 1
x12
x12
:::
x22
x22
:::
xN 2
xN 2
:::
xN 2
:::
:::
:::
:::
:::
:::
:::
:::
:::
:::
x1T
x1T
:::
x2T
x2T
:::
xNT
xNT
:::
xNT
(i = 1; t = 1)
(i = 1; t = 2) 7
7
7
:::
7
(i = 2; t = 1) 7
7
(i = 2; t = 2) 7
7
7
:::
7
7
(i = N; t = 1) 7
7
(i = N; t = 2) 7
7
5
:::
(i = N; t = T )
QX1 = 0 and BX1 = X1. The AM instrument matrix

= [QX; X1; Z1], and an equivalent estimator obtains by
such that
is
WAM
using
WAM = [QX; (QX1); BX1; Z1];

where (QX1 ) is constructed as X1 above.
Amemiya and MaCurdy: their instrument matrix yields an estimator as least as ecient as with the Hausman-Taylor matrix,
if
i is not correlated with regressors 8t.
(QX1) to the Hausman-Taylor

list of instruments, but as [(QX1 ) ; X1 ] is of rank K1 , we only add
(T 1)K1 instruments. identication condition is T K1 G2.
Identication condition:
We add
4.4.4.2 Breusch, Mizon and Schmidt (1989)

Even more ecient estimator: based on conditions
E [(QX2it)0i] = 0 8i; 8t, instead of condition

E [(QT X2i)0i] = 0.
4.5. COMPUTATION OF VARIANCE-COVARIANCE MATRIX FOR IV ESTIMATORS61

For BMS, estimator is more ecient if endogeneity in
X2
origi-
nates from a time-invariant component. BMS instrument matrix:
WBMS = [QX; (QX1); (QX2); BX1; Z1]

where
(QX1)
and
(QX2)
are constructed the same way as
X1
for AM.
(QX2) to AmemiyaMaCurdy instruments. Condition is then T K1 +(T

1)K2 G2.
As before, we only add (T
1)K2 instruments, as (QX2) is not
full rank but (T
1)K2.
Identication condition:
For BMS, we add
4.5 Computation of variance-covariance matrix

for IV estimators
Problem here: endogenous regressors may yield unconsistent estimates of variance components in
, in particular parameter .
Method suggested by Hausman-Taylor (1981) that yields consistent estimates.
Let
M1 denote the individual-mean vector of the Within residual:

M1 = BY
where
BX ^ W = B

BX (X 0 QX ) 1X 0Q Y

= Z + + B BX (X 0 QX ) 1X 0Q ";
X = (X1jX2), Z = (Z1jZ2), and = ( 1; 2).
The last
three terms above can be treated as centered residuals, and it

suces to nd instruments for
The IV estimator of
is
Z2 in order to estimate .
^B = (Z 0PC Z ) 1(Z 0PC M1);
62
CHAPTER 4.
PC is the projection matrix associated to instruments C =

(X1; Z1). Using parameter estimates ^ W and ^B , we form residwhere
uals
QX ^ W and u^B = BY
u^W = QY
BX ^ W
Z ^B :
These two vectors of residuals are used to compute variance composants as in standard Feasible GLS.
4.5.1
Full IV-GLS estimation procedure
Step 1. Compute individual means and deviations, BX , BY ,

QX
and
QY .
Step 2. Estimate parameters associated to X using Within.

Step 3. Estimate B by the IV procedure above.
Step 4. Compute 2 and "2 from u^W and u^B , and compute
^ = 1 + T ^ 2 =^ 2" .
Step 5.p Transform variablespby GLS scalar procedure , e.g.,

(Q + B )Y = yit
(1
)yi.
Step 6. Compute projection projection PW from instrument

matrix
W.
Step 7. Estimate parameters .

4.6 Example: Wage equation
4.6.1
Model specication
4.6.
63
EXAMPLE: WAGE EQUATION
Theory (Human capital or signal theory):
log w = F [X1; ; ED];
:
where
w:
wage rate
X1: additional variables (industry, occupation status, etc.), and ED : educational level. Proxies
worker's ability (unobserved),
for ability that can be used: number of hours worked, experience,

union, etc.
Main objective: estimate marginal gain associated with
ED: @w=@ED.
But problem: what if worker's ability is constant through time and
conditions
where
ED ?
True model would be
log w = F [X1; ; ED];

ED = G[; X2];
X2 are additional, individual-specic variables.
If ability
is replaced by proxies Z , we have

log w = F [X1; Z; ED] + U;

ED = G[X2; Z2] + V;
U = F [X1; ; ED] F [X1; Z; ED] and

V = G[X2; ] G[X2; Z ].
where
Two problems when estimating the rst equation while overlooking the second one:
If some X1 and X2 variables in common, endogeneity bias (because of
ED);
If Z correlated with omitted variables (explaining ability), measurement-
error bias.
64
CHAPTER 4.
4.7 Application: returns to education

Sample used: Panel Study of Income Dynamics (PSID), University of Michigan. See Baltagi and KhantiAkom 1990, Cornwell
and Rupert 1988.
595 individuals, for years 1976 to 1982 (7 time periods): heads of

households (males and females) aged between 18 and 65 in 1976,
with a positive wage in private, nonfarm employment for the
years 1976 to 1982.
4.7.1
Variables related to job status
LW AGE : logarithm of wage earnings;

W KS : number of weeks worked in the year;
EXP : working experience in years at the date of the sample;
OCC : dummy, 1 if bluecollar occupation;
IND : dummy, 1 if working in industry;
UNION : dummy, 1 if wage is covered by a union contract.
4.7.2
Variables related to characteristics of households

heads
SMSA : dummy, 1 if household resides in SMSA (Standard

Metropolitan Statistical Area);
SOUT H : dummy, 1 if individual resides in the south;

MS : Marital Status dummy, 1 if head is married;
4.7.
65
APPLICATION: RETURNS TO EDUCATION
F EM : dummy, 1 female;
BLK : dummy, 1 if head is black;
ED : number of years of education attained.
Individual-specic variables:
ED, BLK
and
F EM .
Estimation of non-augmented models (w/o

Variables
a priori
individual eects):
MS );
Variables
IND).
a priori
Zi's)
endogenous (because correlated with ability:
X2: (EXP E , EXP E 2, UNION , W KS ,
exogenous:
X1: (OCC , SOUT H , SMSA,
Augmented model
Yit = X1it1 + X2it2 + Z1i 1 + Z2i 2 + i + "it

a priori endogenous: Z2: ED;
Variables a priori exogenous: Z1 : (BLK , F EM ).
Variables
66
CHAPTER 4.
Table 4.1:
Variable
LW AGE
EXP
W KS
OCC
IND
UNION
SOUT H
SMSA
MS
ED
F EM
BLK
Sample 1 1976-1982. Descriptive Statistics
Mean
Std. Dev.
Minimum
Maximum
6.6763
0.4615
4.6052
8.5370
19.8538
10.9664
1.0000
51.0000
46.8115
5.1291
5.0000
52.0000
0.5112
0.4999
0.0000
1.0000
0.3954
0.4890
0.0000
1.0000
0.3640
0.4812
0.0000
1.0000
0.2903
0.4539
0.0000
1.0000
0.6538
0.4758
0.0000
1.0000
0.8144
0.3888
0.0000
1.0000
12.8454
2.7880
4.0000
17.0000
0.1126
0.3161
0.0000
1.0000
0.0723
0.2590
0.0000
1.0000
4.7.
67
APPLICATION: RETURNS TO EDUCATION
Table 4.2:
Dependent variable: log(wage).
Exogenous regressors
only.
Within
GLS
0.0976 (0.0040)
OCC
-0.0696 (0.02323)
-0.0701 (0.02322)
SOUTH
-0.0052 (0.05833)
-0.0072 (0.05807)
SMSA
-0.1287 (0.03295)
-0.1275 (0.03290)
0.0317 (0.02626)
0.0317 (0.02624)
Constant
IND
2(4) = 0:551
Notes. Standard errors are in parentheses.
Table 4.3:
Dependent variable: log(wage). Endogenous regressors
only.
Within
GLS
0.0561 (0.0024)
0.1136 (0.002467)
0.1133 (0.002466)
EXPE2
-0.0004 (0.000054)
-0.0004 (0.000054)
WKS
0.0008 (0.0005994)
0.0008 (0.0005994)
-0.0322 (0.01893)
-0.0325 (0.01892)
0.0301 (0.01480)
0.0300 (0.01479)
Constant
EXPE
MS
UNION
2(5) = 24:94
68
CHAPTER 4.
Table 4.4:
Dependent variable: log(wage). Augmented model.

Within
GLS
0.1866 (0.01189)
OCC
-0.0214 (0.01378)
-0.0243 (0.01367)
SOUTH
-0.0018 (0.03429)
0.0048 (0.03188)
SMSA
-0.0424 (0.01942)
-0.0468 (0.01891)
IND
0.0192 (0.01544)
0.0148 (0.01521)
EXPE
0.1132 (0.00247)
0.1084 (0.00243)
-0.0004 (0.00005)
-0.0004 (0.00005)
0.0008 (0.00059)
0.0008 (0.00059)
-0.0297 (0.01898)
-0.0391 (0.01884)
0.0327 (0.01492)
0.0375 (0.01472)
FEM
-0.1666 (0.12646)
BLK
-0.2639 (0.15413)
ED
0.1373 (0.01415)
Constant
EXPE2
WKS
MS
UNION
2(9) = 495:3
Table 4.5:
Dependent variable: log(wage). IV Estimation

HT
AM
BMS
0.1772 (0.017)
0.1781 (0.016)
0.1748 (0.016)
-0.0207 (0.013)
-0.0208 (0.013)
-0.0204 (0.013)
0.0074 (0.031)
0.0072 (0.031)
0.0077 (0.031)
-0.0418 (0.018)
-0.0419 (0.018)
-0.0423 (0.018)
IND
0.0135 (0.015)
0.0136 (0.015)
0.0138 (0.015)
EXPE
0.1131 (0.002)
0.1129 (0.002)
0.1127 (0.002)
-0.0004 (0.005)
-0.0004 (0.000)
-0.0004 (0.000)
0.0008 (0.000)
0.0008 (0.000)
0.0008 (0.000)
-0.0298 (0.018)
-0.0300 (0.018)
-0.0303 (0.018)
0.0327 (0.014)
0.0324 (0.014)
0.0326 (0.014)
FEM
-0.1309 (0.126)
-0.1320 (0.126)
-0.1337 (0.126)
BLK
-0.2857 (0.155)
-0.2859 (0.155)
-0.2793 (0.155)
0.1379 (0.021)
0.1372 (0.020)
0.1417 (0.020)
Constant
OCC
SOUTH
SMSA
EXPE2
WKS
MS
UNION
ED
Test
2(3) = 5:23 2(13) = 19:29 2(13) = 12:23
Chapter 5
Dynamic panel data models
5.1 Motivation
Usefulness of dynamic panel data models:
Investigate adjustment dynamics in micro- and macro-economic

variables of interest;
Estimate equations from intertemporal-framework models (lifecycle models, nance,...)
In practice: estimate long-run elasticities and structural parameters from Euler equations.
5.1.1
Dynamic formulations from dynamic programming

problems
Consider the general problem
R
maxq(0);:::;q(T ) E e rt(t) ;
(t) = p(t)q(t) c[q(t); b(t)];
b_ = G[b(t); q(t)];
69
70
CHAPTER 5.
DYNAMIC PANEL DATA MODELS
b(t) is the state variable (stock, capital,...), q(t) is the control variable, r is discount rate. G(:) describes the evolution path
where
of the state variable.
Dynamic programming solves the problem in a series of steps.
Switch to discrete-time framework:
nP
o
T
t
maxq0;:::;qT E
t=0(1 + r) t ;
bt+1 = f (bt; qt);

and use the Bellman equation:

Vt(bt) = max Et t + (1 + r) 1Vt+1(bt+1)
= max Et fptqt c[qt; bt] + Vt+1f [bt; qt]g ;

where Vt (bt ) is the value function of the problem at time t,
Et is the conditional expectation operator at time t.
and
We use a) the envelope theorem (evolution path at optimum depends only on state variable, as control variable is already optimized); b) First-order condition wrt. control variable.
@Vt(bt) @ t(bt; qt)

1 @Vt+1 @f (bt; qt)
=
+
;
@bt
@bt
1 + r @f
@bt
(Envelope theorem)
@Vt(bt) @ t(bt; qt)

1 @Vt+1 @f (bt; qt)
=
+
=0
@qt
@qt
1 + r @f
@qt
From (F OC ):

@Vt+1
@ t @f (bt; qt)
=
@f
@qt
@qt
1
(1 + r);
(FOC)
5.1.
71
MOTIVATION
that we replace in rst equation above:
@Vt @ t
=
@bt @bt
Now we lag
1
(F OC ) once and replace:

"
@ t 1
1 @ t
+
@qt 1 1 + r @bt
Assume
@ t @f (bt; qt)
@qt
@qt

@ t @f
@qt @qt
1
@f=@q = a1 and @f=@b = a2.

@f (bt; qt)
:
@bt
@f @f (bt 1; qt 1)
= 0:
@bt
@qt 1
We have
@ t
1 + r @ t 1
a @ t
=
+ 1
:
@qt
a2
@qt 1
a2 @bt
This is the Euler equation relating current and past marginal
prots.
If, for instance, prot is linear-quadratic in
b0 + b1qt + b2bt =
1+r (b0 + b1 qt 1 + b2 bt 1)
a

2
qt and bt, we have
a1
a2
(c0 + c1qt + c2bt)
qit = 0 + 1qi;t 1 + 2bi;t 1 + 3bit + i + "it;
where
0
1
2
3
5.1.2
= (a2 b1
= (a2 b1
= (a2 b1
= (a2 b1
a1c1)
a1c1)
a1c1)
a1c1)
1 [b ((1 + r) a ) + a c ] ;
0
2
1 0
1 [(1 + r)b ] ;
1
1 [(1 + r)b ] ;
2
1 [a c a b ] :
1 2
2 2
Euler equations and consumption
72
CHAPTER 5.
Consider a two-period model with the following period-to-period

budget constraint
ct + At = yt + At 1(1 + rt); t = 1; 2;
where
ct
is consumption at time
income, and
rt is interest rate.
t, At
is total assets,
yt
is wage
Assume further, intertemporally additive preferences:
U = u(c1) +
where
u0 > 0, u00 < 0 and
1
u(c );
1+ 2
0 is the subjective discount rate.
Often-used specication: CES (Constant Elasticity of Substitution)
U
where
= c1 +
1
c2 ;
1+
= 1=(1+ ) is the intertemporal elasticity of substitution.
At the optimum (by replacing budget constraints in utility function and optimizing wrt.
A1):
@u @c1
1 @u @c2
@U
=
+
=0
@A1 @c1 @A1 1 + @c2 @A1
@u 1 + r @u
, @c
=
:
1 1 + @c2
This is the
intertemporal eciency condition
the CES case we have
c1 1= =
1+r
c2 1= :
1+
(Hall 1978), and in
5.1.
73
MOTIVATION
Stochastic framework with
c1 =
1+r
(
1+
u(X ) = 1=2(
Ec2)
X )2 :
c1 = Ec2
if
r = :
Hall Euler equation with more than 2 periods reduces to
ct+1 = ct + "t+1;
where
"t+1 is i.i.d.;
which is tested from the equation
ct = 0 + 1yt + 2(yt 1
ct 1) + "t:
This is an error-correction model that can be written
ct = 0 + 1yt + (ct 1
5.1.3
1yt 1) + 2(yt 1
ct 1) + "t:
Long-run relationships in economics
Long-run relationships are represented by the stationary path

of the variable of interest (consumption, capital stock,...)
yt+1
yt
= and if we add variable xt, yt+1 = yt + xt+1 , stationary

x .
equilibrium path is y =
1
5.1.3.1 Long-run elasticities
Dynamic models are helpful in computing long-run elasticities.
Consider for example the dynamic consumption model
where
C~i;t+j = C~i;t+j 1 + P~i;t+j + ui;t+j ;

C~i;t+j and P~i;t+j respectively denote logs of
and price.
have
consumption
Lagged consumption here accounts for habits.
C~i;t+j = j +1C~i;t 1 + j P~it + j 1P~i;t+1 + : : :
We
74
CHAPTER 5.
+P~i;t+j 1 + P~i;t+j + ui;t+j ;

j
j 1u
where ui;t+j = uit +
i;t+1 + + ui;t+j 1 + ui;t+j .
Assume we want to compute the change in consumption at
t + j following a permanent change of 1% in price between

t and t + j :
@ C~i;t+j
@ C~i;t+j @ C~i;t+j
+
+

+
= (j + j 1 + + + 1):
~
~
~
@ Pit
@ Pi;t+1
@ Pi;t+j
time
When consumption is stationary (in logs),
jj < 1, and the long-
run eect of price obtains by taking the limit
j
X
@ C~i;t+j
j + j 1 + + + 1) = :
=
lim

(

~
j !1
j !1
1
s=0 @ Pi;t+s
lim
5.1.3.2 Dynamic representations from AR(1) errors

Consider the following Cobb-Douglas production model
log Qit = 1 log Nit + 2 log Kit + uit;

where
Qit
is output of rm
capital stock, and

poses into
where
t
change),
uit
i at time t, Nit
is labor input,
Kit
is
is the residual. Assume the latter decom-
uit = t + i + vit + "it;
is a year-specic intercept (industry-wide technological
is the unobserved rm-specic eect,
error component (measurement error), and

shock having an AR(1) representation:
vit = vi;t 1 + eit:
vit
"it
is an i.i.d.
is a productivity
5.2.
75
THE DYNAMIC FIXED-EFFECT MODEL
This model has the following, dynamic representation:
log Qit = 1 log Nit
1 log Ni;t 1 + 2 log Kit
2 log Ki;t 1
+ log Qi;t 1 + (t
t 1) + [i(1 ) + eit + "it
"i;t 1] ;
or
log Qit = 1 log Nit + log Ni;t 1 + 3 log Kit + log Ki;t 1
+5 log Qi;t 1 + t + (i + !it);
subject to restrictions
2 = 1 5 and 4 = 35.
Hence, equivalence between a static (short-run) model with seriallycorrelated productivity shocks, and a dynamic representation of
production output.
5.2 The dynamic xed-eect model

Simple dynamic panel-data model:
yit = yi;t 1 + i + "it; i = 1; 2; : : : ; N ; t = 1; 2; : : : ; T;

yi0; i = 1; 2; : : : ; N are assumed known.
2
We assume E ("it ) = 0 8i; t, E ("it "js ) = "
if i = j; t = s and 0 otherwise, E (i "it ) = 0 8i; t.
where initial conditions
By continuous substitution:
yit = "it + "i;t 1
+ 2"
i;t
2 +
+ t
t
1 " + 1 + t y :
i1
i0
1 i
76
CHAPTER 5.
5.2.1
Bias in the Fixed-Eects estimator
The Within estimator is:
^ =
PN PT
i=1 t=1(yit yi)(yi;t 1 yi; 1) ;
PN PT
2
i=1 t=1 (yi;t 1 yi; 1)
^ i = yi
^yi; 1;
where
T
T
T
1X
1X
1X
y ; yi; 1 =
y ; "i =
" :
yi =
T t=1 it
T t=1 i;t 1
T t=1 it
Also,
1 PN PT ("it "i )(yi;t 1 yi; 1)

^ = + NT 1i=1PN t=1PT
;
2
(
y
y

)
i;t
1
i;
1
i=1 t=1
NT
This estimator exists if denominator
6= 0 and is consistent if nu-
merator converges to 0.
Numerator:
1
plimN !1
NT
because
N;T
X
i;t
(yi;t 1
yi; 1)("it
N
1X
"i) = plim
y "
N i=1 i; 1 i
"it is serially uncorrelated and not correlated with i .
We
use
T
1X
1 1 T
(T
yi; 1 =
yi;t 1 =
yi0 +
T t=1
T 1
1) T + T
i
(1 )2

1 T 1
1 T 2
+
" +
" + + "i;T 1 :
1 i1
1 i2
5.2.
77
We have
N
X
plim
N
X
1
1
1
yi; 1"i = plim
"i
N i=1
N i=1 T
(
N
X
T
X
1
1
"
N i=1 T t=1 it

"2 (T 1)
= 2
T
(1
1
T
"
T 1
X
1
1
t=1
"
T 1
X
1
T t
T t
#)
"it
#)
"it
1

t=1

T + T
:
)2
1 PN;T (y
2
In a similar manner, we show that plim
i;t 1 yi; 1)
i;t
NT
= plim
2
= " 2 1
1
1) T + T
T2
2
(T

(1 )2
1
T
Forming the ratio of these two terms, the asymptotic bias is
1 1 T
T 1
1+
plimN !1 (^
) =
1
T 1
2
(1 )(T
1)
1 T
T (1 )
1
= O(1=T ):
In the transformed model
(yit
yi) = (yi;t 1
yi; 1) + ("it "i);
the explanatory variable is correlated with residual, and correlation is of order
1=T .
Hence, the Fixed-Eects estimator is biased
in the usual case where
is large and
is small.
78
CHAPTER 5.
Table 5.1:
0.2
0.5
0.7
0.9
Asymptotic bias in Fixed-Eects DPD estimator

Bias
Percent
-0.2063
-103.1693
-0.1539
-76.9597
10
-0.1226
-61.3139
20
-0.0607
-30.3541
40
-0.0302
-15.0913
-0.2756
-55.1282
-0.2049
-40.9769
10
-0.1622
-32.4421
20
-0.0785
-15.6977
40
-0.0384
-7.6819
-0.3307
-47.2392
-0.2479
-35.4084
10
-0.1966
-28.0912
20
-0.0938
-13.3955
40
-0.0449
-6.4114
-0.3939
-43.7633
-0.3017
-33.5179
10
-0.2432
-27.0248
20
-0.1196
-13.2934
40
-0.0563
-6.2561
5.2.
79
5.2.2
Instrumental-variable estimation
Only way to obtain consistent estimator of

(small).
when
is xed
Dierent procedure to eliminate individual eects: use
First dierencing instead of Within:
(yit
yi;t 1) = (yi;t 1 yi;t 2) + ("it

yit = yi;t 1 + "it;
"i;t 1)
and in vector form:
yi = yi; 1 + "i; i = 1; 2; : : : ; N:

In model above, yi;t 1 correlated by construction with "i;t 1 ! We
need instruments that are uncorrelated with ("it
"i;t 1) but correlated with (yi;t 1
yi;t 2). Only possibility in a single-equation
framework with no other explanatory variables: use values of dependent variables.
Because of autoregressive nature of model, instruments from fu-
yit are not feasible because yit is a recursive function

of "it ; "i;t 1; : : : ; "i1 ; i ; yi0 .
As for lagged dependent variables, we can use either yi;t 2 or
(yi;t 2 yi;t 3):
E [yi;t 2("it "i;t 1)] = E ("i;t 2"it) E ("i;t 2"i;t 1) = 0;
E [(yi;t 2 yi;t 3)("it "i;t 1)] = E ["i;t 2("it "i;t 1)]
E ["i;t 3("it "i;t 1)] = 0;
E [yi;t 2(yi;t 1 yi;t 2)] = 0 E ("2i;t 2) = "2;
E [(yi;t 2 yi;t 3)(yi;t 1 yi;t 2)] = 0 E ("2i;t 2) = "2:
Instrumental-variable estimators that are consistent when N and/or
T ! 1:
PN PT
(y y )(y
y )
^ = PNi=1PT t=3 it i;t 1 i;t 2 i;t 3
i=1 t=3 (yi;t 1 yi;t 2)(yi;t 2 yi;t 3)
ture values of
80
CHAPTER 5.
^ =
or
Conclusion:
even though
because the
PN PT
i=1 t=3(yit
PN PT
i=1 t=3(yi;t 1
yi;t 1)yi;t 2
:
yi;t 2)yi;t 2
With Within transformation on a dynamic model,
i is eliminated, endogeneity bias occurs for xed T

Q operator used introduces errors "is correlated by
construction with current explanatory variable.
Consider now a more general model:
yit = yi;t 1 + xit + zi + i + "it:

IV Estimation proceeds as follows.
Step 1.
(yit
First-dierence the model, to get
yi;t 1) = (yi;t 1
yi;t 2) + (xit
xi;t 1) + "it
"i;t 1:
yi;t 2 or (yi;t 2 yi;t 3) as instrument for (yi;t 1 yi;t 2) and

estimate ; with the IV procedure.
Use
Step 2.
Substitute
yi
^yi; 1
and estimate
Step 3.
^ and ^
in rst-dierence Between equation:
xi^ = zi + i + "i; i = 1; 2; : : : ; N;
by OLS.
Estimate variance components:
^ 2" = 2N (T1 1) Ni=1 Tt=1 [(yit yi;t 1) ^(yi;t 1

i2
^
(xit xi;t 1) ;
i2
PN h
2
1
1 ^ 2;
^
^ = N i=1 yi ^yi; 1 zi ^ xi
"
T
yi;t 2)
5.3.
81
THE RANDOM-EFFECTS MODEL
Consistency of the estimator:
, and "2 are consistent when N or T ! 1;

2
IV estimator of and are consistent only when T ! 1, but
inconsistent when T is xed and N ! 1.
IV estimator of
5.3 The Random-eects model

We now treat
for static models,
as a random variable, in addition to
yi;t 1.
Bias in the ML estimator
In the simple model
yit = yi;t 1 + i + "it, the MLE is equivalent
to the OLS estimator:
^ =
PN PT
i=1 t=1 yit yi;t 1
PN PT
2
i=1 t=1 yi;t 1
=+
PN PT
i=1 t=1(i + "it)yi;t 1 :
PN PT
2
i=1 t=1 yi;t 1
We show that
N X
T
1 1 T
1 X
( + " )y
=
Cov(yi0; i)
plimN !1
NT i=1 t=1 i it i;t 1 T 1
1 2
T;
+
(
T
1)
T

+

T (1 )2
and
As
is not eliminated, but it is correlated by
construction with lagged dependent variable
5.3.1
"it.
N X
T
N 2
1 X
1 2T
2
i yi0
plimN !1
yi;t 1 =
:
NT i=1 t=1
T (1 2) N

2 1
1 T 1 2T
+
: T 2
+
(1 )2 T
1
1 2
82
CHAPTER 5.
1 T 1
+
T (1 ) 1
1

"2
+
(T 1)
T (1 2)2
2
2T
Cov(yi0; i)
2
T 2 + 2T :
The bias depends on the behavior of initial conditions

or generated as
5.3.2
yit).
yi0 (constant
An equivalent representation
We consider a more general model
yit = yi;t 1 + xit + zi + uit;

with the following assumptions:
jj < 1; E (i) = E ("it) = 0;

E (ixit) = 0; E (izi) = 0; E (i"it) = 0;
E (ij ) = 2 if i = j;
0 otherwise;
E ("it"js) = "2 if i = j; t = s;
0 otherwise:
We can also write
wit = wi;t 1 + xit + zi + "it;

yit = wit + i;
where i = i =(1
); Ei = 0; V ar(i) = 2 = 2 =(1 )2;
and the dynamic process
fect
i.
fwitg is independent from individual ef-
5.3.
83
5.3.3
The role of initial conditions
The two equivalent specications of the model are:

(A)
(B)
In model (A),
yit
yit = yi;t 1 + xit + zi + i + "it;

wit = wi;t 1 + xit + zi + "it;
yit = wit + i:
is driven by unobserved characteristics
ferent across units, in addition to
xit and zi .
i , dif-
wit is independent from individual

eects i . Conditional on exogenous xit and zi , wit are driven by
identical processes with i.i.d. shocks "it . But observed value yit is
shifted by individual-specic eect i .
In model (B), dynamic process
Possible interpretation:
and
wit
is a latent variable,
i is a time-invariant measurement error.
The two processes are equivalent because
yit
is observed,
wit is unobserved.
But
assumptions (or knowledge) on initial conditions may help to distinguish between both processes.
Dierent cases:
1/ yi0 xed;
2/ yi0 random;
2.a/ yi0 independent of i, with E (yi0) = y
2 ;
and
V ar(yi0) =
y0
2.b/ yi0 correlated with i, with Cov(yi0; i) = y2 ;

3/ wi0 xed;
4/ wi0 random;
4.a/ wi0 random with common mean w and variance "2=(1
0
2)
84
CHAPTER 5.
(stationarity assumption);
4.b/ wi0 random with common mean w and arbitrary variance

w2 0;
4.c/ wi0 random with mean i0 and variance "2=(1
2) (sta-
tionarity assumption);
4.d/ wi0 random with mean i0 and arbitrary variance w2 0.
See Appendix 4 for a derivation of Maximum Likelihood estimators in each case.
5.3.4
Possible inconsistency of GLS
In cases 1 and 2.a/ (yi0 xed of random but independent of i ):
"2 are known, maximizing log-likelihood wrt. ;

2
2
and yields the GLS estimator. When and " are unknown,
When
2
and
feasible GLS applies by using consistent estimates of these variances in
VT .
Other cases
and are consistent when T ! 1, because GLS

converges to Within. When N ! 1 and T is xed, GLS is inconEstimators for
sistent in cases where initial values are correlated with individual

eects.
5.3.5
Example: The Balestra-Nerlove study
Seminal paper on Dynamic Panel Data models (1966). Household

demand for natural gas in the US, including a/ the demand due
to replacement of gas appliances, and b/ demand due to increases
in the stock of appliances.
5.3.
85
Table 5.2: Properties of the MLE for dynamic panel data models
Parameters
xed,
Case 1:
; ; "2
; 2

Case 2.a:
; ; "2
y ; ; 2 ; y2
0
; ; "2
wi0; ; 2
yi0
xed,
!1
xed
Consistent
Inconsistent
Consistent
yi0
Case 2.b:
!1
Consistent
; ; "2
y ; ; 2 ; y2 ;
random,
yi0
ind. of
Consistent
Consistent
Inconsistent
Consistent
yi0
correlated with
Consistent
Consistent
Inconsistent
Consistent
Case 3:
wi0
xed
Consistent
Inconsistent
Inconsistent
Inconsistent
wi0 random, mean w , variance "2=(1 2)

; ; "2
Consistent
Consistent
2
w ; ;
Inconsistent
Consistent
2
Case 4.b: wi0 random, mean w , variance w
; ; "2
Consistent
Consistent
2
w ; ; ; w
Inconsistent
Consistent
Case 4.c: wi0 random, mean i0, variance "2 =(1
2)
; ; "2
Consistent
Inconsistent
2
Inconsistent
Inconsistent
i0; ;
2
Case 4.d: wi0 random, mean i0, variance w
; ; "2
Consistent
Inconsistent
2
2
i0; ; w
Inconsistent
Inconsistent
Case 4.a:
86
CHAPTER 5.
Demand system:
Git = Git (1 r)Gi;t 1;

Fit = Fit (1 r)Fi;t 1;
Fit = a0 + a1Nit + a2Iit;
Git = b0 + b1Pit + b2Fit;
Git and Git are respectively the new demand and the actual
demand for gas at time t from unit i, r is the appliances deprecia
tion rate, Fit and Fit are respectively the new and actual demand
for all types of fuel, Nit is total population, Iit is per-head income,
and Pit is relative price of gas.
where
Solving the system, we have the equation to be estimated:
Git = 0 + 1Pit + 2Nit + 3Ni;t 1

+4Iit + 5Ii;t 1 + 6Gi;t 1;
where
Nit = Nit
Ni;t 1, Iit = Iit
Ii;t 1, and 6 = 1 r.
Estimation procedures: OLS, Within (LSDV) and GLS (with as-
Gi0 are xed, case 1/).

In accordance with the theory, (here, 6 ) is biased upward for
sumption that initial conditions
OLS and downward for Within.
5.3.
87
Table 5.3: Parameter estimates, Balestra-Nerlove model

Parameter
0 (Intercept)
1 (Pit)
2 (Nit)
3 (Ni;t 1)
4 (Iit)
5 (Ii;t 1)
6 (Gi;t 1)
OLS
Within
GLS
-3.650
-4.091
(3.316)
(11.544)
-0.0451(*)
-0.2026
-0.0879(*)
(0.027)
(0.0532)
(0.0468)
0.0174(*)
-0.0135
-0.00122
(0.0093)
(0.0215)
(0.0190)
0.00111(**)
0.0327(**)
0.00360(**)
(0.00041)
(0.0046)
(0.00129)
0.0183(**)
0.0131
0.0170(**)
(0.0080)
(0.0084)
(0.0080)
0.00326
0.0044
0.00354
(0.00197)
(0.0101)
(0.00622)
1.010(**)
0.6799(**)
0.9546(**)
(0.014)
(0.0633)
(0.0372)
Notes. N = 36, T = 11. Standard errors are in parentheses. (*) and (**):
parameter signicant at 10% and 5% level respectively.
88
CHAPTER 5.
Part II
Generalized Method of Moments
estimation
89
Chapter 6
The GMM estimator
Generalized Method of Moments: ecient way to obtain consistent parameter estimates under mild conditions on the model.
Very popular in estimating structural economic models, as it requires much less conditions on model disturbances than Maximum
Likelihood. Another important advantage: easy to obtain parameter estimates that are robust to heteroskedasticity of unknown
form.
6.1 Moment conditions and the method of moments

6.1.1
Moment conditions
N , fxi; i = 1; 2; : : : ; N g from which one

wishes to estimate a p 1 vector whose true value is 0 .
Note: notation above is very general, xi will typically include de-
Consider a sample of size
pendent (endogenous) and explanatory (exogenous, endogenous)

variables.
Let
f (xi; ) denote a q 1 function whose expectation E [f (xi; )]

91
92
CHAPTER 6. THE GMM ESTIMATOR
exists and is nite. Moment conditions are then dened as
E [f (xi; 0)] = 0:
6.1.2
Example: Linear regression model
Consider the linear model
yi = xi0 + ui; i = 1; 2; : : : ; N;
where
0 :
true value of parameter vector
term.
A common assumption is
and
ui
is the error
E (uijxi) = 0 , E (yijxi) = xi0, and
from the Law of Iterated Expectations:
E (xiui) = E [E (xiuijxi)] = E [xiE (uijxi)] = 0:

In terms of the denition above,
xi ).
= and f ((xi; yi); ) = xi(yi
Moment conditions are then
E (xiui) = E [xi(yi
Note that here,
p = q,
xi0] = 0:
as many moment conditions as we have
parameters to estimate.
Suppose now we do not assume
E (ziui) = 0.
such that
There are
Vector
but instead, that
zi is q 1 and would consist of instruments
E (ziui) = E [zi(yi xi0)] = 0;

f [(xi; yi; zi); ] = zi(yi xi ):
or
q moment equations (as many as there are instruments)
p parameters to estimate.
q p.
and
E (uijxi) = 0
Hence, identication condition is
6.1.
MOMENT CONDITIONS AND THE METHOD OF MOMENTS
6.1.3
Example: Gamma distribution
A sample
bution
93
fxi; i = 1; 2; : : : ; N g is drawn from a Gamma distri-
(a; b) with
true values
a0
and
b0.
Relationship between
parameters and two rst moments of the distribution:
a
E (xi) = 0 ;
b0
a
E (xi)]2 = 20 :
b0
In our notation in the denition above: = (a; b) and
h
a
a 2 ai
; (x
)
;
f (xi; ) = xi
b i b
b2
so that E [f (xi; 0] = 0.
6.1.4
E [xi
Method of moments estimation
using moment conditions given above ? In the

case where p = q (as many conditions as parameters), we could
solve E [f (xi; 0 )] = 0 for 0 . But E [f (:)] is unknown, whereas
function values f (xi; ) can be computed 8; 8i. Also, sample
moments of function f (:) can be computed:
How to estimate
N
1X
fN () =
f (x ; ):
N i=1 i
E (f ) close to
fN (population moments close to empirical moments), then ^N is
a convenient estimate for 0 , where f (^
N ) = 0.
0 = E [f (0)] fN (^N ) ) 0 ^N :
Basic idea of the method of moment estimation: if
Two important conditions need to hold for the method of moment

estimation to be valid: a)
E (f )
is adequately approximated by
94
fN ; b) moment conditions can be solved for ^N .

Example: linear regression.
Sample moment conditions are
N
N
1X
1X
x u^ =
x (y
N i=1 i i N i=1 i i
and solving for
^ N
yields
^ N =
6.1.5
xi^ N ) = 0;
N
X
i=1
xix0i
! 1 N
X
i=1
xiyi:
Example: Poisson counting model
Poisson process: dependent variable is discrete (number of events,

etc.). Restriction: Mean of distribution is equal to the variance.
Assumption:
dependent variables
y1; y2; : : : ; yN
are distributed
according to independent Poisson distributions, with parameters
1; 2; : : : ; N
respectively.
P rob[yi = r] = exp( i)

We assume the
i's
ri
r!
depend on explanatory variables by a log-
linear relationship:
log i = 0 +
p
X
j =1
j xij :
The likelihood of the Poisson model is
L=
Ni=1
exp(

yi i
i)
yi!
"
= exp
N
X
i=1
i + 0
N
X
i=1
yi
6.1.
MOMENT CONDITIONS AND THE METHOD OF MOMENTS
p
X
j =1
N
X
i=1
xij yi
1
Ni=1yi!
95
Let us consider the following sample moments :
T0 =
N
X
i=1
yi
Tj =
N
X
i=1
xij yi
j = 1; : : : ; p;
and we use the fact that
@i
= i
@0
If we set derivatives of
T0 =
N
X
i=1
^ i
and
@i
= xij i:
@j
log L wrt. 0 and the j 's to 0, we get
Tj =
N
X
i=1
xij î
j = 1; : : : ; p
P
î = exp(^ 0 + pj=1 ^ j xij ): Hence, we match sample moPN
Pp
^
ments T0 and Tj to theoretical moments
exp(

+
^ j xij )
0
i
=1
j
=1
PN
^ Pp ^
and Tj =
i=1 xij exp( 0 + j =1 j xij ) respectively.
We have p + 1 such matching conditions for p + 1 parameters.
where
6.1.6
Comments
Note the dierence between the Method of Moments philosophy

and the usual estimation criteria. For Maximum Likelihood and
Least Squares, we maximize (minimize) a criterion
^ = arg max log L() (MLE);

^ = arg min N1 PNi [yi f (xi; )]2
(LS)
96
whereas here, we start from First-order Conditions and solve the

system for
.
Example: Instrumental Variable estimation

We could consider minimizing the IV criterion wrt.
^ = arg min (Y

where
X)0Z (Z 0Z ) 1Z 0(Y
:
X);
Z is a N q matrix of instruments, or start from the FOC:

N
N
1X
1X
z u^ =
z (y
N i=1 i i N i=1 i i
^ =
N
X
i=1
zi0 xi
! 1 N
X
i=1
xi^) = 0
zi0 yi = (Z 0X ) 1Z 0Y:
Equivalently, we could maximize the log likelihood wrt.

from the FOC
or start
N
1X
@ log L()
j=^ = 0;
N i=1
@
which can be regarded here as a set of sample moment conditions.
Problems that remain to be solved:
Ensure that we can replace population moments by sample moments, for the Method of Moments to work.
What if the system of moment conditions is overidentied (more

conditions than parameters) ?
How to be sure our moment conditions are valid (e.g.,

choice of instruments) ?
valid
6.2.
97
THE GENERALIZED METHOD OF MOMENTS (GMM)
6.2 The Generalized Method of Moments (GMM)

6.2.1
Introduction
As the name indicates, GMM is an extension of the Method of
are overidentied by moment conditions. Equations E [f (xi; 0 ] = 0 represent q conditions for p

unknown parameters, therefore we cannot nd a vector ^
N satisfying fN ( ) = 0.
Moments, when parameters
But we can look for
^ that makes fN () as close to 0 as possible,
by dening
^N = arg min QN () = fN ()0AN fN ();

AN
0(1).
Important note: for the just-identied case, QN ( ) = 0
fN () = 0, but in the over-identied case, QN () > 0.
where
is a positive weighting matrix of order
because
This fact is important for model checking (we will come to this
point later in the course).
6.2.2
Example: Just-identied IV model
Consider
Y = X + u with condition E (W 0u) = 0 (W
ments), and
rank(W 0X ) = p.
Solving for
we have
are instru-
^ = (W 0X ) 1(W 0Y )
that we replace in the IV criterion:
u(^ )0PW0 u(^ ) = Y

X (W 0X ) 1(W 0Y ) 0 W
(W 0W ) 1W 0

Y X (W 0X ) 1(W 0Y )

= Y 0PW Y + (W 0Y )0(W 0X ) 1X 0 PW X (W 0X ) 1(W 0Y )
98
(W 0Y )0(W 0X ) 1X 0 PW Y Y 0PW X (W 0X ) 1(W 0Y )

= Y 0PW Y + (Y 0W )(W 0X ) 1(X 0W )(W 0W ) 1(W 0X )(W 0X ) 1
(W 0Y ) (Y 0W )(W 0X ) 1(X 0W )(W 0W ) 1(W 0Y )
(Y 0W )(W 0W ) 1(W 0X )(W 0X ) 1(W 0Y )
0
1 = (X 0 W ) 1 :
and because (W X )
u(^ )0PW0 u(^ ) = 2Y 0PW Y 2Y 0PW Y = 0:
6.2.3
A denition
Let the observed sample fxi; i = 1; 2; : : : ; N g from

which we wish to estimate a p 1 vector of parameters whose
true value is 0. Let E [f (xi; 0)] = 0 be a set of q moment conditions, and fN () the corresponding set of sample moments. Dene
the criterion
QN = fN ()0AN fN ();
where AN is a stochastic, positive O(1) matrix. The GMM estimator of is
^N = arg min QN ():
Denition 1
6.2.4
Example: The IV estimator again
Consider again the linear regression model with
q > p instruments
(this is an over-identied model). Moment conditions are
E (ziui) = E (zi(yi
xi0)) = 0
and sample moments are
N
1X
fN ( ) =
z (y
N i=1 i i
xi ) =
1 0
(Z Y
N
Z 0X ):
6.3.
ASYMPTOTIC PROPERTIES OF THE GMM ESTIMATOR
99
Let us choose the weighting matrix as
AN =
Assume that
1 0Z
NZ
N
X
1
zi0 zi
N i=1
! 1
= N (Z 0Z ) 1:
converges in probability, (as
! 1), to a
A. The GMM criterion is then

1
QN ( ) = (Z 0Y Z 0X )0(Z 0Z ) 1(Z 0Y Z 0X ):
N
Dierentiating wrt. give rst-order conditions:
1
@QN ( )
j
^ N = 2X 0 Z (Z 0Z ) 1(Z 0Y Z 0 X ^ N ) = 0:

=

@
N
^ N , we have
Solving for
constant matrix

^ N = X 0Z (Z 0Z ) 1Z 0X 1 X 0Z (Z 0Z ) 1Z 0Y:
This expression is the IV formulation for the case where there are
more instruments than parameters.
6.3 Asymptotic properties of the GMM estimator

We examine here key properties that any useful estimator should
verify: consistency (convergence to the true parameter value as
the sample size gets large) and asymptotic normality (to be able
to use the asymptotic distribution for statistical inference).
100
6.3.1
Consistency
Assumption set 1
(i)
E [f (xi; )] exists and is nite 8 2 and 8i.

gi() = E [f (xi; )].
gi() = 0 8i , = 0.
(ii) Let
There exists
0
such that
fNj and gNj respectively denote elements of the q vectors

p
fN () and gN (). Then fNj gNj !
0 uniformly 8 2
and 8j = 1; 2; : : : ; q .
(iii) Let
(iv) There exists a non-random sequence of positive denite matrices
AN
such that
AN
AN
Under assumptions (i)

^N is weakly consistent.
Theorem 1
p
!
0.
(iv), the GMM estimator
(iii) is a stronger requirement than

pointwise convergence in probability on . It means that
Note: Uniform convergence in
p
sup fNj () gNj ()

2
! 0 for j = 1; 2; : : : ; q:
With pointwise convergence in probability only, it is not always

true that
when
fNj (N )
gNj (N )
p
!
0, where N is a sequence of
increases.
Elements of the proof:

From (iii) and (iv ), we can form a non-random sequence
Q N () = gN ()0AN gN ()
6.3.
ASYMPTOTIC PROPERTIES OF THE GMM ESTIMATOR
101
such that
p
!
0 uniformly for 2 :
N () = 0 , =
we have that Q
QN () Q N ()
(i) and (ii),
Q N () > 0 otherwise.
From
0,
and
Therefore,
0 = arg min Q N ():

2
^N
^N minimizes QN ();
0 minimizes Q N (p );
QN () Q N () ! 0:
But this implies that
p
!
0, because
For asymptotic normality, we need additional assumptions.
6.3.2
Asymptotic normality
Assumption set 2
(v) Function
f (xi; ) is continuously dierentiable wrt. on .

P
FN () = @fN ()=@ = N1 Ni=1 @f (xi; )=@.

p

!
sequence N such that N
0, we assume that
(vi) Let
FN (N ) FN
where
on
.
FN
is a sequence of
For any
p
!
0;
q p matrices that do not depend
102
(vii) Function
f (xi; 0) satises a central limit theorem:
VN 1=2 NfN (0)
d
!
N (0; Iq );
N = NV ar[fN (0)], a sequence of q q non-random,
where V
positive denite matrices.
Under Assumptions (i) (vii), thepGMM estimator

^N has the following asymptotic distribution: N (^N 0) v
N (0;
), where
is a p p matrix:
Theorem 2
i 1
FN (^N )0AN FN (^N )
Using White (1984,
FN (^N )0AN VN AN FN (^N )

h
i 1
0
^
^
FN (N ) AN FN (N ) :
Asymptotic theory for econometricians, Aca-
demic Press: Orlando, Denition 4.20):
1=2

0
0
^
^
^
^

FN (N ) AN VN AN FN (N )
FN (N ) AN FT (N )
p
d
N (0; Ip)
N (^N 0) !
Proof:
We know that
f (^N ) = 0 because ^N
minimizes the GMM crite-
rion. Consider a rst-order expansion of
0:
fN
around the true value
0 = fN (^N ) = fN (0) + FN (N )(^N 0);

where N 2 [^
N ; 0]. Since ^N is a consistent estimator (proved
p
!
above), we know that N
0.
Let us premultiply expansion above by FN (^
N )0AN :
FN (^N )0AN fN (^N ) = FN (^N )0AN fN (0)
+FN (^N )0AN FN (N )(^N
0) = 0
6.4.
103
OPTIMAL AND TWO-STEP GMM
(^N
0 ) =
N (^N
i 1
FN (^N )0AN FN (N )

h
FN (^N )0AN fN (0)

i
1
0 ) =
FN (^N )0AN FN (N )
p
FN (^N )0AN VN1=2VN 1=2 NfN (0)
p
VN 1=2 hNfN (0) is Ni(0; Iq ).
hp
i
p ^
^
Therefore, E
N (N 0) = 0 and V ar N (N 0) =
,
where
=
h
i 1
0

^
FN (N ) AN FN (N )
FN (^N )0AN VN AN FN (^N )
where
FN (^N )0AN FN (N )
i 1
(vi), we can replace FN () by FN (^N )

everywhere. Note that FN is q p, therefore the variance-covariance
matrix of the GMM estimator is p p.
Finally, using Assumption
6.4 Optimal and two-step GMM

Optimality of GMM: what is the best weighting matrix
AN , the
one giving us the smallest asymptotic variance-covariance matrix.
1 0
1
0
0
Aopt
N = arg min (FN AN FN )) FN AN VN AN FN (FN AN FN ) :
AN
We now use the following lemma.
Lemma 3
The matrix
(FN0 AN FN )) 1 FN0 AN VN AN FN (FN0 AN FN ) 1
is positive semi-denite 8AN .
(FN0 VN 1FN ) 1
104
If we select
AN = VN 1, we get
(FN0 AN FN )) 1 FN0 AN FN (FN0 AN FN ) 1
(FN0 AN FN ) 1
= 0:
Hence, best weighting matrix for GMM: inverse of the variancecovariance of moment conditions.
For this choice, variance of GMM is simply
1
V ar(^N ) = FN0 (^N )VN 1FN (^N )

"
!0
!# 1
h
i
^
^
1
1 @f (x; N ) 1
1 @f (x; N )
V arf (x; ^N )
N @
N
N @
and this denes the optimal GMM. But: in general, no condition imposed on distribution of
u (this is an interesting feature of
GMM, compared to IV).
Empirical issue: nd an estimate of
VN
that produces a
heteroskedasticity-robust GMM estimator for .

Solution: use a two-step estimation procedure
Step 1.
Compute an initial consistent estimator
arbitrary matrix for
AN (A1N ):
^1N
using an
^1N = arg min u0()ZA1N Z 0u():

Step 2. Compute V^N from u(^1N ) and nd ^2N such that
^2N = arg min u0()Z (V^N ) 1Z 0u():

Disadvantage: Two-step GMM estimators are independent from

initial matrix
A1N
only asymptotically. In small samples, GMM
6.5.
105
INFERENCE WITH GMM
estimators may not be unique, depending on that choice. Several

solutions:
Method 1.
Use an iterative algorithm for estimation, succes-
sively replacing
^N
and
AN
until full convergence.
Method 2. Acknowledge the fact that optimal weighting matrix

depends on
, and solve
^N = arg min QN () fN ()0AN ()fN ():

In practice, construction of variance-covariance matrix depends on the nature of data: cross-sections, times series, or panel
data (see dedicated section below).
6.5 Inference with GMM

Advantage of GMM over many alternative estimation procedures:
easy to provide statistical inference on model validity. In general,
we will test for the validity of moment conditions, also denoted orthogonality conditions.
Recall GMM procedure:
^N = arg min QN () = fN () VN fN (), where fN ()

P
1 N f (x ; ) and V is a consistent estimator of
i
N
i
N
V = limN !1 var[ NfN (0)].
First-order condition associated with minimization of QN ( ):
Find
@QN (^N )
^N )0VN 1fN (^N ) = 0;
=
F
(
N
@ ^N
where FN (^
N ) = @fN (N )=@.
If ^
N satises FOC above, it must also satisfy
P^ VN 1=2fN (^N ) = 0;
106
where
P^ = M^ (M^ 0M^ ) 1M^ 0
and
so that
M^ = VN 1=2FN (^N );

P^ VN 1=2fN (^N ) = VN 1=2FN (^N ) FN (^N )0VN 1FN (^N )

FN (^N )0VN 1fN (^N ):
1
Population analog to condition above:
P V 1=2E [f (0)] = 0;
where
P = M (M 0 M ) 1M 0
and Fi ( ) = @f (xi; )=@ .
Projection matrix
only
If
and
is of rank
p,
linear combinations of the
M = V 1=2E [Fi(0)];
so that restrictions above set
q1
vector
E [f (xi; 0)] to
0.
0 is identied, then these are the identifying conditions, and
the remaining conditions are unused in estimation.

The identifying restrictions determine the asymptotic distribution
of
^N :
N (^N
p
0
1
0
1
=
2
0) = (M M ) M V
NfN (0) + op(1);
M pN (^N 0) is asymptotically equivalent to

P V 1=2 NfN (0). This implies
p ^

d
N (N 0) !
N 0; (M 0M ) 1 :
where
The basic way of testing for model validity is to use the over-
identifying restrictions
(Iq
P )V 1=2E [f (xi; 0)] = 0;
6.5.
107
INFERENCE WITH GMM
which are or rank
p.
We have the sample analog:
P^ )VN 1=2fN (^N ) = VN 1=2fN (^N ):

QN (^N ) measures the extend to which
(IQ
Interpretation:
the data
satises the over-identifying restrictions. The asymptotic distribution of sample moments is determined by the function of the
data in the over-identifying restrictions:
VN 1=2 NfN (^N ) = (I q P )V 1=2 NfN (0) + op(1);

^ converges in probability to P . We nally have
because P
VN 1=2 NfN (^N )
d
!
N (0; Iq
P):
Both statistics (from identifying and over-identifying restrictions)

are orthogonal:
p
p
Cov[ N (^N 0); NfN (^N )]
= (M 0M ) 1M 0 (Iq P ) = 0:
It is equivalent to test model validity by testing either
H0 : E [f (xi; 0)] = 0
or
H0 : V 1=2E [f (xi; 0)] = 0;
V 1=2 is non-singular. H0 is the combination of identifyI

O
ing restrictions (H0 ) and over-identifying restrictions (H0 ):
H0I : P V 1=2E [f (xi; 0)] = 0;
H0O : (Iq P )V 1=2E [f (xi; 0)] = 0:
because
H0I because this is a set of p conditions,

O
by estimated sample moments. But H0
It is not possible to test for

automatically satised
can be tested because they are not necessary for identication.

The test statistic proposed by Hansen (1982) is
JN = NQN (^N )
d
!
2(q
p)
under
H0:
108
It can be shown that
JN
A 0
JN v
zq (Iq
where zq v N (0; Iq ).
is asymptotically equivalent to
P )0 (Iq
P )zq = zq0 (I
P )zq ;
6.6 Extension: optimal instruments for GMM

We have seen above how to obtain the optimal GMM estimator,
by selecting for the weighting matrix the inverse of the covariance
matrix for the moment conditions. We now show how to obtain
an even more ecient GMM estimator, based on the best choice
for the instruments. We are looking for the optimal, asymptotic
variance minimizing choice of instruments.
Based on Newey 1993, Ecient estimation of models with conditional moment restrictions.
6.6.1
Conditional moment restrictions
(z; ) denote a s 1 vector of residuals, where z is a p 1

vector of observations (on all variables), and is the q 1 vector
Let
of parameters. We have the following moment restrictions
E [(z; 0)jx] = 0
E [A(x)(z; 0)] = 0;
where x is a vector of conditioning variables, A(x) is an r s
matrix of functions of x, and 0 the true value of parameters.
Focus of the analysis here: choose A(x) to minimize the asymptotic variance of the GMM estimator.
Let
@(z; 0)
D(x) = E
jx ;
@
(x) = E [(z; 0)(z; 0)0jx]:
6.6.
109
EXTENSION: OPTIMAL INSTRUMENTS FOR GMM
The optimal instruments are given by
B (x) = C:D(x)0
(x) 1;
where
is an arbitrary, nonsingular matrix, and the asymptotic
covariance matrix for these instruments is

= E [D(x)0
(x) 1D(x)] 1 :
Example: Linear model with heteroskedasticity
We have in the model
D(x) = x0;
y = x00 + "; E ("jx) = 0,
(x) = E ("2jx); C = I; B (x) = x=2(x):
The corresponding IV estimator is in this case the weighted least
1=2(x):
1 corrects for heteroskedasticity,
Analogy with linear model:
(x)
and derivatives @(z; 0)=@ correspond to regressors, and matrix
D(x) is a function of x closely correlated with those derivatives.
squares estimator with weight
Since
does not depend on C , we can set C = I
and dene
mA = FN0 AN A(x)(z; 0); mB = B (x)(z; 0);

so that
E (mAm0B ) = FN0 AN E [A(x)

(x)B (x)0] = FN0 AN E [A(x)D(x)]
and
= FN0 AN FN ;
E (mAm0A ) = FN0 AN VN AN FN ; [E (mB m0B )] 1 = :
Therefore,
(FN0 AN FN )) 1 FN0 AN VN AN FN (FN0 AN FN ) 1
= (E [mAm0B ]) 1 E [mAm0A ] (E [mB m0A ]) 1
(E [mB m0B ]) 1
110
= (E [mAm0B ]) 1 E [mAm0A ]
where
E [mAm0B ] (E [mB m0B ]) 1 E [mB m0A ]

E [mB m0A] = E [RR0];
n
R = (E [mAm0B ]) 1 mA
Since
E [mAm0B ] (E [mB m0B ]) 1 mB :
E [RR0] is positive semi-denite, is a lower bound for the
asymptotic variance.
6.6.2
A rst feasible estimator
Optimal instruments
B (x) cannot be used, because they depend
on unknown parameters and/or functions. Assume
D(x) = D(x; 0)
and
(x) =
(x; 0);
D(:) and
(:) are known, and is a real vector. Because D (x) and
(x), we could estimate 0 by running
a linear regression of @(z; ^
)=@ and (z; ^)(z; ^)0 on x. This
^ (x) = D(x; ^)0
(x; ^) 1 and the resulting GMM estimator
gives B
where functions
would be
^ = arg min
8
n
<X
2 :
i=1
(zi; )0B^ (xi)0
"
n
X
i=1
B^ (xi)B^ (xi)0
# 1
n
X
i=1
9
=
B^ (xi)(zi; ) :
This estimator is always consistent, but not ecient if functions
D(x; ) and
(x; ) are misspecied.
Example: Nonlinear model with heteroskedasticity

Consider
y = f (x; 0) + "; E ["jx] = 0; E ["2jx] = h(x; 0; 0);
6.6.
where
111
h(:) is known.
Model with restricted rst two conditional moments only.
Ex-
ploiting additional information on second moment yields an IV

estimator at least as ecient as weighted least squares.
Drawback: estimator may not be consistent if the form of heteroskedasticity is misspecied.
Dene moment restrictions as
(z; ) = y
f (x; ) ; [y

f (x; )]2 h(x; ; ) 0 :
Optimal instruments take the form
@f (x; )=@ 0
0
D(x) = D(x; 0); D(x; ) = @h(x; ; )=@ 0 @h(x; ; )=@0 ;
(x) = V ar[("; "2)0jx];
B (x) = D(x)0
(x) 1:
Empirical issue: when is incorporating additional moment condition yielding a more ecient estimator ?
Asymptotic variance of the heteroskedasticity-corrected least squares
estimator:

E ["2jx]
@f (x; 0)
@

@f (x; 0) 0 1
;
@
to be compared with block corresponding to

in E [D(x)0
(x) 1D(x)] 1.
The two are equal if
E ["3jx] = 0, or
h(x; 0; 0) = h(x; 0).
Otherwise, the asymptotic variance of the heteroskedasticity-corrected
least squares estimator will be larger than the conditional moment
bound.
Corollary:
Gain in eciency exists even if
not depend on
x or !
h(x; ; ) and
(x) do
112
Computation of an ecient estimator
Needs specication of
(x),
and in particular, conditional third and fourth moments.

Assume
E ["3jx] = 0;
V ar("2jx) = 0h(x; 0; 0)2;
0 can be estimated by the sample vari2

^
^ ^ ), where ^ and ^ are initial estif (xi; )] =h(xi; ;
where kurtosis parameter

ance of
[yi
mators.
Estimated optimal instruments are then
"
^
0
^ x) = h(x; ; ^ )
D^ (x) = D(x; ^);
(
^ ^ )2 ;
0
^:h(x; ;
^ x) 1:
B^ (x) = D^ (x)0
(
6.6.3
Nearest-neighbor estimation of optimal instruments
Advantage:
avoid misspecication in
D(x; 0)
and
(x; 0)
in
computing optimal instruments.

Principle: estimate expectations that enter optimal instruments
nonparametrically (these expectations are conditional upon
x).
6.6.3.1 The nearest neighbor estimator

Simplest nonparametric estimator: nearest neighbor, or
NN
estimator.
The nearest neighbor estimator of conditional expectation is

constructed by averaging over the values of the dependent variable for observations where the conditional variable (x) is closest
to its evaluation value.
6.6.
113
xl denote a measure of scale of lth component of x (standard deviation). x being of rank r , dene
Let
jjxi
xj jjn =
r
X
(xil
l=1
^ l
xjl )2
)1=2
This measures the distance between observations

ing for the multivariate nature of
K; K n, and
8
<
:
Integer
vation
i.
!kK 0
!kK = 0
x.
i and j , account-
Consider now a given integer
1 k K;
for
k > K;
PK
k=1 !kK = 1:
for
and
n is the number of nearest neighbors for any obser-
j 6= i according to distance
th
above. Then assign the weight Wij = !jK to observation with j
smallest distance jjxi
xj jjn.
Let
Wii = 0
and rank observations
!kK = 1=K; k K .
To compute conditional expectation of y given x:
Select the set of the K (out of n) xi's closest to point x;
Compute the mean of the yi values corresponding to the xi's
Example: uniform weights
chosen above:
K
1X

E (yjx) =
!kK yk (x) =
yk (x);
K k=1
k=1
K
X
where
yk(x)
yi's ordered according to distance

is the yi whose xi is closest to x, y2 is
are the original

measure dened above (y1
the second closest, and so on).
114
Other possibility:
k NN estimator with non-uniform weights.
Stone (1977) suggests the following estimator
E (yjx) =
n
X
j =1
!j yj(x);
using either triangular weights:
!jT =
2(K
0
j + 1)=[K (K + 1)]
j < K;
for j K;
for
of quadratic weights:
!jQ
6(K 2
0
(j
1)2]=[K (K + 1)(4K
1)]
j < K;
for j K:
for
6.6.3.2 Application to optimal instruments estimation

The nearest neighbor estimator of the conditional covariance
at
xi is
^ xi) =
where observation
n
X
j =1
(xi)
Wij (zj ; ^)(zj ; ^)0;
i is excluded because Wii = 0 (leave-one-out
procedure).
D(x) is accordingly
n
X
@(zj ; ^)
^
D(xi) =
Wij
:
@
j =1
The nonparametric estimation of
Assume now some components of

form, and depend only on
x.
D(x)
have known functional
The estimator will consist in the sum
of both parametric and nearest neighbor components. Let
D(x; )
6.6.
115
x and nuisance parameters .

D(x; ) has the same dimension as D(x), and its components are
equal to those of D (x) that are known, and 0 otherwise.
denote a pre-specied function of
The estimator will be
D^ (xi) = D(xi; ^) +
n
X
j =1
Wij
@(zj ; ^)
@
D(xj ; ^) :
Finally, we can compute
n
X
1
0
1
^ xi) ; ^ =
^ xi)D^ (xi)
B^ (xi) = D^ (xi)
(
D^ (xi)0
(
n i=1
6.6.4
! 1
Generalizing the approach: other nonparametric

estimators
6.6.4.1 Conditional moment estimation

We wish to estimate the conditional expectation at the point
x, E (Y jX = x) = m(x), with
m(x) =
Z 1
X=
f (y; x)
y
dy;
f1(x)
1
f (:; :) and f1(:) are respectively the joint density of (x; y)

and the marginal density of x.
A nonparametric alternative to k
NN will consist in estimating densities above nonparametrically, to construct m
^ (x) =
i
R1 h
^
^
1 y f (y; x)=f1(x) dy . Popular approach in practice: the Nadaraya-
where
Watson kernel-based estimator.
116
6.6.4.2 Nonparametric density estimation

Let
F (x) denote the cumulative density function of X .
The den-
sity function is
f (x) =
d
F (x + h=2) F (x h=2)
F (x) = lim
h!0
dx
h
P rob (x h=2 < X < x + h=2)

:
h!0
h
For estimating f (x) based on observations x1 ; : : : ; xn , we consider h a function of n such that h
! 0 when n ! 1. The
= lim
probability above is then estimated by the proportion of observations falling in the interval
1
f^(x) =
nh
1
=
nh
(x
Number of

Number of
x1
n
1 X
=
1I
nh i=1
n
1 X
=
1I
nh i=1
1

2
h=2; x + h=2):

h
h
;x+
2
2
x1; : : : ; xn in x
x
;:::;
xn
in
( 1=2; 1=2)
1 xi x 1
h 2
2

1
;
i
2

xi
f^(x) is the per unit relative frequency in the interval (x h=2; x +

h=2), with midpoint x. Bandwidth h measures the degree to which
the data are smoothed (averaged) in computing f^(x).
This rst, naive nonparametric density estimator as been proposed by Fix and Hodges, 1951, and obtains by averaging the
xi's
6.6.
in an interval around
117
x, e.g., x h=2, where h is the interval width.
Density estimators using indicator functions are stepwise by
xi h=2.
nature and have jumps at
If one prefers smoother sets
of weights, one can replace the indicator function by a positive kernel function denoted
K (:).
estimator is
The Parzen-Rosenblatt kernel density
n
n
X
X
x
x
1
1
i
K
=
K ( i) ;
f^(x) =
nh i=1
h
nh i=1
where the kernel function has the following properties:
Z 1
K ( )d = 1; K (
1) = K (1) = 0;
and may or may not be symmetric.
Note: Easy to generalize to multivariate density estimation,

with a multivariate kernel
and
K (x)dx = 1:
K1(:; :) such that K (x) = K1(x; y)dy

n
1 X
z z
^
^
f (y; x) = f (z ) = q+1
K1 i
;
nh i=1
h
where
xi has rank q, zi is the ith observation (yi; xi) and z = (y; x)
is a xed point.
6.6.4.3 Selection of bandwidth parameter

Important issue: selection of the optimal bandwidth parameter,
h.
For this, we need the following set of assumptions:
(A1) Observations
(A2) Kernel
x1; : : : ; xn are i.i.d.
K (:) is symmetric around 0 and satises
118
R
R K (2 )d
(ii)
(
R K
2
(i)
(iii)
= 1,
)d = 2 6= 0,
K ( )d < 1.
(A3) Second-order derivatives of

in some neighborhood of
x.
are continuous and bounded
h = hn ! 0 as n ! 1.
(A5) nhn ! 1 as n ! 1.
(A4)
With these assumptions, we can approximate the bias and variance of
f^:
h2 00
^
Bias [f (x)] =
f (x);
2 2
Strategy for choosing
h:
1
var [f^(x)] =
f (x)
nh
K 2 ( )d :
minimize the Mean Integrated Squared
Error (MISE):
Z h
i2
f^(x) f (x) dx =
Z h
(Bias f^)2 + Var (f^) dx;
or preferably, its approximation (AMISE):
1
AMISE = 1h4 + 2(nh) 1;
4
Z
where
1 = 22 [f 00(x)]2dx; 2 =
K 2( )d :
2 if O(h4 ) and variance is O(nh) 1, AMISE if of order

maxfO(h4); O(nh) 1g. Hence, the only value of h for which the
Since Bias
two are of the same order of magnitude is
h / n 1=5;
for which
AMISE = O(n 4=5):
6.6.
119
6.6.4.4 The Nadaraya-Watson kernel-based estimator

The estimator is
m
^ (x) =
"
1 Pn K1 yi y ; xi x
iP
=1
h
h
y
dy;
n
x
x
q
1
i
K
1 (nh )
i=1
h
Z 1
(nhp)
K (:) and K1(:; :) are q-multivariate and p-multivariate kernels respectively, and p = q + 1 (recall x has rank q ). Dene
i = h 1(yi y) , y = yi hi. The numerator above becomes
where
Z 1
(nhp) 1
n
X
i=1
Z
n
1X
y
=
n i=1 i
n Z 1
1X
n i=1
(yi
1
1
h )K1 ;

K1 ;
xi
h p+2K1 ;
xi
xi

hd
h q d

d;
and since the last term is zero for symmetric kernels, we nally
have
n
1X
=
yh
n i=1 i
Z 1
K1 ;
xi

d
n
1X
x x
=
:
yih q K i
n i=1
h
Hence the nal nonparametric kernel estimator is
m
^ (x) =
" n
X
i=1
xi
# 1 "X
n
i=1
xi
yi :
G
Special case: the General Nearest Neighbor estimator (
NN ).
120
We consider here weights similar to kernel functions with unbounded support:
m(x) = E (Y jX = x) =
where
and
n
X
i=1
!is(x)yi;
xi x
d
!is(x) = Pn
xi x ;
K
i=1
d
th nearest
distance between x and its K
d is the
neighbor.
Numerous papers on optimal choice of window width

One can show (Mack, 1981) that
and
K.
are linked in kernel
estimation:
K = nh4=(4+q)
and
K opt / n4=5; hopt / n 1=5:
Chapter 7
GMM estimators for time series
models
7.1 GMM and Euler equation models
Lucas critique (1976): evaluations based traditional dynamic simultaneousequation models are awed because parameters are assumed invariant across dierent policy regimes.
Hence, marginal response to a change in policy instruments is not
to be expected from rational agents taking into account policy
changes in their decision making.
Standard estimation procedures (MLE) are computationally burdensome when one introduces taste and technology parameters.
Hansen and Singleton (1982): GMM can be applied easily to

structural models, to draw inference on these parameters.
7.1.1
Hansen and Singleton framework
Consumption-based asset pricing model: representative agent chooses

consumption and investment in a single asset to maximize dis-
121
122
CHAPTER 7. GMM ESTIMATORS FOR TIME SERIES MODELS
counted utility
max E0
"
1
X
t=0
t U (Ct) ;
where
Et(:) is expectation operator at time t, conditional on information

set t ,
Ct is consumption,
t is a constant discount factor,
U (:) is a strictly concave utility function.
Budget constraint:
Ct + Pt Qt RtQt 1 + Wt;
where
Rt: pay-o for asset (bought in period t 1),

Pt and Qt: price and quantity of asset bought,
Wt: labor income.
Asset price is deated by the price of consumption good.
First-order condition:
Pt U 0(Ct) = Et[Rt+1U 0(Ct+1)];

Equivalently,
where
Rt+1 U 0(Ct+1)
Et
U 0(C )
Pt
t
U 0(:) = @U=@C:
1 = 0:
This is the Euler equation for the system.

Specication of the utility function:
Ct
U (Ct) = ;

with
< 1;
7.1.
123
GMM AND EULER EQUATION MODELS
so that
where
7.1.2
R
C
Et t+1 t+1 1 = 0;
Pt
Ct
1.
(7.1)
GMM estimation
Maximum-Likelihood Estimation: specify conditional distributions

of
R
LW1;t+1 = log t+1
Pt
and
C
LW2;t+1 = log t+1
Ct

given
t ;
and maximum likelihood function based on these, subject to restriction (7.1).

Disadvantage:
computer-intensive method, and possible biased
inference if conditional distributions are misspecied.
Consider GMM estimation of Equation (7.1); to identify parameters
and we need at least two moment restrictions.
The
rst one obtains from using the Law of Iterated Expectations:
R
C
E t+1 t+1
Pt Ct
1 =E

R
C
t+1 t+1
Pt Ct

= 0:
Additional restriction obtain from incorporating the rational

expectations hypothesis: agents use all available information at
t, t, so that
If yt+1 2
= t but zt 2 t then Et(yt+1zt) = [Et(yt+1)] zt:
If Et (yt+1 ) = 0, by the Law of Iterated Expectations, we have
E (yt+1zt) = 0, and the Euler equation implies

Rt+1 Ct+1
E ["t+1(; )zt] = 0 where "t+1 =
1;
Pt
Ct
time
124
and
zt
is a vector of variables contained in information set
Valid candidates are
Ct i; Rt i; Pt i; i 0.
t.
Notes.
This example shows that model errors need not be linear in

endogenous variables for GMM.
If replaced by 1=(1 + r) where r:

rate, model is just identied (for
observed constant interest
) through Euler equation (7.1).
7.2 GMM Estimation of MA models

Consider estimation of a pure moving average MA(1) model
yt = "t + 0"t 1;
where
7.2.1
"t is an i.i.d.
process with 0 mean, variance
(7.2)
02, and j0j < 1.
A simple estimator
Implied rst-order autocorrelation is
0 =
E (ytyt 1)
0
=
:
E (yt2)
1 + 02
Replacing unknown parameter
^T =
we obtain estimator
^T
0 by sample estimator
PT
t=2 yt yt 1 ;
PT
2
t=2 yt
by solving
^2T
^T 1^T
1 = 0:
7.2.
125
GMM ESTIMATION OF MA MODELS
j^T j < 0:5, but this may

not be veried in nite samples, especially if j0 j close to 1. We
Problem: Solution is real-valued only if
may dene
~T =
and solution for
~T
8
<
0:5
^T < 0:5;

if j
^T j < 0:5;
if
^T > 0:5;
if
^
: T
0:5
is
~T =
Second structural parameter:

rived from
1 4^2T
:
2^T
02, whose expression can be de-
E (yt2) = 02(1 + 02);
with sample analog
(1=T ) Tt=1 yt2

2
~ T =
:
2
~
1+
Consider now estimation in a GMM framework.
= (; 2)0 and let

2
y
t yt 1
f (yt; ) =
;
yt2 2(1 + 2)
such that Ef (yt ; 0 ) = 0 (theoretical moment condition).
Dene parameter vector
Sample moments are
T
1X
f T ( ) =
f (y ; ) =
T t=1 t
(1=T ) PTt=1 ytyt 1 2

(1=T ) Tt=1 yt2 2(1 + 2)
fT (^T = 0
~
^T = ~T = (T ; ~ 2T ).
This system is just-identied, and solving

same estimators as above:
yields the
126
Estimators ^T and ~T are consistent and asymptotically normal with distribution
Theorem 4
p
pT (^T
0)
0)
T (^T
where
1
=
(1 02)2
v N (0; );
1 + 02 + 404 + 06 + 08

20203(2 + 02 + 04)
20203(2 + 02 + 04) 204(1 202 + 304 + 206)

0 0
+
;
0 4
with 4 the fourth-order cumulant of "t.
Under the normality assumption, asymptotic variance of the
MLE of
^T
(1
is
02).
Hence this GMM estimator is asymptot-
ically as ecient as MLE only if
0 = 0, and is rather inecient
in general.
7.2.2
A more ecient estimator
This estimator is based on
Autoregressive Approximation (Durbin
1959).
The MA(1) dened by (7.2) is invertible, therefore it admits
an AR representation:
yt =
where
1
X
j =1
j (0)yt j + "t;
j (0) = ( )j ; 8;
j = 1; 2; : : :
7.2.
127
GMM ESTIMATION OF MA MODELS
which is approximated in practice by
yt =
K
X
j =1
j (0)yt j + "Kt:
(7.3)
This approximation produces an extra error because
"Kt = "t +
1
X
j =K +1
j (0)yt j = "t + ( 1)K +10K +1"t
1:
We can look at (7.2) as the structural model, and Equation (7.3)

as the reduced-form model. The AR model captures second-order
properties of
yt
instead of the autocovariance function. We need
then to dene estimators for
0 based on estimators for j (0).
K -vector
0
1
1()
A 8 ; with j () = ( )j ;
AK () = @ ...
K ( )
^K denote the K -vector of OLS estimators (^1; : : : ; ^ K )
and let A
Dene the
in (7.3).
For an given
K , we dene

^T K = arg min A^K

2
where
7.2.3
0
AK () VT K
= ( 1; +1) and VT K
is a
A^K
K K
AK () ;
weighting matrix.
Example: The Durbin estimator
We can write
j () = j 1; j = 1; 2; : : : ;
(7.4)
with
0() = 1:
128
exact autoregressive relationship for j (), and we can

estimate 0 by regressing OLS estimates (^
1; : : : ; ^ K ) on lagged
values of themselves, i.e., on 1; (^
1; : : : ; ^ K 1). The estimator is
This is an
P
^D =
K
^ j ^ j 1
j =1
P

K
2
^j
j =1
with
^ 0 = 1:
And in terms of (7.4):
VT K = BK ()0BK ();
and
LK : K K
where
BK () = IK + LK ;
matrix with 1's on the rst lower o-diagonal
and 0's elsewhere.
7.3 GMM Estimation of ARMA models

To simplify exposition, we concentrate on the ARMA(1,1) case.
7.3.1
The ARMA(1,1) model
The model is
yt = 0yt 1 + "t + 0"t 1;
where we assume
0 6= 0;
j0j < 1; j0j < 1;
and we view the model as a regression
yt = 0yt 1 + ut;
where
ut = "t + 0"t 1;
0 is the parameter of interest.
(7.5)
7.3.
129
GMM ESTIMATION OF ARMA MODELS
OLS estimation (ignoring the moving average structure in

is inconsistent because
yt =
7.3.2
1
X
j =0
0j ut
ut)
E (ytyt 1) 6= 0, since by back substitution:
E (utyt 1) = E (utut 1) = "20:
(7.6)
IV estimation
ut implies E (utut j ) = 0 8j 2, and

(7.6) implies that E (ut yt j ) = 0 8j 2. We can use these moment conditions to estimate consistently 0 with an IV procedure.
The MA(1) structure on
Moment conditions are
Ef (yt; 0) = 0
f (yt; ) = (yt
where
yt 1)yt 2;
T
1X
fT ( ) =
(y yt 1)yt 2;
T t=3 t
^ T ) = 0 for ^T gives
and solving fT (
^ T =
T
X
t=3
yt 2 yt 1
! 1 T
X
t=3
yt 2yt:
^ T is consistent and asymptotically normal, with

T (^ T 0) v N (0; ),
(1 02)(1 + 402 + 400 + 4003 + 20202 + 02)
=
:
(1 + 00)2(0 + 0)2
Theorem 5
In contrast, the asymptotic distribution is the MLE from the

ARMA(1,1) model is
T (^ MLE
(1 + 00)2(1 02)
0) v N 0;
:
(0 + 0)2
130
Notes. Both these estimators have a large variance when

is close to
0.
The MLE is more ecient than GMM, especially for large values
of
0 and 0.
We can also consider augmenting the set of instruments (the

model becomes over-identied) by including
yt j ; j = 2; 3; : : :,
yielding
T
X
^ Tj =
t=j +1
yt 1yt
! 1
j
T
X
t=j +1
yt yt j ;
for
j 2:
^ Tj has asymptotic variance 0 2(j 2).

^ T is the most ecient of these
Because j0 j < 1, it follows that
Dolado 1990 shows that
estimators. Intuition: eciency of IV estimator related to the cor-
yt 1 )
relation between stochastic regressor (

variable (
yt
j ).
(rapidly) with
Since
yt
and its instrumental
is stationary, this correlation decreases
j , and it is best to choose the smallest j admissible.
Finally, last possibility is to use more than 1 instrument for
yt 1
implied by moment conditions
This gives the
q vector of conditions
E (utyt j ) = 0;
Ef (yt0) = 0; f (yt; ) = Yq;t 2(yt

where
8j
yt 1))
Yq;t = (yt; : : : ; yt m+1)0 is a q-vector of instruments.
estimator is
^ Tq =
T
X
t=q+2
yt 1Yq;t0 2ATq
T
X
t=q+2
2.
Yq;t 2yt 1
! 1
GMM
7.4.
131
COVARIANCE MATRIX ESTIMATION
T
X
t=q+2
yt 1Yq;t0 2ATq
X
t=q+2
Yq;t 2yt;
q q weighting matrix.
^ Tq is
The asymptotic distribution of
where
ATq
is a positive denite

T (^ Tq 0) !d N 0; "2(Rq0 Aq Rq ) 1Rq0 Aq Vq Aq Rq (Rq0 Aq Rq ) 1 ;
where
Vq = lim T varfT (0);

T !1
(1 + 00)(0 + 0)
"20j 1
:
(1 02)
1
The optimal choice for the weighting matrix being ATq = Vq , we
Rq = E (Yq;t 2yt 1);
with
have
T (^ Tq
j th element

0) !d N 0; "2(Rq0 Aq Rq ) 1 :
7.4 Covariance matrix estimation

In the time-series framework, moment conditions are dened as
E [f (xt; 0] = 0,
and the variance-covariance matrix to be esti-
mated is
T X
T
X
1
VT = T var[fT (0)] =
E [f (xt; 0)f (xs; 0)]:
T t=1 s=1
This is the average of autocovariances for the process
Let
ft = f (xt; 0)
and rewrite
function:
VT =
T 1
X
j = (T
1)
VT
f (xt; 0).
as a general autocovariance
T (j )
where
132
T (j ) =
7.4.1
(1=T ) Tt=j +1 E (ftft0 j ); j 0;

P
(1=T ) Tt= (j 1) E (ft+j ft0); j < 0:
Example 1: Conditional homoskedasticity
Linear regression model
yt = xt0 + ut;
Assume
where
E (ft) = E (xtut) = 0:
E (utjut 1; xt; ut 2; xt 1; : : :) = 0
and
E (utu0tjut 1; xt; ut 2; xt 1; : : :) = u2 :
Residual
ut is neither heteroskedastic nor serially correlated.
We
have
T
1X

VT =
T (0) =
E (xtututx0t) = u2 E (xtx0t);
T t=1
the standard OLS variance-covariance matrix. The estimator of
VT
constructed from sample moments is the MLE:
T
2X
^
V^T = u
xtx0t;
T t=1
7.4.2
where
T
1X
2
^ u =
u^2t ; u^t = yt
T t=1
^
xt:
Example 2: Conditional heteroskedasticity
Assume now that
E (utu0tjut 1; xt; ut 2; xt 1; : : :) = t2:

The covariance matrix is then
T
1X

VT =
T (0) =
E (xtututx0t);
T t=1
7.4.
133
which is consistently estimated by
T
1X
^
VT =
xtu^tu^tx0t:
T t=1
This is White's heteroskedasticity consistent estimator.
In a typical IV setup, where
ft = wt(yt
xt ); wt are instruments;
T
1X

E (u2t )wt0 wt;
VT =
T t=1
and the asymptotic covariance matrix would be
1
P X
T W
1
1 ^
P P
T W W

1 0
X PW
T
1
^ is a T T diagonal matrix with typical element u^2t , and

PW = W (W 0W ) 1W 0.
where
7.4.3
Example 3: Covariance stationary process
Assume
T (j ) = 0 for j > m, so that
VT =
m
X
j= m
T (j ):
The covariance matrix estimator is based on the sample analogue
V^T =
^ T (j ) =
m
X
^ T (j );
where
j= m
(
P
(1=T ) Tt=j +1 xtu^tx0t j u^t j ;
P
(1=T ) Tt= (j 1) xt+j u^t+j x0tu^t;
j 0;
j < 0:
134
In most cases, restrictions as in examples 1-3 above are too

strong, and an obvious idea would be to construct an estimator
V^MM
based on sample analogues to population autocovariances:
V^MM =
^ T (j ) =
where
T 1
X
j = (T
1)
^ T (j );
where
P
(1=T ) Tt=j +1 f^tf^t0 j ; j 0;
P
(1=T ) Tt= (j 1) f^t+j f^t0; j < 0;
f^t = ft(xt; ^T ).
But:
The number of estimated autocovariances grows with the sample size;
Although V^MM may be asymptotically unbiased, it is not consistent in the mean squared error sense;
Finite sample properties: In the exact-identication case, V^MM

is 0 8T .
Why sample autocovariance matrix
^ T (j )
not consistent for
j, T + 1 j T 1 ?
Suppose j = T
2; then
^ T (j ) tends to 0 as T
arbitrary
7.4.4
!1!
The Newey-West estimator
Given problem above, a rst idea is to consider models for which

autocovariance genuinely tends to 0 as
case for
! 1.
This is the
asymptotically independent processes characterized by the
mixing property.
7.4.
135
Consider two bounded mappings Y : Rk+1 ! R

and Z : Rl+1 ! R. The sequence fytg is mixing if there exists
a sequence of positive numbers fng, converging to 0, such that
Denition 2
jE [Y (yt; yt+1; : : : ; yt+k )Z (yt+n; yt+n+1; : : : ; yt+n+l)]

E [Y (:)]E [Z (:)]j < n:
We can replace the sum in the denition of
sum,
such that terms for which
p are eliminated.
VT :
V^T =
^ T (0) +
p
X
j =1
by a
truncated
is greater than some threshold
Using the fact that
the following estimator for
^ T
( j ) =
(j )0, we consider

^ T (j ) +
^ T (j )0 :
(7.7)
This is the Hansen (1982) covariance estimator (see also Hansen
p is dened as the lag truncation parameter,

1=4), so
and should go to innity at some rate, typically p = o(T
that all non-zero
T (j )'s are consistently estimated.
and Singleton 1982).
Problem with estimator in (7.7): in nite samples,

not be positive semidenite.
(1987):
with
multiply
^ T (j )
^ T (j ) may
Suggestion by Newey and West
by a sequence of weights that decrease
jj j. The Newey-West estimator is

V^T =
^ T (0) +
p
X
j =1
j
p+1

^ T (j ) +
^ T (j )0 ;
where linear weights decrease from a value of 1 for
1=(p + 1) for jj j = p with step size of 1=(p + 1).
^ T (0) down to
136
7.4.5
Weighted autocovariance estimators
Extension of Newey-West suggestion: looking for more ecient

covariance matrix estimators.
General form: weighted average of sample autocovariance matrices:
V^T =
T 1
X
s= (T
1)
!s
^ T (s);
f!sg is denoted the lag window.

Strategy: choose a lag window such that f!s g approaches 1
rapidly enough to obtain asymptotic unbiasedness;
slowly enough to ensure that the variance converges to 0.
where the sequence of weights
In practice, we concentrate on
scale parameter windows, where

s
;
!s = k
mT
mT is the scale parameter or the bandwith parameter, and

function k (:) is the lag window generator. These estimators bewhere
long to the class of
0.
kernel spectral density estimators at frequency
We assume
The function k(:) : R ! [ 1; 1] satises
k(0) = 1; k(z ) = k( z ) 8z 2 R;
Z 1
jk(z)jdz < 1;
and k(:) is continuous at 0 and "everywhere else" except at a nite number of points.
Note: When k (:) = 0 for z > 1, mT reduces to p, the lag truncation parameter.
7.4.
Let
r be the largest integer such that

kr = lim
z !0
is nite and not 0. Integer

tion
kr
137
k(:), and kr
k(z )
jzjr
r is the characteristic exponent of func-
measures the smoothness of the lag window. If
is nite for some
r0, then kr = 0 for r < r0.
Consider nally the following measure of smoothness of the spectral density function in the neighborhood of 0:
1
X
(
r
)
1
S = (2)
jj jr
(j );
j= 1
also denoted the
function:
When
generalized rth derivative of the spectral density

1
1 X
Sf () =
(j )e
2 j = 1
! 1, the limit of VT
tral density matrix of
ij :
is equal to
2
times the spec-
ft evaluated at the zero frequency (Hansen
1982).
Dene the asymptotic truncated Mean Squared Error:
T
MSEh = E min j vec(V^T
mT
where
BT
VT )0BT vec(V^T
VT )j; h ;
is a square, possibly random, weighting matrix.
We have the following result.
Theorem 6
We have
(i) If m2T =T
(Andrews 1991). Assume mT
! 0 then V^T
VT
p
!
0.
p
! 1 and BT !
B.
138
(ii) If m2Tr+1=T ! 2)0; 1( for some r 2 [0; 1) for which kr

and jjS (r) jj < 1, then
p
T=mT (V^T VT ) = Op(1):
(iii)
lim lim MSEh = 42 kr2(vecS (r) )0B vecS (r) =

T !1 h!1

Z 1
+
k2(z )dz tr(B )(I + Bqq )Sf (0)
Sf (0) ;
1 P
P
where Bqq = i j eie0j
ej ei and ei is a zero vector with 1 as
the ith element.
(i): establishes consistency of scale parameter covariance estimators for bandwidth sequences that grow at rate
o( T ).
(ii): Gives rate of convergence.

(iii): Gives asymptotic truncated Mean Squared Error.
For
j th diagonal element of V^T , asymptotic bias is

(r )
mT r kr 2Sj;j
and asymptotic variance
Z 1
m
T
(2)
2
8 Sj;j
k2(z )dz:
T
1
Criteria of choice for scale parameter window
According to the-
orem above, preferred estimators have large
Variance of these
r.
O(mT =T ) and the bias is O(mT r ).

Also, no kernel estimators with r > 2 can be positive semidenite. Hence, we should restrict attention to estimators with r = 2
kernel estimators are
(which rules out truncated and Bartlett kernels).

Optimal choice of scale parameter
mT :
according to asymptotic
truncated MSE, the scale parameter should be of order
T (2r+1)
7.4.
139
Table 7.1: Some Kernel estimators for weighted autocovariance

k(z )
r kr
Truncated
1 for jz j 1,
1 0
0 otherwise
Bartlett
1
jzj for jzj 1
1 1
0 otherwise
Parzen
1
6z 2 + 6jz j3 for 0 jz j 1=2,
2(1
jzj)3 for 1=2 jzj,
2 6
0 otherwise
7.4.6
Weighted periodogram estimators
Consider the Fourier transform of the lag window:
T 1
1 X
W (; mT ) =
!e
2 s= (T 1) s
This is also denoted the
spectral window.
is:
Kernel estimators can
be computed as weighted integrals:
V^T =
Z
W (; mT )I^T ()d;
where
I^T () = (2T ) 1

is the
sample spectral density
or
T 1
X
s= (T
1)
^ seis
periodogram,
and
W (:; :) is
the
averaging kernel.
Spectral estimators once computationally burdensome, before FFT
(Fast Fourier Transforms) became popular.
Dene the Fourier transform of
f^t as
T
1 X
(p) = p
f^teip t:
2T t=1
140
Table 7.2: Some Kernel estimators for weighted periodograms

k(z )
kr
h
i r
sin(6z=5)
25
2
cos(6z=5)
Quadratic 122 z2 6z=5
2 =10
Daniell
(sin(z )=z
2 2 =6
2(1 cos(sz ))
Tent
2 1/12
z2
The periodogram matrix can be computed at the Fourier frequencies
p =
as
2p
T
; p = 1; 2; : : : ;
T
2
I^T (p) = (p)(p)0;
and we have the nal expression for the covariance matrix:
(T 1)
X
2
V^T =
I^T (0p)W (0p; mT );
2T 1 p= (T 1)
0p = 2p=(2T 1). Within the class of scale parameter

windows with r = 2, the Quadratic Spectral window (see table)
where
minimizes the truncated MSE across at the 0 frequency (Andrews

1992).
Chapter 8
GMM estimators for dynamic
panel data
8.1 Introduction
GMM estimation was introduced as an interesting alternative to
Fixed-eects, Maximum-Likelihood or GLS estimation procedures.
But its advantages are the most obvious for estimating dynamic
panel-data models.
Consider the simple model without exogenous regressors:
yit = yi;t 1 + uit; uit = i + "it:

The Anderson-Hsiao Instrumental-variable procedure: consistent
estimates when
is xed, based on First-Dierence model trans-
formation.
Two drawbacks:
a) In IV procedure, variance-covariance matrix is restricted;
b) Only one instrument is used (either
141
yi;t 2 or yi;t 2
yi;t 3).
142
CHAPTER 8. GMM ESTIMATORS FOR DYNAMIC PANEL DATA
8.2 The Arellano-Bond estimator

Important paper: Arellano and Bond (Review Econ. Stat. 1991):
more robust procedure can be used (point a)) and more orthogonality conditions can be used (point b)).
8.2.1
Model assumptions
(i) For all
i, "it is uncorrelated with yi0 for all t;
(ii) For all
i, "it is uncorrelated with i, for all t;
(iii) For all
i, the "it's are mutually uncorrelated.
Under these assumptions, we have the set of moment conditions:
E (yisuit) = 0; t = 2; 3; : : : ; T; s = 0; 1; : : : ; t 2;
uit = "it = "it
where
"i;t 1.
This is a set of
T (T
1)=2
conditions (compare with Anderson-Hsiao, where only 1 condition

was available).
" are
correlated, i.e., we must have E ("it "i;t+s ) = 0, for
Important assumption: conditions above hold if error terms
not serially
s = 1; 1.
If serial correlation is present, we have the set of conditions:
E (yisuit) = 0; t = 3; : : : ; T; s = 0; 1; : : : ; t 3;
which gives
(T
1)(T
2)=2 conditions (we lost (T
1) condi-
tions).
By continuous substitution seen before:
1 t
i + tyi0;
yit = "it + "i;t 1 + 2"i;t 2 + + t 1"i1 +
1
8.2.
143
THE ARELLANO-BOND ESTIMATOR
yit = f ("it; "i;t 1; : : : ; "i1; i; yi0), and
so that
E (yi;t 2uit) = E (yi;t 2("it "i;t 1))

= E ("i;t 2("it "i;t 1)) = 0
because by assumption E (i "it ) = E ("it yi0 ) = 0.
8.2.2
Implementation of the GMM estimator
We need a) the instrument matrix W; b) An initial weighting

matrix.
The instrument submatrix for unit
i is of the form:
yi0 0 0
6 0 yi0 yi1 0
0 0
6
60 0 0 y
i0 yi1 yi2 0
Wi = 6

0
0
0
yi;T 2
6
4
..
.
..
.
..
.
0
0
so that Wi ui =
0
ui2 yi0
B ui3 yi0
B
B ui3 yi1
B
B ui4 yi0
B
B u y
i4 i1
B
B u y
i4 i2
B
B
B
B
B
B
@
..
.
uiT yi0
..
.
uiT yi;T 2
0
and E (Wi ui ) = 0.
..
.
..
.
..
.
..
.
..
.
yi0
C
C
C
C
C
C
C
C
C=
C
C
C
C
C
C
A
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
@
(yi2
(yi3
(yi3
(yi4
(yi4
(yi4
(yiT
(yiT
..
.
..
.
yi1) yi0
yi2) yi0
yi2) yi1
yi3) yi0
yi3) yi1
yi3) yi2
..
.
yi;T 1) yi0
..
.
yi;T 1) yi;T 2
7
7
7
7
7
5
1
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
A
144
(W 0
W ) 1:
is the variance-covariance
of " (in the transformed model). If "it is homoskedastic, we have
Initial weighting matrix for
E ("it"i;t 1) = E [("it "i;t

E ("2it) = E [("it "i;t 1)("it
E ("it"i;t+1) = E [("it "i;t
so that for unit
i, E (uiu0i) = "2H , where

2
H=
(T
2)(T
1)("i;t 1 "i;t 2)] = "2

"i;t 1)] = 2"2
1)("i;t+1 "it )] = "2
6
6
6
6
6
4
2
1 0
1 2
1 0
0
1 2
1
..
.
2) matrix.
..
.
..
.
..
.
..
.
..
.

..
.
We can use
0
07
7
07
7;
..
.
1 2
7
5
to compute the initial
weighting matrix as
A1 =
N
X
i=1
Wi0HWi:
After nding the rst-stage GMM estimator
^GMM = arg min u0W A1 1W 0u

= y0 1W A1 1W 0y 1 1 y0 1W A1 1W 0y ;
we can compute the second-stage weighting matrix as:
A2 =
where
ûi = yi
N
X
i=1
^yi; 1.
Wi0ûiû0iWi;
8.3.
145
MORE EFFICIENT PROCEDURES (AHN-SCHMIDT)
8.3 More ecient procedures (Ahn-Schmidt)

Ahn and Schmidt (1995) propose
ditions:
2 additional nonlinear con-
E (uiT uit) = 0; t = 2; 3; : : : ; T
With Ahn-Schmidt and Arellano-Bond, we
(T
2)
orthogonality conditions.
1:
have T (T
1)=2 +
Ahn-Schmidt show that they
represent all moment conditions implied by our assumptions.
8.3.1
Additional assumptions
8.3.1.1 Homoskedasticity
V ar("2it) is the same 8t, we have:

t = 1; 2; : : : ; T . This adds T 1 condi-
Under the assumption:
E (u2it)
is the same for
8i,
tions, and the nal set of conditions under homoskedasticity is
E (yisuit) = 0 t = 2; : : : ; T; s = 0; : : : ; t 2;
E (yitui;t+1 yi;t+1ui;t+2) = 0 t = 1; : : : ; T 2;
E (uiui;t+1) = 0 t = 1; : : : ; T 1;
where
ui = T1
PT
t=1 uit.
8.3.1.2 Stationarity
Cov(i; yit) is the same

The entire set of the T (T
1)=2+(2T
When stationarity assumption is added:
8t, this adds 1 condition.
2) conditions is now
E (yisuit) = 0 t = 2; : : : ; T; s = 0; : : : ; t
E (uiT yit) = 0 t = 1; : : : ; T 1;
E (uityit ui;t 1yi;t 1) = 0 t = 2; : : : ; T:
2;
146
Advantage: this set consists of linear conditions only.

The Ahn and Schmidt estimator obtains by adding to ArellanoBond instrument matrix the following block for unit
Wi =
B
B
B
@
yi2 0 ::: :::

yi3 yi3 0 :::
..
.
..
.
..
.
..
.
::: ::: :::
0
0
..
.
i:
ui 0 ::: 0
0 ui ::: 0
..
.
..
.
..
.
..
.
yi;T 1 0 ::: ::: ui
How to test for alternative assumptions: let
1
C
C
C:
A
W 1 denote the instru-
ment matrix associated with the set of conditions to be tested, and
W0
an instrument matrix associated with a smaller set of valid
conditions.
ments
Let
(W 0; W 1)
^
^0 denote GMM estimates with instru0

W 0 respectively, and J (^) and J (^ ) the
and
and
corresponding GMM criterion values.

Then under
H0 : conditions associated with W 1 are valid, we have

0
J (^) J (^ ) v 2(rank(W 1)):
8.4 The Blundell-Bond estimator

Blundell and Bond (1998) suggest to use linear moment restrictions based on assumptions for initial conditions. They propose
E (uityi;t 1) = 0 t = 3; 4; : : : ; T;
with the addition of
E (ui3yi2) = 0:
This last condition combined with the one above implies the AhnSchmidt (1995) nonlinear restrictions
E (uitui;t 1) = 0; t = 3; : : : ; T .
8.5.
DYNAMIC MODELS WITH MULTIPLICATIVE EFFECTS
147
It means that we have the following stationarity condition on the

model:
yi0 =
+ "i0:
In other terms, initial deviations from

correlated with the level of
i=(1
)
i=(1 ) itself.
must not be
The GMM estimator of Blundell and Bond combines the AhnSchmidt conditions
Wi with their new instruments dened above:
Wi 0
0
6 0 y
0
i2
6
6
Wi+ = 6 0 0 yi3
6
4
..
.
..
.
..
.

..
.
0
0
0
0
yi;T 1
7
7
7
7;
7
5
for estimating parameters in a two-equation system:
yi = yi; 1 + "i

yi = yi; 1 + i + "i:
8.5 Dynamic models with Multiplicative eects

We consider here two generalizations to multiplicative individual
eects models.
8.5.1
Multiplicative individual eects
Holtz-Eakin et al.
(1988, see also Ahn, Lee and Schmidt 2001)
suggest the model
yit = yi;t 1 + xit + uit;

uit = ti + "it;
148
where
t
is a time parameter. Unobserved heterogeneity in
is
aected by a time shock (common to all units).

Let us lag equation above one period:
yi;t 1 = yi;t 2 + xi;t 1 + t 1i + "i;t 1;

and dene a new variable rt = t =t 1 . Substracting from the rst
equation the second one premultiplied by rt , we have
yit rtyi;t 1 = (yi;t 1 rtyi;t 2) + (xit rtxi;t 1)
+"it rt"i;t 1:
This is new, nonlinear equation with parameters to be estimated:
; ; rt; t = 2; 3; : : : ; T .
The transformation used is denoted Quasi-
dierencing.
GMM estimation is applicable as before (Arellano-Bond, AhnSchmidt or Blundell-Bond), but the initial weighting matrix cannot be used anymore. Let
"it = "it
rt"i;t 1.
We have, under
homoskedasticity and no-serial-correlation assumptions:
E ("it"i;t 1) = E [("it rt"i;t 1)("i;t 1 rt 1"i;t 2)]

= rt"2
E ("2
= E [("it rt"i;t 1)("it rt"i;t 1)]
it )
= "2(1 + rt2)
E ("it"i;t+1) = E [("it rt"i;t 1)("i;t+1 rt+1"it)]
= rt+1"2:
Thus, the optimal initial weighting matrix would be
1 + r12 r2
6 r2
1 + r22
6
60
r3
6
4 :::
:::
0
:::
0
r3 0
1 + r32 0
:::
:::
:::
rT 1
:::
:::
:::
:::
1 + rT2
3
7
7
7:
7
5
8.5.
149
DYNAMIC MODELS WITH MULTIPLICATIVE EFFECTS
When the
rt's
are unknown, we must start with arbitrary val-
ues, but they would produce two-step estimates conditional on

our choice, see above. Also, as the model is nonlinear, we must
minimize the GMM numerically (no closed-form solution).
8.5.2
Mixed structure
Consider
yit = yi;t 1 + uit

where
i = 1; 2; :::; N t = 1; 2; :::; T;
jj < 1, initial conditions yi0; i = 1; : : : ; N are known, and

uit = i + tvi + "it:
is the purely stationary individual eect, and
tvi
captures
an additional, time-varying individual eect.

We assume
E (i2) = 2 ; E (vi2) = v2; E ("2it) = "2 8i; 8t;

E ("iti) = E ("itvi) = 0; E (yi0"it) = 0 8t; E (ivi) = v :
Consider the case where one of the following conditions holds:
t = s
Under condition (8.1),
= i + vi.
i
Under condition (8.2),
8t; s = 1; 2; : : : ; T;
v = v2 = 0:
let t =
8t; then uit = i + "it,
vi
(8.1)
(8.2)
where
is constant, which corresponds to the
E (u2it) =
2 + "2 and E (uituis) = 2 if t 6= s. Models uit = i + tvi + "it
usual model in terms of second-order moments because
150
and
uit = i + "it cannot be distinguished.
Consider now the case where
v = 1; 1).
and
are perfectly correlated
We have
uit = (1 + t)i + "it

uit = (1 t)i + "it
Then
vi
v = 1;
if v =
1:
if
i disappears from the error term, which becomes:

uit = tvi + "it
with
t = (1 + t); vi = i:
8.5.2.1 Inconsistency of GMM with First-Dierence and QuasiDierence

When
t's are dierent across time, rst-dierencing transforma-
tion yields:
uit
ui;t 1 = (t
t 1)vi + "it
"i;t 1;
and instruments from lagged dependent variables are correlated

with
vi:
E [(uit ui;t 1)Yis] = (t t 1)v + s(t t 1)v2
s t 2:
If quasi-dierence transformation is applied to the model:
uit
rtui;t 1 = (1 rt)i + "it
and the transformed residual depends on
E [(uit
i .
rt"i;t 1;
We have
rtui;t 1)Yis] = (1 rt)2 + (1 rt)sv
s t 2:
8.6.
151
8.5.2.2 A consistent transformation

To eliminate both eects
and
tvi,
it is necessary to use a
double-transformation: First-Dierence, and then Quasi-Dierence:
4yit
r~t4yi;t 1 = (4yi;t 1
r~t4yi;t 2) + 4"it
r~t4"i;t 1;
i = 1; 2; : : : ; N; t = 3; 4; : : : ; T , where
r~t = 4t=4t 1 = (t
t 1)=(t 1
t 2):
GMM estimators of the double-dierence model based on Quasidierencing rst and then First-dierencing residuals are not consistent when instruments include lagged dependent variables.
We would have in that case:
4 [("it
rt"i;t 1) + i(1 rt)] = 4"it
which depends on
4(rt"i;t
1)
i4rt;
i.
GMM procedures using instrument matrices from lagged dependent variables would yield consistent estimates only when the correct model transformation is performed.
8.6 Example: Wage equation

Consider the wage equation seen before, in a simpler, dynamic
form:
log wit = log wi;t 1 + 1W KSit + 2OCCit + uit;

where
wit:
OCCit:
wage rate,
W KSit:
# of years worked in the year,
dummy for blue collar job.
152
We estimate the above model under three specications:
1. Usual case uit = i + "it;

2. Multiplicative case uit = tvi + "it;
3. Mixed case uit = i + tvi + "it.
In case 1, we use a linear GMM procedure with First-dierence

transformation. In case 2, nonlinear GMM in parameters
and
rt = t=t 1; t = 3; 4; : : : ; T , and Quasi-dierence. In case 3, nonlinear GMM in parameters and r

~t = t=t 1; t = 4; 5; : : : ; T ,
and Double-dierence.
W KS; OCC ).
Instruments in all cases: weakly exogenous in level (
Table 8.1:
Parameter

1
2
First-dierence GMM
Estimate
Std. error
t-stat.
0.9465
0.0126
74.83
0.0022
0.0022
0.98
-0.0848
0.0423
-2.00
Hansen test 69.68 (7.4E-07)
8.6.
Table 8.2:
Parameter

1
2
r1
r2
r3
r4
r5
153
Quasi-dierence GMM
Estimate
Std. error
t-stat.
0.9121
0.0218
41.72
0.0150
0.0038
3.87
-0.1014
0.1007
-1.00
-0.5838
0.3856
-1.51
-0.0871
0.0974
-0.89
0.3294
0.0621
5.29
-0.1842
0.1074
-1.71
1.0401
0.5947
1.75
Hansen test 2.32 (0.99)
154
Table 8.3:
Parameter

1
2
r~1
r~2
r~3
r~4
Double-dierence GMM
Estimate
Std. error
t-stat.
0.9211
0.0460
19.98
0.0082
0.0014
5.79
-0.0394
0.0322
-1.22
-0.5272
0.2250
-2.34
-0.1188
0.1029
-1.15
0.2931
0.1009
2.90
-0.0863
0.0399
-2.16
Hansen test 19.20 (0.05)
Part III
Discrete choice models
155
Chapter 9
Nonlinear panel data models
9.1 Brief review of binary discrete-choice models
Models with qualitative variables: binary choice and multinomial
models. Brief survey of these models, for cross-section data and
the binary case :
yi = xi + ui; i = 1; 2; : : : ; N;

yi = 1
if yi > 0;

yi = 0
if yi 0;
yi and yi: respectively latent (unobserved) and observed variables;

xi: 1 K vector of regressors. Threshold 0 is arbitrary here, as
E (yi) is unknown.
9.1.1
Linear Probability model
E (yi) = P rob(yi = 1) = xi + ui:

[0; 1]. Two
possible values for residual ui : 1 xi (when yi = 1) or ui =
xi
Unreasonable, as predicted probabilility may not lie in
157
158
CHAPTER 9. NONLINEAR PANEL DATA MODELS
yi = 0). Heteroskedasticity, since V ar(ui) = P rob(yi =

0) ( xi )2 + P rob(yi = 1) (1 xi )2
(when
= (1 xi ) ( xi )2 + xi (1 xi )2
= (1 xi )[( xi )2 + xi (1 xi )]
= xi (1 xi ):
9.1.2
Logit model
Based on Logistic distribution:
exp(xi ) ;
P rob(yi = 1) = (xi ) = 1+exp(
xi )
1
P rob(yi = 0) = 1 (xi ) = 1+exp(
xi ) ;
exp(xi )
Density: (xi ) =
[1+exp(xi )] :
2
In this case,
9.1.3
V ar(ui) = 2=3.
Probit model
Based on Normal distribution:

xi = R xi = p1 exp( u2i );
1 2
2 2
R
1 exp( u2i2 );
p
xi = x+1
2
i = 2
2
ui
p1
2 exp( 22 ):
P rob(yi = 1) =
P rob(yi = 0) = 1

xi
Density:
=
Parameter
ui is N (0; 2)
is unidentied (appears in ratio
=):
is normal-
ized to 1.
Estimation method: Maximum Likelihood:
^ = arg max

N
Y
i=1
[P rob(yi = 1)]yi [1
P rob(yi = 0)]1
yi
9.2.
159
LOGIT MODELS FOR PANEL DATA
= arg min

where
N
Y
i=1
F (ixi );
F (:) is probability function ( or ), and i = 2yi
1.
In these models, inference is best drawn on a) sign of estimates;
@P rob(yi = 1)=@xi).
b) marginal eects (
When moving to panel data, we consider
uit = i + "it, so that
P rob(yit = 1) = P rob(yit > 0) = P rob("it > xit

= P rob("it < xit + i) = F (xit + i):
i )
9.2 Logit models for panel data

9.2.1
Sucient statistics
Consider rst a model with xed-eects.

Maximum Likelihood estimator: we have to estimate both
and
i; i = 1; : : : ; N , but i and are not independent for qualitativechoice models. When T is xed, MLE estimates of i are not consistent and consequently, the MLE of is not consistent either.
Individual eects i are denoted incidental parameters (their number increases with N ).
Solution: Neyman-Scott (1948) principle of estimation in the presence of incidental parameters.
statistic i for , i = 1; 2; : : : ; N
Suppose there exists a
sucient
that does not depend on
, then
the conditional density
f (yijxi; i; ) =
f (yijxi; i; )
;
g(ijxi; i; )
for
g(ijxi; i; ) > 0;
160
does not depend on
i.
then obtains by maximizing the conditional density of (y1 ; : : : ; yN ) given (1 ; : : : ; N ):

A consistent estimator of
^ = arg max

Joint probability of
yi:
h
P rob(yi) =
exp i
N
Y
i=1
P
f (yijxi; i; ):
T
t=1 yit
P
QT
t=1 [1 + exp(xit
T
t=1 yit xit
i
+ i)]
If we solve the FOC associated with maximizing the log-likelihood

wrt.
N X
T
@ log L X
=
@
i=1 t=1
and wrt.
exp(xit + i)
+ y x = 0;
1 + exp(xit + i ) it it
i:
T
@ log L X
=
@i
t=1
T
X
t=1
yit =
exp(xit + i)
+ y = 0; i = 1; 2; : : : ; N;
1 + exp(xit + i) it
T
X
t=1
exp(xit + i)
1 + exp(xit + i )
i is: i =
PT
The probability that
t yit = s is
Hence, a sucient statistic for
exp(is)
T!
Q

s!(T s)!
[1
+
exp(
x

+

)]
it
i
t
i = 1; 2; : : : ; N:
PT
t=1 yit .
X
d2Bi
exp
T
X
t=1
! )
ditxit
9.2.
161
LOGIT MODELS FOR PANEL DATA
9.2.2
Conditional probabilities
The conditional probability of
yi given i is:
i
T
exp
t=1 yit xit
P

P rob (yi i) = P
T
d2Bi exp
t=1 dit xit
P
P
( t yit)!(T
t yit)! ;
hP
where
T!
Bi is a set of indices for individual i:
(
Bi = (di1; di2; : : : ; diT )jdit = 0; 1

Set
Bi
T
X
and
t=1
dit =
T
X
t=1
yit :
yP
it for individual
T
in
t yit . Groups
represents all possible combinations of
with the same number of 1's as described
for which
PT
t yit
= 0
or
PT
t yit
= T
have probability 1, and
contribute nothing to the likelihood. Only sets of interest: when
T
y
=
s
2
]0
;
T
[
;
there
are
(
it
1
s ) =T !=[s!(T s)!] such elements,
that correspond to distinct T sequences with value s.
PT
Notes:
The second expression does not depend on and can be dropped;

To compute the above probability, we have to consider for each
s all possible sequences of 0's and 1's. Example: if T = 4 and
s = 2, we would have 6 cases and
2
3
1 1 0 0 0
1
61 0 1 07
exp(
x

)
i
1
!
6
7
T
X
X
6 1 0 0 1 7 B exp(xi2 ) C
7B
C
exp
ditxit = vec 6
6 0 1 1 0 7 @ exp(xi3 ) A
6
7
t=1
d2Bi
40 1 0 15
exp(xi4 )
0 0 1 1
162
9.2.3
Example:
T =2
Only case of interest:
yi1 + yi2 = 1.
!i = 1
!i = 0
if
if
Let
(yi1; yi2) = (0; 1);

(yi1; yi2) = (1; 0):
We have the conditional probability:
P rob(!i = 1jyi1 + yi2 = 1) =
P rob(!i = 1)
P rob(!i = 0) + P rob(!i = 1)
exp(i + yi2xi2 )
=
[1 + exp(i + xi1 )][1 + exp(i + xi2 )]
exp(i + xi1 )][1 + exp(i + xi2 )]

[1 +exp
(i + xi1 ) + exp(i + xi2 )
exp(i + xi2 )
=
exp(i + xi1) + exp(i + xi2 )
exp[(xi2 xi1) ])
=
= [(xi2 xi1) ]:
1 + exp[(xi2 xi1) ]
In that case, Bi = fijyi1 + yi2 = 1g and the conditional
likelihood is log L =
X
i2Bi
f!i log [(x2i
log-
xi1) ] + (1 !i) log f1 [(x2i xi1) ]gg :
T >P
2, we have to consider alternative sets of
T
observations for which
t yit is the same. Note that this formulation is a conditional Logit specication: regressors x depend on
In practice, when
the alternative.
9.3.
163
PROBIT MODELS
9.3 Probit models

One typically uses the Probit model in the random-eect case
(easier to work with).
uit = i + "it,
Consider a model where

distribution
where
G(:) and is independent of the xi's.
is drawn from
Assume
2
:
1 + 2
The contribution to the likelihood of unit i is Li = P rob(yi )
V ar() = 2 ; V ar("it) = 1; Corr(uit; uis) = =
=
Z i1 xi1
Z iT xiT
it = 2yit
elements in ui .
where
1
and
f (ui1; ui2; : : : ; uiT )dui1 duiT ;

f ( :)
is the joint density function of
Integration of this density is impratical when
is large, but one
can work with the conditional density, because conditional on

the
uit's are independent:
f (ui1; ui2; : : : ; uiT ) =
Z +1 Y
T
1 t=1
where the density of
2 )).
Z +1
i,
f (ui1; ui2; : : : ; uiT ji)f (i)di
f (uitji )f (i)di;
i is N [0; =(1 )] (remember = 2 =(1 +

Li as
p2 #
) dti;
itxit + it ti p
1
Butler and Mott (1982) show that we can write
1
Li(yi) = p

Z +1
"
T
Y
t2i
(
t=1
which is now a one-dimensional integral that can be evaluated numerically (Gauss-Hermite integration procedure).
of the method: assume a constant correlation
Disadvantage
) across periods.
164
9.4 Semiparametric estimation of discrete-choice

models
We consider here estimation of binary-choice panel data models
with xed eects and possibly endogenous regressors.
In the model
yit = x0it + i + "it;
i = 1; : : : ; N; t = 1; : : : ; T;
we now that the Within estimator is consistent if
E [(xit xi)("it "i)] = 0;

true if
x is strictly exogenous:
E ["itjxi1; : : : ; xiT ] = 0:
It is not sucient to assume that
x is predetermined only:
E ["itjxi1; : : : ; xit] = 0;
and in this case we have to use IV estimation strategy, e.g., tting
4yit = 4xit + 4"it;

using as instruments past values of
x.
Such an approach would not work in nonlinear models, unless

some linearization of the model is performed.
Semiparametric approach of Honor and Lewbel (2000): provide a
N -consistent semiparametric estimator for binary-choice
models, where distribution of error is unspecied.
9.4. SEMIPARAMETRIC ESTIMATION OF DISCRETE-CHOICE MODELS165
9.4.1
The binary choice model
Consider the binary-choice model
yit = 1I(it + x0it + i + "it > 0):

Negative result of Chamberlain (1993): Even if the distribution of
"it
is known, the Logit model is the only version of model above
that can provide a
N -consistent estimator for .
But this nega-
tive result can be overthrown under some additional assumptions

(e.g.,
it
is independent from
set of instruments denoted
zi).
and
"it,
conditional on
x and
Assumptions
A.1. The conditional distribution of
it given (xit; zi) is abso-
lutely continuous wrt. a Lebesgue measure with non-degenerate

Radon-Nikodym conditional density function
ft(itjxit; zi).
eit = i + "it; 8 t. eit is conditionally independent of

it (conditioning on xit and zi). The conditional distribution of
eit has support
et(xit; zi) and is denoted Fet(eitjxit; zi).
A.2. Let
t = r and t = s, the conditional distribution of it given xit and zi has support [Lt ; Kt ] with
1 Lt <
0 < Kt 1, and the support of xit eit is a subset of [Lt; Kt].
A.3. For 2 periods
A.4. Let
xtz = E (xitzi0 ) and zz = E (zizi0 ).

(i)
Then
E ("ir zi) = E ("iszi) = 0;
(ii)
E (izi); zz ; xrz and xsz exist:
(iii) zz and (xrz xsz ) 0zz (xrz xsz )0
166
are nonsingular.
The case ir = is = i is allowed;

(xit; zi) can be correlated with it, but (A.1) rules out (xit; zi)
it;
i can be correlated with xit or zi, but (nuit; i) must be independent given (xit ; zi );
"it is uncorrelated with instruments zi;
(A.2) means that the conditional distribution of "it given (it; xit; zi)
does not depend on it ;
According to (A.3), it can take on any value that x0it + eit
as deterministic functions of
(rest of latent variable) can take on.
These assumptions imply that there exist sequences
fk2g such that

P rob(yitjxit; zi; it = k1) ! 0
and
fk1g and
P rob(yitjxit; zi; it = k2)
! 1:
Practical implication: the resulting estimator will perform well

when the variance of
it is large relative to the rest of the latent
variable.
Theorem 7
Let
yit =
yit 1I(it > 0)

:
ft(itjxit; zi)
If Assumptions (A.1) to (A.3) hold, then for t = r; s,

E (yit jxit; zi) = x0it + E (i + "itjxit; zi):
Proof. Let (dropping subscripts for clarity) s = s(x; e) =
x0
e.
We have
E (yjx; z ) = E
E [y
1I( > 0)j; x; z ]

jx; z
f ( jx; z )
=
=
Z KZ
Z K
E [y
1I( > 0j; x; z )]

f ( jx; z )d
f ( jx; z )
[1I( + x0 + e > 0) 1I( > 0)] dFe(ej; x; z )d
(and because of A.2: conditional independence of
Z K
e L
[1I( > s)
e wrt. :)
1I( > 0)] ddFe(ejx; z ):
Note that
1I( > s)
1I( > 0) = 1I( > s > 0)1I(s > 0) + [1I(0 s)
+1I( > 0 s)] 1I(s 0) [1I(s > > 0) + 1I( > s > 0)] 1I(s > 0)
1I( > 0 s)1I(s 0)
= 1I(s > 0) [1I( > s > 0) 1I(s > > 0) 1I( > s > 0)]
+1I(s 0) [1I(0 s) + 1I( > 0 s)
= 1I(s 0)1I(0 > s)
,
=
E (yjx; z ) =

1I(s 0)
Z K
e L
Z 0
s
1I( > 0 s)]
1I(s > 0)1I(s > > 0)
[1I(s 0)1I(0 > s) 1I(s > 0)1I(0 < < s)]
ddFe(ejx; z)
1 d
1I(s > 0)
Z s
1 d dFe(ejx; z )
[1I(s 0) ( s) 1I(s > 0) s] =

= x0 + E (ejx; z )
QED:
sdFe(ejx; z )
168
9.4.2
The IV estimator
A corollary to the theorem above is that, under assumptions (A.1)

to (A.4),
E (ziyit ) = E (zix0it) + E (zii);
for
t = r; s:
Let

xsz )0 1 (xrz
xsz )zz1(xrz
= (xrz
xsz )zz1
t = E (ziyit ):
Then is consistently estimated by
(r
s).
and
xir
Estimation procedure: run a 2SLS regression of
xis, using zi as instruments.
yir
yis
on
We use the fact that
E (yjx; z ) = x0 + E ( + "jx; z )
,
Let
4x = xr
yis = (x0ir x0is) + E ("ir "isjx; z )

= (x0ir x0is) + E ("ir "isjx):
xs, 4y = yr ys,4 = zyr zys. The 2SLS
yir
will be

^ = (4xz 0)(z 0z ) 1(z 4x0) 1 (4xz 0)(z 0z ) 1z 4y:
Lewbel and Honor show that
N (^
where
) v N 0;
Var(Q^ i)
0 ;
can be replaced by
^ and
Q^ i = (ziyir
ziyis ) zi(xir
^
xis)0:

For computing
yit , we need a nal component:
estimate of
ft .
Feasible two-step estimation: use a kernel estimator of joint den-
(it; xit; zi) divided by a kernel estimator of joint density of

(xit; zi).
sity of
wit = (xit; zi) denote the K + L vector of explanatory and

instrument variables, and uit = (it ; wit ) (a K + L + 1 vector).
f^(it; wit) and f^(wit) respectively denote the estimated joint density function of it and components of wit , and the joint density
associated to components of wit . These densities are
Let
NT
X
K +L+1 1
f^(it; wit) = NT h
= NT h
NT Z
X

1
K +L+1
uit
uj
f^(it; wit)dit

Km
it
NT
1X
j =1
Km
j wit
;
wit
wj
h

wj
dit
Km R(:) and Km (:) are Rtwo multivariate

kernels such that Km (x) =
Km (x; y)dy and Km (x)dx = 1.
where
j =1

NT hK +L
Km
j =1
f^(wit) =
is the window,
2
The conditional density is then estimated by
f^(it; wit)
^
ft(itjxit; zi) = ^
:
f (wit)
170
9.5 SML estimation of selection models

General-purpose estimation technique for models with selection:
models with endogenous regime switching, Generalized Tobit models, etc.
Use of a particular, ecient simulator for multivariate normal
distributions: the GHK simulator (Geweke-Hajivassiliou-Keane,
Geweke 1991, Brsch-Supan and Hajivassiliou 1993, Keane 1994).
9.5.1
The GHK simulator
Consider the following likelihood function
L=
where
g("j)f ()d;
f:rg
= (1; 2; : : : ; K )0 and "
are a
K -vector
(9.1)
and a
M-
vector of normal variates respectively. This corresponds to a very

general structural model dened by
straints
r.
g(:j:),
and the set of con-
Notes:
In this model formulation, " is an implicit function of parameters and observed variables.
is typically an unobserved heterogeneity term.

Function g(:j:) may depend in particular on the conditional distribution of
" given .
Problem in practice: ML estimation would require numerical

integration involving multiple probability distributions.
Idea of the GHK technique: construct a recursive algorithm to
approximate multiple integrals.
9.5.
171
SML ESTIMATION OF SELECTION MODELS
Let
= var(),
a positive-denite matrix such that there
exists a lower diagonal matrix

decomposition):
D
Dene
B
=B
B
@
satisfying
DD0 =
(Choleski
d11 0 : : : 0
d21 d22 : : : 0 C
C
..
.
..
.
..
dK 1 dK 2 : : :
..
.
..
.
C:
A
= D 1, such that is a multivariate standard normal
variate. We have
L=
where
i(:):
K
Y
f :D rg i=1
i(i) g("jD )d;
(9.2)
= (1; 2; : : : ; K ).
f : rg can be written
standard normal density of
The domain (set of constraints)

recursively as
1
1
1
1
r1; 2 (r2 d211); 3 (r3 d311 d322);
d11
d22
d33
1
: : : ; K
(r
d : : : dK;K 1K 1):
dKK K K 1 1
Equation (9.2) becomes
L=
Z 1
"
Z 1
1
r1 =d11 d22
(r2 d21 1 )
:::
K
Y
i=1
i(i) g("jD ) d1 : : : dK :
Dene now the truncated normal density function for
(i)Ai = (i)
1
1
(r
dii i
di11
i
: : : di;i 1i 1)
(9.3)
1
172
1
where Ai =
dii (ri
di11
: : : di;i 1i 1); 1
i
, and
(:) is the
normal cumulative density function (CDF). The likelihood function above is now
L=
Z
A1
:::
"
Z
AK
K
Y
i=1
K
Y
i=1
1
(r

dii i
!
!
di11
: : : di;i 1i 1)
Ai (i) g("jD ) d1 : : : dK :
We now move from truncated normal variables to uniform ran-
i given that its

distribution is truncated normal is between 0 and 1. Let ui denote
a random variable on [0; 1]. We can then write
dom variables. The probability associated to any
ui =
(i)
1
d1ii (ri

d1ii (ri
di11
di11
: : : di;i 1i 1)
: : : di;i 1i 1)
; i = 1; 2; : : : ; K:
For example:
(1) (r1=d11)
,
1 = 1 [u1 (1 (r1=d11)) + (r1=d11)]
1 (r1=d11)
(2) (1=d22(r2 d21r1))
u2 =
1 (1=d22(r2 d21r1))

1
1
, 2 = 1 u2 1 d (r2 d211) + d (r2 d211) ;
22
22
where 1 is dened above.
For any i, we have the recursive formula:

1
i = 1 ui 1
(r d : : : di;i 1i 1)
dii i i1 1
u1 =
9.5.
173
1
+
(r
dii i
di11

: : : di;i 1i 1)
(u1; : : : ; uK ).
The likelihood function now involves random variables ui ; i =
1; : : : ; K and K integrals with constant bounds:
which depends on the sequence of uniform random variables
L=
Z 1
where
:::
Z 1" Y
K
i=1
1

(r
dii i
di11
: : : di;i 1i 1)
!
g("jD )
du1du2 : : : duK ;
i's are implicit recursive functions of the ui's.
Note: the product of conditional normal densities
QK
i Ai (i)
disappears from the likelihood function, because it is equals to
dui=di.
Since the
ui's
are i.i.d., we can approximate the likelihood
above by
Drawing S values for the vector u: fus1; us2; : : : ; usK gSs=1;

Compute recursively (1s; : : : ; Ks ) from us above;
Average out over the S draws to form the Simulated Likelihood:
LS =
"K
S Y
X
1
1
1
(r
S s=1 i=1
dii i
di11s

: : : di;i 1is 1)
Note
Easy to generalize to a restriction set of the form
a < < b.
would construct recursively:
i = q [ui; (ai
di11
: : : di;i 1i 1)=dii;
g("jD s) :
We
174
(bi
di11
: : : di;i 1i 1)=dii] ; i = 1; : : : ; K;
where
q(u; a; b) = 1 [(a) (1 u) + (b) u] :

If we want to compute for example the probability of an event
a < < b in the multivariate case, we just need to evaluate

Q( ) = Q1:Q2: : : : QK ;
where
Qi = [(bi di11 : : : di;i 1i 1)=dii]

[(ai di11 : : : di;i 1i 1)=dii] ;
and average out over simulations.
9.5.2
Example
Based on the paper: V.A. Hajivassiliou and Y.M. Ioannides 2001,

"Unemployment and liquidity constraints".
St and Et respectively.
y1t > 0;
y1t 0:
Liquidity and employment constraints:
St =
Et =
8
<
:
0
1
1
0
if
if
y2t < ;

+
if y2t < ;
+

if y2t :
if
System of latent variables:
y1 = 1I(y2 < ) 11 + 1I( < y2 < +) 12 + x11 + v1;

y2 = 1I(y1 > 0)2 + x22 + v2:
Six possible regimes, as (S; E ) in f0; 1g f 1; 0; 1g.
9.5.
S E
0
-1
-1
175
y1
y2
11 + x11 + v1 < 0
x22 + v2 <
x11 + v1 <0
< x22 + v2 < +
12 + x11 + v1 < 0
+ < x22 + v2
11 + x11 + v1 > 0
2 + x22 + v2 <
x11 + v1 > 0 < 2 + x22 + v2 < +
12 + x11 + v1 > 0
+ < 2 + x22 + v2
We can then dene bounds corresponding to
a1
a2
<
v1
v2
<
b1
b2
as follows:
S E
0
-1
-1
a1
1
1
1
( 11 + x11)
x11
( 12 + x11) +
a2

+
x22
x22
2
2
x22
x22
b1
( 11 + x11)

x11
+
( 12 + x11)
+1 2
+1 + 2
+1
Advantage of the method: only need to specify the variancecovariance matrix
for (and possibly the one for ") in g("j).
vit = i + it in the

example above, where v corresponds to " and corresponds to
In the panel data case: We would have
in our general notation above.
We can then construct the distribution of full error term condi-
tional on
(Recall discussion on panel Probit model).
b2
x22
x22
+1
x22
x22
+1
176
Allows for multivariate distributions for individual eects, possibly correlated across equations.
Allows for various serial and contemporaneous correlations across

the
it's (pure one-way random eects, serial correlation, etc.)
177
Appendix 1. Maximum-Likelihood estimation

of the Random-eect model
and ", the log-likelihood is
NT
N
1 0 1
log("2)
log()
U U;
2
2
2"2
Under normality assumption for
log L =
where
NT
log(2)
2
=
="2 = Q + B , and
j
j = ("2)N (T
1)( 2 + T 2 )N = ( 2)NT N :
"

"
Concentrated log-likelihood wrt.
":

NT
1
N
NT
log(2)
log d0 Q + B d
log();
log L =
2
2

2
where d = Y
X ^ .
Estimate of 1= conditional on :
P P
0 Qd
d
dit di)2
i t (P
1d
= =
=
:
(T 1)d0Bd T (T 1) i(di d)2
Estimate of
conditional on 1=:

1
X0 Q + B X
1
1
X 0 Q + B Y:
Maddala (1971): there are at most 2 maxima for the log-likelihood

(problem of local maximum).
Breusch (1987) procedure: iterate between
vergence.
^ "2 and 1d
= until con-
Starting with ^ W ithin and 1= = 0, the next 1d

= is positive and
starts an increasing sequence;
178
APPENDIX 1. MLE OF THE RANDOM-EFFECT MODEL
Starting with ^ Between and 1= ! 1, the next 1d

= is positive
and starts a decreasing sequence.
Since at most 2 maxima, use both as starting values. If

maximum of log L is the same, this is the true maximum.
179
Appendix 2. The two-way random eects model

A2.1 Assumptions and notation
Assumptions:
i v IID(0; 2 ); t v IID(0; 2 ); "it v IID(0; "2);

E (it) = E (i"it) = E ("it) = 0;
and
is independent of
i; t and "it.
We have
E (uitujs) =
8 2
2
2
< + + "
2
: 2

i = j; t = s;
if i = j; t 6= s;
if i 6= j; t = s:
if
Variance-covariance matrix of error term is
= 2 (IN
eT e0T ) + 2 (eN e0N
IT ) + "2(IN
IT )
= T 2 B + N2 B + "2INT :
A2.2 Feasible GLS estimation
We can write
P4
j =1 j Mj ,
1 = "2
2 = T 2 + "2
3 = N2 + "2
4 = T 2 + N2 + "2
with
M1 = (IN eNNeN )
(IT
0
0
M2 = (IN eNNeN )
eETeT
0
0
M3 = eNNeN
(IT eTTeT )
0
0
M4 = ( eNNeN )
( eTTeT ):
eT e0T
T )
180
APPENDIX 2. THE TWO-WAY RANDOM EFFECTS MODEL
We have
r =
P4
r
j =1 j Mj ,
so that
4
X

"
1=2 =
p" j Mj
j =1
and the typical element of
yit = yit
with
1 = 1
p" 2 ; 2 = 1
Y = "
1=2Y
1yi

is
2yt + 3y;

p" 3 ; 3 = 1 + 2 +
GLS estimates obtain by OLS regression of
p" 4
1:
Y on X .
V ar(Mj U ) = j Mj ; j = 1; 2; 3, the Best Quadratic Un0

biased estimator of j is U Mj U=tr (Mj ); j = 1; 2; 3.
Because
Amemiya (1971): Replace OLS residuals by Within (two-eect)

residuals.
Asymptotic distribution of variance component esti-
mates:
p
2
NT
(^

p 2"
@ N (^
p 2
0
T (^
"2)
2"4 0 0
2 ) A v N @0; @ 0 24 0
0 0 24
2 )
Method of Swamy and Arora (1972):
11
AA :
Estimate variance com-
ponents from mean square errors of three regressions:

Between-individual and Between-periods.
First regression: Within, model transformed by

M1 = (IN eN e0N =N )
(IT eT e0T =T ).
Within,
181
Estimate of
1:
^ 1 = ^ 2" =
[Y 0M1Y
Y 0M1X (X 0M1X ) 1X 0M1Y ]

:
(N 1)(T 1) K
Second regression: Between individual,

M2 = (IN eN e0N =N )
(eT e0T =T ).
Estimate of
2:
^ 2 =
[Y 0M2Y
and we compute
model transformed by
Y 0M2X (X 0M2X ) 1X 0M2Y ]

;
(N 1) K
^ 2 = (1=T )(^2
^ 2" ).
Third regression: Between period, model transformed by

M3 = (IN eN eN =N )
(eT e0T =T ).
Estimate of
3:
^ 3 =
[Y 0M3Y
and we compute
Y 0M3X (X 0M3X ) 1X 0M3Y ]

;
(T 1) K
^ 2 = (1=N )(^3
^ 2" ).
General formulation of the GLS estimate:

^ GLS = (X 0 M1X )="2 + (X 0M2X )=2 + (X 0M3X )=3 1
and
(X 0M1Y )="2 + (X 0M2Y )=2 + (X 0M3Y )=3

V ar(^ GLS ) = "2 (X 0M1X ) + "2(X 0 M2X )=2
+"2(X 0M3X )=3 1 :

^ W ithin = [X 0M1X ] 1[X 0M1Y ],
Within estimator
^ BI = [X 0M2X ] 1[X 0M2Y ],
Between-individual estimator is
182
Between-period estimator is
^ BP = [X 0M3X ] 1[X 0 M3Y ], so that
^ GLS = W1^ W ithin + W2^ BI + W3^ BP ;
with
i
0M X
0M X 1
X
X
0
2
2
W1 = X M1X + " + "
(X 0M1X );
h
i
0M X
0 M X 1 "
X
X
2
0
0
2
W2 = X M1X + " + "
(X M2 X );
h
i
0M X
0 M X 1 "
X
X
0
2
2
W3 = X M1X +
+
(X 0M3X ):
2
"
2
"
3
2
2
2
3
If 2 = 2 = 0, ^ GLS is ^ OLS ;

When T and N ! 1, ^ GLS ! ^ W ithin;
If " ! 1, then ^ GLS ! ^ BI ;
If " ! 1, then ^ GLS ! ^ BP .
2
2
2
3
A2.3 Testing for eects

Breusch-Pagan (1980): Lagrange Multiplier test statistic for
= = 0.
Lagrange Multiplier (LM) test: uses restricted estimates
H0 :
only

1

@ log L() 0
@ 2 log L()
@ log L()
LM =
E
;
@
@@0
@
where
log L() =
and
= (2 ; 2 ; "2).
NT
log(2)
2
1
log j
j
2
1 0 1
U
U;
2
183
Gradient of log likelihood:
@ log L()
1
@
= tr
1
@i
2
@i
i = 1; 2; 3.

1
@
+ U 0
1
1U ;
2
@i
Because
= 2 (IN
eT e0T ) + 2 (eN e0N
IT ) + "2(IN
IT );
we have
=
@i
8
0
< IN eT eT
eN e0N IT
:
INT
Hence
(2 )
2
i=2 ( )
2
i=3 (" ):
i=1
0
0
0
@ log L()
NT 4 1 U0 (IN
0 eT eT )U=U 0U
=
1 U (eN eN
IT )U=U U
@
2"2
0
and
3
5;
1
@ 2 log L()
=
E
@@0
2
3
(
N
1)
0
(1
N
)
4
2"
4
0
(T 1) (1 T ) 5 :
NT (N 1)(T 1) (1 N ) (1 T ) (NT 1)

LM test statistic is nally
NT
LM =
1
2(T 1)

NT
+
1
2(N 1)

U 0(IN
eT e0T )U 2
U 0U

U 0(eN e0N
IT )U 2
U 0U
184
and is distributed as a
Important note. LM
2(2) under H0.
test statistic does not depend on variances,
only uses OLS residuals
U.
185
Appendix 3. The one-way unbalanced random

eects model
A3.1 Notation
D1 and D1 + D2 resp.

D1 1 )
Y1
X1
U1
=
+
;
(D1 + D2) 1 ) Y2
X2
U2
where X1 and X2 are resp. D1 K and (D1 + D2 ) K .
Consider 2 cross-sections, of dimension
Variance-covariance matrix of
Now, let
We have
"2 ID1 + 2 eD1 e0D1
0
0
Tj =

Pj
i=1 Di ,

1 0
=
;
0
2
is
(2D1 + D2) (2D1 + D2)

=
0
"2 ID1 + 2 eD1 e0D1

2 eD1 e0D2
2
0
2
eD2 eD1
" ID2 + 2 eD2 e0D2
(Tj 2 + "2)eTj e0Tj =Tj

with
j =
+"2(ITj eTj e0Tj =Tj ):
rj = (Tj 2 + "2)r
eT e0
j
Tj
Tj
2
2
2
If we denote wj = Tj + " ,
matrix:
+ ("2)r ITj
!
0
eTj eTj
Tj
"
wj
the transformation for the unbal-
anced panel is
"
j 1=2 =
T1 = D1 and T2 = D1 + D2.
so that
Using the formula for the power of the
eTj e0Tj
+ ITj
Tj
eTj e0Tj
Tj
186APPENDIX 3.
THE ONE-WAY UNBALANCED RANDOM EFFECTS MODEL
eTj e0Tj
j
Tj
= ITj
where
1=2Yj : yjt
Typical element of "
Direct generalization to the case

diagonal and o-terms (in the
0
^ GLS = X X
1
Y = "
1=2Y , and
"
1=2 = diag ITi
j = 1
"
:
wj

PTj
1
j Tj t=1 yjt .
N > 2,
because
is block-
j 's) are always equal to 2 .
0
X Y where X = "
1=2X;
eTi e0Ti
+ diag
Ti

"
wi

eTi e0Ti
Ti

A3.2 Estimation of variance components

Amemiya (1971) suggests the following estimates for
U^ 0QU^
;
T
N
K
i
i
2 and "2:
^ 2" = P

N + tr (X 0QX ) 1X 0 B X ^ 2"
2
P
P 2 P
^ =
T
i
i
i Ti = i Ti
0

tr (X Q X ) 1X 0 (Jn=N ) X ^ 2"
P
P 2 P
+
;
T
T
=
T
i
i
i
i i
i
P
P
where Jn is a matrix of ones, of dimension (
i Ti) ( i Ti),
U^ 0B U^
eTi e0Ti

B = diag
Ti
ji=1!N ; Q = diag
ITi
eTi e0Ti
Ti
ji=1!N :
187
Appendix 4. ML estimation of dynamic panel

models
A4.1 Likelihood functions
Dierent likelihoods corresponding to cases 1 to 4 above. Assumption:
i and "it are jointly normally distributed.

y
For Case 1/ ( i0 xed):
L1 = (2)
NT
2
"
N
(det V )
exp
N
1X
2 i=1
u0iVT 1ui ;
ui = (yi yi; 1 xi zi ) and VT = "2IT + 2 eT e0T , the

(T T ) variance-covariance matrix for unit i.
where
i):
For Case 2.a/ ( i0 random and independent of
L2a = L1 (2) (y2 )

N
2
"
N
2
N
X
1
(y
2y2 i=1 i0
exp
y )2 :
0
For Case 2.b/ ( i0 random and correlated with

NT
L2b = (2)
(
exp
("2)
N (T
1)
("2 + T a)
N
2
(y2 )
0
N
2
" T
N X
X
N X
T
X
1
a
2+
u
it
2"2 i=1 t=1
2"2("2 + T a) i=1
(2)
i):
"
N
2
exp
N
X
1
(y
2y2 i=1 i0
0
t=1
u2it
y )2 ;
0
#)
188
APPENDIX 4. ML ESTIMATION OF DYNAMIC PANEL MODELS
where
y ).
a = 2 2 y2
and
uit = yit yi;t 1 xit zi (yi0
For Case 3/ ( i0 xed):
L3 = (2)
NT
2
( 2 )
NT
"
(2)
N X
T
1 X
[(y
2"2 i=1 t=1 it
exp
(yi;t 1
N
yi0 + wi0) xit

"
(2)
N
2
z ]2
2)):
"
N (T +1)
L4a = (2)
"
j
T +1j
N
2
exp
N
1X
wi0)2 :
For Case 4.a/ ( i0 random with common mean
2=(1
N
1 X
(y
22 i=1 i0
exp
yi0 + wi0)
w and variance
#
1 v0 ;
vi
T +1
i
2 i=1
where vi is a (T + 1) vector vi = (yi0
w ; yi1 yi0 xit
zi ; : : : ; yiT yi;T 1 xiT zi ) and
T +1 is a (T +1) (T +1)
matrix
T +1 = "2 1
0T
Useful expressions:
00T
IT
1
+ 2 1
eT
1
"2T
j
T +1j = 1 2 "2 + T 2 + 11 + 2
and
1 = 1
T +1
"2
"
1 2 00T
0T
IT
2
"
2
; e0T :
1+
+T +
1
1
189
1+
(1 + ; e0T ) :
eT
For Case 4.b/ ( i0 random with common mean w and arbitrary

2
variance w0 ): same as 4.a/, but with
T +1 replaced by
1
2 = 2 00

VT +1 = "2 w " T + 2 1
0T IT
eT

; e0T :
For Case 4.c/ ( i0 random with mean i0 and variance

2 ): same as 4.a/, but with y replaced by i0.
0
)
For Case 4.d/ ( i0 random with mean

ance
(1
2
Same as L2b but with y ,

2
) (2 + w2 ) respectively.
w0 ):
i0
and
"2=(1
and arbitrary vari-
y
replaced by
i0,
Consistency of the MLE (Maximum Likelihood Estimator) depends on the way
and
tend to innity, in each case.
Crucial problem with MLE: estimator is not consistent when, for

large
and xed
T , the choice of initial conditions is mistaken,
because likelihood function is dierent.
A4.2 Specication tests

Useful for checking maintained assumptions on initial conditions.
Based on Likelihood Ratio (LR) statistics.
Case 1
yi0 xed.
Test for random-eects specication, i.e. for
VT +1. Let L01 denote es2

timated log-likelihood L1 under assumption H0 : VT = " IT +
2 eT e0T , and L1 the estimated log-likelihood with unrestricted VT
L0) is distributed
(T (T + 1)=2 components). Under H0 , 2(L1
1
structure of variance-covariance matrix
190
APPENDIX 4. ML ESTIMATION OF DYNAMIC PANEL MODELS
as a
2(T (T + 1)=2 2).
Case 4.a
wi0
random with common mean
wi0
random with common mean
w
and variance
"2=(1 2)
H0: matrix
T +1 as dened in likelihood for Case 4.a, vs. alternative: unrestricted variance-covariance with (T + 1)(T + 2)=2
0

components, with log-likelihoods L4a and L4a respectively. Under
H0, 2(L4a L04a) is distributed as a 2((T + 1)(T + 2)=2 2)
(note only two free parameters in restricted VT +1, as already
estimated).
Case 4.b
variance
2
w0 .
Let
L04b
w
and arbitrary
denote log-likelihood under restriction on
VT +1 for Case 4.b, and L4a the unrestricted log-likelihood for Case
L0 ) admits
4.a (as above). Under H0 : True model is 4.b, 2(L4a
4b
2
a ((T +1)(T +2)=2
3) distribution (3 free parameters in Case
2
2
2
4.b: " ; ; w ).
0
Test for stationarity: Case 4.a vs. Case 4.b
H0:
2
as a (1).
Under
stationarity (Case 4.a),
2(L04:b L04:a) is distributed
191
Appendix 5. GMM estimation of static panel

models
In the Instrumental-Variable context (Hausman-Taylor, AmemiyaMaCurdy, Breusch-Mizon,Schmidt), we assumed:
Error-component model with E (uu0) =

= "2INT + 2 (IN
eT e0T ).
Endogeneity was caused by E (X 0) 6= 0 or E (Z 0) 6= 0, but it

E (X 0") = E (Z 0") = 0.
was assumed
With GMM, we can consider dierent exogeneity assumptions related to
or ", producing dierent orthogonality conditions.
Several cases:
1. Random or xed eects (instruments correlated with
);
2. Strictly or weakly exogenous instruments (correlation with
").
A.5.1 Computation of the variance-covariance matrix

For the panel data case, we can use the fact that several time
observations are available for each unit.
If heteroskedasticity of
E (uituis) = 0; t 6= s, we have

N
1 0
VN = NV arf (x; ) = Nvar Z u = 2 E [Z 0uu0Z ]
N
N
1
= Z 0[IT
diagfi2g]Z
N
P
2
where i can be estimated by
^ 2i = T1 Tt=1 u^2it. Hence, a optimal
second-step estimate for VN would be
N
X
1
^ i; where H^ = diagf^ 2i g:
V^N =
Zi0HZ
N i=1
2
2
the form E (u ) =
it
such that
192
APPENDIX 5. GMM ESTIMATION OF STATIC PANEL MODELS
Important aspect for panel data: If we transform the model to

remove individual eects
i, the optimal weighting matrix AN =
VN 1 depends on this transformation.

model with
Consider a linear xed-eect
q orthogonality conditions:
E [Wi0ui ] = 0 where ui = QT ui

and Wi is a T q matrix of instruments.
Because QT is a T T symmetric matrix, conditions above can
0
be rewritten E [(QT Wi ) ui ] = 0 and the optimal weighting matrix
AN is VN 1 with

W 0 u
VN = NE
N
0
u W
= NE [(QW=N )0uu0(QW=N )]
= [(QW )0 (QW )]:

N
Hence, for GMM, it is equivalent to transform the model (by
or the instrument matrix.
Assume now the error-component assumption holds; we have
VN =
because
1
[(QW )0["2INT + T 2 B ](QW )]
N
1
= [(QW )0["2INT ](QW )]
N
Q and B are orthogonal, therefore

"2 0
1 0
VN = (W Q)(QW )
W QW:
N
N
Replacing in the GMM criterion:
0
0
0
^N = arg min u () W ( W QW ) 1 W u () ;

N
N
N
Q)
193
and the optimal GMM estimator is
^N = X 0W (W 0QW ) 1W 0X 1 X 0W (W 0QW ) 1W 0Y :

A.5.2 Random eects and strictly exogenous instruments
By denition, random eects
exogeneity (
E (X 0) = 0 and we assume strict
uncorrelated with
" at
1 q instrument vector wit, we have
E (wis0 uit) = 0
for
every time period). For a
s; t = 1; 2; : : : ; T;
qT 2 moment conditions. Let wit0 (wi1; wi2; : : : ; wit)

8t = 1; 2; : : : ; T , and set WSE;i = IT
wiT0 . Moment conditions
0
then read E (WSE;iui ) = 0.
which gives
We can show, using Theorem above, that GMM estimators using

the form for 2SLS or the 3SLS form are equivalent. We have
0 ) = 1=2
w0
1=2WSE;i = 1=2(IN
wiT
iT
0 )( 1=2
I ) = W B;
= (IN
wiT
qT
SE;i
where
B = 1=2
IqT .
Hence Theorem 4 applies.
A.5.3 Fixed eects and strictly exogenous instruments

We assume now instruments are correlated with
, but still strictly
i, we can use the rst-dierence operator

LT of dimension T (T 1):
exogenous. To remove
L0T yi = L0T xi + L0T ui
where
L0T ui = L0T "i;
194
where
6
6
LT = 6
6
6
4
Note that (
T)
LT (L0T LT ) 1L0T .
If instruments
1 0 0
1 1 0

0
0
0 0 0
0 0 0

1 1
0
1
..
.
..
.
..
.
..
.
..
.
0
0
..
.
3
7
7
7
7:
7
5
Within operator is related to
LT : QT =
wit are strictly exogenous, we have
0 L0 u ) = E (Z 0 L0 " ) = 0;
E (ZSE;i
T i
SE;i t i
where
0:
ZSE;i = IT 1
wiT
Model in First-dierence form can be estimated by GMM using
ZSE;i as instruments.
A.5.4 Weakly exogenous instruments

In this case, we consider a
1q
vector of instruments
wit
such
that
E (wit0 uis) = 0; for t = 1; 2; : : : ; T; t s:

There are T (T + 1)=2 such conditions: instruments are not correlated with future values of "it (and are not correlated with i ).
On the other hand, if instruments are weakly exogenous but are
correlated with
i, we have a smaller number of conditions, that
can be written
E (wit0 uis) = E (wit0 "is) = 0

where
uis = uis
for
t = 1; 2; : : : ; T
1; t s;
ui;s 1.
A convenient way of transforming the above model is to use the
Forward-Filter (Keane and Runkle 1992). Let F be a T T upper0

triangular matrix that satises F F = IT , so that Cov (F ui ) =
195
IT . We have F = fFij g, Fij = 0 for i > j . Using instruments

0 ), we have the following Forward-Filter
Wi = (wi01; wi02; : : : ; wiT
estimator:

1
^ FF = X 0F 0 H (H 0H ) 1H 0F X
X 0F 0 H (H 0H ) 1H 0F Y ;
where
F = IN
F .
Dierence between standard IV and FF: IV with instruments
Wi
1=2 is not consistent unless Hi are strictly exogenous.

and lter
But FF transformation preserves the weak exogeneity of instruments
wit.
When
is large and
is small, the FF is not nec-
essarily more ecient GMM or 3SLS with the same instruments
Wi .
If we don't have conditional homoskedasticity:
plim N1
PN
0 0
i=1 HiF ui uiF Hi
1 PN H 0 F F 0 H
6= plimP
i
i=1 i
N
plim N1
N
0
i=1 Hi Hi :
A.5.5 Ecient GMM estimation

We now present alternative GMM estimators that may be more
ecient than IV-HT, IV-AM or IV-BMS. Why: under strict exogeneity assumption, we have much more moment conditions than
HT, AM or BMS.
We rst consider the case where we restrict
= "2IT + eT e0T
2
(as in HT, AM and BMS). We will then examine the case of an

unrestricted
matrix.
Consider the model with strictly exogenous regressors
yi = Ri + (eT
zi) + ui Xi + ui;
0 )0 (a T k matrix of
ui = (eT
i )+"i, Ri = (ri01; r20 i; : : : ; riT
0 0
00
time-varying regressors), eT
zi = [zi ; zi ; : : : ; zi ] (a T g matrix
where
196
of time-invariant regressors).
Assume regressors
rit and zi are strictly exogenous wrt. "it:
E (di
"i) = 0;
where
di = (ri1; ri2; : : : ; riT ; zi);
but some may be correlated with
i:
rit = (r1it; r2it); zi = (z1i; z2i); E (r10 iti ) = E (w10 ii) = 0:

HT, AM and BMS instruments are of the form
WA;i = (QT Ri; eT

si); sHT;i = (r1i; z1i)
sAM;i = (r1i1; r1i2; : : : ; r1iT ; z1i)
sBMS;i = (sAM;i; r2i);
where
r2i = (r2i1
If the
no conditional heteroskedasticity condition holds:
r2i; : : : ; r2i;T 1 r2i).
E (Wi0uiu0iWi) = E (Wi0Wi),
ing the same instruments
Wi.
then GMM is as ecient as IV usBut GMM is more ecient if this
condition is violated and a unrestricted weighting matrix is used.

The strict exogeneity assumption implies
E [(LT
di)0ui] = E (L0T ui
di)
= E [L0T (eT i + "i)
di] = E (L0T "i
di) = 0;
where
LT
di is a T [(T
1)(kT + g]
matrix. Arellano and
Bover (1995) propose a GMM estimator using instruments:
WB;i = (LT
di; eT
si) instead of WA;i = (QT Ri; eT
si):
Number of additional instruments wrt.
BMS:
rank(ZB;i)
rank(ZA;i) = (T
advantage: variance-covariance matrix
IV-HT, IV-AM or IV-
1)(kT + g)
k.
is unrestricted.
Other
197
A.5.6 GMM with unrestricted variance-covariance matrix
ZB;i satisfy the no conditional heteroskedasticity assumption, but the variance-covariance of u is unrestricted.
We assume instruments
Result of Im, Ahn, Schmidt and Wooldridge (1996): The 3SLS

form of the GMM estimator with unrestricted
ments
1ZA;i
using instru-
is numerically equivalent to the 3SLS estimator
when all instruments
ZB;i are used, in the BMS case.
This is not
true for HT or AM instrument matrices. We have
E (Ri0 QT 1ui) = E (Ri0 QT 1eT i) + E (Ri0 QT 1"i)

= E (Ri0 QT 1eT i):
But when BMS assumption is not true and with an unrestricted
, E (Ri0 QT 1eT i) 6= 0.
These authors propose another trans-
formation matrix instead of
Q = 1
and we can show that
QT
for removing
i :
1eT (e0T 1eT ) 1e0T 1;
QeT = 0.
Therefore:
Ri0 QeT i = 0 and E (Ri0 Qui) = E (Ri0 Q"i) = 0

because
Ri are assumed strictly exogenous wrt. "i.
The optimal choice of instruments would be
ZC;i = (QRi; 1eT

si);
for
si:
HT, AM or BMS. This modied 3SLS estimator is an
is unrestricted but the no conditional heteroskedasticity condition is valid.
ecient GMM estimator when
A.5.7 GMM vs. IV estimators
198
Main dierence between GMM and Instrumental-Variable estimators:

specied
variance-covariance matrix of error terms need not be
a priori for GMM, it must be for IV.
In the GMM case, we nd parameters by solving the system of

moment conditions or by nding
^GMM = arg min u0()ZVN 1Z 0u();

Z are instruments and VN is an estimate of the variance of

0 0
moment conditions: V = E (Z uu Z ). In a linear model, u( ) =
Y X where , we can solve directly for ^ N :

^ GMM = X 0ZVN 1Z 0X 1 X 0ZVN 1Z 0Y:
where
In the IV case, we restrict
u to be a) homoskedastic (V
is diago-
nal), or b) heteroskedastic of known form. Example: panel data
= E (uu0) = IN
, where
= "2INT + 2 (IN
eT e0T ) and = "2IT + 2 eT e0T :
Consider two IV estimators for panel data: 2SLS or 3SLS.

In the 2SLS case (HT, AM, BMS), we premultiply the model in
vector form
Zi:
yi = Xi + ui
by
1=2 and then apply instruments

^ 2SLS = X0
1=2Z (Z 0Z ) 1Z 0
1=2X 1
X 0
1=2Z (Z 0Z ) 1Z 0
1=2Y :
1=2Zi as instruAn equivalent 2SLS estimator obtains by using
1
ments:

^ 2SLS = X0
1Z (Z 0
1Z ) 1Z 0
1X 1
X 0
1Z (Z 0
1Z ) 1Z 0
1Y :
2
In the 3SLS case, we have

^ 3SLS = X 0Z (Z 0
Z ) 1Z 0X 1 X 0Z (Z 0
Z ) 1Z 0Y :
199
GMM and 3SLS are equivalent if the following condition holds:
E (Zi0uiu0iZi) = E (Zi0Zi) 8i = 1; 2 : : : ; N;
because, as
! 1,
N
1 0
1X
plim Z
Z = plim
Zi0uîu^0iZi = E (Zi0uiu0iZi) = V:
N
N i=1
This condition is denoted
When condition
No conditional heteroskedasticity.
E (Zi0uiu0iZi) = E (Zi0Zi)
does not hold, GMM
is strictly more ecient than 3SLS.

Impossible to prove 3SLS is more or less ecient than 2SLS, but
there exists a condition for numerical equivalence between 2SLS
and 3SLS:
The 2SLS and 3SLS estimators are equivalent if there

exists a non-singular, non-stochastic matrix B such that
1=2Z =
ZB .
Theorem 8
is
^
estimated from rst-stage N for GMM. It states that under this
1=2) does
condition, ltering (premultiplying instruments by
This Theorem can be applied for IV or GMM procedures:
not change eciency of GMM or IV estimators.
200APPENDIX 6.
A FRAMEWORK FOR SIMULATION-BASED INFERENCE
Appendix 6. A framework for simulation-based

inference
A.6.1 Heterogeneity and the linear property
In linear panel-data models, the residual consists of an heterogeneity factor
i and an i.i.d.
error term
"it:
uit = i + "it:
OLS (or, equivalently, ML) yield consistent but not ecient estimates if unobserved heterogeneity is omitted.
In nonlinear models, this often leads to signicant biases. Other

problem: dicult to compute the likelihood of nonlinear models
because of dependent observations for a given individual (
yit
is
not i.i.d.).
A.6.1.1 Example: Dynamic model with heterogenous AR(1)

root and no individual eect
where
yit = iyi;t 1 + "it = ( + i)yi;t 1 + "it;

jij < 1, i independent from "it, "it is N (0; "2).
The
nonlinear feature of the model comes from
yit = "it + i"i;t 1 + 2i "i;t 1 + + hi"i;t h + : : : :

If the restricted model is estimated, under the following data generating process:
yit = yi;t 1 + "it;
201
the OLS estimate of
is
N
1X
Cov(i; V ar(yi;t
P
^
i +
N i=1
1=n i V ar(yi;t
1))
1)
N
Covi(P
i; "2=(1 2i ))
1X
+
:
=
N i=1 i 1=n i "2=(1 2i )
i > 0, Cov(i; "2=(1 2i )) > 0 and ^

average of the true i 's (the bias is positive).
If all
overestimates the
A6.1.2 Example: Duration model with heterogeneity
yit v i exp( iyit) = ( + i ) exp[ ( + i )yit];

where
i are i.i.d.
heterogeneity factors with
E (i) = 0.
This is
the exponential duration model. If the model is misspecied:
yit v exp( yit);

the Maximum Likelihood estimate of
^ =
We have
^ T !1
!
"
"P
is
N PT
i=1 t=1 yit
NT
N
X
1
1
N i=1 i
# 1
# 1
N
1X
<
:
N i=1 i
Hence, the MLE of the misspecied model underestimates the average of individual parameters
i .
202APPENDIX 6.
In many cases, it is not possible to lter out the individual effect without very restrictive assumptions (e.g., Fixed-eect Logit,
Another possibility is to integrate
Butler-Mott Probit, ...).
out the heterogeneity factor.

Basic idea: specify a density distribution for
i and compute the
conditional likelihood.
yit conditional on xit and i is

f~(yit; xit + i) with i v (; );
where is a distributional parameter, and the vector of param-
Assume the density function of
eters of interest.
The distribution of
yit conditional on observed variables is
f (yitjxit; ; ) =
f~(yit; xit + )(; )d
In many cases, this cannot be solved analytically. Additional parameters to estimate:
A.6.1.3 Example: Poisson model

Assume
exp(xit + i)yit
f~(yit; xit + i) =
exp[ exp(xit + i)]:
yit!
Change of variable: i = exp(i ), with probability distribution:
1= 1 exp( =)
(; ) =
;
( )1= (1= )
where
(:): Gamma distribution, and > 0. Then it can be
shown that
(1= + yit)[ exp(xit )]yit

f (yitjxit; ; ) =
:
(1= ) (yit + 1)[1 + ; exp(xit )]yit+1=
203
This is the
negative binomial distribution.
A.5.1.4 Example: the Probit model again

Probit with heterogeneity:
P rob[yit = 1jxit; i] = [xit + i]:

Assume
i v N (0; 2 ):
P rob[yit = 1jxit] =
where
(:):
1

(xit + )
d;

density function of
N (0; 1).
Since observations are dependent:
P rob[yi1 = 1; : : : ; yiT = 1] =
6=
T
Y
t=1
Z Y
T
1

(xit + )
d

t=1
P rob[yit = 1]:
In this case, numerical integration is feasible.
In more complex
cases, one can use simulation techniques to approximate integrals

of the form
M (yitjxit; ; ) =
m(yit; xit + )(; )d:
A.6.2 Integration by simulation

Purpose: approximate multiple integrals using Monte Carlo (simulation) techniques.
204APPENDIX 6.
We can write
M (yitjxit; ; ) =
(; ) 0
m(yit; xit + ) 0
(; 0)d;
0
(; )
(:; 0) is a known distribution density with xed parame0

ters . We have for individual i at time t:

(; )
M (yit jxit; ; ) = E m(yit; xit + ) 0
;
(; 0)
0
0
which is the expectation using distribution of m() ()= ().
0
Density function is the importance sampling function.
where
S random variables for individual i: is; s = 1; : : : ; S

from distribution 0 , we can approximate the above
If we can nd
drawn
expectation by
S
1X
(is ; )
s
m(yit; xit + i ) 0 s 0 :
S s=1
(i ; )
Under (mild) regularity assumptions, the simulated expression
converges to the above expectation, using a weak Law of Large
Numbers. Two issues in practice:
Choice of density function 0(; );

Number of draws to obtain consistency ?
For the choice of the importance sampling function, make sure the
domain of
0 contains the domain of (to capture rare events in
tails of distribution). Regarding the number of draws, consistency

of estimator depends on estimation procedure.
A.6.3 Simulated GMM and Maximum Likelihood estimators
205
Gouriroux and Monfort (J. of Econometrics, 1993): Simulated
GMM (SGMM) and Simulated Maximum Likelihood (SML).
For SGMM, when population moments are impossible to compute,
we replace
S
1X
E [f (yit; xit; i; ] = 0 by
[f (yit; xit; is; ] 0;
S s=1
or by
S
1X
(s ; )
[f (yit; xit; is; ] 0 is 0:
S s=1
(i ; )
The SGMM criterion is then
s
MGMM
=
( N
X
S
1X
[f (yi; xi; is; ]0 Zi
S s=1
i=1
!)
T 1
N
X
S
1X
0
Zi
[f (yi; xi; is; )]
S s=1
i=1
Zi is a T L matrix of instruments. The SGMM is consistent and asymptotically normal when N tends to innity and S
where
is xed. This is because we can use the weak Law of Large Numbers for consistency of the simulator
1P f
s
S
towards
E f
and a
Central Limit Theorem for asymptotic normality, see below.
For the SML estimator, we want to compute
log L() =
where heterogeneity
f (yijxi; ).
N
X
i=1
log f (yijxi; );
is already integrated out in density function
But if this integration is not possible analytically ?
206APPENDIX 6.
Suppose we nd a simulator
f~(yi; xi; ; ) where is drawn from
a known distribution, such that
Ef~(yi; xi; ; ) = f (yijxi; ):
Then
f (yijxi; ) can be approximated by
S
1X
f~(yi; xi; is; );
S s=1
where
simulations are used for each
the Simulated Log-likelihood is
Ls() =
"
N
X
S
X
i: is ; s = 1; 2; : : : ; S
and
1
1
log
f~(y ; x ; s; ) :
N i=1
S s=1 i i i
The Simulated Maximum Likelihood estimator is consistent when
N=S
! 0. In practice, a very large number of simulated draws
may be necessary.
A.6.4 Choice of simulation number and mode

We use the Gouriroux and Monfort (1993) result. The SGMM
and SML criteria are of the form
"
GN () =
1
N
N
X
and we assume that, when

in
to
(yi; xi; E (yi; xi; ; ))
! 1, GN () converges uniformly
G() = [E (yi; xi; E (yi; xi; ; ))] :

Two dierent simulated criteria can be used for GN ( ): whether
I
D
identical (GN ( )) or dierent sets (GN ( )) of simulation draws
207
are used for each individual:
"
N
S
1X
1X
I
GN ( ) =
yi; xi;
(yi; xi; s; )
N i
S s
"
GDN () =
1
N
N
X
i
yi; xi;
Case 1. S is xed and N ! 1.
1
S
S
X
s
(yi; xi; is; )
!#
;
!#
GIN () converges to the random variable (it is a function of (1; : : : ; S )):

"
E yi; xi;
1
S
S
X
s
!#
(yi; xi; s; )
I
G(). Therefore ^ that maximizes (SML)
I
or minimizes (SGMM) GN ( ) is inconsistent.
GDN () converges to the non random scalar:
which is dierent from
"
!#
S
1X
EE yi; xi;
(yi; xi; s; ) ;
S s
which is in general dierent from G( ). But if function is linear
D
D
wrt. E (:), GN ( ) converges to G( ) and ^
is consistent.
Case 2. S and N ! 1.
Both
Î
and
^D
are consistent.
A.6.5 Examples: Probit and Tobit models
yit = xit + i + ""it;

yit = 1 if yit > 0;
yit = 0 if yit 0; (Probit);
208APPENDIX 6.
and
yit = xit + i + "";

yit = yit if yit > 0;
yit = 0 if yit 0; (Tobit);
where i v N (0; 1), "it v N (0; 1).
Because
the
yit
T -fold
is present for each component in
yi = (yi1; : : : ; yiT )0,
are serially correlated and the likelihood would contain

integrals. But we can consider the conditional likelihood
functions of
yi given xi and i:
f (yijxi; i; ) =
yit =1
(xit + )
for the Probit and
1
y
f (yijxi; i; ) =
it

yit >0 "
Y
Y
yit =0
Y
yit =0
( xit
xit
"
xit
"
)
for the Tobit. These conditional likelihoods can be directly used

as simulators.
Appendix 7. Example: the SAS c Software

*
*
*
*
*
*
;
DYNTAB.SAS ;
;
Uses datafile DYNTAB3.DAT;
;
Create library and file names ;
* Change directory information below ;
libname water 'd:/dea/panel';

filename watfile 'd:/dea/panel/dyntab3.dat';
* Create SAS table and read data from Ascii file ;
data wat;
infile watfile;
input id year conso price revenue precip ;
* Compute logs ;
lconso=log(conso); lprice=log(price);
lrevenue=log(revenue);
run;
* Descriptive statistics ;
proc means data=wat;run;
* OLS regression ;
proc reg data=wat;
model lconso = lprice lrevenue;
run;
* Model 1: One-way Fixed effects ;
* cs=116:
Set the number of cross-sections ;
209
210
APPENDIX 7. EXAMPLE: THE SAS
* option /fixone: Set one-way Fixed-effect ;

proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /fixone ;
run;
* Model 2: Two-way Fixed effects ;
* option /fixtwo: Set two-way Fixed-effect ;
model lconso= lprice lrevenue /fixtwo ;
run;
* Model 3: One-way Random effects ;
* option /ranone: Set one-way Random-effect ;
model lconso= lprice lrevenue /ranone;
run;
* Model 4: Two-way Random effects ;
* option /rantwo Set Two-way Random-effect ;
model lconso= lprice lrevenue /rantwo;
run;
* Model 5: One-way Random effects with AR(1) ;
* option /ranone parks rho Set One-way Random-effect ;
* and compute RHO: Ar(1) parameter ;
model lconso= lprice lrevenue /ranone parks rho;
run;
* Compute parameter estimates on each cross section ;
proc sort data=wat;
by year;
proc reg data=wat;
SOFTWARE
211
model lconso= lprice lrevenue ;
by year;
run;
* Compute Within and Between estimates ;
* using the MEANS procedure ;
proc sort data=wat;
by id;
proc means data=wat noprint;
var lconso lprice lrevenue ;
by id;
output out=out1 mean=mconso mprice mrevenue ;
data out1;set out1;
keep id mconso mprice mrevenue ;
data wat;
merge wat out1;
by id;
data wat;set wat;
qconso=lconso-mconso; qprice=lprice-mprice;
qrevenue=lrevenue-mrevenue;
* Within regression ;
proc reg data=wat;
model qconso = qprice qrevenue ;
run;
* Between regression ;
proc reg data=wat;
model mconso = mprice mrevenue;
run;
212
SOFTWARE
ESTIMATES USING TSCSREG PROCEDURE

MODEL 1. ONE-WAY FIXED EFFECTS
The SAS System 16:15 Monday, January 22, 2001 3

TSCSREG Procedure
Dependent Variable: LCONSO
Model Description
Estimation Method
FIXONE
Number of Cross Sections 116
Time Series Length
6
SSE
MSE
RSQ
Model Variance
2.578099 DFE
578
0.00446
Root MSE 0.066786
0.9344
F Test for No Fixed Effects

Numerator DF:
115 F value: 58.3964
Denominator DF: 578 Prob.>F: 0.0000
Parameter Estimates
Variable
CS 1
CS 2
CS 3
CS 4
CS 5
... ...
CS 112
CS 113
CS 114
CS 115
INTERCEP
LPRICE
LREVENUE
DF
1
1
1
1
1
...
1
1
1
1
1
1
1
Parameter
Estimate
-0.455773
-0.222476
0.153338
-0.131488
0.027422
...
0.420843
-0.322888
-0.259767
-0.240823
5.099257
-0.134245
0.024386
Standard
Error
0.039463
0.039923
0.038900
0.039174
0.038890
...
0.040309
0.039376
0.038678
0.039379
0.366957
0.018447
0.033223
T for H0:
Parameter=0
-11.549433
-5.572620
3.941882
-3.356518
0.705132
... ...
10.440506
-8.200102
-6.716134
-6.115479
13.896065
-7.277506
0.734009
Prob > |T|

0.0001
0.0001
0.0001
0.0008
0.4810
...
0.0001
0.0001
0.0001
0.0001
0.0001
0.0001
0.4632
Variable
Label
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Intercept
213
MODEL 2. TWO-WAY FIXED EFFECTS

TSCSREG Procedure
Dependent Variable:
LCONSO
Model Description
Estimation Method
FIXTWO
Time Series Length
6
SSE
MSE
RSQ
Model Variance
2.205671 DFE
573
0.003849 Root MSE 0.062043
0.9439
F Test for No Fixed Effects

Numerator DF:
120 F value: 65.6530
Denominator DF: 573 Prob.>F: 0.0000
Variable
CS 1
CS 2
CS 3
...
CS 114
CS 115
TS 1
TS 2
TS 3
TS 4
TS 5
INTERCEP
LPRICE
LREVENUE
DF
1
1
1
...
1
1
1
1
1
1
1
1
1
1
Parameter Estimates
Parameter Standard T for H0:
Estimate
Error
Parameter=0
-0.535192 0.040793 -13.119702
-0.302435 0.041809 -7.233670
0.120803
0.037066 3.259125
... ...
...
... ...
-0.288486 0.036463 -7.911820
-0.256215 0.036669 -6.987209
-0.102087 0.017883 -5.708681
-0.047565 0.016463 -2.889216
-0.030524 0.014486 -2.107135
-0.007359 0.012507 -0.588378
-0.025528 0.009992 -2.554900
6.316873
0.396540 15.929983
-0.251061 0.034210 -7.338896
-0.053316 0.033244 -1.603773
Prob > |T|

0.0001
0.0001
0.0012
...
0.0001
0.0001
0.0001
0.0040
0.0355
0.5565
0.0109
0.0001
0.0001
0.1093
Variable
Label
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Time Seri
Time Seri
Time Seri
Time Seri
Time Seri
Intercept
214
SOFTWARE
MODEL 3. ONE-WAY RANDOM EFFECTS

TSCSREG Procedure
Dependent Variable: LCONSO
Model Description
Estimation Method
RANONE
Time Series Length
6
Variance Component Estimates
SSE 3.12498
DFE
693
MSE 0.004509 Root MSE 0.067152
RSQ 0.1087
Variance Component for Cross Sections
Variance Component for Error
0.043243
0.004460
Hausman Test for Random Effects

Degrees of Freedom: 2
m value: 14.4912 Prob. > m: 0.0007
Variable
INTERCEP
LPRICE
LREVENUE
DF
1
1
1
Parameter
Estimate
4.692305
-0.149074
0.053077
Parameter Estimates
Standard T for H0:
Error
Parameter=0
0.354917 13.220844
0.017611 -8.465039
0.032306 1.642977
Prob > |T|

0.0001
0.0001
0.1008
Variable
Label
Intercept
215
MODEL 4. TWO-WAY FIXED EFFECTS

TSCSREG Procedure
Dependent Variable:
LCONSO
Model Description
Estimation Method
RANTWO
Time Series Length
6
Variance Component Estimates
SSE 2.707154 DFE
693
MSE 0.003906 Root MSE 0.062501
RSQ 0.0907
Variance Component for Cross Sections
Variance Component for Time Series
Variance Component for Error
0.043638
0.000746
0.003849
Hausman Test for Random Effects

Degrees of Freedom: 2
m value: 22.2377 Prob. > m: 0.0000
Variable
INTERCEP
LPRICE
LREVENUE
DF
1
1
1
Parameter
Estimate
5.674742
-0.225151
-0.018251
Parameter Estimates
Standard T for H0:
Error
Parameter=0
0.371984 15.255323
0.027604 -8.156464
0.032401 -0.563297
Prob > |T|

0.0001
0.0001
0.5734
Variable
Label
Intercept
216
SOFTWARE
WITHIN REGRESSION USING PROC REG
Source
Model
Error
c Total
Analysis
Sum of
DF
Squares
2
0.31252
693 2.57810
695 2.89062
Root MSE
Dep Mean
C.V.
Variable
INTERCEP
QPRICE
QREVENUE
DF
1
1
1
of Variance
Mean
Square
F Value
0.15626 42.003
0.00372
0.06099
-0.00000
-1.291786E17
R-square
Adj R-sq
Prob>F
0.0001
0.1081
0.1055
Parameter Estimates
Parameter
Standard
T for H0:
Estimate
Error
Parameter=0
-5.28092E-17 0.00231195 -0.000
-0.134245
0.01684666 -7.969
0.024386
0.03034107 0.804
Prob > |T|

1.0000
0.0001
0.4218
Variable
Label
BETWEEN REGRESSION USING PROC REG
Source
Model
Error
C Total
DF
2
693
695
Analysis of Variance
Sum of
Mean
Squares
Square
F Value
7.13103
3.56551 84.369
29.28684 0.04226
36.41786
Root MSE
Dep Mean
C.V.
Variable
INTERCEP
MPRICE
MREVENUE
DF
1
1
1
Parameter
Estimate
-0.176444
-0.259461
0.494483
0.20557
4.99481
4.11576
R-square
Adj R-sq
Prob>F
0.0001
0.1958
0.1935
Parameter Estimates
Standard
T for H0:
Error
Parameter=0
0.68091356 -0.259
0.02278084 -11.389
0.05958703 8.298
Prob > |T|

0.7956
0.0001
0.0001
Variable
Label
217
Appendix 8. A crash course in Gauss c

Introduction
Gauss is an interpreter computer language, that is most conveniently run in interactive mode (global variables are kept in memory until one quits Gauss). It has a small built-in editor useful for
long jobs, or it can be used in command mode.
Editing and running jobs

When Gauss is executed rst, you are inside the command mode,
with the following prompt:
[Gauss].
You can toggle between the
command mode and the edit mode using either tool bar (Windows bar at the bottom, Gauss bar on top). In command mode,
you can edit any le (for example
myprog.prg)
by typing
edit
myprog.prg, or running this le by typing run myprog.prg.
You
may edit the preselected le by entering the F4 function key. In

edit mode, simply use the Run option on top, or enter key function F3. You may save the program by entering the F2 function
key.
Saving results and output management

To declare a text le for output, use the syntax
output file=c:/mydir/toto.out reset;

The reset option clears the le if it exists!
In a program, you can choose to have output written to the le
or not (useful for inspecting results on the screen only):
output on; (open output le at bottom) or output off; (closes

output le).
218
APPENDIX 8. A CRASH COURSE IN GAUSS
Loading data and creating Gauss datasets

You can either work with data les in text format (Ascii), or with
preexisting Gauss datasets. To load a text-format data le:
load x[1000,5]=mydata.dat
or
n=100;t=10;nvar=5;load x[n*t,nvar]=mydata.dat;.
1000 5 matrix denoted x in memory .

1
This will load a
Some built-in procedures in Gauss require specic Gauss datasets.
To create one, you must specify a) a data matrix ( ); b) a vector

of variable names (
mydata").
varnames);
c) and the Gauss dataset name
("
Then, use the command
call saved(x,"mydata",varnames).
Basic operators
In Gauss, most operators return a value that may be stored in a
variable, or printed to screen. If no assigment command is given,
the program will simply output the result to the screen. Example:
2*x; vs. y=2*x.

You don't have to specify the dimension of ectors or matrices if
they are assigned a computed value.
But you need to assign a
prior value in two cases: a vector/matrix of parameters (that can

be modied afterwards) or when using loops (see below). To create a vector with predetermined values:
x={1 2 3};
(a
1 3 vector) or x={1,2,3}; (a 3 1 vector).
And you can do the same for a vector of strings:
1 Note:
commands are always separated by semicolons
;.
vnames={"a","b","c"}.
219
Here is a list of useful operators:
cols(x)
rows(x)
meanc(x)
stdc(x)
sqrt(x)
sumc(x)
cumsumc(x)
columns of
x;
Returns the number of columns of

Returns the number of rows of
x;
Returns the mean of columns in
x;
x;
Returns the standard deviation of columns in

Computes square root of elements in
x;
Returns the sum of elements in columns of
x;
x;
Returns the cumulative sum of elements in
cdfn(x)
Returns the cumulative normal distribution (x);
2
cdfchic(x,y)
Returns the complement to 1 of the (x) cumulative distribution with
2
puting p-values of
degrees of freedom. Useful for com-
tests.
Working with matrices
x'
Transposes matrix or vector x;
y=x1 x2, y=x1|x2
Concatenates
two vectors or matrices
horizontally or vertically;
y=x[.,1]
Selects column 1 and all rows of matrix x;
y=x[1:10,.]
Selects rows 1 to 10 and all columns;
y=x[1:10,1:20]
Selects columns 1 to 20 and rows 1 to 10;
vec(x)
Creates a vector from a matrix, by stacking all
columns one after the other. vec(x) is NT 1 if x is N T ;
diag(x)
Returns the rst diagonal of matrix x (must be
square);
reshape(x,n,t)
Reshapes matrix x into a N T matrix;
a*b*c
Performs matrix multiplication (check number of rows
and columns!);
a.*b, a./b
Performs element-wise matrix multiplication or
220
a and b must have the same dimension);

inv(x)
Compute inverse of x (for generalized
division (
inverse, use
invpd(x));
zeros(n,m)
Returns a n m matrix of zeros;
ones(n,m)
Returns a n m matrix of ones;
eye(n)
Returns a n n identity matrix;
a.*.b
Computes the Kronecker product a
b;
Conditional operators and loops
Useful for testing and creating dummy variables. Operators:
.neq, .lt, .le, gt., ge.
.eq,
for equal to, not equal to, strictly
less than, less than or equal to, strictly greater than, greater than
or equal to.
Example: suppose you want to create an indicator variable equal
xi 50; i = 1; 2; : : : ; N . The syntax would be

y= x .le 50, which creates a N 1 vector y , with yi = 1 if
xi 50 and 0 otherwise. That is, when a variable is assigned
to 1 when
the result of a condition, Gauss automatically creates an indicator variable.

Example: You want to create a new variable
and equal to
y if z > 0.
+ y.*(z .gt 0).
z , equal to x if z < 0
The syntax would be
z = x.*(z .lt 0)
Loops are not recommended because they produce lengthy processes, and vector operators should always be preferred. But in
some cases, they are necessary. Examples of loops are:
i=1; do until i>n;

y[i]=x[i]+a;
i=i+1;
endo;
221
or
i=1; do while i=<n;

y[i]=x[i]+a;
i=i+1;
endo;
Note: in the above examples, vector
instance
y=zeros(n,1).
y must be dened before, for
Working with data matrices
It is very easy with Gauss to sort data vectors or matrices, or to

select a subset of observations.
y=sorthc(x,1)
Sorts matrix
x using
variable in column 1
as key;
y=selif(x, x .eq 1)
Creates matrix
Creates matrix
equal to 1;
y=delif(x, x .lt 0)
tive values from
x;
from values of
by deleting nega-
Creating procedures
Very useful to speed up repetitive tasks. The general syntax is
proc func(a);
local toto;
:::
retp(toto);
endp;.
a as input (scalar, vector or matrix), create toto as

local variable (not accessible outside procedure func) and return
a single argument toto. In some cases, it is necessary to have more
This will use
than 1 input and 1 output; we can use then:
222
proc (3)=func(a1,a2,: : : ,aK);

local toto1,toto2,toto3;
:::
retp(toto1,toto2,toto3);
endp;.
This code declares 3 inputs
a1; a2; a3; proc(3) = func states that
there will be 3 outputs.

In that case, we must use the following syntax when calling this
procedure:
{b1,b2,b3}=func(a1,a2,a3);
Beware of the use of local variables; any variable used in the procedure must either be declared as local (its value is lost when one
quits the procedure) or else where in the program (this will be a
global variable). A possibility to avoid problems is to declare all
variables as global at the start of the program, with the syntax:
clearg a,b,: : : , toto;
Example: procedure for returning deviations from individual means

(Within operator).
proc(x);
local toto;
toto=reshape(x,n,t);
toto=toto-meanc(toto');
toto=reshape(toto,n*t,1);
retp(toto);
endp;
Note in this case, variables
and
are global variables. If not,
one could use them as arguments in an equivalent, more compact

procedure:
proc(x,n,t);
local toto;
223
toto=reshape(x,n,t);
retp(reshape(toto-meanc(toto'),n*t,1));
retp(toto);
endp;
And if we wished to return both Between and Within:
proc (2)=(x,n,t);
local toto;
toto=reshape(meanc(reshape(x,n,t)'),n*t,1);
retp(toto,x-toto));
endp;
Some useful built-in procedures
Some of these procedures require a Gauss dataset to be created.

If not, a 0 is put in place of the Gauss dataset name.
call dstat(0,x)
in
x;
Prints descriptive statistics for elements
call dstat("mydata",1|3)
Prints descriptive statistics for
elements 1 and 3 in Gauss dataset "mydata";
call ols(0,y,x);
Runs an OLS regression of
To minimize a function, a useful procedure is
y on x;
optmum, which works
as follows:
library optmum;optmum;
To load library and default ar-
guments;
x0={0.1 , 0.1 , 0.5};
Declare initial values of parame-
ters;
{x, f, g, ret} = optmum(&func,x0);
Main command;
x returns the nal value of parameters after convergence, f
is the
224
nal function value,
is the gradient vector, and
ret is a return
code (equal to 0 if convergence is OK).

The optmum procedure calls a user-dened procedure (here,
func)
that returns the value of the function to be minimized, depending

on parameters (here,
proc(z);
:::;
retp(crit);
endp;
z).
Example: To estimate a nonlinear model by minimizing the residual sum of squares, where the model is
log(1)wi:
yi = 0 + 12xi +
library optmum;optmum;
x0={0.1 , 0.1 , 0.5};
{x, f, g, ret} = optmum(&func,x0);
proc(z);
local err;
err=y-z[1]-z[1]*z[2]*x-ln(z[2])*w;
z [2], 2 is z [3], and variables y; x; w

be global variables, while err (the residual) is local
1 PN u2
err=meanc(err'*err);
Computes
i i
N
Note:
is
z [1], 1
is
retp(crit);
endp;
To compute gradients and hessians numerically:
gg=gradp(&func,x) and hh=hessp(&func,x0).
must
225
Appendix 9. Example: The Gauss c software

/* DYNTAB.PRG 16 01 2001 Residential water use */
new; clear all;
library tscs,pgraph;
tscsset;graphset;
output le=d:/dea/panel/dyntab.out reset;

output on;
n=116; t=6;
load x[n*t,6]=d:/dea/panel/dyntab3.dat;
id=x[.,1];
year=x[.,2];
conso=ln(x[.,3]);
price=ln(x[.,4]);
revenue=ln(x[.,5]);
precip=ln(x[.,6]);
vnames="year","conso","price","revenue","precip","id" ;
call saved(year conso price revenue precip id,"watle",vnames);
y= conso ;
x= price,revenue ;
grp= id ;
__title("Water demand equation");
call tscs("watle",y,x,grp);
226
APPENDIX 9. EXAMPLE: THE GAUSS
SOFTWARE
=====================================================================
TSCS Version 3.1.2 1/17/01 3:51 pm
=====================================================================
Data Set: watfile
OLS DUMMY VARIABLE RESULTS
Dependent variable: conso

Observations :
Number of Groups :
Degrees of freedom :
Residual SS :
Std error of est :
Total SS (corrected) :
F = 35.033
P-value =
Var
price
revenue
Coef.
-0.134245
0.024386
Std.
Group Number
1
2
3
...
114
115
116
696
116
578
2.578
0.067
2.891
with 2,578 degrees of freedom
0.000
Coef.

-0.347461
0.035045
Std.
Error

0.018447
0.033223
Dummy Variable
4.643484
4.876781
5.252595
... ... ...
4.839490
4.858434
5.099257
t-Stat

-7.277506
0.734009
Standard Error
0.365639
0.370063
0.369474
... ... ...
0.365496
0.359065
0.366957
F-statistic for equality of dummy variables :

F(115, 578) = 58.3964 P-value: 0.0000
P-Value
0.000
0.463
227

OLS ESTIMATE OF CONSTRAINED MODEL


Observations :
696
Number of Groups :
116
693
R-squared :
0.172
Rbar-squared :
0.170
Residual SS :
32.532
Std error of est :
0.217
Total SS (corrected) : 39.308
F = 72.175
P-value =
0.000
Var
CONSTANT
price
revenue
Coef.
1.164761
-0.249873
0.376643
Std.
Coef.

-0.406149
0.257121
Std.
Error

0.598014
0.022153
0.052746
t-Stat

1.947715
-11.279345
7.140637
P-Value

0.052
0.000
0.000
FULL, RESTRICTED, AND PARTIAL R-SQUARED TERMSDUMMY VARIABLES ARE CONSTRAINED

TABLE OF R-SQUARED TERMS
R-squaredfull model:
0.934
R-squaredconstrained model: 0.172
Partial R-squared:
0.921

FULL, RESTRICTED, AND PARTIAL R-SQUARED TERMSX VARIABLES ARE CONSTRAINED
228
APPENDIX 9. EXAMPLE: THE GAUSS
SOFTWARE
TABLE OF R-SQUARED TERMS

R-squaredfull model:
0.934
R-squaredconstrained model: 0.926 Partial R-squared:
0.108
GLS ERROR COMPONENTS RESULTS


Observations :
696
Number of Groups :
116
693
Residual SS :
3.135
Std error of est :
0.067
Total SS (corrected) : 3.517
F = 22047.870
P-value =
0.000
Std. errors of error terms:
Individual constant terms: 0.206
White noise error : 0.067
Var
CONSTANT
price
revenue
Coef.
4.687235
-0.149316
0.053560
Std.
Coef.

-0.363264
0.071009
Std.
Error

0.355285
0.017623
0.032338
t-Stat

13.192903
-8.472974
1.656247
P-Value

0.000
0.000
0.098
229
Group Number
1
2
3
4
5
...
112
113
114
115
116
Random Components
-0.346522
-0.121608
0.250638
-0.020350
0.128761
... ... ...
0.512636
-0.216224
-0.151243
-0.125587
0.104064
Lagrange Multiplier Test for Error Components Model

Null hypothesis: Individual error components do not exist.
Chi-squared statistic (1): 1367.1014
P-value:
0.0000
230
APPENDIX 10. IV AND GMM ESTIMATION WITH GAUSS
Appendix 10. IV and GMM estimation with

Gauss c
/* IV2.PRG Instrumental variable estimation and GMM estimation
Model y(it) = X(it)beta + Z(i) gamma
We use Hausman-Taylor, Amemiya-MaCurdy, Breusch-Mizon-Schmidt instruments,
both for IV and GMM */
new;clear all;
/* You only need to change this block */
/* Define dimensions
N: number of units, T=number of time periods
nvar= Nb. of variables to be read
k1: Nb. of X1it, k2: Nb. of X2it, g1= Nb.
kq= k1+k2, kb= k1+k2+g1+g2*/
n=595;
t=7;
nvar=13;
k1=4;
k2=5;
g1=2;
g2=1;
kq=k1+k2;
kb=k1+k2+g1+g2;
et=ones(t,1);
un=ones(n*t,1);
unb=ones(n,1);
/* Read data */
load x[n*t,nvar]=psid.dat;
output file=iv1.out reset;
expe=x[.,1];
expe2=x[.,2];
wks=x[.,3];
of Z1i, g2: Nb.
of Z2i
231
occ=x[.,4];
ind=x[.,5];
south=x[.,6];
smsa=x[.,7];
ms=x[.,8];
fem=x[.,9];
unioni=x[.,10];
edu=x[.,11];
blk=x[.,12];
lwage=x[.,13];
/* Define matrices X, Z and vector Y */
x1=occ south smsa ind;
x2=expe expe2 wks ms unioni;
z1=fem blk;
z2=edu;
y=lwage;
x=x1 x2;
z=z1 z2;
/* You don't need to change anything after this */
/* Compute Between and Within transformations:
Caution: keep that order for BXZ: X,Z,Y */
qx=with(x y);
bxz=bet(x z y);
by=bxz[.,cols(bxz)];
bxz=bxz[.,1:cols(bxz)-1];
qy=qx[.,cols(qx)];
qx=qx[.,1:cols(qx)-1];
/* Within regression and error term (uw) */
betaw=inv(qx'qx)*qx'qy;
uw=qy-qx*betaw;
/* Compute variance with instruments */
exob=un bxz;
gamb=inv(exob'exob)*(exob'by);
BX and QX
232
ub=by-exob*gamb;
sigep=uw'uw/(n*(t-1)-kq);
sigq=sqrt(sigep*diag(inv(qx'qx)));
a=x1 z1;
di=by-bxz[.,1:kq]*betaw;
zz=un z1 z2;
gamhatw=inv(zz'*a*inv(a'*a)*a'*zz)*zz'*a*inv(a'*a)*a'*di;
s2=(1/(n*t))*(by-bxz[.,1:kq]*betaw
-zz*gamhatw)'*(by-bxz[.,1:kq]*betaw-zz*gamhatw);
sigal=s2-(1/t)*sigep;
theta=sqrt(sigep/(sigep+t*sigal));
/* GLS transformation and estimate
Caution: keep the order 1,X1,X2,Z1,Z2 in matrix EXOG */
exog=gls(un x1 x2 z1 z2 y);
yg=exog[.,cols(exog)];
exog=exog[.,1:cols(exog)-1];
betagls=inv(exog'exog)*(exog'yg);
siggls=sqrt(sigep*diag(inv(exog'exog)));
/* HT */
aht=un qx bet(x1) z1;
betaht=inv(exog'*aht*inv(aht'*aht)*aht'*exog)*exog'*aht*inv(aht'*aht)
*aht'*yg;
sight=sqrt(sigep*diag(inv(exog'*aht*inv(aht'*aht)*aht'*exog)));
/* AM */
x1s=tam(x1);
aam=un qx x1s z1;
betaam=inv(exog'*aam*inv(aam'*aam)*aam'*exog);
betaam=betaam*exog'*aam*inv(aam'*aam)*aam'*yg;
sigam=sqrt(sigep*diag(inv(exog'*aam*inv(aam'*aam)*aam'*exog)));
/* BMS */
233
abms1=aam tbms(with(x2));
/* This is the general form for BMS instrument, it should work in most
cases. But with the application to PSID data, we must drop some variables,
see below. This means you have to delete ABMS1 below for your application
*/
/* Remove abms1 just below: */
abms1=un qx bet(x1) tbms(with(occ south smsa ind ms wks unioni)) z1;
betabms1=inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog)
*exog'*abms1*inv(abms1'*abms1)*abms1'*yg;
sigbms1=sqrt(sigep*diag(inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog)));
/* Compute variance-covariance matrices */
varq=sigep*inv(qx'qx); varg=sigep*inv(exog'*exog);
varht=sigep*inv(exog'*aht*inv(aht'*aht)*aht'*exog);
varam=sigep*inv(exog'*aam*inv(aam'*aam)*aam'*exog);
varbms1=sigep*inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog);
test1=(betagls[2:kq+1]-betaw)'*inv(varq-varg[2:kq+1,2:kq+1]);
test1=test1*(betagls[2:kq+1]-betaw);
test2=(betaht[2:kq+1]-betaw)'*inv(varq-varht[2:kq+1,2:kq+1])
*(betaht[2:kq+1]-betaw);
test3=(betaht-betaam)'*inv(varht-varam)*(betaht-betaam);
test4=(betaam-betabms1)'*inv(varam-varbms1)*(betaam-betabms1);
output file=iv1.out reset;
output on;
"Within estimates ";
" Estimate standard error t-stat ";
betaw sigq betaw./sigq;
"GLS estimates";
"sigma(alpha),sigma(epsilon),theta(=(sig(ep)/(sig(ep+t*sig(al)))**(1/2))";
sigal sigep theta;
betagls siggls betagls./siggls;
234
"HT estimates ";

betaht sight betaht./sight;
"AM estimates ";
betaam sigam betaam./sigam; "BMS estimates ";
betabms1 sigbms1 betabms1./sigbms1;
"Hausman test statistics and p-value ";
"Within vs. GLS ";
test1 cdfchic(test1,kq);
"Within vs. HT ";
test2 cdfchic(test2,k1-g2);
"AM vs. HT ";
test3 cdfchic(test3,cols(aam)-cols(aht));
"BMS vs. AM ";
test4 cdfchic(test4,cols(abms1)-cols(aam));
/* GMM estimation */
b1,se1,b2,se2,sar = gmm(y,un x1 x2 z1 z2,aht,1);
"GMM-HT estimates ";
b2 se2 b2./se2;
"Hansen test and p-value ";
sar cdfchic(sar,cols(aht)-rows(b2));
b1,se1,b2,se2,sar = gmm(y,un x1 x2 z1 z2,aam,1);
"GMM-AM estimates ";
b2 se2 b2./se2;
sar cdfchic(sar,cols(aam)-rows(b2));
b1,se1,b2,se2,sar = gmm(y,un x1 x2 z1 z2,abms1,1);
"GMM-BMS estimates ";
235
b2 se2 b2./se2;
sar cdfchic(sar,cols(abms1)-rows(b2));
output off;
proc bet(w);
/* Compute BX from matrix w */
local i,term,betx;
term=reshape(w[.,1],n,t);
term=meanc(term').*.et;
term=reshape(term,n*t,1);
betx=term;
i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t);
term=reshape(meanc(term').*.et,n*t,1);
betx=betx term;
i=i+1;
endo;
retp(betx);
endp;
proc with(w);
/* Compute Within transformation for matrix W */
retp(w-bet(w));
endp;
proc gls(w);
/* GLS transformation */
local term; term=w-(1-theta)*bet(w);
retp(term);
endp;
proc tam(w);
/* AM transformation, stacking time observations */
local i,term,xstar;
term=reshape(w[.,1],n,t).*.et;
xstar=term;
236
i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t).*.et;
xstar=xstar term;
i=i+1;
endo;
retp(xstar);
endp;
proc tbms(w);
/* BMS transformation, stacking time observations but deleting last column
*/
local i,term,xstar;
term=reshape(w[.,1],n,t).*.et;
xstar=term[.,1:cols(term)-1];
i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t).*.et;
xstar=xstar term[.,1:cols(term)-1];
i=i+1;
endo;
retp(xstar);
endp;
proc (5)=gmm(y,x,z,d);
local zx,w,w2,b,e,e2,b2,se,se2,sar2;
zx = z'x;
if d==1;
w = invpd(inw(z));
else;
w = invpd(z'z);
endif;
b = invpd(zx'w*zx)*zx'w*z'y;
e = y-x*b;
w2 = ezw(e,z);
se = invpd(zx'w*zx)*zx'w*w2*w*zx*invpd(zx'w*zx);
237
w = invpd(w2);
se2 = invpd(zx'w*zx);
b2 = se2*zx'w*z'y;
e2 = y-x*b2;
sar2 = e2'z*w*z'e2;
retp(b,sqrt(diag(se)),b2,sqrt(diag(se2)),sar2);
endp;
proc ezw(e,z);
local k,ez,T;
T = rows(e)/N;
k = cols(z);
ez = reshape(e.*z,N,K*T)*(ones(T,1).*.eye(K));
retp(ez'ez);
endp;
proc inw(z);
local a,i,zi,zaz,T;
t = rows(z)/N;
a = eye(T);
zaz = 0;
i = 1;
do until i>N;
zi = z[(i-1)*T+1:i*T,.];
zaz = zaz + zi'a*zi;
i = i+1;
endo;
retp(zaz);
endp;
238
APPENDIX 11. DPD ESTIMATION WITH GAUSS
Appendix 11. DPD estimation with Gauss c
/* DPD1.PRG Program for DPD (Dynamic Panel Data model)

Method: Arellano-Bond */
/* Defines variables below as global */
clearg N,T,y,x,z,alpha,sco,hes,zgy,fake,mom,w;
/*Read data*/
n=595; t=7; nvar=13;
load x[n*t,nvar]=d:/dea/panel/psid.dat;
lwage=x[.,13];
wks=x[.,3];
occ=x[.,4];
clear x;
/* Create a (NxT) matrix for dependent var.
y=reshape(lwage,n,t);
/* Stack exogenous vars. */
x=wks occ;
*/
/* Set top=0 for instruments from lagged Y's only;

top=1 to add instruments from X that are weakly exogenous and in level;
set top=2 to add for instruments from X that are strongly exogenous and
in first-difference form */
top=2;
/* Set AR1 to 0 for general case, and AR1 to 1
for serially correlated epsilon's of order 1 (E (epi tepi ; t + 1) <> 0) */
ar1=1;
/* You don't need to change anything after this line */
/* Define identity matrices I(T-2) for AB and BB */
ddif = eye(T-2);
/* Construct AB instrument matrix Z.
239
First component matrix: lagged Y's
Recall: if AR1=1, restriction when epsilon's are serially correlated
of order 1 */
z = (y[.,1]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z = z ((y[.,1:j]).*.ddif[.,j]);
j = j+1;
endo;
if ar1==1;
z = (y[.,1]).*.ddif[.,1];
j = 2;
z = z ((y[.,1:j-1]).*.ddif[.,j]);
j = j+1;
endo;
z=z[.,2:cols(z)];
endif;
/* Second component matrix: Instruments from X */
/* Delete this block if you want only instruments from y's */
if top==1;
/* Weakly exogenous X's, in level */
toto=shapent(x[.,1]);
z2 = (toto[.,1]).*.ddif[.,1];
j = 2;
z2 = z2 ((toto[.,1:j]).*.ddif[.,j]);
j = j+1;
endo;
i=2;
do until i>cols(x);
toto=shapent(x[.,i]);
z2 =z2 ((toto[.,1]).*.ddif[.,1]);
j = 2;
z2 = z2 ((toto[.,1:j]).*.ddif[.,j]);
240
j = j+1;
endo;
i=i+1;
endo;
z=z z2;
endif;
if top==2;
/* Strongly exogenous X's, in first-difference form */
toto=shapent(x[.,1]);
z2 = (toto[.,3]-toto[.,2]).*.ddif[.,1];
j = 2;
z2 = z2 ((toto[.,j]-toto[.,j-1]).*.ddif[.,j]);
j = j+1;
endo;
i=2;do until i>cols(x);
toto=shapent(x[.,i]);
z2 = z2 ((toto[.,3]-toto[.,2]).*.ddif[.,1]);
j = 2;
z2 = z2 ((toto[.,j]-toto[.,j-1]).*.ddif[.,j]);
j = j+1;
endo;
i=i+1;
endo;
z=z z2;
endif;
b1,se1,b2,se2,sar = gmm(vec((y[.,3:T]-y[.,2:T-1])'),
vec((y[.,2:T-1]-y[.,1:T-2])')
trans(x),z,1);
output file = dpd1.out on;
"Arellano-Bond GMM estimates";
if top ==0;
"Instruments from lagged Y's only (TOP=0)";
endif;
if top==1;
241
"Instruments from X are weakly exogenous and in level (TOP=1)";
endif;
if top==2;
"Instruments from X are strongly exogenous and first-differenced (TOP=2)";
endif;
if ar1==1;
"Restricted estimates: epsilon are serially correlated of order 1 (AR1=1)";
endif;
" Estimate standard error t-stat";
b2 se2 b2./se2;
"Nb. of conditions (instruments) " cols(z);
"Nb. of parameters " rows(b2);
"Hansen specification test and p-value ";
sar cdfchic(sar,cols(z)-rows(b2));
output off;
proc shapent(w);
/* Reshapes vector in NxT form */
retp(reshape(w,n,t));
endp;
proc trans(w);
/* Transforms matrix X in First Difference */
local toto,i,xfd;
toto=reshape(w[.,1],n,t);
toto=vec((toto[.,3:T]-toto[.,2:T-1])');
xfd=toto;
i=2;
do until i>cols(w);
toto=reshape(w[.,i],n,t);
toto=vec((toto[.,3:T]-toto[.,2:T-1])');
xfd=xfd toto;
i=i+1;
endo;
retp(xfd);
endp;
242
proc (2)=ls(y,x);
/* Computes OLS, returns White var-covar matrix */
local ixx,b,e,v;
ixx = invpd(x'x);
b = ixx*x'y;
e = y-x*b;
v = ixx*(ezw(e,x))*ixx;
retp(b,v);
endp;
proc ezw(e,z);
local k,ez,T;
T = rows(e)/N;
k = cols(z);
ez = reshape(e.*z,N,K*T)*(ones(T,1).*.eye(K));
retp(ez'ez);
endp;
proc inw(z);
local d,a,i,zi,zaz,T;
T = rows(z)/N;
d = zeros(T,1) (eye(T-1)|zeros(1,T-1));
a = 2*eye(T) - (d + d');
zaz = 0;
i = 1;
do until i>N;
zi = z[(i-1)*T+1:i*T,.];
zaz = zaz + zi'a*zi;
i = i+1;
endo;
retp(zaz);
endp;
proc (5)=gmm(y,x,z,d);
local zx,w,w2,b,e,e2,b2,se,se2,sar2;
zx = z'x;
243
if d==1;
w = invpd(inw(z));
else;
w = invpd(z'z);
endif;
b = invpd(zx'w*zx)*zx'w*z'y;
e = y-x*b;
w2 = ezw(e,z);
se = invpd(zx'w*zx)*zx'w*w2*w*zx*invpd(zx'w*zx);
w = invpd(w2);
se2 = invpd(zx'w*zx);
b2 = se2*zx'w*z'y;
e2 = y-x*b2;
sar2 = e2'z*w*z'e2;
retp(b,sqrt(diag(se)),b2,sqrt(diag(se2)),sar2);
endp;
244
REFERENCES
References
S.C. Ahn and P. Schmidt, Ecient Estimation of Models for Dynamic Panel
Data, Journal of Econometrics, 68, 5-27, 1995.
S.C. Ahn and P. Schmidt, A Separability Result for GMM Estimation, with
Applications to GLS Prediction and Conditional Moment Tests, Econometric Reviews, 14(1), 19-34, 1995.
S.C. Ahn and P. Schmidt, Ecient Estimation of Dynamic Panel Data Models:
Alternative Assumptions and Simplied Estimation, Journal of Econometrics, 76,
309-321, 1997.
S.C. Ahn, Y.H. Lee and P. Schmidt, GMM Estimation of Linear Panel Data
Models with Time-varying Individual Eects, Journal of Econometrics, 101, 219255, 2001.
T. Amemiya, The estimation of the variances in a variance-components model,
International Economic Review, 12, 1-13, 1971.
T. Amemiya and T.E. MaCurdy, Instrumental-Variable Estimation of an ErrorComponents Model, Econometrica, 54(4), 869880, 1986.
E.B. Andersen, Conditional inference and models for measuring (Mentalhygiejnisk Forlag, Copenhague), 1973.
T.W. Anderson and C. Hsiao, Formulation and Estimation of Dynamic Models
Using Panel Data, Journal of Econometrics, 18, 4782, 1982.
D.W.K. Andrews, Heteroskedasticity and autocorrelation consistent covariance
matrix estimation, Econometrica, 59, 817-858, 1991.
D.W.K. Andrews and J.C. Monahan, An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator, Econometrica, 60, 953-966,
1992.
W. Antweiler, Nested Random Eects Estimation in Unbalanced Panel Data,
Journal of Econometrics, 101, 295-313, 2001.
M. Arellano, Discrete choices with panel data, working paper 0101, CEMFI,
245
2001.
M. Arellano and S. Bond, Some Tests of Specication for Panel Data: Monte
Carlo Evidence and an Application to Employment Equations, Review of Economic
Studies, 58, 277297, 1991.
M. Arellano and O. Bover, Another Look at the Instrumental Variable Estimation of Error-Components Models, Journal of Econometrics, 68, 2951, 1995.
J. Alvarez and M. Arellano, The Time Series and Cross Section Asymptotics
of Dynamic Panel Data Estimators, CEMFI Working Paper No. 9808, 1998.
P. Balestra and M. Nerlove, Pooling cross-section and time-series data in the
estimation of a dynamic model: the demand for natural gas, Econometrica, 34,
585-612,1966.
B.H. Baltagi, Econometric Analysis of Panel Data, J. Wiley, 1995.
B.H. Baltagi and S. Khanti-Akom, On ecient estimation with panel data:an
empirical comparison of instrumental variables estimators, Journal of Applied Econometrics, 5, 401-406, 1990.
B.H. Baltagi, Simultaneous equations with error components, Journal of Econometrics, 17, 189-200, 1981.
B.H. Baltagi, Specication issues, in The econometrics of panel data: Handbook of theory and applications, chap. 9, L. Matyas and P. Sevestre eds., Kluwer
Academix Publishers, Dordrecht, 196-205, 1992.
B.H. Baltagi, Panel data, Journal of Econometrics, 68, 1-268, 1995.
B.H. Baltagi, S.H. Song and B.C. Jung, The Unbalanced Nested Error Component Regression Model, Journal of Econometrics, 101, 357-381, 2001.
R. Blundell and S. Bond, GMM estimation with persistent panel data: An
application to production functions, IFS working paper W99/4, 1999.
R. Blundell and S. Bond, Initial Conditions and Moment Restrictions in Dynamic Panel Data Models, Journal of Econometrics, 87, 115143, 1998.
246
REFERENCES
A. Brsch-Supan and V. Hajivassiliou, Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variables
models, Cowles Foundation paper 960, Yale University, 1990.
T.S. Breusch, G.E. Mizon and P. Schmidt, Ecient Estimation Using Panel
Data, Econometrica, 57(3), 695-700, 1989.
G. Chamberlain, Asymptotic Eciency in Estimation with Conditional Moment Restrictions, Journal of Econometrics, 34, 305-334, 1987.
G. Chamberlain, Panel data, in Handbook of Econometrics, pp. 1247-1318, Z.
Griliches and M. Intriligator eds., North- Holland, Amsterdam, 1984.
G. Chamberlain, Comment: Sequential Moment Restrictions in Panel Data,
Journal of Business and Economic Statistics, 10, 20-26, 1992.
G. Chamberlain, Multivariate regression models for panel data, Journal of
Econometrics, 18, 5-46, 1982.
E. Charlier, B. Melenberg and A. van Soest, Estimation of a censored regression panel data model using conditional moment restrictions eciently, Journal of
Econometrics, 95, 25-56, 2000.
C. Cornwell. and P. Rupert, Ecient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators, Journal of Applied Econometrics, 3, 149-155, 1988.
B. Crpon, F. Kramarz and A. Trognon, Parameters of Interest, Nuisance Parameters and Orthogonality Conditions. An Application to Autoregressive Error
Component Models, Journal of Econometrics, 82, 135156, 1997.
C. Cornwell, P. Schmidt and D. Wyhowski, Simultaneous equations and panel
data, Journal of Econometrics, 51, 151-181, 1992.
G. Dionne, R. Gagn and C. Vanasse, Inferring technological parameters from
incomplete panel data, Journal of Econometrics, 87, 303-327, 1998.
J. Dolado, Optimal instrumental variable estimator of the AR parameter of an
ARMA(1,1) process, Econometric Theory, 6, 117-119.
247
B. Dormont, Introduction l'Economtrie des Donnes de Panel, Editions du
Centre National de la Recherche Scientique, Paris, 1989.
E. Fix and J.L. Hodges, Discriminatory analysis, nonparametric estimation:
consistent properties, Report No 4, USAF School of Aviation Medicine, Randolph
Field, Texas, 1951.
J. Geweke, Bayesian inference in econometric models using Monte Carlo integration, Econometrica, 57, 1317-1339, 1989.
S. Girma, A quasi-dierencing approach to dynamic modelling from a time series of independent cross-sections, Journal of Econometrics, 365-383, 2000.
R. Hall, Stochastic implications of the life cycle-permanent income hypothesis,
Journal of Political Economy, 86, 971-987, 1978.
B.E. Hansen, Threshold Eects in Non-Dynamic Panels: Estimation, Testing,
and Inference,Journal of Econometrics, 93, 345368, 1999.
L.P. Hansen, Large sample properties of generalized method of moments estimators, Econometrica, 50, 102-1054, 1982.
L.P. Hansen, A method of calculating bounds on the asymptotic covariance
matrices of generalized method of moments estimators, Journal of Econometrics,
30, 203-238, 1985.
L.P. Hansen and T.J. Sargent, Instrumental variables procedures for estimating
linear rational expectations models, Journal of Monetary Economics, 9, 263-296,
1982.
L.P. Hansen and K.J. Singleton, Generalized instrumental variable estimation
of nonlinear rational expectations models, Econometrica, 50, 1269-1286, 1982.
L.P. Hansen, J.C. Heaton and A. Yaron, Finite-sample properties of some alternative GMM estimators, Journal of Business and Economics Statistics, 14, 262-280,
1993.
W. Hrdle and J.S. Marron, Optimal bandwidth selection in nonparametric
regression function estimation, Annals of Statistics, 13 1465-1481, 1983.
R.D.F. Harris and E. Tzavalis, Inference for unit roots in dynamic panels where
248
REFERENCES
the time dimension is xed, Journal of Econometrics, 91, 201-226, 1999.

J.A. Hausman, Specication Tests in Econometrics, Econometrica, 46(6), 12511271,
1978.
J.A. Hausman and W.E. Taylor, Panel Data and Unobservable Individual Effects, Econometrica, 49(6), 13771398, 1981.
J.J. Heckman and T.E. MaCurdy, A life-cycle model of female labor supply,
Review of Economic Studies, 47, 47-74, 1980.
I. Hoch, Estimation of production function parameters combining time-series
and cross-section data, Econometrica, 30, 34-53, 1962.
D. Holtz-Eakin, W. Newey and H. Rosen, Estimating Vector Autoregressions
with Panel Data, Econometrica, 56, 13711395, 1988.
B.E. Honor and A. Lewbel, Semiparametric binary choice panel data models
without strictly exogeneous regressors, working paper, Boston College, 2000.
C. Hsiao, Analysis of Panel Data, Cambridge University Press, 1986.
K.S. Im, S.C. Ahn, P. Schmidt and J.M. Wooldridge, Ecient estimation of
panel data models with strictly exogenous explanatory variables, Journal of Econometrics, 93, 177-201, 1999.
G.W. Imbens, One-step estimators for over-identied generalized method of
moments models, Review of Economic Studies, 64, 359-383.
J. Inkmann, Misspecied heteroskedasticity in the panel Probit model: A small
sample comparison of GMM and SML estimators, Journal of Econometrics, 97, 227259, 2000.
R.A. Judson and A.L. Owen, Estimating dynamic panel data models: A guide
for macroeconomists, Economics Letters, 65, 9-15, 1999.
M.P. Keane and D.E. Runkle, On the estimation of panel-data models with
serial correlation when instruments are not strictly exogenous, Journal of Business
and Economic Statistics, 10, 1-9, 1992.
N.M. Kiefer, A Time Series-Cross Section Model with Fixed Eects with an
249
Intertemporal Factor Structure, unpublished manuscript, Cornell University, 1980.
E. Kyriazidou, Estimation of a panel data sample selection model, Econometrica, 65, 1335-1364, 1997.
Y.H. Lee and P. Schmidt, A Production Frontier Model with Flexible Temporal
Variation in Technical Ineciency, in The Measurement of Productive Eciency:
Techniques and Applications, Oxford University Press, 1993.
L.A. Lillard and Y. Weiss, Components of Variation in Panel Earnings Data:
American Scientists 1960-1970, Econometrica, 47, 437454, 1979.
R. Lucas, Econometric policy evaluation: A critique, in The Phillips curve and
labor markets, K. Brunner (Ed.), Vol. 1, North-Holland, 1976.
Y.P. Mack, Local properties of k N N regression estimates, SIAM Journal of
Algebraic and discrete methods, 2, 311-323, 1981.
L. Matyas and P. Sevestre, The Econometrics of Panel Data. Handbook of
Theory and Applications, Kluwer Academic Publishers, 1992.
P. Mazodier and A. Trognon, Heteroskedasticity and stratication in error components models, Annales de l'INSEE, 30-31, 451-482, 1978.
C. Meghir and F. Windmeijer, Moment Conditions for Dynamic Panel Data
Models with Multiplicative Individual Eects in the Conditional Variance,IFS
Working Paper Series No. W97/21, 1997.
R. Mott, Identication and estimation of dynamic models with a time series
of repeated cross-sections, Journal of Econometrics, 59, 99-123, 1993.
M. Nerlove, A note on error components models, Econometrica, 39, 383-396,
1971.
W.K. Newey, Ecient estimation of models with conditional moment restrictions, in Handbook of Statistics, C.R. Rao and H.D. Vinod (Eds.), Vol. 11, Elsevier
Science Publishers, 1993.
W.K. Newey, Ecient instrumental variables estimation of nonlinear models,
Econometrica, 58, 809-837, 1990.
250
REFERENCES
W.K. Newey and K.D. West, Automatic lag selection in covariance estimation,
Review of Economic Studies, 61, 631-653, 1994.
W.K. Newey and K.D. West, Hypothesis testing with ecient method of moments estimation, International Economic Review, 28, 777-787, 1987.
W.K. Newey and K.D. West, A simple, positive denite, heteroscedasticity and
autocorrelation consistent covariance matrix, Econometrica, 55, 703-708, 1987.
P. Schmidt, S.C. Ahn and D. Wyhowski, Comment: Sequential Moment Restrictions in Panel Data,Journal of Business and Economic Statistics, 10, 1014,
1992.
C.J. Stone, Consistent nonparametric regression, Annals of Statistics, 5, 595645, 1977.
P.A.V.B. Swamy and S.S. Arora, The exact nite sample properties of the estimators of coecients in the error components regression models, Econometrica,
40, 261-275, 1972.
M. Verbeek and T.E. Nijman, Testing for selectivity bias in panel data models,
International Economic Review, 33, 681-703, 1992.
M. Verbeek and T.E. Nijman, Minimum MSE estimation of a regression model
with xed eects and a series of cross- sections, Journal of Econometrics, 59, 125136, 1993.
T.D. Wallace and A. Hussain, The use of error components models in combining cross-sction and time-series data, Econometrica, 37, 55-72, 1969.
T.J. Wansbeek and A. Kapteyn, Estimation of the error components model
with incomplete panels, Journal of Econometrics, 41, 341-361, 1989.
H. White, A heteroscedasticity consistent covariance matrix estimator and a
direct test for heteroscedasticity, Econometrica, 48, 817-838, 1980.
H. White, Asymptotic theory for econometricians, Academic Press, Orlando,
1984.
251
J.M. Wooldridge, A framework for estimating dynamic, unobserved eects
panel data models with possible feedback to future explanatory variables, Economics Letters, 68, 245-250, 2000.

Panel DEEQA

Transféré par

Informations du document

Description originale:

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Panel DEEQA

Transféré par

Droits d'auteur :

Formats disponibles

DEEQA,Ecole Doctorale MPSE

Academic year 2003-2004

Purpose of the course

 Present recent developments in econometrics, that allow for

Panel data analysis.

 Present a convenient econometric framework for dealing with

Method of Moments estimation.

Deal with discrete-choice models with unobserved hetero-

Two keywords: unobserved heterogeneity and endogeneity.

- Generalized Method of Moments for Times Series

- Logit and Probit models for Panel Data

Statistical software: SAS, GAUSS, STATA (?)

Panel Data Models

Gains in pooling cross section and time series . . .

Discrimination between alternative models .

Less colinearity between explanatory variables 11

May reduce bias due to missing or unobserved variables

The linear model

Standard matrices and operators . . . . . .

Important properties of operators

The One-Way Fixed Eects model . . . . . . . . .

The estimator in terms of the Frisch-WaughLovell theorem . . . . . . . . . . . . . . . .

Interpretation as a covariance estimator

Testing for poolability and individual eects

The Random Eects model . . . . . . . . . . . . .

Notation and assumptions

GLS estimation of the Random-eect model

Comparison between GLS, OLS and Within

Fixed individual eects or error components? 29

Example: Wage equation, Hausman (1978)

Best Quadratic Unbiased Estimators (BQU)

The Two-way panel data model . . . . . . . . . . .

The Two-way xed-eect model

Example: Production function (Hoch 1962)

More on non-spherical disturbances

Heteroskedasticity in individual eect

Unbalanced panel data models

Fixed eect models for unbalanced panels .

Augmented panel data models

Choice between Within and GLS . . . . . . . . . .

An important test for endogeneity

Instrumental Variable estimation: Hausman-Taylor

Instrumental Variable estimation . . . . . .

Exogeneity assumptions and a rst instru-

More ecient procedures: Amemiya-MaCurdy

Full IV-GLS estimation procedure

Example: Wage equation

Application: returns to education

Variables related to job status

Variables related to characteristics of households heads

Computation of variance-covariance matrix for IV

Dynamic panel data models

Dynamic formulations from dynamic programming problems . . . . . . . . . . . . .

Euler equations and consumption . . . . . .

Long-run relationships in economics

The dynamic xed-eect model . . . . . . . . . . .

Bias in the Fixed-Eects estimator . . . . .

The Random-eects model

Bias in the ML estimator . . . . . . . . . .

The role of initial conditions

Possible inconsistency of GLS . . . . . . . .

Example: The Balestra-Nerlove study

Generalized Method of Moments estimation

Present recent developments in econometrics, that allow for

Present a convenient econometric framework for dealing with

The One-Way Fixed Eects model . . . . . . . . .

Testing for poolability and individual eects

The Random Eects model . . . . . . . . . . . . .

GLS estimation of the Random-eect model

Fixed individual eects or error components? 29

The Two-way xed-eect model

Heteroskedasticity in individual eect

Fixed eect models for unbalanced panels .

Exogeneity assumptions and a rst instru-

More ecient procedures: Amemiya-MaCurdy

The dynamic xed-eect model . . . . . . . . . . .

Bias in the Fixed-Eects estimator . . . . .

The Random-eects model

Example: Just-identied IV model . . . . .

A rst feasible estimator

A more ecient estimator . . . . . . . . . .

More ecient procedures (Ahn-Schmidt) . . . . . .

Dynamic models with Multiplicative eects

Multiplicative individual eects . . . . . . .

Appendix 3. The one-way unbalanced random eects

individual control variables (workers, rms);

(xed) individual attributes;

Policy variables have a signicant impact whatever individual

Dierences across individuals are due to idiosyncratic individual

In practice, observed dierences across individuals may be due

a) W AGE = 0 + 1EDUCAT ION + 2Z .

People with higher education level have higher wages because

People have higher education because they have higher ability