solved using various versions of Stata, with some dating back to Stata 4.0.
Partly out of laziness, but also because it is useful for students to see
computer output, I have included Stata output in most cases rather than type
tables.
versions of Stata.
Currently, there are some missing solutions.
For
Please
report any mistakes or discrepencies you might come across by sending me email at wooldri1@msu.edu.
CHAPTER 2
dE(yx1,x2)
dE(yx1,x2)
= b1 + b4x2 and
= b2 + 2b3x2 + b4x1.
dx1
dx2
2
b. By definition, E(ux1,x2) = 0. Because x2 and x1x2 are just functions
2.1. a.


E(ux1,x2) = 0.
dE(yx1,x2)/dx1 = b1 + b3x2.
Because E(x2) = 0, b1 =
1
E[dE(yx1,x2)/dx1].
Similarly, b2 = E[dE(yx1,x2)/dx2].
Under the
assumptions we have made, the linear projection in (2.48) does have as its
slope coefficients on x1 and x2 the partial effects at the population average
values of x1 and x2  zero in both cases  but it does not allow us to
obtain the partial effects at any other values of x1 and x2.
Incidentally,
By
> s21.
This
simple conclusion means that, when error variances are constant, the error
variance falls as more explanatory variables are conditioned on.
y = g(x) + zB + u, E(ux,z) = 0.
Take the expected value of this equation conditional only on x:
E(yx) = g(x) + [E(zx)]B,
and subtract this from the first equation to get
y  E(yx) = [z  E(zx)]B + u
~
~
or y = zB + u.
~
~
Because z is a function of (x,z), E(uz) = 0 (since E(ux,z) =
~ ~
~
0), and so E(yz) = zB.
^
~
_ yi  E(y
ixi) and zi _ zi  
~
~
is estimated from an OLS regression yi on zi, i = 1,...,N.
Under
CHAPTER 3
3.1. To prove Lemma 3.1, we must show that for all e > 0, there exists be <
and an integer Ne such that P[xN
following fact:
since xN
p
L
But
We use the
> Ne .
inequality), and so
a > 1]
_ a + 1
p
L
g(c).

b. By the CLT,





Avar(yN) = s /N.



Therefore,


1)
is used:
^2
s = (N 
The asymptotic
i=1
^
standard error of yN is simply s/rN.


^
^
^
2
^
if g = g(q) then Avar[rN(g  g)] = [dg(q)/dq] Avar[rN(q  q)].


When g(q) =
^
log(q)  which is, of course, continuously differentiable  Avar[rN(g  g)]

2
^
= (1/q) Avar[rN(q  q)].

^
c. In the scalar case, the asymptotic standard error of g is generally
dg(^q)/dqWse(^q).
^
^ ^
Therefore, for g(q) = log(q), se(g) = se(q)/q.
^
^
and se(q) = 2, g = log(4)
^
When q = 4
^
^
d. The asymptotic t statistic for testing H0: q = 1 is (q  1)/se(q) =
3/2 = 1.5.
e. Because g = log(q), the null of interest can also be stated as H0: g =
4
0.
^
The t statistic based on g is about 1.39/(.5) = 2.78.
This leads to a
^
very strong rejection of H0, whereas the t statistic based on q is, at best,
marginally significant.
where G(Q) =
G)]
~
= G(Q)V1G(Q), Avar[rN(G 
Dqg(Q) is Q * P.
~
Avar[rN(G 
G)]
G)]
= G(Q)V2G(Q),
Therefore,
^
 Avar[rN(G 
G)]
= G(Q)(V2  V1)G(Q).
CHAPTER 4
Therefore
dg/db1 = 100Wexp(b1).
respect to b1:
^
The asymptotic standard error of q1
^
using the delta method is obtained as the absolute value of dg/db1 times
^
se(b1):
^
^
^
se(q1) = [100Wexp(b1)]Wse(b1).
c. We can evaluate the conditional expectation in part (a) at two levels
of education, say educ0 and educ1, all else fixed.
For
^
^
Then q2 = 29.7 and se(q2) = 3.11.
^
^
Therefore, q1 = 22.01 and se(q1) = 4.76.
^ ^
(B,g).
D)
Since Var(yw) =
is s [E(ww)] ,where
2
1
the upper K
* K block gives
6
Avar
^
2
1
rN(B
 B) = s [E(xx)] .

~
rN(B
 B).
It is helpful to write y = xB + v

_ y  E(yx,z).
Further, E(v
Unless E(z
~
1
2
1
rN(B
 B) = [E(xx)] E(v xx)[E(xx)] .

~
^
rN(B
 B)  Avar rN(B  B) is positive semidefinite by


writing
Avar
~
^
rN(B
 B)  Avar rN(B  B)


1
1
 s [E(xx)]
2
1
 s [E(xx)] E(xx)[E(xx)]
2
1
1
Because [E(xx)]
1
s E(xx) is p.s.d.
2
1
_ E(z2x).
= h
$ 0, is actually
x) = E(z2)
In particular, if E(z
2 2
students that
come from wealthier families tend to do better in school, other things equal.
Family income and PC ownership are positively correlated because the
probability of owning a PC increases with family income.
7
Another factor in u
a student
who had more exposure with computers in high school may be more likely to own
a computer.
^
b. b3 is likely to have an upward bias because of the positive
correlation between u and PC, but it is not clearcut because of the other
explanatory variables in the equation.
education is.
students home zip code, as zip code is often part of school records.
Proxies
The
1
But Corr(w1,w) =
Number of obs
F( 9,
925)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
935
37.28
0.0000
0.2662
0.2591
.36251
lwage 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+exper 
.0127522
.0032308
3.947
0.000
.0064117
.0190927
tenure 
.0109248
.0024457
4.467
0.000
.006125
.0157246
married 
.1921449
.0389094
4.938
0.000
.1157839
.2685059
south  .0820295
.0262222
3.128
0.002
.1334913
.0305676
urban 
.1758226
.0269095
6.534
0.000
.1230118
.2286334
black  .1303995
.0399014
3.268
0.001
.2087073
.0520917
educ 
.0498375
.007262
6.863
0.000
.0355856
.0640893
iq 
.0031183
.0010128
3.079
0.002
.0011306
.0051059
kww 
.003826
.0018521
2.066
0.039
.0001911
.0074608
_cons 
5.175644
.127776
40.506
0.000
4.924879
5.426408
. test iq kww
( 1)
( 2)
iq = 0.0
kww = 0.0
F(
2,
925) =
Prob > F =
8.59
0.0002
a. The estimated return to education using both IQ and KWW as proxies for
ability is about 5%.
jointly significant.
c. The wage differential between nonblacks and blacks does not disappear.
Blacks are estimated to earn about 13% less than nonblacks, holding all other
factors fixed.
Number of obs
F( 4,
85)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
90
15.15
0.0000
0.4162
0.3888
.42902
lcrmrte 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+lprbarr  .7239696
.1153163
6.28
0.000
.9532493
.4946899
lprbconv  .4725112
.0831078
5.69
0.000
.6377519
.3072706
lprbpris 
.1596698
.2064441
0.77
0.441
.2507964
.570136
lavgsen 
.0764213
.1634732
0.47
0.641
.2486073
.4014499
_cons  4.867922
.4315307
11.28
0.000
5.725921
4.009923
Because of the loglog functional form, all coefficients are elasticities.
The elasticities of crime with respect to the arrest and conviction
probabilities are the sign we expect, and both are practically and
statistically significant.
of serving a prison term and the average sentence length are positive but are
statistically insignificant.
b. To add the previous years crime rate we first generate the lag:
. gen lcrmr_1 = lcrmrte[_n1] if d87
(540 missing values generated)
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 if d87
10
Source 
SS
df
MS
+Model  23.3549731
5 4.67099462
Residual 
3.4447249
84
.04100863
+Total 
26.799698
89 .301120202
Number of obs
F( 5,
84)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
90
113.90
0.0000
0.8715
0.8638
.20251
lcrmrte 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+lprbarr  .1850424
.0627624
2.95
0.004
.3098523
.0602325
lprbconv  .0386768
.0465999
0.83
0.409
.1313457
.0539921
lprbpris  .1266874
.0988505
1.28
0.204
.3232625
.0698876
lavgsen  .1520228
.0782915
1.94
0.056
.3077141
.0036684
lcrmr_1 
.7798129
.0452114
17.25
0.000
.6899051
.8697208
_cons  .7666256
.3130986
2.45
0.016
1.389257
.1439946
There are some notable changes in the coefficients on the original variables.
The elasticities with respect to prbarr and prbconv are much smaller now, but
still have signs predicted by a deterrenteffect story.
probability is no longer statistically significant.
The conviction
rate changes the signs of the elasticities with respect to prbpris and avgsen,
and the latter is almost statistically significant at the 5% level against a
twosided alternative (pvalue = .056).
respect to the lagged crime rate is large and very statistically significant.
(The elasticity is also statistically different from unity.)
c. Adding the logs of the nine wage variables gives the following:
Number of obs
F( 14,
75)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
90
43.81
0.0000
0.8911
0.8707
.19731
lcrmrte 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+11
lprbarr  .1725122
.0659533
2.62
0.011
.3038978
.0411265
lprbconv  .0683639
.049728
1.37
0.173
.1674273
.0306994
lprbpris  .2155553
.1024014
2.11
0.039
.4195493
.0115614
lavgsen  .1960546
.0844647
2.32
0.023
.364317
.0277923
lcrmr_1 
.7453414
.0530331
14.05
0.000
.6396942
.8509887
lwcon  .2850008
.1775178
1.61
0.113
.6386344
.0686327
lwtuc 
.0641312
.134327
0.48
0.634
.2034619
.3317244
lwtrd 
.253707
.2317449
1.09
0.277
.2079524
.7153665
lwfir  .0835258
.1964974
0.43
0.672
.4749687
.3079171
lwser 
.1127542
.0847427
1.33
0.187
.0560619
.2815703
lwmfg 
.0987371
.1186099
0.83
0.408
.1375459
.3350201
lwfed 
.3361278
.2453134
1.37
0.175
.1525615
.8248172
lwsta 
.0395089
.2072112
0.19
0.849
.3732769
.4522947
lwloc  .0369855
.3291546
0.11
0.911
.6926951
.618724
_cons  3.792525
1.957472
1.94
0.056
7.692009
.1069592
. testparm lwconlwloc
(
(
(
(
(
(
(
(
(
1)
2)
3)
4)
5)
6)
7)
8)
9)
lwcon
lwtuc
lwtrd
lwfir
lwser
lwmfg
lwfed
lwsta
lwloc
F(
=
=
=
=
=
=
=
=
=
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
9,
75) =
Prob > F =
1.50
0.1643
The nine wage variables are jointly insignificant even at the 15% level.
Plus, the elasticities are not consistently positive or negative.
The two
largest elasticities  which also have the largest absolute t statistics have the opposite sign.
12
example.
8, Cov(xB,u) is welldefined.
Cov(xB,u) = 0.
8.
Since Var(u)
b. This is nonsense when we view the xi as random draws along with yi.
2
This is
When we add
z to the regressor list, the error changes, and so does the error variance.
(It gets smaller.)
sense to think we have access to the entire set of factors that one would ever
want to control for, so we should allow for error variances to change across
different models for the same response variable.
2
c. Write R
= 1  SSR/SST = 1  (SSR/N)/(SST/N).
Therefore, plim(R ) = 1
2
where we use the fact that SSR/N is a consistent estimator of su and SST/N is
2
The
Neither
CHAPTER 5
5.1. Define x1
^
^ ^
_ (z1,y2) and x2 _ v^2, and let B
_ (B
1 ,r1) be OLS estimator
B^1
^ ^
= (D
1 ,a1).
B^1
partitioned regression:
^
(i) Regress x1 onto v2 and save the residuals, say
x1.
(ii) Regress y1 onto
x1.
^
^
But when we regress z1 onto v2, the residuals are just z1 since v2 is
N
orthogonal in sample to z.
(More precisely,
S zi1^vi2 = 0.)
Further, because
i=1
^
^
^
^
we can write y2 = y2 + v2, where y2 and v2 are orthogonal in sample, the
^
residuals from regressing y2 onto v2 are simply the first stage fitted values,
^
y2.
^
In other words,
x1 = (z1,y2).
B1
is obtained
^
exactly from the OLS regression y1 on z1, y2.
cigarettes.
States that have lower taxes on cigarettes may also have lower
Number of obs
F( 4, 1383)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
1388
12.55
0.0000
0.0350
0.0322
.18756
lbwght 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+male 
.0262407
.0100894
2.601
0.009
.0064486
.0460328
parity 
.0147292
.0056646
2.600
0.009
.0036171
.0258414
lfaminc 
.0180498
.0055837
3.233
0.001
.0070964
.0290032
packs  .0837281
.0171209
4.890
0.000
.1173139
.0501423
_cons 
4.675618
.0218813
213.681
0.000
4.632694
4.718542
. reg lbwght male parity lfaminc packs (male parity lfaminc cigprice)
Source 
SS
df
MS
+Model  91.3500269
4 22.8375067
Residual  141.770361 1383 .102509299
+Total  50.4203336 1387 .036352079
Number of obs
F( 4, 1383)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
(2SLS)
1388
2.39
0.0490
.
.
.32017
lbwght 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+packs 
.7971063
1.086275
0.734
0.463
1.333819
2.928031
male 
.0298205
.017779
1.677
0.094
.0050562
.0646972
parity  .0012391
.0219322
0.056
0.955
.044263
.0417848
lfaminc 
.063646
.0570128
1.116
0.264
.0481949
.1754869
_cons 
4.467861
.2588289
17.262
0.000
3.960122
4.975601

The difference between OLS and IV in the estimated effect of packs on bwght is
huge.
The IV estimate
The sign and size of the smoking effect are not realistic.
d. We can see the problem with IV by estimating the reduced form for
packs:
. reg packs male parity lfaminc cigprice
Source 
SS
df
MS
+Model  3.76705108
4
.94176277
Residual  119.929078 1383 .086716615
+Total  123.696129 1387 .089182501
Number of obs
F( 4, 1383)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
1388
10.86
0.0000
0.0305
0.0276
.29448
packs 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+male  .0047261
.0158539
0.298
0.766
.0358264
.0263742
parity 
.0181491
.0088802
2.044
0.041
.0007291
.0355692
lfaminc  .0526374
.0086991
6.051
0.000
.0697023
.0355724
cigprice 
.000777
.0007763
1.001
0.317
.0007459
.0022999
_cons 
.1374075
.1040005
1.321
0.187
.0666084
.3414234
The reduced form estimates show that cigprice does not significantly affect
packs; in fact, the coefficient on cigprice is not the sign we expect.
Thus,
the problem that cigprice may not truly be exogenous in the birth weight
equation.
5.5. Under the null hypothesis that q and z2 are uncorrelated, z1 and z2 are
exogenous in (5.55) because each is uncorrelated with u1.
16
Unfortunately, y2
is correlated with u1, and so the regression of y1 on z1, y2, z2 does not
produce a consistent estimator of 0 on z2 even when E(z
2 q) = 0.
that
^
J
1
We could find
J1
= 0 when z2
_ (1/d1).
(5.56)
Since each xj
equation once ability (and other factors, such as educ and exper), have been
controlled for.
but should have no partial effect on log(wage) once ability has been accounted
for.
be correlated with the indicator, q1, say IQ, once the xj have been netted
out.
Number of obs
F( 8,
713)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
722
25.81
0.0000
0.1546
0.1451
.38777
lwage 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+iq 
.0154368
.0077077
2.00
0.046
.0003044
.0305692
tenure 
.0076754
.0030956
2.48
0.013
.0015979
.0137529
educ 
.0161809
.0261982
0.62
0.537
.035254
.0676158
married 
.1901012
.0467592
4.07
0.000
.0982991
.2819033
south 
.047992
.0367425
1.31
0.192
.1201284
.0241444
urban 
.1869376
.0327986
5.70
0.000
.1225442
.2513311
black 
.0400269
.1138678
0.35
0.725
.1835294
.2635832
exper 
.0162185
.0040076
4.05
0.000
.0083503
.0240867
_cons 
4.471616
.468913
9.54
0.000
3.551
5.392231
. reg lwage exper tenure educ married south urban black kww (exper tenure educ
married south urban black meduc feduc sibs)
Instrumental variables (2SLS) regression
Source 
SS
df
MS
+18
Number of obs =
F( 8,
713) =
722
25.70
Model 
19.820304
8
2.477538
Residual  106.991612
713 .150058361
+Total  126.811916
721 .175883378
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
0.0000
0.1563
0.1468
.38737
lwage 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+kww 
.0249441
.0150576
1.66
0.098
.0046184
.0545067
tenure 
.0051145
.0037739
1.36
0.176
.0022947
.0125238
educ 
.0260808
.0255051
1.02
0.307
.0239933
.0761549
married 
.1605273
.0529759
3.03
0.003
.0565198
.2645347
south 
.091887
.0322147
2.85
0.004
.1551341
.0286399
urban 
.1484003
.0411598
3.61
0.000
.0675914
.2292093
black  .0424452
.0893695
0.47
0.635
.2179041
.1330137
exper 
.0068682
.0067471
1.02
0.309
.0063783
.0201147
_cons 
5.217818
.1627592
32.06
0.000
4.898273
5.537362
Even though there are 935 men in the sample, only 722 are used for the
estimation, because data are missing on meduc and feduc.
What we could do is
935 observations.
The return to education is estimated to be small and insignificant
whether IQ or KWW used is used as the indicator.
statistic for joint significance of meduc, feduc, and sibs have pvalues below
.002, so it seems the family background variables are sufficiently partially
correlated with the ability indicators.)
19
+ b3 totcoll + q4fouryr + u,
2SLS using exper, exper , dist2yr and dist4yr as the full set of instruments.
^
We can use the t statistic on q4 to test H0: q4 = 0 against H1: q4 > 0.
5.11. Following the hint, let y2 be the linear projection of y2 on z2, let a2
L2
is known.
(The results on
generated regressors in Section 6.1.1 show that the argument carries over to
the case when
L2
is estimated.)
Plugging in y2 = y2 + a2 gives
that
P2
The problem
By
y2
is known) is essentially
y1 = z1D1 + a1y2 + a1r2 + u1.
*
The
lesson is that one must be very careful if manually carrying out 2SLS by
explicitly doing the first and secondstage regressions.
5.13. a. In a simple regression model with a single IV, the IV estimate of the
^
slope can be written as b1 =



i=1




Next, write y





= (N0/N)(y1  y0).



Taking the
(When eligibility is


(^
9^
11
12
0
IK
)
, where IK
20
^11
is the K2 x K2
is L1 x K1, and
^12
is K2 x
^11
has
^,
can be written as
^11
be exactly zero, which means that at least one zh must appear in the
^11
has zeros in its second row, which means that the second row of
zeros.
is all
two instruments, only one of them turned out to be partially correlated with
x1 and x2.
c. Without loss of generality, we assume that zj appears in the reduced
form for xj; we can simply reorder the elements of z1 to ensure this is the
case.
Then
Looking at
^11
diagonals then
Therefore, rank
(^
11
12
9^
2
0
IK
)
, we see that if
20
^11
= K.
CHAPTER 6
6.1. a. Here is abbreviated Stata output for testing the null hypothesis that
educ is exogenous:
. qui reg educ nearc4 nearc2 exper expersq black south smsa reg661reg668
smsa66
. predict v2hat, resid
22
. reg lwage educ exper expersq black south smsa reg661reg668 smsa66 v2hat
lwage 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+educ 
.1570594
.0482814
3.253
0.001
.0623912
.2517275
exper 
.1188149
.0209423
5.673
0.000
.0777521
.1598776
expersq  .0023565
.0003191
7.384
0.000
.0029822
.0017308
black  .1232778
.0478882
2.574
0.010
.2171749
.0293807
south  .1431945
.0261202
5.482
0.000
.1944098
.0919791
smsa 
.100753
.0289435
3.481
0.000
.0440018
.1575042
reg661 
.102976
.0398738
2.583
0.010
.1811588
.0247932
reg662  .0002286
.0310325
0.007
0.994
.0610759
.0606186
reg663 
.0469556
.0299809
1.566
0.117
.0118296
.1057408
reg664  .0554084
.0359807
1.540
0.124
.1259578
.0151411
reg665 
.0515041
.0436804
1.179
0.238
.0341426
.1371509
reg666 
.0699968
.0489487
1.430
0.153
.0259797
.1659733
reg667 
.0390596
.0456842
0.855
0.393
.050516
.1286352
reg668  .1980371
.0482417
4.105
0.000
.2926273
.1034468
smsa66 
.0150626
.0205106
0.734
0.463
.0251538
.0552789
v2hat  .0828005
.0484086
1.710
0.087
.177718
.0121169
_cons 
3.339687
.821434
4.066
0.000
1.729054
4.950319
^
The t statistic on v2 is 1.71, which is not significant at the 5% level
against a twosided alternative.
is essentially the same finding that the 2SLS estimated return to education is
larger than the OLS estimate.
that educ is endogenous.
the same researcher may take t = 1.71 as evidence for or against endogeneity.)
b. To test the single overidentifying restiction we obtain the 2SLS
residuals:
. qui reg lwage educ exper expersq black south smsa reg661reg668 smsa66
(nearc4 nearc2 exper expersq black south smsa reg661reg668 smsa66)
. predict uhat1, resid
Now, we regress the 2SLS residuals on all exogenous variables:
. reg uhat1 exper expersq black south smsa reg661reg668 smsa66 nearc4 nearc2
Source 
SS
df
MS
Number of obs =
23
3010
+Model  .203922832
16 .012745177
Residual  491.568721 2993 .164239466
+Total  491.772644 3009 .163433913
F( 16, 2993)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
0.08
= 1.0000
= 0.0004
= 0.0049
= .40526
The test statistic is the sample size times the Rsquared from this regression:
. di 3010*.0004
1.204
. di chiprob(1,1.2)
.27332168
2
to test for each by estimating the two reduced forms, the rank condition could
still be violated (although see Problem 15.5c).
because of things like transportation costs that are not systematically related
to regional variations in individual productivity.
prices reflect food quality and that features of the food other than calories
and protein appear in the disturbance u1.
b. Since there are two endogenous explanatory variables we need at least
two prices.
c. We would first estimate the two reduced forms for calories and protein
2
by regressing each on a constant, exper, exper , educ, and the M prices, p1,
..., pM.
^
^
We obtain the residuals, v21 and v22.
2
^
^
regression log(produc) on 1, exper, exper , educ, v21, v22 and do a joint
24
^
^
significance test on v21 and v22.
heteroskedasticityrobust test.
Var(ux) = s .
2
freedom adjustment.
^2
^2
So ui  s has a zero sample average, which means that
asymptotically.)
N
1/2 N
^2
^2
1/2 N
^2
^2
S (hi  Mh)(u
S hi (u
i  s ) = N
i  s ).
1/2 N
i=1
i=1
Next, N
i=1
1/2 N
op(1).
So N
Therefore, so
i=1
far we have
1/2 N
N
^2
2
S hi (u^2i  ^s2) = N1/2 S (hi  Mh)(u
i  s ) + op(1).
i=1
i=1
1/2 N
i=1
2
h)ui
+ op(1).
^
[xi(B N
B)]2,
i=1
^2
2
^
Now, as in Problem 4.4, we can write ui = ui  2uixi(B 
B)
so
1/2 N
(6.40)
i=1
^
where the expression for the third term follows from [xi(B 
B)]2
^
= xi(B 
B)(B^
^
^
t xi)vec[(B
 B)(B  B)]. Dropping the "2" the second term can
& 1 N
*
^
^
be written as N S ui(hi  Mh)xi rN(B  B) = op(1)WOp(1) because rN(B  B) =
7 i=1
8
Op(1) and, under E(uixi) = 0, E[ui(hi  Mh)xi] = 0; the law of large numbers
B)xi
= (xi


1/2&
7N
^
^
1/2
S (hi  Mh)(xi t xi)*8{vec[rN(B
 B)rN(B  B)]} = N
WOp(1)WOp(1),
i=1
1 N


where we again use the fact that sample averages are Op(1) by the law of large
^
numbers and vec[rN(B 
B)rN(B^

B)]
= Op(1).
25
i=1
2
h)(ui
2 2
2uis
 s )] =
+ s .
2
E[(ui
2 2
 s ) (hi 
Mh)(hi
Mh)].
xi] = k2  s4 _ h2.
2 2
2 2
2 2
Mh)}
Mh)(hi
Mh)]xi}
show.
2 2
= E{E[(ui  s )
Mh)(hi
= ui 
[since E(uixi) = 0 is
= E{E[(ui  s ) (hi 
2 2
Now (ui  s )
Mh)].
Mh)(hi
Mh)]
xi](hi  Mh)(hi 
cQ distribution:
i=1
replace the matrix in the quadratic form with a consistent estimator, which is
^2& 1
h N
^2
1
where h = N
N ^2
^2 2
S (u
i  s ) .
N
S (hi  h)(hi  h)*8,
i=1


i=1
can be written as



^2
^2
Now h is just the total sum of squares in the ui, divided by N.
The numerator
^2
of the statistic is simply the explained sum of squares from the regression ui
on 1, hi, i = 1,...,N.
^2
2
(centered) Rsquared from the regression ui on 1, hi, i = 1,...,N, or NRc.
2
2 2
Mh)]
generally.
Mh)(hi
We replace
the population expected value with the sample average and replace any unknown
parameters (under H0).
B,
s , and
Mh
&
7
^2
^2 *
S hi (u
i  s )8
i=1
1/2 N
is
N
1 N

i=1
& SN (u
^2
^2
*& SN (u^2  ^s2)2(h  h)(h  h)*1
 s )(hi  h)
i
i
i
7i=1
87i=1 i
8
N
&
^2
^2 *
W7 S (hi  h)(ui  s )8,




i=1
which is easily seen to be the explained sum of squares from the regression of
^2
^2
1 on (ui  s )(hi  h), i = 1,...,N (without an intercept).

Number of obs
F( 1,
140)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
142
30.79
0.0000
0.1803
0.1744
.35429
lprice 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+ldist 
.3648752
.0657613
5.548
0.000
.2348615
.4948889
_cons 
8.047158
.6462419
12.452
0.000
6.769503
9.324813
This regression suggests a strong link between housing price and distance from
the incinerator (as distance increases, so does housing price).
27
The elasticity
regression:
the incinerator may have been put near homes with lower values to
begin with.
simple regression even if the new incinerator had no effect on housing prices.
b. The parameter d3 should be positive:
Here is my
Stata session:
. gen y81ldist = y81*ldist
. reg lprice y81 ldist y81ldist
Source 
SS
df
MS
+Model  24.3172548
3 8.10575159
Residual  37.1217306
317 .117103251
+Total  61.4389853
320 .191996829
Number of obs
F( 3,
317)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
321
69.22
0.0000
0.3958
0.3901
.3422
lprice 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+y81  .0113101
.8050622
0.014
0.989
1.59525
1.57263
ldist 
.316689
.0515323
6.145
0.000
.2153006
.4180775
y81ldist 
.0481862
.0817929
0.589
0.556
.1127394
.2091117
_cons 
8.058468
.5084358
15.850
0.000
7.058133
9.058803
The coefficient on ldist reveals the shortcoming of the regression in part (a).
This coefficient measures the relationship between lprice and ldist in 1978,
before the incinerator was even being rumored.
the null hypothesis that building the incinerator had no effect on housing
prices.
28
Number of obs
F( 11,
309)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
321
108.04
0.0000
0.7937
0.7863
.20256
lprice 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+y81 
.229847
.4877198
0.471
0.638
1.189519
.7298249
ldist 
.0866424
.0517205
1.675
0.095
.0151265
.1884113
y81ldist 
.0617759
.0495705
1.246
0.214
.0357625
.1593143
lintst 
.9633332
.3262647
2.953
0.003
.3213518
1.605315
lintstsq  .0591504
.0187723
3.151
0.002
.096088
.0222128
larea 
.3548562
.0512328
6.926
0.000
.2540468
.4556655
lland 
.109999
.0248165
4.432
0.000
.0611683
.1588297
age  .0073939
.0014108
5.241
0.000
.0101699
.0046178
agesq 
.0000315
8.69e06
3.627
0.000
.0000144
.0000486
rooms 
.0469214
.0171015
2.744
0.006
.0132713
.0805715
baths 
.0958867
.027479
3.489
0.000
.041817
.1499564
_cons 
2.305525
1.774032
1.300
0.195
1.185185
5.796236
The incinerator effect is now larger (the elasticity is about .062) and the t
statistic is larger, but the interaction is still statistically insignificant.
Using these models and this two years of data we must conclude the evidence
that housing prices were adversely affected by the new incinerator is somewhat
weak.
Number of obs
F( 14, 5334)
Prob > F
Rsquared
Adj Rsquared
=
=
=
=
=
5349
16.37
0.0000
0.0412
0.0387
Total 
8699.85385
5348
1.62674904
Root MSE
1.2505
ldurat 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+afchnge 
.0106274
.0449167
0.24
0.813
.0774276
.0986824
highearn 
.1757598
.0517462
3.40
0.001
.0743161
.2772035
afhigh 
.2308768
.0695248
3.32
0.001
.0945798
.3671738
male  .0979407
.0445498
2.20
0.028
.1852766
.0106049
married 
.1220995
.0391228
3.12
0.002
.0454027
.1987962
head  .5139003
.1292776
3.98
0.000
.7673372
.2604634
neck 
.2699126
.1614899
1.67
0.095
.0466737
.5864988
upextr 
.178539
.1011794
1.76
0.078
.376892
.0198141
trunk 
.1264514
.1090163
1.16
0.246
.0872651
.340168
lowback  .0085967
.1015267
0.08
0.933
.2076305
.1904371
lowextr  .1202911
.1023262
1.18
0.240
.3208922
.0803101
occdis 
.2727118
.210769
1.29
0.196
.1404816
.6859052
manuf  .1606709
.0409038
3.93
0.000
.2408591
.0804827
construc 
.1101967
.0518063
2.13
0.033
.0086352
.2117581
_cons 
1.245922
.1061677
11.74
0.000
1.03779
1.454054
The estimated coefficient on the interaction term is actually higher now, and
even more statistically significant than in equation (6.33).
This
means that making predictions of log(durat) would be very difficult given the
factors we have included in the regression:
Rsquared does not mean we have a biased or consistent estimator of the effect
of the policy change.
can get a reasonably precise estimate of the effect, although the 95%
confidence interval is pretty wide.
c. Using the data for Michigan to estimate the simple model gives
. reg ldurat afchnge highearn afhigh if mi
Source 
SS
df
MS
+Model  34.3850177
3 11.4616726
Residual  2879.96981 1520 1.89471698
+Total  2914.35483 1523 1.91356194
Number of obs
F( 3, 1520)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
1524
6.05
0.0004
0.0118
0.0098
1.3765
ldurat 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+afchnge 
.0973808
.0847879
1.15
0.251
.0689329
.2636945
highearn 
.1691388
.1055676
1.60
0.109
.0379348
.3762124
afhigh 
.1919906
.1541699
1.25
0.213
.1104176
.4943988
_cons 
1.412737
.0567172
24.91
0.000
1.301485
1.523989
The coefficient on the interaction term, .192, is remarkably similar to that
for Kentucky.
(5,626/1,524)
the importance of a large sample size for this kind of policy analysis.
6.11. The following is Stata output that I will use to answer the first three
parts:
. reg lwage y85 educ y85educ exper expersq union female y85fem
Source 
SS
df
MS
+Model  135.992074
8 16.9990092
31
Number of obs =
F( 8, 1075) =
Prob > F
=
1084
99.80
0.0000
Rsquared
=
Adj Rsquared =
Root MSE
=
0.4262
0.4219
.4127
lwage 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+y85 
.1178062
.1237817
0.95
0.341
.125075
.3606874
educ 
.0747209
.0066764
11.19
0.000
.0616206
.0878212
y85educ 
.0184605
.0093542
1.97
0.049
.000106
.036815
exper 
.0295843
.0035673
8.29
0.000
.0225846
.036584
expersq  .0003994
.0000775
5.15
0.000
.0005516
.0002473
union 
.2021319
.0302945
6.67
0.000
.1426888
.2615749
female  .3167086
.0366215
8.65
0.000
.3885663
.244851
y85fem 
.085052
.051309
1.66
0.098
.0156251
.185729
_cons 
.4589329
.0934485
4.91
0.000
.2755707
.642295

In fact, you can check that when 1978 wages are used, the
coefficient on y85 becomes about .383, which shows a significant fall in real
wages for given productivity characteristics and gender over the sevenyear
period.
d. To answer this question, I just took the squared OLS residuals and
regressed those on the year dummy, y85.
32
So
there is some evidence that the variance of the unexplained part of log wages
(or log real wages) has increased over time.
e. As the equation is written in the problem, the coefficient d0 is the
growth in nominal wages for a male with no years of education!
with 12 years of education, we want q0
_ d0 + 12d1.
For a male
^
^
^
the standard error of q0 = d0 + 12d1 is to replace y85Weduc with y85W(educ Simple algebra shows that, in the new model, q0 is the coefficient on
12).
educ.
In Stata we have
Number of obs
F( 8, 1075)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
1084
99.80
0.0000
0.4262
0.4219
.4127
lwage 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+y85 
.3393326
.0340099
9.98
0.000
.2725993
.4060659
educ 
.0747209
.0066764
11.19
0.000
.0616206
.0878212
y85educ0 
.0184605
.0093542
1.97
0.049
.000106
.036815
exper 
.0295843
.0035673
8.29
0.000
.0225846
.036584
expersq  .0003994
.0000775
5.15
0.000
.0005516
.0002473
union 
.2021319
.0302945
6.67
0.000
.1426888
.2615749
female  .3167086
.0366215
8.65
0.000
.3885663
.244851
y85fem 
.085052
.051309
1.66
0.098
.0156251
.185729
_cons 
.4589329
.0934485
4.91
0.000
.2755707
.642295
So the growth in nominal wages for a man with educ = 12 is about .339, or
33.9%.
[We could use the more accurate estimate, obtained from exp(.339) 1.]
33
CHAPTER 7
B^
&N1 SN XX *1&N1 SN Xu *.
7 i=1 i i8 7 i=1 i i8
From SOLS.2, the weak law of large numbers, and Slutskys Theorem,
plim
&N1 SN Xu * = 0. Thus,
7 i=1 i i8
^
& 1 N X *1Wplim &N1 SN Xu * = B + A1W0 = B. )
plim B = B + plim N S X
7 i=1 i i8
7 i=1 i i8
is diagonal,
it suffices to show that the GLS estimators for different equations are
asymptotically uncorrelated.
is block diagonal (see Section 3.5), where the blocking is by the parameter
vector for each equation.
from Theorem 7.4:
Now, we can use the special form of Xi for SUR (see Example 7.1), the fact
that
)1
),
SGLS.3
E(uiguihx
igxih) = E(uiguih)E(x
igxih) = 0, all g
$ h.
&s2
0
1 E(x
i1xi1)
2
1
0
W
E(X
i ) Xi) = 2
W
2
7
0
0
Therefore, we have
*
2
2.
0
2
s2
G E(x
iGxiG)8
0
B1
B2
imposed.
is diagonal in a SUR system, system OLS and GLS are the same.
Under SGLS.1 and SGLS.2, GLS and FGLS are asymptotically equivalent
(regardless of the structure of
=
B^GLS
when
and
))
 B
^
^
 ^
^
rN(
FGLS  BGLS) = op(1), then rN(BSOLS  BFGLS) = op(1).
B^SOLS
But, if
Thus,
)^
is
Note that
1
&^ 1 & N
*1
N
^
2)
t 7 S xi xi*82 = )
t &7 S xi xi*8 .
7
i=1
8
i=1
Therefore,
B^
1*
&^
N
^ 1
2)
t &7 S xi xi*8 2()
7
i=1
8
& SN xy *
& SN xy *
i i12
i i12
2i=1
2i=1
2
2
1
2
2
&
N
*
t IK)2
WW 2 = 2IG t &7 S xi xi*8 22
W
WW 22.
i=1
82
2
W 2 7
2 N
2
2 SN xy 2
i iG8
7 S xi yiG8
7
i=1
i=1
Straightforward multiplication shows that the right hand side of the equation
^
^
^
^
is just the vector of stacked Bg, g = 1,...,G. where Bg is the OLS estimator
for equation g.
with uit, which says that xi,t+1 = yit is correlated with uit.
does not hold.
xis, s > t.
Thus, SGLS.1
)1
However, since
is diagonal, X
i ) ui =
1
E(X
i ) ui) =
S s2
t E(x
ituit)
t=1
1
since E(x
ituit) = 0 under (7.80).
S xits2
t uit, and so
t=1
= 0
without SGLS.1.
d. First, since
)1
is diagonal, X
i)
= (s1 x
i1,s2 x
i2, ...,
1
2
2
s2
T x
iT),
and so
E(X
i ) uiu
i ) Xi) =
1
1
$ t.
2
S S s2
t ss E(uituisx
itxis).
t=1s=1
$ s.
for each t,
E(uitx
itxit) = E[E(uitx
itxitxit)] = E[E(uitxit)x
itxit)]
2
= E[stx
itxit] =
2
st2E(xitxit),
t = 1,2,...,T.
It follows that
E(X
i ) uiu
i ) Xi) =
1
1
1
S s2
t E(x
itxit) = E(X
i ) Xi).
t=1
36
Next,
^
^
e. First, run pooled regression across all i and t; let uit denote the
pooled OLS residuals.
Then, by
^2
st Lp s2t as N L 8.
In particular, standard
errors obtained from (7.51) are asymptotically valid, and F statistics from
(7.53) are valid.
th
Now, if
^
)
s^t2 as the
diagonal, then the FGLS statistics are easily shown to be identical to the
^2
st should be
obtained from the pooled OLS residuals for the unrestricted model.
g. If
to pooled OLS.
FGLS reduces
Number of obs
F( 4,
103)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
108
153.67
0.0000
0.8565
0.8509
.55064
lscrap 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+d89  .1153893
.1199127
0.962
0.338
.3532078
.1224292
grant  .1723924
.1257443
1.371
0.173
.4217765
.0769918
grant_1  .1073226
.1610378
0.666
0.507
.426703
.2120579
lscrap_1 
.8808216
.0357963
24.606
0.000
.809828
.9518152
_cons  .0371354
.0883283
0.420
0.675
.2123137
.138043
The estimated effect of grant, and its lag, are now the expected sign, but
neither is strongly statistically significant.
Number of obs
F( 4,
49)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
54
73.47
0.0000
0.8571
0.8454
.567
lscrap 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+grant 
.0165089
.215732
0.077
0.939
.4170208
.4500385
grant_1  .0276544
.1746251
0.158
0.875
.3785767
.3232679
lscrap_1 
.9204706
.0571831
16.097
0.000
.8055569
1.035384
uhat_1 
.2790328
.1576739
1.770
0.083
.0378247
.5958904
_cons 
.232525
.1146314
2.028
0.048
.4628854
.0021646
. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987, robust cluster(fcode)
Regression with robust standard errors
Number of obs =
F( 4,
53) =
Prob > F
=
38
108
77.24
0.0000
Rsquared
Root MSE
=
=
0.8565
.55064

Robust
lscrap 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+d89  .1153893
.1145118
1.01
0.318
.3450708
.1142922
grant  .1723924
.1188807
1.45
0.153
.4108369
.0660522
grant_1  .1073226
.1790052
0.60
0.551
.4663616
.2517165
lscrap_1 
.8808216
.0645344
13.65
0.000
.7513821
1.010261
_cons  .0371354
.0893147
0.42
0.679
.216278
.1420073
The robust standard errors for grant and grant1 are actually smaller than the
usual ones, making both more statistically significant.
grant = 0.0
grant_1 = 0.0
F(
2,
53) =
Prob > F =
1.14
0.3266
There is
strong evidence of positive serial correlation in the static model, and the
fully robust standard errors are much larger than the nonrobust ones.
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82d87
Source 
SS
df
MS
+Model  117.644669
11 10.6949699
Residual 
88.735673
618 .143585231
+Total  206.380342
629 .328108652
Number of obs
F( 11,
618)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
630
74.49
0.0000
0.5700
0.5624
.37893
lcrmrte 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+lprbarr  .7195033
.0367657
19.570
0.000
.7917042
.6473024
lprbconv  .5456589
.0263683
20.694
0.000
.5974413
.4938765
lprbpris 
.2475521
.0672268
3.682
0.000
.1155314
.3795728
39
lavgsen  .0867575
.0579205
1.498
0.135
.2005023
.0269872
lpolpc 
.3659886
.0300252
12.189
0.000
.3070248
.4249525
d82 
.0051371
.057931
0.089
0.929
.1086284
.1189026
d83 
.043503
.0576243
0.755
0.451
.1566662
.0696601
d84  .1087542
.057923
1.878
0.061
.222504
.0049957
d85  .0780454
.0583244
1.338
0.181
.1925835
.0364927
d86  .0420791
.0578218
0.728
0.467
.15563
.0714718
d87  .0270426
.056899
0.475
0.635
.1387815
.0846963
_cons  2.082293
.2516253
8.275
0.000
2.576438
1.588149
. predict uhat, resid
. gen uhat_1 = uhat[_n1] if year > 81
(90 missing values generated)
. reg uhat uhat_1
Source 
SS
df
MS
+Model  46.6680407
1 46.6680407
Residual  30.1968286
538 .056127934
+Total  76.8648693
539 .142606437
Number of obs
F( 1,
538)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
540
831.46
0.0000
0.6071
0.6064
.23691
uhat 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+uhat_1 
.7918085
.02746
28.835
0.000
.7378666
.8457504
_cons 
1.74e10
.0101951
0.000
1.000
.0200271
.0200271
Because of the strong serial correlation, I obtain the fully robust standard
errors:
Number of obs =
F( 11,
89) =
Prob > F
=
Rsquared
=
Root MSE
=
630
37.19
0.0000
0.5700
.37893

Robust
lcrmrte 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+lprbarr  .7195033
.1095979
6.56
0.000
.9372719
.5017347
40
lprbconv  .5456589
.0704368
7.75
0.000
.6856152
.4057025
lprbpris 
.2475521
.1088453
2.27
0.025
.0312787
.4638255
lavgsen  .0867575
.1130321
0.77
0.445
.3113499
.1378348
lpolpc 
.3659886
.121078
3.02
0.003
.1254092
.6065681
d82 
.0051371
.0367296
0.14
0.889
.0678438
.0781181
d83 
.043503
.033643
1.29
0.199
.1103509
.0233448
d84  .1087542
.0391758
2.78
0.007
.1865956
.0309127
d85  .0780454
.0385625
2.02
0.046
.1546683
.0014224
d86  .0420791
.0428788
0.98
0.329
.1272783
.0431201
d87  .0270426
.0381447
0.71
0.480
.1028353
.0487502
_cons  2.082293
.8647054
2.41
0.018
3.800445
.3641423
. drop uhat uhat_1
b. We lose the first year, 1981, when we add the lag of log(crmrte):
. gen lcrmrt_1 = lcrmrte[_n1] if year > 81
(90 missing values generated)
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83d87 lcrmrt_1
Source 
SS
df
MS
+Model  163.287174
11 14.8442885
Residual  16.8670945
528 .031945255
+Total  180.154268
539 .334237975
Number of obs
F( 11,
528)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
540
464.68
0.0000
0.9064
0.9044
.17873
lcrmrte 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+lprbarr  .1668349
.0229405
7.273
0.000
.2119007
.1217691
lprbconv  .1285118
.0165096
7.784
0.000
.1609444
.0960793
lprbpris  .0107492
.0345003
0.312
0.755
.078524
.0570255
lavgsen  .1152298
.030387
3.792
0.000
.174924
.0555355
lpolpc 
.101492
.0164261
6.179
0.000
.0692234
.1337606
d83  .0649438
.0267299
2.430
0.015
.1174537
.0124338
d84  .0536882
.0267623
2.006
0.045
.1062619
.0011145
d85  .0085982
.0268172
0.321
0.749
.0612797
.0440833
d86 
.0420159
.026896
1.562
0.119
.0108203
.0948522
d87 
.0671272
.0271816
2.470
0.014
.0137298
.1205245
lcrmrt_1 
.8263047
.0190806
43.306
0.000
.7888214
.8637879
_cons  .0304828
.1324195
0.230
0.818
.2906166
.229651
Not surprisingly, the lagged crime rate is very significant.
Further,
41
The
r is practically small).
standard errors.
d. None of the log(wage) variables is statistically significant, and the
magnitudes are pretty small in all cases:
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83d87 lcrmrt_1 lwconlwloc
Source 
SS
df
MS
+Model  163.533423
20 8.17667116
Residual  16.6208452
519
.03202475
+Total  180.154268
539 .334237975
Number of obs
F( 20,
519)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
540
255.32
0.0000
0.9077
0.9042
.17895
lcrmrte 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+lprbarr  .1746053
.0238458
7.322
0.000
.2214516
.1277591
lprbconv  .1337714
.0169096
7.911
0.000
.166991
.1005518
lprbpris  .0195318
.0352873
0.554
0.580
.0888553
.0497918
lavgsen  .1108926
.0311719
3.557
0.000
.1721313
.049654
lpolpc 
.1050704
.0172627
6.087
0.000
.071157
.1389838
d83  .0729231
.0286922
2.542
0.011
.1292903
.0165559
d84  .0652494
.0287165
2.272
0.023
.1216644
.0088345
42
d85  .0258059
.0326156
0.791
0.429
.0898807
.038269
d86 
.0263763
.0371746
0.710
0.478
.0466549
.0994076
d87 
.0465632
.0418004
1.114
0.266
.0355555
.1286819
lcrmrt_1 
.8087768
.0208067
38.871
0.000
.767901
.8496525
lwcon  .0283133
.0392516
0.721
0.471
.1054249
.0487983
lwtuc  .0034567
.0223995
0.154
0.877
.0474615
.0405482
lwtrd 
.0121236
.0439875
0.276
0.783
.0742918
.098539
lwfir 
.0296003
.0318995
0.928
0.354
.0330676
.0922683
lwser 
.012903
.0221872
0.582
0.561
.0306847
.0564908
lwmfg  .0409046
.0389325
1.051
0.294
.1173893
.0355801
lwfed 
.1070534
.0798526
1.341
0.181
.0498207
.2639275
lwsta  .0903894
.0660699
1.368
0.172
.2201867
.039408
lwloc 
.0961124
.1003172
0.958
0.338
.1009652
.29319
_cons  .6438061
.6335887
1.016
0.310
1.88852
.6009076
. test lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc
(
(
(
(
(
(
(
(
(
1)
2)
3)
4)
5)
6)
7)
8)
9)
lwcon
lwtuc
lwtrd
lwfir
lwser
lwmfg
lwfed
lwsta
lwloc
F(
=
=
=
=
=
=
=
=
=
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
9,
519) =
Prob > F =
0.85
0.5663
CHAPTER 8
8.1. Letting Q(b) denote the objective function in (8.23), it follows from
multivariable calculus that
dQ(b)
N
^& N
               = 2&7 S Zi Xi*
S Zi (yi  Xib)*8.
8 W7i=1
db
i=1
Evaluating the derivative at the solution
B^
gives
43
B^
Solving for
gives (8.24).
8.3. First, we can always write x as its linear projection plus an error:
*
+ e, where x
= z^ and E(ze) = 0.
x =
_ h(z),
^1
is M
* K and ^2 is Q * K.
^2
= 0.
But, from
^2
^2
_ h(z).
But
= 0.
1
C*
1/2
where D
_ *1/2WC.
1/2
C,
* L
positive semidefinite.
8.7. When
)^
^2&
N
^
^
S Zi )
Zi = Z(IN t ))Z
i=1
* ^2
block sg S z
7i=1 igzig8 _ sgZg Zg, where Zg
N
block Z
g Xg.
Further, ZX
1
1
This is just
If E(uizi) =
2
1
w(zi) = E(ui2zi).
rNconsistent
OLS estimator.
or
^
^
The
is used, and so
Without the
)(z)
Var(u1z) =
s21.
Further,
(1/s1)[z1,E(y2z)].
2
optimal instruments.
b. If y2 is binary then E(y2z) = P(y2 = 1z) = F(z), and so the optimal
IVs are [z1,F(z)].
45
CHAPTER 9
9.1. a. No.
We may be
interested in the tradeoff between wages and benefits, but then either of
these can be taken as the dependent variable and the analysis would be by OLS.
Of course, if we have omitted some important factors or have a measurement
error problem, OLS could be inconsistent for estimating the tradeoff.
But it
expenditures are assigned randomly across cities, then we could estimate the
crime equation by OLS.
The simultaneous equations model recognizes that cities choose law enforcement
expenditures in part on what they expect the crime rate to be.
An SEM is a
These are both choice variables of the firm, and the parameters
in a twoequation system modeling one in terms of the other, and vice versa,
have no economic meaning.
It makes no
sense to think about how exogenous changes in one would affect the other.
Further, suppose that we look at the effects of changes in local property tax
rates.
We would not want to hold fixed family saving and then measure the
from the support equation is the variable mremarr; since the support equation
contains one endogenous variable, this equation is identified if and only if
d21 $ 0.
mothers reaction function that does not also shift the fathers reaction
function.
The visits equation is identified if and only if at least one of finc and
fremarr actually appears in the support equation; that is, we need
d11 $ 0 or
d13 $ 0.
b. Each equation can be estimated by 2SLS using instruments 1, finc,
fremarr, dist, mremarr.
c. First, obtain the reduced form for visits:
47
visits =
^
Estimate this equation by OLS, and save the residuals, v2.
regression
^
support on 1, visits, finc, fremarr, dist, v2
^
and do a (heteroskedasticityrobust) t test that the coefficient on v2 is
zero.
Assuming
^
u2 on 1, finc, fremarr, dist, mremarr;
the sample size times the usual Rsquared from this regression is distributed
asymptotically as
exogenous.
A heteroskedasticityrobust test is also easy to obtain.
^
Let support
denote the fitted values from the reduced form regression for support.
Next,
^
regress finc (or fremarr) on support, mremarr, dist, and save the residuals,
^
say r1.
^ ^
Then, run the simple regression (without intercept) of 1 on u2r1; N 
9.5. a. Let
B1
denote the 7
B1
= (1,g12,g13,d11,d12,d13,d14).
48
The restrictions
&0 0
71 0
0
0
0
0
1
0
0*
.
18
0
1
* 3 matrix
&
d12
2d + d  1
14
7 13
d23
d22
+ d24  g21
d33
d32
*
2.
+ d34  g31
8
0,
But
g23 = 0, d22 =
d32
*
2.
+ d34  g31
8
0.
b. It is easy to see how to estimate the first equation under the given
assumptions.
Set
After simple
algebra we get
y1  z4 =
Note
9.7. a. Because alcohol and educ are endogenous in the first equation, we need
at least two elements in z(2) and/or z(3) that are not also in z(1).
49
Ideally,
we have at least one such element in z(2) and at least one such element in
z(3).
b. Let z denote all nonredundant exogenous variables in the system.
Then
(
2z i
Zi = 2 0
2 0
9
d. z(3) = z.
0
(zi,educi)
0
)
2
2.
zi2
0
0
9.9. a. Here is my Stata output for the 3SLS estimation of (9.28) and (9.29):
. reg3 (hours lwage educ age kidslt6 kidsge6 nwifeinc) (lwage hours educ exper
expersq)
Threestage least squares regression
Equation
Obs Parms
RMSE
"Rsq"
chi2
P
hours
428
6
1368.362
2.1145
34.53608
0.0000
lwage
428
4
.6892584
0.0895
79.87188
0.0000

Coef.
Std. Err.
z
P>z
[95% Conf. Interval]
+hours

lwage 
1676.933
431.169
3.89
0.000
831.8577
2522.009
educ  205.0267
51.84729
3.95
0.000
306.6455
103.4078
age  12.28121
8.261529
1.49
0.137
28.47351
3.911094
kidslt6  200.5673
134.2685
1.49
0.135
463.7287
62.59414
kidsge6  48.63986
35.95137
1.35
0.176
119.1032
21.82352
nwifeinc 
.3678943
3.451518
0.11
0.915
6.396957
7.132745
_cons 
2504.799
535.8919
4.67
0.000
1454.47
3555.128
+lwage

hours 
.000201
.0002109
0.95
0.340
.0002123
.0006143
educ 
.1129699
.0151452
7.46
0.000
.0832858
.1426539
exper 
.0208906
.0142782
1.46
0.143
.0070942
.0488753
50
expersq  .0002943
.0002614
1.13
0.260
.0008066
.000218
_cons  .7051103
.3045904
2.31
0.021
1.302097
.1081241
Endogenous variables: hours lwage
Exogenous variables:
educ age kidslt6 kidsge6 nwifeinc exper expersq

b. To be added.
9.11. a. Since z2 and z3 are both omitted from the first equation, we just
need
d11 $ 0.
Given
So our estimate of
Of
p11.
d22 $ 0.
(This is
Number of obs
F( 2,
111)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
114
45.17
0.0000
0.4487
0.4387
17.796
open 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+lpcinc 
.5464812
1.49324
0.366
0.715
2.412473
3.505435
lland  7.567103
.8142162
9.294
0.000
9.180527
5.953679
_cons 
117.0845
15.8483
7.388
0.000
85.68006
148.489
This shows that log(land) is very statistically significant in the RF for
Smaller countries are more open.
open.
Number of obs
F( 2,
111)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
(2SLS)
114
2.79
0.0657
0.0309
0.0134
23.836
inf 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+open  .3374871
.1441212
2.342
0.021
.6230728
.0519014
lpcinc 
.3758247
2.015081
0.187
0.852
3.617192
4.368841
_cons 
26.89934
15.4012
1.747
0.083
3.61916
57.41783
52
Number of obs
F( 2,
111)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
114
2.63
0.0764
0.0453
0.0281
23.658
inf 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+open  .2150695
.0946289
2.273
0.025
.402583
.027556
lpcinc 
.0175683
1.975267
0.009
0.993
3.896555
3.931692
_cons 
25.10403
15.20522
1.651
0.102
5.026122
55.23419
The 2SLS estimate is notably larger in magnitude.
has a larger standard error.
endogenous.
d. If we add
A regression of open
Since
is a natural
2
of about 2.
Number of obs
F( 3,
110)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
(2SLS)
114
2.09
0.1060
.
.
24.40
inf 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+53
open  1.198637
.6205699
1.932
0.056
2.428461
.0311868
opensq 
.0075781
.0049828
1.521
0.131
.0022966
.0174527
lpcinc 
.5066092
2.069134
0.245
0.807
3.593929
4.607147
_cons 
43.17124
19.36141
2.230
0.028
4.801467
81.54102

The squared term indicates that the impact of open on inf diminishes; the
estimate would be significant at about the 6.5% level against a onesided
alternative.
e. Here is the Stata output for implementing the method described in the
problem:
. reg open lpcinc lland
Source 
SS
df
MS
+Model  28606.1936
2 14303.0968
Residual  35151.7966
111 316.682852
+Total  63757.9902
113 564.230002
Number of obs
F( 2,
111)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
114
45.17
0.0000
0.4487
0.4387
17.796
open 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+lpcinc 
.5464812
1.49324
0.37
0.715
2.412473
3.505435
lland  7.567103
.8142162
9.29
0.000
9.180527
5.953679
_cons 
117.0845
15.8483
7.39
0.000
85.68006
148.489
. predict openh
(option xb assumed; fitted values)
. gen openhsq = openh^2
. reg inf openh openhsq lpcinc
Source 
SS
df
MS
+Model  3743.18411
3 1247.72804
Residual  61330.2376
110 557.547615
+Total  65073.4217
113 575.870989
Number of obs
F( 3,
110)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
114
2.24
0.0879
0.0575
0.0318
23.612
inf 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+54
openh  .8648092
.5394132
1.60
0.112
1.933799
.204181
openhsq 
.0060502
.0059682
1.01
0.313
.0057774
.0178777
lpcinc 
.0412172
2.023302
0.02
0.984
3.968493
4.050927
_cons 
39.17831
19.48041
2.01
0.047
.5727026
77.78391

Qualitatively, the results are similar to the correct IV method from part d.
If
is uncessary, less robust, and we cannot trust the standard errors, anyway.
CHAPTER 10
would have to find an instrument for the tax variable that is uncorrelated
with ci and correlated with the tax rate.
doing pooled OLS is a useful initial exercise; these results can be compared
with those from an FE analysis).
of zit, taxit, and disasterit in the sense that these are uncorrelated with
the errors uis for all t and s.
I have no strong intuition for the likely serial correlation properties
of the {uit}.
allowed for ci, in which case I would use standard fixed effects.
However, it
seems more likely that the uit are positively autocorrelated, in which case I
might use first differencing instead.
uncorrelated.
e. If taxit and disasterit do not have lagged effects on investment, then
the only possible violation of the strict exogeneity assumption is if future
values of these variables are correlated with uit.
this is not a worry for the disaster variable:
values, since a larger base means a smaller rate can achieve the same amount
of revenue.

.
xi2 = xi2  xi, and similarly for
yi1 and y
i2
B^FE
Dxi/2
Dyi/2.
Therefore,
x
i1xi1 + x
i2xi2 =
DxD
i xi/4 + DxD
i xi/4 = DxD
i xi/2
x
i1yi1 + x
i2 i2 =
DxD
i yi/4 + DxD
i yi/4 = DxD
i yi/2,
and so
B^FE
^
^
^
^
B
b. Let ui1 =
yi1 
xi1BFE and ui2 =
yi2  x
i2 FE be the fixed effects
residuals for the two time periods for cross section observation i.
=
B^FD,
Since
B^FE
^
where ei
^
^
DxiB
FD)/2 _ ei/2
^
^
^
Dyi/2  (Dxi/2)B
FD = (Dyi  DxiBFD)/2 _ ei/2,
^
_ Dyi  DxiB
FD are the first difference residuals, i = 1,2,...,N.
Therefore,
N
N ^2
2
^2
S (u^i1
+ ui2) = (1/2) S ei.
i=1
i=1
This shows that the sum of squared residuals from the fixed effects regression
is exactly one have the sum of squared residuals from the first difference
regression.
Since we know the variance estimate for fixed effects is the SSR
57
divided by N  K (when T = 2), and the variance estimate for first difference
is the SSR divided by N  K, the error variance from fixed effects is always
half the size as the error variance for first difference estimation, that is,
^2
B^FE
and
B^FD
are identical.
This is easy since the variance matrix estimate for fixed effects is
1
N
N
*1 = ^s2& SN DxDx *1,
su7 S (xi1xi1 + xi2xi2)*8 = (^s2e/2)&7 S DxD
x
/2
i
i 8
e7
i
i8
i=1
i=1
i=1
^2&
Thus, the
standard errors, and in fact all other test statistics (F statistics) will be
numerically identical using the two approaches.
Under RE.1,
2
s2uIT, which implies that E(uiu
i xi) = suIT
Therefore,
E(viv
i xi) = E(cixi)jTj
T + E(uiu
i xi) = h(xi)jTj
T +
2
where h(xi)
s2uIT,
conditional variance matrix of vi given xi has the same covariance for all t
s, h(xi), and the same variance for all t, h(xi) +
s2u.
rNasymptotically
normal
without assumption RE.3b, but the usual random effects variance estimator of
B^RE
58
sd(u_id)
sd(e_id_t)
sd(e_id_t + u_id)
=
=
=
.3718544
.4088283
.5526448
corr(u_id, X)
0 (assumed)
(theta = 0.3862)
=
=
=
0.2067
0.5390
0.4785
chi2( 10)
=
Prob > chi2 =
512.77
0.0000
trmgpa 
Coef.
Std. Err.
z
P>z
[95% Conf. Interval]
+spring  .0606536
.0371605
1.632
0.103
.1334868
.0121797
crsgpa 
1.082365
.0930877
11.627
0.000
.8999166
1.264814
frstsem 
.0029948
.0599542
0.050
0.960
.1145132
.1205028
season  .0440992
.0392381
1.124
0.261
.1210044
.0328061
sat 
.0017052
.0001771
9.630
0.000
.0013582
.0020523
verbmath  .1575199
.16351
0.963
0.335
.4779937
.1629538
hsperc  .0084622
.0012426
6.810
0.000
.0108977
.0060268
hssize  .0000775
.0001248
0.621
0.534
.000322
.000167
black  .2348189
.0681573
3.445
0.000
.3684048
.1012331
female 
.3581529
.0612948
5.843
0.000
.2380173
.4782886
_cons 
1.73492
.3566599
4.864
0.000
2.43396
1.035879
. * fixed effects estimation, with timevarying variables only.
. xtreg trmgpa spring crsgpa frstsem season, fe
59
sd(u_id)
sd(e_id_t)
sd(e_id_t + u_id)
=
=
=
.679133
.4088283
.792693
corr(u_id, Xb)
0.0893
=
=
=
0.2069
0.0333
0.0613
4,
362) =
Prob > F =
23.61
0.0000
trmgpa 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+spring  .0657817
.0391404
1.681
0.094
.1427528
.0111895
crsgpa 
1.140688
.1186538
9.614
0.000
.9073506
1.374025
frstsem 
.0128523
.0688364
0.187
0.852
.1225172
.1482218
season  .0566454
.0414748
1.366
0.173
.1382072
.0249165
_cons  .7708056
.3305004
2.332
0.020
1.420747
.1208637
id 
F(365,362) =
5.399
0.000
(366 categories)
. * Obtaining the regressionbased Hausman test is a bit tedious.
compute the timeaverages for all of the timevarying variables:
. egen atrmgpa = mean(trmgpa), by(id)
. egen aspring = mean(spring), by(id)
. egen acrsgpa = mean(crsgpa), by(id)
. egen afrstsem = mean(frstsem), by(id)
. egen aseason = mean(season), by(id)
. * Now obtain GLS transformations for both timeconstant and
. * timevarying variables. Note that lamdahat = .386.
. di 1  .386
.614
. gen bone = .614
. gen bsat = .614*sat
. gen bvrbmth = .614*verbmath
. gen bhsperc = .614*hsperc
. gen bhssize = .614*hssize
60
First,
Number of obs
F( 11,
721)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
732
862.67
0.0000
0.9294
0.9283
.40858
btrmgpa 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+bone  1.734843
.3566396
4.864
0.000
2.435019
1.034666
bspring 
.060651
.0371666
1.632
0.103
.1336187
.0123167
bcrsgpa 
1.082336
.0930923
11.626
0.000
.8995719
1.265101
bfrstsem 
.0029868
.0599604
0.050
0.960
.114731
.1207046
bseason  .0440905
.0392441
1.123
0.262
.1211368
.0329558
bsat 
.0017052
.000177
9.632
0.000
.0013577
.0020528
bvrbmth  .1575166
.1634784
0.964
0.336
.4784672
.163434
bhsperc  .0084622
.0012424
6.811
0.000
.0109013
.0060231
bhssize  .0000775
.0001247
0.621
0.535
.0003224
.0001674
bblack  .2348204
.0681441
3.446
0.000
.3686049
.1010359
bfemale 
.3581524
.0612839
5.844
0.000
.2378363
.4784686
. * These are the RE estimates, subject to rounding error.
. * Now add the time averages of the variables that change across i and t
. * to perform the Hausman test:
. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc
bhssize bblack bfemale acrsgpa afrstsem aseason, nocons
61
Source 
SS
df
MS
+Model  1584.40773
14 113.171981
Residual  120.053023
718 .167204767
+Total  1704.46076
732
2.3284983
Number of obs
F( 14,
718)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
732
676.85
0.0000
0.9296
0.9282
.40891
btrmgpa 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+bone  1.423761
.5182286
2.747
0.006
2.441186
.4063367
bspring  .0657817
.0391479
1.680
0.093
.1426398
.0110764
bcrsgpa 
1.140688
.1186766
9.612
0.000
.9076934
1.373683
bfrstsem 
.0128523
.0688496
0.187
0.852
.1223184
.148023
bseason  .0566454
.0414828
1.366
0.173
.1380874
.0247967
bsat 
.0016681
.0001804
9.247
0.000
.001314
.0020223
bvrbmth  .1316462
.1654425
0.796
0.426
.4564551
.1931626
bhsperc  .0084655
.0012551
6.745
0.000
.0109296
.0060013
bhssize  .0000783
.0001249
0.627
0.531
.0003236
.000167
bblack  .2447934
.0685972
3.569
0.000
.3794684
.1101184
bfemale 
.3357016
.0711669
4.717
0.000
.1959815
.4754216
acrsgpa  .1142992
.1234835
0.926
0.355
.3567312
.1281327
afrstsem  .0480418
.0896965
0.536
0.592
.2241405
.1280569
aseason 
.0763206
.0794119
0.961
0.337
.0795867
.2322278
. test acrsgpa afrstsem aseason
( 1)
( 2)
( 3)
acrsgpa = 0.0
afrstsem = 0.0
aseason = 0.0
F(
3,
718) =
Prob > F =
0.61
0.6085
. * Thus, we fail to reject the random effects assumptions even at very large
. * significance levels.
For comparison, the usual form of the Hausman test, which includes spring
among the coefficients tested, gives pvalue = .770, based on a
distribution (using Stata 7.0).
c24
62
is to just add the time averages of all explanatory variables, excluding the
dummy variables, and estimating the equation by random effects.
done a better job of spelling this out in the text.
I should have

= 0.
is to be added.
Parts b, c, and d:
To be added.
10.11. To be added.
L 8.
In particular, it produces a
estimator of
B.
rNconsistent,
asymptotically normal
with good
63
t=1
which gives
^
&
ai(b) = wi
T
*
& T
*
S
yit/hit  wi S xit/hit b
7t=1
8
7t=1
8
w
w
_ yi  xib,
T
&
*
  w
& T
*
where wi _ 1/ S (1/hit) > 0 and yi _ wi S yit/hit , and a similar
7t=1
8
7t=1
8
  w
  w
  w
definition holds for xi. Note that yi and xi are simply weighted averages.
  w
  w
If h
equals the same constant for all t, y and x are the usual time
it
averages.
Now we can plug each â_i(b) into the SSR to get the problem solved by B̂:

    min_{b∈R^K} Σ_{i=1}^N Σ_{t=1}^T [(y_it − ȳ_i^w) − (x_it − x̄_i^w)b]²/h_it.

But this is just a pooled weighted least squares regression of (y_it − ȳ_i^w)
on (x_it − x̄_i^w) with weights 1/h_it.  Equivalently, define
ỹ_it ≡ (y_it − ȳ_i^w)/√h_it and x̃_it ≡ (x_it − x̄_i^w)/√h_it.  Then B̂ can be
expressed as

    B̂ = (Σ_{i=1}^N Σ_{t=1}^T x̃_it'x̃_it)^{-1}(Σ_{i=1}^N Σ_{t=1}^T x̃_it'ỹ_it).     (10.82)

Note carefully how the initial y_it are weighted by 1/h_it to obtain ȳ_i^w, but
the usual 1/√h_it weighting shows up in the sum of squared residuals on the
time-demeaned data (where the demeaning is a weighted average).  Given
(10.82), we can study the asymptotic (N → ∞) properties of B̂.  First, it is
easy to show that ȳ_i^w = x̄_i^w B + c_i + ū_i^w, where ū_i^w ≡ w_i(Σ_{t=1}^T u_it/h_it).
Subtracting this from y_it = x_itB + c_i + u_it for all t gives
ỹ_it = x̃_itB + ũ_it, where ũ_it ≡ (u_it − ū_i^w)/√h_it.  When we plug this in
for ỹ_it in (10.82) and rearrange, we get

    √N(B̂ − B) = (N^{-1}Σ_{i=1}^N Σ_{t=1}^T x̃_it'x̃_it)^{-1}(N^{-1/2}Σ_{i=1}^N Σ_{t=1}^T x̃_it'ũ_it).   (10.83)
Straightforward algebra shows that Σ_{t=1}^T x̃_it'ũ_it = Σ_{t=1}^T x̃_it'u_it/√h_it,
i = 1,...,N, so E(x̃_it'ũ_it) = 0, t = 1,...,T.  As long as we assume the rank
condition on Σ_{t=1}^T E(x̃_it'x̃_it), B̂ is consistent under mild assumptions, and
√N(B̂ − B) is asymptotically normal with variance A^{-1}BA^{-1}, where

    A ≡ Σ_{t=1}^T E(x̃_it'x̃_it)  and  B ≡ Var(Σ_{t=1}^T x̃_it'u_it/√h_it).

If we assume that Cov(u_it,u_is|x_i,h_i,c_i) = 0, t ≠ s, in addition to the
variance assumption, then B = σ²_u A, and so Avar √N(B̂ − B) = σ²_u A^{-1}.
The same subtleties that arise in estimating σ²_u for the usual within
estimator arise here, too.  Note that the residuals from the pooled OLS
regression

    ỹ_it on x̃_it,  t = 1,...,T; i = 1,...,N,                       (10.84)

say r̃_it, are estimating ũ_it = (u_it − ū_i^w)/√h_it (in the sense that we
obtain r̃_it from ũ_it by replacing B with B̂).  Now

    E(ũ²_it) = E(u²_it/h_it) − 2E[(u_it ū_i^w)/h_it] + E[(ū_i^w)²/h_it],

and summing over t gives Σ_{t=1}^T E(ũ²_it) = σ²_u(T − 1).  This contains the
usual result for the within transformation as a special case.  A consistent
estimator of σ²_u is SSR/[N(T − 1)], where SSR is the sum of squared residuals
from (10.84).  The estimator of Avar(B̂) is then

    σ̂²_u (Σ_{i=1}^N Σ_{t=1}^T x̃_it'x̃_it)^{-1}.

If Var(u_it|x_i,h_i,c_i) ≠ σ²_u h_it, then we can just apply the robust formula
for the pooled OLS regression (10.84).
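For concreteness, here is a minimal Stata sketch of the weighted within
transformation in (10.82) for a single regressor; the variable names y, x,
h (containing the known h_it), and the panel identifier id are hypothetical:

sort id
by id: gen double den  = sum(1/h)
by id: gen double ynum = sum(y/h)
by id: gen double xnum = sum(x/h)
by id: gen double ybarw = ynum[_N]/den[_N]   // weighted average of y for unit i
by id: gen double xbarw = xnum[_N]/den[_N]   // weighted average of x for unit i
gen double ytil = (y - ybarw)/sqrt(h)        // y-tilde in (10.82)
gen double xtil = (x - xbarw)/sqrt(h)        // x-tilde in (10.82)
reg ytil xtil, nocons                        // pooled OLS delivers B-hat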
CHAPTER 11

11.1. a. Without additional assumptions, the pooled OLS standard errors and
test statistics need to be adjusted for heteroskedasticity and serial
correlation (although the latter will not be present under dynamic
completeness).

b. As we discussed in Section 7.8.2, this statement is incorrect.
Provided our interest is in E(y_it|z_i,x_it,y_{i,t-1},prog_it), we do not care
about serial correlation in the implied errors, nor does serial correlation
cause inconsistency in the OLS estimators.
c. Such a model is the standard unobserved effects model:

    y_it = x_itB + δ1·prog_it + c_i + u_it,   t = 1,2,...,T,

which we can estimate by fixed effects or first differencing, studying the
properties of the estimators as N → ∞ with fixed T.  Under strict exogeneity,
past and future values of x_it can also be used as instruments.
The fixed effects estimator can be written as

    b_FE = b + [N^{-1}Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)²]^{-1}[N^{-1}Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)(u_it − ū_i − b(r_it − r̄_i))].

Now, x_it − x̄_i = (x*_it − x̄*_i) + (r_it − r̄_i).  Then, because
E(r_it|x*_i,c_i) = 0 for all t, (x*_it − x̄*_i) and (r_it − r̄_i) are
uncorrelated, and so

    Var(x_it − x̄_i) = Var(x*_it − x̄*_i) + Var(r_it − r̄_i).

Similarly, under (11.30), (x_it − x̄_i) and (u_it − ū_i) are uncorrelated for
all t.  Assuming constant variances across t,

    N^{-1}Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)² →p Σ_{t=1}^T Var(x_it − x̄_i) = T[Var(x*_it − x̄*_i) + Var(r_it − r̄_i)]

and

    N^{-1}Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)(u_it − ū_i − b(r_it − r̄_i)) →p −Tb·Var(r_it − r̄_i).

Therefore,

    plim b_FE = b − b·{Var(r_it − r̄_i)/[Var(x*_it − x̄*_i) + Var(r_it − r̄_i)]}
              = b·{1 − Var(r_it − r̄_i)/[Var(x*_it − x̄*_i) + Var(r_it − r̄_i)]}.
a. Write v_i = Z_i(a_i − A) + u_i.  Then E(v_i|z_i,x_i) =
Z_i·E[(a_i − A)|z_i,x_i] + E(u_i|z_i,x_i) = Z_i(A − A) + 0 = 0 by the usual
iterated expectations argument.  Further, Var(v_i|z_i,x_i) =
Z_i·Var(a_i)·Z_i' + Var(u_i|z_i,x_i) under the assumptions given, which shows
that the conditional variance depends on z_i.  Unlike in the standard random
effects model, there is conditional heteroskedasticity.

b. If we use the usual RE analysis, we are applying FGLS to the equation
y_i = Z_iA + X_iB + v_i, where v_i = Z_i(a_i − A) + u_i.  The FGLS estimator
is consistent and √N-asymptotically normal as N → ∞, provided the rank
condition holds.  (Remember, a feasible GLS analysis with any Ω̂ will be
consistent provided Ω̂ converges in probability to a nonsingular matrix as
N → ∞.  It need not be the case that Var(v_i|x_i,z_i) = plim(Ω̂), or even that
Var(v_i) = plim(Ω̂).)  From part a, we know that Var(v_i|x_i,z_i) depends on
z_i unless we restrict almost all elements of Var(a_i) to be zero (all but
the one corresponding to the constant in z_it).
11.7. Write the equation as y_it = x_itB + v_it, t = 1,2,...,T.  By the
partialling-out (Frisch-Waugh) argument, B̂ can be obtained from the following
two-step procedure:

(i) Regress x_it on the time averages x̄_i, across all t and i, and save the
1 × K vectors of residuals, say r̂_it, t = 1,...,T, i = 1,...,N.

(ii) Regress y_it on r̂_it across all t and i.  The OLS coefficient vector on
r̂_it is B̂.

We claim B̂ is the FE estimator.  Since the FE estimator can be obtained by
pooled OLS of y_it on (x_it − x̄_i), it suffices to show that r̂_it = x_it − x̄_i.
But

    r̂_it = x_it − x̄_i(Σ_{i=1}^N Σ_{t=1}^T x̄_i'x̄_i)^{-1}(Σ_{i=1}^N Σ_{t=1}^T x̄_i'x_it),

and

    Σ_{i=1}^N Σ_{t=1}^T x̄_i'x_it = Σ_{i=1}^N x̄_i'(Σ_{t=1}^T x_it) = Σ_{i=1}^N T·x̄_i'x̄_i = Σ_{i=1}^N Σ_{t=1}^T x̄_i'x̄_i,

so r̂_it = x_it − x̄_i·I_K = x_it − x̄_i.  This completes the proof.
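A quick way to see the result numerically is to run both routes in Stata; a
sketch, with hypothetical variable names y and x and panel identifier id:

sort id
egen xbar = mean(x), by(id)
reg x xbar, nocons          // step (i): residuals are x - xbar
predict double rhat, resid
reg y rhat, nocons          // step (ii): coefficient equals the FE slope
xtreg y x, fe i(id)         // check against the usual FE estimator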
11.9. a. We can apply Problem 8.8.b, as we are applying pooled 2SLS to the
time-demeaned equation ÿ_it = ẍ_itB + ü_it with instruments z̈_it, so that

    rank[Σ_{t=1}^T E(z̈_it'ẍ_it)] = K.

Time demeaning eliminates the constant instruments.  In particular, in
equation (8.25), take C = E(Z̈_i'Ẍ_i), W = [E(Z̈_i'Z̈_i)]^{-1}, and
Λ = E(Z̈_i'u_iu_i'Z̈_i).  A key point is that

    Z̈_i'u_i = (Q_T Z_i)'(Q_T u_i) = Z_i'Q_T u_i = Z̈_i'ü_i,

where Q_T is the time-demeaning matrix.  Under (11.80),

    E(Z̈_i'ü_iü_i'Z̈_i) = σ²_u E(Z̈_i'Z̈_i).

Substituting into (8.25) gives

    Avar √N(B̂ − B) = σ²_u {E(Ẍ_i'Z̈_i)[E(Z̈_i'Z̈_i)]^{-1}E(Z̈_i'Ẍ_i)}^{-1}.
b. We estimate σ²_u as for the usual fixed effects estimator.  First,
Σ_{t=1}^T E(ü²_it) = (T − 1)σ²_u, just as before.  If û_it = ÿ_it − ẍ_itB̂ are the
pooled 2SLS residuals applied to the time-demeaned data, then
[N(T − 1)]^{-1}Σ_{i=1}^N Σ_{t=1}^T û²_it is a consistent estimator of σ²_u.

c. To obtain ĉ_1, ..., ĉ_N and B̂: first, regress x_it on d1i, ..., dNi, z_it
across all t and i, and obtain the residuals, say r̂_it; second, obtain
ĉ_1, ..., ĉ_N, B̂ from the pooled regression y_it on d1i, ..., dNi, x_it, r̂_it.
The coefficients from this last regression can be obtained by first
partialling out the dummy variables d1i, ..., dNi.  Therefore, B̂ and D̂ can be
obtained from the pooled regression ÿ_it on ẍ_it, r̂_it, where we use the fact
that the time average of r̂_it for each i is identically zero.

d. Now consider the 2SLS estimator of B from (11.79).  This is equivalent to
first regressing ẍ_it on z̈_it and saving the residuals, say ŝ_it, and then
running the OLS regression ÿ_it on ẍ_it, ŝ_it.  By the partialling argument
and the fact that regressing on d1i, ..., dNi results in time demeaning,
ŝ_it = r̂_it for all i and t.  (As would usually be the case, some entries in
r̂_it are identically zero for all t and i.  But we can simply drop those
without changing any other steps in the argument.)

e. First, by writing down the first order condition for the 2SLS estimates
from (11.81) (with the d_ni as their own instruments, and ẍ_it as the IVs for
x_it), it is easy to show that ĉ_i = ȳ_i − x̄_iB̂, where B̂ is the IV estimator
from part d.  Therefore, the residuals are computed as

    y_it − (ȳ_i − x̄_iB̂) − x_itB̂ = (y_it − ȳ_i) − (x_it − x̄_i)B̂ = ÿ_it − ẍ_itB̂.

This matters because the time-demeaned IVs will generally be correlated with
some elements of u_i (usually, all elements).
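The equivalence in parts c and d is easy to verify numerically.  A sketch
with one endogenous regressor x, one instrument z, and panel identifier id
(all names hypothetical; xtivreg is available in Stata 7 and later):

xtivreg y (x = z), fe i(id)        // direct FE-2SLS
sort id
egen ybar = mean(y), by(id)
egen xbar = mean(x), by(id)
egen zbar = mean(z), by(id)
gen double ydd = y - ybar          // time-demeaned variables
gen double xdd = x - xbar
gen double zdd = z - zbar
reg xdd zdd, nocons                // first stage on demeaned data
predict double shat, resid
reg ydd xdd shat, nocons           // coefficient on xdd is the FE-2SLS estimate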
11.11. Differencing twice and using the resulting cross section is easily
done in Stata.

                                                Number of obs =      54
                                                F(  2,    51) =    0.97
                                                Prob > F      =  0.3868
                                                R-squared     =  0.0366
                                                Adj R-squared = -0.0012
                                                Root MSE      =  .70368
------------------------------------------------------------------------------
    cclscrap |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ccgrnt |   .1564748   .2632934      0.594   0.555    -.3721087    .6850584
    ccgrnt_1 |   .6099015   .6343411      0.961   0.341    -.6635913    1.883394
       _cons |  -.2377384   .1407363     -1.689   0.097    -.5202783    .0448014
------------------------------------------------------------------------------
sd(u_fcode)             =   .509567
sd(e_fcode_t)           =  .4975778
sd(e_fcode_t + u_fcode) =  .7122094
corr(u_fcode, Xb)       =    0.4011

R-sq:  within  = 0.0577
       between = 0.0476
       overall = 0.0050

F(  3,    51) =    1.04
Prob > F      =  0.3826
------------------------------------------------------------------------------
     clscrap |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d89 |  -.2377385   .1407362     -1.689   0.097    -.5202783    .0448014
      cgrant |   .1564748   .2632934      0.594   0.555    -.3721087    .6850584
    cgrant_1 |   .6099016   .6343411      0.961   0.341    -.6635913    1.883394
       _cons |  -.2240491    .114748     -1.953   0.056    -.4544153    .0063171
-------------+----------------------------------------------------------------
       fcode |   F(53, 51) =     1.674   0.033           (54 categories)
The estimates from the random growth model are pretty bad: the estimates on
the grant variables are of the wrong sign, and they are very imprecise.
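For reference, the two equivalent routes to these estimates in Stata, using
the JTRAIN data (the double-differenced variables cclscrap, ccgrnt, and
ccgrnt_1 are assumed to be constructed already; the slope estimates agree
across the two commands):

. reg cclscrap ccgrnt ccgrnt_1
. xtreg clscrap d89 cgrant cgrant_1, fe i(fcode)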
11.13. To be added.
11.15. To be added.
From the usual first-order representation of the FE estimator,

    √N(B̂_FE − B) = A^{-1}(N^{-1/2}Σ_{i=1}^N Ẍ_i'u_i) + o_p(1).

Also,

    √N(Â − A) = N^{-1/2}Σ_{i=1}^N [(Z_i'Z_i)^{-1}Z_i'(y_i − X_iB) − A]
                − [N^{-1}Σ_{i=1}^N (Z_i'Z_i)^{-1}Z_i'X_i]·√N(B̂_FE − B)
              = N^{-1/2}Σ_{i=1}^N (s_i − A) − C·A^{-1}N^{-1/2}Σ_{i=1}^N Ẍ_i'u_i + o_p(1),

where s_i ≡ (Z_i'Z_i)^{-1}Z_i'(y_i − X_iB) and C ≡ E[(Z_i'Z_i)^{-1}Z_i'X_i].  By
definition, E(s_i) = A.  Therefore,

    √N(Â − A) = N^{-1/2}Σ_{i=1}^N [(s_i − A) − C·A^{-1}Ẍ_i'u_i] + o_p(1),

which implies by the central limit theorem and the asymptotic equivalence
lemma that √N(Â − A) is asymptotically normal with zero mean and variance
E(r_ir_i'), where r_i ≡ (s_i − A) − C·A^{-1}Ẍ_i'u_i.  If we replace A, C, and
the u_i with their consistent estimators, we get exactly (11.55), since the
û_i are the FE residuals.
CHAPTER 12

12.1. Write y = μ(x) + u, where μ(x) ≡ E(y|x).  Then

    E{[y − m(x,θ)]²|x} = E(u²|x) + [μ(x) − m(x,θ)]²,

which is clearly minimized at θ_o when m(x,θ_o) = μ(x).

The estimated elasticity is ∂log[E(y|z)]/∂log(z1), evaluated at the
estimates: θ̂1 + θ̂3/(2θ̂4); just replace the parameters with their estimates
and use average or other interesting values of z.

The (nonrobust) LM statistic is obtained as NR²_u from the regression ũ_i on
m̃_ix_i1, m̃_ix_i2, i = 1,...,N, where ũ_i = y_i − m̃_i.  For the robust test, we
first regress m̃_ix_i2 on m̃_ix_i1 and obtain the 1 × K2 residuals r̃_i; the
statistic is then computed from regression (12.75).
12.5. We need the gradient of m(x_i,θ) evaluated under the null hypothesis.
By the chain rule, the gradient with respect to B is
g(·)·[1 + 2δ1(xB) + 3δ2(xB)²]x, and with respect to (δ1,δ2) it is
g(·)·[(xB)², (xB)³].  Let B̃ denote the NLS estimator with δ1 = δ2 = 0
imposed.  Then

    ∇_B m(x_i,θ̃) = g(x_iB̃)x_i  and  ∇_δ m(x_i,θ̃) = g(x_iB̃)[(x_iB̃)², (x_iB̃)³].

The LM statistic is obtained as NR²_u from the regression ũ_i on g̃_ix_i,
g̃_i(x_iB̃)², g̃_i(x_iB̃)³, i = 1,...,N, where g̃_i ≡ g(x_iB̃) and the ũ_i are the
restricted NLS residuals.  Under H0, LM ~a χ²_2.
Then E(u_iu_i'|x_i) = E(u_iu_i') = Ω_o.  Let û_i be the vector of nonlinear
least squares residuals, û_i ≡ y_i − m(x_i,θ̂); because each NLS estimator
θ̂_g is consistent for θ_og,

    Ω̂ ≡ N^{-1}Σ_{i=1}^N û_iû_i'

is consistent for Ω_o as N → ∞.
b. This part involves several steps, and I will sketch how each one goes.
First, with γ the nuisance parameters, the score for observation i is

    s(w_i,θ;γ) = −∇_θm(x_i,θ)'[Ω(x_i,γ)]^{-1}u_i(θ).

Each element of ∇_γ s(w_i,θ_o;γ) is a linear function of u_i, so it has zero
mean conditional on x_i.  Next, we derive B_o ≡ E[s_i(θ_o;γ_o)s_i(θ_o;γ_o)']:

    B_o = E{E[∇_θm_i(θ_o)'Ω_o^{-1}u_iu_i'Ω_o^{-1}∇_θm_i(θ_o)|x_i]}
        = E[∇_θm_i(θ_o)'Ω_o^{-1}E(u_iu_i'|x_i)Ω_o^{-1}∇_θm_i(θ_o)]
        = E[∇_θm_i(θ_o)'Ω_o^{-1}Ω_o Ω_o^{-1}∇_θm_i(θ_o)]
        = E[∇_θm_i(θ_o)'Ω_o^{-1}∇_θm_i(θ_o)],

where Ω_o ≡ Ω(x_i,γ_o).  The expected Hessian can be written the same way, so
A_o = B_o = E[∇_θm_i(θ_o)'Ω_o^{-1}∇_θm_i(θ_o)].  The Jacobian of the score with
respect to γ contains the term [I_P ⊗ E(u_i|x_i)']F(x_i,θ_o;γ_o), where
F(x_i,θ;γ) is a matrix function of x_i; its expectation is zero because
E(u_i|x_i) = 0, which is why estimating γ_o has no first-order effect.  So we
have

    Avar √N(θ̂ − θ_o) = {E[∇_θm_i(θ_o)'Ω_o^{-1}∇_θm_i(θ_o)]}^{-1},

and a consistent estimator of Avar(θ̂) is

    [Σ_{i=1}^N ∇_θm_i(θ̂)'Ω̂_i^{-1}∇_θm_i(θ̂)]^{-1}.

d. If Ω_o is diagonal, say diag(σ²_o1,...,σ²_oG), then
∇_θm_i(θ_o)'Ω_o^{-1}∇_θm_i(θ_o) is block diagonal, with g-th block
σ_og^{-2}·∇_θg m°_ig'∇_θg m°_ig, where ∇_θg m_ig(θ_og) is the 1 × P_g gradient of
the g-th mean function.  Taking expectations and inverting the result shows
that

    Avar √N(θ̂_g − θ_og) = σ²_og[E(∇_θg m°_ig'∇_θg m°_ig)]^{-1},  g = 1,...,G.

These asymptotic variances are easily seen to be the same as those for
nonlinear least squares on each equation; see p. 360.

e. I cannot see a nonlinear analogue of Theorem 7.7.  The proof given in
Problem 7.5 does not extend readily to nonlinear models, even when the same
regressors appear in each equation.  The role of x_i is now played by
∇_θm(x_i,θ_o).  While this matrix has the block structure described in part d,
the blocks are not the same even when the same regressors appear in each
equation; they coincide for all g only under the very restrictive assumption
∇_θg m_g(x_i,θ_og) = x_i.  For nonlinear mean functions the gradients differ
across g, as in parts a and d.
depend on x.

c. When u and x are independent, the partial effects of x_j on the
conditional mean and conditional median are the same, and there is no
ambiguity about what is "the effect of x_j on y," at least when only the mean
and median are in the running.  But it could matter otherwise.  That is,
identification requires that B_o uniquely minimizes the population objective:

    E{[m(x_i,B_o) − m(x_i,B)]'[m(x_i,B_o) − m(x_i,B)]} > 0,  B ≠ B_o.

In a linear model, where m(x_i,B) = X_iB for X_i a G × K matrix, this
condition is

    (B_o − B)'E(X_i'X_i)(B_o − B) > 0,  B ≠ B_o,

which holds whenever E(X_i'X_i) is positive definite.
Generally, B_o = E[∇_βm_i(B_o)'u_iu_i'∇_βm_i(B_o)], and the sample objective
function converges uniformly to its expectation, which is just to say that
the usual consistency proof can be used provided we verify identification.
In the expansion, E(u_i|x_i) = 0 is used to show that the cross-product term,

    2E{[m(x_i,B_o) − m(x_i,B)]'[W_i(δ_o)]^{-1}u_i},

is zero (by iterated expectations, as always).  It does not matter that δ_o
is replaced by a √N-consistent estimator.  Next,

    B_o = E{E[∇_βm_i(B_o)'[W_i(δ_o)]^{-1}u_iu_i'[W_i(δ_o)]^{-1}∇_βm_i(B_o)|x_i]}
        = E{∇_βm_i(B_o)'[W_i(δ_o)]^{-1}E(u_iu_i'|x_i)[W_i(δ_o)]^{-1}∇_βm_i(B_o)}
        = E{∇_βm_i(B_o)'[W_i(δ_o)]^{-1}∇_βm_i(B_o)},

where the last equality uses Var(y_i|x_i) = W_i(δ_o).  Taking expectations of
the Hessian gives A_o = E{∇_βm_i(B_o)'[W_i(δ_o)]^{-1}∇_βm_i(B_o)} = B_o, so

    Avar √N(B̂ − B_o) = A_o^{-1},

and a consistent estimator of A_o is

    Â = N^{-1}Σ_{i=1}^N ∇_βm(x_i,B̂)'[W_i(δ̂)]^{-1}∇_βm(x_i,B̂).

c. The consistency argument in part b did not use the fact that W(x,δ) is
correctly specified for Var(y|x), so the argument goes through.  Without
correct specification of the variance we use

    B̂ = N^{-1}Σ_{i=1}^N ∇_βm(x_i,B̂)'[W_i(δ̂)]^{-1}û_iû_i'[W_i(δ̂)]^{-1}∇_βm(x_i,B̂)

and estimate Avar √N(B̂ − B_o) in the usual way: Â^{-1}B̂Â^{-1}.
CHAPTER 13

13.1. No.  We know that θ_o solves max_{θ∈Θ} E[log f(y_i|x_i;θ)].  By Jensen's
inequality,

    E[f(y_i|x_i;θ)] ≥ exp{E[log f(y_i|x_i;θ)]},

generally with strict inequality, so maximizing E[f(y_i|x_i;θ)] is not the
same problem as maximizing E[log f(y_i|x_i;θ)]; there is no reason to think
θ_o solves the former.
For a reparameterization with nonsingular Jacobian G(θ), the score and
conditional information of the new model satisfy

    E[s_i^g(θ_o)s_i^g(θ_o)'|x_i] = [G(θ_o)']^{-1}E[s_i(θ_o)s_i(θ_o)'|x_i][G(θ_o)]^{-1}
                                 = [G(θ_o)']^{-1}A_i(θ_o)[G(θ_o)]^{-1}.

Replacing θ_o with θ̃ gives Ã_i^g = [G(θ̃)']^{-1}Ã_i[G(θ̃)]^{-1}, and similarly
for the scores.  Then the LM statistic in the new parameterization is

    (Σ_{i=1}^N s̃_i)'G̃^{-1}·G̃(Σ_{i=1}^N Ã_i)^{-1}G̃'·(G̃')^{-1}(Σ_{i=1}^N s̃_i)
        = (Σ_{i=1}^N s̃_i)'(Σ_{i=1}^N Ã_i)^{-1}(Σ_{i=1}^N s̃_i) = LM,

so the LM statistic is invariant to reparameterization.
The log likelihood for observation i is l_i(θ) ≡ r_i2·l_i1(θ) + l_i2(θ).  Now
θ_o maximizes E[l_i1(θ)|y_i2,x_i], and because r_i2 is a function of
(y_i2,x_i), θ_o therefore maximizes E[r_i2·l_i1(θ)].  Similarly, θ_o maximizes
E[l_i2(θ)], so it follows that θ_o maximizes E[l_i(θ)].  For identification,
we have to assume that θ_o is the unique maximizer.

The score is s_i(θ) = r_i2·s_i1(θ) + s_i2(θ), with Jacobians ∇_θs_i1(θ) and
∇_θs_i2(θ).  By the conditional information matrix equality, as in (13.70),

    E[r_i2·s_i1(θ_o)s_i1(θ_o)'] = −E[r_i2·H_i1(θ_o)].

Combining all the pieces, we have shown that

    E[s_i(θ_o)s_i(θ_o)'] = −E[r_i2·H_i1(θ_o)] − E[H_i2(θ_o)]
                         = −E[r_i2·∇_θs_i1(θ_o) + ∇_θs_i2(θ_o)] ≡ −E[H_i(θ_o)].

Therefore, the asymptotic variance of √N(θ̂ − θ_o) is estimated from the
inverse of −N^{-1}Σ_{i=1}^N (r_i2·Ĥ_i1 + Ĥ_i2), which is consistent for
{E[r_i2·A_i1(θ_o) + A_i2(θ_o)]}^{-1}, where A_i2(θ_o) ≡ −E[H_i2(θ_o)|x_i].

If we had a random sample, the full conditional MLE would be more efficient
than the partial MLE based on the selected sample.  Answer: use the fact
that, for positive definite P × P matrices A and B, B^{-1} − A^{-1} is positive
semidefinite when A − B is.  If we could use the entire random sample for
both terms, the asymptotic variance would be {E[A_i1(θ_o) + A_i2(θ_o)]}^{-1}.
Because 0 ≤ r_i2 ≤ 1,

    {E[r_i2·A_i1(θ_o) + A_i2(θ_o)]}^{-1} − {E[A_i1(θ_o) + A_i2(θ_o)]}^{-1} > 0.
13.9. To be added.
13.11. To be added.
CHAPTER 14

If E(u2²|x) = σ²_2, 2SLS using the given list of instruments is the efficient,
single-equation GMM estimator.  If γ1 = 0, the parameter γ2 does not appear
in the model; of course, if we knew γ2 we could impose it.  Now, when γ2 = 2,

    E(y1|x) = x1δ1 + γ1E(y2²|x) + E(u1|x) = x1δ1 + γ1E(y2²|x),

and E(y2²|x) is not γ1-proportional to (xδ2)²; with y2 = xδ2 + v2 we have
E(y2²|x) = (xδ2)² + Var(y2|x).  So the two-step procedure that regresses y_i1
on x_i1, (x_iδ̂2)² will not be consistent for δ1 and γ1.  (This is an example
of a "forbidden regression.")
Let Z_i be a G × L matrix that is a function of x_i.  Then the asymptotic
variance of the GMM estimator has the form (14.10) with G_o = E[Z_i'R_o(x_i)]
and Λ_o ≡ E[Z_i'Ω_o(x_i)Z_i], and the score is s(w_i) = G_o'Λ_o^{-1}Z_i'r(w_i,θ_o).
With the choice Z_i = Ω_o(x_i)^{-1}R_o(x_i):

    E[s(w_i)s(w_i)'] = G_o'Λ_o^{-1}E[Z_i'r(w_i,θ_o)r(w_i,θ_o)'Ω_o(x_i)^{-1}R_o(x_i)]
                     = G_o'Λ_o^{-1}E[Z_i'E{r(w_i,θ_o)r(w_i,θ_o)'|x_i}Ω_o(x_i)^{-1}R_o(x_i)]
                     = G_o'Λ_o^{-1}E[Z_i'Ω_o(x_i)Ω_o(x_i)^{-1}R_o(x_i)]
                     = G_o'Λ_o^{-1}G_o = A.
Each π_t is (1 + 3K) × 1.  Stacking the π_t gives π, and the structural
parameter vector is θ = (η, λ1', λ2', λ3', B')'.  Then we have

    π1 = [η, (λ1 + B)', λ2', λ3']',
    π2 = [η, λ1', (λ2 + B)', λ3']',
    π3 = [η, λ1', λ2', (λ3 + B)']',

so that π = Hθ with H given by

        [ 1    0    0    0    0  ]
        [ 0   I_K   0    0   I_K ]
        [ 0    0   I_K   0    0  ]
        [ 0    0    0   I_K   0  ]
        [ 1    0    0    0    0  ]
    H = [ 0   I_K   0    0    0  ]
        [ 0    0   I_K   0   I_K ]
        [ 0    0    0   I_K   0  ]
        [ 1    0    0    0    0  ]
        [ 0   I_K   0    0    0  ]
        [ 0    0   I_K   0    0  ]
        [ 0    0    0   I_K  I_K ].

The minimum distance estimator solves

    min_{θ∈R^P} (π̂ − Hθ)'Ξ̂^{-1}(π̂ − Hθ),

where Ξ̂ is the estimated asymptotic variance of π̂, with first order condition

    (H'Ξ̂^{-1}H)θ = H'Ξ̂^{-1}π̂.

Therefore, assuming H'Ξ̂^{-1}H is nonsingular, which occurs w.p.a.1 when
H'Ξ_o^{-1}H is nonsingular, we have

    θ̂ = (H'Ξ̂^{-1}H)^{-1}H'Ξ̂^{-1}π̂.
14.9. We have to verify equations (14.55) and (14.56) for the random effects
and fixed effects estimators.  Write the RE quasi-demeaned regressors and
errors as X̌_i = X_i − λj_T x̄_i and ř_i = v_i − λj_T v̄_i, so the RE score is
s_i1 = X̌_i'ř_i, while the FE score is s_i2 = Ẍ_i'ü_i.  Therefore,

    E(s_i1s_i1') = E(X̌_i'ř_iř_i'X̌_i),

with E(ř_iř_i'|x_i) = σ²_u I_T under the RE assumptions.  Now
s_i2s_i1' = Ẍ_i'u_iř_i'X̌_i, and Ẍ_i'u_i = Ẍ_i'ř_i (time demeaning kills the
j_T components).  So s_i2s_i1' = Ẍ_i'ř_iř_i'X̌_i, and therefore

    E(s_i2s_i1'|x_i) = Ẍ_i'E(ř_iř_i'|x_i)X̌_i = σ²_u Ẍ_i'X̌_i.

It follows that E(s_i2s_i1') = σ²_u E(Ẍ_i'X̌_i).  Finally, note that

    Ẍ_i'X̌_i = Ẍ_i'(X_i − λj_T x̄_i) = Ẍ_i'X_i = Ẍ_i'Ẍ_i,

because Ẍ_i'j_T = 0, which delivers (14.55) and (14.56) with σ²_u.
CHAPTER 15

15.1. If we drop d1 but add an overall intercept, the overall intercept
becomes the cell frequency for the first category, and the coefficient on dm
becomes the difference in cell frequencies between category m and category
one, m = 2, ..., M.

b. The response probability is P(y = 1|z) = Φ(z1δ1 + γ1z2 + γ2z2²); just
replace the parameters with their probit estimates, and use average or other
interesting values of z.

c. We would apply the delta method from Chapter 3.  Thus, we would require
the full variance matrix of the probit estimates as well as the gradient of
the expression of interest, such as (γ1 + 2γ2z2)·φ(z1δ1 + γ1z2 + γ2z2²).
b. Write y* = z1δ1 + r, where r = γ1z2·q + e and E(r|z) = 0.  Also,
Var(r|z) = γ1²z2²Var(q|z) + Var(e|z) = γ1²z2² + 1.  Thus,
r/√(γ1²z2² + 1) has a standard normal distribution independent of z.  It
follows that

    P(y = 1|z) = Φ[z1δ1/√(γ1²z2² + 1)].                          (15.90)

c. Because P(y = 1|z) depends only on γ1², this is what we can estimate along
with δ1.  (For example, γ1 and −γ1 give the same response probability.)
Define ρ1 ≡ γ1².  Testing H0: ρ1 = 0 is most easily done using the score or
LM test because, under H0, we have a standard probit model.  Let D̃1 denote
the probit estimate of δ1 under H0, with Φ̃_i ≡ Φ(z_i1D̃1), φ̃_i ≡ φ(z_i1D̃1),
û_i ≡ y_i − Φ̃_i, and ũ_i ≡ û_i/√(Φ̃_i(1 − Φ̃_i)).  The gradient of the
response probability with respect to δ1, evaluated under H0, is φ_i z_i1.
The gradient with respect to ρ1 evaluated at the restricted estimates is,
for each i,

    −(z_i1D̃1)(z_i2²/2)φ̃_i.

Then, the LM statistic is NR²_u from the regression of ũ_i on

    φ̃_iz_i1/√(Φ̃_i(1 − Φ̃_i)),  (z_i1D̃1)z_i2²φ̃_i/√(Φ̃_i(1 − Φ̃_i));

under H0, NR²_u ~a χ²_1.  This tests ρ1 = 0, that is, γ1² = 0.
. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

      Source |       SS       df       MS              Number of obs =    2725
-------------+------------------------------           F(  8,  2716) =   30.48
       Model |  44.9720916     8  5.62151145           Prob > F      =  0.0000
    Residual |  500.844422  2716  .184405163           R-squared     =  0.0824
-------------+------------------------------           Adj R-squared =  0.0797
       Total |  545.816514  2724   .20037317           Root MSE      =  .42942

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1543802   .0209336    -7.37   0.000    -.1954275   -.1133329
      avgsen |   .0035024   .0063417     0.55   0.581    -.0089326    .0159374
     tottime |  -.0020613   .0048884    -0.42   0.673    -.0116466     .007524
     ptime86 |  -.0215953   .0044679    -4.83   0.000    -.0303561   -.0128344
       inc86 |  -.0012248    .000127    -9.65   0.000    -.0014738   -.0009759
       black |   .1617183   .0235044     6.88   0.000     .1156299    .2078066
      hispan |   .0892586   .0205592     4.34   0.000     .0489454    .1295718
      born60 |   .0028698   .0171986     0.17   0.867    -.0308539    .0365936
       _cons |   .3609831   .0160927    22.43   0.000      .329428    .3925382
------------------------------------------------------------------------------
. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60, robust

Regression with robust standard errors                 Number of obs =    2725
                                                       F(  8,  2716) =   37.59
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0824
                                                       Root MSE      =  .42942

------------------------------------------------------------------------------
             |             Robust
       arr86 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1543802    .018964    -8.14   0.000    -.1915656   -.1171948
      avgsen |   .0035024   .0058876     0.59   0.552    -.0080423    .0150471
     tottime |  -.0020613   .0042256    -0.49   0.626     -.010347    .0062244
     ptime86 |  -.0215953   .0027532    -7.84   0.000    -.0269938   -.0161967
       inc86 |  -.0012248   .0001141   -10.73   0.000    -.0014487    -.001001
       black |   .1617183   .0255279     6.33   0.000     .1116622    .2117743
      hispan |   .0892586   .0210689     4.24   0.000     .0479459    .1305714
      born60 |   .0028698   .0171596     0.17   0.867    -.0307774     .036517
       _cons |   .3609831   .0167081    21.61   0.000     .3282214    .3937449
------------------------------------------------------------------------------
The estimated effect from increasing pcnv from .25 to .75 is about
-.154(.5) = -.077, so the probability of arrest falls by about 7.7 points.
There are no important differences between the usual and robust standard
errors.  In fact, avgsen and tottime are jointly insignificant using either
variance estimator:

. test avgsen tottime

 ( 1)  avgsen = 0.0
 ( 2)  tottime = 0.0

       F(  2,  2716) =    0.18
            Prob > F =   0.8320

. qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
. test avgsen tottime

 ( 1)  avgsen = 0.0
 ( 2)  tottime = 0.0

       F(  2,  2716) =    0.18
            Prob > F =   0.8360
. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

Iteration 0:   log likelihood = -1608.1837
Iteration 1:   log likelihood = -1486.3157
Iteration 2:   log likelihood = -1483.6458
Iteration 3:   log likelihood = -1483.6406

Probit estimates                                  Number of obs   =      2725
                                                  LR chi2(8)      =    249.09
                                                  Prob > chi2     =    0.0000
Log likelihood = -1483.6406                       Pseudo R2       =    0.0774

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.5529248   .0720778    -7.67   0.000    -.6941947   -.4116549
      avgsen |   .0127395   .0212318     0.60   0.548     -.028874    .0543531
     tottime |  -.0076486   .0168844    -0.45   0.651    -.0407414    .0254442
     ptime86 |  -.0812017    .017963    -4.52   0.000    -.1164085   -.0459949
       inc86 |  -.0046346   .0004777    -9.70   0.000    -.0055709   -.0036983
       black |   .4666076   .0719687     6.48   0.000     .3255516    .6076635
      hispan |   .2911005   .0654027     4.45   0.000     .1629135    .4192875
      born60 |   .0112074   .0556843     0.20   0.840    -.0979318    .1203466
       _cons |  -.3138331   .0512999    -6.12   0.000    -.4143791    -.213287
------------------------------------------------------------------------------
Now, we must compute the difference in the normal cdf at the two different
values of pcnv, black = 1, hispan = 0, born60 = 1, and at the average values
of the remaining variables:

. sum avgsen tottime ptime86 inc86

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      avgsen |    2725    .6322936   3.508031          0       59.2
     tottime |    2725    .8387523   4.607019          0       63.4
     ptime86 |    2725     .387156   1.950051          0         12
       inc86 |    2725    54.96705   66.62721          0        541
. di 1903/1970
.96598985

. di 78/755
.10331126

For men who were not arrested, the probit predicts correctly about 96.6% of
the time.  Unfortunately, for the men who were arrested, the probit is
correct only about 10.3% of the time.  The overall percent correctly
predicted is quite high, but we cannot very well predict the outcome we would
most like to predict.
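The counts used above come from a two-way table of outcomes against
predictions.  A minimal sketch, assuming the fitted probabilities have been
saved in phat after the probit:

. gen byte arr86h = phat > .5
. tab arr86h arr86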
e. Adding the quadratic terms gives
. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 pcnvsq
> pt86sq inc86sq

Iteration 0:   log likelihood = -1608.1837
Iteration 1:   log likelihood = -1452.2089
Iteration 2:   log likelihood = -1444.3151
Iteration 3:   log likelihood = -1441.8535
Iteration 4:   log likelihood =  -1440.268
Iteration 5:   log likelihood = -1439.8166
Iteration 6:   log likelihood = -1439.8005
Iteration 7:   log likelihood = -1439.8005

Probit estimates                                  Number of obs   =      2725
                                                  LR chi2(11)     =    336.77
                                                  Prob > chi2     =    0.0000
Log likelihood = -1439.8005                       Pseudo R2       =    0.1047

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |   .2167615   .2604937     0.83   0.405    -.2937968    .7273198
      avgsen |   .0139969   .0244972     0.57   0.568    -.0340166    .0620105
     tottime |  -.0178158   .0199703    -0.89   0.372     -.056957    .0213253
     ptime86 |   .7449712   .1438485     5.18   0.000     .4630333    1.026909
       inc86 |  -.0058786   .0009851    -5.97   0.000    -.0078094   -.0039478
       black |   .4368131   .0733798     5.95   0.000     .2929913     .580635
      hispan |   .2663945    .067082     3.97   0.000     .1349163    .3978727
      born60 |  -.0145223   .0566913    -0.26   0.798    -.1256351    .0965905
      pcnvsq |  -.8570512   .2714575    -3.16   0.002    -1.389098   -.3250042
      pt86sq |  -.1035031   .0224234    -4.62   0.000    -.1474522    -.059554
     inc86sq |   8.75e-06   4.28e-06     2.04   0.041     3.63e-07    .0000171
       _cons |   -.337362   .0562665    -6.00   0.000    -.4476423   -.2270817
------------------------------------------------------------------------------
note: 51 failures and 0 successes completely determined.
. test pcnvsq pt86sq inc86sq

 ( 1)  pcnvsq = 0.0
 ( 2)  pt86sq = 0.0
 ( 3)  inc86sq = 0.0

           chi2(  3) =   38.54
         Prob > chi2 =  0.0000
The quadratic in pcnv means that the estimated effect of pcnv turns negative
for pcnv greater than about .13, so that there is an estimated deterrent
effect over most of the range of pcnv.
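The turning point can be computed directly from the estimates:

. di .2167615/(2*.8570512)     // approximately .126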
For the linear probability model, the log likelihood l_i(B̂) is well defined
only if 0 < x_iB̂ < 1 for all i = 1,...,N, which may fail in practice.
Because the estimators are consistent for the unknown parameters,
asymptotically the true density will produce the highest average log
likelihood.  So, just as we can use an R-squared to choose among different
functional forms for E(y|x), we can use values of the log likelihood to
choose among different models for P(y = 1|x) when y is binary.
that is, the joint density (conditional on x_i) is the product of the
marginal densities (each conditional on x_i).  Under the strict exogeneity
assumption,

    f(y_1,...,y_T|x_i) = Π_{t=1}^T [G(x_itB)]^{y_t}[1 − G(x_itB)]^{1−y_t}.

The estimated probabilities for the treatment and control groups, both before
and after the policy change, will be identical across models.
b. Let d2 be a binary indicator for the second time period, and let dB be an
indicator for the treatment group, with P(y = 1|x) following a probit model.
Once we have the probit estimates, the estimated effect evaluated at a
chosen x is

    θ̂ = Φ(δ̂0 + δ̂1 + xΓ̂) − Φ(δ̂0 + xΓ̂),

and its sample-average counterpart is

    θ̃ = N^{-1}Σ_{i=1}^N [Φ(δ̂0 + δ̂1 + x_iΓ̂) − Φ(δ̂0 + x_iΓ̂)].

Both are estimates of the difference, between groups B and A, of the change
in the response probability over time.

c. We would have to use the delta method to obtain a valid standard error
for either θ̂ or θ̃.
15.17. a. We obtain the joint density by the product rule, since we have
independence conditional on (x,c):

    f(y_1,...,y_G|x,c;γ_o) = f_1(y_1|x,c;γ_o)f_2(y_2|x,c;γ_o)···f_G(y_G|x,c;γ_o).

b. The log likelihood is obtained by integrating the joint density over the
distribution of c and taking the log.  As expected, this depends only on the
observed data, (x_i,y_i1,...,y_iG), and the unknown parameters.

15.19. To be added.
CHAPTER 16
16.1. a. As c → ∞, Φ{[log(c) − x_iB]/σ} → 1, so the censoring probability
1 − Φ{[log(c) − x_iB]/σ} → 0 as c → ∞.  This simply says that, the longer we
wait to censor, the less likely it is that we observe a censored observation.

b. For y < log(c), the density of y_i given x_i is the same as the density of
y_i*, namely (1/σ)φ[(y_i − x_iB)/σ], while P[y_i = log(c)|x_i] is the
censoring probability from part a.

c. Testing the exclusion of a subset B2 requires estimating the model with
all variables, and then with the K2 exclusion restrictions imposed.  The LR
statistic is LR = 2(L_ur − L_r).  Under H0, LR is distributed asymptotically
as χ²_{K2}.
16.3. a. P(y_i = a1|x_i) = P(y_i* ≤ a1|x_i) = P[(u_i/σ) ≤ (a1 − x_iB)/σ] =
Φ[(a1 − x_iB)/σ].  Similarly,

    P(y_i = a2|x_i) = P(y_i* ≥ a2|x_i) = P[(u_i/σ) ≥ (a2 − x_iB)/σ]
                    = 1 − Φ[(a2 − x_iB)/σ] = Φ[(x_iB − a2)/σ].

Taking the derivative of the cdf with respect to y gives the pdf of y_i
conditional on x_i for values of y strictly between a1 and a2:
(1/σ)φ[(y − x_iB)/σ].

b. Since y = y* when a1 < y* < a2, and y* = xB + u,

    E(y|x, a1 < y < a2) = xB + σ{φ[(a1 − xB)/σ] − φ[(a2 − xB)/σ]}/{Φ[(a2 − xB)/σ] − Φ[(a1 − xB)/σ]}.

Combining the three pieces gives

    E(y|x) = a1Φ[(a1 − xB)/σ] + (xB)·{Φ[(a2 − xB)/σ] − Φ[(a1 − xB)/σ]}
             + σ{φ[(a1 − xB)/σ] − φ[(a2 − xB)/σ]} + a2Φ[(xB − a2)/σ].     (16.57)
The linear regression of y_i on x_i using only those y_i such that
a1 < y_i < a2 does not consistently estimate B, because E(y*|x, a1 < y* < a2)
is not linear in x in the subpopulation.  The log likelihood l_i(θ) follows
from the three pieces in part a; note how the indicator function selects out
the appropriate density for each of the three possible cases.  Estimated
partial effects are obtained by plugging B̂ and σ̂ into (16.57) and evaluating
at interesting values of x.
f. We can show this by brute-force differentiation of equation (16.57).  As
a shorthand, write Φ1 ≡ Φ[(a1 − xB)/σ] and Φ2 ≡ Φ[(a2 − xB)/σ].  Then the
terms involving the pdfs cancel, and

    ∂E(y|x)/∂x_j = b_j·(Φ2 − Φ1).

The scale factor is simply the probability that a standard normal random
variable falls in the interval [(a1 − xB)/σ, (a2 − xB)/σ], which is
necessarily between zero and one.

g. The partial effects on E(y|x) are given in part f.  These are estimated as

    {Φ[(a2 − x̄B̂)/σ̂] − Φ[(a1 − x̄B̂)/σ̂]}·b̂_j,                       (16.58)

where the estimates are the MLEs, evaluated at, say, x̄.  Or, we could average
{Φ[(a2 − x_iB̂)/σ̂] − Φ[(a1 − x_iB̂)/σ̂]} across i.  Generally, if ĝ_j denotes
the corresponding OLS coefficient, we expect ĝ_j ≈ ρ̂·b̂_j, where 0 < ρ̂ < 1 is
the scale factor.  It is in this sense that σ̂ is "ancillary."

h. For data censoring where the censoring points might change with i, the
analysis is essentially the same but a1 and a2 are replaced with a_i1 and
a_i2.
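A minimal Stata sketch of the scale factor and the partial effect in (16.58)
after a two-limit tobit fit, with a1 = 0, a2 = 10, and the estimate of sigma
typed in by hand from the output (all names and values hypothetical):

tobit y x1 x2, ll(0) ul(10)
predict double xbh, xb
qui sum xbh
di normprob((10 - r(mean))/.55) - normprob((0 - r(mean))/.55)   // scale factor
di _b[x1]*(normprob((10 - r(mean))/.55) - normprob((0 - r(mean))/.55))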
16.5. a. The results from OLS estimation of the linear model are
. reg hrbens exper age educ tenure married male white nrtheast nrthcen south
> union

      Source |       SS       df       MS              Number of obs =     616
-------------+------------------------------           F( 11,   604) =   32.50
       Model |  101.132288    11  9.19384436           Prob > F      =  0.0000
    Residual |  170.839786   604  .282847328           R-squared     =  0.3718
-------------+------------------------------           Adj R-squared =  0.3604
       Total |  271.972074   615  .442231015           Root MSE      =  .53183

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0029862   .0043435      0.688   0.492     -.005544    .0115164
         age |  -.0022495   .0041162     -0.547   0.585    -.0103333    .0058343
        educ |    .082204   .0083783      9.812   0.000     .0657498    .0986582
      tenure |   .0281931   .0035481      7.946   0.000      .021225    .0351612
     married |   .0899016   .0510187      1.762   0.079     -.010294    .1900971
        male |    .251898   .0523598      4.811   0.000     .1490686    .3547274
       white |    .098923   .0746602      1.325   0.186    -.0477021    .2455481
    nrtheast |  -.0834306   .0737578     -1.131   0.258    -.2282836    .0614223
     nrthcen |  -.0492621   .0678666     -0.726   0.468    -.1825451     .084021
       south |  -.0284978   .0673714     -0.423   0.672    -.1608084    .1038129
       union |   .3768401   .0499022      7.552   0.000     .2788372    .4748429
       _cons |  -.6999244   .1772515     -3.949   0.000    -1.048028   -.3518203
------------------------------------------------------------------------------
b. The Tobit estimates are

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south
> union, ll(0)

Tobit Estimates                                   Number of obs   =       616
                                                  chi2(11)        =    283.86
                                                  Prob > chi2     =    0.0000
                                                  Pseudo R2       =    0.2145

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0040631   .0046627      0.871   0.384    -.0050939    .0132201
         age |  -.0025859   .0044362     -0.583   0.560    -.0112981    .0061263
        educ |   .0869168   .0088168      9.858   0.000     .0696015    .1042321
      tenure |   .0287099   .0037237      7.710   0.000      .021397    .0360227
     married |   .1027574   .0538339      1.909   0.057    -.0029666    .2084814
        male |   .2556765   .0551672      4.635   0.000     .1473341     .364019
       white |   .0994408    .078604      1.265   0.206     -.054929    .2538105
    nrtheast |  -.0778461   .0775035     -1.004   0.316    -.2300547    .0743625
     nrthcen |  -.0489422   .0713965     -0.685   0.493    -.1891572    .0912729
       south |  -.0246854   .0709243     -0.348   0.728    -.1639731    .1146022
       union |   .4033519   .0522697      7.717   0.000     .3006999    .5060039
       _cons |  -.8137158   .1880725     -4.327   0.000     -1.18307   -.4443616
-------------+----------------------------------------------------------------
         _se |   .5551027   .0165773      (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:         41  left-censored observations at hrbens<=0
                     575     uncensored observations
The Tobit and OLS estimates are similar because only 41 of 616 observations,
or about 6.7% of the sample, have hrbens = 0.  As expected, the Tobit
estimates are all slightly larger in magnitude; this reflects that the scale
factor Φ(x_iB̂/σ̂) is always less than unity.
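A sketch of the scale-factor calculation for this application, using σ̂ =
.5551027 from the output above (the new variable name is hypothetical):

. predict double xbhat, xb
. qui sum xbhat
. di normprob(r(mean)/.5551027)   // fraction of the Tobit coefficient that
.                                 // survives in the partial effect at the mean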
c. Here is what happens when the squared terms in exper and tenure are
included:
. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south
> union expersq tenuresq, ll(0)

Tobit Estimates                                   Number of obs   =       616
                                                  chi2(13)        =    315.95
                                                  Prob > chi2     =    0.0000
                                                  Pseudo R2       =    0.2388

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0306652   .0085253      3.597   0.000     .0139224     .047408
         age |  -.0040294   .0043428     -0.928   0.354    -.0125583    .0044995
        educ |   .0802587   .0086957      9.230   0.000     .0631812    .0973362
      tenure |   .0581357   .0104947      5.540   0.000      .037525    .0787463
     married |   .0714831   .0528969      1.351   0.177    -.0324014    .1753675
        male |   .2562597   .0539178      4.753   0.000     .1503703    .3621491
       white |   .0906783   .0768576      1.180   0.239    -.0602628    .2416193
    nrtheast |  -.0480194   .0760238     -0.632   0.528     -.197323    .1012841
     nrthcen |   -.033717   .0698213     -0.483   0.629    -.1708394    .1034053
       south |   -.017479   .0693418     -0.252   0.801    -.1536597    .1187017
       union |   .3874497    .051105      7.581   0.000     .2870843    .4878151
     expersq |  -.0005524   .0001487     -3.715   0.000    -.0008445   -.0002604
    tenuresq |  -.0013291   .0004098     -3.243   0.001     -.002134   -.0005242
       _cons |  -.9436572   .1853532     -5.091   0.000    -1.307673   -.5796409
-------------+----------------------------------------------------------------
         _se |   .5418171   .0161572      (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:         41  left-censored observations at hrbens<=0
                     575     uncensored observations
Both squared terms are very significant, so they should be included in the
model.
d. There are nine industries, and we use ind1 as the base industry:

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south
> union expersq tenuresq ind2-ind9, ll(0)

Tobit Estimates                                   Number of obs   =       616
                                                  chi2(21)        =    388.99
                                                  Prob > chi2     =    0.0000
                                                  Pseudo R2       =    0.2940

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0267869   .0081297      3.295   0.001     .0108205    .0427534
         age |  -.0034182   .0041306     -0.828   0.408    -.0115306    .0046942
        educ |   .0789402   .0088598      8.910   0.000       .06154    .0963403
      tenure |    .053115   .0099413      5.343   0.000     .0335907    .0726393
     married |   .0547462   .0501776      1.091   0.276    -.0438005    .1532928
        male |   .2411059   .0556864      4.330   0.000     .1317401    .3504717
       white |   .1188029   .0735678      1.615   0.107    -.0256812    .2632871
    nrtheast |  -.1016799   .0721422     -1.409   0.159    -.2433643    .0400045
     nrthcen |  -.0724782   .0667174     -1.086   0.278    -.2035085    .0585521
       south |  -.0379854   .0655859     -0.579   0.563    -.1667934    .0908226
       union |   .3143174   .0506381      6.207   0.000     .2148662    .4137686
     expersq |  -.0004405   .0001417     -3.109   0.002    -.0007188   -.0001623
    tenuresq |  -.0013026   .0003863     -3.372   0.000    -.0020613    -.000544
        ind2 |  -.3731778   .3742017     -0.997   0.319    -1.108095    .3617389
        ind3 |  -.0963657    .368639     -0.261   0.794    -.8203574    .6276261
        ind4 |  -.2351539   .3716415     -0.633   0.527    -.9650425    .4947348
        ind5 |   .0209362    .373072      0.056   0.955    -.7117618    .7536342
        ind6 |  -.5083107   .3682535     -1.380   0.168    -1.231545     .214924
        ind7 |   .0033643   .3739442      0.009   0.993    -.7310468    .7377754
        ind8 |  -.6107854    .376006     -1.624   0.105    -1.349246     .127675
        ind9 |  -.3257878   .3669437     -0.888   0.375     -1.04645    .3948746
       _cons |  -.5750527   .4137824     -1.390   0.165    -1.387704    .2375989
-------------+----------------------------------------------------------------
         _se |   .5099298   .0151907      (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:
. test ind2 ind3 ind4 ind5 ind6 ind7 ind8 ind9

 ( 1)  ind2 = 0.0
 ( 2)  ind3 = 0.0
 ( 3)  ind4 = 0.0
 ( 4)  ind5 = 0.0
 ( 5)  ind6 = 0.0
 ( 6)  ind7 = 0.0
 ( 7)  ind8 = 0.0
 ( 8)  ind9 = 0.0

       F(  8,   595) =    9.66
            Prob > F =  0.0000

The likelihood ratio statistic leads to the same conclusion.  Some of the
industry dummies are economically significant, with a worker in, say,
industry eight earning about 61 cents less per hour in benefits than a
comparable worker in industry one.
16.7. a. This follows because the densities conditional on y > 0 are
identical for the Tobit model and Cragg's model.

b. For either model, the density of y given x and y > 0 is
f(·|x)/[1 − F(0|x)], where F(·|x) is the cdf of y given x.  Plugging in the
normal pdf and cdf with mean xB and variance σ² gives

    f(y|x, y > 0) = {Φ(xB/σ)}^{-1}{φ[(y − xB)/σ]/σ}

for the Tobit model.

c. This follows very generally, not just for Cragg's model or the Tobit
model, from (16.8):

    log[E(y|x)] = log[P(y > 0|x)] + log[E(y|x, y > 0)].

If we take the partial derivative with respect to log(x1) we clearly get the
sum of the elasticities.
would not have to worry about 100 percent of income being invested in a
pension plan).

c. From Problem 16.3(b), with a1 = 0, we have

    E(y|x) = (xB)·{Φ[(a2 − xB)/σ] + Φ(xB/σ) − 1}
             + σ{φ(xB/σ) − φ[(a2 − xB)/σ]} + a2Φ[(xB − a2)/σ].     (16.59)

We would plug B̂ and σ̂ into (16.59) with a2 = 10, so the last term is
a2Φ[(xB̂ − 10)/σ̂].  That is why a careful partial-effects calculation matters
here.
We simply have

    c_i = ξ̄ + x̄_iΓ + a_i,  where  ξ̄ ≡ (T^{-1}Σ_{t=1}^T Π_t)Γ.

What if we standardize each x_it by its cross-sectional mean and variance at
time t?  Form

    ẑ_it ≡ Ω̂_t^{-1/2}(x_it − Π̂_t),  t = 1,2,...,T,

where the Π̂_t and Ω̂_t are consistent and √N-asymptotically normal estimators
of the cross-sectional means and variance matrices, and proceed with the
usual Tobit (or probit) unobserved effects analysis that includes the time
averages

    z̄̂_i = T^{-1}Σ_{t=1}^T ẑ_it.

This is a rather simple two-step estimation problem; because the Π̂_t and Ω̂_t
converge quickly, in practice one might ignore the sampling error in the
first-stage estimates.
16.15. To be added.
CHAPTER 17
17.1. If you are interested in the effects of things like age of the building
and neighborhood demographics on fire damage, given that a fire has occurred,
then there is no problem.  A sample selection problem would arise only if we
used such a sample to estimate, say, a model of the probability that
buildings catch fire, given building and neighborhood characteristics.

17.3. The density of y_i given x_i, truncated to a1(x_i) < y < a2(x_i), is

    f(y|x_i;B,γ) / [F(a2(x_i)|x_i;B,γ) − F(a1(x_i)|x_i;B,γ)],  a1(x_i) < y < a2(x_i).

In the Hausman and Wise (1977) study, y_i = log(income_i), a1(x_i) = −∞, and
a2(x_i) was a function of family size (which determines the official poverty
level).
17.5. If we replace y2 with ŷ2, we need to see what happens when
y2 = zδ2 + v2 is plugged into the structural model:

    y1 = z1δ1 + a1·(zδ2 + v2) + u1
       = z1δ1 + a1·(zδ2) + (u1 + a1v2).                            (17.81)

So, the procedure is to replace δ2 in (17.81) with its first-stage estimate;
the composite error is u1 + a1v2.  If (u1 + a1v2, v3) is jointly normal and
independent of z, then we can write

    E(y1|z,v3) = z1δ1 + a1·(zδ2) + g1v3,

where E[(u1 + a1v2)|z,v3] = g1v3.  Conditioning on y3 = 1 gives

    E(y1|z, y3 = 1) = z1δ1 + a1·(zδ2) + g1λ(zδ3).                  (17.82)
17.7. a. Substitute the reduced forms for y1 and y2 into the third equation:

    y3 = max[0, a1(zδ1) + a2(zδ2) + u3 + a1v1 + a2v2]
       ≡ max(0, zπ3 + v3),

where v3 ≡ u3 + a1v1 + a2v2.  Thus, if we knew δ1 and δ2, we could
consistently estimate the parameters by Tobit.  From the reduced forms we can
get consistent estimators of δ1 and δ2 using the entire sample.  Estimation
of δ1 and δ2 is simple: apply OLS, equation by equation, to the system

    y1 = zδ1 + v1,  y2 = zδ2 + v2,                                 (17.83)
    y3 = max(0, zπ3 + v3).                                         (17.84)

b. Then, obtain δ̂1 and δ̂2, and form z_iδ̂1 and z_iδ̂2 for each observation i
in the sample.  Estimate a Tobit of y_i3 on (z_iδ̂1), (z_iδ̂2), and z_i3, the
exogenous variables that also appear in the third equation.  Obtaining the
correct asymptotic variance matrix is complicated, because it must account
for the first-stage estimation of δ1 and δ2 along with σ²_3.
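A minimal sketch of this two-step procedure in Stata (z stands for the full
list of exogenous variables and z3 for those in the third equation; all names
are hypothetical, and the reported standard errors in the last step ignore
the first-stage estimation error):

reg y1 z
predict double zd1h, xb      // z*delta1-hat
reg y2 z
predict double zd2h, xb      // z*delta2-hat
tobit y3 zd1h zd2h z3, ll(0)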
17.9. To be added.
We only need to estimate E(y|x, y > 0) = exp(xB), which NLS on the y > 0
subsample delivers; if we have a random sample, the estimator is consistent
and √N-asymptotically normal.

c. We would use a standard probit model: w given x follows a probit with
P(w = 1|x) = Φ(xG).

d. E(y|x) = P(y > 0|x)·E(y|x, y > 0) = Φ(xG)·exp(xB).  So we would plug in
the NLS estimator of B and the probit estimator of G.
Confusion arises, I think, when two-part models are specified with

    y = w·exp(xB + u),
    w = 1[xG + v > 0],

so that w = 0 implies y = 0.  Then, if u and v are correlated and we assume a
normal distribution for v, we have the usual inverse Mills ratio added to the
linear model:

    E[log(y)|x, w = 1] = xB + ρλ(xG).

A two-step strategy for estimating B and ρ is pretty clear.  First, estimate
a probit of w_i on x_i to get Ĝ and λ(x_iĜ); then regress log(y_i) on x_i and
λ(x_iĜ) over the w_i = 1 sample to obtain B̂ and ρ̂.  A standard t test on ρ̂
tests the null of no selection.  This two-step procedure reveals a potential
problem with the model that allows u and v to be correlated: identification
comes entirely from the nonlinearity of the IMR, which we warned about in
this chapter.  Ideally, we would have a variable that affects P(w = 1|x) that
can be excluded from the conditional mean.  For example, to allow for fixed
costs of entering the labor market, one would try to find a variable that
affects the fixed costs of being employed that does not affect the choice of
hours.
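A minimal sketch of the two-step estimator just described (y > 0 exactly when
w = 1; names hypothetical):

probit w x
predict double xg, xb
gen double imr = normden(xg)/normprob(xg)   // inverse Mills ratio
gen double logy = log(y) if w == 1
reg logy x imr if w == 1                    // t test on imr tests rho = 0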
If we assume (u,v) is multivariate normal, with mean zero, then we can use a
full maximum likelihood procedure.  We can also compute the quantities of
interest directly.  For one,

    E(y|x, y > 0) = exp(xB)·E[exp(u)|x, w = 1],

where E[exp(u)|x, w = 1] can be obtained under joint normality.  A similar
expression is available for E(y|x), which again involves Φ(xG).
17.13. a. We cannot use censored Tobit, because that requires observing x
whatever the value of y.

b. If we use the derived price, we need sufficient price variation for the
population that consumes some of the good.
CHAPTER 18
Therefore, by (18.5),

18.3. The following Stata session estimates a using the three different
regression approaches.
. probit train re74 re75 age agesq nodegree married black hisp

Iteration 0:   log likelihood =     -302.1
Iteration 1:   log likelihood = -294.07642
Iteration 2:   log likelihood = -294.06748
Iteration 3:   log likelihood = -294.06748

Probit estimates                                  Number of obs   =       445
                                                  LR chi2(8)      =     16.07
                                                  Prob > chi2     =    0.0415
Log likelihood = -294.06748                       Pseudo R2       =    0.0266

------------------------------------------------------------------------------
       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0189577   .0159392    -1.19   0.234    -.0501979    .0122825
        re75 |   .0371871   .0271086     1.37   0.170    -.0159447     .090319
         age |  -.0005467   .0534045    -0.01   0.992    -.1052176    .1041242
       agesq |   .0000719   .0008734     0.08   0.934    -.0016399    .0017837
    nodegree |    -.44195   .1515457    -2.92   0.004    -.7389742   -.1449258
     married |    .091519   .1726192     0.53   0.596    -.2468083    .4298464
       black |  -.1446253   .2271609    -0.64   0.524    -.5898524    .3006019
        hisp |  -.5004545   .3079227    -1.63   0.104    -1.103972    .1030629
       _cons |   .2284561   .8154273     0.28   0.779    -1.369752    1.826664
------------------------------------------------------------------------------
. predict phat
(option p assumed; Pr(train))

. sum phat

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        phat |     445    .4155321   .0934459   .1638736   .6738951
. gen traphat0 = train*(phat - .416)

. reg unem78 train phat

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  2,   442) =    3.13
       Model |   1.3226496     2  .661324802           Prob > F      =  0.0449
    Residual |  93.4998223   442   .21153806           R-squared     =  0.0139
-------------+------------------------------           Adj R-squared =  0.0095
       Total |  94.8224719   444  .213564126           Root MSE      =  .45993

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   -.110242    .045039    -2.45   0.015    -.1987593   -.0217247
        phat |  -.0101531   .2378099    -0.04   0.966    -.4775317    .4572254
       _cons |   .3579151   .0994803     3.60   0.000     .1624018    .5534283
------------------------------------------------------------------------------
. reg unem78 train phat traphat0

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  3,   441) =    2.84
       Model |  1.79802041     3  .599340137           Prob > F      =  0.0375
    Residual |  93.0244515   441  .210939799           R-squared     =  0.0190
-------------+------------------------------           Adj R-squared =  0.0123
       Total |  94.8224719   444  .213564126           Root MSE      =  .45928

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.1066934   .0450374    -2.37   0.018     -.195208   -.0181789
        phat |   .3009852   .3151992     0.95   0.340    -.3184939    .9204644
    traphat0 |   -.719599   .4793509    -1.50   0.134    -1.661695     .222497
       _cons |    .233225    .129489     1.80   0.072    -.0212673    .4877173
------------------------------------------------------------------------------
. reg unem78 train re74 re75 age agesq nodegree married black hisp

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  9,   435) =    2.75
       Model |  5.09784844     9  .566427604           Prob > F      =  0.0040
    Residual |  89.7246235   435  .206263502           R-squared     =  0.0538
-------------+------------------------------           Adj R-squared =  0.0342
       Total |  94.8224719   444  .213564126           Root MSE      =  .45416

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.1105582   .0444832    -2.49   0.013    -.1979868   -.0231295
        re74 |  -.0025525   .0053889    -0.47   0.636    -.0131441    .0080391
        re75 |    .007121   .0094371     0.75   0.451    -.0114269     .025669
         age |   .0304127   .0189565     1.60   0.109    -.0068449    .0676704
       agesq |  -.0004949   .0003098    -1.60   0.111    -.0011038    .0001139
    nodegree |   .0421444   .0550176     0.77   0.444    -.0659889    .1502777
     married |  -.0296401   .0620734    -0.48   0.633    -.1516412    .0923609
       black |    .180637   .0815002     2.22   0.027     .0204538    .3408202
        hisp |  -.0392887   .1078464    -0.36   0.716    -.2512535    .1726761
       _cons |  -.2342579   .2905718    -0.81   0.421    -.8053572    .3368413
------------------------------------------------------------------------------
Training status was randomly assigned, so we are not surprised that the
different methods lead to roughly the same estimate.
. probit train re74 re75 age agesq nodegree married black hisp

Iteration 0:   log likelihood =     -302.1
Iteration 1:   log likelihood = -294.07642
Iteration 2:   log likelihood = -294.06748
Iteration 3:   log likelihood = -294.06748

Probit estimates                                  Number of obs   =       445
                                                  LR chi2(8)      =     16.07
                                                  Prob > chi2     =    0.0415
Log likelihood = -294.06748                       Pseudo R2       =    0.0266

------------------------------------------------------------------------------
       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0189577   .0159392    -1.19   0.234    -.0501979    .0122825
        re75 |   .0371871   .0271086     1.37   0.170    -.0159447     .090319
         age |  -.0005467   .0534045    -0.01   0.992    -.1052176    .1041242
       agesq |   .0000719   .0008734     0.08   0.934    -.0016399    .0017837
    nodegree |    -.44195   .1515457    -2.92   0.004    -.7389742   -.1449258
     married |    .091519   .1726192     0.53   0.596    -.2468083    .4298464
       black |  -.1446253   .2271609    -0.64   0.524    -.5898524    .3006019
        hisp |  -.5004545   .3079227    -1.63   0.104    -1.103972    .1030629
       _cons |   .2284561   .8154273     0.28   0.779    -1.369752    1.826664
------------------------------------------------------------------------------
. predict phat
(option p assumed; Pr(train))

. reg re78 train re74 re75 age agesq nodegree married black hisp (phat re74
> re75 age agesq nodegree married black hisp)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  9,   435) =    1.75
       Model |  703.776258     9   78.197362           Prob > F      =  0.0763
    Residual |  18821.8804   435  43.2686905           R-squared     =  0.0360
-------------+------------------------------           Adj R-squared =  0.0161
       Total |  19525.6566   444  43.9767041           Root MSE      =  6.5779

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   .0699177   18.00172     0.00   0.997    -35.31125    35.45109
        re74 |   .0624611   .1453799     0.43   0.668    -.2232733    .3481955
        re75 |   .0863775   .2814839     0.31   0.759    -.4668602    .6396151
         age |   .1998802   .2746971     0.73   0.467    -.3400184    .7397788
       agesq |  -.0024826   .0045238    -0.55   0.583    -.0113738    .0064086
    nodegree |  -1.367622   3.203039    -0.43   0.670    -7.662979    4.927734
     married |   -.050672   1.098774    -0.05   0.963    -2.210237    2.108893
       black |  -2.203087   1.554259    -1.42   0.157    -5.257878    .8517046
        hisp |  -.2953534   3.656719    -0.08   0.936    -7.482387     6.89168
       _cons |   4.613857   11.47144     0.40   0.688    -17.93248     27.1602
------------------------------------------------------------------------------
. reg phat re74 re75 age agesq nodegree married black hisp

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  8,   436) =69767.44
       Model |  3.87404126     8  .484255158           Prob > F      =  0.0000
    Residual |  .003026272   436  6.9410e-06           R-squared     =  0.9992
-------------+------------------------------           Adj R-squared =  0.9992
       Total |  3.87706754   444  .008732134           Root MSE      =  .00263

------------------------------------------------------------------------------
        phat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0069301   .0000312  -222.04   0.000    -.0069914   -.0068687
        re75 |   .0139209   .0000546   254.82   0.000     .0138135    .0140283
         age |  -.0003207     .00011    -2.92   0.004    -.0005368   -.0001046
       agesq |   .0000293   1.80e-06    16.31   0.000     .0000258    .0000328
    nodegree |  -.1726018    .000316  -546.14   0.000    -.1732229   -.1719806
     married |   .0352802     .00036    98.01   0.000     .0345727    .0359877
       black |  -.0562315   .0004726  -118.99   0.000    -.0571603   -.0553027
        hisp |  -.1838453   .0006238  -294.71   0.000    -.1850713   -.1826192
       _cons |   .5907578   .0016786   351.93   0.000     .5874586     .594057
------------------------------------------------------------------------------
b. The IV estimate of a is very small, .070, much smaller than when we used
either linear regression or the propensity score in a regression in Example
18.2.  (When we do not instrument for train, â = 1.625, se = .640.)  The very
large standard error (18.00) suggests severe collinearity among the
instruments.

c. The collinearity suspected in part b is confirmed by regressing Φ̂_i on
the x_i: the R-squared is .9992, so there is virtually no variation in Φ̂_i
that cannot be explained by x_i.

d. This example illustrates why trying to achieve identification off of a
nonlinearity can be fraught with problems.
18.7. To be added.
Write

    y = h0 + xG + bw + w·(x − ψ)D + u + w·v + e,

and, again, we will replace w·v with its expectation given (x,z) and an
error.  But

    E(w·v|x,z) = E[E(w·v|x,z,v)|x,z] = E[E(w|x,z,v)·v|x,z]
               = E[exp(p0 + xP1 + zP2 + p3v)·v|x,z] = ξ·exp(p0 + xP1 + zP2),

where ξ ≡ E[exp(p3v)·v], and we have used independence of v and (x,z).  Now,
define r = u + [w·v − E(w·v|x,z)] + e, so that

    y = h0 + xG + bw + w·(x − ψ)D + ξ·E(w|x,z) + r,  E(r|x,z) = 0.

Alternatively, assume we can write w = exp(p0 + xP1 + zP2 + g), where
E(u|g,x,z) = ρ·g and E(v|g,x,z) = q·g.  Conditioning on (g,x,z):

    E(y|g,x,z) = h0 + xG + bw + w·(x − ψ)D + E(u|g,x,z) + w·E(v|g,x,z) + E(e|g,x,z)
               = h0 + xG + bw + w·(x − ψ)D + ρ·g + q·w·g,

where we have used the fact that w is a function of (g,x,z) and
E(e|g,x,z) = 0.  We can estimate p0, P1, and P2 from the OLS regression of
log(w_i) on 1, x_i, z_i, save the residuals ĝ_i, and then run the regression

    y_i on 1, x_i, w_i, w_i(x_i − x̄), ĝ_i, w_iĝ_i,  i = 1,...,N.

A joint test on the last two terms effectively tests the null hypothesis that
w is exogenous.
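A sketch of the variable-addition test described above (lw = log(w); names
hypothetical):

reg lw x z                   // estimates p0, P1, P2
predict double ghat, resid   // g-hat
qui sum x
gen double xdm = x - r(mean)
gen double wxdm = w*xdm
gen double wg = w*ghat
reg y x w wxdm ghat wg
test ghat wg                 // joint test of exogeneity of w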
CHAPTER 19

Write q(m) for the objective function as a function of the scalar m.  The
first order condition is uniquely solved by m = m_o, and the second
derivative of q is negative, so q(m) is uniquely maximized at m_o.

The following Stata session estimates the cigarette demand model, first by
OLS with the usual standard errors:
. reg cigs lcigpric lincome restaurn white educ age agesq

                                                  Number of obs   =       807
                                                  F(  7,    799) =      6.38
                                                  Prob > F       =    0.0000
                                                  R-squared      =    0.0529
                                                  Adj R-squared  =    0.0446
                                                  Root MSE       =    13.412

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   5.782321    -0.15   0.883    -12.20124    10.49943
     lincome |   .8690144   .7287636     1.19   0.233     -.561503    2.299532
    restaurn |  -2.865621   1.117406    -2.56   0.011    -5.059019   -.6722235
       white |  -.5592363   1.459461    -0.38   0.702    -3.424067    2.305594
        educ |  -.5017533   .1671677    -3.00   0.003     -.829893   -.1736136
         age |   .7745021   .1605158     4.83   0.000     .4594197    1.089585
       agesq |  -.0090686   .0017481    -5.19   0.000    -.0124999   -.0056373
       _cons |  -2.682435   24.22073    -0.11   0.912    -50.22621    44.86134
------------------------------------------------------------------------------
. test lcigpric lincome

 ( 1)  lcigpric = 0.0
 ( 2)  lincome = 0.0

       F(  2,   799) =    0.71
            Prob > F =  0.4899
. reg cigs lcigpric lincome restaurn white educ age agesq, robust

Regression with robust standard errors                 Number of obs =     807
                                                       F(  7,   799) =    9.38
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0529
                                                       Root MSE      =  13.412

------------------------------------------------------------------------------
             |             Robust
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   6.054396    -0.14   0.888     -12.7353     11.0335
     lincome |   .8690144    .597972     1.45   0.147    -.3047671    2.042796
    restaurn |  -2.865621   1.017275    -2.82   0.005    -4.862469   -.8687741
       white |  -.5592363   1.378283    -0.41   0.685     -3.26472    2.146247
        educ |  -.5017533   .1624097    -3.09   0.002    -.8205533   -.1829532
         age |   .7745021   .1380317     5.61   0.000     .5035545     1.04545
       agesq |  -.0090686   .0014589    -6.22   0.000    -.0119324   -.0062048
       _cons |  -2.682435   25.90194    -0.10   0.918    -53.52632    48.16145
------------------------------------------------------------------------------
. test lcigpric lincome

 ( 1)  lcigpric = 0.0
 ( 2)  lincome = 0.0

       F(  2,   799) =    1.07
            Prob > F =  0.3441
. poisson cigs lcigpric lincome restaurn white educ age agesq

Poisson regression                                Number of obs   =       807
                                                  LR chi2(7)      =   1068.70
                                                  Prob > chi2     =    0.0000
Log likelihood = -8111.519                        Pseudo R2       =    0.0618

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .1433932    -0.74   0.460    -.3870061    .1750847
     lincome |   .1037275   .0202811     5.11   0.000     .0639772    .1434779
    restaurn |  -.3636059   .0312231   -11.65   0.000    -.4248021   -.3024098
       white |  -.0552012   .0374207    -1.48   0.140    -.1285444    .0181421
        educ |  -.0594225   .0042564   -13.96   0.000    -.0677648   -.0510802
         age |   .1142571   .0049694    22.99   0.000     .1045172    .1239969
       agesq |  -.0013708    .000057   -24.07   0.000    -.0014825   -.0012592
       _cons |   .3964494   .6139626     0.65   0.518    -.8068952    1.599794
------------------------------------------------------------------------------
. glm cigs lcigpric lincome restaurn white educ age agesq, family(poisson)
> sca(x2)

Iteration 0:   log likelihood = -8380.1083
Iteration 1:   log likelihood = -8111.6454
Iteration 2:   log likelihood =  -8111.519
Iteration 3:   log likelihood =  -8111.519

Generalized linear models                          No. of obs      =      807
Optimization     : ML: Newton-Raphson              Residual df     =      799
                                                   Scale param     =        1
Deviance         =  14752.46933                    (1/df) Deviance = 18.46367
Pearson          =  16232.70987                    (1/df) Pearson  = 20.31628

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

Log likelihood   = -8111.519022                    AIC             = 20.12272
BIC              =  14698.92274

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .6463244    -0.16   0.870    -1.372733    1.160812
     lincome |   .1037275   .0914144     1.13   0.257    -.0754414    .2828965
    restaurn |  -.3636059   .1407338    -2.58   0.010    -.6394391   -.0877728
       white |  -.0552011   .1686685    -0.33   0.743    -.3857854    .2753831
        educ |  -.0594225   .0191849    -3.10   0.002    -.0970243   -.0218208
         age |   .1142571   .0223989     5.10   0.000     .0703561     .158158
       agesq |  -.0013708   .0002567    -5.34   0.000     -.001874   -.0008677
       _cons |   .3964493    2.76735     0.14   0.886    -5.027457    5.820355
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)

. * The estimate of sigma is
. di sqrt(20.32)
4.5077711
. poisson cigs restaurn white educ age agesq

Poisson regression                                Number of obs   =       807
                                                  LR chi2(5)      =   1041.16
                                                  Prob > chi2     =    0.0000
Log likelihood = -8125.291                        Pseudo R2       =    0.0602

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    restaurn |  -.3545336   .0308796   -11.48   0.000    -.4150564   -.2940107
       white |  -.0618025    .037371    -1.65   0.098    -.1350483    .0114433
        educ |  -.0532166   .0040652   -13.09   0.000    -.0611842   -.0452489
         age |   .1211174   .0048175    25.14   0.000     .1116754    .1305594
       agesq |  -.0014458   .0000553   -26.14   0.000    -.0015543   -.0013374
       _cons |   .7617484   .1095991     6.95   0.000     .5469381    .9765587
------------------------------------------------------------------------------

. di 2*(8125.291 - 8111.519)
27.544

. * This is the usual LR statistic.  The QLR statistic is obtained by
. * dividing by 20.32:
. di 2*(8125.291 - 8111.519)/(20.32)
1.3555118
. glm cigs lcigpric lincome restaurn white educ age agesq, family(poisson)
> robust

Iteration 0:   log likelihood = -8380.1083
Iteration 1:   log likelihood = -8111.6454
Iteration 2:   log likelihood =  -8111.519
Iteration 3:   log likelihood =  -8111.519

Generalized linear models                          No. of obs      =      807
Optimization     : ML: Newton-Raphson              Residual df     =      799
                                                   Scale param     =        1
Deviance         =  14752.46933                    (1/df) Deviance = 18.46367
Pearson          =  16232.70987                    (1/df) Pearson  = 20.31628

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

Log likelihood   = -8111.519022                    AIC             = 20.12272
BIC              =  14698.92274

------------------------------------------------------------------------------
             |             Robust
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .6681827    -0.16   0.874    -1.415575    1.203653
     lincome |   .1037275    .083299     1.25   0.213    -.0595355    .2669906
    restaurn |  -.3636059    .140366    -2.59   0.010    -.6387182   -.0884937
       white |  -.0552011   .1632959    -0.34   0.735    -.3752553     .264853
        educ |  -.0594225   .0192058    -3.09   0.002    -.0970653   -.0217798
         age |   .1142571   .0212322     5.38   0.000     .0726427    .1558715
       agesq |  -.0013708   .0002446    -5.60   0.000    -.0018503   -.0008914
       _cons |   .3964493    2.97704     0.13   0.894    -5.438442     6.23134
------------------------------------------------------------------------------
. di .1143/(2*.00137)
41.715328

a. Neither the price nor income variable is significant at any reasonable
significance level, although the coefficient estimates are the expected sign.
It does not matter whether we use the usual or robust standard errors.  The
two variables are jointly insignificant, too, using the usual and
heteroskedasticity-robust tests (p-values = .490, .344, respectively).

b. While the price variable is still very insignificant (p-value = .46), the
income variable, based on the usual Poisson standard errors, is very
significant: t = 5.11.  Note that cigpric and restaurn vary only at the state
level, and, not surprisingly, they are significantly correlated.
c. The GLM estimate of sigma is σ̂ = 4.51.  The t statistic on lcigpric is
now very small (-.16), and that on lincome falls to 1.13, much more in line
with the linear model t statistic (1.19 with the usual standard errors).
Still, the restaurant restriction variable, education, and the age variables
are significant.  (Interestingly, there is no race effect, conditional on the
other covariates.)

d. The usual LR statistic is 2(8125.291 − 8111.519) = 27.54, which is a very
significant value in a χ²_2 distribution (p-value ≈ 0).  The QLR statistic
divides the usual LR statistic by σ̂² = 20.32, so QLR = 1.36 (p-value ≈ .51).
As expected, the QLR statistic shows that the variables are jointly
insignificant, while the LR statistic shows strong significance.

e. Using the robust standard errors does not significantly change any
conclusions; in fact, most explanatory variables become slightly more
significant than when we use the GLM standard errors.  In this example, it is
the adjustment by σ̂ > 1 that makes the most difference.  The estimated age
at which cigs is maximized is b̂_age/(2|b̂_agesq|) ≈ 41.72.
First,

    Var(y_it|x_i) = E[Var(y_it|x_i,c_i)|x_i] + Var[E(y_it|x_i,c_i)|x_i]
                  = E[c_i exp(x_itB)|x_i] + Var[c_i exp(x_itB)|x_i]
                  = exp(a + x_itB) + τ²[exp(x_itB)]²,

where exp(a) ≡ E(c_i) and τ² ≡ Var(c_i).  A similar calculation, given B,
gives the covariances needed for the estimate.  Define
u_it ≡ y_it − E(y_it|x_it).  Then, because

    τ² = E{[u_it/exp(x_itB)][u_ir/exp(x_irB)]},  all t ≠ r,

we could use cross products of scaled residuals to estimate τ².
The approach follows Section 19.6.3; see also Problem 12.11.  The matrix
W_i(δ) = W(x_i,δ) has diagonal elements

    exp(ã + x_itB̃) + τ̃²[exp(x_itB̃)]²,  t = 1,...,T,

and off-diagonal elements τ̃²exp(x_itB̃)exp(x_irB̃), t ≠ r.  Let ã, B̃ be the
solutions to

    min_{a,B} (1/2) Σ_{i=1}^N [y_i − m(x_i,a,B)]'[W_i(δ̃)]^{-1}[y_i − m(x_i,a,B)],

the MWNLS problem.  To build the LM statistic we need the score of the
conditional mean function, with respect to all parameters, evaluated under
H0: G = 0.  Let θ ≡ (a, B', G')', which is 1 + 2K dimensional.  (Usually,
x_it would contain year dummies or other aggregate effects, and these would
be excluded from the time averages.)  The score, evaluated at the restricted
estimates θ̃ ≡ (ã, B̃', 0')', is

    s_i(θ̃) = −∇_θM(x_i,θ̃)'[W_i(δ̃)]^{-1}ũ_i,

where ũ_i is the T × 1 vector of restricted residuals.  The estimated
expected Hessian is

    Ã = Σ_{i=1}^N ∇_θM(x_i,θ̃)'[W_i(δ̃)]^{-1}∇_θM(x_i,θ̃),

a (1 + 2K) × (1 + 2K) matrix, and

    LM = [Σ_{i=1}^N ∇_θM(x_i,θ̃)'[W_i(δ̃)]^{-1}ũ_i]' Ã^{-1} [Σ_{i=1}^N ∇_θM(x_i,θ̃)'[W_i(δ̃)]^{-1}ũ_i].

If only J < K restrictions are tested, LM ~a χ²_J under the null and the
stated variance assumptions.  A fully robust form is given in equation
(12.68), where s_i(θ̃) and Ã are as given above, and
B̃ = N^{-1}Σ_{i=1}^N s_i(θ̃)s_i(θ̃)'.
i=1
are written as
matrix is K
= 0, we take c(Q) =
G,
~
and so C = [0IK], where the zero
* (1 + K).
Gamma(d,d), then things are even easier  at least if we have software that
estimates random effects Poisson models.
$ r
aixi ~ Gamma(d,d).
In other words, the full set of random effects Poisson assumptions holds, but
where the mean function in the Poisson distribution is aiexp(a + xitB + xiG).


yt = 0,1,2,....
Multiplying these together gives the joint density of (yi1,...,yiT) given (xi
= x, ci = c).
t=1
(ni/ci) =
S m(xit,B).
t=1
B,
we have ci(B) =
_ S m(xit,B).
t=1
= [ni/Mi(B)]Mi(B) +
= ni + nilog(ni) +
T
S yit{log[ni/Mi(B)] + log[m(xit,B)]
t=1
T
S yit{log[m(xit,B)/Mi(B)]
t=1
t=1
i=1
li(ci,B)
with respect to
li[ci(B),B].
i=1
depend on
i=1
Therefore,
. reg atndrte ACT priGPA frosh soph

                                                Number of obs =     680
                                                F(  4,   675) =   72.92
                                                Prob > F      =  0.0000
                                                R-squared     =  0.3017
                                                Adj R-squared =  0.2976
                                                Root MSE      =  .14287

------------------------------------------------------------------------------
     atndrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.0169202    .001681   -10.07   0.000    -.0202207   -.0136196
      priGPA |   .1820163   .0112156    16.23   0.000     .1599947    .2040379
       frosh |   .0517097   .0173019     2.99   0.003     .0177377    .0856818
        soph |   .0110085    .014485     0.76   0.448    -.0174327    .0394496
       _cons |   .7087769   .0417257    16.99   0.000     .6268492    .7907046
------------------------------------------------------------------------------
. predict atndrteh
(option xb assumed; fitted values)

. sum atndrteh

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    atndrteh |     680    .8170956   .0936415   .4846666   1.086443

. count if atndrteh > 1
  12
. glm atndrte ACT priGPA frosh soph, family(binomial) sca(x2)
note: atndrte has non-integer values

Iteration 0:   log likelihood = -226.64509
Iteration 1:   log likelihood = -223.64983
Iteration 2:   log likelihood = -223.64937
Iteration 3:   log likelihood = -223.64937

Generalized linear models                          No. of obs      =      680
Optimization     : ML: Newton-Raphson              Residual df     =      675
                                                   Scale param     =        1
Deviance         =  285.7371358                    (1/df) Deviance = .4233143
Pearson          =  85.57283238                    (1/df) Pearson  = .1267746

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]

Log likelihood   = -223.6493665                    AIC             = .6724981
BIC              =  253.1266718

------------------------------------------------------------------------------
     atndrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.1113802   .0113217    -9.84   0.000    -.1335703   -.0891901
      priGPA |   1.244375   .0771321    16.13   0.000     1.093199    1.395552
       frosh |   .3899318    .113436     3.44   0.001     .1676013    .6122622
        soph |   .0928127   .0944066     0.98   0.326    -.0922209    .2778463
       _cons |   .7621699   .2859966     2.66   0.008      .201627    1.322713
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)
. di (.1268)^2
.01607824

. di exp(.7622 - .1114*30 + 1.244*3)/(1 + exp(.7622 - .1114*30 + 1.244*3))
.75991253

. di exp(.7622 - .1114*25 + 1.244*3)/(1 + exp(.7622 - .1114*25 + 1.244*3))
.84673249

. di .760 - .847
-.087

. predict atndh
(option mu assumed; predicted mean atndrte)

. sum atndh

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       atndh |     680    .8170956   .0965356   .3499525   .9697185

. corr atndrte atndh
(obs=680)

             |  atndrte    atndh
-------------+------------------
     atndrte |   1.0000
       atndh |   0.5725   1.0000

. di (.5725)^2
.32775625
a. The coefficient on ACT means that if the ACT score increases by 5 points
(more than a one standard deviation increase) then the attendance rate is
estimated to fall by about .017(5) = .085, or 8.5 percentage points.  The
coefficient on priGPA means that if prior GPA is one point higher, the
attendance rate is predicted to be about .182 higher, or 18.2 percentage
points.

b. Note that σ̂² ≈ (.1268)² ≈ .0161.  In other words, the usual MLE standard
errors, obtained, say, from the expected Hessian of the quasi-log likelihood,
are much too large.  (If you omit the "sca(x2)" option in the "glm" command,
you will get the usual MLE standard errors.)

c. Since the coefficient on ACT is negative, we know that an increase in ACT
score, holding year and prior GPA fixed, actually reduces predicted
attendance rate.  Going from ACT = 25 to ACT = 30 with priGPA = 3, the
estimated fall in atndrte is about .087, or about 8.7 percentage points.
This is very similar to that found using the linear model.

d. The R-squared for the linear model is about .302.  For the GLM, the
squared correlation between atndrte and its fitted values is about .328, so
the GLM fits slightly better by this metric.
19.11. To be added.
CHAPTER 20

20.1. To be added.
20.3. a. If all durations in the sample are censored, d_i = 0 for all i, and
so the log likelihood is

    −Σ_{i=1}^N exp(x_iB)c_i^a.

If x_iB contains only an intercept b, the log likelihood is
−exp(b)·Σ_{i=1}^N c_i^a, and Σ_{i=1}^N c_i^a > 0.  But then, for any a > 0, the
log likelihood is maximized by minimizing exp(b) across b.  But as b → −∞,
exp(b) → 0, so trying to maximize the likelihood will lead to b getting more
and more negative without bound.  So no maximum likelihood estimate exists.
20.5. a. For t > b − a_i,

    P(t_i ≤ t|x_i,a_i,c_i,s_i = 1) = P(t*_i ≤ t|x_i, t*_i > b − a_i)
        = P(t*_i ≤ t, t*_i > b − a_i|x_i)/P(t*_i > b − a_i|x_i)
        = [P(t*_i ≤ t|x_i) − P(t*_i ≤ b − a_i|x_i)]/P(t*_i > b − a_i|x_i);

differentiating with respect to t gives the conditional density.  If
D(a_i|c_i,x_i) = D(a_i|x_i), the density of (a_i,t_i) given (c_i,x_i) does not
depend on c_i.  This is also the conditional density of (a_i,t_i) given
(c_i,x_i) when t < c_i, that is, when the observation is uncensored.
20.9. To be added.