
This file and several accompanying files contain the solutions to the odd-numbered problems in the book Econometric Analysis of Cross Section and Panel Data, by Jeffrey M. Wooldridge, MIT Press, 2002. The empirical examples are solved using various versions of Stata, with some dating back to Stata 4.0. Partly out of laziness, but also because it is useful for students to see computer output, I have included Stata output in most cases rather than type tables. In some cases, I do more hand calculations than are needed in current versions of Stata.

Currently, there are some missing solutions. I will update the solutions occasionally to fill in the missing solutions, and to make corrections. For some problems I have given answers beyond what I originally asked. Please report any mistakes or discrepancies you might come across by sending me e-mail at wooldri1@msu.edu.

CHAPTER 2

2.1. a. $\partial E(y|x_1,x_2)/\partial x_1 = b_1 + b_4 x_2$ and $\partial E(y|x_1,x_2)/\partial x_2 = b_2 + 2b_3 x_2 + b_4 x_1$.

b. By definition, $E(u|x_1,x_2) = 0$. Because $x_2^2$ and $x_1x_2$ are just functions of $(x_1,x_2)$, it does not matter whether we also condition on them: $E(u|x_1,x_2,x_2^2,x_1x_2) = 0$.

and x2:

and x2:
$E(u|x_1,x_2) = 0$. We can say nothing further about u.

b. $\partial E(y|x_1,x_2)/\partial x_1 = b_1 + b_3 x_2$. Because $E(x_2) = 0$, $b_1 = E[\partial E(y|x_1,x_2)/\partial x_1]$. Similarly, $b_2 = E[\partial E(y|x_1,x_2)/\partial x_2]$.

= 0.

projections (Property LP.5 in Appendix 2A):

L(y|1,x1,x2) = L(b0 + b1x1 + b2x2 + b3x1x2|1,x1,x2)
             = b0 + b1x1 + b2x2 + b3L(x1x2|1,x1,x2)
             = b0 + b1x1 + b2x2.
d. Equation (2.47) is more useful because it allows us to compute the partial effects of x1 and x2 at any values of x1 and x2. Under the assumptions we have made, the linear projection in (2.48) does have as its slope coefficients on x1 and x2 the partial effects at the population average values of x1 and x2 -- zero in both cases -- but it does not allow us to obtain the partial effects at any other values of x1 and x2. Incidentally, the main conclusions of this problem go through if we allow x1 and x2 to have any population means.

By assumption, these are constant and necessarily equal to $s_1^2 \equiv \mathrm{Var}(u_1)$ and $s_2^2 \equiv \mathrm{Var}(u_2)$, respectively. But then Property CV.4 implies that $s_2^2 \geq s_1^2$. This simple conclusion means that, when error variances are constant, the error variance falls as more explanatory variables are conditioned on.

2.7. Write the equation in error form as

y = g(x) + zB + u, E(u|x,z) = 0.

Take the expected value of this equation conditional only on x:

E(y|x) = g(x) + [E(z|x)]B,

and subtract this from the first equation to get

y - E(y|x) = [z - E(z|x)]B + u

or $\tilde{y} = \tilde{z}B + u$. Because $\tilde{z}$ is a function of (x,z), $E(u|\tilde{z}) = 0$ (since E(u|x,z) = 0), and so $E(\tilde{y}|\tilde{z}) = \tilde{z}B$.

... using very flexible methods, typically, so-called nonparametric methods. Then, after obtaining residuals of the form $\tilde{y}_i \equiv y_i - \hat{E}(y_i|x_i)$ and $\tilde{z}_i \equiv z_i - \hat{E}(z_i|x_i)$, B is estimated from an OLS regression $\tilde{y}_i$ on $\tilde{z}_i$, i = 1,...,N. Under ... to a ... and Powell (1994).

CHAPTER 3

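3.1. To prove Lemma 3.1, we must show that for all $e > 0$, there exists $b_e < \infty$ and an integer $N_e$ such that $P[|x_N| > b_e] < e$ for all $N \geq N_e$. We use the following fact: since $x_N \xrightarrow{p} a$, for any $e > 0$ there exists an integer $N_e$ such that $P[|x_N - a| > 1] < e$ for all $N \geq N_e$. [The existence of $N_e$ is implied by Definition 3.3(1).] But $|x_N| = |x_N - a + a| \leq |x_N - a| + |a|$ (by the triangle inequality), and so $|x_N| - |a| \leq |x_N - a|$. It follows that $P[|x_N| - |a| > 1] \leq P[|x_N - a| > 1]$. Therefore, we can take $b_e \equiv |a| + 1$ (irrespective of the value of e) and then the existence of $N_e$ follows from Definition 3.3(1).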

p
L

g(c).

2

b. By the CLT, ...

Therefore, $\mathrm{Avar}(\bar{y}_N) = s^2/N$.

d. The asymptotic standard deviation of $\bar{y}_N$ is the square root of its asymptotic variance, or $s/\sqrt{N}$.

... estimator of s. ... is used: $\hat{s}^2 = (N - 1)^{-1}\sum_{i=1}^{N}(y_i - \bar{y}_N)^2$, and then $\hat{s}$ is the positive square root. The asymptotic standard error of $\bar{y}_N$ is simply $\hat{s}/\sqrt{N}$.

3.7. a. For q > 0 the natural logarithm is a continuous function, and so $\mathrm{plim}[\log(\hat{q})] = \log[\mathrm{plim}(\hat{q})] = \log(q) = g$.

b. We use the delta method to find $\mathrm{Avar}[\sqrt{N}(\hat{g} - g)]$. In the scalar case, if $\hat{g} = g(\hat{q})$ then $\mathrm{Avar}[\sqrt{N}(\hat{g} - g)] = [dg(q)/dq]^2\,\mathrm{Avar}[\sqrt{N}(\hat{q} - q)]$. When g(q) = log(q) -- which is, of course, continuously differentiable -- $\mathrm{Avar}[\sqrt{N}(\hat{g} - g)] = (1/q)^2\,\mathrm{Avar}[\sqrt{N}(\hat{q} - q)]$.

c. In the scalar case, the asymptotic standard error of $\hat{g}$ is generally $|dg(\hat{q})/dq|\cdot\mathrm{se}(\hat{q})$. Therefore, for g(q) = log(q), $\mathrm{se}(\hat{g}) = \mathrm{se}(\hat{q})/\hat{q}$. When $\hat{q} = 4$ and $\mathrm{se}(\hat{q}) = 2$, $\hat{g} = \log(4) \approx 1.39$ and $\mathrm{se}(\hat{g}) = 1/2$.

d. The asymptotic t statistic for testing H0: q = 1 is $(\hat{q} - 1)/\mathrm{se}(\hat{q})$ = 3/2 = 1.5.

e. Because g = log(q), the null of interest can also be stated as H0: g = 0. The t statistic based on $\hat{g}$ is about 1.39/(.5) = 2.78. This leads to a very strong rejection of H0, whereas the t statistic based on $\hat{q}$ is, at best, marginally significant.
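
The arithmetic in parts (c) through (e) of Problem 3.7 can be checked with a short Python sketch. The function name `delta_method_log` is only an illustrative label, not something from the text. Because the sketch carries log(4) at full precision rather than rounding it to 1.39, its t statistic for g comes out marginally below the hand-computed 2.78.

```python
import math

def delta_method_log(q_hat, se_q):
    """Return (g_hat, se_g) for g = log(q) via the scalar delta method."""
    g_hat = math.log(q_hat)
    # |dg/dq| evaluated at q_hat is 1/q_hat, so se(g_hat) = se(q_hat)/q_hat
    se_g = abs(1.0 / q_hat) * se_q
    return g_hat, se_g

q_hat, se_q = 4.0, 2.0            # values used in part (c)
g_hat, se_g = delta_method_log(q_hat, se_q)

t_q = (q_hat - 1.0) / se_q        # t statistic for H0: q = 1
t_g = (g_hat - 0.0) / se_g        # t statistic for H0: g = 0

print(f"g_hat = {g_hat:.4f}, se(g_hat) = {se_g:.4f}")
print(f"t based on q_hat = {t_q:.3f}, t based on g_hat = {t_g:.3f}")
```

The two t statistics test the same hypothesis yet differ noticeably, which is the point of part (e): this kind of test is not invariant to a nonlinear reparameterization.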

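3.9. By the delta method,

$\mathrm{Avar}[\sqrt{N}(\hat{\gamma} - \gamma)] = G(\theta)V_1G(\theta)'$, $\mathrm{Avar}[\sqrt{N}(\tilde{\gamma} - \gamma)] = G(\theta)V_2G(\theta)'$,

where $G(\theta) = \nabla_{\theta}\,g(\theta)$ is Q x P. Therefore,

$\mathrm{Avar}[\sqrt{N}(\tilde{\gamma} - \gamma)] - \mathrm{Avar}[\sqrt{N}(\hat{\gamma} - \gamma)] = G(\theta)(V_2 - V_1)G(\theta)'$.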

CHAPTER 4

4.1. a. Exponentiating equation (4.49) gives

wage = exp(b0 + b1married + b2educ + zG + u)
     = exp(u)exp(b0 + b1married + b2educ + zG).

Therefore,

E(wage|x) = E[exp(u)|x]exp(b0 + b1married + b2educ + zG),

where x denotes all explanatory variables. If u and x are independent, then E[exp(u)|x] = E[exp(u)] = d0, say. Therefore

E(wage|x) = d0exp(b0 + b1married + b2educ + zG).

Now, finding the proportionate difference in this expectation at married = 1 and married = 0 (with all else equal) gives exp(b1) - 1; all other factors cancel out.

b. Since $q_1 = 100\cdot[\exp(b_1) - 1] = g(b_1)$, we need the derivative of g with respect to $b_1$: $dg/db_1 = 100\cdot\exp(b_1)$. The asymptotic standard error of $\hat{q}_1$ using the delta method is obtained as the absolute value of $dg/db_1$, evaluated at $\hat{b}_1$, times $\mathrm{se}(\hat{b}_1)$:

$\mathrm{se}(\hat{q}_1) = [100\cdot\exp(\hat{b}_1)]\cdot\mathrm{se}(\hat{b}_1)$.

c. We can evaluate the conditional expectation in part (a) at two levels of education, say educ0 and educ1, all else fixed. The proportionate change in expected wage from educ0 to educ1 is

[exp(b2educ1) - exp(b2educ0)]/exp(b2educ0) = exp[b2(educ1 - educ0)] - 1 = $\exp(b_2\Delta educ) - 1$.

Using the same arguments in part (b), $\hat{q}_2 = 100\cdot[\exp(\hat{b}_2\Delta educ) - 1]$ and

$\mathrm{se}(\hat{q}_2) = 100\cdot|\Delta educ|\exp(\hat{b}_2\Delta educ)\,\mathrm{se}(\hat{b}_2)$.

d. For the estimated version of equation (4.29), $\hat{b}_1 = .199$, $\mathrm{se}(\hat{b}_1) = .039$, $\hat{b}_2 = .065$, $\mathrm{se}(\hat{b}_2) = .006$. Therefore, $\hat{q}_1 = 22.01$ and $\mathrm{se}(\hat{q}_1) = 4.76$. For $\hat{q}_2$ we set $\Delta educ = 4$. Then $\hat{q}_2 = 29.7$ and $\mathrm{se}(\hat{q}_2) = 3.11$.

4.3. a. Not in general. $\mathrm{Var}(u|x) = E(u^2|x) - [E(u|x)]^2$.

b. It could be that E(x'u) = 0, in which case OLS is consistent, and Var(u|x) is constant.
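
The percentage effects and delta-method standard errors in Problem 4.1, parts (b) through (d), can be reproduced with the following Python sketch. The helper name `pct_effect` is hypothetical. The inputs are the rounded estimates quoted in part (d), so the first percentage effect can differ from the reported 22.01 in the second decimal place.

```python
import math

def pct_effect(b_hat, se_b, delta=1.0):
    """Percentage effect q = 100*[exp(b*delta) - 1] and its delta-method se.

    The derivative of q with respect to b is 100*delta*exp(b*delta), so
    se(q_hat) = 100*|delta|*exp(b_hat*delta)*se(b_hat).
    """
    q_hat = 100.0 * (math.exp(b_hat * delta) - 1.0)
    se_q = 100.0 * abs(delta) * math.exp(b_hat * delta) * se_b
    return q_hat, se_q

# Marriage premium: b1_hat = .199, se = .039
q1, se1 = pct_effect(0.199, 0.039)

# Four more years of education: b2_hat = .065, se = .006, change in educ = 4
q2, se2 = pct_effect(0.065, 0.006, delta=4.0)

print(f"q1_hat = {q1:.2f}, se = {se1:.2f}")
print(f"q2_hat = {q2:.2f}, se = {se2:.2f}")
```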

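4.5. Write equation (4.50) as E(y|w) = wD, where w = (x,z). Since Var(y|w) = $s^2$, it follows by Theorem 4.2 that Avar $\sqrt{N}(\hat{D} - D)$ is $s^2[E(w'w)]^{-1}$, where $\hat{D} = (\hat{B}',\hat{g})'$. Because E(w'w) is block diagonal when E(x'z) = 0, inverting E(w'w) and focusing on the upper K x K block gives

Avar $\sqrt{N}(\hat{B} - B) = s^2[E(x'x)]^{-1}$.

Next, we need to find Avar $\sqrt{N}(\tilde{B} - B)$. It is helpful to write y = xB + v where v = gz + u and u $\equiv$ y - E(y|x,z), so that E(x'v) = 0. Further, $E(v^2|x) = g^2E(z^2|x) + s^2$, because E(zu|x,z) = zE(u|x,z) = 0 and $E(u^2|x,z) = \mathrm{Var}(y|x,z) = s^2$. Unless $E(z^2|x)$ is constant, v is heteroskedastic conditional on x. So, without further assumptions,

Avar $\sqrt{N}(\tilde{B} - B) = [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1}$.

Now we can show Avar $\sqrt{N}(\tilde{B} - B)$ - Avar $\sqrt{N}(\hat{B} - B)$ is positive semi-definite by writing

Avar $\sqrt{N}(\tilde{B} - B)$ - Avar $\sqrt{N}(\hat{B} - B)$
  = $[E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1} - s^2[E(x'x)]^{-1}$
  = $[E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1} - s^2[E(x'x)]^{-1}E(x'x)[E(x'x)]^{-1}$
  = $[E(x'x)]^{-1}[E(v^2x'x) - s^2E(x'x)][E(x'x)]^{-1}$.

Because $[E(x'x)]^{-1}$ is positive definite, it suffices to show that $E(v^2x'x) - s^2E(x'x)$ is p.s.d. To this end, let $h(x) \equiv E(z^2|x)$. By the law of iterated expectations, $E(v^2x'x) = E[E(v^2|x)x'x] = g^2E[h(x)x'x] + s^2E(x'x)$. Therefore, $E(v^2x'x) - s^2E(x'x) = g^2E[h(x)x'x]$, which, when $g \neq 0$, is actually a positive definite matrix except by fluke. In particular, if $E(z^2|x) = E(z^2) = h^2 > 0$, then $E(v^2x'x) - s^2E(x'x) = g^2h^2E(x'x)$, which is positive definite.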

4.7. a. One important omitted factor in u is family income: students that come from wealthier families tend to do better in school, other things equal. Family income and PC ownership are positively correlated because the probability of owning a PC increases with family income. Another factor in u is the quality of the high school. This may also be correlated with PC: a student who had more exposure to computers in high school may be more likely to own a computer.

b. $\hat{b}_3$ is likely to have an upward bias because of the positive correlation between u and PC, but it is not clear-cut because of the other explanatory variables in the equation. If we write the linear projection

u = d0 + d1hsGPA + d2SAT + d3PC + r

then the bias is upward if d3 is greater than zero. This measures the partial correlation between u (say, family income) and PC, and it is likely to be positive.

c. If data on family income can be collected then it can be included in the equation. ... education is. Another possibility is to use average house value in each student's home zip code, as zip code is often part of school records. Proxies for high school quality might be faculty-student ratios, expenditure per student, average teacher salary, and so on.

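$\Delta\log(y) = b_0 + xB + (a_1 - 1)\log(y_{-1}) + u$.

Clearly, the intercept and slope estimates on x will be the same. The coefficient on $\log(y_{-1})$ changes.

b. For simplicity, let w = log(y), $w_{-1} = \log(y_{-1})$. The slope coefficient in a simple regression is always $a_1 = \mathrm{Cov}(w_{-1},w)/\mathrm{Var}(w_{-1})$. But, by assumption, $\mathrm{Var}(w) = \mathrm{Var}(w_{-1})$, so we can write $a_1 = \mathrm{Cov}(w_{-1},w)/(s_{w_{-1}}s_w)$, where $s_{w_{-1}} = \mathrm{sd}(w_{-1})$ and $s_w = \mathrm{sd}(w)$. But $\mathrm{Corr}(w_{-1},w) = \mathrm{Cov}(w_{-1},w)/(s_{w_{-1}}s_w)$, and so $a_1 = \mathrm{Corr}(w_{-1},w)$.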

4.11. Here is some Stata output obtained to answer this question:

. reg lwage exper tenure married south urban black educ iq kww

  Source |       SS       df       MS                  Number of obs =     935
---------+------------------------------               F(  9,   925) =   37.28
   Model |  44.0967944     9  4.89964382               Prob > F      =  0.0000
Residual |  121.559489   925  .131415664               R-squared     =  0.2662
---------+------------------------------               Adj R-squared =  0.2591
   Total |  165.656283   934  .177362188               Root MSE      =  .36251

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .0127522   .0032308     3.947    0.000      .0064117    .0190927
  tenure |   .0109248   .0024457     4.467    0.000       .006125    .0157246
 married |   .1921449   .0389094     4.938    0.000      .1157839    .2685059
   south |  -.0820295   .0262222    -3.128    0.002     -.1334913   -.0305676
   urban |   .1758226   .0269095     6.534    0.000      .1230118    .2286334
   black |  -.1303995   .0399014    -3.268    0.001     -.2087073   -.0520917
    educ |   .0498375    .007262     6.863    0.000      .0355856    .0640893
      iq |   .0031183   .0010128     3.079    0.002      .0011306    .0051059
     kww |    .003826   .0018521     2.066    0.039      .0001911    .0074608
   _cons |   5.175644    .127776    40.506    0.000      4.924879    5.426408
------------------------------------------------------------------------------

. test iq kww

 ( 1)  iq = 0.0
 ( 2)  kww = 0.0

       F(  2,   925) =    8.59
            Prob > F =    0.0002

a. The estimated return to education using both IQ and KWW as proxies for ability is about 5%. ... lower estimated return to education, but it is still practically nontrivial and statistically very significant.

b. We can see from the t statistics that these variables are going to be jointly significant. The F test verifies this, with p-value = .0002.

c. The wage differential between nonblacks and blacks does not disappear. Blacks are estimated to earn about 13% less than nonblacks, holding all other factors fixed.

4.13. a. Using the 90 counties for 1987 gives

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen if d87

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  4,    85) =   15.15
       Model |  11.1549601     4  2.78874002           Prob > F      =  0.0000
    Residual |  15.6447379    85   .18405574           R-squared     =  0.4162
-------------+------------------------------           Adj R-squared =  0.3888
       Total |   26.799698    89  .301120202           Root MSE      =  .42902

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.7239696   .1153163    -6.28    0.000    -.9532493   -.4946899
    lprbconv |  -.4725112   .0831078    -5.69    0.000    -.6377519   -.3072706
    lprbpris |   .1596698   .2064441     0.77    0.441    -.2507964     .570136
     lavgsen |   .0764213   .1634732     0.47    0.641    -.2486073    .4014499
       _cons |  -4.867922   .4315307   -11.28    0.000    -5.725921   -4.009923
------------------------------------------------------------------------------

Because of the log-log functional form, all coefficients are elasticities. The elasticities of crime with respect to the arrest and conviction probabilities are the sign we expect, and both are practically and statistically significant. The elasticities with respect to the probability of serving a prison term and the average sentence length are positive but are statistically insignificant.
b. To add the previous year's crime rate we first generate the lag:

. gen lcrmr_1 = lcrmrte[_n-1] if d87
(540 missing values generated)

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 if d87

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  5,    84) =  113.90
       Model |  23.3549731     5  4.67099462           Prob > F      =  0.0000
    Residual |   3.4447249    84   .04100863           R-squared     =  0.8715
-------------+------------------------------           Adj R-squared =  0.8638
       Total |   26.799698    89  .301120202           Root MSE      =  .20251

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.1850424   .0627624    -2.95    0.004    -.3098523   -.0602325
    lprbconv |  -.0386768   .0465999    -0.83    0.409    -.1313457    .0539921
    lprbpris |  -.1266874   .0988505    -1.28    0.204    -.3232625    .0698876
     lavgsen |  -.1520228   .0782915    -1.94    0.056    -.3077141    .0036684
     lcrmr_1 |   .7798129   .0452114    17.25    0.000     .6899051    .8697208
       _cons |  -.7666256   .3130986    -2.45    0.016    -1.389257   -.1439946
------------------------------------------------------------------------------

There are some notable changes in the coefficients on the original variables. The elasticities with respect to prbarr and prbconv are much smaller now, but still have signs predicted by a deterrent-effect story. The conviction probability is no longer statistically significant. Adding the lagged crime rate changes the signs of the elasticities with respect to prbpris and avgsen, and the latter is almost statistically significant at the 5% level against a two-sided alternative (p-value = .056). Not surprisingly, the elasticity with respect to the lagged crime rate is large and very statistically significant. (The elasticity is also statistically different from unity.)
c. Adding the logs of the nine wage variables gives the following:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 lwcon-lwloc if d87

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 14,    75) =   43.81
       Model |  23.8798774    14  1.70570553           Prob > F      =  0.0000
    Residual |  2.91982063    75  .038930942           R-squared     =  0.8911
-------------+------------------------------           Adj R-squared =  0.8707
       Total |   26.799698    89  .301120202           Root MSE      =  .19731

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.1725122   .0659533    -2.62    0.011    -.3038978   -.0411265
    lprbconv |  -.0683639    .049728    -1.37    0.173    -.1674273    .0306994
    lprbpris |  -.2155553   .1024014    -2.11    0.039    -.4195493   -.0115614
     lavgsen |  -.1960546   .0844647    -2.32    0.023     -.364317   -.0277923
     lcrmr_1 |   .7453414   .0530331    14.05    0.000     .6396942    .8509887
       lwcon |  -.2850008   .1775178    -1.61    0.113    -.6386344    .0686327
       lwtuc |   .0641312    .134327     0.48    0.634    -.2034619    .3317244
       lwtrd |    .253707   .2317449     1.09    0.277    -.2079524    .7153665
       lwfir |  -.0835258   .1964974    -0.43    0.672    -.4749687    .3079171
       lwser |   .1127542   .0847427     1.33    0.187    -.0560619    .2815703
       lwmfg |   .0987371   .1186099     0.83    0.408    -.1375459    .3350201
       lwfed |   .3361278   .2453134     1.37    0.175    -.1525615    .8248172
       lwsta |   .0395089   .2072112     0.19    0.849    -.3732769    .4522947
       lwloc |  -.0369855   .3291546    -0.11    0.911    -.6926951     .618724
       _cons |  -3.792525   1.957472    -1.94    0.056    -7.692009    .1069592
------------------------------------------------------------------------------

. testparm lwcon-lwloc

 ( 1)  lwcon = 0.0
 ( 2)  lwtuc = 0.0
 ( 3)  lwtrd = 0.0
 ( 4)  lwfir = 0.0
 ( 5)  lwser = 0.0
 ( 6)  lwmfg = 0.0
 ( 7)  lwfed = 0.0
 ( 8)  lwsta = 0.0
 ( 9)  lwloc = 0.0

       F(  9,    75) =    1.50
            Prob > F =    0.1643

The nine wage variables are jointly insignificant even at the 15% level. Plus, the elasticities are not consistently positive or negative. The two largest elasticities -- which also have the largest absolute t statistics -- have the opposite sign.
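
As a cross-check on the joint test in part (c), the usual (nonrobust) F statistic can be recovered by hand from the residual sums of squares reported in the two regressions above. The Python sketch below uses the standard sum-of-squared-residuals form of the F statistic; the function name `f_stat` is just an illustrative label.

```python
def f_stat(ssr_restricted, ssr_unrestricted, num_restrictions, df_unrestricted):
    """F = [(SSR_r - SSR_ur)/q] / [SSR_ur/df_ur] for q exclusion restrictions."""
    numerator = (ssr_restricted - ssr_unrestricted) / num_restrictions
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

# Residual SS without the wage variables (part b) and with them (part c);
# nine restrictions, 75 residual degrees of freedom in the unrestricted model.
F = f_stat(3.4447249, 2.91982063, 9, 75)
print(f"F(9, 75) = {F:.2f}")
```

Rounded to two decimals, this agrees with the testparm output.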

d. Using the "robust" option in Stata, which is appended to the "reg" command, gives the heteroskedasticity-robust F statistic as F = 2.19 and p-value = .032. ... example.

4.15. a. Because each xj has finite second moment, $\mathrm{Var}(xB) < \infty$. Since $\mathrm{Var}(u) < \infty$, Cov(xB,u) is well-defined. Further, Cov(xB,u) = 0. Therefore, Var(y) = Var(xB) + Var(u), or $s_y^2 = \mathrm{Var}(xB) + s_u^2$.

b. This is nonsense when we view the xi as random draws along with yi. The statement "Var(ui) = $s^2$ ... are nonrandom (or ... This is another example of how the assumption of nonrandom regressors can lead to counterintuitive conclusions. ... z, which is uncorrelated with each xj, suddenly becomes observed. When we add z to the regressor list, the error changes, and so does the error variance. (It gets smaller.) In the vast majority of economic applications, it makes no sense to think we have access to the entire set of factors that one would ever want to control for, so we should allow for error variances to change across different models for the same response variable.

c. Write $R^2$ = 1 - SSR/SST = 1 - (SSR/N)/(SST/N). Therefore, $\mathrm{plim}(R^2)$ = 1 - plim[(SSR/N)/(SST/N)] = 1 - [plim(SSR/N)]/[plim(SST/N)] = $1 - s_u^2/s_y^2 = r^2$, where we use the fact that SSR/N is a consistent estimator of $s_u^2$ and SST/N is a consistent estimator of $s_y^2$.

d. The derivation in part (c) assumed nothing about Var(u|x). The population R-squared depends on only the unconditional variances of u and y. Therefore, regardless of the nature of heteroskedasticity in Var(u|x), the usual R-squared consistently estimates the population R-squared. Neither ... such as unbiasedness, so the only analysis we can do in any generality involves asymptotics. The statement in the problem is simply wrong.
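
The conclusion of parts (c) and (d) of Problem 4.15 can be illustrated with a small simulation. The Python sketch below is only an illustration under assumptions that are not in the text: a single standard normal regressor, slope 2, and Var(u|x) = 0.5 + 0.5x^2, so that Var(u) = 1, Var(y) = 5, and the population R-squared is 0.8. The function name `simulate_r2` and the seed are arbitrary.

```python
import random

def simulate_r2(n, seed=0):
    """Sample R-squared from a simple regression with heteroskedastic errors."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Heteroskedastic error: Var(u|x) = 0.5 + 0.5*x^2, hence Var(u) = 1
        u = rng.gauss(0.0, 1.0) * (0.5 + 0.5 * x * x) ** 0.5
        xs.append(x)
        ys.append(1.0 + 2.0 * x + u)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    slope = sxy / sxx
    ssr = sst - slope * sxy   # SSR = SST - explained sum of squares
    return 1.0 - ssr / sst

rho2 = 1.0 - 1.0 / 5.0        # population R-squared: 1 - Var(u)/Var(y)
r2 = simulate_r2(200_000)
print(f"population R-squared = {rho2:.3f}, sample R-squared = {r2:.3f}")
```

With a large sample the usual R-squared lands close to the population value even though Var(u|x) is far from constant.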

CHAPTER 5

5.1. Define $x_1 \equiv (z_1,y_2)$ and $x_2 \equiv \hat{v}_2$, and let $\hat{B} \equiv (\hat{B}_1',\hat{r}_1)'$ be the OLS estimator, where $\hat{B}_1 = (\hat{D}_1',\hat{a}_1)'$. $\hat{B}_1$ can also be obtained by partitioned regression:

(i) Regress $x_1$ onto $\hat{v}_2$ and save the residuals, say $\ddot{x}_1$.
(ii) Regress $y_1$ onto $\ddot{x}_1$.

But when we regress $z_1$ onto $\hat{v}_2$, the residuals are just $z_1$ since $\hat{v}_2$ is orthogonal in sample to z. (More precisely, $\sum_{i=1}^{N} z_{i1}'\hat{v}_{i2} = 0$.) Further, because we can write $y_2 = \hat{y}_2 + \hat{v}_2$, where $\hat{y}_2$ and $\hat{v}_2$ are orthogonal in sample, the residuals from regressing $y_2$ onto $\hat{v}_2$ are simply the first stage fitted values, $\hat{y}_2$. In other words, $\ddot{x}_1 = (z_1,\hat{y}_2)$. But the 2SLS estimator of $B_1$ is obtained exactly from the OLS regression $y_1$ on $z_1$, $\hat{y}_2$.
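
The algebraic equivalence in Problem 5.1 can be verified numerically. The Python sketch below is a minimal illustration on simulated data, with everything about the data-generating process assumed for the example: z1 is just a constant, there is one endogenous regressor y2, and one excluded instrument z. The helper `ols` solves the normal equations by Gauss-Jordan elimination and is not taken from any library.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]   # common shock that makes y2 endogenous
y2 = [0.5 + 0.8 * zi + ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]
y1 = [1.0 + 2.0 * y2i + ei + rng.gauss(0, 1) for y2i, ei in zip(y2, e)]

# First stage: regress y2 on (1, z); keep fitted values and residuals
p0, p1 = ols([[1.0, zi] for zi in z], y2)
y2hat = [p0 + p1 * zi for zi in z]
v2hat = [a - b for a, b in zip(y2, y2hat)]

# (i) OLS of y1 on (1, y2, v2hat)
b_cf = ols([[1.0, a, v] for a, v in zip(y2, v2hat)], y1)
# (ii) 2SLS computed as OLS of y1 on (1, y2hat)
b_2sls = ols([[1.0, a] for a in y2hat], y1)

print("coefficients on (1, y2) with v2hat included:", b_cf[:2])
print("2SLS coefficients:                          ", b_2sls)
```

Up to floating-point error, the coefficients on the constant and on y2 from regression (i) coincide with the 2SLS estimates, as the partitioned-regression argument implies.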

5.3. a. There may be unobserved health factors correlated with smoking behavior that affect infant birth weight. For example, women who smoke during pregnancy may, on average, drink more coffee or alcohol, or eat less nutritious meals.

b. Basic economics says that packs should be negatively correlated with cigarette price, although the correlation might be small (especially because price is aggregated at the state level). ... little careful. One component of cigarette price is the state tax on cigarettes. States that have lower taxes on cigarettes may also have lower ... maybe cigarette price fails the exogeneity requirement for an IV.

c. OLS is followed by 2SLS (IV, in this case):

. reg lbwght male parity lfaminc packs

  Source |       SS       df       MS                  Number of obs =    1388
---------+------------------------------               F(  4,  1383) =   12.55
   Model |  1.76664363     4  .441660908               Prob > F      =  0.0000
Residual |    48.65369  1383  .035179819               R-squared     =  0.0350
---------+------------------------------               Adj R-squared =  0.0322
   Total |  50.4203336  1387  .036352079               Root MSE      =  .18756

------------------------------------------------------------------------------
  lbwght |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |   .0262407   .0100894     2.601    0.009      .0064486    .0460328
  parity |   .0147292   .0056646     2.600    0.009      .0036171    .0258414
 lfaminc |   .0180498   .0055837     3.233    0.001      .0070964    .0290032
   packs |  -.0837281   .0171209    -4.890    0.000     -.1173139   -.0501423
   _cons |   4.675618   .0218813   213.681    0.000      4.632694    4.718542
------------------------------------------------------------------------------

. reg lbwght male parity lfaminc packs (male parity lfaminc cigprice)

                                                                        (2SLS)
  Source |       SS       df       MS                  Number of obs =    1388
---------+------------------------------               F(  4,  1383) =    2.39
   Model | -91.3500269     4 -22.8375067               Prob > F      =  0.0490
Residual |  141.770361  1383  .102509299               R-squared     =       .
---------+------------------------------               Adj R-squared =       .
   Total |  50.4203336  1387  .036352079               Root MSE      =  .32017

------------------------------------------------------------------------------
  lbwght |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
   packs |   .7971063   1.086275     0.734    0.463     -1.333819    2.928031
    male |   .0298205    .017779     1.677    0.094     -.0050562    .0646972
  parity |  -.0012391   .0219322    -0.056    0.955      -.044263    .0417848
 lfaminc |    .063646   .0570128     1.116    0.264     -.0481949    .1754869
   _cons |   4.467861   .2588289    17.262    0.000      3.960122    4.975601
------------------------------------------------------------------------------

(Note that Stata automatically shifts endogenous explanatory variables to the beginning of the list when reporting coefficients, standard errors, and so on.)

The difference between OLS and IV in the estimated effect of packs on bwght is huge. The IV estimate has the opposite sign, is huge in magnitude, and is not statistically significant. The sign and size of the smoking effect are not realistic.

d. We can see the problem with IV by estimating the reduced form for packs:

. reg packs male parity lfaminc cigprice

  Source |       SS       df       MS                  Number of obs =    1388
---------+------------------------------               F(  4,  1383) =   10.86
   Model |  3.76705108     4   .94176277               Prob > F      =  0.0000
Residual |  119.929078  1383  .086716615               R-squared     =  0.0305
---------+------------------------------               Adj R-squared =  0.0276
   Total |  123.696129  1387  .089182501               Root MSE      =  .29448

------------------------------------------------------------------------------
   packs |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |  -.0047261   .0158539    -0.298    0.766     -.0358264    .0263742
  parity |   .0181491   .0088802     2.044    0.041      .0007291    .0355692
 lfaminc |  -.0526374   .0086991    -6.051    0.000     -.0697023   -.0355724
cigprice |    .000777   .0007763     1.001    0.317     -.0007459    .0022999
   _cons |   .1374075   .1040005     1.321    0.187     -.0666084    .3414234
------------------------------------------------------------------------------

The reduced form estimates show that cigprice does not significantly affect packs; in fact, the coefficient on cigprice is not the sign we expect. Thus, cigprice fails as an IV for packs because cigprice is not partially correlated with packs (with a sensible sign for the correlation). This is separate from the problem that cigprice may not truly be exogenous in the birth weight equation.

5.5. Under the null hypothesis that q and z2 are uncorrelated, z1 and z2 are exogenous in (5.55) because each is uncorrelated with u1. Unfortunately, y2 is correlated with u1, and so the regression of y1 on z1, y2, z2 does not produce a consistent estimator of 0 on z2 even when $E(z_2'q) = 0$. We could find that $\hat{J}_1$ is statistically different from zero even when q and z2 are uncorrelated -- in which case we would incorrectly conclude that z2 is not a valid IV candidate. Or, we might fail to reject H0: $J_1 = 0$ when z2 and q are correlated -- in which case we incorrectly conclude that the elements in z2 are valid as instruments.

The point of this exercise is that one cannot simply add instrumental variable candidates in the structural equation and then test for significance of these variables using OLS. ... cannot be tested.

5.7. a. If we plug q = (1/d1)q1 - (1/d1)a1 into equation (5.45) we get

y = b0 + b1x1 + ... + bKxK + h1q1 + v - h1a1,        (5.56)

where h1 $\equiv$ (1/d1). ... uncorrelated with the structural error, v (by definition of redundancy). Further, we have assumed that the zh are uncorrelated with a1. Since each xj is also uncorrelated with v - h1a1, we can estimate (5.56) by 2SLS using instruments (1,x1,...,xK,z1,z2,...,zM) to get consistent estimators of the bj and h1.

Given all of the zero correlation assumptions, what we need for identification is that at least one of the zh appears in the reduced form for q1. More formally, in the linear projection

q1 = p0 + p1x1 + ... + pKxK + pK+1z1 + ... + pK+MzM + r1,

at least one of pK+1, ..., pK+M must be different from zero.

b. We need family background variables to be redundant in the log(wage) equation once ability (and other factors, such as educ and exper) have been controlled for. The idea here is that family background may influence ability but should have no partial effect on log(wage) once ability has been accounted for. For the rank condition to hold, we need family background variables to be correlated with the indicator, q1, say IQ, once the xj have been netted out. ... are (partially) correlated.

c. Applying the procedure to the data set in NLS80.RAW gives the following results:

. reg lwage exper tenure educ married south urban black iq (exper tenure educ
married south urban black meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   25.81
       Model |  19.6029198     8  2.45036497           Prob > F      =  0.0000
    Residual |  107.208996   713  .150363248           R-squared     =  0.1546
-------------+------------------------------           Adj R-squared =  0.1451
       Total |  126.811916   721  .175883378           Root MSE      =  .38777

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0154368   .0077077     2.00    0.046     .0003044    .0305692
      tenure |   .0076754   .0030956     2.48    0.013     .0015979    .0137529
        educ |   .0161809   .0261982     0.62    0.537     -.035254    .0676158
     married |   .1901012   .0467592     4.07    0.000     .0982991    .2819033
       south |   -.047992   .0367425    -1.31    0.192    -.1201284    .0241444
       urban |   .1869376   .0327986     5.70    0.000     .1225442    .2513311
       black |   .0400269   .1138678     0.35    0.725    -.1835294    .2635832
       exper |   .0162185   .0040076     4.05    0.000     .0083503    .0240867
       _cons |   4.471616    .468913     9.54    0.000        3.551    5.392231
------------------------------------------------------------------------------

. reg lwage exper tenure educ married south urban black kww (exper tenure educ
married south urban black meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   25.70
       Model |   19.820304     8    2.477538           Prob > F      =  0.0000
    Residual |  106.991612   713  .150058361           R-squared     =  0.1563
-------------+------------------------------           Adj R-squared =  0.1468
       Total |  126.811916   721  .175883378           Root MSE      =  .38737

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         kww |   .0249441   .0150576     1.66    0.098    -.0046184    .0545067
      tenure |   .0051145   .0037739     1.36    0.176    -.0022947    .0125238
        educ |   .0260808   .0255051     1.02    0.307    -.0239933    .0761549
     married |   .1605273   .0529759     3.03    0.003     .0565198    .2645347
       south |   -.091887   .0322147    -2.85    0.004    -.1551341   -.0286399
       urban |   .1484003   .0411598     3.61    0.000     .0675914    .2292093
       black |  -.0424452   .0893695    -0.47    0.635    -.2179041    .1330137
       exper |   .0068682   .0067471     1.02    0.309    -.0063783    .0201147
       _cons |   5.217818   .1627592    32.06    0.000     4.898273    5.537362
------------------------------------------------------------------------------

Even though there are 935 men in the sample, only 722 are used for the estimation, because data are missing on meduc and feduc. What we could do is define binary indicators for whether the corresponding variable is missing, set the missing values to zero, and then use the binary indicators as instruments along with meduc, feduc, and sibs. This would allow us to use all 935 observations.

The return to education is estimated to be small and insignificant whether IQ or KWW is used as the indicator. This could be because the family background variables do not satisfy the appropriate redundancy condition, or they might be correlated with a1. (In both first-stage regressions, the F statistic for joint significance of meduc, feduc, and sibs has a p-value below .002, so it seems the family background variables are sufficiently partially correlated with the ability indicators.)

5.9. Define q4 = b4 - b3, so that b4 = b3 + q4. Plugging this expression into the equation and rearranging gives

$\log(wage) = b_0 + b_1 exper + b_2 exper^2 + b_3(twoyr + fouryr) + q_4 fouryr + u$
$\qquad\quad\;\; = b_0 + b_1 exper + b_2 exper^2 + b_3 totcoll + q_4 fouryr + u$,

where totcoll = twoyr + fouryr. Now, just estimate the latter equation by 2SLS using exper, $exper^2$, dist2yr and dist4yr as the full set of instruments. We can use the t statistic on $\hat{q}_4$ to test H0: q4 = 0 against H1: q4 > 0.

5.11. Following the hint, let $y_2^0$ be the linear projection of y2 on z2, let a2 be the projection error, and assume that L2 is known. (The results on generated regressors in Section 6.1.1 show that the argument carries over to the case when L2 is estimated.) Plugging in $y_2 = y_2^0 + a_2$ gives

$y_1 = z_1D_1 + a_1y_2^0 + a_1a_2 + u_1$.

... each explanatory is orthogonal to the composite error, $a_1a_2 + u_1$. By assumption, E(z'u1) = 0. The problem is that $E(z_1'a_2)$ ... general.

... $y_2^*$ ... The second step regression (assuming that P2 is known) is essentially

$y_1 = z_1D_1 + a_1y_2^* + a_1r_2 + u_1$.

Now, r2 is uncorrelated with z, and so $E(z_1'r_2) = 0$ and $E(y_2^*r_2) = 0$. The lesson is that one must be very careful if manually carrying out 2SLS by explicitly doing the first- and second-stage regressions.

5.13. a. In a simple regression model with a single IV, the IV estimate of the slope can be written as

$\hat{b}_1 = \left[\sum_{i=1}^{N}(z_i - \bar{z})(y_i - \bar{y})\right] \Big/ \left[\sum_{i=1}^{N}(z_i - \bar{z})(x_i - \bar{x})\right] = \left[\sum_{i=1}^{N}z_i(y_i - \bar{y})\right] \Big/ \left[\sum_{i=1}^{N}z_i(x_i - \bar{x})\right]$.

Now the numerator can be written as

$\sum_{i=1}^{N}z_i(y_i - \bar{y}) = \sum_{i=1}^{N}z_iy_i - \left(\sum_{i=1}^{N}z_i\right)\bar{y} = N_1\bar{y}_1 - N_1\bar{y} = N_1(\bar{y}_1 - \bar{y})$,

where $N_1 = \sum_{i=1}^{N}z_i$ is the number of observations in the sample with zi = 1 and $\bar{y}_1$ is the average of the yi over the observations with zi = 1. Next, write $\bar{y}$ as a weighted average: $\bar{y} = (N_0/N)\bar{y}_0 + (N_1/N)\bar{y}_1$, where the notation should be clear. Straightforward algebra shows that $\bar{y}_1 - \bar{y} = [(N - N_1)/N]\bar{y}_1 - (N_0/N)\bar{y}_0 = (N_0/N)(\bar{y}_1 - \bar{y}_0)$. So the numerator of the IV estimate is $(N_0N_1/N)(\bar{y}_1 - \bar{y}_0)$. The same argument shows that the denominator is $(N_0N_1/N)(\bar{x}_1 - \bar{x}_0)$. Taking the ratio proves the result.

b. When x is a binary treatment indicator, $\bar{x}_1$ is the fraction of observations receiving treatment when zi = 1 and $\bar{x}_0$ is the fraction receiving treatment when zi = 0. ... participates in a job training program, and let zi = 1 if person i is eligible for participation in the program. ... participating in the program out of those made eligible, and $\bar{x}_0$ is the fraction of people participating who are not eligible. (When eligibility is ... response between the z = 1 and z = 0 groups gets divided by the difference in participation rates across the two groups.
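
The result in Problem 5.13(a) can be checked on a toy data set: with a binary instrument, the IV slope equals the difference in mean responses divided by the difference in mean treatment rates. The Python sketch below uses made-up numbers purely for illustration, and the function names `iv_slope` and `grouped_ratio` are hypothetical.

```python
def iv_slope(z, x, y):
    """Simple IV slope: sample Cov(z, y) divided by sample Cov(z, x)."""
    n = len(z)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den

def grouped_ratio(z, x, y):
    """(ybar1 - ybar0)/(xbar1 - xbar0), grouping observations by z = 1 or 0."""
    def mean(values):
        return sum(values) / len(values)
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    x1 = mean([xi for zi, xi in zip(z, x) if zi == 1])
    x0 = mean([xi for zi, xi in zip(z, x) if zi == 0])
    return (y1 - y0) / (x1 - x0)

# Illustrative data: z = eligibility, x = participation, y = outcome
z = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y = [5.0, 6.0, 4.5, 3.0, 4.0, 2.5, 3.5, 3.0, 2.0, 3.0]

b_iv = iv_slope(z, x, y)
b_grouped = grouped_ratio(z, x, y)
print(b_iv, b_grouped)
```

The two numbers agree up to floating-point error, which is the identity derived in part (a).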

(^
9^

11
12

0
IK

K1.

)
, where IK

20

^11

is the K2 x K2

is L1 x K1, and

^12

is K2 x

## As in Problem 5.12, the rank condition holds if and only if rank(^) = K.

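a. If for some $x_j$, the vector $\mathbf{z}_1$ does not appear in $L(x_j|\mathbf{z})$, then $\boldsymbol{\Pi}_{11}$ has a column that is entirely zeros. But then that column of $\boldsymbol{\Pi}$ can be written as a linear combination of the last $K_2$ columns of $\boldsymbol{\Pi}$, which means rank($\boldsymbol{\Pi}$) < K. Therefore, a necessary condition for the rank condition is that no columns of $\boldsymbol{\Pi}_{11}$ be exactly zero, which means that at least one $z_h$ must appear in the reduced form of each $x_j$, j = 1,...,$K_1$.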

b. Suppose $K_1 = 2$ and $L_1 = 2$, where $z_1$ appears in the reduced form for both $x_1$ and $x_2$, but $z_2$ appears in neither reduced form. Then the 2 x 2 matrix $\boldsymbol{\Pi}_{11}$ has zeros in its second row, which means that the second row of $\boldsymbol{\Pi}$ is all zeros, so $\boldsymbol{\Pi}$ cannot have rank K. Intuitively, while we began with two instruments, only one of them turned out to be partially correlated with $x_1$ and $x_2$.
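c. Without loss of generality, we assume that $z_j$ appears in the reduced form for $x_j$; we can simply reorder the elements of $\mathbf{z}_1$ to ensure this is the case. Then $\boldsymbol{\Pi}_{11}$ is a $K_1 \times K_1$ diagonal matrix with nonzero diagonal elements. Looking at

$$\boldsymbol{\Pi} = \begin{pmatrix} \boldsymbol{\Pi}_{11} & \mathbf{0} \\ \boldsymbol{\Pi}_{12} & \mathbf{I}_{K_2} \end{pmatrix},$$

we see that if $\boldsymbol{\Pi}_{11}$ is diagonal with all nonzero diagonals then $\boldsymbol{\Pi}$ is lower triangular with all nonzero diagonal elements. Therefore, rank $\boldsymbol{\Pi}$ = K.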
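
The rank arguments in parts (b) and (c) can be illustrated with small numerical matrices. This is only a sketch under assumed values: the entries of $\boldsymbol{\Pi}_{11}$ and $\boldsymbol{\Pi}_{12}$ below are arbitrary, and `matrix_rank` and `build_pi` are hypothetical helpers written for this example.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def build_pi(pi11, pi12):
    """Stack Pi = [[Pi11, 0], [Pi12, I_K2]]."""
    k2 = len(pi12)
    top = [list(row) + [0] * k2 for row in pi11]
    bottom = [list(row) + [1 if i == j else 0 for j in range(k2)]
              for i, row in enumerate(pi12)]
    return top + bottom

# K1 = 2, L1 = 2, K2 = 1, so K = 3. Entries are illustrative assumptions.
pi12 = [[Fraction(1, 2), -1]]

# Part (b): z2 appears in neither reduced form, so row 2 of Pi11 is zero.
rank_b = matrix_rank(build_pi([[2, 3], [0, 0]], pi12))

# Part (c): Pi11 is diagonal with nonzero diagonal elements.
rank_c = matrix_rank(build_pi([[2, 0], [0, 3]], pi12))

print(rank_b, rank_c)   # rank_b falls short of K = 3; rank_c equals 3
```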

CHAPTER 6

6.1. a. Here is abbreviated Stata output for testing the null hypothesis that
educ is exogenous:

. qui reg educ nearc4 nearc2 exper expersq black south smsa reg661-reg668
smsa66
. predict v2hat, resid

. reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 v2hat
------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    educ |   .1570594   .0482814      3.253   0.001       .0623912    .2517275
   exper |   .1188149   .0209423      5.673   0.000       .0777521    .1598776
 expersq |  -.0023565   .0003191     -7.384   0.000      -.0029822   -.0017308
   black |  -.1232778   .0478882     -2.574   0.010      -.2171749   -.0293807
   south |  -.1431945   .0261202     -5.482   0.000      -.1944098   -.0919791
    smsa |    .100753   .0289435      3.481   0.000       .0440018    .1575042
  reg661 |   -.102976   .0398738     -2.583   0.010      -.1811588   -.0247932
  reg662 |  -.0002286   .0310325     -0.007   0.994      -.0610759    .0606186
  reg663 |   .0469556   .0299809      1.566   0.117      -.0118296    .1057408
  reg664 |  -.0554084   .0359807     -1.540   0.124      -.1259578    .0151411
  reg665 |   .0515041   .0436804      1.179   0.238      -.0341426    .1371509
  reg666 |   .0699968   .0489487      1.430   0.153      -.0259797    .1659733
  reg667 |   .0390596   .0456842      0.855   0.393       -.050516    .1286352
  reg668 |  -.1980371   .0482417     -4.105   0.000      -.2926273   -.1034468
  smsa66 |   .0150626   .0205106      0.734   0.463      -.0251538    .0552789
   v2hat |  -.0828005   .0484086     -1.710   0.087       -.177718    .0121169
   _cons |   3.339687    .821434      4.066   0.000       1.729054    4.950319
------------------------------------------------------------------------------
The t statistic on $\hat v_2$ is -1.71, which is not significant at the 5% level against a two-sided alternative. The negative correlation between $u_1$ and educ is essentially the same finding that the 2SLS estimated return to education is larger than the OLS estimate. In any case, this is marginal evidence that educ is endogenous. (Depending on the application or purpose of a study, the same researcher may take t = -1.71 as evidence for or against endogeneity.)
b. To test the single overidentifying restriction we obtain the 2SLS
residuals:
. qui reg lwage educ exper expersq black south smsa reg661-reg668 smsa66
(nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66)
. predict uhat1, resid
Now, we regress the 2SLS residuals on all exogenous variables:
. reg uhat1 exper expersq black south smsa reg661-reg668 smsa66 nearc4 nearc2
  Source |       SS       df       MS                  Number of obs =    3010
---------+------------------------------               F( 16,  2993) =    0.08
   Model |  .203922832    16  .012745177               Prob > F      =  1.0000
Residual |  491.568721  2993  .164239466               R-squared     =  0.0004
---------+------------------------------               Adj R-squared = -0.0049
   Total |  491.772644  3009  .163433913               Root MSE      =  .40526

The test statistic is the sample size times the R-squared from this regression:
. di 3010*.0004
1.204
. di chiprob(1,1.2)
.27332168
The p-value, obtained from a $\chi^2_1$ distribution, is about .273, so the instruments pass the overidentification test.
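
The arithmetic of the overidentification test can be reproduced outside Stata. The sketch below uses only the Python standard library and relies on the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$; the function name `chi2_sf_1df` is a hypothetical helper, and the inputs are the sample size and R-squared reported in the output above.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square(1) variable: erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))

n_obs = 3010        # sample size from the regression of the 2SLS residuals
r_squared = 0.0004  # R-squared from that regression, as printed by Stata

overid_stat = n_obs * r_squared   # N times R-squared
p_value = chi2_sf_1df(1.2)        # the text rounds the statistic to 1.2

print(overid_stat, p_value)
```

Because the printed R-squared is rounded to four decimals, the statistic is only accurate to roughly two digits; the conclusion does not depend on that rounding.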

6.3. a. We need prices to satisfy two requirements. First, calories and protein must be partially correlated with prices of food. While this is easy to test for each by estimating the two reduced forms, the rank condition could still be violated (although see Problem 15.5c). In addition, we must also assume prices are exogenous in the productivity equation. Ideally, prices vary because of things like transportation costs that are not systematically related to regional variations in individual productivity. A potential problem is that prices reflect food quality and that features of the food other than calories and protein appear in the disturbance $u_1$.
b. Since there are two endogenous explanatory variables we need at least
two prices.
c. We would first estimate the two reduced forms for calories and protein by regressing each on a constant, exper, exper$^2$, educ, and the M prices, $p_1$, ..., $p_M$. We obtain the residuals, $\hat v_{21}$ and $\hat v_{22}$. Then we would run the regression log(produc) on 1, exper, exper$^2$, educ, calories, protein, $\hat v_{21}$, $\hat v_{22}$ and do a joint significance test on $\hat v_{21}$ and $\hat v_{22}$. We could use a standard F test or use a heteroskedasticity-robust test.

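6.5. a. For simplicity, absorb the intercept in $\mathbf{x}$, so $y = \mathbf{x}\boldsymbol{\beta} + u$, $E(u|\mathbf{x}) = 0$, $\mathrm{Var}(u|\mathbf{x}) = \sigma^2$. In these tests, $\hat\sigma^2$ is implicitly SSR/N; there is no degrees of freedom adjustment. (In any case, the adjustment makes no difference asymptotically.) So $\hat u_i^2 - \hat\sigma^2$ has a zero sample average, which means that

$$N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\hat u_i^2 - \hat\sigma^2) = N^{-1/2}\sum_{i=1}^N \mathbf{h}_i'(\hat u_i^2 - \hat\sigma^2).$$

Next, $N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)' = O_p(1)$ by the central limit theorem and $\hat\sigma^2 - \sigma^2 = o_p(1)$. So $N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\hat\sigma^2 - \sigma^2) = O_p(1)\cdot o_p(1) = o_p(1)$. Therefore, so far we have

$$N^{-1/2}\sum_{i=1}^N \mathbf{h}_i'(\hat u_i^2 - \hat\sigma^2) = N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\hat u_i^2 - \sigma^2) + o_p(1).$$

We are done with this part if we show

$$N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'\hat u_i^2 = N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'u_i^2 + o_p(1).$$

Now, as in Problem 4.4, we can write $\hat u_i^2 = u_i^2 - 2u_i\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) + [\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2$, so

$$N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'\hat u_i^2 = N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'u_i^2 - 2\left[N^{-1/2}\sum_{i=1}^N u_i(\mathbf{h}_i - \boldsymbol{\mu}_h)'\mathbf{x}_i\right](\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) + \left[N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{x}_i \otimes \mathbf{x}_i)\right]\{\mathrm{vec}[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})']\}, \qquad (6.40)$$

where the expression for the third term follows from $[\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2 = \mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i' = (\mathbf{x}_i \otimes \mathbf{x}_i)\mathrm{vec}[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})']$. Dropping the "-2" the second term can be written as $\left[N^{-1}\sum_{i=1}^N u_i(\mathbf{h}_i - \boldsymbol{\mu}_h)'\mathbf{x}_i\right]\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = o_p(1)\cdot O_p(1)$ because $\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = O_p(1)$ and, under $E(u_i|\mathbf{x}_i) = 0$, $E[u_i(\mathbf{h}_i - \boldsymbol{\mu}_h)'\mathbf{x}_i] = \mathbf{0}$; the law of large numbers implies that the sample average is $o_p(1)$. The third term can be written as

$$N^{-1/2}\left[N^{-1}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{x}_i \otimes \mathbf{x}_i)\right]\{\mathrm{vec}[\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})']\} = N^{-1/2}\cdot O_p(1)\cdot O_p(1),$$

where we again use the fact that sample averages are $O_p(1)$ by the law of large numbers and $\mathrm{vec}[\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'] = O_p(1)$. We have shown that the last two terms in (6.40) are $o_p(1)$, which proves part (a).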

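b. By part (a), the asymptotic variance of $N^{-1/2}\sum_{i=1}^N \mathbf{h}_i'(\hat u_i^2 - \hat\sigma^2)$ is $\mathrm{Var}[(\mathbf{h}_i - \boldsymbol{\mu}_h)'(u_i^2 - \sigma^2)] = E[(u_i^2 - \sigma^2)^2(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)]$. Now $(u_i^2 - \sigma^2)^2 = u_i^4 - 2u_i^2\sigma^2 + \sigma^4$. Under the null, $E(u_i^2|\mathbf{x}_i) = \mathrm{Var}(u_i|\mathbf{x}_i) = \sigma^2$ [since $E(u_i|\mathbf{x}_i) = 0$ is assumed] and therefore, when we add (6.27), $E[(u_i^2 - \sigma^2)^2|\mathbf{x}_i] = \kappa^2 - \sigma^4 \equiv \eta^2$. A standard iterated expectations argument gives

$$E[(u_i^2 - \sigma^2)^2(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)] = E\{E[(u_i^2 - \sigma^2)^2(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)|\mathbf{x}_i]\} = E\{E[(u_i^2 - \sigma^2)^2|\mathbf{x}_i](\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)\}$$

[since $\mathbf{h}_i = \mathbf{h}(\mathbf{x}_i)$] $= \eta^2E[(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)]$. This is what we wanted to show. (Whether we do the argument for a random draw i or for random variables representing the population is a matter of taste.)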

c. From part (b) and Lemma 3.8, the following statistic has an asymptotic $\chi^2_Q$ distribution:

$$\left[N^{-1/2}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)\mathbf{h}_i\right]\{\eta^2E[(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)]\}^{-1}\left[N^{-1/2}\sum_{i=1}^N \mathbf{h}_i'(\hat u_i^2 - \hat\sigma^2)\right].$$

Using again the fact that $\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2) = 0$, we can replace $\mathbf{h}_i$ with $\mathbf{h}_i - \bar{\mathbf{h}}$ in the two vectors forming the quadratic form. Then, again by Lemma 3.8, we can replace the matrix in the quadratic form with a consistent estimator, which is

$$\hat\eta^2\left[N^{-1}\sum_{i=1}^N (\mathbf{h}_i - \bar{\mathbf{h}})'(\mathbf{h}_i - \bar{\mathbf{h}})\right],$$

where $\hat\eta^2 = N^{-1}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2$. The computable statistic, after simple algebra, can be written as

$$\left[\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)(\mathbf{h}_i - \bar{\mathbf{h}})\right]\left[\sum_{i=1}^N (\mathbf{h}_i - \bar{\mathbf{h}})'(\mathbf{h}_i - \bar{\mathbf{h}})\right]^{-1}\left[\sum_{i=1}^N (\mathbf{h}_i - \bar{\mathbf{h}})'(\hat u_i^2 - \hat\sigma^2)\right]\Big/\hat\eta^2.$$

Now $\hat\eta^2$ is just the total sum of squares in the $\hat u_i^2$, divided by N. The numerator of the statistic is simply the explained sum of squares from the regression $\hat u_i^2$ on 1, $\mathbf{h}_i$, i = 1,...,N. Therefore, the test statistic is N times the usual (centered) R-squared from the regression $\hat u_i^2$ on 1, $\mathbf{h}_i$, i = 1,...,N, or $NR_c^2$.

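d. Without assumption (6.27) we need to estimate $E[(u_i^2 - \sigma^2)^2(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)]$ generally. Hopefully, the approach is by now pretty clear. We replace the population expected value with the sample average and replace any unknown parameters, $\boldsymbol{\beta}$, $\sigma^2$, and $\boldsymbol{\mu}_h$, with their consistent estimators (under $H_0$). So a generally consistent estimator of $\mathrm{Avar}\left[N^{-1/2}\sum_{i=1}^N \mathbf{h}_i'(\hat u_i^2 - \hat\sigma^2)\right]$ is

$$N^{-1}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2(\mathbf{h}_i - \bar{\mathbf{h}})'(\mathbf{h}_i - \bar{\mathbf{h}}),$$

and the test statistic robust to heterokurtosis can be written as

$$\left[\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)(\mathbf{h}_i - \bar{\mathbf{h}})\right]\left[\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2(\mathbf{h}_i - \bar{\mathbf{h}})'(\mathbf{h}_i - \bar{\mathbf{h}})\right]^{-1}\left[\sum_{i=1}^N (\mathbf{h}_i - \bar{\mathbf{h}})'(\hat u_i^2 - \hat\sigma^2)\right],$$

which is easily seen to be the explained sum of squares from the regression of 1 on $(\hat u_i^2 - \hat\sigma^2)(\mathbf{h}_i - \bar{\mathbf{h}})$, i = 1,...,N (without an intercept). Since the total sum of squares, without demeaning, is N = (1 + 1 + ... + 1) (N times), the statistic is equivalent to N - SSR0, where SSR0 is the sum of squared residuals.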
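
The two computational shortcuts in parts (c) and (d) can be seen in a small numerical sketch. This is an illustration under assumptions, not the author's code: the squared residuals `u2` and the scalar test variable `h` are invented numbers, and the case of a scalar $h_i$ is used so that no matrix algebra is needed.

```python
# Hypothetical squared OLS residuals and a scalar h_i (illustrative values).
u2 = [0.2, 0.1, 0.9, 1.5, 1.1, 2.4]
h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = len(u2)
sig2_hat = sum(u2) / n   # SSR/N, no degrees-of-freedom adjustment
h_bar = sum(h) / n

# Part (c): N times the centered R-squared from regressing u2 on (1, h).
s_hu = sum((hi - h_bar) * (ui - sig2_hat) for hi, ui in zip(h, u2))
s_hh = sum((hi - h_bar) ** 2 for hi in h)
sst = sum((ui - sig2_hat) ** 2 for ui in u2)
nr2 = n * (s_hu ** 2 / s_hh) / sst

# Part (d): regress 1 on g_i = (u2_i - sig2_hat)(h_i - h_bar), no intercept.
g = [(ui - sig2_hat) * (hi - h_bar) for ui, hi in zip(u2, h)]
slope = sum(g) / sum(gi ** 2 for gi in g)
ssr0 = sum((1.0 - slope * gi) ** 2 for gi in g)
robust_stat = n - ssr0

# Quadratic-form version of the robust statistic for scalar h.
robust_direct = sum(g) ** 2 / sum(gi ** 2 for gi in g)

print(nr2, robust_stat, robust_direct)
```

The last two numbers coincide, which is the N - SSR0 equivalence stated in part (d).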

6.7. a.

. reg lprice ldist if y81

  Source |       SS       df       MS                  Number of obs =     142
---------+------------------------------               F(  1,   140) =   30.79
   Model |  3.86426989     1  3.86426989               Prob > F      =  0.0000
Residual |  17.5730845   140  .125522032               R-squared     =  0.1803
---------+------------------------------               Adj R-squared =  0.1744
   Total |  21.4373543   141  .152037974               Root MSE      =  .35429

------------------------------------------------------------------------------
  lprice |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   ldist |   .3648752   .0657613      5.548   0.000       .2348615    .4948889
   _cons |   8.047158   .6462419     12.452   0.000       6.769503    9.324813
------------------------------------------------------------------------------

This regression suggests a strong link between housing price and distance from the incinerator (as distance increases, so does housing price). The elasticity is about .365 and the t statistic is 5.55. However, this is not a good causal regression: the incinerator may have been put near homes with lower values to begin with. If so, we would expect the positive relationship found in the simple regression even if the new incinerator had no effect on housing prices.
b. The parameter $\delta_3$ should be positive: after the incinerator is built, a house should be worth more the farther it is from the incinerator. Here is my Stata session:
. gen y81ldist = y81*ldist
. reg lprice y81 ldist y81ldist
  Source |       SS       df       MS                  Number of obs =     321
---------+------------------------------               F(  3,   317) =   69.22
   Model |  24.3172548     3  8.10575159               Prob > F      =  0.0000
Residual |  37.1217306   317  .117103251               R-squared     =  0.3958
---------+------------------------------               Adj R-squared =  0.3901
   Total |  61.4389853   320  .191996829               Root MSE      =   .3422

------------------------------------------------------------------------------
  lprice |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     y81 |  -.0113101   .8050622     -0.014   0.989       -1.59525     1.57263
   ldist |    .316689   .0515323      6.145   0.000       .2153006    .4180775
y81ldist |   .0481862   .0817929      0.589   0.556      -.1127394    .2091117
   _cons |   8.058468   .5084358     15.850   0.000       7.058133    9.058803
------------------------------------------------------------------------------

The coefficient on ldist reveals the shortcoming of the regression in part (a). This coefficient measures the relationship between lprice and ldist in 1978, before the incinerator was even being rumored. The effect of the incinerator is given by the coefficient on the interaction, y81ldist. While the direction of the effect is as expected, it is not especially large, and it is statistically insignificant anyway. Therefore, at this point, we cannot reject the null hypothesis that building the incinerator had no effect on housing prices.

c. Adding the variables listed in the problem gives

. reg lprice y81 ldist y81ldist lintst lintstsq larea lland age agesq rooms
baths

  Source |       SS       df       MS                  Number of obs =     321
---------+------------------------------               F( 11,   309) =  108.04
   Model |  48.7611143    11  4.43282858               Prob > F      =  0.0000
Residual |   12.677871   309  .041028709               R-squared     =  0.7937
---------+------------------------------               Adj R-squared =  0.7863
   Total |  61.4389853   320  .191996829               Root MSE      =  .20256

------------------------------------------------------------------------------
  lprice |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     y81 |   -.229847   .4877198     -0.471   0.638      -1.189519    .7298249
   ldist |   .0866424   .0517205      1.675   0.095      -.0151265    .1884113
y81ldist |   .0617759   .0495705      1.246   0.214      -.0357625    .1593143
  lintst |   .9633332   .3262647      2.953   0.003       .3213518    1.605315
lintstsq |  -.0591504   .0187723     -3.151   0.002       -.096088   -.0222128
   larea |   .3548562   .0512328      6.926   0.000       .2540468    .4556655
   lland |    .109999   .0248165      4.432   0.000       .0611683    .1588297
     age |  -.0073939   .0014108     -5.241   0.000      -.0101699   -.0046178
   agesq |   .0000315   8.69e-06      3.627   0.000       .0000144    .0000486
   rooms |   .0469214   .0171015      2.744   0.006       .0132713    .0805715
   baths |   .0958867    .027479      3.489   0.000        .041817    .1499564
   _cons |   2.305525   1.774032      1.300   0.195      -1.185185    5.796236
------------------------------------------------------------------------------

The incinerator effect is now larger (the elasticity is about .062) and the t statistic is larger, but the interaction is still statistically insignificant. Using these models and these two years of data we must conclude the evidence that housing prices were adversely affected by the new incinerator is somewhat weak.

6.9. a. The Stata results are

. reg ldurat afchnge highearn afhigh male married head-construc if ky

      Source |       SS       df       MS              Number of obs =    5349
-------------+------------------------------           F( 14,  5334) =   16.37
       Model |  358.441793    14  25.6029852           Prob > F      =  0.0000
    Residual |  8341.41206  5334  1.56381928           R-squared     =  0.0412
-------------+------------------------------           Adj R-squared =  0.0387
       Total |  8699.85385  5348  1.62674904           Root MSE      =  1.2505

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     afchnge |   .0106274   .0449167     0.24   0.813    -.0774276    .0986824
    highearn |   .1757598   .0517462     3.40   0.001     .0743161    .2772035
      afhigh |   .2308768   .0695248     3.32   0.001     .0945798    .3671738
        male |  -.0979407   .0445498    -2.20   0.028    -.1852766   -.0106049
     married |   .1220995   .0391228     3.12   0.002     .0454027    .1987962
        head |  -.5139003   .1292776    -3.98   0.000    -.7673372   -.2604634
        neck |   .2699126   .1614899     1.67   0.095    -.0466737    .5864988
      upextr |   -.178539   .1011794    -1.76   0.078     -.376892    .0198141
       trunk |   .1264514   .1090163     1.16   0.246    -.0872651     .340168
     lowback |  -.0085967   .1015267    -0.08   0.933    -.2076305    .1904371
     lowextr |  -.1202911   .1023262    -1.18   0.240    -.3208922    .0803101
      occdis |   .2727118    .210769     1.29   0.196    -.1404816    .6859052
       manuf |  -.1606709   .0409038    -3.93   0.000    -.2408591   -.0804827
    construc |   .1101967   .0518063     2.13   0.033     .0086352    .2117581
       _cons |   1.245922   .1061677    11.74   0.000      1.03779    1.454054
------------------------------------------------------------------------------

The estimated coefficient on the interaction term is actually higher now, and even more statistically significant than in equation (6.33). Adding the other explanatory variables only slightly increased the standard error on the interaction term.
b. The small R-squared, on the order of 4.1%, or 3.9% if we used the adjusted R-squared, means that we cannot explain much of the variation in time on workers' compensation using the variables included in the regression. This is often the case in the social sciences. The low R-squared means that making predictions of log(durat) would be very difficult given the factors we have included in the regression. However, the low R-squared does not mean we have a biased or inconsistent estimator of the effect of the policy change. With over 5,000 observations, we can get a reasonably precise estimate of the effect, although the 95% confidence interval is pretty wide.
c. Using the data for Michigan to estimate the simple model gives

. reg ldurat afchnge highearn afhigh if mi

      Source |       SS       df       MS              Number of obs =    1524
-------------+------------------------------           F(  3,  1520) =    6.05
       Model |  34.3850177     3  11.4616726           Prob > F      =  0.0004
    Residual |  2879.96981  1520  1.89471698           R-squared     =  0.0118
-------------+------------------------------           Adj R-squared =  0.0098
       Total |  2914.35483  1523  1.91356194           Root MSE      =  1.3765

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     afchnge |   .0973808   .0847879     1.15   0.251    -.0689329    .2636945
    highearn |   .1691388   .1055676     1.60   0.109    -.0379348    .3762124
      afhigh |   .1919906   .1541699     1.25   0.213    -.1104176    .4943988
       _cons |   1.412737   .0567172    24.91   0.000     1.301485    1.523989
------------------------------------------------------------------------------

The coefficient on the interaction term, .192, is remarkably similar to that for Kentucky. However, with many fewer observations, the t statistic (1.25) is insignificant at the 10% level against a one-sided alternative. Asymptotic theory predicts that the standard error for Michigan will be about $(5{,}626/1{,}524)^{1/2} \approx 1.92$ times as large as that for Kentucky. The difference in the KY and MI cases shows the importance of a large sample size for this kind of policy analysis.
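
The square-root-of-sample-size prediction is easy to compute directly. The sketch below is only a rough check under stated assumptions: the Kentucky standard error is taken from the regression with controls in part (a), which uses 5,349 observations rather than the 5,626 quoted in the text, so the observed ratio is not an exact like-for-like comparison.

```python
import math

n_ky = 5626   # Kentucky sample size quoted in the text
n_mi = 1524   # Michigan sample size from the regression output

# Ratio of standard errors predicted by asymptotic theory.
predicted_ratio = math.sqrt(n_ky / n_mi)

se_afhigh_ky = 0.0695248  # Kentucky, regression with controls (part a)
se_afhigh_mi = 0.1541699  # Michigan, simple regression (part c)
observed_ratio = se_afhigh_mi / se_afhigh_ky

print(round(predicted_ratio, 3), round(observed_ratio, 3))
```

The observed ratio is somewhat larger than the predicted one, which is unsurprising given the different specifications and error variances in the two states.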

6.11. The following is Stata output that I will use to answer the first three
parts:

. reg lwage y85 educ y85educ exper expersq union female y85fem
      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .1178062   .1237817     0.95   0.341     -.125075    .3606874
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
     y85educ |   .0184605   .0093542     1.97   0.049      .000106     .036815
       exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
     expersq |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
       union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
      female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
      y85fem |    .085052    .051309     1.66   0.098    -.0156251     .185729
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

a. The return to another year of education increased by about .0185, or 1.85 percentage points, between 1978 and 1985. The t statistic is 1.97, which is marginally significant at the 5% level against a two-sided alternative.
b. The coefficient on y85fem is positive and shows that the estimated gender gap declined by about 8.5 percentage points. But the t statistic (1.66) is only significant at about the 10% level against a two-sided alternative. Still, this is suggestive of some closing of wage differentials between men and women at given levels of education and workforce experience.
c. Only the coefficient on y85 changes if wages are measured in 1978 dollars. In fact, you can check that when 1978 wages are used, the coefficient on y85 becomes about -.383, which shows a significant fall in real wages for given productivity characteristics and gender over the seven-year period. (But see part e for the proper interpretation of the coefficient.)
d. To answer this question, I just took the squared OLS residuals and regressed those on the year dummy, y85. The coefficient is about .042 with a standard error of about .022, which gives a t statistic of about 1.91. So there is some evidence that the variance of the unexplained part of log wages (or log real wages) has increased over time.
e. As the equation is written in the problem, the coefficient $\delta_0$ is the growth in nominal wages for a male with no years of education! For a male with 12 years of education, we want $\theta_0 \equiv \delta_0 + 12\delta_1$. A simple way to obtain the standard error of $\hat\theta_0 = \hat\delta_0 + 12\hat\delta_1$ is to replace y85·educ with y85·(educ - 12). Simple algebra shows that, in the new model, $\theta_0$ is the coefficient on y85. In Stata we have

. gen y85educ0 = y85*(educ - 12)

. reg lwage y85 educ y85educ0 exper expersq union female y85fem
      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .3393326   .0340099     9.98   0.000     .2725993    .4060659
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
    y85educ0 |   .0184605   .0093542     1.97   0.049      .000106     .036815
       exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
     expersq |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
       union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
      female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
      y85fem |    .085052    .051309     1.66   0.098    -.0156251     .185729
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

So the growth in nominal wages for a man with educ = 12 is about .339, or 33.9%. [We could use the more accurate estimate, obtained from exp(.339) - 1.]
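
As a quick check on the reparameterization, the point estimate of $\theta_0$ can be recovered from the first regression's coefficients, and the exact proportionate change follows from the log approximation. This is a small arithmetic sketch using the coefficients printed above; the variable names are chosen for this example only.

```python
import math

d0_hat = 0.1178062   # coefficient on y85 in the original regression
d1_hat = 0.0184605   # coefficient on y85educ in the original regression

# Growth in log wages for a man with 12 years of education.
theta0_hat = d0_hat + 12 * d1_hat

# Exact proportionate change implied by a log difference of .339.
exact_growth = math.exp(0.339) - 1

print(round(theta0_hat, 4), round(exact_growth, 4))
```

The first number matches the coefficient on y85 in the reparameterized regression up to rounding of the printed coefficients; what the second regression adds is the standard error, which cannot be recovered from the first output without the covariance between the two estimates.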


CHAPTER 7

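7.1. Write (with probability approaching one)

$$\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + \left(N^{-1}\sum_{i=1}^N \mathbf{X}_i'\mathbf{X}_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N \mathbf{X}_i'\mathbf{u}_i\right).$$

From SOLS.2, the weak law of large numbers, and Slutsky's Theorem,

$$\mathrm{plim}\left(N^{-1}\sum_{i=1}^N \mathbf{X}_i'\mathbf{X}_i\right)^{-1} = \mathbf{A}^{-1}.$$

Further, under SOLS.1, the WLLN implies that $\mathrm{plim}\left(N^{-1}\sum_{i=1}^N \mathbf{X}_i'\mathbf{u}_i\right) = \mathbf{0}$. Thus,

$$\mathrm{plim}\,\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + \mathrm{plim}\left(N^{-1}\sum_{i=1}^N \mathbf{X}_i'\mathbf{X}_i\right)^{-1}\cdot\mathrm{plim}\left(N^{-1}\sum_{i=1}^N \mathbf{X}_i'\mathbf{u}_i\right) = \boldsymbol{\beta} + \mathbf{A}^{-1}\cdot\mathbf{0} = \boldsymbol{\beta}.$$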

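7.3. a. Since OLS equation-by-equation is the same as GLS when $\boldsymbol{\Omega}$ is diagonal, it suffices to show that the GLS estimators for different equations are asymptotically uncorrelated. This follows if the asymptotic variance matrix is block diagonal (see Section 3.5), where the blocking is by the parameter vector for each equation. To establish block diagonality, we use the result from Theorem 7.4: under SGLS.1, SGLS.2, and SGLS.3,

$$\mathrm{Avar}\,\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = [E(\mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{X}_i)]^{-1}.$$

Now, we can use the special form of $\mathbf{X}_i$ for SUR (see Example 7.1), the fact that $\boldsymbol{\Omega}^{-1}$ is diagonal, and SGLS.3. In the SUR model with diagonal $\boldsymbol{\Omega}$, SGLS.3 implies that $E(u_{ig}^2\mathbf{x}_{ig}'\mathbf{x}_{ig}) = \sigma_g^2E(\mathbf{x}_{ig}'\mathbf{x}_{ig})$ for all g = 1,...,G, and $E(u_{ig}u_{ih}\mathbf{x}_{ig}'\mathbf{x}_{ih}) = E(u_{ig}u_{ih})E(\mathbf{x}_{ig}'\mathbf{x}_{ih}) = \mathbf{0}$, all $g \neq h$. Therefore, we have

$$E(\mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{X}_i) = \begin{pmatrix} \sigma_1^{-2}E(\mathbf{x}_{i1}'\mathbf{x}_{i1}) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \ddots & & \vdots \\ \vdots & & \ddots & \mathbf{0} \\ \mathbf{0} & \cdots & \mathbf{0} & \sigma_G^{-2}E(\mathbf{x}_{iG}'\mathbf{x}_{iG}) \end{pmatrix}.$$

When this matrix is inverted, it is also block diagonal. This shows that the asymptotic variance is block diagonal, which is what we wanted to show.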

b. To test any linear hypothesis, we can either construct the Wald statistic or we can use the weighted sum of squared residuals form of the statistic as in (7.52) or (7.53). For the restricted SSR we must estimate the model with the restriction $\boldsymbol{\beta}_1 = \boldsymbol{\beta}_2$ imposed. [...] impose general linear restrictions.

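c. When $\boldsymbol{\Omega}$ is diagonal in a SUR system, system OLS and GLS are the same. Under SGLS.1 and SGLS.2, GLS and FGLS are asymptotically equivalent (regardless of the structure of $\boldsymbol{\Omega}$) whether or not SGLS.3 holds. But, if $\hat{\boldsymbol{\beta}}_{SOLS} = \hat{\boldsymbol{\beta}}_{GLS}$ and $\sqrt{N}(\hat{\boldsymbol{\beta}}_{FGLS} - \hat{\boldsymbol{\beta}}_{GLS}) = o_p(1)$, then $\sqrt{N}(\hat{\boldsymbol{\beta}}_{SOLS} - \hat{\boldsymbol{\beta}}_{FGLS}) = o_p(1)$. Thus, when $\boldsymbol{\Omega}$ is diagonal, OLS and FGLS are asymptotically equivalent, even if $\hat{\boldsymbol{\Omega}}$ is estimated in an unrestricted fashion and even if the system homoskedasticity assumption SGLS.3 does not hold.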

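7.5. This is easy with the hint. Note that

$$\left[\hat{\boldsymbol{\Omega}}^{-1} \otimes \left(\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)\right]^{-1} = \hat{\boldsymbol{\Omega}} \otimes \left(\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}.$$

Therefore,

$$\hat{\boldsymbol{\beta}} = \left[\hat{\boldsymbol{\Omega}} \otimes \left(\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}\right](\hat{\boldsymbol{\Omega}}^{-1} \otimes \mathbf{I}_K)\begin{pmatrix} \sum_{i=1}^N \mathbf{x}_i'y_{i1} \\ \vdots \\ \sum_{i=1}^N \mathbf{x}_i'y_{iG} \end{pmatrix} = \left[\mathbf{I}_G \otimes \left(\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}\right]\begin{pmatrix} \sum_{i=1}^N \mathbf{x}_i'y_{i1} \\ \vdots \\ \sum_{i=1}^N \mathbf{x}_i'y_{iG} \end{pmatrix}.$$

Straightforward multiplication shows that the right hand side of the equation is just the vector of stacked $\hat{\boldsymbol{\beta}}_g$, g = 1,...,G, where $\hat{\boldsymbol{\beta}}_g$ is the OLS estimator for equation g.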

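7.7. a. First, the diagonal elements of $\boldsymbol{\Omega}$ are easily found since $E(u_{it}^2) = E[E(u_{it}^2|\mathbf{x}_{it})] = \sigma_t^2$ by iterated expectations. Now, consider $E(u_{it}u_{is})$, and take s < t without loss of generality. Under the maintained conditional mean assumption, $E(u_{it}|u_{is}) = 0$. Applying the law of iterated expectations (LIE) again we have $E(u_{it}u_{is}) = E[E(u_{it}u_{is}|u_{is})] = E[E(u_{it}|u_{is})u_{is}] = 0$.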
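b. The GLS estimator is

$$\boldsymbol{\beta}^* \equiv \left(\sum_{i=1}^N \mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{X}_i\right)^{-1}\left(\sum_{i=1}^N \mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{y}_i\right) = \left(\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}\mathbf{x}_{it}'\mathbf{x}_{it}\right)^{-1}\left(\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}\mathbf{x}_{it}'y_{it}\right).$$

c. If, say, $y_{it} = \beta_0 + \beta_1y_{i,t-1} + u_{it}$, then $y_{it}$ is clearly correlated with $u_{it}$, which says that $x_{i,t+1} = y_{it}$ is correlated with $u_{it}$. Thus, SGLS.1 does not hold. Generally, SGLS.1 fails whenever there is feedback from $y_{it}$ to $\mathbf{x}_{is}$, s > t. However, since $\boldsymbol{\Omega}^{-1}$ is diagonal, $\mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{u}_i = \sum_{t=1}^T \mathbf{x}_{it}'\sigma_t^{-2}u_{it}$, and so

$$E(\mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{u}_i) = \sum_{t=1}^T \sigma_t^{-2}E(\mathbf{x}_{it}'u_{it}) = \mathbf{0}$$

since $E(\mathbf{x}_{it}'u_{it}) = \mathbf{0}$ under (7.80). Thus, GLS is consistent in this case without SGLS.1.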
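d. First, since $\boldsymbol{\Omega}^{-1}$ is diagonal, $\mathbf{X}_i'\boldsymbol{\Omega}^{-1} = (\sigma_1^{-2}\mathbf{x}_{i1}', \sigma_2^{-2}\mathbf{x}_{i2}', \ldots, \sigma_T^{-2}\mathbf{x}_{iT}')$, and so

$$E(\mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{u}_i\mathbf{u}_i'\boldsymbol{\Omega}^{-1}\mathbf{X}_i) = \sum_{t=1}^T\sum_{s=1}^T \sigma_t^{-2}\sigma_s^{-2}E(u_{it}u_{is}\mathbf{x}_{it}'\mathbf{x}_{is}).$$

First consider the terms for $s \neq t$. If s < t, $E(u_{it}|\mathbf{x}_{it},u_{is},\mathbf{x}_{is}) = 0$, and so by the LIE, $E(u_{it}u_{is}\mathbf{x}_{it}'\mathbf{x}_{is}) = \mathbf{0}$, $t \neq s$. Next, for each t,

$$E(u_{it}^2\mathbf{x}_{it}'\mathbf{x}_{it}) = E[E(u_{it}^2\mathbf{x}_{it}'\mathbf{x}_{it}|\mathbf{x}_{it})] = E[E(u_{it}^2|\mathbf{x}_{it})\mathbf{x}_{it}'\mathbf{x}_{it}] = E[\sigma_t^2\mathbf{x}_{it}'\mathbf{x}_{it}] = \sigma_t^2E(\mathbf{x}_{it}'\mathbf{x}_{it}), \quad t = 1,2,\ldots,T.$$

It follows that

$$E(\mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{u}_i\mathbf{u}_i'\boldsymbol{\Omega}^{-1}\mathbf{X}_i) = \sum_{t=1}^T \sigma_t^{-2}E(\mathbf{x}_{it}'\mathbf{x}_{it}) = E(\mathbf{X}_i'\boldsymbol{\Omega}^{-1}\mathbf{X}_i).$$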

e. First, run pooled regression across all i and t; let $\hat u_{it}$ denote the pooled OLS residuals. Then, for each t, define

$$\hat\sigma_t^2 = N^{-1}\sum_{i=1}^N \hat u_{it}^2.$$

(We might replace N with N - K as a degrees-of-freedom adjustment.) Then, by standard arguments, $\hat\sigma_t^2 \overset{p}{\to} \sigma_t^2$ as $N \to \infty$.

f. We have verified the assumptions under which standard FGLS statistics have nice properties (although we relaxed SGLS.1). In particular, standard errors obtained from (7.51) are asymptotically valid, and F statistics from (7.53) are valid. Now, if $\hat{\boldsymbol{\Omega}}$ is taken to be the diagonal matrix with $\hat\sigma_t^2$ as the t-th diagonal, then the FGLS statistics are easily shown to be identical to the statistics obtained by performing pooled OLS on the equation

$$(y_{it}/\hat\sigma_t) = (\mathbf{x}_{it}/\hat\sigma_t)\boldsymbol{\beta} + error_{it}, \quad t = 1,2,\ldots,T,\ i = 1,\ldots,N.$$

We can obtain valid standard errors, t statistics, and F statistics from this weighted least squares analysis. For F testing, note that the $\hat\sigma_t^2$ should be obtained from the pooled OLS residuals for the unrestricted model.
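
The steps in parts (e) and (f) can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not in the original: a single regressor with no intercept, a tiny made-up balanced panel, and a hypothetical helper `pooled_slope`. It shows that pooled OLS on the variables weighted by $1/\hat\sigma_t$ reproduces the FGLS formula with a diagonal $\hat{\boldsymbol{\Omega}}$.

```python
import math

# Hypothetical balanced panel: rows are units i, columns are periods t.
x = [[1.0, 2.0, 3.0], [2.0, 1.0, 4.0], [0.5, 1.5, 2.5], [3.0, 2.0, 1.0]]
y = [[1.2, 1.7, 3.9], [2.1, 1.4, 3.1], [0.4, 1.9, 1.6], [3.2, 1.8, 2.2]]
N, T = len(x), len(x[0])

def pooled_slope(xs, ys):
    """Pooled OLS slope (single regressor, no intercept) over all i and t."""
    num = sum(xs[i][t] * ys[i][t] for i in range(N) for t in range(T))
    den = sum(xs[i][t] ** 2 for i in range(N) for t in range(T))
    return num / den

# Step 1: pooled OLS and its residuals.
b_pols = pooled_slope(x, y)
uhat = [[y[i][t] - b_pols * x[i][t] for t in range(T)] for i in range(N)]

# Step 2: period-specific variance estimates, averaging over i for each t.
sig2 = [sum(uhat[i][t] ** 2 for i in range(N)) / N for t in range(T)]

# Step 3: pooled OLS on the data divided by the estimated sigma_t.
xw = [[x[i][t] / math.sqrt(sig2[t]) for t in range(T)] for i in range(N)]
yw = [[y[i][t] / math.sqrt(sig2[t]) for t in range(T)] for i in range(N)]
b_wls = pooled_slope(xw, yw)

# Direct FGLS formula with a diagonal Omega-hat.
b_fgls = (sum(x[i][t] * y[i][t] / sig2[t] for i in range(N) for t in range(T))
          / sum(x[i][t] ** 2 / sig2[t] for i in range(N) for t in range(T)))

print(b_pols, b_wls, b_fgls)   # the last two coincide
```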
g. If $\sigma_t^2 = \sigma^2$ for all t = 1,...,T, FGLS reduces to pooled OLS.

7.9. The Stata session follows. I first test for serial correlation before computing the fully robust standard errors:

. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987

  Source |       SS       df       MS                  Number of obs =     108
---------+------------------------------               F(  4,   103) =  153.67
   Model |  186.376973     4  46.5942432               Prob > F      =  0.0000
Residual |  31.2296502   103  .303200488               R-squared     =  0.8565
---------+------------------------------               Adj R-squared =  0.8509
   Total |  217.606623   107  2.03370676               Root MSE      =  .55064

------------------------------------------------------------------------------
  lscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     d89 |  -.1153893   .1199127     -0.962   0.338      -.3532078    .1224292
   grant |  -.1723924   .1257443     -1.371   0.173      -.4217765    .0769918
 grant_1 |  -.1073226   .1610378     -0.666   0.507       -.426703    .2120579
lscrap_1 |   .8808216   .0357963     24.606   0.000        .809828    .9518152
   _cons |  -.0371354   .0883283     -0.420   0.675      -.2123137     .138043
------------------------------------------------------------------------------

The estimated effects of grant and its lag now have the expected sign, but neither is strongly statistically significant. The results are certainly different from when we omit the lag of log(scrap).

Now test for AR(1) serial correlation:
. predict uhat, resid
(363 missing values generated)
. gen uhat_1 = uhat[_n-1] if d89
(417 missing values generated)
. reg lscrap grant grant_1 lscrap_1 uhat_1 if d89
  Source |       SS       df       MS                  Number of obs =      54
---------+------------------------------               F(  4,    49) =   73.47
   Model |  94.4746525     4  23.6186631               Prob > F      =  0.0000
Residual |  15.7530202    49  .321490208               R-squared     =  0.8571
---------+------------------------------               Adj R-squared =  0.8454
   Total |  110.227673    53  2.07976741               Root MSE      =    .567

------------------------------------------------------------------------------
  lscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   grant |   .0165089    .215732      0.077   0.939      -.4170208    .4500385
 grant_1 |  -.0276544   .1746251     -0.158   0.875      -.3785767    .3232679
lscrap_1 |   .9204706   .0571831     16.097   0.000       .8055569    1.035384
  uhat_1 |   .2790328   .1576739      1.770   0.083      -.0378247    .5958904
   _cons |   -.232525   .1146314     -2.028   0.048      -.4628854   -.0021646
------------------------------------------------------------------------------

. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987, robust cluster(fcode)
Regression with robust standard errors                 Number of obs =     108
                                                       F(  4,    53) =   77.24
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8565
Number of clusters (fcode) = 54                        Root MSE      =  .55064

------------------------------------------------------------------------------
             |               Robust
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d89 |  -.1153893   .1145118    -1.01   0.318    -.3450708    .1142922
       grant |  -.1723924   .1188807    -1.45   0.153    -.4108369    .0660522
     grant_1 |  -.1073226   .1790052    -0.60   0.551    -.4663616    .2517165
    lscrap_1 |   .8808216   .0645344    13.65   0.000     .7513821    1.010261
       _cons |  -.0371354   .0893147    -0.42   0.679     -.216278    .1420073
------------------------------------------------------------------------------

The robust standard error for grant is actually smaller than the usual one, making grant slightly more statistically significant, while the robust standard error for grant_1 is larger than the usual one. In any case, grant and grant_1 are jointly insignificant:

. test grant grant_1

 ( 1)  grant = 0.0
 ( 2)  grant_1 = 0.0

       F(  2,    53) =    1.14
            Prob > F =    0.3266

7.11. a. The following Stata output should be self-explanatory. There is strong evidence of positive serial correlation in the static model, and the fully robust standard errors are much larger than the nonrobust ones.
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87
  Source |       SS       df       MS                  Number of obs =     630
---------+------------------------------               F( 11,   618) =   74.49
   Model |  117.644669    11  10.6949699               Prob > F      =  0.0000
Residual |   88.735673   618  .143585231               R-squared     =  0.5700
---------+------------------------------               Adj R-squared =  0.5624
   Total |  206.380342   629  .328108652               Root MSE      =  .37893

------------------------------------------------------------------------------
 lcrmrte |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lprbarr |  -.7195033   .0367657    -19.570   0.000      -.7917042   -.6473024
lprbconv |  -.5456589   .0263683    -20.694   0.000      -.5974413   -.4938765
lprbpris |   .2475521   .0672268      3.682   0.000       .1155314    .3795728
 lavgsen |  -.0867575   .0579205     -1.498   0.135      -.2005023    .0269872
  lpolpc |   .3659886   .0300252     12.189   0.000       .3070248    .4249525
     d82 |   .0051371    .057931      0.089   0.929      -.1086284    .1189026
     d83 |   -.043503   .0576243     -0.755   0.451      -.1566662    .0696601
     d84 |  -.1087542    .057923     -1.878   0.061       -.222504    .0049957
     d85 |  -.0780454   .0583244     -1.338   0.181      -.1925835    .0364927
     d86 |  -.0420791   .0578218     -0.728   0.467        -.15563    .0714718
     d87 |  -.0270426    .056899     -0.475   0.635      -.1387815    .0846963
   _cons |  -2.082293   .2516253     -8.275   0.000      -2.576438   -1.588149
------------------------------------------------------------------------------

. predict uhat, resid
. gen uhat_1 = uhat[_n-1] if year > 81
(90 missing values generated)
. reg uhat uhat_1
Source |
SS
df
MS
---------+-----------------------------Model | 46.6680407
1 46.6680407
Residual | 30.1968286
538 .056127934
---------+-----------------------------Total | 76.8648693
539 .142606437

Number of obs
F( 1,
538)
Prob > F
R-squared
Adj R-squared
Root MSE

=
=
=
=
=
=

540
831.46
0.0000
0.6071
0.6064
.23691

-----------------------------------------------------------------------------uhat |
Coef.
Std. Err.
t
P>|t|
[95% Conf. Interval]
---------+-------------------------------------------------------------------uhat_1 |
.7918085
.02746
28.835
0.000
.7378666
.8457504
_cons |
1.74e-10
.0101951
0.000
1.000
-.0200271
.0200271
-----------------------------------------------------------------------------Because of the strong serial correlation, I obtain the fully robust standard
errors:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87, robust cluster(county)

Regression with robust standard errors             Number of obs =     630
                                                   F( 11,    89) =   37.19
                                                   Prob > F      =  0.0000
                                                   R-squared     =  0.5700
Number of clusters (county) = 90                   Root MSE      =  .37893

------------------------------------------------------------------------------
         |               Robust
 lcrmrte |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lprbarr |  -.7195033   .1095979     -6.56  0.000    -.9372719   -.5017347
lprbconv |  -.5456589   .0704368     -7.75  0.000    -.6856152   -.4057025
lprbpris |   .2475521   .1088453      2.27  0.025     .0312787    .4638255
 lavgsen |  -.0867575   .1130321     -0.77  0.445    -.3113499    .1378348
  lpolpc |   .3659886    .121078      3.02  0.003     .1254092    .6065681
     d82 |   .0051371   .0367296      0.14  0.889    -.0678438    .0781181
     d83 |   -.043503    .033643     -1.29  0.199    -.1103509    .0233448
     d84 |  -.1087542   .0391758     -2.78  0.007    -.1865956   -.0309127
     d85 |  -.0780454   .0385625     -2.02  0.046    -.1546683   -.0014224
     d86 |  -.0420791   .0428788     -0.98  0.329    -.1272783    .0431201
     d87 |  -.0270426   .0381447     -0.71  0.480    -.1028353    .0487502
   _cons |  -2.082293   .8647054     -2.41  0.018    -3.800445   -.3641423
------------------------------------------------------------------------------
. drop uhat uhat_1
b. We lose the first year, 1981, when we add the lag of log(crmrte):
. gen lcrmrt_1 = lcrmrte[_n-1] if year > 81
(90 missing values generated)
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87 lcrmrt_1
  Source |       SS       df       MS              Number of obs =     540
---------+------------------------------           F( 11,   528) =  464.68
   Model |  163.287174    11  14.8442885           Prob > F      =  0.0000
Residual |  16.8670945   528  .031945255           R-squared     =  0.9064
---------+------------------------------           Adj R-squared =  0.9044
   Total |  180.154268   539  .334237975           Root MSE      =  .17873

------------------------------------------------------------------------------
 lcrmrte |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lprbarr |  -.1668349   .0229405    -7.273  0.000    -.2119007   -.1217691
lprbconv |  -.1285118   .0165096    -7.784  0.000    -.1609444   -.0960793
lprbpris |  -.0107492   .0345003    -0.312  0.755     -.078524    .0570255
 lavgsen |  -.1152298    .030387    -3.792  0.000     -.174924   -.0555355
  lpolpc |    .101492   .0164261     6.179  0.000     .0692234    .1337606
     d83 |  -.0649438   .0267299    -2.430  0.015    -.1174537   -.0124338
     d84 |  -.0536882   .0267623    -2.006  0.045    -.1062619   -.0011145
     d85 |  -.0085982   .0268172    -0.321  0.749    -.0612797    .0440833
     d86 |   .0420159    .026896     1.562  0.119    -.0108203    .0948522
     d87 |   .0671272   .0271816     2.470  0.014     .0137298    .1205245
lcrmrt_1 |   .8263047   .0190806    43.306  0.000     .7888214    .8637879
   _cons |  -.0304828   .1324195    -0.230  0.818    -.2906166     .229651
------------------------------------------------------------------------------
Not surprisingly, the lagged crime rate is very significant.

Further, the coefficients on lprbarr, lprbconv, and lpolpc are much smaller in
magnitude than in part a. The variable log(prbpris) now has a negative sign,
although it is insignificant. We still get a positive relationship between
size of police force and crime rate, however.
c. There is no evidence of serial correlation in the model with a lagged
dependent variable:
. predict uhat, resid
(90 missing values generated)
. gen uhat_1 = uhat[_n-1] if year > 82
(180 missing values generated)
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d84-d87 lcrmrt_1 uhat_1
From this regression the coefficient on uhat_1 is only -.059 with t statistic
-.986, which means that there is little evidence of serial correlation
(especially since $\hat{\rho}$ is practically small). Thus, I will not correct
the standard errors.
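
The lagged-residual regressions in parts a and c can be sketched outside Stata as follows. This is only an illustration under stated assumptions: the residuals are made up, and the helper names `lagged_pairs` and `ols_slope` are hypothetical. The point is that the lag is taken within a county only, which is what the `if year > 81` condition accomplishes in the Stata commands.

```python
def lagged_pairs(panel):
    """Pair each residual with its own-unit lag.

    The first period of every unit has no lag and is dropped, mirroring
    `gen uhat_1 = uhat[_n-1] if year > 81` on data sorted by unit and year.
    """
    pairs = []
    for series in panel.values():
        for t in range(1, len(series)):
            pairs.append((series[t - 1], series[t]))
    return pairs


def ols_slope(pairs):
    """Slope from a simple regression (with intercept) of y on x."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    return sxy / sxx


# Hypothetical residuals: two units, four periods, each built as u_t = 0.5 * u_{t-1}
panel = {"A": [1.0, 0.5, 0.25, 0.125], "B": [-2.0, -1.0, -0.5, -0.25]}
pairs = lagged_pairs(panel)
rho_hat = ols_slope(pairs)
print(len(pairs), rho_hat)
```

A large, significant slope (as in part a) signals serial correlation in the errors; a small, insignificant one (as in part c) does not.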
d. None of the log(wage) variables is statistically significant, and the
magnitudes are pretty small in all cases:
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87 lcrmrt_1 lwcon-lwloc
  Source |       SS       df       MS              Number of obs =     540
---------+------------------------------           F( 20,   519) =  255.32
   Model |  163.533423    20  8.17667116           Prob > F      =  0.0000
Residual |  16.6208452   519   .03202475           R-squared     =  0.9077
---------+------------------------------           Adj R-squared =  0.9042
   Total |  180.154268   539  .334237975           Root MSE      =  .17895

------------------------------------------------------------------------------
 lcrmrte |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lprbarr |  -.1746053   .0238458    -7.322  0.000    -.2214516   -.1277591
lprbconv |  -.1337714   .0169096    -7.911  0.000     -.166991   -.1005518
lprbpris |  -.0195318   .0352873    -0.554  0.580    -.0888553    .0497918
 lavgsen |  -.1108926   .0311719    -3.557  0.000    -.1721313    -.049654
  lpolpc |   .1050704   .0172627     6.087  0.000      .071157    .1389838
     d83 |  -.0729231   .0286922    -2.542  0.011    -.1292903   -.0165559
     d84 |  -.0652494   .0287165    -2.272  0.023    -.1216644   -.0088345
     d85 |  -.0258059   .0326156    -0.791  0.429    -.0898807     .038269
     d86 |   .0263763   .0371746     0.710  0.478    -.0466549    .0994076
     d87 |   .0465632   .0418004     1.114  0.266    -.0355555    .1286819
lcrmrt_1 |   .8087768   .0208067    38.871  0.000      .767901    .8496525
   lwcon |  -.0283133   .0392516    -0.721  0.471    -.1054249    .0487983
   lwtuc |  -.0034567   .0223995    -0.154  0.877    -.0474615    .0405482
   lwtrd |   .0121236   .0439875     0.276  0.783    -.0742918     .098539
   lwfir |   .0296003   .0318995     0.928  0.354    -.0330676    .0922683
   lwser |    .012903   .0221872     0.582  0.561    -.0306847    .0564908
   lwmfg |  -.0409046   .0389325    -1.051  0.294    -.1173893    .0355801
   lwfed |   .1070534   .0798526     1.341  0.181    -.0498207    .2639275
   lwsta |  -.0903894   .0660699    -1.368  0.172    -.2201867     .039408
   lwloc |   .0961124   .1003172     0.958  0.338    -.1009652      .29319
   _cons |  -.6438061   .6335887    -1.016  0.310     -1.88852    .6009076
------------------------------------------------------------------------------
. test lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc
 ( 1)  lwcon = 0.0
 ( 2)  lwtuc = 0.0
 ( 3)  lwtrd = 0.0
 ( 4)  lwfir = 0.0
 ( 5)  lwser = 0.0
 ( 6)  lwmfg = 0.0
 ( 7)  lwfed = 0.0
 ( 8)  lwsta = 0.0
 ( 9)  lwloc = 0.0

       F(  9,   519) =    0.85
            Prob > F =    0.5663
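
As a cross-check, the joint F statistic can be recovered by hand from the residual sums of squares reported in parts b and d, since the two regressions use the same 540 observations and the part b model is the restricted version of the part d model. The sketch below uses only numbers printed in the output above; the function name `f_statistic` is a hypothetical helper, not anything from Stata.

```python
def f_statistic(ssr_r, ssr_ur, q, df_ur):
    """Classical F statistic for q exclusion restrictions."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


ssr_restricted = 16.8670945    # Residual SS, part b (no wage variables)
ssr_unrestricted = 16.6208452  # Residual SS, part d (nine wage variables added)
F = f_statistic(ssr_restricted, ssr_unrestricted, q=9, df_ur=519)
print(round(F, 2))  # 0.85, matching the Stata test output
```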

CHAPTER 8

8.1. Letting Q(b) denote the objective function in (8.23), it follows from
multivariable calculus that

$$\frac{dQ(b)}{db} = -2\left(\sum_{i=1}^{N} Z_i'X_i\right)'\hat{W}\left(\sum_{i=1}^{N} Z_i'(y_i - X_ib)\right).$$

Evaluating the derivative at the solution $\hat{\beta}$ gives

$$\left(\sum_{i=1}^{N} Z_i'X_i\right)'\hat{W}\left(\sum_{i=1}^{N} Z_i'(y_i - X_i\hat{\beta})\right) = 0.$$

In terms of full data matrices, we can write, after simple algebra,

$$(X'Z\hat{W}Z'X)\hat{\beta} = (X'Z\hat{W}Z'Y).$$

Solving for $\hat{\beta}$ gives (8.24).
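
The "simple algebra" step can be written out as below. This is a sketch that assumes the usual stacking of the $Z_i$, $X_i$, and $y_i$ into full data matrices $Z$, $X$, and $Y$, and that $X'Z\hat{W}Z'X$ is nonsingular.

```latex
\sum_{i=1}^{N} Z_i'X_i = Z'X,
\qquad
\sum_{i=1}^{N} Z_i'(y_i - X_i\hat{\beta}) = Z'(Y - X\hat{\beta}),

(Z'X)'\hat{W}Z'(Y - X\hat{\beta}) = 0
\;\Longleftrightarrow\;
X'Z\hat{W}Z'X\hat{\beta} = X'Z\hat{W}Z'Y,

\hat{\beta} = (X'Z\hat{W}Z'X)^{-1}X'Z\hat{W}Z'Y .
```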

8.3. First, we can always write x as its linear projection plus an error:
$x = x^* + e$, where $x^* = z\Pi$ and $E(z'e) = 0$. Next, let $h \equiv h(z)$,
and write the linear projection as

$L(x|z,h) = z\Pi_1 + h\Pi_2$,

where $\Pi_1$ is M × K and $\Pi_2$ is Q × K. We must show that $\Pi_2 = 0$.
But, from the two-step projection result,
$\Pi_2 = [E(s's)]^{-1}E(s'r)$, where $s \equiv h - L(h|z)$ and
$r \equiv x - L(x|z)$.
Now, by the assumption that E(x|z) = L(x|z), r is also equal to x - E(x|z).
Therefore, E(r|z) = 0, and so r is uncorrelated with all functions of z. But
s is simply a function of z since $h \equiv h(z)$. Therefore $E(s'r) = 0$,
which shows that $\Pi_2 = 0$.

-1

-1

C*

-1/2

where D

_ *1/2WC.

-1

-1/2

C,

* L

symmetric, idempotent matrix $I_L - D(D'D)^{-1}D'$, it is necessarily itself
positive semi-definite.

8.7. When

)^

th

## is a block diagonal matrix with g

denotes the N

^2&

N
^
^
S Zi )
Zi = Z(IN t ))Z

i=1

* ^2
block sg S z
7i=1 igzig8 _ sgZg Zg, where Zg
N

th

block Z
g Xg.

Further, ZX


straightforward to show that the 3SLS estimator consists of
$[X_g'Z_g(Z_g'Z_g)^{-1}Z_g'X_g]^{-1}X_g'Z_g(Z_g'Z_g)^{-1}Z_g'Y_g$
stacked from g = 1,...,G.

This is just

8.9. The optimal instruments are given in Theorem 8.5, with G = 1:

$z_i^* = [\omega(z_i)]^{-1}E(x_i|z_i)$, $\omega(z_i) = E(u_i^2|z_i)$.

If $E(u_i^2|z_i) =$

so the optimal instruments are $z_i\Pi$.

2SLS, except that

--rN-consistent
OLS estimator.

or

^
^

The

is used, and so

## 2SLS is asymptotically efficient.

If E(u|x) = 0 and E(u

Without the

)(z)

$Var(u_1|z) = \sigma_1^2$.

Further,

It follows that the optimal instruments are $(1/\sigma_1^2)[z_1, E(y_2|z)]$.
Dropping the factor $1/\sigma_1^2$ clearly does not affect the optimal
instruments.
b. If y2 is binary then E(y2|z) = P(y2 = 1|z) = F(z), and so the optimal
IVs are [z1,F(z)].


CHAPTER 9

9.1. a. No. What causal inference could one draw from this? We may be
interested in the tradeoff between wages and benefits, but then either of
these can be taken as the dependent variable and the analysis would be by OLS.
Of course, if we have omitted some important factors or have a measurement
error problem, OLS could be inconsistent for estimating the tradeoff.

But it

b. Yes.

expenditures causing a reduction in crime, and we are certainly interested in
such thought experiments. If we could do the appropriate experiment, where
expenditures are assigned randomly across cities, then we could estimate the
crime equation by OLS. (In fact, we could use a simple regression analysis.)
The simultaneous equations model recognizes that cities choose law enforcement
expenditures in part on what they expect the crime rate to be. An SEM is a
convenient way to allow expenditures to depend on unobservables (to the
econometrician) that affect crime.
c. No. These are both choice variables of the firm, and the parameters
in a two-equation system modeling one in terms of the other, and vice versa,
have no economic meaning.

foreign technology affects foreign technology (FT) purchases, why would we
want to hold fixed R&D spending?

simultaneously chosen, but we should use a SUR model where neither is an
explanatory variable in the other's equation.
d. Yes.

determined by the demand for skills; alcohol consumption is determined by
individual behavior.
e. No. These are choice variables by the same household. It makes no
sense to think about how exogenous changes in one would affect the other.
Further, suppose that we look at the effects of changes in local property tax
rates. We would not want to hold fixed family saving and then measure the

SUR system with property tax as an explanatory variable seems to be the
appropriate model.
f. No.
profits.

First, the only variable excluded from the support equation is the variable
mremarr; since the support equation contains one endogenous variable, this
equation is identified if and only if $\delta_{21} \neq 0$. This ensures that
there is an exogenous variable shifting the mother's reaction function that
does not also shift the father's reaction function.
The visits equation is identified if and only if at least one of finc and
fremarr actually appears in the support equation; that is, we need
$\delta_{11} \neq 0$ or $\delta_{13} \neq 0$.
b. Each equation can be estimated by 2SLS using instruments 1, finc,
fremarr, dist, mremarr.
c. First, obtain the reduced form for visits:

$visits = \pi_{20} + \pi_{21}finc + \pi_{22}fremarr + \pi_{23}dist + \pi_{24}mremarr + v_2$.

Estimate this equation by OLS, and save the residuals, $\hat{v}_2$. Then, run
the OLS regression

support on 1, visits, finc, fremarr, dist, $\hat{v}_2$

and do a (heteroskedasticity-robust) t test that the coefficient on
$\hat{v}_2$ is zero.

the support equation.

d. There is one overidentifying restriction in the visits equation,
assuming that

Assuming homoskedasticity of $u_2$, the easiest way to test the
overidentifying restriction is to first estimate the visits equation by 2SLS,
as in part b. Let $\hat{u}_2$ be the 2SLS residuals. Then, run the auxiliary
regression

$\hat{u}_2$ on 1, finc, fremarr, dist, mremarr;

the sample size times the usual R-squared from this regression is distributed
asymptotically as $\chi^2_1$ under the null hypothesis that all instruments are
exogenous.
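
Once the auxiliary regression has been run, turning N times its R-squared into a p-value is a one-line calculation. The sketch below uses the identity that a chi-square variable with one degree of freedom is a squared standard normal; the inputs (N = 500, R-squared = .004) are hypothetical and the function names are illustrative only.

```python
import math


def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable.

    Uses P(Z^2 > stat) = erfc(sqrt(stat / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(stat / 2.0))


def overid_test(n_obs, r_squared):
    """N * R-squared statistic and its chi-square(1) p-value."""
    stat = n_obs * r_squared
    return stat, chi2_1_pvalue(stat)


# Hypothetical auxiliary-regression results
stat, pval = overid_test(n_obs=500, r_squared=0.004)
print(stat, pval)
```

With more than one overidentifying restriction the degrees of freedom change, and this shortcut for the p-value no longer applies.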
A heteroskedasticity-robust test is also easy to obtain. Let
$\widehat{support}$ denote the fitted values from the reduced form regression
for support. Next, regress finc (or fremarr) on $\widehat{support}$, mremarr,
dist, and save the residuals, say $\hat{r}_1$. Then, run the simple regression
(without intercept) of 1 on $\hat{u}_2\hat{r}_1$; N -

9.5. a. Let

B1

denote the 7

## with only the normalization restriction imposed:

B1

= (-1,g12,g13,d11,d12,d13,d14).

The restrictions

R1 =

&0 0
71 0

0
0

0
0

1
0

0*
.
18

0
1

## Because R1 has two rows, and G - 1 = 2, the order condition is satisfied.

Now, we need to check the rank condition.

* 3 matrix

## of all structural parameters with only the three normalizations,

straightforward matrix multiplication gives
R1B =

&
d12
2d + d - 1
14
7 13

d23

d22
+ d24 - g21

d33

d32
*
2.
+ d34 - g31
8

R1B is zero.

## d23 = 0, d24 = 0, g31 = 0, and g32 = 0,

&0
0
R1B = 2
70 -g21 d33
Identification requires g21 \$ 0 and d32 \$

0,

But

g23 = 0, d22 =

## and so R1B becomes

d32
*
2.
+ d34 - g31
8
0.

b. It is easy to see how to estimate the first equation under the given
assumptions.

Set

After simple

algebra we get
y1 - z4 =

Note

## that, if we just count instruments, there are just enough instruments to

estimate this equation.

9.7. a. Because alcohol and educ are endogenous in the first equation, we need
at least two elements in z(2) and/or z(3) that are not also in z(1). Ideally,
we have at least one such element in z(2) and at least one such element in
z(3).
b. Let z denote all nonredundant exogenous variables in the system. Then
use these as instruments in a 2SLS analysis.

c. The matrix of instruments for each i is

$$Z_i = \begin{pmatrix} z_i & 0 & 0 \\ 0 & (z_i, educ_i) & 0 \\ 0 & 0 & z_i \end{pmatrix}.$$

d. z(3) = z.

the reduced form for educ.

9.9. a. Here is my Stata output for the 3SLS estimation of (9.28) and (9.29):

. reg3 (hours lwage educ age kidslt6 kidsge6 nwifeinc) (lwage hours educ exper
expersq)
Three-stage least squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
hours             428      6    1368.362   -2.1145   34.53608   0.0000
lwage             428      4    .6892584    0.0895   79.87188   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
         |      Coef.  Std. Err.         z  P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
hours    |
   lwage |   1676.933    431.169      3.89  0.000     831.8577    2522.009
    educ |  -205.0267   51.84729     -3.95  0.000    -306.6455   -103.4078
     age |  -12.28121   8.261529     -1.49  0.137    -28.47351    3.911094
 kidslt6 |  -200.5673   134.2685     -1.49  0.135    -463.7287    62.59414
 kidsge6 |  -48.63986   35.95137     -1.35  0.176    -119.1032    21.82352
nwifeinc |   .3678943   3.451518      0.11  0.915    -6.396957    7.132745
   _cons |   2504.799   535.8919      4.67  0.000      1454.47    3555.128
---------+--------------------------------------------------------------------
lwage    |
   hours |    .000201   .0002109      0.95  0.340    -.0002123    .0006143
    educ |   .1129699   .0151452      7.46  0.000     .0832858    .1426539
   exper |   .0208906   .0142782      1.46  0.143    -.0070942    .0488753
 expersq |  -.0002943   .0002614     -1.13  0.260    -.0008066     .000218
   _cons |  -.7051103   .3045904     -2.31  0.021    -1.302097   -.1081241
------------------------------------------------------------------------------
Endogenous variables:  hours lwage
Exogenous variables:   educ age kidslt6 kidsge6 nwifeinc exper expersq
------------------------------------------------------------------------------

b. To be added.

conveniently allow system estimation using different instruments for different
equations.

9.11. a. Since $z_2$ and $z_3$ are both omitted from the first equation, we
just need $\delta_{11} \neq 0$.

$\pi_{11} = \delta_{11}/(1 - \gamma_{12}\gamma_{21})$.

c. We can estimate the system by 3SLS; for the second equation, this is
identical to 2SLS since it is just identified.
each equation.

Given

## d. Whether we estimate the parameters by 2SLS or 3SLS, we will generally

inconsistently estimate

## equation by 2SLS, we will still consistently estimate

misspecified this equation.)

So our estimate of

## inconsistent in any case.

e. We can just estimate the reduced form E(y2|z1,z2,z3) by ordinary least
squares.
f. Consistency of OLS for

exclusion restrictions in the structural model, whereas using an SEM does. Of
course, if the SEM is correctly specified, we obtain a more efficient

estimating $\pi_{11}$.

$\delta_{22} \neq 0$. (This is the rank condition.)

b. Here is my Stata output:
. reg open lpcinc lland
  Source |       SS       df       MS              Number of obs =     114
---------+------------------------------           F(  2,   111) =   45.17
   Model |  28606.1936     2  14303.0968           Prob > F      =  0.0000
Residual |  35151.7966   111  316.682852           R-squared     =  0.4487
---------+------------------------------           Adj R-squared =  0.4387
   Total |  63757.9902   113  564.230002           Root MSE      =  17.796

------------------------------------------------------------------------------
    open |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
  lpcinc |   .5464812    1.49324     0.366  0.715    -2.412473    3.505435
   lland |  -7.567103   .8142162    -9.294  0.000    -9.180527   -5.953679
   _cons |   117.0845    15.8483     7.388  0.000     85.68006     148.489
------------------------------------------------------------------------------
This shows that log(land) is very statistically significant in the RF for
open. Smaller countries are more open.

. reg inf open lpcinc (lland lpcinc)

                                                   (2SLS)
  Source |       SS       df       MS              Number of obs =     114
---------+------------------------------           F(  2,   111) =    2.79
   Model |  2009.22775     2  1004.61387           Prob > F      =  0.0657
Residual |   63064.194   111  568.145892           R-squared     =  0.0309
---------+------------------------------           Adj R-squared =  0.0134
   Total |  65073.4217   113  575.870989           Root MSE      =  23.836

------------------------------------------------------------------------------
     inf |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    open |  -.3374871   .1441212    -2.342  0.021    -.6230728   -.0519014
  lpcinc |   .3758247   2.015081     0.187  0.852    -3.617192    4.368841
   _cons |   26.89934    15.4012     1.747  0.083     -3.61916    57.41783
------------------------------------------------------------------------------
. reg inf open lpcinc

  Source |       SS       df       MS              Number of obs =     114
---------+------------------------------           F(  2,   111) =    2.63
   Model |  2945.92812     2  1472.96406           Prob > F      =  0.0764
Residual |  62127.4936   111   559.70715           R-squared     =  0.0453
---------+------------------------------           Adj R-squared =  0.0281
   Total |  65073.4217   113  575.870989           Root MSE      =  23.658

------------------------------------------------------------------------------
     inf |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    open |  -.2150695   .0946289    -2.273  0.025     -.402583    -.027556
  lpcinc |   .0175683   1.975267     0.009  0.993    -3.896555    3.931692
   _cons |   25.10403   15.20522     1.651  0.102    -5.026122    55.23419
------------------------------------------------------------------------------
The 2SLS estimate is notably larger in magnitude.
has a larger standard error.

endogenous.
d. If we add

Since log(land) is partially correlated with open, $[\log(land)]^2$ is a
natural candidate. A regression of open

gives a heteroskedasticity-robust t statistic on $[\log(land)]^2$ of about 2.
This is borderline, but we will go ahead.

. gen opensq = open^2

. gen llandsq = lland^2
. reg inf open opensq lpcinc (lland llandsq lpcinc)
                                                   (2SLS)
  Source |       SS       df       MS              Number of obs =     114
---------+------------------------------           F(  3,   110) =    2.09
   Model | -414.331026     3 -138.110342           Prob > F      =  0.1060
Residual |  65487.7527   110  595.343207           R-squared     =       .
---------+------------------------------           Adj R-squared =       .
   Total |  65073.4217   113  575.870989           Root MSE      =   24.40

------------------------------------------------------------------------------
     inf |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    open |  -1.198637   .6205699    -1.932  0.056    -2.428461    .0311868
  opensq |   .0075781   .0049828     1.521  0.131    -.0022966    .0174527
  lpcinc |   .5066092   2.069134     0.245  0.807    -3.593929    4.607147
   _cons |   43.17124   19.36141     2.230  0.028     4.801467    81.54102
------------------------------------------------------------------------------

The squared term indicates that the impact of open on inf diminishes; the
estimate would be significant at about the 6.5% level against a one-sided
alternative.
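
To see what "diminishes" means numerically, the estimated partial effect of open on inf in the quadratic model is the derivative of the fitted equation with respect to open. The sketch below plugs in the two 2SLS point estimates from the table above; the evaluation points and the names `marginal_effect` and `turning_point` are illustrative choices, and sampling error in the estimates is ignored.

```python
b_open = -1.198637    # 2SLS coefficient on open
b_opensq = 0.0075781  # 2SLS coefficient on opensq


def marginal_effect(open_value):
    """Estimated d(inf)/d(open) at a given value of open."""
    return b_open + 2.0 * b_opensq * open_value


# Value of open at which the estimated effect reaches zero
turning_point = -b_open / (2.0 * b_opensq)

for value in (0.0, 20.0, 40.0, 60.0):
    print(value, round(marginal_effect(value), 3))
print(round(turning_point, 2))
```

The estimated effect is negative but shrinks toward zero as open rises, reaching zero near open = 79 at these point estimates.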
e. Here is the Stata output for implementing the method described in the
problem:
. reg open lpcinc lland
  Source |       SS       df       MS              Number of obs =     114
---------+------------------------------           F(  2,   111) =   45.17
   Model |  28606.1936     2  14303.0968           Prob > F      =  0.0000
Residual |  35151.7966   111  316.682852           R-squared     =  0.4487
---------+------------------------------           Adj R-squared =  0.4387
   Total |  63757.9902   113  564.230002           Root MSE      =  17.796

------------------------------------------------------------------------------
    open |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
  lpcinc |   .5464812    1.49324      0.37  0.715    -2.412473    3.505435
   lland |  -7.567103   .8142162     -9.29  0.000    -9.180527   -5.953679
   _cons |   117.0845    15.8483      7.39  0.000     85.68006     148.489
------------------------------------------------------------------------------
. predict openh
(option xb assumed; fitted values)
. gen openhsq = openh^2
. reg inf openh openhsq lpcinc
  Source |       SS       df       MS              Number of obs =     114
---------+------------------------------           F(  3,   110) =    2.24
   Model |  3743.18411     3  1247.72804           Prob > F      =  0.0879
Residual |  61330.2376   110  557.547615           R-squared     =  0.0575
---------+------------------------------           Adj R-squared =  0.0318
   Total |  65073.4217   113  575.870989           Root MSE      =  23.612

------------------------------------------------------------------------------
     inf |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   openh |  -.8648092   .5394132     -1.60  0.112    -1.933799     .204181
 openhsq |   .0060502   .0059682      1.01  0.313    -.0057774    .0178777
  lpcinc |   .0412172   2.023302      0.02  0.984    -3.968493    4.050927
   _cons |   39.17831   19.48041      2.01  0.047     .5727026    77.78391
------------------------------------------------------------------------------

Qualitatively, the results are similar to the correct IV method from part d.
If

But the forbidden regression implemented in this part is unnecessary, less
robust, and we cannot trust the standard errors, anyway.

CHAPTER 10

10.1. a. Since investment is likely to be affected by macroeconomic factors,
it is important to allow for these by including separate time intercepts; this
is done by using T - 1 time period dummies.
b. Putting the unobserved effect $c_i$ in the equation is a simple way to
account for time-constant features of a county that affect investment and
might also be correlated with the tax variable.

county economic climate, which affects investment, could easily be correlated
with tax rates because tax rates are, at least to a certain extent, selected
by state and local officials. If only a cross section were available, we
would have to find an instrument for the tax variable that is uncorrelated
with $c_i$ and correlated with the tax rate.
c. Standard investment theories suggest that, ceteris paribus, larger
marginal tax rates decrease investment.
d. I would start with a fixed effects analysis to allow arbitrary
correlation between all time-varying explanatory variables and $c_i$.
(Actually, doing pooled OLS is a useful initial exercise; these results can be
compared with those from an FE analysis). Such an analysis assumes strict
exogeneity of $z_{it}$, $tax_{it}$, and $disaster_{it}$ in the sense that
these are uncorrelated with the errors $u_{is}$ for all t and s.
I have no strong intuition for the likely serial correlation properties
of the $\{u_{it}\}$. These might have little serial correlation because we
have allowed for $c_i$, in which case I would use standard fixed effects.
However, it seems more likely that the $u_{it}$ are positively autocorrelated,
in which case I might use first differencing instead.

fully robust standard errors along with the usual ones.

differencing it is easy to test whether the changes $\Delta u_{it}$ are
serially uncorrelated.
e. If $tax_{it}$ and $disaster_{it}$ do not have lagged effects on investment,
then the only possible violation of the strict exogeneity assumption is if
future values of these variables are correlated with $u_{it}$.
this is not a worry for the disaster variable: presumably, future natural

On the other hand, state officials might look at the levels of past investment
in determining future tax policy, especially if there is a target level of tax
revenue the officials are trying to achieve.
rates: sometimes property tax rates are set depending on recent housing
values, since a larger base means a smaller rate can achieve the same amount
of revenue.

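10.3. a. Let $\bar{x}_i = (x_{i1} + x_{i2})/2$, $\bar{y}_i = (y_{i1} + y_{i2})/2$,
$\ddot{x}_{i1} = x_{i1} - \bar{x}_i$, $\ddot{x}_{i2} = x_{i2} - \bar{x}_i$, and
similarly for $\ddot{y}_{i1}$ and $\ddot{y}_{i2}$. For T = 2 the fixed effects
estimator can be written as

$$\hat{\beta}_{FE} = \left(\sum_{i=1}^{N} (\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2})\right)^{-1}\left(\sum_{i=1}^{N} (\ddot{x}_{i1}'\ddot{y}_{i1} + \ddot{x}_{i2}'\ddot{y}_{i2})\right).$$

Now, by simple algebra, $\ddot{x}_{i1} = -\Delta x_i/2$, $\ddot{x}_{i2} = \Delta x_i/2$,
$\ddot{y}_{i1} = -\Delta y_i/2$, and $\ddot{y}_{i2} = \Delta y_i/2$. Therefore,

$$\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2} = \Delta x_i'\Delta x_i/4 + \Delta x_i'\Delta x_i/4 = \Delta x_i'\Delta x_i/2$$

$$\ddot{x}_{i1}'\ddot{y}_{i1} + \ddot{x}_{i2}'\ddot{y}_{i2} = \Delta x_i'\Delta y_i/4 + \Delta x_i'\Delta y_i/4 = \Delta x_i'\Delta y_i/2,$$

and so

$$\hat{\beta}_{FE} = \left(\sum_{i=1}^{N} \Delta x_i'\Delta x_i/2\right)^{-1}\left(\sum_{i=1}^{N} \Delta x_i'\Delta y_i/2\right) = \left(\sum_{i=1}^{N} \Delta x_i'\Delta x_i\right)^{-1}\left(\sum_{i=1}^{N} \Delta x_i'\Delta y_i\right) = \hat{\beta}_{FD}.$$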

b. Let $\hat{u}_{i1} = \ddot{y}_{i1} - \ddot{x}_{i1}\hat{\beta}_{FE}$ and
$\hat{u}_{i2} = \ddot{y}_{i2} - \ddot{x}_{i2}\hat{\beta}_{FE}$ be the fixed
effects residuals for the two time periods for cross section observation i.
Since $\hat{\beta}_{FE} = \hat{\beta}_{FD}$, and using the representations in
(4.1), we have

$$\hat{u}_{i1} = -\Delta y_i/2 - (-\Delta x_i/2)\hat{\beta}_{FD} = -(\Delta y_i - \Delta x_i\hat{\beta}_{FD})/2 \equiv -\hat{e}_i/2$$

$$\hat{u}_{i2} = \Delta y_i/2 - (\Delta x_i/2)\hat{\beta}_{FD} = (\Delta y_i - \Delta x_i\hat{\beta}_{FD})/2 \equiv \hat{e}_i/2,$$

where $\hat{e}_i \equiv \Delta y_i - \Delta x_i\hat{\beta}_{FD}$ are the first
difference residuals, i = 1,2,...,N. Therefore,

$$\sum_{i=1}^{N} (\hat{u}_{i1}^2 + \hat{u}_{i2}^2) = (1/2)\sum_{i=1}^{N} \hat{e}_i^2.$$

This shows that the sum of squared residuals from the fixed effects regression
is exactly one half the sum of squared residuals from the first difference
regression. Since we know the variance estimate for fixed effects is the SSR
divided by N - K (when T = 2), and the variance estimate for first difference
is the SSR divided by N - K, the error variance from fixed effects is always
half the size of the error variance for first difference estimation, that is,
$\hat{\sigma}_u^2 = \hat{\sigma}_e^2/2$ (contrary to what the problem asks you
to show). What I wanted you to show is that the variance matrix estimates of
$\hat{\beta}_{FE}$ and $\hat{\beta}_{FD}$ are identical. This is easy since the
variance matrix estimate for fixed effects is

$$\hat{\sigma}_u^2\left(\sum_{i=1}^{N} (\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2})\right)^{-1} = (\hat{\sigma}_e^2/2)\left(\sum_{i=1}^{N} \Delta x_i'\Delta x_i/2\right)^{-1} = \hat{\sigma}_e^2\left(\sum_{i=1}^{N} \Delta x_i'\Delta x_i\right)^{-1},$$

which is the variance matrix estimator for first difference. Thus, the
standard errors, and in fact all other test statistics (F statistics) will be
numerically identical using the two approaches.
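
The equivalence is easy to confirm numerically. The sketch below uses a made-up two-period panel with a single regressor (so K = 1) and no intercept in either transformed regression, as in the algebra above; all data values and variable names are hypothetical.

```python
# Hypothetical panel: (x_i1, x_i2, y_i1, y_i2) for N = 4 units
data = [
    (1.0, 2.0, 1.5, 3.1),
    (0.5, 2.5, 0.2, 4.0),
    (3.0, 1.0, 5.0, 2.2),
    (2.0, 2.5, 1.0, 2.4),
]
N, K = len(data), 1

# First-difference estimator and residual sum of squares
dx = [x2 - x1 for x1, x2, _, _ in data]
dy = [y2 - y1 for _, _, y1, y2 in data]
sxx_fd = sum(a * a for a in dx)
b_fd = sum(a * b for a, b in zip(dx, dy)) / sxx_fd
ssr_fd = sum((b - a * b_fd) ** 2 for a, b in zip(dx, dy))

# Fixed effects (within) estimator on time-demeaned data
xdd, ydd = [], []
for x1, x2, y1, y2 in data:
    xbar, ybar = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    xdd += [x1 - xbar, x2 - xbar]
    ydd += [y1 - ybar, y2 - ybar]
sxx_fe = sum(a * a for a in xdd)
b_fe = sum(a * b for a, b in zip(xdd, ydd)) / sxx_fe
ssr_fe = sum((b - a * b_fe) ** 2 for a, b in zip(xdd, ydd))

# Variance estimates: both use N - K degrees of freedom when T = 2
var_fe = (ssr_fe / (N - K)) / sxx_fe
var_fd = (ssr_fd / (N - K)) / sxx_fd

print(b_fe, b_fd)
print(ssr_fe, ssr_fd / 2.0)
print(var_fe, var_fd)
```

Each printed pair agrees up to floating-point rounding: the slope estimates coincide, the FE sum of squared residuals is half the FD one, and the two variance estimates match.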

$v_iv_i' = c_i^2j_Tj_T' + u_iu_i' + j_T(c_iu_i') + (c_iu_i)j_T'$. Under RE.1,
$E(u_i|x_i,c_i) = 0$, which implies that $E[(c_iu_i')|x_i] = 0$ by iterated
expectations. Under RE.3a, $E(u_iu_i'|x_i,c_i) = \sigma_u^2I_T$, which implies
that $E(u_iu_i'|x_i) = \sigma_u^2I_T$ (again, by iterated expectations).
Therefore,

$$E(v_iv_i'|x_i) = E(c_i^2|x_i)j_Tj_T' + E(u_iu_i'|x_i) = h(x_i)j_Tj_T' + \sigma_u^2I_T,$$

where $h(x_i) \equiv E(c_i^2|x_i)$. This shows that the conditional variance
matrix of $v_i$ given $x_i$ has the same covariance for all $t \neq s$,
$h(x_i)$, and the same variance for all t, $h(x_i) + \sigma_u^2$.

variances and covariances depend on $x_i$ in general, they do not depend on
time separately.
b. The RE estimator is still consistent and $\sqrt{N}$-asymptotically normal
without assumption RE.3b, but the usual random effects variance estimator of
$\hat{\beta}_{RE}$ is no longer valid because $E(v_iv_i'|x_i)$ does not have
the form (10.30)


10.7. I provide annotated Stata output, and I compute the nonrobust
regression-based statistic from equation (11.79):

. * random effects estimation

. iis id
. tis term
. xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize black
female, re

Random-effects GLS regression

sd(u_id)           =  .3718544                 Number of obs =     732
sd(e_id_t)         =  .4088283                             n =     366
sd(e_id_t + u_id)  =  .5526448                             T =       2

corr(u_id, X)      =  0 (assumed)              R-sq within   =  0.2067
                                                    between  =  0.5390
                                                    overall  =  0.4785

                                               chi2( 10)     =  512.77
(theta = 0.3862)                               Prob > chi2   =  0.0000

------------------------------------------------------------------------------
  trmgpa |      Coef.  Std. Err.         z  P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
  spring |  -.0606536   .0371605    -1.632  0.103    -.1334868    .0121797
  crsgpa |   1.082365   .0930877    11.627  0.000     .8999166    1.264814
 frstsem |   .0029948   .0599542     0.050  0.960    -.1145132    .1205028
  season |  -.0440992   .0392381    -1.124  0.261    -.1210044    .0328061
     sat |   .0017052   .0001771     9.630  0.000     .0013582    .0020523
verbmath |  -.1575199     .16351    -0.963  0.335    -.4779937    .1629538
  hsperc |  -.0084622   .0012426    -6.810  0.000    -.0108977   -.0060268
  hssize |  -.0000775   .0001248    -0.621  0.534     -.000322     .000167
   black |  -.2348189   .0681573    -3.445  0.000    -.3684048   -.1012331
  female |   .3581529   .0612948     5.843  0.000     .2380173    .4782886
   _cons |   -1.73492   .3566599    -4.864  0.000     -2.43396   -1.035879
------------------------------------------------------------------------------
. * fixed effects estimation, with time-varying variables only.
. xtreg trmgpa spring crsgpa frstsem season, fe
Fixed-effects (within) regression

sd(u_id)           =   .679133                 Number of obs =     732
sd(e_id_t)         =  .4088283                             n =     366
sd(e_id_t + u_id)  =   .792693                             T =       2

corr(u_id, Xb)     =  -0.0893                  R-sq within   =  0.2069
                                                    between  =  0.0333
                                                    overall  =  0.0613

                                               F(  4,   362) =   23.61
                                               Prob > F      =  0.0000

------------------------------------------------------------------------------
  trmgpa |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
  spring |  -.0657817   .0391404    -1.681  0.094    -.1427528    .0111895
  crsgpa |   1.140688   .1186538     9.614  0.000     .9073506    1.374025
 frstsem |   .0128523   .0688364     0.187  0.852    -.1225172    .1482218
  season |  -.0566454   .0414748    -1.366  0.173    -.1382072    .0249165
   _cons |  -.7708056   .3305004    -2.332  0.020    -1.420747   -.1208637
------------------------------------------------------------------------------
      id |      F(365,362) =    5.399   0.000          (366 categories)

. * Obtaining the regression-based Hausman test is a bit tedious. First,
. * compute the time-averages for all of the time-varying variables:
. egen atrmgpa = mean(trmgpa), by(id)
. egen aspring = mean(spring), by(id)
. egen acrsgpa = mean(crsgpa), by(id)
. egen afrstsem = mean(frstsem), by(id)
. egen aseason = mean(season), by(id)
. * Now obtain GLS transformations for both time-constant and
. * time-varying variables. Note that lambdahat = .386.
. di 1 - .386
.614
. gen bone = .614
. gen bsat = .614*sat
. gen bvrbmth = .614*verbmath
. gen bhsperc = .614*hsperc
. gen bhssize = .614*hssize
. gen bblack = .614*black

. gen bfemale = .614*female
. gen btrmgpa = trmgpa - .386*atrmgpa
. gen bspring = spring - .386*aspring
. gen bcrsgpa = crsgpa - .386*acrsgpa
. gen bfrstsem = frstsem - .386*afrstsem
. gen bseason = season - .386*aseason
. * Check to make sure that pooled OLS on transformed data is random
. * effects.
. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc
bhssize bblack bfemale, nocons
  Source |       SS       df       MS              Number of obs =     732
---------+------------------------------           F( 11,   721) =  862.67
   Model |  1584.10163    11  144.009239           Prob > F      =  0.0000
Residual |  120.359125   721    .1669336           R-squared     =  0.9294
---------+------------------------------           Adj R-squared =  0.9283
   Total |  1704.46076   732   2.3284983           Root MSE      =  .40858

------------------------------------------------------------------------------
 btrmgpa |      Coef.  Std. Err.         t  P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    bone |  -1.734843   .3566396    -4.864  0.000    -2.435019   -1.034666
 bspring |   -.060651   .0371666    -1.632  0.103    -.1336187    .0123167
 bcrsgpa |   1.082336   .0930923    11.626  0.000     .8995719    1.265101
bfrstsem |   .0029868   .0599604     0.050  0.960     -.114731    .1207046
 bseason |  -.0440905   .0392441    -1.123  0.262    -.1211368    .0329558
    bsat |   .0017052    .000177     9.632  0.000     .0013577    .0020528
 bvrbmth |  -.1575166   .1634784    -0.964  0.336    -.4784672     .163434
 bhsperc |  -.0084622   .0012424    -6.811  0.000    -.0109013   -.0060231
 bhssize |  -.0000775   .0001247    -0.621  0.535    -.0003224    .0001674
  bblack |  -.2348204   .0681441    -3.446  0.000    -.3686049   -.1010359
 bfemale |   .3581524   .0612839     5.844  0.000     .2378363    .4784686
------------------------------------------------------------------------------
. * These are the RE estimates, subject to rounding error.
. * Now add the time averages of the variables that change across i and t
. * to perform the Hausman test:
. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale acrsgpa afrstsem aseason, nocons

  Source |       SS       df       MS                  Number of obs =     732
---------+------------------------------               F( 14,   718) =  676.85
   Model |  1584.40773    14  113.171981               Prob > F      =  0.0000
Residual |  120.053023   718  .167204767               R-squared     =  0.9296
---------+------------------------------               Adj R-squared =  0.9282
   Total |  1704.46076   732   2.3284983               Root MSE      =  .40891

------------------------------------------------------------------------------
 btrmgpa |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    bone |  -1.423761   .5182286     -2.747   0.006      -2.441186   -.4063367
 bspring |  -.0657817   .0391479     -1.680   0.093      -.1426398    .0110764
 bcrsgpa |   1.140688   .1186766      9.612   0.000       .9076934    1.373683
bfrstsem |   .0128523   .0688496      0.187   0.852      -.1223184     .148023
 bseason |  -.0566454   .0414828     -1.366   0.173      -.1380874    .0247967
    bsat |   .0016681   .0001804      9.247   0.000        .001314    .0020223
 bvrbmth |  -.1316462   .1654425     -0.796   0.426      -.4564551    .1931626
 bhsperc |  -.0084655   .0012551     -6.745   0.000      -.0109296   -.0060013
 bhssize |  -.0000783   .0001249     -0.627   0.531      -.0003236     .000167
  bblack |  -.2447934   .0685972     -3.569   0.000      -.3794684   -.1101184
 bfemale |   .3357016   .0711669      4.717   0.000       .1959815    .4754216
 acrsgpa |  -.1142992   .1234835     -0.926   0.355      -.3567312    .1281327
afrstsem |  -.0480418   .0896965     -0.536   0.592      -.2241405    .1280569
 aseason |   .0763206   .0794119      0.961   0.337      -.0795867    .2322278
------------------------------------------------------------------------------

. test acrsgpa afrstsem aseason

 ( 1)  acrsgpa = 0.0
 ( 2)  afrstsem = 0.0
 ( 3)  aseason = 0.0

       F(  3,   718) =    0.61
            Prob > F =    0.6085

. * Thus, we fail to reject the random effects assumptions even at very large
. * significance levels.
For comparison, the usual form of the Hausman test, which includes spring
among the coefficients tested, gives p-value = .770, based on a $\chi^2_4$
distribution (using Stata 7.0). The regression-based test can be made robust
to any violation of RE.3 by adding "cluster(id)" to the regression command.
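The quasi-time-demeaning used in the Stata session above (each time-varying variable minus .386 times its time average, and each time-constant variable multiplied by 1 - .386 = .614) can be sketched in a few lines of Python. This is only an illustration: the function name `quasi_demean` and the two-term data for a single student are hypothetical, and only the value .386 comes from the session.

```python
def quasi_demean(values, lam):
    """Return values[t] - lam * (time average of values) for one unit i."""
    avg = sum(values) / len(values)
    return [v - lam * avg for v in values]

lam = 0.386  # lambda-hat reported in the Stata session

# Hypothetical time-varying variable (two terms for one student).
trmgpa_i = [2.0, 3.0]
btrmgpa_i = quasi_demean(trmgpa_i, lam)

# Hypothetical time-constant variable: x - lam*x = (1 - lam)*x = .614*x.
sat_i = [1000.0, 1000.0]
bsat_i = quasi_demean(sat_i, lam)

print(btrmgpa_i)
print(bsat_i)
```

For a time-constant variable the transformation collapses to multiplication by .614, which is why the session generates `bsat`, `bblack`, and the other time-constant variables that way.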

The simplest way to compute a Hausman test is to add the time averages of all
explanatory variables, excluding the dummy variables, and to estimate the
resulting equation by random effects. I should have done a better job of
spelling this out in the text. That is, write

$$y_{it} = x_{it}\beta + \bar w_i\xi + r_{it}, \quad t = 1,\ldots,T,$$

where $x_{it}$ includes an overall intercept along with time dummies, as well as
$w_{it}$, the covariates that change across i and t. We can estimate this
equation by random effects and test $H_0\colon \xi = 0$. The actual calculation
for this example is to be added.

Parts b, c, and d: To be added.

10.11. To be added.

Yes, we can justify this procedure with fixed T as $N \to \infty$. In
particular, it produces a $\sqrt{N}$-consistent, asymptotically normal
estimator of $\beta$. Fixed effects weighted least squares, where the weights
are known functions of exogenous variables (including $x_i$ and possibly other
covariates that do not appear in the conditional mean), is another case where
"estimating" the fixed effects leads to an estimator of $\beta$ with good
properties. Verifying this takes some work, but it is mostly just algebra.

First, in the sum of squared residuals, we can "concentrate" the $a_i$ out
by finding $\hat a_i(b)$ as a function of $(x_i,y_i)$ and $b$, substituting back
into the sum of squared residuals, and then minimizing with respect to $b$ only.
Straightforward algebra gives the first order conditions for each i as

$$\sum_{t=1}^T (y_{it} - \hat a_i - x_{it}b)/h_{it} = 0,$$

which gives

$$\hat a_i(b) = w_i\Big(\sum_{t=1}^T y_{it}/h_{it}\Big) - w_i\Big(\sum_{t=1}^T x_{it}/h_{it}\Big)b \equiv \bar y_i^w - \bar x_i^w b,$$

where $w_i \equiv 1/\big[\sum_{t=1}^T (1/h_{it})\big] > 0$ and $\bar y_i^w \equiv w_i\big(\sum_{t=1}^T y_{it}/h_{it}\big)$, and a similar
definition holds for $\bar x_i^w$. Note that $\bar y_i^w$ and $\bar x_i^w$ are simply weighted averages.
If $h_{it}$ equals the same constant for all t, $\bar y_i^w$ and $\bar x_i^w$ are the usual time
averages.
Now we can plug each $\hat a_i(b)$ into the SSR to get the problem solved by $\hat\beta$:

$$\min_{b \in \mathbb{R}^K} \sum_{i=1}^N\sum_{t=1}^T \big[(y_{it} - \bar y_i^w) - (x_{it} - \bar x_i^w)b\big]^2/h_{it}.$$

But this is just a pooled weighted least squares regression of $(y_{it} - \bar y_i^w)$ on
$(x_{it} - \bar x_i^w)$ with weights $1/h_{it}$. Equivalently, define
$\tilde y_{it} \equiv (y_{it} - \bar y_i^w)/\sqrt{h_{it}}$ and $\tilde x_{it} \equiv (x_{it} - \bar x_i^w)/\sqrt{h_{it}}$,
all t = 1,...,T, i = 1,...,N. Then $\hat\beta$ can be expressed in usual pooled OLS form:

$$\hat\beta = \Big(\sum_{i=1}^N\sum_{t=1}^T \tilde x_{it}'\tilde x_{it}\Big)^{-1}\Big(\sum_{i=1}^N\sum_{t=1}^T \tilde x_{it}'\tilde y_{it}\Big). \qquad (10.82)$$

Note carefully how the initial $y_{it}$ are weighted by $1/h_{it}$ to obtain $\bar y_i^w$, but
where the usual $1/\sqrt{h_{it}}$ weighting shows up in the sum of squared residuals on
the time-demeaned data (where the demeaning is a weighted average).
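The weighted demeaning just described can be checked numerically. The sketch below is an illustration under assumed inputs: the function name `weighted_within` and the numbers for one unit i are hypothetical. It computes $w_i$, the weighted average, and the transformed series, and confirms that the deviations from the weighted average sum to zero once each is divided by $h_{it}$.

```python
import math

def weighted_within(y, h):
    """Weighted demeaning for one unit: returns (w_i, ybar_w, y_tilde)."""
    w = 1.0 / sum(1.0 / ht for ht in h)                # w_i = 1 / sum_t (1/h_it)
    ybar_w = w * sum(yt / ht for yt, ht in zip(y, h))  # weighted time average
    y_tilde = [(yt - ybar_w) / math.sqrt(ht) for yt, ht in zip(y, h)]
    return w, ybar_w, y_tilde

# Hypothetical data for a single cross-sectional unit with T = 3.
y_i = [1.0, 2.0, 4.0]
h_i = [1.0, 2.0, 4.0]
w_i, ybar_w, y_tilde = weighted_within(y_i, h_i)

# sum_t (y_it - ybar_w)/h_it should be zero by construction.
check = sum(yt / math.sqrt(ht) for yt, ht in zip(y_tilde, h_i))

# With constant h_it the weighted average is the usual time average.
_, ybar_equal, _ = weighted_within([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(w_i, ybar_w, check, ybar_equal)
```

The identity $w_i\sum_t (1/h_{it}) = 1$, which holds by the definition of $w_i$, is also what delivers the T - 1 degrees-of-freedom result later in the argument.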

Given (10.82), we can study the asymptotic ($N \to \infty$) properties of $\hat\beta$. First, it is
easy to show that $\bar y_i^w = \bar x_i^w\beta + c_i + \bar u_i^w$, where $\bar u_i^w \equiv w_i\big(\sum_{t=1}^T u_{it}/h_{it}\big)$. Subtracting
this equation from $y_{it} = x_{it}\beta + c_i + u_{it}$ for all t gives $\tilde y_{it} = \tilde x_{it}\beta + \tilde u_{it}$,
where $\tilde u_{it} \equiv (u_{it} - \bar u_i^w)/\sqrt{h_{it}}$. When we plug this in for $\tilde y_{it}$ in (10.82) and
divide by N in the appropriate places we get

$$\hat\beta = \beta + \Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde x_{it}'\tilde x_{it}\Big)^{-1}\Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde x_{it}'\tilde u_{it}\Big).$$

Straightforward algebra shows that $\sum_{t=1}^T \tilde x_{it}'\tilde u_{it} = \sum_{t=1}^T \tilde x_{it}'u_{it}/\sqrt{h_{it}}$, i = 1,...,N,
and so

$$\hat\beta = \beta + \Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde x_{it}'\tilde x_{it}\Big)^{-1}\Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde x_{it}'u_{it}/\sqrt{h_{it}}\Big). \qquad (10.83)$$

Equation (10.83) makes the consistency of $\hat\beta$ easy to see. Why? We
assumed that $E(u_{it}|x_i,h_i,c_i) = 0$, which means $u_{it}$ is uncorrelated with any
function of $(x_i,h_i)$, including $\tilde x_{it}$. So $E(\tilde x_{it}'u_{it}) = 0$, t = 1,...,T. As long
as we assume rank $\big[\sum_{t=1}^T E(\tilde x_{it}'\tilde x_{it})\big] = K$, we can use the usual proof to show
plim$(\hat\beta) = \beta$. (We can even show that $E(\hat\beta|X,H) = \beta$.)
It is also clear from (10.83) that $\hat\beta$ is $\sqrt{N}$-asymptotically normal under
mild assumptions.

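The asymptotic variance is generally

$$\text{Avar}\,\sqrt{N}(\hat\beta - \beta) = A^{-1}BA^{-1},$$

where $A \equiv \sum_{t=1}^T E(\tilde x_{it}'\tilde x_{it})$ and $B \equiv \text{Var}\big(\sum_{t=1}^T \tilde x_{it}'u_{it}/\sqrt{h_{it}}\big)$.
If we assume that $\text{Cov}(u_{it},u_{is}|x_i,h_i,c_i) = 0$, $t \neq s$, in addition to the
variance assumption $\text{Var}(u_{it}|x_i,h_i,c_i) = \sigma_u^2 h_{it}$, then it is easily shown that $B = \sigma_u^2 A$,
and so $\text{Avar}\,\sqrt{N}(\hat\beta - \beta) = \sigma_u^2 A^{-1}$.
The same subtleties that arise in estimating $\sigma_u^2$ for the usual fixed effects
estimator arise here as well. Assume the zero conditional covariance assumption
and correct variance specification in the previous paragraph.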

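Then, note that the residuals from the pooled OLS regression

$$\tilde y_{it} \text{ on } \tilde x_{it}, \quad t = 1,\ldots,T,\ i = 1,\ldots,N, \qquad (10.84)$$

say $\hat r_{it}$, are estimating $\tilde u_{it} = (u_{it} - \bar u_i^w)/\sqrt{h_{it}}$ (in the sense that we obtain $\hat r_{it}$
from $\tilde u_{it}$ by replacing $\beta$ with $\hat\beta$). Now

$$E(\tilde u_{it}^2) = E[(u_{it}^2/h_{it})] - 2E[(u_{it}\bar u_i^w)/h_{it}] + E[(\bar u_i^w)^2/h_{it}] = \sigma_u^2 - 2\sigma_u^2E[(w_i/h_{it})] + \sigma_u^2E[(w_i/h_{it})],$$

where the law of iterated expectations is applied several times, and
$E[(\bar u_i^w)^2|x_i,h_i] = \sigma_u^2 w_i$ has been used. Therefore, $E(\tilde u_{it}^2) = \sigma_u^2[1 - E(w_i/h_{it})]$,
t = 1,...,T, and so

$$\sum_{t=1}^T E(\tilde u_{it}^2) = \sigma_u^2\Big\{T - E\Big[w_i\cdot\sum_{t=1}^T (1/h_{it})\Big]\Big\} = \sigma_u^2(T - 1).$$

This contains the usual result for the within transformation as a special
case. A consistent estimator of $\sigma_u^2$ is SSR/[N(T - 1) - K], where SSR is the
usual sum of squared residuals from (10.84), and the subtraction of K is
optional.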

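The estimator of $\text{Avar}(\hat\beta)$ is then

$$\hat\sigma_u^2\Big(\sum_{i=1}^N\sum_{t=1}^T \tilde x_{it}'\tilde x_{it}\Big)^{-1}.$$

If we want to allow serial correlation in the $\{u_{it}\}$, or allow
$\text{Var}(u_{it}|x_i,h_i,c_i) \neq \sigma_u^2 h_{it}$, then we can just apply the robust formula for the
pooled OLS regression (10.84).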

CHAPTER 11

11.1. a. It is important to remember that, any time we put a variable in a
regression model (whether we are using cross section or panel data), we are
controlling for the effects of that variable on the dependent variable. The
whole point of regression analysis is that it allows the explanatory variables
to be correlated while estimating ceteris paribus effects. Thus, the
inclusion of $y_{i,t-1}$ in the equation allows $prog_{it}$ to be correlated with
$y_{i,t-1}$, and also recognizes that, due to inertia, $y_{it}$ is often strongly
related to $y_{i,t-1}$.
An assumption that implies pooled OLS is consistent is

$$E(u_{it}|z_i,x_{it},y_{i,t-1},prog_{it}) = 0, \quad \text{all } t,$$

which is implied by but is weaker than dynamic completeness. Without
additional assumptions, the pooled OLS standard errors and test statistics
need to be adjusted for heteroskedasticity and serial correlation (although
the latter will not be present under dynamic completeness).
b. As we discussed in Section 7.8.2, this statement is incorrect.
Provided our interest is in $E(y_{it}|z_i,x_{it},y_{i,t-1},prog_{it})$, we do not care about
serial correlation in the implied errors, nor does serial correlation cause
inconsistency in the OLS estimators.
c. Such a model is the standard unobserved effects model:

$$y_{it} = x_{it}\beta + \delta_1 prog_{it} + c_i + u_{it}, \quad t = 1,2,\ldots,T.$$

We would probably assume that $(x_{it},prog_{it})$ is strictly exogenous; the weakest
form of strict exogeneity is that $(x_{it},prog_{it})$ is uncorrelated with $u_{is}$ for
all t and s. Then we could estimate the equation by fixed effects or first
differencing. We could also do a GLS analysis after the fixed effects or
first-differencing transformations, but we should have a large N.
d. A model that incorporates features from parts a and c is

$$y_{it} = x_{it}\beta + \delta_1 prog_{it} + \rho_1 y_{i,t-1} + c_i + u_{it}, \quad t = 1,\ldots,T.$$

Now, program participation can depend on unobserved city heterogeneity as well
as on lagged $y_{it}$ (we assume that $y_{i0}$ is observed). Fixed effects and first-
differencing are both inconsistent as $N \to \infty$ with fixed T.
Assuming that $E(u_{it}|x_i,prog_i,y_{i,t-1},y_{i,t-2},\ldots,y_{i0}) = 0$, a consistent
procedure is obtained by first differencing, to get

$$\Delta y_{it} = \Delta x_{it}\beta + \delta_1\Delta prog_{it} + \rho_1\Delta y_{i,t-1} + \Delta u_{it}, \quad t = 2,\ldots,T.$$

At time t, $\Delta x_{it}$ and $\Delta prog_{it}$ can be used as their own instruments, along with
$y_{i,t-j}$ for $j \geq 2$. Either pooled 2SLS or a GMM procedure can be used. Under
strict exogeneity, past and future values of $x_{it}$ can also be used as
instruments.

Writing $y_{it} = \beta x_{it} + c_i + u_{it} - \beta r_{it}$, the fixed effects estimator $\hat\beta_{FE}$
can be written as

$$\beta + \Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)^2\Big)^{-1}\Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)\big(u_{it} - \bar u_i - \beta(r_{it} - \bar r_i)\big)\Big).$$

Now, $x_{it} - \bar x_i = (x_{it}^* - \bar x_i^*) + (r_{it} - \bar r_i)$. Then, because $E(r_{it}|x_i^*,c_i) = 0$ for
all t, $(x_{it}^* - \bar x_i^*)$ and $(r_{it} - \bar r_i)$ are uncorrelated, and so

$$\text{Var}(x_{it} - \bar x_i) = \text{Var}(x_{it}^* - \bar x_i^*) + \text{Var}(r_{it} - \bar r_i), \quad \text{all } t.$$

Similarly, under (11.30), $(x_{it} - \bar x_i)$ and $(u_{it} - \bar u_i)$ are uncorrelated for all
t. Now $E[(x_{it} - \bar x_i)(r_{it} - \bar r_i)] = \text{Cov}(x_{it}^* - \bar x_i^*, r_{it} - \bar r_i) + \text{Var}(r_{it} - \bar r_i) = \text{Var}(r_{it} - \bar r_i)$.
By the law of large numbers and the assumption of constant
variances across t,

$$N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)^2 \overset{p}{\to} \sum_{t=1}^T \text{Var}(x_{it} - \bar x_i) = T[\text{Var}(x_{it}^* - \bar x_i^*) + \text{Var}(r_{it} - \bar r_i)]$$

and

$$N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)\big(u_{it} - \bar u_i - \beta(r_{it} - \bar r_i)\big) \overset{p}{\to} -T\beta\,\text{Var}(r_{it} - \bar r_i).$$

Therefore,

$$\text{plim}\ \hat\beta_{FE} = \beta - \beta\,\frac{\text{Var}(r_{it} - \bar r_i)}{\text{Var}(x_{it}^* - \bar x_i^*) + \text{Var}(r_{it} - \bar r_i)} = \beta\left(1 - \frac{\text{Var}(r_{it} - \bar r_i)}{\text{Var}(x_{it}^* - \bar x_i^*) + \text{Var}(r_{it} - \bar r_i)}\right).$$
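To get a feel for the size of the attenuation bias in the probability limit above, the formula can be evaluated for a few variance ratios. The function name `fe_plim` and the numerical inputs are hypothetical and chosen only for illustration; the inputs are the variances of the time-demeaned true regressor and of the time-demeaned measurement error.

```python
def fe_plim(beta, var_xstar_dm, var_r_dm):
    """plim of the FE estimator under classical measurement error.

    var_xstar_dm: Var(x*_it - xbar*_i), the time-demeaned true regressor.
    var_r_dm:     Var(r_it - rbar_i), the time-demeaned measurement error.
    """
    return beta * (1.0 - var_r_dm / (var_xstar_dm + var_r_dm))

# Hypothetical values: the error variance is one third of the signal variance.
attenuated = fe_plim(1.0, 3.0, 1.0)

# No measurement error: the FE estimator is consistent for beta.
no_error = fe_plim(1.0, 3.0, 0.0)
print(attenuated, no_error)
```

Because time demeaning typically removes much of the variation in the true regressor when it is persistent over time, the ratio in parentheses can be large even when the measurement error looks small relative to the variation in levels.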

11.5. a. $E(v_i|z_i,x_i) = Z_i[E(a_i|z_i,x_i) - \alpha] + E(u_i|z_i,x_i) = Z_i(\alpha - \alpha) + 0 = 0$.
Next, $\text{Var}(v_i|z_i,x_i) = Z_i\text{Var}(a_i|z_i,x_i)Z_i' + \text{Var}(u_i|z_i,x_i) + \text{Cov}(a_i,u_i|z_i,x_i) + \text{Cov}(u_i,a_i|z_i,x_i) = Z_i\text{Var}(a_i|z_i,x_i)Z_i' + \text{Var}(u_i|z_i,x_i)$ because $a_i$ and $u_i$ are
uncorrelated, conditional on $(z_i,x_i)$, by FE.1 and the usual iterated
expectations argument. Therefore, $\text{Var}(v_i|z_i,x_i) = Z_i\Lambda Z_i' + \sigma_u^2 I_T$ under the
assumptions given, which shows that the conditional variance depends on $z_i$.
Unlike in the standard random effects model, there is conditional
heteroskedasticity.
b. If we use the usual RE analysis, we are applying FGLS to the equation
$y_i = Z_i\alpha + X_i\beta + v_i$, where $v_i = Z_i(a_i - \alpha) + u_i$. From part a, we know that
$E(v_i|x_i,z_i) = 0$, and so the usual RE estimator is consistent (as $N \to \infty$ for
fixed T) and $\sqrt{N}$-asymptotically normal, provided the rank condition, Assumption
RE.2, holds. (Remember, a feasible GLS analysis with any $\hat\Omega$ will be consistent
provided $\hat\Omega$ converges in probability to a nonsingular matrix as $N \to \infty$. It need
not be the case that $\text{Var}(v_i|x_i,z_i) = \text{plim}(\hat\Omega)$, or even that $\text{Var}(v_i) = \text{plim}(\hat\Omega)$.)
From part a, we know that $\text{Var}(v_i|x_i,z_i)$ depends on $z_i$ unless we restrict
almost all elements of $\Lambda$ to be zero (all but those corresponding to the
constant in $z_{it}$). Therefore, the usual random effects inference (that is,
inference based on the usual RE variance matrix estimator) will be invalid.
c. We can easily make the RE analysis fully robust to an arbitrary
$\text{Var}(v_i|x_i,z_i)$, as in equation (7.49).

11.7. When $\lambda_t = \lambda/T$ for all t, the equation becomes

$$y_{it} = x_{it}\beta + \bar x_i\lambda + v_{it}, \quad t = 1,2,\ldots,T.$$

Let $\hat\beta$ (along with $\hat\lambda$) denote the pooled OLS estimator from this equation. By
standard results on partitioned regression [for example, Davidson and
MacKinnon (1993, Section 1.4)], $\hat\beta$ can be obtained by the following two-step
procedure:
(i) Regress $x_{it}$ on $\bar x_i$ across all t and i, and save the $1 \times K$ vectors of
residuals, say $\hat r_{it}$, t = 1,...,T, i = 1,...,N.
(ii) Regress $y_{it}$ on $\hat r_{it}$ across all t and i. The OLS vector on $\hat r_{it}$ is $\hat\beta$.
We want to show that $\hat\beta$ is the FE estimator. Given that the FE estimator can
be obtained by pooled OLS of $y_{it}$ on $(x_{it} - \bar x_i)$, it suffices to show that $\hat r_{it} = x_{it} - \bar x_i$ for all t and i. But

$$\hat r_{it} = x_{it} - \bar x_i\Big(\sum_{i=1}^N\sum_{t=1}^T \bar x_i'\bar x_i\Big)^{-1}\Big(\sum_{i=1}^N\sum_{t=1}^T \bar x_i'x_{it}\Big)$$

and

$$\sum_{i=1}^N\sum_{t=1}^T \bar x_i'x_{it} = \sum_{i=1}^N \bar x_i'\sum_{t=1}^T x_{it} = \sum_{i=1}^N T\bar x_i'\bar x_i = \sum_{i=1}^N\sum_{t=1}^T \bar x_i'\bar x_i,$$

and so $\hat r_{it} = x_{it} - \bar x_i I_K = x_{it} - \bar x_i$. This completes the proof.
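The key step in the proof (the pooled regression of $x_{it}$ on $\bar x_i$ has a coefficient of exactly one, so its residuals are the time-demeaned data) is easy to confirm numerically. The sketch below uses a single regressor (K = 1) and made-up numbers for N = 2 units and T = 3 periods; it is only a check of the algebra, not part of the original solution.

```python
# Hypothetical balanced panel, one regressor: x[i][t] for N = 2, T = 3.
x = [[1.0, 3.0, 5.0],
     [2.0, 2.0, 8.0]]

# Time averages xbar_i.
xbar = [sum(row) / len(row) for row in x]

# Pooled OLS (no intercept) of x_it on xbar_i across all i and t.
num = sum(xbar[i] * xit for i, row in enumerate(x) for xit in row)
den = sum(xbar[i] ** 2 for i, row in enumerate(x) for _ in row)
slope = num / den

# Residuals from that regression versus the within (time-demeaned) data.
resid = [[xit - slope * xbar[i] for xit in row] for i, row in enumerate(x)]
demeaned = [[xit - xbar[i] for xit in row] for i, row in enumerate(x)]
print(slope, resid, demeaned)
```

The slope equals one because the sum over t of $x_{it}$ is $T\bar x_i$ for each i, which is exactly the middle equality in the displayed chain.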

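11.9. a. We can apply Problem 8.8.b, as we are applying pooled 2SLS to the
time-demeaned equation: rank $\big[\sum_{t=1}^T E(\ddot z_{it}'\ddot x_{it})\big] = K$. The condition
rank $\big[\sum_{t=1}^T E(\ddot z_{it}'\ddot z_{it})\big] = L$ rules out time-constant instruments; we can
always redefine $z_{it}$ so that $\sum_{t=1}^T E(\ddot z_{it}'\ddot z_{it})$ has full rank.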

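b. We can apply the results on GMM estimation in Chapter 8. In
particular, in equation (8.25), take $C = E(\ddot Z_i'\ddot X_i)$, $W = [E(\ddot Z_i'\ddot Z_i)]^{-1}$, and
$\Lambda = E(\ddot Z_i'\ddot u_i\ddot u_i'\ddot Z_i)$. A key point is that
$\ddot Z_i'\ddot u_i = (Q_TZ_i)'(Q_Tu_i) = Z_i'Q_Tu_i = \ddot Z_i'u_i$, where $Q_T$ is the time-demeaning matrix.
Under (11.80), $E(u_iu_i'|\ddot Z_i) = \sigma_u^2 I_T$, and so
$\Lambda = E(\ddot Z_i'u_iu_i'\ddot Z_i) = \sigma_u^2E(\ddot Z_i'\ddot Z_i)$. Plugging these choices of C, W, and $\Lambda$
into (8.25) and simplifying gives

$$\text{Avar}\,\sqrt{N}(\hat\beta - \beta) = \sigma_u^2\{E(\ddot X_i'\ddot Z_i)[E(\ddot Z_i'\ddot Z_i)]^{-1}E(\ddot Z_i'\ddot X_i)\}^{-1}.$$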

c. The argument is very similar to the case of the fixed effects
estimator. First, $\sum_{t=1}^T E(\ddot u_{it}^2) = (T - 1)\sigma_u^2$, just as before.
If $\hat u_{it} = \ddot y_{it} - \ddot x_{it}\hat\beta$ are the pooled 2SLS residuals applied to the time-demeaned
data, then $[N(T - 1)]^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat u_{it}^2$ is a consistent estimator of $\sigma_u^2$.
Typically, N(T - 1) would be replaced by N(T - 1) - K as a degrees of freedom adjustment.

d. From Problem 5.1 (which is purely algebraic, and so applies
immediately to pooled 2SLS), the 2SLS estimator of all parameters in (11.81),
including $\beta$, can be obtained as follows: first run the regression $x_{it}$ on $d1_i$,
..., $dN_i$, $z_{it}$ across all t and i, and obtain the residuals, say $\hat r_{it}$; second,
obtain $\hat c_1$, ..., $\hat c_N$, $\hat\beta$ from the pooled regression $y_{it}$ on $d1_i$, ..., $dN_i$, $x_{it}$,
$\hat r_{it}$. Now, by the algebra of partial regression, $\hat\beta$ and the coefficient on $\hat r_{it}$, say $\hat\delta$,
from this last regression can be obtained by first partialling out the
dummy variables, $d1_i$, ..., $dN_i$. As we know from Chapter 10, this partialling
out is equivalent to time demeaning all variables. Therefore, $\hat\beta$ and $\hat\delta$ can be
obtained from the pooled regression $\ddot y_{it}$ on $\ddot x_{it}$, $\hat r_{it}$, where we use the fact
that the time average of $\hat r_{it}$ for each i is identically zero.
Now consider the 2SLS estimator of $\beta$ from (11.79). This is equivalent to
first regressing $\ddot x_{it}$ on $\ddot z_{it}$ and saving the residuals, say $\hat s_{it}$, and then
running the OLS regression $\ddot y_{it}$ on $\ddot x_{it}$, $\hat s_{it}$. But, again by partial regression
and the fact that regressing on $d1_i$, ..., $dN_i$ results in time demeaning, $\hat s_{it} = \hat r_{it}$ for all i and t.
This proves that the 2SLS estimates of $\beta$ from (11.79)
and (11.81) are identical. (If some elements of $x_{it}$ are included in $z_{it}$, as
would usually be the case, some entries in $\hat r_{it}$ are identically zero for all t
and i. But we can simply drop those without changing any other steps in the
argument.)
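e. First, by writing down the first order condition for the 2SLS
estimates from (11.81) (with $dn_i$ as their own instruments, and $\hat x_{it}$ as the IVs
for $x_{it}$), it is easy to show that $\hat c_i = \bar y_i - \bar x_i\hat\beta$, where $\hat\beta$ is the IV estimator
from (11.81). Therefore, the 2SLS residuals from (11.81)
are computed as $y_{it} - (\bar y_i - \bar x_i\hat\beta) - x_{it}\hat\beta = (y_{it} - \bar y_i) - (x_{it} - \bar x_i)\hat\beta = \ddot y_{it} - \ddot x_{it}\hat\beta$,
which are exactly the residuals from the time-demeaned equation (11.79).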

estimating

^
-1
^
^
^

X and
Z, W = (Z
Z/N) , ui =
yi -
XiB, and * =

## &N-1 SN Zu^ ^uZ *.

7 i=1 i i i i8
g. The 2SLS procedure is inconsistent as $N \to \infty$ with fixed T, as is any IV
method that uses time-demeaning to eliminate the unobserved effect. This is
because the time-demeaned IVs will generally be correlated with some elements
of $u_i$ (usually, all elements).

11.11. Differencing twice and using the resulting cross section is easily done
in Stata.

. gen cclscrap = clscrap - clscrap[_n-1] if d89
(417 missing values generated)
. gen ccgrnt = cgrant - cgrant[_n-1] if d89
(314 missing values generated)
. gen ccgrnt_1 = cgrant_1 - cgrant_1[_n-1] if d89
(314 missing values generated)
. reg cclscrap ccgrnt ccgrnt_1

  Source |       SS       df       MS                  Number of obs =      54
---------+------------------------------               F(  2,    51) =    0.97
   Model |  .958448372     2  .479224186               Prob > F      =  0.3868
Residual |  25.2535328    51   .49516731               R-squared     =  0.0366
---------+------------------------------               Adj R-squared = -0.0012
   Total |  26.2119812    53  .494565682               Root MSE      =  .70368

------------------------------------------------------------------------------
cclscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  ccgrnt |   .1564748   .2632934      0.594   0.555      -.3721087    .6850584
ccgrnt_1 |   .6099015   .6343411      0.961   0.341      -.6635913    1.883394
   _cons |  -.2377384   .1407363     -1.689   0.097      -.5202783    .0448014
------------------------------------------------------------------------------

. xtreg clscrap d89 cgrant cgrant_1, fe

Fixed-effects (within) regression

sd(u_fcode)              =   .509567             Number of obs =     108
sd(e_fcode_t)            =  .4975778                         n =      54
sd(e_fcode_t + u_fcode)  =  .7122094                         T =       2

corr(u_fcode, Xb)        =   -0.4011             R-sq within   =  0.0577
                                                      between  =  0.0476
                                                      overall  =  0.0050

                                                 F(  3,    51) =    1.04
                                                      Prob > F =  0.3826

------------------------------------------------------------------------------
 clscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     d89 |  -.2377385   .1407362     -1.689   0.097      -.5202783    .0448014
  cgrant |   .1564748   .2632934      0.594   0.555      -.3721087    .6850584
cgrant_1 |   .6099016   .6343411      0.961   0.341      -.6635913    1.883394
   _cons |  -.2240491    .114748     -1.953   0.056      -.4544153    .0063171
------------------------------------------------------------------------------
   fcode |          F(53,51) =    1.674   0.033           (54 categories)

The estimates from the random growth model are pretty bad (the estimates on
the grant variables are of the wrong sign) and they are very imprecise. The
joint F test for the 53 different intercepts is significant at the 5% level,
so it is hard to know what to make of this. It does cast doubt on the
standard unobserved effects model without a random growth term.

11.13. To be added.

11.15. To be added.

$$\sqrt{N}(\hat\beta_{FE} - \beta) = A^{-1}\Big(N^{-1/2}\sum_{i=1}^N \ddot X_i'u_i\Big) + o_p(1).$$

Simple algebra and standard properties of $O_p(1)$ and $o_p(1)$ give

$$\sqrt{N}(\hat\alpha - \alpha) = N^{-1/2}\sum_{i=1}^N \big[(Z_i'Z_i)^{-1}Z_i'(y_i - X_i\beta) - \alpha\big] - \Big(N^{-1}\sum_{i=1}^N (Z_i'Z_i)^{-1}Z_i'X_i\Big)\sqrt{N}(\hat\beta_{FE} - \beta)$$

$$= N^{-1/2}\sum_{i=1}^N (s_i - \alpha) - CA^{-1}N^{-1/2}\sum_{i=1}^N \ddot X_i'u_i + o_p(1),$$

where $C \equiv E[(Z_i'Z_i)^{-1}Z_i'X_i]$ and $s_i \equiv (Z_i'Z_i)^{-1}Z_i'(y_i - X_i\beta)$. By definition, $E(s_i) = \alpha$.
By combining terms in the sum we have

$$\sqrt{N}(\hat\alpha - \alpha) = N^{-1/2}\sum_{i=1}^N \big[(s_i - \alpha) - CA^{-1}\ddot X_i'u_i\big] + o_p(1),$$

which implies by the central limit theorem and the asymptotic equivalence
lemma that $\sqrt{N}(\hat\alpha - \alpha)$ is asymptotically normal with zero mean and variance
$E(r_ir_i')$, where $r_i \equiv (s_i - \alpha) - CA^{-1}\ddot X_i'u_i$. If we replace $\alpha$, C, A, and $\beta$ with
their consistent estimators, we get exactly (11.55), since the $\hat u_i$ are the FE
residuals.

CHAPTER 12

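12.1. Take the conditional expectation of equation (12.4) with respect to x,
and use $E(u|x) = 0$:

$$E\{[y - m(x,\theta)]^2|x\} = E(u^2|x) + 2[m(x,\theta_o) - m(x,\theta)]E(u|x) + E\{[m(x,\theta_o) - m(x,\theta)]^2|x\}$$

$$= E(u^2|x) + 0 + [m(x,\theta_o) - m(x,\theta)]^2 = E(u^2|x) + [m(x,\theta_o) - m(x,\theta)]^2.$$

The first term does not depend on $\theta$, and the second term is clearly
minimized at $\theta = \theta_o$ for any x.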

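12.3. a. The approximate elasticity is
$\partial\log[\hat E(y|z)]/\partial\log(z_1) = \partial[\hat\theta_1 + \hat\theta_2\log(z_1) + \hat\theta_3z_2]/\partial\log(z_1) = \hat\theta_2$.
b. This is approximated by $100\cdot\partial\log[\hat E(y|z)]/\partial z_2 = 100\cdot\hat\theta_3$.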
c. Since

*

d. Since

q3/(-2^q4).

The gradient of the mean function evaluated under the null is
$\nabla_\theta\tilde m_i = \exp(x_{i1}\tilde\theta_1)x_i \equiv \tilde m_ix_i$, where $\tilde\theta_1$ is the
restricted NLS estimator. From (12.72), we can compute the usual LM
statistic as $NR_u^2$ from the regression $\tilde u_i$ on $\tilde m_ix_{i1}$, $\tilde m_ix_{i2}$, i = 1,...,N, where
$\tilde u_i = y_i - \tilde m_i$. For the robust test, we first regress $\tilde m_ix_{i2}$ on $\tilde m_ix_{i1}$ and
obtain the $1 \times K_2$ residuals, $\tilde r_i$. Then we compute the statistic as in
regression (12.75).

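12.5. We need the gradient of $m(x_i,\theta)$ evaluated under the null hypothesis.
By the chain rule,

$$\nabla_\beta m(x,\theta) = g[x\beta + \delta_1(x\beta)^2 + \delta_2(x\beta)^3]\cdot[x + 2\delta_1(x\beta)x + 3\delta_2(x\beta)^2x],$$

$$\nabla_\delta m(x,\theta) = g[x\beta + \delta_1(x\beta)^2 + \delta_2(x\beta)^3]\cdot[(x\beta)^2,(x\beta)^3].$$

Let $\tilde\beta$ denote the NLS estimator with $\delta_1 = \delta_2 = 0$ imposed. Then
$\nabla_\beta m(x_i,\tilde\theta) = g(x_i\tilde\beta)x_i$ and $\nabla_\delta m(x_i,\tilde\theta) = g(x_i\tilde\beta)[(x_i\tilde\beta)^2,(x_i\tilde\beta)^3]$.
Therefore, the usual LM statistic can be
obtained as $NR_u^2$ from the regression $\tilde u_i$ on $\tilde g_ix_i$, $\tilde g_i\cdot(x_i\tilde\beta)^2$, $\tilde g_i\cdot(x_i\tilde\beta)^3$, where
$\tilde g_i \equiv g(x_i\tilde\beta)$.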

0, g = 1,...,G.

Further, let $u_i$ be the $G \times 1$ vector containing the $u_{ig}$.
Then $E(u_iu_i'|x_i) = E(u_iu_i') = \Omega_o$. Let $\hat{\hat u}_i$ be the vector of nonlinear least
squares residuals. Then, by standard arguments, a consistent estimator of $\Omega_o$ is

$$\hat\Omega \equiv N^{-1}\sum_{i=1}^N \hat{\hat u}_i\hat{\hat u}_i'$$

because each NLS estimator, $\hat{\hat\theta}_g$, is consistent for $\theta_{og}$ as $N \to \infty$.

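b. This part involves several steps, and I will sketch how each one
goes. First, let $\gamma$ denote the distinct elements of $\Omega$, the nuisance
parameters in the context of two-step M-estimation. Then, the score for
observation i is

$$s(w_i,\theta;\gamma) = -\nabla_\theta m(x_i,\theta)'\Omega^{-1}u_i(\theta).$$

Now we can verify condition (12.37), even though the actual derivatives are complicated. Each
element of $s(w_i,\theta;\gamma)$ is a linear combination of $u_i(\theta)$. So $\nabla_\gamma s_j(w_i,\theta_o;\gamma)$ is a
linear combination of $u_i(\theta_o) \equiv u_i$, where the linear combination is a function
of $(x_i,\theta_o,\gamma)$. Since $E(u_i|x_i) = 0$, $E[\nabla_\gamma s_j(w_i,\theta_o;\gamma)|x_i] = 0$, and so its
unconditional expectation is zero, too. This shows that we do not have to
adjust for the first-stage estimation of $\Omega_o$. Alternatively, one can verify
the hint directly, which has the same consequence.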

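Next, we derive $B_o \equiv E[s_i(\theta_o;\gamma_o)s_i(\theta_o;\gamma_o)']$:

$$E[s_i(\theta_o;\gamma_o)s_i(\theta_o;\gamma_o)'] = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}u_iu_i'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$$

$$= E\{E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}u_iu_i'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)|x_i]\}$$

$$= E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}E(u_iu_i'|x_i)\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$$

$$= E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\Omega_o\Omega_o^{-1}\nabla_\theta m_i(\theta_o)] = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)].$$

Next, we have to derive $A_o \equiv E[H_i(\theta_o;\gamma_o)]$, and show that $B_o = A_o$. The
Hessian itself is complicated, but its expected value is not. The Jacobian
of $s_i(\theta;\gamma)$ with respect to $\theta$ can be written

$$H_i(\theta;\gamma) = \nabla_\theta m(x_i,\theta)'\Omega^{-1}\nabla_\theta m(x_i,\theta) + [I_P \otimes u_i(\theta)']F(x_i,\theta;\gamma),$$

where $F(x_i,\theta;\gamma)$ is a $GP \times P$ matrix that involves derivatives with respect to $\theta$.
The key is that $F(x_i,\theta;\gamma)$ depends on $x_i$, not on $y_i$. So,

$$E[H_i(\theta_o;\gamma_o)|x_i] = \nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o) + [I_P \otimes E(u_i|x_i)']F(x_i,\theta_o;\gamma_o) = \nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o).$$

Iterated expectations then gives $A_o = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$. So, we have
verified (12.37) and that $A_o = B_o$. Therefore,

$$\text{Avar}\,\sqrt{N}(\hat\theta - \theta_o) = A_o^{-1} = \{E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]\}^{-1}.$$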

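c. As usual, we replace expectations with sample averages and unknown
parameters with their estimates, and divide the result by N to get $\widehat{\text{Avar}}(\hat\theta)$:

$$\widehat{\text{Avar}}(\hat\theta) = \Big[N^{-1}\sum_{i=1}^N \nabla_\theta m_i(\hat\theta)'\hat\Omega^{-1}\nabla_\theta m_i(\hat\theta)\Big]^{-1}/N = \Big[\sum_{i=1}^N \nabla_\theta m_i(\hat\theta)'\hat\Omega^{-1}\nabla_\theta m_i(\hat\theta)\Big]^{-1}.$$

The estimate $\hat\Omega$ can be based on the multivariate NLS residuals or can be
updated after the nonlinear SUR estimates have been obtained.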

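d. First, note that $\nabla_\theta m_i(\theta_o)$ is a block-diagonal matrix with blocks
$\nabla_{\theta_g}m_{ig}(\theta_{og})$, a $1 \times P_g$ matrix. If $\Omega_o$ is diagonal, so is its
inverse. Standard matrix multiplication shows that

$$\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o) = \begin{pmatrix} \sigma_{o1}^{-2}\nabla_{\theta_1}m_{i1}^{o\prime}\nabla_{\theta_1}m_{i1}^o & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_{oG}^{-2}\nabla_{\theta_G}m_{iG}^{o\prime}\nabla_{\theta_G}m_{iG}^o \end{pmatrix}.$$

Taking expectations and inverting the result shows that
$\text{Avar}\,\sqrt{N}(\hat\theta_g - \theta_{og}) = \sigma_{og}^2[E(\nabla_{\theta_g}m_{ig}^{o\prime}\nabla_{\theta_g}m_{ig}^o)]^{-1}$, g = 1,...,G. (Note also that the nonlinear SUR
estimators are asymptotically uncorrelated across equations.) These
asymptotic variances are easily seen to be the same as those for nonlinear
least squares on each equation; see p. 360.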
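e. I cannot see a nonlinear analogue of Theorem 7.7. The first hint
given in Problem 7.5 does not extend readily to nonlinear models, even when
the same regressors appear in each equation. The key is that $X_i$ is replaced
with $\nabla_\theta m(x_i,\theta_o)$. While this $G \times P$ matrix has a block-diagonal form, as
described in part d, the blocks are not the same even when the same
regressors appear in each equation. In the linear case, $\nabla_{\theta_g}m_g(x_i,\theta_{og}) = x_i$
for all g. But, unless $\theta_{og}$ is the same in all equations (a very
restrictive assumption), $\nabla_{\theta_g}m_g(x_i,\theta_{og})$ varies across g. For example, if
$m_g(x_i,\theta_{og}) = \exp(x_i\theta_{og})$ then $\nabla_{\theta_g}m_g(x_i,\theta_{og}) = \exp(x_i\theta_{og})x_i$, and the gradients
differ across g.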

12.9. a. We cannot say anything in general about Med(y|x), since Med(y|x) =
$m(x,\beta_o)$ + Med(u|x), and Med(u|x) could be a general function of x.
b. If u and x are independent, then E(u|x) and Med(u|x) are both
constants, say $\alpha$ and $\delta$. Then E(y|x) - Med(y|x) = $\alpha - \delta$, which does not
depend on x.
c. When u and x are independent, the partial effects of $x_j$ on the
conditional mean and conditional median are the same, and there is no
ambiguity about what is "the effect of $x_j$ on y," at least when only the mean
and median are in the running.

But it could

12.11. a. For consistency of the MNLS estimator, we need (in addition to
the regularity conditions, which I will ignore) the identification
condition. That is, $\beta_o$ must uniquely minimize

$$E\{[y_i - m(x_i,\beta)]'[y_i - m(x_i,\beta)]\} = E(\{u_i + [m(x_i,\beta_o) - m(x_i,\beta)]\}'\{u_i + [m(x_i,\beta_o) - m(x_i,\beta)]\})$$

$$= E(u_i'u_i) + 2E\{[m(x_i,\beta_o) - m(x_i,\beta)]'u_i\} + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\}$$

$$= E(u_i'u_i) + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\}$$

because $E(u_i|x_i) = 0$. Therefore, the identification assumption is
that

$$E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\} > 0, \quad \beta \neq \beta_o.$$

In a linear model, where $m(x_i,\beta) = X_i\beta$ for $X_i$ a $G \times K$ matrix, the condition
is

$$(\beta_o - \beta)'E(X_i'X_i)(\beta_o - \beta) > 0, \quad \beta \neq \beta_o,$$

and this holds provided $E(X_i'X_i)$ is positive definite.
Provided $m(x,\cdot)$ is twice continuously differentiable, there are no
problems in applying Theorem 12.3. Generally, $B_o = E[\nabla_\beta m_i(\beta_o)'u_iu_i'\nabla_\beta m_i(\beta_o)]$
and $A_o = E[\nabla_\beta m_i(\beta_o)'\nabla_\beta m_i(\beta_o)]$. These can be consistently estimated in the
obvious way after obtaining the MNLS estimators.

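b. We can apply the results on two-step M-estimation. The key is that,
under general regularity conditions,

$$N^{-1}\sum_{i=1}^N [y_i - m(x_i,\beta)]'[W_i(\hat\delta)]^{-1}[y_i - m(x_i,\beta)]/2$$

converges uniformly in probability to

$$E\{[y_i - m(x_i,\beta)]'[W(x_i,\delta_o)]^{-1}[y_i - m(x_i,\beta)]\}/2,$$

which is just to say that the usual consistency proof can be used provided we
verify identification. We can use an argument very similar to the
unweighted case to show

$$E\{[y_i - m(x_i,\beta)]'[W(x_i,\delta_o)]^{-1}[y_i - m(x_i,\beta)]\} = E\{u_i'[W_i(\delta_o)]^{-1}u_i\} + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[W_i(\delta_o)]^{-1}[m(x_i,\beta_o) - m(x_i,\beta)]\},$$

where $E(u_i|x_i) = 0$ is used to show the cross-product term,
$2E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[W_i(\delta_o)]^{-1}u_i\}$, is zero (by iterated expectations, as always). As
before, the first term does not depend on $\beta$ and the second term is minimized
at $\beta_o$; we would have to assume it is uniquely minimized.
To get the asymptotic variance, we proceed as in Problem 12.7. First,
it can be shown that condition (12.37) holds. This means
that, under $E(y_i|x_i) = m(x_i,\beta_o)$, we can ignore preliminary estimation of $\delta_o$
provided we have a $\sqrt{N}$-consistent estimator.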

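To obtain the asymptotic variance when the conditional variance matrix
is correctly specified, that is, when $\text{Var}(y_i|x_i) = \text{Var}(u_i|x_i) = W(x_i,\delta_o)$, we
can use an argument very similar to the nonlinear SUR case in Problem 12.7:

$$E[s_i(\beta_o;\delta_o)s_i(\beta_o;\delta_o)'] = E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}u_iu_i'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)]$$

$$= E\{E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}u_iu_i'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)|x_i]\}$$

$$= E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}E(u_iu_i'|x_i)[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)]$$

$$= E\{\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)\}.$$

Now, the Hessian (with respect to $\beta$), evaluated at $(\beta_o,\delta_o)$, can be written as

$$H_i(\beta_o;\delta_o) = \nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o) + [I_P \otimes u_i']F(x_i,\beta_o;\delta_o),$$

for some complicated function $F(x_i,\beta_o;\delta_o)$ that depends only on $x_i$. Taking
expectations gives

$$A_o \equiv E[H_i(\beta_o;\delta_o)] = E\{\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)\} = B_o.$$

Therefore, from the usual results on M-estimation, $\text{Avar}\,\sqrt{N}(\hat\beta - \beta_o) = A_o^{-1}$, and
a consistent estimator of $A_o$ is

$$\hat A = N^{-1}\sum_{i=1}^N \nabla_\beta m(x_i,\hat\beta)'[W_i(\hat\delta)]^{-1}\nabla_\beta m(x_i,\hat\beta).$$

c. The consistency argument in part b did not use the fact that $W(x,\delta)$
is correctly specified for Var(y|x). Exactly the same derivation goes
through. But, of course, the asymptotic variance is affected because $A_o \neq B_o$,
and the expression for $B_o$ no longer holds. The estimator of $A_o$ in part b
still works, of course. To consistently estimate $B_o$ we use

$$\hat B = N^{-1}\sum_{i=1}^N \nabla_\beta m(x_i,\hat\beta)'[W_i(\hat\delta)]^{-1}\hat u_i\hat u_i'[W_i(\hat\delta)]^{-1}\nabla_\beta m(x_i,\hat\beta).$$

Now, we estimate $\text{Avar}\,\sqrt{N}(\hat\beta - \beta_o)$ in the usual way: $\hat A^{-1}\hat B\hat A^{-1}$.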

CHAPTER 13

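13.1. No. We know that $\theta_o$ solves

$$\max_{\theta \in \Theta} E[\log f(y_i|x_i;\theta)],$$

where the expectation is over the joint distribution of $(x_i,y_i)$. Therefore,
because $\exp(\cdot)$ is an increasing function, $\theta_o$ also maximizes
$\exp\{E[\log f(y_i|x_i;\theta)]\}$ over $\Theta$. The problem is that the expectation and the exponential
function cannot be interchanged:
$E[f(y_i|x_i;\theta)] \neq \exp\{E[\log f(y_i|x_i;\theta)]\}$. In fact, Jensen's inequality tells us that
$E[f(y_i|x_i;\theta)] \geq \exp\{E[\log f(y_i|x_i;\theta)]\}$.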
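A small numerical check of the Jensen's inequality point: for any set of positive density values, the sample average of f is at least as large as the exponential of the sample average of log f (this is the arithmetic-geometric mean inequality). The density values below are made up purely for illustration.

```python
import math

# Hypothetical values of f(y_i|x_i;theta) at some fixed theta.
densities = [0.1, 0.4, 0.5, 0.9]

# Sample analogue of E[f(y_i|x_i;theta)].
mean_f = sum(densities) / len(densities)

# Sample analogue of exp{E[log f(y_i|x_i;theta)]}.
exp_mean_log = math.exp(sum(math.log(f) for f in densities) / len(densities))

print(mean_f, exp_mean_log, mean_f >= exp_mean_log)
```

Because the two objective functions differ in general, a value of theta that maximizes one need not maximize the other, which is the point of the answer above.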

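$$E[s_i^g(\phi_o)s_i^g(\phi_o)'|x_i] = E\{[G(\theta_o)']^{-1}s_i(\theta_o)s_i(\theta_o)'[G(\theta_o)]^{-1}|x_i\}$$

$$= [G(\theta_o)']^{-1}E[s_i(\theta_o)s_i(\theta_o)'|x_i][G(\theta_o)]^{-1} = [G(\theta_o)']^{-1}A_i(\theta_o)[G(\theta_o)]^{-1}.$$

b. In part b, we just replace $\theta_o$ with $\tilde\theta$ and $\phi_o$ with $\tilde\phi$:

$$\tilde A_i^g = [G(\tilde\theta)']^{-1}A_i(\tilde\theta)[G(\tilde\theta)]^{-1} \equiv (\tilde G')^{-1}\tilde A_i\tilde G^{-1}.$$

c. The expected Hessian form of the statistic is given in the second
part of equation (13.36), but where it is based initially on $\tilde s_i^g$ and $\tilde A_i^g$:

$$LM_g = \Big(\sum_{i=1}^N \tilde s_i^g\Big)'\Big(\sum_{i=1}^N \tilde A_i^g\Big)^{-1}\Big(\sum_{i=1}^N \tilde s_i^g\Big)$$

$$= \Big(\sum_{i=1}^N (\tilde G')^{-1}\tilde s_i\Big)'\Big(\sum_{i=1}^N (\tilde G')^{-1}\tilde A_i\tilde G^{-1}\Big)^{-1}\Big(\sum_{i=1}^N (\tilde G')^{-1}\tilde s_i\Big)$$

$$= \Big(\sum_{i=1}^N \tilde s_i\Big)'\tilde G^{-1}\tilde G\Big(\sum_{i=1}^N \tilde A_i\Big)^{-1}\tilde G'(\tilde G')^{-1}\Big(\sum_{i=1}^N \tilde s_i\Big)$$

$$= \Big(\sum_{i=1}^N \tilde s_i\Big)'\Big(\sum_{i=1}^N \tilde A_i\Big)^{-1}\Big(\sum_{i=1}^N \tilde s_i\Big) = LM.$$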

The log-

Q) _

li(

## and we would use this in a standard MLE analysis (conditional on xi).

b. First, we know that, for all $(y_{i2},x_i)$, $\theta_o$ maximizes $E[l_{i1}(\theta)|y_{i2},x_i]$.
Since $r_{i2}$ is a function of $(y_{i2},x_i)$,

$$E[r_{i2}l_{i1}(\theta)|y_{i2},x_i] = r_{i2}E[l_{i1}(\theta)|y_{i2},x_i];$$

since $r_{i2} \geq 0$, $\theta_o$ maximizes $E[r_{i2}l_{i1}(\theta)|y_{i2},x_i]$ for all $(y_{i2},x_i)$, and
therefore $\theta_o$ maximizes $E[r_{i2}l_{i1}(\theta)]$. Similarly, $\theta_o$ maximizes $E[l_{i2}(\theta)]$, and
so it follows that $\theta_o$ maximizes $E[r_{i2}l_{i1}(\theta) + l_{i2}(\theta)]$. For identification, we
have to assume or verify uniqueness.

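c. The score is

$$s_i(\theta) = r_{i2}s_{i1}(\theta) + s_{i2}(\theta),$$

where $s_{i1}(\theta) \equiv \nabla_\theta l_{i1}(\theta)'$ and $s_{i2}(\theta) \equiv \nabla_\theta l_{i2}(\theta)'$. Therefore,

$$E[s_i(\theta_o)s_i(\theta_o)'] = E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] + E[s_{i2}(\theta_o)s_{i2}(\theta_o)'] + E[r_{i2}s_{i1}(\theta_o)s_{i2}(\theta_o)'] + E[r_{i2}s_{i2}(\theta_o)s_{i1}(\theta_o)'].$$

Now by the usual conditional MLE theory, $E[s_{i1}(\theta_o)|y_{i2},x_i] = 0$ and, since $r_{i2}$
and $s_{i2}(\theta)$ are functions of $(y_{i2},x_i)$, it follows that
$E[r_{i2}s_{i1}(\theta_o)s_{i2}(\theta_o)'|y_{i2},x_i] = 0$, and so its transpose also has zero
conditional expectation. As usual, this implies zero unconditional
expectation. We have shown

$$E[s_i(\theta_o)s_i(\theta_o)'] = E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] + E[s_{i2}(\theta_o)s_{i2}(\theta_o)'].$$

Now, by the unconditional information matrix equality for the density
$h(y_2|x;\theta)$, $E[s_{i2}(\theta_o)s_{i2}(\theta_o)'] = -E[H_{i2}(\theta_o)]$, where $H_{i2}(\theta) = \nabla_\theta s_{i2}(\theta)$.
Further, by the conditional IM equality for the density $g(y_1|y_2,x;\theta)$,

$$E[s_{i1}(\theta_o)s_{i1}(\theta_o)'|y_{i2},x_i] = -E[H_{i1}(\theta_o)|y_{i2},x_i], \qquad (13.70)$$

where $H_{i1}(\theta) = \nabla_\theta s_{i1}(\theta)$. Then, by iterated expectations,
$E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] = -E[r_{i2}H_{i1}(\theta_o)]$.
Combining all the pieces, we have shown that

$$E[s_i(\theta_o)s_i(\theta_o)'] = -E[r_{i2}H_{i1}(\theta_o)] - E[H_{i2}(\theta_o)] = -E[r_{i2}\nabla_\theta s_{i1}(\theta_o) + \nabla_\theta s_{i2}(\theta_o)] = -E[\nabla_\theta^2 l_i(\theta_o)] \equiv -E[H_i(\theta_o)].$$

So we have verified that an unconditional IM equality holds, which means we
can estimate the asymptotic variance of $\sqrt{N}(\hat\theta - \theta_o)$ by $\{-E[H_i(\theta_o)]\}^{-1}$.
d. From part c, one consistent estimator of $-E[H_i(\theta_o)]$ is

$$-N^{-1}\sum_{i=1}^N (r_{i2}\hat H_{i1} + \hat H_{i2}),$$

where the Hessians are evaluated at $\hat\theta$. Alternatively, we can break
the problem into needed consistent estimators of $-E[r_{i2}H_{i1}(\theta_o)]$ and
$-E[H_{i2}(\theta_o)]$, for which we can use iterated expectations. Since, by
definition, $A_{i2}(\theta_o) \equiv -E[H_{i2}(\theta_o)|x_i]$, $N^{-1}\sum_{i=1}^N \hat A_{i2}$ is consistent for
$-E[H_{i2}(\theta_o)]$ by the usual iterated expectations argument. Similarly, since $A_{i1}(\theta_o) \equiv
-E[H_{i1}(\theta_o)|y_{i2},x_i]$, and $r_{i2}$ is a function of $(y_{i2},x_i)$, it follows that
$E[r_{i2}A_{i1}(\theta_o)] = -E[r_{i2}H_{i1}(\theta_o)]$. This implies that, under general regularity
conditions, $N^{-1}\sum_{i=1}^N r_{i2}\hat A_{i1}$ consistently estimates $-E[r_{i2}H_{i1}(\theta_o)]$. So, even though we do not have
a true conditional maximum likelihood problem, we can still use the
conditional expectations of the Hessians (but conditioned on different sets
of variables, $(y_{i2},x_i)$ in one case, and $x_i$ in the other) to consistently
estimate the asymptotic variance of the partial MLE.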
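e. Bonus Question: Show that if we were able to use the entire random
sample, the resulting conditional MLE would be more efficient than the partial
MLE based on the selected sample.
Answer: We use a basic fact about positive definite matrices: if A and
B are $P \times P$ positive definite matrices, then A - B is p.s.d. if and only if
$B^{-1} - A^{-1}$ is p.s.d. Now, as we showed in part d, the asymptotic
variance of the partial MLE is $\{E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$. If we could use
the entire random sample for both terms, the asymptotic variance would be
$\{E[A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$. But $\{E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1} - \{E[A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$
is p.s.d. because $E[A_{i1}(\theta_o) + A_{i2}(\theta_o)] - E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)]
= E[(1 - r_{i2})A_{i1}(\theta_o)]$ is p.s.d. (since $A_{i1}(\theta_o)$ is p.s.d. and $1 - r_{i2} \geq 0$).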

13.9. To be added.

13.11. To be added.

CHAPTER 14

(x1,x2).

## -- these would generally improve efficiency if

g2 \$ 1.

If E(u2|x) =
2

s22,

2SLS using the given list of instruments is the efficient, single equation
GMM estimator.

Even under

## homoskedasticity, these are difficult, if not impossible, to find

analytically if
b. No.

If

g2 \$ 1.
g1 = 0, the parameter g2 does not appear in the model.

course, if we knew

Of

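c. We can see this by obtaining $E(y_1|x)$:

$$E(y_1|x) = x_1\delta_1 + \gamma_1E(y_2^{\gamma_2}|x) + E(u_1|x) = x_1\delta_1 + \gamma_1E(y_2^{\gamma_2}|x).$$

Now, when $\gamma_2 \neq 1$, $E(y_2^{\gamma_2}|x) \neq [E(y_2|x)]^{\gamma_2}$, so we cannot write
$E(y_1|x) = x_1\delta_1 + \gamma_1(x\delta_2)^{\gamma_2}$;
in fact, we cannot find $E(y_1|x)$ without more assumptions. While the
regression $y_2$ on x consistently estimates $\delta_2$, the two-step NLS estimator of
$y_{i1}$ on $x_{i1}$, $(x_i\hat\delta_2)^{\gamma_2}$
will not be consistent for $\delta_1$ and $\gamma_2$. (This is an
example of a "forbidden regression.")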

14.3. Let $Z_i$ be a $G \times L$ matrix that is a function of $x_i$ and let $\Xi_o$ be the probability limit of the weighting matrix. Then the asymptotic variance of the GMM estimator has the form (14.10) with $G_o = E[Z_i'R_o(x_i)]$. So we take $A \equiv G_o'\Xi_oG_o$ and $s(w_i) \equiv G_o'\Xi_oZ_i'r(w_i,\theta_o)$. The optimal score function is $s^*(w_i) \equiv R_o(x_i)'\Omega_o(x_i)^{-1}r(w_i,\theta_o)$. Now we can verify (14.57) with $\rho = 1$:
$$E[s(w_i)s^*(w_i)'] = G_o'\Xi_oE[Z_i'r(w_i,\theta_o)r(w_i,\theta_o)'\Omega_o(x_i)^{-1}R_o(x_i)]$$
$$= G_o'\Xi_oE[Z_i'E\{r(w_i,\theta_o)r(w_i,\theta_o)'|x_i\}\Omega_o(x_i)^{-1}R_o(x_i)]$$
$$= G_o'\Xi_oE[Z_i'\Omega_o(x_i)\Omega_o(x_i)^{-1}R_o(x_i)] = G_o'\Xi_oG_o = A.$$

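14.5. We can write the unrestricted linear projection as $y_{it} = \pi_{t0} + x_i\pi_t + v_{it}$, $t = 1,2,3$, where the vector $(\pi_{t0},\pi_t')'$ is $(1 + 3K) \times 1$, and then $\pi$ is the $(3 + 9K) \times 1$ vector obtained by stacking these vectors. Let $\theta = (\psi,\lambda_1',\lambda_2',\lambda_3',\beta')'$. With the restrictions imposed we have $\pi_{t0} = \psi$, $t = 1,2,3$, and
$$\pi_1 = [(\lambda_1 + \beta)',\lambda_2',\lambda_3']', \quad \pi_2 = [\lambda_1',(\lambda_2 + \beta)',\lambda_3']', \quad \pi_3 = [\lambda_1',\lambda_2',(\lambda_3 + \beta)']'.$$
Therefore, we can write $\pi = H\theta$ for the $(3 + 9K) \times (1 + 4K)$ matrix $H$ defined by
$$H = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & I_K & 0 & 0 & I_K \\
0 & 0 & I_K & 0 & 0 \\
0 & 0 & 0 & I_K & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & I_K & 0 & 0 & 0 \\
0 & 0 & I_K & 0 & I_K \\
0 & 0 & 0 & I_K & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & I_K & 0 & 0 & 0 \\
0 & 0 & I_K & 0 & 0 \\
0 & 0 & 0 & I_K & I_K
\end{pmatrix}.$$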

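14.7. With $h(\theta) = H\theta$, the minimization problem becomes
$$\min_{\theta \in \mathbb{R}^P} (\hat{\pi} - H\theta)'\hat{\Xi}^{-1}(\hat{\pi} - H\theta),$$
with no restrictions placed on $\theta$. The first order condition is easily seen to be
$$-2H'\hat{\Xi}^{-1}(\hat{\pi} - H\hat{\theta}) = 0 \quad \text{or} \quad (H'\hat{\Xi}^{-1}H)\hat{\theta} = H'\hat{\Xi}^{-1}\hat{\pi}.$$
Therefore, assuming $H'\hat{\Xi}^{-1}H$ is nonsingular -- which occurs w.p.a.1 when $H'\Xi_o^{-1}H$ is nonsingular -- we have
$$\hat{\theta} = (H'\hat{\Xi}^{-1}H)^{-1}H'\hat{\Xi}^{-1}\hat{\pi}.$$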
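As a rough numerical illustration of the closed form in 14.7, the following Python sketch treats the simplest special case: a scalar $\theta$ (so $H$ is a $P \times 1$ column) and a diagonal weighting matrix. These restrictions, the function name `minimum_distance_scalar`, and the input values are assumptions made only for this example; they are not part of the problem.

```python
def minimum_distance_scalar(pi_hat, h, xi_diag):
    """Closed-form minimum distance estimate for a single parameter.

    Illustrative special case: theta is a scalar, H is the P x 1 column h,
    and the weighting matrix Xi-hat is diagonal with entries xi_diag.
    Then (H' Xi^{-1} H)^{-1} H' Xi^{-1} pi-hat is a ratio of weighted sums.
    """
    num = sum(hp * pp / xp for hp, pp, xp in zip(h, pi_hat, xi_diag))
    den = sum(hp * hp / xp for hp, xp in zip(h, xi_diag))
    if den == 0:
        raise ValueError("H' Xi^{-1} H is singular")
    return num / den


# Hypothetical inputs: two unrestricted estimates of the same parameter.
theta_equal = minimum_distance_scalar([1.0, 3.0], [1.0, 1.0], [1.0, 1.0])
theta_weighted = minimum_distance_scalar([1.0, 3.0], [1.0, 1.0], [1.0, 3.0])
print(theta_equal, theta_weighted)
```

With equal weights the estimate is the simple average of the two unrestricted estimates; giving the second one a larger variance pulls the estimate toward the first.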

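14.9. We have to verify equations (14.55) and (14.56) for the random effects and fixed effects estimators. The choices of $s_{i1}$, $s_{i2}$ (with added $i$ subscripts for clarity), $A_1$, and $A_2$ are given in the hint. From Chapter 10, we know that $E(r_ir_i'|x_i) = \sigma_u^2I_T$ under RE.1, RE.2, and RE.3, where $r_i = v_i - \lambda j_T\bar{v}_i$. Therefore, $E(s_{i1}s_{i1}') = E(\check{X}_i'r_ir_i'\check{X}_i) = \sigma_u^2E(\check{X}_i'\check{X}_i) \equiv \sigma_u^2A_1$ by iterated expectations. This means that, in (14.55), $\rho \equiv \sigma_u^2$. Now, we just need to verify (14.56) for this choice of $\rho$. But $s_{i2}s_{i1}' = \ddot{X}_i'u_ir_i'\check{X}_i$. Now, as described in the hint, $\ddot{X}_i'r_i = \ddot{X}_i'(v_i - \lambda j_T\bar{v}_i) = \ddot{X}_i'v_i = \ddot{X}_i'(c_ij_T + u_i) = \ddot{X}_i'u_i$. So $s_{i2}s_{i1}' = \ddot{X}_i'r_ir_i'\check{X}_i$ and therefore $E(s_{i2}s_{i1}'|x_i) = \ddot{X}_i'E(r_ir_i'|x_i)\check{X}_i = \sigma_u^2\ddot{X}_i'\check{X}_i$. It follows that $E(s_{i2}s_{i1}') = \sigma_u^2E(\ddot{X}_i'\check{X}_i)$. To finish off the proof, note that $\ddot{X}_i'\check{X}_i = \ddot{X}_i'(X_i - \lambda j_T\bar{x}_i) = \ddot{X}_i'X_i = \ddot{X}_i'\ddot{X}_i$. This verifies (14.56) with $\rho = \sigma_u^2$.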

CHAPTER 15

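15.1. a. Since the dummy regressors are orthogonal by construction ($d_{ki}d_{mi} = 0$ for $k \neq m$), the coefficient on $d_m$ is the fraction of observations in category $m$ with $y_i = 1$. Therefore, the fitted values are just the cell frequencies, and these are necessarily in [0,1].
b. The fitted values for each category will be the same. If we drop $d_1$ but add an overall intercept, the overall intercept is the cell frequency for the first category, and the coefficient on $d_m$ becomes the difference in cell frequency between category $m$ and category one, $m = 2, ..., M$.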

15.3. a. If $P(y = 1|z_1,z_2) = \Phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2)$ then
$$\frac{\partial P(y = 1|z_1,z_2)}{\partial z_2} = (\gamma_1 + 2\gamma_2z_2)\cdot\phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2);$$
for given $z$, this is estimated as
$$(\hat{\gamma}_1 + 2\hat{\gamma}_2z_2)\cdot\phi(z_1\hat{\delta}_1 + \hat{\gamma}_1z_2 + \hat{\gamma}_2z_2^2),$$
where, of course, the estimates are the probit estimates.
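The partial effect in part a can be checked numerically. The sketch below, in Python, codes the formula $(\gamma_1 + 2\gamma_2z_2)\phi(\cdot)$ and compares it with a central finite difference of the response probability. The parameter values are hypothetical, chosen only for illustration, and `z1d1` stands for the index $z_1\delta_1$ evaluated at some fixed $z_1$.

```python
import math


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def prob(z1d1, g1, g2, z2):
    """P(y = 1 | z1, z2) for the probit with a quadratic in z2."""
    return norm_cdf(z1d1 + g1 * z2 + g2 * z2 ** 2)


def partial_effect_z2(z1d1, g1, g2, z2):
    """Analytic partial effect of z2 from part a."""
    return (g1 + 2.0 * g2 * z2) * norm_pdf(z1d1 + g1 * z2 + g2 * z2 ** 2)


# Hypothetical values, not estimates from any data set.
z1d1, g1, g2, z2 = 0.2, 0.5, -0.1, 1.5
analytic = partial_effect_z2(z1d1, g1, g2, z2)
step = 1e-6
numeric = (prob(z1d1, g1, g2, z2 + step)
           - prob(z1d1, g1, g2, z2 - step)) / (2.0 * step)
print(analytic, numeric)
```

In practice the hypothetical values would be replaced by the probit estimates, with `z1d1` computed at the chosen values of $z_1$.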
b. In the model
$$P(y = 1|z_1,z_2,d_1) = \Phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2d_1 + \gamma_3z_2d_1),$$
the partial effect of $z_2$ is
$$\frac{\partial P(y = 1|z_1,z_2,d_1)}{\partial z_2} = (\gamma_1 + \gamma_3d_1)\cdot\phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2d_1 + \gamma_3z_2d_1).$$
The effect of $d_1$ is measured as the difference in the probabilities at $d_1 = 1$ and $d_1 = 0$:
$$P(y = 1|z,d_1 = 1) - P(y = 1|z,d_1 = 0) = \Phi[z_1\delta_1 + (\gamma_1 + \gamma_3)z_2 + \gamma_2] - \Phi(z_1\delta_1 + \gamma_1z_2).$$
Again, to estimate these effects at given $z$ and -- in the first case, $d_1$ -- we just replace the parameters with their probit estimates, and use average or other interesting values of $z$.
c. We would apply the delta method from Chapter 3. Thus, we would require the full variance matrix of the probit estimates as well as the gradient of the expression of interest, such as $(\gamma_1 + 2\gamma_2z_2)\cdot\phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2)$, with respect to the parameters.

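15.5. a. If $P(y = 1|z,q) = \Phi(z_1\delta_1 + \gamma_1z_2q)$ then
$$\frac{\partial P(y = 1|z,q)}{\partial z_2} = \gamma_1q\cdot\phi(z_1\delta_1 + \gamma_1z_2q),$$
assuming that $z_2$ is not functionally related to $z_1$.
b. Write $y^* = z_1\delta_1 + r$, where $r = \gamma_1z_2q + e$, and $e$ is independent of $(z,q)$ with a standard normal distribution. Because $q$ is independent of $z$ with a standard normal distribution, $E(r|z) = \gamma_1z_2E(q|z) + E(e|z) = 0$. Also,
$$Var(r|z) = \gamma_1^2z_2^2Var(q|z) + Var(e|z) + 2\gamma_1z_2Cov(q,e|z) = \gamma_1^2z_2^2 + 1$$
because $Cov(q,e|z) = 0$ by independence between $e$ and $(z,q)$. Thus, $r/\sqrt{\gamma_1^2z_2^2 + 1}$ has a standard normal distribution independent of $z$. It follows that
$$P(y = 1|z) = \Phi\left(z_1\delta_1/\sqrt{\gamma_1^2z_2^2 + 1}\right). \qquad (15.90)$$
c. Because $P(y = 1|z)$ depends only on $\gamma_1^2$, this is what we can estimate along with $\delta_1$. (For example, $\gamma_1 = -2$ and $\gamma_1 = 2$ give exactly the same model for $P(y = 1|z)$.) This is why we define $\rho_1 = \gamma_1^2$. Testing H0: $\rho_1 = 0$ is most easily done using the score or LM test because, under H0, we have a standard probit model. Let $\hat{\delta}_1$ denote the probit estimates under the null that $\rho_1 = 0$. Define $\hat{\Phi}_i = \Phi(z_{i1}\hat{\delta}_1)$, $\hat{\phi}_i = \phi(z_{i1}\hat{\delta}_1)$, $\hat{u}_i = y_i - \hat{\Phi}_i$, and $\tilde{u}_i \equiv \hat{u}_i/\sqrt{\hat{\Phi}_i(1 - \hat{\Phi}_i)}$ (the standardized residuals). The gradient of the mean function in (15.90) with respect to $\delta_1$, evaluated under the null estimates, is simply $\hat{\phi}_iz_{i1}$. The only other quantity needed is the gradient with respect to $\rho_1$ evaluated at the null estimates. But the partial derivative of (15.90) with respect to $\rho_1$ is, for each $i$,
$$-(z_{i1}\delta_1)(z_{i2}^2/2)(\rho_1z_{i2}^2 + 1)^{-3/2}\phi\left(z_{i1}\delta_1/\sqrt{\rho_1z_{i2}^2 + 1}\right).$$
When we evaluate this at $\rho_1 = 0$ and $\hat{\delta}_1$ we get $-(z_{i1}\hat{\delta}_1)(z_{i2}^2/2)\hat{\phi}_i$. Then, the score statistic can be obtained as $NR_u^2$ from the regression
$$\tilde{u}_i \text{ on } \hat{\phi}_iz_{i1}/\sqrt{\hat{\Phi}_i(1 - \hat{\Phi}_i)}, \quad (z_{i1}\hat{\delta}_1)z_{i2}^2\hat{\phi}_i/\sqrt{\hat{\Phi}_i(1 - \hat{\Phi}_i)};$$
under H0, $NR_u^2 \overset{a}{\sim} \chi_1^2$.
d. The model can be estimated by MLE using the formulation with $\rho_1$ in place of $\gamma_1^2$.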

15.7. a. The following Stata output is for part a:

. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

      Source |       SS       df       MS              Number of obs =    2725
-------------+------------------------------           F(  8,  2716) =   30.48
       Model |  44.9720916     8  5.62151145           Prob > F      =  0.0000
    Residual |  500.844422  2716  .184405163           R-squared     =  0.0824
-------------+------------------------------           Adj R-squared =  0.0797
       Total |  545.816514  2724   .20037317           Root MSE      =  .42942

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1543802   .0209336    -7.37   0.000    -.1954275   -.1133329
      avgsen |   .0035024   .0063417     0.55   0.581    -.0089326    .0159374
     tottime |  -.0020613   .0048884    -0.42   0.673    -.0116466     .007524
     ptime86 |  -.0215953   .0044679    -4.83   0.000    -.0303561   -.0128344
       inc86 |  -.0012248    .000127    -9.65   0.000    -.0014738   -.0009759
       black |   .1617183   .0235044     6.88   0.000     .1156299    .2078066
      hispan |   .0892586   .0205592     4.34   0.000     .0489454    .1295718
      born60 |   .0028698   .0171986     0.17   0.867    -.0308539    .0365936
       _cons |   .3609831   .0160927    22.43   0.000      .329428    .3925382
------------------------------------------------------------------------------

. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60, robust

Regression with robust standard errors                 Number of obs =    2725
                                                       F(  8,  2716) =   37.59
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0824
                                                       Root MSE      =  .42942

------------------------------------------------------------------------------
             |               Robust
       arr86 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1543802    .018964    -8.14   0.000    -.1915656   -.1171948
      avgsen |   .0035024   .0058876     0.59   0.552    -.0080423    .0150471
     tottime |  -.0020613   .0042256    -0.49   0.626     -.010347    .0062244
     ptime86 |  -.0215953   .0027532    -7.84   0.000    -.0269938   -.0161967
       inc86 |  -.0012248   .0001141   -10.73   0.000    -.0014487    -.001001
       black |   .1617183   .0255279     6.33   0.000     .1116622    .2117743
      hispan |   .0892586   .0210689     4.24   0.000     .0479459    .1305714
      born60 |   .0028698   .0171596     0.17   0.867    -.0307774     .036517
       _cons |   .3609831   .0167081    21.61   0.000     .3282214    .3937449
------------------------------------------------------------------------------

The estimated effect from increasing pcnv from .25 to .75 is about -.154(.5) = -.077, so the probability of arrest falls by about 7.7 percentage points. There are no important differences between the usual and robust standard errors. In fact, in a couple of cases the robust standard errors are notably smaller.

b. The robust statistic and its p-value are obtained by using the "test" command after appending "robust" to the regression command:

. test avgsen tottime

 ( 1)  avgsen = 0.0
 ( 2)  tottime = 0.0

       F(  2,  2716) =    0.18
            Prob > F =    0.8320

. qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

. test avgsen tottime

 ( 1)  avgsen = 0.0
 ( 2)  tottime = 0.0

       F(  2,  2716) =    0.18
            Prob > F =    0.8360

c. The probit model is estimated as follows:

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

Iteration 0:   log likelihood = -1608.1837
Iteration 1:   log likelihood = -1486.3157
Iteration 2:   log likelihood = -1483.6458
Iteration 3:   log likelihood = -1483.6406

Probit estimates                                  Number of obs   =       2725
                                                  LR chi2(8)      =     249.09
                                                  Prob > chi2     =     0.0000
Log likelihood = -1483.6406                       Pseudo R2       =     0.0774

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.5529248   .0720778    -7.67   0.000    -.6941947   -.4116549
      avgsen |   .0127395   .0212318     0.60   0.548     -.028874    .0543531
     tottime |  -.0076486   .0168844    -0.45   0.651    -.0407414    .0254442
     ptime86 |  -.0812017    .017963    -4.52   0.000    -.1164085   -.0459949
       inc86 |  -.0046346   .0004777    -9.70   0.000    -.0055709   -.0036983
       black |   .4666076   .0719687     6.48   0.000     .3255516    .6076635
      hispan |   .2911005   .0654027     4.45   0.000     .1629135    .4192875
      born60 |   .0112074   .0556843     0.20   0.840    -.0979318    .1203466
       _cons |  -.3138331   .0512999    -6.12   0.000    -.4143791    -.213287
------------------------------------------------------------------------------

Now, we must compute the difference in the normal cdf at the two different values of pcnv, black = 1, hispan = 0, born60 = 1, and at the average values of the remaining variables:

. sum avgsen tottime ptime86 inc86

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
  avgsen |    2725    .6322936   3.508031          0       59.2
 tottime |    2725    .8387523   4.607019          0       63.4
 ptime86 |    2725     .387156   1.950051          0         12
   inc86 |    2725    54.96705   66.62721          0        541

. di -.313 + .0127*.632 - .0076*.839 - .0812*.387 - .0046*54.97 + .467 + .0112
-.1174364

. di normprob(-.553*.75 - .117) - normprob(-.553*.25 - .117)
-.10181543

This last command shows that the probability falls by about .10, which is somewhat larger than the effect obtained from the LPM.
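For readers without Stata, the two `di` calculations can be reproduced with a short Python sketch using only the standard library. It uses the same rounded coefficients and sample means as the commands above; the variable names are illustrative, and `norm_cdf` plays the role of Stata's `normprob`.

```python
import math


def norm_cdf(v):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


# Index contribution of everything except pcnv, using the rounded probit
# coefficients, black = 1, hispan = 0, born60 = 1, and sample means.
index_rest = (-.313 + .0127 * .632 - .0076 * .839 - .0812 * .387
              - .0046 * 54.97 + .467 + .0112)

# The second di command rounds this index to -.117 before using it.
index_rounded = round(index_rest, 3)
b_pcnv = -.553

probit_effect = (norm_cdf(b_pcnv * .75 + index_rounded)
                 - norm_cdf(b_pcnv * .25 + index_rounded))

# Linear probability model effect from part a, for comparison.
lpm_effect = -.154 * (.75 - .25)

print(index_rest, probit_effect, lpm_effect)
```

The probit effect of about -.10 exceeds the LPM effect of -.077 in magnitude, in line with the comparison in the text.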
d. To obtain the percent correctly predicted for each outcome, we first generate the predicted values of arr86 as described on page 465:

. predict phat
(option p assumed; Pr(arr86))

. gen arr86h = phat > .5

. tab arr86h arr86

           |         arr86
    arr86h |         0          1 |     Total
-----------+----------------------+----------
         0 |      1903        677 |      2580
         1 |        67         78 |       145
-----------+----------------------+----------
     Total |      1970        755 |      2725

. di 1903/1970
.96598985

. di 78/755
.10331126

For men who were not arrested, the probit predicts correctly about 96.6% of the time. Unfortunately, for the men who were arrested, the probit is correct only about 10.3% of the time. The overall percent correctly predicted is quite high, but we cannot very well predict the outcome we would most like to predict.
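The same percentages, plus the overall figure that the text refers to but does not print, follow directly from the tabulation. A small Python sketch, with the cell counts copied from the `tab` output and illustrative variable names:

```python
# Keys are (predicted arr86h, actual arr86); counts from the tabulation.
counts = {(0, 0): 1903, (0, 1): 677, (1, 0): 67, (1, 1): 78}

n_not_arrested = counts[(0, 0)] + counts[(1, 0)]   # actual arr86 = 0
n_arrested = counts[(0, 1)] + counts[(1, 1)]       # actual arr86 = 1

pcp_not_arrested = counts[(0, 0)] / n_not_arrested
pcp_arrested = counts[(1, 1)] / n_arrested
pcp_overall = (counts[(0, 0)] + counts[(1, 1)]) / (n_not_arrested + n_arrested)

print(pcp_not_arrested, pcp_arrested, pcp_overall)
```

The overall rate works out to roughly 72.7%, driven almost entirely by the large group of men who were not arrested.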
e. Adding the quadratic terms gives

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 pcnvsq pt86sq inc86sq

Iteration 0:   log likelihood = -1608.1837
Iteration 1:   log likelihood = -1452.2089
Iteration 2:   log likelihood = -1444.3151
Iteration 3:   log likelihood = -1441.8535
Iteration 4:   log likelihood =  -1440.268
Iteration 5:   log likelihood = -1439.8166
Iteration 6:   log likelihood = -1439.8005
Iteration 7:   log likelihood = -1439.8005

Probit estimates                                  Number of obs   =       2725
                                                  LR chi2(11)     =     336.77
                                                  Prob > chi2     =     0.0000
Log likelihood = -1439.8005                       Pseudo R2       =     0.1047

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |   .2167615   .2604937     0.83   0.405    -.2937968    .7273198
      avgsen |   .0139969   .0244972     0.57   0.568    -.0340166    .0620105
     tottime |  -.0178158   .0199703    -0.89   0.372     -.056957    .0213253
     ptime86 |   .7449712   .1438485     5.18   0.000     .4630333    1.026909
       inc86 |  -.0058786   .0009851    -5.97   0.000    -.0078094   -.0039478
       black |   .4368131   .0733798     5.95   0.000     .2929913     .580635
      hispan |   .2663945    .067082     3.97   0.000     .1349163    .3978727
      born60 |  -.0145223   .0566913    -0.26   0.798    -.1256351    .0965905
      pcnvsq |  -.8570512   .2714575    -3.16   0.002    -1.389098   -.3250042
      pt86sq |  -.1035031   .0224234    -4.62   0.000    -.1474522    -.059554
     inc86sq |   8.75e-06   4.28e-06     2.04   0.041     3.63e-07    .0000171
       _cons |   -.337362   .0562665    -6.00   0.000    -.4476423   -.2270817
------------------------------------------------------------------------------
note: 51 failures and 0 successes completely determined.

. test pcnvsq pt86sq inc86sq

 ( 1)  pcnvsq = 0.0
 ( 2)  pt86sq = 0.0
 ( 3)  inc86sq = 0.0

           chi2(  3) =   38.54
         Prob > chi2 =    0.0000

The quadratic terms are jointly very significant. The quadratic in pcnv means that, at low levels of pcnv, there is actually a positive relationship between probability of arrest and pcnv, which does not make much sense. The turning point is .217/[2(.857)] ≈ .127, which means that there is an estimated deterrent effect over most of the range of pcnv.
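The turning point quoted above is the value of pcnv at which the quadratic index $b_1\,pcnv + b_2\,pcnv^2$ peaks, namely $-b_1/(2b_2)$; because the normal cdf is increasing, the response probability turns at the same point. A minimal Python check using the unrounded coefficients from the table (the variable names are illustrative):

```python
# Probit coefficients on pcnv and pcnvsq from the output above.
b_pcnv = .2167615
b_pcnvsq = -.8570512

# The index b_pcnv*pcnv + b_pcnvsq*pcnv^2 is maximized where its
# derivative b_pcnv + 2*b_pcnvsq*pcnv equals zero.
turning_point = -b_pcnv / (2.0 * b_pcnvsq)
slope_at_turning_point = b_pcnv + 2.0 * b_pcnvsq * turning_point

print(turning_point)
```

With the unrounded coefficients the turning point is about .126; the .127 in the text comes from using the coefficients rounded to three decimals.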

15.9. a. Let $P(y = 1|x) = x\beta$, where $x_1 = 1$. Then, for each $i$,
$$\ell_i(\beta) = y_i\log(x_i\beta) + (1 - y_i)\log(1 - x_i\beta),$$
which is only well-defined for $0 < x_i\beta < 1$.
b. For any possible estimate $\hat{\beta}$, the log-likelihood function is well-defined only if $0 < x_i\hat{\beta} < 1$ for all $i = 1,...,N$. It may be impossible to find an estimate that satisfies these inequalities for every observation, especially if $N$ is large.
c. This follows from the KLIC: the true density of $y$ given $x$, evaluated at the true parameter values, maximizes the expected log likelihood. Since the MLEs are consistent for the unknown parameters, asymptotically the true density will produce the highest average log likelihood function. So, just as we can use an R-squared to choose among different functional forms for $E(y|x)$, we can use values of the log-likelihood to choose among different models for $P(y = 1|x)$ when $y$ is binary.

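15.11. We really need to make two assumptions. The first is a conditional independence assumption: given $x_i = (x_{i1},...,x_{iT})$, $(y_{i1},...,y_{iT})$ are independent. This allows us to write
$$f(y_1,...,y_T|x_i) = f_1(y_1|x_i)\cdots f_T(y_T|x_i),$$
that is, the joint density (conditional on $x_i$) is the product of the marginal densities (each conditional on $x_i$). The second assumption is a strict exogeneity assumption: $D(y_{it}|x_i) = D(y_{it}|x_{it})$, $t = 1,...,T$. When we add the standard assumption for pooled probit -- that $D(y_{it}|x_{it})$ follows a probit model -- then
$$f(y_1,...,y_T|x_i) = \prod_{t=1}^T [G(x_{it}\beta)]^{y_t}[1 - G(x_{it}\beta)]^{1-y_t},$$
and so pooled probit is conditional MLE.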

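15.13. a. If there are no covariates, there is no point in using any method other than a straight comparison of means. The estimated probabilities for the treatment and control groups, both before and after the policy change, will be identical across models.
b. Let $d2$ be a binary indicator for the second time period, and let $dB$ be an indicator for the treatment group. Then a probit model to evaluate the treatment effect is
$$P(y = 1|x) = \Phi(\delta_0 + \delta_1d2 + \delta_2dB + \delta_3d2\cdot dB + x\gamma),$$
where $x$ is a vector of covariates. Once we have the estimates, we need to compute the difference-in-differences estimate, which requires either plugging in a value for $x$, say $\bar{x}$, or averaging the differences across $x_i$. In the former case, we have
$$\hat{\theta} \equiv [\Phi(\hat{\delta}_0 + \hat{\delta}_1 + \hat{\delta}_2 + \hat{\delta}_3 + \bar{x}\hat{\gamma}) - \Phi(\hat{\delta}_0 + \hat{\delta}_2 + \bar{x}\hat{\gamma})] - [\Phi(\hat{\delta}_0 + \hat{\delta}_1 + \bar{x}\hat{\gamma}) - \Phi(\hat{\delta}_0 + \bar{x}\hat{\gamma})],$$
and in the latter we have
$$\tilde{\theta} \equiv N^{-1}\sum_{i=1}^N \{[\Phi(\hat{\delta}_0 + \hat{\delta}_1 + \hat{\delta}_2 + \hat{\delta}_3 + x_i\hat{\gamma}) - \Phi(\hat{\delta}_0 + \hat{\delta}_2 + x_i\hat{\gamma})] - [\Phi(\hat{\delta}_0 + \hat{\delta}_1 + x_i\hat{\gamma}) - \Phi(\hat{\delta}_0 + x_i\hat{\gamma})]\}.$$
Both are estimates of the difference, between groups B and A, of the change in the response probability over time.
c. We would have to use the delta method to obtain a valid standard error for either $\hat{\theta}$ or $\tilde{\theta}$.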
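The two estimates in part b of 15.13 can be sketched in Python as follows. The function names, the coefficient values, and the list of index values $x_i\hat{\gamma}$ are all hypothetical; in an application they would come from the probit of $y$ on $1$, $d2$, $dB$, $d2\cdot dB$, and $x$.

```python
import math


def norm_cdf(v):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def did_effect(d0, d1, d2, d3, xg):
    """Difference-in-differences in response probabilities at index xg.

    d1 multiplies the second-period dummy, d2 the treatment-group dummy,
    and d3 their interaction, matching the probit model in part b.
    """
    treated_change = norm_cdf(d0 + d1 + d2 + d3 + xg) - norm_cdf(d0 + d2 + xg)
    control_change = norm_cdf(d0 + d1 + xg) - norm_cdf(d0 + xg)
    return treated_change - control_change


def average_did_effect(d0, d1, d2, d3, xg_values):
    """Average of the difference-in-differences across observations."""
    total = sum(did_effect(d0, d1, d2, d3, xg) for xg in xg_values)
    return total / len(xg_values)


# Hypothetical coefficient estimates and covariate indices x_i * gamma-hat.
coefs = (-0.4, 0.1, 0.2, 0.3)
xg_values = [-0.5, 0.0, 0.25, 0.8]
xg_bar = sum(xg_values) / len(xg_values)

theta_hat = did_effect(*coefs, xg_bar)               # evaluated at the mean
theta_tilde = average_did_effect(*coefs, xg_values)  # averaged over i
no_change = did_effect(0.1, 0.0, 0.3, 0.0, 0.2)      # d1 = d3 = 0
print(theta_hat, theta_tilde, no_change)
```

Because the normal cdf is nonlinear, the two estimates generally differ, which is why the text distinguishes them. Neither function produces a standard error; that would need the delta method, as noted in part c.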

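15.15. We should use an interval regression model; equivalently, ordered probit with known cut points. We would be assuming that the underlying GPA is normally distributed conditional on $x$, but we only observe interval coded data. (Clearly a conditional normal distribution for the GPAs is at best an approximation.) Along with the $\beta_j$ -- including an intercept -- we estimate $\sigma^2$. The estimated coefficients are interpreted as if we had done a linear regression with actual GPAs.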

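15.17. a. We obtain the joint density by the product rule, since we have independence conditional on $(x,c)$:
$$f(y_1,...,y_G|x,c;\gamma_o) = f_1(y_1|x,c;\gamma_o^1)f_2(y_2|x,c;\gamma_o^2)\cdots f_G(y_G|x,c;\gamma_o^G).$$
b. The density of $(y_1,...,y_G)$ given $x$ is obtained by integrating out with respect to the distribution of $c$ given $x$:
$$g(y_1,...,y_G|x;\gamma_o) = \int_{-\infty}^{\infty}\left[\prod_{g=1}^G f_g(y_g|x,c;\gamma_o^g)\right]h(c|x;\delta_o)\,dc,$$
where $c$ is a dummy argument of integration. Because $c$ appears in each $D(y_g|x,c)$, $y_1,...,y_G$ are dependent without conditioning on $c$.
c. The log likelihood for each $i$ is
$$\log\left\{\int_{-\infty}^{\infty}\left[\prod_{g=1}^G f_g(y_{ig}|x_i,c;\gamma^g)\right]h(c|x_i;\delta)\,dc\right\}.$$
As expected, this depends only on the observed data, $(x_i,y_{i1},...,y_{iG})$, and the unknown parameters.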

15.19. To be added.


CHAPTER 16

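16.1. a. $P[\log(t_i) = \log(c)|x_i] = P[\log(t_i^*) > \log(c)|x_i] = P[u_i > \log(c) - x_i\beta|x_i] = 1 - \Phi\{[\log(c) - x_i\beta]/\sigma\}$. As $c \to \infty$, $\Phi\{[\log(c) - x_i\beta]/\sigma\} \to 1$, and so $P[\log(t_i) = \log(c)|x_i] \to 0$ as $c \to \infty$. This simply says that, the longer we wait to censor, the less likely it is that we observe a censored observation.
b. The density of $y_i \equiv \log(t_i)$ (given $x_i$) when $t_i < c$ is the same as the density of $y_i^* \equiv \log(t_i^*)$, which is just Normal$(x_i\beta,\sigma^2)$. This is because, for $y < \log(c)$, $P(y_i \leq y|x_i) = P(y_i^* \leq y|x_i)$. Thus, the density for $y_i = \log(t_i)$ is
$$f(y|x_i) = 1 - \Phi\{[\log(c) - x_i\beta]/\sigma\}, \quad y = \log(c)$$
$$f(y|x_i) = \sigma^{-1}\phi[(y - x_i\beta)/\sigma], \quad y < \log(c).$$
c. $\ell_i(\beta,\sigma^2) = 1[y_i = \log(c)]\cdot\log(1 - \Phi\{[\log(c) - x_i\beta]/\sigma\}) + 1[y_i < \log(c)]\cdot\log\{\sigma^{-1}\phi[(y_i - x_i\beta)/\sigma]\}$.
d. To test H0: $\beta_2 = 0$, I would probably use the likelihood ratio statistic. This requires estimating the model with all variables, and then the model without $x_2$. The LR statistic is $LR = 2(\mathcal{L}_{ur} - \mathcal{L}_r)$. Under H0, $LR$ is distributed asymptotically as $\chi^2_{K_2}$.
e. Since $u_i$ is independent of $(x_i,c_i)$, the density of $y_i$ given $(x_i,c_i)$ has the same form as the density of $y_i$ given $x_i$ above, except that $c_i$ replaces $c$. The assumption that $u_i$ is independent of $c_i$ means that the decision to censor is not related to unobservables affecting $t_i^*$. Thus, in something like an unemployment duration equation, where $u_i$ might contain unobserved ability, we do not wait longer to censor people of lower ability. Note that $c_i$ can be related to $x_i$. Thus, if $x_i$ contains something like education, which is treated as exogenous, then the censoring time can depend on education.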
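The log likelihood in part c of 16.1 translates almost line for line into code. The Python sketch below evaluates $\ell_i$ for a single observation; the function name `loglik_i` and the argument values are illustrative, with `xb` standing for $x_i\beta$ and `log_c` for $\log(c)$.

```python
import math


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def loglik_i(y, xb, sigma, log_c):
    """Log likelihood of one observation on y = log(duration).

    Observations recorded at the censoring point log_c contribute the log
    probability of censoring; all others contribute the log normal density.
    """
    if y >= log_c:
        return math.log(1.0 - norm_cdf((log_c - xb) / sigma))
    return math.log(norm_pdf((y - xb) / sigma) / sigma)


# Hypothetical observations: one censored, one uncensored.
ll_censored = loglik_i(y=2.0, xb=2.0, sigma=1.0, log_c=2.0)
ll_uncensored = loglik_i(y=1.0, xb=1.0, sigma=1.0, log_c=2.0)
print(ll_censored, ll_uncensored)
```

For the setting in part e, `log_c` would simply be replaced by an observation-specific $\log(c_i)$. Summing `loglik_i` over $i$ and maximizing over $(\beta,\sigma)$ gives the MLE; the sketch does not guard against the censoring probability underflowing to zero for extreme index values.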

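16.3. a. $P(y_i = a_1|x_i) = P(y_i^* \leq a_1|x_i) = P[(u_i/\sigma) \leq (a_1 - x_i\beta)/\sigma] = \Phi[(a_1 - x_i\beta)/\sigma]$. Similarly,
$$P(y_i = a_2|x_i) = P(y_i^* \geq a_2|x_i) = P[(u_i/\sigma) \geq (a_2 - x_i\beta)/\sigma] = 1 - \Phi[(a_2 - x_i\beta)/\sigma] = \Phi[-(a_2 - x_i\beta)/\sigma].$$
Next, for $a_1 < y < a_2$, $P(y_i \leq y|x_i) = P(y_i^* \leq y|x_i) = \Phi[(y - x_i\beta)/\sigma]$. Taking the derivative of this cdf with respect to $y$ gives the pdf of $y_i$ conditional on $x_i$ for values of $y$ strictly between $a_1$ and $a_2$: $(1/\sigma)\phi[(y - x_i\beta)/\sigma]$.
b. Since $y = y^*$ when $a_1 < y^* < a_2$, $E(y|x,a_1 < y < a_2) = E(y^*|x,a_1 < y^* < a_2)$. But $y^* = x\beta + u$, and $a_1 < y^* < a_2$ if and only if $a_1 - x\beta < u < a_2 - x\beta$. Therefore, using the hint,
$$E(y^*|x,a_1 < y^* < a_2) = x\beta + E[u|x,a_1 - x\beta < u < a_2 - x\beta]$$
$$= x\beta + \sigma\frac{\phi[(a_1 - x\beta)/\sigma] - \phi[(a_2 - x\beta)/\sigma]}{\Phi[(a_2 - x\beta)/\sigma] - \Phi[(a_1 - x\beta)/\sigma]} = E(y|x,a_1 < y < a_2).$$
Now, we can easily get $E(y|x)$ by using the following:
$$E(y|x) = a_1P(y = a_1|x) + E(y|x,a_1 < y < a_2)\cdot P(a_1 < y < a_2|x) + a_2P(y = a_2|x)$$
$$= a_1\Phi[(a_1 - x\beta)/\sigma] + E(y|x,a_1 < y < a_2)\cdot\{\Phi[(a_2 - x\beta)/\sigma] - \Phi[(a_1 - x\beta)/\sigma]\} + a_2\Phi[(x\beta - a_2)/\sigma]$$
$$= a_1\Phi[(a_1 - x\beta)/\sigma] + (x\beta)\cdot\{\Phi[(a_2 - x\beta)/\sigma] - \Phi[(a_1 - x\beta)/\sigma]\} + \sigma\{\phi[(a_1 - x\beta)/\sigma] - \phi[(a_2 - x\beta)/\sigma]\} + a_2\Phi[(x\beta - a_2)/\sigma]. \qquad (16.57)$$
c. From part b it is clear that $E(y^*|x,a_1 < y^* < a_2) \neq x\beta$, and so it would be a fluke if OLS on the restricted sample consistently estimated $\beta$. The linear regression of $y_i$ on $x_i$ using only those $y_i$ such that $a_1 < y_i < a_2$ consistently estimates the linear projection of $y^*$ on $x$ in the subpopulation for which $a_1 < y^* < a_2$. Generally, there is no reason to think that this will have any simple relationship to the parameter vector $\beta$. [In some restrictive cases, the regression on the restricted subsample could consistently estimate $\beta$ up to a common scale coefficient.]
d. We get the log-likelihood immediately from part a:
$$\ell_i(\theta) = 1[y_i = a_1]\log\{\Phi[(a_1 - x_i\beta)/\sigma]\} + 1[y_i = a_2]\log\{\Phi[(x_i\beta - a_2)/\sigma]\} + 1[a_1 < y_i < a_2]\log\{(1/\sigma)\phi[(y_i - x_i\beta)/\sigma]\}.$$
Note how the indicator function selects out the appropriate density for each of the three possible cases: at the left endpoint, at the right endpoint, or strictly between the endpoints.
e. After obtaining the maximum likelihood estimates $\hat{\beta}$ and $\hat{\sigma}^2$, just plug these into the formulas in part b. The expressions can be evaluated at interesting values of $x$.
f. We can show this by brute-force differentiation of equation (16.57). As a shorthand, write $\phi_1 \equiv \phi[(a_1 - x\beta)/\sigma]$, $\phi_2 \equiv \phi[(a_2 - x\beta)/\sigma] = \phi[(x\beta - a_2)/\sigma]$, $\Phi_1 \equiv \Phi[(a_1 - x\beta)/\sigma]$, and $\Phi_2 \equiv \Phi[(a_2 - x\beta)/\sigma]$. Then
$$\frac{\partial E(y|x)}{\partial x_j} = -(a_1/\sigma)\phi_1\beta_j + (a_2/\sigma)\phi_2\beta_j$$
$$+ (\Phi_2 - \Phi_1)\beta_j + [(x\beta/\sigma)(\phi_1 - \phi_2)]\beta_j$$
$$+ \{[(a_1 - x\beta)/\sigma]\phi_1\}\beta_j - \{[(a_2 - x\beta)/\sigma]\phi_2\}\beta_j,$$
where the first two parts are the derivatives of the first and third terms, respectively, in (16.57), and the last two lines are obtained from differentiating the second term in $E(y|x)$. Careful inspection shows that all terms cancel except $(\Phi_2 - \Phi_1)\beta_j$, which is the expression we wanted to be left with.
The scale factor is simply the probability that a standard normal random variable falls in the interval $[(a_1 - x\beta)/\sigma,(a_2 - x\beta)/\sigma]$, which is necessarily between zero and one.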
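The cancellation in part f is easy to confirm numerically. The Python sketch below codes (16.57) and compares a finite-difference derivative with respect to the index $x\beta$ against the scale factor $\Phi_2 - \Phi_1$. The parameter values are hypothetical; multiplying the scale factor by $\beta_j$ gives the partial effect of $x_j$.

```python
import math


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def two_limit_mean(xb, sigma, a1, a2):
    """E(y|x) for the two-limit Tobit model, equation (16.57)."""
    z1 = (a1 - xb) / sigma
    z2 = (a2 - xb) / sigma
    return (a1 * norm_cdf(z1)
            + xb * (norm_cdf(z2) - norm_cdf(z1))
            + sigma * (norm_pdf(z1) - norm_pdf(z2))
            + a2 * norm_cdf(-z2))


def scale_factor(xb, sigma, a1, a2):
    """Probability that y* falls strictly between the two limits."""
    return norm_cdf((a2 - xb) / sigma) - norm_cdf((a1 - xb) / sigma)


# Hypothetical index, scale, and limits.
xb, sigma, a1, a2 = 4.0, 3.0, 0.0, 10.0
step = 1e-5
numeric = (two_limit_mean(xb + step, sigma, a1, a2)
           - two_limit_mean(xb - step, sigma, a1, a2)) / (2.0 * step)
analytic = scale_factor(xb, sigma, a1, a2)
mean_value = two_limit_mean(xb, sigma, a1, a2)
print(numeric, analytic, mean_value)
```

The same `scale_factor` function, evaluated at the MLEs, gives the estimated partial effects once multiplied by the coefficient estimates.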
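g. The partial effects on $E(y|x)$ are given in part f. These are estimated as
$$\{\Phi[(a_2 - x\hat{\beta})/\hat{\sigma}] - \Phi[(a_1 - x\hat{\beta})/\hat{\sigma}]\}\hat{\beta}_j, \qquad (16.58)$$
where the estimates are the MLEs. We could evaluate these partial effects at, say, $\bar{x}$. Or, we could average $\{\Phi[(a_2 - x_i\hat{\beta})/\hat{\sigma}] - \Phi[(a_1 - x_i\hat{\beta})/\hat{\sigma}]\}$ across all $i$ to obtain the average partial effect. In either case, the scaled $\hat{\beta}_j$ can be compared to the $\hat{\gamma}_j$. Generally, we expect
$$\hat{\gamma}_j \approx \hat{\rho}\cdot\hat{\beta}_j,$$
where $0 < \hat{\rho} < 1$ is the scale factor. Of course, this approximation need not be very good in a particular application, but it is often roughly true. It does not make sense to directly compare the magnitude of $\hat{\beta}_j$ with that of $\hat{\gamma}_j$. By the way, note that $\hat{\sigma}$ appears in the partial effects along with the $\hat{\beta}_j$; there is no sense in which $\hat{\sigma}$ is "ancillary."
h. For data censoring where the censoring points might change with $i$, the analysis is essentially the same but $a_1$ and $a_2$ are replaced with $a_{i1}$ and $a_{i2}$. Interpreting the results is even easier, since we act as if we were able to do OLS on an uncensored sample.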

16.5. a. The results from OLS estimation of the linear model are

. reg hrbens exper age educ tenure married male white nrtheast nrthcen south union

  Source |       SS       df       MS                  Number of obs =     616
---------+------------------------------               F( 11,   604) =   32.50
   Model |  101.132288    11  9.19384436               Prob > F      =  0.0000
Residual |  170.839786   604  .282847328               R-squared     =  0.3718
---------+------------------------------               Adj R-squared =  0.3604
   Total |  271.972074   615  .442231015               Root MSE      =  .53183

------------------------------------------------------------------------------
  hrbens |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .0029862   .0043435      0.688   0.492      -.005544     .0115164
     age |  -.0022495   .0041162     -0.547   0.585     -.0103333     .0058343
    educ |    .082204   .0083783      9.812   0.000      .0657498     .0986582
  tenure |   .0281931   .0035481      7.946   0.000       .021225     .0351612
 married |   .0899016   .0510187      1.762   0.079      -.010294     .1900971
    male |    .251898   .0523598      4.811   0.000      .1490686     .3547274
   white |    .098923   .0746602      1.325   0.186     -.0477021     .2455481
nrtheast |  -.0834306   .0737578     -1.131   0.258     -.2282836     .0614223
 nrthcen |  -.0492621   .0678666     -0.726   0.468     -.1825451      .084021
   south |  -.0284978   .0673714     -0.423   0.672     -.1608084     .1038129
   union |   .3768401   .0499022      7.552   0.000      .2788372     .4748429
   _cons |  -.6999244   .1772515     -3.949   0.000     -1.048028    -.3518203
------------------------------------------------------------------------------

b. The Tobit estimates are

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union, ll(0)

Tobit Estimates                                   Number of obs   =        616
                                                  chi2(11)        =     283.86
                                                  Prob > chi2     =     0.0000
Log Likelihood = -519.66616                       Pseudo R2       =     0.2145

------------------------------------------------------------------------------
  hrbens |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .0040631   .0046627      0.871   0.384     -.0050939     .0132201
     age |  -.0025859   .0044362     -0.583   0.560     -.0112981     .0061263
    educ |   .0869168   .0088168      9.858   0.000      .0696015     .1042321
  tenure |   .0287099   .0037237      7.710   0.000       .021397     .0360227
 married |   .1027574   .0538339      1.909   0.057     -.0029666     .2084814
    male |   .2556765   .0551672      4.635   0.000      .1473341      .364019
   white |   .0994408    .078604      1.265   0.206      -.054929     .2538105
nrtheast |  -.0778461   .0775035     -1.004   0.316     -.2300547     .0743625
 nrthcen |  -.0489422   .0713965     -0.685   0.493     -.1891572     .0912729
   south |  -.0246854   .0709243     -0.348   0.728     -.1639731     .1146022
   union |   .4033519   .0522697      7.717   0.000      .3006999     .5060039
   _cons |  -.8137158   .1880725     -4.327   0.000      -1.18307    -.4443616
---------+--------------------------------------------------------------------
     _se |   .5551027   .0165773              (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:     41 left-censored observations at hrbens<=0
                 575 uncensored observations

The Tobit and OLS estimates are similar because only 41 of 616 observations, or about 6.7% of the sample, have hrbens = 0. As expected, the Tobit estimates are all slightly larger in magnitude; this reflects that the scale factor is always less than unity. The entry labeled "_se" is the estimate of $\sigma$. You should ignore the phrase "Ancillary parameter" (which essentially means "subordinate") associated with "_se" as it is misleading for corner solution applications: as we know, $\hat{\sigma}^2$ appears directly in the estimates of $E(y|x)$ and $E(y|x,y > 0)$.

c. Here is what happens when exper^2 and tenure^2 are included:

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq, ll(0)

Tobit Estimates                                   Number of obs   =        616
                                                  chi2(13)        =     315.95
                                                  Prob > chi2     =     0.0000
Log Likelihood = -503.62108                       Pseudo R2       =     0.2388

------------------------------------------------------------------------------
  hrbens |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .0306652   .0085253      3.597   0.000      .0139224      .047408
     age |  -.0040294   .0043428     -0.928   0.354     -.0125583     .0044995
    educ |   .0802587   .0086957      9.230   0.000      .0631812     .0973362
  tenure |   .0581357   .0104947      5.540   0.000       .037525     .0787463
 married |   .0714831   .0528969      1.351   0.177     -.0324014     .1753675
    male |   .2562597   .0539178      4.753   0.000      .1503703     .3621491
   white |   .0906783   .0768576      1.180   0.239     -.0602628     .2416193
nrtheast |  -.0480194   .0760238     -0.632   0.528      -.197323     .1012841
 nrthcen |   -.033717   .0698213     -0.483   0.629     -.1708394     .1034053
   south |   -.017479   .0693418     -0.252   0.801     -.1536597     .1187017
   union |   .3874497    .051105      7.581   0.000      .2870843     .4878151
 expersq |  -.0005524   .0001487     -3.715   0.000     -.0008445    -.0002604
tenuresq |  -.0013291   .0004098     -3.243   0.001      -.002134    -.0005242
   _cons |  -.9436572   .1853532     -5.091   0.000     -1.307673    -.5796409
---------+--------------------------------------------------------------------
     _se |   .5418171   .0161572              (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:     41 left-censored observations at hrbens<=0
                 575 uncensored observations

Both squared terms are very significant, so they should be included in the model.
d. There are nine industries, and we use ind1 as the base industry:

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq ind2-ind9, ll(0)

Tobit Estimates                                   Number of obs   =        616
                                                  chi2(21)        =     388.99
                                                  Prob > chi2     =     0.0000
Log Likelihood = -467.09766                       Pseudo R2       =     0.2940

------------------------------------------------------------------------------
  hrbens |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .0267869   .0081297      3.295   0.001      .0108205     .0427534
     age |  -.0034182   .0041306     -0.828   0.408     -.0115306     .0046942
    educ |   .0789402   .0088598      8.910   0.000        .06154     .0963403
  tenure |    .053115   .0099413      5.343   0.000      .0335907     .0726393
 married |   .0547462   .0501776      1.091   0.276     -.0438005     .1532928
    male |   .2411059   .0556864      4.330   0.000      .1317401     .3504717
   white |   .1188029   .0735678      1.615   0.107     -.0256812     .2632871
nrtheast |  -.1016799   .0721422     -1.409   0.159     -.2433643     .0400045
 nrthcen |  -.0724782   .0667174     -1.086   0.278     -.2035085     .0585521
   south |  -.0379854   .0655859     -0.579   0.563     -.1667934     .0908226
   union |   .3143174   .0506381      6.207   0.000      .2148662     .4137686
 expersq |  -.0004405   .0001417     -3.109   0.002     -.0007188    -.0001623
tenuresq |  -.0013026   .0003863     -3.372   0.000     -.0020613     -.000544
    ind2 |  -.3731778   .3742017     -0.997   0.319     -1.108095     .3617389
    ind3 |  -.0963657    .368639     -0.261   0.794     -.8203574     .6276261
    ind4 |  -.2351539   .3716415     -0.633   0.527     -.9650425     .4947348
    ind5 |   .0209362    .373072      0.056   0.955     -.7117618     .7536342
    ind6 |  -.5083107   .3682535     -1.380   0.168     -1.231545      .214924
    ind7 |   .0033643   .3739442      0.009   0.993     -.7310468     .7377754
    ind8 |  -.6107854    .376006     -1.624   0.105     -1.349246      .127675
    ind9 |  -.3257878   .3669437     -0.888   0.375      -1.04645     .3948746
   _cons |  -.5750527   .4137824     -1.390   0.165     -1.387704     .2375989
---------+--------------------------------------------------------------------
     _se |   .5099298   .0151907              (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:     41 left-censored observations at hrbens<=0
                 575 uncensored observations

 ( 1)  ind2 = 0.0
 ( 2)  ind3 = 0.0
 ( 3)  ind4 = 0.0
 ( 4)  ind5 = 0.0
 ( 5)  ind6 = 0.0
 ( 6)  ind7 = 0.0
 ( 7)  ind8 = 0.0
 ( 8)  ind9 = 0.0

       F(  8,   595) =    9.66
            Prob > F =    0.0000

Each industry dummy variable is individually insignificant at even the 10% level, but the joint Wald test says that they are jointly very significant. This is somewhat unusual for dummy variables that are necessarily orthogonal (so that there is not a multicollinearity problem among them). The likelihood ratio statistic is 2(503.621 - 467.098) = 73.046; notice that this is roughly 8 (= number of restrictions) times the F statistic; the p-value for the LR statistic is also essentially zero. Certainly several estimates on the industry dummies are economically significant, with a worker in, say, industry eight earning about 61 cents less per hour in benefits than a comparable worker in industry one. [Remember, in this example, with so few observations at zero, it is roughly legitimate to use the parameter estimates as the partial effects.]
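The likelihood ratio calculation in part d can be reproduced from the two reported log likelihoods. The Python sketch below also computes the chi-square p-value using the closed-form survival function that is available when the degrees of freedom are even (here 8); the function name `chi2_sf_even_df` is illustrative and is not a library routine.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-square_df > x) for even df.

    Uses the identity exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!,
    which holds only when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Log likelihoods from the Tobit output in parts c (restricted) and d.
ll_restricted = -503.62108
ll_unrestricted = -467.09766

lr = 2.0 * (ll_unrestricted - ll_restricted)
p_value = chi2_sf_even_df(lr, 8)
f_times_q = 8 * 9.66   # Wald F statistic times the number of restrictions

print(lr, p_value, f_times_q)
```

The LR statistic of about 73.05 is close to eight times the F statistic (77.28), and the p-value is far below any conventional significance level.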

16.7. a. This follows because the densities conditional on $y > 0$ are identical for the Tobit model and Cragg's model (see 17.3 for a more general case). Briefly, if $f(\cdot|x)$ is the continuous density of $y$ given $x$, then the density of $y$ given $x$ and $y > 0$ is $f(\cdot|x)/[1 - F(0|x)]$, where $F(\cdot|x)$ is the cdf of $y$ given $x$. When $f$ is the normal pdf with mean $x\beta$ and variance $\sigma^2$, we get that $f(y|x,y > 0) = \{\Phi(x\beta/\sigma)\}^{-1}\{\phi[(y - x\beta)/\sigma]/\sigma\}$ for the Tobit model, and this is exactly the density specified for Cragg's model given $y > 0$.
b. From (16.8) we have
$$E(y|x) = \Phi(x\gamma)\cdot E(y|x,y > 0) = \Phi(x\gamma)[x\beta + \sigma\lambda(x\beta/\sigma)].$$
c. This follows very generally -- not just for Cragg's model or the Tobit model -- from (16.8):
$$\log[E(y|x)] = \log[P(y > 0|x)] + \log[E(y|x,y > 0)].$$
If we take the partial derivative with respect to $\log(x_1)$ we clearly get the sum of the elasticities.
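The mean in part b of 16.7 can be coded directly, and doing so gives a quick check that Cragg's model nests the Tobit model: when $x\gamma = x\beta/\sigma$, the expression collapses to the usual Tobit mean $\Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma)$. The Python sketch below uses hypothetical index values, with `xg` for $x\gamma$ and `xb` for $x\beta$.

```python
import math


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def inverse_mills(c):
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)


def cragg_mean(xg, xb, sigma):
    """E(y|x) from part b: Phi(x*gamma) times E(y|x, y > 0)."""
    return norm_cdf(xg) * (xb + sigma * inverse_mills(xb / sigma))


def tobit_mean(xb, sigma):
    """E(y|x) for the standard Tobit model."""
    return norm_cdf(xb / sigma) * xb + sigma * norm_pdf(xb / sigma)


# Hypothetical values for the indices and sigma.
xb, sigma = 0.8, 1.5
nested = cragg_mean(xb / sigma, xb, sigma)     # imposes gamma = beta/sigma
tobit = tobit_mean(xb, sigma)
unrestricted = cragg_mean(1.0, xb, sigma)      # a different participation index
print(nested, tobit, unrestricted)
```

Raising the participation index `xg` while holding `xb` fixed increases $E(y|x)$ only through $P(y > 0|x)$, which is the separation the elasticity decomposition in part c relies on.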

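16.9. a. A two-limit Tobit model, of the kind analyzed in Problem 16.3, is appropriate, with $a_1 = 0$, $a_2 = 10$.
b. The lower limit at zero is logically necessary considering the kind of response: the smallest percentage of one's income that can be invested in a pension plan is zero. On the other hand, the upper limit of 10 is a corner imposed by the rules of the plan, and some people at the corner $y = 10$ might choose $y > 10$ if they could. So, we can think of an underlying variable, which would be the percentage invested in the absence of any restrictions. Then, there would be no upper bound required (since we would not have to worry about 100 percent of income being invested in a pension plan).
c. From Problem 16.3(b), with $a_1 = 0$, we have
$$E(y|x) = (x\beta)\cdot\{\Phi[(a_2 - x\beta)/\sigma] - \Phi(-x\beta/\sigma)\} + \sigma\{\phi(x\beta/\sigma) - \phi[(a_2 - x\beta)/\sigma]\} + a_2\Phi[(x\beta - a_2)/\sigma].$$
Taking the derivative of this function with respect to $a_2$ gives
$$\frac{\partial E(y|x)}{\partial a_2} = (x\beta/\sigma)\phi[(a_2 - x\beta)/\sigma] + [(a_2 - x\beta)/\sigma]\phi[(a_2 - x\beta)/\sigma] + \Phi[(x\beta - a_2)/\sigma] - (a_2/\sigma)\phi[(x\beta - a_2)/\sigma]$$
$$= \Phi[(x\beta - a_2)/\sigma]. \qquad (16.59)$$
We can plug in $a_2 = 10$ to obtain the approximate effect of increasing the cap from 10 to 11. For a given value of $x$, we would compute $\Phi[(x\hat{\beta} - 10)/\hat{\sigma}]$, where $\hat{\beta}$ and $\hat{\sigma}$ are the MLEs. We might evaluate this expression at the sample average of $x$ or at other interesting values (such as across gender or race).
d. If $y_i < 10$ for $i = 1,...,N$, $\hat{\beta}$ and $\hat{\sigma}$ are just the usual Tobit estimates with the "censoring" at zero.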
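The result in part c of 16.9, that the derivative of $E(y|x)$ with respect to the cap $a_2$ is $\Phi[(x\beta - a_2)/\sigma]$, can be checked against a finite difference, and the exact effect of moving the cap from 10 to 11 can be compared with that approximation. In the Python sketch below the values of $x\beta$ and $\sigma$ are hypothetical stand-ins for MLEs.

```python
import math


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def two_limit_mean(xb, sigma, a1, a2):
    """E(y|x) for the two-limit Tobit model with limits a1 < a2."""
    z1 = (a1 - xb) / sigma
    z2 = (a2 - xb) / sigma
    return (a1 * norm_cdf(z1)
            + xb * (norm_cdf(z2) - norm_cdf(z1))
            + sigma * (norm_pdf(z1) - norm_pdf(z2))
            + a2 * norm_cdf(-z2))


def cap_effect(xb, sigma, a2):
    """Derivative of E(y|x) with respect to the upper limit, as in (16.59)."""
    return norm_cdf((xb - a2) / sigma)


# Hypothetical index and scale; limits follow the problem (0 and 10).
xb, sigma = 6.0, 4.0
step = 1e-5
numeric = (two_limit_mean(xb, sigma, 0.0, 10.0 + step)
           - two_limit_mean(xb, sigma, 0.0, 10.0 - step)) / (2.0 * step)
analytic = cap_effect(xb, sigma, 10.0)
exact_change = (two_limit_mean(xb, sigma, 0.0, 11.0)
                - two_limit_mean(xb, sigma, 0.0, 10.0))
print(numeric, analytic, exact_change)
```

The exact one-unit change is a little smaller than the derivative evaluated at 10, because the derivative falls as the cap rises.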

16.11. No. OLS always consistently estimates the parameters of a linear projection -- provided the second moments of y and the xj are finite, and Var(x) has full rank K -- regardless of the nature of y or x. That is why a linear regression analysis is always a reasonable first step for binary outcomes, corner solution outcomes, and count outcomes, provided there is not true data censoring.

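16.13. This extension has no practical effect on how we estimate an unobserved effects Tobit or probit model, or how we estimate a variety of unobserved effects panel data models with conditional normal heterogeneity. We simply have
$$c_i = -\left(T^{-1}\sum_{t=1}^T \pi_t\right)\xi + \bar{x}_i\xi + a_i \equiv \psi + \bar{x}_i\xi + a_i,$$
where $\psi \equiv -\left(T^{-1}\sum_{t=1}^T \pi_t\right)\xi$. Of course, any aggregate time dummies explicitly get swept out of $\bar{x}_i$ in this case but would usually be included in $x_{it}$.
An interesting follow-up question would have been: What if we standardize each $x_{it}$ by its cross-sectional mean and variance at time $t$, and assume $c_i$ is related to the mean and variance of the standardized vectors. In other words, let $z_{it} \equiv (x_{it} - \pi_t)\Omega_t^{-1/2}$, $t = 1,...,T$, for each random draw $i$ from the population. Then, we might assume $c_i|x_i \sim$ Normal$(\psi + \bar{z}_i\xi,\sigma_a^2)$ (where, again, $z_{it}$ would not contain aggregate time dummies). This is the kind of scenario that is handled by Chamberlain's more general assumption concerning the relationship between $c_i$ and $x_i$: $c_i = \psi + \sum_{r=1}^T x_{ir}\lambda_r + a_i$, where $\lambda_r = \Omega_r^{-1/2}\xi/T$, $r = 1,2,...,T$. Alternatively, one could estimate $\pi_t$ and $\Omega_t$ for each $t$ using the cross section observations $\{x_{it}: i = 1,2,...,N\}$. The usual sample means and sample variance matrices, say $\hat{\pi}_t$ and $\hat{\Omega}_t$, are consistent and $\sqrt{N}$-asymptotically normal. Then, form $\hat{z}_{it} \equiv \hat{\Omega}_t^{-1/2}(x_{it} - \hat{\pi}_t)$, and proceed with the usual Tobit (or probit) unobserved effects analysis that includes the time averages $\bar{\hat{z}}_i = T^{-1}\sum_{t=1}^T \hat{z}_{it}$. This is a rather simple two-step estimation method, but accounting for the sample variation in $\hat{\pi}_t$ and $\hat{\Omega}_t$ would be cumbersome. It may be possible to use a much larger sample to obtain $\hat{\pi}_t$ and $\hat{\Omega}_t$, in which case one might ignore the sampling error in the first-stage estimates.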

16.15. To be added.

CHAPTER 17

17.1. If you are interested in the effects of things like age of the building and neighborhood demographics on fire damage, given that a fire has occurred, then there is no problem. You might want to supplement this with an analysis of the probability that buildings catch fire, given building and neighborhood characteristics.

17.3. This is essentially given in equation (17.14). Let $y_i$ given $x_i$
have density $f(y|x_i;\beta,\gamma)$, with cdf $F(y|x_i;\beta,\gamma)$, where
$\gamma$ is another set of parameters. Then the density of $y_i$ given $x_i$,
$s_i = 1$, when $s_i = 1[a_1(x_i) < y_i < a_2(x_i)]$, is

$$p(y|x_i,s_i=1) = \frac{f(y|x_i;\beta,\gamma)}{F(a_2(x_i)|x_i;\beta,\gamma) - F(a_1(x_i)|x_i;\beta,\gamma)}, \qquad a_1(x_i) < y < a_2(x_i).$$

In the Hausman and Wise (1977) study, $y_i = \log(income_i)$,
$a_1(x_i) = -\infty$, and $a_2(x_i)$ was a function of family size (which
determines the official poverty level).
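The truncated density can be sketched in Python for the case where y given x is normal, which is an assumption made here for illustration only. The function names are hypothetical, and a one-sided selection rule is handled by passing an infinite bound.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_density(y, mean, sigma, a1, a2):
    """p(y | x, s = 1) when y | x ~ Normal(mean, sigma^2) and the unit is
    selected only if a1 < y < a2. Use -math.inf or math.inf for one-sided rules."""
    if not (a1 < y < a2):
        return 0.0
    numerator = norm_pdf((y - mean) / sigma) / sigma
    denominator = norm_cdf((a2 - mean) / sigma) - norm_cdf((a1 - mean) / sigma)
    return numerator / denominator
```

With a1 set to minus infinity, as in the truncation-from-above setup described here, the denominator is just the cdf evaluated at a2.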

17.5. If we replace $y_2$ with $\hat{y}_2$, we need to see what happens when
$y_2 = z\delta_2 + v_2$ is plugged into the structural model:

$$y_1 = z_1\delta_1 + \alpha_1(z\delta_2 + v_2) + u_1 = z_1\delta_1 + \alpha_1(z\delta_2) + (u_1 + \alpha_1 v_2). \tag{17.81}$$

So, the procedure is to replace $\delta_2$ in (17.81) with its
$\sqrt{N}$-consistent estimator, $\hat\delta_2$. The error term in (17.81) is
$u_1 + \alpha_1 v_2$. If the selection correction is going to work, we need
the expected value of $u_1 + \alpha_1 v_2$ given $(z,v_3)$ to be linear in
$v_3$ (in particular, it cannot depend on z). Then we can write

$$E(y_1|z,v_3) = z_1\delta_1 + \alpha_1(z\delta_2) + \gamma_1 v_3,$$

where $E[(u_1 + \alpha_1 v_2)|v_3] = \gamma_1 v_3$ by normality. Conditioning
on $y_3 = 1$ gives

$$E(y_1|z,y_3 = 1) = z_1\delta_1 + \alpha_1(z\delta_2) + \gamma_1\lambda(z\delta_3). \tag{17.82}$$

A sufficient condition for (17.82) is that $(u_1,v_2,v_3)$ is independent of z
with a trivariate normal distribution. [...] the nature of $v_2$ is
restricted. [...] nothing about $v_2$ except for the usual linear projection
assumption.

As a practical matter, if we cannot write $y_2 = z\delta_2 + v_2$, where $v_2$
is independent of z and approximately normal, then the OLS alternative will
not be consistent. [...] This is why 2SLS is generally preferred.

17.7. a. Substitute the reduced forms for $y_1$ and $y_2$ into the third
equation:

$$y_3 = \max(0, \alpha_1(z\delta_1) + \alpha_2(z\delta_2) + z_3\delta_3 + v_3) \equiv \max(0, z\pi_3 + v_3),$$

where $v_3 \equiv u_3 + \alpha_1 v_1 + \alpha_2 v_2$ is independent of z and
normally distributed. Thus, if we knew $\delta_1$ and $\delta_2$, we could
consistently estimate $\alpha_1$, $\alpha_2$, and $\delta_3$ from a Tobit of
$y_3$ on $z\delta_1$, $z\delta_2$, and $z_3$. In practice, we replace
$\delta_1$ and $\delta_2$ with consistent estimators. Estimation of $\delta_2$
is simple: use OLS on the entire sample. Estimation of $\delta_1$ follows
exactly as in Procedure 17.3 using the system

$$y_1 = z\delta_1 + v_1 \tag{17.83}$$
$$y_3 = \max(0, z\pi_3 + v_3), \tag{17.84}$$

where $y_1$ is observed only when $y_3 > 0$.

Given $\hat\delta_1$ and $\hat\delta_2$, form $z_i\hat\delta_1$ and
$z_i\hat\delta_2$ for each observation i in the sample. Then, obtain
$\hat\alpha_1$, $\hat\alpha_2$, and $\hat\delta_3$ from the Tobit

$y_{i3}$ on $(z_i\hat\delta_1)$, $(z_i\hat\delta_2)$, $z_{i3}$

using all observations.

For identification, $(z\delta_1, z\delta_2, z_3)$ can contain no exact linear
dependencies. Necessary is that there must be at least two elements in z not
also in $z_3$.

Obtaining the correct asymptotic variance matrix is complicated. It is most
easily done in a generalized method of moments framework.

b. This is not very different from part a. [...] $\delta_2$ [...]

c. We need to estimate the variance of $u_3$, $\sigma_3^2$.

17.9. To be added.

17.11. a. There is no sample selection problem because, by definition, you
have specified the distribution of y given x and y > 0. We only need to
obtain a random sample from the subpopulation with y > 0.

b. Again, there is no sample selection bias because we have specified the
conditional expectation for the population of interest. If we have a random
sample from that population, NLS is generally consistent and
$\sqrt{N}$-asymptotically normal.

c. We would use a standard probit model. Let w = 1[y > 0]. Then w given x
follows a probit model with $P(w = 1|x) = \Phi(x\gamma)$.

d. $E(y|x) = P(y > 0|x)\cdot E(y|x,y > 0) = \Phi(x\gamma)\cdot\exp(x\beta)$.
So we would plug in the NLS estimator of $\beta$ and the probit estimator of
$\gamma$.
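The plug-in step in part d can be sketched as a small Python function. The function name and the parameter values in the usage line are hypothetical; in practice gamma would come from the probit and beta from the NLS estimation on the y > 0 subsample.

```python
import math


def two_part_mean(x, gamma, beta):
    """E(y | x) = Phi(x*gamma) * exp(x*beta) for a two-part model with a probit
    participation equation and an exponential conditional mean for y > 0."""
    xg = sum(xj * gj for xj, gj in zip(x, gamma))
    xb = sum(xj * bj for xj, bj in zip(x, beta))
    participation = 0.5 * (1.0 + math.erf(xg / math.sqrt(2.0)))  # Phi(x*gamma)
    return participation * math.exp(xb)


# Hypothetical values: x includes a constant as its first element.
ey = two_part_mean([1.0, 0.0], gamma=[0.0, 1.0], beta=[0.0, 1.0])
```

Partial effects on E(y|x) then depend on both gamma and beta, since x enters both factors.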

e. Not when you specify the conditional distributions, or conditional means,
for the two parts. By definition, there is no sample selection problem.
Confusion arises, I think, when two part models are specified with
unobservables that may be correlated. For example, we could write

$$y = w\cdot\exp(x\beta + u),$$
$$w = 1[x\gamma + v > 0],$$

so that $w = 0 \Rightarrow y = 0$. Suppose (u,v) is independent of x. Then,
if u and v are independent -- so that u is independent of (x,w) -- we have

$$E(y|x,w) = w\cdot\exp(x\beta)E[\exp(u)|x,w] = w\cdot\exp(x\beta)E[\exp(u)],$$

which implies the specification in part b (by setting w = 1, once we absorb
E[exp(u)] into the intercept). Things change when u and v are correlated.
Given w = 1, $\log(y) = x\beta + u$. So

$$E[\log(y)|x,w = 1] = x\beta + E(u|x,w = 1).$$

If we make the usual linearity assumption, $E(u|v) = \rho v$, and assume a
standard normal distribution for v, then we have the usual inverse Mills ratio
added to the linear model:

$$E[\log(y)|x,w = 1] = x\beta + \rho\lambda(x\gamma).$$

A two-step strategy for estimating $\gamma$ and $\beta$ is pretty clear.
First, estimate a probit of $w_i$ on $x_i$ to get $\hat\gamma$ and
$\lambda(x_i\hat\gamma)$. Then, using the $y_i > 0$ observations, run the
regression $\log(y_i)$ on $x_i$, $\lambda(x_i\hat\gamma)$ to obtain
$\hat\beta$, $\hat\rho$. A standard t statistic on $\hat\rho$ is a simple test
of Cov(u,v) = 0.

This two-step procedure reveals a potential problem with the model that
allows u and v to be correlated: the added regressor, $\lambda(x\gamma)$, is a
nonlinear function of x alone. In other words, identification of the
parameters comes entirely from the nonlinearity of the IMR, which we warned
about in this chapter. Ideally, we would have a variable that affects
P(w = 1|x) that can be excluded from $x\beta$. In labor economics, where
two-part models are used to allow for fixed costs of entering the labor
market, one would try to find a variable that affects the fixed costs of being
employed that does not affect the choice of hours.

If we assume (u,v) is multivariate normal, with mean zero, then we can use a
full maximum likelihood procedure. While this is less robust, making full
distributional assumptions has a subtle advantage: we can then compute partial
effects on E(y|x) and E(y|x,y > 0). Even under a full set of assumptions, the
partial effects are not straightforward to obtain. For one,

$$E(y|x,y > 0) = \exp(x\beta)\cdot E[\exp(u)|x,w = 1],$$

where $E[\exp(u)|x,w = 1]$ can be obtained under joint normality. A similar
example is given in Section 19.5.2; see, particularly, equation (19.44).
Then, we can multiply this expectation by $P(w = 1|x) = \Phi(x\gamma)$. The
point is that we cannot simply look at $\beta$ to obtain the partial effects
of interest. This is very different from the sample selection model.
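The inverse Mills ratio term that enters the second-step regression can be sketched in Python as follows. The function names are hypothetical, and the inputs stand for the indexes x*beta and x*gamma and the coefficient rho from the text.

```python
import math


def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z) for the standard normal distribution."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / cdf


def selected_log_mean(xb, xg, rho):
    """E[log(y) | x, w = 1] = x*beta + rho * lambda(x*gamma)."""
    return xb + rho * inverse_mills(xg)
```

Because lambda(.) is close to linear over much of its range, the regression of log(y) on x and the estimated lambda can be poorly identified when no variable is excluded from x*beta, which is the concern raised above.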

17.13. a. We cannot use censored Tobit because that requires observing x
whatever the value of y. Instead, we use the density of y given x and y > 0,
as in truncated Tobit. [...] Here, y given x follows a Tobit model in the
full population (for a corner solution outcome).

b. Provided x varies enough in the subpopulation where y > 0 such that
rank E(x'x|y > 0) = K, the parameters are identified. In the case where an
element of x is a derived price, we need sufficient price variation for the
population that consumes some of the good. Given such variation, we can
estimate $E(y|x) = \Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma)$
because we have made the assumption that y given x follows a Tobit in the full
population.
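The unconditional Tobit mean quoted in part b is easy to evaluate once estimates are available. The following Python sketch uses a hypothetical function name and takes the index x*beta and sigma as inputs.

```python
import math


def tobit_mean(xb, sigma):
    """E(y | x) = Phi(xb/sigma) * xb + sigma * phi(xb/sigma) for a standard
    Tobit (corner at zero) with index xb = x*beta and scale sigma."""
    z = xb / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf * xb + sigma * pdf
```

For a large positive index the corner is rarely binding and the mean is close to x*beta; at a zero index it equals sigma times phi(0).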


CHAPTER 18

18.1. a. This follows from equation (18.5). First,
$E(\bar{y}_1) = E(y|w = 1)$ and $E(\bar{y}_0) = E(y|w = 0)$. Therefore, by
(18.5),

$$E(\bar{y}_1 - \bar{y}_0) = [E(y_0|w = 1) - E(y_0|w = 0)] + ATE_1,$$

and so the bias is given by the first term.

b. If E(y0|w = 1) < E(y0|w = 0), those who participate in the program would
have had lower average earnings without training than those who chose not to
participate. The first term is then negative, which leads to an underestimate
of the impact of the program.
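A small numerical sketch of the decomposition may help. The three conditional means below are made-up numbers chosen so that participants would have earned less without training; the function name is hypothetical.

```python
def naive_difference(ey1_w1, ey0_w1, ey0_w0):
    """Splits E(y | w = 1) - E(y | w = 0) into the selection term
    E(y0 | w = 1) - E(y0 | w = 0) and the effect on the treated,
    E(y1 - y0 | w = 1)."""
    selection_term = ey0_w1 - ey0_w0
    ate1 = ey1_w1 - ey0_w1
    return selection_term, ate1, selection_term + ate1


# Hypothetical means: participants would have earned 8 without training,
# nonparticipants earn 10, and participants earn 11 with training.
selection_term, ate1, naive = naive_difference(11.0, 8.0, 10.0)
```

Here the simple difference in means is 1 while the effect on the treated is 3, so the comparison understates the program's impact.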

18.3. The following Stata session estimates α using the three different
regression approaches. [...] the vector x, but I did not do so:

. probit train re74 re75 age agesq nodegree married black hisp

Iteration 0:   log likelihood =     -302.1
Iteration 1:   log likelihood = -294.07642
Iteration 2:   log likelihood = -294.06748
Iteration 3:   log likelihood = -294.06748

Probit estimates                                  Number of obs   =        445
                                                  LR chi2(8)      =      16.07
                                                  Prob > chi2     =     0.0415
Log likelihood = -294.06748                       Pseudo R2       =     0.0266

------------------------------------------------------------------------------
       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0189577   .0159392    -1.19   0.234    -.0501979    .0122825
        re75 |   .0371871   .0271086     1.37   0.170    -.0159447     .090319
         age |  -.0005467   .0534045    -0.01   0.992    -.1052176    .1041242
       agesq |   .0000719   .0008734     0.08   0.934    -.0016399    .0017837
    nodegree |    -.44195   .1515457    -2.92   0.004    -.7389742   -.1449258
     married |    .091519   .1726192     0.53   0.596    -.2468083    .4298464
       black |  -.1446253   .2271609    -0.64   0.524    -.5898524    .3006019
        hisp |  -.5004545   .3079227    -1.63   0.104    -1.103972    .1030629
       _cons |   .2284561   .8154273     0.28   0.779    -1.369752    1.826664
------------------------------------------------------------------------------

. predict phat
(option p assumed; Pr(train))

. sum phat

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        phat |       445    .4155321    .0934459   .1638736   .6738951

. gen traphat0 = train*(phat - .416)
. reg unem78 train phat

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  2,   442) =    3.13
       Model |   1.3226496     2  .661324802           Prob > F      =  0.0449
    Residual |  93.4998223   442   .21153806           R-squared     =  0.0139
-------------+------------------------------           Adj R-squared =  0.0095
       Total |  94.8224719   444  .213564126           Root MSE      =  .45993

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   -.110242    .045039    -2.45   0.015    -.1987593   -.0217247
        phat |  -.0101531   .2378099    -0.04   0.966    -.4775317    .4572254
       _cons |   .3579151   .0994803     3.60   0.000     .1624018    .5534283
------------------------------------------------------------------------------

. reg unem78 train phat traphat0

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  3,   441) =    2.84
       Model |  1.79802041     3  .599340137           Prob > F      =  0.0375
    Residual |  93.0244515   441  .210939799           R-squared     =  0.0190
-------------+------------------------------           Adj R-squared =  0.0123
       Total |  94.8224719   444  .213564126           Root MSE      =  .45928

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.1066934   .0450374    -2.37   0.018     -.195208   -.0181789
        phat |   .3009852   .3151992     0.95   0.340    -.3184939    .9204644
    traphat0 |   -.719599   .4793509    -1.50   0.134    -1.661695     .222497
       _cons |    .233225    .129489     1.80   0.072    -.0212673    .4877173
------------------------------------------------------------------------------

. reg unem78 train re74 re75 age agesq nodegree married black hisp

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  9,   435) =    2.75
       Model |  5.09784844     9  .566427604           Prob > F      =  0.0040
    Residual |  89.7246235   435  .206263502           R-squared     =  0.0538
-------------+------------------------------           Adj R-squared =  0.0342
       Total |  94.8224719   444  .213564126           Root MSE      =  .45416

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.1105582   .0444832    -2.49   0.013    -.1979868   -.0231295
        re74 |  -.0025525   .0053889    -0.47   0.636    -.0131441    .0080391
        re75 |   -.007121   .0094371    -0.75   0.451     -.025669    .0114269
         age |   .0304127   .0189565     1.60   0.109    -.0068449    .0676704
       agesq |  -.0004949   .0003098    -1.60   0.111    -.0011038    .0001139
    nodegree |   .0421444   .0550176     0.77   0.444    -.0659889    .1502777
     married |  -.0296401   .0620734    -0.48   0.633    -.1516412    .0923609
       black |    .180637   .0815002     2.22   0.027     .0204538    .3408202
        hisp |  -.0392887   .1078464    -0.36   0.716    -.2512535    .1726761
       _cons |  -.2342579   .2905718    -0.81   0.421    -.8053572    .3368413
------------------------------------------------------------------------------

In all three cases, the estimated average treatment effect is around -.11:
participating in job training is estimated to reduce the unemployment
probability by about .11. Of course, in this example, training status was
randomly assigned, so we are not surprised that different methods lead to
roughly the same estimate.

18.5. a. I used the following Stata session to answer all parts:

. probit train re74 re75 age agesq nodegree married black hisp

Iteration 0:   log likelihood =     -302.1
Iteration 1:   log likelihood = -294.07642
Iteration 2:   log likelihood = -294.06748
Iteration 3:   log likelihood = -294.06748

Probit estimates                                  Number of obs   =        445
                                                  LR chi2(8)      =      16.07
                                                  Prob > chi2     =     0.0415
Log likelihood = -294.06748                       Pseudo R2       =     0.0266

------------------------------------------------------------------------------
       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0189577   .0159392    -1.19   0.234    -.0501979    .0122825
        re75 |   .0371871   .0271086     1.37   0.170    -.0159447     .090319
         age |  -.0005467   .0534045    -0.01   0.992    -.1052176    .1041242
       agesq |   .0000719   .0008734     0.08   0.934    -.0016399    .0017837
    nodegree |    -.44195   .1515457    -2.92   0.004    -.7389742   -.1449258
     married |    .091519   .1726192     0.53   0.596    -.2468083    .4298464
       black |  -.1446253   .2271609    -0.64   0.524    -.5898524    .3006019
        hisp |  -.5004545   .3079227    -1.63   0.104    -1.103972    .1030629
       _cons |   .2284561   .8154273     0.28   0.779    -1.369752    1.826664
------------------------------------------------------------------------------

. predict phat
(option p assumed; Pr(train))

. reg re78 train re74 re75 age agesq nodegree married black hisp (phat re74
re75 age agesq nodegree married black hisp)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  9,   435) =    1.75
       Model |  703.776258     9   78.197362           Prob > F      =  0.0763
    Residual |  18821.8804   435  43.2686905           R-squared     =  0.0360
-------------+------------------------------           Adj R-squared =  0.0161
       Total |  19525.6566   444  43.9767041           Root MSE      =  6.5779

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   .0699177   18.00172     0.00   0.997    -35.31125    35.45109
        re74 |   .0624611   .1453799     0.43   0.668    -.2232733    .3481955
        re75 |   .0863775   .2814839     0.31   0.759    -.4668602    .6396151
         age |   .1998802   .2746971     0.73   0.467    -.3400184    .7397788
       agesq |  -.0024826   .0045238    -0.55   0.583    -.0113738    .0064086
    nodegree |  -1.367622   3.203039    -0.43   0.670    -7.662979    4.927734
     married |   -.050672   1.098774    -0.05   0.963    -2.210237    2.108893
       black |  -2.203087   1.554259    -1.42   0.157    -5.257878    .8517046
        hisp |  -.2953534   3.656719    -0.08   0.936    -7.482387     6.89168
       _cons |   4.613857   11.47144     0.40   0.688    -17.93248     27.1602
------------------------------------------------------------------------------

. reg phat re74 re75 age agesq nodegree married black hisp

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  8,   436) =69767.44
       Model |  3.87404126     8  .484255158           Prob > F      =  0.0000
    Residual |  .003026272   436  6.9410e-06           R-squared     =  0.9992
-------------+------------------------------           Adj R-squared =  0.9992
       Total |  3.87706754   444  .008732134           Root MSE      =  .00263

------------------------------------------------------------------------------
        phat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        re74 |  -.0069301   .0000312  -222.04   0.000    -.0069914   -.0068687
        re75 |   .0139209   .0000546   254.82   0.000     .0138135    .0140283
         age |  -.0003207     .00011    -2.92   0.004    -.0005368   -.0001046
       agesq |   .0000293   1.80e-06    16.31   0.000     .0000258    .0000328
    nodegree |  -.1726018    .000316  -546.14   0.000    -.1732229   -.1719806
     married |   .0352802     .00036    98.01   0.000     .0345727    .0359877
       black |  -.0562315   .0004726  -118.99   0.000    -.0571603   -.0553027
        hisp |  -.1838453   .0006238  -294.71   0.000    -.1850713   -.1826192
       _cons |   .5907578   .0016786   351.93   0.000     .5874586     .594057
------------------------------------------------------------------------------

b. The IV estimate of α is very small -- .070, much smaller than when we
used either linear regression or the propensity score in a regression in
Example 18.2. (When we do not instrument for train, $\hat\alpha$ = 1.625,
se = .640.) The very large standard error (18.00) suggests severe collinearity
among the instruments.

c. The collinearity suspected in part b is confirmed by regressing
$\hat\Phi_i$ on the $x_i$: the R-squared is .9992, which means there is
virtually no separate variation in $\hat\Phi_i$ that cannot be explained by
$x_i$.

d. This example illustrates why trying to achieve identification off of a
nonlinearity can be fraught with problems. Generally, it is not a good idea.
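To get a rough sense of how severe the collinearity is, the R-squared from the regression of phat on x can be turned into a variance-inflation-style factor, 1/(1 - R-squared). This Python sketch uses only the sums of squares printed in the Stata output above; treating 1/(1 - R-squared) as the relevant inflation factor for the IV standard error is a heuristic assumption, not a result stated in the text.

```python
# Sums of squares copied from the regression of phat on the covariates.
model_ss = 3.87404126
total_ss = 3.87706754

r_squared = model_ss / total_ss
# Heuristic inflation factor for a regressor that is almost a linear
# function of the other regressors.
inflation = 1.0 / (1.0 - r_squared)
```

An inflation factor above one thousand is consistent with the jump in the standard error on train from .640 under OLS to 18.00 under IV.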

18.7. To be added.

18.9. a. We can start with equation (18.66),

$$y = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + u + w\cdot v + e,$$

and, again, we will replace $w\cdot v$ with its expectation given (x,z) and an
error. But

$$E(w\cdot v|x,z) = E[E(w\cdot v|x,z,v)|x,z] = E[E(w|x,z,v)\cdot v|x,z] = E[\exp(\pi_0 + x\pi_1 + z\pi_2 + \pi_3 v)\cdot v|x,z] = \xi\cdot\exp(\pi_0 + x\pi_1 + z\pi_2),$$

where $\xi = E[\exp(\pi_3 v)\cdot v]$, and we have used the assumption that v
is independent of (x,z). Now, define
$r = u + [w\cdot v - E(w\cdot v|x,z)] + e$. [[...] need to replace $\pi_0$
with a different constant, as is implied in the statement of the problem.] So
we can write

$$y = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + \xi E(w|x,z) + r, \qquad E(r|x,z) = 0.$$

b. The ATE $\beta$ is not identified by the IV estimator applied to the
extended equation. If h [...] instruments.

[...] What I should have said is, assume we can write
$w = \exp(\pi_0 + x\pi_1 + z\pi_2 + g)$, where $E(u|g,x,z) = \rho\cdot g$ and
$E(v|g,x,z) = \theta\cdot g$. Then we take the expected value of (18.66)
conditional on (g,x,z):

$$E(y|g,x,z) = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + E(u|g,x,z) + wE(v|g,x,z) + E(e|g,x,z)$$
$$= \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + \rho\cdot g + \theta w\cdot g,$$

where we have used the fact that w is a function of (g,x,z) and E(e|g,x,z) =
0. Since $\log(w_i) = \pi_0 + x_i\pi_1 + z_i\pi_2 + g_i$, we can consistently
estimate $\pi_0$, $\pi_1$, and $\pi_2$ from the OLS regression $\log(w_i)$ on
1, $x_i$, $z_i$, i = 1,...,N. From this regression, we need the residuals,
$\hat{g}_i$, i = 1,...,N. In the second step, run the regression

$y_i$ on 1, $x_i$, $w_i$, $w_i(x_i - \bar{x})$, $\hat{g}_i$, $w_i\hat{g}_i$, i = 1,...,N.

As usual, the coefficient on $w_i$ is the consistent estimator of $\beta$, the
average treatment effect. A standard joint significance test -- for example,
an F-type test -- on the last two terms effectively tests the null hypothesis
that w is exogenous.

CHAPTER 19

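19.1. a. Write $q(\mu) \equiv \mu_o\log(\mu) - \mu$ for $\mu > 0$. Then
$dq(\mu)/d\mu = \mu_o/\mu - 1$, so $\mu = \mu_o$ uniquely sets the derivative
to zero. The second derivative of $q(\mu)$ is $-\mu_o\mu^{-2} < 0$ for all
$\mu > 0$, so the sufficient second order condition is satisfied.

b. For the exponential case, $q(\mu) \equiv -\mu_o/\mu - \log(\mu)$. The
first order condition is $\mu_o\mu^{-2} - \mu^{-1} = 0$, which is uniquely
solved by $\mu = \mu_o$. The second derivative is
$-2\mu_o\mu^{-3} + \mu^{-2}$, which, when evaluated at $\mu_o$, gives
$-2\mu_o^{-2} + \mu_o^{-2} = -\mu_o^{-2} < 0$.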
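A quick numerical check of both first order conditions is sketched below in Python. The value 2.5 used for the true mean and the grid of candidate means are arbitrary assumptions; the function names are hypothetical.

```python
import math


def q_poisson(mu, mu_o):
    """Expected Poisson quasi-log-likelihood (up to a constant)."""
    return mu_o * math.log(mu) - mu


def q_exponential(mu, mu_o):
    """Expected exponential quasi-log-likelihood."""
    return -mu_o / mu - math.log(mu)


def grid_argmax(q, mu_o):
    """Maximizes q(., mu_o) over the grid 0.01, 0.02, ..., 10.00."""
    grid = [k / 100 for k in range(1, 1001)]
    return max(grid, key=lambda mu: q(mu, mu_o))


mu_o = 2.5
best_poisson = grid_argmax(q_poisson, mu_o)
best_exponential = grid_argmax(q_exponential, mu_o)
```

Both objective functions peak at the true mean on this grid, in line with the calculus argument.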

19.3. [...]

. reg cigs lcigpric lincome restaurn white educ age agesq

      Source |       SS       df       MS              Number of obs =     807
-------------+------------------------------           F(  7,   799) =    6.38
       Model |  8029.43631     7  1147.06233           Prob > F      =  0.0000
    Residual |  143724.246   799  179.880158           R-squared     =  0.0529
-------------+------------------------------           Adj R-squared =  0.0446
       Total |  151753.683   806  188.280003           Root MSE      =  13.412

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   5.782321    -0.15   0.883    -12.20124    10.49943
     lincome |   .8690144   .7287636     1.19   0.233     -.561503    2.299532
    restaurn |  -2.865621   1.117406    -2.56   0.011    -5.059019   -.6722235
       white |  -.5592363   1.459461    -0.38   0.702    -3.424067    2.305594
        educ |  -.5017533   .1671677    -3.00   0.003     -.829893   -.1736136
         age |   .7745021   .1605158     4.83   0.000     .4594197    1.089585
       agesq |  -.0090686   .0017481    -5.19   0.000    -.0124999   -.0056373
       _cons |  -2.682435   24.22073    -0.11   0.912    -50.22621    44.86134
------------------------------------------------------------------------------

. test lcigpric lincome

 ( 1)  lcigpric = 0.0
 ( 2)  lincome = 0.0

       F(  2,   799) =    0.71
            Prob > F =    0.4899

. reg cigs lcigpric lincome restaurn white educ age agesq, robust

Regression with robust standard errors                 Number of obs =     807
                                                       F(  7,   799) =    9.38
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0529
                                                       Root MSE      =  13.412

------------------------------------------------------------------------------
             |               Robust
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   6.054396    -0.14   0.888     -12.7353     11.0335
     lincome |   .8690144    .597972     1.45   0.147    -.3047671    2.042796
    restaurn |  -2.865621   1.017275    -2.82   0.005    -4.862469   -.8687741
       white |  -.5592363   1.378283    -0.41   0.685     -3.26472    2.146247
        educ |  -.5017533   .1624097    -3.09   0.002    -.8205533   -.1829532
         age |   .7745021   .1380317     5.61   0.000     .5035545     1.04545
       agesq |  -.0090686   .0014589    -6.22   0.000    -.0119324   -.0062048
       _cons |  -2.682435   25.90194    -0.10   0.918    -53.52632    48.16145
------------------------------------------------------------------------------

. test lcigpric lincome

 ( 1)  lcigpric = 0.0
 ( 2)  lincome = 0.0

       F(  2,   799) =    1.07
            Prob > F =    0.3441

. poisson cigs lcigpric lincome restaurn white educ age agesq

Iteration 0:   log likelihood = -8111.8346
Iteration 1:   log likelihood = -8111.5191
Iteration 2:   log likelihood =  -8111.519

Poisson regression                                Number of obs   =        807
                                                  LR chi2(7)      =    1068.70
                                                  Prob > chi2     =     0.0000
Log likelihood =  -8111.519                       Pseudo R2       =     0.0618

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .1433932    -0.74   0.460    -.3870061    .1750847
     lincome |   .1037275   .0202811     5.11   0.000     .0639772    .1434779
    restaurn |  -.3636059   .0312231   -11.65   0.000    -.4248021   -.3024098
       white |  -.0552012   .0374207    -1.48   0.140    -.1285444    .0181421
        educ |  -.0594225   .0042564   -13.96   0.000    -.0677648   -.0510802
         age |   .1142571   .0049694    22.99   0.000     .1045172    .1239969
       agesq |  -.0013708    .000057   -24.07   0.000    -.0014825   -.0012592
       _cons |   .3964494   .6139626     0.65   0.518    -.8068952    1.599794
------------------------------------------------------------------------------

. glm cigs lcigpric lincome restaurn white educ age agesq, family(poisson)
sca(x2)

Iteration 0:   log likelihood = -8380.1083
Iteration 1:   log likelihood = -8111.6454
Iteration 2:   log likelihood =  -8111.519
Iteration 3:   log likelihood =  -8111.519

Generalized linear models                          No. of obs      =       807
Optimization     : ML: Newton-Raphson              Residual df     =       799
                                                   Scale param     =         1
Deviance         =  14752.46933                    (1/df) Deviance =  18.46367
Pearson          =  16232.70987                    (1/df) Pearson  =  20.31628

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -8111.519022                    AIC             =  20.12272
BIC              =  14698.92274

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .6463244    -0.16   0.870    -1.372733    1.160812
     lincome |   .1037275   .0914144     1.13   0.257    -.0754414    .2828965
    restaurn |  -.3636059   .1407338    -2.58   0.010    -.6394391   -.0877728
       white |  -.0552011   .1686685    -0.33   0.743    -.3857854    .2753831
        educ |  -.0594225   .0191849    -3.10   0.002    -.0970243   -.0218208
         age |   .1142571   .0223989     5.10   0.000     .0703561     .158158
       agesq |  -.0013708   .0002567    -5.34   0.000     -.001874   -.0008677
       _cons |   .3964493    2.76735     0.14   0.886    -5.027457    5.820355
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)

* The estimate of sigma is

. di sqrt(20.32)
4.5077711

. poisson cigs restaurn white educ age agesq

Iteration 0:   log likelihood =  -8125.618
Iteration 1:   log likelihood = -8125.2907
Iteration 2:   log likelihood = -8125.2906

Poisson regression                                Number of obs   =        807
                                                  LR chi2(5)      =    1041.16
                                                  Prob > chi2     =     0.0000
Log likelihood = -8125.2906                       Pseudo R2       =     0.0602

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    restaurn |  -.3545336   .0308796   -11.48   0.000    -.4150564   -.2940107
       white |  -.0618025    .037371    -1.65   0.098    -.1350483    .0114433
        educ |  -.0532166   .0040652   -13.09   0.000    -.0611842   -.0452489
         age |   .1211174   .0048175    25.14   0.000     .1116754    .1305594
       agesq |  -.0014458   .0000553   -26.14   0.000    -.0015543   -.0013374
       _cons |   .7617484   .1095991     6.95   0.000     .5469381    .9765587
------------------------------------------------------------------------------

. di 2*(8125.291 - 8111.519)
27.544

. * This is the usual LR statistic. The GLM version is obtained by
. * dividing by 20.32:

. di 2*(8125.291 - 8111.519)/(20.32)
1.3555118

. glm cigs lcigpric lincome restaurn white educ age agesq, family(poisson)
robust

Iteration 0:   log likelihood = -8380.1083
Iteration 1:   log likelihood = -8111.6454
Iteration 2:   log likelihood =  -8111.519
Iteration 3:   log likelihood =  -8111.519

Generalized linear models                          No. of obs      =       807
Optimization     : ML: Newton-Raphson              Residual df     =       799
                                                   Scale param     =         1
Deviance         =  14752.46933                    (1/df) Deviance =  18.46367
Pearson          =  16232.70987                    (1/df) Pearson  =  20.31628

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : Sandwich

Log likelihood   = -8111.519022                    AIC             =  20.12272
BIC              =  14698.92274

------------------------------------------------------------------------------
             |               Robust
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .6681827    -0.16   0.874    -1.415575    1.203653
     lincome |   .1037275    .083299     1.25   0.213    -.0595355    .2669906
    restaurn |  -.3636059    .140366    -2.59   0.010    -.6387182   -.0884937
       white |  -.0552011   .1632959    -0.34   0.735    -.3752553     .264853
        educ |  -.0594225   .0192058    -3.09   0.002    -.0970653   -.0217798
         age |   .1142571   .0212322     5.38   0.000     .0726427    .1558715
       agesq |  -.0013708   .0002446    -5.60   0.000    -.0018503   -.0008914
       _cons |   .3964493    2.97704     0.13   0.894    -5.438442     6.23134
------------------------------------------------------------------------------

. di .1143/(2*.00137)
41.715328

a. Neither the price nor income variable is significant at any reasonable
significance level, although the coefficient estimates are the expected sign.
It does not matter whether we use the usual or robust standard errors. The
two variables are jointly insignificant, too, using the usual and
heteroskedasticity-robust tests (p-values = .490, .344, respectively).

b. While the price variable is still very insignificant (p-value = .46),
the income variable, based on the usual Poisson standard errors, is very
significant: t = 5.11. The estimated price elasticity is -.106 and the
estimated income elasticity is .104. Incidentally, if you drop restaurn -- a
binary indicator for restaurant smoking restrictions at the state level --
then log(cigpric) becomes much more significant (but using the incorrect
standard errors). In this data set, both cigpric and restaurn vary only at
the state level, and, not surprisingly, they are significantly correlated.
(States that have restaurant smoking restrictions also have higher average
prices, on the order of 2.9%.)

c. The GLM estimate of σ is $\hat\sigma$ = 4.51. This means all of the
Poisson standard errors should be multiplied by this factor, as is done using
the "glm" command in Stata, with the option "sca(x2)." The t statistic on
lcigpric is now very small (-.16), and that on lincome falls to 1.13 -- much
more in line with the linear model t statistic (1.19 with the usual standard
errors). With the GLM standard errors, the restaurant restriction variable,
education, and the age variables are still significant. (Interestingly, there
is no race effect, conditional on the other covariates.)
d. The usual LR statistic is 2(8125.291 - 8111.519) = 27.54, which is a
very large value for a $\chi^2_2$ distribution (p-value ≈ 0). The QLR
statistic divides the usual LR statistic by $\hat\sigma^2$ = 20.32, so
QLR = 1.36 (p-value ≈ .51). As expected, the QLR statistic shows that the
variables are jointly insignificant, while the LR statistic shows strong
significance.
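The two statistics and their p-values can be reproduced with a few lines of Python. The log likelihoods and the dispersion estimate are copied from the Stata output above; the only extra ingredient is the closed form for the upper tail of a chi-square distribution with 2 degrees of freedom, exp(-x/2).

```python
import math

ll_restricted = -8125.291     # Poisson without lcigpric and lincome
ll_unrestricted = -8111.519   # Poisson with all regressors
sigma2_hat = 20.32            # Pearson-based dispersion estimate

lr = 2.0 * (ll_unrestricted - ll_restricted)
qlr = lr / sigma2_hat


def chi2_2df_pvalue(x):
    """Upper-tail probability of a chi-square(2) random variable."""
    return math.exp(-x / 2.0)


p_lr = chi2_2df_pvalue(lr)
p_qlr = chi2_2df_pvalue(qlr)
```

The closed-form tail probability is specific to 2 degrees of freedom; a test with a different number of restrictions would need a general chi-square routine.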
e. Using the robust standard errors does not significantly change any
conclusions; in fact, most explanatory variables become slightly more
significant than when we use the GLM standard errors. In this example, it is
the adjustment by $\hat\sigma$ > 1 that makes the most difference. Having
fully robust standard errors has no additional effect.

f. We simply compute the turning point for the quadratic:
$\hat\beta_{age}/(-2\hat\beta_{age^2})$ = .1143/(2*.00137) ≈ 41.72.
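The same turning point calculation in Python, using the rounded coefficients from the text and, for comparison, the unrounded Poisson estimates from the output above (the function name is hypothetical):

```python
def turning_point(b_age, b_agesq):
    """Age at which the quadratic b_age*age + b_agesq*age^2 peaks."""
    return b_age / (-2.0 * b_agesq)


rounded = turning_point(0.1143, -0.00137)        # coefficients as rounded in the text
unrounded = turning_point(0.1142571, -0.0013708)  # coefficients from the Poisson output
```

Rounding the coefficients moves the answer only in the second decimal place.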

g. A double hurdle model -- which separates the initial decision to smoke
at all from the decision of how much to smoke -- seems like a good idea. It
is certainly worth investigating. [...] as a probit.

19.5. a. We just use iterated expectations:

$$E(y_{it}|x_i) = E[E(y_{it}|x_i,c_i)|x_i] = E(c_i|x_i)\exp(x_{it}\beta) = \exp(\alpha + \bar{x}_i\gamma)\exp(x_{it}\beta) = \exp(\alpha + x_{it}\beta + \bar{x}_i\gamma).$$

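b. We are explicitly testing H0: $\gamma = 0$, but we maintain full
independence of $c_i$ and $x_i$ under H0. We have enough assumptions to derive
$Var(y_i|x_i)$, the T × T conditional variance matrix of $y_i$ given $x_i$
under H0. First,

$$Var(y_{it}|x_i) = E[Var(y_{it}|x_i,c_i)|x_i] + Var[E(y_{it}|x_i,c_i)|x_i]$$
$$= E[c_i\exp(x_{it}\beta)|x_i] + Var[c_i\exp(x_{it}\beta)|x_i]$$
$$= \exp(\alpha + x_{it}\beta) + \tau^2[\exp(x_{it}\beta)]^2,$$

where $\tau^2 \equiv Var(c_i)$ and we have used $E(c_i|x_i) = \exp(\alpha)$
under H0. A similar, general expression holds for conditional covariances:

$$Cov(y_{it},y_{ir}|x_i) = E[Cov(y_{it},y_{ir}|x_i,c_i)|x_i] + Cov[E(y_{it}|x_i,c_i),E(y_{ir}|x_i,c_i)|x_i]$$
$$= 0 + Cov[c_i\exp(x_{it}\beta),c_i\exp(x_{ir}\beta)|x_i]$$
$$= \tau^2\exp(x_{it}\beta)\exp(x_{ir}\beta).$$

So, under H0, $Var(y_i|x_i)$ depends on $\alpha$, $\beta$, and $\tau^2$, all
of which we can estimate. It is natural to use a score test of H0:
$\gamma = 0$. First, obtain consistent estimators $\tilde\alpha$,
$\tilde\beta$ by, say, pooled Poisson QMLE. Let
$\tilde{y}_{it} = \exp(\tilde\alpha + x_{it}\tilde\beta)$ and
$\tilde{u}_{it} = y_{it} - \tilde{y}_{it}$. A consistent estimator of
$\tau^2$ can be obtained from a simple pooled regression, through the origin,
of

$\tilde{u}_{it}^2 - \tilde{y}_{it}$ on $[\exp(x_{it}\tilde\beta)]^2$, t = 1,...,T; i = 1,...,N.

Call this estimator $\tilde\tau^2$. This works because, under H0,
$E(u_{it}^2|x_i) = \exp(\alpha + x_{it}\beta) + \tau^2[\exp(x_{it}\beta)]^2$,
where $u_{it} \equiv y_{it} - E(y_{it}|x_{it})$. [We could also use the many
covariance terms in estimating $\tau^2$ because
$\tau^2 = E\{[u_{it}/\exp(x_{it}\beta)][u_{ir}/\exp(x_{ir}\beta)]\}$, all
$t \neq r$.]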
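The variance and covariance expressions above determine every element of the T × T matrix Var(y_i|x_i) under H0. A minimal Python sketch of how that matrix could be assembled for one observation follows; the function name and the numerical inputs are hypothetical.

```python
import math


def poisson_re_variance(alpha, xb, tau2):
    """T x T matrix with diagonal exp(alpha + xb_t) + tau2 * exp(xb_t)^2 and
    off-diagonal tau2 * exp(xb_t) * exp(xb_r), where xb[t] = x_it * beta."""
    e = [math.exp(v) for v in xb]
    T = len(xb)
    W = [[tau2 * e[t] * e[r] for r in range(T)] for t in range(T)]
    for t in range(T):
        W[t][t] += math.exp(alpha + xb[t])   # add the Poisson variance term
    return W


# Hypothetical inputs: two periods, alpha = 0, tau^2 = 0.5.
W = poisson_re_variance(0.0, [0.0, math.log(2.0)], 0.5)
```

In the estimation problem, alpha, beta, and tau^2 would be replaced by their first-stage estimates.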

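Next, we construct the T × T weighting matrix for observation i, as in
Section 19.6.3; see also Problem 12.11. The matrix
$W_i(\tilde\delta) = W(x_i,\tilde\delta)$ has diagonal elements
$\tilde{y}_{it} + \tilde\tau^2[\exp(x_{it}\tilde\beta)]^2$, t = 1,...,T, and
off-diagonal elements
$\tilde\tau^2\exp(x_{it}\tilde\beta)\exp(x_{ir}\tilde\beta)$, $t \neq r$. Let
$\tilde\alpha$, $\tilde\beta$ be the solutions to

$$\min_{\alpha,\beta}\ (1/2)\sum_{i=1}^{N}[y_i - m(x_i,\alpha,\beta)]'[W_i(\tilde\delta)]^{-1}[y_i - m(x_i,\alpha,\beta)],$$

where $m(x_i,\alpha,\beta)$ has t-th element $\exp(\alpha + x_{it}\beta)$.
Since $Var(y_i|x_i) = W(x_i,\delta)$ under H0, this is a MWNLS estimation
problem with a correctly specified conditional variance matrix. To obtain the
score test in the context of MWNLS, we need the score of the conditional mean
function, with respect to all parameters, evaluated under H0. Let
$\theta \equiv (\alpha,\beta',\gamma')'$ denote the full vector of conditional
mean parameters, where we want to test H0: $\gamma = 0$. The unrestricted
conditional mean function, for each t, is
$m_t(x_i,\theta) = \exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)$. Taking the
gradient and evaluating it under H0 gives

$$\nabla_\theta m_t(x_i,\tilde\theta) = \exp(\tilde\alpha + x_{it}\tilde\beta)[1, x_{it}, \bar{x}_i],$$

which would be 1 × (1 + 2K) without any redundancies in $\bar{x}_i$. Usually,
$x_{it}$ would contain year dummies or other aggregate effects, and these
would be dropped from $\bar{x}_i$. Let $\nabla_\theta m(x_i,\tilde\theta)$
denote the T × (1 + 2K) matrix obtained by stacking the
$\nabla_\theta m_t(x_i,\tilde\theta)$, t = 1,...,T. Then the score function,
evaluated at the null estimates
$\tilde\theta \equiv (\tilde\alpha,\tilde\beta',\tilde\gamma')'$, is

$$s_i(\tilde\theta) = -\nabla_\theta m(x_i,\tilde\theta)'[W_i(\tilde\delta)]^{-1}\tilde{u}_i,$$

where $\tilde{u}_i$ is the T × 1 vector with elements $\tilde{u}_{it}$. The
estimated conditional Hessian, under H0, is

$$\tilde{A} = N^{-1}\sum_{i=1}^{N}\nabla_\theta m(x_i,\tilde\theta)'[W_i(\tilde\delta)]^{-1}\nabla_\theta m(x_i,\tilde\theta),$$

a (1 + 2K) × (1 + 2K) matrix. The score or LM statistic is therefore

$$LM = \Big[\sum_{i=1}^{N}\nabla_\theta m(x_i,\tilde\theta)'[W_i(\tilde\delta)]^{-1}\tilde{u}_i\Big]'\Big[\sum_{i=1}^{N}\nabla_\theta m(x_i,\tilde\theta)'[W_i(\tilde\delta)]^{-1}\nabla_\theta m(x_i,\tilde\theta)\Big]^{-1}\Big[\sum_{i=1}^{N}\nabla_\theta m(x_i,\tilde\theta)'[W_i(\tilde\delta)]^{-1}\tilde{u}_i\Big].$$

Under H0, and the full set of maintained assumptions, LM is asymptotically
$\chi^2_K$. If only J < K elements of $\bar{x}_i$ are included, then the
degrees of freedom gets reduced to J.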

In practice, we might want a robust form of the test that does not require
$Var(y_i|x_i) = W(x_i,\delta)$ under H0, where $W(x_i,\delta)$ is the matrix
described above. This variance matrix was derived under pretty restrictive
assumptions. A fully robust form is given in equation (12.68), where
$s_i(\tilde\theta)$ and $\tilde{A}$ are as given above, and
$\tilde{B} = N^{-1}\sum_{i=1}^{N}s_i(\tilde\theta)s_i(\tilde\theta)'$. Since
the restrictions are written as $\gamma = 0$, we take $c(\theta) = \gamma$,
and so $\tilde{C} = [0|I_K]$, where the zero matrix is K × (1 + K).

c. If we assume (19.60), (19.61) and
$c_i = a_i\exp(\alpha + \bar{x}_i\gamma)$ where
$a_i|x_i \sim \mathrm{Gamma}(\delta,\delta)$, then things are even easier --
at least if we have software that estimates random effects Poisson models.
Under these assumptions, we have

$y_{it}|x_i,a_i \sim \mathrm{Poisson}[a_i\exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)]$
$y_{it}$, $y_{ir}$ are independent conditional on $(x_i,a_i)$, $t \neq r$
$a_i|x_i \sim \mathrm{Gamma}(\delta,\delta)$.

In other words, the full set of random effects Poisson assumptions holds, but
where the mean function in the Poisson distribution is
$a_i\exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)$. In practice, we just add
the (nonredundant elements of) $\bar{x}_i$ in each time period, along with a
constant and $x_{it}$, and carry out a random effects Poisson analysis.

yt

## f(yt|x,c;Bo) = exp[-cWm(xt,Bo)][cWm(xt,Bo)] /yt!,

yt = 0,1,2,....

Multiplying these together gives the joint density of (yi1,...,yiT) given (xi
= x, ci = c).

T

t=1

b. Taking the derivative of $l_i(c_i,\beta)$ with respect to $c_i$, setting the result to zero, and rearranging gives

$$n_i/c_i = \sum_{t=1}^T m(x_{it},\beta).$$

Letting $c_i(\beta)$ denote the solution as a function of $\beta$, we have $c_i(\beta) = n_i/M_i(\beta)$, where $M_i(\beta) \equiv \sum_{t=1}^T m(x_{it},\beta)$. The second-order condition for a maximum is easily seen to hold.

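c. Plugging the solution from part b into $l_i(c_i,\beta)$ gives

$$\begin{aligned}
l_i[c_i(\beta),\beta] &= -[n_i/M_i(\beta)]M_i(\beta) + \sum_{t=1}^T y_{it}\{\log[n_i/M_i(\beta)] + \log[m(x_{it},\beta)]\} \\
&= -n_i + n_i\log(n_i) + \sum_{t=1}^T y_{it}\log[m(x_{it},\beta)/M_i(\beta)],
\end{aligned}$$

because $\sum_{t=1}^T y_{it} = n_i$.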
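The algebra in parts b and c can be checked numerically. The sketch below is an illustration only: the counts `y` and the mean values `m` are made up, and the function names are hypothetical. It evaluates the log-likelihood for one unit at $c_i(\beta) = n_i/M_i(\beta)$ and compares it with the concentrated expression from part c.

```python
import math


def loglik(c, y, m):
    """l_i(c, beta) without the log(y_it!) terms; m[t] stands for m(x_it, beta)."""
    return sum(-c * mt + yt * (math.log(c) + math.log(mt)) for yt, mt in zip(y, m))


def concentrated(y, m):
    """-n_i + n_i*log(n_i) + sum_t y_it*log[m_it / M_i], the expression in part c."""
    n, M = sum(y), sum(m)
    return -n + n * math.log(n) + sum(yt * math.log(mt / M) for yt, mt in zip(y, m))


y = [2, 0, 3]        # made-up counts for one unit, so n_i = 5
m = [1.5, 0.7, 2.4]  # made-up values of m(x_it, beta)

c_hat = sum(y) / sum(m)  # c_i(beta) = n_i / M_i(beta) from part b
gap = loglik(c_hat, y, m) - concentrated(y, m)

# The first-order condition picks out a maximum: nearby values of c do worse.
lower = loglik(0.9 * c_hat, y, m)
upper = loglik(1.1 * c_hat, y, m)
```

Here `gap` is zero up to rounding error, and both `lower` and `upper` fall below the value at `c_hat`.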

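It follows that maximizing $\sum_{i=1}^N l_i(c_i,\beta)$ with respect to $(c_1,\ldots,c_N,\beta)$ is the same as maximizing the concentrated log-likelihood $\sum_{i=1}^N l_i[c_i(\beta),\beta]$ with respect to $\beta$. The terms $-n_i + n_i\log(n_i)$ do not depend on $\beta$, so only $\sum_{i=1}^N\sum_{t=1}^T y_{it}\log[m(x_{it},\beta)/M_i(\beta)]$ matters for estimating $\beta$. Therefore, treating the $c_i$ as parameters to be estimated leads us to an estimator of $\beta$ that maximizes this last expression.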

the "glm" command in Stata.

. replace atndrte = atndrte/100
(680 real changes made)

. reg atndrte ACT priGPA frosh soph

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  4,   675) =   72.92
       Model |  5.95396289     4  1.48849072           Prob > F      =  0.0000
    Residual |  13.7777696   675  .020411511           R-squared     =  0.3017
-------------+------------------------------           Adj R-squared =  0.2976
       Total |  19.7317325   679  .029059989           Root MSE      =  .14287

------------------------------------------------------------------------------
     atndrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.0169202    .001681   -10.07   0.000    -.0202207   -.0136196
      priGPA |   .1820163   .0112156    16.23   0.000     .1599947    .2040379
       frosh |   .0517097   .0173019     2.99   0.003     .0177377    .0856818
        soph |   .0110085    .014485     0.76   0.448    -.0174327    .0394496
       _cons |   .7087769   .0417257    16.99   0.000     .6268492    .7907046
------------------------------------------------------------------------------

. predict atndrteh
(option xb assumed; fitted values)

. sum atndrteh

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    atndrteh |     680    .8170956   .0936415   .4846666   1.086443

. count if atndrteh > 1
   12

. glm atndrte ACT priGPA frosh soph, family(binomial) sca(x2)
note: atndrte has non-integer values

Iteration 0:   log likelihood = -226.64509
Iteration 1:   log likelihood = -223.64983
Iteration 2:   log likelihood = -223.64937
Iteration 3:   log likelihood = -223.64937

Generalized linear models                          No. of obs      =       680
Optimization     : ML: Newton-Raphson              Residual df     =       675
                                                   Scale param     =         1
Deviance         =  285.7371358                    (1/df) Deviance =  .4233143
Pearson          =  85.57283238                    (1/df) Pearson  =  .1267746

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -223.6493665                    AIC             =  .6724981
BIC              =  253.1266718

------------------------------------------------------------------------------
     atndrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.1113802   .0113217    -9.84   0.000    -.1335703   -.0891901
      priGPA |   1.244375   .0771321    16.13   0.000     1.093199    1.395552
       frosh |   .3899318    .113436     3.44   0.001     .1676013    .6122622
        soph |   .0928127   .0944066     0.98   0.326    -.0922209    .2778463
       _cons |   .7621699   .2859966     2.66   0.008      .201627    1.322713
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)

. di (.1268)^2
.01607824

. di exp(.7622 - .1114*30 + 1.244*3)/(1 + exp(.7622 - .1114*30 + 1.244*3))
.75991253

. di exp(.7622 - .1114*25 + 1.244*3)/(1 + exp(.7622 - .1114*25 + 1.244*3))
.84673249

. di .760 - .847
-.087

. predict atndh
(option mu assumed; predicted mean atndrte)

. sum atndh

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       atndh |     680    .8170956   .0965356   .3499525   .9697185

. corr atndrte atndh
(obs=680)

             |  atndrte    atndh
-------------+------------------
     atndrte |   1.0000
       atndh |   0.5725   1.0000

. di (.5725)^2
.32775625

a. The coefficient on ACT means that if the ACT score increases by 5 points, more than a one standard deviation increase, then the attendance rate is estimated to fall by about .017(5) = .085, or 8.5 percentage points. The coefficient on priGPA means that if prior GPA is one point higher, the attendance rate is predicted to be about .182 higher, or 18.2 percentage points. Of the 680 fitted values, 12 are greater than one; none is less than zero.

b. The GLM standard errors are given in the output. Note that $\hat\sigma^2 \approx .127$, the "(1/df) Pearson" entry in the output, so $\hat\sigma \approx .356$. In other words, the usual MLE standard errors, obtained, say, from the expected Hessian of the quasi-log likelihood, are much too large. The standard errors that account for $\sigma^2 < 1$ are given by the GLM output. (If you omit the "sca(x2)" option in the "glm" command, you will get the usual MLE standard errors.)
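The relation between the scaled and unscaled standard errors can be illustrated with the numbers in the output. This is a sketch under one assumption, stated in the note at the bottom of the coefficient table: the reported standard errors equal the usual ones multiplied by the square root of the Pearson-based dispersion. The variable names are hypothetical.

```python
import math

pearson = 85.57283238    # "Pearson" from the glm output
residual_df = 675        # "Residual df" from the glm output
dispersion = pearson / residual_df   # matches "(1/df) Pearson" = .1267746
sigma_hat = math.sqrt(dispersion)    # factor applied to the standard errors

# Scaled standard errors as reported in the coefficient table
scaled_se = {
    "ACT": 0.0113217,
    "priGPA": 0.0771321,
    "frosh": 0.113436,
    "soph": 0.0944066,
    "_cons": 0.2859966,
}

# Implied unscaled (usual MLE) standard errors: divide the factor back out
unscaled_se = {name: se / sigma_hat for name, se in scaled_se.items()}
```

Since `sigma_hat` is well below one, every entry of `unscaled_se` is larger than its scaled counterpart, by a factor of roughly 2.8.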
c. Since the coefficient on ACT is negative, we know that an increase in ACT score, holding year and prior GPA fixed, actually reduces the predicted attendance rate. The calculation, done at priGPA = 3 with frosh = soph = 0, shows that when ACT increases from 25 to 30, the estimated fall in atndrte is about .087, or about 8.7 percentage points. This is very similar to the 8.5 percentage points found using the linear model.
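The two "di" calculations can be reproduced outside Stata. The sketch below uses the same rounded coefficients that appear in the "di" commands and sets frosh = soph = 0, as those commands do; the function and variable names are hypothetical.

```python
import math


def logistic(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Rounded glm estimates, exactly as typed in the "di" commands
b_cons, b_act, b_gpa = 0.7622, -0.1114, 1.244


def fitted(act, prigpa):
    """Predicted attendance rate with frosh = soph = 0."""
    return logistic(b_cons + b_act * act + b_gpa * prigpa)


p30 = fitted(30, 3)
p25 = fitted(25, 3)
glm_change = p30 - p25          # change when ACT goes from 25 to 30
ols_change = -0.0169202 * 5     # same change implied by the linear model
```

Unlike the linear model, the logistic change depends on the starting values, so a different priGPA or class year would give a somewhat different number.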
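d. The R-squared for the linear model is about .302. For the logistic functional form, the squared correlation between $atndrte_i$ and $\hat{E}(atndrte_i \mid x_i)$ is about .328, so the logistic functional form fits somewhat better, even though the parameters in the logistic functional form are not chosen to maximize an R-squared.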

19.11. To be added.

## SOLUTIONS TO CHAPTER 20 PROBLEMS

20.1. To be added.

20.3. a. If all durations in the sample are censored, $d_i = 0$ for all $i$, and so the log-likelihood is

$$\sum_{i=1}^N \log[1 - F(t_i \mid x_i;\theta)] = \sum_{i=1}^N \log[1 - F(c_i \mid x_i;\theta)].$$

b. For the Weibull case, $F(t \mid x_i;\theta) = 1 - \exp[-\exp(x_i\beta)t^{\alpha}]$, and so the log-likelihood is $-\sum_{i=1}^N \exp(x_i\beta)c_i^{\alpha}$.

c. When $x_i$ contains only a constant, so that $x_i\beta = \beta$, the log-likelihood is $-\exp(\beta)\sum_{i=1}^N c_i^{\alpha}$. Since $c_i > 0$, we have $\sum_{i=1}^N c_i^{\alpha} > 0$ for any $\alpha > 0$. But then, for any $\alpha > 0$, the log-likelihood is maximized by minimizing $\exp(\beta)$ across $\beta$. But as $\beta \to -\infty$, $\exp(\beta) \to 0$. So plugging any value $\alpha$ into the log-likelihood will lead to $\beta$ getting more and more negative without bound. So no two real numbers for $\alpha$ and $\beta$ maximize the log-likelihood.

d. It is not possible to estimate duration models from flow data when all
durations are right censored.
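The argument in 20.3 can be seen numerically. In the sketch below the censoring times and the value of $\alpha$ are made up, and the function name is hypothetical. It evaluates the all-censored Weibull log-likelihood $-\exp(\beta)\sum_i c_i^{\alpha}$ at increasingly negative values of $\beta$.

```python
import math


def censored_loglik(beta, alpha, censor_times):
    """Weibull log-likelihood with only a constant when every duration is censored."""
    return -math.exp(beta) * sum(ci ** alpha for ci in censor_times)


censor_times = [1.0, 2.5, 4.0]  # made-up censoring times, all positive
alpha = 1.3                     # any positive value gives the same pattern

betas = (0.0, -5.0, -10.0, -20.0)
values = [censored_loglik(beta, alpha, censor_times) for beta in betas]
```

The entries of `values` increase toward zero as $\beta$ falls but never reach it, so the supremum is not attained at any finite $\beta$.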

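20.5. a. We have

$$\begin{aligned}
P(t_i \le t \mid x_i,a_i,c_i,s_i = 1) &= P(t_i^* \le t \mid x_i, t_i^* > b - a_i) \\
&= P(t_i^* \le t,\ t_i^* > b - a_i \mid x_i)/P(t_i^* > b - a_i \mid x_i) \\
&= P(b - a_i < t_i^* \le t \mid x_i)/P(t_i^* > b - a_i \mid x_i) \\
&= [F(t \mid x_i) - F(b - a_i \mid x_i)]/[1 - F(b - a_i \mid x_i)].
\end{aligned}$$

b. The derivative of the cdf in part a, with respect to $t$, is simply $f(t \mid x_i)/[1 - F(b - a_i \mid x_i)]$.

c. $P(t_i = c_i \mid x_i,a_i,c_i,s_i = 1) = P(t_i^* \ge c_i \mid x_i)/P(t_i^* > b - a_i \mid x_i) = [1 - F(c_i \mid x_i)]/[1 - F(b - a_i \mid x_i)]$.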
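The three expressions in 20.5 can be checked against each other numerically. The sketch below is an illustration under assumptions: it uses a Weibull distribution for $F(\cdot \mid x_i)$ with made-up parameters, where `lam` stands in for $\exp(x_i\beta)$, and made-up values of $b$, $a_i$, and $c_i$ with $c_i > b - a_i$. All names are hypothetical.

```python
import math

LAM, ALPHA = 0.5, 1.2  # made-up Weibull parameters; LAM plays the role of exp(x_i*beta)


def F(t):
    """Weibull cdf 1 - exp(-LAM * t**ALPHA)."""
    return 1.0 - math.exp(-LAM * t ** ALPHA)


def f(t):
    """Weibull density matching F."""
    return LAM * ALPHA * t ** (ALPHA - 1.0) * math.exp(-LAM * t ** ALPHA)


def truncated_cdf(t, b, a):
    """Part a: [F(t) - F(b - a)] / [1 - F(b - a)]."""
    return (F(t) - F(b - a)) / (1.0 - F(b - a))


def truncated_density(t, b, a):
    """Part b: f(t) / [1 - F(b - a)]."""
    return f(t) / (1.0 - F(b - a))


def censoring_prob(c, b, a):
    """Part c: [1 - F(c)] / [1 - F(b - a)]."""
    return (1.0 - F(c)) / (1.0 - F(b - a))


b, a, c = 3.0, 1.0, 4.0  # made-up values with c > b - a
t, h = 2.5, 1e-6

# Central difference approximation to the derivative of the cdf in part a
numeric_derivative = (truncated_cdf(t + h, b, a) - truncated_cdf(t - h, b, a)) / (2 * h)

# Probability of an uncensored duration below c plus probability of censoring at c
total = truncated_cdf(c, b, a) + censoring_prob(c, b, a)
```

The numerical derivative agrees with the density in part b, and `total` equals one, which confirms that the uncensored and censored pieces together form a proper distribution.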

First, by (20.22) and $D(a_i \mid c_i,x_i) = D(a_i \mid x_i)$, the density of $(a_i,t_i^*)$ given $(c_i,x_i)$ does not depend on $c_i$ and is given by $k(a \mid x_i)f(t \mid x_i)$ for $0 < a < b$ and $0 < t < \infty$. This is also the conditional density of $(a_i,t_i)$ given $(c_i,x_i)$ when $t < c_i$, that is, when the observation is uncensored. The probability of observing the random draw $(a_i,c_i,x_i,t_i)$, conditional on $x_i$, is $P(t_i^* > b - a_i \mid x_i)$, which is exactly (20.32). By the usual formula for truncated distributions, the density of $(a_i,t_i)$ given $(c_i,d_i,x_i)$ and $s_i = 1$ is

$$k(a \mid x_i)[f(t \mid x_i)]^{d_i}[1 - F(c_i \mid x_i)]^{(1 - d_i)}/P(s_i = 1 \mid x_i),$$

for all combinations $(a,t)$ such that $s_i = 1$. Taking the log gives (20.56).

b. We have the usual tradeoff between robustness and efficiency. Using the log-likelihood (20.56) results in more efficient estimators provided we have the two densities correctly specified; (20.30) requires us to specify only $f(\cdot \mid x_i)$.

20.9. To be added.
