Lecture 15-3
Truncated Regression and Heckman Sample Selection Corrections
Truncated regression

Truncated regression differs from censored regression in the following way:
- Censored regression: the dependent variable may be censored, but the censored observations can still be included in the regression.
- Truncated regression: a subset of observations is dropped entirely, so only the truncated data are available for the regression.
Because selection into the truncated sample depends on y, and hence on u, the error no longer has zero conditional mean in the observed sample:

E(u|x, s=1) ≠ 0. Similarly, you can show that E(u|x, s=0) ≠ 0. Thus E(u|x, s) ≠ 0, so OLS applied to the truncated data is biased.
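The bias can be illustrated with a small simulation. This is a hypothetical sketch (simulated data and parameter values chosen for illustration, not the lecture's dataset): the true slope is 0.5, and dropping observations above a threshold attenuates the OLS slope toward zero.

```python
# Hypothetical simulation: OLS on a truncated sample is biased toward zero,
# because E(u | x, s = 1) != 0 in the selected sample.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0, 10, n)
u = rng.normal(0, 2, n)
y = 1.0 + 0.5 * x + u                 # true model: beta0 = 1, beta1 = 0.5

def ols_slope(x, y):
    # slope from a simple OLS regression of y on x (with intercept)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

keep = y < 6.0                        # truncation from above: drop y >= 6
print(ols_slope(x, y))                # close to the true slope 0.5
print(ols_slope(x[keep], y[keep]))    # noticeably smaller: biased toward zero
```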
[Figure: family income per month (vertical axis, truncated at $500) against education of the household head (horizontal axis). Observations above the $500 line are dropped from the data; the biased regression line obtained by applying OLS to the truncated data is flatter than the true regression line.]
27
f (ui )
f (ui )
f (ui )
u
c 0 1 xi
c 0 1 xi
P (ui ci 0 1 xi )
P( i i
) ( i
)
ci 0 1 xi
1
2 2
ui2
2
e 2
2
1 i
2
1
e
ci 0 1 xi 2
(
)
ui
ui
ci 0 1 xi
29
Li
c 0 1 xi
( i
)
L( 0 , 1 , ) Li
i 1
31
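This likelihood is straightforward to maximize with a generic optimizer. Below is a sketch on simulated data (the data-generating values, the truncation point c, and the starting values are all assumptions for illustration): the negative log-likelihood sums \(\log f(u_i)\) minus \(\log \Phi((c - \beta_0 - \beta_1 x_i)/\sigma)\).

```python
# Sketch: ML estimation of a regression truncated from above at c,
# on simulated data (assumed setup, not the lecture's dataset).
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
n = 5_000
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 2, n)    # true beta0=1, beta1=0.5, sigma=2
c = 6.0
keep = y < c                               # only the truncated data are observed
xt, yt = x[keep], y[keep]

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                      # parameterize sigma > 0
    resid = yt - b0 - b1 * xt
    # log L_i = log f(resid_i) - log Phi((c - b0 - b1*x_i)/s)
    ll = stats.norm.logpdf(resid, scale=s) \
         - stats.norm.logcdf((c - b0 - b1 * xt) / s)
    return -ll.sum()

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="BFGS")
b0_hat, b1_hat, sigma_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(b0_hat, b1_hat, sigma_hat)           # should be near 1.0, 0.5 and 2.0
```

Unlike OLS on the truncated sample, these estimates account for the dropped observations through the \(\Phi\) term in the denominator.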
Exercise

We do not have suitable data for truncated regression, so let us truncate the data ourselves to check how truncated regression works.

EX1. Use JPSC_familyinc.dta to estimate the following model using all the observations:

(family income) = β0 + β1(husband educ) + u

Family income is measured in units of 10,000 yen.
      Source |       SS       df       MS              Number of obs =    7695
-------------+------------------------------           F(  1,  7693) =  924.22
       Model |  38305900.9     1  38305900.9           Prob > F      =  0.0000
    Residual |   318850122  7693  41446.7856           R-squared     =  0.1073
-------------+------------------------------           Adj R-squared =  0.1071
       Total |   357156023  7694  46420.0706           Root MSE      =  203.58

   familyinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     huseduc |   32.93413   1.083325    30.40   0.000     30.81052    35.05775
       _cons |    143.895   15.09181     9.53   0.000     114.3109     173.479
      Source |       SS       df       MS              Number of obs =    6274
-------------+------------------------------           F(  1,  6272) =  602.70
       Model |  11593241.1     1  11593241.1           Prob > F      =  0.0000
    Residual |   120645494  6272  19235.5699           R-squared     =  0.0877
-------------+------------------------------           Adj R-squared =  0.0875
       Total |   132238735  6273   21080.621           Root MSE      =  138.69

   familyinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     huseduc |   20.27929   .8260432    24.55   0.000     18.65996    21.89861
       _cons |   244.5233   11.33218    21.58   0.000     222.3084    266.7383

Observations with familyinc ≥ 800 are dropped. The parameter on huseduc is biased towards zero.
Iteration 0:   log likelihood = -39676.782
Iteration 1:   log likelihood = -39618.757
Iteration 2:   log likelihood = -39618.629
Iteration 3:   log likelihood = -39618.629

Truncated regression
Limit:   lower =   -inf
         upper =    800
Log likelihood = -39618.629

   familyinc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     huseduc |   24.50276     1.0264    23.87   0.000     22.49105    26.51446
       _cons |   203.6856   13.75721    14.81   0.000     176.7219    230.6492
-------------+----------------------------------------------------------------
      /sigma |   153.1291   1.805717    84.80   0.000       149.59    156.6683
\[
\begin{aligned}
E(y_i \mid s_i = 1, z_i) &= E(y_i \mid s_i^* > 0, z_i) \\
&= E(y_i \mid z_i\gamma + e_i > 0, z_i) \\
&= E(y_i \mid e_i > -z_i\gamma, z_i) \\
&= E(x_i\beta + u_i \mid e_i > -z_i\gamma, z_i) \\
&= x_i\beta + E(u_i \mid e_i > -z_i\gamma, z_i)
\end{aligned}
\]

Using a result on the bivariate normal distribution, the last term can be shown to be \(E(u_i \mid e_i > -z_i\gamma, z_i) = \rho\,\phi(z_i\gamma)/\Phi(z_i\gamma)\). The term \(\phi(z_i\gamma)/\Phi(z_i\gamma)\) is the inverse Mills ratio, \(\lambda(z_i\gamma)\). Thus, we have

\[
E(y_i \mid s_i = 1, z_i) = x_i\beta + \rho\,\lambda(z_i\gamma)
\]
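The bivariate-normal result can be checked numerically. A sketch with assumed values for \(\rho\) and the index \(z_i\gamma\) (0.6 and 0.4 here, chosen purely for illustration):

```python
# Numerical check of E(u | e > -z*gamma) = rho * phi(z*gamma) / Phi(z*gamma)
# for (u, e) standard bivariate normal with correlation rho.
import numpy as np
from scipy import stats

def inv_mills(a):
    # inverse Mills ratio: lambda(a) = phi(a) / Phi(a)
    return stats.norm.pdf(a) / stats.norm.cdf(a)

rng = np.random.default_rng(2)
rho, zg = 0.6, 0.4                        # assumed rho and index z*gamma
e = rng.normal(size=2_000_000)
u = rho * e + np.sqrt(1 - rho**2) * rng.normal(size=e.size)

sim = u[e > -zg].mean()                   # conditional mean by simulation
exact = rho * inv_mills(zg)               # bivariate normal formula
print(sim, exact)                         # the two agree closely
```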
First step: estimate \(\gamma\) by a probit of \(s_i\) on \(z_i\), and compute the estimated inverse Mills ratio \(\hat\lambda_i = \phi(z_i\hat\gamma)/\Phi(z_i\hat\gamma)\).

Second step: plug \(\hat\lambda_i\) into the wage equation, then estimate by OLS on the selected sample. That is, estimate the following:

\[
y_i = x_i\beta + \rho\,\hat\lambda_i + \text{error}_i
\]
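The two-step (Heckit) procedure can be sketched end-to-end on simulated data. Everything below — variable names, parameter values, the selection rule — is an assumed illustration, not the Mroz data: step 1 estimates the probit by maximum likelihood, step 2 runs OLS with the estimated inverse Mills ratio added.

```python
# Heckit two-step sketch on simulated data (assumed setup for illustration).
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(3)
n = 10_000
x = rng.normal(size=n)                    # regressor in the outcome equation
z2 = rng.normal(size=n)                   # excluded variable in the selection eq.
e = rng.normal(size=n)                    # selection-equation error
u = 0.5 * e + np.sqrt(1 - 0.5**2) * rng.normal(size=n)   # corr(u, e) = 0.5

s = (0.3 + x + z2 + e > 0).astype(float)  # selection indicator
y = 1.0 + 2.0 * x + u                     # outcome, observed only when s = 1

# Step 1: probit of s on z = (1, x, z2) by maximum likelihood
Z = np.column_stack([np.ones(n), x, z2])

def probit_nll(g):
    idx = Z @ g
    return -(s * stats.norm.logcdf(idx) + (1 - s) * stats.norm.logcdf(-idx)).sum()

g_hat = optimize.minimize(probit_nll, np.zeros(3), method="BFGS").x

# Inverse Mills ratio at the estimated index
idx = Z @ g_hat
lam = stats.norm.pdf(idx) / stats.norm.cdf(idx)

# Step 2: OLS of y on (1, x, lambda_hat) using only the selected observations
sel = s == 1
X = np.column_stack([np.ones(int(sel.sum())), x[sel], lam[sel]])
beta = np.linalg.lstsq(X, y[sel], rcond=None)[0]
print(beta)   # roughly [1.0, 2.0, 0.5]; the lambda coefficient estimates rho*sigma_u
```

As in the Stata example below, the second-step OLS standard errors are not correct; they must be adjusted for the estimated first step.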
Exercise

Using Mroz.dta, estimate the wage offer equation using the Heckit model. The explanatory variables for the wage offer equation are educ, exper, and expersq. The explanatory variables for the sample selection equation are educ, exper, expersq, nwifeinc, age, kidslt6, and kidsge6.
Estimating Heckit manually. (Note: you will not get the correct standard errors.)

. **********************************************
. * Estimating heckit model manually           *
. **********************************************

. ***************************
. * First create selection  *
. * variable                *
. ***************************

. gen s=0 if wage==.
(428 missing values generated)

. replace s=1 if wage~=.
(428 real changes made)

. *******************************
. * Next, estimate the probit   *
. * selection equation          *
. *******************************

. probit s educ exper expersq nwifeinc age kidslt6 kidsge6
Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -405.78215
Iteration 2:   log likelihood = -401.32924
Iteration 3:   log likelihood = -401.30219
Iteration 4:   log likelihood = -401.30219

Probit regression                               Number of obs   =        753
                                                LR chi2(7)      =     227.14
                                                Prob > chi2     =     0.0000
                                                Pseudo R2       =     0.2206

           s |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1309047   .0252542     5.18   0.000     .0814074     .180402
       exper |   .1233476   .0187164     6.59   0.000     .0866641    .1600311
     expersq |  -.0018871      .0006    -3.15   0.002     -.003063   -.0007111
    nwifeinc |  -.0120237   .0048398    -2.48   0.013    -.0215096   -.0025378
         age |  -.0528527   .0084772    -6.23   0.000    -.0694678   -.0362376
     kidslt6 |  -.8683285   .1185223    -7.33   0.000    -1.100628    -.636029
     kidsge6 |    .036005   .0434768     0.83   0.408     -.049208    .1212179
       _cons |   .2700768    .508593     0.53   0.595    -.7267473    1.266901
. *******************************
. * Then create inverse lambda  *
. *******************************

. predict xdelta, xb

. gen lambda = normalden(xdelta)/normal(xdelta)

The second step:

. *************************************
. * Finally, estimate the Heckit model *
. *************************************

. reg lwage educ exper expersq lambda
      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  4,   423) =   19.69
       Model |  35.0479487     4  8.76198719           Prob > F      =  0.0000
    Residual |  188.279492   423  .445105182           R-squared     =  0.1569
-------------+------------------------------           Adj R-squared =  0.1490
       Total |  223.327441   427  .523015084           Root MSE      =  .66716

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1090655   .0156096     6.99   0.000     .0783834    .1397476
       exper |   .0438873   .0163534     2.68   0.008     .0117433    .0760313
     expersq |  -.0008591   .0004414    -1.95   0.052    -.0017267    8.49e-06
      lambda |   .0322619   .1343877     0.24   0.810    -.2318888    .2964126
       _cons |  -.5781032    .306723    -1.88   0.060    -1.180994     .024788

Note the standard errors are not correct.
. heckman lwage educ exper expersq, select(s=educ exper expersq nwifeinc age kidslt6 kidsge6) twostep

Heckman selection model -- two-step estimates   Number of obs      =       753
(regression model with sample selection)        Censored obs       =       325
                                                Uncensored obs     =       428

                                                Wald chi2(3)       =     51.53
                                                Prob > chi2        =    0.0000

             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage        |
        educ |   .1090655    .015523     7.03   0.000     .0786411      .13949
       exper |   .0438873   .0162611     2.70   0.007     .0120163    .0757584
     expersq |  -.0008591   .0004389    -1.96   0.050    -.0017194    1.15e-06
       _cons |  -.5781032   .3050062    -1.90   0.058    -1.175904      .019698
-------------+----------------------------------------------------------------
s            |
        educ |   .1309047   .0252542     5.18   0.000     .0814074     .180402
       exper |   .1233476   .0187164     6.59   0.000     .0866641    .1600311
     expersq |  -.0018871      .0006    -3.15   0.002     -.003063   -.0007111
    nwifeinc |  -.0120237   .0048398    -2.48   0.013    -.0215096   -.0025378
         age |  -.0528527   .0084772    -6.23   0.000    -.0694678   -.0362376
     kidslt6 |  -.8683285   .1185223    -7.33   0.000    -1.100628    -.636029
     kidsge6 |    .036005   .0434768     0.83   0.408     -.049208    .1212179
       _cons |   .2700768    .508593     0.53   0.595    -.7267473    1.266901
-------------+----------------------------------------------------------------
mills        |
      lambda |   .0322619   .1336246     0.24   0.809    -.2296376    .2941613
-------------+----------------------------------------------------------------
         rho |    0.04861
       sigma |  .66362875
      lambda |  .03226186   .1336246

With heckman, the inverse Mills ratio (mills: lambda) is estimated automatically. Note that H0: ρ = 0 cannot be rejected, so there is little evidence that sample selection bias is present.