
ECONOMICS 762: 2SLS Stata Example

L. Magee, March 2008




This example uses data in the file 2slseg.dta. It contains 2932 observations from a sample
of young adult males in the U.S. in 1976. The variables are:

1. nearc2   =1 if lived near a 2-year college in 1966
2. nearc4   =1 if lived near a 4-year college in 1966
3. educ     years of schooling, 1976
4. age      age in years, 1976
5. smsa     =1 if lived in an SMSA, 1976 (SMSA = Standard Metropolitan Statistical
            Area; roughly, an indicator for living in an urban area)
6. south    =1 if lived in the southern U.S., 1976
7. wage     hourly wage in cents, 1976
8. married  =1 if married, 1976

This data set is used in the article "Using Geographic Variation in College Proximity to
Estimate the Return to Schooling," by D. Card (1995), in L.N. Christofides et al. (eds.),
Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, and in the
textbook Introductory Econometrics: A Modern Approach, second edition, by Jeffrey
M. Wooldridge.

The goal is to estimate the percentage effect on the wage of an extra year of education,
by estimating the coefficient on EDUC in a regression equation with the log of WAGE as
the dependent variable, controlling for other factors as follows:

LHS variable: log of WAGE
RHS variables: EDUC, AGE, MARRIED, SMSA

This will be referred to as the wage equation. It is commonly thought that EDUC is
positively correlated with the error term in the wage equation, because the error
includes unobserved ability. This would cause OLS to overestimate the effect of EDUC on
the log wage. Instruments are hard to find, though: they must be uncorrelated with the
error term, yet help to predict years of schooling. In this example, indicators of whether
these young men lived near each of two types of colleges 10 years earlier are used as
instruments.
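
In symbols, the setup described above can be sketched as follows. The notation is not from the original notes: the coefficient names (beta, pi) and the error names (u, v) are assumptions introduced here for illustration.

```latex
\begin{align*}
\log(\text{wage}_i) &= \beta_0 + \beta_1\,\text{educ}_i + \beta_2\,\text{age}_i
                       + \beta_3\,\text{married}_i + \beta_4\,\text{smsa}_i + u_i \\
\text{educ}_i       &= \pi_0 + \pi_1\,\text{nearc2}_i + \pi_2\,\text{nearc4}_i
                       + \pi_3\,\text{age}_i + \pi_4\,\text{married}_i
                       + \pi_5\,\text{smsa}_i + v_i
\end{align*}
```

The endogeneity concern is that Cov(educ, u) is not zero. The instruments are valid if nearc2 and nearc4 are uncorrelated with u, and relevant if pi_1 and pi_2 are not both zero.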
Here is the do file without comments:

******************************************************************************
** 2SLS.do : March 2007
******************************************************************************

clear
capture log using "C:\Documents and Settings\courses\761 and 762\w07\2SLS\2SLS.log", replace
use "C:\Documents and Settings\courses\761 and 762\w07\2SLS\2SLSeg.dta"

summarize

gen lwage=log(wage)

** IV regression (2SLS) **
ivreg lwage age married smsa (educ = nearc2 nearc4)

** general version of Hausman test **
predict ivresid, residuals
est store ivreg
reg lwage educ age married smsa
hausman ivreg ., constant sigmamore df(1)

** Wu version of Hausman test **
quietly reg educ age married smsa nearc2 nearc4
predict educhat, xb
reg lwage educ age married smsa educhat

** overidentification test **
quietly reg ivresid age married smsa nearc2 nearc4
predict explresid, xb
matrix accum rssmat = explresid, noconstant
matrix accum tssmat = ivresid, noconstant
scalar nobs=e(N)
scalar x2=nobs*rssmat[1,1]/tssmat[1,1]
scalar pval=1-chi2(1,x2)
scalar list x2 pval

log close



Here is the same do file with comments about some of the commands inserted below them in
italics:

******************************************************************************
** 2SLS.do : March 2007
******************************************************************************

clear
capture log using "C:\Documents and Settings\courses\761 and 762\w07\2SLS\2SLS.log", replace
use "C:\Documents and Settings\courses\761 and 762\w07\2SLS\2SLSeg.dta"

summarize

gen lwage=log(wage)

** IV regression (2SLS) **
ivreg lwage age married smsa (educ = nearc2 nearc4)

This ivreg command computes the 2SLS estimates. The dependent variable is lwage. The regressors
that are assumed exogenous are left outside the parentheses: age married smsa. The regressors that
are assumed endogenous are in the parentheses to the left of the equals sign. There's just one in this
example: educ. In the parentheses to the right of the equals sign are the instrumental variables, which are
assumed exogenous and do not appear as regressors in the equation. Here they are nearc2 and nearc4.
The key assumption is that proximity to 2-year and 4-year colleges in 1966 is not correlated with the error in
the wage equation, but does help to explain years of schooling in 1976.
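
To see what the "two stages" in 2SLS refer to, the point estimates can be reproduced with two OLS regressions. This is a minimal sketch, not part of the original do file: the variable name educ_fs is hypothetical, and lwage is assumed to have been generated already. It is best run separately from the do file, because the post-estimation commands that follow rely on the ivreg results being the most recent estimates.

```stata
* Sketch only: reproduce the 2SLS point estimates "by hand" in two OLS steps.
* educ_fs is a hypothetical variable name, not part of the original do file.

* Stage 1: regress the endogenous regressor on ALL exogenous variables
* (the included regressors plus the two instruments).
regress educ age married smsa nearc2 nearc4

* Joint significance of the two instruments (instrument relevance).
test nearc2 nearc4

* Save the first-stage fitted values.
predict educ_fs, xb

* Stage 2: replace educ by its first-stage fitted value.
* The coefficients match those from ivreg, but these standard errors are
* NOT valid, because the residuals here use educ_fs rather than educ.
regress lwage educ_fs age married smsa
```

The ivreg command is preferable in practice because it computes the standard errors from residuals based on the actual educ values.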

** general version of Hausman test **
predict ivresid, residuals

This post-estimation command stores the 2SLS residuals in a variable that I called ivresid.

est store ivreg

This post-estimation command stores some of the 2SLS results for later use in a Hausman test.

reg lwage educ age married smsa

This command estimates the same equation by OLS in order to compute the Hausman test statistic.

hausman ivreg ., constant sigmamore df(1)

This command computes the Hausman test statistic. The null hypothesis is that the OLS estimator is consistent. If it is
not rejected, we would probably prefer OLS to 2SLS. The constant option tells Stata to include the constant term in
the comparison of the two sets of estimates. The sigmamore option tells Stata to use the same estimate of the variance
of the error term for both models. This is desirable here since the error term has the same interpretation in both
models. The df(1) option tells Stata that the null distribution has one degree of freedom. Stata was able to figure this
out when I left this option out, even though the Hausman test compares two 5-element (not one-element) vectors. It
probably knew this by finding only one non-zero eigenvalue of the 5-by-5 covariance matrix estimate that it calls
(V_b-V_B) in the output. It's safer to impose the d.f. in the hausman command as above.
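
As a cross-check on the hausman command, newer versions of Stata offer a built-in endogeneity test. The sketch below is not part of the original do file and assumes Stata 10 or later, where ivregress supersedes ivreg. Its statistics should be close to the Hausman result but need not match it digit for digit, since the variance estimates can differ.

```stata
* Sketch only, assuming Stata 10 or later (ivregress is not in the original notes).
* The small option requests the degrees-of-freedom adjustment that ivreg uses.
ivregress 2sls lwage age married smsa (educ = nearc2 nearc4), small

* Durbin and Wu-Hausman tests of H0: educ is exogenous.
estat endogenous
```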

** Wu version of Hausman test **
quietly reg educ age married smsa nearc2 nearc4

The above OLS regression is done only to get the predicted value of educ to perform the Wu version of the Hausman
test as described on p. 82 of the Greene text, 5th edition. To reduce the amount of output in the log file, its output is
suppressed by preceding the command with quietly.

predict educhat, xb
reg lwage educ age married smsa educhat

This OLS regression takes the original wage equation and adds the OLS predicted values of all of the (suspected)
endogenous variables. Here there is only one, educhat. It was predicted using the full set of exogenous variables.
The Wu version of the Hausman test is the standard significance test for the coefficient(s) on these added variables.
Since there's just one here, use a two-sided t-test.
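
An equivalent way to run the Wu regression adds the first-stage residual instead of the first-stage fitted value. This is a sketch, not part of the original do file: it assumes educhat from the commands above is still in memory, and vhat is a hypothetical variable name.

```stata
* Sketch only: residual form of the Wu regression.
* Assumes educhat (first-stage fitted values) already exists;
* vhat is a hypothetical variable name.
generate vhat = educ - educhat

regress lwage educ age married smsa vhat

* Test on vhat: same magnitude as the t-test on educhat, opposite sign.
test vhat
```

Because educhat = educ - vhat, the two regressions span the same set of regressors. The coefficient on vhat is minus the coefficient on educhat, and the coefficient on educ in this form equals the 2SLS estimate. In the log file, the educ and educhat coefficients from the Wu regression sum to .0478031 + .0908512 = .1386543, which is the 2SLS coefficient on educ.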

** overidentification test **
quietly reg ivresid age married smsa nearc2 nearc4

The uncentred R-squared of the above regression will be computed below to produce the overidentification test statistic,
also known as the Sargan statistic. The dependent variable ivresid is the 2SLS residual vector, saved earlier.

predict explresid, xb

The predicted values from the regression are saved in order to calculate the uncentred R-squared.

matrix accum rssmat = explresid, noconstant
matrix accum tssmat = ivresid, noconstant

There's probably a neater way to do this, but I used these matrix accum commands with a noconstant option
in order to compute two 1-by-1 matrices, rssmat (which holds the sum of squares of explresid) and tssmat (which
holds the sum of squares of ivresid).

scalar nobs=e(N)

e(N) is the sample size, which was automatically stored by the most recent regression. This command stores that
value in a scalar variable nobs.

scalar x2=nobs*rssmat[1,1]/tssmat[1,1]

This command computes the overidentification test statistic, called x2.

scalar pval=1-chi2(1,x2)

This command computes the P-value using the Stata function chi2(n,x), which computes the area to the left of x
under a chi-square distribution with n d.f. Here n = 1: the number of instruments (two) minus the number of
endogenous regressors (one).

scalar list x2 pval

This prints out the values of x2 and pval.
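
One possible neater route to the same statistic is sketched below. It is not part of the original do file: the names x2_alt and pval_alt are hypothetical, and ivresid is assumed to be in memory. It relies on the fact that 2SLS residuals from an equation with a constant have mean zero, so the ordinary R-squared of the auxiliary regression equals the uncentred one.

```stata
* Sketch only: a shorter route to the same overidentification statistic.
* x2_alt and pval_alt are hypothetical names, not part of the original do file.
quietly regress ivresid age married smsa nearc2 nearc4

* The 2SLS residuals have mean zero (the equation includes a constant), so
* the centred R-squared stored in e(r2) equals the uncentred R-squared.
scalar x2_alt = e(N)*e(r2)

* chi2tail() returns the upper-tail area directly.
* d.f. = 2 instruments - 1 endogenous regressor = 1.
scalar pval_alt = chi2tail(1, x2_alt)
scalar list x2_alt pval_alt
```

In Stata 10 or later, running estat overid after ivregress 2sls should also report a Sargan statistic, though that command is an assumption about the reader's Stata version rather than something used in these notes.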

log close
Now the log file:

. use "C:\Documents and Settings\courses\761 and 762\w07\2SLS\2SLSeg.dta"

.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      nearc2 |      2932     .430764    .4952676          0          1
      nearc4 |      2932    .6828104    .4654613          0          1
        educ |      2932    13.25887    2.682475          1         18
         age |      2932    28.11937    3.134548         24         34
        smsa |      2932    .7060027    .4556684          0          1
-------------+--------------------------------------------------------
       south |      2932    .3915416    .4881783          0          1
        wage |      2932    577.1872    264.5756        100       2404
     married |      2932    .7141883    .4518772          0          1

.
. gen lwage=log(wage)

.
. ** IV regression (2SLS) **
. ivreg lwage age married smsa (educ = nearc2 nearc4)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    2932
-------------+------------------------------           F(  4,  2927) =  122.71
       Model | -19.3235809     4 -4.83089521           Prob > F      =  0.0000
    Residual |  601.657409  2927  .205554291           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  582.333829  2931  .198680938           Root MSE      =  .45338

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1386543   .0342091     4.05   0.000     .0715779    .2057307
         age |   .0366522   .0027297    13.43   0.000     .0312999    .0420044
     married |   .1937981   .0201602     9.61   0.000     .1542685    .2333277
        smsa |   .0976942   .0417188     2.34   0.019     .0158931    .1794953
       _cons |   3.184304   .4405519     7.23   0.000     2.320481    4.048127
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   age married smsa nearc2 nearc4
------------------------------------------------------------------------------

.
. ** general version of Hausman test **
. predict ivresid, residuals

. est store ivreg

. reg lwage educ age married smsa

      Source |       SS       df       MS              Number of obs =    2932
-------------+------------------------------           F(  4,  2927) =  243.70
       Model |  145.487691     4  36.3719228           Prob > F      =  0.0000
    Residual |  436.846137  2927  .149247057           R-squared     =  0.2498
-------------+------------------------------           Adj R-squared =  0.2488
       Total |  582.333829  2931  .198680938           Root MSE      =  .38633

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0485886   .0027103    17.93   0.000     .0432742    .0539029
         age |   .0364856   .0023253    15.69   0.000     .0319262     .041045
     married |   .1759239   .0161841    10.87   0.000     .1441906    .2076572
        smsa |   .1962286   .0159841    12.28   0.000     .1648874    .2275698
       _cons |   4.326357    .074032    58.44   0.000     4.181197    4.471517
------------------------------------------------------------------------------

. hausman ivreg ., constant sigmamore df(1)

Note: the rank of the differenced variance matrix (1) does not equal the number
        of coefficients being tested (5); be sure this is what you expect, or
        there may be problems computing the test.  Examine the output of your
        estimators for anything unexpected and possibly consider scaling your
        variables so that the coefficients are on a similar scale.

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     ivreg          .          Difference          S.E.
-------------+----------------------------------------------------------------
        educ |    .1386543     .0485886        .0900657        .0290232
         age |    .0366522     .0364856        .0001666        .0000537
     married |    .1937981     .1759239        .0178742        .0057599
        smsa |    .0976942     .1962286       -.0985344        .0317522
       _cons |    3.184304     4.326357       -1.142053        .3680211
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from ivreg
          B = inconsistent under Ha, efficient under Ho; obtained from regress

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        9.63
                Prob>chi2 =      0.0019
                (V_b-V_B is not positive definite)

.
. ** Wu version of Hausman test **
. quietly reg educ age married smsa nearc2 nearc4

. predict educhat, xb

. reg lwage educ age married smsa educhat

      Source |       SS       df       MS              Number of obs =    2932
-------------+------------------------------           F(  5,  2926) =  197.47
       Model |  146.924944     5  29.3849888           Prob > F      =  0.0000
    Residual |  435.408884  2926  .148806864           R-squared     =  0.2523
-------------+------------------------------           Adj R-squared =  0.2510
       Total |  582.333829  2931  .198680938           Root MSE      =  .38575

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0478031   .0027181    17.59   0.000     .0424736    .0531327
         age |   .0366522   .0023225    15.78   0.000     .0320982    .0412061
     married |   .1937981   .0171531    11.30   0.000     .1601647    .2274315
        smsa |   .0976942    .035496     2.75   0.006     .0280945    .1672939
     educhat |   .0908512   .0292331     3.11   0.002     .0335316    .1481708
       _cons |   3.184304   .3748395     8.50   0.000     2.449328     3.91928
------------------------------------------------------------------------------

.
. ** overidentification test **
. quietly reg ivresid age married smsa nearc2 nearc4

. predict explresid, xb

. matrix accum rssmat = explresid, noconstant
(obs=2932)

. matrix accum tssmat = ivresid, noconstant
(obs=2932)

. scalar nobs=e(N)

. scalar x2=nobs*rssmat[1,1]/tssmat[1,1]

. scalar pval=1-chi2(1,x2)

. scalar list x2 pval
        x2 =  5.9600396
      pval =  .01463371

.
. log close
       log:  C:\Documents and Settings\courses\761 and 762\w07\2SLS\2SLS.log
  log type:  text
 closed on:  13 Mar 2007, 16:28:25
-------------------------------------------------------------------------------

The two Hausman tests give essentially the same information. The general version is in chi-square form, and equals
9.63, while the Wu version is a t-statistic, t = 3.11, whose square (about 9.66) is close to 9.63. The small gap arises
because the two tests use slightly different estimates of the error variance. They have essentially the same P-value
of .002, indicating rejection of the consistency of OLS, providing support for using 2SLS.
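
The link between the two statistics can be checked with a few lines of arithmetic on numbers copied from the log file. This is a sketch added for illustration, not part of the original do file.

```stata
* Sketch only: arithmetic check using numbers copied from the log file.

* Squared Wu t-statistic (educhat coefficient over its standard error).
display (.0908512/.0292331)^2

* Hausman statistic recomputed from the educ row of the hausman table.
display (.0900657/.0290232)^2

* Ratio of the residual mean squares: OLS wage equation vs. Wu regression.
display .149247057/.148806864

* P-values: chi-square form and two-sided t form.
display chi2tail(1, 9.63)
display 2*ttail(2926, 3.11)
```

The first two values come out at roughly 9.66 and 9.63, so the squared t-statistic is close to, but not exactly, the Hausman statistic. Their ratio is about 1.003, the same as the ratio of the two residual mean squares, which suggests that the two versions differ only in the error variance estimate they use.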

The overidentification test has a P-value of .015, which is significant at 5% but not 1%. So at the 5% level
we would reject the joint hypothesis that the instrumental variables nearc2 and nearc4 are both exogenous. If no
other instrumental variables are available, it is hard to know what to do about this. We could drop one of
the two instruments, but we would not know if that solves the problem, because the equation would then be
exactly identified, leaving no overidentifying restrictions to test.
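
To see informally what dropping an instrument would do, the wage equation can be estimated with each instrument on its own. This sketch is not part of the original do file, and iv_c4 and iv_c2 are hypothetical names for the stored results. It is a descriptive comparison only, not a test of exogeneity.

```stata
* Sketch only: just-identified 2SLS using one instrument at a time.
* iv_c4 and iv_c2 are hypothetical names for the stored results.
quietly ivreg lwage age married smsa (educ = nearc4)
estimates store iv_c4

quietly ivreg lwage age married smsa (educ = nearc2)
estimates store iv_c2

* Side-by-side coefficients and standard errors.
estimates table iv_c4 iv_c2, b(%9.4f) se(%9.4f)
```

If the two educ coefficients are far apart relative to their standard errors, that is consistent with the rejection reported by the overidentification test, but it does not show which instrument, if either, is valid.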
