
Multiple Regression

(SW Chapter 5)
OLS estimate of the Test Score/STR relation:

$\widehat{TestScore} = 698.9 - 2.28 \times STR$, $R^2 = .05$, $SER = 18.6$
Standard errors: (10.4) (0.52)
Is this a credible estimate of the causal effect on test scores of a change in the student-teacher ratio?
No: there are omitted confounding factors (family income; whether the students are native English speakers) that bias the OLS estimator. STR could be "picking up" the effect of these confounding factors.
Omitted Variable Bias
(SW Section 5.1)
The bias in the OLS estimator that occurs as a result of an omitted factor is called omitted variable bias. For omitted variable bias to occur, the omitted factor "Z" must be:
1. a determinant of Y; and
2. correlated with the regressor X.
Both conditions must hold for the omission of Z to result in omitted variable bias.
In the test score example:
1. English language ability (whether the student has English as a second language) plausibly affects standardized test scores: Z is a determinant of Y.
2. Immigrant communities tend to be less affluent and thus have smaller school budgets and higher STR: Z is correlated with X.
Accordingly, $\hat{\beta}_1$ is biased.
What is the direction of this bias?
What does common sense suggest?
If common sense fails you, there is a formula…
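A formula for omitted variable bias: recall the equation,
$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right)s_X^2},$$
where $v_i = (X_i - \bar{X})u_i \cong (X_i - \mu_X)u_i$. Under Least Squares Assumption #1,
$$E[(X_i - \mu_X)u_i] = \mathrm{cov}(X_i, u_i) = 0.$$
But what if $E[(X_i - \mu_X)u_i] = \mathrm{cov}(X_i, u_i) = \sigma_{Xu} \neq 0$?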
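Then
$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right)s_X^2},$$
so
$$E(\hat{\beta}_1) - \beta_1 = E\left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right] \cong \frac{\sigma_{Xu}}{\sigma_X^2} = \left(\frac{\sigma_u}{\sigma_X}\right)\left(\frac{\sigma_{Xu}}{\sigma_X \sigma_u}\right),$$
where $\cong$ holds with equality when $n$ is large; specifically,
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \left(\frac{\sigma_u}{\sigma_X}\right)\rho_{Xu}, \quad \text{where } \rho_{Xu} = \mathrm{corr}(X, u).$$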
Omitted variable bias formula:
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \left(\frac{\sigma_u}{\sigma_X}\right)\rho_{Xu}.$$
If an omitted factor Z is both:
(1) a determinant of Y (that is, it is contained in u); and
(2) correlated with X,
then $\rho_{Xu} \neq 0$ and the OLS estimator $\hat{\beta}_1$ is biased.
The math makes precise the idea that districts with few ESL students (1) do better on standardized tests and (2) have smaller classes (bigger budgets), so ignoring the ESL factor results in overstating the class size effect.
Is this actually going on in the CA data?
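The omitted variable bias formula can be checked with a small simulation. The sketch below is illustrative only: the data-generating process, the parameter values (`BETA1`, `GAMMA`, the 0.5 loading of X on Z), and the helper name `slope` are assumptions made for the example, not values from the California data.

```python
import random
import statistics

random.seed(0)

BETA1 = -1.0   # true effect of X on Y (hypothetical)
GAMMA = -2.0   # effect of the omitted factor Z on Y (hypothetical)
N = 50_000

z = [random.gauss(0, 1) for _ in range(N)]
x = [0.5 * zi + random.gauss(0, 1) for zi in z]      # X is correlated with Z
u = [GAMMA * zi + random.gauss(0, 1) for zi in z]    # Z is left in the error term
y = [BETA1 * xi + ui for xi, ui in zip(x, u)]


def slope(xs, ys):
    """OLS slope in a regression of ys on xs and a constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


beta1_hat = slope(x, y)

# Population counterpart: beta1 + (sigma_u / sigma_X) * rho_Xu
sigma_x = (0.5 ** 2 + 1) ** 0.5
sigma_u = (GAMMA ** 2 + 1) ** 0.5
rho_xu = (GAMMA * 0.5) / (sigma_x * sigma_u)
plim = BETA1 + (sigma_u / sigma_x) * rho_xu

print(f"OLS slope: {beta1_hat:.3f}  formula: {plim:.3f}  truth: {BETA1}")
```

In this design the formula gives -1.8, so the regression that omits Z overstates the (negative) effect of X, which mirrors the class size story above.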
Districts with fewer English Learners have higher test scores.
Districts with lower percent EL (PctEL) have smaller classes.
Among districts with comparable PctEL, the effect of class size is small (recall overall "test score gap" = 7.4).
Three ways to overcome omitted variable bias
1. Run a randomized controlled experiment in which treatment (STR) is randomly assigned: then PctEL is still a determinant of TestScore, but PctEL is uncorrelated with STR. (But this is unrealistic in practice.)
2. Adopt the "cross tabulation" approach, with finer gradations of STR and PctEL. (But soon we will run out of data, and what about other determinants like family income and parental education?)
3. Use a method in which the omitted variable (PctEL) is no longer omitted: include PctEL as an additional regressor in a multiple regression.
The Population Multiple Regression Model
(SW Section 5.2)
Consider the case of two regressors:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1, \ldots, n$$
$X_1$, $X_2$ are the two independent variables (regressors)
$(Y_i, X_{1i}, X_{2i})$ denote the $i$-th observation on $Y$, $X_1$, and $X_2$.
$\beta_0$ = unknown population intercept
$\beta_1$ = effect on $Y$ of a change in $X_1$, holding $X_2$ constant
$\beta_2$ = effect on $Y$ of a change in $X_2$, holding $X_1$ constant
$u_i$ = "error term" (omitted factors)
Interpretation of multiple regression coefficients
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1, \ldots, n$$
Consider changing $X_1$ by $\Delta X_1$ while holding $X_2$ constant:
Population regression line before the change:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$
Population regression line after the change:
$$Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2$$
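Before: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
After: $Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2$
Difference: $\Delta Y = \beta_1 \Delta X_1$
That is,
$\beta_1 = \frac{\Delta Y}{\Delta X_1}$, holding $X_2$ constant
also,
$\beta_2 = \frac{\Delta Y}{\Delta X_2}$, holding $X_1$ constant
and
$\beta_0$ = predicted value of $Y$ when $X_1 = X_2 = 0$.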
The OLS Estimator in Multiple Regression
(SW Section 5.3)
With two regressors, the OLS estimator solves:
$$\min_{b_0, b_1, b_2} \sum_{i=1}^{n} \left[Y_i - (b_0 + b_1 X_{1i} + b_2 X_{2i})\right]^2$$
The OLS estimator minimizes the average squared difference between the actual values of $Y_i$ and the prediction (predicted value) based on the estimated line.
This minimization problem is solved using calculus.
The result is the OLS estimators of $\beta_0$, $\beta_1$, and $\beta_2$.
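Setting the derivatives of the sum of squared residuals to zero yields the normal equations $(X'X)b = X'Y$. The sketch below solves them directly in plain Python; the helper name `ols` and the toy data are hypothetical, and the example data contain no error term so that the estimates can be compared with the known coefficients.

```python
def ols(y, columns):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    `columns` is a list of regressor columns; a constant is added automatically.
    Returns [b0, b1, ..., bk].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


# Toy data generated from Y = 2 + 3*X1 - 1.5*X2 with no error term
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.5]
y = [2.0 + 3.0 * a - 1.5 * b for a, b in zip(x1, x2)]

b0, b1, b2 = ols(y, [x1, x2])
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

In practice a regression package does this (and computes standard errors); the point here is only that the minimization has a closed-form solution.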
Example: the California test score data
Regression of TestScore against STR:
$\widehat{TestScore} = 698.9 - 2.28 \times STR$
Now include percent English Learners in the district (PctEL):
$\widehat{TestScore} = 686.0 - 1.10 \times STR - 0.65 \times PctEL$
What happens to the coefficient on STR?
Why? (Note: corr(STR, PctEL) = 0.19)
Multiple regression in STATA
reg testscr str pctel, robust;
Regression with robust standard errors                 Number of obs =     420
                                                       F(  2,   417) =  223.82
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4264
                                                       Root MSE      =  14.464
------------------------------------------------------------------------------
         |               Robust
 testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
     str |  -1.101296   .4328472    -2.54   0.011     -1.95213   -.2504616
   pctel |  -.6497768   .0310318   -20.94   0.000     -.710775   -.5887786
   _cons |   686.0322   8.728224    78.60   0.000     668.8754     703.189
------------------------------------------------------------------------------

$\widehat{TestScore} = 686.0 - 1.10 \times STR - 0.65 \times PctEL$
What are the sampling distributions of $\hat{\beta}_1$ and $\hat{\beta}_2$?
The Least Squares Assumptions for Multiple Regression (SW Section 5.4)
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n$$
1. The conditional distribution of $u$ given the $X$'s has mean zero, that is, $E(u \mid X_1 = x_1, \ldots, X_k = x_k) = 0$.
2. $(X_{1i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$, are i.i.d.
3. $X_1, \ldots, X_k$, and $u$ have finite fourth moments: $E(X_{1i}^4) < \infty, \ldots, E(X_{ki}^4) < \infty$, $E(u_i^4) < \infty$.
4. There is no perfect multicollinearity.
Assumption #1: the conditional mean of u given the included X's is zero.
This has the same interpretation as in regression with a single regressor.
If an omitted variable (1) belongs in the equation (so is in u) and (2) is correlated with an included X, then this condition fails.
Failure of this condition leads to omitted variable bias.
The solution, if possible, is to include the omitted variable in the regression.
Assumption #2: $(X_{1i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$, are i.i.d.
This is satisfied automatically if the data are collected by simple random sampling.
Assumption #3: finite fourth moments
This technical assumption is satisfied automatically by variables with a bounded domain (test scores, PctEL, etc.)
Assumption #4: There is no perfect multicollinearity
Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors.
Example: Suppose you accidentally include STR twice:
regress testscr str str, robust
Regression with robust standard errors                 Number of obs =     420
                                                       F(  1,   418) =   19.26
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0512
                                                       Root MSE      =  18.581
-------------------------------------------------------------------------
        |             Robust
testscr |     Coef.  Std. Err.      t    P>|t|    [95% Conf. Interval]
--------+----------------------------------------------------------------
    str | -2.279808  .5194892    -4.39   0.000   -3.300945  -1.258671
    str | (dropped)
  _cons |   698.933  10.36436    67.44   0.000    678.5602   719.3057
-------------------------------------------------------------------------
Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors.
In the previous regression, $\beta_1$ is the effect on TestScore of a unit change in STR, holding STR constant (???)
Second example: regress TestScore on a constant, D, and B, where: $D_i = 1$ if STR ≤ 20, = 0 otherwise; $B_i = 1$ if STR > 20, = 0 otherwise, so $B_i = 1 - D_i$ and there is perfect multicollinearity.
Would there be perfect multicollinearity if the intercept (constant) were somehow dropped (that is, omitted or suppressed) in the regression?
Perfect multicollinearity usually reflects a mistake in the definitions of the regressors, or an oddity in the data.
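One way to see perfect multicollinearity numerically is that $X'X$ becomes singular, so the normal equations have no unique solution. The sketch below uses made-up STR values and hypothetical helper names (`gram`, `det`); it builds the dummies D and B from the second example and compares the determinant of $X'X$ with and without the constant.

```python
str_values = [17, 19, 20, 21, 23, 25]            # hypothetical student-teacher ratios
D = [1 if s <= 20 else 0 for s in str_values]    # D_i = 1 if STR <= 20
B = [1 if s > 20 else 0 for s in str_values]     # B_i = 1 if STR > 20
const = [1] * len(str_values)


def gram(columns):
    """Return X'X for the given regressor columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


det_with_const = det(gram([const, D, B]))   # constant = D + B, so X'X is singular
det_no_const = det(gram([D, B]))            # without the constant, X'X is invertible

print(det_with_const, det_no_const)
```

With these toy values the first determinant is exactly zero and the second is not, which suggests the answer to the question above: dropping the intercept removes the exact linear dependence among the regressors.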
The Sampling Distribution of the OLS Estimator
(SW Section 5.5)
Under the four Least Squares Assumptions,
The exact (finite sample) distribution of $\hat{\beta}_1$ has mean $\beta_1$, and $\mathrm{var}(\hat{\beta}_1)$ is inversely proportional to $n$; so too for $\hat{\beta}_2$.
Other than its mean and variance, the exact distribution of $\hat{\beta}_1$ is very complicated.
$\hat{\beta}_1$ is consistent: $\hat{\beta}_1 \xrightarrow{p} \beta_1$ (law of large numbers)
$\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{\mathrm{var}(\hat{\beta}_1)}}$ is approximately distributed $N(0,1)$ (CLT)
So too for $\hat{\beta}_2, \ldots, \hat{\beta}_k$.
Hypothesis Tests and Confidence Intervals for a Single Coefficient in Multiple Regression
(SW Section 5.6)
$\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{\mathrm{var}(\hat{\beta}_1)}}$ is approximately distributed $N(0,1)$ (CLT).
Thus hypotheses on $\beta_1$ can be tested using the usual t-statistic, and confidence intervals are constructed as $\{\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1)\}$.
So too for $\beta_2, \ldots, \beta_k$.
$\hat{\beta}_1$ and $\hat{\beta}_2$ are generally not independently distributed, so neither are their t-statistics (more on this later).
Example: The California class size data
(1) $\widehat{TestScore} = 698.9 - 2.28 \times STR$
Standard errors: (10.4) (0.52)
(2) $\widehat{TestScore} = 686.0 - 1.10 \times STR - 0.650 \times PctEL$
Standard errors: (8.7) (0.43) (0.031)
The coefficient on STR in (2) is the effect on TestScore of a unit change in STR, holding constant the percentage of English Learners in the district.
Coefficient on STR falls by one-half.
95% confidence interval for coefficient on STR in (2) is $\{-1.10 \pm 1.96 \times 0.43\} = (-1.95, -0.26)$
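The t-statistic and confidence interval arithmetic is short enough to write out. The sketch below uses the unrounded STR coefficient and robust standard error from the STATA output earlier in these notes; the function name `t_stat_and_ci` is hypothetical. Because the inputs are unrounded, the endpoints can differ in the last digit from the interval quoted above.

```python
def t_stat_and_ci(coef, se, null_value=0.0, crit=1.96):
    """Return the t-statistic for H0: beta = null_value and the 95% confidence interval."""
    t = (coef - null_value) / se
    return t, (coef - crit * se, coef + crit * se)


# Coefficient on str and its robust standard error in regression (2)
t, (lo, hi) = t_stat_and_ci(-1.101296, 0.4328472)

print(f"t = {t:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
print("reject H0: beta = 0 at the 5% level" if abs(t) > 1.96 else "do not reject")
```

The interval excludes zero, which is the same conclusion as |t| > 1.96.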
Tests of Joint Hypotheses
(SW Section 5.7)
Let Expn = expenditures per pupil and consider the population regression model:
$$TestScore_i = \beta_0 + \beta_1 STR_i + \beta_2 Expn_i + \beta_3 PctEL_i + u_i$$
The null hypothesis that "school resources don't matter," and the alternative that they do, corresponds to:
$H_0$: $\beta_1 = 0$ and $\beta_2 = 0$
vs. $H_1$: either $\beta_1 \neq 0$ or $\beta_2 \neq 0$ or both
$$TestScore_i = \beta_0 + \beta_1 STR_i + \beta_2 Expn_i + \beta_3 PctEL_i + u_i$$
$H_0$: $\beta_1 = 0$ and $\beta_2 = 0$
vs. $H_1$: either $\beta_1 \neq 0$ or $\beta_2 \neq 0$ or both
A joint hypothesis specifies a value for two or more coefficients, that is, it imposes a restriction on two or more coefficients.
A "common sense" test is to reject if either of the individual t-statistics exceeds 1.96 in absolute value.
But this "common sense" approach doesn't work! The resulting test doesn't have the right significance level!
Here's why: Calculation of the probability of incorrectly rejecting the null using the "common sense" test based on the two individual t-statistics. To simplify the calculation, suppose that $\hat{\beta}_1$ and $\hat{\beta}_2$ are independently distributed. Let $t_1$ and $t_2$ be the t-statistics:
$$t_1 = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} \quad \text{and} \quad t_2 = \frac{\hat{\beta}_2 - 0}{SE(\hat{\beta}_2)}$$
The "common sense" test is:
reject $H_0$: $\beta_1 = \beta_2 = 0$ if $|t_1| > 1.96$ and/or $|t_2| > 1.96$
What is the probability that this "common sense" test rejects $H_0$, when $H_0$ is actually true? (It should be 5%.)
Probability of incorrectly rejecting the null
$= \Pr_{H_0}[|t_1| > 1.96 \text{ and/or } |t_2| > 1.96]$
$= \Pr_{H_0}[|t_1| > 1.96, |t_2| > 1.96] + \Pr_{H_0}[|t_1| > 1.96, |t_2| \le 1.96] + \Pr_{H_0}[|t_1| \le 1.96, |t_2| > 1.96]$ (disjoint events)
$= \Pr_{H_0}[|t_1| > 1.96] \times \Pr_{H_0}[|t_2| > 1.96] + \Pr_{H_0}[|t_1| > 1.96] \times \Pr_{H_0}[|t_2| \le 1.96] + \Pr_{H_0}[|t_1| \le 1.96] \times \Pr_{H_0}[|t_2| > 1.96]$ ($t_1$, $t_2$ are independent by assumption)
$= .05 \times .05 + .05 \times .95 + .95 \times .05$
$= .0975 = 9.75\%$, which is not the desired 5%!!
The size of a test is the actual rejection rate under the null hypothesis.
The size of the "common sense" test isn't 5%!
Its size actually depends on the correlation between $t_1$ and $t_2$ (and thus on the correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$).
Two Solutions:
Use a different critical value in this procedure, not 1.96 (this is the "Bonferroni method"; see App. 5.3)
Use a different test statistic that tests both $\beta_1$ and $\beta_2$ at once: the F-statistic.
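Both the size calculation and the Bonferroni fix can be reproduced with the standard library. The sketch below keeps the simplifying assumption used above, that the t-statistics are independent standard normals under the null; the function name `size_of_either_rejects` is hypothetical. The Bonferroni critical value here is the usual one that runs each individual test at level 0.05/q.

```python
from statistics import NormalDist


def size_of_either_rejects(crit, q=2):
    """Size of 'reject if any |t_j| > crit' when the q t-statistics are
    independent standard normals under the null."""
    p_single = 2 * (1 - NormalDist().cdf(crit))   # Pr(|t| > crit)
    return 1 - (1 - p_single) ** q


naive_size = size_of_either_rejects(1.96)          # about 0.0975, not 0.05

# Bonferroni: run each of the q individual tests at level 0.05 / q
q = 2
bonferroni_crit = NormalDist().inv_cdf(1 - 0.05 / (2 * q))
bonferroni_size = size_of_either_rejects(bonferroni_crit, q)

print(f"naive size: {naive_size:.4f}")
print(f"Bonferroni critical value: {bonferroni_crit:.3f}, size: {bonferroni_size:.4f}")
```

Under independence the Bonferroni test has size slightly below 5%; with correlated t-statistics its size is still no larger than 5%, which is why it is a conservative alternative to the F-statistic.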
The F-statistic
The F-statistic tests all parts of a joint hypothesis at once.
Unpleasant formula for the special case of the joint hypothesis $\beta_1 = \beta_{1,0}$ and $\beta_2 = \beta_{2,0}$ in a regression with two regressors:
$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right),$$
where $\hat{\rho}_{t_1,t_2}$ estimates the correlation between $t_1$ and $t_2$.
Reject when F is "large"
The F-statistic testing $\beta_1$ and $\beta_2$ (special case):
$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right)$$
The F-statistic is large when $t_1$ and/or $t_2$ is large.
The F-statistic corrects (in just the right way) for the correlation between $t_1$ and $t_2$.
The formula for more than two $\beta$'s is really nasty unless you use matrix algebra.
This gives the F-statistic a nice large-sample approximate distribution, which is…
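The two-restriction formula is easy to evaluate directly. The function name and the input values below are hypothetical; the sketch only illustrates how the correlation between the t-statistics changes F for the same pair of t-values.

```python
def f_stat_two_restrictions(t1, t2, rho):
    """F-statistic for two restrictions, given the two t-statistics and the
    estimated correlation rho between them."""
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)


f_uncorrelated = f_stat_two_restrictions(2.0, 2.0, rho=0.0)   # average of the squared t's
f_correlated = f_stat_two_restrictions(2.0, 2.0, rho=0.5)     # same t's, positively correlated

print(f_uncorrelated, round(f_correlated, 3))
```

With positively correlated t-statistics, two t-values of the same sign are less surprising under the null, so F is smaller than in the uncorrelated case.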
Large-sample distribution of the F-statistic
Consider the special case that $t_1$ and $t_2$ are independent, so $\hat{\rho}_{t_1,t_2} \xrightarrow{p} 0$; in large samples the formula becomes
$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right) \cong \frac{1}{2}(t_1^2 + t_2^2)$$
Under the null, $t_1$ and $t_2$ have standard normal distributions that, in this special case, are independent.
The large-sample distribution of the F-statistic is the distribution of the average of two independently distributed squared standard normal random variables.
The chi-squared distribution with $q$ degrees of freedom ($\chi^2_q$) is defined to be the distribution of the sum of $q$ independent squared standard normal random variables.
In large samples, F is distributed as $\chi^2_q / q$.
Selected large-sample critical values of $\chi^2_q / q$
q    5% critical value
1    3.84 (why?)
2    3.00 (the case q=2 above)
3    2.60
4    2.37
5    2.21
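The first two entries of the table can be derived by hand, which also answers the "(why?)" for q = 1. The sketch below relies on two standard facts: a $\chi^2_1$ variable is a squared standard normal, and a $\chi^2_2$ variable has survival function $e^{-x/2}$. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# q = 1: chi-squared(1) is a squared standard normal, so the 5% critical
# value is the square of the two-sided 5% normal critical value (1.96).
crit_q1 = NormalDist().inv_cdf(0.975) ** 2

# q = 2: chi-squared(2) has survival function exp(-x / 2). Solving
# exp(-x / 2) = 0.05 gives x = -2 ln(0.05); dividing by q = 2 gives -ln(0.05).
crit_q2 = -2 * math.log(0.05) / 2

print(round(crit_q1, 2), round(crit_q2, 2))
```

For q = 1 the F-statistic is just the squared t-statistic, so rejecting when F > 3.84 is the same as rejecting when |t| > 1.96.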
p-value using the F-statistic:
p-value = tail probability of the $\chi^2_q / q$ distribution beyond the F-statistic actually computed.
Implementation in STATA
Use the "test" command after the regression
Example: Test the joint hypothesis that the population coefficients on STR and expenditures per pupil (expn_stu) are both zero, against the alternative that at least one of the population coefficients is nonzero.
F-test example, California class size data:
reg testscr str expn_stu pctel, r;
Regression with robust standard errors                 Number of obs =     420
                                                       F(  3,   416) =  147.20
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4366
                                                       Root MSE      =  14.353
------------------------------------------------------------------------------
         |               Robust
 testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
     str |  -.2863992   .4820728    -0.59   0.553    -1.234001     .661203
expn_stu |   .0038679   .0015807     2.45   0.015     .0007607    .0069751
   pctel |  -.6560227   .0317844   -20.64   0.000    -.7185008   -.5935446
   _cons |   649.5779   15.45834    42.02   0.000     619.1917    679.9641
------------------------------------------------------------------------------
NOTE
test str expn_stu;                   The test command follows the regression
 ( 1)  str = 0.0                     There are q=2 restrictions being tested
 ( 2)  expn_stu = 0.0
       F(  2,   416) =    5.43       The 5% critical value for q=2 is 3.00
            Prob > F =    0.0047     Stata computes the p-value for you
Two (related) loose ends:
1. Homoskedasticity-only versions of the F-statistic
2. The "F" distribution
The homoskedasticity-only ("rule-of-thumb") F-statistic
To compute the homoskedasticity-only F-statistic:
Use the previous formulas, but using homoskedasticity-only standard errors; or
Run two regressions, one under the null hypothesis (the "restricted" regression) and one under the alternative hypothesis (the "unrestricted" regression).
The second method gives a simple formula.
The "restricted" and "unrestricted" regressions
Example: are the coefficients on STR and Expn zero?
Restricted population regression (that is, under $H_0$):
$$TestScore_i = \beta_0 + \beta_3 PctEL_i + u_i \quad \text{(why?)}$$
Unrestricted population regression (under $H_1$):
$$TestScore_i = \beta_0 + \beta_1 STR_i + \beta_2 Expn_i + \beta_3 PctEL_i + u_i$$
The number of restrictions under $H_0$ = $q$ = 2.
The fit will be better ($R^2$ will be higher) in the unrestricted regression (why?)
By how much must the $R^2$ increase for the coefficients on STR and Expn to be judged statistically significant?
Simple formula for the homoskedasticity-only F-statistic:
$$F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$$
where:
$R^2_{restricted}$ = the $R^2$ for the restricted regression
$R^2_{unrestricted}$ = the $R^2$ for the unrestricted regression
$q$ = the number of restrictions under the null
$k_{unrestricted}$ = the number of regressors in the unrestricted regression.
Example:
Restricted regression:
$\widehat{TestScore} = 644.7 - 0.671 \times PctEL$, $R^2_{restricted} = 0.4149$
Standard errors: (1.0) (0.032)
Unrestricted regression:
$\widehat{TestScore} = 649.6 - 0.29 \times STR + 3.87 \times Expn - 0.656 \times PctEL$
Standard errors: (15.5) (0.48) (1.59) (0.032)
$R^2_{unrestricted} = 0.4366$, $k_{unrestricted} = 3$, $q = 2$
so:
$$F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)} = \frac{(.4366 - .4149)/2}{(1 - .4366)/(420 - 3 - 1)} = 8.01$$
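The arithmetic in this example can be reproduced in a few lines. The function name `homoskedastic_f` is hypothetical; the inputs are the $R^2$ values, sample size, and regressor count quoted in the example.

```python
def homoskedastic_f(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic computed from the two R-squared values."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


f = homoskedastic_f(0.4366, 0.4149, q=2, n=420, k_unrestricted=3)
print(round(f, 2))  # 8.01
```

This exceeds the 5% critical value of 3.00 for q = 2. It also differs from the heteroskedasticity-robust F of 5.43 reported by the STATA "test" command for the same hypothesis, a reminder that the two statistics are not interchangeable.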
The homoskedasticity-only F-statistic
$$F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$$
The homoskedasticity-only F-statistic rejects when adding the two variables increased the $R^2$ by "enough", that is, when adding the two variables improves the fit of the regression by "enough".
If the errors are homoskedastic, then the homoskedasticity-only F-statistic has a large-sample distribution that is $\chi^2_q / q$.
But if the errors are heteroskedastic, the large-sample distribution is a mess and is not $\chi^2_q / q$.
The F distribution
If:
1. $u_1, \ldots, u_n$ are normally distributed; and
2. $X_i$ is distributed independently of $u_i$ (so in particular $u_i$ is homoskedastic)
then the homoskedasticity-only F-statistic has the "$F_{q, n-k-1}$" distribution, where $q$ = the number of restrictions and $k$ = the number of regressors under the alternative (the unrestricted model).
The $F_{q, n-k-1}$ distribution:
The F distribution is tabulated in many places.
When $n$ gets large the $F_{q, n-k-1}$ distribution asymptotes to the $\chi^2_q / q$ distribution:
$F_{q, \infty}$ is another name for $\chi^2_q / q$.
For $q$ not too big and $n \ge 100$, the $F_{q, n-k-1}$ distribution and the $\chi^2_q / q$ distribution are essentially identical.
Many regression packages compute p-values of F-statistics using the F distribution (which is OK if the sample size is $\ge 100$).
You will encounter the "F-distribution" in published empirical work.
Digression: A little history of statistics…
The theory of the homoskedasticity-only F-statistic and the $F_{q, n-k-1}$ distributions rests on implausibly strong assumptions (are earnings normally distributed?)
These statistics date to the early 20th century, when "computer" was a job description and observations numbered in the dozens.
The F-statistic and $F_{q, n-k-1}$ distribution were major breakthroughs: an easily computed formula; a single set of tables that could be published once, then applied in many settings; and a precise, mathematically elegant justification.
A little history of statistics, ctd…
The strong assumptions seemed a minor price for this breakthrough.
But with modern computers and large samples we can use the heteroskedasticity-robust F-statistic and the $F_{q, \infty}$ distribution, which only require the four least squares assumptions.
This historical legacy persists in modern software, in which homoskedasticity-only standard errors (and F-statistics) are the default, and in which p-values are computed using the $F_{q, n-k-1}$ distribution.
Summary: the homoskedasticity-only ("rule of thumb") F-statistic and the F distribution
These are justified only under very strong conditions, stronger than are realistic in practice.
Yet, they are widely used.
You should use the heteroskedasticity-robust F-statistic, with $\chi^2_q / q$ (that is, $F_{q, \infty}$) critical values.
For $n \ge 100$, the F-distribution essentially is the $\chi^2_q / q$ distribution.
For small $n$, the F distribution isn't necessarily a "better" approximation to the sampling distribution of the F-statistic; it is better only if the strong conditions are true.
Summary: testing joint hypotheses
The "common-sense" approach of rejecting if either of the t-statistics exceeds 1.96 rejects more than 5% of the time under the null (the size exceeds the desired significance level).
The heteroskedasticity-robust F-statistic is built in to STATA ("test" command); this tests all $q$ restrictions at once.
For $n$ large, F is distributed as $\chi^2_q / q$ (= $F_{q, \infty}$).
The homoskedasticity-only F-statistic is important historically (and thus in practice), and is intuitively appealing, but invalid when there is heteroskedasticity.
Testing Single Restrictions on Multiple Coefficients
(SW Section 5.8)
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1, \ldots, n$$
Consider the null and alternative hypothesis,
$H_0$: $\beta_1 = \beta_2$ vs. $H_1$: $\beta_1 \neq \beta_2$
This null imposes a single restriction ($q = 1$) on multiple coefficients; it is not a joint hypothesis with multiple restrictions (compare with $\beta_1 = 0$ and $\beta_2 = 0$).
Two methods for testing single restrictions on multiple coefficients:
1. Rearrange ("transform") the regression
Rearrange the regressors so that the restriction becomes a restriction on a single coefficient in an equivalent regression.
2. Perform the test directly
Some software, including STATA, lets you test restrictions using multiple coefficients directly.
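Method 1: Rearrange ("transform") the regression
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
$H_0$: $\beta_1 = \beta_2$ vs. $H_1$: $\beta_1 \neq \beta_2$
Add and subtract $\beta_2 X_{1i}$:
$$Y_i = \beta_0 + (\beta_1 - \beta_2) X_{1i} + \beta_2 (X_{1i} + X_{2i}) + u_i$$
or
$$Y_i = \beta_0 + \gamma_1 X_{1i} + \beta_2 W_i + u_i$$
where
$\gamma_1 = \beta_1 - \beta_2$
$W_i = X_{1i} + X_{2i}$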
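(a) Original system:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
$H_0$: $\beta_1 = \beta_2$ vs. $H_1$: $\beta_1 \neq \beta_2$
(b) Rearranged ("transformed") system:
$$Y_i = \beta_0 + \gamma_1 X_{1i} + \beta_2 W_i + u_i$$
where $\gamma_1 = \beta_1 - \beta_2$ and $W_i = X_{1i} + X_{2i}$
so
$H_0$: $\gamma_1 = 0$ vs. $H_1$: $\gamma_1 \neq 0$
The testing problem is now a simple one:
test whether $\gamma_1 = 0$ in specification (b).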
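The equivalence of specifications (a) and (b) also holds for the OLS estimates, not just the population coefficients. The sketch below fits both regressions on a small made-up data set and compares the coefficients; the helper `ols` (normal equations solved by elimination) and all data values are hypothetical. Standard errors are omitted, so this shows only the point-estimate identity, not the test itself.

```python
def ols(y, columns):
    """OLS coefficients [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y], reduced by Gauss-Jordan elimination
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


# Made-up data
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.5]
y = [3.1, 4.9, 8.2, 8.8, 13.1, 12.4]

# (a) original regression of Y on X1 and X2
b0, b1, b2 = ols(y, [x1, x2])

# (b) transformed regression of Y on X1 and W = X1 + X2
w = [a + b for a, b in zip(x1, x2)]
g0, gamma1, g2 = ols(y, [x1, w])

print(round(gamma1, 6), round(b1 - b2, 6))
```

The coefficient on $X_1$ in (b) equals $\hat{\beta}_1 - \hat{\beta}_2$ from (a), so the usual t-statistic on that single coefficient tests $\beta_1 = \beta_2$.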
Method 2: Perform the test directly
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
$H_0$: $\beta_1 = \beta_2$ vs. $H_1$: $\beta_1 \neq \beta_2$
Example:
$$TestScore_i = \beta_0 + \beta_1 STR_i + \beta_2 Expn_i + \beta_3 PctEL_i + u_i$$
To test, using STATA, whether $\beta_1 = \beta_2$:
regress testscore str expn pctel, r
test str=expn
Confidence Sets for Multiple Coefficients
(SW Section 5.9)
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n$$
What is a joint confidence set for $\beta_1$ and $\beta_2$?
A 95% confidence set is:
A set-valued function of the data that contains the true parameter(s) in 95% of hypothetical repeated samples.
The set of parameter values that cannot be rejected at the 5% significance level when taken as the null hypothesis.
The coverage rate of a confidence set is the probability that the confidence set contains the true parameter values.
A "common sense" confidence set combines the 95% confidence intervals for $\beta_1$ and $\beta_2$, that is, the rectangle:
$$\{\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1),\ \hat{\beta}_2 \pm 1.96 \times SE(\hat{\beta}_2)\}$$
What is the coverage rate of this confidence set?
Does its coverage rate equal the desired confidence level of 95%?
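Coverage rate of "common sense" confidence set:
$\Pr[(\beta_1, \beta_2) \in \{\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1),\ \hat{\beta}_2 \pm 1.96 \times SE(\hat{\beta}_2)\}]$
$= \Pr[\hat{\beta}_1 - 1.96 SE(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + 1.96 SE(\hat{\beta}_1),\ \hat{\beta}_2 - 1.96 SE(\hat{\beta}_2) \le \beta_2 \le \hat{\beta}_2 + 1.96 SE(\hat{\beta}_2)]$
$= \Pr\left[-1.96 \le \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \le 1.96,\ -1.96 \le \frac{\hat{\beta}_2 - \beta_2}{SE(\hat{\beta}_2)} \le 1.96\right]$
$= \Pr[|t_1| \le 1.96 \text{ and } |t_2| \le 1.96]$
$= 1 - \Pr[|t_1| > 1.96 \text{ and/or } |t_2| > 1.96] \neq 95\%$!
Why?
This confidence set "inverts" a test for which the size doesn't equal the significance level!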
Recall: the probability of incorrectly rejecting the null
$= \Pr_{H_0}[|t_1| > 1.96 \text{ and/or } |t_2| > 1.96]$
$= \Pr_{H_0}[|t_1| > 1.96, |t_2| > 1.96] + \Pr_{H_0}[|t_1| > 1.96, |t_2| \le 1.96] + \Pr_{H_0}[|t_1| \le 1.96, |t_2| > 1.96]$ (disjoint events)
$= \Pr_{H_0}[|t_1| > 1.96] \times \Pr_{H_0}[|t_2| > 1.96] + \Pr_{H_0}[|t_1| > 1.96] \times \Pr_{H_0}[|t_2| \le 1.96] + \Pr_{H_0}[|t_1| \le 1.96] \times \Pr_{H_0}[|t_2| > 1.96]$ (if $t_1$, $t_2$ are independent)
$= .05 \times .05 + .05 \times .95 + .95 \times .05$
$= .0975 = 9.75\%$, which is not the desired 5%!!
Instead, use the acceptance region of a test that has size equal to its significance level ("invert" a valid test):
Let $F(\beta_{1,0}, \beta_{2,0})$ be the (heteroskedasticity-robust) F-statistic testing the hypothesis that $\beta_1 = \beta_{1,0}$ and $\beta_2 = \beta_{2,0}$:
95% confidence set = $\{\beta_{1,0}, \beta_{2,0} : F(\beta_{1,0}, \beta_{2,0}) \le 3.00\}$
3.00 is the 5% critical value of the $F_{2, \infty}$ distribution.
This set has coverage rate 95% because the test on which it is based (the test it "inverts") has size of 5%.
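The confidence set based on the F-statistic is an ellipse
$$\left\{\beta_1, \beta_2 : F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right) \le 3.00\right\}$$
Now
$$F = \frac{1}{2(1 - \hat{\rho}_{t_1,t_2}^2)}\left[t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2\right]$$
$$= \frac{1}{2(1 - \hat{\rho}_{t_1,t_2}^2)}\left[\left(\frac{\hat{\beta}_2 - \beta_{2,0}}{SE(\hat{\beta}_2)}\right)^2 + \left(\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}\right)^2 - 2\hat{\rho}_{t_1,t_2}\left(\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}\right)\left(\frac{\hat{\beta}_2 - \beta_{2,0}}{SE(\hat{\beta}_2)}\right)\right]$$
This is a quadratic form in $\beta_{1,0}$ and $\beta_{2,0}$; thus the boundary of the set $F = 3.00$ is an ellipse.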
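Membership in the two confidence sets can be compared point by point. The sketch below is illustrative only: the estimates, standard errors, and correlation in `est` are made-up values (zero estimates, unit standard errors, no correlation), and the function names are hypothetical. It checks one candidate point that lies inside the "common sense" rectangle but outside the ellipse, and one that does the opposite.

```python
def f_stat(b1_0, b2_0, b1_hat, se1, b2_hat, se2, rho):
    """F-statistic for H0: beta1 = b1_0 and beta2 = b2_0 (two-restriction formula)."""
    t1 = (b1_hat - b1_0) / se1
    t2 = (b2_hat - b2_0) / se2
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)


def in_ellipse(point, est, crit=3.00):
    """True if the candidate point is not rejected by the F-test at the 5% level."""
    return f_stat(*point, *est) <= crit


def in_rectangle(point, est, crit=1.96):
    """True if the candidate point lies in both individual 95% confidence intervals."""
    b1_0, b2_0 = point
    b1_hat, se1, b2_hat, se2, _ = est
    return abs((b1_hat - b1_0) / se1) <= crit and abs((b2_hat - b2_0) / se2) <= crit


est = (0.0, 1.0, 0.0, 1.0, 0.0)   # hypothetical: b1_hat, se1, b2_hat, se2, rho
corner = (1.9, 1.9)               # near a corner of the rectangle
edge = (2.2, 0.0)                 # just beyond one side of the rectangle

print(in_rectangle(corner, est), in_ellipse(corner, est))
print(in_rectangle(edge, est), in_ellipse(edge, est))
```

Neither set contains the other, which is why the rectangle's coverage rate cannot simply be read off from the individual intervals.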
[Figure: confidence set based on inverting the F-statistic]
The $R^2$, SER, and $\bar{R}^2$ for Multiple Regression
(SW Section 5.10)
Actual = predicted + residual: $Y_i = \hat{Y}_i + \hat{u}_i$
As in regression with a single regressor, the SER (and the RMSE) is a measure of the spread of the Y's around the regression line:
$$SER = \sqrt{\frac{1}{n - k - 1}\sum_{i=1}^{n} \hat{u}_i^2}$$
The $R^2$ is the fraction of the variance explained:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS},$$
where $ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{\hat{Y}})^2$, $SSR = \sum_{i=1}^{n}\hat{u}_i^2$, and $TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$, just as for regression with one regressor.
The $R^2$ never decreases (and almost always increases) when you add another regressor, a bit of a problem for a measure of "fit".
The $\bar{R}^2$ corrects this problem by "penalizing" you for including another regressor:
$$\bar{R}^2 = 1 - \left(\frac{n - 1}{n - k - 1}\right)\frac{SSR}{TSS},$$
so $\bar{R}^2 < R^2$
How to interpret the $R^2$ and $\bar{R}^2$?
A high $R^2$ (or $\bar{R}^2$) means that the regressors explain the variation in Y.
A high $R^2$ (or $\bar{R}^2$) does not mean that you have eliminated omitted variable bias.
A high $R^2$ (or $\bar{R}^2$) does not mean that you have an unbiased estimator of a causal effect ($\beta_1$).
A high $R^2$ (or $\bar{R}^2$) does not mean that the included variables are statistically significant; this must be determined using hypothesis tests.
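The three measures of fit follow directly from the residuals. The sketch below computes them from actual and fitted values; the function name `fit_measures` and the five data points are hypothetical (the fitted values come from a one-regressor OLS line with slope 0.98 and intercept 0.06 through the toy data).

```python
import math


def fit_measures(y, y_hat, k):
    """Return (SER, R-squared, adjusted R-squared) for a regression with k regressors."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    ser = math.sqrt(ssr / (n - k - 1))
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
    return ser, r2, adj_r2


# Toy data: Y observed at X = 1..5, and the OLS fitted values 0.06 + 0.98 * X
y = [1.1, 1.9, 3.0, 4.1, 4.9]
y_hat = [1.04, 2.02, 3.00, 3.98, 4.96]

ser, r2, adj_r2 = fit_measures(y, y_hat, k=1)
print(f"SER = {ser:.4f}, R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```

The adjusted $R^2$ is always below the $R^2$ because the factor $(n-1)/(n-k-1)$ exceeds one whenever there is at least one regressor.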
Example: A Closer Look at the Test Score Data
(SW Section 5.11, 5.12)
A general approach to variable selection and model specification:
Specify a "base" or "benchmark" model.
Specify a range of plausible alternative models, which include additional candidate variables.
Does a candidate variable change the coefficient of interest ($\beta_1$)?
Is a candidate variable statistically significant?
Use judgment, not a mechanical recipe…
Variables we would like to see in the California data set:
School characteristics:
student-teacher ratio
teacher quality
computers (non-teaching resources) per student
measures of curriculum design…
Student characteristics:
English proficiency
availability of extracurricular enrichment
home learning environment
parent's education level…
Variables actually in the California class size data set:
student-teacher ratio (STR)
percent English learners in the district (PctEL)
percent eligible for subsidized/free lunch
percent on public income assistance
average district income
[Figure: a look at more of the California data]
Digression: presentation of regression results in a table
Listing regressions in "equation" form can be cumbersome with many regressors and many regressions.
Tables of regression results can present the key information compactly.
Information to include:
variables in the regression (dependent and independent)
estimated coefficients
standard errors
results of F-tests of pertinent joint hypotheses
some measure of fit
number of observations
Summary: Multiple Regression
Multiple regression allows you to estimate the effect on Y of a change in $X_1$, holding $X_2$ constant.
If you can measure a variable, you can avoid omitted variable bias from that variable by including it.
There is no simple recipe for deciding which variables belong in a regression; you must exercise judgment.
One approach is to specify a base model, relying on a-priori reasoning, then explore the sensitivity of the key estimate(s) in alternative specifications.
