
Sampling Error of the OLS Coefficient Estimates: Recap

Suppose the assumptions of the classical linear model hold, we fix X, and we know the true parameters of the model, β and σ². Recall that the true model says that ε_i is drawn, iid, from the same distribution for every observation. So the batch of ε_i's (or, if you prefer, the vector ε) will be different for every sample of fixed size.
Suppose we
- drew an ε at random (given σ²),
- then calculated y = Xβ + ε (so we have generated some fake data that we know follows the model's true data-generating process (DGP)),
- then estimated b by OLS on this fake data, i.e. on y and X.
Because ε (and therefore the y vector it gives rise to) is random, the estimated b will in general not equal β exactly. Instead, it will deviate from β according to the formula b − β = (X'X)⁻¹X'ε = Aε. (Notice: if by some bizarre chance we happened to draw a vector of zeros for ε, this means we'd get b exactly right.)
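The following is a minimal sketch of that thought experiment (not part of the original notes): fix X, draw one ε, build fake y from the DGP, and compare the OLS estimate b to the true β. The design matrix, β, σ, and sample size below are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # fixed design (assumed)
beta = np.array([1.0, 2.0, -0.5])                               # true coefficients (assumed)
sigma = 1.5                                                     # true error s.d. (assumed)

eps = rng.normal(0.0, sigma, size=n)        # one random draw of epsilon
y = X @ beta + eps                          # fake data from the true DGP

b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS: b = (X'X)^{-1} X'y

# Sampling error two ways: directly (b - beta), and via the formula A @ eps.
A = np.linalg.solve(X.T @ X, X.T)           # A = (X'X)^{-1} X'
print(b - beta)
print(A @ eps)                              # identical up to rounding
```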
Lecture 3: Finite-Sample Properties of the OLS Estimator
(Hayashi, pp. 27-31)
A. Finite-Sample Properties: Definition
B. The Properties (and the assumptions needed for each; see next page):
1. Unbiasedness: E(b|X) = β. (1-3)
2. Variance of b: Var(b|X) = σ²(X'X)⁻¹. (1-4)
3. Gauss-Markov Theorem: The OLS estimator is the BLUE. (1-4)
4. Orthogonality of b, e: Cov(b, e|X) = 0, where e is the OLS residual. (1-4)
5. s² ≡ e'e/(n−K) = SSR/(n−K) is unbiased, i.e. E(s²) = σ². (1-4)
Quick Recap of Assumptions
1. Linearity: y_i is a linear function of the x_i's, plus an error term ε_i, i.e.:
y_i = β₁x_i1 + β₂x_i2 + … + β_K x_iK + ε_i, or y_i = Σ_{j=1}^{K} β_j x_ij + ε_i.
2. Strict Exogeneity: The expected value of ε_i conditional on X equals zero for every i.
3. No Multicollinearity: The columns of X are linearly independent.
4. Homoskedasticity: The variance of ε_i equals a constant, σ², for every i; the covariance of the ε's across observations is zero.
A. Finite-Sample Properties

These are properties that are true for any given sample size n. For example, unbiasedness: this refers to what you would expect, in the limit, if you ran a regression of y on X over and over, each time using a sample of size n, and each time taking a new set of independent draws from the distribution of the error term, ε. If the mean of the estimated OLS coefficients converges to the true coefficients, β, then the OLS coefficients are said to be unbiased.
These properties help us interpret what we get when we estimate an OLS regression on real data, if, when we run that regression, we believe that the data were generated by one such draw from the distribution of ε.
B. The Properties
1. Unbiasedness of b: Under assumptions 1-3 (note we don't need homoskedasticity), E(b|X) = β.
What does it mean? First, fix a set of conditioning variables, X. Now consider a bunch of random draws of the true (n×1) vector of disturbances, ε. For each such draw, the OLS estimator produces a (K×1) vector of estimated coefficients, b. Property 1 says that the expected value of b equals its true value. It follows that, if we took R draws ("replications") of ε and let R approach infinity, the mean of the estimated b's across these replications would approach the true value, β.
Proof: First, rewrite the property as E(b − β | X) = 0.
We have already shown (last class) that the sampling error is b − β = (X'X)⁻¹X'ε = Aε.
(Note, interestingly and importantly, that this doesn't depend on what the true β is.)
So, E(b − β | X) = E(Aε | X) = A·E(ε | X).
But Assumption 2 (strict exogeneity) says E(ε | X) = 0. QED.
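A small Monte Carlo sketch of Property 1 (illustrative only, not part of the notes): hold X fixed, redraw ε many times, and check that the average of the estimated b's is close to the true β. All parameter values below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, R = 100, 3, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # fixed across replications
beta = np.array([1.0, 2.0, -0.5])                               # assumed true beta
sigma = 1.5                                                     # assumed true sigma

A = np.linalg.solve(X.T @ X, X.T)                               # A = (X'X)^{-1} X'

bs = np.empty((R, K))
for r in range(R):
    eps = rng.normal(0.0, sigma, size=n)                        # new draw of epsilon
    y = X @ beta + eps
    bs[r] = A @ y                                               # OLS estimate for this draw

print(bs.mean(axis=0))   # should be close to beta as R grows
print(beta)
```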
2. Variance of b: Under assumptions 1-4, Var(b|X) = σ²(X'X)⁻¹.
What does it mean? Again, fix X and consider a bunch of realizations of the true disturbance vector, ε. Each such realization produces a (K×1) vector of coefficients, b. If we were to do this over many realizations of ε, the (K×K) variance-covariance matrix among the elements of b would be given by the true σ² (a scalar), times (X'X)⁻¹.
Proof:
Var(b | X) = Var(b − β | X) (because β is not random)
= Var(Aε | X) (by the definition of A)
= A·Var(ε | X)·A' (because A is not random, given X)
= A·E(εε' | X)·A' (because E(ε | X) = 0, by strict exogeneity)
= A·(σ²·I_n)·A' (by independence and homoskedasticity)
= σ²·A·A'.
Substituting the definition of A = (X'X)⁻¹X' and simplifying completes the proof.
(Note for future reference that this implies A·A' = (X'X)⁻¹.)
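A quick numerical check of Property 2 (an illustrative sketch with made-up parameter values): the empirical covariance matrix of b across replications should approach σ²(X'X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, R = 100, 3, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])   # assumed true beta
sigma = 1.5                         # assumed true sigma

A = np.linalg.solve(X.T @ X, X.T)
bs = np.array([A @ (X @ beta + rng.normal(0.0, sigma, size=n)) for _ in range(R)])

empirical = np.cov(bs, rowvar=False)                 # K x K covariance across draws
theoretical = sigma**2 * np.linalg.inv(X.T @ X)      # sigma^2 (X'X)^{-1}
print(np.max(np.abs(empirical - theoretical)))       # small for large R
```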
3. Gauss-Markov Theorem: Under assumptions 1-4, the OLS estimator is efficient in the class of linear unbiased estimators, i.e. it is the BLUE. Formally, for any unbiased estimator β̂ that is linear in y, Var(β̂ | X) ≥ Var(b | X).
What does it mean?
a) "Linearity" in the above expression means "linear in y". The OLS estimator is linear in y because it can be computed as b = (X'X)⁻¹X'y = Ay.
b) Recall that the variance of the estimator refers to how the computed estimator (b or β̂) varies across independent draws of the true error vector, ε.
c) Note that the inequality defining "best" is a matrix inequality, comparing two K×K matrices. It means that the difference between these matrices, Var(β̂ | X) − Var(b | X), itself a K×K matrix, is positive semidefinite. Among other things, this means that the diagonal elements of this matrix, Var(β̂_k | X) − Var(b_k | X), are all nonnegative.
Proof:
Write the alternative estimator, β̂, as β̂ = Cy, where C can depend on X (but not y). (This establishes the constraint that β̂ must be linear.) Further, define C ≡ D + A, so that D is in some sense the "divergence" between the matrix we use to calculate β̂ and the matrix we use to calculate the OLS estimator, b. (Recall that A ≡ (X'X)⁻¹X', and b = Ay.) So,
β̂ = (D + A)y = Dy + Ay = D(Xβ + ε) + b = DXβ + Dε + b.   (1)
Now, to incorporate the constraint that β̂ be unbiased, consider its expectation:
E(β̂ | X) = DXβ + E(Dε | X) + E(b | X) = DXβ + β
(because E(Dε | X) = 0 (by strict exogeneity), and E(b | X) = β (by unbiasedness of OLS, which we have recently proved)).
Unbiasedness of β̂ therefore requires DXβ = 0. Indeed, since we need β̂ to be unbiased regardless of the true β, unbiasedness of β̂ thus requires DX = 0.
Incorporating the requirement for unbiasedness (DX = 0) into (1) now yields β̂ = Dε + b. Subtracting β from both sides yields an expression for the sampling error of this alternative, linear unbiased estimator β̂, in terms of the OLS sampling error (b − β):
β̂ − β = Dε + (b − β).
Using our expression for the OLS sampling error, b − β = (X'X)⁻¹X'ε = Aε, yields
β̂ − β = Dε + Aε = (D + A)ε.
So the sampling error of any linear, unbiased estimator is related to the OLS sampling error, Aε, in this simple way.
Our next step is to calculate the variance of β̂ above so we can compare it to the variance of the OLS estimator.
Var(β̂ | X) = Var(β̂ − β | X) = Var[(D + A)ε | X]
= (D + A)·Var(ε | X)·(D + A)'   (since D and A both depend only on X)
= σ²(D + A)(D + A)'   (since Var(ε | X) = σ²·I_n)
= σ²(DD' + DA' + AD' + AA').
Now, look at these expressions, term by term:
DA' = DX(X'X)⁻¹ = 0, since DX = 0 (by unbiasedness of β̂)
(note the above uses the symmetry of (X'X)⁻¹).
By the same token, AD' = 0,
and from our proof of Property 2 (Variance of b), AA' = (X'X)⁻¹.
So, Var(β̂ | X) = σ²DD' + σ²(X'X)⁻¹ = σ²DD' + Var(b | X) ≥ Var(b | X), where the inequality follows from the positive semidefiniteness of DD'. QED.
Comments
1. How surprising is this, really? The "loss function" minimized by OLS is a sum of squared error terms. The variance of b is also a sum of squares. So it's not totally surprising that minimizing a sum of squares (rather than absolute deviations, or some other function of the errors) minimizes the variance of b.
2. What are the alternative, unbiased linear estimators that OLS dominates? It's hard to think of promising examples. (WLS?) A simulated example of one such alternative follows below.
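An illustrative sketch (not from the notes) of the Gauss-Markov comparison: build an alternative linear unbiased estimator β̂ = Cy with C = A + D, where D = G(I − P) for an arbitrary K×n matrix G and P = X(X'X)⁻¹X' (so DX = 0 automatically). Its sampling variance should exceed that of OLS. The matrix G and all parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, R = 100, 3, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5

A = np.linalg.solve(X.T @ X, X.T)                    # OLS matrix (X'X)^{-1} X'
P = X @ A                                            # projection matrix X(X'X)^{-1}X'
G = 0.05 * rng.normal(size=(K, n))                   # arbitrary "divergence" generator (assumed)
D = G @ (np.eye(n) - P)                              # guarantees DX = 0
C = A + D                                            # alternative linear unbiased estimator

b_ols = np.empty((R, K))
b_alt = np.empty((R, K))
for r in range(R):
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    b_ols[r] = A @ y
    b_alt[r] = C @ y

# Both are (approximately) unbiased, but the alternative has larger variance,
# coefficient by coefficient, as the theorem predicts.
print(b_ols.mean(axis=0), b_alt.mean(axis=0))
print(b_ols.var(axis=0), b_alt.var(axis=0))
```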
4. Orthogonality of b, e: Under Assumptions 1-4, Cov(b, e | X) = 0, where e is the OLS residual.
What does this mean? First, fix X. Now consider a bunch of random draws of the true (n×1) vector of disturbances, ε. For each such draw, the OLS estimator produces a (K×1) vector of estimated coefficients, b. It also produces an (n×1) vector of estimated residuals, e. Now consider the (K×n) covariance matrix between these two vectors across realizations of ε (an element of this matrix is the Cov between b_k and e_i across draws of ε). All the elements of this matrix are zero, i.e. the constructed b's and e's are uncorrelated with each other.
Proof: see Hayashi.
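A short simulation sketch of Property 4 (illustrative values, not part of the notes): across draws of ε, the estimated coefficients b and the residuals e should be uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, R = 50, 3, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5

A = np.linalg.solve(X.T @ X, X.T)
bs = np.empty((R, K))
es = np.empty((R, n))
for r in range(R):
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    b = A @ y
    bs[r] = b
    es[r] = y - X @ b                      # OLS residuals for this draw

# Covariance between each b_k and each e_i across replications: all near zero.
cov_be = (bs - bs.mean(0)).T @ (es - es.mean(0)) / (R - 1)   # K x n matrix
print(np.abs(cov_be).max())
```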
5. Unbiasedness of s²: Under assumptions 1-4, s² ≡ e'e/(n−K) = SSR/(n−K) is also unbiased, i.e. E(s² | X) = σ².
Proof:
First, note that s² = e'e/(n−K) is unbiased iff E(e'e/(n−K) | X) = σ², or E(e'e | X) = σ²(n−K), or E(ε'Mε | X) = σ²(n−K), where M ≡ I_n − X(X'X)⁻¹X' is the n×n "annihilator" matrix that calculates e from ε.
We now show, in turn, that E(ε'Mε | X) = σ²·trace(M), and that trace(M) = n − K.
Part 1:
E(ε'Mε | X) = Σ_i Σ_j m_ij·E(ε_i ε_j | X) (this just writes out the quadratic form, and uses the fact that M depends only on X)
= σ² Σ_i m_ii (since all the cross terms equal zero, by no autocorrelation)
= σ²·trace(M).
Part 2:
trace(M) = trace[I_n − X(X'X)⁻¹X'] = trace(I_n) − trace[X(X'X)⁻¹X'] = n − trace[X(X'X)⁻¹X'].
And, trace[X(X'X)⁻¹X'] = trace[(X'X)⁻¹X'X] (because trace(AB) = trace(BA)) = trace(I_K) = K.
QED.
Corollary: the variance of the estimated residuals, e'e/n = SSR/n, i.e. the sample analog of σ², is a biased estimator of σ². It is consistent, however (since n/(n−K) approaches one in large samples).
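A sketch of Property 5 and the corollary (illustrative, with small n so the bias is visible): across replications, s² = e'e/(n−K) averages to roughly σ², while e'e/n is biased downward by the factor (n−K)/n. All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, R = 30, 3, 50000
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5

A = np.linalg.solve(X.T @ X, X.T)
s2 = np.empty(R)
sse_over_n = np.empty(R)
for r in range(R):
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    e = y - X @ (A @ y)
    s2[r] = e @ e / (n - K)           # unbiased estimator
    sse_over_n[r] = e @ e / n         # sample analog of sigma^2 (biased)

print(sigma**2)             # true variance: 2.25
print(s2.mean())            # close to 2.25 (unbiased)
print(sse_over_n.mean())    # close to 2.25 * (n-K)/n (biased downward)
```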
Intuition for the n−K "degrees of freedom correction": By assumption, the n errors (ε's) are each independently drawn from the same distribution, and generate n independent y's. From these y's the OLS procedure then calculates n residuals, e_i = y_i − x_i'b. But, by construction, these n generated residuals have to satisfy K equality constraints, given by the OLS normal equations. So, in some sense, they can't vary as much as the ε's. We adjust for this by inflating the variance of the e's by the factor n/(n−K), which exceeds one.
Additional intuition: Consider the extreme case where the number of observations equals the number of variables (n = K). In this case OLS yields a perfect fit. That does not, of course, mean that the true σ² is zero.
Final note: given Property 5 plus Property 2 (the formula for Var(b | X)), a natural estimator of Var(b | X) is Varhat(b | X) = s²(X'X)⁻¹.
We'll talk about its properties in the next lesson, on hypothesis testing.
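A closing sketch (illustrative only): on a single simulated sample, compute Varhat(b | X) = s²(X'X)⁻¹ and read off standard errors as the square roots of its diagonal. The data-generating values are made up.

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5

y = X @ beta + rng.normal(0.0, sigma, size=n)
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)                       # unbiased estimate of sigma^2

varhat_b = s2 * np.linalg.inv(X.T @ X)     # estimated Var(b|X)
std_errors = np.sqrt(np.diag(varhat_b))    # conventional OLS standard errors
print(b, std_errors)
```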
