Suppose the assumptions of the classical linear model hold, we fix X, and we know the true parameters of the model, β and σ².
Recall that the true model says that ε_i is drawn, iid, from the same distribution for every observation. So the batch of ε_i's (or, if you prefer, the vector ε) will be different for every sample of fixed size.
Suppose we
- drew an ε at random (given X)
- then calculated y = Xβ + ε (so we have generated some fake data that we know follows the model's true data-generating process (DGP))
- then estimated b by OLS on this fake data, i.e. on y and X.
Because ε (and therefore the y vector it gives rise to) is random, the estimated b will in general not equal β exactly. Instead, it will deviate from β according to the formula b − β = (X'X)^{-1}X'ε = Aε. (Notice: if by some bizarre chance we happened to draw a vector of zeros for ε, we would get b exactly right.)
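To make this thought experiment concrete, here is a minimal NumPy sketch (added for illustration; not part of the original notes). The particular X, β, σ, sample size, and the normality of ε are all arbitrary choices; the point is only that the realized sampling error b − β coincides with (X'X)^{-1}X'ε.

```python
# Illustrative sketch: one draw from a made-up DGP that satisfies the classical assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # fixed design matrix
beta = np.array([1.0, 2.0, -0.5])                               # "true" coefficients (assumed)
sigma = 1.5                                                     # "true" error s.d. (assumed)

eps = rng.normal(scale=sigma, size=n)   # one random draw of the disturbance vector
y = X @ beta + eps                      # fake data generated by the true DGP
b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate on the fake data

# The sampling error b - beta equals (X'X)^{-1} X' eps (up to rounding).
A = np.linalg.solve(X.T @ X, X.T)
print(b - beta)
print(A @ eps)
```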
Lecture 3: Finite-Sample Properties of the OLS Estimator
(Hayashi, pp. 27-31)
A. Finite-Sample Properties: Definition
B. The Properties (and the assumptions needed for each; see the recap below):
1. Unbiasedness: E(b|X) = β. (Assumptions 1-3)
2. Variance of b: Var(b|X) = σ²(X'X)^{-1}. (Assumptions 1-4)
3. Gauss-Markov Theorem: the OLS estimator is the BLUE. (Assumptions 1-4)
4. Orthogonality of b, e: Cov(b, e|X) = 0, where e is the OLS residual. (Assumptions 1-4)
5. s² = e'e/(n − K) is unbiased, i.e. E(s²) = σ². (Assumptions 1-4)
Quick Recap of Assumptions
1. Linearity: y_i is a linear function of the x_i's, plus an error term, ε_i, i.e.:
y_i = β_1·x_i1 + β_2·x_i2 + ... + β_K·x_iK + ε_i, or y_i = Σ_{k=1}^{K} β_k·x_ik + ε_i.
2. Strict Exogeneity: The expected value of ε_i given X equals zero for every i.
3. No Multicollinearity: The columns of X are linearly independent.
4. Homoskedasticity: The variance of ε_i given X equals a constant, σ², for every i; the covariance of the ε_i's across observations is zero.
A. Finite-Sample Properties
These are properties that are true for any given sample size n. For example, unbiasedness: this refers to what you would expect, in the limit, if you ran a regression of y on X over and over, each time using a sample of size n, and each time taking a new set of independent draws from the distribution of the error term, ε. If the mean of the estimated OLS coefficients converges to the true coefficients, β, then the OLS coefficients are said to be unbiased.
These properties help us interpret what we get when we estimate an OLS regression on real data, if (when we run that regression) we believe that the data were generated by one such draw from the distribution of ε.
B. The Properties
1. Unbiasedness of b: Under assumptions 1-3 (note we don't need homoskedasticity), E(b|X) = β.
What does it mean? First, consider a fixed set of conditioning variables, X. Now consider a bunch of random draws of the true (n×1) vector of disturbances, ε. For each such draw, the OLS estimator produces a (K×1) vector of estimated coefficients, b. Property 1 says that the expected value of b equals its true value. It follows that, if we took many draws ("replications") of ε and let the number of replications approach infinity, the mean of the estimated b's across these replications would approach the true value, β.
Proof: First, rewrite the property as E(b − β|X) = 0.
We have already shown (last class) that the sampling error is b − β = (X'X)^{-1}X'ε = Aε.
(Note, interestingly and importantly, that this doesn't depend on what the true β is.)
So, E(b − β|X) = E(Aε|X) = A·E(ε|X).
But Assumption 2 (strict exogeneity) says E(ε|X) = 0. QED.
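A minimal Monte Carlo sketch of this claim (illustrative only; the DGP below is made up): averaging b over many replications, with X held fixed, should recover β.

```python
# Illustrative sketch: mean of the OLS estimates across replications approaches beta.
import numpy as np

rng = np.random.default_rng(1)
n, K, sigma = 50, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # X held fixed across draws
beta = np.array([1.0, 2.0, -0.5])                               # assumed true coefficients

R = 20_000                                  # number of replications (arbitrary)
bs = np.empty((R, K))
for r in range(R):
    eps = rng.normal(scale=sigma, size=n)   # fresh draw of the disturbances
    y = X @ beta + eps
    bs[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(bs.mean(axis=0))   # should be close to beta = [1, 2, -0.5]
```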
2. Variance of b: Under assumptions 1-4, Var(b|X) = σ²(X'X)^{-1}.
What does it mean? Again, fix X and consider a bunch of realizations of the true disturbance vector, ε. Each such realization produces a (K×1) vector of coefficients, b. If we were to do this over many realizations of ε, the (K×K) variance-covariance matrix among the elements of b would be given by the true σ² (a scalar), times (X'X)^{-1}.
Proof:
Var(b|X) = Var(b − β|X)   (because β is not random)
= Var(Aε|X)   (since b − β = Aε, by the definition of A)
= A·Var(ε|X)·A'   (because A is not random, given X)
= A·E(εε'|X)·A'   (because E(ε|X) = 0; strict exogeneity)
= A·(σ²I_n)·A'.
Substituting the definition of A = (X'X)^{-1}X' and simplifying completes the proof.
(Note for future reference that this implies AA' = (X'X)^{-1}.)
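The same kind of Monte Carlo sketch (again with a made-up DGP; illustrative only) can be used to check the variance formula: the empirical covariance matrix of the replicated b's should approach σ²(X'X)^{-1}.

```python
# Illustrative sketch: empirical covariance of b across replications vs sigma^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(2)
n, K, sigma = 50, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])

R = 20_000
bs = np.array([np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(scale=sigma, size=n)))
               for _ in range(R)])

print(np.cov(bs, rowvar=False))              # empirical Var(b | X)
print(sigma**2 * np.linalg.inv(X.T @ X))     # theoretical Var(b | X)
```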
3. Gauss-Markov Theorem: Under assumptions 1-4, the OLS estimator is efficient in the class of linear unbiased estimators, i.e. it is the BLUE. Formally, for any unbiased estimator β̃ that is linear in y, Var(β̃|X) ≥ Var(b|X).
What does it mean?
a) "Linearity" in the above expression means "linear in y". The OLS estimator is linear in y because it can be computed as b = (X'X)^{-1}X'y = Ay.
b) Recall that the variance of the estimator refers to how the computed estimator (b or β̃) varies across independent draws of the true error vector, ε.
c) Note that the inequality defining "best" is a matrix inequality, comparing two K×K matrices. This means that the difference between these matrices, Var(β̃|X) − Var(b|X), itself a K×K matrix, is positive semidefinite. Among other things, this means that the diagonal elements of this matrix, Var(β̃_k|X) − Var(b_k|X), are all nonnegative.
Proof:
Write the alternative estimator, β̃, as β̃ = Cy, where C can depend on X (but not y). (This establishes the constraint that β̃ must be linear.) Further, define D = C − A, so that D is in some sense the "divergence" between the matrix we use to calculate β̃ and the matrix we use to calculate the OLS estimator, b. (Recall that A = (X'X)^{-1}X', and b = Ay.) So,
β̃ = (D + A)y = Dy + Ay, so
β̃ = D(Xβ + ε) + b = DXβ + Dε + b.   (1)
Now, to incorporate the constraint that β̃ be unbiased, consider its expectation:
E(β̃|X) = E(DXβ|X) + E(Dε|X) + E(b|X) = DXβ + β
(because E(Dε|X) = 0 by strict exogeneity, and E(b|X) = β by unbiasedness of OLS, which we have just proved).
Unbiasedness of β̃ therefore requires DXβ = 0. Indeed, since we need β̃ to be unbiased regardless of the true β, unbiasedness of β̃ thus requires DX = 0.
)ncorporatin" the re(uirement for unbiasedness (,X=0) into (*) now yields
b , + =
G
. Subtractin" from both si2es yiel2s an e8pression for the sampling error
of this alternati9e, linear nbiase2 estimator
G
, in terms of the OLS sampling
error
) ( b
"
) (
G
b , + =
.
.sin" our expression for the %&S samplin" error,
A X X X b = =
*
) (
, yields
A , A , ) (
G
+ = + =
. So the samplin" error of an' linear, unbiased estimator is
related to the %&S samplin" error, A, in this simple way.
Our next step is to calculate the variance of β̃ so we can compare it to the variance of the OLS estimator.
Var(β̃|X) = Var(β̃ − β|X) = Var[(D + A)ε|X]
= (D + A)·Var(ε|X)·(D + A)'   (since D and A both depend only on X)
= σ²·(D + A)(D + A)'   (since Var(ε|X) = σ²I_n)
= σ²·(DD' + DA' + AD' + AA').
Now DA' = DX(X'X)^{-1} = 0, since DX = 0 (by unbiasedness of β̃)
(note that this uses the symmetry of (X'X)^{-1}).
By the same token, AD' = 0, and from our proof of Property 2 (Variance of b), AA' = (X'X)^{-1}.
So, Var(β̃|X) = σ²DD' + σ²(X'X)^{-1} = σ²DD' + Var(b|X) ≥ Var(b|X), where the inequality follows from positive semidefiniteness of DD'. QED.
Comments:
1. How surprising is this, really? The "loss function" minimized by OLS is a sum of squared error terms. The variance of b is also a sum of squares. So it's not totally surprising that minimizing a sum of squares (rather than absolute deviations, or some other function of the errors) minimizes the variance of b.
2. What are the alternative, unbiased linear estimators that OLS dominates? It's hard to think of promising examples. (WLS?)
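One concrete candidate for comment 2 is a fixed-weight, WLS-style estimator (X'WX)^{-1}X'Wy, with a weight matrix W that depends only on the observation index. The sketch below (illustrative; not from the notes, with an arbitrary made-up W) checks that this estimator is linear in y and unbiased but, under homoskedasticity, has larger variance than OLS, as the theorem says.

```python
# Illustrative sketch: a fixed-weight linear unbiased estimator is dominated by OLS
# when the errors are homoskedastic.
import numpy as np

rng = np.random.default_rng(3)
n, K, sigma = 50, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
W = np.diag(np.linspace(0.2, 2.0, n))        # arbitrary fixed weights (depend only on i)

A = np.linalg.solve(X.T @ X, X.T)            # OLS: b = A y
C = np.linalg.solve(X.T @ W @ X, X.T @ W)    # alternative linear unbiased estimator

R = 20_000
b_ols, b_alt = np.empty((R, K)), np.empty((R, K))
for r in range(R):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b_ols[r], b_alt[r] = A @ y, C @ y

print(b_alt.mean(axis=0))                    # still close to beta: unbiased
print(b_alt.var(axis=0) - b_ols.var(axis=0)) # variance gap, element by element: all >= 0 (approx.)
```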
4. Orthogonality of b, e: Under assumptions 1-4, Cov(b, e|X) = 0, where e is the OLS residual.
What does this mean? First, fix X. Now consider a bunch of random draws of the true (n×1) vector of disturbances, ε. For each such draw, the OLS estimator produces a (K×1) vector of estimated coefficients, b. It also produces an (n×1) vector of estimated residuals, e. Now consider the (K×n) covariance matrix between these two vectors across realizations of ε (an element of this matrix is the Cov between b_k and e_i across draws of ε). All the elements of this matrix are zero, i.e. the constructed b's and e's are uncorrelated with each other.
Proof: see Hayashi.
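A quick numerical check of this property (illustrative only; made-up DGP): estimate the covariance between each b_k and each e_i across replications and verify that every entry is near zero.

```python
# Illustrative sketch: covariance between coefficients and residuals across draws of eps.
import numpy as np

rng = np.random.default_rng(4)
n, K, sigma = 30, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])

R = 50_000
bs, es = np.empty((R, K)), np.empty((R, n))
for r in range(R):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    bs[r], es[r] = b, y - X @ b

# Cov(b_k, e_i) across replications: every entry of this K x n matrix should be near zero.
cov_be = (bs - bs.mean(axis=0)).T @ (es - es.mean(axis=0)) / R
print(np.abs(cov_be).max())
```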
5. Unbiasedness of s²: Under assumptions 1-4, s² = e'e/(n − K) is also unbiased, i.e. E(s²|X) = σ².
Proof:
First, note that s² = e'e/(n − K) is unbiased iff E(e'e|X)/(n − K) = σ², or E(e'e|X) = σ²(n − K), or E(ε'Mε|X) = σ²(n − K), where M = I_n − X(X'X)^{-1}X' is the n×n annihilator matrix that calculates e from ε.
We now show, in turn, that E(ε'Mε|X) = σ²·trace(M), and that trace(M) = n − K.
Part 1:
E(ε'Mε|X) = Σ_i Σ_j m_ij·E(ε_i ε_j|X)   (this just writes out the quadratic form, and uses the fact that M depends only on X)
= σ²·Σ_i m_ii = σ²·trace(M).
Part 2:
trace(M) = trace[I_n − X(X'X)^{-1}X'] = trace[I_n] − trace[X(X'X)^{-1}X'] = n − trace[X(X'X)^{-1}X'].
And trace[X(X'X)^{-1}X'] = trace[(X'X)^{-1}X'X]   (because trace(AB) = trace(BA))
= trace(I_K) = K.
QED.
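Both parts of this argument can be checked numerically with a sketch like the one below (illustrative only; made-up DGP): trace(M) equals n − K exactly, e'e/(n − K) averages out to σ², and the uncorrected e'e/n is biased downward by the factor (n − K)/n.

```python
# Illustrative sketch: unbiasedness of s^2, bias of e'e/n, and trace(M) = n - K.
import numpy as np

rng = np.random.default_rng(5)
n, K, sigma = 20, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # annihilator matrix
print(np.trace(M), n - K)                           # both equal n - K = 17

R = 100_000
ssr = np.empty(R)
for r in range(R):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    e = M @ y                                       # residuals (M y = M eps, since M X = 0)
    ssr[r] = e @ e

print((ssr / (n - K)).mean())   # close to sigma^2 = 1.0 (unbiased s^2)
print((ssr / n).mean())         # below sigma^2: biased by the factor (n - K)/n
```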
Corollary: the variance of the estimated residuals, e'e/n = SSR/n, i.e. the sample analog of σ², is a biased estimator of σ². It is consistent, however (since n/(n − K) approaches one in large samples).
Intuition for the n − K "degrees of freedom correction": By assumption, the n errors (the ε's) are each independently drawn from the same distribution, and generate n independent y's. From these y's the OLS procedure then calculates n residuals, e_i = y_i − x_i'b. But, by construction, these n generated residuals have to satisfy K equality constraints, given by the OLS normal equations. So, in some sense, they can't vary as much as the ε's. We adjust for this by inflating the variance of the e's by the factor n/(n − K), which exceeds one.
Additional intuition: Consider the extreme case where the number of observations equals the number of variables (n = K). In this case OLS yields a perfect fit. That does not, of course, mean that the true σ² is zero.
Final note: given Property 5 plus Property 2 (the formula for Var(b|X)), a natural estimator of Var(b|X) is Varhat(b|X) = s²(X'X)^{-1}. We'll talk about its properties in the next lesson, on hypothesis testing.
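For one simulated sample (illustrative, made-up DGP), this estimator can be placed next to the true Var(b|X), which we only know here because we chose σ ourselves:

```python
# Illustrative sketch: the feasible variance estimator s^2 (X'X)^{-1} vs the true sigma^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(6)
n, K, sigma = 200, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])

y = X @ beta + rng.normal(scale=sigma, size=n)
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = (e @ e) / (n - K)

XtX_inv = np.linalg.inv(X.T @ X)
print(s2 * XtX_inv)          # Varhat(b | X), computable from the data alone
print(sigma**2 * XtX_inv)    # true Var(b | X), known only because we set sigma ourselves
```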