
ECON5121 Tutorial 3

Stanley Iat-Meng KO
Department of Economics, The Chinese University of Hong Kong
December 3, 2010
1 Endogeneity Problem (Handout 6)
The endogeneity problem comes from correlation between explanatory variables and the error term, i.e. a violation of the key exogeneity assumption (the orthogonality condition in a random sample). Variables correlated with the error term are thus named endogenous variables. Recall that we have the so-called finite sample model, which gives you a nice solution as long as the problem in question fulfills its assumptions. However, we lose those nice properties when endogeneity exists, and there is no way to eliminate the estimation bias. Therefore we resort to the second best property, consistency, in the large sample framework. In mathematical language, endogeneity is

E(\varepsilon \mid x) \neq 0,    (1.0.1)

or

E(x\varepsilon) \neq 0 \quad \text{or} \quad E(x_k \varepsilon) \neq 0 \quad (k = 1, 2, \dots, K).    (1.0.2)

Note that the first condition is the stronger one and implies the second.
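To see the implication, apply the law of iterated expectations: if E(\varepsilon \mid x) = 0, then for every k,

E(x_k \varepsilon) = E\big[ x_k \, E(\varepsilon \mid x) \big] = 0,

so a nonzero moment E(x_k \varepsilon) is only possible when the conditional mean E(\varepsilon \mid x) is itself nonzero.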
1.1 Omitted variables and Errors-In-Variables Problems
Sources of endogeneity are Simultaneous Equations Models (for details see the next section), omitted variables, and Errors-In-Variables (for derivation details of the latter two, refer to handout 6).

For omitted variables:

• Exclusion of relevant regressors (right-hand-side variables) always gives a biased and also inconsistent estimator, and the variance of the estimator will be smaller than the one with the relevant regressors included;

• inclusion of irrelevant regressors has no effect on unbiasedness and consistency, since those regressors have no explanatory power on y. But the estimator is inefficient (it has a larger variance) because we introduce noise into the system.

For Errors-In-Variables:
• Suppose we have measurement error in regressor x_K^*:

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K^* + \varepsilon
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + (\varepsilon - \beta_K v_K)
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + e,    (1.1.1)

where the measurement error assumption is x_K = x_K^* + v_K, with x_K^* being the true value and v_K being the unobservable measurement error. The endogeneity comes from the correlation between x_K and e when we run the regression on x_1, x_2, \dots, x_K.
• Note that the three assumptions are crucial in the EIV model (page 8 in handout 6).
• In the case of a single explanatory variable (the example on page 9 in handout 6), we have

\operatorname{plim} \hat{\beta}_1 = \beta_1 \left( 1 - \frac{\sigma_v^2}{\sigma_{x^*}^2 + \sigma_v^2} \right) = \beta_1 \left( \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_v^2} \right).    (1.1.2)

The term multiplying \beta_1 is always less than unity (even in the general case). Hence the estimate is always biased toward zero, as the simulation at the end of this subsection illustrates. Sometimes you may suspect EIV when an unreasonable upward or downward bias exists.
• As mentioned in the handout, all the coefficient estimates are generally inconsistent even if only a subset of the explanatory variables has measurement errors. The exception is when x_K^* is uncorrelated with all the other x_j (see section 4.4 in Wooldridge (2002)).
• Measurement error in the dependent variable y only results in a larger error variance in general.
A good reference for both omitted variables and Errors-In-Variables is sections 4.3 and 4.4 in Wooldridge (2002).
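The attenuation result (1.1.2) is easy to reproduce in a small simulation. The following sketch is not from the handout; the data-generating process and all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 1.0, 2.0
sigma_x, sigma_v = 1.0, 0.5       # sd of the true x* and of the measurement error

x_star = rng.normal(0.0, sigma_x, n)         # true, unobserved regressor
x = x_star + rng.normal(0.0, sigma_v, n)     # observed, mismeasured regressor
y = beta0 + beta1 * x_star + rng.normal(size=n)

slope = np.polyfit(x, y, 1)[0]               # OLS slope of y on the mismeasured x
attenuation = sigma_x**2 / (sigma_x**2 + sigma_v**2)   # the factor in (1.1.2)
print(slope)                  # roughly 1.6
print(beta1 * attenuation)    # the plim in (1.1.2): 2.0 * 0.8 = 1.6

Shrinking sigma_v toward zero removes the bias, matching (1.1.2): the attenuation factor tends to one as the measurement error vanishes.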
1.2 Instrumental Variables Estimator
Endogeneity problem is everywhere in social science and instrumental variables (IV) regression
is a general and popular way to tackle the problem. Instrumental variables estimator is consistent
under certain conditions.
Key things need to know about IV regression:
• The two conditions for valid instruments: the Orthogonality Condition and the Relevance (Rank) Condition.

• A fast check with the Order Condition.

• Exactly as many IVs as endogenous variables vs. more IVs than endogenous variables.

• How two stage least squares (2SLS) works: consistency, asymptotic normality, asymptotic efficiency.

• Weak instruments.

• Testing endogeneity: the Hausman test (a code sketch follows this list).

• (Not in notes) Testing overidentification: the Sargan test.
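As a preview of the endogeneity test listed above, here is a minimal sketch of the regression-based (Durbin-Wu-)Hausman test on simulated data. The setup and all numbers are assumptions for illustration, not from the notes: regress the suspect regressor on the instruments, add the first-stage residuals to the structural regression, and test the coefficient on those residuals.

import numpy as np

rng = np.random.default_rng(0)
n = 5_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z1 + 0.5 * z2 + u                       # suspect regressor
y = 1.0 + 2.0 * x + 0.8 * u + rng.normal(size=n)  # x is endogenous by construction

# First stage: regress x on the instruments, keep the residuals.
Z = np.column_stack([np.ones(n), z1, z2])
vhat = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Augmented regression: y on (1, x, vhat); test the coefficient on vhat.
W = np.column_stack([np.ones(n), x, vhat])
coef, ssr = np.linalg.lstsq(W, y, rcond=None)[:2]
sigma2 = ssr[0] / (n - W.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(W.T @ W).diagonal())
print(coef[2] / se[2])   # |t| > 1.96 rejects exogeneity at the 5% level

Here the true model makes x endogenous, so the t-statistic lands far outside the critical values.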
Here we review the assumptions required for 2SLS. Suppose the population (y, x, \varepsilon) follows

y = x'\beta + \varepsilon,    (1.2.1)

where x is K \times 1, includes unity, and E(x\varepsilon) \neq 0. Hence x contains endogenous variables.

Assumption 1.1 (orthogonality condition). z is an M \times 1 vector such that E(z\varepsilon) = 0.

Remark 1.1. z is named the exogenous variables or predetermined variables. Note that z usually contains the exogenous elements of x.

Assumption 1.2 (rank condition).

(a) rank E(zz') = M;   (b) rank E(zx') = K.    (1.2.2)

Remark 1.2. Part (a) of assumption 1.2 is necessary but not as important as part (b), since it usually holds. Part (b) is the crucial rank condition for identification (this will become clear when we come to the Simultaneous Equations Models).

A fast check for identification is the Order Condition, which is

M (= \# exogenous or predetermined variables) \geq K (= \# regressors).    (1.2.3)

The intuition behind it is that we need at least as many exogenous (predetermined) variables as endogenous variables in order to consistently estimate the model. Note that the order condition is a necessary but not sufficient condition.
When applying 2SLS, we may think of a reduced form (see the section on SEM below) existing for each endogenous variable,

x' = z'\Pi + u',    (1.2.4)

or

(x_1 \; \cdots \; x_K) = (z_1 \; \cdots \; z_M) \begin{pmatrix} \pi_{11} & \cdots & \pi_{K1} \\ \vdots & \ddots & \vdots \\ \pi_{1M} & \cdots & \pi_{KM} \end{pmatrix} + (u_1 \; \cdots \; u_K),    (1.2.5)
with E(zu') = 0. Hence we have

\Pi = [E(zz')]^{-1} E(zx'),    (1.2.6)

where we use part (a) of assumption 1.2 to ensure the invertibility of E(zz'). Let \hat{x} \equiv \Pi'z and premultiply both sides of (1.2.1):

y = x'\beta + \varepsilon,
\hat{x} y = \hat{x} x'\beta + \hat{x}\varepsilon,    (premultiply \hat{x} on both sides)
E(\hat{x} y) = E(\hat{x} x')\beta + E(\hat{x}\varepsilon),    (take expectations on both sides)
\beta = [E(\hat{x} x')]^{-1} E(\hat{x} y).    (assumptions 1.1 and 1.2)    (1.2.7)

Therefore,

\beta = \big[ E(xz')[E(zz')]^{-1}E(zx') \big]^{-1} \big[ E(xz')[E(zz')]^{-1}E(zy) \big],    (1.2.8)
and, assuming an i.i.d. sample of size n, the 2SLS estimator is

\hat{\beta}_{2SLS} = \left[ \left( \sum_{i=1}^{n} x_i z_i' \right) \left( \sum_{i=1}^{n} z_i z_i' \right)^{-1} \left( \sum_{i=1}^{n} z_i x_i' \right) \right]^{-1} \left( \sum_{i=1}^{n} x_i z_i' \right) \left( \sum_{i=1}^{n} z_i z_i' \right)^{-1} \left( \sum_{i=1}^{n} z_i y_i \right).    (1.2.9)
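A minimal sketch of (1.2.9) on simulated data may help fix ideas. Everything below (the data-generating process, the instrument set, all numbers) is an illustrative assumption, with K = 2 regressors and M = 3 instruments, so the order condition M \geq K holds.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # outside instruments
u = rng.normal(size=n)                            # first-stage disturbance
eps = 0.8 * u + rng.normal(size=n)                # structural error, correlated with u
x1 = 0.5 * z1 + 0.5 * z2 + u                      # endogenous: E(x1 * eps) != 0
y = 1.0 + 2.0 * x1 + eps                          # true beta = (1, 2)'

X = np.column_stack([np.ones(n), x1])             # K = 2 regressors (with unity)
Z = np.column_stack([np.ones(n), z1, z2])         # M = 3 instruments, M >= K

# Moment sums of (1.2.9): sum_i x_i z_i', sum_i z_i z_i', sum_i z_i y_i.
Sxz, Szz, Szy = X.T @ Z, Z.T @ Z, Z.T @ y
beta_2sls = np.linalg.solve(Sxz @ np.linalg.solve(Szz, Sxz.T),
                            Sxz @ np.linalg.solve(Szz, Szy))
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_2sls)   # close to (1, 2): consistent
print(beta_ols)    # slope biased upward (about 2.5 here) by endogeneity

The OLS slope is pulled away from the true value by the correlation between x1 and the error, while the 2SLS estimate built from the three moment matrices in (1.2.9) stays close to the truth.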
2 Systems of Regression Equations (Handout 7)
2.1 Seemingly Unrelated Regression (SUR) Models
The Seemingly Unrelated Regression (SUR) model is a direct extension from the single equation model to the multiple equation model. (This section draws extensively from chapter 7 of Wooldridge (2002).) The structure of the model is

y_{1i} = x_{1i}'\beta_1 + u_{1i},
y_{2i} = x_{2i}'\beta_2 + u_{2i},
\vdots
y_{ji} = x_{ji}'\beta_j + u_{ji},
\vdots
y_{Ji} = x_{Ji}'\beta_J + u_{Ji},    (2.1.1)

where j = 1, 2, \dots, J indexes the equations (therefore we have J individuals in the system) and i = 1, 2, \dots, n indexes the observations. x_{ji} is the K_j \times 1 explanatory variable vector of the i-th observation of the j-th equation. \beta_j is the K_j \times 1 coefficient vector of the j-th equation (which of course is invariant across i). u_{ji} denotes the error term.
In many applications x_{ji} is the same for all j, but the general model allows the elements and the dimension of x_{ji} to vary across equations. The model is so named because each equation has its own coefficient vector \beta_j: it appears that the J individuals are unrelated, and of course it does no harm to do the estimation separately. Nevertheless, correlation across the errors in different equations can provide links that can be exploited in estimation.
2.1.1 Estimating the SUR Model by OLS

We shall first try to apply OLS to the SUR model as an exercise, under certain conditions such that OLS works. Note that in (2.1.1) we deliberately express a snapshot of the i-th observation of the J equations; this is crucial in understanding what a random sample means in the case of a multiple equations model.
Definition 2.1. Let

(y, X, u) \equiv \left( \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{pmatrix}, \begin{pmatrix} x_1' & 0 & \cdots & 0 \\ 0 & x_2' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_J' \end{pmatrix}, \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_J \end{pmatrix} \right)

be the population. In this case a random sample (i.i.d.) means we randomly draw, say, n times from (y, X, u) and obtain \{(y_i, X_i, u_i) \mid i = 1, 2, \dots, n\}, where

(y_i, X_i, u_i) \equiv \left( \begin{pmatrix} y_{1i} \\ y_{2i} \\ \vdots \\ y_{Ji} \end{pmatrix}, \begin{pmatrix} x_{1i}' & 0 & \cdots & 0 \\ 0 & x_{2i}' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{Ji}' \end{pmatrix}, \begin{pmatrix} u_{1i} \\ u_{2i} \\ \vdots \\ u_{Ji} \end{pmatrix} \right).
With the above notation, we can express (2.1.1) in matrix form:

y_i = X_i \beta + u_i,    (2.1.2)

where \beta = (\beta_1', \beta_2', \dots, \beta_J')' and u_i = (u_{1i}, u_{2i}, \dots, u_{Ji})'.
In order to perform OLS, we impose the following assumptions on the population.

Assumption 2.1. E(X'u) = 0.    (2.1.3)

Assumption 2.2. A \equiv E(X'X) is nonsingular.    (2.1.4)
Now, as in the single equation case, we express \beta in terms of population moments:

y = X\beta + u,
X'y = X'X\beta + X'u,    (premultiply X' on both sides)
E(X'y) = E(X'X)\beta + E(X'u),    (take expectations on both sides)
\beta = [E(X'X)]^{-1} E(X'y).    (assumptions 2.1 and 2.2)
Then, as usual, we approximate the population expectations by sample averages:

\left[ \frac{1}{n} \sum_{i=1}^{n} X_i' X_i \right]^{-1} \xrightarrow{p} [E(X'X)]^{-1} = A^{-1},    (2.1.5)

\frac{1}{n} \sum_{i=1}^{n} X_i' y_i \xrightarrow{p} E(X'y).    (2.1.6)

Hence

\hat{\beta} \equiv \left[ \frac{1}{n} \sum_{i=1}^{n} X_i' X_i \right]^{-1} \frac{1}{n} \sum_{i=1}^{n} X_i' y_i = \left[ \sum_{i=1}^{n} X_i' X_i \right]^{-1} \sum_{i=1}^{n} X_i' y_i.    (2.1.7)
Stacking up all y_i and X_i, we have

Y \equiv \begin{pmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{pmatrix}, \qquad X \equiv \begin{pmatrix} X_1 \\ \vdots \\ X_i \\ \vdots \\ X_n \end{pmatrix},

where Y = (y_1', \dots, y_i', \dots, y_n')' is the (J n) \times 1 vector of stacked observations y_i, and X = (X_1', \dots, X_i', \dots, X_n')' is the (J n) \times K matrix of stacked observations X_i, with K \equiv K_1 + \cdots + K_j + \cdots + K_J. It happens that

\sum_{i=1}^{n} X_i' X_i = \begin{pmatrix} X_1' & X_2' & \cdots & X_n' \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = X'X,    (2.1.8)

and

\sum_{i=1}^{n} X_i' y_i = \begin{pmatrix} X_1' & X_2' & \cdots & X_n' \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = X'Y.    (2.1.9)
Therefore we have the familiar form

\hat{\beta} = (X'X)^{-1}(X'Y).    (2.1.10)
Indeed, applying OLS to the SUR model above is equivalent to the single equation case, with each observation being a one-set equation system instead of a one-line equation (with the scalars (y_i, x_i', \varepsilon_i) replaced by the vectors and matrices (y_i, X_i, u_i)). If we expand (2.1.10) out, we will find that \hat{\beta} simply stacks up the separate OLS estimates \hat{\beta}_j, the same as doing OLS equation by equation.
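A quick numerical check of this equivalence, as a sketch on simulated data (the two-equation setup, regressor counts, and all parameter values are assumptions, not from the handout):

import numpy as np

rng = np.random.default_rng(0)
n = 500                                   # observations; J = 2 equations
x1 = np.column_stack([np.ones(n), rng.normal(size=n)])          # K1 = 2
x2 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])     # K2 = 3
b1, b2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5, 3.0])
u = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n)
y1, y2 = x1 @ b1 + u[:, 0], x2 @ b2 + u[:, 1]

# Stack: each observation i contributes a 2 x 5 block-diagonal X_i.
K1, K2, J = 2, 3, 2
X = np.zeros((n * J, K1 + K2))
X[0::J, :K1] = x1                         # equation-1 row of each X_i
X[1::J, K1:] = x2                         # equation-2 row of each X_i
Y = np.empty(n * J)
Y[0::J], Y[1::J] = y1, y2

beta_sys = np.linalg.solve(X.T @ X, X.T @ Y)        # system OLS, (2.1.10)
beta_eq1 = np.linalg.lstsq(x1, y1, rcond=None)[0]   # OLS equation by equation
beta_eq2 = np.linalg.lstsq(x2, y2, rcond=None)[0]
print(np.allclose(beta_sys, np.concatenate([beta_eq1, beta_eq2])))  # True

The block-diagonal structure of X_i makes X'X block diagonal, which is exactly why the stacked estimate decomposes into the per-equation OLS estimates.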
2.1.2 GLS (FGLS) and the SUR Model

Though the OLS of the previous section does give a consistent estimator (see Theorem 7.1 in Wooldridge (2002)), we can do better with Generalized Least Squares, which exploits more information from the inter-equation relationships. The idea of GLS plays a central role in the SUR model, as OLS does in the single equation case.

Recall that we introduced GLS in the single equation model with heteroskedasticity, that is,

Var(\varepsilon_i) = \sigma_i^2, \quad i = 1, 2, \dots, n.

Please note that we apply GLS in the multiple equations model not because of heteroskedasticity. Actually we have homoskedasticity in u_i, since

E(u_i u_i') = \Omega, \quad i = 1, 2, \dots, n.

So, with no heteroskedasticity, why bother to use a more tedious method and make things complicated? The answer is that, under some conditions, FGLS (Feasible GLS) is asymptotically more efficient than OLS (see section 7.5.2 and problem 7.2 in Wooldridge (2002)).
To make GLS work, we need the following assumptions.

Assumption 2.1'. E(X \otimes u) = 0.    (2.1.11)

Remark 2.1. Assumption 2.1' is stronger than assumption 2.1: it assumes that each element of u is uncorrelated with each element of X, and it plays a crucial role in establishing consistency of the GLS estimator (for the proof of consistency, refer to section 7.4.1 in Wooldridge (2002)).

Assumption 2.2'. \Omega \equiv E(uu') is positive definite and E(X'\Omega^{-1}X) is nonsingular.
We transform the system of equations by premultiplying equation (2.1.2) by \Omega^{-1/2}:

\Omega^{-1/2} y_i = (\Omega^{-1/2} X_i)\beta + \Omega^{-1/2} u_i \quad \text{or} \quad y_i^* = X_i^* \beta + u_i^*.    (2.1.12)

Now we estimate equation (2.1.12) by the OLS introduced in the previous section. We have
\hat{\beta} = \left[ \sum_{i=1}^{n} X_i^{*\prime} X_i^* \right]^{-1} \left[ \sum_{i=1}^{n} X_i^{*\prime} y_i^* \right] = \left[ \sum_{i=1}^{n} X_i' \Omega^{-1} X_i \right]^{-1} \left[ \sum_{i=1}^{n} X_i' \Omega^{-1} y_i \right].    (2.1.13)
In FGLS we replace the unknown matrix \Omega with a consistent estimator. That is:

• apply OLS as in the previous section to obtain \hat{u}_i \equiv y_i - X_i \hat{\beta};

• acquire a consistent estimator of \Omega (see section 7.5.1 in Wooldridge (2002)) by \hat{\Omega} \equiv \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i \hat{u}_i';

• given \hat{\Omega}, the FGLS estimator of \beta is (see the sketch after this list)

\hat{\beta}_{FGLS} = \left[ \sum_{i=1}^{n} X_i' \hat{\Omega}^{-1} X_i \right]^{-1} \left[ \sum_{i=1}^{n} X_i' \hat{\Omega}^{-1} y_i \right].    (2.1.14)
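The three steps translate directly into code. Below is a minimal sketch on simulated data, assuming a two-equation system with two regressors each; the data-generating process and all numbers are illustrative, not from the handout.

import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # equation 1, K1 = 2
x2 = np.column_stack([np.ones(n), rng.normal(size=n)])   # equation 2, K2 = 2
u = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 2.0]], size=n)
y1 = x1 @ np.array([1.0, 2.0]) + u[:, 0]
y2 = x2 @ np.array([-1.0, 0.5]) + u[:, 1]

Xi = np.zeros((n, 2, 4))                  # X_i: 2 x 4, block diagonal
Xi[:, 0, :2], Xi[:, 1, 2:] = x1, x2
yi = np.column_stack([y1, y2])            # y_i: 2 x 1

# Step 1: system OLS, then residuals u_hat_i = y_i - X_i beta_ols.
XtX = np.einsum('ijk,ijl->kl', Xi, Xi)    # sum_i X_i' X_i
Xty = np.einsum('ijk,ij->k', Xi, yi)      # sum_i X_i' y_i
beta_ols = np.linalg.solve(XtX, Xty)
res = yi - np.einsum('ijk,k->ij', Xi, beta_ols)

# Step 2: Omega_hat = (1/n) sum_i u_hat_i u_hat_i'.
W = np.linalg.inv(res.T @ res / n)        # Omega_hat^{-1}

# Step 3: FGLS formula (2.1.14).
A = np.einsum('ijk,jl,ilm->km', Xi, W, Xi)   # sum_i X_i' Omega_hat^{-1} X_i
b = np.einsum('ijk,jl,il->k', Xi, W, yi)     # sum_i X_i' Omega_hat^{-1} y_i
print(np.linalg.solve(A, b))              # close to (1.0, 2.0, -1.0, 0.5)

With cross-equation error correlation (0.7 here) and different regressors in each equation, the FGLS estimates are asymptotically more efficient than the system OLS from step 1.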
2.2 Simultaneous Equations Models (SEM)

It is good to start from a simple two-equation structural model:
y_1 = \gamma_2 y_2 + \beta_{11} x_1 + \beta_{12} x_2 + u_1,    (2.2.1)
y_2 = \gamma_1 y_1 + \beta_{21} x_1 + \beta_{22} x_2 + u_2,    (2.2.2)

where y_1, y_2 are endogenous and x_1, x_2 are predetermined (or exogenous) variables. Sometimes we will express it in terms of matrices and vectors:

\begin{pmatrix} 1 & -\gamma_2 \\ -\gamma_1 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} + \begin{pmatrix} -\beta_{11} & -\beta_{12} \\ -\beta_{21} & -\beta_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}.    (2.2.3)
Let us first express all endogenous variables in terms of the predetermined variables. By simple algebra, we have

y_1 = \frac{\beta_{11} + \gamma_2 \beta_{21}}{1 - \gamma_1 \gamma_2} x_1 + \frac{\beta_{12} + \gamma_2 \beta_{22}}{1 - \gamma_1 \gamma_2} x_2 + \frac{1}{1 - \gamma_1 \gamma_2} u_1 + \frac{\gamma_2}{1 - \gamma_1 \gamma_2} u_2,

y_2 = \frac{\gamma_1 \beta_{11} + \beta_{21}}{1 - \gamma_1 \gamma_2} x_1 + \frac{\gamma_1 \beta_{12} + \beta_{22}}{1 - \gamma_1 \gamma_2} x_2 + \frac{\gamma_1}{1 - \gamma_1 \gamma_2} u_1 + \frac{1}{1 - \gamma_1 \gamma_2} u_2,    (2.2.4)
or equivalently,

y_1 = \pi_{11} x_1 + \pi_{12} x_2 + v_1,    (2.2.5)
y_2 = \pi_{21} x_1 + \pi_{22} x_2 + v_2,    (2.2.6)

where
\pi_{11} = \frac{\beta_{11} + \gamma_2 \beta_{21}}{1 - \gamma_1 \gamma_2}, \qquad \pi_{12} = \frac{\beta_{12} + \gamma_2 \beta_{22}}{1 - \gamma_1 \gamma_2},    (2.2.7)

and so on. We say that equations (2.2.1) and (2.2.2) are the structural form equations, and equations (2.2.5) and (2.2.6) are the reduced form equations.
Recall that one of the sources of endogeneity in the single equation model is the SEM. It is now easy to see why. Suppose we are interested in investigating equation (2.2.1) but not (2.2.2). Then y_2 in equation (2.2.1) acts as the endogenous variable of the usual single equation case. In 2SLS, we first regress y_2 (the endogenous variable) on all exogenous variables; this first stage actually comes from equation (2.2.6). Hence equation (2.2.1) and all the predetermined variables in the system provide us with the information to consistently estimate the structural equation (2.2.1). Note that in this way we can only recover equation (2.2.1) but not (2.2.2), since the information used is limited (that is why the Maximum Likelihood counterpart of 2SLS is named Limited-Information Maximum Likelihood, LIML).
2.2.1 Identification
We have seen that the SEM can be expressed in the structural form and in the reduced form. The reduced form can always be consistently estimated by OLS, since the predetermined variables on the right hand side are always orthogonal to the error terms (see (2.2.4)). But it is not guaranteed that the parameters of the structural equations, i.e. \gamma_1, \gamma_2, \beta_{11}, \beta_{12}, \beta_{21}, \beta_{22} in our case, can be recovered from \pi_{11}, \pi_{12}, \pi_{21}, \pi_{22}. If it is possible to deduce the structural coefficients of an equation from knowledge of the reduced form equations, we say that the equation is identified.
Take equation (2.2.1) as an example. From (2.2.5) and (2.2.6), we have

y_1 - \gamma_2 y_2 = (\pi_{11} - \gamma_2 \pi_{21}) x_1 + (\pi_{12} - \gamma_2 \pi_{22}) x_2 + (v_1 - \gamma_2 v_2).    (2.2.8)

And rearranging (2.2.1), we have

y_1 - \gamma_2 y_2 = \beta_{11} x_1 + \beta_{12} x_2 + u_1.    (2.2.9)
Comparing (2.2.8) and (2.2.9), we have

\pi_{11} - \gamma_2 \pi_{21} = \beta_{11},
\pi_{12} - \gamma_2 \pi_{22} = \beta_{12},

or equivalently,

\begin{pmatrix} \pi_{21} & 1 & 0 \\ \pi_{22} & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma_2 \\ \beta_{11} \\ \beta_{12} \end{pmatrix} = \begin{pmatrix} \pi_{11} \\ \pi_{12} \end{pmatrix}.    (2.2.10)
Note that (2.2.10) is a linear equation system with two equations and three unknowns to solve, i.e. \gamma_2, \beta_{11}, \beta_{12}. Hence we do not have a unique solution. Suppose we place an exclusion restriction \beta_{12} = 0 on equation (2.2.1). We then have one additional equation, and (2.2.10) becomes

\begin{pmatrix} \pi_{21} & 1 & 0 \\ \pi_{22} & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma_2 \\ \beta_{11} \\ \beta_{12} \end{pmatrix} = \begin{pmatrix} \pi_{11} \\ \pi_{12} \\ 0 \end{pmatrix}.    (2.2.11)
Knowledge from Linear Algebra tells us that (2.2.11) has a unique solution whenever the coefficient matrix is nonsingular. (A very good reference to learn Linear Algebra is the video lectures by Prof. Gilbert Strang at MIT OCW: http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/index.htm.)
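Writing the solution out by back-substitution makes the condition explicit:

\beta_{12} = 0, \qquad \gamma_2 = \frac{\pi_{12}}{\pi_{22}}, \qquad \beta_{11} = \pi_{11} - \frac{\pi_{12}}{\pi_{22}} \pi_{21},

which exists and is unique provided \pi_{22} \neq 0 (the determinant of the coefficient matrix in (2.2.11) is -\pi_{22}).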
The example shows that identification requires restriction assumptions. See page 5 of handout 7 for the five possible restrictions.
2.2.2 2SLS Estimation, the Rank Condition and Order Condition
The relationship between single equation endogeneity and the SEM actually provides us with a way to estimate the SEM: just apply 2SLS to the structural equation of interest, with the predetermined variables in the system acting as IVs. Therefore, the identification problem is nothing but the rank condition in 2SLS. Suppose we want to estimate the first equation of the structural form in our example,

y_1 = \gamma_2 y_2 + \beta_{11} x_1 + \beta_{12} x_2 + u_1,

with the predetermined variables x_1, x_2 as IVs.
Without any restriction, is the equation identified? Let z \equiv (x_1, x_2)' and x \equiv (y_2, x_1, x_2)'; we are back in the 2SLS setting. It is easy to see that the equation in question is not identified, as the order condition (1.2.3) fails (M = 2 < 3 = K). As a demonstration we also check the rank condition of 2SLS, i.e. part (b) of assumption 1.2:

rank E(zx') = rank E \begin{pmatrix} x_1 y_2 & x_1^2 & x_1 x_2 \\ x_2 y_2 & x_1 x_2 & x_2^2 \end{pmatrix} \leq 2 < 3 = K.    (2.2.12)

Therefore, without any restriction, the equation is not identified.
Suppose we restrict \beta_{12} = 0, i.e. exclude x_2 from the equation; then we have

y_1 = \gamma_2 y_2 + \beta_{11} x_1 + u_1.
Now z \equiv (x_1, x_2)' and x \equiv (y_2, x_1)'. The order condition is fulfilled (M = 2 = K). The rank condition is now

rank E(zx') = rank \begin{pmatrix} E(x_1 y_2) & E(x_1^2) \\ E(x_2 y_2) & E(x_1 x_2) \end{pmatrix}.    (2.2.13)
The rank is equal to K = 2 if and only if the determinant is not zero, i.e. E(x_1 y_2) E(x_1 x_2) - E(x_2 y_2) E(x_1^2) \neq 0. It is easy to see that this is the case in our example, and thus the equation is identified.
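A simulation sketch ties the pieces together: we generate data from the reduced form (2.2.4) with \beta_{12} = 0, check the sample analog of the rank condition (2.2.13), and estimate (\gamma_2, \beta_{11}) by the 2SLS formula (1.2.9). All parameter values below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
g1, g2 = 0.4, 2.0                 # gamma_1, gamma_2 (need g1 * g2 != 1)
b11, b21, b22 = 1.0, 0.5, 1.5     # beta_11, beta_21, beta_22; beta_12 = 0

x1, x2 = rng.normal(size=n), rng.normal(size=n)
u1, u2 = rng.normal(size=n), rng.normal(size=n)
d = 1.0 - g1 * g2
# Reduced form (2.2.4) with beta_12 = 0:
y1 = ((b11 + g2 * b21) * x1 + g2 * b22 * x2 + u1 + g2 * u2) / d
y2 = ((g1 * b11 + b21) * x1 + b22 * x2 + g1 * u1 + u2) / d

Z = np.column_stack([x1, x2])     # instruments z = (x1, x2)'
X = np.column_stack([y2, x1])     # regressors x = (y2, x1)'

Ezx = Z.T @ X / n                 # sample analog of E(zx')
print(np.linalg.matrix_rank(Ezx))             # 2 = K: rank condition holds

Sxz, Szz, Szy = X.T @ Z, Z.T @ Z, Z.T @ y1    # moment sums for (1.2.9)
theta = np.linalg.solve(Sxz @ np.linalg.solve(Szz, Sxz.T),
                        Sxz @ np.linalg.solve(Szz, Szy))
print(theta)                      # close to (gamma_2, beta_11) = (2.0, 1.0)

Replacing x2 with an instrument unrelated to y2 (say, pure noise) would drive the determinant of E(zx') toward zero, the weak or irrelevant instrument case in which 2SLS breaks down.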
2.2.3 Full System vs. Single Equation Estimation

In the SUR model, we saw that we can do the estimation in the full-system sense rather than equation by equation. In the SEM, we can also do the same, provided all the structural equations are correctly specified and identified. Then system procedures, such as the system 2SLS procedure and 3SLS (see chapter 8 in Wooldridge (2002)), are asymptotically more efficient than a single-equation procedure such as 2SLS. But single-equation methods are more robust. If interest lies, say, in the first equation of a system, single-equation 2SLS is consistent and asymptotically normal provided the first equation is correctly specified and the instruments are the predetermined variables in the system. However, if one equation in the system is misspecified, the 3SLS or GMM estimates of all the parameters are generally inconsistent.
References

Wooldridge, Jeffrey M. (2002). Econometric Analysis of Cross Section and Panel Data. The MIT Press.