CH 5

2ME03 - Econometrics
Endogeneity, Instrumental Variables and GMM
October, 2022
Endogeneity, Instrumental Variables and GMM 2ME03 - Econometrics October, 2022 1 / 32

Endogenous regressors
Consider the linear regression model
$$y_i = x_i'\beta + \epsilon_i$$
So far we have assumed that the error term $\epsilon_i$ and the explanatory variables $x_i$ are contemporaneously uncorrelated, i.e. $E(\epsilon_i x_i) = 0$.
But if $E(\epsilon_i x_i) \neq 0$, the OLS estimator of $\beta$ will be inconsistent.
When can we expect $E(\epsilon_i x_i) \neq 0$?

Example: autocorrelation with a lagged dependent variable
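Suppose that
$$y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + \epsilon_t$$
and that $\epsilon_t = \rho\epsilon_{t-1} + v_t$, where $v_t$ is white noise.
Now, writing $\sigma^2 = V(\epsilon_t)$ and taking $x_t$ to be exogenous,
$$\begin{aligned}
\operatorname{cov}(y_{t-1}, \epsilon_t) &= \operatorname{cov}(y_{t-1}, \rho\epsilon_{t-1} + v_t) \\
&= \operatorname{cov}(y_{t-1}, \rho\epsilon_{t-1}) + \operatorname{cov}(y_{t-1}, v_t) \\
&= \rho\operatorname{cov}(y_{t-1}, \epsilon_{t-1}) \\
&= \rho\operatorname{cov}(\beta_1 + \beta_2 x_{t-1} + \beta_3 y_{t-2} + \epsilon_{t-1}, \epsilon_{t-1}) \\
&= \rho\beta_3\operatorname{cov}(y_{t-2}, \epsilon_{t-1}) + \rho\sigma^2 \\
&= \rho\beta_3\operatorname{cov}(y_{t-1}, \epsilon_t) + \rho\sigma^2
\end{aligned}$$
where the last step uses stationarity, and thus
$$\operatorname{cov}(y_{t-1}, \epsilon_t) = \frac{\rho\sigma^2}{1 - \rho\beta_3}$$
Unless $\rho = 0$, the OLS estimators are biased and inconsistent.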
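The covariance formula can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the parameter values, the burn-in length and the helper `cov` are illustrative assumptions, and $x_t$ is drawn as i.i.d. standard normal so that it is exogenous.

```python
import random

random.seed(0)

# Hypothetical parameter values, chosen only for illustration.
beta1, beta2, beta3 = 1.0, 0.5, 0.5
rho, sigma_v = 0.5, 1.0
T = 200_000

y_prev, eps_prev = 0.0, 0.0
lagged_y, errors = [], []
for t in range(T):
    x = random.gauss(0.0, 1.0)                         # exogenous regressor
    eps = rho * eps_prev + random.gauss(0.0, sigma_v)  # AR(1) error term
    y = beta1 + beta2 * x + beta3 * y_prev + eps
    if t >= 1_000:                                     # drop burn-in observations
        lagged_y.append(y_prev)                        # y_{t-1}
        errors.append(eps)                             # eps_t
    y_prev, eps_prev = y, eps

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

sigma2 = sigma_v ** 2 / (1 - rho ** 2)   # V(eps_t) for a stationary AR(1)
theory = rho * sigma2 / (1 - rho * beta3)
sample = cov(lagged_y, errors)
print(f"sample cov = {sample:.3f}, formula = {theory:.3f}")
```

With these values the sample covariance should be close to the formula, and both are clearly non-zero.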

Example: simultaneity and reverse causality
Consider a Keynesian consumption function:
$$y_t = \beta_1 + \beta_2 x_t + \epsilon_t$$
where $y_t$ is aggregate consumption, $x_t$ is aggregate income and $\beta_2$ denotes the marginal propensity to consume.
But aggregate income is not exogenous because
$$x_t = y_t + z_t$$
where $z_t$ denotes investment.
This implies that income $x_t$ and the error term $\epsilon_t$ are correlated.
This can be shown by deriving the reduced form, which describes $y_t$ and $x_t$ as functions of the exogenous variable(s) and the error terms.
The reduced form is
$$x_t = \frac{\beta_1}{1-\beta_2} + \frac{1}{1-\beta_2} z_t + \frac{1}{1-\beta_2}\epsilon_t$$
$$y_t = \frac{\beta_1}{1-\beta_2} + \frac{\beta_2}{1-\beta_2} z_t + \frac{1}{1-\beta_2}\epsilon_t$$
And, since $z_t$ is uncorrelated with $\epsilon_t$, it follows that
$$\operatorname{cov}(x_t, \epsilon_t) = \frac{V(\epsilon_t)}{1-\beta_2} = \frac{\sigma^2}{1-\beta_2}$$
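The size of the resulting inconsistency follows from the reduced form. A short worked derivation, assuming that $z_t$ is uncorrelated with $\epsilon_t$ and writing $\sigma_z^2 = V(z_t)$ (notation introduced here, not in the slides):

```latex
\begin{aligned}
V(x_t) &= \frac{\sigma_z^2 + \sigma^2}{(1-\beta_2)^2} \\
\operatorname{plim}\hat\beta_{2,OLS}
  &= \beta_2 + \frac{\operatorname{cov}(x_t, \epsilon_t)}{V(x_t)}
   = \beta_2 + (1-\beta_2)\,\frac{\sigma^2}{\sigma_z^2 + \sigma^2}
\end{aligned}
```

For $0 < \beta_2 < 1$ the second term is positive, so under these assumptions OLS overestimates the marginal propensity to consume.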

Example: Measurement error
Consider the model with Gauss-Markov (GM) assumptions holding:
$$y_i = x_i^{*\prime}\beta + \epsilon_i$$
Suppose the independent variables $x_i^*$ are mismeasured in the dataset, even if only by an i.i.d. white noise error. The variable $x_i$ is measured with error:
$$x_i = x_i^* + u_i$$
where $u_i$ is such that:
$$E[u_i \epsilon_i] = 0$$
$$E[u_i x_i^*] = 0$$
$$u_i \overset{i.i.d.}{\sim} N(0, \sigma_u^2 I_{K \times K})$$
In the presence of classical measurement error, as presented, mismeasured variables cause their coefficients to be biased towards zero. This is often referred to as attenuation bias, and the effect is often called the iron law of econometrics.
Consider the simple case of one independent variable, with GM assumptions holding:
$$y_i = x_i^*\beta + \epsilon_i$$
Then the OLS estimator for $\beta$, if $x_i$ is used instead of $x_i^*$, is:
$$\hat\beta = \frac{\operatorname{cov}[x_i, y_i]}{\operatorname{var}[x_i]} = \frac{\operatorname{cov}[x_i^* + u_i, x_i^*\beta + \epsilon_i]}{\operatorname{var}[x_i]}$$
$$\operatorname{plim}\hat\beta = \beta\,\frac{\operatorname{var}[x_i^*]}{\operatorname{var}[x_i]} = \beta\,\underbrace{\frac{\operatorname{var}[x_i^*]}{\operatorname{var}[x_i^*] + \operatorname{var}[u_i]}}_{\text{attenuation bias}}$$
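Attenuation bias is easy to see in a simulation. The sketch below is an illustration only: the true coefficient, the two variances and the helper `cov` are assumptions made here. With equal variances for $x_i^*$ and $u_i$, the formula predicts that OLS recovers about half of $\beta$.

```python
import random

random.seed(42)
n = 100_000
beta = 2.0                       # hypothetical true coefficient
var_x_star, var_u = 1.0, 1.0     # hypothetical variances of x* and u

x_star = [random.gauss(0.0, var_x_star ** 0.5) for _ in range(n)]
x_obs = [xs + random.gauss(0.0, var_u ** 0.5) for xs in x_star]  # x = x* + u
y = [beta * xs + random.gauss(0.0, 1.0) for xs in x_star]        # y uses x*

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

beta_hat = cov(x_obs, y) / cov(x_obs, x_obs)   # OLS slope using mismeasured x
beta_plim = beta * var_x_star / (var_x_star + var_u)
print(f"OLS estimate = {beta_hat:.3f}, attenuation formula = {beta_plim:.3f}")
```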
Example: Omitted variable (self-selection) bias
Consider the simple case of one included independent variable, with GM assumptions holding:
$$y_i = x_i\beta + w_i\gamma + \epsilon_i$$
If the econometrician does not observe the variable $w_i$ and only considers the model
$$y_i = x_i\beta + \varepsilon_i$$
where $\varepsilon_i = w_i\gamma + \epsilon_i$, then:
$$\hat\beta = \frac{\operatorname{cov}[x_i, y_i]}{\operatorname{var}[x_i]} = \frac{\operatorname{cov}[x_i, x_i\beta + w_i\gamma + \epsilon_i]}{\operatorname{var}[x_i]}$$
$$\operatorname{plim}\hat\beta = \beta + \underbrace{\gamma\,\frac{\operatorname{cov}[x_i, w_i]}{\operatorname{var}[x_i]}}_{\text{omitted variable bias}}$$
OLS will be biased whenever $\gamma \neq 0$ and $\operatorname{cov}[x_i, w_i] \neq 0$.
The bias depends on $\gamma$ and on the covariance between the included and excluded independent variables.
This can be caused by omitted relevant variables, or by self-selection mechanisms that are ignored by the econometrician.
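The omitted variable bias formula can be illustrated with simulated data. This is a sketch under assumed values: `delta` (the dependence of $x_i$ on $w_i$), the coefficients and the helper `cov` are hypothetical and chosen only so that the bias is easy to read off.

```python
import random

random.seed(2022)
n = 100_000
beta, gamma = 1.0, 2.0   # hypothetical true coefficients
delta = 0.5              # hypothetical dependence of x on the omitted w

w = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [delta * wi + random.gauss(0.0, 1.0) for wi in w]
y = [beta * xi + gamma * wi + random.gauss(0.0, 1.0) for xi, wi in zip(x, w)]

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

beta_hat = cov(x, y) / cov(x, x)          # regression of y on x only

# Population values implied by the design: cov(x, w) = delta, var(x) = delta^2 + 1.
plim = beta + gamma * delta / (delta ** 2 + 1.0)
print(f"OLS estimate = {beta_hat:.3f}, beta + bias = {plim:.3f}")
```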

An alternative estimator
Consider the simple model
$$y_i = \beta_1 + \beta_2 x_i + \epsilon_i$$
where $E(\epsilon_i x_i) \neq 0$, so OLS is inconsistent.
Now, suppose we can find an instrumental variable $z_i$ satisfying:
Exogeneity: $E(\epsilon_i z_i) = 0$ (instrument uncorrelated with the error term)
Relevance: $\operatorname{cov}(x_i, z_i) \neq 0$ (instrument correlated with the endogenous regressor)

An alternative estimator: Graphical Intuition
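Imagine that each box represents the variability of each variable. Our problem is that variation in $\epsilon_i$ is correlated with variation in $x_i$.
Two approaches to solve this problem:
In an OLS setting we would try to come up with variables $w_i$ to account for such correlation. Sometimes this is impossible.
In an IV setting we try to find a variable (instrument) that is sufficiently correlated with $x_i$, but only through the valid variation, i.e. the part of the variation in $x_i$ that is uncorrelated with $\epsilon_i$.
Take the covariance with $z_i$ on both sides of
$$y_i = \beta_1 + \beta_2 x_i + \epsilon_i$$
to obtain
$$\operatorname{cov}(y_i, z_i) = \beta_2\operatorname{cov}(x_i, z_i) + \operatorname{cov}(\epsilon_i, z_i)$$
and thus, since $\operatorname{cov}(\epsilon_i, z_i) = 0$ for a valid instrument,
$$\beta_2 = \frac{\operatorname{cov}(y_i, z_i)}{\operatorname{cov}(x_i, z_i)}$$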

How can we build an estimator for $\beta_2$?
Replace the population covariances by the sample covariances:
$$\hat\beta_{2,IV} = \frac{\frac{1}{N}\sum_{i=1}^N (y_i - \bar y)(z_i - \bar z)}{\frac{1}{N}\sum_{i=1}^N (x_i - \bar x)(z_i - \bar z)} = \frac{\sum_{i=1}^N (y_i - \bar y)(z_i - \bar z)}{\sum_{i=1}^N (x_i - \bar x)(z_i - \bar z)}$$
This is an instrumental variables estimator.
Note that this reduces to OLS if $z_i = x_i$.
$\hat\beta_{2,IV}$ is a consistent estimator for $\beta_2$ provided the instruments are valid.
In general we cannot show unbiasedness of the IV estimator (small sample properties are unknown).
The columns of $X$ that contain endogenous variables are replaced by instrumental variables.
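The ratio-of-covariances estimator can be sketched in a few lines. The data generating process below is an assumption made for illustration: $x_i$ depends on the instrument $z_i$ and also on the error $\epsilon_i$, so OLS is inconsistent while the IV estimator is not.

```python
import random

random.seed(7)
n = 100_000
beta1, beta2 = 1.0, 2.0   # hypothetical true parameters

z = [random.gauss(0.0, 1.0) for _ in range(n)]      # instrument
eps = [random.gauss(0.0, 1.0) for _ in range(n)]    # structural error
# x is endogenous: it loads on eps as well as on the instrument.
x = [zi + 0.8 * ei + random.gauss(0.0, 1.0) for zi, ei in zip(z, eps)]
y = [beta1 + beta2 * xi + ei for xi, ei in zip(x, eps)]

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

b_ols = cov(x, y) / cov(x, x)   # inconsistent because cov(x, eps) != 0
b_iv = cov(y, z) / cov(x, z)    # sample analogue of cov(y, z) / cov(x, z)
print(f"OLS = {b_ols:.3f}, IV = {b_iv:.3f}, true beta2 = {beta2}")
```

Setting `z = x` in the last ratio reproduces the OLS slope, which is the sense in which IV reduces to OLS.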

The more general case - one IV per endogenous variable
Consider the model
$$y_i = x_i'\beta + \epsilon_i$$
where $E(\epsilon_i x_i) \neq 0$ for some elements of $x_i$ and the $\epsilon_i$ satisfy the usual conditions.
Suppose we can find a vector of instruments $z_i$ having the same dimension as $x_i$ such that
$$E(\epsilon_i z_i) = 0$$
Exogenous variables are instrumented by themselves.
Working in matrix terms,
$$Y = X\beta + \epsilon$$
and $Z$ is the matrix of instruments (one for each variable in $X$). Then the IV estimator is given by
$$\hat\beta_{IV} = (Z'X)^{-1}Z'Y$$
This estimator is consistent and asymptotically normally distributed.

The (asymptotic) variance-covariance matrix can be estimated by
$$\hat V(\hat\beta_{IV}) = \hat\sigma^2 (Z'X)^{-1} Z'Z (X'Z)^{-1}$$
with
$$\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^N (y_i - x_i'\hat\beta_{IV})^2$$
Since the results are valid asymptotically, it does not matter whether we correct for degrees of freedom.
Standard errors of IV estimators are typically quite high compared to OLS. This is usually due to low correlation between the instrument and the regressor.
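The link between instrument strength and precision can be made explicit in a scalar special case. The expression below assumes a single regressor and a single instrument, both measured in deviations from their means, and writes $r_{xz}$ for their sample correlation (assumptions and notation introduced here for illustration):

```latex
\hat V(\hat\beta_{IV})
  = \hat\sigma^2\,\frac{\sum_i z_i^2}{\left(\sum_i x_i z_i\right)^2}
  = \frac{\hat\sigma^2}{\sum_i x_i^2}\cdot\frac{1}{r_{xz}^2},
\qquad
r_{xz}^2 = \frac{\left(\sum_i x_i z_i\right)^2}{\sum_i x_i^2\,\sum_i z_i^2}
```

The first factor has the same form as the usual OLS variance expression, so in this case the estimated variance is inflated by $1/r_{xz}^2$: the weaker the correlation between instrument and regressor, the larger the standard error.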

The Generalized Instrumental Variable estimator
Consider the linear model
$$y_i = x_i'\beta + \epsilon_i$$
where $\beta$ is a $K$-dimensional vector of parameters.
For OLS to be consistent it must be true that $E(\epsilon_i x_i) = 0$.
If $E(\epsilon_i x_i) \neq 0$ then OLS is inconsistent.
The model is then unidentified: in order to identify it we need to impose alternative assumptions.
Identification is obtained if we can find an $R$-dimensional vector of (relevant) instruments $z_i$ such that $E(\epsilon_i z_i) = 0$.
The conditions
$$E(\epsilon_i z_i) = E((y_i - x_i'\beta)z_i) = 0$$
are moment conditions.

These $R$ moment conditions can be used to estimate $\beta$.
Simply replace the expectations by sample averages. That is, replace
$$E((y_i - x_i'\beta)z_i) = 0$$
by
$$\frac{1}{N}\sum_{i=1}^N (y_i - x_i'\beta)z_i = 0$$
Next, choose an estimate for $\beta$ that makes the sample averages as close to 0 as possible.
Why? Sample averages converge to population means as $N$ becomes infinitely large, and the population mean is zero (only) for the true parameter values.

But note that
If $R < K$ we do not have enough instruments. There is an infinite number of values for $\hat\beta_{IV}$ that satisfy the moment conditions. The model remains unidentified.
If $R = K$ there is (typically) one unique solution satisfying the moment conditions. That solution is
$$\hat\beta_{IV} = (Z'X)^{-1}Z'Y$$
If $R > K$ the model is overidentified. There are more instruments than necessary for identification. Rather than choosing a subset of instruments, we can exploit them all by minimizing a quadratic form in the sample moments
$$Q_N(\beta) = \left[\frac{1}{N}\sum_{i=1}^N (y_i - x_i'\beta)z_i\right]' W_N \left[\frac{1}{N}\sum_{i=1}^N (y_i - x_i'\beta)z_i\right]$$
where $W_N$ is an $R \times R$ positive definite weighting matrix.

The resulting estimator for $\beta$ is consistent for any choice of weighting matrix.
The optimal weighting matrix yields the most efficient estimator for $\beta$.
When the $\epsilon_i$ are i.i.d. with variance $\sigma^2$, for example $\epsilon_i \sim N(0, \sigma^2)$, the optimal weighting matrix is given by
$$W_N^{opt} = \left(\frac{1}{N}\sum_{i=1}^N z_i z_i'\right)^{-1}$$
In matrix terms, the resulting estimator is
$$\hat\beta_{IV} = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'Y$$
This is the generalized instrumental variable estimator (GIVE).
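The step from the quadratic form to the matrix expression can be filled in with the first-order condition. A brief sketch, using $\sum_i (y_i - x_i'\beta)z_i = Z'(Y - X\beta)$ and $\sum_i z_i z_i' = Z'Z$:

```latex
\begin{aligned}
Q_N(\beta) &= \frac{1}{N^2}\,(Y - X\beta)'Z\,W_N\,Z'(Y - X\beta) \\
\frac{\partial Q_N}{\partial \beta} &= -\frac{2}{N^2}\,X'Z\,W_N\,Z'(Y - X\beta) = 0 \\
\hat\beta &= (X'Z\,W_N\,Z'X)^{-1}X'Z\,W_N\,Z'Y
\end{aligned}
```

Substituting $W_N = \left(\frac{1}{N}Z'Z\right)^{-1}$, the factor $N$ cancels and the last line becomes $(X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'Y$.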
The GIVE estimator is consistent and asymptotically normal with estimated variance
$$\hat V(\hat\beta_{IV}) = \hat\sigma^2 (X'Z(Z'Z)^{-1}Z'X)^{-1}$$
where
$$\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^N (y_i - x_i'\hat\beta_{IV})^2$$

It is also known as the two-stage least squares (2SLS) estimator.
This is because $\hat\beta_{IV}$ can be obtained by OLS as follows:
$$\hat\beta_{IV} = (\hat X'\hat X)^{-1}\hat X'Y$$
where
$$\hat X = Z(Z'Z)^{-1}Z'X$$
In the first step we regress each endogenous variable on all instruments and exogenous variables.
In the second step we replace the endogenous variables by their predicted values and estimate the original equation by OLS.
Note that the standard errors reported by OLS in the second step are not correct, because they are based on residuals computed with $\hat X$ rather than $X$.
If the $\epsilon_i$ are not homoskedastic we can estimate a "robust" variance-covariance matrix using an approach similar to what we do in OLS.
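The two-step procedure and the GIVE formula can be compared numerically. The sketch below assumes one endogenous regressor, two instruments and mean-zero variables (so no constant is needed); the data generating process and the helpers `dot` and `solve_2x2` are illustrative assumptions. It also contrasts the residual variance computed with $X$ against the naive second-step one computed with $\hat X$.

```python
import random

def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def solve_2x2(m, rhs):
    """Solve the 2x2 linear system m @ p = rhs with Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]

random.seed(1)
n = 50_000
beta = 1.5   # hypothetical true coefficient

z1 = [random.gauss(0.0, 1.0) for _ in range(n)]
z2 = [random.gauss(0.0, 1.0) for _ in range(n)]
eps = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [0.7 * a + 0.5 * b + 0.8 * e + random.gauss(0.0, 1.0)
     for a, b, e in zip(z1, z2, eps)]
y = [beta * xi + e for xi, e in zip(x, eps)]

ztz = [[dot(z1, z1), dot(z1, z2)], [dot(z2, z1), dot(z2, z2)]]
ztx = [dot(z1, x), dot(z2, x)]
zty = [dot(z1, y), dot(z2, y)]

# Step 1: regress x on the instruments and form the fitted values.
pi_hat = solve_2x2(ztz, ztx)
x_hat = [pi_hat[0] * a + pi_hat[1] * b for a, b in zip(z1, z2)]

# Step 2: regress y on the fitted values.
beta_2sls = dot(x_hat, y) / dot(x_hat, x_hat)

# Direct GIVE formula: (x'Z (Z'Z)^-1 Z'x)^-1 x'Z (Z'Z)^-1 Z'y.
beta_give = dot(ztx, solve_2x2(ztz, zty)) / dot(ztx, pi_hat)

# Residual variance: correct (uses x) versus naive second step (uses x_hat).
s2_correct = sum((yi - beta_2sls * xi) ** 2 for yi, xi in zip(y, x)) / n
s2_naive = sum((yi - beta_2sls * xh) ** 2 for yi, xh in zip(y, x_hat)) / n
print(f"2SLS = {beta_2sls:.4f}, GIVE = {beta_give:.4f}")
print(f"sigma^2 with x = {s2_correct:.3f}, with x_hat = {s2_naive:.3f}")
```

The two estimates coincide up to rounding, while the naive residual variance differs from the correct one, which is why second-step OLS standard errors should not be used.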
Stata commands
The command to fit linear models with endogenous regressors is the ivregress command.
Suppose that x3 is endogenous and we want to instrument it with z1 and z2 using 2SLS. The command is:
ivregress 2sls y x1 x2 (x3 = z1 z2)
If both x2 and x3 are endogenous and z1 and z2 are the available instruments, the command is:
ivregress 2sls y x1 (x2 x3 = z1 z2)

Additional remarks
Finding instruments is hard and statistical theory is of little help.

Instruments should be motivated by economic arguments
Instruments should be exogenous, i.e. uncorrelated with the equation’s
error term
They should also be relevant, i.e. correlated with the regressors that
they are supposed to be instrumenting
This means that in the reduced form, where we explain xi from zi, the instruments should be “sufficiently important”. Otherwise, we may have a weak instruments problem
Stock and Watson propose a simple rule-of-thumb: If the F-statistic for
(joint) significance of the instruments in the first stage regression is
above 10 we do not need to worry about weak instruments
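The rule of thumb can be illustrated with a first-stage F-statistic computed by hand. This sketch assumes a single instrument and a constant in the first stage, in which case $F = \frac{R^2}{1 - R^2}(N - 2)$; the function name `first_stage_f` and the simulated strong and weak designs are hypothetical.

```python
import random

def first_stage_f(x, z):
    """F-statistic for the slope in a regression of x on a constant and z."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    r2 = sxz ** 2 / (sxx * szz)          # R^2 of the first-stage regression
    return r2 / (1.0 - r2) * (n - 2)

random.seed(3)
n = 1_000
z = [random.gauss(0.0, 1.0) for _ in range(n)]
x_strong = [1.0 * zi + random.gauss(0.0, 1.0) for zi in z]    # relevant instrument
x_weak = [0.01 * zi + random.gauss(0.0, 1.0) for zi in z]     # nearly irrelevant

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"F (strong instrument) = {f_strong:.1f}")
print(f"F (weak instrument)   = {f_weak:.1f}")
```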

The Hausman (or Durbin-Wu-Hausman) test for endogeneity
We can test whether one or more regressors are endogenous (correlated
with the error term), if we are willing to assume that the instruments
are valid (assuming E(ϵi zi ) = 0 we can test whether E(ϵi xi ) = 0)
Under the null, both the OLS and IV estimator are consistent. They
should differ by sampling error only. Under the alternative, only the IV
estimator is consistent (and OLS is inconsistent)
A simple version of the test is obtained by running an auxiliary
regression, where we augment the original model with the residual(s)
from the reduced form equations
The auxiliary regression reproduces the IV estimator. Under the null
(xi is exogenous) – the added residual(s) should be irrelevant
The Hausman test for endogeneity is based on the t-statistic (or
F-statistic) on the reduced form residuals
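The auxiliary regression version of the test can be sketched as follows. The example assumes one endogenous regressor and one instrument; all variables are demeaned, which is equivalent to including a constant in every regression. The data generating process and helper names are illustrative assumptions.

```python
import random

def demean(v):
    """Subtract the sample mean from every element."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

random.seed(11)
n = 20_000
z = [random.gauss(0.0, 1.0) for _ in range(n)]
eps = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + 0.8 * ei + random.gauss(0.0, 1.0) for zi, ei in zip(z, eps)]  # endogenous
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, eps)]
x, y, z = demean(x), demean(y), demean(z)

# Reduced form: regress x on z and keep the residuals.
pi_hat = dot(z, x) / dot(z, z)
v_hat = [xi - pi_hat * zi for xi, zi in zip(x, z)]

# Auxiliary regression: y on x and v_hat (2x2 normal equations).
sxx, sxv, svv = dot(x, x), dot(x, v_hat), dot(v_hat, v_hat)
sxy, svy = dot(x, y), dot(v_hat, y)
det = sxx * svv - sxv ** 2
b_x = (sxy * svv - sxv * svy) / det      # coefficient on x
b_v = (sxx * svy - sxv * sxy) / det      # coefficient on the residual

resid = [yi - b_x * xi - b_v * vi for yi, xi, vi in zip(y, x, v_hat)]
s2 = dot(resid, resid) / (n - 3)         # three parameters including the constant
t_v = b_v / (s2 * sxx / det) ** 0.5      # t-statistic on the reduced form residual

b_iv = dot(z, y) / dot(z, x)             # simple IV estimator for comparison
print(f"coef on x = {b_x:.4f}, IV = {b_iv:.4f}, t on residual = {t_v:.1f}")
```

The coefficient on $x_i$ reproduces the IV estimate, and a large t-statistic on the residual leads to rejecting exogeneity in this simulated design.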
So the hypotheses of the test are:
$H_0$: $b_{OLS}$ is efficient and consistent, and $b_{IV}$ is consistent
$H_1$: $b_{OLS}$ is inconsistent, and $b_{IV}$ is consistent
The test statistic is:
$$(b_{OLS} - b_{IV})'\,(Var_{IV} - Var_{OLS})^{-1}\,(b_{OLS} - b_{IV}) \sim \chi^2_k$$
The intuition is that if the two sets of coefficients are sufficiently close, we are more likely not to reject the null hypothesis.

Overidentification restriction (Sargan) test
In the overidentified case, we can test the overidentifying restrictions.
We can do so by checking whether the sample moments
$$\frac{1}{N}\sum_{i=1}^N \hat\varepsilon_i z_i$$
are "close" to zero.

It is not possible to test whether instruments are valid (exogenous) if
they are needed to identify (=consistently estimate) the model. Thus,
in the exactly identified case (K = R) we cannot test the instruments.
We just have to believe them!
This is implemented using the overidentifying restrictions test
(Sargan test). The test follows a chi-squared distribution with R − K
degrees of freedom

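So the hypotheses of the test are:
$H_0$: $E[z_i\varepsilon_i] = 0$
$H_1$: $E[z_i\varepsilon_i] \neq 0$
The test statistic is:
$$N\left(\frac{1}{N}\sum_{i=1}^N \hat\varepsilon_i z_i\right)'\left(\hat\sigma^2\,\frac{1}{N}\sum_{i=1}^N z_i z_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^N \hat\varepsilon_i z_i\right) \sim \chi^2_{R-K}$$
where $\hat\sigma^2$ is the estimated error variance, so that the statistic is properly scaled under homoskedasticity.
A simple way to implement the test is by taking $N$ times the $R^2$ of an auxiliary regression of the IV residuals upon the full set of instruments.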
If the test rejects, the sample evidence is inconsistent with the joint
validity of the R moment conditions
It is possible to test for the validity of a subset of excluded instruments.
In this case we use a “difference-in-Sargan” test or C-test
The C-test is calculated as the difference between two Sargan tests, one computed from a regression that uses the full set of overidentifying restrictions, the other from a regression without the suspect instrument(s)
The test follows a chi-square with degrees of freedom equal to the
number of instruments in the subset
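The $N \times R^2$ implementation of the Sargan test can be sketched as follows. The example assumes one endogenous regressor and two instruments ($R - K = 1$), mean-zero variables (so no constant and an uncentered $R^2$), and a hypothetical parameter `gamma` that lets the second instrument enter the outcome equation directly, which makes it invalid when non-zero.

```python
import random

def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def solve_2x2(m, rhs):
    """Solve the 2x2 linear system m @ p = rhs with Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]

def simulate(n, gamma, rng):
    """Draw (y, x, z1, z2); z2 is an invalid instrument when gamma != 0."""
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.7 * a + 0.5 * b + 0.8 * e + rng.gauss(0.0, 1.0)
         for a, b, e in zip(z1, z2, eps)]
    y = [1.5 * xi + gamma * b + e for xi, b, e in zip(x, z2, eps)]
    return y, x, z1, z2

def sargan(y, x, z1, z2):
    """N times the uncentered R^2 of 2SLS residuals regressed on the instruments."""
    n = len(y)
    ztz = [[dot(z1, z1), dot(z1, z2)], [dot(z1, z2), dot(z2, z2)]]
    # 2SLS with one regressor and two instruments.
    pi_hat = solve_2x2(ztz, [dot(z1, x), dot(z2, x)])
    x_hat = [pi_hat[0] * a + pi_hat[1] * b for a, b in zip(z1, z2)]
    b_2sls = dot(x_hat, y) / dot(x_hat, x_hat)
    resid = [yi - b_2sls * xi for yi, xi in zip(y, x)]
    # Auxiliary regression of the residuals on the instruments.
    g = solve_2x2(ztz, [dot(z1, resid), dot(z2, resid)])
    fitted = [g[0] * a + g[1] * b for a, b in zip(z1, z2)]
    return n * dot(fitted, fitted) / dot(resid, resid)

rng = random.Random(5)
stat_valid = sargan(*simulate(20_000, 0.0, rng))     # both instruments valid
stat_invalid = sargan(*simulate(20_000, 0.5, rng))   # z2 violates exogeneity
crit = 3.84                                          # 5% critical value, chi-squared(1)
print(f"Sargan (valid) = {stat_valid:.2f}, Sargan (invalid) = {stat_invalid:.2f}")
```

A statistic above the critical value indicates that the sample evidence is inconsistent with the joint validity of the moment conditions, as in the second design.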

Stata commands
After running ivregress you can:

Obtain the F-statistic from the first stage regression with
estat firststage
Implement the Hausman test for endogeneity with
estat endogenous
Apply Sargan test for overidentification with
estat overid
