
2ME03 - Econometrics

Endogeneity, Instrumental Variables and GMM

October, 2022

Endogeneity, Instrumental Variables and GMM 2ME03 - Econometrics October, 2022 1 / 32


Endogenous regressors

Consider the linear regression model

$$y_i = x_i'\beta + \epsilon_i$$

So far we have assumed that the error term $\epsilon_i$ and the explanatory variables $x_i$ are contemporaneously uncorrelated, i.e. $E(\epsilon_i x_i) = 0$.
But if $E(\epsilon_i x_i) \neq 0$, the OLS estimator of $\beta$ is inconsistent.
When can we expect $E(\epsilon_i x_i) \neq 0$?



Example: autocorrelation with a lagged dependent variable

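Suppose that

$$y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + \epsilon_t$$

and that $\epsilon_t = \rho\epsilon_{t-1} + v_t$, where $v_t$ is white noise.
Now,

$$
\begin{aligned}
\operatorname{cov}(y_{t-1}, \epsilon_t) &= \operatorname{cov}(y_{t-1}, \rho\epsilon_{t-1} + v_t) \\
&= \operatorname{cov}(y_{t-1}, \rho\epsilon_{t-1}) + \operatorname{cov}(y_{t-1}, v_t) \\
&= \rho\operatorname{cov}(y_{t-1}, \epsilon_{t-1}) \\
&= \rho\operatorname{cov}(\beta_1 + \beta_2 x_{t-1} + \beta_3 y_{t-2} + \epsilon_{t-1}, \epsilon_{t-1}) \\
&= \rho\beta_3\operatorname{cov}(y_{t-2}, \epsilon_{t-1}) + \rho\sigma^2 \\
&= \rho\beta_3\operatorname{cov}(y_{t-1}, \epsilon_t) + \rho\sigma^2
\end{aligned}
$$

where $\sigma^2 = V(\epsilon_t)$, $x_t$ is exogenous, and the last step uses stationarity.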


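and thus

$$\operatorname{cov}(y_{t-1}, \epsilon_t) = \frac{\rho\sigma^2}{1 - \rho\beta_3}$$

Unless $\rho = 0$, this covariance is nonzero and the OLS estimators are biased and inconsistent.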
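The result can be checked by simulation. The sketch below is illustrative only: it assumes a stripped-down version of the model with $\beta_1 = \beta_2 = 0$ and $V(v_t) = 1$, and the function name `simulate_lagged_dv` is made up for this example. With $\beta_3 = \rho = 0.5$ the formula gives a covariance of about 0.89, and the OLS slope should settle near 0.8 rather than 0.5.

```python
import random


def simulate_lagged_dv(n_obs, beta3, rho, seed=0):
    """Simulate y_t = beta3*y_{t-1} + eps_t with eps_t = rho*eps_{t-1} + v_t.

    Returns the OLS slope from regressing y_t on y_{t-1} (no intercept)
    and the sample covariance between y_{t-1} and eps_t.
    """
    rng = random.Random(seed)
    y_prev, eps_prev = 0.0, 0.0
    sum_xy = sum_xx = sum_xe = 0.0
    burn_in = 500  # discard early draws so the start values do not matter
    for t in range(n_obs + burn_in):
        eps = rho * eps_prev + rng.gauss(0.0, 1.0)
        y = beta3 * y_prev + eps
        if t >= burn_in:
            sum_xy += y_prev * y
            sum_xx += y_prev * y_prev
            sum_xe += y_prev * eps  # both series have mean zero
        y_prev, eps_prev = y, eps
    return sum_xy / sum_xx, sum_xe / n_obs


if __name__ == "__main__":
    beta3, rho = 0.5, 0.5
    slope, cov_hat = simulate_lagged_dv(200_000, beta3, rho)
    sigma2 = 1.0 / (1.0 - rho ** 2)  # V(eps_t) when V(v_t) = 1
    print("OLS slope:", round(slope, 3), "true beta3:", beta3)
    print("sample cov:", round(cov_hat, 3),
          "formula:", round(rho * sigma2 / (1.0 - rho * beta3), 3))
```

Setting `rho=0` in the same function removes the correlation, and the slope returns to the neighbourhood of the true $\beta_3$.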



Example: simultaneity and reverse causality

Consider a Keynesian consumption function:

$$y_t = \beta_1 + \beta_2 x_t + \epsilon_t$$

where $y_t$ is aggregate consumption, $x_t$ is aggregate income and $\beta_2$ denotes the marginal propensity to consume.
But aggregate income is not exogenous because

$$x_t = y_t + z_t$$

where $z_t$ denotes investment.
This implies that income $x_t$ and the error term $\epsilon_t$ are correlated.


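This can be shown by deriving the reduced form, which describes $y_t$ and $x_t$ as functions of the exogenous variable(s) and error terms.
The reduced form is

$$x_t = \frac{\beta_1}{1-\beta_2} + \frac{1}{1-\beta_2} z_t + \frac{1}{1-\beta_2}\epsilon_t$$

$$y_t = \frac{\beta_1}{1-\beta_2} + \frac{\beta_2}{1-\beta_2} z_t + \frac{1}{1-\beta_2}\epsilon_t$$

And it follows that

$$\operatorname{cov}(x_t, \epsilon_t) = \frac{V(\epsilon_t)}{1-\beta_2} = \frac{\sigma^2}{1-\beta_2}$$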
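The algebra behind the reduced form can be sketched as follows, assuming $\beta_2 \neq 1$ and that investment $z_t$ is exogenous, i.e. $\operatorname{cov}(z_t, \epsilon_t) = 0$ (an assumption the slide leaves implicit). Substituting the consumption function into the income identity:

$$
\begin{aligned}
x_t &= y_t + z_t = \beta_1 + \beta_2 x_t + \epsilon_t + z_t \\
(1-\beta_2)\,x_t &= \beta_1 + z_t + \epsilon_t \\
x_t &= \frac{\beta_1 + z_t + \epsilon_t}{1-\beta_2} \\
\operatorname{cov}(x_t, \epsilon_t) &= \frac{\operatorname{cov}(z_t, \epsilon_t) + V(\epsilon_t)}{1-\beta_2} = \frac{\sigma^2}{1-\beta_2}
\end{aligned}
$$

With $0 < \beta_2 < 1$ this covariance is positive, so OLS would tend to overstate the marginal propensity to consume.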



Example: Measurement error
Consider the model, with the Gauss-Markov (GM) assumptions holding:

$$y_i = x_i^{*\prime}\beta + \epsilon_i$$

Suppose the independent variables $x_i^*$ are mismeasured in the dataset, so that the variable is observed with an error, even if that error is i.i.d. white noise:

$$x_i = x_i^* + u_i$$

where $u_i$ is such that:

$$E[u_i\epsilon_i] = 0$$
$$E[u_i x_i^*] = 0$$
$$u_i \overset{i.i.d.}{\sim} N(0, \sigma_u^2 I_{K\times K})$$

Under classical measurement error, as specified here, the coefficients on mismeasured variables are biased towards zero. This is often referred to as attenuation bias, and the effect is often called the iron law of econometrics.
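Consider the simple case of one independent variable, with the GM assumptions holding:

$$y_i = x_i^*\beta + \epsilon_i$$

Then the OLS estimator for $\beta$, if $x_i$ is used instead of $x_i^*$, is:

$$\hat\beta = \frac{\operatorname{cov}[x_i, y_i]}{\operatorname{var}[x_i]} = \frac{\operatorname{cov}[x_i^* + u_i,\ x_i^*\beta + \epsilon_i]}{\operatorname{var}[x_i]}$$

$$\operatorname{plim}\hat\beta = \beta\,\frac{\operatorname{var}[x_i^*]}{\operatorname{var}[x_i]} = \beta\underbrace{\left(\frac{\operatorname{var}[x_i^*]}{\operatorname{var}[x_i^*] + \operatorname{var}[u_i]}\right)}_{\text{attenuation bias}}$$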
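A small simulation makes the attenuation factor concrete. This is a sketch under assumed values, not part of the original slides: $\beta = 2$, $\operatorname{var}[x_i^*] = 1$ and $\operatorname{var}[u_i] = 1$, so the formula predicts a limit of $2 \times 1/(1+1) = 1$. The helper names are made up for the example.

```python
import random


def ols_slope(y, x):
    """Slope from a simple regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx


def attenuation_demo(n_obs=100_000, beta=2.0, sd_u=1.0, seed=1):
    """Return the OLS slope using the true regressor and using the noisy one."""
    rng = random.Random(seed)
    x_star = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]   # true regressor, variance 1
    y = [beta * xs + rng.gauss(0.0, 1.0) for xs in x_star]
    x_obs = [xs + rng.gauss(0.0, sd_u) for xs in x_star]   # observed with error u_i
    return ols_slope(y, x_star), ols_slope(y, x_obs)


if __name__ == "__main__":
    b_true, b_noisy = attenuation_demo()
    print("slope with x*:", round(b_true, 3))
    print("slope with mismeasured x:", round(b_noisy, 3))
```

Raising `sd_u` shrinks the second slope further towards zero, in line with the attenuation factor.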
Example: Omitted variable (self-selection) bias
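Consider the simple case of one included independent variable, with the GM assumptions holding:

$$y_i = x_i\beta + w_i\gamma + \epsilon_i$$

If the econometrician does not observe the variable $w_i$ and only considers the model

$$y_i = x_i\beta + \varepsilon_i$$

where $\varepsilon_i = w_i\gamma + \epsilon_i$, then:

$$\hat\beta = \frac{\operatorname{cov}[x_i, y_i]}{\operatorname{var}[x_i]} = \frac{\operatorname{cov}[x_i,\ x_i\beta + w_i\gamma + \epsilon_i]}{\operatorname{var}[x_i]}$$

$$\operatorname{plim}\hat\beta = \beta + \underbrace{\gamma\,\frac{\operatorname{cov}[x_i, w_i]}{\operatorname{var}[x_i]}}_{\text{omitted variable bias}}$$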

The OLS estimator is biased (and inconsistent) unless $\gamma = 0$ or $\operatorname{cov}[x_i, w_i] = 0$.
The bias depends on $\gamma$ and on the covariance between the included and excluded independent variables.
It can be caused by omitted relevant variables, or by self-selection mechanisms that are ignored by the econometrician.
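The omitted variable bias formula can be illustrated numerically. The sketch below uses made-up values ($\beta = 1$, $\gamma = 2$, $x_i = 0.5 w_i + \text{noise}$), for which $\operatorname{cov}[x_i, w_i]/\operatorname{var}[x_i] = 0.5/1.25 = 0.4$ and the predicted limit of the short regression is $1 + 2 \times 0.4 = 1.8$. Function names are hypothetical.

```python
import random


def sample_cov(a, b):
    """Sample covariance (divisor n) between two equally long lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b)) / n


def ovb_demo(n_obs=100_000, beta=1.0, gamma=2.0, seed=3):
    """Return the short-regression slope and the value implied by the bias formula."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]        # omitted variable
    x = [0.5 * wi + rng.gauss(0.0, 1.0) for wi in w]       # included, correlated with w
    y = [beta * xi + gamma * wi + rng.gauss(0.0, 1.0) for xi, wi in zip(x, w)]
    short_slope = sample_cov(x, y) / sample_cov(x, x)      # regression of y on x only
    implied = beta + gamma * sample_cov(x, w) / sample_cov(x, x)
    return short_slope, implied


if __name__ == "__main__":
    short_slope, implied = ovb_demo()
    print("short regression slope:", round(short_slope, 3))
    print("beta + gamma*cov(x,w)/var(x):", round(implied, 3))
```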



An alternative estimator

Consider the simple model

$$y_i = \beta_1 + \beta_2 x_i + \epsilon_i$$

where $E(\epsilon_i x_i) \neq 0$, so OLS is inconsistent.
Now, suppose we can find an instrumental variable $z_i$ satisfying:
Exogeneity: $E(\epsilon_i z_i) = 0$ (instrument uncorrelated with the error term)
Relevance: $\operatorname{cov}(x_i, z_i) \neq 0$ (instrument correlated with the endogenous regressor)



An alternative estimator: Graphical Intuition
Imagine that each box represents the variability of one variable. Our problem is that variation in $\epsilon_i$ is correlated with variation in $x_i$.

Two approaches to solve this problem:
In an OLS setting we would try to come up with control variables $w_i$ that account for this correlation. Sometimes this is impossible.
In an IV setting we try to find a variable (instrument) that is sufficiently correlated with $x_i$, but only through the valid variation, i.e. the part of the variation in $x_i$ that is uncorrelated with $\epsilon_i$.
An alternative estimator

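Take the covariance with $z_i$ on both sides of

$$y_i = \beta_1 + \beta_2 x_i + \epsilon_i$$

to obtain

$$\operatorname{cov}(y_i, z_i) = \beta_2\operatorname{cov}(x_i, z_i) + \operatorname{cov}(\epsilon_i, z_i)$$

and thus, since $\operatorname{cov}(\epsilon_i, z_i) = 0$ by exogeneity,

$$\beta_2 = \frac{\operatorname{cov}(y_i, z_i)}{\operatorname{cov}(x_i, z_i)}$$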


How can we build an estimator for $\beta_2$?
Replace the population covariances by the sample covariances:

$$\hat\beta_{2,IV} = \frac{\frac{1}{N}\sum_{i=1}^{N}(y_i - \bar y)(z_i - \bar z)}{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar x)(z_i - \bar z)} = \frac{\sum_{i=1}^{N}(y_i - \bar y)(z_i - \bar z)}{\sum_{i=1}^{N}(x_i - \bar x)(z_i - \bar z)}$$

This is an instrumental variables estimator.
Note that this reduces to OLS if $z_i = x_i$.
$\hat\beta_{2,IV}$ is a consistent estimator for $\beta_2$ provided the instruments are valid.
In general we cannot show unbiasedness of the IV estimator (small sample properties are unknown).
The columns of $X$ that contain endogenous variables are replaced by instrumental variables.
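The covariance-ratio estimator is short enough to compute by hand. The following sketch uses an assumed data-generating process ($\beta_1 = 1$, $\beta_2 = 2$, $x_i = z_i + \epsilon_i$, all shocks standard normal), under which OLS converges to $2 + 1/2 = 2.5$ while the IV estimator converges to 2. The function names are illustrative.

```python
import random


def cov_ratio(y, x, z):
    """Sample cov(y, z) / cov(x, z); passing z = x gives the OLS slope."""
    n = len(y)
    y_bar, x_bar, z_bar = sum(y) / n, sum(x) / n, sum(z) / n
    s_yz = sum((a - y_bar) * (c - z_bar) for a, c in zip(y, z))
    s_xz = sum((b - x_bar) * (c - z_bar) for b, c in zip(x, z))
    return s_yz / s_xz


def iv_demo(n_obs=50_000, beta1=1.0, beta2=2.0, seed=2):
    """Return (OLS slope, IV slope) for a model with an endogenous regressor."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]     # instrument
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]   # structural error
    x = [zi + ei for zi, ei in zip(z, eps)]             # x depends on eps: endogenous
    y = [beta1 + beta2 * xi + ei for xi, ei in zip(x, eps)]
    return cov_ratio(y, x, x), cov_ratio(y, x, z)


if __name__ == "__main__":
    b_ols, b_iv = iv_demo()
    print("OLS slope:", round(b_ols, 3))
    print("IV slope:", round(b_iv, 3))
```

Calling `cov_ratio(y, x, x)` is exactly the case $z_i = x_i$ mentioned above, which is why the same function serves for both estimators.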



The more general case - one IV per endogenous variable

Consider the model

$$y_i = x_i'\beta + \epsilon_i$$

where $E(\epsilon_i x_i) \neq 0$ for some elements of $x_i$ and the $\epsilon_i$ satisfy the usual conditions.
Suppose we can find a vector of instruments $z_i$ having the same dimension as $x_i$ such that

$$E(\epsilon_i z_i) = 0$$

Exogenous variables are instrumented by themselves.


Working in matrix terms,

$$Y = X\beta + \epsilon$$

and $Z$ is the matrix of instruments (one for each variable in $X$).
Then the IV estimator is given by

$$\hat\beta_{IV} = (Z'X)^{-1}Z'Y$$

This estimator is consistent and asymptotically normally distributed.
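Why the estimator is consistent can be seen by substituting $Y = X\beta + \epsilon$ into the formula. This is only a sketch under the usual regularity conditions; the limit matrix $\Sigma_{zx} = E(z_i x_i')$ is notation introduced here rather than taken from the slides, and it is assumed to be invertible (the relevance condition).

$$
\begin{aligned}
\hat\beta_{IV} &= (Z'X)^{-1}Z'(X\beta + \epsilon) = \beta + \left(\tfrac{1}{N}Z'X\right)^{-1}\tfrac{1}{N}Z'\epsilon \\
\operatorname{plim}\tfrac{1}{N}Z'X &= \Sigma_{zx}, \qquad \operatorname{plim}\tfrac{1}{N}Z'\epsilon = E(\epsilon_i z_i) = 0 \\
\operatorname{plim}\hat\beta_{IV} &= \beta + \Sigma_{zx}^{-1}\cdot 0 = \beta
\end{aligned}
$$

The same substitution with $Z = X$ shows where OLS fails: the second term then converges to a matrix times $E(\epsilon_i x_i) \neq 0$.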


The (asymptotic) variance-covariance matrix can be estimated by

$$\hat V(\hat\beta_{IV}) = \hat\sigma^2 (Z'X)^{-1}Z'Z(X'Z)^{-1}$$

with

$$\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\hat\beta_{IV})^2$$

Since the results are valid asymptotically, it does not matter whether we correct for degrees of freedom.
Standard errors of IV estimators are typically quite high compared to OLS. This is usually due to low correlation between the instrument and the regressor.



The Generalized Instrumental Variable estimator

Consider the linear model

$$y_i = x_i'\beta + \epsilon_i$$

where $\beta$ is a K-dimensional vector of parameters.
For OLS to be consistent it must be true that $E(\epsilon_i x_i) = 0$.
If $E(\epsilon_i x_i) \neq 0$ then OLS is inconsistent.


Without further assumptions the model is not identified. To identify it we need to impose alternative assumptions.
Identification is obtained if we can find an R-dimensional vector of (relevant) instruments $z_i$ such that $E(\epsilon_i z_i) = 0$.
The conditions

$$E(\epsilon_i z_i) = E((y_i - x_i'\beta)z_i) = 0$$

are moment conditions.
These $R$ moment conditions can be used to estimate $\beta$.


Simply replace the expectations by sample averages. That is, replace

$$E((y_i - x_i'\beta)z_i) = 0$$

by

$$\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\beta)z_i = 0$$

Next, choose an estimate for $\beta$ that makes the sample averages as close to 0 as possible.
Why? Sample averages converge to population means as $N$ becomes infinitely large, and the population mean is zero (only) for the true parameter values.

But note that:
If $R < K$ we do not have enough instruments. There is an infinite number of values for $\beta$ that satisfy the sample moment conditions. The model remains unidentified.
If $R = K$ there is (typically) one unique solution satisfying the moment conditions. That solution is

$$\hat\beta_{IV} = (Z'X)^{-1}Z'Y$$

If $R > K$ the model is overidentified. There are more instruments than necessary for identification. Rather than choosing a subset of instruments, we can exploit them all by minimizing a quadratic form in the sample moments

$$Q_N(\beta) = \left[\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\beta)z_i\right]' W_N \left[\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\beta)z_i\right]$$

where $W_N$ is an $R \times R$ positive definite weighting matrix.



The resulting estimator for $\beta$ is consistent for any choice of weighting matrix.
The optimal weighting matrix yields the most efficient estimator for $\beta$.
When $\epsilon_i \sim N(0, \sigma^2)$ the optimal weighting matrix is given by

$$W_N^{opt} = \left(\frac{1}{N}\sum_{i=1}^{N} z_i z_i'\right)^{-1}$$

In matrix terms, the resulting estimator is

$$\hat\beta_{IV} = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'Y$$
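The step from the quadratic form to the matrix formula can be sketched through the first-order condition. This is an outline rather than a full proof; it writes the sample moments as $\frac{1}{N}Z'(Y - X\beta)$ and assumes $X'ZW_NZ'X$ is invertible.

$$
\begin{aligned}
Q_N(\beta) &= \frac{1}{N^2}(Y - X\beta)'Z\,W_N\,Z'(Y - X\beta) \\
\frac{\partial Q_N}{\partial\beta} &= -\frac{2}{N^2}X'Z\,W_N\,Z'(Y - X\beta) = 0 \\
\hat\beta &= (X'Z\,W_N\,Z'X)^{-1}X'Z\,W_N\,Z'Y
\end{aligned}
$$

Setting $W_N = \left(\frac{1}{N}Z'Z\right)^{-1}$ makes the factors of $N$ cancel and gives the expression above. When $R = K$ the matrix $Z'X$ is square, so $W_N$ drops out altogether and the solution collapses to $(Z'X)^{-1}Z'Y$.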


This is the generalized instrumental variable estimator (GIVE).
The GIVE is consistent and asymptotically normal with estimated variance

$$\hat V(\hat\beta_{IV}) = \hat\sigma^2 (X'Z(Z'Z)^{-1}Z'X)^{-1}$$

where

$$\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\hat\beta_{IV})^2$$

It is also known as the two-stage least squares (2SLS) estimator.
This is because $\hat\beta_{IV}$ can be obtained by OLS as follows:

$$\hat\beta_{IV} = (\hat X'\hat X)^{-1}\hat X'Y$$

where

$$\hat X = Z(Z'Z)^{-1}Z'X$$

In the first step we regress each endogenous variable on all instruments and exogenous variables.
In the second step we replace the endogenous variables by their predicted values and estimate the model by OLS.
Note that the standard errors reported by the second-step OLS regression are not correct, because its residuals are computed with $\hat X$ rather than $X$.
If the $\epsilon_i$ are not homoskedastic we can estimate a "robust" variance-covariance matrix using an approach similar to what we do in OLS.
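The two steps can be written out for the smallest overidentified case. The sketch below makes several simplifying assumptions that are not in the slides: one endogenous regressor, two instruments, no intercept (all variables have mean zero), and a made-up data-generating process with true $\beta = 1.5$ and $V(\epsilon_i) = 1$. It also contrasts the correct residual variance with the one a naive second-step OLS would report.

```python
import random


def project(v, z1, z2):
    """Fitted values from an OLS regression of v on z1 and z2 (no intercept)."""
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    r1 = sum(a * c for a, c in zip(z1, v))
    r2 = sum(b * c for b, c in zip(z2, v))
    det = s11 * s22 - s12 * s12
    p1 = (r1 * s22 - s12 * r2) / det       # Cramer's rule for the 2x2 normal equations
    p2 = (s11 * r2 - s12 * r1) / det
    return [p1 * a + p2 * b for a, b in zip(z1, z2)]


def two_sls_demo(n_obs=20_000, beta=1.5, seed=4):
    """Return (2SLS estimate, correct sigma^2, naive second-step sigma^2)."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    x = [0.7 * a + 0.5 * b + 0.8 * e + rng.gauss(0.0, 1.0)
         for a, b, e in zip(z1, z2, eps)]                  # endogenous through eps
    y = [beta * xi + e for xi, e in zip(x, eps)]
    x_hat = project(x, z1, z2)                             # first stage
    b_2sls = sum(h * v for h, v in zip(x_hat, y)) / sum(h * h for h in x_hat)
    s2_correct = sum((v - b_2sls * xi) ** 2 for v, xi in zip(y, x)) / n_obs
    s2_naive = sum((v - b_2sls * h) ** 2 for v, h in zip(y, x_hat)) / n_obs
    return b_2sls, s2_correct, s2_naive


if __name__ == "__main__":
    b_2sls, s2_correct, s2_naive = two_sls_demo()
    print("2SLS estimate:", round(b_2sls, 3))
    print("sigma^2 from y - x*b:", round(s2_correct, 3))
    print("sigma^2 from y - x_hat*b:", round(s2_naive, 3))
```

The second residual variance is far too large in this design, which is one way to see why the second-step standard errors cannot be used as they are.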
Stata commands

The command to fit linear models with endogenous regressors is the ivregress command.
Suppose that x3 is endogenous and we want to instrument it with z1 and z2 using 2SLS. The command is:
ivregress 2sls y x1 x2 (x3 = z1 z2)
If both x2 and x3 are endogenous and are instrumented by z1 and z2, the command is:
ivregress 2sls y x1 (x2 x3 = z1 z2)



Additional remarks

Finding instruments is hard and statistical theory is of little help. Instruments should be motivated by economic arguments.
Instruments should be exogenous, i.e. uncorrelated with the equation's error term.
They should also be relevant, i.e. correlated with the regressors that they are supposed to be instrumenting.
This means that in the reduced form, where we explain $x_i$ from $z_i$, the instruments should be "sufficiently important". Otherwise, we may have a weak instruments problem.
Stock and Watson propose a simple rule of thumb: if the F-statistic for (joint) significance of the instruments in the first-stage regression is above 10, we do not need to worry about weak instruments.
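The rule of thumb is easy to try out for a single instrument, where the first-stage F-statistic equals $\frac{R^2}{1 - R^2}(N - 2)$. The sketch below is illustrative: the first-stage coefficients (1.0 for a strong instrument, 0.02 for a weak one) and the function names are assumptions made for the example.

```python
import random


def first_stage_f(x, z):
    """F-statistic for b = 0 in the first-stage regression x = a + b*z + v."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    s_zx = sum((a - z_bar) * (b - x_bar) for a, b in zip(z, x))
    s_zz = sum((a - z_bar) ** 2 for a in z)
    s_xx = sum((b - x_bar) ** 2 for b in x)
    r2 = s_zx ** 2 / (s_zz * s_xx)          # squared correlation = first-stage R^2
    return r2 / (1.0 - r2) * (n - 2)


def first_stage_demo(pi, n_obs=1_000, seed=5):
    """Simulate x = pi*z + v and return the first-stage F-statistic."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    x = [pi * zi + rng.gauss(0.0, 1.0) for zi in z]
    return first_stage_f(x, z)


if __name__ == "__main__":
    print("F with a strong instrument:", round(first_stage_demo(1.0), 1))
    print("F with a weak instrument:", round(first_stage_demo(0.02), 1))
```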



The Hausman (or Durbin-Wu-Hausman) test for
endogeneity
We can test whether one or more regressors are endogenous (correlated
with the error term), if we are willing to assume that the instruments
are valid (assuming E(ϵi zi ) = 0 we can test whether E(ϵi xi ) = 0)
Under the null, both the OLS and IV estimator are consistent. They
should differ by sampling error only. Under the alternative, only the IV
estimator is consistent (and OLS is inconsistent)
A simple version of the test is obtained by running an auxiliary
regression, where we augment the original model with the residual(s)
from the reduced form equations
The auxiliary regression reproduces the IV estimator. Under the null
($x_i$ is exogenous), the added residual(s) should be irrelevant
The Hausman test for endogeneity is based on the t-statistic (or
F-statistic) on the reduced form residuals

So the hypotheses of the test are:
$H_0$: $b_{OLS}$ is efficient and consistent, and $b_{IV}$ is consistent
$H_1$: $b_{OLS}$ is inconsistent, and $b_{IV}$ is consistent
The test statistic is, asymptotically under $H_0$:

$$(b_{OLS} - b_{IV})'\,(Var_{IV} - Var_{OLS})^{-1}\,(b_{OLS} - b_{IV}) \sim \chi^2_k$$

The intuition is that if the two sets of coefficients are sufficiently close, we are unlikely to reject the null hypothesis.
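For one regressor and one instrument the statistic is a scalar and can be computed directly. The sketch below is a simplified illustration, not the Stata implementation: it assumes a model without intercept, and it uses the same error-variance estimate (from the IV residuals) in both variances so that $Var_{IV} - Var_{OLS}$ cannot be negative. The data-generating process is made up; `endog` controls how strongly $x_i$ depends on $\epsilon_i$.

```python
import random


def hausman_statistic(y, x, z):
    """Scalar Hausman statistic for y = beta*x + eps with a single instrument z."""
    n = len(y)
    s_xx = sum(a * a for a in x)
    s_zz = sum(a * a for a in z)
    s_zx = sum(a * b for a, b in zip(z, x))
    b_ols = sum(a * b for a, b in zip(x, y)) / s_xx
    b_iv = sum(a * b for a, b in zip(z, y)) / s_zx
    # Assumption: one sigma^2 estimate, based on IV residuals, for both variances.
    sigma2 = sum((b - b_iv * a) ** 2 for a, b in zip(x, y)) / n
    var_ols = sigma2 / s_xx
    var_iv = sigma2 * s_zz / s_zx ** 2
    return (b_ols - b_iv) ** 2 / (var_iv - var_ols)


def hausman_demo(endog, n_obs=5_000, beta=2.0, seed=6):
    """Simulate x = z + e + endog*eps, y = beta*x + eps, and return the statistic."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    x = [zi + rng.gauss(0.0, 1.0) + endog * ei for zi, ei in zip(z, eps)]
    y = [beta * xi + ei for xi, ei in zip(x, eps)]
    return hausman_statistic(y, x, z)


if __name__ == "__main__":
    critical_5pct = 3.84  # chi-squared(1) critical value at the 5% level
    print("endogenous x:", round(hausman_demo(0.5), 2), "vs", critical_5pct)
    print("exogenous x:", round(hausman_demo(0.0), 2), "vs", critical_5pct)
```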



Overidentification restriction (Sargan) test

In the overidentified case, we can test the overidentifying restrictions.
We can do so by checking whether the sample moments

$$\frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i z_i$$

are "close" to zero.
It is not possible to test whether instruments are valid (exogenous) if they are needed to identify (i.e. consistently estimate) the model. Thus, in the exactly identified case ($K = R$) we cannot test the instruments. We just have to believe them!
This is implemented using the overidentifying restrictions test (Sargan test). Under the null, the test statistic follows a chi-squared distribution with $R - K$ degrees of freedom.


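So the hypotheses of the test are:
$H_0: E[z_i\varepsilon_i] = 0$
$H_1: E[z_i\varepsilon_i] \neq 0$
The test statistic is:

$$N\left(\frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i z_i\right)'\left(\hat\sigma^2\,\frac{1}{N}\sum_{i=1}^{N} z_i z_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i z_i\right) \sim \chi^2_{R-K}$$

where $\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i^2$.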


A simple way to implement the test is by taking $N$ times the $R^2$ of an auxiliary regression of the IV residuals upon the full set of instruments.
If the test rejects, the sample evidence is inconsistent with the joint validity of the $R$ moment conditions.
It is possible to test for the validity of a subset of excluded instruments. In this case we use a "difference-in-Sargan" test or C-test.
The C-test is calculated as the difference between two Sargan tests: one computed from a regression that uses the full set of overidentifying restrictions, the other without the suspect instrument(s).
The test statistic follows a chi-squared distribution with degrees of freedom equal to the number of instruments in the subset.
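The $N \times R^2$ version of the Sargan test can be sketched for one endogenous regressor and two instruments, so that $R - K = 1$. The assumptions here go beyond the slides: there is no intercept (hence the uncentered $R^2$), and the data-generating process is invented, with `direct_effect` letting the second instrument enter the outcome equation directly and thereby become invalid.

```python
import random


def project(v, z1, z2):
    """Fitted values from an OLS regression of v on z1 and z2 (no intercept)."""
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    r1 = sum(a * c for a, c in zip(z1, v))
    r2 = sum(b * c for b, c in zip(z2, v))
    det = s11 * s22 - s12 * s12
    p1 = (r1 * s22 - s12 * r2) / det
    p2 = (s11 * r2 - s12 * r1) / det
    return [p1 * a + p2 * b for a, b in zip(z1, z2)]


def sargan_statistic(y, x, z1, z2):
    """N times the uncentered R^2 from regressing 2SLS residuals on the instruments."""
    x_hat = project(x, z1, z2)
    b_2sls = sum(h * v for h, v in zip(x_hat, y)) / sum(h * h for h in x_hat)
    resid = [v - b_2sls * xi for v, xi in zip(y, x)]       # residuals use x, not x_hat
    fitted = project(resid, z1, z2)
    r2 = sum(f * f for f in fitted) / sum(e * e for e in resid)
    return len(y) * r2


def sargan_demo(direct_effect, n_obs=5_000, beta=1.5, seed=7):
    """Simulate the model; direct_effect != 0 makes instrument z2 invalid."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    x = [0.7 * a + 0.5 * b + 0.8 * e + rng.gauss(0.0, 1.0)
         for a, b, e in zip(z1, z2, eps)]
    y = [beta * xi + direct_effect * b + e for xi, b, e in zip(x, z2, eps)]
    return sargan_statistic(y, x, z1, z2)


if __name__ == "__main__":
    critical_5pct = 3.84  # chi-squared(1) critical value at the 5% level
    print("valid instruments:", round(sargan_demo(0.0), 2), "vs", critical_5pct)
    print("z2 invalid:", round(sargan_demo(0.5), 2), "vs", critical_5pct)
```

A rejection says only that the instruments are not jointly valid; it does not say which one is at fault, which is what the C-test tries to address.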



Stata commands

After running ivregress you can:


Obtain the F-statistic from the first stage regression with
estat firststage
Implement the Hausman test for endogeneity with
estat endogenous
Apply Sargan test for overidentification with
estat overid
