INSTRUMENTAL VARIABLES REGRESSION MODEL

INTRODUCTION

We have seen that if the error term is correlated with an explanatory variable, then the OLS
estimator is biased in small samples and inconsistent in large samples. This means that even in
large samples the OLS estimator may not produce an estimate that is close to the true value of the
population parameter being estimated. As a result, an empirical study that estimates a linear
regression model using the OLS estimator when the error term is correlated with an explanatory
variable is not internally valid.

Three Major Sources of Correlation Between the Error Term and an Explanatory Variable

The three most important sources of correlation between the error term and an explanatory variable are the following: (1) a confounding variable; (2) reverse causation; (3) measurement error in an explanatory variable. The bias caused by a confounding variable can be corrected by including it as an explanatory variable in the model, if it is observable, or by specifying and estimating a fixed effects regression model if it is unobservable and differs across units but is constant over time. However, these methods do not work if the confounding variable is unobservable and varies both across units and over time. Also, these methods do not work for reverse causation or measurement error in an explanatory variable.

INSTRUMENTAL VARIABLE (IV) REGRESSION MODEL

When the error term is correlated with an explanatory variable, it is not possible to find an
estimator that is unbiased in small samples. However, it is possible to find an estimator that is
consistent in large samples. To obtain consistent estimates in large samples, we can specify an
instrumental variable (IV) regression model, and use an instrumental variable (IV) estimator.

IV REGRESSION MODEL WITH ONE EXPLANATORY VARIABLE AND ONE INSTRUMENTAL VARIABLE

The IV regression model with one explanatory variable is

Yt = α + βXt + εt

The IV regression model allows the error term to be correlated with the explanatory variable, and therefore the error term to have a non-constant, non-zero mean. That is, Corr(εt, Xt) ≠ 0 and E(εt | Xt) ≠ 0. The remaining assumptions are the same as those of the MCLRM. Any variable correlated with
the error term is called an endogenous variable. Any variable uncorrelated with the error term is
called an exogenous variable.
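
To make the consequence of Corr(εt, Xt) ≠ 0 concrete, the following Python sketch simulates data in which a common shock drives both X and the error term. All variable names and parameter values are illustrative assumptions, not part of the notes; the point is only that the OLS slope settles near 2.5 instead of the true value 2, no matter how large the sample.

import numpy as np

# Minimal simulation sketch: the regressor X is correlated with the error term,
# so OLS is biased and inconsistent.  Parameter values are illustrative.
rng = np.random.default_rng(0)
n = 100_000
alpha, beta = 1.0, 2.0                  # true population parameters

u = rng.normal(size=n)                  # common shock shared by X and the error
eps = u + rng.normal(size=n)            # error term
X = u + rng.normal(size=n)              # explanatory variable; Corr(eps, X) != 0
Y = alpha + beta * X + eps

# OLS slope estimate: Cov(X, Y) / Var(X)
beta_ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
print(f"true beta = {beta:.2f}, OLS estimate = {beta_ols:.2f}")
# The estimate stays near 2.5 because plim(beta_ols) = beta + Cov(X, eps)/Var(X) = 2 + 1/2.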

Estimation

To use the sample of data to obtain estimates of the parameters of the IV regression model, we
use an IV estimator. To use an IV estimator, you must have one or more valid instrumental
variables. An instrumental variable is also called an instrument. We will designate an
instrumental variable as I.

Instrumental Variable
A valid instrumental variable, I, has two properties.

1. Instrument Relevance - An instrumental variable, I, is relevant if it is correlated with the endogenous variable X. That is, Corr(It, Xt) ≠ 0.
2. Instrument Exogeneity - An instrumental variable, I, is exogenous if it is uncorrelated with the error term ε. That is, Corr(It, εt) = 0.

Two-Stage Least Squares (2SLS) Estimator

The most often used IV estimator is the two-stage least squares (2SLS) estimator. It involves two
stages.

Stage #1: Regress X on I using the OLS estimator. Save the predicted values X̂.
Stage #2: Regress Y on the predicted values X̂ using the OLS estimator.

The 2SLS estimator of the slope parameter β is also given by the following formula.

β̂2SLS = Cov(I, Y) / Cov(I, X)

where Cov(I, Y) is the sample covariance between I and Y, and Cov(I, X) is the sample covariance between I and X.
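
The Python sketch below continues the simulated example above, now adding an instrument I that is correlated with X but uncorrelated with the error term (the instrument and its coefficient of 0.8 are illustrative assumptions). It runs the two stages with OLS and confirms that the covariance-ratio formula gives the same slope estimate, close to the true value 2.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha, beta = 1.0, 2.0

u = rng.normal(size=n)
eps = u + rng.normal(size=n)                    # error term correlated with X
I = rng.normal(size=n)                          # instrument: exogenous by construction
X = u + 0.8 * I + rng.normal(size=n)            # relevant: Corr(I, X) != 0
Y = alpha + beta * X + eps

# Stage 1: regress X on I (with a constant) by OLS and save the fitted values X_hat.
W1 = np.column_stack([np.ones(n), I])
X_hat = W1 @ np.linalg.lstsq(W1, X, rcond=None)[0]

# Stage 2: regress Y on X_hat (with a constant) by OLS.
W2 = np.column_stack([np.ones(n), X_hat])
alpha_hat, beta_2sls = np.linalg.lstsq(W2, Y, rcond=None)[0]

# Equivalent formula for one instrument: Cov(I, Y) / Cov(I, X)
beta_iv = np.cov(I, Y)[0, 1] / np.cov(I, X)[0, 1]
print(beta_2sls, beta_iv)                       # both close to 2.0

With a single instrument, the two numbers are not just close but numerically identical, which is why the covariance-ratio formula and the two-stage procedure are interchangeable descriptions of the same estimator.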

Sampling Distribution of the 2SLS Estimator

The large-sample (asymptotic) sampling distribution of the 2SLS estimator β̂2SLS is given by

β̂2SLS ~ N(β, Variance), where Variance = (1/n){Var[(I − E(I))ε] / [Cov(I, X)]²}

This indicates that the 2SLS estimator has an approximate normal distribution in large samples.
Also, as the sample size increases, the sampling distribution of β̂2SLS collapses to the true value β.
Therefore, in large samples the 2SLS estimator should produce an estimate that is close to the
true value of the population parameter.

2SLS Estimator and Estimated Standard Errors

To obtain a correct estimate of the standard error of the estimate, we must use the residuals computed with the original explanatory variable, ε̂t = Yt − α̂ − β̂Xt, not the residuals from the second-stage regression, Yt − α̂ − β̂X̂t.
Statistical programs with a 2SLS command will calculate the correct standard errors for you.
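
As an illustration of this point (continuing the Python sketch above, which defined Y, X, I, X_hat, alpha_hat, and beta_2sls), the lines below compute the correct residuals with the original X and use them in the asymptotic variance formula given earlier. This is only a sketch of what a 2SLS command does internally, not the exact computation of any particular package.

# Correct residuals: evaluate the structural equation at the ORIGINAL X, not at X_hat.
eps_hat = Y - alpha_hat - beta_2sls * X          # use these for standard errors
eps_wrong = Y - alpha_hat - beta_2sls * X_hat    # second-stage residuals: do not use

# Estimated asymptotic variance: (1/n) * Var[(I - E(I)) * eps] / [Cov(I, X)]^2
num = np.var((I - I.mean()) * eps_hat, ddof=1)
den = np.cov(I, X)[0, 1] ** 2
se_beta_2sls = np.sqrt(num / den / n)
print(se_beta_2sls)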

IV REGRESSION MODEL WITH ONE EXPLANATORY VARIABLE AND TWO OR MORE INSTRUMENTAL VARIABLES

Suppose that we have m instrumental variables, I1, I2, …, Im. If m = 0, then the regression coefficients α and β are said to be underidentified. If m = 1, then the regression coefficients α and β are said to be exactly identified. If m > 1, then the regression coefficients α and β are said to be overidentified. The IV estimator can be used to obtain estimates if α and β are exactly identified or overidentified. It cannot be used if α and β are underidentified.

2SLS Estimator

The 2SLS estimator now involves the following two stages.


Stage #1: Regress X on I1, I2, …, Im using the OLS estimator. Save the predicted values X̂.
Stage #2: Regress Y on the predicted value variable X̂ using the OLS estimator.

If all instrumental variables are relevant and exogenous, then the 2SLS estimator has a normal
distribution and is consistent in large samples.

IV REGRESSION MODEL WITH TWO OR MORE EXPLANATORY VARIABLES AND TWO OR MORE INSTRUMENTAL VARIABLES

The IV regression model with two or more explanatory variables and m instrumental variables is

Yt = β1 + β2Xt2 + β3Zt1 + … + β2+rZtr + εt

where Xt2 is the endogenous explanatory variable, Zt1, …, Ztr are r exogenous explanatory variables, and I1, I2, …, Im are m instrumental variables.

2SLS Estimator

The 2SLS estimator now involves the following two stages.

Stage #1: Regress X on I1, I2, …, Im and Z1, Z2, …, Zr using the OLS estimator. Save the predicted values X̂.
Stage #2: Regress Y on the predicted value variable X̂ and the exogenous explanatory variables Z1, Z2, …, Zr using the OLS estimator.

If I1, I2, …, Im are relevant and exogenous, and Z1, Z2, …, Zr are exogenous, then the 2SLS estimator has a normal distribution and is consistent in large samples.
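
The following Python sketch wraps these two stages in a small function for the case of one endogenous regressor X, r exogenous regressors Z, and m instruments I. The function name and array shapes are illustrative assumptions; the essential point is that the exogenous explanatory variables appear in both stages.

import numpy as np

def two_stage_least_squares(Y, X, Z, I):
    """Sketch of 2SLS: Y (n,), X (n,) endogenous, Z (n, r) exogenous, I (n, m) instruments."""
    n = Y.shape[0]
    const = np.ones((n, 1))
    # Stage 1: regress X on the instruments AND the exogenous regressors.
    W1 = np.column_stack([const, I, Z])
    X_hat = W1 @ np.linalg.lstsq(W1, X, rcond=None)[0]
    # Stage 2: regress Y on X_hat and the same exogenous regressors.
    W2 = np.column_stack([const, X_hat.reshape(-1, 1), Z])
    return np.linalg.lstsq(W2, Y, rcond=None)[0]   # [intercept, coef on X, coefs on Z]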

CHECKING THE VALIDITY OF INSTRUMENTAL VARIABLES

If the instrumental variables I1, I2, …, Im are all uncorrelated with the endogenous explanatory variable X, then they are not relevant. If the instrumental variables have a relatively low correlation with X, then they are said to be weak instruments. If the instruments are either not relevant or weak, then the 2SLS estimator will not have an approximately normal distribution and will be inconsistent in large samples. If any instrumental variable is correlated with the error term, then it is not exogenous. If any instrumental variable is not exogenous, then the 2SLS estimator will be inconsistent in large samples. If the 2SLS estimator is inconsistent, then it will not produce an estimate that is close to the true value of the population parameter, even if the sample size is large. Therefore, we should check the validity of our instrumental variable(s).

Checking Instrument Relevance

To check for instrument relevance, you calculate the F-statistic for the null hypothesis that the coefficients of the instrumental variables are all zero in the first-stage regression. An often-used rule of thumb is that an F-statistic of less than 10 indicates possibly weak instruments.
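
A minimal sketch of this check, written for the model of the previous section (one endogenous regressor X, exogenous regressors Z, instruments I; array names are placeholders): compare the first-stage regression with and without the instruments and form the usual F-statistic.

import numpy as np

def first_stage_F(X, Z, I):
    """X (n,) endogenous, Z (n, r) exogenous, I (n, m) instruments."""
    n, m = I.shape
    const = np.ones((n, 1))
    W_u = np.column_stack([const, Z, I])   # unrestricted: instruments included
    W_r = np.column_stack([const, Z])      # restricted: instruments excluded
    ssr = lambda W: np.sum((X - W @ np.linalg.lstsq(W, X, rcond=None)[0]) ** 2)
    ssr_u, ssr_r = ssr(W_u), ssr(W_r)
    F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - W_u.shape[1]))
    return F                               # rule of thumb: F < 10 suggests weak instruments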

Checking Instrument Exogeneity

If you have one instrumental variable, and therefore the regression coefficients are exactly
identified, then you cannot check for instrument exogeneity. If you have two or more instrumental
variables, and therefore the regression coefficients are overidentified, then you can do a test of the
overidentifying restrictions. This allows you to check if all instrumental variables are exogenous.
The null hypothesis is the hypothesis that all instrumental variables are exogenous. The
alternative hypothesis is that at least one of the instrumental variables is endogenous (i.e.,
correlated with the error term). The test is a Lagrange multiplier (LM) test and involves two steps. Step #1: Estimate the IV regression model using the 2SLS estimator. Save the 2SLS residuals ε̂t. Step #2: Regress the residuals ε̂t on the instrumental variables I1, I2, …, Im and the exogenous explanatory variables Z1, Z2, …, Zr using OLS. Use the R² statistic from this regression to calculate the LM test statistic: LM = nR², where n is the sample size. The LM statistic has an approximate chi-square distribution with m − 1 degrees of freedom.
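
A minimal sketch of this LM test, under the same placeholder names as before (eps_hat are the 2SLS residuals, I the instruments, Z the exogenous regressors); the chi-square p-value uses m − 1 degrees of freedom because there is one endogenous regressor.

import numpy as np
from scipy import stats

def overid_LM_test(eps_hat, I, Z):
    n, m = I.shape
    W = np.column_stack([np.ones(n), I, Z])
    fitted = W @ np.linalg.lstsq(W, eps_hat, rcond=None)[0]
    # R^2 from the auxiliary regression of the residuals on I and Z
    r2 = 1.0 - np.sum((eps_hat - fitted) ** 2) / np.sum((eps_hat - eps_hat.mean()) ** 2)
    LM = n * r2
    p_value = stats.chi2.sf(LM, df=m - 1)
    return LM, p_value

A small p-value leads you to reject the null hypothesis that all instrumental variables are exogenous.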

IV REGRESSION MODEL WITH TWO OR MORE ENDOGENOUS EXPLANATORY VARIABLES, TWO OR MORE EXOGENOUS EXPLANATORY VARIABLES, AND TWO OR MORE INSTRUMENTAL VARIABLES

The IV regression model with k − 1 endogenous explanatory variables, r exogenous explanatory variables, and m instrumental variables is

Yt = β1 + β2Xt2 + … + βkXtk + βk+1Zt1 + … + βk+rZtr + εt

where Xt2, …, Xtk are the k − 1 endogenous explanatory variables, Zt1, …, Ztr are the r exogenous explanatory variables, and I1, I2, …, Im are the m instrumental variables.

2SLS Estimator

The 2SLS estimator now involves the following two stages.

Stage #1: Regress each Xi on I1, I2, …, Im and Z1, Z2, …, Zr using the OLS estimator. Save the k − 1 predicted value variables X̂2, …, X̂k.
Stage #2: Regress Y on the predicted value variables X̂2, …, X̂k and the exogenous explanatory variables Z1, Z2, …, Zr using the OLS estimator.

If I1, I2, …, Im are relevant and exogenous, and Z1, Z2, …, Zr are exogenous, then the 2SLS estimator has a normal distribution and is consistent in large samples.
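
The general case can be written as one short function (a sketch with placeholder names): stage 1 produces fitted values for every endogenous regressor at once, and stage 2 is an OLS regression of Y on those fitted values and Z. This is algebraically the same as the one-step matrix formula β̂2SLS = (D′PW D)⁻¹ D′PW Y, where D collects the structural regressors and PW projects onto the instruments and exogenous regressors.

import numpy as np

def two_sls(Y, X_end, Z, I):
    """Y (n,), X_end (n, k-1) endogenous, Z (n, r) exogenous, I (n, m) instruments, m >= k - 1."""
    n = Y.shape[0]
    const = np.ones((n, 1))
    W = np.column_stack([const, I, Z])                # first-stage regressors
    # Stage 1: fitted values for all endogenous regressors in one pass.
    X_hat = W @ np.linalg.lstsq(W, X_end, rcond=None)[0]
    # Stage 2: regress Y on the fitted endogenous regressors and Z.
    D_hat = np.column_stack([const, X_hat, Z])
    return np.linalg.lstsq(D_hat, Y, rcond=None)[0]   # [beta_1, ..., beta_{k+r}]

Note that a routine like this returns only the coefficient estimates; as discussed earlier, the standard errors must be computed from residuals that use the original endogenous regressors, which is what a 2SLS command in a statistical package does for you.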

Underidentified, Exactly Identified, and Overidentified Regression Coefficients

The regression coefficients are underidentified if m < k − 1, exactly identified if m = k − 1, and overidentified if m > k − 1.

Checking the Validity of Instrumental Variables

Instrument Relevance

If k = 2, so that there is one endogenous explanatory variable, then you can use the first-stage F-statistic described above to check for instrument relevance. If k > 2, then the F-test cannot be used to check for instrument relevance.

Instrument Exogeneity
The Lagrange multiplier test can be used to test for instrument exogeneity (test the overidentifying restrictions) if m > k − 1. The LM test statistic has an approximate chi-square distribution with m − (k − 1) degrees of freedom.
