Chapter 01 Selection Problem

Introduction
This course is an introduction to applied microeconometric methods including difference-in-differences
estimation, instrumental variables, matching methods, regression discontinuity
designs, and panel data methods. We apply the “micro” prefix to econometrics to imply the
use of individual-level or firm-level survey or administrative data either of the cross-sectional
or panel variety. I’ve written these notes to appeal to a practitioner (like me) rather than a
theoretical econometrician (like the people who developed all the estimators we will cover). I
emphasize the importance of a strong identification strategy in estimating causal effects with
some policy relevance. I also emphasize effective data visualization and displaying regression
results in professional-looking tables.
Learning to use Stata effectively is a vital skill for empirical researchers in any field of
economics. We use Stata in this course and I put anything that would be meaningful to
Stata as input or output in typewriter font. Stata commands do not have any special
characters at the end, so please ignore punctuation marks at the end of the command in
these notes. Sometimes I will put a Stata command on a line by itself as in
. reg y x, robust
Applied microeconometric methods are sometimes described as being atheoretical and
simply letting the data speak. I strongly disagree with this characterization. Economic
theory is an essential ingredient in justifying the use of a particular microeconometric method.
This is why we wait until the completion of your first year of PhD coursework before requiring
this course. In addition to economic theory, institutional details are often needed to justify
an identification approach, which means that you will need to become an expert on the
institutions related to your research question.
Chapter 1
The Selection Problem
Reading Assignments:
• “Does the Vaccine Matter” The Atlantic (November 2009)
• Angrist and Krueger (1999) “Empirical Strategies in Labor Economics” Handbook of Labor Economics, 3A:1277-1284
The selection problem is one of the most important concepts in economics. To illustrate this
concept, we start with a health-related example. Suppose that we would like to empirically
estimate the effect of the flu shot on health. This is known as treatment evaluation,1 where
health is the outcome and the flu shot is the treatment. Unfortunately, we don’t have a
flu shot experiment to analyze. In fact, there has never been a randomized controlled trial
to evaluate the effectiveness of the flu shot. Instead, imagine that we have survey data
from a large random sample of adults. For each individual i we observe if a flu shot was
administered as well as a measure of overall health. The flu shot treatment is given by:
\[
s_i =
\begin{cases}
1 & \text{if individual } i \text{ received the treatment} \\
0 & \text{if individual } i \text{ did not receive the treatment.}
\end{cases}
\]
The outcome of interest, yi , is a continuous measure of overall health where higher values
indicate better health.2 We can use this data to calculate the mean of y when s = 1 and the
mean of y when s = 0. Empirical studies of the effect of the flu shot on health have found
that:
\[
E(y_i \mid s_i = 1) > E(y_i \mid s_i = 0). \tag{1.1}
\]
1 If the treatment varies by type or intensity, we call this multiple treatment evaluation.
2 In practice, objective measures of health are hard to come by, particularly objective measures that are
continuous. Many researchers just use mortality (an indicator variable for death) as their measure of health.
Does this mean that the flu shot is causing health to improve, perhaps by making contracting
the flu less likely and thus reducing the probability of other flu-related illnesses? No, it does
not. Eq. (1.1) simply declares that people who get a flu shot are healthier on average than
those who do not. Comparing the average health of those who received a flu shot with those
who did not does not tell us why there is a difference; it only tells us that the flu shot and
good health are positively correlated. I’ll repeat: the difference between the average health
of those who received a flu shot and the average health of those who did not receive a flu shot
does not tell us if the flu shot is making people healthier, that is, causing people
to have better health. Ultimately, it is the causal effect of the flu shot on health that we
care about.
1.1 Potential Outcomes

In order to mathematically define the question of how the flu shot affects health, we will
introduce some notation for denoting potential outcomes.3 For each individual in this
example, there are two potential outcomes:
\[
y_i =
\begin{cases}
y_{1i} & \text{if } s_i = 1 \\
y_{0i} & \text{if } s_i = 0.
\end{cases}
\]
The unit-level treatment effect is the difference between the two potential outcomes:
τi = y1i − y0i . (1.2)
The unit-level treatment effect, τi , is the causal effect of the treatment for individual i. If
τi is positive it means that the flu shot would improve individual i’s health. If τi is zero, it
means that the flu shot has no effect on individual i’s health.
The fundamental problem of causal inference is that it is impossible to observe
both y1i and y0i for the same individual. This makes it impossible to observe the unit-level
treatment effect, τi . For each individual i we only observe:
yi = si y1i + (1 − si ) y0i = y0i + si (y1i − y0i ) = y0i + si τi . (1.3)
This potential outcome notation allows us to frame causal inference as a missing data problem
because we can’t know what effect the flu shot has unless we know what would have happened
had the individual not received the flu shot.
3 The potential outcome approach was developed by Rubin (1974, 1977, and 1978).
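The missing data framing can be made concrete with a small sketch. The four individuals and their potential outcomes below are invented for illustration (in real data we never see both y0 and y1), and the variable names y0_obs and y1_obs are hypothetical. The commands are written as do-file contents, so the dot prompt is omitted.

```stata
* Hypothetical data: four individuals with made-up potential outcomes
clear
input id y0 y1 s
1 5 7 1
2 6 6 0
3 4 7 1
4 8 9 0
end

* Unit-level treatment effect, Eq. (1.2); knowable here only because the data are invented
gen tau = y1 - y0

* Observed outcome, Eq. (1.3)
gen y = y0 + s*tau

* What a researcher actually sees: one potential outcome per person, the other is missing
gen y0_obs = y0 if s == 0
gen y1_obs = y1 if s == 1

list id s y y0_obs y1_obs, noobs
```

In the listing, every row has a missing value in either y0_obs or y1_obs, which is why tau cannot be computed from observed data alone.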
A large amount of homogeneity could solve the missing data problem. Assume that
everyone is identical except that some people choose s = 1 and others choose s = 0 randomly.
Everyone is identical so all individuals have the same potential outcomes. This implies that
if individual i selects si = 1 and therefore we observe yi = y1i we could use the health
outcome yj for individual j, who selected sj = 0, as a proxy for y0i . The reason this works is
that the homogeneity ensures that y0j = y0i . This enables us to calculate the causal effect,
y1i − y0i , by using observed health from different individuals. Alternatively, assume that y0i
and y1i are constant over time for individual i. Then if there are some periods in which y0i
is observed and others in which y1i is observed, we can calculate the causal effect, y1i − y0i ,
by using observed health from different time periods. However, in practice neither of these
approaches is likely to work as there is often a large degree of heterogeneity both across
individuals and over time.
There may also be heterogeneity in the unit-level treatment effect. The effect of a flu
shot may be larger for some individuals than others. There may be individuals for whom a
flu shot actually decreases health. Because we never observe both y1i and y0i for the same
person, we are just trying to learn about the average value of τ . The three most commonly
estimated treatment effects are the average treatment effect (ATE), the average treatment
effect on the treated (ATT), and the average treatment effect on the untreated (ATU) as
defined below:
Definition 1.1 (Average Treatment Effect) The average treatment effect (ATE) is
the population mean of all unit-level treatment effects:
\[
E[\tau] = E[y_1 - y_0] = E[y_1] - E[y_0].
\]
Definition 1.2 (Average Treatment Effect on the Treated) The average treatment
effect on the treated (ATT) is the population mean of unit-level treatment effects
for only those individuals who received the treatment:
\[
E[\tau \mid s = 1] = E[y_1 - y_0 \mid s = 1] = E[y_1 \mid s = 1] - E[y_0 \mid s = 1].
\]
Definition 1.3 (Average Treatment Effect on the Untreated) The average treatment
effect on the untreated (ATU) is the population mean of unit-level treatment effects
for only those individuals who did not receive the treatment:
\[
E[\tau \mid s = 0] = E[y_1 - y_0 \mid s = 0] = E[y_1 \mid s = 0] - E[y_0 \mid s = 0].
\]
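As a rough illustration of the three definitions, the sketch below computes the sample analogues of the ATE, ATT, and ATU for four invented individuals. The data and the scalar names (ate, att, atu, pi_hat, wavg) are assumptions made for this example; the calculation is possible only because both potential outcomes are made up and therefore known.

```stata
* Hypothetical data: both potential outcomes are known only because they are invented
clear
input id y0 y1 s
1 5 7 1
2 6 6 0
3 4 7 1
4 8 9 0
end
gen tau = y1 - y0

* ATE: mean of tau over everyone
summarize tau, meanonly
scalar ate = r(mean)

* ATT: mean of tau among the treated
summarize tau if s == 1, meanonly
scalar att = r(mean)

* ATU: mean of tau among the untreated
summarize tau if s == 0, meanonly
scalar atu = r(mean)

* Share treated, and the share-weighted average of ATT and ATU
summarize s, meanonly
scalar pi_hat = r(mean)
scalar wavg = pi_hat*att + (1 - pi_hat)*atu

display "ATE = " ate "   ATT = " att "   ATU = " atu
display "Weighted average of ATT and ATU = " wavg
```

With these numbers the ATE is 1.5, the ATT is 2.5, and the ATU is 0.5, and the share-weighted average of the ATT and ATU reproduces the ATE.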
We contrast these average treatment effects with the naïve average treatment effect
(NATE) that is commonly reported in newspapers and in unsophisticated academic articles
and reports. The naïve average treatment effect is what we obtain when we compare
the average value of y for those with s = 1 to the average value of y for those with s = 0. It
is also what we obtain when we run a simple regression of y on s.
Definition 1.4 (Naïve Average Treatment Effect) The naïve average treatment effect
(NATE) is the difference in the population mean for those who receive the treatment
and those who did not receive the treatment:
\[
E[y \mid s = 1] - E[y \mid s = 0].
\]
The naïve average treatment effect is not the ATE, the ATT, or the ATU. So, what does it
estimate? We can decompose the NATE into two parts:
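\[
\begin{aligned}
\underbrace{E[y \mid s = 1] - E[y \mid s = 0]}_{\text{NATE}}
&= E[y_1 \mid s = 1] - E[y_0 \mid s = 0] \\
&= \underbrace{E[y_1 \mid s = 1] - E[y_0 \mid s = 1]}_{\text{ATT}}
 + \underbrace{E[y_0 \mid s = 1] - E[y_0 \mid s = 0]}_{\text{Selection Bias}}.
\end{aligned}
\tag{1.4}
\]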
Moving from the first to the second line above comes from adding and subtracting the
same term, E [y0 |s = 1]. This enables us to express the naïve average treatment effect as a
combination of the average treatment effect on the treated and selection bias. Consider
the selection bias term, E [y0 |s = 1] − E [y0 |s = 0]. This term represents the difference in
baseline health (in the hypothetical world in which there was no treatment) between those
who choose to get a flu shot and those who do not. If healthier people are more likely to
get the flu shot than those who are less healthy, this selection bias term will be positive.
This implies that the naïve average treatment effect is larger than the causal effect for the
treated. This positive selection into the treatment is what likely accounts for the findings
that those who get the flu shot are less likely to be in a car accident, get diabetes, and
become unemployed.
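The decomposition in Eq. (1.4) can be checked on simulated data. The sketch below is only an illustration under assumptions of my own choosing: baseline health is standard normal, the treatment effect is a constant 1, and people with better baseline health are more likely to select into the flu shot. The seed, sample size, and scalar names are arbitrary.

```stata
clear
set seed 12345
set obs 10000

* Assumed potential outcomes: baseline health y0 and a constant treatment effect of 1
gen double y0 = rnormal()
gen double y1 = y0 + 1

* Assumed positive selection: healthier people are more likely to get the shot
gen s = (y0 + rnormal() > 0)

* Observed outcome, Eq. (1.3)
gen double y = y0 + s*(y1 - y0)

* NATE: difference in observed means
summarize y if s == 1, meanonly
scalar ybar_t = r(mean)
summarize y if s == 0, meanonly
scalar ybar_u = r(mean)
scalar nate = ybar_t - ybar_u

* ATT: uses the counterfactual y0 of the treated, which real data never reveal
gen double tau = y1 - y0
summarize tau if s == 1, meanonly
scalar att = r(mean)

* Selection bias: difference in baseline health between treated and untreated
summarize y0 if s == 1, meanonly
scalar base_t = r(mean)
summarize y0 if s == 0, meanonly
scalar base_u = r(mean)
scalar sb = base_t - base_u

display "NATE = " nate
display "ATT = " att "   Selection bias = " sb
```

Because the identity holds for sample means as well as population means, the displayed NATE should equal the ATT plus the selection bias up to rounding, with a positive selection bias under the assumed selection rule.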
Alternatively, we can express the naïve average treatment effect as a combination of the
ATU and a different form of selection bias:
\[
\begin{aligned}
\underbrace{E[y \mid s = 1] - E[y \mid s = 0]}_{\text{NATE}}
&= E[y_1 \mid s = 1] - E[y_0 \mid s = 0] \\
&= \underbrace{E[y_1 \mid s = 1] - E[y_1 \mid s = 0]}_{\text{Outcome Selection Bias}}
 + \underbrace{E[y_1 \mid s = 0] - E[y_0 \mid s = 0]}_{\text{ATU}}.
\end{aligned}
\tag{1.5}
\]
This time, the selection bias term, E [y1 |s = 1] − E [y1 |s = 0], is not in terms of the baseline
health status, y0 , but is instead in terms of the post-treatment health status, y1 . This term
incorporates both the positive selection of healthier people into the flu shot treatment as
well as any difference in the causal effect of the treatment (even if not observed) between those
who chose to receive the flu shot and those who did not.
The average treatment effect (ATE) is simply the weighted average of the average treatment
effect on the treated (ATT) and the average treatment effect on the untreated (ATU):
\[
\underbrace{E[y_1] - E[y_0]}_{\text{ATE}}
= \pi \underbrace{\left(E[y_1 \mid s = 1] - E[y_0 \mid s = 1]\right)}_{\text{ATT}}
+ (1 - \pi) \underbrace{\left(E[y_1 \mid s = 0] - E[y_0 \mid s = 0]\right)}_{\text{ATU}}
\tag{1.6}
\]
where π is the fraction of the population that selects into treatment. This means that we
can also decompose the naïve average treatment effect as follows:
\[
\begin{aligned}
\underbrace{E[y_1 \mid s = 1] - E[y_0 \mid s = 0]}_{\text{NATE}}
&= \underbrace{E[y_1] - E[y_0]}_{\text{ATE}} \\
&\quad + \pi \underbrace{\left(E[y_0 \mid s = 1] - E[y_0 \mid s = 0]\right)}_{\text{Selection Bias}} \\
&\quad + (1 - \pi) \underbrace{\left(E[y_1 \mid s = 1] - E[y_1 \mid s = 0]\right)}_{\text{Outcome Selection Bias}}
\end{aligned}
\tag{1.7}
\]
This can then be re-written as:
\[
\text{NATE} = \text{ATE}
+ \underbrace{E[y_0 \mid s = 1] - E[y_0 \mid s = 0]}_{\text{Selection Bias}}
+ (1 - \pi)\,(\text{ATT} - \text{ATU})
\tag{1.8}
\]
which shows that the difference between the average treatment effect and the naïve average
treatment effect is the selection bias plus (1 − π) times the difference between the average
treatment effect on the treated and the average treatment effect on the untreated.
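A small worked example may help verify Eqs. (1.6) through (1.8). The conditional means and the value of π below are invented purely for illustration:

```latex
\[
\begin{aligned}
&\text{Assumed: } E[y_1 \mid s=1] = 10,\quad E[y_0 \mid s=1] = 8,\quad
 E[y_1 \mid s=0] = 6,\quad E[y_0 \mid s=0] = 5,\quad \pi = 0.4 \\
&\text{ATT} = 10 - 8 = 2, \qquad \text{ATU} = 6 - 5 = 1 \\
&\text{ATE} = 0.4(2) + 0.6(1) = 1.4 \\
&\text{NATE} = 10 - 5 = 5, \qquad \text{Selection Bias} = 8 - 5 = 3 \\
&\text{Eq. (1.7): } 1.4 + 0.4(3) + 0.6(10 - 6) = 1.4 + 1.2 + 2.4 = 5 \\
&\text{Eq. (1.8): } 1.4 + 3 + 0.6(2 - 1) = 5
\end{aligned}
\]
```

In this example the NATE of 5 overstates the ATE of 1.4, mostly because the treated would have been healthier even without the treatment.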
1.2 Random Assignment

Suppose that we were able to do a randomized trial to test the effectiveness of the flu
shot. Participants would be randomly assigned, perhaps by flipping a coin, to either receive
the flu shot or a placebo. Random assignment makes s independent of the potential outcomes
y1 and y0, which means that E [y0 |s = 1] = E [y0 |s = 0]. This means that the average pre-treatment
health status is the same for the treated and untreated groups. Under random
assignment, Eq. (1.4) reduces to equality between the naïve average treatment effect and the
average treatment effect on the treated:
\[
\underbrace{E[y \mid s = 1] - E[y \mid s = 0]}_{\text{NATE}}
= \underbrace{E[y_1 \mid s = 1] - E[y_0 \mid s = 1]}_{\text{ATT}}
= E[\tau \mid s = 1].
\tag{1.9}
\]
In addition, random assignment assures us that there will be no difference between the ATT
and the ATU. The treated and untreated groups were randomly assigned so there should be
no underlying difference between the two groups in the average unit-level treatment effect
and thus E [y1 |s = 1] = E [y1 |s = 0]. Therefore Eq. (1.4), Eq. (1.5), and Eq. (1.8) simplify
down to a single equivalence:
\[
\underbrace{E[y \mid s = 1] - E[y \mid s = 0]}_{\text{NATE}}
= \underbrace{E[y_1 - y_0 \mid s = 1]}_{\text{ATT}}
= \underbrace{E[y_1 - y_0 \mid s = 0]}_{\text{ATU}}
= \underbrace{E[y_1] - E[y_0]}_{\text{ATE}}.
\tag{1.10}
\]
The condition under which Eq. (1.10) is true is written as {y1 , y0 } ⊥ s, indicating
that the treatment is independent of the two potential outcomes. This assumption was first
stated by Rosenbaum and Rubin (1983) as unconfoundedness and we will elaborate on
the unconfoundedness assumption below.
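To see Eq. (1.10) at work, the sketch below simulates a coin-flip assignment. The potential outcomes, the heterogeneous effect with mean 1, the seed, and the sample size are all assumptions chosen for illustration, not features of any real flu shot trial.

```stata
clear
set seed 2024
set obs 100000

* Assumed potential outcomes with heterogeneous unit-level effects (mean effect of 1)
gen double y0 = rnormal()
gen double tau = 1 + 0.5*rnormal()
gen double y1 = y0 + tau

* Coin-flip assignment, independent of y0 and y1 by construction
gen s = (runiform() < 0.5)

* Observed outcome, Eq. (1.3)
gen double y = y0 + s*tau

* Sample ATE, computable only because the simulation reveals tau for everyone
summarize tau, meanonly
scalar ate = r(mean)

* NATE from a simple regression of y on s
regress y s
scalar nate = _b[s]

display "ATE = " ate "   NATE = " nate
```

With a sample this large, the two numbers should be close to each other and to 1; any remaining gap is sampling noise rather than selection bias.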
Randomized trials are not common in economics, but they offer the most credible evidence.
One example is the 1962 Perry preschool project, an experiment in which some black
preschool students in Michigan were randomly assigned to receive high-quality schooling and
home visits. Another is the 1985 Tennessee STAR experiment in which students in kindergarten
through third grade were randomly assigned to either small classes (13-17 students),
large classes (22-25 students), or large classes with a paid teacher aide. The Oregon
Medicaid Experiment randomized which families were allowed to participate in the Medicaid
expansion.
Randomized trials are the conceptual benchmark for observational or non-experimental
study designs. When trying to estimate a causal effect using observational data, the
economist should ask, what ideal experiment would yield the causal effect of interest? If you
cannot think of an experiment that would produce the desired causal effect estimate, this
implies that you do not have a well-defined research question. If there is no ideal experiment
that would estimate the causal effect, regression analysis will not be able to estimate the
causal effect either.
1.3 Regression Analysis

Regression analysis is the primary tool of an applied microeconometrician. We spend a large
fraction of our work time preparing data for a regression, running regressions, and describing
the results of regressions. We begin by making two assumptions that will allow us to specify
the standard regression model. The first is the stable unit treatment value assumption, which
says that individual i’s observed value of yi depends only on whether i received the treatment and
does not depend on who else received the treatment or how i was selected for treatment.
This assumption is stated formally as:
Definition 1.5 (Stable Unit Treatment Value Assumption) The observed outcomes
are realized as
\[
y_i = y_{1i} s_i + y_{0i} (1 - s_i)
\]
regardless of the mechanism used to assign the treatment and regardless of what treatments
the other units receive.
This assumption implies that the potential outcomes of individual i must be unaffected by
the treatment of any other individual j and rules out all interference across units. In our flu shot
example, this assumption will not be satisfied if the treatment of some individuals directly
increases the potential health outcomes for untreated individuals.
The second assumption is the constant treatment effects assumption which implies
that the treatment effect is the same for every individual. This assumption is stated formally
as:
Definition 1.6 (Constant Treatment Effects Assumption)
y1i − y0i = τ ∀i
where the treatment effect τ is the same for every individual.
Under these two assumptions, we specify the regression model as:
\[
y_i = \alpha + \tau s_i + u_i. \tag{1.11}
\]
In this model, τ is the causal effect of the flu shot on health. A regression model is considered
a structural model if it represents a causal relationship rather than statistical correlation.
A structural equation may be derived from theory or obtained from informal reasoning. The
error term ui consists of everything other than si that determines yi . This includes both
omitted variables and measurement error.
We can derive the expected value of health conditional on flu shot status:
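\[
\begin{aligned}
E(y_i \mid s_i = 1) &= \alpha + \tau + E(u_i \mid s_i = 1) \\
E(y_i \mid s_i = 0) &= \alpha + E(u_i \mid s_i = 0)
\end{aligned}
\tag{1.12}
\]
So by differencing these two conditional expectations, we again find:
\[
E(y_i \mid s_i = 1) - E(y_i \mid s_i = 0)
= \underbrace{\tau}_{\text{Causal Effect}}
+ \underbrace{E(u_i \mid s_i = 1) - E(u_i \mid s_i = 0)}_{\text{Selection Bias}}.
\tag{1.13}
\]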
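One way to see why the bias term in Eq. (1.13) is the same selection bias as in Eq. (1.4) is to write the regression model in terms of the potential outcomes. The step below is a sketch that relies on the constant treatment effects assumption and on reading α as the population mean of y0:

```latex
\[
\begin{aligned}
y_i &= y_{0i} + s_i \tau
     = \underbrace{E[y_0]}_{\alpha} + \tau s_i + \underbrace{\left(y_{0i} - E[y_0]\right)}_{u_i}, \\
E(u_i \mid s_i = 1) - E(u_i \mid s_i = 0)
    &= E[y_0 \mid s = 1] - E[y_0 \mid s = 0].
\end{aligned}
\]
```

Under this reading, the error term is the individual's deviation from average baseline health, so a correlation between s and u is exactly a difference in baseline health between the treated and the untreated.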
Let’s turn now to estimating the model to obtain τ̂ , the estimated causal effect of s on y.
In this simple linear regression model, the OLS estimate is given by:
\[
\hat{\tau} = \frac{\sum_{i=1}^{n} (s_i - \bar{s})\, y_i}{\sum_{i=1}^{n} (s_i - \bar{s})^2}
\tag{1.14}
\]
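The formula in Eq. (1.14) can be checked by hand on a tiny dataset. The five observations below are invented, and the names sbar, num, den, and tau_hand are hypothetical helpers for this sketch.

```stata
* Hypothetical data: three untreated and two treated individuals
clear
input y s
3 0
5 0
4 0
8 1
6 1
end

* Mean of s
summarize s, meanonly
scalar sbar = r(mean)

* Numerator and denominator terms of Eq. (1.14)
gen double num = (s - sbar)*y
gen double den = (s - sbar)^2
summarize num, meanonly
scalar sum_num = r(sum)
summarize den, meanonly
scalar sum_den = r(sum)
scalar tau_hand = sum_num/sum_den

* Compare with the coefficient reported by regress
regress y s
display "By hand: " tau_hand "   regress: " _b[s]
```

Both numbers equal 3, which is also the difference between the treated mean of 7 and the untreated mean of 4, consistent with the earlier statement that a simple regression of y on s returns the naïve average treatment effect.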
Let’s consider an example with 1,000 simulated data points:
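. clear
. set seed 123
. set obs 1000
. * u is the unobserved error and s is drawn independently of u
. gen u = rnormal()
. gen s = rnormal()
. * data generating process with a true effect of 2
. gen y = 2*s + 4*u
. reg y s
. * fitted values from the regression
. predict yhat, xb
. * the same fitted values built by hand (use the coefficients from your own output if they differ)
. gen yline = 0.0813193 + 1.988262*s
. summarize y*
. set scheme s1mono
. twoway (lfit y s, lcolor(black) lwidth(medium)) (scatter y s, mcolor(black) msymbol(point)), legend(off) ytitle(y) xtitle(s)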
Run these commands in Stata and examine the results. In this simulation, we chose the
data generating process, y = 2s + 4u, and we generated s and u independently. Therefore,
u and s have little correlation and the regression model that we specified is the correct
structural model producing causal estimates. The estimated coefficient on s is very close to
the true population value of 2, as we would expect.
Figure 1.1: Effect of the Flu Shot on Health
[Scatter plot of y (vertical axis) against s (horizontal axis) with the fitted regression line.]
The figure plots each of the simulated data points as well as the regression line. The
intercept is very close to the true parameter value of 0 and the simulated data points appear to
have a constant variance with no unusual patterns. However, it is essential to understand that
no amount of statistical testing can tell us if the regression model we specified is producing
a causal estimate. Consider altering the example slightly to draw random values of s that
are correlated with u:
. clear
. set seed 123
. set obs 1000
. local corr = 0.5
. gen u = rnormal()
. * construct s so that its correlation with u equals `corr'
. gen s = u + sqrt((1/(`corr'^2)-1)) * rnormal()
. gen y = 2*s + 4*u
. reg y s
. twoway (lfit y s, lcolor(black) lwidth(medium)) (scatter y s, mcolor(black) msymbol(point)), legend(off) ytitle(y) xtitle(s)
Figure 1.2: Correlation of Health and the Flu Shot
[Scatter plot of y (vertical axis) against s (horizontal axis) with the fitted regression line.]
The true causal effect of s on y is still 2 as specified in the data generating process.
However, the correlation between u and s causes bias in our estimate of τ as shown in Figure
1.2. Returning to Eq. (1.14) and then substituting yi = α + τ si + ui into that equation,
we obtain the following:
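\[
\begin{aligned}
\hat{\tau}
&= \frac{\sum_{i=1}^{n} (s_i - \bar{s})\,(\alpha + \tau s_i + u_i)}{\sum_{i=1}^{n} (s_i - \bar{s})^2} \\
&= \alpha \underbrace{\frac{\sum_{i=1}^{n} (s_i - \bar{s})}{\sum_{i=1}^{n} (s_i - \bar{s})^2}}_{=0}
 + \tau \underbrace{\frac{\sum_{i=1}^{n} (s_i - \bar{s})\, s_i}{\sum_{i=1}^{n} (s_i - \bar{s})^2}}_{=1}
 + \frac{\sum_{i=1}^{n} (s_i - \bar{s})\, u_i}{\sum_{i=1}^{n} (s_i - \bar{s})^2} \\
\hat{\tau}
&= \tau + \underbrace{\frac{\sum_{i=1}^{n} (s_i - \bar{s})\, u_i}{\sum_{i=1}^{n} (s_i - \bar{s})^2}}_{\text{Selection Bias}}
\end{aligned}
\tag{1.15}
\]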
The explanatory variable, s, is said to be endogenous if it is correlated with u. If s is
correlated with u, the term Σ(si − s̄) ui will be non-zero, implying that τ̂ is biased. If s is
not correlated with u, s is said to be exogenous and the estimate of τ will be unbiased.
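As a rough check on the size of the bias in the second simulation, the large-sample value of the bias term in Eq. (1.15) can be worked out from the data generating process. This sketch assumes that the two standard normal draws are independent and notes that the regression error in that example is 4u rather than u:

```latex
\[
\begin{aligned}
s &= u + \sqrt{\tfrac{1}{\rho^2} - 1}\; e, \qquad \rho = 0.5,\quad u, e \sim N(0,1) \text{ independent} \\
\operatorname{Var}(s) &= 1 + \left(\tfrac{1}{\rho^2} - 1\right) = \tfrac{1}{\rho^2} = 4,
\qquad \operatorname{Cov}(s, 4u) = 4 \\
\hat{\tau} &\xrightarrow{\;p\;} \tau + \frac{\operatorname{Cov}(s, 4u)}{\operatorname{Var}(s)}
 = 2 + \frac{4}{4} = 3
\end{aligned}
\]
```

So under these assumptions the regression in that example should report a slope near 3 rather than the true causal effect of 2.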
If the treatment, s, is randomly assigned and the stable unit treatment value assumption
holds, this simple regression model will yield an estimate of the true causal effect, τ . However,
this is rarely the case for an applied microeconometrician working with observational data.
The treatment is nearly always endogenous because it is self-selected by the individual.
We should always be suspicious of correlations found in non-experimental data. Individuals
make the choice that they think will be best for them, subject to constraints. This
creates a spurious correlation that shows up in our regression estimates as selection bias. But
economic theory and an understanding of the institutional details can be used to estimate
causal effects, even when working with observational data.