
Regression

IGS
Dale, S. B., & Krueger, A. B. (1999). Estimating the payoff to attending a more selective college: An application of selection on observables and unobservables (No. w7322). National Bureau of Economic Research.

Uncontrolled earnings gap between private and
public college students (made-up data)

Avg(Private) - Avg(Public) = $92,000 - $72,500 = $19,500

Source: Adapted from Angrist & Pischke 2014, p. 53


Sources of selection bias
All factors that determine both college choice and later earnings:
Family income
Skills and talents
Diligence
Family connections
???
Virtually infinite combinations
Many factors hard to quantify

Matching strategy

Dale & Krueger (2002) matched students based on the selectivity of the colleges to which they applied and were admitted
This is a single key summary measure of:
Ambition
Ability
Assumption: Among students who applied and were admitted to equally selective colleges, the choice between private and public is serendipitous

Matched-applicant groups

Within-group comparisons are more apples-to-apples than uncontrolled private-public comparisons

Source: Adapted from Angrist & Pischke 2014, p. 53


Matched-applicant groups (cont'd)

Avg(A) - Avg(B) = $106,667 - $45,000 = $61,667

Evidence of selection bias
Source: Adapted from Angrist & Pischke 2014, p. 53

Matched comparisons
Group-size weighted ATE = (3/5) x (-$5,000) + (2/5) x $30,000 = $9,000

Group A: Avg(Private) - Avg(Public) = -$5,000
Group B: Avg(Private) - Avg(Public) = $30,000

UNINFORMATIVE: Matched-applicant groups must contain both private (treated) and public (control) college students
Source: Adapted from Angrist & Pischke 2014, p. 53
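
The matched comparison above can be reproduced in a few lines of Python. This is a minimal sketch, not the authors' procedure: the nine individual earnings values and the group labels A to D are assumptions chosen so that the averages agree with the figures on the slides, and `matched_effect` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up records: (applicant group, college type, earnings in dollars).
# ASSUMPTION: individual values are chosen to match the slide's averages.
students = [
    ("A", "private", 110_000),
    ("A", "private", 100_000),
    ("A", "public", 110_000),
    ("B", "private", 60_000),
    ("B", "public", 30_000),
    ("C", "private", 115_000),
    ("C", "private", 75_000),
    ("D", "public", 90_000),
    ("D", "public", 60_000),
]


def mean(values):
    return sum(values) / len(values)


def matched_effect(records):
    """Group-size weighted average of within-group private-public gaps."""
    by_group = defaultdict(lambda: {"private": [], "public": []})
    for group, kind, earnings in records:
        by_group[group][kind].append(earnings)

    # Keep only groups that contain both treated and control students.
    informative = {
        g: d for g, d in by_group.items() if d["private"] and d["public"]
    }
    sizes = {g: len(d["private"]) + len(d["public"]) for g, d in informative.items()}
    total = sum(sizes.values())

    gaps = {g: mean(d["private"]) - mean(d["public"]) for g, d in informative.items()}
    weighted = sum(gaps[g] * sizes[g] / total for g in informative)
    return gaps, weighted


# Uncontrolled comparison: all private students versus all public students.
raw_gap = mean([e for _, k, e in students if k == "private"]) - mean(
    [e for _, k, e in students if k == "public"]
)
gaps, weighted = matched_effect(students)

print(raw_gap)   # 19500.0
print(gaps)      # {'A': -5000.0, 'B': 30000.0}
print(weighted)  # 9000.0
```

Groups C and D drop out automatically because each contains only one type of college, which is why they carry no information about the private-public gap.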
Regression as an automated matchmaker
Regression matches on covariates and then averages within-cell treatment-control differences
Estimates of the regression parameters (called coefficients) are weighted averages of multiple matched comparisons / group-specific differences

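$Y_i = \alpha + \beta P_i + \gamma A_i + e_i$
($Y_i$: earnings; $P_i$: private-college dummy; $A_i$: applicant-group dummy; $e_i$: residual)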

Ordinary least squares (OLS) estimates the parameters so as to minimize the sum of squared residuals, i.e., the squared differences between the observed and the fitted values
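
To make the least-squares idea concrete, here is a minimal pure-Python sketch that solves the normal equations for a regression of earnings on a private-college dummy and a group dummy. The five data points are an assumption (made-up students from two matched groups, consistent with the averages on the earlier slides), and `ols` and `ssr` are hypothetical helper names, not functions from any library.

```python
from fractions import Fraction


def ols(X, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]; Fractions keep the arithmetic exact.
    M = [
        [Fraction(sum(r[i] * r[j] for r in X)) for j in range(k)]
        + [Fraction(sum(r[i] * yi for r, yi in zip(X, y)))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination.
    for col in range(k):
        pivot_row = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def ssr(X, y, coef):
    """Sum of squared residuals (observed minus fitted values)."""
    return sum(
        (yi - sum(c * x for c, x in zip(coef, row))) ** 2 for row, yi in zip(X, y)
    )


# Columns: intercept, P (private dummy), A (group A dummy).
# ASSUMPTION: made-up earnings for five students in two matched groups.
X = [[1, 1, 1], [1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0]]
y = [110_000, 100_000, 110_000, 60_000, 30_000]

alpha, beta, gamma = ols(X, y)
print(alpha, beta, gamma)  # 40000 10000 60000

# Any other coefficient vector gives a larger sum of squared residuals.
best = ssr(X, y, [alpha, beta, gamma])
worse = ssr(X, y, [alpha, beta + 1_000, gamma])
print(best < worse)  # True
```

On these numbers the coefficient on the private dummy is $10,000 rather than the $9,000 group-size weighted figure, because regression weights the within-group gaps by more than group size alone (it also accounts for how evenly treatment is split within each group).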
Regression-based causal inference
Regression compares treatment and control units that have the same observed characteristics
Assumption: When key observed variables have been made equal across treatment and control groups, selection bias from the variables we can't observe is also eliminated

Stata dataset

Uncontrolled comparison vs. matched comparison

Matched college selectivity groups
Dale and Krueger (2002): 5,583 matched students falling into 151 similar-selectivity groups containing both private and public students
Matching based on Barron's selectivity ranking
6 categories (Most Competitive; Highly Competitive; . . . ; Noncompetitive)
Criteria: (a) class rank of enrolled students and (b) admission rate
Examples of the 151 similar-selectivity groups
Group 1: Students who applied and were admitted to three Most Competitive colleges
...
Group 78: Students who applied to two Highly Competitive colleges and one Less Competitive college and were admitted to one of each type
...

Regression model in Dale and Krueger 2002
When data fall into one of J categories/groups, we need (J-1) dummies in the regression: 151 groups require 150 selectivity-group dummies
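$\ln Y_i = \alpha + \beta P_i + \sum_{j=1}^{150} \gamma_j \mathrm{GROUP}_{ji} + \delta_1 \mathrm{SAT}_i + \delta_2 \ln \mathrm{PI}_i + e_i$
where $Y_i$ is earnings, $P_i$ indicates private-college attendance, $\mathrm{GROUP}_{ji}$ are the selectivity-group dummies, $\mathrm{SAT}_i$ is the student's own SAT score, and $\mathrm{PI}_i$ is parental income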

NOTE: The model includes additional controls that are not shown (i.e., female, race, athlete, top
10% of high school class)
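
The (J-1) rule can be illustrated with a short Python sketch. One category is left out as the reference group because a full set of J dummies would sum to one in every row and be perfectly collinear with the intercept. The helper name `group_dummies` and the integer group ids are assumptions for illustration only.

```python
def group_dummies(labels, reference=None):
    """Return (kept categories, rows) with one 0/1 dummy per non-reference group."""
    categories = sorted(set(labels))
    if reference is None:
        reference = categories[0]  # omit the first category by default
    kept = [c for c in categories if c != reference]
    rows = [[1 if label == c else 0 for c in kept] for label in labels]
    return kept, rows


# ASSUMPTION: hypothetical group ids 1..151, one student per group.
labels = list(range(1, 152))
kept, rows = group_dummies(labels)

print(len(set(labels)))  # 151
print(len(kept))         # 150
print(sum(rows[0]))      # 0 (a student in the reference group has all dummies at 0)
print(sum(rows[1]))      # 1
```

Each coefficient on a dummy is then read as a difference relative to the omitted reference group.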

Source: Angrist & Pischke 2014, p. 63

SELF-REVELATION MODEL

Selection controls
Average SAT score in the set of colleges students applied to
Number of applications
Rationale: Weaker students will apply to fewer and less-selective colleges

Source: Angrist & Pischke 2014, p. 66
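
As a rough illustration of how the two selection controls could be built from application records, consider the sketch below. The student ids, the SAT figures, and the helper name `self_revelation_controls` are all invented for illustration and do not come from the Dale and Krueger data.

```python
# ASSUMPTION: hypothetical records mapping a student id to the average SAT
# score of each college that student applied to.
applications = {
    "s1": [1400, 1350, 1300],
    "s2": [1100],
}


def self_revelation_controls(college_sats):
    """Return (average SAT of colleges applied to, number of applications)."""
    return sum(college_sats) / len(college_sats), len(college_sats)


controls = {
    sid: self_revelation_controls(sats) for sid, sats in applications.items()
}
print(controls["s1"])  # (1350.0, 3)
print(controls["s2"])  # (1100.0, 1)
```

These two numbers would then enter the regression as ordinary covariates in place of the selectivity-group dummies.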


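Source: Angrist & Pischke 2014, p. 67

Uncontrolled comparison (short regression) vs. matched comparison (long regression)

Short: $Y_i = \alpha^s + \beta^s P_i + e_i^s$
Long: $Y_i = \alpha^l + \beta^l P_i + \gamma A_i + e_i^l$

$\mathrm{OVB} = \beta^s - \beta^l = 10{,}000$

Omitted variable bias formula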

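$A_i = \pi_0 + \pi_1 P_i + u_i$
$Y_i = \alpha^l + \beta^l P_i + \gamma A_i + e_i^l$

$\mathrm{OVB} = \beta^s - \beta^l = \pi_1 \times \gamma = .1667 \times 60{,}000 = 10{,}000$
(short minus long coefficient on $P_i$)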
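
The omitted variable bias formula can be checked numerically. The sketch below uses textbook covariance formulas for one- and two-regressor OLS. The five observations are an assumption (made-up students in two matched groups, chosen to reproduce the .1667 and 60,000 shown above), and the variable names are hypothetical.

```python
from fractions import Fraction

# ASSUMPTION: made-up data for five students in two matched groups.
P = [1, 1, 0, 1, 0]  # private-college dummy
A = [1, 1, 1, 0, 0]  # group A dummy (the omitted variable in the short model)
Y = [110_000, 100_000, 110_000, 60_000, 30_000]  # earnings in dollars


def cov(u, v):
    """Population covariance, computed exactly with Fractions."""
    n = len(u)
    mean_u, mean_v = Fraction(sum(u), n), Fraction(sum(v), n)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n


# Short regression: Y on P only.
beta_short = cov(P, Y) / cov(P, P)

# Auxiliary regression: omitted variable A on P.
pi1 = cov(P, A) / cov(P, P)

# Long regression: Y on P and A (closed form for two regressors).
det = cov(P, P) * cov(A, A) - cov(P, A) ** 2
beta_long = (cov(P, Y) * cov(A, A) - cov(A, Y) * cov(P, A)) / det
gamma = (cov(A, Y) * cov(P, P) - cov(P, Y) * cov(P, A)) / det

print(beta_short, beta_long)               # 20000 10000
print(pi1, gamma)                          # 1/6 60000
print(beta_short - beta_long, pi1 * gamma) # 10000 10000
```

Both sides of the formula agree: the gap between the short and long coefficients equals the relationship between the omitted and included variable times the effect of the omitted variable in the long regression.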

