
102B - Introduction to Econometrics - Lecture 1

Paolo Pin
Stanford University - January 8th, 2013
Paolo Pin 102B - Lecture 1 January 8, 2013 1 / 43

Outline
Outline of today's lecture
Syllabus
Introduction
Review of Statistics using class size and educational output
The linear regression model with one regressor
Measures of fit of the linear regression method
Assumptions of the linear regression method

Syllabus
Essential Information
Lectures: Tuesday and Thursday, from 1:15pm to 3:05pm
I am Paolo Pin: paolo.pin@unisi.it
Office Hours: Thursday 10am-12pm, Landau 230
There will be weekly TA sessions, from next week on; times and locations to be announced.

Syllabus
Coursework


Syllabus
Textbook
We will refer extensively to
Introduction to Econometrics, by James H. Stock and Mark W.
Watson (Addison-Wesley, 3rd Edition)
I will use some of the examples from there and the same notation, but I
may change the order of the topics.
In the slides I will always refer to the section of the book I am considering.
Datasets are available at http://wps.pearsoned.co.uk/ema_ge_
stock_ie_3/193/49605/12699039.cw/index.html

Syllabus
Clickers
Let's see if they work:
How do you self-estimate your understanding of the topics in Math 51 (calculus and linear algebra)?
A: very good   B: good   C: so-so...   D: poor   E: very poor

Syllabus
Scheduling
Econ 102B
Lecture   Date         PS due   S&W Chapters
1         Tue Jan 8             Ch. 4
2         Thu Jan 10            Ch. 17.1-.2
3         Tue Jan 15            Ch. 5
4         Thu Jan 17   PS 1     Ch. 17.3-.5
TA 1      Fri Jan 18
5         Tue Jan 22            Ch. 6
6         Thu Jan 24            Ch. 18
TA 2      Fri Jan 25
7         Tue Jan 29
8         Thu Jan 31   PS 2     Ch. 7
TA 3      Fri Feb 1
9         Tue Feb 5             Ch. 8
10        Thu Feb 7             Ch. 9
TA 4      Fri Feb 8
11        Tue Feb 12   PS 3
12        Thu Feb 14            Ch. 12
TA 5      Fri Feb 15
13        Tue Feb 19
14        Thu Feb 21   PS 4     Ch. 13
TA 6      Fri Feb 22
15        Tue Feb 26
16        Thu Feb 28
TA 7      Fri Mar 1
17        Tue Mar 5    PS 5
18        Thu Mar 7
TA 8      Fri Mar 8
19        Tue Mar 12   PS 6
20        Thu Mar 14
TA 9      Fri Mar 15
Final     Thu Mar 21

First Midterm: covers Ch. 4, 5, 6, 17, 18
Second Midterm: covers Ch. 7, 8, 9, 12, 13
Time permitting, we'll try to do chapters 10, 11, 14 & 15

Introduction Chapter 1
Why Econometrics?
Economic Theory is about qualitative effects
Example on the production of human capital:
What is the quantitative effect of reducing class size on student achievement?
Ideal solution: an experiment, but how?
Experiments are difficult to run in economics
We need to use data, but. . .
there are always omitted variables
correlation does not imply causation
external validity (does what we find hold generally?)
You will learn how to address problems of this kind, find causal effects, and eventually make predictions
From the data you will be able to tell a story, and to evaluate whether an alternative story is plausible

Review of Statistics using class size and educational output Appendix 4.1
The California Test Score Data Set
All K-6 and K-8 California school districts (n = 420) in 1998 and 1999
Variables:
5th grade test scores (Stanford-9 achievement test, combined math and reading), district average
Student-teacher ratio (STR) = (# of students in the district) / (# of full-time equivalent teachers)
+ other school and demographic characteristics averaged across districts


Review of Statistics using class size and educational output Chapter 3.7
Scatter plot
If we plot the test score vs. the STR we obtain

The sample correlation is negative:

r_XY = s_XY / (s_X s_Y) = [ (1/(n−1)) Σ_i (X_i − X̄)(Y_i − Ȳ) ] / (s_X s_Y) ≈ −0.23

Is there evidence of an effect of the STR on test scores?

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Hypothesis testing
We need to get some numerical evidence on whether districts with low STRs have higher test scores - but how?
1. Compare average test scores in districts with low STRs to those with high STRs (estimation)
2. Test the null hypothesis that the mean test scores in the two types of districts are the same, against the alternative hypothesis that they differ (hypothesis testing)
3. Estimate an interval for the difference in the mean test scores, high vs. low STR districts (confidence interval)

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Initial data analysis
Compare districts with small (STR < 20) and large (STR ≥ 20) class sizes:

Class Size   Average score Ȳ   Standard deviation s_Y   Observations
Small        657.4             19.4                     238
Large        650.0             17.9                     182

1. Estimation of Δ = difference between group means
2. Test the hypothesis that Δ = 0
3. Construct a confidence interval for Δ

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Estimation
Δ = Ȳ_small − Ȳ_large = 657.4 − 650.0 = 7.4

Is this a large difference?
Standard deviation across districts = 19.1
Difference between the 60th and 75th percentiles of the test score distribution is 667.6 − 659.4 = 8.2
Is this a big enough difference to be important for school reform discussions, for parents, or for a school committee?

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Hypothesis testing
Difference-in-means test: compute the t-statistic (approximately normally distributed, as n_s and n_l are large - Central Limit Theorem)

t = (Ȳ_small − Ȳ_large) / SE(Ȳ_small − Ȳ_large) = (Ȳ_small − Ȳ_large) / sqrt( s_s²/n_s + s_l²/n_l )
  = 7.4 / sqrt( 19.4²/238 + 17.9²/182 ) ≈ 4.05

SE(Ȳ_small − Ȳ_large) = 1.83 is the standard error of Ȳ_small − Ȳ_large; the sample variance is computed as

s_s² = (1/(n_s − 1)) Σ_{i: small} (Y_i − Ȳ_small)²

As |t| > 1.96, we reject (at the 5% significance level) the null hypothesis that the two means are the same.
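As a quick sanity check on this arithmetic, here is a short Python sketch that recomputes the t-statistic from the summary statistics in the table above:

```python
import math

# Summary statistics from the table above (small vs. large classes)
y_small, s_small, n_small = 657.4, 19.4, 238
y_large, s_large, n_large = 650.0, 17.9, 182

# Standard error of the difference in sample means
se = math.sqrt(s_small**2 / n_small + s_large**2 / n_large)

# t-statistic for the null hypothesis that the two means are equal
t = (y_small - y_large) / se

print(round(se, 2))  # 1.83
print(round(t, 2))   # 4.05
```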

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Confidence Interval
A 95% confidence interval for the difference between the means is between the values

(Ȳ_small − Ȳ_large) ± 1.96 · SE(Ȳ_small − Ȳ_large) = 7.4 ± 1.96 · 1.83

so it is [3.8, 11]

The 95% confidence interval for Δ does not include 0;
the hypothesis that Δ = 0 is rejected at the 5% level.
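The same summary numbers give the interval directly (a minimal Python sketch):

```python
import math

diff = 657.4 - 650.0                           # difference in sample means
se = math.sqrt(19.4**2 / 238 + 17.9**2 / 182)  # standard error, about 1.83

# 95% confidence interval: point estimate +/- 1.96 standard errors
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(round(lo, 1), round(hi, 1))  # 3.8 11.0
```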

Some review of Statistics Chapter 2.2-2.3
Mean, Variance and Covariance
OK, that was very fast. . . We will recall all of the theory step by step, as we use it in our new framework.
Y represents a characteristic of the population, with some underlying distribution
E(Y) = μ_Y is the mean of the distribution
E[(Y − μ_Y)²] = σ²_Y is the variance (σ_Y = sqrt(σ²_Y) is the standard deviation)
If we have two characteristics, X and Y, we can compute the covariance
cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = σ_XY
It measures the tendency of X and Y to move together

Some review of Statistics Chapter 2.3
Correlation
corr(X, Y) = σ_XY / sqrt(σ²_X σ²_Y)
is the correlation (always between −1 and 1)

Some review of Statistics Chapter 2.5
Random Sampling
A sample of data drawn randomly from a population: Y_1, . . . , Y_n
We will assume simple random sampling
Choose an individual (district, entity) at random from the population
Randomness and data:
Prior to sample selection, the value of Y_i is random because the individual selected is random
Once the individual is selected and the value of Y_i is observed, then Y_i is just a number - not random
The data set is (Y_1, . . . , Y_n), where Y_i = value of Y for the i-th individual (district, entity) sampled
Very important: the draws are independent and identically distributed (i.i.d.)

Some review of Statistics Chapter 3.1
Estimation of the population mean
Ȳ = (Σ_{i=1}^n Y_i) / n
is the estimator of the mean of Y, based on Y_1, . . . , Y_n
Why do we use Ȳ and not:
the first observation Y_1?
an average with unequal weights?
the median?
What are the properties of Ȳ?
Ȳ is a random variable based on Y_1, . . . , Y_n; its distribution is called the sampling distribution.
What are E(Ȳ) and var(Ȳ)?

Some review of Statistics Chapter 2.6
Example: Bernoulli trials
Suppose Y takes on 0 or 1 (a Bernoulli random variable) with the probability distribution
Pr(Y = 0) = 0.22, Pr(Y = 1) = 0.78

Some review of Statistics Chapter 2.6
General results
For every population Y, with mean μ_Y and variance σ²_Y, we have that
E(Ȳ) = μ_Y
var(Ȳ) = σ²_Y / n
The first result tells us that Ȳ is an unbiased estimator.
Together with the second, it implies the Law of Large Numbers: as n grows, the distribution of Ȳ becomes centered closer and closer around μ_Y.
Central Limit Theorem: as n increases, the distribution of Ȳ approaches a normal distribution N(μ_Y, σ²_Y/n),
so (Ȳ − E(Ȳ)) / σ_Ȳ approaches a standard normal distribution N(0, 1)
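A small Monte Carlo experiment (an illustrative sketch, reusing the Bernoulli example with p = 0.78) makes both results concrete: across many samples, the sample means center on μ_Y and their variance shrinks like σ²_Y/n.

```python
import random

random.seed(0)
p, n, reps = 0.78, 100, 2000   # Bernoulli(p) population, sample size n

# Draw many samples of size n and record each sample mean
means = []
for _ in range(reps):
    sample = [1 if random.random() < p else 0 for _ in range(n)]
    means.append(sum(sample) / n)

grand_mean = sum(means) / reps
var_means = sum((m - grand_mean) ** 2 for m in means) / reps

# E(Ybar) should be close to p, and var(Ybar) close to p*(1-p)/n
print(grand_mean, var_means, p * (1 - p) / n)
```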

Some review of Statistics Chapter 3.2
Sample Variance and Covariance
The sample variance of Y_1, . . . , Y_n is
s²_Y = (1/(n − 1)) Σ_{i=1}^n (Y_i − Ȳ)²
Division by n − 1 is a degrees of freedom correction
If we have two variables for each draw i, X_i and Y_i, then we can also compute the sample covariance
s_XY = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ)
The sample correlation is unitless:
r_XY = s_XY / (s_X s_Y)

The linear regression model with one regressor Chapter 4.1
Linear regression
Linear regression lets us estimate the slope of the population regression line
The slope of the population regression line is the expected effect on Y of a unit change in X
Our goal is to estimate the causal effect on Y of a unit change in X - for now we are just fitting a straight line to data on two variables, Y and X
1. Estimation: How should we draw a line through the data to estimate the population slope? Ordinary Least Squares (OLS)
2. Hypothesis testing: How do we test whether the slope is zero?
3. Confidence intervals: How do we construct a confidence interval for the slope?

The linear regression model with one regressor Chapter 4.1
Linear regression model
It is an econometric model (it can be supported by an economic model)
The Population Regression Line is
Y = β_0 + β_1 X
X is the independent variable or regressor
Y is the dependent variable
β_0 is the intercept (the value of Y when X = 0)
β_1 = ΔY/ΔX is the slope

The linear regression model with one regressor Chapter 4.1
Sample of n observations
We have n observations (X_i, Y_i) for i = 1, . . . , n
The econometric model to estimate is
Y_i = β_0 + β_1 X_i + u_i, for all i = 1, . . . , n
u_i is the regression error:
omitted factors
errors in measurement
simply randomness

The linear regression model with one regressor Chapter 3.1
Analogy with the estimation of the mean
We want to estimate β_0 and β_1 from the data.
Recall that
Ȳ = (Σ_i Y_i) / n
is the least squares estimator of the mean of Y, minimizing
min_m Σ_{i=1}^n (Y_i − m)²
If the sample is drawn with i.i.d. probabilities from the population, then Ȳ is BLUE (the Best Linear Unbiased Estimator)
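A quick numerical check of this minimization property (on made-up data): the sum of squared deviations evaluated at Ȳ is below its value at nearby candidate values of m.

```python
y = [3.0, 7.0, 8.0, 12.0, 15.0]   # illustrative sample
ybar = sum(y) / len(y)            # the sample mean, 9.0

def sse(m):
    """Sum of squared deviations of the sample around a candidate value m."""
    return sum((yi - m) ** 2 for yi in y)

# The objective is smaller at the mean than at values just around it
print(sse(ybar), sse(ybar - 0.5), sse(ybar + 0.5))  # 86.0 87.25 87.25
```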

The linear regression model with one regressor Chapter 4.2
The Ordinary Least Squares Estimator (OLS)
We want to estimate β_0 and β_1 from the data.
The OLS estimator solves
min_{β_0, β_1} Σ_{i=1}^n u_i² = min_{β_0, β_1} Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)²
This can be solved using numerical analysis and linear algebra
In principle we could minimize a different function of the u_i (rather than the square)
but we will see that this one has nice properties
first of all, it can be computed exactly

The linear regression model with one regressor Appendix 4.2
Derivation of estimators β̂_0 and β̂_1
We want to solve
min_{β_0, β_1} Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)²
First Order Conditions are enough, because we are dealing with a summation of upward parabolas:
∂/∂β_0 Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)² = −2 Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)
∂/∂β_1 Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)² = −2 Σ_{i=1}^n (Y_i − β_0 − β_1 X_i) X_i
So the minimum (the OLS estimators β̂_0 and β̂_1) solves (the factor −2 is irrelevant)
Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 X_i) = 0
Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 X_i) X_i = 0

The linear regression model with one regressor Appendix 4.2
Derivation of estimators β̂_0 and β̂_1
We can divide the two equations by n, obtaining (by linearity)
Ȳ − β̂_0 − β̂_1 X̄ = 0
(Σ_{i=1}^n Y_i X_i)/n − β̂_0 X̄ − β̂_1 (Σ_{i=1}^n X_i²)/n = 0
This system has the unique solution
β̂_1 = [ (Σ_{i=1}^n Y_i X_i)/n − X̄ Ȳ ] / [ (Σ_{i=1}^n X_i²)/n − X̄² ] = s_XY / s²_X
β̂_0 = Ȳ − β̂_1 X̄
where s²_X and s_XY are respectively the sample variance of X and the sample covariance of X and Y
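The closed-form solution is easy to implement; a minimal sketch (the data are chosen to lie exactly on Y = 1 + 2X, so the estimates recover the line):

```python
def ols(x, y):
    """OLS estimates: beta1_hat = s_XY / s_X^2, beta0_hat = Ybar - beta1_hat * Xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    beta1 = s_xy / s_xx          # the 1/(n-1) factors cancel in the ratio
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

b0, b1 = ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b0, b1)  # 1.0 2.0
```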

The linear regression model with one regressor Chapter 4.2
OLS predicted values
The predicted values Ŷ_i and residuals û_i are then
Ŷ_i = β̂_0 + β̂_1 X_i, for all i = 1, . . . , n
û_i = Y_i − Ŷ_i, for all i = 1, . . . , n
In this way we have estimates of the unknown true population
intercept β_0
slope β_1
error terms u_i

The linear regression model with one regressor Appendix 4.3
Some additional facts about predicted values
We have that
ū = (Σ_i û_i)/n = [ Σ_i (Y_i − Ȳ) − β̂_1 Σ_i (X_i − X̄) ] / n = 0
so the mean of the residuals is always 0
Moreover, if we call
Explained sum of squares: ESS = Σ_{i=1}^n (Ŷ_i − Ȳ)²
Total sum of squares: TSS = Σ_{i=1}^n (Y_i − Ȳ)²
Sum of squared residuals: SSR = Σ_{i=1}^n û_i²
we have that TSS = SSR + ESS
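These identities can be checked numerically; the sketch below (made-up data) fits the OLS line with the closed-form formulas and verifies that the residuals average to zero and that TSS = ESS + SSR.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]   # roughly linear, illustrative only

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
beta1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
        / sum((a - xbar) ** 2 for a in x)
beta0 = ybar - beta1 * xbar

yhat = [beta0 + beta1 * a for a in x]        # predicted values
resid = [b - yh for b, yh in zip(y, yhat)]   # residuals

tss = sum((b - ybar) ** 2 for b in y)
ess = sum((yh - ybar) ** 2 for yh in yhat)
ssr = sum(u ** 2 for u in resid)

assert abs(sum(resid)) < 1e-9          # mean of the residuals is 0
assert abs(tss - (ess + ssr)) < 1e-9   # TSS = ESS + SSR
```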

The linear regression model with one regressor Chapter 4.2
Application to the California Test Score Data Set
β̂_0 = 698.9 and β̂_1 = −2.28,
so we can draw a straight line on our scatter plot

One of the districts in the data set is San Mateo, CA, for which STR = 20.16 and Test Score = 661.5
predicted value: Ŷ_San Mateo = 698.9 − 2.28 × 20.16 = 652.9
residual: û_San Mateo = 661.5 − 652.9 = 8.6 (is it large?)

Measures of fit of the linear regression method Chapter 4.3
Measures of fit
Two regression statistics provide complementary measures of how well the regression line fits or explains the data:
1. The regression R² measures the fraction of the variance of Y that is explained by X
it is unitless and ranges between zero (no fit) and one (perfect fit)
2. The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y

Measures of fit of the linear regression method Chapter 4.3
The regression R²
Explained sum of squares: ESS = Σ_{i=1}^n (Ŷ_i − Ȳ)²
Total sum of squares: TSS = Σ_{i=1}^n (Y_i − Ȳ)²
Sum of squared residuals: SSR = Σ_{i=1}^n û_i²
A measure of fit of the regression is the ratio of the first two sums of squares:
R² = ESS/TSS = 1 − SSR/TSS
It is always between
0: all terms are non-negative; it is 0 if β̂_1 = 0
and 1: as a result of the minimization we have ESS ≤ TSS

Measures of fit of the linear regression method Chapter 4.3
The standard error of the regression
The SER measures the spread of the distribution of the estimated errors û_i.
The SER is (almost) the sample standard deviation of the OLS residuals:
SER = sqrt( (Σ_i û_i²) / (n − 2) )
It has the units of the û_i, and hence of Y
It measures the average mistake made by the OLS regression line
Division by n − 2 is a degrees of freedom correction
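A compact helper computing both measures of fit (an illustrative sketch; on data that lie exactly on a line, R² = 1 and SER = 0):

```python
import math

def fit_stats(x, y):
    """Return (R^2, SER) for a one-regressor OLS fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    beta1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
            / sum((a - xbar) ** 2 for a in x)
    beta0 = ybar - beta1 * xbar
    resid = [b - (beta0 + beta1 * a) for a, b in zip(x, y)]
    tss = sum((b - ybar) ** 2 for b in y)
    ssr = sum(u ** 2 for u in resid)
    r2 = 1 - ssr / tss                  # R^2 = 1 - SSR/TSS
    ser = math.sqrt(ssr / (n - 2))      # n - 2 degrees-of-freedom correction
    return r2, ser

r2, ser = fit_stats([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(r2, ser)  # 1.0 0.0
```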

Measures of fit of the linear regression method Chapter 4.3
Application to the California Test Score Data Set
In this case we have
R² = 0.05
SER = 18.6

STR explains only a small fraction of the variation in test scores.
Does this make sense?

Assumptions of the linear regression method Chapter 4.4
The Least Squares Assumptions
We have estimated our econometric model as
Y_i = β̂_0 + β̂_1 X_i + û_i, for all i = 1, . . . , n
What, in a precise sense, are the properties of the sampling distribution of the OLS estimator? When will β̂_1 be unbiased? What is its variance?
To answer these questions, we need to make three main assumptions:
about the true errors u_i
about how the data are collected (the sampling scheme)
about how Y and X are related to each other
These assumptions are known as the Least Squares Assumptions.

Assumptions of the linear regression method Chapter 4.4
The Least Squares Assumptions
We have a true model where, for a sample (X, Y),
Y_i = β_0 + β_1 X_i + u_i, for all i = 1, . . . , n
We assume that:
1. The conditional distribution of u given X has mean zero: E(u|X = x) = 0
this implies that β̂_1 is unbiased
2. (X_i, Y_i), for all i = 1, . . . , n, are i.i.d.
this is true if (X, Y) are collected by simple random sampling
this delivers the sampling distribution of β̂_0 and β̂_1
3. Large outliers in X and/or Y are rare
technically, X and Y have finite fourth moments
outliers can result in meaningless values of β̂_1

Assumptions of the linear regression method Chapter 4.4
Assumption 1: E(u|X = x) = 0

u_i represents omitted factors, errors in measurement and randomness:
are we sure that Assumption 1 holds for the omitted factors?
i.e.: are we sure that those factors (e.g. the census of the district) are uncorrelated with the STR?
in an ideal randomized controlled experiment X is randomly assigned

Assumptions of the linear regression method Chapter 4.4
Assumption 2: (X_i, Y_i), for all i = 1, . . . , n, are i.i.d.
This arises automatically if the entity (individual, district) is sampled by simple random sampling:
the entities are selected from the same population, so (X_i, Y_i) are identically distributed for all i = 1, . . . , n
the entities are selected at random, so the values of (X, Y) for different entities are independently distributed
non-i.i.d. sampling may happen when data are recorded over time for the same entity (panel data and time series data)

Assumptions of the linear regression method Chapter 4.4
Assumption 3: Large outliers are rare
Technically: E(X⁴) and E(Y⁴) are finite
A large outlier is an extreme value of X or Y
it can happen when a variable is a ratio of two variables and the divisor can take arbitrarily small values (e.g. STR with almost no teachers)
on a technical level, if X and Y are bounded, then they have finite fourth moments
the substance of this assumption is that a large outlier can strongly influence the results

Assumptions of the linear regression method Chapter 4.4
Assumption 3: Large outliers are rare
Looking at the data (plot the data!), if you have a large outlier, ask:
is it a typo?
does it belong in your data set?
why is it an outlier?

Outliers are often data mistakes (coding or recording problems).
Sometimes they are observations that really shouldn't be in your data set.

Assumptions of the linear regression method Chapter 4.4
The Least Squares Assumptions
Well see that under the Least Square Assumptions the OLS estimator is
unbiased
consistent
with a distribution converging to a normal
Under additional assumptions it is also ecient and it has a normal
distributions also for n small
