
102B - Introduction to Econometrics - Lecture 1

Paolo Pin
Stanford University - January 8th, 2013
Paolo Pin 102B - Lecture 1 January 8, 2013 1 / 43

Outline
Outline of today's lecture
Syllabus
Introduction
Review of Statistics using class size and educational output
The linear regression model with one regressor
Measures of fit of the linear regression method
Assumptions of the linear regression method

Syllabus
Essential Information
Lectures: Tuesday and Thursday, from 1:15pm to 3:05pm
I am Paolo Pin: paolo.pin@unisi.it
Office Hours: Thursday 10am-12pm, Landau 230
There will be weekly TA sessions, from next week on; times and locations to be announced.

Syllabus
Coursework


Syllabus
Textbook
We will refer extensively to
Introduction to Econometrics, by James H. Stock and Mark W.
Watson (Addison-Wesley, 3rd Edition)
I will use some of the examples from there and the same notation, but I
may change the order of the topics.
In the slides I will always refer to the section of the book I am considering.
Datasets are available at http://wps.pearsoned.co.uk/ema_ge_
stock_ie_3/193/49605/12699039.cw/index.html

Syllabus
Clickers
Let's see if they work:
How do you self-estimate your understanding of the topics in Math 51 (calculus and linear algebra)?
A: very good   B: good   C: so-so...   D: poor   E: very poor

Syllabus
Scheduling
Econ 102B
Lecture   Date         PS due   S&W Chapters
1         Tue Jan 8             Ch. 4
2         Thu Jan 10            Ch. 17.1-.2
3         Tue Jan 15            Ch. 5
4         Thu Jan 17   PS 1     Ch. 17.3-.5
TA 1      Fri Jan 18
5         Tue Jan 22            Ch. 6
6         Thu Jan 24            Ch. 18
TA 2      Fri Jan 25
7         Tue Jan 29
8         Thu Jan 31   PS 2     Ch. 7
TA 3      Fri Feb 1
9         Tue Feb 5             Ch. 8
10        Thu Feb 7             Ch. 9
TA 4      Fri Feb 8
11        Tue Feb 12   PS 3
12        Thu Feb 14            Ch. 12
TA 5      Fri Feb 15
13        Tue Feb 19
14        Thu Feb 21   PS 4     Ch. 13
TA 6      Fri Feb 22
15        Tue Feb 26
16        Thu Feb 28
TA 7      Fri Mar 1
17        Tue Mar 5    PS 5
18        Thu Mar 7
TA 8      Fri Mar 8
19        Tue Mar 12   PS 6
20        Thu Mar 14
TA 9      Fri Mar 15
Final     Thu Mar 21

First Midterm: covers Ch. 4, 5, 6, 17, 18
Second Midterm: covers Ch. 7, 8, 9, 12, 13
Time permitting, we'll try to do chapters 10, 11, 14 & 15

Introduction Chapter 1
Why Econometrics?
Economic Theory is about qualitative effects
Example on the production of human capital:
What is the quantitative effect of reducing class size on student achievement?
Ideal solution: an experiment, but how?
Experiments are difficult to run in economics
We need to use data, but. . .
there are always omitted variables
correlation does not imply causation
external validity (does what we find hold generally?)
You will learn how to address problems of this kind, find causal effects, and eventually make predictions
From the data you will be able to tell a story, and to evaluate whether an alternative story is plausible

Review of Statistics using class size and educational output Appendix 4.1
The California Test Score Data Set
All K-6 and K-8 California school districts (n = 420) in 1998 and 1999
Variables:
5th grade test scores (Stanford-9 achievement test, combined math and reading), district average
Student-teacher ratio (STR) = (# of students in the district) / (# of full-time equivalent teachers)
+ other school and demographic characteristics averaged across districts


Review of Statistics using class size and educational output Chapter 3.7
Scatter plot
If we plot the test score vs. the STR we obtain

The sample correlation is negative:

r_XY = s_XY / (s_X s_Y) = [ (1/(n−1)) Σ_i (X_i − X̄)(Y_i − Ȳ) ] / (s_X s_Y) ≈ −0.23

Is there evidence of an effect of the STR on test scores?

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Hypothesis testing
We need to get some numerical evidence on whether districts with low STRs have higher test scores - but how?
1. Compare average test scores in districts with low STRs to those with high STRs (estimation)
2. Test the null hypothesis that the mean test scores in the two types of districts are the same, against the alternative hypothesis that they differ (hypothesis testing)
3. Estimate an interval for the difference in the mean test scores, high vs. low STR districts (confidence interval)

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Initial data analysis
Compare districts with small (STR < 20) and large (STR ≥ 20) class sizes:

Class Size   Average score Ȳ   Standard deviation s_Y   Observations
Small        657.4             19.4                     238
Large        650.0             17.9                     182

1. Estimation of Δ = difference between group means
2. Test the hypothesis that Δ = 0
3. Construct a confidence interval for Δ

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Estimation
Δ = Ȳ_small − Ȳ_large = 657.4 − 650.0 = 7.4

Is this a large difference?
Standard deviation across districts = 19.1
Difference between the 60th and 75th percentiles of the test score distribution is 667.6 − 659.4 = 8.2
Is this a big enough difference to be important for school reform discussions, for parents, or for a school committee?

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Hypothesis testing
Difference-in-means test: compute the t-statistic (approximately normally distributed, as n_s and n_l are large - Central Limit Theorem)

t = (Ȳ_small − Ȳ_large) / SE(Ȳ_small − Ȳ_large) = (Ȳ_small − Ȳ_large) / sqrt( s_s²/n_s + s_l²/n_l )
  = 7.4 / sqrt( 19.4²/238 + 17.9²/182 ) ≈ 4.05

SE(Ȳ_small − Ȳ_large) = 1.83 is the standard error of Ȳ_small − Ȳ_large; the sample variance is computed as

s_s² = (1/(n_s − 1)) Σ_{i: small} (Y_i − Ȳ_small)²

As |t| > 1.96, we reject (at the 5% significance level) the null hypothesis that the two means are the same.
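As a quick sanity check on this arithmetic, here is a short Python sketch that recomputes the t-statistic from the summary statistics in the table above:

```python
import math

# Summary statistics from the table above (small vs. large classes)
y_small, s_small, n_small = 657.4, 19.4, 238
y_large, s_large, n_large = 650.0, 17.9, 182

# Standard error of the difference in sample means
se = math.sqrt(s_small**2 / n_small + s_large**2 / n_large)

# t-statistic for the null hypothesis that the two means are equal
t = (y_small - y_large) / se

print(round(se, 2))  # 1.83
print(round(t, 2))   # 4.05
```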

Review of Statistics using class size and educational output Chapter 3.2 - 3.5
Confidence Interval
A 95% confidence interval for the difference between the means is between the values

(Ȳ_small − Ȳ_large) ± 1.96 · SE(Ȳ_small − Ȳ_large) = 7.4 ± 1.96 · 1.83

so it is [3.8, 11]

The 95% confidence interval for Δ does not include 0;
the hypothesis that Δ = 0 is rejected at the 5% level.
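The same summary numbers give the interval directly (a minimal Python sketch):

```python
import math

diff = 657.4 - 650.0                           # difference in sample means
se = math.sqrt(19.4**2 / 238 + 17.9**2 / 182)  # standard error, about 1.83

# 95% confidence interval: point estimate +/- 1.96 standard errors
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(round(lo, 1), round(hi, 1))  # 3.8 11.0
```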

Some review of Statistics Chapter 2.2-2.3
Mean, Variance and Covariance
OK, that was very fast. . . We will recall all of the theory step by step, as we use it in our new framework.
Y represents a characteristic of the population, with some underlying distribution
E(Y) = μ_Y is the mean of the distribution
E[(Y − μ_Y)²] = σ²_Y is the variance (σ_Y = sqrt(σ²_Y) is the standard deviation)
If we have two characteristics, X and Y, we can compute the covariance
cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = σ_XY
It measures the tendency of X and Y to move together

Some review of Statistics Chapter 2.3
Correlation
corr(X, Y) = σ_XY / sqrt(σ²_X σ²_Y)
is the correlation (always between −1 and 1)

Some review of Statistics Chapter 2.5
Random Sampling
A sample of data drawn randomly from a population: Y_1, . . . , Y_n
We will assume simple random sampling
Choose an individual (district, entity) at random from the population
Randomness and data:
Prior to sample selection, the value of Y_i is random because the individual selected is random
Once the individual is selected and the value of Y_i is observed, then Y_i is just a number - not random
The data set is (Y_1, . . . , Y_n), where Y_i = value of Y for the i-th individual (district, entity) sampled
Very important: the draws are independent and identically distributed (i.i.d.)

Some review of Statistics Chapter 3.1
Estimation of the population mean
Ȳ = (Σ_{i=1}^n Y_i) / n
is the estimator of the mean of Y, based on Y_1, . . . , Y_n
Why do we use Ȳ and not:
the first observation Y_1?
an average with unequal weights?
the median?
What are the properties of Ȳ?
Ȳ is a random variable based on Y_1, . . . , Y_n; its distribution is called the sampling distribution.
What are E(Ȳ) and var(Ȳ)?

Some review of Statistics Chapter 2.6
Example: Bernoulli trials
Suppose Y takes on 0 or 1 (a Bernoulli random variable) with the probability distribution
Pr(Y = 0) = 0.22, Pr(Y = 1) = 0.78

Some review of Statistics Chapter 2.6
General results
For every population Y, with mean μ_Y and variance σ²_Y, we have that
E(Ȳ) = μ_Y
var(Ȳ) = σ²_Y / n
The first result tells us that Ȳ is an unbiased estimator.
Together with the second, it implies the Law of Large Numbers: as n grows, the distribution of Ȳ becomes centered closer and closer around μ_Y.
Central Limit Theorem: as n increases, the distribution of Ȳ approaches a normal distribution N(μ_Y, σ²_Y/n),
so (Ȳ − E(Ȳ)) / σ_Ȳ approaches a standard normal distribution N(0, 1)
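A small Monte Carlo experiment (an illustrative sketch, reusing the Bernoulli example with p = 0.78) makes both results concrete: across many samples, the sample means center on μ_Y and their variance shrinks like σ²_Y/n.

```python
import random

random.seed(0)
p, n, reps = 0.78, 100, 2000   # Bernoulli(p) population, sample size n

# Draw many samples of size n and record each sample mean
means = []
for _ in range(reps):
    sample = [1 if random.random() < p else 0 for _ in range(n)]
    means.append(sum(sample) / n)

grand_mean = sum(means) / reps
var_means = sum((m - grand_mean) ** 2 for m in means) / reps

# E(Ybar) should be close to p, and var(Ybar) close to p*(1-p)/n
print(grand_mean, var_means, p * (1 - p) / n)
```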

Some review of Statistics Chapter 3.2
Sample Variance and Covariance
The sample variance of Y_1, . . . , Y_n is
s²_Y = (1/(n − 1)) Σ_{i=1}^n (Y_i − Ȳ)²
Division by n − 1 is a degrees of freedom correction
If we have two variables for each draw i, X_i and Y_i, then we can also compute the sample covariance
s_XY = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ)
The sample correlation is unitless:
r_XY = s_XY / (s_X s_Y)

The linear regression model with one regressor Chapter 4.1
Linear regression
Linear regression lets us estimate the slope of the population regression line
The slope of the population regression line is the expected effect on Y of a unit change in X
Our goal is to estimate the causal effect on Y of a unit change in X - for now we are just fitting a straight line to data on two variables, Y and X
1. Estimation: How should we draw a line through the data to estimate the population slope? Ordinary Least Squares (OLS)
2. Hypothesis testing: How do we test whether the slope is zero?
3. Confidence intervals: How do we construct a confidence interval for the slope?

The linear regression model with one regressor Chapter 4.1
Linear regression model
It is an econometric model (it can be supported by an economic model)
The Population Regression Line is
Y = β_0 + β_1 X
X is the independent variable or regressor
Y is the dependent variable
β_0 is the intercept (the value of Y when X = 0)
β_1 = ΔY/ΔX is the slope

The linear regression model with one regressor Chapter 4.1
Sample of n observations
We have n observations (X_i, Y_i) for i = 1, . . . , n
The econometric model to estimate is
Y_i = β_0 + β_1 X_i + u_i, for all i = 1, . . . , n
u_i is the regression error:
omitted factors
errors in measurement
simply randomness

The linear regression model with one regressor Chapter 3.1
Analogy with the estimation of the mean
We want to estimate β_0 and β_1 from the data.
Recall that
Ȳ = (Σ_i Y_i) / n
is the least squares estimator of the mean of Y, minimizing
min_m Σ_{i=1}^n (Y_i − m)²
If the sample is drawn with i.i.d. probabilities from the population, then Ȳ is BLUE (the Best Linear Unbiased Estimator)
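A quick numerical check of this minimization property (on made-up data): the sum of squared deviations evaluated at Ȳ is below its value at nearby candidate values of m.

```python
y = [3.0, 7.0, 8.0, 12.0, 15.0]   # illustrative sample
ybar = sum(y) / len(y)            # the sample mean, 9.0

def sse(m):
    """Sum of squared deviations of the sample around a candidate value m."""
    return sum((yi - m) ** 2 for yi in y)

# The objective is smaller at the mean than at values just around it
print(sse(ybar), sse(ybar - 0.5), sse(ybar + 0.5))  # 86.0 87.25 87.25
```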

The linear regression model with one regressor Chapter 4.2
The Ordinary Least Squares Estimator (OLS)
We want to estimate β_0 and β_1 from the data.
The OLS estimator solves
min_{β_0, β_1} Σ_{i=1}^n u_i² = min_{β_0, β_1} Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)²
This can be solved using numerical analysis and linear algebra
In principle we could minimize a different function of the u_i (rather than the square)
but we will see that this one has nice properties
first of all, it can be computed exactly

The linear regression model with one regressor Appendix 4.2
Derivation of estimators β̂_0 and β̂_1
We want to solve
min_{β_0, β_1} Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)²
First Order Conditions are enough, because we are dealing with a summation of upward parabolas:
∂/∂β_0 Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)² = −2 Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)
∂/∂β_1 Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)² = −2 Σ_{i=1}^n (Y_i − β_0 − β_1 X_i) X_i
So the minimum (the OLS estimators β̂_0 and β̂_1) solves (the factor −2 is irrelevant)
Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 X_i) = 0
Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 X_i) X_i = 0

The linear regression model with one regressor Appendix 4.2
Derivation of estimators β̂_0 and β̂_1
We can divide the two equations by n, obtaining (by linearity)
Ȳ − β̂_0 − β̂_1 X̄ = 0
(Σ_{i=1}^n Y_i X_i)/n − β̂_0 X̄ − β̂_1 (Σ_{i=1}^n X_i²)/n = 0
This system has the unique solution
β̂_1 = [ (Σ_{i=1}^n Y_i X_i)/n − X̄ Ȳ ] / [ (Σ_{i=1}^n X_i²)/n − X̄² ] = s_XY / s²_X
β̂_0 = Ȳ − β̂_1 X̄
where s²_X and s_XY are respectively the sample variance of X and the sample covariance of X and Y
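The closed-form solution is easy to implement; a minimal sketch (the data are chosen to lie exactly on Y = 1 + 2X, so the estimates recover the line):

```python
def ols(x, y):
    """OLS estimates: beta1_hat = s_XY / s_X^2, beta0_hat = Ybar - beta1_hat * Xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    beta1 = s_xy / s_xx          # the 1/(n-1) factors cancel in the ratio
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

b0, b1 = ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b0, b1)  # 1.0 2.0
```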

The linear regression model with one regressor Chapter 4.2
OLS predicted values
The predicted values Ŷ_i and residuals û_i are then
Ŷ_i = β̂_0 + β̂_1 X_i, for all i = 1, . . . , n
û_i = Y_i − Ŷ_i, for all i = 1, . . . , n
In this way we have estimates of the unknown true population
intercept β_0
slope β_1
error terms u_i

The linear regression model with one regressor Appendix 4.3
Some additional facts about predicted values
We have that
ū = (Σ_i û_i)/n = [ Σ_i (Y_i − Ȳ) − β̂_1 Σ_i (X_i − X̄) ] / n = 0
so the mean of the residuals is always 0
Moreover, if we call
Explained sum of squares: ESS = Σ_{i=1}^n (Ŷ_i − Ȳ)²
Total sum of squares: TSS = Σ_{i=1}^n (Y_i − Ȳ)²
Sum of squared residuals: SSR = Σ_{i=1}^n û_i²
we have that TSS = SSR + ESS
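These identities can be checked numerically; the sketch below (made-up data) fits the OLS line with the closed-form formulas and verifies that the residuals average to zero and that TSS = ESS + SSR.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]   # roughly linear, illustrative only

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
beta1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
        / sum((a - xbar) ** 2 for a in x)
beta0 = ybar - beta1 * xbar

yhat = [beta0 + beta1 * a for a in x]        # predicted values
resid = [b - yh for b, yh in zip(y, yhat)]   # residuals

tss = sum((b - ybar) ** 2 for b in y)
ess = sum((yh - ybar) ** 2 for yh in yhat)
ssr = sum(u ** 2 for u in resid)

assert abs(sum(resid)) < 1e-9          # mean of the residuals is 0
assert abs(tss - (ess + ssr)) < 1e-9   # TSS = ESS + SSR
```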

The linear regression model with one regressor Chapter 4.2
Application to the California Test Score Data Set
β̂_0 = 698.9 and β̂_1 = −2.28,
so we can draw a straight line on our scatter plot

One of the districts in the data set is San Mateo, CA, for which STR = 20.16 and Test Score = 661.5
predicted value: Ŷ_San Mateo = 698.9 − 2.28 × 20.16 = 652.9
residual: û_San Mateo = 661.5 − 652.9 = 8.6 (is it large?)

Measures of fit of the linear regression method Chapter 4.3
Measures of fit
Two regression statistics provide complementary measures of how well the regression line fits or explains the data:
1. The regression R² measures the fraction of the variance of Y that is explained by X
it is unitless and ranges between zero (no fit) and one (perfect fit)
2. The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y

Measures of fit of the linear regression method Chapter 4.3
The regression R²
Explained sum of squares: ESS = Σ_{i=1}^n (Ŷ_i − Ȳ)²
Total sum of squares: TSS = Σ_{i=1}^n (Y_i − Ȳ)²
Sum of squared residuals: SSR = Σ_{i=1}^n û_i²
A measure of fit of the regression is the ratio of the first two sums of squares:
R² = ESS/TSS = 1 − SSR/TSS
It is always between
0: all terms are non-negative; it is 0 if β̂_1 = 0
and 1: as a result of the minimization we have ESS ≤ TSS

Measures of fit of the linear regression method Chapter 4.3
The standard error of the regression
The SER measures the spread of the distribution of the estimated errors û_i.
The SER is (almost) the sample standard deviation of the OLS residuals:
SER = sqrt( (Σ_i û_i²) / (n − 2) )
It has the units of the û_i, and hence of Y
It measures the average mistake made by the OLS regression line
Division by n − 2 is a degrees of freedom correction
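A compact helper computing both measures of fit (an illustrative sketch; on data that lie exactly on a line, R² = 1 and SER = 0):

```python
import math

def fit_stats(x, y):
    """Return (R^2, SER) for a one-regressor OLS fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    beta1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
            / sum((a - xbar) ** 2 for a in x)
    beta0 = ybar - beta1 * xbar
    resid = [b - (beta0 + beta1 * a) for a, b in zip(x, y)]
    tss = sum((b - ybar) ** 2 for b in y)
    ssr = sum(u ** 2 for u in resid)
    r2 = 1 - ssr / tss                  # R^2 = 1 - SSR/TSS
    ser = math.sqrt(ssr / (n - 2))      # n - 2 degrees-of-freedom correction
    return r2, ser

r2, ser = fit_stats([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(r2, ser)  # 1.0 0.0
```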

Measures of fit of the linear regression method Chapter 4.3
Application to the California Test Score Data Set
In this case we have
R² = 0.05
SER = 18.6

STR explains only a small fraction of the variation in test scores.
Does this make sense?

Assumptions of the linear regression method Chapter 4.4
The Least Squares Assumptions
We have estimated our econometric model as
Y_i = β̂_0 + β̂_1 X_i + û_i, for all i = 1, . . . , n
What, in a precise sense, are the properties of the sampling distribution of the OLS estimator? When will β̂_1 be unbiased? What is its variance?
To answer these questions, we need to make three main assumptions:
about the true errors u_i
about how the data are collected (the sampling scheme)
about how Y and X are related to each other
These assumptions are known as the Least Squares Assumptions.

Assumptions of the linear regression method Chapter 4.4
The Least Squares Assumptions
We have a true model where, for a sample (X, Y),
Y_i = β_0 + β_1 X_i + u_i, for all i = 1, . . . , n
We assume that:
1. The conditional distribution of u given X has mean zero: E(u|X = x) = 0
this implies that β̂_1 is unbiased
2. (X_i, Y_i), for all i = 1, . . . , n, are i.i.d.
this is true if (X, Y) are collected by simple random sampling
this delivers the sampling distribution of β̂_0 and β̂_1
3. Large outliers in X and/or Y are rare
technically, X and Y have finite fourth moments
outliers can result in meaningless values of β̂_1

Assumptions of the linear regression method Chapter 4.4
Assumption 1: E(u|X = x) = 0

u_i represents omitted factors, errors in measurement and randomness:
are we sure that Assumption 1 holds for the omitted factors?
i.e.: are we sure that those factors (e.g. the census of the district) are uncorrelated with the STR?
in an ideal randomized controlled experiment X is randomly assigned

Assumptions of the linear regression method Chapter 4.4
Assumption 2: (X_i, Y_i), for all i = 1, . . . , n, are i.i.d.
This arises automatically if the entity (individual, district) is sampled by simple random sampling:
the entities are selected from the same population, so (X_i, Y_i) are identically distributed for all i = 1, . . . , n
the entities are selected at random, so the values of (X, Y) for different entities are independently distributed
non-i.i.d. sampling may happen when data are recorded over time for the same entity (panel data and time series data)

Assumptions of the linear regression method Chapter 4.4
Assumption 3: Large outliers are rare
Technically: E(X⁴) and E(Y⁴) are finite
A large outlier is an extreme value of X or Y
it can happen when a variable is a ratio of two variables and the divisor can take arbitrarily small values (e.g. STR with almost no teachers)
on a technical level, if X and Y are bounded, then they have finite fourth moments
the substance of this assumption is that a large outlier can strongly influence the results

Assumptions of the linear regression method Chapter 4.4
Assumption 3: Large outliers are rare
Looking at the data (plot the data!), if you have a large outlier, ask:
is it a typo?
does it belong in your data set?
why is it an outlier?

Outliers are often data mistakes (coding or recording problems).
Sometimes they are observations that really shouldn't be in your data set.

Assumptions of the linear regression method Chapter 4.4
The Least Squares Assumptions
Well see that under the Least Square Assumptions the OLS estimator is
unbiased
consistent
with a distribution converging to a normal
Under additional assumptions it is also ecient and it has a normal
distributions also for n small
