
EC3304 Econometrics II

Textbook: Stock and Watson's Introduction to Econometrics


Topics:
Ch. 2 and 3: Review of probability and statistics, matrix algebra, and simple regression model
Ch. 10: Panel data
Ch. 11: Binary dependent variables
Ch. 12: Instrumental variables
Ch. 13: Experiments and quasi-experiments
Ch. 14: Intro to time series and forecasting
Ch. 15: Estimation of dynamic causal effects
Ch. 16: Additional topics in time series
Assessment:
1. Tutorial participation 20%
2. Midterm exam 30%
week 7, time and location to be announced
3. Final exam 50%
28 April, 2014 (Monday), 1PM, location to be announced
Office hours: TBA
Tutorials start week 4
Chapter 2 Review of Probability
Section 2.2 Expected values, mean, and variance
The expected value of a random variable, Y, denoted $E(Y)$ or $\mu_Y$, is a weighted average of all possible values of Y
For a discrete random variable with k possible outcomes and probability function $\Pr(Y = y)$:
$$E(Y) = \sum_{j=1}^{k} y_j \Pr(Y = y_j)$$
For a continuous random variable with probability density function $f(y)$:
$$E(Y) = \int_{-\infty}^{\infty} y\, f(y)\, dy$$
Intuitively, the expectation can be thought of as the long-run average of a random variable
over many repeated trials or occurrences
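As a quick illustration, here is a minimal numpy sketch of the long-run-average interpretation (the fair-die example is ours, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# A fair die has E(Y) = (1 + 2 + ... + 6) / 6 = 3.5
draws = rng.integers(1, 7, size=100_000)  # 100,000 simulated rolls
print(draws.mean())  # close to 3.5: the long-run average approximates E(Y)
```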
The variance and standard deviation measure the dispersion or spread of a probability distribution
For a discrete random variable:
$$\sigma_Y^2 = \mathrm{var}(Y) = E\left[(Y - \mu_Y)^2\right] = \sum_{j=1}^{k} (y_j - \mu_Y)^2 \Pr(Y = y_j)$$
For a continuous random variable:
$$\sigma_Y^2 = \mathrm{var}(Y) = E\left[(Y - \mu_Y)^2\right] = \int_{-\infty}^{\infty} (y - \mu_Y)^2 f(y)\, dy$$
The standard deviation is the square root of the variance, denoted $\sigma_Y$
It is easier to interpret the SD since it has the same units as Y
The units of the variance are the squared units of Y
Linear functions of a random variable have convenient properties
Suppose
$$Y = a + bX,$$
where a and b are constants
The expectation of Y is $\mu_Y = a + b\mu_X$
The variance of Y is $\sigma_Y^2 = b^2 \sigma_X^2$
The standard deviation of Y is $\sigma_Y = |b|\,\sigma_X$
A random variable is standardized by the formula
$$Z = \frac{X - \mu_X}{\sigma_X},$$
which can be written as
$$Z = aX + b,$$
where $a = 1/\sigma_X$ and $b = -\mu_X/\sigma_X$.
The expectation of Z is
$$\mu_Z = a\mu_X + b = \frac{\mu_X}{\sigma_X} - \frac{\mu_X}{\sigma_X} = 0$$
The variance of Z is
$$\sigma_Z^2 = a^2 \sigma_X^2 = \frac{\sigma_X^2}{\sigma_X^2} = 1$$
Thus, the standardized random variable Z has a mean of zero and a variance of 1
Covariance measures the extent to which two random variables move together:
$$\sigma_{XY} \equiv \mathrm{cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = \sum_{i=1}^{k} \sum_{j=1}^{l} (x_i - \mu_X)(y_j - \mu_Y) \Pr(X = x_i, Y = y_j)$$
The covariance depends on the units of measurement
Correlation does not
Correlation is the covariance divided by the standard deviations of X and Y:
$$\rho_{XY} \equiv \mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
The correlation is constrained to the values:
$$-1 \le \mathrm{corr}(X, Y) \le 1$$
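A short simulation (illustrative numbers, not from the text) makes the units point concrete: rescaling X changes the covariance but not the correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)  # X and Y move together by construction

print(np.cov(x, y)[0, 1])        # covariance: depends on the units of X and Y
print(np.corrcoef(x, y)[0, 1])   # correlation: unit-free, between -1 and 1

# Measuring X in different units (x100) scales the covariance by 100
# but leaves the correlation unchanged
print(np.cov(100 * x, y)[0, 1])
print(np.corrcoef(100 * x, y)[0, 1])
```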
Section 2.4 The normal, chi-squared, Student t, and F distributions
We will concern ourselves with the normal distribution, which we will see over and over
The normal distribution is a continuous random variable that can take on any value
Its PDF has the familiar bell-shaped graph
We say X is normally distributed with mean $\mu$ and variance $\sigma^2$, written as $X \sim N(\mu, \sigma^2)$
Mathematically, the PDF of a normal random variable X with mean $\mu$ and variance $\sigma^2$ is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
If $X \sim N(\mu, \sigma^2)$, then $aX + b \sim N(a\mu + b,\ a^2\sigma^2)$
This is a good relationship to memorize
A random variable Z such that $Z \sim N(0, 1)$ has the standard normal distribution
The standard normal PDF is denoted by $\phi(z)$ and is given by
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, \exp\left(-\frac{1}{2} z^2\right)$$
The standard normal CDF is denoted by $\Phi(z)$
In other words:
$$\Pr(Z \le c) = \Phi(c)$$
Ex. Suppose $X \sim N(3, 4)$ and we want to know $\Pr(X \le 1)$
We compute the probability by standardizing and then looking up the probability in a table:
$$\Pr(X \le 1) = \Pr(X - 3 \le 1 - 3) = \Pr\left(\frac{X - 3}{2} \le \frac{1 - 3}{2}\right) = \Pr(Z \le -1) = \Phi(-1) = 0.159$$
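In practice the table lookup can be replaced by a one-liner; a sketch using scipy's normal CDF:

```python
from scipy.stats import norm

# X ~ N(3, 4), so the standard deviation is sqrt(4) = 2
print(norm.cdf(-1.0))                 # Phi(-1) = 0.1587, as in the example
print(norm.cdf(1.0, loc=3, scale=2))  # same probability without standardizing
```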
Section 2.5 Random sampling and the distribution of the sample average
A population is any well-defined group of subjects such as individuals, firms, cities, etc.
We would like to know something about the population
Ex. the distribution of wages of the working population (or mean and variance, etc.)
We cannot survey the whole population
instead we use random sampling and make inferences regarding the distribution
Random sampling: n objects are selected at random from a population with each
member of the population having an equally likely chance of being selected
If we obtain the wages of 500 randomly chosen people from the working population, then
we have a random sample of wages from the population of all working people
We use the sample to infer the distribution
The observations are random, reflecting the fact that many different outcomes are possible
If we sample another 500 people, then the wages in this sample will dier
So our estimation of the distribution (or some statistic) is itself random
Formally, let f(y) be some unknown pdf that we want to learn about
The value taken on by random variable Y is the outcome of an experiment and the associated
probabilities are dened by the function f(y)
Suppose that we repeat the experiment n times independently
We say that the n observations denoted $Y_1, Y_2, \ldots, Y_n$ are a random sample of size n from the population $f(y)$
This random sample consists of observations of n independently and identically distributed random variables $Y_1, Y_2, \ldots, Y_n$, each with the pdf $f(y)$
The sample mean $\bar{Y}$ is:
$$\bar{Y} = \frac{1}{n}\left(Y_1 + Y_2 + \cdots + Y_n\right) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
The sample mean is random and has a sampling distribution
The mean of $\bar{Y}$ is
$$E\left(\bar{Y}\right) = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\, n\, \mu_Y = \mu_Y$$
The variance of $\bar{Y}$ is
$$\sigma_{\bar{Y}}^2 \equiv \mathrm{var}\left(\bar{Y}\right) = \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(Y_i) = \frac{1}{n^2}\, n\, \sigma_Y^2 = \frac{\sigma_Y^2}{n}$$
(since $Y_1, Y_2, \ldots, Y_n$ are i.i.d., $\mathrm{cov}(Y_i, Y_j) = 0$ for $i \ne j$)
These results hold whatever the underlying distribution of Y is
If $Y \sim N\left(\mu_Y, \sigma_Y^2\right)$, then $\bar{Y} \sim N\left(\mu_Y, \sigma_Y^2/n\right)$
(the sum of normally distributed random variables is normally distributed)
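These formulas are easy to check by simulation; a sketch with made-up values of $\mu_Y$, $\sigma_Y$, and n:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_Y, sigma_Y, n = 5.0, 3.0, 50

# 20,000 samples of size n; one sample mean per sample
ybars = rng.normal(mu_Y, sigma_Y, size=(20_000, n)).mean(axis=1)

print(ybars.mean())  # close to mu_Y = 5
print(ybars.var())   # close to sigma_Y**2 / n = 9/50 = 0.18
```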
Section 2.6 Large-sample approximations to sampling distributions
There are two approaches to characterizing sampling distributions: exact or approximate
The exact approach means deriving the exact sampling distribution for any value of n: the finite-sample distribution
The approximate approach means using an approximation that can work well for large n: the asymptotic distribution
The key concepts for asymptotic results are the Law of Large Numbers and the Central
Limit Theorem
Convergence in Probability
Let $S_1, S_2, \ldots, S_n$ be a sequence of random variables. The sequence $S_n$ is said to converge in probability to $\mu$ if the probability that $S_n$ is close to $\mu$ tends to 1 as $n \to \infty$
The constant $\mu$ is called the probability limit of $S_n$
Ex. $S_n$ is the sample average of a sample of n observations: $S_n = \bar{Y}$. We will argue below that the probability limit of $\bar{Y}$ is $\mu_Y$
This idea is written mathematically as
$$\bar{Y} \xrightarrow{\,p\,} \mu_Y \quad \text{or} \quad \mathrm{plim}\left(\bar{Y}\right) = \mu_Y$$
Usually we will use the Law of Large Numbers to find the probability limit of a random variable
The Law of Large Numbers states that under certain conditions $\bar{Y}$ will be close to $\mu_Y$ with a very high probability when n is large:
If $Y_i$, $i = 1, \ldots, n$, are i.i.d. with $E(Y_i) = \mu_Y$ and if $\mathrm{var}(Y_i) < \infty$, then
$$\bar{Y} \xrightarrow{\,p\,} \mu_Y$$
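A minimal sketch of the LLN in action (the exponential distribution is our choice; any i.i.d. draws with finite variance work):

```python
import numpy as np

rng = np.random.default_rng(3)

# Exponential draws with E(Y_i) = 2 and finite variance, so the LLN applies:
# the sample mean should approach 2 as n grows
for n in (10, 1_000, 100_000):
    print(n, rng.exponential(scale=2.0, size=n).mean())
```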
Convergence in Distribution
Let $F_1, F_2, F_3, \ldots, F_n$ be a sequence of cumulative distribution functions corresponding to a sequence of random variables $S_1, S_2, \ldots, S_n$. The sequence $S_n$ is said to converge in distribution to S if the sequence of distribution functions $F_n$ converges to F, the CDF of S
That is, as $n \to \infty$, the distribution of $S_n$ is approximately equal to the distribution of S
Ex. $S_n$ may be the standardized sample average: $S_n = \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}$, which we will show below has a CDF similar to that of $S \sim N(0, 1)$ when n is large
We write
$$S_n \xrightarrow{\,d\,} S$$
The distribution F is called the asymptotic distribution of $S_n$
Usually we will use the Central Limit Theorem to argue convergence in distribution
The Central Limit Theorem states that under certain conditions the distribution of
a standardized sample mean is well approximated by a normal distribution when n is large
Suppose that $Y_1, \ldots, Y_n$ are i.i.d. with $E(Y_i) = \mu_Y$ and $\mathrm{var}(Y_i) = \sigma_Y^2$, where $0 < \sigma_Y^2 < \infty$.
Then, for large n,
$$\frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \overset{a}{\sim} N(0, 1)$$
This is a very convenient property that is used over and over in econometrics
A slightly different presentation of the CLT that is often useful:
$$\sqrt{n}\left(\bar{Y} - \mu_Y\right) \overset{a}{\sim} N\left(0, \sigma_Y^2\right)$$
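A sketch of the CLT at work: even with skewed (exponential) data, the standardized sample mean behaves like a standard normal for moderate n:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 50_000

# Exponential(1) draws are skewed, with mu_Y = 1 and sigma_Y = 1
ybars = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = (ybars - 1.0) / (1.0 / np.sqrt(n))  # standardized sample means

# If the N(0, 1) approximation is good, about 5% of |z| exceed 1.96
print(np.mean(np.abs(z) > 1.96))
```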
Continuity Theorem
Suppose that $Y_n \xrightarrow{\,p\,} \mu$ and $g(\cdot)$ is a continuous function. Then
$$g(Y_n) \xrightarrow{\,p\,} g(\mu)$$
In words:
To find the probability limit of a function of a random variable,
replace the random variable with its probability limit
Ex. If $s_Y^2 \xrightarrow{\,p\,} \sigma_Y^2$, then $s_Y \xrightarrow{\,p\,} \sigma_Y$.
Slutsky's Theorem
Suppose that $X_n \xrightarrow{\,d\,} X$, $Y_n \xrightarrow{\,p\,} c$, and $g(\cdot)$ is a continuous function. Then,
$$g(X_n, Y_n) \xrightarrow{\,d\,} g(X, c)$$
In words:
The asymptotic distribution of $g(X_n, Y_n)$ is
approximately equal to the distribution of $g(X, c)$
Ex. Suppose $Z_n = \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \xrightarrow{\,d\,} Z$, where $Z \sim N(0, 1)$. Then, the asymptotic distribution of $Z_n^2$ is equal to the distribution of $Z^2$, which is $\chi_1^2$.
Chapter 3 Review of statistics
Statistics is concerned with learning something about a population using a sample from that population
Estimation, hypothesis testing, and confidence intervals
Section 3.1 Estimation of the population mean
An estimator is a function of the sample data
An estimate is the numerical value of the estimator actually computed using data
Suppose we want to learn the mean of Y, $\mu_Y$
How do we do it? We come up with an estimator for $\mu_Y$, often denoted $\hat{\mu}_Y$
Naturally, we could use the sample average $\bar{Y}$ or even the first observation, $Y_1$
There are many possible estimators of $\mu_Y$, so we need some criteria to decide which ones are good
Some desirable properties of estimators
Unbiasedness: $E(\hat{\mu}_Y) = \mu_Y$
Consistency: $\hat{\mu}_Y \xrightarrow{\,p\,} \mu_Y$
Efficiency:
Let $\hat{\mu}_Y$ and $\tilde{\mu}_Y$ be unbiased estimators. $\hat{\mu}_Y$ is more efficient than $\tilde{\mu}_Y$ if $\mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y)$.
$\bar{Y}$ is unbiased, $E\left(\bar{Y}\right) = \mu_Y$, and consistent, $\bar{Y} \xrightarrow{\,p\,} \mu_Y$
$Y_1$ is unbiased, $E(Y_1) = \mu_Y$, but not consistent (why?)
Efficiency? $\mathrm{var}\left(\bar{Y}\right) = \frac{\sigma_Y^2}{n} < \mathrm{var}(Y_1) = \sigma_Y^2$
$\bar{Y}$ is the Best Linear Unbiased Estimator (BLUE)
That is, $\bar{Y}$ is the most efficient estimator of $\mu_Y$ among all unbiased estimators that are weighted averages of $Y_1, \ldots, Y_n$
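A simulation sketch (parameter values are ours) comparing the two unbiased estimators makes the efficiency ranking visible:

```python
import numpy as np

rng = np.random.default_rng(5)
mu_Y, sigma_Y, n, reps = 10.0, 4.0, 25, 20_000

samples = rng.normal(mu_Y, sigma_Y, size=(reps, n))
ybar = samples.mean(axis=1)  # the sample mean of each sample
y1 = samples[:, 0]           # the first observation of each sample

print(ybar.mean(), y1.mean())  # both close to 10: both unbiased
print(ybar.var(), y1.var())    # about 0.64 vs 16: Y-bar is far more efficient
```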
Section 3.2 Hypothesis tests concerning the population mean
The null hypothesis is a hypothesis about the population that we want to test
Typically not what the researcher thinks is true
A second hypothesis is called the alternative hypothesis
Typically what the researcher thinks is true
$$H_0: E(Y) = \mu_{Y,0} \quad \text{and} \quad H_1: E(Y) \ne \mu_{Y,0}$$
Hypothesis testing entails using a test statistic to decide whether to accept the null hypothesis or reject it in favor of the alternative hypothesis
In our example, we use the sample mean to test hypotheses about the population mean
Since the sample is random, any test statistic is random, and we have to reject or accept
the null hypothesis using a probabilistic calculation
Given a null hypothesis, the p-value is the probability of drawing a test statistic ($\bar{Y}$) that is as extreme or more extreme than the observed value of the test statistic ($\bar{y}$)
A small p-value indicates that the null hypothesis is unlikely to be true
To compute the p-value we make use of the CLT and treat $\bar{Y} \overset{a}{\sim} N\left(\mu_Y, \sigma_{\bar{Y}}^2\right)$
Under the null hypothesis, $\bar{Y} \overset{a}{\sim} N\left(\mu_{Y,0}, \sigma_{\bar{Y}}^2\right)$ (assume for now that $\sigma_{\bar{Y}}^2$ is known)
For $\bar{y} > \mu_{Y,0}$, the probability of getting a more extreme positive value than $\bar{y}$:
$$\Pr\left(\bar{Y} > \bar{y} \mid \mu_{Y,0}\right) = \Pr\left(\frac{\bar{Y} - \mu_{Y,0}}{\sigma_{\bar{Y}}} > \frac{\bar{y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right) = \Pr\left(Z > \frac{\bar{y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right) = \Phi\left(-\frac{\bar{y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right)$$
The p-value is thus
$$\text{p-value} = 2\,\Phi\left(-\frac{\bar{y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right)$$
For $\bar{y} < \mu_{Y,0}$, the probability of getting a more extreme negative value than $\bar{y}$:
$$\Pr\left(\bar{Y} < \bar{y} \mid \mu_{Y,0}\right) = \Pr\left(\frac{\bar{Y} - \mu_{Y,0}}{\sigma_{\bar{Y}}} < \frac{\bar{y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right) = \Phi\left(\frac{\bar{y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right)$$
The p-value is thus
$$\text{p-value} = 2\,\Phi\left(\frac{\bar{y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right)$$
We can encompass both these cases by using the absolute value function:
$$\text{p-value} = 2\,\Phi\left(-\left|\frac{\bar{y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right)$$
Typically, the standard deviation $\sigma_{\bar{Y}}$ is unknown and must be estimated
The sample variance is
$$s_Y^2 = \frac{1}{n - 1}\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2$$
and the standard error is
$$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{n}}$$
The p-value formula is simply altered by replacing the standard deviation with the standard error:
$$\text{p-value} = 2\,\Phi\left(-\left|\frac{\bar{y} - \mu_{Y,0}}{\hat{\sigma}_{\bar{Y}}}\right|\right)$$
That is, when n is large, $\bar{Y} \overset{a}{\sim} N\left(\mu_Y, \hat{\sigma}_{\bar{Y}}^2\right)$
If the p-value is less than some pre-decided value (often 0.05), then the null hypothesis is rejected. Otherwise, it is accepted (or more precisely, not rejected).
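Putting the pieces together, a sketch of the large-sample test on hypothetical data (with an n this small a t-distribution would be used in practice, but the normal approximation from the notes is shown):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample; test H0: E(Y) = 20 against H1: E(Y) != 20
y = np.array([22.1, 19.5, 23.8, 21.0, 18.9, 24.2, 20.7, 22.5, 19.8, 23.1])
mu_0 = 20.0

se = y.std(ddof=1) / np.sqrt(len(y))  # standard error: s_Y / sqrt(n)
p_value = 2 * norm.cdf(-abs((y.mean() - mu_0) / se))
print(p_value)  # reject H0 at the 5% level if p_value < 0.05
```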
Section 3.3 Confidence intervals for the population mean
An estimate is a single best guess of $\mu_Y$
A confidence interval gives a range of values that contains $\mu_Y$ with a certain probability
Using the sampling distribution $\bar{Y} \overset{a}{\sim} N\left(\mu_Y, \sigma_{\bar{Y}}^2\right)$ we have:
$$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \le 1.96\right) = 0.95$$
$$\Pr\left(\bar{Y} - 1.96\,\sigma_{\bar{Y}} \le \mu_Y \le \bar{Y} + 1.96\,\sigma_{\bar{Y}}\right) = 0.95$$
The interval $\left[\bar{Y} - 1.96\,\sigma_{\bar{Y}},\ \bar{Y} + 1.96\,\sigma_{\bar{Y}}\right]$ contains $\mu_Y$ with probability 0.95
An estimate of this interval, $\left[\bar{y} - 1.96\,\hat{\sigma}_{\bar{Y}},\ \bar{y} + 1.96\,\hat{\sigma}_{\bar{Y}}\right]$, is called a 95% confidence interval for $\mu_Y$
Note that the first interval is random but the second is not
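A sketch computing the 95% confidence interval for simulated data (true $\mu_Y = 50$ here, so the interval should usually cover it):

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.normal(50.0, 10.0, size=400)  # hypothetical sample with mu_Y = 50

se = y.std(ddof=1) / np.sqrt(len(y))              # standard error of Y-bar
ci = (y.mean() - 1.96 * se, y.mean() + 1.96 * se)
print(ci)  # across repeated samples, intervals like this contain 50 about 95% of the time
```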
Appendix 18.1 Summary of matrix algebra
A matrix is a collection of numbers (called elements) that are laid out in columns and
rows
The dimension of a matrix is $m \times n$, where m is the number of rows and n is the number of columns
An $m \times n$ matrix A is
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$
The element in the $i$th row and $j$th column of matrix A is denoted $a_{ij}$
A vector of dimension n is a collection of n numbers (called elements) collected either in a column or a row:
$$b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \quad \text{and} \quad c = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \end{bmatrix}$$
A column vector is an $n \times 1$ matrix and a row vector is a $1 \times n$ matrix
In most cases, we'll consider only column vectors
The $i$th element of vector b is denoted $b_i$
A matrix of dimension $1 \times 1$ is called a scalar
Matrix addition: two matrices A and B, each of the same dimension, are added
together by adding their elements
Vector multiplication (dot product or inner product) of two n-dimensional column vectors, a and b, is computed as
$$a'b = \sum_{i=1}^{n} a_i b_i$$
Matrix multiplication: two matrices A and B can be multiplied together to form the product $C \equiv AB$ if they are conformable
conformable: the number of columns of A equals the number of rows of B
The $(i, j)$ element of C is the dot product of the $i$th row of A and the $j$th column of B
The identity matrix $I_n$ is an $n \times n$ matrix with 1s on the diagonal and 0s everywhere else:
$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
The inverse of the square matrix A is defined as the matrix $A^{-1}$ for which $A^{-1}A = I_n$
The transpose of matrix A, denoted $A'$, switches the rows and columns. That is, element $(i, j)$ of A becomes element $(j, i)$ of $A'$: the element in the $i$th row and $j$th column is moved to the $j$th row and $i$th column
If A has dimension $m \times n$, then $A'$ has dimension $n \times m$
Some useful properties of matrix algebra and calculus:
1. $(A + B)' = A' + B'$
2. $(A + B)\,C = AC + BC$
3. $(AB)' = B'A'$
4. $\frac{\partial}{\partial z}\left(z'Az\right) = (A + A')\,z = 2Az$ (last step assumes A is symmetric)
5. $\frac{\partial}{\partial z}\left(a'z\right) = a$
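These identities are easy to sanity-check numerically; a numpy sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

print(np.allclose((A + B).T, A.T + B.T))             # property 1
print(np.allclose((A @ B).T, B.T @ A.T))             # property 3
print(np.allclose(np.linalg.inv(A) @ A, np.eye(3)))  # A^{-1} A = I_n
```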
Simple linear regression model: summation notation vs. matrix notation
The simple linear regression model is
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n$$
where $Y_i$ is the dependent variable, $X_i$ is the independent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope, and $u_i$ is the error term
The OLS estimator minimizes the sum of squared residuals:
$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2$$
The first-order conditions are:
$$\sum_{i=1}^{n} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right) = 0
\quad \text{and} \quad
\sum_{i=1}^{n} X_i \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right) = 0$$
Solving for $\hat{\beta}_0$ and $\hat{\beta}_1$:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}
\quad \text{and} \quad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
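A sketch applying these formulas to simulated data (the true coefficients are chosen by us):

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta0, beta1 = 500, 2.0, 0.7
X = rng.normal(size=n)
Y = beta0 + beta1 * X + rng.normal(size=n)  # simulate the model

# OLS estimates from the summation formulas above
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
print(b0, b1)  # close to the true values 2.0 and 0.7
```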
Properties of the OLS estimator of the slope of the simple linear regression model
We can also write the linear regression model in matrix notation
We generalize the model to include k independent variables (plus an intercept):
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n$$
The $(k+1)$-dimensional column vector $X_i$ contains the independent variables of the $i$th observation and $\beta$ is a $(k+1)$-dimensional column vector of coefficients. Then, we can rewrite the model as
$$Y_i = X_i'\beta + u_i, \quad i = 1, \ldots, n$$
The $n \times (k+1)$-dimensional matrix X contains the stacked $X_i'$, $i = 1, \ldots, n$:
$$X = \begin{bmatrix} X_1' \\ X_2' \\ \vdots \\ X_n' \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{k1} \\ 1 & X_{12} & \cdots & X_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{bmatrix}$$
The n-dimensional column vector Y contains the stacked observations of the dependent
variable and the n-dimensional column vector u contains the stacked error terms:
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \quad \text{and} \quad u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$$
We can now write all n observations compactly as
$$Y = X\beta + u$$
The sum of squared residuals is
$$\hat{u}'\hat{u} = \left(Y - X\hat{\beta}\right)'\left(Y - X\hat{\beta}\right) = Y'Y - 2\,\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta}$$
The first-order conditions are
$$-2\,X'Y + 2\,X'X\hat{\beta} = 0$$
which can be solved for $\hat{\beta}$ as
$$\left(X'X\right)\hat{\beta} = X'Y \quad \Longrightarrow \quad \hat{\beta} = \left(X'X\right)^{-1} X'Y$$
$\hat{\beta}$ is the vector representation of the OLS estimators
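The matrix formula translates directly into numpy; a sketch on simulated data (np.linalg.solve is used rather than an explicit inverse for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # constant + k regressors
beta = np.array([1.0, 0.5, -2.0])
Y = X @ beta + rng.normal(size=n)

# beta-hat = (X'X)^{-1} X'Y, computed by solving (X'X) b = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to [1.0, 0.5, -2.0]
```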