
ETC2410 Introductory Econometrics

Huize Zhang

27 May 2019
A little bit about me

I Honours | Econometrics
I Summer Research for Tennis Australia
I Worked in Predictive Analytics Group
What do we do in ETC2410?

I We talk about an estimation technique called Ordinary Least Squares (OLS).
I There are also many other techniques, e.g. MLE, GMM and other
techniques in statistical learning
What do we learn about OLS?

I Mechanism
I Assumptions
I Properties
I Inference based on OLS
Mechanism

Regression in matrix notation

$$y = X\beta + u$$

Minimising the sum of squared residuals gives the first-order condition

$$X'\hat{u} = 0$$

which solves to

$$\hat{\beta} = (X'X)^{-1}X'y$$
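The closed-form estimator can be checked numerically. The following is a minimal sketch on simulated data (the sample size, coefficients and seed are illustrative assumptions, not from the unit): it solves the normal equations and confirms that the residuals are orthogonal to the regressors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Design matrix: intercept column plus two hypothetical regressors
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])   # illustrative true coefficients
y = X @ beta_true + rng.normal(size=n)

# Solve the normal equations (X'X) b = X'y instead of forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

# First-order condition: X'u_hat should be zero up to rounding error
foc = X.T @ u_hat
print(beta_hat, foc)
```

Solving the linear system is numerically safer than computing the inverse explicitly, although both give the same estimate in well-conditioned problems.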
Assumptions of OLS

I Linearity: linear in parameters
I Randomness: random sample
I No perfect collinearity
I Zero conditional mean

$$E(u \mid X) = 0$$

I Homoskedasticity

$$Var(u \mid X) = \sigma^2 I_n$$
Properties (under assumptions)

I Unbiased Estimator: under assumptions 1, 2, 3 and 4

$$E(\hat{\beta}) = \beta$$

I BLUE (Best Linear Unbiased Estimator): under all 5 assumptions

$$Var(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$
Properties: Proof

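$$E(\hat{\beta}) = \beta$$

$$
\begin{aligned}
E(\hat{\beta}) &= E[(X'X)^{-1}X'y] \\
&= E[(X'X)^{-1}X'(X\beta + u)] \\
&= E[(X'X)^{-1}X'X\beta + (X'X)^{-1}X'u] \\
&= E[\beta + (X'X)^{-1}X'u] \\
&= \beta + E[(X'X)^{-1}X'u] \\
&= \beta + E[(X'X)^{-1}X'\,E(u \mid X)] && \text{condition on } X \\
&= \beta && \text{assumption 4: zero conditional mean}
\end{aligned}
$$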
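The unbiasedness result can also be illustrated by simulation. This is a rough Monte Carlo sketch under assumed settings (fixed X, standard normal errors, 5000 replications, all chosen here for illustration): the average of the estimates should sit close to the true coefficients, and their sampling variance close to the diagonal of $\sigma^2 (X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # held fixed across replications
beta = np.array([1.0, 2.0])                            # illustrative true coefficients

# A = (X'X)^{-1} X', so beta_hat = A y
A = np.linalg.solve(X.T @ X, X.T)

U = rng.normal(size=(reps, n))     # each row is one draw of u with E[u|X] = 0, sigma^2 = 1
Y = X @ beta + U                   # one simulated sample per row
beta_hats = Y @ A.T                # shape (reps, 2): one estimate per replication

mean_beta_hat = beta_hats.mean(axis=0)
var_beta_hat = beta_hats.var(axis=0)
var_theory = np.diag(np.linalg.inv(X.T @ X))   # sigma^2 = 1 here
print(mean_beta_hat, var_beta_hat, var_theory)
```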
Inference based on OLS

Note: up until now, we haven’t made any distributional assumption
about the errors (you don’t need a distributional assumption to have the
properties of OLS above)
I Normality
$$u \sim N(0, \sigma^2 I_n)$$
I This gives
$$\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1})$$
Then we can make inference (hypothesis testing + confidence
intervals) based on this distribution of $\hat{\beta}$
Hypothesis testing: three scenarios

I single parameter & single restriction: t-test
I multiple parameters & multiple restrictions: F-test
I multiple parameters & single restriction: reparameterisation
Test hypothesis about a single restriction: t-test

$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}$$

Five steps in hypothesis testing (put these in your cheat sheet)

I step 1: $H_0: \ldots$; $H_1: \ldots$
I step 2: distribution under $H_0$

$$T = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}$$

I step 3: $t_{calc} = \ldots$; $t_{crit} = \ldots$
I step 4: reject $H_0$ if $|t_{calc}| > t_{crit}$
I step 5: based on the calculated value and critical value, we
reject / do not reject the null and conclude that [put in context]
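Steps 2 to 4 can be sketched in a few lines. The example below uses simulated data with an assumed design (n = 103 and k = 2, so there are 100 degrees of freedom) and tests $H_0: \beta_1 = 0$ against a two-sided alternative. The critical value 1.984 is the 5% two-sided value for 100 degrees of freedom taken from a t table; in practice you would look up the value for your own degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 103, 2                                  # df = n - k - 1 = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=n)   # illustrative coefficients

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat

# Estimated error variance and coefficient standard errors
sigma2_hat = u_hat @ u_hat / (n - k - 1)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))

j = 1                                          # test the coefficient on the first regressor
t_calc = (beta_hat[j] - 0.0) / se[j]           # hypothesised value under H0 is 0
t_crit = 1.984                                 # table value: 5% two-sided, 100 df
reject = abs(t_calc) > t_crit
print(t_calc, reject)
```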
Test hypothesis about multiple restrictions: F-test
Formula 1
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q,\,n-k-1}$$

I unrestricted model: the original model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u$$

I restricted model: the original model after imposing the
restrictions in $H_0$
I e.g. $H_0: \beta_1 = \beta_2 = 0$
I the restricted model is then $y = \beta_0 + \beta_3 x_3 + \beta_4 x_4 + u$

I $q$ is the number of restrictions, i.e. the number of "=" signs in your
$H_0$
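As an illustration of Formula 1, the sketch below computes the F statistic for $H_0: \beta_1 = \beta_2 = 0$ on simulated data. The data-generating values and the helper name `ssr` are assumptions made for this example.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

rng = np.random.default_rng(3)
n, k, q = 120, 4, 2
x = rng.normal(size=(n, k))
# Illustrative data in which the null (beta1 = beta2 = 0) is true
y = 1.0 + 0.5 * x[:, 2] - 0.3 * x[:, 3] + rng.normal(size=n)

X_ur = np.column_stack([np.ones(n), x])          # unrestricted: x1, x2, x3, x4
X_r = np.column_stack([np.ones(n), x[:, 2:]])    # restricted: drop x1 and x2

ssr_ur, ssr_r = ssr(X_ur, y), ssr(X_r, y)
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(F)   # compare with the F(q, n-k-1) critical value from a table
```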
Test hypothesis about multiple parameters: F-test

Formula 2: overall significance

If the restrictions cover all the $\beta$ (except $\beta_0$), then the formula
simplifies to

$$F = \frac{R^2/k}{(1-R^2)/(n-k-1)} \sim F_{k,\,n-k-1}$$

since $SSR_{ur} = (1 - R^2)\,SST$ and $SSR_r = SST$, so $SST$ cancels

I choose the formula based on the information given
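The two formulas give the same number for the overall significance test. A small check on simulated data (all values here are illustrative assumptions), using the fact that the restricted model contains only an intercept, so its SSR equals SST:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 80, 3
x = rng.normal(size=(n, k))
y = 0.5 + x @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

ssr_ur = np.sum((y - X @ b) ** 2)
sst = np.sum((y - y.mean()) ** 2)     # SSR of the intercept-only (restricted) model
r2 = 1 - ssr_ur / sst

F_ssr = ((sst - ssr_ur) / k) / (ssr_ur / (n - k - 1))   # Formula 1 with q = k
F_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))              # Formula 2
print(F_ssr, F_r2)
```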
A special case: single restriction with multiple parameters

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u$$

$$H_0: \beta_1 = \beta_2$$
transform to
$$H_0: \delta = 0 \quad \text{where } \delta = \beta_1 - \beta_2$$

Since $\delta = \beta_1 - \beta_2$, we have $\beta_1 = \delta + \beta_2$, and
the original model becomes

$$y = \beta_0 + (\delta + \beta_2)x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u$$

$$y = \beta_0 + \delta x_1 + \beta_2 (x_1 + x_2) + \beta_3 x_3 + \beta_4 x_4 + u$$

Regress $y$ on $x_1$, $x_1 + x_2$, $x_3$, $x_4$ and use the estimated coefficient and
standard error of $x_1$ to conduct a t-test
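The reparameterisation can be verified numerically: the coefficient on $x_1$ in the transformed regression equals $\hat{\beta}_1 - \hat{\beta}_2$ from the original regression, and its standard error comes straight out of the OLS output. A sketch on simulated data, with an assumed helper `ols` written for this example:

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (X.shape[0] - X.shape[1])   # df = n - k - 1
    return b, np.sqrt(s2 * np.diag(XtX_inv))

rng = np.random.default_rng(5)
n = 150
x = rng.normal(size=(n, 4))
y = 1.0 + x @ np.array([0.7, 0.7, 0.2, -0.4]) + rng.normal(size=n)  # beta1 = beta2 holds

X_orig = np.column_stack([np.ones(n), x])
# Transformed regressors: x1, x1 + x2, x3, x4
X_rep = np.column_stack([np.ones(n), x[:, 0], x[:, 0] + x[:, 1], x[:, 2], x[:, 3]])

b_orig, _ = ols(X_orig, y)
b_rep, se_rep = ols(X_rep, y)

delta_hat, se_delta = b_rep[1], se_rep[1]
t_calc = delta_hat / se_delta        # t statistic for H0: delta = 0
print(delta_hat, b_orig[1] - b_orig[2], t_calc)
```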
Interval

I Confidence Interval for a parameter

$$\hat{\beta} \pm t_{n-k-1}(\alpha) \cdot se(\hat{\beta})$$

I Prediction Interval for $y$

$$\hat{y} \pm t_{n-k-1}(\alpha) \cdot se(\hat{e})$$

where
$$se(\hat{e}) = \sqrt{\hat{\sigma}^2 + [se(\hat{y})]^2}$$
Prediction Interval

Why can’t you use $se(\hat{y})$ on its own?

I the prediction of $y$ includes two sources of uncertainty: the error of
the regression, $\hat{\sigma}^2$, and the variation of the estimation, $se(\hat{y})$
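The two sources of uncertainty can be computed separately. In this sketch the new observation `x0` and the simulated data are hypothetical; $se(\hat{y})$ uses the standard OLS formula $\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
sigma2_hat = resid @ resid / (n - k - 1)       # error of the regression

x0 = np.array([1.0, 0.5, -0.2])                # hypothetical new point (with intercept)
y0_hat = x0 @ b

# Variation of the estimation: uncertainty in the fitted value at x0
se_yhat = np.sqrt(sigma2_hat * (x0 @ XtX_inv @ x0))
# Prediction standard error combines both sources
se_e = np.sqrt(sigma2_hat + se_yhat**2)
print(y0_hat, se_yhat, se_e)
```

Because `se_e` always exceeds `se_yhat`, an interval built from `se_yhat` alone would be too narrow for predicting an individual outcome.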
Functional Form

I Log transformation

Form               Change of x          Change of y
log-level (y-x)    unit change          percentage change
level-log (y-x)    percentage change    unit change
log-log            percentage change    percentage change

Always remember: “controlling for all other variables”
I Quadratic term
I what’s the turning point? (find the maximum or minimum)
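For the quadratic term, the turning point follows from setting the derivative to zero. A short derivation, assuming the model is written as $y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$ (notation chosen here for illustration):

```latex
\frac{\partial E(y \mid x)}{\partial x} = \beta_1 + 2\beta_2 x = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{\beta_1}{2\beta_2}
```

The turning point is a maximum if $\beta_2 < 0$ and a minimum if $\beta_2 > 0$; in practice the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ are plugged in.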
Model Selection Criteria

I $R^2$ never decreases as the number of parameters increases -
not good!
Adjusted $R^2$ | AIC | BIC | HQ
I prefer the model with larger Adjusted $R^2$ but smaller AIC, BIC
and HQ - the last three add a penalty to SSR, and BIC adds
the largest penalty among the three
I the results may conflict; choose the best model considering all
the criteria
Dummy Variables

I Gender dummy
I Name the dummy after the category which has the value 1: female
is a good name; gender is a bad one
I Simple dummy interpretation: bring in the context!
I controlling for all other variables, females on average earn less
than their male counterparts by [whatever]% - don’t say a unit
increase in the female dummy!!!
Dummy Variables: interaction term

lwage = β0 + β1 female + β2 educ + β3 female ∗ educ

When female = 0
lwage = β0 + β2 educ
when female = 1

lwage = (β0 + β1 ) + (β2 + β3 )educ


Dummy variable trick

I If you have multiple dummy variables: yr2010, yr2011, yr2012,
yr2013, yr2014, only include 4 out of 5 in the regression
I This is because every observation falls into exactly one of
the categories, so yr2010 + yr2011 + yr2012 + yr2013 + yr2014 = 1,
which is perfectly collinear with the intercept
I Interpret based on the base level (usually the smallest)
I [number] [more or less] than the base level
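The collinearity problem shows up as a rank deficiency in the design matrix. A small sketch with made-up year data: including all five dummies plus the intercept gives six columns but only rank five, while dropping the base year restores full column rank.

```python
import numpy as np

# Hypothetical sample: 4 observations in each of the 5 years
years = np.array([2010, 2011, 2012, 2013, 2014] * 4)
dummies = (years[:, None] == np.arange(2010, 2015)).astype(float)   # n x 5
const = np.ones(len(years))

X_all = np.column_stack([const, dummies])          # intercept + all 5 dummies
X_drop = np.column_stack([const, dummies[:, 1:]])  # intercept + 4 dummies (base: yr2010)

rank_all = np.linalg.matrix_rank(X_all)     # 6 columns, rank 5: X'X is singular
rank_drop = np.linalg.matrix_rank(X_drop)   # 5 columns, rank 5: full column rank
print(rank_all, rank_drop)
```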
Heteroskedasticity

I Definition and Consequence


I Detection - BP test/ White test
I Correction
Definition and Consequence

Definition of heteroskedasticity (HTSK)
I The variance of the errors is not the same across observations
I In other words, the diagonal elements of the
variance-covariance matrix are not all the same
Consequence
I OLS is still unbiased
I no longer ‘BLUE’, thus no longer efficient, and the usual OLS SEs are incorrect
I t/F tests are invalid
Detection: BP test or White test

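Step 1: Null and alternative hypothesis

$$H_0: Var(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$$

$$H_1 \text{ (BP)}: Var(u \mid x_1, x_2, \ldots, x_k) = \delta_0 + \delta_1 z_1 + \delta_2 z_2 + \ldots + \delta_q z_q, \text{ with at least one } \delta_j \neq 0 \ (j \geq 1)$$

$$H_1 \text{ (White)}: Var(u \mid x_1, x_2, \ldots, x_k) \text{ is a function of } x_1, x_2, \ldots, x_k$$

where $z_1, z_2, \ldots, z_q$ is a subset of $x_1, x_2, \ldots, x_k$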

Detection: BP test or White test

Step 2: original and auxiliary regression

1. Original regression
Regress $y$ on $x_1, x_2, \ldots, x_k$ to get the residuals $\hat{u}_i$
2. Auxiliary regression
(BP) Regress the squared residuals $\hat{u}_i^2$ on $c, z_1, z_2, \ldots, z_q$ and note the $R^2$ as $R^2_{\hat{u}^2}$
(White) Regress the squared residuals $\hat{u}_i^2$ on $c, x_1, x_2, \ldots, x_k, x_1^2, x_2^2, \ldots, x_k^2$ and the
cross-products to get the $R^2$ and note it as $R^2_{\hat{u}^2}$

(White, special form) Regress the squared residuals $\hat{u}_i^2$ on $c, \hat{y}_i, \hat{y}_i^2$ to
get the $R^2$ and note it as $R^2_{\hat{u}^2}$
Detection: BP test or White test

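Step 3: distribution under the null

$$n \cdot R^2_{\hat{u}^2} \overset{asy}{\sim} \chi^2(q)$$

where $q$ is the number of regressors in the auxiliary regression (excluding the constant)

Step 4: Rejection criteria: reject $H_0$ if the calculated value is larger
than the critical value and conclude there’s HTSK.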
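The BP steps can be sketched as follows. The data are simulated with an error variance that depends on $x_1$ (an assumption made so the test has something to detect), the helper name `r_squared` is introduced for this example, and 5.991 is the 5% critical value of $\chi^2(2)$ from a table.

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS regression of y on X (X includes a constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(7)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n) * np.exp(0.5 * x1)     # Var(u|x) = exp(x1): heteroskedastic
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

# Original regression: keep the squared residuals
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat_sq = (y - X @ b) ** 2

# Auxiliary regression (BP with z = all x's, so q = 2)
q = 2
lm_stat = n * r_squared(X, u_hat_sq)
chi2_crit = 5.991                             # table value: chi2(2), 5%
reject = lm_stat > chi2_crit
print(lm_stat, reject)
```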
Correction

I Robust Standard Error


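I Transformation - if the variance is of a particular form

$$Var(u_i \mid x_{i1}, x_{i2}, \ldots, x_{ik}) = \sigma^2 h_i$$

$$\frac{1}{h_i} Var(u_i \mid x_{i1}, x_{i2}, \ldots, x_{ik}) = \sigma^2$$

$$Var\left(\frac{u_i}{\sqrt{h_i}} \,\Big|\, x_{i1}, x_{i2}, \ldots, x_{ik}\right) = \sigma^2$$

It shows that if $u_i$ becomes $u_i/\sqrt{h_i}$, then the error is homoskedastic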
Correction

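$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik} + u_i$$

Therefore, we regress $y_i/\sqrt{h_i}$ on $1/\sqrt{h_i}, x_{i1}/\sqrt{h_i}, x_{i2}/\sqrt{h_i}, \ldots, x_{ik}/\sqrt{h_i}$; the model
will then look like

$$\frac{y_i}{\sqrt{h_i}} = \frac{1}{\sqrt{h_i}}\beta_0 + \frac{1}{\sqrt{h_i}}\beta_1 x_{i1} + \frac{1}{\sqrt{h_i}}\beta_2 x_{i2} + \ldots + \frac{1}{\sqrt{h_i}}\beta_k x_{ik} + \frac{1}{\sqrt{h_i}}u_i$$
and your error will be homoskedastic :)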
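A minimal sketch of the transformation, assuming the variance function is known and equal to $h_i = x_i$ (an assumption made purely for this simulated example). Note that the transformed regression has no separate constant: the original intercept becomes the coefficient on $1/\sqrt{h_i}$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400
x = rng.uniform(1.0, 5.0, size=n)
h = x                                     # assumed known: Var(u_i | x_i) = sigma^2 * h_i
u = rng.normal(size=n) * np.sqrt(h)
y = 1.0 + 2.0 * x + u                     # illustrative true model

# Divide every term, including the intercept, by sqrt(h_i)
w = 1.0 / np.sqrt(h)
X_star = np.column_stack([w, w * x])      # columns: 1/sqrt(h), x/sqrt(h)
y_star = w * y

beta_wls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
resid_star = y_star - X_star @ beta_wls   # transformed residuals, roughly constant variance
print(beta_wls)
```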
Serial Correlation

I Definition and Consequence


I Detection - BG test
I Correction
Definition and Consequence

Definition
I Errors in different periods are correlated with each other
Consequence
I Affects the variance-covariance matrix: off-diagonal elements
are not all zero, which implies that

$$Var(u \mid X) \neq \sigma^2 I_n$$

I OLS estimation remains unbiased, but it is not BLUE (not best,
not efficient)
Detection - BG test

I Step 1: Set up the structural equation and auxiliary equation
(choose the lag length)

$$y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \ldots + \beta_k x_{tk} + u_t$$

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \rho_3 u_{t-3} + e_t$$

I Step 2: Null and alternative hypothesis

$$H_0: \rho_1 = \rho_2 = \rho_3 = 0$$

$$H_1: \text{at least one of the } \rho_j \neq 0$$

I Step 3: Estimate the original equation by OLS and obtain the OLS
residuals $\hat{u}_t$
Detection - BG test

I Step 4: Estimate the auxiliary regression by OLS (regress $\hat{u}_t$ on the
regressors and $\hat{u}_{t-1}, \hat{u}_{t-2}, \hat{u}_{t-3}$), obtain
the $R^2$ and calculate $BG_{calc} = (n - q)R^2_{\hat{u}}$, where $q$ is the number of lags
I Step 5: Distribution of the BG statistic under $H_0$
$$BG = (n - 3)R^2_{\hat{u}} \overset{asy}{\sim} \chi^2(3)$$

I Step 6: Rejection rule: reject the null if $BG_{calc} > BG_{crit}$ and
conclude there’s serial correlation in the original regression. If no
adjustment is made, the estimators will be unbiased but not
efficient.
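The six steps can be sketched on simulated data. Here the errors follow an AR(1) process with coefficient 0.6 (an assumption made so that serial correlation is present), three lags are used, and 7.815 is the 5% critical value of $\chi^2(3)$ from a table.

```python
import numpy as np

rng = np.random.default_rng(9)
n, q = 300, 3
x = rng.normal(size=n)
e = rng.normal(size=n)

# Serially correlated errors: u_t = 0.6 u_{t-1} + e_t
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + e[t]
y = 1.0 + 0.5 * x + u

# Step 3: original regression and residuals
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ b

# Step 4: auxiliary regression of u_hat_t on the regressors and q lags of u_hat
lags = np.column_stack([u_hat[q - j:n - j] for j in range(1, q + 1)])
Z = np.column_stack([X[q:], lags])
dep = u_hat[q:]
g, *_ = np.linalg.lstsq(Z, dep, rcond=None)
resid = dep - Z @ g
r2 = 1 - resid @ resid / np.sum((dep - dep.mean()) ** 2)

# Steps 5 and 6: test statistic and rejection rule
bg_calc = (n - q) * r2
bg_crit = 7.815                          # table value: chi2(3), 5%
reject = bg_calc > bg_crit
print(bg_calc, reject)
```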
Correction

I Solution 1: HAC standard error: Heteroskedasticity and


Autocorrelation Consistent estimator
I Solution 2: Estimate by GLS (FGLS)
I Solution 3: Dynamic Model
Dynamic Model

I Include lags of y in the RHS of the model


I Example: AR models
Assumptions under time series setting

Assumption 1: Weakly stationary. If a series is weakly stationary (or
simply stationary), it has the properties that
I $E(y_t) = \mu$ for all $t$
I $Var(y_t) = \gamma_0$ for all $t$
I $Cov(y_t, y_{t-j}) = \gamma_j$ for all $t$
Note: property 3 means that the covariance between $y_t$ and $y_{t-j}$ only depends on
the time interval separating them rather than the time itself
Assumption 2: White noise
I $E(y_t) = 0$ for all $t$
I $Var(y_t) = \sigma^2$ for all $t$
I $Cov(y_t, y_{t-j}) = 0$ for all $t$ and all $j \neq 0$
AR models

AR(p) model

$$y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + u_t$$

AR(1) model
$$y_t = \phi_0 + \phi_1 y_{t-1} + u_t$$
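Stationarity restriction

$$|\phi_1| < 1$$

$$
\begin{aligned}
y_t &= \phi_0 + \phi_1 y_{t-1} + u_t \\
&= \phi_0 + \phi_1(\phi_0 + \phi_1 y_{t-2} + u_{t-1}) + u_t \\
&= \phi_0(1 + \phi_1) + \phi_1^2 y_{t-2} + u_t + \phi_1 u_{t-1} \\
&= \phi_0(1 + \phi_1) + \phi_1^2(\phi_0 + \phi_1 y_{t-3} + u_{t-2}) + u_t + \phi_1 u_{t-1} \\
&= \cdots + \phi_1^3 y_{t-3} + \cdots \\
&= \cdots + \phi_1^p y_{t-p} + \cdots
\end{aligned}
$$

With $|\phi_1| < 1$, the weight $\phi_1^p$ on the distant past shrinks to zero as $p$ grows.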
Derivation of $E(y_t)$
Given that $y_t$ is a stationary series and $u_t$ is white noise, derive the
expressions for $E(y_t)$, $Var(y_t)$ and $Cov(y_t, y_{t-j})$

$$E(y_t) = \frac{\phi_0}{1 - \phi_1}$$

because
$$E(y_t) = E(\phi_0) + E(\phi_1 y_{t-1}) + E(u_t)$$
$$E(y_t) = \phi_0 + \phi_1 E(y_{t-1}) + E(u_t)$$

Because of stationarity property 1, $E(y_t) = E(y_{t-1})$
Because of white noise property 1, $E(u_t) = 0$
Therefore,
$$E(y_t) = \phi_0 + \phi_1 E(y_t)$$
$$E(y_t) = \frac{\phi_0}{1 - \phi_1}$$
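Derivation of $Var(y_t)$
$$Var(y_t) = \frac{\sigma^2}{1 - \phi_1^2}$$

because

$$Var(y_t) = Var(\phi_0) + Var(\phi_1 y_{t-1}) + Var(u_t)$$

(the covariance term drops out because $u_t$ is uncorrelated with $y_{t-1}$, and $Var(\phi_0) = 0$)

$$Var(y_t) = \phi_1^2 Var(y_{t-1}) + Var(u_t)$$

Because of stationarity property 2, $Var(y_t) = Var(y_{t-1})$
Because of white noise property 2, $Var(u_t) = \sigma^2$
Therefore,
$$Var(y_t) = \phi_1^2 Var(y_t) + \sigma^2$$

$$Var(y_t) = \frac{\sigma^2}{1 - \phi_1^2}$$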
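Both moments can be checked against a long simulated AR(1) path. The parameter values below ($\phi_0 = 1$, $\phi_1 = 0.5$, $\sigma = 1$) are illustrative assumptions, giving a theoretical mean of 2 and variance of 4/3.

```python
import numpy as np

rng = np.random.default_rng(10)
phi0, phi1, sigma = 1.0, 0.5, 1.0        # illustrative parameters, |phi1| < 1
T, burn = 200_000, 1_000

u = rng.normal(scale=sigma, size=T + burn)   # white noise
y = np.empty(T + burn)
y[0] = phi0 / (1 - phi1)                     # start at the stationary mean
for t in range(1, T + burn):
    y[t] = phi0 + phi1 * y[t - 1] + u[t]
y = y[burn:]                                 # discard the burn-in period

mean_theory = phi0 / (1 - phi1)
var_theory = sigma**2 / (1 - phi1**2)
print(y.mean(), mean_theory, y.var(), var_theory)
```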
Unit root - what if our series is not stationary: random walk

$$y_t = y_{t-1} + u_t$$
Assuming $u_t$ is still WN and the series starts at $y_0 = 0$, a random walk has the properties that

$$E(y_t) = 0$$

$$Var(y_t) = t\sigma^2$$
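The growing variance is easy to see by simulating many independent random walks and computing the variance across paths at each date. The number of paths, the horizon and $\sigma = 1$ are assumptions for this sketch, and each path starts from $y_0 = 0$.

```python
import numpy as np

rng = np.random.default_rng(11)
reps, T, sigma = 20_000, 50, 1.0

u = rng.normal(scale=sigma, size=(reps, T))
paths = np.cumsum(u, axis=1)        # y_t = u_1 + ... + u_t, starting from y_0 = 0

t = np.arange(1, T + 1)
mean_t = paths.mean(axis=0)         # should stay near 0 for every t
var_t = paths.var(axis=0)           # should grow like t * sigma^2
print(var_t[[0, 9, 49]])            # variance at t = 1, 10, 50
```

Since the variance depends on $t$, the series violates the constant-variance property of weak stationarity.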
Theory w.r.t OLS

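I WLLN: As $n$ goes to infinity, the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ converges in probability to $E(y)$
I CLT
$$\frac{\sqrt{n}(\bar{Y}_n - \mu)}{\sigma} \overset{d}{\to} N(0, 1)$$
This result holds exactly only in the limit: the CLT is a theorem for $n \to \infty$
If $n$ is large but finite, then we can rearrange and use the approximate
result

$$\bar{Y}_n \overset{asy}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)$$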
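A quick simulation illustrates the large-but-finite-$n$ approximation. The draws below come from an Exponential(1) distribution (an arbitrary non-normal choice for this sketch, with mean 1 and standard deviation 1); the standardised sample means should look approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(12)
n, reps = 200, 20_000
mu, sigma = 1.0, 1.0                      # mean and sd of Exponential(1)

samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma   # standardised sample means

# Share of standardised means inside +/- 1.96; close to 0.95 if the
# normal approximation is good
coverage = np.mean(np.abs(z) < 1.96)
print(z.mean(), z.var(), coverage)
```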
Theory w.r.t OLS

I Consistency: $plim(\hat{\theta}) = \theta$
  I WLLN
  I Jensen’s Inequality
  I Slutsky theorem (CMT)
I Asymptotic Normality: $\hat{\theta}$ is asymptotically distributed as
normal with mean $\theta$ and variance $V$
  I Taylor’s theorem
  I Consistency
  I CLT
  I CMT
  I WLLN
I Efficiency or asymptotic efficiency
  I If the variance in the (asy) normal distribution attains the CRLB (the inverse
  of the Fisher information per observation), then it is (asy) efficient