
ADVANCED ECONOMETRICS

ECO 318
Instructor: Kanika Mahajan
Violation of Assumptions
Assumption on Xs
    Multicollinearity
Distributional assumptions on the error term
    Homoscedasticity
    Serial Correlation
HETEROSCEDASTICITY
What is it?
Consider the true model below:
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
Assumptions for the OLS estimator of the above to be BLUE:
1) Linear in parameters
2) No perfect multicollinearity
3) Zero conditional mean (unbiasedness): $E(u_i \mid X) = 0$, $i = 1, \dots, n$
4) Homoskedasticity (efficiency, inference): $\mathrm{Var}(u_i \mid X) = E(u_i^2 \mid X) = \sigma^2$
5) No serial correlation (efficiency, inference): $\mathrm{Cov}(u_i, u_j \mid X) = 0$, $i \neq j$


Homoscedasticity?
This assumption is important to derive the standard errors of the
estimators. To see this, consider a simple regression model
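$$Y_i = \alpha + \beta_1 X_{1i} + u_i$$
Writing $X_i$ for $X_{1i}$:
$$\hat\beta_1 = \frac{\mathrm{Cov}(X, Y)}{V(X)} = \frac{\mathrm{Cov}(X, \alpha + \beta_1 X + u_i)}{V(X)} = \beta_1 + \frac{\mathrm{Cov}(X, u)}{V(X)}$$
$$V(\hat\beta_1 \mid X) = V\left(\beta_1 + \frac{\sum_{i=1}^{n} (X_i - \bar X) u_i / n}{\sum_{i=1}^{n} (X_i - \bar X)^2 / n} \,\Big|\, X\right) = \frac{V\left(\sum_{i=1}^{n} (X_i - \bar X) u_i / n \mid X\right)}{\left[\sum_{i=1}^{n} (X_i - \bar X)^2 / n\right]^2}$$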
Numerator in the previous expression:
$$\mathrm{Var}\left(\sum_{i=1}^{n} (X_i - \bar X) u_i / n \,\Big|\, X\right) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}[(X_i - \bar X) u_i \mid X] + \frac{2}{n^2} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{Cov}[(X_i - \bar X) u_i, (X_j - \bar X) u_j \mid X]$$
First expression: depends on homoskedasticity
Second expression: depends on serial correlation
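With homoskedasticity and no serial correlation, the variance is estimated by
$$\hat V(\hat\beta_1) = \frac{\hat\sigma^2}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$
To obtain an estimate of $\sigma^2$:
$$\hat\sigma^2 = \frac{1}{n - 2} \sum_{i=1}^{n} \hat u_i^2$$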
What happens when we violate homoscedasticity?

The estimator is still unbiased and consistent.
$R^2$ continues to have the same interpretation: a consistent estimator of the population R-squared, $1 - \sigma_u^2 / \sigma_y^2$, because both variances are unconditional variances.
The estimated standard errors of the regression parameters are incorrect.
The OLS estimator is no longer the most efficient.
The OLS t-stats no longer have t-distributions (even in large samples), and the F-stats no longer have F-distributions.
Way Out? : Simple case
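In a simple regression with zero covariances, the numerator is:
$$\mathrm{Var}\left(\sum_{i=1}^{n} (X_i - \bar X) u_i / n \,\Big|\, X\right) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}[(X_i - \bar X) u_i \mid X]$$
$$= \frac{1}{n^2} \sum_{i=1}^{n} E[(X_i - \bar X)^2 u_i^2 \mid X]$$
$$= \frac{1}{n^2} \sum_{i=1}^{n} \left[(X_i - \bar X)^2 E(u_i^2 \mid X)\right]$$
Replacing $E(u_i^2 \mid X)$ with the squared OLS residual $\hat u_i^2$ gives the estimate
$$\frac{1}{n^2} \sum_{i=1}^{n} \left[(X_i - \bar X)^2 \hat u_i^2\right]$$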
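The simple-case formula can be checked numerically. The sketch below is only an illustration in plain Python (the course examples use Stata); the function names and the small dataset are hypothetical. It divides the estimated numerator by $[\sum (X_i - \bar X)^2 / n]^2$, which simplifies to $\sum (X_i - \bar X)^2 \hat u_i^2 / [\sum (X_i - \bar X)^2]^2$, and compares it with the conventional estimate.

```python
def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def slope_variances(x, y):
    """Return (conventional, robust) variance estimates for the OLS slope."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    _, _, resid = simple_ols(x, y)
    # Conventional: sigma_hat^2 / sum (x_i - x_bar)^2, sigma_hat^2 = SSR / (n - 2)
    sigma2_hat = sum(u ** 2 for u in resid) / (n - 2)
    conventional = sigma2_hat / sxx
    # Robust: sum (x_i - x_bar)^2 * u_hat_i^2 / [sum (x_i - x_bar)^2]^2
    robust = sum((xi - x_bar) ** 2 * u ** 2 for xi, u in zip(x, resid)) / sxx ** 2
    return conventional, robust


# Hypothetical data in which the spread of y grows with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.8, 6.5, 7.4, 11.2, 10.9, 16.0, 14.1]
conventional, robust = slope_variances(x, y)
print("conventional SE:", conventional ** 0.5)
print("robust SE:      ", robust ** 0.5)
```

This version applies no degrees-of-freedom correction, so software that rescales the robust variance in small samples may report slightly different numbers.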
Way out?: General case
1) Heteroscedasticity-Robust Standard Errors (Huber, White, Eicker SEs)
$$Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$$
$$\widehat{\mathrm{Var}}(\hat\beta_j) = \frac{\sum_{i=1}^{n} \hat r_{ij}^2 \hat u_i^2}{SSR_j^2}$$
where,
$\hat r_{ij}$: the i-th residual obtained by regressing $X_j$ on all other independent variables
$SSR_j$: the sum of squared residuals obtained from the above regression ($X_j$ on all other independent variables)
STATA: regress y x, vce(robust)
For heteroscedasticity-robust F-statistics and LM tests, see Wooldridge.
Way out?
2) Weighted Least Squares
Used when the functional form of heteroscedasticity is known.
$$\mathrm{Var}(u \mid X) = \sigma^2 h(X)$$
$h(X) > 0$, since variance is always positive.
For example: $\text{Savings}_i = \alpha + \beta\, \text{Income}_i + u_i$
Let $h(X) = h(\text{Income}) = \text{Income}$. Then $\mathrm{Var}(u \mid X) = \sigma^2\, \text{Income}$.

Idea: give less weight to observations which have a larger error variance.
WLS

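In general,
$$Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$$
Suppose the error is heteroscedastic, $V(u_i \mid X) = \sigma^2 h(X_i)$, and write $h_i = h(X_i)$.
Now, conditional on X, $E\left[(u_i / \sqrt{h_i})^2\right] = \sigma^2$. So, what if we transform the model?
$$\frac{Y_i}{\sqrt{h_i}} = \frac{\alpha}{\sqrt{h_i}} + \beta_1 \frac{X_{1i}}{\sqrt{h_i}} + \dots + \beta_k \frac{X_{ki}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}$$
$$Y_i^* = \alpha X_{0i}^* + \beta_1 X_{1i}^* + \dots + \beta_k X_{ki}^* + u_i^*$$
where $X_{0i}^* = 1 / \sqrt{h_i}$ and the transformed error $u_i^*$ has constant variance $\sigma^2$.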
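A minimal sketch of the transformation for one regressor, assuming $h_i$ is known, is given below. It is plain Python for illustration only; the function name and the savings/income numbers are hypothetical. Every term is divided by $\sqrt{h_i}$ and OLS is run without a constant on the two transformed regressors $1/\sqrt{h_i}$ and $X_i/\sqrt{h_i}$.

```python
import math


def wls_simple(x, y, h):
    """WLS of y on a constant and x when Var(u_i | X) = sigma^2 * h_i, h_i > 0.

    Returns (alpha, beta) from OLS, without a constant, on the transformed model.
    """
    w = [1.0 / math.sqrt(hi) for hi in h]
    x0 = w                                   # transformed constant, 1 / sqrt(h_i)
    x1 = [wi * xi for wi, xi in zip(w, x)]   # x_i / sqrt(h_i)
    ys = [wi * yi for wi, yi in zip(w, y)]   # y_i / sqrt(h_i)
    # Normal equations for two regressors and no constant:
    # [a b; b d] [alpha; beta] = [p; q]
    a = sum(v * v for v in x0)
    b = sum(u * v for u, v in zip(x0, x1))
    d = sum(v * v for v in x1)
    p = sum(u * v for u, v in zip(x0, ys))
    q = sum(u * v for u, v in zip(x1, ys))
    det = a * d - b * b
    alpha = (d * p - b * q) / det
    beta = (a * q - b * p) / det
    return alpha, beta


# Hypothetical savings/income data with h(income) = income
income = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
savings = [1.2, 1.9, 3.6, 3.1, 6.0, 4.4]
alpha_hat, beta_hat = wls_simple(income, savings, income)
print("WLS intercept:", alpha_hat)
print("WLS slope:    ", beta_hat)
```

Setting every $h_i$ to 1 reproduces ordinary OLS, which is a quick way to sanity-check the transformation.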
2) Feasible Generalized Least Squares
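But do we know h(X)?
Using an estimate $\hat h_i$ in the estimation gives the Feasible Generalized Least Squares (FGLS) estimator.

$$V(u \mid X) = \sigma^2 h(X) = \sigma^2 \exp(\delta_0 + \delta_1 X_1 + \dots + \delta_k X_k)$$

Why exponential? It keeps the predicted variance positive. Are the parameters known? No, so they have to be estimated. As written, the model is not linear in the parameters, but taking logs gives a linear model:
$$\log(u^2) = \alpha_0 + \delta_1 X_1 + \dots + \delta_k X_k + e$$
To estimate the above, start with the OLS residuals $\hat u$ in place of $u$.

FGLS estimators: consistent and asymptotically more efficient than OLS, but not unbiased.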
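The FGLS steps can be sketched for a single regressor as follows. This is an illustrative plain-Python version, not the course's implementation; the function names and data are hypothetical, and the sketch assumes no OLS residual is exactly zero (otherwise the log is undefined).

```python
import math


def weighted_fit(x, y, w):
    """Minimise sum_i w_i * (y_i - a - b * x_i)^2; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def fgls_simple(x, y):
    """FGLS with h(x) = exp(d0 + d1 * x) estimated from the OLS residuals."""
    ones = [1.0] * len(x)
    # Step 1: OLS (equal weights) and its residuals
    a, b = weighted_fit(x, y, ones)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Step 2: regress log(u_hat^2) on x
    log_u2 = [math.log(u * u) for u in resid]
    g0, g1 = weighted_fit(x, log_u2, ones)
    # Step 3: h_hat_i = exp(fitted value)
    h_hat = [math.exp(g0 + g1 * xi) for xi in x]
    # Step 4: WLS with weights 1 / h_hat_i
    return weighted_fit(x, y, [1.0 / hi for hi in h_hat])


# Hypothetical data in which the spread of y grows with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.8, 6.5, 7.4, 11.2, 10.9, 16.0, 14.1]
print("FGLS (intercept, slope):", fgls_simple(x, y))
```

Minimising the weighted sum of squares with weights $1/\hat h_i$ is equivalent to running OLS on the variables divided by $\sqrt{\hat h_i}$.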
Discussion

What if the OLS and WLS estimates are very different?

Both are unbiased, so they should not be very different from each other.
A large difference is indicative of functional form misspecification.
Usually, the WLS SEs are further corrected using White's estimator when there is a chance of misspecification.

In practice: if the variance function used for WLS is misspecified, WLS is no longer efficient. So some argue: why use WLS? Use OLS and correct the SEs. When heteroscedasticity is very strong, WLS can serve as a robustness check.
Testing for Heteroscedasticity (1)
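Consider the model below:
$$Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$$
Assume: MLR.1-MLR.4
$$H_0: \mathrm{Var}(u \mid X) = E(u^2 \mid X) = E(u^2) = \sigma^2$$
1) Breusch-Pagan Test: regress $\hat u^2$ on the X's
$$\hat u^2 = \delta_0 + \delta_1 X_1 + \dots + \delta_k X_k + v$$
Suppose the error is heteroscedastic, $V(u_i \mid X) = \sigma^2 h(X)$.
$$H_0: \delta_1 = \dots = \delta_k = 0$$
Can test the above using an F-test or an LM test (the distributions hold asymptotically).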
Testing for Heteroscedasticity (1)

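F Stat:
$$F = \frac{R^2_{\hat u^2} / k}{(1 - R^2_{\hat u^2}) / (n - k - 1)} \sim F_{k,\, n-k-1}$$
LM Stat:
$$LM = n R^2_{\hat u^2} \sim \chi^2_k$$
where $R^2_{\hat u^2}$ is the R-squared from the regression of $\hat u^2$ on the X's.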
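As a rough numerical illustration of the two statistics with a single regressor ($k = 1$), the plain-Python sketch below runs the auxiliary regression of $\hat u^2$ on $X$ and forms both statistics from its R-squared. The function names and data are hypothetical, and the sketch assumes the auxiliary R-squared is strictly below 1.

```python
def ols_fit(x, y):
    """OLS of y on a constant and x; returns (residuals, R-squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum(u ** 2 for u in resid)
    return resid, 1.0 - ssr / sst


def breusch_pagan(x, y):
    """Return (F statistic, LM statistic, auxiliary R-squared) for k = 1."""
    n, k = len(x), 1
    resid, _ = ols_fit(x, y)             # original regression
    u2 = [u ** 2 for u in resid]         # squared OLS residuals
    _, r2 = ols_fit(x, u2)               # auxiliary regression of u_hat^2 on x
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    lm_stat = n * r2
    return f_stat, lm_stat, r2


# Hypothetical data in which the spread of y grows with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.8, 6.5, 7.4, 11.2, 10.9, 16.0, 14.1]
f_stat, lm_stat, r2 = breusch_pagan(x, y)
print("F:", f_stat, "LM:", lm_stat, "aux R^2:", r2)
```

The LM statistic is compared with a $\chi^2_k$ critical value (about 3.84 at the 5% level when $k = 1$), and the F statistic with an $F_{k,\, n-k-1}$ critical value.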
Testing for Heteroscedasticity (2)
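Consider the model below:
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + u$$
2) Alternative specification (White test)
$$\hat u^2 = \delta_0 + \delta_1 X_1 + \delta_2 X_2 + \delta_3 X_3 + \delta_4 X_1^2 + \delta_5 X_2^2 + \delta_6 X_3^2 + \delta_7 X_1 X_2 + \delta_8 X_2 X_3 + \delta_9 X_1 X_3 + v$$

Another way of doing the above:
$$\hat u^2 = \delta_0 + \delta_1 \hat y + \delta_2 \hat y^2 + v$$
$$H_0: \delta_1 = \delta_2 = 0$$
Can test the above using an F-test or an LM test (the distributions hold asymptotically).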
Example
* OLS
regress cigs ln_income ln_cigpric educ age age_sq restaurn

* Test for heteroscedasticity
* Original Breusch-Pagan test, assuming normality of errors
estat hettest
* White's test
estat imtest, white

* White's robust SEs
regress cigs ln_income ln_cigpric educ age age_sq restaurn, vce(robust)
Example
* OLS
regress cigs ln_income ln_cigpric educ age age_sq restaurn
* Get the residuals
predict e, residuals
* Generate the log of the squared residuals
gen logesq = ln(e * e)
* Estimate the relationship between the log squared residuals and the X's
reg logesq ln_income ln_cigpric educ age age_sq restaurn
* Fitted values of the log squared residuals
predict esqhat
* Generate h(i)
gen hi = exp(esqhat)
* Generate the weight and the weighted variables
gen wt = 1/((hi)^0.5)
foreach x in cigs ln_income ln_cigpric educ age age_sq restaurn {
    gen w`x' = wt * `x'
}
* OLS on the transformed variables; wt is the transformed constant, so no constant is added
regress wcigs wt wln_income wln_cigpric weduc wage wage_sq wrestaurn, noconstant


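A special case of Weighted Least Squares
Suppose we use averages across a group or geographical region, generated from an individual-level dataset:
$$\text{Wage}_{id} = \alpha + \beta_1 \text{Education}_{id} + \beta_2 \text{Soil}_d + u_{id}$$
Suppose we have only average values for a district. District-level equation:
$$\overline{\text{Wage}}_d = \alpha + \beta_1 \overline{\text{Education}}_d + \beta_2 \text{Soil}_d + \bar u_d$$
If the individual errors are independent of the district's size, then the zero conditional mean assumption carries over to this model: $E(\bar u_d \mid X) = 0$, where $\bar u_d = \frac{1}{m_d} \sum_{i=1}^{m_d} u_{id}$ and $m_d$ is the number of individuals in district $d$.
When the original model satisfies homoscedasticity and the errors are not correlated across individuals within a district,
$$\mathrm{Var}(\bar u_d \mid X) = \sigma^2 / m_d$$
This matches the earlier setup with $h_i = 1/m_d$:
$$\frac{Y_i}{\sqrt{h_i}} = \frac{\alpha}{\sqrt{h_i}} + \beta_1 \frac{X_{1i}}{\sqrt{h_i}} + \dots + \beta_k \frac{X_{ki}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}$$
So this is WLS with weights equal to the district population.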
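The variance of the district-average error can be worked out in a few lines. The derivation below is a sketch under the two assumptions stated above (homoscedastic individual errors with variance $\sigma^2$, and no correlation across individuals within a district).

```latex
\begin{align*}
\mathrm{Var}(\bar u_d \mid X)
  &= \mathrm{Var}\Big(\frac{1}{m_d} \sum_{i=1}^{m_d} u_{id} \,\Big|\, X\Big) \\
  &= \frac{1}{m_d^2} \sum_{i=1}^{m_d} \mathrm{Var}(u_{id} \mid X)
     && \text{(no correlation within a district)} \\
  &= \frac{m_d\, \sigma^2}{m_d^2} = \frac{\sigma^2}{m_d}
     && \text{(homoscedasticity)} \\
\mathrm{Var}\big(\sqrt{m_d}\, \bar u_d \mid X\big)
  &= m_d \cdot \frac{\sigma^2}{m_d} = \sigma^2
\end{align*}
```

Multiplying the district-level equation by $\sqrt{m_d}$ therefore restores a constant error variance, which is why larger districts receive more weight.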
Example
regress y x [aw=wt]
In the above case, wt = district population
