Lecture 10: Heteroskedasticity

Econ 488

Order of Testing
1. Omitted variables and incorrect functional form (Adjusted $R^2$)
2. Either A or B, but not both
   A. Serial Correlation (Durbin-Watson)
   B. Heteroskedasticity (Park Test, White's Test)
3. Multicollinearity (Correlation Matrix, VIF)
4. Irrelevant Variables (t-test)

Homoskedasticity
Ideal Case: Homoskedasticity
o Error variance $\sigma^2$ is constant across the sample
o $\sigma^2$ measures the dispersion of the dependent variable around the regression line
o Homoskedasticity means that the dispersion around the average relationship between the dependent variable and the independent variable is the same throughout the sample

Homoskedasticity

Heteroskedasticity
Heteroskedasticity (or heteroscedasticity) is when $\sigma^2$ is not constant across the sample.
The dispersion of the dependent variable around the regression line is not constant.

Heteroskedasticity

Why do we care?
If we don't fix heteroskedasticity:
o Coefficient estimates are not efficient (not minimum variance)
o Estimated standard errors are biased and inconsistent, meaning t-stats are not right!

When can it occur?


o Whenever the dispersion around the regression line differs within the sample
o That is, how tightly the dependent variable follows the independent variable differs within the sample
Example: MLB Payroll and Market Size

2008 MLB Payrolls


Large Markets (Population > 5,000,000):
o Mean: $104,000,000
o Std Dev: $44,600,000
o Min: $21,800,000 (Florida Marlins)
o Max: $209,000,000 (NY Yankees)

Small Markets (Population < 5,000,000):
o Mean: $78,800,000
o Std Dev: $28,300,000
o Min: $43,800,000 (Tampa Bay Rays)
o Max: $139,000,000 (Detroit Tigers)
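
A rough way to read these figures is to compare the two sample variances. The arithmetic below uses only the standard deviations quoted for the two groups (in millions of dollars); it is a back-of-the-envelope comparison, not a formal test.

```latex
\frac{s^2_{\text{large}}}{s^2_{\text{small}}}
  = \left(\frac{44.6}{28.3}\right)^2
  \approx 2.48
```

Payrolls in large markets are therefore roughly two and a half times as dispersed, in variance terms, as payrolls in small markets.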

Heteroskedasticity
Note: The same principle applies when observations are groups that differ in size, e.g.:
o States (population)
o Countries (population)
o Colleges (enrollment)
o Companies (sales)
o Etc.

Another Example
Household income and consumption.
A. Low-income households
o Little flexibility in spending
o Most income is spent on necessities: food, shelter, clothing, transportation, utilities
o Little dispersion of consumption around mean consumption: small $\sigma^2$

Household Income vs. Consumption


B. High-income households
o More flexibility in spending
o Once necessities are purchased, much remains to be spent in different ways: big spenders, savers and investors
o Large dispersion of consumption around mean consumption: large $\sigma^2$
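
The income and consumption story can be illustrated with made-up data. The sketch below is only an illustration: the consumption equation, the income range, the cutoff of 40 and the assumption that the error's standard deviation is proportional to income are all hypothetical choices, not figures from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical households: consumption = 5 + 0.8 * income + error,
# where the error's standard deviation is ASSUMED proportional to income.
incomes = [random.uniform(10, 100) for _ in range(1000)]
consumption = [5 + 0.8 * inc + random.gauss(0, 0.1 * inc) for inc in incomes]

# Deviation of each household's consumption from the average relationship.
deviations = [c - (5 + 0.8 * inc) for inc, c in zip(incomes, consumption)]

low = [d for inc, d in zip(incomes, deviations) if inc < 40]
high = [d for inc, d in zip(incomes, deviations) if inc >= 40]

sd_low = statistics.stdev(low)
sd_high = statistics.stdev(high)
print(f"dispersion, low-income households:  {sd_low:.2f}")
print(f"dispersion, high-income households: {sd_high:.2f}")
```

Under these assumptions the high-income group shows the larger dispersion, which is the pattern described for cases A and B.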

Pure vs. Impure Heteroskedasticity


Impure
o Occurs when the regression is not correctly specified
o E.g. omitted variables can cause heteroskedasticity

Pure
o Occurs due to the nature of the data

Consequences
If we ignore heteroskedasticity, coefficient estimates are:
o Unbiased: OK!
o Consistent: OK!
o Inefficient: Not OK.
t-tests are inaccurate.

Detection
Tests detect heteroskedasticity
o But they won't distinguish between the pure and impure types

If a test uncovers heteroskedasticity, STOP! Try to decide whether you have an omitted variable. If you do:
o Include it in your model, and then retest for heteroskedasticity

Detection
OR, if you don't have an omitted variable:
o Employ one of the remedies we'll discuss
o After you fix the problem, test again
o If you still have heteroskedasticity, it might be the impure type

Detection
Plots
1) Estimate the model and save the residuals
2) Plot the residuals against each independent variable separately
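
The two steps can be sketched in plain Python. This is a minimal sketch on simulated data, not the lecture's gretl workflow: the data-generating process is hypothetical, `ols_simple` is a helper written for this example, and a text summary of residual spread by thirds of X stands in for the actual plot.

```python
import random
import statistics

def ols_simple(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(1)
# Hypothetical data: the error's standard deviation grows with x.
x = [random.uniform(1, 10) for _ in range(600)]
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]

# Step 1: estimate the model and save the residuals.
b0, b1 = ols_simple(x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Step 2 (text stand-in for the plot): residual spread in each third of x.
pairs = sorted(zip(x, residuals))
third = len(pairs) // 3
spreads = [
    statistics.stdev([e for _, e in pairs[i * third:(i + 1) * third]])
    for i in range(3)
]
print("residual std dev by thirds of x:", [round(s, 2) for s in spreads])
```

A spread that rises (or falls) steadily across the thirds corresponds to the fan-shaped patterns one looks for in the residual plots.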

Example: data3-6.gdt

Plots

Plots: V on its side

Plots: Increasing or Decreasing

Plots: Rainbow or inverted rainbow

Park Test
If there is heteroskedasticity, then $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$
o $\varepsilon_i$ = error term
o $\sigma^2$ = variance of the homoskedastic error term
o $Z_i$ = proportionality factor

If you know something about Z, you can use the Park test. Find a variable that is related to heteroskedasticity (e.g. population)

Park Test
1. Run the regression and obtain the residuals
2. Run the following regression: $\ln(e_i^2) = \alpha_0 + \alpha_1 \ln(Z_i) + u_i$
   where:
   o $e_i$ = residuals from the regression in step 1
   o $Z_i$ = best choice as to proportionality factor in the data
   o $u_i$ = classical error term
3. Test the significance of the coefficient on $\ln(Z_i)$. If it is significant, there is evidence of heteroskedasticity.
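
The Park test steps can be sketched as follows. This is an illustrative sketch on simulated data in which the single regressor also serves as Z; the data-generating process and the helper `ols_simple` are assumptions made for the example, and the coefficient names `a0`, `a1` are just labels for the auxiliary regression's intercept and slope.

```python
import math
import random
import statistics

def ols_simple(x, y):
    """Return (intercept, slope, t-statistic of the slope) for a one-regressor OLS fit."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return intercept, slope, slope / se_slope

random.seed(2)
# Hypothetical data: Z is also the regressor, and the error s.d. is proportional to Z.
z = [random.uniform(1, 10) for _ in range(500)]
y = [2 + 0.5 * zi + random.gauss(0, 0.3 * zi) for zi in z]

# Step 1: run the original regression and obtain the residuals.
b0, b1, _ = ols_simple(z, y)
resid = [yi - b0 - b1 * zi for zi, yi in zip(z, y)]

# Step 2: regress ln(e_i^2) on ln(Z_i).
ln_e2 = [math.log(e ** 2) for e in resid]
ln_z = [math.log(zi) for zi in z]
a0, a1, t_a1 = ols_simple(ln_z, ln_e2)

# Step 3: judge the significance of the coefficient on ln(Z_i) by its t-statistic.
print(f"coefficient on ln(Z) = {a1:.2f}, t-statistic = {t_a1:.2f}")
```

A large t-statistic on ln(Z) is the evidence of heteroskedasticity that step 3 refers to.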

Park Test
Problem: We don't always have a good Z.
So, we can use White's Test.

White's Test
$H_0$: No Heteroskedasticity
$H_A$: Heteroskedasticity

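White's Test
1) Estimate the equation
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

2) Save the residual $e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}$ and square it.
3) Regress the squared residual on a constant, $X_1$, $X_2$, $X_1^2$, $X_2^2$, $X_1 X_2$ (the Xs, their squares, and all cross products)
$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$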

White's Test
4) Compute $N R^2$ from the auxiliary regression
o $N$ = sample size
o $R^2$ = unadjusted $R^2$

5) Reject the null if
o $N R^2$ exceeds the $\chi^2$ (Chi-Square) critical value with 5 degrees of freedom
o Because there are 5 independent vars in the auxiliary regression (step 3)
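
The full procedure for two regressors can be sketched in plain Python. Everything about the data here is hypothetical (the coefficients, the sample size and the assumption that the error's standard deviation is proportional to X1), and `solve` and `ols` are small helpers written for this example rather than library functions. The 5% critical value of the chi-square distribution with 5 degrees of freedom is 11.07.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """OLS with a constant added; returns (coefficients, residuals, unadjusted R^2)."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, d)) for d, yi in zip(design, y)]
    y_bar = sum(y) / len(y)
    r2 = 1 - sum(e * e for e in resid) / sum((yi - y_bar) ** 2 for yi in y)
    return beta, resid, r2

random.seed(3)
n = 500
x1 = [random.uniform(1, 10) for _ in range(n)]
x2 = [random.uniform(1, 10) for _ in range(n)]
# Hypothetical model: the error's standard deviation is proportional to X1.
y = [1 + 0.5 * a + 0.3 * b + random.gauss(0, 0.3 * a) for a, b in zip(x1, x2)]

# Steps 1-2: estimate the equation, then save and square the residuals.
_, resid, _ = ols(list(zip(x1, x2)), y)
e2 = [e * e for e in resid]

# Step 3: auxiliary regression on X1, X2, X1^2, X2^2 and X1*X2.
aux = [(a, b, a * a, b * b, a * b) for a, b in zip(x1, x2)]
_, _, r2_aux = ols(aux, e2)

# Steps 4-5: compare N * R^2 with the chi-square critical value (5 d.f., 5% level).
stat = n * r2_aux
CHI2_5DF_5PCT = 11.07
print(f"N*R^2 = {stat:.1f}, reject H0: {stat > CHI2_5DF_5PCT}")
```

In practice a statistics package would do the matrix algebra; the hand-rolled solver is used here only to keep the sketch free of dependencies.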

White's Test
If you have 3 independent vars, the auxiliary regression will have 9 independent vars:
$X_1$, $X_2$, $X_3$, $X_1^2$, $X_2^2$, $X_3^2$, $X_1 X_2$, $X_2 X_3$, $X_1 X_3$
If you have 6 independent vars, the auxiliary regression will have 27 independent vars!
This can get out of hand quickly.

White's Test Version 2

Same as before, except that the auxiliary regression uses only the $X$ and $X^2$ terms (no cross products).
Use when you have a lot of independent variables.
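
The regressor counts quoted for the full test, and the saving from dropping the cross products, follow from simple counting: k levels, k squares, and one cross product per pair of Xs. The function name below is hypothetical and exists only for this illustration.

```python
from math import comb

def white_aux_regressors(k: int, cross_products: bool = True) -> int:
    """Number of regressors (excluding the constant) in White's auxiliary regression."""
    count = 2 * k            # each X and its square
    if cross_products:
        count += comb(k, 2)  # one cross product per pair of Xs
    return count

for k in (2, 3, 6):
    full = white_aux_regressors(k)
    reduced = white_aux_regressors(k, cross_products=False)
    print(f"{k} independent vars: {full} with cross products, {reduced} without")
```

With 6 independent variables the count drops from 27 to 12 once the cross products are left out.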

Remedies For Heteroskedasticity


1. Heteroskedasticity-Corrected Standard Errors
o Fixes consistency of the standard errors, so when N is large, the standard errors are correct.
o In gretl, just check the robust standard error box when running a regression
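
For intuition, the sketch below computes one common variant of heteroskedasticity-corrected standard errors, White's HC0 estimator, for the slope of a one-regressor model and compares it with the conventional formula. The data are simulated under hypothetical assumptions, and the lecture does not say which variant gretl's robust option applies, so this should not be read as a description of gretl's output.

```python
import math
import random
import statistics

random.seed(4)
# Hypothetical data: the error's standard deviation is proportional to x.
x = [random.uniform(1, 10) for _ in range(500)]
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Conventional standard error of the slope: assumes one common error variance.
s2 = sum(e * e for e in resid) / (len(x) - 2)
se_conventional = math.sqrt(s2 / sxx)

# White (HC0) standard error of the slope: each observation's squared residual
# is weighted by its own squared deviation of x from the mean.
se_robust = math.sqrt(
    sum((xi - x_bar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2
)

print(f"slope = {slope:.3f}")
print(f"conventional s.e. = {se_conventional:.4f}, robust (HC0) s.e. = {se_robust:.4f}")
```

The coefficient estimate is the same in both cases; only the standard error, and hence the t-statistic, changes.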

Remedies For Heteroskedasticity


2. Weighted Least Squares (WLS)
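(1) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
(2) $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$
Given (2), eqn. (1) is equivalent to
(3) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + Z_i u_i$
where $u_i$ is a homoskedastic error term with variance $\sigma^2$. So we can divide through by $Z_i$.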

Remedies For Heteroskedasticity


Step one: divide eqn. (3) through by $Z_i$
$\frac{Y_i}{Z_i} = \frac{\beta_0}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$

Step two: estimate by OLS
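
OLS is appropriate in step two because dividing by $Z_i$ removes the heteroskedasticity. Treating $Z_i$ as known, and using the variance assumption in (2), the transformed error has constant variance:

```latex
\operatorname{Var}\!\left(\frac{\varepsilon_i}{Z_i}\right)
  = \frac{\operatorname{Var}(\varepsilon_i)}{Z_i^2}
  = \frac{\sigma^2 Z_i^2}{Z_i^2}
  = \sigma^2
```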


Caution about step 2: there are two cases.

Remedies For Heteroskedasticity


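Case 1: Z is not in the original equation
Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
New: $\frac{Y_i}{Z_i} = \beta_0 \frac{1}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$

What's missing? The constant!
Solution: Add a constant.
Better: $\frac{Y_i}{Z_i} = \alpha_0 + \beta_0 \frac{1}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$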

Remedies For Heteroskedasticity


Case 2: Z is in the original equation
Suppose $X_1$ is $Z$
Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
New: $\frac{Y_i}{X_{1i}} = \beta_0 \frac{1}{X_{1i}} + \beta_1 + \beta_2 \frac{X_{2i}}{X_{1i}} + u_i$

What's different about this equation?
One of the slope coefficients in the original equation becomes an intercept! This happens because $X_{1i}/X_{1i} = 1$

Remedies For Heteroskedasticity


That is: the intercept in the new equation is the same as the slope $\beta_1$ in the original equation.
What should you look at in the new equation to find the effect of $X_1$? The constant.
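
Case 2 can be checked on simulated data. The sketch below uses a simplified one-regressor model in which X1 is itself the proportionality factor Z; the true coefficients (intercept 2, slope 0.8), the error scale and the helper `ols_simple` are hypothetical choices made for the example.

```python
import random
import statistics

def ols_simple(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

random.seed(5)
x1 = [random.uniform(1, 10) for _ in range(500)]
# Hypothetical model: Y = 2 + 0.8 * X1 + error, error s.d. proportional to X1 (Z = X1).
y = [2 + 0.8 * xi + random.gauss(0, 0.5 * xi) for xi in x1]

# Divide through by Z = X1: regress Y/X1 on 1/X1 (with a constant).
y_over_x = [yi / xi for xi, yi in zip(x1, y)]
inv_x = [1 / xi for xi in x1]
const, coef_inv = ols_simple(inv_x, y_over_x)

# The constant estimates the original slope beta_1 (true value 0.8);
# the coefficient on 1/X1 estimates the original intercept beta_0 (true value 2).
print(f"estimate of beta_1 (constant):          {const:.2f}")
print(f"estimate of beta_0 (coefficient on 1/X1): {coef_inv:.2f}")
```

The roles of intercept and slope swap in the transformed equation, so the effect of X1 is read off the constant.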

Remedies For Heteroskedasticity


Example: saving.gdt (weight by income)
