Lecture 10

Lecture 10: Heteroskedasticity
Econ 488
Order of Testing
1. Omitted variables and incorrect functional form (Adjusted R²)
2. Either A or B, but not both:
   A. Serial Correlation (Durbin-Watson)
   B. Heteroskedasticity (Park test, White's test)
3. Multicollinearity (Correlation Matrix, VIF)
4. Irrelevant Variables (t-test)
Homoskedasticity
Ideal Case: Homoskedasticity
Error variance σ² is constant across the sample.
σ² measures the dispersion of the dependent variable around the regression line.
Homoskedasticity means that this dispersion around the average relationship between the dependent and independent variables is the same throughout the sample.
Heteroskedasticity
Heteroskedasticity (or heteroscedasticity) is when σ² is not constant across the sample: the dispersion of the dependent variable around the regression line varies from observation to observation.
Why do we care?
If we don't fix heteroskedasticity:
Coefficients are not efficient (not minimum variance).
Estimated standard errors are biased and inconsistent, meaning t-stats are not right!
When can it occur?

Whenever the dispersion around the regression line differs within the sample, which means the relationship between the dependent and independent variables differs within the sample.
Example: MLB payroll and market size
2008 MLB Payrolls

Large markets (population > 5,000,000)
Mean: $104,000,000
Std Dev: $44,600,000
Min: $21,800,000 (Florida Marlins)
Max: $209,000,000 (NY Yankees)

Small markets (population < 5,000,000)
Mean: $78,800,000
Std Dev: $28,300,000
Min: $43,800,000 (Tampa Bay Rays)
Max: $139,000,000 (Detroit Tigers)
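A quick way to put a number on the gap in dispersion is the ratio of the two sample variances, computed from the standard deviations above (in millions of dollars). This is a descriptive comparison only, not one of the formal tests discussed later:

```latex
\frac{s^2_{\text{large}}}{s^2_{\text{small}}}
  = \frac{44.6^2}{28.3^2}
  = \frac{1989.16}{800.89}
  \approx 2.48
```

Payroll variance in large markets is roughly two and a half times that in small markets.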
Heteroskedasticity
Note: the same principle applies when observations are groups that differ in size, e.g.:
States (population)
Countries (population)
Colleges (enrollment)
Companies (sales)
Etc.
Another Example
Household income and consumption.
A. Low-income households
Little flexibility in spending: most income is spent on necessities (food, shelter, clothing, transportation, utilities).
Little dispersion of consumption around mean consumption: small σ².
Household Income vs. Consumption

B. High-income households
More flexibility in spending: once necessities are purchased, much remains to be spent in different ways.
Big spenders; savers and investors.
Large dispersion of consumption around the mean: large σ².
Pure vs. Impure Heteroskedasticity

Impure: occurs when the regression is not correctly specified (e.g. omitted variables), which can cause heteroskedasticity.
Pure: occurs due to the nature of the data.
Consequences
If we ignore heteroskedasticity, coefficient estimates are:
Unbiased: OK!
Consistent: OK!
Inefficient: not OK.
t-tests are inaccurate.
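These consequences can be checked by simulation. The sketch below is a hypothetical Monte Carlo (the function names, the error design with SD equal to 0.5·X², the true coefficients, and the seed are illustrative assumptions, not from the lecture): it redraws heteroskedastic errors many times, re-estimates a one-regressor OLS model, and compares the average slope with the true value, and the actual spread of the slopes with the average conventional standard error.

```python
import math
import random
import statistics


def simple_ols(x, y):
    """OLS of y on a constant and x; returns (b0, b1, conventional SE of b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)  # assumes one common error variance (homoskedasticity)
    return b0, b1, math.sqrt(s2 / sxx)


def monte_carlo(reps=1000, n=100, beta0=1.0, beta1=2.0, seed=488):
    """Repeatedly draw heteroskedastic samples and re-estimate the slope."""
    rng = random.Random(seed)
    x = [0.1 * (i + 1) for i in range(n)]  # fixed regressor: 0.1, 0.2, ..., 10.0
    slopes, reported_ses = [], []
    for _ in range(reps):
        # Error SD grows with x**2, so Var(eps_i) is not constant.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 0.5 * xi ** 2) for xi in x]
        _, b1, se = simple_ols(x, y)
        slopes.append(b1)
        reported_ses.append(se)
    return {
        "mean_slope": statistics.fmean(slopes),
        "actual_sd": statistics.stdev(slopes),
        "avg_reported_se": statistics.fmean(reported_ses),
    }


if __name__ == "__main__":
    for name, value in monte_carlo().items():
        print(f"{name}: {value:.3f}")
```

With this design the mean slope should land close to the true value of 2 (unbiasedness), while the actual standard deviation of the slope estimates should exceed the average reported standard error, which is why t-tests built on conventional standard errors mislead.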
Detection
Tests detect heteroskedasticity,
but won't distinguish between the pure and impure types.
If a test uncovers heteroskedasticity: STOP! Try to decide whether you have an omitted variable. If you do,
include it in your model, and then retest for heteroskedasticity.
Detection
OR, if you don't have an omitted variable:
Employ one of the remedies we'll discuss.
After you fix the problem, test again.
If you still have heteroskedasticity, it might be the impure type.
Detection
Plots
1) Estimate the model and save the residuals.
2) Plot the residuals against each independent variable separately.
Example: data3-6.gdt
Plots
Plots: V on its side
Plots: Increasing or decreasing
Plots: Rainbow or inverted rainbow
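gretl draws these plots directly. As a text-only stand-in for steps 1 and 2 (a hypothetical sketch, not gretl's procedure), the code below fits a one-regressor OLS model, saves the residuals, sorts them by X, and prints the residual standard deviation in each quarter of the X range with a crude bar. The simulated data (error SD proportional to X) is an assumption for illustration; a spread that grows from bin to bin corresponds to the "increasing" pattern.

```python
import random
import statistics


def ols_residuals(x, y):
    """Fit y = b0 + b1*x by OLS and return the residuals e_i."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    b0 = y_bar - b1 * x_bar
    return [b - b0 - b1 * a for a, b in zip(x, y)]


def residual_spread_by_bin(x, resid, bins=4):
    """Sort observations by x, split them into equal-sized bins, and
    return (min x, max x, SD of residuals) for each bin."""
    pairs = sorted(zip(x, resid))
    size = len(pairs) // bins
    out = []
    for k in range(bins):
        # The last bin absorbs any leftover observations.
        chunk = pairs[k * size:(k + 1) * size] if k < bins - 1 else pairs[k * size:]
        xs = [p[0] for p in chunk]
        es = [p[1] for p in chunk]
        out.append((min(xs), max(xs), statistics.stdev(es)))
    return out


if __name__ == "__main__":
    rng = random.Random(10)
    x = [rng.uniform(1, 10) for _ in range(400)]
    # Hypothetical data with error SD proportional to x ("increasing" pattern).
    y = [5 + 2 * xi + rng.gauss(0, xi) for xi in x]
    for lo, hi, sd in residual_spread_by_bin(x, ols_residuals(x, y)):
        print(f"x in [{lo:5.2f}, {hi:5.2f}]  residual SD = {sd:6.2f}  " + "#" * round(sd * 4))
```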
Park Test
If there is heteroskedasticity, then $\mathrm{Var}(\epsilon_i) = \sigma^2 Z_i^2$,
where $\epsilon_i$ = error term, $\sigma^2$ = variance of the homoskedastic error term, and $Z_i$ = proportionality factor.
If you know something about Z, you can use the Park test: find a variable that is related to the heteroskedasticity (e.g. population).
Park Test
1. Run the regression and obtain the residuals.
2. Run the following regression:
$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln(Z_i) + u_i$
where $e_i$ = residuals from the regression in step 1, $Z_i$ = best choice as to the proportionality factor in the data, and $u_i$ = classical error term.
3. Test the significance of $\ln(Z_i)$ (a t-test on $\alpha_1$). If it is significant, there is evidence of heteroskedasticity.
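A minimal sketch of the Park test steps, assuming a one-regressor model to keep the arithmetic short (the lecture's model has X1 and X2). The helper names, the simulated data, and the choice Z = X are hypothetical.

```python
import math
import random


def simple_ols(x, y):
    """OLS of y on a constant and x.
    Returns (intercept, slope, t-stat of slope, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    t_stat = slope / math.sqrt(s2 / sxx)
    return intercept, slope, t_stat, resid


def park_test(y, x, z):
    """Park test for y = b0 + b1*x + eps.
    Step 1: OLS residuals e_i. Step 2: regress ln(e_i^2) on ln(Z_i).
    Returns (alpha_1 estimate, its t-stat)."""
    _, _, _, e = simple_ols(x, y)
    log_e2 = [math.log(ei ** 2) for ei in e]
    log_z = [math.log(zi) for zi in z]
    _, alpha1, t_alpha1, _ = simple_ols(log_z, log_e2)
    return alpha1, t_alpha1


if __name__ == "__main__":
    rng = random.Random(488)
    n = 200
    x = [1 + 9 * i / (n - 1) for i in range(n)]
    z = x  # hypothetical: the proportionality factor is x itself
    y = [2 + 3 * xi + rng.gauss(0, zi) for xi, zi in zip(x, z)]
    alpha1, t = park_test(y, x, z)
    print(f"alpha_1 = {alpha1:.2f}, t = {t:.2f}")  # |t| > ~1.97 suggests heteroskedasticity
```

Because the simulated error SD is proportional to Z, $\mathrm{Var}(\epsilon_i) \propto Z_i^2$ and the estimated $\alpha_1$ should sit near 2, with a t-stat well above the roughly 1.97 critical value for about 200 observations.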
Park Test
Problem: we don't always have a good Z, so we can use White's test.
White's Test
$H_0$: no heteroskedasticity; $H_A$: heteroskedasticity
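White's Test
1) Estimate the equation:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$
2) Save the residual $e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}$ and square it.
3) Regress the squared residual on a constant, X1, X2, X1², X2², and X1X2 (all combinations of the Xs):
$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$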
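White's Test
4) Compute $N \cdot R^2$, where N = sample size and $R^2$ = unadjusted $R^2$ from the auxiliary regression.
5) Reject the null if $N \cdot R^2$ exceeds the critical $\chi^2$ (chi-square) value with 5 degrees of freedom, because there are 5 independent variables in the auxiliary regression (step 3).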
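A minimal stdlib-only sketch of steps 1-5 for the two-regressor model. The helper names (`solve`, `ols`, `white_test`), the simulated data, and the hard-coded critical value 11.07 (the 5% chi-square value for 5 degrees of freedom, from a standard table) are assumptions for illustration; a statistics package would normally run this test for you.

```python
import random


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS via the normal equations X'X b = X'y; each row already includes
    a leading 1 for the constant. Returns (coefficients, residuals, R^2)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    y_bar = sum(y) / len(y)
    r2 = 1 - sum(e * e for e in resid) / sum((yi - y_bar) ** 2 for yi in y)
    return beta, resid, r2


def white_test(y, x1, x2):
    """White's test for Y = b0 + b1*X1 + b2*X2 + eps. Returns N*R^2 from
    the auxiliary regression of e^2 on 1, X1, X2, X1^2, X2^2, X1*X2."""
    _, e, _ = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)         # steps 1-2
    aux_rows = [[1.0, a, b, a * a, b * b, a * b] for a, b in zip(x1, x2)]
    _, _, r2_aux = ols(aux_rows, [ei ** 2 for ei in e])             # step 3
    return len(y) * r2_aux                                          # step 4


CHI2_5DF_5PCT = 11.07  # chi-square critical value, 5 df, 5% level (table value)

if __name__ == "__main__":
    rng = random.Random(488)
    n = 500
    x1 = [rng.uniform(1, 10) for _ in range(n)]
    x2 = [rng.uniform(0, 5) for _ in range(n)]
    # Hypothetical data: error SD proportional to X1.
    y = [1 + 2 * a - b + rng.gauss(0, a) for a, b in zip(x1, x2)]
    stat = white_test(y, x1, x2)
    print(f"N*R^2 = {stat:.1f}; reject H0: {stat > CHI2_5DF_5PCT}")  # step 5
```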
White's Test
If you have 3 independent variables, the auxiliary regression will have 9 independent variables: X1, X2, X3, X1², X2², X3², X1X2, X2X3, X1X3. If you have 6 independent variables, the auxiliary regression will have 27! This can get out of hand quickly.
White's Test Version 2

Same as before, except the auxiliary regression uses only the X and X² terms (no cross products). Use it when you have a lot of independent variables.
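The regressor counts from the last two slides can be checked with a small helper (hypothetical, not part of gretl) that lists the auxiliary-regression terms. The full version has k + k + k(k−1)/2 = k(k+3)/2 terms (5, 9, and 27 for k = 2, 3, 6), while version 2 has only 2k. These counts are also the degrees of freedom for the χ² comparison of N·R².

```python
from itertools import combinations


def white_aux_terms(names, cross_products=True):
    """List the auxiliary-regression regressors (constant excluded) for
    White's test: levels, squares, and optionally all pairwise cross products."""
    terms = list(names) + [f"{v}^2" for v in names]
    if cross_products:
        terms += [f"{a}*{b}" for a, b in combinations(names, 2)]
    return terms


if __name__ == "__main__":
    for k in (2, 3, 6):
        xs = [f"X{i}" for i in range(1, k + 1)]
        full = len(white_aux_terms(xs))
        v2 = len(white_aux_terms(xs, cross_products=False))
        print(f"k={k}: full version {full} terms, version 2 {v2} terms")
```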
Remedies For Heteroskedasticity

1. Heteroskedasticity-Corrected Standard Errors
Fixes the consistency of the standard errors, so when N is large, the standard errors are (approximately) correct.
In gretl, just check the robust standard errors box when running a regression.
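To see what "robust" changes, here is a minimal sketch for the one-regressor case comparing the conventional slope standard error with White's heteroskedasticity-consistent HC0 form, in which each squared residual stands in for its own error variance. The function name and toy data are hypothetical, and gretl's robust option may use a variant with a small-sample adjustment, so its numbers can differ slightly.

```python
import math


def slope_standard_errors(x, y):
    """One-regressor OLS y = b0 + b1*x. Returns (b1, conventional SE,
    heteroskedasticity-robust HC0 SE) for the slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional: one common error variance s^2 for every observation.
    s2 = sum(ei * ei for ei in e) / (n - 2)
    se_conv = math.sqrt(s2 / sxx)
    # HC0 (White): sum of (x_i - x_bar)^2 * e_i^2, divided by Sxx^2.
    var_hc0 = sum(d * d * ei * ei for d, ei in zip(dev, e)) / sxx ** 2
    return b1, se_conv, math.sqrt(var_hc0)


if __name__ == "__main__":
    b1, se_conv, se_hc0 = slope_standard_errors([1, 2, 3, 4], [1, 3, 2, 6])
    print(f"b1 = {b1:.2f}, conventional SE = {se_conv:.3f}, robust SE = {se_hc0:.3f}")
```

The coefficient estimate is identical under both formulas; only the standard error (and hence the t-stat) changes.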

2. Weighted Least Squares (WLS)
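(1) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$
(2) $\mathrm{Var}(\epsilon_i) = \sigma^2 Z_i^2$
Given (2), eqn. (1) is equivalent to
(3) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + Z_i u_i$, where $u_i$ is a classical error term with variance $\sigma^2$.
So we can divide through by $Z_i$, which leaves the homoskedastic error $u_i$.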

Step one:
$\frac{Y_i}{Z_i} = \frac{\beta_0}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$
Step two: estimate by OLS

Caution about step 2: there are two cases.

Case 1: Z is not in the original equation
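Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$
New: $\frac{Y_i}{Z_i} = \frac{\beta_0}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$
What's missing? The constant! Solution: add a constant. Better:
$\frac{Y_i}{Z_i} = \alpha_0 + \beta_0 \frac{1}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$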

Case 2: Z is in the original equation
Suppose X1 is Z.
Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$
New: $\frac{Y_i}{X_{1i}} = \beta_0 \frac{1}{X_{1i}} + \beta_1 + \beta_2 \frac{X_{2i}}{X_{1i}} + u_i$
What's different about this equation? One of the slope coefficients in the original equation becomes an intercept! This happens because $X_{1i}/X_{1i} = 1$.

That is: the intercept in the new equation estimates the slope $\beta_1$ on X1 in the original equation. What should you look at in the new equation to find the coefficient of X1? The constant.
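Both cases can be sketched in plain Python. This is a hypothetical illustration, not gretl's WLS command: the helper names, the simulated data, and the weighting variable `z` are assumptions. Each case divides every term by the weighting variable ($Z_i$ in Case 1, $X_{1i}$ in Case 2) and then runs ordinary OLS on the transformed variables, with an explicit column of ones supplying the added constant $\alpha_0$ in Case 1 and the recovered $\beta_1$ in Case 2.

```python
import random


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations. Rows hold every regressor,
    including any column of ones; no constant is added automatically."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def wls_case1(y, x1, x2, z):
    """Case 1 (Z not in the model): regress Y/Z on a constant (alpha_0),
    1/Z (beta_0), X1/Z (beta_1), and X2/Z (beta_2)."""
    rows = [[1.0, 1 / zi, a / zi, b / zi] for a, b, zi in zip(x1, x2, z)]
    return ols(rows, [yi / zi for yi, zi in zip(y, z)])


def wls_case2(y, x1, x2):
    """Case 2 (Z = X1): regress Y/X1 on a constant, 1/X1, and X2/X1.
    The constant estimates beta_1; the 1/X1 coefficient estimates beta_0."""
    rows = [[1.0, 1 / a, b / a] for a, b in zip(x1, x2)]
    b1, b0, b2 = ols(rows, [yi / a for yi, a in zip(y, x1)])
    return b0, b1, b2


if __name__ == "__main__":
    rng = random.Random(488)
    n = 300
    x1 = [rng.uniform(1, 10) for _ in range(n)]
    x2 = [rng.uniform(0, 10) for _ in range(n)]
    z = [rng.uniform(1, 5) for _ in range(n)]  # hypothetical weighting variable
    # Case 1 data: SD of eps_i proportional to Z_i; true betas (4, 1.5, -2).
    y_z = [4 + 1.5 * a - 2 * b + zi * rng.gauss(0, 1) for a, b, zi in zip(x1, x2, z)]
    print("Case 1 (alpha0, beta0, beta1, beta2):", [round(c, 2) for c in wls_case1(y_z, x1, x2, z)])
    # Case 2 data: SD of eps_i proportional to X1_i.
    y_x1 = [4 + 1.5 * a - 2 * b + a * rng.gauss(0, 1) for a, b in zip(x1, x2)]
    print("Case 2 (beta0, beta1, beta2):", [round(c, 2) for c in wls_case2(y_x1, x1, x2)])
```

In Case 2 the coefficient on the column of ones is read as $\beta_1$, matching the observation that $X_{1i}/X_{1i} = 1$. On noiseless data built exactly from the βs, both functions should return the true coefficients, with $\alpha_0 \approx 0$ in Case 1.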

Example: saving.gdt (weight by income)
