Econ 488
Order of Testing
1. Omitted variables and incorrect functional form (adjusted $R^2$)
2. Either A or B, but not both:
   A. Serial correlation (Durbin-Watson)
   B. Heteroskedasticity (Park test, White's test)
Homoskedasticity
Ideal Case: Homoskedasticity
Error variance $\sigma^2$ is constant across the sample.
$\sigma^2$ measures the dispersion of the dependent variable around the regression line.
Homoskedasticity means that this dispersion around the average relationship between the dependent variable and the independent variable is the same throughout the sample.
Heteroskedasticity
Heteroskedasticity (or heteroscedasticity) is when $\sigma^2$ is not constant across the sample.
The dispersion of the dependent variable around the regression line is not constant.
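The two cases can be written side by side. This is a sketch in standard notation; the observation-specific variance symbol $\sigma_i^2$ is an assumption introduced here and does not appear in the slides.

```latex
\text{Homoskedasticity: } \operatorname{Var}(\varepsilon_i) = \sigma^2 \quad \text{for all } i
\qquad
\text{Heteroskedasticity: } \operatorname{Var}(\varepsilon_i) = \sigma_i^2 \quad \text{(varies with } i\text{)}
```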
Why do we care?
If we don't fix heteroskedasticity:
Coefficients are not efficient (not minimum variance)
Estimated standard errors are biased and inconsistent, meaning t-stats are not right!
Small Markets (Population < 5,000,000):
Mean: $78,800,000
Std Dev: $28,300,000
Min: $43,800,000 (Tampa Bay Rays)
Max: $139,000,000 (Detroit Tigers)
Heteroskedasticity
Note: The same principle applies when observations are groups that differ in size, e.g.:
States (population)
Countries (population)
Colleges (enrollment)
Companies (sales)
Etc.
Another Example
Household income and consumption.
A. Low-income households
Little flexibility in spending
Most income is spent on necessities: food, shelter, clothing, transportation, utilities
Consequences
If we ignore heteroskedasticity, coefficient estimates are:
Unbiased: OK!
Consistent: OK!
Inefficient: not OK.
t-tests are inaccurate.
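The unbiasedness claim can be checked with a small simulation. The sketch below is illustrative only: the data are synthetic, the error standard deviation is assumed to be proportional to X, and the helper name `ols_slope` is made up for this example. It checks only that the OLS slope is centered on the true value. It does not measure the loss of efficiency.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(42)
TRUE_SLOPE = 2.0
# Fixed regressor values spread evenly between 1 and 10 (assumed design).
x = [1.0 + 9.0 * i / 99 for i in range(100)]

slopes = []
for _ in range(500):
    # Heteroskedastic errors: standard deviation grows in proportion to x.
    y = [1.0 + TRUE_SLOPE * xi + xi * rng.gauss(0.0, 1.0) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = statistics.mean(slopes)
print(f"mean slope estimate = {mean_slope:.3f} (true value {TRUE_SLOPE})")
```

Across repeated samples the average estimate should sit close to the true slope, even though the errors are heteroskedastic.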
Detection
Tests detect heteroskedasticity, but won't distinguish between the pure and impure types.
If a test uncovers heteroskedasticity, STOP! Try to decide whether you have an omitted variable.
If you do, include it in your model, and then retest for heteroskedasticity.
Detection
OR, if you don't have an omitted variable:
Employ one of the remedies we'll discuss.
After you fix the problem, test again.
If you still have heteroskedasticity, it might be the impure type.
Detection
Plots
1) Estimate the model and save the residuals
2) Plot the residuals against each independent variable separately
Example: data3-6.gdt
Park Test
If there is heteroskedasticity, then $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$
$\varepsilon_i$ = error term
$\sigma^2$ = variance of homoskedastic error term
$Z_i$ = proportionality factor
If you know something about Z, you can use the Park test. Find a variable that is related to heteroskedasticity (e.g. population)
Park Test
1. Run the regression and obtain the residuals
2. Run the following regression:
$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln(Z_i) + u_i$
Where:
$e_i$ = residuals from the regression
$Z_i$ = best choice of proportionality factor in the data
$u_i$ = classical error term
If $\alpha_1$ is statistically significant, there is evidence of heteroskedasticity.
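The two Park test steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `simple_ols` and `park_test` are made up for this example, the data are synthetic, and the regressor itself is assumed to be the proportionality factor Z.

```python
import math
import random


def simple_ols(x, y):
    """Return (intercept, slope, slope_se) for y = a + b*x by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope_se


def park_test(x, y, z):
    """Regress ln(e^2) on ln(Z); return (alpha_1, t-statistic of alpha_1)."""
    # Step 1: run the original regression and compute its residuals.
    a, b, _ = simple_ols(x, y)
    log_e2 = [math.log((yi - a - b * xi) ** 2) for xi, yi in zip(x, y)]
    # Step 2: auxiliary regression of ln(e^2) on ln(Z).
    log_z = [math.log(zi) for zi in z]
    _, alpha_1, se = simple_ols(log_z, log_e2)
    return alpha_1, alpha_1 / se


# Synthetic data (made up): the error standard deviation is proportional to Z.
rng = random.Random(0)
z = [rng.uniform(1.0, 10.0) for _ in range(400)]
x = z[:]  # assumption: the regressor doubles as the proportionality factor
y = [1.0 + 2.0 * xi + zi * rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]

alpha_1, t_stat = park_test(x, y, z)
print(f"alpha_1 = {alpha_1:.3f}, t = {t_stat:.2f}")
```

A large t-statistic on the slope of the auxiliary regression is the evidence of heteroskedasticity that the test looks for.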
Park Test
Problem: We don't always have a good Z.
So, we can use White's Test.
White's Test
$H_0$: No heteroskedasticity
$H_A$: Heteroskedasticity
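White's Test
1) Estimate the equation
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
2) Save the residuals, $e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}$, and square them.
3) Regress the squared residuals on a constant, $X_1$, $X_2$, $X_1^2$, $X_2^2$, and $X_1 X_2$ (all squares and cross-products of the Xs):
$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$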
White's Test
4) Compute $N R^2$
N = sample size
$R^2$ = unadjusted $R^2$ from the auxiliary regression
5) Reject the null if $N R^2$ exceeds the $\chi^2$ (chi-square) critical value with 5 degrees of freedom, because there are 5 independent variables in the auxiliary regression (step 3)
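The steps of White's test for the two-regressor case can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names (`ols_fit`, `residuals`, `white_statistic`) are made up for this example, the data are synthetic with error spread proportional to X1, and the regressions are solved with a plain normal-equations routine rather than a statistics library.

```python
import random


def ols_fit(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def residuals(rows, y):
    """Residuals from regressing y on the columns in rows."""
    beta = ols_fit(rows, y)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


def white_statistic(x1, x2, y):
    """Return N * R^2 from White's auxiliary regression (two regressors)."""
    n = len(y)
    main = [[1.0, a, b] for a, b in zip(x1, x2)]
    e2 = [e * e for e in residuals(main, y)]           # steps 1 and 2
    aux = [[1.0, a, b, a * a, b * b, a * b] for a, b in zip(x1, x2)]
    v = residuals(aux, e2)                             # step 3
    mean_e2 = sum(e2) / n
    r2 = 1.0 - sum(vi * vi for vi in v) / sum((e - mean_e2) ** 2 for e in e2)
    return n * r2                                      # step 4


# Synthetic data (made up): error standard deviation proportional to X1.
rng = random.Random(1)
x1 = [rng.uniform(1.0, 10.0) for _ in range(500)]
x2 = [rng.uniform(0.0, 5.0) for _ in range(500)]
y = [1.0 + 2.0 * a - 1.0 * b + a * rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]

CHI2_5DF_5PCT = 11.07  # 5% critical value, chi-square with 5 degrees of freedom
stat = white_statistic(x1, x2, y)
print(f"N*R^2 = {stat:.1f}; reject H0: {stat > CHI2_5DF_5PCT}")   # step 5
```

The 5% significance level is a choice made for this example. The slides do not specify one.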
White's Test
If you have 3 independent variables, the auxiliary regression will have 9 independent variables:
$X_1$, $X_2$, $X_3$, $X_1^2$, $X_2^2$, $X_3^2$, $X_1 X_2$, $X_2 X_3$, $X_1 X_3$
If you have 6 independent variables, the auxiliary regression will have 27 independent variables!
This can get out of hand quickly.
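The counts above follow from adding up the levels, the squares, and the pairwise cross-products. A short sketch (the function name `white_aux_regressors` is made up for this example):

```python
def white_aux_regressors(k):
    """Number of regressors in White's auxiliary regression for k X variables."""
    levels = k                           # X1, ..., Xk
    squares = k                          # X1^2, ..., Xk^2
    cross_products = k * (k - 1) // 2    # every pair Xi * Xj with i < j
    return levels + squares + cross_products


for k in (2, 3, 6):
    print(k, white_aux_regressors(k))
```

For k = 2, 3, and 6 this gives 5, 9, and 27, matching the figures in the slides.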
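Dividing the equation through by $Z_i$:
$\frac{Y_i}{Z_i} = \frac{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}}{Z_i} + u_i$
With a constant term $\alpha_0$ included in the transformed equation:
$\frac{Y_i}{Z_i} = \alpha_0 + \beta_0 \frac{1}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$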
What's different about this equation?
If the proportionality factor is one of the regressors (say $Z_i = X_{1i}$), one of the slope coefficients in the original equation becomes an intercept!
This happens because $X_{1i}/X_{1i} = 1$.
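The division by the proportionality factor can be sketched on a few rows of data. The numbers below are made up, and `wls_transform` is a hypothetical helper written for this illustration. It only builds the transformed variables and does not estimate the regression.

```python
def wls_transform(y, x1, x2, z):
    """Divide each term of Y = b0 + b1*X1 + b2*X2 + error by Z.

    Returns one tuple per observation: (Y/Z, 1/Z, X1/Z, X2/Z).
    """
    return [(yi / zi, 1.0 / zi, a / zi, b / zi)
            for yi, a, b, zi in zip(y, x1, x2, z)]


# Made-up observations for illustration.
y = [12.0, 20.0, 31.0, 45.0]
x1 = [2.0, 4.0, 5.0, 8.0]
x2 = [1.0, 3.0, 2.0, 6.0]

# Assumed case: the proportionality factor is X1 itself.
rows = wls_transform(y, x1, x2, z=x1)
for row in rows:
    print(row)
```

With Z equal to X1, the third column (X1/Z) is 1 for every observation, so the coefficient on X1 plays the role of the intercept in the transformed equation, while the original intercept becomes the coefficient on 1/Z.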