


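$$
b_2 = \frac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)(Y_i-\bar{Y})\,\sum_{i=1}^{n}(X_{3i}-\bar{X}_3)^2 \;-\; \sum_{i=1}^{n}(X_{3i}-\bar{X}_3)(Y_i-\bar{Y})\,\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2\,\sum_{i=1}^{n}(X_{3i}-\bar{X}_3)^2 \;-\; \left(\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)\right)^2}
$$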

Topic 3: Multiple


$$
b_2 = \frac{\mathrm{Cov}(X_2,Y)\,\mathrm{Var}(X_3) - \mathrm{Cov}(X_3,Y)\,\mathrm{Cov}(X_2,X_3)}{\mathrm{Var}(X_2)\,\mathrm{Var}(X_3) - [\mathrm{Cov}(X_2,X_3)]^2}
$$
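The covariance/variance expression for the slope on X2 can be checked numerically. The sketch below is only an illustration under assumed, simulated data: the function name `slope_b2`, the sample size and the "true" coefficients are made up for the example and are not part of the notes. It compares the formula with an ordinary least-squares fit from NumPy.

```python
import numpy as np

def slope_b2(x2, x3, y):
    """Slope on X2 from the Cov/Var expression (two regressors plus an intercept)."""
    # bias=True gives 1/n moments; the 1/n factors cancel in the ratio anyway.
    c = np.cov(np.vstack([x2, x3, y]), bias=True)
    var2, var3 = c[0, 0], c[1, 1]
    cov23, cov2y, cov3y = c[0, 1], c[0, 2], c[1, 2]
    return (cov2y * var3 - cov3y * cov23) / (var2 * var3 - cov23 ** 2)

# Simulated data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)        # X3 is correlated with X2
y = 1.0 + 2.0 * x2 - 3.0 * x3 + rng.normal(size=n)

# Reference: least squares on [1, X2, X3].
X = np.column_stack([np.ones(n), x2, x3])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("formula:", slope_b2(x2, x3, y))
print("lstsq:  ", b_ols[1])
```

The two numbers should agree up to floating-point error, which is a quick way to confirm that the signs in the formula have been copied correctly.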

$$
\widehat{EARNINGS} = -4.62 + 0.74\,S + 0.15\,ASVABC
$$

What's the interpretation? These expressions make it clear that multiple regression is a statistical technique that estimates
the impact of a given explanatory variable on the dependent variable after allowing for its correlation with the other
explanatory variables.
e.g. The estimated coefficient of S means that for every additional year of schooling completed, hourly earnings increase
by 0.74 on average (74 cents, if earnings are measured in dollars), holding the ASVABC (IQ-type test) score fixed. Similarly,
the estimated slope coefficient of ASVABC means that for every one-point increase in that score, hourly earnings increase
by 0.15 on average, holding schooling constant.





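$$
\sigma_{b_2}^2 = \frac{\sigma_u^2}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\cdot\frac{1}{1-r_{X_2,X_3}^2} = \frac{\sigma_u^2}{n\,\mathrm{MSD}(X_2)}\cdot\frac{1}{1-r_{X_2,X_3}^2}
$$

where \sigma_u^2 is the variance of the disturbance term and MSD(X2) is the mean square deviation of X2.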


The higher the correlation between the explanatory variables, positive or negative, the greater the variance of the estimated coefficients.
Interpretation: The greater the correlation, the harder it is to discriminate between the effects of the explanatory variables on Y,
and the less accurate the regression estimates will be.
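To see how quickly the variance grows with the correlation, the variance expression for b2 can be evaluated directly. This is a minimal sketch on simulated regressors; the function name `var_b2`, the disturbance variance of 1.0 and the correlation levels are assumptions made only for the illustration.

```python
import numpy as np

def var_b2(x2, x3, sigma_u2):
    """Variance of b2: sigma_u^2 / (n * MSD(X2)) * 1 / (1 - r^2)."""
    n = len(x2)
    msd = np.mean((x2 - x2.mean()) ** 2)      # mean square deviation of X2
    r = np.corrcoef(x2, x3)[0, 1]             # sample correlation of X2 and X3
    return sigma_u2 / (n * msd) / (1.0 - r ** 2)

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
noise = rng.normal(size=n)

variances = []
for rho in (0.0, 0.9, 0.99):                  # assumed correlation levels
    x3 = rho * x2 + np.sqrt(1.0 - rho ** 2) * noise
    v = var_b2(x2, x3, sigma_u2=1.0)
    variances.append(v)
    print(f"rho = {rho:4.2f}   var(b2) = {v:.5f}")
```

The same number can be obtained as the matching diagonal element of sigma_u^2 (X'X)^{-1}, so the two-regressor formula is just a readable special case of the matrix result.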

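Qn: Can we use R^2 as a measure of goodness of fit (of that regression model)?

Only with care. When a new independent variable is added to a multiple regression model, R^2 never decreases, and in
practice it almost always increases, regardless of the variable's relevance. This is because even an irrelevant
independent variable will have some (low) sample correlation with the dependent variable, which reduces
RSS and therefore increases R^2.
One (less than ideal) way to correct for this is the ADJUSTED R^2, which penalizes each extra parameter through the degrees of freedom:

$$
\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}
$$

where n is the number of observations and k is the number of estimated parameters (including the intercept).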
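The effect of adding an irrelevant regressor can be illustrated with a small simulation. The sketch below assumes made-up data and a hypothetical helper `r2_and_adjusted`; k counts all estimated parameters, including the intercept.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2); X must already contain the intercept column."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    rss = resid @ resid
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / tss
    adj = 1.0 - (rss / (n - k)) / (tss / (n - 1))
    return r2, adj

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x2 + rng.normal(size=n)
junk = rng.normal(size=n)                     # irrelevant by construction

X_small = np.column_stack([np.ones(n), x2])
X_big = np.column_stack([X_small, junk])

r2_small, adj_small = r2_and_adjusted(X_small, y)
r2_big, adj_big = r2_and_adjusted(X_big, y)
print("without junk:", r2_small, adj_small)
print("with junk:   ", r2_big, adj_big)
```

R^2 for the larger model cannot be lower than for the smaller one. The adjusted R^2 may go either way, depending on whether the drop in RSS outweighs the lost degree of freedom.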
Topic 4
must know: 5 rules of OLS
A2 No explanatory variable is an exact linear function of the others.
When this condition is violated, we have a problem of (perfect) multicollinearity. As a result, the OLS estimators are
undefined. Intuitively, if X3 always moves in a systematic way with X2, the OLS technique cannot tell where the
effect on Y is coming from. (When the relationship is close but not exact, you'll see the s.e. of both coefficients blow up because of a very high r^2 between X2 and X3.)
* Note that the t-stat will SHRINK when the s.e. blows up, so don't judge a variable on its t-stat value alone. Always insist on hypothesis testing first. The better way is to first match the results against the theory, then do a contrasting test to verify the suspicion.
* To find out which variables are affected by multicollinearity, contrast the restricted and unrestricted regressions:
the estimates for the unrelated variables will stay pretty much the same.
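Why the estimators are undefined under perfect multicollinearity can be seen from the rank of the regressor matrix. In this sketch (simulated data, with an assumed exact relation X3 = 2*X2 + 1) the matrix [1, X2, X3] has only two linearly independent columns, so X'X cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x2 = rng.normal(size=n)
x3 = 2.0 * x2 + 1.0                  # exact linear function of X2 and the constant

X = np.column_stack([np.ones(n), x2, x3])
rank = np.linalg.matrix_rank(X)

print("columns:", X.shape[1])        # 3 parameters to estimate
print("rank:   ", rank)              # fewer than 3 independent columns
```

With rank below the number of columns, infinitely many coefficient vectors give the same fitted values, which is the formal version of "OLS cannot tell where the effect on Y is coming from".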
2. Telling signs of high multicollinearity
* A high R^2 between X2 and X3: it is highly likely that their effects cannot be separated, hence the s.e. blows up.
* A high variance inflation factor (VIF): a value above 10 indicates severe multicollinearity.
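The VIF of a regressor is 1 / (1 - R_j^2), where R_j^2 comes from regressing that regressor on all the others. The following is a minimal NumPy sketch of that definition. The function name `vif` and the simulated data are assumptions for illustration, and the threshold of 10 is the rule of thumb from these notes.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (regressors only, no intercept column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Auxiliary regression: column j on a constant and all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(4)
n = 300
x2 = rng.normal(size=n)
x3 = x2 + 0.1 * rng.normal(size=n)   # nearly collinear with x2
x4 = rng.normal(size=n)              # unrelated regressor

vifs = vif(np.column_stack([x2, x3, x4]))
print(vifs)                          # first two large, third close to 1
```

With only two regressors the VIF reduces to 1 / (1 - r^2), the same factor that appears in the variance formula for b2.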
3. How to confirm multicollinearity

Necessary conditions:

Run a series of regressions to look for linear relationships among the explanatory variables. An F test can be applied to
each of these regressions to check whether all coefficients are simultaneously equal to zero.
A high VIF is only a sufficient condition: a low VIF does not mean there is no severe multicollinearity (depends on how small is

Testing Linear Restriction:


F test
t test: good for JUST one restriction. Otherwise, use the F test.
t test of b1 + b2 = 1: [(b1 + b2) - 1] / se(b1 + b2) ~ t(n-k)

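$$
s^2_{\sum_j \lambda_j b_j} = \sum_{j} \lambda_j^2\, s^2_{b_j} + 2 \sum_{j} \sum_{p<j} \lambda_p \lambda_j\, s_{b_p b_j}
$$

where the \lambda_j are the weights of the linear combination.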

* Var(b1 + b2) = Var(b1) + Var(b2) + 2 Cov(b1, b2)
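When the estimated covariance matrix of the coefficients is available, the t statistic for the restriction b1 + b2 = 1 can be computed directly, using the variance of a sum (two variances plus twice the covariance). The sketch below is an illustration on simulated data; the function name `t_stat_sum_equals_one`, the data-generating values and the column positions are assumptions.

```python
import numpy as np

def t_stat_sum_equals_one(X, y, i, j):
    """t statistic for H0: b_i + b_j = 1, with n - k degrees of freedom."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)              # estimate of the disturbance variance
    cov_b = s2 * XtX_inv                      # estimated covariance matrix of b
    # Var(b_i + b_j) = Var(b_i) + Var(b_j) + 2 Cov(b_i, b_j)
    var_sum = cov_b[i, i] + cov_b[j, j] + 2.0 * cov_b[i, j]
    return (b[i] + b[j] - 1.0) / np.sqrt(var_sum)

rng = np.random.default_rng(5)
n = 120
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 0.5 + 0.6 * x1 + 0.4 * x2 + rng.normal(size=n)   # restriction holds by construction

X = np.column_stack([np.ones(n), x1, x2])
t = t_stat_sum_equals_one(X, y, 1, 2)
print("t statistic:", t, "with", n - X.shape[1], "degrees of freedom")
```

The statistic is then compared with the critical value of t(n-k) at the chosen significance level.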

3. Or a reparameterized t test (when the variance of the combination cannot be found directly)
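As a sketch of how the reparameterization works, assume a hypothetical model with two regressors X_1 and X_2 and the restriction \beta_1 + \beta_2 = 1 (the symbols below are chosen for the illustration and are not taken from the notes). Define \theta as the amount by which the restriction fails, substitute it into the model, and the restriction becomes an ordinary t test on a single coefficient:

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u,
  \qquad H_0:\ \beta_1 + \beta_2 = 1 \\
\theta &\equiv \beta_1 + \beta_2 - 1
  \quad\Longrightarrow\quad \beta_2 = \theta + 1 - \beta_1 \\
Y &= \beta_0 + \beta_1 X_1 + (\theta + 1 - \beta_1) X_2 + u \\
Y - X_2 &= \beta_0 + \beta_1 (X_1 - X_2) + \theta X_2 + u
\end{align*}
```

Regressing (Y - X_2) on (X_1 - X_2) and X_2 gives an estimate of \theta together with its standard error, so H_0: \theta = 0 can be tested with the usual t statistic from the regression output, with no need to look up Cov(b1, b2).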