
IAPRI Quantitative Analysis
Capacity Building Series

Multiple regression analysis
& interpreting results

How important is R-squared?

Best articles of the year published in Agricultural Economics:
- 2008: R-squared = 0.45
- 2009: R-squared = ???
- 2010: R-squared = 0.21

Session 3 Topics

Multiple regression analysis
- What does it mean?
- Why is it important?
- How is it done and how are results interpreted?
- What are the hazards?

Multiple Regression Analysis

What does it mean?
- Multivariate analysis/statistics
- Ceteris paribus
- All else equal
- Controlling for

Multiple Regression Analysis

Why does it matter?
- Suppose y = β0 + β1 x1 + u, where u = β2 x2 + ε
- Estimation requires E(u | x1) = E(u) = 0, implying Corr(u, x1) = 0
- What if the true model is y = β0 + β1 x1 + β2 x2 + ε?
- If Corr(x1, x2) ≠ 0, then Corr(u, x1) ≠ 0
- Results are biased
- If E(u | x1, x2) = 0 (and other conditions hold), we can estimate with multiple regressors

Multiple Regression Analysis

Consider maize yield (mzyield) and basal fertilizer (basaprate), both in kg/ha:

  mzyield = β0 + β1 basaprate + u

. reg mzyield basaprate

      Source |       SS       df       MS              Number of obs =    8648
-------------+------------------------------           F(  1,  8646) = 1526.38
       Model |  2.1590e+09     1  2.1590e+09           Prob > F      =  0.0000
    Residual |  1.2229e+10  8646  1414446.51           R-squared     =  0.1501
-------------+------------------------------           Adj R-squared =  0.1500
       Total |  1.4388e+10  8647  1663962.69           Root MSE      =  1189.3

     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   5.254685   .1344979    39.07   0.000     4.991037    5.518333
       _cons |    1335.84   14.57861    91.63   0.000     1307.262    1364.417

Multiple Regression Analysis

Top dressing (topaprate) also determines yield and is correlated with basaprate, both in kg/ha:

  mzyield = β0 + β1 basaprate + β2 topaprate + u

. reg mzyield basaprate topaprate

      Source |       SS       df       MS              Number of obs =    8647
-------------+------------------------------           F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09           Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34           R-squared     =  0.1628
-------------+------------------------------           Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58           Root MSE      =  1180.5

     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747     5.90   0.000     1.267106    2.528508
   topaprate |    3.62044   .3157663    11.47   0.000     3.001463    4.239418
       _cons |    1314.93   14.58701    90.14   0.000     1286.336    1343.524

Multiple Regression Analysis

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

- β0 is the intercept
- β1, ..., βk are slope parameters (usually)

[Figure: fitted regression line for the simple case, with the intercept β0 and the slope β1 labeled]

Multiple Regression Analysis

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

- β0 is the intercept
- β1, ..., βk are slope parameters (usually)
- u is the unobserved error or disturbance term
- y is the dependent, explained, response, or predicted variable
- x1, ..., xk are the independent, explanatory, control, or predictor variables, or regressors

How is it done?

OLS finds the parameters that minimize the sum of squared residuals:

  Σ_{i=1}^{n} (y_i − β0 − β1 x_i1 − β2 x_i2 − ... − βk x_ik)²

- Minimize the noise
- Squared, so residuals don't offset
- Gives us the β̂'s and predicted values ŷ (a matrix-algebra sketch follows)
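Not part of the original slides: a minimal sketch of how the minimization works out in practice. The closed-form solution to the problem above is β̂ = (X'X)⁻¹ X'y, and it can be reproduced with Stata's matrix commands using the deck's own variables (the matrix names XX, yX, and b are arbitrary):

. * Build the cross-product matrices; a constant term is appended automatically
. matrix accum XX = basaprate topaprate
. matrix vecaccum yX = mzyield basaprate topaprate
. * Solve the normal equations: b = y'X (X'X)^(-1)
. matrix b = yX * invsym(XX)
. matrix list b

The coefficients listed should match the reg output above (basaprate, topaprate, then _cons).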

Ceteris Paribus Interpretation

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

- βj is the partial effect, or ceteris paribus effect, of xj on y
- Change x1 only: Δy = β1 Δx1
- Change x2 only: Δy = β2 Δx2
- Share of total change attributable to x1: β1 Δx1 / Δy, where Δy = β1 Δx1 + β2 Δx2

Ceteris Paribus Interpretation

Now, how do we interpret the coefficient estimate for basaprate?

  mzyield = β0 + β1 basaprate + β2 topaprate + u

. reg mzyield basaprate topaprate

      Source |       SS       df       MS              Number of obs =    8647
-------------+------------------------------           F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09           Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34           R-squared     =  0.1628
-------------+------------------------------           Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58           Root MSE      =  1180.5

     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747     5.90   0.000     1.267106    2.528508
   topaprate |    3.62044   .3157663    11.47   0.000     3.001463    4.239418
       _cons |    1314.93   14.58701    90.14   0.000     1286.336    1343.524

Ceteris Paribus Interpretation

- According to these results, a one unit change in x1 will result in a β̂1 unit change in y, all else equal.
- The ceteris paribus effect of a one unit change in x1 is a β̂1 unit change in y.
- Holding x2 constant, a one unit change in x1 results in a β̂1 unit change in y.
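As a small follow-on sketch (not in the original slides): after running the deck's regression, the stored coefficient _b[basaprate] gives the ceteris paribus effect in the data's units, and lincom reports the same estimate with its confidence interval:

. reg mzyield basaprate topaprate
. * Effect of 1 kg/ha more basal fertilizer on mzyield, holding topaprate fixed
. display "Ceteris paribus effect (kg/ha): " _b[basaprate]
. * Same estimate with its standard error, t-statistic, and 95% CI
. lincom basaprate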

Key Assumptions
- Linear in parameters
- Random sample
- Zero conditional mean
- No perfect collinearity (variation in data)
- Homoskedastic errors


Perfect Collinearity
- Variable is a linear function of one or more others
- No variation in one variable (collinear with the intercept)
  - Can't estimate a slope parameter if there is no variation in x

Source: Wooldridge (2002)

Perfect Collinearity
- Variable is a linear function of one or more others
- No variation in one variable (collinear with the intercept)
- Perfect correlation between 2 binary variables
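Not part of the original slides: a minimal sketch of what perfect collinearity looks like in Stata, using a hypothetical constructed variable (basap2) that is an exact linear function of basaprate. Stata cannot estimate both slopes, so it drops one of the collinear regressors and flags it in the output:

. * basap2 is perfectly collinear with basaprate by construction
. gen double basap2 = 2*basaprate
. reg mzyield basaprate basap2 topaprate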

Other hazards
- Multi-collinearity
- Including irrelevant variables
- Omitting relevant variables

Multi-Collinearity
- Highly correlated variables
- Variable is a nonlinear function of others
- What's the problem?
- Efficiency losses (a common diagnostic is sketched below)
- Schmidt rule of thumb
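Not part of the original slides: one common way to check for troublesome (but not perfect) collinearity is the variance inflation factor, available after reg. This is a rough diagnostic rather than a formal test; values far above about 10 are often taken as a warning sign:

. reg mzyield basaprate topaprate
. * Variance inflation factors for the regressors
. estat vif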

Including Irrelevant Variables

  y = β0 + β1 x1 + β2 x2 + β3 x3 + u

- Suppose x3 has no effect on y, but the key assumptions are satisfied (overspecified)
- OLS is an unbiased estimator of β3, even if β3 is zero
- Estimates of β1 and β2 will be less efficient (see the simulation sketch below)
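A minimal simulation sketch (not in the original slides) of "overspecified but unbiased": add a regressor that is pure noise and re-run the deck's regression. The variable noise is hypothetical; its coefficient should come out close to zero, R-squared will tick up slightly, but adjusted R-squared typically will not:

. set seed 12345
. gen double noise = rnormal()            // irrelevant by construction
. reg mzyield basaprate topaprate noise
. display "R-squared = " e(r2) "   Adjusted R-squared = " e(r2_a)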

Omitting Relevant Variables

  y = β0 + β1 x1 + β2 x2 + u

- Suppose we omit x2 (underspecifying)
- OLS is generally biased

Omitting Relevant Variables

True model: y = β0 + β1 x1 + β2 x2 + u
- Estimate instead: ỹ = β̃0 + β̃1 x1
- And let x̃2 = δ0 + δ1 x1 (the regression of the omitted x2 on the included x1)
- Omitted Variable Bias: it can be shown that

  E(β̃1) = β1 + β2 δ1
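A minimal sketch (not in the original slides) that checks this decomposition with the deck's own variables: δ1 comes from regressing topaprate on basaprate, and the short-regression coefficient on basaprate (about 5.25 above) should be approximately 1.90 + 3.62 × δ1 (the estimation samples differ by one observation, so the match is not exact):

. reg topaprate basaprate                 // auxiliary regression: delta1 = _b[basaprate]
. scalar delta1 = _b[basaprate]
. reg mzyield basaprate topaprate         // long regression: beta1 and beta2
. display "beta1 + beta2*delta1 = " _b[basaprate] + _b[topaprate]*delta1
. reg mzyield basaprate                   // short regression: compare its basaprate coefficient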

Multiple Regression Analysis

Direction of omitted variable bias:

              Corr(x1, x2) > 0    Corr(x1, x2) < 0
  β2 > 0      Positive bias       Negative bias
  β2 < 0      Negative bias       Positive bias

Source: Wooldridge, 2002, page 92

Omitting Relevant Variables
- More generally, all OLS estimates will be biased, even if just one explanatory variable is correlated with the omitted variables
- Direction of bias is less clear

Multiple Regression Analysis

Goodness of fit
- R² is the share of explained variance
- R² never decreases when we add variables
- Usually, it will increase regardless of relevance
- Adjusted R² accounts for this

Next time: Interpreting results
- Binary regressors
- Other categorical regressors
- Categorical regressors as a series of binary regressors
- Quadratic terms
- Other interactions
- Average Partial Effects

Session materials developed by Bill Burke with input from Nicole Mason. January 2012.
burkewi2@stanford.edu
