
IAPRI Quantitative Analysis
Capacity Building Series

Multiple regression analysis
& interpreting results

How important is R-squared?

Best articles of the year published in Agricultural Economics:
- 2008: R-squared = 0.45
- 2009: R-squared = ???
- 2010: R-squared = 0.21

Session 3 Topics

Multiple regression analysis
- What does it mean?
- Why is it important?
- How is it done and how are results interpreted?
- What are the hazards?

Multiple Regression Analysis

What does it mean?
- Multivariate analysis/statistics
- Ceteris paribus
- All else equal
- Controlling for

Multiple Regression Analysis

Why does it matter?
- Suppose y = β0 + β1 x1 + u, where u = β2 x2 + ε
- Estimation requires E(u | x1) = E(u) = 0, implying Corr(u, x1) = 0
- What if the true model is y = β0 + β1 x1 + β2 x2 + ε?
- If Corr(x1, x2) ≠ 0, then Corr(u, x1) ≠ 0
- Results are biased
- If E(u | x1, x2) = 0 (and other conditions hold), we can estimate with multiple regressors

Multiple Regression Analysis

Consider maize yield (mzyield) and basal fertilizer (basaprate), both in kg/ha:

  mzyield = β0 + β1 basaprate + u

. reg mzyield basaprate

      Source |       SS       df       MS              Number of obs =    8648
-------------+------------------------------           F(  1,  8646) = 1526.38
       Model |  2.1590e+09     1  2.1590e+09           Prob > F      =  0.0000
    Residual |  1.2229e+10  8646  1414446.51           R-squared     =  0.1501
-------------+------------------------------           Adj R-squared =  0.1500
       Total |  1.4388e+10  8647  1663962.69           Root MSE      =  1189.3

     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   5.254685   .1344979    39.07   0.000     4.991037    5.518333
       _cons |    1335.84   14.57861    91.63   0.000     1307.262    1364.417

Multiple Regression Analysis

Top dressing (topaprate) also determines yield and is correlated with basaprate, both in kg/ha:

  mzyield = β0 + β1 basaprate + β2 topaprate + u

. reg mzyield basaprate topaprate

      Source |       SS       df       MS              Number of obs =    8647
-------------+------------------------------           F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09           Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34           R-squared     =  0.1628
-------------+------------------------------           Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58           Root MSE      =  1180.5

     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747     5.90   0.000     1.267106    2.528508
   topaprate |    3.62044   .3157663    11.47   0.000     3.001463    4.239418
       _cons |    1314.93   14.58701    90.14   0.000     1286.336    1343.524

Multiple Regression Analysis

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

- β0 is the intercept
- β1, ..., βk are slope parameters (usually)

[Figure: fitted regression line for the simple case, with the intercept β0 and the slope β1 labeled]

Multiple Regression Analysis

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

- β0 is the intercept
- β1, ..., βk are slope parameters (usually)
- u is the unobserved error or disturbance term
- y is the dependent, explained, response, or predicted variable
- x1, ..., xk are the independent, explanatory, control, or predictor variables, or regressors

How is it done?

OLS finds the parameters that minimize the sum of squared residuals:

  Σ_{i=1}^{n} (y_i − β0 − β1 x_i1 − β2 x_i2 − ... − βk x_ik)²

- Minimize the noise
- Squared, so residuals don't offset
- Gives us the β̂'s and predicted values ŷ (a matrix-algebra sketch follows)
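Not part of the original slides: a minimal sketch of how the minimization works out in practice. The closed-form solution to the problem above is β̂ = (X'X)⁻¹ X'y, and it can be reproduced with Stata's matrix commands using the deck's own variables (the matrix names XX, yX, and b are arbitrary):

. * Build the cross-product matrices; a constant term is appended automatically
. matrix accum XX = basaprate topaprate
. matrix vecaccum yX = mzyield basaprate topaprate
. * Solve the normal equations: b = y'X (X'X)^(-1)
. matrix b = yX * invsym(XX)
. matrix list b

The coefficients listed should match the reg output above (basaprate, topaprate, then _cons).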

Ceteris Paribus Interpretation

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u

- βj is the partial effect, or ceteris paribus effect, of xj on y
- Change x1 only: Δy = β1 Δx1
- Change x2 only: Δy = β2 Δx2
- Share of total change attributable to x1: β1 Δx1 / Δy, where Δy = β1 Δx1 + β2 Δx2

Ceteris Paribus Interpretation

Now, how do we interpret the coefficient estimate for basaprate?

  mzyield = β0 + β1 basaprate + β2 topaprate + u

. reg mzyield basaprate topaprate

      Source |       SS       df       MS              Number of obs =    8647
-------------+------------------------------           F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09           Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34           R-squared     =  0.1628
-------------+------------------------------           Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58           Root MSE      =  1180.5

     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747     5.90   0.000     1.267106    2.528508
   topaprate |    3.62044   .3157663    11.47   0.000     3.001463    4.239418
       _cons |    1314.93   14.58701    90.14   0.000     1286.336    1343.524

Ceteris Paribus Interpretation

- According to these results, a one unit change in x1 will result in a β̂1 unit change in y, all else equal.
- The ceteris paribus effect of a one unit change in x1 is a β̂1 unit change in y.
- Holding x2 constant, a one unit change in x1 results in a β̂1 unit change in y.
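As a small follow-on sketch (not in the original slides): after running the deck's regression, the stored coefficient _b[basaprate] gives the ceteris paribus effect in the data's units, and lincom reports the same estimate with its confidence interval:

. reg mzyield basaprate topaprate
. * Effect of 1 kg/ha more basal fertilizer on mzyield, holding topaprate fixed
. display "Ceteris paribus effect (kg/ha): " _b[basaprate]
. * Same estimate with its standard error, t-statistic, and 95% CI
. lincom basaprate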

Key Assumptions
- Linear in parameters
- Random sample
- Zero conditional mean
- No perfect collinearity (variation in data)
- Homoskedastic errors


Perfect Collinearity
- Variable is a linear function of one or more others
- No variation in one variable (collinear with the intercept)
  - Can't estimate a slope parameter if there is no variation in x

Source: Wooldridge (2002)

Perfect Collinearity
- Variable is a linear function of one or more others
- No variation in one variable (collinear with the intercept)
- Perfect correlation between 2 binary variables
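Not part of the original slides: a minimal sketch of what perfect collinearity looks like in Stata, using a hypothetical constructed variable (basap2) that is an exact linear function of basaprate. Stata cannot estimate both slopes, so it drops one of the collinear regressors and flags it in the output:

. * basap2 is perfectly collinear with basaprate by construction
. gen double basap2 = 2*basaprate
. reg mzyield basaprate basap2 topaprate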

Other hazards
- Multi-collinearity
- Including irrelevant variables
- Omitting relevant variables

Multi-Collinearity
- Highly correlated variables
- Variable is a nonlinear function of others
- What's the problem?
- Efficiency losses (a common diagnostic is sketched below)
- Schmidt rule of thumb
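Not part of the original slides: one common way to check for troublesome (but not perfect) collinearity is the variance inflation factor, available after reg. This is a rough diagnostic rather than a formal test; values far above about 10 are often taken as a warning sign:

. reg mzyield basaprate topaprate
. * Variance inflation factors for the regressors
. estat vif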

Including Irrelevant Variables

  y = β0 + β1 x1 + β2 x2 + β3 x3 + u

- Suppose x3 has no effect on y, but the key assumptions are satisfied (overspecified)
- OLS is an unbiased estimator of β3, even if β3 is zero
- Estimates of β1 and β2 will be less efficient (see the simulation sketch below)
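A minimal simulation sketch (not in the original slides) of "overspecified but unbiased": add a regressor that is pure noise and re-run the deck's regression. The variable noise is hypothetical; its coefficient should come out close to zero, R-squared will tick up slightly, but adjusted R-squared typically will not:

. set seed 12345
. gen double noise = rnormal()            // irrelevant by construction
. reg mzyield basaprate topaprate noise
. display "R-squared = " e(r2) "   Adjusted R-squared = " e(r2_a)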

Omitting Relevant Variables

  y = β0 + β1 x1 + β2 x2 + u

- Suppose we omit x2 (underspecifying)
- OLS is generally biased

Omitting Relevant Variables

True model: y = β0 + β1 x1 + β2 x2 + u
- Estimate instead: ỹ = β̃0 + β̃1 x1
- And let x̃2 = δ0 + δ1 x1 (the regression of the omitted x2 on the included x1)
- Omitted Variable Bias: it can be shown that

  E(β̃1) = β1 + β2 δ1
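A minimal sketch (not in the original slides) that checks this decomposition with the deck's own variables: δ1 comes from regressing topaprate on basaprate, and the short-regression coefficient on basaprate (about 5.25 above) should be approximately 1.90 + 3.62 × δ1 (the estimation samples differ by one observation, so the match is not exact):

. reg topaprate basaprate                 // auxiliary regression: delta1 = _b[basaprate]
. scalar delta1 = _b[basaprate]
. reg mzyield basaprate topaprate         // long regression: beta1 and beta2
. display "beta1 + beta2*delta1 = " _b[basaprate] + _b[topaprate]*delta1
. reg mzyield basaprate                   // short regression: compare its basaprate coefficient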

Multiple Regression Analysis

Direction of omitted variable bias:

              Corr(x1, x2) > 0    Corr(x1, x2) < 0
  β2 > 0      Positive bias       Negative bias
  β2 < 0      Negative bias       Positive bias

Source: Wooldridge, 2002, page 92

Omitting Relevant Variables
- More generally, all OLS estimates will be biased, even if just one explanatory variable is correlated with the omitted variables
- Direction of bias is less clear

Multiple Regression Analysis

Goodness of fit
- R² is the share of explained variance
- R² never decreases when we add variables
- Usually, it will increase regardless of relevance
- Adjusted R² accounts for this

Next time: Interpreting results
- Binary regressors
- Other categorical regressors
- Categorical regressors as a series of binary regressors
- Quadratic terms
- Other interactions
- Average Partial Effects

Session materials developed by Bill Burke with input from Nicole Mason. January 2012.
burkewi2@stanford.edu
