Recap
Lecture Outline: Wooldridge: Ch 2 - 2.3, Appendix B-1, B-2, B-4
What is a model?
Terminology and notation
- Notation: Unlike first year, we use lower-case letters for random variables
- Terminology: There are many different ways people refer to the target variable y that we want to explain using the variable x. These include:

      y                      x
      Dependent Variable     Independent Variable
      Explained Variable     Explanatory Variable
      Response Variable      Control Variable
      Predicted Variable     Predictor Variable
      Regressand             Regressor
- The expressions “Run a regression of y on x” and “Regress y on x” both mean “Estimate the model y = β0 + β1 x + u using the ordinary least squares method”
Modelling the mean

      E(y | x) = β0 + β1 x             (conditional expectation function)
      ŷ = Ê(y | x) = β̂0 + β̂1 x         (sample regression function)
The simple linear regression model in a picture
The simple linear regression model in equation form
y = β0 + β1 x + u   with   E(u | x) = 0
The OLS estimator
- Let’s leave the theory universe and go back to the data world
- We have a random sample of n observations on two variables x and y in two columns of a spreadsheet
- Let’s denote them by (x1, y1), (x2, y2), . . . , (xn, yn)
- We want to see if these two columns of numbers are related to each other
- In the simple case where there is only one x, we look at the scatter plot, which is a very informative visual tool and gives us a good idea of the correlation between y and x (e.g. body fat and weight)
- Unfortunately, in some business and economic applications the signal is too weak to be detected by data visualisation alone (e.g. asset returns and size)
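A minimal sketch of this first look at the data, using simulated numbers in place of a real body fat/weight sample (the variable names and values below are made up for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)

    # Hypothetical sample: weight (kg) and body fat (%) for n = 50 people
    n = 50
    weight = rng.normal(80, 12, size=n)
    bodyfat = 0.4 * weight - 10 + rng.normal(0, 4, size=n)  # made-up relationship

    plt.scatter(weight, bodyfat)
    plt.xlabel("weight (kg)")
    plt.ylabel("body fat (%)")
    plt.title("Scatter plot of body fat against weight")
    plt.show()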
The OLS estimator
- How can we determine a straight line that fits our data best?
- We find β̂0 and β̂1 that minimise the sum of squared residuals

      SSR(b0, b1) = Σᵢ₌₁ⁿ (yi − b0 − b1 xi)²

- Setting the two partial derivatives of SSR to zero gives two equations in two unknowns, which we can solve to get β̂0 and β̂1
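To see the minimisation at work, here is a sketch that minimises SSR numerically on simulated data (scipy assumed available); the closed-form solution on the next slide gives the same answer.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    x = rng.normal(5, 2, size=100)
    y = 1.0 + 2.0 * x + rng.normal(0, 1, size=100)  # simulated: true β0 = 1, β1 = 2

    def ssr(b):
        # Sum of squared residuals for a candidate intercept and slope
        b0, b1 = b
        return np.sum((y - b0 - b1 * x) ** 2)

    res = minimize(ssr, x0=[0.0, 0.0])
    print(res.x)  # numerical minimiser, close to (1, 2)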
- We obtain the set of equations

      Σᵢ₌₁ⁿ ûi = Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi) = 0

      Σᵢ₌₁ⁿ xi ûi = Σᵢ₌₁ⁿ xi (yi − β̂0 − β̂1 xi) = 0
- And after some algebra (see page 28 of the textbook, 5th ed., or page 26, 6th ed.)

      β̂1 = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)²

      β̂0 = ȳ − β̂1 x̄
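A direct translation of the two formulas, continuing with the simulated x and y from the sketch above:

    xbar, ybar = x.mean(), y.mean()

    # OLS estimates from the deviation-from-means formulas
    b1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0_hat = ybar - b1_hat * xbar
    print(b0_hat, b1_hat)  # agrees with the numerical minimiser above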
- Several important things to note:
  1. The slope estimator can be written as

         β̂1 = Ĉov(x, y) / V̂ar(x) = σ̂xy / σ̂x²

     Those of you who are doing finance will now realise where the name “beta” of a stock comes from!
  2. From the formula for β̂0 we see that ȳ = β̂0 + β̂1 x̄, which shows that (x̄, ȳ) lies on the regression line, i.e. the regression prediction is most accurate at the sample average.
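Both points are easy to check numerically, continuing from the block above (np.cov and np.var with ddof=1 use the same n − 1 divisor as the sample formulas):

    # 1. Slope = sample covariance over sample variance
    b1_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    print(np.isclose(b1_cov, b1_hat))  # True

    # 2. (x̄, ȳ) lies on the fitted line
    print(np.isclose(ybar, b0_hat + b1_hat * xbar))  # True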
Estimators we have seen so far
      population st. dev.                      sample st. dev.
      σy = √(σy²)                              σ̂y = √(σ̂y²)

      population covariance                    sample covariance
      σxy = E[(x − µx)(y − µy)]                σ̂xy = (1/(n−1)) Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ)

      population correlation coeff.            sample corr. coeff.
      ρxy = σxy / (σx σy)                      ρ̂xy = σ̂xy / (σ̂x σ̂y)

      slope and intercept of the PRF           their OLS estimators
      β1 and β0 in E(y | x) = β0 + β1 x        β̂1 = σ̂xy / σ̂x²  and  β̂0 = ȳ − β̂1 x̄
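All of the sample quantities in the right-hand column, computed on the simulated data used above:

    sd_y = np.std(y, ddof=1)                         # sample st. dev. σ̂y
    cov_xy = np.cov(x, y, ddof=1)[0, 1]              # sample covariance σ̂xy
    corr_xy = cov_xy / (np.std(x, ddof=1) * sd_y)    # sample correlation ρ̂xy
    print(sd_y, cov_xy, corr_xy)
    print(np.isclose(corr_xy, np.corrcoef(x, y)[0, 1]))  # True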
Simple linear regression in matrix form
yi = β0 + β1 xi + ui , i = 1, . . . , n
- We define the n × 1 vectors y (dependent variable vector) and u (error vector):

          ⎛ y1 ⎞             ⎛ u1 ⎞
      y = ⎜ y2 ⎟   and   u = ⎜ u2 ⎟
          ⎜ ⋮  ⎟             ⎜ ⋮  ⎟
          ⎝ yn ⎠             ⎝ un ⎠

  and the n × 2 matrix of regressors

          ⎛ 1  x1 ⎞
      X = ⎜ 1  x2 ⎟
          ⎜ ⋮  ⋮  ⎟
          ⎝ 1  xn ⎠

  and the 2 × 1 parameter vector

          ⎛ β0 ⎞
      β = ⎝ β1 ⎠

  which allow us to write the model simply as

      y = Xβ + u
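In code, stacking a column of ones next to x produces exactly this X (continuing the simulated data from above):

    n_obs = len(y)
    X = np.column_stack([np.ones(n_obs), x])  # n × 2: column of ones, then x
    print(X.shape)   # (100, 2)
    print(X[:3])     # first three rows: (1, x1), (1, x2), (1, x3)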
- In multiple regression, where we have k explanatory variables plus the intercept,

       y   =    X      β    +   u
      n×1    n×(k+1) (k+1)×1   n×1

  where

              ⎛ 1  x11  x12  ···  x1k ⎞
      X     = ⎜ 1  x21  x22  ···  x2k ⎟
    n×(k+1)   ⎜ ⋮   ⋮    ⋮         ⋮  ⎟
              ⎝ 1  xn1  xn2  ···  xnk ⎠

  and

              ⎛ β0 ⎞
      β     = ⎜ β1 ⎟
    (k+1)×1   ⎜ ⋮  ⎟
              ⎝ βk ⎠

- Remember that y and X are observable, while β and u are unknown and unobservable
- Also remember that for any (k + 1) × 1 vector b, Xb is an n × 1 vector that is a linear combination of the columns of X
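A quick check of that last point with a small made-up design matrix:

    rng2 = np.random.default_rng(1)
    Xm = np.column_stack([np.ones(6), rng2.normal(size=(6, 3))])  # n = 6, k = 3
    b = np.array([1.0, 2.0, -1.0, 0.5])                           # any (k+1) × 1 vector

    # Xm @ b is the linear combination b0·(col 0) + b1·(col 1) + ... of the columns
    combo = sum(b[j] * Xm[:, j] for j in range(Xm.shape[1]))
    print(np.allclose(Xm @ b, combo))  # True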
The power of abstraction
- In the real world we have goals such as:
  - We want to determine the added value of education to wages
Vectors and vector spaces
- An n-dimensional vector is an arrow from the origin to the point in ℝⁿ with coordinates given by its elements
- Example: v = (4, 3)′
- The length of a vector is the square root of the sum of squares of its coordinates:

      length(v) = (v′v)^(1/2)

- Example: for the vector v = (4, 3)′ on the previous slide, length(v) = √(4² + 3²) = 5
- For u and v of the same dimension,

      u′v = 0  ⇔  u and v are perpendicular (orthogonal) to each other

- Example: (3, 2)′ and (−1, 1.5)′, since 3 × (−1) + 2 × 1.5 = 0
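The same checks in numpy:

    v = np.array([4.0, 3.0])
    print(np.sqrt(v @ v))    # length(v) = (v′v)^(1/2) = 5.0

    u = np.array([3.0, 2.0])
    w = np.array([-1.0, 1.5])
    print(u @ w)             # 0.0, so u and w are orthogonal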
- For any constant c, the vector cv is on the line that passes through the origin and v
- This line is called the “space spanned by v”
- Example: (3, 3)′ and (−2, −2)′ are in the space spanned by v = (1, 1)′
- Consider a matrix X that has two columns that are not multiples of each other
- Geometrically, these vectors form a two-dimensional plane that contains all linear combinations of the two vectors, i.e. the set of Xb for all b. This plane is called the column space of X
- If the number of rows of X is also two, then the column space of X is the entire ℝ², that is, any two-dimensional y can be written as a linear combination of the columns of X
- Consider the house price example, but only with two houses (prices in hundreds of thousands of dollars, regressed on the number of bedrooms):

      ⎛ 10 ⎞      ⎛ 1 ⎞      ⎛ 4 ⎞
      ⎝  4 ⎠ = β̂0 ⎝ 1 ⎠ + β̂1 ⎝ 1 ⎠ + û = Xβ̂ + û
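With two observations and two parameters, X is square, so the system can be solved exactly and the residuals are zero; a sketch:

    X2 = np.array([[1.0, 4.0],
                   [1.0, 1.0]])  # intercept column and bedrooms (4 and 1)
    y2 = np.array([10.0, 4.0])   # prices in hundreds of thousands

    beta_hat = np.linalg.solve(X2, y2)
    print(beta_hat)              # [2. 2.]  ->  p̂rice = 2 + 2 bedrooms
    print(y2 - X2 @ beta_hat)    # residuals are exactly zero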
[Figure, built up over several slides: the two observations plotted in the (obs 1, obs 2) plane, showing y = (10, 4)′ as a linear combination of the columns of X]
- This shows that with only two observations, we would suggest

      p̂rice = 2 + 2 bedrooms

- While this fits the first two houses perfectly, when we add the third house (1 bedroom, sold for 6 hundred thousand) we make an error of 2 hundred thousand dollars
- With 3 observations, the 3-dimensional price vector no longer lies in the space of linear combinations of the columns of X.
- The closest point in the column space of X to y is ...
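In code, the three-house system is overdetermined, so an exact solve is no longer possible and we ask for the least-squares fit instead (a sketch continuing the session above):

    X3 = np.array([[1.0, 4.0],
                   [1.0, 1.0],
                   [1.0, 1.0]])      # bedrooms: 4, 1, 1
    y3 = np.array([10.0, 4.0, 6.0])  # prices; the third house sold for 6

    beta3, *_ = np.linalg.lstsq(X3, y3, rcond=None)
    print(beta3)                     # least-squares intercept and slope
    print(y3 - X3 @ beta3)           # residuals are no longer all zero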
[Figures: the vector y, the column space of X, and candidate residual vectors û = y − Xb for different choices of b]
- Hence, the shortest û is the one that is perpendicular to the columns of X, i.e.

      X′û = 0

- Since û = y − Xβ̂, we get

      X′(y − Xβ̂) = 0
      ⇒ X′y = X′Xβ̂
      ⇒ β̂ = (X′X)⁻¹X′y
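The same formula in code for the three-house example (solving the normal equations X′Xβ̂ = X′y directly, which avoids forming the inverse):

    beta_ne = np.linalg.solve(X3.T @ X3, X3.T @ y3)
    print(np.allclose(beta_ne, beta3))  # True: matches lstsq

    u_hat = y3 - X3 @ beta_ne
    print(X3.T @ u_hat)                 # ≈ (0, 0): residuals ⟂ columns of X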
- The vector of OLS predicted values is the orthogonal projection of y onto the column space of X
- By definition

      y = ŷ + û

- Since y, ŷ and û form a right-angled triangle, we know (remember the length of a vector)

      y′y = ŷ′ŷ + û′û

  i.e.,

      Σᵢ₌₁ⁿ yi² = Σᵢ₌₁ⁿ ŷi² + Σᵢ₌₁ⁿ ûi²
- Since (1 1 ··· 1)û = 0, we also have

      Σᵢ₌₁ⁿ yi = Σᵢ₌₁ⁿ ŷi  ⇒  the sample average of the ŷi equals ȳ

- Subtracting nȳ² from both sides of the sum-of-squares identity above then gives

      SST = SSE + SSR

- This leads to the definition of the coefficient of determination R², which is a measure of goodness of fit:

      R² = SSE/SST = 1 − SSR/SST
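Verifying the decomposition and both forms of R² on the three-house example (continuing the session above):

    y_fit = X3 @ beta_ne
    SST = np.sum((y3 - y3.mean()) ** 2)
    SSE = np.sum((y_fit - y3.mean()) ** 2)  # fitted values also have mean ȳ
    SSR = np.sum(u_hat ** 2)

    print(np.isclose(y3 @ y3, y_fit @ y_fit + u_hat @ u_hat))  # Pythagoras
    print(np.isclose(SST, SSE + SSR))                          # True
    print(SSE / SST, 1 - SSR / SST)                            # equal: this is R²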
Interpretation of OLS estimates
Example: The causal effect of education on wage
wage = β0 + β1 educ + β2 IQ + u
OLS in action
Example: The causal effect of education on wage
- The coefficient of educ now shows that for two people with the same IQ score, the one with one more year of education is expected to earn $42 more per month.
Interpretation of OLS estimates
E(y | x1, x2) = β0 + β1 x1 + β2 x2
- Now, in

      ∆ŷ = β̂1 ∆x1 + β̂2 ∆x2

  what happens if we hold x2 fixed and increase x1 by one unit, that is, ∆x2 = 0 and ∆x1 = +1? We will have

      ∆ŷ = β̂1
- Similarly,

      ∆ŷ = β̂2   if ∆x1 = 0 and ∆x2 = +1

- Let’s go back to the regression output and interpret the parameters.

      ŵage = −128.89 + 42.06 educ + 5.14 IQ
      n = 935, R² = 0.134

- 42.06 shows that for two people with the same IQ, the one with one more year of education is predicted to earn $42.06 more in monthly wages.
- Or: every extra year of education increases the predicted wage by $42.06, keeping IQ constant (or “after controlling for IQ”, or “after removing the effect of IQ”, or “all else constant”, or “all else equal”, or “ceteris paribus”)
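Using the estimated equation to check this interpretation directly (coefficients copied from the fitted model above; the education and IQ values plugged in are arbitrary examples):

    def wage_hat(educ, iq):
        # Fitted equation from the slide: predicted monthly wage
        return -128.89 + 42.06 * educ + 5.14 * iq

    # Two people with the same IQ, one with one more year of education
    print(wage_hat(13, 100) - wage_hat(12, 100))  # 42.06, up to floating point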
Summary
- Given a sample of n observations, OLS finds the orthogonal projection of the dependent variable vector onto the column space of the explanatory variables
- The residual vector is orthogonal to each of the explanatory variable vectors, including the column of ones for the intercept:

      X′û = 0

- This leads to the formula for the OLS estimator in multiple regression:

      β̂ = (X′X)⁻¹X′y

- It also leads to

      SST = SSE + SSR

- We learned how to interpret the coefficients of an estimated regression model
- Note the difference between the population parameter β and its OLS estimator β̂: β is constant and does not change, whereas β̂ is a function of the sample and its value changes from sample to sample. Why are these good estimators?
- The probability that the height of a randomly selected man lies in a certain interval is the area under the pdf over that interval