Lectures 1 and 2
Francis Kramarz and Michael Visser
MASTER 1 EPP
2012
Introduction
Definition
A simple linear regression model is a regression model where the
dependent variable is continuous, explained by a single exogenous
variable, and linear in the parameters.
Theoretical model: $Y = \beta_0 + \beta_1 X + u$. The model is linear in the parameters $\beta_0$ and $\beta_1$. The single explanatory variable $X$ can be continuous or discrete.
The error term $u$ captures all relevant variables not included in the model because they are not observed in the data set (for example ability, dynamism, ...).
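To fix ideas, here is a minimal simulation sketch (dataset name and parameter values are hypothetical): data are generated from a known linear model and the two parameters are then estimated by OLS with PROC REG.
SAS Program
/* Hypothetical simulation: Y = 2 + 0.5*X + u with u ~ N(0,1) */
Data sim ;
Call Streaminit(123) ;
Do i = 1 To 200 ;
x = Rand('Normal') ;
u = Rand('Normal') ; /* the unobserved error term */
y = 2 + 0.5*x + u ;
Output ;
End ;
Run ;
Proc Reg Data=sim ;
Model y = x ; /* estimates should be close to 2 and 0.5 */
Run ;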
The OLS estimators minimize the sum of squared residuals $S(b_0, b_1) = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$. First order conditions:
$$\frac{\partial S(b_0, b_1)}{\partial b_0} = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \qquad (1)$$
$$\frac{\partial S(b_0, b_1)}{\partial b_1} = -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0 \qquad (2)$$
Proposition
$$b_0 = \bar Y - b_1 \bar X \qquad \text{and} \qquad b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$
Proof
Using equation (1), $b_0 = \bar Y - b_1 \bar X$. Substituting this expression, equation (2) turns into $b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$.
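As a quick check of the closed form, one can compute $b_1 = \widehat{Cov}(X,Y)/\widehat{V}(X)$ by hand; a sketch using the hypothetical simulated dataset sim from above:
SAS Program
/* PROC CORR with the COV option writes the covariance matrix to _c ;
the row with _TYPE_='COV' and _NAME_='x' holds Var(x) and Cov(x,y) */
Proc Corr Data=sim Cov Outp=_c Noprint ;
Var x y ;
Run ;
Data _b1 ;
Set _c ;
If _TYPE_ = 'COV' and _NAME_ = 'x' ;
b1 = y / x ; /* Cov(x,y) / Var(x) */
Run ;
Proc Print Data=_b1 ; Var b1 ; Run ;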
How well does the regression line fit the sample data?
Proposition
$$SST = SSE + SSR \qquad (3)$$
Definition
Let $R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$.
Remarks
1. $0 \leq R^2 \leq 1$
2. The goodness of fit increases with $R^2$.
3. A low $R^2$ does not mean that the model is useless or that the estimates are unreliable, but it casts doubt on the quality of the predictions.
SAS Program
Proc reg data=c ;
/* two separate simple regressions: wage on age, then wage on sup */
model wage = age ;
model wage = sup ;
run ;
1. Unbiasedness
Proposition
Let Assumptions (1) to (4) be verified. Then $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$. The OLS estimators are unbiased estimators of the parameters.
The sampling distribution of the estimators is centered around the true parameters. If we could draw an infinite number of samples of size n from the population, and take the average of the infinite number of OLS estimates, we would obtain the true values of $\beta_0$ and $\beta_1$. BUT it does not mean that the particular OLS estimates obtained using a given sample of size n are equal to the true values of the parameters.
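This statement can be illustrated with a small Monte Carlo sketch (design and values hypothetical): draw many samples, estimate on each, and average the OLS estimates across samples.
SAS Program
/* 1000 hypothetical samples of size 50 from y = 2 + 0.5*x + u */
Data mc ;
Call Streaminit(456) ;
Do rep = 1 To 1000 ;
Do i = 1 To 50 ;
x = Rand('Normal') ;
y = 2 + 0.5*x + Rand('Normal') ;
Output ;
End ;
End ;
Run ;
Proc Reg Data=mc Outest=est Noprint ;
By rep ;
Model y = x ;
Run ;
Proc Means Data=est Mean ;
Var Intercept x ; /* averages should be close to 2 and 0.5 */
Run ;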
2. Precision
The question is now: are our OLS estimates far from the true values of $\beta_0$ and $\beta_1$?
Proposition
Let Assumptions (1) to (5) be verified. Then
$$V(b_0 \mid X) = \frac{\sigma^2\, n^{-1} \sum_{i=1}^{n} X_i^2}{\sum_{i=1}^{n} (X_i - \bar X)^2} \qquad (4)$$
$$V(b_1 \mid X) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar X)^2} \qquad (5)$$
Remarks
1. $V(b_0)$ and $V(b_1)$ increase with $\sigma^2$ (the higher the variance of the error term, the more difficult it is to estimate the parameters with precision).
2. $V(b_0)$ and $V(b_1)$ decrease with $\sum_{i=1}^{n} (X_i - \bar X)^2$.
3. $V(b_0)$ and $V(b_1)$ decrease with n.
4. As $\sigma^2$ is unknown, $V(b_0)$ and $V(b_1)$ are also unknown.
5. BUT $\sigma^2$ can be estimated using the sum of squared residuals $\sum_{i=1}^{n} \hat u_i^2$.
Proposition
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat u_i^2}{n - 2} = \frac{SSR}{n - 2}$$
is an unbiased estimator of $\sigma^2$: $E(\hat\sigma^2) = \sigma^2$.
Replacing $\sigma$ by $\hat\sigma$ in the variance of the estimators gives unbiased estimators of $V(b_0)$ and $V(b_1)$.
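A sketch of this computation (again on the hypothetical dataset sim): the Mean Square Error reported by PROC REG is exactly $\hat\sigma^2 = SSR/(n-2)$, which can be verified from the residuals.
SAS Program
Proc Reg Data=sim Noprint ;
Model y = x ;
Output Out=resid Residual=uhat ; /* store the OLS residuals */
Run ;
Data resid ; Set resid ; uhat2 = uhat*uhat ; Run ;
Proc Means Data=resid Sum N ;
Var uhat2 ; /* sigma-hat^2 = Sum / (N - 2) */
Run ;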
We want to find the best linear unbiased estimator (BLUE) of $\beta_0$ and $\beta_1$.
How is the best estimator defined? It is, among the unbiased linear estimators, the one with the smallest variance.
Theorem (Gauss-Markov)
Under Assumptions (1) to (5), the OLS estimators $b_0$ and $b_1$ are the best linear unbiased estimators of $\beta_0$ and $\beta_1$ respectively.
Example
Suppose the true model is
$$wage = \beta_0 + \beta_1\, age + \beta_2\, educ + u \qquad (6)$$
with $\beta_2 > 0$: ceteris paribus, education has a positive effect on the wage.
We estimate the false model
$$wage = \alpha_0 + \alpha_1\, age + e \qquad (7)$$
The omitted variable is related to the included one:
$$educ = \delta_0 + \delta_1\, age + v \qquad (8)$$
with $\delta_1 < 0$: age has a negative effect on education.
Then $E(\hat\alpha_1) = \beta_1 + \beta_2 \delta_1$. As $\hat\delta_1 < 0$ and $\hat\beta_2 > 0$, the estimator of the age coefficient is logically downward biased.
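A simulation sketch of this bias (all numbers hypothetical): education decreases with age and raises the wage, so omitting it drags the age coefficient below its true value of 0.2.
SAS Program
Data ovb ;
Call Streaminit(789) ;
Do i = 1 To 500 ;
age = 20 + 40*Rand('Uniform') ;
educ = 20 - 0.1*age + Rand('Normal') ; /* delta1 = -0.1 < 0 */
wage = 10 + 0.2*age + 1.5*educ + Rand('Normal') ; /* beta2 = 1.5 > 0 */
Output ;
End ;
Run ;
Proc Reg Data=ovb ;
Model wage = age educ ; /* true model */
Model wage = age ; /* false model: slope biased by beta2*delta1 = -0.15 */
Run ;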
Introduction
Definition
A multiple linear regression model is a regression model where the dependent variable is continuous, explained by several exogenous variables, and linear in the parameters.
Example
$$Y = \beta_0 + \sum_{k=1}^{K} \beta_k X_k + u = X\beta + u \qquad (9)$$
$\beta_0$ is the constant.
Assumptions
Assumption
1. Random sampling: $\{(X_{1i}, X_{2i}, ..., X_{Ki}, Y_i); i = 1, ..., n\}$ is a random sample of size n from the population.
2. Sample variation and no collinearity: the explanatory variables are not linearly related and none is constant (the rank of $X'X$ is $K + 1$).
3. Zero mean: $E(u_i) = 0$
4. Zero conditional mean: $E(u_i \mid X_{1i}, ..., X_{Ki}) = 0$
5. Homoscedasticity and non-autocorrelation: $V(u_i \mid X_{1i}, ..., X_{Ki}) = \sigma^2$ and $corr(u_i, u_j \mid X_{1i}, ..., X_{Ki}, X_{1j}, ..., X_{Kj}) = 0$ for $i \neq j$.
Remarks
I Assumptions (1), (3), (4) and (5) are similar to the simple regression case.
I Assumption (2) ensures that the parameters are identified.
The OLS estimator minimizes the sum of squared errors:
$$S(\beta_0, ..., \beta_K) = \sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} \Big(Y_i - \beta_0 - \sum_{k=1}^{K} \beta_k X_{ki}\Big)^2 = (Y - X\beta)'(Y - X\beta) \qquad (10)$$
First order Conditions
$$-2 \sum_{i=1}^{n} \Big(Y_i - b_0 - \sum_{k=1}^{K} b_k X_{ki}\Big) = 0$$
$$-2 \sum_{i=1}^{n} X_{ki} \Big(Y_i - b_0 - \sum_{k=1}^{K} b_k X_{ki}\Big) = 0, \qquad k = 1, ..., K$$
OR, using matrix notation,
$$-2 \sum_{i=1}^{n} X_i'(Y_i - X_i b) = -2 X'(Y - Xb) = 0 \qquad (11)$$
so that $b = (X'X)^{-1} X'Y$.
Proposition
$$b_k = \frac{\sum_{i=1}^{n} (Y_i - \bar Y)\big((X_{ki} - \hat X_{ki}) - (\bar X_k - \bar{\hat X}_k)\big)}{\sum_{i=1}^{n} \big((X_{ki} - \hat X_{ki}) - (\bar X_k - \bar{\hat X}_k)\big)^2}$$
where $\hat X_{ki}$ is the predicted value of $X_{ki}$ obtained from a regression of $X_{ki}$ on a constant and all the other covariates, and
$$b_0 = \bar Y - \sum_{k=1}^{K} b_k \bar X_k \qquad (12)$$
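This partialling-out result can be verified directly; a sketch with hypothetical variables y, x1, x2 in a dataset d: regress x1 on the other covariate, keep the residual, and regress y on that residual.
SAS Program
Proc Reg Data=d Noprint ;
Model x1 = x2 ;
Output Out=aux Residual=r1 ; /* x1 purged of x2 */
Run ;
Proc Reg Data=aux ;
Model y = r1 ; /* slope = coefficient on x1 in the full regression */
Run ;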
How well does the regression line fit the sample data?
Definition
Let $SST = \sum_{i=1}^{n} (Y_i - \bar Y)^2$ denote the total variation in Y, $SSE = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$ denote the explained sum of squares, and $SSR = \sum_{i=1}^{n} \hat u_i^2$ denote the residual sum of squares.
Then
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{n} \hat u_i^2}{\sum_{i=1}^{n} (Y_i - \bar Y)^2} \qquad (13)$$
SAS Program
Proc reg data=c ;
model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
/* second model: age-class dummies ag1-ag5 replace the quadratic in age */
model wage = sex1 ag1 ag2 ag4 ag5 diplo1 diplo2 diplo3 diplo4
nbenf1 nbenf2 nbenf3 ;
run ;
Proposition
Let Assumptions (1) to (5) be verified. Then
$$V(b_k \mid X_1, ..., X_K) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_{ki} - \bar X_k)^2 (1 - R_k^2)} \qquad (15)$$
where $R_k^2$ is the $R^2$ of the regression of $X_k$ on a constant and all the other covariates.
Remarks
$V(b_k \mid X)$ increases with $\sigma^2$ and $R_k^2$, and decreases with $\sum_{i=1}^{n} (X_{ki} - \bar X_k)^2$.
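The factor $1/(1 - R_k^2)$ is the variance inflation factor, which PROC REG reports with the VIF option; a sketch on the earlier wage regression (illustrative specification):
SAS Program
Proc Reg Data=c ;
Model wage = sex1 age age2 / VIF ; /* prints 1/(1-Rk^2) for each regressor */
Run ;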
Estimating the error variance
Proposition
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat u_i^2}{n - K - 1} = \frac{SSR}{\text{degrees of freedom}}$$
is an unbiased estimator of $\sigma^2$.
Theorem (Gauss-Markov)
Under Assumptions (1) to (5), $b$ is the best linear unbiased estimator (BLUE) of $\beta$.
Asymptotic properties
We now study the properties of the OLS estimator when $n \to +\infty$. First we study the consistency of the OLS estimator. Consistency is stronger than unbiasedness.
Theorem
Under Assumptions (1) to (4), $\text{plim}(b) = \beta$.
Under Assumptions (1) to (5), $\text{plim}(\hat\sigma^2) = \sigma^2$.
Next we study the asymptotic distribution of the OLS estimator:
$$\sqrt{n}(b - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1}) \qquad (17)$$
where $Q = \text{plim}(X'X/n)$.
Assumption
6. Normality: $u_i \mid X_{1i}, ..., X_{Ki} \sim N(0, \sigma^2)$.
Definition
The contribution of individual i to the likelihood is the function $L_i$ defined by: $L_i(Y_i, X_i; \theta) = f(Y_i, X_i; \theta)$.
The likelihood function of the sample is the function $L(Y_i, X_i, i = 1, ..., n; \theta)$ defined as the product of the individual contributions:
$$L(Y_i, X_i, i = 1, ..., n; \theta) = \prod_{i=1}^{n} L_i(Y_i, X_i; \theta)$$
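Under Assumption (6), each contribution is the normal density, so the log-likelihood can be written explicitly (a sketch, with $\theta = (\beta, \sigma^2)$):
$$L_i(Y_i, X_i; \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_i - X_i\beta)^2}{2\sigma^2}\right)$$
$$\log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X_i\beta)^2$$
Maximizing over $\beta$ amounts to minimizing $\sum_{i=1}^{n}(Y_i - X_i\beta)^2$, which explains the theorem below.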
Theorem
Under Assumptions (1) to (6), the Maximum Likelihood estimator of $\beta$ is the OLS estimator.
MOREOVER
Theorem
Under Assumptions (1) to (6), the OLS estimator and the ML estimator are the minimum variance unbiased estimators of $\beta$.
Theorem
Under Assumptions (1) to (6), $b_k$ is normally distributed with mean $\beta_k$ and variance $V(b_k)$:
$$b_k - \beta_k \sim N(0, V(b_k)) \qquad \frac{b_k - \beta_k}{\sqrt{V(b_k)}} \sim N(0, 1)$$
Theorem
Under Assumptions (1) to (6),
$$\frac{b_k - \beta_k}{\sqrt{\hat V(b_k)}} \sim T_{n-K-1}$$
Idea
Assume we want to test the null hypothesis H0. We need:
(i) a test statistic t, i.e. a decision function that takes its values in the set of hypotheses;
(ii) a decision rule that determines when H0 is rejected: choose $\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ is true})$; $\alpha$ is the significance level (usually $\alpha = 5\%$);
(iii) a critical region, i.e. the set of values of the test statistic for which the null hypothesis is rejected: we want to find the critical value c that verifies $\Pr(\text{reject } H_0 \mid H_0 \text{ is true}) = \Pr(|t| > c) = \alpha$; c is the $(1 - \alpha/2)$th percentile of the t distribution with $n - K - 1$ degrees of freedom.
$H_0: \beta_k = 0$
$H_1: \beta_k \neq 0$
If $|t| = \frac{|b_k|}{\sqrt{\hat V(b_k)}} \geq 1.96$, then $H_0$ is rejected at the 5% significance level.
$H_0: \beta_k = 0$
$H_1: \beta_k > 0$
If $t = \frac{b_k}{\sqrt{\hat V(b_k)}} \geq 1.645$, then $H_0$ is rejected at the 5% significance level.
$H_0: \beta_k = a$
$H_1: \beta_k \neq a$
(i) Under the null hypothesis, $t = \frac{b_k - a}{\sqrt{\hat V(b_k)}} \sim T_{n-K-1}$.
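In SAS, such a restriction can be tested with the TEST statement of PROC REG; a sketch with a hypothetical value a = 0.1 for the age coefficient:
SAS Program
Proc Reg Data=c ;
Model wage = age sup ;
Test age = 0.1 ; /* tests H0: beta_age = 0.1 */
Run ;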
$H_0: \beta_k = \beta_j$
$H_1: \beta_k \neq \beta_j$
The null hypothesis means that $X_k$ and $X_j$ have the same effect on Y.
(i) Under the null hypothesis,
$$t = \frac{b_k - b_j}{\sqrt{\hat V(b_k) + \hat V(b_j) - 2\,\widehat{Cov}(b_k, b_j)}}$$
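With PROC REG, the same hypothesis can be tested via an equivalent F test; a sketch for two of the diploma dummies (illustrative specification):
SAS Program
Proc Reg Data=c ;
Model wage = sex1 age diplo1 diplo2 ;
Test diplo1 - diplo2 = 0 ; /* H0: the two coefficients are equal */
Run ;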
$$F = \frac{(R^2 - R_c^2)/q}{(1 - R^2)/(n - K - 1)}$$
where $R_c^2$ is the $R^2$ of the constrained regression and q the number of restrictions. If $F \geq c$, then $H_0$ is rejected, where c is the critical value of the $F(q, n - K - 1)$ distribution.
For the joint nullity of all K slope coefficients, the statistic reduces to
$$F = \frac{R^2/K}{(1 - R^2)/(n - K - 1)}$$
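Joint restrictions are tested the same way; a sketch with q = 2 hypothetical restrictions on the children dummies:
SAS Program
Proc Reg Data=c ;
Model wage = sex1 age nbenf1 nbenf2 ;
Test nbenf1 = 0, nbenf2 = 0 ; /* joint F test, q = 2 */
Run ;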
Heteroscedasticity
Assumption (5) (homoscedasticity) is restrictive. Assumption (5) becomes:
$V(u_i \mid X_{1i}, ..., X_{Ki}) = \sigma_i^2$ and
$corr(u_i, u_j \mid X_{1i}, ..., X_{Ki}, X_{1j}, ..., X_{Kj}) = 0$
IDEA
Since $E(u_i^2 \mid X_{1i}, ..., X_{Ki}) = \sigma_i^2$, write $u_i^2 = \sigma_i^2 + e_i$ and specify
$$u_i^2 = \delta_0 + \sum_{k=1}^{K} \delta_k X_{ki} + e_i$$
I Breusch-Pagan test:
$H_0: \delta_1 = \delta_2 = ... = \delta_K = 0$
$H_1: \exists k$ such that $\delta_k \neq 0$
I White test (the auxiliary regression also includes the squares and cross-products $X_{ki} X_{ji}$):
$H_0: \delta_k = \delta_{kj} = 0 \; \forall k, j$
$H_1: \exists k, j$ such that $\delta_k \neq 0$ or $\delta_{kj} \neq 0$
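A sketch of both procedures in SAS: PROC REG's SPEC option implements a White-type specification test, and the Breusch-Pagan auxiliary regression can be run by hand on the squared residuals (variable names reuse the earlier wage example; the specification is illustrative).
SAS Program
Proc Reg Data=c ;
Model wage = sex1 age age2 / SPEC ; /* White test of H0: homoscedasticity */
Output Out=res Residual=uhat ;
Run ;
Data res ; Set res ; uhat2 = uhat*uhat ; Run ;
Proc Reg Data=res ;
Model uhat2 = sex1 age age2 ; /* Breusch-Pagan auxiliary regression */
Run ;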
Assumption: $\sigma_i^2 = g(\delta_0 + \sum_{k=1}^{K} \delta_k X_{ki})$ where $g(.)$ is known.
IDEA
Apply a transformation to the initial model $Y_i = X_i \beta + u_i$ that makes the error terms of the transformed model homoscedastic. Dividing by $\sigma_i = \sqrt{g(\delta_0 + \sum_{k=1}^{K} \delta_k X_{ki})}$:
$$\frac{Y_i}{\sigma_i} = \frac{\beta_0}{\sigma_i} + \sum_{k=1}^{K} \beta_k \frac{X_{ki}}{\sigma_i} + \frac{u_i}{\sigma_i}$$
Remarks
I $\frac{u_i}{\sigma_i}$ is homoscedastic: $V(u_i/\sigma_i \mid X_{1i}, ..., X_{Ki}) = \sigma_i^2/\sigma_i^2 = 1$.
Example
SAS Program
Proc Reg Data=outc ;
/* Step 1: regress the squared OLS residuals on the covariates (g(x) = x) */
Model UHat2 = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
Output OUT= MyData PREDICTED = Sig2hat ;
Run ;
/* Step 2: create weights ; PROC REG's WEIGHT statement minimizes the
weighted sum of squared residuals, so the weight is the inverse
of the estimated variance */
Data MyData ; Set MyData ; OmegaInv = 1/Sig2hat ;
Run ;
Proc Reg Data=MyData ;
Model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
Weight OmegaInv ;
Run ;
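For the case $g(x) = \exp(x)$ the fitted variances stay positive by construction; a sketch of the corresponding first step (hypothetical dataset and variable names, same covariates):
SAS Program
Data outc2 ; Set outc ; loguhat2 = Log(UHat2) ; Run ;
Proc Reg Data=outc2 Noprint ;
Model loguhat2 = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
Output Out=MyData2 Predicted=lsig2 ;
Run ;
/* weight = inverse of exp(predicted log variance) */
Data MyData2 ; Set MyData2 ; WgtExp = 1/Exp(lsig2) ; Run ;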
Figure: Estimation of $\hat u^2$ when $g(x) = x$
Figure: Estimation of $\hat u^2$ when $g(x) = \exp(x)$