Lectures 1 and 2
Francis Kramarz and Michael Visser
MASTER 1 EPP
2012
Introduction
Definition
A simple linear regression model is a regression model where the
dependent variable is continuous, explained by a single exogenous
variable, and linear in the parameters.
Theoretical model: $Y = \beta_0 + \beta_1 X + u$. The model is linear in the parameters $\beta_0$ and $\beta_1$. The single explanatory variable $X$ can be continuous or discrete.
The error term $u$ captures all relevant variables not included in the model because they are not observed in the data set (for example ability, dynamism, ...).
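To fix ideas, here is a minimal simulation sketch (dataset name and parameter values are hypothetical): data are generated from a known linear model and the two parameters are then estimated by OLS with PROC REG.
SAS Program
/* Hypothetical simulation: Y = 2 + 0.5*X + u with u ~ N(0,1) */
Data sim ;
Call Streaminit(123) ;
Do i = 1 To 200 ;
x = Rand('Normal') ;
u = Rand('Normal') ; /* the unobserved error term */
y = 2 + 0.5*x + u ;
Output ;
End ;
Run ;
Proc Reg Data=sim ;
Model y = x ; /* estimates should be close to 2 and 0.5 */
Run ;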
The OLS estimators minimize the sum of squared residuals $S(b_0, b_1) = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$. First order conditions:
$$\frac{\partial S(b_0, b_1)}{\partial b_0} = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \qquad (1)$$
$$\frac{\partial S(b_0, b_1)}{\partial b_1} = -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0 \qquad (2)$$
Proposition
$$b_0 = \bar Y - b_1 \bar X \qquad \text{and} \qquad b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$
Proof
Using equation (1), $b_0 = \bar Y - b_1 \bar X$. Substituting this expression, equation (2) turns into $b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$.
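As a quick check of the closed form, one can compute $b_1 = \widehat{Cov}(X,Y)/\widehat{V}(X)$ by hand; a sketch using the hypothetical simulated dataset sim from above:
SAS Program
/* PROC CORR with the COV option writes the covariance matrix to _c ;
the row with _TYPE_='COV' and _NAME_='x' holds Var(x) and Cov(x,y) */
Proc Corr Data=sim Cov Outp=_c Noprint ;
Var x y ;
Run ;
Data _b1 ;
Set _c ;
If _TYPE_ = 'COV' and _NAME_ = 'x' ;
b1 = y / x ; /* Cov(x,y) / Var(x) */
Run ;
Proc Print Data=_b1 ; Var b1 ; Run ;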
How well does the regression line fit the sample data?
Proposition
$$SST = SSE + SSR \qquad (3)$$
Definition
Let $R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$.
Remarks
1. $0 \leq R^2 \leq 1$
2. The goodness of fit increases with $R^2$.
3. A low $R^2$ does not mean that the model is useless or that the estimates are unreliable, but it casts doubt on the quality of the predictions.
SAS Program
Proc reg data=c ;
/* two separate simple regressions: wage on age, then wage on sup */
model wage = age ;
model wage = sup ;
run ;
1. Unbiasedness
Proposition
Let Assumptions (1) to (4) be verified. Then $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$. The OLS estimators are unbiased estimators of the parameters.
The sampling distribution of the estimators is centered around the true parameters. If we could draw an infinite number of samples of size n from the population, and take the average of the infinite number of OLS estimates, we would obtain the true values of $\beta_0$ and $\beta_1$. BUT it does not mean that the particular OLS estimates obtained using a given sample of size n are equal to the true values of the parameters.
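This statement can be illustrated with a small Monte Carlo sketch (design and values hypothetical): draw many samples, estimate on each, and average the OLS estimates across samples.
SAS Program
/* 1000 hypothetical samples of size 50 from y = 2 + 0.5*x + u */
Data mc ;
Call Streaminit(456) ;
Do rep = 1 To 1000 ;
Do i = 1 To 50 ;
x = Rand('Normal') ;
y = 2 + 0.5*x + Rand('Normal') ;
Output ;
End ;
End ;
Run ;
Proc Reg Data=mc Outest=est Noprint ;
By rep ;
Model y = x ;
Run ;
Proc Means Data=est Mean ;
Var Intercept x ; /* averages should be close to 2 and 0.5 */
Run ;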
2. Precision
The question is now: are our OLS estimates far from the true values of $\beta_0$ and $\beta_1$?
Proposition
Let Assumptions (1) to (5) be verified. Then
$$V(b_0 \mid X) = \frac{\sigma^2\, n^{-1} \sum_{i=1}^{n} X_i^2}{\sum_{i=1}^{n} (X_i - \bar X)^2} \qquad (4)$$
$$V(b_1 \mid X) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar X)^2} \qquad (5)$$
Remarks
1. $V(b_0)$ and $V(b_1)$ increase with $\sigma^2$ (the higher the variance of the error term, the more difficult it is to estimate the parameters with precision).
2. $V(b_0)$ and $V(b_1)$ decrease with $\sum_{i=1}^{n} (X_i - \bar X)^2$.
3. $V(b_0)$ and $V(b_1)$ decrease with n.
4. As $\sigma^2$ is unknown, $V(b_0)$ and $V(b_1)$ are also unknown.
5. BUT $\sigma^2$ can be estimated using the sum of squared residuals $\sum_{i=1}^{n} \hat u_i^2$.
Proposition
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat u_i^2}{n - 2} = \frac{SSR}{n - 2}$$
is an unbiased estimator of $\sigma^2$: $E(\hat\sigma^2) = \sigma^2$.
Replacing $\sigma$ by $\hat\sigma$ in the variance of the estimators gives unbiased estimators of $V(b_0)$ and $V(b_1)$.
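A sketch of this computation (again on the hypothetical dataset sim): the Mean Square Error reported by PROC REG is exactly $\hat\sigma^2 = SSR/(n-2)$, which can be verified from the residuals.
SAS Program
Proc Reg Data=sim Noprint ;
Model y = x ;
Output Out=resid Residual=uhat ; /* store the OLS residuals */
Run ;
Data resid ; Set resid ; uhat2 = uhat*uhat ; Run ;
Proc Means Data=resid Sum N ;
Var uhat2 ; /* sigma-hat^2 = Sum / (N - 2) */
Run ;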
We want to find the best linear unbiased estimator (BLUE) of $\beta_0$ and $\beta_1$.
How is the best estimator defined? It is, among the unbiased linear estimators, the one with the smallest variance.
Theorem (Gauss-Markov)
Under Assumptions (1) to (5), the OLS estimators $b_0$ and $b_1$ are the best linear unbiased estimators of $\beta_0$ and $\beta_1$ respectively.
Example
Suppose the true model is
$$wage = \beta_0 + \beta_1\, age + \beta_2\, educ + u \qquad (6)$$
with $\beta_2 > 0$: ceteris paribus, education has a positive effect on the wage.
We estimate the false model
$$wage = \alpha_0 + \alpha_1\, age + e \qquad (7)$$
The omitted variable is related to the included one:
$$educ = \delta_0 + \delta_1\, age + v \qquad (8)$$
with $\delta_1 < 0$: age has a negative effect on education.
Then $E(\hat\alpha_1) = \beta_1 + \beta_2 \delta_1$. As $\hat\delta_1 < 0$ and $\hat\beta_2 > 0$, the estimator of the age coefficient is logically downward biased.
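A simulation sketch of this bias (all numbers hypothetical): education decreases with age and raises the wage, so omitting it drags the age coefficient below its true value of 0.2.
SAS Program
Data ovb ;
Call Streaminit(789) ;
Do i = 1 To 500 ;
age = 20 + 40*Rand('Uniform') ;
educ = 20 - 0.1*age + Rand('Normal') ; /* delta1 = -0.1 < 0 */
wage = 10 + 0.2*age + 1.5*educ + Rand('Normal') ; /* beta2 = 1.5 > 0 */
Output ;
End ;
Run ;
Proc Reg Data=ovb ;
Model wage = age educ ; /* true model */
Model wage = age ; /* false model: slope biased by beta2*delta1 = -0.15 */
Run ;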
Introduction
Definition
A multiple linear regression model is a regression model where the dependent variable is continuous, explained by several exogenous variables, and linear in the parameters.
Example
$$Y = \beta_0 + \sum_{k=1}^{K} \beta_k X_k + u = X\beta + u \qquad (9)$$
$\beta_0$ is the constant.
Assumptions
Assumption
1. Random sampling: $\{(X_{1i}, X_{2i}, ..., X_{Ki}, Y_i); i = 1, ..., n\}$ is a random sample of size n from the population.
2. Sample variation and no collinearity: the explanatory variables are not linearly related and none is constant (the rank of $X'X$ is $K + 1$).
3. Zero mean: $E(u_i) = 0$
4. Zero conditional mean: $E(u_i \mid X_{1i}, ..., X_{Ki}) = 0$
5. Homoscedasticity and non-autocorrelation: $V(u_i \mid X_{1i}, ..., X_{Ki}) = \sigma^2$ and $corr(u_i, u_j \mid X_{1i}, ..., X_{Ki}, X_{1j}, ..., X_{Kj}) = 0$ for $i \neq j$.
Remarks
I Assumptions (1), (3), (4) and (5) are similar to the simple regression case.
I Assumption (2) ensures that the parameters are identified.
The OLS estimator minimizes the sum of squared errors:
$$S(\beta_0, ..., \beta_K) = \sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} \Big(Y_i - \beta_0 - \sum_{k=1}^{K} \beta_k X_{ki}\Big)^2 = (Y - X\beta)'(Y - X\beta) \qquad (10)$$
First order Conditions
$$-2 \sum_{i=1}^{n} \Big(Y_i - b_0 - \sum_{k=1}^{K} b_k X_{ki}\Big) = 0$$
$$-2 \sum_{i=1}^{n} X_{ki} \Big(Y_i - b_0 - \sum_{k=1}^{K} b_k X_{ki}\Big) = 0, \qquad k = 1, ..., K$$
OR, using matrix notation,
$$-2 \sum_{i=1}^{n} X_i'(Y_i - X_i b) = -2 X'(Y - Xb) = 0 \qquad (11)$$
so that $b = (X'X)^{-1} X'Y$.
Proposition
$$b_k = \frac{\sum_{i=1}^{n} (Y_i - \bar Y)\big((X_{ki} - \hat X_{ki}) - (\bar X_k - \bar{\hat X}_k)\big)}{\sum_{i=1}^{n} \big((X_{ki} - \hat X_{ki}) - (\bar X_k - \bar{\hat X}_k)\big)^2}$$
where $\hat X_{ki}$ is the predicted value of $X_{ki}$ obtained from a regression of $X_{ki}$ on a constant and all the other covariates, and
$$b_0 = \bar Y - \sum_{k=1}^{K} b_k \bar X_k \qquad (12)$$
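This partialling-out result can be verified directly; a sketch with hypothetical variables y, x1, x2 in a dataset d: regress x1 on the other covariate, keep the residual, and regress y on that residual.
SAS Program
Proc Reg Data=d Noprint ;
Model x1 = x2 ;
Output Out=aux Residual=r1 ; /* x1 purged of x2 */
Run ;
Proc Reg Data=aux ;
Model y = r1 ; /* slope = coefficient on x1 in the full regression */
Run ;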
How well does the regression line fit the sample data?
Definition
Let $SST = \sum_{i=1}^{n} (Y_i - \bar Y)^2$ denote the total variation in Y, $SSE = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$ denote the explained sum of squares, and $SSR = \sum_{i=1}^{n} \hat u_i^2$ denote the residual sum of squares.
Then
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{n} \hat u_i^2}{\sum_{i=1}^{n} (Y_i - \bar Y)^2} \qquad (13)$$
SAS Program
Proc reg data=c ;
model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
/* second model: age-class dummies ag1-ag5 replace the quadratic in age */
model wage = sex1 ag1 ag2 ag4 ag5 diplo1 diplo2 diplo3 diplo4
nbenf1 nbenf2 nbenf3 ;
run ;
Proposition
Let Assumptions (1) to (5) be verified. Then
$$V(b_k \mid X_1, ..., X_K) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_{ki} - \bar X_k)^2 (1 - R_k^2)} \qquad (15)$$
where $R_k^2$ is the $R^2$ of the regression of $X_k$ on a constant and all the other covariates.
Remarks
$V(b_k \mid X)$ increases with $\sigma^2$ and $R_k^2$, and decreases with $\sum_{i=1}^{n} (X_{ki} - \bar X_k)^2$.
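The factor $1/(1 - R_k^2)$ is the variance inflation factor, which PROC REG reports with the VIF option; a sketch on the earlier wage regression (illustrative specification):
SAS Program
Proc Reg Data=c ;
Model wage = sex1 age age2 / VIF ; /* prints 1/(1-Rk^2) for each regressor */
Run ;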
Estimating the error variance
Proposition
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat u_i^2}{n - K - 1} = \frac{SSR}{\text{degrees of freedom}}$$
is an unbiased estimator of $\sigma^2$.
Theorem (Gauss-Markov)
Under Assumptions (1) to (5), $b$ is the best linear unbiased estimator (BLUE) of $\beta$.
Asymptotic properties
We now study the properties of the OLS estimator when $n \to +\infty$. First we study the consistency of the OLS estimator. Consistency is stronger than unbiasedness.
Theorem
Under Assumptions (1) to (4), $\text{plim}(b) = \beta$.
Under Assumptions (1) to (5), $\text{plim}(\hat\sigma^2) = \sigma^2$.
Next we study the asymptotic distribution of the OLS estimator:
$$\sqrt{n}(b - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1}) \qquad (17)$$
where $Q = \text{plim}(X'X/n)$.
Assumption
6. Normality: $u_i \mid X_{1i}, ..., X_{Ki} \sim N(0, \sigma^2)$.
Definition
The contribution of individual i to the likelihood is the function $L_i$ defined by: $L_i(Y_i, X_i; \theta) = f(Y_i, X_i; \theta)$.
The likelihood function of the sample is the function $L(Y_i, X_i, i = 1, ..., n; \theta)$ defined as the product of the individual contributions:
$$L(Y_i, X_i, i = 1, ..., n; \theta) = \prod_{i=1}^{n} L_i(Y_i, X_i; \theta)$$
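Under Assumption (6), each contribution is the normal density, so the log-likelihood can be written explicitly (a sketch, with $\theta = (\beta, \sigma^2)$):
$$L_i(Y_i, X_i; \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_i - X_i\beta)^2}{2\sigma^2}\right)$$
$$\log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - X_i\beta)^2$$
Maximizing over $\beta$ amounts to minimizing $\sum_{i=1}^{n}(Y_i - X_i\beta)^2$, which explains the theorem below.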
Theorem
Under Assumptions (1) to (6), the Maximum Likelihood estimator of $\beta$ is the OLS estimator.
MOREOVER
Theorem
Under Assumptions (1) to (6), the OLS estimator and the ML estimator are the minimum variance unbiased estimators of $\beta$.
Theorem
Under Assumptions (1) to (6), $b_k$ is normally distributed with mean $\beta_k$ and variance $V(b_k)$:
$$b_k - \beta_k \sim N(0, V(b_k)) \qquad \frac{b_k - \beta_k}{\sqrt{V(b_k)}} \sim N(0, 1)$$
Theorem
Under Assumptions (1) to (6),
$$\frac{b_k - \beta_k}{\sqrt{\hat V(b_k)}} \sim T_{n-K-1}$$
Idea
Assume we want to test the null hypothesis H0. We need:
(i) a test statistic t, i.e. a decision function that takes its values in the set of hypotheses;
(ii) a decision rule that determines when H0 is rejected: choose $\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ is true})$; $\alpha$ is the significance level (usually $\alpha = 5\%$);
(iii) a critical region, i.e. the set of values of the test statistic for which the null hypothesis is rejected: we want to find the critical value c that verifies $\Pr(\text{reject } H_0 \mid H_0 \text{ is true}) = \Pr(|t| > c) = \alpha$; c is the $(1 - \alpha/2)$th percentile of the t distribution with $n - K - 1$ degrees of freedom.
$H_0: \beta_k = 0$
$H_1: \beta_k \neq 0$
If $|t| = \frac{|b_k|}{\sqrt{\hat V(b_k)}} \geq 1.96$, then $H_0$ is rejected at the 5% significance level.
$H_0: \beta_k = 0$
$H_1: \beta_k > 0$
If $t = \frac{b_k}{\sqrt{\hat V(b_k)}} \geq 1.645$, then $H_0$ is rejected at the 5% significance level.
$H_0: \beta_k = a$
$H_1: \beta_k \neq a$
(i) Under the null hypothesis, $t = \frac{b_k - a}{\sqrt{\hat V(b_k)}} \sim T_{n-K-1}$.
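In SAS, such a restriction can be tested with the TEST statement of PROC REG; a sketch with a hypothetical value a = 0.1 for the age coefficient:
SAS Program
Proc Reg Data=c ;
Model wage = age sup ;
Test age = 0.1 ; /* tests H0: beta_age = 0.1 */
Run ;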
$H_0: \beta_k = \beta_j$
$H_1: \beta_k \neq \beta_j$
The null hypothesis means that $X_k$ and $X_j$ have the same effect on Y.
(i) Under the null hypothesis,
$$t = \frac{b_k - b_j}{\sqrt{\hat V(b_k) + \hat V(b_j) - 2\,\widehat{Cov}(b_k, b_j)}}$$
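With PROC REG, the same hypothesis can be tested via an equivalent F test; a sketch for two of the diploma dummies (illustrative specification):
SAS Program
Proc Reg Data=c ;
Model wage = sex1 age diplo1 diplo2 ;
Test diplo1 - diplo2 = 0 ; /* H0: the two coefficients are equal */
Run ;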
$$F = \frac{(R^2 - R_c^2)/q}{(1 - R^2)/(n - K - 1)}$$
where $R_c^2$ is the $R^2$ of the constrained regression and q the number of restrictions. If $F \geq c$, then $H_0$ is rejected, where c is the critical value of the $F(q, n - K - 1)$ distribution.
For the joint nullity of all K slope coefficients, the statistic reduces to
$$F = \frac{R^2/K}{(1 - R^2)/(n - K - 1)}$$
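Joint restrictions are tested the same way; a sketch with q = 2 hypothetical restrictions on the children dummies:
SAS Program
Proc Reg Data=c ;
Model wage = sex1 age nbenf1 nbenf2 ;
Test nbenf1 = 0, nbenf2 = 0 ; /* joint F test, q = 2 */
Run ;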
Heteroscedasticity
Assumption (5) (homoscedasticity) is restrictive. Assumption (5) becomes:
$V(u_i \mid X_{1i}, ..., X_{Ki}) = \sigma_i^2$ and
$corr(u_i, u_j \mid X_{1i}, ..., X_{Ki}, X_{1j}, ..., X_{Kj}) = 0$
IDEA
Since $E(u_i^2 \mid X_{1i}, ..., X_{Ki}) = \sigma_i^2$, write $u_i^2 = \sigma_i^2 + e_i$ and specify
$$u_i^2 = \delta_0 + \sum_{k=1}^{K} \delta_k X_{ki} + e_i$$
I Breusch-Pagan test:
$H_0: \delta_1 = \delta_2 = ... = \delta_K = 0$
$H_1: \exists k$ such that $\delta_k \neq 0$
I White test (the auxiliary regression also includes the squares and cross-products $X_{ki} X_{ji}$):
$H_0: \delta_k = \delta_{kj} = 0 \; \forall k, j$
$H_1: \exists k, j$ such that $\delta_k \neq 0$ or $\delta_{kj} \neq 0$
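A sketch of both procedures in SAS: PROC REG's SPEC option implements a White-type specification test, and the Breusch-Pagan auxiliary regression can be run by hand on the squared residuals (variable names reuse the earlier wage example; the specification is illustrative).
SAS Program
Proc Reg Data=c ;
Model wage = sex1 age age2 / SPEC ; /* White test of H0: homoscedasticity */
Output Out=res Residual=uhat ;
Run ;
Data res ; Set res ; uhat2 = uhat*uhat ; Run ;
Proc Reg Data=res ;
Model uhat2 = sex1 age age2 ; /* Breusch-Pagan auxiliary regression */
Run ;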
Assumption: $\sigma_i^2 = g(\delta_0 + \sum_{k=1}^{K} \delta_k X_{ki})$ where $g(.)$ is known.
IDEA
Apply a transformation to the initial model $Y_i = X_i \beta + u_i$ that makes the error terms of the transformed model homoscedastic. Dividing by $\sigma_i = \sqrt{g(\delta_0 + \sum_{k=1}^{K} \delta_k X_{ki})}$:
$$\frac{Y_i}{\sigma_i} = \frac{\beta_0}{\sigma_i} + \sum_{k=1}^{K} \beta_k \frac{X_{ki}}{\sigma_i} + \frac{u_i}{\sigma_i}$$
Remarks
I $\frac{u_i}{\sigma_i}$ is homoscedastic: $V(u_i/\sigma_i \mid X_{1i}, ..., X_{Ki}) = \sigma_i^2/\sigma_i^2 = 1$.
Example
SAS Program
Proc Reg Data=outc ;
/* Step 1: regress the squared OLS residuals on the covariates (g(x) = x) */
Model UHat2 = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
Output OUT= MyData PREDICTED = Sig2hat ;
Run ;
/* Step 2: create weights ; PROC REG's WEIGHT statement minimizes the
weighted sum of squared residuals, so the weight is the inverse
of the estimated variance */
Data MyData ; Set MyData ; OmegaInv = 1/Sig2hat ;
Run ;
Proc Reg Data=MyData ;
Model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
Weight OmegaInv ;
Run ;
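For the case $g(x) = \exp(x)$ the fitted variances stay positive by construction; a sketch of the corresponding first step (hypothetical dataset and variable names, same covariates):
SAS Program
Data outc2 ; Set outc ; loguhat2 = Log(UHat2) ; Run ;
Proc Reg Data=outc2 Noprint ;
Model loguhat2 = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
Output Out=MyData2 Predicted=lsig2 ;
Run ;
/* weight = inverse of exp(predicted log variance) */
Data MyData2 ; Set MyData2 ; WgtExp = 1/Exp(lsig2) ; Run ;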
Figure: Estimation of $\hat u^2$ when $g(x) = x$
Figure: Estimation of $\hat u^2$ when $g(x) = \exp(x)$