
THE LINEAR REGRESSION MODEL

Lectures 1 and 2
Francis Kramarz and Michael Visser
MASTER 1 EPP

2012

THE SIMPLE LINEAR REGRESSION MODEL

Introduction
Definition
A simple linear regression model is a regression model where the
dependent variable is continuous, explained by a single exogenous
variable, and linear in the parameters.
Theoretical model: Y = β0 + β1 X. The model is linear in the
parameters β0 and β1. The single explanatory variable can be
continuous or discrete.

Assumption (1) (Random sampling)


{(Xi, Yi); i = 1, ..., n} is a random sample of size n from the
population.
The sample is randomly drawn from the population of interest. For
each individual in the sample, we observe Xi and Yi, and we want to
estimate the simple linear model Yi = β0 + β1 Xi + ui.

The error term captures all relevant variables not included in the
model because they are not observed in the data set (for example
ability, dynamism ...)

Assumption (2) (Sample variation)


There exist j and k such that Xj ≠ Xk.
If all the Xi in the sample take the same value, the slope parameter
β1 cannot be identified.

Assumption (3) (zero mean)


E(ui) = 0
The average value of the error term is 0. This is not a restrictive
assumption. If E(u) = μ, then we can rewrite the model as
follows: Y = (β0 + μ) + β1 X + e, where e = u − μ.

Assumption (4) (zero conditional mean)


E(ui | Xi) = 0
This is a crucial and strong assumption: it requires the unobserved
factors to be mean-independent of Xi (exogeneity of the explanatory variable).

The Ordinary Least Squares estimator


Definition
The OLS estimators β̂0 and β̂1 minimize the sum of squared residuals

S(β0, β1) = Σ_{i=1}^n ui² = Σ_{i=1}^n (Yi − β0 − β1 Xi)²

First order conditions

∂S(β̂0, β̂1)/∂β0 = −2 Σ_{i=1}^n (Yi − β̂0 − β̂1 Xi) = 0    (1)

∂S(β̂0, β̂1)/∂β1 = −2 Σ_{i=1}^n Xi (Yi − β̂0 − β̂1 Xi) = 0    (2)

Proposition

β̂0 = Ȳ − β̂1 X̄   and   β̂1 = Σ_{i=1}^n (Yi − Ȳ)(Xi − X̄) / Σ_{i=1}^n (Xi − X̄)²

Proof
Equation (1) gives β̂0 = Ȳ − β̂1 X̄. Substituting into equation (2), equation (2) turns into

β̂1 = Σ_{i=1}^n (Yi − Ȳ) Xi / Σ_{i=1}^n (Xi − X̄) Xi.

Moreover Σ_{i=1}^n Xi (Xi − X̄) = Σ_{i=1}^n (Xi − X̄)² and Σ_{i=1}^n Xi (Yi − Ȳ) = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ), which gives the
result.
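To make these formulas concrete, here is a minimal SAS/IML sketch that computes β̂1 and β̂0 directly from the closed-form expressions above; the data are made-up toy values, not the course data set.

proc iml ;
/* toy data (illustrative values only) */
X = {25, 32, 41, 47, 55, 38} ;                    /* explanatory variable */
Y = {1100, 1350, 1500, 1620, 1700, 1450} ;        /* dependent variable   */
Xbar = X[:] ;  Ybar = Y[:] ;                      /* sample means         */
b1 = sum( (Y - Ybar)#(X - Xbar) ) / sum( (X - Xbar)##2 ) ;  /* slope      */
b0 = Ybar - b1 * Xbar ;                           /* intercept            */
print b0 b1 ;
quit ;

The same estimates would be obtained with PROC REG (model Y = X), which is how the examples below are computed.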

Remarks
- β̂0 is the constant (or intercept): Ŷi = β̂0 if Xi = 0
- β̂1 is the slope estimate and measures the effect of Xi on Yi

Definition
- Ŷi = β̂0 + β̂1 Xi is the predicted value of Y for individual i
- ûi = Yi − Ŷi is the residual for individual i. The OLS estimates minimize the sum of squared residuals.
- SST = Σ_{i=1}^n (Yi − Ȳ)² is the total variation in Y (total sum of squares)
- SSE = Σ_{i=1}^n (Ŷi − Ȳ)² is the explained sum of squares
- SSR = Σ_{i=1}^n ûi² is the residual sum of squares

How well does the regression line fit the sample data?
Proposition
SST = SSE + SSR

(3)

This suggests a fitting criterion

Definition
Let R² = SSE/SST = 1 − SSR/SST.

R² measures the proportion of the variation in Y explained by the variation in X.

Remarks
1. 0 ≤ R² ≤ 1
2. The goodness of fit increases with R²
3. A low R² does not mean that the model is useless or that the
estimates are unreliable, but it does cast doubt on the quality of the
predictions.
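Continuing the toy SAS/IML sketch shown earlier (same made-up data), the decomposition (3) and the R² can be checked directly:

proc iml ;
X = {25, 32, 41, 47, 55, 38} ;
Y = {1100, 1350, 1500, 1620, 1700, 1450} ;
b1 = sum( (Y - Y[:])#(X - X[:]) ) / sum( (X - X[:])##2 ) ;
b0 = Y[:] - b1 * X[:] ;
Yhat = b0 + b1 # X ;                   /* predicted values          */
u    = Y - Yhat ;                      /* residuals                  */
SST  = sum( (Y - Y[:])##2 ) ;          /* total sum of squares       */
SSE  = sum( (Yhat - Y[:])##2 ) ;       /* explained sum of squares   */
SSR  = sum( u##2 ) ;                   /* residual sum of squares    */
R2   = SSE / SST ;                     /* equals 1 - SSR/SST         */
print SST SSE SSR R2 ;
quit ;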

Example: Determinants of the monthly wage
- Data set: French sample of the European Community Household Panel
- Dependent variable: wage = monthly wage
- Explanatory variable:
  1. age (continuous variable)
  2. sup = college graduate (discrete variable 0/1)
- The sample is restricted to individuals aged 20-60 and employed in 2000 (n = 5010)
- Descriptive statistics:
  - mean wage = 1432.3 euros
  - mean age = 38.87 years
  - college graduates = 30.64%

SAS Program
Proc reg data=c ;
model wage = age ;
model wage = sup ;
run ;

Figure: Effect of age on wage

Figure: Effect of education on wage

Finite-sample properties of the OLS estimator

- Other estimation methods exist and could be used (e.g. maximum likelihood, ...). How to choose?
- By comparing their properties in terms of
  1. unbiasedness
  2. precision (minimization of the variance)

1. Unbiasedness

Proposition
Let Assumptions (1) to (4) be verified. Then E(β̂0) = β0 and
E(β̂1) = β1. The OLS estimators are unbiased estimators of the
parameters.
The sampling distribution of the estimators is centered around the
true parameter values. If we could draw an infinite number of samples of
size n from the population and take the average of the resulting
OLS estimates, we would obtain the true values of β0
and β1. BUT this does not mean that the particular OLS estimates
obtained from a given sample of size n are equal to the true values
of the parameters.

2. Precision
The question is now: are our OLS estimates far from the true
values of β0 and β1?

Assumption (5) (Homoscedasticity and non-autocorrelation)

V(ui | Xi) = σ² and corr(ui, uj | Xi, Xj) = 0 for i ≠ j

Remarks
1. σ is the standard deviation of the error term
2. σ is unknown, since u represents all the unobserved explanatory variables.
3. Under assumption (5), V(Yi | Xi) = σ²
4. Thus the variance of Y, given X, does not vary with X (a strong assumption)
5. Under assumptions (4) and (5), E(ui² | Xi) = σ²

Proposition
Let assumptions (1) to (5) be verified. Then

V(β̂0 | X) = σ² Σ_{i=1}^n Xi² / [ n Σ_{i=1}^n (Xi − X̄)² ]    (4)

V(β̂1 | X) = σ² / Σ_{i=1}^n (Xi − X̄)²    (5)

Remarks
1. V(β̂0) and V(β̂1) increase with σ² (the higher the variance of
the error term, the more difficult it is to estimate the
parameters with precision).
2. V(β̂0) and V(β̂1) decrease with Σ_{i=1}^n (Xi − X̄)²
3. V(β̂0) and V(β̂1) decrease with n
4. As σ is unknown, V(β̂0) and V(β̂1) are also unknown
5. BUT σ² can be estimated using the sum of squared residuals Σ_{i=1}^n ûi²

Proposition

σ̂² = Σ_{i=1}^n ûi² / (n − 2) = SSR / (n − 2)

is an unbiased estimator of σ²: E(σ̂²) = σ².

Replacing σ by σ̂ in the variances of the estimators gives unbiased
estimators of V(β̂0) and V(β̂1).
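For example, in the PROC REG output shown in the figures above (assuming the default output layout), the quantity labelled "Root MSE" is exactly σ̂ = sqrt(SSR/(n − 2)) for these simple regressions, and the "Standard Error" column of the parameter estimates table reports sqrt(V̂(β̂0)) and sqrt(V̂(β̂1)), obtained by plugging σ̂² into formulas (4) and (5).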
We want to find the best linear unbiased estimator (BLUE) of β0 and β1.
How is the best estimator defined? It is, among the linear unbiased
estimators, the one with the smallest variance.

Theorem (Gauss-Markov)
Under assumptions (1) to (5), the OLS estimators β̂0 and β̂1 are
the best linear unbiased estimators of β0 and β1, respectively.

The effect of omitting relevant variables

- Assume the true model is Y = β0 + β1 X1 + β2 X2 + u, while the estimated model is Y = β0 + β1 X1 + e.
- X2 is omitted and e = β2 X2 + u
- Example: both age and education affect the wage
- Let β̃0 and β̃1 denote the OLS estimators of Y = β0 + β1 X1 + e.
- Are they biased estimators of β0 and β1?
- There are 2 cases
  1. If Cov(X1, X2) = 0 or β2 = 0, then β̃0 and β̃1 are unbiased.
  2. If Cov(X1, X2) ≠ 0 and β2 ≠ 0, then β̃0 and β̃1 are biased.
- The bias in β̃1 is equal to β2 Cov(X1, X2) / V(X1)

Sign of the bias

1. If Cov(X1, X2) > 0 (resp. < 0) and β2 > 0 (resp. < 0), then β̃1 is upward biased.
2. If Cov(X1, X2) > 0 (resp. < 0) and β2 < 0 (resp. > 0), then β̃1 is downward biased.

Example: age and education in a wage equation

- We estimate the true model

  wage = β0 + β1 age + β2 sup + u    (6)

  β̂2 > 0 ⇒ ceteris paribus, education has a positive effect on the wage
- We estimate the false model

  wage = β0 + β1 age + e    (7)

  Comparison: β̃1 = 26.24 < β̂1 = 30.47
  ⇒ the effect of age is under-estimated in the false model
- We estimate education as a function of age

  sup = δ0 + δ1 age + v    (8)

  δ̂1 < 0 ⇒ age has a negative effect on education
- As δ̂1 < 0 and β̂2 > 0, the estimator β̃1 is, as expected, downward biased

Figure: Effect of age and education on wage

Figure: Effect of age on education

THE MULTIPLE LINEAR REGRESSION MODEL

Introduction
Definition
A multiple linear regression model is a regression model where the
dependent variable is continuous, explained by several exogenous
variables, and linear in the parameters.
Example

Y = β0 + Σ_{k=1}^K βk Xk + u = Xβ + u    (9)

where β is a vector of K + 1 parameters (β0, β1, ..., βK)
and X is a matrix (1, X1, ..., XK) with K + 1 columns.
- Linearity of the model in the parameters βk, k = 0, ..., K
- β0 is the constant
- βk is the slope parameter and measures the ceteris paribus effect of Xk on Y.
- In the multiple case, the matrix notation is more convenient.

Assumptions
Assumption
1. Random sampling: {(X1i, X2i, ..., XKi, Yi); i = 1, ..., n} is a
random sample of size n from the population.
2. Sample variation and no collinearity: the explanatory variables
are not linearly related and none is constant ⇒ the rank of X'X is K + 1
3. Zero mean: E(ui) = 0
4. Zero conditional mean: E(ui | X1i, ..., XKi) = 0
5. Homoscedasticity and non-autocorrelation:
V(ui | X1i, ..., XKi) = σ² and
corr(ui, uj | X1i, ..., XKi, X1j, ..., XKj) = 0

Remarks
- Assumptions (1), (3), (4) and (5) are similar to the simple case.
- Assumption (2) is an extension of the simple case.
- Assumption (2) is required for the identification of the parameters. WHY?
- Assume Xki = λ for all i (Xk is constant). Then β0 and βk cannot be separately identified.
- Similarly, assume that the variables X1 and X2 are collinear: X1 = λ X2.
  The model can be rewritten: Y = β0 + (λ β1 + β2) X2 + Σ_{k=3}^K βk Xk + u.
  β1 and β2 cannot be separately identified.

contd

Remarks
- Another example: dummy variables
  If all the dummy variables and the constant are included in the
  model, then the constant and the dummy parameters cannot
  be identified separately.
  One of the dummy variables must be dropped (the reference
  category, e.g. one education level).
- The rank of a matrix X is equal to the number of nonzero characteristic roots of X'X.
- rank(X) ≤ min(number of rows, number of columns).

The Ordinary Least Squares estimator


Definition
The OLS estimates β̂k, k = 0, ..., K, minimize the sum of squared residuals

S(β0, ..., βK) = Σ_{i=1}^n ui² = Σ_{i=1}^n (Yi − β0 − Σ_{k=1}^K βk Xki)² = (Y − Xβ)'(Y − Xβ)    (10)

First order conditions

−2 Σ_{i=1}^n (Yi − β̂0 − Σ_{k=1}^K β̂k Xki) = 0

−2 Σ_{i=1}^n Xki (Yi − β̂0 − Σ_{k=1}^K β̂k Xki) = 0,   k = 1, ..., K

OR, using matrix notation

−2 Σ_{i=1}^n Xi'(Yi − Xi β̂) = −2 X'(Y − Xβ̂) = 0    (11)

Proposition

β̂k = Σ_{i=1}^n (Yi − Ȳ)(Xki − X̂ki) / Σ_{i=1}^n (Xki − X̂ki)²

where X̂ki is the predicted value of Xki obtained from a regression
of Xki on a constant and all the other covariates.

β̂0 = Ȳ − Σ_{k=1}^K β̂k X̄k

OR, using matrix notation,

β̂ = (X'X)⁻¹ X'Y    (12)

Comparison with the simple case: β̂k is equal to the slope
estimate in the regression of Y on a constant and (Xki − X̂ki)
⇒ ceteris paribus effect.
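A minimal SAS/IML sketch of formula (12), again on made-up toy data (a constant and two regressors) rather than the course data set:

proc iml ;
/* toy design matrix: constant, X1, X2 (illustrative values only) */
X = {1 25 0,
     1 32 1,
     1 41 0,
     1 47 1,
     1 55 1,
     1 38 0} ;
Y = {1100, 1350, 1500, 1620, 1700, 1450} ;
beta_hat = inv(X`*X) * X`*Y ;        /* OLS: (X'X)^{-1} X'Y */
print beta_hat ;
quit ;

In practice one would rather use solve(X`*X, X`*Y), or simply PROC REG, but the line above mirrors formula (12) exactly.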

How well does the regression line fit the sample data?
Definition
Let SST = Σ_{i=1}^n (Yi − Ȳ)² denote the total variation in Y,
SSE = Σ_{i=1}^n (Ŷi − Ȳ)² denote the explained sum of squares,
and SSR = Σ_{i=1}^n ûi² denote the residual sum of squares.
Then

R² = SSE/SST = 1 − SSR/SST = 1 − Σ_{i=1}^n ûi² / Σ_{i=1}^n (Yi − Ȳ)²    (13)

Remarks
- R² does not decrease when one more variable is included.
- The R² is useful to compare two models with the same number of
explanatory variables, but not useful if the number of variables is different.

Example: Determinants of the monthly wage
- Data set: French sample of the European Community Household Panel
- Dependent variable: wage = monthly wage
- Explanatory variables:
  1. age (continuous variable or discrete variable)
  2. diplo0, diplo1, ... = educational level (discrete variables 0/1)
  3. sex, children
- The sample is restricted to individuals aged 20-60 and employed in 2000 (n = 5010)

SAS Program
Proc reg data=c ;
model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
model wage = sex1 ag1 ag2 ag4 ag5 diplo1 diplo2 diplo3 diplo4
nbenf1 nbenf2 nbenf3 ;
run ;

Figure: Determinants of wage - model 1

Figure: Determinants of wage - model 2

Finite-sample properties of the OLS estimator


Proposition
Let Assumptions (1) to (4) be verified. Then

E(β̂) = β    (14)

Proposition
Let Assumptions (1) to (5) be verified. Then

V(β̂k | X1, ..., XK) = σ² / [ Σ_{i=1}^n (Xki − X̄k)² (1 − Rk²) ]    (15)

where Rk² is the R² obtained from the regression of Xk on the other covariates.
OR, using matrix notation,

V(β̂ | X) = σ² (X'X)⁻¹    (16)

Remarks
V(β̂k | X) increases with σ² and Rk², and decreases with Σ_{i=1}^n (Xki − X̄k)².

Estimating the error variance
- We don't know what the error variance σ² is, because we don't observe the error term u.
- BUT we observe the residuals û, and we can use the residuals to find an estimate of the error variance.
- Replacing σ by σ̂ in the variance of the estimators gives unbiased estimators of V(β̂).

Proposition

σ̂² = Σ_{i=1}^n ûi² / (n − K − 1) = SSR / degrees of freedom

is an unbiased estimator of σ².
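Continuing the toy matrix sketch used for formula (12) (same made-up data, K = 2 regressors), σ̂², the estimated covariance matrix (16), and the standard errors follow directly:

proc iml ;
X = {1 25 0, 1 32 1, 1 41 0, 1 47 1, 1 55 1, 1 38 0} ;
Y = {1100, 1350, 1500, 1620, 1700, 1450} ;
n = nrow(X) ;  K = ncol(X) - 1 ;
beta_hat   = inv(X`*X) * X`*Y ;
u          = Y - X * beta_hat ;                  /* residuals            */
sigma2_hat = sum(u##2) / (n - K - 1) ;           /* SSR / (n - K - 1)    */
V_hat      = sigma2_hat * inv(X`*X) ;            /* estimated covariance */
se         = sqrt(vecdiag(V_hat)) ;              /* standard errors      */
print sigma2_hat, se ;
quit ;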

Theorem (Gauss-Markov)
Under assumptions (1) to (5), β̂ is the best linear unbiased
estimator (BLUE).

Asymptotic properties
We now study the properties of the OLS estimator when n → +∞.
First we study the consistency of the OLS estimator. Consistency is
stronger than unbiasedness.

Theorem
Under assumptions (1) to (4), plim(β̂) = β.
Under assumptions (1) to (5), plim(σ̂²) = σ².

Next we study the asymptotic distribution of the OLS estimator.

Theorem (asymptotic normality)

Under assumptions (1) to (5),

√n (β̂ − β) →d N(0, σ² Q⁻¹)    (17)

with Q⁻¹ = plim((X'X/n)⁻¹)

The maximum likelihood method


The likelihood is the probability of observing the sample
{(Y1, X1), ..., (Yn, Xn)}.

Definition
The contribution of individual i to the likelihood is the function Li
defined by: Li(Yi, Xi; θ) = f(Yi, Xi; θ).
The likelihood function of the sample is the function
L(Yi, Xi, i = 1, ..., n; θ) defined as the product of the individual
contributions:

L(Yi, Xi, i = 1, ..., n; θ) = Π_{i=1}^n Li(Yi, Xi; θ)

- If the dependent variable is continuous, L(Y, X; θ) is the product of the density functions associated with each couple (Yi, Xi).

Assumption (6) (Normality of the error term)


The error term is independent of Xi and normally distributed with
zero mean and variance σ²: ui | Xi ~ N(0, σ²).
Under this assumption, we can use the Maximum Likelihood
method.
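A brief sketch of why the next results hold: under Assumption (6) the log-likelihood of the sample is

log L(β, σ²) = −(n/2) log(2π σ²) − (1/(2σ²)) Σ_{i=1}^n (Yi − Xi β)²

For any fixed σ², maximizing this expression over β amounts to minimizing Σ_{i=1}^n (Yi − Xi β)², i.e. the sum of squared residuals, so the ML estimator of β coincides with the OLS estimator; maximizing next over σ² gives σ̃² = SSR/n.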

Theorem
Under assumptions (1) to (6), the Maximum Likelihood estimator
of β is the OLS estimator.
MOREOVER

Theorem
Under assumptions (1) to (6), the OLS estimator (= the ML
estimator) is the minimum variance unbiased estimator of β.

Tests and inference - finite samples


- We want to test hypotheses about the parameters of the model.
- Example: wage equation
  - Are men significantly better paid than women?
  - Are the 45-55 year-olds significantly better paid than the 35-45 year-olds (the reference category)?
  - Are the 45-55 year-olds significantly better paid than the 25-35 year-olds?
- In order to perform statistical tests in finite samples, we need to add an assumption on the distribution of the error term.

Assumption (6) (Normality of the error term)

The error term is independent of Xi and normally distributed with
zero mean and variance σ²: ui | Xi ~ N(0, σ²).

⇒ the distribution of the error term conditional on the vector of explanatory variables is normal.

Remarks
- Assumption (6) implies assumptions (3), (4) and (5).
- Conditionally on the explanatory variables, the dependent variable is normally distributed with mean Xi β and variance σ²:

  Yi | X ~ N(β0 + Σ_{k=1}^K βk Xki, σ²)

Consequence for the distribution of β̂

Theorem
Under assumptions (1) to (6), β̂k is normally distributed with
mean βk and variance V(β̂k):

β̂k − βk ~ N(0, V(β̂k))

(β̂k − βk) / sqrt(V(β̂k)) ~ N(0, 1)

Proof: β̂ is a linear combination of the error terms.

We cannot use this property directly since V(β̂k) is unknown. BUT
we can replace the variance by its estimate, which gives:

Theorem

Under assumptions (1) to (6), (β̂k − βk) / sqrt(V̂(β̂k)) ~ t_{n−K−1}

Idea
Assume we want to test the null hypothesis H0.
We need
(i) a test statistic (t), i.e. a decision function that takes
its values in the set of hypotheses
(ii) a decision rule that determines when H0 is rejected
  ⇒ choose α = Pr(reject H0 | H0 is true)
  α is the significance level (usually α = 5%)
(iii) a critical region, i.e. the set of values of the test
statistic for which the null hypothesis is rejected
  ⇒ we want to find the critical value c that verifies
  Pr(reject H0 | H0 is true) = Pr(|t| > c) = α
  ⇒ c is the (1 − α/2) quantile of the t distribution with
  n − K − 1 degrees of freedom

The t-test: is βk significantly different from 0?

H0: βk = 0
H1: βk ≠ 0

The null hypothesis means that Xk has no effect on Y.

(i) Under the null hypothesis, t = β̂k / sqrt(V̂(β̂k))
  The test statistic follows a t distribution with
  n − K − 1 degrees of freedom.
(ii) α = 5%
(iii) Pr(|t| > c) = 5%
  c is the 97.5-th percentile of the t distribution with
  n − K − 1 degrees of freedom. When n − K − 1 is
  large, c = 1.96.

Decision
- If |t| = |β̂k| / sqrt(V̂(β̂k)) < 1.96, then H0 is accepted at the 5%
significance level
- If |t| = |β̂k| / sqrt(V̂(β̂k)) ≥ 1.96, then H0 is rejected at the 5%
significance level.
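In SAS, this test is reported automatically: in the PROC REG parameter estimates table, the "t Value" column is β̂k / sqrt(V̂(β̂k)) and "Pr > |t|" is the corresponding two-sided p-value (H0 is rejected at the 5% level when it is below 0.05). The restriction can also be written explicitly with a TEST statement; the sketch below reuses the variable names of wage model 1:

Proc reg data=c ;
model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3 ;
/* is the coefficient on sex1 significantly different from 0 ? */
test sex1 = 0 ;
run ;

Note that the TEST statement reports an F statistic; with a single restriction it equals the square of the t statistic above.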

Figure: Determinants of wage - model 1

The t-test: is βk significantly greater than 0?

H0: βk = 0
H1: βk > 0

The null hypothesis means that Xk has no effect on Y.

(i) Under the null hypothesis, t = β̂k / sqrt(V̂(β̂k))
  The test statistic follows a t distribution with
  n − K − 1 degrees of freedom.
(ii) α = 5%
(iii) Pr(t > c) = 5%
  c is the 95-th percentile of the t distribution with
  n − K − 1 degrees of freedom. When n − K − 1 is
  large, c = 1.645.

Decision
- If t < 1.645, then H0 is accepted at the 5% significance level
- If t ≥ 1.645, then H0 is rejected at the 5% significance level.

The t-test: is βk significantly different from a?

H0: βk = a
H1: βk ≠ a

(i) Under the null hypothesis, t = (β̂k − a) / sqrt(V̂(β̂k))
  The test statistic follows a t distribution with
  n − K − 1 degrees of freedom.
(ii) α = 1%
(iii) Pr(|t| > c) = 1%
  c is the 99.5-th percentile of the t distribution with
  n − K − 1 degrees of freedom. When n − K − 1 is
  large, c = 2.576.

Decision
- If |t| < 2.576, then H0 is accepted at the 1% significance level
- If |t| ≥ 2.576, then H0 is rejected at the 1% significance level.

The t-test: is βk significantly different from βj?

H0: βk = βj
H1: βk ≠ βj

The null hypothesis means that Xk and Xj have the same effect on Y.

(i) Under the null hypothesis,

  t = (β̂k − β̂j) / sqrt( V̂(β̂k) + V̂(β̂j) − 2 Ĉov(β̂k, β̂j) )

  The test statistic follows a t distribution with
  n − K − 1 degrees of freedom.
(ii) α = 10%
(iii) Pr(|t| > c) = 10%
  c is the 95-th percentile of the t distribution with
  n − K − 1 degrees of freedom. When n − K − 1 is
  large, c = 1.645.

Decision
- If |t| < 1.645, then H0 is accepted at the 10% significance level
- If |t| ≥ 1.645, then H0 is rejected at the 10% significance level.
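In SAS, this restriction can be imposed directly with a TEST statement. The sketch below uses two of the education dummies from the wage models (diplo1 and diplo2); PROC REG reports an F statistic, which for this single restriction equals the square of the t statistic above:

Proc reg data=c ;
model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3 ;
/* do diplo1 and diplo2 have the same effect on the wage ? */
test diplo1 - diplo2 = 0 ;
run ;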

The F-test: exclusion restrictions

We test q linear restrictions on the parameters

H0: βK+1−q = βK+2−q = ... = βK = 0
H1: H0 is false

(i) Under the null hypothesis, the model becomes
  Y = β0 + β1 X1 + ... + βK−q XK−q + u.
(ii) The test statistic is

  F = [(R² − Rc²)/q] / [(1 − R²)/(n − K − 1)]

  where Rc² denotes the R² of the constrained model.
  The F statistic follows a Fisher distribution with
  (q, n − K − 1) degrees of freedom.
(iii) Pr(F > c) = α

Decision
- If F < c, then H0 is accepted at the α significance level
- If F ≥ c, then H0 is rejected
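In SAS, exclusion restrictions of this type can be tested with a TEST statement listing the q restrictions separated by commas. The sketch below jointly excludes the three children dummies of the wage models (q = 3); it illustrates the syntax rather than reproducing a course result:

Proc reg data=c ;
model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3 ;
/* joint test: the number of children has no effect (q = 3 restrictions) */
test nbenf1 = 0, nbenf2 = 0, nbenf3 = 0 ;
run ;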

Figure: Determinants of wage - model 3

The F-test: overall significance

Question: is the model completely false?

H0: β1 = β2 = ... = βK = 0
H1: H0 is false

K is the number of restrictions.
(i) Under the null hypothesis, F = (R²/K) / [(1 − R²)/(n − K − 1)].
  The F statistic follows a Fisher distribution with
  (K, n − K − 1) degrees of freedom.
(ii) α = 1%
(iii) Pr(F > c) = 1%
  c is the 99-th percentile of the F distribution with
  (K, n − K − 1) degrees of freedom.

Decision
- If F < c, then H0 is accepted at the 1% significance level
- If F ≥ c, then H0 is rejected at the 1% significance level.

Heteroscedasticity
Assumption (5) (homoscedasticity) is restrictive
⇒ How to proceed when this assumption is dropped?

Assumption (5)
V(ui | X1i, ..., XKi) = σi² and
corr(ui, uj | X1i, ..., XKi, X1j, ..., XKj) = 0

Remarks
- The OLS estimator remains unbiased
- The OLS estimator remains consistent
BUT
- V(β̂ | X) is no longer equal to σ² (X'X)⁻¹
- The asymptotic normality result (17) no longer holds as stated
- The OLS estimator is no longer the BLUE

Testing for heteroscedasticity

- There exist two tests: Breusch-Pagan and White. Both tests are based on the residuals of the fitted model.

IDEA
E(ui² | X1i, ..., XKi) = σi² ⇒ ui² = σi² + ei

- The variance of the error term is of the general form σi² = hi = h(X1i, ..., XKi)

The Breusch-Pagan test

- More restrictive assumption on the form of hi:

  h(X1i, ..., XKi) = δ0 + Σ_{k=1}^K δk Xki

  ⇒ ui² = δ0 + Σ_{k=1}^K δk Xki + ei

- H0: δ1 = δ2 = ... = δK = 0
  H1: ∃ k such that δk ≠ 0

- Under the null hypothesis, F = (R²/K) / [(1 − R²)/(n − K − 1)] follows a
Fisher distribution with (K, n − K − 1) degrees of freedom, where R² is the
R² of the auxiliary regression of ûi² on the explanatory variables.
The White test

- Less restrictive assumption on the form of hi:

  h(X1i, ..., XKi) = δ0 + Σ_{k=1}^K δk Xki + Σ_{1≤k≤j≤K} δkj Xki Xji

  ⇒ ui² = δ0 + Σ_{k=1}^K δk Xki + Σ_{1≤k≤j≤K} δkj Xki Xji + ei

- H0: δk = δkj = 0 ∀ k, j
  H1: ∃ k, j such that δk ≠ 0 or δkj ≠ 0

- Under the null hypothesis, F = (R²/q) / [(1 − R²)/(n − q − 1)] follows a
Fisher distribution with (q, n − q − 1) degrees of freedom, where R² is the
R² of the auxiliary regression and

  q = K + K(K + 1)/2
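For example, with K = 10 explanatory variables (the number of regressors in the wage models above), the White auxiliary regression has q = 10 + (10 × 11)/2 = 65 slope coefficients.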

Example: Determinants of the monthly wage

- Data set: French sample of the European Community Household Panel
- The sample is restricted to individuals aged 20-60 and employed in 2000 (n = 5010)
- Dependent variable: wage = monthly wage
- Explanatory variables: age (continuous variable), diplo0, diplo1, ... = educational level (discrete variables 0/1), sex, children

SAS Program
Proc model data=c ;
Parms ac asex1 aage aage2 adiplo1 adiplo2 adiplo3 adiplo4
anbenf1 anbenf2 anbenf3 ;
wage = ac + asex1*sex1 + aage*age + aage2*age2 + adiplo1*diplo1
+ adiplo2*diplo2 + adiplo3*diplo3 + adiplo4*diplo4
+ anbenf1*nbenf1 + anbenf2*nbenf2 + anbenf3*nbenf3 ;
fit wage / white breusch=(1 sex1 age age2 diplo1 diplo2 diplo3
diplo4 nbenf1 nbenf2 nbenf3) ;
run ;

Figure: Example : Determinants of wage

Correcting for heteroscedasticity

There are two methods to improve the efficiency of the estimation
in the presence of heteroscedastic errors:
1. Use the Feasible Generalized Least Squares (FGLS) method if the
function h(X1i, ..., XKi) is known
2. Use the OLS method but compute a heteroscedasticity-consistent
covariance matrix estimator (White (1980), Davidson and MacKinnon (1993))
1. The Feasible Generalized Least Squares (FGLS) method
- Assumption: σi² = hi = g(δ0 + Σ_{k=1}^K δk Xki), where g(·) is known.

IDEA
⇒ apply a transformation to the initial model Yi = Xi β + ui that
makes the error terms of the transformed model homoscedastic.
⇒ Yi, Xi, and ui are divided by sqrt(g(δ0 + Σ_{k=1}^K δk Xki)) = σi

The transformed model is

Yi/σi = β0 (1/σi) + Σ_{k=1}^K βk (Xki/σi) + ui/σi

Remarks
- there is no constant in the transformed model (β0 becomes the coefficient on 1/σi)
- the parameters βk are the same as in the initial model
- ui/σi is homoscedastic

⇒ The transformed model can then be estimated by OLS, EXCEPT that we do not know the parameters δ0 and δk.

⇒ we proceed in the following way:
1. estimate the initial model by OLS and compute the residuals ûi
2. estimate ûi² = δ0 + Σ_{k=1}^K δk Xki + ei and compute σ̂i² = δ̂0 + Σ_{k=1}^K δ̂k Xki
3. divide the initial model by σ̂i
4. estimate the transformed model by OLS
⇒ if the variance function is well specified, the FGLS estimator is unbiased,
consistent, and asymptotically efficient
2. Compute a heteroscedasticity-consistent covariance matrix
estimator
The idea is to use the OLS method but to compute a heteroscedasticity-
consistent covariance matrix estimator (cf. White (1980), Davidson
and MacKinnon (1993)).
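This second approach leaves the point estimates unchanged; only the reported covariance matrix differs. A sketch of the syntax (the ACOV option of PROC REG prints White's heteroscedasticity-consistent covariance matrix; recent SAS/STAT releases also accept HCC / HCCMETHOD= to display robust standard errors, but check the options available in your release):

Proc reg data=c ;
model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1 nbenf2 nbenf3 / acov ;
run ;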

Example
SAS Program
Proc Reg Data=outc ;
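/* outc is assumed to already contain UHat2, the squared residuals from the
   initial OLS fit of the wage equation (FGLS step 1); this regression and its
   predicted values implement FGLS step 2 */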
Model UHat2 = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
Output OUT= MyData PREDICTED = Sig2hat ;
Run ;
/* Create weights: PROC REG's WEIGHT statement minimizes the weighted sum of
   squared residuals, so the weight is the inverse of the estimated variance */
Data MyData ; Set MyData ; OmegaInv = 1/Sig2hat ;
Run ;
Proc Reg Data=MyData ;
Model wage = sex1 age age2 diplo1 diplo2 diplo3 diplo4 nbenf1
nbenf2 nbenf3 ;
Weight OmegaInv ;
Run ;

Figure: Estimation of û² when g(x) = x

Figure: Estimation of û² when g(x) = exp(x)

Figure: Determinants of wage correcting for heteroscedasticity
