
Chapter 13

Asymptotic Theory and Stochastic Regressors


In any regression analysis, the explanatory variables are assumed to be non-stochastic, i.e., fixed in repeated samples. Such an assumption is appropriate for experiments conducted inside laboratories, where the experimenter can control the values of the explanatory variables and obtain repeated observations on the study variable for fixed values of the explanatory variables. In practice, this assumption may not always be satisfied. Sometimes the explanatory variables in a given model are the study variable of another model, so the study variable depends on explanatory variables that are stochastic in nature. In such situations, the statistical inferences drawn from the linear regression model under the assumption of fixed explanatory variables may not remain valid.

We now assume that the explanatory variables are stochastic but uncorrelated with the disturbance term. If they are correlated, the issue is addressed through instrumental variable estimation; such a situation arises, for example, in measurement error models.

Stochastic regressors model


Consider the linear regression model

y = Xβ + ε

where X is an (n × k) matrix of n observations on k explanatory variables X₁, X₂, ..., X_k which are stochastic in nature, y is an (n × 1) vector of n observations on the study variable, β is a (k × 1) vector of regression coefficients and ε is the (n × 1) vector of disturbances. Under the assumptions E(ε) = 0 and V(ε) = σ²I, the distribution of ε_i conditional on x_i' satisfies these properties for all values of X, where x_i' denotes the i-th row of X. This is demonstrated as follows:

Let p(ε_i | x_i') be the conditional probability density function of ε_i given x_i' and p(ε_i) the unconditional probability density function of ε_i. Then

E(ε_i | x_i') = ∫ ε_i p(ε_i | x_i') dε_i
             = ∫ ε_i p(ε_i) dε_i
             = E(ε_i)
             = 0
E(ε_i² | x_i') = ∫ ε_i² p(ε_i | x_i') dε_i
              = ∫ ε_i² p(ε_i) dε_i
              = E(ε_i²)
              = σ².

Here we have used that when ε_i and x_i' are independent, p(ε_i | x_i') = p(ε_i).

Least squares estimation of parameters


The additional assumption that the explanatory variables are stochastic poses no problem in the ordinary least squares estimation of β and σ². The OLSE of β is obtained by minimizing (y − Xβ)'(y − Xβ) with respect to β as

b = (X'X)⁻¹ X'y

and the estimator of σ² is obtained as

s² = (y − Xb)'(y − Xb) / (n − k).
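As a quick numerical sketch (a minimal NumPy example; the dimensions, coefficients and error scale below are illustrative assumptions, not values from the notes), these formulas apply unchanged when X itself is generated randomly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a model with stochastic regressors: X is drawn randomly,
# independently of the disturbances.
n, k = 200, 3
beta = np.array([1.0, 2.0, -0.5])
X = rng.normal(size=(n, k))          # stochastic regressors
eps = rng.normal(scale=1.5, size=n)  # disturbances, independent of X
y = X @ beta + eps

# OLSE of beta: b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Estimator of sigma^2: s^2 = e'e / (n - k)
e = y - X @ b
s2 = e @ e / (n - k)

print("b  =", b)    # close to beta
print("s2 =", s2)   # close to 1.5**2 = 2.25
```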

Maximum likelihood estimation of parameters:


Assuming ε ~ N(0, σ²I) in the model y = Xβ + ε, with X stochastic and independent of ε, the joint probability density function of ε and X can be related to the joint probability density function of y and X as follows:

f(ε, X) = f(ε₁, ε₂, ..., ε_n, x₁', x₂', ..., x_n')
        = [∏_{i=1}^n f(ε_i)] [∏_{i=1}^n f(x_i')]
        = [∏_{i=1}^n f(y_i | x_i')] [∏_{i=1}^n f(x_i')]
        = ∏_{i=1}^n f(y_i | x_i') f(x_i')
        = ∏_{i=1}^n f(y_i, x_i')
        = f(y₁, y₂, ..., y_n, x₁', x₂', ..., x_n')
        = f(y, X).

This implies that the maximum likelihood estimators of β and σ² will be based on

∏_{i=1}^n f(y_i | x_i') = ∏_{i=1}^n f(ε_i),

so they will be the same as those based on the assumption that the ε_i's, i = 1, 2, ..., n, are distributed as N(0, σ²). So the maximum likelihood estimators of β and σ² when the explanatory variables are stochastic are obtained as

β̃ = (X'X)⁻¹ X'y
σ̃² = (y − Xβ̃)'(y − Xβ̃) / n.
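A short sketch (again with assumed illustrative values) highlighting that the ML estimate of β coincides with the OLSE, while σ̃² divides by n rather than n − k:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # ML estimate of beta equals the OLSE
e = y - X @ b
sigma2_ml = e @ e / n        # ML estimator: divisor n (biased downward)
s2 = e @ e / (n - k)         # divisor n - k (unbiased)
print(sigma2_ml, s2)         # sigma2_ml = s2 * (n - k) / n
```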

Alternative approach for deriving the maximum likelihood estimates


Alternatively, the maximum likelihood estimators of β and σ 2 can also be derived using the joint
probability density function of y and X .

Note: In this section the vector x' has order [1 × (k − 1)], i.e., it excludes the intercept term.

Let x_i', i = 1, 2, ..., n, be from a multivariate normal distribution with mean vector μ_x and covariance matrix Σ_xx, i.e., x_i' ~ N(μ_x, Σ_xx), and let the joint distribution of y_i and x_i' be

[ y_i  ]      ( [ μ_y ]   [ σ_yy  Σ_yx ] )
[ x_i' ]  ~ N ( [ μ_x ] , [ Σ_xy  Σ_xx ] ).

Let the linear regression model be

y_i = β₀ + x_i'β₁ + ε_i,  i = 1, 2, ..., n

where x_i' is a [1 × (k − 1)] vector of observations on the random vector x, β₀ is the intercept term and β₁ is the [(k − 1) × 1] vector of regression coefficients. Further, ε_i is the disturbance term with ε_i ~ N(0, σ²), independent of x'.

Suppose

[ y ]      ( [ μ_y ]   [ σ_yy  Σ_yx ] )
[ x ]  ~ N ( [ μ_x ] , [ Σ_xy  Σ_xx ] ).

The joint probability density function of (y, x') is

f(y, x') = [ (2π)^(k/2) |Σ|^(1/2) ]⁻¹ exp{ −(1/2) [(y − μ_y), (x − μ_x)'] Σ⁻¹ [(y − μ_y), (x − μ_x)']' }.

Now using the following result, we find Σ⁻¹.

Result: Let A be a nonsingular matrix which is partitioned suitably as

A = [ B  C ]
    [ D  E ],

where E and F = B − CE⁻¹D are nonsingular matrices. Then

A⁻¹ = [ F⁻¹          −F⁻¹CE⁻¹              ]
      [ −E⁻¹DF⁻¹      E⁻¹ + E⁻¹DF⁻¹CE⁻¹   ].

Note that AA⁻¹ = A⁻¹A = I.
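This identity is easy to check numerically; a minimal sketch (block sizes chosen arbitrarily) is:

```python
import numpy as np

rng = np.random.default_rng(2)

# Partition a random symmetric positive definite matrix A as [[B, C], [D, E]].
p, q = 2, 3
A = rng.normal(size=(p + q, p + q))
A = A @ A.T + (p + q) * np.eye(p + q)   # make A well-conditioned

B, C = A[:p, :p], A[:p, p:]
D, E = A[p:, :p], A[p:, p:]

Einv = np.linalg.inv(E)
F = B - C @ Einv @ D                    # Schur complement of E in A
Finv = np.linalg.inv(F)

Ainv_block = np.block([
    [Finv,              -Finv @ C @ Einv],
    [-Einv @ D @ Finv,   Einv + Einv @ D @ Finv @ C @ Einv],
])

print(np.allclose(Ainv_block, np.linalg.inv(A)))  # True
```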
Thus

Σ⁻¹ = (1/σ²) [ 1               −Σ_yx Σ_xx⁻¹                         ]
             [ −Σ_xx⁻¹ Σ_xy     σ² Σ_xx⁻¹ + Σ_xx⁻¹ Σ_xy Σ_yx Σ_xx⁻¹ ],

where

σ² = σ_yy − Σ_yx Σ_xx⁻¹ Σ_xy.
Then

f(y, x') = [ (2π)^(k/2) |Σ|^(1/2) ]⁻¹ exp( −(1/(2σ²)) [ {(y − μ_y) − (x − μ_x)' Σ_xx⁻¹ Σ_xy}² + σ² (x − μ_x)' Σ_xx⁻¹ (x − μ_x) ] ).

The marginal distribution of x' is obtained by integrating f(y, x') over y; the resulting distribution is the (k − 1)-variate multivariate normal distribution

g(x') = [ (2π)^((k−1)/2) |Σ_xx|^(1/2) ]⁻¹ exp( −(1/2) (x − μ_x)' Σ_xx⁻¹ (x − μ_x) ).

The conditional probability density function of y given x ' is

f(y | x') = f(y, x') / g(x')
          = (2πσ²)^(−1/2) exp( −(1/(2σ²)) {(y − μ_y) − (x − μ_x)' Σ_xx⁻¹ Σ_xy}² ),

which is the probability density function of a normal distribution with
• conditional mean
  E(y | x') = μ_y + (x − μ_x)' Σ_xx⁻¹ Σ_xy and
• conditional variance
  Var(y | x') = σ_yy (1 − ρ²)

where

ρ² = Σ_yx Σ_xx⁻¹ Σ_xy / σ_yy

is the squared population multiple correlation coefficient.
In the model

y = β₀ + x'β₁ + ε,

the conditional mean is

E(y | x') = β₀ + x'β₁ + E(ε | x')
          = β₀ + x'β₁.

Comparing this conditional mean with the conditional mean of the normal distribution, we obtain the relationships for β₀ and β₁ as follows:

β₁ = Σ_xx⁻¹ Σ_xy
β₀ = μ_y − μ_x' β₁.
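As a small worked sketch (all population moments below are assumed illustrative values), these relationships can be evaluated directly:

```python
import numpy as np

# Assumed population moments of a joint normal (y, x') with a two-dimensional x.
mu_y, mu_x = 3.0, np.array([1.0, -2.0])
Sigma_xx = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
Sigma_xy = np.array([0.8, -0.3])
sigma_yy = 1.5

beta1 = np.linalg.solve(Sigma_xx, Sigma_xy)   # beta_1 = Sigma_xx^{-1} Sigma_xy
beta0 = mu_y - mu_x @ beta1                   # beta_0 = mu_y - mu_x' beta_1
sigma2 = sigma_yy - Sigma_xy @ beta1          # sigma^2 = sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy

print(beta0, beta1, sigma2)
```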

The likelihood function of (y, x') based on a sample of size n is

L = [ (2π)^(nk/2) |Σ|^(n/2) ]⁻¹ exp( −(1/2) ∑_{i=1}^n [(y_i − μ_y), (x_i − μ_x)'] Σ⁻¹ [(y_i − μ_y), (x_i − μ_x)']' ).

Maximizing the log-likelihood function with respect to μ_y, μ_x, Σ_xx and Σ_xy, the maximum likelihood estimates of the respective parameters are obtained as

μ̂_y = ȳ = (1/n) ∑_{i=1}^n y_i
μ̂_x = x̄ = (1/n) ∑_{i=1}^n x_i = (x̄₂, x̄₃, ..., x̄_k)'
Σ̂_xx = S_xx = (1/n) ( ∑_{i=1}^n x_i x_i' − n x̄ x̄' )
Σ̂_xy = S_xy = (1/n) ( ∑_{i=1}^n x_i y_i − n x̄ ȳ )

where x_i' = (x_i2, x_i3, ..., x_ik), S_xx is the [(k − 1) × (k − 1)] matrix with elements (1/n) ∑_t (x_ti − x̄_i)(x_tj − x̄_j) and S_xy is the [(k − 1) × 1] vector with elements (1/n) ∑_t (x_ti − x̄_i)(y_t − ȳ).

Based on these estimates, the maximum likelihood estimators of β₁ and β₀ are obtained as

β̂₁ = S_xx⁻¹ S_xy
β̂₀ = ȳ − x̄' β̂₁,

and stacking them gives

β̂ = (β̂₀, β̂₁')' = (X'X)⁻¹ X'y.
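The equivalence of these moment-based estimates with the OLSE is easy to verify numerically; a sketch with assumed illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
n, km1 = 500, 2                       # km1 = k - 1 regressors besides the intercept
x = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.5], [0.5, 1.0]], size=n)
y = 0.7 + x @ np.array([1.5, -0.8]) + rng.normal(size=n)

# Moment-based ML estimates (divisor n)
xbar, ybar = x.mean(axis=0), y.mean()
Sxx = (x - xbar).T @ (x - xbar) / n
Sxy = (x - xbar).T @ (y - ybar) / n
beta1 = np.linalg.solve(Sxx, Sxy)     # beta1_hat = Sxx^{-1} Sxy
beta0 = ybar - xbar @ beta1           # beta0_hat = ybar - xbar' beta1_hat

# OLS with an intercept column gives the same numbers
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(np.concatenate([[beta0], beta1]), b))  # True
```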

Properties of the least squares estimator:


The estimation error of the OLSE b = (X'X)⁻¹X'y of β is

b − β = (X'X)⁻¹X'y − β
      = (X'X)⁻¹X'(Xβ + ε) − β
      = (X'X)⁻¹X'ε.

Then, assuming that E[(X'X)⁻¹X'] exists, we have

E(b − β) = E[(X'X)⁻¹X'ε]
         = E{ E[(X'X)⁻¹X'ε | X] }
         = E[(X'X)⁻¹X' E(ε)]
         = 0

because (X'X)⁻¹X' and ε are independent. So b is an unbiased estimator of β.
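A small Monte Carlo sketch of this unbiasedness (replication count and parameter values are illustrative assumptions), with X redrawn in every replication:

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, -0.5])
n, reps = 40, 20000

est = np.empty((reps, 2))
for r in range(reps):
    X = rng.normal(size=(n, 2))          # X redrawn each replication: stochastic regressors
    y = X @ beta + rng.normal(size=n)    # disturbances independent of X
    est[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(est.mean(axis=0))   # close to beta, illustrating E(b) = beta
```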

Econometrics | Chapter 13 | Asymptotic Theory and Stochastic Regressors | Shalabh, IIT Kanpur

6
The covariance matrix of b is obtained as

V(b) = E[(b − β)(b − β)']
     = E[(X'X)⁻¹X' εε' X(X'X)⁻¹]
     = E{ E[(X'X)⁻¹X' εε' X(X'X)⁻¹ | X] }
     = E[(X'X)⁻¹X' E(εε') X(X'X)⁻¹]
     = E[(X'X)⁻¹X' σ²I X(X'X)⁻¹]
     = σ² E[(X'X)⁻¹].
Thus the covariance matrix involves a mathematical expectation. The unknown σ² can be estimated by

σ̂² = e'e / (n − k) = (y − Xb)'(y − Xb) / (n − k)

where e = y − Xb is the residual vector, and

E(σ̂²) = E{ E(σ̂² | X) }
       = E{ E( e'e/(n − k) | X ) }
       = E(σ²)
       = σ².

Note that the OLSE b = (X'X)⁻¹X'y involves the stochastic matrix X and the stochastic vector y, so b is not a linear estimator. It is also no longer the best linear unbiased estimator of β, as it is when X is nonstochastic. Conditional on the given X, the estimator of σ² is an efficient estimator.

Asymptotic theory:
The asymptotic properties of an estimator concern its behaviour as the sample size n grows large.

To motivate the need for asymptotic theory, we consider an example. Consider the simple linear regression model with one explanatory variable and n observations,

y_i = β₀ + β₁ x_i + ε_i,  E(ε_i) = 0,  Var(ε_i) = σ²,  i = 1, 2, ..., n.

The OLSE of β₁ is

b₁ = ∑_{i=1}^n (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^n (x_i − x̄)²

and its variance is

Var(b₁) = σ² / ∑_{i=1}^n (x_i − x̄)²,

which, since ∑(x_i − x̄)² typically grows proportionally with n, is of order σ²/n. If the sample size grows large, then the variance of b₁ gets smaller. This shrinkage in variance implies that as the sample size n increases, the probability density of the OLSE b₁ collapses around its mean, because Var(b₁) tends to zero.

Let b_{n₁}, b_{n₂} and b_{n₃} be OLSEs based on sample sizes n₁ < n₂ < n₃, say; their densities are increasingly concentrated around β. If c and δ are arbitrarily chosen positive constants, then the probability that the value of b lies within the interval β ± c can be made greater than (1 − δ) for a large value of n. This property is the consistency of b, which ensures that when the sample is very large, we can be confident with high probability that b will yield an estimate close to β.

Probability limit
Let β̂_n be an estimator of β based on a sample of size n, and let γ be any small positive constant. Then, for large n, the requirement that β̂_n takes values with probability almost one in an arbitrarily small neighborhood of the true parameter value β is

lim_{n→∞} P[ |β̂_n − β| < γ ] = 1,

which is denoted as

plim β̂_n = β,

and it is said that β̂_n converges to β in probability. The estimator β̂_n is then said to be a consistent estimator of β.

A sufficient but not necessary condition for β̂_n to be a consistent estimator of β is that

lim_{n→∞} E(β̂_n) = β and lim_{n→∞} Var(β̂_n) = 0.
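A brief simulation sketch of this definition for the OLSE of a slope (the sample sizes, tolerance c and replication count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
beta, c, reps = 2.0, 0.1, 5000

# Empirical P(|b - beta| < c) for growing n: it approaches 1, illustrating plim b = beta.
for n in (25, 100, 400, 1600):
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)
        b = (x @ y) / (x @ x)          # OLSE of the slope in a no-intercept model
        hits += abs(b - beta) < c
    print(n, hits / reps)
```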

Consistency of estimators
Now we look at the consistency of the estimators of β and σ 2 .

(i) Consistency of b
Under the assumption that lim_{n→∞} (X'X/n) = Δ exists as a nonstochastic and nonsingular matrix (with finite elements), we have

lim_{n→∞} V(b) = σ² lim_{n→∞} (1/n) · lim_{n→∞} (X'X/n)⁻¹
              = σ² lim_{n→∞} (1/n) · Δ⁻¹
              = 0.

This implies that the OLSE converges to β in quadratic mean. Thus the OLSE is a consistent estimator of β. This also holds true for the maximum likelihood estimator.

The same conclusion can also be reached using the concept of convergence in probability. The consistency of the OLSE can be obtained under the weaker assumptions that

plim (X'X/n) = Δ*

exists as a nonsingular and nonstochastic matrix, and that

plim (X'ε/n) = 0.

Since

b − β = (X'X)⁻¹X'ε = (X'X/n)⁻¹ (X'ε/n),

we have

plim (b − β) = plim [(X'X/n)⁻¹] · plim (X'ε/n)
            = Δ*⁻¹ · 0
            = 0.

Thus b is a consistent estimator of β. The same is true for the maximum likelihood estimator.

(ii) Consistency of s²
Now we look at the consistency of s² as an estimator of σ². We have

s² = e'e / (n − k)
   = ε'Hε / (n − k),  where H = I − X(X'X)⁻¹X',
   = (1/n) (1 − k/n)⁻¹ [ ε'ε − ε'X(X'X)⁻¹X'ε ]
   = (1 − k/n)⁻¹ [ ε'ε/n − (ε'X/n)(X'X/n)⁻¹(X'ε/n) ].

Note that ε'ε/n = (1/n) ∑_{i=1}^n ε_i², and {ε_i², i = 1, 2, ..., n} is a sequence of independently and identically distributed random variables with mean σ². Using the law of large numbers,

plim (ε'ε/n) = σ²
plim [ (ε'X/n)(X'X/n)⁻¹(X'ε/n) ] = plim (ε'X/n) · plim [(X'X/n)⁻¹] · plim (X'ε/n)
                                 = 0 · Δ*⁻¹ · 0
                                 = 0
⇒ plim (s²) = (1 − 0)⁻¹ (σ² − 0) = σ².

Thus s² is a consistent estimator of σ². The same holds true for the maximum likelihood estimator of σ².
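A quick simulation sketch (illustrative values assumed) showing s² concentrating around σ² as n grows:

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2, k = 2.25, 3
beta = np.array([1.0, 2.0, -0.5])

# s^2 concentrates around sigma^2 = 2.25 as n grows.
for n in (30, 300, 3000, 30000):
    X = rng.normal(size=(n, k))
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    print(n, e @ e / (n - k))
```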

Asymptotic distributions:
Suppose we have a sequence of random variables {α_n} with a corresponding sequence of cumulative distribution functions {F_n}, and a random variable α with cumulative distribution function F. Then α_n converges in distribution to α if F_n converges to F pointwise (at every continuity point of F). In this case, F is called the asymptotic distribution of α_n.

Note that since convergence in probability implies convergence in distribution,

plim α_n = α ⇒ α_n →_D α (α_n tends to α in distribution),

i.e., the asymptotic distribution of α_n is F, which is the distribution of α.
Note that

E(α): mean of the asymptotic distribution
Var(α): variance of the asymptotic distribution
lim_{n→∞} E(α_n): asymptotic mean
lim_{n→∞} E[ α_n − lim_{n→∞} E(α_n) ]²: asymptotic variance.

Asymptotic distribution of sample mean and least squares estimation


Let α_n = Ȳ_n = (1/n) ∑_{i=1}^n Y_i be the sample mean based on a sample of size n. Since the sample mean is a consistent estimator of the population mean μ,

plim Ȳ_n = μ,

which is a constant. Thus the asymptotic distribution of Ȳ_n is the distribution of a constant. This is not a regular distribution, as all the probability mass is concentrated at one point. Thus, as the sample size increases, the distribution of Ȳ_n collapses.

Suppose we consider only one third of the observations in the sample and find the sample mean

Ȳ_n* = (3/n) ∑_{i=1}^{n/3} Y_i.

Then E(Ȳ_n*) = μ and

Var(Ȳ_n*) = (9/n²) ∑_{i=1}^{n/3} Var(Y_i)
          = (9/n²)(n/3) σ²
          = 3σ²/n
          → 0 as n → ∞.

Thus plim Ȳ_n* = μ, and Ȳ_n* has the same degenerate asymptotic distribution as Ȳ_n. Since Var(Ȳ_n*) > Var(Ȳ_n), Ȳ_n is preferred over Ȳ_n*.

Now we observe the asymptotic behaviour of Ȳ_n and Ȳ_n*. Consider the sequences of random variables

α_n = √n (Ȳ_n − μ)
α_n* = √n (Ȳ_n* − μ).

Then for all n we have

E(α_n) = √n E(Ȳ_n − μ) = 0
E(α_n*) = √n E(Ȳ_n* − μ) = 0
Var(α_n) = n E(Ȳ_n − μ)² = n (σ²/n) = σ²
Var(α_n*) = n E(Ȳ_n* − μ)² = n (3σ²/n) = 3σ².

Assuming the population to be normal, the asymptotic distribution of

• α_n is N(0, σ²)
• α_n* is N(0, 3σ²).

So, on the basis of the asymptotic variances, Ȳ_n is again preferable over Ȳ_n*. The central limit theorem can be used to show that α_n will have an asymptotically normal distribution even if the population is not normally distributed.
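A short sketch of this central limit effect, assuming for illustration an exponential population with mean and variance equal to 1:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 400, 20000

# Draw from a skewed (exponential) population: mean 1, variance 1.
samples = rng.exponential(scale=1.0, size=(reps, n))
alpha_n = np.sqrt(n) * (samples.mean(axis=1) - 1.0)

# sqrt(n)(Ybar_n - mu) is approximately N(0, sigma^2) despite the non-normal population.
print(alpha_n.mean(), alpha_n.var())   # near 0 and 1
```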

Also,

√n (Ȳ_n − μ) ~ N(0, σ²)
⇒ Z = √n (Ȳ_n − μ) / σ ~ N(0, 1),

and for a normal population this statement holds for finite samples as well as for the asymptotic distribution.

Consider the ordinary least squares estimator b = (X'X)⁻¹X'y of β in the linear regression model y = Xβ + ε. If X is nonstochastic, then the finite-sample covariance matrix of b is

V(b) = σ² (X'X)⁻¹.

The asymptotic covariance matrix of b, under the assumption that lim_{n→∞} (X'X/n) = Σ_xx exists and is nonsingular, is given by

σ² lim_{n→∞} (X'X)⁻¹ = σ² lim_{n→∞} (1/n) · lim_{n→∞} (X'X/n)⁻¹
                    = σ² · 0 · Σ_xx⁻¹
                    = 0,

which is a null matrix.

Consider the asymptotic distribution of √n (b − β). Then, even if ε is not necessarily normally distributed, asymptotically

√n (b − β) ~ N(0, σ² Σ_xx⁻¹)

n (b − β)' Σ_xx (b − β) / σ² ~ χ²_k.

If X'X/n is considered as an estimator of Σ_xx, then

n (b − β)' (X'X/n) (b − β) / σ² = (b − β)' X'X (b − β) / σ²

is the usual test statistic, as in the case of finite samples with b ~ N(β, σ²(X'X)⁻¹).
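A minimal sketch of computing this statistic (illustrative dimensions; s² is substituted for the unknown σ²) and comparing it with the 95% point of χ²₃:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 200, 3
beta = np.array([1.0, 2.0, -0.5])
X = rng.normal(size=(n, k))
y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - k)                     # s^2 substituted for the unknown sigma^2

# Wald-type statistic (b - beta)' X'X (b - beta) / sigma^2, asymptotically chi^2_k
w = (b - beta) @ X.T @ X @ (b - beta) / s2
print(w)   # below 7.81 (the 95% point of chi^2 with 3 degrees of freedom) most of the time
```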

