
Chapter 13

Asymptotic Theory and Stochastic Regressors


In any regression analysis, the explanatory variables are assumed to be non-stochastic, i.e., fixed in repeated samples. Such an assumption is appropriate for experiments conducted inside laboratories, where the experimenter can control the values of the explanatory variables and obtain repeated observations on the study variable for fixed values of the explanatory variables. In practice, this assumption may not always be satisfied. Sometimes the explanatory variables in a given model are the study variable of another model, so the study variable depends on explanatory variables that are stochastic in nature. In such situations, the statistical inferences drawn from the linear regression model under the assumption of fixed explanatory variables may not remain valid.

We now assume that the explanatory variables are stochastic but uncorrelated with the disturbance term. If they are correlated, the issue is addressed through instrumental variable estimation; such a situation arises, for example, in measurement error models.

Stochastic regressors model


Consider the linear regression model

y = Xβ + ε

where X is an (n × k) matrix of n observations on k explanatory variables X₁, X₂, ..., X_k which are stochastic in nature, y is an (n × 1) vector of n observations on the study variable, β is a (k × 1) vector of regression coefficients and ε is the (n × 1) vector of disturbances. Under the assumptions E(ε) = 0 and V(ε) = σ²I, the distribution of ε_i conditional on x_i' satisfies these properties for all values of X, where x_i' denotes the i-th row of X. This is demonstrated as follows:

Let p(ε_i | x_i') be the conditional probability density function of ε_i given x_i' and p(ε_i) the unconditional probability density function of ε_i. Then

E(ε_i | x_i') = ∫ ε_i p(ε_i | x_i') dε_i
             = ∫ ε_i p(ε_i) dε_i
             = E(ε_i)
             = 0
E(ε_i² | x_i') = ∫ ε_i² p(ε_i | x_i') dε_i
              = ∫ ε_i² p(ε_i) dε_i
              = E(ε_i²)
              = σ².

Here we have used that when ε_i and x_i' are independent, p(ε_i | x_i') = p(ε_i).

Least squares estimation of parameters


The additional assumption that the explanatory variables are stochastic poses no problem in the ordinary least squares estimation of β and σ². The OLSE of β is obtained by minimizing (y − Xβ)'(y − Xβ) with respect to β as

b = (X'X)⁻¹ X'y

and the estimator of σ² is obtained as

s² = (y − Xb)'(y − Xb) / (n − k).
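As a quick numerical sketch (a minimal NumPy example; the dimensions, coefficients and error scale below are illustrative assumptions, not values from the notes), these formulas apply unchanged when X itself is generated randomly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a model with stochastic regressors: X is drawn randomly,
# independently of the disturbances.
n, k = 200, 3
beta = np.array([1.0, 2.0, -0.5])
X = rng.normal(size=(n, k))          # stochastic regressors
eps = rng.normal(scale=1.5, size=n)  # disturbances, independent of X
y = X @ beta + eps

# OLSE of beta: b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Estimator of sigma^2: s^2 = e'e / (n - k)
e = y - X @ b
s2 = e @ e / (n - k)

print("b  =", b)    # close to beta
print("s2 =", s2)   # close to 1.5**2 = 2.25
```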

Maximum likelihood estimation of parameters:


Assuming ε ~ N(0, σ²I) in the model y = Xβ + ε, with X stochastic and independent of ε, the joint probability density function of ε and X can be related to the joint probability density function of y and X as follows:

f(ε, X) = f(ε₁, ε₂, ..., ε_n, x₁', x₂', ..., x_n')
        = [∏_{i=1}^n f(ε_i)] [∏_{i=1}^n f(x_i')]
        = [∏_{i=1}^n f(y_i | x_i')] [∏_{i=1}^n f(x_i')]
        = ∏_{i=1}^n f(y_i | x_i') f(x_i')
        = ∏_{i=1}^n f(y_i, x_i')
        = f(y₁, y₂, ..., y_n, x₁', x₂', ..., x_n')
        = f(y, X).

This implies that the maximum likelihood estimators of β and σ² will be based on

∏_{i=1}^n f(y_i | x_i') = ∏_{i=1}^n f(ε_i),

so they will be the same as those based on the assumption that the ε_i's, i = 1, 2, ..., n, are distributed as N(0, σ²). So the maximum likelihood estimators of β and σ² when the explanatory variables are stochastic are obtained as

β̃ = (X'X)⁻¹ X'y
σ̃² = (y − Xβ̃)'(y − Xβ̃) / n.
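A short sketch (again with assumed illustrative values) highlighting that the ML estimate of β coincides with the OLSE, while σ̃² divides by n rather than n − k:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # ML estimate of beta equals the OLSE
e = y - X @ b
sigma2_ml = e @ e / n        # ML estimator: divisor n (biased downward)
s2 = e @ e / (n - k)         # divisor n - k (unbiased)
print(sigma2_ml, s2)         # sigma2_ml = s2 * (n - k) / n
```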

Alternative approach for deriving the maximum likelihood estimates


Alternatively, the maximum likelihood estimators of β and σ 2 can also be derived using the joint
probability density function of y and X .

Note: In this section the vector x' has order [1 × (k − 1)], i.e., it excludes the intercept term.

Let x_i', i = 1, 2, ..., n, be from a multivariate normal distribution with mean vector μ_x and covariance matrix Σ_xx, i.e., x_i' ~ N(μ_x, Σ_xx), and let the joint distribution of y_i and x_i' be

[ y_i  ]      ( [ μ_y ]   [ σ_yy  Σ_yx ] )
[ x_i' ]  ~ N ( [ μ_x ] , [ Σ_xy  Σ_xx ] ).

Let the linear regression model be

y_i = β₀ + x_i'β₁ + ε_i,  i = 1, 2, ..., n

where x_i' is a [1 × (k − 1)] vector of observations on the random vector x, β₀ is the intercept term and β₁ is the [(k − 1) × 1] vector of regression coefficients. Further, ε_i is the disturbance term with ε_i ~ N(0, σ²), independent of x'.

Suppose

[ y ]      ( [ μ_y ]   [ σ_yy  Σ_yx ] )
[ x ]  ~ N ( [ μ_x ] , [ Σ_xy  Σ_xx ] ).

The joint probability density function of (y, x') is

f(y, x') = [ (2π)^(k/2) |Σ|^(1/2) ]⁻¹ exp{ −(1/2) [(y − μ_y), (x − μ_x)'] Σ⁻¹ [(y − μ_y), (x − μ_x)']' }.

Now using the following result, we find Σ⁻¹.

Result: Let A be a nonsingular matrix which is partitioned suitably as

A = [ B  C ]
    [ D  E ],

where E and F = B − CE⁻¹D are nonsingular matrices. Then

A⁻¹ = [ F⁻¹          −F⁻¹CE⁻¹              ]
      [ −E⁻¹DF⁻¹      E⁻¹ + E⁻¹DF⁻¹CE⁻¹   ].

Note that AA⁻¹ = A⁻¹A = I.
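This identity is easy to check numerically; a minimal sketch (block sizes chosen arbitrarily) is:

```python
import numpy as np

rng = np.random.default_rng(2)

# Partition a random symmetric positive definite matrix A as [[B, C], [D, E]].
p, q = 2, 3
A = rng.normal(size=(p + q, p + q))
A = A @ A.T + (p + q) * np.eye(p + q)   # make A well-conditioned

B, C = A[:p, :p], A[:p, p:]
D, E = A[p:, :p], A[p:, p:]

Einv = np.linalg.inv(E)
F = B - C @ Einv @ D                    # Schur complement of E in A
Finv = np.linalg.inv(F)

Ainv_block = np.block([
    [Finv,              -Finv @ C @ Einv],
    [-Einv @ D @ Finv,   Einv + Einv @ D @ Finv @ C @ Einv],
])

print(np.allclose(Ainv_block, np.linalg.inv(A)))  # True
```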
Thus

Σ⁻¹ = (1/σ²) [ 1               −Σ_yx Σ_xx⁻¹                         ]
             [ −Σ_xx⁻¹ Σ_xy     σ² Σ_xx⁻¹ + Σ_xx⁻¹ Σ_xy Σ_yx Σ_xx⁻¹ ],

where

σ² = σ_yy − Σ_yx Σ_xx⁻¹ Σ_xy.
Then

f(y, x') = [ (2π)^(k/2) |Σ|^(1/2) ]⁻¹ exp( −(1/(2σ²)) [ {(y − μ_y) − (x − μ_x)' Σ_xx⁻¹ Σ_xy}² + σ² (x − μ_x)' Σ_xx⁻¹ (x − μ_x) ] ).

The marginal distribution of x' is obtained by integrating f(y, x') over y; the resulting distribution is the (k − 1)-variate multivariate normal distribution

g(x') = [ (2π)^((k−1)/2) |Σ_xx|^(1/2) ]⁻¹ exp( −(1/2) (x − μ_x)' Σ_xx⁻¹ (x − μ_x) ).

The conditional probability density function of y given x ' is

f(y | x') = f(y, x') / g(x')
          = (2πσ²)^(−1/2) exp( −(1/(2σ²)) {(y − μ_y) − (x − μ_x)' Σ_xx⁻¹ Σ_xy}² ),

which is the probability density function of a normal distribution with
• conditional mean
  E(y | x') = μ_y + (x − μ_x)' Σ_xx⁻¹ Σ_xy and
• conditional variance
  Var(y | x') = σ_yy (1 − ρ²)

where

ρ² = Σ_yx Σ_xx⁻¹ Σ_xy / σ_yy

is the squared population multiple correlation coefficient.
In the model

y = β₀ + x'β₁ + ε,

the conditional mean is

E(y | x') = β₀ + x'β₁ + E(ε | x')
          = β₀ + x'β₁.

Comparing this conditional mean with the conditional mean of the normal distribution, we obtain the relationships for β₀ and β₁ as follows:

β₁ = Σ_xx⁻¹ Σ_xy
β₀ = μ_y − μ_x' β₁.
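As a small worked sketch (all population moments below are assumed illustrative values), these relationships can be evaluated directly:

```python
import numpy as np

# Assumed population moments of a joint normal (y, x') with a two-dimensional x.
mu_y, mu_x = 3.0, np.array([1.0, -2.0])
Sigma_xx = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
Sigma_xy = np.array([0.8, -0.3])
sigma_yy = 1.5

beta1 = np.linalg.solve(Sigma_xx, Sigma_xy)   # beta_1 = Sigma_xx^{-1} Sigma_xy
beta0 = mu_y - mu_x @ beta1                   # beta_0 = mu_y - mu_x' beta_1
sigma2 = sigma_yy - Sigma_xy @ beta1          # sigma^2 = sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy

print(beta0, beta1, sigma2)
```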

The likelihood function of (y, x') based on a sample of size n is

L = [ (2π)^(nk/2) |Σ|^(n/2) ]⁻¹ exp( −(1/2) ∑_{i=1}^n [(y_i − μ_y), (x_i − μ_x)'] Σ⁻¹ [(y_i − μ_y), (x_i − μ_x)']' ).

Maximizing the log-likelihood function with respect to μ_y, μ_x, Σ_xx and Σ_xy, the maximum likelihood estimates of the respective parameters are obtained as

μ̂_y = ȳ = (1/n) ∑_{i=1}^n y_i
μ̂_x = x̄ = (1/n) ∑_{i=1}^n x_i = (x̄₂, x̄₃, ..., x̄_k)'
Σ̂_xx = S_xx = (1/n) ( ∑_{i=1}^n x_i x_i' − n x̄ x̄' )
Σ̂_xy = S_xy = (1/n) ( ∑_{i=1}^n x_i y_i − n x̄ ȳ )

where x_i' = (x_i2, x_i3, ..., x_ik), S_xx is the [(k − 1) × (k − 1)] matrix with elements (1/n) ∑_t (x_ti − x̄_i)(x_tj − x̄_j) and S_xy is the [(k − 1) × 1] vector with elements (1/n) ∑_t (x_ti − x̄_i)(y_t − ȳ).

Based on these estimates, the maximum likelihood estimators of β₁ and β₀ are obtained as

β̂₁ = S_xx⁻¹ S_xy
β̂₀ = ȳ − x̄' β̂₁,

and stacking them gives

β̂ = (β̂₀, β̂₁')' = (X'X)⁻¹ X'y.
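The equivalence of these moment-based estimates with the OLSE is easy to verify numerically; a sketch with assumed illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
n, km1 = 500, 2                       # km1 = k - 1 regressors besides the intercept
x = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.5], [0.5, 1.0]], size=n)
y = 0.7 + x @ np.array([1.5, -0.8]) + rng.normal(size=n)

# Moment-based ML estimates (divisor n)
xbar, ybar = x.mean(axis=0), y.mean()
Sxx = (x - xbar).T @ (x - xbar) / n
Sxy = (x - xbar).T @ (y - ybar) / n
beta1 = np.linalg.solve(Sxx, Sxy)     # beta1_hat = Sxx^{-1} Sxy
beta0 = ybar - xbar @ beta1           # beta0_hat = ybar - xbar' beta1_hat

# OLS with an intercept column gives the same numbers
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(np.concatenate([[beta0], beta1]), b))  # True
```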

Properties of the least squares estimator:


The estimation error of the OLSE b = (X'X)⁻¹X'y of β is

b − β = (X'X)⁻¹X'y − β
      = (X'X)⁻¹X'(Xβ + ε) − β
      = (X'X)⁻¹X'ε.

Then, assuming that E[(X'X)⁻¹X'] exists, we have

E(b − β) = E[(X'X)⁻¹X'ε]
         = E{ E[(X'X)⁻¹X'ε | X] }
         = E[(X'X)⁻¹X' E(ε)]
         = 0

because (X'X)⁻¹X' and ε are independent. So b is an unbiased estimator of β.
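A small Monte Carlo sketch of this unbiasedness (replication count and parameter values are illustrative assumptions), with X redrawn in every replication:

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, -0.5])
n, reps = 40, 20000

est = np.empty((reps, 2))
for r in range(reps):
    X = rng.normal(size=(n, 2))          # X redrawn each replication: stochastic regressors
    y = X @ beta + rng.normal(size=n)    # disturbances independent of X
    est[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(est.mean(axis=0))   # close to beta, illustrating E(b) = beta
```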

Econometrics | Chapter 13 | Asymptotic Theory and Stochastic Regressors | Shalabh, IIT Kanpur

6
The covariance matrix of b is obtained as

V(b) = E[(b − β)(b − β)']
     = E[(X'X)⁻¹X' εε' X(X'X)⁻¹]
     = E{ E[(X'X)⁻¹X' εε' X(X'X)⁻¹ | X] }
     = E[(X'X)⁻¹X' E(εε') X(X'X)⁻¹]
     = E[(X'X)⁻¹X' σ²I X(X'X)⁻¹]
     = σ² E[(X'X)⁻¹].
Thus the covariance matrix involves a mathematical expectation. The unknown σ² can be estimated by

σ̂² = e'e / (n − k) = (y − Xb)'(y − Xb) / (n − k)

where e = y − Xb is the residual vector, and

E(σ̂²) = E{ E(σ̂² | X) }
       = E{ E( e'e/(n − k) | X ) }
       = E(σ²)
       = σ².

Note that the OLSE b = (X'X)⁻¹X'y involves the stochastic matrix X and the stochastic vector y, so b is not a linear estimator. It is also no longer the best linear unbiased estimator of β, as it is when X is nonstochastic. Conditional on the given X, the estimator of σ² is an efficient estimator.

Asymptotic theory:
The asymptotic properties of an estimator concern its behaviour as the sample size n grows large.

To motivate the need for asymptotic theory, we consider an example. Consider the simple linear regression model with one explanatory variable and n observations,

y_i = β₀ + β₁ x_i + ε_i,  E(ε_i) = 0,  Var(ε_i) = σ²,  i = 1, 2, ..., n.

The OLSE of β₁ is

b₁ = ∑_{i=1}^n (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^n (x_i − x̄)²

and its variance is

Var(b₁) = σ² / ∑_{i=1}^n (x_i − x̄)²,

which, since ∑(x_i − x̄)² typically grows proportionally with n, is of order σ²/n. If the sample size grows large, then the variance of b₁ gets smaller. This shrinkage in variance implies that as the sample size n increases, the probability density of the OLSE b₁ collapses around its mean, because Var(b₁) tends to zero.

Let b_{n₁}, b_{n₂} and b_{n₃} be OLSEs based on sample sizes n₁ < n₂ < n₃, say; their densities are increasingly concentrated around β. If c and δ are arbitrarily chosen positive constants, then the probability that the value of b lies within the interval β ± c can be made greater than (1 − δ) for a large value of n. This property is the consistency of b, which ensures that when the sample is very large, we can be confident with high probability that b will yield an estimate close to β.

Probability limit
Let β̂_n be an estimator of β based on a sample of size n, and let γ be any small positive constant. Then, for large n, the requirement that β̂_n takes values with probability almost one in an arbitrarily small neighborhood of the true parameter value β is

lim_{n→∞} P[ |β̂_n − β| < γ ] = 1,

which is denoted as

plim β̂_n = β,

and it is said that β̂_n converges to β in probability. The estimator β̂_n is then said to be a consistent estimator of β.

A sufficient but not necessary condition for β̂_n to be a consistent estimator of β is that

lim_{n→∞} E(β̂_n) = β and lim_{n→∞} Var(β̂_n) = 0.
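A brief simulation sketch of this definition for the OLSE of a slope (the sample sizes, tolerance c and replication count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
beta, c, reps = 2.0, 0.1, 5000

# Empirical P(|b - beta| < c) for growing n: it approaches 1, illustrating plim b = beta.
for n in (25, 100, 400, 1600):
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)
        b = (x @ y) / (x @ x)          # OLSE of the slope in a no-intercept model
        hits += abs(b - beta) < c
    print(n, hits / reps)
```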

Consistency of estimators
Now we look at the consistency of the estimators of β and σ 2 .

(i) Consistency of b
Under the assumption that lim_{n→∞} (X'X/n) = Δ exists as a nonstochastic and nonsingular matrix (with finite elements), we have

lim_{n→∞} V(b) = σ² lim_{n→∞} (1/n) · lim_{n→∞} (X'X/n)⁻¹
              = σ² lim_{n→∞} (1/n) · Δ⁻¹
              = 0.

This implies that the OLSE converges to β in quadratic mean. Thus the OLSE is a consistent estimator of β. This also holds true for the maximum likelihood estimator.

The same conclusion can also be reached using the concept of convergence in probability. The consistency of the OLSE can be obtained under the weaker assumptions that

plim (X'X/n) = Δ*

exists as a nonsingular and nonstochastic matrix, and that

plim (X'ε/n) = 0.

Since

b − β = (X'X)⁻¹X'ε = (X'X/n)⁻¹ (X'ε/n),

we have

plim (b − β) = plim [(X'X/n)⁻¹] · plim (X'ε/n)
            = Δ*⁻¹ · 0
            = 0.

Thus b is a consistent estimator of β. The same is true for the maximum likelihood estimator.

(ii) Consistency of s²
Now we look at the consistency of s² as an estimator of σ². We have

s² = e'e / (n − k)
   = ε'Hε / (n − k),  where H = I − X(X'X)⁻¹X',
   = (1/n) (1 − k/n)⁻¹ [ ε'ε − ε'X(X'X)⁻¹X'ε ]
   = (1 − k/n)⁻¹ [ ε'ε/n − (ε'X/n)(X'X/n)⁻¹(X'ε/n) ].

Note that ε'ε/n = (1/n) ∑_{i=1}^n ε_i², and {ε_i², i = 1, 2, ..., n} is a sequence of independently and identically distributed random variables with mean σ². Using the law of large numbers,

plim (ε'ε/n) = σ²
plim [ (ε'X/n)(X'X/n)⁻¹(X'ε/n) ] = plim (ε'X/n) · plim [(X'X/n)⁻¹] · plim (X'ε/n)
                                 = 0 · Δ*⁻¹ · 0
                                 = 0
⇒ plim (s²) = (1 − 0)⁻¹ (σ² − 0) = σ².

Thus s² is a consistent estimator of σ². The same holds true for the maximum likelihood estimator of σ².
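A quick simulation sketch (illustrative values assumed) showing s² concentrating around σ² as n grows:

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2, k = 2.25, 3
beta = np.array([1.0, 2.0, -0.5])

# s^2 concentrates around sigma^2 = 2.25 as n grows.
for n in (30, 300, 3000, 30000):
    X = rng.normal(size=(n, k))
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    print(n, e @ e / (n - k))
```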

Asymptotic distributions:
Suppose we have a sequence of random variables {α_n} with a corresponding sequence of cumulative distribution functions {F_n}, and a random variable α with cumulative distribution function F. Then α_n converges in distribution to α if F_n converges to F pointwise (at every continuity point of F). In this case, F is called the asymptotic distribution of α_n.

Note that since convergence in probability implies convergence in distribution,

plim α_n = α ⇒ α_n →_D α (α_n tends to α in distribution),

i.e., the asymptotic distribution of α_n is F, which is the distribution of α.
Note that

E(α): mean of the asymptotic distribution
Var(α): variance of the asymptotic distribution
lim_{n→∞} E(α_n): asymptotic mean
lim_{n→∞} E[ α_n − lim_{n→∞} E(α_n) ]²: asymptotic variance.

Asymptotic distribution of sample mean and least squares estimation


Let α_n = Ȳ_n = (1/n) ∑_{i=1}^n Y_i be the sample mean based on a sample of size n. Since the sample mean is a consistent estimator of the population mean μ,

plim Ȳ_n = μ,

which is a constant. Thus the asymptotic distribution of Ȳ_n is the distribution of a constant. This is not a regular distribution, as all the probability mass is concentrated at one point. Thus, as the sample size increases, the distribution of Ȳ_n collapses.

Suppose we consider only one third of the observations in the sample and find the sample mean

Ȳ_n* = (3/n) ∑_{i=1}^{n/3} Y_i.

Then E(Ȳ_n*) = μ and

Var(Ȳ_n*) = (9/n²) ∑_{i=1}^{n/3} Var(Y_i)
          = (9/n²)(n/3) σ²
          = 3σ²/n
          → 0 as n → ∞.

Thus plim Ȳ_n* = μ, and Ȳ_n* has the same degenerate asymptotic distribution as Ȳ_n. Since Var(Ȳ_n*) > Var(Ȳ_n), Ȳ_n is preferred over Ȳ_n*.

Now we observe the asymptotic behaviour of Ȳ_n and Ȳ_n*. Consider the sequences of random variables

α_n = √n (Ȳ_n − μ)
α_n* = √n (Ȳ_n* − μ).

Then for all n we have

E(α_n) = √n E(Ȳ_n − μ) = 0
E(α_n*) = √n E(Ȳ_n* − μ) = 0
Var(α_n) = n E(Ȳ_n − μ)² = n (σ²/n) = σ²
Var(α_n*) = n E(Ȳ_n* − μ)² = n (3σ²/n) = 3σ².

Assuming the population to be normal, the asymptotic distribution of

• α_n is N(0, σ²)
• α_n* is N(0, 3σ²).

So, on the basis of the asymptotic variances, Ȳ_n is again preferable over Ȳ_n*. The central limit theorem can be used to show that α_n will have an asymptotically normal distribution even if the population is not normally distributed.
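A short sketch of this central limit effect, assuming for illustration an exponential population with mean and variance equal to 1:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 400, 20000

# Draw from a skewed (exponential) population: mean 1, variance 1.
samples = rng.exponential(scale=1.0, size=(reps, n))
alpha_n = np.sqrt(n) * (samples.mean(axis=1) - 1.0)

# sqrt(n)(Ybar_n - mu) is approximately N(0, sigma^2) despite the non-normal population.
print(alpha_n.mean(), alpha_n.var())   # near 0 and 1
```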

Also,

√n (Ȳ_n − μ) ~ N(0, σ²)
⇒ Z = √n (Ȳ_n − μ) / σ ~ N(0, 1),

and for a normal population this statement holds for finite samples as well as for the asymptotic distribution.

Consider the ordinary least squares estimator b = (X'X)⁻¹X'y of β in the linear regression model y = Xβ + ε. If X is nonstochastic, then the finite-sample covariance matrix of b is

V(b) = σ² (X'X)⁻¹.

The asymptotic covariance matrix of b, under the assumption that lim_{n→∞} (X'X/n) = Σ_xx exists and is nonsingular, is given by

σ² lim_{n→∞} (X'X)⁻¹ = σ² lim_{n→∞} (1/n) · lim_{n→∞} (X'X/n)⁻¹
                    = σ² · 0 · Σ_xx⁻¹
                    = 0,

which is a null matrix.

Consider the asymptotic distribution of √n (b − β). Then, even if ε is not necessarily normally distributed, asymptotically

√n (b − β) ~ N(0, σ² Σ_xx⁻¹)

n (b − β)' Σ_xx (b − β) / σ² ~ χ²_k.

If X'X/n is considered as an estimator of Σ_xx, then

n (b − β)' (X'X/n) (b − β) / σ² = (b − β)' X'X (b − β) / σ²

is the usual test statistic, as in the case of finite samples with b ~ N(β, σ²(X'X)⁻¹).
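A minimal sketch of computing this statistic (illustrative dimensions; s² is substituted for the unknown σ²) and comparing it with the 95% point of χ²₃:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 200, 3
beta = np.array([1.0, 2.0, -0.5])
X = rng.normal(size=(n, k))
y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - k)                     # s^2 substituted for the unknown sigma^2

# Wald-type statistic (b - beta)' X'X (b - beta) / sigma^2, asymptotically chi^2_k
w = (b - beta) @ X.T @ X @ (b - beta) / s2
print(w)   # below 7.81 (the 95% point of chi^2 with 3 degrees of freedom) most of the time
```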

