We now assume that the explanatory variables are stochastic but uncorrelated with the disturbance term. If they are correlated, the issue is addressed through instrumental variable estimation; such a situation arises, for example, in measurement error models.
regression coefficients and ε is the (n × 1) vector of disturbances. Under the assumptions E(ε) = 0 and V(ε) = σ²I, the distribution of εi, conditional on xi', satisfies these properties for all values of xi'.
Let p(εi | xi') be the conditional probability density function of εi given xi' and p(εi) be the unconditional probability density function of εi. Then

E(εi | xi') = ∫ εi p(εi | xi') dεi
            = ∫ εi p(εi) dεi
            = E(εi)
            = 0.
Econometrics | Chapter 13 | Asymptotic Theory and Stochastic Regressors | Shalabh, IIT Kanpur
E ( ε i2 | xi' ) = ∫ ε i2 p ( ε i | xi' ) d ε i
= ∫ ε i2 p ( ε i ) d ε i
= E ( ε i2 )
= σ 2.
with respect to β, as the joint probability density function of ε and X can be derived from the joint probability density function of y and X as follows:

f(y, X) = ∏_{i=1}^{n} f(yi | xi') f(xi')
        = ∏_{i=1}^{n} f(yi, xi').
This implies that the maximum likelihood estimators of β and σ² will be based on

∏_{i=1}^{n} f(yi | xi') = ∏_{i=1}^{n} f(εi),

so they will be the same as those based on the assumption that the εi, i = 1, 2, ..., n are distributed as N(0, σ²). The maximum likelihood estimators of β and σ² when the explanatory variables are stochastic are therefore obtained as

β̂ = (X'X)⁻¹ X'y
σ̂² = (1/n)(y − Xβ̂)'(y − Xβ̂).
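As a quick numerical sketch of these estimators (simulated data; the sizes, seed, and parameter values below are illustrative, not from the text), they can be computed with NumPy:

```python
import numpy as np

# Simulated illustration: stochastic regressors X, with arbitrary true
# beta and sigma chosen for the sketch.
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # stochastic X
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5
y = X @ beta + rng.normal(scale=sigma, size=n)

# ML estimators under normality of the disturbances:
# beta_hat = (X'X)^{-1} X'y  and  sigma2_hat = (y - X beta_hat)'(y - X beta_hat)/n
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = (resid @ resid) / n  # note the divisor n (MLE), not n - k
```

Note the divisor n in σ̂²; the unbiased variant with divisor n − k appears later in the chapter.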
Note: The vector x' is represented with an underscore in this section to denote that its order is 1 × (k − 1).
Let xi', i = 1, 2, ..., n be observations from a multivariate normal distribution with mean vector μx and covariance matrix Σxx, and consider the model

yi = β0 + xi'β1 + εi,

where xi' is a 1 × (k − 1) vector of observations on the random vector x, β0 is the intercept term and β1 is the (k − 1) × 1 vector of regression coefficients. Further, εi is the disturbance term with εi ~ N(0, σ²) and is independent of x'. Jointly,

(yi, xi')' ~ N(μ, Σ), where μ = (μy, μx')' and Σ is partitioned as

Σ = [ σyy  Σyx
      Σxy  Σxx ].
Suppose

(y, x')' ~ N(μ, Σ), μ = (μy, μx')', Σ = [ σyy  Σyx
                                          Σxy  Σxx ].

Then the joint probability density function of y and x' is

f(y, x') = (2π)^{−k/2} |Σ|^{−1/2} exp( −(1/2) z'Σ⁻¹z ),  z = (y − μy, (x − μx)')'.
The inverse of the partitioned covariance matrix is

Σ⁻¹ = (1/σ²) [ 1             −Σyx Σxx⁻¹
               −Σxx⁻¹Σxy     σ²Σxx⁻¹ + Σxx⁻¹Σxy Σyx Σxx⁻¹ ],

where

σ² = σyy − Σyx Σxx⁻¹ Σxy.
Then
f(y, x') = (2π)^{−k/2} |Σ|^{−1/2} exp( −(1/(2σ²)) { [ (y − μy) − (x − μx)'Σxx⁻¹Σxy ]² + σ² (x − μx)'Σxx⁻¹(x − μx) } ).
The marginal distribution of x' is obtained by integrating f(y, x') over y, and the resulting distribution is

g(x') = (2π)^{−(k−1)/2} |Σxx|^{−1/2} exp( −(1/2)(x − μx)'Σxx⁻¹(x − μx) ).
The conditional probability density function of y given x' is

f(y | x') = f(y, x') / g(x')
          = (1/√(2πσ²)) exp( −(1/(2σ²)) [ (y − μy) − (x − μx)'Σxx⁻¹Σxy ]² ),

which is the probability density function of a normal distribution with
• conditional mean
  E(y | x') = μy + (x − μx)'Σxx⁻¹Σxy and
• conditional variance
  Var(y | x') = σyy (1 − ρ²),
where

ρ² = Σyx Σxx⁻¹ Σxy / σyy

is the population multiple correlation coefficient.
In the model
y=β 0 + x ' β1 + ε ,
the conditional mean is
E(y | x') = β0 + x'β1 + E(ε | x')
          = β0 + x'β1.
Comparing this conditional mean with the conditional mean of the normal distribution, we obtain the relationships between β0, β1 and the population parameters as follows:

β1 = Σxx⁻¹ Σxy
β0 = μy − μx'β1.
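These relationships can be checked numerically. In the sketch below (all population parameters are hypothetical illustrations, not from the text), the regression coefficients computed from the partitioned covariance matrix are compared with OLS on a large simulated sample from the joint normal distribution:

```python
import numpy as np

# Hypothetical population parameters for (y, x2, x3); arbitrary values.
rng = np.random.default_rng(1)
mu = np.array([3.0, 1.0, -2.0])                  # (mu_y, mu_x')
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.5],
                  [0.8, 0.5, 1.0]])              # [[s_yy, S_yx], [S_xy, S_xx]]

Sigma_xx = Sigma[1:, 1:]
Sigma_xy = Sigma[1:, 0]
beta1 = np.linalg.solve(Sigma_xx, Sigma_xy)      # beta_1 = Sigma_xx^{-1} Sigma_xy
beta0 = mu[0] - mu[1:] @ beta1                   # beta_0 = mu_y - mu_x' beta_1

# OLS on a large simulated sample should approximate (beta0, beta1).
z = rng.multivariate_normal(mu, Sigma, size=200_000)
y, x = z[:, 0], z[:, 1:]
X = np.column_stack([np.ones(len(y)), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
```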
The likelihood function based on the n observations is

L = (2π)^{−nk/2} |Σ|^{−n/2} exp( −(1/2) ∑_{i=1}^{n} zi'Σ⁻¹zi ),  zi = (yi − μy, (xi − μx)')'.

Maximizing the log-likelihood function with respect to μy, μx, Σxx and Σxy, the maximum likelihood estimators are obtained as
μ̂y = ȳ = (1/n) ∑_{i=1}^{n} yi
μ̂x = x̄ = (1/n) ∑_{i=1}^{n} xi = (x̄2, x̄3, ..., x̄k)'
Σ̂xx = Sxx = (1/n) ( ∑_{i=1}^{n} xi xi' − n x̄ x̄' )
Σ̂xy = Sxy = (1/n) ( ∑_{i=1}^{n} xi yi − n x̄ ȳ )
where xi' = (xi2, xi3, ..., xik), Sxx is a [(k − 1) × (k − 1)] matrix with elements (1/n) ∑_t (xti − x̄i)(xtj − x̄j), and Sxy is a [(k − 1) × 1] vector with elements (1/n) ∑_t (xti − x̄i)(yt − ȳ).
Based on these estimates, the maximum likelihood estimators of β1 and β0 are obtained as

β̂1 = Sxx⁻¹ Sxy
β̂0 = ȳ − x̄'β̂1,

which can be stacked as β̂ = (β̂0, β̂1')' = (X'X)⁻¹ X'y.
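The equivalence of the stacked estimator with (X'X)⁻¹X'y is an exact algebraic identity, which a short NumPy check confirms (the simulated data and parameter values are illustrative only):

```python
import numpy as np

# Numerical check that stacking beta0_hat and beta1_hat reproduces
# the full OLS/ML solution (X'X)^{-1} X'y.
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=(n, 2))                      # k - 1 = 2 stochastic regressors
y = 1.0 + x @ np.array([0.7, -0.3]) + rng.normal(size=n)

xbar, ybar = x.mean(axis=0), y.mean()
S_xx = (x.T @ x) / n - np.outer(xbar, xbar)      # (1/n)(sum x_i x_i' - n xbar xbar')
S_xy = (x.T @ y) / n - xbar * ybar               # (1/n)(sum x_i y_i - n xbar ybar)
beta1_hat = np.linalg.solve(S_xx, S_xy)
beta0_hat = ybar - xbar @ beta1_hat

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)            # (X'X)^{-1} X'y
```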
The OLSE is b = (X'X)⁻¹X'y, so

b − β = (X'X)⁻¹X'y − β
      = (X'X)⁻¹X'(Xβ + ε) − β
      = (X'X)⁻¹X'ε.

Its expectation is

E(b − β) = E[ (X'X)⁻¹X'ε ]
         = E{ E[ (X'X)⁻¹X'ε | X ] }
         = E[ (X'X)⁻¹X' E(ε | X) ]
         = 0,

since E(ε | X) = E(ε) = 0. Thus b is unbiased.
The covariance matrix of b is obtained as

V(b) = E[ (b − β)(b − β)' ]
     = E[ (X'X)⁻¹X' εε' X(X'X)⁻¹ ]
     = E{ E[ (X'X)⁻¹X' εε' X(X'X)⁻¹ | X ] }
     = E[ (X'X)⁻¹X' E(εε') X(X'X)⁻¹ ]
     = E[ (X'X)⁻¹X' (σ²I) X(X'X)⁻¹ ]
     = σ² E[ (X'X)⁻¹ ].
Thus the covariance matrix involves a mathematical expectation. The unknown σ² can be estimated by

σ̂² = e'e / (n − k)
    = (y − Xb)'(y − Xb) / (n − k),

where e = y − Xb is the residual, and

E(σ̂²) = E[ E(σ̂² | X) ]
       = E[ E( e'e/(n − k) | X ) ]
       = E(σ²)
       = σ².
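A small Monte Carlo sketch can illustrate this unbiasedness with stochastic regressors (sample size, replication count, and parameter values below are arbitrary illustrations): drawing a fresh X in every replication, the average of e'e/(n − k) stays close to the true σ².

```python
import numpy as np

# Monte Carlo: unbiasedness of sigma2_hat = e'e/(n-k) with a stochastic X
# redrawn in each replication. All numbers are illustrative.
rng = np.random.default_rng(3)
n, k, sigma2, reps = 30, 3, 4.0, 5_000
beta = np.array([1.0, 2.0, 3.0])
s2 = np.empty(reps)
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    s2[r] = (e @ e) / (n - k)
```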
Note that the OLSE b = (X'X)⁻¹X'y involves the stochastic matrix X and the stochastic vector y, so b is not a linear estimator. It is also no longer the best linear unbiased estimator of β, as it is when X is nonstochastic.
Asymptotic theory:
The asymptotic properties of an estimator concern the behaviour of the estimator as the sample size n grows large.
For the need and understanding of asymptotic theory, we consider an example. Consider the simple linear
regression model with one explanatory variable and n observations as
yi =β 0 + β1 xi + ε i , E ( ε i ) =0, Var ( ε i ) =σ 2 , i =1, 2,..., n.
The OLSE of β1 is

b1 = ∑_{i=1}^{n} (xi − x̄)(yi − ȳ) / ∑_{i=1}^{n} (xi − x̄)²

and its variance is

Var(b1) = σ² / ∑_{i=1}^{n} (xi − x̄)².
If the sample size grows large, the variance of b1 gets smaller. This shrinkage implies that as the sample size n increases, the probability density of the OLSE b collapses around its mean, because Var(b) approaches zero.
Let there be three OLSEs b1, b2 and b3 based on sample sizes n1, n2 and n3 respectively, with n1 < n2 < n3, say. If c and δ are arbitrarily chosen positive constants, then the probability that the value of b lies within the interval β ± c can be made greater than (1 − δ) for a sufficiently large value of n. This property is the consistency of b, which ensures that for a very large sample we can be confident, with high probability, that b will yield an estimate close to β.
Probability in limit
Let β̂n be an estimator of β based on a sample of size n, and let γ be any small positive constant. Then, for large n, the requirement is that β̂n takes values within an arbitrarily small neighbourhood of β with probability approaching one, i.e.,

lim_{n→∞} P( |β̂n − β| < γ ) = 1,

and it is said that β̂n converges to β in probability. The estimator β̂n is then said to be a consistent estimator of β.

A sufficient but not necessary condition for β̂n to be a consistent estimator of β is that

lim_{n→∞} E(β̂n) = β and lim_{n→∞} Var(β̂n) = 0.
Consistency of estimators
Now we look at the consistency of the estimators of β and σ 2 .
(i) Consistency of b
Under the assumption that lim_{n→∞} X'X/n = Δ exists as a nonstochastic and nonsingular matrix (with finite elements), we have

lim_{n→∞} V(b) = σ² lim_{n→∞} (1/n) (X'X/n)⁻¹
               = σ² lim_{n→∞} (1/n) Δ⁻¹
               = 0.
This implies that the OLSE converges to β in quadratic mean, and hence the OLSE is a consistent estimator of β. The same holds for the maximum likelihood estimator. The same conclusion can also be reached using the concept of convergence in probability.
The consistency of the OLSE can be obtained under the weaker assumptions that

plim(X'X/n) = Δ*

exists and is a nonsingular and nonstochastic matrix, and that

plim(X'ε/n) = 0.

Since

b − β = (X'X)⁻¹X'ε = (X'X/n)⁻¹ (X'ε/n),

we have

plim(b − β) = plim(X'X/n)⁻¹ · plim(X'ε/n)
            = Δ*⁻¹ · 0
            = 0.

Thus b is a consistent estimator of β. The same holds for the maximum likelihood estimator.
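This consistency is easy to see in simulation (sample sizes, seed, and parameters below are illustrative): as n grows, (X'X/n)⁻¹ stabilises and X'ε/n shrinks, so the OLSE lands ever closer to the true β.

```python
import numpy as np

# Sketch of consistency: the maximum absolute deviation of the OLSE from
# the true beta shrinks as the sample size grows. Values are illustrative.
rng = np.random.default_rng(4)
beta = np.array([1.0, 2.0])

def max_dev(n):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return np.abs(b - beta).max()

devs = [max_dev(n) for n in (100, 10_000, 1_000_000)]
```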
(ii) Consistency of s²
Now we look at the consistency of s² as an estimator of σ². We have

s² = e'e / (n − k)
   = ε'Hε / (n − k),  where H = I − X(X'X)⁻¹X',
   = (1 − k/n)⁻¹ (1/n) [ ε'ε − ε'X(X'X)⁻¹X'ε ]
   = (1 − k/n)⁻¹ [ ε'ε/n − (ε'X/n)(X'X/n)⁻¹(X'ε/n) ].
Note that ε'ε/n = (1/n) ∑_{i=1}^{n} εi², and {εi², i = 1, 2, ..., n} is a sequence of independently and identically distributed random variables with mean σ². Using the law of large numbers,

plim(ε'ε/n) = σ².
Further,

plim[ (ε'X/n)(X'X/n)⁻¹(X'ε/n) ] = plim(ε'X/n) · plim(X'X/n)⁻¹ · plim(X'ε/n)
                                = 0 · Δ*⁻¹ · 0
                                = 0.

⇒ plim(s²) = (1 − 0)⁻¹ (σ² − 0) = σ².
Thus s² is a consistent estimator of σ². The same holds for the maximum likelihood estimator.
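A companion sketch for s² (illustrative sample sizes and parameters, with σ² = 2 assumed for the simulation): as n grows, e'e/(n − k) settles down to the true σ².

```python
import numpy as np

# Sketch: consistency of s^2 = e'e/(n-k). All numbers are illustrative.
rng = np.random.default_rng(5)
sigma2 = 2.0

def s2(n):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([0.5, 1.5]) + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    return (e @ e) / (n - 2)

vals = [s2(n) for n in (50, 5_000, 500_000)]
```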
Asymptotic distributions:
Suppose we have a sequence of random variables {αn} with a corresponding sequence of cumulative distribution functions {Fn}, and a random variable α with cumulative distribution function F. Then αn converges in distribution to α if Fn converges to F pointwise (at every continuity point of F). In this case, F is called the asymptotic distribution of αn.
Note that

E(α): mean of the asymptotic distribution,
lim_{n→∞} E[ αn − lim_{n→∞} E(αn) ]²: asymptotic variance.
Let Yn = (1/n) ∑_{i=1}^{n} Yi be the sample mean of n observations with mean Y and variance σ². Then plim Yn = Y, which is constant. Thus the asymptotic distribution of Yn is the distribution of a constant. This is not a regular distribution, as all the probability mass is concentrated at one point. Thus, as the sample size increases, the distribution of Yn collapses.
Suppose we consider only one third of the observations in the sample and find the sample mean as

Yn* = (3/n) ∑_{i=1}^{n/3} Yi.

Then E(Yn*) = Y and Var(Yn*) = 3σ²/n.
Thus plim Yn* = Y, and Yn* has the same degenerate asymptotic distribution as Yn, even though Var(Yn*) > Var(Yn); the degenerate distributions alone cannot tell us that Yn should be preferred over Yn*.
Now we observe the asymptotic behaviour of Yn and Yn*. Consider the sequences of random variables

αn = √n (Yn − Y),
αn* = √n (Yn* − Y).

Then

E(αn) = √n E(Yn − Y) = 0,
E(αn*) = √n E(Yn* − Y) = 0,
Var(αn) = n E(Yn − Y)² = n (σ²/n) = σ²,
Var(αn*) = n E(Yn* − Y)² = n (3σ²/n) = 3σ².

The asymptotic distribution of
• αn is N(0, σ²), and
• αn* is N(0, 3σ²).
So now, comparing the asymptotic variances, Yn is preferable over Yn*. The central limit theorem can be used to show that αn has an asymptotically normal distribution even if the population is not normally distributed.
Also, since

√n (Yn − Y) ~ N(0, σ²)
⇒ Z = √n (Yn − Y)/σ ~ N(0, 1),

and this statement holds true for the finite-sample as well as the asymptotic distribution.
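The central limit theorem claim above can be sketched numerically for a decidedly non-normal population (an Exp(1) population is assumed here purely for illustration, so mean Y = 1 and σ = 1):

```python
import numpy as np

# CLT sketch: for a skewed Exp(1) population, the standardised sample
# mean Z = sqrt(n)(Yn - Y)/sigma is approximately N(0, 1).
rng = np.random.default_rng(6)
n, reps = 500, 10_000
Y = rng.exponential(scale=1.0, size=(reps, n))
Z = np.sqrt(n) * (Y.mean(axis=1) - 1.0) / 1.0
coverage = np.mean(np.abs(Z) < 1.96)             # should be near 0.95
```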
Assume that lim_{n→∞} X'X/n = Σxx exists and is nonsingular. The asymptotic covariance matrix of b is then

σ² lim_{n→∞} (X'X)⁻¹ = σ² lim_{n→∞} (1/n) lim_{n→∞} (X'X/n)⁻¹
                     = σ² · 0 · Σxx⁻¹
                     = 0,

which is a null matrix.
Consider the asymptotic distribution of √n (b − β). Even if ε is not necessarily normally distributed, asymptotically

√n (b − β) ~ N(0, σ² Σxx⁻¹)

n (b − β)' Σxx (b − β) / σ² ~ χ²k.
If X'X/n is considered as an estimator of Σxx, then

n (b − β)' (X'X/n) (b − β) / σ² = (b − β)' X'X (b − β) / σ²

is the usual test statistic, as in the case of finite samples with b ~ N(β, σ²(X'X)⁻¹).
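The chi-square behaviour of this statistic can be sketched by Monte Carlo (the sizes and parameters below are illustrative, with normal disturbances and σ² = 1 assumed): with k = 2 regression coefficients, the statistic's average over replications should be near 2.

```python
import numpy as np

# Monte Carlo: (b - beta)' X'X (b - beta) / sigma^2 behaves like a
# chi-square variable with k = 2 degrees of freedom (mean 2).
rng = np.random.default_rng(7)
n, k, reps = 50, 2, 5_000
beta = np.array([1.0, -1.0])
W = np.empty(reps)
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    W[r] = (b - beta) @ (X.T @ X) @ (b - beta)   # sigma^2 = 1 here
```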