Lecture 3
3.1 Random Vectors and the Multivariate Normal Distribution
For a random vector $X = (X_1, \ldots, X_p)^T$, the expectation $E(X)$ is defined elementwise, and the variance is defined as $\operatorname{Var}(X) = E[(X - E(X))(X - E(X))^T]$, where the expectation of a random matrix is also defined elementwise, i.e., the $(i, j)$th element of the expectation is simply the expectation of the $(i, j)$th element.
Note: $\operatorname{Var}(X)$ is sometimes called the variance-covariance matrix of $X$ since its $i$th diagonal element is $\operatorname{Var}(X_i)$, while its $(i, j)$th off-diagonal element is $\operatorname{Cov}(X_i, X_j)$.
The various properties of univariate expectations and variances have multivariate extensions. Suppose that $X$ and $Y$ are random vectors of length $p$ and $q$, respectively. Also suppose that $a \in \mathbb{R}^m$, that $B$ is an $m \times p$ matrix, and that $C$ is an $m \times q$ matrix. Then the following properties hold by the definitions above and the corresponding properties for (scalar-valued) expectations and variances:
$$E(a + BX + CY) = a + B\,E(X) + C\,E(Y),$$
$$\operatorname{Var}(a + BX) = B \operatorname{Var}(X)\, B^T.$$
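As a quick numerical illustration, the following Python sketch checks the second property by Monte Carlo; the particular matrices and sample sizes are arbitrary choices for the example, not part of the notes.

```python
# Minimal simulation sketch checking Var(a + BX) = B Var(X) B^T.
# All specific values (V, B, a, n) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 3, 2, 200_000

V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])    # Var(X)
B = rng.normal(size=(m, p))        # arbitrary m x p matrix
a = np.ones(m)

X = rng.multivariate_normal(np.zeros(p), V, size=n)  # n draws of X
Y = a + X @ B.T                                      # a + BX for each draw

print(np.cov(Y, rowvar=False))     # empirical Var(a + BX)
print(B @ V @ B.T)                 # theoretical B Var(X) B^T; should agree
```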
The multivariate normal distribution $N_p(\mu, V)$ has pdf
$$ f(z) = \frac{1}{(2\pi)^{p/2} (\det V)^{1/2}} \exp\!\left[ -\frac{1}{2} (z - \mu)^T V^{-1} (z - \mu) \right] $$
for all $z \in \mathbb{R}^p$.
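The following sketch evaluates this pdf formula directly and compares it against scipy's implementation; the specific $\mu$, $V$, and evaluation point are illustrative.

```python
# Sketch: evaluate the N_p(mu, V) pdf above by hand and compare to scipy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
z = np.array([0.5, 0.0])

p = len(mu)
quad = (z - mu) @ np.linalg.solve(V, z - mu)   # (z - mu)^T V^{-1} (z - mu)
f = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(V)))

print(f)                                           # direct evaluation
print(multivariate_normal.pdf(z, mean=mu, cov=V))  # same value
```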
3.2 The Sample Mean and Sample Variance
Let $X_1, \ldots, X_n \sim$ iid $N(\mu, \sigma^2)$, and define
$$ \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n-1} \left[\, \sum_{i=1}^n X_i^2 - n (\bar{X})^2 \right], \tag{3.2.1} $$
called (respectively) the sample mean and sample variance. By basic results on the normal distribution, it is clear that $\bar{X} \sim N(\mu, \sigma^2/n)$. To discuss the distribution of $S^2$ (and eventually the joint distribution of $\bar{X}$ and $S^2$), we must first define another probability distribution.
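For reference, here is a small sketch computing both quantities in (3.2.1) with numpy; note that the $n - 1$ denominator corresponds to `ddof=1`. The data-generating values are arbitrary.

```python
# Sketch: the sample mean and sample variance of (3.2.1).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 5.0, 2.0, 50
x = rng.normal(mu, sigma, size=n)

xbar = x.mean()                                    # sample mean
s2 = x.var(ddof=1)                                 # sample variance, n-1 denominator
s2_alt = (np.sum(x**2) - n * xbar**2) / (n - 1)    # second form in (3.2.1)

print(xbar, s2, s2_alt)                            # s2 and s2_alt agree
```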
Chi-Squared Distribution
Let $Z \sim N_p(0_p, I_p)$. The distribution of $Z^T Z = \sum_{i=1}^p Z_i^2$ is called the chi-squared distribution with $p$ degrees of freedom, which we write as $\chi^2_p$. The following lemmas develop some
properties of the chi-squared distribution.
Lemma 3.2.1. The $\chi^2_1$ distribution is the Gamma(1/2, 1/2) distribution.
Proof. Let $Z \sim N(0, 1)$. Then the pdf of $Z^2$ is
$$ f^{(Z^2)}(u) = \frac{2 f^{(Z)}(\sqrt{u})}{2\sqrt{u}} = \frac{1}{\sqrt{2\pi u}} \exp\!\left(-\frac{u}{2}\right) = \frac{(1/2)^{1/2}}{\Gamma(1/2)}\, u^{-1/2} \exp\!\left(-\frac{u}{2}\right) $$
for $u > 0$ and zero otherwise, which is the pdf of a Gamma(1/2, 1/2) distribution (recalling that $\Gamma(1/2) = \sqrt{\pi}$).
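A quick numerical check of this lemma: scipy parameterizes the gamma distribution by shape and scale, so rate $\beta = 1/2$ corresponds to scale $= 2$.

```python
# Sketch: the chi-squared(1) pdf coincides with the Gamma(1/2, 1/2) pdf.
import numpy as np
from scipy.stats import chi2, gamma

u = np.linspace(0.1, 10, 100)
print(np.max(np.abs(chi2.pdf(u, df=1) - gamma.pdf(u, a=0.5, scale=2.0))))
# ~1e-16: the two pdfs agree to machine precision
```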
Lemma 3.2.2. Let $U_1, \ldots, U_m$ be independent with $U_i \sim \text{Gamma}(\alpha_i, \beta)$ for each $i \in \{1, \ldots, m\}$. Then $\sum_{i=1}^m U_i \sim \text{Gamma}\!\left(\sum_{i=1}^m \alpha_i,\, \beta\right)$.
Proof. See the proof of Theorem 5.7.7 of DeGroot & Schervish.
Lemma 3.2.3. The $\chi^2_p$ distribution is the Gamma(p/2, 1/2) distribution.
Proof. This result follows immediately from Lemma 3.2.1 and Lemma 3.2.2.
It follows from Lemma 3.2.3 that the $\chi^2_p$ distribution has expectation $p$ and variance $2p$, since a Gamma$(\alpha, \beta)$ distribution has mean $\alpha/\beta$ and variance $\alpha/\beta^2$.
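These moments are easy to confirm numerically:

```python
# Sketch: chi-squared(p) has mean p and variance 2p.
from scipy.stats import chi2

for p in (1, 5, 20):
    mean, var = chi2.stats(df=p, moments="mv")
    print(p, float(mean), float(var))   # prints p, p, 2p
```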
Joint Distribution of the Sample Mean and Sample Variance
The distribution of $S^2$, as well as the joint distribution of $\bar{X}$ and $S^2$, is provided by the following theorem.
Theorem 3.2.4. Let $X_1, \ldots, X_n \sim$ iid $N(\mu, \sigma^2)$, where $n \ge 2$, and let $\bar{X}$ and $S^2$ be defined as in (3.2.1). Then $\bar{X} \sim N(\mu, \sigma^2/n)$ and $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. Moreover, $\bar{X}$ and $S^2$ are independent.
Proof. It suffices to prove the result for $\mu = 0$ and $\sigma^2 = 1$. Let $X = (X_1, \ldots, X_n)^T \sim N_n(0_n, I_n)$. Now let $A$ be an orthogonal $n \times n$ matrix for which all elements in the first row are $n^{-1/2}$. (Such a matrix can always be constructed, e.g., by the Gram-Schmidt process.) Then let $Y = (Y_1, \ldots, Y_n)^T = AX$. Observe that $Y \sim N_n(0_n, I_n)$ by Lemma 3.1.1, so the sum of the squares of its last $n - 1$ elements is $\sum_{i=2}^n Y_i^2 \sim \chi^2_{n-1}$. Now note that the first element is $Y_1 = n^{-1/2} \sum_{i=1}^n X_i = n^{1/2} \bar{X}$, so we may write
$$ \sum_{i=2}^n Y_i^2 = \sum_{i=1}^n Y_i^2 - Y_1^2 = \sum_{i=1}^n X_i^2 - n(\bar{X})^2 = (n-1)S^2, $$
where the second equality holds since $\sum_{i=1}^n Y_i^2 = Y^T Y = X^T A^T A X = X^T X = \sum_{i=1}^n X_i^2$ by the orthogonality of $A$, and the last equality is (3.2.1). Finally, note that $Y_1, \ldots, Y_n$ are all independent, so $Y_1$ and $\sum_{i=2}^n Y_i^2$ are independent. Then $\bar{X}$ and $S^2$ are independent as well.
It can be seen from Theorem 3.2.4 that $E[(n-1)S^2/\sigma^2] = n - 1$, and thus $E(S^2) = \sigma^2$. This result explains why the sample variance is defined with $n - 1$ (and not $n$) in the denominator.
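A simulation sketch illustrating Theorem 3.2.4 (the particular $\mu$, $\sigma$, $n$ are arbitrary): $(n-1)S^2/\sigma^2$ should have mean $n-1$ and variance $2(n-1)$, and $\bar{X}$ and $S^2$ should be empirically uncorrelated.

```python
# Sketch: (n-1)S^2/sigma^2 behaves like chi-squared(n-1), and Xbar and S^2
# are (empirically) uncorrelated for normal data.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
mu, sigma, n, reps = 3.0, 1.5, 10, 100_000
x = rng.normal(mu, sigma, size=(reps, n))

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
u = (n - 1) * s2 / sigma**2

print(u.mean(), u.var())                      # approx n-1 = 9 and 2(n-1) = 18
print(np.corrcoef(xbar, s2)[0, 1])            # approx 0, reflecting independence
print(np.mean(u <= chi2.ppf(0.5, df=n - 1)))  # approx 0.5: chi2 quantile check
```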
Without Normality
Without the normality assumption, some parts of Theorem 3.2.4 still hold, but others do not.
Suppose $X_1, \ldots, X_n$ are iid with $E(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2$, but suppose their distribution is not necessarily normal.
We still have $E(\bar{X}) = \mu$ and $\operatorname{Var}(\bar{X}) = \sigma^2/n$. Also, we still have $E(S^2) = \sigma^2$, which agrees with Theorem 3.2.4.
However, the distribution of $\bar{X}$ is not necessarily normal (though it is approximately normal for large $n$ by the CLT), and the distribution of $(n-1)S^2/\sigma^2$ is not necessarily chi-squared. Moreover, $\bar{X}$ and $S^2$ are not necessarily independent.
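The failure of independence is easy to see by simulation with a skewed distribution; the exponential distribution below is my illustrative choice, not one from the notes.

```python
# Sketch: for skewed non-normal data (exponential), the sample mean and
# sample variance are clearly correlated, unlike the normal case.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 100_000
x = rng.exponential(scale=1.0, size=(reps, n))  # mean 1, variance 1

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

print(xbar.mean(), s2.mean())        # both approx 1: E(Xbar) = mu, E(S^2) = sigma^2
print(np.corrcoef(xbar, s2)[0, 1])   # clearly positive: not independent
```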
3.3 Interval Estimation of the Mean
Suppose we wish to use $\bar{X}$ to construct an interval that contains the unknown mean $\mu$ with some specified probability. For any $\delta > 0$, observe that
$$ P(\bar{X} - \delta \le \mu \le \bar{X} + \delta) = P(|\bar{X} - \mu| \le \delta) = P\!\left( \frac{|\bar{X} - \mu|}{\sqrt{\sigma^2/n}} \le \frac{\delta}{\sqrt{\sigma^2/n}} \right). \tag{3.3.1} $$
Since $(\bar{X} - \mu)/\sqrt{\sigma^2/n} \sim N(0, 1)$ with cdf $\Phi$, it follows that
$$ P(\bar{X} - \delta \le \mu \le \bar{X} + \delta) = \Phi\!\left(\frac{\delta}{\sqrt{\sigma^2/n}}\right) - \Phi\!\left(-\frac{\delta}{\sqrt{\sigma^2/n}}\right) = 2\,\Phi\!\left(\frac{\delta}{\sqrt{\sigma^2/n}}\right) - 1, $$
where the last equality follows from the symmetry of the $N(0, 1)$ distribution about zero. If we want to have $P(\bar{X} - \delta \le \mu \le \bar{X} + \delta) = 1 - \alpha$ (e.g., $\alpha = 0.05$), then we can choose $\delta$ as
$$ \delta = \sqrt{\frac{\sigma^2}{n}}\; \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right). $$
The number $\Phi^{-1}(1 - \alpha/2)$ is simply the real number $z$ such that $\Phi(z) = 1 - \alpha/2$, which is called the $1 - \alpha/2$ quantile of the $N(0, 1)$ distribution. For example, if $\alpha = 0.05$, then $\Phi^{-1}(1 - \alpha/2) = \Phi^{-1}(0.975) \approx 1.960$. Thus, the interval
$$ \left[\, \bar{X} - \sqrt{\frac{\sigma^2}{n}}\; \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right),\ \bar{X} + \sqrt{\frac{\sigma^2}{n}}\; \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right) \right] \tag{3.3.2} $$
contains $\mu$ with probability exactly $1 - \alpha$.
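A sketch computing the interval (3.3.2) and checking its coverage by simulation; the data-generating values are illustrative.

```python
# Sketch: the known-sigma interval (3.3.2) and its simulated coverage.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu, sigma, n, alpha, reps = 5.0, 2.0, 25, 0.05, 100_000

z = norm.ppf(1 - alpha / 2)          # Phi^{-1}(0.975) approx 1.960
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
half = np.sqrt(sigma**2 / n) * z     # half-width delta

covered = (xbar - half <= mu) & (mu <= xbar + half)
print(covered.mean())                # approx 0.95 = 1 - alpha
```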
However, this interval can be computed only if $\sigma^2$ is known. If $\sigma^2$ is unknown, then it is natural to replace it with the sample variance $S^2$. Observe that
$$ P(\bar{X} - \delta \le \mu \le \bar{X} + \delta) = P(|\bar{X} - \mu| \le \delta) = P\!\left( \frac{|\bar{X} - \mu|}{\sqrt{S^2/n}} \le \frac{\delta}{\sqrt{S^2/n}} \right). \tag{3.3.3} $$
However, to proceed any further, we need to know the distribution of the random variable
$$ T = \frac{\bar{X} - \mu}{\sqrt{S^2/n}}. \tag{3.3.4} $$
Student's t Distribution
Let $Z \sim N(0, 1)$ and $U \sim \chi^2_p$ be independent. Then the distribution of
$$ \frac{Z}{\sqrt{U/p}} $$
is called Student's t distribution with $p$ degrees of freedom, which we write as $t_p$.
Lemma 3.3.1. The pdf of the $t_p$ distribution is, for all $t \in \mathbb{R}$,
$$ f(t) = \frac{\Gamma[(p+1)/2]}{\sqrt{p\pi}\;\Gamma(p/2)} \left(1 + \frac{t^2}{p}\right)^{-(p+1)/2}. $$
Proof. See pages 483-484 of DeGroot & Schervish.
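The closed form in Lemma 3.3.1 is easy to check against scipy's t density:

```python
# Sketch: the t_p pdf of Lemma 3.3.1 matches scipy's implementation.
import numpy as np
from scipy.special import gamma as G
from scipy.stats import t as t_dist

p = 5
ts = np.linspace(-4, 4, 9)
f = G((p + 1) / 2) / (np.sqrt(p * np.pi) * G(p / 2)) \
    * (1 + ts**2 / p) ** (-(p + 1) / 2)

print(np.max(np.abs(f - t_dist.pdf(ts, df=p))))   # ~1e-16
```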
Then we have the following result.
Theorem 3.3.2. Let $X_1, \ldots, X_n \sim$ iid $N(\mu, \sigma^2)$, where $n \ge 2$, and let $T$ be defined as in (3.2.1) and (3.3.4). Then $T \sim t_{n-1}$.
Proof. Write $Z = (\bar{X} - \mu)/\sqrt{\sigma^2/n}$ and $U = (n-1)S^2/\sigma^2$, so that $T = Z/\sqrt{U/(n-1)}$. By Theorem 3.2.4, $Z \sim N(0, 1)$ and $U \sim \chi^2_{n-1}$ are independent, so $T \sim t_{n-1}$ by the definition of Student's t distribution.
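A simulation sketch of Theorem 3.3.2 (with arbitrary illustrative parameters): the statistic $T$ computed from normal samples should match the $t_{n-1}$ distribution quantile for quantile.

```python
# Sketch: T from (3.3.4) follows the t_{n-1} distribution for normal data.
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(7)
mu, sigma, n, reps = 5.0, 2.0, 6, 100_000
x = rng.normal(mu, sigma, size=(reps, n))

T = (x.mean(axis=1) - mu) / np.sqrt(x.var(axis=1, ddof=1) / n)
for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    # empirical quantiles of T vs. exact t_{n-1} quantiles
    print(q, np.quantile(T, q), t_dist.ppf(q, df=n - 1))
```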
Then, analogously to (3.3.1), writing $F_{n-1}$ for the cdf of the $t_{n-1}$ distribution,
$$ P(\bar{X} - \delta \le \mu \le \bar{X} + \delta) = F_{n-1}\!\left(\frac{\delta}{\sqrt{S^2/n}}\right) - F_{n-1}\!\left(-\frac{\delta}{\sqrt{S^2/n}}\right) = 2\,F_{n-1}\!\left(\frac{\delta}{\sqrt{S^2/n}}\right) - 1, $$
where the last equality follows from the symmetry of the $t_{n-1}$ distribution about zero (which can be observed from the form of the pdf in Lemma 3.3.1). Thus, if we want to have $P(\bar{X} - \delta \le \mu \le \bar{X} + \delta) = 1 - \alpha$ (e.g., $\alpha = 0.05$), then we can choose $\delta$ as
$$ \delta = \sqrt{\frac{S^2}{n}}\; F_{n-1}^{-1}\!\left(1 - \frac{\alpha}{2}\right). $$
The number $F_{n-1}^{-1}(1 - \alpha/2)$ is simply the $1 - \alpha/2$ quantile of the $t_{n-1}$ distribution. For example, if $\alpha = 0.05$ and $n = 9$, then $F_{n-1}^{-1}(1 - \alpha/2) = F_8^{-1}(0.975) \approx 2.306$. Thus, the interval
$$ \left[\, \bar{X} - \sqrt{\frac{S^2}{n}}\; F_{n-1}^{-1}\!\left(1 - \frac{\alpha}{2}\right),\ \bar{X} + \sqrt{\frac{S^2}{n}}\; F_{n-1}^{-1}\!\left(1 - \frac{\alpha}{2}\right) \right] \tag{3.3.5} $$
contains $\mu$ with probability exactly $1 - \alpha$.
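A sketch computing the interval (3.3.5) from data, which also verifies the quantile value quoted above for $\alpha = 0.05$ and $n = 9$; the data are simulated for illustration.

```python
# Sketch: the t interval (3.3.5) and the quantile F_8^{-1}(0.975).
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(5)
n, alpha = 9, 0.05
x = rng.normal(5.0, 2.0, size=n)

q = t_dist.ppf(1 - alpha / 2, df=n - 1)   # F_8^{-1}(0.975) approx 2.306
half = np.sqrt(x.var(ddof=1) / n) * q

print(q)                                  # 2.306...
print(x.mean() - half, x.mean() + half)   # the interval (3.3.5)
```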
Additional Uncertainty
The use of the $t_{n-1}$ quantile in (3.3.5) leads to a slightly wider interval than would result from the use of the $N(0, 1)$ quantile instead. In particular, it can be shown that
$$ \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right) < F_{n-1}^{-1}\!\left(1 - \frac{\alpha}{2}\right) $$
for every $n \ge 2$ and for all $\alpha$ such that $0 < \alpha < 1/2$. The qualitative explanation for this effect is that additional uncertainty is introduced when we use the random quantity $S^2$ in place of the unknown constant $\sigma^2$.
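A quick numerical look at this inequality: the t quantile always exceeds the normal quantile, but shrinks toward it as $n$ grows.

```python
# Sketch: t_{n-1} quantiles exceed the N(0,1) quantile and decrease in n.
from scipy.stats import norm, t as t_dist

alpha = 0.05
print(norm.ppf(1 - alpha / 2))                        # 1.960
for n in (2, 5, 10, 30, 100):
    print(n, t_dist.ppf(1 - alpha / 2, df=n - 1))     # all > 1.960, decreasing
```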
Asymptotic Comparison
When the sample size $n$ is large, it turns out that there is very little difference between using the normal quantile $\Phi^{-1}(1 - \alpha/2)$ and the $t_{n-1}$ quantile $F_{n-1}^{-1}(1 - \alpha/2)$ for the interval in (3.3.5). To formalize this idea, we first introduce the following lemma.
Lemma 3.3.3. Let $U_n \sim \chi^2_n$ for every $n \ge 1$. Then $U_n/n \to_P 1$ as $n \to \infty$.
Proof. Let $\{\tilde{U}_k : k \ge 1\}$ be a sequence of iid $\chi^2_1$ random variables. Then by the WLLN, $n^{-1} \sum_{k=1}^n \tilde{U}_k \to_P E(\tilde{U}_1) = 1$, which implies that $n^{-1} \sum_{k=1}^n \tilde{U}_k \to_D 1$ as well. Now note that for every $n \ge 1$, the random variables $n^{-1} U_n$ and $n^{-1} \sum_{k=1}^n \tilde{U}_k$ have the same distribution by Lemma 3.2.1, Lemma 3.2.2, and Lemma 3.2.3. Then $n^{-1} U_n \to_D 1$, from which the result follows immediately (since convergence in distribution to a constant implies convergence in probability).
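Lemma 3.3.3 is easy to visualize by simulation: $U_n/n$ concentrates near 1, with standard deviation shrinking like $\sqrt{2/n}$.

```python
# Sketch: chi-squared(n)/n concentrates near 1 as n grows.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
for n in (5, 50, 500, 5000):
    u = chi2.rvs(df=n, size=100_000, random_state=rng)
    print(n, (u / n).mean(), (u / n).std())  # mean approx 1, std approx sqrt(2/n)
```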
Then we have the following result.
Theorem 3.3.4. Let $T_n \sim t_n$ for every $n \ge 1$. Then $T_n \to_D N(0, 1)$ as $n \to \infty$.
Proof. Let $Z \sim N(0, 1)$, and let $U_n \sim \chi^2_n$ (with $Z$ and $U_n$ independent) for every $n \ge 1$. Then for every $n \ge 1$, the random variables $T_n$ and $Z/\sqrt{U_n/n}$ have the same distribution by the definition of Student's t distribution. The result then follows immediately from Lemma 3.3.3 and Slutsky's theorem.
Thus, for large values of the degrees of freedom, Student's t distribution is quite similar to the $N(0, 1)$ distribution, which implies that they have similar $1 - \alpha/2$ quantiles.
Note: This fact explains why some introductory statistics textbooks state that a normal quantile may be used instead of a Student's t quantile in the interval in (3.3.5) if $n$ is large. The Student's t quantile is still correct, but some authors of such textbooks prefer to approximate it with a normal quantile whenever possible.