
STATS 200 (Stanford University, Summer 2015)

Lecture 3: Sampling from Normal Distributions

A set of $n \geq 2$ quantitative observations is often modeled by treating the observations as iid
normal random variables, i.e., $X_1, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, where $\mu \in \mathbb{R}$ and $\sigma^2 > 0$. Recall that
the pdf of the $N(\mu, \sigma^2)$ distribution is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
for all $x \in \mathbb{R}$. A wide variety of statistical procedures are based on this simple setup, so it
is important to study it in detail. It also serves as a good introductory example for certain
principles of statistical inference.

3.1 Random Vectors and the Multivariate Normal Distribution

It is sometimes convenient to treat a sample of observations $X_1, \ldots, X_n$ as a single vector
$X = (X_1, \ldots, X_n)$. Since the elements of $X$ are random variables, we call $X$ a random
vector. To discuss random vectors, we first need to extend certain concepts and definitions.
Distribution of Random Vectors
The pmf or pdf of a random vector X is simply the joint pmf or pdf of its components. If
we partition X as X = (Y , Z), then we can consider the marginal pmfs or pdfs of Y and Z,
as well as the conditional pmfs or pdfs of $Y \mid Z$ and $Z \mid Y$. Also recall that $Y$ and $Z$ are
independent if and only if their joint pmf or pdf can be factored as the product of their
marginal pmfs or pdfs.
Expectation, Variance, and Covariance of Random Vectors
Let $X = (X_1, \ldots, X_p)$ be a random vector. Then $E(X)$ denotes a vector of length $p$ with $i$th
component $E(X_i)$, and $\mathrm{Var}(X)$ denotes a $p \times p$ matrix with $(i,j)$th element $\mathrm{Cov}(X_i, X_j)$.
Equivalently, $\mathrm{Var}(X)$ can be written (or is sometimes defined) as
$$\mathrm{Var}(X) = E\{[X - E(X)][X - E(X)]^T\} = E(XX^T) - [E(X)][E(X)]^T,$$

where expectation of a random matrix is also defined elementwise, i.e., the (i, j)th element
of the expectation is simply the expectation of the (i, j)th element.
Note: Var(X) is sometimes called the variance-covariance matrix of X since its ith
diagonal element is Var(Xi ).

The various properties of univariate expectations and variances have multivariate extensions.
Suppose that $X$ and $Y$ are random vectors of length $p$ and $q$, respectively. Also suppose
that $a \in \mathbb{R}^m$, that $B$ is an $m \times p$ matrix, and that $C$ is an $m \times q$ matrix. Then the following
properties hold by the definitions above and the corresponding properties for (scalar-valued)
expectations and variances (see the sketch after this list):
$$E(a + BX + CY) = a + B\,E(X) + C\,E(Y).$$
$$\mathrm{Var}(a + BX) = B\,\mathrm{Var}(X)\,B^T.$$
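To make the second identity concrete, here is a minimal simulation sketch (all dimensions, constants, and the covariance matrix below are arbitrary illustrative choices, not from the lecture) that compares the sample covariance of $a + BX$ against $B\,\mathrm{Var}(X)\,B^T$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and constants: a in R^m, B an m x p matrix.
p, m = 3, 2
a = np.array([1.0, -2.0])
B = rng.standard_normal((m, p))

# Simulate draws of a random vector X with a chosen covariance matrix V,
# using a Cholesky factor so that Var(X) = L L^T = V.
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
L = np.linalg.cholesky(V)
X = rng.standard_normal((100_000, p)) @ L.T   # each row is one draw of X

Y = a + X @ B.T                               # each row is one draw of a + BX

print(np.cov(Y, rowvar=False).round(3))       # sample estimate of Var(a + BX)
print((B @ V @ B.T).round(3))                 # theoretical B Var(X) B^T
```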


Multivariate Normal Distribution
Let $Z$ be a random vector of length $p$, with $\mu = E(Z)$ and $V = \mathrm{Var}(Z)$. The distribution of $Z$ is called
multivariate normal, which we write as $Z \sim N_p(\mu, V)$, if and only if $a^T Z$ has a (univariate)
normal distribution for all $a \in \mathbb{R}^p$. The following properties hold for $Z \sim N_p(\mu, V)$:
If $V$ is nonsingular, then the pdf of $Z$ is
$$f(z) = \frac{1}{(2\pi)^{p/2} (\det V)^{1/2}} \exp\left[-\frac{1}{2}(z-\mu)^T V^{-1}(z-\mu)\right],$$
where det denotes the determinant.


$Z_i$ and $Z_j$ are independent if and only if $V_{ij} = \mathrm{Cov}(Z_i, Z_j) = 0$.
Let $0_p$ denote the zero vector of length $p$, and let $I_p$ denote the $p \times p$ identity matrix. The
distribution $N_p(0_p, I_p)$ is called the $p$-variate standard normal distribution and has a useful
property stated in the following lemma.
Lemma 3.1.1. Let $A$ be a $p \times p$ matrix that is orthogonal ($A^T A = AA^T = I_p$), and let
$Z \sim N_p(0_p, I_p)$. Then $AZ \sim N_p(0_p, I_p)$.
Proof. For any vector $b \in \mathbb{R}^p$, the random variable $b^T(AZ) = (A^T b)^T Z$ has a (univariate)
normal distribution since $Z$ is multivariate normal, so $AZ$ is multivariate normal. Now
simply note that $E(AZ) = A\,E(Z) = 0_p$ and $\mathrm{Var}(AZ) = A I_p A^T = AA^T = I_p$.
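Lemma 3.1.1 can be checked empirically. The sketch below (which assumes an arbitrary orthogonal $A$ built from a QR decomposition, purely for illustration) verifies that the sample mean and covariance of $AZ$ are close to $0_p$ and $I_p$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4

# Build an orthogonal matrix A (the Q factor of a QR decomposition
# satisfies Q^T Q = I_p).
A, _ = np.linalg.qr(rng.standard_normal((p, p)))

Z = rng.standard_normal((200_000, p))   # rows are draws of Z ~ N_p(0_p, I_p)
AZ = Z @ A.T                            # rows are the corresponding draws of AZ

print(AZ.mean(axis=0).round(3))           # should be near 0_p
print(np.cov(AZ, rowvar=False).round(3))  # should be near I_p
```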

3.2 Sample Mean and Sample Variance

Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, where $n \geq 2$. Two commonly calculated quantities are
$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^n X_i^2 - n\bar{X}^2\right], \tag{3.2.1}$$

called (respectively) the sample mean and sample variance. By basic results on the normal
distribution, it is clear that $\bar{X} \sim N(\mu, \sigma^2/n)$. To discuss the distribution of $S^2$ (and eventually
the joint distribution of $\bar{X}$ and $S^2$), we must first define another probability distribution.
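As an aside, the two expressions for $S^2$ in (3.2.1) are easy to check numerically. A minimal sketch with simulated data (the parameter values are arbitrary, and `ddof=1` makes NumPy use the $n-1$ denominator):

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, sigma2 = 25, 10.0, 4.0
x = rng.normal(mu, np.sqrt(sigma2), size=n)   # X_1, ..., X_n iid N(mu, sigma^2)

xbar = x.mean()                                         # sample mean
s2_first = ((x - xbar) ** 2).sum() / (n - 1)            # first form in (3.2.1)
s2_second = ((x ** 2).sum() - n * xbar ** 2) / (n - 1)  # second form in (3.2.1)

print(xbar)
print(s2_first, s2_second, np.var(x, ddof=1))           # all three agree
```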
Chi-Squared Distribution
Let $Z \sim N_p(0_p, I_p)$. The distribution of $Z^T Z = \sum_{i=1}^p Z_i^2$ is called a chi-squared distribution with $p$ degrees of freedom, which we write as $\chi^2_p$. The following lemmas develop some
properties of the chi-squared distribution.
Lemma 3.2.1. The $\chi^2_1$ distribution is the Gamma(1/2, 1/2) distribution.
Proof. Let $Z \sim N(0,1)$. Then the pdf of $Z^2$ is
$$f^{(Z^2)}(u) = \frac{2 f^{(Z)}(\sqrt{u})}{2\sqrt{u}} = \frac{1}{\sqrt{2\pi u}} \exp\left(-\frac{u}{2}\right) = \frac{(1/2)^{1/2}}{\Gamma(1/2)}\, u^{-1/2} \exp\left(-\frac{u}{2}\right)$$
for $u > 0$ and zero otherwise, which is the pdf of a Gamma(1/2, 1/2) distribution.
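A simulation sketch consistent with Lemma 3.2.1 (note that SciPy parameterizes the gamma distribution by shape and scale, so Gamma(1/2, 1/2) with rate 1/2 corresponds to shape 1/2 and scale 2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
z2 = rng.standard_normal(100_000) ** 2   # draws of Z^2 with Z ~ N(0, 1)

# Kolmogorov-Smirnov comparison against Gamma(shape=1/2, loc=0, scale=2).
print(stats.kstest(z2, "gamma", args=(0.5, 0, 2)))

# Moments: a chi-squared_1 variable has mean 1 and variance 2.
print(z2.mean().round(3), z2.var().round(3))
```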


Lemma 3.2.2. Let $U_1, \ldots, U_m$ be independent with $U_i \sim \mathrm{Gamma}(\alpha_i, \beta)$ for each $i \in \{1, \ldots, m\}$.
Then $\sum_{i=1}^m U_i \sim \mathrm{Gamma}(\sum_{i=1}^m \alpha_i, \beta)$.
Proof. See the proof of Theorem 5.7.7 of DeGroot & Schervish.
Lemma 3.2.3. The $\chi^2_p$ distribution is the Gamma($p/2$, $1/2$) distribution.
Proof. This result follows immediately from Lemma 3.2.1 and Lemma 3.2.2.
It follows from Lemma 3.2.3 that the $\chi^2_p$ distribution has expectation $p$ and variance $2p$.
Joint Distribution of the Sample Mean and Sample Variance
The distribution of $S^2$, as well as the joint distribution of $\bar{X}$ and $S^2$, is provided by the
following theorem.
Theorem 3.2.4. Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, where $n \geq 2$, and let $\bar{X}$ and $S^2$ be defined
as in (3.2.1). Then $\bar{X} \sim N(\mu, \sigma^2/n)$ and $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. Moreover, $\bar{X}$ and $S^2$ are
independent.
Proof. It suffices to prove the result for $\mu = 0$ and $\sigma^2 = 1$. Let $X = (X_1, \ldots, X_n) \sim N_n(0_n, I_n)$.
Now let $A$ be an orthogonal $n \times n$ matrix for which all elements in the first row are $n^{-1/2}$.
(Such a matrix can always be constructed, e.g., by the Gram-Schmidt process.) Then let
$Y = (Y_1, \ldots, Y_n) = AX$. Observe that $Y \sim N_n(0_n, I_n)$ by Lemma 3.1.1, so the sum of
the squares of its last $n-1$ elements is $\sum_{i=2}^n Y_i^2 \sim \chi^2_{n-1}$. Now note that the first element is
$Y_1 = n^{1/2}\bar{X}$, so we may write
$$\sum_{i=2}^n Y_i^2 = \sum_{i=1}^n Y_i^2 - Y_1^2 = Y^T Y - Y_1^2 = X^T A^T A X - n\bar{X}^2 = X^T X - n\bar{X}^2 = \sum_{i=1}^n X_i^2 - n\bar{X}^2 = (n-1)S^2.$$
Finally, note that $Y_1, \ldots, Y_n$ are all independent, so $Y_1$ and $\sum_{i=2}^n Y_i^2$ are independent.
Hence $\bar{X}$ and $(n-1)S^2$ are independent, and the result follows.
It can be seen from Theorem 3.2.4 that $E[(n-1)S^2/\sigma^2] = n-1$, and thus $E(S^2) = \sigma^2$. This
result explains why the sample variance is defined with $n-1$ (and not $n$) in the denominator.
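A short simulation sketch of this unbiasedness (parameter values arbitrary): averaging $S^2$ over many samples lands near $\sigma^2$, while dividing by $n$ instead of $n-1$ is biased low by the factor $(n-1)/n$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2, reps = 5, 9.0, 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

s2_unbiased = x.var(axis=1, ddof=1)   # denominator n - 1, as in (3.2.1)
s2_biased = x.var(axis=1, ddof=0)     # denominator n

print(s2_unbiased.mean().round(3))    # close to sigma^2 = 9.0
print(s2_biased.mean().round(3))      # close to sigma^2 * (n-1)/n = 7.2
```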
Without Normality
Without the normality assumption, some parts of Theorem 3.2.4 still hold, but others do not.
Suppose $X_1, \ldots, X_n$ are iid with $E(X_1) = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$, but suppose their distribution
is not necessarily normal.
We still have $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$. Also, we still have $E(S^2) = \sigma^2$, which
agrees with Theorem 3.2.4.
However, the distribution of $\bar{X}$ is not necessarily normal (though it is approximately
normal for large $n$ by the CLT), and the distribution of $(n-1)S^2/\sigma^2$ is not necessarily
chi-squared. Moreover, $\bar{X}$ and $S^2$ are not necessarily independent.


3.3 Student's t Distribution and Inference About the Mean

Suppose $X_1, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, where $\mu \in \mathbb{R}$ is unknown and $\sigma^2 > 0$. (For now, we ignore
the question of whether $\sigma^2$ is known or unknown.) Clearly $\bar{X}$ as defined in (3.2.1) is a useful
estimator of the unknown mean $\mu$. However, we may wish to be more precise. Specifically,
we may wish to construct an interval $[\bar{X} - c,\ \bar{X} + c]$ that will contain the true value of $\mu$
with some high probability (e.g., 0.95).
Note: Such an interval is called a confidence interval. We will return to this notion
later in the course to consider it more carefully.

Observe that
$$P(\bar{X} - c \leq \mu \leq \bar{X} + c) = P(|\bar{X} - \mu| \leq c) = P\left(-\frac{c}{\sqrt{\sigma^2/n}} \leq \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \leq \frac{c}{\sqrt{\sigma^2/n}}\right). \tag{3.3.1}$$
The random variable $(\bar{X} - \mu)/\sqrt{\sigma^2/n}$ has a $N(0,1)$ distribution by Theorem 3.2.4. Let $\Phi$
denote the cdf of the $N(0,1)$ distribution. Then it follows from (3.3.1) that
$$P(\bar{X} - c \leq \mu \leq \bar{X} + c) = \Phi\left(\frac{c}{\sqrt{\sigma^2/n}}\right) - \Phi\left(-\frac{c}{\sqrt{\sigma^2/n}}\right) = 2\,\Phi\left(\frac{c}{\sqrt{\sigma^2/n}}\right) - 1,$$
where the last equality follows from the symmetry of the $N(0,1)$ distribution about zero. If
we want to have $P(\bar{X} - c \leq \mu \leq \bar{X} + c) = 1 - \alpha$ (e.g., $\alpha = 0.05$), then we can choose $c$ as
$$c = \sqrt{\frac{\sigma^2}{n}}\, \Phi^{-1}\left(1 - \frac{\alpha}{2}\right).$$
The number $\Phi^{-1}(1 - \alpha/2)$ is simply the real number $z$ such that $\Phi(z) = 1 - \alpha/2$, which
is called the $1 - \alpha/2$ quantile of the $N(0,1)$ distribution. For example, if $\alpha = 0.05$, then
$\Phi^{-1}(1 - \alpha/2) = \Phi^{-1}(0.975) \approx 1.960$. Thus, the interval
$$\left[\bar{X} - \sqrt{\frac{\sigma^2}{n}}\, \Phi^{-1}\left(1 - \frac{\alpha}{2}\right),\ \ \bar{X} + \sqrt{\frac{\sigma^2}{n}}\, \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right] \tag{3.3.2}$$
will contain $\mu$ with probability $1 - \alpha$.
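In code, computing (3.3.2) is immediate once the quantile is available. A minimal sketch (the data values and the known $\sigma^2$ below are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])   # hypothetical observations
sigma2 = 1.0                                    # variance assumed known
alpha = 0.05

z = stats.norm.ppf(1 - alpha / 2)               # Phi^{-1}(1 - alpha/2) ~ 1.960
c = np.sqrt(sigma2 / len(x)) * z
print(x.mean() - c, x.mean() + c)               # the interval (3.3.2)
```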


Unknown Variance
The endpoints of the interval in (3.3.2) depend on $\sigma^2$. If the variance $\sigma^2$ is known, then
the interval in (3.3.2) is indeed a useful thing to calculate and report. However, if $\sigma^2$ is
unknown, then the interval in (3.3.2) cannot even be calculated. One idea is to use the
sample variance $S^2$ in place of the unknown population variance $\sigma^2$. Taking a similar
approach to (3.3.1), we have
$$P(\bar{X} - c \leq \mu \leq \bar{X} + c) = P(|\bar{X} - \mu| \leq c) = P\left(-\frac{c}{\sqrt{S^2/n}} \leq \frac{\bar{X} - \mu}{\sqrt{S^2/n}} \leq \frac{c}{\sqrt{S^2/n}}\right). \tag{3.3.3}$$


However, to proceed any further, we need to know the distribution of the random variable
$$T = \frac{\bar{X} - \mu}{\sqrt{S^2/n}}. \tag{3.3.4}$$
This requires us to introduce another probability distribution.


Student's t Distribution
Let $Z \sim N(0,1)$ and $U \sim \chi^2_p$ be independent random variables. The distribution of the
random variable
$$\frac{Z}{\sqrt{U/p}}$$
is called Student's t distribution with $p$ degrees of freedom, which we write as $t_p$.
Lemma 3.3.1. The pdf of the $t_p$ distribution is, for all $t \in \mathbb{R}$,
$$f(t) = \frac{\Gamma[(p+1)/2]}{\sqrt{p\pi}\,\Gamma(p/2)} \left(1 + \frac{t^2}{p}\right)^{-(p+1)/2}.$$
Proof. See pages 483–484 of DeGroot & Schervish.
Then we have the following result.
Theorem 3.3.2. Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, where $n \geq 2$, and let $T$ be defined as in
(3.2.1) and (3.3.4). Then $T \sim t_{n-1}$.
Proof. Let $Z = (\bar{X} - \mu)/\sqrt{\sigma^2/n}$ and $U = (n-1)S^2/\sigma^2$. By Theorem 3.2.4, $Z$ and $U$ are
independent with $Z \sim N(0,1)$ and $U \sim \chi^2_{n-1}$. The result follows since $T = Z/\sqrt{U/(n-1)}$.
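Theorem 3.3.2 can be checked by Monte Carlo. The sketch below simulates many samples, forms $T$ as in (3.3.4) for each, and compares the empirical distribution with $t_{n-1}$ via a Kolmogorov-Smirnov test (parameter values arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, mu, sigma = 8, 3.0, 2.0
x = rng.normal(mu, sigma, size=(100_000, n))   # many samples of size n

# T = (Xbar - mu) / sqrt(S^2 / n) for each simulated sample.
t_stats = (x.mean(axis=1) - mu) / np.sqrt(x.var(axis=1, ddof=1) / n)

print(stats.kstest(t_stats, "t", args=(n - 1,)))   # compare with t_{n-1}
```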
Theorem 3.3.2 allows us to construct an interval analogous to (3.3.2) in the case where $\sigma^2$ is
unknown. Let $F_{n-1}$ denote the cdf of the $t_{n-1}$ distribution. Then it follows from (3.3.3) that
$$P(\bar{X} - c \leq \mu \leq \bar{X} + c) = F_{n-1}\left(\frac{c}{\sqrt{S^2/n}}\right) - F_{n-1}\left(-\frac{c}{\sqrt{S^2/n}}\right) = 2\,F_{n-1}\left(\frac{c}{\sqrt{S^2/n}}\right) - 1,$$
where the last equality follows from the symmetry of the $t_{n-1}$ distribution about zero (which
can be observed from the form of the pdf in Lemma 3.3.1). Thus, if we want to have
$P(\bar{X} - c \leq \mu \leq \bar{X} + c) = 1 - \alpha$ (e.g., $\alpha = 0.05$), then we can choose $c$ as
$$c = \sqrt{\frac{S^2}{n}}\, F_{n-1}^{-1}\left(1 - \frac{\alpha}{2}\right).$$
The number $F_{n-1}^{-1}(1 - \alpha/2)$ is simply the $1 - \alpha/2$ quantile of the $t_{n-1}$ distribution. For example,
if $\alpha = 0.05$ and $n = 9$, then $F_{n-1}^{-1}(1 - \alpha/2) = F_8^{-1}(0.975) \approx 2.306$. Thus, the interval
$$\left[\bar{X} - \sqrt{\frac{S^2}{n}}\, F_{n-1}^{-1}\left(1 - \frac{\alpha}{2}\right),\ \ \bar{X} + \sqrt{\frac{S^2}{n}}\, F_{n-1}^{-1}\left(1 - \frac{\alpha}{2}\right)\right] \tag{3.3.5}$$
will contain $\mu$ with probability $1 - \alpha$.
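A sketch of computing (3.3.5) in practice, together with a coverage check under repeated sampling (SciPy's `t.ppf` plays the role of $F_{n-1}^{-1}$; parameter values arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, mu, sigma, alpha = 9, 3.0, 2.0, 0.05
tq = stats.t.ppf(1 - alpha / 2, df=n - 1)      # F_8^{-1}(0.975) ~ 2.306

# Coverage check: the interval (3.3.5) should contain mu in about
# 100 * (1 - alpha) percent of repeated samples.
x = rng.normal(mu, sigma, size=(100_000, n))
xbar = x.mean(axis=1)
c = np.sqrt(x.var(axis=1, ddof=1) / n) * tq
covered = (xbar - c <= mu) & (mu <= xbar + c)
print(covered.mean().round(3))                 # close to 0.95
```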


Additional Uncertainty
The use of the $t_{n-1}$ quantile in (3.3.5) leads to a slightly wider interval than the interval that
would result from the use of the $N(0,1)$ quantile instead. In particular, it can be shown that
$$\Phi^{-1}\left(1 - \frac{\alpha}{2}\right) < F_{n-1}^{-1}\left(1 - \frac{\alpha}{2}\right)$$
for every $n \geq 2$ and for all $\alpha$ such that $0 < \alpha < 1/2$. The qualitative explanation for this effect
is that additional uncertainty is introduced when we use the random quantity $S^2$ in place of
the unknown constant $\sigma^2$.
Asymptotic Comparison
When the sample size $n$ is large, it turns out that there is very little difference between
using the normal quantile $\Phi^{-1}(1 - \alpha/2)$ and the $t_{n-1}$ quantile $F_{n-1}^{-1}(1 - \alpha/2)$ for the interval
in (3.3.5). To formalize this idea, we first introduce the following lemma.
Lemma 3.3.3. Let $U_n \sim \chi^2_n$ for every $n \geq 1$. Then $U_n/n \to_P 1$ as $n \to \infty$.
Proof. Let $\{V_k : k \geq 1\}$ be a sequence of iid $\chi^2_1$ random variables. Then by the WLLN,
$n^{-1}\sum_{k=1}^n V_k \to_P E(V_1) = 1$, which implies that $n^{-1}\sum_{k=1}^n V_k \to_D 1$ as well. Now note that
for every $n \geq 1$, the random variables $n^{-1} U_n$ and $n^{-1}\sum_{k=1}^n V_k$ have the same distribution by
Lemma 3.2.1, Lemma 3.2.2, and Lemma 3.2.3. Then $n^{-1} U_n \to_D 1$, from which the result
follows immediately (since convergence in distribution to a constant implies convergence in
probability).
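Lemma 3.3.3 is easy to visualize numerically. The sketch below estimates $P(|U_n/n - 1| > 0.1)$ for increasing $n$ and shows it shrinking toward zero:

```python
import numpy as np
from scipy import stats

for n in (5, 50, 500, 5000):
    u = stats.chi2.rvs(df=n, size=100_000, random_state=n)  # draws of U_n
    print(n, np.mean(np.abs(u / n - 1) > 0.1))  # estimate of P(|U_n/n - 1| > 0.1)
```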
Then we have the following result.
Theorem 3.3.4. Let $T_n \sim t_n$ for every $n \geq 1$. Then $T_n \to_D N(0,1)$ as $n \to \infty$.
Proof. Let $Z \sim N(0,1)$, and let $U_n \sim \chi^2_n$ (with $Z$ and $U_n$ independent) for every $n \geq 1$. Then
for every $n \geq 1$, the random variables $T_n$ and $Z/\sqrt{U_n/n}$ have the same distribution by the
definition of Student's t distribution. The result then follows immediately from Lemma 3.3.3
and Slutsky's theorem.
Thus, for large values of the degrees of freedom, Student's t distribution is quite similar to
the $N(0,1)$ distribution, which implies that the two distributions have similar $1 - \alpha/2$ quantiles.
Note: This fact explains why some introductory statistics textbooks state that a normal
quantile may be used instead of a Student's t quantile in the interval in (3.3.5) if $n$
is large. The Student's t quantile is still correct, but some authors of such textbooks
prefer to approximate it with a normal quantile whenever possible.
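The convergence of the quantiles themselves can be tabulated directly (a quick sketch):

```python
from scipy import stats

alpha = 0.05
print("normal:", stats.norm.ppf(1 - alpha / 2))        # ~ 1.960
for df in (2, 8, 30, 100, 1000):
    print(f"t_{df}:", stats.t.ppf(1 - alpha / 2, df))  # decreases toward 1.960
```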
