
Inferences about a Mean Vector

In the following lectures, we test hypotheses about a
p × 1 population mean vector μ = (μ₁, μ₂, . . . , μ_p)′.

We could test p disjoint hypotheses (one for each μ_j in μ) but
that would not take advantage of the correlations between
the measured traits (X₁, X₂, . . . , X_p).
We first review hypothesis testing in the univariate case, and
then develop the multivariate Hotelling's T² statistic and the
likelihood ratio statistic for multivariate hypothesis testing.
We consider applications to repeated measures (longitudinal)
studies.
We also consider situations when data are incomplete (data
are missing at random).
250
Approaches to Multivariate Inference
Define a reasonable distance measure. An estimated mean
vector that is too far away from the hypothesized mean
vector μ₀ gives evidence against the null hypothesis.
Construct a likelihood ratio test based on the multivariate
normal distribution.
Union-Intersection approach: Consider a univariate test
of H₀: a′μ = a′μ₀ versus Hₐ: a′μ ≠ a′μ₀ for some linear
combination of the traits, a′X. Optimize over possible values
of a.
251
Review of Univariate Hypothesis Testing
Is a given value μ₀ a plausible value for the population mean μ?
We formulate the problem as a hypothesis testing problem.
The competing hypotheses are
H₀: μ = μ₀ and Hₐ: μ ≠ μ₀.
Given a sample X₁, . . . , Xₙ from a normal population, we
compute the test statistic

t = (X̄ − μ₀) / (s/√n).
If |t| is small, then X̄ and μ₀ are close and we fail to reject H₀.
252
Univariate Hypothesis Testing (cont'd)
When H₀ is true, the statistic t has a Student t distribution
with n − 1 degrees of freedom. We reject the null hypothesis
at level α when |t| > t_{n−1}(α/2).
Notice that rejecting H₀ when |t| is large is equivalent to
rejecting it when the squared standardized distance

t² = (X̄ − μ₀)² / (s²/n) = n(X̄ − μ₀)(s²)⁻¹(X̄ − μ₀)

is large.
We reject H₀ when

n(X̄ − μ₀)(s²)⁻¹(X̄ − μ₀) > t²_{n−1}(α/2),

i.e., the squared standardized distance exceeds the upper 100α
percentile of a central F-distribution with 1 and n − 1 df.
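The equivalence between the squared t critical value and the F critical value can be checked numerically; a short sketch (the sample size n is hypothetical):

```python
from scipy import stats

# Hypothetical sample size; any n > 1 works.
n, alpha = 20, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)       # t_{n-1}(alpha/2)
f_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 1)   # F_{1,n-1}(alpha)
# The squared two-sided t critical value equals the F(1, n-1) critical value.
```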
253
Univariate Hypothesis Testing (cont'd)
If we fail to reject H₀, we conclude that μ₀ is close (in units
of standard deviations of X̄) to X̄, and thus is a plausible
value for μ.
The set of plausible values for μ is the set of all values that
lie in the 100(1 − α)% confidence interval for μ:

x̄ − t_{n−1}(α/2) s/√n  ≤  μ₀  ≤  x̄ + t_{n−1}(α/2) s/√n.
The confidence interval consists of all the μ₀ values that
would not be rejected by the level α test of H₀: μ = μ₀.
Before collecting the data, the interval is random and has
probability 1 − α of containing μ.
254
Hotelling's T² Statistic
Consider now the problem of testing whether the p × 1 vector
μ₀ is a plausible value for the population mean vector μ.
The squared distance

T² = (X̄ − μ₀)′ [(1/n)S]⁻¹ (X̄ − μ₀) = n(X̄ − μ₀)′ S⁻¹ (X̄ − μ₀)

is called the Hotelling T² statistic.
In the expression above,

X̄ = (1/n) ∑ᵢ Xᵢ,   S = (1/(n − 1)) ∑ᵢ (Xᵢ − X̄)(Xᵢ − X̄)′.
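The statistic is straightforward to compute from a data matrix; a minimal sketch in Python (the data matrix and hypothesized mean below are hypothetical):

```python
import numpy as np

# Hypothetical data matrix: n = 20 observations on p = 3 traits.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3)) + np.array([4.0, 50.0, 10.0])
mu0 = np.array([4.0, 50.0, 10.0])

n, p = X.shape
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)          # sample covariance, divisor n - 1
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)   # n (xbar - mu0)' S^{-1} (xbar - mu0)
```

Using `np.linalg.solve` avoids explicitly forming S⁻¹, which is both faster and numerically safer than inverting.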
255
Hotelling's T² Statistic (cont'd)
If the observed T² value is large we reject H₀: μ = μ₀.
To decide how large is large, we need the sampling
distribution of T² when the hypothesized mean vector
is correct:

T² ∼ [(n − 1)p / (n − p)] F_{p,n−p}.
We reject the null hypothesis H₀: μ = μ₀ for the p-dimensional
mean vector at level α when

T² > [(n − 1)p / (n − p)] F_{p,n−p}(α),

where F_{p,n−p}(α) is the upper 100α percentile of the central F
distribution with p and n − p degrees of freedom.
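The rejection threshold and a p-value on the F scale can be computed in a few lines; a sketch using the sizes from Example 5.2 below (n = 20, p = 3):

```python
from scipy import stats

# Sizes as in Example 5.2: n = 20 subjects, p = 3 traits.
n, p, alpha = 20, 3, 0.10
crit = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, dfn=p, dfd=n - p)
# crit is about 8.18, the rejection threshold used in Example 5.2.

# p-value for an observed T2, converting T2 to the F scale:
T2_obs = 9.74
p_value = stats.f.sf(T2_obs * (n - p) / ((n - 1) * p), dfn=p, dfd=n - p)
```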
256
Hotelling's T² Statistic (cont'd)
As we noted earlier,

T² = (X̄ − μ₀)′ [(1/n)S]⁻¹ (X̄ − μ₀) = n(X̄ − μ₀)′ S⁻¹ (X̄ − μ₀)

has an approximate central chi-square distribution with p df
when μ₀ is correct, for large n, or when Σ is known, in which
case the distribution is exact when we have normality.
The exact F-distribution relies on the normality assumption.
Note that

[(n − 1)p / (n − p)] F_{p,n−p}(α) > χ²_p(α),

but these quantities are nearly equal for large values of n − p.
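The convergence of the scaled F critical value to the chi-square critical value can be seen numerically; a sketch (the sample sizes are hypothetical):

```python
from scipy import stats

p, alpha = 3, 0.05
chi2_crit = stats.chi2.ppf(1 - alpha, df=p)

# Scaled F critical values for a few (hypothetical) sample sizes.
scaled_f = {}
for n in (20, 100, 1000):
    scaled_f[n] = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, dfn=p, dfd=n - p)
# Each scaled F value exceeds chi2_crit but approaches it as n grows.
```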
257
Example 5.2: Female Sweat Data
Perspiration from a sample of 20 healthy females was
analyzed. Three variables were measured for each
woman:
X₁ = sweat rate
X₂ = sodium content
X₃ = potassium content
The question is whether μ₀ = [4, 50, 10]′ is plausible for
the population mean vector.
258
Example 5.2: Sweat Data (cont'd)
At level α = 0.1, we reject the null hypothesis if

T² = 20(X̄ − μ₀)′ S⁻¹ (X̄ − μ₀) > [(n − 1)p / (n − p)] F_{p,n−p}(0.1)
   = [19(3)/17] F_{3,17}(0.1) = 8.18.
From the data displayed in Table 5.1:

x̄ = [4.64, 45.4, 9.96]′ and
x̄ − μ₀ = [4.64 − 4, 45.4 − 50, 9.96 − 10]′ = [0.64, −4.6, −0.04]′.
259
Example 5.2: Sweat Data (cont'd)
After computing the inverse of the 3 × 3 sample covariance
matrix, S⁻¹, we can compute the value of the T² statistic as

T² = 20 [ 0.64  −4.6  −0.04 ] ⎡  0.586  −0.022   0.258 ⎤ ⎡  0.64 ⎤
                              ⎢ −0.022   0.006  −0.002 ⎥ ⎢ −4.60 ⎥
                              ⎣  0.258  −0.002   0.402 ⎦ ⎣ −0.04 ⎦
   = 9.74.
Since 9.74 > 8.18 we reject H₀ and conclude that μ₀ is not
a plausible value for μ at the 10% level.
At this point, we do not know which of the three
hypothesized mean values is not supported by the data.
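The quadratic form above can be reproduced from the summary quantities alone; a sketch using the rounded S⁻¹ from this slide (the rounding means the result lands near, not exactly at, 9.74):

```python
import numpy as np

# Summary quantities from Example 5.2 (S^{-1} rounded to three decimals).
n = 20
d = np.array([0.64, -4.6, -0.04])            # xbar - mu0
Sinv = np.array([[ 0.586, -0.022,  0.258],
                 [-0.022,  0.006, -0.002],
                 [ 0.258, -0.002,  0.402]])
T2 = n * d @ Sinv @ d
# Close to the slide's 9.74; the gap comes from rounding S^{-1}.
```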
260
The Female Sweat Data: R code sweat.R
sweat <- read.table(file =
  "http://www.public.iastate.edu/~maitra/stat501/datasets/sweat.dat",
  header = F, col.names = c("subject", "x1", "x2", "x3"))
library(ICSNP)
nullmean <- c(4, 50, 10)  # hypothesized mean vector mu_0
HotellingsT2(X = sweat[, -1], mu = nullmean)
# Hotelling's one sample T2-test
# data: sweat[, -1]
# T.2 = 2.9045, df1 = 3, df2 = 17, p-value = 0.06493
# alternative hypothesis: true location is not equal to c(4,50,10)
261
Invariance property of Hotelling's T²
The T² statistic is invariant to changes in units of
measurement of the form

Y_{p×1} = C_{p×p} X_{p×1} + d_{p×1},

with C non-singular. An example of such a transformation
is the conversion of temperature measurements from
Fahrenheit to Celsius.
Note that given observations x₁, . . . , xₙ, we find that
ȳ = C x̄ + d, and S_Y = C S C′.
Similarly, E(Y) = Cμ + d and the hypothesized value is
μ_{Y,0} = Cμ₀ + d.
262
Invariance property of Hotelling's T² (cont'd)
We now show that T²_Y = T²_X:

T²_Y = n(ȳ − μ_{Y,0})′ S_Y⁻¹ (ȳ − μ_{Y,0})
     = n(C(x̄ − μ₀))′ (C S C′)⁻¹ (C(x̄ − μ₀))
     = n(x̄ − μ₀)′ C′ (C′)⁻¹ S⁻¹ C⁻¹ C (x̄ − μ₀)
     = n(x̄ − μ₀)′ S⁻¹ (x̄ − μ₀) = T²_X.
The Hotelling T² test is the most powerful test in the class
of tests that are invariant to full-rank linear transformations.
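The invariance can be checked numerically: transforming the data and the hypothesized mean by the same nonsingular affine map leaves T² unchanged. A sketch with hypothetical data and a random transformation:

```python
import numpy as np

def hotelling_t2(X, mu0):
    """n (xbar - mu0)' S^{-1} (xbar - mu0) for an n x p data matrix X."""
    n = X.shape[0]
    d = X.mean(axis=0) - mu0
    return n * d @ np.linalg.solve(np.cov(X, rowvar=False), d)

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 3))         # hypothetical data
mu0 = np.zeros(3)

C = rng.normal(size=(3, 3))          # almost surely non-singular
shift = rng.normal(size=3)
Y = X @ C.T + shift                  # y_i = C x_i + d, row by row

T2_x = hotelling_t2(X, mu0)
T2_y = hotelling_t2(Y, C @ mu0 + shift)
```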
263
Likelihood Ratio Test and Hotelling's T²
Compare the maximum value of the multivariate normal
likelihood function under no restrictions against the
restricted maximized value with the mean vector held at μ₀.
The hypothesized value μ₀ will be plausible if it produces a
likelihood value almost as large as the unrestricted maximum.
To test H₀: μ = μ₀ against H₁: μ ≠ μ₀ we construct the
ratio:

Likelihood ratio = Λ = max_{Σ} L(μ₀, Σ) / max_{μ,Σ} L(μ, Σ) = ( |Σ̂| / |Σ̂₀| )^{n/2},

where the numerator in the ratio is the likelihood at the MLE
of Σ given that μ = μ₀ and the denominator is the likelihood
at the unrestricted MLEs for both μ, Σ.
264
Likelihood Ratio Test and Hotelling's T²
Since

Σ̂₀ = n⁻¹ ∑ᵢ (xᵢ − μ₀)(xᵢ − μ₀)′ under H₀,
μ̂ = n⁻¹ ∑ᵢ xᵢ = x̄, under H₀ ∪ H₁,
Σ̂ = n⁻¹ ∑ᵢ (xᵢ − x̄)(xᵢ − x̄)′ = n⁻¹ A, under H₀ ∪ H₁,

then under the assumption of multivariate normality

Λ = [ |Σ̂₀|^{−n/2} exp{−tr[Σ̂₀⁻¹ ∑ᵢ (xᵢ − μ₀)(xᵢ − μ₀)′]/2} ]
  / [ |Σ̂|^{−n/2} exp{−tr[Σ̂⁻¹ A]/2} ].
265
Derivation of Likelihood Ratio Test
|Σ̂₀|^{−n/2} exp{−(1/2) ∑ᵢ (xᵢ − μ₀)′ Σ̂₀⁻¹ (xᵢ − μ₀)}
= |Σ̂₀|^{−n/2} exp{−(1/2) ∑ᵢ tr[(xᵢ − μ₀)′ Σ̂₀⁻¹ (xᵢ − μ₀)]}
= |Σ̂₀|^{−n/2} exp{−(1/2) tr[Σ̂₀⁻¹ ∑ᵢ (xᵢ − μ₀)(xᵢ − μ₀)′]}
= |Σ̂₀|^{−n/2} exp{−(1/2) tr[Σ̂₀⁻¹ Σ̂₀ n]}
= |Σ̂₀|^{−n/2} exp{−np/2}.
266
Derivation of Likelihood Ratio Test
|Σ̂|^{−n/2} exp{−(1/2) ∑ᵢ (xᵢ − x̄)′ Σ̂⁻¹ (xᵢ − x̄)}
= |Σ̂|^{−n/2} exp{−(1/2) ∑ᵢ tr[(xᵢ − x̄)′ Σ̂⁻¹ (xᵢ − x̄)]}
= |Σ̂|^{−n/2} exp{−(1/2) tr[Σ̂⁻¹ ∑ᵢ (xᵢ − x̄)(xᵢ − x̄)′]}
= |Σ̂|^{−n/2} exp{−(1/2) tr[Σ̂⁻¹ Σ̂ n]}
= |Σ̂|^{−n/2} exp{−np/2}.
267
Derivation of Likelihood Ratio Test
Λ = |Σ̂₀|^{−n/2} exp{−np/2} / [ |Σ̂|^{−n/2} exp{−np/2} ]
  = |Σ̂₀|^{−n/2} / |Σ̂|^{−n/2}
  = |Σ̂|^{n/2} / |Σ̂₀|^{n/2}
  = ( |Σ̂| / |Σ̂₀| )^{n/2}.

μ₀ is a plausible value for μ if Λ is close to one.
268
Relationship between Λ and T²
It is just a matter of algebra to show that

Λ^{2/n} = |Σ̂| / |Σ̂₀| = ( 1 + T²/(n − 1) )⁻¹,  or
|Σ̂₀| / |Σ̂| = 1 + T²/(n − 1).
For large T², the likelihood ratio Λ is small and both lead to
rejection of H₀.
269
Relationship between Λ and T²
From the previous equation,

T² = ( |Σ̂₀| / |Σ̂| − 1 )(n − 1),

which provides another way to compute T² that does not
require inverting a covariance matrix.
When H₀: μ = μ₀ is true, the exact distribution of the
likelihood ratio test statistic is obtained from

T² = ( |Σ̂₀| / |Σ̂| − 1 )(n − 1) ∼ [p(n − 1)/(n − p)] F_{p,n−p}.
270
Union-Intersection Derivation of T²
Consider a reduction from p-dimensional observation vectors
to univariate observations

Y_j = a′X_j = a₁X₁ + a₂X₂ + ··· + a_pX_p ∼ NID(a′μ, a′Σa),

where a′ = (a₁, a₂, . . . , a_p).
The null hypothesis H₀: μ = μ₀ is true if and only if all null
hypotheses of the form H_{(0,a)}: a′μ = a′μ₀ are true.
Test H_{(0,a)}: a′μ = a′μ₀ versus H_{(A,a)}: a′μ ≠ a′μ₀ with

t²(a) = ( (Ȳ − a′μ₀) / s_Ȳ )² = ( (a′X̄ − a′μ₀) / √( (1/n) a′Sa ) )².
271
Union-Intersection Derivation of T²
If you cannot reject the null hypothesis for the a that
maximizes t²(a), you cannot reject any of the univariate
null hypotheses and you cannot reject the multivariate null
hypothesis H₀: μ = μ₀.
From previous results, a vector that maximizes t²(a) is
a = S⁻¹(X̄ − μ₀).
Consequently, the maximum squared t-statistic is

T² = max_a t²(a) = n(X̄ − μ₀)′ S⁻¹ (X̄ − μ₀).
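The union-intersection claim can be checked numerically: the univariate t²(a) at the maximizing direction equals T², and no other direction exceeds it. A sketch with hypothetical data and randomly sampled directions:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(25, 4))         # hypothetical data, n = 25, p = 4
mu0 = np.full(4, 0.3)
n = X.shape[0]

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)

def t2_a(a):
    """Squared univariate t statistic for the linear combination a'X."""
    return (a @ d) ** 2 / (a @ S @ a / n)

a_star = np.linalg.solve(S, d)       # maximizing direction S^{-1}(xbar - mu0)
best = t2_a(a_star)
random_best = max(t2_a(rng.normal(size=4)) for _ in range(500))
```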
272
