$H_0\colon a'\mu = a'\mu_0$ versus $H_a\colon a'\mu \neq a'\mu_0$ for some linear combination $a'X$ of the traits.

If $t$ is small, then $\bar{X}$ and $\mu_0$ are close and we fail to reject $H_0$.
252
Univariate Hypothesis Testing (contd)
When $H_0$ is true, the statistic $t$ has a Student $t$ distribution with $n-1$ degrees of freedom. We reject the null hypothesis at level $\alpha$ when $|t| > t_{n-1}(\alpha/2)$.
Notice that rejecting $H_0$ when $|t|$ is large is equivalent to rejecting it when the squared standardized distance
$$
t^2 = \frac{(\bar{X} - \mu_0)^2}{s^2/n} = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0)
$$
is large.
We reject $H_0$ when
$$
n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0) > t^2_{n-1}(\alpha/2),
$$
i.e., the squared standardized distance exceeds the upper $\alpha$ percentile of a central $F$-distribution with $1$ and $n-1$ df.
253
Univariate Hypothesis Testing (contd)
If we fail to reject $H_0$, we conclude that $\mu_0$ is close (in units of standard deviations of $\bar{X}$) to $\bar{X}$, and thus is a plausible value for $\mu$.
The set of plausible values for $\mu$ is the set of all values that lie in the $100(1-\alpha)\%$ confidence interval for $\mu$:
$$
\bar{x} - t_{n-1}(\alpha/2)\,\frac{s}{\sqrt{n}} \;\le\; \mu_0 \;\le\; \bar{x} + t_{n-1}(\alpha/2)\,\frac{s}{\sqrt{n}}.
$$
The confidence interval consists of all the $\mu_0$ values that would not be rejected by the level $\alpha$ test of $H_0\colon \mu = \mu_0$.

Before collecting the data, the interval is random and has probability $1-\alpha$ of containing $\mu$.
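The duality between the level-$\alpha$ test and the confidence interval can be illustrated with a short sketch. The data, null value, and names below are hypothetical, and NumPy/SciPy are assumed:

```python
import numpy as np
from scipy import stats

# Hypothetical sample and null value, for illustration only
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=25)
mu0, alpha, n = 4.0, 0.05, len(x)

xbar, s = x.mean(), x.std(ddof=1)
t = (xbar - mu0) / (s / np.sqrt(n))          # univariate t statistic
crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t_{n-1}(alpha/2)
reject = abs(t) > crit                       # level-alpha two-sided test

# 100(1 - alpha)% confidence interval for mu
half = crit * s / np.sqrt(n)
ci = (xbar - half, xbar + half)

# mu0 lies inside the interval exactly when the test fails to reject
assert (ci[0] <= mu0 <= ci[1]) == (not reject)
```

Note that $t^2_{n-1}(\alpha/2)$ coincides with the upper $\alpha$ quantile of the $F_{1,n-1}$ distribution, which is the univariate version of the $T^2$ calibration used below.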
254
Hotelling's $T^2$ Statistic
Consider now the problem of testing whether the $p \times 1$ vector $\mu_0$ is a plausible value for the population mean vector $\mu$.
The squared distance
$$
T^2 = (\bar{X} - \mu_0)'\left(\frac{1}{n}S\right)^{-1}(\bar{X} - \mu_0) = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0)
$$
is called the Hotelling $T^2$ statistic.
In the expression above,
$$
\bar{X} = \frac{1}{n}\sum_i X_i, \qquad S = \frac{1}{n-1}\sum_i (X_i - \bar{X})(X_i - \bar{X})'.
$$
255
Hotelling's $T^2$ Statistic (contd)
If the observed $T^2$ value is large we reject $H_0\colon \mu = \mu_0$.

To decide how large is large, we need the sampling distribution of $T^2$ when the hypothesized mean vector is correct:
$$
T^2 \sim \frac{(n-1)p}{(n-p)}\,F_{p,n-p}.
$$
We reject the null hypothesis $H_0\colon \mu = \mu_0$ for the $p$-dimensional vector $\mu$ at level $\alpha$ when
$$
T^2 > \frac{(n-1)p}{(n-p)}\,F_{p,n-p}(\alpha),
$$
where $F_{p,n-p}(\alpha)$ is the upper $\alpha$ percentile of the central $F$ distribution with $p$ and $n-p$ degrees of freedom.
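The decision rule above can be sketched as a small function. This is an illustrative implementation on hypothetical data, assuming NumPy/SciPy, not a library routine:

```python
import numpy as np
from scipy import stats

def hotelling_t2_test(X, mu0, alpha=0.05):
    """One-sample Hotelling T^2 test of H0: mu = mu0.

    Returns (T2, critical_value, reject); a sketch of the slides'
    decision rule, not a production implementation.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)           # sample covariance, (n-1) divisor
    d = xbar - np.asarray(mu0, dtype=float)
    T2 = n * d @ np.linalg.solve(S, d)    # n (xbar - mu0)' S^{-1} (xbar - mu0)
    crit = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    return T2, crit, T2 > crit

# Hypothetical data for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3)) + np.array([1.0, 0.0, 0.0])
T2, crit, reject = hotelling_t2_test(X, mu0=[0.0, 0.0, 0.0], alpha=0.05)
```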
256
Hotelling's $T^2$ Statistic (contd)
As we noted earlier,
$$
T^2 = (\bar{X} - \mu_0)'\left(\frac{1}{n}S\right)^{-1}(\bar{X} - \mu_0) = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0)
$$
has an approximate central chi-square distribution with $p$ df when $\mu_0$ is correct, for large $n$, or when $\Sigma$ is known, in which case the distribution is exact when we have normality.
The exact F-distribution relies on the normality assumption.
Note that
$$
\frac{(n-1)p}{(n-p)}\,F_{p,n-p}(\alpha) > \chi^2_p(\alpha),
$$
but these quantities are nearly equal for large values of $n-p$.
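A quick numeric check of this inequality and of the large-sample agreement, assuming SciPy is available (the numbers below are illustrative):

```python
from scipy import stats

def t2_critical(n, p, alpha):
    # Exact cutoff: ((n-1)p / (n-p)) * F_{p, n-p}(alpha)
    return (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)

p, alpha = 3, 0.05
chi2_crit = stats.chi2.ppf(1 - alpha, p)  # chi^2_p(alpha) approximation

for n in (20, 100, 10000):
    exact = t2_critical(n, p, alpha)
    print(n, round(exact, 4), round(chi2_crit, 4))
    assert exact > chi2_crit              # F-based cutoff is always larger
```

For small $n$ the chi-square cutoff is noticeably too liberal; as $n-p$ grows the two cutoffs converge.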
257
Example 5.2: Female Sweat Data
Perspiration from a sample of 20 healthy females was analyzed. Three variables were measured for each woman:

$X_1$ = sweat rate, $X_2$ = sodium content, $X_3$ = potassium content.

The question is whether $\mu_0' = [4, 50, 10]$ is plausible for the population mean vector $\mu$.
258
Example 5.2: Sweat Data (contd)
At level $\alpha = 0.10$, we reject the null hypothesis if
$$
T^2 = 20(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0) > \frac{(n-1)p}{(n-p)}\,F_{p,n-p}(0.1) = \frac{19(3)}{17}\,F_{3,17}(0.1) = 8.18.
$$
From the data displayed in Table 5.1:
$$
\bar{x} = \begin{bmatrix} 4.64 \\ 45.4 \\ 9.96 \end{bmatrix}
\quad\text{and}\quad
\bar{x} - \mu_0 = \begin{bmatrix} 4.64 - 4 \\ 45.4 - 50 \\ 9.96 - 10 \end{bmatrix}
= \begin{bmatrix} 0.64 \\ -4.6 \\ -0.04 \end{bmatrix}.
$$
259
Example 5.2: Sweat Data (contd)
After computing the inverse of the $3 \times 3$ sample covariance matrix $S$, we can compute the value of the $T^2$ statistic as
$$
T^2 = 20\,[\;0.64 \;\; -4.6 \;\; -0.04\;]
\begin{bmatrix}
0.586 & -0.022 & 0.258 \\
-0.022 & 0.006 & -0.002 \\
0.258 & -0.002 & 0.402
\end{bmatrix}
\begin{bmatrix} 0.64 \\ -4.60 \\ -0.04 \end{bmatrix}
= 9.74.
$$
Since $9.74 > 8.18$ we reject $H_0$ and conclude that $\mu_0$ is not a plausible value for $\mu$ at the 10% level.

At this point, we do not know which of the three hypothesized mean values is not supported by the data.
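The arithmetic above can be replayed numerically. This sketch (NumPy/SciPy assumed) uses the rounded $S^{-1}$ entries printed on the slide, so it returns about 9.66 rather than the 9.74 obtained from the unrounded matrix:

```python
import numpy as np
from scipy import stats

# Values copied from the slides; S^{-1} entries are rounded to 3 decimals
d = np.array([0.64, -4.60, -0.04])   # xbar - mu0
S_inv = np.array([[ 0.586, -0.022,  0.258],
                  [-0.022,  0.006, -0.002],
                  [ 0.258, -0.002,  0.402]])
n, p, alpha = 20, 3, 0.10

T2 = n * d @ S_inv @ d               # about 9.66 with the rounded entries
crit = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)  # about 8.18

print(round(T2, 2), round(crit, 2), T2 > crit)  # reject H0 at the 10% level
```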
260
The Female Sweat Data: R code sweat.R
sweat <- read.table(file =
  "http://www.public.iastate.edu/~maitra/stat501/datasets/sweat.dat",
  header = FALSE, col.names = c("subject", "x1", "x2", "x3"))
library(ICSNP)
nullmean <- c(4, 50, 10)  # hypothesized mean vector mu_0
HotellingsT2(X = sweat[, -1], mu = nullmean)
# Hotelling's one sample T2-test
# data: sweat[, -1]
# T.2 = 2.9045, df1 = 3, df2 = 17, p-value = 0.06493
# alternative hypothesis: true location is not equal to c(4,50,10)
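The `T.2 = 2.9045` reported by `HotellingsT2` is the $F$-scaled form of the statistic, $\frac{n-p}{(n-1)p}T^2$, rather than $T^2$ itself. A hedged SciPy sketch of that conversion, starting from the slides' $T^2 = 9.74$:

```python
from scipy import stats

n, p, T2 = 20, 3, 9.74                  # T^2 from the slides
F_stat = (n - p) / ((n - 1) * p) * T2   # the F-form statistic ICSNP reports
p_value = stats.f.sf(F_stat, p, n - p)  # upper-tail F_{p, n-p} probability

print(round(F_stat, 4), round(p_value, 4))  # close to T.2 = 2.9045, p = 0.0649
```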
261
Invariance property of Hotelling's $T^2$
The $T^2$ statistic is invariant to changes in units of measurement of the form
$$
Y_{p\times 1} = C_{p\times p}\,X_{p\times 1} + d_{p\times 1},
$$
with $C$ non-singular. An example of such a transformation is the conversion of temperature measurements from Fahrenheit to Celsius.
Note that given observations $x_1, \ldots, x_n$, we find that $\bar{y} = C\bar{x} + d$ and $S_y = CSC'$.

Similarly, $E(Y) = C\mu + d$ and the hypothesized value is $\mu_{Y,0} = C\mu_0 + d$.
262
Invariance property of Hotelling's $T^2$ (contd)
We now show that $T^2_y = T^2_x$:
$$
\begin{aligned}
T^2_y &= n(\bar{y} - \mu_{Y,0})'S_y^{-1}(\bar{y} - \mu_{Y,0}) \\
&= n\bigl(C(\bar{x} - \mu_0)\bigr)'(CSC')^{-1}\bigl(C(\bar{x} - \mu_0)\bigr) \\
&= n(\bar{x} - \mu_0)'C'(C')^{-1}S^{-1}C^{-1}C(\bar{x} - \mu_0) \\
&= n(\bar{x} - \mu_0)'S^{-1}(\bar{x} - \mu_0) = T^2_x.
\end{aligned}
$$
The Hotelling $T^2$ test is the most powerful test in the class of tests that are invariant to full-rank linear transformations.
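The invariance argument can be checked numerically. This sketch uses hypothetical data and an arbitrary non-singular affine change of units (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

def t2(X, mu0):
    # n (xbar - mu0)' S^{-1} (xbar - mu0)
    n = X.shape[0]
    d = X.mean(axis=0) - mu0
    return n * d @ np.linalg.solve(np.cov(X, rowvar=False), d)

# Hypothetical sample and transformation, for illustration only
X = rng.normal(size=(15, 3))
mu0 = np.zeros(3)
C = rng.normal(size=(3, 3)) + 3 * np.eye(3)  # non-singular for this seed
d_shift = rng.normal(size=3)

Y = X @ C.T + d_shift                        # y_i = C x_i + d
mu0_y = C @ mu0 + d_shift                    # transformed null value

assert np.isclose(t2(X, mu0), t2(Y, mu0_y))  # T^2 is unchanged
```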
263
Likelihood Ratio Test and Hotelling's $T^2$
Compare the maximum value of the multivariate normal likelihood function under no restrictions against the restricted maximized value with the mean vector held at $\mu_0$.

The hypothesized value $\mu_0$ will be plausible if it produces a likelihood value almost as large as the unrestricted maximum.
To test $H_0\colon \mu = \mu_0$ against $H_1\colon \mu \neq \mu_0$ we construct the ratio:
$$
\text{Likelihood ratio} = \Lambda = \frac{\max_{\{\Sigma\}} L(\mu_0, \Sigma)}{\max_{\{\mu,\Sigma\}} L(\mu, \Sigma)} = \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right)^{n/2},
$$
where the numerator in the ratio is the likelihood at the MLE of $\Sigma$ given that $\mu = \mu_0$, and the denominator is the likelihood at the unrestricted MLEs for both $\mu$ and $\Sigma$.
264
Likelihood Ratio Test and Hotelling's $T^2$
Since
$$
\hat{\Sigma}_0 = n^{-1}\sum_i (x_i - \mu_0)(x_i - \mu_0)' \quad\text{under } H_0,
$$
$$
\hat{\mu} = n^{-1}\sum_i x_i = \bar{x} \quad\text{under } H_0 \cup H_1,
$$
$$
\hat{\Sigma} = n^{-1}\sum_i (x_i - \bar{x})(x_i - \bar{x})' = n^{-1}A \quad\text{under } H_0 \cup H_1,
$$
then under the assumption of multivariate normality
$$
\Lambda = \frac{|\hat{\Sigma}_0|^{-n/2}\exp\bigl\{-\operatorname{tr}\bigl[\hat{\Sigma}_0^{-1}\sum_i (x_i - \mu_0)(x_i - \mu_0)'\bigr]/2\bigr\}}{|\hat{\Sigma}|^{-n/2}\exp\bigl\{-\operatorname{tr}\bigl[\hat{\Sigma}^{-1}A\bigr]/2\bigr\}}.
$$
265
Derivation of Likelihood Ratio Test
$$
\begin{aligned}
|\hat{\Sigma}_0|^{-n/2}&\exp\Bigl\{-\frac{1}{2}\sum_i (x_i - \mu_0)'\hat{\Sigma}_0^{-1}(x_i - \mu_0)\Bigr\} \\
&= |\hat{\Sigma}_0|^{-n/2}\exp\Bigl\{-\frac{1}{2}\sum_i \operatorname{tr}\bigl[(x_i - \mu_0)'\hat{\Sigma}_0^{-1}(x_i - \mu_0)\bigr]\Bigr\} \\
&= |\hat{\Sigma}_0|^{-n/2}\exp\Bigl\{-\frac{1}{2}\operatorname{tr}\Bigl[\hat{\Sigma}_0^{-1}\sum_i (x_i - \mu_0)(x_i - \mu_0)'\Bigr]\Bigr\} \\
&= |\hat{\Sigma}_0|^{-n/2}\exp\Bigl\{-\frac{1}{2}\operatorname{tr}\bigl[\hat{\Sigma}_0^{-1}\hat{\Sigma}_0\, n\bigr]\Bigr\} \\
&= |\hat{\Sigma}_0|^{-n/2}\exp\Bigl\{-\frac{np}{2}\Bigr\}.
\end{aligned}
$$
266
Derivation of Likelihood Ratio Test
$$
\begin{aligned}
|\hat{\Sigma}|^{-n/2}&\exp\Bigl\{-\frac{1}{2}\sum_i (x_i - \bar{x})'\hat{\Sigma}^{-1}(x_i - \bar{x})\Bigr\} \\
&= |\hat{\Sigma}|^{-n/2}\exp\Bigl\{-\frac{1}{2}\sum_i \operatorname{tr}\bigl[(x_i - \bar{x})'\hat{\Sigma}^{-1}(x_i - \bar{x})\bigr]\Bigr\} \\
&= |\hat{\Sigma}|^{-n/2}\exp\Bigl\{-\frac{1}{2}\operatorname{tr}\Bigl[\hat{\Sigma}^{-1}\sum_i (x_i - \bar{x})(x_i - \bar{x})'\Bigr]\Bigr\} \\
&= |\hat{\Sigma}|^{-n/2}\exp\Bigl\{-\frac{1}{2}\operatorname{tr}\bigl[\hat{\Sigma}^{-1}\hat{\Sigma}\, n\bigr]\Bigr\} \\
&= |\hat{\Sigma}|^{-n/2}\exp\Bigl\{-\frac{np}{2}\Bigr\}.
\end{aligned}
$$
267
Derivation of Likelihood Ratio Test
$$
\Lambda = \frac{|\hat{\Sigma}_0|^{-n/2}\exp\{-np/2\}}{|\hat{\Sigma}|^{-n/2}\exp\{-np/2\}}
= \frac{|\hat{\Sigma}_0|^{-n/2}}{|\hat{\Sigma}|^{-n/2}}
= \frac{|\hat{\Sigma}|^{n/2}}{|\hat{\Sigma}_0|^{n/2}}
= \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right)^{n/2}.
$$
$\mu_0$ is a plausible value for $\mu$ if $\Lambda$ is close to one.
268
Relationship between $\Lambda$ and $T^2$
It is just a matter of algebra to show that
$$
\Lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} = \left(1 + \frac{T^2}{n-1}\right)^{-1},
\quad\text{or}\quad
\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} = 1 + \frac{T^2}{n-1}.
$$
For large $T^2$, the likelihood ratio $\Lambda$ is small and both lead to rejection of $H_0$.
269
Relationship between $\Lambda$ and $T^2$
From the previous equation,
$$
T^2 = \left(\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} - 1\right)(n - 1),
$$
which provides another way to compute $T^2$ that does not require inverting a covariance matrix.

When $H_0\colon \mu = \mu_0$ is true, the exact distribution of the likelihood ratio test statistic is obtained from
$$
T^2 = \left(\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} - 1\right)(n - 1) \sim \frac{p(n-1)}{n-p}\,F_{p,n-p}.
$$
270
Union-Intersection Derivation of $T^2$
Consider a reduction from $p$-dimensional observation vectors to univariate observations
$$
Y_j = a'X_j = a_1X_1 + a_2X_2 + \cdots + a_pX_p \sim \text{NID}(a'\mu,\; a'\Sigma a),
$$
where $a' = (a_1, a_2, \ldots, a_p)$.

The null hypothesis $H_0\colon \mu = \mu_0$ is true if and only if all null hypotheses of the form $H_{(0,a)}\colon a'\mu = a'\mu_0$ are true.
Test $H_{(0,a)}\colon a'\mu = a'\mu_0$ versus $H_{(A,a)}\colon a'\mu \neq a'\mu_0$ with
$$
t^2_{(a)} = \left(\frac{\bar{Y} - a'\mu_0}{s_{\bar{Y}}}\right)^2
= \frac{\bigl(a'\bar{X} - a'\mu_0\bigr)^2}{\frac{1}{n}\,a'Sa}.
$$
271
Union-Intersection Derivation of $T^2$
If you cannot reject the null hypothesis for the $a$ that maximizes $t^2_{(a)}$, you cannot reject any of the univariate null hypotheses, and you cannot reject the multivariate null hypothesis $H_0\colon \mu = \mu_0$.
From previous results, a vector that maximizes $t^2_{(a)}$ is $a = S^{-1}(\bar{X} - \mu_0)$.

Consequently, the maximum squared $t$-statistic is
$$
T^2 = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0).
$$
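The union-intersection claim — that $a = S^{-1}(\bar{X} - \mu_0)$ attains the maximum of $t^2_{(a)}$, and that this maximum equals $T^2$ — can be checked numerically on hypothetical data (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(25, 3)) + 0.5  # hypothetical sample
mu0 = np.zeros(3)
n = X.shape[0]
xbar, S = X.mean(axis=0), np.cov(X, rowvar=False)

def t2_a(a):
    # squared univariate t statistic for the projection a'X
    return (a @ xbar - a @ mu0) ** 2 / (a @ S @ a / n)

T2 = n * (xbar - mu0) @ np.linalg.solve(S, xbar - mu0)
a_star = np.linalg.solve(S, xbar - mu0)  # maximizing direction

assert np.isclose(t2_a(a_star), T2)      # the maximum attains T^2
for _ in range(1000):                    # random directions never exceed it
    a = rng.normal(size=3)
    assert t2_a(a) <= T2 + 1e-9
```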
272