Ghosh, Sen - 1984 - On The Asymptotic Performance of The Log Likelihood Ratio Statistic For The Mixture Model and Related Results

ON THE ASYMPTOTIC PERFORMANCE OF THE LOG LIKELIHOOD RATIO
STATISTIC FOR THE MIXTURE MODEL AND RELATED RESULTS

by
Jayanta Kumar Ghosh
Indian Statistical Institute, Calcutta
and
Pranab Kumar Sen
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1467
September 1984
ON THE ASYMPTOTIC PERFORMANCE OF THE LOG LIKELIHOOD RATIO STATISTIC
FOR THE MIXTURE MODEL AND RELATED RESULTS 1)

JAYANTA KUMAR GHOSH 2)
Indian Statistical Institute, Calcutta

PRANAB KUMAR SEN 3)
University of North Carolina, Chapel Hill
Summary. The classical distribution theory of the log likelihood ratio test statistic does not hold for testing homogeneity (i.e., no mixture) against mixture alternatives. Asymptotic theory for this problem is developed. For some special cases, asymptotically locally minimax tests are also found. It is pointed out that the main problem is lack of identifiability of the usual parameterization even when the mixtures are identifiable; if one chooses an identifiable parameterisation, then there is a problem of differentiability of the density.

AMS Subject Classification Numbers: 62E20, 62F05.
Key Words & Phrases: Asymptotic distribution; asymptotic local minimaxity; identifiability; likelihood ratio test statistic; mixture model.
1) This is one of the three examples presented by the first author at the Neyman-Kiefer Conference.
2) Work done partly at the University of California, Berkeley, supported by the ONR Grant N00014-80-C-0163.
3) Work partially supported by the National Heart, Lung and Blood Institute, Contract NIH-NHLBI-71-2243-L from the National Institutes of Health.
1. Introduction. Consider a family of probability densities $g(x,\theta)$ and mixtures $(1-\pi)g(x,\theta^{(1)}) + \pi g(x,\theta^{(2)})$, $0 \le \pi \le 1$. We assume the mixtures are identifiable in the sense that if $\pi \ne 0$, $\pi \ne 1$ and $\theta^{(1)} \ne \theta^{(2)}$, then the equality

$$(1-\pi)g(x,\theta^{(1)}) + \pi g(x,\theta^{(2)}) = (1-\pi')g(x,\theta^{(3)}) + \pi' g(x,\theta^{(4)}) \qquad (1.1)$$

implies $\pi = \pi'$, $\theta^{(1)} = \theta^{(3)}$, $\theta^{(2)} = \theta^{(4)}$ or $\pi = 1 - \pi'$, $\theta^{(1)} = \theta^{(4)}$, $\theta^{(2)} = \theta^{(3)}$. Note that because of this, $g(x,\theta) = g(x,\theta')$ implies $\theta = \theta'$. (Both here and in (1.1) the relations between two densities hold almost everywhere with respect to the dominating $\sigma$-finite measure $\mu$.)
Typically in cluster analysis one models data exhibiting two clusters by postulating a mixture of two densities. In this context it is important to test whether the observed clusters are real or merely a matter of appearance caused by random sampling from a homogeneous population. Formally, denoting the true density by $f$, one wishes to test

$$H_0 : f = g(x,\theta) \qquad (1.2)$$

against the mixture alternatives $H_1$ considered above with $\pi \ne 0$, $\pi \ne 1$ and $\theta^{(1)} \ne \theta^{(2)}$. The identifiability assumption (1.1) ensures that $H_0$ and $H_1$ have no overlap. But nonetheless the classical asymptotic theory for likelihood ratio tests is not applicable. Of course as pointed out in the literature, the null hypothesis is in some sense on the boundary of the parameter space of this problem, rather than its interior as assumed in classical theory. However, Chernoff (1954) has shown how to handle this kind of departure from standard assumptions; see also Feder (1968).
The real problem is that though the mixtures are identifiable, the parameters $\pi$, $\theta^{(1)}$, $\theta^{(2)}$ are not so. If the alternative hypothesis $H_1$ is true and the true density is written as $f(x,\pi,\theta^{(1)},\theta^{(2)})$, then there is exactly another set of parameter values, namely $(1-\pi,\theta^{(2)},\theta^{(1)})$, which will give exactly the same density; it will be seen in Section 5 that this kind of nonuniqueness is not hard to take care of. However, if $H_0$ is true and the true density is $g(x,\theta^{(0)})$, $\theta^{(0)}$ fixed, then the same density is represented by three curves: $\pi = 0$ and $\theta^{(1)} = \theta^{(0)}$, or $\pi = 1$ and $\theta^{(2)} = \theta^{(0)}$, or $\theta^{(1)} = \theta^{(2)} = \theta^{(0)}$. Another way of expressing this fact is to observe that we can pass to the one dimensional space of $H_0$ by specifying only one co-ordinate at a time -- and not two -- in the three dimensional space of $H_1$.
Of course one can try a parametrisation which is identifiable, i.e., one which sets up Euclidean parameters in one to one correspondence with the mixing distribution. Then the problem becomes one of lack of differentiability of the density with respect to these parameters, at points in the space of $H_0$. For example, we may try $\lambda_1 = (1-\pi)\theta^{(1)} + \pi\theta^{(2)}$, $\lambda_2$ = the $\theta$ corresponding to $\mathrm{Min}(\pi,1-\pi)$ (with a suitable convention for $\pi = 1/2$), and $\lambda_3 = \{\mathrm{Min}(\pi,1-\pi)\}\{2\lambda_2 - \theta^{(1)} - \theta^{(2)}\}$.
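The two kinds of non-identifiability described above can be checked numerically. The following is a minimal sketch, assuming unit-variance normal components (an illustrative choice, not one fixed by the text at this point); the function names `normal_pdf` and `mixture_pdf` and all numerical values are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean):
    """Density of N(mean, 1) evaluated at x (assumed component family)."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)


def mixture_pdf(x, pi, theta1, theta2):
    """Two-point mixture (1 - pi) g(x, theta1) + pi g(x, theta2)."""
    return (1.0 - pi) * normal_pdf(x, theta1) + pi * normal_pdf(x, theta2)


x = np.linspace(-6.0, 6.0, 241)

# Under H_1: (pi, theta1, theta2) and (1 - pi, theta2, theta1) give one density.
swap_gap = np.max(np.abs(mixture_pdf(x, 0.3, -1.0, 2.0)
                         - mixture_pdf(x, 0.7, 2.0, -1.0)))

# Under H_0 with true value theta0: points on three different curves in
# (pi, theta1, theta2)-space all represent the same homogeneous density.
theta0 = 0.5
null_density = normal_pdf(x, theta0)
curves = (
    [mixture_pdf(x, 0.0, theta0, t) for t in (-3.0, 0.0, 4.0)]       # pi = 0
    + [mixture_pdf(x, 1.0, t, theta0) for t in (-3.0, 0.0, 4.0)]     # pi = 1
    + [mixture_pdf(x, p, theta0, theta0) for p in (0.1, 0.5, 0.9)]   # equal components
)
null_gap = max(np.max(np.abs(c - null_density)) for c in curves)

print(f"label-swap gap: {swap_gap:.2e}, null-curve gap: {null_gap:.2e}")
```

Both gaps are zero up to rounding. The label swap is a two-point ambiguity, whereas under the null each curve leaves one co-ordinate entirely free, which is the harder case.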
We shall return to this problem after considering in detail a similar but simpler one which may be called the case of strongly identifiable mixtures. Here one considers two families of probability densities $g_1(x,\theta_1)$ and $g_2(x,\theta_2)$, $\theta_1 \in \Theta_1 \subset R^{p_1}$ and $\theta_2 \in \Theta_2 \subset R^{p_2}$. It is notationally convenient to replace $\pi$ by $\theta_0$ in the mixture $f(x,\theta_0,\theta_1,\theta_2) = (1-\theta_0)g_1(x,\theta_1) + \theta_0 g_2(x,\theta_2)$. If $\theta_0^{(1)} \ne 0$ or $1$, we assume that $f(x,\theta_0^{(1)},\theta_1^{(1)},\theta_2^{(1)}) = f(x,\theta_0^{(2)},\theta_1^{(2)},\theta_2^{(2)})$ implies $\theta^{(1)} = \theta^{(2)}$ where $\theta = (\theta_0,\theta_1,\theta_2)$. Such mixtures may arise as models of partial slippage, contamination or cluster analysis with some information about the direction of additional clustering. We wish to consider the null hypothesis

$$H_0 : \theta_0 = 0 \qquad (1.3)$$

against the strongly identifiable mixture alternatives. Note that if $H_0$ is true the parameters are still not identifiable. Here is an example of this sort which will be worked out in detail in Section 4.

Example 1. [...] Here $N(\theta,1)$ stands for a normal density with mean $\theta$ and variance one.
To motivate our main result in the strongly identifiable case, we must make a few general remarks about the asymptotic behaviour of the mle (maximum likelihood estimate) when the parameter is not identifiable. Suppose, in fact, all of Wald's (1949) conditions for the consistency of mle hold except for identifiability of $\theta$. Suppose also, to fix ideas, that the parameter space is the three dimensional Euclidean space and all points non-identifiable from the true value $\theta^0$ lie on a curve $\Gamma$. The best that one can hope for is that the maximum of the likelihood will eventually be attained in a neighbourhood of this curve. Actually, Redner (1981) has observed that essentially Wald's proof under Wald's conditions (sans identifiability) guarantees this; Redner calls it convergence of the mle in the topology of the quotient space obtained by collapsing $\Gamma$ into a single point. This general fact has the following implication in the strongly identifiable case. When the true density is $g_1(x,\theta_1^0)$, the first two components of the mle $\hat\theta$, namely $\hat\theta_0$ and $\hat\theta_1$, will converge almost surely to their true values $\theta_0 = 0$ and $\theta_1^0$. Of course there is no true value of $\theta_2$ to which $\hat\theta_2$ can converge. (In fact under the assumptions made for Theorem 2.1 in Section 3, it can be shown that $\hat\theta_2$ cannot converge almost surely to a constant.)

The preceding facts will not be used explicitly in the sequel but they motivate what is done here. Among other things one sees that when $H_0$ is true, one cannot confine attention to a neighbourhood of a single point in order to maximise the likelihood. This means that the usual quadratic approximation to the likelihood is available only with respect to the first two components $\theta_0$ and $\theta_1$ for which the mle is consistent. However, under certain assumptions, we can still utilise these partial quadratic approximations to show that asymptotically the likelihood ratio statistic is distributed as a certain functional $W$ of $T(\cdot)$, a supremum over $\eta_2$ of a function of $T(\eta_2)$, where $T(\cdot)$ is a Gaussian process with zero mean and covariance kernel depending on the true value $\theta_1^0$ under $H_0$. In Remark 2.1 we propose a family of other tests with simpler limiting distribution. Note that our treatment does not follow from Chernoff (1954) or Feder (1968), because they were able to exploit the existence of a consistent solution of the likelihood equation in the identifiable case. To follow their approach, one would have to develop first results on solutions of the likelihood equation in the non-identifiable case. This can be done but use of techniques similar to those of Redner (1981) seemed aesthetically more satisfactory.
The fact that the likelihood is not locally approximable by a quadratic has another repercussion. The proof of asymptotic local minimaxity of the likelihood ratio test via approximation by Bayes tests also breaks down. In view of this it seems natural to introduce a prior $G$ on $\Theta_2$ and then work with the integrated likelihood ratio statistic

$$\int \prod_{i=1}^{n} f(x_i,\theta_0,\theta_1,\theta_2)\, G(d\theta_2) \Big/ \sup_{\theta_1} \prod_{i=1}^{n} g_1(x_i,\theta_1)$$

or some other functional on the integrated likelihood. One should probably choose $G$ so that the associated test is asymptotically locally minimax. It is plausible that such a $G$ and an associated test always exist under reasonable conditions. As a first step, it is proved in Section 4 that for Example 1 the prior degenerate at $\theta_2 = b$ and the corresponding likelihood ratio test assuming $\theta_2 = b$ works in this sense. A similar result is proved for general exponentials. On the other hand it is not hard to show (though we do not prove it here) that the likelihood ratio test is not asymptotically locally minimax for these examples -- these are thus new instances of the failure of the principle of maximising the likelihood.
For the case of (not strongly) identifiable mixtures, a result analogous to Theorem 2.1 is obtained for the likelihood ratio test of the hypothesis in (1.2). To do this we assume a separation condition $\|\theta^{(1)} - \theta^{(2)}\| \ge \varepsilon$ where $\varepsilon > 0$ is a fixed quantity. In a subsequent communication we shall try to remove this condition.

A review of the literature on this topic is available in Bock (1981), Gupta and Huang (1981) and the final chapter of Everitt and Hand (1981). Even though there is no overlap with our results, a paper of Moran (1973) deserves to be mentioned. Moran derives the asymptotic distribution of the likelihood ratio test of homogeneity against special mixture alternatives in two cases, namely, Poisson and Gamma. For his alternative Moran considers mixing distributions $G\{(\theta-\lambda)/\sigma^2\}$ where $G$ is a fixed known distribution of which one needs only that the third moment about mean is zero. In our set-up of two point mixtures, this would correspond to assuming $\pi = 1/2$, so that $H_0$ is equivalent to the scale parameter $|\theta^{(1)} - \theta^{(0)}| = 0$.
We conclude by making a few remarks about mixtures of $N(\mu,\sigma^2)$. The theorem in Section 5 applies directly only to the case of mixtures of means with known $\sigma$ and $\mu$ in a compact set. However, we feel a similar result would hold if $\sigma$ is unknown but $(\mu,\sigma)$ lies in a compact set, $\sigma \ne 0$, and only mixtures of mean are allowed. If one also allows scale mixtures, substantial changes are needed in the treatment since the derivative of the likelihood with respect to $\pi$ ceases to be square integrable. Note that a similar phenomenon occurs in Moran's case if $G$ has non-zero third moment about mean. It should be possible to handle such cases by suitable truncation of the derivatives. Finally, it may be noted that though we have confined ourselves to the case of mixtures our main conclusions hold for other cases of non-identifiable parameters.
2. The Main Result for Strongly Identifiable Mixtures. Let $\theta = (\theta_0,\theta_1,\theta_2)$ and $f(x,\theta)$ be as in the introduction and $L_n(\theta)$ be the log likelihood based on $n$ i.i.d. observations. Suppose $H_0$ of (1.3) is true and the true density is $g_1(x,\theta_1^0)$. All expectations and probabilities will be computed in this and the next section under this assumption but this will not be displayed in the notation.
We now sketch an argument leading to Theorem 2.1, introducing notations as we go along. The details for handling several remainder terms as well as the necessary assumptions (A1 through A5) are collected in the next section. Here we only remark briefly on the nature of the assumptions. The assumptions are similar to those in the classical case but have to be strengthened suitably to ensure uniformity in $\theta_2$ at various places. The latter as well as tightness of the Gaussian process $T(\cdot)$ introduced below makes it convenient to work with $\Theta_2$ as a closed bounded interval. However all we really need is compactness of $\Theta_2$ or its closure. The restriction to dimension one for $\Theta_2$ is made so that the tightness Assumption A5 (vide Section 3) is easy to write down; it may be extended by making use of analogous conditions in Bickel and Wichura (1971). As usual we take $\Theta_1$ as an open rectangle in $R^p$. Among other things the assumptions of the next section guarantee that all quantities introduced below are well-defined.
We now begin by rescaling the parameters through

$$\theta_0 = \eta_0/\sqrt{n}, \qquad \theta_1 = \theta_1^0 + \eta_1/\sqrt{n}, \qquad \theta_2 = \eta_2,$$

where $\eta = (\eta_0,\eta_1,\eta_2)$. Let $L_n(\theta)$ when regarded as a function of $\eta$ be denoted as $V_n(\eta)$. Note that $V_n(0,0,\eta_2)$ is free of $\eta_2$ and hence may be written simply as $V_n(0)$. Let $U_{n0}(\eta_2)$ be the normalised derivative with respect to $\theta_0$ and let $U_{n1}$ be the $1 \times p$ (row) vector of normalised derivatives with respect to the components of $\theta_1$. The presence or absence of $\eta_2$ in $U_{n0}$, $U_{n1}$ indicates dependence on $\eta_2$ or lack of it. The same convention is followed below.
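Let $\eta_{01} = (\eta_0,\eta_1)$ and let

$$I(\eta_2) = \begin{pmatrix} I_{00}(\eta_2) & I_{01}(\eta_2) \\ I_{01}(\eta_2)^T & I_{11} \end{pmatrix}$$

be the $(p+1)\times(p+1)$ covariance matrix of $(U_{n0}(\eta_2), U_{n1})$. By Assumption A1, the $I_{ij}$'s are related to the second order derivatives of $\log f(X_1,\theta)$ in the usual way. Expanding $V_n(\eta)$ with respect to the first two co-ordinates, by A1,

$$V_n(\eta) = V_n(0) + \Lambda_n(\eta) + R_{n1}, \qquad \Lambda_n(\eta) = \eta_0 U_{n0}(\eta_2) + U_{n1}\eta_1^T - \tfrac{1}{2}\,\eta_{01} I(\eta_2)\,\eta_{01}^T, \qquad (2.1)$$

where the remainder term $R_{n1}$ is $o_p(1)$ uniformly in $\eta_2$ on bounded sets of $\eta_0$ and $\eta_1$. We prove in the next section that uniformly in $\eta_2$,

$$\sup_{0 \le \theta_0 \le 1,\ \theta_1 \in \Theta_1} L_n(\theta) = V_n(0) + \sup_{\eta_0 \ge 0,\ \eta_1 \in R^p} \Lambda_n(\eta) + o_p(1). \qquad (2.2)$$

(The proof is similar to the classical case but one has to ensure uniformity in $\eta_2$.)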
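By the well-known Kuhn-Tucker-Lagrange theorem [viz., McCormick (1967)] the supremum of $\Lambda_n(\eta)$ is

$$\tfrac{1}{2} U_{n1} I_{11}^{-1} U_{n1}^T + \tfrac{1}{2} T_n^2(\eta_2) \qquad (2.3)$$

if

$$T_n(\eta_2) \overset{\mathrm{def}}{=} \frac{U_{n0}(\eta_2) - I_{01}(\eta_2) I_{11}^{-1} U_{n1}^T}{\{I_{00}(\eta_2) - I_{01}(\eta_2) I_{11}^{-1} I_{01}(\eta_2)^T\}^{1/2}} \ge 0 \qquad (2.4)$$

and the supremum is

$$\tfrac{1}{2} U_{n1} I_{11}^{-1} U_{n1}^T \qquad (2.5)$$

if $T_n(\eta_2) < 0$. Similarly

$$L_n(H_0) \overset{\mathrm{def}}{=} \sup_{\theta_0 = 0,\ \theta_1 \in \Theta_1} L_n(\theta) = V_n(0) + \tfrac{1}{2} U_{n1} I_{11}^{-1} U_{n1}^T + o_p(1). \qquad (2.6)$$

Hence, writing $L_n(\eta_2)$ for the supremum on the left of (2.2),

$$\Lambda_n(\eta_2) \overset{\mathrm{def}}{=} 2\{L_n(\eta_2) - L_n(H_0)\} \qquad (2.7)$$

$$= T_n^2(\eta_2)\, I\{T_n(\eta_2) \ge 0\} + o_p(1). \qquad (2.8)$$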
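The constrained maximisation behind (2.3) through (2.5) can be checked numerically. The sketch below assumes the quadratic $\Lambda(\eta) = \eta_{01} U^T - \tfrac{1}{2}\eta_{01} I \eta_{01}^T$ with an arbitrary positive definite matrix in place of $I(\eta_2)$ and an arbitrary score vector; the function names and the random inputs are hypothetical. It compares a direct active-set solution of the problem $\eta_0 \ge 0$, $\eta_1$ free, with the closed form $\tfrac{1}{2} U_1 I_{11}^{-1} U_1^T + \tfrac{1}{2} T^2 I\{T \ge 0\}$.

```python
import numpy as np


def quadratic(eta01, U, I):
    """Lambda(eta) = eta01 . U - 0.5 * eta01 I eta01^T."""
    return eta01 @ U - 0.5 * eta01 @ I @ eta01


def sup_active_set(U, I):
    """Maximise the concave quadratic over eta_0 >= 0 with eta_1 unrestricted."""
    eta = np.linalg.solve(I, U)              # unconstrained maximiser
    if eta[0] < 0.0:                         # constraint binds, so eta_0 = 0
        eta = np.concatenate(([0.0], np.linalg.solve(I[1:, 1:], U[1:])))
    return quadratic(eta, U, I)


def sup_closed_form(U, I):
    """0.5 * U1 I11^{-1} U1^T + 0.5 * T^2 * 1{T >= 0}, T the standardised score."""
    I00, I01, I11 = I[0, 0], I[0, 1:], I[1:, 1:]
    w = np.linalg.solve(I11, U[1:])          # I11^{-1} U1^T
    T = (U[0] - I01 @ w) / np.sqrt(I00 - I01 @ np.linalg.solve(I11, I01))
    return 0.5 * U[1:] @ w + 0.5 * T ** 2 * (T >= 0.0)


rng = np.random.default_rng(0)
p = 3
max_gap = 0.0
for _ in range(200):
    A = rng.normal(size=(p + 1, p + 1))
    I = A @ A.T + np.eye(p + 1)              # random positive definite matrix
    U = rng.normal(size=p + 1)
    max_gap = max(max_gap, abs(sup_active_set(U, I) - sup_closed_form(U, I)))

print(f"largest discrepancy over 200 draws: {max_gap:.2e}")
```

The agreement rests on the Schur complement identity $U I^{-1} U^T = U_1 I_{11}^{-1} U_1^T + T^2$ and on the fact that the unconstrained maximiser has $\eta_0$ of the same sign as $T$.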
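So the likelihood ratio statistic is by definition

$$\Lambda_n = \sup_{\eta_2} \Lambda_n(\eta_2) = W \circ T_n(\cdot) + o_p(1), \qquad (2.9)$$

where $W \circ T_n(\cdot) = \sup_{\eta_2} T_n^2(\eta_2)\, I\{T_n(\eta_2) \ge 0\}$. Assume $\Theta_2 = [b,c]$. By A4 and A5 of Section 3, the stochastic process $T_n(\cdot)$ taking values in $C[b,c]$ converges weakly to a Gaussian process $T(\cdot)$ on $C[b,c]$ whose mean is zero and the covariance kernel is the same as that of $T_1(\cdot)$ and easy to write down. The covariance of $T(\eta_{21})$ and $T(\eta_{22})$ is given below assuming $\theta_1$ scalar:

$$\frac{J(\eta_{21},\eta_{22}) - I_{01}(\eta_{21}) I_{01}(\eta_{22})/I_{11}}{[\{I_{00}(\eta_{21}) - I_{01}^2(\eta_{21})/I_{11}\}\{I_{00}(\eta_{22}) - I_{01}^2(\eta_{22})/I_{11}\}]^{1/2}}$$

where $J(\eta_{21},\eta_{22})$ is the covariance of $U_{10}(\eta_{21})$ and $U_{10}(\eta_{22})$ under $\theta_1^0$. Note that $\mathrm{Var}(T_1(\eta_2)) = 1$ $\forall\, \eta_2$. Since $W$ is a continuous functional, we have

Theorem 2.1. Under the assumptions A1 through A5 of Section 3, $\Lambda_n$ converges in distribution to $W \circ T(\cdot)$.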
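To get a feel for the limit in Theorem 2.1, one can simulate it in a deliberately simplified setting. The sketch below assumes $g_1 = N(0,1)$ is completely known (so there is no nuisance $\theta_1$ and no $U_{n1}$ term) and $g_2 = N(\eta_2,1)$ with $\eta_2$ restricted to a finite grid; these choices, the function names and the grid are assumptions for illustration, not the paper's Example 1. Under them the score in $\theta_0$ at $\theta_0 = 0$ is $g_2/g_1 - 1$, its variance is $e^{\eta_2^2} - 1$, and the kernel of the standardised process is $(e^{\eta\eta'} - 1)/\{(e^{\eta^2}-1)(e^{\eta'^2}-1)\}^{1/2}$.

```python
import numpy as np


def kernel(grid):
    """Covariance kernel of the standardised score process on the grid."""
    g = np.asarray(grid, dtype=float)
    s = np.sqrt(np.exp(g ** 2) - 1.0)
    return (np.exp(np.outer(g, g)) - 1.0) / np.outer(s, s)


def t_process(x, grid):
    """T_n(eta_2) on the grid for a sample x drawn under the null N(0, 1)."""
    g = np.asarray(grid, dtype=float)
    ratio = np.exp(np.outer(x, g) - 0.5 * g ** 2)     # g2/g1 for each observation
    score_sum = (ratio - 1.0).sum(axis=0) / np.sqrt(len(x))
    return score_sum / np.sqrt(np.exp(g ** 2) - 1.0)


def lr_functional(t):
    """sup over the grid of T^2 * 1{T >= 0}."""
    return float(np.max(np.where(t >= 0.0, t ** 2, 0.0)))


grid = np.linspace(0.5, 1.5, 5)
rng = np.random.default_rng(1)
n, reps = 400, 500

# Finite-n statistic under the null.
finite_n = [lr_functional(t_process(rng.normal(size=n), grid)) for _ in range(reps)]

# Draws from the Gaussian limit, using a symmetric square root of the kernel.
vals, vecs = np.linalg.eigh(kernel(grid))
root = vecs * np.sqrt(np.clip(vals, 0.0, None))
limit = [lr_functional(root @ rng.normal(size=len(grid))) for _ in range(reps)]

print("90% quantiles:", np.quantile(finite_n, 0.9), np.quantile(limit, 0.9))
```

Because $e^{\eta X}$ is heavy tailed for larger $\eta$, agreement between the two quantiles at moderate $n$ should be expected to be rough rather than exact.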
Remark 2.1 (a) The limiting distribution of $\Lambda_n$ simplifies a little when $X_i$ assumes only $k$ distinct values. Identifiability of the mixtures would require that $\dim \Theta_1 + \dim \Theta_2 + 1 \le k-1$.

(b) The limiting distributions of $T_n(\theta_2)$ and $\Lambda_n(\theta_2)$ are given explicitly and applied to Example 1 in Section 4.

(c) To get alternative test statistics whose limiting distributions are easier to compute, one may approximate $\Theta_2$ by a finite set $\theta_2^1,\ldots,\theta_2^m$ and then consider the statistics $T_n(\theta_2^i)$, $i = 1,\ldots,m$, which are asymptotically multivariate normal with zero mean under $H_0$; the dispersion matrix can be consistently estimated since the mle is consistent for $\theta_1$ when $H_0$ is true. One can use as a test statistic any suitable function of $T_n(\theta_2^i)$'s; for example, choosing a suitably positive definite quadratic form in $T_n$'s and then estimating the coefficients, one would get a limiting $\chi^2$-distribution.

3. Assumptions and Details of Proof. As in Section 2 all expectations are computed under a fixed $\theta_1^0$. The assumptions are marked A1 through A5. Instead of collecting them at one place we shall present them as the need arises, in course of supplying some of the details left out in Section 2.
Let $\theta_{01} = (\theta_0,\theta_1)$, $\theta_{01}^0 = (0,\theta_1^0)$. A similar convention is followed for $\eta_{01}$. Let $D_0 = \partial/\partial\theta_0$, $D_j = \partial/\partial\theta_{1j}$, $j = 1,\ldots,p$, and unless otherwise stated the derivatives are evaluated at $(\theta_{01}^0,\theta_2)$.

A1. (i) $\Theta_1$ is an open set of $R^p$, $\Theta_2$ a closed bounded interval $[b,c]$ of $R^1$.
(ii) $f(x,\theta)$ is continuous in $\theta$ and twice continuously differentiable with respect to $\theta_{01}$.
(iii) $E(D_j \log f) = 0$, $E(D_j D_{j'} \log f) = -E(D_j \log f \times D_{j'} \log f)$.
(iv) $E\{\sup_{\|\theta_{01}-\theta_{01}^0\| < \delta,\ \theta_2 \in \Theta_2} |D_j D_{j'} \log f(X,\theta_{01},\theta_2) - D_j D_{j'} \log f(X,\theta_{01}^0,\theta_2)|\} \to 0$ as $\delta \to 0$, $j, j' = 0,1,\ldots,p$.
To handle the remainder in (2.2) we proceed in three stages. Assumption A2 will allow us to restrict attention to a compact subset of $\Theta_1$ while calculating the supremum of $L_n(\theta)$. Then with the help of A3 we work with arbitrary but fixed neighbourhoods of $\theta_{01}^0$, which are replaced at the third and final stage by neighbourhoods that shrink like $n^{-1/2}$, i.e., bounded neighbourhoods in the $\eta_{01}$-plane. The fact that $R_{n1}$ in (2.1) is $o_p(1)$ uniformly on bounded $\eta_{01}$-sets now completes the proof.

A2. There exists a compact neighbourhood $N$ of $\theta_1^0$ such that [...]. Moreover, $W(X_1,\theta_2)$ is continuous on $\Theta_2$, $|W(X_1,\theta_2)| \le H(X_1)$ $\forall\, \theta_2$ and $E(H(X_1)) < \infty$.
By the uniform strong law of large numbers (USLLN) applied to [...] and the fact that [...], we get

$$\sup_{\theta_{01} \in [0,1]\times\Theta_1} L_n(\theta) - \sup_{\theta_{01} \in [0,1]\times N} L_n(\theta) = o_p(1) \qquad (3.1)$$

uniformly in $\theta_2$.

A3. For each $\theta_{01} \in [0,1]\times\Theta_1$, $\theta_{01} \ne \theta_{01}^0$, there exists an open ball $U = U(\theta_{01},\delta)$ with centre $\theta_{01}$ and radius $\delta$ such that if $U'$ is its intersection with $[0,1]\times\Theta_1$ then [...].

Let $U_0 = \{\theta_{01} : \|\theta_1 - \theta_1^0\| < \delta,\ 0 \le \theta_0 < \delta\}$. Consider an open cover of $U_0^c \cap [0,1]\times N$ by sets $U(\theta_{01},\delta_1)$ and choose a finite subcover $U_1,\ldots,U_m$. Now apply the USLLN to $n^{-1}\sum_i \psi(X_i,U_j,\theta_2)$, $j = 1,\ldots,m$, $\theta_2 \in \Theta_2$, to conclude that [...] $= o_p(1)$ uniformly in $\theta_2$. This completes the second stage of the proof.
At the third and final stage, note that, by Taylor's theorem and A1,

$$L_n(\theta) = V_n(0) + \eta_0 U_{n0}(\eta_2) + U_{n1}\eta_1^T + \tfrac{1}{2}\,\eta_{01} [J_{ij}(\eta)]\, \eta_{01}^T$$

where $\sup |J_{ij}(\eta) + I_{ij}(\eta_2)| = \delta\, O_p(1)$ uniformly in $U_0 \times \Theta_2$. We now use

A4. The $(p+1)\times(p+1)$ matrix $I(\theta_2)$ is continuous in $\theta_2$ and its minimum eigen value is greater than $\lambda > 0$ $\forall\, \theta_2$;

and

A5. $E|D_0 \log f(X_1,\theta_{01}^0,\theta_2) - D_0 \log f(X_1,\theta_{01}^0,\theta_2')|^{\alpha} \le K|\theta_2 - \theta_2'|^{1+\gamma}$ for some $\alpha, \gamma > 0$.

A5 ensures tightness of $U_{n0}(\cdot)$. (To see this one has to use the theorem of Dharmadhikari et al. (1968).) Hence, $\sup_{\eta_2} U_{n0}(\eta_2)$ is $O_p(1)$. Also by A1, $U_{n1}$ is also $O_p(1)$. Hence (by A4) for given $\varepsilon$, we can find $\lambda' < \lambda$, $K$ and $n_0$ such that for $n > n_0$,

$$P\{\sup_{\eta_2} U_{n0}(\eta_2) + \|U_{n1}\| < K,\ \text{the smallest eigen value of the } (p+1)\times(p+1) \text{ matrix } [-J_{ij}(\eta)] > \lambda' \text{ on } U_0 \times \Theta_2\} > 1-\varepsilon.$$

Then by first making a suitable orthogonal transformation, one can find $M$ such that for $n > n_0$,

$$P\{V_n(\eta) < V_n(0) \text{ if } \|\eta_{01}\| > M \text{ and } \Lambda_n(\eta) < \Lambda_n(0) \text{ if } \|\eta_{01}\| > M\} > 1-\varepsilon$$

where $M$ depends only on $K$ and $\lambda'$. Thus with probability $> 1-\varepsilon$ the supremum of $L_n(\theta)$, i.e., $V_n(\eta)$ (over $U_0 \times \Theta_2$) and that of $\Lambda_n(\eta)$ (over $[0,\infty) \times R^p$) is attained in $\|\eta_{01}\| \le M$. Since on this set $R_{n1}$ of (2.1) is $o_p(1)$ by A1, the proof of (2.2) is complete.

The proof of the similar result (2.6) follows along similar lines from A1 through A4 which are of course much stronger than what we need for (2.6).
Remark 3.1 (a) The tightness assumption A5 holds if

$$[...] \qquad (3.2)$$

and $E(\psi(X)^{\beta}) < \infty$ for some $\beta > 1$.

(b) The limiting distribution of $T_n(\cdot)$ depends on the true value of $\theta_1$. It is weakly continuous in $\theta_1$ provided (i) the covariance kernel of $T(\cdot)$ is continuous in $\theta_1$; and (ii) the Lipschitz condition (3.2) is suitably strengthened to be uniform over $\theta_1$-neighbourhoods. (i) guarantees convergence of finite dimensional distributions of $T(\cdot)$ and (ii) guarantees tightness. Under these conditions the limiting distribution of $\Lambda$ is also weakly continuous in $\theta_1$. Suppose this is so and let $t(\theta_1,\alpha)$ be such that $\lim P_{\theta_1}\{\Lambda \ge t(\theta_1,\alpha)\} = \alpha$. If for each $\theta_1$, $t$ exists, is unique and a point of continuity of the limiting distribution of $\Lambda$ under $\theta_1$, then $t(\theta_1,\alpha)$ is continuous in $\theta_1$. By Redner's (1981) result, $\hat\theta_1$ is consistent for $\theta_1$ under $H_0$. Hence, as pointed out to us by Peter Bickel, $\lim P_{\theta_1}\{\Lambda \ge t(\hat\theta_1,\alpha)\} = \alpha$. Thus the test which rejects $H_0$ if $\Lambda \ge t(\hat\theta_1,\alpha)$ would be asymptotically similar provided the conditions assumed here hold. They are easy to check under A1 to A5 if $\Theta_2$ is a finite set.
4. Asymptotically Locally Minimax Tests in some Examples. We shall calculate the asymptotic properties of tests based on $\Lambda_n(\theta_2)$, where $\Lambda_n(\theta_2)$ is defined in (2.2) and (2.7), and show that it is asymptotically locally minimax for problems like Example 1.

Fix $\theta_1^0$ as in Section 2 and a sequence of alternatives $K_n$ corresponding to a fixed $\eta = (\eta_0,\eta_1,\eta_2)$. We fix also a value $b$ of $\theta_2$ and consider the limiting distribution of $T_n(b)$ and $\Lambda_n(b)$ under $\theta_1^0$ and $K_n$, where $T_n(b)$ is defined in (2.4). Let $Z_n^* = V_n(\eta) - V_n(0)$. Then by (2.1), under $\theta_1^0$, $Z_n^*$ is asymptotically $N(-\tfrac{1}{2}\eta_{01} I(\eta_2)\eta_{01}^T,\ \eta_{01} I(\eta_2)\eta_{01}^T)$. By a well-known result of LeCam, namely his first lemma on contiguity [cf. Hajek and Sidak (1967, p. 204)], this shows $K_n$ is contiguous to $\theta_1^0$. Since $T_n(b)$ and $Z_n^*$ are asymptotically bivariate normal, by another well-known result of LeCam, namely, his third lemma on contiguity [vide Hajek and Sidak (1967)], $T_n(b)$ is asymptotically normal under $K_n$ with same asymptotic variance as under $\theta_1^0$ and mean equal to mean under $\theta_1^0$ plus the asymptotic covariance under $\theta_1^0$. Moreover, $\Lambda_n(b) = T_n^2(b)\, I\{T_n(b) \ge 0\} + o_p(1)$ under $K_n$, since the same relation holds under $\theta_1^0$ and $K_n$ is contiguous. Under $\theta_1^0$, $T_n(b)$ is asymptotically normal with mean zero and variance unity. Also the asymptotic covariance of $Z_n^*$ and $T_n(b)$ under $\theta_1^0$ is $\rho$ = [...]. Note $\rho$ depends only on $\eta_0$ and $\eta_2$ and so may be written $\rho(\eta_0,\eta_2)$. By the remarks in the preceding paragraph, the following result is true.

Theorem 4.1. Assume the conditions of Section 3. Then

$$\lim P_{\theta_1^0}\{\Lambda_n(b) \ge x\} = 1 - \Phi(\sqrt{x}) \ \text{ if } x > 0, \qquad = 1 \ \text{ if } x = 0,$$

$$\lim P_{K_n}\{\Lambda_n(b) \ge x\} = 1 - \Phi(\sqrt{x} - \rho(\eta_0,\eta_2)) \ \text{ if } x > 0, \qquad = 1 \ \text{ if } x = 0,$$

where $\Phi$ is the standard normal distribution function.
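The two limits in Theorem 4.1 translate directly into a critical value and a limiting power. The following is a small sketch using only the Python standard library; the function names are hypothetical, and the restriction $\alpha < .5$ is imposed so that the critical value $x$ is strictly positive and the $x > 0$ branch of the theorem applies.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def critical_value(alpha):
    """x > 0 solving 1 - Phi(sqrt(x)) = alpha, valid for 0 < alpha < 0.5."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie strictly between 0 and 0.5")
    return _STD_NORMAL.inv_cdf(1.0 - alpha) ** 2


def limiting_power(alpha, rho):
    """1 - Phi(sqrt(x) - rho) at the size-alpha critical value x."""
    x = critical_value(alpha)
    return 1.0 - _STD_NORMAL.cdf(x ** 0.5 - rho)


for rho in (0.0, 1.0, 2.0, 3.0):
    print(f"rho = {rho:.1f}: limiting power = {limiting_power(0.05, rho):.4f}")
```

At $\rho = 0$ the limiting power equals the size, and it increases with $\rho$, which is why the comparison of $\rho(\eta_0,\eta_2)$ with $\rho(\eta_0,b)$ governs the minimax argument.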
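Consider now Example 1 given in the introduction and fix $\theta_1^0$. Let the limiting power of a sequence of tests $\{\phi_n\}$ under $K_n$ be denoted by $\beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,\eta_2)$. Let us say $\{\phi_n^0\}$ of size $\alpha < .5$ is asymptotically locally minimax if for all sequences $\{\phi_n\}$ of size $\alpha$ which have limiting power,

$$\inf_{\eta_1 \in R^1,\ \eta_2 \in \Theta_2} \beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,\eta_2) \le \inf_{\eta_1 \in R^1,\ \eta_2 \in \Theta_2} \beta(\{\phi_n^0\},\theta_1^0,\eta_0,\eta_1,\eta_2)$$

for every $\eta_0 > 0$. By the classical theory for the likelihood ratio test $\phi_n^0$ based on $\Lambda_n(b)$,

$$\inf_{\eta_1 \in R^1} \beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,b) \le \inf_{\eta_1 \in R^1} \beta(\{\phi_n^0\},\theta_1^0,\eta_0,\eta_1,b)$$

for every $\eta_0 > 0$ and $\{\phi_n\}$. Hence asymptotic local minimaxity of $\phi_n^0$ will follow from Theorem 4.1 if we show $\rho(\eta_0,\eta_2) \ge \rho(\eta_0,b)$ $\forall\, \eta_2 \ge b$. For Example 1 this follows easily by direct calculation. However the following lemma shows this property is true for general exponentials. Before stating the lemma we introduce some notations.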

Let $g_\theta = g(x,\theta) = A(\theta)\exp\{\theta x\}h(x)$, $\theta \in J$, $J$ some open interval, be a family of probability densities. Let $a < b$ be fixed elements of $J$. Let $\psi(\theta)$ = [...], where the covariances are computed under $g_a$. To relate to Example 1 (and similar problems) note that [...] with $\theta_1^0 = a$. We assume $\psi(\theta)$ is finite on $J$.

Lemma 4.1. $\psi(\theta) \ge \psi(b)$ if $\theta > b$.

Proof. Note that $\psi(a) = 0 < \psi(b)$. Also $\psi(\theta)$ can be expressed as

$$[...] \qquad (4.1)$$

Since [...] is convex, for any constant $K$, [...]$(x) - K$ can have at most two sign changes and if there are two, they must be from positive to negative and negative to positive. Hence by Karlin's well-known result on sign diminishing properties of the exponential densities (see, e.g. Karlin (1968)), $\psi(\theta) - K$ has similar sign change properties. If there exists $\theta' > b$ such that $\psi(\theta') < \psi(b)$ then this sign change property would be contradicted at the points $a, b, \theta'$ provided we choose $K$ such that $\mathrm{Max}\{0,\psi(\theta')\} < K < \psi(b)$. This proves the lemma.
5. The Case of Identifiable Mixtures. Let $\Theta$ be an open bounded real interval and $\bar\Theta$ its closure. Let $g(x,\theta)$, $\theta \in \bar\Theta$, be a family of densities and consider the mixtures $(1-\theta_0)g(x,\theta_1) + \theta_0 g(x,\theta_2)$, where $0 \le \theta_0 \le 1$ and $|\theta_2 - \theta_1| \ge \varepsilon$, $\varepsilon$ being a fixed positive number. Without loss of generality we may take $0 \le \theta_0 \le 1/2$. We wish to test homogeneity, i.e., $H_0 : \theta_0 = 0$, against the above mixture alternatives. In the sequel suppose $H_0$ is true and the true density is $g(x,\theta_1^0)$. We make the following blanket assumption.

B1. Let $\Theta_1$ be any open set containing $\theta_1^0$ and $\Theta_2$ any closed set such that $\Theta_1 \cap \Theta_2 = \emptyset$. Then A1, A3, A4, A5 hold with this $\Theta_1, \Theta_2$. (Since $\bar\Theta$ is compact by assumption, A2 is dropped.)

One can now imitate the arguments in Sections 2 and 3. As in Section 3, we can show that in order to maximise the log likelihood we may restrict to $0 \le \theta_0 < \delta$ and $|\theta_1 - \theta_1^0| < \delta$ and $\theta_2 \le \theta_1^0 - (\varepsilon - \delta)$ or $\theta_2 \ge \theta_1^0 + (\varepsilon - \delta)$ $(\delta < \varepsilon)$. Call this set $U_\delta$. Define $\eta, V_n(\eta), \Lambda_n(\eta)$ etc. as in Section 2. Then one can prove as in Section 3, that with probability tending to one both $L_n(\theta)$ and $\Lambda_n(\eta)$ attain their maximum in a bounded neighbourhood of $\eta_{01} = (0,0)$. Hence

$$\sup L_n(\theta) = V_n(0) + \sup \Lambda_n(\eta) + o_p(1)$$

uniformly in $\theta_2$, where the maximisation of $\Lambda_n$ may be restricted to the set $U_\delta$. Because of the nature of $U_\delta$, for given $\varepsilon'$ one can find $K$ and $n_0$ such that with probability $> 1 - \varepsilon'$, the maximum of $\Lambda_n$ (over $U_\delta$) is attained at [...]. An easy calculation now shows

$$\sup_\theta L_n(\theta) = V_n(0) + \sup \Lambda_n(\eta) + o_p(1)$$

where the supremum of $\Lambda_n$ is taken with $\theta_2 \in \Theta_2^{(2)} = \{\theta_2 : \theta_2 \ge \theta_1^0 + \varepsilon \text{ or } \theta_2 \le \theta_1^0 - \varepsilon\}$. The supremum of $L_n(\theta)$ under $H_1$ has therefore the same expression as in Section 2 with $\Theta_2 = \Theta_2^{(2)}$. Since the expression for the supremum of $L_n(\theta)$ under $H_0$ remains unaltered, the conclusion of Theorem 2.1 is valid, i.e., the following is true.

Theorem 5.1. Assume B1. Then the limiting distribution of the likelihood ratio test under $\theta_1^0$ is the same as that in Theorem 2.1 with $\Theta_2 = \Theta_2^{(2)}$.
ACKNOWLEDGEMENT
Thanks are due to the referee whose comments clarified many issues
and led to a better presentation.
REFERENCES

Bickel, P.J. and Wichura, M.J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist., 42, 1656-1670.

Bock, H.H. (1981). Statistical testing and evaluation methods in cluster analysis. Paper presented at the ISI Golden Jubilee Conference, to appear in the Proceedings.

Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Statist., 25, 573-578.

Dharmadhikari, S.W., Fabian, V. and Jogdeo, K. (1968). Bounds on moments of martingales. Ann. Math. Statist., 39, 1719-1723.

Everitt, B.S. and Hand, D.J. (1981). Finite Mixture Distributions. Chapman and Hall, London.

Feder, P.I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is near the boundaries of the hypothesis region.

Gupta, S.S. and Huang, Wen-Tao (1981). On mixtures of distributions: survey and some new results on ranking and selection. Sankhya, 43, 245-290.

Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. Academic Press, New York.

Karlin, S. (1968). Total Positivity, Vol. 1. Stanford University Press, Stanford, California.

McCormick, S.P. (1967). Nonlinear Programming. McGraw Hill, New York.

Moran, P.A.P. (1973). Asymptotic properties of homogeneity tests. Biometrika, 60, 79-85.

Redner, R. (1981). Note on the consistency of the maximum likelihood estimate for non-identifiable distributions. Ann. Statist., 9, 224-227.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate.
