
ON THE ASYMPTOTIC PERFORMANCE OF THE LOG LIKELIHOOD RATIO
STATISTIC FOR THE MIXTURE MODEL AND RELATED RESULTS

by
Jayanta Kumar Ghosh
Indian Statistical Institute, Calcutta
and
Pranab Kumar Sen
Department of Biostatistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 1467
September 1984

ON THE ASYMPTOTIC PERFORMANCE OF THE LOG LIKELIHOOD RATIO STATISTIC
FOR THE MIXTURE MODEL AND RELATED RESULTS 1)

JAYANTA KUMAR GHOSH 2)
Indian Statistical Institute, Calcutta

PRANAB KUMAR SEN 3)
University of North Carolina, Chapel Hill

Summary. The classical distribution theory of the log likelihood ratio test statistic does not hold for testing homogeneity (i.e., no mixture) against mixture alternatives. Asymptotic theory for this problem is developed. For some special cases, asymptotically locally minimax tests are also found. It is pointed out that the main problem is lack of identifiability of the usual parameterisation even when the mixtures are identifiable; if one chooses an identifiable parameterisation, then there is a problem of differentiability of the density.

AMS Subject Classification Numbers: 62E20, 62F05.

Key Words & Phrases: Asymptotic distribution; asymptotic local minimaxity; identifiability; likelihood ratio test statistic; mixture model.

1) This is one of the three examples presented by the first author at the Neyman-Kiefer Conference.

2) Work done partly at the University of California, Berkeley, supported by the ONR Grant N00014-80-C-0163.

3) Work partially supported by the National Heart, Lung and Blood Institute, Contract NIH-NHLBI-71-2243-L from the National Institutes of Health.

1. Introduction. Consider a family of probability densities $g(x,\theta)$ and mixtures

$$(1-\pi)\,g(x,\theta^{(1)}) + \pi\,g(x,\theta^{(2)}), \qquad 0 < \pi < 1.$$

We assume the mixtures are identifiable in the sense that if $\pi \neq 0$, $\pi \neq 1$ and $\theta^{(1)} \neq \theta^{(2)}$, then the equality

$$(1-\pi)\,g(x,\theta^{(1)}) + \pi\,g(x,\theta^{(2)}) = (1-\pi')\,g(x,\theta^{(3)}) + \pi'\,g(x,\theta^{(4)}) \tag{1.1}$$

implies either $\pi = \pi'$, $\theta^{(1)} = \theta^{(3)}$, $\theta^{(2)} = \theta^{(4)}$, or $\pi = 1-\pi'$, $\theta^{(1)} = \theta^{(4)}$, $\theta^{(2)} = \theta^{(3)}$. Note that because of this, $g(x,\theta) = g(x,\theta')$ implies $\theta = \theta'$. (Both here and in (1.1) the relations between two densities hold almost everywhere with respect to the dominating $\sigma$-finite measure $\mu$.)

Typically in cluster analysis one models data exhibiting two clusters by postulating a mixture of two densities. In this context it is important to test whether the observed clusters are real or merely a matter of appearance caused by random sampling from a homogeneous population. Formally, denoting the true density by $f$, one wishes to test

$$H_0:\ f = g(x,\theta),\quad \theta\in\Theta, \tag{1.2}$$

against the mixture alternatives $H_1$ considered above with $\pi \neq 0$, $\pi \neq 1$ and $\theta^{(1)} \neq \theta^{(2)}$. The identifiability assumption (1.1) ensures that $H_0$ and $H_1$ have no overlap. But nonetheless the classical asymptotic theory for likelihood ratio tests is not applicable.

Of course as pointed out in the literature, the null hypothesis is in some sense on the boundary of the parameter space of this problem, rather than its interior as assumed in classical theory. However, Chernoff (1954) has shown how to handle this kind of departure from standard assumptions; see also Feder (1968).

The real problem is that though the mixtures are identifiable, the parameters $\pi$, $\theta^{(1)}$, $\theta^{(2)}$ are not so. If the alternative hypothesis $H_1$ is true and the true density is written as $f(x,\pi,\theta^{(1)},\theta^{(2)})$, then there is exactly another set of parameter values, namely $(1-\pi,\theta^{(2)},\theta^{(1)})$, which will give exactly the same density; it will be seen in Section 5 that this kind of nonuniqueness is not hard to take care of. However, if $H_0$ is true and the true density is $g(x,\theta^{(0)})$, $\theta^{(0)}$ fixed, then the same density is represented by three curves: $\pi = 0$ and $\theta^{(1)} = \theta^{(0)}$, or $\pi = 1$ and $\theta^{(2)} = \theta^{(0)}$, or $\theta^{(1)} = \theta^{(0)}$ and $\theta^{(2)} = \theta^{(0)}$. Another way of expressing this fact is to observe that we can pass to the one dimensional space of $H_0$ by specifying only one co-ordinate at a time -- and not two -- in the three dimensional space of $H_1$.
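The two kinds of non-uniqueness just described can be checked numerically. The following is a minimal sketch, assuming unit-variance normal components $g(x,\theta) = N(\theta,1)$ (an illustrative choice, not one prescribed by the text); the function names `g` and `mixture` are hypothetical.

```python
import math

def g(x, theta):
    """Component density; N(theta, 1) is assumed here purely for illustration."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def mixture(x, pi, theta1, theta2):
    """Two-component mixture (1 - pi) g(x, theta1) + pi g(x, theta2)."""
    return (1.0 - pi) * g(x, theta1) + pi * g(x, theta2)

xs = [-2.0, -0.5, 0.0, 0.7, 3.1]

# Under H_1: relabelling (pi, t1, t2) -> (1 - pi, t2, t1) gives the same density.
label_swap_ok = all(
    math.isclose(mixture(x, 0.3, -1.0, 2.0), mixture(x, 0.7, 2.0, -1.0))
    for x in xs
)

# Under H_0 with true value theta0, three whole curves give the density g(x, theta0).
theta0 = 0.5
curve_pi_zero = all(   # pi = 0, theta1 = theta0, theta2 arbitrary
    math.isclose(mixture(x, 0.0, theta0, t2), g(x, theta0))
    for x in xs for t2 in (-3.0, 0.0, 4.0)
)
curve_pi_one = all(    # pi = 1, theta2 = theta0, theta1 arbitrary
    math.isclose(mixture(x, 1.0, t1, theta0), g(x, theta0))
    for x in xs for t1 in (-3.0, 0.0, 4.0)
)
curve_equal = all(     # theta1 = theta2 = theta0, pi arbitrary
    math.isclose(mixture(x, p, theta0, theta0), g(x, theta0))
    for x in xs for p in (0.1, 0.5, 0.9)
)

print(label_swap_ok, curve_pi_zero, curve_pi_one, curve_equal)
```

In each of the three null curves one co-ordinate is left completely free, which is why no single point of the three dimensional parameter space can serve as "the" true value under $H_0$.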

Of course one can try a parametrisation which is identifiable, i.e., one which sets up Euclidean parameters in one to one correspondence with the mixing distribution. Then the problem becomes one of lack of differentiability of the density with respect to these parameters, at points in the space of $H_0$. For example, we may try $\lambda_1 = (1-\pi)\theta^{(0)} + \pi\theta^{(1)}$, $\lambda_2 =$ the $\theta$ corresponding to $\mathrm{Min}(\pi,1-\pi)$ (with a suitable convention for $\pi = \tfrac12$), and $\lambda_3 = \{\mathrm{Min}(\pi,1-\pi)\}\{2\lambda_2 - \theta^{(0)} - \theta^{(1)}\}$.

We shall return to this problem after considering in detail a similar but simpler one which may be called the case of strongly identifiable mixtures. Here one considers two families of probability densities $g_1(x,\theta_1)$ and $g_2(x,\theta_2)$, $\theta_1\in\Theta_1\subset R^{p_1}$ and $\theta_2\in\Theta_2\subset R^{p_2}$. It is notationally convenient to replace $\pi$ by $\theta_0$ in the mixture

$$f(x,\theta_0,\theta_1,\theta_2) = (1-\theta_0)\,g_1(x,\theta_1) + \theta_0\,g_2(x,\theta_2).$$

If $\theta_0^{(1)} \neq 0$ or $1$, we assume that $f(x,\theta_0^{(1)},\theta_1^{(1)},\theta_2^{(1)}) = f(x,\theta_0^{(2)},\theta_1^{(2)},\theta_2^{(2)})$ implies $\theta^{(1)} = \theta^{(2)}$, where $\theta = (\theta_0,\theta_1,\theta_2)$.

Such mixtures may arise as models of partial slippage, contamination or cluster analysis with some information about the direction of additional clustering. We wish to consider the null hypothesis

$$H_0:\ \theta_0 = 0 \tag{1.3}$$

against the strongly identifiable mixture alternatives. Note that if $H_0$ is true the parameters are still not identifiable. Here is an example of this sort which will be worked out in detail in Section 4.

Example 1.

Here $N(\theta,1)$ stands for a normal density with mean $\theta$ and variance one.

To motivate our main result in the strongly identifiable case, we must make a few general remarks about the asymptotic behaviour of the mle (maximum likelihood estimate) when the parameter is not identifiable. Suppose, in fact, all of Wald's (1949) conditions for the consistency of mle hold except for identifiability of $\theta$. Suppose also, to fix ideas, that the parameter space is the three dimensional Euclidean space and all points non-identifiable from the true value $\theta^0$ lie on a curve $\Gamma$. The best that one can hope for is that the maximum of the likelihood will eventually be attained in a neighbourhood of this curve. Actually, Redner (1981) has observed that essentially Wald's proof under Wald's conditions (sans identifiability) guarantees this; Redner calls it convergence of the mle in the topology of the quotient space obtained by collapsing $\Gamma$ into a single point. This general fact has the following implication in the strongly identifiable case. When the true density is $g_1(x,\theta_1^0)$, the first two components of the mle $\hat\theta$, namely $\hat\theta_0$ and $\hat\theta_1$, will converge almost surely to their true values $\theta_0 = 0$ and $\theta_1^0$. Of course there is no true value of $\theta_2$ to which $\hat\theta_2$ can converge. (In fact under the assumptions made for Theorem 2.1 in Section 3, it can be shown that $\hat\theta_2$ cannot converge almost surely to a constant.)


The preceding facts will not be used explicitly in the sequel but they motivate what is done here. Among other things one sees that when $H_0$ is true, one cannot confine attention to a neighbourhood of a single point in order to maximise the likelihood. This means that the usual quadratic approximation to the likelihood is available only with respect to the first two components $\theta_0$ and $\theta_1$ for which the mle is consistent. However, under certain assumptions, we can still utilise these partial quadratic approximations to show that, asymptotically, the likelihood ratio statistic is distributed as a certain functional $W = \sup_{\theta_2}\{T^2(\theta_2)\,I(T(\theta_2) > 0)\}$, where $T(\cdot)$ is a Gaussian process with zero mean and covariance kernel depending on the true value $\theta_1^0$ under $H_0$. In Remark 2.1 we propose a family of other tests with simpler limiting distribution.

Note that our treatment does not follow from Chernoff (1954) or Feder (1968), because they were able to exploit the existence of a consistent solution of the likelihood equation in the identifiable case. To follow their approach, one would have to develop first results on solutions of the likelihood equation in the non-identifiable case. This can be done but use of techniques similar to those of Redner (1981) seemed aesthetically more satisfactory.
The fact that the likelihood is not locally approximable by a quadratic has another repercussion. The proof of asymptotic local minimaxity of the likelihood ratio test via approximation by Bayes tests also breaks down. In view of this it seems natural to introduce a prior $G$ on $\Theta_2$ and then work with the integrated likelihood ratio statistic

$$\int\prod_{i=1}^{n} f(x_i,\theta_0,\theta_1,\theta_2)\,G(d\theta_2)\Big/\sup_{\theta_1}\prod_{i=1}^{n} g_1(x_i,\theta_1)$$

or some other functional on the integrated likelihood. One should probably choose $G$ so that the associated test is asymptotically locally minimax. It is plausible that such a $G$ and an associated test always exist under reasonable conditions. As a first step, it is proved in Section 4 that for Example 1 the prior degenerate at $\theta_2 = b$ and the corresponding likelihood ratio test assuming $\theta_2 = b$ works in this sense. A similar result is proved for general exponentials. On the other hand it is not hard to show (though we do not prove it here) that the likelihood ratio test is not asymptotically locally minimax for these examples -- these are thus new instances of the failure of the principle of maximising the likelihood.
For the case of (not strongly) identifiable mixtures, a result analogous to Theorem 2.1 is obtained for the likelihood ratio test of the hypothesis in (1.2). To do this we assume a separation condition $\|\theta^{(1)}-\theta^{(2)}\| \ge \varepsilon$, where $\varepsilon > 0$ is a fixed quantity. In a subsequent communication we shall try to remove this condition.


A review of the literature on this topic is available in Bock (1981), Gupta and Huang (1981) and the final chapter of Everitt and Hand (1981). Even though there is no overlap with our results, a paper of Moran (1973) deserves to be mentioned. Moran derives the asymptotic distribution of the likelihood ratio test of homogeneity against special mixture alternatives in two cases, namely, Poisson and Gamma.

For his alternative Moran considers mixing distributions $G\{(\theta-\lambda)/\sigma\}$, where $G$ is a fixed known distribution of which one needs only that the third moment about mean is zero. In our set-up of two point mixtures, this would correspond to assuming $\pi = \tfrac12$, so that $H_0$ is equivalent to the scale parameter $|\theta^{(1)}-\theta^{(0)}| = 0$.

We conclude by making a few remarks about mixtures of $N(\mu,\sigma^2)$. The theorem in Section 5 applies directly only to the case of mixtures of means with known $\sigma$ and $\mu$ in a compact set. However, we feel a similar result would hold if $\sigma$ is unknown but $(\mu,\sigma)$ lies in a compact set, $\sigma \neq 0$, and only mixtures of mean are allowed. If one also allows scale mixtures, substantial changes are needed in the treatment since the derivative of the likelihood with respect to $\pi$ ceases to be square integrable. Note that a similar phenomenon occurs in Moran's case if $G$ has non-zero third moment about mean. It should be possible to handle such cases by suitable truncation of the derivatives. Finally, it may be noted that though we have confined ourselves to the case of mixtures our main conclusions hold for other cases of non-identifiable parameters.
2. The Main Result for Strongly Identifiable Mixtures. Let $\theta = (\theta_0,\theta_1,\theta_2)$ and $f(x,\theta)$ be as in the introduction and $L_n(\theta)$ be the log likelihood based on $n$ i.i.d. observations. Suppose $H_0$ of (1.3) is true and the true density is $g_1(x,\theta_1^0)$. All expectations and probabilities will be computed in this and the next section under this assumption but this will not be displayed in the notation.
We now sketch an argument leading to Theorem 2.1, introducing notations as we go along. The details for handling several remainder terms as well as the necessary assumptions (A1 through A5) are collected in the next section. Here we only remark briefly on the nature of the assumptions. The assumptions are similar to those in the classical case but have to be strengthened suitably to ensure uniformity in $\theta_2$ at various places. The latter as well as tightness of the Gaussian process $T(\cdot)$ introduced below makes it convenient to work with $\Theta_2$ as a closed bounded interval. However all we really need is compactness of $\Theta_2$ or its closure. The restriction to dimension one for $\theta_2$ is made so that the tightness Assumption A5 (vide Section 3) is easy to write down; it may be extended by making use of analogous conditions in Bickel and Wichura (1971). As usual we take $\Theta_1$ as an open rectangle in $R^p$.

Among other things the assumptions of the next section guarantee that all quantities introduced below are well-defined.
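We now begin by rescaling the parameters through

$$\theta_0 = \theta_0^0 + \eta_0/\sqrt{n}, \qquad \theta_1 = \theta_1^0 + \eta_1/\sqrt{n}, \qquad \theta_2 = \eta_2,$$

where $\theta_0^0 = 0$ and $\eta = (\eta_0,\eta_1,\eta_2)$. Let $L_n(\theta)$ be denoted as $V_n(\eta)$ when regarded as a function of $\eta$. Note that $V_n(0,0,\eta_2)$ is free of $\eta_2$ and hence may be written simply as $V_n(0)$. Let $U_{n0}(\eta_2)$ be the normalised derivative with respect to $\theta_0$ and let $U_{n1}$ be the $1\times p$ (row) vector of normalised derivatives with respect to the components of $\theta_1$. The presence or absence of $\eta_2$ in $U_{n0}$, $U_{n1}$ indicates dependence on $\eta_2$ or lack of it. The same convention is followed below. Let $U_n(\eta_2) = (U_{n0}(\eta_2),U_{n1})$ and let

$$I(\eta_2) = \begin{pmatrix} I_{00}(\eta_2) & I_{01}(\eta_2) \\ I_{01}(\eta_2)^T & I_{11} \end{pmatrix}$$

be the associated $(p+1)\times(p+1)$ matrix. By Assumption A1, the $I_{ij}$'s are related to the second order derivatives of $\log f(X_1,\theta)$ in the usual way. Expanding $V_n(\eta)$ with respect to the first two co-ordinates, by A1,

$$V_n(\eta) = V_n(0) + \Lambda_n(\eta) + R_{n1}, \qquad \Lambda_n(\eta) = \eta_{01}U_n(\eta_2)^T - \tfrac12\,\eta_{01}I(\eta_2)\eta_{01}^T, \tag{2.1}$$

where $\eta_{01} = (\eta_0,\eta_1)$ and the remainder term $R_{n1}$ is $o_p(1)$ uniformly in $\eta_2$ on bounded sets of $\eta_0$ and $\eta_1$.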

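We prove in the next section that uniformly in $\eta_2$,

$$\sup_{0\le\theta_0\le 1,\ \theta_1\in\Theta_1} L_n(\theta) = V_n(0) + \sup_{\eta_0\ge 0,\ \eta_1\in R^p}\Lambda_n(\eta) + o_p(1), \tag{2.2}$$

where $\Lambda_n(\eta)$ is the quadratic part of the expansion (2.1). (The proof is similar to the classical case but one has to ensure uniformity in $\eta_2$.) By the well-known Kuhn-Tucker-Lagrange theorem [viz., McCormick (1967)] the supremum of $\Lambda_n(\eta)$ is

$$\tfrac12\,U_n(\eta_2)\,I(\eta_2)^{-1}\,U_n(\eta_2)^T = \tfrac12\,U_{n1}I_{11}^{-1}U_{n1}^T + \tfrac12\,T_n^2(\eta_2) \tag{2.3}$$

if

$$T_n(\eta_2) \overset{\mathrm{def}}{=} \{I^{00}(\eta_2)\}^{1/2}\{U_{n0}(\eta_2) - I_{01}(\eta_2)I_{11}^{-1}U_{n1}^T\} \ge 0, \tag{2.4}$$

where $I^{00}(\eta_2) = \{I_{00}(\eta_2) - I_{01}(\eta_2)I_{11}^{-1}I_{01}(\eta_2)^T\}^{-1}$ is the leading element of $I(\eta_2)^{-1}$, and the supremum is

$$\tfrac12\,U_{n1}I_{11}^{-1}U_{n1}^T \tag{2.5}$$

if $T_n(\eta_2) < 0$.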
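As a numerical check on the constrained maximisation in (2.3)-(2.5), the sketch below takes $\theta_1$ scalar ($p = 1$) and compares a closed-form supremum of the quadratic $\eta_{01}U^T - \tfrac12\eta_{01}I\eta_{01}^T$ over $\eta_0 \ge 0$, $\eta_1 \in R$ with a brute-force grid search. The function names, the numerical inputs and the grid are illustrative assumptions, and the closed form used is the standard block-matrix reduction, not a transcription of the displays.

```python
import math

def constrained_sup(u0, u1, i00, i01, i11):
    """Sup of eta0*u0 + eta1*u1 - 0.5*(i00*eta0^2 + 2*i01*eta0*eta1 + i11*eta1^2)
    over eta0 >= 0 and eta1 real, for a positive definite 2x2 matrix."""
    schur = i00 - i01 * i01 / i11                   # I_00 - I_01 I_11^{-1} I_01
    t = (u0 - i01 * u1 / i11) / math.sqrt(schur)    # standardised statistic T
    base = 0.5 * u1 * u1 / i11                      # value attained with eta0 = 0
    return base + 0.5 * t * t if t > 0 else base

def grid_sup(u0, u1, i00, i01, i11, step=0.02, span=5.0):
    """Brute-force maximum of the same quadratic on a grid (illustration only)."""
    best = -math.inf
    n0 = int(round(span / step))
    n1 = int(round(2 * span / step))
    for a in range(n0 + 1):
        e0 = a * step                               # eta0 in [0, span]
        for c in range(n1 + 1):
            e1 = -span + c * step                   # eta1 in [-span, span]
            q = (e0 * u0 + e1 * u1
                 - 0.5 * (i00 * e0 * e0 + 2 * i01 * e0 * e1 + i11 * e1 * e1))
            best = max(best, q)
    return best

# Interior case (T > 0) and boundary case (T < 0, maximiser has eta0 = 0).
for u0 in (1.0, -1.0):
    exact = constrained_sup(u0, 0.5, 2.0, 0.5, 1.0)
    approx = grid_sup(u0, 0.5, 2.0, 0.5, 1.0)
    print(u0, round(exact, 4), round(approx, 4))
```

When $T \le 0$ the constraint $\eta_0 \ge 0$ is active and only the $\theta_1$-score contributes, which is the source of the indicator $I\{T_n(\eta_2) > 0\}$ in the likelihood ratio statistic.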

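Similarly

$$L_n(H_0) \overset{\mathrm{def}}{=} \sup_{\theta_0 = 0,\ \theta_1\in\Theta_1} L_n(\theta) = V_n(0) + \tfrac12\,U_{n1}I_{11}^{-1}U_{n1}^T + o_p(1). \tag{2.6}$$

Hence, writing $L_n(\eta_2)$ for the supremum on the left of (2.2) and $T_n(\eta_2)$ for the standardised statistic of (2.4),

$$\lambda_n(\eta_2) \overset{\mathrm{def}}{=} 2\{L_n(\eta_2) - L_n(H_0)\} \tag{2.7}$$

$$= T_n^2(\eta_2)\,I\{T_n(\eta_2) > 0\} + o_p(1). \tag{2.8}$$

So the likelihood ratio statistic is by definition

$$\lambda_n = \sup_{\eta_2}\lambda_n(\eta_2) = \sup_{\eta_2}\{T_n^2(\eta_2)\,I(T_n(\eta_2) > 0)\} + o_p(1). \tag{2.9}$$

Assume $\Theta_2 = [b,c]$. By A4 and A5 of Section 3, the stochastic process $T_n(\cdot)$ taking values in $C[b,c]$ converges weakly to a Gaussian process $T(\cdot)$ on $C[b,c]$ whose mean is zero and whose covariance kernel is the same as that of $T_1(\cdot)$ (under $\theta_1^0$) and easy to write down. The covariance of $T_1(\eta_{21})$ and $T_1(\eta_{22})$ is given below assuming $\theta_1$ scalar:

$$\mathrm{Cov}(T_1(\eta_{21}),T_1(\eta_{22})) = \{I^{00}(\eta_{21})\,I^{00}(\eta_{22})\}^{1/2}\{J(\eta_{21},\eta_{22}) - I_{01}(\eta_{21})I_{11}^{-1}I_{01}(\eta_{22})\},$$

where $J(\eta_{21},\eta_{22})$ is the covariance of $U_{10}(\eta_{21})$ and $U_{10}(\eta_{22})$ under $\theta_1^0$. Note that $\mathrm{Var}(T_1(\eta_2)) = 1\ \forall\,\eta_2$. Let $W = \sup_{\eta_2}\{T^2(\eta_2)\,I(T(\eta_2) > 0)\}$. Since $W$ is a continuous functional of $T(\cdot)$, we have

Theorem 2.1. Under the assumptions A1 through A5 of Section 3, $\lambda_n$ converges in distribution to $W$ under $\theta_1^0$.

Remark 2.1 (a) The limiting distribution of $\lambda_n$ simplifies a little when $\Theta_2$ assumes only $k$ distinct values. Identifiability of the mixtures would require that $k-1$ [relation illegible in the source] $\dim\Theta_1 + \dim\Theta_2 + 1$.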

(b) The limiting distributions of $T_n(\theta_2)$ and $\lambda_n(\theta_2)$ are given explicitly and applied to Example 1 in Section 4.

(c) To get alternative test statistics whose limiting distributions are easier to compute, one may approximate $\Theta_2$ by a finite set $\{\theta_2^1,\ldots,\theta_2^m\}$ and then consider the statistics $T_n(\theta_2^i)$, $i = 1,\ldots,m$, which are asymptotically multivariate normal with zero mean under $H_0$; the dispersion matrix can be consistently estimated since the mle is consistent for $\theta_1^0$ when $H_0$ is true. One can use as a test statistic any suitable function of the $T_n(\theta_2^i)$'s; for example, choosing a suitably positive definite quadratic form in the $T_n$'s and then estimating the coefficients, one would get a limiting $\chi^2$-distribution.

3. Assumptions and Details of Proof. As in Section 2 all expectations are computed under a fixed $\theta_1^0$. The assumptions are marked A1 through A5. Instead of collecting them at one place we shall present them as the need arises, in course of supplying some of the details left out in Section 2.
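Let $\theta_{01} = (\theta_0,\theta_1)$, $\theta_{01}^0 = (0,\theta_1^0)$. A similar convention is followed for $\eta_{01}$, $\eta_{01}^0$. Let $D_0 = \partial/\partial\theta_0$, $D_j = \partial/\partial\theta_{1j}$, $j = 1,\ldots,p$, and unless otherwise stated the derivatives are evaluated at $(\theta_{01}^0,\theta_2)$.

A1. (i) $\Theta_1$ is an open set of $R^p$, $\Theta_2$ a closed bounded interval $[b,c]$ of $R^1$.

(ii) $f(x,\theta)$ is continuous in $\theta$ and twice continuously differentiable with respect to $\theta_{01}$.

(iii) $E(D_j\log f) = 0$, $E(D_jD_{j'}\log f) = -E(D_j\log f \times D_{j'}\log f)$.

(iv) $E\{\sup\,|D_jD_{j'}\log f(X,\theta_{01},\theta_2) - D_jD_{j'}\log f(X,\theta_{01}^0,\theta_2)|\} \to 0$ as $\delta \to 0$, $j,j' = 0,1,\ldots,p$, where the supremum is over $\|\theta_{01}-\theta_{01}^0\| < \delta$ and $\theta_2\in\Theta_2$.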

To handle the remainder in (2.2) we proceed in three stages. Assumption A2 will allow us to restrict attention to a compact subset of $\Theta_1$ while calculating the supremum of $L_n(\theta)$. Then with the help of A3 we work with arbitrary but fixed neighbourhoods of $\theta_{01}^0$, which is replaced at the third and final stage by neighbourhoods that shrink like $n^{-1/2}$, i.e., bounded neighbourhoods in the $\eta_{01}$-plane. The fact that $R_{n1}$ in (2.1) is $o_p(1)$ uniformly on bounded $\eta_{01}$-sets now completes the proof.

A2. There exists a compact neighbourhood $N$ of $\theta_1^0$ such that $|W(X_1,\theta_2)| \le H(X_1)\ \forall\,\theta_2$ and $E(H(X_1)) < \infty$. Moreover, $W(X_1,\cdot)$ is continuous on $\Theta_2$.

By the uniform strong law of large numbers (USLLN) applied to

L (8) - V (0),

Sup

and the fact that

8
we get


01

E[O,l]XN

Sup

(8) -

0l

(1)

(3.1)

E:[0,1]XN

8 .
2

uniformly in
A3.

L (e) =

there exists an open ball with centre

For each

and radius

[0,1] x 01

and

U = U(e 1,0)

such that if

is its intersection with

then

By A3 and continuity of

Let

U = {e

ol

I 18l-8~1 I

; 0 < eo < 0,

U n [0,1] x N by sets

U(8

Now apply the USLLN to

-1

0l

,01)

<

oJ.

Consider an open cover of

and choose a finite subcover

L:t/J(X ,U ,8 ),
i j 2

= l, ,m,

82

02

Ul' ,U
m
to con-

clude that
=

uniformly in

e
2

(1)

This completes the second stage of the proof.

At the third and final stage, note that, by Taylor's theorem and A1,

$$L_n(\theta) = V_n(0) + \eta_{01}U_n(\eta_2)^T + \tfrac12\,\eta_{01}J(\eta)\eta_{01}^T,$$

where $|J_{ij}(\eta) + I_{ij}(\eta_2)| = \delta\cdot O_p(1)$ uniformly in $U_\delta\times\Theta_2$. We now use

A4. The $(p+1)\times(p+1)$ matrix $I(\theta_2)$ is continuous in $\theta_2$ and its minimum eigen value is greater than some $\lambda_0 > 0\ \forall\,\theta_2$; and

A5. $E|D_0\log f(X_1,\theta_{01}^0,\theta_2) - D_0\log f(X_1,\theta_{01}^0,\theta_2')|^{\alpha} \le K|\theta_2-\theta_2'|^{1+\gamma}$ for some $\alpha,\gamma > 0$.

A5 ensures tightness of $U_{n0}(\cdot)$. (To see this one has to use the theorem of Dharmadhikari et al. (1968).) Hence, $\sup_{\eta_2}|U_{n0}(\eta_2)|$ is $O_p(1)$. Also by A1, $U_{n1}$ is also $O_p(1)$. Hence (by A4) for given $\varepsilon$, we can find $\varepsilon' < \varepsilon$, $K$ and $n_0$ such that for $n > n_0$,

$$P\{\sup_{\eta_2}|U_{n0}(\eta_2)| + \|U_{n1}\| < K,\ \text{the smallest eigen value of } [-J_{ij}(\eta)]_{(p+1)\times(p+1)} > \varepsilon' \text{ on } U_\delta\times\Theta_2\} > 1-\varepsilon.$$

Then by first making a suitable orthogonal transformation, one can find $M$ such that for $n > n_0$,

$$P\{V_n(\eta) < V_n(0),\ \Lambda_n(\eta) < \Lambda_n(0)\ \text{if}\ \|\eta_{01}\| > M\} > 1-\varepsilon,$$

where $M$ depends only on $K$ and $\varepsilon'$. Thus with probability $> 1-\varepsilon$ the supremum of $L_n(\theta)$, i.e., $V_n(\eta)$ (over $U_\delta\times\Theta_2$) and that of $\Lambda_n(\eta)$ (over $[0,\infty)\times R^p$) is attained in $\|\eta_{01}\| \le M$. Since on this set $R_{n1}$ of (2.1) is $o_p(1)$ by A1, the proof of (2.2) is complete.

The proof of the similar result (2.6) follows along similar lines from A1 through A4 which are of course much stronger than what we need for (2.6).

Remark 3.1 (a) The tightness assumption A5 holds if

$$|D_0\log f(X_1,\theta_{01}^0,\theta_2) - D_0\log f(X_1,\theta_{01}^0,\theta_2')| \le \psi(X_1)\,|\theta_2-\theta_2'| \quad\text{for } \theta_2,\theta_2'\in\Theta_2 \tag{3.2}$$

and $E(\psi(X_1)^{\beta}) < \infty$ for some $\beta > 1$.

(b) The limiting distribution of $T_n(\cdot)$ depends on the true value of $\theta_1$. It is weakly continuous in $\theta_1$ provided (i) the covariance kernel of $T(\cdot)$ is continuous in $\theta_1$; and (ii) the Lipschitz condition (3.2) is suitably strengthened to be uniform over $\theta_1$-neighbourhoods; (i) guarantees convergence of finite dimensional distributions of $T(\cdot)$ under $\theta_1'$ as $\theta_1' \to \theta_1$, and (ii) guarantees tightness. Under these conditions the limiting distribution of $\lambda_n$ is also weakly continuous in $\theta_1$. Suppose this is so and let $t(\theta_1,\alpha)$ be such that $\lim P_{\theta_1}\{\lambda_n \ge t(\theta_1,\alpha)\} = \alpha$. If for each $\theta_1$, $t$ exists, is unique and a point of continuity of the limiting distribution of $\lambda_n$, then $t(\theta_1,\alpha)$ is continuous in $\theta_1$. By Redner's (1981) result, $\hat\theta_1$ is consistent for $\theta_1$ under $H_0$. Hence, as pointed out to us by Peter Bickel, $\lim P_{\theta_1}\{\lambda_n \ge t(\hat\theta_1,\alpha)\} = \alpha$. Thus the test which rejects $H_0$ if $\lambda_n \ge t(\hat\theta_1,\alpha)$ would be asymptotically similar provided the conditions assumed here hold. They are easy to check under A1 to A5 if $\Theta_2$ is a finite set.

4. Asymptotically Locally Minimax Tests in some Examples. We shall calculate the asymptotic properties of tests based on $\lambda_n(\theta_2)$, where $\lambda_n(\theta_2)$ is defined in (2.2) and (2.7), and show that it is asymptotically locally minimax for problems like Example 1.

Fix $\theta_1^0$ as in Section 2 and a sequence of alternatives $K_n$ corresponding to a fixed $\eta = (\eta_0,\eta_1,\eta_2)$. We fix also a value $b$ of $\theta_2$ and consider the limiting distribution of $T_n(b)$ under $\theta_1^0$ and $K_n$, where $T_n(b)$ is defined in (2.4). Let $Z_n^* = V_n(\eta) - V_n(0)$. Then by (2.1), $Z_n^*$ is asymptotically $N(-\tfrac12\eta_{01}I(\eta_2)\eta_{01}^T,\ \eta_{01}I(\eta_2)\eta_{01}^T)$ under $\theta_1^0$. By a well-known result of LeCam, namely his first lemma on contiguity [cf. Hajek and Sidak (1967, p. 204)], this shows $K_n$ is contiguous to $\theta_1^0$. Since $T_n(b)$ and $Z_n^*$ are asymptotically bivariate normal, by another well-known result of LeCam, namely, his third lemma on contiguity [vide Hajek and Sidak (1967)], $T_n(b)$ is asymptotically normal under $K_n$ with the same asymptotic variance as under $\theta_1^0$ and mean equal to the mean under $\theta_1^0$ plus the asymptotic covariance of $T_n(b)$ and $Z_n^*$ under $\theta_1^0$. Moreover, $\lambda_n(b) = T_n^2(b)\,I\{T_n(b) > 0\} + o_p(1)$ under $K_n$, since the same relation holds under $\theta_1^0$ and $K_n$ is contiguous. Under $\theta_1^0$, $T_n(b)$ is asymptotically normal with mean zero and variance unity. Also the asymptotic covariance of $Z_n^*$ and $T_n(b)$ under $\theta_1^0$ is

$$\rho = \{I^{00}(b)\}^{-1/2}\Big[\eta_0 I^{00}(b)\,\mathrm{Cov}(U_{n0}(b),U_{n0}(\eta_2)) + \eta_0\sum_{j=1}^{p} I^{0j}(b)\,\mathrm{Cov}(U_{n1j},U_{n0}(\eta_2))\Big],$$

where the $I^{ij}$ are the elements of $I^{-1}$ and $U_{n1j}$ is the $j$-th component of $U_{n1}$. Note $\rho$ depends only on $\eta_0$ and $\eta_2$ and so may be written $\rho(\eta_0,\eta_2)$. By the remarks in the preceding paragraph, the following result is true.

Theorem 4.1. Assume the conditions of Section 3. Then

$$\lim P_{\theta_1^0}\{\lambda_n(b) \ge x\} = 1-\Phi(\sqrt{x})\ \text{ if } x > 0, \qquad = 1\ \text{ if } x = 0,$$

$$\lim P_{K_n}\{\lambda_n(b) \ge x\} = 1-\Phi(\sqrt{x}-\rho(\eta_0,\eta_2))\ \text{ if } x > 0, \qquad = 1\ \text{ if } x = 0,$$

where $\Phi$ is the standard normal distribution function.
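The null limit in Theorem 4.1 puts mass one half at zero and otherwise behaves like the upper half of a $\chi^2_1$ variable. The simulation below is a rough sketch of this in a deliberately simplified contamination model, $(1-\theta_0)N(0,1) + \theta_0 N(b,1)$ with the null component fully known (so there is no $\theta_1$ to estimate). This model, the sample sizes and all function names are assumptions made for illustration; it is not the paper's Example 1.

```python
import math
import random

def lr_statistic(xs, b):
    """2 * sup over theta0 in [0, 1] of the log likelihood ratio against
    theta0 = 0, for the mixture (1 - theta0) N(0,1) + theta0 N(b,1)."""
    # d_i = w_i - 1 with w_i the density ratio N(b,1)/N(0,1) at x_i;
    # the log likelihood ratio at t is sum_i log(1 + t * d_i), concave in t.
    d = [math.exp(b * x - 0.5 * b * b) - 1.0 for x in xs]

    def score(t):
        return sum(di / (1.0 + t * di) for di in d)

    if score(0.0) <= 0.0:          # maximum sits on the boundary t = 0
        return 0.0
    lo, hi = 0.0, 1.0
    if score(1.0) >= 0.0:
        lo = 1.0                   # maximum at the other boundary
    else:
        for _ in range(40):        # bisection on the decreasing score function
            mid = 0.5 * (lo + hi)
            if score(mid) > 0.0:
                lo = mid
            else:
                hi = mid
    return 2.0 * sum(math.log(1.0 + lo * di) for di in d)

def simulate_null(n=150, reps=300, b=1.0, seed=1):
    """Draw `reps` samples of size n from N(0,1) and return the LR statistics."""
    rng = random.Random(seed)
    return [lr_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], b)
            for _ in range(reps)]

stats = simulate_null()
frac_zero = sum(s == 0.0 for s in stats) / len(stats)
frac_large = sum(s >= 2.706 for s in stats) / len(stats)   # 2.706 is about 1.645**2
# Limits suggested by Theorem 4.1: 0.5 and about 0.05 respectively.
print(frac_zero, frac_large)
```

With finite samples the two fractions only approximate their limits; the density ratio is skewed, so the fraction of exact zeros tends to sit slightly above one half.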

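Consider now Example 1 given in the introduction. Let the limiting power of a sequence of tests $\{\phi_n\}$ be denoted by $\beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,\eta_2)$. Let us say $\{\phi_n^0\}$ of size $\alpha < .5$ under $H_0$ is asymptotically locally minimax if for all sequences $\{\phi_n\}$ of size $\alpha$ which have limiting power,

$$\inf_{\eta_1\in R^1,\ \eta_2\in\Theta_2}\beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,\eta_2) \le \inf_{\eta_1\in R^1,\ \eta_2\in\Theta_2}\beta(\{\phi_n^0\},\theta_1^0,\eta_0,\eta_1,\eta_2)$$

for every $\eta_0 > 0$ and $\theta_1^0$. By the classical theory for the likelihood ratio test $\phi_n^0$ based on $\lambda_n(b)$,

$$\inf_{\eta_1\in R^1}\beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,b) \le \inf_{\eta_1\in R^1}\beta(\{\phi_n^0\},\theta_1^0,\eta_0,\eta_1,b)$$

for every $\eta_0 > 0$ and $\theta_1^0$. Hence asymptotic local minimaxity of $\phi_n^0$ will follow from Theorem 4.1 if we show $\rho(\eta_0,\eta_2) \ge \rho(\eta_0,b)\ \forall\,\eta_2 \ge b$. For Example 1 this follows easily by direct calculation. However the following lemma shows this property is true for general exponentials. Before stating the lemma we introduce some notations.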


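Let $g_\theta = g(x,\theta) = A(\theta)\exp\{\theta x\}h(x)$, $\theta\in$ some open interval $J$, be a family of probability densities. Let $a < b$ be fixed elements of $J$. Let $g_a'$ denote $\partial g_\theta/\partial\theta$ at $\theta = a$, $I_{11} = \mathrm{Var}(g_a'/g_a)$, $I_{01}(\theta) = \mathrm{Cov}(g_a'/g_a,\ g_\theta/g_a)$ and

$$\psi(\theta) = I_{11}\,\mathrm{Cov}\Big(\frac{g_\theta}{g_a},\frac{g_b}{g_a}\Big) - I_{01}(\theta)\,I_{01}(b),$$

where the covariances are computed under $g_a$. To relate to Example 1 (and similar problems) note that $\rho(\eta_0,\theta)$ is a positive multiple of $\psi(\theta)$ with $\theta_1^0 = a$. We assume $\psi(\theta)$ is finite on $J$.

Lemma 4.1. $\psi(\theta) \ge \psi(b)$ if $\theta > b$.

Proof. Note that $\psi(a) = 0 < \psi(b)$. Also $\psi(\theta)$ can be expressed as

$$\psi(\theta) = \int\Big\{I_{11}\frac{g_b}{g_a} - I_{01}(b)\frac{g_a'}{g_a} - I_{11}\Big\}\,g_\theta\,d\mu. \tag{4.1}$$

Since the expression within braces, $\phi(x)$ say, is convex, for any constant $K$, $\phi(x) - K$ can have at most two sign changes and if there are two, they must be from positive to negative and negative to positive. Hence by Karlin's well-known result on sign diminishing properties of the exponential densities (see, e.g., Karlin (1968)), $\psi(\theta) - K$ has similar sign change properties. If there exists $\theta' > b$ such that $\psi(\theta') < \psi(b)$ then this sign change property would be contradicted at the points $a, b, \theta'$ provided we choose $K$ such that $\mathrm{Max}\{0,\psi(\theta')\} < K < \psi(b)$. This proves the lemma.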
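For the normal location family the monotonicity asserted by Lemma 4.1 can be seen by direct calculation. The sketch below assumes $g_\theta = N(\theta,1)$ and takes $\psi(\theta) = I_{11}\,\mathrm{Cov}(g_\theta/g_a, g_b/g_a) - \mathrm{Cov}(g_a'/g_a, g_\theta/g_a)\,\mathrm{Cov}(g_a'/g_a, g_b/g_a)$ under $g_a$ (this form of $\psi$ is stated explicitly as an assumption of the sketch). In that case $\psi(\theta) = e^{st} - 1 - st$ with $s = \theta - a$ and $t = b - a$; the closed form is compared with numerical integration. Function names and numerical settings are hypothetical.

```python
import math

def psi_closed_form(theta, a, b):
    """psi(theta) for N(theta, 1) densities: exp(s*t) - 1 - s*t, s = theta - a, t = b - a."""
    s, t = theta - a, b - a
    return math.exp(s * t) - 1.0 - s * t

def psi_numeric(theta, a, b, half_width=12.0, steps=12000):
    """Same quantity by trapezoidal integration under N(a, 1)."""
    s, t = theta - a, b - a
    h = 2.0 * half_width / steps
    m_rr = m_zr_s = m_zr_t = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h                  # z = x - a is standard normal under g_a
        wgt = h * (0.5 if k in (0, steps) else 1.0)
        dens = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        r_s = math.exp(s * z - 0.5 * s * s)      # g_theta / g_a
        r_t = math.exp(t * z - 0.5 * t * t)      # g_b / g_a
        m_rr += wgt * dens * r_s * r_t           # E[r_s r_t]
        m_zr_s += wgt * dens * z * r_s           # E[z r_s] = Cov(g_a'/g_a, g_theta/g_a)
        m_zr_t += wgt * dens * z * r_t           # E[z r_t] = Cov(g_a'/g_a, g_b/g_a)
    i11 = 1.0                                    # Var(g_a'/g_a) = Var(z) = 1
    return i11 * (m_rr - 1.0) - m_zr_s * m_zr_t

a, b = 0.0, 1.0
for theta in (1.0, 1.5, 2.0, 3.0):
    print(theta, round(psi_closed_form(theta, a, b), 6), round(psi_numeric(theta, a, b), 6))
```

Since $e^{u} - 1 - u$ is increasing for $u > 0$ and $u = (\theta-a)(b-a)$ increases with $\theta$ when $b > a$, the printed values increase with $\theta$, in line with the lemma.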

5. The Case of Identifiable Mixtures. Let $\Theta$ be an open bounded real interval and $\bar\Theta$ its closure. Let $g(x,\theta)$, $\theta\in\bar\Theta$, be a family of densities and consider the mixtures

$$(1-\theta_0)\,g(x,\theta_1) + \theta_0\,g(x,\theta_2),$$

where $0 < \theta_0 < 1$ and $|\theta_2-\theta_1| \ge \varepsilon$, $\varepsilon$ being a fixed positive number. Without loss of generality we may take $0 < \theta_0 \le \tfrac12$. We wish to test homogeneity, i.e., $H_0$: the true density is $g(x,\theta)$ for some $\theta$, against the above mixture alternatives. In the sequel suppose $H_0$ is true and the true density is $g(x,\theta_1^0)$. We make the following blanket assumption.

B1. Let $\Theta_1$ be any open set containing $\theta_1^0$ and $\Theta_2$ any closed set such that $\Theta_1\cap\Theta_2 = \emptyset$. Then A1, A3, A4, A5 hold with this $\Theta_1$, $\Theta_2$. (Since $\bar\Theta$ is compact by assumption, A2 is dropped.)

One can now imitate the arguments in Sections 2 and 3.

we can show that in order to maximise the log likelihood


restrict to
restricted to
Define

< e

< 0

and

82 -< 80 - (-0)

n,Vn(n),An(n)

I81-8~1

< 0

and

82 > 8
-

(0 < E).
0

etc. as in Section 2.

L (8)

vn (0) + Sup An (n) +


An

is over the set


0(1)

and

A (n)

= (0,0).

Hence

o1

8 ,
2

where the maximisation of

may be

Then one can prove as in

attain their maximum in a bounded neighbourhood of

Sup L (8)
n
8
01

82

we may

Call this set

Section 3, that with probability tending to one both

uniformly in

L (8)
n

Hence

+ (-c)).

As in Section 3,

(1)

Va'

Because of the nature of

for given E' one can find K and


2
> 1 - E, the maximum of A
(over
n

such that with probability

o<

E -

attained at

no'

n ol

62

2- 8 1 +

if

E -

Kn

Kn

-~

2- 8 2 2- 8 1

+ 0 or

-~

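An easy calculation now shows

$$\sup_{\theta} L_n(\theta) = V_n(0) + \sup\Lambda_n(\eta) + o_p(1),$$

where the supremum of $\Lambda_n(\eta)$ is taken with $\theta_2$ ranging over $\Theta_2^{(2)} = \{\theta_2:\ \theta_2 \le \theta_1^0-\varepsilon \text{ or } \theta_2 \ge \theta_1^0+\varepsilon\}$. The supremum of $L_n(\theta)$ under $H_1$ has therefore the same expression as in Section 2 with $\Theta_2 = \Theta_2^{(2)}$. Since the expression for the supremum of $L_n(\theta)$ under $H_0$ remains unaltered, the conclusion of Theorem 2.1 is valid, i.e., the following is true.

Theorem 5.1. Assume B1. Then the limiting distribution of the likelihood ratio test under $\theta_1^0$ is the same as that in Theorem 2.1 with $\Theta_2 = \Theta_2^{(2)}$.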

ACKNOWLEDGEMENT
Thanks are due to the referee whose comments clarified many issues
and led to a better presentation.

REFERENCES

Bickel, P.J. and Wichura, M.J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist., 42, 1656-1670.

Bock, H.H. (1981). Statistical testing and evaluation methods in cluster analysis. Paper presented at the ISI Golden Jubilee Conference, to appear in the Proceedings.

Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Statist., 25, 573-578.

Dharmadhikari, S.W., Fabian, V. and Jogdeo, K. (1968). Bounds on moments of martingales. Ann. Math. Statist., 39, 1719-1723.

Everitt, B.S. and Hand, D.J. (1981). Finite Mixture Distributions. Chapman and Hall, London.

Feder, P.I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is near the boundaries of the hypothesis region. Ann. Math. Statist., 39, 2044-2055.

Gupta, S.S. and Huang, Wen-Tao (1981). On mixtures of distributions: a survey and some new results on ranking and selection. Sankhya, 43, 245-290.

Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. Academic Press, New York.

Karlin, S. (1968). Total Positivity, Vol. 1. Stanford University Press, Stanford, California.

McCormick, S.P. (1967). Nonlinear Programming. McGraw Hill, New York.

Moran, P.A.P. (1973). Asymptotic properties of homogeneity tests. Biometrika, 60, 79-85.

Redner, R. (1981). Note on the consistency of the maximum likelihood estimate for non-identifiable distributions. Ann. Statist., 9, 224-227.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist., 20, 595-601.
