The Asymptotic Behaviour of Threshold-Based Classification Rules in Case of Three Prescribed Classes

ISSN 2347-1921
Volume 12 Number 05
Journal of Advances in Mathematics
Oksana Kubaychuk
Department of Computer Science, National University of Food Technologies, Kyiv, Ukraine
kubaychuk@gmail.com
ABSTRACT
We consider the problem of classifying an object from an observation of its numerical characteristic in the case of
three prescribed classes. We also study the construction and the asymptotic behaviour of threshold-based classification
rules built from a sample from a mixture with varying concentrations.
Keywords
classification rule; mixture with varying concentrations; estimator; threshold.
Academic Discipline And Sub-Disciplines

Mathematics, Probability and Statistics
SUBJECT CLASSIFICATION
62G05, 62G20, 60G15
1. INTRODUCTION
Classifying an object by its numerical characteristic is an important theoretical problem with practical
significance; for example, a person is defined as not healthy if his or her body temperature exceeds 37 °C.
To solve this problem we consider a threshold-based rule. According to this rule, an object is classified as
belonging to the first class if its characteristic does not exceed the threshold of 37 °C; otherwise, the object is
classified as belonging to the second class. Empirical Bayes classification (EBC) (Devroye and Gyorfi, 1985; Ivanko and
Maiboroda, 2002) and empirical risk minimization (ERM) (Vapnik, 1989; Vapnik, 1996) are widely used
methods for estimating the best threshold. The case when the learning sample is obtained from a mixture with
varying concentrations is considered in (Ivanko and Maiboroda, 2006).
However, it is often necessary to classify an object using more than one threshold; for example, a person is
defined as not healthy if his or her body temperature exceeds 37 °C or is lower than 36 °C. Another
example: a person is sick if his or her haemoglobin level exceeds 84 units or is lower than 72 units. In particular,
this problem is discussed in (Kubaychuk, 2008; Kubaychuk, 2010).
In all the previous examples there are only two prescribed classes. The case of two thresholds and three
prescribed classes deserves special attention. An example is the classification of disease stages. Thus,
the tumor marker CA 15-3 is used in the diagnosis of breast cancer. If its value is less than 22 IU/ml, then the
person is healthy; if its level is in the range from 22 to 30 IU/ml, a precancerous condition can be diagnosed; if the
index is above 30 IU/ml, the patient has cancer. When solving some technical problems, one needs to consider a
substance in its various aggregate states: gaseous, liquid, solid. The transition from state to state occurs at a
specific temperature; accordingly, the boiling point and the melting point are used.
2. THE SETTING OF THE PROBLEM

The problem of classifying an object $O$ from an observation of its numerical characteristic $\xi = \xi(O)$
is studied. We assume that the object may belong to one of three prescribed classes. The unknown number of
the class containing $O$ is denoted by $\mathrm{ind}(O)$. A classification rule (briefly, a classifier) is a function
$g: \mathbb{R} \to \{1,2,3\}$ that assigns a value to $\mathrm{ind}(O)$ by using the characteristic $\xi$. In general, a classification rule is
defined as a general measurable function, but in this paper we restrict consideration to the so-called
threshold-based classification rules of the six forms
$$
g^1_{t_1,t_2}(\xi) = \begin{cases} 1, & \xi \le t_1, \\ 2, & t_1 < \xi \le t_2, \\ 3, & \xi > t_2, \end{cases}
\quad
g^2_{t_1,t_2}(\xi) = \begin{cases} 2, & \xi \le t_1, \\ 1, & t_1 < \xi \le t_2, \\ 3, & \xi > t_2, \end{cases}
\quad
g^3_{t_1,t_2}(\xi) = \begin{cases} 1, & \xi \le t_1, \\ 3, & t_1 < \xi \le t_2, \\ 2, & \xi > t_2, \end{cases}
$$
6261 | P a g e
June 2016
council for Innovative Research

www.cirworld.com
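$$
g^4_{t_1,t_2}(\xi) = \begin{cases} 3, & \xi \le t_1, \\ 2, & t_1 < \xi \le t_2, \\ 1, & \xi > t_2, \end{cases}
\quad
g^5_{t_1,t_2}(\xi) = \begin{cases} 3, & \xi \le t_1, \\ 1, & t_1 < \xi \le t_2, \\ 2, & \xi > t_2, \end{cases}
\quad
g^6_{t_1,t_2}(\xi) = \begin{cases} 2, & \xi \le t_1, \\ 3, & t_1 < \xi \le t_2, \\ 1, & \xi > t_2. \end{cases}
$$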
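The six rule forms differ only in which class label is attached to each of the three intervals cut out by the thresholds. The following is a minimal Python sketch of such a rule, not the author's implementation. The names `RULE_LABELS` and `classify` are hypothetical, and the label orders assume the reading of the six forms given above (non-strict inequality at each threshold).

```python
from bisect import bisect_left

# Assumed label order for each rule form:
# (label for xi <= t1, label for t1 < xi <= t2, label for xi > t2).
RULE_LABELS = {
    1: (1, 2, 3),
    2: (2, 1, 3),
    3: (1, 3, 2),
    4: (3, 2, 1),
    5: (3, 1, 2),
    6: (2, 3, 1),
}

def classify(xi, t1, t2, form=1):
    """Threshold-based rule g^form_{t1,t2}: class label for the characteristic xi."""
    if not t1 < t2:
        raise ValueError("thresholds must satisfy t1 < t2")
    # bisect_left counts the thresholds strictly below xi:
    # 0 for xi <= t1, 1 for t1 < xi <= t2, 2 for xi > t2.
    region = bisect_left((t1, t2), xi)
    return RULE_LABELS[form][region]
```

With the CA 15-3 thresholds from the introduction, `classify(25.0, 22.0, 30.0)` would return class 2 under the first form.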
The a priori probabilities
$$p_i = P(\mathrm{ind}(O) = i), \quad i = 1, 2, 3,$$
are assumed to be known. The characteristic $\xi$ is assumed to be random, and its distribution depends on
$\mathrm{ind}(O)$:
$$P(\xi(O) < x \mid \mathrm{ind}(O) = i) = H_i(x), \quad i = 1, 2, 3.$$
The distributions $H_i$ are unknown, but they have continuous densities $h_i$ with respect to the Lebesgue measure.
The family of classifiers is denoted by $G = \{g_t : t \in \mathbb{R}^2\}$.
The probabilities of error of such classification rules are given by
$$L(g^1_t) = L_1(t) = L_1(t_1, t_2) = P\{g^1_t(\xi(O)) \ne \mathrm{ind}(O)\}$$
$$= (p_2 + p_3) H_1(t_1) - (p_1 + p_3) H_2(t_1) + (p_3 + p_1) H_2(t_2) - (p_2 + p_1) H_3(t_2) + p_2 + p_1.$$
Analogously,
$$L(g^4_t) = (p_1 + p_2) H_3(t_1) - (p_1 + p_3) H_2(t_1) + (p_1 + p_3) H_2(t_2) - (p_2 + p_3) H_1(t_2) + p_2 + p_3.$$
Furthermore,
$$L_1(t_2, t_1) + L_4(t_1, t_2) = 2p_2 + p_3 + p_1.$$
Similarly,
$$L(g^2_t) = (p_1 + p_3) H_2(t_1) - (p_2 + p_3) H_1(t_1) - (p_2 + p_1) H_3(t_2) + (p_2 + p_3) H_1(t_2) + p_2 + p_1,$$
$$L(g^5_t) = (p_1 + p_2) H_3(t_1) - (p_3 + p_2) H_1(t_1) - (p_3 + p_1) H_2(t_2) + (p_2 + p_3) H_1(t_2) + p_3 + p_1,$$
$$L_2(t_2, t_1) + L_5(t_1, t_2) = 2p_1 + p_3 + p_2,$$
$$L(g^3_t) = (p_2 + p_3) H_1(t_1) - (p_3 + p_1) H_2(t_2) + (p_2 + p_1)(H_3(t_2) - H_3(t_1)) + p_3 + p_1,$$
$$L(g^6_t) = -(p_2 + p_3) H_1(t_2) + (p_1 + p_3) H_2(t_1) + (p_1 + p_2)(H_3(t_2) - H_3(t_1)) + p_2 + p_3,$$
$$L_3(t_2, t_1) + L_6(t_1, t_2) = 2p_3 + p_1 + p_2.$$
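The relation between $L_1$ and $L_4$ can be checked numerically. The sketch below transcribes the two expressions with the signs as read here, $(p_2+p_3)H_1(t_1) - (p_1+p_3)H_2(t_1) + (p_3+p_1)H_2(t_2) - (p_2+p_1)H_3(t_2) + p_2 + p_1$ for $L_1$ and the analogous expression for $L_4$; these signs are an assumption, since the operators are not legible in the source. The a priori probabilities and the normal class distributions are purely illustrative and do not come from the paper.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Distribution function of the normal law N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Hypothetical a priori probabilities and class distributions (illustrative only).
p1, p2, p3 = 0.5, 0.3, 0.2
H1 = lambda x: normal_cdf(x, 0.0, 1.0)
H2 = lambda x: normal_cdf(x, 2.0, 1.0)
H3 = lambda x: normal_cdf(x, 4.0, 1.5)

def L1(t1, t2):
    """Expression for L_1(t1, t2) with the assumed signs."""
    return ((p2 + p3) * H1(t1) - (p1 + p3) * H2(t1)
            + (p3 + p1) * H2(t2) - (p2 + p1) * H3(t2) + p2 + p1)

def L4(t1, t2):
    """Expression for L_4(t1, t2) with the assumed signs."""
    return ((p1 + p2) * H3(t1) - (p1 + p3) * H2(t1)
            + (p1 + p3) * H2(t2) - (p2 + p3) * H1(t2) + p2 + p3)

def identity_gap(t1, t2):
    """L_1(t2, t1) + L_4(t1, t2) minus the constant 2*p2 + p3 + p1."""
    return L1(t2, t1) + L4(t1, t2) - (2 * p2 + p3 + p1)
```

For any pair of thresholds the gap should vanish up to rounding, because every term containing a distribution function cancels in the sum.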
A classification rule $g^B \in G$ is called a Bayes classification rule in the class $G$ if $L(g)$ attains its minimum
at $g^B$ ($g^B = \arg\min_{g \in G} L(g_t)$). The threshold $t^B$ for a Bayes classification rule is called the Bayes threshold:
$$t^B = \arg\min_{t \in \mathbb{R}^2} L(t). \qquad (1)$$
For $L_i(t)$, $i = 1, \dots, 6$, we have
$$t^{iB} = \arg\min_{t_1, t_2} L_i(t_1, t_2) = \Big(\arg\min_{t_1} L^i_1(t_1), \ \arg\min_{t_2} L^i_2(t_2)\Big),$$
and
$$L^1_1(t_1) = (p_2 + p_3) H_1(t_1) - (p_1 + p_3) H_2(t_1),$$
$$L^1_2(t_2) = (p_3 + p_1) H_2(t_2) - (p_2 + p_1) H_3(t_2) + p_1 + p_2,$$
$$L^2_1(t_1) = -(p_2 + p_3) H_1(t_1) + (p_1 + p_3) H_2(t_1),$$
$$L^2_2(t_2) = (p_3 + p_2) H_1(t_2) - (p_2 + p_1) H_3(t_2) + p_1 + p_2,$$
$$L^3_1(t_1) = -(p_1 + p_2) H_3(t_1) + (p_3 + p_2) H_1(t_1),$$
$$L^3_2(t_2) = -(p_3 + p_1) H_2(t_2) + (p_2 + p_1) H_3(t_2) + p_1 + p_3,$$
$$L^4_1(t_1) = -(p_1 + p_3) H_2(t_1) + (p_1 + p_2) H_3(t_1),$$
$$L^4_2(t_2) = (p_3 + p_1) H_2(t_2) - (p_2 + p_3) H_1(t_2) + p_3 + p_2,$$
$$L^5_1(t_1) = (p_1 + p_2) H_3(t_1) - (p_3 + p_2) H_1(t_1),$$
$$L^5_2(t_2) = -(p_3 + p_1) H_2(t_2) + (p_2 + p_3) H_1(t_2) + p_1 + p_3,$$
$$L^6_1(t_1) = (p_1 + p_3) H_2(t_1) - (p_1 + p_2) H_3(t_1),$$
$$L^6_2(t_2) = -(p_3 + p_2) H_1(t_2) + (p_2 + p_1) H_3(t_2) + p_3 + p_2.$$
Let us consider the threshold rule $g^1_{t_1,t_2}$. The functions $H_i$ (and, hence, $h_i$) are unknown. One can estimate these
functions from the data $\{\xi_{j:N}\}_{j=1}^N$, being a sample from a mixture with varying concentrations, where $\xi_{j:N}$ are
independent if $N$ is fixed and
$$P\{\xi_{j:N} < x\} = w^1_{j:N} H_1(x) + w^2_{j:N} H_2(x) + w^3_{j:N} H_3(x).$$
Here $w^i_{j:N}$, $i = 1, 2, 3$, is a known concentration in the mixture of objects of the $i$-th class at the moment when
an observation $j$ is made (Maiboroda, 2003), $\sum_{i=1}^3 w^i_{j:N} = 1$.
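To estimate the distribution function $H_i$, the empirical distribution function
$$\hat H_i^N(x) = \frac{1}{N} \sum_{j=1}^N a^i_{j:N} I\{\xi_{j:N} < x\}$$
is used, where $I\{A\}$ is the indicator of an event $A$ and $a^i_{j:N}$ are known weight coefficients (Maiboroda, 2003;
Sugakova, 1998), defined if $\det \Gamma_N \ne 0$:
$$a^k_{j:N} = \frac{1}{\det \Gamma_N} \sum_{i=1}^3 (-1)^{k+i} \gamma_{ki} w^i_{j:N},$$
where $\gamma_{ki}$ is the $(k, i)$ minor of $\Gamma_N$, and $\Gamma_N = (\langle w^k, w^l \rangle_N)_{k,l=1}^3$ is the Gram matrix, where
$$\langle w^k, w^l \rangle_N = \frac{1}{N} \sum_{j=1}^N w^k_{j:N} w^l_{j:N}.$$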
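The weight coefficients and the resulting estimator of $H_k$ can be sketched in a few lines of Python. This is only an illustration under the reading of the formulas adopted here: the weights are taken to be $a^k_{j:N} = (\det\Gamma_N)^{-1}\sum_i (-1)^{k+i}\gamma_{ki} w^i_{j:N}$ with $\gamma_{ki}$ the minor of the Gram matrix obtained by deleting row $k$ and column $i$, and the strict inequality in the indicator is an assumption. The function names `minor`, `weight_coefficients` and `weighted_ecdf` are hypothetical.

```python
def minor(m, k, i):
    """Determinant of the 2x2 matrix left after deleting row k and column i of a 3x3 matrix."""
    rows = [r for r in range(3) if r != k]
    cols = [c for c in range(3) if c != i]
    return (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
            - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])

def weight_coefficients(w):
    """w[j] = (w^1_j, w^2_j, w^3_j); returns a with a[j][k] equal to a^{k+1}_{j:N}."""
    n = len(w)
    # Gram matrix with entries <w^k, w^l>_N = (1/N) * sum_j w^k_j * w^l_j.
    gram = [[sum(row[k] * row[l] for row in w) / n for l in range(3)] for k in range(3)]
    # Cofactor expansion of the determinant along the first row.
    det = sum((-1) ** i * gram[0][i] * minor(gram, 0, i) for i in range(3))
    if abs(det) < 1e-12:
        raise ValueError("det of the Gram matrix is numerically zero; weights are undefined")
    return [[sum((-1) ** (k + i) * minor(gram, k, i) * row[i] for i in range(3)) / det
             for k in range(3)] for row in w]

def weighted_ecdf(x, sample, a_k):
    """(1/N) * sum_j a^k_j * I{xi_j < x} for the weights a_k of one class."""
    return sum(a for a, xi in zip(a_k, sample) if xi < x) / len(sample)
```

With these weights, $\langle a^k, w^m \rangle_N$ equals 1 for $m = k$ and 0 otherwise, which is a convenient sanity check on any concrete concentration table.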
One can apply kernel estimators to estimate the densities of distributions $h_i$:
$$\hat h_i^N(x) = \frac{1}{N k_N} \sum_{j=1}^N a^i_{j:N} K\left(\frac{x - \xi_{j:N}}{k_N}\right),$$
where $K$ is a kernel (the density of some probability distribution) and $k_N > 0$ is a smoothing parameter (Sugakova,
1998; Ivanko, 2003).
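A direct transcription of the weighted kernel estimator $\frac{1}{N k_N}\sum_j a^i_{j:N} K((x - \xi_{j:N})/k_N)$ is sketched below. The Gaussian kernel is an illustrative choice made here, not one prescribed by the paper, and the names `gaussian_kernel` and `kernel_density` are hypothetical.

```python
from math import exp, pi, sqrt

def gaussian_kernel(z):
    """Standard normal density, used here as an example kernel K."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def kernel_density(x, sample, a_i, k_n, kernel=gaussian_kernel):
    """Weighted kernel estimate at x with weights a_i and smoothing parameter k_n > 0."""
    if k_n <= 0:
        raise ValueError("the smoothing parameter must be positive")
    n = len(sample)
    return sum(a * kernel((x - xi) / k_n) for a, xi in zip(a_i, sample)) / (n * k_n)
```

Because the weights can be negative, the estimate is not guaranteed to be nonnegative at every point.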

Let us construct the threshold estimator using the EBC method (Kubaychuk, 2008). The empirical Bayes estimator is
constructed as follows. First, one determines the sets $T_N^1$ and $T_N^2$ of all solutions of the equations
$$(p_2 + p_3) \hat h_1^N(t) - (p_1 + p_3) \hat h_2^N(t) = 0 \quad \text{and} \quad (p_1 + p_3) \hat h_2^N(t) - (p_1 + p_2) \hat h_3^N(t) = 0,$$
respectively. Second, one chooses
$$t_N^{EBC} = \arg\min_{t_1 \in T_N^1, \ t_2 \in T_N^2, \ t_1 < t_2} L^1_N(t_1, t_2)$$
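The two-step construction (collect the zeros of each estimated density combination, then pick the admissible pair with the smallest estimated risk) can be sketched as follows. This is a hedged illustration rather than the author's procedure: zeros are located only through sign changes on a user-supplied grid and refined by bisection, so zeros without a sign change are missed. The names `roots_by_sign_change` and `ebc_thresholds`, and the callables `u1`, `u2` and `risk`, are hypothetical placeholders for the two left-hand sides of the equations and for the estimated risk.

```python
def roots_by_sign_change(u, grid, tol=1e-10):
    """Approximate zeros of a continuous function u, found via sign changes on an increasing grid."""
    roots = []
    for lo, hi in zip(grid, grid[1:]):
        f_lo, f_hi = u(lo), u(hi)
        if f_lo == 0.0:
            roots.append(lo)
        elif f_lo * f_hi < 0.0:
            while hi - lo > tol:  # plain bisection on [lo, hi]
                mid = 0.5 * (lo + hi)
                f_mid = u(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    if grid and u(grid[-1]) == 0.0:
        roots.append(grid[-1])
    return roots

def ebc_thresholds(u1, u2, risk, grid):
    """Pair (t1, t2), t1 < t2, of zeros of u1 and u2 that minimizes risk(t1, t2); None if no pair exists."""
    candidates_1 = roots_by_sign_change(u1, grid)
    candidates_2 = roots_by_sign_change(u2, grid)
    pairs = [(t1, t2) for t1 in candidates_1 for t2 in candidates_2 if t1 < t2]
    if not pairs:
        return None
    return min(pairs, key=lambda t: risk(t[0], t[1]))
```

In practice `u1` and `u2` would be built from the kernel density estimates and `risk` from the weighted empirical distribution functions; the grid resolution controls which zeros can be detected.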
as an estimator for $t^B$, where
$$L^1_N(t_1, t_2) = (p_2 + p_3) \hat H_1^N(t_1) - (p_1 + p_3) \hat H_2^N(t_1) + (p_3 + p_1) \hat H_2^N(t_2) - (p_2 + p_1) \hat H_3^N(t_2) + p_2 + p_1$$
is the estimator for $L_1(t_1, t_2)$, and
$$L^1_{N1}(t_1) = (p_2 + p_3) \hat H_1^N(t_1) - (p_1 + p_3) \hat H_2^N(t_1),$$
$$L^1_{N2}(t_2) = (p_3 + p_1) \hat H_2^N(t_2) - (p_2 + p_1) \hat H_3^N(t_2) + p_2 + p_1,$$
$$t^{EBC}_{N1} = \arg\min_{t_1 \in T_N^1} L^1_{N1}(t_1), \quad t^{EBC}_{N2} = \arg\min_{t_2 \in T_N^2} L^1_{N2}(t_2).$$
The sets $T_N^1$ and $T_N^2$ are constructed under the condition $t_1 < t_2$.
Let the densities $h_i$ exist and be $s$ times continuously differentiable in some neighborhood of the points
$t_1^B$, $t_2^B$.
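Put
$$f_s^1(t) = (-1)^s \left( (p_3 + p_1) \frac{d^s h_2}{dt^s} - (p_2 + p_3) \frac{d^s h_1}{dt^s} \right),$$
$$f_s^2(t) = (-1)^s \left( (p_2 + p_1) \frac{d^s h_3}{dt^s} - (p_3 + p_1) \frac{d^s h_2}{dt^s} \right).$$
Let us assume that the limits $\lim_{N \to \infty} r_{Ni} = r_i$, $i = 1, 2$, exist, where
$$r_{Ni} = \left( N^{-1} \sum_{j=1}^N (b^i_{j:N})^2 \big( w^1_{j:N} h_1(t_i^B) + w^2_{j:N} h_2(t_i^B) + w^3_{j:N} h_3(t_i^B) \big) \right)^{1/2}, \quad i = 1, 2,$$
$$b^1_{j:N} = (p_2 + p_3) a^1_{j:N} - (p_1 + p_3) a^2_{j:N}, \quad b^2_{j:N} = (p_1 + p_3) a^2_{j:N} - (p_1 + p_2) a^3_{j:N}.$$
Let us denote
$$W^1_{Ni}(\tau) = N^{2/3} \Big( L^1_{Ni}(t_i^B + N^{-1/3} \tau) - L^1_{Ni}(t_i^B) - L^1_i(t_i^B + N^{-1/3} \tau) + L^1_i(t_i^B) \Big), \quad i = 1, 2.$$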
3. MAIN RESULTS
In what follows we assume that:
(A) the threshold $t^B$ defined by (1) exists and is the unique point of the global minimum of $L_1(t)$ ($t_1^B$ is
the unique global minimum point of $L^1_1(t_1)$, and $t_2^B$ is the unique global minimum point of $L^1_2(t_2)$);
(B) the limits
$$\sum_{r=1}^M \lim_{N \to \infty} \langle (a^k)^2, w^r \rangle_N \, h_r(x), \quad 1 \le k \le M, \ M = 3,$$
exist, and $\liminf_{N \to \infty} \det \Gamma_N = c > 0$.
Remark 1. Condition (B) is sufficient for the existence of the limits $\lim_{N \to \infty} r_{Ni} = r_i$, $i = 1, 2$.
Theorem 1. Let conditions (A) and (B) hold. Assume that the densities $h_i$ exist and are continuous,
$k_N \to 0$ and $N k_N \to \infty$ as $N \to \infty$, $K$ is a continuous function, and
$$d^2 \stackrel{def}{=} \int K^2(t) \, dt < \infty.$$
Then $t_N^{EBC} \to t^B$ ($t^{EBC}_{N1} \to t_1^B$, $t^{EBC}_{N2} \to t_2^B$) in probability.
Proof. According to Theorem 1 of (Sugakova, 1998), the assumptions of the theorem imply that
$\hat h_i^N(x) \to h_i(x)$ in probability at every point $x$. Therefore
$$u_{N1}(x) = (p_2 + p_3) \hat h_1^N(x) - (p_1 + p_3) \hat h_2^N(x) \to u_1(x) = (p_2 + p_3) h_1(x) - (p_1 + p_3) h_2(x),$$
$$u_{N2}(x) = (p_1 + p_3) \hat h_2^N(x) - (p_1 + p_2) \hat h_3^N(x) \to u_2(x) = (p_1 + p_3) h_2(x) - (p_1 + p_2) h_3(x)$$
in probability.
Put
$$A_N(\delta_i) = \{\text{there exists } t_i : |t_i - t_i^B| < \delta_i, \ u_{Ni}(t_i) = 0\}$$
for $\delta_i > 0$. We can show that
$$P(A_N(\delta_i)) \to 1, \quad N \to \infty. \qquad (2)$$
Since $t_1^B$ is the point of minimum of $L^1_1(t)$, $t_2^B$ is the point of minimum of $L^1_2(t)$, and
$(L^1_1(t))' = u_1(t)$ and $(L^1_2(t))' = u_2(t)$ are continuous functions, it follows that $u_i(t)$
changes sign in the neighborhood of $t_i^B$. This means that there are $t_i^-$ and $t_i^+$ such that
$$t_i^B - \delta_i < t_i^- < t_i^B < t_i^+ < t_i^B + \delta_i$$
and $u_i(t_i^-) u_i(t_i^+) < 0$, $i = 1, 2$. Thus, $P(u_{Ni}(t_i^-) u_{Ni}(t_i^+) < 0) \to 1$, $N \to \infty$. Since $u_{Ni}(t)$
are continuous functions, $\{u_{Ni}(t_i^-) u_{Ni}(t_i^+) < 0\} \subset A_N(\delta_i)$. Therefore (2) is proved.
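Let us fix an arbitrary $\delta_i > 0$, $i = 1, 2$. Since $L^1_1(t)$ and $L^1_2(t)$ are continuous functions on $\mathbb{R}$,
$L^1_1(-\infty) = 0$, $L^1_1(+\infty) = p_2 - p_1$, $L^1_2(-\infty) = p_2 + p_1$, $L^1_2(+\infty) = p_3 + p_1$, and
condition (A) is satisfied, it follows that there exists $\varepsilon_i > 0$ such that
$L^1_i(t_i) > L^1_i(t_i^B) + \varepsilon_i$ for all $t_i$ with $|t_i - t_i^B| > \delta_i$.
Let $0 < \delta_i' < \delta_i$ be such that $L^1_i(t) < L^1_i(t_i^B) + \varepsilon_i / 4$ for all
$t \in [t_i^B - \delta_i', t_i^B + \delta_i']$. Put
$$B_{Ni} = \Big\{ \inf_{t \notin [t_i^B - \delta_i, \, t_i^B + \delta_i]} L^1_{Ni}(t) > L^1_i(t_i^B) + \varepsilon_i / 2 > \sup_{t \in [t_i^B - \delta_i', \, t_i^B + \delta_i']} L^1_{Ni}(t) \Big\}.$$
Fix an arbitrary $\lambda_i > 0$. Using the uniform convergence of $L^1_{Ni}$ to $L^1_i$, we obtain
$P(B_{Ni}) > 1 - \lambda_i / 2$ for sufficiently large $N$. From (2) it follows that $P(A_N(\delta_i')) > 1 - \lambda_i / 2$
for sufficiently large $N$. If the event $A_N(\delta_i')$ occurs, then there exists
$t_i^* \in T_N^i \cap [t_i^B - \delta_i', t_i^B + \delta_i']$; if, in addition, the event $B_{Ni}$ occurs, then
$L^1_{Ni}(t_i^*) < L^1_{Ni}(t_i)$ for all $t_i \notin [t_i^B - \delta_i, t_i^B + \delta_i]$, and therefore
$|t^{EBC}_{Ni} - t_i^B| \le \delta_i$. Hence, from
$$P(A_N(\delta_i') \cap B_{Ni}) \ge P(A_N(\delta_i')) + P(B_{Ni}) - 1$$
it follows that $P\{|t^{EBC}_{Ni} - t_i^B| \le \delta_i\} > 1 - \lambda_i$ for sufficiently large $N$. This completes the
proof of the theorem, since $\delta_i$, $i = 1, 2$, is arbitrary.
Remark 2. The estimator $\hat H_k^N$ (obtained by construction) is unbiased iff
$$\langle a^k, w^m \rangle_N = I\{m = k\} \quad \text{for all } m = 1, \dots, M, \ N \ge M.$$
Then, it is easy to see that $N^{-1} \sum_{j=1}^N a^k_{j:N} = 1$.
Remark 3. Often, $\hat H_k^N$ is not a probability distribution function, but this is not important here. If necessary, one
can use the corrected weighted empirical distribution function to estimate $H_k$ (Kubaychuk, 2003; Maiboroda and
Kubaichuk, 2003; Maiboroda and Kubaichuk, 2004).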
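For the proof of the next theorem we need an auxiliary result on the asymptotic behavior of the processes
$W^1_{Ni}$, $i = 1, 2$.
Lemma 1. Let condition (A) hold and $\tau_1 < \tau_2$. For the process $W^1_{Ni}$ put
$$A_N = A_N(\tau_1, \tau_2) = [t_i^B + N^{-1/3} \tau_1, \ t_i^B + N^{-1/3} \tau_2].$$
Then
$$W^1_{N1}(\tau_2) - W^1_{N1}(\tau_1) = N^{-1/3} \sum_{j=1}^N b^1_{j:N} \big( I\{\xi_{j:N} \in A_N\} - P\{\xi_{j:N} \in A_N\} \big),$$
$$W^1_{N2}(\tau_2) - W^1_{N2}(\tau_1) = N^{-1/3} \sum_{j=1}^N b^2_{j:N} \big( I\{\xi_{j:N} \in A_N\} - P\{\xi_{j:N} \in A_N\} \big).$$
Proof. We have
$$W^1_{N1}(\tau_2) - W^1_{N1}(\tau_1)$$
$$= N^{2/3} \big[ L^1_{N1}(t_1^B + N^{-1/3} \tau_2) - L^1_1(t_1^B + N^{-1/3} \tau_2) - L^1_{N1}(t_1^B + N^{-1/3} \tau_1) + L^1_1(t_1^B + N^{-1/3} \tau_1) \big]$$
$$= N^{2/3} \big[ p_2 \hat H_1^N(t_1^B + N^{-1/3} \tau_2) - p_1 \hat H_2^N(t_1^B + N^{-1/3} \tau_2) + p_3 \big( \hat H_1^N(t_1^B + N^{-1/3} \tau_2) - \hat H_2^N(t_1^B + N^{-1/3} \tau_2) \big)$$
$$- p_2 H_1(t_1^B + N^{-1/3} \tau_2) + p_1 H_2(t_1^B + N^{-1/3} \tau_2) - p_3 \big( H_1(t_1^B + N^{-1/3} \tau_2) - H_2(t_1^B + N^{-1/3} \tau_2) \big)$$
$$- p_2 \hat H_1^N(t_1^B + N^{-1/3} \tau_1) + p_1 \hat H_2^N(t_1^B + N^{-1/3} \tau_1) - p_3 \big( \hat H_1^N(t_1^B + N^{-1/3} \tau_1) - \hat H_2^N(t_1^B + N^{-1/3} \tau_1) \big)$$
$$+ p_2 H_1(t_1^B + N^{-1/3} \tau_1) - p_1 H_2(t_1^B + N^{-1/3} \tau_1) + p_3 \big( H_1(t_1^B + N^{-1/3} \tau_1) - H_2(t_1^B + N^{-1/3} \tau_1) \big) \big]$$
$$= N^{-1/3} \sum_{j=1}^N \big[ p_2 a^1_{j:N} I\{\xi_{j:N} \in A_N\} - p_1 a^2_{j:N} I\{\xi_{j:N} \in A_N\} + p_3 \big( a^1_{j:N} I\{\xi_{j:N} \in A_N\} - a^2_{j:N} I\{\xi_{j:N} \in A_N\} \big)$$
$$+ p_1 a^2_{j:N} P\{\xi_{j:N} \in A_N\} - p_2 a^1_{j:N} P\{\xi_{j:N} \in A_N\} - p_3 \big( a^1_{j:N} P\{\xi_{j:N} \in A_N\} - a^2_{j:N} P\{\xi_{j:N} \in A_N\} \big) \big]$$
$$= N^{-1/3} \sum_{j=1}^N \big[ (p_2 a^1_{j:N} - p_1 a^2_{j:N}) + p_3 (a^1_{j:N} - a^2_{j:N}) \big] \big( I\{\xi_{j:N} \in A_N\} - P\{\xi_{j:N} \in A_N\} \big)$$
$$= N^{-1/3} \sum_{j=1}^N b^1_{j:N} \big( I\{\xi_{j:N} \in A_N\} - P\{\xi_{j:N} \in A_N\} \big).$$
Similarly,
$$W^1_{N2}(\tau_2) - W^1_{N2}(\tau_1)$$
$$= N^{2/3} \big[ L^1_{N2}(t_2^B + N^{-1/3} \tau_2) - L^1_2(t_2^B + N^{-1/3} \tau_2) - L^1_{N2}(t_2^B + N^{-1/3} \tau_1) + L^1_2(t_2^B + N^{-1/3} \tau_1) \big]$$
$$= N^{2/3} \big[ (p_3 + p_1) \hat H_2^N(t_2^B + N^{-1/3} \tau_2) - (p_2 + p_1) \hat H_3^N(t_2^B + N^{-1/3} \tau_2)$$
$$- (p_3 + p_1) \hat H_2^N(t_2^B + N^{-1/3} \tau_1) + (p_2 + p_1) \hat H_3^N(t_2^B + N^{-1/3} \tau_1)$$
$$- (p_3 + p_1) H_2(t_2^B + N^{-1/3} \tau_2) + (p_2 + p_1) H_3(t_2^B + N^{-1/3} \tau_2)$$
$$+ (p_3 + p_1) H_2(t_2^B + N^{-1/3} \tau_1) - (p_2 + p_1) H_3(t_2^B + N^{-1/3} \tau_1) \big]$$
$$= N^{-1/3} \sum_{j=1}^N \big[ (p_3 + p_1) a^2_{j:N} - (p_2 + p_1) a^3_{j:N} \big] \big( I\{\xi_{j:N} \in A_N\} - P\{\xi_{j:N} \in A_N\} \big)$$
$$= N^{-1/3} \sum_{j=1}^N b^2_{j:N} \big( I\{\xi_{j:N} \in A_N\} - P\{\xi_{j:N} \in A_N\} \big).$$
This completes the proof of the lemma.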

In what follows, the symbol $\Rightarrow$ stands for weak convergence.
Theorem 2. Let $D(U_i)$ be the space of functions without discontinuities of the second kind equipped with the
uniform metric, let $W$ be the two-sided standard Wiener process, and let (B) hold. Then the stochastic processes
$W^1_{Ni}$, $i = 1, 2$, weakly converge as $N \to \infty$ to the process $r_i W$ in the space $D(U_i)$ on an arbitrary
finite interval $U_i = [\alpha_i, \beta_i]$.
Proof. The trajectories of $W^1_{Ni}$, $i = 1, 2$, are continuous. It is enough to prove that the finite dimensional
distributions of $W^1_{Ni}$, $i = 1, 2$, are asymptotically Gaussian, that the second moments of increments converge,
and that the distributions of $W^1_{Ni}$, $i = 1, 2$, are tight in $D(U_i)$. See (Billingsley, 1968).
We first compute $E(W^1_{Ni}(\tau_2) - W^1_{Ni}(\tau_1))^2$, $i = 1, 2$. Let $\tau_1 < \tau_2$. By using Lemma 1,
$$E(W^1_{N2}(\tau_2) - W^1_{N2}(\tau_1))^2 = N^{-2/3} \sum_{j=1}^N (b^2_{j:N})^2 E\big( I\{\xi_{j:N} \in A_N\} - P\{\xi_{j:N} \in A_N\} \big)^2$$
$$= N^{-2/3} \sum_{j=1}^N (b^2_{j:N})^2 \big[ (w^1_{j:N} H_1(A_N) + w^2_{j:N} H_2(A_N) + w^3_{j:N} H_3(A_N)) - (w^1_{j:N} H_1(A_N) + w^2_{j:N} H_2(A_N) + w^3_{j:N} H_3(A_N))^2 \big],$$
$$E(W^1_{N1}(\tau_2) - W^1_{N1}(\tau_1))^2 = N^{-2/3} \sum_{j=1}^N (b^1_{j:N})^2 E\big( I\{\xi_{j:N} \in A_N\} - P\{\xi_{j:N} \in A_N\} \big)^2$$
$$= N^{-2/3} \sum_{j=1}^N (b^1_{j:N})^2 \big[ (w^1_{j:N} H_1(A_N) + w^2_{j:N} H_2(A_N) + w^3_{j:N} H_3(A_N)) - (w^1_{j:N} H_1(A_N) + w^2_{j:N} H_2(A_N) + w^3_{j:N} H_3(A_N))^2 \big].$$
Taking into account that $H_i(A_N) \approx h_i(t_1^B) N^{-1/3} (\tau_2 - \tau_1)$, $i = 1, 2, 3$, we obtain
$$E(W^1_{N1}(\tau_2) - W^1_{N1}(\tau_1))^2 \approx N^{-2/3} \sum_{j=1}^N N^{-1/3} (b^1_{j:N})^2 \big[ w^1_{j:N} h_1(t_1^B) + w^2_{j:N} h_2(t_1^B) + w^3_{j:N} h_3(t_1^B) \big] (\tau_2 - \tau_1)$$
$$\times \big[ 1 - N^{-1/3} \big( w^1_{j:N} h_1(t_1^B) + w^2_{j:N} h_2(t_1^B) + w^3_{j:N} h_3(t_1^B) \big) (\tau_2 - \tau_1) \big]$$
$$\to r_1^2 (\tau_2 - \tau_1) = E(r_1 W(\tau_2) - r_1 W(\tau_1))^2 \quad \text{as } N \to \infty,$$
where
$$r_1 = \lim_{N \to \infty} r_{N1}, \quad r_{N1} = \Big[ N^{-1} \sum_{j=1}^N (b^1_{j:N})^2 \big[ w^1_{j:N} h_1(t_1^B) + w^2_{j:N} h_2(t_1^B) + w^3_{j:N} h_3(t_1^B) \big] \Big]^{1/2}.$$
Similarly, taking into account that $H_i(A_N) \approx h_i(t_2^B) N^{-1/3} (\tau_2 - \tau_1)$, $i = 1, 2, 3$, we obtain
$$E(W^1_{N2}(\tau_2) - W^1_{N2}(\tau_1))^2 \to r_2^2 (\tau_2 - \tau_1) = E(r_2 W(\tau_2) - r_2 W(\tau_1))^2 \quad \text{as } N \to \infty,$$
where
$$r_2 = \lim_{N \to \infty} r_{N2}, \quad r_{N2} = \Big[ N^{-1} \sum_{j=1}^N (b^2_{j:N})^2 \big[ w^1_{j:N} h_1(t_2^B) + w^2_{j:N} h_2(t_2^B) + w^3_{j:N} h_3(t_2^B) \big] \Big]^{1/2}.$$
Since condition (B) holds, all terms of the sums in Lemma 1 are uniformly bounded. Therefore, the finite
dimensional distributions of the processes $W^1_{Ni}$, $i = 1, 2$, are asymptotically Gaussian in view of the central limit
theorem under the Lindeberg condition. The tightness of the family of distributions of $W^1_{Ni}$, $i = 1, 2$, is proved
analogously to (Ivanko and Maiboroda, 2006). This completes the proof.

Theorem 3. Let conditions (A) and (B) hold. Assume that:
(i) the derivatives $h_k''(t) = d^2 h_k(t) / dt^2$ exist and are bounded in a neighborhood of $t_1^B$, $t_2^B$, and
$f_1^i(t_i^B) \ne 0$, $i = 1, 2$;
(ii) $k_N = (c / N)^{1/5}$ for some nonrandom $c > 0$;
(iii) $\int z K(z) \, dz = 0$, $D^2 \stackrel{def}{=} \int z^2 K(z) \, dz < \infty$ and $d^2 < \infty$.
Then
$$N^{2/5} (t^{EBC}_{Ni} - t_i^B) \Rightarrow A_i + B_i \eta_i,$$
where
$$A_i = \frac{D^2 c^{2/5} f_2^i(t_i^B)}{2 f_1^i(t_i^B)}, \quad B_i = \frac{d \, r_i}{c^{1/10} f_1^i(t_i^B)},$$
and $\eta_i$ is a standard Gaussian random variable, $i = 1, 2$.
Proof. Let
$$u_{N1}(t) = (p_2 + p_3) \hat h_1^N(t) - (p_1 + p_3) \hat h_2^N(t),$$
$$u_{N2}(t) = (p_1 + p_3) \hat h_2^N(t) - (p_1 + p_2) \hat h_3^N(t).$$
By the definition of $t^{EBC}_{Ni}$ we have $u_{Ni}(t^{EBC}_{Ni}) = 0$. Put $\Delta_{Ni} = t^{EBC}_{Ni} - t_i^B$, $i = 1, 2$.
Theorem 1 implies that $\Delta_{Ni} \to 0$ in probability. Hence
$$\Delta_{N1} \approx -\frac{u_{N1}(t_1^B)}{u_{N1}'(t_1^B)} \approx -\frac{(p_2 + p_3)(\hat h_1^N(t_1^B) - h_1(t_1^B)) - (p_1 + p_3)(\hat h_2^N(t_1^B) - h_2(t_1^B))}{f_1^1(t_1^B)},$$
$$\Delta_{N2} \approx -\frac{u_{N2}(t_2^B)}{u_{N2}'(t_2^B)} \approx -\frac{(p_1 + p_3)(\hat h_2^N(t_2^B) - h_2(t_2^B)) - (p_1 + p_2)(\hat h_3^N(t_2^B) - h_3(t_2^B))}{f_1^2(t_2^B)}.$$
Similarly to the proof of Lemma 2 of (Ivanko, 2003), we obtain
$$N^{2/5} \big( -[(p_2 + p_3)(\hat h_1^N(t_1^B) - h_1(t_1^B)) - (p_1 + p_3)(\hat h_2^N(t_1^B) - h_2(t_1^B))] \big) \Rightarrow \frac{D^2 c^{2/5} f_2^1(t_1^B)}{2} + \frac{d \, r_1}{c^{1/10}} \eta_1,$$
$$N^{2/5} \big( -[(p_1 + p_3)(\hat h_2^N(t_2^B) - h_2(t_2^B)) - (p_1 + p_2)(\hat h_3^N(t_2^B) - h_3(t_2^B))] \big) \Rightarrow \frac{D^2 c^{2/5} f_2^2(t_2^B)}{2} + \frac{d \, r_2}{c^{1/10}} \eta_2$$
for $k_N = (c / N)^{1/5}$, where $\eta_i$ is a standard Gaussian random variable, $i = 1, 2$.
This completes the proof.
SUMMARY AND CONCLUSIONS

The results obtained in this paper describe the asymptotic behaviour of threshold-based classification
rules constructed from a sample from a mixture with varying concentrations in the case of three prescribed classes.
This is another step towards solving the problem of classifying an object from an observation of
its numerical characteristic. Future research will be devoted to the situation with an arbitrary number of classes.
ACKNOWLEDGEMENTS
The author would like to thank the referees for their valuable comments.
REFERENCES
[1] Billingsley, P. (1968): Convergence of Probability Measures. New York: John Wiley & Sons, Inc.
[2] Devroye, L. and Gyorfi, L. (1985): Nonparametric Density Estimation. The L1 View. New York: John Wiley & Sons, Inc.
[3] Ivanko, Yu. O. (2003): The asymptotic behavior of kernel estimators and their derivatives constructed from observations from a mixture with varying concentrations. Visnyk KNU, Ser. Matematika. Mekhanika, 9, 29-35. (Ukrainian).
[4] Ivanko, Yu. O. and Maiboroda, R. E. (2002): Exponential estimates for the empirical Bayes risk in the classification of a mixture with varying concentrations. Ukrain. Mat. Zh., 54, no. 10, 1421-1428; English transl. in Ukrainian Math. J., 54, no. 10, 1722-1731.
[5] Ivanko, Yu. O. and Maiboroda, R. E. (2006): The asymptotic behaviour of threshold-based classification rules constructed from a sample from a mixture with varying concentrations. Teor. Imovirnost. Matem. Statist., 74, 34-43; English transl. in Theor. Probability Math. Statist., 74, 37-47.
[6] Kubaychuk, O. O. (2003): Estimation of moments from mixtures using the corrected weighted empirical distribution functions. Visnyk KNU, Ser. Matematika. Mekhanika, 9, 48-52. (Ukrainian).
[7] Kubaychuk, O. O. (2008): The asymptotic behavior estimator of Bayes threshold. Visnyk KNU, Ser. Matematika. Mekhanika, 19, 47-50. (Ukrainian).
[8] Kubaychuk, O. O. (2010): The estimator asymptotic behavior of the empirical risk minimization method for Bayesian border. Research Bulletin NTU KPI Ser. Physics and Mathematics, 4, 78-85. (Ukrainian).
[9] Maiboroda, R. and Kubaichuk, O. (2003): Asymptotic normality of improved weighted empirical distribution functions. Teor. Imovirnost. Matem. Statist., 69, 89-95; English transl. in Theor. Probability Math. Statist., 69, 95-102.
[10] Maiboroda, R. and Kubaichuk, O. (2004): Improved estimators for moments constructed from observations of a mixture. Teor. Imovirnost. Matem. Statist., 70, 74-81; English transl. in Theor. Probability Math. Statist., 70, 83-92.
[11] Maiboroda, R. E. (2003): Statistical Analysis of Mixtures. Kyiv: Kyiv University. (Ukrainian).
[12] Sugakova, O. V. (1998): Asymptotics of a kernel estimate for distribution density constructed from observations of a mixture with varying concentrations. Teor. Imovirnost. Matem. Statist., 59, 156-166; English transl. in Theor. Probability Math. Statist., 59, 161-171.
[13] Vapnik, V. N. (1996): The Nature of Statistical Learning Theory. New York: Springer.
[14] Vapnik, V. N. (1989): Inductive principles for the search for empirical laws. Pattern. Recognition. Classification. Prediction, 1, 17-81. (Russian).