
Pattern Recognition Letters

ELSEVIER    Pattern Recognition Letters 16 (1995) 1147-1157

A conceptual version of the K-means algorithm

H. Ralambondrainy *

IREMIA, Faculté des Sciences, BP 7151, 15 av. Cassin, F-97715 Saint-Denis Messag Cedex, France

* Email: ralambon@univ-reunion.fr

Received 26 July 1994; revised 20 April 1995

Abstract

Clustering techniques are important for knowledge acquisition. Traditionally, numerical clustering methods have been
viewed in opposition to conceptual clustering methods developed in Artificial Intelligence. Numerical techniques emphasize
the determination of homogeneous clusters but provide low-level descriptions of clusters. A conceptual approach is more
concerned with high-level, i.e., more understandable descriptions of classes. In this paper, we propose a hybrid numeric-
symbolic method that integrates an extended version of the K-means algorithm for cluster determination and a complemen-
tary conceptual characterization algorithm for cluster description.

Keywords: Learning; Conceptual clustering; Rule discovery from data

1. Introduction

For many years, clustering procedures have been applied to the analysis of large data sets in a wide variety of disciplines. They are useful for deriving hierarchies of plants in biology, for classifying individuals into personality types in psychology, for classifying stars or optical pictures of galaxies in astronomy, etc. These typical applications point to the following clustering objectives.

Data reduction. In some fields like astronomy, users and researchers are coping with large data sets of spectral data and images collected from satellite and ground-based observatories. Fast algorithms are required to classify astronomical images containing many thousands of pixels. Clustering algorithms are considered as useful compression methods to reduce huge amounts of multi-dimensional data.

Concept acquisition. In some fields, classes are useful only if they have interesting interpretations and if they provide new knowledge (concepts) about the domain. In marketing, the meaning of customer classes is fundamental to orient the firm's commercial strategy. In psychology, clusters are interpreted to find out psychological profiles of young people.

Numerical clustering methods (Everitt, 1974) have often been criticized for their lack of comprehensibility. The resulting classes are described with numerical summaries that are difficult for naive users to interpret. Numerical algorithms also may not be able to deal with structured data. Conceptual or symbolic clustering algorithms have been proposed to overcome these limitations.




Cluster/2 (Michalski, 1984) and Cluster/S (Stepp and Michalski, 1986) used an extension of a predicate calculus formalism to represent data and knowledge. They determine concepts that are described using logical predicates. Conceptual clustering systems work well on applications relating to complex data and knowledge, but they are not well suited to numerical noisy data without background knowledge. Another approach has been to extend the similarity or distance used by numerical methods to cluster objects. In (Kodratoff and Tecuci, 1988; Gowda and Diday, 1991; Esposito et al., 1992), conceptual distances that take into account structure and background knowledge are defined on the objects. In a first step the conceptual distances between the observations are computed, and afterwards the result is analyzed using classical numerical hierarchical methods. The weakness of this approach lies in the difficulty of coming back to the initial knowledge to interpret the resulting classes.

In this paper, we are interested in extending the well-known K-means clustering method (Jain and Dubes, 1988) to deal with mixed data (data characterized with numerical and symbolic features) and to give a conceptual interpretation of the resulting clusters. The K-means method is very popular because of its ability to cluster huge amounts of numerical and noisy data quickly and efficiently. It remains a basic framework for developing numerical (Ralambondrainy, 1987; Venkateswarlu and Raju, 1992) or conceptual clustering systems (Hanson, 1990) because of the various possibilities of distance and prototype choice. We show that this kind of method discovers statistical concepts that are not displayed by the usual numerical interpretation tables provided by clustering programs. We propose a distance to deal with mixed data and a characterization algorithm to derive abstract descriptions of the resulting clusters.

The paper is organized as follows. Section 2 describes the extended K-means algorithm proposed to deal with mixed data. Section 3 describes the formalisms used to represent data and knowledge. Section 4 is concerned with the conceptual cluster characterization process. Section 5 compares our algorithm with a decision tree method on one application.

2. The K-means algorithm extended to mixed data

Clustering algorithms include two steps (Stepp, 1987; Fisher and Langley, 1985): the first one is the aggregation step, when a partition of the observations is calculated, and the second one is the characterization of the classes resulting from that partition.

Let (X_1, ℝ), ..., (X_p, ℝ) be p numerical attributes. X_j is the name of the jth attribute, whose domain is the set of reals ℝ. Each observation o ∈ O is a p-tuple o = ((X_1, a_1), ..., (X_p, a_p)) that is represented by the vector o = (a_1, ..., a_p).

A distance d on numerical data must be chosen on the observation space. Let o = (a_j)_{1 ≤ j ≤ p} and o' = (b_j)_{1 ≤ j ≤ p} be two given observations. If the usual Euclidean distance is chosen, we have

d^2(o, o') = \sum_{1 \le j \le p} (a_j - b_j)^2.

If the attributes are not homogeneous, the above distance is not suitable. The normalized distance is preferred because its value does not depend on the unit used to measure the attributes or on the amplitude of each one. Let σ_j denote the standard deviation of the jth attribute. In the observation space the normalized distance is written

d_N^2(o, o') = \sum_{1 \le j \le p} (a_j - b_j)^2 / \sigma_j^2.

An attribute X is said to be discrete or symbolic if its range of possible values (called modalities) D = {d_1, ..., d_m} is finite. To be processed, symbolic attributes must be coded numerically. The usual way to do that is to associate to each modality a binary attribute. For example, the attribute (sex, {male, female}) will be replaced by the following binary attributes: (sex_male, {1, 0}) and (sex_female, {1, 0}). When all the attributes are symbolic, an observation o is represented by a binary vector o = (a_1, ..., a_j, ..., a_q) where a_j ∈ {1, 0} and q is the total number of modalities.

The usual Euclidean distance can be used on symbolic data but it gives equal importance to each modality.

The chi-square distance used in correspondence analysis (Greenacre, 1984) takes into account the weight of each modality in computing the distance between two observations o = (a_j)_{1 ≤ j ≤ q} and o' = (b_j)_{1 ≤ j ≤ q}. It is written

d_\chi^2(o, o') = \sum_{1 \le j \le q} (a_j - b_j)^2 / n_j

where n_j is the number of observations concerned with the jth modality. This distance gives more importance to rare modalities than to frequent ones, which is useful for the analysis of responses to questions provided by subjects: the same answer given by a majority of subjects is generally not very informative.

When the attributes are not all of the same type (numerical or symbolic), we speak of mixed data. Mixed data cause problems because it is difficult to define a general distance that takes into account the different attribute scale levels. We propose the following distance:

d^2(o, o') = \pi_N d_N^2(o, o') + \pi_\chi d_\chi^2(o, o')     (1)

where \pi_N and \pi_\chi are weights for balancing the influence of the numerical and symbolic attribute groups. The difficulty is to choose good weights. In (Ralambondrainy, 1987) we proposed choices of weights that are consistent with the inertia criterion optimized by the K-means algorithm.

If g is the center of gravity of the observations, the inertia of the set of observations is written

T = \sum \{ p_i d^2(o_i, g) \mid o_i \in O \} = \pi_N T_N + \pi_\chi T_\chi

where p_i is the weight of the ith observation, and T_N = \sum \{ p_i d_N^2(o_i, g) \mid o_i \in O \} and T_\chi = \sum \{ p_i d_\chi^2(o_i, g) \mid o_i \in O \} are the partial inertias related to the numerical and symbolic attributes. A reasonable strategy is to choose the weights in such a way that each component in Eq. (1) has the same contribution to the inertia T: \pi_N T_N = \pi_\chi T_\chi = 1. Using this "normalization", neither the number of attributes nor the partial inertia value related to each component biases the K-means algorithm.

Let P = {C_1, ..., C_k} be a partition into k classes of the set of observations O, and g_1, ..., g_k the centers of gravity of the clusters. The K-means algorithm is a classical optimization method that seeks a partition that minimizes the intraclass inertia

W = \sum_{1 \le r \le k} \sum \{ p_i d^2(o_i, g_r) \mid o_i \in C_r \}

or maximizes the interclass inertia

B = \sum_{1 \le r \le k} p_r d^2(g_r, g)

where p_r is the weight of the cluster C_r. For the normalized distance, B is written

B = \sum_{1 \le r \le k} p_r \sum_{1 \le j \le p} [ (m_j^r - m_j) / \sigma_j ]^2     (2)

where m_j is the mean of the jth attribute and m_j^r its mean in the class C_r. Starting from a random initial partition, an iterative process of partition construction is used that finds a partition for which the quality criterion takes its optimum.

The final partition is described by the list of observations belonging to each cluster, ranged from the most typical (the closest observation to the class center of gravity) to the least typical (the furthest observation from the class center of gravity). Interpretation of the cluster is made by examining the typical attributes.
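The following sketch shows one way this scheme could be realized; it is our own illustration, not the paper's implementation, and it assumes uniform observation weights, the simple one-hot coding above, and non-constant attributes. The weights are derived from the partial inertias so that π_N T_N = π_χ T_χ = 1, and a basic assignment/update loop is run with the mixed distance.

```python
import numpy as np

def inertia_weights(Xn, Xb, sigma, nj):
    # Choose pi_N, pi_chi so that pi_N * T_N = pi_chi * T_chi = 1,
    # with uniform observation weights p_i = 1/n (Section 2).
    gn, gb = Xn.mean(axis=0), Xb.mean(axis=0)        # center of gravity g
    T_n = np.mean(np.sum(((Xn - gn) / sigma) ** 2, axis=1))
    T_chi = np.mean(np.sum((Xb - gb) ** 2 / nj, axis=1))
    return 1.0 / T_n, 1.0 / T_chi

def mixed_sq_dist(xn, xb, yn, yb, sigma, nj, pi_n, pi_chi):
    # Eq. (1): d^2 = pi_N * d_N^2 + pi_chi * d_chi^2
    return (pi_n * np.sum(((xn - yn) / sigma) ** 2)
            + pi_chi * np.sum((xb - yb) ** 2 / nj))

def kmeans_mixed(Xn, Xb, k, iters=20, seed=0):
    # Xn: numerical attributes (n x p); Xb: binary-coded symbolic ones (n x q)
    sigma = Xn.std(axis=0)
    nj = Xb.sum(axis=0)                  # observations per modality
    pi_n, pi_chi = inertia_weights(Xn, Xb, sigma, nj)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(Xn), size=k, replace=False)  # random initial centers
    cn, cb = Xn[idx].astype(float), Xb[idx].astype(float)
    for _ in range(iters):
        d = np.array([[mixed_sq_dist(xn, xb, cn[r], cb[r],
                                     sigma, nj, pi_n, pi_chi)
                       for r in range(k)]
                      for xn, xb in zip(Xn, Xb)])
        labels = d.argmin(axis=1)        # assign to nearest center of gravity
        for r in range(k):
            if np.any(labels == r):      # recompute centers of gravity
                cn[r] = Xn[labels == r].mean(axis=0)
                cb[r] = Xb[labels == r].mean(axis=0)
    return labels
```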
3. Data and knowledge representation

As input, we have data and the related background knowledge. After the aggregation step, new information is available: the clustering description, the prototypes and the typical attributes of each cluster (statistical background knowledge). In this section, we propose a logical framework for representing this knowledge.

Attributes representation

The following observation

John = (sex, man)(town, Paris)(job, professor)

has been represented by a vector in the previous section. This p-tuple may be interpreted as a predicate attribute(sex, man) ∧ attribute(town, Paris) ∧ attribute(job, professor), where ∧ is the logical operator "and", that is true for the observation John. We show below how to represent p-tuples using predicates.

Symbolic attributes representation

A lattice structure L = (E, ≤, ∨, ∧, *, ∅) will be associated with each symbolic attribute (X, D = {d_1, ..., d_m}), where:
- E is a set of subsets of D that contains the singletons {d_j}, 1 ≤ j ≤ m. The set E is called the search space associated with the attribute X.
- ≤ is a relation "is less general than", a partial order on E which refines the inclusion relation: ∀e, f ∈ E, e ≤ f ⇒ e ⊆ f.
- L is a lattice having one greatest member D, denoted also by *, that is interpreted as "all values are possible". L has a minimal element ∅ ("impossible value"). Every pair (e, f) has one least upper bound, denoted by e ∨ f ∈ L and called the "generalization" of e and f. The greatest lower bound of e and f is denoted e ∧ f.

For example, the lattice may be L = ({d_j}_{1 ≤ j ≤ m}, ≤, ∨, ∧, *, ∅). It is the simplest structure possible (no structure on the attribute's domain). When this search space is chosen, the cluster recognition rules have no attribute value disjunction (such as attribute(job, professor or engineer)).

Example. Consider the question "Is advertising a good thing?" having the following answers D = {yes, no, ? = "I don't know"}. Fig. 1 illustrates two possible generalization lattices. For example, if we have

John = (sex, man)(advertising, "yes"),
Mary = (sex, woman)(advertising, ?),

then

John ∨ Mary = (sex, *)(advertising, "may be").

[Fig. 1. Examples of generalization lattices: the answers no, yes and ? generalize up to * ("it doesn't matter"), with minimal element ∅ ("impossible").]

The generalization structure is defined by the user from the background knowledge available. It can be given by the study of the correlation among attributes or by the results of numerical hierarchical algorithms (Everitt, 1974).

Numerical attributes

In the characterization step, to be processed, a numerical attribute (X, ℝ) will be coded into a symbolic one (X, D) by defining a partition on the values domain. There are various general methods to do this. The bounds of the classes may be defined by the study of the distribution of the attribute X or by using clustering algorithms such as Fisher's (Fisher, 1958). The coding approach proposed here relies on the specific characteristics of clusters resulting from the K-means aggregation process.

Let us denote by C the class to be characterized, by mean_C(X) the mean of X in C, and by σ_X the standard deviation of X. For any class C determined by the K-means algorithm, a value r of X is typical for this class if it verifies

mean_C(X) - σ_X ≤ r ≤ mean_C(X) + σ_X.     (3)

The coding function c : ℝ → D = {inf, typical, sup} is defined as follows:

c(r) = typical   if mean_C(X) - σ_X ≤ r ≤ mean_C(X) + σ_X,
c(r) = inf       if r < mean_C(X) - σ_X,
c(r) = sup       if mean_C(X) + σ_X < r.

The lattice associated with a numerical attribute is related to the search space E = {inf, typical, sup}.
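A direct transcription of this coding function (a sketch; the names are ours):

```python
def code_numerical(r, mean_c, sigma_x):
    # c(r) in {inf, typical, sup} relative to the cluster C (Eq. (3)):
    # "typical" means within one standard deviation of the cluster mean.
    if r < mean_c - sigma_x:
        return "inf"
    if r > mean_c + sigma_x:
        return "sup"
    return "typical"

# e.g. with mean_C(X) = 24.0 and sigma_X = 3.5:
# code_numerical(30, 24.0, 3.5) == "sup"
```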
Object representation

We suppose now that we want to characterize a given class C. The observations are characterized with a set of symbolic and/or numerical attributes, and a generalization structure is associated with each one. Attributes are represented by the set of triplets (X_j, D_j, L_j), 1 ≤ j ≤ p. If we denote by E_j the search space related to the attribute X_j, the product set Ω = E_1 × ... × E_p is called the space of objects, and the set of observations O is included in Ω.

The lattice of the objects

The object set Ω has a lattice structure L = L_1 × ... × L_p = (Ω = E_1 × ... × E_p, ≤, ∨, ∧, *, ∅), the product of the lattices L_j. The partial order on L is the product of the orders defined on the E_j, and for every ω = (ω_j)_{1 ≤ j ≤ p} and o = (o_j)_{1 ≤ j ≤ p} in Ω we have:

ω ≤ o ⟺ (ω_j ≤ o_j) for 1 ≤ j ≤ p,
o ∨ ω = (o_j ∨ ω_j)_{1 ≤ j ≤ p},
o ∧ ω = (o_j ∧ ω_j)_{1 ≤ j ≤ p}.

The largest member of Ω is Ω itself, also denoted by *, and the minimal element is denoted by ∅.

Logical objects representation

For ω_j ∈ E_j, we define the predicate A_{ω_j} : Ω → {true, false} such that, for all ω' = (ω'_j)_{1 ≤ j ≤ p} ∈ Ω, A_{ω_j}(ω') = true if ω'_j ≤ ω_j, else A_{ω_j}(ω') = false. We associate to ω = (ω_j)_{1 ≤ j ≤ p} ∈ Ω the predicate A_ω = ∧_{1 ≤ j ≤ p} A_{ω_j} : Ω → {true, false}, where ∧ is the logical operator "and". It is easy to verify that A_ω(ω') = true if and only if ω' ≤ ω. The map A_ω is the characteristic function of the set of predecessors of ω.

It is routine to prove that the set K(Ω) = {A_ω | ω ∈ Ω} is a lattice (K(Ω), ≤, ∨, ∧, True = A_Ω, False = A_∅) isomorphic to L, and that for ω, o ∈ Ω:

ω ≤ o ⟺ A_ω ≤ A_o,
α = o ∨ ω ⟺ A_α = A_o ∨ A_ω,
α = o ∧ ω ⟺ A_α = A_o ∧ A_ω.
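To make this concrete, here is a minimal sketch for the simple case where each search space is the set of subsets of the modalities, so that ≤ is inclusion and ∨ is union (our own representation, not the paper's code):

```python
def less_general(e, f):
    # "is less general than" on a search space: here it coincides with
    # inclusion of the covered modalities (e, f are frozensets).
    return e <= f

def join(w1, w2):
    # Least upper bound (generalization) in the product lattice:
    # component-wise union in these simple search spaces.
    return tuple(a | b for a, b in zip(w1, w2))

def make_predicate(w):
    # A_w is the conjunction over attributes: A_w(o) is true iff o <= w
    # component-wise, i.e. w subsumes the object o.
    return lambda o: all(less_general(o_j, w_j) for o_j, w_j in zip(o, w))

# The generalization John v Mary subsumes both observations
john = (frozenset({"man"}), frozenset({"yes"}))
mary = (frozenset({"woman"}), frozenset({"?"}))
A = make_predicate(join(john, mary))
assert A(john) and A(mary)
```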
Statistical background knowledge

The aggregation step generates knowledge that will guide the characterization process. In the previous section, we have shown how to build a generalization structure with respect to the numerical attribute distribution in a given cluster. The choice of examples and typical attributes can be derived from the aggregation step result.

Examples and counterexamples

An observation is all the more typical of a cluster the closer it is to the prototype of the cluster. The set of examples will be defined as the most typical observations of the cluster to be described. More precisely, an observation is an example of a given cluster if its distance to the prototype is less than the average distance of the cluster observations from the prototype of this cluster. The observations of all the other clusters will be the counterexamples.

The typical attributes

Depending on an attribute's contribution to determining the cluster, the attributes are ranged from the most typical to the least typical. This result can be used to select attributes to reduce the complexity of the characterization phase.

4. The characterization process

Statistical concept

Several representations have been given to the concept notion (see (Matheus, 1987) for a review). Numerical algorithms describe concepts in extension and leave the users the difficult problem of interpreting the clusters. Aristotle first proposed to characterize concepts using necessary and sufficient conditions, but this representation is unfeasible for real applications. Smith and Medin (1981) introduced the notion of probabilistic concept: a cluster is described by attributes and their associated frequencies in the cluster. A higher concept characterization has been proposed by Dennis et al. (1973), Hanson and Bauer (1986) and Hanson (1990) using polymorphic rules (m features out of n). This last definition better describes the categories used by people. The distance we proposed in Section 2 tends to minimize polymorphy in the concept characterization. In Section 3 we gave the definition of typical attribute values of a given cluster. A given observation is typical of a cluster if most of its attribute values are typical values of the cluster. The goal of the characterization process is to find recognition rules (statistical concepts) expressing that the typical observations of a given cluster share a significant number of typical attributes. Other concept definitions have been proposed in the frame of fuzzy theory (see (Takagi et al., 1991) for example).

The characterization algorithm

We first set the optimization problem related to the learning problem. Then, we describe the main features of the proposed learning algorithm.

In Section 3, we have associated to ω = (ω_j)_{1 ≤ j ≤ p} ∈ Ω the predicate A_ω = ∧_{1 ≤ j ≤ p} A_{ω_j} ∈ K(Ω). A cluster C may be represented by the predicate A_C = ∨{A_ω | ω ∈ C}. Our goal is to approximate A_C with a predicate Â_C ∈ K(Ω) that is more general and more simple than A_C.

Let δ denote the distance defined by the symmetrical difference between characteristic functions. We want to find Â_C ∈ K(Ω) that minimizes δ(A_C, Â_C), with Â_C = ∨_{1 ≤ i ≤ q} A_i, under the following conditions: q minimal (simplicity criterion) and A_i more general than the A_ω (generality criterion). These criteria have a simple interpretation if the set of examples is the cluster C and the set of counterexamples is the complement C' of C in Ω: the distance δ(A_C, Â_C) is then the number of counterexamples recognized by Â_C plus the number of examples not recognized by Â_C.

The algorithm proposed, called GENER, includes the two following sequential steps.
Step 1. Find a set of "pertinent" conjunctive predicates A_j more general than the predicates A_ω for ω ∈ C.
Step 2. Find a predicate Â_C, disjunction of the A_j, that approximates A_C.

The optimization problem suggests the following conditions to select the predicates:
(C1) The predicate A_j must be α-discriminating: the number or percentage of counterexamples recognized must be less than α, i.e., |counterexamples(A_j)| ≤ α.
(C2) The number or percentage of elements of the cluster C recognized by A_j must be greater than β: β ≤ |examples(A_j)|.

In the first phase, we determine the set of predicates D_{α-disc} = {A_ω | ω ∈ Ω, A_ω α-discriminating and subsuming at least two examples}. Let E be the set of examples, EX = {A_e | e ∈ E}, EX_{2-disc} = {A_e ∨ A_{e'} | e, e' ∈ E, e ≠ e', A_e ∨ A_{e'} α-discriminating}, EX_{3-disc} = {A_e ∨ A_{e'} ∨ A_{e''} | e, e', e'' ∈ E, e ≠ e' ≠ e'', A_e ∨ A_{e'} ∨ A_{e''} α-discriminating}, and so on up to EX_{q-disc}, the last set, which contains α-discriminating predicates generated by q examples (q ≤ |E|). The predicate A_e ∨ A_{e'} is computed, as explained in Section 3, using the lattice structure defined on each attribute. It is obvious that

D_{α-disc} = EX_{2-disc} ∪ EX_{3-disc} ∪ ... ∪ EX_{q-disc}.

Let us define the operator ⊗ by:

⊗EX = {A ∨ B | A, B ∈ EX, A ≠ B, A ∨ B α-discriminating},
⊗²EX = {A ∨ B | A, B ∈ ⊗EX, A ≠ B, A ∨ B α-discriminating},
⊗³EX = {A ∨ B | A, B ∈ ⊗²EX, A ≠ B, A ∨ B α-discriminating},
etc.

Using the fact that if C = A ∨ B is α-discriminating then A and B are α-discriminating, i.e., an α-discriminating predicate is generated from α-discriminating predicates, it is easy to verify that:

⊗EX = EX_{2-disc},
⊗²EX = EX_{3-disc} ∪ EX_{4-disc},
⊗³EX = EX_{4-disc} ∪ EX_{5-disc} ∪ EX_{6-disc} ∪ EX_{7-disc} ∪ EX_{8-disc},
etc.

For example, let us compute ⊗²EX. If C ∈ ⊗²EX then C is α-discriminating and C is written C = A ∨ B with A = A_{e1} ∨ A_{e2} ∈ ⊗EX and B = A_{e3} ∨ A_{e4} ∈ ⊗EX, where e1, e2, e3, e4 are examples such that e1 ≠ e2 and e3 ≠ e4. We have C = A_{e1} ∨ A_{e2} ∨ A_{e3} ∨ A_{e4}; as A ≠ B, then C ∈ EX_{3-disc} or C ∈ EX_{4-disc}.

Conversely, let C = A_{e1} ∨ A_{e2} ∨ A_{e3} ∈ EX_{3-disc}. C can also be written as C = (A_{e1} ∨ A_{e2}) ∨ (A_{e2} ∨ A_{e3}). As C is α-discriminating, A_{e1} ∨ A_{e2} and A_{e2} ∨ A_{e3} are α-discriminating, so A_{e1} ∨ A_{e2} ∈ ⊗EX and A_{e2} ∨ A_{e3} ∈ ⊗EX, as well as C ∈ ⊗²EX. In the same way, we can prove that if C ∈ EX_{4-disc} then C ∈ ⊗²EX.

Using the operator ⊗, we construct the sets EX_{r-disc} for r = 2, ..., q and the set D_{α-disc}. The algorithm for constructing the set D_{α-disc} is shown below.

[Fig. 2. Trace of the algorithm on a simple example: starting from the examples {a, b, c, d} and an empty DISC, the sets NEW and DISC are updated over two iterations until no new α-discriminating predicate appears.]
NEW = the set of examples
DISC = { }
1. Compute predicates that subsume at least two examples:
   NEW = {A = g ∨ g' | g, g' ∈ NEW and g ≠ g'}.
2. Remove all elements from NEW that are not α-discriminating:
   DROP = {A_j | A_j ∈ NEW and α < |counterexamples(A_j)|},
   NEW = NEW − DROP.
3. Update the set of predicates that are α-discriminating (condition (C1)):
   DISC = DISC ∪ NEW.
4. If new predicates have been added to DISC, go to 1; else go to 5.
5. Remove the elements of DISC that recognize less than β examples (condition (C2)).

Fig. 2 illustrates the algorithm on a very simple example.
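Under the same simplifying assumptions as in Section 3 (subset search spaces, absolute counts for α and β), the loop above could be sketched as follows; this is our own illustration, not the system's implementation:

```python
from itertools import combinations

def gener_phase1(examples, counterexamples, alpha, beta):
    # Observations are tuples of frozensets; A_e v A_e' is the predicate of
    # the component-wise generalization e v e' (see Section 3).
    def join(w1, w2):
        return tuple(a | b for a, b in zip(w1, w2))

    def subsumes(w, o):                                   # A_w(o)
        return all(o_j <= w_j for o_j, w_j in zip(o, w))

    def alpha_disc(w):                                    # condition (C1)
        return sum(subsumes(w, c) for c in counterexamples) <= alpha

    new, disc = set(examples), set()
    while True:
        new = {join(a, b) for a, b in combinations(new, 2)}   # step 1
        new = {w for w in new if alpha_disc(w)}               # step 2
        if new <= disc:                                       # step 4
            break
        disc |= new                                           # step 3
    return [w for w in disc                                   # step 5
            if sum(subsumes(w, e) for e in examples) >= beta]
```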

Let S = {A_1, ..., A_m} be the resulting set of predicates that satisfy the two conditions (C1) and (C2). The predicate R_S = A_1 ∨ ... ∨ A_m is an approximation of the characteristic function of the cluster C. This predicate may be made of a great number of terms and may include predicates A_j, A_k such that A_j ≤ A_k. It can be simplified according to the following.

A predicate of S is pertinent if it recognizes many elements of the cluster C. We suppose now that the predicates A_1, ..., A_m of S are arranged decreasingly, according to the number of recognized elements of C. If the set of elements of C recognized by A_r is included in the set recognized by A_1 ∨ ... ∨ A_k, then A_r can be dropped. This will be denoted by A_r ≤_C A_1 ∨ ... ∨ A_k (the predicate A_1 ∨ ... ∨ A_k is more general than A_r in the context of the cluster C). More precisely, the algorithm is written:

Table 1

Party                                                        Mean    σ
Communist party: European election (Cm89)                     7.5    3.5
Communist party: Regional election (Cm92)                     7.8    3.7
Socialist party: European election (So89)                    24.0    3.5
Socialist party: Regional election (So92)                    21.6    6.2
Environment party: European election (En89)                  11.6    2.1
Environment party: Regional election (En92)                  13.7    3.8
Conservative parties UDF + RPR: European election (Cs89)     37.5    5.4
Conservative parties UDF + RPR: Regional election (Cs92)     38.6    7.0
Ultraright National Front party: European election (Nf89)    10.8    3.8
Ultraright National Front party: Regional election (Nf92)    12.9    5.0

Table .~"
D~partement CmSO ('m92 So89 So92 En89 En92 Cs89 Cs92 Nf89 Nf92
01 Ain 4.93 4.78 22.8 21.35 11.54 15.2 40.9 43.3 12.5 15.2
02 Aisne 9.83 9.83 24.4 20.4 11.2 15.3 33.3 32.3 11.1 12.5
03 Allier 16.9 20.8 22.1 16.3 8.8 11.2 37.1 37.0 7.86 9.37

Initialization: R̂_C = A_1.
For r = 2 to m do: if A_r ≤_C R̂_C then do nothing, else R̂_C = R̂_C ∨ A_r.

A predicate is thus added to R̂_C only if it recognizes new observations of C. The concept related to cluster C will be described by R̂_C.
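An extensional sketch of this simplification loop (the helper names are ours):

```python
def simplify(preds, examples, subsumes):
    # preds must be sorted by decreasing number of recognized elements of C;
    # A_r is kept only if it recognizes observations not already covered by
    # the current disjunction (i.e. A_r is not <=_C the disjunction).
    covered, kept = set(), []
    for w in preds:
        recognized = {i for i, e in enumerate(examples) if subsumes(w, e)}
        if not recognized <= covered:
            kept.append(w)
            covered |= recognized
    return kept
```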
5. Example of application

The data concern the scores of French political parties in 96 départements (counties) in the 1989 European elections and the 1992 regional elections. The objective of the study is to find groups of départements which have the same vote profile. In the first phase, the aggregation algorithm is performed to determine a partition of the départements. In the second phase, clusters are described using a classical statistical coefficient and the proposed GENER algorithm. Results are compared with rules derived from a decision tree method.

The national mean scores of the French parties studied are listed in Table 1. A sample of the data is listed in Table 2.

Several trials have been performed and the best partition, of 4 clusters, has been selected to be described.

[Fig. 3. Contributions of attributes to clusters determination: for each of Clusters 1-4, the values of coef(j, r) for the attributes Cm89, Cm92, So89, So92, En89, En92, Cs89, Cs92, Nf89 and Nf92, plotted on a scale from -1 to +1.]

From the expression (2) in Section 2, we define the degree of participation of the attributes in cluster determination by the coefficient

coef(j, r) = (m_j^r - m_j) / σ_j

where m_j is the mean and σ_j the standard deviation of the jth attribute, and m_j^r is its mean in the class C_r. The attribute j is typical for the cluster C_r if |coef(j, r)| is greater than 1, i.e., the attribute mean in this class differs significantly from the attribute mean in the population. Fig. 3 displays coef(j, r) for each cluster r and each attribute j.
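This coefficient is straightforward to compute; a minimal sketch (our own function names):

```python
import numpy as np

def coef(X, labels, r):
    # coef(j, r) = (m_j^r - m_j) / sigma_j for every attribute j;
    # |coef(j, r)| > 1 flags attribute j as typical for cluster r.
    m, sigma = X.mean(axis=0), X.std(axis=0)
    m_r = X[labels == r].mean(axis=0)
    return (m_r - m) / sigma
```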
We can give the following interpretation of the clusters. Cluster 1 (21%) regroups leftist départements that vote heavily for the Socialist party. Cluster 2 (21%) can be labelled as "Conservative": départements of this cluster support the center-right party UDF and the neo-Gaullist party RPR. The scores of parties in the biggest Cluster 3 (38%) are not significantly different from the average, except for the Environment party. The last Cluster 4 (20%) reflects "extreme" political opinions: only the ultraright National Front and Communist parties have high scores.

The GENER algorithm has been applied to describe each cluster. Only the results related to Cluster 2 (20 examples) are given here: α = 2; β = 7; Examples = Cluster 2; Counterexamples = Cluster 1 ∪ Cluster 3 ∪ Cluster 4. The process has been stationary after nine iterations. The number of rules that satisfy the conditions (C1) and (C2) is 115. The number of rules after the simplification step is 3.

From Eq. (3) in Section 3, we define:

(Cons89+) = (38.6 ≤ Cs89 ≤ 49),
(Cons92+) = (44.2 ≤ Cs92 ≤ 51.7),
(Cons+) = (Cons89+ ∧ Cons92+),
(Com89−) = (2.3 ≤ Cm89 ≤ 8.8),
(Com92−) = (2.9 ≤ Cm92 ≤ 8.8),
(Com−) = (Com89− ∧ Com92−),
(Nati89−) = (4.9 ≤ Nf89 ≤ 10.7),
(Nati92−) = (5 ≤ Nf92 ≤ 12.9),
(Nati−) = (Nati89− ∧ Nati92−),
(Soci89=) = (20 ≤ So89 ≤ 26.8),
(Soci92=) = (14.5 ≤ So92 ≤ 25),
(Soci=) = (Soci89= ∧ Soci92=),
(Envi89=) = (9.5 ≤ En89 ≤ 12.6),
(Envi92=) = (8.3 ≤ En92 ≤ 15.3),
(Envi=) = (Envi89= ∧ Envi92=),

where "+" means higher than the average, "−" means lower than the average, and "=" stands for around the average.

[Fig. 4. The tree on clusters: each leaf shows a decision rule, the percentage of cluster objects recognized and the number of objects; the split at Cs92 = 44.2 isolates Cluster 2 (21 objects, 90.5%) from Clusters 1, 3 and 4.]

The rule quality is measured by the percentage of objects from Cluster 2 recognized by the rule. The final set of rules related to Cluster 2, or "Conservative", is the following:

(R1) If Cons+ and Nati92− then Conservative (80%).
(R2) If Soci89= and Cons92+ and Nati92− then Conservative (70%).
(R3) If Soci= and Envi= and Com− and Nati− and Cons89+ then Conservative (35%).

If (R1) or (R2) or (R3) then Conservative (95%).
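For illustration, the three rules can be written as predicates over a département row keyed by the attribute names of Table 2; this is a sketch using the typical intervals defined above, not the system's output format:

```python
BOUNDS = {  # typical intervals from Eq. (3), as listed above
    "Cons89+": ("Cs89", 38.6, 49.0),  "Cons92+": ("Cs92", 44.2, 51.7),
    "Com89-":  ("Cm89", 2.3, 8.8),    "Com92-":  ("Cm92", 2.9, 8.8),
    "Nati89-": ("Nf89", 4.9, 10.7),   "Nati92-": ("Nf92", 5.0, 12.9),
    "Soci89=": ("So89", 20.0, 26.8),  "Soci92=": ("So92", 14.5, 25.0),
    "Envi89=": ("En89", 9.5, 12.6),   "Envi92=": ("En92", 8.3, 15.3),
}

def holds(name, row):
    attr, lo, hi = BOUNDS[name]
    return lo <= row[attr] <= hi

def conservative(row):
    # Disjunction of the rules R1, R2, R3 for the "Conservative" concept
    r1 = all(holds(n, row) for n in ("Cons89+", "Cons92+", "Nati92-"))
    r2 = all(holds(n, row) for n in ("Soci89=", "Cons92+", "Nati92-"))
    r3 = all(holds(n, row) for n in ("Soci89=", "Soci92=", "Envi89=",
                                     "Envi92=", "Com89-", "Com92-",
                                     "Nati89-", "Nati92-", "Cons89+"))
    return r1 or r2 or r3
```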
decision tree methods help the user to interpret clus-
An alternative stratcg} is to consider a decision ters produced by another method? Our experience is
tree method for characterizing the clusters. This kind that a system may be efficient to discriminate clus-
of method is very popular in numeric (the C A R l ters but not appropriate to describe them. The cluster
system proposed by Breiman et al. (1984)) and characterization process, such as GENER, must be
symbolic research (the algorithm ID3 of Quinlan consistent with the aggregative process. The result of
(1986)). The nonparametric decision tree algorithm K-means and G E N E R is a fast " n u m e r i c - s y m b o l i c "
DNP (Friedman, 1977: Ccleux et al., 1989) chosen clustering method that produces meaningful clusters.
has built the decision tree shown in Fig. 4 using the Extensions to this approach would be to consider
K o l m o g o r o v - S m i r n o v distance. From this tree, we classification paradigms other than partitioning.
can characterize Cluster 2 by the following rule:

I f 44.2 ~< Cs92 then ( o n s e r v a t i v e (90.5c~). Acknowledgement


It appears that our characterization algorithm gives
a more complete and precise description of " C o n - i would like to thank P. Sims from British
servative". The rules found by GENER take into Aerospace for its remarks. This work has been par-
account the typical attributes of " C o n s e r v a t i v e " (cf. tially supported by M L T Machine Learning Toolbox
Fig. 3). The reasons why the proposed algorithm is (MET) Esprit project. The version of the described
more suitable for the concept description goal are as algorithm is available in the M L T system.
follows:
- The objective of decision tree methods is classi-
fication or prediction and not concepts description. References
References

Breiman, L., J. Friedman, R. Olshen and C. Stone (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
Celeux, G., E. Diday, G. Govaert, Y. Lechevallier and H. Ralambondrainy (1989). Classification automatique des données: environnement statistique et informatique. Dunod, Paris.
Dennis, I., J.A. Hampton and S. Lea (1973). New problem in concept formation. Nature 243, 101-102.
Esposito, F., D. Malerba and G. Semeraro (1992). Classification in noisy environments using a distance measure between structural symbolic descriptions. IEEE Trans. Pattern Anal. Mach. Intell. 14, 390-402.
Everitt, B.S. (1974). Cluster Analysis. Heinemann, London.
Fisher, W.D. (1958). On grouping for maximum homogeneity. J. Amer. Stat. Ass. 53, 789-798.
Fisher, D. and P. Langley (1985). Approaches to conceptual clustering. In: Proc. Ninth Internat. Joint Conf. on Artificial Intelligence. Kaufmann, Los Angeles, CA, 688-697.
Friedman, J.H. (1977). A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comput. 26, 404-408.
Gowda, K.C. and E. Diday (1991). Symbolic clustering using a new dissimilarity measure. Pattern Recognition 24, 567-578.
Greenacre, M. (1984). Theory and Applications of Correspondence Analysis. Academic Press, New York.
Hanson, S.J. and M. Bauer (1986). Machine learning, clustering and polymorphy. In: L.N. Kanal and J.F. Lemmer, Eds., Uncertainty in Artificial Intelligence. North-Holland, Amsterdam, 415-428.
Hanson, S.J. (1990). Conceptual clustering and categorization: bridging the gap between induction and causal models. In: Y. Kodratoff and R.S. Michalski, Eds., Machine Learning: An Artificial Intelligence Approach, Vol. 3. Morgan Kaufmann, Los Altos, CA, 235-268.
Jain, A.K. and R.C. Dubes (1988). Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs, NJ.
Kodratoff, Y. and G. Tecuci (1988). Learning based on conceptual distance. IEEE Trans. Pattern Anal. Mach. Intell. 10, 897-909.
Matheus, C.J. (1987). Conceptual purpose: implications for representation and learning in machines and humans. Ph.D. Thesis, Computer Science, University of Illinois, Urbana-Champaign, IL.
Michalski, R. (1984). A theory and method of inductive learning. In: R. Michalski, J. Carbonell and T. Mitchell, Eds., Machine Learning: An Artificial Intelligence Approach. Springer, New York, 83-134.
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning 1, 81-106.
Ralambondrainy, H. (1987). A clustering method for nominal data and mixture of numerical and nominal data. In: Proc. First Conf. of the Internat. Federation of Classification Societies, Aachen.
Stepp, R.E. and R.S. Michalski (1986). Conceptual clustering of structured objects: a goal-oriented approach. Artificial Intelligence 28, 43-69.
Stepp, R.E. (1987). Concepts in conceptual clustering. In: Proc. Tenth Internat. Joint Conf. on Artificial Intelligence, Milan.
Smith, E.E. and D.L. Medin (1981). Categories and Concepts. Harvard Univ. Press, Cambridge, MA.
Takagi, T., T. Yamaguchi and M. Sugeno (1991). Conceptual fuzzy sets. In: Fuzzy Engineering toward Human Friendly Systems (IFES '91). IOS Press, 261-272.
Venkateswarlu, N.B. and P.S.V.S.K. Raju (1992). Fast ISODATA clustering algorithms. Pattern Recognition 25, 335-342.