
Weighted Clustering

University of Waterloo · University of Waterloo · Aarhus University · University of Waterloo
mackerma@cs.uwaterloo.ca · shai@cs.uwaterloo.ca · simina@cs.au.dk · dloker@cs.uwaterloo.ca

Abstract

We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights. We conduct the first extensive theoretical analysis of the influence of weighted data on standard clustering algorithms in both the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify algorithms accordingly.

Introduction

Many common applications of clustering, such as facility allocation and vector quantization, may naturally be cast as weighted clustering tasks - tasks in which some data points should have a greater effect on the utility of the clustering than others. Consider vector quantization, which aims to find a compact encoding of signals that has low expected distortion. The accuracy of the encoding is most important for signals that occur frequently. With weighted data, such a consideration is easily captured by having the weights of the points represent signal frequencies.

When applying clustering to facility allocation, such as the placement of police stations in a new district, the distribution of the stations should enable quick access to most areas in the district. However, the accessibility of different landmarks to a station may have varying importance. The weighted setting enables a convenient method for prioritising certain landmarks over others.

Traditional clustering algorithms can be readily translated into the weighted setting. This leads to the following fundamental question: given a specific weighted clustering task, how should a user select an algorithm for that task? Recently, a new approach for choosing a clustering algorithm has been proposed (see, for example, (Ackerman, Ben-David, and Loker 2010b)). This approach involves identifying significant properties that distinguish between clustering paradigms in terms of their input/output behavior. When such properties are relevant to the user's domain knowledge, they can aid in the selection of algorithms for their specific applications.

In this paper, we formulate intuitive properties that may allow a user to select an algorithm based on how it treats weighted data. Based on these properties we obtain a classification of clustering algorithms into three categories: those that are affected by weights on all data sets, those that ignore weights, and those that respond to weights on some configurations of the data but not on others. Among the methods that always respond to weights are several well-known algorithms, such as k-means and k-median. On the other hand, algorithms such as single-linkage, complete-linkage, and min-diameter ignore weights.

Perhaps the most notable is the last category. We find that methods belonging to that category are robust to weights when data is sufficiently clusterable, and respond to weights otherwise. Average-linkage, as well as the well-known spectral objective function ratio cut, both fall into this category. We characterize the precise conditions under which these methods are influenced by weights.

Related Work

Clustering algorithms are usually analysed in the context of unweighted data. The only related work that we are aware of is from the early 1970s. (Fisher and Ness 1971) introduced several properties of clustering algorithms. Among these, they mention "point proportion admissibility", which requires that the output of an algorithm should not change if any points are duplicated. They then observe that a few algorithms are point proportion admissible. However, clustering algorithms can display a much wider range of behaviours on weighted data than merely satisfying or failing to satisfy point proportion admissibility. We carry out the first extensive analysis of clustering on weighted data, characterising the precise conditions under which algorithms respond to weight.

In addition, (Wright 1973) proposed a formalisation of cluster analysis consisting of eleven axioms. In two of these axioms, the notion of mass is mentioned. Namely, that points with zero mass can be treated as non-existent, and that multiple points with mass at the same location are equivalent to one point with weight the sum of the masses. The idea of mass has not been developed beyond stating these axioms in their work.

Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
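Wright's mass axioms anticipate the translation used throughout this paper: data with duplicate points is equivalent to de-duplicated data whose weights count the duplicates. A minimal sketch of that translation (our illustration, not code from the paper; it assumes duplicates are exactly equal, as in the Euclidean case):

```python
from collections import Counter

def to_weighted(points):
    """Collapse duplicate points into (point, weight) pairs, where the
    weight of a point is the number of its duplicates in the input."""
    return list(Counter(points).items())

# Three copies of (0, 0) become a single point of weight 3.
data = [(0, 0), (0, 0), (0, 0), (1, 1)]
weighted = to_weighted(data)  # [((0, 0), 3), ((1, 1), 1)]
```

In general, duplicates are defined through the distance function (d(x, y) = 0 and d(x, z) = d(y, z) for all z), so a faithful implementation would group by zero distance rather than by equality.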


Our work falls within a recent framework for clustering algorithm selection. The framework is based on identifying properties that address the input/output behaviour of algorithms. Algorithms are classified based on intuitive, user-friendly properties, and the classification can then be used to assist users in selecting a clustering algorithm for their specific application. So far, research in this framework has focused on the unweighted partitional (Ackerman, Ben-David, and Loker 2010a; Bosagh-Zadeh and Ben-David 2009; Ackerman, Ben-David, and Loker 2010b) and hierarchical settings (Ackerman and Ben-David 2011). This is the first application of the framework to weighted clustering.

Preliminaries

A weight function w over X is a function w : X → R⁺. Given a domain set X, denote the corresponding weighted domain by w[X], thereby associating each element x ∈ X with weight w(x). A distance function is a symmetric function d : X × X → R⁺ ∪ {0}, such that d(x, y) = 0 if and only if x = y. We consider weighted data sets of the form (w[X], d), where X is some finite domain set, d is a distance function over X, and w is a weight function over X.

A k-clustering C = {C1, C2, ..., Ck} of a domain set X is a partition of X into 1 < k < |X| disjoint, non-empty subsets of X where ∪ᵢCᵢ = X. A clustering of X is a k-clustering for some 1 < k < |X|. To avoid trivial partitions, clusterings that consist of a single cluster, or where every cluster has a unique element, are not permitted.

Denote the weight of a cluster Cᵢ ∈ C by w(Cᵢ) = Σ_{x∈Cᵢ} w(x). For a clustering C, let |C| denote the number of clusters in C. For x, y ∈ X and a clustering C of X, write x ∼C y if x and y belong to the same cluster in C, and x ≁C y otherwise.

A partitional weighted clustering algorithm is a function that maps a data set (w[X], d) and an integer 1 < k < |X| to a k-clustering of X.

A dendrogram D of X is a pair (T, M) where T is a binary rooted tree and M : leaves(T) → X is a bijection. A hierarchical weighted clustering algorithm is a function that maps a data set (w[X], d) to a dendrogram of X. A set C₀ ⊆ X is a cluster in a dendrogram D = (T, M) of X if there exists a node x in T so that C₀ = {M(y) | y is a leaf and a descendant of x}. For a hierarchical weighted clustering algorithm A, A(w[X], d) outputs a clustering C = {C1, ..., Ck} if Cᵢ is a cluster in A(w[X], d) for all 1 ≤ i ≤ k. A partitional algorithm A outputs clustering C on (w[X], d) if A(w[X], d, |C|) = C.

For the remainder of this paper, unless otherwise stated, we will use the term "clustering algorithm" for "weighted clustering algorithm".

Finally, given a clustering algorithm A and a data set (X, d), let range(A(X, d)) = {C | ∃w such that A outputs C on (w[X], d)}, i.e. the set of clusterings that A outputs on (X, d) over all possible weight functions.

Basic Categories

Different clustering algorithms respond differently to weights. We introduce a formal categorisation of clustering algorithms based on their response to weights. First, we define what it means for a partitional algorithm to be weight-responsive on a clustering. We present an analogous definition for hierarchical algorithms when we study hierarchical algorithms below.

Definition 1 (Weight Responsive). A partitional clustering algorithm A is weight-responsive on a clustering C of (X, d) if
1. there exists a weight function w so that A(w[X], d) = C, and
2. there exists a weight function w′ so that A(w′[X], d) ≠ C.

Weight-sensitive algorithms are weight-responsive on all clusterings in their range.

Definition 2 (Weight Sensitive). An algorithm A is weight-sensitive if for all (X, d) and all C ∈ range(A(X, d)), A is weight-responsive on C.

At the other extreme are clustering algorithms that do not respond to weights on any data set. This is the only category that has been considered in previous work, corresponding to "point proportion admissibility" (Fisher and Ness 1971).

Definition 3 (Weight Robust). An algorithm A is weight-robust if for all (X, d) and all clusterings C of (X, d), A is not weight-responsive on C.

Finally, there are algorithms that respond to weights on some clusterings, but not on others.

Definition 4 (Weight Considering). An algorithm A is weight-considering if
• there exists an (X, d) and a clustering C of (X, d) so that A is weight-responsive on C, and
• there exists an (X, d) and C ∈ range(A(X, d)) so that A is not weight-responsive on C.

To formulate clustering algorithms in the weighted setting, we consider their behaviour on data that allows duplicates. Given a data set (X, d), elements x, y ∈ X are duplicates if d(x, y) = 0 and d(x, z) = d(y, z) for all z ∈ X. In a Euclidean space, duplicates correspond to elements that occur at the same location. We obtain the weighted version of a data set by de-duplicating the data, and associating every element with a weight equaling the number of duplicates of that element in the original data. The weighted version of an algorithm partitions the resulting weighted data in the same manner that the unweighted version partitions the original data. As shown throughout the paper, this translation leads to natural formulations of weighted algorithms.

Partitional Methods

In this section, we show that partitional clustering algorithms respond to weights in a variety of ways. Many popular partitional clustering paradigms, including k-means, k-median, and min-sum, are weight sensitive. It is easy to see that methods such as min-diameter and k-center are weight-robust. We begin by analysing the behaviour of the spectral objective function ratio cut, which exhibits interesting behaviour on weighted data by responding to weights unless the data is highly structured.
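The basic categories above can also be probed empirically. The sketch below (our own illustration, not code from the paper) checks the two conditions of Definition 1 against a finite pool of weight functions; it can certify that an algorithm is weight-responsive on a clustering, though failing to find a witness in the pool proves nothing.

```python
def weight_responsive_on(algo, X, d, C, weight_fns):
    """Empirical check of Definition 1: A is weight-responsive on C if
    some weighting yields C and some other weighting yields a different
    clustering. Only a one-sided check over the given pool."""
    outputs = [algo(X, d, w) for w in weight_fns]
    return any(out == C for out in outputs) and any(out != C for out in outputs)

# Toy weight-sensitive algorithm: isolate the heaviest point.
# (The distance function is unused here; clusterings are sets of sets.)
def isolate_heaviest(X, d, w):
    top = max(X, key=w.get)
    return frozenset({frozenset({top}), frozenset(X) - {top}})

X = [0, 1, 2]
C = frozenset({frozenset({0}), frozenset({1, 2})})
pool = [{0: 5, 1: 1, 2: 1}, {0: 1, 1: 5, 2: 1}]
responsive = weight_responsive_on(isolate_heaviest, X, None, C, pool)  # True
```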


Ratio-Cut Spectral Clustering

We investigate the behaviour of a spectral objective function, ratio-cut (Von Luxburg 2007), on weighted data. Instead of a distance function, spectral clustering relies on a similarity function, which maps pairs of domain elements to non-negative real numbers that represent how alike the elements are.

The ratio-cut of a clustering C is

rcut(C, w[X], s) = (1/2) Σ_{Cᵢ∈C} [Σ_{x∈Cᵢ, y∈X\Cᵢ} s(x, y) · w(x) · w(y)] / [Σ_{x∈Cᵢ} w(x)].

The ratio-cut clustering function is rcut(w[X], s, k) = argmin_{C : |C|=k} rcut(C, w[X], s). We prove that this function ignores data weights only when the data satisfies a very strict notion of clusterability. To characterise precisely when ratio-cut responds to weights, we first present a few definitions.

A clustering C of (w[X], s) is perfect if for all x1, x2, x3, x4 ∈ X where x1 ∼C x2 and x3 ≁C x4, s(x1, x2) > s(x3, x4). C is separation-uniform if there exists λ so that for all x, y ∈ X where x ≁C y, s(x, y) = λ. Note that neither condition depends on the weight function.

We show that whenever a data set has a clustering that is both perfect and separation-uniform, then ratio-cut uncovers that clustering, which implies that ratio-cut is not weight-sensitive. Note that these conditions are satisfied when all between-cluster similarities are set to zero. On the other hand, we show that ratio-cut does respond to weights when either condition fails.

Lemma 1. Given a clustering C of (X, s) where every cluster has more than one point, if C is not separation-uniform then ratio-cut is weight-responsive on C.

Proof. We consider two cases.

Case 1: There is a pair of clusters with different similarities between them. Then there exist C1, C2 ∈ C, x ∈ C1, and y ∈ C2 so that s(x, y) ≥ s(x, z) for all z ∈ C2, and there exists a ∈ C2 so that s(x, y) > s(x, a).

Let w be a weight function such that w(x) = W for some sufficiently large W, and weight 1 is assigned to all other points in X. Since we can set W to be arbitrarily large, when looking at the cost of a cluster, it suffices to consider the dominant term in terms of W. We will show that we can improve the cost of C by moving a point from C2 to C1. Note that moving a point from C2 to C1 does not affect the dominant term of clusters other than C1 and C2. Therefore, we consider the cost of these two clusters before and after rearranging points between these clusters.

Let A = Σ_{a∈C2} s(x, a) and let m = |C2|. Then the dominant term, in terms of W, of the cost of C2 is W·A/m. The cost of C1 approaches a constant as W → ∞.

Now consider the clustering C′ obtained from C by moving y from cluster C2 to cluster C1. The dominant term in the cost of C2 becomes W·(A − s(x, y))/(m − 1), and the cost of C1 approaches a constant as W → ∞. By choice of x and y, if (A − s(x, y))/(m − 1) < A/m then C′ has lower loss than C when W is large enough. Finally, (A − s(x, y))/(m − 1) < A/m holds when A/m < s(x, y), and the latter holds by choice of x and y.

Case 2: The similarities between every pair of clusters are the same. However, there are clusters C1, C2, C3 ∈ C so that the similarities between C1 and C2 are greater than the ones between C1 and C3. Let a and b denote the similarities between C1, C2 and C1, C3, respectively.

Let x ∈ C1 and let w be a weight function such that w(x) = W for large W, and weight 1 is assigned to all other points in X. The dominant term comes from clusters going into C1, specifically edges that include the point x. The dominant term of the contribution of cluster C3 is Wb and the dominant term of the contribution of C2 is Wa, totalling Wa + Wb.

Now consider the clustering C′ obtained from C by merging C1 with C2, and splitting C3 into two clusters (arbitrarily). The dominant term of the clustering comes from clusters other than C1 ∪ C2, and the cost of clusters outside C1 ∪ C2 ∪ C3 is unaffected. The dominant term of the cost of each of the two clusters obtained by splitting C3 is Wb, for a total of 2Wb. However, the factor of Wa that C2 previously contributed is no longer present. This changes the coefficient of the dominant term from a + b to 2b, which improves the cost of the clustering because b < a.

Lemma 2. Given a clustering C of (X, s) where every cluster has more than one point, if C is not perfect then ratio-cut is weight-responsive on C.

The proof of the lemma is included in the appendix (Anonymous 2012).

Lemma 3. Given any data set (w[X], s) that has a perfect, separation-uniform k-clustering C, ratio-cut(w[X], s, k) = C.

Proof. Let (w[X], s) be a weighted data set with a perfect, separation-uniform clustering C = {C1, ..., Ck}. Recall that for any Y ⊆ X, w(Y) = Σ_{y∈Y} w(y). Then,

rcut(C, w[X], s)
= (1/2) Σᵢ₌₁ᵏ [Σ_{x∈Cᵢ, y∈X\Cᵢ} s(x, y) w(x) w(y)] / [Σ_{x∈Cᵢ} w(x)]
= (1/2) Σᵢ₌₁ᵏ [Σ_{x∈Cᵢ, y∈X\Cᵢ} λ w(x) w(y)] / [Σ_{x∈Cᵢ} w(x)]
= (λ/2) Σᵢ₌₁ᵏ [Σ_{y∈X\Cᵢ} w(y) · Σ_{x∈Cᵢ} w(x)] / [Σ_{x∈Cᵢ} w(x)]
= (λ/2) Σᵢ₌₁ᵏ Σ_{y∈X\Cᵢ} w(y)
= (λ/2) Σᵢ₌₁ᵏ [w(X) − w(Cᵢ)]
= (λ/2) [k·w(X) − Σᵢ₌₁ᵏ w(Cᵢ)] = (λ/2)(k − 1)·w(X).

Consider any other clustering C′ = {C′1, ..., C′k} ≠ C. Since C is both perfect and separation-uniform, all between-cluster similarities in C equal λ, and all within-cluster similarities are greater than λ. From here it follows that all pairwise similarities in the data are at least λ. Since C′ is a k-clustering different from C, it must differ from C on at least one between-cluster edge, so that edge must be greater than λ. So the cost of C′ is

rcut(C′, w[X], s)
= (1/2) Σᵢ₌₁ᵏ [Σ_{x∈C′ᵢ, y∈X\C′ᵢ} s(x, y) w(x) w(y)] / [Σ_{x∈C′ᵢ} w(x)]
> (1/2) Σᵢ₌₁ᵏ [Σ_{x∈C′ᵢ, y∈X\C′ᵢ} λ w(x) w(y)] / [Σ_{x∈C′ᵢ} w(x)]
= (λ/2)(k − 1)·w(X) = rcut(C, w[X], s).

So the clustering C′ has a higher cost than C.

We can now characterise the precise conditions under which ratio-cut responds to weights. Ratio-cut responds to weights on all data sets but those where cluster separation is both very large and highly uniform. Formally,

Theorem 1. Given a clustering C of (X, s) where every cluster has more than one point, ratio-cut is weight-responsive on C if and only if either C is not perfect, or C is not separation-uniform.

K-Means

Many popular partitional clustering paradigms, including k-means, k-median, and min-sum, are weight sensitive. Moreover, these algorithms satisfy a stronger condition. By modifying weights, we can make these algorithms separate any set of points. We call such algorithms weight-separable.

Definition 5 (Weight Separable). A partitional clustering algorithm A is weight-separable if for any data set (X, d) and any S ⊂ X, where 2 ≤ |S| ≤ k, there exists a weight function w so that x ≁_{A(w[X],d,k)} y for all disjoint pairs x, y ∈ S.

Note that every weight-separable algorithm is also weight-sensitive.

Lemma 4. If a clustering algorithm A is weight-separable, then A is weight-sensitive.

Proof. Given any (w[X], d), let C = A(w[X], d, k). Select points x and y where x ∼C y. Since A is weight-separable, there exists w′ so that x ≁_{A(w′[X],d,k)} y, and so A(w′[X], d, k) ≠ C.

K-means is perhaps the most popular clustering objective function, with cost

k-means(C, w[X], d) = Σ_{Cᵢ∈C} Σ_{x∈Cᵢ} d(x, cnt(Cᵢ))² · w(x),

where cnt(Cᵢ) denotes the weighted center of mass of cluster Cᵢ. The k-means optimizing function finds a clustering with minimal k-means cost. We show that k-means is weight-separable, and thus also weight-sensitive.

Theorem 2. The k-means optimizing function is weight-separable.

Proof. Consider any S ⊆ X. Let w be a weight function over X where w(x) = W if x ∈ S, for large W, and w(x) = 1 otherwise. As shown by (Ostrovsky et al. 2006), the k-means objective function is equivalent to

Σ_{Cᵢ∈C} [Σ_{x,y∈Cᵢ} d(x, y)² · w(x) · w(y)] / [Σ_{x∈Cᵢ} w(x)].

Let m1 = min_{x≠y∈X} d(x, y)² > 0, m2 = max_{x,y∈X} d(x, y)², and n = |X|. Consider any k-clustering C where all the elements in S belong to distinct clusters. Then k-means(C, w[X], d) < k·m2·(n² + nW)/W. On the other hand, given any k-clustering C′ where at least two elements of S appear in the same cluster, k-means(C′, w[X], d) ≥ W²·m1/(W + n). Since lim_{W→∞} k-means(C′, w[X], d) / k-means(C, w[X], d) = ∞, k-means separates all the elements in S for large enough W.

It can also be shown that the well-known min-sum objective function is weight-separable.

Theorem 3. Min-sum, which minimises the objective function Σ_{Cᵢ∈C} Σ_{x,y∈Cᵢ} d(x, y) · w(x) · w(y), is weight-separable.

Proof. The proof is similar to that of the previous theorem.

Several other objective functions similar to k-means, namely k-median and k-medoids, are also weight-separable. The details appear in the appendix (Anonymous 2012).

Hierarchical Algorithms

Similarly to partitional methods, hierarchical algorithms also exhibit a wide range of responses to weights. We show that Ward's method, a successful linkage-based algorithm, as well as popular divisive hierarchical methods, are weight sensitive. On the other hand, it is easy to see that the linkage-based algorithms single-linkage and complete-linkage are both weight robust, as was observed in (Fisher and Ness 1971).

Average-linkage, another popular linkage-based method, exhibits more nuanced behaviour on weighted data. When a clustering satisfies a reasonable notion of clusterability, then average-linkage detects that clustering irrespective of weights. On the other hand, this algorithm responds to weights on all other clusterings. We note that the notion of clusterability required for average-linkage is a lot weaker than the notion of clusterability used to characterise the behaviour of ratio-cut on weighted data.

Hierarchical algorithms output dendrograms, which contain multiple clusterings. Please see the preliminaries section for definitions relating to the hierarchical setting. Weight-responsiveness for hierarchical algorithms is defined analogously to Definition 1.

Definition 6 (Weight Responsive). A clustering algorithm A is weight-responsive on a clustering C of (X, d) if (1) there exists a weight function w so that A(w[X], d) outputs C, and (2) there exists a weight function w′ so that A(w′[X], d) does not output C.


Weight-sensitive, weight-considering, and weight-robust are defined as in the preliminaries section, with the above definition of weight-responsive.

Average Linkage

Linkage-based algorithms start by placing each element in its own cluster, and proceed by repeatedly merging the "closest" pair of clusters until the entire dendrogram is constructed. To identify the closest clusters, these algorithms use a linkage function that maps pairs of clusters to a real number. Formally, a linkage function is a function ℓ : {(X1, X2, d, w) | d, w over X1 ∪ X2} → R⁺.

Average-linkage is one of the most popular linkage-based algorithms (commonly applied in bioinformatics under the name UPGMA). Recall that w(X) = Σ_{x∈X} w(x). The average-linkage linkage function is

ℓ_AL(X1, X2, d, w) = [Σ_{x∈X1, y∈X2} d(x, y) · w(x) · w(y)] / [w(X1) · w(X2)].

To study how average-linkage responds to weights, we give a relaxation of the notion of a perfect clustering.

Definition 7 (Nice). A clustering C of (w[X], d) is nice if for all x1, x2, x3 ∈ X where x1 ∼C x2 and x1 ≁C x3, d(x1, x2) < d(x1, x3).

Data sets with nice clusterings correspond to those that satisfy the "strict separation" property introduced by Balcan et al. (Balcan, Blum, and Vempala 2008). As for a perfect clustering, being a nice clustering is independent of weights.

We present a complete characterisation of the way that average-linkage (AL) responds to weights, showing that it ignores weights on nice clusterings, but responds to weights on all other clusterings.

Theorem 4. For any data set (X, d) and clustering C ∈ range(AL(X, d)), average-linkage is weight-robust on the clustering C if and only if C is a nice clustering.

Theorem 4 follows from the two lemmas below.

Lemma 5. If a clustering C = {C1, ..., Ck} of (X, d) is not nice, then either C ∉ range(AL(X, d)) or average-linkage is weight-responsive on C.

Proof. Assume that there exists some w so that C ∈ AL(w[X], d). If no such w exists then we are done. We construct w′ so that C ∉ AL(w′[X], d).

Since C is not nice, there exist 1 ≤ i, j ≤ k with i ≠ j, x1, x2 ∈ Cᵢ with x1 ≠ x2, and x3 ∈ Cⱼ, so that d(x1, x2) > d(x1, x3).

Now, define the weight function w′ as follows: w′(x) = 1 for all x ∈ X \ {x1, x2, x3}, and w′(x1) = w′(x2) = w′(x3) = W, for some large value W. We argue that when W is sufficiently large, C is not a clustering in AL(w′[X], d).

By way of contradiction, assume that C is a clustering in AL(w′[X], d) for every setting of W. Then there is a step in the algorithm where clusters X1 and X2 merge, where X1, X2 ⊂ Cᵢ, x1 ∈ X1, and x2 ∈ X2. At this point, there is some cluster X3 ⊆ Cⱼ so that x3 ∈ X3.

We compare ℓ_AL(X1, X2, d, w′) and ℓ_AL(X1, X3, d, w′). We have ℓ_AL(X1, X2, d, w′) = [W²·d(x1, x2) + α1·W + α2] / [W² + α3·W + α4] for some non-negative reals αᵢ. Similarly, ℓ_AL(X1, X3, d, w′) = [W²·d(x1, x3) + β1·W + β2] / [W² + β3·W + β4] for some non-negative reals βᵢ. Dividing the numerators and denominators by W², we see that ℓ_AL(X1, X2, d, w′) → d(x1, x2) and ℓ_AL(X1, X3, d, w′) → d(x1, x3) as W → ∞, and so the result holds since d(x1, x3) < d(x1, x2). Therefore average-linkage merges X1 with X3, so the cluster Cᵢ is never formed, and so C is not a clustering in AL(w′[X], d).

Finally, average-linkage outputs all nice clusterings present in a data set, regardless of weights.

Lemma 6. Given any weighted data set (w[X], d), if C is a nice clustering of (X, d), then C is in the dendrogram produced by average-linkage on (w[X], d).

Proof. Consider a nice clustering C = {C1, ..., Ck} over (w[X], d). It suffices to show that for any 1 ≤ i < j ≤ k, X1, X2 ⊆ Cᵢ where X1 ∩ X2 = ∅, and X3 ⊆ Cⱼ,

ℓ_AL(X1, X2, d, w) < ℓ_AL(X1, X3, d, w).

It can be shown that

ℓ_AL(X1, X2, d, w) ≤ [Σ_{x1∈X1} w(x1) · max_{x2∈X2} d(x1, x2)] / w(X1), and
ℓ_AL(X1, X3, d, w) ≥ [Σ_{x1∈X1} w(x1) · min_{x3∈X3} d(x1, x3)] / w(X1).

Since C is nice, min_{x3∈X3} d(x1, x3) > max_{x2∈X2} d(x1, x2) for every x1 ∈ X1, and thus ℓ_AL(X1, X3, d, w) > ℓ_AL(X1, X2, d, w).

Ward's Method

Ward's method is a highly effective clustering algorithm (Everitt 1993) which, at every step, merges the clusters that will yield the minimal increase to the k-means cost. Let ctr(X, d, w) be the center of mass of the data set (w[X], d). Then the linkage function for Ward's method is

ℓ_Ward(X1, X2, d, w) = [w(X1) · w(X2) · d(ctr(X1, d, w), ctr(X2, d, w))²] / [w(X1) + w(X2)].

Theorem 5. Ward's method is weight-sensitive.

The proof is included in the appendix (Anonymous 2012).

Divisive Algorithms

The class of divisive clustering algorithms is a well-known family of hierarchical algorithms, which construct the dendrogram using a top-down approach. This family of algorithms includes the popular bisecting k-means algorithm. We show that a class of algorithms that includes bisecting k-means consists of weight-sensitive methods.

Given a node x in a dendrogram (T, M), let C(x) denote the cluster represented by the node x. Formally, C(x) = {M(y) | y is a leaf and a descendant of x}. Informally, a P-Divisive algorithm is a hierarchical clustering algorithm that uses a partitional clustering algorithm P to recursively divide the data set into two clusters until only single elements remain. Formally,

Definition 8 (P-Divisive). A hierarchical clustering algorithm A is P-Divisive with respect to a partitional clustering algorithm P if, for all (X, d), we have A(w[X], d) = (T, M) such that for all non-leaf nodes x in T with children x1 and x2, P(w[C(x)], d, 2) = {C(x1), C(x2)}.


                     Partitional                Hierarchical
Weight Sensitive     k-means, k-medoids,        Ward's method,
                     k-median, min-sum          bisecting k-means
Weight Considering   Ratio-cut                  Average-linkage
Weight Robust        Min-diameter, k-center     Single-linkage, complete-linkage

Table 1: Classification of weighted clustering algorithms.

We obtain bisecting k-means by setting P to k-means. Other natural choices for P include min-sum, and exemplar-based algorithms such as k-median. As shown above, many of these partitional algorithms are weight-separable. We show that whenever P is weight-separable, the P-Divisive algorithm is weight-sensitive. The proof of the next theorem appears in the appendix (Anonymous 2012).

Theorem 6. If P is weight-separable then the P-Divisive algorithm is weight-sensitive.

Conclusions

We study the behaviour of clustering algorithms on weighted data, presenting three fundamental categories that describe how such algorithms respond to weights, and classifying several well-known algorithms according to these categories. Our results are summarized in Table 1. We note that all of our results immediately translate to the standard setting, by mapping each point with integer weight to the same number of unweighted duplicates.

Our results can be used to aid in the selection of a clustering algorithm. For example, in the facility allocation application discussed in the introduction, where weights are of primal importance, a weight-sensitive algorithm is suitable. Other applications may call for weight-considering algorithms. This can occur when weights (i.e. numbers of duplicates) should not be ignored, yet it is still desirable to identify rare instances that constitute small but well-formed outlier clusters. For example, this applies to patient data on potential causes of a disease, where it is crucial to investigate rare instances. While we do not argue that these considerations are always sufficient, they can provide valuable guidelines when clustering data that is weighted or contains element duplicates.

Our analysis also reveals the following interesting phenomenon: algorithms that are known to perform well in practice (in the classical, unweighted setting) tend to be more responsive to weights. For example, k-means is highly responsive to weights, while single-linkage, which often performs poorly in practice (Hartigan 1981), is weight-robust. We also study several k-means heuristics, specifically the Lloyd algorithm with several methods of initialization, and the PAM algorithm. These results were omitted due to lack of space, but they are included in the appendix (Anonymous 2012). Our analysis of these heuristics lends further support to the hypothesis that the more commonly applied algorithms are also more responsive to weights.

Acknowledgements

This work was supported in part by the Sino-Danish Center for the Theory of Interactive Computation, funded by the Danish National Research Foundation and the National Science Foundation of China (under grant 61061130540). The authors acknowledge support from the Center for Research in the Foundations of Electronic Markets (CFEM), supported by the Danish Strategic Research Council. This work was also supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Alexander Graham Bell Canada Graduate Scholarship.

References

Ackerman, M., and Ben-David, S. 2011. Discerning linkage-based algorithms among hierarchical clustering methods. In IJCAI.
Ackerman, M.; Ben-David, S.; and Loker, D. 2010a. Characterization of linkage-based clustering. In COLT.
Ackerman, M.; Ben-David, S.; and Loker, D. 2010b. Towards property-based classification of clustering paradigms. In NIPS.
Agarwal, P. K., and Procopiuc, C. M. 1998. Exact and approximation algorithms for clustering. In SODA.
Anonymous. 2012. Weighted Clustering Appendix. http://wikisend.com/download/573754/clustering appendix.pdf.
Arthur, D., and Vassilvitskii, S. 2007. K-means++: The advantages of careful seeding. In SODA.
Balcan, M. F.; Blum, A.; and Vempala, S. 2008. A discriminative framework for clustering via similarity functions. In STOC.
Bosagh-Zadeh, R., and Ben-David, S. 2009. A uniqueness theorem for clustering. In UAI.
Dasgupta, S., and Long, P. M. 2005. Performance guarantees for hierarchical clustering. J. Comput. Syst. Sci. 70(4):555–569.
Everitt, B. S. 1993. Cluster Analysis. John Wiley & Sons Inc.
Fisher, L., and Ness, J. V. 1971. Admissible clustering procedures. Biometrika 58:91–104.
Hartigan, J. 1981. Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc. 76(374):388–394.
Jain, A. K.; Murty, M. N.; and Flynn, P. J. 1999. Data clustering: a review. ACM Comput. Surv. 31(3):264–323.
Kaufman, L., and Rousseeuw, P. J. 2008. Partitioning Around Medoids (Program PAM). John Wiley & Sons, Inc. 68–125.
Ostrovsky, R.; Rabani, Y.; Schulman, L. J.; and Swamy, C. 2006. The effectiveness of Lloyd-type methods for the k-means problem. In FOCS.
Talagrand, M. 1996. A new look at independence. Ann. Probab. 24(1):1–34.
Vapnik, V. 1998. Statistical Learning Theory. New York: Wiley.
Von Luxburg, U. 2007. A tutorial on spectral clustering. J. Stat. Comput. 17(4):395–416.
Wright, W. E. 1973. A formalization of cluster analysis. J. Pattern Recogn. 5(3):273–282.

863
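The contrast between weight-responsive and weight-robust methods discussed in the conclusion can be illustrated with a minimal 1-D sketch (the data, function names, and cluster counts here are illustrative toys, not from the paper): adding duplicates of a point pulls the weighted k-means centroid toward it, while a single-linkage partition, which depends only on the distances between distinct points, is unchanged.

```python
def weighted_kmeans_1d(points, weights, centers, iters=20):
    """Lloyd's algorithm on weighted 1-D data: duplicating a point
    (raising its weight) shifts the weighted-mean centroids."""
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(len(centers)), key=lambda j: abs(p - centers[j]))
                  for p in points]
        # Update step: each center becomes the weighted mean of its cluster.
        new = []
        for j in range(len(centers)):
            members = [(p, w) for p, w, a in zip(points, weights, assign) if a == j]
            if members:
                total_w = sum(w for _, w in members)
                new.append(sum(p * w for p, w in members) / total_w)
            else:
                new.append(centers[j])  # keep an empty cluster's center
        centers = new
    return centers

def single_linkage_2(points):
    """Two-cluster single-linkage partition of 1-D data: split at the
    largest gap between consecutive distinct values. Duplicates merge
    at distance zero first, so they collapse without affecting the cut."""
    xs = sorted(set(points))
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    cut = gaps.index(max(gaps))
    return xs[:cut + 1], xs[cut + 1:]

data = [0.0, 1.0, 10.0]
print(weighted_kmeans_1d(data, [1, 1, 1], [0.5, 10.0]))    # -> [0.5, 10.0]
print(weighted_kmeans_1d(data, [1, 100, 1], [0.5, 10.0]))  # first center pulled toward 1.0 (~0.99)
print(single_linkage_2(data + [1.0] * 99))                 # same partition as for unweighted data
```

Heavily weighting the point 1.0 moves the k-means center from 0.5 to roughly 0.99, whereas single linkage returns the partition ([0.0, 1.0], [10.0]) regardless of duplicates, matching the weight-sensitive/weight-robust distinction drawn in the text.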
