
Clustering!

adapted from:
Doug Downey and Bryan Pardo, Northwestern University

Bagging
Use bootstrapping to generate L training sets
and train one base-learner on each
(Breiman, 1996)
Combine the base-learners' predictions by voting
Unstable algorithms (e.g., decision trees) benefit most from bagging
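
A minimal bagging sketch in Python, assuming scikit-learn is available; the decision-tree base learner, L = 10, and integer class labels are illustrative assumptions, not part of the slides:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, L=10, rng=np.random.default_rng(0)):
    """Train L base-learners, each on a bootstrap sample of (X, y)."""
    learners = []
    n = len(X)
    for _ in range(L):
        idx = rng.integers(0, n, size=n)  # sample n points with replacement
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """Combine base-learners by majority vote (assumes integer labels)."""
    votes = np.stack([m.predict(X) for m in learners])  # shape (L, n)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)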

Boosting

Given a large training set, randomly divide it
into 3 sets (X1, X2, and X3)
Use X1 to train D1
Test D1 with X2
Training set for D2 = all instances from X2
misclassified by D1 (plus an equal number of
instances from X2 that D1 classifies correctly)
Test D1 and D2 with X3
Training set for D3 = the instances from X3 on
which D1 and D2 disagree
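
A sketch of this three-learner scheme in Python, assuming scikit-learn; the shallow-tree base learner is an illustrative choice:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_three(X, y, rng=np.random.default_rng(0)):
    """Train D1, D2, D3 as described above.
    Assumes the misclassified and disagreement sets are non-empty."""
    idx = rng.permutation(len(X))
    X1, X2, X3 = np.array_split(X[idx], 3)
    y1, y2, y3 = np.array_split(y[idx], 3)

    D1 = DecisionTreeClassifier(max_depth=2).fit(X1, y1)

    # D2: X2 instances D1 misclassifies, plus as many that it gets right
    wrong = D1.predict(X2) != y2
    keep = np.concatenate([np.flatnonzero(wrong),
                           np.flatnonzero(~wrong)[:wrong.sum()]])
    D2 = DecisionTreeClassifier(max_depth=2).fit(X2[keep], y2[keep])

    # D3: X3 instances on which D1 and D2 disagree
    disagree = D1.predict(X3) != D2.predict(X3)
    D3 = DecisionTreeClassifier(max_depth=2).fit(X3[disagree], y3[disagree])
    return D1, D2, D3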

AdaBoost
Generate a sequence of base-learners, each
focusing on the previous ones' errors
(Freund and Schapire, 1996)
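
A compact discrete-AdaBoost sketch in Python (one standard formulation, not necessarily the slide author's); it assumes labels in {-1, +1} and uses decision stumps:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=20):
    """Discrete AdaBoost; y must be in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                 # instance weights
    learners, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)      # upweight the mistakes
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, learners)))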

Mixture of Experts
Voting where the weights are input-dependent (gating):

    y = \sum_{j=1}^{L} w_j(x) \, d_j(x)

(Jacobs et al., 1991)
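
A minimal numpy sketch of the gating idea, assuming experts and a gating network that are already trained (all names here are illustrative):

import numpy as np

def mixture_predict(x, experts, gate):
    """y(x) = sum_j w_j(x) d_j(x), with softmax gating weights."""
    d = np.array([e(x) for e in experts])  # expert outputs d_j(x)
    scores = gate(x)                       # one gating score per expert
    w = np.exp(scores - scores.max())
    w /= w.sum()                           # softmax: input-dependent weights
    return w @ d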

Stacking

Combiner f() is another learner
(Wolpert, 1992)

Cascading
Use d_j only if the preceding ones are not confident
Cascade learners in order of complexity

Clustering

Grouping data into (hopefully useful) sets.


[Figure: scatter plot of two example clusters, labeled
"Things on the left" and "Things on the right"]

Clustering

Unsupervised Learning

No labels

Why do clustering?

Labeling is costly
Data pre-processing
    Text Classification (e.g., search engines, Google Sets)
Hypothesis Generation / Data Understanding
    Clusters might suggest natural groups
Visualization

Some definitions

Let X be the dataset:

    X = \{ x_1, x_2, x_3, \ldots, x_n \}

An m-clustering of X is a partition of X into m
sets (clusters) C_1, ..., C_m such that:

1. Clusters are non-empty:   C_i \neq \emptyset, \; 1 \le i \le m
2. Clusters cover all of X:  \bigcup_{i=1}^{m} C_i = X
3. Clusters do not overlap:  C_i \cap C_j = \emptyset \text{ if } i \neq j

How many possible clusterings?

The number of ways to partition n data points into m
non-empty clusters is a Stirling number of the second kind:

    S(n, m) = \frac{1}{m!} \sum_{i=0}^{m} (-1)^i \binom{m}{i} (m - i)^n

(n = size of dataset, m = number of clusters)

S(15, 3) = 2,375,101
S(20, 4) = 45,232,115,901
S(100, 5) \approx 10^{68}
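
These counts are quick to verify in Python with exact integer arithmetic:

from math import comb, factorial

def stirling2(n, m):
    """Stirling number of the second kind: ways to partition
    n items into m non-empty sets."""
    total = sum((-1) ** i * comb(m, i) * (m - i) ** n for i in range(m + 1))
    return total // factorial(m)   # the sum is always divisible by m!

print(stirling2(15, 3))            # 2375101
print(stirling2(20, 4))            # 45232115901
print(len(str(stirling2(100, 5)))) # 68 digits, i.e., S(100,5) ~ 10^68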

What does this mean?

We can't try all possible clusterings.
Clustering algorithms look at only a small fraction
of all partitions of the data.
The exact partitions tried depend on the kind
of clustering used.

Who is right?
Different techniques cluster the same data
set DIFFERENTLY.
Who is right? Is there a right clustering?

Classic Example: Half Moons

From Batra et al., http://www.cs.cmu.edu/~rahuls/pub/bmvc2008-clustering-rahuls.pdf

Steps in Clustering
Select Features
Define a Proximity Measure
Define Clustering Criterion
Define a Clustering Algorithm
Validate the Results
Interpret the Results

Kinds of Clustering

Sequential
    Fast

Cost Optimization
    Fixed number of clusters (typically)

Hierarchical
    Start with many clusters,
    join clusters at each step

A Sequential Clustering Method

Basic Sequential Algorithmic Scheme (BSAS)
S. Theodoridis and K. Koutroumbas, Pattern Recognition,
Academic Press, London, England, 1999

Assumption: the number of clusters is not known in advance.

m = 1
C_1 = {x_1}
For i = 2 to n
    Find C_k : d(x_i, C_k) = min_j d(x_i, C_j)
    If d(x_i, C_k) > Θ and m < q
        m = m + 1
        C_m = {x_i}
    Else
        C_k = C_k ∪ {x_i}
    End
End

d(x, C) = the distance between feature vector x and cluster C
Θ = the threshold of dissimilarity
q = the maximum number of clusters
n = the number of data points
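
A runnable sketch of BSAS in Python; measuring d(x, C) as the distance to the cluster mean is one common choice (the slide leaves it open):

import numpy as np

def bsas(X, theta, q):
    """Basic Sequential Algorithmic Scheme.
    X: (n, d) data array; theta: dissimilarity threshold;
    q: maximum number of clusters.
    Returns a list of clusters, each a list of row indices into X."""
    clusters = [[0]]                     # C_1 = {x_1}
    means = [X[0].astype(float)]         # d(x, C) = distance to cluster mean
    for i in range(1, len(X)):
        dists = [np.linalg.norm(X[i] - mu) for mu in means]
        k = int(np.argmin(dists))
        if dists[k] > theta and len(clusters) < q:
            clusters.append([i])         # start a new cluster
            means.append(X[i].astype(float))
        else:
            clusters[k].append(i)        # merge into the nearest cluster
            means[k] = X[clusters[k]].mean(axis=0)
    return clusters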

A Cost-Optimization Method

K-means clustering
J. B. MacQueen (1967): "Some Methods for Classification and
Analysis of Multivariate Observations," Proceedings of the 5th
Berkeley Symposium on Mathematical Statistics and Probability,
Berkeley, University of California Press, 1:281-297

A greedy algorithm
Partitions n examples into k clusters
Minimizes the sum of the squared distances
to the cluster centers

The K-means Algorithm

1. Place K points into the space represented by the
   objects being clustered. These points represent
   the initial group centroids (means).
2. Assign each object to the group with the closest
   centroid (mean).
3. When all objects have been assigned, recalculate the
   positions of the K centroids (means).
4. Repeat steps 2 and 3 until the centroids no longer
   move.
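
A compact numpy implementation of these four steps; initializing from K randomly chosen data points is one common choice (step 1 leaves the placement open):

import numpy as np

def kmeans(X, K, n_iter=100, rng=np.random.default_rng(0)):
    """Lloyd's K-means. X: (n, d) array. Returns (centroids, labels)."""
    # Step 1: pick K distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its points
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels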

K-means clustering

The way to initialize the means is not specified.
    Randomly choose k samples?
    Results depend on the initial means
    Try multiple starting points?
Assumes K is known.
    How do we choose it?

k-Means Clustering

Find k reference vectors (centroids) that
best represent the data
Reference vectors: m_j, j = 1, ..., k
Assign each x to the nearest (most similar) reference:

    x \Rightarrow m_i \text{ where } \| x - m_i \| = \min_j \| x - m_j \|

Encoding/Decoding

Reconstruction Error

    E(\{ m_i \}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \, \| x^t - m_i \|^2

    b_i^t = \begin{cases} 1 & \text{if } \| x^t - m_i \| = \min_j \| x^t - m_j \| \\ 0 & \text{otherwise} \end{cases}
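
The same error as a direct numpy transcription, usable with the kmeans output above:

import numpy as np

def reconstruction_error(X, centroids):
    """E({m_i} | X): each point contributes its squared distance
    to its nearest centroid (b_i^t selects the nearest one)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.sum(dists.min(axis=1) ** 2)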

k-means Clustering

Leader Cluster Algorithm

An instance far away from all centroids (dist >
threshold) becomes a new centroid
A cluster that covers a large number of instances
(num > threshold) is split into 2 clusters
A cluster that covers too few instances (num <
threshold) can be removed (with its instances
reassigned, perhaps to another random data point)

Choosing K

Defined by the application, e.g., image quantization
PCA
Incremental (leader-cluster) algorithm: add clusters
one at a time until the reconstruction-error curve
shows an elbow
Manual check for meaning
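
A sketch of the elbow check, reusing the kmeans and reconstruction_error helpers above and assuming X is the (n, d) data array:

# Plot (or print) reconstruction error vs. K and look for the
# "elbow" where adding another cluster stops paying off.
for K in range(1, 11):
    centroids, _ = kmeans(X, K)
    print(K, reconstruction_error(X, centroids))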

Supervised Learning After Clustering

Naïve Bayes Mood Classifier

Training Data

Human-Powered Compression
Label each of the following moods with one of the
following seven categories: happy, sad, angry, fearful,
disgusted, surprised, or none of the above.

Pleased
Jubilant
Recumbent
Ditzy
Weird
Geeky
Blank
Dirty
Thirsty

Guilty
Hot
Worried
Nervous
Hungry
Nostalgic
Artistic
Crushed
Giggly

LiveJournal Mood Hierarchy

angry (#2)

aggravated (#1)
annoyed (#3)
bitchy (#110)
cranky (#8)
cynical (#104)
enraged (#12)
frustrated (#47)
grumpy (#95)
infuriated (#19)
irate (#20)
irritated (#112)
moody (#23)
pissed off (#24)
stressed (#28)

rushed (#100)

awake (#87)
confused (#6)

determined (#45)

predatory (#118)

devious (#130)
energetic (#11)

curious (#56)

bouncy (#59)
hyper (#52)

enthralled (#13)
happy (#15)

amused (#44)
cheerful (#125)
chipper (#99)
ecstatic (#98)
excited (#41)

K-Means Clustering

Happy: Energetic, Bouncy, Happy, Hyper, Cheerful,
Ecstatic, Excited, Jubilant, Giddy, Giggly

Sad: Confused, Crappy, Crushed, Depressed, Distressed,
Envious, Gloomy, Guilty, Intimidated, Jealous, Lonely,
Rejected, Sad, Scared

Angry: Aggravated, Angry, Bitchy, Enraged, Infuriated,
Irate, Pissed off

K-Means Clustering

[Bar chart: number of posts per mood, with the moods along the
x-axis grouped by cluster and post counts (up to 18,000) on the y-axis]
