Académique Documents
Professionnel Documents
Culture Documents
2
¡ 1)
For
each
point,
place
it
in
the
cluster
whose
current
centroid
it
is
nearest
¡ 2)
AAer
all
points
are
assigned,
update
the
loca<ons
of
centroids
of
the
k
clusters
¡ 3)
Reassign
all
points
to
their
closest
centroid
§ Some<mes
moves
points
between
clusters
x x x x x x
x … data point
… centroid
Round 1
4
x
x
x
x
x
x x x x x x
x … data point
… centroid
Round 2
5
x
x
x
x
x
x x x x x x
x … data point
… centroid
Round 3
6
How
to
select
k?
¡ Try
different
k,
looking
at
the
change
in
the
average
distance
to
centroid,
as
k
increases.
7
Too few; x
many long x
xx x
distances
x x
to centroid. x x x x x
x x x x x
x xx x xx x
x x x x
x x
x x x
x x x x
x x x
x
8
x
Just right; x
distances xx x
rather short. x x
x x x x x
x x x x x
x xx x xx x
x x x x
x x
x x x
x x x x
x x x
x
9
Too many; x
little improvement x
in average xx x
distance. x x
x x x x x
x x x x x
x xx x xx x
x x x x
x x
x x x
x x x x
x x x
x
10
Average
falls
rapidly
un<l
right
k,
then
falls
much
more
slowly
Best value
of k
Average
distance to
centroid k
11
¡ Approach
1:
Sampling
§ Cluster
a
sample
of
the
data
using
hierarchical
clustering,
to
obtain
k
clusters
§ Pick
a
point
from
each
cluster
(e.g.
point
closest
to
centroid)
§ Sample
fits
in
main
memory
¡ Can we cluster in a single pass over the data?
13