Académique Documents
Professionnel Documents
Culture Documents
Cluster Analysis
It is a class of techniques used to
classify cases into groups that are
relatively
homogeneous
within
themselves
and
heterogeneous
between each other, on the basis of
a defined set of variables. These
groups are called clusters.
Variable 1
Variable 2
Variable 1
A Practical Clustering
Situation
Variable 2
Conducting Cluster
Analysis
Formulate the Problem
Select a Distance Measure
Select a Clustering Procedure
Decide on the Number of Clusters
Interpret and Profile Clusters
Assess the Validity of Clustering
Dij
x
k 1
ki
xkj
Hierarchical
Agglomerative
Linkage Variance
Methods Methods
Divisive
Centroid
Methods
Wards
Method
Single
Linkage
Complete
Linkage
Nonhierarchical
Average
Linkage
Sequential
Threshold
Other
Two-Step
Parallel
Threshold
Optimizing
Partitioning
Clustering procedures
Hierarchical procedures
Agglomerative (start from n clusters,
to get to 1 cluster)
Divisive (start from 1 cluster, to get to
n cluster)
Agglomerative clustering
Agglomerative clustering
Linkage methods
Wards method
1.
2.
Centroid method
.
Linkage Methods of
Clustering
Single Linkage
Minimum
Distance
Cluster 2
Cluster 1
Complete Linkage
Maximum
Distance
Cluster 1
Cluster 1
Average Linkage
Average
Distance
Cluster 2
Cluster 2
Wards Procedure
Centroid Method
Non
hierarchical
clustering
Faster, more reliable
Need to specify the
number
of
clusters
(arbitrary)
Need to set the initial
seeds (arbitrary)
Table 20.1
6
2
7
4
1
6
5
7
2
3
1
5
2
4
6
3
4
3
4
2
4
3
2
6
3
4
3
3
4
5
3
4
2
6
5
5
4
7
6
3
7
1
6
4
2
6
6
7
3
3
2
5
1
4
4
4
7
2
3
2
3
4
4
5
2
3
3
4
3
6
3
4
5
6
2
6
2
6
7
4
2
5
1
3
6
3
3
1
6
4
5
2
4
4
1
4
2
4
2
7
3
4
3
6
4
4
4
4
3
6
3
4
4
7
4
7
5
3
7
2
Cluster 1 Cluster 2
16
1.000000
7
2.000000
13
3.500000
11
5.000000
8
6.500000
14
8.160000
12
10.166667
20
13.000000
10
15.583000
6
18.500000
9
23.000000
19
27.750000
17
33.100000
15
41.333000
5
51.833000
3
64.500000
18
79.667000
4 172.662000
2 328.600000
Coefficient
0
0
6
0
0
7
0
0
15
0
0
11
0
0
16
0
1
9
2
0
10
0
0
11
0
6
12
6
7
13
4
8
15
9
0
17
10
0
14
13
0
16
3 11
18
14
5
19
12
0
18
15 17
19
16 18
0
Results of Nonhierarchical
Clustering
Table 20.4
Initial Cluster Centers
1
V1
V2
V3
V4
V5
V6
4
6
3
7
2
7
Cluster
2
2
3
2
4
7
2
3
7
2
6
4
1
3
Iteration
1
2
Iteration History
Change in Cluster Centers
1
2
3
2.154
2.102
2.550
0.000
0.000
0.000
Results of Nonhierarchical
ClusteringCluster Membership
Table 20.4 cont.
Case Number
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Cluster
3
2
3
1
2
3
3
3
2
1
2
3
2
1
3
1
3
1
1
2
Distance
1.414
1.323
2.550
1.404
1.848
1.225
1.500
2.121
1.756
1.143
1.041
1.581
2.598
1.404
2.828
1.624
2.598
3.555
2.154
2.102
V1
V2
V3
V4
V5
V6
4
6
3
6
4
6
2
3
2
4
6
3
3
6
4
6
3
2
4
5.568
5.568
5.698
6.928
5.698
6.928
Results of Nonhierarchical
Clustering
Table 20.4, cont.
ANOVA
V1
V2
V3
V4
V5
V6
Cluster
Error
Mean Square
df
Mean Square
df
F
29.108
2
0.608
17
47.888
13.546
2
0.630
17
21.505
31.392
2
0.833
17
37.670
15.713
2
0.728
17
21.585
22.537
2
0.816
17
27.614
12.171
2
1.071
17
11.363
The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this, and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.
Valid
Missing
1
2
3
6.000
6.000
8.000
20.000
0.000
Sig.
0.000
0.000
0.000
0.000
0.000
0.001
Cluster Distribution
Table 20.5, cont.
Cluster
N
6
% of
Combined
30.0%
% of Total
30.0%
30.0%
30.0%
40.0%
40.0%
20
100.0%
100.0%
Combined
Total
20
100.0%
Interpretation
Cluster 3: Fun loving and concerned shoppers
V1: Shopping is fun
V3: I combine shopping with eating out.
Cluster 2: Apathetic shoppers
V5: I dont care about shopping
Cluster 1: Economical shoppers
V2: Shopping is bad for your budget
V4: I try to get the best buys when shopping
V6: You can save a lot of money by comparing
prices.