I need to identify groups of target consumers who are similar in buying habits, demographic characteristics, or psychographics. Can the districts of Sri Lanka be grouped based on demographics, socio-cultural parameters, agricultural operations, and the extent of infrastructure development?
Cluster Analysis
In simple terms, Cluster Analysis does to objects or entities what Factor Analysis does to variables.
Cluster Analysis groups objects based on a set of variables. The resulting groups should be relatively homogeneous within each cluster and heterogeneous across clusters.
K-Means
The number of clusters is specified subjectively by the researcher.
Cluster Analysis
Requires variables to be metric (interval or ratio).
Large samples are preferred.
The worth of the solution often depends on the internal consistency of the clusters and on their validity.
A statistical method for grouping a set of data objects into clusters.
A good clustering method produces high-quality clusters with high intraclass similarity and low interclass similarity.
Clustering is unsupervised classification.
It can be used as a stand-alone tool or as a preprocessing step for other algorithms.
Example: the green and red data points were generated from two different normal distributions.
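The "high intraclass similarity, low interclass similarity" criterion can be made concrete by comparing average distances. The sketch below is one simple way to do this, assuming Euclidean distance as the dissimilarity measure; the function name `intra_inter_distances` and the toy points are hypothetical, not taken from the slides. A good grouping should give a small mean within-cluster distance and a large mean between-cluster distance.

```python
import math
from itertools import combinations


def mean(values):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")


def intra_inter_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Each unordered pair of points counts as 'within' if both points carry the
    same cluster label and as 'between' otherwise. Euclidean distance is an
    assumption; another dissimilarity measure could be substituted.
    """
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        (within if lp == lq else between).append(d)
    return mean(within), mean(between)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good_labels = [0, 0, 0, 1, 1, 1]  # keeps each group together
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_within, good_between = intra_inter_distances(points, good_labels)
bad_within, bad_between = intra_inter_distances(points, bad_labels)
print(f"good: within={good_within:.2f}, between={good_between:.2f}")
print(f"bad:  within={bad_within:.2f}, between={bad_between:.2f}")
```

The grouping that keeps each group together has the smaller within-cluster and the larger between-cluster average, which is exactly what the criterion asks for.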
K-Means Clustering
The meaning of K-means
Why is it called K-means clustering? K points are used to represent the clustering result; each point corresponds to the centre (mean) of a cluster.
Each point is assigned to the cluster with the closest centre point.
The number K must be specified.
Basic algorithm
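As a minimal sketch, assuming squared Euclidean distance (the slides only say "closest centre"), the basic algorithm alternates two steps until no point changes cluster. The notation is introduced here for illustration: x_i is a data point, \mu_k is the centre of cluster C_k, and c(i) is the cluster to which x_i is currently assigned.

```latex
% Assignment step: each point x_i joins the cluster whose centre is closest
c(i) = \arg\min_{k \in \{1,\dots,K\}} \lVert x_i - \mu_k \rVert^2

% Update step: each centre becomes the mean of the points assigned to it
\mu_k = \frac{1}{|C_k|} \sum_{i \,:\, c(i) = k} x_i

% Neither step increases the within-cluster sum of squares
J = \sum_{k=1}^{K} \sum_{i \,:\, c(i) = k} \lVert x_i - \mu_k \rVert^2
```

For example, if a cluster contains only the points (2, 4) and (4, 6), the update step moves its centre to ((2 + 4)/2, (4 + 6)/2) = (3, 5).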
[Figure: successive K-means iterations on a two-dimensional scatter plot (both axes 0 to 10); between panels, points are reassigned to the nearest cluster centre]
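The basic algorithm illustrated above can be sketched in plain Python (Lloyd's algorithm). This is a minimal sketch, not the slides' own code: the function names, the toy data, and the farthest-first choice of initial centres are assumptions made so that the example is deterministic. As the slides note, K must be supplied by the user.

```python
import math


def farthest_first_centres(points, k):
    """Pick k starting centres: the first point, then repeatedly the point
    farthest from all centres chosen so far (an assumed, deterministic rule)."""
    centres = [tuple(points[0])]
    while len(centres) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(tuple(farthest))
    return centres


def kmeans(points, k, max_iter=100):
    """Basic K-means (Lloyd's algorithm). Returns (centres, labels)."""
    centres = farthest_first_centres(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the closest centre.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:
            break  # no point was reassigned, so the clustering has converged
        labels = new_labels
        # Update step: move each centre to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centres, labels


# Hypothetical toy data: two groups on a 0-10 grid like the figure's axes.
data = [(1, 2), (2, 1), (2, 3), (3, 2), (7, 8), (8, 7), (8, 9), (9, 8)]
centres, labels = kmeans(data, k=2)
print("centres:", centres)
print("labels:", labels)
```

Because K-means only converges to a local optimum, the result can depend on the initial centres; in practice the algorithm is often run several times from different starting points, and the solution with the smallest within-cluster sum of squares is kept.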
Hierarchical Clustering
Start with every data point in a separate cluster.
Keep merging the most similar pair of data points/clusters until only one big cluster is left.
This is called a bottom-up, or agglomerative, method.
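The bottom-up procedure above can be sketched as follows. The slides do not say how the similarity of two clusters is measured; this sketch assumes Euclidean distance and single linkage (the distance between the closest members of the two clusters). Complete, average, and Ward linkage are common alternatives. The function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def single_linkage(c1, c2):
    """Distance between two clusters = distance between their closest members."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points, linkage=single_linkage):
    """Start with singleton clusters and merge the closest pair until one is left.

    Returns the merges as (cluster_a, cluster_b, distance) in the order performed.
    """
    clusters = [[p] for p in points]  # every point starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the most similar (closest) pair of clusters; combinations gives i < j.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # delete the higher index first so index i stays valid
        clusters[i] = merged
    return merges


# Hypothetical toy data: two tight pairs and one outlying point.
data = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
merges = agglomerative(data)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.3f}")
```

The recorded merge distances are what a dendrogram plots; cutting the tree at a chosen height yields a partition into a given number of clusters. This brute-force version recomputes every pairwise cluster distance at each step, so it is only suitable for small samples.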
Applications
Market segmentation is usually conducted using some form of cluster analysis to divide people into segments
Other methods such as latent class models or archetypal analysis are sometimes used instead
It is also possible to cluster other items, such as products/SKUs, image attributes, and brands.