
Industrial Statistics

MS3001 Advanced Marketing Research


Faculty of Science University of Colombo

Application of Multivariate Statistical Methods in Marketing Research


Session 4 Cluster Analysis

December 25, 2013

Illustration
I need to identify groups of target consumers who are similar in buying habits, demographic characteristics, or psychographics. Can the districts of Sri Lanka be grouped based on demographics, socio-cultural parameters, agricultural operations, extent of infrastructure development, and so on?

Cluster Analysis


Cluster Analysis
In simple terms, Cluster Analysis does to objects or entities what Factor Analysis does to variables.
Cluster Analysis groups objects based on a set of variables. The resulting groups should be relatively homogeneous within and heterogeneous across.

A range of Clustering procedures:


Hierarchical
In the divisive (top-down) form, each cluster, starting with the whole dataset, is divided into two, then divided again, and so on

K-Means
The number of clusters is chosen in advance by the researcher.


Cluster Analysis
Requires variables to be metric (interval or ratio)
Large samples preferred
The worth of the solution often depends on the internal consistency of the clusters and their validity


What is Cluster Analysis?


Cluster: a collection of data objects
Similar to the objects in the same cluster (intraclass similarity)
Dissimilar to the objects in other clusters (interclass dissimilarity)

Cluster analysis
Statistical method for grouping a set of data objects into clusters
A good clustering method produces high quality clusters with high intraclass similarity and low interclass similarity

Clustering is unsupervised classification
Can be used as a stand-alone tool or as a preprocessing step for other algorithms
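One way to make intraclass similarity and interclass dissimilarity concrete is to compare average distances inside clusters with average distances across them. The sketch below is only an illustration: the data points, the helper names mean_within and mean_between, and the choice of Euclidean distance are all assumptions, not something the slides specify.

```python
import math
from itertools import combinations, product

# Hypothetical 2-D observations, already split into two clusters.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)]
cluster_b = [(8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def mean_within(cluster):
    """Average Euclidean distance between pairs of objects in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_between(first, second):
    """Average Euclidean distance between objects from different clusters."""
    pairs = list(product(first, second))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


within = (mean_within(cluster_a) + mean_within(cluster_b)) / 2
between = mean_between(cluster_a, cluster_b)
print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
```

A small within-cluster distance relative to the between-cluster distance corresponds to the "high intraclass similarity, low interclass similarity" property of a good clustering.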


Group objects according to their similarity


Cluster: a set of objects that are similar to each other and separated from the other objects.

Example: green and red data points generated from two different normal distributions


K-Means Clustering
The meaning of K-means
Why it is called K-means clustering: K points are used to represent the clustering result; each point corresponds to the centre (mean) of a cluster
Each object is assigned to the cluster with the closest centre point
The number K must be specified
The basic algorithm is described under The K-Means Clustering Method


The K-Means Clustering Method


Given k, the k-means algorithm is implemented in 4 steps:
1. Partition objects into k non-empty subsets
2. Arbitrarily choose k points as initial centers
3. Assign each object to the cluster with the nearest seed point (center)
4. Calculate the mean of each cluster and update the seed point; go back to Step 3, and stop when there are no new assignments

The basic step of k-means clustering is simple. Iterate until stable (= no object moves to another group):
Determine the centroid coordinates
Determine the distance of each object to the centroids
Group the objects based on minimum distance
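The iteration described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name kmeans, the sample points, the seed and max_iter parameters, and the use of Euclidean distance are all choices made here for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Arbitrarily choose k of the objects as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each object to the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Stable: no object moved to another group, so stop.
        if new_labels == labels:
            break
        labels = new_labels
        # Update each center to the mean of the objects assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical data: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(labels)
print(centers)
```

Which cluster receives label 0 depends on the random starting centers, so only the grouping itself is meaningful. On data that is less cleanly separated, different starting centers can lead to different final clusters.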


The K-Means Clustering Results


Example (figure: a sequence of scatter plots, both axes running from 0 to 10, showing successive k-means iterations)
K = 2: arbitrarily choose K objects as the initial cluster centers, assign each object to the most similar center, update the cluster means, reassign the objects, update the cluster means again, and reassign.


Weaknesses of the K-Means Method


Handles noisy data and outliers poorly
Very large or very small values can skew the cluster mean
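The effect of a single extreme value on a cluster mean is easy to see with a small, made-up example. The numbers below are hypothetical (think of them as monthly spend per customer in one cluster).

```python
from statistics import mean

# Hypothetical one-dimensional cluster of well-behaved values.
cluster = [10.0, 11.0, 12.0, 13.0, 14.0]
# The same cluster after one extreme value is assigned to it.
with_outlier = cluster + [100.0]

center_before = mean(cluster)
center_after = mean(with_outlier)
print(center_before)
print(round(center_after, 2))
```

The center moves from 12 to roughly 26.7, a value larger than every ordinary member of the cluster, so the mean no longer represents the group it is supposed to summarize.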


Hierarchical Clustering
Start with every data point in a separate cluster
Keep merging the most similar pairs of data points/clusters until we have one big cluster left
This is called a bottom-up or agglomerative method


Hierarchical Clustering (cont.)


This produces a binary tree or dendrogram
The final cluster is the root and each data item is a leaf
The height of the bars indicates how close the items are
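The bottom-up merging and the resulting merge heights can be sketched as follows. The slides do not say how the similarity of two clusters is measured, so this sketch assumes single linkage (the distance between the closest pair of points) with Euclidean distance; the function names single_link and agglomerate and the sample points are hypothetical.

```python
import math


def single_link(a, b):
    """Assumed linkage: distance between the closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Merge the closest clusters until one remains; return the merges."""
    clusters = [[p] for p in points]  # every point starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the most similar (closest) pair of clusters.
        i, j = min(
            ((a, b) for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda ab: single_link(clusters[ab[0]], clusters[ab[1]]),
        )
        # The merge distance is the bar height in the dendrogram.
        height = single_link(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical data: two tight pairs and one isolated point.
points = [(1.0, 1.0), (1.0, 1.5), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]
merges = agglomerate(points)
for left, right, height in merges:
    print(f"{height:.2f}: {left} + {right}")
```

Each printed line is one internal node of the tree: low heights join items that are close together, and the last merge, at the greatest height, is the root containing every item.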


Hierarchical Clustering Demo


Strengths & Weaknesses of Hierarchical Clustering Methods


Major advantage
Conceptually very simple
Easy to implement
Most commonly used technique


Applications
Market segmentation is usually conducted using some form of cluster analysis to divide people into segments
Other methods such as latent class models or archetypal analysis are sometimes used instead

It is also possible to cluster other items such as products/SKUs, image attributes, and brands

