Industrial Statistics: Application of Multivariate Statistical Methods in Marketing Research

Industrial Statistics
MS3001 Advanced Marketing Research

Faculty of Science, University of Colombo
Application of Multivariate Statistical Methods in Marketing Research

Session 4: Cluster Analysis
December 25, 2013
Illustration
I need to identify groups of target consumers who are similar in buying habits, demographic characteristics or psychographics. Can the districts of Sri Lanka be grouped based on demographics, socio-cultural parameters, agricultural operations, extent of infrastructure development, etc.?
Cluster Analysis
In simple terms, Cluster Analysis does to objects or entities what Factor Analysis does to variables.
Cluster Analysis groups objects based on a set of variables. The resulting groups should be relatively homogeneous within each group and heterogeneous across groups.
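One common way to make "homogeneous within, heterogeneous across" measurable is to split the total sum of squares (TSS) into a within-cluster part (WSS) and a between-cluster part (BSS), with TSS = WSS + BSS. The sketch below assumes metric variables and Euclidean distance; the six points and both labelings are hypothetical. A good grouping has a small WSS and a large share BSS/TSS.

```python
# Sketch: within-, between- and total sums of squares for a given grouping.
# Assumes metric variables and Euclidean distance; data are hypothetical.

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sums_of_squares(points, labels):
    """Return (TSS, WSS, BSS) for the clustering given by labels."""
    grand = centroid(points)
    tss = sum(sq_dist(p, grand) for p in points)
    wss = bss = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        c = centroid(members)
        wss += sum(sq_dist(p, c) for p in members)  # spread inside cluster k
        bss += len(members) * sq_dist(c, grand)     # distance of cluster k from the overall mean
    return tss, wss, bss


points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]
good = [0, 0, 0, 1, 1, 1]  # follows the two visibly separate groups
poor = [0, 1, 0, 1, 0, 1]  # mixes the two groups

for name, labels in [("good", good), ("poor", poor)]:
    tss, wss, bss = sums_of_squares(points, labels)
    print(f"{name}: TSS={tss:.2f}  WSS={wss:.2f}  BSS={bss:.2f}  BSS/TSS={bss / tss:.2f}")
```

TSS is the same for both labelings; only its split between WSS and BSS changes, which is why BSS/TSS is a convenient single measure of separation.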
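A range of clustering procedures:

Hierarchical
Builds a tree of nested clusters, either top-down (divisive: starting with the whole dataset, each cluster is divided into two, then divided again, and so on) or bottom-up (agglomerative: clusters are repeatedly merged)
K-Means
The number of clusters, K, is chosen subjectively by the researcher and input in advance.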
Cluster Analysis
Requires variables to be metric (interval or ratio).
Large samples are preferred.
The worth of the solution often depends on the internal consistency of the clusters and their validity.
What is Cluster Analysis?

Cluster: a collection of data objects that are
similar to the objects in the same cluster (intraclass similarity) and
dissimilar to the objects in other clusters (interclass dissimilarity).
Cluster analysis: a statistical method for grouping a set of data objects into clusters.
A good clustering method produces high-quality clusters with high intraclass similarity and low interclass similarity.
Clustering is unsupervised classification: no predefined class labels are used.
It can be used as a stand-alone tool or as a preprocessing step for other algorithms.
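To make intraclass similarity and interclass dissimilarity concrete, a minimal sketch can compare the average distance between pairs of objects in the same cluster with the average distance between pairs in different clusters. It assumes Euclidean distance as the dissimilarity measure (the slides do not fix one); the function name, points and labels are hypothetical.

```python
import math
from itertools import combinations


def intra_inter(points, labels):
    """Mean Euclidean distance over same-cluster pairs and over cross-cluster pairs."""
    same, diff = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (same if lp == lq else diff).append(math.dist(p, q))
    return sum(same) / len(same), sum(diff) / len(diff)


points = [(2.0, 3.0), (2.5, 3.5), (3.0, 2.5), (7.0, 8.0), (7.5, 7.0), (8.0, 7.5)]
labels = ["A", "A", "A", "B", "B", "B"]

intra, inter = intra_inter(points, labels)
print(f"average intraclass distance: {intra:.2f}")
print(f"average interclass distance: {inter:.2f}")
```

For a good clustering the first number should be much smaller than the second.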
Group objects according to their similarity

Cluster: a set of objects that are similar to each other and separated from the other objects.
Example: the green and red data points were generated from two different normal distributions.
K-Means Clustering
The meaning of K-means
Why it is called K-means clustering: K points are used to represent the clustering result, and each point corresponds to the center (mean) of a cluster.
Each data point is assigned to the cluster with the closest center point.
The number K must be specified in advance.
Basic algorithm
The K-Means Clustering Method

Given k, the k-means algorithm is implemented in 4 steps:
Step 1: Arbitrarily choose k points as the initial centers (equivalently, partition the objects into k non-empty subsets and use their means as the initial centers).
Step 2: Assign each object to the cluster with the nearest seed point (center).
Step 3: Calculate the mean of each cluster and update its seed point.
Step 4: Go back to Step 2; stop when no object is assigned to a new cluster.
The basic step of k-means clustering is simple. Iterate until stable (= no object moves group):
Determine the centroid coordinates.
Determine the distance of each object to the centroids.
Group the objects based on minimum distance.
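The four steps can be sketched in plain Python. This is an illustrative version of the standard (Lloyd-style) k-means iteration, assuming Euclidean distance and a random choice of initial centers; the function name `kmeans`, its parameters and the sample points are hypothetical, not part of the course material.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means for a list of equal-length numeric tuples.

    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step 1: arbitrarily choose k objects as initial centers
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest center (Euclidean distance)
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # Step 4: stop when no object changes cluster
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its current members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # if a cluster empties, keep its previous center
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = kmeans(points, k=2)
print("labels:", labels)
print("centers:", [tuple(round(v, 2) for v in c) for c in centers])
```

Because the initial centers are chosen arbitrarily, different seeds can converge to different local solutions; in practice k-means is usually run several times and the solution with the smallest within-cluster sum of squares is kept.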
The K-Means Clustering Results

Example (figure, K = 2, both axes from 0 to 10): arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign; update the cluster means; reassign.
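The same sequence (choose K = 2 initial centers, assign, update the means, reassign) can be traced step by step with a small sketch. The eight points and the choice of the first two objects as initial centers are hypothetical; each printed row shows the step number, the centers and the resulting assignment.

```python
import math


def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def update(points, labels, centers):
    """New centers = means of the current members (old center kept if a cluster is empty)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
    return new_centers


points = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
centers = [points[0], points[1]]  # K = 2: two objects chosen arbitrarily as initial centers
labels = assign(points, centers)
history = [(centers, labels)]

while True:
    centers = update(points, labels, centers)  # update the cluster means
    new_labels = assign(points, centers)       # reassign
    history.append((centers, new_labels))
    if new_labels == labels:                   # no object moved: stable
        break
    labels = new_labels

for step, (ctrs, labs) in enumerate(history):
    print(step, [tuple(round(v, 2) for v in c) for c in ctrs], labs)
```

Here both initial centers fall in the lower-left group, so the first assignment is poor. The first update pulls the second center towards the upper-right group, the reassignment then separates the two groups correctly, and the second update leaves the assignment unchanged, which stops the loop.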
Weaknesses of the K-Means Method

Sensitive to noisy data and outliers: very large or very small values can skew the mean of a cluster, and hence its center.
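A quick arithmetic check shows how a single extreme value drags a cluster mean (and therefore a k-means center) away from the typical members. The spend figures are hypothetical.

```python
from statistics import mean

# Hypothetical monthly spend of the customers in one cluster
spend = [20, 22, 25, 23, 21]
spend_with_outlier = spend + [400]  # one extreme customer joins the cluster

print(f"mean without outlier: {mean(spend):.2f}")
print(f"mean with outlier:    {mean(spend_with_outlier):.2f}")
```

The mean jumps from 22.20 to about 85.17, far outside the range of the five typical customers, so a center computed from these values no longer represents any of them.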
Hierarchical Clustering
Start with every data point in a separate cluster.
Keep merging the most similar pair of data points/clusters until only one big cluster is left.
This is called a bottom-up, or agglomerative, method.
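The merging procedure can be sketched naively in Python. The slides do not say how the similarity between two clusters is measured, so the `linkage` options here (single: closest pair, complete: farthest pair, average: mean over all pairs) are assumptions taken from common practice; the function and the one-variable data are hypothetical. This brute-force version recomputes all distances at every step, so it only suits small examples.

```python
import math


def agglomerative(points, linkage="single"):
    """Naive bottom-up clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples, where the
    clusters are lists of point indices. linkage is "single", "complete" or "average".
    """
    clusters = [[i] for i in range(len(points))]  # every point starts in its own cluster
    history = []

    def cluster_distance(a, b):
        d = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(d)          # closest pair of members
        if linkage == "complete":
            return max(d)          # farthest pair of members
        if linkage == "average":
            return sum(d) / len(d) # average over all pairs
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        # Find the most similar (closest) pair of clusters.
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        a, b = min(pairs, key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        history.append((clusters[a], clusters[b], cluster_distance(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        del clusters[b]            # b > a, so deleting b leaves index a unchanged
        clusters[a] = merged
    return history


points = [(1.0,), (1.2,), (5.0,), (5.3,), (9.0,)]  # hypothetical one-variable data
for a, b, d in agglomerative(points, linkage="single"):
    print(f"merge {a} + {b} at distance {d:.2f}")
```

Each run performs exactly n - 1 merges, and the recorded distances are the heights at which branches join in the dendrogram.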
Hierarchical Clustering (cont.)

This produces a binary tree, or dendrogram.
The final cluster is the root and each data item is a leaf.
The height of the bars indicates how close the items are: the lower two branches join, the more similar the merged clusters.
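Cutting the dendrogram at a chosen height turns the tree into flat clusters: every merge below the cut is kept and every merge above it is undone. A minimal sketch, assuming single linkage, for which the merge heights can be obtained by processing all pairwise distances in increasing order (Kruskal's minimum-spanning-tree procedure); all function names and data are hypothetical.

```python
import math
from itertools import combinations


def _find(parent, x):
    """Root of x in a union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def single_linkage_merges(points):
    """Single-linkage merges as (i, j, height), in increasing height order."""
    parent = list(range(len(points)))
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    merges = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # the pair joins two different clusters: record a merge
            parent[ri] = rj
            merges.append((i, j, d))
    return merges


def cut_at_height(n, merges, h):
    """Flat cluster labels obtained by keeping only merges at height <= h."""
    parent = list(range(n))
    for i, j, d in merges:
        if d <= h:
            parent[_find(parent, i)] = _find(parent, j)
    ids = {}
    # Relabel each root as 0, 1, 2, ... in order of first appearance.
    return [ids.setdefault(_find(parent, x), len(ids)) for x in range(n)]


points = [(1.0,), (1.2,), (5.0,), (5.3,), (9.0,)]  # hypothetical one-variable data
merges = single_linkage_merges(points)
for h in (1.0, 3.75, 10.0):
    print(f"cut at height {h}: {cut_at_height(len(points), merges, h)}")
```

A low cut keeps only the tight pairs, a higher cut merges the nearby groups, and a cut above the root leaves a single cluster.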
Strengths & Weaknesses of Hierarchical Clustering Methods

Major advantage
Conceptually very simple and easy to implement; the most commonly used technique.
Applications
Market segmentation is usually conducted using some form of cluster analysis to divide people into segments.
Other methods, such as latent class models or archetypal analysis, are sometimes used instead.
It is also possible to cluster other items, such as products/SKUs, image attributes and brands.