
ADVANCED DATA ANALYSIS
Factor Analysis & Cluster Analysis

Submitted by:
B001 Sahej Abrol
B030 Taranum Kaur

Contents
INTRODUCTION
ABOUT DATA
OBJECTIVE
METHODOLOGY
STEP 1: Factor analysis
  OUTPUT
STEP 2: Cluster analysis
  OUTPUT
CONCLUSION

INTRODUCTION

Retailer performance depends on multiple factors defined by the company; it is not just a
function of the value of sales made by the retailer. Measuring the sales performance of
retailers matters for several reasons: it helps companies optimize their distribution
strategies, evaluate channel performance, plan sales promotions, forecast demand etc.

In this case, we have used retailer data obtained from an FMCG company. It contains retail
information for 198 retailers in the city of Chandigarh and lists the various parameters of
their sales performance. The aim of this project is to analyse the data and interpret whatever
information it can provide about performance, grouping of retailers etc.

Companies often group retailers into segments to design sales promotions and loyalty
programs. This classification or grouping is based on multiple factors. Our aim is to analyse
the given data and classify the retailers into clusters, so that they can be segmented for
appropriate decision making.

ABOUT DATA
The data set contains the following variables:

State Name      Indicates the state the retailer is located in
Stockist Code   Indicates the code of the stockist/distributor under which the retailer operates
Eco             The number of unique billings of the retailer in that quarter
BillCut         Total number of bills cut for the retailer
LineCount       Total number of unique product lines sold by the retailer (product lines mean
                different SKUs etc.)
Net Sale        Total sales value of the products sold by the retailer

The data gives sales information for 198 retailers for the financial year 2016-17. Records with
missing values and all outliers have been filtered out.

OBJECTIVE
The aim of this project is to classify the retailers into different categories based on the sales
performance.

METHODOLOGY
The objective is achieved in two steps: first, a factor analysis reduces the variables to a
smaller set of components; then a cluster analysis is conducted on those components.

Factor Analysis:

Factor analysis is a technique used to reduce a large number of variables to a smaller number
of factors. It extracts the maximum common variance from all variables and puts it into a
common score, which can then be used as an index of the variables in further analysis. Factor
analysis is part of the general linear model (GLM) and rests on several assumptions: the
relationships are linear, there is no multicollinearity, the relevant variables are included in
the analysis, and there is true correlation between variables and factors. Several factor
analysis methods are available, but principal component analysis is used most commonly. It is
a statistical method that describes the variability among observed variables in terms of a
smaller number of unobserved variables, called factors, which makes it suited to large
quantities of data. In the common factor model, each observed variable is treated as dependent
on the unobserved factors.

Cluster Analysis:

Cluster analysis is an exploratory analysis that tries to identify structures within the data.
It is also called segmentation analysis and is used to classify data. More specifically, it
tries to identify homogeneous groups of cases, i.e., observations, participants or
respondents.

STEP 1: Factor analysis
Source of Data: Summer Internship

Variables in the data:

STATE NAME
STOCKIST ID
ECO
LINES
BILLSCUT
QTY
NET SALES

Preview of the data:

The aim next is to combine various variables into factors. We conduct a Principal Component
Analysis.

Steps to be followed:

1. Analyse > Dimension Reduction > Factor
2. Enter the variables
3. Descriptives > Initial solution, Coefficients, KMO and Bartlett's test > OK
4. Extraction > Method: Principal components > Analyse: Correlation matrix
5. Varimax Rotation
6. Select Save as variables > Method: Regression, Display factor score coefficient matrix

OUTPUT
Correlation Matrix

            ECO     BILLSCUT  LINECOUNT  QTY     NETSALE
ECO         1.000   .847      .753       .026    .072
BILLSCUT    .847    1.000     .899       .247    .300
LINECOUNT   .753    .899      1.000      .251    .289
QTY         .026    .247      .251       1.000   .931
NETSALE     .072    .300      .289       .931    1.000

To carry out a principal component analysis, there need to be at least two correlations of 0.3
or above among the variables. As shown above, this criterion is met by the current data and
variables.

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy            .649
Bartlett's Test of Sphericity    Approx. Chi-Square    1005.925
                                 df                          10
                                 Sig.                      .000

The appropriateness of the factor analysis is judged from the KMO measure of sampling
adequacy, for which the minimum acceptable value is 0.5. Here the value is 0.649, which means
the factor analysis is appropriate. The significance of Bartlett's chi-square is also below
0.05, as required. This means the correlation matrix differs significantly from an identity
matrix, so the variables are sufficiently correlated for a principal component analysis.
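The three checks described above can be reproduced, at least approximately, from the reported correlation matrix alone. The sketch below is a minimal, dependency-free illustration and not the SPSS procedure itself: the helper names (`invert`, `bartlett_sphericity`, `kmo`) are made up for this example, the formulas are the standard textbook ones, and because the correlations are rounded to three decimals the results should only be expected to land close to the reported 1005.925 and .649.

```python
import math

# Correlation matrix as reported above (order: ECO, BILLSCUT, LINECOUNT, QTY, NETSALE).
R = [
    [1.000, 0.847, 0.753, 0.026, 0.072],
    [0.847, 1.000, 0.899, 0.247, 0.300],
    [0.753, 0.899, 1.000, 0.251, 0.289],
    [0.026, 0.247, 0.251, 1.000, 0.931],
    [0.072, 0.300, 0.289, 0.931, 1.000],
]
N_CASES = 198  # number of retailers stated in the report


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; returns (inverse, determinant)."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = a[col][col]
        det *= pivot_value
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [row[n:] for row in a], det


def bartlett_sphericity(corr, n_cases):
    """Bartlett's chi-square statistic and its degrees of freedom."""
    p = len(corr)
    _, det = invert(corr)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi_square, p * (p - 1) // 2


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure from correlations and partial correlations."""
    p = len(corr)
    inv, _ = invert(corr)
    sum_r2 = sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)


# Screening rule from the text: count variable pairs with |r| >= 0.3.
strong_pairs = sum(1 for i in range(5) for j in range(i + 1, 5) if abs(R[i][j]) >= 0.3)
chi_square, df = bartlett_sphericity(R, N_CASES)
kmo_value = kmo(R)

print(f"pairs with |r| >= 0.3: {strong_pairs}")
print(f"Bartlett chi-square: {chi_square:.1f} on {df} df")
print(f"KMO: {kmo_value:.3f}")
```

The significance value is not computed here because it needs a chi-square distribution function, which the standard library does not provide.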

Communalities

            Initial   Extraction
ECO         1.000     .876
BILLSCUT    1.000     .949
LINECOUNT   1.000     .884
QTY         1.000     .964
NETSALE     1.000     .963

Extraction Method: Principal Component Analysis.

Communality is the share of each variable's variance that is explained by the components
extracted in the PCA. A high communality indicates that the components explain the variable
and represent it well after the analysis. For instance, 96.4% of the variance in QTY is
explained by the extracted components.
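The link between communalities and loadings can be checked by hand: a variable's extraction communality is the sum of its squared loadings on the extracted components. The short sketch below uses the unrotated loadings from the Component Matrix reported in this output. The variable and function names are illustrative only, and because the loadings are rounded to three decimals the results agree with the Communalities table only to within rounding.

```python
# Unrotated loadings (component 1, component 2) copied from the Component Matrix.
loadings = {
    "ECO": (0.796, -0.493),
    "BILLSCUT": (0.932, -0.283),
    "LINECOUNT": (0.904, -0.258),
    "QTY": (0.525, 0.830),
    "NETSALE": (0.567, 0.801),
}


def communality(row):
    """Sum of squared loadings of one variable across the extracted components."""
    return sum(value * value for value in row)


communalities = {name: communality(row) for name, row in loadings.items()}

for name, h2 in communalities.items():
    print(f"{name:<10} {h2:.3f}")
```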

Total Variance Explained

           Initial Eigenvalues                 Extraction Sums of Squared Loadings Rotation Sums of Squared Loadings
Component  Total  % of Variance  Cumulative %  Total  % of Variance  Cumulative %  Total  % of Variance  Cumulative %
1          2.916  58.322         58.322        2.916  58.322         58.322        2.655  53.092         53.092
2          1.720  34.409         92.731        1.720  34.409         92.731        1.982  39.639         92.731
3          .219   4.382          97.113
4          .079   1.582          98.695
5          .065   1.305          100.000

Extraction Method: Principal Component Analysis.

The above table shows how the total variance in the variables is distributed across the
components. Only components with eigenvalues greater than 1 are extracted. The cumulative
percentage of the rotation sums of squared loadings shows that 92.7% of the variance is
explained by the 2 extracted components.
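The eigenvalue rule described above can be illustrated directly on the reported correlation matrix. The sketch below is an assumption-laden illustration rather than the SPSS routine: it finds the eigenvalues with a small hand-written Jacobi rotation solver (in practice a numerical library would normally be used; the solver is included only to keep the example dependency-free), keeps those above 1, and reports the share of variance they carry. Since the input correlations are rounded, the eigenvalues should match the table only approximately.

```python
import math

# Correlation matrix as reported in the output (ECO, BILLSCUT, LINECOUNT, QTY, NETSALE).
R = [
    [1.000, 0.847, 0.753, 0.026, 0.072],
    [0.847, 1.000, 0.899, 0.247, 0.300],
    [0.753, 0.899, 1.000, 0.251, 0.289],
    [0.026, 0.247, 0.251, 1.000, 0.931],
    [0.072, 0.300, 0.289, 0.931, 1.000],
]


def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


eigenvalues = jacobi_eigenvalues(R)
total_variance = sum(eigenvalues)  # equals the number of variables for a correlation matrix
retained = [ev for ev in eigenvalues if ev > 1.0]  # eigenvalue-greater-than-1 rule
cumulative_pct = 100 * sum(retained) / total_variance

for number, ev in enumerate(eigenvalues, start=1):
    print(f"component {number}: eigenvalue {ev:.3f}, {100 * ev / total_variance:.3f}% of variance")
print(f"{len(retained)} components retained, explaining {cumulative_pct:.1f}% of the variance")
```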

The scree plot of eigenvalues against component number shows that the first two components
will be extracted, after which the graph flattens for the other components.

Component Matrix (a)

            Component
            1       2
ECO         .796    -.493
BILLSCUT    .932    -.283
LINECOUNT   .904    -.258
QTY         .525    .830
NETSALE     .567    .801

Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Rotated Component Matrix (a)

            Component
            1       2
ECO         .934    -.063
BILLSCUT    .956    .186
LINECOUNT   .920    .194
QTY         .076    .979
NETSALE     .126    .973

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

The rotated component matrix above shows which component each variable belongs to, decided by
where its largest loading lies:

1. ECO, BILLSCUT, LINECOUNT
2. QTY, NET SALES

The factor analysis has thus put our variables into 2 components, which account for 92.7% of
the variance.
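The grouping rule used above (assign each variable to the component on which its absolute loading is largest) is simple enough to write out. The sketch below applies it to the rotated loadings reported in this output and also recomputes the rotation sums of squared loadings as column sums of squared loadings. Names such as `groups` and `rotation_ss` are illustrative, and small differences from the Total Variance Explained table are due to the loadings being rounded.

```python
# Rotated loadings (component 1, component 2) copied from the Rotated Component Matrix.
rotated = {
    "ECO": (0.934, -0.063),
    "BILLSCUT": (0.956, 0.186),
    "LINECOUNT": (0.920, 0.194),
    "QTY": (0.076, 0.979),
    "NETSALE": (0.126, 0.973),
}

# Assign each variable to the component with the largest absolute loading.
groups = {1: [], 2: []}
for name, row in rotated.items():
    best_component = max(range(len(row)), key=lambda k: abs(row[k])) + 1
    groups[best_component].append(name)

# Rotation sums of squared loadings: column-wise sums of squared loadings.
rotation_ss = [sum(row[k] ** 2 for row in rotated.values()) for k in range(2)]
pct_of_variance = [100 * ss / len(rotated) for ss in rotation_ss]

for component, names in groups.items():
    print(f"component {component}: {', '.join(names)}")
for k, (ss, pct) in enumerate(zip(rotation_ss, pct_of_variance), start=1):
    print(f"component {k}: sum of squared loadings {ss:.3f} ({pct:.2f}% of variance)")
```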

STEP 2: Cluster analysis
Cluster analysis is used to classify cases, here the retailers, into groups or categories
depending on their attributes. Cluster analysis can be of 3 types; we have used hierarchical
cluster analysis.

Steps:

1. Analyse > Classify > Hierarchical Cluster
2. Variables to be entered are the factor score variables saved from the factor analysis
3. Cluster: Cases, Plots: Dendrogram, Icicle: All clusters, Orientation: Vertical
4. Cluster method: Ward's method, Interval: Squared Euclidean distance
5. Transform: Standardized z scores, Cluster membership: Single solution, number of clusters 4.
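The settings listed above (z-score standardization, Ward's method, squared Euclidean distance) can be sketched in a few lines of plain Python. This is a naive illustration on made-up data, not the SPSS implementation and not the retailer data: the nine toy points, the helper names, and the choice to report the cumulative within-cluster sum of squares as the stage coefficient are all assumptions made for the example.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def merge_cost(a, b):
    """Ward's criterion: increase in within-cluster sum of squares if a and b merge."""
    centroid_a = [mean(col) for col in zip(*a)]
    centroid_b = [mean(col) for col in zip(*b)]
    d2 = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def ward(points, n_clusters=1):
    """Merge the cheapest pair repeatedly until n_clusters remain.

    Returns (schedule, clusters): schedule holds (cluster_a, cluster_b, coefficient)
    per stage, where coefficient is the cumulative within-cluster sum of squares;
    clusters is a sorted list of sorted case indices.
    """
    clusters = {i: [i] for i in range(len(points))}
    schedule, cumulative = [], 0.0
    while len(clusters) > n_clusters:
        cost, a, b = min(
            (merge_cost([points[i] for i in clusters[a]],
                        [points[i] for i in clusters[b]]), a, b)
            for a in clusters for b in clusters if a < b
        )
        cumulative += cost
        clusters[a] = clusters[a] + clusters.pop(b)
        schedule.append((a, b, cumulative))
    return schedule, sorted(sorted(members) for members in clusters.values())


# Hypothetical two-variable scores for nine cases (three visibly separate groups).
toy_scores = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8),
]
standardized = zscore_columns(toy_scores)

full_schedule, _ = ward(standardized, n_clusters=1)
_, three_clusters = ward(standardized, n_clusters=3)

for stage, (a, b, coefficient) in enumerate(full_schedule, start=1):
    print(f"stage {stage}: merge {a} and {b}, coefficient {coefficient:.3f}")
print("three-cluster solution:", three_clusters)
```

With this definition the final coefficient equals the total sum of squares, which for p standardized variables and n cases is (n - 1) × p. For 198 cases and 2 variables that is 394, consistent with the last coefficient in the agglomeration schedule reported in the output.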

OUTPUT
Since records with missing information had already been removed, all 198 cases were processed
as valid, with no entries missing.
Case Processing Summary (a)

                    Cases
Valid               Missing             Total
N     Percent       N     Percent       N     Percent
198   100.0%        0     .0%           198   100.0%

a. Squared Euclidean Distance used

Agglomeration Schedule

        Cluster Combined                       Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage   Difference in coefficient
1 70 94 .000 0 0 13 0
2 147 191 .000 0 0 26 0
3 122 170 .000 0 0 53 0
4 135 158 .001 0 0 36 0.001
5 34 57 .001 0 0 56 0
6 101 119 .001 0 0 117 0
7 127 187 .001 0 0 19 0
8 13 36 .002 0 0 84 0.001
9 48 65 .002 0 0 104 0
10 41 93 .003 0 0 63 0.001
11 91 117 .003 0 0 44 0
12 157 196 .004 0 0 53 0.001
13 70 192 .004 1 0 27 0
14 49 50 .005 0 0 73 0.001

15 5 6 .006 0 0 126 0.001
16 124 179 .006 0 0 94 0
17 25 103 .007 0 0 61 0.001
18 88 175 .008 0 0 78 0.001
19 120 127 .008 0 7 85 0
20 83 142 .009 0 0 72 0.001
21 155 164 .010 0 0 89 0.001
22 84 174 .011 0 0 71 0.001
23 105 178 .012 0 0 77 0.001
24 19 163 .013 0 0 59 0.001
25 86 118 .014 0 0 49 0.001
26 126 147 .016 0 2 108 0.002
27 70 106 .017 13 0 50 0.001
28 110 145 .018 0 0 69 0.001
29 8 197 .020 0 0 58 0.002
30 80 180 .021 0 0 98 0.001
31 17 38 .023 0 0 59 0.002
32 71 74 .024 0 0 66 0.001
33 123 162 .026 0 0 69 0.002
34 14 152 .028 0 0 110 0.002
35 78 139 .030 0 0 107 0.002
36 102 135 .032 0 4 71 0.002
37 4 177 .034 0 0 90 0.002
38 3 60 .036 0 0 127 0.002
39 131 194 .038 0 0 115 0.002
40 28 69 .041 0 0 105 0.003
41 22 43 .043 0 0 67 0.002
42 18 27 .045 0 0 60 0.002
43 64 112 .048 0 0 121 0.003
44 91 153 .051 11 0 118 0.003
45 54 172 .054 0 0 78 0.003
46 141 193 .056 0 0 112 0.002
47 189 198 .059 0 0 82 0.003
48 132 150 .063 0 0 107 0.004
49 86 159 .066 25 0 92 0.003
50 29 70 .069 0 27 125 0.003
51 138 161 .072 0 0 111 0.003
52 136 171 .076 0 0 81 0.004
53 122 157 .080 3 12 94 0.004
54 58 137 .084 0 0 105 0.004
55 23 87 .088 0 0 84 0.004
56 34 73 .093 5 0 129 0.005
57 134 167 .097 0 0 103 0.004
58 8 46 .102 29 0 109 0.005

59 17 19 .107 31 24 144 0.005
60 18 107 .113 42 0 145 0.006
61 15 25 .118 0 17 104 0.005
62 109 173 .124 0 0 89 0.006
63 41 42 .130 10 0 134 0.006
64 154 176 .137 0 0 92 0.007
65 121 156 .143 0 0 108 0.006
66 40 71 .149 0 32 149 0.006
67 22 52 .156 41 0 139 0.007
68 144 149 .162 0 0 141 0.006
69 110 123 .169 28 33 118 0.007
70 100 151 .176 0 0 126 0.007
71 84 102 .183 22 36 110 0.007
72 83 195 .191 20 0 122 0.008
73 49 130 .199 14 0 129 0.008
74 30 55 .207 0 0 112 0.008
75 108 114 .215 0 0 155 0.008
76 98 133 .224 0 0 102 0.009
77 92 105 .233 0 23 135 0.009
78 54 88 .242 45 18 125 0.009
79 169 188 .251 0 0 113 0.009
80 2 115 .261 0 0 133 0.01
81 128 136 .271 0 52 117 0.01
82 181 189 .282 0 47 93 0.011
83 24 82 .293 0 0 163 0.011
84 13 23 .305 8 55 150 0.012
85 120 140 .316 19 0 111 0.011
86 7 99 .328 0 0 119 0.012
87 33 39 .341 0 0 145 0.013
88 53 146 .355 0 0 120 0.014
89 109 155 .368 62 21 148 0.013
90 4 44 .382 37 0 128 0.014
91 35 66 .396 0 0 101 0.014
92 86 154 .412 49 64 138 0.016
93 129 181 .428 0 82 121 0.016
94 122 124 .444 53 16 122 0.016
95 104 165 .461 0 0 175 0.017
96 20 21 .478 0 0 152 0.017
97 16 26 .495 0 0 143 0.017
98 80 81 .512 30 0 156 0.017
99 97 116 .531 0 0 151 0.019
100 9 77 .549 0 0 153 0.018
101 32 35 .568 0 91 146 0.019
102 90 98 .587 0 76 184 0.019

103 134 160 .607 57 0 162 0.02
104 15 48 .627 61 9 146 0.02
105 28 58 .648 40 54 113 0.021
106 31 56 .670 0 0 149 0.022
107 78 132 .693 35 48 161 0.023
108 121 126 .718 65 26 140 0.025
109 8 45 .744 58 0 136 0.026
110 14 84 .772 34 71 147 0.028
111 120 138 .800 85 51 140 0.028
112 30 141 .831 74 46 158 0.031
113 28 169 .861 105 79 156 0.03
114 76 183 .892 0 0 157 0.031
115 131 166 .923 39 0 128 0.031
116 75 113 .954 0 0 171 0.031
117 101 128 .987 6 81 138 0.033
118 91 110 1.022 44 69 159 0.035
119 1 7 1.058 0 86 174 0.036
120 53 111 1.094 88 0 144 0.036
121 64 129 1.131 43 93 148 0.037
122 83 122 1.170 72 94 135 0.039
123 12 185 1.209 0 0 142 0.039
124 62 184 1.248 0 0 165 0.039
125 29 54 1.291 50 78 147 0.043
126 5 100 1.333 15 70 161 0.042
127 3 168 1.377 38 0 166 0.044
128 4 131 1.421 90 115 151 0.044
129 34 49 1.466 56 73 150 0.045
130 89 143 1.515 0 0 155 0.049
131 63 148 1.564 0 0 160 0.049
132 11 61 1.614 0 0 165 0.05
133 2 182 1.666 80 0 141 0.052
134 41 68 1.721 63 0 152 0.055
135 83 92 1.776 122 77 158 0.055
136 8 85 1.834 109 0 153 0.058
137 59 125 1.909 0 0 167 0.075
138 86 101 1.989 92 117 164 0.08
139 22 72 2.073 67 0 168 0.084
140 120 121 2.162 111 108 159 0.089
141 2 144 2.251 133 68 162 0.089
142 10 12 2.341 0 123 172 0.09
143 16 51 2.431 97 0 187 0.09
144 17 53 2.529 59 120 154 0.098
145 18 33 2.630 60 87 164 0.101
146 15 32 2.732 104 101 169 0.102

147 14 29 2.837 110 125 181 0.105
148 64 109 2.949 121 89 169 0.112
149 31 40 3.061 106 66 170 0.112
150 13 34 3.203 84 129 160 0.142
151 4 97 3.347 128 99 185 0.144
152 20 41 3.495 96 134 191 0.148
153 8 9 3.645 136 100 166 0.15
154 17 186 3.797 144 0 168 0.152
155 89 108 3.956 130 75 167 0.159
156 28 80 4.123 113 98 170 0.167
157 76 79 4.292 114 0 180 0.169
158 30 83 4.475 112 135 177 0.183
159 91 120 4.726 118 140 175 0.251
160 13 63 4.989 150 131 163 0.263
161 5 78 5.269 126 107 173 0.28
162 2 134 5.555 141 103 173 0.286
163 13 24 5.875 160 83 189 0.32
164 18 86 6.211 145 138 177 0.336
165 11 62 6.554 132 124 178 0.343
166 3 8 6.962 127 153 178 0.408
167 59 89 7.388 137 155 180 0.426
168 17 22 7.842 154 139 182 0.454
169 15 64 8.361 146 148 183 0.519
170 28 31 8.923 156 149 182 0.562
171 75 95 9.521 116 0 184 0.598
172 10 47 10.126 142 0 186 0.605
173 2 5 10.743 162 161 188 0.617
174 1 190 11.456 119 0 179 0.713
175 91 104 12.231 159 95 181 0.775
176 37 67 13.041 0 0 187 0.81
177 18 30 13.921 164 158 188 0.88
178 3 11 14.926 166 165 186 1.005
179 1 96 15.964 174 0 190 1.038
180 59 76 17.216 167 157 185 1.252
181 14 91 18.747 147 175 183 1.531
182 17 28 20.635 168 170 191 1.888
183 14 15 22.720 181 169 192 2.085
184 75 90 25.212 171 102 189 2.492
185 4 59 28.024 151 180 190 2.812
186 3 10 32.026 178 172 193 4.002
187 16 37 36.121 143 176 194 4.095
188 2 18 41.146 173 177 192 5.025
189 13 75 47.133 163 184 195 5.987
190 1 4 55.054 179 185 194 7.921

191 17 20 63.306 182 152 193 8.252
192 2 14 74.429 188 183 195 11.123
193 3 17 89.821 186 191 196 15.392
194 1 16 119.624 190 187 197 29.803
195 2 13 154.934 192 189 196 35.31
196 2 3 260.657 195 193 197 105.723
197 1 2 394.000 194 196 0 133.343

The above schedule shows which cases or clusters are combined at each stage, starting with
cases 70 and 94 at the first stage. Towards the end of the schedule the difference between
successive coefficients increases sharply; the stage at which this happens indicates the
appropriate number of clusters. Here the last 3 values have been taken, so 3 clusters are
formed. This also matches the company's own practice: it classifies the same set of retailers
into 3 categories for its loyalty programs.
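One way to make the "sharp increase" reading of the schedule explicit is sketched below. It takes the coefficients of the last eight stages from the agglomeration schedule, recomputes the difference column, and looks for the stage where the jump grows most relative to the previous jump. This particular rule (largest ratio of successive jumps) is an assumption chosen for illustration, not necessarily the rule the analysis used. Picking the single largest absolute jump instead would point to stage 197 and hence 2 clusters.

```python
N_CASES = 198

# Coefficients of the final stages, copied from the agglomeration schedule.
last_stages = {
    190: 55.054, 191: 63.306, 192: 74.429, 193: 89.821,
    194: 119.624, 195: 154.934, 196: 260.657, 197: 394.000,
}
stages = sorted(last_stages)

# Difference in coefficient between each stage and the one before it.
jumps = {s: round(last_stages[s] - last_stages[s - 1], 3) for s in stages[1:]}

# How much larger each jump is than the previous jump.
growth = {s: jumps[s] / jumps[s - 1] for s in stages[2:]}

# Stop just before the stage whose jump grows the most (illustrative heuristic).
elbow_stage = max(growth, key=growth.get)
n_clusters = N_CASES - elbow_stage + 1  # clusters remaining before that merge

for s in stages[2:]:
    print(f"stage {s}: jump {jumps[s]:.3f}, {growth[s]:.2f} times the previous jump")
print(f"sharpest acceleration at stage {elbow_stage}: keep {n_clusters} clusters")
```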

* * * * * * H I E R A R C H I C A L   C L U S T E R   A N A L Y S I S * * * * * *

Dendrogram using Ward Method

Rescaled Distance Cluster Combine

C A S E   0         5        10        15        20        25
Label Num +---------+---------+---------+---------+---------+

Case 70 70 -+
Case 94 94 -+
Case 192 192 -+
Case 106 106 -+
Case 29 29 -+
Case 88 88 -+
Case 175 175 -+
Case 54 54 -+
Case 172 172 -+
Case 14 14 -+
Case 152 152 -+
Case 84 84 -+
Case 174 174 -+
Case 135 135 -+
Case 158 158 -+
Case 102 102 -+
Case 104 104 -+
Case 165 165 -+
Case 91 91 -+
Case 117 117 -+
Case 153 153 -+
Case 110 110 -+
Case 145 145 -+
Case 123 123 -+

Case 162 162 -+
Case 147 147 -+
Case 191 191 -+
Case 126 126 -+---+
Case 121 121 -+ |
Case 156 156 -+ |
Case 138 138 -+ |
Case 161 161 -+ |
Case 127 127 -+ |
Case 187 187 -+ |
Case 120 120 -+ |
Case 140 140 -+ |
Case 35 35 -+ |
Case 66 66 -+ |
Case 32 32 -+ |
Case 48 48 -+ |
Case 65 65 -+ |
Case 25 25 -+ |
Case 103 103 -+ |
Case 15 15 -+ |
Case 155 155 -+ |
Case 164 164 -+ |
Case 109 109 -+ |
Case 173 173 -+ |
Case 64 64 -+ |
Case 112 112 -+ +-------+
Case 189 189 -+ | |
Case 198 198 -+ | |
Case 181 181 -+ | |
Case 129 129 -+ | |
Case 78 78 -+ | |
Case 139 139 -+ | |
Case 132 132 -+ | |
Case 150 150 -+ | |
Case 5 5 -+ | |
Case 6 6 -+ | |
Case 100 100 -+ | |
Case 151 151 -+ | |
Case 134 134 -+ | |
Case 167 167 -+ | |
Case 160 160 -+ | |
Case 144 144 -+ | |
Case 149 149 -+ | |
Case 2 2 -+ | |
Case 115 115 -+ | |
Case 182 182 -+ | |
Case 141 141 -+ | |
Case 193 193 -+ | |
Case 30 30 -+---+ |
Case 55 55 -+ |
Case 105 105 -+ |
Case 178 178 -+ |
Case 92 92 -+ |
Case 83 83 -+ |
Case 142 142 -+ |
Case 195 195 -+ +-------------------------+
Case 124 124 -+ | |
Case 179 179 -+ | |
Case 122 122 -+ | |
Case 170 170 -+ | |
Case 157 157 -+ | |

Case 196 196 -+ | |
Case 86 86 -+ | |
Case 118 118 -+ | |
Case 159 159 -+ | |
Case 154 154 -+ | |
Case 176 176 -+ | |
Case 101 101 -+ | |
Case 119 119 -+ | |
Case 136 136 -+ | |
Case 171 171 -+ | |
Case 128 128 -+ | |
Case 18 18 -+ | |
Case 27 27 -+ | |
Case 107 107 -+ | |
Case 33 33 -+ | |
Case 39 39 -+ | |
Case 24 24 -+ | |
Case 82 82 -+ | |
Case 63 63 -+-+ | |
Case 148 148 -+ | | |
Case 13 13 -+ | | |
Case 36 36 -+ | | |
Case 23 23 -+ | | |
Case 87 87 -+ | | +---------+
Case 34 34 -+ | | | |
Case 57 57 -+ +---------+ | |
Case 73 73 -+ | | |
Case 49 49 -+ | | |
Case 50 50 -+ | | |
Case 130 130 -+ | | |
Case 98 98 -+ | | |
Case 133 133 -+ | | |
Case 90 90 -+-+ | |
Case 75 75 -+ | |
Case 113 113 -+ | |
Case 95 95 -+ | |
Case 12 12 -+ | |
Case 185 185 -+ | |
Case 10 10 -+ | |
Case 47 47 -+ | |
Case 62 62 -+ | |
Case 184 184 -+---+ | |
Case 11 11 -+ | | |
Case 61 61 -+ | | |
Case 3 3 -+ | | |
Case 60 60 -+ | | |
Case 168 168 -+ | | |
Case 9 9 -+ | | |
Case 77 77 -+ | | |
Case 8 8 -+ | | |
Case 197 197 -+ | | |
Case 46 46 -+ | | |
Case 45 45 -+ +---------------------------------+ |
Case 85 85 -+ | |
Case 20 20 -+ | |
Case 21 21 -+ | |
Case 41 41 -+-+ | |
Case 93 93 -+ | | |
Case 42 42 -+ | | |
Case 68 68 -+ | | |
Case 22 22 -+ | | |

Case 43 43 -+ | | |
Case 52 52 -+ | | |
Case 72 72 -+ +-+ |
Case 19 19 -+ | |
Case 163 163 -+ | |
Case 17 17 -+ | |
Case 38 38 -+ | |
Case 53 53 -+ | |
Case 146 146 -+ | |
Case 111 111 -+-+ |
Case 186 186 -+ |
Case 71 71 -+ |
Case 74 74 -+ |
Case 40 40 -+ |
Case 31 31 -+ |
Case 56 56 -+ |
Case 80 80 -+ |
Case 180 180 -+ |
Case 81 81 -+ |
Case 169 169 -+ |
Case 188 188 -+ |
Case 28 28 -+ |
Case 69 69 -+ |
Case 58 58 -+ |
Case 137 137 -+ |
Case 16 16 -+ |
Case 26 26 -+ |
Case 51 51 -+---------+ |
Case 37 37 -+ | |
Case 67 67 -+ | |
Case 7 7 -+ | |
Case 99 99 -+ | |
Case 1 1 -+ +-------------------------------------+
Case 190 190 -+-+ |
Case 96 96 -+ | |
Case 97 97 -+ | |
Case 116 116 -+ | |
Case 4 4 -+ +-------+
Case 177 177 -+ |
Case 44 44 -+ |
Case 131 131 -+ |
Case 194 194 -+-+
Case 166 166 -+
Case 76 76 -+
Case 183 183 -+
Case 79 79 -+
Case 59 59 -+
Case 125 125 -+
Case 108 108 -+
Case 114 114 -+
Case 89 89 -+
Case 143 143 -+

CONCLUSION
The retailers can thus be classified into 3 clusters based on the above cluster analysis. This
classification can be used to design and provide trade schemes, margins and other benefits
such as loyalty program rewards.

It will also help the company evaluate and classify its retailers for various functions like
sales management, distribution, inventory management etc.
