Final Project I

Project I – MIS 6324 – Business Intelligence
Association Rules & Clustering
Group – 11
Visalakshi Arunachalam
Saumil Badia
Chirag Gala
Pratik Kapadia
Introduction:
• Confidence:
Confidence is the ratio of the number of transactions that include all items in both the antecedent and the
consequent (namely, the support) to the number of transactions that include all items in the antecedent.
Confidence = (No. of transactions containing both body and head) / (No. of transactions containing body)
• Support:
The support is simply the number of transactions that include all items in the antecedent and consequent parts of
the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.)
Support = (No. of transactions containing items in body and head) / (Total no. of transactions in database)
• Lift:
Lift measures how much more likely the outcome (head) is when the antecedent (body) is present than it is in the
database overall. It is the ratio of the rule's confidence to the relative frequency of the head; a lift above 1
means the body raises the likelihood of the head.
Lift = Confidence / (Relative frequency of head)
• Association Rule:
Association rule mining finds interesting associations and/or correlation relationships among large sets of data
items. Association rules show attribute-value conditions that occur frequently together in a given dataset. A
typical and widely used example of association rule mining is Market Basket Analysis.
For example, data are collected using bar-code scanners in supermarkets. Such "market basket" databases consist
of a large number of transaction records.
Association rules provide information of this type in the form of "if-then" statements. These rules are computed
from the data and, unlike the if-then rules of logic, association rules are probabilistic in nature.
In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two
numbers (its support and its confidence) that express the degree of uncertainty about the rule. In association
analysis the antecedent and consequent are sets of items (called itemsets) that are disjoint (do not have any
items in common).
• K-Means:
The k-means algorithm assigns each point to the cluster whose center (also called centroid) is nearest. The center
is the average of all the points in the cluster — that is, its coordinates are the arithmetic mean for each dimension
separately over all the points in the cluster.
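The assign-then-average loop described in the K-Means definition can be sketched as follows. This is a minimal illustration, not the procedure XLMiner runs: the function name `kmeans`, the toy points, the starting centers, and the fixed iteration count are all assumptions made for the example.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center,
    then move each center to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # index of the center with the smallest Euclidean distance to p
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # new center = per-dimension arithmetic mean; keep the old center if a cluster is empty
        centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers, clusters

# Toy two-dimensional points (made up for illustration, not project data)
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Real tools usually choose the starting centers randomly and stop when assignments no longer change; the fixed starting centers here only keep the sketch deterministic.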
Association Rules:
Q1.
Input Parameters: Support: 150, Confidence: 30
Rules Generated: 11
Data
Input Data project1_data!$M$1:$Q$2187
Data Format Item List
Minimum Support 150
Minimum Confidence % 30
# Rules 11
Overall Time (secs) 3
If customers buy from landsend.com they tend to buy from llbean.com also, and the confidence of this
rule is 41.54%.
Rule #  Conf. %  Antecedent (a) => Consequent (c)       Support(a)  Support(c)  Support(a U c)  Lift Ratio
1       41.54    landsend.com => llbean.com             479         538         199             1.688051
2       36.99    llbean.com => landsend.com             538         479         199             1.688051
3       52.59    gap.com => oldnavy.com                 424         820         223             1.402088
4       46.5     ae.com => victoriassecret.com          357         825         166             1.232072
5       46.2     kohls.com => jcpenney.com              368         923         170             1.094081
6       36.79    gap.com => victoriassecret.com         424         825         156             0.974889
7       36.46    oldnavy.com => victoriassecret.com     820         825         299             0.96617
8       36.24    victoriassecret.com => oldnavy.com     825         820         299             0.96617
9       34.45    landsend.com => jcpenney.com           479         923         165             0.815825
10      32.53    llbean.com => jcpenney.com             538         923         175             0.770379
11      32.44    oldnavy.com => jcpenney.com            820         923         266             0.768274
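The confidence and lift columns in the table can be recomputed from the three support counts. The sketch below does this for Rule 1. It assumes the database holds 2186 transactions (the input range $M$1:$Q$2187 minus a header row, which also matches the overall #Obs in the clustering summaries); the function name `rule_metrics` is made up for the example.

```python
TOTAL_TRANSACTIONS = 2186  # assumption: rows 2..2187 of the input range, one transaction per row

def rule_metrics(support_body, support_head, support_both, total=TOTAL_TRANSACTIONS):
    """Return (confidence, lift) for a rule body => head, given raw transaction counts."""
    confidence = support_both / support_body       # share of body transactions that also contain the head
    head_frequency = support_head / total          # baseline share of transactions containing the head
    lift = confidence / head_frequency
    return confidence, lift

# Rule 1: landsend.com => llbean.com, counts taken from the table above
conf, lift = rule_metrics(support_body=479, support_head=538, support_both=199)
print(f"confidence = {conf:.2%}, lift = {lift:.6f}")
```

Under that assumption the result agrees, to rounding, with the 41.54% confidence and 1.688051 lift reported for Rule 1.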
Q2.
Comparing two association rules that have reverse body and head
A.
Rule #  Confidence %  Body => Head                  Support(Body)  Support(Head)  Support(Body & Head)  Lift Ratio
1       41.54         landsend.com => llbean.com    479            538            199                   1.688051
2       36.99         llbean.com => landsend.com    538            479            199                   1.688051
Rule 1:
Rule 2:
Association rules defined in terms of number of customers:
landsend.com (A)  llbean.com (B)  Both  Only A  Only B
479               538             199   280     339
B.
Rule #  Confidence %  Body => Head                          Support(Body)  Support(Head)  Support(Body & Head)  Lift Ratio
7       36.46         oldnavy.com => victoriassecret.com    820            825            299                   0.96617
8       36.24         victoriassecret.com => oldnavy.com    825            820            299                   0.96617
Rule 1:
Rule 2:
Association rules defined in terms of number of customers:
oldnavy.com (A)  victoriassecret.com (B)  Both  Only A  Only B
820              825                      299   521     526
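The "Only A" and "Only B" counts, and the reason a rule and its reverse share one lift but not one confidence, follow directly from the three counts per pair. A minimal sketch, assuming 2186 transactions in total (taken from the input range) and using a hypothetical helper name:

```python
def compare_reverse_rules(count_a, count_b, count_both, total=2186):
    """Summarise the pair of rules A => B and B => A from raw customer counts.
    total=2186 is an assumption based on the input data range."""
    return {
        "only_a": count_a - count_both,            # bought from A but not from B
        "only_b": count_b - count_both,            # bought from B but not from A
        "conf_a_to_b": count_both / count_a,       # confidence of A => B
        "conf_b_to_a": count_both / count_b,       # confidence of B => A
        # lift is symmetric in A and B, so both directions share this value
        "lift": count_both * total / (count_a * count_b),
    }

pair_a = compare_reverse_rules(479, 538, 199)   # landsend.com vs llbean.com
pair_b = compare_reverse_rules(820, 825, 299)   # oldnavy.com vs victoriassecret.com
print(pair_a)
print(pair_b)
```

The two confidences differ only because the denominators (the customer counts of A and of B) differ; the numerator, the number of customers who bought from both, is the same in either direction.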
Q3
Relationship between the number of rules and the support/confidence level
Support (no. of transactions)  Confidence (%)  No. of Rules  Time (secs)
100                            30              14            1
100                            40              5             1
100                            50              2             2
100                            60              0             0
150                            30              11            1
150                            40              4             1
150                            50              1             1
200                            30              4             1
200                            40              1             1
200                            50              1             1
250                            30              3             1
250                            40              0             0
Chart displaying the trend in Association Rules for different values of Support &
Confidence Level
Conclusion:
The number of association rules generated depends on both the minimum support and the minimum confidence. Raising
either threshold reduces the number of rules (or, at best, leaves it unchanged).
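The effect of the two thresholds can be checked against the 11 rules listed in Q1. The sketch below keeps only each rule's confidence and joint support and counts how many survive a given pair of thresholds. It assumes a rule is kept when both values are at or above the minimum; because the Q1 list was generated at support 150, it can only reproduce the rows with support 150 or higher.

```python
# (confidence %, support of body and head together) for the 11 rules from Q1
Q1_RULES = [
    (41.54, 199), (36.99, 199), (52.59, 223), (46.5, 166), (46.2, 170), (36.79, 156),
    (36.46, 299), (36.24, 299), (34.45, 165), (32.53, 175), (32.44, 266),
]

def count_rules(min_support, min_confidence):
    """Count rules whose joint support and confidence both meet the thresholds
    (assumes 'at least' semantics for both)."""
    return sum(
        1 for confidence, support in Q1_RULES
        if support >= min_support and confidence >= min_confidence
    )

for min_support, min_confidence in [(150, 30), (150, 40), (150, 50), (200, 30), (250, 30), (250, 40)]:
    print(min_support, min_confidence, count_rules(min_support, min_confidence))
```

The counts this produces match the corresponding rows of the table above, which suggests the drop in rule count comes purely from filtering the same candidate rules more strictly.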
Q4.
Clustering with K=4:
Cluster centers
Cluster    region    hhsz      age       income    money
Cluster-1  1.82582   2.764345  6.594262  2.29099   494.759126
Cluster-2  1.30693   3.400283  7.118812  5.951909  572.845096
Cluster-3  3.255893  2.417508  7.215488  4.929293  385.638049
Cluster-4  3.047858  4.695217  6.425693  5.234257  991.819418
Distance between cluster centers
           Cluster-1    Cluster-2    Cluster-3    Cluster-4
Cluster-1  0            78.1778095   109.1646525  497.0742874
Cluster-2  78.1778095   0            187.2225891  418.9811281
Cluster-3  109.1646525  187.2225891  0            606.1862753
Cluster-4  497.0742874  418.9811281  606.1862753  0
Data summary
Cluster    #Obs  Average distance in cluster  Average distance in cluster (original coordinates)
Cluster-1  506   1.68                         355.4019014
Cluster-2  704   1.477                        404.8165039
Cluster-3  563   1.347                        251.4218487
Cluster-4  413   1.868                        842.0241951
Overall    2186  1.565                        436.4733185
Elapsed Time
Overall (secs) 5.00
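The "Distance between cluster centers" values above appear to be plain Euclidean distances between the center vectors in original coordinates. The sketch below recomputes them from the K=4 cluster centers; the dictionary and function names are made up for the example.

```python
import math

# K=4 cluster centers from Q4: (region, hhsz, age, income, money)
CENTERS = {
    "Cluster-1": (1.82582, 2.764345, 6.594262, 2.29099, 494.759126),
    "Cluster-2": (1.30693, 3.400283, 7.118812, 5.951909, 572.845096),
    "Cluster-3": (3.255893, 2.417508, 7.215488, 4.929293, 385.638049),
    "Cluster-4": (3.047858, 4.695217, 6.425693, 5.234257, 991.819418),
}

def center_distance(a, b):
    """Euclidean distance between two cluster centers in original (unscaled) coordinates."""
    return math.dist(CENTERS[a], CENTERS[b])

for name in ("Cluster-2", "Cluster-3", "Cluster-4"):
    print("Cluster-1 to", name, round(center_distance("Cluster-1", name), 4))
```

Because money is measured on a scale of hundreds while the other attributes range over single digits, the money difference accounts for almost all of each distance. This is one reason to read the normalized average distances alongside the original-coordinate ones.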

Q5.
Cluster Interpretation for K = 4:
Cluster 1 Cluster 2
Region HH Size Age Income Money Region HH Size Age Income Money
2 2 8 4 178.8 2 4 9 6 542.21
2 2 2 3 504.05 1 3 10 5 393.76
3 2 6 1 519.79 2 4 4 7 308.47
1 3 11 2 151.5 1 4 4 7 272.86
1 2 8 4 968.91 1 5 11 5 1080.14
2 1 7 4 255.67 1 3 11 6 357.63
2 2 3 1 261.99 1 2 11 5 58.97
3 4 6 1 325.5 1 2 9 6 255.89
2 3 3 2 207.06 1 4 9 7 1415.03
3 3 8 2 336.2 1 4 6 6 663.75
1 2 8 2 349.32 1 6 5 5 137.99
2 3 3 2 369.47 1 5 8 7 332.97
2 2 6 2 430.5 <-AVG-> 1 4 8 6 558.5
Cluster 3 Cluster 4
Region HH Size Age Income Money Region HH Size Age Income Money
3 2 11 3 599.08 4 5 6 7 360.49
2 2 9 5 363.89 3 5 7 7 475.95
2 2 5 5 133.48 4 4 5 7 1976.27
3 2 11 5 697.97 3 4 4 5 135.91
3 2 8 4 377.35 2 6 6 6 377.32
3 2 9 7 629.69 4 6 5 5 328.7
4 3 6 4 303.48 2 5 8 5 259.82
2 2 3 5 184 3 4 3 6 158.89
4 2 4 4 131.89 3 6 7 6 1704.46
3 3 10 5 287.85 4 4 9 4 1840.72
3 2 9 4 83.87 2 6 8 5 87.44
4 3 5 7 318.92 3 4 9 7 413.18
3 2 8 5 342.62 <-AVG-> 3 5 6 6 676.595
Cluster  Region  HH Size  Age  Income  Money
c1       2       2        6    2       430.5
c2       1       4        8    6       558.5
c3       3       2        8    5       342.62
c4       3       5        6    6       676.595
Final Interpretation:
The clusters generated are clearly distinguishable from one another, mainly on household size, age, and income.
Cluster 1 members are called random buyers: their income is low, yet they still end up buying more. Cluster 2 members
are called luxury seekers: they are older people with high income who tend to spend more. Cluster 3 members are called
careful buyers: they have a middle income and a smaller family, so they plan their purchases and buy at a low
frequency. Cluster 4 members are called big buyers: they are middle-aged people with larger families and high income,
so they tend to purchase a lot and at a high frequency.
Q6.
Clustering with different values of K:
CASE-I (K=2):
Cluster Center:
Cluster    region    hhsz      age       income    child     race      connection  country   money
Cluster-1  2.25067   2.131368  6.689008  3.825738  0.229221  1.0563    0.852547    0.135389  501.111093
Cluster-2  2.277778  3.79375   7.0125    5.193055  0.996528  1.025694  0.964583    0.132639  621.830856
Distance between cluster centers
           Cluster-1    Cluster-2
Cluster-1  0            120.7418813
Cluster-2  120.7418813  0
Data summary
Cluster    #Obs  Average distance in cluster  Average distance in cluster (original coordinates)
Cluster-1  667   2.661                        355.8518258
Cluster-2  1519  2.328                        477.1282528
Overall    2186  2.429                        440.1239633
For K = 2, we have only 2 clusters. Although each cluster contains a sizeable number of observations (667 and 1519),
the clusters barely differ on most of the parameters. Hence we cannot characterize the clusters meaningfully, so K=2
is not an optimum value for k-means clustering.
CASE-II (K=4):
Cluster Center:
Cluster    region    hhsz      age       income    child     race      connection  country   money
Cluster-1  2.378882  3.062112  7.329194  4.10559   0.664596  1.024845  -0.000001   0.254658  453.632868
Cluster-2  1.528678  3.609726  7.200748  4.759352  1         1.027431  1           0.017456  574.395328
Cluster-3  2.223282  2.043893  6.664122  4.305343  0         1.057252  1           0.122137  523.652347
Cluster-4  3.125894  3.711016  6.639485  5.147353  0.997139  1.032904  1           0.247497  659.759516
Data summary
Cluster    #Obs       Average distance in cluster  Average distance in cluster (original coordinates)
Cluster-1  161        2.568                        325.6053482
Cluster-2  (missing)  (missing)                    446.4283469
Cluster-3  (missing)  (missing)                    373.9719234
Cluster-4  731        2.241                        504.745063
Overall    2186       2.141                        439.695642
CASE-III (K=10):
Cluster Center:
Cluster     region    hhsz      age       income    child     race      connection  country   money
Cluster-1   2.253623  3.221015  6.445651  2.289848  1         1         1           0         450.229973
Cluster-2   1.906542  2.051402  6.67757   2.714951  0         1         1           0         475.320936
Cluster-3   3.16      2.752727  7.861819  5.821818  1         1         1           0         450.519958
Cluster-4   1.241071  3.750001  7.174107  5.910715  0.997768  1         1           0         592.383203
Cluster-5   2.597345  2.039824  6.632743  5.69469   0         1         1           0         497.282837
Cluster-6   2.382165  3.050956  7.350318  4.127388  0.662421  1         -0.000001   0.261147  448.730053
Cluster-7   3.008097  5.004043  6.165993  5.441295  0.991903  1         1           0         507.327606
Cluster-8   2.431818  3.227273  7.272726  4.886364  0.863636  1.045455  1           0         4449.987988
Cluster-9   2.263158  2.929824  6.578947  3.982455  0.649123  2.350877  0.929825    0.157895  523.31175
Cluster-10  2.330578  3.318182  6.747934  4.747934  0.760331  1         1           0.999999  496.809785
Distance between cluster centers
            Cluster-1    Cluster-2    Cluster-3    Cluster-4    Cluster-5    Cluster-6    Cluster-7    Cluster-8    Cluster-9    Cluster-10
Cluster-1   0            25.14516654  3.95034679   142.2057919  47.20289491  2.76989798   57.21798693  3999.75895   73.11560377  46.65710299
Cluster-2   25.14516654  0            25.0839916   117.1253916  22.17393203  26.68810511  32.29608262  3974.667992  48.04070516  21.66266898
Cluster-3   3.95034679   25.0839916   0            141.8814229  46.79870162  2.86576461   56.87900439  3999.46828   72.84564216  46.33796747
Cluster-4   142.2057919  117.1253916  141.8814229  0            95.13242957  143.6746677  85.09045918  3857.605144  69.12764774  95.59415137
Cluster-5   47.20289491  22.17393203  46.79870162  95.13242957  0            48.60987357  10.5412827   3952.705562  26.14610541  2.10163247
Cluster-6   2.76989798   26.68810511  2.86576461   143.6746677  48.60987357  0            58.67013952  4001.25815   74.60411911  48.10445001
Cluster-7   57.21798693  32.29608262  56.87900439  85.09045918  10.5412827   58.67013952  0            3942.661021  16.26717947  10.76098261
Cluster-8   3999.75895   3974.667992  3999.46828   3857.605144  3952.705562  4001.25815   3942.661021  0            3926.676645  3953.178371
Cluster-9   73.11560377  48.04070516  72.84564216  69.12764774  26.14610541  74.60411911  16.26717947  3926.676645  0            26.56454999
Cluster-10  46.65710299  21.66266898  46.33796747  95.59415137  2.10163247   48.10445001  10.76098261  3953.178371  26.56454999  0
Data summary
Cluster     #Obs  Average distance in cluster  Average distance in cluster (original coordinates)
Cluster-1   270   1.603                        308.507607
Cluster-2   214   1.631                        297.5014998
Cluster-3   283   1.344                        277.938553
Cluster-4   446   1.379                        393.3044052
Cluster-5   226   1.44                         348.2440924
Cluster-6   157   2.512                        322.7320293
Cluster-7   247   1.339                        326.2300767
Cluster-8   44    3.033                        1502.255369
Cluster-9   57    3.485                        383.4960612
Cluster-10  242   2.163                        367.8065808
Overall     2186  1.685                        360.4535118
For K=10, there are too many clusters and the inter-cluster difference is minimal. The number of observations in
each cluster is too low, and because the clusters are too similar a manager cannot differentiate marketing
initiatives across the market segments identified with 10 clusters.
CASE-IV (K=6):
Cluster Center:
Cluster    region    hhsz      age       income    child     race      connection  country   money
Cluster-1  2.382165  3.050956  7.350318  4.127388  0.662421  1         -0.000001   0.261147  448.730053
Cluster-2  2.253363  2.042601  6.672646  4.242153  0         1         1           0         530.739044
Cluster-3  3.306195  3.582301  6.761062  4.920354  0.99646   1         1           0         611.6884
Cluster-4  1.414226  3.711297  7.129707  5.059972  0.998605  1         1           0         631.830167
Cluster-5  2.271186  2.966102  6.661017  3.983051  0.661017  2.338984  0.932203    0.152542  733.070973
Cluster-6  2.330578  3.318182  6.747934  4.747934  0.760331  1         1           0.999999  496.809785
Distance between cluster centers
           Cluster-1    Cluster-2    Cluster-3    Cluster-4    Cluster-5    Cluster-6
Cluster-1  0            82.02735741  162.9684465  183.1095957  284.3465278  48.10445001
Cluster-2  82.02735741  0            80.97986148  101.1176476  202.3397826  33.98039549
Cluster-3  162.9684465  80.97986148  0            20.23468212  121.4001728  114.8877857
Cluster-4  183.1095957  101.1176476  20.23468212  0            101.2635411  135.0288767
Cluster-5  284.3465278  202.3397826  121.4001728  101.2635411  0            236.2680567
Cluster-6  48.10445001  33.98039549  114.8877857  135.0288767  236.2680567  0
Data summary
Cluster    #Obs  Average distance in cluster  Average distance in cluster (original coordinates)
Cluster-1  157   2.512                        322.7320293
Cluster-2  446   1.787                        376.2517197
Cluster-3  565   1.737                        470.6028622
Cluster-4  717   1.77                         480.9731203
Cluster-5  59    3.647                        664.8660267
Cluster-6  242   2.163                        367.8065808
Overall    2186  1.912                        437.9971766
Between K=4 and K=6, K=4 is the better choice for clustering, because it yields more disparate clusters. For example,
with K=4 there is a significant difference between clusters based on the education level, besides other variables;
this distinction was not as evident with K=6. Also, with 4 clusters the number of persons per cluster is sizeable, and
since the inter-cluster difference is higher across more input variables, K=4 should be the optimal value for K-Means
clustering.
Conclusion:
To see what other customer segments are possible, we formed clusters with different values of K.
The following table summarizes our analysis for each value of K.
Value of K  Average distance b/w cluster centers  Average distance b/w cluster members               Interpretability
            (High: clusters are far apart)        (High: members of the same cluster are far apart)
2           Highest (best)                        Highest                                            Difficult to interpret and classify
4           Higher                                Higher                                             Easy to classify and categorize
6           Lower                                 Lower                                              Convenient to categorize
10          Lowest                                Lowest (best)                                      Difficult to categorize
Conditions to have unique clusters:

• High value of Inter-cluster distance that implies the clusters have many distinct features.
• Lower value of Intra cluster distance so that the members inside the cluster have high degree of homogeneity.
• Interpretability should be very good so that we can understand and categorize the cluster and plan business
strategies accordingly, concentrating on the key features of that particular cluster.
Q7.
Let us select the following association rule for interpretation:
Landsend.com => Llbean.com
Cluster centers
Cluster    region    hhsz      age       income    child      connection  money
Cluster-1  1.833333  3         8         4.5       0.5        -0.000001   797.148331
Cluster-2  1.117647  3.517647  7.188235  5.541177  1          1           949.315413
Cluster-3  2.128205  1.974359  7.435898  5.282051  -0.000002  1           619.080895
Cluster-4  2.855073  3.84058   8.478263  5.985508  1          1           910.932895
Distance between cluster centers
           Cluster-1    Cluster-2    Cluster-3    Cluster-4
Cluster-1  0            152.17948    178.0767543  113.8084488
Cluster-2  152.17948    0            330.2413791  38.44739584
Cluster-3  178.0767543  330.2413791  0            291.8632939
Cluster-4  113.8084488  38.44739584  291.8632939  0
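Data summary
Cluster    #Obs       Average distance in cluster  Average distance in cluster (original coordinates)
Cluster-1  (missing)  (missing)                    567.8205094
Cluster-2  (missing)  (missing)                    740.5398826
Cluster-3  (missing)  (missing)                    401.5736355
Cluster-4  (missing)  (missing)                    521.1593559
Overall    199        1.661                        593.9374922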
Elapsed Time
Overall (secs) 2.00
Cluster Interpretation:
Cluster_1
region hhsz age income child connection money
1 3 7 6 1 1 863.69
1 4 10 6 1 1 723.31
1 3 10 6 1 1 666.65
1 2 7 5 0 0 1065.07
3 2 8 5 1 0 491.12
2 4 7 7 0 0 635.91
1 2 9 3 0 0 2232.66
3 5 8 6 1 0 298.95
1 3 9 1 1 0 59.18
3 8 6 Both No Connection
Cluster_2
region hhsz age income child connection money
1 3 7 6 1 1 863.69
1 4 10 6 1 1 723.31
1 3 10 6 1 1 666.65
1 4 4 5 1 1 1103.43
1 3 10 6 1 1 1239.79
1 4 4 5 1 1 757.2
1 3 7 6 1 1 1485.98
1 4 8 6 1 1 319.34
1 4 7 6 1 1 259.9
1 3 5 5 1 1 331.22
1 3 7 5 1 1 188
1 3 2 5 1 1 1146.05
1 3 10 5 1 1 1644.13
1 3 6 6 1 1 140.09
1 3 5 5 1 1 201.49
1 4 8 7 1 1 1094.31
1 4 6 7 1 1 1024.22
1 4 6 7 1 1 1058.38
1 4 9 7 1 1 904.3
1 4 5 7 1 1 1043.41
1 4 5 7 1 1 1110.67
1 3 5 7 1 1 1030.66
2 4 9 5 1 1 921.19
1 3 10 6 1 1 137.76
1 3 7 5 1 1 1845.37
1 4 5 to 7 5 to 7 Have Children Have Connection 850
Cluster_3
Region hhsz age income child connection money
2 2 7 6 0 1 391.91
2 2 11 5 0 1 870.21
3 2 5 5 0 1 370.31
2 3 6 6 0 1 208.97
3 2 6 6 0 1 156
2 2 8 7 0 1 215.32
1 2 7 5 0 1 568.45
1 2 9 5 0 1 752.38
2 2 6 7 0 1 1182.52
1 2 6 6 0 1 458.29
1 2 6 6 0 1 436.83
3 1 6 6 0 1 730.34
1 2 11 5 0 1 987.32
2 1 4 7 0 1 485.91
3 2 10 7 0 1 854.27
3 2 10 7 0 1 373.19
3 2 11 7 0 1 591.92
1 2 7 5 0 1 1532.77
3 2 5 7 0 1 161.87
3 2 8 7 0 1 1270.35
3 2 5 7 0 1 62.45
1 2 6 7 0 1 249.43
1 2 9 7 0 1 144
1 2 11 7 0 1 343.32
3 2 9 3 0 1 208
1,2,3 2 7 to 11 7 no child Have Connection 500
Cluster_4
region hhsz age income child connection money
3 4 8 7 1 1 604.5
3 3 11 6 1 1 906.17
3 3 9 6 1 1 303
3 3 8 7 1 1 806.9
3 4 4 7 1 1 882.37
3 3 9 7 1 1 738.83
3 4 5 7 1 1 1363.64
3 4 6 5 1 1 304.83
3 3 10 7 1 1 766.99
3 3 6 7 1 1 831.93
3 3 6 7 1 1 611.97
2 4 9 5 1 1 921.19
2 4 9 7 1 1 842.53
3 3 9 7 1 1 1457.28
3 3 5 7 1 1 679.29
3 3 9 5 1 1 260.82
2 4 6 6 1 1 270.83
3 5 7 5 1 1 610.05
2 4 7 7 1 1 460.74
3 3 3 6 1 1 398
3 3 5 6 1 1 1729.28
3 5 9 7 1 1 1394.86
2 3 9 7 1 1 999.94
3 5 6 7 1 1 471.83
2 5 9 6 1 1 779.66
2,3 4 7 to 9 5 to 7 Have Children Have Connection 800
Conclusion
Compared with the clusters generated in Part 2, these clusters are more similar to one another and are distinguished
by fewer attributes. Because fewer attributes define them, the customer segments are easier to narrow down than those
in Part 2. These clusters therefore help the company concentrate on the specific segments of interest for the chosen
association rule, and so pave the way for better business intelligence strategies. As the cluster summaries show,
members of these clusters are middle-aged or older people with a good income level, and the differentiating factors
are household size, the presence of children, and internet connection.
Q8.
Association Rule on a cluster from Part 2:

Analyzing a particular cluster: Cluster_2
Case i:
Data
Input Data Cluster2!$N$1:$R$87
Minimum Support 15
# Rules 12
If customers buy from landsend.com they tend to buy from llbean.com also, and the confidence of this rule is 100%.
Rule #  Conf. %  Antecedent (a) => Consequent (c)            Support(a)  Support(c)  Support(a U c)  Lift Ratio
1       100      oldnavy.com => landsend.com, llbean.com     16          86          16              1
2       100      oldnavy.com => landsend.com                 16          86          16              1
3       100      landsend.com => llbean.com                  86          86          86              1
4       100      llbean.com => landsend.com                  86          86          86              1
5       100      llbean.com, oldnavy.com => landsend.com     16          86          16              1
6       100      landsend.com, oldnavy.com => llbean.com     16          86          16              1
7       100      oldnavy.com => llbean.com                   16          86          16              1
8       100      jcpenney.com => llbean.com                  24          86          24              1
9       100      jcpenney.com => landsend.com                24          86          24              1
10      100      jcpenney.com => landsend.com, llbean.com    24          86          24              1
11      100      jcpenney.com, llbean.com => landsend.com    24          86          24              1
12      100      jcpenney.com, landsend.com => llbean.com    24          86          24              1
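Case ii:
Data
Input Data Sheet1!$M$1:$R$87
Minimum Support 25
# Rules 2
Rule #  Conf. %  Antecedent (a) => Consequent (c)            Support(a)  Support(c)  Support(a U c)  Lift Ratio
1       100      llbean.com => landsend.com                  86          86          86              1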
Case iii:
Data
Input Data Sheet1!$M$1:$R$87
Minimum Support 20
# Rules 7
Rule #  Conf. %  Antecedent (a) => Consequent (c)            Support(a)  Support(c)  Support(a U c)  Lift Ratio
1       100      llbean.com => landsend.com                  86          86          86              1
2       100      jcpenney.com => llbean.com                  24          86          24              1
3       100      jcpenney.com => landsend.com                24          86          24              1
4       100      jcpenney.com => landsend.com, llbean.com    24          86          24              1
5       100      jcpenney.com, llbean.com => landsend.com    24          86          24              1
6       100      jcpenney.com, landsend.com => llbean.com    24          86          24              1
Analysis of Association Rules generated:
The association rules generated all have a 100% confidence level, irrespective of the support level, and a constant
Lift Ratio of 1. This was not the case for the rules generated in Part 1. This implies that we are looking at rules
that are sensible and pertain to the company of our interest, which is Landsend.com. It appears to be losing its
customers to llbean.com. Also, these rules are based on one specific company, whereas the rules in Part 1 were based
on a huge dataset with no reference to a specific company; they simply denoted the buying behavior of customers in
general.
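The constant lift of 1 can be traced to the way this subset was built. The sketch below assumes the cluster holds 86 customers (the input range has 87 rows including a header) and that every one of them bought from both landsend.com and llbean.com, as the Support(c) column of 86 suggests. The function name `lift` is made up for the example.

```python
CLUSTER_SIZE = 86  # assumption: rows 2..87 of the input range, one customer per row

def lift(support_body, support_head, support_both, total=CLUSTER_SIZE):
    """Lift = confidence of the rule divided by the relative frequency of the head."""
    confidence = support_both / support_body
    return confidence / (support_head / total)

# oldnavy.com => landsend.com: all 16 oldnavy.com buyers are among the 86 landsend.com buyers
print(lift(support_body=16, support_head=86, support_both=16))
# jcpenney.com => llbean.com: all 24 jcpenney.com buyers are among the 86 llbean.com buyers
print(lift(support_body=24, support_head=86, support_both=24))
```

When the head occurs in every transaction of the subset, any body gives 100% confidence and the baseline frequency of the head is also 100%, so the ratio is 1 for every rule. Comparing these lifts with the Part 1 lifts should take that construction into account.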
Q9.
Business Intelligence discovered for the firm Landsend.com &
Recommendations:
The shop landsend.com might be losing its customers to llbean.com. This is based on the XLMiner
report that we have generated. But the good thing is that Landsend.com is also slowly gaining
customers from oldnavy.com. These are some of the BI recommendations for Landsend.com:
1. Landsend should concentrate on products and offerings/promotions that match the
preferences of its target customer segment. It should reach the target segments at the
right time.
2. They should try to gain or attract other segments. Landsend.com can have attractive
offers for teens and children, as these are not among its current target
segments.
3. They must have an easy-to-navigate, user-friendly website with a simply designed
shopping cart. Users should be able to check out with fewer clicks.
4. They can also provide better customer service through 24x7 customer support centers
that assist people with online purchases.
5. They can try different marketing techniques, such as e-mail and snail-mail offers, to create
awareness among existing customers and customers who are about to leave.
6. They must also identify major competitors and launch an appropriate marketing
campaign to differentiate themselves and target their customer base.
7. Landsend should better understand the expectations of unhappy customers and
decide on business strategies accordingly. They can conduct store or online surveys to
understand customer behavior. They can even employ BI techniques to implement
product placements for a better customer shopping experience.
