Discovery
Richard J Bolton, PhD
Associate Director,
Strategic Consulting & Analytics
KnowledgeBase Marketing
Richard.Bolton@kbm1.com
Copyright © 2007, SAS Institute Inc. All rights reserved.
Overview
History
Pattern discovery has a long history in
epidemiology through spatial patterns
(Cholera outbreak at water pump – Snow 1854)
Patterns
A data-oriented definition of a pattern
We want the definition to capture the ideas that patterns
• Are localized events, distinct from some global,
normalized view of the world
• Result from a deterministic data generating mechanism
(they represent something that is not noise)
• Have varying degrees of ‘interestingness’
We can use this definition of a pattern to develop
a framework for pattern discovery
What is a pattern?
A pattern is a local structure
A pattern generates data with an anomalously high density
compared with that expected under some (global) baseline model
Baseline represents our beliefs/expectations in the system
[Diagram: scale running from Global to Local]
• Global: describes all data
• Segmentation: divide data
• Patterns: local search
• Outliers
A Statistical Framework
Start with traditional view of data
• Data = Model + Noise
• Data = Global structural part + Random part
Refine
• Data = Global structural part + local structural parts +
random part
• Data = Baseline + patterns + noise
Unsupervised vs Supervised pattern discovery
• Unsupervised: Local structures with unusually high
densities
• Supervised: Local structures with unusually high values
of Y variable (supervisor)
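The decomposition Data = Baseline + patterns + noise can be illustrated with a small simulation. This is a minimal sketch, not the method used in the talk: it assumes a uniform baseline on the unit square, a fixed 10 x 10 grid, a Poisson model for cell counts, and a Bonferroni-adjusted threshold. The names `poisson_sf` and `dense_cells` and all simulated values are hypothetical.

```python
import random
from collections import Counter
from math import exp


def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu), by direct summation of the pmf."""
    if k <= 0:
        return 1.0
    term = exp(-mu)          # P(X = 0)
    cdf = term
    for i in range(1, k):    # accumulate P(X = 1) ... P(X = k - 1)
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def dense_cells(points, bins=10, alpha=0.05):
    """Grid cells whose count is anomalously high under a uniform baseline.

    Assumption: the baseline expects the same count in every cell.
    """
    counts = Counter(
        (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))
        for x, y in points
    )
    mu = len(points) / bins ** 2      # expected count per cell under baseline
    threshold = alpha / bins ** 2     # Bonferroni adjustment over all cells
    return sorted(c for c, n in counts.items() if poisson_sf(n, mu) < threshold)


rng = random.Random(0)
# Baseline: points spread uniformly over the unit square.
baseline = [(rng.random(), rng.random()) for _ in range(2000)]
# Pattern: a small local structure centred at (0.75, 0.25).
pattern = [(rng.gauss(0.75, 0.02), rng.gauss(0.25, 0.02)) for _ in range(60)]

flagged = dense_cells(baseline + pattern)
print(flagged)
```

Only cells that depart from the baseline are reported, so most observations are left unallocated, which is what distinguishes this kind of local search from clustering.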
Unsupervised Pattern
Discovery
Unsupervised methods, from global to local
• Global (Full Data): Fit single distribution to full data
• Global (Divisive): Cluster analysis. Allocates each observation to a cluster
• Local: Pattern search. Find regions of local high density compared to a
baseline (not all observations allocated)
• Outlier: Outlier detection. Find singletons distant from (baseline) global
distribution or clusters
Example
Earthquakes in California
[Figure: earthquakes in California (plot not recovered). Surviving labels:
"Expected baseline", "mechanisms", "domain knowledge"]
Catalog mailings at Valentine’s, Easter, Mother’s
Day and Christmas = baseline
We discovered purchasing
patterns at Father’s Day and
Halloween
Led to catalog being designed
and mailed for Halloween
season → increase in sales
[Figure: chart with points labeled E, M, F, H (plot not recovered)]
Association Rules
Market Basket Analysis
• Association Analysis, Product Affinity Analysis,
Recommender Engines, etc.
Find ‘interesting’ rules or associations
Naïve global baseline model is independence
• Under independence: Pr(Eggs and Ham) = Pr(Eggs) × Pr(Ham)
• Assess observed O(Eggs and Ham) vs expected E(Eggs and Ham |
Baseline)
• Various measures of association (some with
probabilistic meaning, some just based on frequency)
End up with ranking of ‘most associated’ products
Usually need to find some additional way of
ranking/filtering patterns according to what
makes them ‘interesting’
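The comparison of observed against expected co-occurrence under the independence baseline can be sketched as follows. Lift (observed divided by expected) is used here as one common measure of association; the slides do not say which measures were used. The function name `lift_table` and the basket data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def lift_table(baskets):
    """Rank item pairs by lift = observed / expected under independence."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(set(b)), 2)
    )
    rows = []
    for (a, b), c in pair_counts.items():
        observed = c / n                                         # O(a and b)
        expected = (item_counts[a] / n) * (item_counts[b] / n)   # Pr(a) x Pr(b)
        rows.append((a, b, observed, expected, observed / expected))
    return sorted(rows, key=lambda r: r[4], reverse=True)


# Hypothetical baskets, for illustration only.
baskets = [
    {"eggs", "ham"},
    {"eggs", "ham", "bread"},
    {"eggs", "ham"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "milk", "eggs"},
    {"ham", "eggs", "milk"},
    {"bread"},
]

rows = lift_table(baskets)
for a, b, obs, exp_, lift in rows:
    print(f"{a:5s} {b:5s} O={obs:.3f} E={exp_:.3f} lift={lift:.2f}")
```

In this toy data eggs and ham appear together in 4 of 8 baskets against an expected 0.3125 under independence, giving a lift of 1.6. A ranking by lift alone still needs the additional filtering described above, since rare item pairs can produce a high lift from very few baskets.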
Supervised Pattern
Discovery
Supervised methods
• Regression: E.g. Linear/logistic
• CART/CHAID: Data split into segments, driven by a supervising variable.
Each observation associated with a split. ‘Top down’ approach
• Supervised pattern search: Find regions where supervisor variable has
locally high values. ‘Bottom up’ approach
• Outlier detection: Unusually high (or low) values of supervisor.
May affect regression fit
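Supervised pattern search can be illustrated with a simplified scan over fixed bins of a single predictor. This sketch is an assumption-laden stand-in, not the search procedure from the talk: it takes the global mean of the supervisor Y as the baseline and flags bins whose local mean is far above it, using a z-score cut-off. The name `high_response_bins`, the cut-off of 3, and the simulated response rates are all hypothetical.

```python
import random
from math import sqrt


def high_response_bins(xs, ys, bins=10, z_cut=3.0):
    """Bins of x in [0, 1) where the mean of y is unusually high.

    Baseline assumption: every bin shares the global mean of y.
    Returns (bin index, local mean, z-score) for each flagged bin.
    """
    n = len(ys)
    global_mean = sum(ys) / n
    global_var = sum((y - global_mean) ** 2 for y in ys) / (n - 1)

    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(min(int(x * bins), bins - 1), []).append(y)

    found = []
    for b, vals in sorted(groups.items()):
        local_mean = sum(vals) / len(vals)
        # Standardised gap between the local mean and the global baseline.
        z = (local_mean - global_mean) / sqrt(global_var / len(vals))
        if z > z_cut:
            found.append((b, local_mean, z))
    return found


rng = random.Random(1)
xs = [rng.random() for _ in range(5000)]
# Simulated supervisor: response rate 5% everywhere except 30% on [0.6, 0.7).
ys = [1 if rng.random() < (0.30 if 0.6 <= x < 0.7 else 0.05) else 0 for x in xs]

found = high_response_bins(xs, ys)
for b, local_mean, z in found:
    print(f"bin {b}: local mean {local_mean:.3f}, z = {z:.1f}")
```

Unlike a tree, which assigns every observation to a segment, this scan reports only the regions that stand out and leaves the rest of the data described by the baseline.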
Challenges in Pattern
Discovery
Thank You