
Chapter: Introduction to Data Warehousing and Data Mining

Introduction to
Data Mining

(Many slides in this section are used courtesy of Carrig Emerging Technologies, Ph: 410-553-6760, www.carriget.com)

Introduction to Data Mining

• What is data mining
• Need a data warehouse
• Need to clean up data for mining
• Lots of algorithms


Data Mining Introduction
• Data Mining examines a database and looks for
patterns in the data

• A data warehouse by itself will respond to queries from users
– It will not tell users about patterns in data that users may not have
thought about
– To find patterns in data, data mining is used to try and mine key
information from a data warehouse

Typical Data Architecture
[Diagram: Finance, Inventory, and Sales OLTP applications feed their data into an Enterprise Information System (the Data Warehouse), which is organized into Finance, Inventory, and Sales subject areas and in turn feeds regional data marts such as New York and California.]


Advantages of Data Mining
• Data mining allows companies to exploit
information and use this to obtain competitive
advantage.

• Data mining helps identify trends such as:


– why customers buy certain products
– ideas for highly targeted direct marketing
– ideas for shelf placement
– training of employees vs. employee retention
– employee benefits vs. employee retention

Implementing Data Mining
• Apply data mining tools to run data mining
algorithms against data
• There are two approaches:
– Copy data from the Data Warehouse and mine it
– Mine the data within the Data Warehouse

• Popular tools use a variety of different data mining algorithms:
– regression
– association rules
– genetic algorithms
– decision trees
– neural networks
– clustering
– classification


Data Mining using Separate Data
• You can move data from the data warehouse to
data mining tools
– Advantages
– Data mining tools may organize data so they can run faster
– Disadvantages
– Could be very expensive to move large amounts of data

[Diagram: the Data Mining Tool makes its own copy of the data from the Data Warehouse.]

Data Mining Against the Data Warehouse

• Data mining tools can access data directly in the Data Warehouse
– Advantages
– No copy of data is needed for data mining
– Disadvantages
– Data may not be organized in a way that is efficient for the tool

[Diagram: the Data Mining Tool reads directly from the Data Warehouse.]


Application Areas
• Database Marketing
• Time-series prediction
• Detection
• Probability Estimation
• Information compression
• Sensitivity Analysis

Database Marketing
• Response Modeling
– Improve customer response by targeting prospects that are predicted as most likely to respond to a particular advertisement or promotion (customer relationship management)
• Benefits
– Improved Return on Investment (ROI)
– Improved inventory control and supply-side management
– Improved customer relationships and retention
• Lift Chart
– A lift chart is often used to visualize and measure response
modeling performance.
– Shows how careful selection of customers ensures good response

10


Lift Chart

[Lift chart: cumulative response (y-axis, 0-100) vs. percent of customers contacted (x-axis, 0-90); the "careful" (model-based) selection curve lies well above the "random" selection line.]

11

Cross Selling
• Leverage existing customer base by selling them
additional products or services
• Similar to response modeling but only target
existing customers
• Combine historical purchase data with demographic data
– Lifestyle
– Credit reports
– Purchases of other products
• Targeted marketing
– Only contact those customers that look promising
– Create specialized mailing lists

12


Cross Selling
• Input
– Purchase history
– Purchase date
– Amount of purchase
– Number of purchases
– Products purchased
– Demographics
– Age
– Gender
– Household size
– Other
– Did customer request information
– Number of solicitations sent
– How customer heard of the company
• Output
– Measure of how “approachable” the customer would be for a given
product

13

Time-Series Prediction
• Time series
– Sequence of numerical vectors collected over time
– Trajectory of a baseball
– Stock prices
• Estimate one or more values appearing later in
the sequence
• Important aspect is you have to discard a lot of
useless data.

14


Detection
• Identify the existence or the occurrence of a condition
• Fraud detection
– May have only a few cases of fraud, have to find out what is different
about these transactions
• Intrusion Detection
– Find patterns that indicate when an attack is made on a network
• Need to compute the cost of missing a fraud and the cost of
falsely accusing someone of fraud.
• Can define classes of a fault. For baking cookies you might
have:
– Good-no defect
– Broken
– Underbaked
– Overbaked
– Too small
– Too few chips
15

Others
• Probability Estimation
– Approximate the likelihood of an event given an observation.
• Information Compression
– Can be viewed as a special type of estimation problem. For a given set of data, estimate the key components that can be used to construct the data.
• Sensitivity Analysis
– Understand how changes in one variable affect others.
– Identify the sensitivity of one variable to another (find out if dependencies exist).

16


Training Data vs. Test Data
• Training data has known results and is used to build a model that makes predictions on test data (where results are unknown).
• Steps:
– Step 1: Split training data into two halves (A & B)
– Step 2: Build model against A
– Step 3: Run model on B
– Step 4: See how accurate model is
– Step 5: If accurate, use it, if not return to Step 1

AGE SALARY BUYS MAGAZINE


20 30,000 Y
25 35,000 Y
30 40,000 N
35 45,000 Y
40 50,000 N
45 55,000 N

17

Typical Use of Training Data
Steps:
1. Split the data (AGE, SALARY, BUYS MAGAZINE) into halves:
   A: (20, 30,000, Y), (40, 50,000, N), (35, 45,000, Y), (45, 55,000, N)
   B: (25, 35,000, Y), (30, 40,000, N)
2. Make a model from "A", e.g.: IF AGE >= 20 AND AGE <= 40 AND SALARY >= 30,000 AND SALARY <= 45,000 THEN BUYS MAGAZINE
3. Apply it to "B"
4. It works 2 out of 3 times: 33 percent error
5. If not accurate enough, back to step 1
18
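A minimal Python sketch of the holdout procedure above (the data and the rule come from the slide; the even/odd split used to form A and B is an illustrative assumption):

# Split known data into halves A and B, build a rule on A, measure its error on B.
records = [  # (age, salary, buys_magazine)
    (20, 30000, 'Y'), (25, 35000, 'Y'), (30, 40000, 'N'),
    (35, 45000, 'Y'), (40, 50000, 'N'), (45, 55000, 'N'),
]
train_a = records[::2]   # half "A" used to build the model
test_b = records[1::2]   # half "B" used to evaluate it

def rule(age, salary):
    # model "learned" from A (hand-coded here to mirror the slide's rule)
    return 'Y' if 20 <= age <= 40 and 30000 <= salary <= 45000 else 'N'

errors = sum(1 for age, sal, buys in test_b if rule(age, sal) != buys)
print(f"error rate on B: {errors / len(test_b):.0%}")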


Decision Trees
• Used to represent a sequence of decisions to
arrive at a predicted result
• More general than matrices or flat tables
[Diagram: a decision tree. The root tests SALARY > 50,000; interior nodes test conditions such as AGE < 40, AGE > 20, AGE > 30, AGE < 30, AGE > 50, AGE < 20, and AGE > 25; several leaves conclude BUYS MAGAZINE.]

19

Prune the Tree
• Pruning the Tree means to analyze a portion of
the Decision Tree.

[Diagram: the portion of the tree rooted at AGE > 20, with branches AGE < 30 and AGE < 20, a BUYS MAGAZINE leaf, and an AGE > 25 node.]

20


Pros and Cons of Decision Trees

• Pro
– Easy to understand why they make decisions

• Con
– Can spend your life tuning the Decision Trees
– Can take a long time to build

21

Neural Networks
• Neural Networks

[Diagram: a network of nodes arranged as an input layer, hidden layers, and an output layer.]

22


How a Node Works

[Diagram: a node receives incoming activations a0, a1, a2 over edges with weights w0, w1, w2. The weighted total is WT = (a0)(w0) + (a1)(w1) + (a2)(w2). The threshold determines the outgoing activation: if WT > 0, Output = 1; otherwise Output = 0.]

23

How a Node Works (cont.)

Legend: Brown = 1, Blue = 2
WT = (a0)(w0) + (a1)(w1) + (a2)(w2); if WT > 0, Output = 1; otherwise Output = 0

Node 0 (w0 = -0.6, w1 = 0.1, w2 = -0.7): Age = 24 (a0), Hair = Brown (a1 = 1), Eyes = Blue (a2 = 2)
WT0 = (24)(-0.6) + (1)(0.1) + (2)(-0.7) = -15.7, so Output = 0

Node 1 (w0 = 0.5, w1 = 0.2, w2 = 0.4): Age = 30 (a0), Hair = Brown (a1 = 1), Eyes = Blue (a2 = 2)
WT1 = (30)(0.5) + (1)(0.2) + (2)(0.4) = 16, so Output = 1

24
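A short Python sketch of such a threshold node, reproducing the two worked examples above (Brown = 1, Blue = 2, weights as on the slide):

def node_output(activations, weights):
    # weighted total WT = sum(a_i * w_i); threshold: output 1 if WT > 0, else 0
    wt = sum(a * w for a, w in zip(activations, weights))
    return wt, 1 if wt > 0 else 0

# Node 0: Age=24, Hair=Brown(1), Eyes=Blue(2), weights (-0.6, 0.1, -0.7)
print(node_output([24, 1, 2], [-0.6, 0.1, -0.7]))   # about (-15.7, 0)

# Node 1: Age=30, Hair=Brown(1), Eyes=Blue(2), weights (0.5, 0.2, 0.4)
print(node_output([30, 1, 2], [0.5, 0.2, 0.4]))     # (16.0, 1)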


Neural Networks
• Back Propagation

[Diagram: a network with an input layer, hidden layers, and an output layer.]

25

Neural Networks

• Pros
– There are lots of Neural Network algorithms and some are very
accurate
– OCR (Optical Character Recognition) is a big success

• Cons
– Does not explain why it makes a decision
– Back propagation is a very computationally expensive process

26


K-Nearest Neighbor
• Idea is that related items are very similar in
values.

[Scatter plot of SALARY (y-axis, 10-40) against AGE (x-axis, 10-25); nearby points form groups with similar values.]
27
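A minimal k-nearest-neighbor sketch in Python; the training points, query, and k = 3 are invented for illustration, and in practice AGE and SALARY would first be normalized to a common scale (see the normalization slides later):

import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of ((age, salary), label); classify query by majority vote
    # among the k closest training points (Euclidean distance)
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((15, 20), 'buys'), ((17, 25), 'buys'), ((22, 38), 'no'), ((24, 40), 'no')]
print(knn_predict(train, (16, 22)))   # 'buys'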

Research in Data Mining
• Research is on-going
– Knowledge and Data Discovery
– VLDB (conference on very large databases)
– SIGMOD (special interest group on management of data)
– CIKM (conference on information and knowledge management)
– Conference on Data and Knowledge Engineering

28


Products
• Large Scale
– A few big ones, very expensive
– Oracle (Darwin)
– IBM (Intelligent Miner)
– SAS (Enterprise Miner)
– SPSS (Clementine)
– Silicon Graphics (Mindset), more on visualization
• Small Scale
– Lots of these (desktop data mining)
– Pattern Recognition Workbench
– CART, etc.

29

Data Mining: Summary
• Data mining attempts to find patterns in data that
we did not know about
• Often data mining is just a new buzzword for
statistics
• Data mining differs from statistics in that large
volumes of data are used
• Many different data mining algorithms exist and
we will discuss them in the course
• Examples
– identify users who are most likely to commit credit card fraud
– identify which attributes of a person best predict whether they buy product x.
30


Overview
Data Preparation for Data Mining

31

Overview
• Development Process
• Defining the Problem
• Collecting data
• Preparing data

32


Overview of the Process

[Diagram: Define the Problem → Choose Algorithm → Develop Model with Training Data → Evaluate the Model → Repeat the Whole Process.]
33

Define the Problem
• What problems are suitable
• Evaluating the results
• What are the inputs and the outputs

34


Suitability of the Problem
• Don’t Use
– If there is a nice formula to answer the question (how long will it
take light to travel from Mars to Pluto?)
• Use
– No good existing solution (nothing to lose)
– Lots of historical data is lying around
– Problem is not easily understood
– Problem can be characterized as input-to-output relationship
– Existing models have lots of problems

35

Modeling the Error
• Sometimes a known technique results in a certain
amount of error.
• If data mining can be used to estimate the error,
this can be useful.
• You can then use data mining in conjunction with
a known technique.

36


Evaluate Results
• Level of accuracy needed
• What level of accuracy defines success
• How will we benchmark performance
• What other techniques will be tested
• What kind of data will be used to evaluate the
models.

37

Classification or Estimation
• Classification says we need to categorize data
(good customer, bad customer, etc.)
• With estimation, we want to produce an estimate of how good a customer will be.
• Very fine line.

38


Examples of Classification
• Classification
– Predicting likelihood of a prospect responding to a marketing offer
– Automatically classifying bitmaps of handwritten characters (A-Z).
– Identifying cookies as “tasty”, “burnt”, “hopeless”
– Classifying bank mortgage as “good risk” or “bad risk”

39

Examples of Estimation
• Predict the amount of money a customer will
spend in next six months.
• Estimate the market price for an option based on
time to expiration.
• Estimate the highest temperature for Chicago in
the winter of 2008.
• Estimate who will win the super bowl.

40


Overlap of Classification and Estimation
• We could either classify cookies as “excellent”,
“bad”, “tasty”, “hopeless” or we could estimate
their “quality value” on a scale of 1-10 with 1
being “hopeless” and 10 being “excellent”.
• In general classification is better if there is a
natural grouping of classes.

41

Rules to Choose Between Classification and Estimation
• Is there a finite number of classes?
• Classes should be independent, such that "class 1" is not any closer to "class 2". Distinguishing "young", "teenage", "middle-aged" is really not a good classification problem (too easy to miss and land in the wrong class).
• Good classes might be to predict someone's occupation (e.g., farmer, baseball player, janitor, educator).
• Discrete or continuous output: if the output is continuous, estimation is probably better.

42


Input and Output
• For modeling, the output is the dependent
variable
• For classification, the output is the class assigned to the input.
• Start with small number of inputs and small
number of outputs.
• Most data mining is time consuming because key
data is often missing or inaccurate.
• Inputs should be causally related to the output.
– E.g. not like the “super bowl effect” where if the NFC wins, stock
prices rise.

43

Collecting Data
• What data to collect
– Collect example patterns
– Use recent data if system changes
– Start with a small set of data

44


How to Collect Data

• Choose an appropriate sample rate for time series


• Make sure data measurements are consistent (can’t
use some in meters and some in yards)
• Leave out non-essential input variables
• Make sure variables that vary have different values in
your training set
• Make sure no major physical changes have occurred
since you last collected data (e.g. new baseball stadium
would affect baseball predictions).

45

How Much Data Is Enough?
• Test by training on a small subset and compare
with a large subset.
• If the model does not change you might be done.
• No good answer, it turns out to be very hard to
compute how much data is needed for most data
mining algorithms.
• Note that sampling is often an effective
technique, but if a large dataset is available, data
mining algorithms are becoming scalable enough
so that sampling is not a necessity.

46


Preparing Data
• Very few organizations have “clean” data.
• Data entry systems often lack integrity rules, or many different systems are used to populate a data mining input set.

47

Preparing Data
• Handling missing data
• Choosing numerical encodings (transforming
categorical data)
• Discarding erroneous data

48


Handling Missing Data
• Manually examine missing data and enter
reasonable values
• Use automated methods to fill in the missing
values.
– Average
– Mode
• Encode the fact that the value is missing: instead of just A, B, C you have A, B, C together with A-missing, B-missing, and C-missing indicators.

49

Example

Initial Input          Preprocessed Input

A    B    C            A'   AM   B'   BM   C'   CM
100  0    120          100  0    0    0    120  0
?    1    140          0    1    1    0    140  0
97   ?    200          97   0    0    1    200  0
96   2    ?            96   0    2    0    0    1
98   1    160          98   0    1    0    160  0

AM indicates whether or not A is a missing value.


50
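A small plain-Python sketch of the encoding in the table above (a missing entry becomes 0 plus a 0/1 missing indicator, mirroring the A', AM, B', BM, C', CM columns):

def encode_missing(rows):
    # rows: lists of values where None marks a missing entry
    encoded = []
    for row in rows:
        out = []
        for v in row:
            out.extend([0 if v is None else v, 1 if v is None else 0])
        encoded.append(out)
    return encoded

rows = [[100, 0, 120], [None, 1, 140], [97, None, 200], [96, 2, None], [98, 1, 160]]
for r in encode_missing(rows):
    print(r)   # second row prints [0, 1, 1, 0, 140, 0], as in the table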


Transforming Data
• Possible to transform categorical data into numerical input values
– Manual
– PCA (principal component analysis) reduction
– Uses eigenvectors to "represent" a vector space

51

Inconsistent Data
• Display input data in a graph to visually check for
outliers
• Thresholding to ensure that data falls within
certain bounds
• Calculate mean and standard deviation to look for
outliers
• Winsorize data
– Set all values below a minimum threshold to the min value
– Set all values above a maximum threshold to the max value

52


Details on Preprocessing

53

Overview

• Need for preprocessing


• Averaging
• Thresholding
• Reducing the input space
• Normalizing
• Modifying prior probabilities

54


Need for Preprocessing

• Input space is large
– If you have 10 variables with only 10 possible values each, you have 10^10 = 10,000,000,000 possible different combinations. Need lots of training data to find meaningful patterns.
• Often data mining can benefit from feature extraction
– Instead of just having words, perhaps identify key features such as Person, Organization, etc.
• Data often has missing values; need to fill them in.
• Normalize data (many sources might use different units of measure)

55

Averaging
• If the data is primarily time series data, a good
method is to average several data points at a
time.
• For every 5 points simply use as input the
average of each 5.
Time Value
T1 100
T2 200
T3 300
T4 400
T5 500

• Preprocessed: T1-T5: 300

56


Thresholding
• For an image you might change all values of "dark gray" to black and all values of "very light gray" to white.

57

Reducing the Input Space
• Principal Component Analysis
– Factor analysis that reduces the number of input variables
– Attempts to identify an m-dimensional subspace of the n-dimensional input space that seems most "significant"
– PCA works by computing n vectors known as the principal components which provide a basis for the input space. The vectors are ordered in sequence from most to least significant.
– PCA requires all input variables to be normalized within the same range of data.
– The highest principal components will exhibit the highest variance.
– Input space can be reduced by throwing out the ones with the lowest variance.

58
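A minimal PCA sketch using numpy (an assumed dependency), following the steps above: center the normalized data, compute the principal components from the covariance matrix, order them by variance, and keep only the top m:

import numpy as np

def pca_reduce(X, m):
    # X: (n_samples, n_variables) array, already normalized to a common range
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvectors = principal components
    order = np.argsort(eigvals)[::-1]        # most to least significant (variance)
    return Xc @ eigvecs[:, order[:m]]        # project onto the top m components

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca_reduce(X, 2).shape)   # (100, 2)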


More on Reducing the Input Space
• Sensitivity Analysis
– Studies how changes of an input variable affect the output variable.
– If you do a regression and find one of the coefficients is zero, that variable may be viewed as having no real bearing on the production of the output.
– Different from other data reduction in that we are looking at the output.
– The idea is that if a variable changes and the output changes, it's a useful variable. Otherwise, it may not be so important. Here, we can see A and B are more important than C: C changes when the output Y doesn't change, and vice versa.

Y    A    B    C
100  10   20   -50
100  12   14   300
150  35   40   -50
200  20   30   -50

59

Normalizing Data
• Many algorithms require normalized data – all
data lies on a common scale (e.g. between zero
and one)
• Three flavors
– Min-max
– Zscore
– Sigmoidal

60


Min-max Normalization
• Linear transformation of the original input range
into a newly specified data range (typically 0-1).
• Old min value is mapped to new min
• Old max is mapped to new max
• Let y be the original value, y’ be the new value.
• Min1, max1 are the original min, max
• Min2, max2 are the new min max

y' = ((y − min1) / (max1 − min1)) × (max2 − min2) + min2

61

Min-max Examples

y' = ((y − min1) / (max1 − min1)) × (max2 − min2) + min2

• Consider old data that ranged from 0-100; we now obtain an equation to migrate it to 5-10:

y' = (y / 20) + 5

• Old = 0, New = 5
• Old = 10, New = 5.5
• Old = 90, New = 9.5

62
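A one-function Python sketch of min-max normalization, checked against the 0-100 → 5-10 example above:

def min_max(y, min1, max1, min2=0.0, max2=1.0):
    # y' = ((y - min1) / (max1 - min1)) * (max2 - min2) + min2
    return (y - min1) / (max1 - min1) * (max2 - min2) + min2

for old in (0, 10, 90):
    print(old, "->", min_max(old, 0, 100, 5, 10))   # 5.0, 5.5, 9.5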


Zscore
• Translate the input variable so that the mean is
zero and the variance is one.
• This attempts to center the data around the origin. The goal is that most of the data will lie within one standard deviation of the origin.
• Works well when min and max are unknown
• If the majority of the data falls between 50 and 100, but you have a few data points outside of that range, zscore will compress most of the data into a small range.

63

Zscore Continued

y' = (y − mean) / std

Ex. 1: values Y with average 0.80 and std-dev 1.02 are transformed to Y' (e.g., 0.23 → -0.56, 0.79 → -0.01, 5.00 → 4.11).
Ex. 2: values Y with average 27.90 and std-dev 72.08 are transformed to Y' (e.g., 45 → 0.24, 300 → 3.78, -75 → -1.43).

64


Sigmoidal Normalization
• Transforms input data non-linearly into the range
–1 to 1.
• Uses mean and standard deviation
• Outliers are compressed along the means of the
sigmoidal function
• Good for data where it is desirable to keep the
outliers

65

Sigmoidal Normalization Example 1
α = (y − mean) / std
y' = (1 − e^(−α)) / (1 + e^(−α))

Ex. 1: values Y with average 0.80 and std-dev 1.02; e.g., Y = 0.23 gives α = -0.56 and sigmoidal Y' = -0.27, and Y = 5.00 gives α = 4.11 and Y' = 0.97.
66
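A short Python sketch of both transformations as defined above; the printed values reproduce a few rows of Example 2 (mean 27.90, std-dev 72.08):

import math

def zscore(y, mean, std):
    return (y - mean) / std

def sigmoidal(y, mean, std):
    # alpha = (y - mean) / std;  y' = (1 - e^-alpha) / (1 + e^-alpha), range -1 to 1
    a = zscore(y, mean, std)
    return (1 - math.exp(-a)) / (1 + math.exp(-a))

mean, std = 27.90, 72.08
for y in (45, 300, -75):
    print(y, round(zscore(y, mean, std), 2), round(sigmoidal(y, mean, std), 2))
# 45 -> 0.24 / 0.12,  300 -> 3.78 / 0.96,  -75 -> -1.43 / -0.61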


Sigmoidal Normalization Example 2
Alpha Y Sig Y'
0.24 45 0.12 Average 27.90
0.10 35 0.05
0.58 70 0.28 Std-dev 72.08
0.57 69 0.28
-0.08 22 -0.04
-0.25 10 -0.12
-0.32 5 -0.16
-0.18 15 -0.09
0.03 30 0.01
3.78 300 0.96
-0.10 21 -0.05
-0.23 11 -0.12
-0.19 14 -0.10
-0.01 27 -0.01
-1.08 -50 -0.49
-1.43 -75 -0.61
-0.36 2 -0.18
-0.36 2 -0.18
-0.36 2 -0.18
-0.35 3 -0.17

67

Sigmoidal and Zscore (Example 1)
Y Zscore Y' Sig Y' Y Zscore Y' Sig Y'
0.23 -0.56 -0.27 45.00 0.24 0.12
0.79 -0.01 -0.01 35.00 0.10 0.05
0.52 -0.28 -0.14 70.00 0.58 0.28
0.25 -0.54 -0.26 69.00 0.57 0.28
0.80 0.00 0.00 22.00 -0.08 -0.04
0.65 -0.15 -0.08 10.00 -0.25 -0.12
0.92 0.11 0.06 5.00 -0.32 -0.16
Ex: 1 0.21 -0.58 -0.28 15.00 -0.18 -0.09 Ex: 2
0.64 -0.15 -0.08 30.00 0.03 0.01
0.17 -0.62 -0.30 300.00 3.78 0.96
0.63 -0.17 -0.08 21.00 -0.10 -0.05
0.65 -0.15 -0.07 11.00 -0.23 -0.12
0.67 -0.13 -0.06 14.00 -0.19 -0.10
0.71 -0.09 -0.04 27.00 -0.01 -0.01
0.98 0.17 0.09 -50.00 -1.08 -0.49
0.84 0.04 0.02 -75.00 -1.43 -0.61
0.10 -0.69 -0.33 2.00 -0.36 -0.18
0.77 -0.03 -0.02 2.00 -0.36 -0.18
0.50 -0.30 -0.15 2.00 -0.36 -0.18
5.00 4.11 0.97 3.00 -0.35 -0.17

68


Summary
• Many large data mining projects spend most of
their time on data cleanup and preparation (often
cited as 80 percent of their work)
• Various normalization techniques
– Min-max
– Zscore
– Sigmoidal

69

Linear Regression

70


Overview
• Goal is to fit points to a line
• Once equation for a line is obtained, new values
can be estimated (predicted) from inputs.

71

Example

• Consider the following points for y and x. Given a value of x we would like a model that allows us to predict y.

Sample Data
Y    X
15   5
22   10
35   20
60   25
70   30
80   40
90   50

[Scatter plot of the sample data: Y against X.]

72


Deriving the Algorithm

• The equation for a line (one variable) is y = mx + b, where x is the input variable, m is the slope, and b is the intercept. The goal is to compute the slope m and the intercept b.
• The equation for a plane in multidimensional space (we need this for multiple input variables); here we use w:

y = w0 + w1·x1 + w2·x2 + ... + wN·xN
73

Input

• We might have several records of training data, where each record indicates the values of each input variable and the output variable.
• The input file would look like:

Record 1: y1 x11 x12
Record 2: y2 x21 x22
Record 3: y3 x31 x32

74


Solving for Multiple Variables

Record 1: y1 x11 x12
Record 2: y2 x21 x22
Record 3: y3 x31 x32

• We now need one set of slopes (weights) that will come close to solving the following equations:

y1 = w0 + w1·x11 + w2·x12
y2 = w0 + w1·x21 + w2·x22
y3 = w0 + w1·x31 + w2·x32
75

Using Matrices for Regression

• Start with:

y1 = w0 + w1·x11 + w2·x12
y2 = w0 + w1·x21 + w2·x22
y3 = w0 + w1·x31 + w2·x32

• Cast this as a matrix multiplication, XW = D:

    [ 1  x11  x12 ]       [ w0 ]       [ y1 ]
X = [ 1  x21  x22 ]   W = [ w1 ]   D = [ y2 ]
    [ 1  x31  x32 ]       [ w2 ]       [ y3 ]

Now we just need to solve XW = D.

76


Solving the Equation

• Start with: XW = D
• We want to solve for W. Pre-multiply each side by X^T (the transpose of X). We now obtain:

(X^T X) W = (X^T D)

• Now pre-multiply each side by the inverse (X^T X)^-1:

(X^T X)^-1 (X^T X) W = (X^T X)^-1 (X^T D)

• Since (X^T X)^-1 (X^T X) = I, where I is the identity matrix, we obtain:

W = (X^T X)^-1 (X^T D)

77

Important Points
• W is just an estimate – not always possible to find
an exact matrix inverse
• Matrix inversion can be a computationally
expensive process

78


Algorithm
• Read input variables from a file
• Compute X-transpose
• Matrix multiply X-transpose with X, store in temp
• Take the matrix inverse of temp and store the
result in left
• Multiply X-transpose with D and store in right
• Multiply left and right

79

Example (compute left)

• Start with our two-dimensional example and compute X^T:

X^T = [ 1  1  1  1  1  1  1 ]
      [ 5 10 20 25 30 40 50 ]

    [ 1  5 ]
    [ 1 10 ]
    [ 1 20 ]
X = [ 1 25 ]
    [ 1 30 ]
    [ 1 40 ]
    [ 1 50 ]

(X^T X) = [   7  180 ]
          [ 180 6150 ]

(X^T X)^-1 = [  0.577  -0.017 ]
             [ -0.017   0     ]

80


Example (compute right)
• Start with the matrices and compute X^T D:

X^T = [ 1  1  1  1  1  1  1 ]        [ 15 ]
      [ 5 10 20 25 30 40 50 ]        [ 22 ]
                                     [ 35 ]
X^T D = [   372 ]                D = [ 60 ]
        [ 12295 ]                    [ 70 ]
                                     [ 80 ]
(X^T X)^-1 = [  0.577  -0.017 ]      [ 90 ]
             [ -0.017   0     ]

W = (X^T X)^-1 (X^T D) = [ 7.014 ]
                         [ 1.794 ]

81

Graphing Solution
• Using the estimate for W we obtain XW:

    [ 1  5 ]                         [ 15.984 ]
    [ 1 10 ]                         [ 24.953 ]
    [ 1 20 ]      [ 7.014 ]          [ 42.892 ]
X = [ 1 25 ]  W = [ 1.794 ]     XW = [ 51.862 ]
    [ 1 30 ]                         [ 60.831 ]
    [ 1 40 ]                         [ 78.77  ]
    [ 1 50 ]                         [ 96.709 ]

82


Predicted vs. Actual

15.984  15 
 24.953   22 
   
Sample Data

 42.892  35  120


    100
XW = 51.862  D = 60  80
60

Y
60.831  70  40
    20
78.77  80  0

96.709  90  0 10 20 30 40 50 60
    X

83

Implementation Issues
• Lots of good matrix class libraries
• Many matrix multiplication and matrix inversion
algorithms
• Many libraries lack support for cases where the
matrix does not fit into main memory

84


Decision Trees

85

Algorithm

• Assume we have a data set D and a list of variables to use to build the tree.
• A split criterion is an object that consists of a variable and a value (e.g., x > 10)

Algorithm BuildTree(D, vars)
    currSplit := bestSplit(D, vars)
    vars := vars – currSplit.variable
    leftD := Partition(D, currSplit, left)
    rightD := Partition(D, currSplit, right)
    BuildTree(leftD, vars)
    BuildTree(rightD, vars)
End

86


Computing a Good Split
Φ(s', t) = max_i Φ(s_i', t)

Φ(s, t) = 2·PL·PR · Σ (j = 1 to numClasses) | P(j, tL) − P(j, tR) |

tL = left split criterion (e.g., x < 10)
tR = right split criterion (e.g., x >= 10)
vL = number of values for this variable (e.g., x) that match criterion tL
vR = number of values for this variable that match criterion tR
cL = number of class j patterns for tL
cR = number of class j patterns for tR
n = size of the training set
p = number of patterns at t (size of the training set at this node)

PL = vL / n     PR = vR / n     P(j, tL) = cL / p     P(j, tR) = cR / p

87

Details of Splitting

Φ(s', t) = max_i Φ(s_i', t)

Takes the maximum value of "goodness" of splits over each possible split criterion.

For a variable with a range of 1 to 5, we must take the maximum over:

X < 1, X >= 1
X < 2, X >= 2
X < 3, X >= 3
X < 4, X >= 4
X < 5, X >= 5

88


Computing Goodness of Split

Φ(s, t) = 2·PL·PR · Σ (j = 1 to numClasses) | P(j, tL) − P(j, tR) |

For a single split value, we compute the "goodness" by testing how well it separates the classes. Consider a split of X < 5 for the following X, Y combination, where we are using X to predict Y.

X  Y
1  0
2  0
3  0
4  0
5  1
6  1
7  1
8  1

PL = 4/8 = 0.5     PR = 4/8 = 0.5
2·PL·PR = (2)(.5)(.5) = .5

For Class 0:
P(j, tL) = 4/4 = 1     P(j, tR) = 0/4 = 0

For Class 1:
P(j, tL) = 0/4 = 0     P(j, tR) = 4/4 = 1

Total for X < 5: (.5)(2) = 1.0

89

Split Example

1 0 NumRows 8
2 0
3 0
4 0
5 1
6 1
7 1
8 1

Split X<2 X<3 X<4 X<5 X<6 X<7 X<8


patl 1 2 3 4 5 6 7
patr 7 6 5 4 3 2 1
Pl 0.125 0.25 0.375 0.5 0.625 0.75 0.875
Pr 0.875 0.75 0.625 0.5 0.375 0.25 0.125
Pjl0 1.00 1.00 1.00 1.00 0.80 0.67 0.57
Pjr0 0.43 0.33 0.20 0.00 0.00 0.00 0.00
Pjl1 0.00 0.00 0.00 0.00 0.20 0.33 0.43
Pjr1 0.57 0.67 0.80 1.00 1.00 1.00 1.00

2PlPr 0.22 0.38 0.47 0.50 0.47 0.38 0.22


Pj0 0.57 0.67 0.80 1.00 0.80 0.67 0.57
Pj1 0.57 0.67 0.80 1.00 0.80 0.67 0.57
Sum 0.25 0.50 0.75 1.00 0.75 0.50 0.25

90
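A Python sketch of the split-goodness measure Φ applied to the eight training patterns above; as in the worked example, the class proportions are computed within each branch. It reproduces the Sum row (0.25, 0.50, 0.75, 1.00, 0.75, 0.50, 0.25):

def goodness(xs, ys, split):
    # Phi = 2 * PL * PR * sum over classes j of |P(j, tL) - P(j, tR)| for "x < split"
    left = [y for x, y in zip(xs, ys) if x < split]
    right = [y for x, y in zip(xs, ys) if x >= split]
    pl, pr = len(left) / len(xs), len(right) / len(xs)
    diff = sum(abs(left.count(j) / len(left) - right.count(j) / len(right))
               for j in set(ys))
    return 2 * pl * pr * diff

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
for split in range(2, 9):
    print(f"X < {split}: {goodness(xs, ys, split):.2f}")   # best split is X < 5 -> 1.00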


Summary
• For each variable, obtain all distinct values
– For each value i
– For each possible output class
– Compute the split as applied to the class
– Sum all the classes
• Take the max of all “goodness of split” values.

91

Scalability
• Decision Tree is usually built such that it fits
entirely in memory
• Can do a memory-management scheme in which
each split is computed incrementally.
• Also, can build several decision trees and attempt
to merge them.

92


Bootstrapped Optimistic Algorithm for Tree Construction (BOAT)

• BOAT
– Goal is to improve scalability of decision tree algorithms
– Most algorithms require training data to fit into main memory

93

Details of BOAT
• Basic Steps of BOAT
– Build an initial estimate of a good tree by taking several samples of the
training data (each sample fits into main memory)
– Build a decision tree for each sample
– Using all the samples, build an estimated tree. This is done by scanning
in a top down fashion each sample. If the splitting variable matches at a
given node there are two cases:
– Categorical variable
» (e.g., gender = male). If the variable and value match perfectly in all of the sampled trees, use this criterion as the estimated criterion.
– Continuous variable
» If the variable matches, then build a wide estimated split range that includes all of the ranges. For example: if one tree chooses X < 10 and another uses X < 15, use X < 15 in the estimated tree.
– Refine the estimates

94


BOAT (continued)

• Refining the estimates


– For a wide confidence range, obtain all training data that fits into that
range (this is a much smaller set than the whole training data). This either
fits into main memory or can be periodically written to temp files.
– Now if it’s a categorical variable there is no range, so there is no
refinement to do.
– If it's a continuous variable (10 < x < 30), try all splits within the range and test the purity of each split. Use the split with the best purity.
– Now with the refined split, we can move on down the tree
• More details are in the paper
– J. Gehrke, V. Ganti, R. Ramakrishnan, W. Loh, “BOAT-Optimistic
Decision Tree Construction”, In Proceedings of the 1999 SIGMOD
Conference, Philadelphia, Pennsylvania, 1999.
– (paper is at: http://www.cs.cornell.edu/johannes/)

95

Overview
• Statement of the Problem
• Simple Examples
• Algorithm
• Multilevel associations

96


Problem
• Basic idea is that we are given a set of
transactions such as person bought bread,
butter, milk, eggs and we would like to identify
rules such that:

– BREAD, BUTTER → EGGS

• The idea is that we need to count various combinations of items that co-occur in a transaction and then filter them for the ones that are most common.

97

Queries
• Some common queries that association rules
might provide some answers:
– Find all rules that have Apple Juice in the consequent. Might give
some idea of how to boost Juice sales.
– Find all rules that have “shoe polish” in the antecedent. Might
give some idea of what would happen to sales of other products if
“shoe polish” was discontinued.
– Find all rules that have “jelly” in the antecedent and “peanut
butter” in the consequent. Tell us what else might be sold with the
peanut butter and jelly combination.
– Find all rules relating items located on a shelf A. Determine if co-
locating on a shelf boosts sales.
– Note that some large installations have said that they are not that inclined to re-arrange their shelves on a frequent basis.
– Which parts of a shelf result in the most sales.

98


Formal Statement of the Problem
• Literals (e.g., bread, milk, etc.): I = {i1, i2, ..., in}
• D is a set of transactions that contain items in I. Note the quantity of items bought is not considered. Each transaction has a unique id, TID.
• Let X be a set of items from I. A transaction T is said to contain X if and only if X is a subset of T.
• An association rule is an implication of the form X → Y, where X is a subset of items, Y is a subset of items, and X and Y have no items in common (that is, we don't say Bread → Bread).
• The rule X → Y holds in transaction set D with confidence c if c% of the transactions in D that contain X also contain Y.
• X → Y has support s in the transaction set D if s% of the transactions in D contain X ∪ Y.
• A strong rule has high confidence and high support
99

Example
• Consider the following set of transactions:

TID Items

100 A, C, D
200 B, C, E
300 A, B, C, E
400 B, E

• Consider the rule A → C

• Confidence is 2/2 = 100%: both of the transactions that contain A also contain C.
• Support is 2/4 = 50%: half of the transactions contain both A and C.

100
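A few lines of Python sketching the same computation over the four transactions above:

transactions = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    return support(x | y) / support(x)

print(support({'A', 'C'}))        # 0.5  -> 50% support for A -> C
print(confidence({'A'}, {'C'}))   # 1.0  -> 100% confidence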


General Approach
• To find association rules the worst case is exponential.
• Consider 3 items, a,b, and c. We have the following rules to
consider:

a → b
a → c
a → b,c

b → a
b → c
b → a,c

c → a
c → b
c → a,b

• The general approach is to identify the itemsets with support above a threshold s.
• For the large itemsets, generate the association rules for the database.
101

Overview of the Algorithm

Find initial significant candidate items
This is done by counting the number of transactions for all single items. Keep the big ones. These are the candidate itemsets of size one. Let's say {{bread},{butter}}.
Do until there are no more candidates
• k = k + 1
• Generate all itemsets of size k from the candidate set of size k-1. The 2-itemset would be {bread, butter}.
• Update the count of all the candidate itemsets of size k
• Eliminate any candidates with support that does not exceed the threshold.
End
102


Simple Algorithm for Items of Size 3

• Step 1: Count all itemsets of size 1.


• Step 2: Remove all itemsets below support s
• Step 3: Use remaining itemsets to identify all
candidates of size 2.
• Step 4: Obtain transaction count of all
candidates (have to scan the database) of size 2.
• Step 5: Remove all itemsets below s.
• Step 6: Scan all candidates of size 2, if any of the
candidates have a common beginning (say AB
and AC), now check to see if the ending (BC) is a
candidate. If the ending is in the set of
candidates, form a new candidate of size 3 (ABC).

103

Example of Simple Algorithm
• Step 1: Count on all itemsets of size 1

Itemset Count
{A} 2
{B} 3
{C} 3
{D} 1
{E} 3
• Step 2: Remove all itemsets below support s (let's say s = 1)
Itemset Count
{A} 2
{B} 3
{C} 3
{E} 3
104


Example (continued)
• Step 3 and 4: Scan and count remaining itemsets of
size 2 (use size 1 sets combined with themselves)

Itemset Count
{A B} 1
{A C} 2
{A E} 1
{B C} 2
{B E} 3
{C E} 2

105

Example (continued)
• Step 5: Remove all 2-itemsets without enough support.

Itemset Count
{A C} 2
{B C} 2
{B E} 3
{C E} 2
• Step 6: Now we need a 3-itemset. Scan all candidates of size 2; where candidates have a common beginning ({B C} and {B E}), check whether the ending ({C E}) is also a candidate. {C E} is in our list, so all size-2 subsets of {B C E} are candidates. Hence, {B C E} is a candidate for a size-3 itemset.
• Step 7: Scan the database to obtain the count for {B C E}. It is two. The algorithm then ends, as no size-4 itemsets can be generated from a single 3-itemset.

106
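A compact Python sketch of this level-wise counting on the same four transactions with a minimum support count of 2. The candidate-generation step here is a simple join of surviving itemsets (infrequent candidates are eliminated when counted); the stricter generate-and-prune step is described on the next slides:

from itertools import combinations

transactions = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
min_count = 2

def frequent_itemsets():
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]   # candidate 1-itemsets
    large, k = {}, 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        large.update(survivors)
        # join step: build (k+1)-item candidates from the surviving k-itemsets
        current = list({a | b for a, b in combinations(survivors, 2) if len(a | b) == k + 1})
        k += 1
    return large

for itemset, count in frequent_itemsets().items():
    print(sorted(itemset), count)
# finds {A},{B},{C},{E}, then {A,C},{B,C},{B,E},{C,E}, and finally {B,C,E} with count 2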


Overview of the Details
• Two key steps have been glossed over at this
point
– Generating all pairs of itemsets that should be considered for the
next stage.
– Scanning the database to update the counts for the candidate
itemsets. This is not so easy because the candidate itemsets may
be {{AB},{CD}} and a transaction containing them may be
{ABCD}, or {ABCDE}. Thus a single transaction may update
many candidates in an itemset.

107

Generate and Prune
• Generate
– We need to take an existing candidate itemset (say itemsets of size k) and
generate all itemsets of size k+1.
– First we find all itemsets with a common prefix (all entries except for the
last digit) and then make sure their suffix (last digit) is in the candidate set
as well.
– Consider A = {{123},{124},{134},{135},{234}}
– Match all of these with themselves (A x A)
– We obtain: {{123 124}, {123 134}, {123 135}, {123 234},…,}
– We find that {123 124} has a common prefix {12} so we have a new
candidate {1234}, similarly {134 135} will generate {1345}
• Prune
– Now we prune the candidates by checking to see if all subsets of size k-1
of each candidate are indeed candidates.
– For {1234}, we have subsets {{123},{134},{234},{124}} so it is fine.
– For {1345}, we have subsets {{134},{135},{145},{345}}, {134} and
{135} are in the candidates, but {145} and for that matter {345} are NOT
so this one is pruned.

108
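A Python sketch of the generate-and-prune step on the example A = {123, 124, 134, 135, 234}. For simplicity the join here unions any two candidates whose combined size is k+1 (rather than the strict common-prefix join), but the prune step gives the same result: {1234} survives and {1345} is pruned because {145} is not a candidate:

from itertools import combinations

def generate_and_prune(candidates):
    # candidates: a set of frozensets, all of size k
    k = len(next(iter(candidates)))
    joined = {a | b for a, b in combinations(candidates, 2) if len(a | b) == k + 1}
    # prune: every k-subset of a new candidate must itself be a candidate
    return {c for c in joined
            if all(frozenset(s) in candidates for s in combinations(c, k))}

A = {frozenset(s) for s in ('123', '124', '134', '135', '234')}
print([''.join(sorted(c)) for c in generate_and_prune(A)])   # ['1234']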


Update Itemset Counts
• Now we have a set of itemsets and a set of transactions.
The items in an itemset and an entry of a transaction are
listed in sorted order (e.g.; {123} instead of {231}). Consider
the 2-itemset {{13}, {15}, {35}} with transactions:

• T1: 245, 1345, 3487,


• T2: 1245

• Note that T1 requires an update to the count of itemsets


{13}, {15}, and {35}. T2 requires an update to {15}.
• Crude algorithm would be for each transaction, scan all
items in the transaction, for each item break it up into all
subsets and then check each subset with the list of
itemsets. If we end up with a LOT of candidate itemsets or
big transactions, this can be expensive.

109

scanCount with HashTree
• scanCount can be improved with a HashTree.
• HashTree has at its leaves a set of itemsets. Any nonleaf entry
simply contains a hash table that points to descendants. The
pointer to descendants is obtained by a hash function.
• All leaves share a common prefix.
• Leaves split when they contain above a threshold of too many
nodes. At this point a hash table is built with an entry for each of
the distinct suffixes in the leaf node.
[Diagram: a hash tree; interior nodes hash on successive items (H(1), H(2), H(3), H(4), ...) and leaves hold itemsets such as {145}, {238}, {239}. Note a transaction of 124 or 1234, etc. would all land in the leaf holding {145}.]

110


Algorithm
• Definitions
– Lk – set of large k-itemsets (those with minimum support)
– Ck – set of candidate k-itemsets (potentially large itemsets); each has two attributes, itemset and count.

L1 = {large 1-itemsets}
For (k = 2; Lk-1 <> empty; k++) do begin
    Ck = generate(Lk-1)
    // now we have to scan all transactions and update the counts for the candidate itemsets
    Forall transactions t in D do begin
        Ct = subset(Ck, t)
        Forall candidates c in Ct do
            c.count++
        End
    End
    Lk = {c in Ck | c.count >= minimumSupport}
End
Answer is the union of all large k-itemsets
111

Improvements
• The basic algorithm can be improved by:
– Reducing size of database that needs to be searched
– Using parallel processors
– Developing better filters that restrict the number of candidate
itemsets
– Incremental update of association rules so that you do not have to rescan to identify them
– On-line generation of rules so users can change thresholds as they
are being computed
– Various things can be done for scalability
– Use database system for access to transactions
– Run this algorithms for as many transactions as will fit into main
memory, then sum results, etc.

112


Generalized Association Rules
• Often products are associated into categories
such as Bread: wheat bread, white bread, nine-
grain bread.
• Rules may only be meaningful when used at
multiple levels.

[Diagram: a product taxonomy with Food at the root; milk and bread below it; 2% and chocolate under milk; white and wheat under bread.]

113

Summary
• Association rules can be helpful in making
business decisions
• Expensive to compute, most algorithms
developed today are based on the one we have
described.
• Scalability to large volumes of data remains a key
issue.

114
