
FP-Growth algorithm

Lecture 33/15-10-09

Observations about FP-tree
• The size of an FP-tree depends on how the items are
ordered.
• In the previous example, if the items are ordered by
increasing support count instead, the resulting FP-tree is
different and, for this example, denser (wider).
• At the root node, the branching factor increases
from 2 to 5, as shown on the next slide.
• Also, ordering by decreasing support count
does not always lead to the smallest tree.
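The effect of item ordering can be checked with a minimal sketch. It assumes the five transactions (frequent items only) that correspond to the f, c, a, b, m, p FP-tree used later in this lecture, not the a to e example the slide refers to, so the numbers differ from the "2 to 5" quoted above. The helper name `count_nodes` is hypothetical.

```python
def count_nodes(transactions, order):
    """Insert each transaction into a prefix tree using the given item order.

    Returns (number of non-root nodes, branching factor at the root).
    """
    rank = {item: i for i, item in enumerate(order)}
    root = {}  # nested dicts: item -> subtree
    nodes = 0
    for t in transactions:
        node = root
        for item in sorted(t, key=rank.__getitem__):
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes, len(root)


# Assumed data: frequent items of each transaction.
transactions = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
decreasing = "fcabmp"          # decreasing support count (f:4, c:4, then 3s)
increasing = decreasing[::-1]  # the reverse order

print(count_nodes(transactions, decreasing))  # (11, 2)
print(count_nodes(transactions, increasing))  # (14, 3)
```

For this data the reversed order yields more nodes and a wider root, because the rare items come first and transactions share fewer prefixes.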

FP-Growth

FP-growth Algorithm:
Mining Frequent Patterns
Using FP-tree

Frequent itemset generation using FP-growth
algorithm
• The algorithm generates frequent itemsets from the
FP-tree by traversing it bottom-up.
• It first extracts the frequent itemsets ending in
‘e’, then those ending in ‘d’, ‘c’, ‘b’ and ‘a’.
• Because every transaction is mapped onto a single path
in the FP-tree, the frequent itemsets ending in, say,
‘e’ can be found by examining only the paths
that contain node ‘e’.

Mining Frequent Patterns Using FP-tree

• General idea (divide-and-conquer)
Recursively grow frequent patterns using the FP-tree:
look for shorter patterns recursively and then
concatenate the suffix:
– For each frequent item, construct its
conditional pattern base, and then its
conditional FP-tree;
– Repeat the process on each newly created
conditional FP-tree until the resulting FP-tree is
empty.

Major Steps of FP-Growth algorithm
Starting the processing from the end of list L:
Step 1:
Construct conditional pattern base for each item in the header table.

Step 2:
Construct conditional FP-tree from each conditional pattern base.

Step 3:
Recursively mine conditional FP-trees and grow frequent patterns
obtained so far.
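
The three steps can be sketched compactly. This is a simplification under stated assumptions, not the lecture's implementation: each conditional FP-tree is represented implicitly as a list of (prefix, count) pairs instead of a compressed tree, the function name `fp_growth` is hypothetical, and the data and the minimum support count of 3 are taken from the f, c, a, b, m, p example used in this lecture.

```python
from collections import defaultdict


def fp_growth(pattern_base, min_count, suffix=()):
    """Return {frozenset(itemset): support count}.

    pattern_base is a list of (items, count) pairs whose items all follow
    one fixed global order; the initial call passes the transactions
    themselves, each with count 1.
    """
    # Support of every item within this (conditional) pattern base.
    support = defaultdict(int)
    for items, count in pattern_base:
        for item in items:
            support[item] += count

    patterns = {}
    for item, item_support in support.items():
        if item_support < min_count:
            continue  # infrequent here, so it cannot extend the suffix
        grown = (item,) + suffix
        patterns[frozenset(grown)] = item_support
        # Step 1: conditional pattern base = prefixes that precede `item`.
        conditional = [
            (items[: items.index(item)], count)
            for items, count in pattern_base
            if item in items
        ]
        # Steps 2 and 3: recurse; infrequent items are filtered in the call.
        patterns.update(fp_growth(conditional, min_count, grown))
    return patterns


# Assumed data: frequent items of each transaction, in the order of list L.
transactions = [tuple(t) for t in ("fcamp", "fcabm", "fb", "cbp", "fcamp")]
result = fp_growth([(t, 1) for t in transactions], min_count=3)
print(len(result))                # 18
print(result[frozenset("fcam")])  # 3
```

Because every prefix contains only items that precede the current item in list L, each frequent itemset is generated exactly once, from the branch of its last item.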

Step 1: Construct Conditional Pattern Base
• Start at the bottom of the frequent-item header table of the FP-tree
• Traverse the FP-tree by following the node-links of each frequent item
• Accumulate all transformed prefix paths of that item to form its conditional
pattern base

FP-tree (indentation shows parent and child):
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

Header table items: f, c, a, b, m, p

Conditional pattern bases
Item    Conditional pattern base
p       fcam:2, cb:1
m       fca:2, fcab:1
b       fca:1, f:1, c:1
a       fc:3
c       f:3
f       {}

Considering ‘p’ as the suffix, its two corresponding prefix paths are
(fcam:2) and (cb:1).
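
A minimal sketch of Step 1 follows. It assumes the five transactions (frequent items only, already in the order of list L) that produce the FP-tree on this slide. The class `Node` and the functions `build_tree` and `conditional_pattern_base` are hypothetical names, and the header table is simplified to a plain list of nodes per item instead of linked node-links.

```python
class Node:
    """One FP-tree node: an item, a count, and a link to its parent."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions):
    """Build the FP-tree; items in each transaction are already ordered."""
    root = Node(None, None)
    header = {}  # item -> list of its nodes (stands in for node-links)
    for t in transactions:
        node = root
        for item in t:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(header, item):
    """Collect the prefix path of every node of `item`, with that node's count."""
    base = {}
    for node in header.get(item, []):
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:  # a node directly under the root has no prefix path
            prefix = "".join(reversed(path))
            base[prefix] = base.get(prefix, 0) + node.count
    return base


# Assumed data: frequent items of each transaction, in the order of list L.
transactions = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
root, header = build_tree(transactions)
for item in "pmbacf":  # bottom of the header table first
    print(item, conditional_pattern_base(header, item))
```

Each prefix path carries the count of the suffix node, not the counts of the nodes on the path, which is why the path above ‘p’ is recorded as fcam:2 even though f has count 4.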
FP-Growth: An Example

Step 2: Construct Conditional FP-tree

• For each pattern base
– Accumulate the count for each item in the base
– Construct the conditional FP-tree for the frequent items of the
pattern base
Header table (item: support count): f: 4, c: 4, a: 3, b: 3, m: 3, p: 3
m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: {} → f:3 → c:3 → a:3
[Figure: global FP-tree alongside the m-conditional FP-tree]
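
The counting part of Step 2 can be sketched as below. The function name `conditional_fp_tree_items` is hypothetical, and the minimum support count of 3 is an assumption inferred from the example, where every retained item has count 3 or more. The sketch returns only the frequent items and their accumulated counts, not a tree structure; for ‘m’ those items happen to form a single path.

```python
from collections import Counter


def conditional_fp_tree_items(pattern_base, min_count):
    """Accumulate item counts over a pattern base and keep the frequent ones."""
    counts = Counter()
    for prefix, count in pattern_base.items():
        for item in prefix:
            counts[item] += count
    return {item: n for item, n in counts.items() if n >= min_count}


m_base = {"fca": 2, "fcab": 1}
print(conditional_fp_tree_items(m_base, min_count=3))  # {'f': 3, 'c': 3, 'a': 3}
```

Here ‘b’ accumulates a count of only 1 and is dropped, which leaves the single path f:3, c:3, a:3.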

Principles of FP-Growth
(why is ‘b’ not considered?)

• Pattern growth property
– Let α be a frequent itemset in DB, B be α's conditional pattern
base, and β be an itemset in B. Then α ∪ β is a frequent
itemset in DB iff β is frequent in B.
• Is “fcabm” a frequent pattern?
– “fcab” is a branch of m's conditional pattern base.
– “b” is NOT frequent in m's conditional pattern base: it occurs only in (fcab:1).
– So “bm” is NOT a frequent itemset, and neither is “fcabm”.

FP-Growth
Step 3: Recursively mine the conditional
FP-tree
Each step adds one item to the current suffix; every suffix reached is a frequent pattern.
– conditional FP-tree of “m”: (fca:3)
  – add “a”: conditional FP-tree of “am”: (fc:3)
    – add “c”: conditional FP-tree of “cam”: (f:3)
      – add “f”: “fcam”
    – add “f”: conditional FP-tree of “fam”: 3
  – add “c”: conditional FP-tree of “cm”: (f:3)
    – add “f”: conditional FP-tree of “fcm”: 3
  – add “f”: conditional FP-tree of “fm”: 3
FP-Growth
Conditional Pattern Bases and
Conditional FP-Tree

Item Conditional pattern base Conditional FP-tree


p {(fcam:2), (cb:1)} {(c:3)}|p
m {(fca:2), (fcab:1)} {(f:3, c:3, a:3)}|m
b {(fca:1), (f:1), (c:1)} Empty
a {(fc:3)} {(f:3, c:3)}|a
c {(f:3)} {(f:3)}|c
f Empty Empty
(Items are listed from the end of list L to its start.)
FP-Growth

Single FP-tree Path Generation

m-conditional FP-tree: {} → f:3 → c:3 → a:3

All frequent patterns concerning m:
combinations of {f, c, a} and m
m,
fm, cm, am,
fcm, fam, cam,
fcam
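
When a conditional FP-tree is a single path, its patterns can be enumerated directly. The sketch below lists every combination of the path items followed by the suffix; the function name `single_path_patterns` is hypothetical, and items are written as single characters for brevity.

```python
from itertools import combinations


def single_path_patterns(path, suffix):
    """All subsets of the path items (including the empty one) plus the suffix."""
    patterns = []
    for size in range(len(path) + 1):
        for subset in combinations(path, size):
            patterns.append("".join(subset) + suffix)
    return patterns


print(single_path_patterns("fca", "m"))
# ['m', 'fm', 'cm', 'am', 'fcm', 'fam', 'cam', 'fcam']
```

A path of n items yields 2^n patterns, so the three-item path of the m-conditional FP-tree gives eight patterns, each with support count 3 here because every node on the path has count 3.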

Summary of FP-Growth
Algorithm
• Mining frequent patterns can be viewed as first
mining 1-itemsets and progressively growing each
1-itemset by mining its conditional pattern base
recursively

• Transform a frequent k-itemset mining problem into
a sequence of k frequent 1-itemset mining problems
via a set of conditional pattern bases

Evaluation of Association patterns
• Objective interestingness measure:
– It uses statistics derived from the data to
determine whether a pattern is interesting or
not.
– Examples are support, confidence and
correlation.
• Subjective interestingness measure:
– A pattern is called subjectively interesting if it
reveals unexpected information about the
data or can lead to profitable actions.

• Example: {butter} → {bread} may not be
interesting because the relationship represents
obvious information.
– But {Diapers} → {beers} can be interesting,
as the relationship is quite unexpected and can
help retailers with cross-selling to increase
profits.
• Determining subjective knowledge is somewhat
difficult, as it requires prior information from
domain experts.
Different approaches for incorporating
subjective knowledge
• 1. Visualization:
– Domain experts interact with the data mining system
by interpreting and verifying the discovered patterns.
• 2. Template-based approach:
– Instead of considering all the rules, only the rules
that satisfy a user-specified template are considered.
• 3. Subjective interestingness measure:
– A subjective measure can be defined depending on
domain information such as concept hierarchy. The
measure can be used to filter patterns that are
obvious and not required.
Objective measures of
interestingness
• It is a data-driven approach for evaluating
the quality of an association rule.
• It is domain-independent and requires minimal
input from users (such as a threshold value for filtering
low-quality patterns).
• An objective measure is computed from
the frequency counts tabulated in a
contingency table.
Computing Interestingness Measure
• Given a rule X → Y, information needed to compute rule
interestingness can be obtained from a contingency table

Contingency table for X → Y

        Y      ¬Y
X       f11    f10    f1+
¬X      f01    f00    f0+
        f+1    f+0    |T|

f11: support count of X and Y
f10: support count of X and ¬Y
f01: support count of ¬X and Y
f00: support count of ¬X and ¬Y

The row sum f1+ is the support count for X; the column sum f+1 is the support count for Y.
f11 denotes the number of transactions in which X and Y appear together.
f01 denotes the number of transactions containing Y but not X.

Used to define various measures:
◆ support, confidence, lift, Gini, J-measure, etc.
Drawback of Confidence

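          Coffee   ¬Coffee
Tea         15        5       20
¬Tea        75        5       80
            90       10      100

Association Rule: Tea → Coffee

Confidence = P(Coffee|Tea) = 15/20 = 0.75
but P(Coffee) = 90/100 = 0.9
⇒ Although confidence is high, the rule is misleading: tea drinkers are less likely to drink coffee than people in general.
⇒ P(Coffee|¬Tea) = 75/80 = 0.9375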
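
The arithmetic can be checked with a short sketch that derives the measures from the four contingency-table counts. The function name `rule_measures` and its dictionary keys are hypothetical. Lift is computed here as confidence divided by P(Y), the usual definition, which this slide names but does not spell out.

```python
def rule_measures(f11, f10, f01, f00):
    """Measures for X -> Y from the four contingency-table counts."""
    total = f11 + f10 + f01 + f00
    p_y = (f11 + f01) / total        # column sum f+1 over |T|
    confidence = f11 / (f11 + f10)   # f11 over row sum f1+
    return {
        "support": f11 / total,
        "confidence": confidence,
        "p_y": p_y,
        "p_y_given_not_x": f01 / (f01 + f00),
        "lift": confidence / p_y,
    }


# Tea -> Coffee: X = Tea, Y = Coffee
tea_coffee = rule_measures(f11=15, f10=5, f01=75, f00=5)
print(tea_coffee["confidence"])       # 0.75
print(tea_coffee["p_y"])              # 0.9
print(tea_coffee["p_y_given_not_x"])  # 0.9375
print(round(tea_coffee["lift"], 3))   # 0.833
```

A lift below 1 means that knowing a person drinks tea lowers the chance that they drink coffee, which confidence alone hides.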