
Association based Recommender System

Mamata Jenamani
Professor
Department of Industrial & Systems Engineering
Association based recommendation system
• A variation of collaborative filtering
• Recommends items that are often purchased together with the items a user has purchased in the past or has shown interest in purchasing
– Based on co-occurrences of items that users frequently purchase/view together
• Information used
– Unary rating
• Type of recommendation decision
– Prediction
– Top-N recommendations
• Personalized
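The co-occurrence idea behind this approach can be illustrated with a minimal sketch. It assumes each user's history is a set of item IDs (unary data: purchased/viewed or not) and counts how many histories contain each pair of items. The function name `cooccurrence_counts` and the toy histories are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(histories):
    """Count how many user histories contain each unordered pair of items."""
    counts = Counter()
    for items in histories:
        # sorted() gives each pair a canonical order, e.g. ('A', 'D') rather than ('D', 'A')
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Hypothetical unary histories: the set of items each user purchased
histories = [{"A", "B", "D"}, {"A", "C", "D"}, {"A", "D", "E"}]
print(cooccurrence_counts(histories).most_common(1))  # [(('A', 'D'), 3)]
```

Pairs with high counts are the raw material for association rules; support and confidence turn these counts into rules.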
Introduction to frequent pattern analysis
• Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
• Frequent pattern analysis is the basis of association rule
mining
• Motivation: Finding inherent regularities in data
– What products were often purchased together?
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to this new drug?
– Can we automatically classify web documents?

• Applications
– Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
Basic Concepts: Frequent Patterns and Association Rules
Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F
• Itemset X = {x1, …, xk}
• Find all the rules X → Y with minimum support and confidence
– support, s: probability that a transaction contains X ∪ Y
– confidence, c: conditional probability that a transaction having X also contains Y
[Figure: Venn diagram of "Customer buys A", "Customer buys B", and "Customer buys both"]
Let supmin = 50%, confmin = 50%
Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
– A → D (60%, 100%)
– D → A (60%, 75%)
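The frequent patterns listed above can be checked by brute force on this five-transaction database. This is a minimal sketch (the function name `frequent_itemsets_bruteforce` is hypothetical): it enumerates every possible itemset, which is exponential in the number of items and only suitable for verifying tiny examples.

```python
from itertools import combinations

# Transaction database from the example above (TID -> items bought)
db = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}


def frequent_itemsets_bruteforce(db, min_sup_fraction):
    """Enumerate every itemset and keep those whose support meets the threshold."""
    items = sorted(set().union(*db.values()))
    n = len(db)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            # Support count: number of transactions containing every item of combo
            count = sum(1 for t in db.values() if set(combo) <= t)
            if count / n >= min_sup_fraction:
                result["".join(combo)] = count
    return result


print(frequent_itemsets_bruteforce(db, 0.5))
# {'A': 3, 'B': 3, 'D': 4, 'E': 3, 'AD': 3}
```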
Interestingness measures
• Association rule mining searches for interesting relationships among items in a given data set.
• Two measures of interestingness: support and confidence
• Find all the rules X → Y with minimum support and confidence
– support, S: probability that a transaction contains X ∪ Y
• S = (# of tuples containing both X and Y)/(total number of tuples)
• Support count = # of tuples containing both X and Y
– confidence, C: conditional probability that a transaction having X also contains Y
• C = (# of tuples containing both X and Y)/(# of tuples containing X)
= (support count of X ∪ Y)/(support count of X)
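The two formulas translate directly into code. The sketch below (function names are hypothetical) reuses the five transactions from the basic-concepts example and reproduces A → D (60%, 100%) and D → A (60%, 75%). It assumes X occurs in at least one transaction; otherwise the confidence denominator is zero.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(x, y, transactions):
    """S(X -> Y) = support count of X ∪ Y / total number of transactions."""
    return support_count(set(x) | set(y), transactions) / len(transactions)


def confidence(x, y, transactions):
    """C(X -> Y) = support count of X ∪ Y / support count of X (X must occur)."""
    return support_count(set(x) | set(y), transactions) / support_count(x, transactions)


transactions = [
    {"A", "B", "D"}, {"A", "C", "D"}, {"A", "D", "E"},
    {"B", "E", "F"}, {"B", "C", "D", "E", "F"},
]
print(support({"A"}, {"D"}, transactions), confidence({"A"}, {"D"}, transactions))  # 0.6 1.0
print(support({"D"}, {"A"}, transactions), confidence({"D"}, {"A"}, transactions))  # 0.6 0.75
```

Support is symmetric in X and Y, while confidence is not, which is why the two rules share 60% support but differ in confidence.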
Algorithms for association rule mining
• Three major approaches
– Apriori algorithm
– Frequent pattern growth
– Vertical data format approach
The Apriori Algorithm
• Apriori principle
– Suppose an itemset I is not frequent (i.e., it does not have the minimum support). If an item A is added to I, the resulting itemset I ∪ {A} cannot occur more frequently than I, so it is not frequent either.
– This is an anti-monotone property
• If a set cannot pass a test, then all its supersets will also fail the test.
– Two steps of the algorithm
• Join
• Prune
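The anti-monotone property can be checked empirically on a small database. This sketch (the helper names are hypothetical) uses the four transactions from the assignment below and verifies that adding any item to any itemset never increases its support count.

```python
from itertools import combinations

transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def check_anti_monotone(transactions):
    """Return True if extending any itemset by one item never raises its support count."""
    items = sorted(set().union(*transactions))
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            base = set(combo)
            base_count = support_count(base, transactions)
            for extra in items:
                if extra not in base and support_count(base | {extra}, transactions) > base_count:
                    return False
    return True


print(check_anti_monotone(transactions))  # True
print(support_count({"C"}, transactions), support_count({"A", "C"}, transactions))  # 3 2
```

Because support can only stay the same or drop as items are added, any superset of an infrequent itemset can be discarded without counting it.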
The algorithm
• Scan DB once to get the candidate 1-itemsets C1
• C1 = Prune(C1)
• L1 ← C1
• Repeat the join step until no frequent or candidate itemset can be generated
• Join
– Ck: the set of candidate k-itemsets generated by joining Lk−1 with itself
– Ck = Prune(Ck)
– Lk ← Ck
• Prune(Ck)
– Delete the candidates in Ck that do not satisfy the Apriori property: if any (k−1)-subset of a candidate is not in Lk−1, the candidate k-itemset cannot be frequent
– Scan DB to get the support count of each itemset in Ck; delete the itemsets that do not satisfy the minimum support count.
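The join/prune/scan loop can be sketched in Python as follows. The slide does not say how two (k−1)-itemsets are paired in the join, so this sketch assumes the common prefix-based join (merge two sorted itemsets that share their first k−2 items). The function name `apriori` and the frozenset-based representation are illustrative choices, not a reference implementation.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One database scan: count the transactions that contain each candidate
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # C1: every single item; L1: the ones meeting the minimum support count
    c1 = {frozenset([i]) for t in transactions for i in t}
    lk = {c: n for c, n in count(c1).items() if n >= min_count}
    frequent = dict(lk)
    k = 2
    while lk:
        prev = sorted(tuple(sorted(s)) for s in lk)
        ck = set()
        for a, b in combinations(prev, 2):
            # Join: merge (k-1)-itemsets that agree on their first k-2 items
            if a[:-1] == b[:-1]:
                cand = frozenset(a) | frozenset(b)
                # Prune: every (k-1)-subset must itself be frequent
                if all(frozenset(s) in lk for s in combinations(sorted(cand), k - 1)):
                    ck.add(cand)
        # Scan: keep the candidates that meet the minimum support count
        lk = {c: n for c, n in count(ck).items() if n >= min_count}
        frequent.update(lk)
        k += 1
    return frequent


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), n)
# A 2, B 3, C 3, E 3, AC 2, BC 2, BE 3, CE 2, BCE 2 (one per line)
```

With the prefix join, {A, B, C} and {A, C, E} are never generated from L2, so only {B, C, E} needs to be counted in the third scan.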
Assignment
• Derive the frequent patterns from the given transaction database.
• Generate association rules.
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
Solution (Supmin = 2, i.e., 50%)
Database TDB: Tid 10: A, C, D; 20: B, C, E; 30: A, B, C, E; 40: B, E
1st scan → C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3
C2 (L1 joined with itself): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan → C2: {A, B}:1, {A, C}:2, {A, E}:1, {B, C}:2, {B, E}:3, {C, E}:2
L2: {A, C}:2, {B, C}:2, {B, E}:3, {C, E}:2
3rd scan → C3: {A, B, C}:1, {A, C, E}:1, {B, C, E}:2
(With the prune step, {A, B, C} and {A, C, E} can be removed before the scan, since {A, B} and {A, E} are not in L2.)
L3: {B, C, E}:2
Solution
• Association rules and confidence
– B → C {2/3}
– C → B {2/3}
– B → E {3/3}
– E → B {3/3}
– B → {C, E} {2/3}
– {C, E} → B {2/2}
– C → {B, E} {2/3}
– {B, E} → C {2/3}
– E → {B, C} {2/3}
– {B, C} → E {2/2}
– C → E {2/3}
– E → C {2/3}
– A → C {2/2}
– C → A {2/3}
• Assuming we keep only the rules with 100% confidence, 5 rules qualify: B → E, E → B, {C, E} → B, {B, C} → E, and A → C
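Rule generation can be sketched as follows: for every frequent itemset Z with at least two items, each non-empty proper subset X gives a rule X → Z − X with confidence = support count(Z) / support count(X). The support counts below are the L1, L2 and L3 results from the solution; the function name `generate_rules` is hypothetical.

```python
from itertools import combinations

# Support counts of the frequent itemsets (L1, L2, L3) with min support count 2.
# Item names are single letters, so frozenset("AC") == frozenset({"A", "C"}).
freq = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}


def generate_rules(freq, min_conf):
    """Return a list of (lhs, rhs, confidence) for rules X -> Y with X ∪ Y frequent."""
    rules = []
    for itemset, n_xy in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so freq[lhs] exists
                conf = n_xy / freq[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


for lhs, rhs, conf in generate_rules(freq, min_conf=1.0):
    print(sorted(lhs), "->", sorted(rhs), conf)
# ['A'] -> ['C'] 1.0
# ['B'] -> ['E'] 1.0
# ['E'] -> ['B'] 1.0
# ['B', 'C'] -> ['E'] 1.0
# ['C', 'E'] -> ['B'] 1.0
```

With `min_conf=0.0` the same function lists all 14 rules derivable from the frequent itemsets.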
Association rule based recommendation generation
• Generate association rules from the transaction database
• To generate Top-N recommendations
– Find the association rules supported by the active user (rules whose LHS appears in the active user's transactions)
– Let Ip be the set of unique items suggested by the RHS of these rules
– Sort Ip by a confidence-based score computed over the rules that suggest each item; the score is higher when an item appears in the RHS of more rules.
– Choose the top N of these items
• Prediction
– An item can be recommended if it appears in the RHS of the
association rules supported by the active user.
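A minimal sketch of both decisions follows. The slide does not fix the scoring formula, so this sketch assumes an item's score is the sum of the confidences of the supported rules whose RHS contains it (more rules, higher score), and it also assumes items the user already has are excluded from Top-N. The rules are a handful mined from the assignment database; the function names are hypothetical.

```python
from collections import defaultdict

# (LHS, RHS, confidence) rules mined from the assignment database
rules = [
    (frozenset("B"), frozenset("E"), 3 / 3),
    (frozenset("E"), frozenset("B"), 3 / 3),
    (frozenset("B"), frozenset("C"), 2 / 3),
    (frozenset("C"), frozenset("B"), 2 / 3),
    (frozenset("A"), frozenset("C"), 2 / 2),
    (frozenset("BC"), frozenset("E"), 2 / 2),
    (frozenset("CE"), frozenset("B"), 2 / 2),
    (frozenset("B"), frozenset("CE"), 2 / 3),
]


def recommend_top_n(user_items, rules, n):
    """Rank candidate items by the summed confidence of the supported rules suggesting them."""
    user_items = frozenset(user_items)
    scores = defaultdict(float)
    for lhs, rhs, conf in rules:
        if lhs <= user_items:  # rule is supported by the active user
            for item in rhs - user_items:  # assumption: skip items already owned
                scores[item] += conf
    # Highest score first; ties broken alphabetically
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


def predict(user_items, item, rules):
    """True if `item` appears in the RHS of some rule supported by the active user."""
    user_items = frozenset(user_items)
    return any(lhs <= user_items and item in rhs for lhs, rhs, _ in rules)


print(recommend_top_n({"A", "B"}, rules, n=2))  # ['C', 'E']
print(predict({"A", "B"}, "E", rules), predict({"A", "B"}, "D", rules))  # True False
```

For a user who has A and B, item C is suggested by three supported rules (score ≈ 2.33) and E by two (score ≈ 1.67), so C ranks first.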
