
Maximal and Closed Itemsets

A compact representation of frequent itemsets is extremely important in
association rule mining. The notion of maximal frequent itemsets comes in
handy when we are dealing with huge amounts of data. If a frequent itemset has
length k, we know that all 2^k of its subsets are also frequent because of
the downward closure property. But when the computation is very expensive and
we are not interested in associations alone, the process of generating these
additional subsets can be avoided, and we can look only at the maximal
frequent itemsets.
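
As a small illustration of the downward closure property, the sketch below enumerates every subset of one frequent itemset. The itemset {A, B, C} and the helper name `all_subsets` are assumptions made for illustration only; they do not come from any particular library.

```python
from itertools import combinations

def all_subsets(itemset):
    """Return every subset of `itemset`, including the empty set and itself."""
    items = sorted(itemset)
    return [frozenset(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]

frequent = {"A", "B", "C"}        # hypothetical frequent itemset, k = 3
subsets = all_subsets(frequent)

# By downward closure, each of these 2**k subsets is frequent as well.
print(len(subsets), 2 ** len(frequent))
```

The count grows exponentially with k, which is why storing only the maximal itemsets can save a lot of space.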

Maximal frequent itemset: An itemset is maximal frequent if it is frequent and
none of its immediate supersets is frequent.
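
The definition can be checked mechanically once supports are known. The following is a minimal sketch, assuming a made-up support table over three items X, Y, Z and a minimum support of 2; the function names are hypothetical.

```python
def immediate_supersets(itemset, all_items):
    """Yield every itemset obtained by adding exactly one more item."""
    for item in all_items - itemset:
        yield itemset | {item}

def is_maximal_frequent(itemset, support, all_items, minsup):
    """True if `itemset` is frequent and no immediate superset is frequent."""
    if support.get(itemset, 0) < minsup:
        return False
    return all(support.get(sup, 0) < minsup
               for sup in immediate_supersets(itemset, all_items))

# Hypothetical support counts (illustration only, not taken from the text).
all_items = frozenset("XYZ")
support = {
    frozenset("X"): 4, frozenset("Y"): 3, frozenset("Z"): 3,
    frozenset("XY"): 3, frozenset("XZ"): 2, frozenset("YZ"): 1,
    frozenset("XYZ"): 1,
}

maximal = {s for s in support
           if is_maximal_frequent(s, support, all_items, minsup=2)}
print(sorted("".join(sorted(s)) for s in maximal))
```

Checking only the immediate supersets is enough: if a larger superset were frequent, downward closure would make some immediate superset frequent too.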

For example, look at the lattice below to identify the maximal itemsets.
If we closely observe the red border in this lattice, three itemsets next to it
are highlighted in blue; these are the maximal frequent itemsets. This is easy
to see, since none of the supersets of the blue itemsets lies above the border.

However, one problem with maximal frequent itemsets is that, even though we
know that all their subsets are frequent, we do not know the supports of those
subsets. Actual support information is very important when we are deriving
association rules. So we now shift our goal and look for frequent itemsets that
have no superset with the same support. This leads to the notion of a closed
frequent itemset.

Closed frequent itemset: An itemset is closed if none of its immediate supersets
has the same support as the itemset. A closed itemset that is also frequent is a
closed frequent itemset.
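
A minimal sketch of this check, assuming a made-up support table over three items X, Y, Z and an assumed minimum support of 2 (the names `is_closed` and `MINSUP` are hypothetical):

```python
def is_closed(itemset, support, all_items):
    """True if no immediate superset has the same support as `itemset`."""
    return all(support.get(itemset | {item}, 0) != support[itemset]
               for item in all_items - itemset)

# Hypothetical support counts (illustration only, not taken from the text).
all_items = frozenset("XYZ")
support = {
    frozenset("X"): 4, frozenset("Y"): 3, frozenset("Z"): 3,
    frozenset("XY"): 3, frozenset("XZ"): 2, frozenset("YZ"): 1,
    frozenset("XYZ"): 1,
}
MINSUP = 2  # assumed threshold

closed_frequent = {s for s in support
                   if support[s] >= MINSUP and is_closed(s, support, all_items)}
for s in sorted(closed_frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), support[s])
```

In this toy table, Y is frequent but not closed, because its superset XY has the same support of 3.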

To understand the notion of closed frequent itemsets, let's look at the
example below.

Transactions:

TID   Items
1     {A,B}
2     {B,C,D}
3     {A,B,C,D}
4     {A,B,D}
5     {A,B,C,D}

Itemset supports:

Itemset      Support
{A}          4
{B}          5
{C}          3
{D}          4
{A,B}        4
{A,C}        2
{A,D}        3
{B,C}        3
{B,D}        4
{C,D}        3
{A,B,C}      2
{A,B,D}      3
{A,C,D}      2
{B,C,D}      3
{A,B,C,D}    2
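
The support counts above can be reproduced directly from the five transactions. The sketch below uses plain brute-force enumeration of all itemsets, which is fine for four items but is an assumption of this example rather than how a real miner such as Apriori would proceed; the name `support_counts` is hypothetical.

```python
from itertools import combinations

# The five transactions from the example, keyed by TID.
transactions = {
    1: {"A", "B"},
    2: {"B", "C", "D"},
    3: {"A", "B", "C", "D"},
    4: {"A", "B", "D"},
    5: {"A", "B", "C", "D"},
}

def support_counts(transactions):
    """Count, for every non-empty itemset, the transactions that contain it."""
    items = sorted(set().union(*transactions.values()))
    counts = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            counts[candidate] = sum(candidate <= t for t in transactions.values())
    return counts

support = support_counts(transactions)
for itemset, count in support.items():
    print("{" + ",".join(sorted(itemset)) + "}", count)
```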

In the above process we have derived the set of frequent itemsets from the
transactions. Now let's represent the itemset lattice and identify the closed
and maximal frequent itemsets. The minimum support in the current example is set
to 2. The red label on each node of the lattice lists the transactions in which
that itemset occurs; the number of transactions listed is its support.
We have labeled a few of the nodes, and the labels can be verified against the
transactions. C, D, and E are closed because none of their supersets has the
same support; each of them does have a frequent superset, however, so none of
them is maximal. For CE and DE both properties are satisfied, so they are both
closed and maximal.

Lattice legend: closed but not maximal; closed and maximal.
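
To make both definitions concrete, the sketch below classifies the itemsets of the five-transaction example (items A to D) with minimum support 2. It stores, for each itemset, the set of TIDs in which it occurs, mirroring the node labels described for the lattice. The brute-force enumeration and all variable names are assumptions of this sketch.

```python
from itertools import combinations

transactions = {
    1: {"A", "B"},
    2: {"B", "C", "D"},
    3: {"A", "B", "C", "D"},
    4: {"A", "B", "D"},
    5: {"A", "B", "C", "D"},
}
MINSUP = 2
items = sorted(set().union(*transactions.values()))

# Map each non-empty itemset to the TIDs of the transactions containing it.
tidsets = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        candidate = frozenset(combo)
        tidsets[candidate] = {tid for tid, t in transactions.items()
                              if candidate <= t}

frequent = {s for s, tids in tidsets.items() if len(tids) >= MINSUP}

def immediate_supersets(itemset):
    """All itemsets formed by adding exactly one item."""
    return [itemset | {i} for i in items if i not in itemset]

closed = {s for s in frequent
          if all(len(tidsets[sup]) != len(tidsets[s])
                 for sup in immediate_supersets(s))}
maximal = {s for s in frequent
           if all(sup not in frequent for sup in immediate_supersets(s))}

def show(itemsets):
    return sorted("".join(sorted(s)) for s in itemsets)

print("closed: ", show(closed))
print("maximal:", show(maximal))
```

With this data every itemset is frequent, so the only maximal frequent itemset is {A,B,C,D}, while the closed frequent itemsets are {B}, {A,B}, {B,D}, {A,B,D}, {B,C,D}, and {A,B,C,D}. Every maximal frequent itemset is also closed, but not the other way around.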
