
A Seminar
on
Association Rule Mining to Remotely Sensed Data

Presented by
Madhusmita Sahu
(CSE, 950014)

1
Contents
Introduction
Apriori Algorithm
Mining Association Rules from Imagery Data
 - Problem definition
 - Partitioning quantitative attributes
 - Finding large itemsets from imagery data
New Pruning Techniques for Fast Data Mining
 - Technique one
 - Technique two
An Example of Applying the New Algorithm
Conclusion
References

2
REMOTE SENSING

 Remote sensing is the science of acquiring information about the Earth's surface
   without actually being in contact with it.
 This is done by recording the energy reflected from the surface.
 Images are collected in multiple bands of the electromagnetic spectrum.

3
Association Rule Mining
 Associations: simple rules over categorical data
 Sample applications
   - Market Basket Analysis: Buys(Milk) ⇒ Buys(Eggs)
   - Transaction Processing: Income(Hi) & Single(Y) ⇒ Owns(Computer)
 Search for strong rules (a minimal computation sketch follows below):
   - Support(A ⇒ B) = P(A ∪ B)
   - Confidence(A ⇒ B) = P(B | A) = P(A ∩ B) / P(A)
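These two measures can be computed directly from a transaction table. Below is a minimal Python sketch (the transactions and item names are illustrative, not taken from the seminar) that evaluates Buys(Milk) ⇒ Buys(Eggs):

  # Minimal sketch: support and confidence of a rule A => B over a transaction list.
  def support(itemset, transactions):
      """Fraction of transactions containing every item in itemset."""
      return sum(1 for t in transactions if itemset <= t) / len(transactions)

  def confidence(antecedent, consequent, transactions):
      """P(consequent | antecedent) = support(A and B together) / support(A)."""
      return support(antecedent | consequent, transactions) / support(antecedent, transactions)

  # Illustrative transactions (not from the seminar).
  transactions = [
      {"Milk", "Eggs", "Bread"},
      {"Milk", "Eggs"},
      {"Milk", "Butter"},
      {"Eggs", "Bread"},
  ]
  print(support({"Milk", "Eggs"}, transactions))      # 0.5       -> support of Buys(Milk) => Buys(Eggs)
  print(confidence({"Milk"}, {"Eggs"}, transactions)) # 0.666...  -> confidence of the rule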

4
The Apriori Algorithm: Pseudo-code

 Join step: Ck is generated by joining Lk-1 with itself.
 Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
Pseudo-code:
  Ck: candidate itemsets of size k
  Lk: frequent itemsets of size k

  L1 = {frequent 1-itemsets};
  for (k = 1; Lk != ∅; k++) do begin
      Ck+1 = candidates generated from Lk;
      for each transaction t in the database do
          increment the count of all candidates in Ck+1 that are contained in t;
      Lk+1 = candidates in Ck+1 with min_support;
  end
  return ∪k Lk;
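This pseudo-code translates into a compact, runnable Python sketch; the function name, the set-based representation, and the absolute-count min_support convention are my own choices, not prescribed by the slides:

  from itertools import combinations

  def apriori(transactions, min_support):
      """Return all frequent itemsets; min_support is an absolute transaction count."""
      transactions = [frozenset(t) for t in transactions]
      items = {item for t in transactions for item in t}
      # L1: frequent 1-itemsets
      current = {frozenset([i]) for i in items
                 if sum(i in t for t in transactions) >= min_support}
      frequent, k = set(current), 1
      while current:
          # Join step: combine frequent k-itemsets into (k+1)-itemset candidates,
          # then prune candidates that contain an infrequent k-subset.
          candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
          candidates = {c for c in candidates
                        if all(frozenset(s) in current for s in combinations(c, k))}
          # Count candidates in each transaction and keep those meeting min_support.
          current = {c for c in candidates
                     if sum(c <= t for t in transactions) >= min_support}
          frequent |= current
          k += 1
      return frequent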
5
MINING ASSOCIATION RULES FROM IMAGERY DATA

 Problem definition
 Partitioning quantitative attributes
 Finding large itemsets from imagery data

6
NEW PRUNING TECHNIQUES FOR FAST DATA MINING
Technique one

 Lemma 1: A pixel value cannot belong to two different intervals from the same band.
 Lemma 2: The combination of k intervals (k > 1) from the same band has support zero.

7
Notation:
  Ck : candidate k-itemsets
  Lk : large k-itemsets
  *  : the concatenation operation
  │Ck│ : number of itemsets in the candidate k-itemsets
  │Lk│ : number of itemsets in the large k-itemsets
  Rj : number of intervals in band j

1. According to the Apriori algorithm: Apriori uses L1 * L1 to generate the candidate set of 2-itemsets, C2.
   │C2│apriori = │L1│(│L1│ - 1) / 2

2. According to the new algorithm:
   Assume │L1│ = R1 + R2 + ... + Rn. Then
   │C2│new = R1(R2 + R3 + ... + Rn) + R2(R3 + R4 + ... + Rn) + ...
             + Rn-2(Rn-1 + Rn) + Rn-1(Rn)
           = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Ri Rj

8
Contd…

 The number of candidate 2-itemsets generated by the new algorithm is much smaller than the number generated by Apriori. The difference is exactly the number of same-band pairs that are pruned:
   │C2│prune 1 = │C2│apriori - │C2│new = Σ_{j=1}^{n} Rj(Rj - 1) / 2
 When n is large and the Rj are large, │C2│prune 1 becomes an extremely large number.
 For example, if the imagery data has 8 bands and each band has 16 intervals, the number of pruned candidate 2-itemsets is 8 × 16 × (16 - 1) / 2 = 960 (see the sketch below). This sharply reduces the processing cost.
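As a sketch of technique one (not the authors' implementation), candidate 2-itemsets can be generated by simply refusing to pair two intervals of the same band; the count check at the end reproduces the 8-band, 16-interval figures above:

  from itertools import combinations

  def gen_candidate_2_itemsets(large_1_itemsets, band_of):
      """Pair up large 1-itemsets, skipping pairs drawn from the same band (Lemmas 1 and 2)."""
      return [{a, b} for a, b in combinations(sorted(large_1_itemsets), 2)
              if band_of[a] != band_of[b]]

  # Count check for the example above: 8 bands with 16 intervals each.
  n_bands, intervals_per_band = 8, 16
  size_L1 = n_bands * intervals_per_band                      # 128 large 1-itemsets
  c2_apriori = size_L1 * (size_L1 - 1) // 2                   # 8128 candidates without pruning
  c2_pruned = n_bands * intervals_per_band * (intervals_per_band - 1) // 2
  print(c2_apriori, c2_pruned)                                # 8128 960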

9
Technique two

 During the mining process, allowing the user to interact with the mining engine and exploiting the user's prior knowledge helps speed up the mining algorithm by restricting the search space.
 Consider only one band, bandN, in the output. The association rules then have the form:
   band1 Λ ... Λ band(N-1) ⇒ bandN
 We are not interested in itemsets that do not contain bandN, so we prune every candidate itemset in which no interval is chosen from bandN. The number of candidate 2-itemsets that remain is
   │C2│new = RN (R1 + R2 + ... + RN-1)
 The number of pruned candidate 2-itemsets is
   │C2│prune 2 = Σ_{i=1}^{N-2} Σ_{j=i+1}^{N-1} Ri Rj

10
Contd….

 Apply the pruning technique described in technique one:
   │C2│prune 1 = Σ_{j=1}^{N} Rj(Rj - 1) / 2
 The total number of pruned candidate 2-itemsets is
   │C2│prune = │C2│prune 1 + │C2│prune 2
 The remaining steps are the same as in the Apriori algorithm (a combined candidate-generation sketch follows below).
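A sketch of how the two techniques combine for rules with a single output band bandN is given below; the function names and the dictionary-based band bookkeeping are assumptions of mine, not the authors' code:

  from itertools import combinations

  def gen_candidate_2_itemsets(large_1_itemsets, band_of, output_band):
      """Keep only pairs spanning two different bands (technique one)
      in which one interval comes from the output band (technique two)."""
      return [{a, b} for a, b in combinations(sorted(large_1_itemsets), 2)
              if band_of[a] != band_of[b]
              and output_band in (band_of[a], band_of[b])]

  def count_pruned(interval_counts, output_band):
      """interval_counts maps band -> Rj; returns (|C2|prune 1, |C2|prune 2)."""
      prune1 = sum(r * (r - 1) // 2 for r in interval_counts.values())
      others = [r for band, r in interval_counts.items() if band != output_band]
      prune2 = sum(others[i] * others[j]
                   for i in range(len(others)) for j in range(i + 1, len(others)))
      return prune1, prune2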

11
Contd….

 If there are (N - M) bands in the output, the rules have the form:
   band1 Λ ... Λ bandM ⇒ band(M+1) Λ ... Λ bandN
 The total number of pruned candidate 2-itemsets is
   │C2│prune = │C2│prune 1 + │C2│prune 2
             = Σ_{j=1}^{N} Rj(Rj - 1) / 2 + Σ_{i=1}^{M-1} Σ_{j=i+1}^{M} Ri Rj
 The remaining steps are the same as in the Apriori algorithm.

12
Steps

Step 1: Choose one of the partitioning methods (equal depth, uneven depth, or discontinuous partitioning) to determine the intervals (a minimal partitioning sketch follows after Step 3).

Step 2: From the large 1-itemsets, apply the new pruning techniques (technique one and technique two) to generate the candidate 2-itemsets.

Step 3: Apply the remaining steps of the Apriori algorithm.
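As a minimal sketch of Step 1, assuming the equal-width intervals used in the example slides (e.g. [0,63], [64,127], ... for a band with four intervals; the function names are illustrative):

  def equal_width_intervals(n_intervals, lo=0, hi=255):
      """Split [lo, hi] into n_intervals equal-width intervals, e.g. 4 -> [0,63], [64,127], ..."""
      width = (hi - lo + 1) // n_intervals
      return [(lo + i * width, lo + (i + 1) * width - 1) for i in range(n_intervals)]

  def interval_label(band, value, intervals):
      """Map a pixel value to its interval label, e.g. band 1, value 40 -> 'b11'."""
      for idx, (lo, hi) in enumerate(intervals, start=1):
          if lo <= value <= hi:
              return f"b{band}{idx}"
      raise ValueError(f"value {value} falls outside every interval")

  print(equal_width_intervals(4))                        # [(0, 63), (64, 127), (128, 191), (192, 255)]
  print(interval_label(1, 40, equal_width_intervals(4))) # b11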

13
An example of applying the new algorithm (assume the user selects equal-depth partitioning: diameter two for band1 and band4, diameter three for band2 and band3).

Pixel   Band1   Band2   Band3   Band4
1       40      140     200     240
2       50      130     210     250
3       45      135     210     190
4       100     180     50      100
5       110     170     40      120

        [0,63]   [64,127]   [128,191]   [192,255]
band1   b11      b12        b13         b14
band4   b41      b42        b43         b44

        [0,31]  [32,63]  [64,95]  [96,127]  [128,159]  [160,191]  [192,225]  [226,255]
band2   b21     b22      b23      b24       b25        b26        b27        b28
band3   b31     b32      b33      b34       b35        b36        b37        b38
14
An example of partitioning the values into intervals: after selecting the partitioning method, map each value in the table above into intervals (a Python sketch of this mapping follows the table below).

Pixel   b11 b12 b13 b14 b21 b25 b26 b28 b31 b32 b37 b38 b41 b42 b43 b44
1       1   0   0   0   0   1   0   0   0   0   1   0   0   0   0   1
2       1   0   0   0   0   1   0   0   0   0   1   0   0   0   0   1
3       1   0   0   0   0   1   0   0   0   0   1   0   0   0   1   0
4       0   1   0   0   0   0   1   0   0   1   0   0   0   1   0   0
5       0   1   0   0   0   0   1   0   0   1   0   0   0   1   0   0
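The mapping shown in this table can be sketched in Python as follows, assuming equal-width intervals per band (the pixel values are those of slide 14; the helper name to_itemset is my own):

  # Each pixel becomes the set of interval labels it falls into, one label per band.
  pixels = {
      1: (40, 140, 200, 240),
      2: (50, 130, 210, 250),
      3: (45, 135, 210, 190),
      4: (100, 180, 50, 100),
      5: (110, 170, 40, 120),
  }
  intervals_per_band = {1: 4, 2: 8, 3: 8, 4: 4}

  def to_itemset(values):
      itemset = set()
      for band, value in enumerate(values, start=1):
          width = 256 // intervals_per_band[band]
          itemset.add(f"b{band}{value // width + 1}")
      return itemset

  transactions = {pid: to_itemset(vals) for pid, vals in pixels.items()}
  print(transactions[1])   # {'b11', 'b25', 'b37', 'b44'}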

15
Contd….
 Apply the new pruning techniques for candidate 2-itemset generation. Assume minsup = 40% and minconf = 60%.
 Candidate 1-itemsets:
   {b11, b12, b13, b14, b21, b22, b23, b24, b25, b26, b27, b28, b31, b32, b33, b34, b35, b36, b37, b38, b41, b42, b43, b44}
 Large 1-itemsets (with support counts):
   {b11(3), b12(2), b25(3), b26(2), b32(2), b37(3), b42(2), b44(2)}
 Candidate 2-itemsets:
   {{b42,b11}, {b42,b12}, {b42,b25}, {b42,b26}, {b42,b32}, {b42,b37},
    {b44,b11}, {b44,b12}, {b44,b25}, {b44,b26}, {b44,b32}, {b44,b37}}

16
An example contd….
• Applying pruning technique one:
  │C2│prune 1 = 1 + 1 + 1 + 1 = 4
• Applying pruning technique two:
  │C2│prune 2 = 2 × (2 + 2) + 2 × 2 = 12
• The total number of pruned candidate 2-itemsets is 12 + 4 = 16.
• Applying the Apriori algorithm, the number of candidate 2-itemsets is
  │C2│apriori = (8 × 7) / 2 = 28
• The percentage of pruning is 57%, so the execution efficiency of the mining process is improved (the counts are re-computed in the sketch below).
• The remaining steps are the same as in the Apriori algorithm.
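A short sketch re-computing these counts (the variable names are my own; the formulas are those of techniques one and two):

  # 4 bands, 2 large 1-itemsets per band, output band = band4.
  R = {1: 2, 2: 2, 3: 2, 4: 2}
  size_L1 = sum(R.values())                                          # 8 large 1-itemsets
  c2_apriori = size_L1 * (size_L1 - 1) // 2                          # 28
  prune1 = sum(r * (r - 1) // 2 for r in R.values())                 # 4
  others = [r for band, r in R.items() if band != 4]
  prune2 = sum(others[i] * others[j]
               for i in range(len(others)) for j in range(i + 1, len(others)))
  print(c2_apriori, prune1, prune2, prune1 + prune2)                 # 28 4 12 16
  print(f"{(prune1 + prune2) / c2_apriori:.0%} of the candidates are pruned")  # 57%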

17
Conclusion

 In this seminar, we defined a new data mining problem --- mining association rules from imagery data --- and discussed its application in precision agriculture.
 Since the efficiency of a mining algorithm is a very important issue in data mining, we presented two simple and effective pruning techniques for candidate 2-itemset generation.
 By exploiting the nature of the problem and the characteristics of imagery data, we can prune a significant number of unnecessary candidate itemsets during the very early phase of the mining process.

18
References

 J. Dong, W. Perrizo, Q. Ding, and J. Zhou, "Association Rule Mining to Remotely Sensed Data," North Dakota State University, Fargo, ND 58105.
 J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, 2006.
 J. Zhang, W. Hsu, and M. L. Lee, "Image mining: issues, frameworks, and techniques," in Proceedings of the 2nd International Workshop on Multimedia Data Mining, San Francisco, Aug. 2001, pp. 13-20.
 J. Li and R. M. Narayanan, "Integrated spectral and spatial information mining in remote sensing," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 3, pp. 673-685, March 2004.

19
Thank You!!

20
