Outline
- What is WEKA
- Knowledge Flow
- Explorer
- Why Knowledge Flow
- Cross Validation
- Reference
What is WEKA
Developed at the University of Waikato in New Zealand
A collection of state-of-the-art machine learning algorithms and data preprocessing tools
Provides implementations of:
- Regression
- Classification
- Clustering
- Association rules
- Feature selection
Lecture at National Yang Ming University, June 2006 Copyright 2006 Dong Difeng
Knowledge Flow
Experiment 1:
Type: Classification
Feature selection: GainRatio; Ranker (top 3)
Algorithm: ID3
Training: Weather_nominal.arff
Test: Weather_nominal.arff
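The feature-selection step of Experiment 1 can be sketched outside WEKA. The block below is a minimal illustration, not WEKA's implementation: it assumes Weather_nominal.arff holds the standard 14-instance nominal weather data from Witten and Frank's book (embedded here as `ROWS`), and the helper names `entropy`, `gain_ratio` and `rank_attributes` are made up for this example. It scores each attribute by gain ratio and keeps the three best, which is roughly what GainRatio with Ranker (top 3) does.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Assumption: the standard nominal weather data; the last column is the class (play).
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, col):
    """Information gain of column `col` divided by its split information."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    split_info = entropy([r[col] for r in rows])
    if split_info == 0:
        return 0.0  # a single-valued attribute carries no information
    return (entropy([r[-1] for r in rows]) - remainder) / split_info


def rank_attributes(rows, names):
    """Attribute names sorted by decreasing gain ratio."""
    scored = [(gain_ratio(rows, i), name) for i, name in enumerate(names)]
    return [name for _, name in sorted(scored, reverse=True)]


ranking = rank_attributes(ROWS, ATTRIBUTES)
top3 = ranking[:3]
print(top3)
```

On this data the ranking puts temperature last, so the top-3 selection passes outlook, humidity and windy on to the classifier.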
Source file (.ARFF)
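For readers who have not seen the format, an ARFF file is plain text: a header declaring the relation and its attributes, followed by a data section with one comma-separated instance per line. The sketch below parses a short excerpt. The excerpt text, the relation name and the function `parse_nominal_arff` are illustrative assumptions, and the parser handles only nominal attributes without quoting or sparse data; it is not a replacement for WEKA's loader.

```python
# Illustrative excerpt only: three instances, nominal attributes.
ARFF_TEXT = """\
% comment lines start with a percent sign
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
"""


def parse_nominal_arff(text):
    """Parse a nominal-only ARFF string into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            values = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, values))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_nominal_arff(ARFF_TEXT)
print(relation, [name for name, _ in attributes], len(rows))
```

By convention the last attribute (here play) is used as the class.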
Explorer
Do the same experiment
Experiment 1:
Type: Classification
Feature selection: GainRatio; Ranker (top 3)
Algorithm: ID3
Training: Weather_nominal.arff
Test: Weather_nominal.arff
Cross Validation
Experiment 2:
Type: Classification
Feature selection: GainRatio; Ranker (top 3)
Algorithm: ID3
Training: Weather_nominal.arff (CV)
Test: Weather_nominal.arff (CV)
CV type: 3-fold CV
What do we view in this case?
Text1 vs. Text2 (parts 1 to 3)
Text3 vs. Text4 (parts 1 to 3)
Trees
Evaluation of result
Conclusion:
- Source data are split into several folds for cross-validation
- Feature selection is done separately for each training fold (on the training data only)
- Different trees are built in different cases
- The classification is evaluated on the overall results
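The points in this conclusion can be sketched in a few lines. The block below is a simplified illustration under stated assumptions, not WEKA's code: it embeds the standard 14-instance weather data, assigns folds by interleaving rows (WEKA's own fold assignment may shuffle and stratify, so its numbers can differ), and uses a tiny ID3-style tree. All function names are hypothetical. The part to notice is inside `cross_validate`: attributes are ranked on the training rows of each fold only, and correct predictions are pooled over all folds.

```python
from collections import Counter
from math import log2

NAMES = ["outlook", "temperature", "humidity", "windy"]
# Assumption: the standard nominal weather data; the last column is the class.
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"), ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"), ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"), ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"), ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"), ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"), ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"), ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, col):
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - remainder


def gain_ratio(rows, col):
    split_info = entropy([r[col] for r in rows])
    return info_gain(rows, col) / split_info if split_info > 0 else 0.0


def top_k(rows, k):
    """Indices of the k attributes with the highest gain ratio on `rows`."""
    return sorted(range(len(NAMES)), key=lambda c: gain_ratio(rows, c), reverse=True)[:k]


def majority(rows):
    return Counter(r[-1] for r in rows).most_common(1)[0][0]


def build_tree(rows, cols):
    """ID3-style tree over columns `cols`: a leaf is a class label, a node is a tuple."""
    if len({r[-1] for r in rows}) == 1 or not cols:
        return majority(rows)
    best = max(cols, key=lambda c: info_gain(rows, c))
    rest = [c for c in cols if c != best]
    branches = {}
    for value in {r[best] for r in rows}:
        branches[value] = build_tree([r for r in rows if r[best] == value], rest)
    return (best, branches, majority(rows))


def classify(tree, row):
    while isinstance(tree, tuple):
        col, branches, default = tree
        tree = branches.get(row[col], default)  # unseen value: fall back to majority
    return tree


def cross_validate(rows, folds=3, k=3):
    correct, selected = 0, []
    for f in range(folds):
        test = rows[f::folds]
        train = [r for i, r in enumerate(rows) if i % folds != f]
        cols = top_k(train, k)           # feature selection sees training rows only
        tree = build_tree(train, cols)   # so each fold may get a different tree
        correct += sum(classify(tree, r) == r[-1] for r in test)
        selected.append([NAMES[c] for c in cols])
    return correct / len(rows), selected  # accuracy pooled over all folds


accuracy, selected = cross_validate(ROWS)
print(accuracy, selected)
```

Printing `selected` shows which attributes each fold kept; because the ranking is recomputed per fold, the lists need not agree with each other or with the ranking obtained on the full data.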
Experiment 3:
Type: Classification
Feature selection: GainRatio; Ranker (top 2)
Algorithm: ID3
Training: Weather_nominal.arff
Test: Weather_nominal.arff
Ranker top 3 vs. Ranker top 2
Conclusion: Attribute windy was ignored. In this case, the classifier only considers the attributes that were kept.
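The effect of lowering the Ranker cutoff can be shown as a simple column projection. This sketch makes two assumptions: the hard-coded `RANKING` is the gain-ratio order obtained on the standard weather data (consistent with windy being the attribute lost when going from top 3 to top 2), and `keep_top` is a hypothetical helper, not a WEKA function. Whatever the cutoff removes never reaches the classifier.

```python
# Assumption: gain-ratio order on the standard weather data, best first.
RANKING = ["outlook", "humidity", "windy", "temperature"]
NAMES = ["outlook", "temperature", "humidity", "windy", "play"]


def keep_top(rows, names, ranking, k, class_name="play"):
    """Keep only the k best-ranked attributes plus the class column."""
    kept = [n for n in names if n in ranking[:k] or n == class_name]
    idx = [names.index(n) for n in kept]
    return kept, [tuple(r[i] for i in idx) for r in rows]


rows = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
]

kept3, proj3 = keep_top(rows, NAMES, RANKING, 3)
kept2, proj2 = keep_top(rows, NAMES, RANKING, 2)
print(kept3)  # ['outlook', 'humidity', 'windy', 'play']
print(kept2)  # ['outlook', 'humidity', 'play']
```

With top 2, the two example rows become identical after projection, which is why a tree built on the reduced data cannot split on windy.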
Reference
http://www.cs.waikato.ac.nz/~ml/
Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)