Decision-Tree Induction
Algorithms
Rodrigo C. Barros
Márcio P. Basgalupp
André C.P.L.F. de Carvalho
Alex A. Freitas
Wednesday, July 3, 13
Submission
Objective
Note: many genetic programming (GP) systems induce decision trees for a given
data set, but HEAD-DT is different: it designs the induction algorithm itself
automating the design of decision
tree induction algorithms
Manual invention of decision tree
induction algorithms
data sets -> decision trees
Automatic invention of decision
tree induction algorithms
Iteratively mix building blocks
“Building blocks” of DTI algorithms:
split measure components()
stop criteria components()
...
Machine-designed generic decision tree induction program (DTI)
data sets -> decision trees
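The idea of assembling an induction algorithm from interchangeable building blocks can be illustrated with a minimal sketch. Everything named here (`SPLIT_MEASURES`, `STOP_CRITERIA`, `make_inducer`, `predict`, `random_design`) is hypothetical and chosen for illustration; the two split measures and two stop criteria are small stand-ins, and the random draw in `random_design` is only a placeholder for the iterative mixing shown on the slide, not HEAD-DT's actual search procedure.

```python
import math
import random
from collections import Counter

# Hypothetical building blocks: two impurity measures and two stop criteria.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def max_depth(limit):
    return lambda labels, depth: depth >= limit

def min_samples(limit):
    return lambda labels, depth: len(labels) < limit

SPLIT_MEASURES = {"gini": gini, "entropy": entropy}
STOP_CRITERIA = {"max_depth_2": max_depth(2), "min_samples_4": min_samples(4)}

def make_inducer(split_name, stop_name):
    """Assemble a top-down induction procedure from two named building blocks."""
    impurity, stop = SPLIT_MEASURES[split_name], STOP_CRITERIA[stop_name]

    def induce(rows, labels, depth=0):
        majority = Counter(labels).most_common(1)[0][0]
        if len(set(labels)) == 1 or stop(labels, depth):
            return majority  # leaf: predict the majority class
        best = None
        for attr in range(len(rows[0])):
            groups = {}
            for row, y in zip(rows, labels):
                groups.setdefault(row[attr], []).append(y)
            if len(groups) < 2:
                continue  # attribute is constant here, so it cannot split
            # Weighted impurity of the partition; lower is better.
            score = sum(len(g) / len(labels) * impurity(g) for g in groups.values())
            if best is None or score < best[0]:
                best = (score, attr)
        if best is None:
            return majority
        attr = best[1]
        children = {}
        for value in {row[attr] for row in rows}:
            idx = [i for i, row in enumerate(rows) if row[attr] == value]
            children[value] = induce([rows[i] for i in idx],
                                     [labels[i] for i in idx], depth + 1)
        return (attr, children, majority)  # internal node

    return induce

def predict(tree, row):
    # Internal nodes are tuples; leaves are class labels (assumed not to be tuples).
    while isinstance(tree, tuple):
        attr, children, majority = tree
        tree = children.get(row[attr], majority)
    return tree

def random_design(rng):
    """Draw one (split measure, stop criterion) combination at random."""
    return rng.choice(sorted(SPLIT_MEASURES)), rng.choice(sorted(STOP_CRITERIA))
```

Calling `make_inducer(*random_design(random.Random(0)))` returns an induction function that can then be applied to any data set of categorical rows, which mirrors the slide's flow from data sets to decision trees.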
Motivation of automating the
design of data mining algorithms
A new machine-designed algorithm can be useful for types of data sets where
human-designed algorithms do not achieve good predictive performance
Algorithm evaluation
Fitness evaluation: each individual is a decision-tree algorithm
Algorithm tailored to a single data set:
  data set = meta-training set (90%) + meta-test set (10%)
  meta-training set = training (70%) + validation (30%)
  training -> DT -> validation -> stats
Algorithm tailored to ... (meta-training set of n data sets):
  training 1 -> DT 1 -> validation 1 -> stats 1
  training 2 -> DT 2 -> validation 2 -> stats 2
  ...
  training n -> DT n -> validation n -> stats n
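The two evaluation schemes on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slide only says "stats", so plain accuracy is assumed as the fitness statistic; averaging over the n data sets and reusing the 70%/30% split for each of them are also assumptions; and `split`, `accuracy`, `fitness_single_dataset`, `fitness_multiple_datasets`, and `majority_inducer` are hypothetical names.

```python
import random
from collections import Counter

def split(items, fraction, rng):
    """Shuffle a copy of items and cut it at the given fraction."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(round(fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def accuracy(classifier, examples):
    # Assumed fitness statistic; the slide does not name one.
    return sum(classifier(x) == y for x, y in examples) / len(examples)

def fitness_single_dataset(induce, data, rng):
    """Single data set: 90%/10% meta split, then 70%/30% training/validation."""
    meta_training, meta_test = split(data, 0.9, rng)
    training, validation = split(meta_training, 0.7, rng)
    classifier = induce(training)
    # The meta-test set is returned untouched; it plays no part in fitness.
    return accuracy(classifier, validation), meta_test

def fitness_multiple_datasets(induce, datasets, rng):
    """n data sets: one training/validation pair each, scores averaged (assumed)."""
    scores = []
    for data in datasets:
        training, validation = split(data, 0.7, rng)
        scores.append(accuracy(induce(training), validation))
    return sum(scores) / len(scores)

def majority_inducer(training):
    """Stand-in for a candidate algorithm: always predicts the majority class."""
    majority = Counter(y for _, y in training).most_common(1)[0][0]
    return lambda x: majority
```

Here `induce` is any callable that maps a list of `(features, label)` pairs to a classifier, so a candidate decision-tree algorithm can be plugged in where `majority_inducer` is used.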
Successful applications
Algorithm tailored to a single data set: public UCI data
Algorithm tailored to flexible-receptor molecular docking data
Result is human competitive
Even though enhancements have been proposed to both CART and C4.5, no
top-down algorithm to date has achieved predictive results comparable to
them.
Result is human competitive
HEAD-DT is the first (and so far the only) algorithm able to automatically design
a complete top-down decision tree algorithm tailored to a particular data set or to
a particular domain.
Summary
Thank you!
Questions?