Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Positioning decision tree learning
[Diagram positioning decision tree learning within data mining, together with clustering, association rule learning, and predictive analytics; related to process mining (process discovery, conformance checking, other types of mining) and BPM.]
Decision tree learning
Decision tree learning is a form of supervised learning: the tree predicts a response variable from the other variables.
Data set 1: Effect of lifestyle on life expectancy
[Decision tree: the root splits on smoker (yes/no). smoker = yes → leaf young (195/11). smoker = no → split on drinker (yes/no): drinker = yes → leaf old (65/2); drinker = no → split on weight: <90 → leaf old (219/34), ≥90 → leaf young (381/55). Each leaf shows (instances/misclassified). The response variable is age: ≥70 = old, <70 = young.]
Question: Correctly classified?
Instance: Mary Jones — drinker yes, smoker no, weight 70 kg, age 85 years.
[Mary's attributes are dropped into the decision tree above.]
Answer: Yes
[smoker = no → drinker = yes → old (65/2); Mary is 85, so the prediction old is correct.]
Question: Correctly classified?
Instance: Sue Smith — drinker no, smoker no, weight 60 kg, age 35 years.
[Sue's attributes are dropped into the decision tree above.]
Answer: No
[smoker = no → drinker = no → weight < 90 → old (219/34); Sue is 35, i.e., young, so she is one of the 34 misclassified instances in that leaf.]
Data set 2: Effect of individual course results on graduation
[Decision tree predicting the graduation result from individual course grades (the predictor variables): splits on courses such as logic (≥8), linear programming (≥7 / <7), linear algebra (≥6 / <6), and operations research (≥6 / <6); leaves include cum laude (20/2), passed (87/11), passed (31/7), failed (101/8), passed (82/7), and failed (20/4). The graduation result is the response variable.]
Data set 3: Muffin or no muffin
[Decision tree predicting muffin consumption from drink counts: the root splits on tea (0 / ≥1). tea = 0 → split on latte: latte = 0 → no muffin (189/10), latte = 1 → muffin (4/0), latte ≥ 2 → muffin (30/1). tea ≥ 1 → split on espresso: espresso = 0 → muffin (6/2), espresso ≥ 1 → no muffin (11/3). The response variable is muffin; 0 = "no muffin".]
Question: Correctly classified?
Instance: cappuccino 2, latte 3, bagel 2, muffin 1.
[The instance is dropped into the muffin decision tree above.]
Answer: Yes
[With no tea and latte ≥ 2, the tree predicts muffin, and the person indeed had a muffin.]
Question: Correctly classified?
Instance: bagel 1, latte 1, ristretto 1.
[The instance is dropped into the muffin decision tree above.]
Answer: No
[The tree's prediction does not match this person's actual muffin consumption.]
Question: Did this person eat a muffin?
Instance: bagel 1, latte 2, cappuccino 1, muffin ?.
[The instance is dropped into the muffin decision tree above.]
Answer: Yes!
[With latte ≥ 2 the tree predicts muffin; the person's muffin count was indeed 1.]
How does it work? - Basic idea
Split the set of instances into subsets such that the variation within each subset becomes smaller.
[Initially: one set of instances with high entropy.]
[The same idea, animated: a split on attribute smoker (yes/no), followed by a further split on attribute drinker (yes/no).]
Decreasing entropy
[Each split decreases entropy: splitting on attribute smoker turns one high-entropy set into subsets with low and high entropy; splitting further on attribute drinker turns these into subsets with low and lower entropy.]
High entropy
- Degree of uncertainty.
- Inverse of "compressibility" ("zippability").
- Goal: reduce entropy in the leaves of the tree to improve predictability.
Intermezzo: Logarithms
(needed for computing entropy)
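The logarithm formulas themselves did not survive extraction. As a reminder of the standard facts needed for entropy (a quick refresher, not slide content): log2(x) is the power to which 2 must be raised to obtain x, and it can be computed from natural logarithms via the change-of-base rule.

```python
import math

# log2(x) answers: 2 to what power gives x?
assert math.log2(8) == 3.0       # 2**3 = 8
assert math.log2(1) == 0.0       # 2**0 = 1
assert math.log2(0.25) == -2.0   # 2**(-2) = 0.25

# Change of base: log2(x) = ln(x) / ln(2)
x = 0.75
assert abs(math.log2(x) - math.log(x) / math.log(2)) < 1e-12
```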
Definition entropy
For a set with class proportions p1, …, pk: E = −Σi pi · log2 pi.
Examples:
- three red, three green (3:3) → E = 1
- two red, no green (2:0) → E = 0
- one red, three green (1:3) → E = 0.811
- one red, one green (1:1) → E = 1
- no red, two green (0:2) → E = 0
[Each example is a subset produced by splitting on attribute smoker or drinker.]
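A minimal sketch of the entropy computation, checked against the red:green examples above (the function name `entropy` is mine, not from the slides):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# The red:green examples from the slides:
assert entropy([3, 3]) == 1.0               # three red, three green
assert entropy([2, 0]) == 0.0               # two red, no green
assert round(entropy([1, 3]), 3) == 0.811   # one red, three green
assert entropy([1, 1]) == 1.0               # one red, one green
assert entropy([0, 2]) == 0.0               # no red, two green
```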
Entropy values
[The entropy of each subset produced by splitting on attribute smoker or drinker.]
Weighted average
[The overall entropy after a split: the average of the subset entropies, weighted by the number of instances in each subset.]
Information gain
[Starting entropy E = 1. Splitting on attribute smoker reduces the overall entropy to E = 0.54, an information gain of 0.46; splitting further on attribute drinker reduces it to E = 0.33, an additional gain of 0.21.]
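The slide's underlying instance counts are not recoverable from this extraction, so here is a sketch of the two steps — subset entropies, then their size-weighted average — with hypothetical counts chosen to land near the slide's E ≈ 0.54 and gain ≈ 0.46. The names `split_entropy` and `information_gain` and the counts are mine.

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_entropy(subsets):
    """Overall entropy after a split: subset entropies weighted by size."""
    n = sum(sum(s) for s in subsets)
    return sum(sum(s) / n * entropy(s) for s in subsets)

def information_gain(parent, subsets):
    return entropy(parent) - split_entropy(subsets)

# Hypothetical node with 8 red and 8 green instances (E = 1),
# split into two subsets of 8 instances each:
parent = [8, 8]
subsets = [[7, 1], [1, 7]]
assert entropy(parent) == 1.0
assert round(split_entropy(subsets), 2) == 0.54
assert round(information_gain(parent, subsets), 2) == 0.46
```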
Answer: E = 0 for other cells
Other cells: 16+0+0+0+0+0+0+0 balls.
Overall entropy (weighted average): E = 0.33
[E = 0.3333 compared to E = 3 for a uniform distribution: information loss = 2.6666.]
Worked example: splitting the root on attribute smoker
Root: young (860/314); #young = 546, #old = 314; E = 0.946848.
split on attribute smoker:
- smoker = yes: young (195/11); #young = 184, #old = 11; E = 0.313027
- smoker = no: young (665/303); #young = 362, #old = 303; E = 0.994314
Overall entropy (weighted average over both subsets) after the split: 0.946848 − 0.107012 = 0.839836, i.e., the information gain is 0.107012.
Although the classification did not change (both leaves still predict young), there was information gain: we are more certain about the group of smokers.
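The numbers on this slide can be reproduced directly from the counts shown (546/314 at the root, 184/11 and 362/303 after the split); a quick check:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

root = [546, 314]                  # young, old (860 instances)
yes, no = [184, 11], [362, 303]    # smoker = yes / smoker = no

e_root = entropy(root)
e_split = (195 / 860) * entropy(yes) + (665 / 860) * entropy(no)
gain = e_root - e_split

assert abs(e_root - 0.946848) < 1e-5
assert abs(entropy(yes) - 0.313027) < 1e-5
assert abs(entropy(no) - 0.994314) < 1e-5
assert abs(gain - 0.107012) < 1e-5
```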
Worked example (continued): splitting the smoker = no node on attribute drinker
- smoker = yes (unchanged): young (195/11); #young = 184, #old = 11; E = 0.313027
- drinker = yes: old (65/2); #young = 2, #old = 63; E = 0.198234
- drinker = no: the remaining 600 instances
split on attribute drinker: the information gain is 0.076468.
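The second split's numbers (E = 0.198234 for drinker = yes, gain 0.076468) can be checked too. The drinker = no counts are not on these slides; 360/240 below is derived by subtraction (362−2 young, 303−63 old), so treat it as a reconstruction:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Leaves after splitting the smoker = no node on drinker:
smoker_yes = [184, 11]    # young (195/11), unchanged
drinker_yes = [2, 63]     # old (65/2)
drinker_no = [360, 240]   # derived: 362-2 young, 303-63 old

assert abs(entropy(drinker_yes) - 0.198234) < 1e-5

before = (195 * entropy(smoker_yes) + 665 * entropy([362, 303])) / 860
after = (195 * entropy(smoker_yes) + 65 * entropy(drinker_yes)
         + 600 * entropy(drinker_no)) / 860
assert abs((before - after) - 0.076468) < 1e-5
```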
Decision tree algorithm (sketch)
- Start with a root node corresponding to all instances.
- Iteratively traverse all nodes to see whether "information gain" (i.e., reduction of uncertainty) is possible.
- For each node and for every attribute, check what the effect of splitting the node is in terms of information gain.
- Select the attribute with the biggest information gain above a given threshold.
- Continue until no significant improvement is possible.
- Return the decision tree.
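The sketched steps can be turned into a tiny ID3-style learner. Everything here — the function names, the toy data, the stopping threshold — is illustrative, not the slides' actual implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after

def build_tree(rows, labels, attrs, threshold=0.0):
    """Grow the tree greedily; stop when no split beats the threshold."""
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs or len(set(labels)) == 1:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    if information_gain(rows, labels, best) <= threshold:
        return majority  # no significant improvement possible
    branches = {}
    for value in {row[best] for row in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        srows, slabels = map(list, zip(*sub))
        branches[value] = build_tree(srows, slabels, attrs - {best}, threshold)
    return (best, branches)

def classify(tree, row):
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

# Toy instances in the spirit of data set 1 (the data itself is made up):
rows = [{"smoker": s, "drinker": d} for s, d in
        [("yes", "no"), ("yes", "yes"), ("no", "yes"), ("no", "no")]]
labels = ["young", "young", "old", "young"]
tree = build_tree(rows, labels, {"smoker", "drinker"})
assert classify(tree, {"smoker": "no", "drinker": "yes"}) == "old"
assert classify(tree, {"smoker": "yes", "drinker": "no"}) == "young"
```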
Many parameters/variations are possible
The minimal size of a node before or after splitting.
[Illustrated on the graduation tree from data set 2: with a minimal node size, small subtrees are not expanded further.]
Many parameters/variations are possible
Alternatives to entropy, e.g., the Gini index of diversity.
Splitting the domain of a numerical variable.
[Illustrated on the graduation tree: numerical grades such as logic or linear programming can be split at different cut-points (e.g., ≥8, ≥7, or ≥6).]
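For reference, the Gini index mentioned above behaves like entropy: 0 for a pure subset and maximal for an even mix. A minimal sketch (function name is mine):

```python
def gini(counts):
    """Gini index of diversity: 1 minus the sum of squared class fractions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Like entropy, Gini is 0 for a pure subset and maximal for an even mix:
assert gini([2, 0]) == 0.0
assert gini([3, 3]) == 0.5
assert gini([1, 3]) == 0.375
```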
Example applications in process mining
- What is driving these decisions? (e.g., why does a case take the branch get support from local manager (b)?)
- What is the most likely path of a running case given its data attributes?
[Process model of the travel-request example: register travel request (a), get support from local manager (b), get detailed motivation letter (c), check budget by finance (d), decide (e), reinitiate request (f), accept request (g), reject request (h).]
Book context: Part III: From Event Logs to Process Models (Chapter 5: Getting the Data; Chapter 6: Process Discovery: An Introduction; Chapter 7: Advanced Process Discovery Techniques) and Part IV: Beyond Process Discovery (Chapter 8: Conformance Checking; Chapter 9: Mining Additional Perspectives; Chapter 10: Operational Support).