2 questions:
• Is an object A or B? → binary classification: 1 output variable that can take on two values
• Is an object A, B, C, …, N? → multi-class classification: N output variables, each representing a class
Remember…
[Figure: logistic regression drawn as a single unit. Inputs x1…x4 with weights β1…β4, plus a bias β0, feed 1 binary output y1: is the output y1 or not?]
Logistic Regression with
SoftMax
[Figure: inputs x1, x2, x3 with weights w1,1, w1,2, w1,3, … and biases β1, β2, β3 feed a 3-class output: y1 ("Is it class y1?"), y2 ("Is it class y2?"), …]
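The softmax layer sketched in the figure can be illustrated in Python (a minimal sketch assuming NumPy; the weights and inputs below are made up for illustration, not taken from the slides):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating, for numerical stability
    z = z - np.max(z)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

# Illustrative example: 3 inputs x1..x3, 3 classes (weights are hypothetical)
x = np.array([1.0, 2.0, 0.5])
W = np.array([[0.2, -0.1, 0.4],    # weights for class y1
              [0.1, 0.3, -0.2],    # weights for class y2
              [-0.3, 0.2, 0.1]])   # weights for class y3
b = np.array([0.1, 0.0, -0.1])     # biases (the betas)

probs = softmax(W @ x + b)
print(probs)        # one probability per class, "Is it class y_i?"
print(probs.sum())  # the probabilities sum to 1
```

Each output is a probability between 0 and 1, and the outputs always sum to 1, which is what lets the model answer "is it class y1? y2? y3?" in a single pass.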
• This test is to be done on the test set, which must be kept separate from the training set…
• Remember why?
[Figure: the data are split into a Test Set and a Training Set; the algorithm is trained on one and evaluated on the other.]
Confusion Matrix
[Figure: confusion matrix, with the true label on one axis and the predicted label on the other.]
How to evaluate binary-class models
• By comparing the predicted labels with the true ones, we derive 4 different types of counts:
• TP: true positives, instances predicted as belonging to the positive class, which actually belong to this class
• TN: true negatives, instances predicted as belonging to the negative class, which actually belong to this class
• FP: false positives (or Type I errors), instances predicted as belonging to the positive class, which actually belong to the negative class
• FN: false negatives (or Type II errors), instances predicted as belonging to the negative class, which actually belong to the positive class
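The four counts above can be computed directly from paired label lists, as a minimal sketch (the labels below are a made-up toy example):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

# Toy labels, purely for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```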
Evaluation metrics
• From the counts calculated before, we obtain:
• Accuracy = (TP + TN) / (P + N)
– Be careful: it's a very general metric, which doesn't make any difference between TP and TN
– This is a problem if you are more interested in one of the two classes (as is often the case)
• Precision (Positive predictive value) = TP / (TP + FP)
– Accuracy alone makes no distinction between how good the model is at predicting the positive class (in this case, the fraud) and the negative class (in this case, the non-fraud)
Exercise
Decision Trees
• Decision trees are very broadly used in predictive analytics projects
• Mainly for classification problems
• Based on a flexible technique that makes them work efficiently in various different situations
• The output is very readable, since it's visually represented as a tree-like structure
• 3 types of trees:
• classification trees, for classification problems
• regression trees, used for regression problems
• classification & regression trees, a mix of both, not very often used in real life
Decision Trees
• Algorithms:
• ID3 (Iterative Dichotomiser 3)
• C4.5 (successor of ID3)
• C5.0 (successor of C4.5)
• CART (Classification And Regression Tree)
• CHAID (CHi-squared Automatic Interaction Detector)
• MARS: facilitates the use of numerical variables
• Conditional Inference Trees: based on statistical methods, using non-parametric tests to make splits. Very efficient in minimizing overfitting when tested several times, which means they're generally unbiased and don't require pruning.
ID3 and CART were invented independently between 1970 and 1980, but work in similar ways
Will John play tennis?
Knowledge, or information, and entropy are opposites. Entropy is a proxy of «chaos»: high entropy means lots of chaos in the data.
Entropy / Information
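For reference, the standard definition of the entropy (information) of a set T whose elements belong to classes C1…Ck, which is the quantity Info(T) computed in the example that follows:

```latex
\mathrm{Info}(T) = -\sum_{i=1}^{k} \frac{\mathrm{freq}(C_i, T)}{|T|}
\,\log_2\!\left(\frac{\mathrm{freq}(C_i, T)}{|T|}\right)
```

Info(T) is 0 when all elements belong to one class (a pure set) and reaches its maximum, log2(k), when the classes are equally frequent.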
Example

 1  Married    M  €35,000  YES
 2  Single     F  €47,000  NO
 3  Married    F  €58,000  YES
 4  Single     M  €31,000  NO
 5  Separated  M  €70,000  YES
 6  Married    F  €27,000  NO
 7  Single     F  €36,000  YES
 8  Separated  M  €50,000  NO
 9  Married    F  €65,000  YES
10  Single     M  €18,000  NO
11  Married    M  €40,000  ??

Freq(C_YES) = 5   Freq(C_NO) = 5   Count(T) = 10

Info(T) = -(5/10 · log2(5/10) + 5/10 · log2(5/10)) = 1

We now calculate for the subsets:
• income (2 subsets: <= or > 35,000)
• marital status (3 subsets: married, single, separated)
• gender (2 subsets: M and F)
• We need to find which variable among those included in the model gives the purest splits. The split that maximizes the gain becomes the first level of the tree, below the full set.
• After each split we apply the same reasoning, starting again with all available variables (including the ones we already used). The process is then repeated for each subset that still contains elements belonging to different classes.
• The process stops when the subsets contain elements of only one class, or when further splitting brings no improvement in accuracy.
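The split-selection step above can be sketched in Python on the example dataset, using entropy and the standard ID3-style information gain (the data tuples reproduce the slide's table; the helper names are mine):

```python
from math import log2
from collections import Counter

def entropy(labels):
    # Info(T) = -sum over classes of (freq/|T|) * log2(freq/|T|)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, split):
    """Gain = entropy of the whole set minus the weighted entropy
    of the subsets produced by the `split` function."""
    total = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(split(row), []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return total - remainder

# (marital status, gender, income) -> class, from the example table
data = [("Married", "M", 35000), ("Single", "F", 47000), ("Married", "F", 58000),
        ("Single", "M", 31000), ("Separated", "M", 70000), ("Married", "F", 27000),
        ("Single", "F", 36000), ("Separated", "M", 50000), ("Married", "F", 65000),
        ("Single", "M", 18000)]
labels = ["YES", "NO", "YES", "NO", "YES", "NO", "YES", "NO", "YES", "NO"]

print(entropy(labels))  # 1.0, as computed on the slide (5 YES, 5 NO)
print(info_gain(data, labels, lambda r: r[0]))           # split on marital status
print(info_gain(data, labels, lambda r: r[1]))           # split on gender
print(info_gain(data, labels, lambda r: r[2] <= 35000))  # split on income threshold
```

The variable with the highest gain gives the purest split and becomes the first level of the tree; the same computation is then repeated inside each resulting subset.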
Pruning