
8. Evaluation Methods

Errors and Error Rates


Precision and Recall
Similarity
Cross Validation
Various Presentations of Evaluation Results
Statistical Tests



How to evaluate/estimate error
Resubstitution
the same data set is used for both training and testing
Holdout (training and testing)
2/3 for training, 1/3 for testing
Leave-one-out
If a data set is small
Cross validation
10-fold, why 10?
m × 10-fold CV: repeat 10-fold CV m times (see the sketch below)
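A minimal sketch in Python of these estimates, assuming scikit-learn; the data set and classifier are illustrative placeholders, not part of the slides:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Resubstitution: the same data for training and testing (optimistic estimate)
resub_err = 1 - clf.fit(X, y).score(X, y)

# Holdout: 2/3 for training, 1/3 for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
holdout_err = 1 - clf.fit(X_tr, y_tr).score(X_te, y_te)

# Leave-one-out (when the data set is small)
loo_err = 1 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

# 10-fold cross validation
cv10_err = 1 - cross_val_score(clf, X, y, cv=10).mean()

print(resub_err, holdout_err, loo_err, cv10_err)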



Error and Error Rate
Mean and Median
mean = (1/n) Σ_i x_i
weighted mean = (Σ_i w_i x_i) / (Σ_i w_i)
median = x_{(n+1)/2} if n is odd, else (x_{n/2} + x_{n/2+1}) / 2
Error: disagreement between the actual y and the predicted ŷ
1 if they disagree, 0 otherwise (0-1 loss l_01)
Other definitions depend on the output of the predictor, such as
the quadratic loss l_2 and the absolute loss (sketched below)
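A small sketch of the three per-instance losses in plain Python (the function names are illustrative):

# Per-instance losses between the actual y and the predicted y_hat (sketch).
def zero_one_loss(y, y_hat):      # l_01: 1 if they disagree, 0 otherwise
    return 0 if y == y_hat else 1

def quadratic_loss(y, y_hat):     # l_2: squared difference, for numeric outputs
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):      # absolute difference, for numeric outputs
    return abs(y - y_hat)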



Error estimation
Error rate e = #Errors/N, where N is the total number
of instances
Accuracy A = 1 - e
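A minimal sketch of both quantities over a list of predictions (the numbers are made up):

# Error rate e = #errors / N and accuracy A = 1 - e (sketch).
def error_rate(y_true, y_pred):
    n_errors = sum(1 for y, y_hat in zip(y_true, y_pred) if y != y_hat)
    return n_errors / len(y_true)

e = error_rate([1, 0, 1, 1], [1, 1, 1, 0])   # 2 errors out of 4 -> e = 0.5
A = 1 - e                                    # accuracy A = 0.5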



Precision and Recall
Confusion matrix (rows: observed class, columns: predicted class)
                 Predicted +ve   Predicted -ve
Observed +ve          TP              FN
Observed -ve          FP              TN
False negative and false positive
Number of error types for k classes = k² - k
k = 3: 3*3 - 3 = 6; k = 2: 2*2 - 2 = 2
Precision (wrt the retrieved)
P = TP/(TP+FP)
Recall (wrt the total relevant)
R = TP/(TP+FN)
Precision × Recall (PR) and PR gain
PR gain = (PR - PR0)/PR0
Accuracy
A = (TP+TN)/(TP+TN+FP+FN)
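A sketch computing the measures directly from the four confusion-matrix counts (the counts are made up):

# Precision, recall and accuracy from confusion-matrix counts (illustrative numbers).
TP, FN, FP, TN = 40, 10, 5, 45

P = TP / (TP + FP)                      # precision, w.r.t. the retrieved
R = TP / (TP + FN)                      # recall, w.r.t. the total relevant
A = (TP + TN) / (TP + TN + FP + FN)     # accuracy
print(P, R, A)                          # 0.888..., 0.8, 0.85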



Similarity or Dissimilarity Measures
Distance (dissimilarity) measures (Triangle Inequality)
Euclidean
City-block, or Manhattan
Cosine(p_i, p_j) = Σ_k (p_ik p_jk) / [sqrt(Σ_k p_ik²) · sqrt(Σ_k p_jk²)]
Inter-cluster and intra-cluster distances
Single linkage vs. complete linkage
D_min = min |p_i - p_j|, over pairs of data points from the two clusters
D_max = max |p_i - p_j|
Centroid methods
D_avg = (1/(n_i n_j)) Σ Σ |p_i - p_j|
D_mean = |m_i - m_j|, the distance between the two cluster means
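A sketch of the point-to-point measures and the inter-cluster linkages with NumPy; the function names, and the assumption that a cluster is a 2-D array whose rows are points, are illustrative:

import numpy as np

def euclidean(p, q):  return np.sqrt(np.sum((p - q) ** 2))
def manhattan(p, q):  return np.sum(np.abs(p - q))            # city-block
def cosine_sim(p, q): return np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

# Inter-cluster distances between clusters A and B (rows are points)
def d_min(A, B):  return min(euclidean(p, q) for p in A for q in B)   # single linkage
def d_max(A, B):  return max(euclidean(p, q) for p in A for q in B)   # complete linkage
def d_avg(A, B):  return sum(euclidean(p, q) for p in A for q in B) / (len(A) * len(B))
def d_mean(A, B): return euclidean(A.mean(axis=0), B.mean(axis=0))    # centroid distance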



k-Fold Cross Validation
Cross validation
1 fold for testing, the rest for training
rotate until every fold has been used for testing
calculate the average over the folds
(Figure: the data set split into Fold 1, Fold 2, Fold 3)
m × k-fold cross validation
reshuffle the data, repeat the cross validation m times
what is a suitable k?
Model complexity
use of cross validation (see the sketch below)
tree complexity, training/testing error rates
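One possible sketch of relating tree complexity to training and testing error via cross validation, assuming scikit-learn; the data set and the depth values are illustrative choices:

# Sketch: tree complexity (max_depth) vs. training and cross-validated error.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for depth in (1, 2, 4, 8, 16):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_err = 1 - clf.fit(X, y).score(X, y)               # resubstitution (training) error
    cv_err = 1 - cross_val_score(clf, X, y, cv=10).mean()   # 10-fold (testing) error
    print(depth, round(train_err, 3), round(cv_err, 3))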



Presentations of Evaluation Results
Results are usually about time, space, trend, average case
Learning (happy) curves
Accuracy increases over X
Its opposite (or error) decreases over X
Box-plot
Whiskers (min, max)
Box: confidence interval
Graphical equivalent of t-test (see the sketch below)
(Figure: a box-plot marking min, mean, and max)
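A sketch of such a box-plot with matplotlib; the accuracy samples and method names are made up:

# Box-plot of accuracy samples from two methods (illustrative numbers).
import matplotlib.pyplot as plt

acc_a = [0.81, 0.84, 0.82, 0.86, 0.83, 0.85, 0.80, 0.87, 0.84, 0.82]
acc_b = [0.78, 0.80, 0.77, 0.82, 0.79, 0.81, 0.76, 0.83, 0.80, 0.78]

plt.boxplot([acc_a, acc_b], showmeans=True)   # whiskers show min/max spread
plt.xticks([1, 2], ["Method A", "Method B"])
plt.ylabel("Accuracy")
plt.show()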



Statistical Tests
Null hypothesis and alternative hypothesis
Type I and Type II errors
Student's t test: comparing two means
Paired t test: comparing the means of paired observations
Chi-Square test
Contingency table



Null Hypothesis
Null hypothesis (H0)
No difference between the hypothesized value and the actual
value of the population parameter
E.g., H0: μ = μ0
Alternative hypothesis (H1)
It specifies the parameter value(s) to be accepted if
H0 is rejected.
E.g., H1: μ ≠ μ0 (two-tailed test)
Or H1: μ > μ0 (one-tailed test)



Type I, II errors
Type I errors (α)
Rejecting a null hypothesis when it is true (a false positive)
Type II errors (β)
Accepting a null hypothesis when it is false (a false negative)
Power = 1 - β (see the simulation sketch below)
Costs of different errors
A life-saving medicine appears to be effective; it is
cheap and has no side effects (H0: non-effective)
Type I error: we conclude it is effective when it is not; not costly
Type II error: we conclude it is non-effective when it actually is; very costly
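A simulation sketch with NumPy/SciPy that estimates the two error rates and the power for a one-sample t test; all settings (α, sample size, effect size, number of trials) are made up:

# Estimate Type I / Type II error rates of a one-sample t test by simulation (sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 20, 2000

# H0 true (mean really 0): fraction of rejections estimates the Type I rate (alpha)
type1 = np.mean([stats.ttest_1samp(rng.normal(0.0, 1, n), 0).pvalue < alpha
                 for _ in range(trials)])

# H0 false (mean is 0.5): fraction of non-rejections estimates the Type II rate (beta)
type2 = np.mean([stats.ttest_1samp(rng.normal(0.5, 1, n), 0).pvalue >= alpha
                 for _ in range(trials)])

print(type1, type2, 1 - type2)   # alpha-hat, beta-hat, power = 1 - beta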



Test using Student's t Distribution
Using the t distribution to test the difference between
two population means is appropriate if
The population standard deviations are not known
The samples are small (n < 30)
The populations are assumed to be approx. normal
The two unknown standard deviations are assumed equal: σ1 = σ2
H0: (μ1 - μ2) = 0, H1: (μ1 - μ2) ≠ 0
Check the difference of the estimated means normalized by
its estimated (pooled) standard error
Degrees of freedom and the p level of significance
df = n1 + n2 - 2
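A sketch of the unpaired test with SciPy; the samples are made up, and equal variances are assumed to match the slide's condition σ1 = σ2:

# Two-sample (unpaired) t test assuming equal variances, df = n1 + n2 - 2 (sketch).
from scipy import stats

sample1 = [23.1, 22.8, 24.0, 23.5, 22.9, 23.7]
sample2 = [22.0, 21.8, 22.5, 22.1, 21.6, 22.3]

t, p = stats.ttest_ind(sample1, sample2, equal_var=True)   # two-tailed p value
df = len(sample1) + len(sample2) - 2                       # 10 degrees of freedom
print(t, p, df)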
Paired t test
With paired observations, use paired t test
Now H0: μ_d = 0 and H1: μ_d ≠ 0
Check the estimated mean of the differences
The t statistics in the previous and the paired cases are
calculated differently.
Both are two-tailed tests; p = 1% means 0.5% on each side
Excel can do that for you!
(Figure: two-tailed rejection regions of α/2 in each tail, on either side of 0)
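The paired version, again as a sketch with SciPy; the before/after measurements on the same items are made up:

# Paired t test on the per-item differences, H0: mean difference = 0 (sketch).
from scipy import stats

before = [10.2, 9.8, 11.0, 10.5, 9.9, 10.7]
after  = [ 9.9, 9.6, 10.4, 10.1, 9.7, 10.2]

t, p = stats.ttest_rel(before, after)   # two-tailed p value, df = n - 1
print(t, p)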
Chi-Square Test (goodness of fit)
Testing a null hypothesis that the population distribution for
a random variable follows a specified form.
The chi-square statistic is calculated as
χ² = Σ_i Σ_j (A_ij - E_ij)² / E_ij,  A_ij = observed count, E_ij = expected count
Contingency table:
           Col 1   Col 2   Total
I-1         A11     A12     R1
I-2         A21     A22     R2
Total       C1      C2      N
Degrees of freedom df = k - m - 1
k = number of data categories
m = number of parameters estimated
(0 for uniform, 1 for Poisson, 2 for normal)
Each expected cell count should be at least 5
One-tailed test
(Figure: one-tailed rejection region of the chi-square distribution)
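A goodness-of-fit sketch with SciPy, testing against a uniform distribution so that m = 0 and df = k - 1; the observed counts are made up:

# Chi-square goodness-of-fit test against a uniform distribution (sketch).
from scipy import stats

observed = [18, 22, 25, 15, 20]                 # k = 5 categories
expected = [sum(observed) / len(observed)] * 5  # uniform: m = 0, df = k - 0 - 1 = 4

chi2, p = stats.chisquare(observed, f_exp=expected)   # one-tailed p value
print(chi2, p)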




