Lecture 2
Instructor: Max Welling
Evaluation of Results
How do you report classification error?
Questions:
What is the error of h on unseen data?
If we have two competing hypotheses, which one is better on unseen data?
How do we compare two learning algorithms in the face of limited data?
1) error(h|S) is the sample error on the observed sample S = {(x_i, y_i), i=1..n}:

$\mathrm{error}(h \mid S) \;=\; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\!\left[ h(x_i) \neq y_i \right]$
2) error(h|P) is the true error on unseen data sampled from the distribution P(x):

$\mathrm{error}(h \mid P) \;=\; \int dx\, P(x)\, \mathbf{1}\!\left[ h(x) \neq f(x) \right]$
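As a minimal sketch (not from the lecture), the sample error is just the misclassification rate on the observed data; h, X, and y below are hypothetical stand-ins for a classifier and a labelled sample:

```python
import numpy as np

def sample_error(h, X, y):
    """error(h|S): fraction of the labelled sample S = (X, y) that h misclassifies."""
    predictions = np.array([h(x) for x in X])
    return float(np.mean(predictions != np.asarray(y)))
```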
The number of heads r in n flips of a coin with heads-probability p follows a binomial distribution:

$p(\#\mathrm{heads} = r \mid p, n) \;=\; \frac{n!}{r!\,(n-r)!}\; p^r (1-p)^{n-r}$
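As an illustrative check (all numbers are arbitrary), scipy's binomial pmf evaluates this formula directly:

```python
from scipy.stats import binom

# probability of r = 3 misclassifications ("heads") in n = 40 test points
# when the true error is p = 0.1; the numbers are arbitrary illustrations
print(binom.pmf(3, 40, 0.1))
```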
Distribution over Errors
Consider some hypothesis h(x). Draw a sample of n points from P(x) and count the number r of points on which h makes a mistake. Do this k times.
Why? Imagine a magic coin, where God secretly determines the probability of heads
by the following procedure. First He takes some random hypothesis h.
Then, He draws x~P(x) and observes whether h(x) predicts the label correctly.
If it does, He makes sure the coin lands heads up...
$\mathrm{mean}(r) \;=\; E[r \mid n, p] \;=\; np$

$\mathrm{var}(r) \;=\; E\!\left[(r - E[r])^2\right] \;=\; np\,(1-p)$
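A quick numerical check of these two formulas (a sketch, with n and p picked arbitrarily):

```python
from scipy.stats import binom

n, p = 100, 0.15                    # arbitrary sample size and true error
r = binom(n, p)                     # distribution of the number of mistakes r
print(r.mean(), n * p)              # E[r] = np
print(r.var(), n * p * (1 - p))     # var(r) = np(1 - p)
```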
If we match the mean, np, with the observed value n*error(h|S) we find:
$E[\mathrm{error}(h \mid P)] \;=\; E[r/n] \;=\; p \;\approx\; \mathrm{error}(h \mid S)$
$\mathrm{error}(h \mid P) \;\approx\; \mathrm{error}(h \mid S) \;\pm\; z_N \sqrt{\frac{\mathrm{error}(h \mid S)\,\left(1 - \mathrm{error}(h \mid S)\right)}{n}}$

where z_N is the two-sided critical value of the Normal(0,1) distribution (e.g. z_N = 1.28 for N = 80%).
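A minimal sketch of this interval in Python (the function name and the example numbers are made up for illustration):

```python
import numpy as np

def error_confidence_interval(err_s, n, z_n=1.96):
    """Approximate confidence interval for error(h|P), given the test error
    err_s = error(h|S) measured on n IID test points; z_n is the two-sided
    normal critical value (1.96 for 95%, 1.28 for 80%)."""
    half_width = z_n * np.sqrt(err_s * (1.0 - err_s) / n)
    return err_s - half_width, err_s + half_width

print(error_confidence_interval(0.15, 200))   # hypothetical: 15% error on 200 test points
```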
Imagine again you have infinitely many sample sets X1,X2,... of size n.
Find an unbiased estimator E for this quantity based on observing data X (e.g. error(h|X))
Determine the distribution P(E) of E under the assumption you have infinitely
many sample sets X1,X2,... of some size n. (e.g. p(E)=Binomial(p,n), p=error(h|P))
Estimate the parameters of P(E) from an actual data sample S (e.g. p=error(h|S))
Compute the mean and variance of P(E) and pray that P(E) is close to a Normal distribution
(sums of random variables converge to normal distributions: the central limit theorem).
State your confidence interval as: with confidence N%, error(h|P) is contained in the interval
$\mathrm{mean} \;\pm\; z_N \sqrt{\mathrm{var}}$
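The critical value z_N does not have to come from a table; a small sketch for looking it up with scipy (assuming a two-sided interval):

```python
from scipy.stats import norm

def z_value(confidence):
    """Two-sided critical value z_N of the Normal(0,1) distribution."""
    return norm.ppf(0.5 + confidence / 2.0)

print(z_value(0.80))   # about 1.28
print(z_value(0.95))   # about 1.96
```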
Assumptions
Training data and test data are drawn IID from the same distribution P(x).
(IID: independently & identically distributed)
When you obtain a hypothesis from a learning algorithm, split the data
into a training set and a testing set. Find the hypothesis using the training set
and estimate error on the testing set.
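A sketch of such a split (assuming X and y are numpy arrays; the 30% test fraction is an arbitrary choice):

```python
import numpy as np

def split_data(X, y, test_fraction=0.3, seed=0):
    """Randomly split a labelled data set so that the error is estimated on
    points the learning algorithm never saw during training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(test_fraction * len(X))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]
```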
Comparing Hypotheses
Assume we would like to compare two hypotheses h1 and h2, which we have
tested on two independent samples S1 and S2 of size n1 and n2. The natural estimator
of the true difference in error is d = error(h1|S1) - error(h2|S2), with confidence interval

$d \;\in\; \mathrm{mean}(d) \;\pm\; z_N \sqrt{\mathrm{var}(d)}$
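A sketch of this interval; because S1 and S2 are independent, the variance of d is the sum of the two error variances (the function name is hypothetical):

```python
import numpy as np

def difference_interval(err1, n1, err2, n2, z_n=1.96):
    """Confidence interval for the true difference error(h1|P) - error(h2|P),
    estimated from independent test samples of sizes n1 and n2."""
    d = err1 - err2
    var_d = err1 * (1 - err1) / n1 + err2 * (1 - err2) / n2
    return d - z_n * np.sqrt(var_d), d + z_n * np.sqrt(var_d)
```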
Paired Tests
Consider the following data:
error(h1|s1)=0.1 error(h2|s1)=0.11
error(h1|s2)=0.2 error(h2|s2)=0.21
error(h1|s3)=0.66 error(h2|s3)=0.67
error(h1|s4)=0.45 error(h2|s4)=0.46
and so on.
You can use a paired t-test (e.g. in MATLAB) to see if the two errors
are significantly different, or if one error is significantly larger than the other.
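The same test in Python instead of MATLAB; the error values below are hypothetical placeholders for a longer list of paired measurements:

```python
from scipy.stats import ttest_rel

# hypothetical paired errors of h1 and h2, measured on the same test subsets
err_h1 = [0.10, 0.20, 0.66, 0.45, 0.33]
err_h2 = [0.11, 0.21, 0.67, 0.46, 0.32]

t_stat, p_value = ttest_rel(err_h1, err_h2)
print(t_stat, p_value)   # a small p-value means the paired errors differ significantly
```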
Paired t-test
Chunk the data up into subsets T1,...,Tk with |Ti| > 30.
On each subset compute the paired difference of errors:

$\delta_i \;=\; \mathrm{error}(h_1 \mid T_i) - \mathrm{error}(h_2 \mid T_i)$

Now compute:

$\bar{\delta} \;=\; \frac{1}{k} \sum_{i=1}^{k} \delta_i$

$s_{\bar{\delta}} \;=\; \sqrt{\frac{1}{k(k-1)} \sum_{i=1}^{k} \left(\delta_i - \bar{\delta}\right)^2}$

The N% confidence interval for the true difference is $\bar{\delta} \;\pm\; t_{N,k-1}\, s_{\bar{\delta}}$.
Here t_{N,k-1} is the critical value of the Student-t distribution with k-1 degrees of freedom (table 5.6).
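A sketch that implements the formulas above directly, using scipy only for the Student-t critical value (err1 and err2 are hypothetical lists of per-chunk errors):

```python
import numpy as np
from scipy.stats import t

def paired_interval(err1, err2, confidence=0.95):
    """Confidence interval for the mean paired difference delta_i,
    using the Student-t critical value with k-1 degrees of freedom."""
    deltas = np.asarray(err1) - np.asarray(err2)
    k = len(deltas)
    d_bar = deltas.mean()
    s = np.sqrt(np.sum((deltas - d_bar) ** 2) / (k * (k - 1)))
    t_crit = t.ppf(0.5 + confidence / 2.0, df=k - 1)
    return d_bar - t_crit * s, d_bar + t_crit * s
```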
Comparing Learning Algorithms
In general it is a really bad idea to estimate error rates on the same data
on which a learning algorithm is trained. WHY?
Train both learning algorithm 1 (L1) and learning algorithm 2 (L2) on the complement
of each subset, {S-T1, S-T2, ...}, to produce hypotheses {L1(S-Ti), L2(S-Ti)} for all i.
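A sketch of that procedure (L1 and L2 are assumed to be functions that take training data and return a predictor; splitting into k=10 chunks is an arbitrary choice):

```python
import numpy as np

def paired_deltas(L1, L2, X, y, k=10, seed=0):
    """Train both learners on S - T_i and record the difference of their
    errors on T_i, for each of the k chunks T_i."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    deltas = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        h1 = L1(X[train_idx], y[train_idx])
        h2 = L2(X[train_idx], y[train_idx])
        e1 = np.mean(h1(X[test_idx]) != y[test_idx])
        e2 = np.mean(h2(X[test_idx]) != y[test_idx])
        deltas.append(e1 - e2)
    return np.array(deltas)   # feed these into the paired t-test above
```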
FN = false negatives = (# positives classified as negative) / (# positives)
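A minimal sketch of that definition (y_true and y_pred are hypothetical label arrays):

```python
import numpy as np

def false_negative_rate(y_true, y_pred, positive=1):
    """Fraction of true positives that the classifier labelled as negative."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positives = (y_true == positive)
    return np.sum(positives & (y_pred != positive)) / np.sum(positives)
```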
Conclusion