CS761 Spring 2015 Homework 2

Assigned Feb. 11, Due Feb. 18, 2015 before class


Instructions:
Homeworks are to be done individually.
Typeset your homework in LaTeX using this file as a template (e.g. use pdflatex). We do not accept hand-written homeworks. Show your derivations.
Hand in a single-sided printed copy of your homework before class. Homework will no longer be accepted once the lecture starts.
Unless explicitly specified in the questions, you do not need to hand in
any code.
Fill in your name and email below. This will produce a page separated
from the rest of your homework. We will do double blind review. Do
not include identifying information (e.g. your name) in the rest of your
homework.
Name: Rahul Chatterjee
Email: rchat@cs.wisc.edu

1. Let $\mathcal{X} = \mathbb{Z}$ be the set of integers. Let $\mathcal{H} = \{h_z : z \in \mathcal{X}\} \cup \{h_0\}$, where for
each $z \in \mathcal{X}$, $h_z(x) = 1$ if $x = z$ and $h_z(x) = 0$ if $x \neq z$. $h_0$ is the hypothesis
that classifies everything 0. We make the realizable assumption, namely
the true hypothesis f classifies everything 0, except perhaps one item. For
this problem, do not use a VC argument.
(a) Define an algorithm that implements ERM.
(b) Prove that H is PAC learnable. Give an upper bound on the sample
complexity.
Ans.
(a) ERM is simple here. Given a labeled sample $S$ drawn i.i.d. from the distribution $\mathcal{D}$ over $\mathcal{X}$,
if $S$ contains a point $z$ with label $f(z) = 1$, output $h_z$; otherwise output $h_0$.
Under the realizable assumption at most one point is labeled 1, so the output
has zero training error and is therefore an ERM.
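The rule in (a) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original answer: the names `erm` and `predict` are hypothetical, and `None` is used here to stand for $h_0$ so that it is not confused with the singleton hypothesis at $z = 0$.

```python
from typing import Iterable, Optional, Tuple


def erm(sample: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return z if the sample contains a positive example (z, 1).

    Returning None stands for h_0, the all-zero hypothesis
    (an encoding chosen for this sketch).
    """
    for x, y in sample:
        if y == 1:
            return x
    return None


def predict(hypothesis: Optional[int], x: int) -> int:
    """Evaluate the hypothesis returned by erm() at the point x."""
    return 1 if hypothesis is not None and x == hypothesis else 0


# Usage: a sample that contains the positive point 7.
sample = [(3, 0), (7, 1), (2, 0)]
h = erm(sample)
training_errors = sum(predict(h, x) != y for x, y in sample)
```

On a realizable sample the returned hypothesis agrees with every labeled point, so `training_errors` is zero.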
(b) $\mathcal{H}$ is PAC learnable if for every $\epsilon, \delta \in (0, 1)$ there is a sample size
$n(\epsilon, \delta)$ such that, for any sample of at least that size, the hypothesis
returned by the learner has generalization error at most $\epsilon$ with probability
at least $1 - \delta$. Let $z^*$ be the point where $f$ is 1 (if there is one); $f$ is 0 everywhere else.
The ERM rule can only make a mistake if it does not see $z^*$ in its
sample $S$. In that case it outputs $h_0$, which errs only on $z^*$, so its error is
$L_{\mathcal{D}}(h_0) = \mathcal{D}(\{z^*\})$. It never gives a false positive, because it
outputs some $h_z$ only after seeing $z$ with a positive label.
What remains is to bound the probability that a sample of
size $n$ does not contain $z^*$. TODO
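A sketch of how the remaining step could be completed, using a standard argument rather than anything stated in the original answer. The notation is assumed here: $z^*$ is the one point $f$ may label 1, $p = \mathcal{D}(\{z^*\})$ is its probability mass, $h_S$ is the ERM output on a sample $S$, and $n$ is the sample size.

```latex
% Assumed notation: p = D({z^*}), h_S = ERM output on a sample S of size n.
% If z^* appears in S, ERM returns h_{z^*} and the error is 0.
% Otherwise ERM returns h_0, which errs only on z^*, so the error is p.
% Hence the error can exceed \epsilon only when p > \epsilon and z^* \notin S:
\[
\Pr_{S \sim \mathcal{D}^n}\bigl[ L_{\mathcal{D}}(h_S) > \epsilon \bigr]
  \le (1 - p)^n \, \mathbf{1}[p > \epsilon]
  \le (1 - \epsilon)^n
  \le e^{-\epsilon n}.
\]
% Requiring e^{-\epsilon n} \le \delta gives a sufficient sample size:
\[
n(\epsilon, \delta) \ge \frac{1}{\epsilon} \ln \frac{1}{\delta}.
\]
```

If $f$ labels no point 1, ERM always returns $h_0$ and the error is zero, so the bound holds trivially in that case.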
2. (a) Find a class $\mathcal{H}$ of binary-valued functions over the real interval $\mathcal{X} = [0, 1]$
such that $\mathcal{H}$ is infinite and $\mathrm{VCdim}(\mathcal{H}) = 1$.
(b) Find a class $\mathcal{H}$ of binary-valued functions over the real interval $\mathcal{X} = [0, 1]$
such that $\mathcal{H}$ is finite and $\mathrm{VCdim}(\mathcal{H}) = \log_2(|\mathcal{H}|)$.
Ans.
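(a) $\mathcal{H} = \{h_\theta : \theta \in \mathcal{X}\}$, where $h_\theta(x) = 1$ if $x \ge \theta$, 0 otherwise.
$\mathcal{H}$ shatters a single point (e.g. $x = 1/2$), but for two points $x_1 < x_2$ no threshold
labels $x_1$ as 1 and $x_2$ as 0, so no pair is shattered and the VC dimension is 1.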
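The claim in (a) can be sanity-checked numerically. The sketch below assumes the thresholds have the form $h_\theta(x) = 1$ if $x \ge \theta$ (the direction of the inequality is an assumption) and replaces the infinite class with a finite grid of thresholds; the function name `threshold_labelings` and the grid are illustrative choices only.

```python
def threshold_labelings(points, thetas):
    """Label patterns that h_theta(x) = 1[x >= theta] induces on `points`."""
    return {tuple(1 if x >= t else 0 for x in points) for t in thetas}


# A finite grid of thresholds on [0, 1], standing in for the infinite class.
grid = [i / 100 for i in range(101)]

one_point = threshold_labelings([0.5], grid)        # both labels appear
two_points = threshold_labelings([0.3, 0.7], grid)  # the pattern (1, 0) never appears
```

A single point receives both labels, while a pair with $x_1 < x_2$ never receives the pattern (1, 0), which matches a VC dimension of 1.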
(b) Let $\mathcal{H}$ be the class of functions that realize all possible binary
labelings of $n$ fixed points. More precisely, fix distinct points
$X = \{x_1, x_2, \ldots, x_n\} \subset \mathcal{X}$ and let $\mathcal{H} = \{h : X \to \{0, 1\}\}$. Each $h$ extends
to all of $\mathcal{X}$ by setting its value to zero
outside $X$. Clearly, $|\mathcal{H}| = 2^n$. The VC dimension is $n$: $\mathcal{H}$ shatters the
$n$ points of $X$, and any set of $n + 1$ points contains a point outside $X$ on which
every $h$ is 0, so the all-ones labeling cannot be realized.
Hence $\mathrm{VCdim}(\mathcal{H}) = n = \log_2(|\mathcal{H}|)$.
3. Let $\mathcal{X}$ be a finite domain. Let $1 \le k \le |\mathcal{X}|$ be a fixed integer. Find
the VC dimension of the following hypothesis spaces of binary classifiers
$h : \mathcal{X} \to \{0, 1\}$. Prove your claim.
(a) $\mathcal{H} = \{h : \sum_{x \in \mathcal{X}} h(x) = k\}$, i.e. all hypotheses that classify exactly $k$
items positive.
(b) $\mathcal{H} = \{h : \sum_{x \in \mathcal{X}} h(x) \le k\}$, i.e. all hypotheses that classify at most
$k$ items positive.
Ans.
(a) $\mathcal{H}$ is the subset of all the $2^{|\mathcal{X}|}$ binary mappings from $\mathcal{X}$ to $\{0, 1\}$
such that $\sum_{x \in \mathcal{X}} h(x) = k$. It is easy to prove that $\mathrm{VCdim}(\mathcal{H}) \le k$:
if we label $k + 1$ points $x_1, x_2, \ldots, x_{k+1}$ all 1, no $h \in \mathcal{H}$ realizes that labeling.
If $|\mathcal{X}| \ge 2k$ then $\mathrm{VCdim}(\mathcal{H}) = k$. For any labeling of $k$ points $x_i$ with $j$ ones,
some $h$ realizes it: at least $k$ points lie outside the chosen set, so $h$ can place
its remaining $k - j$ ones there to satisfy the sum constraint.
If $|\mathcal{X}| < 2k$ then $\mathcal{H}$ can shatter only $|\mathcal{X}| - k$ points. A larger set labeled
all 0 leaves fewer than $k$ points outside it, so no $h \in \mathcal{H}$ realizes that
labeling; the same counting as before shows that any set of $|\mathcal{X}| - k$ points is
shattered. In summary, $\mathrm{VCdim}(\mathcal{H}) = \min(k, |\mathcal{X}| - k)$.
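For small domains, a claimed VC dimension can be checked by brute force. The sketch below is an illustration and not part of the original answer: it enumerates both classes from question 3 over a domain of `m` points and searches for the largest shattered subset. The function names `vc_dimension`, `exactly_k` and `at_most_k` are hypothetical, and the search is exponential in `m`, so it is only practical for tiny domains.

```python
from itertools import combinations, product


def vc_dimension(m, hypotheses):
    """Largest d such that some d-subset of {0, ..., m-1} is shattered.

    Each hypothesis is a tuple of m labels in {0, 1}.
    """
    best = 0
    for d in range(1, m + 1):
        shattered = any(
            len({tuple(h[i] for i in subset) for h in hypotheses}) == 2 ** d
            for subset in combinations(range(m), d)
        )
        if not shattered:
            break  # subsets of shattered sets are shattered, so larger d fails too
        best = d
    return best


def exactly_k(m, k):
    """Class (a): hypotheses with exactly k positives on m points."""
    return [h for h in product([0, 1], repeat=m) if sum(h) == k]


def at_most_k(m, k):
    """Class (b): hypotheses with at most k positives on m points."""
    return [h for h in product([0, 1], repeat=m) if sum(h) <= k]


# Usage: tabulate class (a) for a 5-point domain.
table = {k: vc_dimension(5, exactly_k(5, k)) for k in range(1, 6)}
```

Running the same check with `at_most_k` gives a way to explore part (b), which the answer above does not cover.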
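4. Let
$h_\theta(x) = \mathrm{sgn}(\sin(\theta x))$
where $\mathrm{sgn}(z) = 1$ if $z \ge 0$, and 0 otherwise. Consider
$\mathcal{H} = \{h_\theta : \theta \in \mathbb{R}\}$.
Let $x_i = 2^{-i}$ for $i = 1 \ldots n$. Prove that, for any $y_1, \ldots, y_n \in \{0, 1\}$, $\exists \theta \in \mathbb{R}$
such that $h_\theta(x_i) = y_i$ for $i = 1 \ldots n$.
You have just shown that $\mathrm{VCdim}(\mathcal{H}) = \infty$, even though $\mathcal{H}$ is parametrized
by a single parameter $\theta$.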
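The statement in question 4 can be checked numerically for small $n$. The sketch below is not a proof and makes two assumptions that are not in the source: the points are taken to be $x_i = 2^{-i}$, and the parameter is chosen as $\theta = \pi \bigl(1 + \sum_{i=1}^{n} (1 - y_i) 2^i\bigr)$, one standard construction. The names `h`, `theta_for` and `realizes_all` are hypothetical.

```python
import itertools
import math


def h(theta, x):
    """h_theta(x) = sgn(sin(theta * x)), with sgn(z) = 1 if z >= 0 else 0."""
    return 1 if math.sin(theta * x) >= 0 else 0


def theta_for(labels):
    """Assumed construction: theta = pi * (1 + sum_i (1 - y_i) * 2**i)."""
    return math.pi * (1 + sum((1 - y) * 2 ** i for i, y in enumerate(labels, start=1)))


def realizes_all(n):
    """Check that every labeling of x_i = 2**-i, i = 1..n, is realized."""
    xs = [2.0 ** -i for i in range(1, n + 1)]
    for labels in itertools.product([0, 1], repeat=n):
        theta = theta_for(labels)
        if [h(theta, x) for x in xs] != list(labels):
            return False
    return True


# Usage: check all 2**6 labelings of the first six points.
ok = realizes_all(6)
```

The check relies on floating-point evaluation of the sine, so it is only meaningful for small `n`, where each angle stays well away from a multiple of $\pi$.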
