Problem 1: (30 pts) The perceptron algorithm is “incremental,” in the sense that small changes are made to the weight vector in response to each labeled example in turn. For any learning rate η > 0, the algorithm acts sequentially as shown below. Suppose that there exists a ρ > 0, a weight vector w∗ satisfying ‖w∗‖ = 1, and a bias b∗ such that yi (w∗T xi + b∗ ) ≥ ρ for every training example (xi , yi ), and that ‖xi ‖ ≤ R for all i. Show that the algorithm makes at most (1 + b∗2 )(R2 + 1)/ρ2 updates.
Proof: Let (wj , bj ) be the state maintained immediately before the jth update, which occurs at some example (xi , yi ). Note that (wj , bj ) is updated only when the current classifier gj misclassifies xi , which implies that yi − gj (xi ) = 2yi . Therefore,
\[
\begin{aligned}
\begin{bmatrix} w_{j+1} \\ b_{j+1} \end{bmatrix}^{T}
\begin{bmatrix} w^{*} \\ b^{*} \end{bmatrix}
&=
\begin{bmatrix} w_{j} \\ b_{j} \end{bmatrix}^{T}
\begin{bmatrix} w^{*} \\ b^{*} \end{bmatrix}
+ \frac{\eta\,(y_i - g_j(x_i))}{2}
\begin{bmatrix} x_{i} \\ 1 \end{bmatrix}^{T}
\begin{bmatrix} w^{*} \\ b^{*} \end{bmatrix} \\
&=
\begin{bmatrix} w_{j} \\ b_{j} \end{bmatrix}^{T}
\begin{bmatrix} w^{*} \\ b^{*} \end{bmatrix}
+ \eta y_i
\begin{bmatrix} x_{i} \\ 1 \end{bmatrix}^{T}
\begin{bmatrix} w^{*} \\ b^{*} \end{bmatrix} \\
&\ge
\begin{bmatrix} w_{j} \\ b_{j} \end{bmatrix}^{T}
\begin{bmatrix} w^{*} \\ b^{*} \end{bmatrix}
+ \eta\rho \\
&\ge j\eta\rho ,
\end{aligned}
\]
where the first inequality uses the margin assumption yi (w∗T xi + b∗ ) ≥ ρ and the last follows by induction from the initialization w1 = 0, b1 = 0.
On the other hand,
\[
\begin{aligned}
\left\| \begin{bmatrix} w_{j+1} \\ b_{j+1} \end{bmatrix} \right\|^{2}
&=
\left\| \begin{bmatrix} w_{j} \\ b_{j} \end{bmatrix}
+ \eta y_i \begin{bmatrix} x_{i} \\ 1 \end{bmatrix} \right\|^{2} \\
&=
\left\| \begin{bmatrix} w_{j} \\ b_{j} \end{bmatrix} \right\|^{2}
+ 2\eta y_i
\begin{bmatrix} x_{i} \\ 1 \end{bmatrix}^{T}
\begin{bmatrix} w_{j} \\ b_{j} \end{bmatrix}
+ \eta^{2} \left\| \begin{bmatrix} x_{i} \\ 1 \end{bmatrix} \right\|^{2} \\
&\le
\left\| \begin{bmatrix} w_{j} \\ b_{j} \end{bmatrix} \right\|^{2}
+ \eta^{2} \left\| \begin{bmatrix} x_{i} \\ 1 \end{bmatrix} \right\|^{2} \\
&\le j\eta^{2}(R^{2} + 1) ,
\end{aligned}
\]
where the first inequality holds because the cross term 2ηyi (xiT wj + bj ) is nonpositive whenever (xi , yi ) is misclassified, and the last again follows by induction from w1 = 0, b1 = 0.
Combining these two observations with the Cauchy–Schwarz inequality shows that
\[
\begin{aligned}
\sqrt{j}\,\eta\sqrt{R^{2} + 1}
&\ge
\left\| \begin{bmatrix} w_{j+1} \\ b_{j+1} \end{bmatrix} \right\| \\
&\ge
\frac{
\begin{bmatrix} w_{j+1} \\ b_{j+1} \end{bmatrix}^{T}
\begin{bmatrix} w^{*} \\ b^{*} \end{bmatrix}
}{\sqrt{1 + b^{*2}}} \\
&\ge
\frac{j\eta\rho}{\sqrt{1 + b^{*2}}} ,
\end{aligned}
\]
using ‖[w∗ ; b∗ ]‖ = √(1 + b∗2 ) since ‖w∗ ‖ = 1, and thus j ≤ (1 + b∗2 )(R2 + 1)/ρ2 as desired.
Problem 2: (20 pts) What are the VC dimensions of the following function sets?
1. F = {f : ℝ2 → {−1, 1} | f (x) = sign(xT x − b), b ∈ ℝ}
2. F = {f : ℝ2 → {−1, 1} | f (x) = sign(s(xT x) − b), s, b ∈ ℝ}
Answer:
1. The VC-dimension is 1. The classifier thresholds xT x with the +1 region always outside the circle, so a single point can be shattered, but for any two points the one with smaller norm can never receive +1 while the other receives −1.
2. The VC-dimension is 2. Since s may be negative, the orientation of the circle can be reversed, so any two points at distinct radii can be shattered; but s(xT x) − b is an affine function of r = xT x, so for three points at strictly increasing radii the labeling (+1, −1, +1) can never be realized.
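These two answers can be spot-checked by brute force over the witness configurations. In the sketch below the witness points and the parameter grids for s and b are assumptions chosen for illustration (a fine enough grid to witness every labeling achievable on these particular points).

```python
import numpy as np

def labelings(points, funcs):
    """All label tuples achievable on `points` by classifiers in `funcs`."""
    return {tuple(f(p) for p in points) for f in funcs}

def shatters(points, funcs):
    return len(labelings(points, funcs)) == 2 ** len(points)

sgn = lambda v: 1 if v >= 0 else -1

# Parameter grids (an assumption: fine enough for these witness sets).
bs = np.linspace(-10.0, 10.0, 401)
ss = np.linspace(-5.0, 5.0, 41)
class1 = [lambda x, b=b: sgn(x @ x - b) for b in bs]                   # sign(x.x - b)
class2 = [lambda x, s=s, b=b: sgn(s * (x @ x) - b) for s in ss for b in bs]

one = [np.array([1.0, 0.0])]
two = [np.array([1.0, 0.0]), np.array([2.0, 0.0])]                     # squared norms 1, 4
three = [np.array([1.0, 0.0]), np.array([2.0, 0.0]), np.array([3.0, 0.0])]

# Class 1 shatters one point but no pair at distinct radii;
# class 2 shatters such a pair but not three points at increasing radii.
results = (shatters(one, class1), shatters(two, class1),
           shatters(two, class2), shatters(three, class2))
assert results == (True, False, True, False)
```

This only certifies the lower bounds and checks non-shattering for these specific point sets; the upper-bound arguments in the answer cover all configurations.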
Problem 3: (30 pts) Consider a support vector machine and the following training data from two categories:
1. (20 pts) Plot these six training points, and construct by inspection the weight vector for the optimal
hyperplane, and the optimal margin.
2. (10 pts) What are the support vectors?
Answer:
1. The weight vector of the optimal hyperplane is w = [2, 2]T with b = −3. The margin is 2/‖w‖ = √2/2.
2. The support vectors are (0, 1)T , (1, 0)T , (1, 1)T , and (2, 0)T .
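As a numeric sanity check of the answer above, the sketch below verifies that w = [2, 2]T, b = −3 gives functional margin exactly 1 on each listed support vector; the class labels are inferred from the sign of the decision function, since the training table is not reproduced here.

```python
import numpy as np

w = np.array([2.0, 2.0])
b = -3.0
# Support vectors from the answer; labels assumed from the sign of w.x + b.
support = {(0.0, 1.0): -1, (1.0, 0.0): -1, (1.0, 1.0): 1, (2.0, 0.0): 1}

# Each support vector lies exactly on its margin boundary: y*(w.x + b) = 1.
for x, y in support.items():
    assert np.isclose(y * (w @ np.array(x) + b), 1.0)

# Distance between the two boundaries: 2/||w|| = sqrt(2)/2.
margin = 2 / np.linalg.norm(w)
assert np.isclose(margin, np.sqrt(2) / 2)
```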
Problem 4: (20 pts) Let K be a Mercer kernel. Please write pseudo code to implement the kernel version of
the perceptron learning algorithm (described in Problem 1).
Answer:
initialize α ← 0 ∈ ℝm, b ← 0
repeat
    for i = 1, . . . , m
        compute g(xi ) = sgn( Σj=1..m αj K(xj , xi ) + b )
        update α and b according to
            αi ← αi + η (yi − g(xi ))/2
            b ← b + η (yi − g(xi ))/2
    endfor
until no update is made during a full pass
output the classifier f (x) = sgn( Σj=1..m αj K(xj , x) + b )
Here αi accumulates η yi for every mistake at xi , so α represents w = Σj αj φ(xj ) in the feature space induced by K; the updates mirror those of Problem 1 with xi replaced by its feature image.
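The pseudocode can be turned into a short runnable sketch; the RBF kernel, the XOR-style data, and η = 1 below are illustrative assumptions, not part of the problem.

```python
import numpy as np

def kernel_perceptron(X, y, K, eta=1.0, max_passes=100):
    """Kernel perceptron: alpha[j] is the coefficient of phi(x_j) in
    w = sum_j alpha_j phi(x_j); on a mistake at (x_i, y_i) the factor
    (y_i - g(x_i))/2 equals y_i, so alpha_i += eta*y_i and b += eta*y_i."""
    m = len(y)
    G = np.array([[K(X[i], X[j]) for j in range(m)] for i in range(m)])  # Gram matrix
    alpha, b = np.zeros(m), 0.0
    for _ in range(max_passes):
        clean = True
        for i in range(m):
            g = 1.0 if G[i] @ alpha + b > 0 else -1.0
            if g != y[i]:
                alpha[i] += eta * (y[i] - g) / 2
                b += eta * (y[i] - g) / 2
                clean = False
        if clean:
            break
    # Output classifier: f(x) = sgn(sum_j alpha_j K(x_j, x) + b).
    def f(x):
        return 1.0 if sum(a * K(xj, x) for a, xj in zip(alpha, X)) + b > 0 else -1.0
    return f, alpha, b

# XOR-style data, separable in the feature space of an RBF kernel (a Mercer kernel).
rbf = lambda u, v: np.exp(-np.sum((u - v) ** 2))
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
f, alpha, b = kernel_perceptron(X, y, rbf)
assert all(f(x) == t for x, t in zip(X, y))
```

Precomputing the Gram matrix keeps each pass at O(m²) kernel-free work; only the final classifier evaluates K on new points.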