ENGR 691/692 Section 66 (Fall 06): Machine Learning Assigned: October 10


Homework 3: Perceptron, VC Dimension, and SVMs (Solutions) Due: October 30

Problem 1: (30 pts) The perceptron algorithm is "incremental," in the sense that small changes are made to the weight vector in response to each labeled example in turn. For any learning rate $\eta > 0$, the algorithm acts sequentially as follows: starting from $(w, b) = (0, 0)$, it cycles through the examples and, whenever the current classifier $g(x) = \mathrm{sgn}(w^T x + b)$ misclassifies $(x_i, y_i)$, performs the update
$$
w \leftarrow w + \frac{\eta}{2}\,\bigl(y_i - g(x_i)\bigr)\, x_i , \qquad
b \leftarrow b + \frac{\eta}{2}\,\bigl(y_i - g(x_i)\bigr) .
$$
Suppose that there exists a $\rho > 0$, a weight vector $w^*$ satisfying $\|w^*\| = 1$, and a threshold $b^*$ such that
$$
y_i \bigl( w^{*T} x_i + b^* \bigr) \ge \rho \quad \text{for all } 1 \le i \le m .
$$
Please prove that the above algorithm converges after no more than $\frac{(b^{*2} + 1)(R^2 + 1)}{\rho^2}$ updates, where $R = \max_i \|x_i\|$.

Proof: Let $(w_j, b_j)$ be the state maintained immediately before the $j$th update, which occurs at some example $(x_i, y_i)$; the algorithm starts from $(w_1, b_1) = (0, 0)$. Note that $(w_j, b_j)$ is updated only when the corresponding classifier $g_j$ misclassifies $(x_i, y_i)$, which implies $y_i - g_j(x_i) = 2 y_i$. Therefore,
$$
\begin{aligned}
\begin{pmatrix} w_{j+1} \\ b_{j+1} \end{pmatrix}^{\!T} \begin{pmatrix} w^* \\ b^* \end{pmatrix}
&= \begin{pmatrix} w_j \\ b_j \end{pmatrix}^{\!T} \begin{pmatrix} w^* \\ b^* \end{pmatrix}
 + \frac{\eta}{2}\,\bigl(y_i - g_j(x_i)\bigr) \begin{pmatrix} x_i \\ 1 \end{pmatrix}^{\!T} \begin{pmatrix} w^* \\ b^* \end{pmatrix} \\
&= \begin{pmatrix} w_j \\ b_j \end{pmatrix}^{\!T} \begin{pmatrix} w^* \\ b^* \end{pmatrix}
 + \eta\, y_i \begin{pmatrix} x_i \\ 1 \end{pmatrix}^{\!T} \begin{pmatrix} w^* \\ b^* \end{pmatrix} \\
&\ge \begin{pmatrix} w_j \\ b_j \end{pmatrix}^{\!T} \begin{pmatrix} w^* \\ b^* \end{pmatrix} + \eta \rho \\
&\ge j \eta \rho ,
\end{aligned}
$$
where the last inequality follows by induction on $j$, using $y_i (w^{*T} x_i + b^*) \ge \rho$ and $(w_1, b_1) = (0, 0)$.
On the other hand,
$$
\begin{aligned}
\left\| \begin{pmatrix} w_{j+1} \\ b_{j+1} \end{pmatrix} \right\|^2
&= \left\| \begin{pmatrix} w_j \\ b_j \end{pmatrix} + \eta\, y_i \begin{pmatrix} x_i \\ 1 \end{pmatrix} \right\|^2 \\
&= \left\| \begin{pmatrix} w_j \\ b_j \end{pmatrix} \right\|^2
 + 2 \eta\, y_i \begin{pmatrix} x_i \\ 1 \end{pmatrix}^{\!T} \begin{pmatrix} w_j \\ b_j \end{pmatrix}
 + \eta^2 \left\| \begin{pmatrix} x_i \\ 1 \end{pmatrix} \right\|^2 \\
&\le \left\| \begin{pmatrix} w_j \\ b_j \end{pmatrix} \right\|^2 + \eta^2 \left\| \begin{pmatrix} x_i \\ 1 \end{pmatrix} \right\|^2 \\
&\le j \eta^2 (R^2 + 1) ,
\end{aligned}
$$
where the cross term is nonpositive because $(x_i, y_i)$ is misclassified, i.e. $y_i (w_j^T x_i + b_j) \le 0$, and the final step again follows by induction, using $\|(x_i^T, 1)^T\|^2 = \|x_i\|^2 + 1 \le R^2 + 1$.
Combining these two observations with the Cauchy-Schwarz inequality (note that $\|(w^{*T}, b^*)^T\| = \sqrt{1 + b^{*2}}$, since $\|w^*\| = 1$) shows that
$$
\sqrt{j}\, \eta \sqrt{R^2 + 1}
\;\ge\; \left\| \begin{pmatrix} w_{j+1} \\ b_{j+1} \end{pmatrix} \right\|
\;\ge\; \frac{1}{\sqrt{1 + b^{*2}}} \begin{pmatrix} w_{j+1} \\ b_{j+1} \end{pmatrix}^{\!T} \begin{pmatrix} w^* \\ b^* \end{pmatrix}
\;\ge\; \frac{j \eta \rho}{\sqrt{1 + b^{*2}}} ,
$$
and thus $j \le \frac{(1 + b^{*2})(R^2 + 1)}{\rho^2}$, as desired.
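For reference, here is a minimal runnable sketch of the algorithm analyzed above, in Python with NumPy (the function name, loop structure, and max_passes cap are illustrative choices, not part of the original assignment):

import numpy as np

def perceptron(X, y, eta=1.0, max_passes=1000):
    """Primal perceptron with bias; X is (m, d), y has entries in {-1, +1}."""
    m, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_passes):
        updated = False
        for i in range(m):
            g = 1.0 if X[i] @ w + b >= 0 else -1.0  # sgn, taking sgn(0) = +1
            if g != y[i]:
                # On a mistake, y_i - g(x_i) = 2*y_i, so this is exactly the
                # update (w, b) <- (w, b) + eta*y_i*(x_i, 1) used in the proof.
                w += eta * (y[i] - g) / 2 * X[i]
                b += eta * (y[i] - g) / 2
                updated = True
        if not updated:  # a full pass with no mistakes: converged
            break
    return w, b

On separable data the loop stops after at most $(b^{*2} + 1)(R^2 + 1) / \rho^2$ updates, by the bound just proved.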

Problem 2: (20 pts) What are the VC dimensions of the following function sets?

1. $\mathcal{F} = \{ f : \mathbb{R}^2 \to \{-1, 1\} \mid f(x) = \mathrm{sign}(x^T x - b),\ b \in \mathbb{R} \}$

2. $\mathcal{F} = \{ f : \mathbb{R}^2 \to \{-1, 1\} \mid f(x) = \mathrm{sign}\bigl( s\,(x^T x) - b \bigr),\ s, b \in \mathbb{R} \}$

Answer:

1. The VC-dimension is 1.
2. The VC-dimension is 2.
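A brief justification, added for completeness (this argument is standard, though not spelled out in the original solution): both rules depend on $x$ only through $x^T x = \|x\|^2$. For set 1, if $\|x_1\| \le \|x_2\|$ then
$$
f(x_1) = +1 \iff x_1^T x_1 \ge b \implies x_2^T x_2 \ge b \iff f(x_2) = +1 ,
$$
so the labeling $(+1, -1)$ is unrealizable and no two points can be shattered, while a single point $x$ is shattered by choosing $b$ on either side of $\|x\|^2$. For set 2, taking $s < 0$ reverses the orientation, so any two points with distinct norms can be shattered; but $s\,(x^T x) - b$ is monotone in $\|x\|^2$, so for three points ordered by norm the labeling $(+1, -1, +1)$ remains unrealizable.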

Problem 3: (30 pts) Consider a support vector machine and the following training data from two categories:

$\omega_1$: $(1, 1)^T$, $(2, 2)^T$, $(2, 0)^T$

$\omega_2$: $(0, 0)^T$, $(1, 0)^T$, $(0, 1)^T$

1. (20 pts) Plot these six training points, and construct by inspection the weight vector for the optimal hyperplane and the optimal margin.

2. (10 pts) What are the support vectors?
Answer:

1. By inspection, the closest points of the two classes lie on the parallel lines $x_1 + x_2 = 2$ (through $(1, 1)^T$ and $(2, 0)^T$) and $x_1 + x_2 = 1$ (through $(1, 0)^T$ and $(0, 1)^T$), so the optimal hyperplane runs midway between them: $2 x_1 + 2 x_2 - 3 = 0$. The weight vector for the optimal hyperplane is $w = [2, 2]^T$ with $b = -3$, and the margin is $\frac{2}{\|w\|} = \frac{\sqrt{2}}{2}$.

2. The support vectors are $(0, 1)^T$, $(1, 0)^T$, $(1, 1)^T$, and $(2, 0)^T$.
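As a quick numerical check (not part of the original solution; it assumes scikit-learn is available), a linear SVM with a very large $C$, which approximates the hard-margin problem, recovers the same answer:

import numpy as np
from sklearn.svm import SVC

# omega_1 -> +1, omega_2 -> -1
X = np.array([[1, 1], [2, 2], [2, 0],
              [0, 0], [1, 0], [0, 1]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.coef_, clf.intercept_)      # expect approximately [[2. 2.]] and [-3.]
print(clf.support_vectors_)           # expect the four points listed above
print(2 / np.linalg.norm(clf.coef_))  # margin, expect ~0.7071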

Problem 4: (20 pts) Let $K$ be a Mercer kernel. Please write pseudocode to implement the kernel version of the perceptron learning algorithm (described in Problem 1).
Answer:

initialize $\alpha = 0 \in \mathbb{R}^m$, $b = 0$
repeat
    for $i = 1, \ldots, m$
        compute $g(x_i) = \mathrm{sgn}\bigl( \sum_{j=1}^{m} \alpha_j K(x_j, x_i) + b \bigr)$
        update $\alpha$ and $b$ according to
            $\alpha_i \leftarrow \alpha_i + \eta\, (y_i - g(x_i)) / 2$
            $b \leftarrow b + \eta\, (y_i - g(x_i)) / 2$
    endfor
until no update is made during a full pass
output (the classifier defined by $\alpha$ and $b$)

The classifier is $g(x) = \mathrm{sgn}\bigl[ \sum_{i=1}^{m} \alpha_i K(x, x_i) + b \bigr]$.
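For concreteness, here is a minimal runnable version of the same algorithm in Python (a sketch: the RBF kernel, the parameter defaults, and the max_passes cap are illustrative assumptions, not part of the assignment):

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # A standard Mercer kernel: K(x, z) = exp(-gamma * ||x - z||^2).
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_perceptron(X, y, K=rbf_kernel, eta=1.0, max_passes=100):
    m = len(X)
    alpha, b = np.zeros(m), 0.0
    # Precompute the Gram matrix G[j, i] = K(x_j, x_i).
    G = np.array([[K(X[j], X[i]) for i in range(m)] for j in range(m)])
    for _ in range(max_passes):
        mistakes = 0
        for i in range(m):
            g = 1.0 if alpha @ G[:, i] + b >= 0 else -1.0  # sgn, sgn(0) = +1
            if g != y[i]:
                alpha[i] += eta * (y[i] - g) / 2  # = eta * y_i on a mistake
                b += eta * (y[i] - g) / 2
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: converged
            break
    return alpha, b

def classify(x, X, alpha, b, K=rbf_kernel):
    # g(x) = sgn( sum_i alpha_i K(x, x_i) + b ), as stated above.
    return 1.0 if sum(a * K(x, xi) for a, xi in zip(alpha, X)) + b >= 0 else -1.0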
