Non-separable data

[Figure 8.6: Non-separable data (reproduction of Figure 3.1). Panels: (a) linear with outliers; (b) nonlinear. Both are not linearly separable. But there is a difference!]

What if the data is not linearly separable? Figure 8.6 (reproduced from Chapter 3) illustrates the two types of non-separability. In Figure 8.6(a), two noisy data points render the data non-separable. In Figure 8.6(b), the target function is inherently nonlinear.
[Figure: the data in X-space (axes $x_1$, $x_2$) and in Z-space (axes $z_1 = x_1^2$, $z_2 = x_2^2$).]

$x = (1, x_1, x_2)^\top \;\longrightarrow\; z = \Phi(x) = (1, \Phi_1(x), \Phi_2(x))^\top = (1, x_1^2, x_2^2)^\top$

© AML Creator: Malik Magdon-Ismail
Classification in Z-space
In Z-space the data can be linearly separated. Separate the data in the Z-space with $\tilde{w}$:
$\tilde{g}(z) = \mathrm{sign}(\tilde{w}^\top z)$
To classify a new $x$, first transform $x$ to $\Phi(x) \in$ Z-space and classify there with $\tilde{g}$:
$g(x) = \tilde{g}(\Phi(x)) = \mathrm{sign}(\tilde{w}^\top \Phi(x))$
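The two-step recipe (transform, then classify linearly) can be sketched in a few lines. This is a minimal illustration, assuming the transform $(x_1, x_2) \mapsto (1, x_1^2, x_2^2)$ from the figure; the weight vector `w_tilde` is a hypothetical choice (the unit circle as decision boundary) and does not come from the slides.

```python
def phi(x):
    """Transform a point from X-space to Z-space: (x1, x2) -> (1, x1^2, x2^2)."""
    x1, x2 = x
    return (1.0, x1 * x1, x2 * x2)


def sign(s):
    """Return +1 for positive s, otherwise -1."""
    return 1 if s > 0 else -1


def classify(x, w_tilde):
    """g(x) = sign(w_tilde . phi(x)): a linear classifier applied in Z-space."""
    z = phi(x)
    return sign(sum(w * zi for w, zi in zip(w_tilde, z)))


# Hypothetical weights (an assumption for illustration only):
# the boundary w_tilde . z = 0 is the circle x1^2 + x2^2 = 1 in X-space.
w_tilde = (-1.0, 1.0, 1.0)

inside = classify((0.2, -0.3), w_tilde)   # point inside the circle
outside = classify((1.5, 0.5), w_tilde)   # point outside the circle
```

The classifier is linear in `z` but its boundary in X-space is a circle, which is the point of the transform.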
Classification in Z-space
... if you think linear is not enough, try the 2nd order polynomial transform.
$x = (1, x_1, x_2)^\top \;\longrightarrow\; \Phi(x) = (1, \Phi_1(x), \Phi_2(x), \Phi_3(x), \Phi_4(x), \Phi_5(x))^\top = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2)^\top$
We can get even fancier: the degree-$k$ polynomial transform. We can choose higher order polynomials:
$\Phi_1(x) = (1, x_1, x_2)$,
$\Phi_2(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2)$,
$\Phi_3(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3)$,
$\Phi_4(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3, x_1^4, x_1^3 x_2, x_1^2 x_2^2, x_1 x_2^3, x_2^4)$,
$\vdots$
What are the potential effects of increasing the order of the polynomial?
Dimensionality of the feature space increases rapidly ($d_{vc}$)!
Digits data
Danger of overfitting
Better chance of obtaining linear separability
Computationally expensive (memory and time)
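The growth in dimensionality can be made concrete by enumerating the monomials. The sketch below is one possible implementation of a degree-$k$ polynomial transform; the function name `poly_transform` is an illustrative choice, not something defined in the slides.

```python
from itertools import combinations_with_replacement


def poly_transform(x, k):
    """Degree-k polynomial transform: every monomial of degree 0..k in the entries of x."""
    features = []
    for degree in range(k + 1):
        # Each multiset of `degree` input indices corresponds to one monomial.
        for idx in combinations_with_replacement(range(len(x)), degree):
            term = 1.0
            for i in idx:
                term *= x[i]
            features.append(term)
    return features


# Feature-space dimension for 2 inputs and k = 1, 2, 3, 4.
dims = [len(poly_transform((2.0, 3.0), k)) for k in range(1, 5)]

# The same degree-4 transform on 10 inputs is already much larger.
dim_10_inputs = len(poly_transform([1.0] * 10, 4))
```

For two inputs the dimensions are 3, 6, 10 and 15, matching the lengths of the transforms listed above; with 10 inputs the degree-4 transform has 1001 features, which is where the memory and time cost comes from.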
$\Phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)^\top$
$w^\top \Phi(x) = w_1 x_1^2 + \sqrt{2}\, w_2 x_1 x_2 + w_3 x_2^2$
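$w = \sum_{i=1}^{n} \alpha_i x_i$
$f(x) = w^\top x + b = \sum_{i=1}^{n} \alpha_i x_i^\top x + b$
$f(x) = w^\top \Phi(x) + b = \sum_{i=1}^{n} \alpha_i \Phi(x_i)^\top \Phi(x) + b$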
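Example
Let's go back to the example
$\Phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)^\top$
$\Phi(x)^\top \Phi(z) = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = (x_1 z_1 + x_2 z_2)^2 = (x^\top z)^2$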
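A quick numerical check of this identity, as a sketch: it assumes the feature map $(x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ and compares the explicit dot product in feature space with $(x^\top z)^2$ computed directly in the input space. The test points are arbitrary.

```python
import math


def phi(x):
    """Quadratic feature map: (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = (1.0, 2.0), (3.0, -1.0)   # arbitrary test points

explicit = dot(phi(x), phi(z))   # dot product computed in feature space
kernel = dot(x, z) ** 2          # same quantity computed in input space
```

Both values agree up to floating-point rounding, but the second never forms the feature vectors.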
Kernels
Definition: A function $k(x, z)$ that can be expressed as a dot product in some feature space is called a kernel.
In other words, $k(x, z)$ is a kernel if there exists $\Phi : \mathcal{X} \to \mathcal{F}$ such that
$k(x, z) = \Phi(x)^\top \Phi(z)$
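$\text{maximize}_{\alpha} \;\; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j$
subject to: $0 \le \alpha_i \le C$, $\;\sum_{i=1}^{n} \alpha_i y_i = 0$

With a kernel in place of the dot product:
$\text{maximize}_{\alpha} \;\; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$
subject to: $0 \le \alpha_i \le C$, $\;\sum_{i=1}^{n} \alpha_i y_i = 0$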
Feature space:
Linear kernel, $k(x, z) = x^\top z$: original features
Homogeneous polynomial kernel, $k(x, z) = (x^\top z)^d$: all monomials of degree $d$
Polynomial kernel, $k(x, z) = (x^\top z + 1)^d$: all monomials of degree up to $d$
Gaussian kernel, $k(x, z) = \exp(-\gamma \|x - z\|^2)$: infinite dimensional
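The kernels on this slide are short enough to write out directly. The sketch below assumes the standard forms $(x^\top z)^d$ and $(x^\top z + 1)^d$ for the two polynomial kernels and a width parameter `gamma` for the Gaussian kernel; the function names are illustrative.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(x, z):
    return dot(x, z)


def homogeneous_poly_kernel(x, z, d):
    # Feature space: all monomials of degree exactly d.
    return dot(x, z) ** d


def poly_kernel(x, z, d):
    # Feature space: all monomials of degree up to d.
    return (dot(x, z) + 1.0) ** d


def gaussian_kernel(x, z, gamma):
    # Feature space is infinite dimensional; only the distance is needed.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = (1.0, 2.0), (3.0, -1.0)   # arbitrary test points
values = (
    linear_kernel(x, z),
    homogeneous_poly_kernel(x, z, 2),
    poly_kernel(x, z, 2),
    gaussian_kernel(x, z, 0.5),
)
```

Each function touches only the input vectors, so its cost does not depend on the dimension of the feature space it implicitly works in.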
Demo
Using the polynomial kernel:
Demo
Using the Gaussian kernel:
$k(x, z) = \exp(-\gamma \|x - z\|^2)$
The update
$\alpha_i \to \alpha_i + y_i$
is equivalent to:
$w' = w + y_i x_i$
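Because the weight update can be written purely in terms of the coefficients $\alpha_i$, the learner never needs $w$ explicitly. The sketch below assumes a perceptron-style rule (update only on a misclassified point) and the expansion $w = \sum_j \alpha_j \Phi(x_j)$, so that $w^\top \Phi(x) = \sum_j \alpha_j k(x_j, x)$; the toy data set and the function names are assumptions made for illustration.

```python
def kernel_perceptron(X, y, kernel, epochs=10):
    """Mistake-driven training that stores one coefficient alpha_i per training point."""
    n = len(X)
    alpha = [0.0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # f(x_i) = sum_j alpha_j k(x_j, x_i); w is never formed.
            f = sum(alpha[j] * kernel(X[j], X[i]) for j in range(n))
            if y[i] * f <= 0:
                alpha[i] += y[i]   # same effect as w' = w + y_i * phi(x_i)
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(x, X, alpha, kernel):
    f = sum(a * kernel(xj, x) for a, xj in zip(alpha, X))
    return 1 if f > 0 else -1


def quad_kernel(x, z):
    """Homogeneous quadratic kernel (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


# XOR-like toy data (assumed for illustration): not linearly separable in X-space.
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
y = [1, 1, -1, -1]

alpha = kernel_perceptron(X, y, quad_kernel)
preds = [predict(x, X, alpha, quad_kernel) for x in X]
```

On this data the loop stops after two passes, with nonzero coefficients on only two of the four points.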
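$\sum_{i=1}^{n} (y_i - w^\top x_i)^2 = (y - Xw)^\top (y - Xw)$
$X^\top X w = X^\top y$
If we express $w$ as:
$w = \sum_{i=1}^{n} \alpha_i x_i = X^\top \alpha$
We get:
$X^\top X X^\top \alpha = X^\top y$
$X X^\top \alpha = y$
Compare with:
$X^\top X w = X^\top y$
Which is harder to find? What have we gained?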
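A tiny worked instance makes the comparison concrete. The sketch below uses an assumed toy problem with $n = 2$ points in $d = 3$ dimensions, solves the $n \times n$ system $X X^\top \alpha = y$ (by Cramer's rule, which is only sensible for a 2 by 2 system), and checks that $w = X^\top \alpha$ satisfies $X^\top X w = X^\top y$.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Toy data (assumed for illustration): n = 2 points, d = 3 features.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
Xt = transpose(X)

# Gram matrix G = X X^T, with G[i][j] = x_i . x_j (size n x n).
G = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

# Solve G alpha = y with Cramer's rule.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
alpha = [(y[0] * G[1][1] - G[0][1] * y[1]) / det,
         (G[0][0] * y[1] - G[1][0] * y[0]) / det]

w = matvec(Xt, alpha)            # w = X^T alpha
lhs = matvec(Xt, matvec(X, w))   # X^T X w
rhs = matvec(Xt, y)              # X^T y
```

The dual system has one unknown per data point rather than one per feature, and it depends on the data only through dot products $x_i^\top x_j$.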
$X X^\top$
$K_{ij} = \Phi(x_i)^\top \Phi(x_j) = k(x_i, x_j)$
$x^\top K x \ge 0 \quad \forall x$
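The positive semidefiniteness condition can be spot-checked numerically. The sketch below builds the kernel matrix of a Gaussian kernel on a few random points and evaluates the quadratic form for many random vectors (named `v` in the code to avoid clashing with the data points); the width `gamma = 0.5`, the number of points and the seed are arbitrary assumptions. Random probing is only a sanity check, not a proof.

```python
import math
import random


def gaussian_kernel(x, z, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def quad_form(K, v):
    """Evaluate v^T K v for a square matrix K given as a list of rows."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(6)]

# Kernel (Gram) matrix: K[i][j] = k(x_i, x_j).
K = [[gaussian_kernel(p, q) for q in points] for p in points]

# Probe the quadratic form with random vectors.
values = [quad_form(K, [random.gauss(0, 1) for _ in points]) for _ in range(200)]
min_value = min(values)
```

A negative `min_value` well beyond rounding error would show that the candidate function is not a kernel; a non-negative one is merely consistent with it being one.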
If $K(x, z)$ is a kernel, then
$K'(x, z) = \frac{K(x, z)}{\sqrt{K(x, x)\, K(z, z)}}$
is a kernel.
$\cos(\Phi(x), \Phi(z)) = \frac{\Phi(x)^\top \Phi(z)}{\|\Phi(x)\| \, \|\Phi(z)\|} = \frac{\Phi(x)^\top \Phi(z)}{\sqrt{\Phi(x)^\top \Phi(x) \; \Phi(z)^\top \Phi(z)}}$
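The link between the normalized kernel and the cosine in feature space can be checked on a kernel whose feature map is known. The sketch below borrows the quadratic kernel $(x^\top z)^2$ and its feature map $(x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ as a convenient assumption; any kernel with an explicit feature map would do.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Feature map of the quadratic kernel (x . z)^2 in two dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def K(x, z):
    return dot(x, z) ** 2


def normalized_K(x, z):
    """K(x, z) / sqrt(K(x, x) K(z, z))."""
    return K(x, z) / math.sqrt(K(x, x) * K(z, z))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


x, z = (1.0, 2.0), (3.0, -1.0)       # arbitrary test points
k_norm = normalized_K(x, z)          # computed from kernel values only
cos_feat = cosine(phi(x), phi(z))    # computed from explicit feature vectors
self_sim = normalized_K(x, x)        # every point has similarity 1 with itself
```

Normalizing a kernel therefore amounts to projecting every feature vector onto the unit sphere.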
$K(x, z) = K(x^\top z)$ with $K(t) = \sum_{n=0}^{\infty} a_n t^n$
is a kernel iff $a_n \ge 0$ for all $n$.
Corollary: $K(x, z) = \exp(2\gamma\, x^\top z)$ is a kernel.
Corollary: $k(x, z) = \exp(-\gamma \|x - z\|^2) = \frac{K(x, z)}{\sqrt{K(x, x)\, K(z, z)}}$ is a kernel.
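The step from the exponential kernel to the Gaussian kernel can be written out explicitly. The derivation below is a sketch that assumes $\gamma > 0$ (so that the series coefficients $(2\gamma)^n / n!$ of $\exp(2\gamma t)$ are non-negative) and takes $K(x, z) = \exp(2\gamma\, x^\top z)$.

```latex
\begin{align*}
\frac{K(x,z)}{\sqrt{K(x,x)\,K(z,z)}}
  &= \frac{\exp(2\gamma\, x^\top z)}
          {\sqrt{\exp(2\gamma \|x\|^2)\,\exp(2\gamma \|z\|^2)}} \\
  &= \exp\!\left(2\gamma\, x^\top z - \gamma \|x\|^2 - \gamma \|z\|^2\right) \\
  &= \exp\!\left(-\gamma \left(\|x\|^2 - 2\, x^\top z + \|z\|^2\right)\right) \\
  &= \exp\!\left(-\gamma \|x - z\|^2\right).
\end{align*}
```

So the Gaussian kernel is the normalized form of a kernel with a non-negative power series, and normalizing a kernel yields a kernel.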