Continuous neural networks

Nicolas Le Roux
joint work with Yoshua Bengio

April 5th, 2006
Snowbird 2006

Outline
Introduction
Going to infinity
Conclusion
Questions

Introduction

Usual Neural Networks

$\hat f(x_t) = \sum_j w_j \, g(v_j \cdot x_t) + b$

[Figure: a one-hidden-layer network; the inputs $x_{t,1}, \ldots, x_{t,d}$ feed hidden units $g(v_1 \cdot x_t), \ldots, g(v_n \cdot x_t)$, which are combined into the output.]

g is the transfer function: tanh, sigmoid, sign, ...

Neural networks are universal approximators

Hornik et al. (1989): multilayer feedforward networks with one hidden layer using arbitrary squashing functions are capable of approximating any function to any desired degree of accuracy, provided sufficiently many hidden units are available.

Neural networks with an infinite number of hidden units should thus be able to approximate every function arbitrarily well.

Going to infinity

A nice picture before the Maths

[Figure: a network with a continuum of hidden units $h_v$ indexed by the input weight v ($h_{v_1}, h_{v_2}, \ldots, h_{v_k}, \ldots$), connecting the inputs $x_{t,1}, \ldots, x_{t,d}$ to the outputs $O_1, \ldots, O_p$.]

Going to infinity

$\hat f(x) = \sum_j w_j \, g(v_j \cdot x) + b \qquad \longrightarrow \qquad \hat f(x) = \int_j w(j)\, g(v(j) \cdot x)\, dj + b$

Hornik's theorem tells us that any function f from $\mathbb{R}^d$ to $\mathbb{R}$ can be approximated arbitrarily well by:
1. a function from $\mathbb{R}$ to $\mathbb{R}$: w, the output weights function
2. a function from $\mathbb{R}$ to $\mathbb{R}^{d+1}$: v, the input weights function
3. a scalar: b, the output bias.

But a function from $\mathbb{R}$ to $\mathbb{R}^{d+1}$ is d + 1 functions from $\mathbb{R}$ to $\mathbb{R}$.

This is very similar to Kolmogorov's representation theorem.
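
To make the passage to the continuum concrete, here is a small numerical sketch (not part of the talk): it evaluates the usual finite sum and a Riemann-sum approximation of the integral formulation, with tanh units and j taken on a uniform grid. The function names and the choice of [0, 1] as the index interval are illustrative assumptions.

```python
import numpy as np

def f_discrete(x, V, w, b, g=np.tanh):
    """Usual network: f(x) = sum_j w_j * g(v_j . x) + b, with V of shape (n, d)."""
    return w @ g(V @ x) + b

def f_continuous(x, v_fn, w_fn, b, n_grid=1000, g=np.tanh):
    """Riemann-sum approximation of f(x) = int_j w(j) g(v(j) . x) dj + b,
    with j on a uniform grid over [0, 1] (the interval is an illustrative choice)."""
    js = np.linspace(0.0, 1.0, n_grid, endpoint=False) + 0.5 / n_grid
    total = sum(w_fn(j) * g(np.dot(v_fn(j), x)) for j in js)
    return total / n_grid + b

# Tiny usage example with smooth weight functions
x = np.array([0.2, -0.5, 1.0])
v_fn = lambda j: np.array([np.sin(2 * np.pi * j), np.cos(2 * np.pi * j), j])
w_fn = lambda j: 1.0 - j
print(f_continuous(x, v_fn, w_fn, b=0.1))
print(f_discrete(x, V=np.stack([v_fn(j) for j in (0.1, 0.5, 0.9)]),
                 w=np.array([0.3, 0.3, 0.3]), b=0.1))
```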

v for usual neural networks

[Figure: v(j) plotted as a step function of j; on the j-th step it takes the value $v_j$ (the input weight) over an interval of length $w_j$ (the output weight).]

$\hat f(x) = \int g(v(u) \cdot x)\, du + b = \sum_j w_j \, g(v_j \cdot x) + b$

Neural networks approximate v with a piecewise constant function.

V is a trajectory in $\mathbb{R}^d$ indexed by t

The function V is a trajectory in the space of all possible input weights.
Each point corresponds to an input weight associated to an infinitesimal output weight.
A piecewise constant trajectory only crosses a finite number of points in the space of input weights.
We could imagine trajectories that fill the space a bit more.

Piecewise affine approximations

[Figure: v(j) approximated by a continuous piecewise affine function; the breakpoints take the values $v_j$ and the pieces have lengths $w_j$.]

For g = tanh, each affine piece can be integrated in closed form:
$\hat f(x) = \sum_j \frac{w_j}{(v_j - v_{j-1}) \cdot x}\, \ln\!\left(\frac{\cosh(v_j \cdot x)}{\cosh(v_{j-1} \cdot x)}\right) + b$

Seeing v as a function, we could introduce smoothness with respect to j using constraints on successive values of v.
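
As a rough illustration of the piecewise-affine formula above, the following sketch evaluates the contribution of one affine piece for g = tanh using the log-cosh expression. The helper names, the degenerate-piece fallback and the array layout are my own assumptions, not code from the talk.

```python
import numpy as np

def log_cosh(z):
    """Numerically stable log(cosh(z))."""
    return np.logaddexp(z, -z) - np.log(2.0)

def affine_piece(x, v_prev, v_next, w, eps=1e-10):
    """Contribution of one affine piece of v, for g = tanh:
    w / ((v_next - v_prev) . x) * ln( cosh(v_next . x) / cosh(v_prev . x) ).
    When the denominator vanishes, the limit is w * tanh(v_prev . x)."""
    a, c = float(v_prev @ x), float(v_next @ x)
    if abs(c - a) < eps:
        return w * np.tanh(a)
    return w * (log_cosh(c) - log_cosh(a)) / (c - a)

def affine_network(x, V, w, b):
    """Piecewise-affine network: V has shape (n + 1, d) (breakpoints), w has shape (n,)."""
    return sum(affine_piece(x, V[j], V[j + 1], w[j]) for j in range(len(w))) + b

# Usage example
x = np.array([0.2, -0.5, 1.0])
V = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 1.0, 0.5]])
print(affine_network(x, V, w=np.array([0.7, -0.3]), b=0.0))
```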

Rate of convergence

$\left| f(x) - \hat f(x) \right| \;\le\; \frac{1}{2a} \int \left| \big(v(t) - \hat v(t)\big) \cdot x \right| \, dt$

A good approximation of v yields a good approximation of f.
The trapezoid rule (continuous piecewise affine functions) has a faster convergence rate than the rectangle rule (piecewise constant functions).

Theorem: rate of convergence of affine neural networks
Affine neural networks converge in $O(n^{-2})$ whereas usual neural networks converge in $O(n^{-1})$ (when n grows to infinity).
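
The convergence claim mirrors the classical rectangle-versus-trapezoid comparison in numerical integration. The sketch below is a generic illustration, not code from the talk: it compares the left-endpoint rectangle rule and the trapezoid rule on a smooth one-dimensional integrand, whose errors shrink roughly like 1/n and 1/n^2, matching the rates quoted in the theorem.

```python
import numpy as np

def rectangle_rule(f, a, b, n):
    """Left-endpoint rectangle rule (a piecewise constant approximation); error ~ 1/n."""
    xs = np.linspace(a, b, n, endpoint=False)
    return f(xs).sum() * (b - a) / n

def trapezoid_rule(f, a, b, n):
    """Trapezoid rule (a continuous piecewise affine approximation); error ~ 1/n^2."""
    xs = np.linspace(a, b, n + 1)
    ys = f(xs)
    return (b - a) / n * (0.5 * ys[0] + ys[1:-1].sum() + 0.5 * ys[-1])

exact = np.sin(1.0)   # integral of cos over [0, 1]
for n in (10, 100, 1000):
    print(n,
          abs(rectangle_rule(np.cos, 0.0, 1.0, n) - exact),
          abs(trapezoid_rule(np.cos, 0.0, 1.0, n) - exact))
```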

Piecewise affine approximations (again)

[Figure: the piecewise affine approximation of v(j), shown again.]

A few remarks on the optimization of the input weights

Having a complex function V requires lots of pieces.
Without constraints, having many pieces will lead us nowhere.
Maybe we could use other parametrizations inducing constraints on the pieces.
Instead of optimizing each input weight v(j) independently, we could parametrize them as the output of a neural network.

Input weights function as the output of a neural network

[Figure: an example of a smooth input weights function v(j) produced by a small network.]

$v(j) = \sum_k w_{v,k} \, g(v_{v,k}\, j + b_{v,k})$

Setting a prior on the parameters of that network induces a prior on v.
Such priors include the commonly used Gaussian prior.
The prior over $v_{v,k}$ and $b_{v,k}$ determines the level of dependence between the j's.
The prior over $w_{v,k}$ determines the amplitude of the v(j)'s.
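
A minimal sketch of this parametrization (illustrative only; the hidden size, the tanh transfer function and the Gaussian scales are assumptions): sampling the parameters of the generating network from Gaussian priors induces a random smooth function v(j), and rescaling the prior on the output weights $w_{v,k}$ rescales the amplitude of v.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 3, 16          # dimension of v(j) and hidden size of the generating network

def sample_v(w_scale=1.0, v_scale=1.0, b_scale=1.0):
    """Sample the generating network's parameters from Gaussian priors and return
    the induced input weights function v(j) = sum_k w_{v,k} tanh(v_{v,k} j + b_{v,k})."""
    w_v = rng.normal(scale=w_scale, size=(d, k))   # controls the amplitude of v(j)
    v_v = rng.normal(scale=v_scale, size=k)        # with b_v, controls dependence across j
    b_v = rng.normal(scale=b_scale, size=k)
    return lambda j: w_v @ np.tanh(v_v * j + b_v)

v_small, v_large = sample_v(w_scale=0.1), sample_v(w_scale=10.0)
for j in (0.0, 0.5, 1.0):
    print(j, np.linalg.norm(v_small(j)), np.linalg.norm(v_large(j)))
```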

A bit of recursion

$v(j) = \sum_k w_{v,k} \, g(v_{v,k}\, j + b_{v,k})$

What about the $v_{v,k}$ and the $b_{v,k}$?
We could define them as the output of a neural network.
You should be lost by now.
Let's stop a bit to rest.

Summary

Input weights can be seen as a function.
There are parametrizations of that function that yield theoretically more powerful networks than the usual ones.
Moreover, such parametrizations allow us to set different constraints than the common ones.
Example: handling of sequential data.

Having all possible input neurons at once

1. Instead of optimizing the input weights, we could use all of them:
   $f(x) = \int_E w(v)\, g(v \cdot x)\, dv$
   and only optimize the output weights: this is convex.
2. The optimal solution is of the form $f(x) = \sum_i a_i \int_E g(x \cdot v)\, g(x_i \cdot v)\, dv$.
3. With a sign transfer function, this integral can be computed analytically and yields a kernel machine.

Setting a prior on the output weights, this becomes a GP.

$K_{\mathrm{sign}}(x, y) = A - B\,\|x - y\|$

This kernel has no hyperparameter.
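
A hedged sketch of the resulting kernel machine: kernel ridge regression with K(x, y) = A - B ||x - y||. The constants A and B below are placeholders (the talk only states the form of the kernel), and kernel ridge regression stands in generically for the convex optimization of the output weights.

```python
import numpy as np

def k_sign(X, Y, A=1.0, B=0.1):
    """Kernel induced by the sign transfer function, K(x, y) = A - B * ||x - y||.
    A and B are placeholder constants; their exact values come from the integration
    and are not reproduced here."""
    dists = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return A - B * dists

def fit_kernel_ridge(X, y, lam=1e-3, A=1.0, B=0.1):
    """Kernel ridge regression with the sign kernel (a generic stand-in for the
    convex output-weight optimization described in the talk)."""
    K = k_sign(X, X, A, B)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda X_new: k_sign(X_new, X, A, B) @ alpha

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
predict = fit_kernel_ridge(X, y)
print(predict(X[:5]))
```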

Results on USPS with 6000 training samples

Classification error (%); wd = weight decay.

Algorithm        wd = 10^-3       wd = 10^-6       wd = 10^-12      Test
K_sign           2.27 ± 0.13      1.80 ± 0.08      1.80 ± 0.08      4.07
Gaussian σ = 1   58.27 ± 0.50     58.54 ± 0.27     58.54 ± 0.27     58.29
Gaussian σ = 2   7.71 ± 0.10      7.78 ± 0.21      7.78 ± 0.21      12.31
Gaussian σ = 4   1.72 ± 0.11      2.09 ± 0.09      2.10 ± 0.09      4.07
Gaussian σ = 6   1.67 ± 0.10      2.78 ± 0.25      3.33 ± 0.35      3.58
Gaussian σ = 7   1.72 ± 0.10      3.04 ± 0.26      4.39 ± 0.49      3.77

Results on MNIST with 6000 training samples

Classification error (%); wd = weight decay.

Algorithm        wd = 10^-3       wd = 10^-6, 10^-9, 10^-12, 0     Test
K_sign           5.51 ± 0.22      4.54 ± 0.50                      4.09
Gaussian σ = 1   77.55 ± 0.40     77.55 ± 0.40                     80.03
Gaussian σ = 2   10.51 ± 0.46     10.51 ± 0.45                     12.44
Gaussian σ = 3   3.64 ± 0.10      3.64 ± 0.10                      4.1
Gaussian σ = 5   3.01 ± 0.12      3.01 ± 0.12                      3.33
Gaussian σ = 7   3.15 ± 0.09      3.18 ± 0.10                      3.48

Results on LETTERS with 6000 training samples

Classification error (%); wd = weight decay.

Algorithm        wd = 10^-3       wd = 10^-6       wd = 10^-9       Test
K_sign           5.36 ± 0.10      5.22 ± 0.09      5.22 ± 0.09      5.5
Gaussian σ = 2   5.47 ± 0.14      5.93 ± 0.15      5.92 ± 0.14      5.8
Gaussian σ = 4   4.97 ± 0.10      11.06 ± 0.29     12.50 ± 0.35     5.3
Gaussian σ = 6   6.27 ± 0.17      8.47 ± 0.20      17.61 ± 0.40     6.63
Gaussian σ = 8   8.45 ± 0.19      6.11 ± 0.15      18.69 ± 0.34     9.25

Conclusion

Summary

We showed that training a neural network can be seen as learning an input weight function.
We introduced a piecewise-affine parametrization of that function which corresponds to a continuous number of hidden units.
In the extreme case where all the input weights are present, we showed it is a kernel machine whose kernel can be computed analytically and possesses no hyperparameter.

Future work

Learning the transfer function using a neural network.
Finding other (and better) parametrizations of the input weight function.
Recursively defining the input weight function as the output of a neural network.

Questions

Now is the time for ...

Questions?

Computing $\int \mathrm{sign}(v \cdot x + b)\,\mathrm{sign}(v \cdot y + b)\, dv\, db$

1. The sign function is invariant with respect to the norm of its argument, so v can be restricted to the unit hypersphere.
2. $\mathrm{sign}(v \cdot x + b)\,\mathrm{sign}(v \cdot y + b) = \mathrm{sign}\!\big[(v \cdot x + b)(v \cdot y + b)\big]$.
3. When b ranges from -M to +M, for M large enough, $(v \cdot x + b)(v \cdot y + b)$ is negative on an interval of size $|v \cdot (x - y)|$.
4. $\int_{b=-M}^{+M} \mathrm{sign}\!\big[(v \cdot x + b)(v \cdot y + b)\big]\, db = 2M - 2\,|v \cdot (x - y)|$.
5. Integrating this term over the unit hypersphere yields a kernel of the form $K(x, y) = A - B\,\|x - y\|$.
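
The derivation can be checked numerically: sampling v uniformly on the unit hypersphere and b uniformly on [-M, M], the Monte Carlo average of sign(v.x + b) sign(v.y + b) should decrease affinely with ||x - y||. The sketch below does this check; the value of M, the sample size and the normalization are illustrative assumptions, not the talk's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_sign_kernel(x, y, M=10.0, n=200_000):
    """Monte Carlo estimate of E[sign(v.x + b) sign(v.y + b)] with v uniform on the
    unit hypersphere and b uniform on [-M, M]."""
    d = len(x)
    v = rng.normal(size=(n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform on the unit hypersphere
    b = rng.uniform(-M, M, size=n)
    return np.mean(np.sign(v @ x + b) * np.sign(v @ y + b))

# The estimate should decrease affinely with ||x - y||, i.e. K(x, y) ~ A - B ||x - y||.
x = np.zeros(3)
for r in (0.5, 1.0, 2.0):
    y = np.array([r, 0.0, 0.0])
    print(r, mc_sign_kernel(x, y))
```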

Additive and multiplicative invariance of the covariance matrix

1. In SVM and kernel regression, the elements of the weight vector $\alpha$ sum to 0.
2. The final solution involves $K\alpha$.
3. Thus, adding a constant to every element of the covariance matrix does not change the solution: $(K + \lambda\, e e^\top)\,\alpha = K\alpha + \lambda\, e\,(e^\top \alpha) = K\alpha$.

The cost is also invariant under a joint rescaling of the kernel, the weights and the regularization:
$C(K, \alpha, b, \lambda) = L(K\alpha + b, Y) + \lambda\, \alpha^\top K \alpha$
$C\!\left(\tfrac{K}{c},\, c\alpha,\, b,\, \tfrac{\lambda}{c}\right) = L\!\left(\tfrac{K}{c}\, c\alpha + b,\, Y\right) + \tfrac{\lambda}{c}\,(c\alpha)^\top \tfrac{K}{c}\,(c\alpha) = C(K, \alpha, b, \lambda)$

Hence neither the additive constant A nor the overall scale of $K_{\mathrm{sign}}$ needs to be tuned, which is why the kernel has no hyperparameter.
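
A quick numeric check of the additive invariance (a generic sketch, not from the talk): with a weight vector whose elements sum to zero, adding a constant to every entry of K leaves K alpha unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
K = rng.normal(size=(n, n)); K = K @ K.T           # some symmetric kernel matrix
alpha = rng.normal(size=n); alpha -= alpha.mean()  # weight vector summing to 0
e, lam = np.ones(n), 3.7

# (K + lam * e e^T) alpha = K alpha + lam * e * (e^T alpha) = K alpha, since e^T alpha = 0
print(np.allclose((K + lam * np.outer(e, e)) @ alpha, K @ alpha))
```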
