\[ y(x) = \mathrm{sign}\Big[ \sum_{k=1}^{N} \alpha_k \, y_k \, K(x, x_k) + b \Big] \]
where the \(\alpha_k\) are called support values and \(b\) is a constant. \(K(\cdot,\cdot)\) is the kernel, which can be either \(K(x, x_k) = x_k^T x\) (linear SVM); \(K(x, x_k) = (x_k^T x + 1)^d\) (polynomial SVM of degree \(d\)); \(K(x, x_k) = \tanh[\kappa \, x_k^T x + \theta]\) (multilayer perceptron SVM); or \(K(x, x_k) = \exp\{-\|x - x_k\|_2^2 / \sigma^2\}\) (RBF SVM), where \(\kappa\), \(\theta\), and \(\sigma\) are constants.
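As an illustration (not part of the original paper), the four kernels above and the resulting classifier \(y(x) = \mathrm{sign}[\sum_k \alpha_k y_k K(x, x_k) + b]\) can be written down directly in Python with NumPy; all parameter values below are hypothetical defaults:

```python
import numpy as np

# The four kernel choices from the text; kappa, theta, d and sigma are the
# constants mentioned there (default values here are illustrative only).
def k_linear(x, xk):
    return xk @ x

def k_poly(x, xk, d=2):
    return (xk @ x + 1.0) ** d

def k_mlp(x, xk, kappa=1.0, theta=0.0):
    return np.tanh(kappa * (xk @ x) + theta)

def k_rbf(x, xk, sigma=1.0):
    return np.exp(-np.sum((x - xk) ** 2) / sigma ** 2)

def classify(x, X, y, alpha, b, kernel=k_rbf):
    """Evaluate y(x) = sign[ sum_k alpha_k y_k K(x, x_k) + b ]."""
    s = sum(a * yk * kernel(x, xk) for a, yk, xk in zip(alpha, y, X))
    return np.sign(s + b)
```

Given training points, labels, support values and bias, `classify` reproduces the decision rule; how \(\alpha_k\) and \(b\) are obtained is the subject of the derivation that follows.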
For instance, the problem of classifying two classes is defined as
\[ w^T \varphi(x_k) + b \ge +1, \quad \text{if } y_k = +1, \]
\[ w^T \varphi(x_k) + b \le -1, \quad \text{if } y_k = -1. \]
This can also be written as
\[ y_k \big[ w^T \varphi(x_k) + b \big] \ge 1, \qquad k = 1, \ldots, N, \]
where \(\varphi(\cdot)\) is a nonlinear function mapping the input space to a higher dimensional space. LS-SVM classifiers are obtained as the solution of the following optimization problem:
\[ \min_{w, b, e} \; J_{LS}(w, b, e) = \tfrac{1}{2} w^T w + \gamma \, \tfrac{1}{2} \sum_{k=1}^{N} e_k^2 \]
subject to the equality constraints
\[ y_k \big[ w^T \varphi(x_k) + b \big] = 1 - e_k, \qquad k = 1, \ldots, N. \]
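To see concretely what this optimization computes, one can take \(\varphi\) as the identity map (a linear kernel) and substitute \(e_k = 1 - y_k(w^T x_k + b)\) from the constraints into \(J_{LS}\); the problem then becomes an unconstrained ridge-type least squares fit, solvable by setting the gradient to zero. The sketch below (Python/NumPy, an illustration not taken from the paper; the data are made up) does exactly that:

```python
import numpy as np

def lssvm_linear_primal(X, y, gamma=50.0):
    """Minimize 1/2 w'w + gamma/2 * sum_k e_k^2 with
    e_k = 1 - y_k (w' x_k + b), i.e. phi = identity (linear kernel).
    Setting the gradient to zero gives (D + gamma A'A) v = gamma A'1,
    where v = [w; b], A has rows y_k [x_k; 1], and D regularizes w only."""
    N, d = X.shape
    A = y[:, None] * np.hstack([X, np.ones((N, 1))])
    D = np.eye(d + 1)
    D[d, d] = 0.0                                  # bias b is not regularized
    v = np.linalg.solve(D + gamma * A.T @ A, gamma * A.T @ np.ones(N))
    return v[:d], v[d]                             # w, b

# Tiny linearly separable example (hypothetical data, not the paper's benchmark)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = lssvm_linear_primal(X, y)
pred = np.sign(X @ w + b)
```

For a nonlinear \(\varphi\) this direct substitution is not available, which is why the paper proceeds via the Lagrangian dual below.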
The Lagrangian is defined as
\[ L(w, b, e; \alpha) = J_{LS}(w, b, e) - \sum_{k=1}^{N} \alpha_k \big\{ y_k \big[ w^T \varphi(x_k) + b \big] - 1 + e_k \big\} \]
with Lagrange multipliers \(\alpha_k \in \mathbb{R}\) (called support values).
The conditions for optimality are given by
\[ \frac{\partial L}{\partial w} = 0 \;\rightarrow\; w = \sum_{k=1}^{N} \alpha_k y_k \varphi(x_k), \]
\[ \frac{\partial L}{\partial b} = 0 \;\rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0, \]
\[ \frac{\partial L}{\partial e_k} = 0 \;\rightarrow\; \alpha_k = \gamma \, e_k, \]
\[ \frac{\partial L}{\partial \alpha_k} = 0 \;\rightarrow\; y_k \big[ w^T \varphi(x_k) + b \big] - 1 + e_k = 0, \]
for \(k = 1, \ldots, N\). After elimination of \(w\) and \(e\) one obtains the solution
\[ \begin{bmatrix} 0 & Y^T \\ Y & ZZ^T + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix} \]
with \(Z = [\varphi(x_1)^T y_1; \ldots; \varphi(x_N)^T y_N]\), \(Y = [y_1; \ldots; y_N]\), \(1_v = [1; \ldots; 1]\), \(e = [e_1; \ldots; e_N]\), and \(\alpha = [\alpha_1; \ldots; \alpha_N]\).
Mercer's condition is applied to the matrix \(\Omega = ZZ^T\) with
\[ \Omega_{kl} = y_k y_l \, \varphi(x_k)^T \varphi(x_l) = y_k y_l \, K(x_k, x_l). \]
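Putting the pieces together, training amounts to forming \(\Omega_{kl} = y_k y_l K(x_k, x_l)\) and solving the linear system in \((b, \alpha)\). A minimal NumPy sketch, with a direct dense solve standing in for the paper's large scale algorithm (the data in the test are hypothetical, not the benchmark set):

```python
import numpy as np

def lssvm_train(X, y, gamma=50.0, sigma=1.0):
    """Solve [[0, Y'], [Y, Omega + I/gamma]] [b; alpha] = [0; 1_v]
    for the RBF kernel K(x_k, x_l) = exp(-||x_k - x_l||^2 / sigma^2)."""
    N = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    Omega = np.outer(y, y) * np.exp(-sq / sigma ** 2)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y                        # Y^T
    A[1:, 0] = y                        # Y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate([[0.0], np.ones(N)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # b, alpha

def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    k = np.exp(-np.sum((X - x) ** 2, axis=1) / sigma ** 2)
    return np.sign(np.sum(alpha * y * k) + b)
```

Note that no \(\alpha_k\) is forced to zero here, consistent with the remark below that in the LS-SVM case the support values are proportional to the errors \(e_k\).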
The kernel parameters, e.g. \(\sigma\) for the RBF kernel, can be optimally chosen by optimizing an upper bound on the VC
dimension. The support values \(\alpha_k\) are proportional to the errors at the data points in the LS-SVM case, while in the
standard SVM case many support values are typically equal to zero. When solving large linear systems, it becomes
necessary to apply iterative methods [2].
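One standard way to bring conjugate gradients to bear, sketched below (an illustration; the paper does not spell out this particular reduction), is to note that \(H = \Omega + \gamma^{-1} I\) is symmetric positive definite, eliminate \(b\) by block elimination, and solve the two PD systems \(H u = 1_v\) and \(H v = Y\) with CG; then \(b = Y^T u / Y^T v\) and \(\alpha = u - b\,v\). The data set here is random and purely illustrative:

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
N, gamma, sigma = 200, 50.0, 1.0
X = rng.standard_normal((N, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)

# H = Omega + I/gamma is symmetric positive definite for an RBF kernel
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
H = np.outer(y, y) * np.exp(-sq / sigma ** 2) + np.eye(N) / gamma

# Two positive definite solves by conjugate gradients instead of one
# indefinite KKT solve (the full KKT matrix has a zero (1,1) block).
u, info_u = cg(H, np.ones(N))
v, info_v = cg(H, y)
b = (y @ u) / (y @ v)
alpha = u - b * v            # sum_k alpha_k y_k = 0 holds by construction
```

Since only matrix-vector products with \(H\) are needed, the kernel matrix never has to be factored, which is what makes large \(N\) tractable.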
BENCHMARK: MULTI TWO-SPIRAL PROBLEM
One of the well-known benchmark problems for assessing the quality of neural network classifiers is the two-spiral
problem [3]. In [5] the excellent training and generalization performance of the LS-SVM with RBF kernel on this
problem has been shown. In the following, a more complicated multi two-spiral classification problem is considered,
depicted in Figure 1. Given are 432 training data which consist of two classes, indicated by '*' and 'o'. An LS-SVM
classifier using the RBF kernel was trained with \(\sigma = 1\) and \(\gamma = 50\). The resulting classifier, with support values \(\alpha_k\) and bias term \(b\)
obtained from the large scale algorithm, is shown in Figure 1. Taking 432 support values, one has no
misclassification on the training set, together with excellent generalization, as is clear from the decision boundary
between the black and white regions. The support value spectrum is depicted in Figure 2, where the obtained
support values are sorted from largest to smallest.
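The paper does not give the generator for the 432 multi-spiral points; as a rough illustration, a classical two-spiral data set of the same size can be produced as follows (all constants here are hypothetical, not those of the benchmark):

```python
import numpy as np

def two_spirals(n_per_class=216, turns=3.0, noise=0.0, seed=0):
    """Generate two interleaved spirals labeled +1 / -1 (illustrative only)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_per_class)
    angle = 2.0 * np.pi * turns * t
    radius = t
    x1 = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
    x2 = -x1                                   # second spiral: rotated by pi
    X = np.vstack([x1, x2]) + noise * rng.standard_normal((2 * n_per_class, 2))
    y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y
```

Such a set is not linearly separable in the input space, which is what makes it a stress test for the RBF kernel and the choice of \(\sigma\) and \(\gamma\).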
FIGURE 1. Multi two-spiral classification problem with 432 training data (class 1 and class 2 are indicated by * and o). The
black and white regions, which determine the decision boundary between the two classes, clearly show the excellent generalization
performance of the LS-SVM with RBF kernel.
FIGURE 2. The spectrum of support values related to the classification problem in Figure 1. The support values \(\alpha_k\) are sorted
from the largest to the smallest value for the given training data set.
CONCLUSION
In this paper a least squares version of support vector machines (LS-SVM) is explained. The solution of the linear
system can be calculated efficiently using a conjugate gradient method. As illustrated on the difficult multi two-spiral
classification problem, excellent generalization performance can be obtained using the LS-SVM approach in the separable
case. The performance of the classifier turns out to be quite robust with respect to the tuning parameters of the
algorithm.
ACKNOWLEDGEMENTS
This research work was carried out at the ESAT laboratory and the Interdisciplinary Center of Neural Networks
ICNN of the Katholieke Universiteit Leuven, in the framework of the FWO project Learning and Optimization: an
Interdisciplinary Approach, the Belgian Program on Interuniversity Poles of Attraction, initiated by the Belgian
State, Prime Minister's Office for Science, Technology and Culture (IUAP P4-02 & IUAP P4-24) and the Concerted
Action Project MEFISTO of the Flemish Community. Johan Suykens is a postdoctoral researcher with the National
Fund for Scientific Research FWO - Flanders.
REFERENCES
[1] Fletcher R., Johnson T., On the stability of null-space methods for KKT systems, SIAM J. Matrix Anal. Appl.,
Vol. 18, No. 4, 1997, pp. 938-958.
[2] Golub G.H., Van Loan C.F., Matrix Computations, Baltimore, MD: Johns Hopkins University Press, 1989.
[3] Ridella S., Rovetta S., Zunino R., Circular backpropagation networks for classification, IEEE Transactions on
Neural Networks, Vol. 8, No. 1, 1997, pp. 84-97.
[4] Sun J.-G., Structured backwards errors for KKT systems, Linear Algebra and its Applications, 288, 1999,
pp. 75-88.
[5] Suykens J.A.K., Vandewalle J., Least squares support vector machine classifiers, Neural Processing Letters,
Vol. 9, No. 3, Jun. 1999, pp. 293-300.
[6] Suykens J.A.K., Vandewalle J., Training multilayer perceptron classifiers based on a modified support vector
method, IEEE Transactions on Neural Networks, 1999.
[7] Vapnik V. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.