

SVM: Support Vector Machines
A new classification method for both linear and
nonlinear data
It uses a nonlinear mapping to transform the original
training data into a higher dimension
In the new dimension, it searches for the linear
optimal separating hyperplane (i.e., a decision boundary)
With an appropriate nonlinear mapping to a
sufficiently high dimension, data from two classes
can always be separated by a hyperplane
SVM finds this hyperplane using support vectors
(essential training tuples) and margins (defined by
the support vectors)
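The nonlinear-mapping idea above can be shown on a tiny invented example: 1-D data that no single threshold separates becomes linearly separable after mapping each point x to (x, x^2). This is a minimal sketch; the data and the mapping are illustrative, not from the slides.

```python
import numpy as np

# 1-D data that is NOT linearly separable: the positive class
# sits on both sides of the negative class on the number line.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1, -1, -1, 1])          # class labels

# Nonlinear mapping phi(x) = (x, x^2) lifts the data to 2-D,
# where the horizontal line x^2 = 2.5 separates the classes.
phi = np.column_stack([x, x ** 2])

pred = np.where(phi[:, 1] > 2.5, 1, -1)   # a linear rule in the new space
print(np.array_equal(pred, y))            # True: now separable
```

The separating line in the lifted space corresponds to a nonlinear boundary (two thresholds) in the original 1-D space.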
SVM: History and Applications
Vapnik and colleagues (1992): groundwork from
Vapnik & Chervonenkis' statistical learning theory in the 1960s
Features: training can be slow but accuracy is high
owing to their ability to model complex nonlinear
decision boundaries (margin maximization)
Used both for classification and prediction
Applications: handwritten digit recognition, object recognition,
speaker identification, benchmarking time-series
prediction tests

SVM: General Philosophy

Data set D: {(X1, y1), (X2, y2), ..., (Xn, yn)}

Xi: training tuples

yi: class labels
Linearly Separable

Best separating line: the one with minimum classification error


Plane / Hyperplane
Maximal Margin Hyperplane (MMH)

Larger margin: more accurate at classifying future samples

Gives the largest separation between the classes

Sides of the margin are parallel to the hyperplane

Margin: distance from the MMH to the closest training tuple of either class
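The MMH and margin can be computed exactly in the simplest possible case: one training tuple per class. Both tuples are then support vectors and the MMH is their perpendicular bisector. The two points below are invented for illustration.

```python
import numpy as np

# One training tuple per class; both are support vectors.
x_pos = np.array([1.0, 1.0])    # class +1
x_neg = np.array([-1.0, -1.0])  # class -1

# Hyperplane w.x + b = 0: w points from the negative to the
# positive support vector; b centers the plane on the midpoint.
w = x_pos - x_neg
midpoint = (x_pos + x_neg) / 2
b = -np.dot(w, midpoint)

def decision(x):
    return np.dot(w, x) + b

# Distance from the MMH to each closest tuple (half the margin width)
dist = abs(decision(x_pos)) / np.linalg.norm(w)

print(decision(x_pos) > 0, decision(x_neg) < 0)   # True True: separated
print(round(dist, 3))                             # 1.414, i.e. sqrt(2)
```

Any other hyperplane between the two points would be closer to one of them, so this bisector is the one with the maximal margin.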

SVM: Linearly Separable High-Dimensional Data

Classification using SVM

Maximal Margin Hyperplane: d(X^T) = Σ_{i=1..l} y_i α_i X_i · X^T + b_0

Xi: support vectors (l of them), α_i: Lagrange multipliers, b_0: bias, X^T: test tuple

Class depends on sign of result
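The decision function above uses only dot products between the test tuple and the support vectors. A minimal sketch with hypothetical trained quantities (the support vectors, multipliers, and bias below are invented, not the output of a real fit):

```python
import numpy as np

# Hypothetical trained quantities: l = 2 support vectors
# with their class labels and Lagrange multipliers.
support_vectors = np.array([[1.0, 1.0],     # X1, class +1
                            [-1.0, -1.0]])  # X2, class -1
y = np.array([1.0, -1.0])                   # y_i
alpha = np.array([0.5, 0.5])                # alpha_i
b0 = 0.0

def d(x_test):
    """d(X^T) = sum_i y_i * alpha_i * (X_i . X^T) + b0"""
    return np.sum(y * alpha * (support_vectors @ x_test)) + b0

# The class depends only on the sign of the result.
print(np.sign(d(np.array([2.0, 3.0]))))    # 1.0  -> class +1
print(np.sign(d(np.array([-1.0, -2.0]))))  # -1.0 -> class -1
```

Note that only the l support vectors enter the sum; all other training tuples have α_i = 0 and can be discarded after training.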

SVM vs. Neural Network

SVM
Relatively new concept
Deterministic algorithm
Nice generalization properties
Hard to learn: learned in batch mode using
quadratic programming techniques

Using kernels, can learn very complex functions

Neural Network
Relatively old
Nondeterministic algorithm
Generalizes well but doesn't have a strong
mathematical foundation
Can easily be learned in incremental fashion
To learn complex functions, use a multilayer perceptron (nontrivial)
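The kernel remark above rests on the fact that a kernel computes a dot product in a higher-dimensional feature space without ever building that space. This sketch verifies it numerically for the degree-2 polynomial kernel on 2-D inputs (the feature map phi is the standard one for this kernel; the test points are invented):

```python
import numpy as np

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: K(x, z) = (x . z + 1)^2."""
    return (np.dot(x, z) + 1.0) ** 2

def phi(x):
    """Explicit 6-D feature map whose dot product equals the kernel:
    (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# The kernel gives the feature-space dot product directly,
# without constructing phi(x) or phi(z).
print(np.isclose(poly_kernel(x, z), np.dot(phi(x), phi(z))))  # True
```

Replacing X_i · X^T in the decision function with K(X_i, X^T) is what lets an SVM learn nonlinear boundaries at linear cost per dot product.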

Associative Classification

Associative classification

Association rules are generated and analyzed for

use in classification

Search for strong associations between frequent

patterns (conjunctions of attribute-value pairs)
and class labels

Classification: based on evaluating a set of rules

of the form
p1 ^ p2 ^ ... ^ pl → "Aclass = C" (conf, sup)


It explores highly confident associations among

multiple attributes

Overcomes some constraints introduced by

decision-tree induction, which considers only one
attribute at a time

Associative classification has been found to be

more accurate than some traditional classification
methods, such as C4.5
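The rule form above (conjunctions of attribute-value pairs with a confidence and support) can be mined with plain counting for single-condition rules. A minimal sketch on an invented toy dataset; real methods such as CBA mine multi-condition cond-sets with Apriori-style iteration.

```python
from collections import Counter

# Toy training records: (attribute-value dict, class label).
records = [
    ({"outlook": "sunny", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain",  "windy": "yes"}, "stay"),
    ({"outlook": "rain",  "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "no"},  "play"),
]

def mine_rules(records, min_sup=0.2, min_conf=0.6):
    """Mine single-condition rules (attr=value) -> class, with
    support = P(cond and class) and confidence = P(class | cond)."""
    n = len(records)
    cond_count = Counter()   # how often each attr=value occurs
    rule_count = Counter()   # how often it occurs with each class
    for attrs, label in records:
        for item in attrs.items():
            cond_count[item] += 1
            rule_count[(item, label)] += 1
    rules = []
    for (item, label), cnt in rule_count.items():
        sup, conf = cnt / n, cnt / cond_count[item]
        if sup >= min_sup and conf >= min_conf:
            rules.append((item, label, conf, sup))
    return rules

rules = mine_rules(records)
for (attr, val), label, conf, sup in sorted(rules, key=lambda r: -r[2]):
    print(f"{attr}={val} -> class={label} (conf={conf:.2f}, sup={sup:.2f})")
```

On this data only the strong associations survive the thresholds, e.g. outlook=sunny -> play with conf 1.00 and sup 0.60.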

Typical Associative Classification Methods

CBA (Classification By Association)

Mine possible rules of the form

Cond-set (a set of attribute-value pairs)

→ class label

Iterative Approach similar to Apriori

Build classifier: Organize rules according to

decreasing precedence based on confidence and
then support
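CBA's precedence ordering and first-match classification can be sketched directly: sort rules by confidence, break ties by support, and predict with the first rule whose cond-set the tuple satisfies. The rules and their statistics below are hypothetical.

```python
# Hypothetical mined rules: (cond-set, class, confidence, support).
rules = [
    ({("outlook", "sunny")},                  "play", 0.90, 0.40),
    ({("outlook", "rain"), ("windy", "yes")}, "stay", 0.95, 0.20),
    ({("windy", "no")},                       "play", 0.90, 0.60),
]
default_class = "play"  # fallback when no rule fires

# CBA precedence: decreasing confidence, then decreasing support.
ranked = sorted(rules, key=lambda r: (-r[2], -r[3]))

def classify(tuple_attrs):
    """Return the class of the first (highest-precedence) rule
    whose cond-set the tuple satisfies, else the default class."""
    items = set(tuple_attrs.items())
    for cond, label, _, _ in ranked:
        if cond <= items:          # every attribute-value pair matches
            return label
    return default_class

print(classify({"outlook": "rain", "windy": "yes"}))  # stay
print(classify({"outlook": "rain", "windy": "no"}))   # play
```

The more specific, higher-confidence rain-and-windy rule outranks both single-condition rules, which is exactly what the precedence ordering is for.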

CMAR (Classification based on Multiple Association Rules)

Uses variant of FP-Growth

Mined rules are stored and retrieved using a tree structure


Rule pruning is carried out: specialized rules

with low confidence are pruned

Classification: statistical analysis on multiple rules


CPAR (Classification based on Predictive

Association Rules)

Generation of predictive rules (FOIL-like analysis)


High efficiency, accuracy similar to CMAR

Uses the best-k rules of each group to predict the class label
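The best-k idea can be sketched as follows: for each class, average the accuracies of the k best matching rules and predict the class with the largest average. This is a simplified sketch of the notion of expected accuracy; the rule accuracies below are invented.

```python
# Hypothetical accuracies of the rules that match some test tuple,
# grouped by the class each rule predicts.
rule_accuracy = {
    "play": [0.95, 0.90, 0.85, 0.60],
    "stay": [0.92, 0.70],
}

def predict(rule_accuracy, k=3):
    """Best-k rules per class: average the k highest accuracies in
    each group and predict the class with the largest average."""
    best = {c: sum(sorted(accs, reverse=True)[:k]) / min(k, len(accs))
            for c, accs in rule_accuracy.items()}
    return max(best, key=best.get), best

label, scores = predict(rule_accuracy)
print(label)                      # play
print(round(scores["play"], 3))   # 0.9
```

Averaging only the best k rules keeps one weak low-accuracy rule (the 0.60 here) from dragging down an otherwise strong class.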