
Artificial Neural Network

Linear Discriminant Analysis

Support Vector Machine

Neural Networks
An artificial neural network (ANN) is a machine learning
approach that models the human brain and consists of a number of
artificial neurons.
Neurons in an ANN tend to have fewer connections than
biological neurons.
Each neuron in an ANN receives a number of inputs.
An activation function is applied to these inputs, which results
in the activation level of the neuron (the output value of the neuron).
Knowledge about the learning task is given in the form of
examples called training examples.

An Artificial Neural Network is specified by:

neuron model: the information processing unit of the NN,

an architecture: a set of neurons and links connecting neurons.
Each link has a weight,
a learning algorithm: used for training the NN by modifying the
weights in order to model a particular learning task correctly on the
training examples.

The aim is to obtain a trained NN that generalizes:

it should behave correctly on new instances of the
learning task.

The neuron is the basic information processing unit of a
NN. It consists of:
1 A set of links, describing the neuron inputs, with weights w1, w2,
..., wm
2 An adder function (linear combiner) for computing the weighted
sum of the inputs (real numbers):

u = Σ_{j=1}^{m} w_j x_j

3 An activation function φ for limiting the amplitude of the neuron
output. Here b denotes the bias:

y = φ(u + b)
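As a concrete sketch of the three parts above (links with weights, adder, activation), a single neuron's output can be computed directly; the weights, inputs, and a threshold-0 step activation below are illustrative values, not from the text:

```python
# Minimal sketch of a single artificial neuron (illustrative weights/inputs).
def neuron(x, w, b, phi):
    # adder (linear combiner): weighted sum of the inputs
    u = sum(wj * xj for wj, xj in zip(w, x))
    # activation function applied to u + b
    return phi(u + b)

step = lambda v: 1.0 if v >= 0 else 0.0  # step activation with threshold 0

y = neuron(x=[1.0, 0.5], w=[0.4, -0.2], b=0.1, phi=step)
print(y)  # u = 0.4*1.0 - 0.2*0.5 = 0.3; u + b = 0.4 >= 0, so y = 1.0
```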

The Neuron Diagram

(Figure: schematic of an artificial neuron, showing inputs, weights, summing junction, bias, activation function, and output.)
Bias of a Neuron
The bias b has the effect of applying an affine transformation to
the weighted sum u: v = u + b.
The bias is an external parameter of the neuron. It can be
modeled by adding an extra input x0 = +1 with weight w0 = b:

v = Σ_{j=0}^{m} w_j x_j, where w0 = b and x0 = +1

v is called the induced local field of the neuron.
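A quick numeric check (with made-up weights and inputs) that folding the bias in as weight w0 = b on a constant input x0 = +1 gives the same induced field v = u + b:

```python
# Illustrative check: bias folded in as weight w0 = b on a constant input x0 = +1.
w = [0.4, -0.2]   # w1, w2 (arbitrary example values)
x = [1.0, 0.5]    # x1, x2
b = 0.1

u = sum(wj * xj for wj, xj in zip(w, x))  # weighted sum over j = 1..m
v_explicit = u + b                        # bias added explicitly

w_aug = [b] + w       # w0 = b
x_aug = [1.0] + x     # x0 = +1
v_folded = sum(wj * xj for wj, xj in zip(w_aug, x_aug))  # sum from j = 0

print(v_explicit, v_folded)  # both equal 0.4
```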

Neuron Models
The choice of activation function φ determines the
neuron model.

step function:
φ(v) = a if v < c
       b if v ≥ c

ramp function:
φ(v) = a if v < c
       b if v > d
       a + ((v − c)(b − a) / (d − c)) otherwise

sigmoid function with z, x, y parameters:
φ(v) = z + 1 / (1 + exp(−xv + y))

Gaussian function:
φ(v) = (1 / (√(2π) σ)) exp(−(1/2) ((v − μ) / σ)²)

(Figures: plots of the step, ramp, and sigmoid functions.)
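The four activation functions can be sketched directly; the parameter names a, b, c, d, z, x, y follow the definitions above, and the default values below are illustrative choices, not prescribed by the text:

```python
import math

def step(v, a=0.0, b=1.0, c=0.0):
    # a below the threshold c, b at or above it
    return a if v < c else b

def ramp(v, a=0.0, b=1.0, c=-1.0, d=1.0):
    # a below c, b above d, linear interpolation on [c, d]
    if v < c:
        return a
    if v > d:
        return b
    return a + (v - c) * (b - a) / (d - c)

def sigmoid(v, z=0.0, x=1.0, y=0.0):
    # z is a vertical offset, x a slope, y a horizontal shift
    return z + 1.0 / (1.0 + math.exp(-x * v + y))

def gaussian(v, mu=0.0, sigma=1.0):
    # bell curve centred at mu with width sigma
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

print(step(-1.0), step(1.0), ramp(0.0), sigmoid(0.0))
```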

Linear discriminant Analysis

There are many possible techniques for classification of data. Principal Component
Analysis (PCA) and Linear Discriminant Analysis (LDA) are two commonly used
techniques for data classification and dimensionality reduction. Linear Discriminant
Analysis easily handles the case where the within-class frequencies are unequal, and
its performance has been examined on randomly generated test data.
This method maximizes the ratio of between-class variance to within-class
variance in any particular data set, thereby guaranteeing maximal separability. The
use of Linear Discriminant Analysis for data classification has been applied to
classification problems in speech recognition. We decided to implement an algorithm
for LDA in the hope of providing better classification than Principal Components
Analysis.
The prime difference between LDA and PCA is that PCA does more of a feature
classification while LDA does data classification. In PCA, the shape and location of
the original data sets change when transformed to a different space, whereas LDA
does not change the location but only tries to provide more class separability and draw
a decision region between the given classes. This method also helps to better
understand the distribution of the feature data. Figure 1 will be used as an example to
explain and illustrate the theory of LDA.

DIFFERENT APPROACHES TO LDA

Data sets can be transformed, and test vectors can be classified in the
transformed space, by two different approaches:

Class-dependent transformation: This type of approach involves
maximizing the ratio of between-class variance to within-class variance.
The main objective is to maximize this ratio so that adequate class
separability is obtained. The class-specific approach involves using
two optimizing criteria for transforming the data sets independently.

Class-independent transformation: This approach involves maximizing the
ratio of overall variance to within-class variance. This approach uses only
one optimizing criterion to transform the data sets, and hence all data points,
irrespective of their class identity, are transformed using this transform. In
this type of LDA, each class is considered as a separate class against all
other classes.
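A minimal NumPy sketch of the scatter-ratio idea: compute the within-class and between-class scatter matrices on a tiny made-up 2-D data set and take the leading eigenvector of Sw⁻¹Sb as the discriminant direction. The data points and class sizes are invented for illustration:

```python
import numpy as np

# Two tiny made-up classes in 2-D
X1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]])
X2 = np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu = np.vstack([X1, X2]).mean(axis=0)  # overall mean

# Within-class scatter: sum of per-class scatter matrices
Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Between-class scatter: class means spread around the overall mean
Sb = 3 * np.outer(mu1 - mu, mu1 - mu) + 3 * np.outer(mu2 - mu, mu2 - mu)

# Discriminant direction: leading eigenvector of Sw^{-1} Sb
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
w = eigvecs[:, np.argmax(eigvals.real)].real

# Projections of the two classes onto w should separate
p1, p2 = X1 @ w, X2 @ w
print(sorted(p1), sorted(p2))
```

The eigenvector's sign is arbitrary, so the two classes may project to either side, but their projection intervals do not overlap.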

Support Vector Machine

Support Vector Machines are based on the concept of decision planes that define
decision boundaries. A decision plane is one that separates a set of objects
having different class memberships. A schematic example is shown in the illustration
below. In this example, the objects belong either to class GREEN or to class RED.
The separating line defines a boundary: all objects on its right side are
GREEN and all objects on its left side are RED. Any new object (white circle)
falling to the right is labeled, i.e., classified, as GREEN (or classified as RED should
it fall to the left of the separating line).

The above is a classic example of a linear classifier, i.e., a classifier
that separates a set of objects into their respective groups (GREEN
and RED in this case) with a line. Most classification tasks,
however, are not that simple, and often more complex structures are
needed in order to make an optimal separation, i.e., correctly
classify new objects (test cases) on the basis of the examples that are
available (training cases).
This situation is depicted in the illustration below. Compared to the
previous schematic, it is clear that a full separation of the GREEN
and RED objects would require a curve (which is more complex
than a line). Classification tasks based on drawing separating lines to
distinguish between objects of different class memberships are
known as hyperplane classifiers. Support Vector Machines are
particularly suited to handle such tasks.

The illustration below shows the basic idea behind
Support Vector Machines. Here we see the original
objects (left side of the schematic) mapped, i.e.,
rearranged, using a set of mathematical functions
known as kernels. The process of rearranging the
objects is known as mapping (transformation). Note
that in this new setting, the mapped objects (right side
of the schematic) are linearly separable; thus, instead
of constructing the complex curve (left schematic), all
we have to do is find an optimal line that can
separate the GREEN and the RED objects.
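The mapping idea can be made concrete with a classic toy case: points on a line labelled by whether they lie inside (−1, 1) are not linearly separable in 1-D, but the feature map φ(x) = (x, x²), chosen here purely for illustration, makes them separable by a horizontal line in 2-D:

```python
# Toy illustration of mapping: 1-D points that are not linearly separable
# become separable after the feature map phi(x) = (x, x^2).
reds = [-0.5, 0.0, 0.5]          # class RED: inside (-1, 1)
greens = [-2.0, -1.5, 1.5, 2.0]  # class GREEN: outside

def phi(x):
    # explicit 1-D -> 2-D feature map; a kernel would compute the
    # corresponding inner products without forming phi explicitly
    return (x, x * x)

# In the mapped space, the horizontal line x2 = 1 separates the classes:
separable = (all(phi(x)[1] < 1 for x in reds)
             and all(phi(x)[1] > 1 for x in greens))
print(separable)  # True
```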


Technical Notes
Support Vector Machine (SVM) is primarily a classifier method that performs
classification tasks by constructing hyperplanes in a multidimensional space that
separate cases of different class labels. SVM supports both regression and
classification tasks and can handle multiple continuous and categorical variables.
For categorical variables a dummy variable is created with case values as either 0 or
1. Thus, a categorical dependent variable consisting of three levels, say (A, B, C), is
represented by a set of three dummy variables:
A: {1 0 0}, B: {0 1 0}, C: {0 0 1}
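The dummy-variable encoding described above can be written as a tiny helper (the category names A, B, C are taken from the text):

```python
def one_hot(label, levels):
    # dummy encoding: 1 in the position of the label, 0 elsewhere
    return [1 if lvl == label else 0 for lvl in levels]

levels = ["A", "B", "C"]
print(one_hot("A", levels))  # [1, 0, 0]
print(one_hot("B", levels))  # [0, 1, 0]
print(one_hot("C", levels))  # [0, 0, 1]
```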
To construct an optimal hyperplane, SVM employs an iterative training algorithm,
which is used to minimize an error function. According to the form of the error
function, SVM models can be classified into four distinct groups:
Classification SVM Type 1 (also known as C-SVM classification)
Classification SVM Type 2 (also known as nu-SVM classification)
Regression SVM Type 1 (also known as epsilon-SVM regression)
Regression SVM Type 2 (also known as nu-SVM regression)

Example for SVM

(Figure: worked numerical SVM example.)

You do not have to solve these equations directly;
the alpha values will be given.
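Consistent with the note above that the alpha values are given, the SVM decision function f(x) = sign(Σᵢ αᵢ yᵢ K(xᵢ, x) + b) can be evaluated directly once the alphas, support vectors, and bias are known. All numbers below are made up for illustration, not taken from any real training run:

```python
# Evaluate an SVM decision function with given (made-up) alphas.
# f(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )

def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def svm_decision(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + b >= 0 else -1

# Hypothetical "given" values for the example
support_vectors = [(1.0, 1.0), (3.0, 3.0)]
labels = [-1, 1]
alphas = [0.25, 0.25]
b = -1.0

print(svm_decision((0.0, 0.0), support_vectors, labels, alphas, b))  # -1
print(svm_decision((4.0, 4.0), support_vectors, labels, alphas, b))  # 1
```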