
Neural Networks

Chapter 1
Rosenblatt's Perceptron
Dr. Vincent A. Cassella
Catholic University of America
Material Acknowledgement
Neural Networks and Learning Machines, Third Edition
Simon Haykin

Copyright 2009 by Pearson Education, Inc.


Upper Saddle River, New Jersey 07458
All rights reserved.

Perceptron
Figure 1.1 Signal-flow graph of the perceptron.

The hard limiter produces the output y = φ(v), where
φ(v) = +1 if v ≥ 0
φ(v) = -1 if v < 0
and v is the induced local field (the weighted sum of the inputs plus the bias). The decision threshold is at v = 0.


The goal of the perceptron is to correctly classify the set of externally applied stimuli x1, x2, ..., xm into one of two classes, C1 or C2. The decision rule for the classification is to assign the point represented by the inputs x1, x2, ..., xm to class C1 if the perceptron output y = +1, and to class C2 if it is -1.
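As a quick illustration of this decision rule, here is a minimal Python sketch (the function name perceptron_output and the NumPy-based layout are illustrative choices, not part of the slides):

import numpy as np

def perceptron_output(x, w, b):
    """Classify one input vector with a fixed perceptron.

    x : input vector [x1, ..., xm]
    w : weight vector [w1, ..., wm]
    b : bias
    Returns +1 (class C1) if the induced local field is >= 0, else -1 (class C2).
    """
    v = np.dot(w, x) + b          # induced local field
    return 1 if v >= 0 else -1

# Example: a two-input perceptron
print(perceptron_output(np.array([0.5, -1.0]), np.array([2.0, 1.0]), b=0.25))  # -> 1 (class C1)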


Perceptron (Hyperplane)
Figure 1.2 Illustration of the hyperplane (in this example,
a straight line) as decision boundary for a
two-dimensional, two-class pattern-classification problem.

(The figure shows the two decision regions, labeled Class 1 and Class 2, separated by the decision threshold.)
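For the two-dimensional case of this figure, the decision boundary can be written out explicitly; the following is a reconstruction of the standard form, using the weights w1, w2 and bias b introduced above:

\[
w_1 x_1 + w_2 x_2 + b = 0
\]

Points with w1x1 + w2x2 + b > 0 fall on the Class 1 side of the line, and points with w1x1 + w2x2 + b < 0 on the Class 2 side.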


Perceptron Convergence Theorem


To understand the error-correction learning algorithm for the perceptron, it is convenient to work with the modified (equivalent) signal-flow graph model.
Figure 1.3 Equivalent signal-flow graph of the perceptron;
dependence on time has been omitted for clarity.

x(n) = [+1, x1(n), x2(n), ..., xm(n)]T

w(n) = [b, w1(n), w2(n), ..., wm(n)]T

With the bias absorbed into the weight vector (matching the fixed input +1), the linear combiner output is v(n) = wT(n) x(n).


Perceptron Convergence Theorem


When the data are linearly separable, there exists a weight vector w such that we may state the following:
wT x > 0 for every input vector x belonging to Class 1.
wT x ≤ 0 for every input vector x belonging to Class 2.

Figure 1.4 (a) A pair of linearly separable patterns (the perceptron works). (b) A pair of non-linearly separable patterns (the perceptron does not work).
Which of the two classes receives the equality (wT x = 0) is arbitrary.

Perceptron Convergence Theorem


Algorithm for adapting the weight vector.
1. If the nth member of the training set, x(n), is correctly classified by the weight vector w(n) computed at the nth iteration of the algorithm, no correction is made to the weight vector of the perceptron, in accordance with the rule:

w(n+1) = w(n) if wT(n) x(n) > 0 and x(n) belongs to Class 1.
w(n+1) = w(n) if wT(n) x(n) ≤ 0 and x(n) belongs to Class 2.
2. Otherwise, the weight vector of the perceptron is updated in accordance with the following rule:

w(n+1) = w(n) - η(n) x(n) if wT(n) x(n) > 0 and x(n) belongs to Class 2.
w(n+1) = w(n) + η(n) x(n) if wT(n) x(n) ≤ 0 and x(n) belongs to Class 1.

The learning-rate parameter η(n) > 0 controls the adjustment applied to the weight vector w at iteration n.
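Below is a minimal Python sketch of this adaptation rule. The constant learning rate, the epoch loop, the zero initialization of the weights, and the toy data are illustrative choices, not taken from the slides:

import numpy as np

def train_perceptron(X, labels, eta=1.0, max_epochs=100):
    """Fixed-increment perceptron learning.

    X      : array of shape (N, m), one input vector per row
    labels : array of +1 (Class 1) or -1 (Class 2)
    eta    : learning-rate parameter eta(n) > 0 (held constant here)
    Returns the augmented weight vector w = [b, w1, ..., wm].
    """
    # Augment each input with the fixed component +1 so the bias is w[0].
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xa.shape[1])             # w(0) = 0

    for _ in range(max_epochs):
        errors = 0
        for x, d in zip(Xa, labels):
            v = np.dot(w, x)               # w^T(n) x(n)
            if v > 0 and d == -1:          # output says Class 1, sample is Class 2
                w = w - eta * x
                errors += 1
            elif v <= 0 and d == +1:       # output says Class 2, sample is Class 1
                w = w + eta * x
                errors += 1
        if errors == 0:                    # every sample correctly classified
            break
    return w

# Toy example with two linearly separable clusters (illustrative data only)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
labels = np.array([+1, +1, -1, -1])
print(train_perceptron(X, labels))         # a weight vector separating the clusters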

Perceptron Convergence Theorem


w(n+1) = w(n) - η(n) x(n) if wT(n) x(n) > 0 and x(n) belongs to Class 2.
w(n+1) = w(n) + η(n) x(n) if wT(n) x(n) ≤ 0 and x(n) belongs to Class 1.

Each misclassified sample adjusts the weight vector in a direction that will enable the perceptron to ultimately classify all of the data correctly.

There are conflicting requirements on the learning-rate parameter η(n): a small η(n) is needed for stable weight estimates, while a large η(n) is needed for fast learning.


Perceptron Convergence Theorem

Fixed-increment convergence theorem: if the two classes are linearly separable, then with initial weight vector w(0) = 0 and learning rate η(n) = 1, the adaptation rule above converges after a finite number of iterations; that is, the weight vector stops changing once it correctly classifies every training vector.

Perceptron and Two-class Bayes Classifier


We are given an observation x, and we need to determine the class from which the observation came. For example, we need to determine whether an apple is good or rotten based on its observed average RGB color.

The Bayes classifier assigns every possible observation to either Class 1 or Class 2. It will make mistakes whenever an observation could have come from either class; the goal is therefore to construct the classifier so that it minimizes the average risk (expected cost).

Definitions:
cij is the cost of classifying observation x as Class i when it actually came from Class j.
pi is the prior probability that an observation comes from Class i (p1 + p2 = 1).
px(x|Ci) is the conditional probability density of observing x given that it came from Class i.


Perceptron and Two-class Bayes Classifier


The Bayes classifier divides the observation space into two regions, one assigned to Class 1 and one to Class 2, in a way that minimizes the risk of the classification.

The product pi px(x|Ci) is the joint probability of the observation coming from Class i (e.g., a good apple or a rotten apple) AND being observed as x (e.g., a specific average RGB color of the apple).

The average risk is a sum of four terms: two for correct classifications and two for wrong classifications.

Apple example: c11 = cost of bagging a good apple, c22 = cost of trashing a rotten apple, c21 = cost of trashing a good apple, c12 = cost of bagging a rotten apple. Note: c21 > c11 and c12 > c22 (wrong decisions cost more than correct ones).
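For reference, a reconstruction of the standard two-class average-risk expression that these labels refer to (following the formulation in Haykin's text; X1 and X2 denote the regions of observation space assigned to Class 1 and Class 2):

\[
R = \underbrace{c_{11} p_1 \int_{\mathcal{X}_1} p_X(x \mid C_1)\, dx
  + c_{22} p_2 \int_{\mathcal{X}_2} p_X(x \mid C_2)\, dx}_{\text{correct classifications}}
  + \underbrace{c_{21} p_1 \int_{\mathcal{X}_2} p_X(x \mid C_1)\, dx
  + c_{12} p_2 \int_{\mathcal{X}_1} p_X(x \mid C_2)\, dx}_{\text{wrong classifications}}
\]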

Perceptron and Two-class Bayes Classifier

The region assigned to Class 1 is chosen so that the overall risk (cost) is minimized.


Perceptron and Two-class Bayes Classifier

Eliminating the integrals over the Class 2 region (using the fact that each density integrates to 1 over the whole observation space) expresses the risk as a fixed term plus a single integral over the Class 1 region. To minimize the risk, assign to Class 1 every observation x that makes the integrand less than zero; assign all remaining observations to Class 2.
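A reconstruction of the rewritten risk that this argument refers to (the standard two-class result, with the notation defined above):

\[
R = c_{21} p_1 + c_{22} p_2
  + \int_{\mathcal{X}_1} \big[\, p_2 (c_{12} - c_{22})\, p_X(x \mid C_2)
  - p_1 (c_{21} - c_{11})\, p_X(x \mid C_1) \,\big]\, dx
\]

The first two terms are the fixed risk; only the integral depends on how the Class 1 region is chosen, so that region should contain exactly those x for which the bracketed integrand is negative.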

Likelihood ratio test: define the likelihood ratio Λ(x) = px(x|C1) / px(x|C2) and the threshold ξ = p2(c12 - c22) / [p1(c21 - c11)].
If Λ(x) > ξ, assign observation x to Class 1.
Otherwise, assign observation x to Class 2.

End of Lecture 2

Perceptron and Two-class Bayes Classifier

These two implementations are equivalent: because the logarithm is monotonically increasing, comparing Λ(x) with ξ gives the same decision as comparing log Λ(x) with log ξ.

Figure 1.5 Two equivalent implementations of the Bayes classifier: (a) Likelihood ratio test, (b) Log-likelihood ratio test.
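As a reconstruction of the two tests shown in the figure (standard forms, with the notation defined above):

\[
\text{(a)}\;\; \Lambda(x) = \frac{p_X(x \mid C_1)}{p_X(x \mid C_2)} \;\gtrless\; \xi = \frac{p_2 (c_{12} - c_{22})}{p_1 (c_{21} - c_{11})}
\qquad
\text{(b)}\;\; \log \Lambda(x) \;\gtrless\; \log \xi
\]

In both forms the decision is Class 1 when the left-hand side exceeds the threshold, and Class 2 otherwise.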


Bayes Classifier for a Gaussian Distribution


Special case: the mean of X varies between the two classes, but the covariance matrix of X is the same for both classes:
Class 1: E[X] = μ1
Class 2: E[X] = μ2
with E[(X - μi)(X - μi)T] = C for both classes.
The covariance matrix C is assumed to be nonsingular. Note that C = CT (C is symmetric).
The decision boundary follows from applying the log-likelihood ratio test to these two Gaussian densities.
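For completeness, the class-conditional density being assumed here is the m-dimensional Gaussian with shared covariance C (standard form; m denotes the input dimension):

\[
p_X(x \mid C_i) = \frac{1}{(2\pi)^{m/2} (\det C)^{1/2}}
\exp\!\Big(-\tfrac{1}{2}(x - \mu_i)^T C^{-1} (x - \mu_i)\Big), \qquad i = 1, 2
\]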


Bayes Classifier for a Gaussian Distribution


Decision boundary: if log Λ(x) > log ξ, choose Class 1; otherwise, choose Class 2.
With equiprobable classes (p1 = p2 = 1/2) and symmetric error costs (c11 = c22 = 0, c21 = c12), the threshold becomes log ξ = 0, so the test simplifies to: if log Λ(x) > 0, choose Class 1; otherwise, choose Class 2.
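A reconstruction of the resulting log-likelihood ratio for the equal-covariance Gaussian case (obtained by substituting the Gaussian densities above; the quadratic terms in x cancel):

\[
\log \Lambda(x) = (\mu_1 - \mu_2)^T C^{-1} x
  + \tfrac{1}{2}\big(\mu_2^T C^{-1} \mu_2 - \mu_1^T C^{-1} \mu_1\big)
\]

This expression is linear in x, so the decision boundary log Λ(x) = 0 is a hyperplane.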


Bayes Classifier for a Gaussian Distribution

Define:
w = C-1(μ1 - μ2)
and
b = (1/2)(μ2T C-1 μ2 - μ1T C-1 μ1)
Now the classifier is in the form y = wT x + b: choose Class 1 if y > 0, otherwise choose Class 2.

Figure 1.6 Signal-flow graph of Gaussian classifier.

Therefore, when the covariance matrix of X is the same for both classes, the Bayes classifier of a Gaussian distribution is in the form of Rosenblatt's perceptron.
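A minimal Python sketch of this correspondence, assuming the equiprobable-class, equal-cost Gaussian setting described above (the function name gaussian_bayes_weights and the sample parameters are illustrative, not from the slides):

import numpy as np

def gaussian_bayes_weights(mu1, mu2, C):
    """Weights of the Bayes classifier for two equal-covariance Gaussian classes.

    Returns (w, b) such that the decision rule is the perceptron-like test
    y = w^T x + b > 0  ->  Class 1, otherwise Class 2.
    """
    C_inv = np.linalg.inv(C)
    w = C_inv @ (mu1 - mu2)
    b = 0.5 * (mu2 @ C_inv @ mu2 - mu1 @ C_inv @ mu1)
    return w, b

# Illustrative parameters: two 2-D Gaussian classes with a shared covariance
mu1 = np.array([1.0, 1.0])
mu2 = np.array([-1.0, -1.0])
C = np.array([[1.0, 0.2],
              [0.2, 1.0]])

w, b = gaussian_bayes_weights(mu1, mu2, C)
x = np.array([0.5, 0.8])                      # a new observation
y = w @ x + b                                 # linear, perceptron-like form
print("Class 1" if y > 0 else "Class 2")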


Bayes Classifier for a Gaussian Distribution


Figure 1.7 Two overlapping, one-dimensional Gaussian distributions.

The perceptron provides perfect classification when the classes are linearly separable. Gaussian distributions, however, overlap and are therefore not linearly separable. Since classification errors must occur, the goal is to minimize the risk, and Rosenblatt's perceptron is capable of doing that when the covariance matrices of the two classes are the same.

Study Material
What is the structure of a Rosenblatt perceptron, and what is its goal?
Is the decision threshold of a Rosenblatt perceptron always a hyperplane or line?
What type of data is the Rosenblatt perceptron capable of perfectly classifying?
How are perceptron weights adapted using the Perceptron Convergence Theorem (PCT)?
Why does the PCT work?
How are the weights initialized with the PCT?
What does the Bayes classifier minimize?
What are the four costs considered when forming the Bayes classifier?
What is the ultimate test that is formed from the Bayes classifier?
How do you construct the Likelihood Ratio Test (LRT), and what are all the variables involved?
Can log(LRT) give a different result than the LRT?
What is the special class of Gaussian data that enables a Rosenblatt perceptron to duplicate Bayes classification?
Can a Rosenblatt perceptron perfectly classify Gaussian data? Can any classifier perfectly classify Gaussian data?

