
129: Artificial Neural Networks

Ajith Abraham
Oklahoma State University, Stillwater, OK, USA

1 Introduction to Artificial Neural Networks
2 Neural Network Architectures
3 Neural Network Learning
4 Backpropagation Learning
5 Training and Testing Neural Networks
6 Higher Order Learning Algorithms
7 Designing Artificial Neural Networks
8 Self-organizing Feature Map and Radial Basis Function Network
9 Recurrent Neural Networks and Adaptive Resonance Theory
10 Summary
References

Handbook of Measuring System Design, edited by Peter H. Sydenham and Richard Thorn. © 2005 John Wiley & Sons, Ltd. ISBN: 0-470-02143-8.

1 INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS

A general introduction to artificial intelligence methods in measurement signal processing is given in Article 128, Nature and Scope of AI Techniques, Volume 2.

The human brain provides proof of the existence of massive neural networks that can succeed at those cognitive, perceptual, and control tasks in which humans are successful. The brain is capable of computationally demanding perceptual acts (e.g. recognition of faces, speech) and control activities (e.g. body movements and body functions). The advantage of the brain is its effective use of massive parallelism, the highly parallel computing structure, and the imprecise information-processing capability.

The human brain is a collection of more than 10 billion interconnected neurons. Each neuron is a cell (Figure 1) that uses biochemical reactions to receive, process, and transmit information. Treelike networks of nerve fibers called dendrites are connected to the cell body or soma, where the cell nucleus is located. Extending from the cell body is a single long fiber called the axon, which eventually branches into strands and substrands that connect to other neurons through synaptic terminals or synapses.

The transmission of signals from one neuron to another at synapses is a complex chemical process in which specific transmitter substances are released from the sending end of the junction. The effect is to raise or lower the electrical potential inside the body of the receiving cell. If the potential reaches a threshold, a pulse is sent down the axon and the cell is 'fired'.

Artificial neural networks (ANN) have been developed as generalizations of mathematical models of biological nervous systems. A first wave of interest in neural networks (also known as connectionist models or parallel distributed processing) emerged after the introduction of simplified neurons by McCulloch and Pitts (1943).

The basic processing elements of neural networks are called artificial neurons, or simply neurons or nodes. In a simplified mathematical model of the neuron, the effects of the synapses are represented by connection weights that modulate the effect of the associated input signals, and the nonlinear characteristic exhibited by neurons is represented by a transfer function. The neuron impulse is then computed as the weighted sum of the input signals, transformed by the transfer function. The learning capability of an artificial neuron is achieved by adjusting the weights in accordance with the chosen learning algorithm.

Figure 1. Mammalian neuron, showing the dendrites, soma (containing the nucleus), axon, and synaptic terminals.

A typical artificial neuron and the modeling of a multilayered neural network are illustrated in Figure 2. Referring to Figure 2, the signal flow from inputs x1, ..., xn is considered to be unidirectional, as indicated by arrows, as is a neuron's output signal flow (O). The neuron output signal O is given by the following relationship:

$$O = f(\mathrm{net}) = f\Bigl(\sum_{j=1}^{n} w_j x_j\Bigr) \qquad (1)$$

where wj is the weight vector, and the function f(net) is referred to as an activation (transfer) function. The variable net is defined as a scalar product of the weight and input vectors,

$$\mathrm{net} = \mathbf{w}^{\mathsf{T}}\mathbf{x} = w_1 x_1 + \cdots + w_n x_n \qquad (2)$$

where T denotes the transpose. In the simplest case, the output value O is computed as

$$O = f(\mathrm{net}) = \begin{cases} 1 & \text{if } \mathbf{w}^{\mathsf{T}}\mathbf{x} \ge \theta \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where θ is called the threshold level; this type of node is called a linear threshold unit.
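As a minimal illustration of equations (1)–(3), the following sketch implements a linear threshold unit in Python; the weight values and threshold below are hypothetical, chosen so that the unit realizes a logical AND.

```python
import numpy as np

def linear_threshold_unit(x, w, theta):
    """Output of a linear threshold unit, equations (1)-(3)."""
    net = np.dot(w, x)               # net = w^T x, equation (2)
    return 1 if net >= theta else 0  # hard-limiting transfer function, equation (3)

# Hypothetical weights and threshold realizing a logical AND of two binary inputs
w = np.array([1.0, 1.0])
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", linear_threshold_unit(np.array(x), w, theta=1.5))
```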

Figure 2. Architecture of (a) an artificial neuron and (b) a multilayered artificial neural network.

2 NEURAL NETWORK ARCHITECTURES

The basic architecture consists of three types of neuron layers: input, hidden, and output. In feed-forward networks, the signal flow is from input to output units, strictly in a feed-forward direction. The data processing can extend over multiple (layers of) units, but no feedback connections are present. Recurrent networks do contain feedback connections and, contrary to feed-forward networks, their dynamical properties are important. In some cases, the activation values of the units undergo a relaxation process such that the network evolves to a stable state in which these activations no longer change. In other applications, the changes of the activation values of the output neurons are significant, such that the dynamical behavior constitutes the output of the network. There are several other neural network architectures (Elman network, adaptive resonance theory maps, competitive networks, etc.), depending on the properties and requirements of the application. The reader can refer to Bishop (1995) for an extensive overview of the different neural network architectures and learning algorithms.

A neural network has to be configured such that the application of a set of inputs produces the desired set of outputs. Various methods exist to set the strengths of the connections. One way is to set the weights explicitly, using a priori knowledge. Another way is to train the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule. The learning situations in neural networks may be classified into three distinct sorts: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, an input vector is presented at the inputs together with a set of desired responses, one for each node, at the output layer. A forward pass is done, and the errors or discrepancies between the desired and actual response for each node in the output layer are found. These are then used to determine weight changes in the net according to the prevailing learning rule. The term supervised originates from the fact that the desired signals on individual output nodes are provided by an external teacher.



The best-known examples of this technique occur in the backpropagation algorithm, the delta rule, and the perceptron rule. In unsupervised learning (or self-organization), an (output) unit is trained to respond to clusters of patterns within the input. In this paradigm, the system is supposed to discover statistically salient features of the input population. Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather, the system must develop its own representation of the input stimuli. Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning.

3 NEURAL NETWORK LEARNING

3.1 Hebbian learning

The learning paradigms discussed above result in an adjustment of the weights of the connections between units, according to some modification rule. Perhaps the most influential work in connectionism's history is the contribution of Hebb (1949), in which he presented a theory of behavior based, as much as possible, on the physiology of the nervous system. The most important concept to emerge from Hebb's work was his formal statement (known as Hebb's postulate) of how learning could occur: learning was based on the modification of synaptic connections between neurons. Specifically, when an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. The principles underlying this statement have become known as Hebbian learning, and virtually all neural network learning techniques can be considered variants of the Hebbian learning rule. The basic idea is that if two neurons are active simultaneously, their interconnection must be strengthened. If we consider a single-layer net, one of the interconnected neurons will be an input unit and one an output unit. If the data are represented in bipolar form, it is easy to express the desired weight update as

$$w_i(\text{new}) = w_i(\text{old}) + x_i o$$

where o is the desired output for i = 1 to n (inputs). Unfortunately, plain Hebbian learning continually strengthens its weights without bound (unless the input data are properly normalized).
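As a minimal illustration, here is the bipolar Hebb update in Python; the training pairs below are hypothetical.

```python
import numpy as np

def hebb_train(samples, n_inputs):
    """Plain Hebb rule for bipolar data: w_i(new) = w_i(old) + x_i * o."""
    w = np.zeros(n_inputs)
    for x, o in samples:                     # x: bipolar input vector, o: desired output
        w += np.asarray(x, dtype=float) * o  # strengthen co-active connections
    return w

# Hypothetical bipolar training pairs (inputs and desired outputs in {-1, +1})
samples = [([+1, -1], +1), ([-1, +1], -1)]
print(hebb_train(samples, n_inputs=2))  # repeated passes would grow w without bound
```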
3.2 Perceptron learning rule

The perceptron is a single-layer neural network whose weights and biases can be trained to produce a correct target vector when presented with the corresponding input vector. The training technique used is called the perceptron learning rule. Perceptrons are especially suited for simple problems in pattern classification. Suppose we have a set of learning samples consisting of an input vector x and a desired output d(k). For a classification task, d(k) is usually +1 or −1. The perceptron learning rule is very simple and can be stated as follows (a code sketch is given at the end of this subsection):

1. Start with random weights for the connections.
2. Select an input vector x from the set of training samples.
3. If the output yk ≠ d(k) (the perceptron gives an incorrect response), modify all connections wi according to δwi = η(dk − yk)xi, where η is the learning rate.
4. Go back to step 2.

Note that the procedure is very similar to the Hebb rule; the only difference is that when the network responds correctly, no connection weights are modified.
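The following sketch implements the four steps; the learning rate, epoch count, and bipolar training data are illustrative assumptions.

```python
import numpy as np

def perceptron_train(samples, eta=0.1, epochs=100, seed=0):
    """Perceptron learning rule for targets d(k) in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(samples[0][0])
    w = rng.normal(0, 0.1, n)        # step 1: random initial weights
    b = 0.0                          # bias, trained like an extra weight
    for _ in range(epochs):
        for x, d in samples:         # step 2: select a training sample
            x = np.asarray(x, dtype=float)
            y = 1 if w @ x + b >= 0 else -1
            if y != d:               # step 3: update only on an incorrect response
                w += eta * (d - y) * x
                b += eta * (d - y)
    return w, b

# Hypothetical linearly separable task: logical OR in bipolar form
samples = [([-1, -1], -1), ([-1, +1], +1), ([+1, -1], +1), ([+1, +1], +1)]
w, b = perceptron_train(samples)
print(w, b)
```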

4 BACKPROPAGATION LEARNING

The simple perceptron is just able to handle linearly separable or linearly independent problems. By taking the partial derivative of the error of the network with respect to each weight, we learn a little about the direction in which the error of the network is moving. In fact, if we take the negative of this derivative (i.e. the rate of change of the error as the value of the weight increases) and then add it to the weight, the error will decrease until it reaches a local minimum. This makes sense: if the derivative is positive, the error is increasing as the weight increases, so the obvious thing to do is to add a negative value to the weight, and vice versa if the derivative is negative. Because these partial derivatives are taken and applied to the weights layer by layer, starting from the output-layer weights and working back to the input-layer weights (as it turns out, this is necessary, since changing these sets of weights requires that we know the partial derivatives calculated in the layer downstream), this algorithm has been called the backpropagation algorithm.

A neural network can be trained in two different modes: online and batch. The number of weight updates of the two methods for the same number of data presentations is very different. In the online method, weight updates are computed for each input data sample, and the weights are modified after each sample. An alternative is to compute the weight update for each input sample, but to store these values during one pass through the training set, which is called an epoch. At the end of the epoch, all the contributions are added, and only then are the weights updated with the composite value. This method adapts the weights with a cumulative weight update, so it follows the gradient more closely; it is called the batch-training mode.

Training basically involves feeding training samples as input vectors through a neural network, calculating the error of the output layer, and then adjusting the weights of the network to minimize the error. The average of all the squared errors (E) for the outputs is computed to make the derivative easier. Once the error is computed, the weights can be updated one by one. In the batch mode variant, the descent is based on the gradient ∇E for the total training set:

$$\Delta w_{ij}(n) = -\eta\,\frac{\partial E}{\partial w_{ij}} + \alpha\,\Delta w_{ij}(n-1) \qquad (4)$$

where η and α are the learning rate and momentum, respectively. The momentum term determines the effect of past weight changes on the current direction of movement in weight space. A good choice of both η and α is required for training success and for the speed of neural-network learning.

It has been proven that backpropagation learning with sufficient hidden layers can approximate any nonlinear function to arbitrary accuracy. This makes a backpropagation learning neural network a good candidate for signal prediction and system modeling.
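To make the batch update of equation (4) concrete, here is a minimal sketch of backpropagation with momentum for a one-hidden-layer network; the toy task, network size, and parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: fit y = sin(x) with a 1-4-1 network in batch mode
X = np.linspace(-3, 3, 40).reshape(-1, 1)
Y = np.sin(X)

W1, b1 = rng.normal(0, 0.5, (1, 4)), np.zeros(4)  # input -> hidden weights
W2, b2 = rng.normal(0, 0.5, (4, 1)), np.zeros(1)  # hidden -> output weights
eta, alpha = 0.05, 0.1       # learning rate and momentum, as in equation (4)
dW1 = dW2 = 0.0              # previous weight updates (momentum memory)

for epoch in range(2000):    # batch mode: one composite update per epoch
    H = np.tanh(X @ W1 + b1)          # hidden layer (tanh transfer function)
    O = H @ W2 + b2                   # linear output unit
    err = O - Y                       # gradient of the squared error at the output
    gW2 = H.T @ err / len(X)          # output-layer gradient first ...
    gH = (err @ W2.T) * (1 - H ** 2)  # ... then backpropagate through tanh
    gW1 = X.T @ gH / len(X)
    dW2 = -eta * gW2 + alpha * dW2    # equation (4): gradient step plus momentum
    dW1 = -eta * gW1 + alpha * dW1
    W2 += dW2
    W1 += dW1
    b2 -= eta * err.mean(axis=0)
    b1 -= eta * gH.mean(axis=0)

print("final mean squared error:", float((err ** 2).mean()))
```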
5 TRAINING AND TESTING NEURAL NETWORKS

The best training procedure is to compile a wide range of examples (for more complex problems, more examples are required) that exhibit all the different characteristics of the problem. To create a robust and reliable network, in some cases, some noise or other randomness is added to the training data to get the network familiarized with the noise and natural variability in real data. Poor training data inevitably lead to an unreliable and unpredictable network. Usually, the network is trained for a prefixed number of epochs or until the output error decreases below a particular error threshold.

Special care is to be taken not to overtrain the network. By overtraining, the network may become too adapted to learning the samples from the training set, and may thus be unable to accurately classify samples outside of the training set. Figure 3 illustrates the classification results of an overtrained network. The task is to correctly classify two patterns, X and Y; training and test patterns are shown with different markers, and the test patterns were not shown during the training phase. As shown in Figure 3(a), each class of test data has been classified correctly, even though it was not seen during training; the trained network is said to have good generalization performance. Figure 3(b) illustrates some misclassification of the test data. The network initially learns to detect the global features of the input and, as a consequence, generalizes very well. But after prolonged training, the network starts to recognize individual input/output pairs rather than settling for weights that generally describe the mapping for the whole training set (Fausett, 1994).

Figure 3. Illustration of generalization performance: (a) good generalization; (b) poor generalization.

5.1 Choosing the number of neurons

The number of hidden neurons affects how well the network is able to separate the data. A large number of hidden neurons will ensure correct learning, and the network will be able to correctly predict the data it has been trained on, but its performance on new data, its ability to generalize, is compromised. With too few hidden neurons, the network may be unable to learn the relationships amongst the data, and the error will fail to fall below an acceptable level. The selection of the number of hidden neurons is thus a crucial decision. A practical safeguard against overtraining, which also helps when comparing candidate network sizes, is sketched below.
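A generic sketch of this practice, assuming training and held-out error can be evaluated once per epoch; the error curve below is simulated purely for illustration.

```python
import numpy as np

def train_with_early_stopping(train_step, validate, max_epochs=2500, patience=50):
    """Stop when the held-out error has not improved for `patience` epochs."""
    best_err, best_epoch, stale = np.inf, 0, 0
    for epoch in range(max_epochs):
        train_step()               # one pass over the training set (placeholder)
        err = validate()           # error on held-out patterns
        if err < best_err:
            best_err, best_epoch, stale = err, epoch, 0
        else:
            stale += 1
            if stale >= patience:  # generalization stopped improving
                break
    return best_epoch, best_err

# Simulated held-out error: falls, then rises again as overtraining sets in
curve = np.concatenate([np.linspace(1.0, 0.2, 200), np.linspace(0.2, 0.6, 400)])
errs = iter(curve)
print(train_with_early_stopping(lambda: None, lambda: next(errs)))
```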

5.2 Choosing the initial weights

The learning algorithm uses a steepest descent technique, which rolls straight downhill in weight space until the first valley is reached. This makes the choice of the initial starting point in the multidimensional weight space critical. However, there are no recommended rules for this selection, except trying several different starting weight values to see whether the network results improve.

5.3 Choosing the learning rate

The learning rate effectively controls the size of the step taken in multidimensional weight space when each weight is modified. If the selected learning rate is too large, the local minimum may be overstepped constantly, resulting in oscillations and slow convergence to the lower error state. If the learning rate is too low, the number of iterations required may be too large, resulting in slow performance.

6 HIGHER ORDER LEARNING ALGORITHMS

Backpropagation (BP) often gets stuck at a local minimum, mainly because of the random initialization of weights. For some initial weight settings, BP may not be able to reach a global minimum of weight space, while for other initializations the same network is able to reach an optimal minimum. A long-recognized bane of analysis of the error surface and the performance of training algorithms is the presence of multiple stationary points, including multiple minima. Empirical experience with training algorithms shows that different initializations of the weights yield different resulting networks; hence, multiple minima not only exist, but there may be huge numbers of them.

In practice, four types of optimization algorithms are used to optimize the weights. The first three methods, gradient descent, conjugate gradients, and quasi-Newton, are general optimization methods whose operation can be understood in the context of minimization of a quadratic error function. Although the error surface is surely not quadratic, for differentiable node functions it will be so in a sufficiently small neighborhood of a local minimum, and such an analysis provides information about the behavior of the training algorithm over the span of a few iterations and also as it approaches its goal. The fourth method, that of Levenberg and Marquardt, is specifically adapted to the minimization of an error function that arises from a squared error criterion of the form we are assuming. A common feature of these training algorithms is the requirement of repeated, efficient calculation of gradients. The reader can refer to Bishop (1995) for extensive coverage of higher-order learning algorithms.

Even though artificial neural networks are capable of performing a wide variety of tasks, in practice they sometimes deliver only marginal performance; inappropriate topology selection and learning algorithm are frequently blamed. There is little reason to expect that one can find a uniformly best algorithm for selecting the weights in a feed-forward artificial neural network. This is in accordance with the no free lunch theorem, which explains that, for any algorithm, any elevated performance over one class of problems is exactly paid for in performance over another class (Macready and Wolpert, 1997).

The design of artificial neural networks using evolutionary algorithms has been widely explored. Evolutionary algorithms are used to adapt the connection weights, network architecture, and so on, according to the problem environment. A distinct feature of evolutionary neural networks is their adaptability to a dynamic environment; in other words, such neural networks can adapt to an environment as well as to changes in the environment. The two forms of adaptation, evolution and learning, make evolutionary artificial neural networks much more effective and efficient in a dynamic environment than the conventional learning approach. Refer to Abraham (2004) for more technical information related to the evolutionary design of neural networks.

7 DESIGNING ARTIFICIAL NEURAL NETWORKS

To illustrate the design of artificial neural networks, the Mackey-Glass chaotic time series (Box and Jenkins, 1970) benchmark is used. The performance of the designed neural network is evaluated for different architectures and activation functions. The Mackey-Glass differential equation defines a chaotic time series for some values of the parameters x(0) and τ:

$$\frac{dx(t)}{dt} = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t). \qquad (5)$$

We used the values x(t − 18), x(t − 12), x(t − 6), and x(t) to predict x(t + 6). A fourth-order Runge-Kutta method with a time step of 0.1 was used to generate 1000 data points, with initial conditions x(0) = 1.2, τ = 17, and x(t) = 0 for t < 0. The first 500 data sets were used for training and the remaining data for testing.
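A sketch of generating this data set follows; note that, for brevity, it integrates equation (5) with a simple Euler step rather than the fourth-order Runge-Kutta method used in the text, and it assumes the series is sampled once per unit of time.

```python
import numpy as np

def mackey_glass(n_points=1000, tau=17, dt=0.1, x0=1.2, sample_every=10):
    """Generate the Mackey-Glass series of equation (5), with x(0) = 1.2 and
    x(t) = 0 for t < 0. Euler integration stands in for Runge-Kutta here."""
    lag = round(tau / dt)                # delay expressed in integration steps
    hist = [0.0] * lag + [x0]            # history buffer: x(t) = 0 for t < 0
    out = []
    for i in range(n_points * sample_every):
        x, x_tau = hist[-1], hist[-1 - lag]
        hist.append(x + dt * (0.2 * x_tau / (1 + x_tau ** 10) - 0.1 * x))
        if (i + 1) % sample_every == 0:  # sample once per unit of time
            out.append(hist[-1])
    return np.array(out)

series = mackey_glass()
# Input/target pairs as in the text: [x(t-18), x(t-12), x(t-6), x(t)] -> x(t+6)
t = np.arange(18, len(series) - 6)
X = np.stack([series[t - 18], series[t - 12], series[t - 6], series[t]], axis=1)
y = series[t + 6]
train_X, test_X = X[:500], X[500:]       # first 500 patterns for training
print(X.shape, y.shape, train_X.shape, test_X.shape)
```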

7.1 Network architecture

A feed-forward neural network with four input neurons, one hidden layer, and one output neuron is used. Weights were randomly initialized, and the learning rate and momentum were set at 0.05 and 0.1, respectively. The number of hidden neurons was varied (14, 16, 18, 20, 24), and the generalization performance is reported in Table 1. All networks were trained for an identical number of stochastic updates (2500 epochs).

Table 1. Training and test performance for the Mackey-Glass series for different architectures.

    Hidden neurons    RMSE (training data)    RMSE (test data)
    14                0.0890                  0.0880
    16                0.0824                  0.0860
    18                0.0764                  0.0750
    20                0.0452                  0.0442
    24                0.0439                  0.0437

7.2 Role of activation functions

The effect of two different node activation functions in the hidden layer, the log-sigmoidal activation function (LSAF) and the tanh-sigmoidal activation function (TSAF), keeping 24 hidden neurons for the backpropagation learning algorithm, is illustrated in Figure 4. Table 2 summarizes the empirical results for training and generalization with the two node transfer functions. The generalization looks better with TSAF.

Figure 4. Convergence of training (RMSE over 2500 epochs) for the different node transfer functions (LSAF and TSAF).

Table 2. Mackey-Glass time series: training and generalization performance for different activation functions.

    Activation function    RMSE (training)    RMSE (test)
    TSAF                   0.0439             0.0437
    LSAF                   0.0970             0.0950

Figure 5 illustrates the computational complexity in billion flops for the different numbers of hidden neurons. At present, neural network design relies heavily on human experts who have sufficient knowledge about the different aspects of the network and the problem domain. As the complexity of the problem domain increases, manual design becomes more difficult.

Figure 5. Computational complexity for different architectures (14 to 24 hidden neurons, roughly 0.62 to 1.06 billion flops).

8 SELF-ORGANIZING FEATURE MAP AND RADIAL BASIS FUNCTION NETWORK

8.1 Self-organizing feature map

The Self-organizing Feature Map (SOFM) is a data visualization technique proposed by Kohonen (1988), which reduces the dimensions of data through the use of self-organizing neural networks. A SOFM learns the categorization, topology, and distribution of input vectors. SOFMs allocate more neurons to recognize parts of the input space where many input vectors occur, and fewer neurons to parts where few input vectors occur. Neurons next to each other in the network learn to respond to similar vectors. SOFMs can learn to detect regularities and correlations in their input and adapt their future responses to that input accordingly.

An important feature of the SOFM learning algorithm is that it allows neurons that are neighbors to the winning neuron to output values. Thus, the transition of output vectors is much smoother than that obtained with competitive layers, where only one neuron has an output at a time.

The problem that data visualization attempts to solve is that humans simply cannot visualize high-dimensional data. SOFM reduces dimensions by producing a map of usually one or two dimensions that plots the similarities of the data by grouping similar data items together (data clustering). In this process, SOFMs accomplish two things: they reduce dimensions and display similarities.

It is important to note that while a self-organizing map does not take long to organize itself so that neighboring neurons recognize similar inputs, it can take a long time for the map to finally arrange itself according to the distribution of input vectors.
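A minimal sketch of this neighborhood-based update for a one-dimensional map with a Gaussian neighborhood around the winning neuron; all parameter values and the two-cluster data are illustrative.

```python
import numpy as np

def train_sofm(data, n_units=10, epochs=50, eta0=0.5, sigma0=2.0, seed=0):
    """1-D Kohonen SOFM: move the winner and its map neighbors toward each input."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(data.min(), data.max(), (n_units, data.shape[1]))
    for t in range(epochs):
        eta = eta0 * (1 - t / epochs)                  # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.5)    # shrinking neighborhood
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(w - x, axis=1))
            dist = np.abs(np.arange(n_units) - winner) # distance along the map
            h = np.exp(-dist ** 2 / (2 * sigma ** 2))  # neighborhood function
            w += eta * h[:, None] * (x - w)            # neighbors learn too
    return w

# Hypothetical inputs clustered in two regions: more units settle where data is dense
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (80, 2)), rng.normal(1.0, 0.1, (20, 2))])
print(train_sofm(data).round(2))
```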
8.2 Radial basis function network

The Radial Basis Function (RBF) network is a three-layer feed-forward network that uses a linear transfer function for the output units and a nonlinear transfer function (normally the Gaussian) for the hidden layer neurons (Chen, Cowan and Grant, 1991). Radial basis networks may require more neurons than standard feed-forward backpropagation networks, but often they can be designed in less time. They perform well when many training data are available.

Much of the inspiration for RBF networks has come from traditional statistical pattern classification techniques. The input layer is simply a fan-out layer and does no processing. The second or hidden layer performs a nonlinear mapping from the input space into a (usually) higher dimensional space whose activation function is selected from a class of functions called basis functions. The final layer performs a simple weighted sum with a linear output.

Contrary to BP networks, the weights of the hidden layer basis units (input to hidden layer) are set using some clustering technique. The idea is that the patterns in the input space form clusters. If the centers of these clusters are known, the Euclidean distance from a cluster center can be measured. This distance measure is made nonlinear in such a way that input data close to a cluster center gets an activation value close to 1, and the activation value falls as the input data moves away from the center. Once the hidden layer weights are set, a second phase of training (usually backpropagation) is used to adjust the output weights.
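A sketch of this two-phase procedure; for simplicity, the clustering step is replaced by sampling training points as centers (a stand-in for k-means or similar), and the linear output weights are solved by least squares rather than a second backpropagation phase.

```python
import numpy as np

def train_rbf(X, y, n_centers=8, width=1.0, seed=0):
    """Two-phase RBF training: place Gaussian centers, then fit output weights."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_centers, replace=False)]  # phase 1 (stand-in)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    H = np.exp(-(d / width) ** 2)              # Gaussian basis activations
    W, *_ = np.linalg.lstsq(H, y, rcond=None)  # phase 2: linear output layer
    return centers, W

def rbf_predict(X, centers, W, width=1.0):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(d / width) ** 2) @ W       # weighted sum with linear output

# Hypothetical 1-D regression task
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sin(X).ravel()
centers, W = train_rbf(X, y)
print("max abs error:", float(np.abs(rbf_predict(X, centers, W) - y).max()))
```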
9 RECURRENT NEURAL NETWORKS AND ADAPTIVE RESONANCE THEORY

9.1 Recurrent neural networks

Recurrent networks are the state of the art in nonlinear time series prediction, system identification, and temporal pattern classification. As the output of the network at time t is used along with a new input to compute the output of the network at time t + 1, the response of the network is dynamic (Mandic and Chambers, 2001).

Time Lag Recurrent Networks (TLRN) are multilayered perceptrons extended with short-term memory structures that have local recurrent connections. The recurrent neural network is a very appropriate model for processing temporal (time-varying) information. Examples of temporal problems include time-series prediction, system identification, and temporal pattern recognition.

A simple recurrent neural network can be constructed by modifying the multilayered feed-forward network with the addition of a 'context layer', which retains information between observations. At each time step, new inputs are fed to the network, and the previous contents of the hidden layer are passed into the context layer; these then feed back into the hidden layer in the next time step. Initially, the context layer contains nothing, so the output from the hidden layer after the first input to the network is the same as if there were no context layer. Weights for the new connections to and from the context layer are calculated in the same way as for the other connections.

The training algorithm used in TLRN (backpropagation through time) is more advanced than the standard backpropagation algorithm. Very often, a TLRN requires a smaller network to learn temporal problems than an MLP that uses extra inputs to represent past samples. The TLRN is biologically more plausible and computationally more powerful than other adaptive models such as the hidden Markov model.

Some popular recurrent network architectures are the Elman recurrent network, in which the hidden unit activation values are fed back to an extra set of input units, and the Jordan recurrent network, in which output values are fed back into hidden units.
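A minimal sketch of the forward pass of an Elman-style network with a context layer; the sizes and random (untrained) weights are purely illustrative of how the context feeds back.

```python
import numpy as np

def elman_forward(inputs, W_in, W_ctx, W_out, n_hidden):
    """Elman forward pass: the context layer stores the previous hidden
    activations and feeds them back into the hidden layer at the next step."""
    context = np.zeros(n_hidden)  # initially the context layer contains nothing
    outputs = []
    for x in inputs:
        hidden = np.tanh(W_in @ x + W_ctx @ context)  # hidden sees input + context
        outputs.append(W_out @ hidden)
        context = hidden.copy()   # copy hidden activations into the context layer
    return np.array(outputs)

# Hypothetical sizes: 1 input, 3 hidden units, 1 output; random untrained weights
rng = np.random.default_rng(0)
W_in, W_ctx, W_out = (rng.normal(0, 0.5, (3, 1)),
                      rng.normal(0, 0.5, (3, 3)),
                      rng.normal(0, 0.5, (1, 3)))
sequence = [np.array([v]) for v in (0.1, 0.5, 0.9)]
print(elman_forward(sequence, W_in, W_ctx, W_out, n_hidden=3))
```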
are known, then the Euclidean distance from the cluster
center can be measured. As the input data moves away
from the connection weights, the activation value reduces.
This distance measure is made nonlinear in such a way that 9.2 Adaptive resonance theory
for input data close to a cluster center gets a value close to
1. Once the hidden layer weights are set, a second phase Adaptive Resonance Theory (ART) was initially introduced
of training (usually backpropagation) is used to adjust the by Grossberg (1976) as a theory of human information
output weights. processing. ART neural networks are extensively used for

There exist many different variations of ART networks today (Carpenter and Grossberg, 1998). For example, ART1 performs unsupervised learning for binary input patterns, ART2 is modified to handle both analog and binary input patterns, and ART3 performs parallel searches of distributed recognition codes in a multilevel network hierarchy. Fuzzy ARTMAP represents a synthesis of elements from neural networks, expert systems, and fuzzy logic.

10 SUMMARY

This section presented the biological motivation and fundamental aspects of modeling artificial neural networks. Performance of feed-forward artificial neural networks for a function approximation problem is demonstrated. Advantages of some specific neural network architectures and learning algorithms are also discussed.

REFERENCES

Abraham, A. (2004) Meta-Learning Evolutionary Artificial Neural Networks. Neurocomputing, 56C, 1–38.

Bishop, C.M. (1995) Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK.

Box, G.E.P. and Jenkins, G.M. (1970) Time Series Analysis, Forecasting and Control, Holden Day, San Francisco, CA.

Carpenter, G. and Grossberg, S. (1998) Adaptive Resonance Theory (ART), in The Handbook of Brain Theory and Neural Networks (ed. M.A. Arbib), MIT Press, Cambridge, MA, pp. 79–82.

Chen, S., Cowan, C.F.N. and Grant, P.M. (1991) Orthogonal Least Squares Learning Algorithm for Radial Basis Function Networks. IEEE Transactions on Neural Networks, 2(2), 302–309.

Fausett, L. (1994) Fundamentals of Neural Networks, Prentice Hall, USA.

Grossberg, S. (1976) Adaptive Pattern Classification and Universal Recoding: Parallel Development and Coding of Neural Feature Detectors. Biological Cybernetics, 23, 121–134.

Hebb, D.O. (1949) The Organization of Behavior, John Wiley, New York.

Kohonen, T. (1988) Self-Organization and Associative Memory, Springer-Verlag, New York.

Macready, W.G. and Wolpert, D.H. (1997) The No Free Lunch Theorems. IEEE Transactions on Evolutionary Computing, 1(1), 67–82.

Mandic, D. and Chambers, J. (2001) Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability, John Wiley & Sons, New York.

McCulloch, W.S. and Pitts, W.H. (1943) A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5, 115–133.
