
7.4.1 Neural Networks
Neural networks are a popular target representation for learning. These networks are inspired by the neurons in the brain but do not actually simulate neurons. Artificial neural networks typically contain many fewer than the approximately 10^11 neurons in the human brain, and the artificial neurons, called units, are much simpler than their biological counterparts.
Artificial neural networks are interesting to study for a number of reasons:
- As part of neuroscience, to understand real neural systems, researchers are simulating the neural systems of simple animals such as worms, which promises to lead to an understanding of which aspects of neural systems are necessary to explain the behavior of these animals.
- Some researchers seek to automate not only the functionality of intelligence (which is what the field of artificial intelligence is about) but also the mechanism of the brain, suitably abstracted. One hypothesis is that the only way to build the functionality of the brain is by using the mechanism of the brain. This hypothesis can be tested by attempting to build intelligence using the mechanism of the brain, as well as without using the mechanism of the brain. Experience with building other machines - such as flying machines, which use the same principles, but not the same mechanism, that birds use to fly - would indicate that this hypothesis may not be true. However, it is interesting to test the hypothesis.
- The brain inspires a new way to think about computation that contrasts with currently available computers. Unlike current computers, which have a few processors and a large but essentially inert memory, the brain consists of a huge number of asynchronous distributed processes, all running concurrently with no master controller. One should not think that current computers are the only architecture available for computation.
- As far as learning is concerned, neural networks provide a different measure of simplicity as a learning bias than, for example, decision trees. Multilayer neural networks, like decision trees, can represent any function of a set of discrete features. However, the functions that correspond to simple neural networks do not necessarily correspond to simple decision trees. Neural network learning imposes a different bias than decision tree learning. Which is better, in practice, is an empirical question that can be tested on different domains.
There are many different types of neural networks. This book considers one kind of neural network, the feed-forward neural network. Feed-forward networks can be seen as cascaded squashed linear functions. The inputs feed into a layer of hidden units, which can feed into layers of more hidden units, which eventually feed into the output layer. Each of the hidden units is a squashed linear function of its inputs.
Neural networks of this type can take any real numbers as inputs, and they produce a real number as output. For regression, it is typical for the output units to be a linear function of their inputs. For classification, it is typical for the output to be a sigmoid function of its inputs (because there is no point in predicting a value outside of [0,1]). For the hidden layers, there is no point in having their output be a linear function of their inputs, because a linear function of a linear function is a linear function; adding the extra layers would give no added functionality. The output of each hidden unit is thus a squashed linear function of its inputs.
Associated with a network are the parameters for all of the linear functions. These parameters can be tuned
simultaneously to minimize the prediction error on the training examples.
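As a concrete sketch of these cascaded squashed linear functions (not code from this book; the function names are illustrative, and the sigmoid is assumed as the squashing function):

```python
import math

def sigmoid(x):
    # squashing (activation) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def unit(weights, inputs):
    # squashed linear function of the inputs;
    # weights[0] is the bias weight, the one multiplied by the constant 1
    linear = weights[0] + sum(w * v for w, v in zip(weights[1:], inputs))
    return sigmoid(linear)

def feedforward(hidden_weights, output_weights, inputs):
    # inputs feed into a layer of hidden units,
    # whose outputs feed into the output unit
    hidden = [unit(w, inputs) for w in hidden_weights]
    return unit(output_weights, hidden)
```

The weights passed to `feedforward` are exactly the tunable parameters: learning adjusts them all simultaneously to reduce the prediction error.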
Figure 7.11: A neural network with one hidden layer. The w_i are weights. The weight inside each node is the weight that does not depend on an input; it is the one multiplied by 1. The meaning of this network is given in Example 7.15.
Example 7.15: Figure 7.11 shows a neural network with one hidden layer for the classification data of Figure 7.9.
As explained in Example 7.11, this data set is not linearly separable. In this example, five Boolean inputs
correspond to whether there is culture, whether the person has to fly, whether the destination is hot, whether there is
music, and whether there is nature, and a single output corresponds to whether the person likes the holiday. In this
network, there is one hidden layer, which contains two hidden units that have no a priori meaning. The network
represents the following equations:
pval(e, Likes) = f(w_0 + w_1 val(e, H1) + w_2 val(e, H2))
val(e, H1) = f(w_3 + w_4 val(e, Culture) + w_5 val(e, Fly) + w_6 val(e, Hot) + w_7 val(e, Music) + w_8 val(e, Nature))
val(e, H2) = f(w_9 + w_10 val(e, Culture) + w_11 val(e, Fly) + w_12 val(e, Hot) + w_13 val(e, Music) + w_14 val(e, Nature)),
where f(x) is an activation function. For this example, there are 15 real numbers to be learned (w_0, ..., w_14). The hypothesis space is thus a 15-dimensional real space. Each point in this 15-dimensional space corresponds to a function that predicts a value for Likes for every example with Culture, Fly, Hot, Music, and Nature given.
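These equations can be transcribed directly into code. In this sketch the weight values are purely hypothetical placeholders, chosen only so the function runs; the real values would have to be learned from the data:

```python
import math

def f(x):
    # sigmoid activation function, assumed as the squashing function
    return 1.0 / (1.0 + math.exp(-x))

# w[0], ..., w[14]: placeholder values for illustration only;
# learning must find the actual values
w = [-2.0, 4.0, 4.0,                   # w_0..w_2:  output unit
     -1.0, 2.0, -2.0, 1.0, 0.5, 0.5,   # w_3..w_8:  hidden unit H1
      1.0, -2.0, 2.0, -1.0, 0.5, 0.5]  # w_9..w_14: hidden unit H2

def pval_likes(culture, fly, hot, music, nature):
    # each input is 0 or 1, encoding the Boolean features of an example
    h1 = f(w[3] + w[4]*culture + w[5]*fly + w[6]*hot + w[7]*music + w[8]*nature)
    h2 = f(w[9] + w[10]*culture + w[11]*fly + w[12]*hot + w[13]*music + w[14]*nature)
    return f(w[0] + w[1]*h1 + w[2]*h2)
```

Each choice of the 15 values in `w` is one point in the 15-dimensional hypothesis space, and each point determines a prediction for Likes on every example.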
Given particular values for the parameters, and given values for the inputs, a neural network predicts a value for
each target feature. The aim of neural network learning is, given a set of examples, to find parameter settings that
minimize the error. If there are m parameters, finding the parameter settings with minimum error involves searching
through an m-dimensional Euclidean space.
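One simple way to search this m-dimensional space is gradient descent: repeatedly move the parameters a small step in the direction that decreases the error. The sketch below uses numerical (finite-difference) gradients on an assumed error function and step size, for illustration only:

```python
def gradient_descent(error, params, step=0.1, iters=1000, h=1e-6):
    # search an m-dimensional parameter space for a local minimum of error
    params = list(params)
    for _ in range(iters):
        # finite-difference estimate of each partial derivative
        grad = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += h
            grad.append((error(bumped) - error(params)) / h)
        # step downhill along the estimated gradient
        params = [p - step * g for p, g in zip(params, grad)]
    return params

# toy error function with its minimum at params == [3, -1]
err = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2
```

In practice, neural network learning computes the gradient analytically via back-propagation rather than by finite differences, but the search itself has this shape: descend through the m-dimensional space toward parameter settings with lower error.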