
Machine Learning and Neural Networks

Ramin Shamshiri, Graduate Student of Biosystems Engineering, University of Florida, 30-Oct-2007

Introduction [2]

There are situations in which there is no known method for computing the desired output from a set of inputs. An alternative strategy for solving this type of problem is for the computer to attempt to learn the input/output functionality from examples. The approach of using examples to synthesize programs is known as the learning methodology, and in the particular case when the examples are input/output pairs it is called supervised learning. The examples of input/output functionality are referred to as the training data.

Some Terms and Definitions [2]

Target function: When an underlying function from inputs to outputs exists, it is referred to as the target function.
Solution: The estimate of the target function which is learnt or output by the learning algorithm is known as the solution of the learning problem.
Decision function: In the case of classification, the target function is sometimes referred to as the decision function.
Binary classification: A learning problem with binary outputs is referred to as a binary-classification problem.
Regression: For real-valued outputs the problem becomes known as regression.
Unsupervised learning: Consider the case where there are no output values and the learning task is to gain some understanding of the process that generates the data. This type of learning includes density estimation, learning the support of a distribution, clustering, and so on.
Query learning: A model of learning which considers more complex interaction between a learner and its environment; the learner is allowed to query the environment about the outputs associated with a particular input. The study of how this affects the learner's ability to learn different tasks is known as query learning.
Reinforcement learning: A further complexity of interaction, in which the learner has a range of actions at its disposal which it can take to attempt to move towards states where it can expect high rewards.
Batch learning and on-line learning: In batch learning all the data are given to the learner at the start of learning, while in on-line learning the learner receives one example at a time and gives its estimate of the output before receiving the correct value. In on-line learning the learner updates its current hypothesis in response to each new example, and the quality of learning is assessed by the total number of mistakes made during learning.

Machine Learning

Machine learning is a subfield of Artificial Intelligence which deals with algorithm development and programming techniques that allow computers to learn. The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods. Hence, machine learning is closely related to data mining, statistics and theoretical computer science. Two types of learning are currently known: inductive and deductive.

Induction, or inductive reasoning, sometimes called inductive logic, is a reasoning process in which the premises of an argument are believed to support the conclusion but do not ensure it. Inductive learning is used to assign properties or relations to types based on tokens, or to formulate laws based on limited observations of recurring phenomenal patterns. Inductive machine learning methods extract rules and patterns out of massive data sets. Induction is employed, for example, in reasoning from specific propositions such as: "This ice is cold" ==> "All ice is cold", or "A billiard ball moves when struck with a cue" ==> "All billiard balls struck with a cue move".

Deductive reasoning, according to the Oxford, Cambridge and Merriam-Webster dictionaries, is the type of reasoning that proceeds from general principles or premises to derive particular information. Deductive reasoning is dependent on its premises: a false premise can possibly lead to a false result, and inconclusive premises will also yield an inconclusive conclusion. [1] For example: "All apples are fruit. All fruits grow on trees ==> Therefore all apples grow on trees", or "All apples are fruit. Some apples are red ==> Therefore some fruit is red."

Both types of reasoning are routinely employed. One difference between them is that in deductive reasoning the evidence provided must be a set about which everything is known before the conclusion can be drawn. Since it is difficult to know everything before drawing a conclusion, deductive reasoning has little use in the real world. This is where inductive reasoning steps in. Given a set of evidence, however incomplete the knowledge is, the conclusion is likely to follow, but one gives up the guarantee that the conclusion follows. However, it does provide the ability to learn new things that are not obvious from the evidence.
Learning and Generalization [2]
Early machine learning algorithms aimed to learn representations of simple symbolic functions that could be understood and verified by experts. The goal of learning in this paradigm was therefore to output a hypothesis that performed the correct classification of the training data, and early learning algorithms were designed to find such an accurate fit to the data. Such a hypothesis is said to be consistent.

Some of the specific applications of machine learning include: natural language processing, pattern recognition, search engines, medical diagnosis, bioinformatics, cheminformatics, credit card and fraud detection, stock market analysis, DNA sequence classification, speech recognition, handwriting recognition, object recognition in computer vision, game playing and robot locomotion.

Algorithm types

Machine learning algorithms are organized into different categories based on the desired outcome of the algorithm. Some of the most common types include:

Supervised learning: The algorithm generates a function that maps inputs to desired outputs. One of the standard formulations of the supervised learning task is the classification problem, in which the learner is required to learn the behavior of a function that maps a vector [x1, x2, ..., xn] into one of several classes by looking at several input/output examples of the function.
Unsupervised learning: A model is fit to observations; labeled examples are not available. It is distinguished from supervised learning by the fact that there is no a priori output; only a data set of input objects is gathered.
Semi-supervised learning: Combines both labeled and unlabeled examples to generate an appropriate function or classifier.
Reinforcement learning: The algorithm learns a policy of how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm.
Learning to learn: The algorithm learns its own inductive bias based on previous experience.

Machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms.

Artificial Neural Network

A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons. [3] The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today. An artificial neural network (ANN), often just called a "neural network" (NN), is a mathematical or computational model based on biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. [4]

Figure: A biological neuron. A neuron consists of a cell body (soma), a number of fibers called dendrites, and a single long fiber called the axon.

Analogy between a biological and an artificial neural network:

Biological Neural Network    Artificial Neural Network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight

An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain. The neurons are connected by weighted links passing signals from one neuron to another. The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network. [3] In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. In more practical terms, neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data. [4]

Neuron, the simplest computational element

A neural network is a network of simple processing elements (neurons) which can exhibit complex global behavior, determined by the connections between the processing elements and the element parameters. [4] An artificial neuron, also called a semi-linear unit, Nv neuron, binary neuron or McCulloch-Pitts neuron, is an abstraction of biological neurons and the basic unit in an artificial neural network. The artificial neuron receives one or more inputs (representing the dendrites) and sums them to produce an output (representing the axon). Usually the inputs are weighted, and the weighted sum is passed through a non-linear function known as an activation or transfer function. [4]

Figure: An artificial neuron.

Basic structure of a neuron:

For a given artificial neuron, let there be n + 1 inputs with signals x0 through xn and weights w0 through wn. The output of the neuron is:

y = φ( Σ_{i=0}^{n} w_i·x_i )

where φ (phi) is the transfer function.

Activation functions of a neuron (types of transfer functions), with X denoting the neuron's weighted input:

Step function: Y = 1 if X ≥ 0, Y = 0 if X < 0

Sign function: Y = +1 if X ≥ 0, Y = -1 if X < 0

Sigmoid function: Y = 1 / (1 + e^(-X))

Linear function: Y = X
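To make this concrete, the following MATLAB sketch (an illustration added here, with made-up input values and weights) computes the output of a single neuron for each of the four activation functions above:

% A single artificial neuron evaluated with different activation functions.
x = [0.5; -1.2; 0.8];                 % example input signals (assumed values)
w = [0.4; 0.1; -0.7];                 % corresponding weights (assumed values)
theta = 0.2;                          % threshold

X = w' * x - theta;                   % weighted sum (linear combiner)

Y_step    = double(X >= 0);                        % step function
Y_sign    = sign(X); if X == 0, Y_sign = 1; end    % sign function (+1 / -1)
Y_sigmoid = 1 / (1 + exp(-X));                     % sigmoid function
Y_linear  = X;                                     % linear function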

Perceptron, the simplest Neural Network


In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron. The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.

The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter. The aim of the perceptron is to classify inputs x1, x2, . . ., xn into one of two classes, say A1 and A2. In the case of an elementary perceptron, the n-dimensional space is divided by a hyper-plane into two decision regions. The hyper-plane is defined by the linearly separable function:

Σ_{i=1}^{n} x_i·w_i - θ = 0

Linear separability in the perceptron:

Figure 4 shows graphs of linearly separable logic functions. In these graphs, the two axes are the inputs, which can take the value of either 0 or 1, and the numbers on the graph are the expected outputs for particular inputs. Using an appropriate weight vector for each case, a single perceptron can perform the AND and OR functions. However, not all logic operators are linearly separable. For instance, since it is impossible to draw a single line dividing the regions containing either 1 or 0, the XOR function is not linearly separable and cannot be realized by a single perceptron. This problem can be overcome by using more than one perceptron arranged in a feed-forward network.
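As a small MATLAB illustration (the weight vectors and thresholds below are hand-picked assumptions, not taken from the text), a single perceptron with a hard limiter reproduces AND and OR, while no single weight vector and threshold reproduces XOR:

% All four input combinations, one per row.
x = [0 0; 0 1; 1 0; 1 1];

w_and = [1 1];  theta_and = 1.5;    % assumed weights/threshold for AND
w_or  = [1 1];  theta_or  = 0.5;    % assumed weights/threshold for OR

Y_and = double(x * w_and' - theta_and >= 0)   % gives [0 0 0 1]'
Y_or  = double(x * w_or'  - theta_or  >= 0)   % gives [0 1 1 1]'
% No choice of w and theta yields the XOR pattern [0 1 1 0]'.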
How does the perceptron learn its classification tasks? [3]
This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain an output consistent with the training examples. If at iteration p the actual output is Y(p) and the desired output is Yd(p), then the error is given by:

e(p) = Yd(p) - Y(p),   where p = 1, 2, 3, ...

Iteration p here refers to the pth training example presented to the perceptron. If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).

The perceptron learning rule

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.

w_i(p + 1) = w_i(p) + α·x_i(p)·e(p),   where p = 1, 2, 3, . . .

Here α is the learning rate, a positive constant less than unity.

The perceptron training algorithm:

Step 1: Initialization
Set initial weights w1, w2, ..., wn and threshold θ to random numbers in the range [-0.5, 0.5].

Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), ..., xn(p) and desired output Yd(p). Calculate the actual output at iteration p:

Y(p) = step[ Σ_{i=1}^{n} x_i(p)·w_i(p) - θ ]

where n is the number of the perceptron inputs, and step is a step activation function.

Step 3: Weight training
Update the weights of the perceptron:

w_i(p + 1) = w_i(p) + Δw_i(p)

where Δw_i(p) is the weight correction at iteration p. The weight correction is computed by the delta rule:

Δw_i(p) = α·x_i(p)·e(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
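A minimal MATLAB sketch of this training algorithm, assuming the logical AND function as the training target (the data, learning rate and epoch limit are illustrative, not from the text):

X  = [0 0; 0 1; 1 0; 1 1];     % training inputs
Yd = [0; 0; 0; 1];             % desired outputs (AND)

alpha = 0.1;                   % learning rate
w     = rand(1,2) - 0.5;       % Step 1: initial weights in [-0.5, 0.5]
theta = rand - 0.5;            % initial threshold

for epoch = 1:100
    n_errors = 0;
    for p = 1:size(X,1)
        Y = double(X(p,:) * w' - theta >= 0);    % Step 2: step activation
        e = Yd(p) - Y;                           % error e(p)
        if e ~= 0
            w     = w + alpha * e * X(p,:);      % Step 3: delta rule
            theta = theta - alpha * e;           % threshold as weight on fixed input -1
            n_errors = n_errors + 1;
        end
    end
    if n_errors == 0, break; end                 % Step 4: converged
end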

Two-dimensional plots of basic logical operations

A perceptron can learn the operations AND and OR, but not Exclusive-OR.


Multilayer neural networks

A multilayer perceptron is a feed-forward neural network with one or more hidden layers. The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons. The input signals are propagated in a forward direction on a layer-by-layer basis.

What does the middle layer hide? A hidden layer hides its desired output. Neurons in the hidden layer cannot be observed through the input/output behavior of the network. There is no obvious way to know what the desired output of the hidden layer should be. Commercial ANNs incorporate three and sometimes four layers, including one or two hidden layers. Each layer can contain from 10 to 1000 neurons. Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilize millions of neurons.

Back-propagation neural network

Learning in a multilayer network proceeds the same way as for a perceptron. A training set of input patterns is presented to the network. The network computes its output pattern, and if there is an error (in other words, a difference between the actual and desired output patterns), the weights are adjusted to reduce this error. In a back-propagation neural network, the learning algorithm has two phases. First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer. If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.

The back-propagation training algorithm

Step 1: Initialization
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range

( -2.4/Fi , +2.4/Fi )

where Fi is the total number of inputs of neuron i in the network. The weight initialization is done on a neuron-by-neuron basis (a short sketch of this step is given after Step 4).

Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), ..., xn(p) and desired outputs yd,1(p), yd,2(p), ..., yd,n(p).

(a) Calculate the actual outputs of the neurons in the hidden layer:

y_j(p) = sigmoid[ Σ_{i=1}^{n} x_i(p)·w_ij(p) - θ_j ]

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.

(b) Calculate the actual outputs of the neurons in the output layer:

y_k(p) = sigmoid[ Σ_{j=1}^{m} x_jk(p)·w_jk(p) - θ_k ]

where m is the number of inputs of neuron k in the output layer.

Step 3: Weight training
Update the weights in the back-propagation network, propagating backward the errors associated with the output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

δ_k(p) = y_k(p)·[1 - y_k(p)]·e_k(p),   where e_k(p) = y_d,k(p) - y_k(p)

Calculate the weight corrections:

Δw_jk(p) = α·y_j(p)·δ_k(p)

Update the weights at the output neurons:

w_jk(p + 1) = w_jk(p) + Δw_jk(p)

(b) Calculate the error gradient for the neurons in the hidden layer:

δ_j(p) = y_j(p)·[1 - y_j(p)]·Σ_{k=1}^{l} δ_k(p)·w_jk(p)

where l is the number of neurons in the output layer.

Calculate the weight corrections:

Δw_ij(p) = α·x_i(p)·δ_j(p)

Update the weights at the hidden neurons:

w_ij(p + 1) = w_ij(p) + Δw_ij(p)

Step 4: Iteration Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.
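A small MATLAB sketch of the weight initialization from Step 1 (Fi = 4 is an arbitrary assumption, not a value from the text):

Fi = 4;                                  % example: neuron i has 4 inputs
w  = (2*rand(1,Fi) - 1) * (2.4/Fi);      % weights uniform in (-2.4/Fi, +2.4/Fi)
theta = (2*rand - 1) * (2.4/Fi);         % threshold initialized the same way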

Example: As an example, we may consider a three-layer back-propagation network. Suppose that the network is required to perform the logical operation Exclusive-OR. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.

Figure: Three-layer network for solving the Exclusive-OR operation.

The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to -1. The initial weights and threshold levels are set randomly as follows: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.
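The following MATLAB sketch (added here as an illustration; the learning rate is an assumption, not from the text) performs one forward and one backward pass of this network for the training example x1 = x2 = 1 with desired output yd,5 = 0, using the initial weights and thresholds listed above:

% One training iteration of the 2-2-1 back-propagation network for XOR.
x1 = 1; x2 = 1; yd5 = 0;                 % training example (1,1) -> 0
alpha = 0.1;                             % assumed learning rate

w13 = 0.5; w14 = 0.9; w23 = 0.4; w24 = 1.0;
w35 = -1.2; w45 = 1.1;
t3 = 0.8; t4 = -0.1; t5 = 0.3;           % thresholds (theta)

sigmoid = @(X) 1 ./ (1 + exp(-X));

% Forward pass (Step 2)
y3 = sigmoid(x1*w13 + x2*w23 - t3);
y4 = sigmoid(x1*w14 + x2*w24 - t4);
y5 = sigmoid(y3*w35 + y4*w45 - t5);
e5 = yd5 - y5;                           % output error

% Backward pass (Step 3)
d5 = y5*(1 - y5)*e5;                     % error gradient, output neuron
d3 = y3*(1 - y3)*d5*w35;                 % error gradients, hidden neurons
d4 = y4*(1 - y4)*d5*w45;

% Weight and threshold corrections (thresholds use a fixed input of -1)
w35 = w35 + alpha*y3*d5;  w45 = w45 + alpha*y4*d5;  t5 = t5 + alpha*(-1)*d5;
w13 = w13 + alpha*x1*d3;  w23 = w23 + alpha*x2*d3;  t3 = t3 + alpha*(-1)*d3;
w14 = w14 + alpha*x1*d4;  w24 = w24 + alpha*x2*d4;  t4 = t4 + alpha*(-1)*d4;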

Decision boundaries

(a) Decision boundary constructed by hidden neuron 3; (b) Decision boundary constructed by hidden neuron 4; (c) Decision boundaries constructed by the complete three-layer network

The Neural Network Toolbox in Matlab


The Neural Network Toolbox makes it easier to use neural networks in Matlab. The toolbox consists of a set of functions and structures that handle neural networks, so we do not need to write code for all the activation functions, training algorithms, etc. that we want to use.

The Neural Network Toolbox is contained in a directory called nnet. Type help nnet for a listing of help topics.

The Structure of the Neural Network Toolbox


The toolbox is based on the network object. This object contains information about everything that concerns the neural network, e.g. the number and structure of its layers, the connectivity between the layers, etc. Matlab provides high-level network creation functions, like newlin (create a linear layer), newp (create a perceptron) or newff (create a feed-forward backpropagation network), to allow an easy construction of networks. As an example we construct a perceptron with two inputs ranging from -2 to 2: >> net = newp([-2 2;-2 2],1) First the architecture parameters and the subobject structures are shown:

subobject structures:
inputs: {1x1 cell} of inputs
layers: {1x1 cell} of layers
outputs: {1x1 cell} containing 1 output
targets: {1x1 cell} containing 1 target
biases: {1x1 cell} containing 1 bias
inputWeights: {1x1 cell} containing 1 input weight
layerWeights: {1x1 cell} containing no layer weights

The latter contain information about the individual objects of the network. Each layer consists of neurons with the same transfer function net.transferFcn and net input function net.netInputFcn, which in the case of perceptrons are hardlim and netsum. If neurons should have different transfer functions then they have to be arranged in different layers. The parameters net.inputWeights and net.layerWeights specify, among other things, the applied learning functions and their parameters. The next paragraph of the output contains the training, initialization and performance functions:

functions:
adaptFcn: 'trains'
initFcn: 'initlay'
performFcn: 'mae'
trainFcn: 'trainc'

The trainFcn and adaptFcn are used for the two different learning types, batch learning and incremental (on-line) learning. By setting the trainFcn parameter you tell Matlab which training algorithm should be used, which is in our case the cyclical order incremental training/learning function trainc. The ANN toolbox includes almost 20 training functions. The performance function is the function that determines how well the ANN is doing its task. For a perceptron it is the mean absolute error performance function mae. For linear regression usually the mean squared error performance function mse is used. The initFcn is the function that initializes the weights and biases of the network. To get a list of the functions that are available type help nnet. To change one of these functions to another one in the toolbox, or to one that you have created, just assign the name of the function to the parameter, e.g. >> net.trainFcn = 'mytrainingfun'; The parameters that concern these functions are listed in the next paragraph.

parameters:
adaptParam: .passes
initParam: (none)
performParam: (none)
trainParam: .epochs, .goal, .show, .time

By changing these parameters you can change the default behavior of the functions mentioned above. The parameters you will use the most are probably the components of trainParam. The most used of these are net.trainParam.epochs, which tells the algorithm the maximum number of epochs to train, and net.trainParam.show, which tells the algorithm how many epochs there should be between each presentation of the performance. Type help train for more information.

The weights and biases are also stored in the network structure:

weight and bias values:
IW: {1x1 cell} containing 1 input weight matrix
LW: {1x1 cell} containing no layer weight matrices
b: {1x1 cell} containing 1 bias vector

The .IW{i,j} component is a two-dimensional cell array that holds the weights of the connection between input j and network layer i. The .LW{i,j} component holds the weight matrix for the connection from network layer j to layer i. The cell array b contains the bias vector for each layer.
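For instance, for the perceptron created earlier these values can be inspected directly (a small illustrative usage added here):

>> net.IW{1,1}   % weight matrix from input 1 to layer 1
>> net.b{1}      % bias vector of layer 1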

A Classification Task

Figure 1: Data set X projected to two dimensions.

As an example, our task is to create and train a perceptron that correctly classifies point sets belonging to three different classes. First we load the data from the file winedata.mat: >> load winedata X C Each row of X represents a sample point whose class is specified by the corresponding element (row) in C. Next the data is transformed into the input/output format used by the Neural Network Toolbox: >> P=X'; where P(:,i) is the ith point. Since we want to classify three different classes we use 3 perceptrons, each for the classification of one class. The corresponding targets are generated by >> T=ind2vec(C); To create the perceptron layer with the correct input range type >> net=newp(minmax(P),size(T,1));

The difference between train and adapt


Both functions, train and adapt, are used for training a neural network, and most of the time both can be used for the same network. The most important difference has to do with incremental training (updating the weights after the presentation of each single training sample) versus batch training (updating the weights after each presentation of the complete data set).

Adapt
First, set net.adaptFcn to the desired adaptation function. We'll use an adaptation function that allows for a separate update algorithm for each layer; here this is trains (in older toolbox versions adaptwb, 'adapt weights and biases'). Again, check the Matlab documentation for a complete overview of possible update algorithms. >> net.adaptFcn = 'trains'; Next, since we're using trains, we'll have to set the learning function for all weights and biases: >> net.inputWeights{1,1}.learnFcn = 'learnp'; >> net.biases{1}.learnFcn = 'learnp'; where learnp is the perceptron learning rule. Finally, a useful parameter is net.adaptParam.passes, which is the maximum number of times the complete training set may be used for updating the network: >> net.adaptParam.passes = 1; When using adapt, both incremental and batch training can be used. Which one is actually used depends on the format of your training set. If it consists of two matrices of input and target vectors, like >> [net,y,e] = adapt(net,P,T); the network will be updated using batch training. Note that all elements of the matrix y are one, because the weights are not updated until the whole training set has been presented.

If the training set is given in the form of a cell array >> for i = 1:length(P), P2{i} = P(:,i); T2{i}= T(:,i); end >> net = init(net); >> [net,y2,e2] = adapt(net,P2,T2); then incremental training will be used. Notice that the weights had to be initialized before the network adaptation was started. Since adapt takes a lot more time than train, we continue our analysis with the second algorithm.

Train
When using train, on the other hand, only batch training will be used, regardless of the format of the data (you can use both formats). The advantage of train is that it provides a lot more choice in training functions (gradient descent, gradient descent with momentum, Levenberg-Marquardt, etc.), which are implemented very efficiently. So for static networks (no tapped delay lines) train is usually the better choice.

We set

>> net.trainFcn = 'trainb'; for batch learning, or >> net.trainFcn = 'trainc'; for on-line learning. Which training parameters are present depends in general on your choice of training function. In our case two useful parameters are net.trainParam.epochs, which is the maximum number of times the complete data set may be used for training, and net.trainParam.show, which is the time between status reports of the training function. For example, >> net.trainParam.epochs = 1000; >> net.trainParam.show = 100; We initialize and train the network with >> net = init(net); >> [net,tr] = train(net,P,T); The training error is calculated with >> Y=sim(net,P); >> train_error=mae(Y-T) train_error = 0.3801 So we see that the three classes of the data set were not linearly separable. The best time to stop learning would have been >> [min_perf,min_epoch]=min(tr.perf) min_perf = 0.1948 min_epoch = 703

Figure 2: Performance of the learning algorithm train over 1000 epochs.

A Simple logical problem


The task is to create and train a neural network that solves the XOR problem. XOR is a function that returns 1 when the two inputs are not equal, and 0 when they are equal.

Construct a Feed-Forward Network


To solve this we will need a feed-forward neural network with two input neurons and one output neuron. Because the problem is not linearly separable, it will also need a hidden layer with two neurons. To create a new feed-forward neural network use the command newff. You have to enter the max and min of the input values, the number of neurons in each layer and, optionally, the activation functions. >> net = newff([0 1; 0 1],[2 1],{'logsig','logsig'}); The variable net will now contain an untrained feed-forward neural network with two neurons in the input layer, two neurons in the hidden layer and one output neuron, exactly as we want it. The [0 1; 0 1] tells Matlab that the input values range between 0 and 1. The {'logsig','logsig'} tells Matlab that we want to use the logsig function as the activation function in all layers. The first parameter already tells the network how many nodes there are in the input layer, hence you do not have to specify this in the second parameter. You have to specify at least as many transfer functions as there are layers, not counting the input layer. If you do not specify any transfer functions, Matlab will use the default settings.

First we construct a matrix of the inputs. The input to the network is always in the columns of the matrix. To create a matrix with the inputs "1 1", "1 0", "0 1" and "0 0" we enter:

>> input = [1 1 0 0; 1 0 1 0]
input =
     1     1     0     0
     1     0     1     0

Further we construct the target vector:

>> target = [0 1 1 0]
target =
     0     1     1     0

Train the Network via Backpropagation


In this example we do not need all the information that the training algorithm shows, so we turn it off by entering: >> net.trainParam.show=NaN; Let us apply the default training algorithm, Levenberg-Marquardt backpropagation (trainlm), to our network. An additional training parameter is .min_grad: if the gradient of the performance is less than .min_grad, the training is ended. To train the network enter: >> net = train(net,input,target); Because of the small size of the network, the training is done in only a second or two. Now we simulate the network, to see how it reacts to the inputs: >> output = sim(net,input) output = 0.0000 1.0000 1.0000 0.0000 That was exactly what we wanted the network to output! Now examine the weights that the training algorithm has set:

>> net.IW{1,1}
ans =
   11.0358   -9.5595
   16.8909  -17.5570

>> net.LW{2,1}
ans =
   25.9797  -25.7624

Graphical User Interface


A graphical user interface has been added to the toolbox. This interface allows you to:

Create networks
Enter data into the GUI
Initialize, train, and simulate networks
Export the training results from the GUI to the command line workspace
Import data from the command line workspace to the GUI

To open the Network/Data Manager window type nntool.

References:
1. Brief Discussion on Inductive/Deductive Profiling (http://www.investigativepsych.com/inductive.htm)
2. Cristianini, N. and Shawe-Taylor, J., An Introduction to Support Vector Machines and Other Kernel-Based Methods, Cambridge University Press, 2000.
3. Negnevitsky, M., Introduction to Knowledge-Based Intelligent Systems, Pearson Education, 2002.
4. Wikipedia articles on machine learning and artificial neural networks.
5. http://cse.stanford.edu/class/sophomore-college/projects-00/neural-networks/Neuron/index.html
