
WAP to implement Artificial Neural Network
Implementing Artificial Neural Network training process in Python
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the brain.
ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern
recognition or data classification, through a learning process. Learning largely involves adjustments to
the synaptic connections that exist between the neurons.

The brain consists of billions of cells called neurons. These neurons are connected
together by synapses which are nothing but the connections across which a neuron can send an
impulse to another neuron. When a neuron sends an excitatory signal to another neuron, then this
signal will be added to all of the other inputs of that neuron. If it exceeds a given threshold then it will
cause the target neuron to fire an action signal forward — this is how the thinking process works
internally.
In Computer Science, we model this process by creating “networks” on a computer using matrices.
These networks can be understood as abstractions of neurons, without all the biological complexities
taken into account. To keep things simple, we will model a simple NN with two layers, capable of
solving a linear classification problem.

Let’s say we have a problem where we want to predict an output given a set of inputs and outputs as
training examples like so (this is the same training set used in the code below):

Input 1   Input 2   Input 3   Output
   0         1         1         1
   1         0         0         0
   1         0         1         1

Test example:   1   0   1   ?

Note that the output is directly related to the third column, i.e. the value of input 3 is what the output is in
every training example. So for the test example the output value should be 1.
The training process consists of the following steps:
1. Forward Propagation:
Take the inputs, multiply by the weights (just use random numbers as weights)
Let y = Σ Wi·Ii = W1·I1 + W2·I2 + W3·I3
Pass the result through a sigmoid formula to calculate the neuron’s output. The Sigmoid
function is used to normalise the result between 0 and 1:
output = 1 / (1 + e^(-y))
2. Back Propagation
Calculate the error i.e the difference between the actual output and the expected output.
Depending on the error, adjust the weights by multiplying the error with the input and again with
the gradient of the Sigmoid curve:
Weight += Error × Input × Output × (1 − Output), where Output × (1 − Output) is the derivative of the sigmoid curve.
Note: Repeat the whole process for a few thousand iterations.
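
To make the two steps concrete, here is a minimal sketch of a single training iteration on one training example, using assumed starting weights (the full program below draws them at random instead):

import numpy as np

# One training example: inputs I and expected output (assumed values for illustration)
I = np.array([1.0, 0.0, 1.0])
expected = 1.0

# Assumed starting weights; the real program initialises them randomly
W = np.array([0.2, -0.4, 0.1])

# 1. Forward propagation: weighted sum, then sigmoid
y = np.dot(W, I)                    # y = W1*I1 + W2*I2 + W3*I3
output = 1.0 / (1.0 + np.exp(-y))   # normalise between 0 and 1

# 2. Back propagation: error times the sigmoid gradient, scaled by the inputs
error = expected - output
W += error * I * output * (1 - output)

print(output, W)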

Let’s code up the whole process in Python. We’ll use the NumPy library to handle all the
matrix calculations easily. You’ll need to install NumPy on your system to run the code.
Command to install NumPy:
sudo apt-get install python-numpy
Implementation:

import numpy as np

class NeuralNet(object):
    def __init__(self):
        # Seed the random number generator for reproducible results
        np.random.seed(1)

        # Assign random weights to a 3 x 1 matrix with values in [-1, 1)
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1

    # The Sigmoid function
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    # The derivative of the Sigmoid function.
    # This is the gradient of the Sigmoid curve.
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # Train the neural network and adjust the weights each time.
    def train(self, inputs, outputs, training_iterations):
        for iteration in range(training_iterations):
            # Pass the training set through the network.
            output = self.learn(inputs)

            # Calculate the error
            error = outputs - output

            # Adjust the weights by a factor
            factor = np.dot(inputs.T, error * self.__sigmoid_derivative(output))
            self.synaptic_weights += factor

    # The neural network thinks.
    def learn(self, inputs):
        return self.__sigmoid(np.dot(inputs, self.synaptic_weights))

if __name__ == "__main__":
    # Initialize
    neural_network = NeuralNet()

    # The training set.
    inputs = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 1]])
    outputs = np.array([[1, 0, 1]]).T

    # Train the neural network
    neural_network.train(inputs, outputs, 10000)

    # Test the neural network with a test example.
    print(neural_network.learn(np.array([1, 0, 1])))

Expected Output: After 10 iterations, our neural network predicts the value to be 0.65980921. That does
not look good, as the answer should really be 1. If we increase the number of iterations to 100, we
get 0.87680541. Our network is getting smarter! Subsequently, for 10000 iterations we get 0.9897704,
which is pretty close to 1 and indeed a satisfactory output.

WAP to implement Activation Functions
The activation function (also called the transfer function) is part of a neural network: an artificial
neuron is a signal collector at its inputs and an activation unit at its output, triggering a signal
that is forwarded to other neurons.

The artificial neuron receives one or more inputs and sums them to produce an output or activation.
Usually, the inputs to each node are weighted, and the weighted sum is passed through an activation function. In
most cases, it is the nonlinear activation function that allows such networks to compute nontrivial
problems using only a small number of nodes.
In biologically inspired neural networks, the activation function is usually an abstraction representing the
rate of action potential firing in the cell. Using Java as a programming language, this abstraction can be
designed using an activation function interface:
/**
 * Neural network's activation function interface.
 */
public interface ActivationFunction {

    /**
     * Performs calculation based on the sum of input neurons' output.
     *
     * @param summedInput
     *            neuron's sum of outputs, respectively inputs for the connected
     *            neuron
     *
     * @return Output calculated from the sum of inputs
     */
    double calculateOutput(double summedInput);
}
The implementations of this interface will provide a way to easily experiment with and replace various
types of activation functions. Let's start implementing them!
Step Function
The simplest form of activation function is binary — the neuron is either firing or not. The output y of this
activation function is binary, depending on whether the input meets a specified threshold, θ. The "signal"
is sent, i.e. the output is set to one if the activation meets the threshold.
This function is used in perceptrons. It performs a division of the space of inputs by a hyperplane. It is
especially useful in the last layer of a network intended to perform binary classification of the inputs.
/**
 * Step neuron activation function. The output y of this activation function is
 * binary, depending on whether the input meets a specified threshold, θ. The
 * "signal" is sent, i.e. the output is set to one, if the activation meets the
 * threshold.
 */
public class StepActivationFunction implements ActivationFunction {

    /**
     * Output value if the input is above or equal to the threshold
     */
    private double yAbove = 1d;

    /**
     * Output value if the input is below the threshold
     */
    private double yBelow = 0d;

    /**
     * The output of this activation function is binary, depending on whether
     * the input meets a specified threshold.
     */
    private double threshold = 0d;

    /**
     * {@inheritDoc}
     */
    @Override
    public double calculateOutput(double summedInput) {
        if (summedInput >= threshold) {
            return yAbove;
        } else {
            return yBelow;
        }
    }
}
Linear Combination
A linear combination is where the weighted sum of the neuron's inputs is summed with a linearly
dependent bias to build the neuron's output. A number of such linear neurons perform a linear
transformation of the input vector. This is usually more useful in the first layers of a network.

/**
 * Linear combination activation function implementation. The output unit is
 * simply the weighted sum of its inputs plus a bias term.
 */
public class LinearCombinationFunction implements ActivationFunction {

    /**
     * Bias value
     */
    private double bias;

    /**
     * {@inheritDoc}
     */
    @Override
    public double calculateOutput(double summedInput) {
        return summedInput + bias;
    }
}
Sigmoid Function
The Sigmoid function (also called the logistic function) is calculated using the following formula:

y = 1 / (1 + e^(-slope·x))

In this formula, the weighted input x is multiplied by a slope parameter.


/**
 * Sigmoid activation function. Calculation is based on:
 *
 * y = 1/(1 + e^(-slope*x))
 */
public class SigmoidActivationFunction implements ActivationFunction {

    /**
     * Slope parameter
     */
    private double slope = 1d;

    /**
     * Creates a Sigmoid function with a slope parameter.
     *
     * @param slope
     *            slope parameter to be set
     */
    public SigmoidActivationFunction(double slope) {
        this.slope = slope;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public double calculateOutput(double summedInput) {
        double denominator = 1 + Math.exp(-slope * summedInput);
        return (1d / denominator);
    }
}
Sinusoid Function
The Sinusoid activation function is based on calculating the sine of the weighted input.
/**
 * Sinusoid activation function. Calculation is based on:
 *
 * y = sin(x)
 */
public class SinusoidActivationFunction implements ActivationFunction {

    /**
     * {@inheritDoc}
     */
    @Override
    public double calculateOutput(double summedInput) {
        return Math.sin(summedInput);
    }
}
Rectified Linear Unit
This function is also known as a ramp function. According to Wikipedia:
It has been used in convolutional networks more effectively than the widely used logistic
sigmoid and its more practical counterpart, the hyperbolic tangent (not covered in this article).
The rectifier is, as of 2015, the most popular activation function for deep neural networks.
/**
 * Rectified Linear activation function
 */
public class RectifiedLinearActivationFunction implements ActivationFunction {

    /**
     * {@inheritDoc}
     */
    @Override
    public double calculateOutput(double summedInput) {
        return Math.max(0, summedInput);
    }
}
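
A quick way to see how these functions behave is to evaluate them side by side on the same net inputs. The sketch below uses Python rather than Java (matching the other programs in this manual), with an assumed slope of 1 for the sigmoid, a bias of 0 for the linear combination, and a threshold of 0 for the step function:

import numpy as np

def step(z, threshold=0.0):
    # binary output: 1 if the summed input reaches the threshold, else 0
    return 1.0 if z >= threshold else 0.0

def linear(z, bias=0.0):
    # weighted sum plus a bias term
    return z + bias

def sigmoid(z, slope=1.0):
    # squashes the summed input into (0, 1)
    return 1.0 / (1.0 + np.exp(-slope * z))

def sinusoid(z):
    # sine of the summed input
    return np.sin(z)

def relu(z):
    # ramp function: max(0, z)
    return max(0.0, z)

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, step(z), linear(z),
          round(float(sigmoid(z)), 3), round(float(sinusoid(z)), 3), relu(z))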
Wikipedia lists various other activation functions that might be applied in neural networks;
this article presented some of them, implemented in Java. More will be covered in future
articles presenting applications of various types, as well as the learning process in neural networks.

WAP to implement Adaptive prediction in ADALINE NN
The key difference between the Adaline rule (also known as the Widrow-Hoff rule) and Rosenblatt's
perceptron is that the weights are updated based on a linear activation function rather than a unit step
function like in the Perceptron model.

(Figure: Perceptron vs. Adaptive Linear Neuron)

The difference is that we're going to use the continuous valued output from the linear activation
function to compute the model error and update the weights, rather than the binary class labels.
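
In code, the contrast is just a matter of which output enters the error term. Here is a minimal sketch with assumed toy values (the variable names are illustrative and not taken from the implementation further below):

import numpy as np

# Assumed toy values: one sample xi, its true label, and a learning rate
xi, target, eta = np.array([1.0, 2.0]), 1, 0.01
w = np.zeros(1 + xi.shape[0])              # w[0] acts as the bias

z = np.dot(xi, w[1:]) + w[0]               # net input

# Perceptron: the error is computed from the thresholded (unit step) output
perceptron_error = target - np.where(z >= 0.0, 1, -1)

# Adaline (Widrow-Hoff): the error is computed from the continuous linear output
adaline_error = target - z

# The resulting weight changes, one per rule
print("perceptron delta:", eta * perceptron_error * xi)
print("adaline delta:   ", eta * adaline_error * xi)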
Artificial neurons
The perceptron algorithm enables the model to automatically learn the optimal weight coefficients, which
are then multiplied with the input features in order to decide whether a neuron fires or not.
In supervised learning and classification, such an algorithm can then be used to predict whether a sample
belongs to one class or the other.
In the binary perceptron classifier, we refer to our two classes as 1 (positive class) and -1
(negative class).
In the context of neural networks, a perceptron is an artificial neuron using the Heaviside step
function as the activation function.
The perceptron algorithm is also termed the single-layer perceptron, to distinguish it from
a multilayer perceptron. As a linear classifier, the single-layer perceptron is the
simplest feedforward neural network.
We can then define an activation function ϕ(z) that takes a linear combination of certain input
values x and a corresponding weight vector w, where z is the so-called net
input (z = w1·x1 + ... + wm·xm):

w = [w1, ..., wm],  x = [x1, ..., xm]

For our case (binary classification), we can have:

ϕ(z) = 1 if Σ(i=1..m) wi·xi ≥ θ, and ϕ(z) = −1 otherwise

If we set x0 = 1 and w0 = −θ, we get the more compact form z = Σ(i=0..m) wi·xi and:

ϕ(z) = 1 if z ≥ 0, and ϕ(z) = −1 otherwise

The two activation functions compared here are therefore:
1. Heaviside step activation function (perceptron):
ϕ(z) = 1 if z ≥ 0, −1 otherwise
2. Linear activation function (Adaline):
ϕ(z) = z
where θ is a threshold or bias.
Implementation - Adaptive Linear Neuron
Since the perceptron rule and Adaptive Linear Neuron are very similar, we can take the perceptron
implementation that we defined earlier and change the fit method so that the weights are updated
by minimizing the cost function via gradient descent.

Here is the source code:

import numpy as np

class AdaptiveLinearNeuron(object):
    def __init__(self, rate=0.01, niter=10):
        self.rate = rate
        self.niter = niter

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """
        # weights
        self.weight = np.zeros(1 + X.shape[1])

        # Number of misclassifications
        self.errors = []

        # Cost function
        self.cost = []

        for i in range(self.niter):
            output = self.net_input(X)
            errors = y - output
            self.weight[1:] += self.rate * X.T.dot(errors)
            self.weight[0] += self.rate * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost.append(cost)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', -1, 1)
X = df.iloc[0:100, [0, 2]].values

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(8, 4))

# learning rate = 0.01
aln1 = AdaptiveLinearNeuron(0.01, 10).fit(X, y)

ax[0].plot(range(1, len(aln1.cost) + 1), np.log10(aln1.cost), marker='o')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('log(Sum-squared-error)')
ax[0].set_title('Adaptive Linear Neuron - Learning rate 0.01')

# learning rate = 0.0001
aln2 = AdaptiveLinearNeuron(0.0001, 10).fit(X, y)

ax[1].plot(range(1, len(aln2.cost) + 1), aln2.cost, marker='o')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Sum-squared-error')
ax[1].set_title('Adaptive Linear Neuron - Learning rate 0.0001')
plt.show()

Gradient descent

As we can see in the resulting cost function plots, we have two different types of issues.
The left one shows what could happen if we choose a learning rate that is too large: instead of
minimizing the cost function, the error becomes larger in every epoch because we overshoot the
global minimum.
On the other hand, we can see that the cost decreases for the plot on the right side, but the chosen
learning rate η = 0.0001 is so small that the algorithm would require a very large number of epochs
to converge.

The following figure demonstrates how we change the value of a particular weight parameter to
minimize the cost function J (left). The figure on the right illustrates what happens if we choose a
learning rate that is too large: we overshoot the global minimum.

Picture from "Python Machine Learning by Sebastian Raschka, 2015"
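
The overshooting behaviour can be reproduced with a toy one-dimensional cost J(w) = w², whose gradient is 2w. The sketch below is only an illustration of the idea, not part of the Adaline code:

# Toy illustration: gradient descent on J(w) = w**2, whose gradient is 2*w
def descend(eta, steps=5, w=1.0):
    path = [w]
    for _ in range(steps):
        w -= eta * 2 * w          # gradient descent update
        path.append(round(w, 4))
    return path

print("small learning rate (converges):", descend(eta=0.1))
print("large learning rate (diverges): ", descend(eta=1.1))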

Feature scaling
Feature scaling is a method used to standardize the range of independent variables or features of
data. In data processing, it is also known as data normalization and is generally performed during
the data preprocessing step.
Gradient descent is one of the many algorithms that benefit from feature scaling.
Here, we will use a feature scaling method called standardization, which gives our data the property
of a standard normal distribution.
In machine learning, we can handle various types of data, e.g. audio signals and pixel values for
image data, and this data can include multiple dimensions.
Feature standardization makes the values of each feature in the data have zero mean (by
subtracting the mean in the numerator) and unit variance.
This method is widely used for normalization in many machine learning algorithms (e.g., support
vector machines, logistic regression, and neural networks).
This is typically done by calculating standard scores.
The general method of calculation is to determine the distribution mean and standard deviation for
each feature. Next we subtract the mean from each feature. Then we divide the values (mean is
already subtracted) of each feature by its standard deviation.
- from Feature scaling
So, to standardize the j-th feature, we just need to subtract the sample mean μj from every training
sample and divide it by its standard deviation σj:

x'j = (xj − μj) / σj

where xj is a vector consisting of the j-th feature values of all n training samples.
We can standardize by using the NumPy methods mean and std:

X_std = np.copy(X)
X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X_std[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()

After the standardization, we will train the Adaline model again using the not-so-small learning rate
of η = 0.01:

Here is our new code for the two pictures above:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap

class AdaptiveLinearNeuron(object):
    def __init__(self, rate=0.01, niter=10):
        self.rate = rate
        self.niter = niter

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """
        # weights
        self.weight = np.zeros(1 + X.shape[1])

        # Number of misclassifications
        self.errors = []

        # Cost function
        self.cost = []

        for i in range(self.niter):
            output = self.net_input(X)
            errors = y - output
            self.weight[1:] += self.rate * X.T.dot(errors)
            self.weight[0] += self.rate * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost.append(cost)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)

def plot_decision_regions(X, y, classifier, resolution=0.02):
    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # plot class samples
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, c=cmap(idx),
                    marker=markers[idx], label=cl)

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)

y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', -1, 1)
X = df.iloc[0:100, [0, 2]].values

# standardize
X_std = np.copy(X)
X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X_std[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

# learning rate = 0.01
aln = AdaptiveLinearNeuron(0.01, 10)
aln.fit(X_std, y)

# decision region plot
plot_decision_regions(X_std, y, classifier=aln)

plt.title('Adaptive Linear Neuron - Gradient Descent')
plt.xlabel('sepal length [standardized]')
plt.ylabel('petal length [standardized]')
plt.legend(loc='upper left')
plt.show()

# cost per epoch plot
plt.plot(range(1, len(aln.cost) + 1), aln.cost, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Sum-squared-error')
plt.show()
