Implementing Artificial Neural Network training process in Python
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the brain.
ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern
recognition or data classification, through a learning process. Learning largely involves adjustments to
the synaptic connections that exist between the neurons.
The brain consists of hundreds of billions of cells called neurons. These neurons are connected
together by synapses which are nothing but the connections across which a neuron can send an
impulse to another neuron. When a neuron sends an excitatory signal to another neuron, then this
signal will be added to all of the other inputs of that neuron. If it exceeds a given threshold then it will
cause the target neuron to fire an action signal forward — this is how the thinking process works
internally.
In Computer Science, we model this process by creating “networks” on a computer using matrices.
These networks can be understood as abstractions of neurons, without all of the biological
complexity taken into account. To keep things simple, we will just model a simple NN, with two
layers, capable of solving a linear classification problem.
Let’s say we have a problem where we want to predict the output for a given set of inputs, with
training examples like so:
Note that the output is directly related to the third column, i.e. the value of input 3 is the output in
every training example in fig. 2. So for the test example the output value should be 1.
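The original table is not reproduced here; as a sketch, hypothetical training data consistent with the rule just described (the output always equals the third input) could look like this:

```python
import numpy as np

# Hypothetical training data: the output equals the third input column.
inputs = np.array([[0, 0, 1],
                   [1, 1, 1],
                   [1, 0, 1],
                   [0, 1, 0]])
outputs = np.array([[1, 1, 1, 0]]).T

test_example = np.array([1, 0, 1])  # its third value is 1, so we expect output 1
```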
The training process consists of the following steps:
1. Forward Propagation:
Take the inputs, multiply by the weights (just use random numbers as weights)
Let Y = Σ WiIi = W1I1 + W2I2 + W3I3
Pass the result through a sigmoid formula to calculate the neuron’s output. The Sigmoid
function is used to normalise the result between 0 and 1:
1 / (1 + e^(-Y))
2. Back Propagation
Calculate the error, i.e. the difference between the expected output and the actual output.
Depending on the error, adjust the weights by multiplying the error with the input and again with
the gradient of the Sigmoid curve:
Weight += Error × Input × Output × (1 − Output), where Output × (1 − Output) is the derivative of the Sigmoid curve.
Note: Repeat the whole process for a few thousand iterations.
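The two steps can be sketched on a single training example. The inputs, initial weights, and expected output below are made-up illustrative values, not the ones from the figure:

```python
import numpy as np

I = np.array([0.0, 1.0, 1.0])    # hypothetical inputs
W = np.array([0.5, -0.3, 0.8])   # hypothetical random initial weights
expected = 1.0

# Step 1, forward propagation: weighted sum, then sigmoid
y = W.dot(I)                      # W1*I1 + W2*I2 + W3*I3
output = 1 / (1 + np.exp(-y))

# Step 2, back propagation: Weight += Error * Input * Output * (1 - Output)
error = expected - output
W += error * I * output * (1 - output)

# After one update, the same input produces a smaller error
new_output = 1 / (1 + np.exp(-W.dot(I)))
```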
Let’s code up the whole process in Python. We’ll be using the NumPy library to handle all the
matrix calculations easily. You’ll need to install NumPy on your system to run the code.
Command to install NumPy:
sudo apt-get install python-numpy
Implementation:
import numpy as np

class NeuralNet(object):
    def __init__(self):
        # Seed the generator so we get the same random weights every run
        np.random.seed(1)
        # Random weights in [-1, 1) for 3 inputs -> 1 output
        self.weights = 2 * np.random.random((3, 1)) - 1

    def predict(self, inputs):
        # Forward propagation: weighted sum passed through the sigmoid
        return 1 / (1 + np.exp(-np.dot(inputs, self.weights)))

    # Train the neural network and adjust the weights each time.
    def train(self, inputs, outputs, training_iterations):
        for iteration in range(training_iterations):
            output = self.predict(inputs)
            error = outputs - output
            # Back propagation: Error * Input * Output * (1 - Output)
            self.weights += np.dot(inputs.T, error * output * (1 - output))

if __name__ == "__main__":
    # Initialize
    neural_network = NeuralNet()
Expected Output: After 10 iterations our neural network predicts the value to be 0.65980921. That
does not look good, as the answer should really be 1. If we increase the number of iterations to 100,
we get 0.87680541. Our network is getting smarter! With 10,000 iterations we get 0.9897704, which
is pretty close to 1 and indeed a satisfactory output.
Write a Program (WAP) to implement Activation Functions
The activation function (also called the transfer function) is a key part of a neural network:
an artificial neuron is a signal collector on its inputs and an activation unit on its output, triggering a
signal that will be forwarded to other neurons, as shown in the following picture:
The artificial neuron receives one or more inputs and sums them to produce an output or activation.
Usually, the sums of each node are weighted and the sum is passed through an activation function. In
most cases, it is the nonlinear activation function that allows such networks to compute nontrivial
problems using only a small number of nodes.
In biologically inspired neural networks, the activation function is usually an abstraction representing the
rate of action potential firing in the cell. Using Java as a programming language, this abstraction can be
designed using an activation function interface:
/**
 * Neural network's activation function interface.
 */
public interface ActivationFunction {

    /**
     * Performs a calculation based on the neuron's summed input.
     *
     * @param summedInput
     *            the weighted sum of the connected neurons' outputs,
     *            i.e. this neuron's input
     *
     * @return the output calculated from the sum of inputs
     */
    double calculateOutput(double summedInput);
}
The implementations of this interface will provide a way to easily experiment with and replace various
types of activation functions. Let's start implementing them!
Step Function
The simplest form of activation function is binary — the neuron is either firing or not. The output y of this
activation function is binary, depending on whether the input meets a specified threshold, θ. The "signal"
is sent, i.e. the output is set to one if the activation meets the threshold.
This function is used in perceptrons. It performs a division of the space of inputs by a hyperplane. It is
especially useful in the last layer of a network intended to perform binary classification of the inputs.
/**
 * Step neuron activation function. The output y of this activation function
 * is binary, depending on whether the input meets a specified threshold θ.
 * The "signal" is sent, i.e. the output is set to one, if the activation
 * meets the threshold.
 */
public class StepActivationFunction implements ActivationFunction {

    /**
     * Output value if the input is above or equal to the threshold
     */
    private double yAbove = 1d;

    /**
     * Output value if the input is below the threshold
     */
    private double yBelow = 0d;

    /**
     * The output of this activation function is binary, depending on whether
     * the input meets a specified threshold.
     */
    private double threshold = 0d;

    /**
     * {@inheritDoc}
     */
    @Override
    public double calculateOutput(double summedInput) {
        if (summedInput >= threshold) {
            return yAbove;
        } else {
            return yBelow;
        }
    }
}
Linear Combination
A linear combination is where the weighted sum input of the neuron is summed with a linearly
dependent bias to build the neuron's output. A number of such linear neurons perform a linear
transformation of the input vector. This is usually more useful in the first layers of a network.
/**
 * Linear combination activation function implementation; the output unit is
 * simply the weighted sum of its inputs plus a bias term.
 */
public class LinearCombinationFunction implements ActivationFunction {

    /**
     * Bias value
     */
    private double bias;

    /**
     * {@inheritDoc}
     */
    @Override
    public double calculateOutput(double summedInput) {
        return summedInput + bias;
    }
}
Sigmoid Function
The Sigmoid function (also called the logistic function) is calculated using the following formula:

σ(z) = 1 / (1 + e^(-z))

Perceptron vs. Adaptive linear neuron
The difference is that the adaptive linear neuron uses the continuous valued output from the linear
activation function to compute the model error and update the weights, rather than the binary class labels.
Artificial neurons
The perceptron algorithm enables the model to automatically learn the optimal weight coefficients,
which are then multiplied with the input features in order to decide whether a neuron fires or
not.
In supervised learning and classification, such an algorithm can be used to predict whether a sample
belongs to one class or the other.
In the binary perceptron algorithm, we refer to our two classes as 1 (positive class) and -1
(negative class).
In the context of neural networks, a perceptron is an artificial neuron using the Heaviside step
function as the activation function.
The perceptron algorithm is also termed the single-layer perceptron, to distinguish it from
a multilayer perceptron. As a linear classifier, the single-layer perceptron is the
simplest feedforward neural network.
We can then define an activation function ϕ(z) that takes a linear combination of certain input
values x and a corresponding weight vector w, where z is the so-called net input:

z = w1x1 + ... + wmxm,  where  w = [w1 ... wm],  x = [x1 ... xm]
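In NumPy, the net input z is just a dot product. The weight and input values below are arbitrary illustrative numbers:

```python
import numpy as np

w = np.array([0.2, -0.5, 0.1])  # hypothetical weight vector
x = np.array([1.0, 2.0, 3.0])   # hypothetical input sample

z = np.dot(w, x)  # net input z = w1*x1 + w2*x2 + w3*x3
```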
import numpy as np

class AdaptiveLinearNeuron(object):
    def __init__(self, rate=0.01, niter=10):
        self.rate = rate
        self.niter = niter

    def fit(self, X, y):
        # weights (bias followed by one weight per feature)
        self.weight = np.zeros(1 + X.shape[1])
        # Cost function value for each epoch
        self.cost = []
        for i in range(self.niter):
            output = self.net_input(X)
            errors = y - output
            self.weight[1:] += self.rate * X.T.dot(errors)
            self.weight[0] += self.rate * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost.append(cost)
        return self

    def net_input(self, X):
        # Net input z = w1x1 + ... + wmxm plus the bias unit
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def predict(self, X):
        # Class label: 1 if the net input is >= 0, else -1
        return np.where(self.net_input(X) >= 0.0, 1, -1)
import pandas as pd

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', -1, 1)
X = df.iloc[0:100, [0, 2]].values
Gradient descent
As we can see in the resulting cost function plots below, we have two different types of issues.
The left one shows what could happen if we choose a learning rate that is too large. Instead of
minimizing the cost function, the error becomes larger in every epoch because we overshoot the
global minimum.
On the other hand, we can see that the cost decreases for the plot on the right side. That's because
we chose a learning rate η = 0.0001 that is so small the algorithm requires a very large
number of epochs to converge.
The following figure demonstrates how we change the value of a particular weight parameter to
minimize the cost function J (left). The figure on the right illustrates what happens if we choose a
learning rate that is too large: we overshoot the global minimum.
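The overshooting effect can be sketched with plain gradient descent on a one-dimensional toy cost J(w) = w², chosen here purely for illustration (it is not the Adaline cost above):

```python
def descend(rate, steps=10, w=1.0):
    """Gradient descent on J(w) = w^2, whose gradient is dJ/dw = 2w."""
    costs = []
    for _ in range(steps):
        w -= rate * 2 * w   # standard update: w := w - rate * gradient
        costs.append(w * w)
    return costs

# Each step multiplies w by (1 - 2*rate):
too_large = descend(rate=1.1)    # |1 - 2.2| = 1.2 > 1, so the cost grows every epoch
too_small = descend(rate=0.001)  # the cost shrinks, but only very slowly
```

With rate 1.1 the iterate bounces across the minimum with growing amplitude; with rate 0.001 it creeps toward it, mirroring the two cost plots described above.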
Feature scaling
Feature scaling is a method used to standardize the range of independent variables or features of
data. In data processing, it is also known as data normalization and is generally performed during
the data preprocessing step.
Gradient descent is one of the many algorithms that benefit from feature scaling.
Here, we will use a feature scaling method called standardization, which gives our data the property
of a standard normal distribution.
In machine learning, we can handle various types of data, e.g. audio signals and pixel values for
image data, and this data can include multiple dimensions.
Feature standardization makes the values of each feature in the data have zero mean (by
subtracting the mean in the numerator) and unit variance.
This method is widely used for normalization in many machine learning algorithms (e.g., support
vector machines, logistic regression, and neural networks).
This is typically done by calculating standard scores.
The general method of calculation is to determine the distribution mean and standard deviation for
each feature. Next we subtract the mean from each feature. Then we divide the values (mean is
already subtracted) of each feature by its standard deviation.
- from Feature scaling
So, to standardize the j-th feature, we just need to subtract the sample mean μj from every training
sample and divide it by its standard deviation σj:

x′j = (xj − μj) / σj

where xj is a vector consisting of the j-th feature values of all n training samples.
We can standardize by using the NumPy methods mean and std:
X_std = np.copy(X)
X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X_std[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()
After the standardization, we will train the Adaline model again, using the not-so-small learning rate
of η = 0.01:
# reuse the AdaptiveLinearNeuron class defined above
import pandas as pd

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', -1, 1)
X = df.iloc[0:100, [0, 2]].values

# standardize
X_std = np.copy(X)
X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X_std[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()

# train again with the larger learning rate on the standardized data
aln = AdaptiveLinearNeuron(rate=0.01, niter=15)
aln.fit(X_std, y)