Artificial Neurons
An artificial neuron is a mathematical function conceived as a counterpart of a
biological neuron. It is alternatively named elementary processing unit, binary
neuron, node, linear threshold function, or McCulloch-Pitts (MCP) neuron.
An artificial neuron receives one or more weighted inputs (representing
dendrites and synaptic weights) and sums them up (analogous to the
spatio-temporal summation of signals at the soma level). The sum is then
passed through a nonlinear function known as an activation or transfer function.
If a certain threshold level is exceeded, the neuron fires and sends a signal to
the neighboring cells.
Transfer functions usually have a sigmoid shape, though they may take
other non-linear forms, such as piecewise linear functions or step functions.
Generally, transfer functions are monotonically increasing, continuous,
differentiable, and bounded.
Simple artificial neurons, such as the McCulloch-Pitts model, are
sometimes characterized as caricature models, in that they are intended to
reflect only some neurophysiological characteristics, without regard to
realism or a full representation of their biological counterparts.
$$u_k = \sum_{j=1}^{P} w_{kj} x_j$$
$$y_k = \varphi(u_k) = \varphi\!\left(\sum_{j=1}^{P} w_{kj} x_j\right)$$
Here $u_k$ refers to the spatiotemporal sum of all weighted inputs of the neuron,
$\varphi$ is the transfer function, and $y_k$ is the neuron's output.
In most cases, it is useful to include a threshold $\theta_k$ for each neuron:
$$y_k = \varphi(u_k) = \varphi\!\left(\sum_{j=1}^{P} w_{kj} x_j - \theta_k\right)$$
where the input vector is $\mathbf{x} = \{x_1, x_2, \ldots, x_P\}$.
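The weighted-sum-and-threshold model above can be sketched in Python; the logistic transfer function, input values, weights, and threshold below are illustrative assumptions:

```python
import math

def logistic(t):
    # Sigmoid transfer function: S(t) = 1 / (1 + exp(-t))
    return 1.0 / (1.0 + math.exp(-t))

def neuron_output(x, w, theta, phi=logistic):
    # u_k = sum_j w_kj * x_j, then y_k = phi(u_k - theta_k)
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return phi(u - theta)

# Illustrative inputs, weights, and threshold
x = [0.5, 1.0, -0.5]
w = [0.8, 0.2, 0.4]
y = neuron_output(x, w, theta=0.1)
```

Because the logistic function is bounded, the output y always lies strictly between 0 and 1.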
Transfer Function
The transfer function of a neuron is chosen to have a number of properties
which either enhance or simplify the network containing the neuron.
A sigmoid function is a bounded differentiable real function that is defined for all
real input values and has a positive derivative at each point.
Sigmoid functions are often normalized so that their slope at the origin is 1.
Sigmoid Function
A sigmoid function is a function having an "S" shape (sigmoid curve).
Often, "sigmoid function" refers to a special case such as the logistic function,
defined by the formula:
$$S(t) = \frac{1}{1 + e^{-t}}$$
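A quick numerical check of the logistic function's basic properties (bounded in (0, 1), S(0) = 1/2, and the symmetry S(t) + S(−t) = 1):

```python
import math

def S(t):
    # Logistic function: S(t) = 1 / (1 + exp(-t))
    return 1.0 / (1.0 + math.exp(-t))

at_origin = S(0.0)          # exactly 0.5
symmetry = S(3.0) + S(-3.0) # equals 1 up to floating-point rounding
```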
Error Function
The error function (also called the Gauss error function) is a special,
non-elementary function of sigmoid shape:
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$$
Its complement is:
$$\operatorname{erfc}(x) = 1 - \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt$$
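Python's standard library exposes both functions; a brief check that erf is increasing (sigmoid-shaped) and that erfc is its complement:

```python
import math

# erf is odd, bounded in (-1, 1), and monotonically increasing
values = [math.erf(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

# erfc(x) = 1 - erf(x)
complement = math.erfc(1.0)
```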
Gudermannian Function
The Gudermannian function and its inverse are defined by the integrals:
$$\operatorname{gd}(x) = \int_0^x \frac{dt}{\cosh(t)}$$
$$\operatorname{gd}^{-1}(x) = \int_0^x \frac{dt}{\cos(t)}$$
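The Gudermannian also has the known closed form gd(x) = 2 arctan(tanh(x/2)), which can be checked against a direct numerical evaluation of the defining integral (the midpoint rule and step count below are illustrative choices):

```python
import math

def gd(x):
    # Closed form of the Gudermannian: gd(x) = 2*atan(tanh(x/2))
    return 2.0 * math.atan(math.tanh(x / 2.0))

def gd_numeric(x, n=100000):
    # Midpoint-rule evaluation of the defining integral of 1/cosh(t) over [0, x]
    h = x / n
    return h * sum(1.0 / math.cosh((i + 0.5) * h) for i in range(n))
```

The two agree closely, and gd is bounded between −π/2 and π/2, giving it a sigmoid shape.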
Heaviside Function
The Heaviside step function, or the unit step function, usually denoted by H,
is a discontinuous function. The output y of the H transfer function is binary,
depending on the specified threshold $\theta$:
$$y(u) = \begin{cases} 1 & \text{if } u \geq \theta \\ 0 & \text{if } u < \theta \end{cases}$$
It seldom matters what value is used for H(0), since H is mostly used as a
distribution.
The Heaviside function is the integral of the Dirac delta function $\delta$, although this
expansion may not hold (or even make sense) for x = 0, depending on which
formalism is used to give meaning to integrals involving $\delta$:
$$H(u) = \int_{-\infty}^{u} \delta(s)\, ds$$
The Heaviside step function is used in some neuromorphic models as well. It
can be approximated by other sigmoidal functions by assigning large values
to the weights. It divides the space of inputs by a hyperplane.
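A minimal sketch of the approximation mentioned above: scaling the input of a logistic function by a large gain drives its output toward a hard step (the gain value is an illustrative assumption):

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def soft_step(u, theta=0.0, gain=50.0):
    # logistic((u - theta) * gain) approaches H(u - theta) as gain grows
    return logistic((u - theta) * gain)

# Outputs are near-binary away from the threshold
low = soft_step(-0.2)   # close to 0
high = soft_step(0.2)   # close to 1
```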
An affine hyperplane is an affine subspace of codimension 1 in an affine space.
Such a hyperplane in Cartesian coordinates is described by a linear equation
(where at least one of the $a_i$ is non-zero):
$$a_1 x_1 + a_2 x_2 + \ldots + a_n x_n = b$$
In the case of a real affine space (when the coordinates are real numbers), the
hyperplane separates the space into two half-spaces, which are the connected
components of the complement of the hyperplane, given by the inequalities:
$$a_1 x_1 + a_2 x_2 + \ldots + a_n x_n < b$$
$$a_1 x_1 + a_2 x_2 + \ldots + a_n x_n > b$$
The Heaviside function is especially useful in the last layer of a multilayered
network intended to perform binary classification of the inputs.
Affine hyperplanes are used to define decision boundaries in many machine
learning algorithms such as decision trees and perceptrons.
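The half-space test above is exactly the decision rule a perceptron computes; a minimal sketch with illustrative coefficients:

```python
def half_space(x, a, b):
    # True if the point x lies on the positive side of the hyperplane
    # a1*x1 + a2*x2 + ... + an*xn = b, i.e. where the sum exceeds b
    return sum(ai * xi for ai, xi in zip(a, x)) > b

# Hyperplane x1 + x2 = 1 in the plane (illustrative)
a, b = [1.0, 1.0], 1.0
inside = half_space([1.0, 1.0], a, b)   # 2.0 > 1.0
outside = half_space([0.0, 0.0], a, b)  # 0.0 > 1.0 is False
```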
Random Variables
A random variable (aleatory variable or stochastic variable) is a real-valued
function defined on the set of possible outcomes of a random experiment, the
sample space Ω. That is, the random variable is a function that maps from its
domain, the sample space Ω, to its range (the real numbers or a subset of the
real numbers).
Like other mathematical variables, a random variable can take on a set of
different possible values; unlike other mathematical variables, each value has
an associated probability.
The mathematical function describing the possible values of a random variable
and their associated probabilities is known as a probability distribution.
Random variables can be discrete (taking any of a specified finite or
countable list of values, with a probability mass function as probability
distribution), continuous (taking any numerical value in an interval or
collection of intervals, with a probability density function describing the
probability distribution), or a mixture of both types.
Random variables with discontinuities in their CDFs can be treated as mixtures
of discrete and continuous random variables.
For a continuous random variable with probability density function $f_X$:
$$\Pr[a \leq X \leq b] = \int_a^b f_X(x)\, dx$$
The cumulative distribution function (cdf) describes the probability that a
real-valued random variable X with a given probability distribution will be found
at a value less than or equal to x. In the case of a continuous distribution, it
gives the area under the probability density function from minus infinity to x.
The cdf of a continuous random variable X can be expressed as the integral of its
probability density function $f_X$ as follows:
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$$
For a discrete random variable X, the probability mass function maps each value
to its probability:
$$X : A \to \mathbb{R}, \quad f_X : A \to [0,1], \quad f_X(x) = \Pr(X = x) = \Pr(\{s : X(s) = x\})$$
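As a numerical illustration of the cdf integral above, using the standard normal density as an assumed example (the truncation point and step count are illustrative choices):

```python
import math

def normal_pdf(t):
    # Standard normal probability density function
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def normal_cdf_numeric(x, lower=-10.0, n=100000):
    # F_X(x) = integral of f_X(t) dt from -inf to x, approximated with the
    # midpoint rule, truncating the lower limit where the density is negligible
    h = (x - lower) / n
    return h * sum(normal_pdf(lower + (i + 0.5) * h) for i in range(n))
```

The result can be cross-checked against the closed form F(x) = (1 + erf(x / √2)) / 2.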
As an example, consider rolling two fair dice. The random variable of interest X
is defined as the function that maps each pair of outcomes $(n_1, n_2)$ to their sum:
$$X : S \to \mathbb{R}, \quad X((n_1, n_2)) = n_1 + n_2, \quad n_1, n_2 \in \{1,2,3,4,5,6\}$$
Its probability mass function is:
$$f_X(s) = \frac{\min(s - 1,\ 13 - s)}{36}, \quad s \in \{2,3,4,5,6,7,8,9,10,11,12\}$$
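The pmf formula can be verified by enumerating all 36 equally likely outcomes:

```python
from fractions import Fraction

# Probability of each sum s of two fair dice, by direct enumeration
counts = {}
for n1 in range(1, 7):
    for n2 in range(1, 7):
        s = n1 + n2
        counts[s] = counts.get(s, 0) + 1

pmf = {s: Fraction(c, 36) for s, c in counts.items()}

# The table matches the closed form min(s - 1, 13 - s) / 36 for s in 2..12
```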
Degenerate Distribution
A degenerate distribution is the probability distribution
of a random variable which takes a single value only.
The degenerate distribution is localized at a point $k_0$ on the real axis. The
probability mass function and cumulative distribution function are given by:
$$f(x) = \begin{cases} 1 & \text{if } x = k_0 \\ 0 & \text{if } x \neq k_0 \end{cases} \qquad F(x) = \begin{cases} 0 & \text{if } x < k_0 \\ 1 & \text{if } x \geq k_0 \end{cases}$$
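A minimal sketch of these two functions; note that the cdf of the degenerate distribution is simply a Heaviside step shifted to $k_0$:

```python
def degenerate_pmf(x, k0):
    # All probability mass is concentrated at the single point k0
    return 1.0 if x == k0 else 0.0

def degenerate_cdf(x, k0):
    # Steps from 0 to 1 at k0 (a shifted Heaviside function)
    return 1.0 if x >= k0 else 0.0
```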
References
McCulloch, W. and Pitts, W. (1943), A logical calculus of the ideas immanent in
nervous activity. Bulletin of Mathematical Biophysics, 5: 115-133.
Mutihac, R., Modelarea şi Simularea Neuronală: Elemente Fundamentale [Neural
Modeling and Simulation: Fundamental Elements]. Editura Universităţii din
Bucureşti, 2000.
Werbos, P.J. (1990). Backpropagation through time: What it does and how to
do it. Proceedings of the IEEE, 78 (10):1550-1560.
Robertson, J.S. (1997), "Gudermann and the simple pendulum". The College
Mathematics Journal, 28(4): 271-276.
FitzHugh, R. and Izhikevich, E. (2006), FitzHugh-Nagumo model.
Scholarpedia, 1 (9): 1349.
Haykin, S., Neural Networks: A Comprehensive Foundation. 2nd ed., Prentice Hall,
1998.
Hebb, D.O., The Organization of Behavior. New York: Wiley, 1949.
Hodgkin, A.L. and Huxley, A.F. (1952), A quantitative description of membrane
current and its application to conduction and excitation in nerve. The Journal
of Physiology, 117 (4): 500-544.
Hoppensteadt, F.C. and Izhikevich E.M., Weakly Connected Neural Networks.
Springer, 1997.
Abbott, L.F. (1999). Lapicque's introduction of the integrate-and-fire model
neuron (1907). Brain Research Bulletin, 50 (5/6): 303-304.
Koch, C. and Segev, I., Methods in Neuronal Modeling: From Ions to Networks.
2nd ed., Cambridge, MA: MIT Press, 1999.