arXiv:1711.03637v1 [stat.ML] 9 Nov 2017

Abstract: We describe a novel spiking neural network (SNN) for automated, real-time handwritten digit classification and its implementation on a GP-GPU platform. Information processing within the network, from feature extraction to classification, is implemented by mimicking the basic aspects of neuronal spike initiation and propagation in the brain. The feature extraction layer of the SNN uses fixed synaptic weight maps to extract the key features of the image and the classifier layer uses the recently developed NormAD approximate gradient descent based supervised learning algorithm for spiking neural networks to adjust the synaptic weights. On the standard MNIST database images of handwritten digits, our network achieves an accuracy of 99.80% on the training set and 98.06% on the test set, with nearly 7× fewer parameters compared to the state-of-the-art spiking networks. We further use this network in a GPU based user-interface system demonstrating real-time SNN simulation to infer digits written by different users. On a test set of 500 such images, this real-time platform achieves an accuracy exceeding 97% while making a prediction within an SNN emulation time of less than 100 ms.

Index Terms: Spiking neural networks, classification, supervised learning, GPU based acceleration, real-time processing

This research was supported in part by grants from National Science Foundation Award 1710009, CISCO, and the McNair Fellowship Program.
©2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

I. INTRODUCTION

The human brain is a computational marvel compared to man-made systems, both in its ability to learn to execute highly complex cognitive tasks, as well as in its energy efficiency. The computational efficiency of the brain stems from its use of sparsely issued binary signals or spikes to encode and process information. Inspired by this, spiking neural networks (SNNs) have been proposed as a computational framework for learning and inference [1]. General purpose graphical processing units (GP-GPUs) have become an ideal platform for accelerated implementation of large scale machine learning algorithms [2]. There have been multiple GPU based implementations for simulating large SNNs [3]–[8], with most of these targeting the forward communication of spikes through large networks of spiking neurons and/or local weight update based on spike timing difference. In contrast, we demonstrate a highly optimized real time implementation scheme for spike based supervised learning on GPU platforms and use the framework for real time inference on digits captured from different users through a touch-screen interface.

Previous efforts to develop deep convolutional spiking networks started by using second generation artificial neural networks (ANNs) with back-propagation of errors to train the network and thereafter converting it into spiking versions [9]–[12]. There have been several supervised learning algorithms proposed to train the SNNs, by explicitly using the time of spikes of neurons to encode information, and to derive the appropriate weight update rules to minimize the distance between desired spike times and observed spike times in a network [13]–[17]. We use the Normalized Approximate Descent (NormAD) algorithm to design a system to identify handwritten digits. The NormAD algorithm has shown superior convergence speed compared to other methods such as the Remote Supervised Method (ReSuMe) [13].

Our SNN is trained on the MNIST database consisting of 60,000 training images and 10,000 test images [18]. The highest accuracy SNN for the MNIST was reported in [16], where a two-stage convolution neural network achieved an accuracy of 99.31% on the test set. Our network, in contrast, has just three layers, with about 82,000 learning synapses (7× fewer parameters compared to [16]) and achieves an accuracy of 98.06% on the MNIST test dataset.

The paper is organized as follows. The computational units of the SNN and the network architecture are described in Section II. Section III details how the network simulation is divided among different CUDA kernels. The user-interface system and the image pre-processing steps are explained in Section IV. We present the results of our network simulation and speed related optimizations in Section V. Section VI concludes our GPU based system implementation study.

II. SPIKING NEURAL NETWORK

The basic units of an SNN are spiking neurons and synapses interconnecting them. For computational tractability, we use the leaky integrate and fire (LIF) model of neurons, where the evolution of the membrane potential, $V_m(t)$, is described by:

$$C \frac{dV_m(t)}{dt} = -g_L \left( V_m(t) - E_L \right) + I(t) \tag{1}$$

Here $I(t)$ is the total input current, $E_L$ is the resting potential, and $C$ (300 pF) and $g_L$ (30 nS) model the membrane capacitance and leak conductance, respectively [13]. Once the membrane potential crosses a threshold ($V_m(t) \geq V_T$), it is reset to its resting value $E_L$ and remains at that value till the neuron comes out of its refractory period ($t_{ref} = 3$ ms). The synapse, with weight $w_{k,l}$ connecting input neuron $k$ to output neuron $l$, transforms the incoming spikes (arriving at
times $t_k^1, t_k^2, \ldots$) into a post-synaptic current ($I_{k,l}$), based on the following transformation,

$$c_k(t) = \sum_i \delta(t - t_k^i) * \left( e^{-t/\tau_1} - e^{-t/\tau_2} \right) \tag{2}$$

$$I_{k,l}(t) = w_{k,l} \times c_k(t) \tag{3}$$

Here, the summed $\delta$ function represents the incoming spike train and the double decaying exponentials with $\tau_1 = 5$ ms and $\tau_2 = 1.25$ ms represent the synaptic kernel. These values closely match the biological time constants [19].

A. Network architecture & spike encoding

We use a three-layered network where the hidden layer performs feature extraction and the output layer performs classification (see Fig. 1). The network is designed to take input from a 28 × 28 pixel MNIST digit image. We translate each pixel value into a spike stream by passing the pixels as currents to a layer of 28 × 28 neurons (first layer). The current $i(k)$ applied to a neuron corresponding to pixel value $k$, in the range [0, 255], is obtained by the following linear relation:

$$i(k) = I_0 + k \times I_p \tag{4}$$

where $I_p = 101.2$ pA is a scaling factor, and $I_0 = 2700$ pA is the minimum current above which an LIF neuron can generate a spike (for the parameters chosen in equation 1). These spike streams are then weighted with twelve 3 × 3 synaptic weight maps (or filters), with a priori chosen values, to generate equivalent current streams using equations 2 and 3. These 12 spatial filter maps are chosen to detect various edges and corners in the image.

[Fig. 1 panel labels: Input Layer; Convolution layer (26x26x12 neurons); Output Layer (10 neurons).]
Fig. 1: [...] neurons corresponding to each digit. The weights of the fully-connected feed-forward synapses to the output layer neurons (8112 × 10) are adjusted using the NormAD learning rule [13]. Additionally, the output layer neurons also have lateral inhibitory connections.

The output layer consists of 10 neurons, one for each of the ten digits. We train the network so that the correct neuron in the output layer generates a spike train with a frequency close to 285 Hz and the other output neurons issue no spikes during the presentation duration, $T$ (set to 100 ms in baseline experiments). $T$ is also a hyper-parameter of our network, and its effect on the network's classification ability will be discussed in Section V. This layer also has lateral inhibitory connections that help to prevent the non-label neurons from spiking for a given input. The output neuron with the highest number of spikes is declared the winner of the classification.

B. Learning layer

The 8112 × 10 synapses connecting the hidden layer neurons to the 10 output layer neurons are modified during the course of training using the NormAD rule [13]. The weights are adjusted based on the error between the desired and observed spike streams ($e(t) = S^d(t) - S^o(t)$) and the term $\hat{d}(t)$, denoting the effect of incoming spike kernels on the neuron's membrane potential, according to the relation:

$$\Delta w = r \int_0^T e(t)\, \frac{\hat{d}(t)}{\lVert \hat{d}(t) \rVert}\, dt, \qquad \hat{d}(t) = c(t) * \hat{h}(t) \tag{5}$$

where $\hat{h}(t) = (1/C) \exp(-t/\tau_L)$ represents the neuron's impulse response with $\tau_L = 1$ ms, and $r$ is the learning rate.

III. CUDA IMPLEMENTATION

The SNN is implemented on a GPU platform using the CUDA-C programming framework. A GPU is divided into streaming multiprocessors (SM), each of which consists of stream processors (SP) that are optimized to execute math operations. The CUDA-C programming framework exploits the hardware parallelism of GPUs and launches jobs on the GPU in a grid of blocks, each mapped to an SM. The blocks are further divided into multiple threads, each of which is scheduled to run on an SP, also called a CUDA core. Since [...]

[Fig. 2 block labels: Image; Spike Network (8112 Spike Trains); Output errors; Compute d_hat; Compute del_w (8112x10 synapses).]
Fig. 2: Diagram showing the different variables of the network being computed each time step and how the signals flow across different layers. The dimensions within the brackets are the sizes of those variables and their respective CUDA kernels.
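As an illustration of the NormAD update in equation 5 (the d_hat and del_w computations named in Fig. 2), the following is a minimal single-neuron sketch in plain Python rather than CUDA. It rests on several assumptions that are not stated in the text: spike times are given as integer time-step indices, the error $e$ takes the values +1, 0 or -1 at each step so that the integral reduces to a sum over the steps where it is non-zero, and the constant factor $1/C$ in $\hat{h}$ is dropped because it cancels in the normalization. The function names `kernel_trace` and `normad_delta_w` are hypothetical.

```python
import math

TAU_1 = 5e-3     # synaptic kernel time constant tau_1 (5 ms, from the text)
TAU_2 = 1.25e-3  # synaptic kernel time constant tau_2 (1.25 ms, from the text)
TAU_L = 1e-3     # impulse-response time constant tau_L (1 ms, from the text)


def kernel_trace(spike_steps, n_steps, dt):
    """Discrete version of c(t) in equation 2 for one input spike train."""
    decay_1 = math.exp(-dt / TAU_1)
    decay_2 = math.exp(-dt / TAU_2)
    spikes = set(spike_steps)
    slow = fast = 0.0
    trace = []
    for n in range(n_steps):
        slow *= decay_1
        fast *= decay_2
        if n in spikes:
            # Each spike starts one double-exponential kernel.
            slow += 1.0
            fast += 1.0
        trace.append(slow - fast)
    return trace


def normad_delta_w(input_spikes, desired, observed, n_steps, dt, rate):
    """Weight change for the synapses of one output neuron (equation 5).

    input_spikes: one list of spike step indices per input synapse.
    desired, observed: spike step indices of the output neuron.
    """
    traces = [kernel_trace(s, n_steps, dt) for s in input_spikes]
    decay_l = math.exp(-dt / TAU_L)
    desired, observed = set(desired), set(observed)
    d_hat = [0.0] * len(traces)
    delta_w = [0.0] * len(traces)
    for n in range(n_steps):
        # Recursive form of the convolution d_hat = c * h_hat (1/C omitted).
        d_hat = [d * decay_l + c[n] * dt for d, c in zip(d_hat, traces)]
        err = (n in desired) - (n in observed)
        if err == 0:
            continue
        # Euclidean norm across synapses at this time step.
        norm = math.sqrt(sum(d * d for d in d_hat))
        if norm == 0.0:
            continue
        for k, d in enumerate(d_hat):
            delta_w[k] += rate * err * d / norm
    return delta_w
```

In this sketch a missing desired spike strengthens the synapses whose inputs were recently active, while an undesired spike weakens them; each error event moves the weight vector by a step of length `rate`, which is the effect of the normalization.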
Fig. 2 shows the forward pass and backward pass for weight update during the training phase. Image pixels read into the GPU memory are passed as currents to layer one neurons (grid size of 28 × 28) for the presentation duration, $T$. The filtering process involves 2D convolution of the incoming spike kernels and the weight matrix (3 × 3). The computation is parallelized across 12 CUDA kernels, each with a grid size of 26 × 26 threads. Each thread computes the current to one hidden layer neuron, indexed in a 2D array by $(i, j)$ with $0 \leq i, j \leq 25$, at a time-step $n$, based on the following spatial convolution relation:

$$I_{in}(i, j, n) = \sum_{a=0}^{2} \sum_{b=0}^{2} w_{conv}(a, b) \times c(i + a, j + b, n) \tag{6}$$

where $c$ represents the synaptic kernel (equation 2) calculated from the spike trains of the 28 × 28 pixels and $w_{conv}(a, b)$ represents each of the weights from the 3 × 3 filter matrix.

The membrane potential of an array of $k$ LIF neurons, for applied current $I(n)$ (as described in equation 1), is evaluated using the second order Runge-Kutta method as:

$$k_1 = \left[ -g_L \left( V_m(n) - E_L \right) + I(n) \right] / C \tag{7}$$

$$k_2 = \left[ -g_L \left( V_m(n) + k_1 \Delta t - E_L \right) + I(n) \right] / C \tag{8}$$

$$V_m(n + 1) = V_m(n) + \left[ (k_1 + k_2) \Delta t / 2 \right] \tag{9}$$

Each thread $k$ independently checks if the membrane potential has exceeded the threshold to artificially reset it.

$$\text{If } V_m^k(n + 1) \geq V_T \Rightarrow V_m^k(n + 1) = E_L \tag{10}$$

Refractory period is implemented by storing the latest spike issue time, $n_k^{last}$, of each neuron in a vector $R$; the membrane [...]

[...] update the synaptic weights in parallel. The evaluation of the total synaptic current and the norm is performed using parallel reduction in CUDA [20]. During the inference or testing phase, we calculate the synaptic currents and membrane potentials of neurons in both layers to determine spike times, but do not evaluate the $\hat{d}$ term and the weight update.

IV. REAL-TIME INFERENCE ON USER DATA

We used the CUDA based SNN described in the previous section to design a user interface that can capture and identify the images of digits written by users in real-time from a touch-screen interface. The drawing application to capture the digit drawn by the user is built using OpenCV, an image processing library [21]. The captured image from the touch screen is pre-processed using standard methods similar to those used to generate the MNIST dataset images [18]. We convert the user drawn images to the required format, which is a grayscale image of size 28 × 28 pixels. The network is implemented on the NVIDIA GTX 860M GPU, which has 640 CUDA cores. The preprocessing phase takes about 15 ms and this image is then passed to the trained SNN for inference. The CUDA process takes about 300 ms to initialize the network in the GPU memory, after which the network simulation time depends on the presentation time $T$ and the time step interval $\Delta t$.

[Figure panels (a) and (b); pre-processing stage labels: Drawn Image; Inverted Image; Bounding boxed Image; Normalized and center-massed 28x28 Image.]
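To make the per-neuron update of equations 7 to 10 concrete, together with the pixel-to-current mapping of equation 4 and the refractory bookkeeping described above, here is a minimal serial sketch in plain Python for a single neuron driven by a constant current. It is not the CUDA kernel itself. The values of $E_L$ and $V_T$ are not given in the text: the ones below are assumptions, chosen only so that $g_L (V_T - E_L)$ equals the stated $I_0 = 2700$ pA. The function names are hypothetical.

```python
# Constants taken from the text (SI units).
C_M = 300e-12    # membrane capacitance C, 300 pF
G_L = 30e-9      # leak conductance g_L, 30 nS
T_REF = 3e-3     # refractory period t_ref, 3 ms
I_0 = 2700e-12   # offset current I_0, 2700 pA
I_P = 101.2e-12  # current per pixel level I_p, 101.2 pA

# ASSUMED values, not stated in the text; they satisfy G_L * (V_T - E_L) = I_0.
E_L = -70e-3     # resting potential, volts
V_T = 20e-3      # spiking threshold, volts


def pixel_current(k):
    """Equation 4: input current for a pixel value k in [0, 255]."""
    return I_0 + k * I_P


def rk2_step(v, i_in, dt):
    """Equations 7 to 9: one second order Runge-Kutta step of the LIF model."""
    k1 = (-G_L * (v - E_L) + i_in) / C_M
    k2 = (-G_L * (v + k1 * dt - E_L) + i_in) / C_M
    return v + (k1 + k2) * dt / 2


def simulate_lif(i_in, t_total, dt):
    """Return the time-step indices at which one neuron spikes."""
    n_steps = round(t_total / dt)
    ref_steps = round(T_REF / dt)
    v = E_L
    last_spike = None  # plays the role of the stored latest spike time
    spike_steps = []
    for n in range(1, n_steps + 1):
        if last_spike is not None and n - last_spike < ref_steps:
            v = E_L  # hold at rest while refractory
            continue
        v = rk2_step(v, i_in, dt)
        if v >= V_T:  # equation 10: threshold check and reset
            spike_steps.append(n)
            last_spike = n
            v = E_L
    return spike_steps
```

For example, `simulate_lif(pixel_current(255), 0.1, 1e-4)` emulates a 100 ms presentation of a fully bright pixel with a 0.1 ms time step. In the implementation described in the text, each neuron would be handled by its own thread rather than by a serial loop.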
[Fig. 4 panels: (a) accuracy versus Presentation time [ms], with curves for dt = 0.1 ms and dt = 1.0 ms; (b) timeline versus Time [ms], with stages Image Capture, Pre-processing, and SNN prediction complete and result display.]
Fig. 4: (a) MNIST test-set accuracy as a function of presentation time and the integration time step $\Delta t$. (b) Various stages of classifying a user's input: the image pre-processing takes 15 ms and the 75 ms SNN emulation is completed in real-time.

[...] the network's accuracy with $\Delta t = 1$ ms on a set of 500 handwritten digits collected from various users through our user-interface system. At $T = 75$ ms, we measure an accuracy of 97.4% on our set of 500 captured images, while on the MNIST test-set it was 97.68%. The slight loss in performance compared to the MNIST dataset is attributed to the deviations in the statistical characteristics of the captured images from those of the MNIST dataset.

VI. CONCLUSION

We developed a simple three-layer spiking neural network that performs spike encoding, feature extraction, and classification. All information processing and learning within the network is performed entirely in the spike domain. With approximately 7 times fewer synaptic weight parameters than the state-of-the-art spiking networks, we show that our approach achieves classification accuracy exceeding 99% on the training set of the MNIST database and 98.06% on its test set. The trained network implemented on the CUDA parallel computing platform is also able to successfully identify digits written by users in real-time, demonstrating its true generalization capability.

We have also demonstrated a general framework for implementing spike based neural networks and supervised learning

REFERENCES

[...]formatics, vol. 11, 2017.
[8] J. L. Krichmar, P. Coussy, and N. Dutt, “Large-scale spiking neural networks using neuromorphic hardware compatible models,” ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 11, no. 4, p. 36, 2015.
[9] P. U. Diehl et al., “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1–8.
[10] Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for energy-efficient object recognition,” International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, 2015.
[11] B. Rueckauer et al., “Theory and tools for the conversion of analog to spiking convolutional neural networks,” arXiv preprint arXiv:1612.04052, 2016.
[12] E. Hunsberger and C. Eliasmith, “Training Spiking Deep Networks for Neuromorphic Hardware,” arXiv preprint arXiv:1611.05141, 2016.
[13] N. Anwani and B. Rajendran, “NormAD - Normalized Approximate Descent based supervised learning rule for spiking neurons,” in International Joint Conference on Neural Networks, July 2015, pp. 1–8.
[14] F. Ponulak and A. Kasinski, “Supervised learning in spiking neural networks with ReSuMe: sequence learning, classification, and spike shifting,” Neural Computation, vol. 22, no. 2, pp. 467–510, 2010.
[15] A. Mohemmed et al., “SPAN: Spike Pattern Association Neuron for Learning Spatio-Temporal Spike Patterns,” International Journal of Neural Systems, vol. 22, no. 04, 2012.
[16] J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training Deep Spiking Neural Networks using Backpropagation,” Frontiers in Neuroscience, vol. 10, p. 508, 2016.
[17] W. W. Lee, S. L. Kukreja, and N. V. Thakor, “CONE: Convex-Optimized-Synaptic Efficacies for Temporally Precise Spike Mapping,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 4, pp. 849–861, April 2017.
[18] “The MNIST database of handwritten digits,” available at http://yann.lecun.com/exdb/mnist/.
[19] P. Dayan, L. Abbott et al., “Theoretical neuroscience: computational and mathematical modeling of neural systems,” Journal of Cognitive Neuroscience, vol. 15, no. 1, pp. 154–155, 2003.
[20] M. Harris, “Optimizing parallel reduction in CUDA,” available at http://bit.ly/2gd7fSb.
[21] I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, “A brief introduction to OpenCV,” in 2012 Proceedings of the 35th International Convention MIPRO, May 2012, pp. 1725–1730.
[22] L. Wan et al., “Regularization of Neural Networks using DropConnect,” in Proceedings of the 30th International Conference on Machine Learning, 2013, pp. 1058–1066.