1711 03637

Learning and Real-time Classification of
Hand-written Digits With Spiking Neural Networks

Shruti R. Kulkarni, John M. Alexiades, Bipin Rajendran
Department of Electrical and Computer Engineering,
New Jersey Institute of Technology, Newark, NJ, 07102, USA
Email: {srk68, jma59, bipin}@njit.edu
Abstract—We describe a novel spiking neural network (SNN) networks (ANNs) with back-propagation of errors to train the
for automated, real-time handwritten digit classification and its network and thereafter converting it into spiking versions [9]–
arXiv:1711.03637v1 [stat.ML] 9 Nov 2017
implementation on a GP-GPU platform. Information processing [12]. There have been several supervised learning algorithms
within the network, from feature extraction to classification is
implemented by mimicking the basic aspects of neuronal spike proposed to train the SNNs, by explicitly using the time
initiation and propagation in the brain. The feature extraction of spikes of neurons to encode information, and to derive
layer of the SNN uses fixed synaptic weight maps to extract the appropriate weight update rules to minimize the distance
the key features of the image and the classifier layer uses the between desired spike times and observed spike times in
recently developed NormAD approximate gradient descent based a network [13]–[17]. We use the Normalized Approximate
supervised learning algorithm for spiking neural networks to
adjust the synaptic weights. On the standard MNIST database Descent (NormAD) algorithm to design a system to identify
images of handwritten digits, our network achieves an accuracy handwritten digits. The NormAD algorithm has shown supe-
of 99.80% on the training set and 98.06% on the test set, with rior convergence speed compared to other methods such as the
nearly 7× fewer parameters compared to the state-of-the-art Remote Supervised Method (ReSuMe) [13].
spiking networks. We further use this network in a GPU based Our SNN is trained on the MNIST database consisting of
user-interface system demonstrating real-time SNN simulation to
infer digits written by different users. On a test set of 500 such 60, 000 training images and 10, 000 test images [18]. The
images, this real-time platform achieves an accuracy exceeding highest accuracy SNN for the MNIST was reported in [16],
97% while making a prediction within an SNN emulation time where a two-stage convolution neural network achieved an
of less than 100 ms. accuracy of 99.31% on the test set. Our network, in contrast,
Index Terms—Spiking neural networks, classification, super- has just three layers, with about 82, 000 learning synapses (7×
vised learning, GPU based acceleration, real-time processing
fewer parameters compared to [16]) and achieves an accuracy
I. I NTRODUCTION of 98.06% on the MNIST test dataset.
The paper is organized as follows. The computational units
The human brain is a computational marvel compared to of the SNN and the network architecture are described in
man-made systems, both in its ability to learn to execute highly section II. Section III details how the network simulation is
complex cognitive tasks, as well as in its energy efficiency. The divided among different CUDA kernels. The user-interface
computational efficiency of the brain stems from its use of system and the image pre-processing steps are explained in
sparsely issued binary signals or spikes to encode and process Section IV. We present the results of our network simulation
information. Inspired by this, spiking neural networks (SNNs) and speed related optimizations in Section V. Section VI
have been proposed as a computational framework for learning concludes our GPU based system implementation study.
and inference [1]. General purpose graphical processing units
(GP-GPUs) have become an ideal platform for accelerated II. S PIKING N EURAL N ETWORK
implementation of large scale machine learning algorithms [2]. The basic units of an SNN are spiking neurons and synapses
There have been multiple GPU based implementations for interconnecting them. For computational tractability, we use
simulating large SNNs [3]–[8], with most of these targeting the leaky integrate and fire (LIF) model of neurons, where the
the forward communication of spikes through large networks evolution of the membrane potential, Vm (t) is described by:
of spiking neurons and/or local weight update based on dVm (t)
spike timing difference. In contrast, we demonstrate a highly C = −gL (Vm (t) − EL ) + I(t) (1)
dt
optimized real time implementation scheme for spike based Here I(t) is the total input current, EL is the resting po-
supervised learning on GPU platforms and use the framework tential, and C (300 pF) and gL (30 nS) model the membrane
for real time inference on digits captured from different users capacitance and leak conductance, respectively [13]. Once the
through a touch-screen interface. membrane potential crosses a threshold (Vm (t) ≥ VT ), it is
Previous efforts to develop deep convolutional spiking net- reset to its resting value EL and remains at that value till
works started by using second generation artificial neural the neuron comes out of its refractory period (tref = 3 ms).
This research was supported in part by grants from National Science The synapse, with weight wk,l connecting input neuron k to
Foundation Award 1710009, CISCO, and the McNair Fellowship Program. output neuron l, transforms the incoming spikes (arriving at
©2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or
reuse of any copyrighted component of this work in other works.
times t1k , t2k , . . .) into a post-synaptic current (Ik,l ), based on in the output layer generates a spike train with a frequency
the following transformation, close to 285 Hz and the other output neurons issue no spikes
during the presentation duration, T (set to 100 ms in baseline
X
ck (t) = δ(t − tik ) ∗ e−t/τ1 − e−t/τ2 (2)
experiments). T is also a hyper-parameter of our network,
i
and its effect on the network’s classification ability will be
Ik,l (t) = wk,l × ck (t) (3)
discussed in section V. This layer also has lateral inhibitory
Here, the summed δ function represents the incoming spike connections that helps to prevent the non-label neurons from
train and the double decaying exponentials with τ1 = 5 ms spiking for a given input. The output neuron with the highest
and τ2 = 1.25 ms represent the synaptic kernel. These values number of spikes is declared the winner of the classification.
closely match the biological time constants [19].
B. Learning layer
A. Network architecture & spike encoding
The 8112×10 synapses connecting the hidden layer neurons
We use a three-layered network where hidden layer per- to the 10 output layer neurons are modified during the course
forms feature extraction and the output layer performs clas- of training using the NormAD rule [13]. The strength of the
sification (see Fig. 1). The network is designed to take input weights are adjusted based on the error between the observed
from 28×28 pixel MNIST digit image. We translate this pixel and desired spike streams (e(t) = S d (t) − S o (t)) and the
value into a set of spike streams, by passing the pixels as ˆ
term d(t), denoting the effect of incoming spike kernels on
currents to a layer of 28 × 28 neurons (first layer). The current the neuron’s membrane potential, according to the relation:
i(k) applied to a neuron corresponding to pixel value k, in the Z T
d̂(t)
range [0, 255] is obtained by the following linear relation: ∆w = r e(t) dt, d̂(t) = c(t) ∗ ĥ(t) (5)
0 kd̂(t)k
i(k) = I0 + k × Ip (4)
where ĥ(t) = (1/C) exp(−t/τL ) represents the neuron’s
where Ip = 101.2 pA is a scaling factor, and I0 = 2700 pA is
impulse response with τL = 1 ms, and r is the learning rate.
the minimum current above which an LIF neuron can generate
a spike (for the parameters chosen in equation 1). These spike III. CUDA IMPLEMENTATION
streams are then weighted with twelve 3 × 3 synaptic weight
maps (or filters), with a priori chosen values to generate The SNN is implemented on a GPU platform using the
equivalent current streams using equations 2 and 3. These CUDA-C programming framework. A GPU is divided into
12 spatial filter maps are chosen to detect various edges and streaming multiprocessors (SM), each of which consists of
corners in the image. stream processors (SP) that are optimized to execute math
operations. The CUDA-C programming framework exploits
Convolution layer the hardware parallelism of GPUs and launches jobs on the
Output Layer
GPU in a grid of blocks each mapped to an SM. The blocks
are further divided into multiple threads, each of which is
scheduled to run on an SP, also called a CUDA core. Since
26x26x12 neurons
Input Layer
10 neurons
memory transfer between CPU and GPU local memory is one

28 x 28 neurons
Input Image
of the main bottlenecks, all network variables (i.e., neuron

28x28 pixel
membrane potentials and synaptic currents) are declared in the

Input
currents global GPU memory in our implementation. The simulation
Spike
Trains All-to-all lateral inhibitory equations (1), (2) and (3) are evaluated numerically in an
connections
Learning Synapses iterative manner at each time step.
Twelve spatial (Fully Connected Layer)
filters of size 3x3
Next epoch
Next time step
LIF Layer -1 LIF Layer-2 Current summation
Fig. 1: The 28×28 pixel images from the MNIST database are 784 neurons
(8112 neurons) (10 currents)
converted to spike trains, which are presented for a duration

spike trains
Generate
GPU device
(28x28)
memory
Image
T , weighted with twelve 3 × 3 synaptic weight maps (below) Synaptic Spike

Trains Weight Update
resulting in twelve 26 × 26 current streams and then feed to current
(8112x10 synapses)
Read in 12 Convolution kernels Synaptic currents LIF Layer-3
(10 neurons)
the corresponding feature map neurons. There are 10 output all images
from MNIST
(26x26 synapses per map) (8112x10 synapses)
neurons corresponding to each digit. The weights of the fully- Spike Netowork
Output
errors
connected feed-forward synapses to the output layer neurons
(8112 × 10) are adjusted using the NormAD learning rule Compute d_hat
Compute del_w
(8112x10 synapses)
(8112 Spike Trains)
[13]. Additionally, the output layer neurons also have lateral
inhibitory connections. Fig. 2: Diagram showing the different variables of the network
being computed each time step and how the signals flow across
The output layer consists of 10 neurons, one for each of different layers. The dimensions within the brackets are the
the ten digits. We train the network so that the correct neuron sizes of those variables and their respective CUDA kernels.
Fig. 2 shows the forward pass and backward pass for weight update the synaptic weights in parallel. The evaluation of the
update during the training phase. Image pixels read into the total synaptic current and the norm is performed using parallel
GPU memory are passed as currents to layer one neurons (grid reduction in CUDA [20]. During the inference or testing phase,
size of 28 × 28) for the presentation duration, T . The filtering we calculate the synaptic currents and membrane potentials of
process involves 2D convolution of the incoming spike kernels neurons in both layers to determine spike times, but do not
and the weight matrix (3 × 3). The computation is parallelized evaluate the d̂ term and the weight update.
across 12 CUDA kernels, each with a grid size of 26 × 26
threads. Each thread computes the current to the hidden layer IV. R EAL - TIME INFERENCE ON USER DATA
neurons, indexed as a 2D-array i, j, {0 ≤ i, j, ≤ 25} at a time- We used the CUDA based SNN described in the previous
step n, based on the following spatial convolution relation: section, to design a user interface that can capture and identify
2 X
X 2 the images of digits written by users in real-time from a touch-
Iin (i, j, n) = wconv (a, b) × c(i + a, j + b, n) (6) screen interface. The drawing application to capture the digit
a=0 b=0 drawn by the user is built using OpenCV, an image processing
where c represents the synaptic kernel (equation 2) calculated library [21]. The captured image from the touch screen is
from the spike trains of the 28 × 28 pixels and wconv (a, b) pre-processed using standard methods similar to that used to
represents each of the weights from the 3 × 3 filter matrix. generate the MNIST dataset images [18]. We convert the user
The membrane potential of an array of k LIF neurons, for drawn images to the required format which is a grayscale
applied current I(n) (as described in equation 1) is evaluated image of size 28 × 28 pixels. The network is implemented on
using the second order Runge-Kutta method as: the NVIDIA GTX 860M GPU which has 640 CUDA cores.
The preprocessing phase takes about 15 ms and this image
k1 = [−gL (Vm (n) − EL ) + I(n)]/C (7)
is then passed to the trained SNN for inference. The CUDA
k2 = [−gL (Vm (n) + k1 ∆t − EL ) + I(n)]/C (8) process takes about 300 ms to initialize the network in the GPU
Vm (n + 1) = Vm (n) + [(k1 + k2 )∆t/2] (9) memory, after which the network simulation time depends on
Each thread k independently checks if the membrane potential the presentation time T and the time step interval ∆t.
has exceeded the threshold to artificially reset it.
(a). (b).
If Vmk (n + 1) ≥ VT ⇒ Vmk (n + 1) = EL (10)
Refractory period is implemented by storing the latest spike Normalized and
Drawn Image Inverted Image Bounding center-massed
issue time, nlast
k of each neuron in a vector R; the membrane boxed Image 28x28 Image
potential of a neuron is updated only when the current time

Fig. 3: (a) Outline of the preprocessing steps used to convert
step n > nlast
k + (tref /∆t).
the user input to a 28×28 image that is fed to the network, (b)
The synaptic current from neuron k in hidden layer to
Examples of user input (left) and the pre-processed 28 × 28
neuron l in output layer as given in equation 3 can be re-written
pixel images fed to the SNN (right).
to be evaluated in an iterative manner, thereby avoiding the
evaluation of expensive exponential of the difference between
the current time n and previous spike times nik . The synaptic A. Image Preprocessing
current computation, at time step n, for each of the (k, l) Fig. 3(a) shows the preprocessing steps used to create the
synapse is spawned in CUDA across 8112 × 10 kernels as: input signal to the SNN from the captured image and Fig. 3(b)
ak (n) = ak (n − 1) × exp(−∆t/τ1 ) + δ(n − nik ) (11) shows some sample pre-processed images. The image captured
bk (n) = bk (n − 1) × exp(−∆t/τ2 ) + δ(n − nik ) (12) from the user is first binarized by thresholding and cropped
to remove excess background. The image is resized to 20
ck (n) = ak (n) − bk (n) (13) pixels along its longer dimension, while maintaining its aspect
Ik,l = wk,l × ck (n) (14) ratio. Thereafter, the resized image is placed in a 28 × 28
where ak (n) and bk (n) represent the rising and falling regions bounding box such that the image’s center of mass coincides
of the double exponential synaptic kernel. The strength of the with the center of the bounding box. Finally, the image is
synapses between the hidden and output layers is initialized passed through a blurring filter to create gray-scale images
to zero during training. At every time step, the error function similar to the ones in the MNIST dataset.
for each output neuron is calculated, based on the difference V. R ESULTS
between the observed and desired spikes. Next, dˆk (equation
5) for the spikes originating from neuron k is computed as: We trained the network on the MNIST training data-set
consisting of 60, 000 images, for 20 epochs. Our network
dˆk (n) = dˆk (n − 1)e−∆t/τL + (ck (n)∆t)/C (15) achieves an error of 0.2% on the training set and 1.94% on
Once dˆk (n) is evaluated, we compute its norm across all the test set with a time step of ∆t = 0.1 ms when the network
k neurons and determine the instantaneous ∆wk,l (n) for all is simulated for T = 100 ms. Table I lists the state-of-the-
the 81, 120 synapses in parallel, if there is a spike error. At art networks (ANN and SNN) for the MNIST classification
the end of presentation, the accumulated ∆wk,l is used to problem. It can be seen that though these networks have
classification accuracies exceeding 99%, they use more than with non-local weight update rules on a GPU platform. At each
7× the number of parameters compared to our network, which time step, the neuronal spike transmission, synaptic current
is designed to simplify the computational load in developing computation and weight update calculation for the network
real-time system. are all executed in parallel in this framework. Using this
GPU implementation, we demonstrated a touch-screen based
TABLE I: Comparison of our SNN with state-of-the-art platform for real-time classification of user-generated images.
Network and learning algorithm Learning synapses Accuracy
Deep Learning [22] 2, 508, 470 99.79% R EFERENCES
ANN converted to SNN [9] 1, 422, 848 99.12%
[1] W. Maass, “Networks of spiking neurons: The third generation of neural
4-layer convolution SNN [16] 581, 520 99.31%
network models,” Neural networks, vol. 10, no. 9, pp. 1659–1671, 1997.
SNN, with NormAD (this work) 81, 120 98.06%
[2] A. Coates et al., “Deep learning with COTS HPC systems,” in 30th
International Conference on Machine Learning, 2013.
If the integration time step interval used during inference [3] A. K. Fidjeland and M. P. Shanahan, “Accelerated simulation of spiking
is 1 ms (i.e., approximating the neuronal integration) instead neural networks using GPUs,” in The 2010 International Joint Confer-
ence on Neural Networks (IJCNN). IEEE, 2010, pp. 1–8.
of 0.1 ms, the MNIST test error increases only by about 0.4% [4] J. M. Nageswaran et al., “A configurable simulation environment for the
(see Fig. 4(a)), but there is a 10× reduction in the processing efficient simulation of large-scale spiking neural networks on graphics
time. Hence, for our touch screen based interface system we processors,” Neural networks, vol. 22, no. 5, pp. 791–800, 2009.
[5] E. Yavuz, J. Turner, and T. Nowotny, “GeNN: a code generation
simulate the SNN with ∆t of 1 ms to infer the users’ digits. framework for accelerated brain simulations,” Nature Scientific Reports,
When each digit is presented for T = 75 ms, the network vol. 6, no. 18854, 2016.
can be simulated in an average wall clock time of 65 ms, [6] D. Yudanov et al., “GPU-based simulation of spiking neural networks
with real-time performance and high accuracy,” in International Joint
making real-time processing possible (Fig. 4(b)). We tested Conference on Neural Networks, July 2010.
[7] F. Naveros et al., “Event-and time-driven techniques using parallel CPU-
GPU co-processing for Spiking Neural Networks,” Front. in Neuroin-
Test accuracy (%)
0
28 x 28
98.0
Image 4 formatics, vol. 11, 2017.
Output
9 Spikes [8] J. L. Krichmar, P. Coussy, and N. Dutt, “Large-scale spiking neural net-
97.8 Pre- SNN works using neuromorphic hardware compatible models,” ACM Journal
dt=0.1 ms
Image processing prediction
dt=1.0 ms Capture complete
and result on Emerging Technologies in Computing Systems (JETC), vol. 11, no. 4,
97.6 display
p. 36, 2015.
97.4
50 100 150 t=0 t=15 t=80 [9] P. U. Diehl et al., “Fast-classifying, high-accuracy spiking deep networks
(a). Presentation time [ms] (b). Time [ms] through weight and threshold balancing,” in 2015 International Joint
Conference on Neural Networks (IJCNN), July 2015, pp. 1–8.
Fig. 4: (a). MNIST test-set accuracy as a function of presenta- [10] Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural
tion time and the integration time step ∆t. (b) Various stages networks for energy-efficient object recognition,” International Journal
of classifying a user’s input: the image pre-processing takes of Computer Vision, vol. 113, no. 1, pp. 54–66, 2015.
[11] B. Rueckauer et al., “Theory and tools for the conversion of
15 ms and the 75 ms SNN emulation is completed in real-time. analog to spiking convolutional neural networks,” arXiv preprint
arXiv:1612.04052, 2016.
the network’s accuracy with ∆t = 1 ms on a set of 500 [12] E. Hunsberger and C. Eliasmith, “Training Spiking Deep Networks for
Neuromorphic Hardware,” arXiv preprint arXiv:1611.05141, 2016.
handwritten digits collected from various users through our [13] N. Anwani and B. Rajendran, “NormAD -Normalized Approximate
user-interface system. At T = 75 ms, we measure an accuracy Descent based supervised learning rule for spiking neurons,” in Inter-
of 97.4% on our set of 500 captured images, while on the national Joint Conference on Neural Networks, July 2015, pp. 1–8.
[14] F. Ponulak and A. Kasinski, “Supervised learning in spiking neural
MNIST test-set it was 97.68%. The slight loss in performance networks with ReSuMe: sequence learning, classification, and spike
compared to the MNIST dataset is attributed to the deviations shifting,” Neural Computation, vol. 22, no. 2, pp. 467–510, 2010.
from the statistical characteristics of the captured images [15] A. Mohemmed et al., “SPAN: Spike Pattern Association Neuron for
Learning Spatio-Temporal Spike Patterns,” International Journal of
compared to the MNIST dataset. Neural Systems, vol. 22, no. 04, 2012.
[16] J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training Deep Spiking Neural
VI. C ONCLUSION Networks using Backpropagation,” Frontiers in Neuroscience, vol. 10,
p. 508, 2016.
We developed a simple three-layer spiking neural network [17] W. W. Lee, S. L. Kukreja, and N. V. Thakor, “CONE: Convex-
that performs spike encoding, feature extraction, and classi- Optimized-Synaptic Efficacies for Temporally Precise Spike Mapping,”
fication. All information processing and learning within the IEEE Transactions on Neural Networks and Learning Systems, vol. 28,
no. 4, pp. 849–861, April 2017.
network is performed entirely in the spike domain. With [18] “The MNIST database of handwritten digits,” available at
approximately 7 times lesser number of synaptic weight pa- http://yann.lecun.com/exdb/mnist/.
rameters compared to the state of the art spiking networks, [19] P. Dayan, L. Abbott et al., “Theoretical neuroscience: computational
and mathematical modeling of neural systems,” Journal of Cognitive
we show that our approach achieves classification accuracy Neuroscience, vol. 15, no. 1, pp. 154–155, 2003.
exceeding 99% on the training set of the MNIST database and [20] M. Harris, “Optimizing parallel reduction in CUDA,” available at
98.06% on its test set. The trained network implemented on the http://bit.ly/2gd7fSb.
[21] I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, “A brief
CUDA parallel computing platform is also able to successfully introduction to OpenCV,” in 2012 Proceedings of the 35th International
identify digits written by users in real-time, demonstrating its Convention MIPRO, May 2012, pp. 1725–1730.
true generalization capability. [22] L. Wan et al., “Regularization of Neural Networks using DropConnect,”
in Proceedings of the 30th International Conference on Machine Learn-
We have also demonstrated a general framework for imple- ing, 2013, pp. 1058–1066.
menting spike based neural networks and supervised learning

1711 03637

Transféré par

Informations du document

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

1711 03637

Transféré par

Droits d'auteur :

Formats disponibles

Learning and Real-time Classification of

Hand-written Digits With Spiking Neural Networks

memory transfer between CPU and GPU local memory is one

of the main bottlenecks, all network variables (i.e., neuron

membrane potentials and synaptic currents) are declared in the

converted to spike trains, which are presented for a duration

T , weighted with twelve 3 × 3 synaptic weight maps (below) Synaptic Spike

potential of a neuron is updated only when the current time

Vous aimerez peut-être aussi