Massively-Parallel
Computation
Steve Furber
ICL Professor of Computer
Engineering
The University of Manchester
Turing Centenary
Turing in Manchester
ARM 968
63 years of progress
Baby:
filled a medium-sized room
used 3.5 kW of electrical power
executed 700 instructions per second
Energy efficiency
Baby:
5 Joules per instruction
SpiNNaker ARM968:
0.000 000 000 2 Joules per instruction
25,000,000,000 times better than Baby!
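The ratio on the slide follows directly from the two energy figures; a quick arithmetic check (all values taken from the slides):

```python
# Values from the slides; nothing here is new data.
baby_j_per_insn = 5.0          # Baby: 3.5 kW / 700 instructions per second
spinnaker_j_per_insn = 2e-10   # SpiNNaker ARM968 core

# Baby's 5 J/instruction follows directly from its power and speed:
assert abs(3500 / 700 - baby_j_per_insn) < 1e-9

ratio = baby_j_per_insn / spinnaker_j_per_insn
print(ratio)   # ≈ 2.5e10, the 25-billion-fold improvement
```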
Bio-inspiration
Can massively-parallel computing resources accelerate our understanding of brain function?
Building brains
Brains demonstrate
massive parallelism (10^11 neurons)
massive connectivity (10^15 synapses)
excellent power-efficiency
much better than today's microchips
Building brains
Neurons
multiple inputs, single output (cf. logic gate)
useful across multiple scales (10^2 to 10^11)
Brain structure
regularity
e.g. 6-layer cortical microarchitecture
Neural Computation
To compute we need:
Processing
Communication
Storage
Processing:
abstract model: linear sum of weighted inputs
  y = Σ_i w_i x_i   (inputs x_1 … x_4, weights w_1 … w_4 in the diagram)
ignores non-linear processes in dendrites
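The abstract model above is a one-liner in code; a minimal sketch, with the names x and w following the slide and the example values chosen purely for illustration:

```python
# The slide's abstract neuron model: a linear sum of weighted inputs.
def weighted_sum(x, w):
    """Compute sum_i w_i * x_i over paired inputs and weights."""
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1.0, 0.0, 1.0, 1.0]     # four inputs x1..x4, as in the slide's diagram
w = [0.5, 0.25, -0.5, 1.0]   # four illustrative weights w1..w4
print(weighted_sum(x, w))    # 1.0
```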
Processing
Leaky integrate-and-fire model
  input spike trains:  x_i(t) = Σ_k δ(t − t_i^k)
  input current:       I = Σ_i w_i x_i
  activation:          dA/dt = −A/τ_A + I
  if A > A_θ: fire, and set A = 0
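The leaky integrate-and-fire model can be sketched in a few lines of forward-Euler integration; the constants here (tau, threshold, dt, the input current) are illustrative choices, not SpiNNaker's actual parameters:

```python
# Minimal leaky integrate-and-fire sketch in the slide's notation:
# activation A integrates input current I and leaks with time constant tau;
# when A crosses the threshold, the neuron fires and A is reset to 0.
def simulate_lif(current, tau=10.0, threshold=1.0, dt=1.0):
    """Euler integration of dA/dt = -A/tau + I; returns firing step indices."""
    A, spikes = 0.0, []
    for step, I in enumerate(current):
        A += dt * (-A / tau + I)
        if A > threshold:        # if A > A_theta: fire ...
            spikes.append(step)
            A = 0.0              # ... and set A = 0
    return spikes

print(simulate_lif([0.15] * 50))   # regular firing under constant input
```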
Processing
Izhikevich model
two variables, one fast, one slow:
  v' = 0.04v^2 + 5v + 140 − u + I
  u' = a(bv − u)
[Figure: membrane potential (mV) and recovery variable against time (ms)]
( www.izhikevich.com )
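The two-variable model above simulates cheaply; a hedged sketch using simple Euler integration, with the standard reset rule (if v ≥ 30 mV then v ← c, u ← u + d) and the common "regular spiking" parameters a, b, c, d from Izhikevich's paper, none of which appear explicitly on the slide:

```python
# Izhikevich model: fast variable v (membrane potential, mV), slow
# recovery variable u. Parameters are the usual regular-spiking values.
def izhikevich(I=10.0, a=0.02, b=0.2, c=-65.0, d=8.0, steps=1000, dt=0.25):
    v, u = c, b * c
    spike_times = []
    for step in range(steps):
        v += dt * (0.04 * v * v + 5 * v + 140 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:                      # spike: reset v and bump u
            spike_times.append(step * dt)  # time in ms
            v, u = c, u + d
    return spike_times

print(izhikevich())   # a regular-spiking train under constant input
```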
Communication
Spikes
biological neurons communicate principally via spike events
asynchronous
information is only: which neuron fired, and when
Address Event Representation (AER)
[Figure: spike train, membrane potential (mV) against time (ms)]
Storage
Synaptic weights
stable over long periods of time
with diverse decay properties?
Medicine
medical informatics
early diagnosis
personalized treatment
Future computing
interactive supercomputing
neuromorphic computing
SpiNNaker project
A million mobile-phone processors in one computer
Able to model about 1% of the human brain, or 10 mice!
Design principles
Virtualised topology
physical and logical connectivity are decoupled
Bounded asynchrony
time models itself
Energy frugality
processors are free
the real cost of computation is energy
SpiNNaker chip
Multi-chip packaging by UNISEM Europe
Chip resources
48-node PCB
864 cores
2,592 cores
SpiNNaker machines
20,000 cores
100,000 cores
Network packets
Four packet types (multicast, point-to-point, nearest-neighbour, fixed-route)
Header (8 bits): type (T), emergency-route (ER) or sequence (SQ), timestamp (TS) and payload (P) fields
[Figure: packet formats; point-to-point packets also carry a source (Srce) address]
Network MC Router
All MC spike event packets are sent to a router
Ternary CAM keeps router size manageable at 1024 entries
(but careful network mapping also essential)
CAM hit yields a set of destinations for this spike event
automatic multicasting
Example entry:
  ternary key:           0 0 1 0 X 1 0 1
  on-chip destinations:  000000010000010000  (one bit per core)
  inter-chip links:      001001  (one bit per link)
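The ternary-CAM lookup described above can be sketched as a key/mask match: bit positions masked out ('X') match either value, and a hit yields the full destination set for automatic multicasting. The entry contents below are illustrative, not a real routing table:

```python
# Sketch of a ternary CAM lookup: each entry is (value, mask, destinations);
# masked-out bits ('X') match anything. A hit returns the multicast set.
def tcam_lookup(table, key):
    """Return destinations of the first entry matching key, else None."""
    for value, mask, destinations in table:
        if (key & mask) == (value & mask):
            return destinations
    return None  # miss: default-route the packet onward

# One entry matching keys 0b0010X101 ('X' = don't care at bit 3):
table = [(0b00100101, 0b11110111, {"core_1", "core_7", "link_NE"})]

print(tcam_lookup(table, 0b00101101))   # hit: the multicast destination set
print(tcam_lookup(table, 0b11111111))   # miss: None
```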
Topology mapping
[Figure: a problem graph (circuit) of cores and synapses mapped onto SpiNNaker nodes 23, 72 and 94; a fragment of the MC routing table at each node steers every spike towards its destination cores.]
Problem mapping
Problem: represented as a network of nodes with a certain behaviour...
...problem is split into two parts...
...compile, link...
SpiNNaker: ...abstract problem topology...
Bisection performance
1,024 links in each direction
~10 billion packets/s at a 10 Hz mean firing rate
250 Gbps bisection bandwidth
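These figures are consistent with the machine modelling ~1% of the brain's 10^11 neurons, one multicast packet per spike; a rough sanity check, where the 250 Mb/s per-link rate is an assumption for illustration rather than a figure from the slide:

```python
# Back-of-envelope check of the bisection-performance numbers.
neurons = 1e9                 # ~1% of the brain's 10^11 neurons
mean_rate_hz = 10             # mean firing rate from the slide
packets_per_s = neurons * mean_rate_hz
print(packets_per_s)          # ~10 billion packets/s, as on the slide

links = 1024                  # bisection links in each direction
per_link_bps = 250e6          # assumed per-link bandwidth (illustrative)
bisection_bps = links * per_link_bps
print(bisection_bps / 1e9)    # 256.0 Gb/s, close to the slide's ~250 Gbps
```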
Conclusions
We have come a long way in 60 years:
a ×10^10 improvement in energy efficiency