Using next generation computers and algorithms for modeling the dynamics of large biomolecular systems
Makoto Taiji
Computational Biology Research Core, RIKEN Quantitative Biology Center; Processor Research Team, RIKEN Advanced Institute for Computational Science
taiji@riken.jp
Cilia of mouse embryo
N. Hirokawa, Y. Okada, Y. Tanaka, "Fluid dynamic mechanism responsible for breaking the left-right symmetry of the human body: the nodal flow", Annual Review of Fluid Mechanics 41, 53-72 (2009). https://www.youtube.com/watch?v=3y_P67KwuvU
30,000 years at ExaFLOPS, 10^21 J of energy (~3×10^18 J is spent in Japan each year). A multiscale approach is essential.
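A back-of-envelope check of these figures, as a minimal sketch. It assumes the machine sustains 10^18 FLOP/s for the whole run and uses a 365.25-day year; the implied energy efficiency is derived here and is not stated on the slide.

```python
# Back-of-envelope check of the cost estimate.
# Assumptions (not from the slide): sustained 1e18 FLOP/s and a 365.25-day year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = 30_000            # run time quoted on the slide
rate = 1e18               # 1 ExaFLOPS, assumed sustained for the whole run
energy_j = 1e21           # energy quoted on the slide
japan_annual_j = 3e18     # yearly energy figure for Japan quoted on the slide

total_flop = years * SECONDS_PER_YEAR * rate   # total arithmetic work
flop_per_joule = total_flop / energy_j         # implied energy efficiency
years_of_japan = energy_j / japan_annual_j     # energy expressed in "Japan-years"

print(f"total work        : {total_flop:.2e} FLOP")
print(f"implied efficiency: {flop_per_joule / 1e9:.2f} GFLOPS/W")
print(f"energy            : {years_of_japan:.0f} years of Japan's consumption")
```

Under these assumptions the run amounts to roughly 9.5×10^29 FLOP, the quoted energy corresponds to about 0.95 GFLOPS/W, and 10^21 J is on the order of 330 years of the quoted Japanese annual figure.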
Weak Scaling
K computer
Organization
- Molecular Fluctuations (Aston Group)
- Molecular Fluctuations / Fluid Dynamics (Moscow Group)
- Multiscale Fluid Dynamics (Univ. London / Cambridge Group)
- HPC (RIKEN)
Mercedes-Benz water as a bridge to the macroscale
Arturs Scukins and Dmitry Nerukh
Implementation in molecular dynamics
Why MB water?
- A relatively simple 2D model that has all the features and peculiarities of real water.
- Being 2D, it scales much better with size, which allows it to reach the spatial sizes of hydrodynamics.
- Well developed theoretically (started by Ben-Naim in the early seventies).
- Computationally well studied by Monte Carlo, but not yet investigated by Molecular Dynamics.
- Well suited for our purpose of developing a hybrid Molecular Dynamics / Fluid Dynamics approach.
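Mercedes-Benz potential:
$$U(\mathbf{X}_i,\mathbf{X}_j) = U_{LJ}(r_{ij}) + U_{HB}(\mathbf{X}_i,\mathbf{X}_j),$$
where $U_{LJ}(r) = 4\epsilon_{LJ}\left[(\sigma_{LJ}/r)^{12} - (\sigma_{LJ}/r)^{6}\right]$ is the Lennard-Jones potential,
$$U_{HB}(\mathbf{X}_i,\mathbf{X}_j) = \epsilon_{HB}\,G(r_{ij}-r_{HB})\sum_{k,l=1}^{3} G(\mathbf{i}_k\cdot\mathbf{u}_{ij}-1)\,G(\mathbf{j}_l\cdot\mathbf{u}_{ij}+1)$$
is the orientation-dependent (hydrogen-bonding) potential, with $\mathbf{i}_k$, $\mathbf{j}_l$ the unit arm vectors of molecules i and j and $\mathbf{u}_{ij}$ the unit vector from i to j, and $G(x) = \exp(-x^2/2\sigma^2)$ is a Gaussian function.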
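A minimal Python sketch of the MB pair energy, assuming the standard 2D form of Silverstein, Haymet and Dill (three arms at 120° per molecule) and their reduced-unit parameters. The parameter values and function names are assumptions for illustration, not taken from the poster.

```python
import math

# Assumed parameters (reduced units) of the standard 2D MB model;
# they are not taken from the poster.
EPS_HB = -1.0       # hydrogen-bond well depth
R_HB = 1.0          # optimal hydrogen-bond length
SIGMA = 0.085       # width of the Gaussians
EPS_LJ = 0.1        # Lennard-Jones well depth (0.1 |EPS_HB|)
SIGMA_LJ = 0.7      # Lennard-Jones diameter (0.7 R_HB)

def gauss(x):
    """Unnormalized Gaussian G(x) = exp(-x^2 / (2 sigma^2))."""
    return math.exp(-x * x / (2.0 * SIGMA * SIGMA))

def arms(theta):
    """Unit vectors of the three arms of a molecule with orientation theta."""
    return [(math.cos(theta + k * 2 * math.pi / 3),
             math.sin(theta + k * 2 * math.pi / 3)) for k in range(3)]

def mb_pair_energy(pi, ti, pj, tj):
    """U = U_LJ + U_HB for molecules at positions pi, pj with angles ti, tj."""
    dx, dy = pj[0] - pi[0], pj[1] - pi[1]
    r = math.hypot(dx, dy)
    u = (dx / r, dy / r)                          # unit vector from i to j
    u_lj = 4 * EPS_LJ * ((SIGMA_LJ / r) ** 12 - (SIGMA_LJ / r) ** 6)
    hb = 0.0
    for a in arms(ti):
        for b in arms(tj):
            dot_i = a[0] * u[0] + a[1] * u[1]     # arm of i should point at j
            dot_j = b[0] * u[0] + b[1] * u[1]     # arm of j should point at i
            hb += gauss(dot_i - 1.0) * gauss(dot_j + 1.0)
    return u_lj + EPS_HB * gauss(r - R_HB) * hb

# Two molecules at the optimal distance with arms facing each other
e_bonded = mb_pair_energy((0.0, 0.0), 0.0, (R_HB, 0.0), math.pi)
# Same distance, molecule i rotated so that no arm points at j
e_rotated = mb_pair_energy((0.0, 0.0), math.pi / 3, (R_HB, 0.0), math.pi)
```

The aligned pair sits near EPS_HB plus a small Lennard-Jones contribution (about -1.04), while rotating one molecule by 60° leaves essentially only the Lennard-Jones term (about -0.04); this strong orientation dependence is what produces the water-like behaviour.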
where ⟨…⟩ is a time average, V is the area (the model is 2D), N is the number of molecules, T is the temperature, K is the kinetic energy, and ρ is the density.
Results
Structure
The RDF qualitatively differs from the Lennard-Jones RDF but coincides with the results obtained using Monte Carlo.
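To illustrate how a 2D radial distribution function of this kind is typically computed, here is a minimal sketch assuming point particles (e.g. molecular centres) in a periodic square box. The function name and the ideal-gas check are illustrative and do not reproduce the MB results.

```python
import numpy as np

def rdf_2d(positions, box, r_max, n_bins):
    """Radial distribution function g(r) for N particles in a periodic square box."""
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box * np.round(diff / box)              # minimum-image convention
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist = dist[np.triu_indices(n, k=1)]            # count each pair once
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    r_mid = 0.5 * (edges[1:] + edges[:-1])
    shell_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)   # 2D shells are annuli
    density = n / box ** 2
    # Expected pair counts in each annulus for an ideal gas: (N/2) * density * area
    ideal = 0.5 * n * density * shell_area
    return r_mid, counts / ideal

# Sanity check on uniformly random (ideal-gas) positions, where g(r) should be ~1
rng = np.random.default_rng(1)
box = 40.0
pos = rng.uniform(0.0, box, size=(1000, 2))
r, g = rdf_2d(pos, box, r_max=10.0, n_bins=20)
```

For a liquid trajectory the same routine would be averaged over many frames; r_max must stay below half the box length for the minimum-image convention to hold.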
Conclusions
- The 2D Mercedes-Benz model mimics real water behaviour.
- It captures the minimum of pressure (volume), the negative expansion coefficient, the minimum of compressibility, and the high heat capacity.
- The RDF qualitatively differs from the Lennard-Jones RDF.
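The negative expansion coefficient follows from the definition α_P = -(1/ρ)(∂ρ/∂T)_P: it is negative wherever the density increases with temperature, i.e. below a density maximum. A minimal finite-difference sketch; the density curve is a made-up parabola used only to show the sign change, not MB simulation data.

```python
import numpy as np

def expansion_coefficient(temps, densities):
    """alpha_P = -(1/rho) (d rho / d T)_P, estimated with finite differences."""
    drho_dt = np.gradient(densities, temps)
    return -drho_dt / densities

# Hypothetical density curve with a maximum at T = 0.20 (illustrative only)
temps = np.linspace(0.10, 0.30, 21)
densities = 0.90 - 2.0 * (temps - 0.20) ** 2
alpha = expansion_coefficient(temps, densities)

for t, a in zip(temps[::5], alpha[::5]):
    print(f"T = {t:.2f}  alpha = {a:+.3f}")
```

Below the density maximum α is negative, above it positive, and it crosses zero at the maximum; with simulation data the densities would come from constant-pressure runs at each temperature.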
Towards accurate modeling across different scales: high-resolution methods for Fluctuating Hydrodynamics equations
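$$\langle s(x,t)\,s(x',t')\rangle = \frac{8}{3}\,k T \eta\,\delta(x-x')\,\delta(t-t'), \qquad \langle q(x,t)\,q(x',t')\rangle = 2 k \kappa T^{2}\,\delta(x-x')\,\delta(t-t')$$
$$\frac{\partial E}{\partial t} + \frac{\partial}{\partial x}\left(\cdots + q + u s\right) = 0, \qquad E = c_v T + \frac{u^{2}}{2}$$
where s is the stochastic stress, q the stochastic heat flux, η the shear viscosity and κ the thermal conductivity.

Stochastic fluxes approximation
$$s_h(x,t) = \sqrt{\frac{8 k T \eta}{3\,\Delta x\,\Delta t}}\;\mathrm{Gauss}(0,1), \qquad q_h(x,t) = \sqrt{\frac{2 k \kappa T^{2}}{\Delta x\,\Delta t}}\;\mathrm{Gauss}(0,1)$$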
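The discretized stochastic fluxes s_h and q_h can be sampled cell by cell: delta-correlated noise becomes independent Gaussian numbers whose variance is the noise strength divided by the space-time cell size Δx Δt. A minimal sketch, assuming a uniform temperature and hypothetical parameter values chosen only to check the variance.

```python
import numpy as np

def stochastic_fluxes(n_cells, dx, dt, k_b, temperature, eta, kappa, rng):
    """Sample s_h and q_h in every cell of a 1D grid.

    Delta-correlated noise is represented by independent Gaussian numbers
    whose variance is the noise strength divided by dx*dt.
    """
    s_amp = np.sqrt(8.0 * k_b * temperature * eta / (3.0 * dx * dt))
    q_amp = np.sqrt(2.0 * k_b * kappa * temperature ** 2 / (dx * dt))
    s_h = s_amp * rng.standard_normal(n_cells)
    q_h = q_amp * rng.standard_normal(n_cells)
    return s_h, q_h

rng = np.random.default_rng(0)
dx, dt = 0.01, 0.001                      # hypothetical grid spacing and time step
s_h, q_h = stochastic_fluxes(200_000, dx, dt, k_b=1.0, temperature=1.0,
                             eta=0.5, kappa=0.2, rng=rng)
# Rescaled sample variances should approach 8kT*eta/3 and 2k*kappa*T^2
s_strength = s_h.var() * dx * dt
q_strength = q_h.var() * dx * dt
```

The 1/(Δx Δt) scaling is why the forcing becomes large on fine grids: halving Δx and Δt quadruples the noise variance, which is the regime the next slide calls challenging.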
For strong stochastic forcing (large s and q fluxes), solving the Landau-Lifshitz Navier-Stokes (LLNS) equations numerically is very challenging.
$$\frac{\partial \phi}{\partial t} + c\,\frac{\partial \phi}{\partial x} = 0$$
- Explicit, second-order in space and time
- Non-dissipative and low-dispersive
- Very compact stencil
- Conservation form
- Staggered variables: one-cell stencil in space and time
- Nonlinear flux correction based on the maximum principle
- Nonlinear flux reconstruction based on the minimum solution variation
- Highly scalable; already used successfully for modelling unsteady convection-dominated flows
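A minimal sketch of the CABARET idea for the linear advection model equation, assuming c > 0, a periodic grid and the classic three-phase form: a half-step conservative update, characteristic extrapolation of the staggered node (flux) variables with min/max clipping, and a second conservative update. The function name and the exact clipping bounds are illustrative choices, not necessarily those used in the presented work.

```python
import numpy as np

def cabaret_step(center, node, nu):
    """One CABARET step for phi_t + c phi_x = 0 with c > 0, periodic grid.

    center[j]: conservative variable in cell j (between node j and node j+1)
    node[j]:   flux variable at node j
    nu:        Courant number c*dt/dx, assumed 0 < nu <= 1
    """
    right = np.roll(node, -1)                      # old value at node j+1
    # Phase 1: conservative update to the half time level
    half = center - 0.5 * nu * (right - node)
    # Phase 2: extrapolate along the characteristic to node j+1 (upwind cell j)
    new_right = 2.0 * half - node
    # Nonlinear correction: clip to the min/max of the old values in cell j
    lo = np.minimum(np.minimum(node, center), right)
    hi = np.maximum(np.maximum(node, center), right)
    new_right = np.clip(new_right, lo, hi)
    new_node = np.roll(new_right, 1)               # new_right[j] lives at node j+1
    # Phase 3: conservative update to the full time level
    new_center = half - 0.5 * nu * (new_right - new_node)
    return new_center, new_node

n = 200
dx = 1.0 / n
x_node = np.arange(n) * dx
x_center = x_node + 0.5 * dx
node = np.sin(2 * np.pi * x_node)
center = np.sin(2 * np.pi * x_center)
initial_sum = center.sum()

nu = 0.5
for _ in range(int(round(1.0 / (nu * dx)))):      # one period for c = 1
    center, node = cabaret_step(center, node, nu)

error = np.abs(center - np.sin(2 * np.pi * x_center)).max()
```

Because both conservative phases are written in flux-difference form, the sum of the cell values is preserved to round-off on a periodic grid, which is one concrete meaning of "conservation form" in the list.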
[Figure: J2 and E2 panels comparing the MacCormack scheme, piecewise parabolic method, third-order Runge-Kutta, CABARET, and molecular simulation]
Interesting phenomena occur concurrently at small and large scales, in both time and space.
It is numerically difficult to deal efficiently with large differences in time and space scales; a multi-space-time algorithm is demonstrated.
Scaling challenges in MD
- 50,000 FLOP/particle/step
- Typical system size: N = 10^5
- 5 GFLOP/step
- 5 TFLOPS effective performance: 1 ms/step = 170 ns/day. Rather easy.
- 5 PFLOPS effective performance: 1 μs/step = 200 μs/day??? Difficult, but important.
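The arithmetic behind these numbers, as a small sketch. The 2 fs time step is an assumption borrowed from the MDGRAPE-4 target, since this slide does not state one.

```python
FLOP_PER_PARTICLE_STEP = 50_000   # from the slide
N_PARTICLES = 100_000             # typical system size, N = 10^5
DT_FS = 2.0                       # assumed time step in femtoseconds
SECONDS_PER_DAY = 86_400

def ns_per_day(effective_flops):
    """Simulated nanoseconds per wall-clock day at a given sustained FLOP rate."""
    flop_per_step = FLOP_PER_PARTICLE_STEP * N_PARTICLES     # 5e9 = 5 GFLOP
    seconds_per_step = flop_per_step / effective_flops
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * DT_FS * 1e-6                      # fs -> ns

for label, flops in [("5 TFLOPS", 5e12), ("5 PFLOPS", 5e15)]:
    print(label, f"{ns_per_day(flops):,.1f} ns/day")
```

With a 2 fs step both cases give about 173 per day (ns and μs respectively), so 170 ns/day matches directly and 200 μs/day is a rounded figure. The point of the slide follows from the second case: it requires a full step every microsecond of wall-clock time.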
Scaling of MD on K Computer
Strong scaling: 50 atoms/core (~3M atoms/PFLOPS)
Since the K computer is still under development, the result shown here is tentative.
System size: 1,674,828 atoms
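Relating the two strong-scaling figures requires a per-core peak rate. The sketch below assumes the K computer node is one SPARC64 VIIIfx chip (8 cores, 128 GFLOPS peak); that figure is not stated on the slide.

```python
import math

# Assumption (not on the slide): 8 cores and 128 GFLOPS peak per node,
# i.e. 16 GFLOPS per core.
PEAK_PER_CORE = 128e9 / 8
ATOMS_PER_CORE = 50                 # strong-scaling figure from the slide
N_ATOMS = 1_674_828                 # system size from the slide

cores_per_pflops = 1e15 / PEAK_PER_CORE
atoms_per_pflops = ATOMS_PER_CORE * cores_per_pflops
cores_for_system = math.ceil(N_ATOMS / ATOMS_PER_CORE)

print(f"{cores_per_pflops:,.0f} cores per PFLOPS")
print(f"{atoms_per_pflops / 1e6:.2f}M atoms per PFLOPS")
print(f"{cores_for_system:,} cores at 50 atoms/core")
```

Under that assumption, 50 atoms/core corresponds to about 3.1M atoms per PFLOPS of peak, consistent with the ~3M figure. At the same granularity, the 1,674,828-atom system would correspond to roughly 33,500 cores.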
[Diagram: the host computer sends particle data to GRAPE, which returns the results]
J. Makino & M. Taiji, Scientific Simulations with Special-Purpose Computers, John Wiley & Sons, 1997.
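The GRAPE division of labour (the host integrates, the special-purpose hardware computes pairwise interactions) can be sketched in Python. Everything here is hypothetical: `pipeline_forces` stands in for the hardware, and the Lennard-Jones interaction and velocity Verlet integrator are assumptions, not details from the cited book.

```python
import numpy as np

def pipeline_forces(pos, eps=1.0, sigma=1.0):
    """Hypothetical stand-in for the special-purpose board.

    Computes all-pairs Lennard-Jones forces; on GRAPE this O(N^2) work runs
    in hardware pipelines, here it is plain NumPy.
    """
    d = pos[:, None, :] - pos[None, :, :]           # r_i - r_j
    r2 = (d ** 2).sum(axis=-1)
    np.fill_diagonal(r2, np.inf)                    # exclude self-interaction
    s6 = (sigma ** 2 / r2) ** 3                     # (sigma/r)^6
    coef = 24.0 * eps * (2.0 * s6 ** 2 - s6) / r2   # |F|/r for each pair
    return (coef[:, :, None] * d).sum(axis=1)

def host_step(pos, vel, force, dt, mass=1.0):
    """Host side: velocity Verlet, with the force evaluation offloaded."""
    vel_half = vel + 0.5 * dt * force / mass
    pos_new = pos + dt * vel_half
    force_new = pipeline_forces(pos_new)            # send particle data, get results
    vel_new = vel_half + 0.5 * dt * force_new / mass
    return pos_new, vel_new, force_new

# Usage: 27 particles on a small cubic lattice, started at rest
pos = np.indices((3, 3, 3)).reshape(3, -1).T * 1.2
vel = np.zeros_like(pos)
force = pipeline_forces(pos)
for _ in range(100):
    pos, vel, force = host_step(pos, vel, force, dt=0.002)
```

Only positions go to the board and only forces come back, so host-accelerator traffic stays O(N) per step while the work on the board is O(N^2).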
SoC-based System
[Diagram: general-purpose cores and embedded memories on a system-on-chip, connected by a low-latency network]
Anton
- D. E. Shaw Research
- Special-purpose pipeline + general-purpose CPU core + specialized network
- Anton showed the importance of optimizing the communication system
MDGRAPE-4
- Special-purpose computer for MD simulation
- Test platform for special-purpose machines
- Target performance: 20 μs/step for a 100K-atom system, i.e. 8.6 μs/day (2 fs/step)
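The target figures are mutually consistent for a 20 μs step and a 2 fs time step:

```latex
\frac{86\,400\ \mathrm{s/day}}{20\ \mu\mathrm{s/step}} = 4.32\times10^{9}\ \mathrm{steps/day},
\qquad
4.32\times10^{9}\ \mathrm{steps/day} \times 2\ \mathrm{fs/step} = 8.64\ \mu\mathrm{s/day}
```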
MDGRAPE-4 System
- MDGRAPE-4 SoC: 12-lane 6 Gbps electrical = 7.2 GB/s (after 8B10B encoding); 48 optical fibers
- Total 512 chips (8×8×8)
MDGRAPE-4 System-on-Chip
- 40 nm (Hitachi HDL4S), ~230 mm²
- 64 force calculation pipelines @ 0.8 GHz, ~2.5 TFLOPS equivalent
- 64 general-purpose processors, Tensilica Xtensa LX4 @ 0.6 GHz
- 72-lane SERDES @ 6 Gbps
- 65 W
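The link and system figures on these slides can be cross-checked with a few lines of arithmetic. This sketch assumes the 72 SERDES lanes are grouped as six 12-lane links, as the "6Gbps x 12 x 6" label suggests. It does not distinguish per-direction from bidirectional bandwidth, which the slides do not state.

```python
# Link and aggregate figures for MDGRAPE-4, from the numbers on the slides.
LANES_PER_LINK = 12        # "12 lane"
LANE_GBPS = 6.0            # raw line rate per lane, Gbit/s
PAYLOAD_FRACTION = 8 / 10  # 8B10B: 8 data bits per 10 transmitted bits
SERDES_LANES = 72          # lanes per SoC
CHIPS = 8 ** 3             # 8 x 8 x 8 arrangement
CHIP_TFLOPS = 2.5          # "~2.5 TFLOPS equivalent" per SoC

def payload_gbytes(lanes):
    """Usable bandwidth in GB/s for a group of lanes after 8B10B decoding."""
    return lanes * LANE_GBPS * PAYLOAD_FRACTION / 8

link_gbytes = payload_gbytes(LANES_PER_LINK)     # one 12-lane link
chip_gbytes = payload_gbytes(SERDES_LANES)       # all SERDES lanes of one SoC
system_pflops = CHIPS * CHIP_TFLOPS / 1000       # whole 512-chip system

print(f"{link_gbytes:.1f} GB/s per link, {chip_gbytes:.1f} GB/s per chip")
print(f"{CHIPS} chips, ~{system_pflops:.2f} PFLOPS equivalent")
```

This reproduces the quoted 7.2 GB/s per link and gives about 43.2 GB/s of SERDES payload per chip. The whole 512-chip system comes to about 1.28 PFLOPS equivalent.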
[Block diagram: control GP core; general-purpose cores, each with IMem and DMem; 8 pipelines; global memory; network unit (6 Gbps × 12 × 6). Bus widths: 128 bit × 2 for general-purpose cores, 192 bit × 2 for pipelines, 64 bit × 6 for network, 256 bit × 2 for inter-block.]
General-Purpose Core
- Tensilica LX @ 0.6 GHz
- 32-bit integer / 32-bit floating point
- 4 KB I-cache / 4 KB D-cache
- 8 KB local memory
[Diagram: eight cores, each with a 4 KB D-cache and 8 KB I-RAM, connected via DMA or PIF access to the global memory and a control processor]
Software evaluation platform for MDGRAPE-4: RTL model
- RTL-based simulator on Cadence NCSim
- Cycle accurate
- Slow (>10 ms/cycle)
- Two levels: precise memory models for instructions, or infinite memory for instructions
- Direct control of network units
- No operating system, with a simple monitor
Evaluation platform based on MDGRAPE-4 simulator
- Extend the MDGRAPE-4 simulator
- Change the balance: more resources for general-purpose cores
- General-purpose cores: shared on-chip memory for 8-16 cores; off-chip memory; synchronization mechanism
[Diagram: several nodes, each with a special-purpose block, general-purpose cores, local/cache memories, an on-chip network and off-chip memory, linked by an off-chip network]
Toward Exascale
For Molecular Dynamics
- Single-chip system: >1/30 of the MDGRAPE-4 system can be embedded in one chip with an 11 nm process
- Local MD + multiscale
- A network is still necessary inside the SoC