06 Session3 G8ExascaleWorkshop INGENIOUS

INGENIOUS:
Using next generation computers and algorithms for modeling the dynamics of large biomolecular systems
Makoto Taiji
Computational Biology Research Core, RIKEN Quantitative Biology Center
Processor Research Team, RIKEN Advanced Institute for Computational Science
taiji@riken.jp
Our future targets
Cilia of mouse embryo
"Fluid Dynamic Mechanism Responsible for Breaking the Left-Right Symmetry of the Human Body: The Nodal Flow," N. Hirokawa, Y. Okada, Y. Tanaka, Annual Review of Fluid Mechanics 41, 53-72 (2009). https://www.youtube.com/watch?v=3y_P67KwuvU
Bacterial Flagellum https://www.youtube.com/watch?v=vxiwhfgzL0Q
Challenges in Molecular Dynamics simulations of biomolecules

[Figure labels: Target Region; Strong Scaling (Anton/MDG4); Weak Scaling (K computer)]
30,000 years at ExaFLOPS, $10^{21}$ J of energy (~$3 \times 10^{18}$ J is spent in Japan each year)
Multiscale approach is essential
S. O. Nielsen, et al, J. Phys. (Condens. Matter.), 15 (2004) R481
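The brute-force figures quoted on this slide (30,000 years at ExaFLOPS, 10^21 J, and about 3x10^18 J per year for Japan) can be cross-checked with a few lines of arithmetic. The sketch below only uses those quoted numbers; the function name and the derived quantities (total operation count, mean power, equivalent years of national energy use) are illustrative and do not come from the slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_budget(years=30_000, flops=1e18, energy_j=1e21,
                       japan_j_per_year=3e18):
    """Derive rough totals from the figures quoted on the slide."""
    seconds = years * SECONDS_PER_YEAR
    return {
        # total floating-point operations over the whole run
        "total_flop": flops * seconds,
        # mean electrical power implied by the quoted energy
        "mean_power_w": energy_j / seconds,
        # quoted energy expressed in years of Japan's consumption
        "japan_years": energy_j / japan_j_per_year,
    }


if __name__ == "__main__":
    for name, value in brute_force_budget().items():
        print(f"{name}: {value:.3g}")
```

Under these numbers the run corresponds to roughly 10^30 operations and several hundred years of the quoted national energy budget, which is the motivation for the multiscale approach.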
Organization
- Molecular Fluctuations (Aston Group)
- Molecular Fluctuations Fluid Dynamics (Moscow Group)
- Multiscale Fluid Dynamics (Univ. London / Cambridge Group)
- HPC (RIKEN)
Mercedes-Benz water as a bridge to the macroscale Arturs Scukins and Dmitry Nerukh
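Implementation in molecular dynamics
Why MB water?
- A relatively simple 2D model that has all the features and peculiarities of real water
- Being 2D, it scales much better with size: it allows reaching the spatial sizes of hydrodynamics
- Well developed theoretically (started by Ben-Naim in the early seventies)
- Computationally well studied by Monte Carlo, but no investigations by Molecular Dynamics
- Well suited for our purpose of developing a hybrid Molecular Dynamics / Fluid Dynamics approach
Mercedes-Benz potential: $U(\mathbf{X}_i, \mathbf{X}_j) = U_{LJ}(r_{ij}) + U_{HB}(\mathbf{X}_i, \mathbf{X}_j)$, where $U_{LJ}(r_{ij}) = 4\varepsilon_{LJ}\left[\left(\sigma_{LJ}/r_{ij}\right)^{12} - \left(\sigma_{LJ}/r_{ij}\right)^{6}\right]$ is the Lennard-Jones potential, $U_{HB}(\mathbf{X}_i, \mathbf{X}_j) = \varepsilon_{HB}\, G(r_{ij} - r_{HB}) \sum_{k,l=1}^{3} G(\mathbf{i}_k \cdot \mathbf{u}_{ij} - 1)\, G(\mathbf{j}_l \cdot \mathbf{u}_{ij} + 1)$ is the orientation-dependent potential, and $G(x) = \exp(-x^2 / 2\sigma^2)$ is a Gaussian function.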
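To make the structure of the Mercedes-Benz potential concrete, here is a minimal sketch of the pair energy in the commonly used form of the model (Lennard-Jones term plus a Gaussian-weighted hydrogen-bond term over the three arms of each molecule). The parameter values are the reduced-unit values usually quoted for MB water and are an assumption here, as are all function and constant names; the slides do not state which parametrization was used.

```python
import math

# Assumed reduced-unit parameters (commonly quoted MB water values).
EPS_HB = -1.0                 # hydrogen-bond energy
R_HB = 1.0                    # hydrogen-bond length
EPS_LJ = 0.1 * abs(EPS_HB)    # Lennard-Jones well depth
SIGMA_LJ = 0.7 * R_HB         # Lennard-Jones contact distance
SIGMA_G = 0.085               # width of the Gaussian G


def gauss(x):
    """Unnormalised Gaussian G(x) = exp(-x^2 / (2 sigma^2))."""
    return math.exp(-x * x / (2.0 * SIGMA_G ** 2))


def arms(theta):
    """Unit vectors of the three arms, separated by 120 degrees."""
    step = 2.0 * math.pi / 3.0
    return [(math.cos(theta + k * step), math.sin(theta + k * step))
            for k in range(3)]


def mb_pair_energy(pos_i, theta_i, pos_j, theta_j):
    """Pair energy U_LJ + U_HB of two 2D MB molecules."""
    dx, dy = pos_j[0] - pos_i[0], pos_j[1] - pos_i[1]
    r = math.hypot(dx, dy)
    ux, uy = dx / r, dy / r   # unit vector pointing from i to j
    u_lj = 4.0 * EPS_LJ * ((SIGMA_LJ / r) ** 12 - (SIGMA_LJ / r) ** 6)
    # An arm of i should point along u, an arm of j against it.
    angular = sum(gauss(ax * ux + ay * uy - 1.0) * gauss(bx * ux + by * uy + 1.0)
                  for ax, ay in arms(theta_i)
                  for bx, by in arms(theta_j))
    u_hb = EPS_HB * gauss(r - R_HB) * angular
    return u_lj + u_hb
```

With two molecules at the hydrogen-bond distance and one arm of each pointing at the other, `mb_pair_energy((0, 0), 0.0, (1, 0), math.pi)` is dominated by the hydrogen-bond term; rotating one molecule by 60 degrees leaves essentially only the weak Lennard-Jones attraction.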
We have derived the formulas for calculating thermodynamics from MD trajectories

Temperature, pressure, heat capacity, thermal expansion coefficient, compressibility [formulas not preserved in the extracted text],
where $\langle \cdot \rangle$ is a time average, $V$ is an area, $N$ is the number of molecules, $T$ is the temperature, $K$ is the kinetic energy, and $\rho$ is the density.
Results
Structure
The RDF qualitatively differs from the Lennard-Jones RDF but coincides with the results obtained using Monte Carlo
Conclusions
- The 2D Mercedes-Benz model mimics real water behaviour.
- It captures the minimum of pressure (volume), the negative expansion coefficient, the minimum of compressibility and the high heat capacity.
- The RDF qualitatively differs from the Lennard-Jones RDF.
Towards accurate modeling across different scales: high-resolution methods for Fluctuating Hydrodynamics equations

V.Y.Glotov, V.M.Goloviznin, A.V.Danilin

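One-dimensional case

$$\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u)}{\partial x} = 0$$

$$\frac{\partial (\rho u)}{\partial t} + \frac{\partial (\rho u^2 + P)}{\partial x} - \frac{4}{3}\eta\,\frac{\partial^2 u}{\partial x^2} - \frac{\partial s}{\partial x} = 0$$

$$\frac{\partial E}{\partial t} + \frac{\partial \left[(E + P)\,u\right]}{\partial x} - \frac{\partial}{\partial x}\left(\frac{4}{3}\eta\,u\,\frac{\partial u}{\partial x} + \kappa\,\frac{\partial T}{\partial x}\right) - \frac{\partial (q + u s)}{\partial x} = 0$$

$$E = \rho c_v T + \frac{\rho u^2}{2}$$

$$\langle s(x,t)\, s(x',t') \rangle = \frac{8 k \eta T}{3}\,\delta(x - x')\,\delta(t - t'); \qquad \langle q(x,t)\, q(x',t') \rangle = 2 k \kappa T^2\,\delta(x - x')\,\delta(t - t')$$

where $s$ and $q$ are the stochastic stress and heat fluxes, $k$ is the Boltzmann constant, $\eta$ the viscosity and $\kappa$ the thermal conductivity.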
Characteristic form of LL-NS equations

[Three equations in characteristic form for $u$, $P$ and $T$, with right-hand sides $G_1$, $G_2$, $G_3$ and characteristic speeds that depend on $c$, $\gamma$ and the stochastic flux $s$; the formulas are not recoverable from the extracted text]
Condition for hyperbolicity
$$s < \frac{\rho c^2}{\gamma - 1}$$
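Stochastic fluxes
Stochastic fluxes approximation

$$\langle s(x,t)\, s(x',t') \rangle = \frac{8 k \eta T}{3}\,\delta(x - x')\,\delta(t - t'); \qquad \langle q(x,t)\, q(x',t') \rangle = 2 k \kappa T^2\,\delta(x - x')\,\delta(t - t')$$

$$s_h(x,t) = \sqrt{\frac{8 k \eta T}{3\,\Delta x\,\Delta t}}\;\mathrm{Gauss}(0,1)$$

$$q_h(x,t) = \sqrt{\frac{2 k \kappa T^2}{\Delta x\,\Delta t}}\;\mathrm{Gauss}(0,1)$$

with $\Delta x$ the cell size and $\Delta t$ the time step.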
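The discrete approximation above replaces each delta-correlated flux by an independent Gaussian random number per cell face and time step, scaled so that its variance grows as 1/(Δx Δt). A minimal sketch follows, assuming the viscosity and thermal conductivity prefactors as reconstructed here; the function names and the water-like parameter values in the usage note are illustrative only.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant, J/K


def flux_amplitudes(temperature, eta, kappa, dx, dt):
    """Standard deviations of the discrete stochastic fluxes s_h and q_h."""
    amp_s = math.sqrt(8.0 * K_B * eta * temperature / (3.0 * dx * dt))
    amp_q = math.sqrt(2.0 * K_B * kappa * temperature ** 2 / (dx * dt))
    return amp_s, amp_q


def sample_fluxes(temperature, eta, kappa, dx, dt, n_faces, rng):
    """Draw one independent Gauss(0,1) sample per face for each flux."""
    amp_s, amp_q = flux_amplitudes(temperature, eta, kappa, dx, dt)
    s_h = [amp_s * rng.gauss(0.0, 1.0) for _ in range(n_faces)]
    q_h = [amp_q * rng.gauss(0.0, 1.0) for _ in range(n_faces)]
    return s_h, q_h
```

For example, `sample_fluxes(300.0, 1e-3, 0.6, 1e-9, 1e-12, 100, random.Random(0))` draws fluxes for 100 faces with illustrative water-like coefficients. Refining the mesh or the time step increases the noise amplitude, which is one reason the strongly forced regime is numerically demanding.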
For high value of stochastic forcing (large s and q fluxes) the solution of the LL Navier-Stokes equations is very challenging
Our choice: Compact Accurately Boundary Adjusting high-REsolution Technique (CABARET)

Iserles 1986; Roe 1998; Samarskii and Goloviznin 1998; Goloviznin and Karabasov 1998; Karabasov, Hynes and Goloviznin 2001; Tran and Scheurer 2002; Kim 2004; Goloviznin 2005; Karabasov and Goloviznin 2007, 2009
$$\frac{\partial \varphi}{\partial t} + c\,\frac{\partial \varphi}{\partial x} = 0$$
- Explicit, second-order in space and time
- Non-dissipative and low-dispersive
- Very compact stencil
- Conservation form
- Staggered variables: one-cell stencil in space and time
- Nonlinear flux correction based on the maximum principle
- Nonlinear flux reconstruction based on the minimum solution variation
- Highly scalable method, already used successfully in unsteady convection-dominated flow modelling
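The properties listed above can be illustrated on the linear advection equation with a minimal CABARET step in its usual three-phase form: a conservative predictor for cell-centre values, a characteristic extrapolation of face values limited by the local maximum principle, and a conservative corrector. The sketch assumes a constant positive speed c, a uniform periodic mesh and no source terms; the function and variable names are illustrative and not taken from the slides.

```python
def cabaret_step(face, cell, cfl):
    """One CABARET step for d(phi)/dt + c d(phi)/dx = 0 with c > 0.

    face[i] is the flux variable at the left face of cell i,
    cell[i] is the conservative variable at the centre of cell i,
    cfl = c * dt / dx (0 < cfl <= 1). The mesh is periodic.
    """
    n = len(cell)
    # Predictor: conservative half-step update of the cell-centre values.
    half = [cell[i] - 0.5 * cfl * (face[(i + 1) % n] - face[i])
            for i in range(n)]
    # Extrapolate along the characteristic to the downstream face,
    # then clip to the local min/max (maximum principle).
    new_face = [0.0] * n
    for i in range(n):
        right = (i + 1) % n
        value = 2.0 * half[i] - face[i]
        lower = min(face[i], cell[i], face[right])
        upper = max(face[i], cell[i], face[right])
        new_face[right] = min(max(value, lower), upper)
    # Corrector: second conservative half-step with the new face values.
    new_cell = [half[i] - 0.5 * cfl * (new_face[(i + 1) % n] - new_face[i])
                for i in range(n)]
    return new_face, new_cell
```

Each update touches only one cell and its two faces, which is the one-cell stencil in space and time mentioned above. At `cfl = 1.0` a profile whose cell values are the averages of the neighbouring face values is shifted by exactly one cell per step.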
Comparison of several computational schemes for the Bell problem

Variance in conserved quantities at equilibrium

Scheme                       ⟨δρ²⟩ (x10^-8)   Error     ⟨δJ²⟩    Error    ⟨δE²⟩ (x10^10)   Error
Exact value                  2.35                       13.34             2.84
MacCormack scheme            2.01             -14.3%    13.31    -0.3%    2.61             -8.4%
Piecewise parabolic method   1.97             -16.0%    13.27    -0.5%    2.58             -9.4%
Third-order Runge-Kutta      2.34             -1.3%     13.65    2.3%     2.87             0.9%
CABARET                      2.31             -1.7%     13.18    -1.2%    2.75             -3.2%
Molecular simulation         2.35             0%        13.21    -1%      2.78             -2.1%
A MULTI-SPACE-TIME ALGORITHM FOR CONCURRENT LARGE/SMALL SCALE FLUID DYNAMICS SIMULATIONS

Anton Markesteijn and Sergey Karabasov
Towards micro- and nano-scales

- Temperature fluctuations important, large density and velocity fluctuations
- Acoustics: ultra-sound / Biological applications: coupling with MD
- Interesting phenomena concurrently occur at small and larger scale, both in time and space
- Numerically difficult to deal efficiently with large time/space differences
- A multi-space-time algorithm is demonstrated
Multi Space-Time algorithm - Overview

Fluctuating Hydrodynamics (Landau & Lifshitz)
- Mimic microscopic behaviour at macroscopic scales
- Dissipative fluxes treated as stochastic variables
- Thermodynamics: Fluctuation-Dissipation theorem
Scale Function: A (pre)defined meshless zoom value
- The value of this function is increased where small time and space phenomena are dominant
- The scale function also determines the actual computational grid
Equation transformations both in space and time
- Transformations are dependent on the scale function
- Transformed (Computational Domain) / Untransformed (Physical Domain)
Special time marching (local and global time)
- Local time step controlled by the scale function
- Cells only updated when necessary (local < global time)
- (CFL curse): increased efficiency, decreased error
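The time-marching idea above can be sketched as a bookkeeping loop: every cell carries its own local time and local step, the global clock advances with the smallest step, and a cell is updated only while its local time lags behind the global time. This is one possible reading of the slide, not the authors' implementation. In particular, taking the local step as inversely proportional to the zoom value is an assumption, and the physics update is replaced by a counter.

```python
def count_local_updates(zoom, dt_coarse, t_end):
    """Count how often each cell is updated under local time marching.

    zoom[i] is the scale-function value of cell i (larger = finer);
    the local step is assumed to be dt_coarse / zoom[i].
    """
    dt_local = [dt_coarse / z for z in zoom]
    dt_global = min(dt_local)            # global clock ticks with the finest step
    local_time = [0.0] * len(zoom)
    updates = [0] * len(zoom)
    n_global_steps = round(t_end / dt_global)
    for step in range(1, n_global_steps + 1):
        global_time = step * dt_global
        for i in range(len(zoom)):
            # Update only cells that lag behind the global time.
            if local_time[i] < global_time:
                local_time[i] += dt_local[i]
                updates[i] += 1          # a real solver would advance the cell here
    return updates
```

For a 1D scale difference of 5, `count_local_updates([1, 5, 1], 1.0, 2.0)` shows the refined cell being updated five times as often as its coarse neighbours, instead of forcing every cell to take the smallest step.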
Some Examples of Scale Functions

1D: scale difference of 5
- Both mesh size and local time scaled
- Computational domain: simple Cartesian mesh
2D mesh: radial, 25 to 1 (200x200 mesh)
2D Example: Fluctuating Hydrodynamics

Scale difference of 100, on a 200x200 mesh
Probe in centre of domain

- Measure density transient
- Acoustic signal recovered from noise
Time ensemble
Variables are Maxwellian
Fluctuating Hydrodynamics vs MD: density fluctuations and the speed of sound

- Domain 250x40, scale function 1 to 25 to 1 in plateaus
- Smallest volume 0.6x0.6x0.6 nm^3 (liquid water)
- Speed of sound obtained by fit (~1510 m/s)
- Continuum results compared to MD results
Scaling challenges in MD
- 50,000 FLOP/particle/step
- Typical system size: N = 10^5
- 5 GFLOP/step
- 5 TFLOPS effective performance: 1 ms/step = 170 ns/day. Rather easy
- 5 PFLOPS effective performance: 1 μs/step = 200 μs/day??? Difficult, but important
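The throughput figures on this slide follow from simple arithmetic, sketched below. The 50,000 FLOP per particle per step and the system size are taken from the slide; the 2 fs time step is the value quoted later for the MDGRAPE-4 target and is assumed to apply here as well. The function name is illustrative.

```python
FLOP_PER_PARTICLE_STEP = 50_000
SECONDS_PER_DAY = 86_400


def md_throughput(n_atoms, effective_flops, timestep_s=2e-15):
    """Return (wall-clock seconds per step, simulated seconds per day)."""
    flop_per_step = FLOP_PER_PARTICLE_STEP * n_atoms
    wall_per_step = flop_per_step / effective_flops
    steps_per_day = SECONDS_PER_DAY / wall_per_step
    return wall_per_step, steps_per_day * timestep_s


if __name__ == "__main__":
    for flops in (5e12, 5e15):
        per_step, per_day = md_throughput(10 ** 5, flops)
        print(f"{flops:.0e} FLOPS: {per_step:.1e} s/step, {per_day:.2e} s/day")
```

With 5 TFLOPS this gives 1 ms per step and about 0.17 μs of simulated time per day; with 5 PFLOPS it gives 1 μs per step and about 0.17 ms per day, which the slide rounds to 200 μs/day.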
Scaling of MD on K Computer
Strong scaling 50 atoms/core ~3M atoms/Pflops
Since K Computer is still under development, the result shown here is tentative.
1,674,828 atoms
GRAPE: special-purpose computer for classical particle simulations

- GRAvity PipE
- Originally proposed by Prof. Chikada, NAOJ
- Special-purpose accelerator
  - Astrophysical N-body simulations
  - Molecular Dynamics simulations
[Diagram: the host computer sends particle data to GRAPE and receives results. GRAPE: most of the calculation; host computer: others]
J. Makino & M. Taiji, Scientific Simulations with Special-Purpose Computers, John Wiley & Sons, 1997.
Problem in Heterogeneous System - GRAPE/GPUs
- In small system
  - Good acceleration, high performance/cost
- In massively-parallel system
  - Scaling is often limited by host-host network, host-accelerator interface
Typical Accelerator System
[Diagram: host computers, each with attached accelerators over a low-bandwidth, high-latency interface, connected by a high-latency host network]
SoC-based System
[Diagram: general-purpose cores, accelerators and embedded memories integrated on a high-bandwidth, low-latency system-on-chip; chips connected by a low-latency network]
Anton
- D. E. Shaw Research
- Special-purpose pipeline + general-purpose CPU core + specialized network
- Anton showed the importance of optimizing the communication system
R. O. Dror et al., Proc. Supercomputing 2009, in USB memory.
MDGRAPE-4
- Special-purpose computer for MD simulation
- Test platform for special-purpose machines
- Target performance
  - 20 μs/step for 100K atom system
  - 8.6 μs/day (2 fs/step)
- Target application: GROMACS
- Completion: ~2013
- Enhancement from MDGRAPE-3
  - 130 nm → 40 nm process
  - Integration of Network / CPU
MDGRAPE-4 System
- MDGRAPE-4 SoC: total 512 chips (8x8x8)
- Node (2U box): total 64 nodes (4x4x4) = 4 pedestals
- 12 lane 6 Gbps electric = 7.2 GB/s (after 8B10B encoding)
- 12 lane 6 Gbps optical
- 48 optical fibers
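The link and packaging figures on this slide can be reproduced with a short calculation. It only uses the numbers quoted above (12 lanes at 6 Gbps, 8B10B encoding, 512 chips, 64 nodes); the function names are illustrative.

```python
def payload_bandwidth_gbytes(lanes=12, gbps_per_lane=6.0,
                             payload_bits=8, coded_bits=10):
    """Usable bandwidth in GB/s after line coding (8B10B by default)."""
    raw_gbps = lanes * gbps_per_lane
    payload_gbps = raw_gbps * payload_bits / coded_bits
    return payload_gbps / 8.0   # bits to bytes


def chips_per_node(total_chips=8 ** 3, total_nodes=4 ** 3):
    """Number of SoCs housed in one 2U node."""
    return total_chips // total_nodes


if __name__ == "__main__":
    print(payload_bandwidth_gbytes(), "GB/s per 12-lane link")
    print(chips_per_node(), "chips per node")
```

The 72 Gbps raw rate loses 20% to 8B10B coding, leaving 57.6 Gbps or 7.2 GB/s, and the 8x8x8 chip torus divided over 4x4x4 nodes places eight chips in each node.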
MDGRAPE-4 System-on-Chip
- 40 nm (Hitachi HDL4S), ~230 mm^2
- 64 force calculation pipelines @ 0.8 GHz, ~2.5 TFLOPS equivalent
- 64 general-purpose processors: Tensilica Xtensa LX4 @ 0.6 GHz
- 72 lane SERDES @ 6 GHz
- 65 W
SoC Block Diagram

[Block diagram: pipeline blocks with 8 pipelines each; GP blocks of cores, each core with IMem and DMem; instruction memories (CGP), (1) and (2); control GP; message queue; bus arbiters / DMAC; global memory; network unit (6 Gbps x 12 x 6); FPGA IF (100 MHz x 128)]
Embedded Global Memories in SoC

- ~1.8 MB
- 4 blocks (GM4 block, 460 KB each)
- For each block
  - 128 bit x 2 for general-purpose cores
  - 192 bit x 2 for pipelines
  - 64 bit x 6 for network
  - 256 bit x 2 for inter-block
[Diagram: four GM4 blocks of 460 KB; labels: 2 pipeline blocks, network, 2 GP blocks]
General-Purpose Core
- Tensilica LX @ 0.6 GHz
- 32-bit integer / 32-bit floating point
- 4 KB I-cache / 4 KB D-cache
- 8 KB local memory
  - DMA or PIF access
- 8 KB local instruction memory
  - DMA read from 512 KB instruction memory
[Diagram: each core has integer and floating-point units, a queue, 4 KB I-cache, 4 KB D-cache, 8 KB I-RAM and 8 KB D-RAM. A GP block groups eight cores with an instruction DMAC, barrier, DMAC, PIF and queue IF, connected to the GP block instruction memory, the global memory and the control processor]
Software evaluation platform for MDGRAPE-4: RTL model
- RTL-based simulator on Cadence Ncsim
- Cycle accurate
- Slow (>10 ms/cycle)
Software evaluation platform for MDGRAPE-4:

- Under construction (4Q 2012)
- Tensilica XTMP based multicore processor simulator (non-free)
- Includes behavior models of
  - Network
  - Special-purpose pipeline
  - Memories (latency can be considered)
- Two levels
  - Precise memory models for instruction
  - Infinite memory for instruction
Software evaluation platform for MDGRAPE-4 (2)
- Programming language
  - C, C++ (without malloc)
- Direct control of network units
- No operating system, with simple monitor
Evaluation platform based on MDGRAPE-4 simulator
- Extend MDGRAPE-4 simulator
- Change balance
  - More resources for general-purpose cores
- General-purpose cores
  - Shared on-chip memory for 8-16 cores
  - Off-chip memory
  - Synchronization mechanism
- Special-purpose pipelines
- Network interface
[Diagram: chips, each with a special-purpose block, general-purpose cores, local/cache memories and an on-chip network, some with off-chip memory, connected by an off-chip network]
Toward Exascale
For Molecular Dynamics
- Single-chip system
  - >1/30 of the MDGRAPE-4 system can be embedded with 11 nm process
  - Local MD + multiscale
  - Still, a network is necessary inside the SoC
- For further strong scaling for MD
  - # of operations / step / 20K atoms ~ 10^9
  - # of arithmetic units in system ~ 10^6 / Pflops
  - Exascale means flash (one-path) calculation
  - More specialization is required
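The "flash calculation" remark can be read as a ratio of work to hardware: with about 10^9 operations per step and about 10^6 arithmetic units per Pflops, an exascale machine has roughly one arithmetic unit per operation. A small sketch of that ratio follows, using only the two figures quoted above; the function name and the interpretation are illustrative.

```python
def ops_per_unit_per_step(ops_per_step=1e9, units_per_pflops=1e6,
                          system_pflops=1000.0):
    """Average number of operations each arithmetic unit performs per MD step."""
    units = units_per_pflops * system_pflops
    return ops_per_step / units


if __name__ == "__main__":
    for pflops in (1.0, 10.0, 1000.0):
        print(f"{pflops:>6.0f} Pflops: {ops_per_unit_per_step(system_pflops=pflops):g} ops/unit/step")
```

At 1 Pflops each unit still performs about a thousand operations per step, but at 1000 Pflops the ratio drops to about one, so every unit is used once per step and there is no reuse left to amortize communication.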
Meetings & Visits

- Past
  - Nov 2011 @ Cambridge
  - UK PI (Dr. Nerukh)'s visit to RIKEN for a month in Dec 2011
  - Sep 2012 @ Kobe
  - UK researcher (Mr. Scukins) at RIKEN (Sep-Nov 2012)
- Future related events
  - Dec 2012: UK-Japan bilateral workshop at the British Embassy in Tokyo (supported by the British Embassy Japan)
  - Jul 2013: Royal Society Kavli Seminar in UK, "Multiscale systems: linking quantum chemistry, molecular dynamics, and microfluidic hydrodynamics"
  - Project workshops in UK and/or Japan (2013)
