
MPI Quick Reference: Compiling/Running

M. D. Jones, Ph.D.
Center for Computational Research, University at Buffalo, State University of New York

High Performance Computing I, 2012


Quickstart to Compiling & Running MPI Applications at CCR

Background

This document covers the essentials of compiling and running MPI applications on the CCR platforms. It does not cover MPI programming itself or debugging, which are covered more thoroughly in separate presentations.


Modules Software Management System

There are a large number of software packages available on the CCR systems, particularly the Linux clusters. To help maintain this often confusing environment, the modules package is used to add and remove these packages from your default environment (many of the packages conflict in terms of their names, libraries, etc., so the default is a minimally populated environment).


The module Command


module command syntax:
bash-2.05b$ module help
Modules Release 3.1.6 (Copyright GNU GPL v2 1991):

Available Commands and Usage:
  + add|load        modulefile [modulefile ...]
  + rm|unload       modulefile [modulefile ...]
  + switch|swap     modulefile1 modulefile2
  + display|show    modulefile [modulefile ...]
  + avail           [modulefile [modulefile ...]]
  + use [-a|--append] dir [dir ...]
  + unuse           dir [dir ...]
  + update
  + purge
  + list
  + clear
  + help            [modulefile [modulefile ...]]
  + whatis          [modulefile [modulefile ...]]
  + apropos|keyword string
  + initadd         modulefile [modulefile ...]
  + initprepend     modulefile [modulefile ...]
  + initrm          modulefile [modulefile ...]
  + initswitch      modulefile1 modulefile2
  + initlist
  + initclear
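A few common invocations, shown here as a hedged illustration (the module names are the ones used later in this document; what is actually available on a given front end may differ):

[u2:~]$ module avail                       # list every package known to the modules system
[u2:~]$ module load intel-mpi intel/12.1   # add Intel MPI and the Intel compilers to your environment
[u2:~]$ module list                        # verify what is currently loaded
[u2:~]$ module unload intel-mpi            # remove a package again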


Using module in Batch

If you change shells in your batch script you may need to explicitly initialize the modules environment:

tcsh:

source $MODULESHOME/init/tcsh

bash:

. ${MODULESHOME}/init/bash

Generally, though, you should not need to worry about this step (run module list; if that works, your environment is already properly initialized).
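As a minimal sketch, the relevant fragment of a bash batch script might look like the following (the module names are the ones used later in this document):

. ${MODULESHOME}/init/bash        # only needed if the modules environment is not already initialized
module load intel-mpi intel/12.1  # select the MPI flavor and compiler
module list                       # sanity check before launching anything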


Simple MPI Example

Objective: Construct a very elementary MPI program to do the usual Hello World problem, i.e. have each process print out its rank in the communicator.


in C


#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
  int myid, nprocs;
  int namelen, mpiv, mpisubv;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  MPI_Get_processor_name(processor_name, &namelen);
  printf("Process %d of %d on %s\n", myid, nprocs, processor_name);
  if (myid == 0) {
    MPI_Get_version(&mpiv, &mpisubv);
    printf("MPI Version: %d.%d\n", mpiv, mpisubv);
  }
  MPI_Finalize();
  return 0;
}


U2: Intel MPI (Preferred!!)


There are several commercial implementations of MPI, Intel and HP currently being the most prominent (IBM, Sun, SGI, etc. all have their own variants, but these are usually supported only on their own hardware). CCR has a license for Intel MPI, which has some nice features:

- Support for multiple networks (InfiniBand, Myrinet, TCP/IP)
- Part of the ScaLAPACK support in the Intel MKL
- MPI-2 features (one-sided communication, dynamic tasks, I/O with parallel filesystem support)
- Extensive CPU pinning/process affinity options


Build the code with the appropriate wrappers:


[u2:~/d_mpisamples]$ module load intel-mpi intel/12.1
[u2:~/d_mpisamples]$ module list
Currently Loaded Modulefiles:
  1) null       2) modules    3) use.own    4) intel-mpi/4.0.3   5) intel/12.1
[u2:~/d_mpisamples]$ mpiicc -o hello.impi hello.c        # icc version
[u2:~/d_mpisamples]$ mpicc -o hello.impi.gcc hello.c     # gcc version

Unfortunately Intel MPI still lacks tight integration with PBS/Torque, and instead relies on daemons (launched by you) or the hydra task launcher to initiate MPI tasks.
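As a sketch, the hydra launcher can be used in place of the mpd daemon ring (mpiexec.hydra ships with Intel MPI; the flags shown are standard hydra options, but check the Reference Manual for the version you have loaded):

NPROCS=`cat $PBS_NODEFILE | wc -l`
# hydra reads the host list directly, so no mpdboot/mpdallexit is needed
mpiexec.hydra -machinefile $PBS_NODEFILE -np $NPROCS ./hello.impi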



#PBS -S /bin/bash
##PBS -q debug
#PBS -l walltime=00:10:00
#PBS -l nodes=2:GM:ppn=2
#PBS -M jonesm@ccr.buffalo.edu
#PBS -m e
#PBS -N test
#PBS -o subQ.out
#PBS -j oe
#
# Note the above directives can be commented out using an
# additional "#" (as in the debug queue line above)
#
module load intel-mpi intel/12.1
#
# cd to directory from which job was submitted
#
cd $PBS_O_WORKDIR
#
# Intel MPI has no tight integration with PBS,
# so you have to tell it where to run, but its mpirun
# wrapper will auto-detect PBS.
# You can find a description of all Intel MPI parameters in the
# Intel MPI Reference Manual.
# See <intel-mpi-install-dir>/doc/Reference_manual.pdf
#



export I_MPI_DEBUG=5   # nice debug level, spits out useful info
NPROCS=`cat $PBS_NODEFILE | wc -l`
NODES=`cat $PBS_NODEFILE | uniq`
NNODES=`cat $PBS_NODEFILE | uniq | wc -l`
#
# mpd-based way:
mpdboot -n $NNODES -f $PBS_NODEFILE -v
mpdtrace
mpiexec -np $NPROCS ./hello.impi
mpdallexit
#
# mpirun wrapper:
mpirun -np $NPROCS ./hello.impi


Intel MPI on Myrinet


The older U2 nodes have Myrinet - by default Intel MPI tries to run over the best available network:
[u2:~/d_mpisamples]$ cat subQ.out
Job 2949822.d15n41.ccr.buffalo.edu has requested 2 cores/processors per node.
running mpdallexit on f09n35
LAUNCHED mpd on f09n35 via
RUNNING: mpd on f09n35
LAUNCHED mpd on f09n34 via f09n35
RUNNING: mpd on f09n34
f09n35
f09n34
[0] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[1] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[2] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[3] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[1] MPI startup(): DAPL provider mx2
[0] MPI startup(): DAPL provider mx2
[3] MPI startup(): DAPL provider mx2
[2] MPI startup(): DAPL provider mx2
[0] MPI startup(): shm and dapl data transfer modes
[1] MPI startup(): shm and dapl data transfer modes
[3] MPI startup(): shm and dapl data transfer modes
[2] MPI startup(): shm and dapl data transfer modes
Process 1 of 4 on f09n35.ccr.buffalo.edu
Process 3 of 4 on f09n34.ccr.buffalo.edu
[0] MPI startup(): Rank  Pid    Node name                 Pin cpu
[0] MPI startup(): 0     30112  f09n35.ccr.buffalo.edu    0


[0] MPI startup(): 1     30111  f09n35.ccr.buffalo.edu    1
[0] MPI startup(): 2     31983  f09n34.ccr.buffalo.edu    0
[0] MPI startup(): 3     31984  f09n34.ccr.buffalo.edu    1
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS_LIST=dapl,tcp
[0] MPI startup(): I_MPI_FALLBACK=enable
[0] MPI startup(): I_MPI_PLATFORM=auto
Process 0 of 4 on f09n35.ccr.buffalo.edu
MPI Version: 2.1
Process 2 of 4 on f09n34.ccr.buffalo.edu
[1] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[0] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[3] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[2] DAPL startup(): trying to open default DAPL provider from dat registry: mx2
[0] MPI startup(): DAPL provider mx2
[1] MPI startup(): DAPL provider mx2
...
[3] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): Rank  Pid    Node name                 Pin cpu
Process 1 of 4 on f09n35.ccr.buffalo.edu
[0] MPI startup(): 0     30125  f09n35.ccr.buffalo.edu    0
Process 2 of 4 on f09n34.ccr.buffalo.edu
Process 3 of 4 on f09n34.ccr.buffalo.edu
[0] MPI startup(): 1     30126  f09n35.ccr.buffalo.edu    1
[0] MPI startup(): 2     32040  f09n34.ccr.buffalo.edu    0
[0] MPI startup(): 3     32041  f09n34.ccr.buffalo.edu    1
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS_LIST=dapl,tcp
[0] MPI startup(): I_MPI_FALLBACK=enable
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 1
[0] MPI startup(): I_MPI_PLATFORM=auto
Process 0 of 4 on f09n35.ccr.buffalo.edu
MPI Version: 2.1


Intel MPI on InfiniBand

The newest U2 nodes have InfiniBand (IB) as the optimal interconnect for message passing; an Intel MPI job should automatically find and use IB on those machines (and they have 8 or 12 cores each, so adjust your script accordingly; a sketch of an adjusted resource request follows the output below):
[u2:~/d_mpisamples]$ cat subQ.out


Job 2949999.d15n41.ccr.buffalo.edu has requested 12 cores/processors per node.
running mpdallexit on k16n13a
LAUNCHED mpd on k16n13a via
RUNNING: mpd on k16n13a
LAUNCHED mpd on k16n12b via k16n13a
RUNNING: mpd on k16n12b
k16n13a
k16n12b
[0] MPI startup(): shm and tmi data transfer modes
[12] MPI startup(): shm and tmi data transfer modes
[4] MPI startup(): shm and tmi data transfer modes
[19] MPI startup(): shm and tmi data transfer modes
[1] MPI startup(): shm and tmi data transfer modes
[20] MPI startup(): shm and tmi data transfer modes
[11] MPI startup(): shm and tmi data transfer modes
[14] MPI startup(): shm and tmi data transfer modes
[5] MPI startup(): shm and tmi data transfer modes
[13] MPI startup(): shm and tmi data transfer modes
[3] MPI startup(): shm and tmi data transfer modes
[23] MPI startup(): shm and tmi data transfer modes
[2] MPI startup(): shm and tmi data transfer modes
[21] MPI startup(): shm and tmi data transfer modes
[10] MPI startup(): shm and tmi data transfer modes
[15] MPI startup(): shm and tmi data transfer modes
[7] MPI startup(): shm and tmi data transfer modes
[22] MPI startup(): shm and tmi data transfer modes
[6] MPI startup(): shm and tmi data transfer modes
[18] MPI startup(): shm and tmi data transfer modes
[9] MPI startup(): shm and tmi data transfer modes
[17] MPI startup(): shm and tmi data transfer modes
[8] MPI startup(): shm and tmi data transfer modes
[16] MPI startup(): shm and tmi data transfer modes



[0] MPI startup(): Rank  Pid    Node name                 Pin cpu
Process 1 of 24 on k16n13a.ccr.buffalo.edu
Process 2 of 24 on k16n13a.ccr.buffalo.edu
Process 5 of 24 on k16n13a.ccr.buffalo.edu
Process 11 of 24 on k16n13a.ccr.buffalo.edu
...
Process 18 of 24 on k16n12b.ccr.buffalo.edu
Process 20 of 24 on k16n12b.ccr.buffalo.edu
Process 22 of 24 on k16n12b.ccr.buffalo.edu
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS_LIST=tmi,dapl,tcp
[0] MPI startup(): I_MPI_FALLBACK=enable
[0] MPI startup(): I_MPI_PIN_MAPPING=12:0 0,1 1,2 2,3 3,4 4,5 5,6 6,7 7,8 8,9 9,10 10,11 11
[0] MPI startup(): I_MPI_PLATFORM=auto
Process 0 of 24 on k16n13a.ccr.buffalo.edu
MPI Version: 2.1
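A minimal sketch of an adjusted resource request for the 12-core IB nodes (the IB node property shown here is an assumption for illustration; check the current CCR documentation for the correct node properties and core counts):

#PBS -l nodes=2:IB:ppn=12     # 2 InfiniBand nodes, all 12 cores per node
#PBS -l walltime=00:10:00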


Intel MPI on TCP/IP


You can force Intel MPI to run over TCP/IP (or a combination of TCP/IP and shared memory, as in the example below) by setting the I_MPI_DEVICE variable (or, equivalently, I_MPI_FABRICS_LIST):
export I_MPI_DEBUG=5
export I_MPI_DEVICE=ssm   # tcp/ip between nodes, shared memory within a node


Job 2950135.d15n41.ccr.buffalo.edu has requested 2 cores/processors per node.
running mpdallexit on f09n35
LAUNCHED mpd on f09n35 via
RUNNING: mpd on f09n35
LAUNCHED mpd on f09n34 via f09n35
RUNNING: mpd on f09n34
f09n35
f09n34
[3] MPI startup(): shared memory and socket data transfer modes
[0] MPI startup(): shared memory and socket data transfer modes
[2] MPI startup(): shared memory and socket data transfer modes
[1] MPI startup(): shared memory and socket data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[2] MPI startup(): shm and tcp data transfer modes
[3] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): shm and tcp data transfer modes


Process 2 of 4 on f09n34.ccr.buffalo.edu
Process 1 of 4 on f09n35.ccr.buffalo.edu
Process 3 of 4 on f09n34.ccr.buffalo.edu
[0] MPI startup(): Rank  Pid    Node name                 Pin cpu
[0] MPI startup(): 0     30385  f09n35.ccr.buffalo.edu    0
[0] MPI startup(): 1     30384  f09n35.ccr.buffalo.edu    1
[0] MPI startup(): 2     32261  f09n34.ccr.buffalo.edu    0
[0] MPI startup(): 3     32260  f09n34.ccr.buffalo.edu    1
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_DEVICE=ssm
[0] MPI startup(): I_MPI_FABRICS_LIST=dapl,tcp
[0] MPI startup(): I_MPI_FALLBACK=enable
[0] MPI startup(): I_MPI_PLATFORM=auto
[0] MPI startup(): MPICH_INTERFACE_HOSTNAME=10.106.9.35
Process 0 of 4 on f09n35.ccr.buffalo.edu
MPI Version: 2.1


Intel MPI Summary


Intel MPI has some real advantages:

- Multi-protocol support within the same build; by default it gives you the "best" network, but it also gives you the flexibility to choose your protocol
- CPU/memory affinity settings
- Multiple compiler support (wrappers for the GNU compilers, mpicc, mpicxx, mpif90, as well as the Intel compilers, mpiicc, mpiicpc, mpiifort); see the sketch below
- (Relatively) simple integration with Intel MKL, including ScaLAPACK
- Reference manual: on the CCR systems look at $INTEL_MPI/doc/Reference_Manual.pdf for a copy of the reference manual (after loading the module)
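A quick sketch of the two wrapper families (the Fortran source file name is just a placeholder, not from this document):

mpiicc   -o hello.impi       hello.c      # Intel icc underneath
mpiifort -o fhello.impi      fhello.f90   # Intel ifort underneath
mpicc    -o hello.impi.gcc   hello.c      # GNU gcc underneath
mpif90   -o fhello.impi.gnu  fhello.f90   # GNU gfortran underneath
# If your version supports the MPICH-style -show option, it prints the underlying compile line:
mpiicc -show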


Whither Goest Thou, MPI?

MPI processes - things to keep in mind:

- You can over-subscribe the processors if you want, but that is going to under-perform (though it is often useful for debugging; see the sketch below). Note that batch queuing systems (like those at CCR) may not let you easily over-subscribe the number of available processors
- Better MPI implementations will give you more options for the placement of MPI tasks (often through so-called "affinity" options, either for CPU or memory)
- You typically want a 1-to-1 mapping of MPI processes to available processors (cores), but there are times when that may not be desirable
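For example, a hedged debugging sketch that over-subscribes on purpose (assuming a 4-core allocation; useful for flushing out logic or deadlock bugs, not for timing):

# 16 MPI tasks on only 4 allocated cores - expect it to run, but slowly
mpirun -np 16 ./hello.impi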


Affinity

Intel MPI has options for associating MPI tasks with cores, better known as CPU-process affinity:

- I_MPI_PIN, I_MPI_PIN_MODE, I_MPI_PIN_PROCESSOR_LIST, and I_MPI_PIN_DOMAIN in the current version of Intel MPI (it never hurts to check the documentation for the version that you are using; these options have a tendency to change)
- You can specify the core list on which to run MPI tasks, as well as domains of cores for hybrid MPI-OpenMP applications; see the sketch below
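A minimal sketch of what these settings might look like in a job script (the values are illustrative; consult the Reference Manual for the Intel MPI version you have loaded):

export I_MPI_PIN=1                     # enable process pinning
export I_MPI_PIN_PROCESSOR_LIST=0-3    # pin the MPI tasks to cores 0-3
# or, for a hybrid MPI-OpenMP run, give each task a domain of cores:
export I_MPI_PIN_DOMAIN=omp            # domain size follows OMP_NUM_THREADS
export OMP_NUM_THREADS=4
mpirun -np $NPROCS ./hello.impi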


Summary - MPI at CCR

- Use the modules environment manager to choose your MPI flavor
- I recommend Intel MPI on the clusters, unless you need access to the source code for the implementation itself. It has a lot of nice features and is quite flexible.
- Be careful with task launching - use mpiexec whenever possible
- Ensure that your MPI processes end up where you want - use ps and top to check (also use MPI_Get_processor_name in your code; a sketch follows below). Also use the CCR ccrjobviz.pl job visualizer utility to quickly scan for expected task placement and performance issues.
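A hedged sketch of a quick placement check from the front end (f09n35 is just an example node name taken from the earlier output; the ps field psr reports the core a process last ran on):

ssh f09n35 'ps -u $USER -o pid,psr,comm | grep hello'   # one line per MPI task, with its core
ssh -t f09n35 top                                       # or watch per-core load interactively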
