Eric Aubanel
Advanced Computational Research Laboratory, Faculty of Computer Science, UNB, Fredericton, New Brunswick
Shared Memory
[Figure: several processes sharing a single address space]
Parallel execution is achieved by generating multiple threads which execute in parallel.
The number of threads is (in principle) independent of the number of processors.
Because they are lightweight, they are (relatively) inexpensive to create and destroy.
Creation of a thread can take three orders of magnitude less time than process creation!
Threads can be created and assigned to multiple processors: This is the basis of SMP parallelism!
[Figure: a process has its own code, heap, and a stack with an instruction pointer (IP); threads within a process share the code and heap, but each thread has its own stack and IP]
OpenMP
1997: a group of hardware and software vendors announced their support for OpenMP, a new API for multi-platform shared-memory programming (SMP) on UNIX and Microsoft Windows NT platforms.
www.openmp.org
OpenMP provides compiler directives, embedded in C/C++ or Fortran source code, for:
scoping data
dividing the work load
synchronization of threads
OpenMP example
      subroutine saxpy(z, a, x, y, n)
      integer i, n
      real z(n), a, x(n), y(n)
!$omp parallel do
      do i = 1, n
         z(i) = a * x(i) + y(i)
      end do
      return
      end
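For comparison, a minimal C sketch of the same saxpy computation (not from the original slides; compile with an OpenMP-enabled compiler, e.g. with -fopenmp):

    /* z = a*x + y; the directive divides the loop iterations among threads */
    void saxpy(float *z, float a, const float *x, const float *y, int n)
    {
        int i;
    #pragma omp parallel for
        for (i = 0; i < n; i++)
            z[i] = a * x[i] + y[i];
    }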
OpenMP Threads
1. All OpenMP programs begin as a single process: the master thread.
2. FORK: the master thread then creates a team of parallel threads.
3. Statements in the parallel region are executed in parallel by the team threads.
4. JOIN: the threads synchronize and terminate, leaving only the master thread.
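A minimal C sketch of this fork-join pattern (the order of the threads' output is nondeterministic):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        printf("master thread alone\n");      /* 1: single initial thread */
    #pragma omp parallel                       /* 2: FORK a team of threads */
        {
            /* 3: the parallel region body runs on every team thread */
            printf("hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }                                      /* 4: JOIN: team synchronizes, terminates */
        printf("master thread alone again\n");
        return 0;
    }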
In the saxpy example, the data resides in global shared memory and the loop executes in parallel. Each thread has a private copy of i; references to i within a thread are to its private copy.
Division of Work
n = 40, 4 threads
[Figure: with n = 40 and 4 threads, each thread executes the saxpy loop over its own chunk of iterations: i = 1-10, i = 11-20, i = 21-30, and i = 31-40; a, y, and n are in global shared memory, while each thread keeps its copy of i in local memory]
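A C sketch that makes the division visible (schedule(static) is assumed here to get the contiguous chunks shown above; the default split is implementation-defined):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int i, n = 40;
        omp_set_num_threads(4);                  /* request a team of 4 threads */
    #pragma omp parallel for schedule(static)    /* contiguous chunks: 1-10, 11-20, ... */
        for (i = 1; i <= n; i++)
            printf("thread %d handles i = %d\n", omp_get_thread_num(), i);
        return 0;
    }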
Variable Scoping
The most difficult part of shared-memory parallelization:
What memory is shared
What memory is private (i.e., each processor has its own copy)
How private memory is treated vis-à-vis the global address space
Variables are shared by default, except for the loop index in a parallel do.
This must mesh with the Fortran view of memory:
Global: shared by all routines
Local: local to a given routine
saved vs. non-saved variables (through the SAVE statement or -save option)
Free-form directive syntax:
!$OMP PARALLEL DO &
!$OMP PRIVATE(JMAX) &
!$OMP SHARED(A, B)
OpenMP in C
Same functionality as OpenMP for Fortran.
Differences in syntax:
#pragma omp for
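For example, the free-form Fortran directive shown earlier corresponds to a single pragma in C (jmax, a, and b are the same illustrative variables as before):

    #pragma omp parallel for private(jmax) shared(a, b)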
static variables declared within a parallel region are also shared
heap-allocated memory (malloc) is shared (but the pointer can be private)
automatic storage declared within a parallel region is private (i.e., on the stack)
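A C sketch of the three cases (illustrative only; assumes at most 100 threads):

    #include <stdlib.h>
    #include <omp.h>

    void storage_demo(void)
    {
        int *buf = malloc(100 * sizeof(int));  /* heap: one block, visible to all threads */
    #pragma omp parallel
        {
            static int counter = 0;        /* static: shared even when declared inside the region */
            int tid = omp_get_thread_num(); /* automatic: private, on each thread's stack */
            buf[tid] = tid;                 /* all threads write into the same shared heap block */
    #pragma omp atomic
            counter++;                      /* shared, so the update must be synchronized */
        }
        free(buf);
    }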
OpenMP Overhead
Overhead for parallelization is large (e.g., 8000 cycles for a parallel do over 16 processors of an SGI Origin 2000).
The size of the parallel work construct must be significant enough to overcome the overhead.
Rule of thumb: it takes 10 kFLOPS to amortize the overhead.
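OpenMP's if clause lets a program apply such a threshold at run time; in this sketch the cutoff value is only an illustrative guess, not a measured number:

    void saxpy_cond(float *z, float a, const float *x, const float *y, int n)
    {
        int i;
        /* fork a team only when the trip count is large enough to amortize
           the parallelization overhead; 10000 is an illustrative threshold */
    #pragma omp parallel for if(n > 10000)
        for (i = 0; i < n; i++)
            z[i] = a * x[i] + y[i];
    }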
OpenMP Use
How is OpenMP typically used? Usually to parallelize loops:
Find your most time-consuming loops.
Split them up between threads.
Better scaling can be obtained using OpenMP parallel regions, but this can be tricky!
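A sketch of why parallel regions can scale better: enclosing several loops in one region pays the fork-join cost once rather than once per loop, at the price of managing work sharing explicitly (arrays a and b are illustrative):

    void two_loops(float *a, float *b, int n)
    {
        int i;
    #pragma omp parallel private(i)     /* threads forked once for both loops */
        {
    #pragma omp for
            for (i = 0; i < n; i++)
                a[i] = 2.0f * a[i];
            /* implicit barrier between the loops unless nowait is added */
    #pragma omp for
            for (i = 0; i < n; i++)
                b[i] = a[i] + 1.0f;
        }                               /* single join at the end of the region */
    }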
OpenMP:
Small API based on compiler directives and a limited set of library routines
The same program can be used for sequential and parallel execution
Shared vs. private variables can cause confusion
Message passing (MPI), by comparison:
Portable to all platforms
Parallelization is all or nothing
Vast collection of library routines
Possible, but difficult, to use the same program for serial and parallel execution
Variables are local to each processor
References
Parallel Programming in OpenMP, by Chandra et al. (Morgan Kaufmann)
www.openmp.org
Multimedia tutorial at Boston University: scv.bu.edu/SCV/Tutorials/OpenMP/