Parallel Programming
in OpenMP
Rohit Chandra
Leonardo Dagum
Dave Kohr
Dror Maydan
Jeff McDonald
Ramesh Menon

About the Authors
Rohit Chandra is a chief scientist at NARUS, Inc., a provider of internet
business infrastructure solutions. He previously was a principal engineer
in the Compiler Group at Silicon Graphics, where he helped design and
implement OpenMP.
Leonardo Dagum works for Silicon Graphics in the Linux Server Platform
Group, where he is responsible for the I/O infrastructure in SGI's scalable
Linux server systems. He helped define the OpenMP Fortran API. His
research interests include parallel algorithms and performance modeling
for parallel systems.
Dave Kohr is a member of the technical staff at NARUS, Inc. He previ-
ously was a member of the technical staff in the Compiler Group at Silicon
Graphics, where he helped define and implement OpenMP.
Dror Maydan is director of software at Tensilica, Inc., a provider of
application-specific processor technology. He previously was an engineering
department manager in the Compiler Group of Silicon Graphics, where he
helped design and implement OpenMP.
Jeff McDonald owns SolidFX, a private software development company.
As the engineering department manager at Silicon Graphics, he proposed
the OpenMP API effort and helped develop it into the industry standard it
is today.
Ramesh Menon is a staff engineer at NARUS, Inc. Prior to NARUS,
Ramesh was a staff engineer at SGI, representing SGI in the OpenMP
forum. He was the founding chairman of the OpenMP Architecture Review
Board (ARB) and supervised the writing of the first OpenMP specifications.

Foreword
by John L. Hennessy
President, Stanford University
FOR A NUMBER OF YEARS, I have believed that advances in
software, rather than hardware, held the key to making parallel computing
more commonplace. In particular, the lack of a broadly supported standard
for programming shared-memory multiprocessors has been a chasm both
for users and for software vendors interested in porting their software to
these multiprocessors. OpenMP represents the first vendor-independent,
commercial “bridge” across this chasm.
Such a bridge is critical to achieve portability across different shared-
memory multiprocessors. In the parallel programming world, the challenge
is to obtain both this functional portability and performance
portability. By performance portability, I mean the ability to have reasonable
expectations about how parallel applications will perform on different
multiprocessor architectures. OpenMP makes important strides in enhanc-
ing performance portability among shared-memory architectures.
Parallel computing is attractive because it offers users the potential of
higher performance. The central problem in parallel computing for nearly
20 years has been to improve the “gain to pain ratio.” Improving this ratio,
with either hardware or software, means making the gains in performance
come at less pain to the programmer! Shared-memory multiprocessing
was developed with this goal in mind. It provides a familiar programming
model, allows parallel applications to be developed incrementally, and
supports fine-grain communication in a very cost-effective manner. All of
these factors make it easier to achieve high performance on parallel
machines. More recently, the development of cache-coherent distributed
shared memory has provided a method for scaling shared-memory archi-
tectures to larger numbers of processors. In many ways, this development
removed the hardware barrier to scalable, shared-memory multiprocess-
ing.
OpenMP represents the important step of providing a software stan-
dard for these shared-memory multiprocessors. Our goal now must be to
learn how to program these machines effectively (i.e., with a high value
for gain/pain). This book will help users accomplish this important goal.
By focusing their attention on how to use OpenMP, rather than on defining
the standard, the authors have made a significant contribution to the
important task of mastering the programming of multiprocessors.

Contents
Foreword, by John L. Hennessy
Preface
Chapter 1 Introduction
Performance with OpenMP
A First Glimpse of OpenMP
The OpenMP Parallel Computer
Why OpenMP?
History of OpenMP
Navigating the Rest of the Book
Chapter 2 Getting Started with OpenMP
Introduction
OpenMP from 10,000 Meters
OpenMP Compiler Directives or Pragmas
Parallel Control Structures
Communication and Data Environment
Synchronization
Parallelizing a Simple Loop
Runtime Execution Model of an OpenMP Program
Communication and Data Scoping
Synchronization in the Simple Loop Example
Final Words on the Simple Loop Example
A More Complicated Loop
Explicit Synchronization
The reduction Clause
Expressing Parallelism with Parallel Regions
Concluding Remarks
Exercises
Chapter 3 Exploiting Loop-Level Parallelism
Introduction
Form and Usage of the parallel do Directive
Clauses
Restrictions on Parallel Loops
Meaning of the parallel do Directive
Loop Nests and Parallelism
Controlling Data Sharing
General Properties of Data Scope Clauses
The shared Clause
The private Clause
Default Variable Scopes
Changing Default Scoping Rules
Parallelizing Reduction Operations
Private Variable Initialization and Finalization
Removing Data Dependences
Why Data Dependences Are a Problem
The First Step: Detection
The Second Step: Classification
The Third Step: Removal
Summary
Enhancing Performance
Ensuring Sufficient Work
Scheduling Loops to Balance the Load
Static and Dynamic Scheduling
Scheduling Options
Comparison of Runtime Scheduling Behavior
Concluding Remarks
Exercises
Chapter 4 Beyond Loop-Level Parallelism: Parallel Regions
Introduction
Form and Usage of the parallel Directive
Clauses on the parallel Directive
Restrictions on the parallel Directive
Meaning of the parallel Directive
Parallel Regions and SPMD-Style Parallelism
threadprivate Variables and the copyin Clause
The threadprivate Directive
The copyin Clause
Work-Sharing in Parallel Regions
A Parallel Task Queue
Dividing Work Based on Thread Number
Work-Sharing Constructs in OpenMP
Restrictions on Work-Sharing Constructs
Block Structure
Entry and Exit
Nesting of Work-Sharing Constructs
Orphaning of Work-Sharing Constructs
Data Scoping of Orphaned Constructs
Writing Code with Orphaned Work-Sharing Constructs
Nested Parallel Regions
Directive Nesting and Binding
Controlling Parallelism in an OpenMP Program
Dynamically Disabling the parallel Directives
Controlling the Number of Threads
Dynamic Threads
Runtime Library Calls and Environment Variables
Concluding Remarks
Exercises
Chapter 5 Synchronization
Introduction
Data Conflicts and the Need for Synchronization
Getting Rid of Data Races
Examples of Acceptable Data Races
Synchronization Mechanisms in OpenMP
Mutual Exclusion Synchronization
The Critical Section Directive
The atomic Directive
Runtime Library Lock Routines
Event Synchronization
Barriers
Ordered Sections
The master Directive
Custom Synchronization
The flush Directive
Rolling Your Own
Some Practical Considerations
Concluding Remarks
Exercises
Chapter 6 Performance
Introduction
Key Factors That Impact Performance
Coverage and Granularity
Load Balance
Locality
Synchronization
Performance-Tuning Methodology
Dynamic Threads
Bus-Based and NUMA Machines
Concluding Remarks
Exercises
Appendix A A Quick Reference to OpenMP
References
Index

Preface
OPENMP IS A PARALLEL PROGRAMMING MODEL for shared
memory and distributed shared memory multiprocessors. Pioneered by
SGI and developed in collaboration with other parallel computer vendors,
OpenMP is fast becoming the de facto standard for parallelizing applica-
tions. There is an independent OpenMP organization today with most of
the major computer manufacturers on its board, including Compaq,
Hewlett-Packard, Intel, IBM, Kuck & Associates (KAI), SGI, Sun, and the
U.S. Department of Energy ASCI Program. The OpenMP effort has also
been endorsed by over 15 software vendors and application developers,
reflecting the broad industry support for the OpenMP standard.
Unfortunately, the main information available about OpenMP is the
OpenMP specification (available from the OpenMP Web site at
www.openmp.org). Although this is appropriate as a formal and complete
specification, it is not a very accessible format for programmers wishing to use
OpenMP for developing parallel applications. This book tries to fulfill the
needs of these programmers.
This introductory-level book is primarily designed for application
developers interested in enhancing the performance of their applications
by utilizing multiple processors. The book emphasizes practical concepts
and tries to address the concerns of real application developers. Little
background is assumed of the reader other than single-processor program-
ming experience and the ability to follow simple program examples in the