Objectives
At the end of this module you should be able to:
Define parallel computing
[Figure: processor clock speeds climbing from roughly 1 MHz toward 1 GHz over the years 1979-2011]
Execution Optimization
Popular optimizations to increase CPU speed
Instruction prefetching
Instruction reordering
Pipelined functional units
Branch prediction
Functional unit allocation
Register allocation
Hyperthreading
Added sophistication means more silicon devoted to control hardware
Multi-core Architectures
Potential performance = CPU speed × number of CPUs
Strategy:
- Limit CPU speed and sophistication
- Put multiple CPUs ("cores") on a single chip
[Figure: per-CPU speed goes down, the number of CPUs goes up, and the potential performance stays the same]
A vicious cycle:
Parallel computing is not mainstream →
parallel programming environments are inadequate →
parallel programming is difficult →
parallel computing stays out of the mainstream

A virtuous cycle:
Everyone has a parallel computer →
parallel programming is considered mainstream →
parallel programming environments improve →
parallel programming gets easier
Recognizing Potential Parallelism
Methodology
Study problem, sequential program, or code segment
Look for opportunities for parallelism
Try to keep all processors busy doing useful work
Exploiting parallelism: domain decomposition, task decomposition, and pipelining (each covered below)
Domain Decomposition
First, decide how data elements should be divided
among processors
Second, decide which tasks each processor should be
doing
Example: Vector addition
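The vector-addition example can be sketched in C as a purely sequential simulation of the decomposition: the outer loop stands in for the processors, and the block arithmetic shows which contiguous indices each one owns. The function name and the block-partitioning scheme are illustrative choices, not something the deck specifies.

```c
#include <stddef.h>

/* Domain decomposition of vector addition: each of `ncpus` (simulated)
   processors owns one contiguous block of the index range and performs
   the same task -- c[i] = a[i] + b[i] -- on its own block. */
void vector_add_blocks(const double *a, const double *b, double *c,
                       size_t n, size_t ncpus)
{
    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        /* Block boundaries: the first n % ncpus blocks get one extra element. */
        size_t base = n / ncpus, rem = n % ncpus;
        size_t lo = cpu * base + (cpu < rem ? cpu : rem);
        size_t hi = lo + base + (cpu < rem ? 1 : 0);
        for (size_t i = lo; i < hi; i++)   /* work done by this "CPU" */
            c[i] = a[i] + b[i];
    }
}
```

With real threads or MPI ranks, each block's inner loop would run concurrently; since the blocks are disjoint, no synchronization is needed during the additions.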
Domain Decomposition
Example: find the largest element of an array
[Figure: animation — the array is divided into four contiguous blocks, one per CPU (CPU 0 through CPU 3), and each CPU steps through its own block]
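The largest-element example is a reduction: each CPU finds a local maximum over its block, and the local maxima are then combined. A minimal sequential sketch of that two-phase structure (function name and partitioning scheme assumed, not from the deck):

```c
#include <stddef.h>

/* Domain decomposition for a reduction: each simulated CPU scans its own
   block of the array for a local maximum (phase 1), then the local maxima
   are combined into the global maximum (phase 2). */
int array_max_blocks(const int *a, size_t n, size_t ncpus)
{
    int global_max = a[0];
    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        size_t lo = cpu * n / ncpus;        /* this CPU's block: [lo, hi) */
        size_t hi = (cpu + 1) * n / ncpus;
        if (lo == hi) continue;             /* more CPUs than elements */
        int local_max = a[lo];              /* phase 1: local reduction */
        for (size_t i = lo + 1; i < hi; i++)
            if (a[i] > local_max) local_max = a[i];
        if (local_max > global_max)         /* phase 2: combine */
            global_max = local_max;
    }
    return global_max;
}
```

In a genuinely parallel version, phase 1 runs concurrently with no communication, and only phase 2 requires coordination among processors.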
Task Decomposition
[Figure: a dependence graph of tasks f(), g(), h(), r(), q(), s()]
[Figure: animation — the tasks are assigned to three CPUs: CPU 0 runs f(); CPU 1 runs g() and h(); CPU 2 runs r(), q(), and s()]
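The task assignment above can be sketched as follows. The tasks f through s are hypothetical stand-ins for the deck's diagram (here they just record that they ran), and their dependence structure is not shown; the point is only that different *functions*, rather than different data blocks, are mapped to processors.

```c
#include <string.h>

/* Task decomposition: functions, not data blocks, are divided among CPUs. */
static char trace[16];
static size_t pos;
static void run(char name) { trace[pos++] = name; }

/* Hypothetical tasks mirroring the labels in the deck's diagram. */
static void f(void) { run('f'); }   static void g(void) { run('g'); }
static void h(void) { run('h'); }   static void r(void) { run('r'); }
static void q(void) { run('q'); }   static void s(void) { run('s'); }

const char *task_decomposition_demo(void)
{
    pos = 0;
    memset(trace, 0, sizeof trace);
    /* Each row is one simulated CPU's work list, executed here in turn;
       with real threads the three rows would run concurrently. */
    f();                  /* CPU 0 */
    g(); h();             /* CPU 1 */
    r(); q(); s();        /* CPU 2 */
    return trace;
}
```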
Pipelining
- A special kind of task decomposition
- "Assembly line" parallelism
- Example: a 3-stage graphics pipeline — Input → Project → Clip → Rasterize → Output
[Figure: animation — with CPU 1 projecting, CPU 2 clipping, and CPU 3 rasterizing, data sets 0 through 4 flow through the pipeline, each advancing one stage per time step]
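The timing behavior in the figure can be worked out with a small simulation: once the pipeline is full, one finished data set emerges per time step, so n data sets through p stages take n + p − 1 steps instead of the sequential n × p. The function below is an illustrative sketch, not code from the deck.

```c
/* Pipelining: simulate n data sets flowing through p stages, where data
   set d occupies stage s at time t = d + s (0-based).  Count the time
   steps until the last data set leaves the last stage. */
int pipeline_steps(int n_datasets, int n_stages)
{
    int steps = 0;
    for (int t = 0; ; t++) {
        int busy = 0;
        for (int s = 0; s < n_stages; s++) {
            int d = t - s;                 /* data set in stage s at time t */
            if (d >= 0 && d < n_datasets)
                busy = 1;
        }
        if (!busy)                         /* pipeline has drained */
            break;
        steps++;
    }
    return steps;
}
```

For the figure's 5 data sets and 3 stages this gives 5 + 3 − 1 = 7 steps, versus 15 for one CPU doing all three stages on each data set in turn.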
Dependence Graph
Graph = (nodes, arrows)
Node for each:
- Variable assignment (except index variables)
- Constant
Arrow for each:
- Control flow / data dependence between nodes
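A small loop makes the idea concrete. The arrays and loop below are hypothetical, chosen to mirror the b[i]/a[i] figures that follow; the comments describe its dependence graph.

```c
/* A loop whose dependence graph forbids domain decomposition.
   Nodes: one per assignment a[1]..a[n-1] (the index variable i is
   excluded) plus the constants/inputs a[0] and b[1]..b[n-1].
   Arrows: a[i-1] -> a[i] and b[i] -> a[i] (data dependences).
   The chain a[0] -> a[1] -> a[2] -> ... means the iterations
   cannot be split among processors. */
void prefix_sum_chain(const int *b, int *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```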
[Figure: dependence graph over b[1], b[2] and a[0], a[1], a[2]; the assignments to the a[i] are mutually independent, so domain decomposition is possible]
[Figure: dependence graph over b[1]-b[3] and a[1]-a[3]; dependences among the a[i] mean no domain decomposition is possible]
[Figure: a dependence graph (nodes include a division and a variable s) partitioned for task decomposition with 3 CPUs]
[Figure: dependence graph over a[0]-a[2], partitioned by domain decomposition]
[Figure: dependence graph with comparison nodes (a[0] < a[1], a[1] < a[2]) over a[0]-a[2]]
Dense matrices
Sparse matrices
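The dense/sparse contrast can be sketched with matrix-vector multiply, a common kernel in this setting. A dense matrix stores every entry; a sparse matrix in compressed sparse row (CSR) form stores only the nonzeros, their column indices, and row boundaries. CSR is the standard scheme, assumed here rather than taken from the deck.

```c
#include <stddef.h>

/* Dense storage: all rows*cols entries, laid out row-major. */
void dense_matvec(const double *m, const double *x, double *y,
                  size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++) {
        y[i] = 0.0;
        for (size_t j = 0; j < cols; j++)
            y[i] += m[i * cols + j] * x[j];
    }
}

/* CSR storage: vals holds the nonzeros, colidx their columns, and
   row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros. */
void csr_matvec(const double *vals, const size_t *colidx,
                const size_t *row_ptr, const double *x, double *y,
                size_t rows)
{
    for (size_t i = 0; i < rows; i++) {
        y[i] = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            y[i] += vals[k] * x[colidx[k]];   /* only nonzeros touched */
    }
}
```

Either kernel domain-decomposes naturally by rows, since each y[i] is computed independently; the sparse case simply gives uneven per-row work.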
References
- Richard H. Carver and Kuo-Chung Tai, Modern Multithreading: Implementing, Testing, and Debugging Java and C++/Pthreads/Win32 Programs, Wiley-Interscience (2006).
- Robert L. Mitchell, "Decline of the Desktop," Computerworld (September 26, 2005).
- Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).
- Herb Sutter, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software," Dr. Dobb's Journal 30(3) (March 2005).