
Parallel Algorithms

CS170
Fall 2016

Parallel computation is here!


During your career, Moore's Law will
probably slow down a lot (possibly to
a grinding halt)
Google's engine (reportedly) has
about 900,000 processors (recall
Map-Reduce)
The fastest supercomputers have >
10^7 cores and 10^17-10^18 flops
So, in an Algorithms course we
must at least mention parallel algorithms

This lecture
What are parallel algorithms, and
how do they differ from (sequential)
algorithms?
What are the important performance
criteria for parallel algorithms?
What are the basic tricks?
What does the landscape look like?
Sketches of two sophisticated
parallel algorithms: MST and
connected components

Parallel Algorithms
need a completely new mindset!!
In sequential algorithms:
We care about Time
Acceptable: O(n), O(n log n), O(n^2),
O(|E||V|^2)
Polynomial time
Unacceptable: Exponential time 2^n
Sometimes the unacceptable is all that is
possible: NP-complete problems
How about in parallel algorithms?

To start, what is a parallel algorithm? What kinds
of computers will it run on?

PRAM: P processors (RAMs), all driven by the same clock
(synchronous), connected to one shared memory.
[Figure: P RAM processors attached to a shared memory]
Q: How about memory congestion?
A: OK to Read concurrently, not OK to Write concurrently:
CREW PRAM (Concurrent Read, Exclusive Write)

Language?
Threads in Java, Python, etc.
Parallel languages facilitate parallel
programming through syntax
(parbegin/parend)
In our pseudocode:
Instead of
for every edge (u,v) in E do
we may say
for every edge (u,v) in E do in parallel
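
For concreteness, a minimal Python sketch of how the "do in parallel" idiom
can be expressed with a thread pool (process_edge and edges are hypothetical
names used only for this sketch, not part of the lecture's pseudocode):

from concurrent.futures import ThreadPoolExecutor

def process_edge(edge):
    # per-edge work goes here; calls may run on different threads
    u, v = edge
    return (u, v)

edges = [(0, 1), (1, 2), (2, 0)]                  # toy edge list
with ThreadPoolExecutor() as pool:
    # "for every edge (u,v) in E do in parallel"
    results = list(pool.map(process_edge, edges))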

And what do we care about?


Two things:
Work = the total number of
instructions executed by all
processors
Depth = clock time in parallel
execution

And what is acceptable?


Polynomial work
Depth?
O(log n) -- or maybe O((log n)^2), etc.
Q: But how many processors? P = ?
A: Pretend to have as many as you
want!
Saturate the problem with processors!

The reason: Brent's Principle

If you can solve a problem with depth
D and work W with as many
processors as you want,
then you can also solve it with P
processors with work O(W) and depth
D' = D + W/P

Proof
Suppose parallel step t of the many-processor execution
performs w_t work, so the total work is W = sum over t of w_t
and the number of steps is D.
With our P processors we can simulate step t
in ceil(w_t / P) <= w_t / P + 1 steps.
Adding over all t = 1, ..., D, the simulation has depth
D' <= D + W/P (and does the same work, O(W))

To recap: Brent's Principle

If a problem can be solved in parallel
with work W, depth D, and as many
processors as we like, then it can be solved
with P processors,
the same work W' = O(W),
and depth D' = D + W/P
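
For example (a quick sanity check, using the sum problem coming up next):
with unboundedly many processors, summing n numbers takes work W = O(n) and
depth D = O(log n); Brent's Principle then says P processors achieve depth
O(log n + n/P), so P = n/log n processors already suffice to keep the depth
at O(log n) while the work stays O(n).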

Toy Problem: Sum


Sequential algorithm
sum = 0;
for i = 1 to n do sum = sum + A[i]
return sum
O(n) time
In parallel?

function sum(A[1..n])
If n = 1 return A[1]
for i = 1,...,n/2 do in parallel
A[i] = A[2i-1] + A[2i]
return sum(A[1..n/2])

Example: 2 3 1 7 5 4 8 6
          5   8   9  14
           13      23
               36

Work? Time?
W(n) = W(n/2) + n/2 = O(n)
D(n) = D(n/2) + 2 = O(log n)
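
A minimal Python sketch of this parallel sum, simulated sequentially; each
pass of the while loop models one parallel round of the pseudocode above:

def parallel_sum(A):
    A = list(A)
    while len(A) > 1:
        if len(A) % 2:                              # pad odd lengths with a zero
            A.append(0)
        # one parallel round: A[i] = A[2i-1] + A[2i] (1-based), for all i at once
        A = [A[2*i] + A[2*i + 1] for i in range(len(A) // 2)]
    return A[0]

print(parallel_sum([2, 3, 1, 7, 5, 4, 8, 6]))       # 36, matching the example above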

Work efficient = same work as the best
sequential algorithm
Depth log n (as little as possible)
Important: prefix sums (all partial sums): sums[j] = A[1] + ... + A[j]
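
A minimal Python sketch of one standard recursive scheme for prefix sums
(a sketch, not necessarily the exact scheme intended here), simulated
sequentially; each comprehension or loop models one parallel round, giving
W(n) = O(n) and D(n) = O(log n). It assumes n is a power of 2.

def prefix_sums(A):
    n = len(A)
    if n == 1:
        return list(A)
    # one parallel round: sum adjacent pairs
    B = [A[2*i] + A[2*i + 1] for i in range(n // 2)]
    T = prefix_sums(B)                              # half-size subproblem
    # one parallel round: expand the half-size answer back to size n
    S = [0] * n
    for i in range(n // 2):
        S[2*i + 1] = T[i]                           # odd positions are exactly T
        S[2*i] = T[i] - A[2*i + 1]                  # even positions: drop the partner
    return S

print(prefix_sums([2, 3, 1, 7, 5, 4, 8, 6]))        # [2, 5, 6, 13, 18, 22, 30, 36]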

Another toy problem: compact

Given array
2 0 0 1 0 4 3 0 0 0 6 0 0 0 0 1
make it into
2 1 4 3 6 1
Also work efficient
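
One way compact can be done work-efficiently is with prefix sums: mark the
nonzero positions, prefix-sum the marks to learn where each kept element
lands, then scatter. A short Python sketch reusing the prefix_sums function
from the sketch above:

def compact(A):
    flags = [1 if x != 0 else 0 for x in A]         # parallel round: mark the kept elements
    pos = prefix_sums(flags)                        # pos[i] = number of kept elements in A[0..i]
    out = [0] * pos[-1]
    for i in range(len(A)):                         # parallel round: scatter into place
        if flags[i]:
            out[pos[i] - 1] = A[i]
    return out

print(compact([2,0,0,1,0,4,3,0,0,0,6,0,0,0,0,1]))   # [2, 1, 4, 3, 6, 1]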

Another Basic Problem: Find-Root

Solution: pointer jumping

Repeat log n times:
for every node v do in parallel
if next[v] ≠ v then next[v] = next[next[v]]
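
A minimal Python sketch of pointer jumping, simulated sequentially; each list
comprehension models one synchronous parallel round in which every node jumps
to its grandparent, and ceil(log2 n) rounds suffice:

import math

def find_roots(next_):
    n = len(next_)
    nxt = list(next_)
    for _ in range(math.ceil(math.log2(max(n, 2)))):
        # one parallel round: all nodes read the old pointers, then jump
        nxt = [nxt[nxt[v]] for v in range(n)]
    return nxt                                      # nxt[v] is now v's root

# toy chain 3 -> 2 -> 1 -> 0, where node 0 points to itself
print(find_roots([0, 0, 1, 2]))                     # [0, 0, 0, 0]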

The parallel algorithm landscape

These are some of the very basic
tricks of parallel algorithm design
(like divide and conquer or
greedy in algorithm design)
There are a couple of others
They go a long way, but not all the
way
So, what happens to the problems
we learned how to solve sequentially
in CS170?

Matrix multiplication (recall sum)
Merge sort (redesign, pquicksort, radixsort)
FFT (begging)
Connected components (redesign)
DFS/SCC (redesign)
Shortest path (redesign)
MST (redesign)
LP, HornSAT (impossible, P-complete)
Huffman (redesign)

Hackattack (embarrassing parallelism):
for i = 1 to n do in parallel
    check if k[i] is the secret key

MST
Prim?
Applies the cut property only to the component
that contains the start vertex s: sequential
Kruskal?
Goes through the edges in sorted order: sequential

Borůvka's Algorithm (1926)

(applies the cut property to all components at
once)
T = empty (the MST under construction)
C (list of the ccs of T) = [{1}, {2}, ..., {n}]
while |C| > 1
    for each c in C do
        find the shortest edge out of c
        add it to T
    C = connected components of T
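
For concreteness, a purely sequential Python sketch of this loop (assuming
distinct edge weights and vertices numbered 0..n-1; a union-find structure
stands in for the component list C):

def boruvka(n, edges):                              # edges: list of (weight, u, v)
    parent = list(range(n))                         # union-find over the components of T

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]           # path halving
            x = parent[x]
        return x

    T, num_components = [], n
    while num_components > 1:
        best = {}                                   # shortest edge out of each component
        for w, u, v in edges:
            cu, cv = find(u), find(v)
            if cu == cv:
                continue
            for c in (cu, cv):
                if c not in best or w < best[c][0]:
                    best[c] = (w, u, v)
        for w, u, v in set(best.values()):
            cu, cv = find(u), find(v)
            if cu != cv:
                parent[cu] = cv                     # merge the two components
                T.append((u, v))
                num_components -= 1
    return T

# toy example: 4-cycle with a chord; the MST is the three lightest edges
print(boruvka(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]))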

Little problem: with ties in edge weights, the edges chosen by the
components can form a cycle (in the figure, a cycle of edges all of weight 3).
Solution: perturb the weights to make them distinct
(e.g., 3 -> 3.001, 3.003, 3.007), or break edge ties lexicographically.

Borůvka's Algorithm: O(|E| log |V|)

T = empty
C (list of the ccs of T) = [{1}, {2}, ..., {n}]
while |C| > 1                              (log |V| stages: each stage at least halves |C|)
    for each c in C do
        find the shortest edge out of c    (O(|E|) work per stage)
        and add it to T
    C = connected components of T

Borůvka's Algorithm in parallel?

T = empty
C (list of the ccs of T) = [{1}, {2}, ..., {n}]
while |C| > 1
    for each c in C do in parallel
        find the shortest edge out of c    (W = |E|, D = log |V|)
        add it to T
    C = connected components of T          (W = |E| log |V|, D = log |V|)
Total: W = O(|E| log^2 |V|), D = O(log^2 |V|)

Borůvka's Algorithm in parallel?

T = empty
C (list of the ccs of T) = [{1}, {2}, ..., {n}]
while |C| > 1
    for each c in C do in parallel
        find the shortest edge out of c    How???
        add it to T
    C = connected components of T          How???
Total: W = O(|E| log^2 |V|), D = O(log^2 |V|)

Connected Components
function cc(V, E) returns array[V] of V
    initialize: for every node v do in parallel:
        leader[v] = fifty-fifty(), ptr[v] = v
    for every non-leader node v do in parallel:
        choose an adjacent leader node u, if one exists,
        and set ptr[v] = u
    (ptr is now a bunch of stars)
    V' = {v: ptr[v] = v} (the roots of the stars)
    E' = {(u, v): u ≠ v in V', there is (a, b) in E such that
          ptr[a] = u and ptr[b] = v} (contract the graph)
    label[] = cc(V', E') (compute cc recursively on
          the contracted graph)
    return, for every node v, cc[v] = label[ptr[v]]
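
A minimal Python sketch of cc, simulated sequentially with dictionaries; the
loop over E stands in for the parallel choice of an adjacent leader, and the
fifty-fifty coin flips give O(log n) rounds of contraction in expectation:

import random

def cc(V, E):                                       # V: set of nodes, E: set of edges (u, v)
    if not E:
        return {v: v for v in V}                    # no edges left: every root labels itself
    leader = {v: random.random() < 0.5 for v in V}  # fifty-fifty()
    ptr = {v: v for v in V}
    for u, v in E:                                  # non-leaders grab some adjacent leader
        if not leader[u] and leader[v]:
            ptr[u] = v
        if not leader[v] and leader[u]:
            ptr[v] = u
    V2 = {v for v in V if ptr[v] == v}              # the roots of the stars
    E2 = {(ptr[a], ptr[b]) for a, b in E if ptr[a] != ptr[b]}   # contract the graph
    label = cc(V2, E2)                              # recurse on the contracted graph
    return {v: label[ptr[v]] for v in V}

# two components: {1, 2, 3} and {4, 5}; labels are representative nodes
print(cc({1, 2, 3, 4, 5}, {(1, 2), (2, 3), (4, 5)}))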
