
Shared Counters and Parallelism

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

A Shared Pool
public interface Pool {
  public void put(Object x);
  public Object remove();
}

Unordered set of objects


Put
Inserts an object; blocks if the pool is full

Remove
Removes & returns an object; blocks if the pool is empty
Art of Multiprocessor Programming 2

A Shared Pool
Put
Insert an item; block if full

Remove
Remove & return an item; block if empty

public interface Pool<T> {
  public void put(T x);
  public T remove();
}

Art of Multiprocessor Programming

Simple Locking Implementation


put

put

Art of Multiprocessor Programming

Simple Locking Implementation


put

put

Problem: hotspot contention


5

Simple Locking Implementation


put
Problem: sequential bottleneck

put

Problem: hotspot contention


6

Simple Locking Implementation


put
Problem: sequential bottleneck

put

Problem: hotspot contention → Solution: Queue Lock

Art of Multiprocessor Programming

Simple Locking Implementation


put
Problem: sequential bottleneck → Solution: ???

put

Problem: hotspot contention → Solution: Queue Lock

Art of Multiprocessor Programming
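For concreteness, here is a minimal sketch of such a single-lock pool (my own illustration, not the book's code; LockedPool is a hypothetical name, and it assumes the Pool<T> interface above): every put and remove synchronizes on the same object, which is exactly the sequential bottleneck and hot spot these slides complain about.

import java.util.ArrayDeque;
import java.util.Queue;

// Sketch only: one monitor guards everything, so all threads serialize here.
class LockedPool<T> implements Pool<T> {
  private final Queue<T> items = new ArrayDeque<>();
  private final int capacity;

  LockedPool(int capacity) { this.capacity = capacity; }

  public synchronized void put(T x) {
    while (items.size() == capacity) {      // block if full
      try { wait(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
    }
    items.add(x);
    notifyAll();                            // wake blocked removers
  }

  public synchronized T remove() {
    while (items.isEmpty()) {               // block if empty
      try { wait(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); return null; }
    }
    T x = items.remove();
    notifyAll();                            // wake blocked putters
    return x;
  }
}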

Counting Implementation
put
(figure: put and remove counters advancing over a cyclic array of slots: 19, 20, 21)

remove

Art of Multiprocessor Programming

Counting Implementation
put
(figure: put and remove counters advancing over a cyclic array of slots: 19, 20, 21)

remove

Only the counters are sequential


Art of Multiprocessor Programming 10

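The slide's idea can be sketched like this (my own illustration, not the book's code; it assumes a bounded pool that is never over-filled or over-drained, so each slot is matched to exactly one put and one remove): the only shared, sequential steps are the two getAndIncrement calls.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: tickets from the put/remove counters pick slots in a cyclic array.
class CountedPool<T> {
  private final AtomicReferenceArray<T> slots;
  private final AtomicInteger putTicket = new AtomicInteger();
  private final AtomicInteger removeTicket = new AtomicInteger();

  CountedPool(int capacity) { slots = new AtomicReferenceArray<>(capacity); }

  public void put(T x) {
    int i = putTicket.getAndIncrement() % slots.length();   // the only sequential step
    while (!slots.compareAndSet(i, null, x)) {}             // spin on my own slot, not on a global lock
  }

  public T remove() {
    int i = removeTicket.getAndIncrement() % slots.length();
    T x;
    while ((x = slots.getAndSet(i, null)) == null) {}       // spin until a producer fills my slot
    return x;
  }
}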

Shared Counter
(figure: threads concurrently taking values 0, 1, 2, 3 from a shared counter)

Art of Multiprocessor Programming

11

Shared Counter
No duplication

Art of Multiprocessor Programming

12

Shared Counter
No duplication, no omission

Art of Multiprocessor Programming

13

Shared Counter
No duplication, no omission

Art of Multiprocessor Programming

Not necessarily linearizable

14

Shared Counters
Can we build a shared counter with
Low memory contention, and Real parallelism?

Locking
Can use queue locks to reduce contention, but that's no help with the parallelism issue

Art of Multiprocessor Programming

15

Software Combining Tree


Contention: all spinning is local

Parallelism: Potential n/log n speedup

Art of Multiprocessor Programming

16

Combining Trees
0

Art of Multiprocessor Programming

17

Combining Trees
0

+3

Art of Multiprocessor Programming

18

Combining Trees
0

+3

+2

Art of Multiprocessor Programming

19

Combining Trees
0

+3

+2

Two threads meet, combine sums

Art of Multiprocessor Programming

20

Combining Trees
0
+5
+3 +2

Two threads meet, combine sums

Art of Multiprocessor Programming

21

Combining Trees
5
+5
+3 +2

Combined sum added to root

Art of Multiprocessor Programming

22

Combining Trees
5
0
+3 +2

Result returned to children

Art of Multiprocessor Programming

23

Combining Trees
5
0
0 3

Results returned to threads

Art of Multiprocessor Programming

24

What if?
Threads don't arrive together?
Should I stay or should I go?

How long to wait?


Waiting times add up

Idea:
Use a multi-phase algorithm where threads wait in parallel
Art of Multiprocessor Programming 25

Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT };

Art of Multiprocessor Programming

26

Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT };

Nothing going on

Art of Multiprocessor Programming

27

Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT };

1st thread is a partner for combining, will return to check for 2nd thread
Art of Multiprocessor Programming 28

Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT };

2nd thread has arrived with value for combining


Art of Multiprocessor Programming 29

Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT };

1st thread has deposited result for 2nd thread


Art of Multiprocessor Programming 30

Combining Status
enum CStatus{ IDLE, FIRST, SECOND, RESULT, ROOT };

Special case: root node


Art of Multiprocessor Programming 31

Node Synchronization
Short-term
Synchronized methods: consistency during a method call

Long-term
Boolean locked field: consistency across calls

Art of Multiprocessor Programming

32

Phases
Precombining
Set up combining rendez-vous

Art of Multiprocessor Programming

33

Phases
Precombining
Set up combining rendez-vous

Combining
Collect and combine operations

Art of Multiprocessor Programming

34

Phases
Precombining
Set up combining rendez-vous

Combining
Collect and combine operations

Operation
Hand off to higher thread

Art of Multiprocessor Programming

35

Phases
Precombining: set up the combining rendez-vous
Combining: collect and combine operations
Operation: hand off to the higher thread
Distribution: distribute results to waiting threads

Art of Multiprocessor Programming 36

Precombining Phase
0
IDLE

Examine status

Art of Multiprocessor Programming

37

Precombining Phase
0
FIRST

If IDLE, promise to return to look for a partner

Art of Multiprocessor Programming

38

Precombining Phase
0
FIRST

At ROOT, turn back

Art of Multiprocessor Programming

39

Precombining Phase
0
FIRST

Art of Multiprocessor Programming

40

Precombining Phase
0
SECOND

If FIRST, I'm willing to combine, but lock the node for now

Art of Multiprocessor Programming

41

Code
Tree class
In charge of navigation

Node class
Combining state, synchronization state, bookkeeping

Art of Multiprocessor Programming

42
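As a reading aid for the fragments that follow, a Node holds roughly this state (a sketch assembled from the fields those fragments use; the constructors and exact layout are my assumption, not a copy of the book's class):

class Node {
  CStatus cStatus;     // combining status (IDLE, FIRST, SECOND, RESULT, ROOT)
  boolean locked;      // long-term lock: consistency across phases
  int firstValue;      // 1st thread's (possibly already combined) contribution
  int secondValue;     // 2nd thread's contribution
  int result;          // result passed back down; at the root, the counter itself
  Node parent;         // null at the root

  Node() { cStatus = CStatus.ROOT; }                        // root node
  Node(Node myParent) { parent = myParent; cStatus = CStatus.IDLE; }
}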

Precombining Navigation
Node node = myLeaf;
while (node.precombine()) {
  node = node.parent;
}
Node stop = node;

Art of Multiprocessor Programming

43

Precombining Navigation

Start at leaf
Art of Multiprocessor Programming 44

Precombining Navigation

Move up while instructed to do so


Art of Multiprocessor Programming 45

Precombining Navigation

Remember where we stopped


Art of Multiprocessor Programming 46

Precombining Node
synchronized boolean precombine() {
  while (locked) wait();
  switch (cStatus) {
    case IDLE:
      cStatus = CStatus.FIRST;
      return true;
    case FIRST:
      locked = true;
      cStatus = CStatus.SECOND;
      return false;
    case ROOT:
      return false;
    default:
      throw new PanicException();
  }
}
Art of Multiprocessor Programming 47

Precombining Node
The method is synchronized: this provides the short-term synchronization.
Art of Multiprocessor Programming 48

Synchronization
while (locked) wait(); waits while the node is locked (in use by an earlier combining phase).
Art of Multiprocessor Programming 49

Precombining Node
The switch on cStatus checks the combining status.

Art of Multiprocessor Programming 50

Node was IDLE


case IDLE: become FIRST and return true; I will return to look for the 2nd thread's input value.
Art of Multiprocessor Programming 51

Precombining Node
Returning true means: continue up the tree.

Art of Multiprocessor Programming 52

I'm the 2nd Thread

case FIRST: the 1st thread has promised to return, so lock the node so it won't leave without me.
Art of Multiprocessor Programming 53

Precombining Node
Setting cStatus to SECOND prepares to deposit the 2nd thread's input value.
Art of Multiprocessor Programming 54

Precombining Node
Returning false ends the precombining phase: don't continue up the tree.
Art of Multiprocessor Programming 55

Node is the Root


case ROOT: at the root the precombining phase ends; don't continue up the tree.
Art of Multiprocessor Programming 56

Precombining Node
default: throw new PanicException(); always check for unexpected values!
Art of Multiprocessor Programming 57

Combining Phase
0
SECOND

+3

1st thread is locked out until the 2nd thread provides its value

Art of Multiprocessor Programming

58

Combining Phase
0
SECOND

+3

2nd thread deposits its value to be combined, unlocks the node, & waits


zzz

Art of Multiprocessor Programming

59

Combining Phase
0
+5
SECOND
2

+3

+2

1st thread moves up the tree with combined value


zzz

Art of Multiprocessor Programming

60

Combining (reloaded)
0
FIRST

2nd thread has not yet deposited its value

Art of Multiprocessor Programming

61

Combining (reloaded)
0

FIRST

+3

1st thread is alone, locks out late partner

Art of Multiprocessor Programming

62

Combining (reloaded)
0 +3
FIRST

Stop at root

+3

Art of Multiprocessor Programming

63

Combining (reloaded)
0 +3
FIRST

+3

The 2nd thread's late precombining-phase visit is locked out

Art of Multiprocessor Programming

64

Combining Navigation
node = myLeaf;
int combined = 1;
while (node != stop) {
  combined = node.combine(combined);
  stack.push(node);
  node = node.parent;
}

Art of Multiprocessor Programming

65

Combining Navigation

Start at leaf

Art of Multiprocessor Programming

66

Combining Navigation

Add 1
Art of Multiprocessor Programming 67

Combining Navigation
The loop revisits the nodes visited during precombining.

Art of Multiprocessor Programming

68

Combining Navigation

Accumulate combined values, if any


Art of Multiprocessor Programming 69

Combining Navigation
stack.push(node): we will retraverse the path in reverse order.

Art of Multiprocessor Programming

70

Combining Navigation
node = node.parent: move up the tree.

Art of Multiprocessor Programming

71

Combining Phase Node


synchronized int combine(int combined) {
  while (locked) wait();
  locked = true;
  firstValue = combined;
  switch (cStatus) {
    case FIRST:
      return firstValue;
    case SECOND:
      return firstValue + secondValue;
    default:
      throw new PanicException();  // always check for unexpected values
  }
}
Art of Multiprocessor Programming 72

Combining Phase Node


while (locked) wait(); waits until the node is unlocked. It is locked by the 2nd thread until it deposits its value.
Art of Multiprocessor Programming 73

Combining Phase Node


Why is it that no thread acquires the lock between the two lines (the wait loop and locked = true)? Both run inside the same synchronized method, so the monitor is never given up between them.
Art of Multiprocessor Programming 74

Combining Phase Node


locked = true; locks out late attempts to combine (by threads still in precombining).
Art of Multiprocessor Programming 75

Combining Phase Node


firstValue = combined; remembers my (1st thread) contribution.
Art of Multiprocessor Programming 76

Combining Phase Node


The switch on cStatus checks the status.
Art of Multiprocessor Programming 77

Combining Phase Node


case FIRST: am I (the 1st thread) alone?
Art of Multiprocessor Programming 78

Combining Node
case SECOND: not alone, so combine with the 2nd thread's value.
Art of Multiprocessor Programming 79

Operation Phase
5 +5

+3

Add combined value to root, start back down


+2

zzz

Art of Multiprocessor Programming

80

Operation Phase (reloaded)


5

SECOND

Leave value to be combined

Art of Multiprocessor Programming

81

Operation Phase (reloaded)


5

SECOND

Unlock, and wait


+2

zzz

Art of Multiprocessor Programming

82

Operation Phase Navigation


prior = stop.op(combined);

Art of Multiprocessor Programming

83

Operation Phase Navigation


prior = stop.op(combined);

The node where we stopped. Provide collected sum and wait for combining result
Art of Multiprocessor Programming 84

Operation on Stopped Node


synchronized int op(int combined) {
  switch (cStatus) {
    case ROOT:
      int prior = result;
      result += combined;
      return prior;
    case SECOND:
      secondValue = combined;
      locked = false;
      notifyAll();
      while (cStatus != CStatus.RESULT) wait();
      locked = false;
      notifyAll();
      cStatus = CStatus.IDLE;
      return result;
    default:
      throw new PanicException();  // always check for unexpected values
  }
}
Art of Multiprocessor Programming 85

Op States of Stop Node


Only ROOT and SECOND are possible here. Why? (The stop node is either the root or a node whose status this thread set to SECOND during precombining.)
Art of Multiprocessor Programming 86

At Root
case ROOT: add the combined sum to the root's result and return the prior value.
Art of Multiprocessor Programming 87

Intermediate Node
case SECOND: deposit the value for later combining.
Art of Multiprocessor Programming 88

Intermediate Node
locked = false; notifyAll(); unlocks the node (which I locked in precombining), then notifies the 1st thread.
Art of Multiprocessor Programming 89

Intermediate Node
while (cStatus != CStatus.RESULT) wait(); waits for the 1st thread to deliver the result.
Art of Multiprocessor Programming 90

Intermediate Node
Then unlock the node (locked by the 1st thread in the combining phase) and return the result.
Art of Multiprocessor Programming 91

Distribution Phase
5 0
SECOND

Move down with result

zzz

Art of Multiprocessor Programming

92

Distribution Phase
5

SECOND

Leave result for 2nd thread & lock node


zzz

Art of Multiprocessor Programming

93

Distribution Phase
5

Push result down tree

SECOND

zzz

Art of Multiprocessor Programming

94

Distribution Phase
5

IDLE
3

2nd thread awakens, unlocks, takes value

Art of Multiprocessor Programming

95

Distribution Phase Navigation


while (!stack.empty()) {
  node = stack.pop();
  node.distribute(prior);
}
return prior;

Art of Multiprocessor Programming

96

Distribution Phase Navigation



Traverse path in reverse order

Art of Multiprocessor Programming

97

Distribution Phase Navigation



Distribute results to waiting 2nd threads

Art of Multiprocessor Programming

98

Distribution Phase Navigation



Return result to caller


Art of Multiprocessor Programming 99

Distribution Phase
synchronized void distribute(int prior) {
  switch (cStatus) {
    case FIRST:
      cStatus = CStatus.IDLE;
      locked = false;
      notifyAll();
      return;
    case SECOND:
      result = prior + firstValue;
      cStatus = CStatus.RESULT;
      notifyAll();
      return;
    default:
      throw new PanicException();  // always check for unexpected values
  }
}

Art of Multiprocessor Programming

100

Distribution Phase
case FIRST: no 2nd thread combined with me, so unlock the node and reset it to IDLE.

Art of Multiprocessor Programming

101

Distribution Phase

case SECOND: notify the 2nd thread that the result is available (the 2nd thread will release the lock).

Art of Multiprocessor Programming

102
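Putting the four navigation fragments and the node methods together, one increment reads roughly as follows (a sketch of how the pieces compose, not the book's exact code; leaf[], ThreadID.get(), and the InterruptedException handling are assumptions):

public int getAndIncrement() throws InterruptedException {
  java.util.Stack<Node> stack = new java.util.Stack<>();
  Node myLeaf = leaf[ThreadID.get() / 2];       // assumed: two threads share each leaf

  // Precombining: reserve rendez-vous points on the way up.
  Node node = myLeaf;
  while (node.precombine()) node = node.parent;
  Node stop = node;

  // Combining: revisit those nodes, accumulating values.
  node = myLeaf;
  int combined = 1;
  while (node != stop) {
    combined = node.combine(combined);
    stack.push(node);
    node = node.parent;
  }

  // Operation: hand the combined sum to the stop node (root or 2nd-thread node).
  int prior = stop.op(combined);

  // Distribution: deliver results back down the path.
  while (!stack.empty()) {
    node = stack.pop();
    node.distribute(prior);
  }
  return prior;
}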

Bad News: High Latency


+5

Log n
+2 +3

Art of Multiprocessor Programming

103

Good News: Real Parallelism


+5

1 thread
+3

+2

2 threads
104

Art of Multiprocessor Programming

Throughput Puzzles
Ideal circumstances
All n threads move together, combine n increments in O(log n) time

Worst circumstances
All n threads slightly skewed, locked out: n increments in O(n log n) time

Art of Multiprocessor Programming

105

Index Distribution Benchmark


void indexBench(int iters, int work) {
  int i = 0;
  while (i < iters) {
    i = r.getAndIncrement();
    Thread.sleep(random() % work);
  }
}

Art of Multiprocessor Programming

106

Index Distribution Benchmark



How many iterations

Art of Multiprocessor Programming

107

Index Distribution Benchmark



Expected time between incrementing counter


Art of Multiprocessor Programming 108

Index Distribution Benchmark



Take a number

Art of Multiprocessor Programming

109

Index Distribution Benchmark



Pretend to work (more work, less concurrency)


Art of Multiprocessor Programming 110
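To make the benchmark runnable, a driver along these lines would do (hypothetical glue code, not from the slides; here r is a plain AtomicInteger standing in for whichever shared counter is under test):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

class IndexBench {
  final AtomicInteger r = new AtomicInteger();    // the shared counter under test

  void indexBench(int iters, int work) throws InterruptedException {
    int i = 0;
    while (i < iters) {
      i = r.getAndIncrement();                                 // take a number
      Thread.sleep(ThreadLocalRandom.current().nextInt(work)); // pretend to work
    }
  }

  void run(int nThreads, int iters, int work) throws InterruptedException {
    Thread[] workers = new Thread[nThreads];
    for (int t = 0; t < nThreads; t++) {
      workers[t] = new Thread(() -> {
        try { indexBench(iters, work); } catch (InterruptedException ignored) { }
      });
      workers[t].start();
    }
    for (Thread w : workers) w.join();            // elapsed time / iters gives throughput
  }
}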

Performance
Here are some fake graphs
Distilled from real ones

Performance
Here are some fake graphs
Distilled from real ones

Your performance will probably vary


But not by much?

Performance
Here are some fake graphs
Distilled from real ones

Your performance will probably vary


But not by much?

Throughput
Average incs in 1 million cycles

Performance
Here are some fake graphs, distilled from real ones
Your performance will probably vary, but not by much?
Throughput: average increments in 1 million cycles
Latency: average cycles per increment

Latency
(graph: latency vs. number of processors; the spin lock is bad, the combining tree is good)
115

Throughput
(graph: throughput vs. number of processors; the combining tree is good, the spin lock is bad)
116

Load Fluctuations
Combining is sensitive:
if arrival rates drop, so do combining rates, and performance deteriorates!

Test
Vary work: the duration between accesses
117

Combining Rate vs Work


(graph: combining rate (%) vs. number of processors, for work W=100, W=1000, W=5000)

118

Better to Wait Longer


(graph: latency vs. processors for short, medium, and indefinite waiting times at combining nodes)
119

Conclusions
Combining Trees
Linearizable counters
Work well under high contention
Sensitive to load fluctuations
Can be used for getAndMumble() ops

And now for something completely different


Art of Multiprocessor Programming 120

A Balancer

Input wires

Output wires

Art of Multiprocessor Programming

121

Tokens Traverse Balancers

Token i enters on any wire and leaves on wire i (mod 2)


Art of Multiprocessor Programming 122

Tokens Traverse Balancers

Art of Multiprocessor Programming

123


Tokens Traverse Balancers
Quiescent state: all tokens have exited

Arbitrary input distribution


Art of Multiprocessor Programming

Balanced output distribution


127

Smoothing Network

1-smooth property

Art of Multiprocessor Programming 128

Counting Network

Art of Multiprocessor Programming

step property

129

Counting Networks Count!
The step property guarantees no duplication and no omission. How?

0, 4, 8, ...
1, 5, 9, ...
2, 6, ...
3, 7, ...

Multiple counters distribute the load.

Art of Multiprocessor Programming 130
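Concretely, a width-w counting network becomes a getAndIncrement by attaching a local counter to each output wire: wire i hands out i, i + w, i + 2w, and so on. A sketch (my own illustration; takeFromWire would be called by a token after it exits the network on that wire):

class NetworkCounter {
  private final int width;        // w = number of output wires
  private final int[] nextValue;  // one local counter per wire

  NetworkCounter(int width) {
    this.width = width;
    nextValue = new int[width];
    for (int i = 0; i < width; i++) nextValue[i] = i;
  }

  // Per-wire locking would suffice; a single synchronized method keeps the sketch short.
  public synchronized int takeFromWire(int wire) {
    int v = nextValue[wire];
    nextValue[wire] += width;     // this wire's next value is w further on
    return v;
  }
}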

Counting Networks Count!
The step property guarantees that in-flight tokens will take the missing values.

0, 4, 8, ...
1, 5, 9, ...
2, 6, ...
3, 7, ...

If 5 and 9 are taken before 4 and 8, the tokens still in the network must take 4 and 8.

131

Counting Networks
Good for counting the number of tokens
low contention
no sequential bottleneck
high throughput
two practical networks of depth log² n

Art of Multiprocessor Programming

132

Counting Network
1

Art of Multiprocessor Programming

133

Counting Network
1 2

Art of Multiprocessor Programming

134

Counting Network
1 2 3

Art of Multiprocessor Programming

135

Counting Network
1 2 3

Art of Multiprocessor Programming

136

Counting Network
15 2 3 4

Art of Multiprocessor Programming

137

Counting Network
1 2 3 4 5

Art of Multiprocessor Programming

138

Bitonic[k] Counting Network

Art of Multiprocessor Programming

139

Bitonic[k] Counting Network

140

Bitonic[k] is not Linearizable

Art of Multiprocessor Programming

141

Bitonic[k] is not Linearizable

Art of Multiprocessor Programming

142

Bitonic[k] is not Linearizable


2

Art of Multiprocessor Programming

143

Bitonic[k] is not Linearizable 0


2

Art of Multiprocessor Programming

144

Bitonic[k] is not Linearizable 0


Problem: Red finished before Yellow started, yet Red took 2 and Yellow took 0

Art of Multiprocessor Programming

145

But it is Quiescently Consistent

Has Step Property in Any Quiescent State (one in which all tokens have exited)
Art of Multiprocessor Programming 146

Shared Memory Implementation


class Balancer {
  boolean toggle;
  Balancer[] next;

  synchronized boolean flip() {
    boolean oldValue = this.toggle;
    this.toggle = !this.toggle;
    return oldValue;
  }
}
Art of Multiprocessor Programming 147

Shared Memory Implementation


The toggle field is the balancer's state.
Art of Multiprocessor Programming 148

Shared Memory Implementation


next[] holds the output connections to the following balancers.
Art of Multiprocessor Programming 149

Shared Memory Implementation


flip() is a getAndComplement on the toggle bit.
Art of Multiprocessor Programming 150

Shared Memory Implementation


Balancer traverse(Balancer b) {
  while (!b.isLeaf()) {
    boolean toggle = b.flip();
    if (toggle)
      b = b.next[0];
    else
      b = b.next[1];
  }
  return b;
}
Art of Multiprocessor Programming 151

Shared Memory Implementation


The loop stops when we exit the network (b.isLeaf()).
Art of Multiprocessor Programming 152

Shared Memory Implementation


b.flip() flips the balancer's state.
Art of Multiprocessor Programming 153

Shared Memory Implementation


The toggle value chooses which output wire we exit on.
Art of Multiprocessor Programming 154

Bitonic[2k] Inductive Structure


Bitonic[k] Merger[2k]

Bitonic[k]
Art of Multiprocessor Programming 156

Bitonic[4] Counting Network


Bitonic[2] Merger[4] Bitonic[2]

Art of Multiprocessor Programming

157

Bitonic[8] Layout

Bitonic[4]
Merger[8]
Bitonic[4]

Art of Multiprocessor Programming

158

Unfolded Bitonic[8] Network

Merger[8]

Art of Multiprocessor Programming

159

Unfolded Bitonic[8] Network


Merger[4]

Merger[4]

Art of Multiprocessor Programming

160

Unfolded Bitonic[8] Network


Merger[2]

Merger[2]
Merger[2] Merger[2]

Art of Multiprocessor Programming

161

Bitonic[k] Depth
Width k; depth is (log2 k)(log2 k + 1)/2

Art of Multiprocessor Programming

162
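A quick sanity check of the formula, restating the construction as a recurrence (my restatement, not from the slides):

d(2) = 1 and d(2k) = d(k) + log2(2k), since Merger[2k] contributes log2(2k) layers.
Unrolling: d(k) = 1 + 2 + ... + log2 k = (log2 k)(log2 k + 1)/2.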

Proof by Induction
Base:
Bitonic[2] is a single balancer, which has the step property by definition

Step:
If Bitonic[k] has the step property, so does Bitonic[2k]

Bitonic[2k] Schematic
Bitonic[k] Merger[2k]

Bitonic[k]
Art of Multiprocessor Programming 164

Bitonic[2k] Counts
Induction Hypothesis Need to prove

Merger[2k]

165

Merger[2k] Schematic
Merger[k]

Merger[k]
Art of Multiprocessor Programming 166

Merger[2k] Layout

Art of Multiprocessor Programming

167

Proof: Lemma 1
If a sequence has the step property

Art of Multiprocessor Programming

168

Lemma 1
So does its even subsequence

Art of Multiprocessor Programming

169

Lemma 1
Also its odd subsequence

Art of Multiprocessor Programming

170

Lemma 2
even

Even + odd Odd + even

Diff at most 1

even

Art of Multiprocessor Programming

171

Bitonic[2k] Layout Details


Merger[2k]
Bitonic[k]
even

Merger[k]

Bitonic[k]

even

Merger[k]
172

Art of Multiprocessor Programming

By induction hypothesis

Outputs have step property

Bitonic[k]

Merger[k]

Bitonic[k]

Merger[k]
Art of Multiprocessor Programming 173

By Lemma 1
even

All subsequences have step property

Merger[k]

even

Merger[k]
Art of Multiprocessor Programming 174

By Lemma 2
even

Diff at most 1

Merger[k]

even

Merger[k]
Art of Multiprocessor Programming 175

By Induction Hypothesis
Outputs have step property

Merger[k]

Merger[k]
Art of Multiprocessor Programming 176

By Lemma 2
At most one diff

Merger[k]

Merger[k]
Art of Multiprocessor Programming 177

Last Row of Balancers


Merger[k] Merger[k]
Outputs of Merger[k]
Art of Multiprocessor Programming

Outputs of last layer


178

Last Row of Balancers


Wire i from one merger

Merger[k] Merger[k]
Wire i from other merger

Art of Multiprocessor Programming

179

Last Row of Balancers


Merger[k] Merger[k]
Outputs of Merger[k]
Art of Multiprocessor Programming

Outputs of last layer


180

Last Row of Balancers


Merger[k] Merger[k]

Art of Multiprocessor Programming

181

So Counting Networks Count


Merger[k] Merger[k]

Art of Multiprocessor Programming

182

Periodic Network Block

Art of Multiprocessor Programming

183


Block[2k] Schematic
Block[k]

Block[k]
Art of Multiprocessor Programming 187

Block[2k] Layout

Art of Multiprocessor Programming

188

Periodic[8]

Art of Multiprocessor Programming

189

Network Depth
Each Block[k] has depth log2 k; we need log2 k blocks; grand total of (log2 k)²

Art of Multiprocessor Programming

190

Lower Bound on Depth


Theorem: the depth of any width-w counting network is Ω(log w). Theorem: there exists a counting network of O(log w) depth. Unfortunately, the proof is non-constructive and the constants are in the 1000s.

Art of Multiprocessor Programming

191

Sequential Theorem
If a balancing network counts
Sequentially, meaning that Tokens traverse one at a time

Then it counts
Even if tokens traverse concurrently

Art of Multiprocessor Programming

192

Red First, Blue Second

Art of Multiprocessor Programming

193 (2)

Blue First, Red Second

Art of Multiprocessor Programming

194 (2)

Either Way
Same balancer states

Art of Multiprocessor Programming

195

Order Doesn't Matter


Same balancer states

Same output distribution

Art of Multiprocessor Programming

196

Index Distribution Benchmark


void indexBench(int iters, int work) {
  int i = 0;
  while (i < iters) {
    i = fetch&inc();
    Thread.sleep(random() % work);
  }
}

Art of Multiprocessor Programming

197

Performance (Simulated)

Throughput

Higher is better!

MCS queue lock Spin lock Number processors


* All graphs taken from Herlihy,Lim,Shavit, copyright ACM.

Art of Multiprocessor Programming

198

Performance (Simulated)
64-leaf combining tree 80-balancer counting network

Throughput

Higher is better!
MCS queue lock Spin lock Number processors

Art of Multiprocessor Programming

199

Performance (Simulated)
64-leaf combining tree 80-balancer counting network

Throughput

Combining and counting are pretty close


MCS queue lock Spin lock Number processors

Art of Multiprocessor Programming

200

Performance (Simulated)
64-leaf combining tree 80-balancer counting network

Throughput

But they beat the hell out of the competition!

MCS queue lock Spin lock


Number processors

Art of Multiprocessor Programming

201

Saturation and Performance


Undersaturated: P < w log w
Saturated: P = w log w (optimal performance)
Oversaturated: P > w log w

Art of Multiprocessor Programming 202

Throughput vs. Size


Bitonic[16]

Throughput

Bitonic[8]

Bitonic[4]

Number processors
Art of Multiprocessor Programming 203

Shared Pool
put
(figure: pool slots indexed by put and remove counters: 19, 20, 21)

remove

Art of Multiprocessor Programming

204

Shared Pool
put

remove

Depth log2w

Art of Multiprocessor Programming

239

Counting Trees
A Tree Balancer:

Single input wire; step property in quiescent state

Counting Trees

Interleaving of output wires

Inductive Construction
Tree[2k] = a root balancer b whose output y0 feeds Tree0[k], giving the k even outputs, and whose output y1 feeds Tree1[k], giving the k odd outputs.

Lemma: Tree[2k] has step property in quiescent state.

At most 1 more token exits on the top wire than on the bottom wire

Inductive Construction

The top step sequence has at most one extra token, on the last wire of its step

Implementing Counting Trees


(figure: a binary tree of balancers, each holding a 0/1 toggle bit)

Example
inc: follow the results of getAndComplement on the toggle bits down the tree

(figure: the toggle bits along the traversed path)
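A sketch of that traversal in code (my own illustration; ToggleNode, child[], and leafIndex are hypothetical names, and the convention that an old toggle value of false sends a token to child[0] is an assumption):

import java.util.concurrent.atomic.AtomicBoolean;

class CountingTree {
  static class ToggleNode {
    final AtomicBoolean toggle = new AtomicBoolean(false);
    ToggleNode[] child;          // two children, or null at a leaf
    int leafIndex;               // meaningful only at leaves

    boolean getAndComplement() { // atomically complement the toggle, return the old value
      boolean old;
      do { old = toggle.get(); } while (!toggle.compareAndSet(old, !old));
      return old;
    }
  }

  // Walk from the root to a leaf, flipping toggle bits on the way down; the leaf
  // then hands out leafIndex, leafIndex + #leaves, leafIndex + 2*#leaves, ...
  int traverse(ToggleNode root) {
    ToggleNode node = root;
    while (node.child != null) {
      node = node.getAndComplement() ? node.child[1] : node.child[0];
    }
    return node.leafIndex;
  }
}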

Implementing Counting Trees

Problem: the toggle bit in the root balancer is a hot spot, and to a lesser extent so are the toggle bits in the next balancers.

Contention and a sequential bottleneck: so what have we achieved?

Diffraction Balancing
Idea (as in the elimination stack): if an even number of tokens pass a balancer, the toggle bit remains unchanged!
Prism Array
(figure: a prism array placed in front of the 0/1 toggle bit)
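A much simplified sketch of a diffracting balancer (my own illustration using java.util.concurrent.Exchanger as the prism slots; the real Shavit-Zemach prism uses its own exchange protocol and tuned timeouts): two tokens that collide in a prism slot leave on opposite wires without touching the toggle bit, and a token that finds no partner falls back to the toggle.

import java.util.concurrent.Exchanger;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DiffractingBalancer {
  private final Exchanger<Long>[] prism;
  private boolean toggle;                       // the ordinary balancer state

  @SuppressWarnings("unchecked")
  DiffractingBalancer(int prismSize) {
    prism = new Exchanger[prismSize];
    for (int i = 0; i < prismSize; i++) prism[i] = new Exchanger<>();
  }

  /** Returns the output wire, 0 (top) or 1 (bottom). */
  int traverse() throws InterruptedException {
    int slot = ThreadLocalRandom.current().nextInt(prism.length);
    long me = Thread.currentThread().getId();
    try {
      // Diffraction: if a partner shows up in time, the pair splits over the two
      // wires deterministically (distinct thread ids), never touching the toggle bit.
      long partner = prism[slot].exchange(me, 100, TimeUnit.MICROSECONDS);
      return me < partner ? 0 : 1;
    } catch (TimeoutException e) {
      synchronized (this) {                     // no partner: behave like a plain balancer
        boolean old = toggle;
        toggle = !toggle;
        return old ? 0 : 1;
      }
    }
  }
}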

Diffracting Tree
(figure: a tree of diffracting balancers B1, B2, B3; each diffracting balancer is a prism array in front of a 0/1 toggle bit)

Lemma: a diffracting balancer behaves the same as a balancer.

Diffracting Tree
(figure as before: each balancer in the tree is a prism array plus a 0/1 toggle bit)

High load: lots of diffraction, few toggles.
Low load: little diffraction, few toggles.
Either way: high throughput with low contention.

Performance
(graphs: throughput and latency vs. concurrency P, comparing the MCS lock, the combining tree (Ctree), and the diffracting tree (Dtree))

Summary
We can build a linearizable parallel shared counter.
By relaxing our coherence requirements, we can build a shared counter with
low memory contention and real parallelism.
Art of Multiprocessor Programming 251

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


You are free:
to Share: to copy, distribute and transmit the work
to Remix: to adapt the work
Under the following conditions:
Attribution. You must attribute the work to "The Art of Multiprocessor Programming" (but not in any way that suggests that the authors endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.
Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.

Art of Multiprocessor Programming

252
