Theoretical only
[Figure: sequential laundry timeline. Loads A, B, C, D (task order) each pass through three stages of 30, 40 and 20 min (90 min per load), one load after another.]
This operator scheduled his loads to be delivered to the laundry every 90 minutes, which is the time required to finish one load. In other words, he does not start a new task until he is done with the previous one.
The process is sequential: sequential laundry takes 6 hours for 4 loads (4 × 90 min = 360 min).
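A minimal sketch of the sequential-laundry arithmetic, assuming the three stage times read off the figure (30, 40 and 20 min). The names `STAGES` and `sequential_time` are hypothetical, introduced only for illustration.

```python
# Stage durations in minutes, assumed from the laundry figure.
STAGES = (30, 40, 20)

def sequential_time(stages, n_loads):
    """Total time when each load must finish every stage before the next load starts."""
    per_load = sum(stages)        # 90 min for one load
    return per_load * n_loads     # loads never overlap

total = sequential_time(STAGES, 4)
print(total, "min =", total / 60, "hours")  # 360 min = 6.0 hours
```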
Efficiently scheduled laundry: pipelined laundry
The operator starts work ASAP.
[Figure: pipelined laundry timeline, 6 PM to midnight. Loads A, B, C, D (task order) overlap; the time segments read 30, 40, 40, 40, 40, 20 min.]
Another operator asks for loads to be delivered to the laundry every 40 minutes, the duration of the slowest stage.
Pipelined laundry takes 3.5 hours for 4 loads (30 + 4 × 40 + 20 = 210 min).
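The 210-minute figure can be checked with a small schedule sketch. It assumes all four loads are on hand at 6 PM and that a load starts a stage as soon as it has finished the previous stage and that machine is free; `pipelined_schedule` is a hypothetical helper, not part of the slides.

```python
# Stage durations in minutes, assumed from the laundry figure.
STAGES = (30, 40, 20)

def pipelined_schedule(stages, n_loads):
    """Return finish[load][stage]: each load starts a stage once
    (a) it has finished the previous stage and (b) that machine is free."""
    finish = [[0] * len(stages) for _ in range(n_loads)]
    for i in range(n_loads):
        for s, duration in enumerate(stages):
            ready = finish[i][s - 1] if s > 0 else 0   # load i done with stage s-1
            free = finish[i - 1][s] if i > 0 else 0    # machine s done with load i-1
            finish[i][s] = max(ready, free) + duration
    return finish

schedule = pipelined_schedule(STAGES, 4)
for name, row in zip("ABCD", schedule):
    print(name, row)
print("total:", schedule[-1][-1], "min")  # total: 210 min
```

Load B finishes its first stage at 60 min, but the 40-minute stage is busy with load A until 70 min, which is the 10-minute wait for the dryer noted in the slides.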
Pipelining Facts
• Multiple tasks operate simultaneously.
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload.
• The pipeline rate is limited by the slowest pipeline stage.
• Potential speedup = number of pipe stages.
• Unbalanced lengths of pipe stages reduce speedup.
• The time to “fill” the pipeline and the time to “drain” it reduce speedup.
[Figure: pipelined laundry, 6 PM to 9 PM, loads A to D in task order; time segments 30, 40, 40, 40, 40, 20 min. Annotation: the washer waits for the dryer for 10 minutes.]
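A sketch that makes these facts concrete, assuming every task is available at time 0 and enters each stage as soon as possible. It compares the unbalanced laundry stages (30, 40, 20) with a hypothetical balanced split (30, 30, 30) of the same 90 minutes; `pipelined_total` and `speedup` are illustrative names.

```python
def pipelined_total(stages, n):
    """Finish time of the last of n tasks when each task enters a stage
    as soon as it left the previous stage and the stage is free."""
    finish = [0] * len(stages)   # when each stage last finished a task
    for _ in range(n):
        prev = 0                 # when this task finished its previous stage
        for s, d in enumerate(stages):
            finish[s] = max(prev, finish[s]) + d
            prev = finish[s]
    return finish[-1]

def speedup(stages, n):
    """Sequential time (n * sum of stages) divided by pipelined time."""
    return n * sum(stages) / pipelined_total(stages, n)

# Latency of a single task is unchanged by pipelining.
print(pipelined_total((30, 40, 20), 1))         # 90
# For many tasks the slowest stage sets the rate.
print(round(speedup((30, 40, 20), 1000), 3))    # 2.247 (unbalanced)
print(round(speedup((30, 30, 30), 1000), 3))    # 2.994 (balanced, close to 3 stages)
```

With the same total work per task, only the balanced split approaches the potential speedup of 3 (the number of stages).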
9.2 Pipelining
• Pipelining decomposes a sequential process into segments.
• The processor is divided into segment processors, each one dedicated to a particular segment.
• Each segment is executed in a dedicated segment processor that operates concurrently with all other segments.
• Information flows through these multiple hardware segments.
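As a software analogy only (hardware segments are not threads), the sketch below gives each segment its own worker thread and passes items between segments through FIFO queues, so different segments work on different items at the same time. `run_pipeline` is a hypothetical helper; in CPython the GIL limits true parallelism for CPU-bound work, so this shows the structure rather than the speedup.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream

def run_pipeline(stages, items):
    """Run each stage function in its own thread, linked by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(func, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)   # pass the end marker downstream
                return
            outbox.put(func(item))

    threads = [
        threading.Thread(target=worker, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for x in items:                 # feed the first segment
        queues[0].put(x)
    queues[0].put(_DONE)

    results = []
    while True:                     # collect from the last segment
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

print(run_pipeline([lambda x: x + 1, lambda x: x * 2], range(5)))  # [2, 4, 6, 8, 10]
```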
Instruction execution is divided into k segments or stages.
An instruction exits pipe stage k-1 and proceeds into pipe stage k.
[Figure: a pipeline of k segments]
Suppose we want to perform the
combined multiply and add
operations with a stream of
numbers:
Ai * Bi + Ci for i = 1, 2, 3, …, 7
The suboperations performed in
each segment of the pipeline are
as follows:
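Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply and input Ci)
Segment 3: R5 ← R3 + R4 (add Ci to the product)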
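A clock-by-clock sketch of the three segments above, assuming every register loads its new value on the same clock edge using the values it held before that edge. `multiply_add_pipeline` and the index bookkeeping `i1` are hypothetical, added so segment 2 knows which Ci belongs with the pair in R1, R2.

```python
def multiply_add_pipeline(A, B, C):
    """Simulate the 3-segment pipeline computing A[i]*B[i] + C[i].

    Registers hold a value or None (empty). Returns (results in input
    order, number of clock cycles used).
    """
    n = len(A)
    r1 = r2 = r3 = r4 = r5 = None
    i1 = None                     # index of the element held in R1, R2
    results, clock = [], 0
    while len(results) < n:
        clock += 1
        # Segment 3: R5 <- R3 + R4
        new_r5 = r3 + r4 if r3 is not None else None
        # Segment 2: R3 <- R1 * R2, R4 <- C[i] for the element in R1, R2
        if r1 is not None:
            new_r3, new_r4 = r1 * r2, C[i1]
        else:
            new_r3 = new_r4 = None
        # Segment 1: R1 <- A[i], R2 <- B[i] while inputs remain
        i = clock - 1
        if i < n:
            new_r1, new_r2, new_i1 = A[i], B[i], i
        else:
            new_r1 = new_r2 = new_i1 = None
        # Clock edge: every register updates at once
        r1, r2, r3, r4, r5, i1 = new_r1, new_r2, new_r3, new_r4, new_r5, new_i1
        if r5 is not None:
            results.append(r5)
    return results, clock

results, cycles = multiply_add_pipeline([1, 2, 3, 4, 5, 6, 7], [10] * 7, [1] * 7)
print(results)  # [11, 21, 31, 41, 51, 61, 71]
print(cycles)   # 9
```

The first result appears at clock 3 (the pipeline depth) and one result follows on every clock after that, so the seven operations finish in 3 + (7 − 1) = 9 cycles.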
Pipeline Performance
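$$T_k = (k + (n - 1))\,\tau$$
$$\text{Speedup} = \frac{T_1}{T_k} = \frac{nk\tau}{(k + (n - 1))\,\tau} = \frac{nk}{k + (n - 1)} \;\longrightarrow\; k \quad \text{as } n \to \infty$$
where k is the number of segments, n the number of tasks, τ the clock period, and T1 = nkτ the time without pipelining.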
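The formulas can be evaluated directly. This sketch assumes, as the formula does, that the non-pipelined unit needs kτ per task; the function names are illustrative.

```python
def pipeline_time(k, n, tau):
    """T_k: k cycles to fill the pipeline, then one more cycle per remaining task."""
    return (k + (n - 1)) * tau

def nonpipelined_time(k, n, tau):
    """T_1: each of the n tasks takes k * tau before the next one starts."""
    return n * k * tau

def speedup(k, n):
    # tau cancels out of the ratio, so tau = 1 is used
    return nonpipelined_time(k, n, 1) / pipeline_time(k, n, 1)

print(speedup(3, 4))                 # 2.0  (12 / 6)
print(speedup(3, 7))                 # 21 / 9, about 2.33
print(round(speedup(3, 10**6), 4))   # 3.0, approaching k = 3
```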
SPEEDUP
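• Consider a k-segment pipeline operating on n data sets. (In the laundry example above, k = 3 and n = 4.)
> It takes k clock cycles to fill the pipeline and get the first result from the output of the pipeline.
> After that, one result leaves the pipeline every clock cycle, so the remaining n − 1 results take n − 1 more cycles, for k + (n − 1) cycles in total.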
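[Space-time diagram: four-segment pipeline processing six tasks]
Clock cycle:  1   2   3   4   5   6   7   8   9
Segment 1:    T1  T2  T3  T4  T5  T6
Segment 2:        T1  T2  T3  T4  T5  T6
Segment 3:            T1  T2  T3  T4  T5  T6
Segment 4:                T1  T2  T3  T4  T5  T6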
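A space-time diagram like the one above can be generated for any k and n. The sketch assumes segment s handles task Tj during clock cycle s + j − 1 (both counted from 1); `space_time_diagram` and `render` are hypothetical helpers.

```python
def space_time_diagram(k, n):
    """Rows = segments 1..k, columns = clock cycles 1..k+n-1."""
    cycles = k + n - 1
    grid = [["" for _ in range(cycles)] for _ in range(k)]
    for s in range(1, k + 1):
        for j in range(1, n + 1):
            grid[s - 1][s + j - 2] = f"T{j}"   # segment s, task j, cycle s+j-1
    return grid

def render(grid):
    header = "Clock     " + "".join(f"{c:<4}" for c in range(1, len(grid[0]) + 1))
    lines = [header]
    for s, row in enumerate(grid, start=1):
        lines.append(f"Segment {s} " + "".join(f"{cell:<4}" for cell in row))
    return "\n".join(lines)

print(render(space_time_diagram(4, 6)))
```

For k = 4 and n = 6 the last task leaves segment 4 in cycle k + (n − 1) = 9, matching the pipeline-time formula.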