
Pipelining and Parallel Processing

ECEG 3202 Computer Architecture and Organization
By Getachew T.
What is Pipelining?

Laundry Example
from David Patterson

Almaz, Bekele, Chala, and Desta (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• Folder takes 20 minutes
Sequential Laundry
[Figure: timeline from 6 PM to midnight. Loads A, B, C, D run one after another in task order, each taking 30 + 40 + 20 minutes.]
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
Pipelined Laundry
Start work ASAP
[Figure: timeline from 6 PM to midnight. Loads A, B, C, D overlap in task order; the intervals along the time axis are 30, 40, 40, 40, 40, 20 minutes.]
• Pipelined laundry takes 3.5 hours for 4 loads
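
The two totals can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the function names are made up for illustration, and it assumes that a finished load can wait in a basket until the next machine is free.

```python
# Stage durations in minutes, taken from the laundry example.
STAGES = [("wash", 30), ("dry", 40), ("fold", 20)]

def sequential_minutes(loads):
    """Each load finishes all three stages before the next load starts."""
    return loads * sum(minutes for _, minutes in STAGES)

def pipelined_minutes(loads):
    """Each load enters a stage as soon as the load and the stage are both ready."""
    stage_free = [0] * len(STAGES)  # time at which each machine becomes idle
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load has left the previous stage
        for i, (_, minutes) in enumerate(STAGES):
            start = max(ready, stage_free[i])
            ready = start + minutes
            stage_free[i] = ready
        finish = ready
    return finish

print(sequential_minutes(4) / 60)  # 6.0 hours
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

The dryer is the slowest stage, so after the first load every further load adds 40 minutes to the total.
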


Effect of Pipelining
• Pipelining does not help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipeline stages
• Unbalanced lengths of pipeline stages reduce speedup
• Time to fill the pipeline and time to drain it reduce speedup
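
The effect of an unbalanced stage and of fill/drain time can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a formula from the slides: it assumes identical tasks, so the pipelined time is one full pass through all stages plus one slowest-stage interval for each additional task. The function name `pipeline_speedup` is hypothetical.

```python
def pipeline_speedup(stage_times, tasks):
    """Speedup of a pipeline over sequential execution for identical tasks."""
    total = sum(stage_times)    # latency of one task (unchanged by pipelining)
    slowest = max(stage_times)  # the slowest stage sets the pipeline rate
    sequential = tasks * total
    pipelined = total + (tasks - 1) * slowest  # fill once, then one task per slowest-stage time
    return sequential / pipelined

# Laundry stages (30, 40, 20 minutes) versus a balanced 3-stage pipeline.
for tasks in (4, 100, 10_000):
    unbalanced = pipeline_speedup([30, 40, 20], tasks)
    balanced = pipeline_speedup([30, 30, 30], tasks)
    print(tasks, round(unbalanced, 2), round(balanced, 2))
```

With few tasks, fill and drain time dominate. With many tasks, the balanced pipeline approaches a speedup of 3 (the number of stages), while the unbalanced one stays below 90/40 = 2.25.
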
Instruction Cycle

• Instruction Fetch
• Instruction Decoding
• Operand Fetch
• Execute
• Store Result
5 Steps of MIPS Datapath
Stages: Instruction Fetch | Instr. Decode, Reg. Fetch | Execute, Addr. Calc | Memory Access | Write Back
[Figure: MIPS datapath. Labeled elements: Next PC, Next SEQ PC, Adder (+4), MUX, instruction Memory (Address, Inst), Reg File (RS1, RS2, RD), Sign Extend (Imm), ALU, Zero?, data Memory, LMD, WB Data.]
5 Steps of MIPS Datapath
Figure 3.4, Page 134, CA:AQA 2e
Stages: Instruction Fetch | Instr. Decode, Reg. Fetch | Execute, Addr. Calc | Memory Access | Write Back
[Figure: pipelined MIPS datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between the stages. Labeled elements: Next PC, Next SEQ PC, Adder (+4), MUX, instruction Memory (Address), Reg File (RS1, RS2), Sign Extend (Imm), ALU, Zero?, data Memory, WB Data. Next SEQ PC and RD are carried forward through the pipeline registers.]
Data stationary control: local decode for each instruction phase / pipeline stage
Visualizing Pipelining
[Figure: instructions in program order (vertical axis) against time in clock cycles, Cycle 1 to Cycle 7 (horizontal axis). Each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous instruction.]
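
A diagram of this kind can be generated as text. The following is a minimal sketch, assuming an ideal pipeline with no stalls in which instruction i is fetched in cycle i; the function name `pipeline_diagram` is made up for illustration.

```python
# Stage labels as they appear in the figure.
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]

def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1, or ''."""
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i reaches stage s in cycle i + s + 1
        rows.append(row)
    return rows

for i, row in enumerate(pipeline_diagram(4)):
    print(f"instr {i + 1}: " + " ".join(f"{cell:<6}" for cell in row))
```

Four instructions in a five-stage pipeline occupy 5 + 4 - 1 = 8 cycles in this model, one more than the seven cycles the figure has room for.
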
Pipeline Time Analysis
• With pipeline
    • $k$-segment pipeline
    • $n$ tasks
    • $t_p$: clock cycle time
    • $k t_p$: time to complete the first task $T_1$
    • $(n-1) t_p$: time to complete the remaining $n-1$ tasks
    • $k + (n-1)$: clock cycles to complete $n$ tasks
    • $(k + n - 1) t_p$: time to complete $n$ tasks
• Without pipeline
    • $t_n$: time to complete each task
    • $n t_n$: time to complete $n$ tasks
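
As a worked example with made-up numbers (none of them come from the slides), take $k = 4$ segments, $n = 100$ tasks, a clock cycle of $t_p = 20$ ns, and a non-pipelined task time of $t_n = 80$ ns:

$$(k + n - 1)\, t_p = (4 + 100 - 1) \times 20\ \text{ns} = 2060\ \text{ns}$$

$$n\, t_n = 100 \times 80\ \text{ns} = 8000\ \text{ns}$$

The first task still needs $k t_p = 80$ ns, but each of the remaining 99 tasks completes one clock cycle after the previous one.
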
Pipeline Time Analysis
• Speedup of pipelining

$$S = \frac{n t_n}{(k + n - 1) t_p}, \qquad \lim_{n \to \infty} S = \frac{t_n}{t_p}$$

• Assuming a task takes the same time with and without the pipeline, $t_n = k t_p$, so in the limit

$$S = \frac{k t_p}{t_p} = k$$

• Thus, the theoretical speedup limit is $k$, the number of pipeline segments
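
How quickly the speedup approaches k can be seen by evaluating the formula for growing n. This is a small sketch: the function name `speedup` and the sample values of k and n are illustrative choices, and t_n defaults to k * t_p as in the assumption above.

```python
def speedup(k, n, t_p, t_n=None):
    """S = n*t_n / ((k + n - 1) * t_p) for a k-segment pipeline and n tasks."""
    if t_n is None:
        t_n = k * t_p  # a single task takes the same time with or without the pipeline
    return (n * t_n) / ((k + n - 1) * t_p)

# Five-segment pipeline, clock cycle time of 1 time unit.
for n in (1, 10, 100, 10_000):
    print(n, round(speedup(k=5, n=n, t_p=1.0), 3))
```

For a single task there is no gain (S = 1). The speedup then rises toward 5 as n grows, but never reaches it.
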
Arithmetic Pipelining
• Floating-point operations
• Fixed-point multiplication
• Other scientific computations
Hazards due to Pipelining
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
• Structural hazards: the hardware cannot support this combination of instructions (contention for the same hardware)
• Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
• Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
One Memory Port/Structural Hazards
[Figure: pipeline diagram over clock cycles 1 to 7 for Load, Instr 1, Instr 2, Instr 3, Instr 4 in program order. Each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous one. In cycle 4 the Load is in DMem while Instr 3 is in Ifetch.]
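
The cycle in which the single memory port is requested twice can be found mechanically. The sketch below assumes one instruction is fetched per cycle with no stalls, and that only instructions flagged as memory accesses use the port in their MEM stage. The function name and the `(name, uses_data_memory)` tuple layout are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_port_conflicts(instructions):
    """instructions: list of (name, uses_data_memory) in program order.

    Returns (cycle, instruction in MEM, instruction in IF) for every cycle in
    which a data access and an instruction fetch need the memory port together.
    """
    conflicts = []
    for i, (name, uses_mem) in enumerate(instructions):
        if not uses_mem:
            continue
        mem_cycle = i + 1 + STAGES.index("MEM")  # instruction i is fetched in cycle i + 1
        fetched = mem_cycle - 1                   # index of the instruction fetched in that cycle
        if fetched < len(instructions):
            conflicts.append((mem_cycle, name, instructions[fetched][0]))
    return conflicts

program = [("Load", True), ("Instr 1", False), ("Instr 2", False),
           ("Instr 3", False), ("Instr 4", False)]
print(memory_port_conflicts(program))  # [(4, 'Load', 'Instr 3')]
```
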
Data Hazard on R1
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB for the following instructions in program order, each starting one cycle after the previous one.]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or r8,r6,r9
    xor r10,r1,r11
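
Read-after-write dependences like these can be listed automatically. The sketch below uses the instruction text exactly as it appears on the slide. The `window` parameter is an assumption, not something the slide states: it is the number of following instructions that would read a stale register without forwarding, set to 2 on the assumption that the register file is written in the first half of a cycle and read in the second half. The function names are made up for illustration.

```python
def parse(instr):
    """Split 'op rd,rs,rt' into (op, destination, [sources])."""
    op, operands = instr.split(None, 1)
    dest, *sources = [reg.strip() for reg in operands.split(",")]
    return op, dest, sources

def raw_hazards(program, window=2):
    """Return (producer, consumer, register) for each read-after-write hazard.

    Simplification: does not check whether the register is overwritten
    between producer and consumer.
    """
    parsed = [parse(instr) for instr in program]
    hazards = []
    for i, (_, dest, _) in enumerate(parsed):
        for j in range(i + 1, min(i + 1 + window, len(parsed))):
            if dest in parsed[j][2]:
                hazards.append((program[i], program[j], dest))
    return hazards

program = ["add r1,r2,r3", "sub r4,r1,r3", "and r6,r1,r7",
           "or r8,r6,r9", "xor r10,r1,r11"]
for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer!r} needs {reg} from {producer!r}")
```

Under these assumptions the sub and the and are flagged for r1, and the or is flagged for r6, which the and writes. The xor is far enough behind the add to read the updated r1.
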
Control Hazard due to Branches
[Figure: pipeline diagram (Ifetch, Reg, ALU, DMem, Reg) for the following instructions, listed by address, each starting one cycle after the previous one.]
    10: beq r1,r3,36
    14: and r2,r3,r5
    18: or r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11
Solutions

• Instruction Reordering
• Branch Prediction
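
The slides do not say which prediction scheme is meant. As one common possibility, a 2-bit saturating counter can be sketched as follows. The class name, the initial counter value, and the example branch history are all illustrative assumptions.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self):
        self.counter = 0  # arbitrary starting state: strongly not taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def count_mispredictions(outcomes):
    """Run one predictor over a sequence of actual branch outcomes."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

# A loop-closing branch: taken 9 times, then not taken on loop exit.
loop_branch = [True] * 9 + [False]
print(count_mispredictions(loop_branch))      # 3
print(count_mispredictions(loop_branch * 2))  # 4
```

The first pass costs two warm-up mispredictions plus one at loop exit. Because a single not-taken outcome does not flip the prediction, the second pass mispredicts only at its exit.
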
Parallel Processing
• Concurrent data processing
• Possibilities
    • Fetch the next instruction while the current instruction is executing in the ALU
    • System may have more than one ALU
    • System may have more than one CPU
• Overall goal is to increase throughput
Multiple Functional Units
Parallel Processing Classifications
• Classification of parallel processing can be based on
    • Internal organization of processors
    • Interconnection structure between processors
    • Flow of information through the system
• Flynn's classification
    • SISD: Single Instruction, Single Data
    • SIMD: Single Instruction, Multiple Data
    • MISD: Multiple Instruction, Single Data
    • MIMD: Multiple Instruction, Multiple Data
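
Flynn's four classes follow from two counts: the number of instruction streams and the number of data streams. A minimal sketch of that mapping, with a hypothetical function name:

```python
def flynn_class(instruction_streams, data_streams):
    """Map stream counts to SISD, SIMD, MISD, or MIMD."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"

print(flynn_class(1, 1))  # SISD
print(flynn_class(1, 8))  # SIMD
print(flynn_class(4, 1))  # MISD
print(flynn_class(4, 4))  # MIMD
```
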
SISD
• Single computer with
    • Control unit
    • CPU, and
    • Memory
• Instructions are executed sequentially
• Parallel processing may be achieved by
    • Multiple functional units
    • Pipeline processing
SIMD
• Multiple processing units supervised by a common control unit
• All processors:
    • Receive the same instruction from the control unit
    • Operate on different data
• Shared memory unit must have multiple modules so that the processors can each access their own memory module simultaneously
MIMD
• Computer system that simultaneously executes many programs
• Category for most multiprocessor and multicomputer systems