Lect02.LecJan12 2006.PipelineProcessor

CSL718 : Pipelined Processors
Pipeline Timings
12th Jan, 2006
Anshul Kumar, CSE IITD
Pipelined Processors
Parallel architectures
  Function-parallel
    Instr level (ILP)
      Pipelined processors
      VLIWs
      Superscalar processors
    Thread level
    Process level
  Data-parallel
Intel's terminology: intra ILP, inter ILP
slide 2
Processor Performance
MIPS and MFLOPS: may not truly represent performance
Execution time of a program: true measure of performance
SPEC rating: acceptable
slide 3
Execution Time and Clock Period

Instruction execution time = Tinst = CPI * t
[Figure: pipeline stages IF, RF, EX/AG, M, WB, with the clock period t marked]
Program exec time = Tprog = N * Tinst
                          = N * CPI * t
N: Number of instructions
CPI: Cycles per instruction (average)
t: Clock cycle time
slide 4
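As a quick numerical illustration of Tprog = N * CPI * t, the following minimal sketch evaluates the formula. The function name and the workload numbers are hypothetical and are not taken from the lecture.

```python
def program_exec_time(n_instructions, cpi, clock_period_ns):
    """Return Tprog = N * CPI * t in nanoseconds."""
    return n_instructions * cpi * clock_period_ns


# Hypothetical workload: 1 million instructions, average CPI of 1.5, 2 ns clock.
t_prog_ns = program_exec_time(1_000_000, 1.5, 2.0)
print(f"Tprog = {t_prog_ns / 1e6:.1f} ms")
```

Halving any one of N, CPI or t halves Tprog, which is why the three factors are traded off against each other.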
What influences clock period?

Tprog = N * CPI * t
Technology - t
Software - N
Architecture - N * CPI * t
  Instruction set architecture (ISA): trade-off N vs CPI * t
  Micro architecture (μA): trade-off CPI vs t
slide 5
Determining Clock Period

[Figure: Reg -> Comb -> Reg with a common Clock; Pmax marked]
Clock Period = t = Pmax
Pmax = max propagation delay
slide 6
Ideal Pipelining
Tinst
S stages
t = Tinst / S
CPI = 1
Effective time per inst Teff = 1 * Tinst / S
slide 7
Pipelining with hazards

Tinst
S stages
Frequency of interruptions - b
t = Tinst / S
CPI = 1 + (S - 1) * b
Teff = (1 + (S - 1) * b) * Tinst / S
slide 8
[Plot: Teff vs. S (Tinst=10, b=.2); x-axis S from 1 to 10, y-axis Teff from 0 to 12]
slide 9
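The data behind the plot above can be regenerated from the hazard formula Teff = (1 + (S - 1) * b) * Tinst / S. This is a minimal sketch using the plot's parameters (Tinst=10, b=.2); the function and variable names are illustrative only.

```python
def teff_with_hazards(stages, t_inst, b):
    """Effective time per instruction with hazards and no clocking overhead."""
    cpi = 1 + (stages - 1) * b      # each interruption costs S - 1 cycles
    clock = t_inst / stages         # ideal clock period
    return cpi * clock


T_INST, B = 10, 0.2
table = [(s, teff_with_hazards(s, T_INST, B)) for s in range(1, 11)]
for s, teff in table:
    print(f"S={s:2d}  Teff={teff:.2f}")
```

Without clocking overhead Teff keeps falling as S grows and approaches b * Tinst, so this model alone never yields a finite optimum.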
A more realistic view

[Figure: Reg -> Comb -> Reg with a common Clock; Pmax marked]
t = Pmax + C
Pmax = max propagation delay
C = clocking overhead
slide 10
Clocking Overhead
Fixed overhead c
  Setup time
  Output delay
Variable overhead (stretching factor) k
  Clock skew
t = Tinst / S + k * Tinst / S + c
  = (1 + k) * Tinst / S + c
slide 11
Pipelining with Clocking Overhead

Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]
Sopt = sqrt[(1 - b) * (1 + k) * Tinst / (b * c)]
slide 12
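The optimum stage count follows from minimizing Teff over S. The sketch below treats S as a continuous variable, which is an assumption made for the derivation only, since real stage counts are integers.

```latex
\begin{aligned}
T_{\mathrm{eff}}(S) &= \bigl[1 + (S-1)\,b\bigr]\left[\frac{(1+k)\,T_{\mathrm{inst}}}{S} + c\right] \\
&= \frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{S} + b\,c\,S + (1-b)\,c + b\,(1+k)\,T_{\mathrm{inst}} \\
\frac{dT_{\mathrm{eff}}}{dS} &= -\frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{S^{2}} + b\,c = 0 \\
S_{\mathrm{opt}} &= \sqrt{\frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{b\,c}}
\end{aligned}
```

The 1/S term shrinks with deeper pipelining while the b*c*S term grows, so the minimum lies where the two balance.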
[Plot: Teff vs. S (Tinst=10, b=.2, k=.1, c=1); x-axis S from 1 to 15, y-axis Teff from 0 to 15]
slide 13
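With clocking overhead included, Teff has a minimum at a finite S. The following sketch scans integer stage counts for the plot's parameters (Tinst=10, b=.2, k=.1, c=1); the helper names are illustrative, and the scan range 1 to 15 simply mirrors the plot's x-axis.

```python
def teff_with_overhead(stages, t_inst, b, k, c):
    """Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]."""
    cpi = 1 + (stages - 1) * b
    clock = (1 + k) * t_inst / stages + c
    return cpi * clock


T_INST, B, K, C = 10, 0.2, 0.1, 1
curve = {s: teff_with_overhead(s, T_INST, B, K, C) for s in range(1, 16)}
best_s = min(curve, key=curve.get)   # integer S with the smallest Teff
print(f"best integer S = {best_s}, Teff = {curve[best_s]:.3f}")
```

The curve is quite flat around the minimum, so neighbouring stage counts cost only slightly more.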
Partitioning instruction into cycles with non-uniform stage times
[Diagram: IF, RF, AG, T, DF, EX, PA drawn with unequal durations]
One action - one pipeline stage => large quantization overhead
Multiple actions per stage?
Multiple stages per action?
slide 14
Example
Action        Delay
Put Away      2 ns
Execute       7+7+8 ns
Data -> ALU   3 ns
Cache Data    10 ns
Cache Dir     6 ns
Addr -> MAR   3 ns
Gen Addr      9 ns
Decode        6+6 ns
Data -> IR    3 ns
Cache Data    10 ns
Cache Dir     6 ns
PC -> MAR     4 ns
slide 15
Optimal Pipelining
Tinst = 4+6+10+3+12+9+3+6+10+3+22+2
      = 90 ns
b = 0.2
c = 4 ns
k = 5%
Sopt = sqrt[(1 - b) * (1 + k) * Tinst / (b * c)]
     = 9.7, taken as 9
Tseg = 10 ns
slide 16
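The arithmetic on this slide can be checked with a short script. It is a sketch that assumes Sopt is the square root of the bracketed expression, which is the reading that reproduces the quoted 9.7; the list of delays is copied from the Example slide, and the names are illustrative.

```python
import math


def s_opt(t_inst_ns, b, k, c_ns):
    """Continuous optimum stage count: sqrt((1 - b)(1 + k) Tinst / (b c))."""
    return math.sqrt((1 - b) * (1 + k) * t_inst_ns / (b * c_ns))


# Action delays in ns, from PC -> MAR through Put Away.
delays_ns = [4, 6, 10, 3, 6 + 6, 9, 3, 6, 10, 3, 7 + 7 + 8, 2]
t_inst = sum(delays_ns)
s = s_opt(t_inst, b=0.2, k=0.05, c_ns=4)
print(f"Tinst = {t_inst} ns, Sopt = {s:.2f}")
```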
Example
[Figure: the same actions and delays as on slide 15, partitioned into pipeline stages]
Tseg = 10 ns
S = 10
t = 14.5 ns
S * t = 145 ns
slide 17
Example
[Figure: the same actions and delays as on slide 15, partitioned into pipeline stages]
Tseg = 13 ns
S = 9
t = 17.65 ns
S * t = 159 ns
slide 18
Example
[Figure: the same actions and delays as on slide 15, partitioned into pipeline stages]
Tseg = 20 ns
S = 5
t = 25 ns
S * t = 125 ns
slide 19
Comparison
S    Tseg (ns)   t (ns)   S * t (ns)   Teff (ns)
9    13          17.65    159          45.89
10   10          14.50    145          40.60
5    20          25.00    125          45.00
slide 20
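The comparison figures can be reproduced from the segment times. This sketch assumes t = (1 + k) * Tseg + c and Teff = [1 + (S - 1) * b] * t with b = 0.2, k = 5% and c = 4 ns from the Optimal Pipelining slide; that assumption matches the quoted values, but the slides do not state it explicitly for this table.

```python
B, K, C_NS = 0.2, 0.05, 4   # hazard frequency, stretching factor, fixed overhead


def clock_period(t_seg_ns):
    """Clock period for a given longest segment time (assumed relation)."""
    return (1 + K) * t_seg_ns + C_NS


def teff(stages, t_ns):
    """Effective time per instruction for S stages at clock period t."""
    return (1 + (stages - 1) * B) * t_ns


partitions = [(9, 13), (10, 10), (5, 20)]   # (S, Tseg in ns) from the examples
rows = []
for s, t_seg in partitions:
    t = clock_period(t_seg)
    rows.append((s, t_seg, t, s * t, teff(s, t)))
    print(f"S={s:2d}  Tseg={t_seg:2d}  t={t:5.2f}  S*t={s * t:6.2f}  Teff={teff(s, t):5.2f}")
```

The 10-stage partition gives the lowest Teff even though the 5-stage one has the smallest S * t, because Teff also depends on the hazard penalty.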
Cycle Quantization
Delays are not integral multiples of the clock period
Total overhead = clocking overhead + quantization overhead
S * t ≥ Tinst + S * C (ignoring k)
quantization overhead = S * (t - C) - Tinst
reduces as clock period becomes small
slide 21
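To make the two overhead terms concrete, the sketch below evaluates them for the three partitions of the earlier example (Tinst = 90 ns, C = 4 ns). As on the slide, k is ignored, so the clock period is taken as t = Tseg + C; the function name is illustrative.

```python
T_INST_NS = 90
C_NS = 4


def overheads(stages, t_seg_ns):
    """Return (clocking overhead, quantization overhead) in ns, with k ignored."""
    t = t_seg_ns + C_NS
    clocking = stages * C_NS
    quantization = stages * (t - C_NS) - T_INST_NS
    return clocking, quantization


results = {s: overheads(s, t_seg) for s, t_seg in [(9, 13), (10, 10), (5, 20)]}
for s, (clk, quant) in results.items():
    print(f"S={s:2d}  clocking={clk} ns  quantization={quant} ns")
```

In this example the 9-stage partition with Tseg = 13 ns wastes the most time on quantization, which is consistent with its poor showing in the comparison table.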
Other Timing Approaches

Self Timed Circuits
  No centralized free running clock
  An operation begins as soon as its inputs are available, that is, all its predecessors have completed
  Higher speed, lower power consumption
Wave Pipelining
  Omit inter-stage registers
  Reduced clocking overhead
slide 22
Conventional vs Wave Pipelining

Conventional Pipeline
  Registers separate adjoining stages
  Clock period > max prop delay
  Inter-stage data stored in registers
Wave Pipeline
  No registers between adjoining stages
  Clock period less than max prop delay
  Waves of data propagate through combinational network (effectively, data is stored in the combinational circuit delay!)
slide 23
No pipelining
[Figure: block diagram (Reg, Clock) and timing waveforms for signals X and Y]
slide 24
Conventional pipelining
[Figure: block diagram (Reg, Clock) and timing waveforms for signals X, Y, Z and W]
Wave pipelining
[Figure: block diagram (Reg, Clock) with signals X, Z and W]
slide 26
Timing
[Figure: Reg -> Comb ckt -> Reg with a common Clock; signals X and Y]
T ≥ p + s
T: clock period
p: propagation delay
s: set-up time
slide 27
Timing with clock skew

[Figure: Reg -> Comb ckt -> Reg with a common Clock; signals X and Y; clock period T and set-up time s marked]
Clock skew = δ
T ≥ p + s + 2δ
slide 28
Variation in propagation delay

Different delays in different paths
Delay variation due to process / temperature / power variations
Data-dependent delay variations
slide 29
Timing for wave pipelining

[Figure: Reg -> Comb ckt -> Reg with a common Clock; signals X and Y; clock period T, pmin and pmax marked]
T ≥ Δp + s + 4δ, where Δp = pmax - pmin
slide 30
Timing for wave pipelining

(expanded view)
[Timing diagram: output Y against the clock, with T, (n-1)T, nT, pmin and pmax marked]
pmin ≥ (n-1) T + 2δ
nT ≥ pmax + s + 2δ
T ≥ Δp + s + 4δ
slide 31
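The bound on the clock period can be obtained by combining the two constraints of the expanded view. The sketch below assumes they read pmin ≥ (n-1)T + 2δ and nT ≥ pmax + s + 2δ, with δ denoting the clock skew and Δp = pmax - pmin; these symbols are assumptions about the slide's notation.

```latex
\begin{aligned}
nT &\ge p_{\max} + s + 2\delta \\
(n-1)\,T &\le p_{\min} - 2\delta \\
T = nT - (n-1)\,T &\ge (p_{\max} - p_{\min}) + s + 4\delta = \Delta p + s + 4\delta
\end{aligned}
```

The clock period is therefore limited by the spread of path delays rather than by the longest delay itself, which is why balancing delays matters so much for wave pipelining.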
Comparison
Conventional Pipeline
  T ≥ pmax/n + s + 2δ (plus cycle quantization overhead)
  nT ≥ pmax + ns + 2nδ
Wave Pipeline
  T ≥ Δp + s + 4δ
  nT ≥ pmax + s + 2δ
slide 32
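The two sets of constraints can be compared numerically. The sketch below assumes the wave-pipeline conditions nT ≥ pmax + s + 2δ and (n-1)T ≤ pmin - 2δ, with δ as the clock skew, and uses made-up delay values in ns; the function names and numbers are hypothetical.

```python
def conventional_min_period(p_max, s, skew, n):
    """Lower bound T >= pmax/n + s + 2*skew (quantization overhead ignored)."""
    return p_max / n + s + 2 * skew


def wave_period_window(p_min, p_max, s, skew, n):
    """Return the allowed (T_low, T_high) for a given n, or None if empty."""
    low = (p_max + s + 2 * skew) / n          # from n*T >= pmax + s + 2*skew
    if n == 1:
        return (low, float("inf"))            # no earlier wave to collide with
    high = (p_min - 2 * skew) / (n - 1)       # from (n-1)*T <= pmin - 2*skew
    return (low, high) if low <= high else None


# Hypothetical delays in ns.
P_MIN, P_MAX, S, SKEW = 18.0, 20.0, 1.0, 0.5
for n in range(1, 6):
    conv = conventional_min_period(P_MAX, S, SKEW, n)
    wave = wave_period_window(P_MIN, P_MAX, S, SKEW, n)
    print(f"n={n}  conventional T >= {conv:.2f}  wave window = {wave}")
```

With these numbers the wave pipeline reaches T = 5.5 ns at n = 4 against 7 ns for the conventional one, but its usable window narrows as n grows and vanishes at n = 5.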
Problems with wave pipelining
Need to balance delays

Narrow range of clock frequencies
Control difficult
Not very suitable for non-linear pipelines
slide 33
Additional Reading
Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, "Wave-Pipelining: A Tutorial and Research Survey," IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464-474.
slide 34
