
CSL718 : Pipelined Processors

Pipeline Timings
12th Jan, 2006

Anshul Kumar, CSE IITD

Pipelined Processors
Parallel architectures
  Data-parallel
  Function-parallel
    Instruction level (ILP)
      Pipelined processors
      VLIWs
      Superscalar processors
    Thread level
    Process level
Intel's terminology: intra ILP, inter ILP

Processor Performance
MIPS and MFLOPS: may not truly represent performance
Execution time of a program: true measure of performance
SPEC rating: acceptable


Execution Time and Clock Period


Instruction execution time = Tinst = CPI * t
(Figure: instruction stages IF, RF, EX/AG, M, WB, with the clock period t marked)
Program exec time = Tprog = N * Tinst = N * CPI * t
N: Number of instructions
CPI: Cycles per instruction (average)
t: Clock cycle time
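As a quick numeric check of Tprog = N * CPI * t, here is a minimal Python sketch. The function name and the sample workload (one million instructions, average CPI of 1.5, 2 ns clock) are illustrative assumptions, not values from the slides.

```python
def program_exec_time(n_instructions, cpi, clock_period):
    """Return Tprog = N * CPI * t, in the same time unit as clock_period."""
    return n_instructions * cpi * clock_period


# Hypothetical workload: 1e6 instructions, average CPI 1.5, 2 ns clock.
t_prog_ns = program_exec_time(1_000_000, 1.5, 2.0)
print(f"Tprog = {t_prog_ns / 1e6:.1f} ms")
```

Halving any one of N, CPI or t halves Tprog, which is why the three factors can be traded against each other.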

What influences clock period?


Tprog = N * CPI * t
Technology - t
Software - N
Architecture - N * CPI * t
  Instruction set architecture (ISA): trade-off N vs CPI * t
  Micro architecture (µA): trade-off CPI vs t

Determining Clock Period


(Figure: Reg -> Comb -> Reg, both registers driven by Clock, with Pmax marked across the combinational logic)
Clock Period = t = Pmax
Pmax = max propagation delay

Ideal Pipelining
(Figure: an instruction of duration Tinst divided into S stages)
t = Tinst / S
CPI = 1
Effective time per inst Teff = 1 * Tinst / S

Pipelining with hazards


(Figure: an instruction of duration Tinst divided into S stages)
Frequency of interruptions - b

t = Tinst / S
CPI = 1 + (S - 1) * b
Teff = (1 + (S - 1) * b) * Tinst / S
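
The hazard model above can be evaluated directly. The following is a minimal Python sketch; the function name is an assumption, and the parameters Tinst = 10 and b = 0.2 are the ones quoted for the Teff vs. S plot.

```python
def teff_with_hazards(t_inst, stages, b):
    """Effective time per instruction: (1 + (S - 1) * b) * Tinst / S."""
    cpi = 1 + (stages - 1) * b      # each interruption costs S - 1 cycles
    clock = t_inst / stages         # ideal clock period, no overhead
    return cpi * clock


# Tabulate Teff for S = 1..10 with Tinst = 10 and b = 0.2.
for s in range(1, 11):
    print(s, round(teff_with_hazards(10, s, 0.2), 2))
```

Rearranged, Teff = b * Tinst + (1 - b) * Tinst / S, so in this model Teff keeps falling with S but never drops below b * Tinst.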

(Figure: plot of Teff vs. S for S = 1 to 10, with Tinst = 10 and b = 0.2; the Teff axis runs from 0 to 12)

A more realistic view


(Figure: Reg -> Comb -> Reg, both registers driven by Clock, with Pmax marked across the combinational logic)
t = Pmax + C
Pmax = max propagation delay
C = clocking overhead

Clocking Overhead
Fixed overhead c
  Setup time
  Output delay
Variable overhead (stretching factor) k
  Clock skew
t = Tinst / S + k * Tinst / S + c
  = (1 + k) * Tinst / S + c

Pipelining with Clocking Overhead


$T_{eff} = [1 + (S - 1)\,b] \cdot [(1 + k)\,T_{inst}/S + c]$

$S_{opt} = \sqrt{(1 - b)(1 + k)\,T_{inst} / (b\,c)}$
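
The optimum stage count is the square root of the bracketed expression. It follows from minimizing Teff over S, treating S as a continuous variable. A short derivation using the slide's own symbols:

```latex
\begin{aligned}
T_{eff}(S) &= \bigl[(1-b) + bS\bigr]\left[\frac{(1+k)\,T_{inst}}{S} + c\right] \\
           &= \frac{(1-b)(1+k)\,T_{inst}}{S} + (1-b)\,c + b\,(1+k)\,T_{inst} + b\,c\,S \\
\frac{dT_{eff}}{dS} &= -\frac{(1-b)(1+k)\,T_{inst}}{S^{2}} + b\,c = 0 \\
S_{opt} &= \sqrt{\frac{(1-b)(1+k)\,T_{inst}}{b\,c}}
\end{aligned}
```

The second derivative is positive for S > 0, so this stationary point is a minimum.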


(Figure: plot of Teff vs. S for S = 1 to 15, with Tinst = 10, b = 0.2, k = 0.1, c = 1; the Teff axis runs from 0 to 15)
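
The curve can be reproduced numerically with the plot's parameters (Tinst = 10, b = 0.2, k = 0.1, c = 1). The Python sketch below is illustrative only; the function names are assumptions, and the search range 1..15 simply mirrors the plot's S axis.

```python
import math


def teff_with_overhead(t_inst, stages, b, k, c):
    """Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]."""
    cpi = 1 + (stages - 1) * b
    clock = (1 + k) * t_inst / stages + c
    return cpi * clock


def s_opt(t_inst, b, k, c):
    """Continuous optimum: sqrt((1 - b) * (1 + k) * Tinst / (b * c))."""
    return math.sqrt((1 - b) * (1 + k) * t_inst / (b * c))


T_INST, B, K, C = 10, 0.2, 0.1, 1
# Pick the integer stage count with the smallest Teff over the plotted range.
best = min(range(1, 16), key=lambda s: teff_with_overhead(T_INST, s, B, K, C))
print(f"continuous Sopt = {s_opt(T_INST, B, K, C):.2f}, best integer S = {best}")
```

Unlike the overhead-free model, Teff now has an interior minimum: past it, the fixed overhead c per stage outweighs the gain from a shorter clock.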

Partitioning instruction into cycles with non-uniform stage times

(Figure: instruction actions IF, RF, AG, T, DF, EX, PA of unequal duration)
One action - one pipeline stage => large quantization overhead
Multiple actions per stage?
Multiple stages per action?

Example

Action (in execution order)   Time
PC -> MAR                     4 ns
Cache Dir                     6 ns
Cache Data                    10 ns
Data -> IR                    3 ns
Decode                        6+6 ns
Gen Addr                      9 ns
Addr -> MAR                   3 ns
Cache Dir                     6 ns
Cache Data                    10 ns
Data -> ALU                   3 ns
Execute                       7+7+8 ns
Put Away                      2 ns

Optimal Pipelining
Tinst = 4+6+10+3+12+9+3+6+10+3+22+2 = 90 ns
b = 0.2
c = 4 ns
k = 5%
$S_{opt} = \sqrt{(1 - b)(1 + k)\,T_{inst} / (b\,c)} \approx 9.7 \rightarrow 9$
Tseg = 10 ns
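
The figures on this slide can be checked with a few lines of Python. This is a sketch only: the latencies are the twelve terms of the Tinst sum, and Sopt is taken as the square root of (1 - b)(1 + k) Tinst / (b c), which is the form that reproduces the quoted 9.7.

```python
import math

# Action latencies in ns, in execution order, as summed on the slide.
latencies_ns = [4, 6, 10, 3, 12, 9, 3, 6, 10, 3, 22, 2]
t_inst = sum(latencies_ns)

b, c, k = 0.2, 4.0, 0.05  # interruption frequency, fixed overhead (ns), stretch
s_opt = math.sqrt((1 - b) * (1 + k) * t_inst / (b * c))

print(f"Tinst = {t_inst} ns, Sopt = {s_opt:.2f}")
print(f"Tseg for S = 9: {t_inst / 9:.1f} ns")
```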

Example

(Figure: the twelve actions of the first Example, from PC -> MAR to Put Away, partitioned into segments of at most Tseg)
Tseg = 10 ns
S = 10
t = 14.5 ns
S * t = 145 ns

Example

(Figure: the twelve actions of the first Example, from PC -> MAR to Put Away, partitioned into segments of at most Tseg)
Tseg = 13 ns
S = 9
t = 17.65 ns
S * t = 159 ns

Example

(Figure: the twelve actions of the first Example, from PC -> MAR to Put Away, partitioned into segments of at most Tseg)
Tseg = 20 ns
S = 5
t = 25 ns
S * t = 125 ns

Comparison
S     Tseg (ns)   t (ns)   S * t (ns)   Teff (ns)
9     13          17.65    159          45.89
10    10          14.50    145          40.60
5     20          25.00    125          45.00
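
The comparison can be reproduced with a short Python sketch. Two points are assumptions, not statements from the slides: consecutive sub-actions are packed greedily into segments of at most Tseg (with Decode and Execute split into their listed parts), and the clock period is taken as t = (1 + k) * Tseg + c, which matches the quoted values of t.

```python
def partition(latencies, t_seg):
    """Greedily pack consecutive actions into stages of at most t_seg."""
    stages, current = [], []
    for d in latencies:
        if d > t_seg:
            raise ValueError("action longer than segment time")
        if current and sum(current) + d > t_seg:
            stages.append(current)   # close the stage before it overflows
            current = []
        current.append(d)
    if current:
        stages.append(current)
    return stages


# Sub-action latencies in ns, execution order (Decode = 6+6, Execute = 7+7+8).
ACTIONS = [4, 6, 10, 3, 6, 6, 9, 3, 6, 10, 3, 7, 7, 8, 2]
B, K, C = 0.2, 0.05, 4.0


def evaluate(t_seg):
    """Return (S, t, S * t, Teff) for a given segment time."""
    s = len(partition(ACTIONS, t_seg))
    t = (1 + K) * t_seg + C
    teff = (1 + (s - 1) * B) * t
    return s, t, s * t, teff


for t_seg in (13, 10, 20):
    s, t, total, teff = evaluate(t_seg)
    print(f"Tseg={t_seg:2d}  S={s:2d}  t={t:.2f}  S*t={total:.2f}  Teff={teff:.2f}")
```

Under these assumptions the 10 ns partition gives the lowest Teff of the three, even though it has the most stages.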


Cycle Quantization
Delays are not integral multiples of the clock period
Total overhead = clocking overhead + quantization overhead
$S \cdot t \ge T_{inst} + S \cdot C$ (ignoring k)
quantization overhead = S * (t - C) - Tinst
reduces as clock period becomes small
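
As an illustration, the quantization overhead formula can be applied to the three partitions from the Comparison slide (Tinst = 90 ns, C = 4 ns). In this Python sketch the function name is an assumption, and k is ignored as in the formula, so t is taken as Tseg + C.

```python
def quantization_overhead(stages, clock_period, c, t_inst):
    """Quantization overhead = S * (t - C) - Tinst."""
    return stages * (clock_period - c) - t_inst


T_INST, C = 90, 4  # ns
for stages, t_seg in ((9, 13), (10, 10), (5, 20)):
    t = t_seg + C  # clock period with k ignored
    overhead = quantization_overhead(stages, t, C, T_INST)
    print(f"S={stages:2d}  Tseg={t_seg:2d} ns  quantization overhead={overhead} ns")
```

The overhead is the total slack left in stages that the actions do not fill completely.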

Other Timing Approaches


Self Timed Circuits
  No centralized free running clock
  An operation begins as soon as its inputs are available, that is, all its predecessors have completed
  Higher speed, lower power consumption

Wave Pipelining
  Omit inter-stage registers
  Reduced clocking overhead

Conventional vs Wave Pipelining


Conventional Pipeline
  Registers separate adjoining stages
  Clock period > max prop delay
  Inter-stage data stored in registers

Wave Pipeline
  No registers between adjoining stages
  Clock period less than max prop delay
  Waves of data propagate through combinational network (effectively, data is stored in the combinational circuit delay!)

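No pipelining
(Figure: input and output registers around the combinational logic, with a clock timing diagram showing data X and Y)

Conventional pipelining
(Figure: registers between stages, with a clock timing diagram showing data X, Y, Z and W)

Wave pipelining
(Figure: input and output registers only, with a clock timing diagram)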

Timing
(Figure: Reg -> Comb ckt -> Reg driven by Clock, with a timing diagram of X and Y)
$T \ge p + s$
T: clock period
p: propagation delay
s: set-up time

Timing with clock skew


(Figure: Reg -> Comb ckt -> Reg driven by a skewed Clock, with a timing diagram of X and Y showing period T and set-up time s)
Clock skew = $\Delta$
$T \ge p + s + 2\Delta$

Variation in propagation delay


Different delays in different paths
Delay variation due to process / temperature / power variations
Data-dependent delay variations

Timing for wave pipelining


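(Figure: Reg -> Comb ckt -> Reg driven by Clock; after input X changes, output Y settles between pmin and pmax)
$T \ge \Delta p + s + 4\Delta$, where $\Delta p = p_{max} - p_{min}$ and $\Delta$ is the clock skew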

Timing for wave pipelining


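(expanded view)
(Figure: timing diagram over n clock periods; output Y changes between pmin and pmax, captured at time nT)
$p_{min} \ge (n-1)\,T + 2\Delta$
$n\,T \ge p_{max} + s + 2\Delta$
$T \ge \Delta p + s + 4\Delta$, where $\Delta p = p_{max} - p_{min}$ and $\Delta$ is the clock skew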
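
The bound on T follows from combining the two constraints on the n-th clock edge. A short derivation, with $\Delta$ denoting the clock skew and $\Delta p = p_{max} - p_{min}$:

```latex
\begin{aligned}
n\,T &\ge p_{max} + s + 2\Delta
  && \text{(slowest path settles before capture)} \\
(n-1)\,T &\le p_{min} - 2\Delta
  && \text{(fastest path does not arrive one edge early)} \\
T = n\,T - (n-1)\,T &\ge (p_{max} - p_{min}) + s + 4\Delta
  = \Delta p + s + 4\Delta
\end{aligned}
```

The clock period is therefore limited by the spread of path delays rather than by the longest delay itself.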

Comparison
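Conventional Pipeline
  $T \ge p_{max}/n + s + 2\Delta$ (plus cycle quantization overhead)
  $n\,T \ge p_{max} + n\,s + 2n\,\Delta$

Wave Pipeline
  $T \ge \Delta p + s + 4\Delta$
  $n\,T \ge p_{max} + s + 2\Delta$

Here $\Delta p = p_{max} - p_{min}$ and $\Delta$ is the clock skew.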
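
To make the comparison concrete, the Python sketch below evaluates both bounds and the range of clock periods that keeps n waves in flight. The function names and all delay values are hypothetical, chosen only for illustration; the inequalities are those of the timing slides, with the skew written as `skew`.

```python
def conventional_min_period(p_max, s, skew, n):
    """T >= pmax / n + s + 2 * skew for an n-stage registered pipeline."""
    return p_max / n + s + 2 * skew


def wave_min_period(p_max, p_min, s, skew):
    """T >= (pmax - pmin) + s + 4 * skew for a wave pipeline."""
    return (p_max - p_min) + s + 4 * skew


def wave_period_window(p_max, p_min, s, skew, n):
    """Range of T satisfying n*T >= pmax + s + 2*skew and
    pmin >= (n - 1)*T + 2*skew, or None if no such T exists."""
    low = (p_max + s + 2 * skew) / n
    high = float("inf") if n == 1 else (p_min - 2 * skew) / (n - 1)
    return (low, high) if low <= high else None


# Hypothetical delays in ns (assumed values, not from the slides).
P_MAX, P_MIN, S_SETUP, SKEW, N = 40.0, 36.0, 1.0, 0.5, 4
print("conventional T >=", conventional_min_period(P_MAX, S_SETUP, SKEW, N))
print("wave T >=", wave_min_period(P_MAX, P_MIN, S_SETUP, SKEW))
print("wave window for n = 4:", wave_period_window(P_MAX, P_MIN, S_SETUP, SKEW, N))
```

With these numbers the usable window for n = 4 spans only about 1.2 ns, and for n = 7 it is empty, which shows how tightly the path delays and clock must be matched.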

Problems with wave pipelining

Need to balance delays


Narrow range of clock frequencies
Control difficult
Not very suitable for non-linear pipelines


Additional Reading
Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, "Wave-Pipelining: A Tutorial and Research Survey," IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464-474.
