Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders
Chung-Kuan Cheng
Computer Science and Engineering Department
University of California, San Diego
Outline
Prefix Adder Problem
Background & Previous Work
Extensions: High-radix, Ling
Our Work
Area/Timing/Power Models
Mixed-Radix (2,3,4) Adders
ILP Formulation
Experimental Results
Future Work

Prefix Adder Challenges
Area: logical levels, wire tracks, fanouts, physical placement, detail routing
Timing: gate cap, wire cap, gate sizing, buffer insertion, signal slope, input arrival time, output required time
Power: static power, dynamic power, power gating, activity probability
New design scope: increasing impact of physical design and concern of power.
Binary Addition
Input: two n-bit binary numbers $a_{n-1} \ldots a_1 a_0$ and $b_{n-1} \ldots b_1 b_0$, one-bit carry-in $c_0$
Output: n-bit sum $s_{n-1} \ldots s_1 s_0$ and one-bit carry-out $c_n$
Prefix Addition: Carry generation & propagation
Generate: $g_i = a_i b_i$
Propagate: $p_i = a_i \oplus b_i$
$c_{i+1} = g_i + p_i c_i$
$s_i = (a_i \oplus b_i) \oplus c_i$
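The carry recurrence can be checked bit by bit with a short Python sketch. The function name `ripple_add` and the integer-based bit access are illustrative assumptions, not part of the slides.

```python
def ripple_add(a, b, n, c0=0):
    """Add two n-bit integers using generate/propagate signals."""
    carry = c0
    total = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        g_i = a_i & b_i              # generate:  g_i = a_i b_i
        p_i = a_i ^ b_i              # propagate: p_i = a_i xor b_i
        s_i = p_i ^ carry            # s_i = a_i xor b_i xor c_i
        carry = g_i | (p_i & carry)  # c_{i+1} = g_i + p_i c_i
        total |= s_i << i
    return total, carry              # (n-bit sum, carry-out c_n)
```

For example, `ripple_add(11, 6, 4)` returns the 4-bit sum and the carry-out of 11 + 6.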
Prefix Addition Formulation
Pre-processing: $g_i = a_i b_i$, $p_i = a_i \oplus b_i$
Prefix Computation:
$G_{[i:k]} = G_{[i:j]} + P_{[i:j]} G_{[j-1:k]}$
$P_{[i:k]} = P_{[i:j]} P_{[j-1:k]}$
Post-processing: $s_i = p_i \oplus c_i$, $c_{i+1} = G_{[i:0]} + P_{[i:0]} c_0$
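The three stages can be sketched in Python as follows. This is a minimal sketch that evaluates the prefix stage serially; a real prefix adder applies the same associative GP operator in a tree. The names `gp_combine` and `prefix_add` are assumptions made for illustration.

```python
def gp_combine(left, right):
    """Return (G, P)[i:k] from left = (G, P)[i:j] and right = (G, P)[j-1:k]."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r

def prefix_add(a, b, n, c0=0):
    # Pre-processing: g_i = a_i b_i, p_i = a_i xor b_i
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    # Prefix computation: prefix[i] = (G[i:0], P[i:0]), built serially here
    prefix = []
    for i in range(n):
        prefix.append(gp[i] if i == 0 else gp_combine(gp[i], prefix[i - 1]))
    # Post-processing: c_{i+1} = G[i:0] + P[i:0] c_0, s_i = p_i xor c_i
    carries = [c0] + [g | (p & c0) for g, p in prefix]
    total = 0
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i
    return total, carries[n]
```

Because the GP operator is associative, any grouping of adjacent intervals gives the same carries, which is the freedom the prefix structures exploit.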
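Prefix Adder: Prefix Structure Graph
[Figure: 4-bit prefix structure graph with inputs 1 to 4 and outputs 1, 2:1, 3:1, 4:1. Legend: gp generator (pre-processing; signals a_i, b_i, gp_i), GP cell (prefix computation; combines GP[i, j] and GP[j-1, k] into GP[i, k]), sum generator (post-processing; signals p_i, G[i:0], s_i).]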
Previous Works: Classical prefix adders
[Figures: 8-bit prefix graphs with inputs 1 to 8 and outputs 1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1.]
Brent-Kung: logical levels $2\log_2 n - 1$, max fanouts 2, wire tracks 1
Kogge-Stone: logical levels $\log_2 n$, max fanouts 2, wire tracks $n/2$
Sklansky: logical levels $\log_2 n$, max fanouts $n/2$, wire tracks 1
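The level counts can be reproduced with a small Python sketch that builds the three networks for a power-of-two width and checks that every column ends up covering its full prefix. Columns are 0-based here, the generator and `check` function names are illustrative, and only levels and GP-cell counts are reported because fanout and wire-track counts depend on drawing conventions.

```python
def kogge_stone(n):
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels

def sklansky(n):
    levels, d = [], 1
    while d < n:
        levels.append([(i, (i // d) * d - 1) for i in range(n) if (i // d) % 2 == 1])
        d *= 2
    return levels

def brent_kung(n):
    levels, d = [], 1
    while d < n:                      # up-sweep
        levels.append([(i, i - d) for i in range(2 * d - 1, n, 2 * d)])
        d *= 2
    d = n // 4
    while d >= 1:                     # down-sweep
        levels.append([(i, i - d) for i in range(3 * d - 1, n, 2 * d)])
        d //= 2
    return levels

def check(n, levels):
    """Verify the network and return (logical levels, number of GP cells)."""
    span = [(c, c) for c in range(n)]  # interval [hi, lo] held by each column
    for cells in levels:
        new_span = list(span)
        for i, j in cells:             # cell in column i, lower fan-in from column j
            assert span[i][1] == span[j][0] + 1, "fan-in intervals must be adjacent"
            new_span[i] = (i, span[j][1])
        span = new_span
    assert all(span[c] == (c, 0) for c in range(n))
    return len(levels), sum(len(cells) for cells in levels)
```

For n = 8 this gives 5 levels for Brent-Kung and 3 for both Kogge-Stone and Sklansky, matching $2\log_2 n - 1$ and $\log_2 n$.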
High-Radix Adders
Each cell has more than two fan-ins
Pros: fewer logic levels
6 levels (radix-2) vs. 3 levels (radix-4) for 64-bit addition
Cons: larger delay and power in each cell
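The level counts quoted above follow from the minimum tree depth $\lceil \log_r n \rceil$ for radix r. A short Python sketch, with the hypothetical helper name `prefix_levels`, uses integer arithmetic to avoid floating-point logarithms:

```python
def prefix_levels(n_bits, radix):
    """Minimum number of logic levels when each cell merges up to `radix` inputs."""
    levels, span = 0, 1
    while span < n_bits:
        span *= radix   # one more level multiplies the reachable span by the radix
        levels += 1
    return levels

print(prefix_levels(64, 2), prefix_levels(64, 4))
```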
Radix-3 Sklansky & Kogge-Stone Adder
David Harris, "Logical Effort of Higher Valency Adders"
Ling Adders
Prefix:
Pre-processing: $g_i = a_i b_i$, $p_i = a_i \oplus b_i$
Prefix Computation:
$G_{[i:k]} = G_{[i:j]} + P_{[i:j]} G_{[j-1:k]}$
$P_{[i:k]} = P_{[i:j]} P_{[j-1:k]}$
Post-processing: $s_i = p_i \oplus c_i$, $c_i = G_{[i-1:0]} + P_{[i-1:0]} c_0$
Ling:
Pre-processing: $g_i = a_i b_i$, $p_i = a_i + b_i$, $t_i = a_i \oplus b_i$
$G^*_{[i:i-1]} = g_i + g_{i-1}$, $P^*_{[i:i-1]} = p_i p_{i-1}$
Prefix Computation:
$G^*_{[i:k]} = G^*_{[i:j]} + P^*_{[i-1:j-1]} G^*_{[j-1:k]}$
$P^*_{[i:k]} = P^*_{[i:j]} P^*_{[j-1:k]}$
Post-processing: $s_i = \overline{G^*_{[i-1:0]}}\, t_i + G^*_{[i-1:0]} (t_i \oplus p_{i-1})$, $c_i = p_{i-1} G^*_{[i-1:0]}$
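The Ling post-processing can be checked against ordinary addition with a small Python sketch. It assumes $c_0 = 0$ and computes the pseudo-carries serially through $H_i = g_i + p_{i-1} H_{i-1}$, where $H_i$ stands for $G^*_{[i:0]}$, rather than with a prefix tree. The function name `ling_add` is illustrative.

```python
def ling_add(a, b, n):
    """n-bit Ling addition with carry-in 0; returns (sum, carry-out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # g_i = a_i b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]   # p_i = a_i + b_i (OR)
    t = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # t_i = a_i xor b_i
    # Pseudo-carries H_i = G*[i:0], evaluated serially (assumption of this sketch)
    h = []
    for i in range(n):
        h.append(g[i] if i == 0 else g[i] | (p[i - 1] & h[i - 1]))
    total = 0
    for i in range(n):
        if i == 0:
            s_i = t[0]                                   # c_0 = 0
        else:
            sel = h[i - 1]
            # s_i = not(H_{i-1}) t_i + H_{i-1} (t_i xor p_{i-1})
            s_i = ((1 - sel) & t[i]) | (sel & (t[i] ^ p[i - 1]))
        total |= s_i << i
    carry_out = p[n - 1] & h[n - 1]                      # c_n = p_{n-1} H_{n-1}
    return total, carry_out
```

The sum bit is a multiplexer controlled by the pseudo-carry, so the true carry $c_i = p_{i-1} H_{i-1}$ never has to be formed on the critical path.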
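An 8-bit Ling Adder
[Figure: 8-bit Ling adder prefix graph over bit positions 1 to 8 with outputs H_1 to H_8. Cell legend: sum cell (signals a_i, b_i, p_{i-1}, H_i, s_i); first-level cell (signals g_i, g_{i-1}, p_{i-1}, p_{i-2}, G*_i, P*_{i-1}).]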
Area Model
Distinguish physical placement from logical structure, but keep the bit-slice structure.
[Figure: logical view (bit position vs. logical level) and physical view (bit position vs. physical level) of an 8-bit prefix graph over bit positions 1 to 8; the physical view uses compact placement.]
Timing Model
Cell delay calculation: $d = f + p$ (effort delay + intrinsic delay)
Effort delay: $f = g \cdot h$
$g$: logical effort
$h$: electrical effort = Cout/Cin = (fanouts + wirelength) / size
$g$ and $p$ are intrinsic properties of the cell
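As a worked instance of the delay formula, the Python sketch below evaluates $d = g \cdot h + p$ for one cell. The parameter names and the sample numbers are illustrative assumptions; fanout and wire load are taken to be expressed in the same capacitance units as the cell size.

```python
def cell_delay(g, p, fanout_cap, wire_cap, size):
    """Logical-effort delay of one cell: d = g * h + p."""
    h = (fanout_cap + wire_cap) / size  # electrical effort Cout / Cin
    f = g * h                           # effort delay
    return f + p                        # add intrinsic delay

# Hypothetical cell with g = 1, p = 1, driving four unit loads and no wire
print(cell_delay(g=1.0, p=1.0, fanout_cap=4.0, wire_cap=0.0, size=1.0))
```

Doubling `size` halves the effort delay of this cell, but it also doubles the load the cell presents to its own driver, which is why gate sizing has to be solved jointly across the structure.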
Power Model
Total power consumption: dynamic power + static power
Static power: leakage current of device, proportional to the number of cells: $P_{sta} \propto \#cells$
Dynamic power: current switching capacitance, i.e. switching probability times $C_{load}$; the switching probability is modeled as proportional to the logical level $j$*, so $P_{dyn} \propto j \cdot C_{load}$
$P_{total} = P_{dyn} + P_{sta}$
* S. Vanichayobon et al., "Power-speed Trade-off in Parallel Prefix Circuits"
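A minimal Python sketch of this model sums a dynamic term per cell, growing with its logical level and load, and a static term per cell. The coefficients `k_dyn` and `k_sta` are hypothetical placeholders for technology constants and do not come from the slides.

```python
def total_power(cells, k_dyn=1.0, k_sta=0.5):
    """cells: iterable of (logical_level, c_load) pairs, one per GP cell."""
    cells = list(cells)
    # Dynamic power: switching activity taken as proportional to the level j
    p_dyn = sum(k_dyn * level * c_load for level, c_load in cells)
    # Static power: a fixed leakage contribution per cell
    p_sta = k_sta * len(cells)
    return p_dyn + p_sta

print(total_power([(1, 2.0), (2, 1.0)]))
```

Under this model, moving a heavily loaded cell to a lower logical level reduces dynamic power even when the cell count stays the same.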
ILP Formulation Overview
Structure variables:
GP cells
Connections (wires)
Physical positions
Capacitance variables:
Gate cap
Vertical wire cap
Horizontal wire cap
Timing variables:
Input arrival time
Output arrival time
Objective: power
Solver: the ILP is solved with ILOG CPLEX to obtain the optimal solution
Integer Linear Programming (ILP)
ILP: Linear Programming with integer variables.
Difficulties and techniques:
Constraints are not linear
Linearize using pseudo-linear constraints
Search space too large
Reduce search space
Search is slow
Add redundant constraints to speed up the search
ILP: Integer Linear Programming
Linear Programming: linear constraints, linear objective, fractional variables.
Integer Linear Programming: Linear Programming with integer variables.
[Figure: feasible region bounded by the constraints, marking the LP optimum and the ILP optimum.]

ILP: Pseudo-Linear Constraint
Problem:
Minimize: $x_3$
Subject to: $x_1 \ge 300$
$x_2 \ge 500$
$x_3 = \min(x_1, x_2)$
ILP formulation:
Minimize: $x_3$
Subject to: $x_1 \ge 300$
$x_2 \ge 500$
$x_3 \le x_1$
$x_3 \le x_2$
$x_3 \ge x_1 - 1000\, b_1$ (1)
$x_3 \ge x_2 - 1000\,(1 - b_1)$ (2)
$b_1$ is binary
LP objective: 0
ILP objective: 300
A constraint is called pseudo-linear if it is not effective until some integer variables are fixed.
Pseudo-linear constraints mostly arise from IF/ELSE scenarios; binary decision variables are introduced to indicate true or false.
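The gap between the LP and ILP objectives can be reproduced with a solver-free Python sketch. It assumes all variables are non-negative, uses the big constant 1000 from the formulation, and scans $b_1$ on a grid to mimic the relaxation. The helper names `feasible` and `min_x3` are illustrative.

```python
BIG_M = 1000

def feasible(x1, x2, x3, b1):
    """Check the linearized constraints for given values."""
    return (x1 >= 300 and x2 >= 500 and x3 <= x1 and x3 <= x2
            and x3 >= x1 - BIG_M * b1            # constraint (1)
            and x3 >= x2 - BIG_M * (1 - b1))     # constraint (2)

def min_x3(b1):
    """Smallest non-negative x3 for a fixed b1 (x1, x2 chosen as small as possible)."""
    v = max(0, 300 - BIG_M * b1, 500 - BIG_M * (1 - b1))
    assert feasible(max(300, v), 500, v, b1)     # witness that v is attainable
    return v

ilp_objective = min(min_x3(b1) for b1 in (0, 1))            # b1 binary
lp_objective = min(min_x3(k / 100) for k in range(101))     # b1 relaxed to [0, 1]
print(lp_objective, ilp_objective)
```

With a fractional $b_1$ such as 0.5, both constraints (1) and (2) become slack, which is exactly the sense in which they are not effective until the integer variable is fixed.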
ILP Solver Search Procedure
Minimize $F(b_1, b_2, b_3, b_4, f_1, \ldots)$, where $b_i$ is binary
[Figure: branch-and-bound search tree rooted at the LP relaxation (all variables fractional), branching on b_1 to b_4 with values 0 and 1. Node labels give objective values; annotations mark infeasible nodes, a feasible node (current best), a cut, a bound, and the smallest candidate.]
It is VERY helpful if the ILP objective is close to the LP objective
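The search procedure can be illustrated on a toy problem with a best-first branch-and-bound sketch in Python. The covering problem used here (minimize total cost of chosen items whose weights must reach `need`) is a hypothetical stand-in, not the adder ILP, and the bound comes from a greedy fractional relaxation rather than a general LP solver; positive weights are assumed.

```python
import heapq

def lp_bound(costs, weights, need, fixed):
    """Relaxation bound for a node; `fixed` maps variable index -> 0/1. None if infeasible."""
    cost = sum(costs[i] for i, v in fixed.items() if v)
    need -= sum(weights[i] for i, v in fixed.items() if v)
    free = sorted((i for i in range(len(costs)) if i not in fixed),
                  key=lambda i: costs[i] / weights[i])
    for i in free:                      # cheapest cost per unit weight first
        if need <= 0:
            break
        take = min(1.0, need / weights[i])   # fractional value allowed
        cost += take * costs[i]
        need -= take * weights[i]
    return cost if need <= 1e-9 else None

def branch_and_bound(costs, weights, need):
    best_cost, best = float("inf"), None
    root = lp_bound(costs, weights, need, {})
    heap = [] if root is None else [(root, 0, {})]
    tie = 1                             # tie-breaker so dicts are never compared
    while heap:
        bound, _, fixed = heapq.heappop(heap)   # smallest candidate first
        if bound >= best_cost:          # bound: cannot beat the current best
            continue
        if len(fixed) == len(costs):    # all variables fixed: feasible solution
            best_cost, best = bound, fixed
            continue
        var = len(fixed)
        for value in (0, 1):            # branch on the next binary variable
            child = {**fixed, var: value}
            b = lp_bound(costs, weights, need, child)
            if b is not None and b < best_cost:   # cut infeasible or dominated nodes
                heapq.heappush(heap, (b, tie, child))
                tie += 1
    return best_cost, best
```

The closer the relaxation bound is to the integer optimum, the earlier nodes are discarded, which is the point made on the slide.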
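Interval Adjacency Constraint
Cells are identified by (column id, logic level)
(7,3): Interval [7,1]
(3,2): Interval [3,1]
(7,2): Interval [7,4]
The fan-in intervals [7,4] and [3,1] of cell (7,3) must be adjacent, i.e. 4 = 3 + 1
[Figure: 8-bit prefix graph over columns 1 to 8 with outputs H_1 to H_8.]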
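The adjacency rule can be stated as a small Python check. Intervals are written as (high, low) pairs as on the slide, and the function name `merge_intervals` is an illustrative assumption.

```python
def merge_intervals(left, right):
    """Combine the fan-in intervals of a GP cell; they must be adjacent."""
    left_hi, left_lo = left
    right_hi, right_lo = right
    if left_lo != right_hi + 1:
        raise ValueError(f"intervals {left} and {right} are not adjacent")
    return (left_hi, right_lo)

# Cell (7,3) merges [7,4] from cell (7,2) with [3,1] from cell (3,2)
print(merge_intervals((7, 4), (3, 1)))
```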
Linearization for Interval Adjacency Constraint
Cell $(i, j)$ covers the interval $[y^L_{(i,j)}, y^R_{(i,j)}]$. Its left fan-in $(i, h)$, connected by $wl$, covers $[y^L_{(i,h)}, y^R_{(i,h)}]$; its right fan-ins $(k_1, l_1)$ and $(k_2, l_2)$, connected by $wr_1$ and $wr_2$, cover $[y^L_{(k_1,l_1)}, y^R_{(k_1,l_1)}]$ and $[y^L_{(k_2,l_2)}, y^R_{(k_2,l_2)}]$.
Left interval bound equal to column index: $y^L_{(i,j)} = i$
Pseudo linear:
$y^R_{(i,h)} = y^L_{(k,l)} + 1$ if $wl(i,j,h) = 1$ and $wr_1(i,j,k,l) = 1$
$y^R_{(i,h)} = \sum_{(k,l)} k \cdot wr_1(i,j,k,l) + 1$ if $wl(i,j,h) = 1$
Linearize:
$y^R_{(i,h)} \ge \sum_{(k,l)} k \cdot wr_1(i,j,k,l) + 1 - n\,(1 - wl(i,j,h))$
$y^R_{(i,h)} \le \sum_{(k,l)} k \cdot wr_1(i,j,k,l) + 1 + n\,(1 - wl(i,j,h))$
Search Space Reduction
Ling's adder: separate odd and even bits
This doubles the bit-width we are able to search
[Figure: 8-bit Ling prefix graph over columns 1 to 8 with outputs H_1 to H_8.]
Redundant Constraints
Cell (i,j) is known to have logic level j before wire connection
Assume load is MinLoad (fanout = 1 with minimum wire length): $P_{(i,j)}$ is at least the power-model value for logical level $j$ and load MinLoad
Cell (i,j) has a path of length j-1
Assume each cell along the path has MinLoad: $T_{(i,j)} \ge j \cdot (PD + LE \cdot MinLoad)$
Experiments: 16-bit Uniform Timing
Min-Power Radix-2 Adder (delay = 22, power = 45.5 FO4)
[Figure: 16-bit prefix graph over bit positions 1 to 16.]
Min-Power Radix-2&4 Adder (delay = 18, power = 29.75 FO4)
[Figure: 16-bit prefix graph over bit positions 1 to 16. Legend: radix-2 cell, radix-4 cell.]
Min-Power Mixed-Radix Adder (delay = 20, power = 28.0 FO4)
[Figure: 16-bit prefix graph over bit positions 1 to 16. Legend: radix-2 cell, radix-4 cell, radix-3 cell.]
Experiments: 16-bit Non-uniform Timing (Mixed Radix)
ILP is able to handle non-uniform timings
Ling adders are most advantageous for increasing arrival times (faster carries)
Increasing Arrival Time (delay = 35.5, power = 27.0 FO4)
Decreasing Arrival Time (delay = 34.5, power = 30.5 FO4)
Convex Arrival Time (delay = 35.9, power = 32.4 FO4)
Increasing Required Time (delay = 34.5, power = 30.5 FO4)
Decreasing Required Time (delay = 36.5, power = 32.5 FO4)
Convex Required Time (delay = 36.5, power = 32.5 FO4)
Experiments: 64-bit Hierarchical Structure (Mixed-Radix)
Handle high bit-width applications
16x4 and 8x8
[Figure: two-level hierarchy. Level 1: four ILP blocks over inputs a_1 b_1 to a_16 b_16, a_17 b_17 to a_32 b_32, a_33 b_33 to a_48 b_48, and a_49 b_49 to a_64 b_64, producing GP*[16:2], GP*[32:18], GP*[48:34], GP*[64:50] together with GP*[1:1], GP*[17:17], GP*[33:33], GP*[49:49]. Level 2: one ILP block producing H_1 to H_64.]
Experiments: 64-bit Hierarchical Structure
TSL: a 64-bit high-radix three-stage Ling adder
V. Oklobdzija and B. Zeydel, "Energy-Delay Characteristics of CMOS Adders," in High-Performance Energy-Efficient Microprocessor Design, pp. 147-170, 2006
ASIC Implementation - Results
64-bit hierarchical design (mixed-radix) by ILP vs. fast carry look-ahead adder by Synopsys Design Compiler
TSMC 90nm standard cell library was used
Future Work
ILP formulation improvement
Expected to handle 32- or 64-bit applications without a hierarchical scheme
Optimizing other computer arithmetic modules
Comparator, Multiplier
Q & A
Thank You!
