A Distinctive $O(mn)$ Time Algorithm for Optimal Buffer Insertions
Xinsheng Wang1*, Wenpan Liu1, Mingyan Yu2

1 Harbin Institute of Technology at Weihai, 2 Wenhua West Road, Weihai, China
2 Ningbo Institute of Technology, Zhejiang University, 1 Qianhu South Road, Ningbo, China
* E-mail: xswang@hit.edu.cn
16th Int'l Symposium on Quality Electronic Design
978-1-4799-7581-5/15/$31.00 ©2015 IEEE

Abstract
With the scaling of technology, interconnect delay has become a key factor in VLSI design. Buffer insertion is an effective technique for reducing interconnect delay. This paper presents an advanced algorithm for finding the optimal buffer insertion solution. The algorithm further improves the $O(mn)$ time algorithm for optimal buffer insertion, where $m$ is the number of sinks and $n$ is the number of candidate buffer insertion positions. Assuming that the number of sinks $m$ is fixed, it is a significant improvement over the $O(n \log^2 n)$ time algorithm and the $O(n^2)$ time algorithm. The improvement is made possible by a new pruning rule and a predictive merging technique that perform optimal buffer insertion faster for both 2-pin and multi-pin nets. On the test cases, the advanced algorithm is clearly faster than the previous best algorithm.

Keywords
Buffer insertion, interconnect, time complexity, delay

1. Introduction
As integrated circuit feature sizes continue to scale down, interconnect propagation delay becomes increasingly significant, so delay optimization techniques for interconnect are increasingly important for achieving timing closure of high-performance designs. Buffer insertion is a popular technique for reducing interconnect delay. Many algorithms have been presented for the buffer insertion problem; their basic target is to find an optimal way to insert buffers on a wiring tree so that the time slack at the source is maximized. A study [1] by Saxena et al. shows that intra-block repeaters at the 32 nm node will reach an alarming 70% of the total block cells. Finding the optimal buffer insertion solution takes a very long time if the algorithm is not fast enough. Consequently, more efficient algorithms are required.

In the last two decades, the buffer insertion problem has been extensively studied. In 1990, van Ginneken [2] presented a dynamic programming algorithm that obtains the optimal buffer insertion solution with time complexity $O(n^2)$, where $n$ is the number of buffer insertion positions. In addition to presenting the dynamic programming algorithm, he also outlined some extensions of it. Many extensions have since been built on van Ginneken's algorithm: [3] and [4] considered wire effects, [5] and [6] took interconnect process and delay variation into consideration, [7] dealt with the noise constraint effect, and [8] and [9] studied simultaneous tree construction. Lillis [10] extended van Ginneken's algorithm to allow $b$ buffer types and improved the time complexity to $O(b^2 n^2)$. In 2003, Shi and Li [11] applied four novel techniques to their algorithm and improved the time complexity to $O(b^2 n \log n)$ for 2-pin nets and $O(b^2 n \log^2 n)$ for multi-pin nets. To reduce the time complexity caused by the number of buffer types $b$, Li and Shi [12] improved the algorithm to $O(bn^2)$ time in 2005. In 2012, Li and Shi [13] advanced their algorithm to time complexity $O(mn)$, where $m$ is the number of sinks and $n$ is the number of buffer positions. The speedup of the algorithm in [13] is achieved by a clever bookkeeping method and an innovative linked list that allows $O(1)$ time updates for adding a wire or a buffer.

In this paper, we first propose a new algorithm that finds the optimal buffer insertion solution faster than the previous best algorithm. The speedup is achieved by the natural sorting of the buffer insertion solutions and the observation that the optimal candidate solution associated with any buffer type must lie on the concave shell of the solutions in the $(C, Q)$ plane. The algorithm can be extended to multi-pin nets and $b$ buffer types in time $O(b^2 n + bmn)$. In fact, $m$ is in general much less than $n$, according to statistics on industrial ASIC chips [14]. Furthermore, from previous work we found that merging two branches of the wiring tree takes considerable time for multi-pin nets. Based on this observation, we develop a new data structure that speeds up the merging of two branches compared with the previous best work.

This paper is organized as follows. Section 2 gives the model and problem formulation. The new pruning rule for 2-pin nets is shown in Section 3. Section 4 extends the algorithm to multi-pin nets. Experimental results are given in Section 5 and the last section gives the conclusion.

2. Model and problem formulation

We define a net as a tree. For simplicity, we take the same topology as [2] by assuming that the routing is given as a binary tree. The routing tree is denoted as $T = (V, E)$, where $V$ is the set of vertices and $E$ is the set of wires. Each wire connects two vertices: $e = (v_j, v_i)$, $e \in E$, $v_j, v_i \in V$. The vertices are classified into three types: the source vertex $v_0$, the sink vertices $V_s$, and the internal vertices $V_n$. The buffer insertion positions are the vertices in $V_n$ except for the merging ones. Each sink vertex $s \in V_s$ has sink capacitance $C(s)$ and required arrival time $RAT(s)$.
The buffer to be inserted into the tree has intrinsic delay $K(B)$, driving resistance $R(B)$, and input capacitance $C(B)$.

Each wire $e = (v_j, v_i) \in E$ is associated with lumped resistance $r l$ and capacitance $c l$, where $l$ is the length from $v_j$ to $v_i$, $r$ is the wire resistance per unit length and $c$ is the wire capacitance per unit length. The special properties of the Elmore delay model allow the use of a hierarchical algorithm, which is essential to the bottom-up structure of our algorithm. Consider two vertices $v_{i+1}$ and $v_i$, where the signal travels from $v_{i+1}$ to $v_i$. When a wire $e = (v_{i+1}, v_i)$ is added at $v_i$, the Elmore delay of $e$ is:

$$D(v_i, e) = \frac{r l \cdot c l}{2} + r l \cdot C(v_i)$$

where $C(v_i)$ is the downstream loading capacitance from vertex $v_i$ to the sink. When $v_i$ is obtained by inserting a buffer $B$, the buffer delay is:

$$D(v_i, B) = K(B) + R(B) \cdot C(v_i).$$

For $v \in V$, each different buffer insertion method on the downstream sub-tree from $v$ to sink $s$ makes a unique solution $\alpha$. The delay from $v$ to sink $s$ under solution $\alpha$ is then:

$$D(v, \alpha) = \sum_{e = (v_{i+1}, v_i)} \left( D(v_{i+1}, B) + D(v_i, e) \right).$$

If no buffer is actually inserted at $v_i$, then $D(v_i, B) = 0$. The time slack of $v$ under solution $\alpha$ is defined as:

$$Q(v, \alpha) = \min_{s \in V_s} \{ RAT(s) - D(v, \alpha) \}.$$

Following the model and definitions, we formulate the buffer insertion problem. We are given the routing tree $T = (V, E)$, from which we can obtain all vertex information, including the buffer insertion positions and the merging vertices. The wire and device parameters are also given: wire unit length capacitance $c$ and unit length resistance $r$, buffer intrinsic delay $K(B)$, output resistance $R(B)$, and input capacitance $C(B)$. The objective is to find the optimal solution $\alpha$ for tree $T$ that maximizes $Q(v_0, \alpha)$.

We denote the time slack and capacitance under solution $\alpha$ at vertex $v_i$ as $Q(v_i, \alpha)$ and $C(v_i, \alpha)$ respectively. When a wire $e = (v_{i+1}, v_i)$ is added, the time slack and capacitance at vertex $v_{i+1}$ are respectively:

$$Q(v_{i+1}) = Q(v_i, \alpha) - \frac{r l \cdot c l}{2} - r l \cdot C(v_i, \alpha),$$
$$C(v_{i+1}) = C(v_i, \alpha) + c l.$$

When a buffer is inserted at $v_{i+1}$ from its direct downstream vertex $v_i$, the time slack and capacitance are respectively:

$$Q(v_{i+1}) = Q(v_i, \alpha) - K(B) - R(B) \cdot C(v_i, \alpha),$$
$$C(v_{i+1}) = C(B).$$

The set of the pairs $(C, Q)$ of all solutions is denoted as $N(v)$. Once we have $N(v_0)$ at the source vertex, we can find the optimal solution that maximizes $Q(v_0, \alpha)$.

3. New pruning rule

The solution set $N(v)$ may contain solutions that cannot lead to the optimal buffer insertion. We call such solutions redundant; they can be pruned while the bottom-up algorithm is in progress. Our definition of redundant solutions differs from both van Ginneken's [2] and Li and Shi's [12]. We introduce our pruning in detail in the following.

In our new pruning, we first sort $N(v)$ in order of increasing $Q(v, \alpha)$, so that $q_1 \le q_2 \le q_3 \le q_4 \le \dots$, as shown in Figure 1. The new pruning process consists of two stages. In the first stage, we consider each pair of solutions, as the common pruning rule in [2] does. If $c_j \le c_i$, $j > i$, then $\alpha_i$ is redundant and is pruned. In other words, solution $\alpha_i$ results in a smaller required arrival time but a larger loading capacitance than solution $\alpha_j$. The reason for this criterion is illustrated in Figure 2. Taking $\alpha_4$ and $\alpha_5$ as an example, since $c_5 < c_4$, $\alpha_4$ is redundant. Figure 2 shows that any solutions lying above the line $C = c_5$ and before $q_5$ are redundant.

Figure 1: Original solutions set $N(v)$ before pruning

Figure 2: The first stage of pruning
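The $(C, Q)$ update rules of Section 2 and the first pruning stage can be sketched in C++ as follows. This is a minimal illustration, not the authors' implementation: the names `Candidate`, `add_wire`, `add_buffer` and `prune_stage1` are hypothetical, and the sketch re-sorts with `std::sort` instead of maintaining the sorted linked list discussed in the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One candidate solution at a vertex: downstream capacitance C and time slack Q.
struct Candidate {
    double c;
    double q;
};

// Add a wire of length len with unit resistance r and unit capacitance cap:
// Q' = Q - (r*len)(cap*len)/2 - (r*len)*C,  C' = C + cap*len.
Candidate add_wire(const Candidate& s, double r, double cap, double len) {
    assert(len >= 0.0);
    const double rl = r * len;
    const double cl = cap * len;
    return {s.c + cl, s.q - rl * cl / 2.0 - rl * s.c};
}

// Insert a buffer with intrinsic delay k, driving resistance rb and input
// capacitance cb:  Q' = Q - K(B) - R(B)*C,  C' = C(B).
Candidate add_buffer(const Candidate& s, double k, double rb, double cb) {
    return {cb, s.q - k - rb * s.c};
}

// First-stage pruning: sort by increasing Q, then drop every candidate i for
// which a later candidate j (q_j >= q_i) has c_j <= c_i.
std::vector<Candidate> prune_stage1(std::vector<Candidate> sols) {
    std::sort(sols.begin(), sols.end(),
              [](const Candidate& a, const Candidate& b) {
                  // On equal Q, put the larger C first so it gets pruned.
                  return a.q < b.q || (a.q == b.q && a.c > b.c);
              });
    std::vector<Candidate> kept;
    for (const Candidate& s : sols) {
        while (!kept.empty() && s.c <= kept.back().c) {
            kept.pop_back();  // kept.back() has smaller Q and no smaller C
        }
        kept.push_back(s);
    }
    return kept;
}
```

After `prune_stage1`, the surviving candidates are strictly increasing in both $Q$ and $C$, which is the precondition the second pruning stage relies on.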
In the second stage of pruning, we consider each triple of solutions. If $\alpha_i$, $\alpha_j$ and $\alpha_k$, $i < j < k$, meet the condition:

$$\frac{c_j - c_i}{q_j - q_i} > \frac{c_k - c_j}{q_k - q_j}$$

then $\alpha_j$ is redundant and should be pruned. We prove it in the following.

Lemma 1: At the vertex $v$ in a 2-pin net, if there are three solutions $\alpha_i$, $\alpha_j$ and $\alpha_k$, $i < j < k$, which meet the condition:

$$\frac{C(v, \alpha_j) - C(v, \alpha_i)}{Q(v, \alpha_j) - Q(v, \alpha_i)} > \frac{C(v, \alpha_k) - C(v, \alpha_j)}{Q(v, \alpha_k) - Q(v, \alpha_j)}$$

then $\alpha_j$ is redundant.

Proof: Assume $v'$ is the upstream vertex from $v$. Let $L$ be the length between $v'$ and $v$, $D$ be the total sum of the delay of wires and the delay of the buffer or driver at the source vertex from $v'$ to $v$, and $R$ be the sum of the resistance of wires and the resistance of the buffer or driver at the source vertex from $v'$ to $v$. Then the time slack at $v'$ is:

$$Q(v', \alpha_i) = Q(v, \alpha_i) - \frac{r c L^2}{2} - r L \cdot C(v, \alpha_i) - K(B) - R(B) \cdot (C(v, \alpha_i) + c L) = Q(v, \alpha_i) - R \cdot C(v, \alpha_i) - D.$$

Therefore

$$Q(v', \alpha_k) - Q(v', \alpha_j) = Q(v, \alpha_k) - Q(v, \alpha_j) - R \cdot (C(v, \alpha_k) - C(v, \alpha_j)),$$
$$Q(v', \alpha_j) - Q(v', \alpha_i) = Q(v, \alpha_j) - Q(v, \alpha_i) - R \cdot (C(v, \alpha_j) - C(v, \alpha_i)).$$

When

$$Q(v', \alpha_k) - Q(v', \alpha_j) > 0, \quad Q(v', \alpha_j) - Q(v', \alpha_i) > 0,$$

we have

$$\frac{C(v, \alpha_k) - C(v, \alpha_j)}{Q(v, \alpha_k) - Q(v, \alpha_j)} < \frac{1}{R}, \quad \frac{C(v, \alpha_j) - C(v, \alpha_i)}{Q(v, \alpha_j) - Q(v, \alpha_i)} < \frac{1}{R}.$$

As $C(v', \alpha_k) = C(v', \alpha_j)$, we get that $\alpha_j$ is redundant according to the first stage of pruning.

When

$$Q(v', \alpha_k) - Q(v', \alpha_j) < 0, \quad Q(v', \alpha_j) - Q(v', \alpha_i) > 0,$$

we have

$$\frac{C(v, \alpha_k) - C(v, \alpha_j)}{Q(v, \alpha_k) - Q(v, \alpha_j)} > \frac{1}{R}, \quad \frac{C(v, \alpha_j) - C(v, \alpha_i)}{Q(v, \alpha_j) - Q(v, \alpha_i)} < \frac{1}{R},$$

therefore

$$\frac{C(v, \alpha_j) - C(v, \alpha_i)}{Q(v, \alpha_j) - Q(v, \alpha_i)} < \frac{C(v, \alpha_k) - C(v, \alpha_j)}{Q(v, \alpha_k) - Q(v, \alpha_j)}$$

which conflicts with the given condition.

When

$$Q(v', \alpha_k) - Q(v', \alpha_j) > 0, \quad Q(v', \alpha_j) - Q(v', \alpha_i) = 0,$$

we have

$$\frac{C(v, \alpha_k) - C(v, \alpha_j)}{Q(v, \alpha_k) - Q(v, \alpha_j)} < \frac{1}{R}, \quad \frac{C(v, \alpha_j) - C(v, \alpha_i)}{Q(v, \alpha_j) - Q(v, \alpha_i)} = \frac{1}{R}.$$

As $C(v', \alpha_j) - C(v', \alpha_i) > 0$, we get that $\alpha_j$ is redundant according to the first stage of pruning. All other cases are similar to the three cases above.

Finally, combining all the cases, we conclude that when $\alpha_i$, $\alpha_j$ and $\alpha_k$, $i < j < k$, meet the condition:

$$\frac{c_j - c_i}{q_j - q_i} > \frac{c_k - c_j}{q_k - q_j}$$

$\alpha_j$ is redundant.

Taking $\alpha_2$, $\alpha_3$ and $\alpha_5$ as an example, shown in Figure 3, since

$$\frac{c_3 - c_2}{q_3 - q_2} > \frac{c_5 - c_3}{q_5 - q_3}$$

$\alpha_3$ is redundant and should be pruned.

Figure 3: The second stage of pruning

Figure 3 shows that $\alpha_3$ lies in the shaded triangle and therefore meets the above pruning condition, so it is redundant and is pruned. After the second stage of pruning, we get the non-redundant solution set $L(v)$.

Our new pruning and the convex pruning use a similar pruning condition; the difference between them is that the new pruning sorts the solutions in increasing $Q$ order instead of increasing $C$ order. The most evident advantage of our new pruning is that when we want to choose the best solution, the one that maximizes the time slack $Q$, we can get it immediately because the solutions are organized in increasing $Q$ order.

4. Extension to multi-pin nets

The algorithm of [13] applies well to 2-pin nets. However, when it comes to multi-pin nets, the situation becomes more complex, because the merging process must be considered. Assume $v_k$ is a merging vertex and $v_m$ and $v_n$ are its two direct branch vertices. In this situation, we can no longer guarantee that the pruning preserves optimality, because when a merging vertex is reached, the time slack and capacitance are respectively:

$$Q(v_k) = \min\{Q(v_m), Q(v_n)\}$$
$$C(v_k) = C(v_m) + C(v_n)$$
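The merging rule above can be illustrated with the classic two-pointer merge of two branch lists. This is a sketch under stated assumptions, not the paper's merging procedure: it assumes both lists have already passed the first pruning stage (strictly increasing in $Q$ and $C$), and the names `Candidate` and `merge_branches` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One candidate solution: downstream capacitance C and time slack Q.
struct Candidate {
    double c;
    double q;
};

// True if the list is ordered by non-decreasing Q.
bool sorted_by_q(const std::vector<Candidate>& v) {
    return std::is_sorted(v.begin(), v.end(),
                          [](const Candidate& a, const Candidate& b) {
                              return a.q < b.q;
                          });
}

// Merge the candidate lists of the two branches of a merging vertex.
// Each merged candidate has Q = min(Q_left, Q_right), C = C_left + C_right.
std::vector<Candidate> merge_branches(const std::vector<Candidate>& left,
                                      const std::vector<Candidate>& right) {
    assert(sorted_by_q(left) && sorted_by_q(right));
    std::vector<Candidate> merged;
    std::size_t i = 0;
    std::size_t j = 0;
    while (i < left.size() && j < right.size()) {
        merged.push_back({left[i].c + right[j].c,
                          std::min(left[i].q, right[j].q)});
        // The branch with the smaller Q limits the slack; pairing it with a
        // later candidate of the other branch only adds capacitance.
        if (left[i].q < right[j].q) {
            ++i;
        } else if (right[j].q < left[i].q) {
            ++j;
        } else {
            ++i;
            ++j;
        }
    }
    return merged;
}
```

The walk produces at most `left.size() + right.size()` merged candidates, already ordered by increasing $Q$.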
4.1. New data structure

To extend the algorithm to multi-pin nets, [13] maintains another list $A(v)$, which stores the solutions pruned only by van Ginneken's pruning, in addition to the solution list $L(v)$ obtained after convex pruning. $A(v)$ is used only when merging occurs, which wastes much time and space. To overcome this disadvantage, we construct a new data structure:
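struct CVertex
{
    char m_Type;          // vertex type: source, sink, internal or merging
    CVertex* parent;      // upstream vertex
    CVertex* leftchild;   // left downstream branch
    CVertex* rightchild;  // right downstream branch
    Solution sol;         // solutions stored at this vertex
};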
When the algorithm reaches a vertex $v_i$, it first checks whether its parent vertex is a merging vertex. If not, the solutions of $v_i$ are pruned by both stages of pruning, which is denoted as Prune(). If so, the solutions are pruned only by the first stage of pruning, which is denoted as MergePrune(). By predicting the parent's vertex type, we save much time and space compared with [13], which maintains a redundant solution list $A(v)$.
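The choice between Prune() and MergePrune() can be sketched as below. The function names `merge_prune`, `prune` and `prune_for_parent`, the `VertexType` enum and the `Candidate` struct are assumptions made for illustration. The second stage is applied to neighbouring triples with a stack, and the ratio test is cross-multiplied, which is valid because the $Q$ differences are positive after the first stage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Candidate {
    double c;  // downstream capacitance
    double q;  // time slack
};

enum class VertexType { Source, Sink, Internal, Merging };

// First stage only (MergePrune): sort by increasing Q and drop candidates
// that a later candidate matches or beats in capacitance.
std::vector<Candidate> merge_prune(std::vector<Candidate> sols) {
    std::sort(sols.begin(), sols.end(),
              [](const Candidate& a, const Candidate& b) {
                  return a.q < b.q || (a.q == b.q && a.c > b.c);
              });
    std::vector<Candidate> kept;
    for (const Candidate& s : sols) {
        while (!kept.empty() && s.c <= kept.back().c) kept.pop_back();
        kept.push_back(s);
    }
    return kept;
}

// Both stages (Prune): additionally drop j whenever
// (c_j - c_i)/(q_j - q_i) > (c_k - c_j)/(q_k - q_j) for neighbours i < j < k.
std::vector<Candidate> prune(std::vector<Candidate> sols) {
    std::vector<Candidate> kept;
    for (const Candidate& k : merge_prune(std::move(sols))) {
        while (kept.size() >= 2) {
            const Candidate& i = kept[kept.size() - 2];
            const Candidate& j = kept[kept.size() - 1];
            const bool redundant =
                (j.c - i.c) * (k.q - j.q) > (k.c - j.c) * (j.q - i.q);
            if (!redundant) break;
            kept.pop_back();
        }
        kept.push_back(k);
    }
    return kept;
}

// Predict from the parent's type which pruning to apply at a vertex.
std::vector<Candidate> prune_for_parent(std::vector<Candidate> sols,
                                        VertexType parent) {
    assert(parent != VertexType::Sink);  // a sink is never a parent
    return parent == VertexType::Merging ? merge_prune(std::move(sols))
                                         : prune(std::move(sols));
}
```

With the candidates $(C, Q) = (1, 1), (3, 2), (4, 4)$, the middle one survives the first stage but is removed by the second, since $2/1 > 1/2$.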
4.2. Algorithm for multi-pin net

The algorithm first traverses the routing tree bottom-up and establishes a set of non-redundant solutions for every sub-tree of $T$, stored in a linked list $L$. When it reaches the source vertex, it picks the optimal solution from the solutions remaining in $L$. The algorithm then backtracks along the optimal solution to determine, for each buffer position, whether a buffer is inserted. The key part of the algorithm is the bottom-up traversal of $T$, as it computes the solutions at every vertex and determines the optimal buffer insertion. The detailed bottom-up algorithm for multi-pin nets is shown in Figure 4.
From Section 3, we know that the new pruning rule does not change the $O(1)$ time complexity of AddWire() or AddBuffer(). In addition, the process of merging two branches also takes $O(1)$ time. So when we combine these improvements into the algorithm, its linear-time character is preserved. Consequently, the time complexity of our new algorithm is also $O(b^2 n + mn)$, but it obtains the optimal solution faster.
Advanced algorithm
Input:  The routing tree T
Output: Non-redundant solutions at the source vertex, stored in a linked list L
Begin
  For i = n to 1 do
    If vertex v_i is a sink
      Set Qa = Ca = Ra = 0;
      Let L contain one solution (Q, C), where Q = RAT(v_i) and C = C(v_i);
      Let all best and new pointers point to the only solution in L;
      Return L(v_i).
    Else if v_i is a merging vertex
      MergeBranch();
      MergePrune();
      Return L(v_i).
    Else if v_i is obtained by adding a wire
      AddWire();
      Prune();
      Return L(v_i).
    Else if v_i is obtained by inserting a buffer
      AddBuffer();
      Prune();
      Return L(v_i).
  Else Return L(v_0);
End of algorithm
Figure 4: Algorithm for multi-pin nets
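For a 2-pin net with one buffer type, the bottom-up pass of Figure 4 reduces to repeated AddWire(), AddBuffer() and Prune() steps. The sketch below is a simplified illustration and does not reproduce the linked-list bookkeeping that gives $O(1)$ updates. It assumes equal-length wire segments, one buffer position between consecutive segments, and a driver resistance at the source; `Params`, `driver_r` and `best_source_slack` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Candidate { double c; double q; };

struct Params {
    double r, cap, len;  // wire unit resistance, unit capacitance, segment length
    double k, rb, cb;    // buffer intrinsic delay, driving resistance, input cap
    double driver_r;     // assumed driver resistance at the source
};

Candidate add_wire(const Candidate& s, const Params& p) {
    const double rl = p.r * p.len, cl = p.cap * p.len;
    return {s.c + cl, s.q - rl * cl / 2.0 - rl * s.c};
}

Candidate add_buffer(const Candidate& s, const Params& p) {
    return {p.cb, s.q - p.k - p.rb * s.c};
}

// Two-stage pruning on a list re-sorted by increasing Q.
std::vector<Candidate> prune(std::vector<Candidate> sols) {
    std::sort(sols.begin(), sols.end(), [](const Candidate& a, const Candidate& b) {
        return a.q < b.q || (a.q == b.q && a.c > b.c);
    });
    std::vector<Candidate> s1;  // stage 1: dominated candidates removed
    for (const Candidate& s : sols) {
        while (!s1.empty() && s.c <= s1.back().c) s1.pop_back();
        s1.push_back(s);
    }
    std::vector<Candidate> s2;  // stage 2: slope test on neighbouring triples
    for (const Candidate& k : s1) {
        while (s2.size() >= 2) {
            const Candidate& i = s2[s2.size() - 2];
            const Candidate& j = s2[s2.size() - 1];
            if ((j.c - i.c) * (k.q - j.q) > (k.c - j.c) * (j.q - i.q)) s2.pop_back();
            else break;
        }
        s2.push_back(k);
    }
    return s2;
}

// Best slack seen by the driver for a line of `segments` wire segments.
double best_source_slack(std::size_t segments, double sink_c, double sink_rat,
                         const Params& p) {
    assert(segments >= 1);
    std::vector<Candidate> sols{{sink_c, sink_rat}};
    for (std::size_t seg = 0; seg < segments; ++seg) {
        for (Candidate& s : sols) s = add_wire(s, p);   // AddWire()
        if (seg + 1 < segments) {                        // internal buffer position
            Candidate best = add_buffer(sols.front(), p);
            for (const Candidate& s : sols) {
                const Candidate b = add_buffer(s, p);
                if (b.q > best.q) best = b;
            }
            sols.push_back(best);                        // AddBuffer()
        }
        sols = prune(std::move(sols));                   // Prune()
    }
    double best_q = sols.front().q - p.driver_r * sols.front().c;
    for (const Candidate& s : sols) best_q = std::max(best_q, s.q - p.driver_r * s.c);
    return best_q;
}
```

With two unit segments, a sink of capacitance 1 and required arrival time 10, and a buffer with $K(B) = R(B) = C(B) = 0.5$, the buffered candidate wins at the source once the assumed unit driver resistance is charged against the load.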
5. Experimental results
All algorithms are implemented in C++ on a Linux server with an Intel(R) Xeon(R) 2.4 GHz CPU and 12 GB of memory. The device and interconnect parameters are shown in Table 1, adapted from [4] and [13], which are based on TSMC 180 nm technology.

Table 1: Device and interconnect parameters

| Parameter | Value |
|---|---|
| unit length capacitance | 0.118 fF/μm |
| unit length resistance | 0.076 Ω/μm |
| buffer intrinsic delay | 29 ps - 34 ps |
| output resistance | 180 Ω - 500 Ω |
| input capacitance | 0.7 fF - 10 fF |
Table 2 shows the simulation results for 2-pin nets with different numbers of buffer insertion positions, based on one buffer type. Table 3 shows the simulation results for multi-pin nets with different numbers of sinks and buffer insertion positions. Our new algorithm is compared with both van Ginneken's algorithm [2] and Li and Shi's algorithm [13]. We have two different buffer libraries, whose sizes are 1 and 2 respectively, and denote them as $b_1$ and $b_2$. The simulation results show that our new algorithm can be as much as 22 times faster for 2-pin nets and 23 times faster for multi-pin nets than the algorithm presented in [13], as shown in Table 2 and Table 3.
Table 2: Simulation results for 2-pin net with buffer library $b_1$ (CPU time in seconds)

| Buffer positions n | Van Ginneken [2] | Li and Shi [13] | New algorithm |
|---|---|---|---|
| 325 | 0.315 | 0.004 | 0.0002 |
| 404 | 0.377 | 0.006 | 0.0003 |
| 725 | 1.008 | 0.007 | 0.0004 |
| 1297 | 3.277 | 0.013 | 0.0006 |
| 1522 | 4.569 | 0.015 | 0.0007 |
| 2044 | 8.343 | 0.020 | 0.0010 |
| 2567 | 13.309 | 0.025 | 0.0011 |
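Table 3: Simulation results for multi-pin net with buffer library $b_1$ and $b_2$ (CPU time in seconds)

| Sinks m | Buffer pos. n | Buffer type b | Van Ginneken [2] | Li and Shi [13] | New algorithm |
|---|---|---|---|---|---|
| 10 | 467 | b1 | 0.295 | 0.050 | 0.0002 |
| 10 | 467 | b2 | 0.465 | 0.070 | 0.0003 |
| 50 | 2547 | b1 | 9.574 | 0.018 | 0.0013 |
| 50 | 2547 | b2 | 12.153 | 0.026 | 0.0017 |
| 75 | 3848 | b1 | 22.289 | 0.028 | 0.0023 |
| 75 | 3848 | b2 | 28.170 | 0.041 | 0.0030 |
| 100 | 5147 | b1 | 40.138 | 0.040 | 0.0039 |
| 100 | 5147 | b2 | 50.820 | 0.058 | 0.0046 |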
6. Conclusion
We have proposed an advanced algorithm to speed up the process of finding the optimal buffer insertion solution for a wire tree. The time complexity of the algorithm is $O(b^2 n + mn)$ for multi-pin nets. Simulation results have shown that our algorithm runs clearly faster than previous works for both 2-pin nets and multi-pin nets. In addition, as a fundamental algorithm, our result is applicable to some of the previous works, such as [4] and [10].
7. References
[1] P. Saxena, N. Menezes, P. Cocchini, and D. A. Kirkpatrick, "Repeater scaling and its impact on CAD," IEEE Trans. Computer-Aided Design, vol. 23, no. 4, 2004, pp. 451-463.
[2] L. P. P. P. van Ginneken, "Buffer placement in distributed RC-tree network for minimal Elmore delay," in Proc. IEEE Int. Symp. Circuits Syst., 1990, pp. 865-868.
[3] Shiyan Hu, Zhuo Li, and C. J. Alpert, "A fully polynomial time approximation scheme for timing driven minimum cost buffer insertion," ACM/IEEE Design Automation Conference, 2009, pp. 424-429.
[4] C. J. Alpert and A. Devgan, "Wire segmenting for improved buffer insertion," Design Automation Conference, 1997, pp. 588-593.
[5] Jinjun Xiong, Kingho Tam, and Lei He, "Buffer insertion considering process variation," Proceedings Design, Automation and Test in Europe, 2005, pp. 970-975.
[6] A. Narasimhan and R. Sridhar, "Variability aware low-power delay optimal buffer insertion for global interconnects," IEEE Trans. Circuits and Systems I: Regular Papers, 2010, pp. 3055-3063.
[7] C. J. Alpert, A. Devgan, and S. T. Quay, "Buffer insertion for noise and delay optimization," in Proc. ACM/IEEE Design Automation Conf., 1998, pp. 362-367.
[8] T. Okamoto and J. Cong, "Buffered Steiner tree construction with wire sizing for interconnect layout optimization," in Proceedings IEEE/ACM Int. Conf. Computer-Aided Design, 1996, pp. 44-49.
[9] M. Hrkic and J. Lillis, "S-tree: a technique for buffered routing tree synthesis," in Proc. ACM/IEEE Design Automation Conf., 2002, pp. 578-583.
[10] J. Lillis, C. K. Cheng, and Ting-Ting Y. Lin, "Optimal wire sizing and buffer insertion for low power and a generalized delay model," IEEE J. Solid-State Circuits, vol. 31, no. 3, 1996, pp. 437-447.
[11] Weiping Shi and Zhuo Li, "A fast algorithm for optimal buffer insertion," IEEE Trans. Computer-Aided Design, vol. 24, no. 6, 2005, pp. 879-891.
[12] Zhuo Li and Weiping Shi, "An $O(bn^2)$ time algorithm for buffer insertion with b buffer types," in Proc. Design, Automation and Test in Europe, 2005, pp. 1324-1329.
[13] Zhuo Li, Ying Zhou, and Weiping Shi, "$O(mn)$ time algorithm for optimal buffer insertion of nets with m sinks," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 3, 2012, pp. 437-441.
[14] Z. Li, C. N. Sze, C. J. Alpert, J. Hu, and W. Shi, "Making fast buffer insertion even faster via approximation techniques," in Proc. Asia South Pacific Design Automation Conf., 2005, pp. 13-18.
