
A Novel Fast Canonical-Signed-Digit Conversion Technique for Multiplication

Rui Guo and Linda S. DeBrunner
Department of Electrical and Computer Engineering, Florida State University, Tallahassee, FL 32310

Abstract: Fast multiplication can be achieved by using canonical-signed-digit (CSD) representation to speed up computations. Conversion to CSD is needed when the multiplier is not known a priori. In this work, a novel approach for converting an unsigned binary number or two's complement number to its CSD form from the least significant bit to the most significant bit (right-to-left) is presented. Comparison shows that our algorithm is faster and requires less area than existing CSD conversion algorithms.

Index Terms: canonical-signed-digit (CSD), minimum-signed-digit (MSD), redundant number representation, Booth's recoding.

I. INTRODUCTION

The multiplication of two numbers $x \times y$ can be implemented by accumulating the shifted partial products $x_i y$ for each digit $x_i$ of the multiplier $x$. The number of addition operations required to sum the partial products is therefore one less than the number of nonzero digits in the representation of the multiplier $x$. Multiplierless approaches, such as distributed arithmetic or shift/add using CSD, can be employed to implement digital filters with constant coefficients. However, for filters with coefficients that are not known a priori, such as adaptive filters, general multipliers are needed, and they are often implemented using shift-and-add operations based on the binary representation.

Canonical-signed-digit (CSD) representation and minimum-signed-digit (MSD) representation, both of which have a minimum number of nonzero digits, are frequently used to reduce hardware complexity and increase throughput [1], [2]. The development of fast CSD and MSD conversion algorithms has been the focus of much effort. In 1951, Booth's recoding was presented [3] to efficiently multiply two numbers using recoded multipliers. In 1960, Reitwiesner developed an algorithm to convert two's complement numbers to CSD [4]. Since then, more techniques for CSD and MSD conversion have been proposed [5]-[7].

In this paper, a new approach for converting an unsigned binary number to its CSD representation is proposed. Our algorithm can be readily extended to the conversion of numbers in two's complement representation. The conversion begins with the least significant bit and moves to the most significant bit, i.e. right-to-left. The time required to perform the conversion with the proposed method is less than the time required using existing CSD conversion algorithms. At the same time, the area required is reduced significantly.

II. BACKGROUND

The value of an unsigned binary number $X$ of length $W$ can be determined by

\[ X = \sum_{i=0}^{W-1} x_i 2^i \]

where $x_i \in \{0, 1\}$. This expression can also be used to determine the value of a number $X$ of length $W$ written in radix-2 signed-digit representation, where $x_i \in \{0, 1, \bar{1}\}$. Canonical signed digit representation is a special case of radix-2 signed-digit representation that has the following properties [4]:

• No two consecutive digits of a CSD number are nonzero.
• The CSD representation of a number is unique.
• The CSD representation of a number has the minimum number of nonzero digits, which is on average $W/3 + 1/9 + O(2^{-W})$ for a $W$-digit number [8].

A. Typical Right-To-Left Conversion Algorithm

In [4], Reitwiesner described a method for converting a two's complement number to CSD representation. Let

\[ X = (x_{k-1}, x_{k-2}, \ldots, x_0) = -x_{k-1} 2^{k-1} + \sum_{i=0}^{k-2} x_i 2^i \]

where $x_i \in \{0, 1\}$; then the CSD representation is of the form $Y = (y_{k-1}, y_{k-2}, \ldots, y_0)$, where $y_i \in \{0, 1, \bar{1}\}$, and has the properties described above. The algorithm given in [4] proceeds from the least significant bit to the most significant bit, i.e. from right to left.
Table I. Typical Right-to-Left Algorithm [6] (× marks an input that may be 0 or 1)

x_{i+1}   x_i   c_i  |  y_i   c_{i+1}
   ×       0     0   |   0      0
   0       1     0   |   1      0
   1       1     0   |  -1      1
   0       0     1   |   1      0
   1       0     1   |  -1      1
   ×       1     1   |   0      1

978-1-4577-0539-7/11/$26.00 ©2011 IEEE    ICASSP 2011

Figure 1. Hardware implementation for typical right-to-left algorithm

Figure 2. Hardware implementation for typical left-to-right algorithm [6]

Table III. Booth's Recoding [3]

x_i   x_{i-1}  |  D_i
 0       0     |   0
 0       1     |   1
 1       0     |  -1
 1       1     |   0

To implement this right-to-left conversion algorithm, two bits are used to represent each CSD digit, i.e. the magnitude bit $y^m$ and the sign bit $y^s$, so we define $y_i = (y_i^s, y_i^m)$. We write $1 = (0, 1)$, $\bar{1} = (1, 1)$, and $0 = (\times, 0)$, where $\bar{1}$ denotes $-1$, and $\times$ denotes 1 or 0. Based on Table I, the Boolean equations can be written

\[ y_i^s = x_{i+1} \tag{1} \]
\[ y_i^m = \bar{x}_i c_i + x_i \bar{c}_i = x_i \oplus c_i \tag{2} \]
\[ c_{i+1} = x_{i+1} x_i + c_i x_i + x_{i+1} c_i \tag{3} \]
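The right-to-left recurrence can be traced in software. The following is a minimal Python sketch, not the authors' hardware: it assumes an unsigned input that is zero-extended above bit `width - 1`, a starting carry of 0, and one extra output digit to absorb the final carry. The function names `csd_right_to_left` and `signed_digit_value` are illustrative only.

```python
def csd_right_to_left(x: int, width: int) -> list[int]:
    """Convert an unsigned `width`-bit integer to CSD digits, LSB first."""
    if not 0 <= x < (1 << width):
        raise ValueError("x must fit in `width` unsigned bits")

    def bit(i: int) -> int:
        # Assumption: bits at and above `width` read as 0 (zero extension).
        return (x >> i) & 1

    digits = []
    carry = 0  # assumed initial carry c_0 = 0
    for i in range(width + 1):  # one extra digit absorbs the last carry
        x_i, x_next = bit(i), bit(i + 1)
        magnitude = x_i ^ carry   # magnitude bit: x_i XOR c_i
        sign = x_next             # sign bit: x_{i+1}
        digits.append(-magnitude if sign else magnitude)
        # Carry out is the majority of x_{i+1}, x_i and c_i.
        carry = (x_next & x_i) | (carry & x_i) | (x_next & carry)
    return digits


def signed_digit_value(digits: list[int]) -> int:
    """Evaluate LSB-first signed digits as an integer."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


# 7 = 0111 in binary becomes 1 0 0 -1 (MSB first), i.e. 8 - 1.
print(csd_right_to_left(7, 3))  # [-1, 0, 0, 1]
```

Because the carry at position i feeds position i + 1, the loop is inherently sequential, which mirrors the carry chain that sets the critical path of the circuit.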

The hardware implementation according to (1)-(3) is shown in Fig. 1. Since each carry-out bit $c_i$ is propagated to the next bit position for the generation of the CSD digit pair $(y_i^s, y_i^m)$, the critical path propagation delay is $2W + 1$ gate delays, assuming the length of the input $X$ is $W$ bits.

B. Typical Left-To-Right Conversion Algorithm

In applications such as digital filters, vector multiplication, and exponentiation, the performance of MSD numbers is similar to that of CSD numbers, since both have the minimum number of nonzero digits. MSD representation, however, has other, often desirable, properties: it is not unique, and consecutive nonzero digits are allowed. Several algorithms for MSD conversion have been proposed in [5], [6]. Although these algorithms are based on different LUT configurations and are defined differently, the conversions are executed left-to-right with the same output.
Table II. Typical Left-to-Right Algorithm [6] (× marks an input that may be 0 or 1)

x_i   x_{i-1}   x_{i-2}   c_i  |  y_i   c_{i-1}
 0       0         ×       0   |   0      0
 0       1         0       0   |   0      0
 0       1         1       0   |   1      1
 1       0         ×       0   |   1      0
 0       1         ×       1   |  -1      1
 1       0         0       1   |  -1      0
 1       0         1       1   |   0      1
 1       1         ×       1   |   0      1
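The left-to-right recoding of Table II can be traced with a short Python sketch. This is an illustration under stated assumptions rather than the circuit of [6]: the input is unsigned, the bits $x_W$, $x_{-1}$ and $x_{-2}$ are taken as 0, the carry entering the most significant position is 0, and the carry, magnitude and sign are computed with sum-of-products expressions chosen to agree with every row of the table. The names `msd_left_to_right` and `msb_first_value` are hypothetical.

```python
def msd_left_to_right(x: int, width: int) -> list[int]:
    """Recode an unsigned `width`-bit integer into signed digits, MSB first."""
    if not 0 <= x < (1 << width):
        raise ValueError("x must fit in `width` unsigned bits")

    def bit(i: int) -> int:
        # Assumption: x_W, x_{-1} and x_{-2} are all 0.
        return (x >> i) & 1 if i >= 0 else 0

    digits = []
    c = 0  # assumed carry into the top position, c_W = 0
    for i in range(width, -1, -1):
        x_i, x_1, x_2 = bit(i), bit(i - 1), bit(i - 2)
        nc = c ^ 1
        # Magnitude bit: one product term per nonzero row of the table.
        magnitude = ((nc & x_1 & x_2) | (nc & x_i)
                     | (c & (x_i ^ 1)) | ((x_1 ^ 1) & c & (x_2 ^ 1)))
        sign = c  # the digit is negative exactly when c_i = 1
        digits.append(-magnitude if sign else magnitude)
        # Carry passed to the next lower position.
        c = (nc & x_1 & x_2) | (c & x_1) | (c & x_2)
    return digits


def msb_first_value(digits: list[int]) -> int:
    """Evaluate MSB-first signed digits as an integer."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


# 11 = 1011 in binary becomes 0 1 1 0 -1, i.e. 8 + 4 - 1.
print(msd_left_to_right(11, 4))  # [0, 1, 1, 0, -1]
```

The example output contains two adjacent nonzero digits, which is allowed in MSD form but not in CSD form.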

The Boolean equations derived from Table II [6] are

\[ c_{i-1} = \bar{c}_i x_{i-1} x_{i-2} + c_i x_{i-1} + c_i x_{i-2} \]
\[ y_i^m = \bar{c}_i x_{i-1} x_{i-2} + \bar{c}_i x_i + c_i \bar{x}_i + \bar{x}_{i-1} c_i \bar{x}_{i-2} \]
\[ y_i^s = c_i \]

The implementation circuit is given in Fig. 2.

C. Booth's Recoding

In [3], a two's complement number is converted to a radix-2 signed-digit representation with the digit set $\{0, 1, \bar{1}\}$ to multiply two numbers efficiently. Booth's algorithm uses Table III, from which we can see that $D_i = x_{i-1} - x_i$. We can use this technique, known as Booth's recoding, to convert a number $x$ to $D = 2x - x$. The resulting value for $D$ does not meet the constraints of a CSD representation, e.g. $x_{\text{two's}} = 101$ is converted to $D = \bar{1}1\bar{1}$. However, $D$ can be used to obtain the CSD representation, e.g. $\bar{1}01$. In [7], Booth's recoding is implemented with exclusive-OR gates, AND gates, and inverters:

\[ D^m = (x \ll 1) \oplus x \tag{4} \]
\[ D^s = \overline{(x \ll 1)} \,\&\, x \tag{5} \]

where $\ll$, $\oplus$, and $\&$ are the left shift, XOR, and AND operations, respectively. The radix-2 signed-digit representation obtained by Booth's recoding has several properties [7]:

Property 1: No two consecutive nonzero bits in the Booth's recoded number $D$ have the same sign.
Property 2: The zero digits in the Booth's recoded number $D$ correspond to zero digits in the CSD form.
Property 3: The CSD representation can be obtained by converting each nonzero segment in the Booth's recoded number $D$.
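Booth's recoding, $D_i = x_{i-1} - x_i$, maps directly onto word-level bit operations. The sketch below is a minimal Python illustration and makes several assumptions of its own: the input is treated as unsigned (so an extra digit at position `width` is produced), $x_{-1} = 0$, and the sign word is formed as the complement of the shifted input ANDed with the input, which marks the positions where $x_i = 1$ and $x_{i-1} = 0$. The name `booth_recode` is illustrative.

```python
def booth_recode(x: int, width: int) -> list[int]:
    """Return the Booth-recoded digits of an unsigned integer, LSB first."""
    if not 0 <= x < (1 << width):
        raise ValueError("x must fit in `width` unsigned bits")
    shifted = x << 1            # bit i of `shifted` is x_{i-1}
    d_mag = shifted ^ x         # nonzero digit where x_i != x_{i-1}
    d_sign = ~shifted & x       # negative digit where x_i = 1, x_{i-1} = 0
    digits = []
    for i in range(width + 1):  # assumption: one extra digit for unsigned x
        magnitude = (d_mag >> i) & 1
        negative = (d_sign >> i) & 1
        digits.append(-magnitude if negative else magnitude)
    return digits


# Unsigned 5 = 101 in binary becomes 1 -1 1 -1 (MSB first): 8 - 4 + 2 - 1.
print(booth_recode(0b101, 3))  # [-1, 1, -1, 1]
```

The result has the same value as the input but is generally not sparse, which is why a further conversion of each nonzero segment is needed to reach CSD form.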

Table IV. Table for Proposed Algorithm (× marks an input that may be 0 or 1)

D_i^m   y_{i-1}^m  |  y_i^m
  0        ×       |    0
  1        1       |    0
  1        0       |    1
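Table IV says that a magnitude bit is set only when the Booth magnitude bit at that position is 1 and the magnitude bit one position lower is 0. A minimal Python sketch of a conversion built on this rule follows. It is an illustration under assumptions, not the authors' circuit: the input is unsigned, $x_{-1} = 0$, the magnitude bit below position 0 is 0, bits above `width - 1` are 0, and the sign of digit $i$ is read from input bit $x_{i+1}$. The name `csd_proposed` is hypothetical.

```python
def csd_proposed(x: int, width: int) -> list[int]:
    """Convert an unsigned `width`-bit integer to CSD digits, LSB first."""
    if not 0 <= x < (1 << width):
        raise ValueError("x must fit in `width` unsigned bits")
    # Booth magnitude word: bit i is x_i XOR x_{i-1}, with x_{-1} = 0.
    d_mag = (x << 1) ^ x
    digits = []
    prev_mag = 0  # assumed magnitude bit below position 0
    for i in range(width + 1):
        d_i = (d_mag >> i) & 1
        # Table IV: keep the Booth bit only if the lower digit is zero.
        mag = d_i & (prev_mag ^ 1)
        # Sign from the next higher input bit (0 above the top bit).
        sign = (x >> (i + 1)) & 1
        digits.append(-mag if sign else mag)
        prev_mag = mag
    return digits


# 11 = 1011 in binary becomes 1 0 -1 0 -1 (MSB first): 16 - 4 - 1.
print(csd_proposed(11, 4))  # [-1, 0, -1, 0, 1]
```

Only the magnitude chain carries a dependency from one digit to the next; the Booth bits and the sign bits depend on the input alone, so in hardware they can be formed for all positions at once.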

Figure 3. Hardware implementation for proposed algorithm

III. PROPOSED TWO'S COMPLEMENT TO CSD CONVERSION METHOD

As in [4], we use the following representation for the signed digits: $y_i = (y^s, y^m)$, where $1 = (0, 1)$, $\bar{1} = (1, 1)$, and $0 = (\times, 0)$. Here $y^s$ and $y^m$ are the sign and magnitude bits of the converted CSD digits, and $\times$ stands for 0 or 1. The proposed CSD conversion algorithm is performed in three steps:

Step 1: Use (4) to generate the Booth's recoded magnitude, $D^m$, for the input $x$. This step can be performed concurrently for each bit, with a constant time delay, i.e. one XOR gate delay.

Step 2: Based on Table IV, assuming $y_{-1}^m = 0$, each CSD magnitude bit $y_i^m$ is generated by

\[ y_i^m = D_i^m \, \bar{y}_{i-1}^m \]

for all $i \in [0, W]$. The delay for this step is $W$ 2-input AND gate delays, ignoring the time required for inversion.

Step 3: If we define $x_{W+1} = x_W = 0$ (which does not change the value of the $W$-bit value $x$), then the sign bit of the resulting CSD digits is generated by

\[ y_i^s = x_{i+1} \]

for all $i \in [0, W]$. This step does not add to the latency since it can be performed concurrently with Step 1 and Step 2.

To prove that our proposed algorithm yields a valid CSD number, we need to prove that it has the minimum number of nonzero digits and no two consecutive digits are nonzero, and that the value of the representation is correct for the represented number.

Lemma 1: No two consecutive digits are nonzero, that is, $y_i^m y_{i-1}^m = 0$.

Proof: Since $y_i^m = D_i^m \bar{y}_{i-1}^m$, we have

\[ y_i^m y_{i-1}^m = D_i^m (\bar{y}_{i-1}^m y_{i-1}^m) = D_i^m \cdot 0 = 0 \]

Lemma 2: The value of the representation is correct, that is,

\[ y = \sum_{i=0}^{W} y_i^m (-1)^{y_i^s} 2^i = \sum_{i=0}^{W-1} x_i 2^i . \]

Sketch of Proof (by induction): For $W = 1$, we consider two cases: (1) $x_0 = x_1 = x_{-1} = 0$ and (2) $x_0 = 1$ and $x_1 = x_{-1} = x_2 = y_{-1}^m = 0$. In both cases, straightforward computation shows that the lemma is true. Now suppose the lemma holds for $W = n$; we show that it is true for $W = n + 1$. By the induction hypothesis,

\[ \sum_{i=0}^{n} (x_i \oplus x_{i-1}) \, \bar{y}_{i-1}^m (-1)^{x_{i+1}} 2^i = \sum_{i=0}^{n-1} x_i 2^i \tag{6} \]

Applying the basic definitions and using Boolean algebra, we can show that

\[ \sum_{i=0}^{n+1} (x_i \oplus x_{i-1}) \, \bar{y}_{i-1}^m (-1)^{x_{i+1}} 2^i = \sum_{i=0}^{n} x_i 2^i \tag{7} \]

The two cases of $x_n = 0$ and $x_n = 1$ can be considered separately to facilitate the computations.

Theorem 1: The proposed conversion method works for any value of an unsigned binary number $x$ with $W$ bits.

Proof: The proof follows directly from Lemma 1, the sparseness of the representation, and Lemma 2.

The corresponding hardware implementation is shown in Fig. 3. Note that if, in Step 3, instead of setting $x_W = 0$, we set $x_W = x_{W-1}$ (sign extension), our proposed conversion scheme works for a two's complement input as well.

IV. PERFORMANCE COMPARISON

Based on the characteristics of CMOS gates from [9], listed in Table V, we compare the area cost and time delay of our proposed algorithm with those of other CSD/MSD conversion algorithms. We assume that unsigned binary numbers are used as inputs, and a bit-parallel scheme is selected for implementation.

A. Propagation Delay Estimation

Considering the critical path for each implementation, we estimate the total propagation delays $T_{proposed}$, $T_{RtoL}$, and $T_{LtoR}$ for the proposed scheme, right-to-left scheme, and left-to-right scheme as follows:

\[ T_{proposed} = t_{xor} + (t_{and} + t_{not})(W + 1) = 0.25W + 0.58 \]
\[ T_{RtoL} = (t_{and} + t_{or3})W + t_{xor} = 0.44W + 0.33 \]

Figure 5. Area usage comparison (total area usage versus input x word length for the proposed, typical right-to-left, and typical left-to-right algorithms)

Table V. Characteristics of a Family of CMOS Gates [9]

Gate Type   Fan-in   Propagation Delay (ns)   Load Factor   Size (gates)
AND           2        0.16 + 0.027L             1.0            2
OR            3        0.23 + 0.025L             1.0            2
OR            4        0.29 + 0.032L             1.0            3
NOT           1        0.04 + 0.028L             1.0            1
XOR           2        0.30 + 0.029L             1.1            3
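The gate delays of Table V can be combined into the critical-path estimates numerically. The Python sketch below is a rough check under one assumption that the paper does not state explicitly: every gate drives a load of L = 1. With that choice the slopes and intercepts come out within rounding of the printed coefficients (0.25W + 0.58, 0.44W + 0.33, and 0.51W + 0.57). The dictionary and function names are illustrative.

```python
LOAD = 1.0  # assumption: each gate drives one unit load (L = 1)

# Gate delays in ns, taken from the delay column of Table V.
GATE_DELAY_NS = {
    "and2": 0.16 + 0.027 * LOAD,
    "or3": 0.23 + 0.025 * LOAD,
    "or4": 0.29 + 0.032 * LOAD,
    "not": 0.04 + 0.028 * LOAD,
    "xor2": 0.30 + 0.029 * LOAD,
}


def delay_proposed(w: int) -> float:
    """One XOR, then an AND plus inverter for each of the W + 1 digits."""
    t = GATE_DELAY_NS
    return t["xor2"] + (t["and2"] + t["not"]) * (w + 1)


def delay_right_to_left(w: int) -> float:
    """An AND and a 3-input OR per bit for the carry, then one XOR."""
    t = GATE_DELAY_NS
    return (t["and2"] + t["or3"]) * w + t["xor2"]


def delay_left_to_right(w: int) -> float:
    """AND, NOT and 3-input OR per bit, then AND, NOT and a 4-input OR."""
    t = GATE_DELAY_NS
    per_bit = t["and2"] + t["not"] + t["or3"]
    return per_bit * w + t["and2"] + t["not"] + t["or4"]


for w in (8, 16, 30):
    print(w, round(delay_proposed(w), 2),
          round(delay_right_to_left(w), 2),
          round(delay_left_to_right(w), 2))
```

Changing `LOAD` shows how sensitive the comparison is to the fan-out assumption, since every gate delay in Table V is linear in L.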

\[ T_{LtoR} = (t_{and} + t_{not} + t_{or3})W + t_{and} + t_{not} + t_{or4} = 0.51W + 0.57 \]

where $t_{xor}$, $t_{and}$, $t_{not}$, $t_{or3}$, and $t_{or4}$ are the time delays of the 2-input XOR, 2-input AND, NOT, 3-input OR, and 4-input OR gates, respectively, and $W$ is the length of the input $x$. The time delays of the three algorithms are plotted in Fig. 4 for values of $W$ between 1 and 30.

Figure 4. Propagation delay comparison

B. Area Usage Estimation

Since the three schemes are implemented with the gates listed in Table V, the area usages $A_{proposed}$, $A_{RtoL}$, and $A_{LtoR}$ for the proposed scheme, right-to-left scheme, and left-to-right scheme are estimated as follows:

\[ A_{proposed} = A_{xor}(W + 1) + (A_{not} + A_{and})(W + 1) = 6W + 6 \]
\[ A_{RtoL} = (3A_{and} + A_{or3})W + A_{xor}(W + 1) = 11W + 3 \]
\[ A_{LtoR} = (5A_{not} + 6A_{and} + A_{or4} + A_{or3})(W - 1) + 5A_{not} + 4A_{and} + A_{or4} = 22W + 16 \]

where $A_{xor}$, $A_{not}$, $A_{and}$, $A_{or3}$, and $A_{or4}$ are the area costs of the 2-input XOR, NOT, 2-input AND, 3-input OR, and 4-input OR gates, respectively. We plot the area requirements in Fig. 5 for values of $W$ between 1 and 30.

Our proposed conversion method has the least propagation delay and requires the smallest area. The critical path propagation for the left-to-right method is $2(W + 1)$ gate delays, ignoring NOT gates, which is one more gate delay than Reitwiesner's method, and its area cost is larger since it needs more gates than Reitwiesner's method to generate each CSD digit pair $(y_i^s, y_i^m)$.

V. CONCLUSIONS

In this work, a new right-to-left CSD conversion method is proposed. Its performance is compared with that of other CSD and MSD conversion algorithms. Our proposed method provides faster conversion and requires less area than the two CSD and MSD conversion methods considered. CSD representation can be used to implement fast multipliers with cost that increases linearly with the number of nonzero digits in the CSD multiplier representation. Our proposed CSD conversion method can be used to reduce area cost and increase clock speed by decreasing the number of necessary operations in applications such as vector multiplication, exponentiation, and the implementation of digital filters.

REFERENCES

[1] S.M. Kim, J.G. Chung, and K.K. Parhi, "Design of low error CSD fixed-width multiplier," in Proc. 2002 IEEE ISCAS, Scottsdale, AZ, May 2002, pp. I-69 to I-72.
[2] M.A. Soderstrand, "CSD multipliers for FPGA DSP applications," ISCAS'03, May 2003.
[3] A.D. Booth, "A Signed Binary Multiplication Technique," Quarterly J. Mechanics and Applied Math., vol. 4, pp. 236-240, 1951.
[4] G.W. Reitwiesner, "Binary arithmetic," Advances in Computers, vol. 1, pp. 261-265, 1960.
[5] Y.C. Lim, J.B. Evans, and B. Liu, "Decomposition of binary integers into signed power-of-two terms," IEEE Trans. on Circuits and Systems, vol. 38, no. 6, pp. 667-672, June 1991.
[6] M. Joye and S.-M. Yen, "Optimal left-to-right binary signed-digit recoding," IEEE Trans. Computers, vol. 49, no. 7, pp. 740-748, July 2000.
[7] Y. Wang, L. S. DeBrunner, D. Zhou, and V. E. DeBrunner, "A novel multiplierless hardware implementation method for adaptive filter coefficients," in Proc. IEEE Intl. Conf. Acoust., Speech, Signal Proc., 2007.
[8] K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley, 1999.
[9] M. D. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann, 2004.
