
Low-power parallel multiplier with column bypassing

M.-C. Wen, S.-J. Wang and Y.-N. Lin


A low-power parallel multiplier design, in which some columns in the multiplier array can be turned off whenever their outputs are known, is proposed. Unlike previous designs, this design maintains the original array structure and does not introduce extra boundary cells. Experimental results show that it saves 10% of power for random inputs. Higher power reduction can be achieved if the operands contain more 0s than 1s.

(enclosed in the circle) can be bypassed, and the outputs from the first row are fed directly to the third-row CSA. However, since the rightmost FA in the second row is disabled, it does not execute the addition, and thus the output is not correct. To remedy this problem, an extra circuit must be added; its elements are located in the triangular area in Fig. 2.

Introduction: Multiplication is an essential arithmetic operation for common DSP applications, such as filtering and fast Fourier transform (FFT). To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, and thus power-efficient multipliers are very important for the design of low-power DSP systems. CMOS is currently the dominant technology in digital VLSI. Two components contribute to the power dissipation in CMOS circuits. The static dissipation is due to leakage current, while the dynamic power dissipation is due to switching transient current as well as charging and discharging of load capacitances. Since the amount of leakage current is usually small, the major source of power dissipation in CMOS circuits is the dynamic power dissipation. Dynamic power dissipation appears only when a CMOS gate switches from one stable state to another. Thus, the power consumption can be reduced if one can reduce the switching activity of a given logic circuit without changing its function. Many low-power multiplier designs can be found in the literature. A straightforward approach is to design a full adder (FA) that consumes less power [1]. Power reduction can also be achieved through structural modification. For example, rows of partial products can be ignored [2].

Parallel multiplier: Consider the multiplication of two unsigned n-bit numbers, where $A = a_{n-1}a_{n-2}\ldots a_0$ is the multiplicand and $B = b_{n-1}b_{n-2}\ldots b_0$ is the multiplier. The product $P = p_{2n-1}p_{2n-2}\ldots p_0$ can be written as follows:

$$P = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_i b_j 2^{i+j}$$
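The double sum over partial products can be checked numerically. The following is a minimal Python sketch, not part of the original letter; the function names are illustrative assumptions.

```python
def bits_lsb_first(value: int, n: int) -> list[int]:
    """Return the n least-significant bits of value, bit 0 first."""
    return [(value >> k) & 1 for k in range(n)]


def product_from_partial_products(a_val: int, b_val: int, n: int) -> int:
    """Evaluate P as the sum over i, j of a_i * b_j * 2^(i+j)."""
    a = bits_lsb_first(a_val, n)
    b = bits_lsb_first(b_val, n)
    total = 0
    for i in range(n):
        for j in range(n):
            # Each partial product a_i b_j carries the weight 2^(i+j).
            total += (a[i] & b[j]) << (i + j)
    return total
```

For any pair of unsigned n-bit operands the result should agree with ordinary integer multiplication, for example 1010 × 1111 (10 × 15) with n = 4.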

Fig. 2 4 × 4 Braun multiplier with row-bypassing

An array implementation, known as the Braun multiplier [3], is shown in Fig. 1. The Baugh-Wooley multiplier, on the other hand, uses the same array structure to handle 2's complement multiplication, with some of the partial products replaced by their complements. The multiplier array consists of $(n-1)$ rows of carry-save adders (CSA), in which each row contains $(n-1)$ FA cells. Each FA in the CSA array has two outputs: the sum bit goes down while the carry bit goes to the lower-left FA. An FA in the first row has only two valid inputs, and the third input bit is set to 0. Therefore, it can be replaced by a two-input half-adder. The last row is a ripple adder for carry propagation. In this Letter, we propose a low-power design for this multiplier.

Proposed method: Instead of bypassing rows of full adders, we propose a multiplier design in which columns of adders are bypassed. In this approach, the operations in a column can be disabled if the corresponding bit in the multiplicand is 0. There are two advantages to this approach. First, it eliminates the extra correcting circuit shown in Fig. 2. Secondly, the modified FA is simpler than that used in the row-bypassing multiplier.

Assume that we execute 1010 × 1111 in Fig. 1. It can be verified that, for FAs in the first and third diagonals, two out of the three input bits are 0: the carry bit from the upper-right FA, and the partial product $a_i b_j$ (note that $a_0 = a_2 = 0$). As a result, the output carry bit of such an FA is 0, and the output sum bit is simply equal to the third bit, which is the sum output of its upper FA. Theorem 1 shows that this is true in general. Therefore, when $a_i$ is 0, the operations in the corresponding diagonal can be disabled since all the outputs are known. We refer to the FAs in a diagonal in Fig. 1 as a column. Let $FA_{i,j}$ be the full adder located in row i and column j, $0 \le i, j \le n-2$, in the $(n-1) \times (n-1)$ array, as shown in Fig. 1. $FA_{0,0}$ is the adder at the upper-right corner. The following theorem establishes the justification for column bypassing.

Theorem 1: When $a_j = 0$, the outputs of a column j adder cell $FA_{i,j}$ can be specified as follows.
1. The output carry bit is 0.
2. The output sum bit is equal to the output sum bit of $FA_{i-1,j+1}$.

Proof: We prove this theorem by induction on the row index i.
1. Consider row 0. Note that, in row 0, there are only two bits to be added. Adder $FA_{0,j}$ computes $a_j b_1 + a_{j+1} b_0$. If $a_j = 0$, then the output carry bit must be zero, and the output sum bit is equal to $a_{j+1} b_0$.
2. Assume that the theorem holds for row i.
3. In row $i+1$, the inputs of $FA_{i+1,j}$ are the carry bit from $FA_{i,j}$, the sum bit from $FA_{i,j+1}$, and the partial product $a_j b_{i+2}$. Since $a_j = 0$, the partial product is 0 and, by the induction hypothesis, so is the carry bit from $FA_{i,j}$. Two out of the three inputs are therefore 0, so the output carry bit is 0 and the output sum bit is equal to the sum bit sent by $FA_{i,j+1}$. □

According to Theorem 1, when $a_j = 0$, the operations in column j can be ignored and thus the full adders can be disabled since the outputs are known.
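Theorem 1 can be checked exhaustively for small n with a bit-level model of the CSA array. The Python sketch below is an illustration and not part of the original letter: the function names are assumptions, the indexing follows the $FA_{i,j}$ convention used in the text, the leftmost cell of rows 1 and above is assumed to take the partial product $a_{n-1} b_i$ in place of a sum input, and the final ripple adder is modelled as plain integer addition.

```python
def braun_csa(a_val: int, b_val: int, n: int):
    """Simulate the (n-1) x (n-1) CSA array of an n-bit Braun multiplier.

    Returns (product, s, c, s_in), where s[i][j] and c[i][j] are the sum and
    carry outputs of FA(i, j) and s_in[i][j] is the sum-type input it received.
    """
    a = [(a_val >> k) & 1 for k in range(n)]
    b = [(b_val >> k) & 1 for k in range(n)]
    size = n - 1
    s = [[0] * size for _ in range(size)]
    c = [[0] * size for _ in range(size)]
    s_in = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            pp = a[j] & b[i + 1]  # partial product owned by column j
            if i == 0:
                sin, cin = a[j + 1] & b[0], 0
            else:
                # Sum arrives from FA(i-1, j+1); the leftmost cell is assumed
                # to take the partial product a_{n-1} b_i instead.
                sin = s[i - 1][j + 1] if j + 1 < size else a[n - 1] & b[i]
                cin = c[i - 1][j]
            total = pp + sin + cin
            s[i][j], c[i][j], s_in[i][j] = total & 1, total >> 1, sin
    last = size - 1
    product = a[0] & b[0]
    for i in range(size):
        product += s[i][0] << (i + 1)          # p_1 .. p_{n-1}
    for j in range(1, size):
        product += s[last][j] << (n - 1 + j)   # sums entering the final adder
    for j in range(size):
        product += c[last][j] << (n + j)       # carries entering the final adder
    product += (a[n - 1] & b[n - 1]) << (2 * n - 2)
    return product, s, c, s_in


def theorem1_holds(n: int) -> bool:
    """Check both claims of Theorem 1 for every pair of n-bit operands."""
    for a_val in range(1 << n):
        for b_val in range(1 << n):
            product, s, c, s_in = braun_csa(a_val, b_val, n)
            if product != a_val * b_val:
                return False
            for j in range(n - 1):
                if (a_val >> j) & 1:
                    continue
                for i in range(n - 1):
                    if c[i][j] != 0 or s[i][j] != s_in[i][j]:
                        return False
    return True
```

Calling `theorem1_holds(4)` covers all 256 operand pairs of the 4 × 4 case, including the 1010 × 1111 example.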
Fig. 1 4 × 4 Braun multiplier

Low-power multipliers with row-bypassing: A low-power multiplier design may disable the operations in some rows to save power [2]. If bit $b_j$ is 0, all partial products $a_i b_j$, $0 \le i \le n-1$, are zero. Therefore, the additions in the corresponding row in Fig. 1 can be bypassed. The row-bypassing multiplier is shown in Fig. 2. Each cell in the CSA array is augmented with three tri-state gates and two multiplexers. For example, let $b_2$ be 0 in Fig. 2. In this case, the CSA in the second row

Fig. 3 4 × 4 column-bypassing multiplier

ELECTRONICS LETTERS 12th May 2005 Vol. 41 No. 10

Multiplier design: The column-bypassing multiplier is shown in Fig. 3. Note that we need only two tri-state gates and one multiplexer in a modified adder cell. If $a_j = 0$, the FA is disabled. We do not need a tri-state gate for the carry input ($C_{i-1,j}$), for the following reason. In a Braun multiplier, each FA in the first row (i.e. row 0) has only two inputs. Therefore, when $a_j = 0$, the two inputs of $FA_{0,j}$ are disabled, and thus its output carry bit does not change. Consequently, all three inputs of $FA_{1,j}$ are fixed, which prevents its outputs from changing. At the bottom of the CSA array, we need to set the carry outputs to 0. Otherwise, the corresponding FAs may not produce the correct outputs since their inputs are disabled. This is done by adding an AND gate at the outputs of the last-row CSA adders.
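The behaviour described above can be illustrated with a functional model. The Python sketch below is an assumption-laden illustration rather than the authors' circuit: it assumes that the multiplexer of a disabled cell forwards the incoming sum bit, it models the held carry output of a disabled cell as an arbitrary stale bit, and it gates the last-row carries with $a_j$ to stand in for the AND gates. The function name and the `rng` parameter are hypothetical.

```python
import random


def column_bypass_multiply(a_val: int, b_val: int, n: int, rng=None):
    """Functional model of an n-bit column-bypassing multiplier.

    Returns (product, bypassed_cells). Cells in column j are bypassed
    whenever a_j = 0.
    """
    if rng is None:
        rng = random.Random(0)
    a = [(a_val >> k) & 1 for k in range(n)]
    b = [(b_val >> k) & 1 for k in range(n)]
    size = n - 1
    s = [[0] * size for _ in range(size)]
    c = [[0] * size for _ in range(size)]
    bypassed = 0
    for i in range(size):
        for j in range(size):
            if i == 0:
                sin, cin = a[j + 1] & b[0], 0
            else:
                sin = s[i - 1][j + 1] if j + 1 < size else a[n - 1] & b[i]
                cin = c[i - 1][j]
            if a[j] == 0:
                s[i][j] = sin                # assumed mux: forward incoming sum
                c[i][j] = rng.randint(0, 1)  # held carry output, possibly stale
                bypassed += 1
            else:
                total = (a[j] & b[i + 1]) + sin + cin
                s[i][j], c[i][j] = total & 1, total >> 1
    last = size - 1
    product = a[0] & b[0]
    for i in range(size):
        product += s[i][0] << (i + 1)
    for j in range(1, size):
        product += s[last][j] << (n - 1 + j)
    for j in range(size):
        # AND gate: a stale carry from a disabled column is forced to 0.
        product += (c[last][j] & a[j]) << (n + j)
    product += (a[n - 1] & b[n - 1]) << (2 * n - 2)
    return product, bypassed
```

In this model a stale carry can only reach another disabled cell of the same column or the gated output at the bottom of the array, which is why no tri-state gate is modelled on the carry input. For 1010 × 1111, columns 0 and 2 are bypassed, that is, six of the nine CSA cells.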

Table 2: Area (μm²)

Multiplier type    4 × 4    (%)    8 × 8    (%)    16 × 16    (%)
Braun               8672    100    33286    100     131040    100
[2]                13692    158    48991    147     185367    141
Proposed           10063    116    40236    121     162131    124

Conclusion: We have presented a new low-power parallel multiplier design, which disables the operations in columns of full adders. Compared with row-bypassing, this technique achieves higher power reduction with lower hardware overhead.

© IEE 2005
Electronics Letters online no: 20050464
doi: 10.1049/el:20050464
2 February 2005

Results: To evaluate the performance of this low-power multiplier, we implemented the design in TSMC 0.35 μm technology. We compare the performance of this design with a normal Braun multiplier and the row-bypassing multiplier [2]; the results are given as follows. Table 1 gives the power consumption of the three designs. In this experiment, the input patterns are assumed to be random, i.e. the probabilities of 0 and 1 are both 0.5. The power is estimated by running HSPICE. Note that this is a relatively pessimistic estimate: if the operands are sparse (i.e. they contain more 0s than 1s), the power saving will be greater. Our results show that the row-bypassing multipliers actually consume more power than the Braun multiplier, possibly due to the extra logic. Our design consumes less power in all cases, and the reduction increases as the size becomes larger. The areas of the three designs are listed in Table 2. In our design, the area overhead is roughly 20%, while the area overheads of the row-bypassing multipliers are more than 40%.

M.-C. Wen, S.-J. Wang and Y.-N. Lin (Department of Computer Science, National Chung-Hsing University, 250 Kuo-Kuan Road, Taichung 40227, Taiwan)
E-mail: sjwang@cs.nchu.edu.tw

References
1 Wu, A.: High performance adder cell for low power pipelined multiplier. Proc. IEEE Int. Symp. on Circuits and Systems, May 1996, Vol. 4, pp. 57-60
2 Ohban, J., Moshnyaga, V.G., and Inoue, K.: Multiplier energy reduction through bypassing of partial products. Proc. Asia-Pacific Conf. on Circuits and Systems, 2002, Vol. 2, pp. 13-17
3 Abu-Khater, I.S., Bellaouar, A., and Elmasry, M.: Circuit techniques for CMOS low-power high-performance multipliers, IEEE J. Solid-State Circuits, 1996, 31, (10), pp. 1535-1546

Table 1: Power (mW)

Multiplier type    4 × 4     (%)     8 × 8    (%)     16 × 16    (%)
Braun              0.4325    100     2.31     100     8.01       100
[2]                0.5537    128     2.76     119     8.26       103
Proposed           0.4298    99.4    2.25     97.4    7.15       89.3
