
2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits (AP-ASIC2004) / Aug. 4-5, 2004
4-3

A Low-Power Booth Multiplier Using Novel Data Partition Method


Jongsu Park, San Kim and Yong-Surk Lee
Processor Laboratory, Department of Electrical and Electronic Engineering, Yonsei University
134 Shinchon-dong, Seodaemun-gu, Seoul, Korea
E-mail: jspark@dubiki.yonsei.ac.kr

Abstract
The Booth algorithm produces Booth encoded products with a value of zero when the input data stream contains runs of equal bit values. Therefore, partial products have a greater chance of being zero when, of the two inputs, the one with the smaller dynamic range is used as the multiplier. To minimize the switching activity of partial products, we propose a novel multiplication algorithm and its associated architecture. The proposed algorithm divides one multiplication expression into four multiplication expressions, each of which is computed independently, and the results of the four multiplications are then added. As a result, the two inputs can be exchanged more often during multiplication. Implementation results show that the proposed multiplier saves up to about 20% of the power dissipated by the previous Booth multiplier.

I. INTRODUCTION
Digital signal processing (DSP) is one of the core technologies necessary for the next generation of multimedia and mobile communication systems [1]. Most DSP applications involve addition and multiplication. For example, the DCT, FFT, wavelet transform, and OFDM are essential DSP algorithms used for image and video processing, audio signal processing, and mobile communications [2][3][4]. Currently, many portable information devices are battery-powered. Multiplication is complex and dissipates a large amount of power because the partial products must be summed. Therefore, low-power multiplication is a key concern for battery-powered multimedia devices.
In a CMOS circuit, power consumption can be reduced by lowering the switching activity in the circuit, as expressed in Equation (1):

$$P_{\mathrm{switching}} = \alpha C V_{dd}^{2} f_{clk} \tag{1}$$

where $\alpha$ is the switching activity parameter, $C$ is the load capacitance, $V_{dd}$ is the supply voltage, and $f_{clk}$ is the operating frequency. The product $\alpha C$ can also be viewed as the effective switched capacitance measured at the capacitor node during charging and discharging. The only parameter that can be reduced at the algorithmic level is the switching activity. Therefore, minimizing the switching activity at the algorithmic level should be considered first, before attempting the complex and expensive process of implementing a multiplier.
Many researchers have proposed methods to reduce power consumption by modifying conventional multiplication algorithms [1][6][7][8][9][10][11][12]. To reduce power consumption further, we propose a novel data partition method and a multiplication algorithm obtained by modifying the low-power Booth multiplier of [12].
The remainder of the paper is organized as follows: Section 2 describes the basics of the radix-4 Booth algorithm. Sections 3 and 4 describe an existing multiplication scheme and the proposed one, respectively. Section 5 presents experimental results, and conclusions are given in Section 6.

II. RADIX-4 BOOTH ALGORITHM
The radix-4 Booth algorithm is a powerful method to increase the speed of the radix-2 Booth algorithm, since more bits are inspected at each step, which reduces the total number of cycles needed to obtain the product. A multiplication takes two inputs: a multiplicand and a multiplier. To realize low-complexity two's complement multiplication, the radix-4 Booth algorithm can be applied to encode one of the two inputs, X and Y. If the data series of X is used for Booth encoding, each datum of X is partitioned into overlapping 3-bit groups. The two's complement value of X, with a word length of W, can be represented by

$$X = -x_{W-1}2^{W-1} + \sum_{k=0}^{W-2} x_k 2^{k} = \sum_{i=0}^{W/2-1} \left(x_{2i-1} + x_{2i} - 2x_{2i+1}\right) 2^{2i}, \qquad x_{-1} = 0 \tag{2}$$

Here, W is assumed to be an even number. When the other input datum Y is multiplied by X, Equation (2) can be rewritten as

$$X \times Y = \sum_{i=0}^{W/2-1} B(x_{2i+1}, x_{2i}, x_{2i-1}, Y)\, 2^{2i}, \qquad B(x_{2i+1}, x_{2i}, x_{2i-1}, Y) = \left(x_{2i-1} + x_{2i} - 2x_{2i+1}\right) Y \tag{3}$$
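As a minimal sketch of Equations (2) and (3), the Python code below recodes a W-bit two's complement operand X into radix-4 Booth digits and rebuilds X × Y from the Booth encoded products. It is an illustration rather than the authors' implementation; the function names are hypothetical, and Python integers stand in for W-bit hardware words.

```python
def booth_digits(x: int, width: int) -> list[int]:
    """Radix-4 Booth digits d_i = x_{2i-1} + x_{2i} - 2*x_{2i+1}, i = 0..W/2-1.

    x is the value of a W-bit two's complement word (it may be negative);
    x_{-1} is taken as 0, as in Equation (2). Digits are returned least
    significant first.
    """
    if width % 2:
        raise ValueError("W is assumed to be even")
    pattern = x & ((1 << width) - 1)  # W-bit two's complement bit pattern

    def bit(k: int) -> int:
        return 0 if k < 0 else (pattern >> k) & 1

    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1) for i in range(width // 2)]


def booth_product(x: int, y: int, width: int) -> int:
    """Equation (3): sum of Booth encoded products B = d_i * Y, weighted by 2^(2i)."""
    return sum(d * y * 4**i for i, d in enumerate(booth_digits(x, width)))


# Exhaustive check over all pairs of 8-bit two's complement operands.
for x in range(-128, 128):
    for y in range(-128, 128):
        assert booth_product(x, y, 8) == x * y

print(booth_digits(-74, 8))  # 10110110 -> [-2, 2, -1, -1]
```

Each digit lies in {-2, -1, 0, 1, 2}, which is why the Booth encoded product takes only the values -2Y, -Y, 0, Y, and 2Y.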


As shown in Equation (3), the Booth encoded product $B(x_{2i+1}, x_{2i}, x_{2i-1}, Y)$ takes one of the values -2Y, -Y, 0, Y, and 2Y. We can observe that the two-input multiplication is replaced with a sum of Booth encoded products. As shown in Table 1, the Booth encoded product is zero when three consecutive bits have the same value (0 or 1). A Booth encoded product with a value of zero does not produce a partial product. Therefore, to reduce power dissipation, we must produce as many zero-valued Booth encoded products as possible.
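The mapping behind Table 1 is the standard radix-4 recoding. The sketch below, an illustration with hypothetical names that again assumes x_{-1} = 0, prints the eight rows of that mapping and counts how many zero-valued Booth encoded products an operand yields, which is the quantity the schemes in the following sections try to increase.

```python
from itertools import product

# Multiple of Y selected by each Booth digit value.
LABELS = {-2: "-2Y", -1: "-Y", 0: "0", 1: "+Y", 2: "+2Y"}


def booth_encode(x_hi: int, x_mid: int, x_lo: int) -> int:
    """Booth digit for the bit triple (x_{2i+1}, x_{2i}, x_{2i-1})."""
    return x_lo + x_mid - 2 * x_hi


def zero_products(pattern: int, width: int) -> int:
    """Number of zero-valued Booth encoded products for a W-bit bit pattern."""

    def bit(k: int) -> int:
        return 0 if k < 0 else (pattern >> k) & 1  # x_{-1} = 0

    return sum(
        booth_encode(bit(2 * i + 1), bit(2 * i), bit(2 * i - 1)) == 0
        for i in range(width // 2)
    )


# Rows of the recoding table: bit triple -> selected multiple of Y.
for triple in product((0, 1), repeat=3):
    print("".join(map(str, triple)), LABELS[booth_encode(*triple)])

# Small dynamic range -> many zero products; large dynamic range -> none.
for p in (0b00000000, 0b11111111, 0b01010101, 0b10101010):
    print(f"{p:08b}: {zero_products(p, 8)} of 4 products are zero")
```

Only the triples 000 and 111 select zero. For 11111111, three of the four products are zero rather than four, because the lowest window sees x_{-1} = 0.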

Table 1 Radix-4 Booth Encoded Product

Fig. 1 Multiplication proposed in [12] (blocks: switcher; outputs: multiplicand, multiplier)

III. PREVIOUS MULTIPLIER
A. Dynamic Range and the Booth Encoded Product
First of all, we must understand the concept of dynamic range. Here, dynamic range refers to how often the value changes between sequential bits of the binary data. For example, '0000' and '1111' have the smallest dynamic range, whereas '0101' and '1010' have large dynamic ranges. When the input with the smaller dynamic range of the two is the one that is Booth encoded, the partial products have a greater chance of being zero. Therefore, if we find a scheme that produces more zero-valued Booth encoded products, we can reduce power dissipation without any additional computation, since the number of partial products in the multiplication decreases.

B. Shen's Multiplication Algorithm
Shen et al. proposed a multiplication algorithm for low power dissipation [12]. The algorithm concentrates on reducing the switching activity of partial products. As explained in Table 1, Booth encoding an input with a smaller dynamic range produces more partial products with a value of zero. Therefore, of the two inputs, the one with the smaller dynamic range should be the multiplier rather than the multiplicand.
The comparator shown in Fig. 1 compares the effective dynamic ranges of the two inputs, and the switcher exchanges the two inputs if the dynamic range of the multiplicand is less than that of the multiplier. The multiplicand and multiplier at the switcher outputs are then used as the inputs of a conventional multiplier.
The limitation of this algorithm is that input exchanges may occur infrequently with actual data streams, because the method compares the two inputs over their entire word length in a single dynamic range comparison.

   11110101 x 10110110 = (1111 x 1011) x 100000000
                       + (1111 x 0110) x 10000
                       + (0101 x 1011) x 10000
                       + (0101 x 0110) x 1
Fig. 2 Example of multiplication with smaller number of bits

IV. PROPOSED MULTIPLIER
A. Multiplication Input Data Partitioning
We propose a low-power multiplication process with better power efficiency than the previous method [12]. The previous method simply compares the two inputs over their entire word length at once. Therefore, it has less chance of exchanging the two inputs for Booth encoding than the proposed method, in which each input is divided into several terms with fewer bits. For example, to increase the chance of data exchanges during multiplication, the multiplication can be rewritten as shown in Fig. 2, where each of the two inputs is divided into an upper part and a lower part. As shown in Fig. 2, the pair (11110101 x 10110110) is not exchanged in the previous multiplication scheme, but it is exchanged in the proposed multiplication.
With the proposed scheme, the chance of exchanges increases because four multiplication terms, each with fewer bits than the original inputs, are compared for Booth encoding. Therefore, the proposed multiplication increases the chance of partial products becoming zero and reduces the overall power dissipation with little additional hardware. The proposed multiplier also uses a faster parallel multiplication architecture operating on narrower operands than existing Booth multipliers.

B. Architecture of the Proposed Multiplier
Fig. 3 shows one example of the proposed multiplication architecture, where the two input data streams are divided into upper and lower parts.
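The input dividing step and the four-term recombination of Fig. 2 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' hardware: the function names are hypothetical, the part width is 4 bits as in Fig. 2, and the signed split follows our reading of Fig. 6 (Section IV.C), in which the lower part of a negative word is sign-extended with ones and one is added to the upper part, so that X = upper × 2^4 + lower still holds.

```python
def split_unsigned(value: int, part_bits: int = 4) -> tuple[int, int]:
    """Input dividing unit for unsigned words: returns (upper, lower)."""
    return value >> part_bits, value & ((1 << part_bits) - 1)


def split_signed(value: int, width: int = 8, part_bits: int = 4) -> tuple[int, int]:
    """Signed split (our reading of Fig. 6) with value == upper * 2**part_bits + lower.

    value is a signed integer in [-2**(width-1), 2**(width-1)).
    """
    pattern = value & ((1 << width) - 1)
    lower = pattern & ((1 << part_bits) - 1)
    upper_bits = width - part_bits
    upper = pattern >> part_bits
    if upper >> (upper_bits - 1):  # MSB of the upper part set: interpret as negative
        upper -= 1 << upper_bits
    if value < 0:
        # Negative word: treat the lower part as negative (sign-extended with ones)
        # and compensate by adding one to the upper part (steps 1-3 of Fig. 6).
        return upper + 1, lower - (1 << part_bits)
    return upper, lower


def four_term_product(xh: int, xl: int, yh: int, yl: int, part_bits: int = 4) -> int:
    """Fig. 2: four smaller multiplications, shifted into place and added."""
    return (
        ((xh * yh) << (2 * part_bits))
        + ((xh * yl) << part_bits)
        + ((xl * yh) << part_bits)
        + xl * yl
    )


# Fig. 2 example (unsigned operands): 11110101 x 10110110.
x, y = 0b11110101, 0b10110110
assert four_term_product(*split_unsigned(x), *split_unsigned(y)) == x * y

# Fig. 6 example: 10110110 (-74) x 0110 (6). The upper part becomes 1100 (-4)
# and the lower part (11)0110 (-10).
print(split_signed(-74), four_term_product(*split_signed(-74), *split_signed(6)))
# (-4, -10) -444
```

In hardware the four sub-multiplications run in parallel and the shifts are wiring; the sketch only checks that the decomposition is arithmetically exact.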


Fig. 3 Block diagram of the proposed multiplication

The multiplication scheme is composed of four modules: an input dividing unit, a dynamic range determination (DRD) unit, a radix-4 Booth multiplier, and an adder tree for summing the partial products.
The input dividing unit divides each input data stream into parts with fewer bits; in this example, upper and lower parts are used. These smaller data are multiplied independently using Booth encoding. The DRD module detects the effective dynamic range of the two inputs and exchanges them so that the following condition is met: the input with the larger effective dynamic range is the multiplicand, and the input with the smaller effective dynamic range is the multiplier. The micro-architecture of the DRD is shown in Fig. 4, where two 16-bit multiplication inputs are each divided into two 8-bit parts. The first and second groups of comparator inputs in the DRD module are the 4 MSBs and the next 4 LSBs of the 8-bit DRD input. The third bit of the DRD input is shared by the first and second groups, because the radix-4 Booth algorithm needs three bits (i+1, i, and i-1) at once.
Fig. 5 shows the comparator used in the DRD block. It consists of two AND gates and one OR gate. The comparator output is zero when all input bits have the same value (0 or 1).

C. Multiplication of Negative Input

   10110110 x 0110 = 1001000100
   Step 1) 1011 & 0110 x 0110
   Step 2) 1011+1 & 0110 x 0110
   Step 3) 1100 & 0110 x 0110
      1100 x 0110   and   (11)0110 x 0110, where (11) are the sign bits
   Output: 111101000(0000) + 1111000100 = 1001000100
Fig. 6 Multiplication of negative inputs

We need to take special care when partitioning negative data. For example, 100000000000001 is negative; however, the lower part of this data is positive when used with the Booth algorithm. We propose a novel data partition method to solve this problem.
Fig. 6 shows an example with a negative input. The process takes three steps: 1) partition the two input data; 2) add one to the upper part of the data; and 3) perform the same multiplication as in the normal case.

V. EXPERIMENTAL RESULTS
A. Analysis of Exchanging Ratio
We analyzed the input exchange ratios of the proposed multiplication scheme in DSP applications, using the discrete cosine transform (DCT). QCIF images (Lena, Flower Garden, Miss America, and Table Tennis) were used for these transforms. As shown in Table 2, the 8 x 8 DCTs of one image require 262,144 multiplications in these experiments.
On average, Shen's multiplication performs 10,276 data exchanges (3.91% of the total number of multiplications), whereas the proposed multiplication performs 27,708 data exchanges (10.56% of the total number of multiplications). The proposed multiplication thus increases the number of exchanges by about 2.7 times (10.56/3.91) compared with the previous multiplication, because it performs more multiplications with fewer bits than the previous method.
In those experiments, similar data exchange rates are
achieved with four different images. This is due to the fact
that DCT coefficients are fixed, and the exchanges occur in
similar data positions.
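The exchange decisions compared in this section can be sketched as follows for 8-bit words split into 4-bit parts, as in Fig. 2. Several details are assumptions rather than statements from the paper: the comparator is modelled by its stated behaviour (output 0 when all three bits are equal), the DRD metric is taken to be the number of 3-bit windows (i+1, i, i-1) whose comparator output is 1, that is, the number of nonzero radix-4 Booth digits, and a multiplication counts as exchanged under the proposed scheme when any of its four sub-multiplications swaps operands. Exhaustive 8-bit inputs are used only for illustration; they are not the DCT data above, so the resulting ratios are not comparable with the reported ones.

```python
from itertools import product


def comparator(a: int, b: int, c: int) -> int:
    """Fig. 5 behaviour: 0 when all three bits are equal (0 or 1), otherwise 1.

    Two ANDs and one OR detect the all-equal case; the final inversion is an
    assumption about the output polarity.
    """
    all_equal = (a & b & c) | ((1 - a) & (1 - b) & (1 - c))
    return 1 - all_equal


def dynamic_range(pattern: int, lo: int, hi: int) -> int:
    """Assumed DRD metric over bits lo..hi-1: windows (i+1, i, i-1) with output 1.

    The lowest window of an upper part reuses the bit just below it, mirroring
    the bit shared between DRD groups; below bit 0 the value 0 is used.
    """

    def bit(k: int) -> int:
        return 0 if k < 0 else (pattern >> k) & 1

    return sum(comparator(bit(i + 1), bit(i), bit(i - 1)) for i in range(lo, hi, 2))


def exchanged(mcand_dr: int, mplier_dr: int) -> bool:
    """Switcher rule: swap when the multiplicand's range is less than the multiplier's."""
    return mcand_dr < mplier_dr


def exchange_counts(pairs, width: int = 8, part_bits: int = 4) -> tuple[int, int]:
    """Exchanged multiplications for whole-word comparison (as in [12]) and for
    the partitioned comparison; pairs holds (multiplier, multiplicand) patterns."""
    parts = [(0, part_bits), (part_bits, width)]
    whole = split = 0
    for mplier, mcand in pairs:
        whole += exchanged(dynamic_range(mcand, 0, width), dynamic_range(mplier, 0, width))
        split += any(
            exchanged(dynamic_range(mcand, *p), dynamic_range(mplier, *q))
            for q in parts
            for p in parts
        )
    return whole, split


pairs = list(product(range(256), repeat=2))
whole, split = exchange_counts(pairs)
print(f"whole-word: {whole / len(pairs):.2%}, partitioned: {split / len(pairs):.2%}")
```

Counting nonzero Booth windows ties the metric directly to the number of nonzero partial products; a different reading of "dynamic range" would change which pairs are exchanged.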

Fig. 4 DRD of the proposed multiplication (labels: switcher, multiplicand)

Fig. 5 Comparator
B. Analysis of Power Dissipation


We compared the power dissipation of the proposed multiplier with that of existing Booth multipliers (Shen's [12], Yu's [10], and Ahn's [11]). We evaluated the power dissipation of the proposed multiplier after layout using the Synopsys PrimePower tool. Tables 3 and 4 show the overall power analysis results of the four multipliers when applied to the FFT and the wavelet transform of images. Averaged over the images, the power dissipation of the proposed multiplier in these DSP algorithms was reduced by up to about 7%, 15%, and 20% compared with Shen's, Ahn's, and Yu's multipliers, respectively. The power reduction ratio differs from the exchange ratio because the input data exchange ratio is based only on the dynamic range of the partitioned input data. Therefore, when two partitioned input data streams are exchanged, not all partial products of the Booth encoding necessarily become zero.

                 Proposed    Shen's      Yu's        Ahn's
                 Multiplier  Multiplier  Multiplier  Multiplier
Miss America     16.73       17.81       20.72       19.16
Lena             16.08       17.46       19.56       18.56
Flower Garden    16.82       18.07       20.06       19.46
Table Tennis     16.54       17.39       19.25       18.11

Table 4 Power analysis for wavelet transform application

VI. CONCLUSION
We proposed a low-power multiplier using a modified Booth algorithm. To reduce power dissipation, we partitioned the two multiplication input data streams into parts with fewer bits, so that partial products become zero with higher probability and the switching rate is lowered. Although the overall area of the proposed multiplier increases by up to 9%, its power dissipation can be reduced by up to about 20% of the total compared with the existing Booth multiplier. Therefore, the proposed multiplication process can be applied to low-power designs for portable multimedia information devices and SoCs, especially when low power consumption and high speed are the primary design constraints.

REFERENCES
[1] Chang-Young Han, Hyoung-Joon Park, and Lee-Sup Kim, "A low-power array multiplier using separated multiplication technique," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 9, pp. 866-871, Sep. 2001.
[2] C. Lemonds, "A high throughput 16 by 16 bit multiplier for DSP cores," IEEE International Symposium on Circuits and Systems (ISCAS), vol. 2, pp. 477-480, 1996.
[3] R. H. Turner, T. Courtney, and R. Woods, "Implementation of fixed DSP functions using the reduced coefficient multiplier," Proc. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 2, pp. 881-884, 2001.
[4] Yiquan Wu and Zhaoda Zhu, "The new real-multiplier FFT algorithms," Proc. IEEE 1993 National Aerospace and Electronics Conference (NAECON 1993), vol. 1, pp. 90-93, May 24-28, 1993.
[5] Yi-Wen Wu, O. T.-C. Chen, and Ruey-Liang Ma, "A low-power digital signal processor core by minimizing inter-data switching activities," Proc. 44th IEEE 2001 Midwest Symposium on Circuits and Systems (MWSCAS 2001), vol. 1, pp. 172-175, 2001.
[6] V. Paliouras, K. Karagianni, and T. Stouraitis, "A low-complexity combinatorial RNS multiplier," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 7, pp. 675-683, Jul. 2001.
[7] A. A. Fayed and M. A. Bayoumi, "A merged multiplier-accumulator for high speed signal processing applications," Proc. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. III-3212-III-3215, 2002.
[8] S. Kim and M. C. Papaefthymiou, "Reconfigurable low energy multiplier for multimedia system design," Proc. IEEE Computer Society Workshop on VLSI 2000, pp. 129-134, 2000.
[9] D. Bakalis, E. Kalligeros, D. Nikolos, H. T. Vergos, and G. Alexiou, "Low power BIST for Wallace tree-based multipliers," Proc. IEEE First International Symposium on Quality Electronic Design (ISQED 2000), pp. 433-438, 2000.
[10] Zhan Yu, L. Wasserman, and A. N. Willson, Jr., "A painless way to reduce power dissipation by over 18% in Booth-encoded carry-save array multipliers for DSP," Proc. 2000 IEEE Workshop on Signal Processing Systems (SiPS 2000), pp. 571-580, Oct. 11-13, 2000.
[11] Taekyoon Ahn and Kiyoung Choi, "Dynamic operand interchange for low power," Electronics Letters, vol. 33, no. 25, pp. 2118-2120, Dec. 4, 1997.
[12] Nan-Ying Shen and O. T.-C. Chen, "Low-power multipliers by minimizing switching activities of partial products," Proc. IEEE International Symposium on Circuits and Systems (ISCAS 2002), vol. 4, pp. IV-93-IV-96, 2002.
