ACMA: Accuracy-Configurable Multiplier Architecture For Error-Resilient System-on-Chip


Kartikeya Bhardwaj and Pravin S. Mane
Electrical and Electronics Engineering
BITS Pilani-Goa Campus, Goa 403 726, India
Email: bhardwajkartikeya@gmail.com, pravinmane@goa.bits-pilani.ac.in
Abstract: In the nanometer regime, optimizing System-on-Chip (SoC) designs with respect to speed, power, and area is a major concern for VLSI designers. Imprecise/approximate design relaxes the constraints on accuracy and gives rise to Speed-Power-Accuracy-Area (SPAA) metrics, which can lead to large improvements in speed and/or power in exchange for a small loss of accuracy. This benefit has drawn researchers toward imprecise/approximate VLSI design. In this paper, we present a new accuracy-configurable multiplier architecture (ACMA) for error-resilient systems. The ACMA uses a technique called Carry-in Prediction for approximate multiplication, based on efficient precomputation logic that increases its throughput. The proposed multiplication reduces the latency of an accurate multiplier by almost half by reducing its critical path. The simulation results suggest that the SPAA metrics can be tuned by running the design for an appropriate number of iterations. For 16-bit multiplication, the mean accuracy is 99.85% to 99.9% when there is no lower bound on the size of the operands, and 99.965% when the operands are 10 bits or more (numbers > 1000).

Index Terms: Approximate Arithmetic; Error-Resilient Designs; Multiplier Architecture; Accuracy-Configurable Multiplier

I. INTRODUCTION

With the recent progress in nanometer-scale technology, shrinking transistor sizes have led to the integration of many cores on a single chip, thereby increasing system performance. This continuous technology scaling poses new design challenges because of contradictory design specifications such as low power and high speed. Further, it is impractical to meet these specifications by enlarging the design at exorbitant cost in manufacturing, verification, and test. The International Technology Roadmap for Semiconductors (ITRS) [1] has anticipated imprecise/approximate designs, which have become a state-of-the-art demand in view of the emerging class of killer applications that exhibit inherent error-resilience, such as multimedia, graphics, and wireless communications. In complex multimedia applications, DSP blocks are implemented as cores that process signals relevant to human senses, e.g., sight and hearing. The limited perception of human senses relaxes the constraints on accuracy, resulting in error-resilient System-on-Chip (SoC) designs.

In SoC designs for the aforementioned applications, adders and multipliers are used as basic building blocks and, in accordance with error-resilient system design, imprecise/approximate designs have attracted significant research interest in recent years. Earlier work investigated several mechanisms such as truncation [2] [3], over-clocking, and voltage over-scaling (VOS) [4], which could not configure the Speed-Power-Accuracy-Area (SPAA) metrics effectively. Apart from these, other design techniques rely on functional approximations and mostly focus on imprecise/approximate adders that shorten the carry chain to improve design performance. Lu [5] proposed a carry look-ahead adder in which only a subset of the previous bits is considered to estimate the current carry signal. Lu's adder is found unattractive due to its low probability of producing a correct sum and its increased area overhead. Shin et al. [6] re-design the data-path modules, cutting the critical path in the carry chain to reduce data-path delay and exploit a given error rate to improve parametric yield. Further, Shin et al. [7] explored a logic synthesis approach that exploits a given error rate to reduce the area of imprecise/approximate circuits.

Zhu et al. [8] [9] present four error-tolerant adders: ETA-I, ETA-II, ETA-IIM, and ETA-IV. ETA-I [8] segments the inputs into 1) an accurate part and 2) an inaccurate part, in which no carry signal is considered at any bit position. ETA-II [9] completes carry propagation concurrently by dividing the path into a number of short paths. In ETA-IIM, more MSBs are considered in evaluating the carry signal at the expense of speed, and ETA-IV uses a Carry Select Adder (CSL). Gupta et al. [10] target low power by leveraging error-resiliency and propose five different versions of the mirror adder that reduce the number of transistors and the internal node capacitances. Verma et al. [11] proposed a Variable Latency Speculative Adder (VLSA), which provides approximate/accurate results but incurs delay and a large area overhead. Kahng et al. [12] demonstrated an accuracy-configurable adder (ACA) with reduced critical-path delay and error rate. The ACA provides a better trade-off between accuracy, speed, and power, but with a large area overhead.

In comparison to the above work on adders, very few researchers have reported work on approximate multipliers. Sullivan et al. [13] investigated an iterative approximate multiplier based on Truncated Error Correction (TEC), in which a small amount of error-correcting circuitry is added for each iteration. This circuitry inexpensively replicates the effects of multiple pipeline iterations for the most problematic inputs. Kulkarni et al. [14] presented a $2 \times 2$ underdesigned multiplier
978-1-4673-6180-4/13/$31.00 ©2013 IEEE

block and built arbitrarily large power-efficient inaccurate multipliers. Kyaw et al. [15] developed an Error Tolerant Multiplier (ETM) algorithm in which the input operands are split into two parts: a multiplication part that includes a number of higher-order bits and a non-multiplication part that consists of the remaining lower-order bits. The multiplication process begins at the point where the bits split and proceeds simultaneously in the two opposite directions until all bits are taken care of. The ETM achieves a reduction in delay, power, and hardware cost compared to traditional 12-bit multipliers.

Fig. 1. (a) Recursive Multiplication: the partial products $A_H X_H$ and $A_L X_L$ ($2b$ bits each) and the cross products $A_H X_L$ and $A_L X_H$ (offset by $b$ bits) are summed into the $4b$-bit final product. (b) Recursive Tree (each box represents a partial product of equal size).

Fig. 2. A Pipelined Architecture for Recursive Multiplication: four $b \times b$ multipliers ($A_H X_H$, $A_H X_L$, $A_L X_H$, $A_L X_L$) followed by an addition stage that produces the $2b \times 2b$ final product.

Most of the above approaches to approximate multipliers are based on shortening the carry chains so that the error is configurable, and on providing coarse and fine tuning to dynamically update the probability of input combinations that yield accurate results, which bounds the SPAA metrics much more tightly. However, these designs employ algorithms intended for approximate multiplication of smaller numbers, and most of them give a large error magnitude as the bit-width of the operands increases. Therefore, in this paper we propose an efficient accuracy-configurable approximate multiplier that gives fairly accurate results for large numbers (numbers having a bit-width of around 10 bits or more). Our contributions are as follows:
- We propose a new accuracy-configurable multiplier architecture (ACMA) for error-resilient SoC designs.
- Since our aim is to make the approximate multiplier fast, we present a novel Carry-in Prediction Logic that significantly reduces the horizontal critical path.
- The proposed multiplication reduces the latency of an accurate multiplier by almost half by reducing its critical path. It also increases the throughput of the ACMA.
- We obtain a mean error between 0.1% and 0.15% (mean accuracy of 99.85% to 99.9%) for 16-bit multiplication when there is no lower bound on the size of the operands. However, if the operands are 10 bits or more (numbers > 1000), the mean error drops to merely 0.035%, giving a mean accuracy of 99.965%.

The rest of the paper is organized as follows. Section II discusses the recursive multiplication procedure and then proposes an approximate multiplier architecture. Section III gives the design of the accuracy-configurable multiplier. The simulation results and their analysis for the proposed multiplier are given in Section IV. Finally, Section V concludes and reports the future scope of our work.

II. APPROXIMATE MULTIPLIER ARCHITECTURE

In this section, we begin with the recursive multiplication procedure and then propose a new approximate pipelined multiplier architecture. Since the focus is to design a fast approximate multiplier with low power consumption, we introduce and explain a novel Carry-in Prediction Logic here.

A. Recursive Multiplication

A given multiplication can be recursively broken down into several smaller-size multiplications, each of which can be performed in the same clock cycle. Let $A$ be the multiplicand and $X$ be the multiplier, and assume that both are $2b$ bits wide. $A$ and $X$ can be written as

$A = A_H A_L$ and $X = X_H X_L$

Here, $A_H$, $A_L$, $X_H$, and $X_L$ are $b$ bits each. The multiplication $A \times X$ is then a $2b \times 2b$ multiplication, which can be carried out recursively as shown in Figure 1 (a). In this multiplication, $A_H X_H$, $A_H X_L$, $A_L X_H$, and $A_L X_L$ are partial products, each being a $b \times b$ multiplication. Hence, a $2b \times 2b$ multiplication is divided into four $b \times b$ multiplications followed by additions. In general, the recursive multiplication builds a recursive tree as shown in Figure 1 (b).

If we divide the multiplication further into very small partial products (e.g., a $16 \times 16$ multiplication broken down into $2 \times 2$ multiplications), the recursive tree becomes very large. As a result, the overhead of the addition stage increases substantially. Further, this approach does not help in reducing the critical path. In our approach, we consider a pipelined architecture of a $2b \times 2b$ recursive multiplier broken down into four $b \times b$ multiplications as shown in Figure 2. This pipelined architecture is preferable because dividing the $b \times b$ multiplications further into smaller partial products would not only increase the overhead of the addition stage but may also require either more clock cycles to complete the addition or a longer clock period. We now explain the approximate pipelined multiplier architecture.

B. Proposed Architecture

In the proposed architecture, we first focus on its accuracy, that is, on minimizing the error in the final product of $A$ and $X$.
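As an illustration of the recursive step of Section II-A, the following Python sketch splits two $2b$-bit operands into $b$-bit halves, forms the four partial products, and recombines them in an addition stage. It is a software model under simple assumptions, not the hardware design: the function names `split_halves` and `recursive_multiply` are hypothetical, and Python's built-in integer multiplication stands in for each $b \times b$ multiplier.

```python
def split_halves(value, b):
    """Split a 2b-bit value into its upper and lower b-bit halves."""
    mask = (1 << b) - 1
    return value >> b, value & mask


def recursive_multiply(a, x, b):
    """Multiply two 2b-bit operands using four b x b partial products.

    Models one level of the recursion: A = A_H A_L, X = X_H X_L.
    """
    a_h, a_l = split_halves(a, b)
    x_h, x_l = split_halves(x, b)

    # Stage 1: four b x b partial products (independent of each other,
    # so in hardware they can be evaluated in the same clock cycle).
    hh = a_h * x_h
    hl = a_h * x_l
    lh = a_l * x_h
    ll = a_l * x_l

    # Stage 2: addition stage. A_H X_H is weighted by 2^(2b),
    # the two cross products by 2^b, and A_L X_L by 1.
    return (hh << (2 * b)) + ((hl + lh) << b) + ll
```

With `b = 8`, `recursive_multiply(0xBEEF, 0x1234, 8)` gives the same value as `0xBEEF * 0x1234`, since all four partial products are exact in this model.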
However, in order to minimize the error, we derive a logic that gives accurate results for a certain number of most significant bits (MSBs). In our logic, we make the most significant $2b$ bits (out of a total of $4b$ bits) accurate to a large extent. Note that the objective is to make the upper $2b$ bits accurate to a high extent, not completely accurate. This minimizes the degree of error involved and at the same time gives a suitable tradeoff between accuracy and the power consumed in the multiplication. Further, out of the least significant $2b$ bits that remain, the lower $b/2$ bits are accurate and the upper $3b/2$ bits are inaccurate. We make the lower $b/2$ bits accurate because this does not take much hardware and at the same time increases accuracy. In other words, out of the $4b$ bits in the final product, the least significant $b/2$ bits are accurate, the most significant $2b$ bits are accurate to a large extent, and the remaining $3b/2$ bits are inaccurate. This arrangement not only provides accuracy but also gives promising results for speed, power, and area. As we shall see, this is achieved by employing approximate multipliers for the $A_H X_L$, $A_L X_H$, and $A_L X_L$ computations in Figure 2 and an accurate multiplier for the $A_H X_H$ evaluation.

In order to make the most significant bits accurate (out of the $4b$ bits in the final product), we make $A_H X_H$ (a $b \times b$ multiplication) completely accurate. Further, we reduce the critical path to a large extent by dividing the approximate partial product computation into two parts: an accurate part and an inaccurate part. We now introduce a new concept named carry-in prediction logic, which is used only in the approximate partial product computations ($A_H X_L$, $A_L X_H$, and $A_L X_L$) and not for the accurate $A_H X_H$.

C. Carry-in Prediction Logic

In the proposed architecture, our aim is to make the most significant $2b$ bits (out of a total of $4b$ bits) accurate to a large extent, and we achieve this as illustrated in Figure 3. In this illustration, the most significant $b$ bits of the approximate partial products ($A_H X_L$, $A_L X_H$, and $A_L X_L$) must be accurate to a certain extent so that the sum of the upper $b$ bits of $A_H X_L$, the upper $b$ bits of $A_L X_H$, and the $2b$ bits of $A_H X_H$ achieves a higher degree of accuracy.

Fig. 3. Achieving upper $2b$ bits as accurate to a high extent

In order to make the approximate partial product multipliers fast, it is necessary to break the carry chain, which reduces the horizontal critical path. Therefore, we propose a novel Carry-in Prediction Logic. This logic reduces the horizontal critical path by half and makes the upper bits of the approximate partial product multipliers accurate to some extent. We divide the products $A_H X_L$, $A_L X_H$, and $A_L X_L$, of $2b$ bits each, into two parts: a part that is accurate to a large extent and an inaccurate part. Both parts are $b$ bits wide, as shown in Figure 4 for $b = 8$. The circles in the figure represent the tree of partial products ($p_i q_j$) obtained from the multiplication of a multiplicand (say $P = p_{b-1} p_{b-2} \ldots p_1 p_0$) and a multiplier (say $Q = q_{b-1} q_{b-2} \ldots q_1 q_0$). Note that $P$ can take the values $A_H$ or $A_L$ and $Q$ can take the values $X_H$ or $X_L$, as required for the $A_H X_L$, $A_L X_H$, and $A_L X_L$ evaluations. As evident from the figure, the inaccurate part further consists of a completely inaccurate part and a completely accurate part of $b/2$ bits each. In the rest of the paper, we refer to the part that is accurate to a large extent as the Accurate part and to the remaining lower $b$ bits of an approximate partial product as the Inaccurate part.

Fig. 4. Carry-in Prediction for $b = 8$, i.e., $16 \times 16$ multiplication

Figure 4 further shows a critical column, which is the column containing the maximum number of partial products. Carry-in Prediction logic exploits the fact that if there are two or more 1s in the critical column, then a carry of at least 1 is definitely propagated to the next column. When $b$ is large (greater than 5, i.e., for operands of about 10 bits or more), we set the carry-in propagated to the accurate part to 1 if one or more of the circles in the critical column are 1, since there is a very low probability of a 0 carry being propagated. In fact, for such a magnitude of $b$, there is a good probability that the carry propagated to the accurate part from the most significant bit of the inaccurate part will be more than 1. In order to minimize this error, we make the upper $b/2$ bits of the inaccurate part a
series of 1s. The error is minimized because the difference between the actual and approximate partial products is analogous to that between 128 and 127: in binary, 128 is represented as 10000000 and 127 as 01111111 (128 just passes an extra carry). Now that we have a carry-in for the accurate part beforehand, we can start the multiplication procedure simultaneously from both sides, thus reducing the horizontal critical path by half. This completes the Carry-in Prediction Logic; its impact on reducing the critical path in the accuracy-configurable multiplier design is described next.

III. ACCURACY-CONFIGURABLE MULTIPLIER

In this section, we present the design of an accuracy-configurable $16 \times 16$ approximate multiplier. The first stage of the pipelined multiplier uses a total of 4 multipliers: 1 accurate ($8 \times 8$) and 3 approximate ($8 \times 8$). Because an inaccurate multiplier is inherently faster than the corresponding accurate multiplier, using an accurate $8 \times 8$ multiplier in the same pipeline stage as the approximate ones would give no improvement in the critical path. Therefore, to reduce the critical path of stage 1, we further recursively divide the $8 \times 8$ accurate product into four $4 \times 4$ accurate partial products as shown in Figure 5. We then add them together in the second stage. In other words, stage 1 of the pipelined approximate multiplier effectively consists of 7 multipliers, namely $A_{HH} X_{HH}$, $A_{HH} X_{HL}$, $A_{HL} X_{HH}$, and $A_{HL} X_{HL}$, which are $4 \times 4$ and accurate, and $A_H X_L$, $A_L X_H$, and $A_L X_L$, which are $8 \times 8$ and inaccurate.

Fig. 5. Reducing the critical path of first stage of pipelined approximate multiplier

Although this decreases the latency of stage 1 significantly compared to an accurate $16 \times 16$ pipelined multiplier, it appears that this methodology increases the latency of stage 2, as more additions must be computed. However, the latency of stage 2 remains unaffected if we perform the additions of all the partial products in parallel, i.e., the addition of $A_{HH} X_{HL}$ and $A_{HL} X_{HH}$ ($b$ bits) and that of the resulting sum with the remaining accurate partial products ($b + 1$ bits), in parallel with the addition of $A_H X_L$ and $A_L X_H$ ($2b$ bits). Since the operands of the former two additions are just $b$ and $b + 1$ bits wide respectively, together they take almost the same time to complete as the latter addition of $2b$ bits. Further, from Figure 5, it is observed that the carry-out emerging from the latter $2b$-bit addition can be propagated using half adders, and a 1-bit full adder can be employed at the position where a carry-out emerges from the former $(b + 1)$-bit addition to add the 3 bits together at that position. The carry-out thus produced is further propagated using half adders up to the MSB. Hence, we significantly reduce the latency of stage 1 and get almost the same latency for stage 2 using the methodology discussed above, thus reducing the overall critical path of the multiplier.

Now that we have 7 multipliers in stage 1, we can vary the accuracy level of the proposed multiplier by varying the number of multipliers that are accurate. In any case, we always keep $A_H X_H$ accurate, so that the accuracy does not fall below a certain level. Therefore, we obtain an Accuracy-Configurable Multiplier whose accuracy can be adjusted according to the error tolerance of the application. The number of inaccurate multipliers used directly determines the amount of power saved by the multiplier. Also, because the algorithm has been designed in general for $2b \times 2b$ multiplications, it is also configurable according to the bit-width of the operands, i.e., the size of the inaccurate part in the approximate partial products is always equal to $b$ bits. For instance, if the bit-width of the operands is scaled to 12 bits, $b$ becomes 6, and there are three approximate partial products of $6 \times 6$ and four accurate partial products of $3 \times 3$ each. Hence, the proposed algorithm is reconfigurable with operand bit-width as well as accuracy-configurable.

Next, we present experimental results for a simple and suitable tradeoff between accuracy and power: 3 partial products inaccurate ($8 \times 8$) and 1 partial product accurate (further recursively divided into 4 accurate partial products of $4 \times 4$) to reduce latency. A theoretical analysis of power, area, and latency is also given in the later part of the next section.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

We simulate the proposed ACMA with a C program that generates 5000 random numbers and computes the accurate and approximate products for all possible combinations without repetition. We use the following design metrics [8], [12] for the analysis of the proposed multiplier:
1) Overall Error (OE): It is the absolute error between the approximate and accurate products. It is given by $OE = |R_c - R_e|$, where $R_c$ is the correct result and $R_e$ is the result obtained from the approximate arithmetic circuit. All the numbers here are decimal numbers.
2) Relative Error: Relative Error is simply $(OE / R_c) \times 100\%$. It gives the percentage of error involved in the result of an algorithm.
3) Accuracy (ACC): It is given by $ACC = (1 - OE / R_c) \times 100\%$. It measures the degree of correctness of the output of an approximate algorithm.
TABLE I
SIMULATION RESULTS USING C PROGRAM

Run No. | Operand Range | Mean Error | Mean Accuracy | Acceptance Probability
1       | > 1000        | 0.034%     | 99.966%       | 99.72%
2       | > 1           | 0.10%      | 99.90%        | 98.44%
3       | > 1           | 0.11%      | 99.89%        | 98.37%
4       | > 1           | 0.13%      | 99.87%        | 97.93%
5       | > 1           | 0.15%      | 99.85%        | 98.00%

Fig. 6. Combinational multiplier used for implementing individual partial products of a combinational multiplier ($b$-bit ripple carry adders, repeated $b$ times, producing a $2b$-bit result)

Fig. 7. Percentage Reduction in Critical Path of Stage 1 vs. Half Operand Bit-width (x-axis: half bit-width $b$; y-axis: critical path delay reduction in stage 1, in %)

4) Mean Error: Mean error is the average of the relative errors of all the combinations tested in an algorithm. Similarly, mean accuracy can be defined as the average of $ACC$ (or alternatively as 100 - Mean Error) over all the individual values tested by the algorithm. Both of these are expressed in percentages.
5) Minimum Acceptable Accuracy (MAA): Minimum Acceptable Accuracy is the minimum level of accuracy that an application can tolerate. It is determined solely by the application. $(100 - MAA)\%$ gives the maximum error that can be tolerated by an application.
6) Acceptance Probability (AP): It is the probability that the accuracy of the approximate arithmetic circuit is higher than the minimum acceptable accuracy. Its value is given by $AP = P(ACC > MAA)$.
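The error metrics defined in this section (overall error, accuracy, mean error, mean accuracy, and acceptance probability) can be computed with a few lines of code. The sketch below is a Python illustration under stated assumptions, not the C program used to produce the results in this paper: the function names are hypothetical, and each input pair is assumed to hold a non-zero correct product followed by the approximate product.

```python
def overall_error(correct, approx):
    """OE: absolute difference between correct and approximate results."""
    return abs(correct - approx)


def accuracy(correct, approx):
    """ACC in percent: (1 - OE / correct) * 100."""
    if correct == 0:
        raise ValueError("accuracy is undefined for a zero correct result")
    return (1.0 - overall_error(correct, approx) / correct) * 100.0


def summarize(pairs, maa=99.0):
    """Return (mean error %, mean accuracy %, acceptance probability %).

    pairs: iterable of (correct, approx) results.
    maa:   minimum acceptable accuracy in percent (assumed default 99).
    """
    accs = [accuracy(c, a) for c, a in pairs]
    if not accs:
        raise ValueError("no results to summarize")
    mean_acc = sum(accs) / len(accs)
    mean_err = 100.0 - mean_acc  # mean of the relative errors
    accepted = sum(1 for v in accs if v > maa)
    ap = accepted / len(accs) * 100.0
    return mean_err, mean_acc, ap
```

For example, `summarize([(1000, 999), (200, 190)], maa=99.0)` counts the first pair as acceptable (accuracy 99.9%) and the second as not (accuracy 95%), so the acceptance probability it reports is 50%.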
A. Simulation Results

The proposed algorithm is simulated using a C program for a minimum acceptable accuracy (MAA) of 99%. It generates results such as OE, relative error, ACC, mean error, and Acceptance Probability. The results are tabulated in Table I for the various runs conducted. There is no difference as such between the runs; they are provided here to emphasize that the results are consistent, not random.

The table shows high accuracy levels. An Acceptance Probability of approximately 98% or more for a minimum acceptable accuracy of 99% means that, over all combinations of the random numbers generated, about 98% of the cases or more give an accuracy greater than 99%. Also, as explained earlier, for larger numbers (operands > 1000, i.e., about 10 bits or more, i.e., $b \geq 5$ or so), the accuracy level rises to as high as 99.966%.

B. Theoretical Analysis of Power, Area and Latency

We analyze our proposed approximate multiplier design with respect to power, area, and latency theoretically and validate the results experimentally. For simplicity, we take a very basic multiplier and show the degree of advantage gained by applying our approximate algorithm to it. We consider a pipelined multiplier in which each of the four partial products is evaluated by a simple combinational multiplier as shown in Figure 6. We apply the proposed approximate multiplication algorithm to this pipelined multiplier. Therefore, the partial products ($p_i q_j$) in the approximate multipliers are added using ripple carry adders. Theoretically, we calculate the critical path delay reduction in stage 1, the normalized clock period, and the percentage reduction in clock period, and plot them against half of the operand bit-width ($b$) in MATLAB to see the reduction in latency.

We consider the individual delay of the two stages because the net clock period is decided by the maximum delay of the two stages. Since the proposed algorithm is concerned only with the improvement in stage 1, we discuss the plot of critical path reduction in stage 1 against the half bit-width ($b$) shown in Figure 7. It can be observed that as $b$ increases beyond 64 bits, the percentage critical path reduction saturates at around 51%. Figure 8 shows the normalized clock period vs. $b$. It describes the rate at which the clock period increases as the bit-width of the operands, i.e., $2b$, rises. As evident from the figure, the rate of increase of the clock period for the approximate multiplier (green line) is far less than that of an accurate one (blue line). It takes into account both stages of the multiplier and not just stage 1.

Figure 9 is obtained by simply taking the difference between the two curves drawn in Figure 8. It shows that the proposed algorithm gives the maximum reduction in clock period when $b = 12$, i.e., for $24 \times 24$ multiplication. Practically speaking, $24 \times 24$ multiplications are widely used in Multiply-Accumulate Units, which help in evaluating the Discrete Cosine Transform and the quantization process in image/video compression applications. This is a kind of application where error can be tolerated to a certain extent.

The reduction in power is also theoretically calculated based
on the reduction in the number of full adders. For instance, in the case of a $16 \times 16$ multiplier, for a single approximate partial product ($8 \times 8$), the reduction in the number of adders is about 40%. But a $16 \times 16$ pipelined multiplier requires 4 such multiplications, so the net reduction in the number of adders is about 30% in stage 1. Therefore, this provides at least a 30% reduction in power. We verified this hypothesis using Cadence RTL Compiler, and the results obtained are tabulated in Table II. We used a 45 nm standard cell library, the Nangate Open Cell Library, for RTL synthesis. Further, it can be observed from Table II that the power and area results are in agreement with the theoretical results.

Fig. 8. Normalized Clock Period vs. Half Operand Bit-width (x-axis: half bit-width $b$; y-axis: normalized clock period)

Fig. 9. Percentage Clock Period Reduction vs. Half Operand Bit-width (x-axis: half bit-width $b$; y-axis: percentage reduction in clock period)

TABLE II
RESULTS OF RTL COMPILER

            | Power  | Area ($\mu m^2$)
Approximate | 0.295  | 2298.24
Accurate    | 0.438  | 3004.20
% Reduction | 32.73% | 23.5%

V. CONCLUSION AND FUTURE SCOPE OF THE WORK

In this paper, we proposed an Accuracy-Configurable Multiplier Architecture (ACMA) for error-resilient System-on-Chip designs. The ACMA design is based on a new algorithm for approximate multiplication in which an efficient precomputation logic is employed to increase its throughput. The proposed design reduces the latency of an accurate multiplier by almost half by reducing its critical path. The simulation results for 16-bit multiplication are reported in two parts: 1) when there is no lower bound on the size of the operands, the mean accuracy is 99.85% to 99.9%, and 2) when the operands are 10 bits or more (numbers > 1000), the mean accuracy is 99.965%. These results demonstrate the effectiveness of the proposed accuracy-configurable approximate multiplier design with respect to the SPAA design metrics.

As far as the future scope of this research is concerned, the proposed ACMA design can be employed in error-resilient system architectures for well-known Recognition, Mining and Synthesis (RMS) applications and also in error-tolerant processor designs where speed, power, and area, rather than accuracy, are the major design concerns.

REFERENCES

[1] "International technology roadmap for semiconductors," http://www.itrs.net.
[2] M. Sheng, H. Libo, L. Mingce, and W. Zhiying, "A comparative study of subword parallel adders for multimedia applications," in ASIC, 2009. ASICON '09. IEEE 8th International Conference on, Oct. 2009, pp. 179-182.
[3] E. J. Swartzlander, "Truncated multiplication with approximate rounding," in Signals, Systems, and Computers, 1999. Conference Record of the Thirty-Third Asilomar Conference on, vol. 2, Oct. 1999, pp. 1480-1483.
[4] L. N. Chakrapani, K. K. Muntimadugu, L. Avinash, J. George, and K. V. Palem, "Highly energy and performance efficient embedded computing through approximately correct arithmetic: a mathematical foundation and preliminary experimental validation," CASES, pp. 187-196, 2008.
[5] S. L. Lu, "Speeding up processing with approximation circuits," Computer, vol. 37, no. 3, pp. 67-73, Mar. 2004.
[6] D. Shin and S. Gupta, "A re-design technique for datapath modules in error tolerant applications," in Asian Test Symposium, 2008. ATS '08. 17th, Nov. 2008, pp. 431-437.
[7] D. Shin and S. Gupta, "Approximate logic synthesis for error tolerant applications," in Design, Automation Test in Europe Conference Exhibition (DATE), 2010, Mar. 2010, pp. 957-960.
[8] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, "Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 18, no. 8, pp. 1225-1229, Aug. 2010.
[9] N. Zhu, W. L. Goh, G. Wang, and K. S. Yeo, "Enhanced low-power high-speed adder for error-tolerant application," in SoC Design Conference (ISOCC), 2010 International, Nov. 2010, pp. 323-327.
[10] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 32, no. 1, pp. 124-137, Jan. 2013.
[11] A. Verma, P. Brisk, and P. Ienne, "Variable latency speculative addition: A new paradigm for arithmetic circuit design," in Design, Automation and Test in Europe, 2008. DATE '08, Mar. 2008, pp. 1250-1255.
[12] A. Kahng and S. Kang, "Accuracy-configurable adder for approximate arithmetic designs," in Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, June 2012, pp. 820-825.
[13] M. B. Sullivan and E. E. Swartzlander, "Truncated error correction for flexible approximate multiplication," in Signals, Systems and Computers (ASILOMAR), 2012 Conference Record of the Forty Sixth Asilomar Conference on, 2012, pp. 355-359.
[14] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," in VLSI Design (VLSI Design), 2011 24th International Conference on, 2011, pp. 346-351.
[15] K. Y. Kyaw, W.-L. Goh, and K.-S. Yeo, "Low-power high-speed multiplier for error-tolerant application," in Electron Devices and Solid-State Circuits (EDSSC), 2010 IEEE International Conference of, 2010, pp. 1-4.
