VLSI SYSTEMS
Magdy Bayoumi
The Center for Advanced Computer Studies,
University of Louisiana at Lafayette,
Lafayette, Louisiana, USA
This Section covers the broad spectrum of VLSI arithmetic, custom memory organization and data transfer, the role of hardware description languages, clock scheduling, low-power design, micro electro mechanical systems, and noise analysis and design. It has been written and developed for practicing electrical engineers in industry, government, and academia. The goal is to provide the most up-to-date information in the field.

Over the years, the fundamentals of the field have evolved to include a wide range of topics and a broad range of practice. To encompass such a wide range of knowledge, the section focuses on the key concepts, models, and equations that enable the design engineer to analyze, design, and predict the behavior of large-scale systems. While design formulas and tables are listed, emphasis is placed on the key concepts and the theories underlying the processes. In order to do so, the material is reinforced with frequent examples and illustrations.

The compilation of this section would not have been possible without the dedication and efforts of the section editor and contributing authors. I wish to thank them all.

Wai-Kai Chen
Editor
1 Logarithmic and Residue Number Systems for VLSI Arithmetic
1.1 Introduction

Very large-scale integrated circuit (VLSI) arithmetic units are essential for the operations of the data paths and/or the addressing units of microprocessors, digital signal processors (DSPs), as well as data-processing application-specific integrated circuits (ASICs) and programmable integrated circuits. Their optimized realization, in terms of power or energy consumption, area, and/or speed, is important for meeting demanding operational specifications of such devices.

In modern VLSI design flows, the design of standard arithmetic units is available from design libraries. These units employ binary encoding of numbers, such as one's or two's complement, or sign magnitude encoding to perform additions and multiplications. If nonstandard operations are required, or if high performance components are needed, then the design of special arithmetic units is necessary. In this case, the choice of arithmetic system is of utmost importance.

The impact of arithmetic in a digital system is not limited to the definition of the architecture of arithmetic circuits. Arithmetic affects several levels of the design abstraction because it may reduce the number of operations, the signal activity, and the strength of the operators. The choice of arithmetic may lead to substantial power savings, reduced area, and enhanced speed.

This chapter describes two arithmetic systems that employ nonstandard encoding of numbers. The logarithmic number system (LNS) and the residue number system (RNS) are singled out because they have been shown to offer important advantages in the efficiency of their operation and may be at the same time more power- or energy-efficient, faster, and/or smaller than other systems.

Although a detailed comparison of performance of these systems to their counterparts is not offered here, one must keep in mind that such comparisons are only meaningful when the systems under question cover the same dynamic range and present the same precision of operations. This necessity usually translates into certain data word lengths, which, in their turn, affect the operating characteristics of the systems.

1.2 LNS Basics

Traditionally, LNS has been considered as an alternative to floating-point representation (Koren, 1993; Stouraitis, 1986). The organization of an LNS word is shown in Figure 1.1.

The LNS maps a linear real number X to a triplet as follows:
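$$X \xrightarrow{\text{LNS}} (z_x, s_x, x = \log_b |X|), \tag{1.1}$$

where $s_x$ is the sign of X, b is the base of the logarithmic representation, and $z_x$ is a single-bit flag, which, when asserted, denotes that X is zero. A zero flag is required because $\log_b X$ is not a finite number for X = 0. Similarly, since the logarithm of a negative number is not a real number, the sign information of X is stored in flag $s_x$. The logarithm $x = \log_b |X|$ is encoded as a binary number, and it may comprise a number of k integer and l fractional bits.

The inverse mapping of a logarithmic triplet $(z_x, s_x, x)$ to a linear number X is defined by:

$$X = (1 - z_x)(-1)^{s_x} b^{x}. \tag{1.2}$$

The relative representational error of a number X that is represented as $\hat{X}$ is:

$$\varepsilon_{rel}(X) = \frac{|X - \hat{X}|}{X}. \tag{1.4}$$

For the LNS, the average representational error equals the relative error:

$$\varepsilon_{ave,LNS} = \varepsilon_{rel,LNS} = b^{2^{-l}} - 1. \tag{1.5}$$

Due to formula 1.3, the average representational error for the n-bit linear fixed-point case is given by:

$$\varepsilon_{ave,FXP} = \frac{1}{2^n - 1} \sum_{i=1}^{2^n - 1} \frac{1}{i}, \tag{1.6}$$

which, by computing the sum on the right-hand side, can be written as:

$$\varepsilon_{ave,FXP} = \frac{\psi(2^n) + \gamma}{2^n - 1}, \tag{1.7}$$

where $\psi$ is the digamma function and $\gamma$ is the Euler-Mascheroni constant.

For the LNS to be equivalent to the fixed-point system, it must cover at least the same range with an average error that is no larger:

$$X_{max}^{LNS} \ge X_{max}^{FXP}, \tag{1.11}$$

$$\varepsilon_{ave,LNS} \le \varepsilon_{ave,FXP}. \tag{1.12}$$

Hence, from equations 1.5 and 1.7 through 1.10 the following equations are obtained:

$$l = \left\lceil -\log_2 \log_b \left( 1 + \frac{\psi(2^n) + \gamma}{2^n - 1} \right) \right\rceil. \tag{1.13}$$

Of course, the average (relative) error is not the only way to compare the accuracy of computing systems. Especially for signal processing systems, one may use the signal-to-noise ratio (SNR), assuming that quantization errors represent noise, to compare the precision of two systems. In that case, by equating the SNRs of the LNS and the fixed-point system that covers the required dynamic range, the integer and fractional word lengths of the LNS may be computed.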
 n |  k   l  n_eq |  k   l  n_eq |  k   l  n_eq
 5 |  3   2    6  |  3   3    9  |  2   3    7
 6 |  4   3   10  |  3   4    9  |  2   4    7
 7 |  4   4   10  |  3   5    9  |  3   5   12
 8 |  4   5   10  |  3   5    9  |  3   6   12
 9 |  4   5   10  |  4   6   17  |  3   7   12
10 |  5   6   20  |  4   7   17  |  3   7   12
11 |  5   7   20  |  4   8   17  |  3   8   12
12 |  5   8   20  |  4   9   17  |  4   9   23
13 |  5   9   20  |  4  10   17  |  4  10   23
14 |  5  10   20  |  4  11   17  |  4  11   23
15 |  5  11   20  |  4  12   17  |  4  12   23
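As a numerical check of the error-equivalence argument, the following Python sketch evaluates the average fixed-point error of equation 1.6, the LNS error of equation 1.5, and the fractional word length l of equation 1.13. The function names are illustrative and do not come from the original text, and the harmonic sum is evaluated directly instead of through $\psi(2^n) + \gamma$, which is an implementation choice made here for simplicity.

```python
import math

def avg_error_fxp(n):
    """Average relative error of an n-bit fixed-point system (eq. 1.6)."""
    count = 2 ** n - 1
    harmonic = sum(1.0 / i for i in range(1, count + 1))
    return harmonic / count

def avg_error_lns(l, base):
    """Average (= relative) error of an LNS with l fractional bits (eq. 1.5)."""
    return base ** (2.0 ** -l) - 1.0

def fractional_bits(n, base):
    """Smallest l for which the LNS error does not exceed the fixed-point error (eq. 1.13)."""
    return math.ceil(-math.log2(math.log(1.0 + avg_error_fxp(n), base)))

for n in (5, 10, 15):
    l = fractional_bits(n, 2.0)
    print(n, l, avg_error_lns(l, 2.0), avg_error_fxp(n))
```

For b = 2 the sketch yields l = 3, 7, and 12 for n = 5, 10, and 15, and in each case one fewer fractional bit would violate equation 1.12.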
182 Thanos Stouraitis
[Figure 1.2 block labels: Multiply/divide; Add/subtract; $\log_b(1 \pm b^{-|x-y|})$; Sum/difference; Product/quotient]
FIGURE 1.2 The Organization of a Basic LNS Processor: the processor comprises an adder, two multiplexers, a sign-inversion unit, a look-up
table, and a final adder. It may perform the four operations of addition, subtraction, multiplication, or division.
The organization of an LNS processor that can perform the four basic operations of addition, subtraction, multiplication, or division is shown in Figure 1.2. Note that to implement LNS subtraction (i.e., the addition of two quantities of opposite sign) a different memory look-up table (LUT) is required.

The main complexity of an LNS processor is the implementation of the LUTs for storing the values of the functions $s_a(d)$ and $s_s(d)$. A straightforward implementation is only feasible for small word lengths. A different technique can be used for larger word lengths based on the partitioning of an LUT into an assortment of smaller LUTs. The particular partitioning becomes possible due to the nonlinear behavior of the addition and subtraction functions, $\log_b(1 + b^{-d})$ and $\log_b(1 - b^{-d})$, respectively, that are depicted in Figure 1.3 for b = 2. By exploiting the different minimal word length required by groups of function samples, the overall size of the LUT is compressed, leading to the LUT organization of Figure 1.4. In addition to the above techniques, reduction of the size of memory can be achieved by proper selection of the base of the logarithms. It turns out that the same bases that yield minimum power consumption for the LNS arithmetic unit by reducing the bit activity, as mentioned in the next section, also result in minimum LUT sizes.

To use the benefits of LNS, a conversion overhead is required in most cases to perform the forward LNS mapping defined by equation 1.1. It is noted that the conversions of equations 1.1 and 1.2 are required if an LNS processor receives input or transmits output linear data in digital format. Since all arithmetic operations can be performed in the logarithmic domain, only an initial conversion is imposed; therefore, as the amount of processing implemented in LNS grows, the contribution of the conversion overhead to power dissipation and to area-time complexity becomes negligible because it remains constant.

In stand-alone DSP systems, the adoption of a different solution to the conversion problem is possible. In particular, the LNS forward and inverse mapping overhead can be mitigated by converting the analog data directly into digital logarithms.

LNS Arithmetic Example

Let X = 2.75, Y = 5.65, and b = 2. Perform the operations $X \cdot Y$, $X + Y$, $\sqrt{X}$, and $Y^2$ using the LNS.

Initially, the data are transferred to the logarithmic domain as implied by equation 1.1:
[Figure 1.3 plots, with d on the horizontal axis: panel (a) $s_a(d)$; panel (b) $s_s(d)$]
FIGURE 1.3 The Functions $s_a(d)$ and $s_s(d)$: Approximations required for LNS addition and subtraction.
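As both operands are of the same sign (i.e., $s_x = s_y = 0$), the sign of the product is $s_w = 0$. In addition, because $z_x \neq 1$ and $z_y \neq 1$, the result is non-zero (i.e., $z_w = 0$).

The square root of X is computed by halving its logarithm:

$$w = \tfrac{1}{2} x = \tfrac{1}{2} \cdot 1.4594 = 0.7297. \tag{1.23}$$

$$W = 2^{0.7297} = 1.6583. \tag{1.24}$$

For the sum X + Y, the logarithm of the result is:

$$w = \ldots = 2.4983 + \log_2 \left( 1 + 2^{-1.0389} \right) \tag{1.28}$$

$$= 3.0704. \tag{1.29}$$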
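The logarithmic-domain steps of this example can be checked numerically. The Python sketch below assumes the triplet layout $(z, s, x)$ of equation 1.1 and uses floating-point logarithms in place of a k.l-bit fixed-point word and an LUT; all function names are illustrative and not taken from the original text, and the addition routine covers only nonzero operands of equal sign.

```python
import math

def to_lns(value, base=2.0):
    """Forward mapping of eq. 1.1: value -> (zero flag, sign flag, logarithm)."""
    if value == 0:
        return (1, 0, 0.0)
    return (0, 1 if value < 0 else 0, math.log(abs(value), base))

def from_lns(triplet, base=2.0):
    """Inverse mapping of eq. 1.2."""
    z, s, x = triplet
    return (1 - z) * (-1) ** s * base ** x

def lns_mul(a, b):
    """Multiplication becomes addition of the logarithms."""
    if a[0] or b[0]:
        return (1, 0, 0.0)
    return (0, a[1] ^ b[1], a[2] + b[2])

def lns_sqrt(a):
    """Square root halves the logarithm (assumes a non-negative operand)."""
    return (a[0], a[1], a[2] / 2.0)

def lns_add_same_sign(a, b, base=2.0):
    """Addition of nonzero, same-sign operands: max(x, y) + log_b(1 + b^-d)."""
    if a[0] or b[0] or a[1] != b[1]:
        raise ValueError("sketch handles nonzero operands of equal sign only")
    d = abs(a[2] - b[2])
    return (0, a[1], max(a[2], b[2]) + math.log(1.0 + base ** -d, base))

lx, ly = to_lns(2.75), to_lns(5.65)
print(round(lx[2], 4), round(ly[2], 4))           # logarithms of X and Y
print(from_lns(lns_mul(lx, ly)))                  # X * Y
print(round(lns_add_same_sign(lx, ly)[2], 4))     # logarithm of X + Y
print(from_lns(lns_sqrt(lx)))                     # sqrt(X)
```

Because the sketch keeps full floating-point precision, the recovered sum is 8.4 up to rounding error; the value 8.4001 in the text results from carrying only four decimal digits of the logarithm.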
The actual value of the sum W = X + Y is obtained as:

$$W = 2^{3.0704} = 8.4001. \tag{1.30}$$

1.2.3 LNS and Power Dissipation

Power dissipation minimization is sought at all levels of design abstraction, ranging from software and hardware partitioning down to technology-related issues. The average power dissipation in a circuit is computed via the relationship:

$$P_{ave} = a f_{clk} C_L V_{dd}^2, \tag{1.31}$$

where $f_{clk}$ is the clock frequency, $C_L$ is the total switching capacitance, $V_{dd}$ is the supply voltage, and a is the average activity in a clock period.

LNS is applicable for low-power design because it reduces the complexity of certain arithmetic operators and the bit activity.

Power Dissipation and LNS Architecture

LNS exploits properties of the logarithm function to reduce the strength of several arithmetic operations; thus, it leads to complexity savings. By reducing the area complexity of operations, the switching capacitance $C_L$ of equation 1.31 can be reduced. Furthermore, reduction in latency allows for further reduction in supply voltage, which also reduces power dissipation (Chandrakasan and Brodersen, 1995). A study of the impact of the choice of the number system on the QRD-RLS algorithm revealed that LNS offers accuracy comparable to that of floating-point operations but only at a fraction of the switched capacitance per iteration of the algorithm (Sacha and Irwin, 1998). The reduction of average switched capacitance of LNS systems stems from the simplification of basic arithmetic operations, shown in Table 1.2. It can be seen that n-bit multiplication and division are reduced to (k + l)-bit addition and subtraction, respectively, while the computation of roots and powers is reduced to division and multiplication by a constant, respectively. For the common cases of square root or square, the operation is reduced to a right or left shift, respectively. For example, assume that an n-bit carry-save array multiplier, which has a complexity of $n^2 - n$ 1-bit full adders (FAs), is replaced by an n-bit adder (assuming k + l = n), which has a complexity of n FAs for a ripple-carry implementation (Koren, 1993). Therefore, multiplication complexity is reduced by a factor $r_{C_L}$, given as:

$$r_{C_L} = \frac{n^2 - n}{n} = n - 1. \tag{1.32}$$

LNS addition and subtraction, in contrast, are commonly implemented with an LUT that stores the values of $\log_b(1 \pm b^{y-x})$, although different approaches have been proposed in the literature (Orginos et al., 1995; Paliouras and Stouraitis, 1996). An LUT operation requires a ROM of $n \times 2^n$ bits, a size that can inhibit use of LNS for large values of n. In an attempt to solve this problem, efficient table reduction techniques have been proposed (Taylor et al., 1988). As a result of the above analysis, applications with a computational load dominated by operations of simple LNS implementation can be expected to gain power dissipation reduction due to the LNS impact on architecture complexity.

Since multiplication-additions are important in DSP applications, the power requirements of an LNS and a linear fixed-point adder-multiplier have been compared. It has been reported that approximately a two times reduction in power dissipation is possible for operations with word sizes of 8 to 14 bits (Paliouras and Stouraitis, 2001). Given a sufficient number of consecutive multiplication-additions, the LNS implementation becomes more efficient from the low-power dissipation viewpoint, even when a constant conversion overhead is taken into consideration.

Power Dissipation and LNS Encoding

The encoding of data through logarithms of various bases implies variations in the bit activity (i.e., the a factor of equation 1.31) and, therefore, in the power dissipation (Paliouras and Stouraitis, 1996, 2001).

Assuming a uniform distribution of linear n-bit input numbers, the distribution of bit assertions of the corresponding LNS words reveals that LNS can be exploited to reduce the average activity. Let $p_{0 \to 1}(i)$ be the bit assertion probabilities (i.e., the probability of the ith bit making a transition from 0 to 1). Assuming that data are temporally independent, it holds that:

$$p_{0 \to 1}(i) = p_0(i)\, p_1(i) = (1 - p_1(i))\, p_1(i), \tag{1.33}$$

where $p_0(i)$ and $p_1(i)$ are the probabilities of the ith bit being 0 or 1, respectively. Due to the assumption of uniform data distribution, for the linear fixed-point words it holds that:

$$p_0(i) = p_1(i) = \frac{1}{2}, \tag{1.34}$$

which, due to equation 1.33, gives:

$$p_{0 \to 1}(i) = \frac{1}{4}. \tag{1.35}$$
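The bit assertion probabilities of equations 1.33 through 1.35 can be estimated by enumerating all input words. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the LNS word is formed by truncating $\log_b X$ to l fractional bits, and the parameters n = 8, b = 2, k = 3, l = 5 are chosen here only for illustration. It applies equation 1.33 per bit position to uniformly distributed fixed-point words and to the corresponding LNS words.

```python
import math

def bit_activity(words, width):
    """Per-bit 0->1 transition probability, (1 - p1) * p1, as in eq. 1.33."""
    count = len(words)
    activity = []
    for i in range(width):
        p1 = sum((w >> i) & 1 for w in words) / count
        activity.append((1.0 - p1) * p1)
    return activity

def lns_word(value, base, frac_bits):
    """Logarithm of a positive integer, truncated to frac_bits fractional bits."""
    return math.floor(math.log(value, base) * 2 ** frac_bits)

n = 8
fxp_words = list(range(2 ** n))              # uniform n-bit inputs
fxp_activity = bit_activity(fxp_words, n)    # every entry equals 1/4 (eq. 1.35)

base, k, l = 2.0, 3, 5                       # illustrative LNS parameters (assumption)
lns_words = [lns_word(v, base, l) for v in range(1, 2 ** n)]
lns_activity = bit_activity(lns_words, k + l)

print(sum(fxp_activity), sum(lns_activity))
```

Since $(1 - p)p \le 1/4$ for any p, the total activity of the LNS words can never exceed that of uniformly distributed words of the same width; how much lower it is depends on the base and the word lengths.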
[Plot residue: panel (A), n = 8; axis labels P01 and Save; horizontal axis from 6 to 16]
[Figure 1.6 plot: panel (A), n-bit FXP; vertical axis Save; horizontal axis neq from 6 to 16; curves labeled by base b, including b = 3 and b = 1.7]
FIGURE 1.6 Percentage of Average Activity Reduction from Use of LNS. The percentage is compared to n-bit and to neq-bit linear fixed-point systems for various bases b of the logarithm. The diagram reveals that the optimal selection of b depends on n, and it can lead to significant power dissipation reduction.

1.3 The Residue Number System
... providing speed and power savings at the algorithmic level of the design abstraction.

1.3.1 RNS Basics

The RNS maps a natural number X in the range [0, M - 1], with $M = \prod_{i=1}^{N} m_i$, to an N-tuple of residues $x_i$:

$$X \xrightarrow{\text{RNS}} \{x_1, x_2, \ldots, x_N\}, \tag{1.38}$$

where $x_i = (X)_{m_i}$, $(\cdot)_{m_i}$ denotes the mod $m_i$ operation, and $m_i$ is a member of the set of co-prime integers $B = \{m_1, m_2, \ldots, m_N\}$ called moduli. Co-prime integers have a greatest common divisor of 1; that is, $\gcd(m_i, m_j) = 1$ for $i \neq j$. The set of RNS moduli is called the base of RNS. The modulo operation $(X)_m$ returns the integer remainder of the integer division X div m (i.e., an integer k, with $0 \le k < m$, such that $X = m \cdot l + k$), where l is an integer.

RNS is of interest because basic arithmetic operations can be performed in a digit-parallel carry-free manner, such as in:

$$z_i = (x_i \circ y_i)_{m_i}, \tag{1.39}$$

where $i = 1, 2, \ldots, N$ and where the symbol $\circ$ stands for addition, subtraction, or multiplication. Every integer in the range $0 \le X < \prod_{i=1}^{N} m_i$ has a unique RNS representation.

Inverse conversion may be accomplished by means of the Chinese remainder theorem (CRT) or the mixed-radix conversion (Soderstrand et al., 1986). The CRT retrieves an integer from its RNS representation as:
$$X = \left( \sum_{i=1}^{N} \overline{m}_i \left( \overline{m}_i^{-1} x_i \right)_{m_i} \right)_M, \tag{1.40}$$

where $\overline{m}_i = M / m_i$, $M = \prod_{i=1}^{N} m_i$, and $\overline{m}_i^{-1}$ is the multiplicative inverse of $\overline{m}_i$ modulo $m_i$ (i.e., an integer such that $(\overline{m}_i \cdot \overline{m}_i^{-1})_{m_i} = 1$).

Using an associated mixed radix system, inverse conversion may also be performed by translating the residue representations to a mixed radix representation. By choosing the RNS moduli to be the weights in the mixed radix representation, the inverse mapping is facilitated by associating the mixed radix system with the RNS. Specifically, an integer $0 \le X < M$ can be represented by N mixed radix digits $(x'_1, \ldots, x'_N)$ as:

$$X = x'_N (m_{N-1} m_{N-2} \cdots m_1) + \cdots + x'_3 (m_2 m_1) + x'_2 m_1 + x'_1, \tag{1.41}$$

where $0 \le x'_i < m_i$, $i = 1, \ldots, N$, and the $x'_i$ can be generated sequentially from the $x_i$ using only residue arithmetic, such as in:

$$x'_1 = (X)_{m_1} = x_1,$$

$$x'_2 = \left( m_1^{-1} (x_2 - x'_1) \right)_{m_2}, \tag{1.42}$$

$$x'_3 = \left( m_2^{-1} \left( m_1^{-1} (x_3 - x'_1) - x'_2 \right) \right)_{m_3} = \left( \left( (x_3 - x'_1)\, m_1^{-1} - x'_2 \right) m_2^{-1} \right)_{m_3},$$

$$\vdots$$

$$x'_N = \left( \left( \cdots \left( (x_N - x'_1)\, m_1^{-1} - x'_2 \right) m_2^{-1} - \cdots - x'_{N-1} \right) m_{N-1}^{-1} \right)_{m_N}. \tag{1.43}$$

In each expression, every inverse $m_j^{-1}$ is taken modulo the modulus of the digit being computed. The digits $x'_i$ can be generated sequentially through residue subtraction and multiplication by the fixed $m_j^{-1}$. The sequential nature of calculation increases the latency of the conversion of residues to binary numbers.

The set of RNS moduli is often chosen so that the implementation of the various RNS operations (e.g., addition, multiplication, and scaling) becomes efficient. A common choice is the set of moduli $\{2^n - 1, 2^n, 2^n + 1\}$, which may also form a subset of the base of RNS.

RNS Arithmetic Example

Consider the base B = {3, 5, 7} and two integers X = 10 and Y = 5. The RNS images of X and Y are as written here:

$$X \xrightarrow{\text{RNS}} \{x_1, x_2, x_3\} = \{(10)_3, (10)_5, (10)_7\} = \{1, 0, 3\}. \tag{1.44}$$

$$Y \xrightarrow{\text{RNS}} \{y_1, y_2, y_3\} = \{(5)_3, (5)_5, (5)_7\} = \{2, 0, 5\}. \tag{1.45}$$

The RNS image of the sum Z = X + Y is obtained as:

$$Z \xrightarrow{\text{RNS}} \{z_1, z_2, z_3\} = \{(1 + 2)_3, (0 + 0)_5, (3 + 5)_7\} = \{0, 0, 1\}. \tag{1.46}$$

To retrieve the integer that corresponds to the RNS representation {0, 0, 1} by applying the CRT of equation 1.40, the following quantities are precomputed: $M = 3 \cdot 5 \cdot 7 = 105$, $\overline{m}_1 = 105/3 = 35$, $\overline{m}_2 = 105/5 = 21$, $\overline{m}_3 = 105/7 = 15$, $\overline{m}_1^{-1} = 2$, $\overline{m}_2^{-1} = 1$, and $\overline{m}_3^{-1} = 1$. The value of the sum in integer form is obtained by applying equation 1.40:

$$Z = X + Y = \left( 35 (2 \cdot 0)_3 + 21 (1 \cdot 0)_5 + 15 (1 \cdot 1)_7 \right)_{105} = (15)_{105} = 15. \tag{1.47}$$

To verify the result of equation 1.46, notice that X + Y = 10 + 5 = 15 and that:

$$15 \xrightarrow{\text{RNS}} \{(15)_3, (15)_5, (15)_7\} = \{0, 0, 1\} = \{z_1, z_2, z_3\}. \tag{1.48}$$

The same integer can be retrieved through the associated mixed radix representation $(z'_1, z'_2, z'_3)$ of equation 1.41, with $0 \le z'_1 < 3$, $0 \le z'_2 < 5$, $0 \le z'_3 < 7$ and the following:

$$z'_1 = z_1 = 0,$$

$$z'_2 = \left( 3^{-1} (z_2 - z'_1) \right)_5 = (2 \cdot z_2)_5 = 0,$$

and

$$z'_3 = \left( 5^{-1} \left[ 3^{-1} (z_3 - z'_1) - z'_2 \right] \right)_7 = \left( (z_3 - z'_1) - 3 \cdot z'_2 \right)_7 = \left( (1 - 0) - 3 \cdot 0 \right)_7 = 1, \tag{1.49}$$

so that $Z = 1 \cdot 15 + 0 \cdot 3 + 0 = 15$.

1.3.2 RNS Architectures

The basic architecture of an RNS processor in comparison to a binary counterpart is depicted in Figure 1.7. This figure shows that the word length n of the binary counterpart is partitioned into N subwords, the residues, that can be processed
independently and are of word length significantly smaller than n. The architecture in Figure 1.7 assumes, without loss of generality, that the moduli are of equal word length. The ith residue channel performs arithmetic modulo $m_i$.

Most implementations of arithmetic units for RNS consist of an accumulator and a multiplier and are based on ROMs or PLAs. Bayoumi et al. (1983) have analyzed the efficiency of various VLSI implementations of RNS adders. Moreover, implementations of arithmetic units that operate in a finite integer ring R(m) and that are called AURms are offered in the literature (Stouraitis et al., 1993). They are less costly and require less area and lower hardware complexity and power consumption. They are based on continuously decomposing the residue bits that correspond to powers of 2 that are larger than or equal to $2^n$, until they are reduced to a set of bits that correspond to a sum of powers of 2 that is less than $2^n$, where $n = \lceil \log_2 m \rceil$. This decomposition is implemented by using full adder (FA) arrays. For all moduli, the FA-based AURms are shown to execute much faster as well as have much smaller hardware complexity and time-complexity products than ROM-based general multipliers. Since the AURms use full adders as their basic units, they lead to modular and regular designs, which are inexpensive and easy to implement in VLSI.

1.3.3 Error Tolerance in RNS Systems

Because there is no interaction among digits (channels) in residue arithmetic, any errors generated at one digit cannot propagate and contaminate other channels during subsequent operations, given that no conversion has occurred from the RNS to a weighted representation.

In addition, because there is no weight associated with the RNS residues (digits), if any digit becomes corrupted, the associated channel may be easily identified and dealt with. Based on the amount of redundancy that is built in an RNS processor, the faulty channels may be replaced or just isolated, with the rest of the system operating in a "soft failure" mode, being allowed to gracefully degrade into accurate operations of reduced dynamic range. Provided that the remaining dynamic range contains the results, there is no problem with this degradation.

The more redundant an RNS is, the easier it is to identify and correct errors. A redundant RNS (RRNS) uses a number r of moduli in addition to the N standard moduli that are necessary for covering the desired dynamic range. All N + r moduli must be relatively prime. In an RRNS, a number X is represented by a total of N nonredundant residue digits $\{x_1, \ldots, x_N\}$ plus r redundant residue digits $\{x_{N+1}, \ldots, x_{N+r}\}$. A total number of $M_R = \prod_{i=1}^{N+r} m_i$ states is represented by the RRNS. The first $M = \prod_{i=1}^{N} m_i$ states constitute its "legitimate range," while any number that lies in the range $[M, M_R)$ is called "illegitimate."

Any single error moves a legitimate number X into an illegitimate number X'. Once it is verified that the number being tested is illegitimate, its digits are discarded one by one, until a legitimate representation is found. The discarded digit whose omission results in the legitimate representation is the erroneous one. A correct digit can then be produced by extending the base of the reduced RNS that produced the legitimate representation. The above error-locating-and-correcting procedure can be implemented in a variety of ways. Assuming that the mixed radix representations of all the reduced RNS representations can be efficiently generated, the legitimate one can be easily identified by checking the highest order mixed radix digit against zero. If it is zero, the representation is legitimate.

Mixed radix representations associated with the RNS numbers can be used to detect overflows as well as to detect and correct errors in redundant RNS systems. For example, to detect overflows, a redundant modulus $m_{N+1}$ is added to the base and the corresponding highest order mixed radix digit $a_{N+1}$ is found and compared to zero. Assuming that the number being tested for overflow is not large enough to overflow the augmented range of the redundant system, overflow occurs whenever $a_{N+1}$ is not zero.

... Monte Carlo runs. It is observed that RNS performs better than two's complement representation for anticorrelated data and slightly worse than sign-magnitude and two's complement representations for uncorrelated and correlated sequences.
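Returning to the RNS conversions of Section 1.3.1 and the overflow test of Section 1.3.3, the Python sketch below reproduces the base {3, 5, 7} example (forward conversion, channel-wise addition, CRT, and mixed radix conversion) and then detects overflow with one redundant modulus. The function names are illustrative, and the redundant modulus 11 is an assumption chosen here only because it is co-prime to the base; it does not appear in the original text. The modular inverses rely on the three-argument `pow` with a negative exponent, which requires Python 3.8 or later.

```python
from math import prod

def to_rns(value, moduli):
    """Forward conversion (eq. 1.38): one residue per modulus."""
    return [value % m for m in moduli]

def rns_add(xs, ys, moduli):
    """Digit-parallel, carry-free addition (eq. 1.39)."""
    return [(x + y) % m for x, y, m in zip(xs, ys, moduli)]

def crt(residues, moduli):
    """Inverse conversion by the Chinese remainder theorem (eq. 1.40)."""
    big_m = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        m_bar = big_m // m_i
        m_bar_inv = pow(m_bar, -1, m_i)          # inverse of m_bar modulo m_i
        total += m_bar * ((m_bar_inv * x_i) % m_i)
    return total % big_m

def mixed_radix_digits(residues, moduli):
    """Sequential mixed radix conversion (eqs. 1.42 and 1.43), lowest digit first."""
    work = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        digit = work[i] % m_i
        digits.append(digit)
        # Subtract the digit and divide by m_i in every remaining channel.
        for j in range(i + 1, len(moduli)):
            m_j = moduli[j]
            work[j] = ((work[j] - digit) * pow(m_i, -1, m_j)) % m_j
    return digits

def overflows(residues, extended_moduli):
    """Overflow of the nonredundant range: highest mixed radix digit is nonzero."""
    return mixed_radix_digits(residues, extended_moduli)[-1] != 0

base = (3, 5, 7)
x, y = to_rns(10, base), to_rns(5, base)
z = rns_add(x, y, base)
z_int = crt(z, base)
z_digits = mixed_radix_digits(z, base)

extended = base + (11,)                           # hypothetical redundant modulus
small = rns_add(to_rns(10, extended), to_rns(5, extended), extended)
large = rns_add(to_rns(100, extended), to_rns(50, extended), extended)
print(z, z_int, z_digits, overflows(small, extended), overflows(large, extended))
```

The sum 100 + 50 = 150 exceeds the legitimate range [0, 104] of the base {3, 5, 7} but fits in the augmented range, so its highest order mixed radix digit is nonzero, whereas the digit is zero for 10 + 5 = 15.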
References

Bayoumi, M.A., Jullien, G.A., and Miller, W.C. (1983). Models of VLSI implementation of residue number system arithmetic modules. Proceedings of 6th Symposium on Computer Arithmetic, 412-413.
Chandrakasan, A.P., and Brodersen, R.W. (1995). Low power digital CMOS design. Boston: Kluwer Academic Publishers.
Chren, W.A., Jr. (1998). One-hot residue coding for low delay-power product CMOS design. IEEE Transactions on Circuits and Systems, Part II 45, 303-313.
Freking, W.L., and Parhi, K.K. (1997). Low-power FIR digital filters using residue arithmetic. Proceedings of Thirty-first Asilomar Conference on Signals, Systems, and Computers, 739-743.
Ibrahim, M.K. (1994). Novel digital filter implementations using hybrid RNS-binary arithmetic. Signal Processing 40, 287-294.
Koren, I. (1993). Computer arithmetic algorithms. Englewood Cliffs, NJ: Prentice Hall.
Orginos, I., Paliouras, V., and Stouraitis, T. (1995). A novel algorithm for multioperand logarithmic number system addition and subtraction using polynomial approximation. Proceedings of International Symposium on Circuits and Systems, III.1992-III.1995.
Paliouras, V., and Stouraitis, T. (1996). A novel algorithm for accurate logarithmic number system subtraction. Proceedings of International Symposium on Circuits and Systems 4, 268-271.
Paliouras, V., and Stouraitis, T. (2001). Signal activity and power consumption reduction using the logarithmic number system. Proceedings of IEEE International Symposium on Circuits and Systems, II.653-II.656.
Paliouras, V., and Stouraitis, T. (2001). Low-power properties of the logarithmic number system. Proceedings of the 15th Symposium on Computer Arithmetic (ARITH15), 229-236.
Peebles, P.Z., Jr. (1987). Probability, random variables, and random signal principles. New York: McGraw-Hill.
Sacha, J.R., and Irwin, M.J. (1998). The logarithmic number system for strength reduction in adaptive filtering. Proceedings of International Symposium on Low-Power Electronics and Design, 256-261.
Soderstrand, M.A., Jenkins, W.K., Jullien, G.A., and Taylor, F.J. (1986). Residue number arithmetic: Modern applications in digital signal processing. New York: IEEE Press.
Stouraitis, T. (1986). Logarithmic number system: Theory, analysis and design. Ph.D. diss., University of Florida.
Stouraitis, T., Kim, S.W., and Skavantzos, A. (1993). Full adder-based units for finite integer rings. IEEE Transactions on Circuits and Systems, Part II 40, 740-745.
Szabó, N., and Tanaka, R. (1967). Residue arithmetic and its applications to computer technology. New York: McGraw-Hill.
Taylor, F., Gill, R., Joseph, J., and Radke, J. (1988). A 20-bit logarithmic number system processor. IEEE Transactions on Computers 37, 190-199.
Taylor, F.J., Papadourakis, G., Skavantzos, A., and Stouraitis, T. A radix-4 FFT using complex RNS arithmetic. IEEE Transactions on Computers C-34, 573-576.