
Lowering Power Dissipation and Energy Consumption in Arithmetic Logic Units

From the Years 2002 to 2015

James Harrison
Department of Electrical and Computer Engineering
University of Central Florida
Orlando, FL 32816-2362

Abstract: When designing a processor or a logic circuit, one of
the most important factors to take into account is power and
energy consumption. This paper evaluates how different
Arithmetic Logic Units (ALUs) are designed with the purpose of
lowering power consumption and being more energy efficient.
Along with power, the datapath width, the number of bits in
operands, the ITRS technology node, the execution time, and
the area are compared. Examples of the ALU designs discussed
include the Low Power Delay Fault Testable 32b ALU, the
Ultra-area-efficient fault-tolerant QCA full adder, and the
High Throughput Power Aware FIR Filter.
Keywords: ALU, power, data bus, execution time, adder,
multiplier, floating point, area, energy

I. INTRODUCTION
A Central Processing Unit (CPU) typically has two main
components: the Arithmetic Logic Unit (ALU) and the Control
Unit (CU). The ALU performs arithmetic and logical operations
and is a fundamental building block of the CPU [11]. Examples
of the arithmetic operations performed by an ALU are addition,
subtraction, multiplication, and division. As the name
implies, the ALU also performs logical operations such as
XOR, OR, and AND. A logical operation can also be a
comparison: the unit can compare letters, characters, or
numbers, and based on the result of the comparison the
computer can take a specific action [11].
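As a minimal sketch of the operations described above, the following Python function models a hypothetical ALU that selects an arithmetic, logical, or comparison operation from an opcode string. The opcode names, the 8-bit default width, and the function name alu are assumptions made for illustration and are not taken from [11].

```python
def alu(op, a, b, width=8):
    """Toy ALU: apply `op` to unsigned operands and wrap to `width` bits."""
    mask = (1 << width) - 1
    operations = {
        "ADD": lambda: a + b,
        "SUB": lambda: a - b,
        "MUL": lambda: a * b,
        "DIV": lambda: a // b,       # integer division; b must be nonzero
        "AND": lambda: a & b,
        "OR": lambda: a | b,
        "XOR": lambda: a ^ b,
        "CMP": lambda: int(a == b),  # comparison yields a 1-bit flag
    }
    # Only the selected operation is evaluated, then wrapped to the width.
    return operations[op]() & mask
```

Because results are masked to the datapath width, an addition that overflows wraps around, which is how a fixed-width ALU behaves when the carry out is ignored.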
According to [12], before a computer can execute an
instruction, data and program instructions must be placed into
memory from a device, whether an input device or secondary
storage. Once they are in memory, the CPU performs four steps
for each instruction. The first two steps are called
instruction time and the last two steps are called execution
time. First, the Control Unit gets the instruction from
memory; it then decodes the instruction and directs the data
to the ALU. The ALU then executes the arithmetic or logical
instruction on the data, and the result of the operation is
stored in memory or in a register. The ALU is the unit that
performs the actual operations on the data [12].
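The four steps above can be sketched as a small loop. This is an illustrative model only: the tuple instruction format, the two opcodes, and the dictionary used as memory are assumptions, not part of [12].

```python
def run(program, memory):
    """Execute (op, dst, src1, src2) instructions over a dict-based memory."""
    pc = 0
    while pc < len(program):
        instruction = program[pc]          # step 1: fetch (instruction time)
        op, dst, src1, src2 = instruction  # step 2: decode (instruction time)
        a, b = memory[src1], memory[src2]  # operands directed to the ALU
        if op == "ADD":                    # step 3: execute (execution time)
            result = a + b
        elif op == "AND":
            result = a & b
        else:
            raise ValueError(f"unknown opcode: {op}")
        memory[dst] = result               # step 4: store (execution time)
        pc += 1
    return memory
```

For example, run([("ADD", "z", "x", "y")], {"x": 6, "y": 5, "z": 0}) leaves the sum of x and y in z.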
There are many important metrics when dealing with
computers, such as data bus width, ITRS technology node,
execution time, power dissipation, and energy consumption of
processors [13].
The width of a data bus helps to determine its data rate,
that is, the number of bytes per second it can carry. This is
one of the main factors determining the processing power of a
computer. Many processor designs use a 32-bit bus, which means
that 32 bits of data can be transferred at one time; the wider
the bus, the more information can be moved in each transfer
[13].
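The relation between bus width and data rate can be made concrete with a short calculation. The sketch below assumes one full-width transfer per bus cycle; the function name and the 100 MHz transfer rate in the usage note are hypothetical values chosen for illustration.

```python
def bus_data_rate(width_bits, transfers_per_second):
    """Peak data rate in bytes per second for a parallel bus.

    Assumes every transfer moves exactly `width_bits` bits.
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * transfers_per_second
```

Under these assumptions, a 32-bit bus at 100 million transfers per second carries 400 million bytes per second, and doubling the width to 64 bits doubles the rate.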
Execution time is defined as the time it takes for a single
instruction to be executed. It covers the last portion of the
instruction cycle, which comprises the actions taken by the
ALU: executing the arithmetic or logical instruction on the
data and then storing the result in a register [13].
Power dissipation is the process by which a CPU consumes
electrical energy and dissipates it, either as heat or through
the switching of devices within the CPU [13].
In the following sections of this paper, 10 ALU designs
published between 2002 and 2015 are discussed, along with how
each design lowered power dissipation and improved energy
consumption.

II. LITERATURE REVIEW


One of the most important factors to consider when designing
a logic circuit or a processor is power and energy
consumption. How much power a digital logic circuit consumes
greatly affects the performance of the system. Many different
ALUs have been designed with the main purpose of lowering the
power dissipated. Below are 10 models and implementations of
different methods, all of which lower power or energy
consumption in some way.
The design reviewed in [1] is an IEEE 754 single-precision
floating-point unit from 2015. The paper discusses different
methods of improving energy efficiency, one being to partially
truncate the computation of the mantissa and to allow the
bit-width of the mantissa to be dynamically adjustable in the
multiplicand, multiplier, and output product. Another method
of minimizing the power and energy consumption of digital
systems implemented in complementary metal-oxide-semiconductor
(CMOS) technology is to reduce the supply voltage to near the
threshold voltage; this incurs a performance penalty because
it lowers the logic speed [1].
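The benefit of lowering the supply voltage can be illustrated with the standard first-order model of CMOS switching power, P = a * C * V^2 * f. This is a textbook approximation rather than the analysis performed in [1], and the function names and all numeric values used with it are hypothetical.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power in watts: a * C * V^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def relative_power(vdd_new, vdd_old):
    """Ratio of switching power after a supply change, all else equal."""
    return (vdd_new / vdd_old) ** 2
```

In this model, halving the supply voltage at a fixed frequency cuts switching power to one quarter. In practice the lower voltage also reduces the achievable clock frequency, which is the performance penalty noted above.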
The Quantum-dot Cellular Automata (QCA) circuits implemented
in [2] aim at lower energy consumption as well as faster
operation. The design is an ultra-area-efficient
fault-tolerant QCA full adder from 2015. In order to restore
energy, four distinct clock phases were applied. The QCA cells
were synchronized by the clock signals, which made the QCA
operations more accurate and allowed more efficient
performance [2].
Paper [3] proposes a computing scheme centered on
probabilistic domain transformation, aiming for fault
resilience as well as low-power operation. Switching from the
usual multiplier-based convolution methods, [3] presents a
multiplier-less probabilistic convolution. The design,
published in 2014, is an energy-efficient replacement for the
multiplier. In this model, a lightweight adder replaces the
expensive multipliers through probabilistic domain
transformation. This allows more basic operations to be used,
achieving higher energy efficiency at a lower hardware cost
[3].
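One identity that makes an adder-only convolution possible is that the distribution of the sum of two independent random variables equals the convolution of their distributions. The sketch below illustrates that identity in software; whether [3] implements it in this form is an assumption, and the function names, sample count, and seed are hypothetical.

```python
import random


def exact_convolution(p, q):
    """Reference result using explicit multiply-accumulate operations."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def sampled_convolution(p, q, samples=200_000, seed=0):
    """Estimate the convolution of two probability vectors by sampling.

    Each sample draws one index from p and one from q and adds them,
    so the per-sample arithmetic is an addition and a counter update.
    """
    rng = random.Random(seed)
    xs = rng.choices(range(len(p)), weights=p, k=samples)
    ys = rng.choices(range(len(q)), weights=q, k=samples)
    counts = [0] * (len(p) + len(q) - 1)
    for x, y in zip(xs, ys):
        counts[x + y] += 1
    return [count / samples for count in counts]
```

The estimate converges to the exact result as the number of samples grows, which trades accuracy against the amount of work and is one way such a scheme can tolerate occasional faults.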
In the 2003 High Throughput Power-Aware Finite Impulse
Response (FIR) filter design, the average power dissipation
and the latency are both significantly reduced by pipelining
the multipliers and adders. Power awareness was improved in
order to reduce power dissipation. By combining a selective
method with a 2-D gating technique, both improved power
awareness and reduced latency of the FIR filter were achieved
[4].
Paper [5] describes an implementation using the Xilinx DSP48
multiplier in 2014. The authors implemented the Priority Using
Resource Escalation (PURE) approach, which provides adaptive
and dynamic reconfiguration to achieve survivability. With
PURE, dynamic reconfiguration of redundancy permits autonomous
operation while maintaining a defined quality measure within
area, power, and energy constraints, at reduced power and area
overheads compared to static redundancy schemes. It does so by
adapting a uniplex instance of the datapath when aberrant
behavior occurs [5].

In 2002, a high-speed 4-bit ALU was designed for 1 V
operation in order to demonstrate how useful the back-gate
forward substrate bias (BGFSB) method can be. This ALU used a
ripple-carry adder and was capable of performing eight
operations, four logical and four arithmetic. The BGFSB method
operates at low voltage and is also capable of high-speed
applications. In the steady state, the subthreshold current
increases because BGFSB reduces the threshold voltage [6].
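For readers unfamiliar with the ripple-carry adder used in this ALU, the following bit-level sketch shows how the carry propagates from one full adder to the next. It models only the logic, not the BGFSB circuit technique of [6], and the function names are chosen for illustration.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(a, b, width=4):
    """Add two unsigned `width`-bit values one bit at a time."""
    result, carry = 0, 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        # Each stage must wait for the carry produced by the previous stage.
        sum_bit, carry = full_adder(bit_a, bit_b, carry)
        result |= sum_bit << i
    return result, carry
```

Because every stage waits on the carry of the stage before it, the delay grows linearly with the operand width, which is why the design is simple and small but slower than carry-select or tree adders at large widths.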

In 2009, a Vedic multiplier module and a 64-bit adder were
used in order to reduce the complexity, area, execution time,
and power of computations. The design produced a high-speed,
power-efficient multiplier [7].
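Vedic multipliers are commonly built on the "vertically and crosswise" (Urdhva Tiryagbhyam) scheme, in which a 2x2 multiplier block is composed into wider multipliers using adders. The sketch below illustrates that general scheme; it is an assumption that [7] follows this exact decomposition, and the function names are hypothetical.

```python
def vedic_2x2(a, b):
    """2-bit by 2-bit product built from AND gates and half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                 # vertical product of the low bits
    x, y = a1 & b0, a0 & b1      # crosswise partial products
    p1, c1 = x ^ y, x & y        # half adder on the crosswise terms
    z = a1 & b1                  # vertical product of the high bits
    p2, p3 = z ^ c1, z & c1      # half adder merges the carry
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_multiply(a, b, width):
    """Multiply unsigned values; `width` must be a power of two >= 2."""
    if width == 2:
        return vedic_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Four half-width products, combined with shifts and additions.
    low = vedic_multiply(a_lo, b_lo, half)
    cross = vedic_multiply(a_hi, b_lo, half) + vedic_multiply(a_lo, b_hi, half)
    high = vedic_multiply(a_hi, b_hi, half)
    return low + (cross << half) + (high << width)
```

The four half-width products are independent of one another, so in hardware they can be generated in parallel, which is the usual argument for the speed of this structure.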
A 16-bit low-power pipelined RISC processor using a
carry-select adder is presented in paper [8] from 2015. The
RISC processor was designed in Verilog HDL and evaluated on
the XILINX KINTEX XC7K1607-3fbg676, a 28 nm technology device
on which the two clock cycles were implemented. The 16-bit
RISC processor uses a 2-stage pipeline to increase speed and
reduce latency. A design technique called clock gating was
also used; clock gating is a low-power technique that reduces
power consumption. With this method the dynamic power was
greatly reduced from 0.71 W, the quiescent power was reduced
to 0.149 W, and the total power was reduced to 0.22 W [8].
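The carry-select adder mentioned above can be sketched at block level as follows. Each block computes its sum for both possible incoming carries, and the real carry then selects one of the two. The 4-bit block size and the function names are assumptions; [8] does not specify them in the text above.

```python
def add_block(a, b, carry_in, width):
    """Add two `width`-bit values plus a carry; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, width=16, block=4):
    """Carry-select addition; `width` must be a multiple of `block`."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidate results are computed without waiting for the carry.
        sum0, carry0 = add_block(a_blk, b_blk, 0, block)
        sum1, carry1 = add_block(a_blk, b_blk, 1, block)
        # The incoming carry acts as the multiplexer select signal.
        block_sum, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= block_sum << shift
    return result, carry
```

The duplicated block adders cost extra area, but the carry only has to pass through one multiplexer per block instead of rippling through every bit. In hardware the lowest block usually needs no duplicate because its carry-in is known; this sketch duplicates it for uniformity.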
In [9], from 2005, an upper-order 32b adder was used, because
fast 32b and 64b ALU operations with single-cycle latency and
throughput are essential ingredients of high-performance
microprocessor execution cores [9]. In the 64b mode, power
performance was optimized by a gated secondary off-chip supply
voltage. Using high-speed single-rail dynamic circuit
techniques and a sparse-tree semi-dynamic adder, low dynamic
power consumption and high noise robustness at the maximum
voltage were obtained. For an efficient power-performance
tradeoff, the slack in the upper-order 32b carry-merge tree
allowed its supply voltage to be lowered to 1 V. This resulted
in an extra 22% power benefit [9].
The design in paper [10], from 2005, is a 32b ALU built
around a 32-bit adder. The design allows low-power operation
while also supporting a design-for-test (DFT) scheme. It
achieved a 22% reduction in standby-mode leakage power and an
18% reduction in total ALU energy. The method integrates a
delay-fault-testable scheme into the logic design flow and can
detect a large range of delay faults. These design techniques
can help in achieving low-power operation of high-end digital
ICs [10].

III. DATA ANALYSIS

Metrics covered by the various papers that are suitable for
plotting:

Data bus width (bits) vs. year
ITRS technology node (nm) vs. year
Execution time per ALU or floating-point unit operation (ns) vs. year
Power or energy vs. year

[Chart: Data Bus Width vs. Year; axes: data bus width (bits), year]
Fig. 1. This chart represents the data bus width in bits vs. the
year, from 2003 to 2015.

[Chart: Power vs. Year; axes: power (mW), year]
Fig. 4. This chart is for power vs. year. Some of the data had
power in different units or did not have a power measurement,
so those designs are indicated as low on the chart.

IV. CONCLUSION

Lowering power and energy consumption is a very important
aspect of designing and using ALUs. This paper covered many
different ALU designs and implementations that aimed to lower
power and energy consumption, along with other metrics such as
execution time. In [6] I was able to see the effect of using a
ripple-carry adder, which I had previously learned about in
module-09, and in [8] the effect of a carry-select adder was
shown.

REFERENCES
[1] S. Salehi and R. F. DeMara, "Energy and Area Analysis of a
Floating-Point Unit in 15nm CMOS Process Technology," in
Proceedings of IEEE SoutheastCon 2015 (SECon-2015), Fort
Lauderdale, FL, April 9-12, 2015.

[2] A. Roohi, R. F. DeMara, and N. Khoshavi, "Design and
Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full
Adder," Microelectronics Journal, vol. 46, no. 6, pp. 531-542,
June 2015.

[3] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin,
"Energy-Efficient Multiplier-Less Discrete Convolver through
Probabilistic Domain Transformation," in Proceedings of 22nd
ACM/SIGDA International Symposium on Field-Programmable Gate
Arrays (FPGA-14), pp. 185-188, Monterey, California, USA,
February 27-28, 2014.

[4] J. Di, J. S. Yuan, and R. DeMara, "High Throughput
Power-aware FIR Filter Design based on Fine-grain Pipeline
Multipliers and Adders," in Proceedings of the 2003 IEEE Annual
Symposium on VLSI (ISVLSI-03), pp. 260-261, Tampa, Florida,
U.S.A., February 20-21, 2003.

[5] N. Imran, R. F. DeMara, J. Lee, and J. Huang, "Self-adapting
Resource Escalation for Resilient Signal Processing
Architectures," The Springer Journal of Signal Processing
Systems (JSPS), vol. 77, no. 3, pp. 257-280, December 2014.

[6] A. Srivastava and D. Govindarajan, "A Fast ALU Design in
CMOS for Low Voltage Operation," VLSI Design, vol. 14, no. 4,
pp. 315-327, 2002.

[Chart: ITRS Technology Node vs. Year; axes: ITRS technology node (nm), year]
Fig. 2. This graph is for ITRS technology node vs. year. Some
data had units other than nm or did not have an ITRS technology
node listed, so that data was not placed into the chart.

[7] M. Ramalatha, K. D. Dayalan, P. Dharani, and S. D. Priya,
"High speed energy efficient ALU design using Vedic
multiplication techniques," in International Conference on
Advances in Computational Tools for Engineering Applications
(ACTEA '09), pp. 600-603, July 15-17, 2009.

[8] P. Trivedi and R. P. Tripathi, "Design & analysis of 16 bit
RISC processor using low power pipelining," in 2015
International Conference on Computing, Communication &
Automation (ICCCA), pp. 1294-1297, May 15-16, 2015.

[9] S. K. Mathew, M. A. Anders, B. Bloechel, T. Nguyen, R. K.
Krishnamurthy, and S. Borkar, "A 4-GHz 300-mW 64-bit integer
execution ALU with dual supply voltages in 90-nm CMOS," IEEE
Journal of Solid-State Circuits, vol. 40, no. 1, pp. 44-51,
Jan. 2005.

[10] B. Chatterjee and M. Sachdev, "Design of a 1.7-GHz
low-power delay-fault-testable 32-b ALU in 180-nm CMOS
technology," IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 13, no. 11, pp. 1296-1304, Nov. 2005.

Additional References
[11] Angelina. "What Is A CPU and What Does It Do? [Technology
Explained]." MakeUseOf. N.p., n.d. Web. 07 Dec. 2015.

[12] Zandbergen, Paul. "Arithmetic Logic Unit (ALU): Definition, Design &
Function." Study. N.p., n.d. Web. 07 Dec. 2015.
[13] Patterson, David A., and John L. Hennessy. Computer Organization and
Design: The Hardware/software Interface. N.p.: n.p., n.d. Print.

TABLE I. ALU ARCHITECTURE NAME AND SPECIFICATIONS.

The Adder, Multiplier, and Floating Point columns give the time
for operation or the design type.

ALU or Floating Point Architecture Name | Datapath width (bits) or #bits in operands | Adder | Multiplier | Floating Point | ITRS Technology Node (nm) or Area or Model of Chip used | Energy/Power Consumption (W or J), else low or high
Energy and Area Analysis of a Floating-Point Unit [1] | 32 bits (operands) | N/A | N/A | IEEE-754 Single Precision | 45 nm and 15 nm (ITRS node) | 2.048 mW (45 nm), 0.6340 mW (15 nm)
Ultra-area-efficient fault-tolerant QCA full adder [2] | 1 bit (operands) | Ultra-area-efficient fault-tolerant QCA full adder | N/A | N/A | 18 nm^2 (cell area) | low
Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation [3] | 128 bits (operands) | N/A | 4.09 s, Energy-Efficient Multiplier | N/A | Virtex 6 FPGA devices (XC6VLX550t) (model of chip used) | 166.63 nJ
High Throughput Power Aware FIR Filter [4] | 64 bits (operands) | High Throughput Power Aware FIR adder | High Throughput Power Aware FIR multiplier | N/A | 0.24 µm static CMOS logic | low
Advanced Encryption Standard Design [5] | 128 bits (operands) | N/A | Xilinx DSP48 multiplier | N/A | N/A | low
Back-gate forward substrate bias (BGFSB) method [6] | 4 bit (operand) | Ripple carry adder | N/A | N/A | 1.2 µm N-well CMOS technology | low
Vedic ALU [7] | 64 bit (operand) | 64 bit adder | Vedic Multiplier Module | N/A | N/A | low
16 bit RISC Processor [8] | 16 bit (operands) | Carry select adder | N/A | N/A | XILINX KINTEX XC7K1607-3fbg676 kit; 28 nm technology | 0.22 W
Integer Execution ALU with Dual Supply Voltages [9] | 64 bit (operands) | Upper order 32 bit adder | N/A | N/A | 90 nm CMOS | 300 mW
Low Power Delay Fault Testable 32b ALU [10] | 32 bit (operands) | 32 bit adder | N/A | N/A | 180 nm CMOS | 200W

Figure 1 ALU Design
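To prepare a data bus width vs. year plot, the operand widths in Table I can be paired with the years given for each design in Section II and grouped by year. The sketch below does only that grouping; the variable and function names are hypothetical, and the values are copied from the table and the literature review.

```python
from collections import defaultdict

# (reference, year from Section II, operand width in bits from Table I)
DESIGNS = [
    ("[1]", 2015, 32), ("[2]", 2015, 1), ("[3]", 2014, 128),
    ("[4]", 2003, 64), ("[5]", 2014, 128), ("[6]", 2002, 4),
    ("[7]", 2009, 64), ("[8]", 2015, 16), ("[9]", 2005, 64),
    ("[10]", 2005, 32),
]


def widths_by_year(designs):
    """Group operand widths by publication year, sorted by year."""
    grouped = defaultdict(list)
    for _reference, year, width in designs:
        grouped[year].append(width)
    return dict(sorted(grouped.items()))
```

The grouped result can be passed to any plotting tool; it also makes visible that the widths in this sample do not grow steadily with the year, since the 2015 designs range from 1 to 32 bits.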
