MCSLA

CHAPTER 1
INTRODUCTION
1.1 AIM
The challenge of verifying a large design is growing exponentially, and new methods
are needed to make functional verification easier. Several strategies have been proposed
in recent years to achieve good functional verification with less effort. The most recent
advance towards this goal is the use of methodologies. A methodology defines
a skeleton to which designers add flesh and skin according to their requirements to achieve
functional verification.
The report is organized in two major parts. The first part gives a brief introduction
to, and history of, the regular carry select adder, covering the advantages of the carry
select adder and the RCA architecture. The regular model has a drawback, and the
modified CSLA architecture has been designed to overcome it.
Thus, the aim of this project is to design a simple and efficient gate level
modification that significantly reduces the area and power of the CSLA. Based on this
modification, 8-, 16-, 32-, and 64-b square-root CSLA (SQRT CSLA) architectures have
been developed and compared with the regular SQRT CSLA architecture.
1.2 BASIC IDEA

The CSLA is used in many computational systems to alleviate the problem of carry
propagation delay by independently generating multiple carries and then selecting a carry to
generate the sum. However, the CSLA is not area efficient because it uses multiple pairs of
Ripple Carry Adders (RCA) to generate the partial sum and carry by considering carry input
Cin=0 and Cin=1; the final sum and carry are then selected by the multiplexers (mux).
The basic idea of this work is to use AOI instead of the RCA with Cin=1 in the regular
CSLA to achieve lower area and power consumption. The main advantage of this AOI
logic comes from its smaller number of logic gates compared with the n-bit Full Adder (FA) structure.
1.3 NEED FOR LOW POWER AND AREA EFFICIENT DESIGN

The design of area- and power-efficient high-speed data path logic systems is one of the
most substantial areas of research in VLSI system design. In digital adders, the speed of
addition is limited by the time required to propagate a carry through the adder. The sum
for each bit position in an elementary adder is generated sequentially only after the
previous bit position has been summed and a carry propagated into the next position.
The adder plays a vital role in many data processors, where it performs fast
arithmetic functions. Hence, to resolve this issue, the carry select adder has been developed
to reduce the time taken for the carry to propagate to the next position.
Another important point is that the regular carry select adder is evaluated against
the proposed design, which has a more balanced delay and requires lower
power and area.
1.4 REGULAR CARRY SELECT ADDER

The Regular Carry Select Adder is represented in Figure 1.1. This project is
mainly targeted at data processors that perform fast arithmetic functions.
Fig 1.1: Regular Carry Select Adder
In this design, the carry select adder is built from ripple carry adders and
multiplexers. The design can be viewed as a set of groups, each internally
built from an n-bit RCA and multiplexers.
Since this design uses multiple pairs of ripple carry adders to generate the partial sum
and carry, it is not area efficient. Thus a Binary to Excess-1 Converter (BEC) is used
instead of the RCA with Cin=1 in the regular CSLA to achieve lower area and power
consumption. The main advantage of this BEC logic comes from its smaller number of
logic gates compared with the n-bit Full Adder (FA) structure.
1.5 MODIFIED CARRY SELECT ADDER
Fig 1.2: Modified Carry Select Adder

As stated above, the main idea of this work is to use AOI instead of the RCA with
Cin=1 in order to reduce the area and power consumption of the regular CSLA. To replace
the n-bit RCA, an (n+1)-bit AOI is required.
Finally, the performance of the two designs is evaluated in terms of area, power,
delay and their products: the area-delay product and the power-delay product.
CHAPTER 2
LITERATURE REVIEW
2.1 INTRODUCTION TO VLSI:
The electronics industry has achieved phenomenal growth over the last two
decades, mainly due to rapid advances in integration technologies and large-scale systems
design brought about by the advent of VLSI. The number of applications of integrated circuits in
high-performance computing, telecommunications and consumer electronics has been
rising steadily and at a very fast pace. Typically, the required computational power of
these applications is the driving force for the fast development of this field.
Figure 2.1 gives an overview of the prominent trends in information technologies
over the next few decades. The current leading-edge technologies already provide end
users with a certain amount of processing power and portability.
Fig 2.1: Overview of the prominent trends in information technologies

This trend is expected to continue with very important implications on VLSI and
systems design. One of the most important characteristics of information services is their
increasing need for very high processing power and bandwidth (in order to handle
real-time video, for example).
The other important characteristic is that the information services tend to become
more and more personalized (as opposed to collective services such as broadcasting),
which means that the devices must be more intelligent to answer individual demands and
at the same time they must be portable to allow more flexibility/mobility.
As more and more complex functions are required in various data processing and
telecommunications devices, the need to integrate these functions in a small
system/package is also increasing.
The level of integration, as measured by the number of logic gates in a monolithic
chip, has been steadily rising for almost three decades mainly due to the rapid progress in
processing technology and interconnect technology.
Table 2.1 shows the evolution of logic complexity in integrated circuits over the last
three decades and marks the milestones of each era. Here, the numbers for circuit
complexity should be interpreted only as representative examples to show the order of
magnitude. A logic block can contain anywhere from 10 to 100 transistors, depending on
the function. State-of-the-art examples of ULSI chips, such as the DEC Alpha or the
INTEL Pentium, contain 3 to 6 million transistors.
Table 2.1: Evolution of logic complexity in integrated circuits
The most important message here is that the logic complexity per chip has been (and
still is) increasing exponentially. The monolithic integration of a large number of
functions on a single chip usually provides:
Less area/volume and therefore compactness.

Less power consumption.
Less testing requirements at system level.
Higher reliability, mainly due to improved on-chip interconnects.
Higher speed, due to significantly reduced interconnection length.
Significant cost savings.
Different types of adders are discussed here and compared with respect to their
functionality.
2.2 Introduction to Adders:

Addition is a crucial arithmetic function that usually determines the overall performance
of digital systems, and adders are among the most widely used blocks in electronic applications.
Adders are used in multipliers and in DSP to execute algorithms such as FFT, FIR and IIR
filtering. Adders come into the picture wherever multiplication is needed. Because
microprocessors execute millions of instructions per second, speed of operation is the
most important constraint to consider when designing multipliers. Portable devices also
demand a high degree of miniaturization and low power consumption; devices such as
mobile phones and laptops require long battery life.
So, a VLSI designer has to optimize these three parameters (speed, area and power) in a
design. These constraints are very difficult to satisfy together, so depending on demand or
application, some compromise between them has to be made. The ripple carry adder is the most
compact design but the slowest, whereas the carry lookahead adder is the fastest
but consumes more area. Carry select adders act as a compromise between the two.
In 2002, Wang et al. presented a new concept of hybrid adders to speed up addition,
giving a hybrid carry look-ahead/carry select adder design. In 2008, low
power multipliers based on new hybrid full adders were presented.
Much of the research effort of the past years in the area of digital electronics has
been directed towards increasing the speed of digital systems. Recently, the requirement of
portability and the moderate improvement in battery performance indicate that power
dissipation is one of the most critical design parameters.
The three most widely accepted metrics to measure the quality of a circuit or to
compare various circuit styles are area, delay and power dissipation. Portability imposes
a strict limitation on power dissipation while still demanding high computational speeds.
Hence, in recent VLSI systems, the power-delay product becomes the most essential
metric of performance. The reduction of power dissipation and the improvement of
speed require optimizations at all levels of the design procedure. Since most digital
circuitry is composed of simple and/or complex gates, we study the best way to implement
adders in order to achieve low power dissipation and high speed.
The design of area- and power-efficient high-speed data path logic systems is one of the
most substantial areas of research in VLSI system design. In digital adders, the speed of
addition is limited by the time required to propagate a carry through the adder. The sum
for each bit position in an elementary adder is generated sequentially only after the
previous bit position has been summed and a carry propagated into the next position.
The CSLA is used in many computational systems to alleviate the problem of carry
propagation delay by independently generating multiple carries and then selecting a carry to
generate the sum. However, the CSLA is not area efficient because it uses multiple pairs
of Ripple Carry Adders (RCA) to generate the partial sum and carry by considering carry
input Cin=0 and Cin=1; the final sum and carry are then selected by the multiplexers
(mux).
In electronics, an adder or summer is a digital circuit that performs addition of
numbers. In many computers and other kinds of processors, adders are used not only in
the arithmetic logic units but also in other parts of the processor, where they calculate
addresses, table indices and similar quantities.
Although adders can be constructed for many numerical representations, such
as binary-coded decimal or excess-3, the most common adders operate
on binary numbers. In cases where two's complement or ones' complement is being used
to represent negative numbers, it is trivial to modify an adder into an adder-subtractor.
Other signed number representations require a more complex adder.
2.2.1.Basic Adders
2.2.1.1 Half Adder
Fig 2.2: Half adder

The half adder adds two one-bit binary numbers A and B. It has two
outputs, Sum and Carry. The simplest half adder design, shown in figure 2.2, incorporates
an XOR gate for Sum and an AND gate for Carry.
Table 2.2: Truth table of half adder
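The XOR/AND structure of the half adder can be sketched in software as follows. This is an illustrative Python model only; the function name half_adder is an assumption made for this example and is not part of the original design.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two one-bit inputs."""
    return a ^ b, a & b  # XOR gives Sum, AND gives Carry


if __name__ == "__main__":
    # Enumerate all four input combinations to list the truth table.
    print("A B | Sum Carry")
    for a in (0, 1):
        for b in (0, 1):
            s, c = half_adder(a, b)
            print(f"{a} {b} |  {s}    {c}")
```

Enumerating the four input combinations in this way lists the rows of the half adder truth table.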
2.2.1.2 Full adder

With the addition of an OR gate to combine their carry outputs, two half adders can
be combined to make a full adder.
Fig 2.3: Full adder

The logic diagram of Full adder is shown in the figure 2.4. The full adder uses 2
XOR gates for the calculation of its sum and 1 XOR, 1 OR and 2 AND gates for the
calculation of carry out.
Fig 2.4: Logic diagram of Full adder

The full adder is usually a component in a cascade of adders that add 8-, 16-, 32-bit
(and wider) binary numbers. The circuit produces a two-bit output sum represented by the
signals Cout and S. The one-bit full adder's truth table is:
Table 2.3: Truth table of 1 bit full adder

A full adder can be implemented in many different ways, such as with a
custom transistor-level circuit or composed of other gates. One example implementation
is
S = A \oplus B \oplus C_{in}
C_{out} = (A \cdot B) + (C_{in} \cdot (A \oplus B))

In this implementation, the final OR gate before the carry-out output may be replaced
by an XOR gate without altering the resulting logic. Using only two types of gates is
convenient if the circuit is being implemented using simple IC chips which contain only
one gate type per chip.
In this case, Cout can be implemented as C_{out} = (A \cdot B) + (C_{in} \cdot (A \oplus B)).
A full adder can be constructed from two half adders by connecting A and B to the
input of one half adder, connecting the sum from that to an input to the second adder,
connecting Ci to the other input and OR the two carry outputs. Equivalently, S could be
made the three-bit XOR of A, B and Ci and Cout could be made the three-bit majority
function of A, B and Ci.
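The construction from two half adders, the OR-to-XOR substitution and the majority-function view can all be checked exhaustively over the eight input combinations. The following is a minimal Python sketch; the names full_adder and majority are assumptions chosen for illustration.

```python
from itertools import product


def full_adder(a, b, cin):
    """One-bit full adder built from two half adders and an OR gate."""
    p = a ^ b               # first half adder: sum
    g = a & b               # first half adder: carry
    s = p ^ cin             # second half adder: sum
    cout = g | (p & cin)    # OR of the two half-adder carries
    return s, cout


def majority(a, b, c):
    """Three-input majority function."""
    return (a & b) | (a & c) | (b & c)


for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin          # two-bit result equals the arithmetic sum
    assert cout == (a & b) ^ (cin & (a ^ b))    # final OR replaced by XOR
    assert cout == majority(a, b, cin)          # carry is the majority of the inputs
```

The XOR substitution holds because the two carry terms can never both be 1: A.B requires A = B, while Cin.(A xor B) requires A and B to differ.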
2.2.2 FAST ADDERS

2.2.2.1 Ripple Carry Adder
Concatenating N full adders forms an N-bit ripple carry adder, in which the carry out of
each full adder becomes the input carry of the next. It calculates sum and
carry according to the following equations. As the carry ripples from one full adder to the
next, it traverses the longest critical path and exhibits the worst-case delay.
S_i = A_i \oplus B_i \oplus C_i
C_{i+1} = A_i B_i + (A_i + B_i) C_i, where i = 0, 1, ..., n-1
The RCA is the slowest of all adders (O(n) time) but it is very compact (O(n)
area). If the ripple carry adder is implemented by concatenating N full adders, its delay
is 2N gate delays from Cin to Cout. The delay of the adder increases linearly
with the number of bits. The block diagram of the RCA is shown in figure 2.5.
Fig 2.5: Block diagram of RCA

It is possible to create a logical circuit using multiple full adders to add N-bit
numbers. Each full adder inputs a Cin, which is the Cout of the previous adder. This kind of
adder is a ripple carry adder, since each carry bit "ripples" to the next full adder. It can be
noted that the first (and only the first) full adder may be replaced by a half adder.
The layout of a ripple carry adder is simple, which allows for fast design
time. However, the ripple carry adder is relatively slow, since each full adder must wait for
the carry bit to be calculated by the previous full adder. The gate delay can easily be
calculated by inspection of the full adder circuit. Each full adder requires three levels of
logic. In a 32-bit ripple carry adder, there are 32 full adders, so the critical path (worst
case) delay is 3 (from input to carry in first adder) + 31 * 2 (for carry propagation in later
adders) = 65 gate delays. A design with alternating carry polarities and optimized
AND-OR-Invert gates can be about twice as fast.
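The rippling of the carry and the gate delay count can be illustrated with a short behavioural model. This is a sketch in Python, not the hardware description used in the project; the function names are assumptions, and the delay function simply encodes the counting rule stated above (3 levels in the first adder, 2 in each later adder).

```python
def full_adder(a, b, cin):
    """One-bit full adder using the sum and carry equations of the RCA."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a | b) & cin)   # C(i+1) = Ai.Bi + (Ai + Bi).Ci
    return s, cout


def ripple_carry_add(a, b, n, cin=0):
    """Add two n-bit integers bit by bit; return (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(n):
        # The carry out of stage i is the carry in of stage i + 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def rca_gate_delay(n):
    """Critical path in gate delays: 3 in the first adder, 2 per later adder."""
    return 3 + 2 * (n - 1)
```

For example, rca_gate_delay(32) evaluates to 3 + 31 * 2 = 65, the figure quoted in the text.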
2.2.2.2 Carry lookahead adders
Fig 2.6: 4-bit adder with carry lookahead

To reduce the computation time, engineers devised faster ways to add two binary
numbers by using carry-lookahead adders.
They work by creating two signals (P and G) for each bit position, based on whether a
carry is propagated through from a less significant bit position (at least one input is a '1'),
generated in that bit position (both inputs are '1'), or killed in that bit
position (both inputs are '0').
In most cases, P is simply the sum output of a half adder and G is the carry output of
the same adder. After P and G are generated, the carries for every bit position are created.
Some advanced carry-lookahead architectures are the Manchester carry chain, the
Brent-Kung adder and the Kogge-Stone adder.
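How the carries follow from P and G can be sketched as below. This Python model takes P as the half adder sum (A xor B) and G as the half adder carry (A and B), as described above, and writes each carry as a flat sum of products so that no carry depends on the previous one. The function names and the 4-bit default width are assumptions for illustration.

```python
def cla_carries(a_bits, b_bits, c0=0):
    """Return (P, carries c0..cn), each carry computed directly from P, G and c0."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    carries = [c0]
    for i in range(len(p)):
        # c(i+1) = G_i + P_i.G_(i-1) + ... + P_i...P_0.c0
        c = 0
        for j in range(i, -2, -1):
            term = g[j] if j >= 0 else c0         # j == -1 stands for the c0 term
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        carries.append(c)
    return p, carries


def cla_add(a, b, n=4, c0=0):
    """Add two n-bit integers using lookahead carries; return (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    p, c = cla_carries(a_bits, b_bits, c0)
    total = sum((p[i] ^ c[i]) << i for i in range(n))   # S_i = P_i xor C_i
    return total, c[n]
```

In hardware, each of these product terms is evaluated in parallel, which is where the speed advantage over the ripple carry adder comes from.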
Some other multi-bit adder architectures break the adder into blocks. It is possible to
vary the length of these blocks based on the propagation delay of the circuits to optimize
computation time. These block-based adders include the carry bypass adder, which
determines P and G values for each block rather than each bit, and the carry select
adder, which pre-generates sum and carry values for either possible carry input to the
block.
Other adder designs include the carry save adder, carry-select adder, conditional-sum
adder, carry-skip adder and carry-complete adder.
Fig 2.7: A 64-bit carry look ahead unit

By combining multiple carry lookahead adders, even larger adders can be created.
This can be done at multiple levels. For example, the adder in figure 2.7 is a 64-bit
adder that uses four 16-bit CLAs with two levels of lookahead carry units (LCUs).
2.2.2.3 Carry Skip Adder (CSKA)
The carry-skip adder reduces the time needed to propagate the carry by skipping over
groups of consecutive adder stages. It is known to be comparable in speed to the carry
lookahead technique while using less logic area and less power.
Uniform sized adder:
A carry skip adder divides the words to be added into groups of equal size of k-bits.
Carry Propagate pi signals may be used within a group of bits to accelerate the carry
propagation. If all the pi signals within the group are pi=1, carry bypasses the entire group
as shown in figure 2.8.
Fig 2.8: Carry skip adder

P = p_i \cdot p_{i+1} \cdot p_{i+2} \cdots p_{i+k}
In this way, the delay is reduced as compared to ripple carry adder. The worst-case
carry propagation delay in a N-bit carry skip adder with fixed block width b, assuming
that one stage of ripple has the same delay as one skip, can be derived:
T_{CSKA} = (b - 1) + 0.5 + (N/b - 2) + (b - 1) = 2b + N/b - 3.5 stages

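Block width strongly affects the latency of the adder. The 2b term grows with block
width, while the N/b term grows with the number of blocks: wide blocks lengthen the ripple
inside a block, and narrow blocks add more skips. The delay is therefore smallest at an
intermediate width, near b = sqrt(N/2).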
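The effect of block width on the worst-case delay can be explored numerically. The sketch below evaluates the T_CSKA expression given above for every block width that divides N evenly and leaves at least two blocks; the function names are assumptions, and the model inherits the simplification that one ripple stage costs the same as one skip.

```python
def cska_delay(n, b):
    """Worst-case delay in stages for an n-bit carry skip adder with block width b."""
    return 2 * b + n / b - 3.5


def best_block_width(n):
    """Block width (dividing n, with at least two blocks) that minimizes the delay model."""
    widths = [b for b in range(1, n + 1) if n % b == 0 and n // b >= 2]
    return min(widths, key=lambda b: cska_delay(n, b))
```

For N = 32 the candidate widths 1, 2, 4, 8 and 16 give 30.5, 16.5, 12.5, 16.5 and 30.5 stages respectively, so the model favours 4-bit blocks, which matches sqrt(32/2) = 4.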
Variable Block Adder
The idea behind the Variable Block Adder (VBA) is to minimize the critical path delay in
the carry chain of a carry skip adder by allowing the groups to take different sizes. In
a carry skip adder, this results in a larger number of skips between
stages.
Such a design is called a variable block design and is widely used to
speed up the adder. In the variable block carry skip adder design, a 32-bit adder is
divided into 4 blocks or groups: the first block is 4 bits wide, the second 6 bits,
the third 18 bits, and the last group consists of the 4 most significant
bits.
Fig 2.9: Architectural block of 8-bit Carry skip adder

The carry skip adder provides a compromise between a ripple carry adder and a
CSLA adder. The carry skip adder divides the words to be added into blocks. Within each
block, ripple carry is used to produce the sum bit and the carry. The Carry Skip Adder
reduces the delay due to the carry computation i.e. by skipping over groups of
consecutive adder stages.
If each Ai ≠ Bi in a group, then we do not need to compute the new value of
Ci+1 for that block; the carry-in of the block can be propagated directly to the
next block.
If Ai = Bi = 1 for some i in the group, a carry is generated which may be
propagated up to the output of that group.
If Ai = Bi = 0, a carry will not be propagated by that bit location.
The basic idea of a carry-skip adder is to detect whether all Ai ≠ Bi in each group and
to let the block's carry-in skip the block when this happens, as shown in figure 2.8.
In general, a block-skip delay can be different from the delay due to the propagation
of a carry to the next bit position. With carry skip adders, the linear growth of carry chain
delay with the size of the input operands is improved by allowing carries to skip across
blocks of bits, rather than rippling through them.
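The skip condition can be illustrated with a behavioural Python sketch. It models function only, not timing: each block is added in one step, and a block whose bits all satisfy Ai ≠ Bi passes its carry-in straight to the next block. The function name, the default 16-bit width and the 4-bit block size are assumptions for this example.

```python
def carry_skip_add(a, b, n=16, block=4, cin=0):
    """Add two n-bit integers block by block; return (sum, carry_out, blocks_skipped)."""
    total, carry, skipped = 0, cin, 0
    for lo in range(0, n, block):
        width = min(block, n - lo)
        mask = (1 << width) - 1
        x, y = (a >> lo) & mask, (b >> lo) & mask
        block_sum = x + y + carry               # stands in for the ripple adder inside the block
        total |= (block_sum & mask) << lo
        if (x ^ y) == mask:                     # every Ai != Bi: block propagate P = 1
            skipped += 1                        # carry-out equals carry-in (bypass path)
        else:
            carry = block_sum >> width          # carry generated or killed inside the block
    return total, carry, skipped
```

With a = 0x0F0F, b = 0xF0F0 and cin = 1, every bit position propagates, so the carry-in skips all four blocks and appears unchanged as the carry-out.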
2.2.2.4 Carry Select Adder (CSLA)
The carry select adder falls in the category of conditional sum adders, which compute
results under an assumed condition. Sum and carry are calculated for an assumed input
carry of both 1 and 0 before the actual input carry arrives. When the actual carry input
arrives, the corresponding precomputed values of sum and carry are selected using a multiplexer.
The conventional carry select adder consists of one k/2-bit adder for the lower half of the
bits, i.e. the least significant bits (LSBs), and two k/2-bit adders for the upper half,
i.e. the most significant bits (MSBs).
Of the MSB adders, one assumes a carry input of one and the other assumes a carry
input of zero. The carry out of the previous stage, i.e. the least
significant bit stage, is used to select the actual values of output carry and sum.
The selection is done by a multiplexer. Dividing the adder into stages in this way
increases area but speeds up the addition.
In electronics, a carry-select adder is a particular way to implement an adder, which
is a logic element that computes the (n+1)-bit sum of two n-bit numbers. The carry-select
adder is simple but rather fast, having a gate level depth of O(√n).
The carry select adder generally consists of two ripple carry adders and
a multiplexer. Adding two n-bit numbers with a carry select adder is done with two adders
(therefore two ripple carry adders) in order to perform the calculation twice, once
assuming a carry-in of zero and once assuming a carry-in of one. After the two
results are calculated, the correct sum, as well as the correct carry, is selected with
the multiplexer once the correct carry is known.
The number of bits in each carry select block can be uniform or variable. In the
uniform case, the optimal delay occurs for a block size of ⌊√n⌋. When variable, the block
size should have a delay, from addition inputs A and B to the carry out, equal to that of
the multiplexer chain leading into it, so that the carry out is calculated just in time.
The O(√n) delay is derived from uniform sizing, where the ideal number of full-adder
elements per block is equal to the square root of the number of bits being added, since
that will yield an equal number of MUX delays.
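The square-root block size can be reproduced with a simple first-order model. The sketch below assumes, purely for illustration, that the first block contributes one full-adder delay per bit and that every later block contributes one multiplexer delay; the function names and the unit delays t_fa and t_mux are assumptions, not measured values.

```python
import math


def uniform_csla_delay(n, b, t_fa=1.0, t_mux=1.0):
    """First-order delay: ripple through one b-bit block, then one mux per remaining block."""
    return b * t_fa + (n // b - 1) * t_mux


def best_uniform_block(n):
    """Block size (dividing n) that minimizes the first-order delay model."""
    sizes = [b for b in range(1, n + 1) if n % b == 0]
    return min(sizes, key=lambda b: uniform_csla_delay(n, b))
```

Under these assumptions a 16-bit adder is fastest with 4-bit blocks and a 64-bit adder with 8-bit blocks, i.e. a block size equal to the square root of the word length.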
The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of
the RCA with Cin=1 in the regular CSLA to achieve lower area and power consumption. The
main advantage of this BEC logic comes from its smaller number of logic gates compared
with the n-bit Full Adder (FA) structure. The details of the BEC logic are discussed in the next
chapter.
2.2.2.4.1 Linear carry select adder (LCSLA):

The linear carry select adder is designed to reduce calculation time so that
the carry becomes the only limiting factor. It performs the addition on small
portions of bits (each of equal size) and waits for the carry to complete the calculation.
It is a technique for critical paths that depend on a late input X: it precomputes the two
possible outputs for carry = 0 and carry = 1 and selects the proper output when the carry arrives.
Fig 2.10: 16-bit Linear carry select adder

2.2.2.4.2 Square root carry select adder:
The square root carry select adder is constructed by equalizing the delay through
two carry chains and the block multiplexer signal from previous stages. This is an
extension of linear carry select adder which improves the delay time greatly.
With the square root CSLA, the time can be improved because the time spent waiting for
the carry bit is used to calculate an extra bit in each stage. Thus, this adder is one of the
fastest adders, but it comes at the price of area and power usage.
Fig 2.11: 16-bit Square root carry select adder

This adder design can be complemented with a carry lookahead adder structure to
generate the MUX inputs, thus gaining even greater performance as a parallel prefix
adder while potentially reducing area. An example is the Kogge-Stone adder.
2.3 3:2 compressors

The full adder can be viewed as a 3:2 lossy compressor. It sums three one-bit inputs and
returns the result as a single two-bit number; that is, it maps 8 input values to 4 output
values. Thus, for example, a binary input of 101 results in an output
of 1+0+1=10 (decimal number '2'). The carry out represents bit 1 of the result, while the
sum represents bit 0. Likewise, the half adder can be used as a 2:2 lossy compressor,
compressing the four possible inputs into three possible outputs.
Such compressors can be used to speed up the summation of three or more addends.
If the addends are exactly three, the layout is known as the carry-save adder. If the
addends are four or more, more than one layer of compressors is necessary and there are
various possible designs for the circuit: the most common are Dadda and Wallace trees.
This kind of circuit is most notably used in multipliers, which is why these circuits are
also known as Dadda and Wallace multipliers.
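The use of 3:2 compressors on three addends can be sketched as follows. This Python model applies one full adder per bit column in parallel, producing a sum word and a carry word whose ordinary addition gives the final result; the function name carry_save is an assumption for this example.

```python
from itertools import product


def carry_save(x, y, z):
    """Compress three addends into (sum_word, carry_word) with per-bit 3:2 compressors."""
    sum_word = x ^ y ^ z                              # bit 0 of each column's count
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # bit 1, moved to the next weight
    return sum_word, carry_word


# A single full adder maps the 8 possible inputs onto only 4 output values.
outputs = {a + b + c for a, b, c in product((0, 1), repeat=3)}
```

No carry propagates between columns inside carry_save; a single conventional adder is needed only at the end to add the two output words.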
2.4 Multiplexer
In electronics, a multiplexer (or MUX) is a device that selects one of several analog
or digital input signals and forwards the selected input into a single line. A multiplexer
with 2^n inputs has n select lines, which are used to select which input line to send to the
output.
Multiplexers are mainly used to increase the amount of data that can be sent over the
network within a certain amount of time and bandwidth. A multiplexer is also called a
data selector. They are used in CCTV systems, and almost every business that has CCTV
fitted will own one.
An electronic multiplexer makes it possible for several signals to share one device or
resource, for example one A/D converter or one communication line, instead of having
one device per input signal.
On the other hand, a demultiplexer (or demux) is a device taking a single input signal
and selecting one of many data-output-lines, which is connected to the single input. A
multiplexer is often used with a complementary demultiplexer on the receiving end.
An electronic multiplexer can be considered as a multiple-input, single-output switch
and a demultiplexer as a single-input, multiple-output switch.
The schematic symbol for a multiplexer is an isosceles trapezoid with the longer
parallel side containing the input pins and the short parallel side containing the output pin.
The wire connects the desired input to the output based on the selection line.
In digital circuit design, the selector wires are of digital value. In the case of a 2-to-1
multiplexer, a logic value of 0 would connect the first input to the output while a logic
value of 1 would connect the second input to the output. In larger multiplexers, the number
of selector pins is equal to ⌈log2(n)⌉, where n is the number of inputs.
For example, 9 to 16 inputs would require no fewer than 4 selector pins and 17 to 32
inputs would require no fewer than 5 selector pins. The binary value expressed on these
selector pins determines the selected input pin.
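The relation between the number of inputs and the number of selector pins can be checked with a few lines of Python. This is an illustrative sketch; the function names are assumptions, and the selector pins are taken most significant bit first.

```python
def selector_pins(n_inputs: int) -> int:
    """Smallest number of select lines that can address n_inputs (n_inputs >= 2)."""
    return (n_inputs - 1).bit_length()   # integer form of ceil(log2(n_inputs))


def select(inputs, pins):
    """Forward the input whose index is the binary value on the selector pins (MSB first)."""
    index = 0
    for bit in pins:
        index = (index << 1) | bit
    return inputs[index]
```

Here selector_pins returns 4 for any input count from 9 to 16 and 5 for any count from 17 to 32, in line with the examples above.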
2.4.1 2:1 Mux

A 2-to-1 multiplexer has the following boolean equation, where A and B are the two
inputs, S is the selector input and Z is the output:
Z = (A.~S) + (B.S)
Fig 2.12: Block diagram of 2:1 Mux

This truth table shows that when S=0 then Z=A but when S=1 then Z=B. A
straightforward realization of this 2-to-1 multiplexer would need 2 AND gates, an OR
gate and a NOT gate.
Larger multiplexers are also common and, as stated above, require ⌈log2(n)⌉ selector
pins for n inputs. Other common sizes are 4-to-1, 8-to-1 and 16-to-1. Since digital logic
uses binary values, powers of 2 are used (4, 8, 16) to maximally control a number of
inputs for the given number of selector inputs.
Fig 2.13: Block representations of various multiplexers

These are two realizations of a 4-to-1 multiplexer:
one realized from a decoder, AND gates and an OR gate
one realized from 3-state buffers and AND gates (the AND gates are acting as the
decoder)
Here the output Z is given as

Z= (A.~S0.~S1) + (B.~S0.S1) + (C.S0.~S1) + (D.S0.S1)
Fig 2.14: Implementation of 4:1 MUX using 2:1 MUXs

It can be noted that the subscripts on the inputs indicate the decimal value of the
binary control inputs at which that input is let through.
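The equivalence between the 4:1 output equation and a tree of 2:1 multiplexers can be verified exhaustively. In the Python sketch below, mux2 follows the behaviour stated for the 2:1 mux (S=0 gives A, S=1 gives B) and mux4_equation transcribes the expression for Z given above; the function names and the particular wiring of the tree (S1 on the first level, S0 on the second) are assumptions chosen to match that expression.

```python
from itertools import product


def mux2(a, b, s):
    """2:1 mux: output is a when s = 0 and b when s = 1."""
    return (a & (1 - s)) | (b & s)


def mux4_equation(a, b, c, d, s0, s1):
    """Z = (A.~S0.~S1) + (B.~S0.S1) + (C.S0.~S1) + (D.S0.S1)."""
    n0, n1 = 1 - s0, 1 - s1
    return (a & n0 & n1) | (b & n0 & s1) | (c & s0 & n1) | (d & s0 & s1)


def mux4_from_mux2(a, b, c, d, s0, s1):
    """4:1 mux built from three 2:1 muxes."""
    low = mux2(a, b, s1)       # first level: choose within each pair using S1
    high = mux2(c, d, s1)
    return mux2(low, high, s0)  # second level: choose between pairs using S0


for bits in product((0, 1), repeat=6):
    assert mux4_equation(*bits) == mux4_from_mux2(*bits)
```

The loop covers all 64 combinations of data and select inputs, so the two realizations agree for every input.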
CHAPTER 3
DESIGN OF 16-bit SQUARE ROOT CARRY SELECT ADDER AND
CARRY SELECT ADDER WITH AOI
This project can be divided into three major parts:
Design of the square root carry select adder
Design of the square root carry select adder with AOI (modified CSLA)
Comparison of the two designs in terms of area, power consumption and time
delay.
These parts are discussed individually in the following sections.

3.1 DESIGN OF SQUARE ROOT CSLA
Fig 3.1: Basic building block of CSLA

The basic building block of the square root carry select adder of block size 4 is
shown in the figure 3.1.
3.1.1 Block diagram of 16-bit Carry Select adder:
Fig 3.2: Existing system (Regular 16-bit Carry select adder)

The block diagram of the regular 16-bit square root CSLA is shown in figure
3.2. This adder is a variable sized adder.
The carry select adder generally consists of two ripple carry adders and
a multiplexer. Adding two n-bit numbers with a carry-select adder is done with two
adders (therefore two ripple carry adders) in order to perform the calculation twice, once
assuming a carry-in of zero and once assuming a carry-in of one. After the
two results are calculated, the correct sum, as well as the correct carry, is selected
with the multiplexer once the correct carry is known.
Fig 3.3: 4-bit carry select adder module topology

As figure 3.3 shows, the hardware overhead of the carry select adder is
restricted to an additional carry path and a multiplexer, and equals about 80% with respect
to a ripple carry adder. A full carry select adder is constructed by chaining a number of
equal-length adder stages.
The critical path is shaded in gray. From inspection of the circuit, we can
derive a first-order model of the worst-case propagation delay of the module:
T = t_{setup} + P t_{carry} + \sqrt{2N} t_{mux} + t_{sum}
Fig 3.4: Delay propagation of 16-bit CSLA

The design procedure and the delay propagation of the 16-bit square root CSLA
are best explained with figure 3.4. As the figure shows, the
model consists of 5 groups of different sizes. Each group performs the addition for both
Cin=0 and Cin=1 and then selects the actual sum and carry
using the actual carry from the previous stage.
3.1.2 Architecture of 16-bit square root CSLA:
Fig 3.5: Delay and area evaluation of regular SQRT CSLA: (a) group2, (b) group3, (c)
group4 and (d) group5. F is a Full Adder.
26
This 16-bit square root CSLA consists of five groups of variable size. The 16-bit data is divided into 2-bit, 2-bit, 3-bit, 4-bit and 5-bit groups. The first group consists of a 2-bit ripple carry adder, to which the actual input carry is applied. This ripple carry adder receives the carry and performs the 2-bit addition (a[1:0], b[1:0]).
The 2-bit sum generated from this adder is written as sum[1:0]. The carry
generated by this adder is propagated to the next group with a delay. This delay is
calculated using the basic circuit shown in Fig 3.6.
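The grouping described above can be sketched as a behavioural model. The following Python code is only an illustration under stated assumptions: integer addition stands in for each ripple carry adder, and the names GROUP_SIZES, add_slice and sqrt_csla_16 are hypothetical.

```python
GROUP_SIZES = (2, 2, 3, 4, 5)  # LSB group first, 16 bits in total


def add_slice(a, b, cin, width):
    """Stand-in for a width-bit ripple carry adder: returns (sum, carry-out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def sqrt_csla_16(a, b, cin=0):
    """Behavioural model of the regular 16-bit SQRT CSLA."""
    result, carry, shift = 0, cin, 0
    for index, width in enumerate(GROUP_SIZES):
        mask = (1 << width) - 1
        a_slice = (a >> shift) & mask
        b_slice = (b >> shift) & mask
        if index == 0:
            # Group 1: a single RCA driven by the actual input carry
            s, carry = add_slice(a_slice, b_slice, carry, width)
        else:
            # Other groups: two RCAs (Cin=0 and Cin=1), then a mux
            s0, c0 = add_slice(a_slice, b_slice, 0, width)
            s1, c1 = add_slice(a_slice, b_slice, 1, width)
            s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
        shift += width
    return result, carry


print(sqrt_csla_16(12345, 23456))  # (sum, carry-out)
```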
3.1.3 Delay and Area Evaluation Methodology of the basic adder blocks:
The AND, OR and Inverter (AOI) implementation of an XOR gate is shown in the
figure 3.6. The gates between the dotted lines are performing the operations in parallel
and the numeric representation of each gate indicates the delay contributed by that gate.
Fig 3.6: Delay and Area evaluation of an XOR gate

The delay and area evaluation methodology considers all gates to be made up of
AND, OR and Inverter, each having delay equal to 1 unit and area equal to 1 unit.
Then the number of gates in the longest path of a logic block is added up; this sum gives the maximum delay of the block.
The area evaluation is done by counting the total number of AOI gates required for
each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder
(HA) and FA are evaluated and listed in Table 3.1.
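The counting method described above can be sketched as a small program. The gate-level decompositions below are the usual textbook AOI realisations and are assumptions made here for illustration; the helper names evaluate and xor_gates are hypothetical.

```python
def evaluate(netlist, outputs):
    """Return (delay, area) for a netlist of unit-delay, unit-area gates."""
    def arrival(signal):
        if signal not in netlist:  # primary input, available at t = 0
            return 0
        return 1 + max(arrival(src) for src in netlist[signal][1])
    return max(arrival(out) for out in outputs), len(netlist)


def xor_gates(x, y, out):
    """AOI realisation of out = x XOR y: two inverters, two ANDs, one OR."""
    return {
        f"{out}_nx": ("NOT", [x]),
        f"{out}_ny": ("NOT", [y]),
        f"{out}_t1": ("AND", [x, f"{out}_ny"]),
        f"{out}_t2": ("AND", [f"{out}_nx", y]),
        out: ("OR", [f"{out}_t1", f"{out}_t2"]),
    }


XOR = xor_gates("a", "b", "y")
MUX = {
    "ns": ("NOT", ["s"]),
    "t0": ("AND", ["d0", "ns"]),
    "t1": ("AND", ["d1", "s"]),
    "y": ("OR", ["t0", "t1"]),
}
HALF_ADDER = {**xor_gates("a", "b", "sum"), "carry": ("AND", ["a", "b"])}
FULL_ADDER = {
    **xor_gates("a", "b", "p"),
    **xor_gates("p", "cin", "sum"),
    "g": ("AND", ["a", "b"]),
    "h": ("AND", ["p", "cin"]),
    "cout": ("OR", ["g", "h"]),
}

print("XOR", evaluate(XOR, ["y"]))                      # (3, 5)
print("2:1 mux", evaluate(MUX, ["y"]))                  # (3, 4)
print("HA", evaluate(HALF_ADDER, ["sum", "carry"]))     # (3, 6)
print("FA", evaluate(FULL_ADDER, ["sum", "cout"]))      # (6, 13)
```

The resulting areas (FA 13, HA 6, mux 4) and the mux delay of 3 agree with the values used in the group evaluations of this chapter.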
Table 3.1: Delay and area count of the basic blocks of CSLA
3.1.4 Delay and Area Evaluation of CSLA groups:
The structure of the 16-b regular SQRT CSLA is shown in figure 3.2. It has five groups of different-size RCAs. The delay and area evaluation of each group is shown in Fig 3.5, in which the numerals within [ ] specify the delay values, e.g., sum2 requires 10 gate delays. The steps leading to the evaluation are as follows.
1) Group2 has two sets of 2-b RCA. Based on the delay values of Table 3.1, the arrival time of the selection input c1[time(t)=7] of the 6:3 mux is earlier than s3[t=8] and later than s2[t=6]. Thus, sum3[t=11] is the sum of the s3 and mux[t=3] delays, and sum2[t=10] is the sum of the c1 and mux delays.
2) Except for group2, the arrival time of the mux selection input is always greater than the arrival time of the data outputs from the RCAs. Thus, the delays of group3 to group5 are determined, respectively, as follows:
{c6, sum[6:4]} = c3[t=10] + mux
{c10, sum[10:7]} = c6[t=13] + mux
{Cout, sum[15:11]} = c10[t=16] + mux
3) One set of 2-b RCA in group2 has 2 FAs for Cin=1, and the other set has 1 FA and 1 HA for Cin=0. Based on the area counts of Table 3.1, the total gate count in group2 is determined as follows:
Gate count = 57 (FA + HA + Mux)
FA = 39 (3*13)
HA = 6 (1*6)
Mux = 12 (3*4)
4) Similarly, the estimated maximum delay and area of the other groups in the regular SQRT CSLA are evaluated and listed.
The area and delay values of all the groups of square root CSLA are shown in the Table
3.2.
Table 3.2: Delay and area count of regular SQRT CSLA groups
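The evaluation steps of this section can be reproduced with a short calculation. The sketch below assumes the unit areas quoted in the text (FA = 13, HA = 6, 2:1 mux = 4) and a mux delay of 3; the function names are hypothetical.

```python
AREA = {"FA": 13, "HA": 6, "MUX": 4}  # AOI gate counts quoted in the text
MUX_DELAY = 3


def regular_group_area(width):
    """Gate count of one regular group with two width-bit RCAs and a mux."""
    full_adders = width + (width - 1)  # Cin=1 RCA: width FAs; Cin=0 RCA: width-1 FAs
    half_adders = 1                    # LSB of the Cin=0 RCA
    muxes = width + 1                  # one 2:1 mux per sum bit plus one for the carry
    return (full_adders * AREA["FA"]
            + half_adders * AREA["HA"]
            + muxes * AREA["MUX"])


def later_group_delays(select_arrival, count):
    """Delay of each later group: arrival of its mux select plus one mux delay."""
    delays = []
    for _ in range(count):
        select_arrival += MUX_DELAY
        delays.append(select_arrival)
    return delays


for name, width in (("group2", 2), ("group3", 3), ("group4", 4), ("group5", 5)):
    print(name, regular_group_area(width))
print(later_group_delays(select_arrival=10, count=3))  # group3 to group5
```

With these assumptions the areas come out as 57, 87, 117 and 147 gates, and the delays of group3 to group5 as 13, 16 and 19, which agree with the regular CSLA columns of Table 3.5.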
3.2. AOI
3.2.1 Design of CSLA with AOI:
As stated above, the main idea of this work is to use a Binary to Excess-1 Converter (BEC) built from AOI gates, referred to in this report as the AOI block, instead of the RCA with Cin=1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an (n+1)-bit AOI is required.
The structure and the function table of a 4-b AOI are shown in the figure 3.7 and
Table 3.3 respectively.
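The behaviour of the 4-b add-one block (called AOI or BEC in the text) can be sketched as follows. The Boolean expressions are the standard add-one logic and are assumed here, since the figure and the function table are not reproduced in this text; all function names are hypothetical.

```python
def to_bits(value, width):
    """Integer to little-endian bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Little-endian bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def add_one_4bit(b):
    """4-b add-one logic: output = input + 1 (modulo 16)."""
    b0, b1, b2, b3 = b
    x0 = 1 - b0               # NOT b0
    x1 = b0 ^ b1
    x2 = b2 ^ (b0 & b1)
    x3 = b3 ^ (b0 & b1 & b2)
    return [x0, x1, x2, x3]


def modified_group(a, b, cin):
    """3-bit group: one RCA with Cin=0, a 4-b add-one block and a mux."""
    partial = to_bits(a + b, 4)           # {carry, sum[2:0]} from the RCA with Cin=0
    incremented = add_one_4bit(partial)   # same result as an RCA with Cin=1
    chosen = incremented if cin else partial
    return from_bits(chosen[:3]), chosen[3]


print(modified_group(5, 6, 1))  # (sum, carry-out) of 5 + 6 + 1
```

A 3-bit group needs a 4-b add-one block because the carry-out of the RCA is incremented together with the three sum bits, which is the (n+1)-bit requirement stated above.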
3.2.3 Block diagram of CSLA with AOI:
The structure of the proposed 16-b SQRT CSLA, which uses a BEC in place of the RCA with Cin=1 to optimize area and power, is shown in figure 3.9.
Fig 3.9: Modified system (Modified 16-b SQRT CSLA)

Comparing the block diagrams of the regular and the modified square root CSLA shows that the RCA with Cin=1 is replaced by the AOI block. This replacement reduces the area, as becomes clear once the group delays and the number of gates required for the design are evaluated.
3.2.4 Architecture of modified model:
The architecture of this modified model is shown in figure 3.10.
Fig 3.10: Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c)
group4 (d) group5. H is a Half Adder.
3.2.5 Delay and Area Evaluation methodology of Modified 16-b sqrtCSLA:

The structure is split into five groups. The delay and area estimation of each group
are shown in the figure 3.10. The steps leading to the evaluation are given here.
1. Group2 has one 2-b RCA, which has 1 FA and 1 HA, for Cin=0. Instead of another 2-b RCA with Cin=1, a 3-b BEC is used, which adds one to the output of the 2-b RCA. Based on the delay values of the gates, the arrival time of the selection input c1[time(t)=7] of the 6:3 mux is earlier than s3[t=9] and c3[t=10] and later than s2[t=4]. Thus, sum3 depends on s3 and the mux, and the final c3 (output from the mux) depends on the partial c3 (input to the mux) and the mux. The sum2 output depends on c1 and the mux.
2. For the remaining groups, the arrival time of the mux selection input is always greater than the arrival time of the data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of the mux selection input and the mux delay.
3. The area count of group2 is determined as follows:
Gate count = 40 (FA + HA + Mux + AOI)
FA = 13 (1*13)
HA = 6 (1*6)
AND = 4
OR = 2
NOT = 3
Mux = 12 (3*4)
4. Similarly, the estimated maximum delay and area of the other groups of the modified SQRT CSLA are evaluated and listed in Table 3.4.
Table 3.4: Delay and area count of modified SQRT CSLA groups
3.3 Comparison of Regular CSLA and CSLA with AOI:

In Table 3.5, the second and third columns give the delay and area of the regular CSLA, and the fourth and fifth columns give the delay and area of the CSLA with AOI.
The table shows that the regular CSLA requires more gates for its implementation than the CSLA with AOI. Fewer gates mean a smaller chip size and ultimately a lower design cost. The table also shows that this saving comes with a higher delay in group3 to group5.
Group     Regular CSLA        CSLA with AOI
          Delay    Area       Delay    Area
Group2    11       57         11       40
Group3    13       87         16       57
Group4    16       117        23       74
Group5    19       147        30       98
Table 3.5: Comparison Table of Regular CSLA and CSLA with AOI
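The trade-off in Table 3.5 can be quantified with a short calculation. The sketch below only restates the table values for group2 to group5 (group1 is not listed in the table); the function and variable names are hypothetical.

```python
# (delay, area) per group, copied from Table 3.5
REGULAR = {"Group2": (11, 57), "Group3": (13, 87),
           "Group4": (16, 117), "Group5": (19, 147)}
WITH_AOI = {"Group2": (11, 40), "Group3": (16, 57),
            "Group4": (23, 74), "Group5": (30, 98)}


def summarize(regular, modified):
    """Total areas, percentage area saving and extra delay per group."""
    regular_area = sum(area for _, area in regular.values())
    modified_area = sum(area for _, area in modified.values())
    saving = 100 * (regular_area - modified_area) / regular_area
    extra_delay = {g: modified[g][0] - regular[g][0] for g in regular}
    return regular_area, modified_area, saving, extra_delay


reg_area, mod_area, saving, extra = summarize(REGULAR, WITH_AOI)
print(reg_area, mod_area, round(saving, 1))  # 408 269 34.1
print(extra)
```

For the four listed groups, the gate count falls from 408 to 269, a saving of about 34%, while the delay of group5 rises by 11 gate delays.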
CHAPTER 5
RESULTS AND DISCUSSIONS

5.1 Results
The regular and modified designs have been developed in Verilog and simulated and synthesized in Xilinx version 10.1. The simulated and synthesized results of the modified designs are presented in the following sections.
5.2 Applications:
1. Image processing
In image processing with interpolation, an output of the gamma circuit and the input
data are input to an adder circuit so as to obtain the added and averaged values at a
predetermined ratio.
2. Signal processing
Addition is by far the most fundamental arithmetic operation. It has been ranked the most extensively used operation among a set of real-time digital signal processing benchmarks, from application-specific DSPs to general-purpose processors.
3. Arithmetic logic units
The carry select adder is used in arithmetic logic units to perform addition and multiplication in less time.
4. Advanced microprocessor design
In microprocessor design, the adder is used for the conversion mechanism in
calculating the physical address using the offset address and segment address.
5. High speed multiplications
In a multiplier, each bit of the product P is obtained by a summation of the bits AiBj using an array of single-bit adders. The bits AiBj are formed using AND gates.
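The formation of the partial-product bits AiBj in the multiplier application can be sketched as follows. This is an illustrative model only: Python's integer addition stands in for the array of single-bit adders, and the function name and the default width are assumptions.

```python
def array_multiply(a, b, width=4):
    """Multiply two unsigned width-bit numbers from their AiBj bits."""
    product = 0
    for j in range(width):
        for i in range(width):
            partial_bit = ((a >> i) & 1) & ((b >> j) & 1)  # AND gate forming AiBj
            product += partial_bit << (i + j)              # AiBj has weight 2^(i+j)
    return product


print(array_multiply(13, 11))  # 143
```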
5.3 Advantages:
1. Low area:
The modified carry select adder uses fewer logic gates (lower area) because it replaces the Cin=1 ripple carry adder in each pair with AOI logic.
2. Low power consumption:

The total power consumed by the regular and the modified designs is almost the same.
5.4 Disadvantages:
1. Increased delay:
Even though there is reduction in area, a slight increase in delay can be seen in the
modified CSLA.
CHAPTER 6
CONCLUSIONS AND FUTURE SCOPE

6.1 CONCLUSION
A simple approach is proposed in this work to reduce the area and power of the SQRT CSLA architecture. The reduced number of gates offers a great advantage in reducing the area and also the total power.
6.2 FUTURE SCOPE

This project uses SystemVerilog for verification, with directed test cases, randomized test cases and OVM. Even though the coverage is 100%, some errors may remain undetected. To overcome this, the newer SystemVerilog methodologies, OVM and UVM, can be applied. In future, the verification can be carried out using OVM and UVM.
