
Chapter 1

INTRODUCTION
1.1 Preamble
The increasing demand for high-performance, high-speed and battery-operated systems in communication and computing has shifted the design focus from traditional constraints (such as area, performance, cost and reliability) to power consumption. Multipliers are used as computational units in many systems, such as DSPs and microprocessors. A multiplier is considered efficient if it has high speed, low power and small area. Digital systems generally use two types of logic, static and dynamic. The main difference between them is that dynamic logic uses a clock. Dynamic logic can be over twice as fast as static logic; it is harder to design with, but when speed is essential there is often no alternative. Dynamic logic operates in two phases. The first is the setup or precharge phase, in which the output unconditionally goes high regardless of the input values and the load capacitance of the gate is charged. The second is the evaluation phase, during which the clock is high. A popular implementation of dynamic logic is domino logic, a CMOS-based evolution of dynamic logic techniques that use either PMOS or NMOS transistors; it was developed to speed up circuits. In domino logic, the output of each dynamic gate drives a static inverter. In a cascade of several domino stages, the evaluation of each stage triggers the evaluation of the next, like dominoes falling one after another. Once a node has fallen, it cannot return to 1 until the next clock cycle, just as fallen dominoes cannot stand up again; hence the name domino CMOS logic. Domino logic is used in today's high-performance microprocessors to implement circuits that are both fast and area-efficient. Among its advantages, domino logic offers reduced input capacitance and low power dissipation. Domino logic circuits are driven by a clock.
The clock distribution network dissipates 20% to 45% of the overall power, which limits the use of domino logic circuits in low-power applications. A multiplication consists of several shift and addition steps, so the multiplier will be faster if the additions are performed by high-speed adders. For this purpose I have used fast carry lookahead, Wallace tree and Kogge stone adders. I have also used the modified Booth algorithm to design the multiplier, as it reduces the number of partial products almost by half, which reduces power dissipation and area. Evaluating the performance of multipliers is necessary to know which design meets a particular system's requirements, because different systems may require multipliers with different parameters. For example, a microprocessor requires a high-speed multiplier, whereas another system may require a multiplier with small area, and both cannot be obtained in a single design. The performance of each multiplier is therefore evaluated so that the design best suited to a given requirement can be chosen.
1.2 Historical perspective
In the early CMOS days there was a great need for low-power technologies. Without them, the industry could not keep reducing chip size, because placing more circuits on a smaller chip increases the power per unit area and hence the power dissipation density. There was therefore a great need for circuits with lower power dissipation, and this need led to fast, low-power dynamic circuits. In the NMOS era of the 1970s, dynamic logic was used to reduce area and the power consumption inherent in NMOS logic. However, these dynamic circuits (also implemented in CMOS) suffered from contention. Footed dynamic circuits then removed the contention problem. Another problem associated with these dynamic circuits is the monotonicity requirement. In the early 1980s, a variant of dynamic circuits known as domino logic was proposed by Robert Krambeck (1982) [10]. Domino logic circuits produce only non-inverting outputs, but certain applications require both inverting and non-inverting operations.
This need has led, in recent years, to dual-rail domino logic circuits. Since multiplication consists of various shift and add operations, the historical background of the adders used in multiplier design is also relevant. The ripple carry adder dates from the early 1950s. It passes the carry from the least significant bit to the most significant bit, which results in a long delay. A faster adder, the carry lookahead adder, was introduced by Weinberger in 1958. However, the delay of the carry lookahead adder still increases with the number of bits to be added, so a faster adder was needed, and in 1973 the Kogge stone adder was designed. It is based on the same principle as the carry lookahead adder; it can be regarded as a carry lookahead adder in which the carries are generated in a different manner, which makes it faster. Many new adders have been proposed since, but the Kogge stone adder is still regarded as one of the fastest.
1.3 Thesis Objective
Based on the above discussion, the thesis has the following objectives:
To design and simulate two different types of multiplier, and to evaluate their performance in terms of speed, power dissipation and leakage current.
1.4 Organization of thesis
Chapter 2 explains the architecture of the multiplier through its four main modules: the Booth encoder, the partial product generator, the Wallace tree adder, and the final carry lookahead or Kogge stone adder. Chapter 3 defines dynamic circuits and discusses their advantages and disadvantages; it also explains in detail the reason for adding the foot transistor. Chapter 4 explains the concept of leakage, its sources, and a leakage reduction technique. Chapter 5 presents the simulation waveforms, results and observations.
1.5 Methodology
In this thesis, two different types of multiplier are designed using domino dynamic circuits. To obtain a high-speed multiplier, carry lookahead, Kogge stone and Wallace tree adders are used. Two 4×4 modified Booth multipliers are designed and simulated with the Design Architect tool of Mentor Graphics in 180 nm CMOS technology.

Chapter 2
MULTIPLIER ARCHITECTURE
2.1 Architecture
The multiplier consists of four main modules, i.e. the Booth encoder, the partial product generator, the Wallace tree adder, and the carry lookahead or Kogge stone adder, as shown in Fig 2.1 [11]. The multiplier is based on the modified Booth algorithm, which reduces power dissipation by reducing the number of partial product rows. The output of the Booth encoder is the input to the partial product generator, which generates the partial product rows. These rows are added in the Wallace tree adder, which produces sum and carry bits at its output. The sum and carry bits are then added in the final (Kogge stone or carry lookahead) adder, which provides the final product bits.
[Fig 2.1 Block Diagram of Multiplier: Booth encoder (local clk), partial product generator, Wallace tree adder and Kogge stone adder (global clk), producing the product]
The multiplier uses two clocks, a local clock and a global clock. The local clock drives the Booth encoder, while the global clock drives the partial product generator, the Wallace tree adder and the final adder, which may be a carry lookahead or Kogge stone adder. Two clocks are used to avoid clock skew.

2.2 Booth encoder
The encoder used in the multiplier is based on the Booth algorithm, so it is called a Booth encoder. Booth encoding was proposed by Andrew Donald Booth in 1951 [2]. The original algorithm consists of shift and add operations; because the number of add/subtract operations and the number of shifts vary with the multiplier value, it is inconvenient for designing parallel multipliers, and it is also inefficient when there are isolated 1s. The Booth algorithm was later modified by O. L. MacSorley in 1961; this version is called the modified Booth algorithm [11]. Modified Booth encoding is a bit-pair encoding algorithm that generates partial products which are multiples of the multiplicand. It is widely used to generate the partial products of large parallel multipliers, which adopt a parallel encoding scheme. The basic principle of modified Booth encoding is as follows. Consider the multiplication of two fixed-point two's complement numbers X and Y, where X is the multiplicand and Y is the multiplier, both with n bits. Y can be expressed as
$$Y = -Y_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} Y_i\,2^i \tag{2.1}$$

Grouping the bits in overlapping triplets, with $Y_{-1} = 0$:

$$Y = \sum_{i=0}^{n/2-1} \left(-2Y_{2i+1} + Y_{2i} + Y_{2i-1}\right) 2^{2i} = \sum_{i=0}^{n/2-1} d_i\,4^i, \qquad d_i \in \{-2, -1, 0, 1, 2\} \tag{2.2}$$

Using this notation, the multiplication of X and Y is given by

$$XY = \sum_{i=0}^{n/2-1} d_i\,4^i\,X = \sum_{i=0}^{n/2-1} P_i\,4^i \tag{2.3}$$

The Booth algorithm shifts and/or complements the multiplicand (X operand) based on the bit patterns of the multiplier (Y operand). Essentially, three multiplier bits [Y(i+1), Y(i) and Y(i-1)] are encoded into control signals that select one of the multiples of the multiplicand {-2X, -X, 0, +X, +2X}. The three bits consist of a new bit pair [Y(i+1), Y(i)] and the leftmost bit of the previously encoded bit pair [Y(i-1)], as shown in Table 2.1. From Eq. (2.3), the partial product $P_{i+1}$ must be shifted two positions to the left of $P_i$, because $P_i$ is weighted by $4^i = 2^{2i}$. For an n×n-bit multiplication, the Booth algorithm produces n/2 partial products, reducing their number almost by half. This reduces the number of adders by about 50%, which results in higher speed, lower power dissipation and smaller area than a conventional multiplication array.
Table 2.1 Booth Encoding
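| Y(i+1) | Y(i) | Y(i-1) | Neg | One | Two | Operation on multiplicand |
|--------|------|--------|-----|-----|-----|---------------------------|
| 0      | 0    | 0      | 0   | 0   | 0   | 0X                        |
| 0      | 0    | 1      | 0   | 1   | 0   | +1X                       |
| 0      | 1    | 0      | 0   | 1   | 0   | +1X                       |
| 0      | 1    | 1      | 0   | 0   | 1   | +2X                       |
| 1      | 0    | 0      | 1   | 0   | 1   | -2X                       |
| 1      | 0    | 1      | 1   | 1   | 0   | -1X                       |
| 1      | 1    | 0      | 1   | 1   | 0   | -1X                       |
| 1      | 1    | 1      | 0   | 0   | 0   | 0X                        |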
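Eqs. (2.1) to (2.3) and Table 2.1 can be checked numerically. The Python sketch below is a behavioural model only, not the transistor-level design, and the helper names `to_signed`, `booth_radix4_digits` and `booth_multiply` are hypothetical. It recodes an n-bit two's complement multiplier into radix-4 digits d_i from overlapping triplets (assuming Y_{-1} = 0) and forms X·Y as the sum of the weighted partial products P_i = d_i·X.

```python
def to_signed(v: int, n: int) -> int:
    """Value of the low n bits of v read as an n-bit two's complement number (Eq. 2.1)."""
    v &= (1 << n) - 1
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def booth_radix4_digits(y: int, n: int) -> list:
    """Radix-4 (modified) Booth digits d_i of an n-bit multiplier y, n even.

    Digit i comes from the overlapping triplet (Y_{2i+1}, Y_{2i}, Y_{2i-1})
    with Y_{-1} = 0, as d_i = -2*Y_{2i+1} + Y_{2i} + Y_{2i-1}   (Eq. 2.2).
    """
    if n % 2:
        raise ValueError("n must be even")

    def bit(k: int) -> int:
        return (y >> k) & 1 if k >= 0 else 0   # the appended zero: Y_{-1} = 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """X*Y as the sum of n/2 partial products P_i = d_i * X weighted by 4**i (Eq. 2.3)."""
    xs = to_signed(x, n)
    return sum(d * xs * 4 ** i for i, d in enumerate(booth_radix4_digits(y, n)))


# Example: Y = 0111 (7) recodes to d_0 = -1, d_1 = +2, i.e. 7 = -1 + 2*4.
print(booth_radix4_digits(0b0111, 4))      # [-1, 2]
print(booth_multiply(0b1101, 0b0111, 4))   # -3 * 7 = -21
```

For a 4-bit multiplier only two digits are produced, which is the halving of partial products described above.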

The problems of the Booth algorithm can be overcome by the modified Booth algorithm, which works as follows:
1) In unsigned multiplication, two zeros are appended to the left of the MSB and one zero to the right of the LSB of the multiplier bits.
2) In signed multiplication, only a single 0 is appended to the right of the LSB of the multiplier bits.
3) The resulting bits are divided into overlapping triplets. For 4×4 unsigned multiplication there are three triplets, and hence three partial product rows; for signed multiplication there are only two triplets, and hence two partial product rows.
These steps will become clearer in the section on sign extension.
[Fig 2.2 Booth encoding and partial product generation: the zero-padded multiplier bits feed the Booth encoders, which produce partial product rows P03 to P00, P13 to P10 and P23 to P20]
As explained above, the multiplier bits are divided into overlapping triplets, which are given as inputs to the Booth encoder blocks; the number of encoder blocks equals the number of triplets. Each Booth encoder has three output bits, named neg, one and two. In 4×4 multiplication there are therefore three Booth encoders. Because the modified Booth algorithm reduces the number of partial products, it also reduces the number of adders. The gate-level diagram of the Booth encoder, which shows how the three outputs are derived from the three multiplier bits, is given in Fig 2.3.

[Fig 2.3 Gate level diagram of booth encoder: inputs Yi+1, Yi, Yi-1 and their complements; outputs Neg, One and Two]
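Because the gate-level drawing of Fig 2.3 did not survive, the sketch below reads the encoder's Boolean functions directly off Table 2.1 and checks them against every row. It is a reconstruction from the table, not necessarily the exact gate arrangement of Fig 2.3, and the names `TABLE_2_1` and `booth_encoder` are hypothetical. The functions are one = Y(i) XOR Y(i-1), two = Y(i+1)·Y(i)'·Y(i-1)' + Y(i+1)'·Y(i)·Y(i-1), and neg = Y(i+1)·(Y(i)·Y(i-1))'.

```python
# Table 2.1: (Y(i+1), Y(i), Y(i-1)) -> (neg, one, two, multiple of X)
TABLE_2_1 = {
    (0, 0, 0): (0, 0, 0, 0),
    (0, 0, 1): (0, 1, 0, +1),
    (0, 1, 0): (0, 1, 0, +1),
    (0, 1, 1): (0, 0, 1, +2),
    (1, 0, 0): (1, 0, 1, -2),
    (1, 0, 1): (1, 1, 0, -1),
    (1, 1, 0): (1, 1, 0, -1),
    (1, 1, 1): (0, 0, 0, 0),
}


def booth_encoder(y_ip1: int, y_i: int, y_im1: int) -> tuple:
    """Booth encoder for one triplet, using 0/1 integers as logic levels."""
    def inv(a: int) -> int:          # NOT gate
        return 1 - a

    neg = y_ip1 & inv(y_i & y_im1)                                       # Y(i+1) AND NAND(Y(i), Y(i-1))
    one = y_i ^ y_im1                                                    # XOR
    two = (y_ip1 & inv(y_i) & inv(y_im1)) | (inv(y_ip1) & y_i & y_im1)   # the two "double" patterns
    return neg, one, two


for triplet, (neg, one, two, multiple) in TABLE_2_1.items():
    assert booth_encoder(*triplet) == (neg, one, two)
    # The selected multiple is one*X or two*X, negated when neg = 1.
    assert (-1 if neg else 1) * (one + 2 * two) == multiple
print("Booth encoder matches Table 2.1")
```

Note that the all-ones triplet gives neg = 0, so a zero multiple never produces a "negative zero" row.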

2.3 Partial product generator


The output of the Booth encoder is used in this module to generate the partial products. Since there are three Booth encoders, there are a total of three partial product rows in 4×4 multiplication. The simplest partial product generator produces N partial product rows, where N is the number of bits to be multiplied. Since the hardware cost and delay depend on the number of partial products to be added, halving the number of rows reduces the hardware cost and improves performance. The architecture of the partial product generator is shown in Fig 2.4.

[Fig 2.4 Architecture of the partial product generator: inputs Two, One, Neg, Xi and Xi-1; output PPG]
The neg, one and two signals are the three output bits of the Booth encoder, and Xi and Xi-1 are multiplicand bits: Xi is the current bit and Xi-1 the previous (next less significant) bit. These five inputs are given to the partial product generator, which produces one partial product bit, labelled PPG. Since 4×4 bit multiplication has twelve partial product bits, twelve partial product generators are used. The Booth encoding scheme has proved very useful because it reduces the number of partial products almost by half.
2.4 Sign extension in Booth multipliers

While multiplying two numbers, we first have to determine whether they are signed or unsigned [11]. Signed numbers are represented in two's complement and can be negative, whereas unsigned numbers are always non-negative. The two types are handled differently before the partial products are added to obtain the final product.
2.4.1 Sign extension for unsigned multiplication
Unsigned numbers are multiplied normally, but we still have to check whether each resulting partial product is positive or negative, since the two cases are handled differently. A partial product is positive if neg is low (0) and negative if neg is high (1). This will become clearer from Fig 2.5.

[Fig 2.5 Sign extension in booth multiplier with positive partial product: rows P03 to P00 (with neg0), P13 to P10 (with neg1) and P23 to P20, zero-extended, with the multiplier bits running from LSB to MSB]
Fig 2.5 shows the case of unsigned 4×4 bit multiplication in which all the partial products are positive. In this case neg0 and neg1 are zero, and each partial product except the last is extended with zeros up to the last bit of the last partial product. In the other case, where the partial products are negative, neg is 1 and all the partial product bits must be complemented, as shown in Fig 2.6.

[Fig 2.6 Sign extension in booth multiplier with negative partial product: complemented rows extended with 1s, with neg0 = 1 and neg1 = 1 added, multiplier bits from LSB to MSB]
As Fig 2.6 shows, two of the partial product rows are negative. Why is the last partial product row not complemented? Because two zeros are appended in the most significant position of the multiplier, and Table 2.1 shows that a triplet whose two upper bits are zero always selects 0 or +X, so the last row is always positive. In this case neg0 and neg1 are equal to one, and each partial product except the last is extended with ones up to the last bit of the last partial product. Also, each partial product is shifted two bits to the left relative to the one above it, to account for the modified Booth encoding of the multiplier. If an unsigned multiplication has both negative and positive partial products, the same procedure is applied row by row: a negative partial product is complemented and neg = 1 is added to it, while a positive partial product is not complemented and neg = 0 is added.
2.4.2 Sign extension for signed multiplication
Signed numbers may be negative and are represented in two's complement before they are multiplied. The following modifications are necessary for two's complement signed multiplication.

[Fig 2.7 Sign extension in booth multiplier with signed numbers: rows P03 to P00 (with neg0) and P13 to P10 (with neg1), multiplier bits from LSB to MSB]
The extra most significant partial product, which is needed to guarantee a positive result in unsigned multiplication, is not needed for signed multiplication, so the two zeros need not be appended to the left of the multiplier's MSB. Hence there are only two partial product rows in 4×4 bit multiplication. As before, the partial products can be positive or negative. Sign extension for signed numbers is shown in Fig 2.7. In this case we do not need to complement the partial product rows; instead, a variable E is used, and the signals take the following values:
Neg is 1 if the partial product is negative and 0 if it is positive.
E is 1 if the multiplicand and the partial product are both negative or both positive.
E is 0 if the multiplicand is negative and the partial product is positive, or if the multiplicand is positive and the partial product is negative.
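The encoder, the partial product generator and the unsigned/signed row rules of Sections 2.3 and 2.4 can be exercised together in a bit-level behavioural sketch. It assumes the common Booth selector PPG = ((one·Xi) + (two·Xi-1)) XOR neg, which matches the inputs of Fig 2.4 but is our assumption about its internals. For simplicity every row is extended all the way to the 2n-bit product width (bits beyond the multiplicand's MSB are 0 for unsigned and the sign bit for signed operands, then pass through the PPG, so complemented rows fill with 1s as in Fig 2.6) instead of using the compact E-bit encoding of Fig 2.7, and neg is added at each row's least significant position. All function names are hypothetical.

```python
def to_signed(v: int, n: int) -> int:
    """Read the low n bits of v as an n-bit two's complement number."""
    v &= (1 << n) - 1
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def booth_encoder(y_ip1: int, y_i: int, y_im1: int) -> tuple:
    """neg/one/two control signals for one multiplier triplet (Table 2.1)."""
    neg = y_ip1 & (1 - (y_i & y_im1))
    one = y_i ^ y_im1
    two = (y_ip1 & (1 - y_i) & (1 - y_im1)) | ((1 - y_ip1) & y_i & y_im1)
    return neg, one, two


def ppg(neg: int, one: int, two: int, x_i: int, x_im1: int) -> int:
    """Assumed PPG cell: select X_i (x1) or X_{i-1} (x2), then invert when neg = 1."""
    return ((one & x_i) | (two & x_im1)) ^ neg


def booth_product(x: int, y: int, n: int, signed: bool) -> int:
    """n x n modified Booth product built row by row; returns the 2n-bit result."""
    width = 2 * n

    def xbit(k: int) -> int:
        if k < 0:
            return 0                                      # X_{-1} = 0 (needed for 2X)
        if k >= n:                                        # extension beyond the MSB:
            return (x >> (n - 1)) & 1 if signed else 0    # sign bit or 0
        return (x >> k) & 1

    ybits = [(y >> k) & 1 for k in range(n)]
    if not signed:
        ybits += [0, 0]            # unsigned: two zeros left of the MSB give an extra row

    def ybit(k: int) -> int:
        return ybits[k] if k >= 0 else 0                  # one zero right of the LSB

    total = 0
    for r in range(len(ybits) // 2):                      # 3 rows unsigned, 2 rows signed (n = 4)
        neg, one, two = booth_encoder(ybit(2 * r + 1), ybit(2 * r), ybit(2 * r - 1))
        row = sum(ppg(neg, one, two, xbit(k), xbit(k - 1)) << k for k in range(width))
        # Each row sits two places left of the previous one; adding neg
        # completes the two's complement of a complemented (negative) row.
        total += (row + neg) << (2 * r)
    return total & ((1 << width) - 1)


print(booth_product(13, 11, 4, signed=False))                     # 143
print(to_signed(booth_product(0b1101, 0b0111, 4, signed=True), 8))  # -21
```

Full sign extension costs extra adder cells in hardware, which is exactly what the E-bit scheme of Fig 2.7 avoids; the sketch trades that efficiency for readability.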

2.5 Wallace tree Adder
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay [9]. Rather than completely adding the partial products in pairs, as a ripple adder tree does, the Wallace tree sums all the bits of the same weight in a merged tree. Usually full adders are used, so that three equally weighted bits are combined to produce two bits: one (the carry) with weight n+1 and the other (the sum) with weight n. Each layer of the tree therefore reduces the number of vectors by a factor of 3:2. (Another popular scheme obtains a 4:2 reduction using a different adder style that adds little delay in an ASIC implementation, as shown in Fig 2.10.1.) The tree has as many layers as are necessary to reduce the number of vectors to two (a carry and a sum), and a conventional adder combines these to obtain the final product. Probably the single most important advance in improving the speed of multipliers, pioneered by Wallace, is the use of carry save adders (CSAs, also known as full adders or 3:2 counters) to add three or more numbers in a redundant, carry-propagation-free manner. The method is illustrated in Fig 2.8.

[Fig 2.8 4×4 Wallace tree adder: partial product bits P00 to P33 reduced by half adders (HA) and full adders (FA) into outputs A0 to A7 and carries C1 to C5]
However, in addition to the large number of adders required, the wiring of a Wallace tree is much less regular and more complicated. As a result, designers often avoid Wallace trees when design complexity is a concern. Wallace tree styles use a log-depth tree network for reduction; they are faster but irregular, trading ease of layout for speed. They are also generally avoided in low-power applications, since the excess wiring is likely to consume extra power. Although it is faster than a carry-save array structure for large operand widths, the Wallace tree multiplier has the disadvantage of being very irregular, which complicates the task of producing an efficient layout. The Wallace tree multiplier is a high-speed multiplier. Summing the partial product bits in parallel using a tree of carry-save adders became generally known as the Wallace tree. Multiplying two numbers takes three steps:
1) Formation of the bit products.
2) Reduction of the bit product matrix to a two-row matrix by means of carry-save adders.
3) Summation of the remaining two rows using a fast carry lookahead adder or another adder such as the Kogge stone adder.
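The three steps above can be sketched behaviourally in Python. This is a greedy column-compression model in the spirit of Wallace reduction, assumed for illustration: in every layer, each column holding more than two bits is reduced with full adders (3:2), a leftover pair goes to a half adder, and columns that already hold at most two bits pass through. Its adder placement need not match Fig 2.8 exactly, it uses a plain AND array rather than Booth-recoded rows, and Python's `+` stands in for the final fast adder. The function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple:
    """3:2 counter: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def half_adder(a: int, b: int) -> tuple:
    """2:2 counter: returns (sum, carry) with a + b == sum + 2 * carry."""
    return a ^ b, a & b


def wallace_multiply(x: int, y: int, n: int) -> tuple:
    """Unsigned n x n product via Wallace-style column reduction.

    Returns (product, number_of_reduction_layers).
    """
    # Step 1: bit products. columns[w] holds every AND-array bit of weight 2**w.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> j) & 1) & ((y >> i) & 1))

    # Step 2: reduce layer by layer until no column holds more than two bits.
    layers = 0
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            if len(col) <= 2:                  # already short enough: pass through
                nxt[w].extend(col)
                continue
            k = 0
            while len(col) - k >= 3:           # full adders on groups of three bits
                s, c = full_adder(col[k], col[k + 1], col[k + 2])
                nxt[w].append(s)
                nxt[w + 1].append(c)
                k += 3
            if len(col) - k == 2:              # a leftover pair goes to a half adder
                s, c = half_adder(col[k], col[k + 1])
                nxt[w].append(s)
                nxt[w + 1].append(c)
            else:                              # zero or one leftover bit passes through
                nxt[w].extend(col[k:])
        columns = nxt
        layers += 1

    # Step 3: add the two remaining rows (Python's + stands in for a CLA/KSA).
    row_a = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    row_b = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return row_a + row_b, layers


print(wallace_multiply(13, 11, 4))   # (143, 3)
```

For 4×4 operands the column heights start at 1, 2, 3, 4, 3, 2, 1 and reach two rows after three layers. Every layer contains at least one full adder, which removes one bit, so the loop always terminates.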

[Fig 2.9 3:2 Compressor: inputs a, b, c; outputs Sum and Carry]
As shown in Fig 2.9, a 3:2 compressor takes three inputs and produces two outputs. The equations for sum and carry are:

$$\text{Sum} = a \oplus b \oplus c \tag{2.4}$$

$$\text{Carry} = (a \oplus b)\,c + \overline{(a \oplus b)}\,a \tag{2.5}$$

Apart from 3:2 compressors, 4:2 compressors have become a topic of significant research in the arithmetic community [13]. The 4:2 compressor changed the conventional view of counter-based partial product reduction schemes by introducing horizontal data paths within the stages of reduction.

[Fig 2.10.1 4:2 Compressor: four inputs plus Cin; outputs sum, carry and Cout]
As the figure shows, the 4:2 compressor takes four inputs (plus a carry-in, Cin) and produces two outputs, sum and carry (plus a carry-out, Cout). A 4:2 compressor can be realized from two 3:2 compressors, as shown in Fig 2.10.2.

[Fig. 2.10.2 4:2 Compressor realization using 3:2 Compressor: two cascaded 3:2 compressors with signals Cin, Cout, Sum and Carry]

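Cin is initially taken as zero (for the least significant compressor; in a chain it comes from the Cout of the neighbouring lower-weight compressor). The sum, carry and Cout of the 4:2 compressor are given by:

$$\text{Sum} = a \oplus b \oplus c \oplus d \oplus C_{in} \tag{2.6}$$

$$C_{out} = (a \oplus b)\,c + \overline{(a \oplus b)}\,a \tag{2.7}$$

$$\text{Carry} = (a \oplus b \oplus c \oplus d)\,C_{in} + \overline{(a \oplus b \oplus c \oplus d)}\,d \tag{2.8}$$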
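Eqs. (2.4) to (2.8) can be checked exhaustively. The Python sketch below builds the 4:2 compressor from two 3:2 compressors, assuming the cascade of Fig 2.10.2 (the first stage takes a, b, c; the second takes its sum, d and Cin); the function names are our own. It checks that the weights balance, a + b + c + d + Cin = Sum + 2(Carry + Cout), and that Cout does not depend on Cin, which is why Cout can be passed horizontally to the next column without creating a ripple.

```python
from itertools import product


def compressor_3_2(a: int, b: int, c: int) -> tuple:
    """Eqs. (2.4)-(2.5): Sum = a^b^c, Carry = (a^b)c + NOT(a^b)a."""
    t = a ^ b
    return t ^ c, (t & c) | ((1 - t) & a)


def compressor_4_2(a: int, b: int, c: int, d: int, cin: int) -> tuple:
    """4:2 compressor from two cascaded 3:2 compressors; returns (sum, carry, cout)."""
    s1, cout = compressor_3_2(a, b, c)      # Cout (Eq. 2.7) goes to the next column
    s, carry = compressor_3_2(s1, d, cin)   # Sum (Eq. 2.6) and Carry (Eq. 2.8)
    return s, carry, cout


# Exhaustive check of the arithmetic identities behind the compressors.
for a, b, c in product((0, 1), repeat=3):
    s, carry = compressor_3_2(a, b, c)
    assert a + b + c == s + 2 * carry

for a, b, c, d, cin in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(a, b, c, d, cin)
    x = a ^ b ^ c ^ d
    assert s == x ^ cin                                     # Eq. (2.6)
    assert carry == (x & cin) | ((1 - x) & d)               # Eq. (2.8), closed form
    assert a + b + c + d + cin == s + 2 * (carry + cout)    # weight check
    assert cout == compressor_4_2(a, b, c, d, 1 - cin)[2]   # Cout does not depend on Cin
print("3:2 and 4:2 compressor equations verified")
```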

Higher-order compressors, which compress a larger number of inputs into fewer outputs, can also be used; for example, 5:2 and higher compressors are useful when large numbers of bits are to be multiplied. Since the design here is a 4×4 bit multiplier, 3:2 compressors serve the purpose.

2.6 Carry lookahead adder
A carry lookahead adder produces carries faster because additional circuitry generates the carry bits in parallel whenever the inputs change, instead of letting the carry ripple from bit to bit [11]. Let $a_i$ and $b_i$ be the augend and addend inputs of bit position i, $C_{i-1}$ the carry into that position, and $S_i$ and $C_i$ its sum and carry-out. With the auxiliary propagate and generate signals $p_i$ and $g_i$, the carry and sum are given by:

$$p_i = a_i \oplus b_i \tag{2.8}$$

$$g_i = a_i\,b_i \tag{2.9}$$

$$C_i = g_i + p_i\,C_{i-1}, \qquad S_i = p_i \oplus C_{i-1}$$

As the number of bits in a carry lookahead adder increases, its complexity grows because the number of gates in the carry expression grows. In practice it is therefore not desirable to build a wide adder as a single traditional CLA, since that increases both area and power. Instead, smaller carry lookahead adders, commonly 4-bit CLAs, are combined in levels to create a larger CLA, so carry lookahead is defined over groups of 4 bits. The carry lookahead logic that produces the individual group carries is illustrated in Fig 2.11. The carries are produced in two stages. Since the group G and P signals coming from the groups are positive logic, the first stage is set up in a product-of-sums manner (i.e. OR-AND-INVERT logic) and produces supergroup G and P signals. The second stage uses these supergroup G and P signals, together with the carry-in, to make the final group carries, which are then distributed to the individual group output stages.

[Fig 2.11 4 Bit Carry Lookahead adder: inputs a1 to a4 and b1 to b4 pass through PG logic, carry logic and sum logic to produce sm1 to sm4]

A carry lookahead adder improves speed by reducing the time required to determine the carry bits. It can be contrasted with the simpler, but usually slower, ripple carry adder, in which the carry bit is calculated alongside the sum bit and each bit must wait until the previous carry has been calculated before computing its own sum and carry. A ripple carry adder works in the same way as pencil-and-paper addition. Although simple in concept, it has a long circuit delay because of the many gates in the carry path from the least significant bit to the most significant bit: the carry path of a 4-bit ripple carry adder has eight gates in cascade, giving a delay of eight gate delays. Since only AND and OR gates are involved in the carry path, ideally each of the four carry signals C1 to C4 could be produced in just two gate delays. The basic carry lookahead circuit is one in which C1 to C3 have a delay of only two gate delays; the implementation of C4 is more complicated so that the 4-bit carry lookahead adder can be extended to multiples of 4 bits, such as 16 bits. The carry lookahead adder is faster than the ripple carry adder because some computations are done in advance, as the following equations show:

$$C_1 = g_1 + p_1 C_0 \tag{2.9}$$

$$C_2 = g_2 + p_2\,(g_1 + p_1 C_0) \tag{2.10}$$

$$C_3 = g_3 + p_3\,(g_2 + p_2\,(g_1 + p_1 C_0)) \tag{2.11}$$

$$C_4 = g_4 + p_4\,(g_3 + p_3\,(g_2 + p_2\,(g_1 + p_1 C_0))) \tag{2.12}$$

As equation (2.12) shows, part of C4 can be pre-computed once C0, p1 and g1 are known [10]. The area-delay product gives a clear picture of the space-time trade-off. Consider the ripple carry adder, the carry select adder and the carry lookahead adder. Ripple carry adders have a smaller area but lower speed, whereas carry select adders have high speed (nearly twice that of ripple carry adders) but occupy a larger area. The carry lookahead adder (CLA) strikes a balance between area and delay; hence, among the three, the CLA has the smallest area-delay product, and carry lookahead adders should be used when both area and time are to be optimized. For instance, the last stage after the Wallace tree adder in a Booth multiplier can be a carry lookahead adder. Regarding circuit area alone, the ripple carry adder (RCA) is the most efficient, while the carry lookahead adder (CLA) is more complex.
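The flattened carry expressions behind Eqs. (2.9) to (2.12) can be written out in a short behavioural model of a 4-bit CLA. Bits are numbered 1 to 4 as in the equations (integer bit k-1 is position k). Expanding the nested forms, e.g. C2 = g2 + p2g1 + p2p1C0, makes every carry a two-level AND-OR function of the p and g signals and C0, which is what allows the ideal two-gate-delay carries. This is a sketch, not the circuit of Fig 2.11, and the function name `cla_4bit` is hypothetical.

```python
def cla_4bit(a: int, b: int, c0: int = 0) -> tuple:
    """4-bit carry lookahead adder; returns (4-bit sum, carry-out C4)."""
    # PG logic, Eqs. (2.8)-(2.9); index 0 is unused so p[k], g[k] match the text.
    p = [0] + [((a >> k) & 1) ^ ((b >> k) & 1) for k in range(4)]
    g = [0] + [((a >> k) & 1) & ((b >> k) & 1) for k in range(4)]

    # Carry logic, Eqs. (2.9)-(2.12) expanded to sum-of-products form: every
    # carry depends only on p, g and C0, never on another computed carry.
    c1 = g[1] | (p[1] & c0)
    c2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & c0)
    c3 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & c0)
    c4 = (g[4] | (p[4] & g[3]) | (p[4] & p[3] & g[2])
          | (p[4] & p[3] & p[2] & g[1]) | (p[4] & p[3] & p[2] & p[1] & c0))

    # Sum logic: S_k = p_k XOR C_{k-1}.
    carry_in = [c0, c1, c2, c3]
    s = 0
    for k in range(1, 5):
        s |= (p[k] ^ carry_in[k - 1]) << (k - 1)
    return s, c4


print(cla_4bit(0b1011, 0b0110))   # 11 + 6 = 17 -> (1, 1)
```

The width of the C4 term (five product terms, the last with five inputs) illustrates why wide single-level CLAs become impractical and are built hierarchically instead.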


2.7 Kogge stone adder
This adder, proposed by Kogge and Stone in 1973, has minimal logic depth as well as bounded fan-out (the maximum fan-out is 2), at the cost of a greatly increased number of black nodes (group PG cells) and interconnections [3]. This is achieved by using a large number of independent tree structures in parallel. The Kogge stone adder is essentially a parallel-prefix carry lookahead adder. In a prefix circuit, every output depends on all inputs of equal or lower significance, and every input influences all outputs of equal or higher significance. Let $\circ$ be an arbitrary associative binary operation. A prefix circuit for $\circ$ is a combinational circuit that takes n inputs $x_1, x_2, \ldots, x_n$ and generates the n outputs $x_1,\ x_1 \circ x_2,\ x_1 \circ x_2 \circ x_3,\ \ldots,\ x_1 \circ x_2 \circ \cdots \circ x_n$, as shown in Fig 2.12.
[Fig 2.12 Function of parallel prefix circuit: inputs x1, x2, ..., xn; outputs x1, x1∘x2, ..., x1∘...∘xn]
The Kogge stone adder generates the carries in O(log n) time, is widely considered one of the fastest adders, and is widely used in industry for high-performance arithmetic circuits. In a KSA, carries are computed quickly by computing them in parallel, at the cost of increased area, and wiring congestion is often a problem. The Kogge stone adder belongs to the family of tree adders. Its operation can be understood by analyzing it in terms of three distinct parts:

1) Preprocessing: this step computes the generate and propagate signals corresponding to each pair of bits in A and B:

$$p_i = A_i \oplus B_i, \qquad g_i = A_i\,B_i$$

2) Carry lookahead network: this block differentiates the KSA from other adders and is the main reason for its high performance. It computes the carry corresponding to each bit, using the group propagate and generate signals as intermediate signals:

$$P_{i:j} = P_{i:k+1}\,P_{k:j} \tag{2.13}$$

$$G_{i:j} = G_{i:k+1} + P_{i:k+1}\,G_{k:j}, \qquad i > k \ge j \tag{2.14}$$

3) Post-processing: this final step is common to all adders of the carry lookahead family. It computes the sum bits:

$$S_i = p_i \oplus C_{i-1}$$
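The three parts can be modelled in Python. The sketch below, with hypothetical names, first implements a generic Kogge-Stone prefix network (at stage s every position combines with the one 2^s places below it, so ⌈log2 n⌉ stages suffice) and then uses it with the group PG operator of Eqs. (2.13) and (2.14) to build an adder. The carry-in is modelled as an extra generate signal below bit 0; this is one common way to handle it, assumed here rather than taken from the thesis figures.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def kogge_stone_scan(xs: List[T], op: Callable[[T, T], T]) -> Tuple[List[T], int]:
    """Prefix outputs xs[0] op xs[1] op ... op xs[i] for every i, Kogge-Stone style.

    op must be associative; it is called as op(lower, upper) because it need
    not be commutative. Returns (prefixes, number_of_stages).
    """
    out = list(xs)
    dist, stages = 1, 0
    while dist < len(out):
        # All positions update in parallel from the previous stage's values.
        out = [out[i] if i < dist else op(out[i - dist], out[i]) for i in range(len(out))]
        dist *= 2
        stages += 1
    return out, stages


def pg_combine(lower: Tuple[int, int], upper: Tuple[int, int]) -> Tuple[int, int]:
    """Group PG cell, Eqs. (2.13)-(2.14): upper = (G, P)_{i:k+1}, lower = (G, P)_{k:j}."""
    g_lo, p_lo = lower
    g_hi, p_hi = upper
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def kogge_stone_add(a: int, b: int, n: int, cin: int = 0) -> int:
    """n-bit Kogge-Stone addition; returns the (n+1)-bit result a + b + cin."""
    # 1) Pre-processing: bitwise propagate and generate.
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    # 2) Carry network: prefix over (cin, bit 0, ..., bit n-1); entry i is the
    #    group generate from the carry-in up to bit i-1, i.e. the carry into bit i.
    pairs = [(cin, 0)] + list(zip(g, p))
    prefix, _ = kogge_stone_scan(pairs, pg_combine)
    carry = [gp[0] for gp in prefix]
    # 3) Post-processing: S_i = p_i XOR C_{i-1} (the carry into bit i).
    s = sum((p[i] ^ carry[i]) << i for i in range(n))
    return s | (carry[n] << n)


print(kogge_stone_scan(list("abcd"), lambda u, v: u + v))  # (['a', 'ab', 'abc', 'abcd'], 2)
print(kogge_stone_add(0b1011, 0b0110, 4))                   # 17
```

Using string concatenation as the operator shows that the network only relies on associativity, which is exactly the property the group PG operator has.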

[Fig 2.12 4bit Kogge Stone adder: bitwise G0 to G3 and P0 to P3 from inputs a0 to a3 and b0 to b3; prefix levels producing G1:0, G2:1, G3:2 (with P) and G3:0, P3:0; carries C0, C1 and C2]

Fig 2.12 shows an example 4-bit Kogge-Stone adder. Each vertical stage produces a "propagate" and a "generate" bit [6]. The final generate bits (the carries) are produced in the last stage, and they are XORed with the initial propagate bits to produce the sum bits. For example, the least significant sum bit is the XOR of p0 with the carry-in: if p0 = 1 and the carry-in is 0, the sum bit is 1. The second sum bit is the XOR of p1 with C0: if p1 = 0 and C0 = 0, it is 0. The PG equations above are for valency-2 group PG logic, because each cell combines a pair of smaller groups. When larger numbers of groups are combined, for example in valency-4 group logic, the PG equations [11] are:

$$P_{i:j} = P_{i:k}\,P_{k-1:l}\,P_{l-1:m}\,P_{m-1:j} \tag{2.15}$$

$$G_{i:j} = G_{i:k} + P_{i:k}\left(G_{k-1:l} + P_{k-1:l}\left(G_{l-1:m} + P_{l-1:m}\,G_{m-1:j}\right)\right), \qquad i \ge k > l > m > j \tag{2.16}$$
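Eqs. (2.15) and (2.16) should agree with two levels of valency-2 cells, since the group PG operator is associative. The short Python check below, with hypothetical helper names, confirms this exhaustively over all 256 combinations of four (G, P) input pairs; the arguments are ordered from the most significant group (i:k) to the least significant (m-1:j).

```python
from itertools import product


def valency2(upper, lower):
    """Eqs. (2.13)-(2.14): combine (G, P) of an upper group with a lower group."""
    return upper[0] | (upper[1] & lower[0]), upper[1] & lower[1]


def valency4(a, b, c, d):
    """Eqs. (2.15)-(2.16): a = (G, P)_{i:k}, b = _{k-1:l}, c = _{l-1:m}, d = _{m-1:j}."""
    (ga, pa), (gb, pb), (gc, pc), (gd, pd) = a, b, c, d
    g = ga | (pa & (gb | (pb & (gc | (pc & gd)))))
    p = pa & pb & pc & pd
    return g, p


pairs = list(product((0, 1), repeat=2))          # every possible (G, P) pair
for a, b, c, d in product(pairs, repeat=4):
    one_cell = valency4(a, b, c, d)
    assert one_cell == valency2(valency2(a, b), valency2(c, d))   # tree of valency-2 cells
    assert one_cell == valency2(a, valency2(b, valency2(c, d)))   # serial chain
print("valency-4 cell matches valency-2 cells for all 256 input combinations")
```

One valency-4 cell replaces two levels of valency-2 cells, which is where the smaller logic depth of the radix-4 Kogge stone adder comes from.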

The Kogge stone adder is among the fastest of the CLA-based tree adders. Kogge stone adders of higher radix are also available: a radix-4 KSA has a smaller logic depth, but each stage is more complex than in a radix-2 Kogge stone adder. The 4-bit radix-2 adder is shown in Fig 2.12, and the 8-bit radix-4 Kogge stone adder is shown in Fig 2.13.

[Fig 2.13 8 bit Radix-4 Kogge stone adder: input stage, carry logic and sum outputs S7 to S0]

CHAPTER 3
DYNAMIC CMOS CIRCUITS
3.1 Introduction
Although there are many good reasons for using static CMOS logic, it also has numerous drawbacks. Static implementations inherently need more transistors than dynamic ones: a full latch in the traditional static configuration may require 66 transistors, whereas a dynamic configuration performing the same function may require only 36. The number of transistors needed to build a flip-flop is also significantly reduced by using dynamic rather than fully static logic. Reducing the total number of transistors not only makes the overall device significantly smaller but also reduces the power requirements of the system. Most of the disadvantages of static CMOS, however, are associated with the PMOS transistors. Because hole mobility is significantly lower than electron mobility, a PMOS device must be much larger than an NMOS device to transport the same amount of charge in the same time. The larger area of the PMOS device not only increases the overall chip size but also increases the capacitance associated with it. The larger capacitance and lower carrier mobility of PMOS devices result in a longer delay to charge the capacitance of the next logic stage, and this delay becomes a bottleneck when designing faster circuits. In standard CMOS logic, every NMOS device is complemented by a PMOS device, so altering the logic to need fewer PMOS devices can significantly improve circuit performance. An alternative logic style that reduces the number of PMOS devices, while also solving most of the problems of pseudo-NMOS logic, is dynamic CMOS. The basic structure of dynamic CMOS logic is shown in Fig 3.1 [11]. When the clock is low, the NMOS device is cut off while the PMOS is turned ON. This disconnects the output node from ground while connecting it to VDD. Since the input to the next stage is charged through the PMOS transistor when the clock is low, this phase of the clock is known as the precharge phase. When the clock is high, the PMOS is cut off and the bottom NMOS is turned ON, which disconnects the output node from VDD and provides a possible pull-down path to ground through the bottom NMOS transistor. This part of the clock cycle is known as the evaluation phase, and the bottom NMOS is therefore called the evaluation NMOS. During the evaluation phase, the output node is either maintained at its previous logic level or discharged to GND.

[Schematic: Vdd, Gnd, clock Clk, input A, output Y]
Fig 3.1 Basic Dynamic CMOS circuit

In other words, the output node may be selectively discharged through the NMOS logic structure, depending on whether the inputs of the NMOS logic block form a path to GND. If no path to ground is formed during the evaluation phase, the output node keeps its previous voltage level. This is because there is no path from the output to VDD or GND through which the charge can flow away.

3.2 Footed dynamic circuit
If input A is HIGH during precharge, contention takes place because both the PMOS and the NMOS transistors are ON. When the input cannot be guaranteed to be LOW during precharge, an extra clocked evaluation transistor can be added at the bottom of the NMOS stack to avoid contention. This extra transistor is called the foot. The foot removes the contention. However, it adds a series transistor to the pull-down stack, so a footed dynamic gate has a higher logical effort than an unfooted one.
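The precharge, evaluation and contention behaviour described in Sections 3.1 and 3.2 can be summarized in a small switch-level model. This Python sketch is illustrative only. The class `DynamicNand` is hypothetical, transistors are treated as ideal switches, and a two-input NAND pull-down network is assumed in place of the single input A of Figs 3.1 and 3.2.

```python
class DynamicNand:
    """Switch-level model of a two-input dynamic NAND gate.

    Transistors are ideal switches; 'node' is the dynamic output Y.
    """

    def __init__(self, footed=True):
        self.footed = footed
        self.node = 1  # Y starts precharged HIGH

    def precharge(self, a, b):
        """Clock LOW: the precharge PMOS is ON and the foot (if any) is OFF.

        Returns True when, without a foot, the NMOS stack also conducts,
        i.e. there is contention between VDD and GND. The real node level
        then depends on transistor sizing; the model simply keeps it HIGH.
        """
        self.node = 1
        return bool(a and b) and not self.footed

    def evaluate(self, a, b):
        """Clock HIGH: the PMOS is OFF and the foot is ON.

        Y discharges only if the pull-down stack conducts; otherwise it
        floats and keeps its precharged value.
        """
        if a and b:
            self.node = 0
        return self.node


unfooted = DynamicNand(footed=False)
print(unfooted.precharge(1, 1))   # True: inputs HIGH during precharge cause contention
footed = DynamicNand(footed=True)
print(footed.precharge(1, 1))     # False: the foot breaks the path to GND
```

Each call to `evaluate` should be preceded by a `precharge`, just as each evaluation phase follows a precharge phase of the clock.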

[Schematic: Vdd, Gnd, output Y, clock Clk, input A]

Fig 3.2 Footed dynamic circuit

3.3 Advantages of Dynamic circuits
No static power dissipation: A dynamic circuit has no DC path from VDD to GND. Apart from leakage and the power needed to drive the clock, it therefore dissipates no power while its inputs do not change. It dissipates power when the inputs are active, i.e. when they switch from one state to another. Higher speed: Dynamic logic is faster than static logic because the logic function is implemented only with fast NMOS transistors. A static circuit, by contrast, also needs a network of PMOS transistors to represent the same logic. A PMOS transistor is slower than an NMOS transistor because hole mobility is lower than electron mobility, so dynamic circuits are faster than static ones. An example is shown in Figs 3.3 and 3.4.


[Schematics: static NAND and dynamic NAND; Vdd, Gnd, Clk, inputs A and B, output Y]

Fig 3.3 Static NAND

Fig 3.4 Dynamic NAND

Low power requirement: A static circuit uses more transistors than a dynamic one. For example, a static latch requires 66 transistors, whereas a dynamic latch requires only 36 [7]. Reducing the number of transistors makes the overall device significantly smaller and also reduces the power requirement of the system.

3.4 Monotonicity Requirement in dynamic circuits
A fundamental difficulty with dynamic circuits is the monotonicity requirement [6]. While a dynamic gate is in evaluation, its inputs must be monotonically rising. That is, an input may start LOW and remain LOW, start LOW and rise HIGH, or start HIGH and remain HIGH. It must not fall from HIGH to LOW. As Fig 3.6 shows, the dynamic inverter of Fig 3.5 violates monotonicity. During precharge, the output is pulled HIGH. When the clock rises, the input is HIGH, so the output is discharged LOW through the pull-down network. The input later falls LOW, turning off the pull-down network. However, the precharge transistor is also OFF, so the output floats and stays LOW rather than rising. The output remains LOW until the next precharge phase.


In summary, the inputs must be monotonically rising for the dynamic gate to compute the correct function.
[Schematic: dynamic inverter with precharge transistor; Vdd, Gnd, Clk, input A, output Y]

Fig 3.5 Dynamic circuit

[Waveforms: A, Clk and Y over the precharge and evaluation phases]

Fig 3.6 Monotonicity in dynamic circuits

Unfortunately, the output of a dynamic gate begins HIGH and falls monotonically LOW during evaluation. This monotonically falling output is not a suitable input to a second dynamic gate. Dynamic gates sharing the same clock therefore cannot be connected directly, and a solution to this problem is required. The next section discusses a solution to the monotonicity problem.
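The failure shown in Fig 3.6 can be reproduced with a discrete-time Python sketch. In this sketch, one evaluation phase is divided into arbitrary time steps and transistors are ideal switches. The function names are hypothetical.

```python
def dynamic_inverter(inputs):
    """Output Y of a dynamic inverter, sampled at each step of one evaluation phase.

    Y starts HIGH (precharged). Whenever the input is HIGH, the pull-down NMOS
    discharges Y. Nothing can pull Y back up before the next precharge, so a
    LOW input simply leaves Y floating at its last value.
    """
    y, trace = 1, []
    for a in inputs:
        if a:
            y = 0
        trace.append(y)
    return trace


def static_inverter(inputs):
    """Reference behaviour: a static CMOS inverter always drives NOT input."""
    return [1 - a for a in inputs]


falling = [1, 1, 0, 0]   # input starts HIGH and falls: not monotonic
rising = [0, 0, 1, 1]    # input starts LOW and rises: monotonic
print(dynamic_inverter(falling), static_inverter(falling))  # [0, 0, 0, 0] [0, 0, 1, 1]
print(dynamic_inverter(rising), static_inverter(rising))    # [1, 1, 0, 0] [1, 1, 0, 0]
```

With the falling input, the dynamic output stays LOW after the input drops, whereas the static inverter output recovers. With a monotonically rising input, the two agree.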


3.5 Domino logic dynamic circuits
The monotonicity problem can be solved by placing a static CMOS inverter between dynamic gates, as shown in Fig 3.7. The inverter converts the monotonically falling output into a monotonically rising signal suitable for the next gate. The dynamic-static pair is called a DOMINO GATE [7]. The name comes from the analogy with dominoes: the precharge phase resembles setting up a chain of dominoes, and the evaluation phase resembles the dominoes tipping over, each triggering the next. A single clock can be used to precharge and evaluate all the logic gates in the chain. Because the dynamic output falls monotonically during evaluation, the static inverter output rises monotonically.
[Schematic: two dynamic NAND stages sharing Clk; Vdd, Gnd, inputs A and B, nodes R, X and Y]

Fig 3.7 Two dynamic NAND gates sharing same clock

The domino circuit removes the monotonicity problem, but it has its own disadvantages: its output is non-inverting, and it suffers from charge sharing.


Placing an inverter between the dynamic stages solves the monotonicity problem, so the correct output Y is obtained. The waveforms in the figure below show this.

[Waveforms: Clk, W, R, X and Y over the precharge and evaluation phases]
Fig 3.8 Output waveforms of two dynamic NAND gates sharing same clock

3.5.1 Properties of Domino Logic
- A single clock can be used to precharge and evaluate each stage in a chain.
- Precharge occurs in parallel, but evaluation occurs sequentially.
- The static inverter can in general be replaced by another static gate.
- Unlike static CMOS gates, domino gates are inherently non-inverting.
- The gate is capable of very high speed.


3.5.2 Example: XOR gate using domino logic

[Schematic: domino XOR gate; Vdd, Gnd, Clk, inputs A and B (each appearing twice), output Y]

Fig 3.9 Domino Logic XOR gate

When the clock is LOW (precharge phase), the PMOS is ON and the pull-down NMOS is OFF. The dynamic node is therefore charged HIGH (Vdd) whatever the inputs are, and the final output after the inverter is LOW. When the clock goes HIGH (evaluation phase), the pull-up PMOS is OFF and the evaluation NMOS is ON. The output is then evaluated according to the state of the inputs.
3.5.3 Advantages of domino logic circuits
Faster switching speed: Studies of dynamic circuits show that they switch faster than static CMOS circuits. They therefore serve the needs of the CMOS industry, which requires fast logic to perform millions of functions at a time.

No short-circuit current: Ideally, the dynamic stage of a domino circuit draws no short-circuit current, because its clocked PMOS and clocked NMOS foot are never ON at the same time. This means less power dissipation, which makes domino logic attractive for low-power applications.

3.5.4 Disadvantages of domino logic circuits
Non-inverting output: Domino circuits produce only non-inverting outputs. However, certain logic synthesis operations require both inverting and non-inverting functions in the same circuit, so logic offering both functions is needed.
Charge sharing: Charge sharing [5] is an undesirable signal-integrity phenomenon observed most commonly in the domino logic family of digital circuits. It occurs during the evaluation phase, when the charge stored on the output node in the precharge phase is shared with the junction capacitances of the transistors that turn ON. Charge sharing may degrade the output voltage level or even cause an erroneous output value.
Clock overloading

In domino logic circuits, the clock drives the precharge PMOS of every gate, so the clock becomes overloaded in large or cascaded domino circuits. The more PMOS transistors there are, the more clock connections are needed, and hence the greater the power dissipation.


3.5.5 Keeper
Dynamic circuits also suffer from charge leakage on the dynamic node. If the dynamic node is precharged HIGH and then left floating, its voltage drifts over time because of subthreshold, gate and junction leakage. This problem is analogous to leakage in dynamic RAM. Moreover, dynamic circuits have poor noise margins. If an input rises above Vt while the gate is in the evaluation phase, the input transistor turns ON weakly and can incorrectly discharge the output. Both the leakage and the noise-margin problems can be reduced by adding a keeper. The keeper is a weak transistor that holds the output at the correct level when it would otherwise float [10]. When the dynamic node X is HIGH, the output Y is LOW and the keeper is ON, which prevents X from floating. When X falls, the keeper initially opposes the pull-down transistors, so it must be weaker than the pull-down network. Eventually Y rises, turning the keeper OFF and thus avoiding static power dissipation.
[Schematic: keeper transistor from Vdd to dynamic node X; Clk, inputs A and B, Vdd, Gnd]

Fig 3.10 Dynamic NAND gate with Keeper

The keeper must be strong enough to compensate for any leakage current drawn when the output is floating and the pull-down stack is OFF. A strong keeper also improves the noise margin, because when an input is slightly above Vt the keeper can supply enough current to hold the dynamic node. The keeper width should therefore be chosen carefully. A keeper that is too strong may create contention with the pull-down network, and one that is too weak may fail to hold the output node at its correct value.


Chapter 4
LEAKAGE IN CMOS CIRCUITS
4.1 Leakage in CMOS circuits
Low power consumption is a highly desirable property of high-performance VLSI circuits, as it directly affects battery life, reliability, packaging and heat-removal costs. With the continuing trend of technology scaling, leakage power is becoming a major contributor to the total power consumption of CMOS circuits. Scaling Vdd reduces dynamic power consumption but also degrades circuit performance. This can be partially compensated by lowering Vth, but at the cost of increased leakage power. Minimizing leakage power is currently an extremely challenging area of research, especially as the number of on-chip devices doubles every two years. Leakage power dissipation [4] arises from the leakage currents that flow through a transistor when there are no input transitions and the transistor has reached steady state. Unlike dynamic power, leakage power depends on the total number of transistors in the circuit, their types and their operating states, regardless of their switching activity. This makes leakage power more difficult to reduce than dynamic power. The leakage considered in this work is subthreshold leakage only. It is input-pattern dependent because it occurs only in OFF transistors. Robust techniques are therefore needed to reduce this leakage power dissipation, and several techniques that minimize it efficiently have been proposed. Leakage power has two main forms in modern IC processes: subthreshold leakage and gate leakage. Subthreshold leakage power is due to a non-zero current between the source and drain terminals of an OFF MOS transistor. With each process generation, supply voltages are reduced, and transistor threshold voltages (Vth) must also be reduced to limit performance degradation. Reducing Vth leads to an exponential increase in subthreshold leakage.
Gate leakage on the other hand is due to tunneling current through the gate oxide of an MOS transistor. In modern IC processes, gate oxides are thinned to improve transistor drive capability, which has led to a considerable increase in gate leakage.


4.1.1 Subthreshold leakage
Subthreshold current is the most dominant of all leakage sources [8]. It is caused by minority carriers diffusing across the channel from drain to source through the weak inversion layer that exists when the transistor operates in the cutoff region (VGS < Vth). The minority-carrier concentration rises exponentially with the gate voltage VG. ISUB depends on the substrate doping concentration and the halo implant, both of which modify the threshold voltage VTH. ISUB also rises exponentially with temperature. Leakage power dissipation has become a considerable proportion of the total power dissipated in modern deep-submicron technologies. The following equations relate the subthreshold current ISUB to other device parameters.
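$$I_{SUB} = I_0\, e^{\frac{V_{gs} - V_{th0} - \gamma V_{sb} + \eta V_{ds}}{n V_T}} \left(1 - e^{-\frac{V_{ds}}{V_T}}\right) \qquad (4.1)$$

$$I_0 = \mu\, C_{ox} \frac{W}{L} V_T^2\, e^{1.8} \qquad (4.2)$$

$$V_T = \frac{kT}{q} \qquad (4.3)$$

where W and L are the width and length of the transistor, μ is the carrier mobility, C_ox is the gate-oxide capacitance per unit area, V_T is the thermal voltage, V_th0 is the zero-bias threshold voltage, γ is the body-effect coefficient, η is the drain-induced barrier lowering (DIBL) coefficient, and n is the subthreshold swing (slope shape) factor.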
The dependence of the subthreshold current on the above parameters is summarized in Table 4.1. Fig 4.1 shows how the leakage current occurs in an NMOS transistor.

Table 4.1 Dependence of subthreshold leakage current on device parameters

| Parameter | Dependence |
|---|---|
| Transistor width (W) | Directly proportional |
| Transistor length (L) | Inversely proportional |
| Input voltage (Vgs) | Exponential increase |
| Temperature (T) | Exponential increase |
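The trends in Table 4.1 can be checked numerically from eqs. (4.1) to (4.3). The Python sketch below evaluates the equations directly. The default mobility, oxide capacitance, n, η and γ are illustrative placeholders, not TSMC 0.18 µm process parameters. The model also ignores the decrease of V_th with temperature, so it captures only the explicit V_T dependence.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_k):
    """V_T = kT/q, eq. (4.3)."""
    return K_B * temp_k / Q_E


def i_sub(w, l, vgs, vds, vsb, vth0, temp_k=300.0,
          mu=0.04, cox=8.6e-3, n=1.5, eta=0.05, gamma=0.2):
    """Subthreshold current (A) from eqs. (4.1) and (4.2).

    mu (m^2/Vs), cox (F/m^2), n, eta and gamma are illustrative placeholder
    values, not parameters of any particular process.
    """
    vt = thermal_voltage(temp_k)
    i0 = mu * cox * (w / l) * vt ** 2 * math.exp(1.8)
    exponent = (vgs - vth0 - gamma * vsb + eta * vds) / (n * vt)
    return i0 * math.exp(exponent) * (1.0 - math.exp(-vds / vt))


# An OFF NMOS (Vgs = 0) with W = 1 um, L = 0.18 um, Vds = 1.8 V, Vth0 = 0.45 V
base = i_sub(1e-6, 0.18e-6, vgs=0.0, vds=1.8, vsb=0.0, vth0=0.45)
print(i_sub(2e-6, 0.18e-6, 0.0, 1.8, 0.0, 0.45) / base)   # about 2: linear in W
print(i_sub(1e-6, 0.18e-6, 0.0, 1.8, 0.0, 0.35) / base)   # exp(0.1/(n*V_T)), about 13
print(i_sub(1e-6, 0.18e-6, 0.0, 1.8, 0.0, 0.45, temp_k=350.0) > base)  # True
```

Lowering V_th by only 0.1 V raises the leakage by more than an order of magnitude with these placeholder values. This is the exponential sensitivity that makes threshold scaling costly.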

Table 4.1 shows how the subthreshold current depends on transistor width, transistor length, input voltage and temperature. Knowing these dependences helps in reducing subthreshold leakage. However, changing a parameter can also have adverse effects; for example, a narrower transistor leaks less but also delivers less drive current. In deep-submicron processes, Vdd and the threshold voltage VTH of MOS transistors have been greatly reduced. This reduces switching power dissipation, but the exponential behaviour of the subthreshold leakage current increases static power dissipation. Static power consumption is the product of the device leakage current and the supply voltage:

$$P_S = I_{leak} \times V_{DD}$$

It follows that static power dissipation increases with both the leakage current and the supply voltage. Remedies for static power dissipation are discussed later in this chapter. Portable battery-operated devices with long idle times are particularly affected by this leakage power loss. Such a device remains idle for most of the time, but because it is not turned off, valuable battery power is drained, which shortens battery service life. Existing designs must therefore be modified. This work analyzes the proposed techniques using leakage power, dynamic power and propagation delay as the circuit performance metrics.
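As a worked example of P_S = I_leak × V_DD, the Python sketch below uses the 0.7817 nA leakage current reported for the CLA-based multiplier in Table 5.1. It assumes a supply of VDD = 1.8 V, the usual nominal value for a 0.18 µm process, because the text does not state the supply voltage. It also computes the energy drained over one idle hour.

```python
def static_power(i_leak, vdd):
    """P_S = I_leak * V_DD, in watts."""
    return i_leak * vdd


def idle_energy(i_leak, vdd, seconds):
    """Energy drained from the supply while the circuit sits idle, in joules."""
    return static_power(i_leak, vdd) * seconds


I_LEAK = 0.7817e-9   # A, CLA multiplier leakage (Table 5.1)
VDD = 1.8            # V, assumed nominal supply (not stated in the text)
p = static_power(I_LEAK, VDD)
e = idle_energy(I_LEAK, VDD, 3600)
print(f"P_S = {p * 1e9:.3f} nW, idle energy per hour = {e * 1e6:.3f} uJ")
```

A single small multiplier leaks only nanowatts, but the same calculation scales with the number of transistors on a chip and with the length of the idle period.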

[Figure: NMOS transistor with n+ source and drain; VG < VT, VS = 0, VD = VDD, VB = 0; subthreshold current ISUB]
Fig 4.1 Illustration of subthreshold leakage in NMOS transistor.

4.1.2 Gate leakage
Gate leakage is current that tunnels through the gate oxide into the gate of the transistor. It is a serious concern at gate-oxide thicknesses below 2 nm [8]. With such a thin oxide, even a fairly small potential difference across the oxide induces a high electric field, so electrons easily tunnel into or through the oxide. The two main tunneling mechanisms that lead to gate leakage currents are Fowler-Nordheim (FN) tunneling and direct tunneling. The tunneling probability of an electron depends on the thickness, height and structure of the barrier. FN tunneling occurs only at very high fields [7]. Direct tunneling is the more common phenomenon in 45 nm bulk-CMOS devices. Until recent years, gate leakage was neglected because tunneling drops exponentially as the gate-oxide thickness increases. For older processes with tox greater than 2 nm, gate leakage was much smaller than the subthreshold leakage (ISUB). With current process technology parameters, however, gate leakage increases at a much higher rate. This calls for high-k gate dielectrics in place of silicon dioxide, so that thicker oxides can be used, and/or a gate material other than polysilicon, such as a metal gate. As technology advances, tox decreases by 30% with every technology generation. For tox smaller than 1.4 nm, IGate rises by about 1000X in the following process generation, while ISUB rises by only about 5X under normal scaling theory. As an example, NEC's 100 nm process technology has tox = 1.6 nm, an ISUB of 0.3 nA/µm of gate width and an NMOS IGate of 0.65 nA/µm. IGate has therefore already increased to more than double ISUB in some cases. Tunneling current in PMOS devices is an order of magnitude smaller than in NMOS devices, because holes tunnel from the valence band and therefore have to pass a higher barrier. Gate leakage complicates CMOS circuit operation and analysis, because gates can no longer be treated as infinite impedances, as was previously assumed. Fig 4.2 shows the transistor states that cause gate leakage current to flow in an NMOS stack. The input vector affects gate leakage differently from the way it affects subthreshold leakage. With 10 as the input vector, |Vs| = Vth_b (the threshold voltage including the body effect), whereas with 00 as the input, |Vg1| = Vm (< Vth) [4]. Hence, the gate-to-source overlap and gate-to-channel currents of M1 are higher with 10 than with 00. However, this increase is much smaller than the decrease in gate-to-drain tunneling, from that of M1 with 00 (|Vgd1| = Vdd) to that of M2 with 10 (|Vgd2| = Vdd - Vth). This is because the tunneling current density increases ever more steeply as Vox (the potential drop across the oxide) increases.
Therefore, the total gate current in a stack with input 10 is less than that with input 00 because a decrease in Vox reduces the tunneling current.

[Schematic: two-transistor stack M1 and M2; Vdd, Gnd, input A, output Y]

Fig 4.2 Dependence of gate leakage on input vector

Battery-operated devices are either idle (standby) or active. Based on these operational modes, leakage power reduction techniques are classified as follows:
- Leakage control in standby mode: techniques in this category cut the circuit off from the supply rails when it is idle. Examples are power gating and super cutoff CMOS.
- Leakage control in active mode: design modifications, such as adding transistors to form transistor stacks, minimize leakage currents during runtime. Examples are forced stacking and sleepy stack.
There are many techniques for reducing both types of leakage current, but this thesis discusses remedies for subthreshold leakage only. I have applied the leakage reduction technique to reduce standby leakage in the multipliers, as illustrated in the figure below.


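[Schematic: high-Vt sleep transistor (gate: Sleep) between Vdd and the standard-Vt multiplier design; high-Vt sleep transistor (gate: Sleepbar) between the design and Gnd]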
Fig 4.3 Multi-Threshold CMOS

4.2 Multi-Threshold CMOS
Multi-Threshold CMOS (MTCMOS) has emerged as a very popular technique for reducing leakage power in standby mode. In this technique, high-threshold-voltage transistors are inserted in series between the power supply and the existing design, and between the design and ground. The circuit works as follows [4]. In active mode, the high-Vt transistors are turned ON. This allows normal operation, because the circuit is connected to both Vdd and ground. In standby mode, these transistors are turned OFF. The design is then left on virtual supply and ground rails that are cut off from the real supply. Because the high-Vt transistors put the circuit to sleep in standby mode, they are also known as sleep transistors. The disadvantage of this scheme is the additional delay that the high-Vt sleep transistors introduce in active mode, i.e. during the evaluation phase of the clock. To reduce the evaluation delay, the high-Vt NMOS transistor can therefore be removed, which also results in a further power reduction.

[Schematic: high-Vt sleep transistor (gate: Sleep) between Vdd and the standard-Vt multiplier design, which connects directly to Gnd]
Fig 4.4 Multi-Threshold CMOS with reduction of evaluation delay

The extra transistors added to the design in Fig 4.3 increase the impedance between the true supply rails and the rails that feed the standard-Vt design. This causes greater power-supply noise and gate delay. In the second case (Fig 4.4), the NMOS sleep transistor is removed, which reduces noise and delay.


Chapter 5
SIMULATION AND RESULT DISCUSSION
This chapter first describes the tool used, Design Architect, then presents the output waveforms and discusses the results.
5.1 DESIGN ARCHITECT
Design Architect is a product of Mentor Graphics. Mentor Graphics is a technology leader in electronic design automation (EDA). It provides software and hardware design solutions that enable companies to develop better electronic products faster and more cost-effectively. The company offers products and solutions that help engineers overcome the design challenges they face in the increasingly complex worlds of board and chip design. Mentor Graphics has a broad industry portfolio and is the only EDA company with an embedded software solution [16]. Design Architect provides comprehensive technology for schematic capture and hierarchical design. It includes schematic capture, symbol creation, design viewpoint generation and library data management. Design Architect uses the Eldo simulator, which provides extensive simulation capabilities. Its advanced analysis features include transient noise, DC mismatch, sensitivity, aging analysis, library encryption and licensing, optimization, distributed computing, multi-threading, RC reduction, pole-zero analysis, enhanced Monte-Carlo analysis, S-parameters, and S-domain and Z-domain generalized transfer functions. Eldo's device model libraries include the latest transistor models, and it is widely used for CPU-intensive applications such as digital cell characterization. Eldo's capabilities can be further extended with Verilog-A analog behavioural modelling and with advanced RF analysis in Eldo RF. After a design has been simulated, its logs can easily be viewed.
The logs contain various useful parameters like threshold voltage, channel length modulation coefficient and body effect coefficient and many more. So, one can easily view different parameters from ASCII files.

5.2 SCHEMATIC OF MULTIPLIER

Fig 5.1 Schematic of Multiplier

The schematic of the multiplier, shown above, consists of 3 Booth encoders, 12 partial product generators, a Wallace tree adder and a CLA or Kogge-Stone adder. The schematic contains only the symbols of these sub-modules. This avoids wire congestion and makes the design easier to debug. Apart from these sub-modules, two clocks, a local clock and a global clock, are used.


5.3 MULTIPLICAND INPUT WAVEFORMS

The multiplier I designed is a 4 × 4-bit multiplier, where X is the multiplicand and Y is the multiplier; here X = 0111 and Y = 1001. Following the Booth multiplication procedure, Y is first extended with two zeros to the left of its MSB and one zero to the right of its LSB. The multiplier bits are then divided into overlapping triplets. In this case there are three triplets, and each triplet produces one partial product. Sign extension is then applied, depending on whether each partial product is positive or negative.
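The Booth recoding described above can be checked with a short Python sketch. It assumes the standard radix-4 (modified) Booth table, which the text does not list explicitly, and the function names are hypothetical. In hardware, the partial products are sign-extended words. Here they are ordinary Python integers.

```python
# Radix-4 (modified) Booth table: (y[2i+1], y[2i], y[2i-1]) -> digit
BOOTH_DIGIT = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y, nbits=4):
    """Recode an unsigned nbits-wide multiplier Y into radix-4 Booth digits.

    Y is padded with two zeros above the MSB and one zero below the LSB, then
    scanned in overlapping triplets, least significant triplet first.
    """
    bits = [0] + [(y >> i) & 1 for i in range(nbits)] + [0, 0]  # bits[0] is y[-1]
    return [BOOTH_DIGIT[(bits[i + 2], bits[i + 1], bits[i])]
            for i in range(0, nbits + 1, 2)]


def booth_multiply(x, y, nbits=4):
    """Return the partial products d_i * X * 4**i and their sum."""
    partials = [d * x * 4 ** i for i, d in enumerate(booth_digits(y, nbits))]
    return partials, sum(partials)


partials, product = booth_multiply(0b0111, 0b1001)
print(booth_digits(0b1001))              # [1, -2, 1]
print(partials, product, bin(product))   # [7, -56, 112] 63 0b111111
```

For X = 0111 and Y = 1001, the three triplets give the digits +1, -2 and +1, so the partial products are +X, -2X shifted by two places and +X shifted by four places. Their sum is 63, or 00111111 in 8 bits, so the two most significant product bits are 0.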


5.4 MULTIPLIER INPUT WAVEFORMS

The waveforms above are the multiplier input waveforms. All the inputs are applied in parallel, so they arrive at the Booth encoders at the same time. Continuing the multiplier operation described above: after sign extension, the partial products are summed in the Wallace tree adder, which produces sum and carry bits. These sum and carry bits are then added in the CLA or KSA adder.
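The path from partial products to the final product (Wallace tree reduction followed by a fast adder) can be sketched behaviourally in Python. This is not a description of the transistor-level design in Fig 5.1, and it rests on several assumptions. The partial products are 8-bit two's-complement words, the example inputs are those of Section 5.3, and the 3:2 compressors (full adders) are applied word-wise. The final adder is written as a generic Kogge-Stone parallel-prefix network in the sense of [6]; a CLA would differ only in how the carries are computed.

```python
WIDTH = 8                      # 4 x 4-bit multiplier -> 8-bit product
MASK = (1 << WIDTH) - 1


def csa(a, b, c):
    """3:2 carry-save compression: three words -> (sum word, carry word)."""
    s = (a ^ b ^ c) & MASK
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return s, carry


def wallace_reduce(operands):
    """Reduce the partial products to two words using layers of 3:2 compressors."""
    ops = [op & MASK for op in operands]      # two's complement, WIDTH bits
    while len(ops) > 2:
        nxt = []
        full = len(ops) - len(ops) % 3
        for i in range(0, full, 3):
            nxt.extend(csa(*ops[i:i + 3]))
        nxt.extend(ops[full:])                # leftover words pass through
        ops = nxt
    return ops + [0] * (2 - len(ops))


def kogge_stone_add(a, b):
    """WIDTH-bit addition with a Kogge-Stone parallel-prefix carry network."""
    g = [(a >> i) & (b >> i) & 1 for i in range(WIDTH)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(WIDTH)]  # propagate
    G, P = g[:], p[:]
    dist = 1
    while dist < WIDTH:                       # log2(WIDTH) prefix levels
        G = [G[i] | (P[i] & G[i - dist]) if i >= dist else G[i]
             for i in range(WIDTH)]
        P = [P[i] & P[i - dist] if i >= dist else P[i] for i in range(WIDTH)]
        dist *= 2
    carries = [0] + G[:-1]                    # carry into bit i = group generate of bits i-1..0
    return sum((p[i] ^ carries[i]) << i for i in range(WIDTH))


# Booth partial products for X = 0111, Y = 1001
s, c = wallace_reduce([7, -56, 112])
print(kogge_stone_add(s, c))   # 63
```

The Kogge-Stone network resolves all carries in log2(8) = 3 prefix levels, using many parallel prefix cells. This fits its faster evaluation but higher power in Tables 5.1 and 5.2.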


MULTIPLIER INPUT WAVEFORMS

The technology used is TSMC 0.18 µm. TSMC stands for Taiwan Semiconductor Manufacturing Company Limited, and 0.18 µm is the technology on which the design is based. The 0.18 µm figure essentially gives the channel length of a transistor.


5.5 OUTPUT WAVEFORMS

The least significant output bit is labelled A0 rather than SM0. This is because A0 is the least significant output bit of the Wallace tree adder and has no carry to be added to it.


OUTPUT WAVEFORMS

SM4 and SM5 are the fifth and sixth sum bits of the multiplier. Vclk is the input clock of the multiplier circuit. The circuit uses two clock buffers: a local clock buffer and a global clock buffer.


OUTPUT WAVEFORMS

The last two waveforms (SM6 and SM7) stay at 0 V, but close inspection shows small vertical bars in them. These bars show the leakage current.


5.6 RESULTS AND OBSERVATIONS
MULTIPLICAND = 0111, MULTIPLIER = 1001

Table 5.1 Simulation results for average case

| Type of multiplier | Dynamic power dissipation (µW) | Energy dissipation (µW/MHz) | Energy of clock buffer (µW/MHz) | Precharge delay (ns) | Evaluation delay (ns) | Leakage current (nA) |
|---|---|---|---|---|---|---|
| CLA MULT. | 374.499 | 37.4499 | 0.4019 | 0.474 | 1.70111 | 0.7817 |
| KSA MULT. | 378.606 | 37.8606 | 0.46583 | 0.556 | 1.67516 | 0.8751 |

Dynamic power dissipation is the power consumed by the current that flows only while the transistors of the device switch from one logic state to another. It consists of the current required to charge the internal nodes (switching current) plus the through current, i.e. the current that flows from VDD to GND when the p-channel and n-channel transistors are briefly ON at the same time during a logic transition. Dynamic power dissipation therefore depends directly on the switching activity.
Observation: Table 5.1 shows that the CLA-based multiplier dissipates less dynamic power than the KSA-based multiplier (374.499 versus 378.606). The energy of its clock buffer is also lower. However, the KSA-based multiplier has a faster evaluation phase (1.67516 ns versus 1.70111 ns). The leakage current of the CLA-based multiplier is lower than that of the KSA-based multiplier. The CLA-based multiplier therefore has lower power dissipation.
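The dependence of dynamic power on switching activity is usually written as P = α·C·VDD²·f, where α is the activity factor. The Python sketch below evaluates this standard expression with illustrative values that are not taken from the design. It also computes the ratio of the two power-related columns of Table 5.1. That ratio is 10 in both rows, which would be consistent with energy quoted per MHz at a 10 MHz clock. This is only an inference, because the text does not state the clock frequency.

```python
def dynamic_power(alpha, c_load, vdd, freq_hz):
    """Switching power P = alpha * C * VDD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq_hz


# Illustrative values only: 10 % activity, 100 fF switched, 1.8 V, 10 MHz
p = dynamic_power(0.1, 100e-15, 1.8, 10e6)
print(p)   # about 3.24e-07 W

# Dynamic power column divided by energy column (Table 5.1)
table_rows = {"CLA": (374.499, 37.4499), "KSA": (378.606, 37.8606)}
for name, (power, energy_per_mhz) in table_rows.items():
    print(name, round(power / energy_per_mhz, 2))   # 10.0 for both rows
```

Because α multiplies everything else, inputs that toggle fewer internal nodes reduce dynamic power directly. This is the effect seen in the best-case results that follow.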


MULTIPLICAND = 0001, MULTIPLIER = 0001

Table 5.2 Simulation results for best case

| Type of multiplier | Dynamic power dissipation (µW) | Energy dissipation (µW/MHz) | Energy of clock buffer (µW/MHz) | Precharge delay (ns) | Evaluation delay (ns) | Leakage current (nA) |
|---|---|---|---|---|---|---|
| CLA MULT. | 8.7534 | 0.8753 | 0.39461 | 0.355 | 0.684 | 0.7878 |
| KSA MULT. | 10.4586 | 1.0458 | 0.41770 | 0.493 | 0.537 | 0.8825 |

Observation: As in the average case, the CLA-based multiplier dissipates less power than the KSA-based multiplier. Comparing this table with Table 5.1 shows that the dynamic power dissipation is much lower here (8.7534 versus 374.499 for the CLA multiplier), because these inputs cause much less switching. The energy of the clock buffer is again lower for the CLA-based multiplier. The KSA-based multiplier again has the faster evaluation phase (0.537 ns versus 0.684 ns). The leakage current of the CLA-based multiplier is also lower than that of the KSA-based multiplier. The CLA-based multiplier therefore has lower power dissipation.


5.7 LEAKAGE CURRENT MINIMIZATION WAVEFORM

In these waveforms, SM6 and SM7 again stay at 0 V and again show vertical bars caused by leakage current. Compared with the bars in the earlier waveforms, where no leakage reduction technique is applied, these bars show less leakage, as a comparison of their amplitudes confirms. The bars are not perfectly vertical. When zoomed in, each bar turns out to be a waveform that rises and then falls, or vice versa.


5.8 REDUCED LEAKAGE CURRENT

Table 5.3 Simulation results after applying leakage reduction

| Type of multiplier | Leakage current (nA) | Reduced leakage current (nA) | Precharge delay (ps) | Evaluation delay (ps) |
|---|---|---|---|---|
| CLA MULTIPLIER | 0.7817 | 0.2692 | 637.409 | 515.815 |
| KSA MULTIPLIER | 0.8751 | 0.2856 | 556.81 | 543.14 |

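Observation: Applying the reduction technique reduces the (subthreshold) leakage current by more than half. For the CLA-based multiplier, it falls from 0.7817 nA to 0.2692 nA; for the KSA-based multiplier, from 0.8751 nA to 0.2856 nA. Table 5.3 gives the delays in picoseconds, whereas Table 5.1 gives them in nanoseconds. Expressed in the same unit, the evaluation delay falls from 1.70111 ns to 0.515815 ns (CLA) and from 1.67516 ns to 0.54314 ns (KSA). The precharge delay, however, rises from 0.474 ns to 0.637409 ns (CLA) and is essentially unchanged for the KSA-based multiplier (0.556 ns versus 0.55681 ns).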
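The percentages behind these observations can be computed directly from Tables 5.1 and 5.3. The Python sketch below converts all delays to picoseconds before comparing them. The data values are copied from the tables; the function and variable names are hypothetical.

```python
def reduction_pct(before, after):
    """Percentage reduction from 'before' to 'after'."""
    return 100.0 * (before - after) / before


# Leakage (nA) before/after; precharge and evaluation delay before (ns, Table 5.1)
# and after (ps, Table 5.3)
data = {
    "CLA": (0.7817, 0.2692, 0.474, 637.409, 1.70111, 515.815),
    "KSA": (0.8751, 0.2856, 0.556, 556.81, 1.67516, 543.14),
}
for name, (lb, la, pre_ns, pre_ps, ev_ns, ev_ps) in data.items():
    print(name,
          f"leakage -{reduction_pct(lb, la):.1f}%",
          f"precharge {pre_ns * 1000:.0f} ps -> {pre_ps:.0f} ps",
          f"evaluation {ev_ns * 1000:.0f} ps -> {ev_ps:.0f} ps")
```

The leakage reductions are about 65.6% (CLA) and 67.4% (KSA). Once both tables are in the same unit, the evaluation delay drops by roughly a factor of three, but the precharge delay does not decrease.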


Chapter 6
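CONCLUSION & FUTURE SCOPE
6.1 Conclusion
- The Kogge-Stone adder can be used in applications that need faster evaluation. It is less suited to low-power applications, because the CLA-based multiplier dissipates less power.
- The leakage current can be reduced by using a PMOS with a sleep input in series with the multiplier; in the simulations, this reduced the leakage current by more than half.
- With the leakage reduction technique, the evaluation delay fell from about 1.7 ns to about 0.52 to 0.54 ns. The precharge delay did not decrease: it went from 0.474 ns to 0.637 ns for the CLA multiplier and remained about 0.556 ns for the KSA multiplier.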

6.2 Future Scope


An alternative scheme, known as super cutoff CMOS, can be applied in place of MTCMOS. In this scheme, the sleep transistors are under-driven (or over-driven) in standby mode. For example, an NMOS sleep device would be turned off with a slightly negative gate voltage (Vg < 0) instead of zero voltage. From the subthreshold current equation (4.1), a negative gate voltage decreases the leakage current exponentially. The key difference from standard MTCMOS is that the sleep transistors have the same low threshold voltage as the rest of the circuit.
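The exponential benefit of under-driving can be estimated from eq. (4.1). There, I_SUB is proportional to exp(V_gs/(n·V_T)), and the remaining factors do not depend on V_gs. The Python sketch below assumes n = 1.5 and T = 300 K, both illustrative. For the PMOS header used in this thesis, the analogous step would be to drive the gate slightly above VDD.

```python
import math


def thermal_voltage(temp_k=300.0):
    """V_T = kT/q, eq. (4.3)."""
    return 1.380649e-23 * temp_k / 1.602176634e-19


def underdrive_factor(delta_vgs, n=1.5, temp_k=300.0):
    """Factor by which I_SUB drops when an OFF NMOS gate is driven delta_vgs
    volts below 0 V, following the exp(V_gs / (n V_T)) term of eq. (4.1)."""
    return math.exp(delta_vgs / (n * thermal_voltage(temp_k)))


for dv in (0.1, 0.2, 0.3):
    print(f"Vg = -{dv} V: leakage divided by about {underdrive_factor(dv):.0f}")
```

With these assumed values, each additional 0.1 V of under-drive divides the leakage by roughly another order of magnitude. This is why the scheme can keep low-Vt sleep transistors.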


REFERENCES
[1] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri, and P. Corsonello, "Low-power split-path data-driven dynamic logic," IET Circuits, Devices & Systems (Department of Electronics, Computer Science and Systems, University of Calabria, Arcavacata di Rende, Italy), received 20 April 2009, revised 4 September 2009, doi: 10.1049/iet-cds.2009.0099.
[2] G. Goto et al., "A 4.1-ns compact 54x54-b multiplier utilizing sign-select Booth encoders," IEEE Journal of Solid-State Circuits, vol. 32, no. 11, November 1997.
[3] G. Goto et al., "A 54x54-b regularly structured tree multiplier," IEEE Journal of Solid-State Circuits, vol. 27, no. 9, September 1992.
[4] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proceedings of the IEEE, vol. 91, no. 2, pp. 305-327, 2003.
[5] P. Srivastava, A. Pua, and L. Welch, "Issues in the design of domino logic circuits," in Proc. IEEE Symp. on VLSI.
[6] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Transactions on Computers, vol. C-22, no. 8, August 1973.
[7] A. Abdollahi, F. Fallah, and M. Pedram, "Leakage current reduction in CMOS VLSI circuits by input vector control," IEEE Transactions on VLSI Systems, vol. 12, no. 2, pp. 140-154, February 2004.
[8] Y. Ye, S. Borkar, and V. De, "New technique for standby leakage reduction in high-performance circuits," in Symp. VLSI Circuits Dig. Tech. Papers, 1998, pp. 40-41.
[9] J. B. Kuo, K. W. Su, and J. H. Lou, "A BiCMOS dynamic multiplier using Wallace tree reduction architecture and 1.5-V full-swing BiCMOS dynamic logic circuit," IEEE Journal of Solid-State Circuits, vol. 30, no. 8, August 1995.
[10] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, Prentice Hall, 1996.
[11] N. H. E. Weste, D. Harris, and A. Banerjee, CMOS VLSI Design: A Circuits and Systems Perspective, 3rd edition, 2007.
[12] B. Razavi, Design of Analog CMOS Integrated Circuits, Boston, MA: McGraw-Hill, 2001.
[13] K. Prasad and K. K. Parhi, "Low-power 4-2 and 5-2 compressors," in Proc. 35th Asilomar Conf. on Signals, Systems and Computers, vol. 1, 2001, pp. 129-133.
[14] http;//www.vlsi.4windor.ca/presentations/2008/17 compressor cells. pdf
[15] D. Harris, "Introduction to CMOS VLSI Design, Lecture 9: Circuit Families," Harvey Mudd College, Spring 2004.
[16] http://www.mentor.com


