Fast Adders Using Enhanced Multiple-Output Domino Logic

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 2, FEBRUARY 1997

Zhongde Wang, Senior Member, IEEE, Graham A. Jullien, Senior Member, IEEE, William C. Miller, Member, IEEE, Jinghong Wang, and Sami S. Bizzan
Abstract: Using an enhanced multiple-output domino logic (EMODL) implementation of a carry lookahead adder (CLA), sums of several consecutive bits can be built in one nFET tree with a single carry-in. Based on this result, a new sparse carry chain architecture is proposed for the CLA adder. We demonstrate the design approach using a 32-b adder, and show that only four carries are sufficient for generating all sums, with a consequent reduction in the number of stage delays. Using a 1.2-μm CMOS technology, we verify our simulation procedures by fabrication and measurement of a 2.7 ns critical path.
Index Terms: Carry lookahead adders, CMOS logic, domino logic, sparse carry chains.
Manuscript received June 29, 1995; revised May 13, 1996. This work was supported by the Natural Sciences and Engineering Research Council of Canada and the Micronet Network of Centres of Excellence. Z. Wang was with the VLSI Research Group, Department of Electrical Engineering, University of Windsor, ON N9B 3P4, Canada. He is now with Genesis Microchip, Inc., Markham, ON, Canada. G. A. Jullien and W. C. Miller are with the VLSI Research Group, Department of Electrical Engineering, University of Windsor, ON N9B 3P4, Canada. J. Wang is with Telecom Microelectronics Centre, Northern Telecom Ltd., Nepean, ON K1Y 4H7, Canada. S. S. Bizzan is with ATI Technology, Thornhill, ON L3T 7N6, Canada. Publisher Item Identifier S 0018-9200(97)01128-1.
I. INTRODUCTION
ADDITION is a fundamental arithmetic operation in almost any kind of processor, and improving the efficiency of addition is a continuously attractive research topic. High-speed adder architectures include the carry lookahead adder (CLA) [1]–[5], carry-skip adder [6]–[8], carry-select adder [9], conditional sum adder [10], and combinations of these basic structures. For example, the spanning tree adder [11] is a combination of the CLA and carry-select adder. A recent comparison among these adders [12] showed that the CLA adder is the fastest, and requires the least hardware. The CLA algorithm was first introduced by Weinberger and Smith [1], and several variants have been developed (e.g., Brent and Kung [2]). The implementation of a CLA adder, using dynamic CMOS logic, was reported recently [4].
Conventional CLA adders consist of three distinct stages: the preliminary stage, the carry chain building stage, and the sum generation stage. The task of the preliminary stage is to generate the bit-level generates and propagates that the second stage uses to build the carry chain. This second stage, which usually contains many substages, is the major part of the CLA adder, and generates carries for all bit positions. The output sums of the adder, for all bit positions, are generated in the final stage. Since the speed of a CLA adder mainly depends on the speed of the second stage, the first and third stages are usually not emphasized in the literature. The carry chain is a full carry chain, which contains carries for all bit positions. Although a block technique has been previously suggested [3], the full carry chain cannot be avoided. To speed up the generation of the carry chain, Hwang and Fisher [4] suggested the use of multiple-output domino logic (MODL) to generate 2-b group generates and propagates in the beginning stage. Lynch and Swartzlander [11] use a redundant (carry-select) structure in the last stage to relieve the load of building a full carry chain by replacing it with a sparse carry chain.
In this paper, we use enhanced MODL (EMODL) circuits, which can generate sums of several consecutive bit positions from a single carry-in. The function of this technique is the same as the redundant structure introduced in [11]; that is, to replace a full carry chain with a sparse carry chain. The advantage of this new approach is a reduction in hardware due to both the EMODL circuit form and the requirement for fewer carries in the chain. In addition, the preliminary stage of the conventional CLA adder, which generates the bit-level generates and propagates, can be eliminated using our approach; this yields additional reductions in the adder critical path.
The paper is structured as follows. Section II reviews fundamentals for the CLA algorithm. Since we are using domino logic, the complements of carries would normally be generated in parallel. Here, we formalize the concept of pseudocomplements, which simplifies the circuit design and layout process for the CLA adder design. Section III provides examples of typical EMODL circuits for generating several consecutive sums from one carry-in, and Section IV discusses the entire design procedure using the EMODL technique.
Example designs for a 32-b adder are presented in Section V, and comparisons with other CLA techniques are made in Section VI.
II. FUNDAMENTALS
In this section, we first provide a brief review of the theory behind the conventional CLA adder, and then we formalize the pseudocomplement technique that is the theoretical backbone of our new circuit architecture.
A. CLA Algorithm
Assuming that two binary summands, $A = a_{n-1} \cdots a_1 a_0$ and $B = b_{n-1} \cdots b_1 b_0$, are fed to the adder, with a carry-in, $c_{-1}$, and $0 \le i \le n-1$, we
0018–9200/97$10.00 © 1997 IEEE
define the generate term, $g_i$, propagate term, $p_i$, and exclusive-OR (XOR), $x_i$, for each bit position $i$, as $g_i = a_i b_i$, $p_i = a_i + b_i$, $x_i = a_i \oplus b_i$. We use the operator $\circ$, associated with the carry generation, introduced by Ladner and Fischer [13], to link two generate and propagate pairs
$$(g_i, p_i) \circ (g_j, p_j) = (g_i + p_i g_j,\; p_i p_j). \tag{1}$$
For a group, starting from the bit position $i$, and ending at the bit position $k$, we define the group generate, $G_{i,k}$, and group propagate, $P_{i,k}$, as given by
$$(G_{i,k}, P_{i,k}) = (g_k, p_k) \circ (g_{k-1}, p_{k-1}) \circ \cdots \circ (g_i, p_i). \tag{2}$$
The group generate and propagate functions possess the following properties:
$$G_{i,k} = G_{j+1,k} + P_{j+1,k}\, G_{i,j}, \tag{3}$$
$$P_{i,k} = P_{j+1,k}\, P_{i,j}, \qquad i \le j < k. \tag{4}$$
Note that $\circ$ is an associative but not commutative operator [8]. Also note that in (1)–(4), $p$ can be replaced by $x$; i.e., propagate and group propagate can be replaced by XOR and group XOR. Now we allow $\circ$ to operate on a pair of generates and propagates at bit position $i$, and the carry-in $c_{i-1}$, to form the carry-out, by the definition in
$$(g_i, p_i) \circ c_{i-1} = g_i + p_i c_{i-1} = c_i. \tag{5}$$
We shall restrict (5) to apply only when there is a carry-in to that bit position; otherwise the operation can only link two pairs of generates and propagates. With the expression given by (5), the carry for the $i$th bit position, $c_i$, is given by
$$c_i = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_0, p_0) \circ c_{-1} \tag{6a}$$
or
$$c_i = (G_{0,i}, P_{0,i}) \circ c_{-1}. \tag{6b}$$
Equation (6a) directly formulates the Manchester carry chain, while (6b) formalizes carry generation by group generate and propagate. After $c_{i-1}$ is obtained, the sum, $s_i$, is given by
$$s_i = x_i \oplus c_{i-1}. \tag{7}$$
Note that although the propagate (or group propagate) in (1)–(6b) can be replaced by XOR (or group XOR), the exclusive OR in (7) cannot be replaced by propagate. In fact, some authors (e.g., [5]) directly treat the XOR, instead of the inclusive OR, as the propagate. However, although the propagate can be replaced by the XOR, the delay for generating the XOR will be larger than that for generating the inclusive OR. For this reason, we distinguish the propagate from the XOR and use the speed advantage of propagate in our circuit design, as will be shown later.
B. Pseudocomplements
Since we use a domino logic design for the adder, we are required to generate both the true and complement forms of the $x_i$ and $c_i$ variables; to obtain high speed, these should be generated in parallel. Although differential cascode voltage switch (DCVS) logic [14] has been shown to be very efficient in producing both true and complement signals, the properties of the generates and propagates, including group generates and group propagates, do not derive any advantage from applying DCVS circuit principles. These circuits require more transistors, with larger fan-in, and thus more silicon area, than single-ended circuits. An important point in favor of generating single-ended circuits is that the generates and propagates, including group generates and group propagates, are intermediate logical variables, and their complements are not necessary if the complements of the carries can be generated another way.
In order to generate the complement carry chain, we have defined the concept of pseudocomplements (PC) [16] to produce a PC generate, $\hat{g}_i$, and PC propagate, $\hat{p}_i$. Note that $\hat{g}_i$ is not the complement of $g_i$, and $\hat{p}_i$ is not the complement of $p_i$. Instead, $\hat{g}_i = \bar{p}_i = \bar{a}_i \bar{b}_i$ and $\hat{p}_i = \bar{g}_i = \bar{a}_i + \bar{b}_i$. This representation allows us to derive the complements of carries from $\hat{g}_i$ and $\hat{p}_i$ using the operator $\circ$, with full parallelism.$^1$ When $\circ$ is applied to two pairs of pseudocomplement generates and propagates, it is defined as
$$(\hat{g}_i, \hat{p}_i) \circ (\hat{g}_j, \hat{p}_j) = (\hat{g}_i + \hat{p}_i \hat{g}_j,\; \hat{p}_i \hat{p}_j). \tag{8}$$
Then the PC group generate, $\hat{G}_{i,k}$, and the PC group propagate, $\hat{P}_{i,k}$, can be generated in parallel
$$(\hat{G}_{i,k}, \hat{P}_{i,k}) = (\hat{g}_k, \hat{p}_k) \circ (\hat{g}_{k-1}, \hat{p}_{k-1}) \circ \cdots \circ (\hat{g}_i, \hat{p}_i). \tag{9}$$
Lemma 1: The complement carry chain can be derived from $\hat{g}_i$ and $\hat{p}_i$ in the same manner as the carry chain is derived from $g_i$ and $p_i$, namely
$$\bar{c}_i = (\hat{g}_i, \hat{p}_i) \circ (\hat{g}_{i-1}, \hat{p}_{i-1}) \circ \cdots \circ (\hat{g}_0, \hat{p}_0) \circ \bar{c}_{-1} \tag{10a}$$
$$\bar{c}_i = (\hat{G}_{0,i}, \hat{P}_{0,i}) \circ \bar{c}_{-1}. \tag{10b}$$
Proof: From (5), $c_i = g_i + p_i c_{i-1}$; therefore
$$\bar{c}_i = \bar{g}_i (\bar{p}_i + \bar{c}_{i-1}) = \bar{p}_i + \bar{g}_i \bar{c}_{i-1} = \hat{g}_i + \hat{p}_i \bar{c}_{i-1} = (\hat{g}_i, \hat{p}_i) \circ \bar{c}_{i-1},$$
where the second equality uses $\bar{g}_i \bar{p}_i = \bar{p}_i$. We now continue to use this decomposition until (10a) is obtained. The complement of the carry at the $k$th bit position can thus be generated by a carry-in at the $i$th bit position
$$\bar{c}_k = (\hat{G}_{i+1,k}, \hat{P}_{i+1,k}) \circ \bar{c}_i. \tag{11}$$
It is interesting to note that the PC propagate (or the group PC propagate) in (8)–(10b) can also be replaced by the XOR (or the group XOR).
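Lemma 1 is easy to confirm by brute force. The sketch below is only an illustration, not circuitry from this paper: it assumes the pseudocomplement generate is the complement of the propagate and the pseudocomplement propagate is the complement of the generate, as the proof from (5) requires, and the function names (`bit_terms`, `carry_step`, `check_lemma`) are hypothetical.

```python
from itertools import product


def bit_terms(a, b):
    """Return (g, p, g_hat, p_hat) for one bit position with inputs a, b in {0, 1}."""
    g = a & b                    # generate
    p = a | b                    # propagate (inclusive OR)
    g_hat = (1 - a) & (1 - b)    # assumed PC generate: complement of p
    p_hat = (1 - a) | (1 - b)    # assumed PC propagate: complement of g
    return g, p, g_hat, p_hat


def carry_step(gen, prop, carry_in):
    """Apply a (generate, propagate) pair to a carry-in, as in (5)."""
    return gen | (prop & carry_in)


def check_lemma(n_bits=4):
    """Check, for every operand pair and carry-in, that the PC chain tracks NOT carry."""
    for bits in product((0, 1), repeat=2 * n_bits + 1):
        a, b, c = bits[:n_bits], bits[n_bits:2 * n_bits], bits[-1]
        c_bar = 1 - c
        for i in range(n_bits):
            g, p, g_hat, p_hat = bit_terms(a[i], b[i])
            c = carry_step(g, p, c)                   # true carry chain
            c_bar = carry_step(g_hat, p_hat, c_bar)   # pseudocomplement chain
            if c_bar != 1 - c:
                return False
    return True
```

Because both chains use the same `carry_step`, the check also illustrates why identically structured circuits can serve the true and the complement carry chains.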
1 We are grateful to one of the reviewers for pointing out that a similar symmetrical carry-chain circuit appeared in a very recent book, Fig. 9.17 of [19]; this work previously appeared in an unpublished report.
Fig. 1. Manchester chain to generate group (a) generates and (b) propagates.
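The group terms that the chain of Fig. 1 produces can also be written behaviorally. The following is a minimal software sketch of relations (1), (6b), and (7), assuming bit position 0 is the least significant bit; `combine` and `cla_add` are names chosen here for illustration, and the loop ripples through the operator much as a Manchester chain does rather than modeling any transistor-level detail.

```python
def combine(hi, lo):
    """Operator of (1): link a higher-order (g, p) pair with a lower-order one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def cla_add(a, b, carry_in=0, n_bits=8):
    """Add two n-bit integers via group generates/propagates; return (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> i) & 1 for i in range(n_bits)]
    pairs = [(x & y, x | y) for x, y in zip(a_bits, b_bits)]   # bit-level (g, p)
    xors = [x ^ y for x, y in zip(a_bits, b_bits)]              # bit-level XOR

    # Group pair covering bits 0..i, accumulated one bit at a time.
    group, acc = [], None
    for pair in pairs:
        acc = pair if acc is None else combine(pair, acc)
        group.append(acc)

    # Carry out of bit i from the group pair and the adder carry-in, as in (6b).
    carries = [g | (p & carry_in) for g, p in group]

    # Sum bit i is the XOR of bit i with the carry into bit i, as in (7).
    carry_into = [carry_in] + carries[:-1]
    total = sum((x ^ c) << i for i, (x, c) in enumerate(zip(xors, carry_into)))
    return total, carries[-1]
```

For example, `cla_add(200, 100)` returns `(44, 1)`, i.e. 300 expressed as an 8-bit sum plus a carry-out.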
To illustrate the advantages of the pseudocomplement concept to the circuit design, Fig. 1 shows the Manchester chain for group (a) generates and (b) propagates. Because of the presence of sneak paths (current flowing in the wrong direction through a transistor, causing a false discharge), it is necessary to use $x_i$ instead of $p_i$ for the propagate function. As an example of this phenomenon, assume that and in Fig. 1(a), then we should have the outputs and . If the input of transistor in Fig. 1(a) is , then since , the voltage at node will be pulled down, leading to an incorrect result. On the other hand, if the input of transistor is , the voltage at node will remain precharged.
If the conventional domino logic approach is adopted, the complement of each function has to be generated at the same time as the true function is generated. The two domino logic circuits which, to our knowledge, use the least number of transistors to generate the complements of the group generates and propagates are shown in Fig. 2. The transistors with inputs and in both (a) and (b) are required to prevent sneak paths. Without these prevention transistors, the circuit will not provide correct outputs for , , , and . Fig. 2 clearly shows that the circuits needed to generate the complement group generates and propagates require more transistors and a larger fan-in (longer delay and larger area) than the Manchester chain in Fig. 1. Since the generates and propagates are intermediate variables, their sole function is to build up the carry chain. Whether their complements are generated or not is not of importance for the adder, so long as the complement carry chain can be built up at the same time as the true carry chain. The pseudocomplement approach guarantees that identically structured circuits can be used to generate the pseudocomplement group generates and propagates. Therefore, no extra hardware is required and the true and pseudocomplement delays will be identical. The symmetry of the circuitry also reduces the degrees of freedom in transistor sizing optimization, as will be discussed later.
Fig. 2. Minimal domino logic circuits to generate complement group (a) generates and (b) propagates.
III. ENHANCED MULTIPLE OUTPUT DOMINO LOGIC
MODL was formally introduced by Hwang and Fisher [4] in 1989, and it has been shown to provide considerable hardware savings. The Manchester carry chain of Fig. 1 is an example of such a logic circuit. MODL is used when a logic function to be implemented contains a subfunction of another logic output. In this case, the main function, together with its subfunctions, can be built in a single domino logic tree with multiple outputs. We expand the MODL concept to a more general case. When the evaluation of a subfunction, which need not be an output, is a common base of two or more complex functions, whose evaluations are outputs, the domino logic trees of the output functions can be built on a single tree with a common base function subtree. We refer to this approach as enhanced multiple output domino logic (EMODL) [17].
One example of an EMODL circuit is shown in Fig. 3, where sums for four consecutive bit positions are built in a single EMODL tree with only one carry-in. The two trees identified by the dashed blocks are 4-b carry chains. The block on the left-hand side is the true carry chain, and the block on the right-hand side is the complement carry chain. All four sums are built directly upon these two carry chain trees. The tree height is limited to six (including the ground switch), based on considerations for charge sharing and pull-down evaluation delay. In this particular circuit, internal pull-up devices are connected to nodes and in order to reduce charge-sharing problems. The EMODL circuits are also sized in order to minimize pull-down evaluation delay [15]. One of the advantages of the EMODL circuit approach is that only a few bottom transistors are required, and since these have the largest width when sizing the circuit, this tends to reduce the area required compared to other tree techniques.
Fig. 4 shows another EMODL example, where three consecutive sums and a carry-out are generated by a carry-in with a separation of $k - i$ bit positions. In general, we use EMODL to generate the sums of a block by a single tree with a single complementary pair of input carries. Therefore, we only require a sparse carry chain containing carry-ins for the single tree blocks, and there is no need to produce the usual bit-level generates and propagates. In the implementation of the 32-b adder, which will be shown later, we replace $g_i$ and $\hat{g}_i$ in Figs. 3 and 4 by $a_i b_i$ and $\bar{a}_i \bar{b}_i$. The exclusive OR functions, however, are necessary for all bit positions; they can be used to replace both the $p_i$ and $\hat{p}_i$ functions.
Fig. 3. Four consecutive sums built in a single EMODL block.
Fig. 4. Three consecutive sums generated by a carry $k - i$ bit positions apart.
In Figs. 3 and 4, the height of the tree (the fan-in for the gate plus one, the ground switch) is restricted to be six. This has been determined from comprehensive simulation experiments with optimized transistor sizing. It is clear from Figs. 3 and 4 that a dynamic logic tree with height $h$ can generate, at most, $h - 2$ consecutive sums by an adjacent carry-in. The height of the tree strongly affects the efficiency of EMODL circuits. Higher trees can generate more sums than lower trees, and they are more efficient than lower trees for the EMODL implementation. On the other hand, the delay and charge-sharing problems of higher trees mitigate this efficiency. Chan and Schlag [7] point out that if a uniform size is applied to all nFETs, the delay of a dynamic tree (gate) is quadratically related to the height of the tree. Our experiments, and a recent reference [19], have shown that this assumption is quite false for sized trees. Our simulation results, given later, show that sizing the nFET tree changes the relationship between the delay and tree height to one in favor of using higher trees. There is a trade-off between the tree height and the number of stages for building a CLA adder. For a 32-b adder, limiting the tree height to six has been found to be optimal in terms of speed.
IV. CLA ADDER DESIGN USING EMODL TECHNIQUES
In this section we show that the EMODL circuit technique can considerably reduce the number of stages for CLA adders, with a corresponding decrease in the critical path. The conventional CLA adder, discussed in Section II, is shown in Fig. 5. The EMODL CLA adder structure is shown in Fig. 6. Instead of producing generates and propagates for every bit position, the first stage produces group generates and group propagates, one carry-out, and XOR outputs for all bit positions. The second part generates the sparse carry chain. For adders with , no more than two substages are required for sparse carry chain generation. For short length adders, e.g., , this second part of the adder can even be eliminated, with the carry chain containing the carry-out of the first stage only. The third part of the adder contains
EMODL trees to generate sums for all bit positions from the sparse carry chain.
To illustrate the advantages and the methodology of the EMODL technique, we will use the design example of a 32-b adder. We initially decide on the trade-off among tree height, fan-out, and the number of stages. For this design we performed a comparison among four different designs [17], with the results shown in Table I. The technology is a 1.2-μm DLM CMOS process from NorTel. The delays are obtained from schematic designs and are somewhat optimistic, but are useful for comparison purposes. The transistors in each block were sized individually and, although this is not optimal, again the object was to provide design choice. Clearly design I is the superior architecture.
Fig. 7 shows the block diagram with the variables, except for the sum and carry outputs of the final stage, representing both the true and complement or pseudocomplement of the variables. The first stage generates group generates and propagates, and a single carry . The group size is four for bit positions below 20, and three for bit positions above 20. All inputs to the adder, , , and , are fed to both the first and third stage. The first stage also generates exclusive OR terms, $x_i$, for all bit positions, and feeds them directly to the third stage. The adder uses nine different block types, represented by . The parentheses within each block contain the figure numbers pertaining to the circuits used in that block. Each of the four different blocks in the first stage contains several copies of the circuit in Fig. 8 for the bit-level XOR and complement. Block only contains these circuits. Block contains additional circuits to generate four-bit group generates and propagates, as shown in Fig. 9(a) and (b) (excluding the transistors in the broken line block) and is a similar three-bit block. contains only one circuit, the group generate circuit shown in Fig. 9, including the broken line block. For this circuit, is set to zero, and the output is replaced by , the carry-out of the first stage. Both and blocks generate sparse carries and group propagate and generate signals. In order to prevent sneak paths (see Section II-B), separate circuits are used for each carry and group generate and propagate. The blocks in the third stage, , , and , are EMODL circuit blocks. The circuit for is given by Fig. 3, and the circuit for or is given by Fig. 4. We exclude the circuit surrounded by broken lines for . Since bit-level generates are not necessary for our design, $g_i$ and $\hat{g}_i$ in Figs. 3 and 4 are directly implemented by $a_i b_i$ and $\bar{a}_i \bar{b}_i$, respectively, using two transistors connected in series.
To summarize the speed advantages of our architecture: the conventional CLA requires stages for our example 32-b adder [3]. Hwang and Fisher [4] reduce the number of stages to five, plus a stage of static logic to produce the sums from carries and XORs. Our approach does not need the static logic stage, and it requires only three stages of the same, or lower, tree height compared to [4]; this should yield a lower critical path.
Fig. 5. Structure of a conventional CLA adder.
Fig. 6. Structure of EMODL CLA adder.
TABLE I 32-BIT ADDER DESIGN COMPARISON
V. FABRICATION AND TEST RESULTS
The 32-b example design has been implemented in the same 1.2-μm CMOS process used for the schematic simulation tests. The micrograph of the chip in Fig. 10(a) and (b) shows the arrangement of the blocks. is not shown since it is quite small; it is laid out in one of the gaps between the major blocks. The peripheral circuitry around the design is used to distribute inputs from the limited number of pads (the design is clearly I/O bound). Sufficient inputs and outputs are provided to allow reasonable confidence levels in the testing. The critical path was tested separately as discussed later.
The adder was sized using a recently developed combination iterative/analytical technique that is able to produce very close to optimal transistor widths with low computational overhead [15], [18]. HSPICE simulations on the mask-extracted layout show a worst-case delay of 2.7 ns. For comparison, the delay of the 32-b adder in [4], which was fabricated in a 0.9-μm CMOS technology, is 3.1 ns. Table II compares the main characteristics of our design with Hwang and Fisher's design [4]. Applying reasonable scaling criteria, we would expect our design to be faster and smaller than their design using the same technology.
For a conventional CLA adder, the critical path always occurs between the carry-in and the output of the most significant sum. In our design, the critical path is between and . In order to make accurate measurements for this critical path delay, we fabricated a second test chip. This chip contains ten critical paths in series with complete load circuitry. A calibration for the I/O delay was obtained by connecting, in series, the circuitry for both an input and output pad. The micrograph of this test chip is shown in Fig. 11 and the waveforms from a digital oscilloscope are shown in Fig. 12. The effective delay of the input/output pad circuitry can be subtracted from the measured delay by assuming that the calibration circuitry delay is identical to that in the critical path measurement. This is quite reasonable considering the close proximity of the circuitry on the silicon. The subtraction yields a delay of 28 ns for 10 critical paths, resulting in a delay of about 2.8 ns for each critical path. This gives us great confidence in our HSPICE simulation results (2.7 ns), and in any conclusions drawn from these results.
Fig. 7. Block diagram for an EMODL implementation of the 32-b adder.
Fig. 8. Circuit for bit-level XOR and its complement.
VI. ADDITIONAL OBSERVATIONS
In the previous sections, it has been shown that the EMODL concept is more advantageous for higher trees. If no restriction is imposed on the fan-out, a much lower number of stages can be achieved. Table III gives maximum word lengths possible using a two-stage implementation. Table III shows that it is possible to implement a 32-b adder by two stages of logic gates; the maximum tree height, however, is nine, which will produce considerable charge-sharing problems. Some of these designs are presented in Table I. The usual arguments against using relatively high trees are based on charge-sharing and quadratic delay concerns; the
latter is often the stronger argument and carries through to static circuits. It should be pointed out that the quadratic delay relationship does not hold for sized chains [19], and certainly is not in evidence with more complex sized trees. Fig. 13 demonstrates simulation results for the worst-case evaluation node pull-down delay (from to ) using optimally sized EMODL circuits for the sum with the largest index. The sizing constraint was chosen to make the average transistor width equal to the comparison fixed-width circuit (10 μm). Clearly, the relationship is not based on an analytical function since the circuits, and their critical paths under worst-case conditions, do not change in an analytical way as we increase their height. It is clear, however, that increasing tree height is a correct design decision; the function appears to be close to linear for the limited sample set in our experiment. Our observation, therefore, is that circuit/architectural choices based on an assumed quadratic relationship are not valid for sized circuits. It is also clear that we have obtained our improved results by trading off this linear tree delay against the extra delay introduced by more cascaded stages, where each cascade stage includes the delay associated with the domino logic inverter.
Fig. 9. Circuits for group generate (or carry) and group propagate.
Fig. 10. The layout of the 32-b adder.
Fig. 11. Test chip micrograph with ten critical paths in series.
TABLE II COMPARISON WITH HWANG AND FISHER'S DESIGN [4]
TABLE III MAXIMUM WORD-LENGTH FOR TWO-STAGE ARCHITECTURES
Fig. 12. Results from the critical path test chip.
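The per-path figure quoted for the test chip follows from the calibration subtraction. As a worked check, with $t_{\text{meas}}$ and $t_{\text{pad}}$ as names introduced here for the measured total delay and the calibrated pad delay (the text gives only their 28 ns difference):

```latex
t_{\text{path}} = \frac{t_{\text{meas}} - t_{\text{pad}}}{10}
               = \frac{28\ \text{ns}}{10} = 2.8\ \text{ns},
\qquad
\frac{2.8\ \text{ns} - 2.7\ \text{ns}}{2.7\ \text{ns}} \approx 3.7\%.
```

The measured value is therefore within about 4% of the 2.7 ns HSPICE result.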
Fig. 13. Relationship between delay and fan-in for EMODL circuits.
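The quadratic relationship attributed to uniformly sized chains can be reproduced with a first-order Elmore model. The sketch below is a toy RC ladder, not the HSPICE setup behind Fig. 13 and not the sizing method of [15], [18]: it assumes each transistor of width w has resistance `r_unit / w` and loads the node above it with capacitance `c_unit * w`, and it ignores the output load. The function name and parameters are hypothetical.

```python
def elmore_delay(widths, r_unit=1.0, c_unit=1.0):
    """First-order Elmore pull-down delay of a series nFET chain.

    widths[0] belongs to the transistor nearest ground. Each node's capacitance
    is multiplied by the total resistance between that node and ground.
    """
    delay, r_to_ground = 0.0, 0.0
    for w in widths:
        r_to_ground += r_unit / w             # resistance from this node down to ground
        delay += r_to_ground * c_unit * w     # contribution of the node above this device
    return delay


uniform = [1.0] * 6                            # height-six chain, unit widths
tapered = [1.5, 1.3, 1.1, 0.9, 0.7, 0.5]       # same average width, widest near ground
```

With unit widths the model gives 1 + 2 + ... + h = h(h + 1)/2, which is the quadratic growth in height. In this toy model the tapered chain of equal average width evaluates faster than the uniform one, which agrees in direction with the simulated results but is no substitute for them.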
There are many factors affecting the delay of the complete adder. The most important are the number of stages, the delay of each stage (fan-in), and the fan-out. Although we cannot precisely predict which combination of the number of stages and the tree height provides the shortest delay based on individual block delays, e.g., Table III, we can still make comparative judgments by approximating complete adder delays by a summation of block delays, as we have shown in Table I.
VII. CONCLUSIONS
In this paper a modified logic family, EMODL, is discussed. We have demonstrated the use of EMODL trees for CLA adders, which generate sums for several consecutive bit positions. This allows the use of a sparse carry chain with reduced critical path delay. Bit-level generates and propagates are not required in this approach, and the preliminary stage required for a conventional CLA adder can be eliminated. The concept of pseudocomplements is formalized in this paper in order to exploit a circuit symmetry between the carries and their complements. This symmetry provides improvement in layout design and also reduces the number of degrees of freedom in transistor sizing optimization. The design technique is illustrated with a 32-b adder design, and favorable comparisons are made with a recently published MODL design. Finally, we demonstrate that the improvements are due to a close-to-linear relationship between EMODL tree fan-in and pull-down delay when optimally sized transistors are used.
ACKNOWLEDGMENT
Acknowledgments are due to the Canadian Microelectronics Corporation for providing design tools and fabrication resources used in this work.
REFERENCES
[1] A. Weinberger and J. L. Smith, "A logic for high speed addition," Nat. Bur. Stand. Circ., vol. 591, pp. 3–12, 1958.
[2] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Comput., vol. C-31, pp. 280–284, 1982.
[3] S. Waser and M. J. Flynn, Introduction to Arithmetic for Digital System Design. New York: CBS, 1982, ch. 3.
[4] I. S. Hwang and A. L. Fisher, "Ultra fast compact 32-bit CMOS adder in multiple-output domino logic," IEEE J. Solid-State Circuits, vol. 24, pp. 358–369, 1989.
[5] B. W. Y. Wei and C. D. Thompson, "Area-time optimal adder design," IEEE Trans. Comput., vol. 39, pp. 666–675, 1990.
[6] S. Turrini, "Optimal group distribution in carry-skip adders," in Proc. 9th Symp. Comp. Arithmetic, Sept. 1990, pp. 96–103.
[7] P. K. Chan and M. D. F. Schlag, "Analysis and design of CMOS Manchester adders with variable carry-skip," IEEE Trans. Comput., vol. 39, pp. 983–992, 1990.
[8] A. Guyot et al., "A way to build efficient carry skip adders," IEEE Trans. Comput., vol. C-36, pp. 1144–1151, 1987.
[9] O. J. Bedrij, "Carry-select adder," IRE Trans. Elec. Comp., vol. EC-11, pp. 340–346, 1962.
[10] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Elec. Comp., vol. EC-9, pp. 226–231, 1960.
[11] T. Lynch and E. E. Swartzlander, "A spanning tree carry lookahead adder," IEEE Trans. Comput., vol. C-41, pp. 931–939, 1992.
[12] T. K. Callaway and E. E. Swartzlander, "Optimizing arithmetic elements for signal processing," in VLSI Sig. Proc., Vol. V, K. Yao et al., Ed. New York, NY: IEEE, 1992, pp. 91–100.
[13] R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. ACM, vol. 27, pp. 831–838, 1980.
[14] K. M. Chu and D. I. Pulfrey, "Design procedures for differential cascode voltage switch circuits," IEEE J. Solid-State Circuits, vol. SC-21, pp. 1082–1087, 1986.
[15] S. S. Bizzan, G. A. Jullien, and W. C. Miller, "Analytical approach to sizing nFET chains," IEE Elec. Lett., vol. 28, no. 14, pp. 1334–1335, 1992.
[16] Z. Wang, G. A. Jullien, W. C. Miller, and J. Wang, "New concepts for the design of carry look-ahead adders," in Proc. 1993 Int. Conf. Circuits, Systems, vol. 3, pp. 1137–1140.
[17] J. Wang, Z. Wang, G. A. Jullien, and W. C. Miller, "Area-time analysis of carry look-ahead adders using enhanced multiple output domino logic," in Proc. 1994 Int. Conf. Circuits Systems, vol. 4, pp. 59–62.
[18] S. S. Bizzan, G. A. Jullien, and W. C. Miller, "A combined iterative analytical technique for fast sizing of connected domino logic chains," in preparation.
[19] J. M. Rabaey, Digital Integrated Circuits, A Design Perspective. Englewood Cliffs, NJ: Prentice-Hall, 1996, ch. 7.
Graham A. Jullien (M'71–SM'83) was educated in the U.K., receiving degrees in electrical engineering from the Universities of Loughborough, Birmingham, and Aston (Ph.D., 1969). He was a student engineer and data processing engineer at English Electric Computers, U.K., from 1961 to 1966, and a Visiting Senior Research Engineer at the Central Research Laboratories of EMI Ltd., U.K., from 1975 to 1976. Since 1969, he has been with the Electrical Engineering Department of the University of Windsor, ON, Canada, and currently holds the rank of University Professor. He is also Director of the VLSI Research Group at the University of Windsor. He was a member of the Board of Directors of the Canadian Microelectronics Corporation from 1990 to 1993 and is a Principal Researcher and Member of the Coordinating Committee of the Micronet Network of Centers of Excellence. He has published widely in the fields of computer arithmetic, digital signal processing, and VLSI systems and teaches courses in related areas. Dr. Jullien has served on the technical committees of many international conferences; he serves on the Editorial Board of the Journal of VLSI Signal Processing, and was an Associate Editor of the IEEE TRANSACTIONS ON COMPUTERS from 1994 to 1996. He hosted and was program co-chair of the 11th IEEE Symposium on Computer Arithmetic.
William C. Miller (S'56–M'60) received the B.S.E. degree in electrical engineering from the University of Michigan, Ann Arbor, in 1960 and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Waterloo, ON, in 1961 and 1969, respectively. Since 1968, he has been a Professor in the Department of Electrical Engineering at the University of Windsor, ON, Canada. His research interests include digital signal processing, massively parallel DSP architectures, neural networks, VLSI implementations, intelligent sensors, and process control applications. Dr. Miller is a Registered Professional Engineer in Ontario.
Zhongde Wang (M'83–SM'87) graduated, with honor, from the Department of Physics, Yunnan University, Kunming, China, in 1960. He was with Kunming Research Institute of Physics from 1960 to 1987. He visited the Department of Electrical Engineering of the University of Arizona as an exchange scholar from June 1980 to January 1983 and was awarded a grant from the National Science Foundation of the United States in 1982. From August 1987 to September 1990, he was with Beijing University of Posts and Telecommunications as an Associate Professor (1987) and Professor (1988–1990). From September 1990 until June 1996, he was with the VLSI Research Group, University of Windsor, as a Senior Research Scientist. He is now employed by Genesis Microchip, Inc., Markham, ON, Canada. Mr. Wang's interests include computer arithmetic, VLSI architecture, orthogonal transforms and their algorithms and applications, and digital signal and image processing. He is a fellow of the Chinese Institute of Electronics and a member of the American Mathematics Society.
Jinghong (June) Wang received the B.E. in electronic engineering from Tsinghua University, Beijing, China, in 1988 and the M.Sc. in electrical engineering from the University of Windsor, ON, Canada, in 1995. From 1988 to 1991, she worked on adder design and microprocessor analysis in the Institute of Microelectronics at Tsinghua University, Beijing, China. From 1994 to the summer of 1995, she worked on microcontroller and memory analysis at Semiconductor Insights Inc., Kanata, ON, Canada. Since August 1995, she has been with Telecom Microelectronics Center, Northern Telecom Ltd., Canada, as an IC Design Engineer.
Sami S. Bizzan was born in Tripoli, Libya, in 1965. He received the Bachelor of Applied Science and Master of Applied Science in electrical engineering from University of Windsor, ON, Canada, in 1989 and 1991, respectively. He is a candidate in the electrical engineering Ph.D. program at the University of Windsor. Currently, he is working with ATI Technology, Thornhill, Canada, as an ASIC design engineer where he is designing and optimizing full custom macros for speed and silicon area. His research interests include high performance VLSI circuit design, parallel computing architectures, VLSI architectures, and digital signal processing.
