
A PIPELINED MIXED ARCHITECTURE OF 16x16 MULTIPLIER FOR LOW POWER AND HIGH SPEED DSP APPLICATIONS

A Project Report submitted in the partial fulfilment for the award of Bachelor of Technology in Electronics and Communication Engineering By Nitesh Heda Prasad Nirmal Kameshwar Rohit Kumar Bachelor of Technology, VII Semester, Electronics and Communication Engineering (2010-11)

Under the guidance of Mr. K V Krishna Rao Assistant Professor Electronics and Communication Engineering Department MNNIT, Allahabad

ELECTRONICS AND COMMUNICATION ENGINEERING DEPARTMENT MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY ALLAHABAD-211004

MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY Department of Electronics and Communication Engineering Allahabad-211004

Certificate

This is to certify that the term paper project titled "A Pipelined Mixed Architecture of 16x16 bit Multiplier for Low Power and High Speed DSP Applications", submitted by Nitesh Heda, Prasad Nirmal Kameshwar and Rohit Kumar in partial fulfilment of the requirements for the award of Bachelor of Technology in Electronics and Communication Engineering to the Electronics and Communication Engineering Department, Motilal Nehru National Institute of Technology (Deemed University), Allahabad, is a bona fide record of work carried out by the students under my supervision.

Date: Place: (Mr. K V Krishna Rao) Assistant Professor

Acknowledgement
It is a great privilege for us to express our deep sense of gratitude to our supervisor, Asst. Prof. K V Krishna Rao of the Electronics and Communication Engineering Department, MNNIT Allahabad, for his stimulating guidance and profound assistance. We shall always cherish our association with him for the constant encouragement and the freedom of thought and action that he gave us throughout the final year project. We also express our thanks to the head of department, Prof. Sudarshan Tiwari, for his invaluable support and encouragement throughout the project. It is also a great pleasure to thank Dr. Rajeev Tripathi, Asst. Prof. Amit Dhawan, Asst. Prof. Sanjeev Rai and Asst. Prof. Rajiv Gupta for their cooperation, which led to the successful completion of our work. Finally, we thank our families and everyone who helped us in carrying out this project.

Date: Place: Nitesh Heda (200750) Prasad Nirmal Kameshwar (20075005) Rohit Kumar (20075021)

Abstract
This project describes a pipelined mixed architecture of a 16x16 bit multiplier for low power and high speed DSP applications. Key multiplier structures, namely the Array multiplier, the Wallace multiplier and the Bypass Tree multiplier, were implemented and their performance parameters compared. Pipelining of these multipliers was then considered, to meet the requirement of continuous multiplication in DSP processors. Finally, a mixed architecture consisting of the altered Wallace and bypass tree multipliers, with pipelining, was simulated and its performance measured. Pipelining allows several multiplications to be in progress at the same time, while the mixed structure exploits the low power dissipation of the bypass logic and the low delay of the Wallace structure. The results show that this structure is a good choice as a multiplier for DSP processors, which require continuous multiplication.

Keywords: Wallace Multiplier, Bypass Multiplier, Array Multiplier, Pipelining, DSP.

Contents
Certificate
Acknowledgement
Abstract
Contents
List of Figures
Chapter 1. Introduction
1.1 Background
1.2 Multiplier Features
1.3 Pipelining
1.4 Scenario
Chapter 2. Basic Multiplier Architectures
2.1 Introduction
2.2 Array Multiplier
2.3 Wallace Multiplier
2.4 Bypass Tree Multiplier
Chapter 3. Pipelining in Multipliers
3.1 Introduction
Chapter 4. Mixed Pipeline Multiplier Architecture
4.1 The Need of Mixed Architecture
4.2 Architecture Outline
4.3 Structure
Chapter 5. Simulation and Result
5.1 Introduction
5.2 Basic Multiplier
5.3 Pipelined Multiplier Structure (8x8 bit)
5.4 Pipelined Mixed Architecture (16x16 bit)
References

List of Figures
Fig. 1.1: ANDed terms generated using logic AND gate
Fig. 1.2: Full Adder (FA) implementation
Fig. 2.3: 6x6 bit Array multiplier
Fig. 2.4: Dot diagram stages in 8x8 bit Wallace tree multiplier
Fig. 2.5: 4x4 bit bypassing multiplier
Fig. 2.6: Basic cell with bypassing logic for 1-D bypassing
Fig. 2.7: Basic cell with bypassing logic for 2-D bypassing
Fig. 4.1: Block diagram of the proposed pipelined mixed multiplier structure
Fig. 5.1: Technology schematic of 4x4 bit Array multiplier
Fig. 5.2: Technology schematic of 4x4 bit Bypass multiplier
Fig. 5.3: Technology schematic of 4x4 bit Wallace Tree multiplier


CHAPTER 1 INTRODUCTION

1.1 Background
In today's fast-developing technological world, the shift has been towards small and portable devices. As the number of these battery-operated, processor-driven devices increases and more performance is demanded of them, there is a need to increase their processing speed and reduce their power dissipation. In such a consumer-driven scenario, these demands call for a serious look at how the devices are constructed. The processors used for such purposes are DSP processors, in which major operations such as FIR filtering, DCT, etc. are carried out through multipliers. As multipliers are major components of a DSP, optimization of the multiplier design leads to a better performing DSP.

1.2 Multiplier Features


The features of the multiplier proposed in this paper are:
1. Pipelining: Pipelining allows the multiplier to accept a new set of data and start multiplying it while part of another multiplication is still taking place.
2. Mixed Architecture: A mixed architecture has been considered, consisting of Wallace and Bypass tree multipliers. This takes advantage of the low delay of the Wallace multiplier and the low power dissipation of the bypass multiplier.
3. Clocking: Clocking has been arranged so that the multiplier works at its highest clock frequency without disturbing the correct flow of partial products through the structure.
4. Data range: The data range has been extended from the initial 4x4 bit to 16x16 bit, which is the working data range required by many DSP processors.
5. Structural Modelling: This ensures a clean implementation of the multiplier, whether on ASIC or on FPGA, and removes the chance of redundant hardware being generated.
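The pipelining feature listed above can be pictured with a small behavioural model. The sketch below is only an illustration in Python, not the report's Verilog: the function name `pipelined_multiply` and the three-stage depth are assumptions, and each in-flight operation is simply carried through a queue of pipeline registers rather than being computed stage by stage.

```python
from collections import deque


def pipelined_multiply(pairs, stages=3):
    """Model a multiplier pipeline as a queue of in-flight operations.

    One operand pair enters per clock; its product leaves `stages`
    clocks after it entered. The stage count is an assumed value.
    """
    pipe = deque([None] * stages)            # pipeline registers
    results = []
    clock = 0
    stream = list(pairs) + [None] * stages   # extra clocks drain the pipeline
    for item in stream:
        clock += 1
        pipe.appendleft(item)                # new operands enter the first stage
        done = pipe.pop()                    # oldest operation leaves the last stage
        if done is not None:
            a, b = done
            results.append((clock, a * b))   # (clock of completion, product)
    return results


if __name__ == "__main__":
    for clock, product in pipelined_multiply([(1, 2), (3, 4), (5, 6)]):
        print(clock, product)
```

In this model, n multiplications finish after n + stages clocks, whereas running them one after another through the same depth would take n x stages clocks.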

1.3 Pipelining

1.4 Scenario

CHAPTER 2 BASIC MULTIPLIER ARCHITECTURES

2.1 Introduction
A basic multiplier consists of ANDed terms (as shown in Fig 1.1) and an array of full adders and/or half adders arranged so as to obtain partial products at each level. These partial products are added together to obtain the final result. Different arrangements of these adders, and changes in their construction, lead to the various types of basic multiplier structures.

Fig.1.1 ANDed terms generated using logic AND gate

Fig. 1.2: Full Adder (FA) implementation showing the two bits (A,B) and Carry In (Ci) as inputs and Sum (S) and Carry Out (Co) as outputs.
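The two building blocks described in this section can be sketched behaviourally as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, operands are unsigned, and `sum_partial_products` uses ordinary integer addition in place of an adder array.

```python
def full_adder(a, b, cin):
    """One-bit full adder: inputs A, B, Ci; returns (S, Co)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def partial_products(x, y, n):
    """ANDed terms of an n x n multiplication.

    rows[i][j] is (bit j of x) AND (bit i of y); its weight is 2**(i + j).
    """
    return [[((x >> j) & 1) & ((y >> i) & 1) for j in range(n)]
            for i in range(n)]


def sum_partial_products(rows):
    """Add every ANDed term at its weight (integer addition stands in for the adders)."""
    return sum(bit << (i + j)
               for i, row in enumerate(rows)
               for j, bit in enumerate(row))


if __name__ == "__main__":
    rows = partial_products(13, 11, 4)
    print(sum_partial_products(rows))
```

The multiplier structures in the following sections differ only in how the adders that sum these terms are arranged.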

2.2 Array Multiplier


This is the most basic form of binary multiplier construction. Its principle is exactly that of multiplication done with pen and paper. It consists of a highly regular array of full adders, the exact number depending on the length of the binary numbers to be multiplied. Each row of this array generates a partial product. This partial product is then added to the sum and carry generated in the next row. The final result of the multiplication is obtained directly after the last row.

Fig 2.3: A pictorial description of 6x6 bit Array multiplier.

Due to its highly regular structure, the array multiplier is easily constructed and can be densely implemented in VLSI, taking less space. But compared to the other multiplier structures described later, it has a high computation time. In fact, the computation time is of the order O(N), one of the highest of any multiplier structure.
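A bit-level sketch of the row-by-row scheme is given below. It is a simplified Python model, not the report's design: each row is modelled as a ripple of full adders rather than the carry-save cells of a real array multiplier, operands are unsigned, and the names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(x, y, n):
    """Multiply two unsigned n-bit numbers one row of adder cells at a time."""
    acc = [0] * (2 * n)                  # running sum bits, LSB first
    for i in range(n):                   # one row per multiplier bit
        y_i = (y >> i) & 1
        carry = 0
        for j in range(n):               # one adder cell per multiplicand bit
            pp = ((x >> j) & 1) & y_i    # ANDed term for this cell
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[i + n] = carry               # row carry-out goes to the next higher column
    return sum(bit << k for k, bit in enumerate(acc))


if __name__ == "__main__":
    print(array_multiply(13, 11, 4))
```

The outer loop has N rows that must be evaluated in order, which is where the O(N) growth in delay comes from.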

2.3 Wallace Multiplier


A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers. For an NxN bit multiplication, the partial products are formed by N^2 AND gates. The N rows of partial products are then grouped into sets of three rows each. Any additional rows that are not members of these groups are passed to the next level without modification. For a column containing three partial products, a full adder is used, with the sum dropped down to the same column and the carry out brought to the next higher column. For a column with two partial products, a half adder is used in place of the full adder. At the final stage, a carry propagation adder adds the remaining rows, including all the propagating carries, to give the final result.

Fig. 2.4: Dot diagram stages in 8x8 bit Wallace tree multiplier (Courtesy: W. J. Townsend, E. E. Swartzlander and J. A. Abraham)

The number of reduction stages in a Wallace tree multiplier grows only logarithmically with N, i.e. as O(log_{3/2} N), the lowest bound among the structures considered here. Thus, the Wallace tree clearly offers an advantage over other types of multipliers in terms of speed.
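The reduction procedure can be sketched as follows. This is a behavioural Python model under stated assumptions: rows are stored as dictionaries from column index to bit, the final carry propagation adder is replaced by plain integer addition, and the function name is hypothetical.

```python
def wallace_multiply(x, y, n):
    """Return (product, reduction_stages) for unsigned n-bit operands."""
    # Partial-product rows: row i holds (x_j AND y_i) at column i + j.
    rows = [{i + j: ((x >> j) & 1) & ((y >> i) & 1) for j in range(n)}
            for i in range(n)]
    stages = 0
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            a, b, c = rows[3 * g:3 * g + 3]
            sum_row, carry_row = {}, {}
            for col in sorted(set(a) | set(b) | set(c)):
                bits = [r[col] for r in (a, b, c) if col in r]
                total = sum(bits)
                sum_row[col] = total & 1             # sum stays in the same column
                if len(bits) >= 2:                   # half adder (2 bits) or full adder (3 bits)
                    carry_row[col + 1] = total >> 1  # carry moves one column up
            next_rows += [sum_row, carry_row]
        next_rows += rows[3 * full_groups:]          # ungrouped rows pass through unchanged
        rows = next_rows
        stages += 1
    # Integer addition stands in for the final carry propagation adder.
    product = sum(bit << col for row in rows for col, bit in row.items())
    return product, stages


if __name__ == "__main__":
    print(wallace_multiply(200, 123, 8))
```

Each stage turns every group of three rows into two, so the row count shrinks by roughly a factor of 3/2 per stage; the model needs 4 stages for 8 rows and 6 stages for 16 rows.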

2.4 Bypass Tree multiplier


The principle underlying the Bypass multiplier is to bypass those hardware cells whose input multiplicand and/or multiplier bit is 0. This skips the hardware whose partial product is 0 and so reduces the power consumption in those areas. The basic structural arrangement of the Bypass multiplier is very similar to that of the Array multiplier. The difference lies in the construction of the basic cell of the bypass multiplier, which is used in place of the full adder of the array multiplier. It contains bypassing logic, which depends on the bit values of the multiplicand and multiplier inputs. The combinational part of the logic is implemented using two MUXs, which output the actual sum and carry out of the full adder if the input bits are 1; otherwise they bypass the FA by outputting the sum and carry out from the cell of the previous row.

Fig. 2.5: A 4x4 bit bypassing Multiplier implementation showing the construction using the basic cell (courtesy: C. C. Wang and G. N. Sung)

Fig. 2.6: Basic Cell with bypassing logic for 1-D bypassing (courtesy: C. C. Wang and G. N. Sung)

Fig. 2.7: Basic Cell with bypassing logic for 2-D bypassing (courtesy: C. C. Wang and G. N. Sung)

Bypassing can be 1-D or 2-D. In 1-D (one dimensional) bypassing, the bypassing logic depends only on the value of the multiplier bits. This logic is easy to implement but does not exploit the bypassing technique fully. 2-D (two dimensional) bypassing depends on the bit values of both the multiplier and the multiplicand. Its logic is harder to implement than that of 1-D bypassing, but it makes fuller use of the advantage offered by the bypassing technique.
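The effect of 1-D bypassing can be illustrated with a very small behavioural model. This Python sketch is an assumption-laden simplification: it does not model the MUX cells or the 2-D scheme, it uses integer addition in place of each adder row, and the function name is hypothetical. It only counts how many rows actually do work.

```python
def bypass_multiply_1d(x, y, n):
    """Return (product, active_rows) for unsigned n-bit operands.

    A row is bypassed whenever its multiplier bit is 0, so the running
    sum passes through that row unchanged.
    """
    acc = 0
    active_rows = 0
    for i in range(n):
        if (y >> i) & 1 == 0:
            continue              # row bypassed: previous sums pass straight through
        acc += x << i             # row active: its adder cells add the shifted multiplicand
        active_rows += 1
    return acc, active_rows


if __name__ == "__main__":
    product, active = bypass_multiply_1d(13, 0b1001, 4)
    print(product, active)
```

With the multiplier 1001, only two of the four rows are active, which is the source of the switching activity saved by the bypass logic.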

The complexity in 2-D bypassing lies in the fact that not only must the row whose multiplier bit is 0 be bypassed, but also the column whose multiplicand bit is 0. Bypassing a cell must ensure that the carry out and the sum from the previous cell are still added at their respective weights, even when that particular cell is bypassed. If care is not taken in the design, these carry outs and/or sums may get lost and never be compensated for. The main logic signals defining the working of the bypass logic in a 2-D bypass multiplier are muxR_blij, muxC_blij and muxL_blij.

[Equations for muxR_blij, muxC_blij and muxL_blij: not recoverable from the source text.]

Thus we observe that in 2-D bypassing, the logic not only depends on the multiplier and multiplicand bits but also on the carry out bits from previous rows (maximum of 2).

CHAPTER 3 PIPELINING IN MULTIPLIERS

3.1 Introduction

CHAPTER 4 MIXED PIPELINE MULTIPLIER ARCHITECTURE

4.1 The need of Mixed Architecture


During the simulation of the Pipelined Wallace Tree multiplier (PWTM) and the Pipelined Bypass multiplier (PBM), it was observed that the PWTM offered the lower delay, whereas the PBM had the upper hand in power dissipation, for a comparable amount of total resources used. The natural next step in designing a low power, high speed multiplier was to take advantage of both by mixing the two architectures. This would meet our expectations of the multiplier in terms of power and delay, while remaining practically implementable.

4.2 Architecture Outline


Most DSP processors work on floating point data types. That is, the numbers to be multiplied are in the form of a mantissa and an exponent, with the mantissa represented in 1.M form and the exponent as 2^E. The actual multiplication is done between the mantissas of the two numbers only, as the exponents simply need to be added. One advantage of this representation is that the leading bit of the mantissa is guaranteed to be 1, while the LSBs may or may not be 1. This implies that the real gain of the mixed architecture is obtained by using bypassing logic for the multiplication of the LSBs of the mantissa, as they have a higher probability of containing 0s, and the Wallace tree structure for the multiplication of the MSBs, so as to reduce the delay on that side.

4.3 Structure
The inputs considered for multiplication are 16 bits wide. Each input is divided into two parts of 8 bits, the MSBs and the LSBs, so the multiplication is carried out in 4 parts. The pipelined bypass multiplier is used for the multiplication of two LSB parts, or of an MSB part and an LSB part. This follows from the explanation above and is done to reduce the power dissipation. The MSB parts of the two binary numbers are multiplied using the pipelined Wallace tree multiplier, so as to reduce the delay of the multiplication. The four products obtained are then fed to an adder arrangement which adds them, taking care of their respective weights. The final result is the output of this adder arrangement.

Fig. 4.1: Block Diagram of our Proposed Pipelined Mixed Multiplier Structure. X {X1,X0} and Y {Y1,Y0} are the 16 bit input.
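The decomposition into four 8x8 products and their weighted sum can be checked with a short sketch. This Python model is illustrative only: the helper names and the `mul_msb` / `mul_lsb` parameters are hypothetical stand-ins for the pipelined Wallace tree and pipelined bypass blocks, and by default both simply use integer multiplication.

```python
def split16(v):
    """Split a 16-bit value into (MSB half, LSB half), 8 bits each."""
    return (v >> 8) & 0xFF, v & 0xFF


def mixed_multiply(x, y,
                   mul_msb=lambda a, b: a * b,   # stand-in for the Wallace tree block
                   mul_lsb=lambda a, b: a * b):  # stand-in for the bypass blocks
    """16x16 product built from four 8x8 products, X = {X1, X0}, Y = {Y1, Y0}."""
    x1, x0 = split16(x)
    y1, y0 = split16(y)
    p11 = mul_msb(x1, y1)    # MSB part x MSB part
    p10 = mul_lsb(x1, y0)    # MSB part x LSB part
    p01 = mul_lsb(x0, y1)    # LSB part x MSB part
    p00 = mul_lsb(x0, y0)    # LSB part x LSB part
    # Adder arrangement: each product is added at its own weight.
    return (p11 << 16) + ((p10 + p01) << 8) + p00


if __name__ == "__main__":
    print(hex(mixed_multiply(0xABCD, 0x1234)))
```

The weights follow from writing X = X1 * 2^8 + X0 and Y = Y1 * 2^8 + Y0 and expanding the product.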

CHAPTER 5 SIMULATION AND RESULT

5.1 Introduction
The tool used for simulation and verification of the results was Xilinx ISE (versions 11 and 12.2). The hardware implementation targets a Virtex-5 device (XCV110T). All hardware coding has been done in Verilog, using structural modelling throughout, which has the advantage of avoiding the redundant hardware that other modelling styles may generate and is readily implementable in practice. For power analysis, we used the XPower Analyzer tool of Xilinx.

5.2 Basic Multiplier


For an initial comparison and to understand the differences between the basic multiplier structures, a 4x4 bit multiplier was implemented in each of the basic architectures explained above, and their performance was evaluated.

Architecture     Delay (ns)    Power Dissipation (mW)    Area overhead
Array            5.603         -                         -
Bypass           6.538         -                         -
Wallace Tree     6.685         -                         -

Fig. 5.1: Technology Schematic of 4x4 bit Array Multiplier. (Generated from Xilinx Synthesis)

Fig. 5.2: Technology Schematic of 4x4 bit Bypass Multiplier. (Generated from Xilinx Synthesis)

Fig. 5.3: Technology Schematic of 4x4 bit Wallace Tree Multiplier. (Generated from Xilinx Synthesis)

5.3 Pipelined Multiplier Structure (8x8 bit)

Architecture              Delay (ns)    Power Dissipation (mW)    Resources used
Array (Nonpipelined)      10.97         422                       -
Pipelined Wallace Tree    6.36          436 (56 mW Dynamic)       0.859% of Slice Registers, 0.796% of Slice LUTs
Pipelined Bypass          7.26          396 (16 mW Dynamic)       1.161% of Slice Registers, 0.776% of Slice LUTs

Maximum Clock Frequency:
Pipelined Wallace Tree Multiplier: 411.00 MHz
Pipelined Bypass Multiplier: 423.99 MHz
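Two quantities follow directly from the reported figures and can be worked out as below. This is a small Python calculation, not part of the original analysis. It assumes that each reported total power is the sum of a static part and the quoted dynamic part; the function names are hypothetical.

```python
def clock_period_ns(f_mhz):
    """Minimum clock period in ns for a maximum clock frequency in MHz."""
    return 1000.0 / f_mhz


def static_power_mw(total_mw, dynamic_mw):
    """Static share of the power, assuming total = static + dynamic."""
    return total_mw - dynamic_mw


if __name__ == "__main__":
    # Figures taken from the 8x8 bit results above.
    for name, f_mhz, total, dynamic in [
        ("Pipelined Wallace Tree", 411.00, 436, 56),
        ("Pipelined Bypass", 423.99, 396, 16),
    ]:
        print(name,
              round(clock_period_ns(f_mhz), 3), "ns,",
              static_power_mw(total, dynamic), "mW static")
```

Under that assumption both designs have the same static share, so the 40 mW difference in total power between them lies entirely in the dynamic part.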

5.4 Pipelined Mixed Architecture (16x16 bit)
