
PIPELINE FFT ARCHITECTURE IMPLEMENTATION USING VERILOG HDL

A Thesis

by

BHAVISHYA MURUKUTLA

Submitted to the College of Graduate Studies


Texas A&M University-Kingsville
in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

DECEMBER 2013

Major: Electrical Engineering


ABSTRACT

PIPELINE FFT Architecture Implementation Using VERILOG HDL

(December 2013)

Bhavishya Murukutla, Bachelor of Technology, JNTU University, Kakinada, INDIA

Chairman of Advisory Committee: Dr. Reza Nekovei

In most communication systems, the Fourier transform is the central tool for processing the signals used in the system. The FFT/IFFT is used to compute it quickly, but a direct FFT/IFFT implementation still introduces delay and occupies a large area [1]. An architecture optimized for both delay and area is therefore needed.

The main source of delay and complexity is the complex multiplications by the twiddle factors ($w_N^r$) in the Fast Fourier Transform. A pipeline architecture reduces the number of complex multiplications [1], although some delay remains in that architecture as well.

Here we propose a pipeline architecture that removes many of the complex multiplications by the twiddle factors. It is built from delay elements, switch elements, and the basic butterfly structure. The input data stream is divided into two half streams that are processed at the same time, and the output is likewise produced as two half streams. The number of complex multiplications is greatly reduced, which lowers the cost of implementation.

ACKNOWLEDGEMENT

I take this opportunity to express my gratitude to my advisory committee chair, Dr. Reza Nekovei, for his invaluable guidance throughout this thesis work and my career choices, and for answering every question very patiently.

I also extend my appreciation to the members of the supervisory committee, Dr. Lifford McLauchlan and Dr. Sung-won Park. This thesis would not have been possible or successful without their invaluable instructions.

I would like to thank all the faculty members and staff of Texas A&M University-Kingsville for the timely response and help I received during my course of study here.

I would like to thank my sister and all my friends for being there for me with constant encouragement.

I wish to express my deep love and thankfulness to my dear parents. Their selfless and great love has been a significant impetus throughout my student life.

Finally, I thank God for pouring out his wisdom and knowledge on me.

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENT

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTER I. INTRODUCTION

CHAPTER II. DIFFERENT FFT ALGORITHMS

CHAPTER III. IMPLEMENTATION OF DIT FFT AND PIPELINED FFT

CHAPTER IV. RESULTS

CHAPTER V. CONCLUSION

REFERENCES

APPENDIX A. CODE FOR IMPLEMENTATION OF DIT FFT AND PIPELINED FFT

VITA
LIST OF FIGURES

Fig. 1.1. Quartus-II Work Flow

Fig. 2.1. Basic Butterfly Structure

Fig. 2.2. Modified Butterfly Structure

Fig. 2.3. Butterfly Structure used in the DIF FFT

Fig. 2.4. Basic Decimation in Time FFT

Fig. 2.5. 8-Point Decimation in Time FFT

Fig. 2.6. Sequence of input in DIT FFT

Fig. 2.7. Radix-4 DIT FFT

Fig. 2.8. Input to Output sequence generation of Decimation in Frequency FFT

Fig. 2.9. Butterfly structure used in DIF FFT

Fig. 2.10. 8-Point Radix-2 Decimation in Frequency FFT

Fig. 2.11. Radix-4 DIF FFT

Fig. 2.12. Basic Pipeline architecture

Fig. 2.13. R2MDC Pipeline architecture of 8-Point

Fig. 3.1. Stage-1 Butterfly

Fig. 3.2. Second Stage Butterfly

Fig. 3.3. Butterfly used in third stage

Fig. 3.4. Pipeline architecture of 64-Point FFT using R2MDC

Fig. 4.1. Simulation Output for 8-Point DIT FFT

Fig. 4.2. RTL View of 8-Point DIT FFT

Fig. 4.3. Resource Utilization Summary

Fig. 4.4. Worst Case Delay

Fig. 4.5. Simulation Output for 64-Point Pipeline FFT
LIST OF TABLES

Table. 2.1. Bit Reversal Order

Table. 3.1. Comparison between Normal DFT and FFT
CHAPTER I

INTRODUCTION

In most communication systems [1] and control systems, the frequency spectrum of a signal is needed to determine the frequency range of the signal and hence whether the system can use it for further processing.

Most signals are represented in the time domain, that is, as a variation with respect to time. To obtain the frequency domain representation, which describes how the signal varies with frequency [2], a transformation from the time domain to the frequency domain is needed. Several transformation techniques are available [3]:

1) Fourier series [3]

2) Fourier Transform (FT)

3) Laplace Transform

4) Z-transform

The Fourier series applies only to repetitive (periodic) signals, so we use the Fourier Transform, which can be applied to both periodic and non-periodic signals [3].

The FT decomposes a signal into a sum of sinusoidal components. It can be computed with the Discrete Fourier Transform (DFT), in which a finite sequence of equally spaced samples of a function is transformed into a finite combination of complex sinusoids [4].

The Discrete Fourier Transform can be expressed as [4]:

$$F(k) = \sum_{n=0}^{N-1} f(n)\, e^{-j\left(\frac{2\pi}{N}\right)kn} \tag{1.1}$$

Here N is the number of samples in the signal,

F(k) is the frequency domain signal, and

f(n) is the time domain signal.

The frequency domain output F(k) is a discrete signal, and the input must also be a discrete signal.
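As an illustration of Eq. (1.1), the following is a minimal Python sketch that evaluates the DFT sum directly. It is a software reference model only; the function name `dft` is an assumption made for this example and is not taken from the thesis code, which is written in Verilog.

```python
import cmath

def dft(f):
    """Directly evaluate F(k) = sum_{n=0}^{N-1} f(n) * exp(-j*2*pi*k*n/N)."""
    N = len(f)
    return [
        sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# A unit impulse has a flat spectrum; a constant signal has only a k = 0 term.
impulse_spectrum = dft([1, 0, 0, 0])
constant_spectrum = dft([1, 1, 1, 1])
```

Each of the N outputs is a sum over N inputs, which is where the N-squared multiplication count of the direct DFT comes from.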

The Fourier Transform can also be computed using the Discrete-Time Fourier Transform (DTFT), in which the input signal is discrete and the output frequency domain signal is continuous.

The DTFT is expressed as [4]:

$$F(e^{j\omega}) = \sum_{n=-\infty}^{\infty} f(n)\, e^{-j\omega n} \tag{1.2}$$

where $F(e^{j\omega})$ is the frequency domain signal, which is continuous and periodic,

f(n) is the time domain signal, which is discrete, and

ω is the frequency.

If the input is a continuous signal, it must first be sampled to obtain a discrete signal, to which the DTFT is then applied to get the frequency domain signal.

Applying the DFT or DTFT to a time domain signal yields its frequency domain representation. For an N-point sequence, the DFT requires [3] [4]:

Number of complex multiplications in the DFT: $N^2$

Number of complex additions in the DFT: $N(N-1)$

For an 8-point input sequence, the conversion to the frequency domain requires:

64 complex multiplications

56 complex additions

As the number of samples in the input sequence increases, the number of multiplications grows very rapidly (quadratically in N).

For a 16-point sequence, the conversion requires:

Number of complex multiplications: 256

Number of complex additions: 240

To reduce the number of complex multiplications and additions [4], we use the FFT technique to compute the frequency domain signal. The FFT requires:

Number of complex multiplications: $(N/2)\log_2 N$ [5]

Number of complex additions: $N\log_2 N$ [5]
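The operation counts quoted above can be tabulated with a short Python sketch. The helper names `dft_ops` and `fft_ops` are hypothetical and exist only for this illustration; N is assumed to be a power of 2.

```python
def dft_ops(N):
    """Return (complex multiplications, complex additions) for a direct N-point DFT."""
    return N * N, N * (N - 1)

def fft_ops(N):
    """Return (complex multiplications, complex additions) for a radix-2 FFT.

    Assumes N is a power of 2, so log2(N) equals bit_length() - 1.
    """
    stages = N.bit_length() - 1
    return (N // 2) * stages, N * stages

# Compare the two methods for the sizes discussed in the text.
for N in (8, 16, 64):
    print(N, dft_ops(N), fft_ops(N))
```

For N = 8 this gives 64 and 56 operations for the direct DFT against 12 and 24 for the FFT.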

The Fast Fourier Transform is most commonly computed using the Cooley-Tukey algorithm [6]. (It is distinct from the prime-factor, or Good-Thomas, algorithm, which requires the factors of N to be relatively prime.)

The FFT can be computed with a radix-r algorithm, where r is an integer greater than 1; an N-point FFT can be calculated using radix-2, radix-4, radix-8, and so on. The FFT can be implemented in two ways: decimation in time (DIT FFT) and decimation in frequency (DIF FFT).

To increase speed, a pipeline architecture [6] [7] is used to compute the FFT; in particular, the Multi-path Delay Commutator (MDC) architecture [6] [8] [9] [10] is used in communication systems. The pipeline architecture also uses a butterfly element, which can be built with different radices such as radix-2 and radix-4, and the data elements are retrieved using memory addressing [11] [12] [13].

In this thesis, the DIT FFT for an 8-point sequence is implemented in Verilog and synthesized in Quartus-II [14], and the pipeline architecture for 64 points is implemented in Verilog and simulated in ModelSim [14].

1.2 Radix:

The radix is the number of elements taken in at a time and processed by one butterfly. For radix-2, the butterfly takes 2 input elements, performs the addition and multiplication operations, and produces 2 outputs. For radix-4, the butterfly takes 4 input elements and produces 4 output elements at a time.

1.3 Verilog:

Hardware description languages differ from software programming languages. The most widely used hardware description languages are:

1) VHDL

2) Verilog HDL

Verilog is a hardware description language with a syntax similar to C; it is standardized as IEEE 1364. Unlike software languages, hardware description languages must model signal propagation and time.

1.4 Quartus-II

The designed code is synthesized using Quartus-II. Before synthesis, the design is first simulated functionally in ModelSim. Synthesis and implementation follow, including placement of the circuit elements and pin allocation, and finally timing analysis is performed to determine the worst-case delay in the circuit.

After synthesis, different views of the designed circuit are available:

1) RTL view

2) State machine view

The design can also be programmed into the hardware using Quartus-II.

In the first step, the design is coded in a hardware description language. Here we use Verilog, but VHDL could also be used, and synthesis and implementation could also be done using Xilinx software.

The flow of the synthesis steps is shown below [14]:

[Flowchart, boxes in order: Design; Verilog coding; functional simulation using ModelSim; synthesis and implementation; place and route; timing analysis and timing closure; simulation; programming and configuration]

Fig. 1.1. Quartus-II Work Flow [14]

The remainder of the thesis is organized into the following chapters and subsections:

1) Chapter II: deals with theoretical description of different types of FFT algorithms

a) Cooley-Tukey Method

b) DIT FFT

c) DIF FFT

d) Pipelined FFT architecture

2) Chapter III: deals with Implementation of algorithms

a) Implementation of DIT FFT of 8-Point input

b) Implementation of Pipelined FFT of 64-Point

3) Chapter IV: deals with Results

a) Simulation Results

b) RTL view

4) Chapter V: Conclusion

CHAPTER II

DIFFERENT FFT ALGORITHMS

2.1 Cooley-Tukey Method:

This method is the most widely used for computing the FFT. The DFT length N is expressed as a product of two factors N1 and N2 [3]:

N = N1 * N2

The N-point DFT is then computed by breaking it into N1 DFTs of size N2, or into N2 DFTs of size N1.

Typically one of N1 and N2 is small compared with the other. If N1 is the radix, the FFT is computed by decimation in time; if N2 is the radix, it is computed by decimation in frequency.

The operation is carried out recursively using radix-2 DFTs. In radix-2 DIT, the odd-indexed transform is multiplied by a phase factor, called the twiddle factor, and then addition and subtraction are performed. This combination of the even and odd transforms is a size-2 DFT, called a butterfly.

The Fast Fourier Transform can be done by using two different methods[4]:

1) DIT Fast Fourier Transform

2) DIF Fast Fourier Transform

The computation is divided into a number of stages, given by:

$$v = \log_2 N$$

where N is the number of input samples in the time domain.

An N-point DFT with even N is computed from two (N/2)-point DFTs; each (N/2)-point DFT is in turn computed from two (N/4)-point DFTs, and so on until only 2-point DFTs remain.

The Fast Fourier Transform is built from the butterfly structure, which takes two forms: one is used in the DIT FFT and the other in the DIF FFT.

Butterfly used in DIT FFT:

[Figure: two-input butterfly with inputs a and b, outputs c and d, and branch weights $w_N^{r}$ and $w_N^{(r+N/2)}$]

Fig. 2.1. Basic Butterfly Structure

Here a and b are the input samples of the butterfly, and $w_N^{r}$ and $w_N^{(r+N/2)}$ are the twiddle factors.

The outputs of the butterfly structure are:

$$c = a + b\,w_N^{r} \tag{2.1}$$

$$d = a + b\,w_N^{(r+N/2)} \tag{2.2}$$

Twiddle Factor:

The twiddle factor is a complex root of unity used in the butterfly operation to compute the discrete Fourier transform:

$$w_N^{r} = e^{-j2\pi r/N} \tag{2.3}$$

The butterfly requires two complex multiplications and two complex additions. The number of complex multiplications can be reduced by using the symmetry property.

The symmetry property is

$$w_N^{(r+N/2)} = w_N^{r}\, w_N^{N/2} \tag{2.4}$$

The factor $w_N^{N/2}$ is equal to $e^{-j\pi}$. From Euler's formula, $e^{-j\theta} = \cos\theta - j\sin\theta$, so

$$e^{-j\pi} = \cos\pi - j\sin\pi = -1 \tag{2.5}$$

The twiddle factor $w_N^{(r+N/2)}$ therefore becomes $-w_N^{r}$.

With this, the butterfly can be modified as shown below:

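[Figure: butterfly with inputs a and b and outputs c and d; b is multiplied by $w_N^{r}$, and the branches into c and d carry weights 1 and -1]

Fig. 2.2. Modified Butterfly Structure

The outputs of the modified butterfly are

$$c = a + b\,w_N^{r} \tag{2.6}$$

$$d = a - b\,w_N^{r} \tag{2.7}$$

This requires only one complex multiplication and two complex additions.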
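The saving from the symmetry property can be checked numerically. The sketch below, a software model with hypothetical function names, computes the butterfly both ways: the basic form uses two twiddle multiplications, while the modified form computes the product once and reuses it with opposite signs.

```python
import cmath

def twiddle(N, r):
    """Return w_N^r = exp(-j*2*pi*r/N)."""
    return cmath.exp(-2j * cmath.pi * r / N)

def dit_butterfly_basic(a, b, N, r):
    """Basic DIT butterfly: two complex multiplications."""
    return a + b * twiddle(N, r), a + b * twiddle(N, r + N // 2)

def dit_butterfly_modified(a, b, N, r):
    """Modified DIT butterfly: one complex multiplication, reused for both outputs."""
    t = b * twiddle(N, r)
    return a + t, a - t

# Both forms give the same outputs for arbitrary complex inputs.
basic = dit_butterfly_basic(1 + 2j, 3 - 1j, 8, 1)
modified = dit_butterfly_modified(1 + 2j, 3 - 1j, 8, 1)
```

In hardware the reuse of the product corresponds to one complex multiplier per butterfly instead of two.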

Butterfly used in DIF FFT [1]:

The butterfly structure used in the DIF FFT is shown below:

[Figure: butterfly with inputs a and b and outputs c and d; the sum branch has weight 1, and the difference branch has weight -1 followed by multiplication by $w_N^{r}$]

Fig. 2.3. Butterfly structure used in the DIF FFT

The outputs of the butterfly structure are given by

$$c = a + b \tag{2.8}$$

$$d = (a - b)\,w_N^{r} \tag{2.9}$$

This requires two complex additions and one complex multiplication.
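For comparison with the DIT butterfly, a minimal Python sketch of the DIF butterfly follows. The function names are assumptions for illustration; the point is that the twiddle multiplication is applied after the subtraction.

```python
import cmath

def twiddle(N, r):
    """Return w_N^r = exp(-j*2*pi*r/N)."""
    return cmath.exp(-2j * cmath.pi * r / N)

def dif_butterfly(a, b, N, r):
    """DIF butterfly: c = a + b, d = (a - b) * w_N^r."""
    return a + b, (a - b) * twiddle(N, r)

# With r = 0 the twiddle factor is 1, so the outputs are the plain sum and difference.
c0, d0 = dif_butterfly(3, 1, 8, 0)
# With N = 8 and r = 2 the twiddle factor is -j.
c2, d2 = dif_butterfly(1, 0, 8, 2)
```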

Decimation in Time FFT:

In the DIT FFT, the input is given in bit-reversed order and the output is produced in natural order.

Decimation in Frequency FFT:

In the DIF FFT, the input is in natural order and the output is produced in bit-reversed order.

Bit reversal order:

Bit-reversed order is generated by exchanging the first and last bits, then the second bit with the second-to-last bit, and so on toward the middle of the bit sequence.

X (b0 b1 b2 b3 b4) ------------ original order of bits

To obtain the bit-reversed order:

1) First exchange the bits b4 and b0:

X (b0 b1 b2 b3 b4) → X (b4 b1 b2 b3 b0)

2) Then exchange the bits b3 and b1:

X (b4 b1 b2 b3 b0) → X (b4 b3 b2 b1 b0)

3) The result, X (b4 b3 b2 b1 b0), is the bit-reversed order.

For an 8-point input, the bit-reversed order is obtained as follows.

The input samples are {x(0), x(1), x(2), x(3), x(4), x(5), x(6), x(7)}.

The bit-reversed order is:

Original sample    Binary representation    Bit-reversed order

X(0)               X(000)                   X(000) = X(0)

X(1)               X(001)                   X(100) = X(4)

X(2)               X(010)                   X(010) = X(2)

X(3)               X(011)                   X(110) = X(6)

X(4)               X(100)                   X(001) = X(1)

X(5)               X(101)                   X(101) = X(5)

X(6)               X(110)                   X(011) = X(3)

X(7)               X(111)                   X(111) = X(7)

Table. 2.1. Bit Reversal order
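The bit-reversal mapping in the table can be reproduced with a few lines of Python. This is an illustrative sketch; the names `bit_reverse` and `bit_reversed_order` are hypothetical, and N is assumed to be a power of 2.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result

def bit_reversed_order(N):
    """Return the bit-reversed index for each position 0..N-1 (N a power of 2)."""
    bits = N.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(N)]

order8 = bit_reversed_order(8)
```

For N = 8 the result matches the right-hand column of the table: 0, 4, 2, 6, 1, 5, 3, 7.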


2.2 DIT FFT:

In this algorithm, the sequence x(n) is broken down into successively smaller subsequences. The principle of the decimation in time FFT is explained assuming that the number of input points N is a power of 2:

$$N = 2^{v}$$

where v is the number of stages.

The sequence x(n) is split into two parts, one containing the even-indexed samples and the other the odd-indexed samples.

The frequency domain representation is obtained from the time domain by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, w_N^{nk} \tag{2.10}$$

Here X(k) is the frequency domain representation of the signal x(n).

Splitting the signal into two subsequences gives:

$$X(k) = \sum_{n\ \text{even}} x(n)\, w_N^{nk} + \sum_{n\ \text{odd}} x(n)\, w_N^{nk} \tag{2.11}$$

Substituting n = 2r for even n and n = 2r + 1 for odd n, where r varies from 0 to (N/2) - 1, the equation becomes:

$$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, w_N^{2rk} + \sum_{r=0}^{N/2-1} x(2r+1)\, w_N^{(2r+1)k} \tag{2.12}$$

Since $w_N^{2rk} = w_{N/2}^{rk}$, the factor $w_N^{k}$ can be taken out of the second sum. The frequency domain output is therefore the (N/2)-point DFT of the even sequence, G(k), plus $w_N^{k}$ times the (N/2)-point DFT of the odd sequence, H(k): $X(k) = G(k) + w_N^{k} H(k)$.

The decimation in time FFT process is shown below:

[Figure: the even (N/2)-point DFT produces G(0) to G(3) and the odd (N/2)-point DFT produces H(0) to H(3); these are combined through the twiddle factors $w_N^{0}$ to $w_N^{7}$ to give the output frequency responses]

Fig. 2.4. Basic Decimation in Time FFT

Dividing the input sequence into even and odd parts is achieved by supplying the input in bit-reversed order; the output frequency responses then appear in natural order, X(0), X(1), X(2), ..., X(7).

Each (N/2)-point DFT is again divided into two (N/4)-point DFTs, and so on, until 2-point DFTs are reached.
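The even/odd decomposition can be written as a short recursive routine. The following Python sketch is a software reference model under the assumption that the input length is a power of 2; the name `fft_dit` is chosen for this example and does not come from the thesis code.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of 2)."""
    N = len(x)
    if N == 1:
        return list(x)
    G = fft_dit(x[0::2])  # (N/2)-point DFT of the even-indexed samples
    H = fft_dit(x[1::2])  # (N/2)-point DFT of the odd-indexed samples
    X = [0] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * H[k]  # single twiddle multiplication
        X[k] = G[k] + t           # upper butterfly output
        X[k + N // 2] = G[k] - t  # lower output, using the symmetry property
    return X

spectrum = fft_dit([1, 2, 3, 4, 5, 6, 7, 8])
```

The recursion performs the even/odd split implicitly; an iterative hardware implementation achieves the same ordering by feeding the input in bit-reversed order.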

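The complete decimation in time flow graph for an 8-point sequence is shown below:

[Figure: 8-point DIT flow graph with inputs in bit-reversed order X[0], X[4], X[2], X[6], X[1], X[5], X[3], X[7] and outputs in natural order X[0] to X[7]; stage 1 uses the twiddle factor $w_8^{0}$, stage 2 uses $w_8^{0}$ and $w_8^{2}$, and stage 3 uses $w_8^{0}$, $w_8^{1}$, $w_8^{2}$, and $w_8^{3}$]

Fig. 2.5. 8-Point DIT FFT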

The DIT FFT can be done using different radices:

1) Radix-2

2) Radix-4

Radix-2 DIT FFT:

In the radix-2 DIT FFT, the input sequence is divided as shown below:

8-Point:  0 1 2 3 4 5 6 7

4-Point:  0 2 4 6 | 1 3 5 7

2-Point:  0 4 | 2 6 | 1 5 | 3 7

Fig. 2.6. Sequence of Input in DIT FFT

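Radix-4 FFT:

The basic radix-4 butterfly diagram is shown below:

[Figure: four-input radix-4 butterfly with input twiddle factors $w_N^{0}$, $w_N^{q}$, $w_N^{2q}$, and $w_N^{3q}$ and internal branch weights 1, -1, and -j]

Fig. 2.7. Radix-4 DIT FFT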

2.3 DIF FFT [1]:

In the DIF FFT, the input is in natural order and the output is in bit-reversed order.

The N-point sequence is divided into two (N/2)-point sequences:

The first half sequence is x(n), where 0 ≤ n ≤ (N/2) - 1, and

the second half sequence is x(n + N/2), where 0 ≤ n ≤ (N/2) - 1.

The decimation in Frequency FFT can be done by using different Radices:

1) Radix-2

2) Radix-4

Radix-2 DIF FFT:

Here the N-point sequence is divided into two parts, and each part is divided again in the same way, as shown below:

[Figure: the 8-point sequence 0 to 7 passes through one butterfly computation and splits into two 4-point sequences (0 1 2 3 and 0 1 2 3); each passes through a butterfly computation and splits into 2-point sequences (0 1); after the final butterflies the outputs appear in the order 0 4 2 6 1 5 3 7]

Fig. 2.8. Input to output sequence generation of DIF FFT

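Computing the DFT of an N-point input sequence x(n):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, w_N^{nk} \tag{2.13}$$

$$X(k) = \sum_{n=0}^{N/2-1} x(n)\, w_N^{nk} + \sum_{n=N/2}^{N-1} x(n)\, w_N^{nk} \tag{2.14}$$

Replacing n by n + N/2 in the second sum and using $w_N^{(N/2)k} = (-1)^{k}$, the above equation becomes:

$$X(k) = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^{k}\, x\!\left(n + \tfrac{N}{2}\right) \right] w_N^{nk}, \quad k = 0, 1, 2, \ldots, N-1 \tag{2.15}$$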
Basic Butterfly structure used in DIF FFT of Radix-2:

The butterfly structure used in the DIF FFT differs from the one used in the DIT FFT. The basic difference is that in the DIT butterfly the multiplication is performed before the additions, whereas in the DIF butterfly it is performed after the additions.

[Figure: butterfly with inputs x(n) and x(n + N/2); the upper output is $x(n) + x(n + N/2)$ and the lower output is $[x(n) - x(n + N/2)]\, w_N^{n}$]

Fig. 2.9. Butterfly structure used in DIF FFT

This involves two complex additions and one complex multiplication.
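To show how the DIF butterfly produces a bit-reversed output, here is a minimal in-place Python sketch of a radix-2 DIF FFT. It is a software model only; the name `fft_dif` and the loop structure are assumptions made for this illustration, and the input length is assumed to be a power of 2.

```python
import cmath

def fft_dif(x):
    """In-place radix-2 decimation-in-frequency FFT.

    Input is in natural order; the returned list holds X(k) in bit-reversed order.
    """
    a = list(x)
    N = len(a)
    span = N // 2  # distance between the two inputs of each butterfly
    while span >= 1:
        for start in range(0, N, 2 * span):
            for n in range(span):
                w = cmath.exp(-2j * cmath.pi * n / (2 * span))  # twiddle of this sub-DFT
                top = a[start + n]
                bottom = a[start + n + span]
                a[start + n] = top + bottom               # addition first
                a[start + n + span] = (top - bottom) * w  # multiplication after subtraction
        span //= 2
    return a

reversed_spectrum = fft_dif([1, 2, 3, 4, 5, 6, 7, 8])
```

For an 8-point input, position i of the result holds X(k) with k taken from the sequence 0, 4, 2, 6, 1, 5, 3, 7.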

8-Point DIF FFT:

The FFT is performed using decimation in frequency as shown below:

[Figure: 8-point radix-2 DIF flow graph with inputs in natural order X(0) to X(7) and outputs in bit-reversed order X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7); the first stage applies the twiddle factors $w_8^{0}$, $w_8^{1}$, $w_8^{2}$, and $w_8^{3}$ to the lower half, which feeds two 4-point DFTs, each built from 2-point DFTs]

Fig. 2.10. 8-Point Radix-2 DIF FFT

Radix- 4 FFT:

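The basic radix-4 FFT butterfly diagram is shown below:

[Figure: radix-4 butterfly with inputs X(n), X(n + N/4), X(n + N/2), and X(n + 3N/4), outputs X(4r), X(4r + 1), X(4r + 2), and X(4r + 3), and output twiddle factors $w^{n}$, $w^{2n}$, and $w^{3n}$]

Fig. 2.11. Radix-4 DIF FFT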

Here the FFT length is $N = 4^{v}$, where v is the number of stages, and the twiddle factors can be expressed as:

$$w_N^{(N/4)k} = \left(\cos\tfrac{\pi}{2} - j\sin\tfrac{\pi}{2}\right)^{k} = (-j)^{k} \tag{2.16}$$

$$w_N^{(N/2)k} = \left(\cos\pi - j\sin\pi\right)^{k} = (-1)^{k} \tag{2.17}$$

$$w_N^{(3N/4)k} = \left(\cos\tfrac{3\pi}{2} - j\sin\tfrac{3\pi}{2}\right)^{k} = (j)^{k} \tag{2.18}$$
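The three radix-4 twiddle values can be verified numerically. The sketch below evaluates the twiddle factor at N/4, N/2, and 3N/4 for k = 1; the helper name `twiddle` and the choice N = 16 are assumptions for illustration. The values come out as -j, -1, and +j, so the radix-4 butterfly needs no real multiplier for them.

```python
import cmath

def twiddle(N, m):
    """Return w_N^m = exp(-j*2*pi*m/N)."""
    return cmath.exp(-2j * cmath.pi * m / N)

N = 16  # any multiple of 4 gives the same three values
quarter = twiddle(N, N // 4)            # close to -j
half = twiddle(N, N // 2)               # close to -1
three_quarter = twiddle(N, 3 * N // 4)  # close to +j
```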

2.4 Pipelined R2MDC:

There are three types of hardware architecture for the FFT:

1) Single Butterfly Architecture

2) Pipeline Architecture [15] [16]

3) Parallel Architecture [15] [16] [17]

Of these, the pipeline architecture [8] is particularly attractive for multimedia communication systems that include an FFT processor.

To reduce the complex multiplications further, we propose a pipeline architecture that offers low latency, low power consumption, high throughput, and small area.

The pipeline architecture comes in several types:

1) Multi-path Delay Commutator (MDC)

2) Single-path Delay Commutator (SDC)

3) Single-path Delay Feedback (SDF)

The MDC architecture accepts multiple input data streams in parallel, which gives it high throughput, but its hardware utilization is low.

The single-path delay feedback architecture is the best solution for a single input data stream and is used when the memory requirement must be small. In the SDC architecture [18], the number of adders is very low, but the memory requirement is larger and the output is produced in bit-reversed order, so additional operations are required to restore the natural order.

The basic pipeline architecture is shown below:

Butterfly computation → Delay commutator → Butterfly computation → Delay commutator → Butterfly computation

Fig. 2.12. Basic Pipeline architecture

The input data stream is divided into two parallel data streams, which are processed by delay elements, butterfly elements, and processing elements. In MDC architectures, the utilization of the resources depends on the radix used.

With radix-r, the utilization of the resources is 1/r, where r is the radix; with radix-2, the utilization of the resources is 50%.

With radix-2, the architecture is called the R2MDC pipeline architecture, shown below [19] [20]:

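[Figure: 8-point R2MDC pipeline; the input is split into the two streams X3 X2 X1 X0 and X7 X6 X5 X4, which pass through delay elements (R), radix-2 butterflies (BF), and switches (S), with multiplication by -j and by a twiddle factor between stages]

Fig. 2.13. R2MDC Pipeline architecture of 8-Point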

R represents a delay element, BF is the radix-2 butterfly structure used in the FFT, and S is a switch.

The pipeline architecture can also be implemented with other radices; with radix-4 it is called R4MDC. Using the pipelined R2MDC architecture reduces the number of complex multipliers compared with the normal DIT FFT and DIF FFT.

CHAPTER III

IMPLEMENTATION OF DIT FFT AND PIPELINED FFT

The DIT FFT and the pipelined FFT are implemented in the Verilog hardware description language and simulated using ModelSim.

3.1 Implementation of DIT FFT:

We start with the implementation of the 8-point DIT FFT. It can be implemented with different radices; we use the radix-2 butterfly, for which the number of stages is $\log_2 N$, where N is the number of points in the input sequence [21]. An 8-point sequence therefore needs 3 stages, and each stage uses different butterflies.

First Stage:

The first stage consists of four identical butterflies, as shown below:

[Figure: stage-1 butterfly with inputs X0 and X1 and outputs X10 and X11; X1 is multiplied by $w_N^{r}$, and the branch into X11 carries weight -1]

Fig. 3.1. Stage-1 Butterfly

The twiddle factor consists of a real part r and an imaginary part i, and can be expressed as

$w_N^{0}$ = r + j i    (3.1)

The outputs X10 and X11 also have real and imaginary parts, which are obtained separately:

X10 = X10r + j X10i    (3.2)

If the real and imaginary parts are not separated, the output of the butterfly is:

X10 = X0 + (X1 * $w_N^{r}$)    (3.3)

Writing the twiddle factor in terms of its real and imaginary parts, the equation becomes:

X10 = X0 + (X1 * (r + j i))    (3.4)

which can be expressed as:

X10 = X0 + (X1 * r) + j (X1 * i)    (3.5)

Since X10 = X10r + j X10i, for real input samples X0 and X1:

X10r = X0 + (X1 * r)    (3.6)

X10i = X1 * i    (3.7)

The second output, X11 = X11r + j X11i, is obtained from:

X11 = X0 - (X1 * $w_N^{r}$)    (3.8)

X11r = X0 - (X1 * r)    (3.9)

X11i = -(X1 * i)    (3.10)

The negative value is obtained by taking the two's complement; the two's complement of a binary number is calculated by taking the one's complement and adding 1 to it:

X11r = X0 + (~(X1 * r) + 1)    (3.11)

X11i = ~(X1 * i) + 1    (3.12)
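The two's complement step can be illustrated in software. The sketch below models a fixed-width register in Python; the function names and the default 16-bit width are assumptions for this example, since the word length used in the thesis design is not stated in this section.

```python
def twos_complement_negate(value, width=16):
    """Negate a `width`-bit word: invert all bits (one's complement), then add 1."""
    mask = (1 << width) - 1
    return ((~value) + 1) & mask

def to_signed(value, width=16):
    """Interpret a `width`-bit word as a signed two's complement integer."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        return value - (1 << width)
    return value

# Subtraction as used in the butterfly: X0 - P is computed as X0 + (~P + 1).
x0, product = 20, 5
difference = to_signed((x0 + twos_complement_negate(product, 8)) & 0xFF, 8)
```

Masking to the register width models the carry out of the top bit being discarded, which is what makes the addition behave as a subtraction.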

The inputs to the first stage are given in bit-reversed order. Since the first stage has four butterflies, their inputs are as follows [22]:

The first butterfly takes X(0) and X(4).

The second butterfly takes X(2) and X(6).

The third butterfly takes X(1) and X(5).

The fourth butterfly takes X(3) and X(7).

In the first stage the twiddle factor used is $w_8^{0}$, which has real part equal to 1 and imaginary part equal to 0.

Second Stage:

The second stage uses a four-input butterfly, shown below:

[Figure: four-input second-stage butterfly with inputs X10, X11, X12, and X13 and outputs X20, X21, X22, and X23; X12 is multiplied by $w_N^{0}$ and X13 by $w_N^{2}$, and the branches into X22 and X23 carry weight -1]

Fig. 3.2. Second Stage Butterfly

Here the twiddle factors used are $w_8^{0}$ and $w_8^{2}$. The second twiddle factor, $w_8^{2}$ = -j, has imaginary part equal to -1 and real part equal to 0.

The outputs of this butterfly are obtained as

X20 = X10 + (X12 * $w_8^{0}$)    (3.13)

X22 = X10 - (X12 * $w_8^{0}$)    (3.14)

These two equations have the same form as those of the first-stage butterfly, so the same butterfly is reused. The imaginary parts of X10 and X12 are not considered, because the imaginary part of X12 is zero and the imaginary part of the twiddle factor is also zero [23] [24].

The other two outputs involve multiplication by the twiddle factor whose imaginary part is -1, so both real and imaginary parts must be considered: we use (X11r, X11i) and (X13r, X13i). The operation in this butterfly is carried out only by swapping and inverting real and imaginary parts.

X21 = X11 + (X13 * $w_8^{2}$)    (3.15)

X21 = (X11r + j X11i) + ((X13r + j X13i) * (-j))    (3.16)

The real part is (X11r + X13i). Here X13i is equal to 0, so the real part is X11r.

The imaginary part is (X11i - X13r). Here X11i is equal to 0, so the imaginary part is -X13r, which is calculated using the two's complement, that is, the one's complement plus one [25].

The other output obtained from X11 and X13 is:

X23 = X11 - (X13 * $w_8^{2}$)    (3.17)

X23 = (X11r + j X11i) - ((X13r + j X13i) * (-j))    (3.18)

The real part is (X11r - X13i); here X13i is equal to zero, so

X23r = X11r

The imaginary part is (X11i + X13r); here X11i is equal to zero, so the imaginary part is

X23i = X13r

The four-input butterfly is a combination of two 2-input butterflies, one using the twiddle factor and the other operating directly on the real and imaginary values of the inputs.

Stage 3:

This stage uses four types of butterflies:

1) Butterfly using the twiddle factor $w_8^{0}$

2) Butterfly using the twiddle factor $w_8^{2}$

3) Butterfly using the twiddle factor $w_8^{1}$

4) Butterfly using the twiddle factor $w_8^{3}$

The butterflies using $w_8^{0}$ and $w_8^{2}$ have already been explained above. The butterflies using $w_8^{1}$ and $w_8^{3}$ use the values (0.707 - j0.707) and (-0.707 - j0.707), respectively; multiplication by 0.707 can be carried out using right-shift operations.
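One common way to realize multiplication by 0.707 with shifts is to sum a few right-shifted copies of the operand. The Python sketch below uses the approximation 1/2 + 1/8 + 1/16 + 1/64 + 1/256 = 0.70703125; this particular choice of shift terms is an assumption for illustration and is not necessarily the set used in the thesis code.

```python
def mul_0707(x):
    """Approximate x * 0.707 for an integer x using right shifts and additions.

    Uses 1/2 + 1/8 + 1/16 + 1/64 + 1/256 = 0.70703125 (an assumed set of terms).
    Each shift truncates, so the result can be a few counts below the exact product.
    """
    return (x >> 1) + (x >> 3) + (x >> 4) + (x >> 6) + (x >> 8)

approx_1024 = mul_0707(1024)
approx_256 = mul_0707(256)
```

Adding more shift terms improves the accuracy at the cost of extra adders, so the number of terms is a trade-off between precision and area.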

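[Figure: third-stage butterfly with inputs X21 to X28 and outputs Y0 to Y7; inputs X25, X26, X27, and X28 are multiplied by $w_8^{0}$, $w_8^{1}$, $w_8^{2}$, and $w_8^{3}$, and the branches into Y4 to Y7 carry weight -1]

Fig. 3.3. Butterfly used in third stage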

28
The outputs from the butterfly can be obtained as:

Y0=(X21+(X25*𝑤80 ))………………....….. (3.19)

Y4=(X21-(X25*𝑤80 ))……………………… (3.20)

This can be done by using the first stage butterfly only and the outputs from the butterfly

using the Twiddle Factor𝑤82 is done by using taking the two‟s complement of number and the

rest of the outputs can be obtained by using the shifting operation butterflies.

The output Y1 can be obtained by using the

Y1=(X22+(X26*𝑤81 ))…………………. (3.21)

The X26 is having the real and imaginary parts and the Twiddle Factor also consists of

the real and imaginary values and it can be shown as

Y1= (X22r+jX22i) + ((X26r+jX26i)*(0.707-j (0.707)))………… (3.22)

Y1= (X22r+jX22i) + ((X26r*0.707) +(X26i*0.707)) +j (X26i*0.707-(X26r*0.707)))

Y1r= (X22r+ ((X26r*0.707) + (X26i * 0.707))…………. (3.23)

Y1i= (X22i+ ((X26i*0.707) - (X26r*0.707)))………….. (3.24).

The internal terms are obtained from the internal products and by taking the two's complement. The other output of this butterfly, which uses the same Twiddle Factor $W_8^1$, can be expressed as:

Y5 = (X22 - (X26 * $W_8^1$))    (3.25)

It can be expanded as

Y5 = (X22r + jX22i) - ((X26r + jX26i) * (0.707 - j(0.707)))

Y5 = (X22r + jX22i) - (((X26r*0.707) + (X26i*0.707)) + j((X26i*0.707) - (X26r*0.707)))

The real part is equal to Y5r = X22r - ((X26r*0.707) + (X26i*0.707))    (3.26)

The imaginary part is equal to Y5i = X22i - ((X26i*0.707) - (X26r*0.707))    (3.27)

These outputs need four internal products, and the additions and subtractions are done using the two's complement.
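As a minimal sketch of how a product with $W_8^1$ can be formed with a scaled constant and a right shift, the module below multiplies a complex sample by 0.707 - j(0.707). The module name `mul_w8_1`, its ports, and the use of signed arithmetic are assumptions made for illustration and are not taken from Appendix A; only the constant 181 (0.707 scaled by 256) mirrors the `w1r` parameter used there. Because the real and imaginary coefficients have the same magnitude, this sketch factors the four internal products into two, which is a different arrangement from the one described in the text.

```verilog
// Hypothetical sketch: (xr + j*xi) * (0.707 - j*0.707) in fixed point.
// Real part = 0.707*(xr + xi), imaginary part = 0.707*(xi - xr).
module mul_w8_1 #(parameter W = 8) (
    input  signed [W-1:0] xr, xi,   // real and imaginary parts of the input
    output signed [W:0]   yr, yi    // one extra bit avoids overflow of the result
);
    // 181/256 = 0.70703..., the same scaling as w1r in the appendix
    localparam signed [9:0] C707 = 10'sd181;

    wire signed [W:0]    sum  = xr + xi;      // widened before adding
    wire signed [W:0]    diff = xi - xr;
    wire signed [W+10:0] p_re = sum  * C707;  // signed product, scaled by 256
    wire signed [W+10:0] p_im = diff * C707;

    // arithmetic right shift by 8 removes the scaling (rounds toward minus infinity)
    assign yr = p_re >>> 8;
    assign yi = p_im >>> 8;
endmodule

// Small usage example
module tb_mul_w8_1;
    reg  signed [7:0] xr, xi;
    wire signed [8:0] yr, yi;

    mul_w8_1 #(.W(8)) dut (.xr(xr), .xi(xi), .yr(yr), .yi(yi));

    initial begin
        xr = 8'sd100; xi = 8'sd0;
        #1 $display("yr=%0d yi=%0d", yr, yi);   // prints yr=70 yi=-71
        $finish;
    end
endmodule
```

The asymmetry between 70 and -71 comes from the arithmetic shift, which truncates toward minus infinity; a design that needs symmetric rounding would add half an LSB before shifting.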

The other butterfly uses the Twiddle Factor $W_8^3$, which is equal to (-0.707 - j(0.707)). It is also implemented using the partial products and by taking the two's complement, as shown below.

Following the pairing of X24 with X28 in Fig. 3.3, the two outputs are

Y3 = (X24r + jX24i) + ((X28r + jX28i) * (-0.707 - j(0.707)))    (3.28)

Y7 = (X24r + jX24i) - ((X28r + jX28i) * (-0.707 - j(0.707)))    (3.29)

After simplification, the real and imaginary parts are:

Y3r = X24r + ((-X28r*0.707) + (X28i*0.707))    (3.30)

Y3i = X24i + ((-X28i*0.707) + (-X28r*0.707))    (3.31)

Y7r = X24r + ((X28r*0.707) + (-X28i*0.707))    (3.32)

Y7i = X24i + ((X28r*0.707) + (X28i*0.707))    (3.33)

These are obtained using two internal products, and the subtractions are done with the two's complement operation.

                              Complex             Complex
                              multiplications     additions
Normal DFT of 8-Point         64                  56
DFT of 8-Point using FFT      12                  24

Table. 3.1. Comparison between Normal DFT and FFT
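The entries of Table 3.1 agree with the standard operation counts for a direct DFT and for a radix-2 FFT of length N. The worked evaluation below, for N = 8, is given only as a check of the table and uses the usual textbook formulas rather than counts measured from the implementation.

```latex
\[
\begin{aligned}
\text{Direct DFT:}\quad  & N^2 = 8^2 = 64 \ \text{multiplications}, &
                           N(N-1) = 8 \cdot 7 = 56 \ \text{additions}, \\
\text{Radix-2 FFT:}\quad & \tfrac{N}{2}\log_2 N = 4 \cdot 3 = 12 \ \text{multiplications}, &
                           N \log_2 N = 8 \cdot 3 = 24 \ \text{additions}.
\end{aligned}
\]
```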

3.2 Implementation of Pipelined FFT:

The pipelined FFT is implemented using delay elements and switches. An input buffer stores all the values that are given as input, and the number of delay elements depends on the length N of the sequence. As shown above, the 8-point case first uses delays of 2 elements and then delays of 1 element.

Here we implement the 64-point R2MDC. It uses delay elements of length 16, 8, 4 and 2, together with switches of 16, 8, 4 and 2. The input buffer is a memory, and its elements are accessed by address.

The implementation of the 64-point FFT is as shown below:

First the input is divided into two parallel half streams, which are passed through the delay elements and the switch operation. One half of the samples in the data stream is delayed by the delay elements so that it can be processed together with the second half of the data stream.

In the 8-point case the first delay applied to the half data stream is 2 elements; in the 64-point case the 32-point half stream is delayed by 16 elements. After the switch operation the delay is applied again and the butterfly processing is done. Delays of 8 elements are then used in the same way, followed by delays of 4, 2 and 1.

In total the delay present is equal to 16+8+8+4+4+2+2+1+1=46.
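The delay elements of different lengths can be described by one parameterized shift register. The sketch below is an illustrative alternative to writing a separate module per length; the module name `delay_line` and the parameters `WIDTH` and `DEPTH` are assumptions, while the active-low asynchronous reset follows the convention of the modules in Appendix A.

```verilog
// Hypothetical sketch: delays x by DEPTH clock cycles.
module delay_line #(
    parameter WIDTH = 10,   // sample width
    parameter DEPTH = 16    // delay in clock cycles, must be at least 1
) (
    input              clk,
    input              reset,   // active low
    input  [WIDTH-1:0] x,
    output [WIDTH-1:0] y
);
    reg [WIDTH-1:0] taps [0:DEPTH-1];
    integer k;

    always @(posedge clk or negedge reset) begin
        if (!reset) begin
            for (k = 0; k < DEPTH; k = k + 1)
                taps[k] <= {WIDTH{1'b0}};
        end else begin
            taps[0] <= x;                       // newest sample enters the chain
            for (k = 1; k < DEPTH; k = k + 1)
                taps[k] <= taps[k-1];           // every other tap moves one step
        end
    end

    assign y = taps[DEPTH-1];                   // oldest sample leaves the chain
endmodule

// Small usage example: a counter is fed in and y trails x by DEPTH samples
module tb_delay_line;
    reg clk = 0, reset = 0;
    reg  [9:0] x = 0;
    wire [9:0] y;

    delay_line #(.WIDTH(10), .DEPTH(4)) dut (.clk(clk), .reset(reset), .x(x), .y(y));

    always #5 clk = ~clk;
    always @(posedge clk) if (reset) x <= x + 1;

    initial begin
        #12  reset = 1;
        #100 $display("x=%0d y=%0d", x, y);
        $finish;
    end
endmodule
```

With `DEPTH` set to 16, 8, 4, 2 and 1, the same module would cover every delay length named in this section.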

The output is also obtained as two half sequences. Inside the butterfly, the complex multiplications can be done using Booth multiplication and addition; the further processing consists of addition and subtraction operations.

Here the Twiddle Factors are computed in advance and stored, and they are supplied in synchronization with the operation. The R2MDC of 64-point [21] [22] is as shown below:

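[Figure: block diagram. The two input half streams X31 X30 X29 ... X0 and X63 X62 X61 ... X32 enter through an input register (REG) and pass through a chain of butterfly (BF) and switch (S) blocks with delays from 16 down to 1; the lower branch is multiplied by -j before the final delay of 1.]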

Fig. 3.4. Pipeline architecture of 64-point FFT using R2MDC [23]

Here the input to delay element '1' is multiplied by -j, the switch operation is applied, and the butterfly operation is done at the end. The butterfly used in the R2MDC architecture is the Radix-2 butterfly.
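Multiplication by -j needs no multiplier, because (xr + j*xi) * (-j) = xi - j*xr: the real and imaginary parts are swapped and one of them is negated. The sketch below illustrates this identity; the module name `mul_neg_j` and its ports are assumptions and do not reproduce any module of Appendix A.

```verilog
// Hypothetical sketch: (xr + j*xi) * (-j) = xi - j*xr.
module mul_neg_j #(parameter W = 10) (
    input  signed [W-1:0] xr, xi,
    output signed [W-1:0] yr, yi
);
    assign yr = xi;     // real part is the old imaginary part
    assign yi = -xr;    // two's complement negation; the most negative
                        // input value wraps, as in any W-bit negation
endmodule

// Small usage example
module tb_mul_neg_j;
    reg  signed [9:0] xr, xi;
    wire signed [9:0] yr, yi;

    mul_neg_j #(.W(10)) dut (.xr(xr), .xi(xi), .yr(yr), .yi(yi));

    initial begin
        xr = 10'sd3; xi = 10'sd5;
        #1 $display("yr=%0d yi=%0d", yr, yi);   // prints yr=5 yi=-3
        $finish;
    end
endmodule
```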

CHAPTER IV

RESULTS

The FFT is implemented in Verilog, the simulation is done using ModelSim PE 10.2c, and the results obtained are as shown below:

Fig. 4.1. Simulation output for 8-Point DIT FFT

The clk input is given a period and a duty cycle, and sel is set with a force value to select the output to be displayed.

Here Y0r, Y0i, Y1r, Y1i, ..., Y7r, Y7i are the outputs, and one of them at a time is displayed by using sel.

The design is synthesized with Altera Quartus II for the Cyclone II EP2C35F672C6 device, and the RTL view is as shown below:

[Figure: RTL schematic. The butterfly instances bfly1:s11, s12, s13, s14, s21, s23 and s31, bfly2:s22, s24 and s33, bfly3:s32 and bfly4:s34 feed the multiplexers Mux0 to Mux15, which are controlled by sel[2..0] and drive the output registers yr[7..0] and yi[7..0] clocked by clk.]

Fig. 4.2. RTL View of 8-Point DIT FFT

After compiling the FFT in Quartus II, the resource utilization is as shown below:
Fig. 4.3. Resource Utilization summary

The timing analysis is as shown below:

Fig. 4.4. Worst case delay

The Pipeline FFT is implemented in Verilog and simulated using ModelSim 10.2c, and the output is as shown below:
Fig. 4.5. Simulation Output for 64-Point Pipeline FFT

To get the output, reset should have the value '1' and din_valid should be '1'. When the output dout_valid is equal to '1', the output is displayed.
CHAPTER V

CONCLUSION

This thesis shows the implementation of the 8-point FFT and of the 64-point Pipeline FFT using the Verilog Hardware Description Language. For the 8-point FFT, the synthesis is done using Quartus II and the RTL view of the design can be observed; the timing analysis shows that the FFT takes less time than the other Fourier transform techniques. According to the paper published by Pawan Verma, Harpeet Kaur, Mandeep Singh and Balwinder Singh [4], the computation of the DFT using the DIT FFT requires fewer multiplications and additions.

The implementation in the base paper is done using VHDL, whereas in this thesis it is done using Verilog, a hardware description language whose syntax is similar to the C programming language.

The implementation itself shows that the number of multiplications and additions is reduced compared to the normal DFT. Because of the reduced multiplications and additions the worst-case delay is reduced, which leads to the use of the FFT in most communication systems that compute the Fourier transform.

The FFT is also implemented using the Pipeline architecture in the Verilog Hardware Description Language. This architecture uses delay elements, which increases the delay, but the number of complex multipliers is reduced. According to the paper published by Mounir Arioua [1], the complex multipliers are reduced compared with the multipliers in the FFT implementation.

The RTL view of the FFT implementation in Verilog is shown in the results chapter. It is obtained by synthesizing the 8-point FFT in Quartus II for the Cyclone II EP2C35F672C6 device.

The Pipeline architecture can be used when low resource usage is required. When the time delay is the main concern, the general FFT architecture can be used instead, because the Pipeline architecture relies on delay elements. This study can be expanded by reducing the resource usage further and by reducing the number of complex multiplications and additions required for the calculation of the FFT, which is used in most communication systems.
REFERENCES

[1] Mounir A. and Moha Hassant, "VHDL Implementation of Optimized 8-Point FFT in Pipelined Architecture for OFDM Applications", International Conference on Multimedia Communication Systems, IEEE, pp. 1-5, 2010.

[2] Weidong Li, "Studies on Implementation of Lower Power FFT Processors", Linkoping Studies in Science and Technology, Thesis No. 1030, ISBN 91-7373-692-9, Linkoping, Sweden, June 2003.

[3] S. He and M. Torkelson, "A New Approach to Pipeline FFT Processor", In Proceedings of the 10th International Parallel Processing Symposium (IPPS), pp. 766-770, April 1996.

[4] Pawan Verma, Harpeet Kaur and Mandeep Singh, "VHDL Implementation of FFT/IFFT Blocks for OFDM", International Conference on Advances in Recent Technologies in Communication and Computing, pp. 186-188, 2009.

[5] L. Johnson, "Conflict Free Memory Addressing for Dedicated FFT Hardware", IEEE Transactions on Analog and Digital Signal Processing, pp. 312-316, 1992.

[6] J.W. Cooley and J.W. Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series", Mathematics of Computation, Vol. 19, pp. 297-301, 1965.

[7] W. Li and L. Wanhammar, "A Pipeline FFT Processor", IEEE Workshop on Signal Processing Systems (SiPS), Taipei, China, pp. 1982-1985, Oct. 1999.

[8] E.H. Wold and A.M. Despain, "Pipeline and Parallel Pipeline FFT Processors for VLSI Implementation", IEEE Transactions on Computers, pp. 414-426, May 1984.

[9] R. Stron, "Radix-2 FFT Pipeline Architecture with Reduced Noise to Signal Ratio", IEEE Proceeding on Image Signal Processing, pp. 81-86, Apr. 1984.

[10] D. Cohen, "Simplified Control of FFT Hardware", IEEE Transactions on Signal and Speech Processing, pp. 577-579, Dec. 1976.

[11] T. Widhe, "Efficient Implementation of FFT Processing Elements", Linkoping Studies in Science and Technology, Thesis No. 619, Linkoping University, Sweden, 1997.

[12] L.R. Rabiner and B. Gold, "Theory and Application of Digital Signal Processing", Prentice-Hall, pp. 23-27, 1975.

[13] M. Petrov and M. Glesner, "Optimal FFT Architecture Selection for OFDM Receivers on FPGA", In Proc. of 2005 IEEE International Conference on Field Programmable Technology, pp. 313-314, 2005.

[14] J. Viejo, A. Millan and M.J. Bellido, "Design of a FFT/IFFT Module as an IP Core Suitable for Embedded Systems", IEEE Transactions on Industrial Embedded Systems, pp. 337-340, 2007.

[15] J. Melander, "Design of SIC FFT Architectures", Linkoping Studies in Science and Technology, Thesis No. 618, Linkoping University, Sweden, 1997.

[16] U.M. Baese, Digital Signal Processing with Field Programmable Gate Arrays, 3rd edition, Springer, 2007.

[17] Weidong Li, Mark Vesterbacka and Lars Wanhammar, "An FFT Processor Based on 16 Point Module", Electronics Systems, Dept. of EE, Linkoping University, pp. 1-8, 1996.

[18] Y. Ma, "A VLSI Oriented Parallel FFT Algorithm", IEEE Transactions on Signal Processing, Vol. 44, No. 2, pp. 445-448, Feb. 1996.

[19] E.E. Swatzlander, W.K.W. Young and S.J. Joseph, "A Radix-4 Delay Commutator for Fast Fourier Transforms Processor Implementation", IEEE J. Solid-State Circuits, SC-19(5), pp. 702-709, Oct. 1984.

[20] Yunho Jung, Hongil Yoon and Jaeseok Kim, "New Efficient FFT Algorithm and Pipeline Implementation Results for OFDM/DMT Applications", IEEE Transactions on Consumer Electronics, Vol. 49, No. 1, pp. 14-17, Feb. 2003.

[21] W. Li and L. Wanhammar, "Complex Multiplication Reduction in FFT Processor", IEEE Workshop on Signal Processing Systems, Sweden, pp. 654-662, Mar. 2002.

[22] Hsin-Lei Lin, Hongchin Lin, Yu Chuan Chen and Robert C. Chang, "A Novel Pipelined Fast Fourier Transform Architecture for Double Rate OFDM Systems", IEEE Transactions on Signal Processing Systems, pp. 7-11, 2004.

[23] Shousheng He and Mats Torkelson, "Design and Implementation of a 1024-Point Pipeline FFT Processor", Custom Integrated Circuits Conference, IEEE, pp. 131-134, 1998.

[24] H.L. Lin, H. Lin, Y.C. Chen and R.C. Chang, "A Novel Pipelined Fast Fourier Transform Architecture for Double Rate OFDM Systems", IEEE Workshop on Signal Processing Systems Design and Implementation, pp. 7-11, 2004.

[25] K. Maharatna, E. Grass and U. Jaghold, "A Low Power 64 Point FFT/IFFT Architecture for Wireless Broadband Communication", In 5th International OFDM Workshop, Hamburg, 2000.
APPENDIX A

Code for implementation of 8-Point FFT using Verilog:

module fft(clk,sel,yr,yi); //main module

input clk;
input [2:0]sel;
output reg [7:0]yr,yi;
wire [7:0]y0r,y1r,y2r,y3r,y4r,y5r,y6r,y7r,y0i,y1i,y2i,y3i,y4i,y5i,y6i,y7i;
wire [7:0]x20r,x20i,x21r,x21i,x22r,x22i,x23r,x23i,x24r,x24i,x25r,x25i,x26r,x26i,x27r,x27i;
wire [7:0]x10r,x10i,x11r,x11i,x12r,x12i,x13r,x13i,x14r,x14i,x15r,x15i,x16r,x16i,x17r,x17i;
wire [7:0]x0,x1,x2,x3,x4,x5,x6,x7; //input samples; not driven anywhere in this listing
parameter w0r=8'b1;
parameter w0i=8'b0;
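//w1 and w3 hold 0.707 scaled by 256: 8'b10110101 is 181 and 8'b01001011 (75) is its 8-bit two's complement
parameter w1r=8'b10110101;
parameter w1i=8'b01001011;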
parameter w2r=8'b0;
parameter w2i=8'b11111111;
parameter w3r=8'b01001011;
parameter w3i=8'b01001011;
//stage1
bfly1 s11(x0,x4,w0r,w0i,x10r,x10i,x11r,x11i);
bfly1 s12(x2,x6,w0r,w0i,x12r,x12i,x13r,x13i);
bfly1 s13(x1,x5,w0r,w0i,x14r,x14i,x15r,x15i);
bfly1 s14(x3,x7,w0r,w0i,x16r,x16i,x17r,x17i);
//stage2
bfly1 s21(x10r,x12r,w0r,w0i,x20r,x20i,x22r,x22i);
bfly2 s22(x11r,x11i,x13r,x13i,x21r,x21i,x23r,x23i);
bfly1 s23(x14r,x16r,w0r,w0i,x24r,x24i,x26r,x26i);
bfly2 s24(x15r,x15i,x17r,x17i,x25r,x25i,x27r,x27i);
//stage3
bfly1 s31(x20r,x24r,w0r,w0i,y0r,y0i,y4r,y4i);
bfly3 s32(x21r,x21i,x25r,x25i,w1r,w1i,y1r,y1i,y5r,y5i);
bfly2 s33(x22r,x22i,x26r,x26i,y2r,y2i,y6r,y6i);
bfly4 s34(x23r,x23i,x27r,x27i,w3r,w3i,y3r,y3i,y7r,y7i);

//registered output multiplexer: sel picks which FFT output appears on yr, yi
always @(posedge clk)
case(sel)
0:begin yr<=y0r; yi<=y0i; end
1:begin yr<=y1r; yi<=y1i; end
2:begin yr<=y2r; yi<=y2i; end
3:begin yr<=y3r; yi<=y3i; end
4:begin yr<=y4r; yi<=y4i; end
5:begin yr<=y5r; yi<=y5i; end
6:begin yr<=y6r; yi<=y6i; end
7:begin yr<=y7r; yi<=y7i; end
endcase
endmodule
module bfly1(x,y,wr,wi,x0r,x0i,x1r,x1i);// sub module
input [7:0]x,y,wr,wi;
output[7:0]x1r,x1i,x0r,x0i;
assign x0r=x+(y*wr);
assign x0i=y*wi;
assign x1r=x+(~(y*wr)+1);
assign x1i=~(y*wi)+1;
endmodule
module bfly2(xr,xi,yr,yi,x0r,x0i,x1r,x1i); // sub module
input [7:0]xr,xi,yr,yi;
output [7:0]x0r,x0i,x1r,x1i;
assign x0r=xr;
assign x0i=~yr+1;
assign x1r=xr;
assign x1i=yr;
endmodule
module bfly3(xr,xi,yr,yi,wr,wi,x0r,x0i,x1r,x1i); // sub module
input [7:0]xr,xi,yr,yi,wr,wi;
output [7:0]x0r,x0i,x1r,x1i;
wire [15:0]p1,p2,p3,p4;
wire [7:0]win,yrn,yin;
wire [8:0]ywr,ywi;
parameter sht=8'b1000; //shift by 8 removes the 256 scaling of the twiddle constants
assign yrn=~yr+1;
assign yin=yi;
assign win=~wi+1;
assign p1=(yrn*wr)>>sht;
assign p2=(yin*win)>>sht;
assign p3=(yrn*win)>>sht;
assign p4=(yin*wr)>>sht;
assign ywr=(~p1+1)+p2;
assign ywi=p3+p4;
assign x0r=xr+ywr;
assign x0i=xi+ywi;
assign x1r=xr+(~ywr+1);
assign x1i=xi+(~ywi+1);
endmodule
module bfly4(xr,xi,yr,yi,wr,wi,x0r,x0i,x1r,x1i); // sub module
input [7:0]xr,xi,yr,yi,wr,wi;
output [7:0]x0r,x0i,x1r,x1i;
wire [15:0]p1,p2;
wire [7:0]win,yrn,yin;
wire [8:0]ywr,ywi;
parameter sht=8'b1000; //shift by 8 removes the 256 scaling of the twiddle constants
assign yrn=~yr+1;
assign yin=~yi+1;
assign win=~wi+1;
assign p1=(yrn*win)>>sht;
assign p2=(yin*win)>>sht;
assign ywr=p1+(~p2+1);
assign ywi=p1+p2;
assign x0r=xr+ywr;
assign x0i=xi+ywi;
assign x1r=xr+(~ywr+1);
assign x1i=xi+(~ywi+1);
endmodule
Implementation of 64 Point Pipeline FFT:
`timescale 1ns/1ns
//testbench for fft64
module tb_fft64;
reg clk,reset,din_valid;
reg [9:0] din_re,din_im;
wire [9:0] dout_re,dout_im;
wire dout_valid;
fft64 f1(.clk(clk),.reset(reset),.din_valid(din_valid),.din_re(din_re),.din_im(din_im),
.dout_re(dout_re),.dout_im(dout_im),.dout_valid(dout_valid));
always #20 clk=~clk;
integer file;
initial begin
clk=0;
reset=0; //active-low reset asserted
din_valid=0;
din_re=10'b0;din_im=10'b0;
//open the log once; OR-ing the descriptor with 1 also echoes each line to standard output
file=$fopen("result_out.txt") | 1;
#80 reset=1;din_valid=1;
din_re=10'b0010110100;din_im=10'b1000010101;
repeat(200)begin
#40 din_re=din_re+1;din_im=din_im+1;
$fdisplay(file, "(%d) + (%d )*j ;", dout_re, dout_im );
end
end
endmodule
//submodule of fft64
`timescale 1ns/1ns
module fft64(clk,reset, din_valid, din_re,din_im, //first_r,first_i,last_r,last_i,
dout_re,dout_im,dout_valid);
parameter IN_WIDTH=10;
input clk,reset,din_valid;
input [IN_WIDTH-1:0] din_re,din_im;
output [IN_WIDTH-1:0] dout_re,dout_im;

output dout_valid;
wire dout_valid;
wire [IN_WIDTH-1:0] first_r,first_i,last_r,last_i;
wire [IN_WIDTH-1:0] r0_0,i0_0,r32_0,i32_0;//the output signals of buffer
wire [IN_WIDTH-1:0] br0_1,bi0_1,br32_1,bi32_1;
wire [IN_WIDTH-1:0] dr32_1,di32_1,sr0_1,si0_1;
wire [IN_WIDTH-1:0] r0_1,i0_1,r32_1,i32_1;//the output signals of first stage
wire [IN_WIDTH-1:0] br0_2,bi0_2,br32_2,bi32_2;
wire [IN_WIDTH-1:0] dr32_2,di32_2,sr0_2,si0_2;
wire [IN_WIDTH-1:0] r0_2,i0_2,r32_2,i32_2;//the output signals of second stage
wire [IN_WIDTH-1:0] br0_3,bi0_3,br32_3,bi32_3;
wire [IN_WIDTH-1:0] dr32_3,di32_3,sr0_3,si0_3;
wire [IN_WIDTH-1:0] r0_3,i0_3,r32_3,i32_3;//the output signals of thirdstage
wire [IN_WIDTH-1:0] br0_4,bi0_4,br32_4,bi32_4;
wire [IN_WIDTH-1:0] dr32_4,di32_4,sr0_4,si0_4;
wire [IN_WIDTH-1:0] r0_4,i0_4,r32_4,i32_4;//the output signals of four stage
wire [IN_WIDTH-1:0] br0_5,bi0_5,br32_5,bi32_5,ir32_5,ii32_5;
wire [IN_WIDTH-1:0] dr32_5,di32_5,sr0_5,si0_5;
wire [IN_WIDTH-1:0] r0_5,i0_5,r32_5,i32_5;//the output signals of five stage
wire [IN_WIDTH-1:0] r0_6,i0_6,r32_6,i32_6;
wire clk_in;
wire [4:0]count;
///***************** control *********************************************///
clk_div clk_div(.clk(clk),.reset(reset),.hclk(clk_in));
control control(.clk(clk_in),.reset(reset),.count(count));
input_buffer i1(.wclk(clk),.rclk(clk_in),.reset(reset),.din_valid(din_valid),
.indata_r(din_re),.indata_i(din_im), .first_r(r0_0),.first_i(i0_0),.last_r(r32_0),.last_i(i32_0));
//*****************first stage*********************************************///
bm b1(.clk(clk_in),.reset(reset),.address(count),.ar(r0_0),.ai(i0_0),.br(r32_0),.bi(i32_0),
.r0(br0_1),.i0(bi0_1),.r16(br32_1),.i16(bi32_1));
delay16 d32_1(.clk(clk_in),.reset(reset),.x_r(br32_1),.x_i(bi32_1), .y_r(dr32_1),.y_i(di32_1));
switch16 s1(.count(count),.x0_r(br0_1),.x0_i(bi0_1),.x1_r(dr32_1),.x1_i(di32_1),
.y0_r(sr0_1),.y0_i(si0_1),.y1_r(r32_1),.y1_i(i32_1));
delay16 d0_1(.clk(clk_in),.reset(reset),.x_r(sr0_1),.x_i(si0_1),.y_r(r0_1),.y_i(i0_1));
///*****************second stage*********************************************///
bm
b2(.clk(clk_in),.reset(reset),.address({count[3:0],1'b0}),.ar(r0_1),.ai(i0_1),.br(r32_1),.bi(i32_1),
.r0(br0_2),.i0(bi0_2),.r16(br32_2),.i16(bi32_2));
delay8 d32_2(.clk(clk_in),.reset(reset),.x_r(br32_2),.x_i(bi32_2), .y_r(dr32_2),.y_i(di32_2));
switch8 s2(.count(count[3:0]),.x0_r(br0_2),.x0_i(bi0_2),.x1_r(dr32_2),.x1_i(di32_2),
.y0_r(sr0_2),.y0_i(si0_2),.y1_r(r32_2),.y1_i(i32_2));
delay8 d0_2(.clk(clk_in),.reset(reset),.x_r(sr0_2),.x_i(si0_2),.y_r(r0_2),.y_i(i0_2));
///*************************third stage *********************************///

bm b3(.clk(clk_in),.reset(reset),.address({count[2:0],2'b0}),.ar(r0_2),.ai(i0_2),.br(r32_2),.bi(i32_2),
.r0(br0_3),.i0(bi0_3),.r16(br32_3),.i16(bi32_3));
delay4 d32_3(.clk(clk_in),.reset(reset),.x_r(br32_3),.x_i(bi32_3), .y_r(dr32_3),.y_i(di32_3));
switch4 s3(.count(count[2:0]),.x0_r(br0_3),.x0_i(bi0_3),.x1_r(dr32_3),.x1_i(di32_3),
.y0_r(sr0_3),.y0_i(si0_3),.y1_r(r32_3),.y1_i(i32_3));
delay4 d0_3(.clk(clk_in),.reset(reset),.x_r(sr0_3),.x_i(si0_3),.y_r(r0_3),.y_i(i0_3));
///*************************four stage *********************************///
bm
b4(.clk(clk_in),.reset(reset),.address({count[1:0],3'b0}),.ar(r0_3),.ai(i0_3),.br(r32_3),.bi(i32_3),
.r0(br0_4),.i0(bi0_4),.r16(br32_4),.i16(bi32_4));
delay2 d32_4(.clk(clk_in),.reset(reset),.x_r(br32_4),.x_i(bi32_4), .y_r(dr32_4),.y_i(di32_4));
switch2 s4(.count(count[1:0]),.x0_r(br0_4),.x0_i(bi0_4),.x1_r(dr32_4),.x1_i(di32_4),
.y0_r(sr0_4),.y0_i(si0_4),.y1_r(r32_4),.y1_i(i32_4));
delay2 d0_4(.clk(clk_in),.reset(reset),.x_r(sr0_4),.x_i(si0_4),.y_r(r0_4),.y_i(i0_4));
///*************************five stage *********************************///
butterfly b5(.a_r(r0_4),.a_i(i0_4),.b_r(r32_4),.b_i(i32_4),
.a1_r(br0_5),.a1_i(bi0_5),.b1_r(br32_5),.b1_i(bi32_5));
inverter i5(.count(count[0]),.a_r(br32_5),.a_i(bi32_5),.a1_r(ir32_5),.a1_i(ii32_5));
delay1 d32_5(.clk(clk_in),.reset(reset),.x_r(ir32_5),.x_i(ii32_5), .y_r(dr32_5),.y_i(di32_5));
switch1 s5(.count(count[0]),.x0_r(br0_5),.x0_i(bi0_5),.x1_r(dr32_5),.x1_i(di32_5),
.y0_r(sr0_5),.y0_i(si0_5),.y1_r(r32_5),.y1_i(i32_5));
delay1 d0_5(.clk(clk_in),.reset(reset),.x_r(sr0_5),.x_i(si0_5),.y_r(r0_5),.y_i(i0_5));
///*************************six stage *********************************///
butterfly b6(.a_r(r0_5),.a_i(i0_5),.b_r(r32_5),.b_i(i32_5),
.a1_r(r0_6),.a1_i(i0_6),.b1_r(r32_6),.b1_i(i32_6));
dataout dataout(.clk(clk),.reset(reset),.first_r(r0_6),.first_i(i0_6),.last_r(r32_6),.last_i(i32_6),
.dout_re(dout_re),.dout_im(dout_im),.dout_valid(dout_valid));
//always @(posedgeclk_in)
//begin
//end
endmodule
For dividing the clock:
`timescale 1ns/1ns
module clk_div(clk,reset, hclk);
input clk,reset;
output hclk;
reg hclk;
//reg count;
always @(posedge clk or negedge reset)
begin
if (!reset)
hclk<=0;
else
hclk<=hclk+1; //1-bit increment toggles hclk, giving half the input clock rate
end
endmodule
// control block implementation
`timescale 1ns/1ns
module control( clk,reset, count);
input clk,reset;
output [4:0] count;
reg [4:0] count;
always @(posedge clk or negedge reset)
begin
if (!reset) begin
count<=5'b11111;
end
else begin
count<=count+1;
end
end
endmodule
// input buffer
`timescale 1ns/1ns
module input_buffer(wclk,rclk,reset,din_valid, indata_r, indata_i,first_r, last_r,first_i, last_i);
parameter IN_WIDTH=10;
input wclk,rclk,reset,din_valid;
input [IN_WIDTH-1:0] indata_r,indata_i;
output [IN_WIDTH-1:0] first_r,last_r,first_i,last_i;

reg [IN_WIDTH-1:0] mem_r [127:0];


reg [IN_WIDTH-1:0] mem_i [127:0];
reg [6:0] count1;
reg [5:0] count2;
reg [IN_WIDTH-1:0] first_r,last_r,first_i,last_i;
always @(posedge wclk or negedge reset )
begin
if(!reset)
count1<=7'b1111111;
else if(din_valid==1)
count1<=count1+1;
end
integer k;
always @(posedge wclk or negedge reset )
begin
if (!reset)
begin
//clear all 128 entries of both memories
for (k=0; k<128; k=k+1) begin
mem_r[k]<=10'b0;
mem_i[k]<=10'b0;
end
end
else if(din_valid==1)begin
mem_r[count1]<=indata_r;
mem_i[count1]<=indata_i;
end
end
always @(posedge rclk or negedge reset )
begin
if(!reset)
count2<=6'b111111;
else if(din_valid==1)
count2<=count2+1;
end
always @(posedge rclk or negedge reset)
if(!reset) begin
first_r<=10'b0;
last_r<=10'b0;
first_i<=10'b0;
last_i<=10'b0;
end
else begin
if (count2<32)begin
first_r<=mem_r[count2+64];
last_r<=mem_r[count2+96];
first_i<=mem_i[count2+64];

last_i<=mem_i[count2+96];
end
else begin
first_r<=mem_r[count2-32];
last_r<=mem_r[count2];
first_i<=mem_i[count2-32];
last_i<=mem_i[count2];
end
end
endmodule
`timescale 1ns/1ns
//testbench for input_buffer
module tb_buffer;
reg wclk,rclk,reset,din_valid;
reg [9:0] indata_r,indata_i;
wire [9:0] first_r,first_i,last_r,last_i;
always #5 wclk=~wclk;
always #10 rclk=~rclk;
initial begin
wclk=0;
rclk=0;
reset=0;
din_valid=0;
indata_r=0;indata_i=0;
#10 reset=1;din_valid=1;
indata_r=0;indata_i=0;
repeat(200)begin
#10 indata_r=indata_r+1;indata_i=indata_i+1;
end
end
input_buffer i1(.wclk( wclk),.rclk(rclk),.reset(reset),.din_valid(din_valid),
.indata_r(indata_r),.indata_i(indata_i),
.first_r(first_r),.first_i(first_i),.last_r(last_r),.last_i(last_i));
endmodule
`timescale 1ns/1ns
//register of IN_WIDTH bits with asynchronous active-low reset
module dff( clk,reset, d, y );
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]d;
output [IN_WIDTH-1:0]y;
wire [IN_WIDTH-1:0]y;
reg [IN_WIDTH-1:0]r;
assign y=r;
always @(posedge clk or negedge reset)
begin
if(!reset)begin
r<={IN_WIDTH{1'b0}};
end
else begin
r<=d;
end
end
endmodule
//butterfly
`timescale 1ns/1ns
module butterfly( a_r,a_i,b_r,b_i, a1_r,a1_i,b1_r,b1_i);
parameter IN_WIDTH=10;
input [IN_WIDTH-1:0] a_r,a_i,b_r,b_i;
output [IN_WIDTH-1:0] a1_r,a1_i,b1_r,b1_i;
wire [IN_WIDTH:0] a0_r,a0_i,b0_r,b0_i;
assign a0_r=a_r+b_r;
assign b0_r=a_r-b_r;
assign a0_i=a_i+b_i;
assign b0_i=a_i-b_i;
assign a1_r=a0_r[IN_WIDTH:1];
assign b1_r=b0_r[IN_WIDTH:1];
assign a1_i=a0_i[IN_WIDTH:1];
assign b1_i=b0_i[IN_WIDTH:1];
endmodule
//delay 16
`timescale 1ns/1ns
module delay16 (clk,reset,x_r,x_i,y_r,y_i);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg [IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
wire [IN_WIDTH-1:0]x1_r,x1_i;
wire [IN_WIDTH-1:0]x2_r,x2_i;
wire [IN_WIDTH-1:0]x3_r,x3_i;
wire [IN_WIDTH-1:0]x4_r,x4_i;
wire [IN_WIDTH-1:0]x5_r,x5_i;
wire [IN_WIDTH-1:0]x6_r,x6_i;
wire [IN_WIDTH-1:0]x7_r,x7_i;
wire [IN_WIDTH-1:0]x8_r,x8_i;
wire [IN_WIDTH-1:0]x9_r,x9_i;
wire [IN_WIDTH-1:0]x10_r,x10_i;
wire [IN_WIDTH-1:0]x11_r,x11_i;
wire [IN_WIDTH-1:0]x12_r,x12_i;
wire [IN_WIDTH-1:0]x13_r,x13_i;
wire [IN_WIDTH-1:0]x14_r,x14_i;
//wire [IN_WIDTH-1:0]x15_r,x15_i;

dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));
dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
dff d3(.clk(clk),.reset(reset),.d(x1_r),.y(x2_r));
dff d4(.clk(clk),.reset(reset),.d(x2_r),.y(x3_r));
dff d5(.clk(clk),.reset(reset),.d(x3_r),.y(x4_r));
dff d6(.clk(clk),.reset(reset),.d(x4_r),.y(x5_r));
dff d7(.clk(clk),.reset(reset),.d(x5_r),.y(x6_r));
dff d8(.clk(clk),.reset(reset),.d(x6_r),.y(x7_r));
dff d9(.clk(clk),.reset(reset),.d(x7_r),.y(x8_r));
dff d10(.clk(clk),.reset(reset),.d(x8_r),.y(x9_r));
dff d11(.clk(clk),.reset(reset),.d(x9_r),.y(x10_r));
dff d12(.clk(clk),.reset(reset),.d(x10_r),.y(x11_r));
dff d13(.clk(clk),.reset(reset),.d(x11_r),.y(x12_r));
dff d14(.clk(clk),.reset(reset),.d(x12_r),.y(x13_r));
dff d15(.clk(clk),.reset(reset),.d(x13_r),.y(x14_r));
//dffd16(.clk(clk),.reset(reset),.d(x14_r),.y(x15_r));
dff d17(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
dff d18(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
dff d19(.clk(clk),.reset(reset),.d(x1_i),.y(x2_i));
dff d20(.clk(clk),.reset(reset),.d(x2_i),.y(x3_i));
dff d21(.clk(clk),.reset(reset),.d(x3_i),.y(x4_i));
dff d22(.clk(clk),.reset(reset),.d(x4_i),.y(x5_i));
dff d23(.clk(clk),.reset(reset),.d(x5_i),.y(x6_i));
dff d24(.clk(clk),.reset(reset),.d(x6_i),.y(x7_i));
dff d25(.clk(clk),.reset(reset),.d(x7_i),.y(x8_i));
dff d26(.clk(clk),.reset(reset),.d(x8_i),.y(x9_i));
dff d27(.clk(clk),.reset(reset),.d(x9_i),.y(x10_i));
dff d28(.clk(clk),.reset(reset),.d(x10_i),.y(x11_i));
dff d29(.clk(clk),.reset(reset),.d(x11_i),.y(x12_i));
dff d30(.clk(clk),.reset(reset),.d(x12_i),.y(x13_i));
dff d31(.clk(clk),.reset(reset),.d(x13_i),.y(x14_i));
//dffd32(.clk(clk),.reset(reset),.d(x14_i),.y(x15_i));
always @(posedge clk or negedge reset )
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else
begin
y_r<=x14_r;
y_i<=x14_i;
end
end
endmodule

//delay8
`timescale 1ns/1ns
module delay8 ( clk,reset, x_r,x_i,y_r,y_i);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg[IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
wire [IN_WIDTH-1:0]x1_r,x1_i;
wire [IN_WIDTH-1:0]x2_r,x2_i;
wire [IN_WIDTH-1:0]x3_r,x3_i;
wire [IN_WIDTH-1:0]x4_r,x4_i;
wire [IN_WIDTH-1:0]x5_r,x5_i;
wire [IN_WIDTH-1:0]x6_r,x6_i;
//wire [IN_WIDTH-1:0]x7_r,x7_i;
dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));
dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
dff d3(.clk(clk),.reset(reset),.d(x1_r),.y(x2_r));
dff d4(.clk(clk),.reset(reset),.d(x2_r),.y(x3_r));
dff d5(.clk(clk),.reset(reset),.d(x3_r),.y(x4_r));
dff d6(.clk(clk),.reset(reset),.d(x4_r),.y(x5_r));
dff d7(.clk(clk),.reset(reset),.d(x5_r),.y(x6_r));
//dff d8(.clk(clk),.reset(reset),.d(x6_r),.y(x7_r));
dff d9(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
dff d10(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
dff d11(.clk(clk),.reset(reset),.d(x1_i),.y(x2_i));
dff d12(.clk(clk),.reset(reset),.d(x2_i),.y(x3_i));
dff d13(.clk(clk),.reset(reset),.d(x3_i),.y(x4_i));
dff d14(.clk(clk),.reset(reset),.d(x4_i),.y(x5_i));
dff d15(.clk(clk),.reset(reset),.d(x5_i),.y(x6_i));
//dff d16(.clk(clk),.reset(reset),.d(x6_i),.y(x7_i));
always @(posedge clk or negedge reset)
begin if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x6_r;
y_i<=x6_i;
end
end
endmodule
//delay4
`timescale 1ns/1ns

module delay4 (
clk,reset,
x_r,x_i,
y_r,y_i
);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg [IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
wire [IN_WIDTH-1:0]x1_r,x1_i;
wire [IN_WIDTH-1:0]x2_r,x2_i;
//wire [IN_WIDTH-1:0]x3_r,x3_i;
dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));
dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
dff d3(.clk(clk),.reset(reset),.d(x1_r),.y(x2_r));
//dff d4(.clk(clk),.reset(reset),.d(x2_r),.y(x3_r));
dff d5(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
dff d6(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
dff d7(.clk(clk),.reset(reset),.d(x1_i),.y(x2_i));
//dff d8(.clk(clk),.reset(reset),.d(x2_i),.y(x3_i));
always @(posedge clk or negedge reset)
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x2_r;
y_i<=x2_i;
end
end
endmodule
//delay2
`timescale 1ns/1ns
module delay2 ( clk,reset,x_r,x_i, y_r,y_i);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg [IN_WIDTH-1:0]y_r,y_i;
wire [IN_WIDTH-1:0]x0_r,x0_i;
//wire [IN_WIDTH-1:0]x1_r,x1_i;
dff d1(.clk(clk),.reset(reset),.d(x_r),.y(x0_r));

//dff d2(.clk(clk),.reset(reset),.d(x0_r),.y(x1_r));
dff d3(.clk(clk),.reset(reset),.d(x_i),.y(x0_i));
//dff d4(.clk(clk),.reset(reset),.d(x0_i),.y(x1_i));
always @(posedge clk or negedge reset)
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x0_r;
y_i<=x0_i;
end
end
endmodule
//delay1
`timescale 1ns/1ns
module delay1 ( clk,reset, x_r,x_i, y_r,y_i);
parameter IN_WIDTH=10;
input clk,reset;
input [IN_WIDTH-1:0]x_r,x_i;
output[IN_WIDTH-1:0]y_r,y_i;
reg[IN_WIDTH-1:0]y_r,y_i;
always @(posedge clk or negedge reset)
begin
if(!reset)
begin
y_r<=10'b0;
y_i<=10'b0;
end
else begin
y_r<=x_r;
y_i<=x_i;
end
end
endmodule
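The delay modules above differ only in the number of register stages. A minimal sketch of a single parameterized equivalent is given below; the module name `delay_n` and the parameter `DEPTH` are hypothetical and do not appear in the original design. It assumes that each `dff` instance is a one-cycle register with an active-low asynchronous reset, so that `delay16` corresponds to `DEPTH=16`, `delay8` to `DEPTH=8`, and so on.

```verilog
// delay_n: hypothetical parameterized replacement for delay1..delay16.
// Assumption: a DEPTH-stage shift register with active-low async reset
// reproduces the latency of the hand-written dff chains.
`timescale 1ns/1ns
module delay_n (clk, reset, x_r, x_i, y_r, y_i);
parameter IN_WIDTH = 10;
parameter DEPTH = 16;                     // number of clock cycles of delay
input clk, reset;
input  [IN_WIDTH-1:0] x_r, x_i;
output [IN_WIDTH-1:0] y_r, y_i;
reg [IN_WIDTH-1:0] sr_r [0:DEPTH-1];      // real-part shift register
reg [IN_WIDTH-1:0] sr_i [0:DEPTH-1];      // imaginary-part shift register
integer k;
always @(posedge clk or negedge reset)
begin
  if (!reset)
  begin
    // clear every stage on reset
    for (k = 0; k < DEPTH; k = k + 1)
    begin
      sr_r[k] <= {IN_WIDTH{1'b0}};
      sr_i[k] <= {IN_WIDTH{1'b0}};
    end
  end
  else
  begin
    // stage 0 samples the input, every other stage samples its predecessor
    sr_r[0] <= x_r;
    sr_i[0] <= x_i;
    for (k = 1; k < DEPTH; k = k + 1)
    begin
      sr_r[k] <= sr_r[k-1];
      sr_i[k] <= sr_i[k-1];
    end
  end
end
// the last stage drives the outputs, DEPTH cycles after the input was sampled
assign y_r = sr_r[DEPTH-1];
assign y_i = sr_i[DEPTH-1];
endmodule
```

An instance such as `delay_n #(.IN_WIDTH(10), .DEPTH(8)) u_d8 (...)` would then stand in for `delay8`; whether a synthesis tool maps this form to the same resources as the explicit chains depends on the tool and target device.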
//switch16
`timescale 1ns/1ns
module switch16(count, x0_r,x1_r,x0_i,x1_i, y0_r,y1_r,y0_i,y1_i);
parameter IN_WIDTH=10;
input [4:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)

begin
if(count>15)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 8
`timescale 1ns/1ns
module switch8( count,x0_r,x1_r,x0_i,x1_i, y0_r,y1_r,y0_i,y1_i);
parameter IN_WIDTH=10;
input [3:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>7)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 4
`timescale 1ns/1ns
module switch4(

count,
x0_r,x1_r,x0_i,x1_i,
y0_r,y1_r,y0_i,y1_i
);
parameter IN_WIDTH=10;
input [2:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>3)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch 2
`timescale 1ns/1ns
module switch2( count, x0_r,x1_r,x0_i,x1_i,y0_r,y1_r,y0_i,y1_i);
parameter IN_WIDTH=10;
input [1:0] count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count>1)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin

y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// switch1
`timescale 1ns/1ns
module switch1( count, x0_r,x1_r,x0_i,x1_i,y0_r,y1_r,y0_i,y1_i);
parameter IN_WIDTH=10;
input count;
input [IN_WIDTH-1:0] x0_r,x1_r,x0_i,x1_i;
output[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
reg[IN_WIDTH-1:0] y0_r,y1_r,y0_i,y1_i;
always @(count or x0_r or x1_r or x0_i or x1_i)
begin
if(count==1)
begin
y0_r=x1_r;
y0_i=x1_i;
y1_r=x0_r;
y1_i=x0_i;
end
else
begin
y0_r=x0_r;
y0_i=x0_i;
y1_r=x1_r;
y1_i=x1_i;
end
end
endmodule
// cla20
`timescale 1ns/1ns
module cla20 (a,b,ci,s,co);
input [19:0]a,b;
input ci;
output [19:0] s;
output co;
wire [19:0] a,b,s;
wire ci,co;
wire [19:0]c;
wire [19:0] p,g,ps;
wire [18:0] p_1,g_1;
wire [15:0] p_2,g_2;

wire [3:0] p_3,g_3;
assign p=a|b;
assign g=a&b;
assign c[0]=ci;
assign c[1]=g[0]|(p[0]&ci);
//first line
opo2 l101(.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),.p3(p_1[0]),.g3(g_1[0]));
opo3 l102(.p3(p[2]),.g3(g[2]),.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),.p4(p_1[1]),.g4(g_1[1]));
opo4 l103(.p4(p[3]),.g4(g[3]),.p3(p[2]),.g3(g[2]),.p2(p[1]),.g2(g[1]),.p1(p[0]),.g1(g[0]),
.p5(p_1[2]),.g5(g_1[2]));//1
opo4 l104(.p4(p[4]),.g4(g[4]),.p3(p[3]),.g3(g[3]),.p2(p[2]),.g2(g[2]),.p1(p[1]),.g1(g[1]),
.p5(p_1[3]),.g5(g_1[3]));//2
opo4 l105(.p4(p[5]),.g4(g[5]),.p3(p[4]),.g3(g[4]),.p2(p[3]),.g2(g[3]),.p1(p[2]),.g1(g[2]),
.p5(p_1[4]),.g5(g_1[4]));//3
opo4 l106(.p4(p[6]),.g4(g[6]),.p3(p[5]),.g3(g[5]),.p2(p[4]),.g2(g[4]),.p1(p[3]),.g1(g[3]),
.p5(p_1[5]),.g5(g_1[5]));//4
opo4 l107(.p4(p[7]),.g4(g[7]),.p3(p[6]),.g3(g[6]),.p2(p[5]),.g2(g[5]),.p1(p[4]),.g1(g[4]),
.p5(p_1[6]),.g5(g_1[6]));//5
opo4 l108(.p4(p[8]),.g4(g[8]),.p3(p[7]),.g3(g[7]),.p2(p[6]),.g2(g[6]),.p1(p[5]),.g1(g[5]),
.p5(p_1[7]),.g5(g_1[7]));//6
opo4 l109(.p4(p[9]),.g4(g[9]),.p3(p[8]),.g3(g[8]),.p2(p[7]),.g2(g[7]),.p1(p[6]),.g1(g[6]),
.p5(p_1[8]),.g5(g_1[8]));//7
opo4 l110(.p4(p[10]),.g4(g[10]),.p3(p[9]),.g3(g[9]),.p2(p[8]),.g2(g[8]),.p1(p[7]),.g1(g[7]),
.p5(p_1[9]),.g5(g_1[9]));//8
opo4 l111(.p4(p[11]),.g4(g[11]),.p3(p[10]),.g3(g[10]),.p2(p[9]),.g2(g[9]),.p1(p[8]),.g1(g[8]),
.p5(p_1[10]),.g5(g_1[10]));//9
opo4 l112(.p4(p[12]),.g4(g[12]),.p3(p[11]),.g3(g[11]),.p2(p[10]),.g2(g[10]),.p1(p[9]),.g1(g[9]),
.p5(p_1[11]),.g5(g_1[11]));//10
opo4 l113(.p4(p[13]),.g4(g[13]),.p3(p[12]),.g3(g[12]),.p2(p[11]),.g2(g[11]),.p1(p[10]),.g1(g[10]),
.p5(p_1[12]),.g5(g_1[12]));//11
opo4 l114(.p4(p[14]),.g4(g[14]),.p3(p[13]),.g3(g[13]),.p2(p[12]),.g2(g[12]),.p1(p[11]),.g1(g[11]),
.p5(p_1[13]),.g5(g_1[13]));//12
opo4 l115(.p4(p[15]),.g4(g[15]),.p3(p[14]),.g3(g[14]),.p2(p[13]),.g2(g[13]),.p1(p[12]),.g1(g[12]),
.p5(p_1[14]),.g5(g_1[14]));//13
opo4 l116(.p4(p[16]),.g4(g[16]),.p3(p[15]),.g3(g[15]),.p2(p[14]),.g2(g[14]),.p1(p[13]),.g1(g[13]),
.p5(p_1[15]),.g5(g_1[15]));//14
opo4 l117(.p4(p[17]),.g4(g[17]),.p3(p[16]),.g3(g[16]),.p2(p[15]),.g2(g[15]),.p1(p[14]),.g1(g[14]),
.p5(p_1[16]),.g5(g_1[16]));//15

opo4 l118(.p4(p[18]),.g4(g[18]),.p3(p[17]),.g3(g[17]),.p2(p[16]),.g2(g[16]),.p1(p[15]),.g1(g[15]),
.p5(p_1[17]),.g5(g_1[17]));//16
opo4 l119(.p4(p[19]),.g4(g[19]),.p3(p[18]),.g3(g[18]),.p2(p[17]),.g2(g[17]),.p1(p[16]),.g1(g[16]),
.p5(p_1[18]),.g5(g_1[18]));//17
assign c[2]=g_1[0]|(p_1[0]&ci);
assign c[3]=g_1[1]|(p_1[1]&ci);
//second line
opo2 l201(.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0]),.p3(p_2[0]),.g3(g_2[0]));
opo2 l202(.p2(p_1[4]),.g2(g_1[4]),.p1(p_1[0]),.g1(g_1[0]),.p3(p_2[1]),.g3(g_2[1]));
opo2 l203(.p2(p_1[5]),.g2(g_1[5]),.p1(p_1[1]),.g1(g_1[1]),.p3(p_2[2]),.g3(g_2[2]));
opo2 l204(.p2(p_1[6]),.g2(g_1[6]),.p1(p_1[2]),.g1(g_1[2]),.p3(p_2[3]),.g3(g_2[3]));
opo3 l205(.p3(p_1[7]),.g3(g_1[7]),.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0]),
.p4(p_2[4]),.g4(g_2[4]));
opo3 l206(.p3(p_1[8]),.g3(g_1[8]),.p2(p_1[4]),.g2(g_1[4]),.p1(p_1[0]),.g1(g_1[0]),
.p4(p_2[5]),.g4(g_2[5]));
opo3 l207(.p3(p_1[9]),.g3(g_1[9]),.p2(p_1[5]),.g2(g_1[5]),.p1(p_1[1]),.g1(g_1[1]),
.p4(p_2[6]),.g4(g_2[6]));
opo3 l208(.p3(p_1[10]),.g3(g_1[10]),.p2(p_1[6]),.g2(g_1[6]),.p1(p_1[2]),.g1(g_1[2]),
.p4(p_2[7]),.g4(g_2[7]));
opo4 l209(.p4(p_1[11]),.g4(g_1[11]),.p3(p_1[7]),.g3(g_1[7]),.p2(p_1[3]),.g2(g_1[3]),.p1(p[0]),.g1(g[0]),
.p5(p_2[8]),.g5(g_2[8]));//1
opo4 l210(.p4(p_1[12]),.g4(g_1[12]),.p3(p_1[8]),.g3(g_1[8]),.p2(p_1[4]),.g2(g_1[4]),.p1(p_1[0]),.g1(g_1[0]),
.p5(p_2[9]),.g5(g_2[9]));//2
opo4 l211(.p4(p_1[13]),.g4(g_1[13]),.p3(p_1[9]),.g3(g_1[9]),.p2(p_1[5]),.g2(g_1[5]),.p1(p_1[1]),.g1(g_1[1]),
.p5(p_2[10]),.g5(g_2[10]));//3
opo4 l212(.p4(p_1[14]),.g4(g_1[14]),.p3(p_1[10]),.g3(g_1[10]),.p2(p_1[6]),.g2(g_1[6]),.p1(p_1[2]),.g1(g_1[2]),
.p5(p_2[11]),.g5(g_2[11]));//4
opo4 l213(.p4(p_1[15]),.g4(g_1[15]),.p3(p_1[11]),.g3(g_1[11]),.p2(p_1[7]),.g2(g_1[7]),.p1(p_1[3]),.g1(g_1[3]),
.p5(p_2[12]),.g5(g_2[12]));//5
opo4 l214(.p4(p_1[16]),.g4(g_1[16]),.p3(p_1[12]),.g3(g_1[12]),.p2(p_1[8]),.g2(g_1[8]),.p1(p_1[4]),.g1(g_1[4]),
.p5(p_2[13]),.g5(g_2[13]));//6
opo4 l215(.p4(p_1[17]),.g4(g_1[17]),.p3(p_1[13]),.g3(g_1[13]),.p2(p_1[9]),.g2(g_1[9]),.p1(p_1[5]),.g1(g_1[5]),
.p5(p_2[14]),.g5(g_2[14]));//7

opo4 l216(.p4(p_1[18]),.g4(g_1[18]),.p3(p_1[14]),.g3(g_1[14]),.p2(p_1[10]),.g2(g_1[10]),.p1(p_1[6]),.g1(g_1[6]),
.p5(p_2[15]),.g5(g_2[15]));//8
assign c[4]=g_1[2]|(p_1[2]&ci);
assign c[5]=g_2[0]|(p_2[0]&ci);
assign c[6]=g_2[1]|(p_2[1]&ci);
assign c[7]=g_2[2]|(p_2[2]&ci);
assign c[8]=g_2[3]|(p_2[3]&ci);
assign c[9]=g_2[4]|(p_2[4]&ci);
assign c[10]=g_2[5]|(p_2[5]&ci);
assign c[11]=g_2[6]|(p_2[6]&ci);
assign c[12]=g_2[7]|(p_2[7]&ci);
assign c[13]=g_2[8]|(p_2[8]&ci);
assign c[14]=g_2[9]|(p_2[9]&ci);
assign c[15]=g_2[10]|(p_2[10]&ci);
//third line
opo2 l301(.p2(p_2[12]),.g2(g_2[12]),.p1(p_2[4]),.g1(g_2[4]),.p3(p_3[0]),.g3(g_3[0]));
opo2 l302(.p2(p_2[13]),.g2(g_2[13]),.p1(p_2[5]),.g1(g_2[5]),.p3(p_3[1]),.g3(g_3[1]));
opo2 l303(.p2(p_2[14]),.g2(g_2[14]),.p1(p_2[6]),.g1(g_2[6]),.p3(p_3[2]),.g3(g_3[2]));
opo2 l304(.p2(p_2[15]),.g2(g_2[15]),.p1(p_2[7]),.g1(g_2[7]),.p3(p_3[3]),.g3(g_3[3]));
//result
assign c[16]=g_2[11]|(p_2[11]&ci);
assign c[17]=g_3[0]|(p_3[0]&ci);
assign c[18]=g_3[1]|(p_3[1]&ci);
assign c[19]=g_3[2]|(p_3[2]&ci);

assign co=g_3[3]|p_3[3]&ci;
assign s=(p&(~g))^c;
endmodule
//booth
`timescale 1ns/1ns
module booth (a,b,out,signal);
input [9:0] a;
input [2:0] b;
output [10:0] out;
output signal;
wire [9:0] a;
wire [2:0] b;
reg [10:0] out;
reg signal;
always @(a or b)
begin
case (b)
3'b000: begin
out=11'b0;
signal=0;

end
3'b001: begin
out={a[9],a};
signal=0;
end
3'b010: begin
out={a[9],a};
signal=0;
end
3'b011: begin
out[10:0]=a<<1;
signal=0;
end
3'b100: begin
out[10:0]=(~(a<<1));
signal=1;
end
3'b101: begin
out[10]=~a[9];
out[9:0]=~a;
signal=1;
end
3'b110: begin
out[10]=~a[9];
out[9:0]=~a;
signal=1;
end
3'b111: begin
out=11'b0;
signal=0;
end
endcase
end
endmodule
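The `case` table in `booth` matches the standard radix-4 (modified Booth) recoding. As a reminder, and assuming that `b` carries the three multiplier bits $y_{2i+1}, y_{2i}, y_{2i-1}$ with $y_{-1}=0$, the selected digit $d_i$ and the resulting product for a 10-bit two's-complement multiplier $y$ can be written as:

```latex
\begin{aligned}
d_i &= -2\,y_{2i+1} + y_{2i} + y_{2i-1}, \qquad d_i \in \{-2,-1,0,1,2\},\\
x \cdot y &= \sum_{i=0}^{4} d_i\, 4^{i}\, x,\\
-x &= \overline{x} + 1 .
\end{aligned}
```

The last line is why the negative cases output only the bitwise complement and raise `signal`: the outstanding $+1$ has to be added by whichever module sums the partial products.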
//complex_mul
`timescale 1ns/1ns
module complex_mul( a,b,c,d, yr,yi);
parameter IN_WIDTH=10;
input [IN_WIDTH-1:0]a,b,c,d;
output [IN_WIDTH*2-1:0]yr,yi;
wire [IN_WIDTH:0] a1,c2,c3;
wire [IN_WIDTH-1:0] a0,c0,c1;
wire [IN_WIDTH*2-1:0] y0,y1,y2;
assign a1=a-b;
assign c2=c-d;
assign c3=c+d;

assign a0=a1[IN_WIDTH:1];
assign c0=c2[IN_WIDTH:1];
assign c1= c3[IN_WIDTH:1];
multiplier m0(.x(a0),.y(d),.result(y0));
multiplier m1(.x(c0),.y(a),.result(y1));
multiplier m2(.x(c1),.y(b),.result(y2));
assign yr=y0+y1;
assign yi=y0+y2;
endmodule
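The three `multiplier` instances in `complex_mul` follow the usual three-multiplication form of a complex product. As a check, and ignoring the one-bit right shift applied to `a1`, `c2` and `c3` (which scales both outputs by 1/2), the products expand as follows, where $a + jb$ is the first operand and $c + jd$ the second:

```latex
\begin{aligned}
y_0 &= d\,(a - b), \qquad y_1 = a\,(c - d), \qquad y_2 = b\,(c + d),\\
y_r &= y_0 + y_1 = ad - bd + ac - ad = ac - bd,\\
y_i &= y_0 + y_2 = ad - bd + bc + bd = ad + bc,\\
(a + jb)(c + jd) &= (ac - bd) + j\,(ad + bc) = y_r + j\,y_i .
\end{aligned}
```

Compared with the direct form (four multiplications and two additions), this arrangement uses three multiplications and five additions or subtractions.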
//tbcla
`timescale 1ns/1ns
module tbcla;
reg clk;
reg ci;
reg [19:0] a,b;
wire [19:0] s;
wire co;
reg [20:0] check;
cla20 c1(.a(a),.b(b),.ci(ci),.s(s),.co(co));
always #5 clk=~clk;
initial begin
clk=1'b0;
a=20'b0;
b=20'b0;
ci=1'b0;
repeat(100) begin
a=$random;b=$random;ci=1'b0;
check=a+b+ci;
#10 $display ($time, " %d+%d+%d=%d(%d)",a,b,ci,{co,s},check);
end
end
endmodule
//cl42_20
module cl42_20(a,b,c,d,ci,s,cr);
input [19:0]a,b,c,d;
input ci;
output [20:0]s;
output [20:0]cr;
wire [19:0] txr,tao,toa;
assign txr=(a^b)^(c^d);
assign tao=(a&b)|(c&d);
assign toa=(a|b)&(c|d);
assign s={txr[19],txr}^{toa,ci};
assign cr=({txr[19],txr}&{toa,ci})|((~{txr[19],txr})&{tao[19],tao});
endmodule

//multiplier
`timescale 1ns/1ns
module multiplier ( x,y, result );
input [9:0]x,y;
output [19:0]result;
wire [19:0] result;
wire [9:0] a,b;
wire [10:0] w0,w1,w2,w3,w4;
wire x0,x1,x2,x3,x4;
wire [14:0] s1,s2;
wire [12:0] s3,s4;
wire [20:0] s5,s6;
wire [19:0] s7;
wire co;
assign a=x;
assign b=y;
assign result=s7;
//booth coding
booth b0(.a(a),.b({b[1:0],1'b0}),.out(w0),.signal(x0));
booth b1(.a(a),.b(b[3:1]),.out(w1),.signal(x1));
booth b2(.a(a),.b(b[5:3]),.out(w2),.signal(x2));
booth b3(.a(a),.b(b[7:5]),.out(w3),.signal(x3));
booth b4(.a(a),.b(b[9:7]),.out(w4),.signal(x4));
//******************first line with 3:2 compressor w0_w1_w2 w3_w4_x4************//
csa_15 c1(.a({{4{w0[10]}},w0}),.b({{2{w1[10]}},w1,1'b0,x0}),.ci({w2,1'b0,x1,2'b0}),.s(s1),.co(s2));
csa_13 c2(.a({{2{w3[10]}},w3}),.b({w4,1'b0,x3}),.ci({10'b0,x4,2'b0}),.s(s3),.co(s4));
//********************second line with 4:2 compressor******************//
cl42_20 c3(.a({{5{s1[14]}},s1}),.b({{4{s2[14]}},s2,1'b0}),.c({{s3[12]},s3,1'b0,x2,4'b0}),.d({s4,7'b0}),
.ci(1'b0),.s(s5),.cr(s6));
//******************** leading carry adder**********************************//
cla20 cla(.a({s5[19:0]}),.b({s6[18:0],1'b0}),.ci(1'b0),.s(s7),.co(co));
endmodule
//inverter
`timescale 1ns/1ns
module inverter( count, a_r,a_i, a1_r,a1_i);
parameter IN_WIDTH=10;
input count;
input [IN_WIDTH-1:0] a_r,a_i;
output [IN_WIDTH-1:0] a1_r,a1_i;
wire[IN_WIDTH-1:0] a1_r,a1_i;
assign a1_r=(count)?a_i:a_r;
assign a1_i=(count)?(-a_r):a_i;
endmodule

//bm
`timescale 1ns/1ns
module bm(clk,reset,address, ar,ai,br,bi, r16,i16, r0,i0);
parameter IN_WIDTH=10;
input clk,reset;
input [4:0]address;
input [IN_WIDTH-1:0] ar,ai;
input [IN_WIDTH-1:0] br,bi;
output [IN_WIDTH-1:0] r16,i16;
output [IN_WIDTH-1:0] r0,i0;
wire [IN_WIDTH-1:0] r16,i16;
wire [IN_WIDTH-1:0] r0,i0;
wire [IN_WIDTH-1:0] yr0,yi0,yr16,yi16;
wire [IN_WIDTH*2-1:0] yr,yi;
wire [IN_WIDTH-1:0] wr,wi;
butterfly b1(.a_r(ar),.a_i(ai),.b_r(br),.b_i(bi),.a1_r(yr0),.a1_i(yi0),.b1_r(yr16),.b1_i(yi16));
twiddle1 t1(.clk(clk),.reset(reset),.address(address),.wr(wr),.wi(wi));
complex_mul m1(.a(yr16),.b(yi16),.c(wr),.d(wi),.yr(yr),.yi(yi));
assign r0=yr0;
assign i0=yi0;
assign r16=yr[IN_WIDTH*2-1:IN_WIDTH];
assign i16=yi[IN_WIDTH*2-1:IN_WIDTH];
endmodule
// opo2
`timescale 1ns/1ns
module opo2(p2,g2,p1,g1,p3,g3);
input p1,p2,g1,g2;
output p3,g3;
assign p3=p2&p1;
assign g3=g2|(g1&p2);
endmodule
//opo3
`timescale 1ns/1ns
module opo3(p3,p2,p1,g3,g2,g1,p4,g4);
input p1,p2,p3,g1,g2,g3;
output p4,g4;
assign p4=p3&p2&p1;
assign g4=g3|(p3&g2)|(p3&p2&g1);
endmodule
//opo4
`timescale 1ns/1ns
module opo4(p4,p3,p2,p1,g4,g3,g2,g1,p5,g5);
input p4,p3,p2,p1,g4,g3,g2,g1;
output p5,g5;
assign p5=p4&p3&p2&p1;
assign g5=g4|p4&g3|p4&p3&g2|p4&p3&p2&g1;

endmodule
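The `opo2`, `opo3` and `opo4` cells merge the generate/propagate pairs of adjacent bit groups, with the highest-numbered inputs belonging to the most significant group. Written out with $+$ as OR and juxtaposition as AND, the cell equations and the way a carry-lookahead adder such as `cla20` uses the merged group signals are:

```latex
\begin{aligned}
\texttt{opo2:}\quad (G, P) &= (g_2 + p_2 g_1,\; p_2 p_1)\\
\texttt{opo3:}\quad (G, P) &= (g_3 + p_3 g_2 + p_3 p_2 g_1,\; p_3 p_2 p_1)\\
\texttt{opo4:}\quad (G, P) &= (g_4 + p_4 g_3 + p_4 p_3 g_2 + p_4 p_3 p_2 g_1,\; p_4 p_3 p_2 p_1)\\
c_{i+1} &= G_{i:0} + P_{i:0}\, c_{in}, \qquad s_i = a_i \oplus b_i \oplus c_i
\end{aligned}
```

In `cla20` the bit-level signals are `p=a|b` and `g=a&b`, so the sum expression `(p&(~g))^c` is the same as $a \oplus b \oplus c$.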
//csa13
module csa_13(a,b,ci,s,co);
input[12:0] a,b,ci;
output[12:0] s,co;
assign s=a^b^ci;
assign co=(a&b)|(a&ci)|(b&ci);
endmodule
//dataout
`timescale 1ns/1ns
module dataout ( clk,reset,first_r,first_i, last_r,last_i, dout_re,dout_im,dout_valid);
input clk,reset;
input [9:0]first_r,first_i,last_r,last_i;
output [9:0]dout_re,dout_im;
output dout_valid;
reg [9:0]dout_re,dout_im;
reg dout_valid;
reg flag;
reg [6:0]count2;
reg count;
always @(posedge clk or negedge reset)
begin
if (!reset)
count2<=7'b1111111;
else if(flag==0)
count2<=count2+1;
end
always @(posedge clk or negedge reset)
begin
if (!reset)
flag<=0;
else if(count2==7'b1111101)
flag<=1;
end
always @(posedge clk or negedge reset)
begin
if (!reset)
count<=1;
else
count<=count+1;
end
always @(posedge clk or negedge reset)
begin
if(!reset) begin
dout_re<=10'b0; dout_im<=10'b0;
end

else begin
if(count==0) begin
dout_re<=first_r; dout_im<=first_i;
end
else begin
dout_re<=last_r;dout_im<=last_i;
end
end
end
always @(posedge clk or negedge reset)
begin
if(!reset)
dout_valid<=0;
else if(flag==1)
dout_valid<=1;
else
dout_valid<=0;
end
endmodule
//twiddle
`timescale 1ns/1ns
module twiddle1(clk,reset,address,wr,wi);
parameter IN_WIDTH=10;
parameter mem0=10'b0111111111;
parameter mem1=10'b0111111101;
parameter mem2=10'b0111110110;
parameter mem3=10'b0111101001;
parameter mem4=10'b0111011001;
parameter mem5=10'b0111000011;
parameter mem6=10'b0110101001;
parameter mem7=10'b0110001011;
parameter mem8=10'b0101101010;
parameter mem9=10'b0101000100;
parameter mem10=10'b0100011100;
parameter mem11=10'b0011110001;
parameter mem12=10'b0011000011;
parameter mem13=10'b0010010100;
parameter mem14=10'b0001100011;
parameter mem15=10'b0000110010;
input clk,reset;
input [4:0]address;
output [IN_WIDTH-1:0] wr,wi;
reg [IN_WIDTH-1:0] wr,wi;
always @(posedge clk or negedge reset)
if (!reset) begin
wr<=0;wi<=0;

end
else
begin
case(address)
5'd0 : begin wr<=mem0;wi<=0; end
5'd1 : begin wr<=mem1;wi<=-mem15;end
5'd2 : begin wr<=mem2;wi<=-mem14; end
5'd3 : begin wr<=mem3;wi<=-mem13; end
5'd4 : begin wr<=mem4;wi<=-mem12;end
5'd5 : begin wr<=mem5;wi<=-mem11;end
5'd6 : begin wr<=mem6;wi<=-mem10;end
5'd7 : begin wr<=mem7;wi<=-mem9;end
5'd8 : begin wr<=mem8;wi<=-mem8;end
5'd9 : begin wr<=mem9;wi<=-mem7;end
5'd10 : begin wr<=mem10;wi<=-mem6;end
5'd11 : begin wr<=mem11;wi<=-mem5;end
5'd12 : begin wr<=mem12;wi<=-mem4;end
5'd13 : begin wr<=mem13;wi<=-mem3;end
5'd14 : begin wr<=mem14;wi<=-mem2;end
5'd15 : begin wr<=mem15;wi<=-mem1;end
5'd16 : begin wr<=0;wi<=-mem0; end
// addresses 17..31: wr=cos(2*pi*k/64)=-mem(32-k), wi=-sin(2*pi*k/64)=-mem(k-16)
5'd17 : begin wr<=-mem15;wi<=-mem1;end
5'd18 : begin wr<=-mem14;wi<=-mem2;end
5'd19 : begin wr<=-mem13;wi<=-mem3;end
5'd20 : begin wr<=-mem12;wi<=-mem4;end
5'd21 : begin wr<=-mem11;wi<=-mem5;end
5'd22 : begin wr<=-mem10;wi<=-mem6;end
5'd23 : begin wr<=-mem9;wi<=-mem7;end
5'd24 : begin wr<=-mem8;wi<=-mem8;end
5'd25 : begin wr<=-mem7;wi<=-mem9;end
5'd26 : begin wr<=-mem6;wi<=-mem10;end
5'd27 : begin wr<=-mem5;wi<=-mem11;end
5'd28 : begin wr<=-mem4;wi<=-mem12;end
5'd29 : begin wr<=-mem3;wi<=-mem13;end
5'd30 : begin wr<=-mem2;wi<=-mem14;end
5'd31 : begin wr<=-mem1;wi<=-mem15;end
endcase
end
endmodule
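For reference, and assuming a 64-point transform with coefficients stored as 10-bit two's-complement fractions scaled by $2^9$ (an interpretation inferred from the constant values, for example `mem8` = 362 ≈ 512·cos(π/4)), the quantities behind the `mem` constants and the twiddle factors can be written as:

```latex
\begin{aligned}
W_{64}^{k} &= \cos\!\left(\tfrac{2\pi k}{64}\right) - j\,\sin\!\left(\tfrac{2\pi k}{64}\right), \qquad k = 0,\dots,31,\\
\texttt{mem}_m &\approx 2^{9}\cos\!\left(\tfrac{2\pi m}{64}\right), \qquad m = 0,\dots,15 \quad (\texttt{mem0} = 511),\\
2^{9}\sin\!\left(\tfrac{2\pi m}{64}\right) &= 2^{9}\cos\!\left(\tfrac{2\pi (16-m)}{64}\right) \approx \texttt{mem}_{16-m}, \qquad m = 1,\dots,15 .
\end{aligned}
```

Because sine and cosine share the same quarter-wave samples, sixteen stored constants are enough to cover all 32 addresses.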
//tbmul
`timescale 1ns/1ns
module tbmul;
reg clk,reset;
reg [9:0] x,y;
wire [19:0] result;
reg [19:0] check;

multiplier m0(
.x(x),
.y(y),
.result(result)
);
always #20 clk=~clk;
initial begin
clk=0;
reset=1;
x=-10'd15;
y=10'd30;
#5 reset=0;
#20 reset=1;
#15;
//check=x*y;
repeat(100) begin
x=x+20;y=y+30;
check=x*y;
#40;
end
end
endmodule

VITA

Bhavishya Murukutla was born in Guntur, Andhra Pradesh, India. She graduated with a Bachelor's degree in Electronics and Communication Engineering from JNTU Kakinada University, Kakinada, India, in May 2012. After completing her Bachelor's degree, she moved to the United States of America in August 2012 to pursue her Master of Science in Electrical Engineering at Texas A&M University–Kingsville. She is scheduled to graduate in December 2013.
