Sang PSR

Architecture for Real-Valued Signal Based on Radix-2 Decimation Algorithm

S. Sangeetha, PG student (VLSI Design), Department of ECE, Dr. Sivanthi Aditanar College of Engineering, Tiruchendur, India. Sangeetha.sivaraman24@gmail.com
Ms. A. Sheeba, Assistant Professor, Department of ECE, Dr. Sivanthi Aditanar College of Engineering, Tiruchendur, India. sheeba_a01@yahoo.com
Ms. C. K. Balasundari, Assistant Professor, Department of ECE, Dr. Sivanthi Aditanar College of Engineering, Tiruchendur, India. ckbalasundari90@gmail.com
Abstract: This brief presents an architecture for Real Fast Fourier Transform (RFFT) computation of real-valued signals based on the radix-2 and radix-4 decimation algorithms. The stages of the RFFT architecture are partitioned to maximize the usage of the Processing Elements (PEs). The RFFT architecture uses the properties of the FFT to eliminate redundant operations, which saves memory. In addition, it reduces the number of multipliers in the RFFT. The simulation analyses are performed using Xilinx ISE 14.3 for fixed-point and floating-point numbers, and the numerical accuracies are compared.

Index Terms: Fast Fourier Transform (FFT), Real Fast Fourier Transform (RFFT), real-valued signals.

I. INTRODUCTION
The Fast Fourier Transform (FFT) is increasingly used for computing the DFT and finds application in many fields of DSP. It is also used for computing the RFFT of real-valued signals. FFTs are predominantly used in applications such as audio, video, speech and image processing, where they are performed on real-valued signals. The real-valued case is widely used because most signals are real-valued, for example in biomedical signal processing. The FFT of a real-valued signal exhibits conjugate symmetry, which makes half of the FFT outputs redundant: when the input samples are real-valued, the spectrum is symmetric and nearly half of the operations in the computation are redundant.

A. RFFT architecture
RFFT architectures can be classified into two categories: pipelined and memory-based. A memory-based architecture is also referred to as an in-place or continuous-flow architecture. A pipelined architecture uses either feedback or feed-forward data paths. The feed-forward architecture is termed multi-path delay commutator, whereas the feedback architecture is termed single-path delay feedback. Pipelined architectures are more useful where high throughput is required. A memory-based architecture uses few processing elements and therefore occupies a smaller area; its design effort concentrates on conserving memory, and memory re-use is adopted here for better conservation of memory space and effective utilization of resources. W corresponds to the word length used. Memory-based architectures are suitable for applications where a trade-off between hardware cost, performance and speed is acceptable. They are well suited for large RFFT computations and for low- and moderate-speed applications. Increasing the number of processing elements greatly enhances performance.

B. Decimation algorithm
FFT algorithms are classified into two types: Decimation-In-Time (DIT) and Decimation-In-Frequency (DIF). In DIT, the input samples are in bit-reversed order, while the output samples are obtained in natural order. The twiddle factors are multiplied first, and the corresponding addition or subtraction is done afterwards. The flow of DIT is shown in Fig. 1a.

Fig. 1a. DIT-FFT

In DIF, the input samples are in natural order while the output samples are obtained in bit-reversed order; addition or subtraction is done first, followed by multiplication by the twiddle factors. The twiddle-factor multiplication increases the complexity of the RFFT computation as well as the FFT computation. The flow of DIF is shown in Fig. 1b.

Fig. 1b. DIF-FFT

In this brief, we propose an architecture that computes the RFFT based on the radix-2 and radix-4 decimation-in-time (DIT) algorithms. The flow of data in a single line for radix-4 is shown in Fig. 2a. The flow graph for radix-4 is shown in Fig. 2b. The DIT RFFT algorithm requires the lowest number of operations for radix-2 and radix-4 because it computes only half of the output samples. The algorithm separates the data into real and imaginary components. As a result, the word length
of the required memory can be W instead of 2W, where W is the word length chosen to represent the data, which can be either a real or an imaginary component. This reduces the amount of storage space; in addition, the redundant operations are reduced by half. A continuous-flow FFT architecture based on a modified radix-2 algorithm [1] has recently been proposed to achieve a lower area-time (AT) product for the RFFT, where area corresponds to the data-path area and time corresponds to the number of cycles required for the computation. Lowering AT enhances the efficiency of the RFFT computation. The RFFT architecture in [12] does not achieve better hardware utilization.

Fig. 2a. Flow of Radix-4 FFT (twiddle factors W^n, W^2n, W^3n)

This brief is organized as follows. Section II first describes the RFFT algorithm and then presents the RFFT flow graphs for the radix-2 and radix-4 DIT algorithms, which achieve fewer computations. The numerical accuracies of fixed-point and floating-point numbers for the DIT algorithms are compared in Section III. The implementation results for radix-2 and radix-4 are shown in Section IV. Section V compares the architecture with other radices and compares the computation times of multiplication followed by addition and of addition followed by multiplication. Section VI concludes the brief.

Fig. 2b. Flow-Graph of Radix-4 FFT (twiddle factors W_N^0, W_N^q, W_N^2q, W_N^3q)

II. RFFT
The N-point DFT of a sequence x[n] is defined as

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \quad 0 \le k \le N-1, \qquad W_N^{nk} = e^{-j(2\pi/N)nk} \tag{1}$$

The RFFT considers the input sequence x[n] to be real for all values of n, that is, $x[n] \in \mathbb{R}$. If x[n] is real, then the output satisfies

$$X[N-k] = X^{*}[k]$$

The RFFT can be considered as a conventional FFT with the additional conditions

$$\operatorname{Im}(x[n]) = 0 \tag{2}$$

$$X[N-k] = X^{*}[k] \tag{3}$$

These two conditions distinguish the complex FFT (CFFT) from the RFFT. The RFFT computes (1) while exploiting these conditions to reduce the number of operations, which is achieved by using dedicated RFFT algorithms. Most of them are derived from the CFFT by applying the properties of the RFFT to remove the redundant operations. These algorithms were defined for the decimation-in-time (DIT) decomposition of the FFT. The DIT FFT has the property that the samples at each intermediate stage can be computed using the conventional FFT, so only one half of the intermediate outputs must be calculated, whereas the rest of the intermediate values can be obtained by conjugating them. These algorithms do not hold for the decimation-in-frequency (DIF) algorithm because property (3) cannot be applied at each stage, since the twiddle-factor multiplication is done after the addition or subtraction. On the other hand, it has been shown that the same savings can be obtained for the DIF decomposition using an alternative algorithm [2] that uses linear-phase sequences. In these algorithms the number of multiplications is reduced to almost half of that required for the CFFT, and the number of additions is less than half of that used for the CFFT. As a result, one half of the memory is needed. The algorithms differ only slightly in the number of operations and in the order in which the computations are executed.

The flow graph for an 8-point radix-2 DIT FFT is shown in Fig. 3. It computes all the redundant operations; as a result, more complex adders, multipliers and subtractors are required, which adversely affects the memory requirements. RFFTs are used to solve this problem. In the RFFT, the redundant operations are skipped: one half of the computations are performed and the rest are obtained by taking the complex conjugate. In both the FFT and the RFFT, two real values are obtained at the final stage.
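The saving promised by conditions (2) and (3) can be illustrated with a small software sketch. The code below is not the proposed hardware architecture; it is a direct evaluation of (1) in Python, and the function names (`dft`, `rfft_half`, `expand_half`) and the test input are assumptions chosen only for illustration. It computes the outputs X[0] to X[N/2] of a real sequence and recovers the remaining outputs by conjugation.

```python
import cmath

def dft(x):
    """Direct N-point DFT per (1): X[k] = sum_n x[n] * exp(-2j*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def rfft_half(x):
    """Compute only X[0..N/2] of a real sequence (the non-redundant outputs)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N // 2 + 1)]

def expand_half(half, N):
    """Recover X[N/2+1..N-1] from the computed half using X[k] = conj(X[N-k]).

    Assumes N is even and len(half) == N // 2 + 1.
    """
    return half + [half[N - k].conjugate() for k in range(N // 2 + 1, N)]

# Illustrative real-valued input (an assumption, not taken from the paper).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

full = dft(x)                                  # all N outputs, computed directly
rebuilt = expand_half(rfft_half(x), len(x))    # N/2+1 outputs plus conjugates
max_err = max(abs(a - b) for a, b in zip(full, rebuilt))
print("largest difference between the two spectra:", max_err)
```

Only N/2 + 1 of the N outputs are evaluated here, which mirrors the reduction in operations and storage described above, although a real RFFT algorithm applies the same idea at every intermediate stage instead of only at the output.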
Fig. 3. Flow-Graph for a radix-2 8 Point FFT (inputs, top to bottom: x(0), x(2), x(4), x(6), x(1), x(3), x(5), x(7); outputs: X[0] to X[7])
Fig. 5. Flow-Graph for an 8-point radix-4 RFFT
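As a software counterpart to the radix-2 DIT flow graph, the following is a minimal sketch of an iterative radix-2 DIT FFT in Python. It is a textbook formulation and not the authors' Verilog design; the function names are hypothetical. It follows the DIT ordering described earlier: the input is first permuted into bit-reversed order, and in every butterfly the twiddle factor is multiplied before the addition and subtraction.

```python
import cmath

def bit_reverse_indices(N):
    """Return the bit-reversed index order for a power-of-two length N."""
    bits = N.bit_length() - 1
    return [int(format(i, "0{}b".format(bits))[::-1], 2) for i in range(N)]

def fft_radix2_dit(x):
    """Iterative radix-2 DIT FFT with log2(N) butterfly stages."""
    N = len(x)
    if N < 2 or N & (N - 1):
        raise ValueError("length must be a power of two")
    # Stage 0: reorder the input samples into bit-reversed order.
    a = [complex(x[i]) for i in bit_reverse_indices(N)]
    size = 2
    while size <= N:                      # one pass per stage
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, N, size):
            w = 1 + 0j
            for j in range(half):
                t = w * a[start + j + half]   # twiddle multiplication first
                u = a[start + j]
                a[start + j] = u + t          # then addition
                a[start + j + half] = u - t   # and subtraction
                w *= w_step
        size *= 2
    return a

# Illustrative real-valued input (an assumption, not taken from the paper).
X = fft_radix2_dit([1, 2, 3, 4, 5, 6, 7, 8])
```

For N = 8 the loop runs three times, one pass per stage, and the outputs appear in natural order. This version computes every butterfly, including the redundant ones that an RFFT removes.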
The flow graph for an 8-point RFFT is shown in Fig. 4. In the RFFT, no multiplier is used when the twiddle factor has the value W_N^0. The dark lines represent the data propagation paths. The dashed lines represent redundant operations in the computation, and hence these paths can be eliminated.

A. Basics of RFFT algorithm
Fig. 3 shows the flow graph of an 8-point radix-2 FFT, decomposed according to decimation in time (DIT) [9]. The radix-2 flow graph is divided into n stages, where n = log2 N; here N is the size of the N-point RFFT and 2 is the radix. Each stage consists of a set of butterflies [7]. The numbers at the input and the output of the graph represent the indices of the input and output samples, respectively. When the inputs are complex, all the internal nodes and outputs of the graph are needed for the computation of the CFFT, and the regularity of the flow graph leads to efficient pipelined architectures [2]. On the other hand, if the inputs are real, the graph can be simplified according to the properties of the RFFT. The simplified flow graphs of the radix-2 and radix-4 RFFT are shown in Fig. 4 and Fig. 5.

Fig. 4. Flow-Graph for a radix-2 8 Point RFFT (inputs, top to bottom: x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7))

III. NUMERICAL ACCURACY COMPARISON
The numerical results of the Fast Fourier Transform are calculated both theoretically and practically. The theoretical calculations were done using floating-point numbers with a precision factor of 4, considering that increasing the precision factor increases the accuracy of the results and thereby lowers the truncation error. The practical calculations were done using both fixed-point and floating-point numbers. Table 1 shows the numerical comparison between the floating-point and fixed-point results of the RFFT computation. Each result is a complex value with a real and an imaginary component; the imaginary component is marked with j. The practical floating-point calculation produces the same results as the theoretical calculation.

TABLE 1 NUMERICAL COMPARISON
Theoretical value    Practical value (fixed point)    Practical value (floating point)
36                   36                               36
-4                   -4                               -4
-4+4j                -4+4j                            -4+4j
-4-4j                -4-4j                            -4-4j
-4+9.656j            -4+12j                           -4+9.6569j
-4-1.656j            -4-4j                            -4-1.656j
-4j+9.656            -4+4j                            -4j+9.656
-4j+1.656            -4+12j                           -4j+1.656

IV. SIMULATION RESULTS
The inputs are applied to a radix-2 8-point RFFT. The intermediate stage outputs are stored in a buffer; at the next time instance, the previously stored register contents are updated with the values computed by that stage of the RFFT.
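A fixed-point versus floating-point comparison of the kind reported in Table 1 can be approximated in software. The sketch below is only an illustration under stated assumptions and does not reproduce the Xilinx simulation: the input sequence 1 to 8 is assumed (the paper does not list its input), the word format with `frac_bits` fractional bits is hypothetical, and only the twiddle factors are rounded to the fixed-point grid.

```python
import cmath

def quantize(value, frac_bits):
    """Round a real number to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def dft_float(x):
    """Reference DFT using double-precision twiddle factors."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def dft_fixed(x, frac_bits):
    """Same DFT, but with each twiddle factor rounded to the fixed-point grid."""
    N = len(x)
    out = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            w = cmath.exp(-2j * cmath.pi * n * k / N)
            wq = complex(quantize(w.real, frac_bits), quantize(w.imag, frac_bits))
            acc += x[n] * wq
        out.append(acc)
    return out

def max_error(x, frac_bits):
    """Largest magnitude difference between the fixed-point and reference outputs."""
    ref = dft_float(x)
    approx = dft_fixed(x, frac_bits)
    return max(abs(a - b) for a, b in zip(ref, approx))

x = [1, 2, 3, 4, 5, 6, 7, 8]   # assumed input, chosen for illustration
for bits in (4, 8, 12):        # hypothetical fractional word lengths
    print("frac_bits=%2d  max |error| = %.6f" % (bits, max_error(x, bits)))
```

In this model the error comes from rounding the twiddle factors, so the outputs whose twiddle factors are exactly 0, +1 or -1 are unaffected, while outputs that involve other twiddle values deviate by an amount that shrinks as the number of fractional bits grows.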
Fig. 6. Output for radix-2 8-point DIT-RFFT

Fig. 7. Output for radix-4 8-point DIT-RFFT

In this paper, radix-4 and radix-2 RFFT architectures are studied and simulated in Verilog using Xilinx software. The radix-4 algorithm has an advantage over the radix-2 algorithm because a single radix-4 butterfly does the work of four radix-2 butterflies, which reduces the number of computation steps and speeds up the processing of the data. For an 8-point RFFT, radix-2 requires 3 stages whereas radix-4 requires only 2. In addition, radix-2 has the lowest number of redundant operations: the symmetry property cannot be applied in the first stage, so redundant operations appear only in the later stages (stages 2 and 3), owing to the twiddle-factor multiplication. The radix-4 algorithm, on the other hand, includes more redundant operations. In this case the number of stages is 2, because the number of stages in a radix-4 computation is log4 N (rounded up to an integer, since log4 8 = 1.5), where N is the size of the N-point RFFT and 4 is the radix.

V. COMPARISON
The timing complexities of DIT and DIF were compared. DIT performs multiplication followed by an arithmetic operation such as addition, whereas DIF performs addition followed by multiplication. The DIT algorithm takes less time than DIF; the percentage difference is shown in Table 2. It was therefore concluded that DIT is better suited to applications that require low computation time, that is, high execution speed.

TABLE 2 COMPUTATION TIME
Input size    TMA     TAM     TAM-TMA    % Difference
8-bit         1.24    1.30    0.06       4.8%
16-bit        2.29    2.48    0.19       8.30%
24-bit        3.52    3.82    0.30       8.52%
TMA: computation time of multiplication followed by addition
TAM: computation time of addition followed by multiplication

Radix-2 and radix-4 were also compared with the split-radix and mixed-radix algorithms; the numbers of multipliers and adders used are shown in Table 3.

TABLE 3 NUMERICAL ACCURACY COMPARISON
     Radix-2           Radix-4            Split                                Mixed
M    (3/2)MN-5N+8      (3/8)N log2 N      (4/3)MN-(38/9)N+6+(2/9)(-1)^M        2MN-7N+12
A    (7/2)MN-5N+8      (7/2)MN-5N+8       (8/3)MN-(16/9)N+2-(2/9)(-1)^M        3MN-3N+4
M: number of multipliers used
A: number of adders used

VI. CONCLUSION
In this paper, we propose a generalized RFFT architecture for computing the RFFT of real-valued signals using radix-2 and radix-4. Radix-4 is found to be preferable, as it meets the requirement of maximizing the utilization of the processing elements with minimal word storage. Complex multipliers, adders and subtractors are used for computing the RFFT. The results show that the RFFT processor has advantages in adaptability and hardware efficiency in terms of arithmetic complexity, and the timing computations are compared.

REFERENCES
[1] B. G. Jo and M. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911-919, May 2005.
[2] B. R. Sekhar and K. M. M. Prabhu, "Radix-2 decimation-in-frequency algorithm for the computation of the real-valued FFT," IEEE Trans. Signal Process., vol. 47, no. 4, pp. 1181-1184, Apr. 1999.
[3] H. F. Luo, Y. J. Liu, and M.-D. Shieh, "Efficient memory-addressing algorithms for FFT processor design," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2014, to be published. [Online]. Available: http://ieeexplore.ieee.org
[4] H. Murakami, "Real-valued decimation-in-time and decimation-in-frequency algorithms," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 41, no. 12, pp. 808-816, Dec. 1994.
[5] H. V. Sorensen, D. L. Jones, M. Heideman, and C. S. Burrus, "Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-35, no. 6, pp. 849-863, Jun. 1987.
[6] M. Ayinala and K. K. Parhi, "FFT architectures for real-valued signals based on radix-2^3 and radix-2^4 algorithms," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 9, pp. 2422-2430, Sep. 2013.
[7] M. Ayinala, L. Yingjie, and K. K. Parhi, "An in-place FFT architecture for real-valued signals," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 10, pp. 652-656, Oct. 2013.
[8] M. Ayinala and K. K. Parhi, "Parallel-pipelined radix-2^2 FFT architecture for real-valued signals," in Conf. Rec. 44th ASILOMAR Signals, Syst. Comput., 2010, pp. 1274-1278.
[9] M. Garrido, K. K. Parhi, and J. Grajal, "A pipelined FFT architecture for real-valued signals," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 12, pp. 2634-2643, Dec. 2009.
[10] P.-Y. Tsai and C.-Y. Lin, "A generalized conflict-free memory addressing scheme for continuous-flow parallel-processing FFT processors with rescheduling," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 12, pp. 2290-2302, Dec. 2011.
[11] S. A. Salehi, R. Amirfattahi, and K. K. Parhi, "Pipelined architectures for real-valued FFT and Hermitian-symmetric IFFT with real datapaths," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 8, pp. 507-511, Aug. 2013.
[12] Zhen-Guo Ma, Xiao-Bo Yin, and Feng Yu, "A Novel Memory-Based FFT Architecture for Real-Valued Signals Based on a Radix-2 Decimation-In-Frequency Algorithm," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 62, no. 9, pp. 507-511, Sep. 2015.
