I. INTRODUCTION
Ultra-Wideband (UWB) technology brings the convenience
and mobility of wireless communications to high-speed interconnects in devices throughout the digital home and office
[1]. The Multiband-OFDM standard is one solution for UWB
technology. A proposal for a Multiband-OFDM UWB standard
was published by the IEEE 802.15.3a study group [2]. In December
2007, the second edition of Standard ECMA-368 was
released, which specifies the physical layer (PHY) and medium
access control layer (MAC) of UWB technology based on
Multiband-OFDM [3].
Several key issues must be solved to design a CMOS-based
Multiband-OFDM UWB solution that meets the low-power
requirement. One of them concerns the FFT (Fast
Fourier Transform) block, which accounts for 25% of the design complexity
of the total digital baseband transceiver [4]. Although many
results have been published in this research area over the
past few years [5], [6], [7], the area and power consumption
of the FFT block still need to be improved, since the system
targets portable wireless devices. Therefore, this paper
focuses on reducing area and power consumption
under the ECMA-368 standard requirements. Section II describes the requirements for the FFT block and the algorithm
on which the proposed design is based. Section III presents
the proposed FFT solution at the algorithm,
architecture, and implementation levels. Section
IV shows synthesis results for both FPGA and
ASIC targets and compares them with other
published implementations.
II. BACKGROUND
A. The Requirements of FFT for Multiband OFDM System
According to ECMA-368, the required sampling frequency is 528 MHz and the total number of subcarriers, which
determines the FFT size, is 128. The time available for
the IFFT and FFT is 242.42 ns, which is the FFT size multiplied by the inverse of the sampling
frequency ($T_{FFT} = 128 \cdot \frac{1}{f_s}$). There
are 37 zero-padded suffix samples, which take 70.08 ns. So the
total symbol interval is 312.5 ns ($T_{SYM} = T_{FFT} + T_{ZPS}$).
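The timing figures above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the values quoted in this section (528 MHz, 128 subcarriers, 37 suffix samples); the constant and function names are illustrative and are not taken from ECMA-368.

```python
from fractions import Fraction

# Values quoted in the text above; the names are illustrative assumptions.
FS_HZ = 528_000_000      # sampling frequency f_s
FFT_SIZE = 128           # number of subcarriers / FFT points
ZPS_SAMPLES = 37         # zero-padded suffix samples


def duration_ns(samples, fs_hz=FS_HZ):
    """Duration of `samples` sample periods in nanoseconds, as an exact fraction."""
    return Fraction(samples * 10**9, fs_hz)


t_fft = duration_ns(FFT_SIZE)      # T_FFT = 128 / f_s
t_zps = duration_ns(ZPS_SAMPLES)   # T_ZPS = 37 / f_s
t_sym = t_fft + t_zps              # T_SYM = T_FFT + T_ZPS

print(f"T_FFT = {float(t_fft):.2f} ns")
print(f"T_ZPS = {float(t_zps):.2f} ns")
print(f"T_SYM = {float(t_sym):.2f} ns")
```

Exact fractions are used so that the symbol interval comes out as exactly 312.5 ns rather than a rounded sum of two rounded terms.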
The word length is a critical choice in FFT processor
design. It is determined directly by the trade-off between chip area and
signal-to-quantization-noise ratio (SQNR).
Based on the analyses in [5] and [8], the word
length is set to 10 bits in this paper, for simulation and
for comparison with those designs.
B. The Selection of FFT Algorithms
The traditional radix-2 FFT algorithms have a simple structure
and a clear data flow; they are easy to implement and are
suitable for generic FFT implementations. Nevertheless, these
algorithms need a large memory to store data at the inner stages,
which costs considerable power and area. Currently,
there are two trends in FFT implementation for OFDM systems:
mixed-radix algorithms, such as [7], and pipeline-structure-based
algorithms, such as [9]. Based on extensive
algorithm analysis and selection, the proposed design employs
the Radix 2² algorithm developed by He and Torkelson [10],
which integrates the twiddle factor decomposition every two
stages. The Radix 2² algorithm has the same multiplicative
complexity as the radix-4 algorithm but retains the butterfly
structure of the radix-2 algorithm, which makes it very suitable for ASIC
implementation.
The detailed derivation of the algorithm can be found in [10]. Its
application to an 8-point FFT, shown in Figure 1, is used here to briefly explain the
algorithm. In this application the
Radix 2² algorithm is used only once, for the first two stages,
because an 8-point DFT can be decomposed only once by radix
4. For the last stage, the normal radix-2 DIF algorithm is used. With
the Radix 2² algorithm, the complex multiplication by the twiddle factor in the first stage is replaced by a multiplication by (-j).
Therefore, in a pipeline structure, one complex multiplier can
be saved for the 8-point FFT.
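The 8-point case can be written out directly. The sketch below follows the index decomposition of [10] (n = 4·n1 + 2·n2 + n3, k = k1 + 2·k2 + 4·k3) and is checked against a direct DFT; it is an illustration under those assumptions, not the hardware data path of the proposed design, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Reference O(N^2) DFT, used only to check the result."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def mul_minus_j(z):
    """Multiply by -j with a real/imaginary swap and one negation (no multiplier)."""
    return complex(z.imag, -z.real)


def fft8_radix22(x):
    """8-point DIF FFT: one radix-2^2 step (stages 1 and 2), then one radix-2 stage."""
    assert len(x) == 8
    out = [0j] * 8
    for k1 in (0, 1):
        sign1 = 1 if k1 == 0 else -1
        for k2 in (0, 1):
            g = []
            for n3 in (0, 1):
                # Stage 1 butterflies: inputs are N/2 = 4 samples apart.
                a = x[n3] + sign1 * x[n3 + 4]
                b = x[n3 + 2] + sign1 * x[n3 + 6]
                # Trivial twiddle between stages 1 and 2: -j on the k1 = 1 branch.
                if k1 == 1:
                    b = mul_minus_j(b)
                # Stage 2 butterfly.
                h = a + b if k2 == 0 else a - b
                # The only full twiddle factor: W_8^(n3 * (k1 + 2*k2)).
                w = cmath.exp(-2j * cmath.pi * n3 * (k1 + 2 * k2) / 8)
                g.append(h * w)
            # Stage 3: ordinary radix-2 butterfly over n3.
            out[k1 + 2 * k2] = g[0] + g[1]
            out[k1 + 2 * k2 + 4] = g[0] - g[1]
    return out


x = [complex(n, 7 - n) for n in range(8)]
max_err = max(abs(a - b) for a, b in zip(fft8_radix22(x), dft(x)))
print("max |error| vs. direct DFT:", max_err)
```

The only general complex multiplication sits after the second stage; the factor between the first two stages reduces to the swap in `mul_minus_j`, which is where the saved multiplier comes from.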
[Figs. 1 to 5: images and captions not recovered.]
The last block includes only the seventh stage. Because the
odd and even data need to be commutated, two demultiplexers
seem to be required to switch the data, as shown in
Figure 3. However, this can be improved by analyzing the
scheduling of the last stage. Only one
butterfly is working per clock cycle, and the first output
of the even path is processed with the first output of the
odd path of the 6th stage. As long as the timing is matched,
the even-path outputs are processed with the corresponding
odd-path outputs. Therefore, the two demultiplexers are
not necessary, and only one butterfly is required in the last stage
to process the data. The modified structure of the last stage
and its interface with the previous stage is shown in Figure 6.
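The scheduling argument can be illustrated in software. The sketch below assumes that the even path carries the even-indexed and the odd path the odd-indexed outputs of the 6th stage, so that the last radix-2 stage pairs the m-th sample of each path; this mapping and the function names are assumptions made for illustration, not a description of the actual circuit.

```python
def last_stage_reference(v):
    """Distance-1 radix-2 DIF butterflies applied to a flat vector."""
    out = list(v)
    for i in range(0, len(v), 2):
        out[i], out[i + 1] = v[i] + v[i + 1], v[i] - v[i + 1]
    return out


def last_stage_streamed(even_path, odd_path):
    """One butterfly per clock: pair the m-th even-path and odd-path samples."""
    out = []
    for e, o in zip(even_path, odd_path):   # one loop iteration = one clock cycle
        out.extend((e + o, e - o))
    return out


# Hypothetical stage-6 output for a 128-point transform.
stage6 = [complex(i, -i) for i in range(128)]
streamed = last_stage_streamed(stage6[0::2], stage6[1::2])
print(streamed == last_stage_reference(stage6))
```

Because the two paths deliver matching operands in the same cycle, no reordering step is needed between them, which mirrors the argument for dropping the demultiplexers.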
IV. IMPLEMENTATION AND RESULT ANALYSIS
A. FPGA Implementation
The proposed design is synthesized and implemented with
Xilinx ISE, targeting a Xilinx Virtex-4 FPGA. The arithmetic blocks are mapped directly to
DSP48 components in the Virtex-4. Table I lists the resource usage of the proposed implementation in comparison
with [7]. The table clearly shows the reduced resource count
of the proposed design compared with the implementation in
[7]. The reason is that the proposed design employs far fewer
memory blocks and complex multipliers.

TABLE I
THE COMPARISON WITH [7]

                              [7]       proposed
Word length (bits)            11        10
(label not recovered)         7390      717
(label not recovered)         3860      457
(label not recovered)         12749     2230
DSP48s                        48        20

[Fig. 6: image and caption not recovered.]

TABLE II
THE ASIC IMPLEMENTATION COMPARISON

                              proposed implementation   [8]               [12]
Technology                    90 nm, 1 V                0.18 µm, 1.8 V    0.18 µm, 1.8 V
Clock frequency (MHz)         264                       450               250
Parallel data format          2 data-path               2 data-path       4 data-path
Algorithm                     Radix 2²                  Radix 2⁴          Mixed Radix
Word length (bits)            10                        10                10
Complex multipliers           5                         2+0.41            2+2.48
Registers                     128                       190
Gates                         38540                     70000
Area (µm²)                    181140                    -                 2466382
Area (µm²) scaled for 90 nm   181140                                      616595.5
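The scaled-area row of Table II and the gate-count ratio quoted for [8] follow from simple arithmetic. The sketch below assumes ideal linear scaling, in which area shrinks with the square of the feature size (a factor of 4 from 180 nm to 90 nm); the function name `scale_area` is illustrative, and the numbers are the ones listed in Table II.

```python
def scale_area(area_um2, from_nm, to_nm):
    """Ideal linear scaling: area changes with the square of the feature size."""
    return area_um2 * (to_nm / from_nm) ** 2


# Values taken from Table II.
AREA_PROPOSED_UM2 = 181140      # proposed design, 90 nm
AREA_REF12_UM2 = 2466382        # design of [12], 180 nm
GATES_PROPOSED = 38540
GATES_REF8 = 70000

area_ref12_scaled = scale_area(AREA_REF12_UM2, 180, 90)
gate_ratio = GATES_PROPOSED / GATES_REF8
area_ratio = area_ref12_scaled / AREA_PROPOSED_UM2

print(f"[12] scaled to 90 nm: {area_ref12_scaled} um^2")
print(f"gates, proposed / [8]: {gate_ratio:.1%}")
print(f"scaled area, [12] / proposed: {area_ratio:.2f}x")
```

Real processes rarely scale this ideally, so the scaled figure is best read as a rough normalization rather than a prediction of what [12] would occupy at 90 nm.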
The word length used is lower than in [7]. However, even when
the word length of the proposed design is increased to 15 bits, the total
equivalent gate count is still much lower than that of [7]. At 15 bits,
the proposed design uses 1052 slice registers, 3600 4-input LUTs, and
20 DSP48s.
B. ASIC Targeted Results
The proposed design is also synthesized with Synopsys Design Compiler, targeting an ASIC implementation.
The synthesis library is the Faraday 90 nm standard cell library
[11], which is tailored for the UMC 90 nm logic LL-RVT (Low-K)
process. During the implementation stage of our processor, [8]
was published, which employs a similar parallel structure.
However, there are some key differences between the two
architectures. The most important differences are in the first
and last stages, where the proposed design reduces the number
of shift registers and the latency of the processor. Table II
lists the performance of the proposed implementation in
comparison with other state-of-the-art designs. The table shows
that the gate count of the proposed design is only
55% of that of [8]. If a 180 nm design is scaled linearly to 90
nm, its area is reduced by a factor of 4. Hence, the design of
[12] in 180 nm would correspond to an area of 616595.5 µm² in
90 nm technology, which is still much larger than that of the proposed
design.
V. CONCLUSION
In this paper, a novel parallel pipeline FFT processor is
designed for the ECMA-368 standard. Our architecture is
based on a revised version of the Radix 2² algorithm. Our
revision amounts to restructuring the associated signal flow
graph into an even and an odd part. As such, it achieves not only
the low multiplier count of the standard Radix 2² algorithms, but
also a 50% reduction of the clock frequency and the lowest
circular buffer count compared to the traditional SDF architectures. Both FPGA- and ASIC-targeted synthesis results of this
architecture are presented. The results show that the proposed
design substantially reduces the required area.
REFERENCES
[1] Intel, "Ultra-wideband (UWB) technology," http://www.intel.com/technology/comms/uwb/.
[2] A. Batra et al., "Multi-band OFDM physical layer proposal for IEEE
802.15 Task Group 3a," Tech. Rep., IEEE P.802.15-04/0493r0, 2004.
[3] Standard ECMA-368: High Rate Ultra Wideband PHY and MAC Standard, 2nd Edition.
[4] A. Batra, J. Balakrishnan, G. Aiello, J. Foerster, and A. Dabak, "Design
of a multiband OFDM system for realistic UWB channel environments,"
Microwave Theory and Techniques, IEEE Transactions on, vol. 52, no. 9,
pp. 2123-2138, Sept. 2004.
[5] Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, "A 1-GS/s FFT/IFFT processor
for UWB applications," Solid-State Circuits, IEEE Journal of, vol. 40,
no. 8, pp. 1726-1735, Aug. 2005.
[6] R. Chidambaram, "A scalable and high-performance FFT processor,
optimized for UWB-OFDM," Master's thesis, Delft University of Technology, 2005.
[7] N. Rodrigues, H. Neto, and H. Sarmento, "A OFDM module for a
MB-OFDM receiver," Design & Technology of Integrated Systems in
Nanoscale Era, 2007. DTIS. International Conference on, pp. 25-29,
Sept. 2007.
[8] J. Lee and H. Lee, "A High-Speed Two-Parallel Radix-2⁴ FFT/IFFT
Processor for MB-OFDM UWB Systems," IEICE Trans. Fundamentals,
vol. E91-A, no. 4, pp. 1206-1211, 2008.
[9] E. Saberinia, K. C. Chang, G. Sobelman, and A. H. Tewfik, "Implementation of a Multi-band Pulsed-OFDM Transceiver," J. VLSI Signal
Process. Syst., vol. 43, no. 1, pp. 73-88, 2006.
[10] S. He and M. Torkelson, "A new approach to pipeline FFT processor,"
Parallel Processing Symposium, 1996., Proceedings of IPPS '96, The
10th International, pp. 766-770, Apr. 1996.
[11] Faraday, FSD0A A 90 nm Logic SP-RVT (Low-K) Process. Faraday Technology Corporation, 2006.
[12] T. Chakraborty and S. Chakrabarti, "A reduced area 1 GSPS FFT design
using MRMDF architecture for UWB communication," in Circuits and
Systems, 2008. APCCAS 2008. IEEE Asia Pacific Conference on, Nov. 30
2008-Dec. 3 2008, pp. 1128-1131.