IEEE Transactions on Consumer Electronics, Vol. 57, No. 1, February 2011

Manuscript received 01/13/11
Current version published 02/21/11
Electronic version published 02/21/11. 0098-3063/11/$20.00 © 2011 IEEE
Chu Yu, Member, IEEE, Mao-Hsu Yen, Pao-Ann Hsiung, Senior Member, IEEE,
and Sao-Jie Chen, Senior Member, IEEE

Abstract: 4G and other wireless systems are currently hot
topics of research and development in the communication
field. Broadband wireless systems based on orthogonal
frequency division multiplexing (OFDM) often require an
inverse fast Fourier transform (IFFT) to produce multiple
subcarriers. In this paper, we present an efficient implementation
of a pipeline FFT/IFFT processor for OFDM applications.
Our design adopts a single-path delay feedback style as the
proposed hardware architecture. To eliminate the read-only
memories (ROMs) used to store the twiddle factors, the
proposed architecture applies a reconfigurable complex
multiplier and bit-parallel multipliers to achieve a ROM-less
FFT/IFFT processor, thus consuming lower power than
existing works. The design uses about 33.6K gates, and its
power consumption is about 9.8 mW at 20 MHz.¹

Index Terms: FFT, IFFT, OFDM, complex multiplier.
I. INTRODUCTION
The discrete Fourier transform (DFT) is a very important
technique in modern digital signal processing (DSP) and
telecommunications, especially for applications in orthogonal
frequency division multiplexing (OFDM) systems, such
as IEEE 802.11a/g [1], Worldwide Interoperability for
Microwave Access (WiMAX) [2], Long Term Evolution
(LTE) [3], and Digital Video Broadcasting-Terrestrial
(DVB-T) [4]. However, the DFT is computationally intensive and
has a time complexity of $O(N^2)$. The fast Fourier transform
(FFT) was proposed by Cooley and Tukey [5] to efficiently
reduce the time complexity to $O(N \log_2 N)$, where N denotes
the FFT size.
For hardware implementation, various FFT processors have
been proposed [6]-[23]. These implementations can be mainly
classified into memory-based and pipeline architecture styles.
The memory-based architecture, also known as the single
processing element (PE) approach, is widely adopted to design an
FFT processor. This design style is usually composed of a
main PE and several memory units, so its hardware cost
and power consumption are both lower than those of the other
architecture style. However, this kind of architecture
has long latency and low throughput, and cannot be parallelized.
On the other hand, the pipeline architecture style can get rid
of the disadvantages of the foregoing style, at the cost of an
acceptable hardware overhead. Generally, pipeline FFT
processors have two popular design types. One uses a
single-path delay feedback (SDF) pipeline architecture, and
the other uses a multiple-path delay commutator (MDC)
pipeline architecture. The SDF pipeline FFT [6]-[7] is
attractive because it requires less memory space (about N-1
delay elements), its multiplication computation utilization is
less than 50%, and its control unit is easy to design. Such
implementations are advantageous for low-power design,
especially for applications in portable DSP devices. For these
reasons, the SDF pipeline FFT is adopted in our work.

1
This work was supported by the National Science Council, Taiwan, ROC,
under Grant NSC 98-2220-E-002-007 and NSC 99-2220-E-197-002.
Chu Yu is with the Department of Electronic Engineering, National Ilan
University, Yilan, Taiwan, ROC (e-mail: chu@niu.edu.tw).
Mao-Hsu Yen is with the Department of Computer Science and
Engineering, National Taiwan Ocean University, Keelung, Taiwan, ROC
(e-mail: yenmh@mail.ntou.edu.tw).
Pao-Ann Hsiung is with the Department of Computer Science and
Information Engineering, National Chung Cheng University, Chiayi, Taiwan,
ROC (e-mail: pahsiung@cs.ccu.edu.tw).
Sao-Jie Chen is with the Graduate Institute of Electronics Engineering,
National Taiwan University, Taipei, Taiwan, ROC (e-mail:
csj@cc.ee.ntu.edu.tw).

However, the FFT computation often needs to multiply input
signals by different twiddle factors, which results in higher
hardware cost because a large ROM is needed to store the
required twiddle factors. Therefore, to eliminate these ROMs
for area efficiency, Maharatna et al. [8] proposed an efficient
ROM-less FFT/IFFT processor. The complex multipliers used in
the processor are realized with shift-and-add operations. Hence,
the processor uses only a two-input digital multiplier and
does not need any ROM for internal storage of coefficients.
However, the low speed and higher hardware cost caused by
the proposed complex multiplier are the trade-off. Lin et al. [9]
employ a smart structure for ROM-size reduction to
produce twiddle factors as well as to compact the chip area.
To further improve on the power consumption and chip
area of previous works, we propose an efficient radix-2
pipeline architecture with low power consumption for the
FFT/IFFT processor. Our proposed architecture includes a
reconfigurable complex constant multiplier and bit-parallel
complex multipliers instead of ROMs to store twiddle
factors, which suits the power-of-2 radix style of
FFT/IFFT processors. A short version of the
present work has been published in [10]. In this paper, a more
detailed and complete description of the entire work is
provided, and the final design of a 64-point FFT/IFFT
processor for OFDM applications is shown.
A Low-Power 64-point Pipeline FFT/IFFT Processor
for OFDM Applications

The rest of this paper is organized as follows. First, a brief
review of the fast Fourier transform is described in Section II.
Section III presents our proposed FFT architecture for
application in wireless communication systems. The
performance evaluation of various FFT architectures is then
discussed in Section IV. Finally, concluding remarks are given
in Section V.
II. FFT AND IFFT ALGORITHMS
The discrete Fourier transform (DFT) $X_k$ of an N-point
discrete-time signal $x_n$ is defined by:
$$X_k = \sum_{n=0}^{N-1} x_n W_N^{nk}, \quad 0 \le k \le N-1, \qquad (1)$$
where the twiddle factor $W_N^{nk} = e^{-j2\pi nk/N}$ denotes the N-point
primitive root of unity. However, a straightforward
implementation of this algorithm is impractical due
to the large amount of hardware required. Therefore, the fast Fourier
transform (FFT) [5] was developed to speed up the
computation and significantly reduce the hardware cost.
Generally, the FFT analyzes an input signal sequence by using a
decimation-in-frequency (DIF) or decimation-in-time (DIT)
decomposition to construct a computationally efficient
signal-flow graph (SFG). Here, our work employs a DIF
decomposition because it matches the data flow of the
single-path delay feedback pipeline. An example of a radix-2 DIF
FFT SFG for N = 16 is depicted in Fig. 1.
[Figure omitted: signal-flow graph with inputs x[0]-x[15] in natural order, outputs in bit-reversed order (X[0], X[8], X[4], X[12], X[2], X[10], X[6], X[14], X[1], X[9], X[5], X[13], X[3], X[11], X[7], X[15]), butterfly stages labeled PE1, PE2, PE3, PE3, and twiddle factors -j and $W_{16}^{1}$ to $W_{16}^{7}$.]

Fig. 1 Radix-2 DIF FFT signal-flow graph of length 16.
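
The structure of the signal-flow graph can be sketched in software. The following is a minimal Python illustration, not the authors' hardware: it is a generic textbook radix-2 DIF FFT in which each stage computes a sum and a twiddled difference, and the results come out in bit-reversed order, as in Fig. 1. The function names `dif_fft`, `bit_reverse`, and `dft` are chosen here for illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dft(x):
    """Direct O(N^2) DFT, Eq. (1), used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def dif_fft(x):
    """Radix-2 decimation-in-frequency FFT; returns X in natural order."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(x)
    span = n // 2                      # distance between butterfly inputs
    while span >= 1:
        for start in range(0, n, 2 * span):
            for i in range(span):
                w = cmath.exp(-2j * cmath.pi * i / (2 * span))
                top, bot = a[start + i], a[start + i + span]
                a[start + i] = top + bot               # upper butterfly output
                a[start + i + span] = (top - bot) * w  # lower output, twiddled
        span //= 2
    bits = n.bit_length() - 1
    # After the last stage, a[j] holds X[bit_reverse(j)].
    return [a[bit_reverse(k, bits)] for k in range(n)]
```

For N = 16 the index sequence `bit_reverse(k, 4)` for k = 0..15 reproduces the output ordering listed for Fig. 1.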
The radix-2 DIF FFT described above has a regular SFG and
requires fewer complex multipliers. Thus, it is well
suited for hardware implementation, because some complex
multiplications can be simplified to reduce the chip area. For
instance, an input signal multiplied by $W_{16}^{2}$ in Fig. 1 can be
expressed as:
$$(a + jb)\,W_{16}^{2} = \frac{\sqrt{2}}{2}\,[(a + b) + j(b - a)], \qquad (2)$$
where (a + jb) denotes a discrete-time signal in complex form.
Similarly, the complex multiplication by $W_{16}^{6}$ is given by:
$$(a + jb)\,W_{16}^{6} = \frac{\sqrt{2}}{2}\,[(b - a) - j(a + b)]. \qquad (3)$$
Both equations ease hardware implementation, because each
needs only the multiplication by $\sqrt{2}/2$ and two real
additions. In particular, the multiplication by $\sqrt{2}/2$ can be
obtained easily; its circuit design is introduced in Section III-C.
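
As a quick numerical check of (2) and (3), the short Python sketch below compares the simplified forms against a direct complex multiplication by $W_{16}^{k} = e^{-j2\pi k/16}$. The helper names are illustrative only and are not taken from the paper.

```python
import cmath
import math

S = math.sqrt(2) / 2  # the common scale factor sqrt(2)/2


def w16(k):
    """Twiddle factor W_16^k = exp(-j*2*pi*k/16)."""
    return cmath.exp(-2j * cmath.pi * k / 16)


def mul_w16_2(a, b):
    """(a + jb) * W_16^2 using Eq. (2): two real additions, then scaling."""
    return complex(S * (a + b), S * (b - a))


def mul_w16_6(a, b):
    """(a + jb) * W_16^6 using Eq. (3): two real additions, then scaling."""
    return complex(S * (b - a), -S * (a + b))
```

Both helpers use only the sums a + b and b - a followed by a scaling by $\sqrt{2}/2$, which is the property the text relies on.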
The inverse discrete Fourier transform (IDFT) of length N
is given by:
$$x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k W_N^{-nk}, \quad 0 \le n \le N-1. \qquad (4)$$
To reuse the same hardware core and thus reduce the chip area
[16], (4) can be rewritten as:
$$x_n = \frac{1}{N}\left(\sum_{k=0}^{N-1} X_k^{*} W_N^{nk}\right)^{*}, \quad 0 \le n \le N-1, \qquad (5)$$
where the star symbol * denotes complex conjugation. This new form
can be viewed as a general DFT. In other words, the DFT and
IDFT can reuse the same hardware core, while the IDFT requires
some extra computations: conjugating the input data $X_k$ and
the outcomes of the DFT, and dividing the result by N. This
reuse version of the DFT/IDFT algorithm also simplifies the
design of a DFT/IDFT processor and thus reduces the
chip area, provided that the DFT and IDFT are activated
alternately rather than simultaneously.
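
The reuse expressed by (5) can be illustrated with a few lines of Python. This is a sketch under the assumption that a forward transform routine is available (here a plain $O(N^2)$ `dft`, standing in for the shared hardware core); `idft_via_dft` is a name introduced for this example.

```python
import cmath


def dft(x):
    """Forward DFT, Eq. (1); stands in for the shared transform core."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def idft_via_dft(X):
    """Inverse DFT computed with the forward DFT, following Eq. (5)."""
    n = len(X)
    y = dft([v.conjugate() for v in X])      # conjugate the inputs
    return [v.conjugate() / n for v in y]    # conjugate the outputs, divide by N
```

Applying `idft_via_dft` to the output of `dft` returns the original sequence up to floating-point rounding.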
III. PROPOSED ARCHITECTURE
Traditional hardware implementations of FFT/IFFT
processors usually employ a ROM to look up the required
twiddle factors and then use wordlength complex multipliers to
perform the FFT computation. However, this increases the
hardware cost, so a bit-parallel complex constant
multiplication scheme [8]-[11], [14]-[18] is used to address
this issue.
Besides, since the twiddle factors have a symmetric
property, the complex multiplications used in FFT
computation can be one of the following three operation types:
Type 1:
$$W_N^{k}(a + jb) = W_N^{(k - N/4)}(b - ja), \quad N/4 < k < N/2, \qquad (6)$$
Type 2:
$$W_N^{k}(a + jb) = -W_N^{(k - N/2)}(a + jb), \quad N/2 < k < 3N/4, \qquad (7)$$
Type 3:
$$W_N^{k}(a + jb) = -W_N^{(k - 3N/4)}(b - ja), \quad 3N/4 < k < N. \qquad (8)$$
Given the above three equations, any twiddle factor can be
obtained from a combination of these primary twiddle-factor
elements. In other words, any twiddle factor used in the FFT
can be derived with these operation types, which
significantly reduces the size of the ROM used to store
the twiddle factors. Moreover, for hardware implementation,
we add two extra operation types to further
decrease the size of the ROM. Our method can also shorten
the critical path in the designed hardware so that the system
clock becomes faster. The two additional operation types are
given by:
Type 4:
$$W_N^{k}(a + jb) = \left[W_N^{(N/4 - k)}(b + ja)\right]^{*}, \quad 1 \le k < N/4, \qquad (9)$$
Type 5:
$$W_N^{k}(a + jb) = -j\left[W_N^{(N/2 - k)}(b + ja)\right]^{*}, \quad N/4 < k < N/2. \qquad (10)$$
Based on the five operation types above, the 49 complex
multiplications after the third butterfly stage of the 64-point FFT
are reduced to the computation of 16 primary elements.
Clearly, this significantly shrinks the twiddle-factor ROM table.
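
The five operation types are identities on $W_N^{k} = e^{-j2\pi k/N}$, so they can be checked numerically. The Python sketch below does this for N = 64 over the index ranges given in (6)-(10); the function names and the test operand are illustrative choices, not part of the paper.

```python
import cmath

N = 64


def W(k):
    """Twiddle factor W_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)


def type1(k, a, b):  # Eq. (6), N/4 < k < N/2
    return W(k - N // 4) * complex(b, -a)


def type2(k, a, b):  # Eq. (7), N/2 < k < 3N/4
    return -W(k - N // 2) * complex(a, b)


def type3(k, a, b):  # Eq. (8), 3N/4 < k < N
    return -W(k - 3 * N // 4) * complex(b, -a)


def type4(k, a, b):  # Eq. (9), 1 <= k < N/4
    return (W(N // 4 - k) * complex(b, a)).conjugate()


def type5(k, a, b):  # Eq. (10), N/4 < k < N/2
    return -1j * (W(N // 2 - k) * complex(b, a)).conjugate()


def max_identity_error(a=0.75, b=-0.5):
    """Largest deviation between W_N^k (a + jb) and each rewritten form."""
    z = complex(a, b)
    cases = [
        (type1, range(N // 4 + 1, N // 2)),
        (type2, range(N // 2 + 1, 3 * N // 4)),
        (type3, range(3 * N // 4 + 1, N)),
        (type4, range(1, N // 4)),
        (type5, range(N // 4 + 1, N // 2)),
    ]
    return max(abs(W(k) * z - f(k, a, b)) for f, ks in cases for k in ks)
```

In each case the right-hand side uses a twiddle factor with a smaller index, together with a swap, negation, or conjugation of the operand, which is what allows the stored coefficient set to shrink.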
A. Proposed Architecture
To improve on previous works in terms of power reduction,
we propose a radix-2 64-point pipeline FFT/IFFT processor
with low power consumption, as shown in Fig. 2. The
proposed architecture is composed of three different types of
processing elements (PEs), a complex constant multiplier,
delay-line (DL) buffers (shown as rectangles with a
number inside), and some extra processing units for
computing the IFFT. Here, the conjugate in the extra processing
units is easy to implement: it only takes the 2's
complement of the imaginary part of a complex value. The
divide-by-64 module can be replaced with a barrel shifter.
In addition, for the complex constant multiplier in Fig. 2, we
propose a novel reconfigurable complex constant multiplier to
eliminate the twiddle-factor ROM. This new multiplication
structure thus becomes the key component in reducing the
chip area and power consumption of our proposed FFT/IFFT
processor. The detailed functions of the modules appearing
in Fig. 2 are described in the following subsections.
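[Figure omitted: pipeline from "in" to "out" with stages PE1 (DL 32), PE2 (DL 16), PE3 (DL 8), the complex constant multiplier $W_{64}^{i}$, PE1 (DL 4), PE2 (DL 2), and PE3 (DL 1), plus conjugate blocks [ ]* with FFT/IFFT-selected multiplexers at the input and output and a divide-by-64 block at the output.]

Fig. 2 Proposed radix-2 64-point pipeline FFT/IFFT processor.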
B. Processing Elements
Based on the radix-2 FFT algorithm, the three types of
processing elements (PE3, PE2, and PE1) used in our design
are illustrated in Fig. 3, Fig. 4, and Fig. 5, respectively. The
functions of these three PE types correspond to the
butterfly stages shown in Fig. 1. First, the PE3 stage
implements only a simple radix-2 butterfly structure and
serves as a submodule of the PE2 and PE1 stages. In Fig. 3,
$I_{in}$ and $I_{out}$ are the real parts of the input and output data,
respectively, and $Q_{in}$ and $Q_{out}$ denote the imaginary parts of
the input and output data, respectively. Similarly, $DL\_I_{in}$ and
$DL\_I_{out}$ stand for the real parts of the input and output of the
DL buffers, and $DL\_Q_{in}$ and $DL\_Q_{out}$ for the imaginary parts,
respectively.
As for the PE2 stage, it is required to compute the
multiplication by -j or 1. Note that the multiplication by -1 in
Fig. 4 is in practice the 2's complement of its input value.
In the PE1 stage, the calculation is more complex than in the
PE2 stage: it is responsible for computing the
multiplications by -j, $W_N^{N/8}$, and $W_N^{3N/8}$. Since
$W_N^{3N/8} = -jW_N^{N/8}$, this product can be computed either by
multiplying by $W_N^{N/8}$ first and then by -j, or in the reverse
order. Hence, the designed hardware
uses this kind of cascaded calculation and multiplexers to
realize all the necessary calculations of the PE1 stage. This
approach also saves a bit-parallel multiplier for
computing $W_N^{3N/8}$, which further lowers the hardware cost.
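
The cascaded calculation in the PE1 stage can be mimicked in Python as below. This is only a behavioral sketch: the two boolean flags `use_w8` and `use_neg_j` are hypothetical stand-ins for the multiplexer selects and are not claimed to match the signals s1 and s2 of Fig. 5.

```python
import cmath
import math

S = math.sqrt(0.5)  # 1/sqrt(2)


def mul_neg_j(z):
    """(a + jb) * (-j) = b - ja: a swap and one negation, no multiplier."""
    return complex(z.imag, -z.real)


def mul_w8(z):
    """(a + jb) * W_N^{N/8} = (1/sqrt(2)) * [(a + b) + j(b - a)]."""
    return complex(S * (z.real + z.imag), S * (z.imag - z.real))


def pe1_rotate(z, use_w8, use_neg_j):
    """Cascade: optional W_N^{N/8} stage followed by optional -j stage."""
    if use_w8:
        z = mul_w8(z)
    if use_neg_j:
        z = mul_neg_j(z)
    return z
```

With both stages enabled the result equals multiplication by $W_N^{3N/8} = e^{-j3\pi/4}$, so no separate multiplier for that factor is needed.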
[Figure omitted: PE3 circuit with signals $I_{in}$, $Q_{in}$, $I_{out}$, $Q_{out}$, $DL\_I_{in}$, $DL\_Q_{in}$, $DL\_I_{out}$, $DL\_Q_{out}$ and 2-to-1 multiplexers controlled by s0.]

Fig. 3 Circuit diagram of our proposed PE3 stage.

[Figure omitted: PE2 circuit built around a PE3 block, with a multiplication by -1 and 2-to-1 multiplexers controlled by s1.]

Fig. 4 Circuit diagram of our proposed PE2 stage.

[Figure omitted: PE1 circuit built around a PE3 block, with a $W_N^{N/8}$ multiplier, a multiplication by -1, and 2-to-1 multiplexers controlled by s1 and s2.]

Fig. 5 Circuit diagram of our proposed PE1 stage.
C. Bit-Parallel Multipliers
In Section II, the multiplication by $1/\sqrt{2}$ can employ a
bit-parallel multiplier to replace the wordlength multiplier and
square-root evaluation for chip area reduction. The bit-parallel
operation in terms of powers of 2 is given by [15]:
$$\text{output} = \text{in} \times \frac{\sqrt{2}}{2} \approx \text{in} \times (2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8} + 2^{-14}). \qquad (11)$$
If the above equation is implemented straightforwardly,
it yields poor precision due to
truncation error [17] and incurs a higher hardware cost.
Therefore, to improve the precision and hardware cost, Eq.
(11) can be rewritten as:
$$\text{output} = \text{in} \times \frac{\sqrt{2}}{2} \approx \text{in} \times [1 + (1 + 2^{-2})(2^{-6} - 2^{-2})]. \qquad (12)$$
According to (12), the circuit diagram of the bit-parallel
multiplier is illustrated in Fig. 6. The resulting circuit uses
three additions and three barrel shift operations. The
realization of the complex multiplication by $W_N^{N/8}$, using a
radix-2 butterfly structure with both outputs multiplied
by $1/\sqrt{2}$ [26], is shown in Fig. 7. This circuit is
used in the PE1 stage.
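
To make the shift-and-add structure of (12) concrete, here is a small integer model in Python. It is a sketch under the assumption of signed integer samples with arithmetic right shifts; the function name `mul_inv_sqrt2` and the 16-bit test range are choices made for this illustration and do not describe the exact circuit of Fig. 6.

```python
import math


def mul_inv_sqrt2(x):
    """Approximate x / sqrt(2) with three shifts and three add/subtracts.

    Implements x * [1 + (1 + 2^-2) * (2^-6 - 2^-2)] from Eq. (12).
    """
    u = (x >> 6) - (x >> 2)   # (2^-6 - 2^-2) * x   (two shifts, one subtract)
    return x + u + (u >> 2)   # x + (1 + 2^-2) * u  (one shift, two adds)
```

Expanding the bracket in (12) gives $2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8} = 0.70703125$, so for an input of 16384 the model returns 11584, while the exact value of $16384/\sqrt{2}$ is about 11585.2.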

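Fig. 6 Circuit diagram of the bit-parallel multiplication by $1/\sqrt{2}$.

[Figure omitted: butterfly on $I_{in}$ and $Q_{in}$ whose two outputs are each multiplied by $1/\sqrt{2}$ to produce $I_{out}$ and $Q_{out}$.]

Fig. 7 Circuit diagram of the multiplication by $W_N^{N/8}$.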
D. Reconfigurable Complex Constant Multipliers
Based on Eqs. (6)-(10), a reconfigurable low-complexity
complex constant multiplier for computing $W_{64}^{i}$ is proposed,
as shown in Fig. 8 and Fig. 9. The structure of this complex
multiplier also adopts a cascaded scheme to achieve low-cost
hardware. Here, the two input signals ($I_{in}$ and $Q_{in}$)
and the two output signals ($I_{out}$ and $Q_{out}$) have the same
meaning as the signals in the PE1 stage.
[Figure omitted: cascade of 2-to-1 multiplexers controlled by s1-s5, multiplications by -1, and a complex multiplier block between inputs $I_{in}$, $Q_{in}$ and outputs $I_{out}$, $Q_{out}$.]

Fig. 8 Proposed reconfigurable complex constant multiplier.

[Figure omitted: circuit switch selecting among coefficients $i_1$-$i_8$ and $q_1$-$q_8$ between inputs $I_{in}$, $Q_{in}$ and outputs $I_{out}$, $Q_{out}$.]

Fig. 9 Complex multiplier used in Fig. 8.
The circuit in Fig. 9 computes the multiplication by a
twiddle factor $W_{64}^{i}$ in Fig. 8 and is an important circuit
of our FFT/IFFT processor. The
wordlength multiplier used in Fig. 9 adopts a low-error
fixed-width modified Booth multiplier for hardware cost reduction
[25]. The coefficient values $i_1$-$i_8$ and $q_1$-$q_8$ are listed in
Table I; they can be used to synthesize all the twiddle factors
required in our proposed 64-point FFT processor.
TABLE I
COEFFICIENT VALUES IN FIG. 9
Coefficient  Value    Coefficient  Value
i_1          0.7071   q_1          0.7071
i_2          0.7730   q_2          0.6343
i_3          0.8314   q_3          0.5555
i_4          0.8819   q_4          0.4713
i_5          0.9238   q_5          0.3826
i_6          0.9569   q_6          0.2902
i_7          0.9807   q_7          0.1950
i_8          0.9951   q_8          0.0980
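
The listed values appear to coincide with $\cos(2\pi k/64)$ and $\sin(2\pi k/64)$ for k = 8 down to 1, truncated to four decimal places. This is an observation from the numbers in Table I rather than a statement made in the text, and the Python sketch below, with the illustrative name `table_i`, simply reproduces the table under that assumption.

```python
import math

N = 64


def table_i():
    """Return (m, i_m, q_m) with values in ten-thousandths, truncated.

    Assumes i_m = cos(2*pi*(9 - m)/N) and q_m = sin(2*pi*(9 - m)/N),
    which is inferred from the listed numbers, not stated in the paper.
    """
    rows = []
    for m in range(1, 9):
        angle = 2 * math.pi * (9 - m) / N
        i_m = math.floor(math.cos(angle) * 10_000)
        q_m = math.floor(math.sin(angle) * 10_000)
        rows.append((m, i_m, q_m))
    return rows
```

Under this reading, the eight coefficient pairs cover the twiddle-factor angles from $\pi/4$ down to $\pi/32$ in steps of $\pi/32$.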
Besides, we do not use bit-parallel multipliers to
replace the wordlength multiplier, for two reasons. One is the
operation rate: if bit-parallel multipliers are used, the clock
rate decreases due to the many cascaded adders. The other
is the high wiring complexity, because
many bit-parallel multipliers would have to be switched to
perform multiplications by different twiddle
factors. In fact, this phenomenon also appears in [8], [11],
[16]. For these two reasons, the wordlength
multiplier is still adopted to implement our complex constant
multiplier, in consideration of operation speed and chip
area. Note that our proposed complex constant multiplier does
not introduce the high hardware cost described
earlier, because no ROM is used.
IV. PERFORMANCE EVALUATION AND RESULT
The performance evaluation of various 64-point FFT
architectures is summarized in Table II. Here, the
normalized power per FFT is defined as follows [17],
[19]:
$$\text{normalized power per FFT} = \frac{\text{Power}}{\text{Voltage}^{2} \times (\text{FFT size}) \times (\text{Frequency})} \times 1000. \qquad (13)$$
From this table, both the gate count and the power consumption
of our proposed architecture are lower than those of previous
works. In particular, our design consumes much less
active power than the other designs. For instance, under the
same process technology, the power consumption of our
design is about 2.8 times lower than that of [18].
Compared with the work in [20], our design consumes about
half the power.
The functional simulation of the proposed architecture has
been verified using Verilog HDL, and the results confirm the
validity of the proposed architecture. To further validate it,
we implemented the architecture on a
commercial FPGA chip. The results show that the proposed
architecture works well.
TABLE II
COMPARISON OF VARIOUS 64-POINT FFT ARCHITECTURES
Design  Word Length      Gate Counts  Technology      Power                 Normalized Power
[11]    16               -            0.13-µm, 1.2 V  22.36 mW @ 20 MHz     12.1
[22]    16               -            0.18-µm, 1.8 V  217.18 mW @ 100 MHz   10.4
[8]     16               -            0.25-µm, 1.8 V  41 mW @ 20 MHz        9.8
[18]    16               -            0.18-µm, 1.8 V  68.95 mW @ 50 MHz     6.7
[20]    In: 12, Out: 20  38168        0.35-µm, 3.3 V  507.85 mW @ 150 MHz   4.9
Ours    16               33590        0.18-µm, 1.8 V  9.79 mW @ 20 MHz      2.4
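
The normalized power column can be recomputed from (13). The Python sketch below assumes power in mW, voltage in V, and frequency in MHz; these units are not stated explicitly with (13), but with them the formula reproduces the listed values to within 0.1. The function and dictionary names are illustrative.

```python
def normalized_power(power_mw, voltage_v, fft_size, freq_mhz):
    """Eq. (13), assuming power in mW, voltage in V, frequency in MHz."""
    return power_mw / (voltage_v ** 2 * fft_size * freq_mhz) * 1000.0


# (power in mW, supply in V, clock in MHz, normalized power listed in Table II)
TABLE_II = {
    "[11]": (22.36, 1.2, 20, 12.1),
    "[22]": (217.18, 1.8, 100, 10.4),
    "[8]": (41.0, 1.8, 20, 9.8),
    "[18]": (68.95, 1.8, 50, 6.7),
    "[20]": (507.85, 3.3, 150, 4.9),
    "Ours": (9.79, 1.8, 20, 2.4),
}


def recompute():
    """Map each design to its recomputed normalized power for a 64-point FFT."""
    return {name: normalized_power(p, v, 64, f)
            for name, (p, v, f, _) in TABLE_II.items()}
```

For the proposed design this gives 9.79 / (1.8² × 64 × 20) × 1000, which is about 2.36 and is listed as 2.4 in the table.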

In addition, the proposed design has also been implemented
in 0.18-µm CMOS technology. The chip layout is shown in Fig.
10, and its design parameters are summarized in Table III.

Fig. 10 Chip layout of our design.

TABLE III
PARAMETERS OF OUR CHIP
Process      0.18-µm CMOS
Core Size    0.94 × 0.94 mm²
Die Size     1.42 × 1.42 mm²
Gate Counts  33590
Clock Rate   80 MHz
Pin Count    88
V. CONCLUSION
A novel ROM-less and low-power pipeline 64-point
FFT/IFFT processor for OFDM applications has been
described in this paper. Considering the symmetric property of
twiddle factors in the FFT, we have designed a reconfigurable
complex constant multiplier such that the size of the
twiddle-factor ROM is significantly reduced; in particular, no
ROM is needed in our design. As a result, our design has
lower hardware cost and power consumption than
existing ones. Our proposed scheme can also be
adapted to larger-point FFT applications, with a smaller
twiddle-factor ROM.
In addition, our hardware, synthesized in 0.18-µm CMOS
technology, requires about 33.6k gates and has a working
frequency of up to 80 MHz. Since our design is relatively low cost
and consumes less power, it can serve as a powerful FFT/IFFT
processor in many other wireless communication systems.
ACKNOWLEDGMENT
The authors would like to thank the anonymous referees for
their valuable suggestions to improve the presentation of this
paper. The authors would also like to thank the National
Science Council, Taiwan, ROC, for financially supporting this
work under Grants NSC 98-2220-E-002-007 and NSC
99-2220-E-197-002.
REFERENCES
[1] IEEE Std 802.11a, 1999, Wireless LAN Medium Access Control
(MAC) and Physical Layer (PHY) specifications: High-Speed Physical
Layer in the 5 GHz band.
[2] IEEE 802.16, IEEE Standard for Air Interface for Fixed Broadband
Wireless Access Systems, the Institute of Electrical and Electronics
Engineers, Inc., June 2004.
[3] 3GPP LTE, Evolved Universal Terrestrial Radio Access (E-UTRA);
Physical Channels and Modulation 3GPP TS 36.211 v8.5.0, 2008-12.
[4] ETSI, Digital Video Broadcasting (DVB); Framing Structure, Channel
Coding and Modulation for Digital Terrestrial Television, ETSI EN 300
744 v1.4.1, 2001.
[5] J. W. Cooley and J. W. Tukey, An algorithm for the machine
calculation of complex Fourier series, Math. Comput., vol. 19, pp. 297-
301, Apr. 1965.
[6] S. He and M. Torkelson, Designing Pipeline FFT Processor for OFDM
(de)Modulation, in Proc. URSI Int. Symp. Signals, Systems, and
Electronics, vol. 29, Oct.1998, pp. 257-262.
[7] H.L. Groginsky and G.A. Works, A pipeline fast Fourier transform,
IEEE Transactions on Computers, vol. C-19, no. 11, pp. 1015-1019,
Nov. 1970.
[8] Koushik Maharatna, Eckhard Grass, and Ulrich Jagdhold, A 64-Point
fourier transform chip for high-speed wireless LAN application using
OFDM, IEEE Journal of Solid-State Circuits, vol. 39, no. 3, pp. 484-
493, Mar. 2004.
[9] Y.T. Lin, P.Y. Tsai and T.D. Chiueh, Low-power variable-length fast
Fourier transform processor, IEE Proc. Comput. Digit. Tech., vol. 152,
no. 4, pp. 499-506, July 2005.
[10] Chu Yu, Yi-Ting Liao, Mao-Hsu Yen, Pao-Ann Hsiung, and Sao-Jie
Chen, A Novel Low-Power 64-point Pipelined FFT/IFFT Processor for
OFDM Applications, in Proc. IEEE Int'l Conference on Consumer
Electronics, Jan. 2011, pp. 452-453.
[11] Chin-Teng Lin, Yuan-Chu Yu, and Lan-Da Van, A low-power 64-point
FFT-IFFT design for IEEE 802.11a WLAN application, in Proc. IEEE
Int. Symp. Circuits Syst. (ISCAS), May 2006, pp. 4523-4526.
[12] Yuan Chen, Yu-Wei Lin, and Chen-Yi Lee, A Block Scaling FFT/IFFT
Processor for WiMAX Applications, in Proc. IEEE Asian Solid-state
Circuits Conf., 2006, pp. 203-206.
[13] Sheng-Yeng Peng, Kai-Ting Shr, Chao-Ming Chen, Yuan-Hao Huang,
Energy-Efficient 128~2048/1536-point FFT Processor with Resource
Block Mapping for 3GPP-LTE system, in Proc. 2010 International
Conference on Green Circuits and Systems (ICGCS), 2010, pp. 14-17.
[14] Minhyeok Shin and Hanho Lee, A High-Speed Four-Parallel Radix-2^4
FFT/IFFT Processor for UWB Applications, in Proc. IEEE Int. Symp.
Circuits and Systems, 2008, pp. 960-963.
[15] Jia Lihong, Gao Yonghong, Isoaho Jouni, and Tenhunen Hannu, A new
VLSI-oriented FFT algorithm and implementation, in Proc. IEEE
International on ASIC Conference, Sept. 1998, pp. 337-341.
[16] Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, A 1 GS/s FFT/IFFT processor for
UWB applications, IEEE Journal of Solid-State Circuits, vol. 40, no. 8,
pp. 1726-1735, Aug. 2005.


[17] Jen-Chih Kuo, Ching-Hua Wen, Chih-Hsiu Lin, and An-Yeu Wu, VLSI
Design of a Variable-Length FFT/IFFT Processor for OFDM-based
Communication Systems, EURASIP Journal on Applied Signal Processing,
no. 13, pp. 1306-1316, Dec. 2003.
[18] Wei Han, T. Arslan, A.T. Erdogan, M. Hasan, A novel low power pipelined
FFT based on subexpression sharing for wireless LAN applications, in
Proc. IEEE Workshop on Signal Processing Systems, 2004, pp. 83-88.
[19] B. M. Baas, A low-power, high performance, 1024-point FFT processor,
IEEE Journal of Solid-State Circuits, vol. 34, no. 3, pp. 380-387, Mar. 1999.
[20] Wen-Chang Yeh and Chein-Wei Jen, High-speed and low-power split-
radix FFT, IEEE Transactions on Signal Processing, vol. 51, no. 3, pp. 864-
874, Mar. 2003.
[21] Y. Jung, H. Yoon, and J. Kim, New efficient FFT algorithm and pipeline
implementation results for OFDM/DMT applications, IEEE Transactions
on Consumer Electronics, vol. 49, no. 1, pp. 14-20, Feb. 2003.
[22] M. Hasan, T.Arslan, and J.S. Thompson, A novel coefficient ordering based
low power pipelined radix-4 FFT processor for wireless LAN applications,
IEEE Transactions on Consumer Electronics, vol. 49, no. 1, pp. 128-134,
Feb. 2003.
[23] Chua-Chin Wang, Jian-Ming Huang, and Hsian-Chang Cheng, A 2K/8K
mode small-area FFT processor for OFDM demodulation of DVB-T
receivers, IEEE Transactions on Consumer Electronics, vol. 51, no. 1, pp.
28-32, Feb. 2005.
[24] P. Duhamel and H. Hollman, Split-radix algorithms. Electronics Letters,
vol. 20, pp. 14-16, Jan. 5, 1984.
[25] K. J. Cho, K. C. Lee, J. G. Chung, and K. K. Parhi, Design of low-error
fixed-width modified Booth multiplier, IEEE Trans. Very Large Scale
Integration Systems, vol. 12, no. 5, pp. 522-531, May 2004.
[26] A. Wenzler and E. Luder, New structures for complex multipliers and their
noise analysis, in Proc. IEEE Int. Symp. on Circuits and Systems, May 1995,
vol. 2, pp. 1432-1435.
[27] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and
Implementation, New York: John Wiley & Sons, 1999.

BIOGRAPHIES

Chu Yu received the B.S. and M.S. degrees in
electronic engineering from the National Taiwan
University of Science and Technology, Taipei, Taiwan,
in 1991 and 1993, respectively, and the Ph.D. degree
in electrical engineering from the National Taiwan
University, Taipei, Taiwan, in 1999. Since 2000, he
has been a member of the faculty in the Department of
Electronic Engineering, National Ilan University,
where he is currently an associate professor. His
research interests include IC design for digital
communications and digital signal processing. Dr. Yu is a member of the IEEE.


Mao-Hsu Yen received the B.S., M.S. and Ph.D.
degrees in electronic engineering from the National
Taiwan University of Science and Technology,
Taipei, Taiwan, in 1991, 1993 and 2000,
respectively. Since 2005, he has been a member of
the faculty in the Department of Computer Science
and Engineering, National Taiwan Ocean University,
where he is currently an associate professor. His
research interests include the design of ASIC and
FPGA architectures.
Pao-Ann Hsiung (M'98-SM'07) received his
B.S. in Mathematics and his Ph.D. in Electrical
Engineering from the National Taiwan
University, Taipei, Taiwan, ROC, in 1991 and
1996, respectively. From February 2001 to July
2002, he was an assistant professor and from
August 2002 to July 2007 he was an associate
professor in the Department of Computer
Science and Information Engineering, National
Chung Cheng University, Chiayi, Taiwan, ROC.
Since August 2007, he has been a full professor. Dr. Hsiung was the
recipient of the 2001 ACM Taipei Chapter Kuo-Ting Li Young
Researcher for his significant contributions to design automation of
electronic systems. He was also a recipient of the 2004 Young Scholar
Research Award given by National Chung Cheng University to five young
faculty members per year. Dr. Hsiung is a senior member of the IEEE, a
senior member of the ACM, and a life member of the IICM. He has been
included in several professional listings including Marquis' Who's Who in
the World. Dr. Hsiung is an editorial board member of the International
Journal of Embedded Systems, Inderscience Publishers; the International
Journal of Multimedia and Ubiquitous Engineering, Science and
Engineering Research Center; an associate editor of the Journal of
Software Engineering, Academic Journals, Inc.; an editorial board
member of the Open Software Engineering Journal, Bentham Science
Publishers, Ltd.; and an international editorial board member of the
International Journal of Patterns. Dr. Hsiung has been on the program
committee of more than 50 international conferences and served as
organizer and chair for several conferences. He has published more than
150 papers in international journals and conferences. His main research
interests include: reconfigurable system design, system-on-chip design,
embedded software synthesis and verification, hardware-software
codesign and coverification, and application frameworks for real-time
embedded multicore software.




Sao-Jie Chen (M'88-SM'03) received the B.S. and
M.S. degrees in electrical engineering from the
National Taiwan University, Taipei, Taiwan, R.O.C., in
1977 and 1982, respectively, and the Ph.D. degree in
electrical engineering from the Southern Methodist
University, Dallas, TX, in 1988. Since 1982, he has
been a member of the faculty in the Department of
Electrical Engineering, National Taiwan University,
where he is currently a Professor. From 1985 to 1988,
he was on leave from National Taiwan University and
working toward the Ph.D. degree at Southern Methodist University. During
the fall of 1987, he held a visiting appointment at the Department of Electrical
and Computer Engineering, University of Wisconsin, Madison. His current
research interests include VLSI physical design, SOC hardware/software co-
design, and Wireless LAN and Bluetooth IC design. Dr. Chen is a member of
the Chinese Institute of Engineers, the Association for Computing Machinery,
and the IEEE Computer Society.
