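Course Design Report
Design topic: Computer Implementation of the Fast Fourier Transform (FFT)
Completed by: Wang Fan
Class: Electrical Engineering 0612
Name: Wang Fan
Student ID: 012006019801
November 2008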
Fast Fourier Transform Computation Using a C8051 MCU
Fan Wang
(Huazhong University of Science & Technology, Wuhan, 430074)
Ⅰ INTRODUCTION
The Fast Fourier Transform (FFT) is an efficient algorithm for computing the N-point
discrete Fourier transform (DFT). The algorithm was first used by Gauss in 1805 and was
rediscovered in the 1960s by Cooley and Tukey. The most widely used form of the
Cooley-Tukey algorithm is the radix-2 algorithm, which is applied in this paper.
The FFT is of great importance in current digital signal processing systems, and some
processors, for example DSPs, are specially designed for this kind of computation.
The processor used in this system, the C8051 MCU, is instead mostly used in simple
control systems and is not specially designed for multiplications and additions.
Therefore, the purpose of this paper is not to offer a practical industrial solution, but
to build a deep understanding of the basic theory of signal processing systems and of
how to put that theory into practice.
Ⅱ BACKGROUND
complexity becomes larger.
The Fast Fourier Transform is a fast structure for computing the DFT. Note that $W_N^{nk}$ has
two important properties, periodicity and symmetry, which are the foundations of the
FFT algorithm. Using the symbol $((nk))_N$ to represent the remainder from a division
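$$
\begin{aligned}
X(k) &= \sum_{n\ \mathrm{even}} x(n)\,W_N^{nk} + \sum_{n\ \mathrm{odd}} x(n)\,W_N^{nk} \\
     &= \sum_{n=0}^{N/2-1} x(2n)\,W_{N/2}^{kn} + W_N^{k} \sum_{n=0}^{N/2-1} x(2n+1)\,W_{N/2}^{nk} \\
     &= G(k) + W_N^{k} H(k)
\end{aligned}
$$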
where $G(k) = \sum_{n=0}^{N/2-1} x(2n)\,W_{N/2}^{kn}$ and $H(k) = \sum_{n=0}^{N/2-1} x(2n+1)\,W_{N/2}^{nk}$.
Now that the previous N-point DFT has become two N/2-point DFTs, the complexity is
reduced. According to the symmetry property, $W_N^{k+N/2} = -W_N^{k}$, so we have
$X(k + N/2) = G(k) - W_N^{k} H(k)$. In conclusion,
$$
\begin{aligned}
X(k)       &= G(k) + W_N^{k} H(k), & k &= 0, 1, 2, \ldots, N/2 - 1 \\
X(k + N/2) &= G(k) - W_N^{k} H(k), & k &= 0, 1, 2, \ldots, N/2 - 1
\end{aligned}
$$
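As a concrete check of the two recombination formulas, the sketch below builds a 4-point DFT from two 2-point DFTs. It assumes $W_N = e^{-j2\pi/N}$, so that $W_4^0 = 1$ and $W_4^1 = -j$ and no trigonometric functions are needed. The type `cplx` and all function names are hypothetical helpers for illustration and are not part of the software described in this paper.

```c
/* Integer complex number; a hypothetical helper type for this sketch. */
typedef struct { int re, im; } cplx;

static cplx c_add(cplx a, cplx b) { cplx r; r.re = a.re + b.re; r.im = a.im + b.im; return r; }
static cplx c_sub(cplx a, cplx b) { cplx r; r.re = a.re - b.re; r.im = a.im - b.im; return r; }

/* Multiply by W_4^1 = -j:  (-j)(re + j*im) = im - j*re. */
static cplx mul_w4_1(cplx a) { cplx r; r.re = a.im; r.im = -a.re; return r; }

/* 4-point DFT of a real sequence using
 * X(k) = G(k) + W_4^k H(k) and X(k+2) = G(k) - W_4^k H(k), k = 0, 1. */
static void dft4_decimated(const int x[4], cplx X[4])
{
    cplx G[2], H[2], t0, t1;

    /* G: 2-point DFT of the even-indexed samples x(0), x(2). */
    G[0].re = x[0] + x[2]; G[0].im = 0;
    G[1].re = x[0] - x[2]; G[1].im = 0;

    /* H: 2-point DFT of the odd-indexed samples x(1), x(3). */
    H[0].re = x[1] + x[3]; H[0].im = 0;
    H[1].re = x[1] - x[3]; H[1].im = 0;

    t0 = H[0];            /* W_4^0 * H(0) */
    t1 = mul_w4_1(H[1]);  /* W_4^1 * H(1) */

    X[0] = c_add(G[0], t0);
    X[1] = c_add(G[1], t1);
    X[2] = c_sub(G[0], t0);
    X[3] = c_sub(G[1], t1);
}
```

For x = {1, 2, 3, 4} this yields X = {10, -2+2j, -2, -2-2j}, which agrees with evaluating the DFT sum directly.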
In the decimation-in-time FFT, the computation flow graph of the time-domain
sequence can be briefly illustrated by a butterfly diagram. The diagram for an 8-point
sequence is shown in Figure 1.
The diagram immediately suggests that three loops should be included in the software:
the stage loop, the group loop, and the butterfly loop. As the diagram shows, there are
$v = \log_2 N$ stages; stage $m$ has $2^{v-m}$ groups, and each group has $2^{m-1}$ butterflies.
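The loop structure can be sketched as three nested loops that simply count the butterflies they visit. This is an illustration of the counts given above, not the software's actual loop code; the function name and its parameters are hypothetical, and N is assumed to be a power of two.

```c
/* Count the butterflies visited by the stage / group / butterfly loops.
 * n is assumed to be a power of two; *stages_out (optional) receives v. */
static unsigned count_butterflies(unsigned n, unsigned *stages_out)
{
    unsigned v = 0, m, g, b, total = 0;

    while ((1u << v) < n)          /* v = log2(n) */
        v++;

    for (m = 1; m <= v; m++) {                 /* stage loop */
        unsigned groups    = 1u << (v - m);    /* 2^(v-m) groups in stage m */
        unsigned per_group = 1u << (m - 1);    /* 2^(m-1) butterflies each  */
        for (g = 0; g < groups; g++)           /* group loop */
            for (b = 0; b < per_group; b++)    /* butterfly loop */
                total++;
    }

    if (stages_out)
        *stages_out = v;
    return total;
}
```

Every stage contains $2^{v-m} \cdot 2^{m-1} = N/2$ butterflies, so an 8-point transform has 3 stages and 12 butterflies in total.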
Since the transformed complex numbers are not in normal order, extra storage space
would be needed for the intermediate values in order to obtain the right results. To minimize
the storage space required for intermediate data, either the input data must be
scrambled or the output data must be unscrambled. The butterfly diagram
shows that if the input data is in bit-reversed order, the results will be in
normal order; if the input data is in normal order, the output data can be restored
to normal order by bit reversal.
C. Anti-aliasing Filtering and Windowing
If the input signal has a frequency component higher than half of the
sampling frequency, that component may be misinterpreted as a lower frequency that
does not exist in the signal. This effect is known as frequency aliasing. To avoid it, the
input signal should be low-pass filtered.
In practical systems, the sampled signal is time limited. In the time
domain, this is equivalent to multiplying the signal by a rectangular window.
The software, however, assumes that the data is periodic and repeats
infinitely. Taking the DFT of a sampled sinusoidal signal, we obtain a spectrum
that has the shape of a sinc function, so the energy of the sinusoid
leaks into other frequencies. In practice, the main lobe of the window's spectrum should
be narrow and high, while the side-lobe level should be low. Several window
functions have been devised as practical trade-offs between the two. Triangular,
Hanning, Hamming and Blackman windows are all in common use and
are discussed further in Section Ⅲ.
enables it to execute most of its instruction set in only 1 or 2 system clocks. With the
help of the on-chip PLL, the peak speed of the chip can reach 100 MIPS. In
addition, a 2-cycle 16×16 MAC engine accelerates the computation.
Ⅲ SOFTWARE DESIGN
A. ADC0_ISR( )
The 12-bit on-chip ADC0 collects data at a configured sample rate. Each Timer3
overflow initiates a conversion. The result is stored
in the ADC0H:ADC0L registers in left-justified form, that is, the upper 12 bits are
the data bits and the lowest 4 bits are unused, so the input data ranges from
0x0000 to 0xFFF0. The overflow interval of Timer3 is set by the configurable
macro SAMPLE_RATE, which therefore determines the sample rate of the system.
After N data samples have been collected, the ADC0 interrupt is disabled and
the next function is called.
B. WindowCalc( )
Four types of windows are available in this software, in addition to the rectangular
case in which no window is applied. Each has its own coefficients, by which the
time-domain data is multiplied. The coefficient equations are listed below.
Type                     Equation
0: None (Rectangular)    W(n) = 1
1: Triangular            W(n) = n/(N/2)        (0 ≤ n ≤ N/2)
                         W(n) = 2 - n/(N/2)    (N/2 < n < N)
2: Hanning               W(n) = 0.5 - 0.5cos(2πn/N)
3: Hamming               W(n) = 0.54 - 0.46cos(2πn/N)
4: Blackman              W(n) = 0.42 - 0.5cos(2πn/N) + 0.08cos(4πn/N)
The window coefficients are pre-calculated and stored in the header file
FFT_Code_Tables.h. The macro WINDOW_TYPE selects the window and can be set from
0 to 4.
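To illustrate how a window coefficient is applied in integer arithmetic, the sketch below evaluates the triangular window from the table and multiplies one sample by it. The paper's software reads pre-calculated coefficients from FFT_Code_Tables.h instead; computing them at run time, the 2^15 scaling, and the function names are all assumptions made for this example. The cosine-based windows would follow the same pattern with tabulated cosine values.

```c
/* Triangular window coefficient W(n), scaled by 2^15 so that 1.0 -> 32768.
 * The scaling is an assumption of this sketch, not taken from the paper. */
static long tri_coeff_q15(unsigned n, unsigned num_fft)
{
    unsigned half = num_fft / 2;
    long ramp = ((long)n << 15) / (long)half;      /* n/(N/2), scaled   */
    return (n <= half) ? ramp : (2L << 15) - ramp; /* 2 - n/(N/2) after */
}

/* Multiply one signed 16-bit sample by W(n) and scale the product back.
 * Integer division truncates toward zero for both signs. */
static int apply_window(int sample, unsigned n, unsigned num_fft)
{
    long product = (long)sample * tri_coeff_q15(n, num_fft);
    return (int)(product / 32768L);
}
```

With N = 8, the coefficient is 0 at n = 0, one half at n = 2 and n = 6, and 1 at n = 4, so a sample of 1000 at n = 2 becomes 500.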
The function also converts the single-ended data collected by ADC0 into signed
(differential) form, in order to remove the DC offset of the input signal. This is done
by XORing the input data with 0x8000, which has the same effect as subtracting
32768 from the input data, so that the data ranges from -32768 to 32752.
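The equivalence between the XOR and the subtraction can be checked with a short sketch. The function names are hypothetical, and the cast from an out-of-range unsigned value to `int16_t` relies on two's-complement wrapping, which common compilers provide but older C standards leave implementation-defined.

```c
#include <stdint.h>

/* Flip the top bit of the left-justified ADC word and reinterpret it as a
 * signed 16-bit value (assumes two's-complement wrapping on the cast). */
static int16_t adc_to_signed_xor(uint16_t raw)
{
    return (int16_t)(uint16_t)(raw ^ 0x8000u);
}

/* The same conversion written as an explicit subtraction of 32768. */
static int16_t adc_to_signed_sub(uint16_t raw)
{
    return (int16_t)((int32_t)raw - 32768);
}
```

The smallest ADC word 0x0000 maps to -32768, the largest word 0xFFF0 maps to 32752, and the mid-scale word 0x8000 maps to 0.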
C. Bit_Reverse( )
From the butterfly diagram, we know that either the input data or the output FFT array
requires a bit-reverse operation. In this software, the input data is bit-reversed. Take an
8-point FFT as an example:
Previous Data    Previous Index    New Index    Desired Data
x(0)             000               000          x(0)
x(1)             001               100          x(4)
x(2)             010               010          x(2)
x(3)             011               110          x(6)
x(4)             100               001          x(1)
x(5)             101               101          x(5)
x(6)             110               011          x(3)
x(7)             111               111          x(7)
To save storage space, only half of the bit-reverse array is stored; the other half is
not needed, because that data is exchanged while the first half is processed. For example, the
8-point FFT listed above may use the following bit-reverse array:
BRTable[ ]={0,2,1,3};
The statement NewIndex=BRTable[PreviousIndex]*2 then gives the reversed index of
the element at PreviousIndex.
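A minimal sketch of an in-place bit-reversal swap driven by the half-size table is shown below for the 8-point case. The lookup for the first half follows the statement in the text; the rule used for the second half (the reversed index of i + N/2 is the reversed index of i plus one) is an assumption added here so that the sketch is complete, and the function names are hypothetical rather than taken from the software.

```c
#define NUM_FFT 8

/* Half-size bit-reverse table for the 8-point case, as given in the text. */
static const unsigned char BRTable[NUM_FFT / 2] = {0, 2, 1, 3};

/* Reversed index of i. First half: BRTable[i] * 2, as in the text.
 * Second half: setting the top bit of i sets the lowest bit of the
 * reversed index, hence the "+ 1" (assumption of this sketch). */
static unsigned reversed_index(unsigned i)
{
    if (i < NUM_FFT / 2)
        return BRTable[i] * 2u;
    return BRTable[i - NUM_FFT / 2] * 2u + 1u;
}

/* Reorder data[] in place; each pair is swapped once, when j > i. */
static void bit_reverse(int data[NUM_FFT])
{
    unsigned i, j;
    for (i = 0; i < NUM_FFT; i++) {
        j = reversed_index(i);
        if (j > i) {
            int t = data[i];
            data[i] = data[j];
            data[j] = t;
        }
    }
}
```

Applied to the sequence 0, 1, ..., 7, the array becomes 0, 4, 2, 6, 1, 5, 3, 7, matching the "Desired Data" column of the table.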
D. Int_FFT( )
In this algorithm, complex numbers are split into real parts and imaginary parts.
The modified input data has only real parts and is stored in the array Real[]. The later
stages also produce imaginary parts, which are stored in Imag[]. Returning to
the butterfly equation, when WN is split into its real part and imaginary part,
the butterfly becomes:
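Re1' = Re1 + (cos(x) × Re2 + sin(x) × Im2)
Re2' = Re1 - (cos(x) × Re2 + sin(x) × Im2)
Im1' = Im1 + (cos(x) × Im2 - sin(x) × Re2)
Im2' = Im1 - (cos(x) × Im2 - sin(x) × Re2)
where the primed quantities are the new values and every right-hand side uses the
values from before the butterfly.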
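The four butterfly equations can be sketched in floating point as follows. The point of the example is the ordering: the twiddle products are computed first, so that each output is formed from the old values of Re1 and Im1. The function name and signature are hypothetical, and the sketch omits the integer scaling that the actual routine applies.

```c
/* One decimation-in-time butterfly on the pair (re1 + j*im1, re2 + j*im2).
 * cos_x and sin_x are the cosine and sine of the twiddle angle x. */
static void butterfly(double *re1, double *im1, double *re2, double *im2,
                      double cos_x, double sin_x)
{
    /* Second input multiplied by the twiddle factor cos(x) - j*sin(x). */
    double tw_re = cos_x * (*re2) + sin_x * (*im2);
    double tw_im = cos_x * (*im2) - sin_x * (*re2);

    /* Write the second output first, while *re1 and *im1 still hold
     * their old values, then update the first output. */
    *re2 = *re1 - tw_re;
    *im2 = *im1 - tw_im;
    *re1 = *re1 + tw_re;
    *im1 = *im1 + tw_im;
}
```

For x = 0 the butterfly reduces to a plain sum and difference, and for x = π/2 it needs no multiplication either, which is why the routine in the appendix treats these two angles as special cases.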
To improve the speed of the FFT computation, integers are used instead of
floating-point numbers. All of them are 16-bit int. Multiplications are avoided
whenever possible, and all divisions are implemented as right shifts. This
method, however, rounds asymmetrically when the number is negative and the
discarded bits are non-zero. For example, -3/2=-1, but -3>>1=-2, so a one should be added
to the result. In general, if m is a signed int or long int, $m/2^k$ is obtained by
shifting m right by k bits and adding one when m is negative and any of the k
discarded bits is non-zero.
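This correction can be sketched in a few lines of C. The function name is hypothetical, and the sketch assumes that right-shifting a negative signed value is an arithmetic shift, which is the behaviour of common compilers although the C standard leaves it implementation-defined.

```c
/* Divide m by 2^k with the result rounded toward zero, like m / (1 << k),
 * using a shift plus a correction (arithmetic right shift assumed). */
static long div_pow2(long m, unsigned k)
{
    long q = m >> k;                          /* rounds toward -infinity   */
    if (m < 0 && (m & ((1L << k) - 1L)) != 0) /* negative, non-zero remainder */
        q += 1;                               /* move back toward zero     */
    return q;
}
```

For example, div_pow2(-3, 1) gives -1, whereas the uncorrected shift -3 >> 1 gives -2; exact divisions such as -4 >> 1 are left unchanged.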
E. Spectrum_Display( )
The FFT result is in complex form: the real part is stored in Real[] and the
imaginary part is stored in Imag[]. From these two arrays, the magnitude spectrum
and phase spectrum can be easily obtained. This software only calculates the power
spectrum of the input signal. The spectrum is displayed on a 128×64-pixel LCD
screen.
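A power spectrum can be formed from the two arrays as the sum of the squared real and imaginary parts of each bin. The sketch below assumes 16-bit signed FFT results and computes only the first N/2 bins; the function names and the choice of a 32-bit unsigned result are assumptions for illustration, not the software's display code.

```c
#include <stdint.h>

/* Power of one bin: re^2 + im^2. The sum can reach 2^31, so it is
 * accumulated in an unsigned 32-bit value. */
static uint32_t bin_power(int16_t re, int16_t im)
{
    int32_t r = re, i = im;
    return (uint32_t)(r * r) + (uint32_t)(i * i);
}

/* Fill power[0 .. num_fft/2 - 1] from the real and imaginary arrays. */
static void power_spectrum(const int16_t re[], const int16_t im[],
                           uint32_t power[], unsigned num_fft)
{
    unsigned k;
    for (k = 0; k < num_fft / 2; k++)
        power[k] = bin_power(re[k], im[k]);
}
```

Working with the power rather than the magnitude avoids a square root, which is costly on a small MCU.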
An EE1643C Function Signal Generator serves as the system’s test signal source.
The signal is first filtered by an anti-aliasing filter with a 32 kHz cut-off frequency
(first-order passive low-pass), and then sampled by ADC0 at the configured sampling rate.
Four configurable macros determine the system’s performance. The table below lists
them.
Name of variable    Function
NUM_FFT             Number of points of the FFT algorithm
SAMPLE_RATE         Sampling rate of ADC0 in Hz
WINDOW_TYPE         Type of window
RUN_ONCE            Program runs once when the value is non-zero, repeatedly
                    when the value is 0.
The following discussion varies the values of these four variables and the
waveform of the function generator.
A. Function Change
When the frequency of the sinusoid changes, the “energy bar” keeps the same
height and moves to another position.
Fig.TR.9. Rectangular without windowing. Fig.TR.10. Rectangular with Blackman window.
(iii) SAMPLE_RATE Change
All the above examples were sampled at 40 kHz. According to
the Nyquist sampling theorem, frequencies below 20 kHz can be located unambiguously, so
the full range of the screen is set to 20 kHz. When the sampling rate is
halved, the screen covers a range of 10 kHz.
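The relation between a bin index and its frequency is f = k · SAMPLE_RATE / NUM_FFT, which explains why halving the sampling rate halves the displayed range. The helper functions below are hypothetical, and the 256-point transform used in the note that follows is an assumed value chosen only for illustration.

```c
/* Frequency in Hz at the start of bin k. */
static unsigned long bin_to_hz(unsigned k, unsigned long sample_rate,
                               unsigned num_fft)
{
    return (unsigned long)k * sample_rate / num_fft;
}

/* Index of the bin that contains freq_hz (truncating division). */
static unsigned hz_to_bin(unsigned long freq_hz, unsigned long sample_rate,
                          unsigned num_fft)
{
    return (unsigned)(freq_hz * num_fft / sample_rate);
}
```

With an assumed NUM_FFT of 256, a 5 kHz sine falls in bin 32 at a 40 kHz sampling rate and in bin 64 at 20 kHz, so the same tone appears twice as far along the screen when the sampling rate is halved.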
Fig.TR.11. 5k sine at 40kHz sampling rate. Fig.TR.12. 5k sine at 20kHz sampling rate.
(iv) RUN_ONCE Change
To check the speed of the system, RUN_ONCE is set to zero. For audio signals sampled at
40 kHz, the LCD screen never flickers and a steady frequency
spectrum of the input time-domain signal can be read without difficulty. This
indicates that the system operates in real time.
C. Property of Symmetry
For a real input signal, the real part of the FFT output exhibits even symmetry and the
imaginary part exhibits odd symmetry. This results in a symmetric magnitude spectrum. That is,
for an N-point FFT, point N/2+1 has the same magnitude as point N/2-1, point N/2+2 the
same as point N/2-2, and so on. To illustrate this property, the full range of the screen is
set to 40 kHz. Here are the results.
Fig.TR.13. 2kHz square wave at full range. Fig.TR.14. 13kHz square wave at full range.
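The symmetry follows directly from the DFT definition when the input x(n) is real. A short derivation, assuming $W_N = e^{-j2\pi/N}$ so that $W_N^{Nn} = 1$:

```latex
\begin{aligned}
X(N-k) &= \sum_{n=0}^{N-1} x(n)\,W_N^{(N-k)n}
        = \sum_{n=0}^{N-1} x(n)\,W_N^{-kn}
        = \Bigl(\sum_{n=0}^{N-1} x(n)\,W_N^{kn}\Bigr)^{*}
        = X^{*}(k) \\
\operatorname{Re} X(N-k) &= \operatorname{Re} X(k), \qquad
\operatorname{Im} X(N-k) = -\operatorname{Im} X(k), \qquad
|X(N-k)|^{2} = |X(k)|^{2}
\end{aligned}
```

Substituting k = N/2 - i shows that point N/2 + i has the same power as point N/2 - i, so the upper half of the spectrum mirrors the lower half.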
Ⅴ CONCLUSION
In this paper, a method of Fast Fourier Transform computation based on the C8051 MCU
is introduced. Computational efficiency and accuracy are improved by the use of three
techniques: (1) avoidance of multiplication whenever possible; (2) 16-bit integer
storage; (3) the on-chip PLL. Basic concepts of the DFT and FFT are introduced in detail.
Practical problems such as bit reversal, frequency aliasing and spectral leakage are
discussed and addressed by software or hardware solutions. The spectra of some typical
waveforms and the properties of the FFT spectrum are examined on an LCD
screen.
APPENDIX
A. Key Code
//-----------------------------------------------------------------------------
// Int_FFT
//-----------------------------------------------------------------------------
void Int_FFT(int ReArray[], int ImArray[])
{
#if (NUM_FFT >= 512)
    unsigned int sin_index, g_cnt, s_cnt;   // Keeps track of the proper index
    unsigned int indexA, indexB;            // locations for each calculation
#endif
#if (NUM_FFT <= 256)
    unsigned char sin_index, g_cnt, s_cnt;  // Keeps track of the proper index
    unsigned char indexA, indexB;           // locations for each calculation
#endif
    // FIRST STAGE
    indexA = 0;
    for (g_cnt = 0; g_cnt < NUM_FFT/2; g_cnt++)
    {
        indexB = indexA + 1;
        TempReA = ReArray[indexA];
        TempReB = ReArray[indexB];
        ReArray[indexA] = TempReA2;
        ImArray[indexA] = 0;                // set Imaginary locations to '0'
        ImArray[indexB] = 0;
        indexA = indexB + 1;
    } // END OF FIRST STAGE
                TempImA = ImArray[indexA];
                TempImB = ImArray[indexB];
                // The following first checks for the special cases when the angle "x" is
                // equal to either 0 or pi/2 radians. In these cases, unnecessary
                // multiplications have been removed to improve the processing speed.
                if (sin_index == 0)  // corresponds to "x" = 0 radians
                {
                    TempL.l = (long)TempReA + TempReB;
                    if ((TempL.l < 0)&&(0x01 & TempL.b[3]))
                        TempReA2 = (TempL.l >> 1) + 1;
                    else
                        TempReA2 = TempL.l >> 1;
                    {
                        SinVal = SinTable[(NUM_FFT/2) - sin_index];
                        CosVal = -SinTable[sin_index - (NUM_FFT/4)];
                    }
                    else
                    {
                        SinVal = SinTable[sin_index];
                        CosVal = SinTable[(NUM_FFT/4) - sin_index];
                    }
                    // The SIN and COS values are used here to calculate part of the
                    // Butterfly equation
                    ReTwid.l = ((long)TempReB * CosVal) +
                               ((long)TempImB * SinVal);
                    ImTwid.l = ((long)TempImB * CosVal) -
                               ((long)TempReB * SinVal);
                    TempL.i[1] = 0;
                    TempL.i[0] = TempReA;
                    TempL.l = TempL.l >> 1;
                    ReTwid.l += TempL.l;
                    if ((ReTwid.l < 0)&&(ReTwid.i[1]))
                        TempReA2 = ReTwid.i[0] + 1;
                    else
                        TempReA2 = ReTwid.i[0];
                    // Calculate new value for ReArray[indexB]
                    TempL.l = TempL.l << 1;
                    TempL.l -= ReTwid.l;
                    if ((TempL.l < 0)&&(TempL.i[1]))
                        TempReB2 = TempL.i[0] + 1;
                    else
                        TempReB2 = TempL.i[0];
                    // Calculate new value for ImArray[indexA]
                    TempL.i[1] = 0;
                    TempL.i[0] = TempImA;
                    TempL.l = TempL.l >> 1;
                    ImTwid.l += TempL.l;
                    if ((ImTwid.l < 0)&&(ImTwid.i[1]))
                        TempImA = ImTwid.i[0] + 1;
                    else
                        TempImA = ImTwid.i[0];
                    // Calculate new value for ImArray[indexB]
                    TempL.l = TempL.l << 1;
                    TempL.l -= ImTwid.l;
                    if ((TempL.l < 0)&&(TempL.i[1]))
                        TempImB = TempL.i[0] + 1;
                    else
                        TempImB = TempL.i[0];
                }
                ReArray[indexA] = TempReA2;
                ReArray[indexB] = TempReB2;
                ImArray[indexA] = TempImA;
                ImArray[indexB] = TempImB;
                indexA++;
                sin_index += group;
            } // END of stage FOR loop (s_cnt)
            indexA = indexB + 1;
            sin_index = 0;
        } // END of group FOR loop (g_cnt)
        group /= 2;
        stage *= 2;
    } // END of While loop
} // END Int_FFT
B. Photo of the System