
AALBORG UNIVERSITY

INSTITUTE OF ELECTRONIC SYSTEMS


TITLE :
Hardware/Software
Co-Design of a Multipath
Jakes Channel Simulator for
an OFDM System
PROJECT PERIOD :
September 14th, 2004 - June 2nd, 2005
PROJECT GROUP :
Antanas Veiverys
Vara Prasad Goluguri
SUPERVISORS :
Christian Rom
Ole Olsen
Peter Koch
Troels Bundgaard Sørensen
Number of reports printed: 11
Number of pages in report: 114
Number of pages in appendix: 42
Total number of pages: 171
ABSTRACT:
The first part of this Master's thesis details the analysis of a generic OFDM
system. An analytical OFDM system model is expressed using matrix-vector
notation. Several channel models are analyzed and added to the transceiver
simulator. The extreme programming methodology is used and evaluated during
the development of the MATLAB simulator for the OFDM system model.
The second part of the thesis is dedicated to decreasing the simulation time.
The most computationally complex part of the simulator is offloaded to an
FPGA-based hardware accelerator by using hardware/software co-design
techniques. A structured analysis advocated by the Rugby model is followed to
define the activities and abstraction levels. The concepts of co-design are
presented in a literature study. The fixed-precision performance of the Jakes
channel simulator algorithm is analyzed. The algorithm is optimized in order
to reduce precision requirements. The simulator is written in the Handel-C
programming language and compiled into hardware by using the DK Suite IDE.
Finally, an analysis of the performance increase is performed.
PREFACE
This thesis is submitted to Aalborg University (AAU), Institute of Electronic
Systems, for the Master of Science in Electronic Engineering degree with
specialization in Applied Signal Processing & Implementation (ASPI). It
presents the work performed from September 14th, 2004 to June 2nd, 2005. It
has been conducted in co-operation with the Cellular Systems (CSys) division
at Aalborg University.
The thesis consists of two parts:
An Analytical part that contains the analysis and simulation of an Orthogonal
Frequency Division Multiplexing (OFDM) system model;
An Implementation part that applies hardware/software co-design to the radio
channel model in order to speed up the simulations.
The ASPI specialization that we attended does not focus on mobile
communications, so an extensive effort was necessary to pursue the analytical
part of the thesis. It deals with the analysis and simulation of a generic
OFDM system considering various channel effects. The approach followed to
conduct this part of the thesis is based on the Extreme Programming (XP)
methodology.
The implementation part deals with hardware/software co-design issues relat-
ing to the goal of decreasing the execution time of the simulator developed in the
analytical part. The approach followed to conduct this part of the thesis is inspired
by the Rugby model.
A list of symbols and abbreviations used in the thesis is presented at the end
of the report.
MATLAB, C, Handel-C, DK Suite, GCC, the Celoxica RC203 development board and
Design Trotter are the development tools and languages used during the thesis.
A CD containing the Handel-C source code, MATLAB files and LaTeX and PDF
versions of the report is supplied with the report.
The report is addressed to the supervisors and students of ASPI specialization
at Aalborg University.
We would like to thank Yannick Le Moullec, a post-doc fellow with the Centre
for Embedded Systems (CISS, AAU), for the useful discussions during the
implementation part of the thesis. Moreover, Søren Skovgård Christensen's
comments on the analytical part of the report are greatly appreciated. In
particular, we would like to thank our supervisor Christian Rom for spending
his personal time to help us acquire the necessary background about OFDM.
June 2, 2005
Antanas Veiverys Vara Prasad Goluguri
NOTATION
References to equations, figures, sections, tables and appendices are given in
the form of Equation 2.1, Figure 1.1, Section 1, Table 3.1 and Appendix A
respectively. References to literature are given in the form of
[Proakis, 2001], which consists of the last name of the author and the year of
publication of the corresponding literature. If a reference has more than two
authors, it is represented in the form of [Coleri et al, 2002]. In the
electronic version of the report, clicking the colored item in a reference
takes the reader to the details of that reference.
A list of symbols is given in Appendix A. The number of the page where a
symbol first appears in an equation is given next to the symbol. Symbols that
appear in only one equation are explained below that equation and are not
included in the list of symbols.
A list of abbreviations is given in Appendix B.
Italicized letters in equations indicate scalar values, such as $a$. Boldface
lowercase letters indicate vectors, for example $\mathbf{a}$. Boldface
uppercase letters indicate matrices, for example $\mathbf{C}$. A hat over a
variable, $\hat{h}$, means an estimated value of $h$. $(\cdot)^H$ represents
the conjugate transpose (Hermitian transpose) of the matrix $(\cdot)$, while
$(\cdot)^T$ represents the transpose of $(\cdot)$. $\lceil a \rceil$ indicates
the smallest integer greater than or equal to $a$. $\lfloor a \rceil$
indicates rounding of $a$ to the nearest integer.
Keywords are written in italics.
CONTENTS
List of Figures
List of Tables
1 Introduction
1.1 Project Background
1.2 Project Background
1.3 Project Contents
1.3.1 Goals
1.3.2 Limitations
1.3.3 Time Schedule
2 OFDM Analytical Model
2.1 Introduction
2.2 Extreme Programming Methodology
2.2.1 XP Evaluation
2.3 OFDM Symbol Structure
2.4 Basic OFDM Transmitter
2.4.1 Cyclic Prefix
2.4.2 Sub-carrier Correlation Properties
2.4.3 Approaches to Modulation
2.4.4 Structure of Orthogonality Matrix
2.4.5 Orthogonality Matrix with Cyclic Prefix
2.4.6 OFDM Transmitter Using IDFT
2.5 Channel Models
2.6 Basic OFDM Receiver
2.7 Summary
3 Downlink Simulator
3.1 AWGN Channel Simulator
3.1.1 Definitions
3.1.2 Assumptions
3.1.3 Channel Model
3.1.4 Channel Estimation
3.1.5 Theoretical Performance
3.1.6 Results
3.2 Rayleigh Channel Simulator
3.2.1 Definitions
3.2.2 Assumptions
3.2.3 Channel Model
3.2.4 Channel Estimation
3.2.5 Theoretical Performance
3.2.6 Results
3.3 Jakes Channel Model
3.3.1 Reference Model
3.3.2 Jakes Model
3.3.3 Jakes Multipath
3.3.4 Approaches for Implementation
3.3.5 Results
3.3.6 Channel Profiles
3.3.7 Summary
3.4 Channel Estimation for OFDM
3.4.1 Overview of OFDM Channel Estimation Algorithms
3.4.2 Time-Domain Channel Estimation
3.4.3 Performance of Time-Domain Channel Estimation Algorithm
3.4.4 Adaptive Time-Domain Channel Estimation
3.4.5 Performance of the Adaptive Channel Estimation Algorithm
3.5 Summary
4 Implementation of the OFDM Simulator
4.1 Hardware/Software Co-Design
4.1.1 Rugby Meta Model
4.1.2 Generic Co-Design Flow
4.1.3 HW/SW Co-Design Literature Survey
4.2 Algorithm Analysis
4.3 Platform Selection
4.4 Methodology for Channel Model Implementation
4.4.1 The DK Methodology
4.4.2 Handel-C
4.4.3 Followed Methodology
4.5 Channel Model Implementation
4.5.1 Partitioning and Allocation
4.5.2 Hardware/Software Implementation Testing
4.5.3 Fixed-point MATLAB
4.5.4 Jakes Model Optimization
4.5.5 Handel-C Implementation
4.6 Design Options and Decisions
4.7 Summary
5 Conclusions
5.1 The Analytical Part
5.2 The Implementation Part
5.3 Future Work
Bibliography
A List of Symbols
B List of Abbreviations
C Sequential Handel-C Code
C.1 Source Code
C.2 EDIF Output
C.3 Trace and Route Results
D Parallel Handel-C Code
D.1 Source Code
D.2 EDIF Output
D.3 Trace and Route Results
E Lyrtech SignalMaster
E.1 System Requirements
E.2 Platform Architecture
E.3 Components
F Celoxica RC203 Development Platform
F.1 System Requirements
F.2 Platform Architecture
F.3 Components
G Comparison of FPGAs
LIST OF FIGURES
1.1 Digital communication system
1.2 The $A^3$ framework
1.3 Gantt chart
2.1 Comparison of OFDM and FDM spectra
2.2 Extreme Programming (XP) project flow
2.3 OFDM symbol structure (baseband)
2.4 Single-carrier and multi-carrier transmission
2.5 Structure of a basic OFDM transmitter/receiver
2.6 Structure of a basic OFDM transmitter
2.7 Quadrature Phase Shift Keying (QPSK)
2.8 Inter-symbol interference and cyclic prefix
2.9 Cyclic prefix insertion
2.10 Auto-correlation of orthogonality functions
2.11 Maximum cross-correlation of orthogonality functions
2.12 Cross-correlation of orthogonality functions
2.13 Sub-carrier modulation methods
2.14 IFFT-based OFDM transmitter
2.15 Basic OFDM receiver
3.1 BER performance in AWGN channel
3.2 OFDM system simulator with a slow fading channel
3.3 Multipath fading channel
3.4 Impulse response
3.5 Slow fading Rayleigh channel BER performance
3.6 Multipath BER performance with fixed SNR
3.7 Rayleigh channel BER performance, $T_u = 100\,\mu\mathrm{s}$
3.8 Rayleigh channel BER performance, $T_u = 30\,\mu\mathrm{s}$
3.9 Jakes envelope
3.10 Jakes output waveform correlation
3.11 Jakes fast-fading channel simulation results
3.12 Pilot schemes
3.13 Structure of a time-domain channel estimator
3.14 Channel estimation BER performance at 30 km/h
3.15 Channel estimation BER performance at 60 km/h
3.16 Channel estimation BER performance at 120 km/h
3.17 Structure of an adaptive time-domain channel estimator
4.1 Rugby meta model
4.2 Domains of the Rugby model
4.3 Generic co-design flow
4.4 Manual OFDM simulator complexity estimates
4.5 Design flow with DK Suite
4.6 Activities of the FPGA implementation
4.7 Block diagram of the simulator
4.8 Block diagram of HW/SW implementation test setup
4.9 Fixed-point simulation results
4.10 Channel simulator algorithm
4.11 Optimized Jakes algorithm
4.12 Data Flow Graph of the Jakes simulator
4.13 Resource-constrained schedule
4.14 Time-constrained schedule
4.15 Options and decisions in the co-design process
E.1 SignalMaster block diagram
F.1 RC203 block diagram
LIST OF TABLES
3.1 WiMAX parameters
3.2 Cross-correlation coefficients of Jakes model path in-phase components
3.3 Cross-correlation coefficients of Jakes model path quadrature components
3.4 ITU Vehicular A channel profile
3.5 ITU Vehicular B channel profile
3.6 ITU Pedestrian B channel profile
3.7 Resampled ITU Vehicular A channel profile
3.8 Resampled ITU Vehicular B channel profile
3.9 Resampled ITU Pedestrian B channel profile
4.1 Costs of FPGA implementation
4.2 Speedup estimation results
C.1 Resource usage of sequential Handel-C program
C.2 Minimum FPGA clock period for sequential program implementation
D.1 Resource usage of parallel Handel-C program
D.2 Minimum FPGA clock period for parallel program implementation
G.1 FPGA comparison
CHAPTER 1
INTRODUCTION
1.1 Project Background
The history of Orthogonal Frequency Division Multiplexing (OFDM) dates back
to 1957 [Nee, 2005], but it was used in a standard for the first time in 1995.
This standard was developed by the European Telecommunications Standards
Institute (ETSI) for Digital Audio Broadcasting (DAB). The long application
list of OFDM includes Asymmetric Digital Subscriber Line (ADSL), Wireless LAN
(IEEE 802.11), Digital Video Broadcasting (DVB), WiMAX (IEEE 802.16), and
others. With time, the use of OFDM has also been increasing. OFDM is
considered to be the most promising candidate for use in 4th-generation mobile
communications, which makes it a good candidate for research. The basic idea
of OFDM is to use a large number of parallel narrowband sub-carriers instead
of a single wideband carrier to transmit information. This way of transmitting
information helps in handling the effects of multipath propagation
efficiently. The transmission is spectrally efficient because of the spectral
overlap of the narrowband sub-carriers. OFDM supports various modulation
techniques such as BPSK, QPSK and QAM.
1.2 Project Background
OFDM systems raise numerous interesting topics, but a Master's thesis cannot
deal with all of them. To make a realistic time schedule and a task list, the
scope of the project is defined in this chapter. A generalized digital
communication system, shown in Figure 1.1, helps to frame a task list that
includes not only the parts of the system that have to be focused on but also
those that have to be left out.
Figure 1.1: Digital communication system
The messages produced by a source are converted into binary form by the input
transducer. The number of bits should be as small as possible, i.e., the
representation of the source output should contain little or no redundancy.
The process of efficiently converting the output of a digital source into a
sequence of binary digits is called source encoding [Proakis, Salehi, 2002].
The output of the source encoder is fed to the channel encoder. The channel
encoder adds some redundancy to the binary information sequence to overcome
the negative effects of the channel during the transmission of the signal. The
digital modulator modulates the binary sequence from the channel encoder for
transmission using an arbitrary transmission scheme. At the receiver of the
digital communication system, the digital demodulator processes the
transmitted waveform that has been affected by the channel and decides on the
transmitted data. The remaining blocks in Figure 1.1, namely the channel
decoder, source decoder and output transducer, correspond to their
counterparts at the transmitter.
Only the digital modulator, the radio channel and the digital demodulator in
Figure 1.1 are investigated in this thesis.
1.3 Project Contents
The project consists of two major parts:
Analytical part: A generic OFDM system model is studied and modeled. Several
channel models and channel estimation techniques are analyzed. The Extreme
Programming (XP) methodology (explained in Section 2.2) is followed for
implementing the OFDM system simulator in MATLAB;
Implementation part: The execution time of the MATLAB simulator developed
in the Analytical part is optimized. The simulator is partitioned, allocated
and implemented on a chosen heterogeneous platform. Several hardware/software
co-design techniques and tools are investigated. They are mapped to the Rugby
meta model (see Section 4.1.1).
The framework illustrated in Figure 1.2 relates the analytical part and the
implementation part. This framework consists of three domains, namely
Application, Algorithm, and Architecture. Using the first letters of the three
domains, the framework is symbolically called $A^3$. It states that there is
more than one way to express the given application through algorithms. In this
step various radio channel modeling algorithms are evaluated. The algorithm
that provides the most realistic channel model is selected. Also, there is a
one-to-many mapping from algorithm to architecture, meaning that it is
possible to implement the functionality in more than one type of architecture.
1.3.1 Goals
Several goals are set up for the Analytical and Implementation parts of the
thesis. The goals also define the time schedule.
Analytical Part
Analysis and design of a downlink OFDM system model;
Framing of a generic OFDM system. The components of the system are
written in simple mathematical notation in order to be able to investigate
them independently;
Study of the OFDM transceiver performance in various radio channel models,
in increasing order of real-life resemblance;
Study of channel estimation algorithms at the receiver side;
Development of the OFDM transceiver model in MATLAB;
Verification of the MATLAB system model against theoretical performance
from literature.
Figure 1.2: The $A^3$ framework
Implementation Part
Decreasing the execution time of the simulator;
Design space exploration for the algorithm implementation;
Use of a hardware/software co-design methodology based on the Rugby
meta-model;
Implementation of the OFDM simulator on a chosen heterogeneous architecture.
1.3.2 Limitations
Not every aspect of the system can be analyzed within the duration of the
thesis; therefore, the following limitations are set:
Analytical Part
Only the downlink scenario is considered;
Source information processing at the transmitter and receiver is not consid-
ered (see Figure 1.1);
Source encoding at the transmitter is not considered;
Channel encoding at the transmitter is not considered;
The transmitter and the receiver are assumed to have identical clocks and to
be perfectly synchronized at the start of the first arriving path of the OFDM
symbol, i.e., synchronization issues are not analyzed;
Channel impulse response is assumed to be sample-spaced.
Implementation Part
Hardware platform selection is limited to items available at the department
during selection.
1.3.3 Time Schedule
The tasks of the project are listed in the Gantt chart shown in Figure 1.3. As
already stated, the thesis is divided into the Analytical and the
Implementation parts, with an evaluation between them. The first part is
allocated a longer time period because of the courses taken during the 9th
semester.
Figure 1.3: Gantt chart
CHAPTER 2
OFDM ANALYTICAL MODEL
An introduction to Orthogonal Frequency Division Multiplexing (OFDM) tech-
nology is presented in this chapter. The purpose of the chapter is to explain
the essential terms and introduce a simple analytical model of a basic OFDM
transceiver.
2.1 Introduction
OFDM is a multi-carrier modulation technique. It splits the transmitted data
stream into several interleaved lower-rate streams and transmits them in
parallel by using several narrowband sub-carriers. The advantage of OFDM over
Frequency Division Multiplexing (FDM) is that the sub-carriers are orthogonal.
Because of the orthogonality, spectral overlap is possible and the spectral
efficiency is enhanced. OFDM does not require guard bands between the
sub-carriers. The spectra of OFDM and FDM are depicted in Figure 2.1.
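The orthogonality that makes this spectral overlap possible can be checked
numerically. The following minimal Python sketch is an illustration only and
is not taken from the thesis simulator (which is written in MATLAB). It
assumes complex-exponential sub-carriers sampled at N points per symbol; the
names subcarrier, inner_product and N are hypothetical.

```python
import cmath

def subcarrier(k, n_samples):
    """Return n_samples samples of the k-th complex sub-carrier over one symbol."""
    return [cmath.exp(2j * cmath.pi * k * n / n_samples) for n in range(n_samples)]

def inner_product(a, b):
    """Inner product of two sampled waveforms (second argument conjugated)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

N = 8  # assumed number of sub-carriers and samples per symbol (illustrative)

# Gram matrix: entry [k][l] is the inner product of sub-carriers k and l.
gram = [[inner_product(subcarrier(k, N), subcarrier(l, N)) for l in range(N)]
        for k in range(N)]

# Print the magnitudes; orthogonal sub-carriers give a diagonal matrix.
for k in range(N):
    print(" ".join(f"{abs(gram[k][l]):4.1f}" for l in range(N)))
```

The magnitudes equal N on the diagonal and are numerically zero elsewhere, so
each sub-carrier can be separated from the others at the receiver even though
their spectra overlap.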
The Extreme Programming methodology is followed for developing the OFDM
system simulator. It is explained in the next section.
2.2 Extreme Programming Methodology
Extreme Programming (XP) is a discipline of software development based on
values of simplicity, communication, feedback, and courage. It works by
bringing a team together in the presence of simple practices, with enough
feedback to enable the team to see where it is and to tune the practices to a
particular project [Jeffries, 2001]. In the sense that it is a repeatable
process for developing software, XP is in fact a methodology, although a
lightweight one [Brewer, 2001].
(a) OFDM spectrum (b) FDM spectrum
Figure 2.1: Comparison of OFDM and FDM spectra
XP consists of several core rules and practices:
Planning Game: at the beginning of each small iteration, a list of desired features
for the system is set. The required effort is estimated together with the effort
that a team can produce in a given time interval (the iteration). Each feature
is written out as a User Story, which gives the feature a name, and describes
in broad strokes what is required. User stories are typically written on small
paper cards [Brewer, 2001]. They are used to create time estimates for the
release planning meeting. They are also used instead of a large requirements
document [Wells, 2004].
Small Releases: the project is started with the smallest useful feature set. Re-
leases are produced early and often, adding a few features each time
[Brewer, 2001].
Continuous Testing: before programmers add a feature, they write a test for it.
When the suite runs, the job is done. Tests in XP come in two basic flavors.
1. Unit Tests are automated tests written by the developers to test
functionality as they write it. Each unit test typically tests only a single
class, or a small cluster of classes [Brewer, 2001].
2. Acceptance Tests (also known as Functional Tests) are specified by the
customer to test that the overall system is functioning as specified.
Acceptance tests typically test the entire system, or some large chunk of
it. When all the acceptance tests pass for a given user story, that story
is considered complete. At the very least, an acceptance test could
consist of a script of user interface actions and expected results that
a human can run. Ideally, acceptance tests should be automated, either
using the unit testing framework, or a separate acceptance testing
framework [Brewer, 2001].
System Metaphor: each project has an organizing metaphor, which provides an
easy way to remember naming conventions, such as class, variable or file
names [Brewer, 2001].
Simple Design: according to [Jeffries, 2001], XP teams build software to a
simple design. They start simple, and through programmer testing and design
improvement, they keep it that way. An XP team keeps the design exactly
suited for the current functionality of the system. There is no wasted
motion, and the software is always ready for what's next.
Refactoring: any duplicate code generated in a coding session is refactored.
Refactoring can be done with confidence as the functionality remains
unchanged because it is verified by tests [Brewer, 2001].
Pair Programming: all production code is written by two programmers sitting
at one machine. This practice ensures that all production code is reviewed
by at least one other programmer, and results in better design, better testing,
and better code [Jeffries, 2001].
Collective Code Ownership: any developer in the team is expected to be able
to work on any part of the code base at any time [Brewer, 2001].
On-site Customer: all the phases of an XP project require communication with
the customer, preferably face to face, on the site. It's best to simply assign
one or more customers to the development team [Wells, 2004].
Coding Standards: the code is kept consistent and easy for the entire team to
read and refactor [Wells, 2004].
The phases of an XP project are shown in Figure 2.2.
Figure 2.2: Extreme Programming (XP) project flow
The development begins with requirements written as User Stories. Release
planning is performed by first sorting the user stories by their estimated
risk and value. Spike solutions might be developed to solve or explore
critical problems. Ideally, those programs are thrown away once every
developer has a clear idea of the problem [Sakthivel, 2004]. Each user story
is assigned a velocity (the speed at which the task can be performed). Then, a
release plan is made. It includes several user stories to be implemented for
the next release. The release date is also determined by the picked user
stories [Wikipedia, 2004].
When release planning is done, the team proceeds with implementation. XP
practices are followed. Unit tests together with Acceptance Tests from the
customer are used to find bugs. When the acceptance tests are passed, the
small release is finished. It gives feedback to the next release planning
about the project velocity, i.e., what speed can be considered realistic for
the next tasks.
2.2.1 XP Evaluation
We applied the XP practices during the analytical part of the project. Extreme Pro-
gramming gives advice for both programming and overall project management.
However, we did not apply all the rules related to programming. This happened
because pure programming did not take a considerable time as compared to the
other activities.
The XP practices seemed highly suitable for our needs at the beginning of
the project. We organized the analytical part in four major iterations (AWGN,
Rayleigh fading, Jakes channel models and channel estimation algorithm). Each
iteration took from 2 to 4 weeks to complete (see Gantt chart in Figure 1.3). Lists
of required features and activities were written for the iterations, although we did
not use small cards to write User Stories as it did not seem necessary.
Small weekly intermediate releases of the simulators were produced. Each of
them added one or more features while preserving the previous functionality
and confidently moved us towards the final simulator. We found that the
approach of performing a number of small iterations is very useful because it:
forces "on-the-fly" verification of each feature that adds up to the final
result;
allows easier returning to earlier versions after a failure;
gives confidence by seeing some results at once instead of keeping the
whole system in development state. In the latter case, there is a possibility
of overlooking some critical design flaws. In comparison, producing a small
release involves only the addition of more functionality to an existing
system. It might require refinement of the system, but this is relatively safe
because there are automatic tests from the previous iterations. The tests are
repeated in order to verify the success of the refinement.
The acceptance tests were performed mainly by means of comparing the Bit
Error Rate (BER) curves produced by the simulators with theoretical BER curves.
As this project is of non-commercial kind, we considered everyone connected to it
to be customers: the team and supervisors. Frequent discussions with supervisors
were held.
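An acceptance test of this kind can be sketched as follows. This is a
hypothetical Python illustration, not the MATLAB tests used in the project: it
assumes uncoded BPSK over an AWGN channel, for which the theoretical BER is
0.5 * erfc(sqrt(Eb/N0)), and the function names, the 20 % tolerance and the
number of bits are assumptions chosen for the example.

```python
import math
import random

def theoretical_ber_bpsk(ebn0_db):
    """Theoretical BER of coherent BPSK over AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def simulated_ber_bpsk(ebn0_db, n_bits, rng):
    """Monte Carlo BER estimate for BPSK (+1/-1 symbols, Eb = 1) over AWGN."""
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = math.sqrt(1 / (2 * ebn0))  # noise std. deviation (variance N0/2)
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0
        received = symbol + rng.gauss(0.0, sigma)
        decided = 1 if received > 0 else 0
        errors += decided != bit
    return errors / n_bits

def acceptance_test(ebn0_db, n_bits=200_000, rel_tol=0.2, seed=1):
    """Pass if the simulated BER is within rel_tol of the theoretical value."""
    rng = random.Random(seed)
    expected = theoretical_ber_bpsk(ebn0_db)
    measured = simulated_ber_bpsk(ebn0_db, n_bits, rng)
    return abs(measured - expected) <= rel_tol * expected, measured, expected

passed, measured, expected = acceptance_test(4.0)
print(f"measured={measured:.5f} expected={expected:.5f} passed={passed}")
```

Repeating such a check after every small release is one way to automate the
comparison; the tolerance has to be chosen with the number of simulated bits
in mind, because the BER estimate becomes noisy at high SNR.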
The XP practices that are related to the programming process were not con-
sidered to be very important in this project. As already mentioned, it was mostly
because of the relatively low amount of programming as compared to analysis.
During the project we committed ourselves to a common coding style in both the
source code and documentation. The pair programming paradigm proved its
benefits at the start of the project when we had very little knowledge of OFDM. By
trying to put our understanding into code together we were able to complement
each other's ideas. However, as knowledge increased, the need for pair
programming decreased. We believe that pair programming is needed when:
- the amount of code is large, so that it would be hard for one person to
  remember many details of it;
- many external interfaces (built-in functions, etc.) are used, in which case
  the "co-pilot" helps to ensure that they are used with proper semantics;
- and, most importantly, when programming is the main means of transforming
  the ideas and requirements into a product.
The pair programming did not work as expected because we had to do plenty of
analysis before starting to write the code (i.e., it was always clear what exactly
has to be implemented during the next small step). Moreover, there was a slight
difference in programming skills of the team members. The task of improving the
programming skills was considered on an individual basis in order not to ham-
per the development of the working model. Therefore, it was decided to have a
working model as fast as possible.
To conclude, Extreme Programming is a relatively new and certainly good way
to develop software systems. We used its principles during the part of the project
when OFDM was studied and several simulators were written in MATLAB. XP
showed its benefits as a way to manage the work flow. However, its rules of
programming were not very strictly followed as this project is not concentrated on
software development.
2.3 OFDM Symbol Structure
The structure of a baseband OFDM symbol is shown in Figure 2.3. A baseband
signal implies that its spectral content extends from (or near) dc up to some finite
value, usually less than a few megahertz [Sklar, 2001].
An OFDM symbol is transmitted by $N_s$ orthogonal sub-carriers. The system
bandwidth $F_0$ is equally divided among the sub-carriers. The inter-carrier
frequency (difference between frequencies of two subsequent sub-carriers) is
$$\Delta f = \frac{F_0}{N_s} \tag{2.1}$$
The $k$-th sub-carrier transmits a wave of $k$ periods during the useful OFDM
symbol transmission time $T_u$. Its frequency is equal to
$$f_k = \frac{k}{T_u} = k \, \Delta f, \qquad k = 0, 1, \ldots, N_s - 1 \tag{2.2}$$
Figure 2.3: OFDM symbol structure (baseband)
Figure 2.4: Single-carrier and multi-carrier transmission
In a single-carrier system where data symbols are transmitted serially
($N_s = 1$), the time required to transmit a data symbol from a data block is $T_u$.
In contrast, a multi-carrier OFDM system ($N_s > 1$) transmits the data block by
dividing it into $N_s$ parallel streams as shown in Figure 2.4. In this case, the
time available for each block element is ($N_s \cdot T_u$). In other words, the time
available for the transmission of each data symbol of the block in the multi-carrier
case is $N_s$ times greater than that of a single-carrier system, provided that the
data rate is the same for both systems. The OFDM symbol time is:
$$T_u = \frac{N_s}{F_0} \tag{2.3}$$
The number of sub-carriers ($N_s$) is equal to the number of OFDM samples
transmitted during one symbol time ($N_u$):
$$N_s = N_u \tag{2.4}$$
Figure 2.5 depicts the structure of a basic OFDM transmitter and receiver. The
details of each block are presented in the following sections.
Figure 2.5: Structure of a basic OFDM transmitter/receiver
2.4 Basic OFDM Transmitter
Figure 2.6 depicts the structure of a basic OFDM transmitter. At a particular
OFDM symbol transmission interval $m$, a serial bit stream of transmitted data
$\mathbf{b}$ is transformed into complex-valued encoded symbols $\mathbf{d}$ by
using a mapping function $F_{map}$ based on an arbitrary modulation scheme:
$$\mathbf{d}_m = F_{map}(\mathbf{b}_m) \tag{2.5}$$
$$\mathbf{d}_m = \left[ d_m(0), d_m(1), d_m(2), \ldots, d_m(N_s - 1) \right] \tag{2.6}$$
Figure 2.6: Structure of a basic OFDM transmitter
The number of bits carried by each element of $\mathbf{d}$ depends on the chosen
$F_{map}$.
Quadrature Phase Shift Keying (QPSK) modulation is used in the thesis because
of its popularity in the literature. The signal constellation for the QPSK
modulation is shown in Figure 2.7. Two bits of the vector $\mathbf{b}$ result in one
modulated symbol of $\mathbf{d}$ when the QPSK scheme is used. The amplitude for
all combinations of two bits of $\mathbf{b}$ is the same, but the signal phase is
different. Gray coding is used to map the transmitted bits into data symbols. It
gives one bit error when a symbol error occurs, i.e., when a received data symbol
is decided to belong to a neighboring constellation point.
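The mapping function can be illustrated with a short sketch. The thesis simulator is written in MATLAB; the Python below is only an illustration, and the particular assignment of bit pairs to quadrants, the unit-energy scaling, and the names `QPSK_GRAY`, `qpsk_map` and `qpsk_demap` are assumptions rather than the constellation of Figure 2.7.

```python
import math

# Gray-coded QPSK: neighboring constellation points differ in exactly one bit.
# The bit-pair-to-quadrant assignment below is an assumption for illustration.
QPSK_GRAY = {
    (0, 0): complex(+1, +1),
    (0, 1): complex(-1, +1),
    (1, 1): complex(-1, -1),
    (1, 0): complex(+1, -1),
}
SCALE = 1 / math.sqrt(2)  # gives every symbol unit amplitude


def qpsk_map(bits):
    """Map an even-length bit sequence b onto complex QPSK symbols d."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [QPSK_GRAY[(bits[i], bits[i + 1])] * SCALE
            for i in range(0, len(bits), 2)]


def qpsk_demap(symbols):
    """Hard decision: choose the nearest constellation point for each symbol."""
    bits = []
    for s in symbols:
        pair = min(QPSK_GRAY, key=lambda p: abs(QPSK_GRAY[p] * SCALE - s))
        bits.extend(pair)
    return bits


symbols = qpsk_map([0, 0, 0, 1, 1, 1, 1, 0])  # four symbols, one per quadrant
```

All four symbols have the same amplitude and differ only in phase, and a decision error towards a neighboring point flips a single bit.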
After encoding, the elements in $\mathbf{d}$ are placed in parallel by a
serial-to-parallel transformation. They may then be multiplied by a weighting
matrix $\mathbf{\Lambda}$. $\mathbf{\Lambda}$ is a diagonal matrix of size
$[N_s \times N_s]$ that allows setting different values of power to different
elements in an OFDM symbol, as explained in [Sandell, Edfors, 1996].
Figure 2.7: Quadrature Phase Shift Keying (QPSK)
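$$\mathbf{\Lambda} = \begin{bmatrix} \lambda_0 & 0 & \cdots & 0 \\ 0 & \lambda_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{N_s - 1} \end{bmatrix} \tag{2.7}$$
$$\mathbf{a}_m = \mathbf{\Lambda} \, \mathbf{d}_m^T \tag{2.8}$$
The symbol vector $\mathbf{a}$ (size $[N_s \times 1]$) contains encoded weighted
symbols to be transmitted by using $N_s$ sub-carriers:
$$\mathbf{a}_m = \begin{bmatrix} a_m(0) \\ a_m(1) \\ \vdots \\ a_m(N_s - 1) \end{bmatrix} \tag{2.9}$$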
As the encoded symbols are complex, the real part of each encoded symbol is
modulated using a cosine harmonic and the imaginary part is modulated using a
sine harmonic. The reason that sine and cosine harmonics are used is that they are
orthogonal to each other and thus faithful demodulation is possible at the receiver.
2.4.1 Cyclic Prefix
In an OFDM system, sub-carriers passing through a time dispersive channel lose
their orthogonality because of inter-carrier interference (ICI) and inter-symbol
interference (ISI).
ISI, in the context of the thesis, is defined as the crosstalk between signals
within the same sub-channel of consecutive FFT frames, which are separated in
time by the signalling interval. ICI is the crosstalk between adjacent sub-channels
or frequency bands of the same FFT frame.
The problem of ISI is illustrated in Figure 2.8(a). Portions of the transmitted
OFDM symbols overlap because of multipath, causing constructive or destructive
interference and therefore changing the spectral contents of the received symbol
(ISI is one of the possible causes of ICI). The system performance degrades because
of the interference.
To overcome inter-symbol interference, a Cyclic Prefix (CP) is added to OFDM
symbols. It is a copy of the last $N_g$ samples of the OFDM symbol that is prepended
to the transmitted symbol and removed at the receiver before demodulation
(Figure 2.8(b)). When a cyclic prefix long enough to accommodate the maximum path
delay is added, the overlapping parts are rejected at the receiver. The received
OFDM symbol contains no ISI during its useful period $N_u$.
The benefit of the cyclic prefix is twofold. First, it avoids ISI by acting as a
guard space between symbols. Second, the cyclic prefix also converts the linear
convolution between the transmitted symbol and channel impulse response into
a cyclic convolution. Cyclic convolution in the time domain translates into a scalar
multiplication in the frequency domain [Mitra, 1998, pp. 140], so the sub-carriers
remain orthogonal and there is no ICI [Engels, 2002], [Sramek, 2003].
Although BER performance is preserved, the spectral efficiency decreases
because of the enlarged symbol time. As the cyclic prefix does not carry any new
information, it should be set to a minimum length. The minimum length of the cyclic
prefix that cancels ISI completely is equal to $N_g = N_{ds}$. The multipath delay
spread $N_{ds}$ is the relative delay of the last path as compared to the first
arriving path in samples, or the length of the channel impulse response (in samples)
minus one.
The insertion of the cyclic prefix is explained in Figure 2.9. The OFDM symbol
being sent is extended with a cyclic prefix of length $N_g$ by copying the last $N_g$
samples to the beginning of the symbol. Therefore, the total symbol length becomes
$N_{gu} = N_u + N_g$ samples.
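The two effects of the cyclic prefix can be checked on a toy example. The following Python sketch is an illustration only; the sample values, the three-tap channel and all function names are assumptions. It prepends the last $N_g$ samples, passes the result through a linear convolution, drops the prefix, and compares the outcome with a cyclic convolution of the original symbol.

```python
def add_cyclic_prefix(symbol, n_g):
    """Prepend the last n_g samples of the symbol (length N_u -> N_u + N_g)."""
    if not 0 <= n_g <= len(symbol):
        raise ValueError("n_g must lie between 0 and the symbol length")
    return symbol[len(symbol) - n_g:] + symbol


def remove_cyclic_prefix(received, n_g):
    """Drop the first n_g samples at the receiver."""
    return received[n_g:]


def linear_convolution(x, h):
    """Full linear convolution of two sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def circular_convolution(x, h):
    """Cyclic convolution of x with h over len(x) samples."""
    n = len(x)
    return [sum(h[j] * x[(i - j) % n] for j in range(len(h))) for i in range(n)]


symbol = [1, 2, 3, 4, 5, 6, 7, 8]   # one OFDM symbol, N_u = 8 (toy values)
channel = [1, 0.5, 0.25]            # impulse response of length 3, so N_ds = 2
n_g = 2                             # minimum prefix length, N_g = N_ds

sent = add_cyclic_prefix(symbol, n_g)                      # N_gu = 10 samples
received = linear_convolution(sent, channel)[:len(sent)]   # channel output
useful = remove_cyclic_prefix(received, n_g)               # N_u samples again
```

Under these assumptions `useful` equals `circular_convolution(symbol, channel)`, which is the property that keeps the sub-carriers orthogonal.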
(a) ISI without cyclic prefix
(b) ISI with cyclic prefix
Figure 2.8: Inter-symbol interference and cyclic prefix
Figure 2.9: Cyclic prefix insertion
The important principle of OFDM is the orthogonality of its sub-carriers.
The cyclic prefix is a technique that combats the negative effects of the multipath
environment on the transmitted signal. Therefore, further insight into the
sub-carrier orthogonality and the cyclic prefix is given in the next section.
2.4.2 Sub-carrier Correlation Properties
An orthogonal waveform can be represented [Engels, 2002, pp. 35] as:
$$\phi_k(t) = \begin{cases} \dfrac{1}{\sqrt{T_u}} \, e^{j 2 \pi f_k t} & t \in [0, T_u] \\ 0 & \text{otherwise} \end{cases} \tag{2.10}$$
$\phi_k(t)$ - $k$-th orthogonality function;
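Each base function has an integer number of cycles in the useful OFDM symbol
time and is therefore orthogonal to the other functions [IEC, 2004]. The
cross-correlation between two arbitrary orthogonality functions $\phi_k(t)$ and
$\phi_{k'}(t)$ when the channel delay is less than the length of the cyclic prefix is
calculated according to [Engels, 2002] as:
$$\int_0^{T_u} \phi_k(t) \, \phi_{k'}^*(t) \, dt \tag{2.11}$$
$$= \int_0^{T_u} \left[ \left( \frac{1}{\sqrt{T_u}} e^{j 2 \pi f_k t} \right) \left( \frac{1}{\sqrt{T_u}} e^{-j 2 \pi f_{k'} t} \right) \right] dt \tag{2.12}$$
$$= \frac{1}{T_u} \int_0^{T_u} e^{j 2 \pi (f_k - f_{k'}) t} \, dt \tag{2.13}$$
When $k = k'$, $f_k = f_{k'}$:
$$\int_0^{T_u} \phi_k(t) \, \phi_{k'}^*(t) \, dt \tag{2.14}$$
$$= \frac{1}{T_u} \int_0^{T_u} dt \qquad (e^0 = 1) \tag{2.15}$$
$$= \frac{1}{T_u} \left[ 1 \cdot T_u - 1 \cdot 0 \right] \tag{2.16}$$
$$= 1 \tag{2.17}$$
When $k \neq k'$:
$$\int_0^{T_u} \phi_k(t) \, \phi_{k'}^*(t) \, dt \tag{2.18}$$
$$= \frac{1}{T_u} \int_0^{T_u} e^{j 2 \pi (f_k - f_{k'}) t} \, dt \tag{2.19}$$
$$= \frac{1}{T_u} \, \frac{1}{j 2 \pi (f_k - f_{k'})} \left[ e^{j 2 \pi (f_k - f_{k'}) T_u} - 1 \right] \tag{2.20}$$
$$= 0 \tag{2.21}$$
The last step follows from Equation 2.2: $(f_k - f_{k'}) T_u = k - k'$ is an integer,
so the exponential in Equation 2.20 equals one.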
(a) Auto-correlation sequence of $\phi_2$
(b) Auto-correlation sequence of $\phi_3$
Figure 2.10: Auto-correlation of orthogonality functions
Upon summarizing,
$$\int_0^{T_u} \phi_k(t) \, \phi_{k'}^*(t) \, dt = \begin{cases} 0 & (k \neq k') \\ 1 & (k = k') \end{cases} \tag{2.22}$$
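The result can be checked numerically on the sampled base functions. The Python sketch below is an illustration under stated assumptions: it uses the discrete-time counterpart of the orthogonality functions ($k$ full cycles over $N_u$ samples) and normalizes the inner product by $N_u$ instead of the $1/\sqrt{T_u}$ factor of the continuous-time definition. The names `base_function` and `correlation` are hypothetical.

```python
import cmath


def base_function(k, n_u):
    """Samples of the k-th base function: k full cycles over n_u samples."""
    return [cmath.exp(2j * cmath.pi * k * i / n_u) for i in range(n_u)]


def correlation(k, k_prime, n_u):
    """Normalized discrete inner product of two base functions (zero delay)."""
    a = base_function(k, n_u)
    b = base_function(k_prime, n_u)
    return sum(x * y.conjugate() for x, y in zip(a, b)) / n_u


N_U = 16
same = correlation(3, 3, N_U)        # k = k': magnitude one
different = correlation(3, 5, N_U)   # k != k': zero up to rounding error
```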
Figure 2.10 shows how the auto-correlation of the base functions $\phi_2$ and
$\phi_3$ varies when the delay varies from $-2N_{gu}$ to $2N_{gu}$ in unit
increments. It can be seen from the figure that the correlation value is one while
the delay value is between 0 and $N_g$, i.e., the orthogonality holds when the cyclic
prefix length $N_g$ is greater than the maximum excess delay $N_{ds}$, as the cyclic
prefix cancels the effects of delay. When the maximum excess delay exceeds $N_g$,
the correlation starts decreasing.
Cross-correlation sequences between different base functions $\phi_k$ and
$\phi_{k'}$ are shown in Figure 2.12. It is seen that the cross-correlation sequence
does not depend on any particular values of $k$ and $k'$ but rather on the
difference between them. Figures 2.12(a) and 2.12(b) correspond to the
cross-correlation sequences when the difference between sub-carrier indices is one;
the two sequences have the same absolute values. The same applies to
Figures 2.12(c) and 2.12(d), where the index difference is two, but the maximum
absolute value is lower.
Figure 2.11 is used to investigate how the maximum absolute cross-correlation
value of different pairs of orthogonality functions depends on the difference
between their indices. The orthogonality function $\phi_4$ is correlated with
$\phi_k$, $k = 0 \ldots N_s - 1$. For each plot point in Figure 2.11 a
cross-correlation sequence between $\phi_4$ and $\phi_k$ is calculated with a varying
time delay ($\tau = -N_{gu} \ldots N_{gu}$). A sequence of length $2N_{gu} - 1$ is
obtained for each plot point. The maximum absolute value of the sequence (the
maximum cross-correlation coefficient between $\phi_4$ and $\phi_k$ at any delay
$\tau$) is shown in the plot. It is observed in the figure that an orthogonality
function has the highest correlation with itself, and that the correlation decreases
as the difference in the number of cycles in the functions increases.
Figure 2.11: Maximum values of base function $\phi_4$ cross-correlation with other
base functions
2.4.3 Approaches to Modulation
In practice, the data to be transmitted as represented in Equation 2.8 is modulated
by using a bank of orthogonal sub-carriers represented by the orthogonality matrix
$\mathbf{\Phi}$ in Equation 2.27. The modulation of data onto the sub-carriers can be
performed using three approaches as shown in Figure 2.13.
(a) $k = 2$, $k' = 3$  (b) $k = 3$, $k' = 4$
(c) $k = 2$, $k' = 4$  (d) $k = 3$, $k' = 5$
Figure 2.12: Cross-correlation of orthogonality functions
Figure 2.13: Three approaches for modulation and demodulation using orthogonal
sub-carriers
1. The matrix of orthogonality functions $\mathbf{\Phi}$ (see Section 2.4.4) of size
   $[N_u \times N_s]$ is multiplied with the transmitted data vector $\mathbf{a}$
   (see Equation 2.9) of size $[N_s \times 1]$. This multiplication results in a
   time-domain signal of size $[N_u \times 1]$. Finally, the cyclic prefix is
   inserted, making the size of the signal to be transmitted $[N_{gu} \times 1]$.
   The cyclic prefix insertion can be described in matrix form as multiplication of
   the transmitted signal with a matrix $\mathbf{G}_{cp}$ that has two diagonals, as
   shown in [Engelhart et al, 1999]:
   $$\mathbf{G}_{cp} = \begin{bmatrix} \begin{matrix} \mathbf{0}_{N_g \times (N_{FFT} - N_g)} & \mathbf{I}_{N_g \times N_g} \end{matrix} \\ \mathbf{I}_{N_{FFT} \times N_{FFT}} \end{bmatrix} \tag{2.23}$$
   $$\mathbf{s}_m = \mathbf{G}_{cp} \, \mathbf{\Phi} \, \mathbf{a}_m \tag{2.24}$$
   $\mathbf{0}$ - zero matrix;
   $\mathbf{I}$ - identity matrix;
2. Instead of generating the orthogonal sub-carriers, an IDFT operation can be
   used as proposed by [Weinstein, Ebert, 1971]. The advantage of using an
   IFFT operation instead of an IDFT is described in Section 2.4.6. The cyclic
   prefix is added to the IDFT result:
   $$\mathbf{s}_m = \mathbf{G}_{cp} \, \mathrm{IFFT}(\mathbf{a}_m) \tag{2.25}$$
3. By copying $N_g$ trailing rows before the leading rows of $\mathbf{\Phi}$, an
   orthogonality matrix already containing the cyclic prefix
   ($\mathbf{\Phi}_{cp}$, size $[N_{gu} \times N_s]$) is formed. Both modulation and
   prefix insertion can now be done by one matrix multiplication
   ($\mathbf{\Phi}_{cp} \mathbf{a}$), resulting in a time-domain signal vector of
   size $[N_{gu} \times 1]$:
   $$\mathbf{s}_m = \mathbf{\Phi}_{cp} \, \mathbf{a}_m \tag{2.26}$$
The orthogonality matrices $\mathbf{\Phi}$ and $\mathbf{\Phi}_{cp}$ are described in
detail in Sections 2.4.4 and 2.4.5.
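That the first and third approaches produce the same transmitted vector can be verified on a small example. The Python sketch below is illustrative only: it assumes an unnormalized orthogonality matrix with entries $e^{j 2 \pi k i / N_u}$, a toy size of eight sub-carriers, and hypothetical helper names (`orthogonality_matrix`, `cp_matrix`, `matmul`, `matvec`).

```python
import cmath


def orthogonality_matrix(n_u, n_s):
    """Phi: column k holds n_u samples of the k-th base function."""
    return [[cmath.exp(2j * cmath.pi * k * i / n_u) for k in range(n_s)]
            for i in range(n_u)]


def cp_matrix(n_u, n_g):
    """G_cp: a [0 I] block (copies the last n_g rows) stacked on an identity."""
    top = [[1 if c == n_u - n_g + r else 0 for c in range(n_u)]
           for r in range(n_g)]
    identity = [[1 if c == r else 0 for c in range(n_u)] for r in range(n_u)]
    return top + identity


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, v):
    """Matrix-vector product."""
    return [sum(row[t] * v[t] for t in range(len(v))) for row in a]


N_U = N_S = 8   # toy size; Equation 2.4 requires N_s = N_u
N_G = 2
a_m = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

phi = orthogonality_matrix(N_U, N_S)
g_cp = cp_matrix(N_U, N_G)

s_approach_1 = matvec(g_cp, matvec(phi, a_m))   # Equation 2.24
phi_cp = matmul(g_cp, phi)                      # orthogonality matrix with prefix
s_approach_3 = matvec(phi_cp, a_m)              # Equation 2.26
```

With this unnormalized matrix, the product of the orthogonality matrix and the data vector equals $N_u$ times an IDFT that uses the common $1/N$ scaling, so the second approach differs from the other two only by a constant factor.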
2.4.4 Structure of Orthogonality Matrix
The matrix $\mathbf{\Phi}$ is used for mapping a vector of transmitted data onto
orthogonal sub-carriers. It is defined as:
$$\mathbf{\Phi} = \begin{bmatrix}
\phi_{0,0} & \phi_{0,1} & \cdots & \phi_{0,N_s-2} & \phi_{0,N_s-1} \\
\phi_{1,0} & \phi_{1,1} & \cdots & \phi_{1,N_s-2} & \phi_{1,N_s-1} \\
\vdots & \vdots & \phi_{i,k} & \vdots & \vdots \\
\phi_{N_u-2,0} & \phi_{N_u-2,1} & \cdots & \phi_{N_u-2,N_s-2} & \phi_{N_u-2,N_s-1} \\
\phi_{N_u-1,0} & \phi_{N_u-1,1} & \cdots & \phi_{N_u-1,N_s-2} & \phi_{N_u-1,N_s-1}
\end{bmatrix} \tag{2.27}$$
The dimensions of $\mathbf{\Phi}$ are $[N_u \times N_s]$.
Each column of the matrix contains the values of the $k$-th orthogonality
function:
$$\phi_{i,k} = \cos(\omega_k i) + j \sin(\omega_k i), \qquad i = 0, 1, \ldots, N_u - 1 \tag{2.28}$$
where $\omega_k = 2 \pi k / N_u$, so that the $k$-th function completes $k$ cycles in
$N_u$ samples. There is a total of $N_s$ used sub-carriers in the system, i.e.,
$$k = 0, 1, \ldots, N_s - 1 \tag{2.29}$$
From Equation 2.4 it is known that $N_s = N_u$.
The first column of $\mathbf{\Phi}$ contains samples of the orthogonality function
with frequency $f_0$ (zero cycles per $N_u$ samples), the second column contains a
function with one cycle, and so on. There is a total of $N_s$ columns (base
functions) in the matrix and each of them has $N_u$ samples. The real part of
$\phi_k$ is a cosine harmonic and the imaginary part is a sine harmonic.
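2.4.5 Orthogonality Matrix with Cyclic Prefix
The orthogonality matrix $\mathbf{\Phi}$ can be enhanced by including the cyclic
prefix into it. In this case there is no need for separate operations for mapping
and cyclic prefix insertion.
When a cyclic prefix of length $N_g$ is added, $N_g$ samples from the end of each
base function are copied to the beginning of that base function, so the matrix is
increased to size $[(N_u + N_g) \times N_s]$. The matrix after the addition of the
cyclic prefix is represented as:
$$\mathbf{\Phi}_{cp} = \mathbf{G}_{cp} \, \mathbf{\Phi} \tag{2.30}$$
$$\mathbf{\Phi}_{cp} = \begin{bmatrix}
\phi_{-N_g,0} & \phi_{-N_g,1} & \cdots & \phi_{-N_g,N_s-2} & \phi_{-N_g,N_s-1} \\
\phi_{-N_g+1,0} & \phi_{-N_g+1,1} & \cdots & \phi_{-N_g+1,N_s-2} & \phi_{-N_g+1,N_s-1} \\
\vdots & \vdots & & \vdots & \vdots \\
\phi_{-2,0} & \phi_{-2,1} & \cdots & \phi_{-2,N_s-2} & \phi_{-2,N_s-1} \\
\phi_{-1,0} & \phi_{-1,1} & \cdots & \phi_{-1,N_s-2} & \phi_{-1,N_s-1} \\
\phi_{0,0} & \phi_{0,1} & \cdots & \phi_{0,N_s-2} & \phi_{0,N_s-1} \\
\phi_{1,0} & \phi_{1,1} & \cdots & \phi_{1,N_s-2} & \phi_{1,N_s-1} \\
\vdots & \vdots & \phi_{i,k} & \vdots & \vdots \\
\phi_{N_u-2,0} & \phi_{N_u-2,1} & \cdots & \phi_{N_u-2,N_s-2} & \phi_{N_u-2,N_s-1} \\
\phi_{N_u-1,0} & \phi_{N_u-1,1} & \cdots & \phi_{N_u-1,N_s-2} & \phi_{N_u-1,N_s-1}
\end{bmatrix} \tag{2.31}$$
It is observed from Equation 2.31 that the lower sub-matrix is equal to the
matrix $\mathbf{\Phi}$ from Equation 2.27. The upper sub-matrix contains the cyclic
prefix parts of the orthogonality functions.
The values of the $k$-th column of $\mathbf{\Phi}_{cp}$ are calculated as:
$$\phi^{cp}_{i,k} = \cos(\omega_k i) + j \sin(\omega_k i), \qquad i = -N_g, \ldots, N_u - 1 \tag{2.32}$$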
2.4.6 OFDM Transmitter Using IDFT
As described in Section 2.4.3, the computational complexity of the modulator can
be reduced.
The number of complex multiplications required in a DFT operation is
$N_{FFT}^2$ and the number of complex additions is $N_{FFT}^2 - N_{FFT}$. The
computational complexity can be reduced further by using an FFT operation. The
numbers of complex multiplications and complex additions in an FFT operation are
of order $\frac{N_{FFT}}{2} \log_2(N_{FFT})$ and $N_{FFT} \log_2(N_{FFT})$,
respectively. As $N_{FFT}$ increases, the usefulness of the FFT becomes more
apparent. Thus, the modulation of the encoded symbols can be performed in an
efficient way by using an IFFT. A representation of an OFDM transmitter which uses
an IFFT operation instead of independent modulators for each encoded signal is
depicted in Figure 2.14.
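To give a feeling for the numbers, the operation counts quoted above can be evaluated for $N_{FFT} = 2048$, the FFT size listed later in Table 3.1. The Python sketch below is only an illustration; the function names are hypothetical and the FFT counts are the order-of-magnitude expressions from the text, assuming a power-of-two size.

```python
import math


def dft_operations(n_fft):
    """Complex multiplications and additions of a direct DFT."""
    return n_fft ** 2, n_fft ** 2 - n_fft


def fft_operations(n_fft):
    """Order-of-magnitude counts for an FFT (n_fft assumed a power of two)."""
    stages = int(math.log2(n_fft))
    return (n_fft // 2) * stages, n_fft * stages


N_FFT = 2048
dft_mul, dft_add = dft_operations(N_FFT)   # 4194304 and 4192256
fft_mul, fft_add = fft_operations(N_FFT)   # 11264 and 22528
speedup = dft_mul / fft_mul                # ratio of multiplication counts
```

For this size the direct DFT needs roughly 370 times more complex multiplications than the FFT.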
Figure 2.14: IFFT-based OFDM transmitter
At a particular signalling interval $m$, the transmitted signal is given by:
$$\mathbf{s}_m = \mathbf{G}_{cp} \, \mathbf{\Phi} \, \mathbf{a}_m \equiv \mathbf{G}_{cp} \, \mathrm{IFFT}(\mathbf{a}_m) \tag{2.33}$$
2.5 Channel Models
The channel models are not discussed in this iteration of the simulator. This ap-
proach is followed because the channel is decoupled from the transceiver tech-
nology and the basic OFDM system is described without considering a channel.
Various channel models are developed in the next chapter.
2.6 Basic OFDM Receiver
The block diagram of a basic OFDM receiver, which is adopted from Figure 2.5,
is shown in Figure 2.15. The received discrete signal vector $\mathbf{r}_m$ of size
$[1 \times N_{gu}]$ contains the transmitted OFDM symbol affected by the channel.
Although no particular channel model is considered in this chapter, an imitation of
the channel is assumed in order to maintain a full view of the generic OFDM system.
It does not distort the transmitted signal.
Figure 2.15: Basic OFDM receiver
After reception, the cyclic prefix is removed because it is not a part of the
useful transmitted symbol. Then, the data symbols originally mapped onto
orthogonal sub-carriers in the transmitter (vector $\mathbf{a}$) are demodulated.
The received time-domain signal vector $\mathbf{r}$ is transformed into a
frequency-domain signal vector $\mathbf{z}$. Similar to the techniques described in
Section 2.4.3 and shown in Figure 2.13, these operations are performed at the
receiver in one of the following ways:
1. Discarding the first $N_g$ OFDM samples of the received signal $\mathbf{r}_m$ by
   multiplying it with a cyclic prefix removal matrix $\tilde{\mathbf{G}}_{cp}$. The
   size of vector $\mathbf{r}$ is decreased from $[1 \times N_{gu}]$ to
   $[1 \times N_u]$. Then, the signal is converted into a parallel form by
   transposing the vector. Finally, the received signal vector without the cyclic
   prefix is multiplied with the Hermitian of the orthogonality matrix
   $\mathbf{\Phi}$:
   $$\tilde{\mathbf{G}}_{cp} = \begin{bmatrix} \mathbf{0}_{N_g \times N_{FFT}} \\ \mathbf{I}_{N_{FFT} \times N_{FFT}} \end{bmatrix} \tag{2.34}$$
   $$\mathbf{z}_m = \mathbf{\Phi}^H \left( \mathbf{r}_m \, \tilde{\mathbf{G}}_{cp} \right)^T \tag{2.35}$$
   $\mathbf{0}$ - zero matrix;
   $\mathbf{I}$ - identity matrix;
2. Discarding the first $N_g$ samples of the received signal $\mathbf{r}_m$ and then
   performing an FFT on the result:
   $$\mathbf{z}_m = \mathrm{FFT}\left( \left( \mathbf{r}_m \, \tilde{\mathbf{G}}_{cp} \right)^T \right) \tag{2.36}$$
   This approach has the lowest computational complexity;
3. Discarding the cyclic prefix and performing the FFT at the same time by
   multiplying the received signal $\mathbf{r}_m$ by a Hermitian of the
   orthogonality matrix $\mathbf{\Phi}_{cp}$ that has zeros in its first $N_g$ rows
   (it shall be called $\tilde{\mathbf{\Phi}}^H_{cp}$):
   $$\tilde{\mathbf{\Phi}}^H_{cp} = \mathbf{\Phi}^H \, \tilde{\mathbf{G}}_{cp}^T \tag{2.37}$$
   $$\mathbf{z}_m = \tilde{\mathbf{\Phi}}^H_{cp} \, \mathbf{r}_m \tag{2.38}$$
   This approach has high computational complexity as compared to the FFT but
   is written in compact matrix-vector notation. Because of this compact
   notation, this approach is used in the report.
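The receiver steps above can be tried in a loopback experiment with an ideal channel. The Python sketch below is a minimal illustration, not the thesis implementation: `modulate` and `demodulate` are hypothetical names, the base functions are unnormalized, and the demodulator therefore divides by the number of sub-carriers, a scaling that is an assumption of this sketch.

```python
import cmath


def modulate(a, n_g):
    """Transmitter: base functions evaluated from i = -n_g, i.e. with prefix."""
    n = len(a)
    return [sum(a[k] * cmath.exp(2j * cmath.pi * k * i / n) for k in range(n))
            for i in range(-n_g, n)]


def demodulate(r, n_g):
    """Receiver: drop the first n_g samples, then correlate with each base
    function (Hermitian of the orthogonality matrix, scaled by 1/N)."""
    useful = r[n_g:]
    n = len(useful)
    return [sum(useful[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                for i in range(n)) / n
            for k in range(n)]


a_m = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]   # N_s = 4 data symbols (toy values)
r_m = modulate(a_m, n_g=1)                 # ideal channel: received = sent
z_m = demodulate(r_m, n_g=1)               # recovers a_m up to rounding error
```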
After the demodulation, a complex-valued data symbol vector $\mathbf{z}_m$ with
dimensions $[N_u \times 1]$ is obtained. Depending on the channel model used, the
received data symbols $\mathbf{z}_m$ may have been affected by the channel. The data
symbols have to be restored by using a channel equalization function $F_{eq}$:
$$\mathbf{y}_m = F_{eq}(\mathbf{z}_m) \tag{2.39}$$
This function puts the constellations of the received symbols into their
estimated correct positions in the complex plane by canceling the effects of the
multipath radio channel. Finally, a hard decision has to be made on the data symbols
$\mathbf{y}_m$. A demapping scheme corresponding to its counterpart used at the
transmitter is applied to extract a serial bit sequence:
$$\mathbf{x}_m = F_{dec}(\mathbf{y}_m) \tag{2.40}$$
$$\hat{\mathbf{b}}_m = F_{demap}(\mathbf{x}_m) \tag{2.41}$$
2.7 Summary
In this chapter, a basic analytical model of an OFDM transceiver is explained
using matrix vector notation. The model does not take any radio channel into
account.
In the next chapter, a few channel models are presented and analyzed. The
transceiver is placed in an environment represented by the channel and a channel
estimation algorithm is applied at the receiver to recover the corrupted data.
CHAPTER 3
DOWNLINK SIMULATOR
The goal of this chapter is to extend the basic OFDM transceiver from the previous
chapter with radio channel models. Several channel models with an increasing
order of complexity and real environment resemblance are presented.
The channel models are created in order to enhance the knowledge of OFDM
technology that was gained in the previous chapters. The OFDM performance in
several theoretical environments is analyzed through simulations.
A simple Additive White Gaussian Noise (AWGN) channel simulator that
does not require any channel estimation is presented first. It is then extended to
a multipath slow-fading Rayleigh channel. Finally, the Rayleigh channel is
extended to a fast-fading multipath Jakes channel model that includes International
Telecommunication Union (ITU) profiles.
A literature survey of various channel estimation algorithms is given. One
algorithm is applied in order to explore the effect of the channel on the transmitted
signal.
WiMAX is one of the latest applications in which OFDM is applied. The
parameters specified in the WiMAX (IEEE 802.16) standard are used for testing the
simulator.
WiMAX is a high-throughput broadband wireless technology that provides
connections over long distances. It is intended to be used for a number of
applications, including "last mile" broadband connections, hotspots and high-speed
enterprise connectivity for business [Intel, 2004].
Various parameters regarding the WiMAX specification are given in
[Yaghoobi, 2004, pp. 203]. The parameters that are relevant to this project are
shown in Table 3.1.
Parameter                                       Value
Sampling frequency ($F_0$)                      20 MHz
Sample time ($1/F_0$)                           50 ns
FFT size ($N_{FFT} = N_s$)                      2048
Sub-carrier spacing ($\Delta f = F_0/N_s$)      9.76 kHz
Useful symbol time ($T_u = 1/\Delta f$)         102.4 µs
Guard time ($T_g = T_u/8$)                      12.8 µs
OFDM symbol length ($T_{gu} = T_u + T_g$)       115.2 µs
Table 3.1: WiMAX parameters
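The derived rows of Table 3.1 follow from the sampling frequency and the FFT size alone. The short Python sketch below reproduces them as a consistency check; the variable names are illustrative, and the guard length in samples (`n_g`) is an inference from $T_g = T_u/8$ rather than a value stated in the table.

```python
F0 = 20e6     # sampling frequency [Hz]
N_S = 2048    # FFT size, equal to the number of sub-carriers

sample_time = 1 / F0     # 50 ns
delta_f = F0 / N_S       # sub-carrier spacing: 9765.625 Hz
t_u = 1 / delta_f        # useful symbol time: 102.4 microseconds
t_g = t_u / 8            # guard time: 12.8 microseconds
t_gu = t_u + t_g         # OFDM symbol length: 115.2 microseconds
n_g = N_S // 8           # guard length in samples (inferred): 256
```

The exact sub-carrier spacing is 9765.625 Hz, which the table lists as 9.76 kHz.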
3.1 Additive White Gaussian Noise Channel Simulator
The noise characteristics (white, additive and Gaussian) are most often used to
model noise in a communication system. Since zero-mean Gaussian noise is
completely characterized by its variance, this model is particularly simple to use in
the detection of signals [Sklar, 2001, pp. 33]. In order to seek analytical
tractability, the problem of analyzing a complex channel model is simplified by
conditioning it with an elementary channel affected only by Additive White Gaussian
Noise (AWGN). Unconditioning is performed in order to approach reality in further
models.
3.1.1 Definitions
A random signal with known statistical properties of amplitude, distribution, and
spectral density, having a frequency spectrum that is continuous and uniform over
a specified frequency band, is called white noise [Haykin, 2001]. For a specified
bandwidth consisting of a continuous frequency spectrum, the total power in the
specified bandwidth divided by the specified bandwidth is termed the spectral
density [Haykin, 2001]. It is usually expressed in Watt per Hertz
[Telecom Glossary, 2000]. The power spectral density of white noise is independent
of the operating frequency and is given according to [Haykin, 2001] as:
$$S_w(f) = \frac{N_0}{2} \tag{3.1}$$
$S_w(f)$ - power spectral density of white noise;
$N_0$ is expressed in Watt per Hertz [W/Hz]. The parameter $N_0$ is usually
referenced to the input stage of the receiver of a communication system
[Haykin, 2001, pp. 61].
The auto-correlation function for the white noise is
$$R_w(\tau) = \frac{N_0}{2} \, \delta(\tau) \tag{3.2}$$
$R_w(\tau)$ - auto-correlation function of white noise;
The delta function in Equation 3.2 means that the noise signal $w(t)$ is totally
uncorrelated with its time shifted version for any $\tau > 0$, or that any two
different samples of a white noise process are uncorrelated. A random variable
$R$ has a Gaussian distribution function if its probability density has the form
[Haykin, 2001, pp. 54]:
$$PDF(R(r)) = \frac{1}{\sqrt{2 \pi} \, \sigma_R} \, e^{-\frac{(r - \mu_R)^2}{2 \sigma_R^2}} \tag{3.3}$$
$\mu_R$ - mean of the random variable $R$;
$\sigma_R^2$ - variance of $R$;
3.1.2 Assumptions
- The simulator uses AWGN as the only source of degradation to the transmitted
  signal;
- All the sub-carriers experience the same attenuation on their way to the
  receiver;
- The transmitter and the receiver are perfectly synchronized.
3.1.3 Channel Model
Since the noise in this iteration of the OFDM simulator is a Gaussian process
and the samples are uncorrelated, the noise samples are also independent. Such a
channel is called a memoryless channel because the AWGN affects each transmitted
symbol independently. The term additive means that the noise is simply added
to the transmitted signal and that there are no multiplicative mechanisms involved
[Sklar, 2001, pp. 33].
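An additive, memoryless noise source of this kind can be sketched in a few lines of Python. The sketch is illustrative only and rests on stated assumptions: the symbols have unit energy, so the energy per bit is one divided by the number of bits per symbol, and independent Gaussian samples with variance $N_0/2$ are added to the real and imaginary parts. The function name `add_awgn` and its parameters are hypothetical.

```python
import math
import random


def add_awgn(symbols, eb_n0_db, bits_per_symbol=2, rng=None):
    """Add independent complex Gaussian noise to unit-energy symbols."""
    rng = rng or random.Random()
    eb_n0 = 10 ** (eb_n0_db / 10)
    n0 = 1 / (bits_per_symbol * eb_n0)   # E_s = 1, so E_b = 1 / bits_per_symbol
    sigma = math.sqrt(n0 / 2)            # standard deviation per real dimension
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for s in symbols]


rng = random.Random(1)                           # fixed seed for repeatability
clean = [complex(1, 1) / math.sqrt(2)] * 20000   # one QPSK point, repeated
noisy = add_awgn(clean, eb_n0_db=3.0, rng=rng)
noise_power = sum(abs(n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
```

The measured `noise_power` should lie close to $N_0$, here about 0.25 for QPSK at 3 dB.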
3.1.4 Channel Estimation
No channel estimation is considered for the AWGN channel simulator. The channel
estimator is transparent in the sense that it has no effect on the signal it
receives.
3.1.5 Theoretical Performance
One of the most important metrics of performance of digital communication
systems is a plot of the bit-error probability ($P_b$) versus $E_b/N_0$
[Sklar, 2001]. The data received at the receiver is corrupted by noise during
transmission. This results in errors when estimating the transmitted data. The
smaller the required $E_b/N_0$, the more efficient is the detection process for a
given probability of error [Sklar, 2001].
The theoretical BER curves are generated according to the formulae given in
[Proakis, 2001, pp. 256]. The practical curves are plotted against the theoretical
curves as described in [Proakis, 2001, pp. 260].
$$P_b = Q\left( \sqrt{\frac{2 E_b}{N_0}} \right) \tag{3.4}$$
$E_b/N_0$ - signal-to-noise ratio per bit;
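Equation 3.4 can be evaluated directly with the complementary error function, since $Q(x) = \frac{1}{2} \mathrm{erfc}(x / \sqrt{2})$. The Python sketch below is an illustration; the function names and the chosen range of $E_b/N_0$ values are assumptions.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def theoretical_ber(eb_n0_db):
    """Equation 3.4 evaluated for E_b/N_0 given in decibels."""
    eb_n0 = 10 ** (eb_n0_db / 10)
    return q_function(math.sqrt(2 * eb_n0))


# Points of the theoretical curve from 0 dB to 10 dB in 2 dB steps.
ber_curve = [(snr_db, theoretical_ber(snr_db)) for snr_db in range(0, 11, 2)]
```

At 0 dB the expression gives a bit-error probability of about 0.079, and at 10 dB it has dropped to a few errors per million bits.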
3.1.6 Results
The simulated BER curve corresponds to the theoretical curve as shown in Fig-
ure 3.1. The basic OFDM transceiver simulator produces theoretically expected
results.
Figure 3.1: BER performance in AWGN channel
3.2 Rayleigh Channel Simulator
A slow-fading Rayleigh channel model is chosen to be the extension to the simple
AWGN channel model in the previous section. The process of extending is what
has been referred to as unconditioning ([Jeruchim et al, 2000, pp. 16]) in Sec-
tion 3.1. The Rayleigh channel is selected because it is commonly used to describe
the statistical time varying multipath component of a channel [Hara, Prasad, 2003].
This model introduces a multipath channel and a channel estimator (Figure 3.2).
This system model is more realistic compared to the AWGN channel model be-
cause of the simulated multipath effects that are inevitable in real environments.
Figure 3.2: Block diagram of the OFDM system simulator with a slow fading channel
3.2.1 Definitions
According to [Hara, Prasad, 2003, pp. 13-18], multipath fading is due to multipath
reflections of a transmitted wave by local scatterers such as houses, buildings and
man-made structures, or natural objects such as forest surrounding a mobile unit.
Figure 3.3 shows a typical multipath fading channel with $L$ paths.
Before dealing further with multipath channels, it is necessary to have an
understanding of the parameters of multipath channels and the various types of
fading. The channel can be characterized by its channel impulse response
(CIR) $h(t, \tau)$. The term $t$ represents the time variations occurring because of
the movement of the receiver (assuming that the transmitter does not move). The
channel multipath delay for a fixed value of $t$ is represented by $\tau$.
$h(t, \tau)$ is shown in Figure 3.4 [Rappaport, 1996].
Figure 3.3: Multipath fading channel
Figure 3.4: Impulse response
The delay axis $\tau$ of the impulse response can be divided into equal time delay
segments called excess delay bins, where each bin has a time delay width equal to
$\Delta\tau = \tau_{i+1} - \tau_i$. The first signal at the receiver arrives at a
relative delay $\tau_0 = 0$, as the propagation delay between the transmitter and
the receiver is neglected. A single resolvable multipath component having a delay
$\tau_i$ represents all the multipath signals received within the $i$-th bin. The
total number of equally spaced multipath components, $N_{comp}$, corresponds to the
normalized maximum excess delay of the channel, $N_{ds} = \tau_{max}/T_s$
[Rappaport, 1996]. The relative delay of the $i$-th multipath component as compared
to the first arriving component, given by $\tau_i$, is the excess delay. At some
time $t$ and delay $\tau_i$, there might be no multipath components at some excess
delay bins, as the reflections of the signal occur randomly. A power delay profile
is useful in obtaining the excess delay spread. It is calculated by taking the
spatial average of $|h(t, \tau)|^2$. The spatial average is calculated by averaging
the values of $|h(t, \tau)|^2$ at the same $\tau_i$ for different values of $t$.
A channel that passes all spectral components with approximately equal gain
and linear phase is a flat channel. Coherence bandwidth, $(\Delta f)_c$, is a
statistical measure of the range of frequencies over which the channel can be
considered flat. Coherence bandwidth is also described as the range of frequencies
over which two frequency components have a strong potential for correlation in
amplitude. Two sub-carriers that have their frequency separation greater than
$(\Delta f)_c$ are affected differently by the channel.
Delay spread and coherence bandwidth describe the time ($\tau$) dispersive nature
of the channel. The time ($t$) varying nature of the channel is given by Doppler
spread and coherence time. When a pure sinusoidal frequency $f$ is transmitted,
the received signal spectrum, called the Doppler spectrum, will have components
in the range of $(f - f_d)$ to $(f + f_d)$, where $f_d$ is the Doppler shift
depending on the velocity of the receiver and the angle between the transmitter and
the receiver [Rappaport, 1996, pp. 141]. The effects of Doppler spread are
negligible at the receiver if the baseband transmitted signal bandwidth is greater
than the Doppler spread.
Coherence time, $(\Delta t)_c$, is the time duration over which two received signals
have a strong correlation. The channel will change during the transmission of a
baseband message if the time duration of the baseband signal is greater than
$(\Delta t)_c$, thus causing distortion at the receiver.
Different transmitted signals undergo different types of fading depending on
the relation between the signal parameters like bandwidth, symbol period, etc.
and the channel parameters like delay spread, Doppler spread, etc. There are four
possible effects that are exhibited depending on the nature of the transmitted
signal, the channel, and the velocity of the mobile unit. The multipath delay spread
leads to time dispersion and frequency selective fading. The Doppler spread leads
to frequency dispersion and time selective fading. These two propagation
mechanisms are independent of one another.
Time dispersion due to multipath causes flat fading and frequency selective
fading. The time duration of the transmitted signal in a flat fading channel is
larger than the multipath time delay spread of the channel. Flat fading channels
are sometimes referred to as amplitude varying channels, frequency non-selective
channels or narrowband channels, since the bandwidth of the transmitted signal
is narrow compared to the bandwidth of the flat fading channel. The instantaneous
amplitude distribution of flat fading channels is commonly considered to
be Rayleigh distributed [Rappaport, 1996, pp. 169]. Thus, a Rayleigh flat fading
channel model assumes that the channel induces an amplitude which varies in
time ($t$) according to the Rayleigh distribution.
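The PDF of Rayleigh distribution envelope and phase is given by
$$ p(\rho) = \frac{\rho}{\sigma_r^2} \, e^{-\frac{\rho^2}{2\sigma_r^2}}, \quad (\rho > 0) \tag{3.5} $$
$$ p(\theta) = \frac{1}{2\pi}, \quad (0 \le \theta < 2\pi) \tag{3.6} $$
$p(\rho)$ and $p(\theta)$ are statistically independent.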
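As a minimal sketch of how a Rayleigh envelope and a uniform phase arise, the
complex gain can be built from two independent zero-mean Gaussian components
with standard deviation sigma_r. The function names below are hypothetical and
are not part of the simulator described in this thesis.

```python
import math
import random


def rayleigh_pdf(rho, sigma_r):
    """Envelope PDF: rho / sigma_r^2 * exp(-rho^2 / (2 sigma_r^2)) for rho > 0."""
    if rho <= 0.0:
        return 0.0
    return rho / sigma_r ** 2 * math.exp(-rho ** 2 / (2.0 * sigma_r ** 2))


def sample_envelope_and_phase(sigma_r, rng):
    """Draw one (envelope, phase) pair from two independent Gaussian components."""
    x = rng.gauss(0.0, sigma_r)  # inphase component
    y = rng.gauss(0.0, sigma_r)  # quadrature component
    envelope = math.hypot(x, y)  # Rayleigh distributed
    phase = math.atan2(y, x) % (2.0 * math.pi)  # uniform over [0, 2*pi)
    return envelope, phase
```

The mean of the sampled envelope should approach sigma_r * sqrt(pi / 2), which
is the mean of the Rayleigh distribution.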
Frequency selective fading occurs on the received signal if the channel pos-
sesses a constant gain and linear phase response over a bandwidth that is smaller
than the bandwidth of transmitted signal. In frequency selective fading, the mul-
tipath delay spread of the channel impulse response is greater than the time du-
ration of the transmitted signal and the received signal contains multiple versions
of the transmitted signal. These multiple versions are attenuated and delayed in
time and make the received signal distorted. Certain frequency components in
the received signal spectrum have greater gains than others in the frequency do-
main. Frequency selective fading channels are also called wideband channels
[Rappaport, 1996, pp. 169] since the bandwidth of the transmitted signal is wider
than the bandwidth of the channel impulse response.
If the channel impulse response changes rapidly within the symbol duration,
i.e., the coherence time of the channel is smaller than the time duration of the
transmitted symbol, the channel is a fast fading channel. Conversely, if
the channel response changes at a slower rate than the transmitted symbol, the
channel is a slow fading channel. Whether the channel is fast or slow fading is
determined by the bandwidth of the transmitted signal and the velocity of the
mobile unit, which gives the Doppler spread.
The received signal after passing through the multipath fading channel is r(t)
and is described by Equation 3.7.
$$ r(t) = \sum_{l=0}^{L-1} p_l(t) \, e^{-j 2\pi f_k \tau_l(t)} \tag{3.7} $$
$$ \phantom{r(t)} = \sum_{l=0}^{L-1} \alpha_l(t) \tag{3.8} $$
$\alpha_l(t)$ complex-valued stochastic process;
3.2.2 Assumptions
The following assumptions are made for this system model:
The channel is slow fading, frequency non-selective;
There is a perfect synchronization between the transmitter and the receiver;
Perfect channel estimation exists.
3.2.3 Channel Model
The modeled radio channel distorts the transmitted signal for two reasons:
multipath effects and white Gaussian noise. As the channel is slow fading, the
coherence time is longer than the OFDM symbol time $T_{gu}$; in other words, the
channel impulse response can be considered constant during one OFDM symbol
duration. An account of the impulse response characteristics has been given in
Section 3.2.1. According to [Jeruchim et al, 1992, pp. 374], the effects of a
channel on a sent signal may be described as:
$$ r(n) = \sum_{l=0}^{L-1} p_l(n) \, s(n - \tau_l(n)) \tag{3.9} $$
A simulated radio channel has L paths, the first of them appears at time delay
$\tau = 0$, and each path is associated with a complex attenuation coefficient
$p_l$ and a time delay $\tau_l$.
Tests are performed with the multipath delay spread being both smaller than
the cyclic prefix length $N_g$ (no ISI expected) and longer (to investigate the
effects of ISI). According to [Jeruchim et al, 1992, pp. 375], the complex
low-pass impulse response of the multipath channel can be described as:
$$ h(\tau; n) = \sum_{l=0}^{L-1} p_l(n) \, \delta(\tau - \tau_l(n)) \tag{3.10} $$
The effect of the channel on the transmitted signal can be expressed in
matrix-vector notation as multiplication of the matrix $H_m$ with the
transmitted signal vector. The matrix $H_m$ has $N_{gu}$ columns and each column
contains a delayed channel impulse response with length $N_{ds}$. If the channel
is approximated by slow-fading, i.e., the channel impulse response h is
considered to be constant during one OFDM symbol duration, the channel matrix
$H_m$ is written as:
$$
H_m =
\begin{bmatrix}
h_m[0] & 0 & \cdots & 0 \\
h_m[1] & h_m[0] & \cdots & 0 \\
h_m[2] & h_m[1] & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
h_m[N_{ds}-1] & h_m[N_{ds}-2] & \cdots & 0 \\
0 & h_m[N_{ds}-1] & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & h_m[0] \\
\vdots & \vdots & & \vdots \\
0 & \cdots & h_m[N_{ds}-1] & h_m[N_{ds}-2] \\
0 & \cdots & 0 & h_m[N_{ds}-1]
\end{bmatrix}
\tag{3.11}
$$
If the channel is modeled as fast-fading, the impulse response changes from
one OFDM sample to another.
The dimensions of $H_m$ are $[(N_{ds} + N_{gu} - 1) \times N_{gu}]$. Convolution
of the transmitted signal is performed by multiplying the matrix with the signal
vector:
$$ r_m = H_m s_m + w_m \tag{3.12} $$
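The equivalence between the matrix product and a convolution can be checked
with a small sketch. It builds the slow-fading channel matrix from an impulse
response, one delayed copy per column, and compares the product with a direct
convolution. The noise term is omitted, and all function names are hypothetical
rather than taken from the MATLAB simulator.

```python
def channel_matrix(h, n_gu):
    """Build the (N_ds + N_gu - 1) x N_gu matrix whose columns are delayed copies of h."""
    n_ds = len(h)
    rows = n_ds + n_gu - 1
    H = [[0j] * n_gu for _ in range(rows)]
    for col in range(n_gu):
        for k, tap in enumerate(h):
            H[col + k][col] = tap  # column 'col' holds h delayed by 'col' samples
    return H


def apply_channel(H, s):
    """Matrix-vector product H s (noise-free part of the received signal)."""
    return [sum(a * b for a, b in zip(row, s)) for row in H]


def direct_convolution(h, s):
    """Reference linear convolution of h and s."""
    out = [0j] * (len(h) + len(s) - 1)
    for i, hv in enumerate(h):
        for j, sv in enumerate(s):
            out[i + j] += hv * sv
    return out
```

For an impulse response of two taps and a symbol of three samples the matrix
has four rows and three columns, and both functions return the same four
output samples.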
The impulse response matrix $H_m$ can be divided into several sub-matrices
$H_m[0], H_m[1], \ldots$ of size $[N_{gu} \times N_{gu}]$. For example, the
sub-matrix $H_m[0]$ comprises the first $N_{gu}$ rows of the matrix in
Equation 3.11:
$$
H_m =
\begin{bmatrix}
H_m[0] \\
H_m[1] \\
\vdots
\end{bmatrix}
\tag{3.13}
$$
The number of sub-matrices in Equation 3.13 depends on the impulse response
length $N_{ds}$. $H_m[0]$ corresponds to the multipath channel effects on the
current transmitted symbol $s_m$. The ISI term for the $(m+1)$th received OFDM
symbol can be written as $H_m[1] s_m$.
The received OFDM symbol $r_m$ contains the current transmitted symbol $s_m$
convolved with $H_m[0]$, possible ISI from the previous transmitted symbols and
AWGN:
$$ r_m = \left( \sum_{i=0}^{\left\lceil \frac{N_{ds} + N_{gu} - 1}{N_{gu}} \right\rceil - 1} H_{m-i}[i] \, s_{m-i} \right) + w_m \tag{3.14} $$
When the multipath delay spread $N_{ds}$ is shorter than the cyclic prefix
length $N_g$, the ISI effect can be neglected and the received signal is written
as:
$$ r_m = H_m[0] \, s_m + w_m \tag{3.15} $$
$$ \phantom{r_m} = H_m[0] \, \Omega_{cp} \, a_m + w_m \tag{3.16} $$
3.2.4 Channel Estimation
If the multipath delay spread is longer than the guard period length, but not
longer than $2N_{gu}$, the current received OFDM symbol is corrupted with one
previous symbol:
$$ r_m = H_m[0] \, s_m + H_{m-1}[1] \, s_{m-1} + w_m \tag{3.17} $$
At the receiver the cyclic prefix is removed from the received signal and the
FFT is performed. Both the FFT operation and the cyclic prefix removal can be
replaced by multiplying the received signal by the orthogonality matrix
$\Omega_{cp}^H$, i.e., the Hermitian of $\Omega$ with zeros in place of the
cyclic prefix elements, as described in Section 2.6.
$$ z_m = \Omega_{cp}^H \, r_m \tag{3.18} $$
$$ \phantom{z_m} = \underbrace{\Omega_{cp}^H H_m[0] \, \Omega_{cp}}_{C_m[0]} \, a_m + \underbrace{\Omega_{cp}^H H_{m-1}[1] \, \Omega_{cp}}_{C_m[1]} \, a_{m-1} \tag{3.19} $$
$$ \phantom{z_m =} + \Omega_{cp}^H \, w_m \tag{3.20} $$
According to [Kim et al, 1999], the variance of AWGN does not change after the
FFT. Therefore, the noise component is written as $w_m$ instead of
$\Omega_{cp}^H w_m$ in Equation 3.21. The transmitted and received OFDM symbols
are related by the diagonal matrix $C_m[0]$ containing the channel effects.
$C_m[1] a_{m-1}$ is the inter-symbol interference. If the multipath delay spread
is shorter than the guard period length, $H_{m-1}[1]$ contains only zeros and
the ISI term can be ignored.
Finally, the received OFDM symbols after the FFT operation as depicted in
Figure 2.15 can be expressed as:
$$ z_m = C_m[0] \, a_m + C_m[1] \, a_{m-1} + w_m \tag{3.21} $$
The C matrices contain the channel effects; therefore, successful recovery of
the transmitted data symbols relies on correct estimation of $C_m[0]$. The
issues of OFDM channel estimation are presented in Section 3.4.
For the purpose of verifying the Rayleigh multipath channel model, a perfect
channel estimation is assumed, i.e., the channel estimation matrix
$\hat{H}_m[0]$ is known. As there is also an assumption of the normalized
channel delay spread being less than the cyclic prefix length, $\hat{H}_m[1]$
contains only zeros and the sent data symbols can be extracted:
$$ \hat{C}_m[0] = \Omega_{cp}^H \, \hat{H}_m[0] \, \Omega_{cp} \tag{3.22} $$
$$ y_m = \mathrm{diag}(\hat{C}_m[0])^H \, z_m \tag{3.23} $$
3.2.5 Theoretical Performance
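According to [Proakis, 2001, pp. 831], the expression of bit error rate for QPSK
is:
$$ P_b = \frac{1}{2} \left[ 1 - \frac{\mu}{\sqrt{2 - \mu^2}} \sum_{l=0}^{L-1} \binom{2l}{l} \left( \frac{1 - \mu^2}{4 - 2\mu^2} \right)^{l} \right] \tag{3.24} $$
$$ \mu = \sqrt{\frac{\gamma_c}{1 + \gamma_c}} \tag{3.25} $$
where $\gamma_c$ is the average received SNR per channel.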
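The theoretical curve can be evaluated numerically with a short sketch. It
implements the QPSK bit error rate expression for L Rayleigh fading paths from
[Proakis, 2001], with the average SNR per channel given as a linear ratio. The
function name and argument names are hypothetical.

```python
import math


def qpsk_rayleigh_ber(gamma_c, num_paths):
    """Bit error rate of QPSK over num_paths Rayleigh paths (Equations 3.24, 3.25).

    gamma_c is the average received SNR per channel as a linear ratio, not in dB.
    """
    mu = math.sqrt(gamma_c / (1.0 + gamma_c))
    x = (1.0 - mu ** 2) / (4.0 - 2.0 * mu ** 2)
    # Sum over l = 0 .. L-1 of binom(2l, l) * x^l
    series = sum(math.comb(2 * l, l) * x ** l for l in range(num_paths))
    return 0.5 * (1.0 - mu / math.sqrt(2.0 - mu ** 2) * series)


def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)
```

For a single path the expression reduces to 0.5 * (1 - mu / sqrt(2 - mu^2)),
and adding paths lowers the bit error rate at a fixed SNR per channel.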
3.2.6 Results
Figure 3.5 shows the bit error performance of the model in comparison to the
theoretical performance given by Equation 3.24. The theoretical AWGN BER
curve from Figure 3.1 is also included. The simulations are performed with one,
two and three Rayleigh fading paths. The maximum excess delay of the channel
is less than the cyclic prefix length. The plot shows that the simulated BER
curves correspond to the theoretical estimates. However, the Rayleigh BER curve
differs from the AWGN curve. The transceiver BER performance in a Rayleigh
fading channel is poorer than in an AWGN channel, because the symbol energy
becomes scattered in time when passing through a multipath environment. The
effect of scattering can be mitigated using methods that gather the dispersed
symbol energy back into its original time interval.
Figure 3.5: Slow fading Rayleigh channel BER performance
A second simulation is performed with two paths and the SNR fixed at 10 dB.
The delay of the first path arrival is fixed at zero ($\tau_0 = 0$). The delay
of the second path, $\tau_1$, is varied from 1 to $2N_g$ OFDM samples. It can be
seen in Figure 3.6 that the OFDM performance does not suffer when the maximum
excess delay is less than the cyclic prefix length $N_g$.
Figure 3.6: BER performance with fixed SNR and varying second path delay
The performance of different symbol and cyclic prefix lengths is compared.
Figure 3.7 shows the BER performance curves of the basic OFDM system with
100 µs symbol length. Two values, $T_u/8$ and 0, are considered as the cyclic
prefix length. Theoretical BER performance is also shown on the plot. ITU
Vehicular B and ITU Pedestrian B channel profiles (Tables 3.8 and 3.9) are used
for this simulation (channel profiles are discussed in Section 3.3.6). The
channel model used for these simulations differs from the model described in
Section 3.2 because the paths have fixed delays and different variances.
The maximum delay spread of the ITU Vehicular B channel profile is 20 µs.
It exceeds the cyclic prefix of 12 µs. Figure 3.7 shows that the BER performance
is worse than the theoretical limit. The ITU Pedestrian B channel profile has
a maximum delay spread of 3.7 µs. It can be seen that, for the same cyclic
prefix length, the BER performance deteriorates when the maximum delay spread
is greater.
Figure 3.8 shows the results of a simulation with 30 µs OFDM symbol length.
As expected, the shorter symbol length results in higher inter-symbol
interference and bit error rate.
Figure 3.7: Rayleigh channel BER performance, $T_u$ = 100 µs
Figure 3.8: Rayleigh channel BER performance, $T_u$ = 30 µs
3.3 Jakes Channel Model
The channel models analyzed in the previous sections are Additive White
Gaussian Noise (AWGN) and Rayleigh multipath channels. The AWGN channel model
has a drawback of not considering any multipath effects on the transmitted signal.
The Rayleigh fading channel model is only suitable for simulating slow-fading
channels because the channel impulse response is calculated for each OFDM sym-
bol independently. The individual channel samples are not correlated in time. The
next step is to introduce time correlation to the Rayleigh channel model.
Historically, the Jakes model has been used for modeling a Rayleigh fading
channel. The Jakes simulator models the low-pass envelope of a stationary (flat)
frequency non-selective (see Section 3.2.1) mobile fading channel under isotropic
scattering conditions [Rappaport, 1996]. The condition in which the transmitted
energy arrives equally distributed over all possible spatial angles, with
uniformly distributed phases, is called an isotropic condition [Øien, 2003]. An
approximate analytical model for such a channel is a zero-mean complex Gaussian
noise process with uncorrelated inphase and quadrature components. The Jakes
model allows an effective approximation of the desired analytical model by using
a finite number of low-frequency oscillators [Pätzold, Laue, 1998].
A reference model gives theoretical performance of the Jakes model and al-
lows performance evaluation of the practical Jakes simulator. Next, a practical
approach to Jakes simulator is given together with its statistical properties. In
order to use the Jakes simulator for multipath frequency-selective channel mod-
eling, a single-path case is considered and extended to multipath.
3.3.1 Reference Model
The complex low-pass Rayleigh envelope for the frequency non-selective (single
path) Jakes reference model is [Xiao et al, 2002]:
$$ g(t) = g_1(t) + j \, g_2(t) \tag{3.26} $$
$$ g_1(t) = \sqrt{\frac{2}{N_{osc}}} \sum_{n=1}^{N_{osc}} \cos(2\pi f_d t \cos\alpha_n + \varphi_n) \tag{3.27} $$
$$ g_2(t) = \sqrt{\frac{2}{N_{osc}}} \sum_{n=1}^{N_{osc}} \sin(2\pi f_d t \cos\alpha_n + \varphi_n) \tag{3.28} $$
For large $N_{osc}$, the central limit theorem justifies that $g_1(t)$ and
$g_2(t)$ can be approximated as Gaussian random processes, assuming that
$\alpha_n$ and $\varphi_n$ are mutually independent and uniformly distributed
over $[-\pi, \pi]$ for each oscillator [Xiao et al, 2002].
The reference model gives the principle of generating a Rayleigh process by
using two banks of oscillators.
Second-order statistics, namely auto-correlation and cross-correlation func-
tions, are useful for analyzing the correlation properties of the inphase and quadra-
ture components. They are given as, [Xiao et al, 2002]:
$$ R_{g_1 g_1}(\tau) = E[g_1(t) \, g_1(t + \tau)] = J_0(2\pi f_d \tau) \tag{3.29} $$
$$ R_{g_2 g_2}(\tau) = E[g_2(t) \, g_2(t + \tau)] = J_0(2\pi f_d \tau) \tag{3.30} $$
$$ R_{g_2 g_1}(\tau) = E[g_2(t) \, g_1(t + \tau)] = 0 \tag{3.31} $$
$$ R_{g_1 g_2}(\tau) = E[g_1(t) \, g_2(t + \tau)] = 0 \tag{3.32} $$
3.3.2 Jakes Model
It is possible to reduce the computational complexity of the reference model. Ac-
cording to [Xiao et al, 2002] and [Pop, Beaulieu, 2002], the number of oscillators
in each bank can be reduced. The complex low-pass envelope g is given by the
Jakes model as:
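$$ g(t) = g_1(t) + j \, g_2(t) \tag{3.33} $$
$$ g_1(t) = \frac{2}{\sqrt{N_{osc}}} \sum_{n=1}^{M+1} u_n \cos(\omega_n t + \varphi_n) \tag{3.34} $$
$$ g_2(t) = \frac{2}{\sqrt{N_{osc}}} \sum_{n=1}^{M+1} v_n \sin(\omega_n t + \varphi_n) \tag{3.35} $$
where
$$ N_{osc} = 4M + 2 \tag{3.36} $$
$$ u_n = \begin{cases} 2\cos\beta_n, & n = 1, 2, \ldots, M \\ \sqrt{2}\cos\beta_n, & n = M + 1 \end{cases} \tag{3.37} $$
$$ v_n = \begin{cases} 2\sin\beta_n, & n = 1, 2, \ldots, M \\ \sqrt{2}\sin\beta_n, & n = M + 1 \end{cases} \tag{3.38} $$
$$ \beta_n = \begin{cases} \frac{\pi n}{M}, & n = 1, 2, \ldots, M \\ \frac{\pi}{4}, & n = M + 1 \end{cases} \tag{3.39} $$
$$ \omega_n = \begin{cases} \omega_d \cos\frac{2\pi n}{N_{osc}}, & n = 1, 2, \ldots, M \\ \omega_d, & n = M + 1 \end{cases} \tag{3.40} $$
$$ \omega_d = 2\pi f_d \tag{3.41} $$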
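The oscillator bank described by Equations 3.33 to 3.41 can be sketched as
follows. This is an illustration under stated assumptions and not the MATLAB
or Handel-C code of the project: the function names are hypothetical, and the
same random phase is assumed to be used by the cosine and the sine term of
each oscillator.

```python
import math
import random


def jakes_parameters(M, f_d):
    """Return (N_osc, u, v, omega) for M oscillators plus the Doppler-frequency term."""
    n_osc = 4 * M + 2
    omega_d = 2.0 * math.pi * f_d
    beta = [math.pi * n / M for n in range(1, M + 1)] + [math.pi / 4.0]
    u = [2.0 * math.cos(b) for b in beta[:M]] + [math.sqrt(2.0) * math.cos(beta[M])]
    v = [2.0 * math.sin(b) for b in beta[:M]] + [math.sqrt(2.0) * math.sin(beta[M])]
    omega = [omega_d * math.cos(2.0 * math.pi * n / n_osc) for n in range(1, M + 1)]
    omega.append(omega_d)
    return n_osc, u, v, omega


def random_phases(M, rng):
    """One random phase per oscillator (assumption: uniform over [-pi, pi])."""
    return [rng.uniform(-math.pi, math.pi) for _ in range(M + 1)]


def jakes_gain(t, M, f_d, phases):
    """Complex channel gain g(t) = g1(t) + j g2(t) for one path."""
    n_osc, u, v, omega = jakes_parameters(M, f_d)
    scale = 2.0 / math.sqrt(n_osc)
    g1 = scale * sum(un * math.cos(wn * t + ph) for un, wn, ph in zip(u, omega, phases))
    g2 = scale * sum(vn * math.sin(wn * t + ph) for vn, wn, ph in zip(v, omega, phases))
    return complex(g1, g2)
```

A path is obtained by drawing the phases once and then evaluating
`jakes_gain` at successive sample instants; a different set of phases gives a
different path. For M >= 2 the coefficients satisfy
(4 / N_osc) * sum(u_n^2 / 2) = 1, and likewise for v_n.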
The Jakes model generates the $g_1$ and $g_2$ components by two independent
banks of cosine wave generators. In the reference model, the means of $g_1$ and
$g_2$ are zero and the variance of g is equal to one. In principle, the two
banks of oscillators should generate two zero-mean real Gaussian noise processes
with identical variances. The processes are uncorrelated. However, this is not
the case, as each bank with a limited number of oscillators generates colored
noise. The noise generated from one bank is correlated with the noise generated
by the other bank. Second-order statistics for the Jakes model are given by:
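$$ R_{g_1 g_1}(\tau) = \frac{4}{N_{osc}} \left[ \sum_{n=1}^{M+1} \frac{u_n^2}{2} \cos(\omega_n \tau) \right] \tag{3.42} $$
$$ R_{g_2 g_2}(\tau) = \frac{4}{N_{osc}} \left[ \sum_{n=1}^{M+1} \frac{v_n^2}{2} \cos(\omega_n \tau) \right] \tag{3.43} $$
$$ R_{g_1 g_2}(\tau) = E\left[ g_2(t) \, g_1(t + \tau) \right] = \frac{4}{N_{osc}} \left[ \sum_{n=1}^{M+1} \frac{u_n v_n}{2} \cos(\omega_n \tau) \right] \tag{3.44} $$
$$ R_{g_2 g_1}(\tau) = R_{g_1 g_2}(\tau) \tag{3.45} $$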
3.3.3 Jakes Multipath
The Jakes fading model is suitable for simulating a flat fading, i.e.,
single-path channel [Li, Guan, 2000]. For simulating a frequency-selective
channel, it has to be extended to multipath. The wide-sense
stationary-uncorrelated scattering (WSSUS) channel is a commonly employed model
for the multipath channel [Sadowsky, Kafedziski, 1998]. The WSSUS model for a
multipath channel includes the variations in both $t$ and $\tau$ (see Section
3.2.1). The time-varying nature of the channel is modeled as a wide-sense
stationary (WSS) process. The attenuation and phase shift associated with
different delays are modeled with an uncorrelated scattering assumption
[Jeruchim et al, 1992].
The extension to multipath can be achieved in a few different ways. A method
of assigning different arrival angles for the paths and applying orthogonal
weighting functions is proposed in [Dent et al, 1993]. Another possible approach
is to use the theoretical correlation function to find a time offset after which
the auto-correlation of the process reaches a negligible value. The different
paths could be produced by the same Jakes model with the time offset.
A Jakes model with random phases of the oscillators is analyzed in
[Xiao et al, 2002] and is used in this project.
In this case, each path is modeled as a Jakes fading process with random
phases assigned to its low-frequency oscillators. All the paths are low-correlated
because of the random phases.
3.3.4 Approaches for Implementation
There are different approaches for the implementation of the multipath Jakes
channel model. One possibility is to pre-generate the sequences of g (one se-
quence for each multipath component, or multipath can be achieved by using a
single delayed process) and reuse them later in the simulations. However, this ap-
proach implies that the channel is identical throughout the simulations. Moreover,
long sequences of the Jakes process require a lot of storage space. For example,
the complex Jakes coefficients for one second of channel data ($T_s$ = 50 ns,
single path) occupy 320 megabytes.
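The storage figure can be reproduced with a short calculation. The sketch
assumes that each complex coefficient is stored as two 64-bit floating point
numbers (16 bytes), which is the default numeric type in MATLAB; this is an
assumption and is not stated in the text. The function name is hypothetical.

```python
def jakes_storage_bytes(duration_s, sample_period_s, bytes_per_complex=16):
    """Storage needed for one path of pre-generated complex channel coefficients."""
    num_samples = round(duration_s / sample_period_s)  # 1 s / 50 ns = 20e6 samples
    return num_samples * bytes_per_complex
```

One second at a 50 ns sample period gives 20 million coefficients, i.e.,
320 million bytes per path; six paths would need six times as much.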
It is decided to investigate the possibilities of calculating the g function on
the fly for every transmitted OFDM sample.
MATLAB code profiling was performed because the initial simulator code was
extremely slow. Profiling showed that almost all of the computational complexity
is concentrated in only a few lines of the code. After some optimizations,
namely replacing for loops with vector operations and using sparse matrices, the
execution time of the simulation was reduced by a factor of up to 1500.
Nevertheless, a simulation with a large number of transmitted and received bits
remains time consuming. For the performance evaluation of the channel estimation
algorithm described in Section 3.4, a fast-fading channel simulator is required.
A transmission of $50 \times 10^6$ bits through a fast fading channel with 6
paths takes around 1 hour for a single point of a BER curve. The rate of the
simulated transmission is 7500 OFDM samples per second. The PC used for this
measurement has a 2.4 GHz Pentium IV CPU, 1 GB RAM and runs the Linux 2.6
operating system.
3.3.5 Results
Figure 3.9(a) shows the envelope of the Jakes model output waveform. The
Rayleigh process is generated using one path of Equation 3.33. It can be
observed that the process is correlated in time. The probability density
function (PDF) of the waveform envelope is shown in Figure 3.9(b). The Jakes
envelope is an approximation of a Rayleigh random process. It can be observed
that the PDF approaches the theoretical curve as M increases. Choosing the
parameter M is a trade-off between computational complexity and approximation
accuracy. It can be seen from Equations 3.33-3.40 that incrementing M by one
increases the storage requirements by 5 real numbers and adds a sin and a cos
operation, four multiplications and four additions to each calculation of a
complex channel attenuation. We decided to set the value of M to 20 for further
simulations as it closely approximates the theoretical Rayleigh PDF.
Figure 3.9: Jakes envelope ($f_d$ = 400 Hz, $f_s$ = 20 MHz, carrier frequency: 3.5 GHz).
(a) Jakes envelope; (b) Probability Density Function of Jakes envelope
The auto and cross-correlation plots shown in Figure 3.10 depict the compari-
son of the correlation properties of the Jakes model against the reference model.
The simulated curves are obtained using Jakes single path simulator given in
Equation 3.33, the reference curves correspond to Equations 3.29 to 3.32. The
theoretical correlation sequences are given in Equations 3.42 to 3.45. It can be
observed that the obtained correlation sequences approximate the desired correla-
tion characteristics.
The auto-correlation function of the inphase or quadrature component decreases
in time at a rate that corresponds to the Doppler frequency. As the Doppler
frequency increases, the time in which the channel is highly correlated (channel
coherence time $(\Delta t)_c$) decreases. This effect can be observed in Figures
3.10(b) (400 Hz maximum Doppler frequency: 3.5 GHz carrier frequency at
120 km/h) and 3.10(d) (100 Hz maximum Doppler frequency: 3.5 GHz at 30 km/h).
The Jakes simulator provides a sequence of complex Rayleigh distributed path
gains. These path gains are correlated in time in a controllable fashion.
According to [Rappaport, 1996], the coherence time of a channel (see Section
3.2.1) corresponding to the Doppler shift of 400 Hz (3.5 GHz carrier frequency
at the speed of 120 km/h) is equal to:
$$ (\Delta t)_c = \sqrt{\frac{9}{16 \pi f_d^2}} \approx 1\,\mathrm{ms} \tag{3.46} $$
The coherence time $(\Delta t)_c$ is also defined as the time in which the
channel correlation decreases by 3 dB [Engels, 2002 (2), pp. 27]. It can be seen
from Figure 3.10 that the Jakes path correlation decreases by 3 dB (the
correlation coefficient decreases by a factor of 2) in approximately
1 millisecond; thus the correlation corresponds to the theoretical assumption
in Equation 3.46.
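The numbers quoted above can be verified with a small worked example. The
maximum Doppler frequency follows from the speed and the carrier frequency as
f_d = v * f_c / c, and the coherence time from the expression
sqrt(9 / (16 * pi * f_d^2)) given in [Rappaport, 1996]. The function names are
hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_doppler_hz(speed_kmh, carrier_hz):
    """Maximum Doppler frequency f_d = v * f_c / c, with v converted from km/h to m/s."""
    return speed_kmh / 3.6 * carrier_hz / SPEED_OF_LIGHT


def coherence_time_s(f_d):
    """Coherence time sqrt(9 / (16 * pi * f_d^2)), approximately 0.423 / f_d."""
    return math.sqrt(9.0 / (16.0 * math.pi * f_d ** 2))
```

At 120 km/h and 3.5 GHz the Doppler frequency is close to 390 Hz, which is
rounded to 400 Hz in the text, and a Doppler frequency of 400 Hz gives a
coherence time of about 1.06 ms.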
As the channel coherence time ($(\Delta t)_c$ = 1 ms) is much larger than the
OFDM symbol time ($T_{gu}$ = 100 µs), the channel can be considered to be slow
fading in a WiMAX system. The model is capable of providing a fast fading
channel model as well.
Tables 3.2 and 3.3 give the cross-correlation coefficients of the inphase and
quadrature components between 5 different Jakes paths. It can be seen that the
different paths are weakly correlated, which approaches the WSSUS condition of
zero cross-correlation between paths.
         Path 1    Path 2    Path 3    Path 4    Path 5
Path 1   1.0000   -0.0841   -0.1458   -0.0513   -0.0364
Path 2             1.0000    0.0410   -0.0670    0.0981
Path 3                       1.0000   -0.0237    0.1185
Path 4                                 1.0000    0.1174
Path 5                                           1.0000
Table 3.2: Cross-correlation coefficients of Jakes model path inphase components
         Path 1    Path 2    Path 3    Path 4    Path 5
Path 1   1.0000    0.0295   -0.0410    0.0269    0.0542
Path 2             1.0000    0.0510    0.0409    0.1791
Path 3                       1.0000   -0.0663    0.1793
Path 4                                 1.0000   -0.0393
Path 5                                           1.0000
Table 3.3: Cross-correlation coefficients of Jakes model path quadrature components
Figure 3.10: Jakes output waveform correlation ($f_s$ = 20 MHz, carrier frequency: 3.5 GHz).
(a) Inphase component auto-correlation, $f_d$ = 400 Hz;
(b) Quadrature component auto-correlation, $f_d$ = 400 Hz;
(c) Cross-correlation, $f_d$ = 400 Hz;
(d) Quadrature component auto-correlation, $f_d$ = 100 Hz
3.3.6 Channel Profiles
In order to bring a channel model even closer to reality, a well-defined channel
profile has to be used. A channel profile defines the average signal
attenuations at certain path delays. The ITU Vehicular A, Vehicular B and
Pedestrian B channel profiles are used to specify the properties of the
multipath Jakes model in this project. The path delays and the average powers
are given in Tables 3.4 - 3.6.
Relative delay     Average relative
$\tau_i$, ns       power $A_i$, dB
0                    0.0
310                 -1.0
710                 -9.0
1090               -10.0
1730               -15.0
2510               -20.0
Table 3.4: ITU Vehicular A channel profile
Relative delay     Average relative
$\tau_i$, ns       power $A_i$, dB
0                   -2.5
300                  0.0
8900               -12.8
12900              -10.0
17100              -25.2
20000              -16.0
Table 3.5: ITU Vehicular B channel profile
Relative delay     Average relative
$\tau_i$, ns       power $A_i$, dB
0                    0.0
200                 -0.9
800                 -4.9
1200                -8.0
2300                -7.8
3700               -23.9
Table 3.6: ITU Pedestrian B channel profile
It is seen that the delays $\tau_i$ of the ITU Vehicular A profile are not
sample-spaced according to the OFDM sampling period ($T_s$ = 50.0 ns) specified
by WiMAX in Table 3.1. The channel profile is resampled using this sampling
period in order to fulfill the assumption of the delays $\hat{\tau}_i$ being
sample-spaced. The channel is resampled using a simple approach of assigning the
paths to the closest OFDM sample:
$$ \hat{\tau}_i = \lfloor \tau_i \rceil, \quad i = 1, 2, \ldots, N \tag{3.47} $$
$\lfloor \cdot \rceil$ rounding to the closest OFDM sample instant;
N number of paths in channel profile;
Next, the mean attenuation values A, given in dB, are converted to path
variances $\sigma_p^2$. In order to be able to control the signal power in the
simulator, it is desired to have the sum of path variances equal to one, but the
given profiles do not have this property. Therefore, the average path powers are
recalculated for all the three channel profiles by using the following
equations:
$$ A_i = 10 \log_{10} \left( \frac{\sigma_i^2}{\sigma_1^2} \right), \quad i = 1, 2, \ldots, N \tag{3.48} $$
$$ \sigma_i^2 = \sigma_1^2 \, 10^{\frac{A_i}{10}} \tag{3.49} $$
$$ \sum_{i=1}^{N} \sigma_i^2 = \sum_{i=1}^{N} \left( \sigma_1^2 \, 10^{\frac{A_i}{10}} \right) = \sigma_1^2 \sum_{i=1}^{N} 10^{\frac{A_i}{10}} = 1 \tag{3.50} $$
$$ \sigma_1^2 = \frac{1}{\sum_{i=1}^{N} 10^{\frac{A_i}{10}}} \tag{3.51} $$
$\sigma_i^2$ variance (energy gain) of the $i$th path;
N number of paths in channel profile;
The variance of the first path is calculated using Equation 3.51 and then the
rest are calculated using Equation 3.49. The resulting channel profile is given
in Table 3.7. The path delays are sample-spaced and the sum of tap attenuations
is equal to one, resulting in no amplification in the multipath channel. The
same approach is taken to resample the ITU Vehicular B and ITU Pedestrian B
channel profiles [Chang et al, 2003]. They are given in Tables 3.8 and 3.9.
Relative delay         Average relative
$\hat{\tau}_i$, ns     power $\sigma_i^2$
0                      0.485 (-3.1 dB)
300                    0.385 (-4.1 dB)
700                    0.061 (-12.1 dB)
1100                   0.049 (-13.1 dB)
1750                   0.015 (-18.1 dB)
2500                   0.005 (-23.1 dB)
Table 3.7: Resampled ITU Vehicular A channel profile
Relative delay         Average relative
$\hat{\tau}_i$, ns     power $\sigma_i^2$
0                      0.322 (-4.9 dB)
300                    0.574 (-2.4 dB)
8900                   0.030 (-15.2 dB)
12900                  0.057 (-12.4 dB)
17100                  0.002 (-27.6 dB)
20000                  0.014 (-18.4 dB)
Table 3.8: Resampled ITU Vehicular B channel profile
Relative delay         Average relative
$\hat{\tau}_i$, ns     power $\sigma_i^2$
0                      0.406 (-3.9 dB)
200                    0.330 (-4.8 dB)
800                    0.131 (-8.8 dB)
1200                   0.064 (-11.9 dB)
2300                   0.067 (-11.7 dB)
3700                   0.002 (-27.8 dB)
Table 3.9: Resampled ITU Pedestrian B channel profile
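The resampling and normalization steps can be sketched in a few lines. The
delays are rounded to the closest multiple of the 50 ns sample period and the
dB powers are converted to linear variances that sum to one. The sketch
normalizes by the sum over all paths, which coincides with Equations 3.49 to
3.51 when the first path has a relative power of 0 dB. The function names are
hypothetical.

```python
def resample_delays(delays_ns, sample_period_ns):
    """Round each path delay to the closest multiple of the sample period."""
    return [int(d / sample_period_ns + 0.5) * sample_period_ns for d in delays_ns]


def normalise_powers(powers_db):
    """Convert relative path powers in dB to linear variances that sum to one."""
    linear = [10.0 ** (a / 10.0) for a in powers_db]
    total = sum(linear)
    return [p / total for p in linear]


# ITU Vehicular A profile (Table 3.4)
VEH_A_DELAYS_NS = [0, 310, 710, 1090, 1730, 2510]
VEH_A_POWERS_DB = [0.0, -1.0, -9.0, -10.0, -15.0, -20.0]
```

Applied to the ITU Vehicular A profile, the two functions give the delays and,
after rounding to three decimals, the path variances listed in Table 3.7.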
The Doppler effect is investigated. The ITU Vehicular A channel profile defines
the path delays and variances of the Jakes simulator. Perfect (known) channel
estimation (see Equation 3.23) is applied at the receiver. The properties of the
OFDM system are the following: $N_{FFT}$ = 256, $F_0$ = 250 kHz,
$\Delta f$ = 1 kHz, $T_s$ = 4 µs, QPSK modulation. These settings are selected
in order to reduce the inter-carrier frequency spacing $\Delta f$. Compared to
the standard WiMAX parameters, the ICI effect is more substantial because the
sub-carriers are closer in frequency, i.e., sub-carriers with closer frequency
spacing overlap more due to Doppler spread. Simulation results for three Doppler
frequencies, corresponding to receiver movement at 3 km/h, 30 km/h and 120 km/h,
are shown in Figure 3.11. A theoretical multipath curve for QPSK (Equation 3.24)
is also given for reference. It can be observed that the BER performance is
close to the theoretical curve in a low Doppler environment (3 km/h). As
expected, the BER performance deteriorates when the speed of movement increases.
Figure 3.11: Jakes fast-fading channel simulation results
3.3.7 Summary
In this section, a Jakes fading channel model is presented and extended to a mul-
tipath model. It enables us to explore the performance of OFDM in a simulated
time-varying multipath radio channel with known and adjustable correlation prop-
erties.
This section starts with an assumption of slow-fading channel. It is shown by
both theory and simulation results with WiMAX parameters that a slow-fading
channel model is suitable to characterize a channel in a WiMAX system.
Several ITU channel profiles are adapted to meet the assumption of
sample-spaced path delays. A method to recalculate the path attenuations is
formulated and applied to the profiles.
In the next section, an OFDM channel estimation algorithm is presented and
the Jakes multipath channel is used for the performance evaluation of the estima-
tor.
3.4 Channel Estimation for OFDM
In the previous section, the Jakes channel model was presented. The slow-fading
radio channel that is considered is frequency selective and time variant. The
channel transfer function varies across the sub-carriers of the OFDM system and
from one OFDM symbol to the next. In order to recover the transmitted data at
the receiver, channel estimation is needed. The purpose of this section is to
relax the assumption of ideal channel estimation presented in Section 2.6.
This section begins with an overview of existing channel estimation algo-
rithms for OFDM. Then, a time-domain channel estimation algorithm is analyzed.
Finally, simulation results are presented and discussed.
3.4.1 Overview of OFDM Channel Estimation Algorithms
Various channel estimation algorithms found in the literature are presented and
compared in this section without exhaustive details.
Blind Estimation vs Pilot Symbol Assisted Modulation
There are two essential methods for OFDM channel estimation - blind and pilot
assisted estimation.
Blind channel estimation techniques provide improved spectral efficiency as
no pilot tones are needed, but they are effective only when a large amount of
data is collected under the same channel conditions. This is a disadvantage in
the case of mobile wireless systems because of their time-varying channel
[Jeremic et al, 2004]. Blind channel estimation constrains itself to a static
channel model; therefore, the use of blind channel estimation is rejected in
this project.
Pilot-aided channel estimation is another technique that involves insertion of
pilots (known symbols) to the time-frequency grid of the OFDM system and esti-
mating the channel impulse response at the receiver. This technique is called Pilot
Symbol Assisted Modulation (PSAM).
Channel estimation using PSAM consists of two stages: estimation and in-
terpolation. During estimation, the channel gains are obtained in the OFDM
frequency-time grid points that contain pilots. Then the estimates are interpolated
to cover the whole grid.
Figure 3.12: Pilot schemes. (a) Block type pilot scheme; (b) Comb type pilot scheme
Pilot Schemes
The training data has to be sent on the selected pilot tones. Various schemes
can be used to insert the pilot symbols into the transmitted data stream: block,
comb, rectangular, triangular and others. [Negi, Cioffi, 1998] gives a study of
the impact of pilot selection on channel estimation. The number of pilot tones
is treated as the number of equations in a system describing the impulse
response. Therefore, a number of pilots smaller than the channel length results
in an under-determined system and a non-unique solution $\hat{H}$. It is also
proved that equally spaced pilots are optimally spaced and thereby give a better
channel estimation. When the pilot tones are concentrated in the time-frequency
grid (not equally spaced), there is a noise enhancement effect.
The comb type pilot scheme (Figure 3.12(b)) is one that has a few pilot tones
uniformly distributed within each OFDM symbol. It has a higher re-transmission
rate than the block scheme and provides better tracking of dynamic channels.
Since only some sub-carriers contain the pilot signal, the channel response of
non-pilot sub-carriers is estimated by interpolating neighboring pilot
sub-channels. Thus, this type of pilot arrangement is sensitive to frequency
selectivity, i.e., the pilot spacing must be much smaller than the coherence
bandwidth $(\Delta f)_c$ of the channel [Hsieh, Wei, 1998]. If the pilot spacing
is greater than the coherence bandwidth, interpolation can be complicated.
A block type scheme, on the other hand, periodically transmits an OFDM
symbol containing only pilot information in time-domain (see Figure 3.12(a)).
This type of pilot arrangement is especially suitable for static channels. Since
the training block contains all pilots, channel interpolation in frequency-domain
is not required. Therefore, this type of pilot arrangement is relatively insensitive
to frequency selectivity [Hsieh, Wei, 1998].
The scattered pilot tones can be treated as noisy samples of the stochastic
channel frequency response function. They have to be placed close enough to
fulfill the Nyquist sampling theorem and avoid aliasing [Sandell, Edfors, 1996].
[Yoon et al, 2002] presents an alternative pilot scheme - a boosted impulse is
inserted before a block of OFDM symbols and surrounded by no-transmission
periods of maximum excess delay length. However, it is mentioned that this
approach gives satisfactory results only in a single-path Rayleigh channel.
A technique called boosted pilots is proposed for Digital Video Broadcast
(DVB). The pilot tones are given a higher power than the data symbols. The
average SNR of the data symbols is reduced but the channel estimates are better
and the BER can be decreased with a suitable pilot power level
[Sandell, Edfors, 1996].
Time and Frequency Domain PSAM
Channel estimation can be performed by both time-domain windowing and
frequency-domain interpolation. In time-domain windowing algorithms, the channel
impulse response (CIR) is obtained by performing an IFFT of the frequency-domain
channel response at the pilot symbols. The number of pilots must be
greater than the maximum excess delay normalized by $T_s$ [Tsai, Chiueh, 2004]:
$$N_p > \frac{\tau_{max}}{T_s} \qquad (3.52)$$
Different techniques can be applied to the CIR in order to reduce noise and
aliasing: cutting off samples below a threshold, keeping only the most significant
samples or using minimum mean squared error (MMSE) estimation. Finally, the
time-domain CIR is converted back to frequency-domain channel attenuation
coefficients at all sub-carriers by using an FFT [Tsai, Chiueh, 2004].
Frequency-domain interpolation, on the other hand, performs interpolation of
channel responses at the pilot symbols to obtain estimates at all sub-carriers. In
this case, the pilot sub-carriers over-sample the channel frequency response in the
frequency domain by at least a factor of two [Tsai, Chiueh, 2004]:
$$N_p > \frac{2\tau_{max}}{T_s} \qquad (3.53)$$
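As a rough worked example of Equations (3.52) and (3.53), the sketch below computes the smallest integer number of pilots that satisfies each bound. It assumes that $T_s$ is the sampling period; the function name `min_pilots` and the numeric values are illustrative assumptions, not parameters taken from the thesis.

```python
import math

def min_pilots(tau_max, t_s, oversampling=1):
    """Smallest integer N_p with N_p > oversampling * tau_max / t_s."""
    bound = oversampling * tau_max / t_s
    return math.floor(bound) + 1

# Illustrative values only (both in nanoseconds), chosen for the example.
tau_max_ns = 2510   # assumed maximum excess delay
t_s_ns = 100        # assumed sampling period

n_time = min_pilots(tau_max_ns, t_s_ns)                   # Equation (3.52)
n_freq = min_pilots(tau_max_ns, t_s_ns, oversampling=2)   # Equation (3.53)
print(n_time, n_freq)
```

With these assumed values, the frequency-domain bound needs roughly twice as many pilots as the time-domain bound, which is the spectral efficiency difference discussed in the text.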
Time-domain estimation algorithms improve spectral efficiency with a drawback
of higher latency and complexity. Frequency domain estimation is usually
less complex but has lower transmission efficiency because of a higher pilot rate
[Tsai, Chiueh, 2004].
Estimation Algorithms
In conventional pilot estimation methods, the estimate of pilot symbols based on
least-squares (LS) is given by:
$$\hat{\mathbf{h}}_p = \left[ \hat{h}_p(0), \hat{h}_p(1), \ldots, \hat{h}_p(N_p - 1) \right]^T = \left[ \frac{z(D_f \cdot 0)}{a(D_f \cdot 0)}, \frac{z(D_f \cdot 1)}{a(D_f \cdot 1)}, \ldots, \frac{z(D_f (N_p - 1))}{a(D_f (N_p - 1))} \right]^T \qquad (3.54)$$
where
$\hat{\mathbf{h}}$ - vector of channel estimates;
$\mathbf{z}$ - received symbol vector after FFT;
$D_f$ - pilot spacing;
$\mathbf{a}$ - transmitted data symbol vector;
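Equation (3.54) amounts to one complex division per pilot sub-carrier. The following is a minimal sketch of that step; the function name `ls_pilot_estimate` and the toy values are assumptions made for illustration, and the pilots are assumed to sit at sub-carrier indices $0, D_f, 2D_f, \ldots$ as in the equation.

```python
def ls_pilot_estimate(z, a, d_f, n_p):
    """Least-squares channel estimates at the pilot sub-carriers (Eq. 3.54).

    z   -- received frequency-domain symbols (after the FFT)
    a   -- transmitted symbols; only the pilot positions are read
    d_f -- pilot spacing in sub-carriers
    n_p -- number of pilots
    """
    return [z[d_f * k] / a[d_f * k] for k in range(n_p)]

# Toy example: 8 sub-carriers, pilots on indices 0 and 4, flat channel.
a = [1 + 0j, 0j, 0j, 0j, -1 + 0j, 0j, 0j, 0j]
h_true = 0.5 + 0.5j
z = [h_true * s for s in a]          # noiseless received symbols
h_p = ls_pilot_estimate(z, a, d_f=4, n_p=2)
print(h_p)
```

In the noiseless case the estimate equals the true channel gain; with noise, each estimate is perturbed by the noise sample divided by the pilot symbol, which is the sensitivity mentioned in the text.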
This estimate has very low computational complexity, but is susceptible to
Gaussian noise and ICI. An MMSE estimator of comb-type pilots reduces the
AWGN and ICI components significantly in slow and fast fading channels and
gives BER improvements. The computational complexity of the MMSE algorithm
is reduced by using a simplified Linear MMSE estimator (LMMSE) with low
rank approximation by singular value decomposition (SVD) [Hsieh, Wei, 1998],
[Hutter et al, 2002], [Beek, 1995]. The LMMSE estimator is also found to have
about 1.5 times lower mean square error than that of the FIR Wiener filter in
[Edfors et al, 1998].
Interpolation Algorithms
After an estimation is made to get the channel gain values at some pilot sub-
carriers, the channel has to be estimated for the useful sub-carriers.
[Coleri et al, 2002] presents a decision feedback equalizer for the block type pilot
arrangement. The channel is estimated periodically in dedicated pilot symbols. In
the other OFDM symbols, the sub-carrier estimates from the previous symbol are used
to find the transmitted data symbol. After the received symbol is mapped to binary
data by making a hard decision, the channel estimates are updated. This is done by
mapping the binary data back to data symbols and using these symbols to update
the estimates. However, the decision feedback equalizer makes an assumption
that every decision is correct. A fast-fading channel would cause a total loss of
channel estimates, since faster channel fading causes estimation error due to the
loss of channel tracking [Coleri et al, 2002].
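The decision feedback update described above can be sketched for a single OFDM symbol as follows. This is only an illustration under stated assumptions: QPSK mapping is assumed (the modulation is not specified in this paragraph), the previous estimates are assumed to be non-zero, and the names `hard_decision` and `decision_directed_update` are hypothetical.

```python
import math

# Assumed QPSK constellation with unit average power.
QPSK = [complex(i, q) / math.sqrt(2) for i in (1, -1) for q in (1, -1)]

def hard_decision(x):
    """Return the constellation point closest to x."""
    return min(QPSK, key=lambda s: abs(x - s))

def decision_directed_update(z, h_prev):
    """Equalize with the previous estimates, decide, then re-estimate."""
    h_new, decided = [], []
    for z_k, h_k in zip(z, h_prev):
        s_hat = hard_decision(z_k / h_k)   # equalize with the old estimate
        decided.append(s_hat)
        h_new.append(z_k / s_hat)          # assumes the decision is correct
    return h_new, decided

# Toy example: the channel has drifted slightly since the last estimate.
h_prev = [1 + 0j, 0.8j]
h_true = [0.9 + 0.1j, 0.1 + 0.7j]
tx = [QPSK[0], QPSK[3]]
z = [h * s for h, s in zip(h_true, tx)]
h_new, decided = decision_directed_update(z, h_prev)
```

When the channel drifts far enough between symbols that the hard decision is wrong, the re-estimated gain is wrong as well, which is the error propagation the text refers to.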
A low-rank separable filter, a frequency-direction LMMSE estimator combined with a
time-direction FIR filter, is presented in [Sandell, Edfors, 1996]. It shows low receiver
complexity and good performance in a slow fading channel.
With the comb pilot scheme, the best interpolation algorithms are spline and
low-pass filter interpolation, according to [Coleri et al, 2002].
3.4.2 Time-Domain Channel Estimation
The performance of a time-domain OFDM channel estimation algorithm is
investigated because it provides better spectral efficiency by using fewer
pilot tones than frequency-domain interpolation. A non-adaptive channel
estimator is analyzed first. The structure of the estimator is shown in Figure 3.13.
The chosen OFDM pilot scheme has $N_p = N_g$ pilot tones spaced equally
in each OFDM symbol. The spacing between the pilot tones $D_f$ is an integer
given by $D_f = N_{FFT}/N_p$. Under the assumption that $\tau_{max}/T_s < N_g$, the number of
pilots satisfies the condition in Equation (3.52). The method does not involve time
interpolation. The channel estimates are calculated for every OFDM symbol, not
taking into account the estimates from the previous symbol. Therefore, the OFDM
symbol index m is dropped in the following equations.
The data received after the FFT operation as in Figure 2.15 is in the frequency
domain and is used for channel estimation. The size of the received signal vector
$\mathbf{z}$ is $[N_{FFT} \times 1]$. The received symbol contains $N_p$ pilots. The Least Squares
estimation (Equation 3.54) is performed to obtain the vector of channel responses
at the pilot sub-carriers $\hat{\mathbf{h}}_p$. The dimensions of $\hat{\mathbf{h}}_p$ are $[N_p \times 1]$.
Figure 3.13: Structure of a time-domain channel estimator
The $\hat{\mathbf{h}}_p$ vector is converted into a time-domain channel impulse response using
an IFFT operation of size $N_p$. After the IFFT operation, the channel
impulse response is $N_p$ samples long. It holds the whole CIR as it is assumed
that the CIR length is less than $N_g$. The time-domain CIR is converted back to the
frequency domain channel response coefficients at all the sub-carriers by using an
FFT operation of size $N_s$. This corresponds to frequency-domain interpolation as
the channel response for every sub-carrier is acquired.
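The IFFT/FFT chain described above can be sketched as follows. This is a minimal illustration rather than the thesis implementation: a naive $O(N^2)$ DFT stands in for the FFT so that the example needs only the standard library, the output size is passed as a parameter `n_fft` (the text calls the FFT size $N_s$), and the function names are assumptions.

```python
import cmath

def dft(x, n_out=None, inverse=False):
    """Naive DFT (or inverse DFT); the input is zero-padded to n_out points."""
    n = n_out or len(x)
    padded = list(x) + [0j] * (n - len(x))
    sign = 1 if inverse else -1
    scale = 1.0 / n if inverse else 1.0
    return [scale * sum(padded[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n)
                        for i in range(n))
            for k in range(n)]

def time_domain_estimate(h_p, n_fft):
    """Pilot estimates -> CIR (N_p-point IFFT) -> all sub-carriers (FFT)."""
    cir = dft(h_p, inverse=True)      # N_p samples hold the whole CIR
    return dft(cir, n_out=n_fft)      # zero-padding interpolates in frequency

# Toy check: a 2-tap channel, 16 sub-carriers, 4 equally spaced pilots.
g = [1 + 0j, 0.5j, 0j, 0j]
h_full = dft(g, n_out=16)             # true frequency response
h_p = h_full[::4]                     # noiseless samples at the pilots
h_est = time_domain_estimate(h_p, 16)
```

In this noiseless toy case the estimate matches the true response on every sub-carrier, because the CIR is shorter than the number of pilots; a longer CIR would alias in the $N_p$-point IFFT.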
3.4.3 Performance of Time-Domain Channel Estimation Algorithm
The BER performance of the presented time-domain OFDM channel estimation
algorithm is shown in Figures 3.14, 3.15 and 3.16. Both slow- and fast-fading
Jakes multipath channel simulators are used (see Section 3.3.2) with the ITU
Vehicular A channel profile (see Section 3.3.6). The parameters of the OFDM system
correspond to those of the WiMAX specification (Table 3.1). A theoretical multipath
channel performance curve is also given as a reference. For ease of comparison,
the plots also include BER performance curves of an adaptive channel estimator
that is presented later in this section.
It can be seen that the time-domain channel estimation algorithm performs
about 3 dB worse than the theoretical reference curve at all speeds and SNR
values. As expected, the BER performance degrades when the speed increases.
Also, there is a difference between the BER performance obtained with the slow-
and fast-fading channel simulators. Therefore, the fast-fading simulations
cannot be approximated by slow-fading.
Figure 3.14: Channel estimation BER performance at 30km/h
Figure 3.15: Channel estimation BER performance at 60km/h
Figure 3.16: Channel estimation BER performance at 120km/h
3.4.4 Adaptive Time-Domain Channel Estimation
An adaptive estimator that tracks the time-varying channel response is investigated
in order to reduce the bit error level that was previously observed in the non-
adaptive approach.
The structure of the estimator is shown in Figure 3.17.
Figure 3.17: Structure of an adaptive time-domain channel estimator
In the non-adaptive estimator, only pilot tones of the current OFDM symbol
are used to obtain the channel frequency response. This leads to noise-corrupted
estimates and hence the increased BER. An adaptive process is employed to track
the time-varying channel response. In this project, a simple Least Mean Squares
(LMS) adaptive filter [Haykin, 2002] is used because of its low computational
requirements [Veiverys et al., 2004].
This approach corresponds to the identification class of adaptive filtering
[Haykin, 2002]. The adaptive filter is a linear model that provides an approximation
to the channel. The radio channel supplies a desired response to the adaptive
filter. The desired response is used during adaptation. The equalizer operates on
the adaptive channel estimate so that the cascade of the channel and the equalizer
provides an approximation to an ideal transmission medium.
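The identification configuration can be illustrated with a generic complex LMS filter. The filter structure, step size and training signal used in the thesis are not given in this section, so everything below (the function name `lms_identify`, the two-tap channel, the step size and the periodic training sequence) is an assumption chosen to keep the sketch short and deterministic.

```python
def lms_identify(x, d, n_taps, mu):
    """Complex LMS: adapt w so that sum_k w[k] * x[n-k] tracks d[n]."""
    w = [0j] * n_taps
    for n in range(len(x)):
        # Most recent n_taps inputs, newest first, zeros before the start.
        u = [x[n - k] if n - k >= 0 else 0j for k in range(n_taps)]
        y = sum(w_k * u_k for w_k, u_k in zip(w, u))   # filter output
        e = d[n] - y                                    # error vs desired response
        w = [w_k + mu * e * u_k.conjugate() for w_k, u_k in zip(w, u)]
    return w

# Toy example: the "radio channel" is an unknown 2-tap FIR filter.
channel = [0.8 + 0j, 0.3j]
x = [complex(v) for v in (1, 1, -1)] * 200              # known training input
d = [sum(channel[k] * x[n - k] for k in range(2) if n - k >= 0)
     for n in range(len(x))]                            # desired response
w = lms_identify(x, d, n_taps=2, mu=0.05)
```

In this noiseless, static case the weights converge to the channel taps. A smaller step size `mu` gives smoother estimates but slower tracking, which is the trade-off behind the tracking error reported at high speeds.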
3.4.5 Performance of the Adaptive Channel Estimation Algorithm
The performance of the adaptive channel estimator is shown in Figures 3.14, 3.15
and 3.16 together with the non-adaptive estimator results.
It can be seen that the adaptive estimator provides better BER performance
than the non-adaptive estimator. The performance loss is approximately 1.5 dB
against a 3 dB loss in the case of the non-adaptive estimator. However, when a high
speed of movement is simulated (Figure 3.16), the performance drops because of
the tracking error in the adaptive filter.
Further investigation of adaptive algorithms other than LMS
is required to achieve better tracking of the fast fading channel. However,
adaptive algorithms that provide better channel tracking have higher
computational complexity.
It was found that the adaptive channel estimation gives better BER
performance than the non-adaptive approach. Nevertheless, the analysis
and performance comparison of various adaptive filter algorithms is not continued
further in the project as it is not the primary goal.
3.5 Summary
In this chapter, the performance of the OFDM system simulator is analyzed.
The parameters assigned to the OFDM system correspond to the IEEE 802.16
(WiMAX) standard.
The first step is to start with a simple AWGN channel model. It is shown
that the simulator based on the analytical model presented in Chapter 2 matches
its theoretical performance. Rayleigh and Jakes channel models, which approach
reality in that order, are presented and analyzed.
Finally, a time-domain OFDM channel estimation algorithm is presented. The
performance is measured through simulations and improved by using an adaptive
algorithm. While analyzing the performance of the channel estimator, it is shown
that a slow-fading Jakes channel model is not a sufficient approximation for a
high mobility WiMAX system.
CHAPTER 4
IMPLEMENTATION OF THE OFDM
SIMULATOR
During the analytical part of the project, several channel models for the OFDM
system simulator were written in MATLAB. The Jakes multipath fast fading
simulator (Section 3.3.2) has high computational requirements, because its gain
coefficients have to be updated during every OFDM sample. Independent banks of
oscillators are used for each channel path, making the simulator computationally
complex. In the following part of the project, methods for improving the speed of
the simulator are investigated.
In order to fulfill the goal of speed improvement, design space exploration is
performed throughout the chapter. The design space includes all suitable design
tools, methodologies, architectures and optimization strategies for the algorithm
to architecture transformation. The solutions best suited to the particular
case are selected from the design space for every step of the design process.
This chapter starts with a survey of hardware/software (HW/SW) co-design
methodologies. A generic development flow is described in order to show what
steps are common in many of the HW/SW co-design methodologies found in the
literature. The various solutions for each of the steps are briefly presented.
A complexity analysis of the OFDM simulator algorithm is performed. It gives
suggestions for the HW/SW partitioning and platform selection. Selection of a
hardware platform is performed before formulating the project methodology for
the implementation part. Choosing the platform also implies the implementation
activities and at least some of the development tools to be used. After the decision
about the platform is taken, a more precise activity flow of implementation is
planned and described in detail.
Finally, the system is implemented, and the results are obtained and discussed.
4.1 Hardware/Software Co-Design
In this section, hardware/software methodologies are presented. The Rugby meta
model specifies the domains and abstraction levels that are used to describe
methodologies. Next, a generic co-design flow is presented. It is further expanded by
presenting various solutions and tools for the design steps that are available in the
market or found in the literature.
4.1.1 Rugby Meta Model
The Rugby meta model is a model of design processes for complex systems re-
quiring concurrent processes and mixed hardware/software implementation. The
Rugby model has four domains, namely, Computation, Communication, Data and
Time [Jantsch et al, 2000].
The four domains of the Rugby meta model allow analysis of different aspects
of the model independently at different levels of abstraction. As shown in Figure
4.1, [Jantsch et al, 2000], the abstraction level of the model decreases as the design
stages proceed. The design process starts with an idea and finishes with a
low-level description of the hardware part and compiled software. An important
property of the meta model is that it is designed for HW/SW systems. Therefore,
each domain is split into hardware and software abstractions at the point where
HW/SW co-design begins, as seen in Figure 4.2.
The design description abstractions are shown in Figure 4.2. Different
abstractions are required for hardware and software descriptions. For example,
the lowest software description is source code compiled to its target
instruction set, whereas hardware can go further to the transistor level.
The Rugby meta model makes it possible to identify design aspects and abstraction
levels while defining a HW/SW co-design methodology. The Rugby conceptual
framework relates design phases from requirements to implementation. It aids in
identifying and categorizing the problems in the design process.
In Section 4.1.2, a conventional approach to HW/SW co-design is described
and is mapped to the Rugby meta model.
4.1.2 Generic Co-Design Flow
In this section, a generic HW/SW co-design flow is presented. The purpose is to
identify the various steps of the co-design process so that a methodology for the
implementation part of this project can be formulated.
Figure 4.1: Rugby meta model
Figure 4.2: Domains of the Rugby model
Hardware/software co-design can be defined as the cooperative design of
hardware (Application Specific Integrated Circuits (ASICs), Application Specific
Instruction-set Processors (ASIPs), Field Programmable Gate Arrays (FPGAs), etc.) and
software (executed on programmable processors such as General Purpose
Processors (GPPs), Digital Signal Processors (DSPs), etc.). Co-design research deals
with the problems related to the development of heterogeneous architectures
[Niemann, 1998].
According to [Niemann, 1998], heterogeneous systems can be classified as:
- Multi-chip systems
- Single-chip systems
A multi-chip system is integrated on one or more boards containing
components like ASICs, FPGAs, GPPs, ASIPs, etc. A single-chip system, also called
a System on a Chip (SoC), combines multiple "soft" or "hard" processor cores,
ASICs, DSPs, memories and so on. Processor and micro-controller Intellectual
Property (IP) cores are provided as component libraries for ASIC or FPGA
development tools. The criteria for choosing between a single- and a multi-chip system
are performance, reliability, power consumption, manufacturing cost, flexibility
and others.
A generic approach to hardware/software co-design is depicted in Figure 4.3.
Figure 4.3: Generic co-design flow
The system specification gives a description of the system model using a
high-level language. Also, implementation requirements are provided in the form
of computational or memory usage constraints, performance constraints such as
frames per second, data rate, energy consumption, etc.
The system specification corresponds to the initial stage of the Rugby model.
In the time domain, time is expressed in terms of performance constraints,
communication is defined by data relationships, computation is given by the overall
algorithm and the data domain is specified by abstract data types.
In order to proceed with the implementation in hardware, the system
specification is expressed by an internal behavioral description that is usually in terms
of a task graph. It expresses the system design in the form of a graph containing
system functions and processes. This step corresponds to the Rugby meta-model
in the following way: the specification is divided into separate tasks, expressed by
system functions in the computation domain. Then, the timing abstraction is lowered
by expressing it as causality between tasks.
System partitioning deals with:
- partitioning the functionality to be implemented into grains that interact
with each other.
- obtaining an optimum grain size, where a grain is defined according to
[Ge, Yun, 1996] as a set of program steps or instructions that have to be
executed sequentially by a single processor.
The partitioning introduces parallelism into the system. At this stage the
parallelism is abstract, because it is not yet clear which tasks are going to execute
concurrently; the tasks can, however, be divided into concurrent processes. This
relates to a lower abstraction level of the computation domain. In the Rugby
model, partitioning is the step when the system is split into separate concurrent
processes and communication between the processes is defined.
Allocation is the process of assigning the grains obtained after partitioning
to processing elements available on the architecture. A processing element (PE)
is an entity that implements one or more processes depending on its complexity.
ASICs, ASIPs, DSPs, FPGAs and general purpose microprocessors are all different
types of PEs with different abilities to deal with varying levels of complexity.
The allocation phase results in sets of grains that are to be implemented as the
hardware and software parts of the system. After the allocation, the development
divides into two branches, software and hardware (see Figure 4.2).
Interface synthesis is required to implement the system on a heterogeneous
target architecture; it enables communication and synchronization between
the different PEs on the architecture.
Scheduling is performed once all the resources are allocated. The goal of
scheduling is to produce a design with the best possible performance in terms of
speed of execution or limited resource usage, [Gajski, Ramachandran, 1994]. The
software descriptive elements are compiled for their respective processor
instruction sets. The parts of the system that are allocated onto hardware processing
elements are synthesized from their corresponding hardware descriptions. The
time domain is expressed in clocked and physical time form. Inter-node
communications are synthesized into buses between processing units or data flow topology
inside the PEs. The abstraction for data types is lowered by redefining the exact data
format in terms of bits or analog values.
In the final HW/SW integration step, the hardware description blocks are
integrated together with the software and interface elements.
In each phase of the flow depicted in Figure 4.3, a verification flow runs
in parallel. At the system specification level, the functionality that is written
in a high-level language is verified. The performance of the partitioned and
allocated system is verified against various requirements and constraints, such as
timing, area, clock rate, energy, etc. Once the partition is set and the hardware and
software architectures are determined, the focus moves to verifying their behaviors.
The hardware and software designs are implemented and verified separately
along with the interface between them. Again, verification takes place on the system
level after the HW/SW integration step. If all the performance constraints are met
and the cost of the design is acceptable, the co-design process stops. Otherwise,
the process is repeated from the system partitioning step to optimize the design
until a satisfactory system implementation is found.
4.1.3 HW/SW Co-Design Literature Survey
Extensive research has been performed on issues related to
hardware/software co-design. The issues start from system specification and end with
the implementation on an architecture. In this section, a literature survey is
performed in order to explore various tools and methods to pursue the intermediate
steps depicted in Figure 4.3.
Various tools have emerged to address the hardware/software co-design
problem. [Niemann, 1998] has an account of such tools, namely Cosyma, Lycos, Mickey,
Tosca, Vulcan, Chinook, Cosmos, CoWare, Polis and SpecSyn. The present
section gives a brief account of some of the tools. It is emphasized that a
thorough investigation of the tools is not performed in this project.
Cosyma (CO-SYnthesis for eMbedded Architectures) [Cosyma, 1998] can
generate single processor systems with application specific coprocessors and user
defined peripherals. Cosyma first translates the algorithmic description to an
internal behavioural description. Secondly, a run time analysis is performed on
it. A hardware/software partitioner based on simulated annealing [Reeves, 1995]
partitions the program into hardware and software parts. The hardware part is
synthesized using a high level synthesis tool. The software part is compiled with
a C compiler. Finally, a co-simulation is performed using the compiled software
part and the timing information of the hardware part which is supplied by the high
level synthesis.
Lycos [Madsen et al, 1997] addresses the target architecture consisting of a
single CPU and a single ASIC communicating through memory-mapped I/O. The
initial specification of the system is in VHDL or C. Automatic translation into an
internal representation is performed with some limitations. Automatic partitioning
is performed using an algorithm called PACE. The register transfer level and down
to the final layout rely on commercial design tools.
Mickey [Mitra, Basu, 1997] addresses hardware/software co-design issues
related to microprocessor-based systems. The system is specified using SpeX, an
augmented version of the Statecharts and Speccharts languages. Mickey refines
the specification by a set of decomposition rules into a control and data flow graph
of primitive functions which are described in the design library. After selecting
the primitive functions, they are allocated and scheduled on hardware and software
parts. Interface, SW and HW design steps are then performed for implementing
the system.
Chinook [Chou et al, 1995] emphasizes system module interface and
synchronization. The system is used for real-time reactive controllers initially
specified in Verilog. Real-time reactive controllers are control dominated and are
designed to react to external events. Therefore, they are often called reactive
systems, [Niemann, 1998]. Hardware/software partitioning is performed manually by
the designer. It does not provide code generation tools for the target processors,
but uses standard C compilers. Chinook synthesizes the hardware and software
needed for inter-process communication.
DK Suite [Celoxica, 2005 (4)] is a hardware/software co-design environment.
The system is specified using Handel-C, a C-based language. Facilities for both
manual and automated partitioning are available. DK Suite is described in more
detail in Section 4.4.1.
AccelChip [AccelChip, 2005] accepts a system specified in MATLAB code that
uses floating-point arithmetic. After the system is verified manually using
simulations, the tool automatically generates fixed-point MATLAB code. The tool
also generates synthesizable RTL code that is used for implementation in
hardware. All the issues related to hardware/software co-design as handled in the
tools described previously are not dealt with by AccelChip. However, it achieves a
rather fast implementation of a functionality in hardware. The implementation is
fast because the designer does not have to convert the description in MATLAB into
any other language for hardware synthesis. An independent evaluation of the tool
could not be found in the literature. The only information available is from the
articles provided by the vendor.
A system specification may be provided using the tool-specific languages
mentioned previously, in addition to STATEMATE [I-Logix, 2005], MatrixX
[NI, 2005], MATLAB [MathWorks, 2005], COSSAP [Synopsys, 2005],
SPW [CoWare, 2005], etc. The information about the target architecture helps
in making some decisions while modeling a system of interest. For example, an
architecture that supports only fixed-point arithmetic forces the designer to pursue
floating- to fixed-point conversions.
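A floating- to fixed-point conversion of a single value can be sketched as below. The word length, the number of fractional bits, the rounding mode and the saturation behaviour are all assumptions chosen for illustration; they are not the formats selected later in the thesis.

```python
def to_fixed(x, frac_bits, word_bits):
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    return max(lowest, min(highest, scaled))

def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a floating-point value."""
    return q / (1 << frac_bits)

# Assumed format: 16-bit word with 15 fractional bits.
q = to_fixed(0.7071, frac_bits=15, word_bits=16)
error = abs(to_float(q, 15) - 0.7071)     # quantization error of this value
saturated = to_fixed(1.0, 15, 16)         # +1.0 does not fit and is clipped
```

The quantization error is bounded by half of the least significant bit as long as the value does not saturate, which is the kind of precision analysis a fixed-point target forces on the designer.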
Task graphs can be generated manually or automatically. Automatic
generation spares the designer from manually sketching the task graphs that tend to
explode in size with increase in the complexity of an application. [Walker, 1991]
made a survey about high-level synthesis systems that make use of different forms
of task graphs to represent the behavioral description of an application. Task
graphs can be in the form of directed acyclic precedence graphs (DAPG)
[Koch, 1996], data flow graphs (DFG) [Groot, 1990], sequential graphs
[Micheli, 1994], hierarchical control and data flow graphs (HCDFG)
[Le Moullec et al, 2002], etc. Every tool that generates task graphs automatically
requires the behavioral description to be specified in a particular language. For
example, a tool called Design Trotter (DT) [Le Moullec et al, 2002], generates
HCDFG automatically from C code. The input to any task graph generation tool
has to respect the restrictions placed by the tool. For example, DT requires
that the C code is written without using pointers.
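As a small illustration of a task graph in the precedence-graph sense, the sketch below stores each task with the set of tasks it depends on and derives one valid execution order. The task names are hypothetical and only loosely follow the receiver chain of the simulator; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the tasks it depends on.
task_graph = {
    "modulate": set(),
    "ifft": {"modulate"},
    "channel_gains": set(),
    "convolve": {"ifft", "channel_gains"},
    "fft": {"convolve"},
    "estimate": {"fft"},
}

# Any topological order respects the precedence constraints of the graph.
order = list(TopologicalSorter(task_graph).static_order())
position = {task: i for i, task in enumerate(order)}
```

Tasks that have no path between them, such as `ifft` and `channel_gains` in this example, are candidates for concurrent execution.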
Partitioning the system functionality into grains has been studied by many
researchers. The grain size depends on the requirements of the functionality to
be implemented. On one hand, if the grain size is too large, parallelism is limited.
On the other hand, if the grain size is too small, communication delays reduce
performance, [Krauatrachue, Lewis, 1988]. Also, excessively fine graining would
decrease the understandability of a system as its functionality is distributed onto
many processing elements available on the architecture.
Various methods have been proposed in order to solve the grain size problem.
[Krauatrachue, Lewis, 1988] and [Ge, Yun, 1996] suggest grain packing and
2D compression respectively. They claim that
their corresponding automatic tools give optimal grain entities without user
intervention about the trade-offs between parallelism and communication overhead.
Each method has its own requirements and assumptions regarding the type of task
graph, underlying architecture, connectivity between different parts of the archi-
tecture, and completion of tasks assigned to different parts of the architecture.
The grain size decision also affects the clock speed, which might be a
performance measure for some applications. A mention of clock speed in relation
to partitioning can be found in [Frank, 1996]. In a coarse-grain decision, an
entity needs more time for execution because it has comparatively more tasks to do
than in a fine-grain decision. The clock period has to be greater than or equal to the
execution time of the entity that takes the longest time to execute.
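The relation between grain size and clock period can be shown with a short arithmetic sketch. The task names and execution times are invented for illustration, and the sketch assumes that a grain's execution time is the sum of the times of the tasks packed into it.

```python
# Hypothetical task execution times in nanoseconds.
task_time = {"a": 4, "b": 3, "c": 5, "d": 2}

def clock_period(grains):
    """Minimum clock period: the execution time of the slowest grain."""
    return max(sum(task_time[t] for t in grain) for grain in grains)

fine = [["a"], ["b"], ["c"], ["d"]]     # one task per grain
coarse = [["a", "b"], ["c", "d"]]       # two tasks packed into each grain

fine_period = clock_period(fine)
coarse_period = clock_period(coarse)
```

Under these assumed numbers the coarse partition needs a longer clock period than the fine one, while the fine partition has more grains that must communicate.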
Allocation is an NP (non-deterministic polynomial, [Reeves, 1995]) problem
and a variety of heuristics are proposed in the literature to solve it. The
application of a genetic algorithm to allocation is described in [Ouaiss et al, 1997].
[Cosyma, 1998] uses simulated annealing for allocation.
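To give a feeling for how simulated annealing can be applied to allocation, the following is a generic sketch and not the Cosyma algorithm. Each grain is either in software or in hardware; the cost function (total execution time plus a penalty for exceeding a hardware area budget), the cooling schedule and all names and numbers are assumptions.

```python
import math
import random

def cost(assign, grains, area_budget, penalty=1000.0):
    """Total execution time plus a penalty for exceeding the area budget."""
    time = sum(g["hw_time"] if on_hw else g["sw_time"]
               for g, on_hw in zip(grains, assign))
    area = sum(g["hw_area"] for g, on_hw in zip(grains, assign) if on_hw)
    return time + penalty * max(0, area - area_budget)

def anneal(grains, area_budget, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    current = [False] * len(grains)            # start with everything in software
    current_cost = cost(current, grains, area_budget)
    best, best_cost = current[:], current_cost
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        candidate = current[:]
        i = rng.randrange(len(grains))
        candidate[i] = not candidate[i]        # move one grain between HW and SW
        cand_cost = cost(candidate, grains, area_budget)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost

# Hypothetical grains: software time, hardware time and hardware area.
grains = [
    {"sw_time": 90, "hw_time": 10, "hw_area": 6},
    {"sw_time": 20, "hw_time": 15, "hw_area": 5},
    {"sw_time": 40, "hw_time": 8, "hw_area": 4},
]
best, best_cost = anneal(grains, area_budget=10)
```

The occasional acceptance of worse moves is what lets the search leave local minima; a genetic algorithm, as in [Ouaiss et al, 1997], explores the same space with a population of assignments instead.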
Scheduling can be either resource-constrained or time-constrained. After
allocation is performed, where different grains are assigned to different PEs, there can
be two different goals for a scheduling algorithm. If the goal is to minimize the
number of functional units used for a fixed number of control steps (clock cycles), it
is called a time-constrained approach. On the other hand, if the goal is to minimize
the number of control steps for a given number of resources (functional units,
storage units, etc.), it is called a resource-constrained approach. The approaches of
Integer Linear Programming (ILP), Force-Directed heuristic and iterative refinement
for time-constrained scheduling are described in [Gajski, Ramachandran, 1994].
Resource-constrained scheduling can be performed using the list-based method and
static-list scheduling, [Gajski, Ramachandran, 1994].
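A simplified resource-constrained list scheduler is sketched below. It assumes unit-time operations and interchangeable functional units (no distinction between multipliers and adders), which is a simplification of the methods in [Gajski, Ramachandran, 1994]; the function name and the example operation graph, one complex multiplication, are illustrative.

```python
def list_schedule(deps, n_units, priority=None):
    """Resource-constrained list scheduling with unit-time operations.

    deps maps each operation to the set of operations it depends on.
    Returns a list of control steps, each a list of operations.
    """
    priority = priority or {}
    done, steps = set(), []
    remaining = set(deps)
    while remaining:
        # Operations whose predecessors have all finished, best priority first.
        ready = sorted((op for op in remaining if deps[op] <= done),
                       key=lambda op: (-priority.get(op, 0), op))
        if not ready:
            raise ValueError("dependency cycle in the operation graph")
        step = ready[:n_units]          # at most n_units operations per step
        steps.append(step)
        done.update(step)
        remaining.difference_update(step)
    return steps

# One complex multiplication: four real multiplications, then two additions.
deps = {
    "m1": set(), "m2": set(), "m3": set(), "m4": set(),
    "re": {"m1", "m2"},
    "im": {"m3", "m4"},
}
two_units = list_schedule(deps, n_units=2)
four_units = list_schedule(deps, n_units=4)
```

With two units the example needs three control steps, and with four units it needs two, which illustrates the trade-off between the number of resources and the number of control steps.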
4.2 Algorithm Analysis
Initially, the MATLAB simulator is targeted to run on a PC with 64-bit
floating-point arithmetic. The area of general purpose microprocessors is high compared
to processors specifically targeted for DSP applications. The execution time is
also high because of the overhead of the operating system, the MATLAB interpreter and
other tasks. During the design steps, the updated position in the Time-Area plane
will be depicted with small figures in the margin.
The process of speeding up the OFDM simulator starts with analysis of the
algorithm. The analysis is performed using the two following approaches:
- MATLAB profiling;
- Manual estimation of algorithm complexity.
As already mentioned in Section 3.3.2, MATLAB profiling showed that
approximately 90% of the computation time is spent on the Jakes channel
simulation and the convolution of the channel with the transmitted signal. However,
the profiling does not give enough input to make design decisions. It only shows
which parts of the system run slower than the others and are therefore candidates
for rewriting or offloading to a coprocessor. In addition, due to latency between
the system boundaries and interfaces, it is desirable to minimize the data flow
between the hardware and software, [Xcell, 2005].
To get more detailed estimates of the computational complexity of the overall
OFDM simulator, a manual analysis is performed. The nature of the operations
and the rate of data exchanged between various steps of the algorithm are taken
into account. The option of calculating manual estimates is chosen because of
the lack of proper tools for this task. One tool that promises such analysis is Design
Trotter [Le Moullec et al, 2002], but it only takes plain C code and places a number
of restrictions on it. Experience with DT showed that it is an
academic tool in an early stage of development. It was therefore decided to perform the
MATLAB code estimation manually in this step.
For the manual estimation, the whole OFDM simulator is split into grains. The
grain size corresponds to the steps of the algorithm explained in the analytical part of
the thesis. Complexity, data and operation types are analyzed for each grain (number
and types of operations during simulation of a single OFDM symbol transmission).
Data rates between the grains are also calculated. Global variables, such
as constant arrays used in some of the functions, are not taken into account.
The control flow inside the different grains is not taken into account either. This is
chosen for two reasons. First, most of the blocks do not include conditional
execution (except for decision), only matrix/vector operations that are expressed
as fixed loop operations and are therefore subject to loop optimization. Second,
calculating the control flow manually for the whole simulator is an infeasible task
because various matrix operations are built-in MATLAB functions, or black boxes.
Performing the control flow estimation would require detailed analysis of the
implementation of these operations. The results of manual estimation are shown in Figure
4.4.
It has to be noted that the profiling and the manual estimation give results of
different types. It is not easy to estimate run time for each function manually, but it
is feasible to estimate the amount of data and operation types for a small amount
of source code. As already mentioned, the profiler tool in MATLAB produces a
statement of processor time, but does not consider communication costs.
Figure 4.4: Manual OFDM simulator complexity estimates
The manual analysis confirmed that most of the computational complexity is
concentrated in the channel simulator. It also shows that the communication data
rate is higher inside the simulator than at its boundaries: the transmitted
and received OFDM symbols are complex time-domain signals, whereas inside
the channel simulator a number of independent paths are simulated.
We decided to offload the channel simulator to a hardware accelerator because
it contains most of the computational complexity, as shown by both manual
estimation and MATLAB code profiling.
When choosing a suitable platform, the nature of the operations also has to be
considered. As can be observed from Figure 4.4, most of the operations in the
channel simulator are arithmetic, namely addition and multiplication of complex
numbers. It is also found that the multiplications and additions in the channel
convolution part can be grouped together into Multiply-and-ACcumulate (MAC)
operations, which can be used to speed up execution on certain architectures.
Also, there are no conditional execution branches in the channel simulator
source code. However, it is likely that they are hidden in the built-in MATLAB
functions and may appear when the simulator is implemented in a lower-level
language. During the manual analysis it is also found that the code is suitable
for parallel execution. A more detailed exploration of parallelism is presented
later in Section 4.5.5.
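The grouping into MAC operations can be illustrated with a minimal Python sketch. It is not taken from the simulator source; the function names and the way the delayed samples are passed in are assumptions made for the example. It shows one output sample of the channel convolution as a chain of complex MACs, together with the real-valued operation count that such a chain implies.

```python
def channel_output_sample(taps, delayed_samples):
    """One output sample of a tapped-delay-line channel as a chain of MACs.

    taps[l] is the complex gain of path l and delayed_samples[l] is the
    transmitted sample seen by that path (both are illustrative inputs).
    """
    acc = 0j
    for h, x in zip(taps, delayed_samples):
        acc += h * x  # one complex multiply-and-accumulate
    return acc


def real_operation_count(num_paths):
    """Real-valued operations behind num_paths complex MACs."""
    # (a+jb)(c+jd) = (ac-bd) + j(ad+bc): 4 real multiplies, 2 real add/sub.
    # Accumulating the complex product costs 2 further real additions.
    return {"real_mul": 4 * num_paths, "real_add": 4 * num_paths}
```

Each loop iteration maps onto one MAC unit invocation per real product, which is why architectures with dedicated MAC hardware suit this part of the algorithm.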
4.3 Platform Selection
In this section, a platform for the hardware implementation is chosen. The
possible candidates are DSP, FPGA and ASIC solutions.
A DSP can be considered a suitable architecture when the functionality
to be implemented has a substantial number of signal processing operations.
Common signal processing tasks, such as FIR filters, dot products or FFTs, make
heavy use of multiply-and-add or multiply-and-accumulate (MAC) operations. DSPs
include one or more MAC units combined with facilities for fast operand fetching
from memory [Eyre, Bier, 2000].
FPGAs and ASICs are generally faster than DSPs. ASICs run as fast as the
technology allows. Similarly, according to [Mentor Graphics, 2003], an FPGA that
has been optimized to perform a digital-signal-processing task can run anywhere
from 10 times to more than 1000 times faster than a single DSP chip. Whereas a
DSP typically employs serial processing, the parallelism inherent to ASIC and
FPGA architectures gives them both a significant edge over DSPs
[Mentor Graphics, 2003].
There are several other factors, such as power consumption, fixed costs and
sampling rate, in which FPGAs and ASICs score over DSPs. However, they are not
among the project requirements and are not explored further.
Though ASICs provide better speed than both DSPs and FPGAs, they are not
flexible and involve high NRE (non-recurring engineering) costs, and so are not
viable in our academic environment. An FPGA for digital signal processing offers
a wide range of customization options in a chip without the silicon
physical-design work required for an ASIC solution. Performance-wise, an
optimized FPGA chip can run at speeds on par with its ASIC counterpart
[Mentor Graphics, 2003].
We decided to look for an architecture involving an FPGA because the algorithm
is suitable for parallel implementation, as found in the previous section.
Also, the majority of the operations are not control-oriented. This kind of
algorithm is well suited to FPGA implementation, as stated in
[Hunt Engineering, 2004].
The Lyrtech SignalMaster, described in Appendix E, was considered because of
its availability in the laboratory. It was acquired in the year 2000, and the
software supplied by Lyrtech is compatible only with very old MATLAB versions.
Though the old software versions were obtained after a long delay, the expected
MATLAB co-simulation examples did not work. Finally, it was decided to
discontinue the efforts with the SignalMaster because of the lack of time and
the risk of failing to set up the development tools.
After abandoning the SignalMaster, the Xilinx Virtex 4 FX, Virtex II Pro and
Virtex II FPGAs were investigated, as they are advertised to be the fastest
FPGAs for signal processing and communication applications available on the
present-day market. A comparison table for these chips can be found in
Appendix G. The RC203 development board from Celoxica, which contains a Xilinx
Virtex II FPGA, is chosen for implementing the channel model. A summary of the
board's features is given in Appendix F.
4.4 Methodology for Channel Model Implementation
4.4.1 The DK Methodology
The DK Design Suite enables the designer to enter system descriptions in a
high-level programming language such as Handel-C, and to simulate and debug the
code using an integrated development environment (IDE). The DK Suite includes
the Data Streaming Manager (DSM) and the Platform Abstraction Layer (PAL), which
facilitate the development of Handel-C applications and the migration of
software from microprocessor implementations to FPGAs. PAL provides a consistent
API (Application Programming Interface) through which Handel-C applications can
access hardware I/O and other features. The DSM is an integration mechanism for
software applications executing in processors and functions in FPGA hardware.
The design flow for the algorithm to architecture mapping, according to the A^3
domain using DK Suite, is shown in Figure 4.5 [Celoxica, 2005 (4)]. The
investigations in this thesis include only the shaded portion of the figure and
are explained below.
Figure 4.5: Design ow with DK Suite
The program code can be simulated and debugged in a cycle-accurate simula-
tor. It allows implementation to be tested without a real FPGA. Execution speed
can be estimated with the simulator, as it gives information about the clock cy-
cles used. Before placement of the design, the simulator allows the designer to
experiment with different optimization strategies until the design goals are met.
The DK EDIF compiler allows the user to perform technology mapping of
general logic into device-specific logic blocks. The Electronic Design
Interchange Format (EDIF) output of the DK compiler is a device-specific
netlist, which uses logic gates to describe the design. The Xilinx Place/Route
tools perform technology mapping, which translates the gate-level netlist into a
device-specific bit file. The DK Suite generates a report of the number of LUTs,
flip-flops and memory bits synthesized for each line of Handel-C code. The Logic
Estimation Tool, which is a part of the DK Suite, provides a logic area and
depth summary from a pre-place-and-route estimation based upon the design and
the target device specifications.
4.4.2 Handel-C
According to [Page, 2002 (1)], the traditional design of a chip starts with a
reference design written in a high-level programming language. It is manually
rewritten as a description of an electronic circuit. Hardware Compilation, the
process of turning a program into hardware, aims to achieve the same result but
without the rewriting step. In this way, designing the two major components of
electronic products, namely hardware and software, becomes almost the same
process. The designer's program is compiled to a circuit, passed through a
Place/Route tool that turns it into a set of bits, and loaded into a static RAM
configuration memory inside the FPGA, where it executes. This bit stream
defines:
- the functionality of each of the logic elements;
- the internal routing between the logic elements.
The Hardware Compilation Research Group that Prof. Ian Page founded at
Oxford University defined the semantics and syntax of the Handel-C programming
language. DK Suite, which is the core of the commercial offer from Celoxica,
is based upon Handel-C. Handel-C is the C language retrofitted with the
Handel model of space and time. The C language is first stripped of the few
features that make it incompatible with the Handel model, such as side-effecting
expressions. C was chosen because it is a widely known and accepted language
[Page, 2002 (1)].
The underlying methodology behind Handel-C is called Handel. The goal of
this methodology is to describe functionality and algorithms in a language,
rather than to describe hardware. In contrast, hardware description languages
are usable only by hardware engineers and are at too low a level of abstraction
to cope with complex designs.
Though it is possible to describe an algorithm in a popular programming
language such as C or C++ and compile it directly to hardware, it is not
sensible, as these languages have sequential semantics. Direct hardware
compilation from a sequential language would probably be no faster than software
running on a processor, and the major advantage of hardware implementation would
not be realized [Page, 2002 (1)]. Also, according to [Celoxica, 2005 (3)], there
are fundamental limits to the extraction of concurrency from sequential
algorithms, and heuristics-based approaches have been unsuccessful for all but
the most naturally pipelined algorithms. Even if a compiler that extracts
parallelism automatically from a sequential description of an algorithm is
employed, it may fail or perform poorly on that particular algorithm
[Page, 2002 (2)]. To avoid this scenario, Handel-C makes it possible for the
designer to indicate parallelism explicitly. Handel-C also gives control of time
to the designer. With control at an appropriate level of abstraction over both
space (parallelism) and time, the designer can discharge the obligations that a
compiler is not capable of handling [Page, 2002 (2)].
The essential features of the Handel methodology, according to
[Page, 2002 (2)], are:
- the par construct, which allows the designer to indicate that two or more
  computations will execute in parallel;
- the single clock assignment rule, which states that an assignment statement
  will always complete in a single clock cycle and that nothing but assignment
  and delay statements will take any additional clock cycles to execute;
- the channel element, which connects two parallel processes and allows data
  exchange between them;
- a bit width inference system, which allows variables, channels and expressions
  to have an arbitrary bit width associated with them; the compiler determines
  and sets this width whenever the designer has not set it explicitly.
4.4.3 Followed Methodology
The DK Suite environment is chosen for the FPGA implementation because of its
compatibility with the Celoxica RC203 development board.
The flow of activities as originally planned is shown in Figure 4.6.
The system is specified in MATLAB. After performing MATLAB-based profiling
and manual analysis of the algorithm complexity in Section 4.2, the most
computationally expensive part of the OFDM system is allocated to hardware
(FPGA) and the rest of the system is allocated to software (MATLAB). The
communication between hardware and software is achieved by a HW/SW interface.
The hardware part is transformed from MATLAB into C code. The C code is tested
and refined until it exactly mimics the functionality given by the MATLAB
specification. The C code, which is sequential in nature, is rewritten in
Handel-C (DK Suite), introducing parallelism to the code. The Logic Estimator
gives the estimated number of gates that would be used in the FPGA when the
Handel-C code is synthesized and placed/routed. The goal for the Handel-C code
is to maximize the use of parallelism while keeping the number of gates required
below the number of gates available on the FPGA.
Figure 4.6: Activities of the FPGA implementation
Though DK Suite has a capability for hardware/software partitioning using the
Data Streaming Manager, it was decided not to use it, as it requires the whole
system description to be provided in the C language.
After the Handel-C code is optimized, it is synthesized into a bit file using
the Place/Route tool provided by the FPGA vendor. Place/Route is the process of
implementing the design on the target silicon.
Finally, the hardware and software are integrated and the implementation is
tested for its adherence to the system specification. In this project, priority
is given to speed improvement while keeping the functionality intact.
However, not all of the steps were performed. Some of them were changed or
removed, as explained in the following sections.
4.5 Channel Model Implementation
4.5.1 Partitioning and Allocation
As already described in Section 4.2, the simulator is split into hardware and
software parts by offloading the Jakes channel simulator and the convolution of
the transmitted signal with the channel to hardware. The rest of the simulator,
namely the transmitter, the receiver and the part of the channel simulator that
is responsible for adding White Gaussian Noise to the signal, continue to be
MATLAB models. From a usability point of view, this allocation has the following
advantages:
- SNR control is left entirely to the software part, making its adjustment
  flexible;
- the transceiver system is decoupled from the channel model and can therefore
  be developed separately;
- the most computationally intensive parts of the simulator are implemented
  in hardware.
Two interface ports are present on both the Celoxica RC203 board and the PC
system: parallel and Ethernet. The parallel port is commonly used to download
a bit file to the FPGA or the on-board Flash memory, but it is not fast enough
for co-simulation purposes. The transmitted and received data rates are
identical (see Figure 4.4); therefore, the overall maximum available bandwidth
of 150 KB/s would be divided by two, resulting in a maximum of 75 KB/s each for
uplink and downlink (assuming that the port is fully utilized). Each transmitted
and received sample is a complex number with 38 bits in total (see page 98), so
it is only possible to send and receive a maximum of 15,000 OFDM samples per
second, even if every clock cycle were dedicated to parallel port operations.
Since communication should take as few clock cycles as possible (thereby
leaving more cycles for computation), it is inferred that the data rate of the
parallel port does not allow a simulation speed significantly above that of
PC-only simulations. Consequently, the parallel port is used only for FPGA
programming, not for data transfer.
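The throughput figure can be reproduced with a short calculation. The Python sketch below is only an illustration: it assumes that 1 KB means 1000 bytes and that each 38-bit sample is padded to a whole number of bytes (5 bytes), neither of which is stated explicitly in the text. The function name is hypothetical.

```python
import math


def max_samples_per_second(link_bytes_per_s, bits_per_sample):
    """Upper bound on samples per second over a byte-oriented link."""
    # Assumption: a sample occupies a whole number of bytes on the link.
    bytes_per_sample = math.ceil(bits_per_sample / 8)
    return link_bytes_per_s // bytes_per_sample


total_bandwidth = 150_000              # bytes/s, taking 1 KB = 1000 bytes
per_direction = total_bandwidth // 2   # uplink and downlink share the port
print(max_samples_per_second(per_direction, 38))
```

Under these assumptions the bound is 75,000 / 5 = 15,000 samples per second in each direction, in line with the figure quoted in the text.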
The 10/100 Mbit Ethernet port provides enough throughput, but unlike the
parallel port, it has nondeterministic timing (due to the possibility of
lost or corrupt network packets, handshaking, etc.) and a complicated API. In
order to reduce the transmission delays, packet dropping and corruption
problems, it is decided to connect the development board directly to a PC
(without intermediate routing or switching equipment).
A block diagram of the system is shown in Figure 4.7.
Figure 4.7: Block diagram of the simulator
4.5.2 Hardware/Software Implementation Testing
The channel simulator algorithm is written in the C language. An interface
between MATLAB and C is implemented with binary files for data transfer. From a
higher abstraction level, the files and the Ethernet port driver provide similar
functionality: they pass data from one part of the simulator to another. The
test setup is shown in Figure 4.8.
The Jakes simulator contains sine and cosine functions that are implemented
with look-up tables in order to reduce computational complexity.
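A table-based sine and cosine can be sketched as follows. This is a minimal Python illustration, not the thesis implementation: the table size of 1024 entries, the nearest-entry look-up and the function names are all assumptions made for the example.

```python
import math

TABLE_SIZE = 1024  # hypothetical; the table size used in the thesis is not stated
SINE_TABLE = [math.sin(2 * math.pi * k / TABLE_SIZE) for k in range(TABLE_SIZE)]


def lut_sin(angle):
    """Nearest-entry table look-up of sin(angle), with angle in radians."""
    index = round(angle / (2 * math.pi) * TABLE_SIZE) % TABLE_SIZE
    return SINE_TABLE[index]


def lut_cos(angle):
    """cos(x) = sin(x + pi/2), so a single table serves both functions."""
    return lut_sin(angle + math.pi / 2)
```

With nearest-entry look-up the argument error is at most half a table step, so the absolute error of the result is bounded by pi / TABLE_SIZE, about 0.003 for the size assumed here. A larger table or interpolation between entries would trade memory or arithmetic for accuracy.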
The MATLAB simulator is modified to output its transmitted OFDM samples
to a file. Received samples, produced with the MATLAB channel simulator, are
written to another file. They are considered to be the expected output of
the hardware implementation. The C program (the same channel simulator as in
MATLAB) is executed and its results are compared with the expected results from
MATLAB. This setup allowed us to debug and finally bring the C implementation
to a level where no errors could be found in a reasonable test run time.
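The file-based comparison can be pictured with the following Python sketch. The file layout (interleaved little-endian doubles for the real and imaginary parts) and all function names are assumptions made for illustration; the actual binary format used between MATLAB and C is not described in the text.

```python
import struct


def write_samples(path, samples):
    """Store complex samples as interleaved little-endian doubles (re, im)."""
    with open(path, "wb") as f:
        for s in samples:
            f.write(struct.pack("<2d", s.real, s.imag))


def read_samples(path):
    """Read back a file written by write_samples."""
    with open(path, "rb") as f:
        data = f.read()
    values = struct.unpack("<%dd" % (len(data) // 8), data)
    return [complex(re, im) for re, im in zip(values[0::2], values[1::2])]


def max_abs_error(expected, actual):
    """Largest magnitude of the sample-by-sample difference."""
    if len(expected) != len(actual):
        raise ValueError("sample counts differ")
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)
```

In such a setup the expected samples would come from the reference model and the actual samples from the implementation under test, and a run would pass when the maximum error stays below a chosen tolerance.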
Figure 4.8: Block diagram of HW/SW implementation test setup
4.5.3 Fixed-point MATLAB
In order to proceed with the FPGA implementation of the channel simulator,
fixed-point simulations are performed in MATLAB. The reason for doing this is
that floating-point operations are expensive to generate in hardware in terms of
area, power and performance. Therefore, the floating-point operations are
usually transformed into fixed-point for hardware implementation [Jussel, 2005].
The fixed-point simulations in MATLAB are performed in the following way:
1. A reference model is required. It is the unmodified MATLAB model with
   floating-point arithmetic;
2. The reference model is duplicated. The second copy is converted to
   fixed-point arithmetic by using objects from the MATLAB Fixed-Point Toolbox
   [MathWorks, 2005]. The objects provide the designer with automatic estimates
   of variable precision and additionally allow him/her to fine-tune various
   precision parameters;
3. Identical conditions have to be provided for both models in order to be able
   to compare the outputs, e.g., random variables must be changed to predefined
   variables or be driven by the same random process. Simulations are performed
   with the floating-point and fixed-point models. The error between the
   outputs is analyzed. The fixed-point precision is adjusted and the
   simulations are repeated until the designer is satisfied with the error
   level;
4. The number of integer and fractional bits is defined after a set of
   simulations. This setup is used in the later fixed-precision hardware
   implementation.
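The loop described in steps 3 and 4 can be sketched without the Fixed-Point Toolbox. The following Python fragment is only an illustration under stated assumptions: values are rounded to the nearest representable number and saturated, the integer bit count includes the sign bit, and the test signal and the swept bit widths are arbitrary choices, not the ones used in the thesis.

```python
import math


def quantize(x, int_bits, frac_bits):
    """Round x to a signed fixed-point grid and saturate at the range limits.

    Assumption: int_bits includes the sign bit (two's complement).
    """
    scale = 1 << frac_bits
    lo = -(1 << (int_bits - 1 + frac_bits))
    hi = (1 << (int_bits - 1 + frac_bits)) - 1
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


def max_quantization_error(values, int_bits, frac_bits):
    """Largest difference between the reference and its fixed-point copy."""
    return max(abs(v - quantize(v, int_bits, frac_bits)) for v in values)


# Sweep the fraction width on an arbitrary reference signal.
signal = [math.cos(0.01 * k) for k in range(1000)]
for frac_bits in (7, 11, 15):
    print(frac_bits, max_quantization_error(signal, 4, frac_bits))
```

In a real study the error would be measured at the simulator output rather than on a single variable, since rounding errors accumulate through the arithmetic.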
The first fixed-point simulations showed that 40-bit fixed-point arithmetic is
required. According to [Özer et al, 2003], bit-width reduction causes
significant reductions in area and power consumption. Potentially, it may also
decrease the clock period by shortening the wire lengths and reducing the sizes
of the circuits. As shown in an example in [Traquair, 2005], the number of gates
used in an FPGA implementation increases almost fourfold when the bit count is
doubled. The maximum clock rate of the FPGA is also limited when arithmetic with
a high bit count is employed.
Instead of pursuing an implementation using 40-bit fixed-point arithmetic, it is
decided to optimize the Jakes simulator.
4.5.4 Jakes Model Optimization
The purpose of the following optimization is to transform the Jakes simulator
algorithm in order to reduce the required precision. After the optimization, the
area of the design is decreased as a result of the lower precision requirement.
The optimization covers two domains of the Rugby model, namely computation and
data. In the computation domain, the executable model is changed: on a higher
abstraction level the simulator provides results identical to the original
model, but the lower-abstraction implementation is different. In the data
domain, the number of bits required is reduced, which affects the model's data
types.
From Section 3.3.2, the Jakes simulator is written as:

$$g(t) = g_1(t) + j\,g_2(t) \tag{4.1}$$

$$g_1(t) = \frac{2}{\sqrt{N_{osc}}} \sum_{n=1}^{M+1} u_n \cos(\omega_n t + \theta_n) \tag{4.2}$$

$$g_2(t) = \frac{2}{\sqrt{N_{osc}}} \sum_{n=1}^{M+1} v_n \sin(\omega_n t + \theta_n) \tag{4.3}$$

where
$g$: normalized low-pass process of the Jakes model;
$g_1$: in-phase component of the Jakes model;
$g_2$: quadrature component of the Jakes model;
$M$: reduced number of oscillators;
$\theta$: initial phase associated with a propagation path (random variable,
uniformly distributed over $[-\pi, \pi]$).

Different attenuations $A_p$ and random initial phases $\theta$ are assigned to
the paths (hence the two-dimensional phase matrix $\theta_{n,l}$), see
Section 3.3.3. From this point on we denote the multipath Jakes process as $g$.
The path attenuation is included in the in-phase and quadrature components of
the Jakes simulator:

$$g(t, l) = g_1(t, l) + j\,g_2(t, l) \tag{4.4}$$

$$g_1(t, l) = \frac{2}{\sqrt{N_{osc}}} \sqrt{\frac{A_p}{2}} \sum_{n=1}^{M+1} u_n \cos(\omega_n t + \theta_{n,l}) \tag{4.5}$$

$$g_2(t, l) = \frac{2}{\sqrt{N_{osc}}} \sqrt{\frac{A_p}{2}} \sum_{n=1}^{M+1} v_n \sin(\omega_n t + \theta_{n,l}) \tag{4.6}$$

where
$l$: path index;
$\sqrt{A_p / 2}$: gain adjustment for the $p$th path.
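The sine and cosine functions in Equations 4.2 and 4.3 are expanded as:

$$\sin(\alpha + \beta) = \sin\alpha \cos\beta + \cos\alpha \sin\beta \tag{4.8}$$

$$\cos(\alpha + \beta) = \cos\alpha \cos\beta - \sin\alpha \sin\beta \tag{4.9}$$

After moving the path gain adjustment $\sqrt{A_p / 2}$ inside the $g_1(t, l)$
and $g_2(t, l)$ equations, we obtain:

$$g_1(t, l) = \sum_{n=1}^{M+1} \sqrt{\frac{2 A_p}{N_{osc}}}\, u_n \cos(\omega_n t + \theta_{n,l}) \tag{4.10}$$

$$g_2(t, l) = \sum_{n=1}^{M+1} \sqrt{\frac{2 A_p}{N_{osc}}}\, v_n \sin(\omega_n t + \theta_{n,l}) \tag{4.11}$$

After expanding according to 4.8 and 4.9,

$$g_1(t, l) = \sum_{n=1}^{M+1} \Big[ \cos(\omega_n t) \underbrace{\sqrt{\frac{2 A_p}{N_{osc}}}\, u_n \cos(\theta_{n,l})}_{P1} - \sin(\omega_n t) \underbrace{\sqrt{\frac{2 A_p}{N_{osc}}}\, u_n \sin(\theta_{n,l})}_{P2} \Big] \tag{4.12}$$

$$g_2(t, l) = \sum_{n=1}^{M+1} \Big[ \sin(\omega_n t) \underbrace{\sqrt{\frac{2 A_p}{N_{osc}}}\, v_n \cos(\theta_{n,l})}_{P3} + \cos(\omega_n t) \underbrace{\sqrt{\frac{2 A_p}{N_{osc}}}\, v_n \sin(\theta_{n,l})}_{P4} \Big] \tag{4.13}$$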
The calculation of the four terms P1, P2, P3 and P4 in the equations above can
be replaced with pre-calculated matrices of size [M x L], where L is the number
of paths, since $u$, $v$, $\theta$ and the path attenuations are known in
advance and do not change during the simulation.
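The equivalence between the direct form (Equations 4.10 and 4.11) and the expanded form with pre-calculated matrices (Equations 4.12 and 4.13) can be checked numerically. The Python sketch below is an illustration only: the function names are hypothetical, `gain[l]` stands for the factor sqrt(2*A_p/N_osc) of path l, and the complex result assumes that g1 is the real part and g2 the imaginary part of g.

```python
import math


def precompute(u, v, theta, gain):
    """Build P1..P4, each indexed [n][l], from quantities fixed before the run."""
    rows, cols = len(u), len(gain)
    P1 = [[gain[l] * u[n] * math.cos(theta[n][l]) for l in range(cols)] for n in range(rows)]
    P2 = [[gain[l] * u[n] * math.sin(theta[n][l]) for l in range(cols)] for n in range(rows)]
    P3 = [[gain[l] * v[n] * math.cos(theta[n][l]) for l in range(cols)] for n in range(rows)]
    P4 = [[gain[l] * v[n] * math.sin(theta[n][l]) for l in range(cols)] for n in range(rows)]
    return P1, P2, P3, P4


def g_expanded(wt, l, P):
    """Expanded form: only cos/sin of wt[n] are evaluated at run time."""
    P1, P2, P3, P4 = P
    re = sum(math.cos(a) * P1[n][l] - math.sin(a) * P2[n][l] for n, a in enumerate(wt))
    im = sum(math.sin(a) * P3[n][l] + math.cos(a) * P4[n][l] for n, a in enumerate(wt))
    return complex(re, im)


def g_direct(wt, l, u, v, theta, gain):
    """Direct form, evaluated without the trigonometric expansion."""
    re = sum(gain[l] * u[n] * math.cos(a + theta[n][l]) for n, a in enumerate(wt))
    im = sum(gain[l] * v[n] * math.sin(a + theta[n][l]) for n, a in enumerate(wt))
    return complex(re, im)
```

Here `wt[n]` holds the current value of the product of oscillator frequency and time. The expanded form needs two trigonometric evaluations per oscillator regardless of the number of paths, whereas the direct form needs two per oscillator and path.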
Finally, $\omega_n t$ requires a large number of bits because $\omega_n$ may
contain large values when the Doppler frequency is high. The simulation time
$t$ is a small number, requiring a large number of fraction bits. However, the
nature of $t$ can be exploited by replacing the multiplication $\omega_n t$ with
the addition of fixed-point numbers of smaller bit count. We know that $t = 0$
at the start of the simulation and that the time then advances by a fixed
sample time. Therefore, another vector that contains the current $\omega_n t$
values is introduced. The vector is initialized with zeros and increased by
$\omega_n T_s$ (a constant vector requiring fewer precision bits) at each
simulated sample of the transmitted signal. The periodicity of the sine and
cosine functions is also exploited by subtracting $2\pi$ from the elements of
the vector whenever $2\pi$ is exceeded. This guarantees that the sine/cosine
function arguments never exceed $2\pi$. The number of required integer bits for
the arguments can now be safely reduced to 3 ($2^3 = 8 > 2\pi$).
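The phase accumulation can be sketched in a few lines of Python. The function name and the list-based state are illustrative choices; the sketch also assumes that each per-sample increment is smaller than 2*pi, so that a single subtraction is enough to bring the phase back into range.

```python
import math

TWO_PI = 2 * math.pi


def advance_phases(wt, increments):
    """Add the per-sample increment w[n]*Ts to every phase and wrap at 2*pi.

    Assumption: 0 <= increments[n] < 2*pi, so one subtraction suffices.
    """
    for n, step in enumerate(increments):
        wt[n] += step
        if wt[n] >= TWO_PI:
            wt[n] -= TWO_PI
    return wt
```

Because sine and cosine are periodic, the wrapped phase gives the same function values as the unwrapped product of frequency and time, while the stored value always stays in the interval from 0 to 2*pi.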
This approach to time requires rewriting the channel simulator so that all the
operations are performed for the current sample only. MATLAB simulations be-
come slower because the fast built-in vector operations are no longer used. For
implementation in C language this is not an issue because all vector operations
are written and optimized manually.
The following step is to perform fixed-point MATLAB simulations for the
optimized code. This showed that signed fixed-point numbers with 4 integer
and 15 fraction bits provide a precision close to that of the floating-point
simulations, as depicted in Figure 4.9. Further increasing the number of
fractional bits does not change the error between fixed- and floating-point
simulations significantly.
To conclude, the rearrangement of the Jakes simulator significantly reduces
the required precision and makes an FPGA implementation feasible.
4.5.5 Handel-C Implementation
After the C code is tested to produce output identical to that of MATLAB and the
fixed-point simulations are complete (the required fixed-point precision is
known), Handel-C code development is initiated. The purpose of the C to Handel-C
conversion is not only to introduce fixed-point arithmetic with specific word
widths to the code, but also to specify the sections of code that should execute
in parallel, so that the execution time is decreased.
Figure 4.9: Fixed-point simulation results
Memory Usage Optimization
At the very start of the transformation of the C code to Handel-C, it is found
that the algorithm requires a high gate count. This is because of the buffers
for the transmitted and received signals. If implemented as register arrays,
they alone consume about 50% of all the available gates in the FPGA, as shown by
the DK Suite Logic Estimation Tool.
The buffers can be stored in an external or internal RAM. Turning the buffers
into RAM arrays has the drawback that RAM can be accessed only once per clock
cycle, which creates restrictions for maximum achievable parallelism. The chan-
nel simulator algorithm is optimized in terms of memory usage and tested again.
The area usage is reduced as a result of this optimization.
The buffers are replaced with a "sample in - sample out" approach. Only a
few small buffers are maintained inside the FPGA; therefore, they can be
implemented as fast register arrays. Buffering of the transmitted and received
signals is left to the on-board Ethernet device. The final overall channel
simulator algorithm is given in Figure 4.10.
The algorithm runs in a loop, where a new input (one sample from the
transmitted OFDM symbol) is taken at the start. The simulation of L Jakes paths
is performed together with the channel convolution with the transmitted signal
in lines 4-11 of Figure 4.10. A circular buffer, large enough to hold $N_{ds}$
OFDM samples, is maintained. In line 7, a call to the function g is made. The
implementation of this function is given in Figure 4.11. It returns a complex
attenuation coefficient for a given path l. The function g is a realization of
Equations 4.4, 4.12 and 4.13.
When the OFDM sample transmission through all channel paths is complete,
the result is sent back to the software part of the simulator (line 12). Finally,
the simulation time is increased by one OFDM sample time $T_s$ (lines 14-18).
The explanation for such time management is given in Section 4.5.4.
 1  begin
 2    while ContinueSimulation do
 3      buffer[index[0]] ← sentSample;
 4      l ← 0;
 5      recSample ← 0;
 6      while l < L do
 7        recSample ← recSample + buffer[index[l]] * g(l, wt);
 8        index[l] ← index[l] + 1;
 9        if index[l] >= L then
10          index[l] ← 0;
11        l ← l + 1;
12      returnResult(recSample);
13      n ← 0;
14      while n <= M do
15        wt[n] ← wt[n] + (w[n] * T_s);
16        if wt[n] >= 2π then
17          wt[n] ← wt[n] - 2π;
18        n ← n + 1;
19  end
Figure 4.10: Channel simulator algorithm
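The "sample in - sample out" idea can be illustrated with a small Python class. This is a simplified reading and not a transcription of Figure 4.10: it keeps a single write position and per-path delay offsets instead of the per-path index array of the figure, and the class name, the delay values and the way the path gains are supplied are all assumptions made for the example.

```python
class SampleChannel:
    """Sample-in, sample-out tapped delay line over a circular buffer."""

    def __init__(self, delays):
        # delays[l]: delay of path l in samples (hypothetical values).
        self.delays = list(delays)
        self.size = max(self.delays) + 1   # buffer just long enough
        self.buffer = [0j] * self.size
        self.head = 0                      # where the next sample is written

    def step(self, sent_sample, gains):
        """Consume one transmitted sample and return one received sample."""
        self.buffer[self.head] = sent_sample
        received = 0j
        for delay, gain in zip(self.delays, gains):
            # Read the sample sent `delay` steps ago and weight it.
            received += self.buffer[(self.head - delay) % self.size] * gain
        self.head = (self.head + 1) % self.size
        return received
```

For example, with delays of 0 and 2 samples and gains of 1 and 0.5, a unit impulse at the input produces 1 at the first output and 0.5 two samples later. In the simulator the gains would be the time-varying complex coefficients returned by g for each path.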
The optimized Jakes algorithm is given in Figure 4.11.
begin
  Real(res) ← 0;
  Imag(res) ← 0;
  n ← 0;
  while n <= M do
    t1 ← wt[n];
    t2 ← cos(t1);
    t3 ← sin(t1);
    t4 ← t2 * P1[n][l];
    t5 ← t3 * P2[n][l];
    t6 ← t3 * P3[n][l];
    t7 ← t2 * P4[n][l];
    Real(res) ← Real(res) + t4 - t5;
    Imag(res) ← Imag(res) + t6 + t7;
    n ← n + 1;
  returnResult(res);
end
Figure 4.11: Optimized Jakes algorithm
Hardware-Software Interface
A wrong assumption about the interface between the RC203 development board
and a PC was made while designing the system. A significant effort is required
to build the interface because the Ethernet controller available on the board
provides only Data Link level communications. The following possible solutions
were considered:
- finding a TCP/IP protocol stack implementation for the FPGA, or
- performing Data Link layer communications from MATLAB.
Considering the scope of the project, it is infeasible to implement either of
these two possibilities. The Celoxica RC200/203 hardware and PSL Reference
Manual was found slightly misleading, because it does not give any information
on the Open Systems Interconnection (OSI) layers. Therefore, the data packets
that are mentioned in the document were mistakenly treated as TCP/IP packets
while designing the system. Also, the lack of experience with such Ethernet con-
trollers played a major role. The wrong assumption was noticed only later, while
writing the Handel-C code.
Finally, it was decided not to implement the link between the PC and the
development board. Instead, the focus was placed on an efficient implementation
of the hardware part of the algorithm. The cost of communication with the PC is
no longer considered. This is justified because PCI-card FPGA co-processor
accelerators, such as the Celoxica RC2000, exist. They provide a fast link
between the PC chipset and the FPGA (such as the 528 MBps (64-bit, 66 MHz) PCI
bus in the case of the RC2000). Therefore, our approach to the hardware
accelerator implementation is still valid.
Parallelism and Scheduling
We decided to complete the implementation part by obtaining and analyzing
estimates of the speed improvement in the hardware part when trade-offs between
resource usage (area) and parallelism (speed) are made. These estimates include:
- manual estimates of clock cycles per OFDM sample transmission simulation;
- gate, CLB and memory usage estimates from the DK Logic Estimation Tool;
- simulated number of clock cycles (to verify the correctness of the manual
  estimates).
Two extreme cases are considered - no parallelism and full achievable par-
allelism. The level of parallelism is increased by adding inline functions and
par{ ... } statements to an initially sequential code. Function inlining in
itself does not introduce parallelism, but allows more than one instance of the
function to be synthesized and to be executed in parallel. Inlining and parallel
execution increases the area and decreases the time.
In order to explore the trade-offs between inherent parallelism and resource
utilization in the source code, a Data Flow Graph (DFG) of the Jakes simulator
is constructed. The DFG is shown in Figure 4.12. A resource-constrained schedule
(see Section 4.1.3) is derived from the DFG and is shown in Figure 4.13. One
multiplier and one ALU functional unit are assumed as the resource constraints.
Resource-constrained scheduling produces a schedule that fits a certain
architecture. However, not all of the inherent parallelism may be exploited. It
can be seen from Figure 4.13 that the execution takes place in nine control
steps.
Figure 4.12: Data Flow Graph of the Jakes simulator
As the main goal of the implementation in this thesis is to reduce the execution
time, a time-constrained schedule (see Section 4.1.3) is derived from the initial
DFG and shown in Figure 4.14. It is found that the minimum execution time is
achieved when four multipliers and two ALUs are employed. Compared to the
resource-constrained schedule, the area usage is higher. The execution time is
reduced to six control steps.
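The effect of the resource constraint on the schedule length can be illustrated with a simple list scheduler. The Python sketch below uses a hypothetical six-node graph (four multiplications feeding one subtraction and one addition), not the DFG of Figure 4.12, so it does not reproduce the nine and six control steps reported above. It also assumes unit latency for every operation, at least one unit for every operation type and an acyclic graph.

```python
def list_schedule(ops, deps, units):
    """Resource-constrained list scheduling with unit-latency operations.

    ops:   {name: unit_type}
    deps:  {name: [predecessor names]}
    units: {unit_type: number of units available per control step}
    Returns {name: control step}, with steps counted from 1.
    """
    step_of = {}
    step = 0
    while len(step_of) < len(ops):
        step += 1
        free = dict(units)
        for name, kind in ops.items():
            if name in step_of:
                continue
            # Ready only if every predecessor finished in an earlier step.
            ready = all(step_of.get(p, step) < step for p in deps.get(name, []))
            if ready and free.get(kind, 0) > 0:
                step_of[name] = step
                free[kind] -= 1
    return step_of


# Hypothetical graph: four products, combined pairwise by an ALU.
ops = {"m1": "MUL", "m2": "MUL", "m3": "MUL", "m4": "MUL", "sub": "ALU", "add": "ALU"}
deps = {"sub": ["m1", "m2"], "add": ["m3", "m4"]}

constrained = list_schedule(ops, deps, {"MUL": 1, "ALU": 1})
unconstrained = list_schedule(ops, deps, {"MUL": 4, "ALU": 2})
print(max(constrained.values()), max(unconstrained.values()))
```

For this small graph the schedule takes five control steps with one multiplier and one ALU, and two control steps with four multipliers and two ALUs. Giving the scheduler enough units to remove all resource conflicts yields the as-soon-as-possible schedule, which is the shortest one the data dependencies allow.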
A parallel version of the Handel-C program is written by including the par-
allelism of the time-constrained schedule. The sequential and parallel Handel-C
source code that are used during the investigation are given in Appendices C and D.
Figure 4.13: Resource-constrained schedule
Time and Resource Estimation
Figure 4.14: Time-constrained schedule
First, the number of FPGA clock cycles used for channel simulation per OFDM
sample is counted and accumulated manually. The following rules are used:
- an assignment operator takes one clock cycle;
- the instructions following a par{ ... } block do not execute until all
  branches of the parallel block are complete.
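These two rules amount to summing the costs of sequential statements and taking the maximum over the branches of a parallel block. A minimal Python sketch of such a manual estimate is given below. The tree representation and the decomposition of a complex multiplication into six assignments are assumptions made for illustration; the actual code in the appendices may be organized differently.

```python
def clocks(block):
    """Clock cycles of a nested ("seq" | "par", [children]) tree.

    A leaf is an int giving its own cost in clock cycles.
    """
    if isinstance(block, int):
        return block
    kind, children = block
    costs = [clocks(child) for child in children]
    # Sequential statements add up; a par block waits for its slowest branch.
    return sum(costs) if kind == "seq" else max(costs, default=0)


ASSIGN = 1  # single clock assignment rule

# Hypothetical complex multiply: four products, then one subtraction and
# one addition, each stored by its own assignment.
cmul_sequential = ("seq", [ASSIGN] * 6)
cmul_parallel = ("seq", [("par", [ASSIGN] * 4), ("par", [ASSIGN] * 2)])

print(clocks(cmul_sequential), clocks(cmul_parallel))
```

Under this decomposition the estimate is six clock cycles for the sequential version and two for the parallel one, which agrees with the CMul() row of Table 4.1.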
Next, the execution of the source code is simulated. The number of clock
cycles used by each function of the programs is counted step by step in the
simulator. The results are shown in Table 4.1 (Clocks Man in the table denotes
the manual estimates, Clocks Sim the simulated values). The manual estimation is
performed only once for each version of the code, in order to avoid any possible
bias during the analysis. An accurate manual prediction of the timing of the
Handel-C code is found to be possible.
An important result is that by specifying parallelism in the Handel-C code, the
number of clock cycles required to simulate a transmission of one OFDM sample
can be reduced by 60%.
In order to acquire estimates of the number of NAND gates and of the memory
and flip-flop usage, both versions of the Handel-C code are compiled to Electronic
Design Interchange Format (EDIF) with DK Suite. The compiler produces
a report of the resource usage for each line of the source code. The total
resource usage is given in Table 4.1.

Function      Metric        Implementation                         Difference
                            Min. parallelism   Max. parallelism
Sin()         Clocks Man    6                  5                   17%
              Clocks Sim    6                  5                   17%
Cos()         Clocks Man    7                  6                   14%
              Clocks Sim    7                  6                   14%
CircIndex()   Clocks Man    1                  1                   -
              Clocks Sim    1                  1                   -
InitJakes()   Clocks Man    12                 1                   92%
              Clocks Sim    12                 1                   92%
JSample()     Clocks Man    465                190                 59%
              Clocks Sim    486                195                 60%
CMul()        Clocks Man    6                  2                   67%
              Clocks Sim    6                  2                   67%
CAdd()        Clocks Man    2                  1                   50%
              Clocks Sim    2                  1                   50%
CAssign()     Clocks Man    2                  1                   50%
              Clocks Sim    2                  1                   50%
Main()        Clocks Man    3,089              1,271               57%
(All code)    Clocks Sim    3,122              1,246               60%
              Memory        12,448             12,448              -
              NAND Gates    545,993            631,770             14%
              Flip Flops    1,176              1,176               -
Table 4.1: Costs of FPGA implementation

The number of memory bits and flip-flops (registers) is the same for the
sequential and parallel versions. This is expected, as the same global and local
variables are used in both versions. However, the number of NAND gates is higher
in the case of parallel execution, as more hardware is synthesized than for the
sequential version.
The final step towards FPGA implementation is to Place/Route the EDIF
representation of the design. The Xilinx ISE 7.1i toolkit is used. The Place/Route tool
gives a report about the timing, power, area, functional unit usage, etc. As the
goal is to speed up the simulator, effort is spent only on reducing the clock period
while ensuring that the limitations specific to the FPGA chip are fulfilled.
Place/Route is performed for the EDIF files obtained from DK Suite. Both
sequential and parallel Handel-C code versions result in a maximum FPGA clock
frequency of 50 MHz (20 ns clock period). The detailed results are given in
Appendices C and D.
The maximum clock frequency depends on the critical path of the design.
Handel-C semantics guarantee that an assignment takes a single clock cycle.
Therefore, assignments of complex expressions are synthesized to equally complex logic
that must complete within one clock cycle. As the logic depth increases, the clock
period has to be increased in order to accommodate the complex logic. A single
complex operation (the critical path) may decrease the performance of the whole
system because the clock period is determined by the critical path. By
simplifying or pipelining complex operations, the performance of the system can
be improved. In our case, the report of the DK Logic Estimation Tool showed that
the critical path is in the fixed-point library. In order to reduce the clock period,
the Celoxica fixed-point library would have to be replaced with a pipelined version.
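The relation between critical path and clock frequency can be sketched numerically. The example below is a simplified model under stated assumptions: the function name `max_clock_mhz` and the stage delays of the pipelined case are hypothetical, and only the 20 ns single-stage figure corresponds to the 50 MHz result reported above.

```python
def max_clock_mhz(stage_delays_ns):
    """Maximum clock frequency in MHz for a design whose register-to-register
    stages have the given delays in nanoseconds.

    The clock period must cover the slowest stage, i.e. the critical path.
    """
    if not stage_delays_ns:
        raise ValueError("at least one stage delay is required")
    return 1000.0 / max(stage_delays_ns)


if __name__ == "__main__":
    # One long combinational path of 20 ns limits the clock to 50 MHz.
    print(max_clock_mhz([20.0]))
    # Hypothetical split of the same logic into two pipeline stages; the
    # delays include an assumed register overhead and are illustrative only.
    print(max_clock_mhz([11.0, 10.0]))
```

Pipelining raises the achievable clock rate only if the longest remaining stage is shorter than the original path, and it adds latency in clock cycles, so the net benefit depends on how well the pipeline can be kept full.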
Finally, the simulation speed is compared between the FPGA/Handel-C (estimated)
and PC/MATLAB (measured) simulators. The comparison is given in Table 4.2.
A speedup factor of 5.5 over a 2.4 GHz CPU (see Section 3.3.4) is achieved
by using an FPGA running at only 50 MHz. If the difference in the clock
frequencies is taken into account, the per-clock-cycle efficiency of the FPGA
implementation is $\frac{2{,}400\,\mathrm{MHz}}{50\,\mathrm{MHz}} \cdot 5.5 = 264$
times higher than that of the PC.
                          Sequential                                  Parallel
FPGA clock rate           50 MHz                                      50 MHz
Clocks per OFDM sample    3,122                                       1,246
OFDM samples/s, FPGA      $\frac{50 \cdot 10^6}{3{,}122} = 16{,}000$  $\frac{50 \cdot 10^6}{1{,}246} = 41{,}100$
OFDM samples/s, PC        7,500                                       7,500
Speedup                   2.13                                        5.48
Table 4.2: Speedup estimation results
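The quantities in Table 4.2 follow from three simple ratios, which the sketch below evaluates. The function names are hypothetical; the input figures (50 MHz, 3,122 and 1,246 clocks, 7,500 samples/s, 2.4 GHz) are taken from the text. Evaluated directly, the two throughput quotients come out at roughly 16,000 and 40,100 samples/s.

```python
def samples_per_second(clock_hz, clocks_per_sample):
    """Throughput of the FPGA simulator in OFDM samples per second."""
    return clock_hz / clocks_per_sample


def speedup(fpga_rate, pc_rate):
    """Ratio of FPGA throughput to PC throughput."""
    return fpga_rate / pc_rate


def per_clock_efficiency(speedup_factor, cpu_hz, fpga_hz):
    """Speedup normalised by the ratio of the two clock frequencies."""
    return speedup_factor * cpu_hz / fpga_hz


if __name__ == "__main__":
    pc_rate = 7500.0
    for label, clocks in (("sequential", 3122), ("parallel", 1246)):
        rate = samples_per_second(50e6, clocks)
        print(label, round(rate), round(speedup(rate, pc_rate), 2))
    print(per_clock_efficiency(5.5, 2.4e9, 50e6))
```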
It is not completely fair to compare only the raw performance of a
general-purpose CPU and an FPGA, because:
• present-day FPGA technology does not achieve clock frequencies comparable
to those of CPUs;
• MATLAB operates on its native 64-bit floating-point numbers, whereas the
FPGA implementation uses 19-bit fixed-point arithmetic in our case;
• the MATLAB simulator is a mixture of translated and interpreted programs;
therefore, the overhead of the interpreter has to be taken into account;
• MATLAB simulations are performed in a multi-tasking operating system
environment and are constantly interrupted by other concurrent processes;
• significantly more effort is spent on optimizing the Handel-C code than on
the MATLAB simulator.
4.6 Design Options and Decisions
The overview of the options and decisions that were made during the design
process is shown in Figure 4.15. The dark path shows the design flow decisions. The
light lines show the options that were considered but not followed.
Figure 4.15: Options and decisions in the HW/SW co-design process
4.7 Summary
This chapter presents the characterization of various steps involved in a traditional
co-design flow according to the Rugby meta-model.
A literature study of the various steps involved in hardware/software co-design
is performed. A methodology is chosen for the Implementation part of the thesis
based on the literature study.
An analysis of the underlying complexity of the OFDM simulator algorithm is
performed. The algorithm is divided into grains and partitioned between hardware
and software. The partitioning decision is supported with an evaluation of the
computational complexity, operation nature and data rates of different grains of
the algorithm.
A brief account of various factors that are considered while choosing an archi-
tecture for the implementation is also given.
The Jakes channel simulator is optimized in order to reduce the high bit count
required for fixed-point arithmetic. An initial requirement of 40-bit fixed-point
arithmetic is found. It is reduced to 19 bits by optimizing the Jakes
simulator algorithm. This significantly reduces the chip area and allows achieving
higher clock rates.
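The effect of the word length on precision can be illustrated with a simple quantizer. This is a sketch under stated assumptions only: the function `to_fixed` is hypothetical, and the split of the 19-bit and 40-bit words into 3 integer bits (including sign) and 16 or 37 fractional bits is assumed for illustration, since the actual formats are not restated here.

```python
def to_fixed(x, frac_bits, total_bits):
    """Quantize x to a signed two's-complement fixed-point value.

    The result is returned as a float so it can be compared with x.
    Values outside the representable range saturate.
    """
    scale = 1 << frac_bits
    raw = round(x * scale)
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    raw = max(lowest, min(highest, raw))
    return raw / scale


if __name__ == "__main__":
    x = 0.70710678
    for total_bits, frac_bits in ((19, 16), (40, 37)):  # assumed formats
        q = to_fixed(x, frac_bits, total_bits)
        print(total_bits, q, abs(q - x))
```

For in-range values the rounding error is bounded by half of the least significant bit, that is $2^{-(\text{frac\_bits}+1)}$, so every bit removed from the fractional part doubles the worst-case error. This is why an algorithm has to be restructured, not merely truncated, before a shorter word length keeps the functionality intact.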
Finally, the channel simulator is implemented using the Handel-C programming
language. Cycle-accurate simulations and FPGA synthesis show that it is
possible to reduce the channel simulation time by a factor of 5.5 compared to
simulation on an average PC.
CHAPTER 5
CONCLUSIONS
The overall goals of the project were to:
• study and analyze a generic OFDM downlink system;
• build a generic OFDM downlink simulator in MATLAB by following a specific
methodology;
• analyze channel modeling and estimation techniques;
• decrease the execution time of the simulator;
• explore and apply hardware-software co-design methodologies.
To achieve these goals, we divided the work in the thesis into two
parts: 1) the analytical part and 2) the implementation part. The conclusions
for the two parts are presented in this chapter.
5.1 The Analytical Part
The thesis started with a study of various issues related to communication systems
and OFDM. The analytical part provides a mathematical description of a generic
OFDM transceiver. A low complexity MATLAB simulator was written to test the
transceiver.
The low complexity MATLAB simulator does not represent all the issues that
are encountered in a real OFDM system. It is assumed that a perfect channel
exists between the transmitter and the receiver. However, such a channel does not exist
in reality. In order to make the transceiver usable in realistic environments,
various mathematical channel models are considered. Sophisticated channel
estimation and equalization techniques are required at the receiver when more realistic
channel models are used.
We followed an iterative approach while building the simulator, as advocated
by the XP methodology that was used in the analytical part. As the
understanding of the issues regarding channel modeling and estimation improved,
several iterations of the simulator were derived.
The first channel model considered only additive white Gaussian noise (page
34). It does not require equalization because multipath effects, present in a
realistic channel, do not occur. The performance of the OFDM system was measured
with simulations. It matched the theoretical performance and therefore gave us
confidence in the model.
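The kind of check described above, comparing a simulated bit error rate with the theoretical one, can be sketched as follows. The example assumes BPSK with unit bit energy, which is a choice made here for illustration and not necessarily the modulation used in the thesis; the function names and the fixed random seed are likewise hypothetical.

```python
import math
import random


def ber_theory_bpsk(ebn0_db):
    """Theoretical BPSK bit error probability over AWGN: Q(sqrt(2 Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))


def ber_simulated_bpsk(ebn0_db, num_bits, seed=0):
    """Monte Carlo estimate of the BPSK bit error rate over AWGN."""
    rng = random.Random(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebn0))  # noise std. dev. for Eb = 1
    errors = 0
    for _ in range(num_bits):
        bit = rng.getrandbits(1)
        tx = 1.0 if bit else -1.0
        rx = tx + rng.gauss(0.0, sigma)
        if (rx > 0.0) != bool(bit):
            errors += 1
    return errors / num_bits


if __name__ == "__main__":
    for ebn0_db in (0, 2, 4):
        print(ebn0_db, ber_theory_bpsk(ebn0_db), ber_simulated_bpsk(ebn0_db, 20000))
```

With enough simulated bits the two values should agree within the statistical spread of the estimate; a systematic gap would point to an error in the simulator or in the noise scaling.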
Further, a multipath Rayleigh channel was investigated (page 38). The channel
effects are written as a multiplication of the transmitted symbols with channel
matrices, the number of which depends on the maximum excess delay of the channel.
Therefore, the analytical model of the channel is not constrained to a certain
maximum excess delay. Our channel model is more accommodating than many
other channel models proposed in the literature. It allows a particular OFDM
symbol to contain ISI from more than one previous OFDM symbol. In addition,
fast-fading channels can be modeled. The use of the cyclic prefix in combating the
effects of ISI was investigated. It was shown by the simulation results that the
performance of the OFDM transceiver deteriorates when the maximum excess delay
of the channel exceeds the cyclic prefix length.
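The role of the cyclic prefix length can be illustrated on a single symbol. The sketch below is a simplified illustration, not the matrix model of the thesis: it uses a static channel and integer samples, ignores the preceding OFDM symbol, and all function names are hypothetical. It shows that the received block equals a circular convolution only when the prefix is at least as long as the channel memory.

```python
def add_cp(symbol, cp_len):
    """Prepend the last cp_len samples of the symbol as a cyclic prefix."""
    if cp_len == 0:
        return list(symbol)
    return list(symbol[-cp_len:]) + list(symbol)


def linear_conv(x, h):
    """Linear convolution, modelling a static multipath channel."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for l, hl in enumerate(h):
            y[i + l] += xi * hl
    return y


def circular_conv(x, h):
    """Circular convolution of x with h over len(x) samples."""
    n = len(x)
    return [sum(hl * x[(i - l) % n] for l, hl in enumerate(h)) for i in range(n)]


def receive(symbol, h, cp_len):
    """Transmit one symbol with a cyclic prefix and remove the prefix again."""
    rx = linear_conv(add_cp(symbol, cp_len), h)
    return rx[cp_len:cp_len + len(symbol)]


if __name__ == "__main__":
    symbol = [1, 2, 3, 4, 5, 6, 7, 8]
    h = [1, 2, 3]  # hypothetical channel with a memory of two samples
    print(receive(symbol, h, 2) == circular_conv(symbol, h))  # prefix long enough
    print(receive(symbol, h, 1) == circular_conv(symbol, h))  # prefix too short
```

When the equality holds, the channel acts on each sub-carrier as a single complex gain after the FFT; when it fails, the missing wrap-around terms appear as interference, which is consistent with the deterioration reported above.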
Then, the Jakes channel simulator (page 50) was included in the project as
it introduces time correlation to the channel. The properties of both the single-path
and the multipath Jakes channel model were examined. The effect of Doppler spread on
the performance of the transceiver was observed in the results of the simulation.
Various ITU channel profiles were presented and adapted to a channel sampling
rate. A discussion about the implementation of the Jakes simulator was given.
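A sum-of-sinusoids fading generator of the Jakes type can be sketched in a few lines. The code below follows the classical textbook formulation with $N = 4M + 2$ and the phase choices $\beta_n = \pi n / M$, $\alpha = 0$; these choices, the scaling factor $2/\sqrt{N}$ and the function name `jakes_samples` are assumptions made for illustration and need not match the optimized variant implemented in the thesis.

```python
import math


def jakes_samples(f_d, f_s, num_samples, M=8):
    """Generate complex fading samples with a classical Jakes-type model.

    f_d is the maximum Doppler frequency in Hz, f_s the sampling frequency
    in Hz, and M the reduced number of oscillators (N = 4M + 2).
    """
    N = 4 * M + 2
    w_d = 2.0 * math.pi * f_d
    scale = 2.0 / math.sqrt(N)  # assumed normalisation
    samples = []
    for k in range(num_samples):
        t = k / f_s
        # Oscillator at the maximum Doppler frequency; alpha = 0 puts it
        # entirely in the inphase component.
        g_i = math.sqrt(2.0) * math.cos(w_d * t)
        g_q = 0.0
        for n in range(1, M + 1):
            beta_n = math.pi * n / M
            w_n = w_d * math.cos(2.0 * math.pi * n / N)
            c = math.cos(w_n * t)
            g_i += 2.0 * math.cos(beta_n) * c
            g_q += 2.0 * math.sin(beta_n) * c
        samples.append(complex(scale * g_i, scale * g_q))
    return samples


if __name__ == "__main__":
    for g in jakes_samples(f_d=100.0, f_s=10000.0, num_samples=5):
        print(g)
```

Because every sample is a deterministic function of time, consecutive samples are correlated, and the correlation decays faster as f_d grows relative to f_s. The inner loop of cosine evaluations and multiplications also indicates why this part dominates the computational load.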
The final task of the analytical part of the thesis was to provide a channel
estimation algorithm that would be able to restore the transmitted signal after it is
corrupted by the fast-fading multipath channel. A literature study was performed.
We decided to concentrate on time-domain channel estimation techniques as they
provide better spectral efficiency compared to frequency-domain estimation. The
results of a simple channel estimator showed a high error level. Therefore, we
attached an adaptive process to the channel estimator in order to reduce the error
(page 71). It was found that adaptive estimation improves the performance.
5.2 The Implementation Part
The goal of the implementation part was to decrease the execution time of the
simulator that was built in MATLAB in the analytical part. The simulator con-
tains an OFDM transceiver with an adaptive channel estimator. The channel is a
multipath fast-fading Jakes simulator.
We started by presenting a generic hardware/software co-design flow. The
generic co-design flow was explained in terms of the Rugby meta-model. The
various steps in the co-design flow were explored by a literature study.
The analysis of the OFDM simulator was performed to find the distribution
of computational complexity. After MATLAB code profiling and manual complexity
analysis, we found that 90% of the complexity is concentrated in the
few program lines of the channel model. The operations in the channel model
are computation-oriented. The rate of data passed through the interface between
the transceiver and the channel simulator was found to be much lower than the
data rate inside the channel simulator. From these observations, we decided to offload the
channel model into a hardware accelerator. It was decided to implement the
accelerator on an FPGA development board.
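The potential gain from offloading only the dominant part can be bounded with Amdahl's law. The sketch below assumes that the profiled 90% share of complexity translates directly into a 90% share of execution time and that the interface to the accelerator adds no overhead; both are simplifying assumptions, and the function name is hypothetical.

```python
def overall_speedup(accelerated_fraction, accelerator_speedup):
    """Amdahl's law: speedup of the whole program when only a fraction of
    its execution time is accelerated by the given factor."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


if __name__ == "__main__":
    # Channel model assumed to take 90% of the time, accelerated 5.5 times.
    print(overall_speedup(0.9, 5.5))
    # Upper bound when the channel model takes no time at all.
    print(overall_speedup(0.9, float("inf")))
```

Under these assumptions, accelerating the channel model by a factor of 5.5 would speed up the complete simulator by roughly 3.8, and no accelerator could exceed a factor of 10 as long as the remaining 10% stays in software.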
The DK Suite methodology was adapted to the requirements for the imple-
mentation of the channel model.
The Jakes channel model was optimized to reduce the number of precision
bits required. We showed that the optimization allows using 19-bit arithmetic
instead of 40-bit, while keeping the functionality intact. As a consequence, a
higher maximum clock rate of the FPGA can be achieved and the usage of the
FPGA resources is reduced. We have not found such an optimization, with
comparable results, elsewhere in the literature.
The MATLAB channel simulator was rewritten in the Handel-C programming
language. Two versions of the optimized fixed-point code, sequential and parallel,
were produced. The speedup was estimated for both versions. It was found that
the parallel code needs around 60% fewer clock cycles to execute than the
sequential version. The overall simulation time obtained by using the parallel code
executing on an FPGA is 5.5 times lower than that of an average PC system
running the MATLAB simulator.
5.3 Future Work
The work presented in this Master's thesis rests on several assumptions. The following
items could be further investigated:
• improving the BER performance in channels with very high Doppler frequency
(high mobility OFDM applications) by using more sophisticated
adaptive channel estimators;
• the transmitter/receiver synchronization issues, modulation and source coding
techniques;
• improving the simulator performance in terms of execution speed by using a
more efficient fixed-point arithmetic library and/or pipelining the simulator;
• development of a working prototype. The current hardware implementation
does not have an interface to the transceiver running on a PC.
APPENDIX A
LIST OF SYMBOLS
A - path attenuation (pp. 58).
a - transmitted data symbol vector (pp. 16).
a - element of transmitted data symbol vector (pp. 16).
b - serial data input vector (pp. 15).
b - serial data input vector element (pp. 125).
b - bit (pp. 36).
C - estimation matrix for symbol retrieval (pp. 45).
$D_f$ - pilot spacing (pp. 66).
d - encoded data symbol vector (pp. 15).
d - encoded data symbol (pp. 15).
E - expectation (pp. 51).
$F_0$ - total wideband frequency width or OFDM sampling frequency, Hz (pp. 12).
$F_{dec}$ - decision function (pp. 30).
$F_{demap}$ - symbol demapping function (pp. 30).
$F_{eq}$ - channel estimation function (pp. 30).
$F_{map}$ - symbol mapping function (pp. 15).
f - sub-carrier frequency, Hz (pp. 12).
$f_d$ - maximum Doppler frequency occurring when = 0, Hz (pp. 40).
$f_s$ - OFDM sampling frequency, Hz (pp. 55).
$G_{cp}$ - cyclic prefix insertion matrix (pp. 24).
$G_{cp}$ - cyclic prefix removal matrix (pp. 29).
g - normalized low-pass process of Jakes model (pp. 51).
g - complex envelope of the reference or ideal process (pp. 51).
g
1
- inphase component of Jakes model (pp. 52).
g
1
- inphase component of the reference model (pp. 51).
g
2
- quadrature component of Jakes model (pp. 52).
g
2
- quadrature component of the reference model (pp. 51).
H - channel impulse response matrix (pp. 43).
H - transfer function (pp. 126).

H - channel estimate matrix (pp. 46).

h - vector of channel estimates (pp. 66).


h - channel impulse response (pp. 38).

h - estimate of channel impulse response (pp. 126).


h - complex tap of channel impulse response (pp. 43).
J_0 - zeroth order Bessel function of the first kind (pp. 51).
k - sub-carrier index (pp. 12).
L - number of paths (pp. 38).
l - path index (pp. 42).
M - reduced number of oscillators (pp. 52).
m - signalling interval (pp. 15).
N_0 - parameter referenced to the input of the receiver (pp. 35).
N_ds - normalized maximum excess delay (pp. 17).
N_FFT - FFT size (pp. 24).
N_g - guard interval length (in samples) (pp. 17).
N_gu - length of an OFDM symbol (useful part and guard period) (pp. 17).
N_osc - number of oscillators in Jakes fading process (pp. 51).
N_p - number of pilots in one OFDM symbol (pp. 65).
N_s - number of sub-carriers (pp. 12).
N_u - length of useful OFDM symbol, samples (pp. 14).
P_w - noise power (pp. 127).
P_b - average probability of bit error (pp. 36).
P_s - signal power (pp. 127).
PDF - probability density function (pp. 35).
p - complex attenuation or gain coefficient of a channel path (pp. 42).
p - pilot tones (pp. 66).
Q - error function (pp. 37).
q - arbitrary constant (pp. 128).
R - correlation function (pp. 35).
r - received signal (pp. 28).
S - transmitted signal (frequency domain) (pp. 128).
s - transmitted signal (pp. 24).
T_g - length of guard period, seconds (pp. 34).
T_gu - length of an OFDM symbol (useful part and guard period), seconds (pp. 34).
T_s - OFDM sample time, seconds (pp. 21).
T_u - length of useful OFDM symbol, seconds (pp. 12).
w - white Gaussian noise (pp. 35).
x - received symbol after decision (pp. 30).
Y - received symbol after channel estimation (frequency domain) (pp. 128).
y - received symbol after channel estimation (pp. 30).
z - received symbol vector after FFT (pp. 29).
z - element of received symbol vector after FFT (pp. 128).
- number of OFDM symbols in an OFDM block (pp. 128).
γ_c - average SNR of a channel (pp. 128).
(Δf)_c - coherence bandwidth (pp. 40).
Δf - inter-carrier frequency, Hz (pp. 12).
(Δt)_c - coherence time (pp. 40).
- energy (pp. 36).
- phase (pp. 41).
- angle of an incoming wave at the receiving antenna (pp. 51).
- weighting matrix (pp. 15).
- weighting matrix element (pp. 16).
- mean (pp. 35).
- envelope (pp. 41).
- variance (pp. 35).
τ - time delay (pp. 21).
τ_max - maximum excess delay (pp. 39).
- initial phase associated with a propagation path (random variable,
uniformly distributed over [, ]) (pp. 51).
- power spectral density (pp. 35).
- orthogonality matrix (pp. 22).
_cp - orthogonality matrix with cyclic prefix (pp. 25).
^H_cp - orthogonality matrix for DFT and cyclic prefix removal (pp. 30).
- element of orthogonality matrix (pp. 19).
- orthogonality vector (pp. 129).
- angular frequency (pp. 26).
APPENDIX B
LIST OF ABBREVIATIONS
A^3     Application Algorithm Architecture
AAU     Aalborg University
ADSL    Asymmetric Digital Subscriber Line
ALU     Arithmetic Logic Unit
ASIC    Application-Specific Integrated Circuit
ASIP    Application-Specific Instruction-set Processor
ASPI    Applied Signal Processing and Implementation
AWGN    Additive White Gaussian Noise
BER     Bit Error Rate
BPSK    Binary Phase Shift Keying
CIR     Channel Impulse Response
CISS    Center for Embedded Software Systems
CLB     Configurable Logic Block
CP      Cyclic Prefix
CPU     Central Processing Unit
CSys    Cellular Systems Division, AAU
DAB     Digital Audio Broadcast
DAPG    Directed Acyclic Precedence Graph
DFG     Data Flow Graph
DFT     Discrete Fourier Transform
DK      Celoxica DK Design Suite
DSE     Design Space Exploration
DSM     Data Stream Manager
DSP     Digital Signal Processing(-or)
DT      Design Trotter
DVB     Digital Video Broadcast
EDIF    Electronic Design Interchange Format
ETSI    European Telecommunications Standards Institute
FDM     Frequency Division Multiplexing
FFT     Fast Fourier Transform
FIR     Finite Impulse Response
FPGA    Field-Programmable Gate Array
GCC     GNU Compiler Collection
GPU     Graphics Processing Unit
HCDFG   Hierarchical Control and Data Flow Graph
HW      Hardware
ICI     Inter-Carrier Interference
IDE     Integrated Development Environment
IDFT    Inverse Discrete Fourier Transform
IEEE    Institute of Electrical and Electronics Engineers
IFFT    Inverse Fast Fourier Transform
ILP     Integer Linear Programming
IP      Intellectual Property
ISI     Inter-Symbol Interference
ITU     International Telecommunication Union
LMMSE   Linear Minimum Mean-Squared Error
LMS     Least Mean Square
LS      Least Squares
LUT     Look-Up Table
MAC     Media Access Control
MMSE    Minimum Mean-Squared Error
NAND    Not AND
NP      Nondeterministic Polynomial
NRE     Non-Recurring Engineering
OFDM    Orthogonal Frequency Division Multiplexing
OSI     Open Systems Interconnection
PAL     Platform Abstraction Layer
PCI     Peripheral Component Interconnect
PDF     Probability Density Function
PE      Processing Element
PSAM    Pilot Symbol Assisted Modulation
QAM     Quadrature Amplitude Modulation
QPSK    Quadrature Phase Shift Keying
RAM     Random Access Memory
RC203   Celoxica RC203 FPGA Development Board
SNR     Signal-to-Noise Ratio
SVD     Singular Value Decomposition
SW      Software
TCP/IP  Transmission Control Protocol and Internet Protocol
VHDL    Very High Speed Integrated Circuit Hardware Description Language
WiMAX   IEEE 802.16 standard
WSS     Wide Sense Stationary
WSSUS   Wide Sense Stationary Uncorrelated Scattering
XP      Extreme Programming
APPENDIX C
SEQUENTIAL HANDEL-C CODE
C.1 Source Code
#include "fixed.hch"
#include <stdlib.hch>
set clock = external "P1";
#define nf 15 // fraction bits
#define ni 4 // integer bits
#define bn1 23 // bit indices for fixed point data reading/writing
#define bn2 20
#define bn3 19
#define bn4 5
#define jakesM 20 // Jakes parameters
#define jakesM1 21
#define jakesN 82
#define numPaths 6
#define max_delay 50
macro expr REAL(a) = a[0];
macro expr IMAG(a) = a[1];
typedef FIXED_SIGNED(ni, nf) TFixed;
typedef TFixed TComplex[2];
typedef unsigned int 4 TPathNumberIndex;
typedef unsigned int 6 TPathIndex;
typedef unsigned int 5 TJakesIndex;
TFixed PI2 = FixedLiteral(FIXED_ISSIGNED, ni, nf, 6.283172607421875);
TFixed PIby2 = FixedLiteral(FIXED_ISSIGNED, ni, nf, 1.57080078125);
TPathIndex delays[numPaths] = {0, 6, 14, 22, 35, 50};
TPathIndex index[numPaths];
TFixed gains[numPaths];
#define sinTableSize 25735
#define sinTableIndexBits 14
typedef FIXED_SIGNED(1, 13) TTableItem;
unsigned 8 out;
interface bus_clock_in(unsigned 8 i) pi() with
{data = {"N1", "N3", "N2", "M4", "M3", "M2", "M1", "L4"}};
interface bus_out() po(out) with
{data = {"H1", "H2", "H3", "H4", "G1", "G2", "G3", "G4"}};
interface bus_clock_in(unsigned 1 P) sin_int() with
{data = {"F5"}};
interface bus_clock_in(unsigned 13 K) sin_frac()with
{data = {"L3", "L2", "L1", "L5", "K5", "K4", "K3",
"K2", "K1", "J4", "J3", "J2", "J1"}};
ram TComplex buffer[max_delay+1];
ram TFixed w[jakesM+1];
ram TFixed dw[jakesM+1];
ram TFixed p1[jakesM+1][numPaths];
ram TFixed p2[jakesM+1][numPaths];
ram TFixed p3[jakesM+1][numPaths];
ram TFixed p4[jakesM+1][numPaths];
TFixed Sin(TFixed inval)
{
unsigned int (ni+nf) v;
unsigned int sinTableIndexBits ind;
TTableItem sinres;
TFixed res;
unsigned int 1 t;
v = (unsigned int)inval.FixedIntBits @ (unsigned int)inval.FixedFracBits;
ind = v[16:3];
t = sin_int.P;
res.FixedIntBits = (signed int ni) (t[0] @ t[0] @ t[0] @ t);
res.FixedFracBits = (signed int nf) (sin_frac.K @ 0);
return res;
}
TFixed Cos(TFixed inval)
{
TFixed in2;
in2 = FixedSub(PIby2, inval);
return Sin(in2);
}
TPathIndex CircIndex(TPathIndex a)
{
if (a == max_delay)
return 0;
else return a+1;
}
void InitJakes(void)
{
// initialize gains
gains[0] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.484375);
gains[1] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.38671875);
gains[2] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.0625);
gains[3] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.046875);
gains[4] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.015625);
gains[5] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.00390625);
index[0] = 0; //assuming that first delay is 0
index[1] = max_delay + 1 - delays[1];
index[2] = max_delay + 1 - delays[2];
index[3] = max_delay + 1 - delays[3];
index[4] = max_delay + 1 - delays[4];
index[5] = max_delay + 1 - delays[5];
}
//==============================================================================
// returns one complex Jakes simulator waveform value for one path.
// Time is contained in vector w.
void JSample(TPathNumberIndex path, TComplex *res)
{
TJakesIndex n;
TFixed coef;
TFixed t1, t3, t4, t5, t6, t7, t8;
REAL(*res) = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0);
IMAG(*res) = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0);
n = 0;
while (n != jakesM1)
{
t1 = w[n];
t3 = Cos(t1);
t4 = Sin(t1);
t5 = FixedMultSigned(t3, p1[n][path<-3]);
t6 = FixedMultSigned(t4, p2[n][path<-3]);
t7 = FixedMultSigned(t4, p3[n][path<-3]);
t8 = FixedMultSigned(t3, p4[n][path<-3]);
t3 = FixedSub(t5, t6);
t4 = FixedAdd(t7, t8);
REAL(*res) = FixedAdd(REAL(*res), t3);
IMAG(*res) = FixedAdd(IMAG(*res), t4);
n++;
}
}
//==============================================================================
// complex multiplication operation
void Cmul(TComplex *a, TComplex *b, TComplex *res)
{
// (x + yi)(u + vi) = (xu - yv) + (xv + yu)i
TFixed t1, t2, t3, t4;
t1 = FixedMultSigned(REAL(*a), REAL(*b));
t2 = FixedMultSigned(IMAG(*a), IMAG(*b));
t3 = FixedMultSigned(REAL(*a), IMAG(*b));
t4 = FixedMultSigned(IMAG(*a), REAL(*b));
REAL(*res) = FixedSub(t1, t2);
IMAG(*res) = FixedAdd(t3, t4);
}
//==============================================================================
// complex addition operation
void Cadd(TComplex *a, TComplex *b, TComplex *res)
{
REAL(*res) = FixedAdd(REAL(*a), REAL(*b));
IMAG(*res) = FixedAdd(IMAG(*a), IMAG(*b));
}
//==============================================================================
// complex assignment operator ( res <= a )
void Cassign(TComplex *res, TComplex *a)
{
REAL(*res) = REAL(*a);
IMAG(*res) = IMAG(*a);
}
//==============================================================================
// main entry point
void main (void)
{
unsigned int 14 rn;
TPathNumberIndex l;
TPathIndex ind;
TJakesIndex n;
unsigned int 8 byte1, byte2, byte3;
unsigned int 24 allBytes;
TComplex *inValue;
TComplex *pValue;
TComplex res;
TComplex tmp1;
TComplex tmp2;
TComplex tmp3;
TFixed tmpf;
InitJakes();
while (1)
{
// get real part of new sample
byte1 = pi.i;
byte2 = pi.i;
byte3 = pi.i;
allBytes = byte1 @ byte2 @ byte3;
ind = index[0];
inValue = &buffer[ind];
// extract integer and fraction bits, read imaginary part
REAL(*inValue).FixedIntBits = (signed int ni) allBytes[bn1:bn2];
REAL(*inValue).FixedFracBits = (signed int nf) allBytes[bn3:bn4];
byte1 = pi.i;
byte2 = pi.i;
byte3 = pi.i;
allBytes = byte1 @ byte2 @ byte3;
IMAG(*inValue).FixedIntBits = (signed int ni) allBytes[bn1:bn2];
IMAG(*inValue).FixedFracBits = (signed int nf) allBytes[bn3:bn4];
REAL(res) = FixedLiteralFromInts(FIXED_ISSIGNED, ni, nf, 0, 0);
IMAG(res) = FixedLiteralFromInts(FIXED_ISSIGNED, ni, nf, 0, 0);
l = 0;
do
{
JSample(l, &(tmp1));
ind = index[l<-3];
pValue = &(buffer[ind]);
l++;
Cassign(&(tmp3), pValue);
Cmul(&(tmp1), &(tmp3), &(tmp2));
Cadd(&(res), &(tmp2), &(res));
} while (l != numPaths);
// update circular buffer indices
index[0] = CircIndex(index[0]);
index[1] = CircIndex(index[1]);
index[2] = CircIndex(index[2]);
index[3] = CircIndex(index[3]);
index[4] = CircIndex(index[4]);
index[5] = CircIndex(index[5]);
n = 0;
//return result to output bus
allBytes = (unsigned int ni)REAL(res).FixedIntBits @ (unsigned int nf)REAL(res).FixedFracBits @ 0;
out = (allBytes[23:16]);
out = (allBytes[15:8]);
out = (allBytes[7:0]);
allBytes = (unsigned int ni)IMAG(res).FixedIntBits @ (unsigned int nf)IMAG(res).FixedFracBits @ 0;
out = (allBytes[23:16]);
out = (allBytes[15:8]);
out = (allBytes[7:0]);
// advance simulation time
while(n != jakesM1)
{
tmpf = w[n];
tmpf = FixedAdd(tmpf, dw[n]);
if(FixedGTE(tmpf, PI2))
{
tmpf = FixedSub(tmpf, PI2);
}
w[n] = tmpf;
n++;
} // while
}
}
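The number format and byte-level I/O used in the listing can be checked on the host with a small software model. The Python sketch below is not part of the thesis code. It assumes round-to-nearest quantization, which reproduces the PI2 and PIby2 literals in the listing, and it assumes that the first byte transferred is the most significant one, as implied by byte1 @ byte2 @ byte3. The helper names to_fixed, to_float, pack, unpack and advance_phase are hypothetical.

```python
import math

NI = 4             # integer bits including sign (ni in the listing)
NF = 15            # fraction bits (nf in the listing)
WIDTH = NI + NF    # 19-bit two's-complement value
PAD = 24 - WIDTH   # 5 unused low-order bits in each 3-byte word


def to_fixed(x):
    """Round x to the nearest representable value and return the raw integer."""
    raw = round(x * (1 << NF))
    if not -(1 << (WIDTH - 1)) <= raw < (1 << (WIDTH - 1)):
        raise ValueError("value does not fit the fixed-point format")
    return raw


def to_float(raw):
    """Convert a raw fixed-point integer back to a float."""
    return raw / (1 << NF)


def pack(raw):
    """Place the value in bits 23..5 of a 24-bit word, most significant byte first."""
    word = (raw & ((1 << WIDTH) - 1)) << PAD
    return word.to_bytes(3, "big")


def unpack(data):
    """Inverse of pack(): recover the signed raw integer from three bytes."""
    raw = int.from_bytes(data, "big") >> PAD
    if raw & (1 << (WIDTH - 1)):   # sign bit set, so the value is negative
        raw -= 1 << WIDTH
    return raw


PI2 = to_fixed(2 * math.pi)


def advance_phase(w, dw):
    """One step of the phase update loop: add the increment, wrap once at 2*pi."""
    w += dw
    if w >= PI2:
        w -= PI2
    return w
```

As in the listing, advance_phase subtracts 2*pi at most once per step, so it relies on every phase increment being smaller than 2*pi.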
C.2 EDIF Output
The following output is given by the DK Suite when compiling the source code
to Electronic Design Interchange Format (EDIF):
NAND gates after compilation : 549445 (1353 FFs, 12312 memory bits)
NAND gates after optimization : 300421 (1293 FFs, 12312 memory bits)
NAND gates after expansion : 305132 (1293 FFs, 12312 memory bits)
NAND gates after optimization : 287585 (948 FFs, 12312 memory bits)
NAND gates after expansion : 321910 (1003 FFs, 29184 memory bits)
NAND gates after optimization : 262015 (746 FFs, 7872 memory bits)
NAND gates after expansion : 140365 (743 FFs, 7872 memory bits)
NAND gates after optimization : 95871 (659 FFs, 7872 memory bits)
LUTs after mapping : 5422 (659 FFs, 7872 memory bits)
LUTs after retiming : 5405 (659 FFs, 7872 memory bits)
LUTs after post-optimization : 5407 (653 FFs, 7872 memory bits)
0 errors, 0 warnings
C.3 Place and Route Results
The summary of the place and route results reported by Xilinx ISE 7.1i is given in
Tables C.1 and C.2.
Logic Utilization                                  Used   Available   Utilization
Number of Slice Flip Flops                          623      21,504            2%
Number of 4 input LUTs                            4,731      21,504           22%
Logic Distribution
Number of occupied Slices                         2,845      10,752           26%
Number of Slices containing only related logic    2,845       2,845          100%
Number of Slices containing unrelated logic           0       2,845            0%
Total Number 4 input LUTs                         5,255      21,504           24%
  Number used as logic                            4,731
  Number used as a route-thru                        30
  Number used for 32x1 RAMs                         224
  Number used as 16x1 RAMs                          268
Table C.1: Resource usage of sequential Handel-C program

Constraint                                    Requested   Actual     Logic Levels
TS_PADIN_ClockInPin_0_jakes_3_W1 =            20.000ns    19.970ns   38
PERIOD TIMEGRP
"PADIN_ClockInPin_0_jakes_3_W1"
20 ns HIGH 50%
Table C.2: Minimum FPGA clock period for sequential program implementation
APPENDIX D
PARALLEL HANDEL-C CODE
D.1 Source Code
#include "fixed.hch"
#include <stdlib.hch>
set clock = external "P1";
#define nf 15 // fraction bits
#define ni 4 // integer bits
#define bn1 23 // bit indices for fixed point data reading/writing
#define bn2 20
#define bn3 19
#define bn4 5
#define jakesM 20 // Jakes parameters
#define jakesM1 21
#define jakesN 82
#define numPaths 6
#define max_delay 50
macro expr REAL(a) = a[0];
macro expr IMAG(a) = a[1];
typedef FIXED_SIGNED(ni, nf) TFixed;
typedef TFixed TComplex[2];
typedef unsigned int 4 TPathNumberIndex;
typedef unsigned int 6 TPathIndex;
typedef unsigned int 5 TJakesIndex;
TFixed PI2 = FixedLiteral(FIXED_ISSIGNED, ni, nf, 6.283172607421875);
TFixed PIby2 = FixedLiteral(FIXED_ISSIGNED, ni, nf, 1.57080078125);
TPathIndex delays[numPaths] = {0, 6, 14, 22, 35, 50};
TPathIndex index[numPaths];
TFixed gains[numPaths];
#define sinTableSize 25735
#define sinTableIndexBits 14
typedef FIXED_SIGNED(1, 13) TTableItem;
unsigned 8 out;
interface bus_clock_in(unsigned 8 i) pi() with
{data = {"N1", "N3", "N2", "M4", "M3", "M2", "M1", "L4"}};
interface bus_out() po(out) with
{data = {"H1", "H2", "H3", "H4", "G1", "G2", "G3", "G4"}};
interface bus_clock_in(unsigned 1 P) sin_int() with
{data = {"F5"}};
interface bus_clock_in(unsigned 13 K) sin_frac()with
{data = {"L3", "L2", "L1", "L5", "K5", "K4", "K3",
"K2", "K1", "J4", "J3", "J2", "J1"}};
ram TComplex buffer[max_delay+1];
ram TFixed w[jakesM+1];
ram TFixed dw[jakesM+1];
ram TFixed p1[jakesM+1][numPaths];
ram TFixed p2[jakesM+1][numPaths];
ram TFixed p3[jakesM+1][numPaths];
ram TFixed p4[jakesM+1][numPaths];
inline TFixed Sin(TFixed inval)
{
unsigned int (ni+nf) v;
unsigned int sinTableIndexBits ind;
TTableItem sinres;
TFixed res;
unsigned int 1 t;
v = (unsigned int)inval.FixedIntBits @ (unsigned int)inval.FixedFracBits;
ind = v[16:3];
t = sin_int.P;
par
{
res.FixedIntBits = (signed int ni) (t[0] @ t[0] @ t[0] @ t);
res.FixedFracBits = (signed int nf) (sin_frac.K @ 0);
}
return res;
}
inline TFixed Cos(TFixed inval)
{
TFixed in2;
in2 = FixedSub(PIby2, inval);
return Sin(in2);
}
inline TPathIndex CircIndex(TPathIndex a)
{
if (a == max_delay)
return 0;
else return a+1;
}
inline void InitJakes(void)
{
// initialize gains
par
{
gains[0] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.484375);
gains[1] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.38671875);
gains[2] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.0625);
gains[3] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.046875);
gains[4] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.015625);
gains[5] = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0.00390625);
index[0] = 0; //assuming that first delay is 0
index[1] = max_delay + 1 - delays[1];
index[2] = max_delay + 1 - delays[2];
index[3] = max_delay + 1 - delays[3];
index[4] = max_delay + 1 - delays[4];
index[5] = max_delay + 1 - delays[5];
}
}
//==============================================================================
// returns one complex Jakes simulator waveform value for one path.
// Time is contained in vector w.
inline void JSample(TPathNumberIndex path, TComplex *res)
{
TJakesIndex n;
TFixed coef;
TFixed t1, t3, t4, t5, t6, t7, t8;
par
{
REAL(*res) = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0);
IMAG(*res) = FixedLiteral(FIXED_ISSIGNED, ni, nf, 0);
n = 0;
}
while (n != jakesM1)
{
t1 = w[n];
par
{
t3 = Cos(t1);
t4 = Sin(t1);
}
par
{
t5 = FixedMultSigned(t3, p1[n][path<-3]);
t6 = FixedMultSigned(t4, p2[n][path<-3]);
t7 = FixedMultSigned(t4, p3[n][path<-3]);
t8 = FixedMultSigned(t3, p4[n][path<-3]);
}
par
{
t3 = FixedSub(t5, t6);
t4 = FixedAdd(t7, t8);
}
par
{
REAL(*res) = FixedAdd(REAL(*res), t3);
IMAG(*res) = FixedAdd(IMAG(*res), t4);
n++;
}
}
}
//==============================================================================
// complex multiplication operation
inline void Cmul(TComplex *a, TComplex *b, TComplex *res)
{
// (x + yi)(u + vi) = (xu - yv) + (xv + yu)i
TFixed t1, t2, t3, t4;
par
{
t1 = FixedMultSigned(REAL(*a), REAL(*b));
t2 = FixedMultSigned(IMAG(*a), IMAG(*b));
t3 = FixedMultSigned(REAL(*a), IMAG(*b));
t4 = FixedMultSigned(IMAG(*a), REAL(*b));
}
par
{
REAL(*res) = FixedSub(t1, t2);
IMAG(*res) = FixedAdd(t3, t4);
}
}
//==============================================================================
// complex addition operation
inline void Cadd(TComplex *a, TComplex *b, TComplex *res)
{
par
{
REAL(*res) = FixedAdd(REAL(*a), REAL(*b));
IMAG(*res) = FixedAdd(IMAG(*a), IMAG(*b));
}
}
//==============================================================================
// complex assignment operator ( res <= a )
inline void Cassign(TComplex *res, TComplex *a)
{
par
{
REAL(*res) = REAL(*a);
IMAG(*res) = IMAG(*a);
}
}
//==============================================================================
// main entry point
void main (void)
{
unsigned int 14 rn;
TPathNumberIndex l;
TPathIndex ind;
TJakesIndex n;
unsigned int 8 byte1, byte2, byte3;
unsigned int 24 allBytes;
TComplex *inValue;
TComplex *pValue;
TComplex res;
TComplex tmp1;
TComplex tmp2;
TComplex tmp3;
TFixed tmpf;
InitJakes();
while (1)
{
// get real part of new sample
byte1 = pi.i;
byte2 = pi.i;
byte3 = pi.i;
par
{
allBytes = byte1 @ byte2 @ byte3;
ind = index[0];
}
inValue = &buffer[ind];
par // extract integer and fraction bits, read imaginary part
{
REAL(*inValue).FixedIntBits = (signed int ni) allBytes[bn1:bn2];
REAL(*inValue).FixedFracBits = (signed int nf) allBytes[bn3:bn4];
{
byte1 = pi.i;
byte2 = pi.i;
byte3 = pi.i;
allBytes = byte1 @ byte2 @ byte3;
}
}
par
{
IMAG(*inValue).FixedIntBits = (signed int ni) allBytes[bn1:bn2];
IMAG(*inValue).FixedFracBits = (signed int nf) allBytes[bn3:bn4];
REAL(res) = FixedLiteralFromInts(FIXED_ISSIGNED, ni, nf, 0, 0);
IMAG(res) = FixedLiteralFromInts(FIXED_ISSIGNED, ni, nf, 0, 0);
l = 0;
}
do
{
par
{
JSample(l, &(tmp1));
ind = index[l<-3];
}
par
{
pValue = &(buffer[ind]);
l++;
}
Cassign(&(tmp3), pValue);
Cmul(&(tmp1), &(tmp3), &(tmp2));
Cadd(&(res), &(tmp2), &(res));
} while (l != numPaths);
par
{ // update circular buffer indices
index[0] = CircIndex(index[0]);
index[1] = CircIndex(index[1]);
index[2] = CircIndex(index[2]);
index[3] = CircIndex(index[3]);
index[4] = CircIndex(index[4]);
index[5] = CircIndex(index[5]);
n = 0;
}
par
{
// return result to output bus
{
allBytes = (unsigned int ni)REAL(res).FixedIntBits @ (unsigned int nf)REAL(res).FixedFracBits @ 0;
out = (allBytes[23:16]);
out = (allBytes[15:8]);
out = (allBytes[7:0]);
allBytes = (unsigned int ni)IMAG(res).FixedIntBits @ (unsigned int nf)IMAG(res).FixedFracBits @ 0;
out = (allBytes[23:16]);
out = (allBytes[15:8]);
out = (allBytes[7:0]);
} // seq
// advance simulation time
{
while(n != jakesM1)
{
tmpf = w[n];
tmpf = FixedAdd(tmpf, dw[n]);
if(FixedGTE(tmpf, PI2))
{
tmpf = FixedSub(tmpf, PI2);
}
par
{
w[n] = tmpf;
n++;
}
} // while
} // seq
} // par
}
}
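The tapped delay line in main() is built from one circular buffer and one read index per path. The Python sketch below is a host-side model, not part of the thesis code. It reuses the delays and max_delay values from the listing and assumes that samples are stored as plain numbers rather than fixed-point complex pairs. It illustrates that, with index[l] initialised to max_delay + 1 - delays[l], path l always reads the sample written delays[l] steps earlier. The function name run_delay_line is hypothetical.

```python
MAX_DELAY = 50
DELAYS = [0, 6, 14, 22, 35, 50]   # path delays in samples, as in the listing


def circ_index(a):
    """Advance a buffer index by one position, wrapping after MAX_DELAY."""
    return 0 if a == MAX_DELAY else a + 1


def init_indices():
    """Initial read positions; the first path has zero delay and shares the write position."""
    return [0] + [MAX_DELAY + 1 - d for d in DELAYS[1:]]


def run_delay_line(samples):
    """Return, for every input sample, the list of delayed samples seen by each path."""
    buffer = [0] * (MAX_DELAY + 1)
    index = init_indices()
    taps = []
    for x in samples:
        buffer[index[0]] = x                      # write the newest sample
        taps.append([buffer[i] for i in index])   # read one tap per path
        index = [circ_index(i) for i in index]    # advance all indices together
    return taps
```

Because the buffer holds max_delay + 1 entries, the oldest tap (50 samples) is read just before its slot is overwritten on the following step.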
D.2 EDIF Output
The following output is given by the DK Suite when compiling the source code
to Electronic Design Interchange Format (EDIF):
NAND gates after compilation : 635222 (1387 FFs, 12312 memory bits)
NAND gates after optimisation : 454465 (1288 FFs, 12312 memory bits)
NAND gates after expansion : 471584 (1288 FFs, 12312 memory bits)
NAND gates after optimisation : 243851 (926 FFs, 5700 memory bits)
NAND gates after expansion : 253951 (984 FFs, 11552 memory bits)
NAND gates after optimisation : 225452 (698 FFs, 4848 memory bits)
NAND gates after expansion : 119006 (695 FFs, 4848 memory bits)
NAND gates after optimisation : 79802 (613 FFs, 4848 memory bits)
LUTs after mapping : 4323 (613 FFs, 4848 memory bits)
LUTs after retiming : 4309 (613 FFs, 4848 memory bits)
LUTs after post-optimisation : 4309 (613 FFs, 4848 memory bits)
0 errors, 0 warnings
D.3 Place and Route Results
The summary of the place and route results reported by Xilinx ISE 7.1i is given in
Tables D.1 and D.2.
Logic Utilization                                  Used   Available   Utilization
Number of Slice Flip Flops                          583      21,504            2%
Number of 4 input LUTs                            3,923      21,504           18%
Logic Distribution
Number of occupied Slices                         2,326      10,752           21%
Number of Slices containing only related logic    2,326       2,326          100%
Number of Slices containing unrelated logic           0       2,326            0%
Total Number 4 input LUTs                         4,266      21,504           19%
  Number used as logic                            3,923
  Number used as a route-thru                        40
  Number used for 32x1 RAMs                         224
  Number used as 16x1 RAMs                           79
Table D.1: Resource usage of parallel Handel-C program

Constraint                                    Requested   Actual     Logic Levels
TS_PADIN_ClockInPin_0_jakes_3_W1 =            20.000ns    19.609ns   35
PERIOD TIMEGRP
"PADIN_ClockInPin_0_jakes_3_W1"
20 ns HIGH 50%
Table D.2: Minimum FPGA clock period for parallel program implementation
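The figures in Tables C.1, C.2, D.1 and D.2 can be turned into a maximum clock frequency and a relative resource saving with simple arithmetic. The Python sketch below only restates numbers from those tables. The function names and dictionary keys are hypothetical, and the calculation says nothing about throughput, which also depends on the number of clock cycles per sample.

```python
def max_clock_mhz(period_ns):
    """Maximum clock frequency in MHz for a given minimum clock period in ns."""
    return 1000.0 / period_ns


def reduction_percent(before, after):
    """Relative saving of the parallel version compared with the sequential one."""
    return 100.0 * (before - after) / before


# Values copied from Tables C.1/C.2 (sequential) and D.1/D.2 (parallel).
sequential = {"period_ns": 19.970, "luts": 5255, "slices": 2845, "flip_flops": 623}
parallel = {"period_ns": 19.609, "luts": 4266, "slices": 2326, "flip_flops": 583}

for name in ("luts", "slices", "flip_flops"):
    saving = reduction_percent(sequential[name], parallel[name])
    print(f"{name}: {saving:.1f} % fewer in the parallel version")

for label, design in (("sequential", sequential), ("parallel", parallel)):
    print(f"{label}: up to {max_clock_mhz(design['period_ns']):.2f} MHz")
```

Both designs meet the requested 20 ns period, so either can run from a 50 MHz clock; the parallel version has slightly more timing margin and uses fewer LUTs and slices.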
APPENDIX E
LYRTECH SIGNALMASTER
The Lyrtech SignalMaster SM-C67X rapid prototyping board [Lyrtech, 2000 (2)]
allows the designer to develop and test digital signal processing algorithms and
perform hardware/software parallel design in a real-time environment.
E.1 System Requirements
The development tools needed to perform hardware/software design with
SignalMaster are:
TI Code Composer Studio C6000
MATLAB Simulink and Real-Time Workshop
The TMS320C6701 is a Texas Instruments DSP with a 32-bit floating-point
architecture. The VIM interface allows direct connection between peripherals in
SignalMaster for real-time signal processing. The JTAG connector is the
debugging interface for use with Code Composer Studio for C6000.
In order to use the SignalMaster board in a rapid prototyping environment,
device drivers and utilities are supplied by the manufacturer for the high-level
development tools. SignalMaster is integrated with the MATLAB R11 / SIMULINK /
Real-Time Workshop tool suite with functional demos. MATLAB is a numeric
computation, visualization, and simulation environment. SIMULINK is a
graphical extension which allows designs to be entered as block-diagrams.
Real-Time Workshop is used to generate ANSI C code for external targets.
Scripts are used to compile and launch this code onto the SignalMaster platform.
E.2 Platform Architecture
The block diagram of the board architecture is shown in Figure E.1.
Figure E.1: SignalMaster block diagram
E.3 Components
The SignalMaster consists of the following components:
CompactPCI (Peripheral Component Interconnect) form factor
Texas Instruments TMS320C6701 at 166 MHz
FPGA for reconfigurability (Xilinx Virtex XCV300)
One 96 kHz CODEC (CS4228) with 2 inputs and 2 outputs
Up to 2 MB of SBSRAM (Synchronous Burst Static Random Access
Memory)
16MB of SDRAM (Synchronous Dynamic Random Access Memory)
VIM-2 Site (Velocity Interface Mezzanine)
JTAG (Joint Test Action Group) connector
The Xilinx Virtex XCV300 FPGA has the following features:
333K system gates
32x48 configurable logic blocks (CLBs)
6,912 logic cells
65 Kbit Block RAM
98 kbit maximum SelectRAM
APPENDIX F
CELOXICA RC203 DEVELOPMENT
PLATFORM
The RC203 platform is for evaluation and development of high performance
FPGA-based applications. The platform includes a Xilinx Virtex-II FPGA,
external memory, programmable clocks, Ethernet, Audio, Video, SmartMedia,
Parallel port, RS-232 and PS/2 keyboard and mouse. Supporting software
includes PAL (Platform Abstraction Layer), DSM (Data Stream Manager), the
RC200 PSL (Platform Support Libraries), and the FTU2 (File Transfer Utility).
F.1 System Requirements
The DK Design Suite is required only if the PAL, DSM and RC200 Platform
Support libraries are to be used. Microsoft Windows NT4, Windows 2000 or
Windows XP is required for the FTU2 program and for the DK Design Suite.
F.2 Platform Architecture
The block diagram of the board architecture is shown in Figure F.1.
Figure F.1: RC203 block diagram
F.3 Components
The RC203 professional version platform consists of:
Virtex-II (XC2V3000-4FG672C) FPGA
Ethernet MAC/PHY with 10/100baseT socket
2 banks of ZBT SRAM providing a total of 4-MB
Video support
AC97 compatible Audio
Connector for SmartMedia Flash memory for storage of BIT files
CPLD (Xilinx XC95144XL)
Parallel port connector and cable, for BIT-file download and host
communication with FPGA
RS-232
PS/2 keyboard and mouse connectors
2 seven-segment displays
2 blue LEDs
2 momentary contact switches
50 pin expansion header
JTAG connector
Perspex top and bottom covers
Universal 110/240V power supply
Headphone/microphone set
Mouse
16-MB SmartMedia card
Color camera
The Xilinx Virtex-II XC2V3000 FPGA has the following features:
3M system gates
64x56 configurable logic blocks (CLBs) containing 14,336 slices
448 Kbit distributed RAM
96 dedicated 18-bit x 18-bit multipliers
1,728 kbit SelectRAM
12 Digital Clock Managers (DCM)
The CPLD on the board is connected to:
FPGA
Parallel port
SmartMedia Flash RAM
JTAG chain
The CPLD (Complex Programmable Logic Device) is used in configuring the
FPGA with data received via the parallel port or from SmartMedia flash memory.
The file transfer utility program provided by Celoxica simplifies the process of
programming the RC203 via the parallel port, which is IEEE 1284-compatible.
The LAN91C111 Ethernet device by Standard Microsystems Corporation fitted
to the RC203 supports 8-bit and 16-bit access to the FPGA. The device has a
clock input of 25 MHz, generated from the CPLD.
APPENDIX G
COMPARISON OF FPGAS
Parameter                            Virtex-4 FX      Virtex-II Pro      Virtex-II
FPGA name                            VFX12            XC2VP30            XC2V3000
Xtreme Multipliers (18 x 18-bit)     -                136                96 (Multiplier Blocks)
CLB Resources
CLB Array (Row x Column)             64 x 24          -                  64 x 56
Slices                               5472             13696              14336
Logic Cells                          12312            30816              -
CLB Flip Flops                       10944            27392              28672
Memory Resources
Max. Distributed RAM (kbits)         86               428                448
Block RAM w/ECC (18 kbits each)      36               136                96
Total Block RAM (kbits)              18 * 36 = 648    136 * 18 = 2448    96 * 18 = 1728
Clock Resources
Digital Clock Managers (DCM)         4                8                  12
I/O Resources
Max. User I/O                        320              644                720
Total I/O Banks                      9                -                  -
Digitally Controlled Impedance       Yes              -                  -
Max. Differential I/O Pairs          160              -                  -
DSP Resources
XtremeDSP Slices                     32               -                  -
Embedded Hard IP Resources
PowerPC Processor Blocks             1                2                  -
10/100/1000 Ethernet MAC Blocks      2                -                  -
RocketIO Serial Transceivers         0                8                  -
Table G.1: Comparison of Xilinx Virtex-4 FX, Virtex-II Pro and Virtex-II FPGAs
