LTE implementation using XILINX FPGA


July 8, 2013
Prepared by
Abdullah Elsaied Kamal Elsheikh
eng abdullahelsheikh@yahoo.com
Ahmed Helmy Elbendary
ahmedelbendary61@yahoo.com
Ahmed Talal Khalifa
ahmed.talal.911@gmail.com
Belal Mohammed Abu El-Ola
belal.general@yahoo.com
Eslam Ossama Youssef
eslam4pc@gmail.com
Hassan Hesham Hassan Shalaby
eg.hassanshalaby96@gamil.com
Hazem Mamdouh Tharwat
eng.hazem mamdouh@yahoo.com
Khalid Eid Elsayed
khalidbarakat91@yahoo.com
Mahmoud Gamal Assal
engassal99@gmail.com
Muhammad Gamal Abbas Ahmed
eng mohamedgamal91@yahoo.com
Samer Sarwat Nageeb
samersarwat 150@hotmail.com
Acknowledgments
This book was written during our fourth year at the Department of Communications Engineering
at the University of Alexandria, and it describes the work and study carried out for our graduation
project. It could not have been written without the support and patience of many people, and
we are indebted to everyone who assisted us during that time. In particular, we
want to express our gratitude to our supervisor, Dr. Mohamed Rizk, for all his valuable advice,
encouragement, and discussions. The opportunity to work with him was a precious experience; he
gave his effort and time to help us learn, search, and do our best in this project.
We also want to thank our professors in the communication department, who did their best
to teach us the soul of communication and electronic engineering, especially Dr. Essam Sourrour,
who gave us help and support whenever we asked. Our deep thanks also go to the teaching
assistants Eng. Kareem Banawan, Eng. Ahmed Serag, and Eng. Mostafa Medra, who were
our beacon throughout our project journey.
We also want to thank Eng. Mohammed Mostafa for helping us with the FPGA.
Most of all, we thank our beloved families for their immeasurable support, encouragement, and
patience while working on this project. Without their love and understanding, this book and our
project would not have come to fruition.
In the end, as in the beginning, we would be remiss if we failed to express our profound gratitude to
Allah, whose assistance we always ask for and to whom we owe every success and all the progress
we have made in our lives.
Preface
Market needs for higher data rates are driving the evolution of wireless cellular systems from
narrowband 2G GSM systems to 4G LTE systems supporting peak data rates up to 100 Mbps.
For LTE specifications, complex signal processing techniques such as multiple-input multiple-output
(MIMO), along with radio technologies like OFDMA, are considered key to achieving target
throughputs in excess of 100 Mbps. In-building coverage is also regarded as a key requirement for
future wireless growth, with technologies such as pico and femto base stations trying to address
this issue.
The emerging wireless technologies described above pose significant challenges for operating
equipment manufacturers needing to design products that are not only scalable and cost-effective
but also flexible and reusable. These diverse requirements ultimately make the FPGA the hardware
platform of choice.
The aim of our project is to implement the LTE physical layer on an FPGA.
Abbreviations
16-QAM 16 quadrature amplitude modulation
2G Second generation
3G Third generation
3GPP Third Generation Partnership Project
4G Fourth generation
64-QAM 64 quadrature amplitude modulation
ARQ Automatic repeat request
BCJR Bahl, Cocke, Jelinek and Raviv
BLAST Bell Labs Layered Space Time
BPSK Binary phase shift keying
E-UTRA Evolved UMTS Terrestrial Radio Access
EGC Equal Gain Combining
eNB E-UTRAN NodeB
FDD Frequency Division Duplex
FDMA Frequency division multiple access
FFT Fast Fourier transform
HARQ Hybrid ARQ
HDA Hard Decision Aided
HSDPA High speed downlink packet access
LLR Log Likelihood Ratio
MAP Maximum a posteriori
MIMO Multiple Input Multiple Output
MISO Multiple Input Single Output
ML Maximum Likelihood
MMSE Minimum Mean Square Error
MRC Maximum Ratio Combining
MU-MIMO Multi User MIMO
OFDM Orthogonal frequency division multiplexing
OFDMA Orthogonal frequency division multiple access
PAPR Peak-to-Average Power Ratio
PMI Precoding Matrix Indicator
QAM Quadrature Amplitude Modulation
QPSK Quadrature Phase Shift Keying
RI Rank Indicator
SFBC Space-Frequency Block Code
SIC Successive Interference Cancellation
SIMO Single Input Multiple Output
SISO Single Input Single Output
SNR Signal-to-Noise Ratio
STBC Space-Time Block Code
STC Space-Time Code
STTC Space-Time Trellis Code
SU-MIMO Single User MIMO
TDD Time Division Duplex
V-BLAST Vertical BLAST
ZF Zero Forcing
Contents
1 Overview on LTE 1
1.1 Motivation For LTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 From UMTS to LTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 High Level Architecture of LTE . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Long Term Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 System Architecture Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 FPGA 9
2.1 Key factors for describing FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Fabrication process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 Logic density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.3 Clock management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.4 On-chip memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.5 DSP capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Virtex-5 FPGA Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Summary of Virtex-5 FPGA Features . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Virtex-5 FPGA Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.3 550 MHz Clock Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.4 SelectIO Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.5 550 MHz Integrated Block Memory . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.6 550 MHz DSP48E Slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.7 Digitally Controlled Impedance (DCI) Active I/O Termination . . . . . . . . . 16
2.2.8 Advanced Flip-Chip Packaging . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.9 System Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.10 65-nm Copper CMOS Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.11 Tri-Mode Ethernet Media Access Controller . . . . . . . . . . . . . . . . . . . 17
2.2.12 RocketIO GTP Transceivers (LXT/SXT only) . . . . . . . . . . . . . . . . . 17
2.3 Architectural Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.1 Virtex-5 FPGA Array Overview . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.2 Virtex-5 FPGA Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.3 Input/Output Blocks (SelectIO) . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.4 Configurable Logic Blocks (CLBs) . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.5 Block RAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.6 Global Clocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.7 DSP48E Slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.8 Routing Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.9 Boundary Scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.10 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.11 System Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.12 Virtex-5 LXT, SXT, TXT, and FXT Platform Features . . . . . . . . . . . . 23
2.3.13 Tri-Mode (10/100/1000 Mb/s) Ethernet MACs . . . . . . . . . . . . . . . . . 23
2.3.14 Integrated Endpoint Blocks for PCI Express . . . . . . . . . . . . . . . . . . . 24
2.3.15 Virtex-5 LXT and SXT Platform Features . . . . . . . . . . . . . . . . . . . . 24
2.3.16 RocketIO GTP Transceivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.17 Virtex-5 TXT and FXT Platform Features . . . . . . . . . . . . . . . . . . . 24
2.3.18 RocketIO GTX Serial Transceivers . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 ML505 evaluation board . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 CRC and Segmentation 29
3.1 CRC (cyclic redundancy check) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 CRC polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.2 CRC calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Modulo-2 arithmetic example . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.4 CRC calculation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 What is segmentation ? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Example: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.3 Problem solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.4 Segmentation process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Matlab code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4 VHDL code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Turbo Codes 45
4.1 A Brief History of Turbo Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Turbo Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.1 The Component Encoder with Binary Codes . . . . . . . . . . . . . . . . . . 47
4.2.2 Interleaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2.3 Trellis Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2.4 Puncturing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3 Iterative Decoding Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3.1 BCJR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.2 Tools for Iterative Decoding of Turbo Codes . . . . . . . . . . . . . . . . . . . 50
4.4 Optimal and Suboptimal Algorithms for Turbo Decoding . . . . . . . . . . . . . . . 52
4.4.1 MAP algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.2 Log-MAP Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4.3 Max-Log-Map Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.5 Improvements In Turbo Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5.1 Extrinsic Information Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5.2 The Sliding Window Soft Input Soft Output Decoder . . . . . . . . . . . . . 57
4.5.3 Stopping Criteria for Turbo Decoding . . . . . . . . . . . . . . . . . . . . . . 59
4.5.4 Modulo Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6 LTE Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.6.1 Turbo Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.6.2 Trellis termination for turbo encoder . . . . . . . . . . . . . . . . . . . . . . . 62
4.6.3 Interleaver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.7 Implementation of Turbo Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.7.1 Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.7.2 The Turbo Encoder main blocks . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.7.3 PISO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.7.4 Interleaver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7.5 Convolutional code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.7.6 SIPO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.7.7 TRELLIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.8 Simulations of Turbo Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.8.1 By using Modelsim and Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.9 Workflow for Turbo Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.9.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.9.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.10 Design Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.10.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.10.2 Extrinsic Information Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.10.3 Sliding window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.10.4 Stopping Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.10.5 Internal word length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.11 Implementation of Map Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.11.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.11.2 Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.12 Implementation of Turbo Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.12.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.12.2 Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.12.3 Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.12.4 Resource utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.12.5 Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.12.6 BER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5 RATE MATCHING 89
5.1 Subblock interleaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 permutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.3 Subblock interlacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.4 Hybrid ARQ soft buffer limitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5 RV starting points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.6 Implementation of Rate Matching Transmitter . . . . . . . . . . . . . . . . . . . . . 95
5.6.1 The Rate Matching Transmitter main blocks . . . . . . . . . . . . . . . . . . 95
5.6.2 Sub block interleaver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.6.3 The function of the Sub block interleaver . . . . . . . . . . . . . . . . . . . 96
5.6.4 Bit collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.7 Simulation of Transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.7.1 the first Sub block interleaver . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.7.2 the Third Sub block interleaver . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.7.3 The Bit collection Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.7.4 The Bit selection Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.8 Simulation of receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.8.1 Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.8.2 VHDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6 Scrambling 115
6.1 PN-sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.1.1 m-sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.1.2 Preferred Pair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.1.3 Gold Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.2 Scrambler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.3 Why scrambling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.3.1 Data randomization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.3.2 PAPR reduction (peak to average power ratio) . . . . . . . . . . . . . . . . . . 122
6.4 Matlab code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7 Digital Modulation Technique 129
7.1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.2 HIERARCHY OF DIGITAL MODULATION TECHNIQUES . . . . . . . . . . . . 131
7.3 Pass band Transmission Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.4 COHERENT PHASE-SHIFT KEYING . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.4.1 Binary Phase-Shift Keying . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.4.2 QUADRIPHASE-SHIFT KEYING . . . . . . . . . . . . . . . . . . . . . . . 137
7.4.3 M-ARY PSK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.4.4 Frequency-Shift Keying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.4.5 M-ary QUADRATURE AMPLITUDE Modulation (QAM Mod.): . . . . . . 148
7.4.6 Compare between (PSK) Vs (QAM) . . . . . . . . . . . . . . . . . . . . . . . 152
7.5 Noncoherent Orthogonal Modulation : . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.5.1 NONCOHERENT BINARY FSK: . . . . . . . . . . . . . . . . . . . . . . . . 153
7.5.2 Differential phase shift keying (DPSK): . . . . . . . . . . . . . . . . . . . . . 155
7.6 Table of BER equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.7 Modulation in LTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.8 Soft demodulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8 MIMO 173
8.1 MIMO concepts and capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.1.2 WIRELESS CHANNEL IMPAIRMENTS: . . . . . . . . . . . . . . . . . . 174
8.1.3 What is MIMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.1.4 MIMO vs. Channel Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
8.1.5 SISO, SIMO, MISO and MIMO terminology . . . . . . . . . . . . . . . . . . 177
8.2 Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
8.2.1 Types of diversity: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
8.2.2 Receive Diversity: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
8.3 Spatial multiplexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8.3.1 Principles of Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8.3.2 V-blast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.3.3 spatial multiplexing Types : . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
8.4 Downlink MIMO modes in LTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8.4.1 Precoding for two antenna ports . . . . . . . . . . . . . . . . . . . . . . . . . 223
8.4.2 CDD-based precoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
9 Orthogonal Frequency Division Multiplexing (OFDM) 231
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
9.2 OFDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
9.2.1 Why OFDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
9.2.2 Orthogonal Multiplexing Principle . . . . . . . . . . . . . . . . . . . . . . . . 235
9.2.3 OFDM advantages and disadvantages . . . . . . . . . . . . . . . . . . . . . . . 239
9.2.4 Peak-to-Average Power Ratio and Sensitivity to Non-Linearity . . . . . . . . 240
9.2.5 PAPR Reduction Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
9.2.6 Cyclic Prefix Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
9.2.7 Frequency-domain model of OFDM transmission . . . . . . . . . . . . . . . . 246
9.2.8 Channel estimation and reference symbols . . . . . . . . . . . . . . . . . . . . 248
9.3 OFDM as a user-multiplexing and multiple-access scheme . . . . . . . . . . . . . . . 249
9.4 The downlink physical resource: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
A Matlab 259
A.1 Communications System Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
A.2 Fixed Point Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
A.3 Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
A.4 HDL Verifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
A.4.1 Workflow for Using the Cosimulation Wizard to Create a MATLAB System Object . . . . . 261
B Xilinx ISE Overview 263
B.1 Design Flow Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
B.1.1 Design Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
B.1.2 Design Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
B.1.3 Design Verification (simulation) . . . . . . . . . . . . . . . . . . . . . . . . . . 264
B.1.4 Design Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
B.1.5 Device Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
B.2 Starting the ISE Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
B.2.1 Create a New Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
B.2.2 Create an HDL Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
B.2.3 Checking the Syntax of the New Counter Module . . . . . . . . . . . . . . . . 268
B.2.4 Implement Design and Verify Constraints . . . . . . . . . . . . . . . . . . . . 269
List of Figures
1.1 Global total traffic in mobile networks, 2007-2012 . . . . . . . . . . . . . . . . . . . . 2
1.2 Main LTE performance targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Evolution of the system architecture from GSM and UMTS to LTE. . . . . . . . . . 4
2.1 Global total traffic in mobile networks, 2007-2012 . . . . . . . . . . . . . . . . . . . . 26
4.1 Brief history of turbo codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 The Turbo Coding/Decoding Principle . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Encoder Block Diagram (Binary) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 Recursive systematic convolution encoder with feedback rate 1/2 code with memory
2. The generator polynomials are $g_0(D) = 1 + D + D^2$ and $g_1(D) = 1 + D^2$ . . . . . 47
4.5 soft-in/soft-out decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.6 Iterative decoding procedure with two soft-in/soft-out decoders . . . . . . . . . . . . 52
4.7 Relation between Map, Log Map and Max Log Map . . . . . . . . . . . . . . . . . . 53
4.8 Trellis structure of Systematic Convolution Codes with Feedback Encoders . . . . . 53
4.9 turbo code with different scaling factors and block length 5114 bit, 8 iterations, AWGN 57
4.10 Graphical representation of a real-time MAP architecture . . . . . . . . . . . . . . . 58
4.11 Average number of iterations for various stopping schemes . . . . . . . . . . . . . . . 60
4.12 Graphical example of modulo normalisation. . . . . . . . . . . . . . . . . . . . . . . . 61
4.13 Hardware realisation of modulo normalisation. . . . . . . . . . . . . . . . . . . . . . 61
4.14 Structure of rate 1/3 turbo encoder (dotted lines apply for trellis termination only) . 62
4.15 The workflow used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.16 Steps of floating point design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.17 Fixed point design is obtained by quantizing the floating point design . . . . . . . . 72
4.18 Steps of implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.19 BER rate curve for turbo codes using Map at different iterations . . . . . . . . . . . 74
4.20 comparison between max log map and map BER curves (interleaver size=1088 number of iterations = 3) . . . . . 75
4.21 comparison between different scaling factors (interleaver size=1088 number of iterations = 3) . . . . . 75
4.22 comparison between different sliding window techniques (interleaver size=1088 number of iterations = 3) . . . . . 76
4.23 comparison between two B units and no sliding window (interleaver size=1088 number of iterations = 3) . . . . . 76
4.24 comparison between different early stopping criteria . . . . . . . . . . . . . . . . . . 77
4.25 relation between BER and internal size of turbo decoder at SNR -9.16 dB and 2 iterations . . . . . 77
4.26 comparison between floating point and fixed point turbo decoder with internal width
of 11 (interleaver size=1088 number of iterations = 2) . . . . . . . . . . . . . . . . . 78
4.27 High-level VLSI architecture of the implemented max-log map decoder (thin boxes indicate registers). . . . . . 79
4.28 High-level VLSI architecture of the implemented turbo decoder. . . . . . . . . . . . . 80
4.29 The timing diagram of the implemented map decoder . . . . . . . . . . . . . . . . . 81
4.30 The timing diagram of the implemented map decoder . . . . . . . . . . . . . . . . . 81
4.31 The placed and routed design on FPGA . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.32 BER curves for the implemented decoder . . . . . . . . . . . . . . . . . . . . . . . . 85
5.1 Circular-buffer rate matching for turbo . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8.1 CHANNEL IMPAIRMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.2 Shadowing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.3 Interference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.4 SISO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
8.5 SIMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
8.6 MISO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.7 MIMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.8 MIMO single-user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
8.9 MIMO multi-user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
8.10 table 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
8.11 Frequency diversity Vs time at one slot . . . . . . . . . . . . . . . . . . . . . . . . . 182
8.12 Frequency diversity Vs time at two slots . . . . . . . . . . . . . . . . . . . . . . . . . 182
8.13 Two Antenna Delay Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.14 Two Antenna Cyclic Delay Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
8.15 Receive Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.16 main idea of Receive Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.17 Selective Combining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
8.18 branch selective diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
8.19 Threshold Combining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
8.20 Switch-and-examine strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
8.21 Switch-and-stay strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.22 Maximal Ratio Combining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.23 Transmit Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
8.24 Space-Time Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.25 Space-Frequency Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.26 Transmit Diversity Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.27 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.28 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.29 Two-Branch Transmit Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.30 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.31 Two-Branch transmit diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.33 Demodulation/decoding of spatially multiplexed signals based on successive interference cancellation . . . . . 211
8.34 2 × 2 MIMO channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8.35 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.36 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8.37 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8.38 Transmit diversity precoding and RE mapping for two antenna ports . . . . . . . . . 220
8.39 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8.40 Illustration of feedback-based MIMO precoding . . . . . . . . . . . . . . . . . . . . . 223
9.1 Spectral efficiency of OFDM compared to classical multicarrier modulation: (a) classical
multicarrier system spectrum; (b) OFDM system spectrum. . . . . . . . . . . . 232
9.2 Extension to wider transmission bandwidth by means of multi-carrier transmission. . 233
9.3 Per-subcarrier pulse shape and spectrum for basic OFDM transmission. . . . . . . . 234
9.4 OFDM subcarrier spacing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
9.5 Serial-to-Parallel (S/P) conversion operation for OFDM. . . . . . . . . . . . . . . . . 236
9.6 Effect of channel on signals with short and long symbol duration. . . . . . . . . . . . 237
9.7 OFDM system model: (a) transmitter; (b) receiver. . . . . . . . . . . . . . . . . . . . 238
9.8 OFDM Cyclic Prefix (CP) insertion. . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
9.9 PAPR distribution for different numbers of OFDM subcarriers. . . . . . . . . . . . . 241
9.10 Time dispersion and corresponding received-signal timing. . . . . . . . . . . . . . . . 243
9.11 Cyclic-prefix insertion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
9.12 Frequency-domain model of OFDM transmission/reception. . . . . . . . . . . . . . . 247
9.13 Frequency-domain model of OFDM transmission/reception with one-tap equalization
at the receiver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
9.14 Time-frequency grid with known reference symbols. . . . . . . . . . . . . . . . . . . . 248
9.15 OFDM as a user-multiplexing/multiple-access scheme: (a) downlink and (b) uplink . . 249
9.16 Distributed user multiplexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
9.17 Uplink transmission-timing control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
9.18 The LTE downlink physical resource . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
9.19 Frequency-domain structure for LTE downlink . . . . . . . . . . . . . . . . . . . . . 253
9.20 Detailed time-domain structure for LTE downlink transmission . . . . . . . . . . . . 254
9.21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
9.22 Downlink resource block assuming normal cyclic prefix (i.e., seven OFDM symbols per slot).
With extended cyclic prefix there are six OFDM symbols per slot. . . . . . . . . . . . 255
List of Tables
1.1 Key features of the air interfaces of WCDMA and LTE . . . . . . . . . . . . . . . . . 5
1.2 Key features of the radio access networks of UMTS and LTE . . . . . . . . . . . . . 6
1.3 Key features of the core networks of UMTS and LTE . . . . . . . . . . . . . . . . . . 6
4.1 Detailed power consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2 Summary of power consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3 Resources utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4 Throughput of the implemented design . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Chapter 1
Overview on LTE
1.1 Motivation For LTE
The evolution of 3G systems into 4G is driven by the creation and development of new services
for mobile devices, and is enabled by advancement of the technology available for mobile systems.
There has also been an evolution of the environment in which mobile systems are deployed and
operated, in terms of competition between mobile operators, challenges from other mobile tech-
nologies, and new regulation of spectrum use and market aspects of mobile systems.
The rapid evolution of the technology used in telecommunication systems, consumer electronics,
and specifically mobile devices has been remarkable in the last 20 years. Moore's law illustrates
this and indicates a continuing evolution of processor performance and increased memory size,
often combined with reduced size, power consumption, and cost for devices. High-resolution color
displays and megapixel camera sensors are also coming into all types of mobile devices.
Combined with a high-speed internet backbone often based on optical fiber networks, we see that a
range of technology enablers are in place to go hand-in-hand with advancement in mobile
communications technology such as LTE.
The rapid increase in use of the internet to provide all kinds of services since the 1990s started
at the same time as 2G and 3G mobile systems came into widespread use. The natural next step
was that those internet-based services also moved to the mobile devices, creating what is today
known as mobile broadband. Being able to support the same Internet Protocol (IP)-based services
in a mobile device that people use at home with a fixed broadband connection is a major
challenge and a prime driver for the evolution of LTE. A few services were already supported by the
evolved 2.5G systems, but it is not until the systems are designed primarily for IP-based services
that the real mobile IP revolution can take off. An interesting aspect of the migration of
broadband services to mobile devices is that a mobile flavor is also added. The mobile position and the
mobility and roaming capabilities do in fact create a whole new range of services tailored to the
mobile environment.
Fixed telephony (POTS) and earlier generations of mobile technology were built for circuit-switched
services, primarily voice. The first data services over GSM were circuit switched, with
packet-based GPRS coming in as a later addition. This also influenced the first development of 3G,
which was based on circuit-switched data, with packet-switched services as an add-on. It was
not until the 3G evolution into HSPA and later LTE/LTE-Advanced that packet-switched
services and IP were made the primary design target. The old circuit-switched services remain, but
on LTE they will be provided over IP, with Voice-over-IP (VoIP) as an example. IP is in itself service
agnostic and thereby enables a range of services with different requirements.
The main service-related design parameters for a radio interface supporting a variety of services
are:
Data rate. Many services with lower data rates such as voice services are important and still
occupy a large part of a mobile network's overall capacity, but it is the higher data rate services
that drive the design of the radio interface. The ever increasing demand for higher data rates for
web browsing, streaming and file transfer pushes the peak data rates for mobile systems from
kbit/s for 2G, to Mbit/s for 3G and getting close to Gbit/s for 4G.
increased user data rates as shown in figure ??
cell-edge bit-rate, for uniformity of service provision
Figure 1.1: Global total traffic in mobile networks, 2007-2012
Delay. Interactive services such as real-time gaming, but also web browsing and interactive
file transfer, have requirements for very low delay, making it a primary design target. There are,
however, many applications such as e-mail and television where the delay requirements are not as
strict. The delay for a packet sent from a server to a client and back is called latency.
delays, in terms of both connection establishment and transmission latency
Capacity. From the mobile system operator's point of view, it is not only the peak data rates
provided to the end-user that are of importance, but also the total data rate that can be
provided on average from each deployed base station site and per hertz of licensed spectrum. This
measure of capacity is called spectral efficiency. In the case of capacity shortage in a mobile
system, the Quality-of-Service (QoS) for the individual end-users may be degraded.
reduced cost per bit, implying improved spectral efficiency
greater flexibility of spectrum usage, in both new and pre-existing bands
Also
seamless mobility, including between different radio-access technologies
reasonable power consumption for the mobile terminal.
packet switched optimized
Figure 1.2: Main LTE performance targets
1.2 From UMTS to LTE
1.2.1 High Level Architecture of LTE
In 2004, 3GPP began a study into the long term evolution of UMTS. The aim was to keep 3GPP's
mobile communication systems competitive over timescales of 10 years and beyond, by delivering
the high data rates and low latencies that future users would require. Figure ?? shows the
resulting architecture and the way in which that architecture developed from that of UMTS.
In the new architecture, the evolved packet core (EPC) is a direct replacement for the packet
switched domain of UMTS and GSM. It distributes all types of information to the user, voice
as well as data, using the packet switching technologies that have traditionally been used for data
alone. There is no equivalent to the circuit switched domain: instead, voice calls are transported
using voice over IP. The evolved UMTS terrestrial radio access network (E-UTRAN) handles the
EPC's radio communications with the mobile, so is a direct replacement for the UTRAN. The
mobile is still known as the user equipment, though its internal operation is very different from
before.
Figure 1.3: Evolution of the system architecture from GSM and UMTS to LTE.
The new architecture was designed as part of two 3GPP work items, namely system architecture
evolution (SAE), which covered the core network, and long term evolution (LTE), which covered
the radio access network, air interface and mobile. Officially, the whole system is known as the
evolved packet system (EPS), while the acronym LTE refers only to the evolution of the air
interface. Despite this official usage, LTE has become a colloquial name for the whole system, and is
regularly used in this way by 3GPP.
1.2.2 Long Term Evolution
The main output of the study into long-term evolution was a requirements specification for the
air interface [6], in which the most important requirements were as follows. LTE was required to
deliver a peak data rate of 100 Mbps in the downlink and 50 Mbps in the uplink. This
requirement was exceeded in the eventual system, which delivers peak data rates of 300 Mbps and 75
Mbps respectively. For comparison, the peak data rate of WCDMA, in Release 6 of the 3GPP
specifications, is 14 Mbps in the downlink and 5.7 Mbps in the uplink.
It cannot be stressed too strongly, however, that these peak data rates can only be reached in
idealized conditions, and are wholly unachievable in any realistic scenario. A better measure is
the spectral efficiency, which expresses the typical capacity of one cell per unit bandwidth. LTE
was required to support a spectral efficiency three to four times greater than that of Release 6
WCDMA in the downlink and two to three times greater in the uplink.
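As a rough illustration of the units involved, the sketch below divides a data rate by a channel bandwidth to obtain a figure in bit/s/Hz. It uses the peak rates quoted above together with the 20 MHz and 5 MHz bandwidths listed in Table 1.1, so it yields peak values only, not the typical cell capacity that the requirement refers to. The function name is our own.

```python
# Spectral efficiency = data rate / channel bandwidth, in bit/s/Hz.
# The rates are the peak figures quoted in the text and the bandwidths come
# from Table 1.1, so these are peak values, not typical cell capacities.

def spectral_efficiency(rate_bps: float, bandwidth_hz: float) -> float:
    """Return the spectral efficiency in bit/s/Hz."""
    return rate_bps / bandwidth_hz

lte_dl = spectral_efficiency(300e6, 20e6)   # LTE downlink in a 20 MHz carrier
wcdma_dl = spectral_efficiency(14e6, 5e6)   # Release 6 WCDMA downlink in 5 MHz

print(f"LTE peak downlink:   {lte_dl:.1f} bit/s/Hz")
print(f"WCDMA peak downlink: {wcdma_dl:.1f} bit/s/Hz")
```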
Latency is another important issue, particularly for time-critical applications such as voice and
interactive games. There are two aspects to this. Firstly, the requirements state that the time
taken for data to travel between the mobile phone and the fixed network should be less than
five milliseconds, provided that the air interface is uncongested. Secondly, mobile phones can
operate in two states: an active state in which they are communicating with the network and a
low-power standby state. The requirements state that a phone should switch from standby to the
active state, after an intervention from the user, in less than 100 milliseconds.
There are also requirements on coverage and mobility. LTE is optimized for cell sizes up to 5 km,
works with degraded performance up to 30 km and supports cell sizes of up to 100 km. It is also
optimized for mobile speeds up to 15 km/h, works with high performance up to 120 km/h
and supports speeds of up to 350 km/h. Finally, LTE is designed to work with a variety of
different bandwidths, which range from 1.4 MHz up to a maximum of 20 MHz. Table 1.1
summarizes its key technical features, and compares them with those of WCDMA.
Feature                      WCDMA                  LTE
Multiple access scheme       WCDMA                  OFDMA and SC-FDMA
Frequency re-use             100%                   Flexible
Use of MIMO antennas         From Release 7         Yes
Bandwidth                    5 MHz                  1.4, 3, 5, 10, 15 or 20 MHz
Frame duration               10 ms                  10 ms
Transmission time interval   2 or 10 ms             1 ms
Modes of operation           FDD and TDD            FDD and TDD
Uplink timing advance        Not required           Required
Transport channels           Dedicated and shared   Shared
Uplink power control         Fast                   Slow
Table 1.1: Key features of the air interfaces of WCDMA and LTE
1.3 System Architecture Evolution
The main output of the study into system architecture evolution was a requirements specification
for the fixed network, in which the most important requirements were as follows.
The evolved packet core routes packets using the Internet Protocol (IP) and supports devices
that are using IP version 4, IP version 6, or dual stack IP version 4/version 6. In addition, the
EPC provides users with always-on connectivity to the outside world, by setting up a basic IP
connection for a device when it switches on and maintaining that connection until it switches off.
This is different from the behaviour of UMTS and GSM, in which the network only sets up an IP
connection on request and tears that connection down when it is no longer required.
The EPC is designed as a data pipe that simply transports information to and from the user: it
is not concerned with the information content or with the application. This is similar to the
behaviour of the internet, which transports packets that originate from any application software,
but is different from that of a traditional telecommunication system, in which the voice
application is an integral part of the system. Because of this, voice applications do not form part of
LTE: instead, voice calls are controlled by some external entity such as the IP multimedia
subsystem (IMS). The EPC simply transports the voice packets in the same way as any other data
stream.
Unlike the internet, the EPC contains mechanisms to specify and control the data rate, error rate
and delay that a data stream will receive. There is no explicit requirement on the maximum time
required for data to travel across the EPC, but the relevant specification suggests a user plane
latency of 10 milliseconds for a non-roaming mobile, increasing to 50 milliseconds in a typical
roaming scenario [8]. To calculate the total delay, we have to add the earlier figure for the
delay across the air interface, giving a typical delay in a non-roaming scenario of around 20
milliseconds.
The EPC is also required to support inter-system handovers between LTE and earlier 2G and 3G
technologies. These cover not only UMTS and GSM, but also non-3GPP systems such as cdma2000
and WiMAX. Tables 1.2 and 1.3 summarize the key features of the radio access network and the
evolved packet core, and compare them with the corresponding features of UMTS.
Feature                            UMTS                            LTE
Radio access network components    Node B, RNC                     eNB
RRC protocol states                CELL_DCH, CELL_FACH,            RRC_IDLE,
                                   CELL_PCH, URA_PCH, RRC_IDLE     RRC_CONNECTED
Handovers                          Soft and hard                   Hard
Neighbour lists                    Always required                 Not required
Table 1.2: Key features of the radio access networks of UMTS and LTE
Feature                  UMTS                         LTE
IP version support       IPv4 and IPv6                IPv4 and IPv6
USIM version support     Release 99 USIM onwards      Release 99 USIM onwards
Transport mechanisms     Circuit & packet switching   Packet switching
CS domain components     MSC server, MGW              n/a
PS domain components     SGSN, GGSN                   MME, S-GW, P-GW
IP connectivity          After registration           During registration
Voice and SMS            Included                     External
Table 1.3: Key features of the core networks of UMTS and LTE
Bibliography
[1] Christopher Cox. An Introduction to LTE. John Wiley & Sons Ltd, 2012.
[2] Erik Dahlman, Stefan Parkvall, and Johan Sköld. 4G: LTE/LTE-Advanced for Mobile Broadband.
Elsevier Ltd., 2011.
[3] Harri Holma and Antti Toskala. LTE for UMTS: OFDMA and SC-FDMA Based Radio Access.
John Wiley & Sons, Ltd, 2009.
Chapter 2
FPGA
Field programmable gate arrays (FPGAs) are digital integrated circuits (ICs) that contain
configurable (programmable) blocks of logic along with configurable interconnects between these
blocks. Design engineers can configure (program) such devices to perform a tremendous variety
of tasks.
2.1 Key factors for describing FPGAs
2.1.1 Fabrication process
The more advanced fabrication process brings higher integration, and thus higher density and/or
reduced size of chips.
2.1.2 Logic density
For the logic structure consisting of a 4-input look-up table (LUT), a D flip-flop and some
additional circuitry, Xilinx uses the term LC (Logical Cell). The terminology used for expressing the
logic density of FPGAs is quite confusing. The point is that we need a unit to express the logic
capability of an FPGA. The problem is how to define this unit. As new features are introduced
into a logic block, its functionality increases and can no longer be easily expressed in terms of LCs.
Xilinx uses the term CLB (Configurable Logic Block) to name the basic logic block of all its
FPGAs. Each CLB has 8 LCs. But since these 8 LCs provide greater functionality than they would
if they were separate, Xilinx now uses the unit ELC (Equivalent Logic Cell, 1 ELC = 1.125 LC) to state
the complexity of its FPGAs. To complicate matters further, Xilinx introduced the term ASMBL
(Advanced Silicon Modular Block, pronounced like "assemble") to describe the new feature-rich
architecture of its Virtex-4 building blocks.
2.1.3 Clock management
Clock management comprises two basic functions:
Remove clock skew and propagation delay
All parts of a digital circuit need to be synchronized to a desired clock signal. If the circuit is
large, complex, and operating at high frequencies, the clock propagation delay and clock skew
have a great impact on its performance. Therefore, providing a clock signal with zero delay in
all parts of an FPGA becomes crucial. Generally, this can be done using either DLLs (Delay-Locked
Loops) or PLLs (Phase-Locked Loops). Both types of circuit yield the same result: they
compensate for the delay generated on the routing network inside the FPGA, providing a
zero-delay clock signal (with respect to a user source clock) to different parts of the FPGA.
Clock generation and phase shifting
Since the whole point of FPGAs lies in their configurability, having the option to make different
parts (called clock domains) of the same FPGA work at different frequencies dramatically
simplifies the design, at the same time improving the performance. Clock multiplication gives the
designer a number of design alternatives. For instance, a 50 MHz source clock multiplied 4X by the
DLL/PLL can drive an FPGA design operating at 200 MHz. This technique can simplify board
design because the clock path on the board no longer distributes such a high-speed signal.
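The arithmetic behind clock multiplication can be sketched as follows. This is only a numerical illustration, assuming the common synthesis relation f_out = f_in × M / D. The function name and the second set of multiply/divide values are hypothetical and are not taken from any device datasheet.

```python
from fractions import Fraction

def synthesized_clock_mhz(f_in_mhz, multiply, divide=1):
    """Output frequency of an idealized DLL/PLL: f_out = f_in * M / D."""
    return Fraction(f_in_mhz) * multiply / divide

board_clock = 50  # MHz, the slow clock routed on the board

# The example from the text: a 50 MHz source multiplied by 4.
fabric_clock = synthesized_clock_mhz(board_clock, multiply=4)
print(f"{board_clock} MHz x 4 = {float(fabric_clock)} MHz")

# A second clock domain derived from the same source (hypothetical M and D).
slow_domain = synthesized_clock_mhz(board_clock, multiply=3, divide=2)
print(f"{board_clock} MHz x 3 / 2 = {float(slow_domain)} MHz")
```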
2.1.4 On-chip memory
As FPGA applications grow in complexity, so does their need for memory. Using look-up tables
as registers for storing data could not possibly provide enough space for serious applications,
especially if these applications require numerous arithmetic computations and are time
dependent. As this is often the case, external memory cannot deliver the desired efficiency.
This is why, with every new generation of FPGAs, more and more memory is embedded into the
FPGA. The main advantages of embedded (built-in) memory are:
Short access time
High bandwidth
Great versatility
Versatility means that the embedded memory can behave like various memory forms, and
implement some of the most commonly used memory functions, including RAM
(synchronous/asynchronous), ROM, FIFOs, buffers, caches, shift registers, etc.
2.1.5 DSP capabilities
The majority of FPGA applications require some sort of Digital Signal Processing (DSP). DSP
requires many computations to take place in short periods of time. In order to reduce the time
these computations take, and to increase efficiency, computations are executed in parallel and in
pipelined fashion. FPGAs are ideal for implementing this pipelined mode of DSP, thanks to their
adaptable structure. Over the years, FPGA manufacturers have developed special DSP units to
help designers fully exploit the FPGA's possibilities. These units are designed to optimize
execution of the most commonly used DSP algorithms (filtering, compression, encoding/decoding,
equalization, digital conversion, FFT, modulation, etc.). They usually contain a great number of
multipliers (in parallel), accumulators, shift registers, and adders.
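To make the role of these building blocks concrete, the following sketch models a direct-form FIR filter in software. It is a behavioural illustration only, not the design implemented in this project. The delay line stands in for a shift register, the list of products for a bank of parallel multipliers, and the sum for an adder tree. The tap values are arbitrary.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: each output is a sum of len(taps) products."""
    delay_line = [0] * len(taps)                  # models the shift register
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]        # shift in the newest sample
        # In hardware these products can all be computed in the same cycle.
        products = [c * d for c, d in zip(taps, delay_line)]
        outputs.append(sum(products))             # adder tree / accumulator
    return outputs

taps = [1, 2, 1]                                  # arbitrary example coefficients
print(fir_filter([1, 0, 0, 0], taps))             # impulse response of the filter
```

Feeding a unit impulse returns the tap values themselves, which is a quick sanity check for any FIR model.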
2.2 Virtex-5 FPGA Features
2.2.1 Summary of Virtex-5 FPGA Features
Cross-platform compatibility
Most advanced, high-performance, optimal-utilization, FPGA fabric
Real 6-input look-up table (LUT) technology
Dual 5-LUT option
Improved reduced-hop routing
64-bit distributed RAM option
SRL32/Dual SRL16 option
Powerful clock management tile (CMT) clocking
Digital Clock Manager (DCM) blocks for zero delay buffering, frequency synthesis, and
clock phase shifting
PLL blocks for input jitter filtering, zero delay buffering, frequency synthesis, and
phase-matched clock division
36-Kbit block RAM/FIFOs
True dual-port RAM blocks
Enhanced optional programmable FIFO logic
Programmable
True dual-port widths up to x36
Simple dual-port widths up to x72
Built-in optional error-correction circuitry
Optionally program each block as two independent 18-Kbit blocks
High-performance parallel SelectIO technology
1.2 to 3.3V I/O Operation
Source-synchronous interfacing using ChipSync technology
Digitally-controlled impedance (DCI) active termination
Flexible fine-grained I/O banking
High-speed memory interface support
Advanced DSP48E slices
25 x 18, two's complement, multiplication
Optional adder, subtracter, and accumulator
Optional pipelining
Optional bitwise logical functionality
Dedicated cascade connections
Flexible configuration options
SPI and Parallel FLASH interface
Multi-bitstream support with dedicated fallback reconfiguration logic
Auto bus width detection capability
System Monitoring capability on all devices
On-chip/Off-chip thermal monitoring
On-chip/Off-chip power supply monitoring
JTAG access to all monitored quantities
Integrated Endpoint blocks for PCI Express Designs
LXT, SXT, TXT, and FXT Platforms
Compliant with the PCI Express Base Specification 1.1
x1, x4, or x8 lane support per block
Works in conjunction with RocketIO transceivers
Tri-mode 10/100/1000 Mb/s Ethernet MACs
RocketIO transceivers can be used as PHY or connect to external PHY using many
soft MII (Media Independent Interface) options
RocketIO GTP transceivers 100 Mb/s to 3.75 Gb/s
LXT and SXT Platforms
RocketIO GTX transceivers 150 Mb/s to 6.5 Gb/s
TXT and FXT Platforms
PowerPC 440 Microprocessors
FXT Platform only
RISC architecture
7-stage pipeline
32-Kbyte instruction and data caches included
Optimized processor interface structure (crossbar)
65-nm copper CMOS process technology
1.0V core voltage
High signal-integrity flip-chip packaging available in standard or Pb-free package options
Notes:
1. Virtex-5 FPGA slices are organized differently from previous generations. Each Virtex-5
FPGA slice contains four LUTs and four flip-flops (previously it was two LUTs and
two flip-flops).
2. Each DSP48E slice contains a 25 x 18 multiplier, an adder, and an accumulator.
3. Block RAMs are fundamentally 36 Kbits in size. Each block can also be used as two
independent 18-Kbit blocks.
4. Each Clock Management Tile (CMT) contains two DCMs and one PLL.
5. This table lists separate Ethernet MACs per device.
6. RocketIO GTP transceivers are designed to run from 100 Mb/s to 3.75 Gb/s. RocketIO
GTX transceivers are designed to run from 150 Mb/s to 6.5 Gb/s.
7. This number does not include RocketIO transceivers.
8. Includes configuration Bank 0.
2.2.2 Virtex-5 FPGA Logic
On average, one to two speed grade improvement over Virtex-4 devices
Cascadable 32-bit variable shift registers or 64-bit distributed memory capability
Superior routing architecture with enhanced diagonal routing supports block-to-block con-
nectivity with minimal hops
Up to 330,000 logic cells including:
Up to 207,360 internal fabric flip-flops with clock enable (XC5VLX330)
Up to 207,360 real 6-input look-up tables (LUTs) with greater than 13 million total LUT
bits
Two outputs for dual 5-LUT mode gives enhanced utilization
Logic expanding multiplexers and I/O registers
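The figures in this list can be cross-checked with a little arithmetic. A 6-input LUT needs 2^6 = 64 configuration bits, one per input combination, and the notes in Section 2.2.1 state that each slice holds four LUTs. The sketch below simply multiplies these numbers out. The variable names are our own.

```python
LUT_INPUTS = 6
LUT_COUNT = 207_360      # largest device (XC5VLX330), from the list above
LUTS_PER_SLICE = 4       # from the notes in Section 2.2.1

bits_per_lut = 2 ** LUT_INPUTS               # one stored bit per input combination
total_lut_bits = LUT_COUNT * bits_per_lut    # should exceed 13 million
slices = LUT_COUNT // LUTS_PER_SLICE

print(f"bits per LUT:   {bits_per_lut}")
print(f"total LUT bits: {total_lut_bits:,}")
print(f"slices:         {slices:,}")
```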
2.2.3 550 MHz Clock Technology
Up to six Clock Management Tiles (CMTs)
Each CMT contains two DCMs and one PLL, for up to eighteen total clock generators
Flexible DCM-to-PLL or PLL-to-DCM cascade
Precision clock deskew and phase shift
Flexible frequency synthesis
Multiple operating modes to ease performance trade-off decisions
Improved maximum input/output frequency
Fine-grained phase shifting resolution
Input jitter filtering
Low-power operation
Wide phase shift range
Differential clock tree structure for optimized low-jitter clocking and precise duty cycle
32 global clock networks
Regional, I/O, and local clocks in addition to global clocks
2.2.4 SelectIO Technology
Up to 1,200 user I/Os
Wide selection of I/O standards from 1.2V to 3.3V
Extremely high-performance
Up to 800 Mb/s HSTL and SSTL (on all single-ended I/Os)
Up to 1.25 Gb/s LVDS (on all differential I/O pairs)
True differential termination on-chip
Same edge capture at input and output I/Os
Extensive memory interface support
2.2.5 550 MHz Integrated Block Memory
Up to 16.4 Mbits of integrated block memory
36-Kbit blocks with optional dual 18-Kbit mode
True dual-port RAM cells
Independent port width selection (x1 to x72)
Up to x36 total per port for true dual port operation
Up to x72 total per port for simple dual port operation (one Read port and one Write
port)
Memory bits plus parity/sideband memory support for x9, x18, x36, and x72 widths
Configurations from 32K x 1 to 512 x 72 (8K x 4 to 512 x 72 for FIFO operation)
Multirate FIFO support logic
Full and Empty flag with fully programmable Almost Full and Almost Empty flags
Synchronous FIFO support without Flag uncertainty
Optional pipeline stages for higher performance
Byte-write capability
Dedicated cascade routing to form 64K x 1 memory without using FPGA routing
Integrated optional ECC for high-reliability memory requirements
Special reduced-power design for 18 Kbit (and below)
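The range of configurations quoted above follows from trading depth against width within one block. As a sketch, assuming that the widths that are multiples of 9 (x9, x18, x36, x72) use the full 36 Kbit including parity, while the narrower widths use only the 32 Kbit of data, the depths can be computed as follows. The function name is our own.

```python
BLOCK_BITS = 36 * 1024   # full block, including parity/sideband bits
DATA_BITS = 32 * 1024    # data bits only, used by the x1, x2 and x4 widths

def depth_for_width(width):
    """Number of addressable words for a given port width."""
    # Assumption: widths that are multiples of 9 also use the parity bits.
    capacity = BLOCK_BITS if width % 9 == 0 else DATA_BITS
    return capacity // width

for width in (1, 2, 4, 9, 18, 36, 72):
    print(f"{depth_for_width(width):>6} x {width}")
```

The first and last rows reproduce the 32K x 1 and 512 x 72 endpoints given in the list.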
2.2.6 550 MHz DSP48E Slices
25 x 18 two's complement multiplication
Optional pipeline stages for enhanced performance
Optional 48-bit accumulator for multiply accumulate (MACC) operation with optional
accumulator cascade to 96 bits
Integrated adder for complex-multiply or multiply-add operation
Optional bitwise logical operation modes
Independent C registers per slice
Fully cascadable in a DSP column without external routing resources
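The multiply-accumulate operation listed above can be modelled in a few lines. This is a behavioural sketch only, assuming simple wrap-around on overflow. It models the 25-bit and 18-bit two's complement operands and the 48-bit accumulator, but none of the pipeline registers or cascade paths of the real slice. The helper names are our own.

```python
def to_signed(value, bits):
    """Wrap an integer to a two's complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def macc(pairs, acc_bits=48):
    """Accumulate the products a*b with a 25-bit 'a' and an 18-bit 'b'."""
    acc = 0
    for a, b in pairs:
        product = to_signed(a, 25) * to_signed(b, 18)   # fits in 43 bits
        acc = to_signed(acc + product, acc_bits)        # wraps on overflow
    return acc

print(macc([(3, 4), (-2, 5), (100, -7)]))   # 12 - 10 - 700
```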
2.2.7 Digitally Controlled Impedance (DCI) Active I/O Termination
Optional series or parallel termination
Temperature and voltage compensation
Makes board layout much easier
Reduces resistors
Places termination in the ideal location, at the signal
2.2.8 Advanced Flip-Chip Packaging
Pre-engineered packaging technology for proven superior signal integrity
Minimized inductive loops from signal to return
Optimal signal-to-PWR/GND ratios
Reduces SSO induced noise by up to 7x
Pb-Free and standard packages
2.2.9 System Monitor
On-chip temperature measurement (±4°C)
On-chip power supply measurement (±1%)
Easy to use, self-contained
No design required for basic operation
Autonomous monitoring of all on-chip sensors
User programmable alarm thresholds for on-chip sensors
User accessible 10-bit 200kSPS ADC
Automatic calibration of offset and gain error
DNL = 0.9 LSBs maximum
Up to 17 external analog input channels supported
0V to 1V input range
Monitor external sensors e.g., voltage, temperature
General purpose analog inputs
Full access from fabric or JTAG TAP to System Monitor
Fully operational prior to FPGA configuration and during device power down (access via
JTAG TAP only)
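The ADC figures above (10 bits, 200 kSPS, 0 V to 1 V input range) imply a resolution and a sample period that are easy to work out. The sketch below assumes an ideal unipolar transfer function, which is our simplification and not a statement about the device's actual calibration. The names are hypothetical.

```python
ADC_BITS = 10
V_FULL_SCALE = 1.0        # volts, from the 0 V to 1 V input range
SAMPLE_RATE = 200e3       # samples per second

def code_to_volts(code):
    """Convert an ADC output code to volts (ideal unipolar transfer function)."""
    if not 0 <= code < 2 ** ADC_BITS:
        raise ValueError("code out of range for a 10-bit converter")
    return code * V_FULL_SCALE / 2 ** ADC_BITS

lsb_mv = 1000 * V_FULL_SCALE / 2 ** ADC_BITS   # size of one code step in mV
sample_period_us = 1e6 / SAMPLE_RATE           # time between samples in microseconds

print(f"1 LSB         = {lsb_mv:.3f} mV")
print(f"sample period = {sample_period_us:.1f} us")
print(f"code 512      = {code_to_volts(512):.3f} V")
```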
2.2.10 65-nm Copper CMOS Process
1.0V Core Voltage
12-layer metal provides maximum routing capability and accommodates hard-IP immersion
Triple-oxide technology for proven reduced static power consumption
2.2.11 Tri-Mode Ethernet Media Access Controller
Designed to the IEEE 802.3-2002 specification
Operates at 10, 100, and 1,000 Mb/s
Supports tri-mode auto-negotiation
Receive address filter (5 address entries)
Fully monolithic 1000Base-X solution with RocketIO GTP transceivers
Supports multiple external PHY connections (RGMII, GMII, etc.) interfaces through soft
logic and SelectIO resources
Supports connection to external PHY device through SGMII using soft logic and RocketIO
GTP transceivers
Receive and transmit statistics available through separate interface
Separate host and client interfaces
Support for jumbo frames
Support for VLAN
Flexible, user-configurable host interface
Supports IEEE 802.3ah-2004 unidirectional mode
2.2.12 RocketIO GTP Transceivers (LXT/SXT only)
Full-duplex serial transceiver capable of 100 Mb/s to 3.75 Gb/s baud rates
8B/10B, user-defined FPGA logic, or no encoding options
Channel bonding support
CRC generation and checking
Programmable pre-emphasis or pre-equalization for the transmitter
Programmable termination and voltage swing
Programmable equalization for the receiver
Receiver signal detect and loss of signal indicator
User dynamic reconfiguration using secondary configuration bus
Out of Band (OOB) support for Serial ATA (SATA)
Electrical idle, beaconing, receiver detection, and PCI Express and SATA spread-spectrum
clocking support
Less than 100 mW typical power consumption
Built-in PRBS Generators and Checkers
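When 8B/10B encoding is selected, every 8 data bits are sent as a 10-bit symbol, so only 80% of the line rate carries payload. The following sketch applies that ratio to the line rates quoted above. The function name is our own, and protocol overhead above the line code is ignored.

```python
def payload_rate_gbps(line_rate_gbps, data_bits=8, coded_bits=10):
    """Payload rate left after line coding (8B/10B by default)."""
    return line_rate_gbps * data_bits / coded_bits

for line_rate in (0.1, 3.75):     # GTP range: 100 Mb/s to 3.75 Gb/s
    print(f"{line_rate} Gb/s line rate -> {payload_rate_gbps(line_rate):.2f} Gb/s payload")
```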
2.3 Architectural Description
2.3.1 Virtex-5 FPGA Array Overview
Virtex-5 devices are user-programmable gate arrays with various configurable elements and
embedded cores optimized for high-density and high-performance system designs. Virtex-5 devices
implement the following functionality:
I/O blocks provide the interface between package pins and the internal configurable logic.
Most popular and leading-edge I/O standards are supported by programmable I/O blocks
(IOBs). The IOBs can be connected to very flexible ChipSync logic for enhanced
source-synchronous interfacing. Source-synchronous optimizations include per-bit deskew (on both
input and output signals), data serializers/deserializers, clock dividers, and dedicated I/O
and local clocking resources.
Configurable Logic Blocks (CLBs), the basic logic elements for Xilinx FPGAs, provide
combinatorial and synchronous logic as well as distributed memory and SRL32 shift register
capability. Virtex-5 FPGA CLBs are based on real 6-input look-up table technology and
provide superior capabilities and performance compared to previous generations of
programmable logic.
Block RAM modules provide flexible 36 Kbit true dual-port RAM that are cascadable to
form larger memory blocks. In addition, Virtex-5 FPGA block RAMs contain optional
programmable FIFO logic for increased device utilization. Each block RAM can also be
configured as two independent 18 Kbit true dual-port RAM blocks, providing memory
granularity for designs needing smaller RAM blocks.
Cascadable embedded DSP48E slices with 25 x 18 two's complement multipliers and 48-bit
adder/subtracter/accumulator provide massively parallel DSP algorithm support. In
addition, each DSP48E slice can be used to perform bitwise logical functions.
Clock Management Tile (CMT) blocks provide the most flexible, highest-performance
clocking for FPGAs. Each CMT contains two Digital Clock Manager (DCM) blocks (self-calibrating,
fully digital), and one PLL block (self-calibrating, analog) for clock distribution delay
compensation, clock multiplication/division, coarse-/fine-grained clock phase shifting, and
input clock jitter filtering.
Additionally, LXT, SXT, TXT, and FXT devices also contain:
Integrated Endpoint blocks for PCI Express designs providing x1, x4, or x8 PCI Express
Endpoint functionality. When used in conjunction with RocketIO transceivers, a complete
PCI Express Endpoint can be implemented with minimal FPGA logic utilization.
10/100/1000 Mb/s Ethernet media-access control blocks offer Ethernet capability.
LXT and SXT devices contain:
RocketIO GTP transceivers capable of running up to 3.75 Gb/s. Each GTP transceiver
supports full-duplex, clock-and-data recovery.
TXT and FXT devices contain:
GTX transceivers capable of running up to 6.5 Gb/s. Each GTX transceiver supports
full-duplex, clock-and-data recovery.
FXT devices contain:
Embedded IBM PowerPC 440 RISC CPUs. Each PowerPC 440 CPU is capable of running
up to 550 MHz. Each PowerPC 440 CPU also has an APU (Auxiliary Processor Unit)
interface that supports hardware acceleration, and an integrated cross-bar for high data
throughput.
The general routing matrix (GRM) provides an array of routing switches between each
internal component. Each programmable element is tied to a switch matrix, allowing multiple
connections to the general routing matrix. The overall programmable interconnection is
hierarchical and designed to support high-speed designs. In Virtex-5 devices, the routing
connections are optimized to support CLB interconnection in the fewest number of hops.
Reducing hops greatly increases post place-and-route (PAR) design performance. All
programmable elements, including the routing resources, are controlled by values stored in
static storage elements. These values are loaded into the FPGA during configuration and
can be reloaded to change the functions of the programmable elements.
2.3.2 Virtex-5 FPGA Features
This section briefly describes the features of the Virtex-5 family of FPGAs.
2.3.3 Input/Output Blocks (SelectIO)
IOBs are programmable and can be categorized as follows:
Programmable single-ended or differential (LVDS) operation
Input block with an optional single data rate (SDR) or double data rate (DDR) register
Output block with an optional SDR or DDR register
Bidirectional block
Per-bit deskew circuitry
Dedicated I/O and regional clocking resources
Built-in data serializer/deserializer
The IOB registers are either edge-triggered D-type flip-flops or level-sensitive latches. IOBs
support the following single-ended standards:
LVTTL
LVCMOS (3.3V, 2.5V, 1.8V, 1.5V, and 1.2V)
PCI (33 and 66 MHz)
PCI-X
GTL and GTLP
HSTL 1.5V and 1.8V (Class I, II, III, and IV)
HSTL 1.2V (Class I)
SSTL 1.8V and 2.5V (Class I and II)
The Digitally Controlled Impedance (DCI) I/O feature can be configured to provide on-chip
termination for each single-ended I/O standard and some differential I/O standards. The IOB
elements also support the following differential signaling I/O standards:
LVDS and Extended LVDS (2.5V only)
BLVDS (Bus LVDS)
ULVDS
HyperTransport
Differential HSTL 1.5V and 1.8V (Class I and II)
Differential SSTL 1.8V and 2.5V (Class I and II)
RSDS (2.5V point-to-point)
Two adjacent pads are used for each differential pair. Two or four IOB blocks connect to one
switch matrix to access the routing resources. Per-bit deskew circuitry allows for programmable
signal delay internal to the FPGA. Per-bit deskew flexibly provides fine-grained increments of
delay to carefully produce a range of signal delays. This is especially useful for synchronizing
signal edges in source-synchronous interfaces. General purpose I/O in select locations (eight
per bank) are designed to be regional clock capable I/O by adding special hardware connections
for I/O in the same locality. These regional clock inputs are distributed within a limited
region to minimize clock skew between IOBs. Regional I/O clocking supplements the global
clocking resources. Data serializer/deserializer capability is added to every I/O to support
source-synchronous interfaces. A serial-to-parallel converter with associated clock divider is
included in the input path, and a parallel-to-serial converter in the output path. An in-depth
guide to the Virtex-5 FPGA IOB is found in the Virtex-5 FPGA Tri-Mode Ethernet MAC User Guide.
2.3.4 Configurable Logic Blocks (CLBs)
A Virtex-5 FPGA CLB resource is made up of two slices. Each slice is equivalent and contains:
Four function generators
Four storage elements
Arithmetic logic gates
Large multiplexers
Fast carry look-ahead chain
The function generators are configurable as 6-input LUTs or dual-output 5-input LUTs. SLICEMs
in some CLBs can be configured to operate as 32-bit shift registers (or 16-bit x 2 shift
registers) or as 64-bit distributed RAM. In addition, the four storage elements can be
configured as either edge-triggered D-type flip-flops or level-sensitive latches. Each CLB has
internal fast interconnect and connects to a switch matrix to access general routing resources.
The Virtex-5 FPGA CLBs are further discussed in the Virtex-5 FPGA User Guide.
2.3.5 Block RAM
The 36 Kbit true dual-port RAM block resources are programmable from 32K x 1 to 512 x 72, in
various depth and width configurations. In addition, each 36 Kbit block can also be configured
to operate as two independent 18 Kbit dual-port RAM blocks. Each port is totally synchronous
and independent, offering three read-during-write modes. Block RAM is cascadable to implement
large embedded storage blocks. Additionally, back-end pipeline registers, clock control circuitry,
built-in FIFO support, ECC, and byte write enable features are also provided as options. The
block RAM feature in Virtex-5 devices is further discussed in the Virtex-5 FPGA User Guide.
2.3.6 Global Clocking
The CMTs and global-clock multiplexer buffers provide a complete solution for designing
high-speed clock networks. Each CMT contains two DCMs and one PLL. The DCMs and PLLs can
be used independently or extensively cascaded. Up to six CMT blocks are available, providing up
to eighteen total clock generator elements. Each DCM provides familiar clock generation
capability. To generate deskewed internal or external clocks, each DCM can be used to eliminate
clock distribution delay. The DCM also provides 90°, 180°, and 270° phase-shifted versions of
the output clocks. Fine-grained phase shifting offers higher-resolution phase adjustment with
fraction of the clock period increments. Flexible frequency synthesis provides a clock output
frequency equal to a fractional or integer multiple of the input clock frequency. To augment
the DCM capability, Virtex-5 FPGA CMTs also contain a PLL. This block provides reference clock
jitter filtering and further frequency synthesis options. Virtex-5 devices have 32 global-clock
MUX buffers. The clock tree is designed to be differential. Differential clocking helps reduce
jitter and duty cycle distortion.
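As a small worked illustration of the frequency synthesis and phase shifting described above, the following Python sketch computes an output frequency as a rational multiple M/D of the input clock and converts a phase shift into a time delay. The function names and the example values (100 MHz input, M = 7, D = 4) are our own illustrative assumptions and are not taken from the Virtex-5 documentation.

```python
from fractions import Fraction


def synthesized_clock_mhz(f_in_mhz, multiply, divide):
    """Output frequency f_in * M / D, kept exact with rational arithmetic."""
    if multiply < 1 or divide < 1:
        raise ValueError("multiply and divide must be positive integers")
    return Fraction(f_in_mhz) * Fraction(multiply, divide)


def phase_delay_ns(f_mhz, phase_deg):
    """Delay in nanoseconds that corresponds to a phase shift at f_mhz."""
    period_ns = Fraction(1000) / Fraction(f_mhz)
    return period_ns * Fraction(phase_deg, 360)


# Hypothetical example: 100 MHz reference clock, M = 7, D = 4.
f_out = synthesized_clock_mhz(100, 7, 4)
quarter = phase_delay_ns(100, 90)
print(f"f_out = {float(f_out)} MHz, 90 degree shift at 100 MHz = {float(quarter)} ns")
```

The allowed ranges of M and D depend on the device and speed grade, so they are deliberately not checked here.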
2.3.7 DSP48E Slices
DSP48E slice resources contain a 25 x 18 two's complement multiplier and a 48-bit
adder/subtracter/accumulator. Each DSP48E slice also contains extensive cascade capability to
efficiently implement high-speed DSP algorithms. The Virtex-5 FPGA DSP48E slice features are
further discussed in Virtex-5 FPGA XtremeDSP Design Considerations.
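To make the word lengths above concrete, the following Python sketch models a multiply-accumulate step with a 25-bit by 18-bit two's complement multiplier feeding a 48-bit accumulator that wraps on overflow. It is only a behavioural sketch under our own assumptions: the function names are hypothetical, and pipeline registers, opmodes, and the cascade paths of the real slice are not modelled.

```python
def to_signed(value, bits):
    """Wrap an integer into the signed two's complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b with 25 x 18 -> 48-bit widths."""
    a = to_signed(a, 25)          # 25-bit multiplier operand
    b = to_signed(b, 18)          # 18-bit multiplier operand
    return to_signed(acc + a * b, 48)


# Accumulate a short dot product, as a FIR filter tap loop would.
acc = 0
for coeff, sample in [(3, 100), (-2, 50), (7, -10)]:
    acc = mac(acc, coeff, sample)
print(acc)
```

The wrap-around in `to_signed` is what a fixed-width hardware accumulator does when no saturation logic is added.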
2.3.8 Routing Resources
All components in Virtex-5 devices use the same interconnect scheme and the same access to the
global routing matrix. In addition, the CLB-to-CLB routing is designed to offer a complete set
of connectivity in as few hops as possible. Timing models are shared, greatly improving the
predictability of the performance for high-speed designs.
2.3.9 Boundary Scan
Boundary-Scan instructions and associated data registers support a standard methodology for
accessing and configuring Virtex-5 devices, complying with IEEE standards 1149.1 and 1532.
2.3.10 Configuration
Virtex-5 devices are configured by loading the bitstream into internal configuration memory
using one of the following modes:
Slave-serial mode
Master-serial mode
Slave SelectMAP mode
Master SelectMAP mode
Boundary-Scan mode (IEEE-1532 and -1149)
SPI mode (Serial Peripheral Interface standard Flash)
BPI-up/BPI-down modes (Byte-wide Peripheral Interface standard x8 or x16 NOR Flash)
In addition, Virtex-5 devices also support the following configuration options:
256-bit AES bitstream decryption for IP protection
Multi-bitstream management (MBM) for cold/warm boot support
Parallel configuration bus width auto-detection
Parallel daisy chain
Configuration CRC and ECC support for the most robust, flexible device integrity checking
Virtex-5 device configuration is further discussed in the Virtex-5 FPGA Configuration
Guide.
2.3.11 System Monitor
FPGAs are an important building block in high availability/reliability infrastructure. Therefore,
there is a need to better monitor the on-chip physical environment of the FPGA and its immediate
surroundings within the system. For the first time, the Virtex-5 family System Monitor
facilitates easier monitoring of the FPGA and its external environment. Every member of the
Virtex-5 family contains a System Monitor block. The System Monitor is built around a 10-bit
200 kSPS ADC (Analog-to-Digital Converter). This ADC is used to digitize a number of on-chip
sensors to provide information about the physical environment within the FPGA. On-chip sensors
include a temperature sensor and power supply sensors. Access to the external environment is
provided via a number of external analog input channels. These analog inputs are general purpose
and can be used to digitize a wide variety of voltage signal types. Support for unipolar,
bipolar, and true differential input schemes is provided. There is full access to the on-chip
sensors and external channels via the JTAG TAP, allowing the existing JTAG infrastructure on the
PC board to be used for analog test and advanced diagnostics during development or after
deployment in the field. The System Monitor is fully operational after power up and before
configuration of the FPGA. System Monitor does not require an explicit instantiation in a design
to gain access to its basic functionality. This allows the System Monitor to be used even at a
late stage in the design cycle. The Virtex-5 FPGA System Monitor is further discussed in the
Virtex-5 FPGA System Monitor User Guide.
2.3.12 Virtex-5 LXT, SXT, TXT, and FXT Platform Features
This section briefly describes blocks available only in LXT, SXT, TXT, and FXT devices.
2.3.13 Tri-Mode (10/100/1000 Mb/s) Ethernet MACs
Virtex-5 LXT, SXT, TXT, and FXT devices contain up to eight embedded Ethernet MACs, two
per Ethernet MAC block. The blocks have the following characteristics:
Designed to the IEEE 802.3-2002 specification
UNH-compliance tested
RGMII/GMII interface with SelectIO, or SGMII interface when used with RocketIO transceivers
Half or full duplex
Supports jumbo frames
1000 Base-X PCS/PMA: when used with a RocketIO GTP transceiver, can provide a complete
1000 Base-X implementation on-chip
DCR-bus connection to microprocessors
2.3.14 Integrated Endpoint Blocks for PCI Express
Virtex-5 LXT, SXT, TXT, and FXT devices contain up to four integrated Endpoint blocks. These
blocks implement Transaction Layer, Data Link Layer, and Physical Layer functions to provide
complete PCI Express Endpoint functionality with minimal FPGA logic utilization. The blocks
have the following characteristics:
Compliant with the PCI Express Base Specification 1.1
Works in conjunction with RocketIO transceivers to provide complete endpoint functionality
x1, x4, or x8 lane support per block
2.3.15 Virtex-5 LXT and SXT Platform Features
This section briefly describes blocks available only in LXT and SXT devices.
2.3.16 RocketIO GTP Transceivers
4 - 24 channel RocketIO GTP transceivers capable of running 100 Mb/s to 3.75 Gb/s
Full clock and data recovery
8/16-bit or 10/20-bit datapath support
Optional 8B/10B or FPGA-based encode/decode
Integrated FIFO/elastic buffer
Channel bonding and clock correction support
Embedded 32-bit CRC generation/checking
Integrated comma-detect or A1/A2 detection
Programmable pre-emphasis (AKA transmitter equalization)
Programmable transmitter output swing
Programmable receiver equalization
Programmable receiver termination
Embedded support for:
Out of Band (OOB) signalling: Serial ATA
Beaconing, electrical idle, and PCI Express receiver detection
Built-in PRBS generator/checker
Virtex-5 FPGA RocketIO GTP transceivers are further discussed in the Virtex-5 FPGA RocketIO
GTP Transceiver User Guide.
2.3.17 Virtex-5 TXT and FXT Platform Features
This section describes blocks only available in TXT and FXT devices.
2.3.18 RocketIO GTX Serial Transceivers
(TXT/FXT) 8 - 48 channels of RocketIO serial transceivers capable of running 150 Mb/s to
6.5 Gb/s
Full clock and data recovery
8/16/32-bit or 10/20/40-bit datapath support
Optional 8B/10B encoding, gearbox for programmable 64B/66B or 64B/67B encoding, or
FPGA-based encode/decode
Integrated FIFO/elastic buffer
Channel bonding and clock correction support
Dual embedded 32-bit CRC generation/checking
Integrated programmable character detection
Programmable de-emphasis (AKA transmitter equalization)
Programmable transmitter output swings
Programmable receiver equalization
Programmable receiver termination
Embedded support for:
Serial ATA: Out of Band (OOB) signalling
PCI Express: beaconing, electrical idle, and receiver detection
Built-in PRBS generator/checker
Virtex-5 FPGA RocketIO GTX transceivers are further discussed in the Virtex-5 FPGA RocketIO
GTX Transceiver User Guide.
2.4 ML505 evaluation board
[Figure: block diagram of the ML505 evaluation board. Recoverable labels: Virtex-5 LXT/SXT/FXT
FPGA; GPIO (button/LED/DIP switch); PLL clock generator plus user oscillator; System Monitor;
SMA (differential in/out clocks); dual PS/2; GTP: PCIe 1x, 2 Serial ATA, 4 SFP, 4 SMA; Flash;
Sync SRAM; Platform Flash; SPI; System ACE controller; CPLD (misc. glue logic); configuration
paths (SelectMAP, SPI Cfg, BPI Flash Cfg, Slave Serial, Master Serial, JTAG); XGI header; USB
controller (host, peripheral); 10/100/1000 Ethernet PHY (RJ-45); AC97 audio CODEC (line
out/headphone, digital audio, mic in/line in); battery and fan header; CF; PC4; RS-232 XCVR
(serial); VGA input codec; character LCD; IIC EEPROM; piezo/speaker; user IIC bus; DDR2
SO-DIMM; DVI output codec (DVI-I video out).]
Figure 2.1: Global total traffic in mobile networks, 2007-2012
Bibliography
[1] Ognjen ekic. Fpga comparative analysis. note.
[2] Xilinx. ML505/ML506/ML507 Evaluation Platform User Guide, 2009.
[3] Xilinx. Virtex-5 Family Overview, 2009.
Chapter 3
CRC and Segmentation
3.1 CRC (cyclic redundancy check)
The first step in the processing sequence is the CRC attachment. A fixed 24-bit CRC code (CRC24A)
is appended to each transport block (TB). (A transport block is defined as the data accepted by
the physical layer to be jointly encoded.) CRC codes are error-detecting codes typically used in
automatic repeat request (ARQ) systems. CRC codes have no error correction capability, but they
can be used in combination with an error-correcting code to improve the performance of the
system. A CRC constructed from an (n, k) cyclic code is capable of detecting any error burst of
length $n - k$ or less. Binary (n, k) CRC codes are capable of detecting the following error
patterns:
1. All error bursts of length $n - k$ or less.
2. A fraction of error bursts of length equal to $n - k + 1$; the fraction equals $1 - 2^{-(n-k-1)}$.
3. A fraction of error bursts of length greater than $n - k + 1$; the fraction equals $1 - 2^{-(n-k)}$.
4. All combinations of $d_{\min} - 1$ (or fewer) errors.
5. All error patterns with an odd number of errors if the generator polynomial $g(X)$ for the
code has an even number of nonzero coefficients.
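The burst-detection fractions above can be checked numerically. The sketch below is only illustrative: the helper names are ours, and the brute-force check uses a small degree-3 generator (binary 1011) that we chose for the example rather than one of the LTE polynomials. It counts how many bursts of length $n - k + 1$ escape detection and compares the result with $1 - 2^{-(n-k-1)}$.

```python
from fractions import Fraction


def mod2_remainder(value, poly):
    """Remainder of the modulo-2 (XOR) division of value by poly."""
    plen = poly.bit_length()
    while value.bit_length() >= plen:
        value ^= poly << (value.bit_length() - plen)
    return value


def detected_fraction_exact_length(r):
    """Fraction of bursts of length r + 1 that are detected (r = n - k)."""
    return 1 - Fraction(1, 2 ** (r - 1))


def detected_fraction_longer(r):
    """Fraction of bursts longer than r + 1 that are detected (r = n - k)."""
    return 1 - Fraction(1, 2 ** r)


# Brute force for a toy generator of degree r = 3 (binary 1011).
poly, r = 0b1011, 3
# A burst of length r + 1 has its first and last bit set.
bursts = [b for b in range(1 << r, 1 << (r + 1)) if b & 1]
detected = [b for b in bursts if mod2_remainder(b, poly) != 0]
measured = Fraction(len(detected), len(bursts))

print(measured, detected_fraction_exact_length(r))
print(float(detected_fraction_exact_length(24)), float(detected_fraction_longer(24)))
```

For a 24-bit CRC both fractions differ from 1 by less than one part in eight million.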
3.1.1 CRC polynomials
Denote the input bits to the CRC computation by $a_0, a_1, a_2, a_3, \ldots, a_{A-1}$, and the
parity bits by $p_0, p_1, p_2, p_3, \ldots, p_{L-1}$. $A$ is the size of the input sequence and
$L$ is the number of parity bits. The parity bits are generated by one of the following cyclic
generator polynomials:
1. $g_{CRC24A}(D) = D^{24} + D^{23} + D^{18} + D^{17} + D^{14} + D^{11} + D^{10} + D^{7} + D^{6} + D^{5} + D^{4} + D^{3} + D + 1$
2. $g_{CRC24B}(D) = D^{24} + D^{23} + D^{6} + D^{5} + D + 1$
3. $g_{CRC16}(D) = D^{16} + D^{12} + D^{5} + 1$
4. $g_{CRC8}(D) = D^{8} + D^{7} + D^{4} + D^{3} + D + 1$
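For implementation, each polynomial is usually written as a bit vector with the coefficient of the highest power first. The following Python sketch shows the conversion; the dictionary and function names are our own. For CRC24A it reproduces the 25-bit generator vector used in the MATLAB code of Section 3.3.

```python
# Exponents with a nonzero coefficient, taken from the polynomials above.
CRC_EXPONENTS = {
    "CRC24A": [24, 23, 18, 17, 14, 11, 10, 7, 6, 5, 4, 3, 1, 0],
    "CRC24B": [24, 23, 6, 5, 1, 0],
    "CRC16": [16, 12, 5, 0],
    "CRC8": [8, 7, 4, 3, 1, 0],
}


def poly_bits(exponents):
    """Bit string of the polynomial, highest power first."""
    degree = max(exponents)
    present = set(exponents)
    return "".join("1" if e in present else "0" for e in range(degree, -1, -1))


for name, exponents in CRC_EXPONENTS.items():
    bits = poly_bits(exponents)
    print(f"{name}: {bits} (0x{int(bits, 2):X})")
```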
3.1.2 CRC calculation
The theory of a CRC calculation is straightforward. The data is treated by the CRC algorithm
as a binary number. This number is divided by another binary number called the polynomial.
The remainder of the division is the CRC checksum, which is appended to the transmitted message.
The receiver divides the message (including the calculated CRC) by the same polynomial the
transmitter used. If the remainder of this division is zero, the transmission was successful.
However, if the remainder is not zero, an error occurred during the transmission. The division
uses modulo-2 arithmetic, in which addition and subtraction are both realized by XORing the two
numbers bit by bit.
3.1.3 Modulo-2 arithmetic example
    1 0 0 1 1 0 0 1 0 1
XOR 0 1 0 0 1 1 0 1 1 1
  = 1 1 0 1 0 1 0 0 1 0
XOR function:
X1 X2 Y
0  0  0
1  0  1
0  1  1
1  1  0
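The same modulo-2 addition can be reproduced in a couple of lines of Python, shown here only as a quick check of the example above; the variable names are ours.

```python
a = int("1001100101", 2)
b = int("0100110111", 2)

# Modulo-2 addition of two binary words is a bitwise XOR with no carries.
result = format(a ^ b, "010b")
print(result)
```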
3.1.4 CRC calculation example
In general, the message can have any length. Before we can start calculating the CRC value, the
message has to be augmented by n zero bits, where n is the degree of the polynomial (the number
of CRC bits). The CRC-16 polynomial has degree 16, therefore 16 zero bits have to be appended to
the original message. In this example calculation, the message is six bits long and the
polynomial 1 0 1 has degree 2, therefore the message has to be extended by two zeros at the
end. An example calculation for a CRC is shown in Example 1. The reverse calculation is shown
in Example 2.
1. Example (1):
Message = 1 1 0 1 0 1
CRC polynomial = 1 0 1
1 1 0 1 0 1 0 0 ÷ 1 0 1 = 1 1 1 0 1 1 = Quotient (has no function in CRC calculation)
1 1 0 1 0 1 0 0
1 0 1
  1 1 1
  1 0 1
    1 0 0
    1 0 1
        1 1 0
        1 0 1
          1 1 0
          1 0 1
            1 1 = Remainder = CRC checksum
Message with CRC = 1 1 0 1 0 1 1 1
2. Example (2):
Message with CRC = 1 1 0 1 0 1 1 1
Polynomial = 1 0 1
1 1 0 1 0 1 1 1 ÷ 1 0 1 = 1 1 1 0 1 1 = Quotient
1 1 0 1 0 1 1 1
1 0 1
  1 1 1
  1 0 1
    1 0 0
    1 0 1
        1 1 1
        1 0 1
          1 0 1
          1 0 1
            0 0 = Checksum is zero, therefore, no transmission error
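The two examples above can be reproduced with a short Python sketch of modulo-2 long division on bit lists. The function names are our own, and the code is meant only as an illustration of the procedure, not as the project implementation.

```python
def mod2_divide(dividend, divisor):
    """Modulo-2 long division on bit lists; returns (quotient, remainder)."""
    work = list(dividend)
    r = len(divisor) - 1                    # degree of the polynomial
    quotient = []
    for i in range(len(dividend) - r):
        if work[i] == 1:                    # leading bit set: subtract (XOR) the divisor
            quotient.append(1)
            for j, d in enumerate(divisor):
                work[i + j] ^= d
        else:                               # leading bit clear: just move on
            quotient.append(0)
    return quotient, work[-r:]


def crc_attach(message, poly):
    """Append the CRC checksum of `message` for generator `poly`."""
    r = len(poly) - 1
    _, remainder = mod2_divide(message + [0] * r, poly)
    return message + remainder


message = [1, 1, 0, 1, 0, 1]
poly = [1, 0, 1]

quotient, checksum = mod2_divide(message + [0, 0], poly)
codeword = crc_attach(message, poly)
_, syndrome = mod2_divide(codeword, poly)
print(quotient, checksum, codeword, syndrome)
```

A zero syndrome at the receiver corresponds to the "no transmission error" outcome of Example 2.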
3.2 Segmentation
To reduce complexity, only a fixed number of turbo interleaver sizes is supported, as given in
the table of the Release 8 standard. The difference between two adjacent interleaver sizes is
8 bits for small code blocks and goes up to 64 bits for the largest code block sizes. The reason
for the coarser granularity of interleaver sizes for larger code blocks is that a larger number
of filler bits is still a small fraction of the code block size when the code block size is large.
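The granularity argument can be made concrete with a short Python sketch. It assumes the Release 8 interleaver size grid (40 to 6144 bits in steps of 8, 16, 32 and 64 bits) and compares the worst-case filler fraction for a small and for the largest code block size. The function names are ours.

```python
def interleaver_sizes():
    """Supported turbo interleaver sizes K (assumed Release 8 grid)."""
    return (list(range(40, 512, 8)) + list(range(512, 1024, 16))
            + list(range(1024, 2048, 32)) + list(range(2048, 6144 + 1, 64)))


def worst_case_filler_fraction(k, sizes):
    """Largest filler share when a block one bit above the previous size is padded to k."""
    previous = sizes[sizes.index(k) - 1]
    return (k - previous - 1) / k


sizes = interleaver_sizes()
small = worst_case_filler_fraction(48, sizes)
large = worst_case_filler_fraction(6144, sizes)
print(len(sizes), f"{small:.1%}", f"{large:.1%}")
```

Even with a 64-bit step, the padding at the largest size stays near one percent, while an 8-bit step at 48 bits can already cost about 15 percent.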
3.2.1 What is segmentation?
The maximum code block size is limited to 6144 bits.
When the transport block is larger than 6144 bits, segmentation of the input bit sequence is
performed.
When the transport block size is not matched to a turbo interleaver size, filler bits are
added.
3.2.2 Example:
Let us assume a transport block size of 19 000 bits.
A straightforward approach segments it into four code blocks.
The last three segments are of the maximum size of 6144 bits and the first segment is of size
576 bits.
The first segment actually holds 568 bits (19 000 - 3 x 6144 = 568) and is matched to the
nearest interleaver size of 576.
This results in a filler-bit overhead of 8 bits (576 - 568 = 8 bits).
There are problems with this segmentation approach:
Vastly different code block sizes would result in different turbo code performance.
This approach can increase the number of filler bits, which carry no useful data and only
add to the amount of redundancy.
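The numbers in this example can be reproduced with the following Python sketch of the simple approach (fill as many maximum-size blocks as possible and pad the rest). It assumes the Release 8 interleaver size grid; the function names are ours and the code ignores the CRC bits, as the example does.

```python
def interleaver_sizes():
    """Supported turbo interleaver sizes K (assumed Release 8 grid)."""
    return (list(range(40, 512, 8)) + list(range(512, 1024, 16))
            + list(range(1024, 2048, 32)) + list(range(2048, 6144 + 1, 64)))


def naive_segments(tb_bits, max_k=6144):
    """Split into maximum-size blocks plus one short first block, then pad."""
    full, rest = divmod(tb_bits, max_k)
    segments = ([rest] if rest else []) + [max_k] * full
    sizes = interleaver_sizes()
    # Each segment is padded up to the nearest supported interleaver size.
    padded = [min(k for k in sizes if k >= s) for s in segments]
    filler = sum(p - s for p, s in zip(padded, segments))
    return segments, padded, filler


segments, padded, filler = naive_segments(19000)
print(segments, padded, filler)
```

The output shows the imbalance discussed above: one block of 576 bits next to three blocks of 6144 bits.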
3.2.3 Problem solution
In order to reduce the number of filler bits while keeping the code block sizes approximately the
same, the LTE system uses two adjacent interleaver sizes. In the later stages of LTE standard
development, it was agreed that filler bits are mostly removed after channel coding.
3.2.4 Segmentation process
A fixed 24-bit CRC (CRC24A) is calculated for each transport block coming from the MAC layer.
The calculated CRC is appended to the transport block in order to check the data integrity
at the receiver end.
If the input sequence length is shorter than 40 bits, filler bits are added to the beginning of
the code block (no segmentation).
If the resulting bit sequence is longer than the maximum allowed code block size (6144
bits), a segmentation process must be carried out.
After the segmentation, filler bits are added to the beginning of the first segment, if needed.
Finally, an additional 24-bit CRC sequence (CRC24B) is attached to each resulting segment.
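The steps above can be sketched in Python as follows. This is our reading of the code block segmentation procedure of the Release 8 standard, written for illustration only: the function and key names are ours, the input length `b` is assumed to already include the transport block CRC, and the interleaver size grid is regenerated from its step pattern rather than copied from the table.

```python
def interleaver_sizes():
    """Supported turbo interleaver sizes K (assumed Release 8 grid)."""
    return (list(range(40, 512, 8)) + list(range(512, 1024, 16))
            + list(range(1024, 2048, 32)) + list(range(2048, 6144 + 1, 64)))


def segment(b, z=6144):
    """Code block sizes for an input of b bits (transport block plus its CRC)."""
    sizes = interleaver_sizes()
    if b <= z:
        crc_len, c = 0, 1                       # single block, no per-block CRC
    else:
        crc_len = 24                            # each block gets its own CRC
        c = -(-b // (z - crc_len))              # ceiling division
    b_prime = b + c * crc_len
    k_plus = min(k for k in sizes if c * k >= b_prime)
    if c == 1:
        k_minus, c_minus = 0, 0
    else:
        k_minus = max(k for k in sizes if k < k_plus)
        c_minus = (c * k_plus - b_prime) // (k_plus - k_minus)
    c_plus = c - c_minus
    filler = c_plus * k_plus + c_minus * k_minus - b_prime
    return {"C": c, "K+": k_plus, "K-": k_minus,
            "C+": c_plus, "C-": c_minus, "F": filler}


# 19 000-bit transport block plus its 24-bit CRC.
print(segment(19000 + 24))
```

For the 19 000-bit example this gives three blocks of 4800 bits and one of 4736 bits with 16 filler bits, so the block sizes are nearly equal.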
3.3 MATLAB code
For the MATLAB code, I first needed to enter all the available values of K, to be able to choose
the suitable ones from them.
This pseudo-code, taken from the standard, is very helpful in explaining the whole segmentation
process.
Now, to describe the CRC operation, I will use the following MATLAB code.
% First, enter the CRC generator (CRC24A) as a row vector of 25 bits.
crc24a = [1 1 0 0 0 0 1 1 0 0 1 0 0 1 1 0 0 1 1 1 1 1 0 1 1];
% CRC generation is the long division of the data block by the CRC generator.
% In binary, this becomes a simple XOR, with the data shifted past the
% generator after every XOR operation.
% Append 24 zeros to the data block (a row vector of bits); this provides the
% space that holds the CRC remainder at the end of the process.
shifted_data = [data zeros(1,24)];
% count is the number of shifts needed; coordinator is initialized to one and
% incremented at each shift.
count = length(shifted_data) - length(crc24a);
coordinator = 1;
% data_xord is the 25-bit portion of the data that is currently being divided.
data_xord = shifted_data(1,1:25);
while true
    if data_xord(1,1)==0 && coordinator>count
        % End of the process: all bits have been shifted in and the MSB is
        % zero, so the portion is of lower order than the CRC generator.
        break
    elseif data_xord(1,1)==0
        % Ordinary case: the portion is of lower order than the CRC generator
        % polynomial; proceed to the shifting step.
    else
        % Ordinary case: the MSB is one, so the XOR can be performed.
        data_xord = xor(data_xord,crc24a);
    end
    for plus=1:24
        % Shift the data portion, at most 24 times, until the MSB changes.
        check = data_xord(1,1);
        if check==0
            % The MSB is zero, for example because an XOR has just cleared it.
            if coordinator>count
                % No bits are left to shift in.
                break
            end
            % Drop the MSB and bring in the next bit (shifting).
            data_xord = [data_xord(1,2:end) shifted_data(1,coordinator+25)];
            coordinator = coordinator+1;
        elseif check==1
            % The MSB is one: go back to the main loop to perform the XOR.
            break
        end
    end
end
% End of the CRC operation: remove the MSB, which must be zero, so that the
% CRC remainder is 24 bits only.
remainder = data_xord(1,2:end);
final_data = [data remainder];
The receiver has exactly the same design: the CRC operation is performed on the data together
with the remainder, and a zero result means that no error was detected.
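For readers without MATLAB, the same shift-and-XOR procedure can be sketched in Python. This is our own port written for illustration, not the project code: the names are ours, and the data block is assumed to be a non-empty list of 0/1 values.

```python
CRC24A = [int(b) for b in "1100001100100110011111011"]  # generator, MSB first


def crc24a_remainder(bits):
    """24 remainder bits of bits * D^24 divided by the CRC24A generator."""
    shifted = list(bits) + [0] * 24          # make room for the remainder
    window = shifted[:25]                    # 25-bit portion under division
    pending = shifted[25:]                   # bits still to be shifted in
    position = 0
    while True:
        if window[0] == 1:                   # same order as the generator: XOR
            window = [w ^ g for w, g in zip(window, CRC24A)]
        if position == len(pending):         # nothing left to shift in
            break
        window = window[1:] + [pending[position]]
        position += 1
    return window[1:]                        # the MSB is zero at this point


def attach_crc(bits):
    """Data block followed by its 24 CRC bits."""
    return list(bits) + crc24a_remainder(bits)


data = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1]
codeword = attach_crc(data)
syndrome = crc24a_remainder(codeword)        # receiver side check
corrupted = codeword[:]
corrupted[5] ^= 1                            # inject a single bit error
print(len(codeword), any(syndrome), any(crc24a_remainder(corrupted)))
```

The receiver check reuses the same function: an all-zero result means no error was detected, and a single flipped bit gives a nonzero result.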
3.4 VHDL code
1. Because of limited resources, we will only use a transport block size of 16 bits.
2. We will perform only a CRC24A operation, and no segmentation will take place.
3. The code is divided into 3 codes:
4. The first block is the transmitter, which adds 24 CRC bits to the data.
5. The second block is the receiver, which extracts the data from the received block and checks
the CRC remainder.
6. The third code is where the transmitter and receiver are connected, by calling the TX and RX
as components and then wiring them together.
7. Initializing the code at the entity portion:
TB1: the input of the transport block, as bits
clk1: the clock of the system
block1: the output of the transmitter; it is a bus of 40 bits
8. The architecture is where the code is written; some internal signals and constants are also
initialized:
crc24a: the CRC generator, stored as a constant
bits25: the portion of data XORed with the CRC generator, and shifted after each XOR
operation
trans24b: a signal containing the saved 16 bits, to which the CRC bits are then added
9. The initialization of the sequential statements:
process(clk1): the sequential statements in the process are affected by the clock change
i: variable that counts the data bits and indexes the signal trans24b
counta: variable that enters the CRC calculation process when it is one
countb: variable used to take the bits from trans24b into bits25 to perform the CRC
operations and shifting
dist: variable that determines whether the trans24b bus is clear to put data on it or not
10. The whole system is set to start at the rising edge of the clock.
11. When the bus is clear (dist = 0), the TB1 data is placed in trans24b, one bit each clock
cycle.
12. When the indexing variable i = 16, dist is set to one, and counta is set to 1 to enter the
CRC calculation stage.
13. trans24b is shifted by 24 bits, which is like inserting 24 zeros to prepare it for the CRC
generation.
14. This is the CRC calculation stage, entered when counta >= 1.
15. When counta = 1, this is the first entrance to this stage, so 25 bits of trans24b are
copied to bits25 to be XORed with the CRC generator polynomial.
16. counta is incremented by 25, indicating that 25 bits were taken.
17. If counta is one, it is the first time this stage is entered, so 25 bits are added to
bits25, and counta is incremented so that this initial load is not repeated.
18. The MSB of bits25 is checked; if it is zero, the bits are shifted and a bit from trans24b
is added.
19. If the MSB is one, the XOR operation is performed: the order of bits25 then equals the
order of the CRC polynomial, so the XOR can be carried out according to the rules of long
division.
20. countb is decremented by one if the MSB is zero.
21. When countb reaches 1, the limit of shifts has been reached, i.e. the end of the data
bus.
22. The bits25 (CRC bits) are placed in trans24b in order to be sent out on the block1 bus.
23. counta is set to zero to exit the CRC stage.
24. dist is set to zero to start taking new data.
25. countb is set to 40 again to make it ready for the next CRC stage.
The receiver has exactly the same design, but with the start and end operations reversed: it
takes the data as a bus and sends it out in serial form, matching the input TB.
The last code, which combines the TX and RX, is explained in the next few steps:
Entity of the code, containing the input data and outputs:
TB: input stream
TBo: output stream (TBo = TB, but with a delay due to processing time)
clock
error: calculated at the receiver side by checking bits25 after the CRC calculation; it should
equal zero
The TX and RX blocks are defined as components.
blockin: signal connecting the two blocks; it takes the output of the TX and feeds it as input
to the RX
The wiring is done by connecting the inputs and outputs of each block.
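The top-level behaviour described here can be modelled in a few lines of Python. This is a behavioural sketch under our own assumptions, not a translation of the VHDL: it ignores the clock-by-clock timing, the names `tx`, `rx` and `blockin` only mirror the roles described in the text, and the CRC is computed with integer XOR division instead of the bits25 shift register.

```python
CRC24A_POLY = 0x1864CFB  # CRC24A generator as a 25-bit integer


def mod2_remainder(value, poly=CRC24A_POLY):
    """Remainder of the modulo-2 (XOR) division of value by poly."""
    plen = poly.bit_length()
    while value.bit_length() >= plen:
        value ^= poly << (value.bit_length() - plen)
    return value


def tx(tb_bits):
    """Transmitter model: 16 serial input bits -> 40-bit block (data + CRC)."""
    if len(tb_bits) != 16:
        raise ValueError("this model uses a 16-bit transport block")
    data = int("".join(str(b) for b in tb_bits), 2)
    return (data << 24) | mod2_remainder(data << 24)


def rx(block):
    """Receiver model: 40-bit block -> (16 serial output bits, error flag)."""
    error = 1 if mod2_remainder(block) != 0 else 0
    data = block >> 24
    tb_out = [(data >> i) & 1 for i in range(15, -1, -1)]
    return tb_out, error


tb = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1]
blockin = tx(tb)                      # plays the role of the wire between TX and RX
tbo, error = rx(blockin)
_, error_after_flip = rx(blockin ^ (1 << 30))   # corrupt one bit on the wire
print(tbo == tb, error, error_after_flip)
```

With an undisturbed `blockin` the output equals the input and the error flag is zero; flipping any single bit on the wire raises the flag.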
Using ModelSim to analyze the output waveform:
Each cycle, one bit is saved in the trans24b bus.
After exactly 16 cycles, according to the code, bits25 takes a portion of 25 bits of the
data.
It is obvious that the data was shifted in the trans24b bus.
After the CRC calculation is done, a number of cycles later, the CRC bits are placed in the
position of the first 24 bits.
Checking the waveform of the top-level code, containing TX and RX:
After a number of cycles, the blockin signal, which connects TX and RX, holds the output of
the TX and passes it to the RX.
TBo (the output of RX) remains zero.
The error is not calculated yet.
After the data is processed at the receiver side, a number of cycles later, TBo starts to
output the data serially, in the order it was taken in at the TB input.
If you check the data coming in on the blockin bus (coming from TX), it is the same as the
output TBo data.
As the waveform shows, the calculated error is zero.
Bibliography
[1] Jung-Fu (Thomas) Cheng and Havish Koorapaty. Error detection reliability of LTE CRC coding.
Chapter 4
Turbo Codes
Turbo codes are a coding scheme consisting of two parallel recursive systematic convolutional
encoders, first introduced by Berrou in 1993.
4.1 A Brief History of Turbo Codes
The invention of turbo codes is not the outcome of a mathematical development. It is the result
of an intuitive experimental approach whose origin can be found in the work of several European
researchers: Gerard Battail, Joachim Hagenauer and Peter Hoeher who, at the end of the 1980s,
highlighted the interest of probabilistic processing in receivers. Others before them, mainly in
the United States (Peter Elias, Michael Tanner, Robert Gallager, etc.), had earlier imagined
procedures for coding and decoding that were the forerunners of turbo codes.
In a laboratory at École Nationale Supérieure des Télécommunications de Bretagne (Telecom
Bretagne), Claude Berrou and Patrick Adde were attempting to transcribe the Viterbi algorithm
with weighted output (SOVA: Soft-Output Viterbi Algorithm) into MOS transistors, in the
simplest possible way. A suitable solution was found after two years, which enabled these
researchers to form an opinion about probabilistic decoding. Claude Berrou, then Alain Glavieux,
pursued the study and observed, after Gerard Battail, that a decoder with weighted input and
output could be considered as a signal-to-noise ratio amplifier. This encouraged them to
implement the concepts commonly used in amplifiers, mainly feedback.
Perfecting turbo codes involved many very pragmatic stages and also the introduction of
neologisms, like "parallel concatenation" or "extrinsic information", nowadays common in
information theory jargon. The publication in 1993 of the first results, with a performance
0.5 dB from the Shannon limit, shook the coding community: a gain of almost 3 dB compared to
solutions existing at that time.
4.2 Turbo Encoding
The original turbo code is the combination of two parallel Recursive Systematic Convolutional
(RSC) codes concatenated by a pseudo-random interleaver, and an iterative MAP decoder. The
turbo coding/decoding principle is illustrated in Figure 4.2, where $\pi$ represents the
interleaver between Encoder 1 and Encoder 2 and $\pi^{-1}$ represents the deinterleaver between
Decoder 2 and Decoder 1.
[Figure 4.1 is a timeline running from 1993 to 2004. Recoverable entries, in the order shown:]
"Near Shannon limit error correcting coding and decoding: Turbo-Codes" by Claude Berrou,
Alain Glavieux and Punya Thitimajshima was presented at ICC'93 in Geneva, with patent
application nos. FR91 05279, EP92 460011.7 and US 870,483 (ML Decoding).
"Recursive Systematic Convolutional codes and application to parallel concatenation" by
Punya Thitimajshima was published in Globecom'95.
"Near Optimum Error Correcting Coding and Decoding: Turbo-Codes" by Claude Berrou and
Alain Glavieux was published in IEEE Transactions on Communications in October.
The IEEE Stephen O. Rice Award (best paper in IEEE Trans. Commun.) was presented to
Claude Berrou and Alain Glavieux.
The IEEE Information Theory Society Paper Award was awarded to Claude Berrou and Alain
Glavieux for their publication in IEEE Trans. Commun. in 1996.
Claude Berrou, Alain Glavieux, and Punya Thitimajshima received Golden Jubilee Awards
for Technological Innovation for the invention of turbo codes in August.
Claude Berrou and Alain Glavieux received the IEEE Richard W. Hamming Medal for the
invention of turbo codes, which have revolutionized digital communications.
Punya Thitimajshima received Thailand's Outstanding Technologist Award.
10th anniversary of the invention of turbo codes (1993-2003).
Figure 4.1: Brief history of turbo codes
Figure 4.2: The Turbo Coding/Decoding Principle
4.2.1 The Component Encoder with Binary Codes
A general binary convolutional turbo encoder structure using two component encoders is
illustrated in Figure 4.3 as an example. It consists of three basic building blocks: an
interleaver, the component encoders, and a puncturing device with a multiplexing unit to compose
the codeword. The interleaver is a device that re-orders the symbols in its input sequence.
Figure 4.3: Encoder Block Diagram (Binary)
The component encoders are RSC encoders, i.e., systematic convolutional encoders with
feedback. Such an encoder with two memory elements is depicted in Figure 4.4. For systematic
codes, the information sequence is part of the codeword, which corresponds to a direct
connection from the input to one of the outputs. For each input bit, the encoder generates two
codeword bits: the systematic bit and the parity bit. Thus, the code rate is 1/2, and the
encoder input and output bits are denoted $U_k$ and $(X_{k,1} = U_k, X_{k,2})$ respectively.
Figure 4.4: Recursive systematic convolutional encoder with feedback: rate 1/2 code with memory
2. The generator polynomials are $g_0(D) = 1 + D + D^2$ and $g_1(D) = 1 + D^2$.
If the generator matrix of a non-recursive convolutional encoder with rate 1/n is given by
$$G(D) = \big(g_0(D),\ g_1(D),\ \ldots,\ g_{n-1}(D)\big) \tag{4.1}$$
the recursive encoder will be defined by
$$G_{sys}(D) = \left(1,\ \frac{g_1(D)}{g_0(D)},\ \ldots,\ \frac{g_{n-1}(D)}{g_0(D)}\right) \tag{4.2}$$
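As a minimal sketch of the encoder of Figure 4.4 (Python is used here purely for illustration; the function name `rsc_encode` and the bit-list interface are assumptions and not part of the project's implementation), the feedback polynomial $g_0(D) = 1 + D + D^2$ and the feedforward polynomial $g_1(D) = 1 + D^2$ can be realized with two memory elements:

```python
def rsc_encode(bits):
    """Rate-1/2 RSC encoder with G_sys(D) = (1, (1 + D^2) / (1 + D + D^2)).

    Returns (systematic, parity) bit lists; the trellis is left unterminated.
    """
    s1 = s2 = 0                      # two memory elements, initially zero
    systematic, parity = [], []
    for u in bits:
        a = u ^ s1 ^ s2              # feedback taps from g0(D) = 1 + D + D^2
        p = a ^ s2                   # feedforward taps from g1(D) = 1 + D^2
        systematic.append(u)
        parity.append(p)
        s1, s2 = a, s1               # shift the register
    return systematic, parity


sys_bits, par_bits = rsc_encode([1, 0, 0, 0, 0, 0])
print(sys_bits)  # [1, 0, 0, 0, 0, 0]
print(par_bits)  # [1, 1, 1, 0, 1, 1]
```

A single 1 at the input produces a parity sequence that does not die out, which is the recursive (infinite impulse response) behaviour that the feedback introduces.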
Since the performance of any binary code is dominated by its free distance (the minimum Hamming distance between codewords, which coincides with the minimum Hamming weight of a nonzero codeword for linear codes), the optimal recursive component encoders should have maximum effective free distance. Furthermore, to achieve a good performance, it is also important that the component codes be recursive.
In the design of convolutional codes, one advantage of systematic codes is that encoding is somewhat simpler than for non-systematic codes and less hardware is required.
4.2.2 Interleaving
Interleaving is the process of rearranging the ordering of an information sequence in a one-to-one deterministic way before the application of the second component code in a turbo coding scheme. The inverse of this process is called deinterleaving, which restores the received sequence to its original order. Interleaving is a practical technique to enhance the error correcting capability of coding schemes. It plays an important role in achieving good performance in turbo coding schemes.
Constructing a long block code from short-memory convolutional codes using the interleaver results in codes with good distance properties, which can be efficiently decoded through iterative decoding. The interleaver breaks low-weight input sequences, and hence increases the code's free Hamming distance or reduces the number of codewords with small distance in the code distance spectrum. In addition, the interleaver spreads out burst errors by providing scrambled information data to the second component encoder and, at the decoder, decorrelates the inputs to the two component decoders, so that an iterative suboptimal decoding algorithm based on uncorrelated information exchange between the two component decoders can be applied. For example, after correction of some of the errors in the first component decoder, some of the remaining errors can be spread by the interleaver such that they become correctable in the other decoder. By increasing the number of iterations in the decoding process, the bit error probability approaches that of the maximum likelihood decoder. Typically, the performance of a turbo code improves when the interleaver size is increased, which has a positive influence on both the code properties and the iterative decoding performance.
A key component of a turbo code is the interleaver, whose design is essential for achieving high performance and is of interest to many turbo code researchers. Many interleaving strategies have been proposed, including block interleavers, odd-even block interleavers and block helical simile interleavers; convolutional interleavers and cyclic shift interleavers; random interleavers, including pseudo-random, uniform and non-uniform, and S-random interleavers; code-matched interleavers and relative prime interleavers; golden interleavers, etc.
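The one-to-one property described above can be illustrated with a small pseudo-random interleaver. This is only a sketch: the helper names `make_interleaver` and `permute`, the block length 16 and the seed are arbitrary choices made for the example.

```python
import random


def make_interleaver(n, seed=0):
    """Return a pseudo-random permutation of range(n) and its inverse."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    inverse = [0] * n
    for out_pos, in_pos in enumerate(perm):
        inverse[in_pos] = out_pos
    return perm, inverse


def permute(seq, perm):
    """Output position i takes the input symbol at position perm[i]."""
    return [seq[p] for p in perm]


perm, inverse = make_interleaver(16, seed=1)
data = list(range(100, 116))
interleaved = permute(data, perm)
restored = permute(interleaved, inverse)
print(restored == data)  # True: deinterleaving undoes interleaving
```

Because the mapping is a permutation, no symbol is lost or duplicated; only the order in which the second component encoder sees the symbols changes.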
4.2.3 Trellis Termination
As mentioned above, the performance of a code is highly dependent on its Hamming distance spectrum. For convolutional turbo codes, the Hamming distances between the codewords are the result of taking different paths through the trellis. In principle, the larger the number of trellis transitions in which the two paths differ, the larger is the possible Hamming distance between the corresponding codewords. It is thus desirable that the shortest possible detour from a trellis path is as long as possible, to ensure a large Hamming distance between the two codewords that correspond to the two paths. However, in practice, convolutional turbo codes are truncated at some point in order to encode the information sequence block-by-block. If no precautions are taken before the truncation, each of the encoder states is a valid ending state and thus the shortest possible difference between the two trellis paths is made up of only one trellis transition. Naturally, this procedure may result in very poor distance properties, with accompanying poor error correcting performance.
Since the component codes are recursive, it is not possible to terminate the trellis by transmitting m zero tail bits. The tail bits are not always zero; they depend on the state of the component encoder after encoding N information bits. Trellis termination forces the encoder to the all-zero state at the end of each block, so that the initial state for the next block is the all-zero state. This way, the shortest possible trellis detour does not change with truncation, and the distance spectrum is preserved.
Another approach to the problem of trellis truncation is tail-biting. With tail-biting, the encoder is initialized to the same state that it will end up in after encoding the whole block. For feed-forward encoders, tail-biting is readily obtained by inspection of the last bits in the input sequence, since these dictate the encoder ending state. The advantage of tail-biting compared to trellis termination is that it does not require transmission of tail bits (the use of tail bits reduces the code rate and increases the transmission bandwidth). For large blocks, the rate reduction imposed by tail bits is small, often negligible. For small blocks, however, it may be significant.
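As a rough numerical illustration of this rate loss (a sketch only; the helper `terminated_rate` and the block sizes are hypothetical), consider a single rate-1/n component code with memory m whose trellis is terminated with m extra steps:

```python
from fractions import Fraction


def terminated_rate(n_info, n_outputs, memory):
    """Code rate when `memory` tail steps are appended to n_info information
    bits and every trellis step produces n_outputs coded bits."""
    return Fraction(n_info, n_outputs * (n_info + memory))


for n_info in (40, 1000, 6000):
    r = terminated_rate(n_info, n_outputs=2, memory=2)
    print(n_info, r, round(float(r), 4))
```

For 40 information bits the rate drops from 1/2 to 10/21 (about 0.476), whereas for several thousand bits the loss is well below one percent.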
4.2.4 Puncturing
Puncturing is the process of removing certain symbols/positions from the codeword, thereby reducing the codeword length and increasing the overall code rate. In the original turbo code proposal, Berrou et al. punctured half of the bits from each constituent encoder. Puncturing half of the systematic bits from each constituent encoder corresponds to sending all the systematic bits once, if the puncturing is properly performed. The overall code rate is then R = 1/2. Furthermore, puncturing may have different effects for different choices of interleavers and for different constituent encoders.
When puncturing is considered, some output bits of $v_0$, $v_1$ and $v_2$ are deleted according to a chosen pattern defined by a puncturing matrix $P$. For instance, a rate 1/2 turbo code can be obtained by puncturing a rate 1/3 turbo code. A commonly used puncturing matrix is given by
$$P = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{4.3}$$
where the puncturing period is 2. According to the puncturing matrix, the parity check digits from the two component encoders are alternately deleted. The punctured turbo code symbol at a given time consists of an information digit followed by a parity check digit which is alternately obtained from the first and the second component encoders.
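The effect of the matrix in (4.3) can be sketched as follows. The function name `puncture` and the symbolic stream labels are made up for the example; rows of `P` correspond to $v_0$, $v_1$, $v_2$ and columns to the time index modulo the puncturing period.

```python
P = [
    [1, 1],  # v0, systematic bits: always sent
    [1, 0],  # v1, parity from encoder 1: sent at even time indices
    [0, 1],  # v2, parity from encoder 2: sent at odd time indices
]


def puncture(v0, v1, v2, pattern=P):
    """Multiplex three equally long streams, keeping only positions where pattern is 1."""
    period = len(pattern[0])
    out = []
    for k, symbols in enumerate(zip(v0, v1, v2)):
        col = k % period
        for row, bit in enumerate(symbols):
            if pattern[row][col]:
                out.append(bit)
    return out


v0 = ["u0", "u1", "u2", "u3"]
v1 = ["p0", "p1", "p2", "p3"]
v2 = ["q0", "q1", "q2", "q3"]
print(puncture(v0, v1, v2))
# ['u0', 'p0', 'u1', 'q1', 'u2', 'p2', 'u3', 'q3']
```

Four information symbols produce eight transmitted symbols, i.e. rate 1/2 instead of the unpunctured 1/3.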
4.3 Iterative Decoding Principle
An iterative turbo decoder consists of two component decoders concatenated serially via an interleaver, identical to the one in the encoder. SISO (Soft Input/Soft Output) algorithms are well suited for iterative decoding because they accept a priori information at their input and produce a posteriori information at their output. In turbo decoding, trellis-based decoding algorithms are used. These are recursive methods suitable for the estimation of the state sequence of a discrete-time finite-state Markov process observed in memoryless noise. With reference to decoding of noisy coded sequences, the MAP algorithm is used to estimate the most likely information bit to have been transmitted in a coded sequence. Here, we only discuss the iterative decoding of two-dimensional turbo codes. The extension to the case of multidimensional concatenated codes is straightforward.
4.3.1 BCJR Algorithm
The Bahl, Cocke, Jelinek, and Raviv (BCJR) algorithm, also known as the forward-backward algorithm, the a posteriori probability algorithm, or the maximum a posteriori (MAP) algorithm, is the core component in many iterative detection and decoding schemes. The BCJR algorithm is optimal for estimating the states or the outputs of a Markov process observed in white noise. It produces the sequence of a posteriori probabilities (APP), where each APP is the probability of a data bit given the entire received sequence. The numerical representation of probabilities, the non-linear functions, and the mix of multiplications and additions of these values make this algorithm difficult to implement directly. As a result, derivatives of this algorithm, such as the Log-MAP and Max-Log-MAP algorithms, have been used in the decoding of turbo codes.
4.3.2 Tools for Iterative Decoding of Turbo Codes
Log-likelihood Algebra. The log-likelihood ratio of a binary random variable $u_k$, $L(u_k)$, is defined as
$$L(u_k) = \ln\frac{P(u_k = +1)}{P(u_k = -1)} \tag{4.4}$$
where $u_k$ is the information bit at time $k$. Since
$$P(u_k = +1) = 1 - P(u_k = -1) \tag{4.5}$$
we have
$$L(u_k) = \ln\frac{P(u_k = +1)}{1 - P(u_k = +1)} \tag{4.6}$$
Simplifying, we find
$$P(u_k = \pm 1) = \left(\frac{e^{L(u_k)/2}}{1 + e^{L(u_k)}}\right) e^{u_k L(u_k)/2} = A_k\, e^{u_k L(u_k)/2} \tag{4.7}$$
where $A_k = \dfrac{e^{L(u_k)/2}}{1 + e^{L(u_k)}}$ is a common factor.
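A quick numerical check of the relation between a probability and its log-likelihood ratio may help here. The sketch below uses hypothetical helper names (`llr`, `prob_from_llr`) and takes the common factor as $A_k = e^{L/2}/(1 + e^{L})$, which is the form for which the two probabilities sum to one.

```python
import math


def llr(p_plus):
    """L(u) = ln(P(u=+1) / P(u=-1)), equation (4.4)."""
    return math.log(p_plus / (1.0 - p_plus))


def prob_from_llr(L, u):
    """P(u = +/-1) = A * exp(u * L / 2) with A = exp(L/2) / (1 + exp(L))."""
    A = math.exp(L / 2.0) / (1.0 + math.exp(L))
    return A * math.exp(u * L / 2.0)


L = llr(0.9)
print(round(L, 4))                     # 2.1972
print(round(prob_from_llr(L, +1), 4))  # 0.9
print(round(prob_from_llr(L, -1), 4))  # 0.1
```

The sign of $L$ carries the hard decision and its magnitude the reliability: $L = 0$ corresponds to equally likely bits.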
If the binary random variable $u_k$ is conditioned on a different random variable or vector $y_k$, then we have a conditioned log-likelihood ratio $L(u_k|y_k)$ with
$$\begin{aligned}
L(u_k|y_k) &= \ln\frac{P(u_k=+1|y_k)}{P(u_k=-1|y_k)} \\
&= \ln\frac{P(y_k|u_k=+1)\,P(u_k=+1)}{P(y_k|u_k=-1)\,P(u_k=-1)} \\
&= \ln\frac{P(y_k|u_k=+1)}{P(y_k|u_k=-1)} + \ln\frac{P(u_k=+1)}{P(u_k=-1)} \\
&= L(y_k|u_k) + L(u_k)
\end{aligned} \tag{4.8}$$
Soft Channel Outputs. After transmission over a channel with a fading factor $a$ and additive Gaussian noise,
$$\begin{aligned}
L(u_k|y_k) &= \ln\frac{P(y_k|u_k=+1)\,P(u_k=+1)}{P(y_k|u_k=-1)\,P(u_k=-1)} \\
&= \ln\frac{\exp\!\left(-\frac{E_s}{N_0}(y_k-a)^2\right)}{\exp\!\left(-\frac{E_s}{N_0}(y_k+a)^2\right)} + \ln\frac{P(u_k=+1)}{P(u_k=-1)} \\
&= 4a\frac{E_s}{N_0}\,y_k + L(u_k) \\
&= L_c\,y_k + L(u_k)
\end{aligned} \tag{4.9}$$
where $L_c = 4a\frac{E_s}{N_0}$. For a fading channel, $a$ denotes the fading amplitude, whereas for a Gaussian channel we set $a = 1$, so that $L_c = 4\frac{E_s}{N_0}$.
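The soft channel output can be sketched numerically as below. The function names and the chosen $E_s/N_0$ of 1 dB are assumptions for illustration, the received values are assumed to be normalized to unit symbol amplitude, and the fading amplitude $a$ is kept as an explicit factor of $L_c$, as it follows from the exponents in (4.9).

```python
def channel_llr(y, es_n0, a=1.0):
    """L_c * y with L_c = 4 * a * Es/N0 (a = 1 for a Gaussian channel)."""
    return 4.0 * a * es_n0 * y


def posterior_llr(y, es_n0, prior_llr=0.0, a=1.0):
    """L(u|y) = L_c * y + L(u), equation (4.9)."""
    return channel_llr(y, es_n0, a) + prior_llr


es_n0 = 10 ** (1.0 / 10)   # Es/N0 of 1 dB expressed as a linear ratio
for y in (0.8, -0.1, -1.3):
    L = posterior_llr(y, es_n0)
    print(y, round(L, 3), "+1" if L >= 0 else "-1")
```

A received value close to zero yields a small LLR magnitude, so a moderate a priori value $L(u)$ from the other decoder can overturn the channel's tentative decision.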
Since
$$P(y_k) = P(y_k|u_k=+1)\,P(u_k=+1) + P(y_k|u_k=-1)\,P(u_k=-1) \tag{4.10}$$
and using the previous equations, we can prove that
$$p(y_k|u_k) = B_k\, e^{u_k L_c y_k/2} \tag{4.11}$$
where $B_k = \dfrac{P(y_k)\,\big(1+e^{L(u_k)}\big)\,e^{L_c y_k/2}}{1+e^{L(u_k)+L_c y_k}}$.
Principle of the Iterative Decoding Algorithm. Assume that we have a soft-in/soft-out decoder available, as shown in Figure 4.5, for decoding of the component codes.
Figure 4.5: Soft-in/soft-out decoder
The output of the symbol-by-symbol maximum a posteriori probability (MAP) decoder is defined as the a posteriori log-likelihood ratio, that is, the logarithm of the ratio of the probabilities of a given bit being +1 or -1 given the observation $y$:
$$L(\hat u) = L(u|y) = \ln\frac{P(u=+1|y)}{P(u=-1|y)} \tag{4.12}$$
Such a decoder uses a priori values $L(u)$ for all information bits $u$, if available, and the channel values $L_c y$ for all coded bits. It also delivers soft outputs $L(\hat u)$ on all information bits and an extrinsic information $L_e(\hat u)$, which contains the soft output information from all the other coded bits in the code sequence and is not influenced by the $L(u)$ and $L_c y$ values of the current bit.
For systematic codes, the soft output for the information bit $u$ will be represented as the sum of three terms
$$L(\hat u) = L_c y + L(u) + L_e(\hat u) \tag{4.13}$$
This means that we have three independent estimates for the log-likelihood ratio of the information bits: the channel values $L_c y$, the a priori values $L(u)$, and the values $L_e(\hat u)$ from a third independent estimator utilizing the code constraint. The whole procedure of iterative decoding with two soft-in/soft-out decoders is shown in Figure 4.6.
Figure 4.6: Iterative decoding procedure with two soft-in/soft-out decoders
In the first iteration of the iterative decoding algorithm, Decoder 1 computes the extrinsic information
$$L_e^1(\hat u) = L_1(\hat u) - [L_c y + L(u)] \tag{4.14}$$
We assume equally likely information bits; thus we initialize $L(u) = 0$ for the first iteration. This extrinsic information from the first decoder is passed to Decoder 2, which uses $L_e^1(\hat u)$ as the a priori value in place of $L(u)$ to compute $L_e^2(\hat u)$. Hence, the extrinsic information value computed by Decoder 2 is
$$L_e^2(\hat u) = L_2(\hat u) - [L_c y + L_e^1(\hat u)] \tag{4.15}$$
Then, Decoder 1 will use the extrinsic information values $L_e^2(\hat u)$ as a priori information in the second iteration. The computation is repeated in each iteration.
The iterative process is usually terminated after a predetermined number of iterations, when the soft-output value $L_e^2(\hat u)$ stabilizes and changes little between successive iterations. In the final iteration, Decoder 2 combines both extrinsic information values in computing the soft-output values
$$L_2(\hat u) = L_c y + L_e^1(\hat u) + L_e^2(\hat u) \tag{4.16}$$
4.4 Optimal and Suboptimal Algorithms for Turbo Decoding
Maximum likelihood algorithms, such as the Viterbi algorithm, find the most probable information sequence that was transmitted, while the MAP algorithm finds the most probable information bit to have been transmitted given the coded sequence. The information bits returned by the MAP algorithm need not form a connected path through the trellis.
For estimating the states or the outputs of a Markov process, the symbol-by-symbol MAP algorithm is optimal. However, the MAP algorithm is not practical to implement because of the numerical representation of probabilities, the nonlinear functions, and the large number of multiplications and additions. The Log-MAP algorithm avoids the approximations made in the Max-Log-MAP algorithm and hence is equivalent to the true MAP algorithm, but without its major disadvantages. The Max-Log-MAP algorithm, a MAP-like algorithm, is suboptimal at low signal-to-noise ratios. The relationship between these algorithms is illustrated in Figure 4.7.
Figure 4.7: Relation between MAP, Log-MAP and Max-Log-MAP
4.4.1 MAP algorithm.
The trellis of a binary feedback convolutional encoder has the structure shown in Figure 4.8.
Figure 4.8: Trellis structure of systematic convolutional codes with feedback encoders
From above, define the log-likelihood ratio as
$$L(\hat u) = L(u|y) = \ln\frac{P(u=+1|y)}{P(u=-1|y)} = \ln\frac{\sum_{(s',s),\,u_k=+1} P(s',s,y)}{\sum_{(s',s),\,u_k=-1} P(s',s,y)} \tag{4.17}$$
where
$$\begin{aligned}
P(s',s,y) &= P(s', y_{j<k})\cdot P(s, y_k|s')\cdot P(y_{j>k}|s) \\
&= P(s', y_{j<k})\cdot P(s|s')\cdot P(y_k|s',s)\cdot P(y_{j>k}|s) \\
&= \alpha_{k-1}(s')\cdot\gamma_k(s',s)\cdot\beta_k(s)
\end{aligned} \tag{4.18}$$
Here $y_{j<k}$ denotes the sequence of received symbols $y_j$ from the beginning of the trellis up to time $k-1$, and $y_{j>k}$ is the corresponding sequence from time $k+1$ up to the end of the trellis. The forward recursion and backward recursion of the MAP algorithm yield
$$\alpha_k(s) = \sum_{s'} \gamma_k(s',s)\,\alpha_{k-1}(s') \tag{4.19}$$
$$\beta_{k-1}(s') = \sum_{s} \gamma_k(s',s)\,\beta_k(s) \tag{4.20}$$
$$L(\hat u) = L(u|y) = \ln\frac{P(u=+1|y)}{P(u=-1|y)} = \ln\frac{\sum_{(s',s),\,u_k=+1} \alpha_{k-1}(s')\cdot\gamma_k(s',s)\cdot\beta_k(s)}{\sum_{(s',s),\,u_k=-1} \alpha_{k-1}(s')\cdot\gamma_k(s',s)\cdot\beta_k(s)} \tag{4.21}$$
Whenever there is a transition from $s'$ to $s$, $P(s|s') = P(u_k)$, where $u_k$ is the information bit corresponding to the transition from $s'$ to $s$, and the branch transition probability is given as
$$\gamma_k(s',s) = P(s|s')\cdot p(y_k|s',s) = P(y_k|u_k)\cdot P(u_k) \tag{4.22}$$
The index pair $(s',s)$ determines the information bit $u_k$ and the coded bits $x_{k,v}$ for $v = 2, \ldots, n$, where
$$P(y_k|u_k) = P(y_{k,1}|u_k)\cdot\prod_{v=2}^{n} P(y_{k,v}|u_k,s',s) = P(y_{k,1}|u_k)\cdot\prod_{v=2}^{n} P(y_{k,v}|x_{k,v}) \tag{4.23}$$
is the joint probability of the independently received symbols, and
$$P(u_k) = A_k\, e^{u_k L(u_k)/2} \tag{4.24}$$
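From Equation (4.11), we have
$$\begin{aligned}
P(y_k|u_k) &= P(y_{k,1}|u_k)\cdot\prod_{v=2}^{n} P(y_{k,v}|x_{k,v}) \\
&= B_k \exp\!\left(\tfrac{1}{2} L_c y_{k,1} u_k\right)\cdot\prod_{v=2}^{n} \exp\!\left(\tfrac{1}{2} L_c y_{k,v} x_{k,v}\right) \\
&= B_k \exp\!\left(\tfrac{1}{2} L_c y_{k,1} u_k + \sum_{v=2}^{n} \tfrac{1}{2} L_c y_{k,v} x_{k,v}\right)
\end{aligned} \tag{4.25}$$
Hence,
$$\begin{aligned}
\gamma_k(s',s) &= P(y_k|u_k)\cdot P(u_k) \\
&= A_k B_k \exp\!\left(\tfrac{1}{2} L_c y_{k,1} u_k + \sum_{v=2}^{n} \tfrac{1}{2} L_c y_{k,v} x_{k,v} + \tfrac{1}{2} u_k L(u_k)\right)
\end{aligned} \tag{4.26}$$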
The terms $A_k$ and $B_k$ in Equation (4.26) are equal for all transitions from level $k-1$ to level $k$ and hence will cancel out in the ratio of Equation (4.21). Thus we use
$$\gamma_k(s',s) = \exp\!\left(\tfrac{1}{2} L_c y_{k,1} u_k + \sum_{v=2}^{n} \tfrac{1}{2} L_c y_{k,v} x_{k,v} + \tfrac{1}{2} u_k L(u_k)\right) \tag{4.27}$$
The extrinsic information can be calculated as
$$L_e(\hat u_k) = L(\hat u_k) - [L_c y_k + L(u_k)] \tag{4.28}$$
4.4.2 Log-MAP Algorithm.
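The Log-MAP algorithm is a transformation of MAP which has equivalent performance without its problems in practical implementation. It works in the logarithmic domain, where multiplication is converted to addition. The following are the calculations of the branch transition probabilities and the forward/backward recursion formulas:
$$\begin{aligned}
\gamma_k^{LM}(s',s) &= \ln\gamma_k(s',s) \\
&= \tfrac{1}{2} L_c y_{k,1} u_k + \tfrac{1}{2}\sum_{v=2}^{n} L_c y_{k,v} x_{k,v} + \tfrac{1}{2} u_k L(u_k)
\end{aligned} \tag{4.29}$$
$$\begin{aligned}
\alpha_k^{LM}(s) &= \ln\alpha_k(s) \\
&= \ln\Big(\sum_{s'} e^{\gamma_k^{LM}(s',s)}\, e^{\alpha_{k-1}^{LM}(s')}\Big) \\
&= \ln\Big(\sum_{s'} e^{\gamma_k^{LM}(s',s) + \alpha_{k-1}^{LM}(s')}\Big)
\end{aligned} \tag{4.30}$$
$$\begin{aligned}
\beta_{k-1}^{LM}(s') &= \ln\beta_{k-1}(s') \\
&= \ln\Big(\sum_{s} e^{\gamma_k^{LM}(s',s)}\, e^{\beta_k^{LM}(s)}\Big) \\
&= \ln\Big(\sum_{s} e^{\gamma_k^{LM}(s',s) + \beta_k^{LM}(s)}\Big)
\end{aligned} \tag{4.31}$$
Therefore, the log-likelihood ratio is given by
$$\begin{aligned}
L(\hat u_k) &= \ln\frac{\sum_{(s',s),\,u_k=+1} e^{\gamma_k^{LM}(s',s)}\, e^{\alpha_{k-1}^{LM}(s')}\, e^{\beta_k^{LM}(s)}}{\sum_{(s',s),\,u_k=-1} e^{\gamma_k^{LM}(s',s)}\, e^{\alpha_{k-1}^{LM}(s')}\, e^{\beta_k^{LM}(s)}} \\
&= \ln\Big(\sum_{(s',s),\,u_k=+1} e^{\gamma_k^{LM}(s',s)}\, e^{\alpha_{k-1}^{LM}(s')}\, e^{\beta_k^{LM}(s)}\Big) - \ln\Big(\sum_{(s',s),\,u_k=-1} e^{\gamma_k^{LM}(s',s)}\, e^{\alpha_{k-1}^{LM}(s')}\, e^{\beta_k^{LM}(s)}\Big)
\end{aligned} \tag{4.32}$$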
Max Function. Define
$$E(x,y) = \ln(e^x + e^y) \tag{4.33}$$
$$\begin{aligned}
\ln(e^x + e^y) &= \ln e^x + \ln(e^x + e^y) - \ln e^x \\
&= x + \ln\frac{e^x + e^y}{e^x} \\
&= x + \ln(1 + e^{y-x})
\end{aligned} \tag{4.34}$$
In a similar way,
$$\begin{aligned}
\ln(e^x + e^y) &= \ln e^y + \ln(e^x + e^y) - \ln e^y \\
&= y + \ln(1 + e^{x-y})
\end{aligned} \tag{4.35}$$
Hence
$$E(x,y) = \ln(e^x + e^y) = \max(x,y) + \ln(1 + e^{-|x-y|}) \tag{4.36}$$
and we take
$$E(x,y) = \ln(e^x + e^y) \approx \max(x,y) \tag{4.37}$$
We can easily prove that in general
$$\begin{aligned}
E(x_1, x_2, \ldots, x_k) = \ln\sum_{i=1}^{k} e^{x_i} &= \max(x_i) + \ln\sum_{i=1}^{k} e^{x_i - \max(x_i)} \\
&= \max(x_i) + \delta(x_1, x_2, \ldots, x_k) \\
&= \max{}^{*}(x_i)
\end{aligned} \tag{4.38}$$
where $\delta(x_1, x_2, \ldots, x_k)$ is called the correction term and can be computed using a look-up table.
Using equation (4.38), the calculations of the MAP algorithm are done without its complexity.
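The identity in (4.36) and its pairwise extension to several arguments can be checked numerically. This is a sketch with hypothetical function names; the correction term is evaluated exactly here, whereas a hardware design would typically read it from a small look-up table.

```python
import math


def max_star(x, y):
    """Exact Jacobian logarithm: ln(e^x + e^y) = max(x, y) + ln(1 + e^-|x - y|)."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def max_star_many(values):
    """ln(sum(e^v)) computed by applying max_star pairwise."""
    result = values[0]
    for v in values[1:]:
        result = max_star(result, v)
    return result


xs = [1.5, -0.3, 2.0]
exact = math.log(sum(math.exp(v) for v in xs))
print(abs(max_star_many(xs) - exact) < 1e-12)            # True
print(max(xs) <= exact <= max(xs) + math.log(len(xs)))   # True: the correction is bounded
```

The correction term lies between 0 and $\ln k$ for $k$ arguments, which is why dropping it costs little when one argument clearly dominates.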
4.4.3 Max-Log-Map Algorithm
With the max function, the Log-MAP algorithm becomes the Max-Log-MAP algorithm, resulting in some degradation in performance but a drastic reduction in computational complexity. The correction term in equation (4.38) is neglected:
$$E(x_1, x_2, \ldots, x_k) \approx \max(x_i) \tag{4.39}$$
$$A_k(s) = \alpha_k^{MLM}(s) = \max_{s'}\big(\gamma_k^{LM}(s',s) + \alpha_{k-1}^{MLM}(s')\big) \tag{4.40}$$
$$B_{k-1}(s') = \beta_{k-1}^{MLM}(s') = \max_{s}\big(\gamma_k^{LM}(s',s) + \beta_k^{MLM}(s)\big) \tag{4.41}$$
$$L(\hat u_k) = \max_{(s',s),\,u_k=+1}\big[\gamma_k^{LM}(s',s) + \alpha_{k-1}^{MLM}(s') + \beta_k^{MLM}(s)\big] - \max_{(s',s),\,u_k=-1}\big[\gamma_k^{LM}(s',s) + \alpha_{k-1}^{MLM}(s') + \beta_k^{MLM}(s)\big] \tag{4.42}$$
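The recursions (4.40) to (4.42) can be sketched for the small memory-2 RSC code of Figure 4.4. Everything below is an illustrative assumption rather than the project's decoder: the function names, the mapping of bit 1 to +1 and bit 0 to -1, the value $L_c = 2$, the noiseless channel, and the equally likely end states of an unterminated trellis.

```python
NEG_INF = float("-inf")


def build_trellis():
    """Branches (s_prev, s_next, u, parity) of the memory-2 RSC code of Figure 4.4."""
    branches = []
    for s in range(4):
        s1, s2 = s >> 1, s & 1
        for u in (0, 1):
            a = u ^ s1 ^ s2                                  # feedback, g0(D) = 1 + D + D^2
            branches.append((s, (a << 1) | s1, u, a ^ s2))   # parity, g1(D) = 1 + D^2
    return branches


BRANCHES = build_trellis()


def bpsk(bit):
    return 1.0 if bit else -1.0


def max_log_map(lc_ys, lc_yp, la):
    """Return a posteriori LLRs from channel values L_c*y and a priori LLRs."""
    n = len(lc_ys)
    # Branch metrics, equation (4.29) with two coded bits per trellis step.
    gamma = [[0.5 * (lc_ys[k] * bpsk(u) + lc_yp[k] * bpsk(p) + la[k] * bpsk(u))
              for (_, _, u, p) in BRANCHES] for k in range(n)]
    # Forward recursion (4.40); the encoder starts in state 0.
    alpha = [[0.0, NEG_INF, NEG_INF, NEG_INF]]
    for k in range(n):
        nxt = [NEG_INF] * 4
        for b, (sp, sn, _, _) in enumerate(BRANCHES):
            nxt[sn] = max(nxt[sn], alpha[k][sp] + gamma[k][b])
        alpha.append(nxt)
    # Backward recursion (4.41); unterminated trellis, all end states equally likely.
    beta = [[0.0] * 4 for _ in range(n + 1)]
    for k in range(n - 1, -1, -1):
        prev = [NEG_INF] * 4
        for b, (sp, sn, _, _) in enumerate(BRANCHES):
            prev[sp] = max(prev[sp], gamma[k][b] + beta[k + 1][sn])
        beta[k] = prev
    # Soft output (4.42): best u = 1 branch minus best u = 0 branch.
    llrs = []
    for k in range(n):
        best = {0: NEG_INF, 1: NEG_INF}
        for b, (sp, sn, u, _) in enumerate(BRANCHES):
            best[u] = max(best[u], alpha[k][sp] + gamma[k][b] + beta[k + 1][sn])
        llrs.append(best[1] - best[0])
    return llrs


def encode(bits):
    state, parity = 0, []
    for u in bits:
        _, state, _, p = BRANCHES[2 * state + u]
        parity.append(p)
    return parity


bits = [1, 0, 1, 1, 0, 0, 1, 0]
lc = 2.0                                         # hypothetical channel reliability L_c
ys = [lc * bpsk(b) for b in bits]                # noiseless systematic channel values
yp = [lc * bpsk(p) for p in encode(bits)]        # noiseless parity channel values
llr = max_log_map(ys, yp, la=[0.0] * len(bits))
extrinsic = [l - y - 0.0 for l, y in zip(llr, ys)]   # equation (4.28) with L(u) = 0
print([1 if l > 0 else 0 for l in llr] == bits)      # True
```

Replacing each `max` by the exact max* operation of (4.38) would turn the same structure into a Log-MAP decoder.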
4.5 Improvements In Turbo Decoding
4.5.1 Extrinsic Information Scaling
Extrinsic information is calculated as shown in equation (4.15):
$$L_e^2(\hat u) = L_2(\hat u) - [L_c y + L_e^1(\hat u)] \tag{4.43}$$
We add a scaling factor $s$ as shown:
$$L_e^2(\hat u) = \big\{L_2(\hat u) - [L_c y + L_e^1(\hat u)]\big\}\cdot s \tag{4.44}$$
Figure 4.9 shows the performance of the best evaluated scaling factor compared to the standard algorithm ($s = 1$) for block length 5114 and AWGN. For a bit error rate of $10^{-6}$, the improvement of the Max-Log-MAP (MLMAP) algorithm is 0.3 dB, and the difference between MLMAP and MAP is now only 0.1 dB. It is assumed that the scaling factor reduces the correlation between extrinsic and systematic symbols, which comes from the approximation of equation (4.37).
Figure 4.9: Turbo code with different scaling factors; block length 5114 bits, 8 iterations, AWGN
4.5.2 The Sliding Window Soft Input Soft Output Decoder
The SISO algorithm requires that the whole sequence has been received before starting the smoothing process. The reason is the backward recursion, which starts from the (supposedly known) final trellis state. As a consequence, its practical application is limited to the case when the duration of the transmission is short (n small).
A more flexible decoding strategy is offered by modifying the algorithm in such a way that the SISO module operates on a fixed memory span and outputs the smoothed probability distributions after a given delay, D.
We propose three versions of the Sliding Window SISO that differ in the way they overcome the problem of initializing the backward recursion without waiting for the entire sequence.
Use $\alpha_k^{MLM}$. We compute the forward recursion using equation (4.40). At time $k > D$ we initialize $\beta_k^{MLM}$ as follows:
$$\beta_k^{MLM} = \alpha_k^{MLM} \tag{4.45}$$
Use equiprobable $\beta_k^{MLM}$. We compute the forward recursion using equation (4.40). At time $k > D$ we initialize $\beta_k^{MLM}$ as follows:
$$\beta_k^{MLM} = \frac{1}{N} \tag{4.46}$$
where $N$ is the number of states.
Use 2 Backward Recursion Units. This solution is based on three recursion units (RUs): two used for the backward recursion ($RU_{B1}$ and $RU_{B2}$) and one forward unit ($RU_A$). Each RU contains operators working in parallel so that one recursion can be performed in one clock cycle. The horizontal axis in Figure 4.10 represents time, in units of a symbol period. The vertical axis represents the received symbol. Thus, the curve (x = y) shows that, at time $t = k$, the symbol $y_k$ becomes available. Let us describe how the $L$ symbols $y_k$, $L \le k < 2L$, are decoded (segment I of Fig. 4.10). From $t = 3L$ to $4L - 1$, $RU_{B1}$ performs $L$ recursions, starting from $y_{3L-1}$ down to $y_{2L}$ (segment II of Fig. 4.10). This process is initialized with the all-zero state vector, but after $L$ iterations convergence is reached and $B_{2L}$ is obtained. During those same $L$ cycles, $RU_A$ generates the vectors $A_k$, $L \le k < 2L$ (segment III of Fig. 4.10). These $A_k$ vectors are stored in the state vector memory (SVM) until they are needed for the LLR computation (grey area of Fig. 4.10). Then, between $t = 4L$ and $5L - 1$, $RU_{B1}$ starts from state $B_{2L}$ to compute $B_{2L-1}$ down to $B_L$ (segment IV of Fig. 4.10). At each cycle, the vector $A_k$ corresponding to the computed $B_k$ is extracted from the memory in order to compute $L(\hat u_k)$. Finally, between $t = 5L$ and $6L - 1$, the data are reordered (segment V of Fig. 4.10) using a memory for reversing the LLR (light grey area of Fig. 4.10). The same process is then reiterated every $L$ cycles, as shown in Fig. 4.10.
Figure 4.10: Graphical representation of a real-time MAP architecture
4.5.3 Stopping Criteria for Turbo Decoding
Iterative decoding is a key feature of turbo codes. Each decoding iteration results in additional computations and decoding delay. As the decoding approaches the performance limit of a given turbo code, any further iteration results in very little improvement. Often, a fixed number M is chosen and each frame is decoded for M iterations. Usually M is set with the worst corrupted frames in mind. Most frames need fewer iterations to converge. Therefore, it is important to devise an efficient criterion to stop the iteration process and prevent unnecessary computations and decoding delay.
HDA. Although iterative decoding improves the LLR value for each information bit through iterations, the hard decision of the information bit is ultimately made based on the sign of its LLR value. The hard decisions of the information sequence at the end of each iteration provide information on the convergence of the iterative decoding process.
At iteration $(i-1)$, we store the hard decisions of the information bits based on $L_2^{(i-1)}(\hat u)$ and check them against the hard decisions based on $L_2^{(i)}(\hat u)$ at iteration $i$. If they agree with each other for the entire block, we simply terminate the iterative process at iteration $i$. This stopping criterion is called the hard-decision-aided (HDA) criterion.
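The HDA rule amounts to comparing two bit vectors per iteration. A minimal sketch follows; the function names and the LLR values are invented for the example, and a positive LLR is taken to mean bit 1.

```python
def hard_decisions(llrs):
    """Map each LLR to a bit: positive means u = +1 (bit 1 here), otherwise bit 0."""
    return [1 if l > 0 else 0 for l in llrs]


def hda_stop_iteration(llr_per_iteration):
    """Return the first iteration i (1-based) whose hard decisions equal those of
    iteration i - 1, or None if that never happens."""
    previous = None
    for i, llrs in enumerate(llr_per_iteration, start=1):
        current = hard_decisions(llrs)
        if previous is not None and current == previous:
            return i
        previous = current
    return None


# Made-up L_2 values for a 4-bit block over four iterations.
history = [
    [0.4, -0.2, 0.1, -0.9],
    [1.1, 0.3, -0.5, -1.8],
    [2.5, 1.2, -1.9, -3.0],
    [4.0, 2.7, -3.3, -5.1],
]
print(hda_stop_iteration(history))  # 3
```

Only one block of hard decisions has to be stored between iterations, which keeps the memory cost of the criterion small.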
IHDA. Although iterative decoding improves the LLR value $L(\hat u_k)$ for each information bit through iterations, the hard decision of the information bit is ultimately made based on the sign of its LLR value. From repeated simulations, it was observed that, as the number of iterations increases, for a good (easy to decode) frame, the magnitudes of the LLRs gradually become larger. Since the term $L_c y$ is fixed for every iteration, the increase in the magnitudes of the LLRs is due to increases in the magnitudes of the extrinsic information. Since the extrinsic information keeps increasing as the iteration number $i$ increases, it is conceivable that, as the decoding converges to the final stage, the hard decision based on $L_c y + L_{e1}^{(i)}(\hat u)$ from the first component decoder should agree with the hard decision based on the LLR at the output of the second component decoder, according to the following equation:
$$L_2(\hat u) = L_c y + L_e^1(\hat u) + L_e^2(\hat u) \tag{4.47}$$
At iteration $i$, compare the hard decisions of the information bits based on $L_c y + L_{e1}^{(i)}(\hat u)$ with the hard decisions based on $L_2^{(i)}(\hat u)$. If they agree with each other for the entire block, terminate the iterative process at iteration $i$.
4.5.4 Modulo Normalization
In a SISO decoder, both $A_k(s)$ and $B_k(s)$ grow in magnitude as the recursions proceed. Without normalization, overflow may occur when the data width is finite. To avoid overflow, $A_k(s)$ may be normalized by subtracting a constant from all the metrics at a given time, and the same is true for $B_k(s)$. This is possible because the soft output depends only on the differences between path metrics, not on their magnitudes. Usually, such subtractive normalization is done according to
$$\bar A_k(s) = A_k(s) - \max_{s'}\big(A_k(s')\big), \quad \forall s \tag{4.48}$$
$$\bar B_k(s) = B_k(s) - \max_{s'}\big(B_k(s')\big), \quad \forall s \tag{4.49}$$
where $\bar A_k$ and $\bar B_k$ are path metrics normalized by subtraction. This technique requires extra computations to find the maxima and perform the subtractions, and it increases latency.
Figure 4.11: Average number of iterations for various stopping schemes
Modulo normalization can be implemented inherently by employing two's complement arithmetic. There are two conditions for using it: 1) the difference between path metrics is bounded; 2) path selection depends only on path metric differences. Both conditions are shown to be true in [10].
The idea behind modulo normalisation is for a metric $m_i$ to be replaced by a normalised metric $\bar m_i$:
$$\bar m_i = (m_i + C/2) \bmod C - C/2 \tag{4.50}$$
This normalisation can be represented graphically as wrapping the metric $m_i$ around a circle whose circumference equals $C$, starting from the 0 angle point and moving in the counter-clockwise direction. The range of the normalised metric is now $-C/2 \le \bar m_i < C/2$. Using this method, the comparison between two metrics is equivalent to comparing the angle between them (moving in the CCW direction) to $\pi$. An example of this is shown in Fig. 4.12, where $m_1 < m_2$ if and only if this angle is less than $\pi$. In order for this method to work correctly, the difference between the two metrics being compared has to be smaller than $C/2$, i.e. $|m_1 - m_2| < C/2$.
It is possible to show that the comparison of two normalised metrics $c(\bar m_1, \bar m_2)$ is equivalent to
$$c(\bar m_1, \bar m_2) = \bar m_1^{\,w-1} \oplus \bar m_2^{\,w-1} \oplus c_u(\tilde m_1, \tilde m_2) \tag{4.51}$$
where $\bar m_i^{\,w-1}$ is the most significant bit of the $w$-bit metric $\bar m_i$ and $c_u(\tilde m_1, \tilde m_2)$ represents an unsigned comparison of the metrics $\tilde m_1$ and $\tilde m_2$, where
$$\tilde m_i = \bar m_i \bmod C/2 \tag{4.52}$$
(the magnitude of $\bar m_i$), as shown in Figure 4.13.
Figure 4.12: Graphical example of modulo normalisation.
Figure 4.13: Hardware realisation of modulo normalisation.
4.6 LTE Standard
4.6.1 Turbo Encoder
The coding rate of the turbo encoder is 1/3. The structure of the turbo encoder is illustrated in Figure 4.14. The transfer function of the 8-state constituent code is
$$G(D) = \left[1,\ \frac{g_1(D)}{g_0(D)}\right] \tag{4.53}$$
where
$$g_0(D) = 1 + D^2 + D^3 \tag{4.54}$$
$$g_1(D) = 1 + D + D^3 \tag{4.55}$$
Figure 4.14: Structure of rate 1/3 turbo encoder (dotted lines apply for trellis termination only)
The output from the turbo encoder is
$$d_k^{(0)} = x_k \tag{4.56}$$
$$d_k^{(1)} = z_k \tag{4.57}$$
$$d_k^{(2)} = z'_k \tag{4.58}$$
4.6.2 Trellis termination for turbo encoder
Trellis termination is performed by taking the tail bits from the shift register feedback after all information bits are encoded. Tail bits are padded after the encoding of information bits. The first three tail bits shall be used to terminate the first constituent encoder (upper switch of Figure 4.14 in lower position) while the second constituent encoder is disabled. The last three tail bits shall be used to terminate the second constituent encoder (lower switch of Figure 4.14 in lower position) while the first constituent encoder is disabled. With $K$ denoting the number of information bits, the transmitted bits for trellis termination shall then be:
$$d_K^{(0)} = x_K,\quad d_{K+1}^{(0)} = z_{K+1},\quad d_{K+2}^{(0)} = x'_K,\quad d_{K+3}^{(0)} = z'_{K+1} \tag{4.59}$$
$$d_K^{(1)} = z_K,\quad d_{K+1}^{(1)} = x_{K+2},\quad d_{K+2}^{(1)} = z'_K,\quad d_{K+3}^{(1)} = x'_{K+2} \tag{4.60}$$
$$d_K^{(2)} = x_{K+1},\quad d_{K+1}^{(2)} = z_{K+2},\quad d_{K+2}^{(2)} = x'_{K+1},\quad d_{K+3}^{(2)} = z'_{K+2} \tag{4.61}$$
4.6.3 Interleaver
The bits input to the turbo code internal interleaver are denoted by $c_0, c_1, \ldots, c_{K-1}$, where $K$ is the number of input bits. The bits output from the turbo code internal interleaver are denoted by $c'_0, c'_1, \ldots, c'_{K-1}$. The relationship between the input and output bits is as follows:
$$c'_i = c_{\Pi(i)}, \quad i = 0, 1, \ldots, (K-1) \tag{4.62}$$
where the relationship between the output index $i$ and the input index $\Pi(i)$ satisfies the following quadratic form:
$$\Pi(i) = (f_1 \cdot i + f_2 \cdot i^2) \bmod K \tag{4.63}$$
The parameters $f_1$ and $f_2$ depend on the block size $K$ and are summarized in [1].
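The quadratic form (4.63) is short enough to evaluate directly. In the sketch below the function name is hypothetical, and the pair $f_1 = 3$, $f_2 = 10$ for a block of 40 bits is quoted from the parameter table of the 3GPP specification referred to as [1]; it should be treated as an assumption and checked against that table.

```python
def qpp_interleaver(K, f1, f2):
    """Pi(i) = (f1*i + f2*i^2) mod K for i = 0..K-1, equation (4.63)."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


# Block size 40 with f1 = 3, f2 = 10 (assumed values, see the lead-in).
pi = qpp_interleaver(40, 3, 10)
print(pi[:8])                           # [0, 13, 6, 19, 12, 25, 18, 31]
print(sorted(pi) == list(range(40)))    # True: every input index is used exactly once

c = list(range(40))                     # stand-in for the input bits c_0 .. c_39
c_interleaved = [c[p] for p in pi]      # c'_i = c_{Pi(i)}, equation (4.62)
```

Since the indices follow from a closed-form expression, no permutation table has to be stored; an address generator can compute them on the fly.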
4.7 Implementation of Turbo Encoder
4.7.1 Encoder
The function of the Encoder
It is used to generate the encoded bits with rate 1/3.
Turbo Encoder block diagram
The input ports of the ENCODER
1. c:
The 40 input data bits (code block length).
2. clk:
The system clock, used to synchronize the system.
3. reset:
Used to reset the whole system and the block.
The output ports of the ENCODER
1. $d_k^{(0)}$:
The systematic output of the Turbo Encoder.
2. $d_k^{(1)}$:
The first parity output of the Turbo Encoder.
3. $d_k^{(2)}$:
The second parity output of the Turbo Encoder.
4. enable:
Indicates that the output is ready at the output ports.
Turbo Encoder blocks diagram
We note that the Turbo Encoder contains seven blocks, with five main blocks:
1. PISO (parallel input, serial output).
2. The Interleaver.
3. The Convolutional code (the core of the Turbo Encoder).
4. SIPO (serial input, parallel output).
5. Trellis.
4.7.3 PISO
The function of the PISO
It converts the parallel input bits to serial bits.
PISO block diagram
The input ports of the PISO
1. d:
The 40 input data bits (code block length).
2. clk:
The system clock, used to synchronize the system.
3. reset:
Used to reset the whole system and the block.
4. f:
The feedback data that comes from the convolutional block during the switching period.
The output ports of the PISO
1. q:
The serial output bits from the PISO block.
2. $x_k$:
The 43 bits containing the systematic bits and 3 bits from the convolutional code feedback.
3. load:
A signal indicating that the output bits are available at the output port.
4. rc:
An output pulse lasting one clock cycle only.
4.7.4 Interleaver
The function of the Interleaver
It reorders the input data according to the interleaving pattern.
Interleaver block diagram
The input ports of the Interleaver
1. D:
The 40 input data bits (code block length).
2. clk:
The system clock, used to synchronize the system.
3. reset:
Used to reset the whole system and the block.
4. f:
The feedback data that comes from the convolutional code feedback during the switching period.
The output ports of the Interleaver
1. Q:
The serial output bits from the Interleaver block.
2. $x_{dk}$:
The 43-bit block containing the interleaved bits and 3 bits from the convolutional code.
3. load:
A signal indicating that the output bits are available at the output port.
4. rc:
An output pulse lasting one clock cycle only.
4.7.5 Convolutional code
The function of the Convolutional code
It is the core of the Turbo Encoder.
Convolutional block diagram
The input ports of the Convolutional
1. d:
The input port for the data bits.
2. clk:
The system clock, used to synchronize the system.
3. reset:
Used to reset the block and fill the three registers with zeros.
4. en:
Used to enable the block.
The output ports of the Convolutional
1. q:
The output encoded bits.
2. sw:
The feedback signal to the PISO and Interleaver blocks.
3. rd:
A signal indicating that the output bits are available at the output port.
4.7.6 SIPO
The function of the SIPO
It accepts serial bits and outputs a block of parallel bits.
SIPO block diagram
The input ports of the SIPO
1. d:
The serial input bits, which come from the Convolutional block.
2. clk:
The system clock, used to synchronize the system.
3. reset:
Used to reset the block.
The output ports of the SIPO
1. q:
One output block containing 43 bits.
2. Load:
A signal indicating that the output bits are available at the output port.
4.7.7 TRELLIS
The function of the TRELLIS
The function of the TRELLIS block is to form the trellis termination.
TRELLIS block diagram
The input ports of the TRELLIS
1. x_k:
It is the stream of 43 bits that comes from the PISO.
2. x_dk:
It is the stream of 43 bits that comes from the Interleaver.
3. z_k:
It is the stream of 43 bits that comes from the SIPO and represents the encoded systematic bits.
4. z_dk:
It is the stream of 43 bits that comes from the SIPO and represents the encoded interleaved bits.
5. clk:
It is the system clock, used to synchronize the block with the rest of the system.
6. reset:
It is used to reset the block.
The output ports of the TRELLIS
1. d0_k:
It represents the systematic output of the Turbo Encoder.
2. d1_k:
It represents the parity one output of the Turbo Encoder.
3. d2_k:
It represents the parity two output of the Turbo Encoder.
4. load:
It is a signal indicating that the output bits are available at the output port.
4.8 Simulations of Turbo Encoder
4.8.1 By using Modelsim and Matlab
We simulate the design using Modelsim and check the results using Matlab.
Let the 40 input bits of the Turbo Encoder be c = 0011000111011000101010111110001010100010.
Output by Matlab
d0_k = 00110001110110001010101111100010101000100010.
d1_k = 00100011100110100010011000100011000001100010.
d2_k = 00001011101001100100011110100011110011000000.
Output by Modelsim
/encodertest/c 0011000111011000101010111110001010100010
/encodertest/clk
/encodertest/reset
/encodertest/dk0 UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU 00110001110110001010101111100010101000100010
/encodertest/dk1 UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU 00100011100110100010011000100011000001100010
/encodertest/dk2 UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU 00001011101001100100011110100011110011000000
/encodertest/enable
Output simulation of the Turbo Encoder by using Modelsim
We note that the outputs from Modelsim and Matlab are identical.
4.9 Workflow for Turbo Decoder
The workflow used consists of two main steps: design and implementation. See figure 4.15.
4.9.1 Design
The LTE standard has very demanding technical requirements when it comes to frequency and round trip time. The turbo decoder is by nature a computationally intensive unit. A lot of research has been published on optimizing the turbo decoder by reducing its complexity, power consumption and latency. The aim of this phase is to design a turbo decoder that is simple and efficient. It has to be suitable for implementation on FPGA.
The design process starts with exploring the published research to find techniques to optimize the decoder. These techniques are simulated and compared using Matlab. The final decision is made based on the results obtained. See figure 4.16.
Figure 4.15: The workflow used (Design, Implementation)
Floating point arithmetic is complex and not suitable for FPGA implementation. Integer arithmetic would cause a huge performance degradation. Thus, fixed point arithmetic is the most suitable. The floating point design previously developed is quantized to obtain the fixed point design. This design is later used as the reference for the VHDL implementation. See figure 4.17.
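The quantization step described above can be illustrated with a short Python sketch. It is not the Matlab model used in the project: the function name quantize, the rounding mode and the saturation behaviour are assumptions, and the word and fraction lengths are free parameters.

```python
def quantize(x, word_len, frac_bits):
    """Map a real value to a signed fixed-point integer with saturation.

    word_len  : total number of bits, sign bit included (assumed format)
    frac_bits : number of fractional bits
    """
    scaled = int(round(x * (1 << frac_bits)))   # scale and round to an integer
    hi = (1 << (word_len - 1)) - 1              # largest representable value
    lo = -(1 << (word_len - 1))                 # smallest representable value
    return max(lo, min(hi, scaled))             # saturate instead of wrapping


def to_real(q, frac_bits):
    """Real value represented by the fixed-point integer q."""
    return q / (1 << frac_bits)


print(quantize(1.3, 8, 2), to_real(quantize(1.3, 8, 2), 2))
```

Saturation is chosen here because wrap-around would flip the sign of large metrics; whether the reference design saturates or relies on modulo normalization is not stated in this section.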
4.9.2 Implementation
The bottom-up design method was used for implementing the decoder. The smaller blocks were developed first, then grouped and wired to form the top level design. The fixed point design was used as the reference. Each block was tested individually and then the whole system was verified. The workflow is shown in figure 4.18.
Figure 4.16: Steps of floating point design (Research, Simulate, Decide)
Figure 4.17: Fixed point design is obtained by quantizing the floating point design (Floating Point Design, Quantization, Fixed Point Design)
Figure 4.18: Steps of implementation (Fixed Point Design, RTL Design, RTL Verification, Synthesis and optimization, RTL vs Netlist verification, FPGA implementation and testing)
4.10 Design Phase
4.10.1 Algorithm
Two algorithms for turbo decoding were tested: MAP and Max-Log-MAP.
Figure 4.19 shows the performance of the MAP algorithm for different numbers of iterations. Figure 4.20 shows a comparison between the MAP and Max-Log-MAP algorithms. The MAP algorithm uses logarithmic functions and multiplications, so it is not suitable for FPGA. The Max-Log-MAP algorithm, on the other hand, uses only additions and the max function. So we use the Max-Log-MAP algorithm.
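The difference between the two algorithms can be seen in the basic operation that combines two metrics in the log domain. The sketch below is a generic illustration, not code from the project: max_star is the exact log-domain combination, and max_log is the Max-Log-MAP approximation that drops the correction term. Both function names are hypothetical.

```python
import math


def max_star(a, b):
    """Exact log-domain combination: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log-MAP approximation: the correction term is dropped."""
    return max(a, b)


for a, b in [(1.0, 1.0), (2.0, -3.0), (0.5, 4.0)]:
    print(a, b, max_star(a, b), max_log(a, b))
```

The dropped term is at most log(2) and shrinks as the two metrics move apart, which is why the approximation costs only a small amount of BER performance while removing the logarithm and exponential from the hardware.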
Figure 4.19: BER curves for turbo codes using MAP at different iterations (x-axis: Es/No (dB); y-axis: bit error rate; curves: uncoded bits, iter 1, iter 2, iter 3, iter 6, iter 18)
4.10.2 Extrinsic Information Scaling
The extrinsic information scaling was tested for factors of 1, 0.75 and 0.7. The results are shown in figure 4.21. The 0.7 scale shows slightly better performance than 0.75, but 0.75 is simpler to implement on FPGA. So we choose 0.75.
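One reason 0.75 is cheap in hardware is that it can be formed from shifts and one subtraction, since 0.75 x = x - x/4. The sketch below shows this on integers; it is an illustration under that assumption, not the project's VHDL, and the name scale_075 is hypothetical. Python's >> on negative integers is an arithmetic (floor) shift, which mirrors a signed shift in hardware.

```python
def scale_075(x):
    """Approximate 0.75 * x for a signed integer using one shift and one subtract."""
    return x - (x >> 2)     # x - floor(x / 4)


for x in (8, 4, 5, -8, -5):
    print(x, scale_075(x))
```

A factor of 0.7 has no such short shift-and-add form, so it would need a multiplier or a longer chain of adders.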
4.10.3 Sliding window
Three methods for the sliding window were investigated: reusing A, assuming equiprobable states, and using two B units. See figure 4.22. The method using two B units shows no performance degradation compared to the normal decoder, as shown in figure 4.23. So it is our choice for the sliding window.
4.10.4 Stopping Criteria
As seen in figure 4.24, the HDA (hard-decision-aided) criterion exhibits the best performance. So it is chosen, even though it requires a minimum of 2 iterations.
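The minimum of two iterations follows from how a hard-decision-aided rule works: it compares the hard decisions of the current iteration with those of the previous one and stops when they agree. The sketch below illustrates that idea only. The function names, the sign convention (positive LLR means bit 1) and the stand-in decoder are assumptions, not the project's implementation.

```python
def hard_decisions(llrs):
    # Assumed sign convention: positive LLR -> bit 1, otherwise bit 0.
    return [1 if llr > 0 else 0 for llr in llrs]


def decode_with_hda(run_iteration, max_iters):
    """Run iterations until two consecutive hard-decision vectors agree."""
    previous = None
    for it in range(1, max_iters + 1):
        current = hard_decisions(run_iteration(it))
        if previous is not None and current == previous:
            return current, it          # decisions are stable: stop early
        previous = current
    return previous, max_iters


# Stand-in for a real decoder: fixed LLRs per iteration (illustrative values).
fake_llrs = {1: [-1.0, 2.0, -0.5], 2: [-2.0, 3.0, 0.5], 3: [-3.0, 4.0, 1.0]}
bits, used = decode_with_hda(lambda it: fake_llrs[it], max_iters=3)
print(bits, used)
```

In this toy run the decisions change between iterations 1 and 2 and agree between iterations 2 and 3, so decoding stops after the third iteration.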
Figure 4.20: Comparison between Max-Log-MAP and MAP BER curves (interleaver size = 1088, number of iterations = 3; curves: Max, Map)
Figure 4.21: Comparison between different scaling factors (interleaver size = 1088, number of iterations = 3; plot title: scaling vs no scaling, iter = 3; curves: scale = 1, scale = 0.75, scale = 0.7)
4.10.5 Internal word length
Figure 4.25 shows the effect of changing the word length used for the internal calculations of the decoder on the BER. As seen in the figure, the BER stops decreasing from a word length of 11 upward, so we choose a word length of 11. Compared to floating point (figure 4.26), there is approximately no increase in BER.
Figure 4.22: Comparison between different sliding window techniques (interleaver size = 1088, number of iterations = 3; x-axis: Eb/No; y-axis: BER; curves: a reuse, Equiprobable, Dummy b)
Figure 4.23: Comparison between two B units and no sliding window (interleaver size = 1088, number of iterations = 3; x-axis: Eb/No; y-axis: BER; curves: normal, SW dummy B)
Figure 4.24: Comparison between different early stopping criteria (x-axis: Eb/No (dB); y-axis: number of iterations; curves: HDA, IHDA, GENIE)
Figure 4.25: Relation between BER and internal word length of the turbo decoder at SNR -9.16 dB and 2 iterations (x-axis: word length, 9 to 14; y-axis: BER, 0.0326 to 0.0342)
Figure 4.26: Comparison between floating point and fixed point turbo decoder with internal width of 11 (interleaver size = 1088, number of iterations = 2; plot title: fixed wld=8 wl=11 vs floating; x-axis: Es/No (dB); y-axis: bit error rate; curves: Fixed, Floating)
4.11 Implementation of Map Decoder
4.11.1 Architecture
Figure 4.27 shows the top level architecture of the map decoder.
Figure 4.27: High-level VLSI architecture of the implemented max-log map decoder (thin boxes indicate registers). Blocks shown: BMU_column, gamma Ram, ACS_elem (a_column), aRam, aExt, ACS_elem (b_column), bExt, calcLe, ysRam, leRam; signals: yp, ys, LeIn, LeOut, decision.
4.11.2 Timing
First, gamma is calculated. After the first value of gamma is calculated, the corresponding alpha gets calculated. At the last value of gamma, the beta calculation starts, followed directly by the extrinsic value calculations. The timing diagram for the map decoder is shown in figure 4.29.
4.12 Implementation of Turbo Decoder
4.12.1 Architecture
Figure 4.28 shows the top level architecture of the turbo decoder.
Figure 4.28: High-level VLSI architecture of the implemented turbo decoder. Labels shown: y1pR, y2pR, Ram, mapDec, Inter, Deinter, Le, interYs, decisionDeint, ysRam, Interface, LeR, din, op.
4.12.2 Timing
First, the inputs are read and stored in ysRam, y1pRam and y2pRam. The trellis termination bits are read into ttRam. In the following cycles, the values stored in ttRam are written into the proper RAM after reordering them. During this time the Le input is equal to zero. ysRam is interfaced so that y1s and y2s can be read. During the initial write, data are read into the map decoder unit, and its clock is disabled until the trellis termination is finished; the map operation then continues until it is finished. The extrinsic values output from mapDec are written to LeRam and are read back in interleaved order for the second stage. The timing diagram for the turbo decoder is shown in figure 4.30.
Figure 4.29: The timing diagram of the implemented map decoder (tasks: 1. read and write inputs (ys, yp, Le); 2. branch metrics calculation (gamma); 3. forward metrics calculation (alpha); 4. read branch metrics; 5. read forward metrics; 6. calculate backward metrics; 7. calculate extrinsic values)
Figure 4.30: The timing diagram of the implemented turbo decoder (tasks: 1. read and write inputs (ys, y1p, Le); 2. start map decoder stage 1; 3. write data into trellis termination ram; 4. write data into proper ram and location; 5. finish map stage 1; 6. write Le; 7. read Le and ys interleaved and y2p; 8. start map decoder stage 2)
4.12.3 Power
The detailed power estimation is shown in table 4.1 and the summary in table 4.2. As seen from the tables, the leakage power constitutes the majority of the estimated power consumption.

On-Chip     Power (W)
Clocks      0.092
Logic       0
Signals     0.001
BRAMs       0.031
IOs         0
Leakage     1.191
Total       1.315
Table 4.1: Detailed power consumption

Type        Power (W)
Quiescent   1.191
Dynamic     0.124
Total       1.315
Table 4.2: Summary of power consumption
4.12.4 Resource utilization
Table 4.3 shows the Virtex 5 resources consumed by the design. Notice that these are just a small fraction of the resources available. Figure 4.31 shows the design after place and route.
4.12.5 Throughput
Table 4.4 shows the throughput of the implemented decoder.
4.12.6 BER
Figure 4.32 shows the BER performance of the decoder. Unfortunately, only one iteration has been implemented.
Resource              Usage
LUT/FF Pairs          2,447
Slice LUTs            2,171
Slice Registers       1,178
Block RAMs (36k)      2
Block RAMs (18k)      8
Max Clock Freq        201.295 MHz
Table 4.3: Resources utilization

Number of Cycles      210
Throughput            38.09 MHz
Table 4.4: Throughput of the implemented design
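The throughput figure can be related to the cycle count with a simple estimate. The relation below is only one reading that is numerically consistent with the table; the block length of K = 40 bits and the clock of 200 MHz are assumptions, since this section does not state which values were used.

```latex
\[
\text{throughput} \approx \frac{K \cdot f_{clk}}{N_{cycles}}
  = \frac{40 \times 200\ \text{MHz}}{210} \approx 38.1\ \text{Mbit/s}
\]
```

Under these assumptions the result agrees with the 38.09 listed in table 4.4, which suggests the value is a bit rate even though the table gives it in MHz.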
Figure 4.31: The placed and routed design on FPGA
Figure 4.32: BER curves for the implemented decoder (x-axis: Eb/No (dB); y-axis: bit error rate; curves: iter 1, iter 2, iter 3, iter 6, iter 18)
Bibliography
[1] 3GPP. Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and channel coding. TS 36.212, 3rd Generation Partnership Project (3GPP), January 2010.
[2] Christoph Studer, Schekeb Fateh, Christian Benkeser, and Qiuting Huang. Implementation trade-offs of soft-input soft-output MAP decoders for convolutional codes. 2007.
[3] Jelena Dragas. Design trade-offs in the VLSI implementation of high-speed Viterbi decoders and their application to MLSE in ISI cancellation. Master's thesis, Institut für Integrierte Systeme (Integrated Systems Laboratory), March 2011.
[4] Emmanuel Boutillon, Warren J. Gross, and P. Glenn Gulak. VLSI architectures for the MAP algorithm. IEEE Transactions on Communications, 51(2), 2003.
[5] M. R. Soleymani, Yingzi Gao, and U. Vilaipornsawai. Turbo Coding for Satellite and Wireless Communications. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, 2002.
[6] T. M. N. Ngatched and F. Takawira. Simple stopping criterion for turbo decoding. Electronics Letters, 37(22), 2001.
[7] Rose Y. Shao, Shu Lin, and Marc P. C. Fossorier. Two simple stopping criteria for turbo decoding. IEEE Transactions on Communications, 47(8), 1999.
[8] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara. A soft-input soft-output maximum a posteriori (MAP) module to decode parallel and serial concatenated codes. Technical report, TDA Progress Report, 1996.
[9] J. Vogt and A. Finger. Improving the max-log-MAP turbo decoder. Electronics Letters, 36(23), November 2000.
[10] Yufei Wu, Brian D. Woerner, and T. Keith Blankenship. Data width requirements in SISO decoding with modulo normalization. IEEE Transactions on Communications, 49(11), November 2001.
Chapter 5
RATE MATCHING
The Rate-Matching (RM) algorithm selects bits for transmission from the rate 1/3 turbo coder output via puncturing and/or repetition. Since the number of bits for transmission is determined by the available physical resources, the RM should be capable of generating puncturing patterns for arbitrary rates. Furthermore, the RM should send as many new bits as possible in retransmissions to maximize the Incremental Redundancy (IR) HARQ gains. The main contenders for LTE RM were to use the same (or a similar) algorithm as HSPA, or to use Circular Buffer (CB) RM as in CDMA2000 1xEV and WiMAX, as shown in figure 5.1.
Figure 5.1: Circular-buffer rate matching for turbo
5.1 Subblock interleaving
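The bits input to the block interleaver are denoted by:
\[ d^{(i)}_0, d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{D-1} \]
where D = K + 4 is the number of bits for each of the systematic, parity 1 and parity 2 streams. Note that K is the number of bits within a codeblock with bits $x_k$, $k = 0, 1, 2, \ldots, K-1$, and trellis termination adds four bits to each of the systematic, parity 1 and parity 2 streams. The sub-block interleaving is achieved by writing row-wise into a rectangular matrix, applying a permutation to the matrix columns and finally reading from the matrix column-wise. The number of columns in the matrix is fixed to 32, that is
\[ C_{subblock} = 32 \]
The number of rows of the matrix, $R_{subblock}$, is the smallest integer such that:
\[ D \le R_{subblock} \times C_{subblock} \]
Then
\[ R_{subblock} = \left\lceil \frac{D}{C_{subblock}} \right\rceil \]
When the number of bits D does not completely fill the rectangular matrix, dummy bits are padded to fully fill the matrix as below:
\[ N_D = R_{subblock} \times C_{subblock} - D \]
Note that the maximum number of dummy bits is limited to $C_{subblock} - 1 = 31$, and these bits are added to the beginning of the stream. Also, note that when D is an integer multiple of $C_{subblock}$, no dummy bits need to be added, as the total D bits fully fill the matrix in this case. The input bit sequence, $y_k = \text{NULL}$ for $k < N_D$ and $y_{N_D + k} = d^{(i)}_k$, is then written into the $R_{subblock} \times C_{subblock}$ rectangular matrix row by row, starting with bit $y_0$ in column 0 of row 0, as below:
\[
\begin{bmatrix}
y_0 & y_1 & \cdots & y_{C_{subblock}-1} \\
y_{C_{subblock}} & y_{C_{subblock}+1} & \cdots & y_{2C_{subblock}-1} \\
\vdots & \vdots & \ddots & \vdots \\
y_{(R_{subblock}-1)C_{subblock}} & y_{(R_{subblock}-1)C_{subblock}+1} & \cdots & y_{R_{subblock}C_{subblock}-1}
\end{bmatrix}
\]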
5.2 Permutation
The turbo code tail bits are uniformly distributed into the three streams, with all streams the same size. Each sub-block interleaver is based on the traditional row-column interleaver with 32 columns (for all block sizes) and a simple length-32 inter-column permutation.
A length-32 column permutation is applied and the bits are read out column by column to form the output of the sub-block interleaver for the systematic and parity 1 streams:
[0, 16, 8, 24, 4, 20, 12, 28, 2, 18, 10, 26, 6, 22, 14, 30, 1, 17, 9, 25, 5, 21, 13, 29, 3, 19, 11, 27, 7, 23, 15, 31]
For the parity 2 stream, the output of the sub-block interleaver is $v^{(2)}_k = y_{\pi(k)}$, with the permutation given by the equation
\[ \pi(k) = \left( P\left( \left\lfloor \frac{k}{R_{subblock}} \right\rfloor \right) + C_{subblock} \cdot (k \bmod R_{subblock}) + 1 \right) \bmod K_\Pi, \qquad K_\Pi = R_{subblock} \times C_{subblock} \]
where P is the column permutation above.
This leads to the foremost advantage of the LTE CB approach: it enables efficient HARQ operation, because the CB operation can be performed without an intermediate step of forming any actual physical buffer. In other words, for any combination of the 188 stream sizes and 4 RV values, the desired codeword bits can be obtained directly from the output of the turbo encoder using simple addressing based on the sub-block permutation. Therefore the term Virtual Circular Buffer (VCB) is more appropriate in LTE. The LTE VCB operation also allows Systematic Bit Puncturing (SBP) by defining RV = 0 to skip the first $2 R_{subblock}$ bits, which leads to approximately six percent of the systematic bits being punctured (with no wrap around).
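The sub-block interleaving of sections 5.1 and 5.2 can be sketched in a few lines of Python. This is an illustration, not the project's Matlab or VHDL code: the function name subblock_interleave is hypothetical, dummy bits are written as the character 9 (the convention used in the Matlab results of section 5.7), and the parity 2 offset follows the permutation formula of TS 36.212.

```python
# Column permutation pattern for the sub-block interleaver.
P = [0, 16, 8, 24, 4, 20, 12, 28, 2, 18, 10, 26, 6, 22, 14, 30,
     1, 17, 9, 25, 5, 21, 13, 29, 3, 19, 11, 27, 7, 23, 15, 31]
C = 32          # number of columns
NULL = "9"      # dummy-bit marker (same convention as the Matlab results)


def subblock_interleave(d, parity2=False):
    """Interleave one stream given as a string of '0'/'1' characters."""
    rows = -(-len(d) // C)                 # ceil(D / 32)
    n_dummy = rows * C - len(d)            # dummy bits padded at the front
    y = NULL * n_dummy + d
    k_pi = rows * C
    if parity2:
        # Parity 2: same pattern, read with an offset of one position.
        return "".join(y[(P[k // rows] + C * (k % rows) + 1) % k_pi]
                       for k in range(k_pi))
    # Systematic and parity 1: read permuted columns top to bottom.
    return "".join(y[r * C + P[j]] for j in range(C) for r in range(rows))


d0 = "00110001110110001010101111100010101000100010"
d2 = "00001011101001100100011110100011110011000000"
print(subblock_interleave(d0))
print(subblock_interleave(d2, parity2=True))
```

For the two 44-bit streams above, which are the d0_k and d2_k streams of the turbo encoder simulation, the sketch prints the same v0_k and v2_k strings that section 5.7 reports from Matlab.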
5.3 Subblock interlacing
The circular buffer length is $K_w = 3K_\Pi$, where $K_\Pi$ is the number of interleaved bits in each of the systematic, parity 1 and parity 2 streams. The bit stream in the circular buffer is denoted as $w_0, w_1, w_2, \ldots, w_{K_w - 1}$ and is given as:
\[ w_k = v^{(0)}_k, \qquad k = 0, 1, 2, \ldots, (K_\Pi - 1) \]
\[ w_{K_\Pi + 2k} = v^{(1)}_k, \qquad k = 0, 1, 2, \ldots, (K_\Pi - 1) \]
\[ w_{K_\Pi + 2k + 1} = v^{(2)}_k, \qquad k = 0, 1, 2, \ldots, (K_\Pi - 1) \]
Figure: subblock interleaver
It should be noted that the subblock interlacing is only performed between the parity 1 and parity 2 bits, as shown in the figure. The systematic bits are not interlaced. The reason is that the systematic bits are generally part of the first hybrid ARQ transmission. In response to a hybrid ARQ NACK, for example, subblock interlacing guarantees that an equal amount of parity 1 and parity 2 bits is transmitted.
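The three buffer equations translate directly into code. The following Python sketch is only an illustration (the name bit_collection is hypothetical); it takes the three interleaved streams as strings, with 9 marking a dummy bit as in the Matlab results of section 5.7.

```python
def bit_collection(v0, v1, v2):
    """Build the circular buffer: systematic first, then parity 1/2 interlaced."""
    w = list(v0)                      # w[k] = v0[k]
    for p1, p2 in zip(v1, v2):        # w[K+2k] = v1[k], w[K+2k+1] = v2[k]
        w.append(p1)
        w.append(p2)
    return "".join(w)


# Interleaved streams taken from the Matlab results in section 5.7.3.
v0 = "9190910091019110909191019111910190909000900091109090911090109010"
v1 = "9190900090009010919191119110910190909101900090009091901090009010"
v2 = "9190910191019000909191109000900091919110900091109001911090119009"
w = bit_collection(v0, v1, v2)
print(len(w))
print(w[64:80])
```

For these inputs the buffer holds 192 entries, and the first sixteen entries after the systematic part are 9911990099010001, that is, parity 1 and parity 2 entries in alternation.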
5.4 Hybrid ARQ soft buer limitation
The soft buffer size for the rth code block, $N_{cb}$, is given as:
\[
N_{cb} =
\begin{cases}
\min\left( \left\lfloor \dfrac{N_{IR}}{C} \right\rfloor, K_w \right) & \text{downlink} \\
K_w & \text{uplink}
\end{cases}
\]
where C is the number of codeblocks within the transport block and $K_w$ is the circular buffer size for the rth codeblock. $N_{IR}$ is the soft buffer size per codeword per hybrid ARQ process available at the UE and is given as:
\[ N_{IR} = \left\lfloor \frac{N_{soft}}{K_{MIMO} \cdot \min(M_{DL\_HARQ}, M_{limit})} \right\rfloor \]
where $N_{soft}$ is the total soft buffer size, which is set by higher layers. $K_{MIMO} = 1, 2$ for the case of single-codeword and dual-codeword MIMO spatial multiplexing respectively. $M_{DL\_HARQ} = 8$ is the maximum number of hybrid ARQ processes and $M_{limit} = 8$ (TS 36.212 [1]).
We note that the soft buffer limitation only applies to the downlink, due to soft buffering concerns for the UE receiver. In the uplink, there is no soft buffer limitation for the eNB and hence incremental redundancy can always be used. The soft buffer size is directly proportional to the supported data rate and inversely proportional to the turbo coding rate. The idea behind the soft buffer limitation is that if the UE has a certain buffer size dimensioned for a given data rate and a given coding rate, then it can support either higher data rates with an increased coding rate (weaker code) or lower data rates with a stronger code.
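The two buffer-size formulas can be evaluated with the short Python sketch below. It is an illustration under stated assumptions: the function names are hypothetical, M_limit is taken as 8 (the constant in TS 36.212), and the value of N_soft used in the example is an arbitrary illustrative number, not a value taken from this project.

```python
def soft_buffer_per_process(n_soft, k_mimo, m_dl_harq=8, m_limit=8):
    """N_IR: soft buffer size per codeword and per hybrid ARQ process."""
    return n_soft // (k_mimo * min(m_dl_harq, m_limit))


def code_block_buffer(k_w, num_code_blocks, downlink, n_ir=None):
    """N_cb: limited by the UE soft buffer in the downlink only."""
    if not downlink:
        return k_w                       # no limitation at the eNB
    return min(n_ir // num_code_blocks, k_w)


n_ir = soft_buffer_per_process(n_soft=250368, k_mimo=1)   # illustrative N_soft
print(n_ir)
print(code_block_buffer(192, 1, downlink=True, n_ir=n_ir))
print(code_block_buffer(18528, 3, downlink=True, n_ir=n_ir))
print(code_block_buffer(18528, 3, downlink=False))
```

With a small buffer such as K_w = 192 the limitation is inactive and N_cb = K_w; with three large code blocks (K_w = 18528) the downlink buffer is cut to 10432 entries, so only part of the circular buffer can be addressed.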
5.5 RV starting points
The transmission of bits from two codeblocks of the same transport block within a single resource element is avoided by first defining $G'$ as:
\[ G' = \frac{G}{N_L \cdot Q_m} \]
where G is the total number of bits available for the transmission of one transport block and $Q_m = 2, 4, 6$ for QPSK, 16 QAM and 64 QAM respectively. $N_L = 1$ for transport blocks mapped onto one MIMO transmission layer and $N_L = 2$ for transport blocks mapped onto two or four MIMO transmission layers.
Let us now set:
\[ \gamma = G' \bmod C \]
The rate-matching output sequence length E for the rth coded block is then given as:
\[
E =
\begin{cases}
N_L \cdot Q_m \cdot \left\lfloor G'/C \right\rfloor, & r \le C - \gamma - 1 \\
N_L \cdot Q_m \cdot \left\lceil G'/C \right\rceil, & \text{otherwise}
\end{cases}
\]
We note that some codeblocks may need to use one fewer resource element and some others one more resource element, to avoid mixing bits from two codeblocks of the same transport block in the same resource element. It should also be noted that the rate-matching output sequence length E is determined independently of the codeblock size. We also note that the codeblocks with lower index $r \le C - \gamma - 1$ may use one fewer resource element than the codeblocks with higher index $r > C - \gamma - 1$.
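A small Python sketch makes the split of G between code blocks concrete. The function name and the numbers in the example (G = 1000 bits, C = 3 code blocks, one layer, QPSK) are illustrative assumptions, not values from the project.

```python
def rate_matching_lengths(g, num_code_blocks, n_l, q_m):
    """Output length E for each code block r = 0 .. C-1."""
    g_prime = g // (n_l * q_m)               # G' = G / (N_L * Q_m)
    gamma = g_prime % num_code_blocks        # blocks that get one extra RE
    low = n_l * q_m * (g_prime // num_code_blocks)          # floor(G'/C)
    high = n_l * q_m * (-(-g_prime // num_code_blocks))     # ceil(G'/C)
    return [low if r <= num_code_blocks - gamma - 1 else high
            for r in range(num_code_blocks)]


lengths = rate_matching_lengths(g=1000, num_code_blocks=3, n_l=1, q_m=2)
print(lengths, sum(lengths))
```

Here G' = 500 and gamma = 2, so the first code block gets 332 bits and the other two get 334 bits each. The lengths add up to G, and every length is a multiple of N_L * Q_m, so no resource element is shared between two code blocks.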
The rate-matching output bit sequence is:
\[ e_k = w_{(k_0 + j) \bmod N_{cb}}, \qquad k = 0, 1, 2, 3, \ldots, (E - 1), \quad j = 0, 1, 2, 3, \ldots, (K_w - 1) \]
Note that the bit positions with $w_{(k_0 + j) \bmod N_{cb}} = \text{NULL}$, which denote dummy bits in the circular buffer (a total of $3N_D = K_w - 3D$, since $K_w = 3(D + N_D)$), are ignored and not included in the transmission. The Redundancy Version (RV) starting point $k_0$ is given as:
\[ k_0 = R_{subblock} \cdot \left( 2 \cdot \left\lceil \frac{N_{cb}}{8 R_{subblock}} \right\rceil \cdot rv_{idx} + 2 \right) \]
where $rv_{idx} = 0, 1, 2, 3$. The operation $(k_0 + j) \bmod N_{cb}$ in the previous equation makes sure that the bit index wraps back to the first bit in the buffer when the index reaches $N_{cb}$, which is the idea of a circular buffer.
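The starting point and the selection loop can be sketched as follows. This Python code is an illustration, not the project's implementation: the function names are hypothetical, dummy bits are marked with the character 9, and the buffer used in the example is built from the interleaved streams v0_k, v1_k and v2_k reported in section 5.7.3 (systematic part first, then parity 1 and parity 2 interlaced).

```python
NULL = "9"   # dummy-bit marker


def rv_start(n_cb, r_subblock, rv_idx):
    """Starting index k0 of redundancy version rv_idx."""
    return r_subblock * (2 * -(-n_cb // (8 * r_subblock)) * rv_idx + 2)


def bit_selection(w, e_len, r_subblock, rv_idx, n_cb=None):
    """Read e_len non-dummy bits from the circular buffer, starting at k0."""
    n_cb = len(w) if n_cb is None else n_cb
    k0 = rv_start(n_cb, r_subblock, rv_idx)
    out, j = [], 0
    while len(out) < e_len:
        bit = w[(k0 + j) % n_cb]      # wrap around at the end of the buffer
        if bit != NULL:               # dummy positions are skipped
            out.append(bit)
        j += 1
    return "".join(out)


# Circular buffer for the running example (192 entries, R_subblock = 2).
w = ("9190910091019110909191019111910190909000900091109090911090109010"
     "9911990099010001990100019900100099109911991111109910100099100010"
     "9901990199110110990000009901010099009011990111009900010199001009")

for rv in range(4):
    print(rv, rv_start(len(w), 2, rv), bit_selection(w, 44, 2, rv))
```

For this buffer the starting points are 4, 52, 100 and 148, so each redundancy version starts a quarter of the buffer further on, and rv = 0 skips the first 2 * R_subblock entries as described in section 5.2.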
5.6 Implementation of Rate Matching Transmitter
5.6.1 The Rate Matching Transmitter main blocks
Implementation of rate matching transmitter
The main blocks of the transmitter
1. Three sub block interleavers.
2. Bit collection.
3. Bit selection.
5.6.2 Sub block interleaver
We have two types of sub block interleaver: one for the systematic and parity 1 streams and one for the parity 2 stream.
5.6.3 The function of the Sub block interleaver
It is used to permute the bits of each stream.
Sub block interleaver block diagram
The input ports of the Sub block interleaver
1. d:
It is the 43 input bits of data (the encoded bits).
2. clk:
It is the system clock, used to synchronize the block with the rest of the system.
3. reset:
It is used to reset the whole system and the block.
4. load:
It is used to enable the block to receive bits.
The output ports of the Sub block interleaver
1. Q_1:
The first output bits from the sub block interleaver block.
2. Q_2:
The second output bits from the sub block interleaver block.
3. en:
It is a signal indicating that the interleaved bits are available at the output ports.
5.6.4 Bit collection
The function of the Bit collection
It collects the interleaved bits from the sub block interleavers and interlaces them.
Bit collection block diagram
The input ports of the Bit collection
1. w_10, w_20:
The input ports from the first sub block interleaver.
2. w_11, w_21:
The input ports from the second sub block interleaver.
3. w_12, w_22:
The input ports from the third sub block interleaver.
4. clk:
It is the system clock, used to synchronize the block with the rest of the system.
5. load_1, load_2, load_3:
They are used to enable the block.
The output ports of the Bit collection
1. w_k1, w_k2:
The interlaced output bits from the Bit collection block.
2. load:
It is a signal indicating that the output bits are available at the output ports.
5.7 Simulation of Transmitter
We simulate the blocks using Modelsim and check the results using Matlab.
5.7.1 The first Sub block interleaver
We use the results from the previous simulation of the Turbo Encoder.
The input is:
d0_k = 00110001110110001010101111100010101000100010.
By using Matlab
v0_k = 9190910091019110909191019111910190909000900091109090911090109010.
We note that we represent the dummy variable by 9.
By using Modelsim
/subblock1test/d 00000000000... 00110001110110001010101111100010101000100010
/subblock1test/reset
/subblock1test/load
/subblock1test/clk
/subblock1test/Q1 UUUUUUUU... 0100010001010110000101010111010100000000000001100000011000100010
/subblock1test/Q2 UUUUUUUU... 1110110011011110101111011111110110101000100011101010111010101010
/subblock1test/en
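The First Sub block interleaver simulation by Modelsim
We note that the representation of the dummy variables in Matlab differs from the VHDL representation: in the VHDL outputs a dummy position appears as 0 in Q1 and as 1 in Q2, while the data bits are identical in both.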
5.7.2 The third Sub block interleaver
We use the results from the previous simulation of the Turbo Encoder.
The input is:
d2_k = 00001011101001100100011110100011110011000000.
By using Matlab
v2_k = 9190910191019000909191109000900091919110900091109001911090119009.
We note that we represent the dummy variable by 9.
By using Modelsim
/subblock3test/d 00000000000... 00001011101001100100011110100011110011000000
/subblock3test/reset
/subblock3test/load
/subblock3test/clk
/subblock3test/Q1 UUUUUUUU... 0100010101010000000101100000000001010110000001100001011000110000
/subblock3test/Q2 UUUUUUUU... 1110110111011000101111101000100011111110100011101001111010111001
/subblock3test/en
The Third Sub block interleaver simulation by Modelsim
5.7.3 The Bit collection Block
By using Matlab
The input is:
v0_k = 9190910091019110909191019111910190909000900091109090911090109010.
v1_k = 9190900090009010919191119110910190909101900090009091901090009010.
v2_k = 9190910191019000909191109000900091919110900091109001911090119009.
The output is:
w_k = 9190910091019110909191019111910190909000900091109090911090109010
9911990099010001990100019900100099109911991111109910100099100010
9901990199110110990000009901010099009011990111009900010199001009
By using Modelsim
/collectiontest/vk10 0100010001010110000101010111010100000000000001100000011000100010
/collectiontest/vk11 0100000000000010010101110110010100000101000000000001001000000010
/collectiontest/vk12 0100010101010000000101100000000001010110000001100001011000110000
/collectiontest/vk20 1110110011011110101111011111110110101000100011101010111010101010
/collectiontest/vk21 1110100010001010111111111110110110101101100010001011101010001010
/collectiontest/vk22 1110110111011000101111101000100011111110100011101001111010111001
/collectiontest/load1
/collectiontest/load2
/collectiontest/load3
/collectiontest/clk
/collectiontest/wk1 ...010001000101011000010101011101010000000000000110000001100010001000110000000100010001000100001000...
/collectiontest/wk2 ...111011001101111010111101111111011010100010001110101011101010101011111100110100011101000111001000...
/collectiontest/en
The interlacing Modelsim simulation.
5.7.4 The Bit selection Block
By using Matlab
The input is:
w_k = 9190910091019110909191019111910190909000900091109090911090109010
9911990099010001990100019900100099109911991111109910100099100010
9901990199110110990000009901010099009011990111009900010199001009
The output is:
at rv=0
e_k = 10010111001101111101000000001100011001001011
at rv=1
e_k = 11001001011000100010100010010001011111110101
at rv=2
e_k = 11111110101000100010010111011000000001010000
at rv=3
e_k = 00000101000001101110000010100100101001011100
By using Modelsim
The Bit selection Modelsim simulation for rv = 0.
The Bit selection Modelsim simulation for rv = 1.
The Bit selection Modelsim simulation for rv = 2.
The Bit selection Modelsim simulation for rv = 3.
5.8 Simulation of receiver
5.8.1 Matlab
There are four cases:
1. Rv=0: only the first part of the circular buffer is sent, and the turbo decoder can detect and correct the data.
2. Rv=1: the first and second parts of the circular buffer are sent, and the turbo decoder can detect and correct the data.
3. Rv=2: the first, second and third parts of the circular buffer are sent, and the turbo decoder can detect and correct the data.
4. Rv=3: the first, second, third and last parts of the circular buffer are sent, and the turbo decoder can detect and correct the data.
In each case the turbo decoder checks the data and decides whether it needs another copy of this data or not.
Example:
First case: Rv=0 and the first part of the data, data[1:48] = 1.
Output after de-puncturing
wk=[0000111111111111111111111111111111111111111111111111000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000]
Second case: Rv=1 and the first part of the data equals the second part of the data, data[1:48] = 1.
Output after de-puncturing
wk=[0000111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111110000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000]
Third case: Rv=2 and all previous parts are equal, data[1:144] = 1.
Output after de-puncturing
WK=[0000111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111
1111111111111111111100000000000000000000000000000000000000000000]
Fourth case: Rv=3, with the first part of the data equal to 1:48, the second part 49:96, the third part 97:144 and the fourth part 145:192.
Output after de-puncturing
Output after de-interlacing for the fourth case
We can note that parity0 takes the odd positions and parity1 takes the even positions [2].
Output after de-permutation
5.8.2 VHDL
There are four cases:
First case:
Ex1: if the received data at the circular buffer is ek0[0:48] = 111111111..... at RV=0, the output wk is 192 bits long, with ek0 placed starting from wk(5) because of k0 (as in the previous section) and the remaining bits filled with 0s.
Second case:
Ex2: if the received data at the circular buffer is ek1[0:48] = [ones(0:23) zeros(0:23)] at RV=1, the block stores ek0 to use it and ek1 to confirm wk. The output wk is 192 bits long, with the remaining bits filled with 0s.
Third case:
Ex3: if the received data at the circular buffer is ek2[0:48] = [ones(0:23) zeros(0:23)] at RV=2, the block stores ek0 and ek1 to use them and ek2 to confirm wk. The output wk is 192 bits long, with the remaining bits filled with 0s.
Fourth case:
Ex4: if the received data at the circular buffer is ek3[0:48] = 11111111..... at RV=3, the block stores ek0, ek1 and ek2 to use them and ek3 to confirm wk. The output wk is 192 bits long.
We note that wk has four more ones than f4, which means that the last input (rv=3) wraps around to complete the least significant nibble.
Last step: de-permutation
Ex5: if the input to the bit selection is wk = 1010101010........ up to 192 bits, the outputs are the systematic, parity0 and parity1 streams.
Bibliography
[1] 3GPP. Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and channel coding. TS 36.212, 3rd Generation Partnership Project (3GPP), January 2010.
[2] Farooq Khan. LTE for 4G Mobile Broadband. Cambridge University Press, 2009.
Chapter 6
Scrambling
6.1 PN-sequences
Noise-like wideband spread-spectrum signals are generated using PN sequences.
In DS/SS (direct-sequence spread spectrum), a PN spreading waveform is a time function of a PN sequence.
In FH/SS (frequency-hopping spread spectrum), frequency-hopping patterns can be generated from a PN code.
PN sequences are deterministically generated; however, they look almost like random sequences to an observer.
The time waveforms generated from PN sequences also look like random noise.
6.1.1 m-sequences
M-sequences have been studied extensively as the nearest approximation to random sequences.
M-sequences have found numerous applications in modern communication systems, including
spread spectrum Code Division Multiple Access (CDMA). These applications require large sets
of codes with2 highly peaked autocorrelation and minimum cross-correlation. M-sequence (binary
maximal length shift-register sequence)
Generated using linear feedback shift-register and exclusive OR-gate circuits.
Linear generator polynomial g(x) of degree m > 0
g(x) = g
m
x
m
+g
m1
x
m1
+.... +g
1
x +g
0
Recurrence Equation (g
m
= g
0
= 1)
x
m
= g
m1
x
m1
+g
m2
x
m2
+..... +g
1
x +g
0
If g
i
= 1 , the corresponding circuit switch is closed, otherwise g
i
,= 1 , it is open.
Output of the shift-register circuit is transformed to 1 if it is 0, and 1 if it is 1.
115
The maximum number of non-zero states is $2^m - 1$, which is the maximum period of the output sequence $c = (c_0, c_1, c_2, \dots)$.
The state of the shift register at clock pulse i is the finite-length vector
$$s_i = (s_i(m-1), s_i(m-2), \dots, s_i(0))$$
and the output at clock pulse i is $c_i = s_i(0)$.
Output sequence recurrence condition according to g(x):
$$c_{i+m} = g_{m-1} c_{i+m-1} + g_{m-2} c_{i+m-2} + \dots + g_1 c_{i+1} + c_i \pmod 2$$
Example of a shift-register sequence: for any nonzero starting state ($s_0 \neq (0, 0, 0, 0, 0)$), the state of the shift register varies according to the recurrence condition.
Other choices of g(x) may yield a sequence of period shorter than $2^m - 1$.
For a different initial loading, the output sequence becomes a shift of the sequence c, written $T^j c$ (c shifted to the left (right) by j units).
A linear combination of $T^4 c, T^3 c, T^2 c, T^1 c, c$ yields all the other shifts of c. Example: the shift-register sequence with $g(x) = x^5 + x^4 + x^2 + x + 1$.
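The recurrence condition can be tried out directly. The following is a minimal Python sketch, not part of the project code: the helper names lfsr_sequence and period and the starting state 10000 are assumptions made for illustration. It runs the recurrence for the degree-5 example polynomial and for the non-primitive polynomial $x^5 + x^4 + x^3 + x^2 + x + 1$ and measures how many clock pulses pass before the register returns to its starting state.

```python
def lfsr_sequence(taps, state, length):
    """Return `length` bits of c[i+m] = (sum of c[i+k] for k in taps) mod 2.

    `state` holds the first m output bits (c[0], ..., c[m-1]).
    """
    seq = list(state)
    m = len(state)
    for i in range(length - m):
        seq.append(sum(seq[i + k] for k in taps) % 2)
    return seq[:length]


def period(taps, state):
    """Clock pulses until the m-bit window equals the starting state again."""
    m = len(state)
    seq = lfsr_sequence(taps, state, 2 ** m + m)
    for p in range(1, 2 ** m):
        if seq[p:p + m] == list(state):
            return p
    return None


# g(x) = x^5 + x^4 + x^2 + x + 1 -> c[i+5] = c[i+4] + c[i+2] + c[i+1] + c[i]
PRIMITIVE_TAPS = [4, 2, 1, 0]
# g(x) = x^5 + x^4 + x^3 + x^2 + x + 1 (divides x^6 + 1)
NON_PRIMITIVE_TAPS = [4, 3, 2, 1, 0]

start = [1, 0, 0, 0, 0]  # any nonzero starting state (assumed here)
print(period(PRIMITIVE_TAPS, start))      # 31 = 2^5 - 1
print(period(NON_PRIMITIVE_TAPS, start))  # 6
```

The first polynomial reaches the maximum period $2^5 - 1 = 31$, while the second repeats after only 6 clock pulses.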
Primitive polynomial: the generator polynomial of an m-sequence is a primitive polynomial.
g(x) is a primitive polynomial of degree m if the smallest integer n for which g(x) divides $x^n + 1$ is $n = 2^m - 1$.
$g(x) = x^5 + x^4 + x^2 + x + 1$ is primitive. On the other hand, $g(x) = x^5 + x^4 + x^3 + x^2 + x + 1$ is not primitive, since $x^6 + 1 = (x + 1)(x^5 + x^4 + x^3 + x^2 + x + 1)$, so the smallest n is 6.
The number of primitive polynomials of degree m is equal to $\frac{1}{m}\phi(2^m - 1)$, where
$$\phi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right)$$
Here $p \mid n$ ranges over all distinct prime divisors of n, and $\phi(n)$ is the number of positive integers less than n that are relatively prime to n.
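As a quick check of the counting formula, the sketch below evaluates Euler's function with the product over distinct prime divisors and divides by m. It is a small illustration only; the function names euler_phi and num_primitive_polynomials are hypothetical.

```python
def euler_phi(n):
    """Euler's totient: n * prod(1 - 1/p) over the distinct primes p dividing n."""
    result, rest, p = n, n, 2
    while p * p <= rest:
        if rest % p == 0:
            while rest % p == 0:
                rest //= p
            result -= result // p  # multiply result by (1 - 1/p)
        p += 1
    if rest > 1:                   # one prime factor larger than sqrt(n) remains
        result -= result // rest
    return result


def num_primitive_polynomials(m):
    """Number of binary primitive polynomials of degree m: phi(2^m - 1) / m."""
    return euler_phi(2 ** m - 1) // m


for m in (3, 4, 5):
    print(m, num_primitive_polynomials(m))
```

For m = 3 this gives $\phi(7)/3 = 2$, and for m = 5 it gives $\phi(31)/5 = 6$.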
Properties of m-sequences
Property I (The Shift Property): A cyclic shift (left-cyclic or right-cyclic) of an m-sequence is also an m-sequence.
Property II (The Recurrence Property): Any m-sequence in $S_m$ satisfies the recurrence condition
$$c_{i+m} = g_{m-1} c_{i+m-1} + g_{m-2} c_{i+m-2} + \dots + g_1 c_{i+1} + c_i \pmod 2$$
where i = 0, 1, 2, ...
Property III (The Window Property): If a window of width m is slid along an m-sequence in $S_m$, each of the $2^m - 1$ nonzero binary m-tuples is seen exactly once.
Property IV (One more 1 than 0s): Any m-sequence in $S_m$ contains $2^{m-1}$ 1s and $2^{m-1} - 1$ 0s.
Property V (The Addition Property): The sum of two m-sequences in $S_m$ (mod 2, term by term) is another m-sequence in $S_m$.
Property VI (The Shift-and-Add Property): The sum of an m-sequence and a cyclic shift of itself (mod 2, term by term) is another m-sequence.
Property VII (Thumb-Tack Autocorrelation): The normalized periodic autocorrelation function of an m-sequence, defined as
$$\rho(i) = \frac{1}{N} \sum_{j=0}^{N-1} (-1)^{c_j \oplus c_{i+j}}$$
is equal to 1 for $i = 0 \pmod N$ and to $-1/N$ for $i \neq 0 \pmod N$. This is easily proved using the shift-and-add property.
Property VIII (Runs): A run is a string of consecutive 1s or a string of consecutive 0s. In any m-sequence, one-half of the runs have length 1, one-quarter have length 2, one-eighth have length 3, and so on. In particular, there is one run of 1s of length m and one run of 0s of length m - 1.
Property IX (Decimation): The decimation by n > 0 of an m-sequence c, denoted c[n], has a period equal to N/gcd(N, n) if it is not the all-zero sequence; its generator polynomial $\hat{g}(x)$ has roots that are the nth powers of the roots of g(x).
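Several of these properties are easy to confirm numerically for a short sequence. The sketch below is an illustration under stated assumptions: it uses the degree-5 polynomial $x^5 + x^4 + x^2 + x + 1$ with starting state 10000, and the helper names are hypothetical. It checks the balance (Property IV), window (Property III) and thumb-tack autocorrelation (Property VII) properties over one period of 31 bits.

```python
def m_sequence(taps, state):
    """One full period (2^m - 1 bits) of the LFSR output for the given taps."""
    m = len(state)
    seq = list(state)
    for i in range(2 ** m - 1 - m):
        seq.append(sum(seq[i + k] for k in taps) % 2)
    return seq


def autocorrelation(c, shift):
    """Normalized periodic autocorrelation of a binary sequence."""
    n = len(c)
    return sum((-1) ** (c[j] ^ c[(j + shift) % n]) for j in range(n)) / n


def windows(c, m):
    """Set of all cyclic windows of width m seen along the sequence."""
    n = len(c)
    return {tuple(c[(i + k) % n] for k in range(m)) for i in range(n)}


c = m_sequence([4, 2, 1, 0], [1, 0, 0, 0, 0])

print(sum(c), len(c) - sum(c))   # number of 1s and 0s
print(len(windows(c, 5)))        # distinct nonzero 5-tuples
print(autocorrelation(c, 0), autocorrelation(c, 1))
```

The autocorrelation is 1 at zero shift and -1/31 at every other shift, which is the thumb-tack shape described in Property VII.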
6.1.2 Preferred Pair
1. Any pair of m-sequences having the same period N can be related by y = x[q], for some q.
2. Definition: a preferred pair satisfies the following conditions.
m ≠ 0 (mod 4): that is, m is odd or m = 2 (mod 4).
y = x[q], where q is odd and either $q = 2^k + 1$ or $q = 2^{2k} - 2^k + 1$.
gcd(m, k) = 1 for m odd, and gcd(m, k) = 2 for m = 2 (mod 4), where gcd denotes the greatest common divisor.
3. It is known that preferred pairs of m-sequences do not exist for m = 4, 8, 12, 16, and it was conjectured that no solutions exist for all m = 0 (mod 4).
6.1.3 Gold Codes
Gold sequences of length N can be constructed from a preferred pair of m-sequences.
A preferred pair of m-sequences, say x and y, has a three-valued cross-correlation function:
$$\theta_{x,y}(n) \in \{-1,\ -t(m),\ t(m) - 2\} \text{ for all } n, \quad \text{where } t(m) = 1 + 2^{\lfloor (m+2)/2 \rfloor}$$
The set of Gold sequences includes the preferred pair of m-sequences x and y, and the mod-2 sums of x and cyclic shifts of y.
The maximum correlation magnitude for any two Gold sequences in the same set is equal to the constant t(m).
Example of Gold sequences for m = 3
Number of m-sequences: $\frac{1}{3}\phi(7) = 2$
Length of m-sequences: $N = 2^3 - 1 = 7$
Primitive polynomials of degree m = 3 (initial loading: 001):
$x^3 + x + 1$: x = 1001011
$x^3 + x^2 + 1$: y = 1001110
The corresponding set of 9 Gold sequences of period 7 is given by:
1001011 1001110 0000101
1010110 1110001 0111111
0100010 0011000 1101100
Autocorrelation function for both m-sequences: thumb-tack shaped.
$t(m) = 1 + 2^{\lfloor (m+2)/2 \rfloor} = 5$
The cross-correlation function is three-valued: -1, -5 or 3:
$$\theta_{x,y}(n) \in \{-1,\ -t(m) = -5,\ t(m) - 2 = 3\}$$
$t(m)/N \approx 2^{-m/2}$ goes to 0 exponentially as m goes to infinity.
This suggests that longer Gold sequences will perform better as SSMA (spread-spectrum multiple-access) sequences.
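The m = 3 example can be reproduced in a few lines. The sketch below is illustrative only and its helper names are assumptions. It builds the nine Gold sequences from x and y, and evaluates the unnormalized cross-correlation (agreements minus disagreements) between x and every cyclic shift of y.

```python
def cyclic_shift(seq, k):
    """Shift the sequence to the left by k positions (cyclically)."""
    return seq[k:] + seq[:k]


def xor(a, b):
    """Term-by-term mod-2 sum of two equal-length sequences."""
    return [p ^ q for p, q in zip(a, b)]


def correlation(a, b):
    """Unnormalized periodic correlation: agreements minus disagreements."""
    return sum(1 if p == q else -1 for p, q in zip(a, b))


x = [1, 0, 0, 1, 0, 1, 1]  # from x^3 + x + 1
y = [1, 0, 0, 1, 1, 1, 0]  # from x^3 + x^2 + 1

# x, y, and x + T^k y for k = 0..6
gold_set = [x, y] + [xor(x, cyclic_shift(y, k)) for k in range(7)]
cross_values = {correlation(x, cyclic_shift(y, k)) for k in range(7)}

for seq in gold_set:
    print("".join(map(str, seq)))
print(sorted(cross_values))  # [-5, -1, 3]
```

The printed set matches the nine sequences listed above, and the cross-correlation takes only the three values -1, -t(m) = -5 and t(m) - 2 = 3.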
6.2 Scrambler
LTE downlink scrambling means that the block of coded bits delivered by the hybrid-ARQ functionality is multiplied (exclusive-OR operation) by a bit-level scrambling sequence (a Gold code).
In general, scrambling of the coded data helps to ensure that the receiver-side decoding can fully utilize the processing gain provided by the channel code.
The codewords are bit-wise multiplied with an orthogonal sequence and a cell-specific scrambling sequence to create a sequence of symbols for each codeword q.
The scrambling sequence is pseudo-random, created using a length-31 Gold sequence generator and initialized at the start of each subframe using the radio network temporary identifier associated with the PDSCH transmission, $n_{RNTI}$, the cell ID, $N_{ID}^{cell}$, the slot number within the radio frame, $n_s$, and the codeword index $q \in \{0, 1\}$:
$$c_{init} = n_{RNTI} \cdot 2^{14} + q \cdot 2^{13} + \lfloor n_s / 2 \rfloor \cdot 2^{9} + N_{ID}^{cell}$$
Scrambling with a cell-specific sequence serves the purpose of inter-cell interference rejection. When a UE descrambles a received bitstream with a known cell-specific scrambling sequence, interference from other cells will be descrambled incorrectly and therefore only appear as uncorrelated noise.
Pseudo-random sequences are defined by a length-31 Gold sequence. The output sequence c(n) of length $M_{PN}$, where $n = 0, 1, \dots, M_{PN} - 1$, is defined by
$$c(n) = (x_1(n + N_C) + x_2(n + N_C)) \bmod 2$$
$$x_1(n + 31) = (x_1(n + 3) + x_1(n)) \bmod 2$$
$$x_2(n + 31) = (x_2(n + 3) + x_2(n + 2) + x_2(n + 1) + x_2(n)) \bmod 2$$
where $N_C = 1600$ and the first m-sequence shall be initialized with $x_1(0) = 1$, $x_1(n) = 0$, $n = 1, 2, \dots, 30$. The initialization of the second m-sequence is denoted by $c_{init} = \sum_{i=0}^{30} x_2(i) \cdot 2^i$, with the value depending on the application of the sequence.
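The three recurrences translate almost line by line into code. The following Python sketch is an illustration, not the project's implementation: the function names pdsch_c_init, gold_sequence and scramble, and the example parameter values, are assumptions. It stores $x_1$ and $x_2$ as growing lists so that the indices match the equations directly.

```python
def pdsch_c_init(n_rnti, q, n_s, cell_id):
    """c_init = n_RNTI * 2^14 + q * 2^13 + floor(n_s / 2) * 2^9 + N_ID_cell."""
    return n_rnti * 2 ** 14 + q * 2 ** 13 + (n_s // 2) * 2 ** 9 + cell_id


def gold_sequence(c_init, length, n_c=1600):
    """Length-31 Gold sequence c(n), n = 0 .. length-1, for a given c_init."""
    x1 = [1] + [0] * 30                          # x1(0) = 1, x1(1..30) = 0
    x2 = [(c_init >> i) & 1 for i in range(31)]  # c_init = sum x2(i) * 2^i
    for n in range(n_c + length - 31):
        x1.append((x1[n + 3] + x1[n]) % 2)
        x2.append((x2[n + 3] + x2[n + 2] + x2[n + 1] + x2[n]) % 2)
    return [(x1[n + n_c] + x2[n + n_c]) % 2 for n in range(length)]


def scramble(bits, c_init):
    """XOR the bits with the Gold sequence; applying it twice restores the bits."""
    c = gold_sequence(c_init, len(bits))
    return [b ^ s for b, s in zip(bits, c)]


c_init = pdsch_c_init(n_rnti=100, q=1, n_s=5, cell_id=42)  # example values only
data = [1, 0, 1, 1, 0, 0, 1, 0] * 4
assert scramble(scramble(data, c_init), c_init) == data
```

Because scrambling is an XOR with a sequence that both ends can regenerate from $c_{init}$, the same function serves as the descrambler.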
6.3 Why scrambling
6.3.1 Data randomization
The scrambling process ensures that no long stream of zeros is transmitted. A stream of zeros means that no power is transmitted, which can lead to loss of synchronization at the receiver end. Randomization of the bits also reduces repetitive patterns in the data stream, which helps error-correction performance.
6.3.2 PAPR reduction(peak to average power ratio)
The PAPR of a waveform may be described as
$$\mathrm{PAPR} = \frac{\max |x(t)|^2}{P_{avg}}$$
where $P_{avg}$ is the average power of the waveform. In practical OFDM systems, the PAPR may be reduced using one or a combination of several techniques. The techniques may be divided into three major categories. The first category employs various methods of nonlinear signal distortion such as hard clipping, soft clipping, companding, or pre-distortion. Generally speaking, the nonlinear distortion techniques are simple to implement. However, many do not work well in cases where the OFDM sub-carriers are modulated with higher-order modulation schemes. In such scenarios, the Euclidean distance between the symbols is relatively small and the additional noise introduced by the PAPR reduction causes significant performance degradation. The second category for PAPR reduction employs various coding methods. The coding techniques have the advantage of being distortionless, and the PAPR reduction is most commonly achieved by eliminating symbols having large PAPR. However, to obtain an appreciable level of PAPR reduction, high-redundancy codes need to be used and, as a result, the overall efficiency of transmission is reduced. Finally, the third category is based on OFDM symbol scrambling and selection of the sequence that produces minimum PAPR. The pre-scrambling techniques achieve good PAPR reduction but they require multiple FFT transforms and somewhat higher processing power.
The method presented in [2] belongs to the third category of the PAPR reduction techniques. It uses conveniently chosen pseudorandom noise (PN) sequences applied to the input data bit stream. The method is very easy to realize in software or hardware, which is very important if the PAPR reduction needs to be implemented in Application-Specific Integrated Circuits (ASICs). In such a scenario, the PN-Scrambler may be implemented by the addition of external FPGA and DSP hardware to Commercial Off-The-Shelf (COTS) ASICs. As a result, one obtains cost-efficient and reliable hardware solutions.
Block diagrams of the transmitter and receiver implementing the proposed PN-Scrambler are presented in Figs. 1 and 2, respectively.
As seen in Fig. 1, two additional elements are added to a typical OFDM transmitter. The first element is the PAPR scrambler, and the second one is the PAPR threshold compare block. The PN-Scrambler utilizes a Maximal-Length Linear Feedback Shift Register (ML-LFSR) with $l$ taps in order to produce $k = 2^l - 1$ uncorrelated unique sets of data from the same input sequence (so that $l \approx \log_2 k$). The k unique sets of data are used to generate k independent identically distributed OFDM symbols. A block of $N_b$ bits comprising one OFDM symbol is scrambled and passed along for Forward Error Correction (FEC) coding, interleaving, modulation, symbol mapping and IFFT. In any given OFDM system, $N_b$ is a function of the number of subcarriers, the modulation scheme applied to each subcarrier, and the coding rate. By examining each individual sample coming out of the IFFT, the PAPR threshold comparator determines if the scrambler has achieved a desired PAPR on a symbol-by-symbol basis. If the PAPR of the symbol is below a desired threshold, then the data is passed along towards the RF stage of the transmitter. However, if the PAPR is still high, the data is scrambled with a different phase of the ML-LFSR's PN sequence. Since this technique operates on the input bit stream, it is essentially independent of the OFDM modulation and may be adapted to any particular scenario.
The receiver presented in Fig. 2 is a typical OFDM receiver that needs to perform the tasks of down conversion, channel estimation, and decoding. The only additional task required by the PN-Scrambler PAPR reduction technique is descrambling of the data at the receiver output. To perform descrambling, the receiver has to know the phase of the ML-LFSR used on the transmission side. This phase is embedded in the data stream. For example, the first l bits of the OFDM symbol may carry the information on the ML-LFSR phase.
A practical implementation of the PN-Scrambler PAPR reduction technique requires selection of several parameters. These parameters are defined as follows:
1. Number of scrambling sequences (k): defined as the number of PN sequences produced by the ML-LFSR. Each sequence is $N_b$ bits long.
2. PAPR threshold (L): defined as the maximum PAPR for the OFDM symbol. This value is used by the PAPR threshold comparator block in order to discard OFDM symbols with PAPR greater than L.
3. IFFT size / number of sub-carriers (N): defined as the number of the non-zero orthogonal subcarriers per OFDM symbol.
4. Average latency ($\bar{k}$): defined as the average number of scrambling attempts per OFDM symbol in order to pass the threshold level L.
5. Probability of clipping (p): probability that the PAPR exceeds the threshold level L after k scrambling attempts.
6. PN scrambler overhead (v): defined as the ratio of the number of bits required to represent the phase of the ML-LFSR to the number of bits per OFDM symbol $N_b$.
In any actual design, the above parameters allow different tradeoffs.
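The scramble-and-compare loop described above can be sketched as follows. This is a simplified illustration and not the method of [2] in full: it assumes BPSK on every subcarrier, skips FEC and interleaving, uses a direct inverse DFT instead of an IFFT, and treats each cyclic shift of a short PN sequence as one "phase". All function names are hypothetical.

```python
import cmath


def ofdm_symbol(bits):
    """BPSK-map the bits onto N subcarriers and take an N-point inverse DFT."""
    n = len(bits)
    symbols = [1.0 - 2.0 * b for b in bits]  # bit 0 -> +1, bit 1 -> -1
    return [sum(symbols[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def papr(samples):
    """Peak power divided by average power of the time-domain samples."""
    powers = [abs(s) ** 2 for s in samples]
    return max(powers) / (sum(powers) / len(powers))


def scramble_for_papr(bits, pn, threshold):
    """Try each PN phase; return (phase, PAPR) of the first one at or below
    the threshold, or the best phase found if none passes."""
    best = None
    for phase in range(len(pn)):
        shifted = pn[phase:] + pn[:phase]
        scrambled = [b ^ shifted[i % len(pn)] for i, b in enumerate(bits)]
        value = papr(ofdm_symbol(scrambled))
        if best is None or value < best[1]:
            best = (phase, value)
        if value <= threshold:
            break
    return best


bits = [0] * 16                # worst case: all subcarriers add in phase
pn = [1, 0, 0, 1, 0, 1, 1]     # period-7 m-sequence used as the scrambler
print(papr(ofdm_symbol(bits)))             # PAPR of the unscrambled symbol
print(scramble_for_papr(bits, pn, 4.0))    # chosen phase and resulting PAPR
```

An all-zero block is the worst case for this mapping: every subcarrier carries the same symbol, so the PAPR equals the number of subcarriers. Any scrambling phase that breaks up the constant pattern lowers it.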
6.4 Matlab code
For the Matlab code, these equations are used to implement the feedback of the shift registers:
$$c(n) = (x_1(n + N_C) + x_2(n + N_C)) \bmod 2$$
$$x_1(n + 31) = (x_1(n + 3) + x_1(n)) \bmod 2$$
$$x_2(n + 31) = (x_2(n + 3) + x_2(n + 2) + x_2(n + 1) + x_2(n)) \bmod 2$$
For the initial phase of the two LFSRs:
The first register will have: [zeros(1,30) 1]
The second shift register will have: dec2bin($c_{init}$), padded to 31 bits, where
$$c_{init} = n_{RNTI} \cdot 2^{14} + q \cdot 2^{13} + \lfloor n_s / 2 \rfloor \cdot 2^{9} + N_{ID}^{cell}$$
dec2bin converts the value of the previous equation from decimal to binary, so that it can be placed in the LFSR. For testing, the constants of the equation are assigned arbitrary integer values. The previous equation is only used in the case of the PDSCH channel.
For all downlink transport channels except the MCH, as well as for the L1/L2 control signaling, scrambling sequences should be different for neighbor cells (cell-specific scrambling) to ensure interference randomization between the cells. This is achieved by having the scrambling depend on the physical-layer cell identity. In contrast, in the case of MBSFN-based transmission using the MCH transport channel, the same scrambling should be applied to all cells taking part in the MBSFN transmission (cell-common scrambling). This is achieved by having the scrambling depend on the so-called MBSFN area identity.
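for count=1:31
    % xPD: char vector returned by dec2bin (31 characters)
    % xpd: numeric initial phase of the second shift register, set by the channel equation
    if xPD(1,count)=='1'
        xpd(1,count)=1;
    else
        xpd(1,count)=0;
    end
end
The previous code converts the register contents from data type char (the output of dec2bin) to double, so that they can be processed easily. The comparison is made against the character '1', because dec2bin returns characters rather than numbers.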
The next step is discarding the first 1600 samples:
for count=1:1600
    feed1=xor(x1(1,end),x1(1,end-3));
    feed2=xor(xor(xpd(1,end),xpd(1,end-1)),xor(xpd(1,end-3),xpd(1,end-2)));
    x1=[feed1 x1(1,1:end-1)];
    xpd=[feed2 xpd(1,1:end-1)];
end
This applies the previous feedback equations to the shift registers. For the first register, feed1 is calculated and placed at the beginning of the sequence, which is shifted so that the last sample is discarded. The same goes for the second register, using feed2. The operation continues until 1600 samples are discarded.
Now we continue shifting using the same equations, but this time the last bits of the two shift registers are XORed (generating the Gold code bit), and the result is XORed with the data bit (bit-level scrambling):
for count=1:length(data)
    % XOR the last bit of the two shift registers (Gold code bit);
    % taken before shifting so that the first data bit uses c(0) = x1(1600) xor x2(1600)
    gold=xor(x1(1,end),xpd(1,end));
    % XOR the Gold code bit with the data bit
    scrambled(1,count)=xor(gold,data(1,count));
    % same shift operation as before
    feed1=xor(x1(1,end),x1(1,end-3));
    feed2=xor(xor(xpd(1,end),xpd(1,end-1)),xor(xpd(1,end-3),xpd(1,end-2)));
    x1=[feed1 x1(1,1:end-1)];
    xpd=[feed2 xpd(1,1:end-1)];
end
The receiver is exactly the same, since an XOR operation is reversed by another XOR with the same sequence.
Bibliography
[1] Altera. Gold Code Generator Reference Design. 2003.
[2] Ivica Kostanic, Christopher Moffatt. Practical implementation of PN scrambler for PAPR reduction in OFDM systems for range extension and lower power consumption. 2008.
Chapter 7
Digital Modulation Technique
7.1 INTRODUCTION
In baseband pulse transmission, a data stream represented in the form of a discrete pulse-amplitude modulated (PAM) signal is transmitted directly over a low-pass channel. In digital pass band transmission, on the other hand, the incoming data stream is modulated onto a carrier (usually sinusoidal) with fixed frequency limits imposed by a band-pass channel of interest; pass band data transmission is studied in this chapter. The communication channel used for pass band data transmission may be a microwave radio link, a satellite channel, or the like. Yet other applications of pass band data transmission are in the design of pass band line codes for use on digital subscriber loops and orthogonal frequency-division multiplexing techniques for broadcasting. In any event, the modulation process making the transmission possible involves switching (keying) the amplitude, frequency, or phase of a sinusoidal carrier in some fashion in accordance with the incoming data. Thus there are three basic signaling schemes, and they are known as amplitude-shift keying (ASK), frequency-shift keying (FSK), and phase-shift keying (PSK). They may be viewed as special cases of amplitude modulation, frequency modulation, and phase modulation, respectively. Figure 7.1 illustrates these three methods of modulation for the case of a source supplying binary data.
FIGURE 7.1: Waveforms for the three basic forms of signaling binary information: (a) amplitude-shift keying (OOK), (b) frequency-shift keying (FSK), and (c) phase-shift keying (PRK).
The following points are noteworthy from Figure 7.1: Although in continuous-wave modulation it is usually difficult to distinguish between phase-modulated and frequency-modulated signals by merely looking at their waveforms, this is not true for PSK and FSK signals. Unlike ASK signals, both PSK and FSK signals have a constant envelope. This latter property makes PSK and FSK signals impervious to amplitude nonlinearities, commonly encountered in microwave radio and satellite channels. It is for this reason that, in practice, we find that PSK and FSK signals are preferred to ASK signals for pass band data transmission over nonlinear channels.
7.2 HIERARCHY OF DIGITAL MODULATION TECHNIQUES
Digital modulation techniques may be classified into coherent and noncoherent techniques, depending on whether the receiver is equipped with a phase-recovery circuit or not. The phase-recovery circuit ensures that the oscillator supplying the locally generated carrier wave in the receiver is synchronized (in both frequency and phase) to the oscillator supplying the carrier wave used to originally modulate the incoming data stream in the transmitter.
In an M-ary signaling scheme, we may send any one of M possible signals $s_1(t), s_2(t), \dots, s_M(t)$ during each signaling interval of duration T. For almost all applications, the number of possible signals is $M = 2^n$, where n is an integer, and the symbol duration is $T = nT_b$, where $T_b$ is the bit duration. In pass band data transmission these signals are generated by changing the amplitude, phase, or frequency of a sinusoidal carrier in M discrete steps. Thus we have M-ary ASK, M-ary PSK, and M-ary FSK digital modulation schemes. Another way of generating M-ary signals is to combine different methods of modulation into a hybrid form. For example, a special form of this hybrid modulation is M-ary quadrature amplitude modulation (QAM), which has some attractive properties. M-ary ASK is a special case of M-ary QAM.
M-ary signaling schemes are preferred over binary signaling schemes for transmitting digital information over band-pass channels when the requirement is to conserve bandwidth at the expense of increased power. Thus when the bandwidth of the channel is less than the required value, we may use M-ary signaling schemes for maximum efficiency.
M-ary PSK, M-ary QAM, and M-ary FSK are commonly used in coherent systems. Amplitude-shift keying and frequency-shift keying lend themselves naturally to use in noncoherent systems whenever it is impractical to maintain carrier phase synchronization. But in the case of phase-shift keying, we cannot have noncoherent PSK, because the term noncoherent means doing without carrier phase information. Instead, we employ a pseudo-PSK technique known as differential phase-shift keying (DPSK), which (in a loose sense) may be viewed as the noncoherent form of PSK. In practice, M-ary FSK and M-ary DPSK are the commonly used forms of digital modulation in noncoherent systems.
7.3 Pass band Transmission Model
In a functional sense, we may model a pass band data transmission system as shown in Figure 7.2. First, there is assumed to exist a message source that emits one symbol every T seconds, with the symbols belonging to an alphabet of M symbols, which we denote by $m_1, m_2, \dots, m_M$. The a priori probabilities $P(m_1), P(m_2), \dots, P(m_M)$ specify the message source output. When the M symbols of the alphabet are equally likely, we write
$$p_i = P(m_i) = \frac{1}{M} \quad \text{for all } i \qquad (7.1)$$
The M-ary output of the message source is presented to a signal transmission encoder, producing a corresponding vector $\mathbf{s}_i$ made up of N real elements, one such set for each of the M symbols of the source alphabet; the dimension N is less than or equal to M. With the vector $\mathbf{s}_i$ as input, the modulator then constructs a distinct signal $s_i(t)$ of duration T seconds as the representation of the symbol $m_i$ generated by the message source. The signal $s_i(t)$ is necessarily an energy signal, as shown by
$$E_i = \int_0^T s_i^2(t)\,dt, \quad i = 1, 2, \dots, M \qquad (7.2)$$
Note that $s_i(t)$ is real valued. One such signal is transmitted every T seconds. The particular signal chosen for transmission depends in some fashion on the incoming message and possibly on the signals transmitted in preceding time slots. With a sinusoidal carrier, the feature that is used by the modulator to distinguish one signal from another is a step change in the amplitude, frequency, or phase of the carrier. (Sometimes, a hybrid form of modulation that combines changes in both amplitude and phase or amplitude and frequency is used.)
FIGURE 7.2: Functional model of pass band data transmission system.
Returning to the functional model, the band-pass communication channel, coupling the transmitter to the receiver, is assumed to have two characteristics:
1. The channel is linear, with a bandwidth that is wide enough to accommodate the transmission of the modulated signal $s_i(t)$ with negligible or no distortion.
2. The channel noise w(t) is the sample function of a white Gaussian noise process of zero mean and power spectral density $N_0/2$.
7.4 COHERENT PHASE-SHIFT KEYING
7.4.1 Binary Phase-Shift Keying
In a coherent binary PSK system, the pair of signals $s_1(t)$ and $s_2(t)$ used to represent binary symbols 1 and 0, respectively, is defined by:
$$s_1(t) = \sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t) \qquad (7.3)$$
$$s_2(t) = \sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t + \pi) = -\sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t) \qquad (7.4)$$
where $0 \le t \le T_b$, and $E_b$ is the transmitted signal energy per bit. To ensure that each transmitted bit contains an integral number of cycles of the carrier wave, the carrier frequency $f_c$ is chosen equal to $n_c/T_b$ for some fixed integer $n_c$. A pair of sinusoidal waves that differ only in a relative phase shift of 180 degrees, as defined in Equations (7.3) and (7.4), are referred to as antipodal signals. From this pair of equations it is clear that, in the case of binary PSK, there is only one basis function of unit energy, namely,
$$\phi_1(t) = \sqrt{\frac{2}{T_b}} \cos(2\pi f_c t), \quad 0 \le t < T_b \qquad (7.5)$$
Then we may express the transmitted signals $s_1(t)$ and $s_2(t)$ in terms of $\phi_1(t)$ as follows:
$$s_1(t) = \sqrt{E_b}\,\phi_1(t), \qquad s_2(t) = -\sqrt{E_b}\,\phi_1(t), \quad 0 \le t < T_b \qquad (7.6)$$
FIGURE 7.3: Signal-space diagram for coherent binary PSK system. The waveforms depicting the transmitted signals $s_1(t)$ and $s_2(t)$, displayed in the inserts, assume $n_c = 2$.
A coherent binary PSK system is therefore characterized by having a signal space that is one-dimensional (i.e., N = 1), with a signal constellation consisting of two message points (i.e., M = 2). The coordinates of the message points are:
$$s_{11} = \int_0^{T_b} s_1(t)\,\phi_1(t)\,dt = +\sqrt{E_b}, \qquad s_{21} = \int_0^{T_b} s_2(t)\,\phi_1(t)\,dt = -\sqrt{E_b} \qquad (7.7)$$
The message point corresponding to $s_1(t)$ is located at $s_{11} = +\sqrt{E_b}$, and the message point corresponding to $s_2(t)$ is located at $s_{21} = -\sqrt{E_b}$. Figure 7.3 displays the signal-space diagram.
7.4.1.1 Error Probability of Binary PSK
To realize a rule for making a decision in favor of symbol 1 or symbol 0, we partition the signal space of Figure 7.3 into two regions:
The set of points closest to message point 1 at $+\sqrt{E_b}$
The set of points closest to message point 2 at $-\sqrt{E_b}$
This is accomplished by constructing the midpoint of the line joining these two message points, and then marking off the appropriate decision regions. In Figure 7.3 these decision regions are marked $Z_1$ and $Z_2$, according to the message point around which they are constructed.
The decision rule is now simply to decide that signal $s_1(t)$ (i.e., binary symbol 1) was transmitted if the received signal point falls in region $Z_1$, and decide that signal $s_2(t)$ (i.e., binary symbol 0) was transmitted if the received signal point falls in region $Z_2$. Two kinds of erroneous decisions may, however, be made. Signal $s_2(t)$ is transmitted, but the noise is such that the received signal point falls inside region $Z_1$ and so the receiver decides in favor of signal $s_1(t)$. Alternatively, signal $s_1(t)$ is transmitted, but the noise is such that the received signal point falls inside region $Z_2$ and so the receiver decides in favor of signal $s_2(t)$.
To calculate the probability of making an error of the first kind, we note from Figure 7.3 that the decision region associated with symbol 1 or signal $s_1(t)$ is described by
$$Z_1: \ 0 < x_1 < \infty \qquad (7.8)$$
where the observable element $x_1$ is related to the received signal x(t) by:
$$x_1 = \int_0^{T_b} x(t)\,\phi_1(t)\,dt \qquad (7.9)$$
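The conditional probability density function of random variable $X_1$, given that symbol 0 [i.e., signal $s_2(t)$] was transmitted, is defined by:
$$f_{X_1}(x_1 \mid 0) = \frac{1}{\sqrt{\pi N_0}} \exp\left[-\frac{1}{N_0}(x_1 - s_{21})^2\right] = \frac{1}{\sqrt{\pi N_0}} \exp\left[-\frac{1}{N_0}\left(x_1 + \sqrt{E_b}\right)^2\right] \qquad (7.10)$$
The conditional probability of the receiver deciding in favor of symbol 1, given that symbol 0 was transmitted, is therefore
$$P_{10} = \int_0^{\infty} f_{X_1}(x_1 \mid 0)\,dx_1 = \frac{1}{\sqrt{\pi N_0}} \int_0^{\infty} \exp\left[-\frac{1}{N_0}\left(x_1 + \sqrt{E_b}\right)^2\right] dx_1 \qquad (7.11)$$
Putting
$$z = \frac{1}{\sqrt{N_0}}\left(x_1 + \sqrt{E_b}\right) \qquad (7.12)$$
and changing the variable of integration from $x_1$ to z, we may rewrite Equation (7.11) in the compact form
$$P_{10} = \frac{1}{\sqrt{\pi}} \int_{\sqrt{E_b/N_0}}^{\infty} \exp(-z^2)\,dz = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{E_b}{N_0}}\right) \qquad (7.13)$$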
Consider next an error of the second kind. We note that the signal space of Figure 7.3 is symmetric with respect to the origin. It follows therefore that $P_{01}$, the conditional probability of the receiver deciding in favor of symbol 0, given that symbol 1 was transmitted, has the same value as $P_{10}$. Thus, averaging the conditional error probabilities $P_{10}$ and $P_{01}$, we find that the average probability of symbol error or, equivalently, the bit error rate for coherent binary PSK is (assuming equiprobable symbols)
$$P_e = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{E_b}{N_0}}\right) \qquad (7.14)$$
As we increase the transmitted signal energy per bit, $E_b$, for a specified noise spectral density $N_0$, the message points corresponding to symbols 1 and 0 move further apart, and the average probability of error $P_e$ is correspondingly reduced.
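Equation (7.14) is easy to evaluate numerically. The short sketch below is illustrative only: the function name bpsk_ber and the chosen $E_b/N_0$ values are assumptions. It uses the complementary error function from the Python standard library.

```python
import math


def bpsk_ber(eb_n0_db):
    """Bit error rate of coherent binary PSK: 0.5 * erfc(sqrt(Eb/N0))."""
    eb_n0 = 10 ** (eb_n0_db / 10)  # convert dB to a linear ratio
    return 0.5 * math.erfc(math.sqrt(eb_n0))


for db in (0, 4, 8):
    print(db, bpsk_ber(db))
```

At $E_b/N_0 = 0$ dB the error rate is about 0.079, and it falls rapidly as $E_b/N_0$ grows, in line with the message points moving further apart.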
7.4.1.2 Generation and Detection of Coherent Binary PSK Signals
To generate a binary PSK signal, we see that we have to represent the input binary sequence in polar form with symbols 1 and 0 represented by constant amplitude levels of $+\sqrt{E_b}$ and $-\sqrt{E_b}$, respectively. This signal transmission encoding is performed by a polar non-return-to-zero (NRZ) level encoder. The resulting binary wave and a sinusoidal carrier $\phi_1(t)$, whose frequency is $f_c = n_c/T_b$ for some fixed integer $n_c$, are applied to a product modulator, as in Figure 7.4a. The carrier and the timing pulses used to generate the binary wave are usually extracted from a common master clock. The desired PSK wave is obtained at the modulator output.
To detect the original binary sequence of 1s and 0s, we apply the noisy PSK signal x(t) (at the channel output) to a correlator, which is also supplied with a locally generated coherent reference signal $\phi_1(t)$, as in Figure 7.4b. The correlator output, $x_1$, is compared with a threshold of zero volts. If $x_1 > 0$, the receiver decides in favor of symbol 1. On the other hand, if $x_1 < 0$, it decides in favor of symbol 0. If $x_1$ is exactly zero, the receiver makes a random guess in favor of 0 or 1.
FIGURE 7.4: Block diagrams for (a) binary PSK transmitter and (b) coherent binary PSK receiver.
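The transmitter and correlator receiver can be imitated on sampled waveforms. The sketch below is a noiseless illustration under stated assumptions: $E_b = 1$, $T_b = 1$, $n_c = 2$ carrier cycles per bit, 64 samples per bit, and a rectangular-sum approximation of the correlator integral. The function names are hypothetical.

```python
import math


def bpsk_modulate(bits, eb=1.0, tb=1.0, n_c=2, samples_per_bit=64):
    """Product modulator: polar NRZ level times sqrt(2*Eb/Tb)*cos(2*pi*fc*t)."""
    fc = n_c / tb
    dt = tb / samples_per_bit
    amp = math.sqrt(2 * eb / tb)
    wave = []
    for b in bits:
        level = 1.0 if b == 1 else -1.0
        for k in range(samples_per_bit):
            wave.append(level * amp * math.cos(2 * math.pi * fc * k * dt))
    return wave


def bpsk_detect(wave, tb=1.0, n_c=2, samples_per_bit=64):
    """Correlate each bit interval with phi_1(t) and compare with zero."""
    fc = n_c / tb
    dt = tb / samples_per_bit
    bits = []
    for start in range(0, len(wave), samples_per_bit):
        x1 = sum(wave[start + k] * math.sqrt(2 / tb)
                 * math.cos(2 * math.pi * fc * k * dt) * dt
                 for k in range(samples_per_bit))
        bits.append(1 if x1 > 0 else 0)
    return bits


data = [1, 0, 1, 1, 0, 0, 1, 0]
assert bpsk_detect(bpsk_modulate(data)) == data
```

Without noise, the correlator output is $+\sqrt{E_b}$ or $-\sqrt{E_b}$ for every bit, so the zero threshold recovers the sequence exactly.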
7.4.2 QUADRIPHASE-SHIFT KEYING
The provision of reliable performance, exemplified by a very low probability of error, is one important goal in the design of a digital communication system. Another important goal is the efficient utilization of channel bandwidth. In this section, we study a bandwidth-conserving modulation scheme known as coherent quadriphase-shift keying, which is an example of quadrature-carrier multiplexing. In quadriphase-shift keying (QPSK), as with binary PSK, information carried by the transmitted signal is contained in the phase. In particular, the phase of the carrier takes on one of four equally spaced values, such as $\pi/4$, $3\pi/4$, $5\pi/4$, and $7\pi/4$. For this set of values we may define the transmitted signal as
$$s_i(t) = \sqrt{\frac{2E}{T}} \cos\left[2\pi f_c t + (2i - 1)\frac{\pi}{4}\right], \quad 0 \le t \le T \qquad (7.15)$$
where i = 1, 2, 3, 4; E is the transmitted signal energy per symbol, and T is the symbol duration. The carrier frequency $f_c$ equals $n_c/T$ for some fixed integer $n_c$. Each possible value of the phase corresponds to a unique dibit. Thus, for example, we may choose the foregoing set of phase values to represent the Gray-encoded set of dibits: 10, 00, 01, and 11, where only a single bit is changed from one dibit to the next.
7.4.2.1 Signal-Space Diagram of QPSK
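Using a well-known trigonometric identity, we may use Equation (7.15) to redefine the transmitted signal $s_i(t)$ for the interval $0 \le t \le T$ in the equivalent form:
$$s_i(t) = \sqrt{\frac{2E}{T}} \cos\left[(2i - 1)\frac{\pi}{4}\right] \cos(2\pi f_c t) - \sqrt{\frac{2E}{T}} \sin\left[(2i - 1)\frac{\pi}{4}\right] \sin(2\pi f_c t) \qquad (7.16)$$
where i = 1, 2, 3, 4. Based on this representation, we can make the following observations:
There are two orthonormal basis functions, $\phi_1(t)$ and $\phi_2(t)$, contained in the expansion of $s_i(t)$. Specifically, $\phi_1(t)$ and $\phi_2(t)$ are defined by a pair of quadrature carriers:
$$\phi_1(t) = \sqrt{\frac{2}{T}} \cos(2\pi f_c t), \qquad \phi_2(t) = \sqrt{\frac{2}{T}} \sin(2\pi f_c t), \quad 0 \le t \le T \qquad (7.17)$$
There are four message points, and the associated signal vectors are defined by:
$$\mathbf{s}_i = \begin{bmatrix} \sqrt{E} \cos\left((2i - 1)\frac{\pi}{4}\right) \\ -\sqrt{E} \sin\left((2i - 1)\frac{\pi}{4}\right) \end{bmatrix}, \quad i = 1, 2, 3, 4 \qquad (7.18)$$
TABLE 7.1: Signal-space characterization of QPSK
FIGURE 7.5: Signal-space diagram of coherent QPSK system.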
The elements of the signal vectors, namely, $s_{i1}$ and $s_{i2}$, have their values summarized in Table 7.1. The first two columns of this table give the associated dibit and phase of the QPSK signal.
Accordingly, a QPSK signal has a two-dimensional signal constellation (i.e., N = 2) and four message points (i.e., M = 4) whose phase angles increase in a counterclockwise direction, as illustrated in Figure 7.5. As with binary PSK, the QPSK signal has minimum average energy.
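The dibit-to-message-point mapping can be written out explicitly. The sketch below is an illustration under an assumption: it takes the coordinates as $s_{i1} = \sqrt{E}\cos\theta_i$ and $s_{i2} = -\sqrt{E}\sin\theta_i$, which is the convention that follows from expanding $\cos(2\pi f_c t + \theta_i)$ over a cosine and a sine carrier. It pairs the Gray-encoded dibits 10, 00, 01, 11 with the phases $\pi/4$, $3\pi/4$, $5\pi/4$, $7\pi/4$. The function names are hypothetical.

```python
import math

# Gray-encoded dibits and their carrier phases
PHASES = {(1, 0): math.pi / 4, (0, 0): 3 * math.pi / 4,
          (0, 1): 5 * math.pi / 4, (1, 1): 7 * math.pi / 4}


def qpsk_map(bits, energy=1.0):
    """Map consecutive bit pairs to message points (s_i1, s_i2)."""
    points = []
    for i in range(0, len(bits), 2):
        phase = PHASES[(bits[i], bits[i + 1])]
        points.append((math.sqrt(energy) * math.cos(phase),
                       -math.sqrt(energy) * math.sin(phase)))
    return points


def qpsk_demap(points):
    """Decide each bit from the sign of the corresponding coordinate."""
    bits = []
    for s1, s2 in points:
        bits += [1 if s1 > 0 else 0, 1 if s2 > 0 else 0]
    return bits


data = [0, 1, 1, 0, 1, 0, 0, 0]
points = qpsk_map(data)
print(points)
assert qpsk_demap(points) == data
```

With this convention the sign of $s_{i1}$ carries the first bit of the dibit and the sign of $s_{i2}$ carries the second, each coordinate having magnitude $\sqrt{E/2}$, so the two bits can be decided independently.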
7.4.2.2 EXAMPLE 7.1
Figure 7.6 illustrates the sequences and waveforms involved in the generation of a QPSK signal. The input binary sequence 01101000 is shown in Figure 7.6a.
FIGURE 7.6: (a) Input binary sequence. (b) Odd-numbered bits of input sequence and associated binary PSK wave. (c) Even-numbered bits of input sequence and associated binary PSK wave. (d) QPSK waveform defined as $s(t) = s_{i1}\phi_1(t) + s_{i2}\phi_2(t)$.
7.4.2.3 Error Probability of QPSK
(7.19)
7.4.2.4 Generation and Detection of Coherent QPSK Signals
Consider next the generation and detection of QPSK signals. Figure 7.7a shows a block diagram of a typical QPSK transmitter. The incoming binary data sequence is first transformed into polar form by a non-return-to-zero level encoder. Thus, symbols 1 and 0 are represented by $+\sqrt{E_b}$ and $-\sqrt{E_b}$, respectively. This binary wave is next divided by means of a demultiplexer into two separate binary waves consisting of the odd- and even-numbered input bits. These two binary waves are denoted by $a_1(t)$ and $a_2(t)$. We note that in any signaling interval, the amplitudes of $a_1(t)$ and $a_2(t)$ equal $s_{i1}$ and $s_{i2}$, respectively, depending on the particular dibit that is being transmitted. The two binary waves $a_1(t)$ and $a_2(t)$ are used to modulate a pair of quadrature carriers or orthonormal basis functions:
$\phi_1(t) = \sqrt{2/T}\,\cos(2\pi f_c t)$
$\phi_2(t) = \sqrt{2/T}\,\sin(2\pi f_c t)$
The result is a pair of binary PSK signals, which may be detected independently due to the orthogonality of $\phi_1(t)$ and $\phi_2(t)$. Finally, the two binary PSK signals are added to produce the desired QPSK signal.
FIGURE 7.7 Block diagrams of (a) QPSK transmitter and (b) coherent QPSK receiver
The QPSK receiver consists of a pair of correlators with a common input and supplied with a
locally generated pair of coherent reference signals 1(t) and 2(t), as in 7.7b. The correlator
outputs X
1
and X
2
, produced in response to the received signal x(t) are each compared with a
threshold of zero. Finally, the binary sequences at the in-phase and quadrature channel outputs
are combined in a multiplexer to reproduce the original binary sequence at the transmitter input
with the minimum probability of symbol error in an AWGN channel.
7.4.3 M-ARY PSK
QPSK is a special case of M-ary PSK, where the phase of the carrier takes on one of M possible
values, namely, $\theta_i = 2(i-1)\pi/M$, where $i = 1, 2, \ldots, M$. Accordingly, during each
signaling interval of duration T, one of the M possible signals
(7.20)
is sent, where E is the signal energy per symbol. The carrier frequency is $f_c = n_c/T$ for some
fixed integer $n_c$.
Each $s_i(t)$ may be expanded in terms of the same two basis functions $\phi_1(t)$ and
$\phi_2(t)$. The signal constellation of M-ary PSK is therefore two-dimensional. The M message
points are equally spaced on a circle of radius $\sqrt{E}$ centered at the origin, as illustrated
in Figure 7.8a for the case of octaphase-shift keying (i.e., M = 8).
FIGURE 7.8 (a) Signal-space diagram for octaphase-shift keying (i.e., M = 8). The decision
boundaries are shown as dashed lines. (b) Signal-space diagram illustrating the application of the
union bound for octaphase-shift keying.
The transmitted signal corresponds to the message point $m_1$, whose coordinates along the
$\phi_1(t)$ and $\phi_2(t)$ axes are $+\sqrt{E}$ and 0, respectively. Suppose that the ratio
$E/N_0$ is large enough to consider the nearest two message points, one on either side of $m_1$,
as potential candidates for being mistaken for $m_1$ due to channel noise. This is illustrated in
Figure 7.8b for the case of M = 8. The Euclidean distance of each of these two points from $m_1$
is (for M = 8)
(7.21)
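The geometry behind this distance can be checked numerically. The sketch below places the M
message points on a circle of radius $\sqrt{E}$ and compares the measured distance between $m_1$
and its two neighbours with the chord length $2\sqrt{E}\sin(\pi/M)$ given by elementary geometry.
The function names and the choice E = 1 are assumptions made for illustration.

```python
import math

def mpsk_points(M, E=1.0):
    """Message points of M-ary PSK: radius sqrt(E), phases 2*(i-1)*pi/M."""
    r = math.sqrt(E)
    return [(r * math.cos(2 * math.pi * i / M),
             r * math.sin(2 * math.pi * i / M)) for i in range(M)]

def neighbour_distance(M, E=1.0):
    """Chord between adjacent points on the circle: 2*sqrt(E)*sin(pi/M)."""
    return 2.0 * math.sqrt(E) * math.sin(math.pi / M)

pts = mpsk_points(8)                  # octaphase-shift keying
d12 = math.dist(pts[0], pts[1])       # m1 to its neighbour on one side
d18 = math.dist(pts[0], pts[-1])      # m1 to its neighbour on the other side
print(d12, d18, neighbour_distance(8))
```

Both neighbours lie at the same distance from $m_1$, and that distance shrinks as M grows, which
is why the error probability rises when more phases are packed onto the same circle.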
7.4.3.1 The average probability of symbol error for coherent M-ary PSK
(7.22)
where it is assumed that $M \ge 4$. The approximation becomes extremely tight, for fixed M, as
$E/N_0$ is increased. For M = 4, Equation (7.22) reduces to the same form given in Equation
(7.19) for QPSK.
7.4.3.2 BANDWIDTH EFFICIENCY OF M-ARY PSK SIGNALS
The power spectra of M-ary PSK signals possess a main lobe bounded by well-defined spectral nulls
(i.e., frequencies at which the power spectral density is zero). Accordingly, the spectral width
of the main lobe provides a simple and popular measure for the bandwidth of M-ary PSK signals.
This definition is referred to as the null-to-null bandwidth. With the null-to-null bandwidth
encompassing the main lobe of the power spectrum of an M-ary signal, we find that it contains
most of the signal power. This is readily seen by looking at the power spectral plots of Figure 7.9.
(7.23)
TABLE 7.2 Bandwidth efficiency of M-ary PSK signals
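The entries of such a table can be generated with a few lines of Python. The sketch assumes the
usual textbook relations, which are not spelled out in the text above: a null-to-null bandwidth
of $B = 2/T$, a symbol duration $T = T_b \log_2 M$, and hence a bandwidth efficiency
$\rho = R_b/B = (\log_2 M)/2$. The function name is hypothetical.

```python
import math

def psk_bandwidth_efficiency(M):
    """Bandwidth efficiency in bit/s/Hz, assuming B = 2/T (null-to-null)."""
    return math.log2(M) / 2.0

for M in (2, 4, 8, 16, 32, 64):
    rho = psk_bandwidth_efficiency(M)
    print(f"M = {M:2d}   rho = {rho:.1f} bit/s/Hz")
```

Under these assumptions the efficiency grows only logarithmically with M, while the distance
between neighbouring message points shrinks, so higher efficiency is paid for with a higher
required $E/N_0$.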
7.4.4 Frequency-Shift Keying
M-ary PSK and M-ary QAM share a common property: Both are examples of linear modula-
tion. In this section we study a nonlinear method of passband data transmission namely, coher-
ent frequency-shift keying (FSK). We begin the study by considering the simple case of binary
FSK.
7.4.4.1 Binary FSK
In a binary FSK system, symbols 1 and 0 are distinguished from each other by transmitting one
of two sinusoidal waves that differ in frequency by a fixed amount. A typical pair of sinusoidal
waves is described by
(7.24)
where i = 1, 2, and $E_b$ is the transmitted signal energy per bit; the transmitted frequency is
(7.25)
Thus symbol 1 is represented by $S_1(t)$, and symbol 0 by $S_2(t)$. The FSK signal described here
is known as Sunde's FSK. It is a continuous-phase signal in the sense that phase continuity is
always maintained, including at the inter-bit switching times. This form of digital modulation is
an example of continuous-phase frequency-shift keying (CPFSK), on which we have more to say
later in the section. From Equations (7.24) and (7.25), we observe directly that the signals
$S_1(t)$ and $S_2(t)$ are orthogonal, but not normalized to have unit energy. We therefore deduce
that the most useful form for the set of orthonormal basis functions is:
(7.26)
Thus, unlike coherent binary PSK, a coherent binary FSK system is characterized by having a
signal space that is two-dimensional (i.e., N = 2) with two message points (i.e., M = 2), as shown
in Figure 7.9.
(7.28)
FIGURE 7.9 Signal-space diagram for binary FSK system. The diagram also includes two
inserts showing example waveforms of the two modulated signals $S_1(t)$ and $S_2(t)$.
7.4.4.2 Error Probability of Binary FSK
7.4.4.3 Generation and Detection of Coherent Binary FSK Signals
To generate a binary FSK signal, we may use the scheme shown in Figure 7.10a. The incoming
binary data sequence is first applied to an on-off level encoder, at the output of which symbol 1
is represented by a constant amplitude of $\sqrt{E_b}$ volts and symbol 0 is represented by zero
volts. By using an inverter in the lower channel in Figure 7.10a, we in effect make sure that when
we have symbol 1 at the input, the oscillator with frequency $f_1$ in the upper channel is
switched on while the oscillator with frequency $f_2$ in the lower channel is switched off, and
vice versa for symbol 0. The two frequencies $f_1$ and $f_2$ are chosen to equal different integer
multiples of the bit rate $1/T_b$.
To detect the original binary sequence given the noisy received signal x(t), we may use the
receiver shown in Figure 7.10b. It consists of two correlators with a common input, which are
supplied with locally generated coherent reference signals $\phi_1(t)$ and $\phi_2(t)$. The
correlator outputs are then subtracted, one from the other, and the resulting difference, y, is
compared with a threshold of zero volts. If y > 0, the receiver decides in favor of 1. On the
other hand, if y < 0, it decides in favor of 0. If y is exactly zero, the receiver makes a random
guess in favor of 1 or 0.
7.4.5 M-ary QUADRATURE AMPLITUDE Modulation (QAM Mod.):
The transmitted M-ary QAM signal for symbol k is defined by:
(7.29)
The signal $S_k(t)$ consists of two phase-quadrature carriers, each modulated by a set of
discrete amplitudes, hence the name quadrature amplitude modulation. Depending on the number of
possible bits per symbol, we may distinguish two distinct QAM constellations: square
constellations, for which the number of bits per symbol is even, and cross constellations, for
which the number of bits per symbol is odd. These two cases are considered below in that order.
7.4.5.1 QAM SQUARE CONSTELLATION:
M-ary signal with an even number of bits per symbol. Example:
Consider a 16-QAM whose signal constellation is depicted in Figure (7.11). The encoding of the
message points shown in this figure is as follows:
Two of the four bits, namely, the left-most two bits, specify the quadrant in the
($\phi_1$, $\phi_2$) plane in which a message point lies. Thus, starting from the first quadrant
and proceeding counterclockwise, the four quadrants are represented by the dibits 11, 10, 00,
and 01.
The remaining two bits are used to represent one of the four possible symbols lying within each
quadrant of the ($\phi_1$, $\phi_2$) plane.
Figure (7.11): Signal-space diagram of M-ary QAM for M = 16; the message points in each
quadrant are identified with Gray-encoded quadbits.
NOTE:
The encoding of the four quadrants and also the encoding of the symbols in each quadrant follow
the Gray coding rule.
The probability of symbol error for M-ary QAM is approximately given by :
(7.30)
The probability of symbol error in terms of the average value of the
transmitted energy rather than $E_0$:
(7.31)
7.4.5.2 QAM CROSS CONSTELLATION:
M-ary signal with an odd number of bits per symbol.
We may construct such a signal constellation with n bits per symbol by proceeding as follows:
Start with a QAM square constellation with n-1 bits per symbol.
Extend each side of the QAM square constellation by adding $2^{n-3}$ symbols.
Ignore the corners in the extension.
Figure (7.12): Illustrating how a square QAM constellation can be expanded to form a QAM
cross constellation.
Note also that it is not possible to perfectly Gray code a QAM cross constellation.
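The construction steps above can be sketched in Python as follows. The integer grid with odd
coordinates (..., -3, -1, +1, +3, ...), the function names, and the restriction to odd
$n \ge 5$ are assumptions of this illustration; for these n the $2^{n-3}$ symbols added per side
form complete extra layers next to the square.

```python
def square_qam(bits):
    """Square QAM grid for an even number of bits per symbol."""
    if bits % 2:
        raise ValueError("square QAM needs an even number of bits")
    side = 2 ** (bits // 2)
    levels = range(-(side - 1), side, 2)      # ..., -3, -1, +1, +3, ...
    return {(x, y) for x in levels for y in levels}

def cross_qam(n):
    """Cross QAM with odd n >= 5 bits, built from the (n-1)-bit square."""
    if n % 2 == 0 or n < 5:
        raise ValueError("this sketch handles odd n >= 5 only")
    side = 2 ** ((n - 1) // 2)
    inner = side - 1              # outermost level of the starting square
    extra = side // 4             # layers added on each side
    outer = inner + 2 * extra
    levels = range(-outer, outer + 1, 2)
    points = set()
    for x in levels:
        for y in levels:
            if abs(x) > inner and abs(y) > inner:
                continue          # ignore the corners of the extension
            points.add((x, y))
    return points

for n in (5, 7):
    added = len(cross_qam(n)) - len(square_qam(n - 1))
    print(n, len(cross_qam(n)), added // 4)   # total points, points per side
```

For n = 5 the 16-point square gains 4 points on each side, giving 32 points in total, and for
n = 7 the 64-point square gains 16 points per side, giving 128.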
The probability of symbol error:
(7.32)
7.4.6 Comparison between PSK and QAM
M-ary PSK systems use fixed-step phase shifts with a constant envelope. When M is increased to
raise the capacity of such a system, the constellation points get closer to each other, which
increases the bit error rate. A simple solution is to increase the radius of the constellation
circle, but this also increases the transmitted power.
A different technique was developed to overcome that problem by making use of the available
space inside the constellation circle.
This technique is called quadrature amplitude modulation, as it combines both ASK and PSK.
Figure (7.13): 16-ary PSK (a) and 16-ary QAM of equivalent average power (b)
7.5 Noncoherent Orthogonal Modulation :
Noncoherent orthogonal modulation includes two noncoherent receivers as special cases:
noncoherent binary frequency-shift keying and differential phase-shift keying.
7.5.1 NONCOHERENT BINARY FSK:
The transmitted signal is defined by:
(7.33)
where the carrier frequency $f_i$ equals one of two possible values, $f_1$ and $f_2$; to ensure
that the signals representing these two frequencies are orthogonal, we choose $f_i = n_i/T_b$,
where $n_i$ is an integer. The transmission of frequency $f_1$ represents symbol 1, and the
transmission of frequency $f_2$ represents symbol 0. For the noncoherent detection of this
frequency-modulated wave, the receiver consists of a pair of matched filters followed by envelope
detectors, as in Figure (7.14). The filter in the upper path of the receiver is matched to
$\cos(2\pi f_1 t)$, and the filter in the lower path is matched to $\cos(2\pi f_2 t)$, and in both
cases $0 \le t \le T_b$. The resulting envelope detector outputs are sampled at $t = T_b$, and
their values are compared. The envelope samples of the upper and lower paths in Figure (7.14) are
shown as $l_1$ and $l_2$, respectively. Then, if $l_1 > l_2$, the receiver decides in favor of
symbol 1, and if $l_1 < l_2$, it decides in favor of symbol 0. If $l_1 = l_2$, the receiver simply
makes a guess in favor of symbol 1 or 0. The noncoherent binary FSK described herein is a special
case of noncoherent orthogonal modulation with $T = T_b$ and $E = E_b$, where $T_b$ is the bit
duration and $E_b$ is the signal energy per bit. Because the carrier phase is unknown, the
receiver relies on amplitude as the only possible discriminant.
Figure (7.14): Noncoherent receiver for the detection of binary FSK signals.
BIT ERROR RATE:
(7.34)
7.5.2 Differential phase shift keying (DPSK):
Differential phase shift keying (DPSK) is a common form of phase modulation that conveys data
by changing the phase of the carrier wave. As mentioned for BPSK and QPSK, there is an ambiguity
of phase if the constellation is rotated by some effect in the communications channel through
which the signal passes. This problem can be overcome by using the data to change rather than
set the phase.
For example, in differentially-encoded BPSK a binary 1 may be transmitted by adding 180° to
the current phase and a binary 0 by adding 0° to the current phase. In differentially-encoded
QPSK, the phase shifts are 0°, 90°, 180°, -90°, corresponding to data 00, 01, 11, 10. This kind
of encoding may be demodulated in the same way as for non-differential PSK, but the phase
ambiguities can be ignored. Thus, each received symbol is demodulated to one of the M points in
the constellation and a comparator then computes the difference in phase between this received
signal and the preceding one. The difference encodes the data as described above.
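A minimal Python sketch of the differentially-encoded BPSK rule just described is given below.
It assumes that the encoded bit stands for the carrier phase (0 for 0°, 1 for 180°), so that
"adding 180°" becomes an exclusive-OR with the previous encoded bit, and it assumes a reference
bit of 0 at the start. Other texts use the complementary convention, so the function names and
this particular rule are illustrative only.

```python
def diff_encode(data, ref=0):
    """Differential encoding: a 1 toggles the phase, a 0 keeps it."""
    encoded = [ref]                       # reference symbol sent first
    for a in data:
        encoded.append(encoded[-1] ^ a)   # 1 -> add 180 deg, 0 -> add 0 deg
    return encoded

def diff_decode(encoded):
    """Recover the data from the phase change between successive symbols."""
    return [encoded[k] ^ encoded[k - 1] for k in range(1, len(encoded))]

data = [1, 0, 0, 1, 0, 0, 1, 1]
d = diff_encode(data)
flipped = [b ^ 1 for b in d]   # channel rotates every symbol by 180 degrees
print(d)
print(diff_decode(d) == data, diff_decode(flipped) == data)
```

The second check shows the point of the scheme: inverting every received symbol leaves the
decoded data unchanged, because only the difference between successive symbols carries
information.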
7.5.2.1 Procedure
This is done by differential encoding, i.e., the input binary sequence is first differentially
encoded and then modulated using a BPSK modulator.
Let $a_k$ be the original binary data sequence and $d_k$ the encoded binary data sequence.
Encoding:
(7.35)
Decoding:
(7.36)
Modulator of DPSK:-
Demodulator of DPSK:-
Example:
Table (7.3) DPSK example
7.5.2.2 Power spectral density:
The power spectral density is the same as for BPSK. The only difference between differentially
encoded BPSK and BPSK is the differential encoding, which always produces an asymptotically
equally likely data sequence; the PSD of differentially encoded BPSK is therefore the same as
that of BPSK with equally likely data.
Advantages vs. disadvantages:
Advantage: reduced receiver complexity.
Disadvantage: energy efficiency is lower than that of coherent PSK by 3 dB.
Probability of error:
(7.37)
Figure (7.18): Performance comparison between coherent BPSK, coherent FSK, DPSK and
noncoherent FSK.
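For readers who want to reproduce such a comparison numerically, the sketch below evaluates the
standard textbook bit-error-rate expressions for the four binary schemes over an AWGN channel.
These closed forms are the usual ones found in references such as [2]; they are given here as an
assumption and are not necessarily written exactly as in the equations of this chapter. The
function names are hypothetical and `ebn0` is the linear (not dB) ratio $E_b/N_0$.

```python
import math

def ber_coherent_bpsk(ebn0):
    return 0.5 * math.erfc(math.sqrt(ebn0))

def ber_coherent_bfsk(ebn0):
    return 0.5 * math.erfc(math.sqrt(ebn0 / 2.0))

def ber_dpsk(ebn0):
    return 0.5 * math.exp(-ebn0)

def ber_noncoherent_bfsk(ebn0):
    return 0.5 * math.exp(-ebn0 / 2.0)

for ebn0_db in (0, 4, 8, 12):
    ebn0 = 10 ** (ebn0_db / 10.0)         # convert dB to a linear ratio
    print(ebn0_db,
          f"{ber_coherent_bpsk(ebn0):.2e}",
          f"{ber_dpsk(ebn0):.2e}",
          f"{ber_coherent_bfsk(ebn0):.2e}",
          f"{ber_noncoherent_bfsk(ebn0):.2e}")
```

At moderate to high $E_b/N_0$ the columns come out in the order coherent BPSK, DPSK, coherent
FSK, noncoherent FSK, from best to worst.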
7.6 Table of BER equations
7.7 Modulation in LTE
1011 1001 0001 0011
1010 1000 0000 0010
1110 1100 0100 0110
1111 1101 0101 0111
Fig: Constellation diagram of 16-QAM modulation in LTE.
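The bit labels in this diagram follow a regular pattern that can be captured in a short mapping
function. The sketch below is one closed-form expression chosen to reproduce the grid shown
above, with bits ordered $b_0 b_1 b_2 b_3$, rows ordered from Q = +3 down to Q = -3 and columns
from I = -3 to I = +3. The function names and the scaling by $1/\sqrt{10}$ for unit average power
are assumptions of this illustration.

```python
import math

def lte_16qam(b0, b1, b2, b3):
    """Map four bits to a 16-QAM point: b0/b2 set I, b1/b3 set Q."""
    i = (1 - 2 * b0) * (2 - (1 - 2 * b2))
    q = (1 - 2 * b1) * (2 - (1 - 2 * b3))
    return complex(i, q) / math.sqrt(10)

def grid_rows():
    """Rebuild the labelled grid, top row first, left column first."""
    lookup = {}
    for n in range(16):
        bits = [(n >> s) & 1 for s in (3, 2, 1, 0)]
        p = lte_16qam(*bits) * math.sqrt(10)      # undo the scaling
        lookup[(round(p.real), round(p.imag))] = "".join(map(str, bits))
    return [" ".join(lookup[(i, q)] for i in (-3, -1, 1, 3))
            for q in (3, 1, -1, -3)]

for row in grid_rows():
    print(row)
```

In this mapping $b_0$ and $b_1$ select the sign of the in-phase and quadrature components, while
$b_2$ and $b_3$ select the inner or outer amplitude level, so neighbouring points differ in
exactly one bit.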
7.8 Soft demodulation
7.8.1 BASIC PRINCIPLE OF M-QAM SOFT DEMODULATION
Compared with M-QAM hard demodulation, M-QAM soft demodulation combined with turbo decoding
improves decoding performance and reduces both the bit error ratio and the frequency of HARQ
retransmissions.
Assume that M-QAM modulation maps the bit set $(r_1, r_2, r_3, r_4, \ldots)$ to the complex signal
$x = x_I + j x_Q$,
and that the channel through which the transmitted symbol passes is a flat Rayleigh fading
channel (namely, the channel fading coefficient is constant within each symbol period). The
received signal y can then be written as:
$y = hx + n$
where h is the channel fading coefficient with $E\{|h|^2\} = 1$, and $n = n_I + j n_Q$ is complex
white Gaussian noise; $n_I$ and $n_Q$ are both Gaussian distributed with zero mean and variance
$\sigma^2/2$.
The log-likelihood ratio of bit $r_i$ at the receiving end is defined as:
$\mathrm{LLR}(r_i) = \log\left(\frac{\Pr\{r_i = 1 \mid y, h\}}{\Pr\{r_i = 0 \mid y, h\}}\right)$
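7.8.2 Softbit for 16-QAM
Channel Model
The received coded sequence is $y = c + n$, where
c is the modulated coded sequence taking values in the alphabet
$\{\pm 1, \pm 3\} + j\{\pm 1, \pm 3\}$.
n is the additive white Gaussian noise, whose real and imaginary parts each follow the
probability density function $p(n) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{n^2}{2\sigma^2}}$.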
Soft bit for b0
The bit mapping for bit b0 with 16-QAM Gray-coded mapping is shown below. We can
see that when b0 toggles from 0 to 1, only the real part of the constellation is affected.
When b0 is 0, the real part of the QAM constellation takes values -3 or -1. The conditional
probability of the received signal y given that b0 is 0 is
$P(y \mid b_0 = 0) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_{re}+3)^2}{2\sigma^2}} + \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_{re}+1)^2}{2\sigma^2}}$
When b0 is 1, the real part of the QAM constellation takes values +1 or +3. The conditional
probability given that b0 is 1 is
$P(y \mid b_0 = 1) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_{re}-1)^2}{2\sigma^2}} + \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_{re}-3)^2}{2\sigma^2}}$
Soft bit for b1
The bit mapping for bit b1 with 16-QAM Gray-coded mapping is shown
below. We can see that when b1 toggles from 0 to 1, only the real part of
the constellation is affected.
When b1 is 0, the real part of the QAM constellation takes values -3
or +3. The conditional probability given that b1 is 0 is
$P(y \mid b_1 = 0) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_{re}+3)^2}{2\sigma^2}} + \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_{re}-3)^2}{2\sigma^2}}$
When b1 is 1, the real part of the QAM constellation takes values -1 or
+1. The conditional probability given that b1 is 1 is
$P(y \mid b_1 = 1) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_{re}+1)^2}{2\sigma^2}} + \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_{re}-1)^2}{2\sigma^2}}$
Summary
The softbit for bit b0 is:
$Sb(b_0) = \begin{cases} 2(y_{re}+1), & y_{re} < -2 \\ y_{re}, & -2 \le y_{re} < 2 \\ 2(y_{re}-1), & y_{re} > 2 \end{cases}$
The softbit for bit b1 is:
$Sb(b_1) = \begin{cases} y_{re}+2, & y_{re} \le 0 \\ -y_{re}+2, & y_{re} > 0 \end{cases}$
The softbit for bit b1 can be simplified to:
$Sb(b_1) = -|y_{re}| + 2$, for all $y_{re}$
It is easy to observe that the softbits for bits $b_2$, $b_3$ are identical to the softbits
for $b_0$, $b_1$ respectively, except that the decisions are based on the imaginary
component $y_{im}$ of the received vector.
The softbit for bit b2 is:
$Sb(b_2) = \begin{cases} 2(y_{im}+1), & y_{im} < -2 \\ y_{im}, & -2 \le y_{im} < 2 \\ 2(y_{im}-1), & y_{im} > 2 \end{cases}$
The softbit for bit b3 is:
$Sb(b_3) = -|y_{im}| + 2$, for all $y_{im}$
A further simplification avoids the need for a threshold check in the
receiver for softbits b0 and b2, respectively:
$2(y_{re}+1) \approx y_{re}$
and
$2(y_{im}+1) \approx y_{im}$
This simplification is described in [1].
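The piecewise softbit expressions summarised above translate almost line by line into code. The
sketch below is a hedged illustration in Python: the function names are hypothetical, the input
is assumed to be an equalised sample scaled so that the constellation levels sit at ±1 and ±3,
and the value at the boundary $y = 2$ is taken from the outer expression (both expressions agree
there).

```python
def softbit_outer(y):
    """Softbit for b0 (real part) or b2 (imaginary part)."""
    if y < -2:
        return 2 * (y + 1)
    if y < 2:
        return y
    return 2 * (y - 1)

def softbit_inner(y):
    """Softbit for b1 (real part) or b3 (imaginary part)."""
    return -abs(y) + 2

def softbits_16qam(y):
    """Return (Sb(b0), Sb(b1), Sb(b2), Sb(b3)) for one complex sample."""
    return (softbit_outer(y.real), softbit_inner(y.real),
            softbit_outer(y.imag), softbit_inner(y.imag))

sample = complex(-3.0, 1.0)
sb = softbits_16qam(sample)
hard = [1 if s > 0 else 0 for s in sb]   # sign of the softbit gives the bit
print(sb, hard)
```

The sign of each softbit gives the hard decision and its magnitude gives the reliability, which
is the information a turbo decoder can exploit. The simplified variant mentioned above would
replace `softbit_outer` by the identity function.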
Bibliography
[1] Paola Bisaglia Filippo Tosato. Simplified soft-output demapper for binary interleaved
COFDM with application to HIPERLAN/2. Journal, October 2001.
[2] Simon Haykin. Communication Systems. John Wiley and Sons, Inc, 2001.
[3] Jia Yin Lang Tianyi. Application of soft demodulation in LTE physical layer downlink.
Journal, 2011.
Chapter 8
MIMO
8.1 MIMO concepts and capacity
8.1.1 Introduction
Wireless system designers are faced with numerous challenges, including
limited availability of radio frequency spectrum and transmission problems
caused by such factors as fading and multipath distortion. Meanwhile, there
is increasing demand for higher data rates, better quality of service, fewer
dropped calls, and higher network capacity. Meeting these needs requires new
techniques that improve spectral efficiency and the operational reliability of
network links. Multiple-input multiple-output (MIMO) technology promises a
cost-effective way to provide these capabilities. MIMO uses antenna arrays at
both the transmitter and receiver. Algorithms in a radio chipset send
information out over the antennas. The radio signals reflect off objects,
creating multiple paths that in conventional radios cause interference and
fading. MIMO instead sends data over these multiple paths, thereby increasing
the amount of information the system carries. The data is received by multiple
antennas and recombined properly by other MIMO algorithms. This technology
promises to let engineers scale up wireless bandwidth or increase
transmission ranges. MIMO is an underlying technique for carrying data.
It operates at the physical layer, below the protocols used to carry the data,
so its channels can work with virtually any wireless transmission protocol.
For example, MIMO can be used with the popular IEEE 802.11 (Wi-Fi)
technology, and in the upcoming mobile generations and broadband solutions
such as IEEE 802.16 (WiMAX) and Long Term Evolution (LTE).
Figure 8.1: Channel impairments
For these reasons, MIMO is expected to become the standard for carrying
almost all wireless traffic and a core technology in wireless systems. It is
arguably the only economical way to increase bandwidth and range. MIMO still
must prove itself in large-scale, real-world implementations, and it must
overcome several obstacles to its success, including energy consumption, cost,
and competition from similar technologies.
8.1.2 WIRELESS CHANNEL IMPAIRMENTS:
a) Multipath fading (destructive interference): scattering due to
different obstacles (Figure 8.1).
b) Shadowing: communication blocked by obstacles (Figure 8.2).
c) Interference (Figure 8.3).
8.1.3 What is MIMO
MIMO is an acronym that stands for Multiple Input Multiple Output.
It is an antenna technology that is used in both transmitter and receiver
equipment for wireless radio communication to improve communication
performance. It is one of several forms of smart antenna technology.
Figure 8.2: Shadowing
Figure 8.3: Interference
Why is MIMO a key feature in modern wireless communication systems? There
are many reasons why MIMO is expected to become a core technology in wireless
systems; some of them are listed here. The MIMO technique is able to:
Exploit multipath by taking advantage of random fading; the main impairment
to the performance of wireless communication systems is fading due to
multipath and interference.
Achieve very high spectral efficiency, which makes it well suited to
limited bandwidth availability.
Save system power, as it increases the system capacity and reliability
without consuming excessive power.
Increase the system capacity so that it can support a larger number of users.
Increase the system throughput, as it can support high data rates.
Increase both the quality of service and the revenues significantly.
For these reasons MIMO is an important technique, and the aim of this section
is to provide a concise overview of it.
8.1.4 MIMO vs. Channel Capacity
Channel capacity: the maximum possible transmission rate such that the
probability of error is arbitrarily small. Multipath propagation has long been
regarded as an impairment because it causes signal fading. To mitigate this
problem, diversity techniques were developed; antenna diversity is a widespread
form of diversity. Recent research has shown that multipath propagation can in
fact contribute to capacity.
There are a number of different MIMO configurations or formats that can
be used. These are termed SISO, SIMO, MISO and MIMO. These formats offer
different advantages and disadvantages, which can be balanced to provide the
optimum solution for any given application.
8.1.5 SISO, SIMO, MISO and MIMO terminology
The different forms of antenna technology refer to single or multiple inputs
and outputs. These are related to the radio link: the input is the transmitter,
as it transmits into the link or signal path, and the output is the receiver,
which sits at the output of the wireless link. The different forms of
single / multiple antenna links are therefore defined as below:
SISO - Single Input Single Output.
SIMO - Single Input Multiple Output.
MISO - Multiple Input Single Output.
MIMO - Multiple Input Multiple Output.
The term MU-MIMO is also used for a multiple-user version of MIMO, as
described below.
SISO The simplest form of radio link can be defined in MIMO terms as
SISO - Single Input Single Output. This is effectively a standard radio
channel: the transmitter operates with one antenna, as does the receiver. There
is no diversity and no additional processing is required (Figure 8.4).
The advantage of a SISO system is its simplicity. SISO requires no processing
in terms of the various forms of diversity that may be used. However,
the SISO channel is limited in its performance. Interference and fading will
impact the system more than a MIMO system using some form of diversity,
and the channel capacity is limited by Shannon's law, the throughput
being dependent upon the channel bandwidth and the signal-to-noise
ratio. The channel capacity of this form can be calculated by the Shannon
formula:
$C = B \log_2(1 + S/N)$ bit/s
SIMO (receive diversity) The SIMO or Single Input Multiple Output version
of MIMO occurs where the transmitter has a single antenna and the
receiver has multiple antennas. This is also known as receive diversity. It
is often used to enable a receiver system that receives signals from a number
of independent sources to combat the effects of fading. It has been used
for many years with short wave listening / receiving stations to combat the
effects of ionospheric fading and interference (Figure 8.5).
Figure 8.4: SISO
Figure 8.5: SIMO
SIMO has the advantage that it is relatively easy to implement, although it
does have the disadvantage that processing is required in the receiver. The
use of SIMO may be quite acceptable in many applications, but where the
receiver is located in a mobile device such as a cell phone handset, the level
of processing may be limited by size, cost and battery drain.
In this case the transmitter has a single antenna and the receiver has n
antennas. The channel capacity increases without any change in bandwidth:
$C = B \log_2(1 + n \cdot S/N)$ bit/s
For example, if n = 2 (two receiver antennas), B = 5 MHz and S/N = 100, then
C = 33.3 Mb/s in the SISO system and C = 38.3 Mb/s in the SIMO system. The
capacity gain is modest, but SIMO also provides other benefits such as reduced
fading (diversity gain).
MISO (transmit diversity) MISO is also termed transmit diversity. In this
case, the same data is transmitted redundantly from the two transmitter
antennas. The receiver is then able to receive the optimum signal, which it
can then use to extract the required data (Figure 8.6).
Figure 8.6: MISO
Figure 8.7: MIMO
MIMO Where there is more than one antenna at either end of the radio
link, this is termed MIMO - Multiple Input Multiple Output. MIMO can
be used to provide improvements in both channel robustness and channel
throughput (Figure 8.7).
$C = B \log_2(1 + n_T \cdot n_R \cdot S/N)$ bit/s
$n_T$: number of transmitter antennas
$n_R$: number of receiver antennas
For the example above with $n_T = n_R = 2$, C = 43.2 Mb/s. However, when the
signal is coded using techniques called space-time coding,
$C = \min(n_T, n_R) \cdot B \log_2(1 + S/N)$ bit/s
where $\min(n_T, n_R)$ is the minimum of $n_T$ and $n_R$, so that C = 66.6 Mb/s,
which is much better. With 3x3 or 4x4 antennas, C increases further. MIMO is
divided into single-user and multi-user MIMO:
MIMO single-user (MIMO-SU): shown in Figure 8.8.
MIMO multi-user (MIMO-MU): the main difference from the single-user MIMO
system is that there are many receivers, each with its own antenna (Figure 8.9).
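The capacity figures quoted in this section can be recomputed with a short script. The sketch
below simply evaluates the four capacity expressions given above for B = 5 MHz and a linear
signal-to-noise ratio of 100; the function names are hypothetical and the results are idealised
upper bounds, not measured throughputs.

```python
import math

def capacity_siso(bandwidth, snr):
    return bandwidth * math.log2(1 + snr)

def capacity_simo(bandwidth, snr, n):
    return bandwidth * math.log2(1 + n * snr)

def capacity_mimo_diversity(bandwidth, snr, n_t, n_r):
    return bandwidth * math.log2(1 + n_t * n_r * snr)

def capacity_mimo_multiplexing(bandwidth, snr, n_t, n_r):
    return min(n_t, n_r) * bandwidth * math.log2(1 + snr)

B, SNR = 5e6, 100.0          # 5 MHz, S/N = 100 (20 dB)
results = {
    "SISO": capacity_siso(B, SNR),
    "SIMO, n = 2": capacity_simo(B, SNR, 2),
    "MIMO 2x2, diversity": capacity_mimo_diversity(B, SNR, 2, 2),
    "MIMO 2x2, multiplexing": capacity_mimo_multiplexing(B, SNR, 2, 2),
}
for name, c in results.items():
    print(f"{name:24s} {c / 1e6:.1f} Mb/s")
```

The comparison makes the trade-off visible: adding antennas inside the logarithm gives only a
modest gain, whereas the factor in front of the logarithm scales the capacity linearly with the
number of spatial streams.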
Figure 8.8: MIMO single-user
Figure 8.9: MIMO multi-user
Figure 8.10: table 1
8.2 Diversity
Diversity means sending the same data over independent fading paths. These
independent paths are combined in some way such that the fading of the
resultant signal is reduced, so the receiver has several copies of the signal.
Because the copies travel over different independent paths, the probability
that two paths undergo a deep fade at the same time is very small; how small
depends on how strongly the two paths are correlated with each other.
8.2.1 Types of diversity:
1. Time diversity: Time diversity is achieved by transmitting the same
signal at different times, where the time difference is greater than the
channel coherence time (the inverse of the channel Doppler spread).
Time diversity does not require increased transmit power, but it does
decrease the data rate, since data is repeated in the diversity time slots
rather than sending new data in these time slots. Time diversity can
also be achieved through coding and interleaving.
2. Frequency diversity: The same signal is transmitted on several carriers.
A separation between carriers of at least the coherence bandwidth
$(\Delta f)_c$ guarantees that the fading statistics for different
frequencies are essentially uncorrelated (different copies undergo
independent fading). The coherence bandwidth is different for different
propagation environments. Like time diversity, frequency diversity
induces a loss in bandwidth efficiency due to the redundancy introduced
in the frequency domain (Figures 8.11 and 8.12).
Figure 8.11: Frequency diversity vs. time at one slot
Figure 8.12: Frequency diversity vs. time at two slots
3. Polarization Diversity: It uses either two transmit antennas or two
receive antennas with different polarization (e.g. vertically and
horizontally polarized waves). Polarization diversity has two
disadvantages. First, there can be at most two diversity branches,
corresponding to the two types of polarization. Second, polarization
diversity effectively loses half the power (3 dB), since the transmit or
receive power is divided between the two differently polarized antennas.
4. Delay diversity: A radio channel subject to time dispersion, with the
transmitted signal propagating to the receiver via multiple, independently
fading paths with different delays, provides the possibility for
multi-path diversity or, equivalently, frequency diversity. Thus
multipath propagation is actually beneficial in terms of radio-link
performance, assuming that the amount of multipath propagation is not too
extensive and that the transmission scheme includes tools to counteract
signal corruption due to the radio-channel frequency selectivity,
for example, by means of OFDM transmission or the use of advanced
receiver-side equalization. If the channel in itself is not time dispersive,
the availability of multiple transmit antennas can be used to create
artificial time dispersion or, equivalently, artificial frequency selectivity
by transmitting identical signals with different relative delays from the
different antennas. In this way, the antenna diversity, i.e. the fact that
the fading experienced by the different antennas has low mutual
correlation, can be transformed into frequency diversity. This kind of delay
diversity is illustrated in Figure 8.13 for the special case of two
transmit antennas. The relative delay should be selected to ensure a
suitable amount of frequency selectivity over the bandwidth of the signal
to be transmitted. It should be noted that, although Figure 8.13
assumes two transmit antennas, delay diversity can straightforwardly be
extended to more than two transmit antennas with different relative
delays for each antenna. Delay diversity is in essence invisible to the
mobile terminal, which will simply see a single radio channel subject to
additional time dispersion. Delay diversity can thus straightforwardly
be introduced in an existing mobile-communication system without
requiring any specific support in a corresponding radio-interface
standard. Delay diversity is also applicable to basically any kind of
transmission scheme that is designed to handle and benefit from
frequency-selective fading including, for example, WCDMA and CDMA2000.
5. Cyclic-delay diversity: Cyclic-Delay Diversity (CDD) is similar to
delay diversity, with the main difference that cyclic-delay diversity
operates block-wise and applies cyclic shifts, rather than linear delays,
to the different antennas (see Figure 8.14). Thus cyclic-delay
diversity is applicable to block-based transmission schemes such as OFDM
and DFTS-OFDM. In the case of OFDM transmission, a cyclic shift of the
time-domain signal corresponds to a frequency-dependent phase shift
before OFDM modulation, as illustrated in Figure 8.14b. Similar to
delay diversity, this will create artificial frequency selectivity as seen by
the receiver. Also similar to delay diversity, CDD can straightforwardly
be extended to more than two transmit antennas with different cyclic
shifts for each antenna.
Figure 8.13: Two-antenna delay diversity
6. Space Diversity: The signal is transferred over several different
propagation paths. In the case of wired transmission, this can be achieved
by transmitting via multiple wires. In the case of wireless transmission,
it can be achieved by antenna diversity using multiple transmitter
antennas (transmit diversity) and/or multiple receiving antennas
(reception diversity). The multiple antennas are separated physically by
a proper distance so that the individual signals are uncorrelated. The
separation requirements vary with antenna height, propagation
environment and frequency. Typically a separation of a few wavelengths is
enough to obtain uncorrelated signals. In space diversity, the replicas
of the transmitted signals are usually provided to the receiver in the
form of redundancy in the space domain. Unlike time and frequency
diversity, space diversity does not induce any loss in bandwidth
efficiency. This property is very attractive for future high data rate
wireless communications. In the case of reception diversity, a diversity
combining technique is applied before further signal processing takes
place. If the antennas are far apart, for example at different cellular
base station sites or WLAN access points, this is called macrodiversity.
If the antennas are at a distance in the order of one wavelength, this is
called microdiversity. A special case is phased antenna arrays, which can
also be used for beamforming, MIMO channels and space-time coding (STC).
Figure 8.14: Two-antenna cyclic delay diversity
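The statement made for cyclic-delay diversity, that a cyclic shift of the time-domain block
corresponds to a frequency-dependent phase shift, can be verified numerically. The sketch below
uses a plain discrete Fourier transform written with the standard library; the block contents,
the delay of 3 samples and the function names are arbitrary assumptions chosen for illustration.

```python
import cmath

def dft(x):
    """Direct O(N^2) discrete Fourier transform of a block."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

def cyclic_delay(x, d):
    """Cyclically delay a block by d samples: y[n] = x[(n - d) mod N]."""
    d %= len(x)
    return x[-d:] + x[:-d] if d else list(x)

x = [1, 2, 0, -1, 3, 0, 1, -2]      # one time-domain block
d = 3                               # cyclic delay on the second antenna
N = len(x)

X = dft(x)
Y = dft(cyclic_delay(x, d))
# Expected: the same spectrum with a phase ramp exp(-j*2*pi*k*d/N).
expected = [X[k] * cmath.exp(-2j * cmath.pi * k * d / N) for k in range(N)]
max_error = max(abs(Y[k] - expected[k]) for k in range(N))
print(max_error)
```

Each subcarrier keeps its magnitude and only its phase changes, by an amount proportional to the
subcarrier index; when the delayed and undelayed copies add up over the air, this phase ramp is
what makes the combined channel look frequency selective.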
8.2.2 Receive Diversity:
Receive diversity is also called SIMO (single input, multiple output), because
multiple antennas are used at the receiver, as shown in Figure 8.15.
Figure 8.15: Receive Diversity
Receive diversity is most often used in the uplink. Here, the base station
uses two antennas to pick up two copies of the received signal. The signals
reach the receive antennas with different phase shifts, but these can be
removed by antenna-specific channel estimation (Figure 8.16). The base
station can then add the signals together in phase, without any risk of
destructive interference between them. The signals are both made up from
several smaller rays, so they are both subject to fading. If the two individual
signals undergo fades at the same time, then the power of the combined
signal will be low. But if the antennas are far enough apart (a few
wavelengths of the carrier frequency), then the two sets of fading geometries will
be very different, so the signals will be far more likely to undergo fades at
completely different times. We have therefore reduced the amount of fading
in the combined signal, which in turn reduces the error rate.
Figure 8.16: Main idea of Receive Diversity
Base stations usually have more than one receive antenna. In LTE, the mobile's
test specifications assume that the mobile is using two receive antennas, so LTE
systems are expected to use receive diversity on the downlink as well as the
uplink. A mobile's antennas are closer together than a base station's, which
reduces the benefit of receive diversity, but the situation can often be
improved using antennas that measure two independent polarizations of the
incoming signal.
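The claim that two well-separated antennas rarely fade together can be checked with a small Monte Carlo sketch. It assumes two independent Rayleigh-faded branches with unit average power; the function names, the fade threshold of 0.1 and the trial count are illustrative choices, not values from the project.

```python
import random

def rayleigh_power(rng):
    """Power |h|^2 of one Rayleigh-faded branch with unit average power."""
    re = rng.gauss(0.0, 0.5 ** 0.5)
    im = rng.gauss(0.0, 0.5 ** 0.5)
    return re * re + im * im

def deep_fade_rates(trials=20000, threshold=0.1, seed=1):
    """Fraction of trials in which one branch, or both branches, are in a deep fade."""
    rng = random.Random(seed)
    single = both = 0
    for _ in range(trials):
        p0, p1 = rayleigh_power(rng), rayleigh_power(rng)
        single += p0 < threshold
        both += (p0 < threshold) and (p1 < threshold)
    return single / trials, both / trials

single_rate, both_rate = deep_fade_rates()
print("one branch in a deep fade:", single_rate)
print("both branches in a deep fade:", both_rate)
```

Under these assumptions the second figure should come out roughly an order of magnitude smaller than the first, which is the benefit that receive diversity exploits.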
How does the receiver recover the signal from the multiple copies that reach it?
It uses one of the diversity combining techniques, of which there are
several types:
1. Selective Combining (SC): The receiver has several diversity
branches and takes the information only from the branch with the largest
signal-to-noise ratio (SNR). This technique is impractical for continuous
transmission systems, because all the diversity branches must be monitored
in order to select the one with the largest SNR. Moreover, since only one
branch output is used, co-phasing of multiple branches is not required,
so this technique can be used with either coherent or differential
modulation (Figures 8.17 and 8.18).
Figure 8.17: Selective Combining
Figure 8.18: Branch selective diversity
2. Threshold Combining: A simpler type of combining, called threshold
combining, avoids the need for a dedicated receiver on each branch by
scanning each of the branches in sequential order and outputting the
first signal with SNR above a given threshold. Once
a branch is chosen, as long as the SNR on that branch remains above
the desired threshold, the combiner outputs that signal. If the SNR on
the selected branch falls below the threshold, the combiner switches to
another branch.
As in SC, since only one branch output is used at a time, co-phasing
is not required. Thus, this technique can be used with either coherent
or differential modulation. There are several criteria the combiner can
use to decide which branch to switch to; the simplest criterion is to
switch randomly to another branch (Figure 8.19).
Figure 8.19: Threshold Combining
3. Equal Gain Combining: MRC requires knowledge of the time-varying SNR on
each branch, which can be very difficult to measure. A simpler technique is
equal-gain combining, which co-phases the signals on each branch and then
combines them with equal weighting. This technique does not need channel
estimation of the envelope, only of the phase. The combiner's output can
be written as:
$y = \sum_{i=1}^{M} e^{-j\theta_i} r_i$
where $r_i$ is the signal received on branch $i$ and $\theta_i$ is the phase of
that branch's channel.
4. Switched Diversity Combining (SDC): When the signal quality of the
branch in use is good, there is no need to look at the other branches;
they are needed only when the signal quality decreases. Two
strategies are used:
Switch-and-examine strategy: The receiver stays with the current branch until
the envelope drops below a predefined threshold (Figure 8.20).
Switch-and-stay strategy: The receiver switches to the strongest of
the M-1 other signals only if its level exceeds the threshold. This
results in fewer signal discontinuities (Figure 8.21).
Figure 8.20: Switch-and-examine strategy
Figure 8.21: Switch-and-stay strategy
5. Maximal Ratio Combining (MRC): The idea of MRC is that branches with
higher SNR should be given higher weights, whereas branches with lower SNR
are given lower weights. The output is a weighted sum of all branches,
each weighted according to its SNR. It is the optimal technique because
it maximizes the output SNR. The combiner's output can be written as:
$y = \sum_{i=1}^{M} w_i r_i$
The combiner chooses each weight $w_i$ to be the conjugate of the channel gain
on that branch, so in this technique the channel must be estimated first (Figure 8.22).
At a given time, a signal $S_0$ is sent from the transmitter. The channel,
including the effects of the transmit chain, the air link, and the receive
chain, may be modeled by a complex multiplicative distortion composed
of a magnitude response and a phase response. The channel between
the transmit antenna and receive antenna zero is denoted by $h_0$, and
between the transmit antenna and receive antenna one by $h_1$, where
$h_0 = \alpha_0 e^{j\theta_0}$, $h_1 = \alpha_1 e^{j\theta_1}$   (1)
Noise and interference are added at the two receivers. The resulting
received baseband signals are
$r_0 = h_0 S_0 + n_0$, $r_1 = h_1 S_0 + n_1$   (2)
where $n_0$ and $n_1$ represent complex noise and interference. Assuming
$n_0$ and $n_1$ are Gaussian distributed, the maximum likelihood decision
rule at the receiver for these received signals is to choose signal $S_i$ if
and only if
$d^2(r_0, h_0 S_i) + d^2(r_1, h_1 S_i) \le d^2(r_0, h_0 S_k) + d^2(r_1, h_1 S_k), \quad \forall i \ne k$   (3)
where $d^2(x, y)$ is the squared Euclidean distance between signals $x$ and
$y$, calculated by the following expression:
$d^2(x, y) = (x - y)(x^* - y^*)$   (4)
Figure 8.22: Maximal Ratio Combining
The two incoming signals $r_0$ and $r_1$ are combined in order to
benefit from the multipath. Using maximal-ratio receive combining (MRRC), as
described before, the receiver combining scheme for two branches is as follows:
$\tilde{S}_0 = h_0^* r_0 + h_1^* r_1 = (\alpha_0^2 + \alpha_1^2) S_0 + h_0^* n_0 + h_1^* n_1$   (5)
Expanding (3) and using (4) and (5), we get: choose $S_i$ as the detected
symbol if and only if
$(\alpha_0^2 + \alpha_1^2 - 1)|S_i|^2 + d^2(\tilde{S}_0, S_i) \le (\alpha_0^2 + \alpha_1^2 - 1)|S_k|^2 + d^2(\tilde{S}_0, S_k), \quad \forall i \ne k$   (6)
For PSK modulation such as QPSK, all the constellation points have the
same energy:
$|S_i|^2 = |S_k|^2 = E_s, \quad \forall i, k$
where $E_s$ is the energy of the signal. Therefore, for PSK signals, the
decision rule in (6) may be simplified to: choose $S_i$ if and only if
$d^2(\tilde{S}_0, S_i) \le d^2(\tilde{S}_0, S_k), \quad \forall i \ne k$
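The combining techniques above can be compared in a few lines of code. The following is a minimal sketch for a single QPSK symbol received on two branches, assuming the channel gains are known exactly. All function names and numerical values are illustrative, and each combiner normalises its output so that it can be compared directly with the constellation points (an extra step not stated in the text, which does not change a PSK decision).

```python
import cmath

def selection_combine(r, h):
    """Use only the branch with the largest channel magnitude (largest SNR for equal noise)."""
    best = max(range(len(h)), key=lambda i: abs(h[i]))
    return r[best] / h[best]

def equal_gain_combine(r, h):
    """Co-phase every branch, add with equal weights, then normalise."""
    total = sum(ri * cmath.exp(-1j * cmath.phase(hi)) for ri, hi in zip(r, h))
    return total / sum(abs(hi) for hi in h)

def maximal_ratio_combine(r, h):
    """Weight each branch by the conjugate channel gain, then normalise."""
    total = sum(hi.conjugate() * ri for ri, hi in zip(r, h))
    return total / sum(abs(hi) ** 2 for hi in h)

def detect_psk(estimate, constellation):
    """Minimum-distance decision, valid for equal-energy (PSK) constellations."""
    return min(constellation, key=lambda s: abs(estimate - s))

qpsk = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]
h = [0.9 * cmath.exp(0.7j), 0.3 * cmath.exp(-2.1j)]   # assumed channel gains
s0 = qpsk[2]                                          # transmitted symbol
n = [0.05 + 0.02j, -0.03 + 0.04j]                     # assumed noise samples
r = [hi * s0 + ni for hi, ni in zip(h, n)]            # received signals

for combine in (selection_combine, equal_gain_combine, maximal_ratio_combine):
    print(combine.__name__, detect_psk(combine(r, h), qpsk) == s0)
```

With noise this small all three combiners recover the symbol; the difference between them only shows up in error-rate simulations over many fading realisations.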
Detection
After combining the received signals, the receiver must detect the
transmitted symbols, whether a single antenna or multiple antennas are
used at the transmitter. There are two main types of detectors:
1. Maximum A Posteriori (MAP): This is the optimum detector. It evaluates
all the possible transmitted values and chooses the one with the
highest probability given the received signal.
Example: for BPSK,
$\hat{S} = 1$ if $P(S_i = 1 \mid Y_i) > P(S_i = -1 \mid Y_i)$, otherwise $\hat{S} = -1$
where $S_i$ is the transmitted signal (1 or -1) at time instant $i$,
$Y_i$ is the received signal and $\hat{S}$ is the estimated output
of the MAP estimator. If the probability that the
transmitted symbol is 1 given the received signal is larger than the
probability that the transmitted symbol is -1 given the received signal,
then the estimated output is 1, and vice versa. From the chain rule,
$P(S \mid Y) P(Y) = P(Y \mid S) P(S)$
where $P(S_i)$ is the prior, that is, the probability of the transmitted
symbol, e.g. $P(S_i = 1)$, which is difficult for the receiver to obtain.
2. Maximum Likelihood Detector (MLD): This is based on the same idea as
MAP. The only difference is that it neglects the priors, because they are
difficult to obtain and take a long time to estimate.
In the case of AWGN, this amounts to choosing the symbol closest to the
received signal:
$\hat{S} = \arg\min_{S} |Y_i - S|^2$
So far the effect of the channel has not been included. Once the channel
is included, the detection equation changes slightly: the detector
compares $Y_i$ with $S_i h_i$ rather than with $S_i$, so channel
estimation must be carried out first.
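The difference between the two detectors can be illustrated with a short sketch for real-valued BPSK in AWGN. The Gaussian likelihood, the noise variance of 1 and the example priors are assumptions made for illustration; `h` stands for an already-estimated real channel gain.

```python
import math

def likelihood(y, s, h=1.0, noise_var=1.0):
    """Gaussian likelihood P(y | s) for a real AWGN channel y = h*s + n."""
    return math.exp(-(y - h * s) ** 2 / (2 * noise_var)) / math.sqrt(2 * math.pi * noise_var)

def ml_detect(y, h=1.0, noise_var=1.0):
    """Maximum likelihood: ignore the priors."""
    return max((1, -1), key=lambda s: likelihood(y, s, h, noise_var))

def map_detect(y, prior_plus, h=1.0, noise_var=1.0):
    """Maximum a posteriori: weight each likelihood by the prior P(s)."""
    priors = {1: prior_plus, -1: 1.0 - prior_plus}
    return max((1, -1), key=lambda s: likelihood(y, s, h, noise_var) * priors[s])

y = -0.2
print(ml_detect(y))         # the closer symbol wins
print(map_detect(y, 0.5))   # equal priors: same decision as ML
print(map_detect(y, 0.9))   # a strong prior for +1 can overturn the decision
```

When both symbols are equally likely the two detectors agree, which is why ML detection is normally used in practice.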
Transmit Diversity
Introduction: Here, we present space-time block codes and evaluate their
performance on MIMO fading channels. We first introduce the Alamouti
code, which is a simple two-branch transmit diversity scheme. The key
feature of the scheme is that it achieves a full diversity gain with a simple
maximum-likelihood decoding algorithm. We also present space-time block codes
with a large number of transmit antennas based on orthogonal designs. The
decoding algorithms for space-time block codes with both real and complex
signal constellations are discussed (Figure 8.23). The performance of the schemes
on MIMO fading channels under various channel conditions is evaluated by
simulations.
Figure 8.23: Transmit Diversity
Space-Time Codes: Space-time codes (STCs) provide a new paradigm for
transmission over Rayleigh fading channels using multiple transmit
antennas. They are a method employed to improve the reliability of data
transmission in wireless communication systems. STCs rely on transmitting
multiple, redundant copies of a data stream
to the receiver in the hope that at least some of them may survive the
physical path between transmission and reception in a good enough state to
allow reliable decoding. In other words, they turn multipath propagation into a
benefit for the user. There are two types of STCs:
1. Space-Time Trellis Codes: Space-time trellis codes (STTCs) combine
signal processing at the receiver with coding techniques appropriate to
multiple transmit antennas, and provide both coding gain and diversity gain.
Specific space-time trellis codes designed for
two to four transmit antennas perform extremely well in slow fading
environments (typical of indoor transmission) and come within 2-3 dB of
the outage capacity. The bandwidth efficiency is about three to four times
that of current systems.
2. Space-Time Block Codes: Space-time coding is a general term used to
indicate multi-antenna transmission schemes where modulation symbols
are mapped in the time and spatial (transmit-antenna) domain to
capture the diversity offered by the multiple transmit antennas. Two-antenna
space-time block coding (STBC), more specifically a scheme referred
to as Space-Time Transmit Diversity (STTD), has been part of the 3G
WCDMA standard since its first release (Figure 8.24).
Figure 8.24: Space-Time Block
STTD operates on pairs of modulation symbols. The modulation symbols
are directly transmitted on the first antenna. However, on the
second antenna the order of the modulation symbols within a pair is
reversed. Furthermore, the modulation symbols are sign-reversed and
complex-conjugated. In vector notation, STTD transmission can be
expressed as:
$\begin{bmatrix} r_{2n} \\ r_{2n+1}^* \end{bmatrix} = \begin{bmatrix} h_1 & -h_2 \\ h_2^* & h_1^* \end{bmatrix} \begin{bmatrix} s_{2n} \\ s_{2n+1}^* \end{bmatrix}$
where $r_{2n}$ and $r_{2n+1}$ are the symbols received during symbol intervals
$2n$ and $2n+1$, and $h_1$ and $h_2$ are the channels from the first and second
transmit antennas.
The two-antenna space-time coding can be said to be of rate one,
implying that the input symbol rate is the same as the symbol rate at
each antenna, corresponding to a bandwidth utilization of 1.
Space-time coding can also be extended to more than two antennas.
However, in the case of complex-valued modulation, such as QPSK or 16/64QAM,
space-time codes of rate one without any inter-symbol interference
(orthogonal space-time codes) only exist for two antennas. If inter-symbol
interference is to be avoided in the case of more than two antennas,
space-time codes with rate less than one must be used, corresponding
to reduced bandwidth utilization. Space-time block codes (STBCs)
act on a block of data at once (similarly to linear block codes) and
provide only diversity gain, but are much less complex in implementation
terms than STTCs. The space-time codes provide the best possible
tradeoff between constellation size, data rate, diversity advantage, and trellis
complexity. We will focus on this type in our study.
Space-Frequency Block Codes: Space-frequency block coding (SFBC) is
similar to space-time block coding, with the difference that the encoding
is carried out in the antenna/frequency domains rather than in the
antenna/time domains. Thus, space-frequency coding is applicable to
OFDM and other frequency-domain transmission schemes. The
space-frequency equivalent of STTD (which could also be referred to as
Space-Frequency Transmit Diversity, SFTD) is illustrated in Figure 8.25.
As can be seen, the block of (frequency-domain) modulation symbols
a0, a1, a2, a3, ... is directly mapped to OFDM carriers of the first
antenna, while the block of symbols -a1*, a0*, -a3*, a2*, ... is mapped to the
corresponding subcarriers of the second antenna.
Similar to space-time coding, the drawback of space-frequency coding is
that there is no straightforward extension to more than two antennas
unless a rate reduction is acceptable.
The difference between SFBC and two-antenna cyclic-delay diversity (CDD) in
essence lies in how the block of frequency-domain modulation symbols is mapped
to the second antenna. The benefit of SFBC compared to CDD is that
SFBC provides diversity at modulation-symbol level while CDD, in
the case of OFDM, must rely on channel coding in combination with
frequency-domain interleaving to provide diversity (Figure 8.26).
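The SFBC mapping described above (a0, a1, ... on the first antenna and -a1*, a0*, ... on the second) can be sketched as follows. The function name and the sample QPSK-like values are illustrative; the list index stands for the subcarrier index.

```python
def sfbc_map(symbols):
    """Map pairs (a0, a1) to two antennas: antenna 1 sends a0, a1; antenna 2 sends -a1*, a0*."""
    if len(symbols) % 2:
        raise ValueError("need an even number of symbols")
    ant1, ant2 = [], []
    for k in range(0, len(symbols), 2):
        a0, a1 = complex(symbols[k]), complex(symbols[k + 1])
        ant1 += [a0, a1]
        ant2 += [-a1.conjugate(), a0.conjugate()]
    return ant1, ant2

a = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
ant1, ant2 = sfbc_map(a)
print("antenna 1:", ant1)
print("antenna 2:", ant2)
```

Within each pair of subcarriers the two antenna sequences are orthogonal, which is what allows the receiver to separate the symbols with linear processing.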
Figure 8.25: Space-Frequency Block
Figure 8.26: Transmit Diversity Principle
System Block Diagram: For real constellations, STBCs provide the maximum
possible transmission rate allowed by the theory of space-time coding.
For complex constellations, space-time block codes
can be constructed for any number of transmit antennas, and again
these codes have remarkably simple decoding algorithms based only
on linear processing at the receiver. They provide full spatial
diversity and half of the maximum possible transmission rate allowed by the
theory of space-time coding. Alamouti discovered a remarkable scheme
for transmission using two transmit antennas (Figure 8.27). Space-time
block coding generalizes the transmission scheme discovered by
Alamouti to an arbitrary number of transmit antennas and is able to achieve
the full diversity promised by the transmit and receive antennas.
Figure 8.27
Figure 8.28
Alamouti method (delay diversity method):
(a) Closed Loop Transmit Diversity: Here, the transmitter sends two
copies of the signal in the expected way, but it also applies a phase
shift to one or both signals before transmission. By doing this, it
can ensure that the two signals reach the receiver in phase, without
any risk of destructive interference. The phase shift is determined
by a precoding matrix indicator (PMI), which is calculated by the
receiver and fed back to the transmitter. A simple PMI might
indicate two options: either transmit both signals without any phase
shifts, or transmit the second
with a phase shift of 180°. If the first option leads to destructive
interference, then the second will automatically work. Once again,
the amplitude of the combined signal is only low in the unlikely
event that the two received signals undergo fades at the same time.
The phase shifts introduced by the radio channel depend on the
wavelength of the carrier signal and hence on its frequency. This
implies that the best choice of PMI is a function of frequency as
well. However, this is easily handled in an OFDMA system, as the
receiver can feed back different PMI values for different sets of
subcarriers. The best choice of PMI also depends on the position of
the mobile, so a fast moving mobile will have a PMI that frequently
changes. Unfortunately the feedback loop introduces time delays
into the system, so in the case of fast moving mobiles, the PMI
may be out of date by the time it is used (Figure 8.28). For this
reason, closed loop transmit diversity is only suitable for mobiles that
are moving sufficiently slowly. For fast moving mobiles, it is better
to use the open loop technique described below.
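The two-option PMI described above can be illustrated with a small sketch. It assumes the receiver knows both channel gains and simply picks the phase shift (0 or 180 degrees) for the second antenna that gives the larger combined amplitude. The function name and channel values are illustrative, and this is not the LTE codebook itself.

```python
import cmath

def choose_pmi(h1, h2, candidates=(0.0, cmath.pi)):
    """Return the candidate phase shift (radians) for the second antenna that
    maximises the magnitude of the combined signal h1 + h2 * exp(j*phase)."""
    return max(candidates, key=lambda ph: abs(h1 + h2 * cmath.exp(1j * ph)))

# Two channels that are almost exactly out of phase with each other.
h1 = 1.0 * cmath.exp(0.3j)
h2 = 0.8 * cmath.exp(1j * (0.3 + cmath.pi))

best = choose_pmi(h1, h2)
print("chosen phase shift (rad):", best)
print("amplitude without shift:", abs(h1 + h2))
print("amplitude with chosen shift:", abs(h1 + h2 * cmath.exp(1j * best)))
```

In this example the unshifted signals nearly cancel, so the receiver would report the 180 degree option, after which the two copies add constructively.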
(b) Open Loop Transmit Diversity: Open loop transmit diversity is
known as Alamouti's technique.
The Alamouti scheme is historically the first space-time block code
to provide full transmit diversity for systems with two transmit
antennas. In this section, we present Alamouti's transmit diversity
technique, including encoding and decoding algorithms and its
performance.
A) Two-Branch Transmit Diversity with one receiver: Figure 8.29
shows the baseband representation of the Alamouti scheme
with one receiver. In the Alamouti scheme an encoded sequence is transmitted,
unlike MRRC, where the message is sent directly. The
encoding is done in space and time (space-time coding). The encoding,
however, may also be done in space and frequency. The
scheme uses two transmit antennas and one receive antenna and may
be defined by the following three functions:
The encoding and transmission sequence of information symbols at the transmitter.
The combining scheme at the receiver.
The decision rule for maximum likelihood detection.
Figure 8.29: Two-Branch Transmit Diversity
Let us assume that an M-ary modulation scheme is used. In the Alamouti
space-time encoder, each group of m information bits is first modulated,
where $m = \log_2 M$. Then, the encoder takes a block of two modulated
symbols $S_0$ and $S_1$ in each encoding operation and maps them
to the transmit antennas according to a code matrix given by
$\mathbf{X} = \begin{bmatrix} S_0 & -S_1^* \\ S_1 & S_0^* \end{bmatrix}$
where each row corresponds to one transmit antenna and each column to one
symbol period.
Here, the transmitter uses two antennas to send the two symbols
in two successive time steps. In the first step, the transmitter
sends $S_0$ from the first antenna and $S_1$ from the second, while in the
second step, it sends $-S_1^*$ from the first antenna and $S_0^*$ from the
second. (The symbol * indicates that the transmitter should change the
sign of the quadrature component, in a process known as complex
conjugation.) It is clear that the encoding is done in both the space and
time domains. Let us denote the transmit sequences from antennas one
and two by $\mathbf{S}^1 = [S_0, -S_1^*]$ and $\mathbf{S}^2 = [S_1, S_0^*]$, respectively.
The key feature of the Alamouti scheme is that the transmit sequences
from the two transmit antennas are orthogonal, since the inner product
of the sequences $\mathbf{S}^1$ and $\mathbf{S}^2$ is zero, i.e.
$\mathbf{S}^1 \cdot \mathbf{S}^2 = S_0 S_1^* - S_1^* S_0 = 0$
Now the encoded symbols are transmitted. The fading channel coefficients
from the first and second transmit antennas to the receive antenna at
time t are denoted by $h_0(t)$ and $h_1(t)$, respectively (Figure 8.30).
Assuming that the fading coefficients are constant across two consecutive
symbol transmission periods, they can be expressed as follows:
$h_0(t) = h_0(t+T) = h_0 = |h_0| e^{j\theta_0}$
$h_1(t) = h_1(t+T) = h_1 = |h_1| e^{j\theta_1}$
where T is the symbol duration.
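The receiver can now make two successive measurements of the received
signal, which correspond to two different combinations of $S_0$ and $S_1$. It
can then solve the resulting equations, so as to recover the two
transmitted symbols. There are only two requirements: the fading patterns
must stay roughly the same between the first time step and the second,
and the two signals must not undergo fades at the same time. Both
requirements are usually met.
Figure 8.30
At the receive antenna, the received signals over two consecutive symbol
periods, denoted by $r_0$ and $r_1$ for time t and t+T, respectively,
can be expressed as
$r_0 = h_0 S_0 + h_1 S_1 + n_0$
$r_1 = -h_0 S_1^* + h_1 S_0^* + n_1$
where $n_0$ and $n_1$ are independent complex variables with zero mean
and power spectral density $N_0/2$ per dimension, representing additive
white Gaussian noise samples at time t and t+T, respectively.
Note that $S_0$ and $S_1$ cannot be read directly from either received
signal, because each one contains both symbols. They can, however, be
separated by simple linear combining:
$\tilde{S}_0 = h_0^* r_0 + h_1 r_1^* = (|h_0|^2 + |h_1|^2) S_0 + h_0^* n_0 + h_1 n_1^*$
$\tilde{S}_1 = h_1^* r_0 - h_0 r_1^* = (|h_0|^2 + |h_1|^2) S_1 - h_0 n_1^* + h_1^* n_0$
The maximum likelihood decoder chooses the pair $(\hat{S}_0, \hat{S}_1)$ that minimizes
$d^2(r_0, h_0 \hat{S}_0 + h_1 \hat{S}_1) + d^2(r_1, -h_0 \hat{S}_1^* + h_1 \hat{S}_0^*)$   (7)
Substituting the two combined signals, the maximum likelihood decoding
can be represented as minimizing
$(|h_0|^2 + |h_1|^2 - 1)(|\hat{S}_0|^2 + |\hat{S}_1|^2) + d^2(\tilde{S}_0, \hat{S}_0) + d^2(\tilde{S}_1, \hat{S}_1)$
Thus, the maximum likelihood decoding rule (7) can be separated
into two independent decoding rules for $S_0$ and $S_1$, given by
$\hat{S}_0 = \arg\min_{\hat{S}_0} \left[ (|h_0|^2 + |h_1|^2 - 1)|\hat{S}_0|^2 + d^2(\tilde{S}_0, \hat{S}_0) \right]$
$\hat{S}_1 = \arg\min_{\hat{S}_1} \left[ (|h_0|^2 + |h_1|^2 - 1)|\hat{S}_1|^2 + d^2(\tilde{S}_1, \hat{S}_1) \right]$   (10)
For PSK constellations all symbols have equal energy, so the first term is
the same for every candidate. Therefore, the decision rules in (10) can be
further simplified to:
$\hat{S}_0 = \arg\min_{\hat{S}_0} d^2(\tilde{S}_0, \hat{S}_0)$, $\hat{S}_1 = \arg\min_{\hat{S}_1} d^2(\tilde{S}_1, \hat{S}_1)$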
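The three functions of the two-branch scheme (encoding, combining and decision) can be put together in a short sketch. It assumes QPSK symbols, a channel that stays constant over the two symbol periods and perfect channel knowledge at the receiver. The function names and numerical values are illustrative and are not taken from the project's implementation.

```python
import cmath

def alamouti_encode(s0, s1):
    """Return the sequences sent by antenna 0 and antenna 1 over two symbol periods."""
    return [s0, -s1.conjugate()], [s1, s0.conjugate()]

def alamouti_receive(tx0, tx1, h0, h1, n0=0j, n1=0j):
    """Received samples r0 (time t) and r1 (time t+T) for a constant channel."""
    r0 = h0 * tx0[0] + h1 * tx1[0] + n0
    r1 = h0 * tx0[1] + h1 * tx1[1] + n1
    return r0, r1

def alamouti_combine(r0, r1, h0, h1):
    """Linear combining that separates the two transmitted symbols."""
    s0_tilde = h0.conjugate() * r0 + h1 * r1.conjugate()
    s1_tilde = h1.conjugate() * r0 - h0 * r1.conjugate()
    return s0_tilde, s1_tilde

def detect_psk(estimate, constellation):
    """Minimum-distance decision for an equal-energy constellation."""
    return min(constellation, key=lambda s: abs(estimate - s))

qpsk = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]
h0, h1 = 0.8 * cmath.exp(1.1j), 0.5 * cmath.exp(-0.4j)   # assumed channel gains
s0, s1 = qpsk[0], qpsk[3]

tx0, tx1 = alamouti_encode(s0, s1)
r0, r1 = alamouti_receive(tx0, tx1, h0, h1, 0.03 - 0.01j, -0.02 + 0.02j)
t0, t1 = alamouti_combine(r0, r1, h0, h1)
gain = abs(h0) ** 2 + abs(h1) ** 2   # combined symbols are scaled by this factor
print(detect_psk(t0 / gain, qpsk) == s0, detect_psk(t1 / gain, qpsk) == s1)
```

Dividing by `gain` is only for readability; for PSK the decision is the same with or without it, in line with the simplified decision rule.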
B) Two-Branch transmit diversity with M receivers: There
may be applications where a higher order of diversity is needed and
multiple receive antennas at the remote units are feasible. In such cases,
it is possible to provide a diversity order of 2M with two transmit and
M receive antennas (Figure 8.31).
Figure 8.31: Two-Branch transmit diversity
The received signals at the two receive antennas:
There is no equivalent to Alamouti's technique for systems with more
than two antennas. Despite this, some extra diversity gain can still be
achieved in four antenna systems, by swapping back and forth between
the two constituent antenna pairs. This technique is used for four
antenna open loop diversity in LTE. We can combine open and closed
loop transmit diversity with the receive diversity techniques from
earlier, giving a system that carries out diversity processing using
multiple antennas at both the transmitter and the receiver. The technique is
different from the spatial multiplexing techniques that we will describe
next, although, as we will see, a spatial multiplexing system can fall
back to diversity transmission and reception if the conditions require.
Summary of Alamouti's scheme:
(a) Assumptions:
We have perfect channel knowledge at the receiver.
Uncorrelated data streams (flat fading).
(b) Advantages:
The transmissions are orthogonal, which lets the receiver separate the two streams by linear processing.
Simple maximum likelihood decoding algorithm based on linear processing of the received signals.
Open-loop transmit diversity scheme (no feedback from RX to TX, i.e. no need for channel information at the transmitter).
No bandwidth expansion (as redundancy is applied in space across multiple antennas, not in time or frequency).
Low-complexity decoders.
Identical to MRC if the total radiated power is doubled from that used in MRC.
(c) Disadvantages:
No coding gain, unlike space-time trellis codes.
Complexity of maximum likelihood detectors rises exponentially with the number of transmit antennas.
Spatial interference.
8.3 Spatial multiplexing
8.3.1 Principles of Operation
Spatial multiplexing has a different purpose from diversity processing. If
the transmitter and receiver both have multiple antennas, then we can set
up multiple parallel data streams between them, to increase the data rate.
In a system with NT transmit and NR receive antennas, often known as an
NT × NR spatial multiplexing system, the peak data rate is proportional
to min(NT, NR). Figure 8.32 shows a basic spatial multiplexing system, in
which the transmitter and receiver both have two antennas. In the
transmitter, the antenna mapper takes symbols from the modulator two at a
time, and sends one symbol to each antenna. The antennas transmit the
two symbols simultaneously, so as to double the transmitted data rate.
Figure 8.32
The symbols travel to the receive antennas by way of four separate radio paths,
so the received signals can be written as follows:
y1 = H11x1 + H12x2 + n1
y2 = H21x1 + H22x2 + n2
Here, x1 and x2 are the signals sent from the two transmit antennas, y1
and y2 are the signals that arrive at the two receive antennas, and n1 and
n2 represent the received noise and interference. Hij expresses the way in
which the transmitted symbols are attenuated and phase-shifted as they
travel to receive antenna i from transmit antenna j. (The subscripts i and
j may look the wrong way round, but this is for consistency with the usual
mathematical notation for matrices.) In general, all the terms in the
equations above are complex. In the transmitted and received symbols xj and yi
and the noise terms ni, the real and imaginary parts are the amplitudes of
the in-phase and quadrature components. Similarly, in each of the channel
elements Hij, the magnitude represents the attenuation of the radio signal,
while the phase represents the phase shift.
8.3.2 V-BLAST
Recent information theory research has shown that the rich-scattering
wireless channel is capable of enormous theoretical capacities if the multipath is
properly exploited.
Introduction
The diagonally-layered space-time architecture proposed by Foschini, now
known as diagonal BLAST (Bell Laboratories Layered Space-Time) or
D-BLAST, is one such approach. D-BLAST utilizes multi-element antenna
arrays at both transmitter and receiver and an elegant diagonally
layered coding structure in which code blocks are dispersed across
diagonals in space-time. In an independent Rayleigh scattering environment,
this processing structure leads to theoretical rates which grow linearly with
the number of antennas (assuming equal numbers of transmit and receive
antennas), with these rates approaching 90% of Shannon capacity.
However, the diagonal approach suffers from certain implementation
complexities which make it inappropriate for initial implementation.
System overview:
Operation
A single data stream is demultiplexed into M substreams. Each substream is then encoded
into symbols and fed to its respective transmitter. The transmitters operate co-channel, and
their symbols are synchronized. All use the same QAM constellation. The transmitted substreams
are independent, so V-BLAST is not transmit diversity. Transmissions are organized into bursts
of L symbols. Receivers 1 to N are individually conventional QAM receivers. These receivers
also operate co-channel, each receiving the signals radiated from all M transmit antennas.
Basic Idea: Treat each substream in turn as the desired signal and the rest as interferers, and then
use techniques similar to those of adaptive antenna arrays (AAA) to detect each. Nulling is
performed by linearly weighting the received signals so as to satisfy some performance-related
criterion, such as minimum mean-squared error (MMSE) or zero-forcing (ZF).
Zero forcing:
Successive interference cancellation: A superior technique is to use successive
interference cancellation together with zero-forcing nulling. Interference from already-detected
components of the transmitted symbol vector a is subtracted out from the received signal
vector, resulting in a modified received vector in which effectively fewer interferers are
present (Figure 8.33).
Figure 8.33: Demodulation/decoding of spatially multiplexed signals based on successive
interference cancellation
Note: when symbol cancellation is used, the system performance is affected by the order
in which the components of a are detected, whereas the order does not matter when pure nulling
is used.
Detection algorithm:
Simulation:
We used BPSK modulation.
Flat fading (Rayleigh multipath channel).
Figure 8.34: 2 × 2 MIMO channel
In a 2 × 2 MIMO channel (Figure 8.34), the two available transmit antennas
can be used as follows:
1. Consider that we have a transmission sequence, for example x1, x2, x3, ...
2. In normal transmission, we send x1 in the first time slot, x2 in the second
time slot, and so on.
3. However, as we now have 2 transmit antennas, we may group the symbols into groups
of two. In the first time slot, send x1 and x2 from the first and second antenna. In the
second time slot, send x3 and x4 from the first and second antenna, send x5 and x6 in
the third time slot, and so on.
4. Notice that, as we are grouping two symbols and sending them in one time slot, we
need only half as many time slots to complete the transmission, so the data rate is doubled.
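System Model: The received signal on the first receive antenna is
$y_1 = h_{1,1} x_1 + h_{1,2} x_2 + n_1$
The received signal on the second receive antenna is
$y_2 = h_{2,1} x_1 + h_{2,2} x_2 + n_2$
where:
$y_1$, $y_2$ are the received symbols on the first and second antenna respectively.
$h_{1,1}$ is the channel from the 1st transmit antenna to the 1st receive antenna.
$h_{1,2}$ is the channel from the 2nd transmit antenna to the 1st receive antenna.
$h_{2,1}$ is the channel from the 1st transmit antenna to the 2nd receive antenna.
$h_{2,2}$ is the channel from the 2nd transmit antenna to the 2nd receive antenna.
$x_1$, $x_2$ are the transmitted symbols and $n_1$, $n_2$ are the noise on the receive antennas.
For convenience, the above equations can be represented in matrix notation as follows:
$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \end{bmatrix}$
Equivalently,
$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$
To solve for x, we need a matrix W that satisfies WH = I. The zero-forcing (ZF)
linear detector meeting this constraint is given by:
$\mathbf{W} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H$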
To do the Successive Interference Cancellation (SIC), the receiver needs to
perform the following:
Using successive interference cancellation: In classical successive interference
cancellation, the receiver arbitrarily takes one of the estimated symbols and subtracts its
effect from the received symbols $y_1$ and $y_2$. However, we can be more intelligent in choosing
whether to subtract the effect of $x_1$ first or $x_2$ first. To make that decision, let us
find out which transmitted symbol (after multiplication with the channel) arrived with higher
power at the receiver. The received power at both antennas corresponding to the
transmitted symbol $x_1$ is
$P_{x_1} = |h_{1,1}|^2 + |h_{2,1}|^2$
The received power at both antennas corresponding to the transmitted symbol $x_2$ is
$P_{x_2} = |h_{1,2}|^2 + |h_{2,2}|^2$
The symbol received with higher power is detected and subtracted first. If, for
example, the effect of $x_2$ is removed first, the remaining received vector can be written as
$\mathbf{r} = \mathbf{h} x_1 + \mathbf{n}$
where $\mathbf{h} = [h_{1,1}, h_{2,1}]^T$. The equalized symbol is
$\hat{x}_1 = \frac{\mathbf{h}^H \mathbf{r}}{\mathbf{h}^H \mathbf{h}}$
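The ZF-SIC steps above can be sketched for the 2 × 2 BPSK case as follows. This is an illustrative sketch rather than the simulation code used in the project. It assumes a known, invertible 2 × 2 channel (for which the ZF matrix reduces to the inverse of H), and the channel and noise values are made up for the example.

```python
def zf_estimate(H, y):
    """Zero-forcing estimate for a square 2x2 channel, where (H^H H)^-1 H^H equals H^-1."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [(d * y[0] - b * y[1]) / det, (a * y[1] - c * y[0]) / det]

def slice_bpsk(v):
    """Hard decision for BPSK symbols +1 / -1."""
    return 1.0 if v.real >= 0 else -1.0

def zf_sic_detect(H, y):
    """ZF first, cancel the stream received with higher power, then MRC on the other."""
    x_zf = zf_estimate(H, y)
    power = [abs(H[0][j]) ** 2 + abs(H[1][j]) ** 2 for j in range(2)]
    first = 0 if power[0] >= power[1] else 1
    other = 1 - first
    x_hat = [0.0, 0.0]
    x_hat[first] = slice_bpsk(x_zf[first])
    # Subtract the detected stream from both receive antennas.
    r = [y[i] - H[i][first] * x_hat[first] for i in range(2)]
    h = [H[i][other] for i in range(2)]
    # Maximal ratio combining of what is left: h^H r / h^H h.
    num = sum(h[i].conjugate() * r[i] for i in range(2))
    den = sum(abs(h[i]) ** 2 for i in range(2))
    x_hat[other] = slice_bpsk(num / den)
    return x_hat

H = [[0.9 + 0.2j, 0.3 - 0.5j], [-0.4 + 0.1j, 0.7 + 0.6j]]   # assumed channel
x = [1.0, -1.0]                                             # transmitted BPSK symbols
n = [0.02 - 0.01j, -0.01 + 0.03j]                           # assumed noise samples
y = [H[i][0] * x[0] + H[i][1] * x[1] + n[i] for i in range(2)]
print(zf_sic_detect(H, y))
```

A BER comparison between ZF and ZF-SIC would repeat this over many random channels and noise samples at each SNR.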
BER curve of ZF-SIC and ZF:
8.3.3 Spatial multiplexing types:
1. Closed-loop spatial multiplexing: In the closed-loop spatial multiplexing mode, the
eNodeB applies spatial-domain precoding to the transmitted signal, taking into
account the precoding matrix indicator (PMI) reported by the UE, so that the
transmitted signal matches the spatial channel experienced by the UE. To support
closed-loop spatial multiplexing in the downlink, the UE needs to feed back the
rank indicator (RI), the PMI, and the channel quality indicator (CQI) in the uplink.
2. Open-loop spatial multiplexing: This mode is used when reliable PMI feedback is not
available at the eNodeB. In open-loop spatial multiplexing, the feedback consists of
the RI and the CQI.
A transmission diversity scheme is used for rank-1 open-loop transmissions. However,
for rank greater than one, the open-loop transmission scheme uses large-delay CDD
along with a fixed precoder matrix for the two-antenna-ports (P = 2) case, while
precoder cycling is used for the four-antenna-ports (P = 4) case. The fixed precoder used
for the case of two antenna ports is the identity matrix. Therefore, the precoder for
data resource element index i, denoted by W(i), is simply given as:
8.4 Downlink MIMO modes in LTE
Different downlink MIMO modes are envisaged in LTE, which can be
adjusted according to channel conditions, traffic requirements, and UE
capability. The following transmission modes are possible in LTE:
Single-antenna transmission, no MIMO.
Transmit diversity.
Open-loop spatial multiplexing, no UE feedback required.
Closed-loop spatial multiplexing, UE feedback required.
Multi-user MIMO (more than one UE is assigned to the same resource block).
Closed-loop precoding for rank = 1 (i.e. no spatial multiplexing, but precoding is used).
Beamforming.
Figure 8.35
Downlink MIMO transmission chain
four-Tx transmission diversity respectively. We note that the term layer,
which generally refers to a stream in MIMO spatial multiplexing, can be
confusing when used in the context of transmission diversity. In
transmission diversity, a single codeword is transmitted, which is effectively a
single-rank transmission. After layer mapping, transmission diversity
precoding, which is effectively an SFBC block code for 2-Tx antennas and a
balanced SFBC-FSTD code for 4-Tx antennas, is applied. The signals after
transmission diversity precoding are mapped to time-frequency resources
on two or four antennas for the SFBC and balanced SFBC-FSTD cases, and
OFDM signal generation by use of the IFFT takes place, as shown in Figure 8.35.
In the following sections, we will only discuss the layer mapping and precoding
parts that are relevant to the transmit diversity discussion.
Codeword to layer mapping: In the case of transmit diversity transmission, a
single codeword is transmitted from two or four antenna ports. The
number of layers in the case of transmit diversity is equal to the number of
antenna ports. The number of modulation symbols per layer $M_{symb}^{layer}$ for 2 and
4 layers is given by:
$M_{symb}^{layer} = M_{symb}^{(0)} / 2$ for two layers, and $M_{symb}^{layer} = M_{symb}^{(0)} / 4$ for four layers
where $M_{symb}^{(0)}$ represents the total number of modulation symbols within
the codeword. In the case of two antenna ports, the modulation symbols
from a single codeword are mapped to 2 layers ($\upsilon = 2$) as below:
$x^{(0)}(i) = d^{(0)}(2i)$
$x^{(1)}(i) = d^{(0)}(2i+1)$
In the case of four antenna ports, the modulation symbols from a single
codeword are mapped to 4 layers ($\upsilon = 4$) as below:
$x^{(0)}(i) = d^{(0)}(4i)$
$x^{(1)}(i) = d^{(0)}(4i+1)$
$x^{(2)}(i) = d^{(0)}(4i+2)$
$x^{(3)}(i) = d^{(0)}(4i+3)$
The codeword to layer mapping for two and four antenna ports transmit
diversity (TxD) transmissions in the downlink is shown in Figure 8.35. In
the case of two antenna ports (two layers), the even-numbered
($d^{(0)}(0), d^{(0)}(2), \ldots$) and odd-numbered ($d^{(0)}(1), d^{(0)}(3), \ldots$)
codeword modulation symbols are mapped
to layers 0 and 1 respectively. In the case of four antenna ports, 1/4 of the
codeword modulation symbols are mapped to a given layer, as given by the
previous equation.
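The layer mapping for transmit diversity amounts to dealing the codeword symbols out to the layers in turn. A minimal sketch, assuming the number of symbols is a multiple of the number of layers (the function name is illustrative, and integers stand in for the complex symbols d(0), d(1), ...):

```python
def layer_map_tx_diversity(d, num_layers):
    """Split one codeword's symbols d(0), d(1), ... across 2 or 4 layers, round-robin."""
    if num_layers not in (2, 4):
        raise ValueError("transmit diversity uses 2 or 4 layers")
    if len(d) % num_layers:
        raise ValueError("sketch assumes the symbol count is a multiple of the layer count")
    return [d[layer::num_layers] for layer in range(num_layers)]

d = list(range(8))  # stand-ins for d(0) ... d(7)
print(layer_map_tx_diversity(d, 2))
print(layer_map_tx_diversity(d, 4))
```

With two layers, layer 0 receives the even-numbered symbols and layer 1 the odd-numbered ones; with four layers each layer receives every fourth symbol.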
Transmit diversity precoding: The block of vectors at the output of the layer
mapper, $x(i) = [x^{(0)}(i) \ldots x^{(\upsilon-1)}(i)]^T$, is provided as input to the
precoding stage. The precoding stage then generates another block of vectors
$y(i) = [y^{(0)}(i) \ldots y^{(P-1)}(i)]^T$, as shown in Figure 8.37.
Figure 8.36
Figure 8.37
This block of vectors is then mapped onto resources on each of the antenna
ports. The symbols at the output of precoding for antenna port p, $y^{(p)}(i)$,
are given as follows.
For the case of two antenna ports transmit diversity, the output of the
precoding operation is written as:
$\begin{bmatrix} y^{(0)}(2i) \\ y^{(1)}(2i) \\ y^{(0)}(2i+1) \\ y^{(1)}(2i+1) \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & j & 0 \\ 0 & -1 & 0 & j \\ 0 & 1 & 0 & j \\ 1 & 0 & -j & 0 \end{bmatrix} \begin{bmatrix} x_I^{(0)}(i) \\ x_I^{(1)}(i) \\ x_Q^{(0)}(i) \\ x_Q^{(1)}(i) \end{bmatrix}$
where $x_I^{(0)}(i)$ and $x_Q^{(0)}(i)$ are the real and imaginary parts of the modulation symbol
on layer 0, and $x_I^{(1)}(i)$ and $x_Q^{(1)}(i)$ are the real and imaginary parts of the modulation
symbol on layer 1.
We note that the number of modulation symbols for mapping to resource
elements is two times the number of modulation symbols per layer, that is
$M_{symb}^{map} = 2 M_{symb}^{layer}$.
The transmit diversity precoding and RE mapping for two antenna ports is
shown in Figure 8.38. We note that the precoding and RE mapping
operations result in a space-frequency block coding (SFBC) scheme.
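Written out per symbol, two-port transmit diversity precoding sends x0(i) and x1(i) on port 0 and -x1*(i) and x0*(i) on port 1, all scaled by 1/sqrt(2). The sketch below follows that form; the function name and the sample symbols are illustrative.

```python
import math

def txd_precode_two_ports(x0, x1):
    """SFBC precoding of layers x0 and x1 onto antenna ports 0 and 1."""
    scale = 1 / math.sqrt(2)
    y0, y1 = [], []
    for a, b in zip(x0, x1):
        a, b = complex(a), complex(b)
        y0 += [scale * a, scale * b]                            # port 0: x0(i), x1(i)
        y1 += [-scale * b.conjugate(), scale * a.conjugate()]   # port 1: -x1*(i), x0*(i)
    return y0, y1

x0 = [1 + 1j, -1 + 1j]   # layer 0 symbols
x1 = [1 - 1j, -1 - 1j]   # layer 1 symbols
y0, y1 = txd_precode_two_ports(x0, x1)
print(len(x0), "symbols per layer ->", len(y0), "symbols per antenna port")
```

Each layer symbol produces two output symbols per port, which is the factor of two between the number of mapped symbols and the number of symbols per layer.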
So
219
Figure 8.38: Transmit diversity precoding and RE mapping for two antenna ports
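To make the SFBC structure concrete, the following sketch builds the two antenna-port outputs from two layers. It assumes the two-port transmit diversity mapping of 3GPP TS 36.211, in which port 0 carries $x^{(0)}, x^{(1)}$ and port 1 carries $-x^{(1)*}, x^{(0)*}$ on each pair of adjacent subcarriers, all scaled by $1/\sqrt{2}$. The function name `sfbc_precode` is hypothetical.

```python
import math


def sfbc_precode(x0, x1):
    """x0, x1: equal-length lists of complex symbols on layers 0 and 1.

    Returns (port0, port1); each port carries twice as many symbols as one layer.
    """
    s = 1 / math.sqrt(2)                       # power normalisation across two ports
    port0, port1 = [], []
    for a, b in zip(x0, x1):
        port0 += [s * a, s * b]                # subcarriers 2i and 2i + 1 on port 0
        port1 += [-s * b.conjugate(), s * a.conjugate()]   # same subcarriers on port 1
    return port0, port1


p0, p1 = sfbc_precode([1 + 1j, -1 + 1j], [1 - 1j, -1 - 1j])
```

Over each subcarrier pair the two port sequences are orthogonal, which is the property that lets a receiver separate the two symbols with simple linear combining.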
We note that, for four antenna ports, the number of modulation symbols for mapping to resource elements is four times the number of modulation symbols per layer, that is, $M_{\text{symb}}^{\text{map}} = 4 M_{\text{symb}}^{\text{layer}}$. The transmit diversity precoding and RE mapping for four antenna ports are shown in Figure 8.39.
Figure 8.39
We note that the four antenna ports precoding and RE mapping operations result in a balanced SFBC-FSTD scheme, as is also illustrated by an alternative representation below:
In spatial multiplexing
The LTE system supports transmission of a maximum of two codewords in the downlink. Each codeword is separately coded using turbo coding and the coded bits from each codeword are scrambled separately. The complex-valued modulation symbols for each of the codewords to be transmitted are mapped onto one or multiple layers. The complex-valued modulation symbols $d^{(q)}(0), \ldots, d^{(q)}(M_{\text{symb}}^{(q)} - 1)$ for codeword $q$ are mapped onto the layers. A rank-1 transmission can happen for the case of one, two or four antenna ports, while for rank-2 transmission the number of antenna ports needs to be at least 2. In the case of rank-1 transmission, the complex-valued modulation symbols $d^{(q)}(0), \ldots, d^{(q)}(M_{\text{symb}}^{(q)} - 1)$ from a single codeword ($q = 0$) are mapped to a single layer (layer index $\upsilon = 0$). Also, the number of modulation symbols per layer $M_{\text{symb}}^{\text{layer}}$ is equal to the number of modulation symbols per codeword $M_{\text{symb}}^{(0)}$. It can be noted that for rank-1 transmission the layer mapping operation is transparent, with codeword modulation symbols simply mapped to a single layer. In the case of rank-2 transmissions, which can happen for both two and four antenna ports, the modulation symbols from the two codewords ($q = 0, 1$) are mapped to 2 layers (layer indices $\upsilon = 0, 1$) as below:
\[ x^{(0)}(i) = d^{(0)}(i), \qquad x^{(1)}(i) = d^{(1)}(i) \]
We note that for rank-2 transmission, the codeword to layer mapping is an MCW scheme with two codewords mapped to two layers separately, as in the above figure.
MIMO precoding
It is well known that the performance of a MIMO system can be improved with channel knowledge at the transmitter. The channel knowledge at the transmitter does not help to improve the degrees of freedom, but power or beam-forming gain is possible. In a TDD system, the channel knowledge can be obtained at the eNB from uplink transmissions thanks to channel reciprocity. However, sounding signals need to be transmitted on the uplink, which represents an additional overhead. In an FDD system, the channel state information needs to be fed back from the UE to the eNB. Complete channel state feedback can lead to excessive feedback overhead. For example, in a $4 \times 4$ MIMO channel, a total of 16 complex channel gains, from each of the transmission antennas to each of the receive antennas, need to be signaled. An approach to reduce the channel state information feedback overhead is to use a codebook, as illustrated in Figure 8.40. In a closed-loop MIMO precoding system, for each transmission antenna configuration, we can construct a set of precoding matrices and let this set be known at both the eNB and the UE.
Figure 8.40: Illustration of feedback-based MIMO precoding
8.4.1 Precoding for two antenna ports
A Fourier matrix is an $N \times N$ square matrix with entries given by:
A $2 \times 2$ ($N = 2$) Fourier matrix can be expressed as:
We can, for example, define a set of four $2 \times 2$ Fourier matrices by taking $G = 4$. These four $2 \times 2$ matrices with $g = 0, 1, 2, 3$ are given as below:
The LTE codebook for two antenna ports consists of four precoders for rank-1 and three precoders for rank-2, as given in the next table:
The precoding operation is $y(i) = W(i)\,x(i)$, where $W(i)$ is the precoding matrix of size $P \times \upsilon$, $P$ is the number of antenna ports and $\upsilon$ ($\upsilon \le P$) is the number of layers transmitted. An example of rank-2 precoding for two and four antenna ports transmissions is shown in . We assumed the precoders The symbols at the output of precoding are given as:
where $x^{(0)}(i)$ and $x^{(1)}(i)$ represent modulation symbols from codewords 1 and 2 respectively.
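The set of Fourier matrices referred to above can be generated numerically. The sketch below assumes a commonly used definition in which entry $(m, n)$ equals $e^{j 2\pi \frac{m}{N}(n + g/G)}$, with $g$ selecting one of $G$ shifted matrices; this formula and the function name `fourier_matrix` are assumptions on our part and should be checked against the codebook table before being relied upon.

```python
import cmath


def fourier_matrix(n, g=0, big_g=1):
    """n x n matrix with entries exp(j*2*pi*(m/n)*(k + g/big_g)) (assumed definition)."""
    return [[cmath.exp(2j * cmath.pi * m / n * (k + g / big_g)) for k in range(n)]
            for m in range(n)]


base = fourier_matrix(2)                                 # numerically [[1, 1], [1, -1]]
shifted = [fourier_matrix(2, g, 4) for g in range(4)]    # four matrices for G = 4
```

Under this assumption the columns of the $g = 0$ and $g = 2$ matrices are $[1, 1]^T$, $[1, -1]^T$, $[1, j]^T$ and $[1, -j]^T$, which gives four candidate rank-1 precoders once scaled for unit power.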
8.4.2 CDD-based precoding
The LTE system also supports a composite precoding by introducing a cyclic delay diversity (CDD) precoder on top of the precoders described before. There are two types of CDD precoding:
1. small-delay CDD.
2. large-delay CDD.
The goal of small-delay CDD precoding is to introduce artificial frequency selectivity for opportunistic scheduling gains with low feedback overhead, while large-delay CDD achieves diversity by making sure that each MIMO codeword is transmitted on all the available MIMO layers. Both the small-delay and large-delay CDD schemes were incorporated in the LTE standard. However, small-delay CDD was removed from the specification at the later stages because the scheduling gains promised were small, particularly when feedback-based precoding can be employed for closed-loop MIMO operation.
Small-delay CDD precoding:
The goal of small-delay CDD precoding is to provide multi-user scheduling gains by exploiting the artificially introduced frequency selectivity. For small-delay cyclic delay diversity (CDD), the precoding is a composite of the CDD-based precoding defined by matrix $D(i)$ and the precoding matrix $W(i)$, as given by the relationship below:
where $W(i)$ is the precoding matrix of size $P \times \upsilon$, $P$ is the number of antenna ports, $\upsilon$ ($\upsilon \le P$) is the number of layers transmitted and $D(i)$ is a diagonal matrix for support of cyclic delay diversity. In the case of two antenna ports, the CDD diagonal matrix $D(i)$ is given as:
Large-delay CDD precoding:
For large-delay cyclic delay diversity (CDD), the precoding is a composite of the CDD-based precoding defined by matrix $D(i)$ and the precoding matrix $W(i)$, as given by the relationship below:
where $W(i)$ is the precoding matrix of size $P \times \upsilon$, $P$ is the number of antenna ports, $\upsilon$ ($\upsilon \le P$) is the number of layers transmitted, $D(i)$ is a diagonal matrix whose dimension is set by the number of layers transmitted, and $i$ represents the modulation symbol index within each of the layers, with
In the case of two layers, the large-delay CDD diagonal matrix $D(i)$ and the fixed DFT matrix $U$ are given as:
The CDD diagonal matrix $D(i)$ for odd and even $i$ is written as:
Chapter 9
Orthogonal Frequency Division
Multiplexing (OFDM)
9.1 Introduction
In general, multicarrier schemes subdivide the used channel bandwidth into a number of parallel subchannels, as shown in Figure 9.1(a). Ideally, the bandwidth of each subchannel is such that each is non-frequency-selective (i.e. has a spectrally flat gain); this has the advantage that the receiver can easily compensate for the subchannel gains individually in the frequency domain.
Orthogonal Frequency Division Multiplexing (OFDM) is a special case of multicarrier transmission where the non-frequency-selective narrowband subchannels, into which the frequency-selective wideband channel is divided, are overlapping but orthogonal, as shown in Figure 9.1(b). This avoids the need to separate the carriers by means of guard bands, and therefore makes OFDM highly spectrally efficient. The spacing between the subchannels in OFDM is such that they can be perfectly separated at the receiver. This allows for a low-complexity receiver implementation, which makes OFDM attractive for high-rate mobile data transmission such as the LTE downlink.
It is worth noting that the advantage of separating the transmission into multiple narrowband subchannels cannot by itself translate into robustness against time-variant channels if no channel coding is employed. The LTE downlink combines OFDM with channel coding and Hybrid Automatic Repeat reQuest (HARQ) to overcome the deep fading which may be encountered on the individual subchannels.
Figure 9.1: Spectral efficiency of OFDM compared to classical multicarrier modulation: (a) classical multicarrier system spectrum; (b) OFDM system spectrum.
9.2 OFDM
9.2.1 Why OFDM
Transmission by means of OFDM can be seen as a kind of multi-carrier transmission. The basic characteristics of OFDM transmission, which distinguish it from a straightforward multi-carrier extension of a more narrowband transmission scheme as outlined in Figure 9.2, are:
• The use of a relatively large number of narrowband subcarriers. In contrast, a straightforward multi-carrier extension as outlined in Figure 9.2 would typically consist of only a few subcarriers, each with a relatively wide bandwidth. As an example, a WCDMA multi-carrier evolution to a 20 MHz overall transmission bandwidth could consist of four (sub)carriers, each with a bandwidth in the order of 5 MHz. In comparison, OFDM transmission may imply that several hundred subcarriers are transmitted over the same radio link to the same receiver.
• Simple rectangular pulse shaping, as illustrated in Figure 9.3a. This corresponds to a sinc-square-shaped per-subcarrier spectrum, as illustrated in Figure 9.3b.
• Tight frequency-domain packing of the subcarriers with a subcarrier spacing $\Delta f = 1/T_u$, where $T_u$ is the per-subcarrier modulation-symbol time (see Figure 9.4). The subcarrier spacing is thus equal to the per-subcarrier modulation rate $1/T_u$.
An illustrative description of a basic OFDM modulator is provided in Figure 9.4. It consists of a bank of $N_c$ complex modulators, where each modulator corresponds to one OFDM subcarrier.
Figure 9.2: Extension to wider transmission bandwidth by means of multi-carrier transmission.
Figure 9.3: Per-subcarrier pulse shape and spectrum for basic OFDM transmission.
Figure 9.4: OFDM subcarrier spacing.
In complex baseband notation, a basic OFDM signal $x(t)$ during the time interval $mT_u \le t < (m+1)T_u$ can thus be expressed as
\[ x(t) = \sum_{k=0}^{N_c-1} x_k(t) = \sum_{k=0}^{N_c-1} a_k^{(m)} e^{j 2\pi k \Delta f t} \tag{9.1} \]
where $x_k(t)$ is the $k$th modulated subcarrier with frequency $f_k = k \Delta f$ and $a_k^{(m)}$ is the, in general complex, modulation symbol applied to the $k$th subcarrier during the $m$th OFDM symbol interval, i.e. during the time interval $mT_u \le t < (m+1)T_u$. OFDM transmission is thus block based, implying that, during each OFDM symbol interval, $N_c$ modulation symbols are transmitted in parallel. The modulation symbols can be from any modulation alphabet, such as QPSK, 16QAM, or 64QAM.
The number of OFDM subcarriers can range from less than one hundred to several thousand, with the subcarrier spacing ranging from several hundred kHz down to a few kHz. What subcarrier spacing to use depends on what types of environments the system is to operate in, including such aspects as the maximum expected radio-channel frequency selectivity (maximum expected time dispersion) and the maximum expected rate of channel variations (maximum expected Doppler spread). Once the subcarrier spacing has been selected, the number of subcarriers can be decided based on the assumed overall transmission bandwidth, taking into account acceptable out-of-band emission, etc.
As an example, for 3GPP LTE the basic subcarrier spacing equals 15 kHz. The number of subcarriers, on the other hand, depends on the transmission bandwidth, with on the order of 600 subcarriers in case of operation in a 10 MHz spectrum allocation and correspondingly fewer/more subcarriers in case of smaller/larger overall transmission bandwidths.
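The LTE numbers quoted above can be checked with a short calculation. The sketch uses only the figures given in the text (15 kHz spacing, about 600 subcarriers in a 10 MHz allocation); the variable names are ours.

```python
SUBCARRIER_SPACING_HZ = 15_000      # LTE basic subcarrier spacing (from the text)
N_SUBCARRIERS = 600                 # approximate count for a 10 MHz allocation (from the text)
ALLOCATION_HZ = 10_000_000

symbol_time_s = 1 / SUBCARRIER_SPACING_HZ              # T_u = 1 / delta_f
occupied_hz = N_SUBCARRIERS * SUBCARRIER_SPACING_HZ    # bandwidth actually filled
guard_fraction = 1 - occupied_hz / ALLOCATION_HZ       # share left unused at the band edges

print(f"T_u = {symbol_time_s * 1e6:.1f} us")
print(f"occupied = {occupied_hz / 1e6:.1f} MHz, unused = {guard_fraction:.0%}")
```

The symbol time comes out at roughly 66.7 µs, and the 600 subcarriers occupy 9 MHz, leaving about a tenth of the allocation to limit out-of-band emission.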
9.2.2 Orthogonal Multiplexing Principle
Signals are orthogonal if they are mutually independent of each other. Orthogonality is a property that allows multiple information signals to be transmitted over a common channel and detected without interference. Mathematically, two functions are orthogonal if the product of one with the complex conjugate of the other, integrated over a certain interval, gives zero. We note that although the subcarriers overlap in both time and frequency, we can separate them thanks to their orthogonality: for $k_1 \ne k_2$,
\[ \int_{mT_u}^{(m+1)T_u} x_{k_1}(t)\, x_{k_2}^{*}(t)\, dt = \int_{mT_u}^{(m+1)T_u} a_{k_1} a_{k_2}^{*}\, e^{j 2\pi k_1 \Delta f t}\, e^{-j 2\pi k_2 \Delta f t}\, dt = 0 \tag{9.2} \]
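The orthogonality condition can be verified numerically. The following sketch approximates the integral in Equation (9.2) by a Riemann sum over one symbol interval, with both modulation symbols set to 1; the function name, the 15 kHz spacing and the step count are illustrative choices.

```python
import cmath


def subcarrier_inner_product(k1, k2, delta_f=15e3, steps=2000):
    """Riemann-sum approximation of the integral of x_k1(t) * conj(x_k2(t)) over one T_u."""
    t_u = 1 / delta_f
    dt = t_u / steps
    total = 0j
    for n in range(steps):
        t = n * dt
        # Product of subcarrier k1 and the conjugate of subcarrier k2 at time t.
        total += cmath.exp(2j * cmath.pi * (k1 - k2) * delta_f * t) * dt
    return total


same = subcarrier_inner_product(3, 3)    # equals the subcarrier energy, about T_u
cross = subcarrier_inner_product(3, 5)   # vanishes for different subcarriers
```

The result is non-zero only when the two indices coincide, because the difference frequency $(k_1 - k_2)\Delta f$ completes an integer number of periods within $T_u$.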
A high-rate data stream typically faces the problem of having a symbol period $T_s$ much smaller than the channel delay spread $T_d$ if it is transmitted serially. This generates Inter-Symbol Interference (ISI), which can only be undone by means of a complex equalization procedure. In general, the equalization complexity grows with the square of the channel impulse response length. In OFDM, the high-rate stream of data symbols is first Serial-to-Parallel (S/P) converted for modulation onto $M$ parallel subcarriers, as shown in Figure 9.5. This increases the symbol duration on each subcarrier by a factor of approximately $M$, such that it becomes significantly longer than the channel delay spread. This operation has the important advantage of requiring a much less complex equalization procedure in the receiver, under the assumption that the time-varying channel impulse response remains substantially constant during the transmission of each modulated OFDM symbol. Figure 9.6 shows how the resulting long symbol duration is virtually unaffected by ISI compared to the short symbol duration, which is highly corrupted.
Figure 9.5: Serial-to-Parallel (S/P) conversion operation for OFDM.
Figure 9.7 shows the typical block diagram of an OFDM system. The signal to be transmitted is defined in the frequency domain.
An S/P converter collects serial data symbols into a data block $S[k] = [S_0[k], S_1[k], \ldots, S_{M-1}[k]]^T$ of dimension $M$, where $k$ is the index of an OFDM symbol (spanning the $M$ subcarriers). The $M$ parallel data streams are first independently modulated, resulting in the complex vector $X[k] = [X_0[k], X_1[k], \ldots, X_{M-1}[k]]^T$. Note that in principle it is possible to use different modulations (e.g. QPSK or 16QAM) on each subcarrier; due to channel frequency selectivity, the channel gain may differ between subcarriers, and thus some subcarriers can carry higher data rates than others. The vector $X[k]$ is then used as input to an $N$-point Inverse FFT (IFFT), resulting in a set of $N$ complex time-domain samples $x[k] = [x_0[k], \ldots, x_{N-1}[k]]^T$. In a practical OFDM system, the number of processed subcarriers is greater than the number of modulated subcarriers (i.e. $N \ge M$), with the unmodulated subcarriers being padded with zeros.
Figure 9.6: Effect of channel on signals with short and long symbol duration.
The next key operation in the generation of an OFDM signal is the creation of a guard period at the beginning of each OFDM symbol $x[k]$ by adding a Cyclic Prefix (CP), to eliminate the remaining impact of ISI caused by multipath propagation. The CP is generated by duplicating the last $G$ samples of the IFFT output and placing them at the beginning of $x[k]$. This yields the time-domain OFDM symbol $[x_{N-G}[k], \ldots, x_{N-1}[k], x_0[k], \ldots, x_{N-1}[k]]^T$, as shown in Figure 9.8.
To avoid ISI completely, the CP length $G$ must be chosen to be longer than the longest channel impulse response to be supported. The CP converts the linear (i.e. aperiodic) convolution of the channel into a circular (i.e. periodic) one, which is suitable for DFT processing. The insertion of the CP into the OFDM symbol and its implications are explained more formally later in this section.
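The transmitter steps described so far (zero padding from $M$ to $N$ subcarriers, the inverse DFT and cyclic-prefix insertion) can be sketched as follows. This is a minimal illustration, not the project's FPGA implementation: the inverse DFT is written out directly rather than as an FFT, and the unmodulated subcarriers are simply placed at the end of the block, whereas a real LTE mapping arranges them around the band edges.

```python
import cmath


def idft(X):
    """Inverse DFT with 1/sqrt(N) scaling, written out directly (O(N^2), not an FFT)."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n ** 0.5
            for k in range(n)]


def ofdm_symbol(data, n_fft, n_cp):
    """Zero-pad M data symbols to N subcarriers, apply the IDFT, prepend the CP."""
    if len(data) > n_fft or n_cp > n_fft:
        raise ValueError("need M <= N and G <= N")
    padded = list(data) + [0j] * (n_fft - len(data))   # unmodulated subcarriers set to zero
    x = idft(padded)
    return x[-n_cp:] + x if n_cp else x                # last G samples copied to the front


qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j]   # M = 6 data symbols
tx = ofdm_symbol(qpsk, n_fft=8, n_cp=2)                        # N = 8, G = 2
```

The transmitted block has $N + G$ samples, and its first $G$ samples repeat its last $G$ samples.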
The output of the IFFT is then Parallel-to-Serial (P/S) converted for transmission through the frequency-selective channel. At the receiver, the reverse operations are performed to demodulate the OFDM signal. Assuming that time and frequency synchronization is achieved, a number of samples corresponding to the length of the CP are removed, such that only an ISI-free block of samples is passed to the DFT. If the number of subcarriers $N$ is designed to be a power of 2, a highly efficient FFT implementation may be used to transform the signal back to the frequency domain. Among the $N$ parallel streams output from the FFT, the modulated subset of $M$ subcarriers is selected and further processed by the receiver.
Figure 9.7: OFDM system model: (a) transmitter; (b) receiver.
Figure 9.8: OFDM Cyclic Prefix (CP) insertion.
Let $x(t)$ be the signal transmitted at time instant $t$. The received signal in a multipath environment is then given by
\[ r(t) = x(t) * h(t) + z(t) \tag{9.3} \]
where $h(t)$ is the continuous-time impulse response of the channel, $*$ represents the convolution operation and $z(t)$ is the additive noise. Assuming that $x(t)$ is band-limited to $\left[-\frac{1}{2T_s}, \frac{1}{2T_s}\right]$, the continuous-time signal $x(t)$ can be sampled with sampling period $T_s$ such that the Nyquist criterion is satisfied. As a result of the multipath propagation, several replicas of the transmitted signal arrive at the receiver at different delays.
9.2.3 OFDM advantages and disadvantages
OFDM advantages
• OFDM is an efficient way to deal with multipath effects.
• Bandwidth efficiency is high since it uses overlapping orthogonal subcarriers.
• It is possible to enhance capacity significantly by adapting the data rate per subcarrier according to the SNR of that particular subcarrier.
OFDM disadvantages
• Inter-carrier interference (ICI) due to phase noise and carrier frequency offset, which destroy the orthogonality.
• Inter-symbol interference (ISI) due to channel delays and dispersion.
• High Peak-to-Average Power Ratio (PAPR).
9.2.4 Peak-to-Average Power Ratio and Sensitivity to Non-Linearity
While the previous section shows the advantages of OFDM, this section highlights its major drawback: the Peak-to-Average Power Ratio (PAPR).
In the general case, the OFDM transmitter can be seen as a linear transform performed over a large block of independent identically distributed (i.i.d.) QAM-modulated complex symbols (in the frequency domain). From the central limit theorem, the time-domain OFDM symbol may be approximated as a Gaussian waveform. The amplitude variations of the OFDM modulated signal can therefore be very high. However, practical Power Amplifiers (PAs) of RF transmitters are linear only within a limited dynamic range. Thus, the OFDM signal is likely to suffer from non-linear distortion caused by clipping. This gives rise to out-of-band spurious emissions and in-band corruption of the signal. To avoid such distortion, the PAs have to operate with large power back-offs, leading to inefficient amplification or expensive transmitters.
The PAPR is one measure of the high dynamic range of the input amplitude, and hence a measure of the expected degradation. To analyse the PAPR mathematically, let $x_n$ be the signal after the IFFT, as given by
\[ x_n[k] = \frac{1}{\sqrt{N}} \sum_{m=0}^{N-1} X_m[k] \exp\left( 2\pi j\, m\, \frac{n}{N} \right) \tag{9.4} \]
where the time index $k$ can be dropped without loss of generality. The PAPR of an OFDM symbol is defined as the square of the peak amplitude divided by the mean power, i.e.
\[ \text{PAPR} = \frac{\max_n |x_n|^2}{E\{|x_n|^2\}} \tag{9.5} \]
Under the hypothesis that the Gaussian approximation is valid, the amplitude of $x_n$ has a Rayleigh distribution, while its power has a central chi-square distribution with two degrees of freedom. The Cumulative Distribution Function (CDF) $F_x(\alpha)$ of the normalized power is given by
\[ F_x(\alpha) = \Pr\left( \frac{|x_n|^2}{E\{|x_n|^2\}} < \alpha \right) = 1 - e^{-\alpha} \tag{9.6} \]
If there is no oversampling, the time-domain samples are mutually uncorrelated and the probability that the PAPR is above a certain threshold $\text{PAPR}_0$ is given by
\[ \Pr(\text{PAPR} > \text{PAPR}_0) = 1 - F_x(\text{PAPR}_0)^N = 1 - \left(1 - e^{-\text{PAPR}_0}\right)^N \tag{9.7} \]
Figure 9.9 plots the distribution of the PAPR given by Equation (9.7) for different values of the number of subcarriers $N$. The figure shows that a high PAPR does not occur very often. However, when it does occur, degradation due to PA non-linearities may be expected.
Figure 9.9: PAPR distribution for different numbers of OFDM subcarriers.
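Equation (9.7) is straightforward to evaluate. The sketch below tabulates the probability of exceeding a threshold for a few subcarrier counts; the thresholds in dB and the values of $N$ are arbitrary illustrative choices, and the function name `papr_ccdf` is ours.

```python
import math


def papr_ccdf(papr0_db, n_subcarriers):
    """Pr(PAPR > PAPR0) from Equation (9.7); PAPR0 is given in dB."""
    papr0 = 10 ** (papr0_db / 10)                     # dB to linear power ratio
    return 1 - (1 - math.exp(-papr0)) ** n_subcarriers


for n in (64, 256, 1024):
    print(n, [f"{papr_ccdf(db, n):.2e}" for db in (6, 8, 10, 12)])
```

For a fixed threshold the probability grows with the number of subcarriers, and for a fixed $N$ it falls quickly as the threshold rises, which is consistent with the statement that very high PAPR values are rare.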
9.2.5 PAPR Reduction Techniques
Many techniques have been studied for reducing the PAPR of a transmitted OFDM signal. Although no such techniques are specified for the LTE downlink signal generation, an overview of the possibilities is provided below. In general, in LTE the cost and complexity of generating the OFDM signal with acceptable Error Vector Magnitude (EVM) is left to the eNodeB implementation. As OFDM is not used for the LTE uplink, such considerations do not directly apply to the transmitter in the UE.
Techniques for PAPR reduction of OFDM signals can be broadly categorized into three main concepts:
1. Clipping and filtering:
The time-domain signal is clipped to a predefined level. This causes spectral leakage into adjacent channels, resulting in reduced spectral efficiency, as well as in-band noise degrading the bit error rate performance. Out-of-band radiation caused by the clipping process can, however, be reduced by filtering.
If discrete signals are clipped directly, the resulting clipping noise will all fall in band and thus cannot be reduced by filtering. To avoid this problem, one solution consists of oversampling the original signal by padding the input signal with zeros and processing it using a longer IFFT. The oversampled signal is clipped and then filtered to reduce the out-of-band radiation.
2. Selected mapping:
Multiple transmit signals which represent the same OFDM data symbol are generated by multiplying the OFDM symbol by different phase vectors. The representation with the lowest PAPR is selected. To recover the phase information, it is of course necessary to use separate control signalling to indicate to the receiver which phase vector was used.
3. Coding techniques:
These techniques consist of finding the codewords with the lowest PAPR from a set of codewords to map the input data. A look-up table may be used if $N$ is small. It has been shown that complementary codes have good properties for combining PAPR reduction with forward error correction.
The latter two concepts are not applicable in the context of LTE; selected mapping would require additional signalling, while techniques based on codeword selection are not compatible with the data scrambling used in the LTE downlink.
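As a rough illustration of the first technique, the sketch below clips the magnitude of a short block of time-domain samples while preserving their phase, and compares the PAPR of Equation (9.5) before and after. The sample values and the clipping level are made up for the example, and the subsequent filtering step is deliberately left out.

```python
def papr(samples):
    """Peak power divided by mean power, as in Equation (9.5)."""
    powers = [abs(s) ** 2 for s in samples]
    return max(powers) / (sum(powers) / len(powers))


def clip(samples, max_amplitude):
    """Limit each sample's magnitude to max_amplitude while keeping its phase."""
    return [s if abs(s) <= max_amplitude else s * (max_amplitude / abs(s))
            for s in samples]


x = [0.2 + 0.1j, -0.3j, 2.0 + 1.5j, -0.4 + 0.2j, 0.1 - 0.1j, -0.2 - 0.3j]
clipped = clip(x, max_amplitude=1.0)
```

Only the one large sample is altered; the change it undergoes is the clipping noise discussed above, which a practical transmitter would shape by oversampling and filtering.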
9.2.6 Cyclic Prefix Insertion
As described in Section 9.2.2, an uncorrupted OFDM signal can be demodulated without any interference between subcarriers. One way to understand this subcarrier orthogonality is to recognize that a modulated subcarrier $x_k(t)$ in (9.1) consists of an integer number of periods of complex exponentials during the demodulator integration interval $T_u = 1/\Delta f$.
However, in case of a time-dispersive channel the orthogonality between the subcarriers will, at least partly, be lost. The reason for this loss of subcarrier orthogonality is that, in this case, the demodulator correlation interval for one path will overlap with the symbol boundary of a different path, as illustrated in Figure 9.10. Thus, the integration interval will not necessarily correspond to an integer number of periods of complex exponentials of that path, as the modulation symbols $a_k$ may differ between consecutive symbol intervals. As a consequence, in case of a time-dispersive channel there will not only be inter-symbol interference within a subcarrier but also interference between subcarriers.
Figure 9.10: Time dispersion and corresponding received-signal timing.
Another way to explain the interference between subcarriers in case of a time-dispersive channel is to bear in mind that time dispersion on the radio channel is equivalent to a frequency-selective channel frequency response. Orthogonality between OFDM subcarriers is not simply due to frequency-domain separation but due to the specific frequency-domain structure of each subcarrier. Even if the frequency-domain channel is constant over a bandwidth corresponding to the main lobe of an OFDM subcarrier and only the subcarrier side lobes are corrupted due to the radio-channel frequency selectivity, the orthogonality between subcarriers will be lost, with inter-subcarrier interference as a consequence. Due to the relatively large side lobes of each OFDM subcarrier, already a relatively limited amount of time dispersion or, equivalently, a relatively modest radio-channel frequency selectivity may cause non-negligible interference between subcarriers.
To deal with this problem and to make an OFDM signal truly insensitive to time dispersion on the radio channel, so-called cyclic-prefix insertion is typically used in OFDM transmission. As illustrated in Figure 9.11, cyclic-prefix insertion implies that the last part of the OFDM symbol is copied and inserted at the beginning of the OFDM symbol. Cyclic-prefix insertion thus increases the length of the OFDM symbol from $T_u$ to $T_u + T_{CP}$, where $T_{CP}$ is the length of the cyclic prefix, with a corresponding reduction in the OFDM symbol rate as a consequence. As illustrated in the lower part of Figure 9.11, if the correlation at the receiver side is still only carried out over a time interval $T_u = 1/\Delta f$, subcarrier orthogonality will then be preserved also in the case of a time-dispersive channel, as long as the span of the time dispersion is shorter than the cyclic-prefix length.
Figure 9.11: Cyclic-prefix insertion.
In practice, cyclic-prefix insertion is carried out on the time-discrete output of the transmitter IFFT. Cyclic-prefix insertion then implies that the last $N_{CP}$ samples of the IFFT output block of length $N$ are copied and inserted at the beginning of the block, increasing the block length from $N$ to $N + N_{CP}$. At the receiver side, the corresponding samples are discarded before OFDM demodulation by means of, for example, DFT/FFT processing.
Cyclic-prefix insertion is beneficial in the sense that it makes an OFDM signal insensitive to time dispersion as long as the span of the time dispersion does not exceed the length of the cyclic prefix. The drawback of cyclic-prefix insertion is that only a fraction $T_u/(T_u + T_{CP})$ of the received signal power is actually utilized by the OFDM demodulator, implying a corresponding power loss in the demodulation. In addition to this power loss, cyclic-prefix insertion also implies a corresponding loss in terms of bandwidth, as the OFDM symbol rate is reduced without a corresponding reduction in the overall signal bandwidth.
One way to reduce the relative overhead due to cyclic-prefix insertion is to reduce the subcarrier spacing $\Delta f$, with a corresponding increase in the symbol time $T_u$ as a consequence. However, this will increase the sensitivity of the OFDM transmission to fast channel variations, that is, high Doppler spread, as well as to different types of frequency errors.
It is also important to understand that the cyclic prefix does not necessarily have to cover the entire length of the channel time dispersion. In general, there is a trade-off between the power loss due to the cyclic prefix and the signal corruption (inter-symbol and inter-subcarrier interference) due to residual time dispersion not covered by the cyclic prefix. At a certain point, further reduction of the signal corruption due to a further increase of the cyclic-prefix length will not justify the corresponding additional power loss. This also means that, although the amount of time dispersion typically increases with the cell size, beyond a certain cell size there is often no reason to increase the cyclic prefix further, as the corresponding power loss would have a larger negative impact than the signal corruption due to the residual time dispersion not covered by the cyclic prefix.
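The size of the cyclic-prefix power loss can be estimated from the fraction $T_u/(T_u + T_{CP})$. In the sketch below, $T_u$ follows from the 15 kHz subcarrier spacing, while the prefix length of 4.7 µs is an assumed value, roughly that of the LTE normal cyclic prefix, and is not taken from this chapter.

```python
import math

T_U = 1 / 15e3        # useful symbol time for 15 kHz subcarrier spacing, about 66.7 us
T_CP = 4.7e-6         # ASSUMED cyclic-prefix length, roughly the LTE normal CP

useful_fraction = T_U / (T_U + T_CP)          # share of received power the demodulator uses
power_loss_db = -10 * math.log10(useful_fraction)

# Halving the subcarrier spacing doubles T_u and shrinks the relative overhead.
useful_fraction_half_spacing = (2 * T_U) / (2 * T_U + T_CP)
```

With these numbers a little over 93% of the received power is used, a loss of about 0.3 dB, and the fraction rises when the subcarrier spacing is halved, at the cost of the Doppler sensitivity noted above.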
Circular convolution
• When an input data stream $x[n]$ is sent through a linear time-invariant FIR channel $h[n]$, the output is the linear convolution $y[n] = x[n] * h[n]$.
• If the convolution is a circular convolution, it is possible to take the DFT of the channel output $y[n]$ to get $\text{DFT}\{y[n]\} = \text{DFT}\{x[n] \circledast h[n]\}$, or in the frequency domain: $Y[m] = X[m]\,H[m]$.
• This formula describes an ISI-free channel in the frequency domain, where each input symbol $X[m]$ is simply scaled by a complex value $H[m]$.
• For the convolution to be circular we need to add a cyclic prefix.
• If the maximum channel delay spread has a duration of $v + 1$ samples, then by adding a guard interval of at least $v$ samples between OFDM symbols, each OFDM symbol is made independent of those coming before and after it, and so the ISI between OFDM symbols is avoided.
• The channel output $y$ is decomposed into a simple multiplication of the channel frequency response $H = \text{DFT}\{h\}$ and the channel frequency-domain input $X = \text{DFT}\{x\}$.
• The cyclic prefix is not entirely free. It comes with both a bandwidth and a power penalty.
• Since $v$ redundant symbols are sent, the required bandwidth for OFDM increases from $B$ to $\frac{L+v}{L}B$, where $L$ is the number of data symbols per OFDM symbol.
• An additional $v$ symbols must be counted against the transmit power budget. The use of the cyclic prefix entails data rate and power losses that are both: Rate loss = Power loss = $L/(L+v)$.
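The effect of the cyclic prefix on the convolution can be checked numerically. The sketch below sends one block of $L = 8$ symbols with a prefix of $v = 2$ samples through a channel with $v + 1 = 3$ taps, discards the prefix at the receiver and compares the DFT of the result with $X[m]H[m]$. The sample values and channel taps are arbitrary, and the DFT is written out directly for clarity.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def linear_convolve(x, h):
    """Linear (aperiodic) convolution, as performed by the channel."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


x = [1 + 0j, -1j, -1 + 0j, 1j, 1 + 1j, -1 + 0j, 0.5j, 2 + 0j]   # one block, L = 8
h = [1 + 0j, 0.5 + 0j, 0.25j]                                     # channel with v + 1 = 3 taps
v = len(h) - 1

tx = x[-v:] + x                               # prepend the cyclic prefix
rx = linear_convolve(tx, h)                   # the channel performs a linear convolution
y = rx[v:v + len(x)]                          # receiver drops the prefix, keeps L samples

X, Y = dft(x), dft(y)
H = dft(h + [0j] * (len(x) - len(h)))         # channel response on the same L-point grid
max_error = max(abs(Y[m] - X[m] * H[m]) for m in range(len(x)))
```

The residual `max_error` is at the level of floating-point rounding, confirming that the retained samples equal the circular convolution of the block with the channel.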
9.2.7 Frequency-domain model of OFDM transmission
Assuming a sufficiently large cyclic prefix, the linear convolution of a time-dispersive radio channel will appear as a circular convolution during the demodulator integration interval $T_u$. The combination of OFDM modulation (IFFT processing), a time-dispersive radio channel, and OFDM demodulation (FFT processing) can then be seen as a frequency-domain channel as illustrated in Figure 9.12, where the frequency-domain channel taps $H_0, \ldots, H_{N_c-1}$ can be directly derived from the channel impulse response.
Figure 9.12: Frequency-domain model of OFDM transmission/reception.
The demodulator output $b_k$ in Figure 9.12 is the transmitted modulation symbol $a_k$ scaled and phase rotated by the complex frequency-domain channel tap $H_k$ and impaired by noise $n_k$. To properly recover the transmitted symbol for further processing, for example data demodulation and channel decoding, the receiver should multiply $b_k$ by the complex conjugate of $H_k$, as illustrated in Figure 9.13. This is often expressed as a one-tap equalizer being applied to each received subcarrier.
Figure 9.13: Frequency-domain model of OFDM transmission/reception with one-tap equalization at the receiver.
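A one-tap equalizer amounts to one complex multiplication per subcarrier. The sketch below multiplies each $b_k$ by the conjugate of $H_k$ as described above and, as an additional assumption on our part, divides by $|H_k|^2$ so that the amplitude is restored as well (a zero-forcing normalization). The channel taps are hypothetical and noise is omitted.

```python
def one_tap_equalize(b, h):
    """b: demodulator outputs b_k; h: channel taps H_k (assumed known and non-zero)."""
    # conj(H_k) removes the phase rotation; 1/|H_k|^2 undoes the amplitude scaling.
    return [bk * hk.conjugate() / abs(hk) ** 2 for bk, hk in zip(b, h)]


a = [1 + 1j, -1 + 1j, 1 - 1j]                 # transmitted symbols a_k
H = [0.8 - 0.3j, 0.1 + 0.9j, -0.5 - 0.5j]     # hypothetical channel taps H_k
b = [hk * ak for hk, ak in zip(H, a)]         # noise-free demodulator output
a_hat = one_tap_equalize(b, H)
```

With noise present, dividing by a small $|H_k|^2$ amplifies the noise on that subcarrier, which is one reason the channel coding mentioned earlier is needed alongside OFDM.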
9.2.8 Channel estimation and reference symbols
As described above, to demodulate the transmitted modulation symbol a_k
and allow for proper decoding of the transmitted information at the receiver
side, scaling with the complex conjugate of the frequency-domain channel
tap H_k should be applied after OFDM demodulation (FFT processing) (see
Figure 9.13). To be able to do this, the receiver obviously needs an estimate
of the frequency-domain channel taps H_0, ..., H_{N_c-1}. The frequency-domain
channel taps can be estimated indirectly by first estimating the channel
impulse response and, from that, calculating an estimate of H_k. However,
a more straightforward approach is to estimate the frequency-domain channel
taps directly. This can be done by inserting known reference symbols,
sometimes also referred to as pilot symbols, at regular intervals within the
OFDM time-frequency grid, as illustrated in Figure 9.14. Using knowledge
about the reference symbols, the receiver can estimate the frequency-domain
channel around the location of the reference symbol. The reference
symbols should have a sufficiently high density in both the time and the
frequency domain to be able to provide estimates for the entire time/frequency
grid, even for radio channels subject to high frequency and/or time selectivity.
Different more or less advanced algorithms can be used for channel estimation,
ranging from simple averaging in combination with linear interpolation
to Minimum-Mean-Square-Error (MMSE) estimation relying on more
detailed knowledge of the channel time/frequency-domain characteristics.
Figure 9.14: Time-frequency grid with known reference symbols.
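As a rough illustration of the simplest approach mentioned above, the following Python sketch estimates the channel at the pilot positions by dividing the received value by the known reference symbol, and then linearly interpolates between pilots along the frequency axis. The function name, the pilot layout and the channel values are hypothetical; noise and time-domain interpolation are ignored.

```python
def estimate_channel(rx, pilots, n_sc):
    """Estimate H_k on all n_sc subcarriers of one OFDM symbol.

    rx     : received values b_k, one per subcarrier
    pilots : dict mapping pilot subcarrier index -> known reference symbol
    """
    idx = sorted(pilots)
    # Per-pilot estimate: received value divided by the known symbol.
    at_pilot = {k: rx[k] / pilots[k] for k in idx}
    h_hat = []
    for k in range(n_sc):
        # Nearest pilots below and above k; clamp to the edge pilots.
        lo = max((p for p in idx if p <= k), default=idx[0])
        hi = min((p for p in idx if p >= k), default=idx[-1])
        if lo == hi:
            h_hat.append(at_pilot[lo])
        else:
            w = (k - lo) / (hi - lo)            # linear interpolation weight
            h_hat.append((1 - w) * at_pilot[lo] + w * at_pilot[hi])
    return h_hat


# Hypothetical test case: a channel that varies linearly over 7 subcarriers.
n_sc = 7
h_true = [complex(1 + 0.1 * k, 0.05 * k) for k in range(n_sc)]
pilots = {0: 1 + 0j, 3: 1 + 0j, 6: 1 + 0j}      # assumed pilot positions and values
tx = [pilots.get(k, 1 - 1j) for k in range(n_sc)]
rx = [h_true[k] * tx[k] for k in range(n_sc)]   # noise-free reception

h_hat = estimate_channel(rx, pilots, n_sc)
```

Because the assumed channel is exactly linear in k, the interpolated estimate matches it; on a more frequency-selective channel the error grows with the pilot spacing, which is why the pilot density matters.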
9.3 OFDM as a user-multiplexing and multiple-access scheme
The discussion has, until now, implicitly assumed that all OFDM subcarriers
are transmitted from the same transmitter to a certain receiver, i.e.:
- Downlink transmission of all subcarriers to a single mobile terminal.
- Uplink transmission of all subcarriers from a single mobile terminal.
However, OFDM can also be used as a user-multiplexing or multiple-access scheme,
allowing for simultaneous frequency-separated transmissions to/from multiple
mobile terminals (see Figure 9.15).
Figure 9.15: OFDM as a user-multiplexing/multiple-access scheme : (a) downlink and (b) uplink
In the downlink direction, OFDM as a user-multiplexing scheme implies
that, in each OFDM symbol interval, different subsets of the overall set of
available subcarriers are used for transmission to different mobile terminals
(see Figure 9.15 a).
Similarly, in the uplink direction, OFDM as a user-multiplexing or multiple-access
scheme implies that, in each OFDM symbol interval, different subsets
of the overall set of subcarriers are used for data transmission from different
mobile terminals (see Figure 9.15 b).
Figure 9.15 assumes that consecutive subcarriers are used for transmission to/from the
same mobile terminal. However, distributing the subcarriers to/from a mobile
terminal in the frequency domain is also possible, as illustrated in Figure 9.16.
The benefit of such distributed user multiplexing or distributed
multiple access is a possibility for additional frequency diversity, as each
transmission is spread over a wider bandwidth.
Figure 9.16: Distributed user multiplexing
In the case when OFDMA is used as an uplink multiple-access scheme, i.e.
in case of frequency multiplexing of OFDM signals from multiple mobile
terminals, it is critical that the transmissions from the different mobile
terminals arrive approximately time aligned at the base station. More specifically,
the transmissions from the different mobile terminals should arrive
at the base station with a timing misalignment less than the length of the
cyclic prefix to preserve orthogonality between subcarriers received from
different mobile terminals and thus avoid inter-user interference.
Figure 9.17: Uplink transmission-timing control
Due to the differences in distance to the base station for different mobile
terminals and the corresponding differences in the propagation time (which
may far exceed the length of the cyclic prefix), it is therefore necessary to
control the uplink transmission timing of each mobile terminal (see Figure
9.17). Such transmit-timing control should adjust the transmit timing of
each mobile terminal to ensure that uplink transmissions arrive approximately
time aligned at the base station. As the propagation time changes
as the mobile terminal moves within the cell, the transmit-timing control
should be an active process, continuously adjusting the exact transmit
timing of each mobile terminal.
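To get a feel for the numbers, the sketch below compares the arrival-time difference of two uncorrected uplink transmissions with the cyclic-prefix length. It rests on assumptions not stated in the text: each terminal is taken to time its uplink from the received downlink signal, so the round-trip delay 2d/c is what matters, and the normal cyclic prefix is taken as 144 T_s (about 4.69 microseconds), the LTE value for all but the first symbol of a slot. The function names and distances are illustrative only.

```python
C = 299_792_458.0               # speed of light in m/s
T_S = 1.0 / (15_000 * 2048)     # basic time unit T_s in seconds
CP_NORMAL = 144 * T_S           # assumed normal cyclic-prefix length in seconds


def round_trip_delay(distance_m):
    """Round-trip propagation delay that timing control has to compensate."""
    return 2.0 * distance_m / C


def arrive_aligned(d1_m, d2_m, cp=CP_NORMAL):
    """True if two uncorrected transmissions arrive within one cyclic prefix."""
    return abs(round_trip_delay(d1_m) - round_trip_delay(d2_m)) < cp


near_pair_ok = arrive_aligned(100.0, 500.0)     # terminals 100 m and 500 m away
far_pair_ok = arrive_aligned(100.0, 3000.0)     # terminals 100 m and 3 km away
```

Under these assumptions the first pair stays within the cyclic prefix while the second does not, which is the situation where transmit-timing control becomes necessary.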
Furthermore, even in case of perfect transmit-timing control, there will always
be some interference between subcarriers, e.g. due to frequency errors.
Typically this interference is relatively low in case of reasonable frequency
errors, Doppler spread, etc. However, this assumes that the different
subcarriers are received with at least approximately the same power. In
the uplink, the propagation distance and thus the path loss of the different
mobile-terminal transmissions may differ significantly. If two terminals are
transmitting with the same power, the received-signal strengths may thus
differ significantly, implying a potentially significant interference from the
stronger signal to the weaker signal unless the subcarrier orthogonality is
perfectly retained. To avoid this, at least some degree of uplink transmit-power
control may need to be applied in case of uplink OFDMA, reducing
the transmit power of user terminals close to the base station and ensuring
that all received signals will be of approximately the same power.
9.4 The downlink physical resource
LTE downlink transmission is based on OFDM. The basic LTE downlink
physical resource can thus be seen as a time-frequency resource grid (Fig-
ure 9.18), where each resource element corresponds to one OFDM subcar-
rier during one OFDM symbol interval.
Figure 9.18: The LTE downlink physical resource
For LTE, the OFDM subcarrier spacing has been chosen to be Δf = 15 kHz.
Assuming an FFT-based transmitter/receiver implementation, this corresponds
to a sampling rate f_s = 15 000 · N_FFT, where N_FFT is the FFT
size. The basic time unit T_s defined in the previous section can thus be
seen as the sampling time of an FFT-based transmitter/receiver implementation
with an FFT size equal to 2048.
It is important to understand, though, that the time unit T_s is introduced in
the LTE radio-access specifications purely as a tool to define different time
intervals and does not impose any specific transmitter and/or receiver implementation
constraints (e.g. a certain sampling rate).
In practice, an FFT-based transmitter/receiver implementation with an
FFT size equal to 2048 and a corresponding sampling rate of 30.72 MHz
is suitable for the wider LTE transmission bandwidths, such as bandwidths
in the order of 15 MHz and above. However, for smaller transmission bandwidths,
a smaller FFT size and a correspondingly lower sampling rate can
very well be used. As an example, for transmission bandwidths in the order
of 5 MHz, an FFT size equal to 512 and a corresponding sampling rate of
7.68 MHz may be sufficient.
Assuming a power-of-two FFT size and a subcarrier spacing of 15 kHz, the
sampling rate Δf · N_FFT will be a multiple or submultiple of the WCDMA/HSPA
chip rate (3.84 Mcps). This relation can be utilized when implementing multimode
terminals supporting both WCDMA/HSPA and LTE.
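The relation between FFT size, sampling rate and the WCDMA/HSPA chip rate can be checked with a few lines of Python. This is only an arithmetic sketch; the constant and function names are illustrative, and the list of FFT sizes is an assumed set of power-of-two values.

```python
from fractions import Fraction

DELTA_F_HZ = 15_000         # LTE subcarrier spacing
CHIP_RATE_HZ = 3_840_000    # WCDMA/HSPA chip rate (3.84 Mcps)


def sampling_rate_hz(n_fft):
    """Sampling rate f_s = delta_f * N_FFT of an FFT-based implementation."""
    return DELTA_F_HZ * n_fft


# Ratio of the sampling rate to the chip rate for some power-of-two FFT sizes.
ratios = {n: Fraction(sampling_rate_hz(n), CHIP_RATE_HZ)
          for n in (128, 256, 512, 1024, 2048)}

for n_fft, ratio in ratios.items():
    print(n_fft, sampling_rate_hz(n_fft) / 1e6, "MHz =", ratio, "x chip rate")
```

Each ratio comes out as a power of two (or its reciprocal), which is the multiple/submultiple relation the text refers to.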
In addition to the 15 kHz subcarrier spacing, a reduced subcarrier spacing
Δf_low = 7.5 kHz with twice as long OFDM symbol time is also defined
for LTE. The reduced subcarrier spacing specifically targets MBSFN-based
multicast/broadcast transmissions.
As illustrated in Figure 9.19, in the frequency domain the downlink subcarriers
are grouped into resource blocks, where each resource block consists
of 12 consecutive subcarriers. In addition, there is an unused DC subcarrier
in the center of the downlink band.
The reason why the DC subcarrier is not used for downlink transmission is
that it may be subject to disproportionately high interference, for example
due to local-oscillator leakage.
The LTE physical-layer specification allows for a downlink carrier to consist
of any number of resource blocks, ranging from a minimum of 6 resource
blocks up to a maximum of 110 resource blocks. This corresponds to an
overall downlink transmission bandwidth ranging from roughly 1 MHz up
to in the order of 20 MHz with very fine granularity and thus allows for
a very high degree of LTE bandwidth flexibility, at least from a physical-layer-specification
point of view. However, LTE radio-frequency requirements
are, at least initially, only specified for a limited set of transmission
bandwidths, corresponding to a limited set of possible values for the number
of resource blocks within a carrier.
Figure 9.19: Frequency-domain structure for LTE downlink
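The bandwidth figures quoted above follow directly from the resource-block size. The short Python sketch below reproduces them; it counts only the 12 subcarriers per resource block and deliberately ignores the unused DC subcarrier and any guard bands, and the names are illustrative.

```python
SUBCARRIERS_PER_RB = 12     # subcarriers per resource block
DELTA_F_HZ = 15_000         # subcarrier spacing


def occupied_bandwidth_hz(n_rb):
    """Bandwidth occupied by n_rb resource blocks (DC subcarrier ignored)."""
    return n_rb * SUBCARRIERS_PER_RB * DELTA_F_HZ


min_bw = occupied_bandwidth_hz(6)       # smallest carrier: 6 resource blocks
max_bw = occupied_bandwidth_hz(110)     # largest carrier: 110 resource blocks
step = occupied_bandwidth_hz(1)         # granularity: one resource block
```

This gives 1.08 MHz and 19.8 MHz for the two extremes, in steps of 180 kHz, consistent with "roughly 1 MHz" up to "in the order of 20 MHz".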
Figure 9.20 outlines the more detailed time-domain structure for LTE downlink
transmission. Each 1 ms subframe consists of two equally sized slots of
length T_slot = 0.5 ms (15 360 · T_s). Each slot then consists of a number
of OFDM symbols including cyclic prefix. A subcarrier spacing of 15 kHz
corresponds to a useful symbol time of approximately 66.7 µs. The overall
OFDM symbol time is then the sum of the useful symbol time and the
cyclic-prefix length.
As illustrated in Figure 9.20, LTE defines two cyclic-prefix lengths, the
normal cyclic prefix and an extended cyclic prefix, corresponding to seven
and six OFDM symbols per slot, respectively.
The exact cyclic-prefix lengths, expressed in the basic time unit T_s, are
given in Figure 9.21. It can be noted that, in case of the normal cyclic prefix,
the cyclic-prefix length for the first OFDM symbol of a slot is somewhat
larger than for the remaining OFDM symbols. The reason for
this is simply to fill the entire 0.5 ms slot, as the number of basic time units
T_s per slot (15 360) is not divisible by seven.
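The slot arithmetic can be verified with the following Python sketch. The cyclic-prefix lengths used here (160 T_s for the first symbol and 144 T_s for the other six with the normal cyclic prefix, 512 T_s for each of the six symbols with the extended cyclic prefix) are the LTE values that Figure 9.21 tabulates; they are stated here as an assumption because the figure itself is not reproduced in the text.

```python
N_FFT = 2048            # useful symbol length in units of T_s
TS_PER_SLOT = 15_360    # basic time units per 0.5 ms slot

# Assumed cyclic-prefix lengths per OFDM symbol, in units of T_s.
CP_NORMAL = [160] + [144] * 6
CP_EXTENDED = [512] * 6


def slot_length_ts(cp_lengths):
    """Total slot length: each symbol is its cyclic prefix plus N_FFT samples."""
    return sum(cp + N_FFT for cp in cp_lengths)


def cp_overhead(cp_lengths):
    """Fraction of the slot spent on cyclic prefixes."""
    return sum(cp_lengths) / slot_length_ts(cp_lengths)


normal_len = slot_length_ts(CP_NORMAL)
extended_len = slot_length_ts(CP_EXTENDED)
remainder = TS_PER_SLOT % 7     # non-zero, hence the longer first cyclic prefix
```

Both configurations fill the slot exactly, and the overhead helper shows the cost of the longer prefix: about 6.7 % of the slot for the normal cyclic prefix against 20 % for the extended one.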
Figure 9.20: Detailed time-domain structure for LTE downlink transmission
Figure 9.21
The reasons for defining two cyclic-prefix lengths for LTE are twofold:
- A longer cyclic prefix, although less efficient from a cyclic-prefix-overhead
  point of view, may be beneficial in specific environments with very extensive
  delay spread, for example in very large cells. It is important to
  have in mind, though, that a longer cyclic prefix is not necessarily beneficial
  in case of large cells, even if the delay spread is very extensive in
  such cases. If, in large cells, link performance is limited by noise rather
  than by signal corruption due to residual time dispersion not covered
  by the cyclic prefix, the additional robustness to radio-channel time
  dispersion, due to the use of a longer cyclic prefix, may not justify the
  corresponding loss in terms of reduced received signal energy.
- In case of MBSFN-based multicast/broadcast transmission, the cyclic
  prefix should not only cover the main part of the actual channel time
  dispersion but also the timing difference between the transmissions received
  from the cells involved in the MBSFN transmission. In case of
  MBSFN operation, the extended cyclic prefix is therefore often needed.
  Thus, the main use of the extended cyclic prefix can be expected to be
  MBSFN-based transmission. It should be noted that different cyclic-prefix
  lengths may be used for different subframes within a frame. As
  an example, MBSFN-based multicast/broadcast transmission is typically
  confined to certain subframes, in which case the use of the extended
  cyclic prefix, with its associated additional cyclic-prefix overhead,
  may only be applied to these subframes.
Taking into account also the downlink time-domain structure, the resource
blocks mentioned above consist of 12 subcarriers during a 0.5 ms slot, as
illustrated in Figure 9.22. Each resource block thus consists of 84 resource
elements in case of normal cyclic prefix and 72 resource elements in case of
extended cyclic prefix.
Figure 9.22: Downlink resource block assuming normal cyclic prefix (i.e. 7 OFDM symbols per
slot). With extended cyclic prefix there are six OFDM symbols per slot.
Although resource blocks are defined over one slot, the basic time-domain
unit for dynamic scheduling in LTE is one subframe, consisting of two consecutive
slots. The reason to define the resource blocks over one slot is that
distributed downlink transmission is defined on a slot basis.
The minimum scheduling unit, consisting of two resource blocks within one
subframe (one resource block per slot), is sometimes referred to as a
resource-block pair.
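The resource-element counts given above, and the corresponding counts for a resource-block pair, can be reproduced as follows. This is a small arithmetic sketch with illustrative names; it counts all resource elements and does not subtract those occupied by reference symbols or control signalling.

```python
SUBCARRIERS_PER_RB = 12
SYMBOLS_PER_SLOT = {"normal": 7, "extended": 6}   # OFDM symbols per slot by cyclic-prefix type


def resource_elements_per_rb(cp_type):
    """Resource elements in one resource block (12 subcarriers x one slot)."""
    return SUBCARRIERS_PER_RB * SYMBOLS_PER_SLOT[cp_type]


def resource_elements_per_rb_pair(cp_type):
    """Resource elements in a resource-block pair (two slots, one subframe)."""
    return 2 * resource_elements_per_rb(cp_type)


per_rb = {cp: resource_elements_per_rb(cp) for cp in SYMBOLS_PER_SLOT}
per_pair = {cp: resource_elements_per_rb_pair(cp) for cp in SYMBOLS_PER_SLOT}
```

The per-pair figure is the one that matters for scheduling, since a terminal is always assigned whole resource-block pairs within a subframe.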
Bibliography
[1] Erik Dahlman, Stefan Parkvall, Johan Sköld and Per Beming. 3G Evolution:
HSPA and LTE for Mobile Broadband. First edition. Elsevier Publishers, 2007.
[2] Stefania Sesia, Issam Toufik and Matthew Baker. The UMTS Long Term
Evolution. John Wiley and Sons, Ltd., 2011.
Appendix A
Matlab
A.1 Communications System Toolbox
comm.BPSKModulator: Modulate using BPSK method
comm.BPSKDemodulator: Demodulate using BPSK method
comm.OSTBCEncoder: The OSTBCEncoder object encodes an input symbol
sequence using orthogonal space-time block code (OSTBC). The block
maps the input symbols block-wise and concatenates the output codeword
matrices in the time domain.
comm.OSTBCCombiner: The OSTBCCombiner object combines the input
signal (from all of the receive antennas) and the channel estimate signal to
extract the soft information of the symbols encoded by an OSTBC. The
input channel estimate does not need to be constant and can vary at each
call to the step method. The combining algorithm uses only the estimate
for the first symbol period per codeword block. A symbol demodulator or
decoder would follow the Combiner object in a MIMO communications system.
berfading: Bit error rate (BER) for Rayleigh and Rician fading channels
For all syntaxes, the first input argument, EbNo, is the ratio of bit energy
to noise power spectral density, in dB. If EbNo is a vector, the output
ber is a vector of the same size, whose elements correspond to the different
Eb/N0 levels.
Most syntaxes also have an M input that specifies the alphabet size for the
modulation. M must have the form 2^k for some positive integer k.
berfading uses expressions that assume Gray coding. If you use binary coding,
the results may differ.
For cases where diversity is used, the Eb/N0 on each diversity branch is
EbNo/divorder, where divorder is the diversity order (the number of diversity
branches) and is a positive integer.
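For orientation, the simplest case that such a function covers can be written out by hand. The Python sketch below uses the standard textbook closed form for BPSK over a flat Rayleigh fading channel without diversity, P_b = 0.5 · (1 − sqrt(γ / (1 + γ))), where γ is the average Eb/N0 in linear scale. It is not the MATLAB implementation and the function name is hypothetical; it only shows how an Eb/N0 value in dB maps to a theoretical BER.

```python
import math


def ber_bpsk_rayleigh(ebno_db):
    """Theoretical BER of BPSK over flat Rayleigh fading, diversity order 1."""
    gamma = 10 ** (ebno_db / 10)            # average Eb/N0, dB -> linear
    return 0.5 * (1 - math.sqrt(gamma / (1 + gamma)))


# BER for a vector of Eb/N0 values, one output per input level.
ebno_levels_db = [0, 5, 10, 15, 20]
ber = [ber_bpsk_rayleigh(x) for x in ebno_levels_db]
```

The resulting curve falls only inversely with Eb/N0 at high signal-to-noise ratio, which is the behaviour that motivates the diversity techniques mentioned above.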
comm.TurboEncoder: The Turbo Encoder System object encodes a binary
input signal using a parallel concatenated coding scheme. This coding scheme
uses two identical convolutional encoders and appends the termination bits
at the end of the encoded data bits.
comm.AWGNChannel: The AWGNChannel object adds white Gaussian noise
to a real or complex input signal. When the input uses a real-valued signal,
this object adds real Gaussian noise and produces a real output signal.
When the input uses a complex signal, this object adds complex Gaussian
noise and produces a complex output signal.
comm.TurboDecoder: The Turbo Decoder System object decodes the input
signal using a parallel concatenated decoding scheme that employs the
a posteriori probability (APP) decoder as the constituent decoder. Both
constituent decoders use the same trellis structure and algorithm.
comm.ErrorRate: The ErrorRate object compares input data from a trans-
mitter with input data from a receiver and calculates the error rate as a
running statistic. To obtain the error rate, the object divides the total num-
ber of unequal pairs of data elements by the total number of input data el-
ements from one source.
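The running-statistic behaviour described for the error-rate object can be mimicked with a small Python class. This is a hypothetical stand-in written for illustration, not the MATLAB object: it accumulates the number of unequal pairs and the number of compared elements across calls and returns their ratio.

```python
class ErrorRateCounter:
    """Accumulates errors over successive blocks of transmitted/received data."""

    def __init__(self):
        self.errors = 0     # total number of unequal pairs seen so far
        self.total = 0      # total number of compared elements so far

    def update(self, tx, rx):
        """Compare one block and return (error rate, errors, total)."""
        if len(tx) != len(rx):
            raise ValueError("tx and rx must have the same length")
        self.errors += sum(t != r for t, r in zip(tx, rx))
        self.total += len(tx)
        return self.errors / self.total, self.errors, self.total


counter = ErrorRateCounter()
first = counter.update([0, 1, 1, 0], [0, 1, 0, 0])    # one error in four bits
second = counter.update([1, 1], [1, 1])               # no new errors, six bits in total
```

Keeping the two counters rather than only the rate is what makes the statistic "running": each new block refines the estimate instead of replacing it.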
A.2 Fixed Point Toolbox
fi: Construct fixed-point numeric object
bin: Binary representation of stored integer of fi object
hex: Hexadecimal representation of stored integer of fi object
buildInstrumentedMex: Generate MEX function with logging instrumentation
showInstrumentationResults: Results logged by instrumented MEX function
fiaccel: Accelerate fixed-point code
A.3 Matlab
svd: compute singular value decomposition of symbolic matrix
pinv: Moore-Penrose pseudoinverse of matrix
A.4 HDL Verifier
The HDL Verifier software provides a means for verifying HDL modules
using the HDL Cosimulation System object. You can use the System object
as a test bench or you can use it to represent a component still under
design. You can use the Cosimulation Wizard to create an HDL Cosimulation
System object from existing HDL code or you can create and populate the
System object manually.
A.4.1 Workflow for Using the Cosimulation Wizard to Create a MATLAB System Object
The workflow for creating a System object using existing HDL code for cosimulation
with MATLAB is as follows:
1. Start Cosimulation Wizard.
2. Select HDL Cosimulation type as MATLAB System Object.
3. Select HDL files to use in creating block or function.
4. Specify commands for HDL compilation.
5. Select HDL module for cosimulation.
6. Configure input and output ports.
7. Provide output port details.
8. Provide clock and reset details.
9. Confirm or change start-time alignment.
10. Generate System object.
11. Create System object test bench.
For a step-by-step example see
http://www.mathworks.com/products/hdl-verifier/examples.html?file=/products/demos/shipping/edalink/Tutorial_MATLAB_SysObj_IN.html
Appendix B
Xilinx ISE Overview
The Xilinx ISE system is an integrated design environment that consists
of a set of programs to create (capture), simulate and implement digital
designs in an FPGA or CPLD target device. All the tools use a graphical
user interface (GUI) that allows all programs to be executed from toolbars,
menus or icons. On-line help is available from most windows. This write-up
is intended to get you started with the ISE tools. It gives a quick overview
of how to create a design, simulate it and download it into an FPGA. For
more detailed information please consult the on-line Xilinx documentation
and tutorials. The ISE User Guide is available on line.
B.1 Design Flow Overview
The following steps are involved in the realization of a digital system using
Xilinx FPGAs, as illustrated by figure (A.1).
B.1.1 Design Entry
The first step is to enter your design. This can be done by creating source
files. Source files can be created in different formats, such as a schematic or
a Hardware Description Language (HDL) such as VHDL, Verilog or ABEL.
A project design will consist of a top-level source file and various lower-level
source files. Any of these files can be either a schematic or an HDL file.
B.1.2 Design Synthesis
The synthesis step creates netlist files from the various source files. The
netlist files can serve as input to the implementation module.
B.1.3 Design Verification (simulation)
This is an important step that should be done at various stages of the design.
The simulator is used to verify both the functionality of a design (functional
simulation) and the behavior and timing of your circuit (timing simulation).
Timing simulation is run after implementing your circuit in the FPGA,
since it needs to know the actual placement and routing to find out the exact
speed and timing of the circuit.
B.1.4 Design Implementation
After generating the netlist file (synthesis step), the implementation will
convert the logic design into a physical file that can be downloaded to the
target device (e.g. Virtex FPGA). This step involves three sub-steps: translating
the netlist, mapping, and place and route.
B.1.5 Device Configuration
This refers to the actual programming of the target FPGA by downloading
the programming file to the Xilinx FPGA.
B.2 Starting the ISE Software
To start ISE, double-click the desktop icon, or start ISE from the Start menu
by selecting: Start → All Programs → Xilinx ISE 12.2 → Project
Navigator.
B.2.1 Create a New Project
To create a new project:
1. Select File → New Project... The New Project Wizard appears.
2. Type tutorial in the Project Name field.
3. Enter or browse to a location (directory path) for the new project.
4. A tutorial Subdirectory is created automatically.
5. Verify that HDL is selected from the Top-Level Source Type list.
6. Click Next to move to the device properties page.
7. Fill in the properties in the table as shown below:
Product Category: All
Family: Spartan3
Device: XC3S200
Package: FT256
Speed Grade: -4
Top-Level Source Type: HDL
Synthesis Tool: XST (VHDL/Verilog)
Simulator: ISE Simulator (VHDL/Verilog)
Preferred Language: Verilog (or VHDL)
Verify that Enable Enhanced Design Summary is selected. Leave
the default values in the remaining fields.
8. Click Next to proceed to the Create New Source window in the New
Project Wizard.
When the table is complete, your project properties will look like those
shown in figure (A.2):
B.2.2 Create an HDL Source
In this section, you will create the top-level HDL file for your design. Determine
the language that you wish to use. We will start with the Creating a
VHDL Source section below, followed by Creating a Verilog Source.
Creating a VHDL Source
Create a VHDL source file for the project as follows:
1. Click the New Source button in the New Project Wizard.
2. Select VHDL Module as the source type.
3. Type in the file name counter.
4. Verify that the Add to project checkbox is selected.
5. Click Next.
6. Declare the ports for the counter design by filling in the port information
as shown in figure (A.3).
7. Click Next, and then Finish in the New Source Wizard - Summary dialog
box to complete the new source file template.
8. Click Next, then Next, then Finish.
The source file containing the entity/architecture pair displays in the Workspace,
and the counter displays in the Source tab, as shown in figure (A.4).
B.2.3 Checking the Syntax of the New Counter Module
When the source files are complete, check the syntax of the design to find
errors and typos.
1. Verify that Implementation is selected from the drop-down list in the
Sources window.
2. Select the counter design source in the Sources window to display the
related processes in the Processes window.
3. Click the + next to the Synthesize-XST process to expand the process
group.
4. Double-click the Check Syntax process.
5. Close the HDL file.
Note: You must correct any errors found in your source files. You
can check for errors in the Console tab of the Transcript window.
B.2.4 Implement Design and Verify Constraints
Implement the design and verify that it meets the timing constraints specified
in the previous section.
Implementing the Design
1. Select the counter source file in the Sources window.
2. Open the Design Summary by double-clicking the View Design Summary
process in the Processes tab.
3. Double-click the Implement Design process in the Processes tab.
4. Notice that after implementation is complete, the Implementation processes
have a green check mark next to them, indicating that they completed
successfully without errors or warnings.
5. Locate the Performance Summary table near the bottom of the Design
Summary.
6. Click the All Constraints Met link in the Timing Constraints field to
view the Timing Constraints report. Verify that the design meets the
specified timing requirements.
7. Close the Design Summary.