Author:
Carlos Arrabal Azzalini
Tutor:
Pablo Ituero Herrero
The tribunal appointed to evaluate the Project indicated above, composed of the following members:
Madrid, __ of __, 2007
First of all I would like to thank Marisa for assigning this project and the scholarship to me. I have enjoyed working on it all along.
I would like to give special thanks to my mentor and friend Pablo for his advice and support. I had a great time working with him.
Finally I would like to thank Sandra for all her support and patience, and for being there all the time.
Abstract
Today's most common architectures for implementing the SOVA algorithm are constrained by two parameters: the trace-back depth and the reliability-updating depth. These parameters play an important role in the trade-offs between BER performance, power consumption, area, and system throughput. In this work we present a new approach to SOVA decoding that is not limited by these parameters and leads to an optimal execution of the SOVA algorithm. Moreover, the architecture is built from recursive units, which consume less power since the number of registers employed is reduced. We also present a new scheme, based on an approximation to the BR-SOVA algorithm, that improves the SOVA BER performance. With this scheme the BER achieved is within 0.1 dB of that obtained with a Max-Log-MAP algorithm.
Contents

1 Introduction
2 Turbo Codes
2.1 Binary Phase Shift Keying Communication System Model
2.2 Soft Information and Log-Likelihood Ratios in Channel Coding
2.3 Convolutional Encoders
2.4 Trellis Diagrams
2.5 Turbo Codes Encoders
2.6 Trellis Termination
4.7 Fusion Points based Reliability Updating Unit
4.8 Control Unit
4.9 Improvements
5 Methodology
Bibliography
List of Figures

4.12 Register Exchange SMU for the SOVA. P_fb = [111], P_g = [101]
4.13 Register Exchange processing elements
4.14 Systolic Array for the Viterbi Algorithm
4.15 Survival unit for the Systolic Array
4.16 Two Step idea. First tracing back, and then reliability updating
4.17 Fusion Points based SMU
4.18 Possibility of fusion points
4.19 Fusion Point detection algorithm
4.20 Sequence of the Fusion Point algorithm
4.21 FPU architecture for a code with constraint length K = 3
4.22 Reliability updating problem
4.23 One possible solution to the problem of bit reliabilities releasing
4.24 Solution adopted for the bit reliabilities releasing problem
4.25 Fusion Points based Reliability updating unit
4.26 Recursive Updating Unit
4.27 Recursive Updating Process
4.28 Control Unit General Scheme
4.29 Control Unit State Diagram
4.30 Reliability Updating Unit with BR-SOVA approximation
4.31 Recursive Update with BR-SOVA approximation
6.8 Throughput statistics. f = 25 MHz, f_RUU = 25 MHz. P_fb = [111], P_g = [101]
6.9 Throughput statistics. f = 25 MHz, f_RUU = 50 MHz. P_fb = [111], P_g = [101]
6.10 Throughput statistics. f = 16.66 MHz, f_RUU = 25 MHz. P_fb = [111], P_g = [101]
6.11 Throughput statistics. f = 25 MHz, f_RUU = 25 MHz. P_fb = [1011], P_g = [1101]
6.12 Throughput statistics. f = 25 MHz, f_RUU = 50 MHz. P_fb = [1011], P_g = [1101]
6.13 Throughput statistics. f = 16.66 MHz, f_RUU = 50 MHz. P_fb = [1011], P_g = [1101]
Chapter 1
Introduction
The goal of any communication system is to achieve highly reliable communication with reduced transmitted power while reaching data rates as high as possible. These parameters usually represent a trade-off that designers have to deal with. Bandwidth is also a limited resource in communication systems. Error-detecting and error-correcting techniques are used in digital communication systems in order to obtain higher spectral and power efficiency. This is based on the fact that with these techniques more channel errors can be tolerated, so the communication system can operate with a lower transmitted power, transmit over longer distances, tolerate more interference, use smaller antennas, and transmit at higher data rates.
One of the most widespread of these techniques is Forward Error Correction (FEC). On
the transmitter side, an FEC encoder adds redundancy to the data in the form of parity
information. Then at the receiver, an FEC decoder is able to exploit the redundancy in
such a way that a reasonable number of channel errors can be corrected. Claude Shannon
—father of Information Theory— showed that if long random codes are used, reliable
communications can take place at the minimum required Signal to Noise Ratio (SNR).
However, truly random codes are not practical to implement. Codes must possess some
structure in order to have computationally tractable encoding and decoding algorithms.
Turbo Codes were introduced by Berrou, Glavieux and Thitimajshima in 1993 [3].
These codes exhibit an astonishing performance close to the theoretical Shannon limit,
in addition to a good feasibility of VLSI (Very Large Scale Integration) implementation.
Turbo Codes are used in the two most widely adopted third-generation cellular standards
(UMTS and CDMA2000). They are also incorporated into standards used by NASA for
deep space communications (CCSDS) and digital video broadcasting (DVB-T).
Decoding in Turbo Codes is carried out by a soft-output decoding algorithm: an
algorithm that provides a measure of reliability for each bit that it decodes. Specifically
two of the component decoding algorithms that are used in Turbo Codes are known as
MAP (Maximum a Posteriori) and SOVA (Soft Output Viterbi Algorithm). The high
computational complexity of the MAP algorithm makes its implementation expensive
and power-hungry. This is why most implementations perform a simplified version of the algorithm. The most common simplifications are the Log-MAP and Max-Log-MAP algorithms, which work in the logarithmic domain. Even so, these algorithms are still more complex and power-hungry than the SOVA algorithm, which presents the
• A complete Turbo Decoder implementation based on the SOVA algorithm has been achieved:
– A two-step approach to SOVA decoding has been adopted [9].
– A new algorithm that does not depend on the trace-back depth of the survival path has been introduced for SOVA decoding.
– A new architecture for the previous algorithm has been designed.
– A new architecture for updating bit reliabilities according to the HR-SOVA algorithm has been designed.
– A novel updating process that approximates the BR-SOVA algorithm for binary RSC codes has been presented. With this scheme the BER performance is less than 0.1 dB from the Max-Log-MAP approach.
– BER curves have been measured for the HR-SOVA and the BR-SOVA approximation with different codes (real system).
– Throughput estimates have been obtained for different codes (real system).
– Power estimates have been obtained with simulation tools (VHDL post-place-and-route model).
The structure of this document is as follows. The second chapter introduces Turbo Codes and sets the scene for this work. The third chapter describes the SOVA algorithm in depth and establishes the main ideas for the fourth chapter, which reviews today's most common architectures and introduces the SOVA implementation proposed in this work. It is within the fourth chapter that the new algorithm, together with the new architectures, is presented. The fifth chapter illustrates the practical design, from implementation to verification. Finally, the sixth chapter presents the results and measurements carried out on the real system, while the seventh chapter gives the conclusions and establishes the basis for future work.
Chapter 2
Turbo Codes
Turbo Codes were presented by Berrou, Glavieux and Thitimajshima [3] in 1993. They had a tremendous impact on the discipline of channel coding. They are, along with LDPC (Low Density Parity Check) codes, the closest approximation ever to the code that Claude Shannon proved to exist in the mid-20th century, one able to achieve error-free communication. Since their introduction, they have been intensively studied. The first commercial application was presented in 1997 [1], and today they are already part of the UMTS (Universal Mobile Telecommunication System) standards. They have become the first choice when working at low SNRs (Signal to Noise Ratio), such as in wireless applications and deep-space communications.
In this chapter we first introduce the communication system model which has been employed in this work as the scenario for the channel coding tests. Next we introduce the concept of soft information, which is the key to Turbo Codes. We then describe the Turbo Code encoders and finally discuss trellis termination. The decoding process is left to the next chapter.
In order to explain the soft information concept and the log-likelihood ratio, we will develop a simplified communication model that will serve as the base example for the concepts that follow. This communication model is shown in Figure 2.1. On the transmitter side there
is a source of information that we assume to provide equally likely symbols. There is a
block for channel coding which is the main subject of this work and it is carried out by
a Turbo Code. The modulation scheme is BPSK (Binary Phase-Shift Keying) and the
channel is assumed to be AWGN (Additive White Gaussian Noise). On the receiver side, the complementary blocks to those in the transmitter are found. Also, there is a matched filter which maximizes the SNR before sampling the received data. Note that we have omitted the synchronization recovery subsystem, which will be assumed to be ideal.
As a starting point, the source provides message bits $m_i$ at a rate of $1/T$ bits/s, which are fed into the channel coding block. In a Turbo Code context, these bits are grouped to form a frame of size L bits. The channel coding block outputs a coded frame of size 2L. So, for each message bit $m_i$ there is a symbol made of two bits, $x_i = \{x_{s_i}, x_{p_i}\}$. Then
Figure 2.1: Simplified communication system model: source, channel coding, BPSK modulation, AWGN channel, matched filter (implying sampling and quantification), channel decoding, and sink.

Figure 2.2: The discrete AWGN channel: channel encoder output $x_i$, AWGN channel, matched filter, and sampled output $y_i$.
the code rate is $r = 1/2$ —one input bit, two output bits. The modulator generates the waveform signals from the input bits and transmits them through the AWGN channel. The matched filter filters the received signal which, at the corresponding time instants, is sampled to obtain the $y_i$ symbols. The AWGN channel, in conjunction with the matched filter and the sampling unit, can be modeled as a discrete AWGN channel, as shown in figure 2.2. Modeling a discrete channel is desirable, since computer simulations are simplified and the computing time is reduced. The equation that governs the behavior of this channel is the following:

$$y_i = a\sqrt{E_s}\,(2x_i - 1) + n_G \qquad (2.1)$$

where the variance of $n_G$ becomes $\sigma^2 = N_0/(2E_s)$.
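The discrete channel of equation 2.1 is easy to simulate directly. The following Python sketch is purely illustrative (the function name `discrete_awgn` and the default values are our own); the noise deviation `sigma` is left as a parameter so that the text's convention $\sigma^2 = N_0/(2E_s)$ — or any other — can be plugged in.

```python
import math
import random

def discrete_awgn(bits, es=1.0, a=1.0, sigma=0.5, rng=random):
    """Sketch of the discrete AWGN channel of equation (2.1):
    y_i = a * sqrt(Es) * (2*x_i - 1) + n_G, with n_G ~ N(0, sigma^2)."""
    return [a * math.sqrt(es) * (2 * x - 1) + rng.gauss(0.0, sigma) for x in bits]
```

With `sigma = 0` the mapping reduces to the noiseless BPSK symbols ±a·√Es, which is a convenient sanity check.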
This rule is known as MAP (maximum a posteriori) since $P(x_i = 1 \mid y_i)$ and $P(x_i = 0 \mid y_i)$ are the a posteriori probabilities. Using Bayes' theorem, the previous rule can be rewritten as:
$$\frac{P(y_i \mid x_i = 1)\,P(x_i = 1)}{P(y_i)} > \frac{P(y_i \mid x_i = 0)\,P(x_i = 0)}{P(y_i)} \;\Rightarrow\; x_i = 1$$

$$\frac{P(y_i \mid x_i = 1)\,P(x_i = 1)}{P(y_i)} < \frac{P(y_i \mid x_i = 0)\,P(x_i = 0)}{P(y_i)} \;\Rightarrow\; x_i = 0$$

$$\frac{P(y_i \mid x_i = 1)\,P(x_i = 1)}{P(y_i \mid x_i = 0)\,P(x_i = 0)} > 1 \;\Rightarrow\; x_i = 1$$

$$\frac{P(y_i \mid x_i = 1)\,P(x_i = 1)}{P(y_i \mid x_i = 0)\,P(x_i = 0)} < 1 \;\Rightarrow\; x_i = 0$$
Applying the natural logarithm to the previous equations does not alter the test result, so we obtain:

$$\ln\frac{P(y_i \mid x_i = 1)}{P(y_i \mid x_i = 0)} + \ln\frac{P(x_i = 1)}{P(x_i = 0)} > 0 \;\Rightarrow\; x_i = 1$$

$$\ln\frac{P(y_i \mid x_i = 1)}{P(y_i \mid x_i = 0)} + \ln\frac{P(x_i = 1)}{P(x_i = 0)} < 0 \;\Rightarrow\; x_i = 0$$
The previous ratios in the log domain are the LLR (Log-Likelihood Ratio) metrics, a useful way to represent the soft decisions of receivers or decoders. We can summarize the previous steps with a single equation:

$$\Lambda_i = L_c(y_i) + L_{a_i}$$

where $L_{a_i}$ is the LLR of the a priori information and $L_c(y_i)$ is related to a measure of the channel reliability. Note that the sign of $\Lambda_i$ indicates the hard decision.
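The decision rule above reduces to adding the channel and a priori LLRs and taking the sign. A minimal Python illustration (the function name and argument names are ours, not part of the thesis):

```python
def llr_decision(y, lc, la=0.0):
    """Soft value and hard decision: lam = Lc*y + La; decide x_hat = 1 if lam > 0."""
    lam = lc * y + la
    return lam, (1 if lam > 0 else 0)
```

Note how a strong enough a priori LLR can flip the decision suggested by the channel sample alone.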
Figure 2.3: NSC encoder of rate 1/2. P_g1 = [101], P_g2 = [111].
So far we have introduced the equations of soft information based on the received symbol at the input of the decoder, without the aid of the underlying code. Using channel coding in the communication system lets us improve the LLR of the a posteriori probability. This is shown in [3]. The LLR of the a posteriori information at the output of the decoder is:

$$\Lambda_i = L_c y_{s_i} + L_{a_i} + L_{e_i} \qquad (2.3)$$

The term $L_{e_i}$ is known as the extrinsic information, which is in fact the improvement achieved by the decoder and the decoding process on the soft information. The extrinsic information will be the data fed as a priori information to the other decoder in a concatenated decoding scheme. It is important to remark that all terms in equation 2.3 can be added because they are statistically independent [3]. Statistical independence of the terms is essential to allow iterative decoding, and this is the reason for the interleavers in the concatenation schemes of Turbo Encoders and Turbo Decoders.
Figure 2.4: RSC encoder of rate 1/2. P_fb = [111], P_g1 = [101].
Figure 2.5: RSC encoder used in the UMTS standard. P_fb = [1011], P_g = [1101].
Generator polynomials define the NSC encoder of rate 1/2 —see figure 2.3. On the other hand, an RSC encoder is defined by both feedback and generator polynomials —see figure 2.4.
The status of the set of registers represents the state of the encoder. Input bits $m_i$ make the encoder memory elements change and move into another state while producing the output bits $x_{s_i}$, $x_{p_i}$ —for the case of the RSC encoder. Convolutional encoders are characterized by the constraint length K. An encoder with constraint length K has K − 1 memory elements, which allows the encoder to move among $2^{K-1}$ states.
RSC encoders, rather than NSC encoders, are mostly used in Turbo Code schemes, since better BER performance has been achieved with them. For instance, the encoder used in UMTS is the one depicted in Figure 2.5.
Figure 2.6: Trellis diagram for the RSC encoder of figure 2.4. Solid lines correspond to $m_i = 0$ and dashed lines to $m_i = 1$; branch labels give the output symbols.
Figure 2.6 shows the trellis for the RSC encoder of figure 2.4. The figure also shows an example of an input message and how this input message represents a path in the trellis diagram. This path is colored in blue and is known as the state sequence s.
In order to find the trellis representation of an encoder we follow these steps:
• The memory elements of the encoder are set to represent a given state. Usually the
first state is 0. Then we want to calculate the connections between the present state
and the subsequent states.
– An input bit mi equal to zero is assumed. Then the output symbol is calculated
by operating with the adders and the value of the registers. Also the next state
is calculated by shifting the register inputs at the clock edge. For example in
figure 2.6, we see that at state s0 , an input message bit mi = 0 produces a
transition to state s0 . In contrast, a bit mi = 1 produces a transition to state
s2 .
– An input bit mi equal to one is assumed. Again, the output symbol is calculated
by operating with the adders and the value of the registers. Also the next
state is calculated by shifting the register inputs at the clock edge. Note that,
whenever a transition is due to a zero input bit, then that transition is drawn
as a solid line. In contrast, whenever the transition is due to an one input bit,
that transition is drawn as a dashed line.
• Repeat the previous steps with the rest of the states, s1 -s3 in the example.
The previous trellis diagram is given by the polynomials and is therefore the same for all the stages. The encoded message can be thought of as a particular path within the trellis diagram, as shown in the example of 2.6.
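The construction steps above can be sketched in a few lines of Python. The fragment below is our own illustration of the K = 3 RSC encoder of figure 2.4 (P_fb = [111], P_g = [101]); the state encoding (newest register bit first) and the function names are assumptions, not the thesis hardware.

```python
def rsc_step(state, m):
    """One transition of the K = 3 RSC encoder, P_fb = [111], P_g = [101].
    state packs the two register bits (r1, r2); returns (next_state, (xs, xp))."""
    r1, r2 = state >> 1 & 1, state & 1
    a = m ^ r1 ^ r2                 # feedback bit: P_fb taps on (m, r1, r2)
    xp = a ^ r2                     # parity bit: P_g taps [1,0,1] on (a, r1, r2)
    return (a << 1) | r1, (m, xp)   # systematic output is m itself

def build_trellis(n_states=4):
    """Enumerate every (state, input) pair, exactly as in the steps above."""
    return {(s, m): rsc_step(s, m) for s in range(n_states) for m in (0, 1)}
```

For example, from state s0 the input m = 1 moves the encoder to s2 with output {1, 1}, matching the transitions described for figure 2.6.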
Figure 2.8: Parallel concatenated Turbo encoder: two RSC encoders, an interleaver, and puncturing of the parity outputs. RSC encoder with P_fb = [111], P_g = [101].
Figure 2.9: Turbo Encoder with trellis termination in one encoder (switches s1 and s2). P_fb = [111], P_g = [101].
Trellis termination is basically the final state that the memory elements of the convolutional encoders adopt when the end of the frame being encoded is reached. Since there is an interleaver between the two convolutional encoders, terminating both trellises is not a trivial task [16]. For the purpose of this work, we choose to terminate the first encoder and leave the second encoder open. Figure 2.9 shows the resulting Turbo encoder. The system works as follows: at the beginning, switch s1 is closed and switch s2 is open. A data frame of size L − 2 is encoded; then switch s1 is opened and s2 is closed, and the remaining two bits are encoded, which leads the first convolutional encoder to state 0. Note that the data frame is, in this case, L − 2 bits long, and the remaining two bits are used to terminate the trellis.
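The termination mechanism can be illustrated in code: while switch s2 is closed, each tail input is chosen to cancel the feedback, so the encoder reaches state 0 after K − 1 steps. A sketch for the K = 3 encoder with P_fb = [111] (the function name and the state encoding are our assumptions):

```python
def terminate(state):
    """Tail bits that drive the K = 3 RSC encoder (P_fb = [111]) back to state 0.
    Choosing m = r1 ^ r2 makes the feedback bit a = m ^ r1 ^ r2 equal to 0."""
    tail = []
    for _ in range(2):                     # K - 1 = 2 tail bits
        r1, r2 = state >> 1 & 1, state & 1
        tail.append(r1 ^ r2)               # cancels the feedback
        state = r1                         # next state is (a, r1) = (0, r1)
    return tail, state
```

Whatever state the encoder is left in after the L − 2 data bits, two such bits suffice to return it to state 0.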
Chapter 3
Decoding Turbo Codes: Soft Output Viterbi Algorithm
In this chapter we will introduce a general scheme of a Turbo decoder for a parallel concatenated code. We will go step by step through the entire decoding process and describe in depth one of the algorithms used in the SISO (Soft Input Soft Output) unit: the SOVA algorithm.
In the previous chapter we presented Turbo Codes and the encoding process. Now it is time to talk about the decoding process. Turbo codes are asymmetrical codes: while the encoding process is relatively easy and straightforward, the decoding process is complex and time-consuming.
The power of Turbo Codes resides in the decoding process which, unlike other techniques, is done iteratively. Figure 3.1 shows a general scheme of a turbo decoder. As we can see, the decoding process is done by two SISO decoders. Signals arriving at the receiver are sampled and processed with the aid of the channel reliability before becoming the soft information "parity info 1,2" and "systematic info" shown in figure 3.1. We can
Figure 3.1: General scheme of a turbo decoder: two SISO decoders exchanging extrinsic information through an interleaver and a deinterleaver; the systematic and a priori inputs are subtracted from each SISO output.
see the output of one SISO decoder becoming the input of the other decoder and vice versa, forming a feedback loop. The name "turbo code" comes from this feedback loop and its resemblance to a turbine engine.
Final decoding is achieved by an iterative process. Soft input information is processed and, as a result, soft output information is obtained. The second decoder takes this soft information as input and produces new soft output information that the first decoder will use as input. This process continues until the system makes a hard decision. The BER obtained improves drastically over the first iterations, until it begins to converge asymptotically [3]. A trade-off exists between the decoding delay and the bit error rate achieved. Even though eight iterations are enough to obtain a reasonable BER, decoders do not always perform them all; instead they check the parity of the message header and then decide whether to keep iterating or not.
Note that between the two decoders there is an interleaver or deinterleaver, depending on the data flow. As we mentioned in chapter 2, the interleaver/deinterleaver unit is a big issue in turbo coding. This unit reorders soft information so that a priori data, parity data and systematic data are all time-coherent at the moment of processing.
Figure 3.1 also shows how soft input information is subtracted from the output in order to avoid the positive feedback that would degrade the BER performance of the system.
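The feedback loop just described can be condensed into the following Python skeleton. This is only a sketch of the data flow of figure 3.1: `siso` stands for any SISO decoder returning a posteriori LLRs, `perm` is the interleaver permutation, and the extrinsic information is obtained by subtracting the systematic and a priori inputs from each SISO output, as discussed above.

```python
def turbo_decode(sys_llr, par1_llr, par2_llr, perm, siso, n_iter=8):
    """Iterative exchange of extrinsic information between two SISO decoders."""
    n = len(sys_llr)
    la = [0.0] * n                                   # a priori LLRs, zero at start
    for _ in range(n_iter):
        lam1 = siso(sys_llr, par1_llr, la)
        le1 = [lam1[i] - sys_llr[i] - la[i] for i in range(n)]   # extrinsic only
        la_i = [le1[perm[i]] for i in range(n)]      # interleave for decoder 2
        sys_i = [sys_llr[perm[i]] for i in range(n)]
        lam2 = siso(sys_i, par2_llr, la_i)
        le2 = [lam2[i] - sys_i[i] - la_i[i] for i in range(n)]
        la = [0.0] * n                               # deinterleave back to decoder 1
        for i in range(n):
            la[perm[i]] = le2[i]
    lam = [0.0] * n                                  # deinterleave the final LLRs
    for i in range(n):
        lam[perm[i]] = lam2[i]
    return [1 if x > 0 else 0 for x in lam]          # hard decision on the sign
```

A real implementation would also stop early once a parity check on the decoded frame succeeds, as mentioned above.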
$$\hat{s} = \arg\max_s P[s \mid y] \qquad (3.1)$$

where y is the noisy set of symbols available at the decoder after sampling; to be more precise, y is the observation. From Bayes' theorem we have:

$$\hat{s} = \arg\max_s \frac{P[y \mid s]\, P[s]}{P[y]} \qquad (3.2)$$

Since $P[y]$ does not change with s, we can rewrite equation 3.2 as:

$$\hat{s} = \arg\max_s \left\{ P[y \mid s]\, P[s] \right\} \qquad (3.3)$$
In order to compute equation 3.3, we could try all sequences s and find the one that maximizes the expression. However, this approach does not scale when the frame size is large.
Since there is a first-order Markov process involved, we can take advantage of two of its properties to simplify the search for $\hat{s}$. These properties are:

$$P[s_{i+1} \mid s_0, s_1, \ldots, s_i] = P[s_{i+1} \mid s_i] \qquad (3.4)$$

$$P[y_i \mid s_0, \ldots, s_{i+1}, y_0, \ldots, y_{i-1}] = P[y_i \mid s_i \to s_{i+1}] \qquad (3.5)$$

Equation 3.4 establishes that the probability of the next state does not depend on the entire past sequence; it only depends on the last state. Equation 3.5 states that the conditional probability of the observation symbol $y_i$, observed through white noise, depends only on the state transition.
$$P[y \mid s] = \prod_{i=0}^{L-1} P[y_i \mid s_i \to s_{i+1}], \qquad P[s] = \prod_{i=0}^{L-1} P[s_{i+1} \mid s_i],$$

$$\hat{s} = \arg\max_s \left\{ \prod_{i=0}^{L-1} P[y_i \mid s_i \to s_{i+1}] \prod_{i=0}^{L-1} P[s_{i+1} \mid s_i] \right\} \qquad (3.6)$$
$$\hat{s} = \arg\max_s \left\{ \sum_{i=0}^{L-1} \ln P[y_i \mid s_i \to s_{i+1}] + \ln P[s_{i+1} \mid s_i] \right\} \qquad (3.7)$$

$$\hat{s} = \arg\max_s \left\{ \sum_{i=0}^{L-1} \lambda(s_i \to s_{i+1}) \right\} \qquad (3.8)$$
λ (si → si+1 ) is known as the branch metric associated with transition si → si+1 .
The observation yi during state transition si → si+1 is actually the output of the
encoder observed through white noise during the state transition. For our BPSK model this
Figure 3.2: Encoder output observed through the channel. BPSK modulation: $u_{s_i} = 2x_{s_i} - 1$, $u_{p_i} = 2x_{p_i} - 1$.
observation is related to the systematic and parity bit pair (Figure 3.2). Thus, assuming noise independence, we can express the conditional probability of $y_i$ during the state transition as follows:

$$P[y_i \mid s_i \to s_{i+1}] = P[y_{s_i} \mid u_{s_i}]\; P[y_{p_i} \mid u_{p_i}] \qquad (3.9)$$

where $u_{s_i}$ and $u_{p_i}$ are the systematic and parity bits respectively after BPSK modulation, and

$$P[y_{s_i} \mid u_{s_i}] = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{y_{s_i} - u_{s_i}}{\sigma}\right)^2\right) dy_{s_i}; \qquad P[y_{p_i} \mid u_{p_i}] = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{y_{p_i} - u_{p_i}}{\sigma}\right)^2\right) dy_{p_i},$$
since we are dealing with white Gaussian noise with variance $\sigma^2$. In addition, it is more convenient to express $P[s_{i+1} \mid s_i]$ in terms of the message bit $m_i$, since state transitions are due to this bit. Then,

$$P[s_{i+1} \mid s_i] = P[m_i]$$

This is our a priori probability. For turbo decoding it is easier to work with log-likelihood ratios, then:
$$L_{a_i} = \ln\frac{P[m_i = 1]}{P[m_i = 0]}$$

$$P[m_i] = \begin{cases} \dfrac{e^{L_{a_i}}}{1 + e^{L_{a_i}}} & m_i = 1 \\[6pt] \dfrac{1}{1 + e^{L_{a_i}}} & m_i = 0 \end{cases} \;\Rightarrow\; \ln P[m_i] = L_{a_i}\,m_i - \ln\!\left(1 + e^{L_{a_i}}\right)$$
It is important to remark that for the first iteration, all message bits are assumed to be
equally likely, then P [mi = 1] = P [mi = 0] = 0.5 → Lai = 0. For successive iterations
Lai is the extrinsic information provided by the other decoder through the interleaver.
Replacing equation 3.9 and the above expression in the branch metric equation, we have:
Note that, in order to simplify the equations, we have neglected terms that do not change when varying the sequence s. From chapter 2 we know that $\sigma^2 = N_0/(2E_s)$ and $E_s = rE_b$, where $r = 1/2$ is the code rate. So finally we obtain:

$$\lambda(s_i \to s_{i+1}) = \frac{E_b}{N_0}\left[y_{s_i} u_{s_i} + y_{p_i} u_{p_i}\right] + L_{a_i} m_i \qquad (3.11)$$
It is more common to express equation 3.11 as shown below, since the channel reliability is $L_c = 4a\,\frac{E_s}{N_0}$ ($a = 1$ for our model):

$$\hat{s} = \arg\max_s \left\{ \sum_{i=0}^{L-1} L_c y_{s_i} x_{s_i} + L_c y_{p_i} x_{p_i} + L_{a_i} m_i \right\} \qquad (3.13)$$

where $x_{s_i}$, $x_{p_i}$ are the raw bits at the output of the channel encoder before BPSK modulation. Also $m_i = x_{s_i}$ for our RSC encoder.
It is important to remark that, according to [11], $L_c$ can be assumed to be equal to 1 for the SOVA algorithm. This means that there is no need to estimate the SNR of the channel. This is possible because at the first iteration $L_{a_i} = 0$, which leaves the resulting extrinsic information weighted by $L_c$. This extrinsic information becomes $L_{a_i}$ for the next SISO decoder, so all terms in equation 3.13 end up weighted by $L_c$. Therefore $L_c$ has no influence on the decoding process. The fact that the SOVA does not need channel estimation avoids many difficulties and represents a big advantage over the MAP algorithm.
Summarizing, table 3.1 shows the relevant equations for applying the SOVA algorithm.
Table 3.1: Relevant equations for applying the SOVA algorithm.

Branch Metric:
$$\lambda(s_i \to s_{i+1}) = y_{s_i} x_{s_i} + y_{p_i} x_{p_i} + L_{a_i} m_i \qquad (3.14)$$

Sequence Estimator:
$$\hat{s} = \arg\max_s \left\{ \sum_{i=0}^{L-1} y_{s_i} x_{s_i} + y_{p_i} x_{p_i} + L_{a_i} m_i \right\} \qquad (3.15)$$
where $\{x_{s_i}, x_{p_i}\}$ is the encoder output symbol when the input message bit is $m_i$; $\{y_{s_i}, y_{p_i}\}$ is the received symbol after the encoder output symbol has been BPSK-modulated and transmitted through an AWGN channel. Finally, $L_{a_i}$ represents the LLR of the message bit $m_i$.
In the next subsection we will develop an example in order to show how expression
3.15 and the trellis diagram are applied in the decoding process.
Figure 3.3 shows a trellis diagram example for a code with P_fb = [111], P_g = [101], and aims to clarify the decoding process.
• As shown in figure 3.3.a, the process begins at time i = 0 from state 0, because that is the state the encoder takes when initialized. Thus, the probability of being at state 0 is one, and the probability of being at any other state is zero. We assign these probabilities, as path metrics in the log domain, to each state:

$$pm_{0,0} = 0, \qquad pm_{0,k} = -\infty \quad \forall k \neq 0$$

• Then, the branch metrics are computed at each state for message bits 0 and 1 and the corresponding parity bit.
Figure 3.3: Trellis diagram for the VA. Code given by P_fb = [111], P_g = [101].
$$\begin{aligned}
\lambda(s_{0,0} \to s_{1,0}) &= (y_{s_i} + L_{a_i})\cdot 0 + y_{p_i}\cdot 0 \\
\lambda(s_{0,0} \to s_{1,2}) &= (y_{s_i} + L_{a_i})\cdot 1 + y_{p_i}\cdot 1 \\
\lambda(s_{0,1} \to s_{1,2}) &= (y_{s_i} + L_{a_i})\cdot 0 + y_{p_i}\cdot 0 \\
\lambda(s_{0,1} \to s_{1,0}) &= (y_{s_i} + L_{a_i})\cdot 1 + y_{p_i}\cdot 1 \\
\lambda(s_{0,2} \to s_{1,3}) &= (y_{s_i} + L_{a_i})\cdot 0 + y_{p_i}\cdot 1 \\
\lambda(s_{0,2} \to s_{1,1}) &= (y_{s_i} + L_{a_i})\cdot 1 + y_{p_i}\cdot 0 \\
\lambda(s_{0,3} \to s_{1,1}) &= (y_{s_i} + L_{a_i})\cdot 0 + y_{p_i}\cdot 1 \\
\lambda(s_{0,3} \to s_{1,3}) &= (y_{s_i} + L_{a_i})\cdot 1 + y_{p_i}\cdot 0
\end{aligned}$$
• The incoming path metrics for each state at time i = 1 are calculated by adding the
incoming branch metrics to the corresponding path metrics of states at time i = 0.
Figure 3.3.b.
• For each state at time i = 1, the incoming branch with the greater incoming path
metric is kept. The new path metrics of these states are the survival incoming path
metrics.
• This algorithm is repeated from item 2 until time i = L − 1. Note that the final
states will be at i = L.
• In order to find $\hat{s}$ at this point, there are two possibilities: if the encoder was terminated, the system traces back from the state at which the encoder was terminated —usually state 0— through all the surviving linked branches. If the encoder was not terminated, the system chooses the state with the greatest path metric and traces back from there. Each branch within the trellis has an associated message bit $\hat{m}_i$. The set of those bits is the most probable message. This step is shown in figure 3.3.d, where the survival path is colored in green.
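The forward recursion and trace back just described can be condensed into a short max-sum sketch. The trellis table below is our own enumeration of the P_fb = [111], P_g = [101] code, and the branch metric follows equation 3.14; this is an illustration of the algorithm, not the hardware architecture discussed in later chapters.

```python
NEG_INF = float("-inf")

# (state, m) -> (next_state, (xs, xp)) for the P_fb = [111], P_g = [101] code
TRELLIS = {
    (0, 0): (0, (0, 0)), (0, 1): (2, (1, 1)),
    (1, 0): (2, (0, 0)), (1, 1): (0, (1, 1)),
    (2, 0): (3, (0, 1)), (2, 1): (1, (1, 0)),
    (3, 0): (1, (0, 1)), (3, 1): (3, (1, 0)),
}

def viterbi(ys, yp, la, trellis=TRELLIS, n_states=4):
    """Max-sum Viterbi: forward path-metric recursion, then trace back."""
    pm = [0.0] + [NEG_INF] * (n_states - 1)      # start in state 0
    back = []                                    # surviving (prev_state, m) per state
    for i in range(len(ys)):
        new_pm = [NEG_INF] * n_states
        choice = [None] * n_states
        for s in range(n_states):
            if pm[s] == NEG_INF:
                continue
            for m in (0, 1):
                ns, (xs, xp) = trellis[(s, m)]
                metric = pm[s] + ys[i] * xs + yp[i] * xp + la[i] * m   # eq. (3.14)
                if metric > new_pm[ns]:
                    new_pm[ns], choice[ns] = metric, (s, m)
        back.append(choice)
        pm = new_pm
    state = max(range(n_states), key=lambda s: pm[s])  # use state 0 if terminated
    bits = []
    for choice in reversed(back):                # trace back the survival path
        state, m = choice[state]
        bits.append(m)
    return bits[::-1]
```

Feeding the sketch noiseless ±1 samples for an encoded message recovers the message exactly, since the transmitted path then has the maximum possible metric.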
The Viterbi Algorithm is able to find the most probable sequence within the trellis and hence its associated bits. Turbo coding techniques also demand that the SISO unit supply soft output information. There are two well-known extensions of the Viterbi Algorithm that produce soft output [11]. One was proposed by Battail [2] and is known as BR-SOVA. The other was proposed by Hagenauer [7] and is known as HR-SOVA. The latter is used more often than the former, even though BR-SOVA performs better in terms of BER, because HR-SOVA allows an easier hardware implementation. We will explain the HR-SOVA extension and highlight its main idea.
• As shown in the example of figure 3.4.a, at time i = L and state k = 0 the trace back
of the survival path starts. The survival path has been colored in green as exhibited
in the legend of the figure. In order to find the bit reliabilities, the competing path
also needs to be traced back from time i = L and state k = 0 to the time it merges
with the survival path. This competing path has been colored in orange, and for
the example of figure 3.4.a, the time where both paths merge is im = L − 4. Also
the difference between both incoming path metrics at time i and state k has to be
found. In figure 3.4 this value is represented as:
$$\Delta_{i,k} = \left[pm_{i-1,k'} + \lambda(s_{i-1,k'} \to s_{i,k})\right] - \left[pm_{i-1,k''} + \lambda(s_{i-1,k''} \to s_{i,k})\right] \qquad (3.16)$$

where k is the next state of k' and k'', for a message bit $m_i \in \{0, 1\}$ respectively. See Figure 3.4.a for references.
• Let j be a new time index in the range $i_m < j \leq i$. At every time instant j, the system compares the message bit of the survival path with the message bit of the competing path. If they differ, then the reliability $\rho_j$ has to be updated according to

$$\rho_j = \min(\rho_j, \Delta_{i,k})$$

In figure 3.4 a red square is placed on the branches that differ in the message bit. The BR-SOVA also has an updating rule for the case where the message bit of the survival path does not differ from the message bit of the competing path:

$$\rho_j = \min(\rho_j, \Delta_{i,k} + \rho_j^c)$$

This is the main difference between HR-SOVA and BR-SOVA. Nevertheless, this updating rule implies knowledge of the bit reliabilities of the competing paths $\rho_j^c$ [11].
• Once the system reaches the state where the survival path and the competing path
merge, it moves one time instant back from i to i − 1 through the survival path
and traces back once again the competing path at that state. This process is shown
in figure 3.4.b. For the example, the system now starts at time i = L − 1 and the
corresponding state k = 0. For this case, the competing path and the survival path
now merge at time im = L − 5.
• This algorithm continues from step 2 until time i = 1, thus allowing all the bit
reliabilities to be updated. Figure 3.4.c shows one more iteration with the aim of
clarifying this process.
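The update loop just described can be sketched as follows, assuming Hagenauer's rule $\rho_j \leftarrow \min(\rho_j, \Delta)$ for the positions where the two paths disagree (the function name and argument layout are our own):

```python
def hr_sova_update(rel, surv_bits, comp_bits, delta, i_merge, i):
    """HR-SOVA pass for one competing path: for i_merge < j <= i, update the
    reliability rho_j = min(rho_j, delta) wherever the message bits differ."""
    for j in range(i_merge + 1, i + 1):
        if surv_bits[j] != comp_bits[j]:
            rel[j] = min(rel[j], delta)
    return rel
```

Running this pass once per trellis step, with Δ taken from equation 3.16, yields the final bit reliabilities $\rho_i$.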
Figure 3.4: Soft Output extension example for the Viterbi Algorithm. Code given by P_fb = [111], P_g = [101]. (a) Survival path and competing path at time i = L, state k = 0; (b) survival path and competing path at time i = L − 1, state k = 0; (c) survival path and competing path at time i = L − 2, state k = 1.
where m̂i is the estimated message bit —m̂i ∈ {0, 1}. Note that (2m̂i − 1) only gives
the sign to Λi ; its magnitude is provided by ρi .
After explaining the previous algorithm, it is important to highlight the main idea of the
process. At a given time 0 ≤ i ≤ L − 1 the question to ask is: how reliable is the message
bit m̂i ? The extension for soft output indicates that the correctness of bit m̂i can only be
as good as the decision to choose the “closest” competing path over the most likely path.
The soft output generated by the HR-SOVA turned out to be overoptimistic [12]. This
means that the HR-SOVA algorithm produces an LLR that is greater in magnitude than
the LLR produced by the BR-SOVA or by the MAP algorithm. These overoptimistic values
of the LLR lead HR-SOVA to a worse performance in terms of BER.
In [12] two problems associated with the output of the HR-SOVA are described. One
is due to the correlation between extrinsic and intrinsic information when the HR-SOVA
is used in a turbo code scheme. The other is due to the fact that the output of the
HR-SOVA is biased. The first problem is not easy to solve, and most hardware
implementations do not deal with it. In contrast, for the second problem there have
been several proposals based on a normalization method. The idea behind a
normalization method can be shown by assuming that the output of the HR-SOVA, given
a message bit mi , is a Gaussian random variable; then:
P [Λi | mi = 1] = (1 / (√(2π) σΛ)) exp(−(Λi − µΛ)² / (2σΛ²)) dΛi ,   (3.20)

P [Λi | mi = 0] = (1 / (√(2π) σΛ)) exp(−(Λi + µΛ)² / (2σΛ²)) dΛi ,   (3.21)
where µΛ is the expectation of Λi and σΛ = √(E[Λi²] − µΛ²) is the standard deviation. In
order to find the LLR of the message bit mi , given the output of the HR-SOVA, we can
define:
Λ′i = ln( P [mi = 1 | Λi ] / P [mi = 0 | Λi ] ) ,   (3.22)
Using Bayes’ theorem, assuming P [mi = 1] = P [mi = 0], and working on the previous
expression with (3.20) and (3.21), yields:
Λ′i = (2µΛ / σΛ²) Λi ,   (3.23)

which indicates that the HR-SOVA output should be multiplied by the factor c = 2µΛ / σΛ²
to obtain the LLR.
The factor c, according to [12], depends on the BER of the decoder output. Some
schemes try to estimate the factor c, while others set a fixed value for it. In our hardware
implementation we use a fixed scaling factor, since it has been reported in [10] that the
BER performance obtained with a fixed scaling factor is better than with a variable one.
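As a rough illustration of this normalization, the sketch below applies a fixed scaling factor and, alternatively, estimates c = 2µΛ/σΛ² from a batch of output magnitudes. The value c = 0.7 is a placeholder, not the figure used in [10].

```python
import statistics

def fixed_scale(llrs, c=0.7):
    """Scale the overoptimistic SOVA outputs by a fixed factor c.
    c = 0.7 is a placeholder value, not taken from [10]."""
    return [c * x for x in llrs]

def estimate_c(samples):
    """Variable-scaling alternative: estimate c = 2*mu/sigma^2 (eq. 3.23)
    from samples of the SOVA output magnitude conditioned on m_i = 1."""
    mu = statistics.fmean(samples)
    var = statistics.pvariance(samples)
    return 2.0 * mu / var
```

In hardware the fixed factor is usually chosen as a power-of-two-friendly constant so the multiplication reduces to shifts and adds.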
Chapter 4
Hardware Implementation of a
Turbo Decoder based on SOVA
In the previous chapter we introduced the general ideas of a turbo decoder and presented
the HR-SOVA algorithm —from now on we will refer to it simply as SOVA— as the active
part of the SISO unit. In this chapter we deal with the implementation issues and
analyze today’s most commonly used hardware architectures. Next we introduce a new
algorithm for finding points of the survival path and present the architecture that
implements it. We then describe the unit that updates the bit reliabilities and finally
present the improvements that allow the decoder to boost its BER performance.
Figure 4.1 presents the general scheme. There are two blocks of RAM used
as input and output buffers. Two more blocks of RAM store temporary data such as
a priori and extrinsic information. Then there is a unit that deals with
the interleaving process, a unit to control the system and interact with the user, and
finally the SISO unit that implements the SOVA algorithm. Note that we only use one
SISO unit. This is possible because the interleaver/deinterleaver does not allow
concurrent processing, so a frame has to be completed by one decoder before it can
be processed by the other. In the proposed architecture, this processing is always done
by the same decoder.
Data arriving at the receiver is processed and fed into the data-in RAM buffer; then
a start command is delivered to the control unit. The states the system goes through
are shown in figure 4.2. The system processes the interleaved data first and, at the
last iteration, ends up with the deinterleaved data. This is done in order to
save an access through the interleaver at the end of the decoding process, which also saves
power and allows a simpler control unit. However, the system has to wait until the entire
frame is received before decoding can take place.
Even though the same unit is used as decoder 1 and decoder 0, its behavior changes
slightly depending on the role the unit is playing. We can summarize the following tasks
for each role:
– When SOVA addresses the data-in RAM buffer, its addresses belong to the inter-
leaved domain.
– Since its addresses belong to the interleaved domain, in order to get systematic
data it has to go through the deinterleaver.
– It can address “parity data 2” directly.
– If the first iteration is running, then the a priori information is assumed to be 0.
Otherwise, it fetches the a priori information through the deinterleaver from RAM
La/Le.
– It writes extrinsic information directly to the RAM Le/La. This entails that,
when acting as decoder 0, it has to access the a priori information through the
interleaver.
[Figure 4.1: General scheme of the decoder: data-in and data-out RAM buffers, RAM
La/Le and RAM Le/La, the interleaving/deinterleaving unit, the control unit and the
SOVA SISO unit, with their RAM port interfaces.]
[Figure 4.2: States the system goes through: Idle, Begin?, Deco 1 and Deco 0, annotated
with the tasks of each decoding role.]
delayed one cycle, while the address of the systematic data goes through the deinterleaver.
Also, the a priori data is fetched from the RAM La/Le and the extrinsic information is
written directly to the RAM Le/La. In contrast, when working as “deco 0”, there is
no need to access the “parity data 2” RAM, since the “parity data 1” and the systematic
data are stored in the same RAM position. In this case, the a priori information is accessed
through the interleaver and the extrinsic information is written directly to the RAM La/Le.
[Figure: datapath comparison between the VA (BMU, ACSU, SMU) and the SOVA
(BMU, ACSU with ∆ output, SMU, RUU).]
[Figure: branch metric computation from ys , yp and La; the metrics λ0 , λ1 , λ2 and λ3
correspond to the symbol pairs (xs , xp ) = (0, 0), (0, 1), (1, 0) and (1, 1) respectively.]

pmi,k = pmi−1,k′ + λ(si−1,k′ → si,k )
where k is the next state of k′ that produces the higher incoming path metric. The previous
expression suggests that the path metric pmi,k can be obtained by recursion. Figure 4.9
presents an ACSU for the SOVA unit.
The set of registers holds the previous path metrics. The branch metrics are mapped to
the corresponding adders according to the outputs during state transitions to produce the
incoming path metrics. Then these incoming path metrics are connected to the selectors,
which choose the higher incoming path metric and produce the decision vector along with
the ∆ difference between incoming path metrics. The connections between adders and
selectors represent the trellis butterfly.
One problem that might arise is the overflow of the path metrics after a certain amount
of time. Since the relevant information is the difference between path metrics, a normal-
ization method can be adopted. Many normalization methods have been proposed since
the introduction of Viterbi decoders. We find the modulo technique reported in [13]
to be a good solution, since it actually allows the overflow to happen.
The idea behind the modulo technique is that the maximum difference ∆B between
path metrics at all states is bounded. Figure 4.10 shows the mapping of all the numbers
representable by the nb-bit path metric register onto a circumference.
Let ipm′i,k and ipmi,k be two incoming path metrics at a given time i and state k; it
is shown in [13] that ipm′i,k > ipmi,k if ipm′i,k − ipmi,k > 0 in a two’s-complement
representation context. The number of bits nb relates to the bound as follows:

C = 2^nb = 2∆B
Figure 4.9: Add Compare Select Unit for the SOVA. Pf b = [111], Pg = [101]
This means that, even though the path metrics may grow in different ways, they all remain
within one half of the representation space provided by C. An appropriate bound is
∆B = 2nB, where n is the minimum number of stages that ensures complete trellis
connectivity among all trellis states, and B is the upper bound on the branch metrics [13].
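The modulo comparison can be sketched as follows. The register width NB = 8 is an arbitrary example; the key point is that the subtraction is interpreted in two's complement so that wrapped-around metrics still compare correctly, valid while the true difference stays below 2^(nb−1).

```python
NB = 8                        # example path metric register width (nb bits)
MASK = (1 << NB) - 1

def wrap(x):
    """Path metrics accumulate modulo 2**NB: overflow is simply allowed."""
    return x & MASK

def greater(a, b):
    """a > b in the modulo sense of [13]: interpret (a - b) mod 2**NB as a
    two's-complement number and test its sign."""
    d = (a - b) & MASK
    return 0 < d < (1 << (NB - 1))
```

In hardware this is just a subtractor whose sign bit drives the selector, with no explicit renormalization step.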
4.6 Survival Memory Unit.
Figure 4.10: Modular representation of the path metrics. Each path metric register has a
width of nb bits.
[Figure 4.11: Trellis diagram of a decoding process; all paths traced back from time i
merge at the fusion point (iF P , s3 ).]
The remaining SOVA units should obtain the soft output information for every bit in the
frame along with the most likely path. One way to do so is to store all the data the
ACSU provides; then, when the last time instant is reached, the data is traced back and
the bit reliabilities are updated according to the SOVA algorithm. However, most
hardware architectures do not do it that way, because the latency is high and the amount
of memory grows considerably with the frame size, the number of states of the encoder
and the quantization width of ∆i,k .
Most of the SMUs take advantage of a trellis property to solve this problem. This
property is illustrated in figure 4.11, where a trellis diagram from a decoding process is
shown. If all the paths are traced back from all the states at a given time i, it is found
that they merge at time instant iF P . Therefore, from time instant iF P down to i = 1,
the only path remaining in the trace back started at time i is the survival path. We
define the time instant along with the state where the paths merge as a FP (Fusion
Point). Then, looking at the example of figure 4.11, for time instant i there is a FP at
(iF P , s3 ). Simulations have shown that the distance between the time instant i and the
FP iF P is a random variable. It is also observed that the probability of the paths
merging increases with the depth of the trace back and is proportional to the constraint
length of the code. Hence, a trace back depth of 10 times the constraint length of the
code might allow the paths to merge with high probability. Below we describe the most
commonly used architectures based on this property.

Figure 4.12: Register Exchange SMU for the SOVA. Pf b = [111], Pg = [101]
The RE (Register Exchange) SMU for an RSC encoder of rate 1/2 is shown in figure 4.12.
This scheme is reported in [9]. It is an array of PEs (Processing Elements) of n rows and
D columns —n is the number of states of the encoder and D is the trace back depth. The
connection topology between PEs is given by the trellis of the encoder. In figure 4.12, two
types of PEs can be distinguished. The first U PEs —red outline—, besides tracing back
the paths, update the bit reliabilities. Figure 4.13.a shows a PE with updating capability
and figure 4.13.b shows a normal PE. The system allows the trace back of all the
paths from the states at time instant i. The ACSU provides the data that enters the RE
from the left. The first U units update the bit reliabilities of each path according to the
SOVA algorithm. Each row of the array holds the information of one path. For example,
the first row holds the information corresponding to the path traced back from state 0 at
time i, the second row holds the information corresponding to the path traced back from
state 1 at time i, and so on. After D clock cycles, if D is large enough to allow the paths
to merge, the message bit and its reliability are obtained. Note that, if the paths merge
before D, then the data coming out of the rows is the same for all states, since the tails of
all the paths belong to the survival path. Therefore only data from one row is selected.
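The row-copying behaviour of the RE array can be modelled in a few lines. The predecessor function used in the usage example below is a hypothetical 4-state labelling, not the Pf b = [111], Pg = [101] code itself.

```python
def re_step(rows, decisions, pred):
    """One clock of a register-exchange SMU (behavioural sketch).
    rows[k]      : message-bit history of the path currently ending in state k
    decisions[k] : decision bit chosen by the ACSU for state k
    pred(k, v)   : predecessor state of k given decision bit v
    Each state copies its predecessor's row and appends the new bit."""
    return [rows[pred(k, v)] + [v] for k, v in enumerate(decisions)]
```

After D calls, if the paths have merged, the oldest bit is identical in every row, which is exactly the selection condition described above.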
Parameters U and D represent a trade-off. Some architectures set U in a range from
two to five times the constraint length of the code, while D is set between five and ten
times the constraint length. If U and D are large, the BER performance improves, but
power consumption and area increase as well. The area increase is also due to the
resources spent on the connections, which becomes a serious problem as the number of
states of the encoder grows. If U is large and D is not, then resources are wasted, since
the BER performance does not improve; the same happens if D is large while U is short,
or when both are short. The decoding latency of this scheme is D clock cycles and, as
can be observed, the pipelined style of the architecture suggests high activity and hence
a relatively high dynamic power consumption.

[Figure 4.13: (a) PE with reliability updating capability; (b) normal PE.]
The RE scheme presents one major problem that leads to a high power consumption: all
the paths are traced back D steps. The idea behind the SA (Systolic Array) is to trace
back only one path; after D steps, this path will have merged with the survival path and
becomes the path we are looking for. The SA is presented in [15].
Figure 4.14.a introduces the scheme of the SA for an RSC encoder of rate 1/2 with four
states. The figure only shows the SMU for the VA. It is composed of an array of elements
arranged in n rows and 2D columns, where n is the number of states of the encoder and
D is the depth of the trace back. There is also one more row, with D TB (trace back)
elements. The row of TB elements holds the sequence of the states belonging to the
survival path. It can be observed that the connections between the elements in the array
are much simpler than in the RE scheme.

[Figure 4.14: (a) Systolic Array SMU scheme; (b) detail of the TB cell.]
The system works as follows: the selection unit feeds the decision bits vi,k provided
by the ACSU into the left of the array. After D clock cycles the SA is half full, and the
selection unit begins to feed the state si,k with the highest path metric accumulated in
the ACSU registers into the leftmost TB element. The system also works if the selection
unit feeds any other state; however, the state with the highest path metric is more likely
to belong to the survival path. Once the most likely state is fed, the TB elements, along
with the decision vectors, trace back from that state for D more cycles. Figure 4.14.b
shows the details of the TB cell. Finally, after 2D cycles, the SU (Survival Unit) —figure
4.15— provides the most likely message bit. Note that for this scheme the latency is twice
the latency of the RE scheme; however, the trace back depth is only D. Note that this
structure also suggests high activity and a relatively high dynamic power consumption.
So far the SA deals with the VA. The SOVA extension of the SA presents some major
problems that were cited in [6]:
• each state must have access to all the information about the path metric differences
and decision vectors for that particular time.

[Figure 4.15: detail of the SU (Survival Unit).]

Figure 4.16: Two Step idea. First tracing back, and then reliability updating.
These issues make the SA a poor choice for a complete SOVA based decoder. However,
the SA has been used in [17] as a reliability updating unit in a Two Step configuration.
This scheme was proposed in [9] with the intention of discarding all the operations that do
not affect the output. The idea is to postpone the updating process until the survival path
is found. Figure 4.16 shows this concept. The first D steps intend to find the survival
path, while the remaining U steps update the bit reliabilities. A FIFO (First In, First
Out) memory is usually employed to delay the path metric differences along with the
decision vectors until the updating process begins. The SMU we propose in this document
is actually a Two Step configuration; however, we introduce a new scheme for finding the
survival path.
[Figure 4.17: General scheme of the Fusion Point based SMU: FPU, dual port RAM and
RUU.]
Many architectures and schemes have been proposed in recent years. In [4] different
SMUs for the VA are studied and compared. In [6] a trace back architecture based on an
orthogonal memory is presented. However, all these schemes deal with a finite trace back
depth D and a finite updating length U , which leads to a non-optimum algorithm
execution. In the next subsection we introduce a new architecture for the SOVA
algorithm that does not depend on the D-U trade-off.
So far, two of the most common schemes have been studied: the RE and the SA. Both
of them carry out a trace back with the aid of a pipelined architecture. The size of this
pipeline has an impact on the area, power consumption and BER performance. One of
the contributions of this work is a new type of architecture based on a new algorithm,
together with the development of the architecture that implements it. The major
advantage of this new scheme is that it is independent of the D-U trade-off and allows
recursive processing, which lessens the register activity.
The new architecture we propose to implement the SOVA algorithm deals, as its name
suggests, with the Fusion Points. Figure 4.17 shows the general scheme. It consists
of a FPU (Fusion Point Unit), which finds the time instant and the state where the survival
paths merge; it is inside this unit that the new algorithm is implemented. There is a
dual port RAM to store the data the ACSU provides, and finally there is a RUU that
updates the bit reliabilities based on the information provided by the FPU.
The unit works as follows: the data the ACSU provides is stored in the dual port
RAM, and the decision bits vi,k are also used by the FPU to implement the FP search
algorithm.
[Figure 4.18: Merging paths as possible Fusion Points.]
Whenever a FP is found, it is indicated to the RUU, which updates the bit reliabilities by
a trace back method aided by the data fetched through the second port of the dual
port RAM.
This unit finds the Fusion Points along the trellis for a code with rate 1/2 by means of a new
algorithm¹. The algorithm is based on the idea that a fusion point for a code of rate 1/2 will
always reside at the merging point of two paths. Figure 4.18 shows these possible fusion
points. The following thought explains the previous idea: whenever a trace back operation
takes place, the system traces back from a given time instant i; while tracing back, paths
merge, at different time instants, in groups of two. The last of these two-path merging
points is a Fusion Point. Therefore a FP will always reside at the merging point of two
paths.
The following steps, along with the example of figure 4.19, introduce part of the algo-
rithm:
• Decision vectors coming from the ACSU are used to identify the merging paths or
possible fusion points —Figure 4.19.a.
• Each possible fusion point is marked. Whenever a mark is set, the mark time and
state are held in registers —Figure 4.19.a.
• This mark is propagated along the branches to the next states —Figure 4.19.b.
• If a mark propagates to all the states at a given time, then the origin of that mark
is a fusion point. The fusion point coordinate is held by the register and can be
recalled immediately —Figure 4.19.c.
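The mark mechanism above can be sketched behaviourally using sets of states instead of hardware bit masks. The trellis successor function and the event list in the usage example are illustrative, not the actual decision-vector decoding.

```python
def fp_search(merge_events, propagate, n_states):
    """Sketch of the FP search. merge_events[i] lists the states where two
    paths merge at time i; propagate(states) gives the states a mark covers
    one trellis step later. Returns the detected fusion points (time, state)."""
    full = frozenset(range(n_states))
    marks = {}                                  # origin (time, state) -> coverage
    fps = []
    for i, origins in enumerate(merge_events):
        # propagate every live mark one step along the trellis
        marks = {o: propagate(s) for o, s in marks.items()}
        # free marks stuck on a single state: no chance to become a FP
        marks = {o: s for o, s in marks.items() if len(s) > 1}
        # set a new mark on every possible fusion point
        for k in origins:
            marks[(i, k)] = {k}
        # rule: whenever two marks coincide, keep the one with latest origin
        keep = {}
        for o in sorted(marks):                 # later origins overwrite earlier
            keep[frozenset(marks[o])] = o
        marks = {o: set(s) for s, o in keep.items()}
        # a mark covering all states makes its origin a fusion point
        for o, s in list(marks.items()):
            if s == full:
                fps.append(o)
                del marks[o]
    return fps
```

In the hardware unit the coverage sets become n-bit mark registers, propagation is a wired permutation given by the trellis, and the coincidence rule is checked with comparators.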
After introducing the mark movements, figure 4.20 shows a sequence example where more
¹We develop the algorithm for a code of rate 1/2; however, it can be extended to any code rate.
[Figure 4.19: Mark setting and propagation example, i = 0 to i = 8: (a) possible fusion
points marked; (b) and (c) mark propagation along the trellis.]
than one mark is handled at the same time. Two columns can be appreciated in the
figure. The left column indicates the time instant the system is processing, along with
the status. The status is composed of three pointers able to hold the time and state of
FPs: the first two pointers hold the possible FPs detected, while the third pointer
indicates a FP. The right column shows the sequence from time i = 0 to time i = 5.

[Figure 4.20: Sequence example from i = 0 to i = 5 showing the system status (Pointer 0,
Pointer 1, FP) at each time instant. A pointer is freed as soon as its mark has no chance
to become a fusion point. At i = 4 the blue mark and the yellow mark become fusion
points; whenever two marks coincide, the one with the latest origin is kept, and the FP
register is set to (3, s0).]

The algorithm proceeds as follows:
[Figure 4.21: FPU design: Mark Detection, Mark Propagation and Mark Processing
blocks, together with the mark, state code and address registers.]
the same. They will have the same possibilities to become fusion points. However,
the FP closest to the time being processed is the true FP, so it is not necessary
to propagate and process the behavior of both marks. The mark that is relevant
is the one with the origin closest to the time being processed. Therefore, we can
enunciate the following rule:
whenever two marks coincide, the one with the latest origin is kept.
Finally, before getting back to the algorithm, it is time to explain why the red
mark pointer and the fuchsia mark pointer were freed in the previous steps. We
saw that each of those marks propagated to only one state. Therefore, if the system
keeps propagating them, in the best case they will coincide in the future
with a possible fusion point, and whenever two marks coincide the one with
the latest origin is kept; in that case the mark to be kept is the future possible
fusion point.
Now that we have set the main ideas and rules, we return to the algorithm.
The fusion point register is set with the pointer 1 data. Pointer 1 and
pointer 0 are freed, and the coordinates of the turquoise mark and the
brown mark are stored in them.
• i = 5 : The algorithm is executed, but there are no possible fusion point detections,
only mark propagations.
Figure 4.21 presents a design of the FPU for a code with constraint length K = 3. It
consists of a Mark Detection Unit, which uses the decision bits vi,k provided by the ACSU
to detect possible FPs according to the trellis butterfly. There is a Mark Propagation
block, which propagates, along the trellis, the new marks and the stored marks. There is
a processing unit, which compares all the marks at the input, and proceeds as follows:
• if there are two equal marks, then the one with the latest address is kept.
• if there is a mark with only one bit set, then its corresponding register is freed, since
it has no chance to become a FP in the future.
• if there is a mark with all bits set, then a FP is indicated with its address and state.
Finally there is a set of registers used to hold marks, addresses, and state codes.
It is important to point out some major concerns:
• There are at most n/2 new possible FPs at each time instant, where n is the number
of states of the encoder.
• Simulations have shown that for an RSC encoder of rate 1/2 with n = 2^(K−1) states,
the amount of registers the FPU needs is:
– n − 2 registers of n + 1 bits to hold marks —the remaining bit is used to indicate
if the register is empty.
– n − 2 registers of K − 1 bits to hold state codes.
– n − 2 registers of A bits to hold addresses, where A is the number of bits used
to code the frame size.
• Since the processing unit compares all marks at the same time to see if there are
equal marks, then the number of XOR gates increases drastically with the constraint
length of the code. However it has been observed that Turbo Code schemes with
encoders with short constraint length have better BER performance than encoders
with large constraint length [18].
Comparing our approach with the previous implementations, we obtain the results of
table 4.1 for an RSC code of rate 1/2, K = 3 and a message frame size of 1024. We see
that for a code with constraint length K = 3, a frame size of 2^A = 2^10 = 1024 bits and a
trace back depth of D = 5K, the RE SMU needs (5 ∗ 3) ∗ 4 = 60 one-bit registers, while
the FPU needs (4 − 2) ∗ (4 + 1) + (4 − 2) ∗ 2 + (4 − 2) ∗ 10 = 34 one-bit registers. Also,
the FPU will always find the correct FP, while the RE SMU might produce wrong results
if the paths do not merge within the trace back pipeline. Another difference is that the RE
outputs the symbol sequence of the survival path, while the FPU outputs the sequence of
FPs that are spread along the trellis. However, in a turbo code scheme context, the RUU
may take advantage of these FPs, as we will show in the next subsection.
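The register counts above generalize as follows; this small helper reproduces the arithmetic of the comparison, under the D = 5K assumption used in the text.

```python
def re_registers(K, d_mult=5):
    """One-bit registers in the RE SMU: D = d_mult*K columns times
    n = 2**(K-1) rows, one register per PE."""
    n = 2 ** (K - 1)
    return (d_mult * K) * n

def fpu_registers(K, A):
    """One-bit registers in the FPU: (n-2) marks of n+1 bits, (n-2) state
    codes of K-1 bits and (n-2) addresses of A bits."""
    n = 2 ** (K - 1)
    return (n - 2) * ((n + 1) + (K - 1) + A)

# Example from the text: K = 3, frame size 2**10 = 1024 (A = 10)
# re_registers(3) -> 60, fpu_registers(3, 10) -> 34
```

Note that the FPU count grows roughly quadratically in n because of the n + 1 bit marks, which is consistent with the XOR-gate concern raised below for large constraint lengths.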
4.7 Fusion Points based Reliability Updating Unit.
Table 4.1: Comparison between the RE SMU and the FPU for a code with rate 1/2,
K = 3 and a frame size 2^A = 2^10 = 1024
[Figure 4.22: Survival path and competing path traced back from the FP at (3, s0 ), with
the metric differences ∆3,0 , ∆4,0 and ∆4,2 ; possible future competing paths are also
shown.]
Before getting into the hardware issues, it is important to highlight the main problem we
face at the moment of updating the bit reliabilities. Figure 4.22 illustrates it with an
example. While processing data at time instant i = 4, a FP is found at (3, s0 ); this FP
is colored in green. The example shows the survival path and the competing path traced
back from the FP until they merge. The blue branches indicate possible future branches
of the survival path, while the red paths indicate possible competing paths in the future.
The RUU could start to update bit reliabilities as soon as a FP is detected. However,
figure 4.22 shows how the reliability of bits i = 2 and i = 3 might still depend on ∆4,0
or ∆4,2 . An earlier release of those bit reliabilities leads to a non-optimum SOVA
algorithm execution.
One solution to the mentioned problem is illustrated in figure 4.23. The idea is to trace
back U steps, to allow all the competing paths that start after time i to merge. After U
steps, the remaining bit reliabilities can be released. However, this solution introduces
the U factor, which is a trade-off between BER performance and power consumption.
It has no impact on the area since, as we will show later, the bit reliabilities are updated
recursively. In any case, the introduction of the U factor leads to a non-optimum SOVA
algorithm execution.
The solution we adopted is introduced by the example of figure 4.24. By time i,
two FPs have been detected. Since the second FP resides after the detection line of the
first one, the updating process takes place starting from the second FP. Once the first FP
is reached, the system continues updating and releasing the bit reliabilities. The fact that
Figure 4.23: One possible solution to the problem of bit reliabilities releasing.
Figure 4.24: Solution adopted for the bit reliabilities releasing problem.
the second FP needs to reside after the detection line of the first one is due to the concept
that any path traced back from after the detection line will merge at the FP of that
detection line. Therefore any future competing path of the survival path will merge at
the first FP at most, and will not affect the bit reliabilities before the first FP.
We can generalize this solution in an algorithm as follows:
• If the second FP is detected after the detection line of the first one then, proceed
with the updating process.
• If the second FP is detected before the detection line of the first FP, then wait for
one more FP:
– If the third FP resides after the detection line of the second FP then, proceed
to the updating process with the information of the second and third FP.
– If the third FP does not reside after the detection line of the second FP, but it
does after the detection line of the first one, then the updating process proceeds,
with the information of the first and third FP.
– If the third FP does not reside after the detection lines of any of the other two
FPs, then the third FP is discarded. The RUU continues from step 4.
• When the updating process finishes, the last FP becomes the first FP, and the
process is repeated from step two.
• If the end of the frame is reached by the ACSU, then the RUU is interrupted and
begins to update the bit reliabilities from the end.
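The FP pairing rule of the algorithm above can be condensed into a small decision function. Each FP is represented here by a hypothetical (time, detection_line) pair; the detection line is the time instant at which the FP was found.

```python
def select_update_pair(fp1, fp2, fp3=None):
    """Sketch of the FP pairing rule. Each FP is (time, detection_line).
    Returns the (earlier, later) FP pair to drive the updating process,
    or None when a further FP must be awaited (or fp3 is discarded)."""
    if fp2[0] > fp1[1]:          # second FP past the first detection line
        return (fp1, fp2)
    if fp3 is None:              # wait for one more FP
        return None
    if fp3[0] > fp2[1]:          # third FP past the second detection line
        return (fp2, fp3)
    if fp3[0] > fp1[1]:          # third FP past the first detection line
        return (fp1, fp3)
    return None                  # third FP discarded
```

The returned pair gives the starting FP for the trace back and the FP at which the released reliabilities become final.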
Figure 4.25.a presents the general scheme of the RUU. There is a state machine which
controls the unit and carries out the previous algorithm. The registers at the left of the
figure hold FP state codes, FP addresses and FP detection lines, which are used to address
the RAM block and control the updating process. The lastState unit calculates the
previous state in the trellis based on the current state and the decision bit for that state;
this unit actually performs the trace back of the survival path, one step per clock cycle.
The current state is used to drive the multiplexers that select the message bit associated
with the survival path and the ∆ difference between the metric of the survival path and
that of a competing path. These elements are fed into the recursive updating unit, which
calculates the reliability magnitude of the bits, ρi .
The term Lepi is stored in the RAM block in conjunction with the decision bits vi,k
and ∆i,k . This term is equivalent to:
The term Lepi is calculated when ysi and Lai are available, at the time of the branch metrics, because this saves clock cycles when computing Lei . Not doing it at that time would require accessing the data-in RAM buffer and the RAM La/Le-Le/La again. Besides, the access would have to be done through the interleaving/deinterleaving unit, which might be in use. The calculation of Lei is done in the following way: the recursive unit outputs ρi , which is actually the magnitude of Λi , and the bit mi gives the sign of Λi . Since a two's complement representation is used, the bit mi indicates whether to complement ρi or not. Then we have:

Lei = ρi + (0 − Lepi )        mi = 1
Lei = not(ρi ) + (1 − Lepi )    mi = 0

The operation in parentheses is done first and its result is delayed until ρi comes out of the recursive unit. This allows the combinational delays to be distributed among the registers. The resulting Lei is stored in the RAM La/Le-Le/La, depending on the decoder.
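As a sanity check, the two branches of this computation can be sketched in a few lines of Python. The function name and the bus width are our own choices; the hardware operates on fixed-width two's complement buses:

```python
def extrinsic(rho_i, m_i, lep_i, width=8):
    """Sketch of the Le_i computation described above.

    rho_i : reliability magnitude from the recursive unit (unsigned)
    m_i   : estimated message bit, gives the sign of Lambda_i
    lep_i : term pre-computed at branch-metric time and stored in RAM
    """
    mask = (1 << width) - 1
    if m_i == 1:
        # Lambda_i = +rho_i  ->  Le_i = rho_i + (0 - Lep_i)
        le = (rho_i + ((0 - lep_i) & mask)) & mask
    else:
        # Lambda_i = -rho_i; in two's complement -rho = not(rho) + 1,
        # hence Le_i = not(rho_i) + (1 - Lep_i)
        le = ((~rho_i & mask) + ((1 - lep_i) & mask)) & mask
    # interpret the bus value as a signed two's complement number
    return le - (1 << width) if le >= (1 << (width - 1)) else le
```

In both branches the result equals Λi − Lepi with Λi = ±ρi , which is the extrinsic information the unit stores back into the RAM.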
The recursive updating unit is shown in figure 4.26. This unit updates the bit reliabilities by managing all the competing paths at once. In the scheme there is a set of registers that holds the different ∆ of each state. These ∆ are propagated to the corresponding previous states by the pair of multiplexers and the reverse trellis topology connection, according to the trellis decision vector. Moving the ∆ in this way is actually the trace back of the competing paths; its similarity to the ACSU recursive procedure can be observed. Whenever two competing paths merge, the one with the minimum ∆ is kept. At each stage, the decision bits, along with the estimated message bit, drive the multiplexers that select the relevant ∆. The minimum among these relevant ∆ is the resulting bit reliability.

[Figure 4.25: Fusion Points based Reliability Updating Unit]

[Figure 4.26: Recursive Updating Unit]

[Figure 4.27: Trace back example over the trellis connection topology, time instants i = 0 … 10; ρ is the minimum of the traced competing-path ∆]
In order to clarify how the recursive unit works, we will introduce the example of figure 4.27. The set of registers from figure 4.26 will hold the colored ∆ of figure 4.27. When the updating process is launched, the registers are set to ∆MAX .
• The unit begins at the time instant i = 10. The orange ∆ is fed into the system through the multiplexer of state 1. At the same time, a minimizing process is started with this orange ∆ and the remaining ∆ of the registers. The orange ∆ is sent to state 3.
• At time i = 9, the blue ∆ is fed into the system through the multiplexer of state 2. The orange ∆ from state 3 and the blue ∆ from state 2 participate in the minimizing process. The blue ∆ is sent to state 1, while the orange ∆ is sent to state 3 again.
• At time i = 8, the fuchsia ∆ is fed into the system through the multiplexer of state 0. Now there are three ∆ participating in the minimizing process. Finally the orange, blue and fuchsia ∆ are sent to state 2, state 3 and state 0 respectively.
• The remaining steps proceed in the same way. Note that at time i = 6, in state 2, two competing paths merge. For this example the blue ∆ is assumed to be less than the turquoise ∆, and that is why it is kept.
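The trace back of the competing ∆ can be mimicked in software. The sketch below is a deliberate simplification with our own data structures: it takes the minimum over all the registers instead of selecting only the relevant ∆ with the decision bits, and the step description is hypothetical:

```python
DELTA_MAX = float("inf")

def recursive_update(steps, n_states=4):
    """steps is processed backwards in time; each entry is
    (surv_state, delta_in, prev_state_of): the state where a new
    competing path leaves the survival path, its metric difference,
    and a map from each state to its previous state in the trellis."""
    reg = [DELTA_MAX] * n_states     # one Delta register per state
    rho = DELTA_MAX
    for surv_state, delta_in, prev_state_of in steps:
        # the new competing Delta enters through the survival state
        reg[surv_state] = min(reg[surv_state], delta_in)
        # minimizing process over the Deltas currently traced back
        rho = min(rho, *reg)
        # propagate every Delta to its previous state; when two
        # competing paths merge, the minimum Delta is kept, just as
        # the ACSU keeps the best path metric
        nxt = [DELTA_MAX] * n_states
        for s in range(n_states):
            nxt[prev_state_of[s]] = min(nxt[prev_state_of[s]], reg[s])
        reg = nxt
    return rho
```

With the ∆ values 3, 1 and 7 of a toy three-step trace back, the unit returns 1, the minimum surviving ∆, which is the behavior of the minimizing process in the example above.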
Before moving on to the next section it is important to discuss some throughput issues. In figure 4.24 we can see that the RUU updates some distance before it can release the final bit reliabilities. If we think of the time distance between fusion points as a random variable with mean D̄, then the RUU processes 2D̄ time instants for each FP detected by the FPU. This means that the FIFO input data rate will be higher than the FIFO output data rate and the FIFO will fill up. When the FIFO is full, the RUU misses some FPs; however, this is not as bad as it seems, since the algorithm that manages the FPs remains valid.
Let us denote by DR the number of bits that remain to be updated when the ACSU reaches the end of the frame. Then the throughput of the SOVA SISO can be estimated by

THSISO = [L / (L + DR )] · f [bps]    (4.1)

where L is the frame size and f is the frequency of the system. It is straightforward that, if we want to increase the throughput of the system, DR should be reduced. This can be achieved by increasing the working frequency of the RUU, so that it processes more FPs per time unit and fewer bits remain to be updated at the end of the frame.
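Equation (4.1) is easy to evaluate numerically; the following sketch uses our own function name and illustrative numbers:

```python
def siso_throughput(frame_len, d_remaining, f_clk):
    """Equation (4.1): TH_SISO = L / (L + D_R) * f, in bps."""
    return frame_len / (frame_len + d_remaining) * f_clk

# e.g. a 1024-bit frame with 256 bits left to update at the end of
# the frame runs at 80% of a 25 MHz clock
th = siso_throughput(1024, 256, 25e6)   # -> 20.0 Mbps
```

Reducing DR (for instance by raising fRUU) pushes the throughput toward f , which is exactly the fraction of the system clock measured in figures 6.8 to 6.13.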
4.8 Control Unit

We finally present the design of the control unit, which is basically a finite state machine that delays and synchronizes the modules. Figure 4.28 shows the scheme. There are two counters: one is responsible for the frame address count, and the other for the iteration count. The iteration counter is first loaded with the number of iterations that the user indicates. Figure 4.29 shows the state diagram that the entire system goes through. Once the user drives the go signal high, the system begins to work. It first initializes the units and progressively activates the corresponding modules before settling down in the decoding state. Once the end of the decoding process is reached, the system checks whether there is an iteration left or not.

[Figure 4.28: Control unit scheme]

[Figure 4.29: System state diagram: Idle → Initializing Modules → Decoding → Finishing, driven by the go, Frame Finished and Iters Finished signals]
4.9 Improvements

The most common implementations of the SOVA decoder only update the bit reliabilities by the HR-SOVA rule that was described in 3.2.2. A BR-SOVA updating rule would be desirable, since it has been proved in [5] that the max-log-map algorithm and the BR-SOVA are equivalent, and the max-log-map algorithm performs better in terms of BER than the HR-SOVA. However, the BR-SOVA updating rule requires the knowledge of the bit reliabilities of the competing paths, which implies a higher complexity in the decoder. This is the reason why we do not implement a strict BR-SOVA; instead, we approximate its behavior by introducing a bound for the bit reliability of the competing path, as shown below.

The BR-SOVA updating rule and the HR-SOVA updating rule are the same when the estimated bit and the competing bit are different. In contrast, the two rules diverge when the estimated bit and the competing bit coincide: the HR-SOVA leaves ρj untouched, whereas the BR-SOVA updates it with min(ρj , ∆i,k + ρcj ), where ρcj is the reliability of the competing bit.
That is why we can think of the HR-SOVA as a BR-SOVA with an unbounded ρcj . The improvement proposed in this work is to bound ρcj with a known value. When working with an RSC binary code, the two incoming branches at any state of a trellis diagram are associated with different message bits. Therefore, the ∆ difference between the path metrics is actually a bound for the reliability of those message bits. The resulting updating rule becomes:

ρj ⇐ min(ρj , ∆i,k )          m̂i ≠ ci
ρj ⇐ min(ρj , ∆i,k + ∆cj )    m̂i = ci

where ρj is the reliability of bit j; ∆i,k is the path metric difference between the competing path and the survival path; m̂i is the estimated message bit; ci is the message bit associated with the competing path; and ∆cj is the path metric difference, at each state at time j, along the competing path.
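Both rules can be stated compactly in software. The sketch below uses our own names and treats the reliabilities as plain numbers:

```python
def hr_update(rho_j, delta_ik, bits_differ):
    # HR-SOVA: update only when estimated and competing bits differ
    return min(rho_j, delta_ik) if bits_differ else rho_j

def br_approx_update(rho_j, delta_ik, delta_jc, bits_differ):
    # Proposed rule: when the bits coincide, the competing bit's
    # reliability is bounded by delta_jc instead of being unbounded
    if bits_differ:
        return min(rho_j, delta_ik)       # identical to HR-SOVA
    return min(rho_j, delta_ik + delta_jc)
```

Setting delta_jc to infinity makes br_approx_update collapse into hr_update, which is precisely the "unbounded ρcj" reading of the HR-SOVA given above.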
Figures 4.30 and 4.31 show the modified RUU and the modified recursive updating unit, respectively; they allow the previous rule to be executed. Note that the main difference is the handling of all the ∆, since they represent the bounds for the competing bit reliabilities.
[Figure 4.30: Modified Fusion Points based Reliability Updating Unit. All the ∆i,k must be available to the recursive update; ∆i+1,k is the bound for ρci+1 ]

[Figure 4.31: Modified Recursive Updating Unit]
Methodology
The whole practical design process was carried out with the aid of powerful software tools. Three tools were mainly employed in this thesis:
• Matlab 7.1. The mathematics software package Matlab was extensively used in the simulation and verification of the design. It was employed to model the whole communication system: encoder, channel, receiver and decoder. We also used Matlab for the HIL (Hardware In the Loop) verification of the design, which was carried out by establishing a serial port communication with an interface circuit specifically developed for testing purposes.
• Xilinx ISE 8.2. The synthesis software package of Xilinx, ISE 8.2, was used in all the tasks related to the implementation, specifically the mapping, translation, placement and routing, along with the back annotation and the static timing analysis. The FPGA programmer iMPACT is also included in this package; it was used to download our design into the Xilinx Spartan III FPGA.
• ModelSim 6.1. The VHDL code and the Post-Place and Route models were simulated with this tool.
Figure 5.1 summarizes the work flow. Five steps have taken place, with some feedback between them. On the rightmost part we have the fundamental stages of this process, whereas on the leftmost part the verification tasks associated with each stage are displayed. The blue boxes show the main tool employed in each task. We now give a description of the stages of the process:
• Specification. The specification of this work consisted of the design and implementation of a SOVA based Turbo Decoder.
• High Level Design. A high level model was programmed using the software tool MATLAB 7.1. This model allowed us to try the system in different environments and also to fine tune the design specifications of the previous stage.
[Figure 5.1: Work flow. Stages: information gathering, design specifications, VHDL implementation and VHDL synthesis; verification tasks: behavioral verification (ModelSim), Post-Place & Route model verification (ModelSim) and in-circuit verification (Matlab)]
• VHDL Implementation. Once we were familiar with all the concepts related to the decoding algorithm, we started to work on the structure of the datapath. It was described in VHDL code and all the combinational modules were verified by appropriate test benches in ModelSim. After the datapath was totally defined, we began to specify the control needs of our system and the way it would communicate with the exterior; subsequently we gradually defined the whole system.
• VHDL Synthesis. After a VHDL functional model was achieved, the synthesis was carried out. The targeted device was a Spartan 3 X3S200FT256. The system was first verified by a Post-Place and Route model. Later, the FPGA was programmed with the iMPACT tool for in-circuit verification. Figure 5.2 illustrates the approach employed for this purpose, while figure 5.3 shows the followed procedure. The serial port baud rate was set to 115200 bps.
[Figure 5.2: HIL verification scheme. Matlab models the source, channel coding, BPSK modulation, the discrete AWGN channel, the sink and the BER calculation; the FPGA performs the channel decoding through the interface unit]

[Figure 5.3: HIL verification procedure]
The system presented in chapter 4 was described using VHDL (Very High Speed Integrated Circuits Hardware Description Language). A generic and parameterizable VHDL code was written. A VHDL package includes the frame size, the quantization scheme, the polynomials of the code, and the SOVA algorithm mode (HR-SOVA or BR-SOVA approximation). The system can be configured through this package before the synthesis is performed. The targeted device was a general purpose Xilinx FPGA, Spartan 3 X3S200FT256.

All the tests have been done for two major polynomial pairs. One is the pair we have been using throughout this work, Pf b = [111], Pg = [101]. The other pair is the UMTS polynomial pair, Pf b = [1011], Pg = [1101]. The size of the data frame has been set to 1024 bits and is the same for all simulations and syntheses. The depth of the RUU FIFO has been set to 16 FPs. We have employed two types of interleavers. One of the interleavers is given in [14] —from now on, MCF— and is described by a closed-form set of equations.
[Figure 6.1: ∆ quantization effect on the system BER performance (schemes 4:2, 6:2 and 8:2). BR-SOVA approximation. Simulation with quantization. MCF. Pf b = [111], Pg = [101]]
The only quantization study that has been carried out concerns the path metric difference ∆, which has a significant impact on the system BER performance. Figure 6.1 shows the BER curve against the received signal SNR. It is observed that, for the current example, the 4:2 scheme is better than the 6:2 and 8:2 schemes. This behavior has been reported in [11] as a method of improving the system BER performance: since quantization saturates the ∆, the overoptimistic values of the bit reliabilities are lessened and, consequently, the system BER performance improves. Note that adopting the reduced quantization scheme yields further benefits: the RAM that stores the data from the ACSU is reduced, and the logic related to the RUU is also reduced.
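A saturating quantizer of this kind can be sketched as follows. We assume the n:m notation of figure 6.1 means n total bits with m fractional bits; the actual scheme is defined in the VHDL package:

```python
def quantize_delta(delta, total_bits=4, frac_bits=2):
    """Saturating unsigned quantization of the path metric
    difference Delta (Delta is non-negative by construction)."""
    step = 1.0 / (1 << frac_bits)          # resolution
    max_code = (1 << total_bits) - 1       # saturation level
    code = min(int(round(delta / step)), max_code)
    return code * step
```

Under this reading, the 4:2 scheme clips any ∆ above 3.75, which is precisely the saturation effect that lessens the overoptimistic bit reliabilities.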
Tables 6.2 and 6.3 present the synthesis results for the short pair of polynomials and the
UMTS polynomials respectively. Both pairs of polynomials were synthesized with the
quantization scheme given in table 6.5.
Note that the BR-SOVA approximation spends almost the same amount of resources as the HR implementation. In contrast, the amount of used resources increases significantly when working with the UMTS polynomials. This is due to the fact that the UMTS encoder has twice the number of states.
Table 6.4 shows the maximum frequencies that the system can attain. When working with the pair of short polynomials, the system can reach up to 85 MHz. The critical path is located in the ACSU and is related to the add, compare, select and ∆ quantization delays. On the other hand, when working with the UMTS pair of polynomials, the maximum clock frequency suffers a considerable degradation. This is due to the excessive combinational logic that the FPU requires for an eight-state code. The optimization of these units should be considered as future work.

[Figure: BER against Eb/No for the HR implementation and the BR-SOVA approximation, iterations 1, 3, 5 and 8, together with the max-log-map at iteration 8]
Figure 6.4 shows the real system BER performance when implementing the BR-SOVA approximation for the short pair of polynomials. For low SNRs, the real decoder performs worse than the floating point simulation; for high SNRs the opposite situation is observed. Note that, for the BR-SOVA approximation, the BER performance of the real decoder is about the same as that of the floating point simulation: the ∆ quantization does not improve the BER as much as in the HR implementation.

The comparison between the HR-SOVA implementation and the BR-SOVA approximation implementation is shown in figure 6.5. The figure also shows a partial plot of a quantized max-log-map algorithm. We observe that, in the worst case, the HR-SOVA is 0.14 dB from the BR-SOVA approximation, and the latter is only 0.1 dB from the quantized implementation of the max-log-map.

Finally, figures 6.6 and 6.7 show some partial results of the BER performance with the UMTS polynomials and the randomly generated interleaver.
[Figure 6.5: BER against Eb/No. HR Iter 8 HIL, BRap Iter 8 HIL, Max-log-map Iter 8 Quant., Max-log-map Iter 8 Inf.Pre]

[Figure 6.6: BER against Eb/No. HR and BRap, iterations 1, 3, 5 and 8]

[Figure 6.7: BER against Eb/No. BRap Inf.Pre against BRap HIL, iterations 1, 5 and 8]
The following histograms show the number of observations against the SISO throughput, expressed as a fraction of the system clock frequency:

[Figure 6.8: Throughput statistics (≈ 0.52–0.61). f = 25 MHz, fRUU = 25 MHz. Pf b = [111], Pg = [101]]

[Figure 6.9: Throughput statistics (≈ 0.82–0.90). f = 25 MHz, fRUU = 50 MHz. Pf b = [111], Pg = [101]]

[Figure 6.10: Throughput statistics (≈ 0.95–0.985). f = 16.66 MHz, fRUU = 25 MHz. Pf b = [111], Pg = [101]]

[Figure 6.11: Throughput statistics (≈ 0.52–0.62). f = 25 MHz, fRUU = 25 MHz. Pf b = [1011], Pg = [1101]]

[Figure 6.12: Throughput statistics (≈ 0.82–0.94). f = 25 MHz, fRUU = 50 MHz. Pf b = [1011], Pg = [1101]]

[Figure 6.13: Throughput statistics (≈ 0.935–0.98). f = 16.66 MHz, fRUU = 50 MHz. Pf b = [1011], Pg = [1101]]
[Table 6.7: Estimated power consumption. BR approximation. f = 25 MHz, fRUU = 50 MHz]
Chapter 7
We have designed a complete Turbo Decoder based on the SOVA algorithm. For this purpose we have introduced a new algorithm for performing the SOVA decoding and we have designed the architecture that implements it. The resulting design is not affected by the D-U trade-off and it achieves an optimum SOVA execution. We have also introduced a modification to the previous architecture that approximates the BR-SOVA. The resulting BER of this last scheme is 0.1 dB from that of a comparable Max-Log-Map algorithm.
As future work, the following key points are proposed:
• The system throughput is affected by the management of the fusion points. Different
schemes should be studied with the aim of improving the resulting throughput. For
example, a LIFO memory could be employed instead of a FIFO at the input of the
RUU.
• The power consumption of the system could be reduced by properly selecting the FPs that launch the reliability updating process. This way, a long updating-without-releasing process can be avoided.
• The critical path of the system, for the UMTS polynomials, resides inside the FPU.
Optimization strategies should be analyzed in order to reduce the combinational
delays.
[1] Sorin Adrian Barbulescu. What a wonderful turbo world ... E-book, 2004.
[2] G. Battail. Pondération des symboles décodés par l'algorithme de Viterbi. Ann. Télécommun., 42:31–38, January 1987.
[4] Gennady Feygin and P. G. Gulak. Architectural Tradeoffs for Survivor Sequence Memory Management in Viterbi Decoders. IEEE Transactions on Communications, 41:425–429, March 1993.
[5] Marc P. C. Fossorier, Frank Burkert, Shu Lin, and Joachim Hagenauer. On the Equivalence Between SOVA and Max-Log-Map Decoding. IEEE Communications Letters, 2(5), May 1998.
[6] David Garrett and Mircea Stan. Low Power Architecture of the Soft-Output Viterbi
Algorithm. Low Power Electronics and Design, 1998. Proceedings. 1998 International
Symposium on, pages 262–267, August 1998.
[7] Joachim Hagenauer and Peter Hoeher. A Viterbi Algorithm with Soft-Decision Out-
puts and its Applications. Proc. GLOBECOM IEEE, 3:1680–1686, November 1989.
[8] Pablo Ituero Herrero. Implementation of an ASIP for Turbo Decoding. Master’s
thesis, KTH, May 2005.
[9] Olaf Joeressen, Martin Vaupel, and Heinrich Meyr. High-Speed VLSI Architectures for Soft-Output Viterbi Decoding. Proc. IEEE ICASAP'92, Oakland, California, pages 373–384, August 1992.
[11] Lang Lin and Roger S. Cheng. Improvements in SOVA-Based Decoding For Turbo
Codes. Communications, 1997. ICC 97 Montreal, ’Towards the Knowledge Millen-
nium’. 1997 IEEE International Conference on, 3:1473–1478, June 1997.
[12] Lutz Papke and Patrick Robertson. Improved Decoding with the SOVA in a Parallel
Concatenated (Turbo-code) Scheme. IEEE International Conference on Communi-
cations, Conference Record, Converging Technologies for Tomorrow’s Applications.,
1:102–106, June 1996.
[13] C. B. Shung, P. H. Siegel, G. Ungerboeck, and H. K. Thapar. VLSI Architectures for
Metric Normalization in the Viterbi Algorithm. Communications, 1990. ICC 90, In-
cluding Supercomm Technical Sessions. SUPERCOMM/ICC ’90. Conference Record.,
IEEE International Conference on, 4:1723–1728, April 1990.
[15] T. K. Truong, Ming-Tang Shih, Irving S. Reed, and E. H. Satorius. A VLSI Design for
a Trace-Back Viterbi Decoder. Communications, IEEE Transactions on, 40:616–624,
March 1992.
[16] Matthew C. Valenti. Iterative Detection and Decoding for Wireless Communications.
PhD thesis, Virginia Polytechnic Institute and State University, July 1999.
[17] Yan Wang, Chi-Ying Tsui, and Roger S. Cheng. A Low Power VLSI Architecture of SOVA-based Turbo-code Decoder using a Scarce State Transition Scheme. IEEE International Symposium on Circuits and Systems, Geneva, Switzerland, 00:00–00, May 2000.
[18] Zhongfeng Wang. High Performance, Low Complexity VLSI Design of Turbo De-
coders. PhD thesis, University of Minnesota, September 2000.