Channel Coding

Lecture Notes University of Ulm
Dr.-Ing. Ralph Jordan
(unfinished working manuscript, May 2003)
The copyright lies with the author. This copy is only for personal use. Any reproduction, publication or further distribution requires the agreement of the author.
Contents
I Introduction
  1 A Digital Communication System
  2 A First Encounter: Block Coding and Trellis Coding
    2.1 Simple Block Coding Schemes
    2.2 Convolutional Coding: An Example
    2.3 Coding Gain
  3 The Additive White Gaussian Noise Channel
  4 To the History of Channel Coding

II Block Coding
  5 Linear Binary Block Codes
    5.1 Structural Properties
    5.2 Distance Properties
      5.2.1 Basic Definitions
      5.2.2 Bounds on Minimum Distance
    5.3 Important Code Families
      5.3.1 Simplex Codes
      5.3.2 Hadamard Codes
      5.3.3 Reed-Muller Codes
    5.4 Decoding Aspects
      5.4.1 Decoding Rules
      5.4.2 Trellis Representation of Block Codes
      5.4.3 Efficient Optimum Decoding Techniques
      5.4.4 Bounds on Performance
      5.4.5 Suboptimum Decoding Techniques
  6 Cyclic Codes
    6.1 Structural Properties
      6.1.1 A First Encounter
      6.1.2 Finite Field Arithmetic
      6.1.3 Roots of Cyclic Codes
    6.2 Distance Properties
    6.3 Important Code Families
      6.3.1 BCH Codes
      6.3.2 Golay Code
      6.3.3 Reed Solomon Codes
      6.3.4 Quadratic Residue Codes
    6.4 Decoding Aspects
      6.4.1 Algebraic Decoding Techniques
      6.4.2 Bounds on Performance
      6.4.3 Implementation Aspects

III Trellis Coding
  7 Linear Binary Convolutional Codes
    7.1 Structural Properties
      7.1.1 Code, Generator Matrix, and Encoder
      7.1.2 Minimal Encoder Realization
      7.1.3 Minimal Generator Matrix
    7.2 Distance Properties
      7.2.1 State Diagram and Transition Matrix
      7.2.2 Weight Enumerators
      7.2.3 Active Distances
    7.3 Important Code Families
      7.3.1 OFD Codes
      7.3.2 MS Codes
      7.3.3 ODP Codes
      7.3.4 QLI and ELI Codes
      7.3.5 Punctured Codes
    7.4 Decoding Aspects
      7.4.1 Trellis Representation of Convolutional Codes
      7.4.2 Efficient Optimum Decoding Techniques
      7.4.3 Bounds on Performance
      7.4.4 Suboptimal Decoding Techniques
  8 Trellis Coded Modulation

IV Code Concatenation

A Information: A Quantitative Measure
References
Part I
Introduction
Chapter 1
A Digital Communication System

In 1948 Claude E. Shannon published one of the most remarkable papers in the history of engineering: A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27, July and October 1948, pp. 379-423 and pp. 623-656. Rather than working with complex physical models, he dealt with simple mathematical models, which allowed him to gain insight into the most fundamental problems of information transmission. This was the beginning of information theory. Shannon defined information as a quantitative measure. He recognized that information is something that is received by observing a random variable. He concluded that communication systems should not be built to reproduce the waveform of the transmitted signal but should be built to transmit random quantities. This is exactly what a digital communication system does. The reader not familiar with the concept of random variables might like to read Appendix A, which gives a brief introduction to discrete probability theory.
[Figure 1.1: Block scheme of a digital communication system. Blocks: source, source encoder, channel encoder, channel, channel decoder, source decoder, destination. The source encoder and decoder perform source coding; the channel encoder and decoder perform channel coding.]

Figure 1.1 shows the information theoretic aspects of a digital communication system. Here and in the following we will consider mathematical and not physical models:

Source. The source generates a sequence of messages $M$ which are taken, with probability $f_M(m)$, from a set of possible messages $\mathcal{M}$. $M$ is a random variable and $f_M(m)$ is its probability function. The entropy of $M$ is defined as

$$H(M) = -\sum_{m \in \mathcal{M}} f_M(m) \log_2 f_M(m). \tag{1.1}$$
This is the (average) information that is obtained by observing one output of the source and H(M ) is often called the information rate of the source.
The simplest source we can think of is a binary discrete memoryless source (DMS). This generates a sequence of statistically independent and identically distributed (iid) binary random variables. The entropy of the binary DMS is

$$H(M) = -p \log_2(p) - (1-p) \log_2(1-p) \triangleq h(p) \tag{1.2}$$

where $f_M(0) = p$ and $f_M(1) = 1-p$ are the probabilities of the binary outputs. The binary entropy function $h(p)$ is depicted in Figure 1.2. Since $h(p)$ is symmetric about $p = 1/2$, it is sufficient to consider $0 \le p \le 1/2$. We do not receive any information by observing the output of the binary DMS with $p = 0$ and get the maximal information, 1 bit/symbol (see footnote 1), for $p = 1/2$. For $0 < p < 1/2$ the source generates symbols that carry an information of less than 1 bit/symbol.

[Figure 1.2: Information rate of the binary DMS: binary entropy function $h(p)$ plotted over $p$.]

Source Encoder. The source encoder tries to remove the redundancy from its input sequence. The realization of the source encoder can require considerable computational complexity, which is the limiting factor in the application of source coding.

Channel Encoder. The channel encoder adds redundancy to the sequence. For a block of $k$ binary inputs it generates a block of $n$, $n \ge k$, binary outputs. We call $R = k/n$ the code rate. The redundancy can be used to correct errors that occur during transmission. Usually, the channel encoder can be realized with low complexity.

Channel. The channel is a probabilistic device that is fully described by the conditional probability function $f_{Y|X}(y|x)$ as shown in Figure 1.3. Here and in the following we will assume the channel to be memoryless. This means that an output at time $t$ only depends on the input at time $t$. Then
$f_{Y|X}(y|x)$ is the probability of having $y$, taken from the symbol alphabet $\mathcal{Y}$, at the output given $x$, taken from the symbol alphabet $\mathcal{X}$, at the input.

[Figure 1.3: Mathematical channel model: input $x$, channel $f_{Y|X}(y|x)$, output $y$.]

Footnote 1: Here we use the term bit as a unit of information. The same term is often used for binary symbols, which is not the case here!

Shannon introduced an important parameter which he called the channel capacity

$$C = \max_{f_X(x)} \big( H(Y) - H(Y|X) \big) \tag{1.3}$$

where

$$H(Y) = -\sum_{y} f_Y(y) \log_2 f_Y(y), \qquad f_Y(y) = \sum_{x} f_X(x)\, f_{Y|X}(y|x),$$

is the entropy of the channel output $Y$ and

$$H(Y|X) = -\sum_{x} f_X(x) \sum_{y} f_{Y|X}(y|x) \log_2 f_{Y|X}(y|x) \tag{1.4}$$

is the conditional entropy of $Y$ given that $X$ is known. The channel capacity is the maximum (average) information that can be transmitted over the channel per channel use.

The simplest channel we can think of is a discrete memoryless channel (DMC) with binary input and binary output symbols and symmetric transition probabilities. This is often called the binary symmetric channel (BSC). It is depicted
in Figure 1.4. The crossover probability $p = f_{Y|X}(1|0) = f_{Y|X}(0|1)$, $0 \le p \le 1/2$, is the probability that the BSC changes its input symbol. Clearly, $f_{Y|X}(0|0) = f_{Y|X}(1|1) = 1-p$. The channel capacity of the BSC is

$$C = 1 - h(p) \tag{1.5}$$

where $h(p)$ is the binary entropy function. It is depicted in Figure 1.5.

[Figure 1.4: Binary symmetric channel. Transitions $x=0 \to y=0$ and $x=1 \to y=1$ occur with probability $1-p$; transitions $x=0 \to y=1$ and $x=1 \to y=0$ occur with probability $p$.]

[Figure 1.5: Channel capacity of the BSC plotted over $p$: linear scale (left), double logarithmic scale (right).]
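The binary entropy function of eq. (1.2) and the BSC capacity of eq. (1.5) are easy to evaluate numerically. The following is a minimal Python sketch; the function names `binary_entropy` and `bsc_capacity` are illustrative choices and do not come from the lecture notes.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy function h(p) of eq. (1.2), in bits."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log2(0) = 0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p: float) -> float:
    """Capacity C = 1 - h(p) of the BSC, eq. (1.5), in bits per channel use."""
    return 1.0 - binary_entropy(p)


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.1, 0.5):
        print(f"p = {p:4.2f}  h(p) = {binary_entropy(p):.4f}  C = {bsc_capacity(p):.4f}")
```

For $p = 1/2$ the sketch returns $h(p) = 1$ and $C = 0$, in line with the observation that a BSC with $p = 1/2$ conveys no information.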
Channel Decoder. The channel decoder tries to recover the transmitted information. In general, this is a difficult task. One possible approach is to find the most probable transmitted code word. This is easy to state mathematically, but we will see that its realization is often too complex to be implemented.

Source Decoder. The source decoder performs the inverse mapping of the source encoder. Usually, it can be realized with low complexity.

Destination. This is the sink of the transmitted information.
In his landmark paper Shannon showed the following:

Theorem 1.1 (source coding theorem) Any source is characterized by its information rate $H$. Its output can be represented by a sequence of $R > H$ binary symbols per source symbol. It is not possible to represent it by a binary sequence of $R < H$ symbols per source symbol.

Theorem 1.2 (channel coding theorem) Any communication channel is characterized by its channel capacity $C$. There exists a coding method that allows us to transmit reliably over the channel with a code rate $R < C$, i.e., with arbitrarily small error probability $P(\hat{u} \neq u)$. It is not possible to communicate reliably over the channel with code rates $R > C$.
[Figure 1.6: Channel coding theorem. The information $u$ enters a channel encoder of rate $R$, the encoder output $x$ passes through a channel of capacity $C$, and the channel decoder maps the channel output $y$ to the estimate $\hat{u}$.]

One consequence of the theorems above is that source and channel coding can be done separately. At the time when Shannon formulated the channel coding theorem it was widely believed that, to obtain reliable communication, the transmit power must be increased (which, as Shannon showed, increases the channel capacity). Shannon's statement that communication with arbitrarily small error probability is possible with a finite transmit power was revolutionary and hard to grasp for many engineers of that time. Moreover, Shannon's proof of the coding theorem was an existence proof. He did not present a particular coding scheme that was able to achieve reliable communication at code rates $R$ near capacity $C$; he just proved that such a coding scheme exists. This sparked intensive research and soon channel coding became a scientific field of its own.

Nowadays, channel coding is an integral part of digital communication systems. The main applications are

- data storage systems, e.g., optical (CD, DVD, etc.) and magnetic (HDD, etc.)
- communication systems, e.g., wireless (GSM, UMTS, etc.) and high-speed (DSL, etc.)

Although the physical channels of all these systems are quite different, they all apply coding systems that were designed for and tested with simple mathematical channel models such as the BSC and the additive white Gaussian noise channel.
Chapter 2
A First Encounter: Block Coding and Trellis Coding

Channel coding developed in two directions, viz., block coding and convolutional coding. Here we will present the basic concepts of both methods.
2.1 Simple Block Coding Schemes
In the following we will introduce three families of block codes, namely, repetition codes, single parity check codes, and Hamming codes. Thereby we will explain the basic idea behind block coding. A simple channel coding system is to transmit each binary information symbol three times. The corresponding block code

$$C = \{000, 111\} \tag{2.1}$$

is the set of two code words $c = 000$ and $c' = 111$. We denote the code length by $n = 3$ and the code dimension by $k = \log_2 |C| = 1$. Then, the code rate is given by the fraction

$$R = \frac{k}{n} = \frac{1}{3} \tag{2.2}$$

and $n - k = 2$ is the redundancy per transmitted information symbol. Assume $C$ is used to communicate over the BSC with crossover probability $p$ and let $r = 100$ be the channel output. Since $0 < p < 1/2$, we have $P(r|c) = p(1-p)^2$ and $P(r|c') = p^2(1-p)$, and it is more probable that $c$ was sent than that $c'$ was sent. Obviously, we obtain the same result with the following decoding method: Count the number of positions in which $r$ and $c$ differ and in which $r$ and $c'$ differ. Then decide on the code word which differs in fewer positions. We define the Hamming distance between two vectors as the number of positions in which they differ and write

$$d_H(r, c) = 1 \quad \text{and} \quad d_H(r, c') = 2. \tag{2.3}$$

Now we can mathematically formulate a decoding rule by

$$\hat{c} = \arg\min_{c \in C} \{ d_H(r, c) \} \tag{2.4}$$

where $\hat{c}$ denotes the estimated code word, which determines the estimated information word $\hat{u}$. Later we will see that probabilities and distances are related to each other by the so-called metric. A metric allows us to measure distances in a vector space. We define the minimum distance of a block code as the minimum Hamming distance between all pairs of distinct code words in the code, i.e.,

$$d_m = \min_{\substack{c, c' \in C \\ c \neq c'}} \{ d_H(c, c') \}. \tag{2.5}$$
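The decoding rule of eq. (2.4) and the minimum distance of eq. (2.5) can be sketched in a few lines of Python. This is a brute-force illustration for small codes only; code words are represented as strings of '0' and '1', and the function names are assumptions made for this example.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which the equal-length words a and b differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code: list[str]) -> int:
    """Minimum Hamming distance over all pairs of distinct code words, eq. (2.5)."""
    return min(hamming_distance(c1, c2) for c1, c2 in combinations(code, 2))


def decode_min_distance(r: str, code: list[str]) -> str:
    """Code word closest to r, eq. (2.4); ties are resolved in favor of the first word."""
    return min(code, key=lambda c: hamming_distance(r, c))


if __name__ == "__main__":
    code = ["000", "111"]                    # repetition code of eq. (2.1)
    print(decode_min_distance("100", code))  # estimate for the channel output r = 100
    print(minimum_distance(code))
```

The search visits every code word, so its effort grows with $|C| = 2^k$; this is the exponential complexity that motivates the more structured decoders discussed later.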
From our decoding rule it immediately follows that a code with minimum distance $d_m$ can correct $t$ errors, where

$$t \le \left\lfloor \frac{d_m - 1}{2} \right\rfloor. \tag{2.6}$$
Here, $d_m = 3$ and it follows, as we have already seen, that $C$ can correct $t = 1$ error. Now we are ready to define our first code family.

Definition 2.1 (repetition code) The length $n$ repetition code is the set of two code words consisting of the all-zero code word $c = 00 \ldots 0$ and the all-one code word $c' = 11 \ldots 1$.

Often the code parameters $(n, k, d_m)$ are used to distinguish between different block codes, even if the code parameters do not uniquely determine the code. Clearly, the code parameters $(n, 1, n)$ of the repetition code are unique, i.e., there does not exist another code with the same code parameters. The rate of the repetition code decreases with the length of the code. This is a serious restriction, and repetition codes cannot be used in many applications.

Single parity check codes are another important code family. Before we define them, we need to introduce the modulo 2 addition of two binary symbols. This is given in Table 2.1.

Table 2.1: Modulo 2 addition table.

  + | 0  1
 ---+------
  0 | 0  1
  1 | 1  0

In digital circuit design the corresponding device consisting of two inputs and one output is called a logical xor gate. If not mentioned otherwise, all additions are considered to be modulo 2. Now let us consider a length $n = 3$ code where each code word encodes $k = 2$ information bits $u_0 u_1$ such that $c = u_0\, u_1\, (u_0 + u_1)$. Hence, we have $C = \{000, 011, 101, 110\}$ and the mapping of information words to code words is given by:

  u0 u1 | c0 c1 c2
 -------+----------
   0  0 |  0  0  0
   0  1 |  0  1  1
   1  0 |  1  0  1
   1  1 |  1  1  0
The code parameters are $(3, 2, 2)$. The minimum distance is $d_m = 2$ and the code cannot correct a single error! Nevertheless, it enables us to detect a single error. We define the Hamming weight $\mathrm{wt}_H(c)$ of a vector $c$ as the number of its non-zero elements. We call $c_2$ the parity check symbol and obtain from $c_2 = u_0 + u_1$ that $c \in C$ implies

$$c_0 + c_1 + c_2 = 0. \tag{2.7}$$

Hence, all code words in $C$ have even weight. Assume we have received $r = r_0 r_1 r_2$; then a simple method to detect one or three errors in $r$ is given by the following decoding rule:

$$r_0 + r_1 + r_2 = \begin{cases} 0 & \Rightarrow\ \text{0 or 2 errors in } r \\ 1 & \Rightarrow\ \text{1 or 3 errors in } r. \end{cases} \tag{2.8}$$

Since we are not able to detect two errors, the code is said to be single error detecting. We can rewrite eq. (2.7) as

$$c H^T = 0 \quad \forall\, c \in C \tag{2.9}$$

where $H = (1\ 1\ \ldots\ 1)$ is called the parity check matrix of $C$, and we are prepared to define:
Definition 2.2 (single parity check code) The length $n$ single parity check code is the set of all code words satisfying $c H^T = 0$, where the $1 \times n$ parity check matrix is given by $H = (1\ 1\ \ldots\ 1)$.

The code parameters of a single parity check code are $(n, n-1, 2)$. Hence, these codes detect one error, but cannot correct it. This concept of defining a block code by using a parity check matrix can easily be generalized to codes with two or more parity check bits. Consider for example the family of repetition codes. Here we have $n - 1$ parity check bits and it is easy to verify that the corresponding $(n-1) \times n$ parity check matrix is given by

$$H = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix}. \tag{2.10}$$
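To illustrate how a parity check matrix defines a code, the following Python sketch enumerates all binary words with $c H^T = 0$ by brute force. It is only practical for small $n$, and the helper names (`syndrome`, `code_from_parity_check`, `repetition_parity_check`) are hypothetical, chosen for this example.

```python
from itertools import product


def syndrome(word, H):
    """Compute word * H^T over GF(2); H is given as a list of rows."""
    return tuple(sum(w * h for w, h in zip(word, row)) % 2 for row in H)


def code_from_parity_check(H):
    """All binary words c of length n with c * H^T = 0 (brute force)."""
    n = len(H[0])
    return [c for c in product((0, 1), repeat=n) if not any(syndrome(c, H))]


def repetition_parity_check(n):
    """(n-1) x n matrix of eq. (2.10): row i enforces c_0 + c_(i+1) = 0."""
    return [[1] + [1 if j == i else 0 for j in range(n - 1)] for i in range(n - 1)]


if __name__ == "__main__":
    print(code_from_parity_check([[1, 1, 1]]))                 # single parity check code, n = 3
    print(code_from_parity_check(repetition_parity_check(4)))  # repetition code, n = 4
```

Each row of $H$ contributes one parity check equation, so the single row $(1\ 1\ 1)$ yields the four even-weight words, while the $n - 1$ rows of eq. (2.10) leave only the all-zero and the all-one word.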
Another important code family can be defined as follows:

Definition 2.3 (Hamming code) The length $n = 2^m - 1$, $m \ge 3$, Hamming code is the set of code words satisfying $c H^T = 0$, where the $m \times n$ parity check matrix consists of all non-zero $m$-tuples as its columns.

The code parameters of the Hamming code are $(2^m - 1, 2^m - m - 1, 3)$. Hence, it can correct one error. Let us consider an example:

Example 2.1 ((7,4,3) Hamming code) We consider the rate $R = 4/7$ Hamming code with the parity check matrix

$$H = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}. \tag{2.11}$$

Notice that its columns are all 7 non-zero binary 3-tuples. The code parameters are $(7, 4, 3)$ and the set of code words is given in the following table:

  c0 c1 c2 c3 c4 c5 c6
   0  0  0  0  0  0  0
   0  0  0  1  1  1  1
   0  0  1  0  1  1  0
   0  0  1  1  0  0  1
   0  1  0  0  1  0  1
   0  1  0  1  0  1  0
   0  1  1  0  0  1  1
   0  1  1  1  1  0  0
   1  0  0  0  0  1  1
   1  0  0  1  1  0  0
   1  0  1  0  1  0  1
   1  0  1  1  0  1  0
   1  1  0  0  1  1  0
   1  1  0  1  0  0  1
   1  1  1  0  0  0  0
   1  1  1  1  1  1  1

Notice that, to map information 4-tuples $u_0 u_1 u_2 u_3$ to code words, we might use $c_0 = u_0$, $c_1 = u_1$, $c_2 = u_2$, and $c_3 = u_3$; in the table above, the first four positions run through all 16 binary 4-tuples. Then the information bits occur unchanged in the code word. Such a mapping of information words to code words is called systematic.
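The code word table of Example 2.1 can be reproduced from the parity check matrix in eq. (2.11). The Python sketch below does so by exhaustive search. The systematic encoder places the information bits in the first four positions and solves $c H^T = 0$ for the remaining three; this encoder and all function names are assumptions made for illustration. The minimum distance is obtained as the smallest non-zero Hamming weight, which is valid because the code is linear.

```python
from itertools import product

# Parity check matrix of eq. (2.11); the top row holds the least significant bit.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(word, H):
    """Compute word * H^T over GF(2)."""
    return tuple(sum(w * h for w, h in zip(word, row)) % 2 for row in H)


def code_words(H):
    """All binary n-tuples with zero syndrome (brute force over 2^n words)."""
    return [c for c in product((0, 1), repeat=len(H[0])) if not any(syndrome(c, H))]


def minimum_distance(code):
    """Smallest non-zero Hamming weight; equals d_m for a linear code."""
    return min(sum(c) for c in code if any(c))


def encode_systematic(u):
    """Hypothetical systematic encoder: c0..c3 = u0..u3, c4..c6 are parity bits."""
    u0, u1, u2, u3 = u
    return (u0, u1, u2, u3, (u1 + u2 + u3) % 2, (u0 + u2 + u3) % 2, (u0 + u1 + u3) % 2)


if __name__ == "__main__":
    code = code_words(H)
    print(len(code), minimum_distance(code))
    print(encode_systematic((0, 0, 0, 1)))
```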
The decoding rule formulated in eq. (2.4) is very general and applies to Hamming codes as well. Unfortunately, its computational complexity increases exponentially with the code dimension! Fortunately, we know another decoding method for Hamming codes which achieves exactly the same result with a decoding complexity that increases only linearly with the code dimension. Assume we have received $r = c + e$, $c \in C$, where $e = e_0 e_1 \ldots e_{n-1}$ is a binary vector representing the errors in $r$. Then, the $1 \times (n-k)$ dimensional binary vector

$$s = r H^T \tag{2.12}$$

is called the syndrome of $r$. From $s = (c + e) H^T = e H^T$, where we used $c H^T = 0$, it follows that the syndrome depends on $e$ only. Hamming codes have the beautiful property that each of the $2^{n-k} - 1$ non-zero syndromes uniquely determines an error pattern of Hamming weight $\mathrm{wt}_H(e) = 1$. We can formulate the following decoding rule for Hamming codes:

1. Compute the syndrome: $s = r H^T$ with $H = (h_1\ h_2\ \ldots\ h_n)$, where $h_i$ denotes the $i$th column of $H$.
2. Determine the error: for $s = 0$ set $\hat{e} = 0$; for $s \neq 0$ set $\hat{e}_{i-1} = 1$ for $s^T = h_i$ and $\hat{e}_j = 0$ for $j \neq i-1$.
3. Estimate the code word: $\hat{c} = r + \hat{e}$.

Example 2.2 (Decoding the (7, 4, 3) Hamming code) The syndrome dimension is $n - k = 3$ and there exist 7 non-zero syndromes $s = s_0 s_1 s_2$. On the other hand, there are 7 possible positions for a single error in $e = e_0 e_1 e_2 e_3 e_4 e_5 e_6$. We know there is a one-to-one mapping between these errors and the corresponding syndromes. Notice that the columns $h_i$, $i = 1, \ldots, 7$, of the parity check matrix in (2.11) are the binary representation of the index $i$, e.g., $h_3 = (1\ 1\ 0)^T$ represents 3 if read from left (least significant bit) to right (most significant bit). Hence, the syndrome $s = r H^T$, read as a binary number, minus one gives the position $i$ where the error $e_i = 1$ occurred. This is summarized in the following table:

  e0 e1 e2 e3 e4 e5 e6 | s0 s1 s2
 ----------------------+----------
   0  0  0  0  0  0  0 |  0  0  0
   1  0  0  0  0  0  0 |  1  0  0
   0  1  0  0  0  0  0 |  0  1  0
   0  0  1  0  0  0  0 |  1  1  0
   0  0  0  1  0  0  0 |  0  0  1
   0  0  0  0  1  0  0 |  1  0  1
   0  0  0  0  0  1  0 |  0  1  1
   0  0  0  0  0  0  1 |  1  1  1
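The three decoding steps above can be sketched in Python for the (7, 4, 3) Hamming code with the parity check matrix of eq. (2.11). The sketch relies on the property from Example 2.2 that the syndrome, read as a binary number with $s_0$ as least significant bit, equals the error position plus one. The function names are illustrative assumptions.

```python
# Parity check matrix of eq. (2.11).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(r):
    """Step 1: s = r * H^T over GF(2)."""
    return tuple(sum(ri * hi for ri, hi in zip(r, row)) % 2 for row in H)


def decode_hamming(r):
    """Correct at most one error in the received 7-tuple r."""
    s0, s1, s2 = syndrome(r)
    value = s0 + 2 * s1 + 4 * s2   # syndrome as a number, s0 is the LSB
    c_hat = list(r)
    if value != 0:
        c_hat[value - 1] ^= 1      # steps 2 and 3: flip the erroneous position
    return tuple(c_hat)


if __name__ == "__main__":
    c = (0, 0, 0, 1, 1, 1, 1)      # a code word from the table of Example 2.1
    r = (0, 0, 1, 1, 1, 1, 1)      # the same word with an error at position 2
    print(syndrome(r), decode_hamming(r))
```

The effort is one matrix-vector product and one bit flip. If more than one error occurred, the rule still returns a code word, but not the transmitted one.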
Let us now consider the problem of block code design from a more general point of view. Assume we want to design a code for transmitting over a BSC with crossover probability $p$. Then the expected number of errors in a code word of length $n$ is $np$. It can be shown that the probability of observing more than $n(p + \varepsilon)$ errors in a code word, for any fixed $\varepsilon > 0$, decreases as the code length $n$ increases. This means that a block code with $n \gg 1$ and an error correcting capability that is slightly higher than $np$ errors would do a good job. We can summarize the basic idea behind the design of a channel coding system:

- use redundancy to gain error-correction capability
- average, i.e., use long code words, to reduce the variance of the relative number of errors per code word

From the simple code designs that we considered above one might get the impression that this goal is easy to reach. Actually, it is extremely difficult to

- design long block codes with large error-correcting capability
- find decoding methods that can be implemented with state-of-the-art hardware
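The averaging effect described above can be checked numerically with the binomial distribution of the number of errors on a BSC. The Python sketch below computes the probability that more than $t$ errors occur in $n$ channel uses. The values $p = 0.1$ and the margin of $0.05$ are hypothetical numbers chosen only for illustration.

```python
from math import comb


def prob_more_than(t: int, n: int, p: float) -> float:
    """P(more than t errors) in n uses of a BSC with crossover probability p."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(t + 1, n + 1))


if __name__ == "__main__":
    p = 0.1                               # assumed crossover probability
    for n, t in ((20, 3), (100, 15), (400, 60)):
        # t = 0.15 * n, i.e. slightly more than the expected n * p errors
        print(f"n = {n:3d}  t = {t:2d}  P(errors > t) = {prob_more_than(t, n, p):.2e}")
```

With the ratio $t/n$ held fixed above $p$, the tail probability shrinks as $n$ grows, which is why long codes that correct slightly more than $np$ errors are attractive.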
2.2 Convolutional Coding: An Example
Convolutional coding is another approach to designing codes for forward error correction. It is quite different from block coding. Both methods developed in parallel, more or less independently. While there is a deep and extensive theory for block coding, convolutional coding is more straightforward and practice-oriented. In the following we consider a simple example to explain the basic idea behind convolutional coding.

The main difference between block and convolutional codes is that the former are block-oriented while the latter are stream-oriented. A rate $R = k/n$, $n \ge k$, convolutional encoder with memory $m$ maps an information sequence of $k$-tuples $u = u_0 u_1 u_2 \ldots$ with $u_t = u_{t,1} \ldots u_{t,k}$ to a code sequence of $n$-tuples $c = c_0 c_1 c_2 \ldots$ with $c_t = c_{t,1} \ldots c_{t,n}$ such that the output at time $t$ depends on the input at time $t$ and the $m$ previous input tuples, i.e.,

$$c_t = \mathrm{fct}(u_t, u_{t-1}, \ldots, u_{t-m}). \tag{2.13}$$

Figure 2.1 depicts a rate $R = 1/2$ convolutional encoder with memory $m = 1$. The code sequence of binary 2-tuples $c_t = c_{t,1} c_{t,2}$ is obtained by

$$c_{t,1} = u_t \tag{2.14}$$

$$c_{t,2} = u_t + u_{t-1} \tag{2.15}$$

where the addition is taken modulo 2. Clearly, for an infinite input sequence the encoder will generate an infinite output sequence!
[Figure 2.1: Rate $R = 1/2$ convolutional encoder with memory $m = 1$. Labels: input $u_t$, Demux, output $c_t = c_{t,1} c_{t,2}$.]

The encoder requires one memory element ($m = 1$). The content of this memory element is called the encoder state and is denoted by $\sigma_t$. The encoder can be regarded as a finite state machine. The corresponding state diagram is depicted on the left hand side of Figure 2.2. It consists of two states and four state transitions. The labeling of the transitions, $u_t / c_{t,1} c_{t,2}$, gives the information symbol and the code tuple that are associated with the corresponding state transition:

  σ_t  σ_{t+1}  u_t  c_{t,1} c_{t,2}
   0      0      0        00
   0      1      1        11
   1      0      0        01
   1      1      1        10

While the state diagram fully describes the convolutional encoder, it is often useful to explicitly represent time in the process of encoding. This can be done with the trellis module, which is depicted on the right hand side of Figure 2.2. The advantage is that we can cascade the trellis module to obtain the encoder trellis as shown in Figure 2.3. Notice that we assumed that the convolutional encoder starts in the zero state.

[Figure 2.2: State diagram (left) and trellis module (right) of the convolutional encoder in Figure 2.1. Transitions are labeled 0/00, 1/11, 0/01, and 1/10.]

[Figure 2.3: Trellis of the rate $R = 1/2$ convolutional encoder in Figure 2.1.]

Given a convolutional encoder, a convolutional code $C$ is defined as the (infinite) set of code sequences that can be generated by the encoder. The trellis is a graphical representation of the convolutional code and, additionally, determines the mapping between information and code sequences. Let us consider an example. Assume the information sequence

$$u = 1\,1\,0\,1\,1\,0 \ldots \tag{2.16}$$

is shifted into the encoder. Then we obtain the code sequence by starting on the left hand side in the zero state of the trellis and following the path determined by the input. We obtain

$$c = 11\ 10\ 01\ 11\ 10\ 01 \ldots \tag{2.17}$$
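The encoder of eqs. (2.14) and (2.15) is a two-state machine and can be sketched directly in Python. The function name `conv_encode` is an assumption for this example; the encoder is started in the zero state, as assumed in the text.

```python
def conv_encode(u):
    """Rate 1/2, memory 1 encoder: c_{t,1} = u_t, c_{t,2} = u_t + u_{t-1} (mod 2)."""
    state = 0                           # content of the single memory element
    code = []
    for bit in u:
        code.append((bit, bit ^ state))
        state = bit                     # the next state is the current input
    return code


if __name__ == "__main__":
    u = [1, 1, 0, 1, 1, 0]              # information sequence of eq. (2.16)
    print(" ".join(f"{c1}{c2}" for c1, c2 in conv_encode(u)))
```

Each loop iteration corresponds to one transition in the trellis of Figure 2.3, so the printed tuples trace the path selected by the input sequence.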
The error correction capability of a convolutional code is rather difficult to describe. Here we will only consider the free distance, which is defined as

$$d_{\mathrm{free}} = \min_{\substack{c, c' \in C \\ c \neq c'}} \{ d_H(c, c') \}. \tag{2.18}$$

Notice that this definition is equivalent to the definition of the minimum distance of block codes given in eq. (2.5). The free distance of the rate $R = 1/2$ convolutional code above is $d_{\mathrm{free}} = 3$. Hence, it can correct one error. Considering the infinite code length this is not much, but if the errors are spread widely enough, the code can correct more than a single error. Later, the error correction capability of convolutional codes will be discussed in detail.

Exercise 2.1 Find code sequences in the trellis of Figure 2.3 with Hamming distance 3.

An important question is still open: How can we decode a convolutional code? This will be left open here. Generally, the decoding rule $\hat{c} = \arg\min_{c \in C} \{ d_H(r, c) \}$ given in eq. (2.4) for block codes applies to convolutional codes as well. Later we will introduce the Viterbi algorithm and show that it solves eq. (2.4) with the least possible computational complexity using the trellis representation.
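The value $d_{\mathrm{free}} = 3$ can be checked with a small brute-force search. The Python sketch below relies on the linearity of the encoder, so that the distance between two code sequences equals the weight of their sum, which is again a code sequence. It then searches, up to an assumed maximum input length, over inputs that leave the zero state and return to it. This is an illustrative bounded search, not the method used later in the lecture; the names `conv_encode` and `free_distance` are hypothetical.

```python
from itertools import product


def conv_encode(u):
    """Rate 1/2, memory 1 encoder of eqs. (2.14) and (2.15), zero start state."""
    state, code = 0, []
    for bit in u:
        code.append((bit, bit ^ state))
        state = bit
    return code


def free_distance(max_len: int = 8) -> int:
    """Smallest weight of a non-zero code sequence found within max_len input bits."""
    best = None
    for length in range(1, max_len + 1):
        for tail in product((0, 1), repeat=length - 1):
            u = (1,) + tail + (0,)      # leave the zero state, then return to it
            weight = sum(c1 + c2 for c1, c2 in conv_encode(u))
            if best is None or weight < best:
                best = weight
    return best


if __name__ == "__main__":
    print(free_distance())
```

The minimum is attained by the input $1\,0$, whose code sequence $11\ 01$ has weight 3: leaving the zero state costs weight 2 and returning to it costs weight 1.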
2.3 Coding Gain
To compare the performance of channel coding schemes with uncoded transmission, we consider the bit error probability that is achieved for a given signal to noise ratio $E_b/N_0$ per transmitted information bit. In Figure 2.4 the simulated bit error probabilities of the (7, 4, 3) Hamming code and the rate $R = 1/2$ convolutional code with memory $m = 1$ are depicted.
[Figure 2.4: Simulated bit error probability over $E_b/N_0$ [dB] of the (7, 4, 3) Hamming code for hard-decision decoding (left; curves: uncoded, undecoded, (7,4,3) Hamming code) and of the rate $R = 1/2$, memory $m = 1$ convolutional code for hard- and soft-decision decoding (right; curves: uncoded, undecoded, (3,1) conv. code hd., (3,1) conv. code sd.).]

The use of redundancy decreases the signal to noise ratio $E_s/N_0$ per transmitted code symbol, since the energy $E_b$ of one information bit has to be shared among $1/R$ code symbols. If we do not use the transmitted redundancy for error correction, but simply perform a hard-decision demodulation of the transmitted information symbols, we obtain the curve denoted by undecoded. Due to the loss of signal to noise ratio this is worse than uncoded transmission. Using the redundancy for error correction decreases the bit error probability. For a given bit error probability, the difference of the signal to noise ratio in dB between uncoded and coded transmission is called the coding gain. Clearly, the coding gain should be positive! For hard-decision decoding in our two examples the coding gain becomes positive for small bit error probabilities, but does not show a significant improvement. In the case of soft-decision decoding we have about 1.5 dB gain at moderate bit error probabilities. According to Shannon's coding theorem, we know that there exist coding schemes that allow us to communicate with an arbitrarily small bit error rate at approximately 0 dB. Hence, a coding gain of about 10 dB is possible at bit error probabilities of $10^{-6}$. This lecture is about how to achieve such large coding gains.
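The uncoded and undecoded reference curves can be computed in closed form under one assumption that is not stated in this section: BPSK transmission over the AWGN channel with coherent detection, for which the symbol error probability is $Q(\sqrt{2E_s/N_0})$ with $E_s = R E_b$. The following Python sketch evaluates both curves; the function names are illustrative.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_uncoded(ebn0_db: float) -> float:
    """Bit error probability of uncoded BPSK over AWGN (assumed modulation)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return q_function(math.sqrt(2.0 * ebn0))


def ber_undecoded(ebn0_db: float, rate: float) -> float:
    """Hard-decision error probability per symbol when E_s = R * E_b."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return q_function(math.sqrt(2.0 * rate * ebn0))


if __name__ == "__main__":
    for ebn0_db in (0.0, 5.0, 10.0):
        print(ebn0_db, ber_uncoded(ebn0_db), ber_undecoded(ebn0_db, 4 / 7), ber_undecoded(ebn0_db, 1 / 2))
```

Under these assumptions the undecoded curve is the uncoded curve shifted by $10 \log_{10}(1/R)$ dB, about 2.4 dB for $R = 4/7$ and about 3 dB for $R = 1/2$. A code only yields a positive coding gain once its error correction more than compensates for this shift.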
Chapter 3
The Additive White Gaussian Noise Channel

The most important channel model in coding theory is the additive white Gaussian noise (AWGN) channel. Most codes are designed for and tested with this channel model.

White Gaussian noise. Let $n(t)$ be a time-continuous sample of a wide-sense stationary random process. The process is said to be white if its (two-sided) power spectral density satisfies

$$\Phi_n(f) = N_0/2. \tag{3.1}$$

Then the noise power density is independent of the frequency $f$, and within any interval of width $\Delta f$ there is noise with power $\Delta f\, N_0/2$. The random process is said to be Gaussian or, equivalently, normally distributed, if the distribution of the noise amplitude $n(t)$ is independent of $t$ and satisfies

$$f_N(n) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{n^2}{2\sigma^2}} \tag{3.2}$$

with variance

$$\sigma^2 = N_0/2. \tag{3.3}$$

A random process that satisfies both eq. (3.1) and eq. (3.2) is called white Gaussian noise.

[Figure 3.1: Additive white Gaussian noise channel with time-continuous input and output, $y(t) = x(t) + n(t)$.]

AWGN channel. The AWGN channel for time-continuous input and output is depicted in Figure 3.1. Its capacity is given by

$$C = \max_{f_X(x)} \big( H(Y) - H(N) \big) \tag{3.4}$$

where the entropy of the output is

$$H(Y) = -\int_{-\infty}^{\infty} f_Y(y) \log_2 f_Y(y)\, dy \tag{3.5}$$

and the entropy of white Gaussian noise is

$$H(N) = \frac{1}{2} \log_2(\pi e N_0). \tag{3.6}$$

It is known that, for a given output power, $H(Y)$ is maximal if $Y$ is Gaussian distributed. Since the sum of two independent Gaussian distributed random variables is again Gaussian distributed, this is achieved by a Gaussian distributed input variable $x(t)$. It follows that

$$C = \frac{1}{2} \log_2\left(1 + \frac{S}{N}\right) \tag{3.7}$$

where $S$ is the power of the input signal $x(t)$ and $N = \sigma^2$ is the noise power. $S/N$ is called the signal to noise ratio. Eq. (3.7) is depicted in Figure 3.2. It is easy to see that we can transmit an arbitrarily large number of information bits per channel use by increasing the signal to noise ratio $S/N$.
[Figure 3.2: Channel capacity of the AWGN channel with time-continuous input and output plotted over $S/N$ [dB]: linear scale (left) and logarithmic scale (right).]
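Eq. (3.7) is straightforward to evaluate. The following Python sketch takes the signal to noise ratio in dB, as on the horizontal axis of Figure 3.2; the function name `awgn_capacity` is an illustrative assumption.

```python
import math


def awgn_capacity(snr_db: float) -> float:
    """Capacity of eq. (3.7) in bits per channel use for S/N given in dB."""
    snr = 10.0 ** (snr_db / 10.0)
    return 0.5 * math.log2(1.0 + snr)


if __name__ == "__main__":
    for snr_db in (-20, -10, 0, 10, 20, 30, 40):
        print(f"S/N = {snr_db:4d} dB  C = {awgn_capacity(snr_db):.4f} bit/channel use")
```

At high signal to noise ratio the capacity grows by about half a bit per channel use for every 3 dB of additional signal power, which reflects the unbounded growth mentioned above.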
Band-limited AWGN channel. The channel capacity of the band-limited AWGN channel with time-continuous input signal $x(t)$ is given by

$$C = W \log_2\left(1 + \frac{S}{N}\right) \tag{3.8}$$

where $W$ is the channel bandwidth, $N = N_0 W$ is the noise power in the frequency band of width $W$, and $S$ is the signal power. Clearly, keeping $S/N$ constant and increasing the bandwidth $W$ increases the channel capacity linearly with $W$. This is what we expected: having twice the bandwidth allows us to transmit twice the information! For $W \to \infty$ we have $S/N = S/(N_0 W) \to 0$, and with $\log_2(1 + x) \approx x / \ln 2$ for small $x$ we obtain $C_\infty = S/(N_0 \ln 2)$. To compare coded with uncoded transmission, we let the signal power be given by

$$S = R E_b \tag{3.9}$$

where $R$ is the code rate and $E_b$ is the signal energy per transmitted information bit. Now we obtain

$$\frac{C_\infty}{R} = \frac{E_b}{N_0 \ln 2}. \tag{3.10}$$

Since reliable communication requires a code rate $R < C_\infty$, it follows that

$$\frac{E_b}{N_0} > \ln 2 \approx 0.69 \approx -1.6\ \text{dB}. \tag{3.11}$$

This bound is of fundamental importance. It states that reliable communication over the AWGN channel is possible if the ratio $E_b/N_0$ is larger than 0.69. It also follows that reliable transmission with $E_b/N_0 < 0.69$ is not possible. The capacity of the band-limited AWGN channel plotted over $S/N_0$ and $E_b/N_0$ is shown in Figure 3.3.
[Plot: channel capacity versus S/N0 and Eb/N0 [dB].]
Figure 3.3: Channel capacity of the band-limited AWGN channel with bandwidth W = 1.
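The capacity formulas (3.7) and (3.8) and the limit (3.11) are easy to evaluate numerically. The following is a minimal sketch in Python; the function names are chosen here for illustration and do not appear in the lecture notes. Signal-to-noise ratios are passed as linear ratios, not in dB.

```python
import math

def awgn_capacity(snr):
    """Eq. (3.7): capacity in bits per channel use, C = 1/2 log2(1 + S/N)."""
    return 0.5 * math.log2(1.0 + snr)

def bandlimited_capacity(w, s, n0):
    """Eq. (3.8) with noise power N = N0 * W."""
    return w * math.log2(1.0 + s / (n0 * w))

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)

# Wideband limit: for fixed S and N0 the capacity approaches S / (N0 ln 2).
s, n0 = 1.0, 1.0
wideband_limit = s / (n0 * math.log(2.0))
for w in (1.0, 10.0, 1000.0):
    print(w, bandlimited_capacity(w, s, n0), wideband_limit)

# Eq. (3.11): the minimum Eb/N0 for reliable communication, expressed in dB.
shannon_limit_db = to_db(math.log(2.0))
print(shannon_limit_db)
```

The loop illustrates that, for fixed signal power, additional bandwidth yields diminishing returns: the capacity stays below the wideband limit however large W becomes.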
AWGN channel with a finite input alphabet

Apart from the power limitation, we have not imposed any restriction on the input signal $x(t)$. For a finite input alphabet $\mathcal{X}$, it is difficult to compute the channel capacity explicitly. What we can state immediately is that

    C = \log_2 |\mathcal{X}| \quad \text{for} \quad \frac{S}{N} \to \infty    (3.12)

where $|\mathcal{X}|$ is the cardinality of the input alphabet $\mathcal{X}$. Clearly, a finite input alphabet puts a limit on the maximal number of information bits that can be transmitted in one channel use. Digital communication systems have a finite set of modulation signals. Often it is possible to represent the cascade of the modulator, the channel, and the demodulator by a time-discrete AWGN channel.
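[Block diagram: the transmitted signal x·e(t) is disturbed by the additive noise n(t) and passed through a coherent matched filter, whose output is y = \sqrt{E_s} x + n.]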
Figure 3.4: BPSK modulator, AWGN channel, and coherent demodulation.
Let us consider a simple example. A modulator that applies binary phase shift keying (BPSK) sequentially transmits one of the two signals $e(t)$ and $-e(t)$. Figure 3.4 depicts the cascade of modulator, channel, and demodulator. It can be shown that the demodulator output is given by

    y = \sqrt{E_s} \, x + n    (3.13)

where $E_s$ is the elementary signal energy and $n$ is a Gaussian-distributed random variable with variance $\sigma^2 = N_0/2$. Hence, modulator, AWGN channel, and coherent demodulator can be represented by a time-discrete AWGN channel with finite input alphabet. The corresponding channel is depicted in Figure 3.5.
[Diagram: the noise n is added to the scaled input x \sqrt{E_s}, giving the output y = x \sqrt{E_s} + n.]
Figure 3.5: Time-discrete AWGN channel.

Depending on the modulation scheme and the kind of demodulator, we are able to represent the cascade of modulator, AWGN channel, and demodulator by a time-discrete AWGN channel. The channel capacity of the time-discrete AWGN channel with finite, equally distributed input alphabet (see footnote 1) is obtained by combining eqs. (3.4) and (3.5) to

    C = -\int_{-\infty}^{\infty} f_Y(y) \log_2 f_Y(y) \, dy - \frac{1}{2} \log_2 (\pi e N_0)    (3.14)
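where the probability density function of the output random variable depends on the modulator and demodulator. If a single complex elementary signal of energy $E_s$ is modulated and a coherent matched filter demodulator is employed, then the output is given by

    y = x \sqrt{E_s} + n, \quad x \in \mathcal{X}    (3.15)

where $\mathcal{X}$ is normalized such that the expectation $E[|X|^2] = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} |x|^2 = 1$. Then, for a real-valued alphabet such as BPSK or M-ASK, the probability density function of the output is given by

    f_Y(y) = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \frac{1}{\sqrt{2 \pi \sigma^2}} \, e^{-\frac{(y - \sqrt{E_s} x)^2}{2 \sigma^2}}    (3.16)

With this density, (3.14) can be computed numerically.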
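The channel capacity of the BPSK scheme discussed previously follows from (3.14) and (3.16) with $\mathcal{X} = \{-1, +1\}$ and $\sigma^2 = N_0/2$:

    C = -\int_{-\infty}^{\infty} f_Y(y) \log_2 f_Y(y) \, dy - \frac{1}{2} \log_2 (\pi e N_0),
    f_Y(y) = \frac{1}{2 \sqrt{\pi N_0}} \left( e^{-\frac{(y - \sqrt{E_s})^2}{N_0}} + e^{-\frac{(y + \sqrt{E_s})^2}{N_0}} \right).    (3.17)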
This is shown in Figure 3.6. If we do not apply any channel coding, i.e., we use a rate $R = 1$ code, then the most probable sent information symbol is determined by the sign of the demodulator output:

    \hat{x} = \begin{cases} -1, & y < 0 \\ +1, & y \ge 0. \end{cases}    (3.18)
1 Except in the case of a binary antipodal modulation alphabet, an input alphabet that is not equally distributed leads to a larger channel capacity. For large alphabet cardinalities, this is taken into account by a concept called shaping, which ensures that symbols with large energy are used less frequently than those with low energy.
[Plots: channel capacity versus Es/N0 and Eb/N0 [dB] (left) and versus Eb/N0 [dB] (right).]
Figure 3.6: Channel capacity of BPSK modulation over the AWGN channel with coherent matched filter demodulation: plotted over $E_s/N_0$ and $E_b/N_0$ on a linear scale (left) and plotted over $E_b/N_0$ on a logarithmic scale (right).
[Plot: bit error probability versus Eb/N0 [dB].]
Figure 3.7: Bit error probability of uncoded BPSK modulation over the AWGN channel with coherent matched filter demodulation, plotted together with the bit error rate of a hypothetical rate $R \to 0$ channel coding system that achieves capacity.
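The bit error probability of the uncoded transmission is given by

    P_b = \frac{1}{2} \operatorname{erfc} \left( \sqrt{E_s / N_0} \right)    (3.19)

with the complementary error function defined as

    \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-y^2} \, dy.    (3.20)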
The corresponding bit error probability is depicted in Figure 3.7. An appropriate model for this transmission scheme is the BSC with crossover probability $p = \frac{1}{2} \operatorname{erfc}(\sqrt{E_s/N_0})$.

Exercise 3.1 Write a computer program in MATLAB to compute the capacity of the time-discrete AWGN channel representing an M-ary quadrature amplitude modulation (QAM) modulator, AWGN channel, and coherent demodulator.

The absolute value of the demodulator output $y = \sqrt{E_s} x + n$ can be interpreted as the reliability information of the symbol $x$: the larger $|y|$, the more reliable is the estimated symbol (see footnote 2). To see that this reliability information should be used by the coding system, we compare the channel capacity of the hard-decision demodulated BPSK scheme with the capacity of the soft-decision demodulated BPSK scheme. These capacities are given in eqs. (1.5) and (3.17) and are compared in Figure 3.8. If the reliability of the symbol is not used by the decoding scheme, there is a penalty of about 2 dB for low-rate coding systems!
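The comparison between hard- and soft-decision demodulation can be reproduced numerically. The sketch below is one possible way to do it in Python, not the program used for the figures: it evaluates (3.19), the capacity $1 - h(p)$ of the resulting BSC, and the integral in (3.17) by a simple midpoint rule. The normalisation $N_0 = 1$, the integration limits, the step size and all function names are assumptions made here for illustration.

```python
import math

def bpsk_bit_error_probability(es_n0):
    """Eq. (3.19): Pb = 1/2 erfc(sqrt(Es/N0)); es_n0 is a linear ratio."""
    return 0.5 * math.erfc(math.sqrt(es_n0))

def binary_entropy(p):
    """Binary entropy function h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def hard_decision_capacity(es_n0):
    """Capacity of the BSC whose crossover probability is Pb from (3.19)."""
    return 1.0 - binary_entropy(bpsk_bit_error_probability(es_n0))

def soft_decision_capacity(es_n0, step=1e-3):
    """Eq. (3.17) by midpoint-rule integration, normalised to N0 = 1."""
    a = math.sqrt(es_n0)            # amplitude sqrt(Es) when N0 = 1
    lo, hi = -a - 8.0, a + 8.0      # assumed limits; the tails beyond are negligible
    steps = int((hi - lo) / step)
    h_y = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * step
        f = (math.exp(-(y - a) ** 2) + math.exp(-(y + a) ** 2)) / (2.0 * math.sqrt(math.pi))
        if f > 0.0:                 # skip points where the density underflows
            h_y -= f * math.log2(f) * step
    h_n = 0.5 * math.log2(math.pi * math.e)   # eq. (3.6) with N0 = 1
    return h_y - h_n

for es_n0_db in (-5.0, 0.0, 5.0):
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    print(es_n0_db, hard_decision_capacity(es_n0), soft_decision_capacity(es_n0))
```

For every $E_s/N_0$ the soft-decision capacity lies above the hard-decision capacity; plotting both over $E_b/N_0 = E_s/(C N_0)$ gives a comparison of the kind shown in Figure 3.8.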
[Plot: channel capacity versus Eb/N0 [dB].]
Figure 3.8: Comparison of the channel capacities for hard- and soft-decision demodulation of BPSK modulated transmission over the AWGN channel.

Figure 3.9 depicts the channel capacity for the time-discrete AWGN channel representing an M-ASK modulator, AWGN channel, and coherent matched filter. As we can see, increasing the number of input signals enables us to transmit more than 1 bit per channel use but requires a substantially larger $E_b/N_0$. With an increasing need for bandwidth efficiency, there is a tendency towards modulation schemes with large input alphabet cardinality.
2 Such a simple interpretation is only possible for a binary modulation alphabet. A general method to compute reliability information at the demodulator output will be presented later.
[Plot: channel capacity versus Es/N0 and Eb/N0 [dB].]
Figure 3.9: Channel capacity of the AWGN channel with M-ASK modulation, M = 2, 4, 8, 16 and equally distributed input symbols.

Exercise 3.2 Write a computer program in MATLAB to perform a Monte-Carlo simulation to obtain the bit error probability of the BPSK scheme described above, and verify the setup by comparing the simulated bit error probability with eq. (3.19).

A comment on other channels

While there is a wide range of physical channels that can be approximated by the AWGN channel, there also exist many situations that require a more complex model. Usually, physical channels have memory. Since it is difficult to design codes for such channels, one permutes the input sequence of the channel. After transmission, the permutation is undone, and thereby the effect of having a channel with memory is destroyed. Clearly, an appropriate permutation depends on the kind of channel. Designing permutations that satisfy certain properties can be a difficult task!
Chapter 4
To the History of Channel Coding

The following figures are taken from the presentation "The Genesis of Coding Theory" given by Daniel J. Costello in 1999 at Thomson Consumer Electronics. They depict the coding gain of various coding schemes at bit error probabilities of $10^{-5}$ and $10^{-10}$. While the former are taken from simulations, the latter are union bound estimates. The colors in the plots indicate the decade in which each coding scheme was developed:

50s (green): This is the decade of binary linear block codes: Hamming code, Golay code, Reed-Muller codes.
60s (magenta): Algebraic coding theory develops as an independent research area in coding theory. Cyclic code constructions such as BCH codes, Reed-Solomon codes and quadratic residue codes achieve a large step towards capacity. Additionally, low density parity check codes are developed.
70s (blue): This is the decade of trellis coding. Convolutional codes make it possible to apply soft-input decoding and achieve excellent performance with rather simple encoding and decoding schemes.
80s (lilac): Bandwidth efficiency becomes more and more important. Trellis coded modulation enables us to construct such schemes.
90s (red): Turbo codes close the gap to capacity, and coding at rates close to channel capacity becomes possible. Iterative decoding is applied to a wide range of problems, e.g., turbo trellis coded modulation, and achieves excellent results.

Within this lecture we will discuss some of the coding schemes depicted in Figures 4.1 and 4.2. A system engineer will not design new coding schemes. He will have to analyze the underlying digital communication system and decide which coding scheme will be the best choice.

There is a trade-off in the design of a channel coding system between

performance: in terms of error correction capability, error detection capability, bit error probability, and decoding error probability,
delay: determined by the code length and the time that decoding takes,
complexity: caused by encoding and, mainly, decoding the code.

After the lecture the student should be able to deal with this problem.
[Plots: various coding schemes (e.g., Golay(24,12), BCH(255,123), RS(255,223), (2,1,6) Viterbi, turbo(65536,18), trellis coded modulation schemes, V.32, V.34) plotted over Eb/N0 [dB], together with the capacity bound and the BPSK, QPSK, 8PSK and 16QAM to 256QAM bounds.]
Figure 4.1: Performance of various coding schemes over $E_b/N_0$ at a bit error probability of $10^{-5}$.
[Plots: the same kind of comparison of coding schemes over Eb/N0 [dB], together with the capacity bound and the BPSK, QPSK, 8PSK and 16QAM to 256QAM bounds.]
Figure 4.2: Performance of various coding schemes over $E_b/N_0$ at a bit error probability of $10^{-10}$.
Part II
Block Coding
Chapter 5
Linear Binary Block Codes

Linear block codes are a subclass of block codes and are the only block codes of practical interest. Here we will consider only binary linear block codes.
5.1 Structural Properties

We have already seen that a code is a set of length $n$ code words. The components of the code words are taken from $\mathbb{F}_2$, which means that all computations are performed modulo 2. In the following we will use the theory of vector spaces to study binary linear codes.

Definition 5.1 (binary linear block code) A binary linear block code $C$ is a linear subspace of $\mathbb{F}_2^n$. If $C$ has dimension $k$ then $C$ is called an $(n, k)$ code.

Exercise 5.1 (linear codes over $\mathbb{F}_2^3$) Consider the following three linear codes over $\mathbb{F}_2^3$:

    C_1 = \{000, 001, 010, 011, 100, 101, 110, 111\}    (5.1)
    C_2 = \{000, 011, 101, 110\}    (5.2)
    C_3 = \{000, 111\}    (5.3)

Determine their code parameters and draw the corresponding subspaces.

Given an $(n, k)$ code, a new code with the same parameters is obtained by simply changing the position of some of its components in every code word.

Definition 5.2 (equivalent codes) Two codes that are the same except for a permutation of components are called equivalent.

At first glance an equivalent code seems to be only trivially different from the original code. Indeed, both codes have the same error correction capability, but as will be seen later there can be a significant difference in decoding complexity.

Due to the linearity of the considered codes, there exists a basis of vectors that spans the code.

Definition 5.3 (generator matrix) A generator matrix $G$ for a linear block code $C$ is a $k \times n$ matrix with elements from $\mathbb{F}_2$ whose rows are a basis of $C$.

The rows of $G$ are linearly independent code words of the code. A one-to-one mapping of length $k$ information words $u = (u_0 u_1 \ldots u_{k-1})$ to the length $n$ code words is obtained by

    c = uG.    (5.4)
Of course, any other one-to-one mapping between information and code words is possible. Using a generator matrix to realize an encoder leads to a less complex encoder implementation than using large look-up tables to establish the mapping of information to code words. In general, there does not exist a generator matrix for non-linear codes, and the implementation of their encoders can already be a difficult task.

Any $k$ linearly independent code words of the code can be used to build a generator matrix, and each generator matrix represents a different mapping of information onto code words. This means that the same code $C$ can be obtained by different generator matrices.

Definition 5.4 (equivalent generator matrices) Two generator matrices that encode the same code are called equivalent.

Equivalent generator matrices are related by row operations. It can be shown that two generator matrices $G$ and $G'$ are equivalent if and only if they satisfy

    G' = TG    (5.5)

where $T$ is a $k \times k$ matrix satisfying $\det T = 1$, i.e., $T$ is invertible over $\mathbb{F}_2$. Among the set of equivalent generator matrices there is a subset that is of great practical importance.

Definition 5.5 (systematic generator matrices) A generator matrix that maps the $k$ information symbols unchanged to any $k$ code components is called a systematic generator matrix.

It can be shown that there exists at least one systematic generator matrix for any linear code. Hence, among the set of equivalent codes there exists at least one code with a systematic generator matrix of the form

    G = (I_{k \times k} \mid P)    (5.6)

where $I_{k \times k}$ is the $k \times k$ identity matrix and $P$ is a $k \times (n - k)$ matrix. Then

    c = uG = (u \mid uP)    (5.7)

and the code word starts with the $k$ information symbols. If not mentioned otherwise, we will assume a systematic generator matrix to be of the form described above.

Sometimes (see footnote 1) it is necessary to perform the inverse mapping from a given code word to the information. This is done by the $n \times k$ right inverse $G^{-1}$ of $G$, i.e.,

    u = cG^{-1}.    (5.8)

There exist methods to compute $G^{-1}$, but they will not be considered here. In the case of a systematic generator matrix we have $G^{-1} = (I_{k \times k} \mid 0)^T$ and it is trivial to perform the inverse mapping.

Example 5.1 (generator matrix and equivalent generator matrices) We obtain a generator matrix for the (7, 4, 3) Hamming code $C$ given in Example 2.1 by taking four linearly independent code words, e.g.,

    G = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}.    (5.9)

From the 3rd, 5th, 6th, and 7th columns it is easy to see that $G$ has full rank, i.e., the corresponding code words are linearly independent. The generator matrix determines the mapping from information to code word. For example, the information $u = (1\,1\,0\,1)$ is mapped to the code word

    (1\,0\,1\,0\,1\,0\,1) = (1\,1\,0\,1) \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}.    (5.10)
1 There exist important decoding methods that produce a code word as output.
Notice that all additions are modulo 2! A systematic generator matrix that is equivalent to $G$ is obtained by

    G' = TG    (5.11)

       = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}    (5.12)

       = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}    (5.13)

where $T$ is the inverse of the $4 \times 4$ matrix built from the first four columns of $G$.

Definition 5.6 (parity check matrix) Given a linear code $C$, any $(n - k) \times n$ matrix $H$ that satisfies

    cH^T = 0 \iff c \in C    (5.14)

is called a parity check matrix of $C$.

From (5.14) it follows immediately that

    GH^T = 0    (5.15)

and for a systematic generator matrix we have

    GH^T = (I_{k \times k} \mid P) \begin{pmatrix} P \\ I_{(n-k) \times (n-k)} \end{pmatrix} = P + P = 0    (5.16)

where the parity check matrix is given in systematic form

    H = (P^T \mid I_{(n-k) \times (n-k)}).    (5.17)
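The steps of Example 5.1 and the construction (5.17) can be checked with a few lines of code. The following Python sketch is an illustration only; the helper names are assumptions introduced here, and the row reduction assumes that the first $k$ columns of the generator matrix are linearly independent, as is the case for (5.9).

```python
# Generator matrix (5.9) of the (7, 4, 3) Hamming code.
G = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def encode(u, g):
    """Eq. (5.4): c = uG with all arithmetic modulo 2."""
    n = len(g[0])
    return [sum(u[i] * g[i][j] for i in range(len(g))) % 2 for j in range(n)]

def systematic_form(g):
    """Row-reduce g over GF(2) until its first k columns form the identity.

    Assumes that the first k columns of g are linearly independent.
    """
    g = [row[:] for row in g]
    k = len(g)
    for col in range(k):
        pivot = next(r for r in range(col, k) if g[r][col] == 1)
        g[col], g[pivot] = g[pivot], g[col]
        for r in range(k):
            if r != col and g[r][col] == 1:
                g[r] = [(a + b) % 2 for a, b in zip(g[r], g[col])]
    return g

def parity_check_from_systematic(g_sys):
    """Eq. (5.17): H = (P^T | I) for G = (I | P)."""
    k, n = len(g_sys), len(g_sys[0])
    h = []
    for j in range(n - k):
        column_of_p = [g_sys[i][k + j] for i in range(k)]
        identity_row = [1 if l == j else 0 for l in range(n - k)]
        h.append(column_of_p + identity_row)
    return h

c = encode([1, 1, 0, 1], G)          # the mapping of eq. (5.10)
G_sys = systematic_form(G)           # should reproduce eq. (5.13)
H = parity_check_from_systematic(G_sys)
print(c)
print(G_sys)
print(H)
```

Row reduction is used here instead of an explicit multiplication by $T$; both give the same result because the systematic generator matrix with the identity in the first $k$ columns is unique for a given code.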
Once we have a systematic generator matrix it is trivial to obtain a parity check matrix of the code. As we have already seen, the parity check matrix gives a way of testing whether a vector is a code word. We define:

Definition 5.7 (syndrome) If $C$ is a linear code with parity check matrix $H$, then for every $x \in \mathbb{F}_2^n$ we call the $1 \times (n - k)$ vector

    s = xH^T    (5.18)

the syndrome of $x$.
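As a small illustration of Definition 5.7, the sketch below computes the syndrome of a code word and of the same word with one flipped component. It uses the systematic parity check matrix of the (7, 4, 3) Hamming code that belongs to the generator matrix (5.13); the function name and the choice of the flipped position are assumptions made here.

```python
# Systematic parity check matrix H = (P^T | I) of the (7, 4, 3) Hamming code.
H = [
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def syndrome(x, h):
    """Eq. (5.18): s = x H^T with all arithmetic modulo 2."""
    return [sum(a * b for a, b in zip(x, row)) % 2 for row in h]

c = [1, 0, 1, 0, 1, 0, 1]    # code word from Example 5.1
x = c[:]
x[2] ^= 1                    # single error in the third component

s_codeword = syndrome(c, H)
s_error = syndrome(x, H)
print(s_codeword, s_error)
```

For a single error the syndrome equals the column of $H$ at the error position, which is the idea behind syndrome-based decoding of Hamming codes.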
Code words are characterized by a syndrome $s = 0$. The first decoding methods that were developed for binary block codes are based on syndrome computations. Often it is of interest to construct codes from other codes. It is easy to see that the parity check matrix of $C$ immediately defines another code. We introduce:

Definition 5.8 (dual code) If $C$ is an $(n, k)$ code, we define the $(n, n - k)$ dual code $C^\perp$ by

    C^\perp = \{ x \in \mathbb{F}_2^n \mid cx^T = 0, \ \forall c \in C \}.    (5.19)

We can show that the parity check matrix of $C$ is a generator matrix of the dual code $C^\perp$. If $C = C^\perp$ then $C$ is called self-dual. Code and dual code are closely related.

In the following we will discuss methods that are often used to modify a given code. This will allow us to construct codes that have slightly different code parameters compared to the original code. In a system design this is often necessary to adapt given code parameters to the system constraints. We distinguish six basic code modifications:
Extension and puncturing

A code can be extended by adding additional parity check symbols or punctured by deleting parity check symbols. If both operations are performed with a single symbol, then the code parameters change as follows:

    (n, k) \to (n + 1, k)    extension     (5.20)
    (n, k) \to (n - 1, k)    puncturing    (5.21)

The most common way to extend a code is to add an overall parity check such that all code words have even weight. While extension can increase the error correction capability of the code, puncturing decreases the error correction capability, and the loss strongly depends on which parity bit is deleted. Extension and puncturing are inverse operations.

Example 5.2 (extension) The extended (8, 4, 4) Hamming code is obtained by modifying the parity check matrix

    H = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}    (5.22)

of the (7, 4, 3) Hamming code to

    H_{ext} = \begin{pmatrix} 1 \ \cdots \ 1 & 1 \\ H & 0 \end{pmatrix}.    (5.23)

Then by using (5.16) we obtain

    G_{ext} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \end{pmatrix}    (5.24)

which is a systematic generator matrix of the extended (8, 4, 4) Hamming code.

Exercise 5.2 (puncturing) Puncturing is done by erasing columns of the generator matrix. Puncture the (7, 4, 3) Hamming code by one information symbol and compute the corresponding code parameters.

Expurgation and augmentation

A code can be expurgated by discarding some of the code words or augmented by adding new code words to the code. As we will see, these operations change the code parameters as follows:

    (n, k) \to (n, k - 1)    expurgation     (5.25)
    (n, k) \to (n, k + 1)    augmentation    (5.26)
Since we want to keep the linearity of the code, we can only expurgate subsets of code words and do this by setting information symbols to zero in the encoding process. Doing this with one information symbol decreases the number of code words by a factor of two! Expurgation can significantly increase the error correction capability of the code, and the gain depends strongly on the underlying generator matrix. It is a difficult task to find the best choice among all generator matrices. Nevertheless, since we cannot decrease the error correction capability, any generator matrix can be used and, clearly, non-systematic generator matrices are the preferred choice.

To explain augmentation we consider the parity check matrix of the code. By erasing one row we inherently substitute one parity check symbol of the code by an information symbol. This operation doubles the number of code words and can significantly decrease the error correction capability of the code. Hence, we have to be careful with this operation. Expurgation and augmentation are inverse operations.

Exercise 5.3 (expurgation and augmentation) Expurgate and augment the (7, 4, 3) Hamming code and compute the code parameters of the corresponding codes. Compare the three sets of code words.
Lengthening and shortening

A code can be lengthened by adding additional information symbols or shortened by deleting information symbols. With these operations we change the code parameters as follows:

    (n, k) \to (n + 1, k + 1)    lengthening    (5.27)
    (n, k) \to (n - 1, k - 1)    shortening     (5.28)

To lengthen a code by one symbol we have to perform two steps. First we augment the code by erasing one row in its parity check matrix and then, in a second step, we extend the augmented code by adding an additional parity check bit. In many cases this operation can be done without significantly reducing the error correction capability of the code. To shorten a code we simply delete a systematically encoded information symbol. Lengthening and shortening are inverse operations.

Exercise 5.4 (lengthening and shortening) Lengthen and shorten the (7, 4, 3) Hamming code and compute the corresponding code parameters. Compare the three sets of code words.
5.2 Distance Properties

The distance properties of a code determine its error correction capability. Often codes are compared with respect to their distance properties.

5.2.1 Basic Definitions
The most important distance measure of a block code is its minimum distance.

Definition 5.9 (minimum distance) The minimum distance of an $(n, k)$ block code is

    d_m = \min_{c, c' \in C, \ c \ne c'} \{ d_H(c, c') \}    (5.29)

where $d_H(c, c')$ denotes the Hamming distance of $c$ and $c'$.

Due to the linearity of the considered codes it follows that the sum of two code words is again a code word:

    c, c' \in C \Rightarrow c + c' \in C.    (5.30)

This is a very important property of linear codes that considerably simplifies the following discussion of distance properties. Since $d_H(c, c') = wt_H(c + c')$, it follows immediately from (5.30) that

    d_m = \min_{c \in C, \ c \ne 0} \{ wt_H(c) \}.    (5.31)

The minimum distance of a linear code is given by the weight of its minimum weight nonzero code word. Hence, to get the minimum distance of the code we do not have to compute the distance between all possible pairs of code words; we only have to compute the weight of all code words. This is a significant difference in the analysis of linear codes compared to non-linear codes!

To see the importance of the minimum distance we consider a minimum distance decoder (MDD). This is a device that outputs the code word $\hat{c}$ satisfying

    \hat{c} = \arg \min_{c' \in C} \{ d_H(\tilde{c}, c') \}    (5.32)

where

    \tilde{c} = c + e, \quad c \in C,    (5.33)

and $c$ is the transmitted code word and $e$ is the error pattern generated by the channel. The minimum distance determines the error weight

    t \le \lfloor (d_m - 1)/2 \rfloor    (5.34)

that the MDD is guaranteed to correct. Notice that there might exist particular error patterns of higher weight that can be corrected! In this respect, we introduce another important distance parameter:

Definition 5.10 (covering radius) The covering radius of an $(n, k)$ code is

    \rho = \max_{x \in \mathbb{F}_2^n} \min_{c \in C} \{ d_H(c, x) \}    (5.35)

where $d_H(c, x)$ denotes the Hamming distance of the code word $c$ and the vector $x$.

We define the set of error patterns that an MDD is able to correct by

    E_c = \{ e \mid \hat{c} = c \}.    (5.36)

Then the covering radius is the maximum weight among all these error patterns, i.e.,

    \rho = \max_{e \in E_c} \{ wt_H(e) \},    (5.37)

that the MDD is able to correct. This means that there does not exist an error pattern $e$ of weight $wt_H(e) > \rho$ that can be corrected by the code. In other words, an error pattern satisfying $wt_H(e) > \rho$ is guaranteed to generate a decoding error, i.e., $\hat{c} \ne c$, at the output of an MDD. Although it is of limited interest for practical problems, the covering radius is important for gaining deeper insight into the error correction capability of a code. Unfortunately, for long codes it can be extremely difficult to compute the covering radius.

To obtain more detailed information about the distances of a code it is necessary to consider so-called weight enumerator functions. From the linearity of the considered codes, it follows that the set of distances from a code word $c$ to all other code words is the same for all $c \in C$. Hence, it is sufficient to consider the zero code word, and the weight distribution of a linear code is also called its distance distribution.

Definition 5.11 (weight enumerator) Let $C$ be an $(n, k)$ linear code and let $A(w)$ be the number of code words of weight $w$. Then

    A(W) = \sum_{w=0}^{n} A(w) W^w    (5.38)

is called the weight enumerator of $C$ and $A(w)$ is called the weight distribution of $C$.

For many block codes there exists an analytic expression for the weight distribution. For example, the weight enumerator of the length $n$ binary Hamming code is given by [vL82]

    A(W) = \frac{1}{1+n} (1 + W)^n + \frac{n}{1+n} (1 + W)^{(n-1)/2} (1 - W)^{(n+1)/2}.    (5.39)

There exist codes for which $A(w)$ is not known and is too complex to compute. Later we will see that weight enumerators allow us to compute upper bounds on the performance of the code. This can be very useful and is of great practical importance. Sometimes it is useful to know the following relation between the weight enumerator of a linear binary code $C$ and that of its dual code $C^\perp$ [MS77]:

    A^\perp(W) = 2^{-k} (1 + W)^n A\left( \frac{1 - W}{1 + W} \right).    (5.40)

Let us consider an example:
Example 5.3 (weight distribution) The weight distribution of the (7, 4, 3) Hamming code is

    A(W) = 1 + 7W^3 + 7W^4 + W^7.    (5.41)
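For a short code the weight distribution can be found by brute force. The following Python sketch enumerates all $2^k$ code words of the (7, 4, 3) Hamming code, using the systematic generator matrix (5.13), and derives the minimum distance via (5.31) and the guaranteed error correction capability via (5.34). The function names are illustrative assumptions, and the approach is only feasible for small $k$.

```python
from itertools import product

# Systematic generator matrix (5.13) of the (7, 4, 3) Hamming code.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def codewords(g):
    """Yield all 2^k code words c = uG over GF(2)."""
    k, n = len(g), len(g[0])
    for u in product((0, 1), repeat=k):
        yield tuple(sum(u[i] * g[i][j] for i in range(k)) % 2 for j in range(n))

def weight_distribution(g):
    """Return the list [A(0), A(1), ..., A(n)] of Definition 5.11."""
    a = [0] * (len(g[0]) + 1)
    for c in codewords(g):
        a[sum(c)] += 1
    return a

A = weight_distribution(G)
d_min = min(w for w in range(1, len(A)) if A[w] > 0)   # eq. (5.31)
t = (d_min - 1) // 2                                   # eq. (5.34)
print(A, d_min, t)
```

Because the weight distribution is a property of the code, the same result is obtained with the non-systematic generator matrix (5.9).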
The weight distribution is a code property. This means it does not depend on a particular generator matrix. To take the mapping of information to code words into account, we define:

Definition 5.12 (input-output weight enumerator) Let $C$ be an $(n, k)$ linear code with generator matrix $G$ and let $A(z, w)$ be the number of information words of weight $z$ that are mapped to code words of weight $w$. Then

    A(Z, W) = \sum_{z=0}^{k} \sum_{w=0}^{n} A(z, w) Z^z W^w    (5.42)

is called the input-output weight enumerator of $G$.

The input-output weight enumerator is a generator matrix property. Clearly, we have

    A(w) = \sum_{z=0}^{k} A(z, w)    (5.43)

and, hence,

    A(W) = A(Z = 1, W).    (5.44)

Example 5.4 (input-output weight distribution) The input-output weight distribution of the (7, 4, 3) Hamming code with generator matrix (5.13) is

    A(Z, W) = 1 + (3W^3 + W^4) Z + (3W^3 + 3W^4) Z^2 + (W^3 + 3W^4) Z^3 + W^7 Z^4,    (5.45)

e.g., there are four information words with Hamming weight 3; one is mapped to a code word of Hamming weight 3 and three to code words of Hamming weight 4. Later we will show how to obtain an upper bound on the bit error probability of the code with $A(Z, W)$ for communication over the AWGN channel.
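The coefficients in (5.45) can be verified by counting. The sketch below, with illustrative names chosen here, tabulates $A(z, w)$ for the systematic generator matrix (5.13) and for the equivalent matrix (5.9), and checks via (5.43) that both lead to the same weight distribution.

```python
from collections import Counter
from itertools import product

G_SYSTEMATIC = [      # eq. (5.13)
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
G_ORIGINAL = [        # eq. (5.9)
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def input_output_weights(g):
    """Return {(z, w): A(z, w)} as in Definition 5.12."""
    k, n = len(g), len(g[0])
    counts = Counter()
    for u in product((0, 1), repeat=k):
        c = [sum(u[i] * g[i][j] for i in range(k)) % 2 for j in range(n)]
        counts[(sum(u), sum(c))] += 1
    return dict(counts)

def marginal_weights(iow):
    """Eq. (5.43): sum A(z, w) over z to obtain A(w)."""
    a = Counter()
    for (z, w), count in iow.items():
        a[w] += count
    return dict(a)

iow_systematic = input_output_weights(G_SYSTEMATIC)
iow_original = input_output_weights(G_ORIGINAL)
print(sorted(iow_systematic.items()))
print(marginal_weights(iow_systematic) == marginal_weights(iow_original))
```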
5.2.2 Bounds on Minimum Distance

Due to the importance of the minimum distance, we will present some important upper and lower bounds on $d_m$. Generally, we are interested in codes that have as many code words as possible for a given code length and given minimum distance. Let us start with a lower bound on the minimum distance.

Theorem 5.1 (Gilbert-Varshamov bound) If $n, k, d_m \in \mathbb{N}$ satisfy

    \sum_{i=0}^{d_m - 1} \binom{n}{i} < 2^{n-k+1}    (5.46)

then an $(n, k, d_m)$ linear binary block code exists. For large $n$ this becomes

    R \ge 1 - h\left( \frac{d_m}{n} \right)    (5.47)

where $h(\cdot)$ is the binary entropy function.

This theorem guarantees the existence of good linear block codes. It does not show how to find them! In the following we will consider three different upper bounds on the minimum distance.
Theorem 5.2 (Singleton bound) Any $(n, k, d_m)$ linear binary block code satisfies

    d_m \le n - k + 1.    (5.48)

For large $n$ this becomes

    R \le 1 - \frac{d_m}{n}.    (5.49)

Codes that satisfy (5.48) with equality are called maximum distance separable (MDS) codes.

Theorem 5.3 (Plotkin bound) The minimum distance of any binary length $n$ code satisfies

    \frac{d_m}{n} \le \frac{|C|^2}{2|C|^2 - 2|C|}.    (5.50)

For large $n$ this becomes

    R \le 1 - \frac{2 d_m}{n}.    (5.51)
The Plotkin bound holds for non-linear codes. It is tight for low rate codes.
Theorem 5.4 (Hamming bound) If $n, t \in \mathbb{N}$ and $d_m = 2t + 1$, then any $(n, k, d_m)$ linear binary code satisfies

    \sum_{i=0}^{t} \binom{n}{i} \le 2^{n-k}.    (5.52)

For large $n$ this becomes

    R \le 1 - h\left( \frac{d_m}{2n} \right)    (5.53)

where $h(\cdot)$ is the binary entropy function.

Codes that satisfy the Hamming bound with equality are called perfect codes. There are only a few perfect binary codes known. These are repetition codes with odd code length $n$, Hamming codes, and the (23, 12, 7) Golay code, which will be presented later.

To illustrate the minimum distances that we can achieve with length $n$, rate $R = k/n$ binary block codes, we rewrite the Gilbert-Varshamov bound as

    \sum_{i=0}^{d_m - 1} \binom{n}{i} < 2^{n(1-R)+1}    (5.54)

and the Hamming bound as

    \sum_{i=0}^{t} \binom{n}{i} \le 2^{n(1-R)}, \quad d_m = 2t + 1,    (5.55)

and plot the $d_m$ that satisfies (5.54) and (5.55) in Figure 5.1. The asymptotic versions of the bounds presented above are depicted in Figure 5.2.
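The finite-length bounds (5.46), (5.48) and (5.52) are straightforward to evaluate. The Python sketch below is one way to do so; the function names are assumptions made here, and each function returns the largest $d_m$ that is guaranteed by, or compatible with, the respective bound for a given $(n, k)$.

```python
from math import comb

def ball_size(n, r):
    """Number of binary length-n vectors of weight at most r."""
    return sum(comb(n, i) for i in range(r + 1))

def gilbert_varshamov_dmin(n, k):
    """Largest d_m for which (5.46) guarantees that an (n, k, d_m) code exists."""
    d = 1
    while d < n and ball_size(n, d) < 2 ** (n - k + 1):
        d += 1
    return d

def hamming_bound_dmin(n, k):
    """Largest odd d_m = 2t + 1 that is compatible with (5.52)."""
    t = 0
    while 2 * (t + 1) + 1 <= n and ball_size(n, t + 1) <= 2 ** (n - k):
        t += 1
    return 2 * t + 1

def singleton_dmin(n, k):
    """Upper bound (5.48) on the minimum distance."""
    return n - k + 1

def is_perfect(n, k, t):
    """True if an (n, k, 2t + 1) code would meet (5.52) with equality."""
    return ball_size(n, t) == 2 ** (n - k)

for n, k in ((7, 4), (15, 11), (23, 12), (24, 12)):
    print(n, k, gilbert_varshamov_dmin(n, k), hamming_bound_dmin(n, k), singleton_dmin(n, k))
```

Note that the Gilbert-Varshamov value is only a guarantee: for the parameters (7, 4) it yields $d_m = 2$, although the Hamming code achieves $d_m = 3$.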
[Plots: d_m versus n for code rates R = 1/3, R = 1/2 and R = 2/3.]
Figure 5.1: Bounds on the minimum distance of binary block codes of various code rates. Gilbert-Varshamov bound (left) and Hamming bound (right) according to (5.54) and (5.55), respectively.
[Plot: R versus d_m/n for the Gilbert-Varshamov, Singleton, Plotkin and Hamming bounds.]
Figure 5.2: Asymptotic bounds on the minimum distance of binary block codes.

5.3 Important Code Families

There are a few code families of binary linear codes. As we have seen, there are the families of repetition codes, single parity check codes and Hamming codes. All three code families have the property that their code rates approach either $R \to 0$ or $R \to 1$ as the code length increases. Here we will introduce three more code families, namely, simplex codes, Hadamard codes and Reed-Muller codes. The last two code families allow us to construct codes with code rates that are not close to zero or one for large code length.
5.3.1 Simplex Codes

Simplex codes were among the first known codes for error correction. They are often called maximum length codes.

Definition 5.13 (simplex codes) The $(2^m - 1, m, 2^{m-1})$ simplex code is the dual code of the $(2^m - 1, 2^m - m - 1, 3)$ Hamming code. The weight enumerator of the simplex code is
\[ A(W) = 1 + (2^m - 1)\, W^{2^{m-1}}. \tag{5.56} \]
The main problem with respect to system applications is the low rate of simplex codes. Nevertheless, if low rate codes are required they are a good choice.
5.3.2 Hadamard Codes

Hadamard codes were among the first codes that were used for error correction.

Definition 5.14 (Hadamard matrix) A square matrix $X$ of order $m$ with elements $\pm 1$ such that
\[ X X^T = m I_{m \times m} \tag{5.57} \]
is called a Hadamard matrix.

It is possible to show that the Kronecker product$^2$ of two Hadamard matrices is again a Hadamard matrix. Starting with the $m = 2$ Hadamard matrix
\[ X_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{5.59} \]
we can construct Hadamard matrices of large dimensions. It is worth mentioning that there exist other methods that construct Hadamard matrices whose order is not a power of two. Now we are prepared to define Hadamard codes.

$^2$ The Kronecker product of an $m \times m$ matrix $A$ and an $n \times n$ matrix $B$ is the $mn \times mn$ matrix given by
\[ A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1m}B \\ a_{21}B & a_{22}B & \cdots & a_{2m}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mm}B \end{pmatrix}. \tag{5.58} \]
Definition 5.15 (Hadamard codes) Let $X_m$ be a Hadamard matrix of order $m$. After replacing $-1$ by $0$ in $X_m$ and $-X_m$, the rows of both matrices form a length $m$ Hadamard code of $2m$ code words with minimum distance $d_m = m/2$.

The minimum distance $d_m = m/2$ follows from the property that any two rows of a Hadamard matrix differ in half of the positions.

Example 5.5 (Hadamard code) To construct the $m = 4$ Hadamard code we compute
\[ X_4 = X_2 \otimes X_2 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}, \tag{5.60} \]
replace $-1$ by $0$ in $X_4$ and $-X_4$ and obtain the code
\[ C = \{1111, 1010, 1100, 1001, 0000, 0101, 0011, 0110\}. \tag{5.61} \]
This is a linear binary (4, 3, 2) code with the systematic generator matrix
\[ G = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}, \tag{5.62} \]
i.e., it is a (4, 3, 2) single parity check code.
Exercise 5.5 (Hadamard code) Show that the length m = 8 Hadamard code is equivalent to the extended (8, 4, 4) Hamming code. All Hadamard codes obtained by Hadamard matrices where m is a power of 2 are linear.
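The construction of Example 5.5 is easy to reproduce. The following Python sketch builds Hadamard matrices by repeated Kronecker products of $X_2$, maps them to code words as described in Definition 5.15, and determines the minimum distance by exhaustive comparison. The helper names are illustrative and not part of these notes.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def hadamard_code(x):
    """Rows of X and -X with -1 replaced by 0, returned as bit tuples."""
    rows = x + [[-v for v in row] for row in x]
    return [tuple(1 if v == 1 else 0 for v in row) for row in rows]


def minimum_distance(code):
    """Smallest Hamming distance between two different code words."""
    return min(sum(u != v for u, v in zip(a, b))
               for i, a in enumerate(code) for b in code[i + 1:])


X2 = [[1, 1], [1, -1]]
X4 = kronecker(X2, X2)
X8 = kronecker(X2, X4)

C4 = hadamard_code(X4)
C8 = hadamard_code(X8)

if __name__ == "__main__":
    print(["".join(map(str, c)) for c in C4])
    print(len(C8), minimum_distance(C8))
```

The length 8 code obtained in this way has 16 code words, which is the starting point for Exercise 5.5.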
5.3.3 Reed-Muller Codes

There exist many different ways to define Reed-Muller codes. Here we use the definition presented in [CC82]:

Definition 5.16 (RM codes) Let $c_0$ be the length $2^m$ all-one vector and let $c_1, c_2, \ldots, c_m$ be the rows of a matrix with all possible $m$-tuples as columns. The $r$th order $\left(2^m, \sum_{i=0}^{r} \binom{m}{i}, 2^{m-r}\right)$ Reed-Muller (RM) code has a generator matrix whose rows are the vectors $c_0, c_1, c_2, \ldots, c_m$ and all of their vector products$^3$ taken $r$ or fewer at a time.

$^3$ Here we define the vector product as $ab = (a_0 b_0, a_1 b_1, \ldots, a_{n-1} b_{n-1})$ with $a = (a_0, a_1, \ldots, a_{n-1})$ and $b = (b_0, b_1, \ldots, b_{n-1})$.

Example 5.6 (RM codes) The basis vectors for the Reed-Muller codes of length $n = 8$, i.e., $m = 3$, are given by
\begin{align*}
c_0 &= (1\,1\,1\,1\,1\,1\,1\,1) \\
c_1 &= (0\,0\,0\,0\,1\,1\,1\,1) \\
c_2 &= (0\,0\,1\,1\,0\,0\,1\,1) \\
c_3 &= (0\,1\,0\,1\,0\,1\,0\,1) \\
c_1 c_2 &= (0\,0\,0\,0\,0\,0\,1\,1) \\
c_1 c_3 &= (0\,0\,0\,0\,0\,1\,0\,1) \\
c_2 c_3 &= (0\,0\,0\,1\,0\,0\,0\,1) \\
c_1 c_2 c_3 &= (0\,0\,0\,0\,0\,0\,0\,1).
\end{align*}
It is easy to see that the 0th order RM code with generator matrix
\[ G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} \tag{5.63} \]
is the (8, 1, 8) repetition code. The generator matrix of the first order (8, 4, 4) RM code is
\[ G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}. \tag{5.64} \]
This code is equivalent to the extended (8, 4, 4) Hamming code. A generator matrix of the second order (8, 7, 2) RM code is given by
\[ G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{pmatrix}. \tag{5.65} \]
This code is equivalent to the (8, 7, 2) single parity check code.

Reed-Muller codes are an important class of codes. They are highly structured and can be related to other code families, e.g., see [Bos99]. RM codes can be decoded by a simple decoding method called threshold decoding and were among the first codes used in system applications.
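Definition 5.16 translates directly into a small program. The following Python sketch builds the generator matrix of the $r$th order RM code of length $2^m$ from the basis vectors and their products, and checks dimension and minimum distance by enumerating all code words. The function names are illustrative, and the exhaustive search is only meant for short codes.

```python
from itertools import combinations, product


def rm_generator(r, m):
    """Generator matrix (list of rows) of the r-th order RM code of length 2^m."""
    n = 2 ** m
    columns = list(product((0, 1), repeat=m))           # all m-tuples as columns
    basis = [[col[j] for col in columns] for j in range(m)]  # c_1, ..., c_m
    rows = [[1] * n]                                     # c_0, the all-one vector
    for order in range(1, r + 1):
        for subset in combinations(range(m), order):
            row = [1] * n
            for j in subset:                             # componentwise vector product
                row = [a * b for a, b in zip(row, basis[j])]
            rows.append(row)
    return rows


def minimum_weight(gen):
    """Minimum weight of the non-zero code words generated by `gen` over GF(2)."""
    best = len(gen[0])
    for u in product((0, 1), repeat=len(gen)):
        word = [0] * len(gen[0])
        for bit, row in zip(u, gen):
            if bit:
                word = [(a + b) % 2 for a, b in zip(word, row)]
        if any(word):
            best = min(best, sum(word))
    return best


if __name__ == "__main__":
    for r in range(3):
        G = rm_generator(r, 3)
        print(r, len(G), minimum_weight(G))
```

For $m = 3$ the three calls reproduce the parameters (8, 1, 8), (8, 4, 4) and (8, 7, 2) of Example 5.6.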
5.4 Decoding Aspects

Generally, we have to distinguish between the decoding rule, the decoding technique, and the decoder implementation. The decoding rule is a mathematical formulation of the decoding principle. The decoding technique shows how to realize a decoding rule in an efficient and effective way. This again must be distinguished from a particular decoder implementation in hardware or software. Here we will present the most important decoding rules and decoding techniques. We will not discuss how to actually implement such a decoder.
5.4.1 Decoding Rules

Suppose that we have been given a rate $R = k/n$ binary block code to communicate over a given channel, and our task is to design a decoder. The channel is fully described by the conditional probability function $f_{R|C}(r|c)$, where $c = c_0 c_1 \ldots c_{n-1}$ denotes the code word that is sent over the channel and $r = r_0 r_1 \ldots r_{n-1}$ is the received vector. Our objective is to make this information transmission system as reliable as possible. This is achieved by minimizing either the block error probability $P(\hat{c} \ne c)$ or the bit error probabilities $P(\hat{u}_i \ne u_i)$, $0 \le i \le k-1$, at the output of the decoder, where $\hat{c}$ denotes the estimated code word and $\hat{u}_i$ the $i$th estimated information symbol. In general, it is not possible to achieve both objectives simultaneously. Let us formulate both decoding rules.

Optimal sequence estimation. Minimizing the block error probability $P(\hat{c} \ne c)$ is often called sequence estimation. The corresponding decoding rule is
\[ \hat{c} = \arg\max_{c \in C} \{ f_{C|R}(c|r) \}. \tag{5.66} \]
We rewrite the conditional probability function $f_{C|R}(c|r)$ by using the rule of Bayes
\[ f_{C|R}(c|r) = \frac{f_C(c)\, f_{R|C}(r|c)}{f_R(r)}. \tag{5.67} \]
Since $f_R(r)$ is not relevant for the maximization, we obtain the decoding rule for maximum a posteriori probability (MAP) sequence estimation
\[ \hat{c} = \arg\max_{c \in C} \{ f_C(c)\, f_{R|C}(r|c) \}. \tag{5.68} \]
We call $f_C(c)$ the a priori probability of $c$, i.e., the probability that the code word $c$ is sent. In contrast to this, $f_{C|R}(c|r)$ is called the a posteriori probability of $c$. Usually, it is assumed that all code words are equally probable, and we obtain the decoding rule for maximum likelihood (ML) sequence estimation
\[ \hat{c} = \arg\max_{c \in C} \{ f_{R|C}(r|c) \}. \tag{5.69} \]
Unfortunately, the name ML sequence estimation is somewhat misleading: if the code words are not equally probable a priori, it is MAP sequence estimation, and not ML sequence estimation, that minimizes the block error probability. Since we only consider memoryless channels, we have
\[ f_{R|C}(r|c) = \prod_{i=0}^{n-1} f_{R_i|C_i}(r_i|c_i) \tag{5.70} \]
where for the BSC with crossover probability $p$
\[ f_{R_i|C_i}(r_i|c_i) = \begin{cases} p, & r_i \ne c_i \\ 1 - p, & r_i = c_i \end{cases} \tag{5.71} \]
and for the AWGN channel with $\sigma^2 = N_0/2$ and $c_i = \pm\sqrt{E_s}$
\[ f_{R_i|C_i}(r_i|c_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(r_i - c_i)^2}{2\sigma^2}}. \tag{5.72} \]
By taking the logarithm and then multiplying and adding appropriate constants we can rewrite (5.69) for the BSC (with $p < 1/2$) as
\[ \hat{c} = \arg\min_{c \in C} \sum_{i=0}^{n-1} d_H(r_i, c_i) \tag{5.73} \]
and for the AWGN channel as
\[ \hat{c} = \arg\min_{c \in C} \sum_{i=0}^{n-1} d_E(r_i, c_i) \tag{5.74} \]
where $d_E(\cdot)$ denotes the squared Euclidean distance, which is defined by
\[ d_E(x, y) = (x - y)^2. \tag{5.75} \]

Exercise 5.6 (Hamming metric) Derive (5.73) from (5.69).

Example 5.7 (ML sequence estimation) Suppose the (7, 4, 3) Hamming code is used to communicate over the BSC with crossover probability $p$. Let $r = (0\,0\,1\,1\,1\,0\,0)$ be the received word.
word                              $f_{R|C}(r|c_i)$     $d_H(c_i, r)$
$r$      = (0 0 1 1 1 0 0)
$c_1$    = (0 0 0 0 0 0 0)        $p^3(1-p)^4$         3
$c_2$    = (0 0 0 1 1 1 1)        $p^3(1-p)^4$         3
$c_3$    = (0 0 1 0 1 1 0)        $p^2(1-p)^5$         2
$c_4$    = (0 0 1 1 0 0 1)        $p^2(1-p)^5$         2
$c_5$    = (0 1 0 0 1 0 1)        $p^4(1-p)^3$         4
$c_6$    = (0 1 0 1 0 1 0)        $p^4(1-p)^3$         4
$c_7$    = (0 1 1 0 0 1 1)        $p^5(1-p)^2$         5
$c_8$    = (0 1 1 1 1 0 0)        $p(1-p)^6$           1
$c_9$    = (1 0 0 0 0 1 1)        $p^6(1-p)$           6
$c_{10}$ = (1 0 0 1 1 0 0)        $p^2(1-p)^5$         2
$c_{11}$ = (1 0 1 0 1 0 1)        $p^3(1-p)^4$         3
$c_{12}$ = (1 0 1 1 0 1 0)        $p^3(1-p)^4$         3
$c_{13}$ = (1 1 0 0 1 1 0)        $p^5(1-p)^2$         5
$c_{14}$ = (1 1 0 1 0 0 1)        $p^5(1-p)^2$         5
$c_{15}$ = (1 1 1 0 0 0 0)        $p^4(1-p)^3$         4
$c_{16}$ = (1 1 1 1 1 1 1)        $p^4(1-p)^3$         4
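The table of this example can be checked by brute force. The following Python sketch enumerates the code words of the (7, 4, 3) Hamming code, computes their Hamming distances to $r$ and applies the minimum distance form (5.73) of ML sequence estimation. It assumes that the code is the one described by the parity-check matrix given later in Exercise 5.7; the helper names are illustrative.

```python
from itertools import product

# Parity-check matrix of the (7, 4, 3) Hamming code (assumed, see Exercise 5.7).
H = [(1, 0, 1, 0, 1, 0, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (0, 0, 0, 1, 1, 1, 1)]


def is_code_word(word):
    """A word is a code word if its syndrome is zero."""
    return all(sum(h * x for h, x in zip(row, word)) % 2 == 0 for row in H)


# product() enumerates words in lexicographic order, i.e., c_1, ..., c_16.
CODE = [w for w in product((0, 1), repeat=7) if is_code_word(w)]


def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))


def ml_decode_bsc(r):
    """ML sequence estimation on the BSC (p < 1/2): the closest code word."""
    return min(CODE, key=lambda c: hamming_distance(c, r))


r = (0, 0, 1, 1, 1, 0, 0)
distances = [hamming_distance(c, r) for c in CODE]
c_hat = ml_decode_bsc(r)

if __name__ == "__main__":
    for i, (c, d) in enumerate(zip(CODE, distances), start=1):
        print(f"c{i:<2} = {c}  d_H = {d}  f(r|c) = p^{d} (1-p)^{7 - d}")
    print("estimate:", c_hat)
```

The exhaustive search over all $2^k$ code words is only feasible for very short codes; the trellis-based techniques of this chapter avoid it.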
Then $\hat{c} = c_8 = (0\,1\,1\,1\,1\,0\,0)$ is the most probable sent code word, i.e., it is most probable that an error occurred at the 2nd code symbol.

Optimal symbol-by-symbol decoding. Minimizing the bit error probability $P(\hat{u}_i \ne u_i)$ is often called symbol-by-symbol decoding. The corresponding decoding rule is
\[ \hat{u}_i = \arg\max_{u_i} \{ f_{U_i|R}(u_i|r) \}. \tag{5.76} \]
Again we can apply the rule of Bayes and obtain the decoding rule for symbol-by-symbol MAP decoding
\[ \hat{u}_i = \arg\max_{u_i} \{ f_{U_i}(u_i)\, f_{R|U_i}(r|u_i) \}. \tag{5.77} \]
If we drop the a priori probability $f_{U_i}(u_i)$ we obtain the symbol-by-symbol ML decoding rule. Although MAP and ML decoding can be done for both sequence estimation and symbol-by-symbol decoding, the term ML decoding is often used for ML sequence estimation and the term MAP decoding for symbol-by-symbol MAP decoding. Often it is useful to introduce the log likelihood ratio for the $i$th information bit
\[ L(u_i) = \log \frac{f_{U_i|R}(u_i = 0|r)}{f_{U_i|R}(u_i = 1|r)}. \tag{5.78} \]
Then $L(u_i)$ is called the soft output of the decoder. Its sign determines the decoded information bit
\[ \hat{u}_i = \begin{cases} 0, & L(u_i) \ge 0 \\ 1, & L(u_i) < 0 \end{cases} \tag{5.79} \]
and its magnitude can be interpreted as reliability information of the decoded information bit. Using the rule of Bayes we can rewrite the log likelihood ratio as
\[ L(u_i) = \log \frac{\sum_{c \in C_0} f_C(c)\, f_{R|C}(r|c)}{\sum_{c \in C_1} f_C(c)\, f_{R|C}(r|c)} \tag{5.80} \]
where $C_0$ and $C_1$ are the subsets of the code $C$ for which the information bit $u_i$ is 0 and 1, respectively. Then the a posteriori probability of the estimated information symbol $\hat{u}_i$ can be calculated from $L(u_i)$ by
\[ f_{U_i|R}(U_i = 0|r) = \frac{e^{L(u_i)}}{1 + e^{L(u_i)}} \tag{5.81} \]
and
\[ f_{U_i|R}(U_i = 1|r) = 1 - f_{U_i|R}(U_i = 0|r). \tag{5.82} \]
The higher the a posteriori probability, the more reliable is the estimated information symbol $\hat{u}_i$.

Example 5.8 (symbol-by-symbol MAP decoding) Suppose the (7, 4, 3) Hamming code is used to communicate over the BSC with crossover probability $p$. Assume the information bits $u_0 u_1 u_2 u_3$ are systematically encoded to the four code symbols $c_0 c_1 c_2 c_3$ and let $r = (0\,0\,1\,1\,1\,0\,0)$ be the received word, i.e., we consider the same situation as in the previous example. Now we can use the probabilities $f_{R|C}(r|c)$ in the table used for ML sequence estimation to compute the log likelihood values, e.g.,
\begin{align}
L(u_0) &= \log \frac{\sum_{i=1}^{8} f_{R|C}(r|c = c_i)}{\sum_{i=9}^{16} f_{R|C}(r|c = c_i)} \tag{5.83} \\
&= \log \frac{p(1-p)^6 + 2p^2(1-p)^5 + 2p^3(1-p)^4 + 2p^4(1-p)^3 + p^5(1-p)^2}{p^2(1-p)^5 + 2p^3(1-p)^4 + 2p^4(1-p)^3 + 2p^5(1-p)^2 + p^6(1-p)} \tag{5.84} \\
&= \log \frac{1-p}{p} = \log(1-p) - \log p \tag{5.85}
\end{align}
and equivalently,
\begin{align}
L(u_1) &= \log \frac{\sum_{i=1,2,3,4,9,10,11,12} f_{R|C}(r|c = c_i)}{\sum_{i=5,6,7,8,13,14,15,16} f_{R|C}(r|c = c_i)} \tag{5.86} \\
&= \log \frac{3p^2(1-p)^5 + 4p^3(1-p)^4 + p^6(1-p)}{p(1-p)^6 + 4p^4(1-p)^3 + 3p^5(1-p)^2}, \tag{5.87}
\end{align}
and
\begin{align}
L(u_2) &= \log \frac{\sum_{i=1,2,5,6,9,10,13,14} f_{R|C}(r|c = c_i)}{\sum_{i=3,4,7,8,11,12,15,16} f_{R|C}(r|c = c_i)} \tag{5.88} \\
&= \log \frac{p^2(1-p)^5 + 2p^3(1-p)^4 + 2p^4(1-p)^3 + 2p^5(1-p)^2 + p^6(1-p)}{p(1-p)^6 + 2p^2(1-p)^5 + 2p^3(1-p)^4 + 2p^4(1-p)^3 + p^5(1-p)^2} \tag{5.89} \\
&= \log \frac{p}{1-p} = \log p - \log(1-p), \tag{5.90}
\end{align}
and
\begin{align}
L(u_3) &= \log \frac{\sum_{i=1,3,5,7,9,11,13,15} f_{R|C}(r|c = c_i)}{\sum_{i=2,4,6,8,10,12,14,16} f_{R|C}(r|c = c_i)} \tag{5.91} \\
&= \log \frac{p^2(1-p)^5 + 2p^3(1-p)^4 + 2p^4(1-p)^3 + 2p^5(1-p)^2 + p^6(1-p)}{p(1-p)^6 + 2p^2(1-p)^5 + 2p^3(1-p)^4 + 2p^4(1-p)^3 + p^5(1-p)^2} = \log \frac{p}{1-p} \tag{5.92}
\end{align}
where we assumed $f_C(c) = 1/16$. For example, a crossover probability of $p = 0.15$ yields:

$i$    $L(u_i)$    $\hat{u}_i$    $f_{U_i|R}(U_i = \hat{u}_i|r)$
0      1.7346      0              0.8500
1      -0.4490     1              0.6104
2      -1.7346     1              0.8500
3      -1.7346     1              0.8500

where $f_{U_i|R}(U_i = \hat{u}_i|r)$ is the a posteriori probability of $\hat{u}_i$. As can be seen, the estimated information bit $\hat{u}_1$ is less reliable than the others. This corresponds to the position where the error occurred.

An important aspect of the decoding rules described above is that they allow soft-input soft-output decoding. Soft-input decoding means that the decoder is able to use reliability information for decoding. When considering concatenated coding schemes this is a very important feature.
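The log likelihood values of this example can be recomputed directly from (5.80). The following Python sketch sums the likelihoods over the subsets $C_0$ and $C_1$ by brute force, assuming equally probable code words, a BSC with $p = 0.15$, and the (7, 4, 3) Hamming code described by the parity-check matrix of Exercise 5.7. The function names are illustrative.

```python
from itertools import product
from math import exp, log

# Parity-check matrix of the (7, 4, 3) Hamming code (assumed, see Exercise 5.7).
H = [(1, 0, 1, 0, 1, 0, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (0, 0, 0, 1, 1, 1, 1)]

CODE = [w for w in product((0, 1), repeat=7)
        if all(sum(h * x for h, x in zip(row, w)) % 2 == 0 for row in H)]


def likelihood(r, c, p):
    """f_{R|C}(r|c) for the BSC with crossover probability p."""
    d = sum(a != b for a, b in zip(r, c))
    return p ** d * (1 - p) ** (len(r) - d)


def llr(r, i, p):
    """L(u_i) for equally probable code words and systematic encoding."""
    num = sum(likelihood(r, c, p) for c in CODE if c[i] == 0)
    den = sum(likelihood(r, c, p) for c in CODE if c[i] == 1)
    return log(num / den)


r = (0, 0, 1, 1, 1, 0, 0)
p = 0.15
L = [llr(r, i, p) for i in range(4)]
u_hat = [0 if value >= 0 else 1 for value in L]
# A posteriori probability of the decided value: e^|L| / (1 + e^|L|).
posterior = [exp(abs(value)) / (1 + exp(abs(value))) for value in L]

if __name__ == "__main__":
    for i in range(4):
        print(i, round(L[i], 4), u_hat[i], round(posterior[i], 4))
```

The magnitude of $L(u_i)$ is smallest for the information bit at the position where the most likely error occurred, which is the reliability information a soft-output decoder passes on.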
Some other decoding rules. The two decoding rules above are optimal in the sense that they minimize the corresponding error probability. There exists a wide range of other decoding rules that are based on the distance between the received vector $r$ and the code words. In the previous discussion we have already introduced the so-called minimum distance decoding rule
\[ \hat{c} = \arg\min_{c \in C} \{ d_H(c, r) \}. \tag{5.93} \]
In the case of a BSC minimum distance decoding is equivalent to ML sequence estimation. This equivalence can be used to reduce the decoding complexity of an ML sequence estimation decoder. A special case of the minimum distance decoding rule above is the bounded minimum distance (BMD) decoding rule. The corresponding decoder output is
\[ \{ \hat{c} \mid d_H(\hat{c}, r) \le \lfloor (d_m - 1)/2 \rfloor,\ \hat{c} \in C \}. \tag{5.94} \]
This is either the empty set or a single code word. Any received vector that lies within a sphere of radius $\lfloor (d_m - 1)/2 \rfloor$ around a code word of $C$ is decoded to the corresponding code word. Received vectors outside these correction spheres generate an empty set and, hence, cause a decoding failure. Clearly, BMD decoding is suboptimal. Nevertheless, we will see later that it is of great practical importance. We can generalize this concept by increasing the radius of the correction sphere to some $\rho > \lfloor (d_m - 1)/2 \rfloor$ and obtain the list decoding rule
\[ \{ \hat{c} \mid d_H(\hat{c}, r) \le \rho,\ \hat{c} \in C \}. \tag{5.95} \]
This is either the empty set or a set of one or more code words. List decoding can be applied in coding schemes that concatenate two or more codes. It is straightforward to generalize the decoding methods described above to other distance measures such as the squared Euclidean distance. Nevertheless, the known decoding techniques that allow an efficient realization of these decoding rules are based upon the Hamming metric.
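The difference between bounded minimum distance decoding and list decoding can be illustrated by brute force. The following Python sketch implements both rules as searches over an explicit list of code words; the function names and the convention of returning None for a decoding failure are assumptions made for this illustration.

```python
from itertools import product


def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))


def list_decode(code, r, radius):
    """All code words within Hamming distance `radius` of r."""
    return [c for c in code if hamming_distance(c, r) <= radius]


def bmd_decode(code, r, d_min):
    """Bounded minimum distance decoding; None signals a decoding failure."""
    candidates = list_decode(code, r, (d_min - 1) // 2)
    return candidates[0] if candidates else None


# (7, 4, 3) Hamming code from the parity-check matrix of Exercise 5.7 (assumed).
H = [(1, 0, 1, 0, 1, 0, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (0, 0, 0, 1, 1, 1, 1)]
hamming_code = [w for w in product((0, 1), repeat=7)
                if all(sum(h * x for h, x in zip(row, w)) % 2 == 0 for row in H)]

# (3, 2, 2) single parity check code: correction radius 0.
spc_code = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 1)]

if __name__ == "__main__":
    r = (0, 0, 1, 1, 1, 0, 0)
    print(bmd_decode(hamming_code, r, 3))    # single code word within radius 1
    print(list_decode(hamming_code, r, 2))   # several candidates within radius 2
    print(bmd_decode(spc_code, (1, 0, 0), 2))  # decoding failure
```

Since the spheres of radius $\lfloor (d_m - 1)/2 \rfloor$ are disjoint, the candidate list of the BMD decoder never contains more than one code word.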
5.4.2 Trellis Representation of Block Codes

The code trellis is a graphical representation of the code. Two important decoding techniques that realize ML sequence estimation and symbol-by-symbol MAP decoding operate in the code trellis. We start with the following definition:

Definition 5.17 (code trellis) The code trellis $T(S, E)$ of depth $n$ is an edge-labeled directed graph with a set of states $S$ and a set of edges $E$. The sequence of edge labels along each path of length $n$ in the trellis corresponds to a length $n$ code word. The set of all such sequences corresponds to the code $C$.

This is best explained by considering an example:

Example 5.9 (trivial code trellis) In Figure 5.3 the trivial trellis of the (3, 2, 2) single parity check code $C = \{000, 011, 110, 101\}$ is depicted. The trellis is a directed graph and it is assumed that its edges are directed from left to right. The trellis consists of 10 states and 12 edges. Starting from the state on the left-hand side we notice that each code word corresponds to the labeling of exactly one path through the trellis.

[Figure 5.3: trellis diagram with four parallel paths whose edges are labeled 0 and 1.]
Figure 5.3: Trivial trellis of the (3, 2, 2) single parity check code.

The state set $S$ of a depth $n$ code trellis can be partitioned into $n + 1$ disjoint subsets
\[ S = S_0 \cup S_1 \cup \ldots \cup S_n. \tag{5.96} \]
The subset $S_i$, $0 \le i \le n$, is the set of vertices at depth $i$. The subsets $S_0$ and $S_n$ consist of a single state and are called the root and the toor, respectively. The set of edges can be partitioned into $n$ disjoint subsets
\[ E = E_1 \cup E_2 \cup \ldots \cup E_n \tag{5.97} \]
where any edge in $E_i$, $1 \le i \le n$, starts in a state taken from $S_{i-1}$ and ends in a state taken from $S_i$. Any edge is labeled with a symbol taken from $\mathbb{F}_2$. The edges are directed from left to right. Any path through the trellis starts in the root, takes $n$ state transitions (edges) and terminates in the toor. The edge labeling associated with such a path corresponds to a code word. For each code word of $C$ there exists exactly one such path.

Given a length $n$ block code $C$, we obtain a code trellis by drawing $|C|$ parallel paths that start in the same state (root) and end in the same state (toor). Such a code trellis is called the trivial trellis. The same code can be represented by another trellis that can have a significantly smaller number of edges and states. Let us consider an example:

Example 5.10 (code trellis) Figure 5.4 shows a code trellis of the (3, 2, 2) single parity check code $C = \{000, 011, 110, 101\}$. Notice that there again exist four paths of length three with an edge labeling corresponding to the code words of $C$. Compared to the trivial trellis in Figure 5.3 the number of states is reduced from 10 to 6 and the number of edges from 12 to 8. This reduction is achieved by paths that share edges.
[Figure 5.4: trellis diagram with edge labels 0 and 1.]
Figure 5.4: A code trellis of the (3, 2, 2) single parity check code.

Now the question arises of what the minimal number of states and edges is that is required to draw the trellis of a given code. How can we obtain such a code trellis? Before we answer that question, we have to introduce complexity measures. Let $C$ be a block code of length $n$ and $T(S, E)$ be a trellis of depth $n$ that represents $C$. Let $S_0, S_1, \ldots, S_n$ be the state sets at depth $0 \le i \le n$. Then the following state complexity measures of $T(S, E)$ are defined:

state-cardinality profile: $|S_0|, |S_1|, \ldots, |S_n|$
maximal number of states: $S_{\max} = \max\{|S_0|, |S_1|, \ldots, |S_n|\}$
total number of states: $|S| = |S_0| + |S_1| + \ldots + |S_n|$

Equivalently, let $E_1, E_2, \ldots, E_n$ be the edge sets at depth $1 \le i \le n$. Then the following edge complexity measures of $T(S, E)$ are defined:

edge-cardinality profile: $|E_1|, |E_2|, \ldots, |E_n|$
maximal number of edges: $E_{\max} = \max\{|E_1|, |E_2|, \ldots, |E_n|\}$
total number of edges: $|E| = |E_1| + |E_2| + \ldots + |E_n|$

Now we are prepared to define:

Definition 5.18 (minimal trellis) A trellis for a code $C$ of length $n$ is minimal if it satisfies the following property: for each $i = 0, 1, \ldots, n$ the number of states at depth $i$ is less than or equal to the number of states at depth $i$ in any other trellis for $C$.

Starting with this definition, minimal trellises have been studied extensively. In the following we summarize some of the most important properties:

existence: Every linear block code has a minimal trellis.
uniqueness: The minimal trellis of a linear block code is unique up to isomorphism.
complexity: The minimal trellis of a linear block code simultaneously minimizes all state and edge complexity measures that were introduced above.

There exist various ways to construct a minimal trellis for a given block code. Two algorithms will be presented here, viz., the merge algorithm and the syndrome trellis construction.

Merge algorithm. This is a simple algorithm that generates a minimal code trellis from any given trellis of the code $C$. It basically consists of two steps, viz., a forward and a backward recursion. In the forward recursion we merge all states at depth $t$, $1 \le t < n$, that arise from the same state at depth $t - 1$ and have the same edge labeling. In the backward recursion exactly the same merging procedure is performed, but now starting from the toor of the code trellis.

Example 5.11 (merge algorithm) Consider the (5, 3, 2) linear binary block code
\[ C = \{00000, 00101, 01011, 01110, 10010, 10111, 11001, 11100\}. \tag{5.98} \]
Figure 5.5 depicts the merge algorithm starting from the trivial trellis of $C$.

Figure 5.5: Example for merge algorithm: forward recursion starting from the trivial trellis on the left-hand side and backward recursion on the right-hand side.

Syndrome trellis construction. This trellis construction uses the parity check matrix $H$ of the code. Since the syndrome $s$ of each code word $c \in C$ is $0$, we can write
\begin{align}
s &= c H^T \tag{5.99} \\
&= (x_0\ x_1\ \ldots\ x_{n-1}) \begin{pmatrix} h_1^T \\ h_2^T \\ \vdots \\ h_n^T \end{pmatrix} \tag{5.100} \\
&= \sum_{i=1}^{n} x_{i-1} h_i^T \tag{5.101} \\
&= 0 \tag{5.102}
\end{align}
where $h_i$ is the $i$th column of the parity check matrix $H$ and $x_{i-1}$ is the $i$th code symbol of the code word $c$. Now we can define
\[ s_t = x_0 h_1^T + x_1 h_2^T + \ldots + x_{t-1} h_t^T = \sum_{i=1}^{t} x_{i-1} h_i^T \tag{5.103} \]
and obtain
\begin{align}
s_0 &= 0 \notag \\
s_t &= s_{t-1} + x_{t-1} h_t^T, \quad 1 \le t \le n - 1 \tag{5.104} \\
s_n &= s = 0. \notag
\end{align}
This can easily be used to construct the so-called syndrome trellis of the code. Starting from the root $s_0$ we associate with each binary $(n-k)$-tuple $s_t$ a state in the trellis at depth $t$. When generating the states $s_{t+1}$ at depth $t + 1$ we start from all states $s_t$ at depth $t$ and draw two edges, one for $x_t = 0$ and another for $x_t = 1$, to the corresponding states at depth $t + 1$. When reaching depth $n$, we trace back and erase all states and edges that lead exclusively to non-zero states $s_n \ne 0$. It can be shown that the syndrome trellis is the minimal trellis of the code. From the syndrome trellis construction we can derive an upper bound on the state complexity of the minimal trellis of a linear binary code. It can be shown that the state cardinality at depth $i$ satisfies
\[ |S_i| \le 2^{\min\{k,\, n-k\}}. \tag{5.105} \]
This is called the Wolf bound. It shows that rate $R = 1/2$ codes can have the largest state cardinality. Nevertheless, there exist codes that have a significantly smaller state complexity than the bound indicates.

Example 5.12 (syndrome trellis construction) Consider the (5, 3, 2) linear binary block code $C = \{00000, 00101, 01011, 01110, 10010, 10111, 11001, 11100\}$ with parity check matrix
\[ H = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \end{pmatrix}. \tag{5.106} \]
Then the syndrome $s$ of a code word $c$ is given by
\[ s = \sum_{i=1}^{5} x_{i-1} h_i^T = x_0 (1\ 0) + x_1 (1\ 1) + x_2 (0\ 1) + x_3 (1\ 0) + x_4 (0\ 1). \tag{5.107} \]
Starting from the root $s_0 = (0\ 0)$, the first five steps to generate the code trellis of depth 5 are shown in Figure 5.6 together with the final step of erasing all states and edges that do not lead to the zero state at depth 5.
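The construction can be sketched in a few lines of Python. The following code performs the forward pass over all reachable syndromes and the backward pass that erases everything not leading to the zero syndrome, here for the parity check matrix of Example 5.12. The data layout (a list of edge lists, one per trellis section) and all names are choices made for this illustration.

```python
def syndrome_trellis(H):
    """Build the syndrome trellis of the code with parity check matrix H.

    Returns (root, sections); sections[t] lists the edges
    (state, symbol, next_state) between depth t and depth t + 1.
    """
    n = len(H[0])
    columns = [tuple(row[t] for row in H) for t in range(n)]
    root = tuple(0 for _ in H)

    def step(state, symbol, t):
        # s_{t+1} = s_t + x_t h_{t+1}^T over GF(2)
        return tuple((s + symbol * h) % 2 for s, h in zip(state, columns[t]))

    # Forward pass: all syndromes reachable from the root.
    reachable = [{root}]
    for t in range(n):
        reachable.append({step(s, x, t) for s in reachable[t] for x in (0, 1)})

    # Backward pass: keep only edges that still lead to the zero syndrome.
    alive = {root}
    sections = [None] * n
    for t in range(n - 1, -1, -1):
        sections[t] = [(s, x, step(s, x, t))
                       for s in sorted(reachable[t]) for x in (0, 1)
                       if step(s, x, t) in alive]
        alive = {s for s, _, _ in sections[t]}
    return root, sections


def state_profile(sections):
    """State-cardinality profile |S_0|, ..., |S_n|."""
    return [len({s for s, _, _ in sec}) for sec in sections] + [1]


def code_words(root, sections):
    """Edge label sequences of all paths from the root to the toor."""
    paths = [((), root)]
    for sec in sections:
        paths = [(word + (x,), nxt) for word, state in paths
                 for s, x, nxt in sec if s == state]
    return sorted("".join(map(str, word)) for word, _ in paths)


H = [(1, 1, 0, 1, 0),
     (0, 1, 1, 0, 1)]
root, sections = syndrome_trellis(H)
profile = state_profile(sections)
words = code_words(root, sections)

if __name__ == "__main__":
    print("state profile:", profile)
    print("edge profile: ", [len(sec) for sec in sections])
    print("code words:   ", words)
```

For this code the maximal number of states equals $2^{\min\{k, n-k\}} = 4$, i.e., the Wolf bound is met with equality.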
[Figure 5.6: six trellis diagrams over the states 00, 01, 10, 11.]
Figure 5.6: Example for syndrome trellis construction in six steps depicted from top left to bottom right.

Exercise 5.7 (syndrome trellis) Construct the minimal trellis of the (7, 4, 3) Hamming code using the parity-check matrix
\[ H = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}. \tag{5.108} \]
The corresponding trellis is depicted in Figure 5.7.

[Figure 5.7: trellis diagram over the states 000, 100, 010, 110, 001, 101, 011, 111.]
Figure 5.7: Syndrome trellis of the (7, 4, 3) Hamming code.

Finally, we would like to mention two important aspects when dealing with code trellises. The first is the so-called permutation problem. It is known that equivalent codes can have a significantly different trellis complexity. Finding, among the set of equivalent codes, the code with the smallest minimal trellis complexity is a difficult problem, and its complexity is known to be non-polynomial in the code length.

Exercise 5.8 (permutation problem) Show that the 1st order rate $R = 1/2$ length $n = 8$ Reed-Muller code and the (8, 4, 4) extended Hamming code are equivalent. Compare the minimal trellis complexity of both codes.

More such examples can be constructed by comparing the trellis complexity of Reed-Muller codes and BCH codes. These two code families contain a large set of equivalent codes that have a significantly different minimal trellis complexity.

Another interesting aspect of code trellises is the so-called sectionalization problem. Until now we used single binary symbols for the edge labels. In general, we might use tuples of binary digits. This can have a great impact on the trellis complexity. There exists an algorithm that constructs a sectionalized trellis that optimizes a given complexity measure.

Exercise 5.9 (sectionalization problem) Compare the trellises of the (8, 4, 4) extended Hamming code with a sectionalization of 1 and 2.

Reed-Muller codes are highly structured and it is known that sectionalization can have a great impact on the minimal trellis complexity of these codes.
5.4.3 Efficient Optimum Decoding Techniques

There exist two well-known algorithms, namely, the Viterbi algorithm and the BCJR algorithm, that operate in the code trellis and enable an efficient and effective realization of MAP/ML sequence estimation and symbol-by-symbol MAP/ML decoding, respectively.

Viterbi algorithm. The Viterbi algorithm performs MAP/ML sequence estimation. Consider a code trellis of a length $n$ binary block code with state-cardinality profile $|S_0|, |S_1|, \ldots, |S_n|$ and assume we have received $r$. Then the $1 \times |S_t|$ state metric at depth $t$,
\[ \Lambda_t = (\Lambda_{t,1}, \Lambda_{t,2}, \ldots, \Lambda_{t,|S_t|}), \tag{5.109} \]
is obtained by$^4$
\[ \Lambda_t = \Lambda_{t-1} \Gamma_t, \quad \text{with } \Lambda_0 = 0, \tag{5.110} \]
where $\Gamma_t$ is the $|S_{t-1}| \times |S_t|$ state transition matrix
\[ \Gamma_t = (\gamma_{t,ij}) \tag{5.111} \]
with elements
\[ \gamma_{t,ij} = \mathrm{dist}(r_{t-1}, \text{edge label between } i \text{ and } j). \tag{5.112} \]

$^4$ Here we consider a redefined matrix multiplication. Let $A$ be an $m \times n$ matrix and $B$ an $n \times m$ matrix. Then, according to the usual definition of matrix multiplication, we obtain the $m \times m$ matrix $C = AB$ by
\[ c_{ij} = \sum_{k} a_{ik} b_{kj}. \]
In the following we consider the redefined matrix multiplication where $C = AB$ is obtained by
\[ c_{ij} = \min_{k} \{ a_{ik} + b_{kj} \}. \]

Hence, $\gamma_{t,ij}$ is the distance between the received symbol $r_{t-1}$ and the label of the edge that connects state $i$ at depth $t - 1$ with state $j$ at depth $t$. If no edge exists between the corresponding states, then $\gamma_{t,ij} = \infty$. Due to the redefined matrix multiplication, the state metric $\Lambda_{t,i}$, $1 \le i \le |S_t|$, is the minimal distance of all paths that start in the root and end at depth $t$ in state $i$. Hence, we have
\[ \Lambda_n = \min_{c \in C} \{ \mathrm{dist}(r, c) \}. \tag{5.113} \]
We are interested in the argument of this minimization, $\hat{c} = \arg\min_{c \in C} \{\mathrm{dist}(r, c)\}$, i.e., in the path from the root to the toor that corresponds to $\hat{c}$. Therefore, we use the survivor field
\[ \mathrm{sur} = (\mathrm{sur}_1, \mathrm{sur}_2, \ldots, \mathrm{sur}_n), \tag{5.114} \]
where the $1 \times |S_t|$ vector $\mathrm{sur}_t$ is given by
\[ \mathrm{sur}_{t,i} = \arg\min_{k} \{ \Lambda_{t-1,k} + \gamma_{t,ki} \}, \tag{5.115} \]
to remember the survivor states at each depth $t$ in the trellis. Then the survivor field can be used, starting at the toor, to trace back and find the path to the root that corresponds to $\hat{c}$. Let us consider an example.
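Example 5.13 (Viterbi decoding) Suppose the (7, 4, 3) Hamming code is transmitted over the BSC and let $r = (0\,0\,1\,1\,1\,0\,0)$ be the received word. The minimal trellis is depicted in Figure 5.7. In Figure 5.8 the steps performed by the Viterbi algorithm are illustrated. Here the algorithm performs seven steps. Each step is illustrated, starting at the upper left-hand side. The trellis at the lower right-hand side depicts the final step. The red paths are the survivors at each state. The estimated code sequence $\hat{c} = (0\,1\,1\,1\,1\,0\,0)$ corresponds to the survivor of the toor. With the states at each depth numbered consecutively in the order 000, 100, 010, 110, 001, 101, 011, 111, the state metrics are
\begin{align}
\Lambda_1 &= \Lambda_0 \Gamma_1 = (0) \begin{pmatrix} 0 & 1 \end{pmatrix} = (0\ 1) \tag{5.116} \\
\Lambda_2 &= \Lambda_1 \Gamma_2 = (0\ 1) \begin{pmatrix} 0 & \infty & 1 & \infty \\ \infty & 0 & \infty & 1 \end{pmatrix} = (0\ 1\ 1\ 2) \tag{5.117} \\
\Lambda_3 &= \Lambda_2 \Gamma_3 = (0\ 1\ 1\ 2) \begin{pmatrix} 1 & \infty & \infty & 0 \\ \infty & 1 & 0 & \infty \\ \infty & 0 & 1 & \infty \\ 0 & \infty & \infty & 1 \end{pmatrix} = (1\ 1\ 1\ 0) \tag{5.118} \\
\Lambda_4 &= \Lambda_3 \Gamma_4 = (1\ 1\ 1\ 0) \begin{pmatrix} 1 & \infty & \infty & \infty & 0 & \infty & \infty & \infty \\ \infty & 1 & \infty & \infty & \infty & 0 & \infty & \infty \\ \infty & \infty & 1 & \infty & \infty & \infty & 0 & \infty \\ \infty & \infty & \infty & 1 & \infty & \infty & \infty & 0 \end{pmatrix} \tag{5.119} \\
&= (2\ 2\ 2\ 1\ 1\ 1\ 1\ 0) \tag{5.120} \\
\Lambda_5 &= \Lambda_4 \Gamma_5 = (2\ 2\ 2\ 1\ 1\ 1\ 1\ 0) \begin{pmatrix} 1 & \infty & \infty & \infty \\ \infty & 1 & \infty & \infty \\ \infty & \infty & \infty & 0 \\ \infty & \infty & 0 & \infty \\ \infty & 0 & \infty & \infty \\ 0 & \infty & \infty & \infty \\ \infty & \infty & 1 & \infty \\ \infty & \infty & \infty & 1 \end{pmatrix} = (1\ 1\ 1\ 1) \tag{5.121} \\
\Lambda_6 &= \Lambda_5 \Gamma_6 = (1\ 1\ 1\ 1) \begin{pmatrix} 0 & \infty \\ \infty & 1 \\ 1 & \infty \\ \infty & 0 \end{pmatrix} = (1\ 1) \tag{5.122} \\
\Lambda_7 &= \Lambda_6 \Gamma_7 = (1\ 1) \begin{pmatrix} 0 \\ 1 \end{pmatrix} = (1). \tag{5.123}
\end{align}
Hence, the distance is $\mathrm{dist}(r, \hat{c}) = 1$. Using the survivor field
\begin{align}
\mathrm{sur}_1 &= (1\ 1), & \mathrm{sur}_2 &= (1\ 2\ 1\ 2), & \mathrm{sur}_3 &= (1\ 3\ 2\ 1), & \mathrm{sur}_4 &= (1\ 2\ 3\ 4\ 1\ 2\ 3\ 4), \notag \\
\mathrm{sur}_5 &= (6\ 5\ 4\ 8), & \mathrm{sur}_6 &= (1\ 4), & \mathrm{sur}_7 &= (1) \tag{5.124}
\end{align}
we can trace back from the toor and obtain the state sequence 1, 1, 6, 2, 3, 1, 1, which corresponds to the path that $\hat{c}$ takes from the toor to the root.

The state transition matrices are sparse and the Viterbi algorithm is the most efficient method to perform the multiplication of state metric and state transition matrix. Its computational complexity is
\[ 2|E| - |S| + 1 \tag{5.125} \]
where $|E|$ and $|S|$ are the number of edges and the number of states in the code trellis, respectively. In particular, in terms of the redefined matrix multiplication, $|E|$ multiplications (ordinary additions) and $|E| - |S| + 1$ additions (ordinary comparisons) are performed. This is a combinatorial complexity measure. The actual complexity of the algorithm depends strongly on its implementation.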
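The recursion of the Viterbi algorithm can be sketched compactly when the sparse state transition matrices are stored as edge lists. The following Python sketch builds the syndrome trellis of the (7, 4, 3) Hamming code from the parity-check matrix of Exercise 5.7, accumulates Hamming distances with the redefined (min, +) operations and traces back through the survivors. The data layout and all names are choices made for this illustration, not the notation of a particular implementation.

```python
import math


def syndrome_trellis(H):
    """Return (root, sections); sections[t] lists edges (state, symbol, next_state)."""
    n = len(H[0])
    columns = [tuple(row[t] for row in H) for t in range(n)]
    root = tuple(0 for _ in H)

    def step(state, symbol, t):
        return tuple((s + symbol * h) % 2 for s, h in zip(state, columns[t]))

    reachable = [{root}]
    for t in range(n):
        reachable.append({step(s, x, t) for s in reachable[t] for x in (0, 1)})
    alive = {root}
    sections = [None] * n
    for t in range(n - 1, -1, -1):
        sections[t] = [(s, x, step(s, x, t))
                       for s in sorted(reachable[t]) for x in (0, 1)
                       if step(s, x, t) in alive]
        alive = {s for s, _, _ in sections[t]}
    return root, sections


def viterbi(root, sections, r):
    """Minimum Hamming distance decoding in the trellis; returns (c_hat, distance)."""
    metric = {root: 0}
    survivors = []  # survivors[t][state] = (previous state, edge label)
    for t, section in enumerate(sections):
        new_metric, survivor = {}, {}
        for state, x, nxt in section:
            candidate = metric[state] + (x != r[t])        # "multiplication": add
            if candidate < new_metric.get(nxt, math.inf):  # "addition": minimum
                new_metric[nxt] = candidate
                survivor[nxt] = (state, x)
        metric = new_metric
        survivors.append(survivor)
    (state, distance), = metric.items()  # only the toor is left at depth n
    word = []
    for survivor in reversed(survivors):  # trace back from the toor to the root
        state, x = survivor[state]
        word.append(x)
    return tuple(reversed(word)), distance


H = [(1, 0, 1, 0, 1, 0, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (0, 0, 0, 1, 1, 1, 1)]
root, sections = syndrome_trellis(H)
c_hat, distance = viterbi(root, sections, (0, 0, 1, 1, 1, 0, 0))

if __name__ == "__main__":
    print(c_hat, distance)
```

Each edge costs one addition and each state with more than one incoming edge costs one comparison per additional edge, which is the operation count stated in (5.125).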

Figure 5.8: Viterbi decoding of the (7, 4, 3) Hamming code.

BCJR algorithm. The BCJR algorithm performs symbol-by-symbol MAP/ML decoding. The algorithm works in three phases. The first two phases are a forward recursion and a backward recursion through the trellis, organized as in the Viterbi algorithm but using the ordinary matrix multiplication, that compute the accumulated metrics of all states in the trellis. These metrics are combined with the state transition probabilities in the third phase to compute the log likelihood values of the information and (or) code symbols. The forward accumulated metric at depth $t$ is the $1 \times |S_t|$ vector
\[ \alpha_t = (\alpha_{t,1}, \alpha_{t,2}, \ldots, \alpha_{t,|S_t|}) \tag{5.126} \]
obtained by
\[ \alpha_t = \alpha_{t-1} \Gamma_t, \quad \text{with } \alpha_0 = 1, \tag{5.127} \]
where $\Gamma_t$ is the $|S_{t-1}| \times |S_t|$ state transition matrix
\[ \Gamma_t = (\gamma_{t,ij}) \tag{5.128} \]
with elements
\[ \gamma_{t,ij} = f_{R_{t-1}|C_{t-1}}(r_{t-1} \mid c_{t-1} = \text{edge label between } i \text{ and } j). \tag{5.129} \]
Hence, $\gamma_{t,ij}$ is the probability that $r_{t-1}$ is received given that $c_{t-1}$ was sent; if no edge exists between the corresponding states, then $\gamma_{t,ij} = 0$. Equivalently, the backward accumulated metric at depth $t$ is the $1 \times |S_t|$ vector
\[ \beta_t = (\beta_{t,1}, \beta_{t,2}, \ldots, \beta_{t,|S_t|}) \tag{5.130} \]
obtained by
\[ \beta_t^T = \Gamma_{t+1} \beta_{t+1}^T, \quad \text{with } \beta_n = 1. \tag{5.131} \]
Now we obtain the log likelihood value of the code bit $c_{t-1}$ by
\[ L(c_{t-1}) = \log \frac{\alpha_{t-1} \Gamma_t^0 \beta_t^T}{\alpha_{t-1} \Gamma_t^1 \beta_t^T} \tag{5.132} \]
where the state transition matrices $\Gamma_t^0$ and $\Gamma_t^1$ are a decomposition of the state transition matrix, $\Gamma_t = \Gamma_t^0 + \Gamma_t^1$, where $\Gamma_t^0$ is for edges labeled with $c_{t-1} = 0$ and $\Gamma_t^1$ is for edges labeled with $c_{t-1} = 1$. If we have a systematic generator matrix, then the L-values of the information symbols coincide with those of the corresponding code symbols. In case of a non-systematic generator matrix we have to use a code trellis that allows us to associate information symbols with edges such that a decomposition into $\Gamma_t^0$ and $\Gamma_t^1$ for some $t$ is possible. Such a trellis might be non-minimal and, hence, increases the decoding complexity. Let us consider an example:

Example 5.14 (BCJR algorithm) Again we consider the (7, 4, 3) Hamming code transmitted over the BSC and assume that $r = (0\,0\,1\,1\,1\,0\,0)$ is received. The minimal trellis is depicted in Figure 5.7. If we replace 0 by $q = 1 - p$, 1 by $p$, and $\infty$ by 0, we obtain the state transition matrices $\Gamma_t$, $1 \le t \le 7$, from those of the Viterbi algorithm. The forward accumulated metric is
\begin{align}
\alpha_1 &= (q,\ p) \tag{5.133} \\
\alpha_2 &= (q^2,\ pq,\ pq,\ p^2) \tag{5.134} \\
\alpha_3 &= (pq^2 + p^2 q,\ pq^2 + p^2 q,\ pq^2 + p^2 q,\ q^3 + p^3) \tag{5.135} \\
\alpha_4 &= (p^2 q^2 + p^3 q,\ p^2 q^2 + p^3 q,\ p^2 q^2 + p^3 q,\ pq^3 + p^4,\ pq^3 + p^2 q^2,\ pq^3 + p^2 q^2,\ pq^3 + p^2 q^2,\ q^4 + p^3 q) \tag{5.136} \\
\alpha_5 &= (P,\ P,\ P,\ P), \quad P = pq^4 + p^2 q^3 + p^3 q^2 + p^4 q \tag{5.137} \\
\alpha_6 &= (Q,\ Q), \quad Q = pq^5 + 2p^2 q^4 + 2p^3 q^3 + 2p^4 q^2 + p^5 q \tag{5.138} \\
\alpha_7 &= pq^6 + 3p^2 q^5 + 4p^3 q^4 + 4p^4 q^3 + 3p^5 q^2 + p^6 q. \tag{5.139}
\end{align}
The backward accumulated metric is obtained equivalently. Compute $\Gamma_t^0$ and $\Gamma_t^1$ for $t = 3$ and the corresponding L-value.
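The three phases of the algorithm can be sketched with the state transition matrices stored as edge lists. The following Python sketch runs the forward recursion, the backward recursion and the combination step on the syndrome trellis of the (7, 4, 3) Hamming code for a BSC with $p = 0.15$. It assumes equally probable code words and the parity-check matrix of Exercise 5.7; the data layout and all names are choices made for this illustration.

```python
import math


def syndrome_trellis(H):
    """Return (root, sections); sections[t] lists edges (state, symbol, next_state)."""
    n = len(H[0])
    columns = [tuple(row[t] for row in H) for t in range(n)]
    root = tuple(0 for _ in H)

    def step(state, symbol, t):
        return tuple((s + symbol * h) % 2 for s, h in zip(state, columns[t]))

    reachable = [{root}]
    for t in range(n):
        reachable.append({step(s, x, t) for s in reachable[t] for x in (0, 1)})
    alive = {root}
    sections = [None] * n
    for t in range(n - 1, -1, -1):
        sections[t] = [(s, x, step(s, x, t))
                       for s in sorted(reachable[t]) for x in (0, 1)
                       if step(s, x, t) in alive]
        alive = {s for s, _, _ in sections[t]}
    return root, sections


def bcjr_bsc(root, sections, r, p):
    """Log likelihood values of all code bits for a BSC with crossover probability p."""
    n = len(sections)

    def gamma(x, y):
        # Transition probability of an edge labeled x when y was received.
        return p if x != y else 1.0 - p

    # Phase 1: forward recursion, alpha[t][state].
    alpha = [{root: 1.0}]
    for t, section in enumerate(sections):
        nxt = {}
        for s, x, s2 in section:
            nxt[s2] = nxt.get(s2, 0.0) + alpha[t][s] * gamma(x, r[t])
        alpha.append(nxt)

    # Phase 2: backward recursion, beta[t][state].
    beta = [None] * (n + 1)
    beta[n] = {root: 1.0}
    for t in range(n - 1, -1, -1):
        cur = {}
        for s, x, s2 in sections[t]:
            cur[s] = cur.get(s, 0.0) + gamma(x, r[t]) * beta[t + 1][s2]
        beta[t] = cur

    # Phase 3: combine both metrics separately for edges labeled 0 and 1.
    llrs = []
    for t, section in enumerate(sections):
        sums = [0.0, 0.0]
        for s, x, s2 in section:
            sums[x] += alpha[t][s] * gamma(x, r[t]) * beta[t + 1][s2]
        llrs.append(math.log(sums[0] / sums[1]))
    return llrs, alpha[n][root]


H = [(1, 0, 1, 0, 1, 0, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (0, 0, 0, 1, 1, 1, 1)]
root, sections = syndrome_trellis(H)
llrs, total = bcjr_bsc(root, sections, (0, 0, 1, 1, 1, 0, 0), 0.15)

if __name__ == "__main__":
    print([round(value, 4) for value in llrs])
    print(total)
```

Since the code words are systematic in the first four positions, the first four values are the L-values of the information bits. The quantity returned as `total` is $\alpha_7$, i.e., the sum of $f_{R|C}(r|c)$ over all code words.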
5.4.4 Bounds on Performance

Let $A_i$ be an event defined for the outcome of a random experiment and denote by $P(A_i)$ the probability that the event occurs. Then the union bound states that
\[ P\Big( \bigcup_i A_i \Big) \le \sum_i P(A_i) \tag{5.140} \]
where equality holds if and only if all events are mutually exclusive, i.e., $A_i \cap A_{i'} = \emptyset$ for all $i \ne i'$.

Suppose a block code $C$ is transmitted over the BSC with crossover probability $p$ and assume ML sequence estimation is applied. Then the union bound can be used to obtain an upper bound on the block error probability
\[ P(\hat{c} \ne c) \le \sum_{w} A(w)\, p(w) \tag{5.141} \]
where $A(w)$ is the number of code words of weight $w$ and $p(w)$ is the error probability
\[ p(w) = \begin{cases} \sum_{i=(w+1)/2}^{w} \binom{w}{i} p^i (1-p)^{w-i}, & w \text{ odd} \\ \frac{1}{2} \binom{w}{w/2} p^{w/2} (1-p)^{w/2} + \sum_{i=w/2+1}^{w} \binom{w}{i} p^i (1-p)^{w-i}, & w \text{ even.} \end{cases} \tag{5.142} \]
This is the block error probability of a code consisting of two code words that have the Hamming distance $w$. Applying the Bhattacharyya bound, this pair-wise error probability can be upper bounded by
\[ p(w) < \left( 2\sqrt{p(1-p)} \right)^{w}. \tag{5.143} \]
Now we can use the weight enumerator function $A(W)$ of the block code to obtain
\[ P(\hat{c} \ne c) < A(W) \Big|_{W = 2\sqrt{p(1-p)}}. \tag{5.144} \]
This bound is a code property and does not depend on the underlying generator matrix. Equivalently, we can use the input-output weight enumerator function $A(W, Z)$ of the code $C$ with given generator matrix $G$ to obtain an upper bound on the average bit error probability by
\[ E\big[ P(\hat{u} \ne u) \big] < \frac{\partial A(W, Z)}{\partial Z} \bigg|_{Z = 1,\ W = 2\sqrt{p(1-p)}}. \tag{5.145} \]
This is a generator matrix property and, hence, takes the mapping of the information to the code words into account.

For the AWGN channel we can apply the same technique. Here the pair-wise error probability is given by
\[ p(w) = Q\!\left( \sqrt{2 R w E_b / N_0} \right) < e^{-R w E_b / N_0}, \tag{5.146} \]
where $Q(x) = \frac{1}{2} \operatorname{erfc}(x / \sqrt{2})$ denotes the Gaussian tail function, and we obtain for the block error probability
\[ P(\hat{c} \ne c) < A(W) \Big|_{W = e^{-R E_b / N_0}} \tag{5.147} \]
and for the average bit error probability
\[ E\big[ P(\hat{u} \ne u) \big] < \frac{\partial A(W, Z)}{\partial Z} \bigg|_{Z = 1,\ W = e^{-R E_b / N_0}}. \tag{5.148} \]
The union bound is tight for large signal-to-noise ratios and diverges for low signal-to-noise ratios.

Example 5.15 (Union bound) Figure 5.9 depicts the union bound on the average bit error probability of the (7, 4, 3) Hamming code with systematic generator matrix (5.13) and input-output weight enumerator function
\[ A(W, Z) = 1 + (3W^3 + W^4) Z + (3W^3 + 3W^4) Z^2 + (W^3 + 3W^4) Z^3 + W^7 Z^4. \tag{5.149} \]
From this we obtain
\[ \frac{\partial A(W, Z)}{\partial Z} \bigg|_{Z = 1} = 12 W^3 + 16 W^4 + 4 W^7 \tag{5.150} \]
and substituting $W = e^{-R E_b / N_0}$ yields the depicted curve.
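The computation in this example can be retraced numerically. The following Python sketch stores the input-output weight enumerator function as a dictionary of coefficients, differentiates it with respect to $Z$ at $Z = 1$ and evaluates the resulting polynomial at $W = e^{-R E_b/N_0}$ with $R = 4/7$. It evaluates the right-hand side of the bound exactly as stated in the text; the representation of $A(W, Z)$ and the assumption that $E_b/N_0$ is given in dB are choices made for this illustration.

```python
import math

# A(W, Z) of the (7, 4, 3) Hamming code: {(exponent of W, exponent of Z): coefficient}
IOWEF = {(0, 0): 1,
         (3, 1): 3, (4, 1): 1,
         (3, 2): 3, (4, 2): 3,
         (3, 3): 1, (4, 3): 3,
         (7, 4): 1}
RATE = 4 / 7


def derivative_at_z1(iowef):
    """Coefficients of dA(W, Z)/dZ at Z = 1 as {exponent of W: coefficient}."""
    result = {}
    for (w_exp, z_exp), coefficient in iowef.items():
        if z_exp > 0:
            result[w_exp] = result.get(w_exp, 0) + z_exp * coefficient
    return result


def union_bound_awgn(ebn0_db):
    """Right-hand side of the bit error bound for the AWGN channel."""
    ebn0 = 10 ** (ebn0_db / 10)          # E_b/N_0 assumed to be given in dB
    w = math.exp(-RATE * ebn0)
    return sum(c * w ** e for e, c in derivative_at_z1(IOWEF).items())


if __name__ == "__main__":
    print(derivative_at_z1(IOWEF))
    for ebn0_db in (0, 5, 10):
        print(ebn0_db, union_bound_awgn(ebn0_db))
```

At low signal-to-noise ratios the evaluated expression exceeds one and is therefore useless as a bound on a probability, which illustrates the divergence mentioned above.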
[Figure 5.9: union bound and uncoded bit error probability on a logarithmic scale versus $E_b/N_0$ (0 to 15).]
Figure 5.9: Union bound on the bit error probability of the (7, 4, 3) Hamming code.
5.4.5
Suboptimum Decoding Techniques
There exist several suboptimal decoding techniques that have been of practical importance. Since all of them play a minor role in present channel coding applications, they will not be presented here. The interested reader will find more, e.g., in [CC82] and [LC83].
Chapter 6
Cyclic Codes
Among the set of linear block codes there is the subset of so-called cyclic codes. Several code families of great practical importance are cyclic codes, e.g., BCH codes and Reed-Solomon codes. The most important tool in the description of cyclic codes is the isomorphism between linear vector spaces and the group of polynomials. This means that we can represent code words by polynomials and, hence, a code by a set of polynomials. This enables one to design codes of short to medium code length $n$ with a minimum distance that is close to the best that can be found. In addition, there exist several decoding algorithms that can be used for these codes. Code design and decoding of cyclic codes are based on algebraic structures, and algebraic coding theory established itself as a scientific field of its own. Its main focus is the algebraic structure of codes and less the performance of codes with respect to channel capacity. Algebraic coding theory came up with the first code designs of practical importance and dominated the field of channel coding during its first two decades. Its results and techniques found applications to many problems beyond channel coding, and the development in the field today is of minor relevance to actual channel coding problems.
6.1
Structural Properties
6.1.1
A First Encounter
In the following we are interested in linear binary block codes that have a special property:
$$c_0 c_1 c_2 \ldots c_{n-1} \in C \;\Rightarrow\; c_{n-1} c_0 c_1 \ldots c_{n-2} \in C \qquad (6.1)$$
This means that, if $c = c_0 c_1 c_2 \ldots c_{n-1}$ is a code word, then the cyclic permutation by one to the right, i.e., $c' = c_{n-1} c_0 c_1 \ldots c_{n-2}$, is again a code word. The same applies to the code word $c'$ and so on.
Definition 6.1 (cyclic code) A linear block code $C$ is called cyclic if any cyclic permutation of a code word is again a code word.
To deal with cyclic codes it is convenient to represent the code words by polynomials. With the code word
$$c = c_0 c_1 \ldots c_{n-1} \qquad (6.2)$$
we associate the polynomial
$$c(x) = c_0 + c_1 x + \ldots + c_{n-1} x^{n-1}. \qquad (6.3)$$
Then the code is a subset of the set of all polynomials of degree at most $n-1$ whose coefficients are taken from $\mathbb{F}_2$. For a given code word $c(x)$ we obtain a cyclic shift by one to the right by first multiplying with $x$ and then taking the result modulo $x^n - 1$ (footnote 1), i.e.,
$$x c(x) \bmod (x^n - 1) = c'(x). \qquad (6.4)$$
Multiplying a code word by any polynomial and taking the result modulo $x^n - 1$ yields a sum of cyclic shifts of this code word and, hence, again a code word. A cyclic code can easily be encoded by a so-called generator polynomial $g(x)$. The information polynomial $u(x) = u_0 + u_1 x + \ldots + u_{k-1} x^{k-1}$ is multiplied by the generator polynomial $g(x)$ and taken modulo $x^n - 1$ to obtain the code polynomial
$$c(x) = u(x) g(x) \bmod (x^n - 1). \qquad (6.5)$$
Let us consider an example:
Example 6.1 (cyclic (7, 4) Hamming code) There exists a cyclic code that is equivalent to the (7, 4) Hamming code. Its generator polynomial is $g(x) = 1 + x + x^3$, and by multiplying all 16 polynomials $u(x)$ of degree $< 4$ with $g(x)$ we obtain the code polynomials depicted in Table 6.1. The 2nd to the 8th row in Table 6.1 are the cyclic shifts of the generator polynomial $g(x)$. The 9th to the 15th row are the cyclic shifts of $(1 + x)g(x)$. The last row corresponds to $(1 + x^2 + x^3)g(x)$.
Table 6.1: Cyclic (7, 4) Hamming code.

code tuple c     code polynomial c(x)
0 0 0 0 0 0 0    $0$
1 1 0 1 0 0 0    $1 + x + x^3$
0 1 1 0 1 0 0    $x + x^2 + x^4$
0 0 1 1 0 1 0    $x^2 + x^3 + x^5$
0 0 0 1 1 0 1    $x^3 + x^4 + x^6$
1 0 0 0 1 1 0    $1 + x^4 + x^5$
0 1 0 0 0 1 1    $x + x^5 + x^6$
1 0 1 0 0 0 1    $1 + x^2 + x^6$
1 0 1 1 1 0 0    $1 + x^2 + x^3 + x^4$
0 1 0 1 1 1 0    $x + x^3 + x^4 + x^5$
0 0 1 0 1 1 1    $x^2 + x^4 + x^5 + x^6$
1 0 0 1 0 1 1    $1 + x^3 + x^5 + x^6$
1 1 0 0 1 0 1    $1 + x + x^4 + x^6$
1 1 1 0 0 1 0    $1 + x + x^2 + x^5$
0 1 1 1 0 0 1    $x + x^2 + x^3 + x^6$
1 1 1 1 1 1 1    $1 + x + x^2 + x^3 + x^4 + x^5 + x^6$
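The construction of Table 6.1 can be checked with a few lines of code. The sketch below is an illustration only: it assumes that a binary polynomial is stored as a Python integer whose bit $i$ holds the coefficient of $x^i$, and the helper names `cyclic_shift` and `encode` are hypothetical.

```python
N = 7
G = 0b1011  # g(x) = 1 + x + x^3, bit i holds the coefficient of x^i

def cyclic_shift(c, shift, n=N):
    """Multiply c(x) by x^shift modulo x^n - 1 (cyclic shift of the code tuple)."""
    shift %= n
    mask = (1 << n) - 1
    return ((c << shift) | (c >> (n - shift))) & mask

def encode(u, g=G, n=N):
    """Return u(x) g(x) mod (x^n - 1) over F_2."""
    c = 0
    for i in range(n):
        if (u >> i) & 1:
            c ^= cyclic_shift(g, i, n)   # add x^i g(x)
    return c

# All 16 code words, one for each information polynomial of degree < 4
code = sorted(encode(u) for u in range(16))

if __name__ == "__main__":
    for c in code:
        print(format(c, "07b")[::-1])    # print as c0 c1 ... c6
```

Every cyclic shift of a code word produced this way is again in `code`, which is the defining property of Definition 6.1.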
In general we can state the following:
Theorem 6.1 (cyclic codes) Any cyclic code of length $n$ can be represented as a polynomial code with a generator polynomial $g(x)$ that divides $x^n - 1$.
Example 6.2 (cyclic code of length 7) The polynomial $x^7 - 1$ can be factorized over $\mathbb{F}_2$ by
$$x^7 - 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1) \qquad (6.6)$$
where $(x + 1)$, $(x^3 + x + 1)$, and $(x^3 + x^2 + 1)$ are irreducible over $\mathbb{F}_2$. Hence, there exist eight cyclic codes of length 7. Their generator polynomials are:
$g_1(x) = 1$  (6.7)
$g_2(x) = (x - 1)$  (6.8)
$g_3(x) = (x^3 + x + 1)$  (6.9)
$g_4(x) = (x^3 + x^2 + 1)$  (6.10)
$g_5(x) = (x - 1)(x^3 + x + 1)$  (6.11)
$g_6(x) = (x - 1)(x^3 + x^2 + 1)$  (6.12)
$g_7(x) = (x^3 + x + 1)(x^3 + x^2 + 1)$  (6.13)
$g_8(x) = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1)$  (6.14)

Footnote 1: Given two polynomials $q(x)$ and $p(x)$ with $q(x) = a(x)p(x) + b(x)$ and $\deg b(x) < \deg p(x)$, then $q(x) \bmod p(x) = b(x)$. Hence, $q(x) \bmod p(x)$ is the remainder of the polynomial division $q(x)/p(x)$.
While the cyclic code with $g_1(x) = 1$ contains all possible words, the cyclic code with $g_8(x) = x^7 - 1$ consists of the zero code word only. The codes with $g_3(x)$ and $g_4(x)$ are two equivalent (7, 4) Hamming codes.
Given an $(n, k)$ cyclic code $C$, there exists a generator polynomial of degree $n - k$,
$$g(x) = g_0 + g_1 x + \ldots + g_{n-k} x^{n-k}. \qquad (6.15)$$
We obtain the generator matrix of the code by
$$G = \begin{pmatrix} g_0 & \ldots & g_{n-k} & & \\ & \ddots & & \ddots & \\ & & g_0 & \ldots & g_{n-k} \end{pmatrix} \qquad (6.16)$$
where elements left blank are assumed to be zero. The check polynomial of a length $n$ cyclic code with a degree-$(n - k)$ generator polynomial $g(x)$ is given by
$$h(x) = (x^n - 1)/g(x). \qquad (6.17)$$
This is a degree-$k$ polynomial such that $h(x)g(x) = 0 \bmod (x^n - 1)$, and it is easy to see that any $c(x) \in C$ satisfies
$$h(x)c(x) \bmod (x^n - 1) = 0. \qquad (6.18)$$
The parity check matrix of the code is given by
$$H = \begin{pmatrix} h_k & \ldots & h_0 & & \\ & \ddots & & \ddots & \\ & & h_k & \ldots & h_0 \end{pmatrix} \qquad (6.19)$$
The code with generator polynomial $h(x)$ is equivalent to the dual code of $C$: the dual code is obtained from it by reversing the order of the symbols.
Exercise 6.1 (cyclic (7, 4) Hamming code) Determine how to rearrange the code symbols of the Hamming code (as defined by the parity check matrix) to obtain one of the cyclic Hamming codes.
Sometimes we are interested in a systematic encoding of a cyclic code. Then, by applying polynomial division, we can obtain
$$c(x) = x^{n-k} u(x) + r(x) \qquad (6.20)$$
with the information polynomial $u(x)$ and the redundancy polynomial
$$r(x) = x^{n-k} u(x) \bmod g(x). \qquad (6.21)$$
Once we have encoded a code polynomial we can perform any cyclic shift, so systematic encoding onto any $k$ cyclically consecutive positions of the code word is possible. This is a nice property of cyclic codes that does not hold for linear block codes in general.
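Systematic encoding according to (6.20) and (6.21) can be sketched as follows for binary codes. This is an illustrative sketch, assuming again that bit $i$ of an integer holds the coefficient of $x^i$; the function names `polymod` and `encode_systematic` are hypothetical.

```python
def polymod(a, g):
    """Remainder of the binary polynomial a(x) after division by g(x)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        # cancel the leading term of a(x) with a shifted copy of g(x)
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a

def encode_systematic(u, g, n, k):
    """Return c(x) = x^(n-k) u(x) + r(x) with r(x) = x^(n-k) u(x) mod g(x)."""
    shifted = u << (n - k)
    return shifted ^ polymod(shifted, g)

if __name__ == "__main__":
    G = 0b1011                      # g(x) = 1 + x + x^3 of the (7, 4) Hamming code
    for u in range(16):
        c = encode_systematic(u, G, 7, 4)
        print(format(u, "04b"), "->", format(c, "07b"))
```

In this representation the information bits occupy the four highest-order positions of the code word and the three lowest-order positions carry the redundancy.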
6.1.2
Finite Field Arithmetic
To study the structure of cyclic codes in more depth we need to introduce some finite field arithmetic. A finite field, sometimes called Galois field, is a finite set of elements for which special rules of addition and multiplication are defined. Let us consider an example:
Example 6.3 (prime field $\mathbb{F}_5$) Given the finite field $\mathbb{F}_5 = \{0, 1, 2, 3, 4\}$. The addition and multiplication tables presented in Table 6.2 depict the results of adding, respectively multiplying, any two elements of $\mathbb{F}_5$. Both operations are performed as with ordinary numbers but the result is taken modulo 5 (footnote 2).

Table 6.2: Addition and multiplication table for $\mathbb{F}_5$.

+ | 0 1 2 3 4
0 | 0 1 2 3 4
1 | 1 2 3 4 0
2 | 2 3 4 0 1
3 | 3 4 0 1 2
4 | 4 0 1 2 3

· | 0 1 2 3 4
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 1 3
3 | 0 3 1 4 2
4 | 0 4 3 2 1

Footnote 2: Given integers $n$ and $q > 0$ and let $n = aq + b$ with integers $a$ and $0 \le b < q$, then $n \bmod q = (aq + b) \bmod q = b$. Hence, $n \bmod q$ is the remainder of the integer division $n/q$.

In the following definition we summarize the properties of a finite field:
Definition 6.2 (finite field) A finite field $\mathbb{F}_q$ is a finite set of $q$ elements with two defined operations, usually called addition and multiplication, that satisfy
1. closure: The result of adding or multiplying two elements is again an element in $\mathbb{F}_q$.
2. identity element: The identity element of addition, 0, and the identity element of multiplication, 1, are elements in $\mathbb{F}_q$.
3. inverse element: For each element in $\mathbb{F}_q$ there exists an inverse element of addition in $\mathbb{F}_q$, and for each element in $\mathbb{F}_q \setminus 0$ there exists an inverse element of multiplication in $\mathbb{F}_q$.
4. commutative, associative, and distributive law: For all elements $a, b, c \in \mathbb{F}_q$ we have $a + b = b + a$, $ab = ba$ (commutative law) and $(a + b) + c = a + (b + c)$, $(ab)c = a(bc)$ (associative law) and $a(b + c) = ab + ac$ (distributive law).
Finite fields do not exist for an arbitrary number $q$ of elements. They exist only when the number of elements is a prime number or a power of a prime number. The importance and significance of the following will become clear during the remainder of this chapter:
Definition 6.3 (order, primitive element) An element $\beta \in \mathbb{F}_q$ has order $n$ if $\beta^n = 1$ but $\beta^i \neq 1$, $0 < i < n$. An element with order $q - 1$ is called a primitive element.
Theorem 6.2 (primitive element) In every finite field there exists at least one primitive element.
Finite fields
At first we consider prime fields $\mathbb{F}_p = \{0, 1, 2, \ldots, p - 1\}$ where $p$ is a prime number. Since in all $\mathbb{F}_p$ there exists at least one element of order $p - 1$, i.e., a primitive element, we can represent any other element in $\mathbb{F}_p \setminus 0$ by a power of this element. In Table 6.3 one can find a list of all the primitive elements in $\mathbb{F}_p$, $p = 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31$. The representation of the finite field elements as powers of a primitive element introduces the concept of logarithms to finite fields. This can considerably simplify the multiplication of two elements in $\mathbb{F}_p$: instead of multiplying two elements and then taking the result modulo $p$, we can add the exponents of their representation as a power of the primitive element and then take the result modulo $p - 1$. This is of particular importance for system applications that do not allow us to use look-up tables to perform addition and multiplication in $\mathbb{F}_p$.

Table 6.3: Primitive elements in $\mathbb{F}_p$, $p \le 31$.

p    primitive elements
2    1
3    2
5    2, 3
7    3, 5
11   2, 6, 7, 8
13   2, 6, 7, 11
17   3, 5, 6, 7, 10, 11, 12, 14
19   2, 3, 10, 13, 14, 15
23   5, 7, 10, 11, 14, 15, 17, 19, 20, 21
29   2, 3, 8, 10, 11, 14, 15, 18, 19, 21, 26, 27
31   3, 11, 12, 13, 17, 21, 22, 24

Table 6.4: Addition table, multiplication table, and mapping between logarithmic and ordinary representation of the elements in $\mathbb{F}_7$.

+ | 0 1 2 3 4 5 6
0 | 0 1 2 3 4 5 6
1 | 1 2 3 4 5 6 0
2 | 2 3 4 5 6 0 1
3 | 3 4 5 6 0 1 2
4 | 4 5 6 0 1 2 3
5 | 5 6 0 1 2 3 4
6 | 6 0 1 2 3 4 5

· | 0 1 2 3 4 5 6
0 | 0 0 0 0 0 0 0
1 | 0 1 2 3 4 5 6
2 | 0 2 4 6 1 3 5
3 | 0 3 6 2 5 1 4
4 | 0 4 1 5 2 6 3
5 | 0 5 3 1 6 4 2
6 | 0 6 5 4 3 2 1

a^i | 0  3^0  3^1  3^2  3^3  3^4  3^5  3^6
a   | 0  1    3    2    6    4    5    1

Let us consider an example:
Example 6.4 (prime field $\mathbb{F}_7$) Given the finite field $\mathbb{F}_7 = \{0, 1, 2, 3, 4, 5, 6\}$. The tables to perform addition and multiplication are shown in Table 6.4. Since $a = 3$ is a primitive element of $\mathbb{F}_7$ we can represent any element in $\mathbb{F}_7 \setminus 0$ as a power of 3, e.g., $5 = 3^5$ and $6 = 3^3$. Now we compute
$$5 + 6 \bmod 7 = 11 \bmod 7 = 4$$
and
$$5 \cdot 6 \bmod 7 = 3^5 \cdot 3^3 \bmod 7 = 3^{(5+3) \bmod 6} \bmod 7 = 3^{8 \bmod 6} \bmod 7 = 3^2 \bmod 7 = 2.$$
It is easy to see that while addition is easily performed with the original elements, multiplication is better performed with the logarithmic representation of the field elements.
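The computation of Example 6.4 can be sketched in a few lines. The code below is only an illustration, assuming the prime $p = 7$ and the primitive element 3 from Table 6.3; the names `primitive_elements` and `mul_via_logs` are hypothetical.

```python
P = 7
A = 3   # primitive element of F_7 according to Table 6.3

def primitive_elements(p):
    """List all elements of F_p whose powers generate every non-zero element."""
    return [a for a in range(1, p)
            if len({pow(a, i, p) for i in range(1, p)}) == p - 1]

# Mapping between logarithmic and ordinary representation (cf. Table 6.4)
EXP = [pow(A, i, P) for i in range(P - 1)]       # EXP[i] = 3^i mod 7
LOG = {value: i for i, value in enumerate(EXP)}  # LOG[3^i] = i

def mul_via_logs(x, y):
    """Multiply in F_7 by adding exponents modulo p - 1."""
    if x == 0 or y == 0:
        return 0
    return EXP[(LOG[x] + LOG[y]) % (P - 1)]

if __name__ == "__main__":
    print(primitive_elements(7))   # primitive elements of F_7
    print(mul_via_logs(5, 6))      # the product computed in Example 6.4
```

The zero element has no logarithm and is therefore treated separately.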
The existence of logarithms for finite fields means that there is a representation for the field elements that is convenient for multiplication and another that is convenient for addition.
Extension fields
Now let us consider extension fields $\mathbb{F}_q$ where $q = p^m$ and $p$ is a prime number. Here the field elements are all possible polynomials of degree at most $m - 1$ with coefficients taken from $\mathbb{F}_p$. The rules for multiplication and addition are as usual and the result is taken modulo a primitive polynomial of degree $m$. Let us consider an example:

Table 6.5: Logarithmic, polynomial and vector representation of the extension field $\mathbb{F}_{2^4}$ generated by the primitive polynomial $p(x) = 1 + x + x^4$.

$0$           = $0$                                = (0 0 0 0)
$\alpha^0$    = $1$                                = (1 0 0 0)
$\alpha^1$    = $\alpha$                           = (0 1 0 0)
$\alpha^2$    = $\alpha^2$                         = (0 0 1 0)
$\alpha^3$    = $\alpha^3$                         = (0 0 0 1)
$\alpha^4$    = $1 + \alpha$                       = (1 1 0 0)
$\alpha^5$    = $\alpha + \alpha^2$                = (0 1 1 0)
$\alpha^6$    = $\alpha^2 + \alpha^3$              = (0 0 1 1)
$\alpha^7$    = $1 + \alpha + \alpha^3$            = (1 1 0 1)
$\alpha^8$    = $1 + \alpha^2$                     = (1 0 1 0)
$\alpha^9$    = $\alpha + \alpha^3$                = (0 1 0 1)
$\alpha^{10}$ = $1 + \alpha + \alpha^2$            = (1 1 1 0)
$\alpha^{11}$ = $\alpha + \alpha^2 + \alpha^3$     = (0 1 1 1)
$\alpha^{12}$ = $1 + \alpha + \alpha^2 + \alpha^3$ = (1 1 1 1)
$\alpha^{13}$ = $1 + \alpha^2 + \alpha^3$          = (1 0 1 1)
$\alpha^{14}$ = $1 + \alpha^3$                     = (1 0 0 1)

Example 6.5 (extension field $\mathbb{F}_{16}$) Various representations of the extension field $\mathbb{F}_{16}$ are shown in Table 6.5. The field is represented by polynomials in $\alpha$ of degree $< 4$ where $\alpha$ is a primitive element of $\mathbb{F}_{16}$. Since $p(x) = 1 + x + x^4$ is a primitive polynomial, we have $p(\alpha) = 1 + \alpha + \alpha^4 = 0$ and, hence, $\alpha^4 = 1 + \alpha$. From this follows immediately
$$\alpha^5 = \alpha \cdot \alpha^4 = \alpha(1 + \alpha) = \alpha + \alpha^2$$
$$\alpha^6 = \alpha \cdot \alpha^5 = \alpha^2 + \alpha^3 \qquad (6.22)$$
$$\alpha^7 = \alpha \cdot \alpha^6 = \alpha^3 + \alpha^4 = 1 + \alpha + \alpha^3$$
etc.
Two elements in $\mathbb{F}_{16}$ are added using their binary vector representations and then taking their components modulo 2, e.g.,
$$(0\,1\,1\,1) + (1\,1\,1\,0) \bmod 2 = (1\,2\,2\,1) \bmod 2 = (1\,0\,0\,1).$$
To multiply two elements in $\mathbb{F}_{16}$ we best use their logarithmic representation, e.g.,
$$\alpha^9 \cdot \alpha^{10} = \alpha^{(9+10) \bmod 15} = \alpha^{19 \bmod 15} = \alpha^4.$$
Alternatively, we could multiply the corresponding polynomials in $\alpha$,
$$(\alpha + \alpha^3)(1 + \alpha + \alpha^2) \bmod 2 = \alpha^5 + \alpha^4 + \alpha^2 + \alpha \qquad (6.23)$$
and then take the result modulo $p(\alpha)$,
$$\alpha^5 + \alpha^4 + \alpha^2 + \alpha \bmod (1 + \alpha + \alpha^4) = 1 + \alpha. \qquad (6.24)$$
Notice, $\alpha^5 + \alpha^4 + \alpha^2 + \alpha = (\alpha + 1)p(\alpha) + (\alpha + 1)$, and computation modulo $p(\alpha)$ means to take the remainder of the polynomial division by $p(\alpha)$. Clearly, we prefer to use the logarithmic representation to perform multiplication in $\mathbb{F}_{16}$.
A list of all primitive polynomials over $\mathbb{F}_2$ of degree $m \le 8$ is given in Table 6.6. We can use any primitive polynomial of degree $m$ over $\mathbb{F}_2$ to generate the extension field $\mathbb{F}_{2^m}$. Different primitive polynomials will associate different powers of a primitive element with the binary vector that represents a given field element. Nevertheless, the field structure is the same and it is not important which primitive polynomial is used to generate the field. Usually primitive polynomials of minimal Hamming weight among the set of primitive polynomials of a given degree $m$ are preferred. Table 6.7 is a list of such polynomials.
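The arithmetic of Example 6.5 can be sketched as follows. This is a minimal illustration, assuming that a field element is stored as an integer whose bit $i$ holds the coefficient of $\alpha^i$ (so the bit order is the reverse of the vector notation in Table 6.5); the names `EXP`, `LOG`, `gf_add` and `gf_mul` are hypothetical.

```python
M = 4
POLY = 0b10011            # p(x) = 1 + x + x^4; bit i holds the coefficient of x^i
ORDER = (1 << M) - 1      # number of non-zero field elements, here 15

EXP = []                  # EXP[i] = alpha^i as a bit vector
elem = 1
for _ in range(ORDER):
    EXP.append(elem)
    elem <<= 1            # multiply by alpha
    if elem >> M:         # degree m reached: substitute alpha^4 = 1 + alpha
        elem ^= POLY
LOG = {value: i for i, value in enumerate(EXP)}

def gf_add(a, b):
    """Addition in F_16: component-wise addition modulo 2."""
    return a ^ b

def gf_mul(a, b):
    """Multiplication in F_16 via the logarithmic representation."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % ORDER]

if __name__ == "__main__":
    for i, value in enumerate(EXP):
        print(f"alpha^{i:<2d} = {value:04b}")
    print(LOG[gf_mul(EXP[9], EXP[10])])   # exponent of alpha^9 * alpha^10
```

Replacing `M` and `POLY` by another primitive polynomial from Table 6.6 generates the corresponding field in the same way.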

Table 6.6: All primitive polynomials over $\mathbb{F}_2$ of degree $m = 1, 2, \ldots, 8$.

m = 1: $x + 1$
m = 2: $x^2 + x + 1$
m = 3: $x^3 + x + 1$, $x^3 + x^2 + 1$
m = 4: $x^4 + x + 1$, $x^4 + x^3 + 1$
m = 5: $x^5 + x^2 + 1$, $x^5 + x^3 + 1$, $x^5 + x^3 + x^2 + x + 1$, $x^5 + x^4 + x^2 + x + 1$, $x^5 + x^4 + x^3 + x + 1$, $x^5 + x^4 + x^3 + x^2 + 1$
m = 6: $x^6 + x + 1$, $x^6 + x^4 + x^3 + x + 1$, $x^6 + x^5 + 1$, $x^6 + x^5 + x^2 + x + 1$, $x^6 + x^5 + x^3 + x^2 + 1$, $x^6 + x^5 + x^4 + x + 1$
m = 7: $x^7 + x + 1$, $x^7 + x^3 + 1$, $x^7 + x^3 + x^2 + x + 1$, $x^7 + x^4 + 1$, $x^7 + x^4 + x^3 + x^2 + 1$, $x^7 + x^5 + x^2 + x + 1$, $x^7 + x^5 + x^3 + x + 1$, $x^7 + x^5 + x^4 + x^3 + 1$, $x^7 + x^5 + x^4 + x^3 + x^2 + x + 1$, $x^7 + x^6 + 1$, $x^7 + x^6 + x^3 + x + 1$, $x^7 + x^6 + x^4 + x + 1$, $x^7 + x^6 + x^4 + x^2 + 1$, $x^7 + x^6 + x^5 + x^2 + 1$, $x^7 + x^6 + x^5 + x^3 + x^2 + x + 1$, $x^7 + x^6 + x^5 + x^4 + 1$, $x^7 + x^6 + x^5 + x^4 + x^2 + x + 1$, $x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + 1$
m = 8: $x^8 + x^4 + x^3 + x^2 + 1$, $x^8 + x^5 + x^3 + x + 1$, $x^8 + x^5 + x^3 + x^2 + 1$, $x^8 + x^6 + x^3 + x^2 + 1$, $x^8 + x^6 + x^4 + x^3 + x^2 + x + 1$, $x^8 + x^6 + x^5 + x + 1$, $x^8 + x^6 + x^5 + x^2 + 1$, $x^8 + x^6 + x^5 + x^3 + 1$, $x^8 + x^6 + x^5 + x^4 + 1$, $x^8 + x^7 + x^2 + x + 1$, $x^8 + x^7 + x^3 + x^2 + 1$, $x^8 + x^7 + x^5 + x^3 + 1$, $x^8 + x^7 + x^6 + x + 1$, $x^8 + x^7 + x^6 + x^3 + x^2 + x + 1$, $x^8 + x^7 + x^6 + x^5 + x^2 + x + 1$, $x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1$

Both prime fields $\mathbb{F}_p$ and their extension fields $\mathbb{F}_{p^m}$ are used in many code constructions. The representation of the elements in $\mathbb{F}_{2^m}$ by $m$-dimensional binary vectors makes the binary extension fields attractive for system applications. For this reason we will pay less attention to extension fields $\mathbb{F}_{p^m}$ where $p > 2$.
6.1.3
Roots of Cyclic Codes
Table 6.7: Primitive polynomials of minimal Hamming weight over $\mathbb{F}_2$ of degree $m = 1, 2, \ldots, 24$.

m    primitive polynomial             m    primitive polynomial
1    $1 + x$                          13   $1 + x + x^3 + x^4 + x^{13}$
2    $1 + x + x^2$                    14   $1 + x + x^6 + x^{10} + x^{14}$
3    $1 + x + x^3$                    15   $1 + x + x^{15}$
4    $1 + x + x^4$                    16   $1 + x + x^3 + x^{12} + x^{16}$
5    $1 + x^2 + x^5$                  17   $1 + x^3 + x^{17}$
6    $1 + x + x^6$                    18   $1 + x^7 + x^{18}$
7    $1 + x^3 + x^7$                  19   $1 + x + x^2 + x^5 + x^{19}$
8    $1 + x^2 + x^3 + x^4 + x^8$      20   $1 + x^3 + x^{20}$
9    $1 + x^4 + x^9$                  21   $1 + x^2 + x^{21}$
10   $1 + x^3 + x^{10}$               22   $1 + x + x^{22}$
11   $1 + x^2 + x^{11}$               23   $1 + x^5 + x^{23}$
12   $1 + x + x^4 + x^6 + x^{12}$     24   $1 + x + x^2 + x^7 + x^{24}$

According to Theorem 6.1 we know that any cyclic code of length $n$ can be generated by a degree-$(n - k)$ generator polynomial that divides $x^n - 1$. Hence, it is of great interest to factorize $x^n - 1$ into irreducible polynomials over $\mathbb{F}_2$. As demonstrated in Example 6.2 this enables us to obtain the generator polynomials of all length $n$ cyclic codes.
At first, we notice that the elements of $\mathbb{F}_q$, $q = p^m$, are exactly the $q$ distinct zeros of the polynomial $x^q - x$, i.e., $x^q - x = \prod_{\beta \in \mathbb{F}_q} (x - \beta)$. If $n = p^m - 1$, then the non-zero elements of the extension field $\mathbb{F}_{p^m}$ are the zeros of $x^n - 1$. Let $\alpha$ be a primitive element in $\mathbb{F}_{p^m}$, then
$$x^n - 1 = \prod_{i=0}^{n-1} (x - \alpha^i). \qquad (6.25)$$
If $n \neq p^m - 1$, but $n$ divides $p^m - 1$, then there exists an extension field $\mathbb{F}_{p^m}$ that has an element $\beta$ of order $n$ and we can write
$$x^n - 1 = \prod_{i=0}^{n-1} (x - \beta^i) \qquad (6.26)$$
where $\beta^i$, $i = 1, 2, \ldots, n$, are $n$ distinct elements in $\mathbb{F}_{p^m}$ that form a cyclic subgroup of the multiplicative group of $\mathbb{F}_{p^m}$. Hence, by using extension fields we are able to decompose $x^n - 1$ into $n$ distinct linear factors. From this point of view, the extension fields $\mathbb{F}_{p^m}$ play for $\mathbb{F}_p$ the same role as complex numbers do for real numbers.
In the following we show how to group the $n$ linear factors in (6.25) or (6.26) such that the corresponding products are irreducible polynomials over $\mathbb{F}_p$. First we need to define the following (footnote 3):
Definition 6.4 (cyclotomic cosets) Given an element $\beta$ of order $n$ in $\mathbb{F}_{p^m}$, then the cyclotomic coset of $i$ is defined as
$$K_i = \{ i \cdot p^j \bmod n,\; j = 0, 1, \ldots, m - 1 \}. \qquad (6.27)$$
Definition 6.5 (minimal polynomial) Given the cyclotomic coset $K_i$ of $i$ where $\beta$ is an element of order $n$ in $\mathbb{F}_{p^m}$, then
$$m_i(x) = \prod_{k \in K_i} (x - \beta^k) \qquad (6.28)$$
is an irreducible polynomial over $\mathbb{F}_p$ and is called minimal polynomial of $\beta^i$.

Footnote 3: From the theory of finite fields the following is known: Let $\beta \in \mathbb{F}_{p^m}$ and $f(x)$ be a polynomial with coefficients in $\mathbb{F}_p$. Then $f(\beta) = 0$ implies $f(\beta^p) = 0$. On the other hand, let $g(x)$ be a polynomial over $\mathbb{F}_{p^m}$. Then all coefficients of $g(x)$ are from $\mathbb{F}_p$ if $g(\beta^p) = 0$ for every $\beta$ for which $g(\beta) = 0$. Hence, we can decompose the zeros in $\mathbb{F}_{p^m}$ into subsets, and the product of the linear factors given by each of these cosets yields a polynomial with coefficients from $\mathbb{F}_p$.

Now we decompose the exponents $\{0, 1, \ldots, n - 1\}$ of $\beta$ into cyclotomic cosets and use the corresponding minimal polynomials $m_i(x)$ with coefficients over $\mathbb{F}_p$ to obtain the factorization of $x^n - 1$ into irreducible polynomials:
$$x^n - 1 = \mathrm{lcm}\left( m_0(x), m_1(x), \ldots, m_{n-1}(x) \right). \qquad (6.29)$$
The number of factors is given by the number of different cyclotomic cosets. Let us consider an example:
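Table 6.8: Logarithmic, polynomial and vector representation of the extension field $\mathbb{F}_{2^3}$ generated by the primitive polynomial $p(x) = 1 + x + x^3$.

$0$        = $0$                     = (0 0 0)
$\alpha^0$ = $1$                     = (1 0 0)
$\alpha^1$ = $\alpha$                = (0 1 0)
$\alpha^2$ = $\alpha^2$              = (0 0 1)
$\alpha^3$ = $1 + \alpha$            = (1 1 0)
$\alpha^4$ = $\alpha + \alpha^2$     = (0 1 1)
$\alpha^5$ = $1 + \alpha + \alpha^2$ = (1 1 1)
$\alpha^6$ = $1 + \alpha^2$          = (1 0 1)

Example 6.6 (factorization of $x^7 - 1$ over $\mathbb{F}_2$) Let us factorize $x^7 - 1$ over $\mathbb{F}_2$. Since $n = 2^3 - 1$ we know that
$$x^7 - 1 = \prod_{i=0}^{6} (x - \alpha^i) \qquad (6.30)$$
where $\alpha$ is a primitive element of $\mathbb{F}_{2^3}$. The decomposition into cyclotomic cosets by
$$K_i = \{ i \cdot 2^j \bmod 7,\; j = 0, 1, 2 \} \qquad (6.31)$$
yields
$$K_0 = \{0\} \qquad (6.32)$$
$$K_1 = \{1, 2, 4\} \qquad (6.33)$$
$$K_3 = \{3, 5, 6\} \qquad (6.34)$$
and $K_2 = K_4 = K_1$ and $K_5 = K_6 = K_3$. We obtain the minimal polynomials
$$m_0(x) = (x - \alpha^0) = x + 1 \qquad (6.35)$$
$$m_1(x) = (x - \alpha)(x - \alpha^2)(x - \alpha^4) \qquad (6.36)$$
$$= x^3 + (\alpha + \alpha^2 + \alpha^4)x^2 + (\alpha^3 + \alpha^5 + \alpha^6)x + \alpha^7 \qquad (6.37)$$
$$= x^3 + x + 1 \qquad (6.38)$$
$$m_3(x) = (x - \alpha^3)(x - \alpha^5)(x - \alpha^6) \qquad (6.39)$$
$$= x^3 + (\alpha^3 + \alpha^5 + \alpha^6)x^2 + (\alpha^8 + \alpha^9 + \alpha^{11})x + \alpha^{14} \qquad (6.40)$$
$$= x^3 + x^2 + 1 \qquad (6.41)$$
where we used the extension field $\mathbb{F}_{2^3}$ as given in Table 6.8. Hence,
$$x^7 - 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1). \qquad (6.42)$$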
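The decomposition into cyclotomic cosets used in Example 6.6 is easy to automate. The following is a minimal sketch; the function name `cyclotomic_cosets` is hypothetical, and it is assumed that $n$ and $p$ have no common factor.

```python
def cyclotomic_cosets(n, p=2):
    """Partition {0, 1, ..., n-1} into cyclotomic cosets {i * p^j mod n}."""
    remaining = set(range(n))
    cosets = []
    while remaining:
        i = min(remaining)          # smallest exponent not yet assigned to a coset
        coset = []
        k = i
        while k not in coset:       # multiply by p until the sequence repeats
            coset.append(k)
            k = (k * p) % n
        cosets.append(sorted(coset))
        remaining -= set(coset)
    return cosets

if __name__ == "__main__":
    for n in (7, 15, 23):
        print(n, cyclotomic_cosets(n))
```

The number of cosets returned equals the number of irreducible factors of $x^n - 1$, and the size of each coset equals the degree of the corresponding minimal polynomial.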
Table 6.9: Factorization of $x^n - 1$ with $n = 2^m - 1$, $m = 2, \ldots, 6$, into minimal polynomials over $\mathbb{F}_2$ using the primitive polynomial of degree $m$ from Table 6.7.

n = 3:
  $m_0(x) = 1 + x$
  $m_1(x) = 1 + x + x^2$
n = 7:
  $m_0(x) = 1 + x$
  $m_1(x) = 1 + x + x^3$
  $m_3(x) = 1 + x^2 + x^3$
n = 15:
  $m_0(x) = 1 + x$
  $m_1(x) = 1 + x + x^4$
  $m_3(x) = 1 + x + x^2 + x^3 + x^4$
  $m_5(x) = 1 + x + x^2$
  $m_7(x) = 1 + x^3 + x^4$
n = 31:
  $m_0(x) = 1 + x$
  $m_1(x) = 1 + x^2 + x^5$
  $m_3(x) = 1 + x^2 + x^3 + x^4 + x^5$
  $m_5(x) = 1 + x + x^2 + x^4 + x^5$
  $m_7(x) = 1 + x + x^2 + x^3 + x^5$
  $m_{11}(x) = 1 + x + x^3 + x^4 + x^5$
  $m_{15}(x) = 1 + x^3 + x^5$
n = 63:
  $m_0(x) = 1 + x$
  $m_1(x) = 1 + x + x^6$
  $m_3(x) = 1 + x + x^2 + x^4 + x^6$
  $m_5(x) = 1 + x + x^2 + x^5 + x^6$
  $m_7(x) = 1 + x^3 + x^6$
  $m_9(x) = 1 + x^2 + x^3$
  $m_{11}(x) = 1 + x^2 + x^3 + x^5 + x^6$
  $m_{13}(x) = 1 + x + x^3 + x^4 + x^6$
  $m_{15}(x) = 1 + x^2 + x^4 + x^5 + x^6$
  $m_{21}(x) = 1 + x + x^2$
  $m_{23}(x) = 1 + x + x^4 + x^5 + x^6$
  $m_{27}(x) = 1 + x + x^3$
  $m_{31}(x) = 1 + x^5 + x^6$

Example 6.7 (factorization of $x^5 - 1$ over $\mathbb{F}_2$) The smallest extension field which contains an element of order $n = 5$ is $\mathbb{F}_{2^4}$. Since $n = 5$ divides $2^4 - 1 = 15$, we know that $\beta = \alpha^3$ has order 5, where $\alpha$ is a primitive element in $\mathbb{F}_{2^4}$. To see this, we compute
$$\beta = \alpha^3,\; \beta^2 = \alpha^6,\; \beta^3 = \alpha^9,\; \beta^4 = \alpha^{12},\; \beta^5 = \alpha^{15} = 1, \qquad (6.43)$$
i.e., $\beta^5 = 1$ and $\beta^i \neq 1$, $0 < i < 5$, and, hence, $\beta$ has order 5. The cyclotomic cosets are computed by
$$K_i = \{ i \cdot 2^j \bmod 5,\; j = 0, 1, 2, 3 \} \qquad (6.44)$$
which gives
$$K_0 = \{0\} \qquad (6.45)$$
$$K_1 = \{1, 2, 3, 4\} \qquad (6.46)$$
and $K_2 = K_3 = K_4 = K_1$. Hence we have two minimal polynomials
$$m_0(x) = (x - \beta^0) = x + 1 \qquad (6.47)$$
$$m_1(x) = (x - \alpha^3)(x - \alpha^6)(x - \alpha^9)(x - \alpha^{12}) \qquad (6.48)$$
$$= x^4 + x^3 + x^2 + x + 1 \qquad (6.49)$$
where we used the extension field $\mathbb{F}_{2^4}$ as given in Table 6.5. Hence,
$$x^5 - 1 = (x + 1)(x^4 + x^3 + x^2 + x + 1). \qquad (6.50)$$
Notice, $m_1(x)$ is not among the set of primitive polynomials in Table 6.6; hence, it is an example of an irreducible but not primitive polynomial.
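The minimal polynomials of Example 6.7 and of the $n = 15$ entries of Table 6.9 can be computed by multiplying out the linear factors in $\mathbb{F}_{2^4}$. The sketch below is illustrative only: field elements are stored as integers whose bit $i$ holds the coefficient of $\alpha^i$, polynomials in $x$ as coefficient lists with the lowest degree first, and all names are hypothetical.

```python
M = 4
POLY = 0b10011                 # p(x) = 1 + x + x^4 generates F_16 (Table 6.5)
ORDER = (1 << M) - 1

EXP, elem = [], 1
for _ in range(ORDER):         # EXP[i] = alpha^i as a bit vector
    EXP.append(elem)
    elem <<= 1
    if elem >> M:
        elem ^= POLY
LOG = {value: i for i, value in enumerate(EXP)}

def gf_mul(a, b):
    """Multiplication in F_16 via logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % ORDER]

def minimal_polynomial(exponents):
    """Coefficients (lowest degree first) of the product of (x - alpha^e)."""
    poly = [1]
    for e in exponents:
        root = EXP[e % ORDER]
        new = [0] * (len(poly) + 1)
        for degree, coeff in enumerate(poly):
            new[degree + 1] ^= coeff            # coeff * x
            new[degree] ^= gf_mul(coeff, root)  # coeff * root (minus equals plus in F_2^m)
        poly = new
    return poly

if __name__ == "__main__":
    print(minimal_polynomial([3, 6, 9, 12]))    # m_1(x) of Example 6.7
    print(minimal_polynomial([1, 2, 4, 8]))     # m_1(x) for n = 15
```

All resulting coefficients are 0 or 1, which reflects that the product over a complete cyclotomic coset has its coefficients in $\mathbb{F}_2$.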
The factorization of $x^n - 1$, $n = 3, 7, 15, 31, 63$, over $\mathbb{F}_2$ into minimal polynomials and their cyclotomic coset numbers is presented in Table 6.9. The factorization of $x^n - 1$, $1 \le n \le 32$, over $\mathbb{F}_2$ into irreducible polynomials is presented in Table 6.10. The factorization of $x^n - 1$, $1 \le n \le 10$, over $\mathbb{F}_p$, $p = 3, 5, 7, 11, 13$, into irreducible polynomials is presented in Table 6.11.
In general a cyclic code can be specified by requiring that all code words have prescribed zeros. In fact each of the code families that will be presented in the next section is based on a special method that determines the zeros of the generator polynomial.

Table 6.10: Factorization of $x^n - 1$, $1 \le n \le 32$, over $\mathbb{F}_2$ into irreducible polynomials.

n    factorization over $\mathbb{F}_2$
1    $(x + 1)$
2    $(x + 1)^2$
3    $(x^2 + x + 1)(x + 1)$
4    $(x + 1)^4$
5    $(x^4 + x^3 + x^2 + x + 1)(x + 1)$
6    $(x^2 + x + 1)^2 (x + 1)^2$
7    $(x^3 + x + 1)(x + 1)(x^3 + x^2 + 1)$
8    $(x + 1)^8$
9    $(x^2 + x + 1)(x^6 + x^3 + 1)(x + 1)$
10   $(x^4 + x^3 + x^2 + x + 1)^2 (x + 1)^2$
11   $(x^{10} + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)(x + 1)$
12   $(x^2 + x + 1)^4 (x + 1)^4$
13   $(x^{12} + x^{11} + x^{10} + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)(x + 1)$
14   $(x^3 + x + 1)^2 (x^3 + x^2 + 1)^2 (x + 1)^2$
15   $(x^4 + x^3 + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x^4 + x + 1)(x + 1)$
16   $(x + 1)^{16}$
17   $(x^8 + x^7 + x^6 + x^4 + x^2 + x + 1)(x^8 + x^5 + x^4 + x^3 + 1)(x + 1)$
18   $(x^2 + x + 1)^2 (x^6 + x^3 + 1)^2 (x + 1)^2$
19   $(x^{18} + x^{17} + x^{16} + x^{15} + x^{14} + x^{13} + x^{12} + x^{11} + x^{10} + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)(x + 1)$
20   $(x^4 + x^3 + x^2 + x + 1)^4 (x + 1)^4$
21   $(x^3 + x + 1)(x^2 + x + 1)(x^6 + x^4 + x^2 + x + 1)(x^3 + x^2 + 1)(x^6 + x^5 + x^4 + x^2 + 1)(x + 1)$
22   $(x^{10} + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)^2 (x + 1)^2$
23   $(x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1)(x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1)(x + 1)$
24   $(x^2 + x + 1)^8 (x + 1)^8$
25   $(x^4 + x^3 + x^2 + x + 1)(x^{20} + x^{15} + x^{10} + x^5 + 1)(x + 1)$
26   $(x^{12} + x^{11} + x^{10} + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)^2 (x + 1)^2$
27   $(x^2 + x + 1)(x^6 + x^3 + 1)(x^{18} + x^9 + 1)(x + 1)$
28   $(x^3 + x^2 + 1)^4 (x^3 + x + 1)^4 (x + 1)^4$
29   $(x^{28} + x^{27} + x^{26} + x^{25} + x^{24} + x^{23} + x^{22} + x^{21} + x^{20} + x^{19} + x^{18} + x^{17} + x^{16} + x^{15} + x^{14} + x^{13} + x^{12} + x^{11} + x^{10} + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)(x + 1)$
30   $(x^4 + x^3 + 1)^2 (x^4 + x^3 + x^2 + x + 1)^2 (x^2 + x + 1)^2 (x^4 + x + 1)^2 (x + 1)^2$
31   $(x^5 + x^4 + x^3 + x^2 + 1)(x^5 + x^3 + 1)(x^5 + x^4 + x^2 + x + 1)(x^5 + x^4 + x^3 + x + 1)(x^5 + x^3 + x^2 + x + 1)(x^5 + x^2 + 1)(x + 1)$
32   $(x + 1)^{32}$
Table 6.11: Factorization of $x^n - 1$, $1 \le n \le 10$, over $\mathbb{F}_p$, $p = 3, 5, 7, 11, 13$, into irreducible polynomials.

p = 3
$x - 1 = (x + 2)$
$x^2 - 1 = (x + 2)(x + 1)$
$x^3 - 1 = (x + 2)^3$
$x^4 - 1 = (x + 2)(x^2 + 1)(x + 1)$
$x^5 - 1 = (x + 2)(x^4 + x^3 + x^2 + x + 1)$
$x^6 - 1 = (x + 2)^3 (x + 1)^3$
$x^7 - 1 = (x + 2)(x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)$
$x^8 - 1 = (x + 2)(x^2 + 2x + 2)(x^2 + 1)(x^2 + x + 2)(x + 1)$
$x^9 - 1 = (x + 2)^9$
$x^{10} - 1 = (x + 2)(x^4 + x^3 + x^2 + x + 1)(x^4 + 2x^3 + x^2 + 2x + 1)(x + 1)$

p = 5
$x - 1 = (x + 4)$
$x^2 - 1 = (x + 4)(x + 1)$
$x^3 - 1 = (x^2 + x + 1)(x + 4)$
$x^4 - 1 = (x + 2)(x + 4)(x + 3)(x + 1)$
$x^5 - 1 = (x + 4)^5$
$x^6 - 1 = (x^2 + x + 1)(x + 4)(x^2 + 4x + 1)(x + 1)$
$x^7 - 1 = (x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)(x + 4)$
$x^8 - 1 = (x + 2)(x^2 + 3)(x + 4)(x^2 + 2)(x + 3)(x + 1)$
$x^9 - 1 = (x^2 + x + 1)(x + 4)(x^6 + x^3 + 1)$
$x^{10} - 1 = (x + 4)^5 (x + 1)^5$

p = 7
$x - 1 = (x + 6)$
$x^2 - 1 = (x + 6)(x + 1)$
$x^3 - 1 = (x + 5)(x + 6)(x + 3)$
$x^4 - 1 = (x^2 + 1)(x + 6)(x + 1)$
$x^5 - 1 = (x^4 + x^3 + x^2 + x + 1)(x + 6)$
$x^6 - 1 = (x + 2)(x + 5)(x + 4)(x + 6)(x + 3)(x + 1)$
$x^7 - 1 = (x + 6)^7$
$x^8 - 1 = (x^2 + 1)(x^2 + 4x + 1)(x^2 + 3x + 1)(x + 6)(x + 1)$
$x^9 - 1 = (x^3 + 3)(x + 5)(x + 6)(x^3 + 5)(x + 3)$
$x^{10} - 1 = (x^4 + x^3 + x^2 + x + 1)(x^4 + 6x^3 + x^2 + 6x + 1)(x + 6)(x + 1)$

p = 11
$x - 1 = (x + 10)$
$x^2 - 1 = (x + 10)(x + 1)$
$x^3 - 1 = (x + 10)(x^2 + x + 1)$
$x^4 - 1 = (x + 10)(x^2 + 1)(x + 1)$
$x^5 - 1 = (x + 10)(x + 7)(x + 8)(x + 6)(x + 2)$
$x^6 - 1 = (x + 10)(x^2 + x + 1)(x^2 + 10x + 1)(x + 1)$
$x^7 - 1 = (x + 10)(x^3 + 5x^2 + 4x + 10)(x^3 + 7x^2 + 6x + 10)$
$x^8 - 1 = (x^2 + 8x + 10)(x + 10)(x^2 + 1)(x^2 + 3x + 10)(x + 1)$
$x^9 - 1 = (x + 10)(x^6 + x^3 + 1)(x^2 + x + 1)$
$x^{10} - 1 = (x + 10)(x + 7)(x + 5)(x + 9)(x + 8)(x + 6)(x + 3)(x + 1)(x + 4)(x + 2)$

p = 13
$x - 1 = (x + 12)$
$x^2 - 1 = (x + 12)(x + 1)$
$x^3 - 1 = (x + 10)(x + 12)(x + 4)$
$x^4 - 1 = (x + 12)(x + 5)(x + 8)(x + 1)$
$x^5 - 1 = (x^4 + x^3 + x^2 + x + 1)(x + 12)$
$x^6 - 1 = (x + 10)(x + 3)(x + 12)(x + 9)(x + 1)(x + 4)$
$x^7 - 1 = (x^2 + 5x + 1)(x^2 + 3x + 1)(x + 12)(x^2 + 6x + 1)$
$x^8 - 1 = (x^2 + 5)(x + 12)(x^2 + 8)(x + 5)(x + 8)(x + 1)$
$x^9 - 1 = (x + 10)(x + 12)(x^3 + 10)(x^3 + 4)(x + 4)$
$x^{10} - 1 = (x^4 + x^3 + x^2 + x + 1)(x^4 + 12x^3 + x^2 + 12x + 1)(x + 12)(x + 1)$
6.2
Distance Properties
We introduce the discrete analog of the Fourier transform. Let $a(x)$ be a polynomial of degree $< n$ over $\mathbb{F}_{p^m}$, then
$$A(X) = a(1) + a(\alpha^{n-1})X + \ldots + a(\alpha)X^{n-1} \qquad (6.51)$$
$$= \sum_{i=1}^{n} a(\alpha^i) X^{n-i} \qquad (6.52)$$
where $\alpha$ is an element of order $n$ in $\mathbb{F}_{p^m}$. Hence, the coefficient $A_i = a(\alpha^{n-i})$ of the polynomial $A(X)$ is zero if $\alpha^{n-i}$ is a zero of the polynomial $a(x)$. The inverse transformation is given by
$$a(x) = n^{-1} \left( A(1) + A(\alpha)x + \ldots + A(\alpha^{n-1})x^{n-1} \right) \qquad (6.53)$$
$$= n^{-1} \sum_{i=1}^{n} A(\alpha^i) x^{i} \bmod (x^n - 1). \qquad (6.54)$$
Again, we notice that the coefficient $a_i$ of the polynomial $a(x)$ is zero if $\alpha^i$ is a zero of the polynomial $A(X)$. This transformation has some properties that are very useful when dealing with cyclic codes. Here we will use the transformation to make a general statement about the minimum distance of cyclic codes.
Consider a length $n$ cyclic code $C$ over $\mathbb{F}_{p^m}$ and let $\alpha$ be an element of order $n$ in $\mathbb{F}_{p^m}$. Assume that all code words $c(x)$ have zeros at $\alpha, \alpha^2, \ldots, \alpha^{\delta-1}$, i.e.,
$$c(\alpha^i) = 0, \quad i = 1, 2, \ldots, \delta - 1. \qquad (6.55)$$
This can easily be achieved by having a generator polynomial with the corresponding zeros. The transform of a given code word $c(x) \in C$ is given by
$$C(X) = C_0 + C_1 X + \ldots + C_{n-\delta} X^{n-\delta} \qquad (6.56)$$
where the last $\delta - 1$ coefficients vanish, $C_{n-\delta+1} = C_{n-\delta+2} = \ldots = C_{n-1} = 0$. Since $C(X)$ has degree at most $n - \delta$, it has at most $n - \delta$ zeros in $\mathbb{F}_{p^m}$ unless $c(x)$ is the zero code word. Hence, there are at most $n - \delta$ coefficients $c_i = 0$ in $c(x)$. In other words, we have shown that any non-zero code word in $C$ has at least Hamming weight $\delta$, and we conclude that the minimum distance of $C$ is at least $\delta$. We formulate this as a theorem:
Theorem 6.3 (minimum distance) Given a length $n$ cyclic code over $\mathbb{F}_{p^m}$ with generator polynomial
$$g(x) = \prod_{i \in K} (x - \alpha^i) \qquad (6.57)$$
where $\alpha$ is an element of order $n$ in $\mathbb{F}_{p^m}$ and $\{1, 2, \ldots, \delta - 1\} \subseteq K$, then the minimum distance of the code satisfies
$$d \ge \delta. \qquad (6.58)$$
It is easily shown that Theorem 6.3 also holds for the more general case of $\delta - 1$ consecutive zeros $\alpha^{l+1}, \alpha^{l+2}, \ldots, \alpha^{l+\delta-1}$, $0 \le l \le n - 1$, of the generator polynomial. All the following code constructions apply this theorem to generate codes with large minimum distances. The differences between the constructions are the underlying finite field and small variations in how the generator polynomial is obtained.
6.3
6.3.1

BCH Codes
The family of BCH codes is the most important code family among cyclic codes. They were discovered by R.C. Bose and D.K. Ray-Chaudhuri (1960) and independently by A. Hocquenghem (1959).
Definition 6.6 (BCH code) A cyclic code of length $n$ over $\mathbb{F}_p$ is called a BCH code of designed distance $\delta$ if its generator polynomial is
$$g(x) = \mathrm{lcm}\left( m_1(x), m_2(x), \ldots, m_{\delta-1}(x) \right) \qquad (6.59)$$
where $m_i(x)$, $0 < i < \delta$, are the minimal polynomials of $\alpha^i$ and $\alpha$ is an element of order $n$ in $\mathbb{F}_{p^m}$.
The generator polynomial $g(x)$ of a BCH code has $n - k$ zeros in $\mathbb{F}_{p^m}$. Due to the definition of minimal polynomials, $\delta - 1$ of these zeros are $\alpha, \alpha^2, \ldots, \alpha^{\delta-1}$. Hence, Theorem 6.3 applies and the minimum distance of BCH codes satisfies
$$d_m \ge \delta. \qquad (6.60)$$
The minimum distance can be larger than the designed distance, and sometimes it is! Finding the actual minimum distance of a BCH code is in general a hard problem. Weight enumerators have been found by exhaustive search for some BCH codes with either small $k$ or small $n - k$. The code dimension $k$ is determined by the degree $n - k$ of the generator polynomial. Since there does not exist a simple formula to compute the degree of $g(x)$, tables of the code parameters for a wide range of BCH codes exist. For the most frequently used BCH codes tables exist containing the generator polynomials.
Constructing a BCH code of length $n$ over $\mathbb{F}_p$ with designed distance $\delta$ requires the following steps:
1. find an element $\alpha$ of order $n$ in some extension field $\mathbb{F}_{p^m}$:
   a) if $n = p^m - 1$ for some $m$, then we take a primitive element in $\mathbb{F}_{p^m}$
   b) if $n \neq p^m - 1$ for any $m$, then find an element of order $n$ in $\mathbb{F}_{p^m}$, where $m$ is the smallest integer such that $n$ divides $p^m - 1$
2. compute the cyclotomic cosets $K_i$ of $i$, $i = 1, 2, \ldots, \delta - 1$
3. compute the minimal polynomials $m_i(x)$ of the distinct cyclotomic cosets $K_i$
Let us consider an example:
Example 6.8 (BCH code) Let us construct a length $n = 15$ BCH code over $\mathbb{F}_2$ with designed distance $\delta = 5$. This code is able to correct 2 errors. Firstly, we notice that $n = 2^m - 1$ for $m = 4$ and we take a primitive element $\alpha$ in $\mathbb{F}_{2^m}$ to obtain $g(x)$. The cyclotomic cosets, given by $K_i = \{ i \cdot 2^j \bmod 15,\; j = 0, 1, 2, 3 \}$, $i = 1, 2, 3, 4$, are
$$K_1 = \{1, 2, 4, 8\} \qquad (6.61)$$
$$K_3 = \{3, 6, 9, 12\} \qquad (6.62)$$
and $K_2 = K_4 = K_1$. There exist two distinct minimal polynomials
$$m_1(x) = \prod_{k \in K_1} (x - \alpha^k) \qquad (6.63)$$
$$m_3(x) = \prod_{k \in K_3} (x - \alpha^k) \qquad (6.64)$$
where $\alpha$ is a primitive element in $\mathbb{F}_{2^4}$. It is left to the reader to use Table 6.5 to compute the two minimal polynomials. From Table 6.9 we have $m_1(x) = 1 + x + x^4$ and $m_3(x) = 1 + x + x^2 + x^3 + x^4$. The generator polynomial is
$$g(x) = m_1(x) m_3(x) = x^8 + x^7 + x^6 + x^4 + 1. \qquad (6.65)$$
Hence the BCH code has dimension $k = 7$.
More BCH codes can easily be constructed with the help of Tables 6.9, 6.10 and 6.11. The reader who is not familiar with finite fields should carefully reread Section 6.1.2 and study finite field arithmetic by constructing more examples.
BCH codes as defined above are often called narrow sense BCH codes. We distinguish: if $\alpha$ is a primitive element of $\mathbb{F}_{p^m}$, i.e., $n = p^m - 1$, the code is called a primitive BCH code; if $n \neq p^m - 1$ the code is called a non-primitive BCH code; if $p = 2$ the code is called a binary BCH code; if $p > 2$ the code is called a non-binary BCH code.
Sometimes a generalization of the definition above is considered. Then the generator polynomial is constructed such that $\alpha^{l}, \alpha^{l+1}, \ldots, \alpha^{l+\delta-2}$ are zeros. Then there exist again $\delta - 1$ consecutive zeros (in powers of $\alpha$), but now starting with $\alpha^l$ instead of $\alpha$.
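The generator polynomial of Example 6.8 can be verified by a short computation. The following is a minimal sketch under the assumption that binary polynomials are stored as integers (bit $i$ holds the coefficient of $x^i$); the helper names are hypothetical, and the minimum distance is found by exhaustive search, which is only feasible for such a small code.

```python
def polymul(a, b):
    """Product of two binary polynomials (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def polymod(a, g):
    """Remainder of the binary polynomial a(x) after division by g(x)."""
    deg_g = g.bit_length() - 1
    while a.bit_length() - 1 >= deg_g:
        a ^= g << (a.bit_length() - 1 - deg_g)
    return a

N = 15
M1 = 0b10011                      # m_1(x) = 1 + x + x^4
M3 = 0b11111                      # m_3(x) = 1 + x + x^2 + x^3 + x^4
G = polymul(M1, M3)               # generator polynomial g(x) = m_1(x) m_3(x)
K = N - (G.bit_length() - 1)      # code dimension k = n - deg g(x)

# Exhaustive search over all non-zero code words u(x) g(x)
MIN_WEIGHT = min(bin(polymul(u, G)).count("1") for u in range(1, 1 << K))

if __name__ == "__main__":
    print(bin(G), K, MIN_WEIGHT)
```

For this code the exhaustive search returns a minimum distance equal to the designed distance.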
6.3.2
Golay Code
The Golay code is an example of a binary, non-primitive BCH code. It is, in fact, the only non-trivial perfect binary code that is able to correct more than a single error. There exist various methods to construct this code. Assume we would like to construct a length $n = 23$ BCH code over $\mathbb{F}_2$ with designed distance $\delta = 5$. At first we notice that $n \neq 2^m - 1$ for any $m$. The smallest integer $m$ such that $n = 23$ divides $2^m - 1$ is $m = 11$. Since $2^{11} - 1 = 23 \cdot 89$, we know that $\beta = \alpha^{89}$ is an element of order 23 in $\mathbb{F}_{2^{11}}$, given that $\alpha$ is a primitive element. The cyclotomic cosets, given by
$$K_i = \{\, i \cdot 2^j \bmod 23,\ j = 0, 1, \ldots, 10 \,\}, \quad i = 1, 2, 3, 4, \qquad (6.66)$$
are $K_1 = \{1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18\}$ and $K_2 = K_3 = K_4 = K_1$. Hence, the generator polynomial is
$$g(x) = m_1(x) = \prod_{k \in K_1} (x - \beta^k). \qquad (6.67)$$
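The two number-theoretic steps used above (the smallest $m$ with $23 \mid 2^m - 1$ and the cyclotomic cosets modulo 23) are easy to reproduce. The sketch below is only an illustration; the function names are assumptions of this example.

```python
def smallest_extension_degree(n: int, p: int = 2) -> int:
    """Smallest m such that n divides p^m - 1 (multiplicative order of p mod n)."""
    m, power = 1, p % n
    while power != 1:
        power = (power * p) % n
        m += 1
    return m


def cyclotomic_coset(i: int, n: int, p: int = 2) -> list:
    """Cyclotomic coset {i * p^j mod n} as a sorted list."""
    coset = []
    k = i % n
    while k not in coset:
        coset.append(k)
        k = (k * p) % n
    return sorted(coset)


n = 23
m = smallest_extension_degree(n)
cosets = {i: cyclotomic_coset(i, n) for i in (1, 2, 3, 4)}
print(m, (2**m - 1) // n, cosets[1])
```

Since all four cosets coincide, a single minimal polynomial of degree 11 already has $\beta, \beta^2, \beta^3, \beta^4$ among its zeros.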
To actually compute $g(x)$ we need to generate $\mathbb{F}_{2^{11}}$, e.g., by using the primitive polynomial $p(x) = 1 + x^2 + x^{11}$ from Table 6.6. According to Table 6.7, $x^{23} - 1$ factorizes into $(x + 1)$ and two irreducible polynomials of degree 11. Depending on which primitive polynomial is used to generate $\mathbb{F}_{2^{11}}$ we obtain the generator polynomial of the Golay code as
$$g(x) = x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1 \qquad (6.68)$$
or
$$g(x) = x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1. \qquad (6.69)$$
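The stated factorization of $x^{23} - 1$ can be verified by multiplying the three factors over $\mathbb{F}_2$. The following sketch assumes the bit-mask representation of binary polynomials (bit $i$ is the coefficient of $x^i$); `gf2_mul` is an illustrative helper, not part of these notes.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials given as bit masks (bit i <-> x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


# The two candidate generator polynomials of the Golay code.
g1 = sum(1 << e for e in (11, 10, 6, 5, 4, 2, 0))
g2 = sum(1 << e for e in (11, 9, 7, 6, 5, 1, 0))

# (x + 1) * g1(x) * g2(x) should equal x^23 + 1 (minus equals plus over GF(2)).
product = gf2_mul(gf2_mul(0b11, g1), g2)
print(product == (1 << 23) | 1)
```

Note that $g_2(x)$ is the reciprocal polynomial of $g_1(x)$, which is why either choice generates a code with the same parameters.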
The minimum distance of the Golay code is $d_m = 7$. Nevertheless, the algebraic decoding technique presented in the next section relies on the designed distance $\delta = 5$ and is therefore only able to correct 2 errors. Often, to fully exploit the error correction capability of the code, large look-up tables or the minimal code trellis are used for ML sequence estimation.
6.3.3 Reed-Solomon Codes
Reed-Solomon codes are among the simplest examples of non-binary BCH codes. Due to their capability of correcting error bursts, which often occur in practice, these codes are used in many system applications.

Definition 6.7 (Reed-Solomon code) A Reed-Solomon code (RS code) is a primitive BCH code of length $n = q - 1$ over $\mathbb{F}_q$. The generator polynomial is given by
$$g(x) = \prod_{i=1}^{n-k} (x - \alpha^i) \qquad (6.70)$$
where $\alpha$ is a primitive element in $\mathbb{F}_q$.
From Theorem 6.3 it follows immediately that $d_m \ge \delta$, where according to the definition above the designed distance is $\delta = n - k + 1$. On the other hand we know from the Singleton bound for non-binary codes that $d_m \le n - k + 1$ and we conclude that the minimum distance of RS codes is
$$d_m = n - k + 1. \qquad (6.71)$$
It satisfies the Singleton bound with equality and, hence, RS codes are called maximum distance separable.

Example 6.9 (RS code over $\mathbb{F}_7$) Let us construct a length $n = 6$ RS code with minimum distance $d_m = 3$. Hence, the code symbols are from $\mathbb{F}_q$, $q = n + 1 = 7$. To have a minimum distance $d_m = 3$ we have the generator polynomial $g(x) = (x - \alpha)(x - \alpha^2)$ where $\alpha$ is a primitive element in $\mathbb{F}_7$. According to Table 6.3, 3 and 5 are the two primitive elements in $\mathbb{F}_7$; taking $\alpha = 3$ gives
$$g(x) = (x - 3)(x - 3^2) \bmod 7 = x^2 + 2x + 6. \qquad (6.72)$$
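The computation of Example 6.9 can be reproduced with plain integer arithmetic modulo 7. The sketch below stores a polynomial as a coefficient list, lowest degree first; all function names are assumptions of this illustration.

```python
def poly_mul_mod(a: list, b: list, q: int) -> list:
    """Multiply two polynomials over F_q (q prime), coefficients lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out


def rs_generator(alpha: int, num_roots: int, q: int) -> list:
    """g(x) = prod_{i=1}^{num_roots} (x - alpha^i) over F_q."""
    g = [1]
    for i in range(1, num_roots + 1):
        g = poly_mul_mod(g, [(-pow(alpha, i, q)) % q, 1], q)
    return g


def primitive_elements(q: int) -> list:
    """Elements of F_q (q prime) whose multiplicative order is q - 1."""
    result = []
    for a in range(1, q):
        order, x = 1, a
        while x != 1:
            x = (x * a) % q
            order += 1
        if order == q - 1:
            result.append(a)
    return result


primitive = primitive_elements(7)
g = rs_generator(alpha=3, num_roots=2, q=7)
print(primitive, g)
```

The list `g` holds the coefficients of $6 + 2x + x^2$, in agreement with (6.72).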
In many system applications RS codes over binary extension fields $\mathbb{F}_{2^m}$ are used. Then the code symbols, i.e., the coefficients of the code polynomial, can be represented as binary vectors.

Example 6.10 (RS code over $\mathbb{F}_{2^3}$) Let us construct a length $n = 7$ RS code with minimum distance $d_m = 3$. The code symbols are from the extension field $\mathbb{F}_{2^3}$. Again, we have the generator polynomial
$$g(x) = x^2 - (\alpha + \alpha^2)x + \alpha^3, \qquad (6.73)$$
but now $\alpha$ is a primitive element in $\mathbb{F}_{2^3}$. Let $p(x) = 1 + x + x^3$ be used to generate the extension field, then
$$g(x) = (1\ 0\ 0)\,x^2 + (0\ 1\ 1)\,x + (1\ 1\ 0) \qquad (6.74)$$
where we used the vector representation of the finite field as presented in Table 6.8.
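The vector representation in (6.74) can be reproduced with a few lines of arithmetic in $\mathbb{F}_{2^3}$. This is a minimal sketch, assuming that a field element is stored as a 3-bit integer whose bit $i$ is the coefficient of $\alpha^i$, so that the vector $(a_0\ a_1\ a_2)$ lists the bits from lowest to highest; the helper names are illustrative.

```python
PRIM = 0b1011  # p(x) = 1 + x + x^3, bit i <-> x^i


def gf8_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^3) generated by p(x) = 1 + x + x^3."""
    r = 0
    for i in range(3):
        if (b >> i) & 1:
            r ^= a << i
    for d in (4, 3):                 # reduce the terms of degree 4 and 3
        if (r >> d) & 1:
            r ^= PRIM << (d - 3)
    return r


def vec(e: int) -> tuple:
    """Vector representation (a0, a1, a2) of a field element."""
    return tuple((e >> i) & 1 for i in range(3))


def eval_poly(coeffs: list, x: int) -> int:
    """Evaluate a polynomial (lowest degree first) at x using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf8_mul(acc, x) ^ c
    return acc


alpha = 0b010
alpha2 = gf8_mul(alpha, alpha)
alpha3 = gf8_mul(alpha2, alpha)

# g(x) = x^2 + (alpha + alpha^2) x + alpha^3; minus equals plus in characteristic 2.
g = [alpha3, alpha ^ alpha2, 1]
print([vec(c) for c in reversed(g)])
```

Evaluating `eval_poly(g, alpha)` and `eval_poly(g, alpha2)` returns 0 in both cases, confirming that $\alpha$ and $\alpha^2$ are the zeros of $g(x)$.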
6.3.4 Quadratic Residue Codes
This is another family of cyclic codes. Even though it is interesting from a theoretical point of view, this code family is rather small and of limited relevance in practice. It is not presented here; the interested reader can find more, e.g., in [vL82] or [Bos99].
6.4 Decoding Aspects
In our discussion of decoding aspects in Section 5.4 of Chapter 5 we have seen that there exists one decoding rule that minimizes the bit error probability and another that minimizes the block error probability. For both decoding rules we have presented efficient decoding techniques that operate in the minimal code trellis. Clearly, these can be applied to binary cyclic block codes and, in a modified form, to non-binary cyclic block codes. Unfortunately, as the code length $n$ increases, the minimal trellis complexity of cyclic codes increases considerably and a real-time signal processing implementation is not possible with state-of-the-art hardware. Due to the algebraic structure of cyclic codes it has been possible to develop algebraic decoding techniques that realize a bounded minimum distance (BMD) decoder. Such a decoder is able to correct $t$ errors where $t$ satisfies
$$2t + 1 \le \delta \qquad (6.75)$$
and $\delta$ is the designed distance of the code. In the following we will present a decoding technique that realizes a BMD decoder for BCH codes and their related code families.
6.4.1 Algebraic Decoding Techniques
Consider the received polynomial $r(x) = r_0 + r_1 x + \ldots + r_{n-1} x^{n-1}$ given by
$$r(x) = c(x) + e(x) \qquad (6.76)$$
where $c(x)$ is the code polynomial of a length $n$ BCH code with designed distance $\delta = 2t + 1$ and $e(x)$ is the error polynomial $e(x) = e_0 + e_1 x + \ldots + e_{n-1} x^{n-1}$. Applying the discrete Fourier transformation as introduced in Section 6.2 we obtain
$$R(X) = C(X) + E(X) \qquad (6.77)$$
with $C(X) = C_0 + C_1 X + \ldots + C_{n-2t-1} X^{n-2t-1}$, where the $2t$ highest coefficients of the polynomial are zero^4, i.e., $C_{n-2t} = C_{n-2t+1} = \ldots = C_{n-1} = 0$. Hence, we have
$$E_{n-2t} = R_{n-2t},\quad E_{n-2t+1} = R_{n-2t+1},\quad \ldots,\quad E_{n-1} = R_{n-1}. \qquad (6.78)$$
Starting with the known $2t$ values of $E(X)$ the decoder must find the $n - 2t$ other coefficients that produce an estimated error polynomial $\hat{E}(X)$ such that $\hat{e}(x)$ has the minimum number of non-zero coefficients. In the following we will need a nice property of the discrete Fourier transform. Let $a(x)$, $b(x)$, and $c(x)$ be degree-$(n-1)$ polynomials over $\mathbb{F}_q$ and let $A(X)$, $B(X)$, and $C(X)$ be their transforms. We define the convolution of two polynomials by $a(x) \ast b(x) = a_0 b_0 + a_1 b_1 x + \ldots + a_{n-1} b_{n-1} x^{n-1}$. If $c(x) = a(x) \ast b(x)$ it follows that $C(X) = A(X)B(X) \bmod (X^n - 1)$. Equivalently, if $c(x) = a(x)b(x) \bmod (x^n - 1)$, then $C(X) = A(X) \ast B(X)$. Suppose that there are $\nu \le t$ errors and define the error locator polynomial
$$L(X) = \prod_{i=1}^{\nu} (1 - \alpha^{l_i} X) \qquad (6.79)$$
such that its inverse transform $l(x)$ has a zero coefficient at every error position, i.e., $l_j = 0$ if $e_j \neq 0$. Hence, $l(x) \ast e(x) = 0$ and it follows that
$$L(X)E(X) = 0 \bmod (X^n - 1). \qquad (6.80)$$

4 If the underlying BCH code is not narrow-sense, then we simply have to shift the corresponding indices in order to obtain the zero coefficients in $C(X)$.
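We obtain the system of $n$ linear equations (all indices of $E$ are taken modulo $n$)
$$\begin{array}{lrcl}
(0) & E_0 L_0 + E_{n-1} L_1 + \ldots + E_{n-\nu} L_\nu & = & 0 \\
(1) & E_1 L_0 + E_0 L_1 + \ldots + E_{n-\nu+1} L_\nu & = & 0 \\
 & & \vdots & \\
(n-\nu-1) & E_{n-\nu-1} L_0 + E_{n-\nu-2} L_1 + \ldots + E_{n-2\nu-1} L_\nu & = & 0 \\
(n-\nu) & E_{n-\nu} L_0 + E_{n-\nu-1} L_1 + \ldots + E_{n-2\nu} L_\nu & = & 0 \\
 & & \vdots & \\
(n-1) & E_{n-1} L_0 + E_{n-2} L_1 + \ldots + E_{n-\nu-1} L_\nu & = & 0.
\end{array} \qquad (6.81)$$
Since $\nu \le t$, the last $\nu$ equations involve only the known coefficients $E_{n-2t}, \ldots, E_{n-1}$, and
$$\begin{pmatrix}
E_{n-\nu} & E_{n-\nu-1} & \ldots & E_{n-2\nu} \\
E_{n-\nu+1} & E_{n-\nu} & \ldots & E_{n-2\nu+1} \\
\vdots & \vdots & & \vdots \\
E_{n-1} & E_{n-2} & \ldots & E_{n-\nu-1}
\end{pmatrix}
\begin{pmatrix} L_0 \\ L_1 \\ \vdots \\ L_\nu \end{pmatrix} = 0, \qquad (6.82)$$
which is a system of linear equations for the non-zero coefficients of $L(X)$, is called the key equation. Once the key equation is solved, we can recursively determine the remaining coefficients of the error polynomial $\hat{E}(X)$. Finally, we subtract its inverse transform $\hat{e}(x)$ from the polynomial $r(x)$ and obtain the estimated code word
$$\hat{c}(x) = r(x) - \hat{e}(x). \qquad (6.83)$$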
The critical part in the design of such a decoder is to find an efficient and effective algorithm that solves the system of linear equations described above. Two such methods exist: the Berlekamp-Massey algorithm and the Euclidean algorithm. Both algorithms are described extensively in the literature.
6.4.2
Consider a length $n$ BCH code with designed distance $\delta = 2t + 1$. Then the code word error probability at the output of a $t$-error-correcting bounded distance decoder is
$$P_w = \sum_{i=t+1}^{n} \binom{n}{i} p^i (1 - p)^{n-i} \qquad (6.84)$$
where $p$ is the probability that a symbol error occurs on the channel. We can upper bound the bit error probability by assuming that a pattern of $i > t$ channel errors will lead to a decoded code word with $i + t$ different code symbols compared to the correct code word. Then,
$$P_b \le \sum_{i=t+1}^{n} \frac{i + t}{n} \binom{n}{i} p^i (1 - p)^{n-i}. \qquad (6.85)$$
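Both expressions are finite sums and can be evaluated directly. The following sketch does so for given $n$, $t$ and $p$; the function names and the example parameters ($n = 15$, $t = 2$, $p = 0.01$) are chosen for illustration only.

```python
from math import comb


def word_error_prob(n: int, t: int, p: float) -> float:
    """Code word error probability (6.84) of a t-error-correcting BMD decoder."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1, n + 1))


def bit_error_bound(n: int, t: int, p: float) -> float:
    """Upper bound (6.85) on the bit error probability."""
    return sum(
        (i + t) / n * comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(t + 1, n + 1)
    )


# Hypothetical operating point: length-15 code with t = 2 and p = 0.01.
print(word_error_prob(15, 2, 0.01), bit_error_bound(15, 2, 0.01))
```

For small $p$ the sums are dominated by the term $i = t + 1$, so the bound on $P_b$ is roughly $(2t + 1)/n$ times $P_w$.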
Now we can compute the coding gain at a given bit error probability.
6.4.3 Implementation Aspects
Considerable research activity has led to a number of computationally efficient decoding algorithms for BCH codes and related code families such as RS codes. Their implementation aspects have been studied in detail and are well documented. They are not treated here; the interested reader will find more in, e.g., [Bla84].
Part III
Trellis Coding
Chapter 7
Linear Binary Convolutional Codes

Linear binary convolutional codes are a subclass of trellis codes. Many communication systems designed nowadays apply channel coding schemes that employ convolutional codes.
7.1 Structural Properties
When considering structural properties of convolutional codes we carefully distinguish between a convolutional code $C$, its generator matrices $G(D)$, and a convolutional encoder $E$. The latter is the realization of a given generator matrix by a linear sequential circuit using binary adders and delay elements. The minimal number of delay elements that are required to realize a given generator matrix is called the generator matrix degree, $\deg G$. Similarly, the minimal degree among all generator matrices of a given code $C$ is called the code degree, $\deg C$. The following inequality is satisfied: $\deg C \le \deg G$. Besides the code rate, the code degree and the generator matrix degree are the most important structural parameters of a convolutional code and a convolutional generator matrix, respectively.
7.1.1 Code, Generator Matrix, and Encoder
In the following we establish the basic notation for the information sequence, the code sequence, and the generator matrix, both in the time domain and in the $D$-transformed domain. The fundamental definitions for a convolutional code, a convolutional generator matrix, and a convolutional encoder are given. A rate $R = b/c$ convolutional encoder is a linear device that performs a one-to-one mapping of the input sequence of binary $b$-tuples
$$u = u_r u_{r+1} \ldots = u_r^{(1)} u_r^{(2)} \ldots u_r^{(b)}\; u_{r+1}^{(1)} u_{r+1}^{(2)} \ldots u_{r+1}^{(b)} \ldots \qquad (7.1)$$
to the output sequence of binary $c$-tuples
$$v = v_r v_{r+1} \ldots = v_r^{(1)} v_r^{(2)} \ldots v_r^{(c)}\; v_{r+1}^{(1)} v_{r+1}^{(2)} \ldots v_{r+1}^{(c)} \ldots. \qquad (7.2)$$
Both sequences are semi-infinite and start at time $t = r$, $r \in \mathbb{Z}$. The linear mapping can be described by a semi-infinite generator matrix $G$ such that
$$v = uG \qquad (7.3)$$
with
$$G = \begin{pmatrix} G_0 & G_1 & G_2 & \ldots \\ & G_0 & G_1 & \ldots \\ & & \ddots & \ddots \end{pmatrix} \qquad (7.4)$$
where the $G_i$ are $b \times c$ sub-matrices. Often it is convenient to use generating functions to describe a convolutional encoder. We introduce the delay operator $D$ and write the input sequence $u$ as the $b$-dimensional row vector
$$u(D) = \left( u^{(1)}(D)\ \ u^{(2)}(D)\ \ \ldots\ \ u^{(b)}(D) \right) = \sum_{t=r}^{\infty} u_t D^t \qquad (7.5)$$
and the output sequence $v$ as the $c$-dimensional row vector
$$v(D) = \left( v^{(1)}(D)\ \ v^{(2)}(D)\ \ \ldots\ \ v^{(c)}(D) \right) = \sum_{t=r}^{\infty} v_t D^t. \qquad (7.6)$$
Then the linear mapping is described by a $b \times c$ generator matrix $G(D)$ such that
$$v(D) = u(D)G(D) \qquad (7.7)$$
where the elements of $G(D)$ are rational transfer functions of the form
$$g(D) = \frac{f_0 + f_1 D + f_2 D^2 + \ldots + f_m D^m}{1 + q_1 D + q_2 D^2 + \ldots + q_m D^m}. \qquad (7.8)$$
The delay-free denominator ($q_0 = 1$) guarantees that $g(D)$ is realizable.
Figure 7.1: Memory $m = 2$, rate $R = 1/2$ convolutional encoder.

Example 7.1 (polynomial generator matrix) The generator matrix of the rate $R = 1/2$ convolutional encoder depicted in Figure 7.1 is
$$G = \begin{pmatrix}
11 & 10 & 11 & & & \\
 & 11 & 10 & 11 & & \\
 & & 11 & 10 & 11 & \\
 & & & \ddots & \ddots & \ddots
\end{pmatrix} \qquad (7.9)$$
or, written by using the delay operator $D$,
$$G(D) = (1\ \ 1) + (1\ \ 0)\,D + (1\ \ 1)\,D^2 \qquad (7.10)$$
$$\phantom{G(D)} = \left( 1 + D + D^2 \quad 1 + D^2 \right). \qquad (7.11)$$
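The relation between the sub-matrices $G_0 = (1\ 1)$, $G_1 = (1\ 0)$, $G_2 = (1\ 1)$ and the encoder output can be made concrete with a small shift-register simulation. The sketch below is an illustration under the assumption that the encoder starts in the zero state; the function name `conv_encode` and its tap convention are not taken from these notes.

```python
def conv_encode(u, gens=((1, 1, 1), (1, 0, 1))):
    """Encode a binary sequence with a rate 1/c feedforward encoder.

    gens[j][i] is the coefficient of D^i in the j-th generator polynomial;
    the default corresponds to G(D) = (1 + D + D^2, 1 + D^2).
    """
    memory = len(gens[0]) - 1
    state = [0] * memory              # contents of the delay elements
    out = []
    for bit in u:
        window = [bit] + state        # u_t, u_{t-1}, ..., u_{t-m}
        out.append(tuple(sum(g * w for g, w in zip(gen, window)) % 2 for gen in gens))
        state = window[:-1]           # shift the register by one position
    return out


# The impulse response reproduces the sub-matrices G0, G1, G2 of (7.9).
print(conv_encode([1, 0, 0]))
print(conv_encode([1, 1, 0, 0]))
```

By linearity, the response to the input 1 1 0 0 is the sum of the impulse response and its copy delayed by one time unit.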
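Example 7.2 (rational generator matrix) The generator matrix of the rate $R = 1/2$ convolutional encoder depicted in Figure 7.2 is
$$G = \begin{pmatrix}
11 & 01 & 01 & 01 & \ldots \\
 & 11 & 01 & 01 & \ldots \\
 & & 11 & 01 & \ldots \\
 & & & \ddots & \ddots
\end{pmatrix} \qquad (7.12)$$
or, by using the delay operator $D$,
$$G(D) = \left( 1 \quad \frac{1}{1 + D} \right). \qquad (7.13)$$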
Figure 7.2: Memory $m = 1$, rate $R = 1/2$ convolutional encoder.

Formally, we denote by $\mathbb{F}_2((D))$ the field of binary Laurent series. The element $x(D) = \sum_{i=r}^{\infty} x_i D^i \in \mathbb{F}_2((D))$, $r \in \mathbb{Z}$, contains at most finitely many negative powers of $D$. By $\mathbb{F}_2[[D]]$ we denote the ring of formal power series. The element $f(D) = \sum_{i=0}^{\infty} f_i D^i \in \mathbb{F}_2[[D]]$ is a Laurent series without negative powers in $D$; $\mathbb{F}_2[[D]]$ is a subset of $\mathbb{F}_2((D))$. A polynomial $p(D) = \sum_{i=0}^{\infty} p_i D^i$ contains no negative and only finitely many positive powers of $D$. The ring of binary polynomials is denoted by $\mathbb{F}_2[D]$ and is a subset of $\mathbb{F}_2[[D]]$. Finally, we denote by $\mathbb{F}_2(D)$ the field of binary rational functions. The element $x(D) = p(D)/p'(D) = \sum_{i=r}^{\infty} x_i D^i \in \mathbb{F}_2(D)$, with $p(D), p'(D) \in \mathbb{F}_2[D]$, is obtained by long division; $\mathbb{F}_2(D)$ is a subset of $\mathbb{F}_2((D))$.

We can consider $n$-tuples of elements from $\mathbb{F}_2[D]$, $\mathbb{F}_2[[D]]$, $\mathbb{F}_2(D)$, and $\mathbb{F}_2((D))$. Then, for example, the $n$-tuple $x(D) = (x^{(1)}(D), x^{(2)}(D), \ldots, x^{(n)}(D))$, where $x^{(1)}(D), x^{(2)}(D), \ldots, x^{(n)}(D) \in \mathbb{F}_2((D))$, can be expressed as $x(D) = \sum_{t=r}^{\infty} (x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(n)}) D^t$, $r \in \mathbb{Z}$, with $x^{(i)}(D) = \sum_{t=r}^{\infty} x_t^{(i)} D^t$, $1 \le i \le n$. So we will denote the set of $n$-tuples of elements from $\mathbb{F}_2((D))$ by $\mathbb{F}_2^n((D))$. Similarly, we have $\mathbb{F}_2^n[D]$, $\mathbb{F}_2^n[[D]]$, and $\mathbb{F}_2^n(D)$. Now we are well prepared for the following definitions.

Definition 7.1 (convolutional transducer) A rate $R = b/c$ (binary) convolutional transducer over the field of rational functions $\mathbb{F}_2(D)$ is a linear mapping
$$\mathbb{F}_2^b((D)) \to \mathbb{F}_2^c((D)), \qquad u(D) \mapsto v(D) \qquad (7.14)$$
which can be represented as $v(D) = u(D)G(D)$, where $G(D)$ is a $b \times c$ transfer function matrix of rank $b$ with entries in $\mathbb{F}_2(D)$ and $v(D)$ is called a code sequence arising from the information sequence $u(D)$.

Since the transfer function matrix has rank $b$ over the field $\mathbb{F}_2(D)$, the transducer map is injective; that is, we are able to reconstruct the information sequence $u(D)$ from the code sequence $v(D)$.

Definition 7.2 (convolutional code) A rate $R = b/c$ convolutional code $C$ over $\mathbb{F}_2$ is the image set of a rate $R = b/c$ convolutional transducer with $G(D)$ of rank $b$ over $\mathbb{F}_2(D)$ as its transfer function matrix.

Hence, we can regard a convolutional code as the $\mathbb{F}_2((D))$ row space of $G(D)$ and in this sense it is a rate $R = b/c$ block code over the field of binary Laurent series encoded by $G(D)$.

Definition 7.3 (convolutional generator matrix) A transfer function matrix (of a convolutional code) is called a convolutional generator matrix if it (has full rank and) is realizable.

Definition 7.4 (convolutional encoder) A rate $R = b/c$ convolutional encoder of a convolutional code with generator matrix $G(D)$ over $\mathbb{F}_2(D)$ is a realization by a linear sequential circuit of a rate $R = b/c$ convolutional transducer whose transfer function matrix $G(D)$ (has full rank and) is realizable.
7.1.2 Minimal Encoder Realization
A convolutional encoder $E$ is a physical realization of a generator matrix $G(D)$ by a linear sequential circuit. Let $\mu$ be the number of delay elements required to build the encoder and let the binary $\mu$-tuple $\sigma = \sigma_1 \sigma_2 \ldots \sigma_\mu$ denote the contents of these delay elements.

Definition 7.5 (encoder state and encoder state space) Given a convolutional encoder $E$. The contents of the delay elements at a certain time $t$ is the encoder state $\sigma_t$. The set of all encoder states reachable from the zero state $\sigma = 0$ is the encoder state space $S_E$.

A rate $R = b/c$ convolutional encoder is initialized with $\sigma_r = 0$ and generates for all $t \ge r$ an output $c$-tuple $v_t = v_t^{(1)} v_t^{(2)} \ldots v_t^{(c)}$ which is a function of the encoder state $\sigma_t$ and the input $b$-tuple $u_t = u_t^{(1)} u_t^{(2)} \ldots u_t^{(b)}$. Clearly, the encoder state $\sigma_{t+1}$ at time $t + 1$ depends on $\sigma_t$ and $u_t$. Hence, a convolutional encoder is described by the (encoder) state space description
$$\sigma_{t+1} = \sigma_t A + u_t B \qquad (7.15)$$
$$v_t = \sigma_t C + u_t F \qquad (7.16)$$
and the initial encoder state
$$\sigma_r = 0, \qquad (7.17)$$
where the $\mu \times \mu$ matrix $A$, the $b \times \mu$ matrix $B$, the $\mu \times c$ matrix $C$, and the $b \times c$ matrix $F$ have entries from $\mathbb{F}_2$. It is a straightforward matter to obtain the matrices $A, B, C, F$ of the state space description of a given generator matrix $G(D)$ in controller or in observer canonical form. On the other hand, the generator matrix of a given convolutional encoder $E$ with state space description $A, B, C, F$ is obtained by
$$G(D) = B \left( A - D^{-1} I_\mu \right)^{-1} C + F \qquad (7.18)$$
where $I_\mu$ is the $\mu \times \mu$ identity matrix. Clearly, (7.15)-(7.17) uniquely determine the encoder state space $S_E$. The encoder state space dimension is called the (convolutional) encoder degree
$$\deg E \triangleq \log_2 |S_E|. \qquad (7.19)$$
Generally, not all $2^\mu$ binary $\mu$-tuples are reachable from the zero-state. We have
$$\deg E = \operatorname{rank} \begin{pmatrix} B A^{\mu-1} \\ B A^{\mu-2} \\ \vdots \\ B A \\ B \end{pmatrix}. \qquad (7.20)$$
Hence, $\deg E \le \mu$ and if $\mu > \deg E$, then the encoder is called not controllable. Any encoder in controller canonical form satisfies $\deg E = \mu$.

Example 7.3 (encoder in controller, observer, and rational canonical form) Consider the rate $R = 2/3$ generator matrix
$$G(D) = \begin{pmatrix}
1 + D + D^2 & 1 + D^2 + D^3 & 1 + D^2 + D^3 \\
D + D^3 + D^4 + D^5 & D^2 + D^4 + D^5 + D^6 & 1 + D^3 + D^4 + D^5 + D^6
\end{pmatrix}. \qquad (7.21)$$
Figure 7.3: Encoder in controller (top), observer (center), and rational (bottom) canonical form with G(D) given in (7.21).

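The realizations in controller and in observer canonical form require $\mu = 9$ and $\mu = 17$ delay elements, respectively. However, Figure 7.3 depicts a realization of $G(D)$ with $\mu = 6$ delay elements. Its state space description is
$$A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix},\quad
B = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 & 1 & 0
\end{pmatrix},\quad
C = \begin{pmatrix}
0 & 0 & 1 \\
0 & 0 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 1
\end{pmatrix},\quad
F = \begin{pmatrix}
1 & 1 & 1 \\
0 & 0 & 1
\end{pmatrix}. \qquad (7.22)$$
This encoder has the minimal number of delay elements among all possible realizations of $G(D)$ and, moreover, its matrix $A$ is in rational canonical form. In the following we describe how to construct such an encoder for a given generator matrix.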
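That a state space description realizes a given polynomial generator matrix can be checked by comparing impulse responses: feeding a single 1 into input $i$ must produce, at output $j$ and time $t$, the coefficient of $D^t$ in $g_{ij}(D)$. The sketch below does this for the six-state realization of Example 7.3; the matrices are typed in as read from (7.21) and (7.22), and the helper names are assumptions of this illustration.

```python
def vec_mat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(x * M[i][j] for i, x in enumerate(v)) % 2 for j in range(len(M[0]))]


def add(a, b):
    """Component-wise addition over GF(2)."""
    return [(x + y) % 2 for x, y in zip(a, b)]


def impulse_response(A, B, C, F, u0, steps):
    """Outputs v_0, ..., v_{steps-1} for the input u0 at time 0, zero afterwards."""
    state = [0] * len(A)
    out = []
    for t in range(steps):
        u = u0 if t == 0 else [0] * len(u0)
        out.append(add(vec_mat(state, C), vec_mat(u, F)))      # (7.16)
        state = add(vec_mat(state, A), vec_mat(u, B))          # (7.15)
    return out


A = [[1 if j == i + 1 else 0 for j in range(6)] for i in range(6)]
B = [[0, 0, 0, 1, 1, 0], [1, 1, 1, 0, 1, 0]]
C = [[0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 1]]
F = [[1, 1, 1], [0, 0, 1]]

# Exponents of D with coefficient 1 in each entry of G(D) from (7.21).
G_EXP = [
    [{0, 1, 2}, {0, 2, 3}, {0, 2, 3}],
    [{1, 3, 4, 5}, {2, 4, 5, 6}, {0, 3, 4, 5, 6}],
]

steps = 8
responses = [impulse_response(A, B, C, F, u0, steps) for u0 in ([1, 0], [0, 1])]
expected = [
    [[int(t in G_EXP[i][j]) for j in range(3)] for t in range(steps)]
    for i in range(2)
]
print(responses == expected)
```

Because $A$ is nilpotent here, the impulse responses are finite, which reflects the fact that $G(D)$ is polynomial.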
It is natural to say that two realizations are equivalent if they realize the same generator matrix. Clearly, it is important to look for realizations with the lowest complexity within the class of equivalent realizations. Hence, we define:

Definition 7.6 (equivalent realization) Two encoders $E$ and $E'$ are called equivalent realizations if they have the same generator matrix $G(D)$.

Definition 7.7 (minimal realization) A minimal realization of $G(D)$ is a convolutional encoder whose number of delay elements is minimal over all equivalent realizations.

Given a realization $E$ of the generator matrix $G(D)$ with state space description $A, B, C, F$. If $\mu = \deg E$, e.g., $E$ is in controller canonical form, then the encoder degree of an equivalent minimal realization $E_{min}$ is
$$\deg E_{min} = \operatorname{rank} K \qquad (7.23)$$
where
$$K \triangleq \left( C \quad AC \quad A^2 C \quad \ldots \quad A^{\mu-1} C \right). \qquad (7.24)$$
The corresponding state space description is obtained by
$$A_{min} = LAR,\quad B_{min} = BR,\quad C_{min} = LC,\quad F_{min} = F \qquad (7.25)$$
where $R$ denotes the $\mu \times \deg E_{min}$ dimensional matrix formed from the first $\deg E_{min}$ linearly independent columns of $K$ and $L$ denotes a $\deg E_{min} \times \mu$ left inverse of $R$. All states of a minimal realization $E_{min}$ are reachable from the zero-state.

Given a minimal encoder $E_{min}$ ($\mu = \deg E_{min}$) with state space description $A_{min}, B_{min}, C_{min}, F_{min}$. Let $q_1(x), q_2(x), \ldots, q_p(x)$ be the $p$ elementary divisors of $A_{min}$. An algorithm to compute the elementary divisors can be found in [Gil66]. Any elementary divisor is either a power of $x$ or a power of an irreducible polynomial and their product is the characteristic polynomial^1 of $A_{min}$, i.e.,
$$\prod_{i=1}^{p} q_i(x) = |A_{min} - xI|. \qquad (7.26)$$
Then, the matrix $A_{rcf}$ of an equivalent (minimal) realization $E_{rcf}$ in rational canonical form is given by the direct sum^2
$$A_{rcf} = M_{q_1(x)} \oplus M_{q_2(x)} \oplus \ldots \oplus M_{q_p(x)} \qquad (7.27)$$
of the companion matrices
$$M_{q_i(x)} = \begin{pmatrix}
q_{i,n-1} & 1 & & & \\
q_{i,n-2} & & 1 & & \\
\vdots & & & \ddots & \\
q_{i1} & & & & 1 \\
q_{i0} & & & &
\end{pmatrix} \qquad (7.28)$$
where $q_{ij}$ are the coefficients of the $i$th elementary divisor $q_i(x) = q_{i0} + q_{i1} x + \ldots + q_{i,n-1} x^{n-1} + x^n$. To obtain a non-singular similarity transformation matrix $P$ such that
$$A_{rcf} = P A_{min} P^{-1} \qquad (7.29)$$
we write $P A_{min} + A_{rcf} P = 0$ and have a system of $\mu^2$ homogeneous equations in $p_{ij}$,
$$T \left( p_{11} \ldots p_{1\mu}\ p_{21} \ldots p_{2\mu}\ \ldots\ p_{\mu 1} \ldots p_{\mu\mu} \right)^T = 0, \qquad (7.30)$$
where the matrix $T$ can be written in compact form by using the direct product^3
$$T = I_\mu \otimes A_{min}^T + A_{rcf} \otimes I_\mu. \qquad (7.31)$$
Let the null space of $T$ be spanned by the matrix $N$. Then, $P$ is determined (in general not uniquely) from any linear combination of the columns of $N$ such that $P$ is non-singular. Hence, the encoder state space description of an equivalent (minimal) encoder $E_{rcf}$ in rational canonical form is
$$A_{rcf} = P A_{min} P^{-1},\quad B_{rcf} = B_{min} P^{-1},\quad C_{rcf} = P C_{min},\quad F_{rcf} = F_{min}. \qquad (7.32)$$
The encoder $E_{rcf}$ consists of $p$ cascades of memory elements, each corresponding to one of the $p$ elementary divisors $q_i(x)$. Any given generator matrix $G(D)$ can be realized in rational canonical form. It is a minimal realization and can be used to study the structural properties of $G(D)$. We conclude this subsection with an example.

1 The characteristic polynomial of an $n \times n$ matrix $A$ is given by $|A - x I_{n \times n}|$.
2 The direct sum of two matrices $A$ and $B$ is given by $A \oplus B = \begin{pmatrix} A & \\ & B \end{pmatrix}$.
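Figure 7.4: Encoder in rational (left) and controller canonical form (right) with $G(D)$ given in (7.33).

Example 7.4 (encoder in controller and rational canonical form) Consider the rate $R = 1/2$ generator matrix
$$G(D) = \left( 1 \quad \frac{1 + D + D^3}{1 + D} \right). \qquad (7.33)$$
The encoder in controller canonical form requires $\mu = 3$ delay elements and is a minimal realization of $G(D)$. Its state space description is
$$A_{min} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},\quad
B_{min} = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix},\quad
C_{min} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix},\quad
F_{min} = \begin{pmatrix} 1 & 1 \end{pmatrix}. \qquad (7.34)$$
The elementary divisors of $A_{min}$ are $q_1(x) = x^2$ and $q_2(x) = 1 + x$. Hence,
$$A_{rcf} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \oplus \begin{pmatrix} 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (7.35)$$
To get a similarity transformation matrix $P$ we compute
$$T = \begin{pmatrix} A_{min}^T & 0 & 0 \\ 0 & A_{min}^T & 0 \\ 0 & 0 & A_{min}^T \end{pmatrix} + \begin{pmatrix} 0 & I_{3 \times 3} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & I_{3 \times 3} \end{pmatrix} \qquad (7.36)$$
and obtain
$$N^T = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix} \qquad (7.37)$$
which spans the null space of $T$. Writing the columns of $N$ row-wise to $3 \times 3$ matrices yields
$$P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},\quad
P_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
P_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}. \qquad (7.38)$$
Then $P = P_1 + P_2 + P_3$ and $P' = P_1 + P_3$ are the two non-singular linear combinations of $P_1, P_2, P_3$, and with $P$ we obtain the state space description of an encoder $E_{rcf}$ in rational canonical form
$$A_{rcf} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad
B_{rcf} = \begin{pmatrix} 1 & 0 & 1 \end{pmatrix},\quad
C_{rcf} = \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix},\quad
F_{rcf} = \begin{pmatrix} 1 & 1 \end{pmatrix}. \qquad (7.39)$$
The encoders in controller and rational canonical form are depicted in Fig. 7.4.

3 The direct product of two matrices $A$ and $B$ is given by $A \otimes B = \begin{pmatrix} a_{11} B & \ldots & a_{1n} B \\ \vdots & & \vdots \\ a_{k1} B & \ldots & a_{kn} B \end{pmatrix}$.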
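The similarity transformation of Example 7.4 can be verified numerically. The sketch below checks the relations $P A_{min} = A_{rcf} P$, $B_{rcf} P = B_{min}$ and $C_{rcf} = P C_{min}$ over $\mathbb{F}_2$ for $P = P_1 + P_2 + P_3$; the matrices are typed in as read from the example, and `mat_mul` is an illustrative helper.

```python
def mat_mul(X, Y):
    """Matrix product over GF(2)."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) % 2 for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


A_min = [[1, 1, 0], [0, 0, 1], [0, 0, 0]]
B_min = [[1, 0, 0]]
C_min = [[0, 0], [0, 0], [0, 1]]

A_rcf = [[0, 1, 0], [0, 0, 0], [0, 0, 1]]
B_rcf = [[1, 0, 1]]
C_rcf = [[0, 1], [0, 1], [0, 1]]

P1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
P2 = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
P3 = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]
P = [[(P1[i][j] + P2[i][j] + P3[i][j]) % 2 for j in range(3)] for i in range(3)]

# A_rcf = P A_min P^{-1} is equivalent to P A_min = A_rcf P, which avoids inverting P.
print(mat_mul(P, A_min) == mat_mul(A_rcf, P))
print(mat_mul(B_rcf, P) == B_min)      # B_rcf = B_min P^{-1}
print(mat_mul(P, C_min) == C_rcf)      # C_rcf = P C_min
```

Writing the conditions without $P^{-1}$ keeps the check free of matrix inversion over $\mathbb{F}_2$; each of $P_1$, $P_2$, $P_3$ satisfies the first relation individually, which is what membership in the null space of $T$ expresses.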
7.1.3 Minimal Generator Matrix
The structural properties of convolutional generator matrices are discussed in detail in [JZ99]. Here we focus on minimal generator matrices and present an important theorem that enables us to generate the complete set of minimal (rational) generator matrices of a given convolutional code. Just as a generator matrix can be realized by different encoders, so a convolutional code can be encoded by different generator matrices. Hence, we define:

Definition 7.8 (equivalent generator matrix, equivalent encoder) Two convolutional generator matrices $G(D)$ and $G'(D)$ are equivalent if they encode the same code. Two convolutional encoders are called equivalent encoders if their generator matrices are equivalent.

In [JZ99] it is shown that two rate $R = b/c$ convolutional generator matrices $G(D)$ and $G'(D)$ are equivalent if and only if there is a $b \times b$ non-singular matrix $T(D)$ over $\mathbb{F}_2(D)$ such that
$$G(D) = T(D)G'(D). \qquad (7.40)$$
The encoder degree of a minimal realization of $G(D)$ is called the generator matrix degree^4 $\deg G$, i.e.,
$$\deg G \triangleq \deg E_{min}. \qquad (7.41)$$
Clearly, $\deg G$ is a generator matrix property and satisfies $\deg G \le \deg E$, where $\deg E$ is the encoder degree of any realization of $G(D)$.

Definition 7.9 (minimal generator matrix) A minimal generator matrix is a generator matrix whose degree is minimal over all equivalent generator matrices. A minimal encoder is a minimal realization of a minimal generator matrix.

The degree of a minimal generator matrix $G_{min}(D)$ is called the code degree
$$\deg C \triangleq \deg G_{min}. \qquad (7.42)$$
Clearly, $\deg C$ is a code property and satisfies $\deg C \le \deg G$, where $\deg G$ is the degree of any generator matrix equivalent to $G_{min}(D)$. Hence, any encoder $E$ satisfies
$$\deg C \le \deg G \le \deg E \qquad (7.43)$$

4 The generator matrix degree corresponds to the abstract state space dimension in [JZ99].
and if $E$ is a minimal encoder, then equality holds throughout (7.43). Let us consider two important classes of minimal generator matrices. Consider a rate $R = b/c$ rational generator matrix $G(D)$. We may write the elements of the $i$th row in $G(D)$ as $g_{ij}(D) = f_{ij}(D)/q_i(D)$, $1 \le j \le c$, where $f_{i1}(D), f_{i2}(D), \ldots, f_{ic}(D), q_i(D) \in \mathbb{F}_2[D]$ and $\gcd\left( f_{i1}(D), f_{i2}(D), \ldots, f_{ic}(D), q_i(D) \right) = 1$. Then the $i$th input constraint length $\nu_i$ of $G(D)$ is defined as
$$\nu_i \triangleq \max \left\{ \deg f_{i1}(D), \deg f_{i2}(D), \ldots, \deg f_{ic}(D), \deg q_i(D) \right\}, \quad 1 \le i \le b, \qquad (7.44)$$
its memory $m$ as
$$m \triangleq \max_i \{ \nu_i \}, \qquad (7.45)$$
and its overall constraint length as
$$\nu \triangleq \sum_i \nu_i. \qquad (7.46)$$
The realization $E_{ccf}$ in controller canonical form of a generator matrix $G(D)$ with overall constraint length $\nu$ satisfies $\deg E_{ccf} = \mu = \nu$, i.e., it is realized with $\mu = \nu$ delay elements and all encoder states are reachable. If $G(D)$ satisfies
$$\deg C = \nu, \qquad (7.47)$$
then it is called a canonical generator matrix $G_{can}(D)$. Notice that $\nu > \deg C$ does not imply $\deg G > \deg C$, i.e., a generator matrix can be minimal but not canonical! Canonical generator matrices are studied extensively in [JZ99]. A polynomial and canonical generator matrix is called a minimal-basic generator matrix. Any convolutional code $C$ can be encoded by a minimal-basic generator matrix $G_{mb}(D)$ and, hence, any given generator matrix of a convolutional code $C$ is equivalent to $G_{mb}(D)$.

Another important class of minimal generator matrices are systematic generator matrices. Any systematic generator matrix $G_{sys}(D)$ of a given convolutional code $C$ satisfies
$$G_{sys}(D) = M^{-1}(D)\,G_{mb}(D), \quad M(D) \in \mathcal{M}_{b \times b}, \qquad (7.48)$$
where $G_{mb}(D)$ is any minimal-basic generator matrix of $C$ and $\mathcal{M}_{b \times b}$ denotes the set of all $b \times b$ submatrices of $G(D)$ that have a delay-free determinant. A determinant $|M(D)|$ that is not delay-free yields a matrix which is not realizable and, hence, not a generator matrix. Generally, neither the realization in controller nor in observer canonical form of a systematic generator matrix is a minimal encoder. However, the realization in observer canonical form of any rate $R = (c - 1)/c$ systematic generator matrix is a minimal encoder. Generally, any minimal generator matrix of a convolutional code $C$ can be obtained from a given minimal-basic generator matrix $G_{mb}(D)$ of $C$.

Theorem 7.1 (minimal equivalent generator matrices) Given a rate $R = b/c$ minimal-basic generator matrix $G_{mb}(D)$ and a $b \times b$ rational matrix $T(D)$ that has a polynomial inverse $T^{-1}(D)$. Then
$$G(D) = T(D)\,G_{mb}(D) \qquad (7.49)$$
is minimal if and only if
$$\nu_i^{T^{-1}} \le \nu_i^{G_{mb}}, \quad i = 1, 2, \ldots, b, \qquad (7.50)$$
where $\nu_i^{G_{mb}}$ and $\nu_i^{T^{-1}}$ denote the $i$th constraint lengths of $G_{mb}(D)$ and $T^{-1}(D)$.
Example 7.5 (equivalent minimal generator matrices) Consider the generator matrix $G(D)$ given in (7.21) of Example 7.3. An equivalent minimal-basic generator matrix is
$$G_{mb}(D) = \begin{pmatrix}
1 + D + D^3 & 1 + D + D^3 & D + D^2 \\
D^2 + D^3 & D + D^2 & 1 + D + D^3
\end{pmatrix}. \qquad (7.51)$$
Its overall constraint length is $\nu^{G_{mb}} = 6$. It can be realized with $\mu = 6$ delay elements in controller canonical form. We obtain $G(D) = T(D)G_{mb}(D)$, where
$$T(D) = \begin{pmatrix} 1 & 1 \\ D + D^3 & 1 + D + D^3 \end{pmatrix}. \qquad (7.52)$$
Its inverse
$$T^{-1}(D) = \begin{pmatrix} 1 + D + D^3 & 1 \\ D + D^3 & 1 \end{pmatrix} \qquad (7.53)$$
is polynomial and its constraint lengths satisfy (7.50) of Theorem 7.1, i.e., $\nu_1^{T^{-1}} = \nu_1^{G_{mb}} = 3$ and $\nu_2^{T^{-1}} \le \nu_2^{G_{mb}} = 3$. It follows that $G(D)$ is minimal. A minimal encoder of $G(D)$ is shown in Figure 7.3.
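The matrix identities of Example 7.5 involve only binary polynomials and can be checked mechanically. The sketch below assumes the bit-mask representation (bit $i$ is the coefficient of $D^i$) and types in $G(D)$ from (7.21) together with $G_{mb}(D)$, $T(D)$ and $T^{-1}(D)$ as read from the example; the helper names are illustrative.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials given as bit masks (bit i <-> D^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly(*exps: int) -> int:
    """Bit mask of the polynomial with the given exponents."""
    return sum(1 << e for e in exps)


def poly_mat_mul(X, Y):
    """Product of two matrices whose entries are GF(2) polynomials."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] ^= gf2_mul(X[i][k], Y[k][j])
    return out


G_mb = [
    [poly(0, 1, 3), poly(0, 1, 3), poly(1, 2)],
    [poly(2, 3), poly(1, 2), poly(0, 1, 3)],
]
T = [[poly(0), poly(0)], [poly(1, 3), poly(0, 1, 3)]]
T_inv = [[poly(0, 1, 3), poly(0)], [poly(1, 3), poly(0)]]
G = [
    [poly(0, 1, 2), poly(0, 2, 3), poly(0, 2, 3)],
    [poly(1, 3, 4, 5), poly(2, 4, 5, 6), poly(0, 3, 4, 5, 6)],
]

print(poly_mat_mul(T, G_mb) == G)
print(poly_mat_mul(T, T_inv) == [[1, 0], [0, 1]])
```

The row degrees of $G(D)$ are 3 and 6, which gives the 9 delay elements of the controller canonical form mentioned in Example 7.3, whereas the row degrees of $G_{mb}(D)$ sum to 6.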
From Theorem 7.1 we immediately obtain the following important corollary.

Corollary 7.2 (set of minimal generator matrices) Given a rate $R = b/c$ convolutional code $C$ with minimal-basic generator matrix $G_{mb}(D)$. Then any minimal (rational) generator matrix $G_{min}(D)$ of $C$ can be generated by $G_{min}(D) = T(D)G_{mb}(D)$ with $T(D) \in \mathcal{T}_{G_{mb}}$, where
$$\mathcal{T}_{G_{mb}} = \left\{ T(D) \;\middle|\; T^{-1}(D) \text{ over } \mathbb{F}_2[D],\ \operatorname{del} |T^{-1}(D)| = 0,\ \nu_i^{T^{-1}} \le \nu_i^{G_{mb}}\ \forall i \right\}. \qquad (7.54)$$
The mapping between $\mathcal{T}_{G_{mb}}$ and the set of all minimal generator matrices is one-to-one. The cardinality of $\mathcal{T}_{G_{mb}}$ determines the number of minimal (rational) generator matrices of a convolutional code.
7.2 Distance Properties
The most important distance measures of a convolutional code are the path enumerator function and the active distances. While the former allows us to derive an upper bound on the performance of maximum-likelihood decoding, the latter can be used to describe the error-correcting capability of the code.
7.2.1 State Diagram and Transition Matrix
The state diagram of a convolutional encoder and its mathematical counterpart, the transition matrix, are introduced. While convolutional distance measures are nicely described by using the state diagram, it is the transition matrix that allows us to compute these distances easily. Consider a rate $R = b/c$ convolutional encoder $E$ with degree $\deg E$ and state space description $A, B, C, F$. Let us denote the encoder state space by $S_E = \{ \sigma_1, \sigma_2, \ldots, \sigma_{2^{\deg E}} \}$. Then any encoder state $\sigma_i$, $i \in \{1, 2, \ldots, 2^{\deg E}\}$, is uniquely determined by its index $i$. By assumption, we always associate the index $i = 1$ with the zero-state, i.e., $\sigma_1 = 0$. To simplify notation we will sometimes write $i$ for $\sigma_i$. There are $2^b$ transitions from any encoder state $i \in S_E$. If $b > \operatorname{rank} B$, then there exist $2^{b - \operatorname{rank} B}$ parallel transitions between any two encoder states $i, i' \in S_E$. Hence, we denote a particular transition by $(i, i')_p$, $i, i' \in \{1, 2, \ldots, 2^{\deg E}\}$, $p \in \{1, 2, \ldots, 2^{b - \operatorname{rank} B}\}$, and use the state space description (7.15)-(7.17) to associate an information $b$-tuple $u_{ii',p}$ and a code $c$-tuple $v_{ii',p}$ with $(i, i')_p$. If $b \le \operatorname{rank} B$, then we skip the index $p$. The (encoder) state diagram of a convolutional encoder is the directed graph consisting of the states (vertices) $i \in \{1, 2, \ldots, 2^{\deg E}\}$ and the $2^{b + \deg E}$ transitions (edges) $(i, i')_p$. With any $(i, i')_p$ we associate the information $b$-tuple $u_{ii',p}$ and the code $c$-tuple $v_{ii',p}$.

The (state) transition matrix of a convolutional encoder is a $2^{\deg E} \times 2^{\deg E}$ matrix
$$M(W) = \left( m_{ii'}(W) \right)_{1 \le i, i' \le 2^{\deg E}} \qquad (7.55)$$
with
$$m_{ii'}(W) = \begin{cases} \sum_p W^{\mathrm{wt}_H(v_{ii',p})} & \text{if } (i, i')_p \text{ exists,} \\ 0 & \text{otherwise.} \end{cases} \qquad (7.56)$$
The elements $m_{ii'}(W)$ are polynomials in $W$. The coefficient of $W^w$ is the number of transitions $(i, i')_p$ that are associated with a code $c$-tuple of Hamming weight $w = \mathrm{wt}_H(v_{ii',p})$. The element $m_{11}(W)$ corresponds to the zero-state to zero-state transition(s). The pruned transition matrix $M_{pr}(W)$ is obtained by pruning the zero-state to zero-state transition associated with $u_{11,p} = 0$, i.e., $m_{pr,11}(W) = m_{11}(W) - 1$. The extended transition matrix
$$M(K, W, L) = \left( m_{ii'}(K, W, L) \right)_{1 \le i, i' \le 2^{\deg E}} \qquad (7.57)$$
is given by
$$m_{ii'}(K, W, L) = \begin{cases} \sum_p K^{\mathrm{wt}_H(u_{ii',p})}\, W^{\mathrm{wt}_H(v_{ii',p})}\, L & \text{if } (i, i')_p \text{ exists,} \\ 0 & \text{otherwise.} \end{cases} \qquad (7.58)$$
The operator $K$ takes into account the information weight and the operator $L$ counts the number of transitions. Let us consider an example.
Figure 7.5: Encoder in observer canonical form with $G(D)$ given in (7.59).

Example 7.6 (transition matrix) Consider the realization $E_{ocf}$ in observer canonical form of the rate $R = 3/4$ systematic generator matrix
$$G(D) = \begin{pmatrix}
1 & 0 & 0 & \frac{1 + D + D^2}{1 + D^2} \\
0 & 1 & 0 & \frac{D^2}{1 + D^2} \\
0 & 0 & 1 & \frac{D}{1 + D}
\end{pmatrix} \qquad (7.59)$$
as depicted in Figure 7.5. Its state space description is
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad
B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix},\quad
C = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\quad
F = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \qquad (7.60)$$
According to $b - \operatorname{rank}(B) = 1$ there exist two parallel transitions between any two states. For example, $(1, 1)_1$ and $(1, 1)_2$ are the transitions from the zero-state to the zero-state. The first is associated with $u_{11,1} = 000$, $v_{11,1} = 0000$ and the second with $u_{11,2} = 111$, $v_{11,2} = 1111$. The transition matrix of $E_{ocf}$ is
$$M(W) = \begin{pmatrix}
1 + W^4 & 2W^2 & W + W^3 & W + W^3 \\
2W^2 & 2W^2 & W + W^3 & W + W^3 \\
2W^2 & 1 + W^4 & W + W^3 & W + W^3 \\
2W^2 & 2W^2 & W + W^3 & W + W^3
\end{pmatrix}. \qquad (7.61)$$
The extended transition matrix of $E_{ocf}$ is
$$M(K, W, L) = \begin{pmatrix}
L + K^3 W^4 L & (K + K^2) W^2 L & KWL + K^2 W^3 L & KWL + K^2 W^3 L \\
(K + K^2) W^2 L & (K + K^2) W^2 L & WL + K^3 W^3 L & KWL + K^2 W^3 L \\
(K + K^2) W^2 L & L + K^3 W^4 L & KWL + K^2 W^3 L & KWL + K^2 W^3 L \\
(K + K^2) W^2 L & (K + K^2) W^2 L & KWL + K^2 W^3 L & WL + K^3 W^3 L
\end{pmatrix}. \qquad (7.62)$$
The transfer function of the transitions between the zero-states is $m_{11}(W) = 1 + W^4$ and $m_{11}(K, W, L) = L + K^3 W^4 L$, respectively.
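The transition matrix of Example 7.6 can be generated directly from the state space description (7.15)-(7.16) by enumerating all states and all input $b$-tuples. The following sketch is an illustration under two assumptions: the matrices $A, B, C, F$ are those of the observer canonical form of (7.59) with the states enumerated as 00, 01, 10, 11, and each entry $m_{ii'}(W)$ is stored as a `Counter` that maps the exponent $w$ to the coefficient of $W^w$.

```python
from collections import Counter
from itertools import product

A = [[0, 1], [1, 0]]
B = [[0, 1], [1, 0], [1, 1]]
C = [[0, 0, 0, 0], [0, 0, 0, 1]]
F = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]   # index 0 is the zero-state


def vec_mat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(x * M[i][j] for i, x in enumerate(v)) % 2 for j in range(len(M[0]))]


def add(a, b):
    """Component-wise addition over GF(2)."""
    return [(x + y) % 2 for x, y in zip(a, b)]


def transition_matrix():
    """M[i][j][w] = number of transitions i -> j with code weight w."""
    M = [[Counter() for _ in STATES] for _ in STATES]
    for i, s in enumerate(STATES):
        for u in product((0, 1), repeat=len(B)):
            nxt = tuple(add(vec_mat(s, A), vec_mat(u, B)))   # (7.15)
            v = add(vec_mat(s, C), vec_mat(u, F))            # (7.16)
            M[i][STATES.index(nxt)][sum(v)] += 1
    return M


M = transition_matrix()
for row in M:
    print([dict(entry) for entry in row])
```

Every entry contains exactly two transitions, which reflects the two parallel transitions between any pair of states.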
We denote a path of length $l$ in the state diagram by
$$p_{[1,l]} = (i_1, i_2)_{p_1} (i_2, i_3)_{p_2} \ldots (i_l, i_{l+1})_{p_l}. \qquad (7.63)$$
Associated with $p_{[1,l]}$ are the length $l$ information and code sequence segments
$$u_{[1,l]} = u_{i_1 i_2, p_1}\, u_{i_2 i_3, p_2} \ldots u_{i_l i_{l+1}, p_l} \qquad (7.64)$$
$$v_{[1,l]} = v_{i_1 i_2, p_1}\, v_{i_2 i_3, p_2} \ldots v_{i_l i_{l+1}, p_l} \qquad (7.65)$$
and the length $l + 1$ state sequence segment
$$i_{[1,l+1]} = i_1 i_2 \ldots i_{l+1}. \qquad (7.66)$$
If all states in $i_{[1,l+1]}$ are distinct except that $i_1 = i_{l+1}$, then $p_{[1,l]}$ is called a cycle of length $l$. We do not distinguish between a cycle and its cyclic permutations. A cycle $o$ is said to have Hamming weight $w$ if the associated code sequence segment $v_{[1,l]}$ has Hamming weight $w = \mathrm{wt}_H(v_{[1,l]})$.

Definition 7.10 (encoder cycle set) Given a convolutional encoder $E$. The set $O$ of all cycles in the state diagram of $E$ is called the (encoder) cycle set. A cycle $o \in O$ that is associated with an all-zero information sequence segment is called an internal cycle. The set $O^i$ of all internal cycles in $O$ is called the internal (encoder) cycle set. The internal cycle $(1, 1)$ in the zero-state is denoted by $o_1$.

Cycles will play an important role in the sequel. Often it is convenient to use weight enumerator functions when dealing with cycle sets. We denote the cycle (set) weight enumerator by
$$C(W, L) \triangleq \sum_l \sum_w c(w, l)\, W^w L^l. \qquad (7.67)$$
This is a polynomial in $W$ and $L$, where $c(w, l)$ is the number of length $l$ cycles with weight $w$ in the cycle set $O$. Similarly, we denote by $C^i(W, L)$ the internal cycle (set) weight enumerator.
Figure 7.6: State diagram of the encoder in rational canonical form depicted in Figure 7.4.
Example 7.7 (cycle weight enumerator) Consider the rate $R = 1/2$ generator matrix $G(D) = \begin{pmatrix} 1 & \frac{1 + D + D^3}{1 + D} \end{pmatrix}$ of Example 7.4. In Figure 7.6 the state diagram of its encoder in rational canonical form (see Figure 7.4) is shown. The cycle weight enumerator is
$$\begin{aligned} C(W,L) = {} & (1 + W)L + W^3L^2 + (W^3 + W^4)L^3 + (W^3 + W^4 + W^5)L^4 + (W^3 + W^6)L^5 \\ & + (2W^5 + W^9)L^6 + (W^4 + W^7 + W^8 + W^{11})L^7 + (W^6 + W^{10})L^8. \end{aligned} \tag{7.68}$$
For example, there exist two length $l = 5$ cycles $o' = (1,6)(6,4)(4,5)(5,3)(3,1)$ and $o'' = (2,5)(5,3)(3,6)(6,4)(4,2)$ of Hamming-weight 6 and 3, respectively. Cycles of all lengths $l$, $1 \le l \le 8$, exist and $|\mathcal{O}| = 19$. The internal cycle set $\mathcal{O}_i$ consists of the two cycles $o_1 = (1,1)$ and $o = (2,2)$, and the internal cycle weight enumerator is $C_i(W,L) = (1 + W)L$.
7.2.2 Weight Enumerators

The path enumerator is the most important distance measure of a convolutional code. If the code is used to communicate over a memoryless channel and maximum-likelihood decoding is employed, then it allows us to upper-bound the first event probability and the bit error probability.

Consider the state diagram of a minimal realization of the convolutional generator matrix $G(D)$. Define the set $\mathcal{P}_l$ of length $l$ paths $p_{[1,l]}$ that start in the zero-state, end in the zero-state, and do not touch the zero-state in between, i.e.,
$$\mathcal{P}_l = \left\{ p_{[1,l]} \mid i_1 = i_{l+1} = 1 \text{ and } i_t \ne 1 \text{ for } 1 < t \le l \right\}. \tag{7.69}$$
Then we denote by $N(k,w,l)$ the number of length $l$ paths $p_{[1,l]} \in \mathcal{P}_l$ whose associated information and code sequence segments satisfy $\mathrm{wt}_H(u_{[1,l]}) = k$ and $\mathrm{wt}_H(v_{[1,l]}) = w$, respectively. We call $N(k,w,l)$ the extended weight spectrum of a convolutional generator matrix $G(D)$. It is invariant over the set of minimal realizations of a given $G(D)$ and, hence, it is a generator matrix property!

Let $N(k,w,l)$ be the extended weight spectrum of a minimal generator matrix, then
$$N(w) = \sum_k \sum_l N(k,w,l) \tag{7.70}$$
is called the weight spectrum of a convolutional code $C$. It is invariant over the set of minimal encoders and, hence, it is a code property!

The path enumerator (function) $T(W)$ of a convolutional code $C$ with weight spectrum $N(w)$ is given by
$$T(W) = \sum_w N(w) W^w. \tag{7.71}$$
Its closed form is a rational function in $W$ whose series expansion in $W$ coincides with $T(W)$. It can be obtained as follows.

Lemma 7.3 (path enumerator) Given the transition matrix $M(W)$ of a minimal encoder of the convolutional code $C$ with degree $\deg C$. The closed form of the path enumerator $T(W)$ of $C$ is obtained by
$$T(W) = a_1(W) \left( I_{2^{\deg C} \times 2^{\deg C}} - A(W) \right)^{-1} a_2(W)^T \tag{7.72}$$
where the $2^{\deg C} \times 2^{\deg C}$ dimensional matrix $A(W) = \left( a_{ii'}(W) \right)_{1 \le i,i' \le 2^{\deg C}}$ with
$$a_{ii'}(W) = \begin{cases} 0, & i = 1 \text{ or } i' = 1 \\ m_{ii'}(W), & \text{otherwise} \end{cases} \tag{7.73}$$
and
$$a_1(W) = \begin{pmatrix} m_{11}(W) - 1 & m_{12}(W) & \ldots & m_{1\,2^{\deg C}}(W) \end{pmatrix} \tag{7.74}$$
$$a_2(W) = \begin{pmatrix} 1 & m_{21}(W) & \ldots & m_{2^{\deg C}\,1}(W) \end{pmatrix} \tag{7.75}$$
where $m_{ii'}(W)$ are taken from the transition matrix $M(W)$.
The extended path enumerator (function) of a convolutional generator matrix $G(D)$ with extended weight spectrum $N(k,w,l)$ is given by
$$T(K,W,L) = \sum_l \sum_w \sum_k N(k,w,l) K^k W^w L^l. \tag{7.76}$$
It is a straightforward matter to apply Lemma 7.3 together with the extended transition matrix $M(K,W,L)$ to obtain a closed form of $T(K,W,L)$.

Example 7.8 (extended path enumerator) Consider the convolutional code $C$ with the systematic generator matrix $G(D)$ given in (7.59) of Example 7.6. Applying Lemma 7.3, we obtain the closed form of the path enumerator of $C$ as
$$T(W) = \frac{6W^3 + 5W^4 - W^5 - 2W^6 - 3W^7 + 3W^9 - W^{11}}{1 - 3W - 2W^2 - W^3 + W^5 - W^7}. \tag{7.77}$$
The series expansion in $W$ yields
$$T(W) = 6W^3 + 23W^4 + 80W^5 + 290W^6 + 1050W^7 + 3804W^8 + \ldots, \tag{7.78}$$
i.e., the weight spectrum of the code starts with $N(1) = N(2) = 0$, $N(3) = 6$, $N(4) = 23$, $\ldots$. The first terms of the extended path enumerator of $G(D)$ are
$$\begin{aligned} T(K,W,L) = {} & K^3W^4L \\ & + \left[ (2K^2 + 2K^3)W^3 + (K^2 + 2K^3 + K^4)W^4 + (2K^3 + 2K^4)W^5 \right] L^2 \\ & + \left[ (K^2 + K^3)W^3 + (K^2 + 4K^3 + 3K^4)W^4 + \ldots + (K^6 + K^7)W^9 \right] L^3 + \ldots. \end{aligned} \tag{7.79}$$
For example, there exist two length $l = 3$ paths with code Hamming-weight $w = 3$, one with information Hamming-weight $k = 2$ and another with $k = 3$.
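As a cross-check of Lemma 7.3, the following sketch evaluates (7.72) symbolically for the transition matrix (7.61) of Example 7.6. It assumes the SymPy library is available; the function name `path_enumerator` and the 0-based indexing (index 0 is the zero-state) are choices made here for illustration and are not part of the notes.

```python
import sympy as sp

W = sp.symbols("W")

# Transition matrix M(W) of Example 7.6, cf. (7.61); row/column 0 is the zero-state.
a, b, c = W + W**3, 2 * W**2, 1 + W**4
M = sp.Matrix([
    [c, b, a, a],
    [b, b, a, a],
    [b, c, a, a],
    [b, b, a, a],
])


def path_enumerator(M):
    """Closed form T(W) = a1 (I - A)^(-1) a2^T in the sense of Lemma 7.3."""
    n = M.shape[0]
    A = M.copy()
    A[0, :] = sp.zeros(1, n)          # a_{ii'} = 0 if i = 1 ...
    A[:, 0] = sp.zeros(n, 1)          # ... or i' = 1, cf. (7.73)
    a1 = M[0, :].copy()               # (m11 - 1, m12, ..., m1n), cf. (7.74)
    a1[0, 0] = M[0, 0] - 1
    a2 = M[:, 0].copy()               # (1, m21, ..., mn1)^T, cf. (7.75)
    a2[0, 0] = 1
    T = (a1 * (sp.eye(n) - A).inv() * a2)[0, 0]
    return sp.cancel(T)


T = path_enumerator(M)
series = sp.series(T, W, 0, 6).removeO()   # terms up to W^5
print(sp.factor(T))
print(series)
```

The leading series coefficients can be compared with the weight spectrum $N(3), N(4), N(5)$ quoted in (7.78).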
7.2.3 Active Distances

The active distances are a family of distance measures of a convolutional code. The active burst distance is its most important member. It describes the error-correcting capability of the code. The most well-known distance parameter of a convolutional code is its free distance.

Definition 7.11 (free distance) Let $C$ be a convolutional code. The minimal Hamming-distance between any two different code sequences
$$d_{\mathrm{free}} = \min_{v \ne v'} d_H(v, v') \tag{7.80}$$
is called the free distance of the code.

From the linearity of a convolutional code it follows immediately that $d_{\mathrm{free}}$ is also the minimal Hamming-weight over the non-zero code sequences. Clearly, this is the minimal Hamming-weight over all cycles through the zero-state (except $o_1$) of a minimal encoder of $C$.

Lemma 7.4 (free distance) Given the pruned transition matrix $M_{\mathrm{pr}}(W)$ of a minimal encoder of the convolutional code $C$ with degree $\deg C$. The free distance of $C$ is obtained by (see footnote 5)
$$d_{\mathrm{free}} = \operatorname{ldeg}\, e_1 M_{\mathrm{pr}}^{2^{\deg C}}(W)\, e_1^T \tag{7.81}$$
where $e_1 = (1\ 0\ 0 \ldots 0)$ and $e_1^T$ is the transpose of $e_1$.

The paths in the state diagram of two (semi-infinite) code sequences with Hamming-distance $d_{\mathrm{free}}$ differ in at most $2^{\deg C}$ transitions. In contrast to this we obtain another important distance parameter when we consider the minimal average (per transition) Hamming-distance of non-merging paths in the state diagram.

Footnote 5: We denote by $\operatorname{ldeg} p(W)$ the least degree of the polynomial $p(W)$ in $W$.
Definition 7.12 (slope) Let $C$ be a convolutional code. The minimal normalized Hamming-weight
$$\alpha = \min_{o \in \mathcal{O} \setminus o_1} \frac{\mathrm{wt}_H(o)}{l(o)} \tag{7.82}$$
among the cycle set (except $o_1$) of a minimal encoder is called the slope of $C$.

Again we apply transfer function methods to compute the slope.

Lemma 7.5 (slope) Given the pruned transition matrix $M_{\mathrm{pr}}(W)$ of a minimal encoder of the convolutional code $C$ with degree $\deg C$. The slope of $C$ is obtained by (see footnote 6)
$$\alpha = \min_{1 \le l \le 2^{\deg C}} \frac{1}{l} \operatorname{ldeg} \operatorname{tr} M_{\mathrm{pr}}^{l}(W). \tag{7.83}$$

The active distances are a family of distance measures. They are defined for non-merging paths in the state diagram. Hence the name active. Consider a minimal encoder of the convolutional code $C$. We define the set of length $l$ paths $p_{[1,l]}$ that start in $i_1 \in S_1$, end in $i_{l+1} \in S_2$, and do not have transitions between the zero-states with zero information Hamming-weight:
$$\mathcal{P}_l^{S_1 S_2} = \left\{ p_{[1,l]} \mid i_1 \in S_1,\ i_{l+1} \in S_2,\ (i_t, i_{t+1})_{p_t} \ne (1,1) \text{ with } u_{i_t i_{t+1}, p_t} = 0,\ 1 \le t \le l \right\}. \tag{7.84}$$
Now we are prepared to define the active distance family.

Definition 7.13 (active distances) Given a convolutional code $C$. Its $l$th order active burst distance is given by
$$a_l^b = \min_{\mathcal{P}_l^{1\,1}} \mathrm{wt}_H(v_{[1,l]}), \quad l \ge l^b_{\min}, \tag{7.85}$$
where $l^b_{\min}$ denotes the minimal possible burst length of $C$. Its $l$th order active column distance is
$$a_l^c = \min_{\mathcal{P}_l^{1\,S_E}} \mathrm{wt}_H(v_{[1,l]}), \quad l \ge 1, \tag{7.86}$$
its $l$th order active reverse column distance is
$$a_l^{rc} = \min_{\mathcal{P}_l^{S_E\,1}} \mathrm{wt}_H(v_{[1,l]}), \quad l \ge 1, \tag{7.87}$$
and its $l$th order active segment distance is
$$a_l^s = \min_{\mathcal{P}_l^{S_E\,S_E}} \mathrm{wt}_H(v_{[1,l]}), \quad l \ge 1, \tag{7.88}$$
where $\mathcal{P}_l^{S_1 S_2}$ is as defined in (7.84) and 1 is the zero-state.

The active distances are invariant over the set of minimal encoders. Hence, they are a code property! Again we apply transfer function methods and compute the active distances.

Lemma 7.6 (active distances) Given the pruned transition matrix $M_{\mathrm{pr}}(W)$ of a minimal encoder of the convolutional code $C$. The active distances are obtained by
$$a_l^b = \operatorname{ldeg}\, e_1 M_{\mathrm{pr}}^{l}(W)\, e_1^T \tag{7.89}$$
$$a_l^c = \operatorname{ldeg}\, e_1 M_{\mathrm{pr}}^{l}(W)\, a^T \tag{7.90}$$
$$a_l^{rc} = \operatorname{ldeg}\, a\, M_{\mathrm{pr}}^{l}(W)\, e_1^T \tag{7.91}$$
$$a_l^s = \operatorname{ldeg}\, a\, M_{\mathrm{pr}}^{l}(W)\, a^T \tag{7.92}$$
where $e_1 = (1\ 0 \ldots 0)$ and $a = (1\ 1 \ldots 1)$.
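Because only the least degree of each polynomial matters in (7.83) and (7.89), the matrix powers can be carried out in min-plus arithmetic (entries are least degrees, multiplication becomes addition, addition becomes the minimum). The sketch below does this for the rate $R = 1/2$ code with octal generators (5 7) from Table 7.1. Several things are assumptions made here: the pruned matrix is taken to be the transition matrix with the zero-input self-loop of the zero-state removed (its definition lies outside this section), the encoder is realized in controller canonical form, and all function names are hypothetical.

```python
import math
from fractions import Fraction

INF = math.inf


def pruned_weights(gens, m):
    """Least-degree entries of M_pr(W) for a rate 1/c feedforward encoder.

    gens: generator polynomials as integers, bit m holding the D^0 coefficient.
    Assumption: pruning removes the zero-input self-loop of the zero-state.
    """
    n = 1 << m
    M = [[INF] * n for _ in range(n)]
    for state in range(n):
        for u in (0, 1):
            if state == 0 and u == 0:
                continue                      # pruned transition (1,1) with u = 0
            reg = (u << m) | state            # current input followed by the memory
            weight = sum(bin(reg & g).count("1") % 2 for g in gens)
            nxt = reg >> 1
            M[state][nxt] = min(M[state][nxt], weight)
    return M


def minplus(A, B):
    """Min-plus matrix product: least degrees of the polynomial product."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def active_burst_and_slope(gens, m, lmax):
    M = pruned_weights(gens, m)
    power, burst, slope = M, {}, None
    for l in range(1, lmax + 1):
        if l > 1:
            power = minplus(power, M)
        burst[l] = power[0][0]                # ldeg e1 M_pr^l e1^T, cf. (7.89)
        if l <= (1 << m):                     # range of l used in (7.83)
            tr = min(power[i][i] for i in range(len(M)))
            if tr < INF:
                cand = Fraction(tr, l)
                slope = cand if slope is None else min(slope, cand)
    return burst, slope


burst, slope = active_burst_and_slope([0b101, 0b111], m=2, lmax=8)
print(burst, min(burst.values()), slope)
```

Under these assumptions the smallest active burst distance and the slope can be compared with the entries $d_{\mathrm{free}} = 5$ and slope $1/2$ listed for this code in Table 7.1.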
Notice, it is sufficient to consider the least degree of any polynomial in $W$ when performing the matrix multiplications in (7.89)-(7.92). This considerably decreases the computational complexity.

Example 7.9 (active distances) Consider the rate $R = 2/3$ convolutional code $C$ with degree $\deg C = 6$ and minimal-basic generator matrix $G(D)$ given in (7.51) of Example 7.5. Its free distance is $d_{\mathrm{free}} = 6$, its slope is $\alpha = 3/11$, and the minimal possible burst length is $l^b_{\min} = 4$. The active distance family of $C$ is depicted in Figure 7.7.

Figure 7.7: Active distance family of the convolutional code $C$ with $G(D)$ given in (7.51). (Plot of $a_l^x$ versus $l$, $0 \le l \le 50$, for the active burst, column, reverse column, and segment distances.)
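Theorem 7.7 (active distances) The active distances of a convolutional code $C$ with free distance $d_{\mathrm{free}}$ and slope $\alpha$ satisfy
$$a_l^b \ge \max\left\{ \alpha l + \beta^b,\ d_{\mathrm{free}} \right\} \tag{7.93}$$
$$a_l^c \ge \alpha l + \beta^c \tag{7.94}$$
$$a_l^{rc} \ge \alpha l + \beta^{rc} \tag{7.95}$$
$$a_l^s \ge \alpha l + \beta^s \tag{7.96}$$
where $\beta^b$, $\beta^c$, $\beta^{rc}$, $\beta^s$ are taken such that equality holds for at least one $l$ in (7.93)-(7.96).

The active distances are lower-bounded by linearly increasing functions that are determined by the list of parameters
$$\mathcal{A} = \left\{ d_{\mathrm{free}}, \alpha, \beta^b, \beta^c, \beta^{rc}, \beta^s \right\}. \tag{7.97}$$

Footnote 6: We denote by $\operatorname{tr} A = \sum_k a_{kk}$ the trace of the matrix $A$.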
7.3

7.3.1 OFD Codes

Table 7.1: Minimal-basic generator matrices of rate R = 1/2 OFD codes.

deg C   d_free   α       G(D)                                              octal
1       3        1       1                     1+D                         (2 3)
2       5        1/2     1+D^2                 1+D+D^2                     (5 7)
3       6        1/2     1+D+D^3               1+D+D^2+D^3                 (15 17)
4       7        4/11    1+D^3+D^4             1+D+D^2+D^4                 (23 35)
5       8        8/23    1+D^2+D^4+D^5         1+D+D^2+D^3+D^5             (53 75)
6       10       4/13    1+D^2+D^3+D^5+D^6     1+D+D^2+D^3+D^6             (133 171)
7.3.2 MS Codes

Table 7.2: Minimal-basic generator matrices of the rate R = 1/2 MS codes.

deg C   d_free   α       G(D)                                              octal
1       3        1       1                     1+D                         (2 3)
2       3        1/2     1                     1+D^2                       (4 5)
2       4        2/3     1+D                   1+D+D^2                     (6 7)
2       5        1/2     1+D^2                 1+D+D^2                     (5 7)
3       3        1/3     1                     1+D^3                       (10 11)
3       4        4/7     1                     1+D+D^3                     (10 15)
3       5        4/7     1+D                   1+D+D^3                     (14 15)
3       6        1/2     1+D+D^3               1+D+D^2+D^3                 (15 17)
4       3        1/4     1                     1+D^4                       (20 21)
4       4        1/2     1                     1+D+D^4                     (20 31)
4       5        1/2     1+D                   1+D+D^4                     (30 31)
4       6        5/11    1+D+D^2+D^3           1+D+D^4                     (36 31)
4       7        3/8     1+D+D^4               1+D+D^2+D^4                 (31 35)
5       3        1/5     1                     1+D^5                       (40 41)
5       4        2/5     1                     1+D+D^5                     (40 61)
5       5        7/15    1                     1+D+D^3+D^5                 (40 65)
5       6        5/11    1+D+D^2+D^3           1+D+D^2+D^4+D^5             (74 73)
5       7        4/9     1+D+D^2               1+D+D^3+D^5                 (70 65)
5       8        2/5     1+D+D^2+D^3+D^4       1+D+D^3+D^5                 (76 65)
6       3        1/6     1                     1+D^6                       (100 101)
6       4        1/3     1                     1+D+D^6                     (100 141)
6       5        10/23   1                     1+D^2+D^3+D^6               (100 131)
6       6        3/7     1                     1+D+D^2+D^3+D^4+D^6         (100 175)
6       7        2/5     1+D+D^2+D^4           1+D^2+D^3+D^4+D^6           (164 135)
6       8        2/5     1+D+D^5               1+D+D^4+D^5+D^6             (142 147)
6       9        7/19    1+D^2+D^5             1+D+D^3+D^4+D^5+D^6         (122 157)
6       10       8/25    1+D^2+D^3+D^4+D^6     1+D+D^2+D^5+D^6             (135 163)
7.3.3 ODP Codes

Corresponding tables can be found in [JZ99].

7.3.4 QLI and ELI Codes

Corresponding tables can be found in [JZ99].

7.3.5 Punctured Codes
Puncturing of convolutional code sequences was introduced by Cain, Clark, and Geist. Punctured convolutional codes for a wide range of rates have been tabulated since then and, usually, rate $R = 1/2$ OFD convolutional codes are used as mother codes. Then, a rate $R = b/c$ punctured convolutional code is given by a $2 \times b$ puncturing matrix $P = (p_{ij})$, where $p_{ij} \in \{0, 1\}$ and $\sum_{i,j} p_{ij} = c$. A matrix element $p_{ij} = 0$ indicates that the $i$th code bit $v_t^{(i)}$ in the mother code tuple $v_t$ is punctured. Usually, the puncturing matrices are obtained by search, such that there exists no other $2 \times b$ puncturing matrix $P'$ leading to a punctured code with the same mother code and superior distance spectrum. In our search we will involve the slope of the punctured code as an additional criterion. To demonstrate a problem that arises when doing so, we look at an example.

Example 7.10 (punctured code) Let the rate $R = 1/2$ minimal-basic encoding matrix
$$G(D) = \begin{pmatrix} 1 + D & 1 + D + D^3 \end{pmatrix} \tag{7.98}$$
of memory $m = 3$ be used to encode the mother code sequence, which is punctured according to the puncturing matrix
$$P = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 \end{pmatrix}. \tag{7.99}$$
By blocking we obtain the $4 \times 8$ dimensional encoding matrix
$$G^{[4]}(D) = \begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
0 & D & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & D & 1 & 1 & 1 & 1 \\
D & D & 0 & 0 & 0 & D & 1 & 1
\end{pmatrix} \tag{7.100}$$
and according to $P$ we erase the 3rd, 4th, and 7th column of $G^{[4]}(D)$ and get a generator matrix of the punctured code
$$G_p(D) = \begin{pmatrix}
1 & 1 & 0 & 0 & 1 \\
0 & D & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 \\
D & D & 0 & D & 1
\end{pmatrix}. \tag{7.101}$$
The overall constraint length of this generator matrix is smaller than that of the mother code. Furthermore, we can find an equivalent minimal-basic generator matrix
$$G_p^{\mathrm{mb}}(D) = \begin{pmatrix}
1 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 \\
0 & D & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0
\end{pmatrix}. \tag{7.102}$$
Hence, we obtain a convolutional code with overall constraint length $\nu = 1$ by puncturing an $m = 3$ mother code. The slope of the punctured code is $\alpha = 1$.

Since we do not want to reduce the overall constraint length by puncturing, we will only consider puncturing matrices $P$ that achieve punctured codes having an overall constraint length equal to the memory of the mother code.
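The blocking and column-erasing steps of Example 7.10 can be retraced mechanically. The following sketch builds the blocked matrix $G^{[4]}(D)$ from the coefficients of (7.98) and then deletes the columns that the puncturing matrix (7.99) marks with 0. It assumes SymPy; the helper names `blocked_generator` and `puncture` and the column ordering (code bits of one mother code tuple adjacent) are illustrative assumptions chosen to match the example.

```python
import sympy as sp

D = sp.symbols("D")


def blocked_generator(gens, M):
    """Block a rate 1/c polynomial generator matrix by M.

    gens: one coefficient list [g_0, g_1, ...] per output of the mother code.
    Row i is the input phase; column j*c + n is output n at output phase j.
    """
    c = len(gens)
    G = sp.zeros(M, M * c)
    for i in range(M):
        for j in range(M):
            for n, g in enumerate(gens):
                for d, coeff in enumerate(g):
                    # a tap with delay d = M*k + (j - i) contributes D^k
                    k, r = divmod(d - (j - i), M)
                    if coeff and r == 0 and k >= 0:
                        G[i, j * c + n] += D**k
    return G


def puncture(G, P):
    """Keep only the columns of the blocked matrix that P marks with 1."""
    c, M = len(P), len(P[0])
    keep = [j * c + n for j in range(M) for n in range(c) if P[n][j]]
    return G.extract(list(range(G.rows)), keep)


g = [[1, 1, 0, 0], [1, 1, 0, 1]]      # 1 + D and 1 + D + D^3, cf. (7.98)
P = [[1, 0, 1, 0], [1, 0, 1, 1]]      # puncturing matrix, cf. (7.99)
G4 = blocked_generator(g, 4)
Gp = puncture(G4, P)
sp.pprint(G4)
sp.pprint(Gp)
```

Finding an equivalent minimal-basic matrix such as (7.102) requires a further reduction step that is not attempted here.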
Table 7.3: Mother codes and puncturing patterns of rate R = (c − 1)/c punctured codes.
deg C 1 2 dfree 2 2 3 3 2 3 4 4 2 3 4 5 2 3 4 5 6 6 2 3 4 5 6 5/16 3/8 6/17 5/16 3/7 2/5 1/5 5/13 4/9 4/11 1/3 1 2/3 1/2 1/3 4/7 1/2 R = 2/3 G(D) (2 3) (6 7) (5 7) (10 11) (10 15) (14 15) (20 31) (35 23) (40 41) (40 61) (40 65) (75 53) (75 53) (100 141) (100 131) (142 147) (135 163) 10 11 10 11 10 11 10 11 1/3 2/5 3/11 2/7 10 11 11 10 10 11 10 11 10 11 10 11 11 10 P 10 11 10 11 10 11 10 11 10 11 10 11 1 2/3 1/2 3/5 1/2 2/5 1/2 1/2 2/5 1/5 3/7 1/3 R = 3/4 G(D) (2 3) (6 7) (5 7) (14 15) (15 17) (15 17) (36 31) (36 31) (36 31) (40 41) (74 73) (75 53) (122 157) (164 135) (135 163) (135 163) 101 101 101 110 110 101 100 111 1/3 1/3 6/17 P 100 111 110 101 110 101 101 110 100 111 110 101 111 100 101 110 100 111 100 111 101 110 111 100 1/2 1/2 1/3 3/8 2/5 1/3 2/3 1/2 1 1 R = 4/5 G(D) (2 3) (6 7) (10 15) (15 17) (35 23) (36 31) (31 35) (40 65) (70 65) (75 53) (164 135) (122 157) (100 131) 1000 1111 1100 1011 1000 1111 1/3 1/3 3/11 1101 1001 1001 1110 1100 1011 1000 1111 1000 1111 1001 1110 1/2 2/5 1/4 1/2 1/2 1001 1110 1011 1100 2/3 1/2 P 1000 1111 1001 1110 1 1 R = 5/6 G(D) (2 3) (6 7) (10 15) (15 17) (36 31) (31 35) (70 65) (74 73) (75 53) (171 133) (171 133) (122 157) 10011 11001 10110 10101 10000 11111 10001 11110 10101 11010 11111 10000 10100 11011 11001 10011 10001 11101 10101 11010 P 10000 11111 10011 11100
7.4 Decoding Aspects

7.4.1 Trellis Representation of Convolutional Codes

The trellis of a convolutional code is obtained by cascading the trellis module. In contrast to block codes, the trellis states have a physical meaning: they correspond to possible encoder states. Additionally, convolutional code trellises are sectionalized such that c code bits are associated with each edge.

7.4.2 Efficient Optimum Decoding Techniques

The Viterbi algorithm and the BCJR algorithm can be applied to convolutional codes.

7.4.3
Let a rate $R = b/c$ convolutional code $C$ with generator matrix $G(D)$ be used to communicate over a binary symmetric channel (BSC) with error probability $\varepsilon$. Assume ML sequence estimation is applied and denote by $\hat{v}$ the estimated code sequence. Since the transmitted code sequence $v$ is of infinite length, we obtain $P(\hat{v} \ne v) = 1$ for any given $\varepsilon > 0$, even if the system provides an adequate bit error probability for the information. Clearly, the sequence error probability is not the appropriate performance measure.

Due to the linearity of $C$ we can assume, without loss of generality, that $v = 0$. Consider the path that is associated with $\hat{v}$ in the state diagram. Clearly, possible error events are organized in bursts, and a length $l$ burst error event $B_l^t$ at time $t$ is a segment $\hat{v}_{[t,t+l)}$ that corresponds to a path $p_{[1,l]} \in \mathcal{P}_l$ as defined in (7.69). The probability that the estimated code sequence starts with a burst error $B_l$ (to simplify notation we omit the index $t = r$) upper-bounds the probability that a burst error starts at $t > r$. Applying the union bound we obtain
$$P(B_l) \le \sum_w N(w,l)\, p(w) \tag{7.103}$$
where $N(w,l)$ is the number of length $l$ paths $p_{[1,l]} \in \mathcal{P}_l$ with $\mathrm{wt}_H(v_{[1,l]}) = w$ and the pair-wise error probability $p(w)$ is given by
$$p(w) = \begin{cases}
\displaystyle\sum_{i=(w+1)/2}^{w} \binom{w}{i} \varepsilon^i (1-\varepsilon)^{w-i}, & w \text{ odd} \\[2ex]
\displaystyle\frac{1}{2} \binom{w}{w/2} \varepsilon^{w/2} (1-\varepsilon)^{w/2} + \sum_{i=w/2+1}^{w} \binom{w}{i} \varepsilon^i (1-\varepsilon)^{w-i}, & w \text{ even.}
\end{cases} \tag{7.104}$$
Applying the Bhattacharyya bound [JZ99] we obtain
$$p(w) < \left( 2\sqrt{\varepsilon(1-\varepsilon)} \right)^w. \tag{7.105}$$
The first event probability, i.e., the probability that $\hat{v}$ starts with a burst error, is upper-bounded by
$$P_f \le \sum_l \sum_w N(w,l)\, p(w) < T(W) \Big|_{W = 2\sqrt{\varepsilon(1-\varepsilon)}} \tag{7.106}$$
where $T(W)$ is the closed form of the path enumerator of $C$. The first event probability is a code property!

Example 7.11 Let the rate $R = 1/2$ convolutional code $C$ with degree $\deg C = 2$ and systematic generator matrix
$$G(D) = \begin{pmatrix} 1 & \frac{1 + D + D^2}{1 + D} \end{pmatrix} \tag{7.107}$$
be used to communicate over a BSC. The extended path enumerator of $G(D)$ is
$$T(K,W,L) = \frac{K^2 W^4 L^3 (L + W - W^2 L)}{1 - WL\left(1 + K^2W^2L + K^2WL^2 - K^2W^3L^2\right)} \tag{7.108}$$

and the number $N(w,l)$ of length $l$ paths with Hamming-weight $w$ is obtained by series expansion
$$\begin{aligned} T(1,W,L) = {} & W^5L^3 + W^4L^4 + (W^5 + W^8)L^5 + (W^6 + 2W^7)L^6 + (W^6 + W^7 + 2W^8 + W^{11})L^7 \\ & + (3W^8 + 4W^9 + 2W^{10} + 3W^{11} + W^{14})L^8 + \ldots. \end{aligned} \tag{7.109}$$
For example, there are three length $l = 6$ paths, one with Hamming-weight $w = 6$ and two with Hamming-weight $w = 7$. The union bound on the burst error probability $P(B_l)$ according to (7.103) is shown in Figure 7.8 (left). The path enumerator of $C$ is
$$T(W) = \frac{W^4 (1 + W - W^2)}{1 - W(1 + W + W^2 - W^3)} = W^4 + 2W^5 + 2W^6 + 5W^7 + 8W^8 + 13W^9 + \ldots. \tag{7.110}$$
The union bound on the first event probability $P_f$ is shown in Figure 7.8 (right). The bound is not tight for large error probabilities and increases above one.

Figure 7.8: Union bound on $P(B_l)$ (left; curves for $\varepsilon$ = 0.001, 0.005, 0.01, 0.03, 0.05 over $l$) and on first event probability $P_f$ (right).
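To make the bounds of this example concrete, the sketch below evaluates the exact pair-wise error probability (7.104), its Bhattacharyya bound (7.105), and the first event bound (7.106) using the closed form (7.110). The function names are hypothetical, and the closed form is only meaningful for $\varepsilon$ small enough that the series of $T(W)$ converges at $W = 2\sqrt{\varepsilon(1-\varepsilon)}$.

```python
from math import comb, sqrt


def pairwise_error(w, eps):
    """Exact pair-wise error probability p(w) on a BSC, cf. (7.104)."""
    p = sum(comb(w, i) * eps**i * (1 - eps)**(w - i)
            for i in range(w // 2 + 1, w + 1))
    if w % 2 == 0:
        # a tie (exactly w/2 errors) is resolved wrongly with probability 1/2
        p += 0.5 * comb(w, w // 2) * (eps * (1 - eps))**(w // 2)
    return p


def bhattacharyya(w, eps):
    """Upper bound (2 sqrt(eps (1 - eps)))^w on p(w), cf. (7.105)."""
    return (2 * sqrt(eps * (1 - eps)))**w


def path_enumerator(W):
    """Closed form T(W) of Example 7.11, cf. (7.110)."""
    return W**4 * (1 + W - W**2) / (1 - W * (1 + W + W**2 - W**3))


def first_event_bound(eps):
    """Right-hand side of (7.106) for the code of Example 7.11."""
    return path_enumerator(2 * sqrt(eps * (1 - eps)))


for eps in (0.001, 0.01, 0.05):
    print(eps, pairwise_error(4, eps), bhattacharyya(4, eps), first_event_bound(eps))
```

Comparing the second and third printed columns shows how much the Bhattacharyya bound loosens the pair-wise term for the minimum-weight paths ($w = 4$).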
ML sequence estimation can be regarded as a stochastic process that generates a sequence of burst errors embedded in correctly decoded segments. A simple approximation of this behavior is the renewal process (see footnote 7) depicted in Figure 7.9. The path corresponding to the estimated code sequence $\hat{v}$ starts in the correct state. Then, with probability $P_f$, a burst error occurs or, with probability $1 - P_f$, the first transition is decoded correctly and $\hat{v}$ remains in the correct state. Given that a burst error occurs, its length is given by the probability function $P_f^{-1} P(B_l)$ and the burst terminates, with probability 1, in the correct state. Once in the correct state, the process starts over again.

Figure 7.9: Model for burst error behavior at the ML sequence estimation output. (Two states, "correct state" and "burst error", with transitions labelled $P_f$, $1 - P_f$, and 1.)

Footnote 7: A renewal process reaches over and over again an internal state that is identical to its initial state. This allows us to separate the long term behavior from that within a renewal period. The process within such a renewal period is then given by independent identically distributed random variables.

Let us now consider the ML sequence estimation. The (information) bit error probability $P_b$ is the number of erroneously decoded information bits in the estimated information sequence $\hat{u}$, normalized by the number of encoded information bits. Again union bound techniques can be applied to upper-bound $P_b$ and we obtain
$$P_b < \frac{1}{b} \left. \frac{\partial T(K,W,L)}{\partial K} \right|_{K=1,\ W = 2\sqrt{\varepsilon(1-\varepsilon)},\ L=1} \tag{7.111}$$
where $T(K,W,L)$ is the closed form of the extended path enumerator of $G(D)$. The bit error probability is a generator matrix property!

The bounds on the first event probability $P_f$ and on the bit error probability $P_b$ are tight at rates $R$ below the critical rate
$$R_{\mathrm{crit}} = 1 - H\!\left( \frac{\sqrt{\varepsilon}}{\sqrt{\varepsilon} + \sqrt{1-\varepsilon}} \right),$$
where $H(x) = -x \log_2(x) - (1-x) \log_2(1-x)$ is the binary entropy function, and diverge for rates above. In other words, they are tight for small error probabilities $\varepsilon$.
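As a small numerical illustration of the critical rate, the sketch below evaluates $R_{\mathrm{crit}}$ on a BSC, assuming the expression $R_{\mathrm{crit}} = 1 - H\big(\sqrt{\varepsilon}/(\sqrt{\varepsilon} + \sqrt{1-\varepsilon})\big)$ with the binary entropy function $H$. The function names are hypothetical.

```python
from math import log2, sqrt


def binary_entropy(x):
    """H(x) = -x log2(x) - (1 - x) log2(1 - x), with H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def critical_rate(eps):
    """Critical rate of a BSC with crossover probability eps (assumed form)."""
    s, t = sqrt(eps), sqrt(1 - eps)
    return 1 - binary_entropy(s / (s + t))


for eps in (0.001, 0.015, 0.05):
    print(eps, critical_rate(eps))
```

For a rate $R = 1/2$ code, the crossover probability at which `critical_rate` drops to $1/2$ marks the point beyond which the union bounds are expected to become loose.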
Example 7.12 Consider the extended path enumerator function $T(K,W,L)$ as given in (7.108). Then,
$$\frac{1}{b} \left. \frac{\partial T(K,W,L)}{\partial K} \right|_{K=1,\ L=1} = \frac{2(1-W)(1+W-W^2)W^4}{\left( 1 - W(1+W+W^2-W^3) \right)^2} = 2W^4 + 4W^5 + 6W^6 + 18W^7 + 32W^8 + 62W^9 + \ldots \tag{7.112}$$
and the corresponding union bound on the bit error probability $P_b$ is shown in Figure 7.10 together with a simulated curve. The critical rate is $R_{\mathrm{crit}} = 1/2$ for $\varepsilon \approx 0.015$, and the bound is tight for smaller error probabilities and diverges for larger $\varepsilon$. When considering the equivalent minimal-basic generator matrix
$$G(D) = \begin{pmatrix} 1 + D & 1 + D + D^2 \end{pmatrix} \tag{7.113}$$
we obtain
$$\frac{1}{b} \left. \frac{\partial T(K,W,L)}{\partial K} \right|_{K=1,\ L=1} = \frac{W^4 (2 - 2W^2 + W^3)}{\left( 1 - W(1+W+W^2-W^3) \right)^2} = 2W^4 + 4W^5 + 8W^6 + 21W^7 + 40W^8 + 81W^9 + \ldots. \tag{7.114}$$
This is different from (7.112) and points out that the bit error probability is a generator matrix property.

Figure 7.10: Union bound on bit error probability $P_b$ (union bound and simulation).
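The derivative in (7.111) can be carried out symbolically. The sketch below does so for the extended path enumerator (7.108) of the systematic encoder, assuming SymPy; the name `bit_error_bound` is a hypothetical helper. It also expands the result into a series so that the coefficients of the bit error bound can be inspected directly.

```python
import sympy as sp

K, W, L = sp.symbols("K W L")

# Extended path enumerator of the systematic encoder, cf. (7.108).
T = (K**2 * W**4 * L**3 * (L + W - W**2 * L)
     / (1 - W * L * (1 + K**2 * W**2 * L + K**2 * W * L**2 - K**2 * W**3 * L**2)))

b = 1                                              # rate R = b/c = 1/2
dT = sp.diff(T, K).subs({K: 1, L: 1}) / b          # (1/b) dT/dK at K = L = 1
dT = sp.factor(sp.cancel(dT))
coeffs = sp.Poly(sp.series(dT, W, 0, 10).removeO(), W)


def bit_error_bound(eps):
    """Right-hand side of (7.111) for a BSC with crossover probability eps."""
    return float(dT.subs(W, 2 * sp.sqrt(eps * (1 - eps))))


print(dT)
print([coeffs.coeff_monomial(W**k) for k in range(4, 10)])
print(bit_error_bound(0.001))
```

Repeating the computation for another encoder of the same code only requires replacing `T`, which is how the dependence of the bit error bound on the generator matrix can be explored.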
7.4.4 Suboptimal Decoding Techniques

Not considered here!

Chapter 8
Trellis Coded Modulation

In work!

Part IV
Code Concatenation

In work!
Appendix A
Information: A Quantitative Measure

This is a brief introduction to discrete probability theory. We will see that information is something that we obtain by observing a random variable.

Random experiment. The set of possible outcomes of a random experiment is called the sample space $\Omega$. Here we assume the sample space is finite, i.e., $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$.

Event and probability. An event is any subset of $\Omega$. The impossible event $\emptyset$ is the empty subset of $\Omega$ and the certain event is $\Omega$. To each event we assign a probability measure, which is a real number between 0 and 1 inclusive, such that
$$P(\Omega) = 1 \tag{A.1}$$
and
$$P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset. \tag{A.2}$$
With $A = \Omega$ and $B = \emptyset$ we conclude
$$P(\emptyset) = 0. \tag{A.3}$$
We call $\omega_i$, $i = 1, 2, \ldots, n$, the atomic events and denote $p_i = P(\{\omega_i\})$. These probabilities completely determine the probabilities of all events.

Random variable. A discrete random variable $X$ is a mapping from the sample space $\Omega$ into a specified finite set $\mathcal{X} = \{x_1, x_2, \ldots, x_k\}$. The probability function of a random variable $X$, denoted $f_X$, is given by
$$f_X(x) = P(\{X = x\}) \tag{A.4}$$
where $P(\{X = x\})$ denotes the probability of the event that the random variable $X$ takes on the value $x$. We have
$$f_X(x) \ge 0 \quad \text{for all } x \in \mathcal{X} \tag{A.5}$$
and
$$\sum_x f_X(x) = 1. \tag{A.6}$$
In discrete probability theory, there is no mathematical distinction between a random variable and a vector of random variables. This means we can replace $X$ by $X_1 X_2 \ldots X_N$ and obtain the joint probability function, denoted $f_{X_1 X_2 \ldots X_N}$, by
$$f_{X_1 X_2 \ldots X_N}(x_1, x_2, \ldots, x_N) = P(\{X_1 = x_1\}, \{X_2 = x_2\}, \ldots, \{X_N = x_N\}) \tag{A.7}$$
where
$$f_{X_1 X_2 \ldots X_N}(x_1, x_2, \ldots, x_N) \ge 0 \tag{A.8}$$
and
$$\sum_{x_1} \sum_{x_2} \ldots \sum_{x_N} f_{X_1 X_2 \ldots X_N}(x_1, x_2, \ldots, x_N) = 1. \tag{A.9}$$
Expectation. If $X$ is real-valued, then the expectation of $X$, denoted $E(X)$, is the real number
$$E(X) = \sum_x x f_X(x). \tag{A.10}$$
Furthermore, for any real-valued function $g(x)$ whose domain includes $\mathcal{X}$ we have
$$E(g(X)) = \sum_x g(x) f_X(x). \tag{A.11}$$
This can be extended to the expectation over joint probability functions.

Conditional probability function. Often it is convenient to consider conditional probability functions. If $f_X(x) > 0$, then one defines
$$f_{Y|X}(y|x) = \frac{f_{XY}(x,y)}{f_X(x)} \tag{A.12}$$
where $f_{XY}$ is the joint probability function of $X$ and $Y$. We have
$$f_{Y|X}(y|x) \ge 0 \quad \text{for all } y \in \mathcal{Y} \tag{A.13}$$
and
$$\sum_y f_{Y|X}(y|x) = 1. \tag{A.14}$$
Mathematically, there is no difference between a conditional probability function and an (unconditioned) probability function.

Example A.1 (Random variables) Consider the random experiment of throwing a die.

Uncertainty and mutual information. We call $-\log_2 f_X(x)$ the self-information of $X = x$ and define the uncertainty about a random variable as the mean self-information of $X$.

Definition A.1 (uncertainty, entropy) The uncertainty (or entropy) of a random variable $X$ is
$$H(X) = -\sum_x f_X(x) \log_2 f_X(x). \tag{A.15}$$
Now we can define information as a quantitative measure.

Definition A.2 (information) The mutual information of two random variables $X$ and $Y$ is
$$I(X;Y) = H(X) - H(X|Y). \tag{A.16}$$
This means information is the reduction of uncertainty about a random variable $X$ by observing the random variable $Y$. If $Y = X$, i.e., we exactly know $X$, then $H(X|X) = 0$ and the initial uncertainty $H(X)$ about $X$ is the information that we have received.
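The two definitions can be tried out on the die-throwing experiment. The sketch below computes the uncertainty (A.15) of a fair die and the mutual information (A.16) from a joint probability function. It assumes the standard chain rule $H(X|Y) = H(X,Y) - H(Y)$ for the conditional uncertainty, which is not spelled out in this appendix; the function names and the dictionary representation of probability functions are illustrative choices.

```python
from math import log2


def entropy(pmf):
    """Uncertainty H(X) = -sum_x f_X(x) log2 f_X(x), cf. (A.15)."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), cf. (A.16), from a joint pmf {(x, y): p}."""
    fx, fy = {}, {}
    for (x, y), p in joint.items():
        fx[x] = fx.get(x, 0.0) + p       # marginal of X
        fy[y] = fy.get(y, 0.0) + p       # marginal of Y
    h_x_given_y = entropy(joint) - entropy(fy)   # assumed chain rule
    return entropy(fx) - h_x_given_y


die = {face: 1 / 6 for face in range(1, 7)}                 # fair die
observed = {(face, face): 1 / 6 for face in range(1, 7)}    # Y = X
independent = {(x, y): 1 / 36 for x in range(1, 7) for y in range(1, 7)}

print(entropy(die))
print(mutual_information(observed))
print(mutual_information(independent))
```

Observing $Y = X$ returns the full uncertainty of the die as information, while an independent second die returns (up to rounding) none.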
Bibliography

[Ara88] B. Arazi, A commonsense approach to the theory of error-correcting codes, MIT Press, 1988, ISBN 0262010984.
[Ber84] E. R. Berlekamp, Algebraic coding theory, Aegean Park Press, 1984, ISBN 0894120638.
[Bla84] R. E. Blahut, Theory and practice of error control codes, Addison-Wesley, 1984, ISBN 0-201-10102-5.
[Bla90] R. E. Blahut, Principles and practice of information theory, Addison-Wesley, 1990, ISBN 0-201-10709-0.
[Bos99] M. Bossert, Channel coding for telecommunications, John Wiley & Sons, 1999, ISBN 0471982776.
[CC82] G. C. Clark and J. B. Cain, Error-correction coding for digital communications, Plenum Pub Corp, 1982, ISBN 013283796X.
[Gal68] R. G. Gallager, Information theory and reliable communication, John Wiley & Sons, 1968, ISBN W471290483.
[Gil66] A. Gill, Linear sequential circuits, McGraw-Hill, 1966, Library of Congress Catalog Card Number 66-29752.
[HW99] C. Heegard and S. B. Wicker, Turbo coding, Kluwer Academic Publishers, 1999, ISBN 0792383788.
[JZ99] R. Johannesson and K. Sh. Zigangirov, Fundamentals of convolutional coding, IEEE Press, 1999, ISBN 0780334833.
[LC83] Shu Lin and D. J. Costello, Jr., Error control coding: Fundamentals and applications, Prentice-Hall, 1983, ISBN 013283796X.
[McE77] R. J. McEliece, The theory of information and coding, Encyclopedia of Math. and Applications, Vol. 3, Addison-Wesley, 1977, ISBN.
[MS77] F. J. MacWilliams and N. J. A. Sloane, The theory of error-correcting codes, North-Holland, 1977, ISBN.
[MZ02] R. H. Morelos-Zaragoza, The art of error correcting coding, John Wiley & Sons, 2002, ISBN 0471495816.
[PH98a] V. S. Pless and W. C. Huffman (Eds.), Handbook of coding theory, Part I, North-Holland, 1998, ISBN 0-444-50088-X.
[PH98b] V. S. Pless and W. C. Huffman (Eds.), Handbook of coding theory, Part II, North-Holland, 1998, ISBN 0-444-50088-X.
[Pro89] J. G. Proakis, Digital communications, McGraw-Hill Inc., 1989, ISBN 0-07-050937-9.
[PW84] W. W. Peterson and E. J. Weldon, Error-correcting codes, MIT Press, 1984, ISBN 0262160390.
[Sch97] C. Schlegel, Trellis coding, Wiley-IEEE Press, 1997, ISBN 0780310527.
[vL82] J. H. van Lint, Introduction to coding theory, Springer, 1982, ISBN 3540548947.
[VO79] A. J. Viterbi and J. K. Omura, The principles of digital communication and coding, McGraw-Hill, 1979, ISBN 0070675163.
[VY00] B. Vucetic and J. Yuan, Turbo codes: Principles and applications, Kluwer Academic Publishers, 2000, ISBN 0792378687.
[WJ90] J. M. Wozencraft and I. M. Jacobs, Principles of communication engineering, Waveland Press, 1990, ISBN 0881335541.
