
Information and Coding Theory Lecture-09:

Channel Coding Theorem

Chadi Abou-Rjeily

Department of Electrical and Computer Engineering


Lebanese American University
chadi.abourjeily@lau.edu.lb

March 6, 2018

Introduction (1)

The channel encoder adds useful redundancy in order to protect the transmitted data stream against the noise introduced by the channel.
Its input consists of a sequence of m information symbols U = (U1, . . . , Um).
Its output consists of a sequence of n encoded symbols X = (X1, . . . , Xn).
Consequently, the channel coding rate is given by:

Rc = m/n  (information bits/transmitted bit)

Note that since the encoder is adding redundancy to the source, n ≥ m, resulting in Rc ≤ 1.
For efficient transmissions, Rc must be maximized.
Introduction (2)

The channel decoder tries to reconstitute the data flow from the received symbols.
Its input corresponds to the received vector Y. It consists of a sequence of n symbols Y = (Y1, . . . , Yn).
Note that, because of the noise added by the channel, Y ≠ X in general.
Its output corresponds to the reconstituted source V. It consists of a sequence of m symbols V = (V1, . . . , Vm).
The channel code is characterized by the probability of error:

Pe = Pr(V ≠ U)

Note that, due to the presence of noise, V is not always equal to U.

Introduction (3)

The problem of channel coding consists of:
Maximizing Rc for a given Pe.
The dual problem: minimizing Pe for a given Rc.
The design of channel codes depends on the properties of the communication channel.
Example: channel codes are developed for cables, optical fibers, wireless channels, deep space communications, etc.

Cost Constraint (1)
In some situations, an additional cost constraint is imposed.
For discrete channels, this constraint is quantified by the
Hamming weight.
Designate by x = (x1, . . . , xn) one realization of the transmitted codeword X. The Hamming weight of x is given by:

wH(x) = Σ_{i=1}^{n} wH(xi)

where:

wH(xi) = 0 if xi = 0, and wH(xi) = 1 if xi ≠ 0.

In other words, wH(x) is the number of nonzero components of x.
Note that:

wH(x) = 0 ⇔ x = [0, . . . , 0]
Cost Constraint (2)

In this case, the average cost of the transmitted sequence X = (X1, . . . , Xn) is:

E[wH(X)] ≜ Σ_x p(x) wH(x)

Since the transmitted sequence has length n, the imposed cost constraint can be written as:

(1/n) E[wH(X)] ≤ P

The average Hamming weight is normalized by n.
P quantifies the cost.
The absence of a cost constraint corresponds to setting P → +∞.
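
As a small illustration (this sketch and its parameters are not part of the slides), the following Python code computes the Hamming weight and the normalized average cost (1/n) E[wH(X)] for a hypothetical i.i.d. Bernoulli(q) input block, and checks it against an assumed cost P = 0.25:

import itertools
import numpy as np

def hamming_weight(x):
    """Number of nonzero components of the vector x."""
    return int(np.count_nonzero(x))

def average_cost(p, n):
    """Normalized average Hamming weight (1/n) E[w_H(X)] for a distribution p
    mapping length-n binary tuples to probabilities."""
    return sum(prob * hamming_weight(x) for x, prob in p.items()) / n

# Toy example: i.i.d. Bernoulli(q) bits over a block of length n.
n, q = 4, 0.2
p = {x: (q ** hamming_weight(x)) * ((1 - q) ** (n - hamming_weight(x)))
     for x in itertools.product((0, 1), repeat=n)}

cost = average_cost(p, n)     # equals q for i.i.d. Bernoulli(q) inputs
print(cost, cost <= 0.25)     # check the constraint (1/n) E[w_H(X)] <= P with P = 0.25

For i.i.d. Bernoulli(q) bits, the normalized average weight equals q, so the constraint simply caps the fraction of ones in the transmitted block.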

Cost Constraint (3)

For continuous channels, the cost constraint is often a power constraint:

(1/n) E[X²] ≤ P

Channel Coding Theorem (1)
Maximizing Rc is equivalent to maximizing the quantity (1/n) I(X, Y), which corresponds to the amount of information transmitted over the channel X → Y.
The maximization must be performed under the cost constraint (1/n) E[wH(X)] ≤ P.
For a given sequence length n, we define the capacity-cost function by:

Cn(P) = max_{p(x,y)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }

Note that p(x, y) = p(x) p(y|x). Since p(y|x) is fixed by the channel (it is independent of the channel code), the maximization can be performed over p(x) rather than p(x, y).
Consequently, Cn(P) can be written as:

Cn(P) = max_{p(x)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }
Channel Coding Theorem (2)

The capacity-cost function C(P) is defined by:

Cn(P) = max_{p(x)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }
C(P) = max_n Cn(P)

In the special case of transmitting over a channel without a cost constraint, the capacity C of the channel takes the simple form:

Cn = max_{p(x)} (1/n) I(X, Y)
C = max_n Cn

Channel Coding Theorem (3)

Shannon's channel coding theorem (Claude Shannon, 1916–2001): For reliable communications, the channel coding rate cannot exceed C(P). In other words, for an arbitrarily small error probability Pe, Rc must satisfy the following relation:

Rc ≤ C(P)

In other words, given a certain amount of resources (modeled by the cost constraint P), we cannot transmit as fast as we desire over a noisy channel without losing information (due to errors).
Shannon proved that the previous upper bound is the tightest (lowest) possible upper bound on the rate Rc.
Shannon also proved the existence of a channel code that achieves this bound.

Channel Coding Theorem (4)

For a stationary and memoryless channel, the channel is completely described by the conditional probability distribution Pr(Y|X).
In this case, the maximization over n can be removed from the expression of C(P).
Consequently, the capacity-cost function of a stationary and memoryless channel is given by:

C(P) = max_{p(x)} { I(X, Y) | E[wH(X)] ≤ P }
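
For a memoryless channel, this maximization over p(x) can also be carried out numerically. The sketch below is only an illustration (not part of the slides): it performs a brute-force search over binary input distributions p(x) = (1 − q, q) for an assumed BSC with crossover probability 0.1. Under a cost constraint, the search would simply be restricted to q ≤ P, since E[wH(X)] = q for a binary input.

import numpy as np

def mutual_information(px, pyx):
    """I(X, Y) in bits for input distribution px and channel matrix
    pyx with pyx[x, y] = Pr(Y = y | X = x)."""
    pxy = px[:, None] * pyx                 # joint distribution p(x, y)
    py = pxy.sum(axis=0)                    # output distribution p(y)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask])).sum())

# Hypothetical memoryless channel: a BSC with crossover probability 0.1.
pyx = np.array([[0.9, 0.1],
                [0.1, 0.9]])

# Brute-force search over binary input distributions p(x) = (1 - q, q).
qs = np.linspace(0.0, 1.0, 1001)
values = [mutual_information(np.array([1 - q, q]), pyx) for q in qs]
C = max(values)
print(C, qs[int(np.argmax(values))])        # ~ 1 - H2(0.1) ≈ 0.531 bits, attained at q = 0.5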

Properties of C (P) (1)
Property 1: Reliable transmissions are not possible over any channel when no resources (such as power) are available:

C(0) = Cn(0) = 0

Proof:
Cn(P) is given by:

Cn(P) = max_{p(x)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }

For P = 0, the relation (1/n) E[wH(X)] ≤ 0 implies that E[wH(X)] = 0, since the Hamming weight is nonnegative.
Since wH(x) = 0 ⇔ x = 0, E[wH(X)] = 0 implies that X is deterministic and equal to the all-zero sequence.
Since X is deterministic, it is independent of Y and consequently I(X, Y) = 0.
Finally, Cn(0) = 0 for all values of n, implying that C(0) = 0.
Properties of C (P) (2)

Property 2: C(P) is an increasing function of P. When more resources are available, higher data rates can be achieved.
Proof:
Cn(P) is given by:

Cn(P) = max_{p(x)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }

When P increases, the condition (1/n) E[wH(X)] ≤ P becomes less stringent.
Consequently, the maximization is performed over a larger set of probability distributions {p(x)}.
Therefore, the maximum can only grow (or stay the same), so Cn(P) increases when P increases. In the same way, C(P) increases when P increases.

Properties of C (P) (3)

Property 3: C(P) is a concave function of P.
A typical plot of C(P) is shown below.
In the above figure, C = C(+∞) stands for the unconstrained capacity.
Following from the concavity of C(P): if the points A(P1, C1) and B(P2, C2) can be achieved, then all the points (capacity-cost pairs) of the segment [AB] can be achieved (by time-sharing between the two operating points).

Calculating C (P) (1)

In what follows, we limit ourselves to the case of stationary and memoryless channels. In this case, the capacity-cost function is given by:

C(P) = max_{p(x)} { I(X, Y) | E[wH(X)] = P }

Note that the condition E[wH(X)] ≤ P was replaced by E[wH(X)] = P: since C(P) is an increasing function of P, the maximum of I(X, Y) is attained when the constraint is met with equality.
We also limit ourselves to symmetric channels. In this case, the conditional entropy H(Y|X) does not depend on the probability distribution p(x).
In other words, H(Y|X) = H(Y|X = x) for all values of x.

Calculating C (P) (2)
Since I(X, Y) = H(Y) − H(Y|X), C(P) can be written as:

C(P) = max_{p(x)} { I(X, Y) | E[wH(X)] = P }
     = max_{p(x)} { H(Y) | E[wH(X)] = P } − H(Y|X)

The second equality follows since H(Y|X) remains invariant when maximizing over p(x) (it does not depend on p(x), following from the symmetry of the channel).
As a conclusion, the procedure for determining the capacity-cost function C(P) is as follows:
1. Calculate H(Y|X).
2. Determine the maximum value of H(Y) under the given constraint.
3. From step (2), deduce the distribution p(y).
4. From the output distribution p(y) and the conditional distribution of the channel p(y|x), deduce the input distribution p(x).
Example-1 (1)
Calculate the capacity of a BSC channel with parameter p.
For a BSC channel:

H(Y|X = 0) = H(Y|X = 1) = H2(p)

resulting in H(Y|X) = H2(p).
The unconstrained capacity is given by:

C = C(+∞) = max_{p(x)} H(Y) − H2(p)

The entropy H(Y) reaches its maximum value of H(Y) = 1 bit when Y is uniform: Pr(Y = 0) = Pr(Y = 1) = 1/2.
Example-1 (2)
We next check for the existence of a distribution p(x) that results in a uniform output Y.
This is equivalent to determining the value of q such that p(x) = 1 − q for x = 0 and p(x) = q for x = 1.
Consequently:

Pr(Y = 0) = Pr(Y = 0|X = 0) Pr(X = 0) + Pr(Y = 0|X = 1) Pr(X = 1)
⇒ 1/2 = (1 − p)(1 − q) + pq = 1 − p − q + 2pq
⇒ q = (p − 1/2)/(2p − 1) = 1/2

The above relation holds for p ≠ 1/2.
Note that for p = 1/2, any value of q results in a uniform output. In this case:

Pr(Y = 0) = Pr(Y = 0|X = 0) Pr(X = 0) + Pr(Y = 0|X = 1) Pr(X = 1)
          = (1/2)(1 − q) + (1/2) q = 1/2  ∀ q
Example-1 (3)

Therefore, the capacity of a BSC channel is given by:

C = 1 − H2(p)

This capacity is achieved when the input has a uniform distribution.
C reaches its maximum value of 1 information bit per transmitted bit when:
p = 0: the channel is ideal.
p = 1: the channel is a simple inverter.
For p = 1/2, C = 0 and the channel is opaque.
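
A direct numerical evaluation of this result is straightforward; the following sketch (illustrative values, not from the slides) tabulates C = 1 − H2(p) for a few crossover probabilities:

import numpy as np

def H2(p):
    """Binary entropy function in bits (H2(0) = H2(1) = 0)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a BSC with crossover probability p, in bits per channel use."""
    return 1.0 - H2(p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(bsc_capacity(p), 3))   # 1.0, ~0.531, 0.0, ~0.531, 1.0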

Example-2 (1)
Calculate the capacity of a BSC with parameter p under a constraint P on the Hamming weight.
For a BSC channel, H(Y|X) = H2(p), implying that:

C(P) = max_{p(x)} { H(Y) | E[wH(X)] = P } − H2(p)

The cost constraint implies that:

E[wH(X)] = P
⇒ Pr(X = 0) wH(0) + Pr(X = 1) wH(1) = P   (with wH(0) = 0 and wH(1) = 1)
⇒ Pr(X = 1) = P ⇒ Pr(X = 0) = 1 − P

Therefore, the cost constraint fixes the input probability distribution p(x). Consequently, the maximization over p(x) can be removed from the expression of C(P).

Example-2 (2)
The distribution of Y can be calculated from:

Pr(Y = 1) = Pr(Y = 1|X = 0) Pr(X = 0) + Pr(Y = 1|X = 1) Pr(X = 1) = p(1 − P) + (1 − p)P
Pr(Y = 0) = Pr(Y = 0|X = 0) Pr(X = 0) + Pr(Y = 0|X = 1) Pr(X = 1) = (1 − p)(1 − P) + pP

Consequently, H(Y) = H2(p(1 − P) + (1 − p)P) and:

C(P) = H2(p(1 − P) + (1 − p)P) − H2(p)

Denote by Pmax the smallest value of P for which C(P) = C, where C is the (unconstrained) capacity of the channel calculated in Example-1.

C(Pmax) = C
⇒ H2(p(1 − Pmax) + (1 − p)Pmax) − H2(p) = 1 − H2(p)
Example-2 (3)
The last equation implies that:

H2(p(1 − Pmax) + (1 − p)Pmax) = 1
⇒ p(1 − Pmax) + (1 − p)Pmax = 1/2 ⇒ Pmax = 1/2

Finally:

C(P) = H2(p(1 − P) + (1 − p)P) − H2(p),  for 0 ≤ P ≤ 1/2;
C(P) = C = 1 − H2(p),                    for P ≥ 1/2.
[Figure: capacity-cost function C(P) versus P, for p = 0.01/0.99, p = 0.1/0.9, and p = 0.5.]
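
The piecewise expression above can be evaluated directly; the sketch below (illustrative values, not from the slides) computes the capacity-cost function of a BSC with p = 0.1 for several cost levels P:

import numpy as np

def H2(p):
    """Binary entropy function in bits (H2(0) = H2(1) = 0)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_capacity_cost(p, P):
    """Capacity-cost function of a BSC with crossover p under the
    normalized Hamming-weight constraint (1/n) E[w_H(X)] <= P."""
    P = min(P, 0.5)                       # beyond Pmax = 1/2 the constraint is inactive
    return H2(p * (1 - P) + (1 - p) * P) - H2(p)

p = 0.1                                   # illustrative crossover probability
for P in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(P, round(bsc_capacity_cost(p, P), 4))
# C(0) = 0, and C(P) saturates at 1 - H2(0.1) ≈ 0.531 for P >= 1/2.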

Example-3 (1)
Calculate the capacity of an M-ary symmetric channel X → Y with parameter p, characterized by the following conditional probabilities:

p(y|x) = 1 − p if y = x, and p(y|x) = p/(M − 1) if y ≠ x,

where the input X and output Y are both M-ary.
For this channel, H(Y|X) = H2(p) + p log2(M − 1) (the entropy of an M-ary symmetric r.v.).
Consequently:

C = max_{p(x)} H(Y) − H2(p) − p log2(M − 1)

Given that Y is an M-ary r.v., H(Y) reaches its maximum value of H(Y) = log2 M bits when Y is uniform.
Next, we check for the existence of a distribution p(x) that results in a uniform distribution p(y):

p(y) = 1/M  ∀ y ∈ {0, . . . , M − 1}
Example-3 (2)
For Y = y ∈ {0, . . . , M − 1}:

Pr(Y = y) = Pr(Y = y|X = y) Pr(X = y) + Σ_{x ≠ y} Pr(Y = y|X = x) Pr(X = x)
⇒ 1/M = (1 − p) Pr(X = y) + (p/(M − 1)) Σ_{x ≠ y} Pr(X = x)
⇒ 1/M = (1 − p) Pr(X = y) + (p/(M − 1)) (1 − Pr(X = y))
⇒ Pr(X = y) = 1/M;  y ∈ {0, . . . , M − 1}

Therefore, when X is uniform, Y is uniform and H(Y) is maximized.
Consequently, the capacity is given by:

C = log2 M − H2(p) − p log2(M − 1)

For M = 2 (BSC), C = 1 − H2(p).
C is expressed in information bits/transmitted symbol.
Example-3 (3)
To normalize the capacity with respect to the size of the alphabet, C must be divided by log2 M, resulting in:

C = 1 − H2(p)/log2 M − p log2(M − 1)/log2 M

C is now expressed in information bits/transmitted bit.


[Figure: normalized capacity C (bits/coded bit) versus p, for M = 2, 4, 8.]

Example-3 (4)
Note that C = 0 when:

H2(p) + p log2(M − 1) = log2 M

The left-hand side is the entropy of an M-ary symmetric r.v. with parameter p:

Pr(X = x) = 1 − p for x = 0, and Pr(X = x) = p/(M − 1) for x ∈ {1, . . . , M − 1}.

The right-hand side is the entropy of an M-ary uniform r.v.:

Pr(X = x) = 1/M  ∀ x ∈ {0, . . . , M − 1}

Therefore, C = 0 when the two distributions coincide:

1 − p = p/(M − 1) = 1/M

implying that p = 1 − 1/M.
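
The following sketch (illustrative values, not from the slides) evaluates the M-ary symmetric channel capacity, its normalized version, and verifies that it vanishes at p = 1 − 1/M:

import numpy as np

def H2(p):
    """Binary entropy function in bits (H2(0) = H2(1) = 0)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mary_capacity(M, p):
    """Capacity of an M-ary symmetric channel, in bits per transmitted symbol."""
    return np.log2(M) - H2(p) - p * np.log2(M - 1)

for M in (2, 4, 8):
    p0 = 1 - 1 / M                                  # crossover value at which C = 0
    print(M,
          round(mary_capacity(M, 0.1), 3),          # capacity at p = 0.1 (bits/symbol)
          round(mary_capacity(M, 0.1) / np.log2(M), 3),  # normalized (bits/coded bit)
          round(mary_capacity(M, p0), 10))          # ≈ 0 at p = 1 - 1/M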
Example-4 (1)
Calculate the capacity of a BEC channel with parameter π.
Method 1:
We have seen in Lecture-5 that the mutual information between the input and the output of a BEC channel is given by:

I(X, Y) = (1 − π) H(X)

Consequently, the unconstrained capacity is given by:

C = max_{p(x)} I(X, Y) = 1 − π bits

The capacity is achieved when X has a uniform distribution.
Example-4 (2)
Method 2:
For a BEC channel, H(Y|X) = H2(π). Consequently:

C = max_{p(x)} H(Y) − H2(π)

Note that for any input distribution:

Pr(Y = e) = Pr(X = 0) Pr(Y = e|X = 0) + Pr(X = 1) Pr(Y = e|X = 1)
          = π Pr(X = 0) + π Pr(X = 1) = π [Pr(X = 0) + Pr(X = 1)] = π

Therefore, the maximum value of H(Y) under the constraint Pr(Y = e) = π is reached when Y is a ternary (M = 3) symmetric r.v., i.e., when the remaining probability 1 − π is split equally between the two non-erased outputs:

p(y) = π for y = e, and p(y) = (1 − π)/2 for y = 0, 1.

In this case:

H(Y) = H2(π) + (1 − π) log2 2 = H2(π) + 1 − π
Example-4 (3)
We next check for the existence of a distribution p(x) that results in a ternary symmetric output Y. This is equivalent to determining the value of q such that p(x) = 1 − q for x = 0 and p(x) = q for x = 1.
For Y = 0:

Pr(Y = 0) = Pr(Y = 0|X = 0) Pr(X = 0) + Pr(Y = 0|X = 1) Pr(X = 1)
⇒ (1 − π)/2 = (1 − π)(1 − q) ⇒ q = 1/2

In an equivalent way, for Y = 1:

Pr(Y = 1) = Pr(Y = 1|X = 0) Pr(X = 0) + Pr(Y = 1|X = 1) Pr(X = 1)
⇒ (1 − π)/2 = (1 − π) q ⇒ q = 1/2

Therefore, H(Y) is maximized when X is uniform.
Example-4 (4)

Consequently:

C = H2(π) + 1 − π − H2(π) = 1 − π

This capacity is achieved when the input has a uniform distribution.
The capacity reaches its maximum value of C = 1 for π = 0 (an ideal channel with no erasures).
For π = 1, C = 0 and the channel is opaque. In this case, Y = e independently of the value taken by the input.
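
As a cross-check (an illustration, not part of the slides), the sketch below computes I(X, Y) explicitly for a BEC with an assumed erasure probability π = 0.3 and a uniform input, and compares it with 1 − π:

import numpy as np

def mutual_information(px, pyx):
    """I(X, Y) in bits for input distribution px and channel matrix
    pyx with pyx[x, y] = Pr(Y = y | X = x)."""
    pxy = px[:, None] * pyx
    py = pxy.sum(axis=0)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask])).sum())

pi = 0.3                                    # illustrative erasure probability
# BEC transition matrix: the output columns are ordered as (0, e, 1).
pyx = np.array([[1 - pi, pi, 0.0],
                [0.0,    pi, 1 - pi]])

px = np.array([0.5, 0.5])                   # capacity-achieving (uniform) input
print(mutual_information(px, pyx), 1 - pi)  # both ≈ 0.7 bits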

Example-5 (1)
Calculate the capacity of an AWGN channel X → Y under the average power constraint E[X²] ≤ P.
For this AWGN channel:
Y = X + Z, where Z stands for the noise.
X and Z are independent.
Z is a Gaussian r.v.: we denote by N = E[Z²] the power of the Gaussian noise; consequently, Z ∼ N(0, N).
H(Y|X) can be calculated from:

H(Y|X) = H(X + Z|X) = H(Z|X) = H(Z) = (1/2) log(2πeN)

The second equality follows since, conditioned on X, there is a one-to-one relation between Z and X + Z.
The third equality follows since X and Z are independent.
The fourth equality follows from the expression of the differential entropy of a Gaussian r.v.
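
As a quick numerical sanity check (not part of the slides, with an arbitrary noise power N = 2 and base-2 logarithms so that the result is in bits), the sketch below estimates the differential entropy of the Gaussian noise by Monte Carlo, H(Z) = E[−log2 p(Z)], and compares it with (1/2) log2(2πeN):

import numpy as np

rng = np.random.default_rng(0)
N = 2.0                                    # hypothetical noise power E[Z^2]
z = rng.normal(0.0, np.sqrt(N), size=1_000_000)

# Monte Carlo estimate of H(Z) = E[-log2 p(Z)] for Z ~ N(0, N).
log2_pdf = -0.5 * np.log2(2 * np.pi * N) - (z ** 2) / (2 * N * np.log(2))
h_mc = -log2_pdf.mean()

h_formula = 0.5 * np.log2(2 * np.pi * np.e * N)
print(h_mc, h_formula)                     # both ≈ 2.55 bits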
Example-5 (2)

Consequently:

C(P) = max_{p(x)} { H(Y) | E[X²] = P } − (1/2) log(2πeN)

Given that X and Z are independent:

E[Y²] = E[(X + Z)²] = E[X²] + E[Z²] = P + N

We have previously determined the maximum value of the differential entropy under an average power constraint (by applying Gibbs' inequality). Consequently, maximizing H(Y) under the average power constraint E[Y²] = P + N results in:

H(Y) ≤ (1/2) log(2πe(P + N))

with equality if Y is Gaussian with variance P + N.
Example-5 (3)

Therefore:

C(P) = (1/2) log(2πe(P + N)) − (1/2) log(2πeN) = (1/2) log(1 + P/N)

The output Y is given by Y = X + Z:
Z is Gaussian with variance N.
Y is Gaussian with variance P + N.
Consequently, X is also Gaussian with variance P.
Therefore, the probability density function p(x) that maximizes C(P) is p(x) = (1/√(2πP)) e^(−x²/(2P)).
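
The closed-form capacity can be evaluated directly; the sketch below (illustrative values, not from the slides, with base-2 logarithms so that the result is in bits) computes C(P) = (1/2) log2(1 + P/N) for a few SNR values:

import numpy as np

def awgn_capacity(P, N):
    """Capacity of the AWGN channel in bits per channel use,
    under the average power constraint E[X^2] <= P."""
    return 0.5 * np.log2(1 + P / N)

N = 1.0                                    # illustrative noise power
for snr_db in (0, 10, 20, 30):
    P = N * 10 ** (snr_db / 10)            # SNR = P / N
    print(snr_db, round(awgn_capacity(P, N), 3))
# e.g. ~0.5 bits at 0 dB, ~1.73 bits at 10 dB, ~3.33 bits at 20 dB.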

Example-5 (4)

C(P) is shown in the figure below.
[Figure: C(P) (bits) versus the SNR P/N, for SNR from 0 to 50.]

Note that:
lim_{P→+∞} C(P) = +∞.
P/N is the signal-to-noise ratio (SNR).

Example-5 (5)

C(P) is expressed in bits and constitutes an upper bound on the channel coding rate Rc.
Designate by T the separation between two consecutive symbols. Then Rc/T stands for the rate expressed in bits/second.
Therefore, when expressed in bits/s, C(P) takes the form:

C(P) = (1/(2T)) log(1 + SNR)

Assume that the transmitted signal has a bandwidth B and is sampled at the minimum rate of 1/T = 2B (Nyquist rate).
Then C(P) can be written as:

C(P) = B log(1 + SNR)  (bits/s)
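
In this bits/s form, the result is the classical Shannon-Hartley formula; the sketch below (hypothetical bandwidth and SNR values, base-2 logarithm so that the rate is in bits/s) evaluates it:

import numpy as np

def shannon_hartley(B, snr_db):
    """Capacity in bits/s of an AWGN channel of bandwidth B (Hz)
    at the given SNR in dB: C = B log2(1 + SNR)."""
    snr = 10 ** (snr_db / 10)
    return B * np.log2(1 + snr)

# Hypothetical example: a 1 MHz channel at 20 dB SNR.
print(shannon_hartley(1e6, 20))            # ≈ 6.66 Mbit/s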

