
Information and Coding Theory Lecture-09:

Channel Coding Theorem

Chadi Abou-Rjeily

Department of Electrical and Computer Engineering


Lebanese American University
chadi.abourjeily@lau.edu.lb

March 6, 2018

Introduction (1)

The channel encoder adds useful redundancy in order to protect the transmitted data stream against the noise introduced by the channel.
Its input consists of a sequence of m information symbols U = (U1, . . . , Um).
Its output consists of a sequence of n encoded symbols X = (X1, . . . , Xn).
Consequently, the channel coding rate is given by:

Rc = m/n  (information bits/transmitted bit)

Note that since the encoder is adding redundancy to the source, n ≥ m, resulting in Rc ≤ 1.
For efficient transmissions, Rc must be maximized.
Introduction (2)

The channel decoder tries to reconstitute the data flow from the received symbols.
Its input corresponds to the received vector Y. It consists of a sequence of n symbols Y = (Y1, . . . , Yn).
Note that, because of the noise added by the channel, Y ≠ X in general.
Its output corresponds to the reconstituted source V. It consists of a sequence of m symbols V = (V1, . . . , Vm).
The channel code is characterized by the probability of error:

Pe = Pr(V ≠ U)

Note that, due to the presence of noise, V is not always equal to U.

Introduction (3)

The problem of channel coding consists of:
Maximizing Rc for a given Pe.
The dual problem: minimizing Pe for a given Rc.
The design of channel codes depends on the properties of the communication channel.
Example: channel codes are developed for cables, optical fibers, wireless channels, deep space communications, etc.

Cost Constraint (1)
In some situations, an additional cost constraint is imposed.
For discrete channels, this constraint is quantified by the
Hamming weight.
Designate by x = (x1, . . . , xn) one realization of the transmitted codeword X. The Hamming weight of x is given by:

wH(x) = Σ_{i=1}^{n} wH(xi)

where:

wH(xi) = 0 if xi = 0, and wH(xi) = 1 if xi ≠ 0.

In other words, wH(x) is the number of nonzero components of x.
Note that:

wH(x) = 0 ⇔ x = [0, . . . , 0]
Cost Constraint (2)

In this case, the average cost of the transmitted sequence X = (X1, . . . , Xn) is:

E[wH(X)] ≜ Σ_x p(x) wH(x)

Since the transmitted sequence has length n, the imposed cost constraint can be written as:

(1/n) E[wH(X)] ≤ P

The average Hamming weight is normalized by n.
P quantifies the cost.
The absence of a cost constraint corresponds to setting P → +∞.
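
As a small illustration (this sketch and its parameters are not part of the slides), the following Python code computes the Hamming weight and the normalized average cost (1/n) E[wH(X)] for a hypothetical i.i.d. Bernoulli(q) input block, and checks it against an assumed cost P = 0.25:

import itertools
import numpy as np

def hamming_weight(x):
    """Number of nonzero components of the vector x."""
    return int(np.count_nonzero(x))

def average_cost(p, n):
    """Normalized average Hamming weight (1/n) E[w_H(X)] for a distribution p
    mapping length-n binary tuples to probabilities."""
    return sum(prob * hamming_weight(x) for x, prob in p.items()) / n

# Toy example: i.i.d. Bernoulli(q) bits over a block of length n.
n, q = 4, 0.2
p = {x: (q ** hamming_weight(x)) * ((1 - q) ** (n - hamming_weight(x)))
     for x in itertools.product((0, 1), repeat=n)}

cost = average_cost(p, n)     # equals q for i.i.d. Bernoulli(q) inputs
print(cost, cost <= 0.25)     # check the constraint (1/n) E[w_H(X)] <= P with P = 0.25

For i.i.d. Bernoulli(q) bits, the normalized average weight equals q, so the constraint simply caps the fraction of ones in the transmitted block.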

Cost Constraint (3)

For continuous channels, the cost constraint is often a power constraint:

(1/n) E[X²] ≤ P

Channel Coding Theorem (1)
Maximizing Rc is equivalent to maximizing the quantity (1/n) I(X, Y), which corresponds to the amount of information transmitted over the channel X → Y.
The maximization must be performed under the cost constraint (1/n) E[wH(X)] ≤ P.
For a given sequence length n, we define the capacity-cost function by:

Cn(P) = max_{p(x,y)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }

Note that p(x, y) = p(x) p(y|x). Since p(y|x) is fixed by the channel (it is independent of the channel code), the maximization can be performed over p(x) rather than p(x, y).
Consequently, Cn(P) can be written as:

Cn(P) = max_{p(x)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }
Channel Coding Theorem (2)

The capacity-cost function C(P) is defined by:

Cn(P) = max_{p(x)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }
C(P) = max_n Cn(P)

In the special case of transmitting over a channel without a cost constraint, the capacity C of the channel takes the simple form:

Cn = max_{p(x)} (1/n) I(X, Y)
C = max_n Cn

Channel Coding Theorem (3)

Shannon's channel coding theorem (Claude Shannon, 1916–2001): For reliable communications, the channel coding rate cannot exceed C(P). In other words, for an arbitrarily small error probability Pe, Rc must satisfy the following relation:

Rc ≤ C(P)

In other words, given a certain amount of resources (modeled by the cost constraint P), we cannot transmit as fast as we desire over a noisy channel without losing information (due to errors).
Shannon proved that the previous upper bound is the tightest (lowest) possible upper bound on the rate Rc.
Shannon also proved the existence of a channel code that achieves this bound.

Channel Coding Theorem (4)

For a stationary and memoryless channel, the channel is completely described by the conditional probability distribution Pr(Y|X).
In this case, the maximization over n can be removed from the expression of C(P).
Consequently, the capacity-cost function of a stationary and memoryless channel is given by:

C(P) = max_{p(x)} { I(X, Y) | E[wH(X)] ≤ P }
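
For a memoryless channel, this maximization over p(x) can also be carried out numerically. The sketch below is only an illustration (not part of the slides): it performs a brute-force search over binary input distributions p(x) = (1 − q, q) for an assumed BSC with crossover probability 0.1. Under a cost constraint, the search would simply be restricted to q ≤ P, since E[wH(X)] = q for a binary input.

import numpy as np

def mutual_information(px, pyx):
    """I(X, Y) in bits for input distribution px and channel matrix
    pyx with pyx[x, y] = Pr(Y = y | X = x)."""
    pxy = px[:, None] * pyx                 # joint distribution p(x, y)
    py = pxy.sum(axis=0)                    # output distribution p(y)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask])).sum())

# Hypothetical memoryless channel: a BSC with crossover probability 0.1.
pyx = np.array([[0.9, 0.1],
                [0.1, 0.9]])

# Brute-force search over binary input distributions p(x) = (1 - q, q).
qs = np.linspace(0.0, 1.0, 1001)
values = [mutual_information(np.array([1 - q, q]), pyx) for q in qs]
C = max(values)
print(C, qs[int(np.argmax(values))])        # ~ 1 - H2(0.1) ≈ 0.531 bits, attained at q = 0.5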

Properties of C (P) (1)
Property 1: Reliable transmissions are not possible over any channel when no resources (such as power) are available:

C(0) = Cn(0) = 0

Proof:
Cn(P) is given by:

Cn(P) = max_{p(x)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }

For P = 0, the relation (1/n) E[wH(X)] ≤ 0 implies that E[wH(X)] = 0, since the Hamming weight is nonnegative.
Since wH(x) = 0 ⇔ x = 0, E[wH(X)] = 0 implies that X is deterministic and equal to the all-zero sequence.
Since X is deterministic, it is independent of Y and consequently I(X, Y) = 0.
Finally, Cn(0) = 0 for all values of n, implying that C(0) = 0.
Properties of C (P) (2)

Property 2: C(P) is an increasing function of P. When more resources are available, higher data rates can be achieved.
Proof:
Cn(P) is given by:

Cn(P) = max_{p(x)} { (1/n) I(X, Y) | (1/n) E[wH(X)] ≤ P }

When P increases, the condition (1/n) E[wH(X)] ≤ P becomes less stringent.
Consequently, the maximization is performed over a larger set of probability distributions {p(x)}.
Therefore, the maximum can only grow (or stay the same), so Cn(P) increases when P increases. In the same way, C(P) increases when P increases.

Properties of C (P) (3)

Property 3: C(P) is a concave function of P.
A typical plot of C(P) is shown below.
In the above figure, C = C(+∞) stands for the unconstrained capacity.
Following from the concavity of C(P): if the points A(P1, C1) and B(P2, C2) can be achieved, then all the points (capacity-cost pairs) of the segment [AB] can be achieved (by time-sharing between the two operating points).

Calculating C (P) (1)

In what follows, we limit ourselves to the case of stationary and memoryless channels. In this case, the capacity-cost function is given by:

C(P) = max_{p(x)} { I(X, Y) | E[wH(X)] = P }

Note that the condition E[wH(X)] ≤ P was replaced by E[wH(X)] = P: since C(P) is an increasing function of P, the maximum of I(X, Y) is attained when the constraint is met with equality.
We also limit ourselves to symmetric channels. In this case, the conditional entropy H(Y|X) does not depend on the probability distribution p(x).
In other words, H(Y|X) = H(Y|X = x) for all values of x.

Calculating C (P) (2)
Since I(X, Y) = H(Y) − H(Y|X), C(P) can be written as:

C(P) = max_{p(x)} { I(X, Y) | E[wH(X)] = P }
     = max_{p(x)} { H(Y) | E[wH(X)] = P } − H(Y|X)

The second equality follows since H(Y|X) remains invariant when maximizing over p(x) (it does not depend on p(x), following from the symmetry of the channel).
As a conclusion, the procedure for determining the capacity-cost function C(P) is as follows:
1. Calculate H(Y|X).
2. Determine the maximum value of H(Y) under the given constraint.
3. From step (2), deduce the distribution p(y).
4. From the output distribution p(y) and the conditional distribution of the channel p(y|x), deduce the input distribution p(x).
Example-1 (1)
Calculate the capacity of a BSC channel with parameter p.
For a BSC channel:

H(Y|X = 0) = H(Y|X = 1) = H2(p)

resulting in H(Y|X) = H2(p).
The unconstrained capacity is given by:

C = C(+∞) = max_{p(x)} H(Y) − H2(p)

The entropy H(Y) reaches its maximum value of H(Y) = 1 bit when Y is uniform: Pr(Y = 0) = Pr(Y = 1) = 1/2.
Example-1 (2)
We next check for the existence of a distribution p(x) that results in a uniform output Y.
This is equivalent to determining the value of q such that p(x) = 1 − q for x = 0 and p(x) = q for x = 1.
Consequently:

Pr(Y = 0) = Pr(Y = 0|X = 0) Pr(X = 0) + Pr(Y = 0|X = 1) Pr(X = 1)
⇒ 1/2 = (1 − p)(1 − q) + pq = 1 − p − q + 2pq
⇒ q = (p − 1/2)/(2p − 1) = 1/2

The above relation holds for p ≠ 1/2.
Note that for p = 1/2, any value of q results in a uniform output. In this case:

Pr(Y = 0) = Pr(Y = 0|X = 0) Pr(X = 0) + Pr(Y = 0|X = 1) Pr(X = 1)
          = (1/2)(1 − q) + (1/2) q = 1/2  ∀ q
Example-1 (3)

Therefore, the capacity of a BSC channel is given by:

C = 1 − H2(p)

This capacity is achieved when the input has a uniform distribution.
C reaches its maximum value of 1 information bit per transmitted bit when:
p = 0: the channel is ideal.
p = 1: the channel is a simple inverter.
For p = 1/2, C = 0 and the channel is opaque.
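
A direct numerical evaluation of this result is straightforward; the following sketch (illustrative values, not from the slides) tabulates C = 1 − H2(p) for a few crossover probabilities:

import numpy as np

def H2(p):
    """Binary entropy function in bits (H2(0) = H2(1) = 0)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a BSC with crossover probability p, in bits per channel use."""
    return 1.0 - H2(p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(bsc_capacity(p), 3))   # 1.0, ~0.531, 0.0, ~0.531, 1.0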

Example-2 (1)
Calculate the capacity of a BSC with parameter p under a constraint P on the Hamming weight.
For a BSC channel, H(Y|X) = H2(p), implying that:

C(P) = max_{p(x)} { H(Y) | E[wH(X)] = P } − H2(p)

The cost constraint implies that:

E[wH(X)] = P
⇒ Pr(X = 0) wH(0) + Pr(X = 1) wH(1) = P   (with wH(0) = 0 and wH(1) = 1)
⇒ Pr(X = 1) = P ⇒ Pr(X = 0) = 1 − P

Therefore, the cost constraint fixes the input probability distribution p(x). Consequently, the maximization over p(x) can be removed from the expression of C(P).

Example-2 (2)
The distribution of Y can be calculated from:

Pr(Y = 1) = Pr(Y = 1|X = 0) Pr(X = 0) + Pr(Y = 1|X = 1) Pr(X = 1) = p(1 − P) + (1 − p)P
Pr(Y = 0) = Pr(Y = 0|X = 0) Pr(X = 0) + Pr(Y = 0|X = 1) Pr(X = 1) = (1 − p)(1 − P) + pP

Consequently, H(Y) = H2(p(1 − P) + (1 − p)P) and:

C(P) = H2(p(1 − P) + (1 − p)P) − H2(p)

Denote by Pmax the smallest value of P for which C(P) = C, where C is the (unconstrained) capacity of the channel calculated in Example-1.

C(Pmax) = C
⇒ H2(p(1 − Pmax) + (1 − p)Pmax) − H2(p) = 1 − H2(p)
Example-2 (3)
The last equation implies that:

H2(p(1 − Pmax) + (1 − p)Pmax) = 1
⇒ p(1 − Pmax) + (1 − p)Pmax = 1/2 ⇒ Pmax = 1/2

Finally:

C(P) = H2(p(1 − P) + (1 − p)P) − H2(p),  for 0 ≤ P ≤ 1/2;
C(P) = C = 1 − H2(p),                    for P ≥ 1/2.
[Figure: capacity-cost function C(P) versus P, for p = 0.01/0.99, p = 0.1/0.9, and p = 0.5.]
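
The piecewise expression above can be evaluated directly; the sketch below (illustrative values, not from the slides) computes the capacity-cost function of a BSC with p = 0.1 for several cost levels P:

import numpy as np

def H2(p):
    """Binary entropy function in bits (H2(0) = H2(1) = 0)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_capacity_cost(p, P):
    """Capacity-cost function of a BSC with crossover p under the
    normalized Hamming-weight constraint (1/n) E[w_H(X)] <= P."""
    P = min(P, 0.5)                       # beyond Pmax = 1/2 the constraint is inactive
    return H2(p * (1 - P) + (1 - p) * P) - H2(p)

p = 0.1                                   # illustrative crossover probability
for P in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(P, round(bsc_capacity_cost(p, P), 4))
# C(0) = 0, and C(P) saturates at 1 - H2(0.1) ≈ 0.531 for P >= 1/2.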

Example-3 (1)
Calculate the capacity of an M-ary symmetric channel X → Y with parameter p, characterized by the following conditional probabilities:

p(y|x) = 1 − p if y = x, and p(y|x) = p/(M − 1) if y ≠ x,

where the input X and output Y are both M-ary.
For this channel, H(Y|X) = H2(p) + p log2(M − 1) (the entropy of an M-ary symmetric r.v.).
Consequently:

C = max_{p(x)} H(Y) − H2(p) − p log2(M − 1)

Given that Y is an M-ary r.v., H(Y) reaches its maximum value of H(Y) = log2 M bits when Y is uniform.
Next, we check for the existence of a distribution p(x) that results in a uniform distribution p(y):

p(y) = 1/M  ∀ y ∈ {0, . . . , M − 1}
Example-3 (2)
For Y = y ∈ {0, . . . , M − 1}:

Pr(Y = y) = Pr(Y = y|X = y) Pr(X = y) + Σ_{x ≠ y} Pr(Y = y|X = x) Pr(X = x)
⇒ 1/M = (1 − p) Pr(X = y) + (p/(M − 1)) Σ_{x ≠ y} Pr(X = x)
⇒ 1/M = (1 − p) Pr(X = y) + (p/(M − 1)) (1 − Pr(X = y))
⇒ Pr(X = y) = 1/M;  y ∈ {0, . . . , M − 1}

Therefore, when X is uniform, Y is uniform and H(Y) is maximized.
Consequently, the capacity is given by:

C = log2 M − H2(p) − p log2(M − 1)

For M = 2 (BSC), C = 1 − H2(p).
C is expressed in information bits/transmitted symbol.
Example-3 (3)
To normalize the capacity with respect to the size of the alphabet, C must be divided by log2 M, resulting in:

C = 1 − H2(p)/log2 M − p log2(M − 1)/log2 M

C is now expressed in information bits/transmitted bit.


[Figure: normalized capacity C (bits/coded bit) versus p, for M = 2, 4, 8.]

Example-3 (4)
Note that C = 0 when:

H2(p) + p log2(M − 1) = log2 M

The left-hand side is the entropy of an M-ary symmetric r.v. with parameter p:

Pr(X = x) = 1 − p for x = 0, and Pr(X = x) = p/(M − 1) for x ∈ {1, . . . , M − 1}.

The right-hand side is the entropy of an M-ary uniform r.v.:

Pr(X = x) = 1/M  ∀ x ∈ {0, . . . , M − 1}

Therefore, C = 0 when the two distributions coincide:

1 − p = p/(M − 1) = 1/M

implying that p = 1 − 1/M.
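
The following sketch (illustrative values, not from the slides) evaluates the M-ary symmetric channel capacity, its normalized version, and verifies that it vanishes at p = 1 − 1/M:

import numpy as np

def H2(p):
    """Binary entropy function in bits (H2(0) = H2(1) = 0)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mary_capacity(M, p):
    """Capacity of an M-ary symmetric channel, in bits per transmitted symbol."""
    return np.log2(M) - H2(p) - p * np.log2(M - 1)

for M in (2, 4, 8):
    p0 = 1 - 1 / M                                  # crossover value at which C = 0
    print(M,
          round(mary_capacity(M, 0.1), 3),          # capacity at p = 0.1 (bits/symbol)
          round(mary_capacity(M, 0.1) / np.log2(M), 3),  # normalized (bits/coded bit)
          round(mary_capacity(M, p0), 10))          # ≈ 0 at p = 1 - 1/M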
Example-4 (1)
Calculate the capacity of a BEC channel with parameter π.
Method 1:
We have seen in Lecture-5 that the mutual information between the input and the output of a BEC channel is given by:

I(X, Y) = (1 − π) H(X)

Consequently, the unconstrained capacity is given by:

C = max_{p(x)} I(X, Y) = 1 − π bits

The capacity is achieved when X has a uniform distribution.
Example-4 (2)
Method 2:
For a BEC channel, H(Y|X) = H2(π). Consequently:

C = max_{p(x)} H(Y) − H2(π)

Note that for any input distribution:

Pr(Y = e) = Pr(X = 0) Pr(Y = e|X = 0) + Pr(X = 1) Pr(Y = e|X = 1)
          = π Pr(X = 0) + π Pr(X = 1) = π [Pr(X = 0) + Pr(X = 1)] = π

Therefore, the maximum value of H(Y) under the constraint Pr(Y = e) = π is reached when Y is a ternary (M = 3) symmetric r.v., i.e., when the remaining probability 1 − π is split equally between the two non-erased outputs:

p(y) = π for y = e, and p(y) = (1 − π)/2 for y = 0, 1.

In this case:

H(Y) = H2(π) + (1 − π) log2 2 = H2(π) + 1 − π
Example-4 (3)
We next check for the existence of a distribution p(x) that results in a ternary symmetric output Y. This is equivalent to determining the value of q such that p(x) = 1 − q for x = 0 and p(x) = q for x = 1.
For Y = 0:

Pr(Y = 0) = Pr(Y = 0|X = 0) Pr(X = 0) + Pr(Y = 0|X = 1) Pr(X = 1)
⇒ (1 − π)/2 = (1 − π)(1 − q) ⇒ q = 1/2

In an equivalent way, for Y = 1:

Pr(Y = 1) = Pr(Y = 1|X = 0) Pr(X = 0) + Pr(Y = 1|X = 1) Pr(X = 1)
⇒ (1 − π)/2 = (1 − π) q ⇒ q = 1/2

Therefore, H(Y) is maximized when X is uniform.
Example-4 (4)

Consequently:

C = H2(π) + 1 − π − H2(π) = 1 − π

This capacity is achieved when the input has a uniform distribution.
The capacity reaches its maximum value of C = 1 for π = 0 (an ideal channel with no erasures).
For π = 1, C = 0 and the channel is opaque. In this case, Y = e independently of the value taken by the input.
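
As a cross-check (an illustration, not part of the slides), the sketch below computes I(X, Y) explicitly for a BEC with an assumed erasure probability π = 0.3 and a uniform input, and compares it with 1 − π:

import numpy as np

def mutual_information(px, pyx):
    """I(X, Y) in bits for input distribution px and channel matrix
    pyx with pyx[x, y] = Pr(Y = y | X = x)."""
    pxy = px[:, None] * pyx
    py = pxy.sum(axis=0)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask])).sum())

pi = 0.3                                    # illustrative erasure probability
# BEC transition matrix: the output columns are ordered as (0, e, 1).
pyx = np.array([[1 - pi, pi, 0.0],
                [0.0,    pi, 1 - pi]])

px = np.array([0.5, 0.5])                   # capacity-achieving (uniform) input
print(mutual_information(px, pyx), 1 - pi)  # both ≈ 0.7 bits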

Example-5 (1)
Calculate the capacity of an AWGN channel X → Y under the average power constraint E[X²] ≤ P.
For this AWGN channel:
Y = X + Z, where Z stands for the noise.
X and Z are independent.
Z is a Gaussian r.v.: we denote by N = E[Z²] the power of the Gaussian noise; consequently, Z ∼ N(0, N).
H(Y|X) can be calculated from:

H(Y|X) = H(X + Z|X) = H(Z|X) = H(Z) = (1/2) log(2πeN)

The second equality follows since, conditioned on X, there is a one-to-one relation between Z and X + Z.
The third equality follows since X and Z are independent.
The fourth equality follows from the expression of the differential entropy of a Gaussian r.v.
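
As a quick numerical sanity check (not part of the slides, with an arbitrary noise power N = 2 and base-2 logarithms so that the result is in bits), the sketch below estimates the differential entropy of the Gaussian noise by Monte Carlo, H(Z) = E[−log2 p(Z)], and compares it with (1/2) log2(2πeN):

import numpy as np

rng = np.random.default_rng(0)
N = 2.0                                    # hypothetical noise power E[Z^2]
z = rng.normal(0.0, np.sqrt(N), size=1_000_000)

# Monte Carlo estimate of H(Z) = E[-log2 p(Z)] for Z ~ N(0, N).
log2_pdf = -0.5 * np.log2(2 * np.pi * N) - (z ** 2) / (2 * N * np.log(2))
h_mc = -log2_pdf.mean()

h_formula = 0.5 * np.log2(2 * np.pi * np.e * N)
print(h_mc, h_formula)                     # both ≈ 2.55 bits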
Example-5 (2)

Consequently:

C(P) = max_{p(x)} { H(Y) | E[X²] = P } − (1/2) log(2πeN)

Given that X and Z are independent:

E[Y²] = E[(X + Z)²] = E[X²] + E[Z²] = P + N

We have previously determined the maximum value of the differential entropy under an average power constraint (by applying Gibbs' inequality). Consequently, maximizing H(Y) under the average power constraint E[Y²] = P + N results in:

H(Y) ≤ (1/2) log(2πe(P + N))

with equality if Y is Gaussian with variance P + N.
Example-5 (3)

Therefore:

C(P) = (1/2) log(2πe(P + N)) − (1/2) log(2πeN) = (1/2) log(1 + P/N)

The output Y is given by Y = X + Z:
Z is Gaussian with variance N.
Y is Gaussian with variance P + N.
Consequently, X is also Gaussian with variance P.
Therefore, the probability density function p(x) that maximizes C(P) is p(x) = (1/√(2πP)) e^(−x²/(2P)).
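
The closed-form capacity can be evaluated directly; the sketch below (illustrative values, not from the slides, with base-2 logarithms so that the result is in bits) computes C(P) = (1/2) log2(1 + P/N) for a few SNR values:

import numpy as np

def awgn_capacity(P, N):
    """Capacity of the AWGN channel in bits per channel use,
    under the average power constraint E[X^2] <= P."""
    return 0.5 * np.log2(1 + P / N)

N = 1.0                                    # illustrative noise power
for snr_db in (0, 10, 20, 30):
    P = N * 10 ** (snr_db / 10)            # SNR = P / N
    print(snr_db, round(awgn_capacity(P, N), 3))
# e.g. ~0.5 bits at 0 dB, ~1.73 bits at 10 dB, ~3.33 bits at 20 dB.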

Example-5 (4)

C(P) is shown in the figure below.
[Figure: C(P) (bits) versus the SNR P/N, for SNR from 0 to 50.]

Note that:
lim_{P→+∞} C(P) = +∞.
P/N is the signal-to-noise ratio (SNR).

Example-5 (5)

C(P) is expressed in bits and constitutes an upper bound on the channel coding rate Rc.
Designate by T the separation between two consecutive symbols. Then Rc/T stands for the rate expressed in bits/second.
Therefore, when expressed in bits/s, C(P) takes the form:

C(P) = (1/(2T)) log(1 + SNR)

Assume that the transmitted signal has a bandwidth B and is sampled at the minimum rate of 1/T = 2B (Nyquist rate).
Then C(P) can be written as:

C(P) = B log(1 + SNR)  (bits/s)
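
In this bits/s form, the result is the classical Shannon-Hartley formula; the sketch below (hypothetical bandwidth and SNR values, base-2 logarithm so that the rate is in bits/s) evaluates it:

import numpy as np

def shannon_hartley(B, snr_db):
    """Capacity in bits/s of an AWGN channel of bandwidth B (Hz)
    at the given SNR in dB: C = B log2(1 + SNR)."""
    snr = 10 ** (snr_db / 10)
    return B * np.log2(1 + snr)

# Hypothetical example: a 1 MHz channel at 20 dB SNR.
print(shannon_hartley(1e6, 20))            # ≈ 6.66 Mbit/s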

