Let $Y_0, \ldots, Y_K$ denote these codewords.
For each of these codewords we will compute the probability that, if we send this codeword, the receiver would fail. Let $X_0, \ldots, X_K$, where $K = 2^k - 1$, be the $2^k$ codewords with the lowest probability of failure. We assign these words to the $2^k$ messages we need to encode in an arbitrary fashion. Specifically, for $i = 0, \ldots, 2^k - 1$, we encode $i$ as the string $X_i$.
† Claude Elwood Shannon (April 30, 1916 – February 24, 2001), an American electrical engineer and mathematician, has been called the father of information theory.
The decoding of a message $w$ is done by going over all the codewords, and finding all the codewords that are within (Hamming) distance in the range $[p(1-\varepsilon)n, \, p(1+\varepsilon)n]$ from $w$. If there is only a single word $X_i$ with this property, we return $i$ as the decoded word. Otherwise, if there is no such word, or there is more than one such word, the decoder stops and reports an error.
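This decoding rule is easy to state in code. The following is a minimal sketch (the helper names `hamming` and `decode` are ours, not from the text), with codewords given as bit strings:

```python
def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

def decode(w, codewords, p, eps):
    """Return the index i of the unique codeword whose Hamming distance
    from w lies in [p(1-eps)n, p(1+eps)n], or None if no codeword or
    more than one codeword has this property (decoding failure)."""
    n = len(w)
    lo, hi = p * (1 - eps) * n, p * (1 + eps) * n
    candidates = [i for i, x in enumerate(codewords)
                  if lo <= hamming(x, w) <= hi]
    # Decoding succeeds only if exactly one codeword falls in the ring.
    return candidates[0] if len(candidates) == 1 else None
```

For example, with codewords `["0000", "1111"]`, $p = 1/4$ and $\varepsilon = 1/2$, the ring around each codeword is the set of strings at distance in $[0.5, 1.5]$, i.e., at distance exactly one.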
28.2.1.2 The proof
[Figure: the rings around three codewords $Y_0$, $Y_1$, $Y_2$; each ring consists of the strings at distance between $(1-\varepsilon)r$ and $(1+\varepsilon)r$ from its codeword, where $r = pn$, so the ring width is $2\varepsilon pn$.]
Intuition. Each codeword $Y_i$ corresponds to a region that looks like a ring. The ring for $Y_i$ is the set of all strings in Hamming distance between $(1-\varepsilon)r$ and $(1+\varepsilon)r$ from $Y_i$, where $r = pn$. Clearly, if we transmit a string $Y_i$, and the receiver gets a string inside the ring of $Y_i$, it is natural to try to recover the received string to the original codeword $Y_i$. Naturally, there are two possible bad events here:
(A) The received string is outside the ring of $Y_i$, and
(B) The received string is contained in several rings of different $Y$s, and it is not clear which one the receiver should decode the string to.
These bad regions are depicted as the darker regions in the figure on the right.
Let $S_i = S(Y_i)$ be the set of all binary strings (of length $n$) such that, if the receiver gets such a word, it would decipher it to be the original string assigned to $Y_i$ (here we are still using the extended set of codewords $Y_0, \ldots, Y_K$). Note that if we remove some codewords from consideration, the set $S(Y_i)$ can only increase in size (i.e., the bad region in the ring of $Y_i$ that is covered multiple times shrinks).
Let $W_i$ be the probability that $Y_i$ was sent, but it was not deciphered correctly. Formally, let $r$ denote the received word. We have that
$$W_i = \sum_{r \notin S_i} \Pr\bigl[\text{$r$ was received when $Y_i$ was sent}\bigr]. \qquad (28.1)$$
To bound this quantity, let $\Delta(x, y)$ denote the Hamming distance between the binary strings $x$ and $y$. Clearly, if $x$ was sent, the probability that $y$ was received is
$$w(x, y) = p^{\Delta(x,y)} (1-p)^{n - \Delta(x,y)}.$$
As such, we have
$$\Pr\bigl[\text{$r$ received when $Y_i$ was sent}\bigr] = w(Y_i, r).$$
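As a quick sanity check, for a fixed transmitted string $x$ the weights $w(x, \cdot)$ form a probability distribution over all $2^n$ possible received strings. A small sketch (the helper names are ours):

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

def w(x, y, p):
    """Probability that y is received when x is sent over a channel
    that flips each bit independently with probability p."""
    d = hamming(x, y)
    return p**d * (1 - p)**(len(x) - d)

# Sanity check: w(x, .) sums to 1 over all 2^n strings y.
n, p = 6, 0.2
x = "010110"
total = sum(w(x, "".join(y), p) for y in product("01", repeat=n))
# total equals 1 up to floating-point error
```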
Let $S_{i,r}$ be an indicator variable which is $1$ if $r \notin S_i$ (i.e., decoding fails if $r$ is received). We have that
$$W_i = \sum_{r \notin S_i} \Pr\bigl[\text{$r$ received when $Y_i$ was sent}\bigr] = \sum_{r \notin S_i} w(Y_i, r) = \sum_{r} S_{i,r}\, w(Y_i, r). \qquad (28.2)$$
The value of $W_i$ is a random variable over the choice of $Y_0, \ldots, Y_K$. As such, it is natural to ask what is the expected value of $W_i$.
Consider the ring
$$\mathrm{ring}(r) = \Bigl\{ x \in \{0,1\}^n \;\Bigm|\; (1-\varepsilon)pn \le \Delta(x, r) \le (1+\varepsilon)pn \Bigr\}.$$

Lemma 28.2.1 Let $\beta = \Pr\bigl[(r \notin S_i) \cap (r \in \mathrm{ring}(Y_i))\bigr]$. Then $\beta \le \gamma/8$, where $\gamma$ is the parameter of Theorem 28.1.3.
Proof: The decoder fails here only if $\mathrm{ring}(r)$ contains some other codeword $Y_j$ ($j \ne i$) in it. As such,
$$\beta = \Pr\bigl[(r \notin S_i) \cap (r \in \mathrm{ring}(Y_i))\bigr] \le \Pr\bigl[Y_j \in \mathrm{ring}(r) \text{ for some } j \ne i\bigr] \le \sum_{j \ne i} \Pr\bigl[Y_j \in \mathrm{ring}(r)\bigr].$$
Now, we remind the reader that the $Y_j$s are generated by picking each bit randomly and independently with probability $1/2$. As such, we have
$$\Pr\bigl[Y_j \in \mathrm{ring}(r)\bigr] = \frac{\bigl|\mathrm{ring}(r)\bigr|}{\bigl|\{0,1\}^n\bigr|} = \frac{1}{2^n} \sum_{m=(1-\varepsilon)np}^{(1+\varepsilon)np} \binom{n}{m} \le \frac{n}{2^n} \binom{n}{(1+\varepsilon)np},$$
since $(1+\varepsilon)p < 1/2$ (for $\varepsilon$ sufficiently small), and as such the last binomial coefficient in this summation is the largest. By Corollary 28.3.2 (i), we have
$$\Pr\bigl[Y_j \in \mathrm{ring}(r)\bigr] \le \frac{n}{2^n} \binom{n}{(1+\varepsilon)np} \le \frac{n}{2^n}\, 2^{n H((1+\varepsilon)p)} = n\, 2^{n\bigl(H((1+\varepsilon)p) - 1\bigr)}.$$
As such, we have
$$\beta = \Pr\bigl[(r \notin S_i) \cap (r \in \mathrm{ring}(Y_i))\bigr] \le \sum_{j \ne i} \Pr\bigl[Y_j \in \mathrm{ring}(r)\bigr] \le K \Pr\bigl[Y_1 \in \mathrm{ring}(r)\bigr] \le 2^{k+1}\, n\, 2^{n\bigl(H((1+\varepsilon)p) - 1\bigr)} \le n\, 2^{n(1 - H(p) - \delta) + 1 + n\bigl(H((1+\varepsilon)p) - 1\bigr)} = n\, 2^{n\bigl(H((1+\varepsilon)p) - H(p) - \delta\bigr) + 1},$$
since $k \le n(1 - H(p) - \delta)$. Now, we choose $\varepsilon$ to be a small enough constant, so that the quantity $H\bigl((1+\varepsilon)p\bigr) - H(p) - \delta$ is equal to some (absolute) negative constant, say $-\zeta$, where $\zeta > 0$. Then, $\beta \le n\, 2^{-\zeta n + 1}$, and choosing $n$ large enough, we can make $\beta$ smaller than $\gamma/8$, as desired. As such, we just proved that
$$\beta = \Pr\bigl[(r \notin S_i) \cap (r \in \mathrm{ring}(Y_i))\bigr] \le \frac{\gamma}{8}.$$
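The existence of a suitable $\varepsilon$ is a continuity statement: as $\varepsilon \to 0$, $H((1+\varepsilon)p) \to H(p)$, so the gap tends to $-\delta < 0$. A quick numeric check, with illustrative values of $p$ and $\delta$:

```python
import math

def H(q):
    """Binary entropy function (in bits)."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

p, delta = 0.1, 0.05
# As eps shrinks, H((1+eps)p) approaches H(p), so the gap approaches -delta.
gaps = {eps: H((1 + eps) * p) - H(p) - delta for eps in (0.5, 0.1, 0.01)}
```

With these values the gap is still positive at $\varepsilon = 0.5$ but already negative at $\varepsilon = 0.1$, illustrating why $\varepsilon$ must be chosen small.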
Lemma 28.2.2 Consider the situation where $Y_i$ is sent, and the received string is $r$. We have that
$$\Pr\bigl[r \notin \mathrm{ring}(Y_i)\bigr] = \sum_{r \notin \mathrm{ring}(Y_i)} w(Y_i, r) \le \frac{\gamma}{8},$$
where $\gamma$ is the parameter of Theorem 28.1.3.
Proof: This quantity is the probability of sending $Y_i$, when every bit is flipped with probability $p$, and receiving a string $r$ such that more than $pn + \varepsilon pn$ bits were flipped (or less than $pn - \varepsilon pn$). But this quantity can be bounded using the Chernoff inequality. Indeed, let $Z = \Delta(Y_i, r)$, and observe that $\mathrm{E}[Z] = pn$, and that $Z$ is the sum of $n$ independent indicator variables. As such,
$$\sum_{r \notin \mathrm{ring}(Y_i)} w(Y_i, r) = \Pr\Bigl[\bigl|Z - \mathrm{E}[Z]\bigr| > \varepsilon pn\Bigr] \le 2 \exp\Bigl(-\frac{\varepsilon^2}{4}\, pn\Bigr) < \frac{\gamma}{8},$$
since $\varepsilon$ is a constant, and for $n$ sufficiently large.
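The Chernoff bound used here can be sanity-checked by simulation. The sketch below (a Monte Carlo estimate with illustrative parameters, not part of the proof) compares the empirical tail of $Z \sim \mathrm{Bin}(n, p)$ against the bound $2\exp(-\varepsilon^2 pn / 4)$:

```python
import math
import random

def tail_estimate(n, p, eps, trials=2000, seed=1):
    """Monte Carlo estimate of Pr[|Z - pn| > eps*pn] for Z ~ Bin(n, p)."""
    rng = random.Random(seed)
    pn = p * n
    hits = 0
    for _ in range(trials):
        # Z is a sum of n independent indicator variables.
        z = sum(rng.random() < p for _ in range(n))
        if abs(z - pn) > eps * pn:
            hits += 1
    return hits / trials

n, p, eps = 1000, 0.1, 0.3
empirical = tail_estimate(n, p, eps)
bound = 2 * math.exp(-(eps**2 / 4) * p * n)
# The empirical tail falls well below the Chernoff-style bound.
```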
Lemma 28.2.3 We have that $f(Y_i) = \sum_{r \notin \mathrm{ring}(Y_i)} \mathrm{E}\bigl[S_{i,r}\, w(Y_i, r)\bigr] \le \gamma/8$ (the expectation is over all the choices of the $Y$s excluding $Y_i$).
Proof: Observe that $S_{i,r}\, w(Y_i, r) \le w(Y_i, r)$, and that for fixed $Y_i$ and $r$ we have $\mathrm{E}\bigl[w(Y_i, r)\bigr] = w(Y_i, r)$. As such, we have that
$$f(Y_i) = \sum_{r \notin \mathrm{ring}(Y_i)} \mathrm{E}\bigl[S_{i,r}\, w(Y_i, r)\bigr] \le \sum_{r \notin \mathrm{ring}(Y_i)} \mathrm{E}\bigl[w(Y_i, r)\bigr] = \sum_{r \notin \mathrm{ring}(Y_i)} w(Y_i, r) \le \frac{\gamma}{8},$$
by Lemma 28.2.2.
Lemma 28.2.4 We have that $g(Y_i) = \sum_{r \in \mathrm{ring}(Y_i)} \mathrm{E}\bigl[S_{i,r}\, w(Y_i, r)\bigr] \le \gamma/8$ (the expectation is over all the choices of the $Y$s excluding $Y_i$).
Proof: Observe that $S_{i,r}$ depends only on the choice of the codewords other than $Y_i$, while the event that $r$ is received (which has probability $w(Y_i, r)$) depends only on the channel noise, and these two sources of randomness are independent. As such, we have that
$$g(Y_i) = \sum_{r \in \mathrm{ring}(Y_i)} \mathrm{E}\bigl[S_{i,r}\, w(Y_i, r)\bigr] = \sum_{r \in \mathrm{ring}(Y_i)} \Pr\bigl[r \notin S_i\bigr]\, w(Y_i, r) = \Pr\bigl[(r \notin S_i) \cap (r \in \mathrm{ring}(Y_i))\bigr] \le \frac{\gamma}{8},$$
by Lemma 28.2.1.
Lemma 28.2.5 For any $i$, we have $\mu = \mathrm{E}[W_i] \le \gamma/4$, where $\gamma$ is the parameter of Theorem 28.1.3, and where $W_i$ is the probability of failure to recover $Y_i$ if it was sent; see Eq. (28.1).
Proof: We have by Eq. (28.2) that $W_i = \sum_{r} S_{i,r}\, w(Y_i, r)$. For a fixed value of $Y_i$, we have, by linearity of expectation, that
$$\mathrm{E}\bigl[W_i \bigm| Y_i\bigr] = \mathrm{E}\Bigl[\sum_{r} S_{i,r}\, w(Y_i, r) \Bigm| Y_i\Bigr] = \sum_{r} \mathrm{E}\bigl[S_{i,r}\, w(Y_i, r) \bigm| Y_i\bigr]$$
$$= \sum_{r \in \mathrm{ring}(Y_i)} \mathrm{E}\bigl[S_{i,r}\, w(Y_i, r) \bigm| Y_i\bigr] + \sum_{r \notin \mathrm{ring}(Y_i)} \mathrm{E}\bigl[S_{i,r}\, w(Y_i, r) \bigm| Y_i\bigr] = g(Y_i) + f(Y_i) \le \frac{\gamma}{8} + \frac{\gamma}{8} = \frac{\gamma}{4},$$
by Lemma 28.2.3 and Lemma 28.2.4. Now $\mathrm{E}[W_i] = \mathrm{E}\bigl[\mathrm{E}[W_i \mid Y_i]\bigr] \le \mathrm{E}[\gamma/4] = \gamma/4$.
In the following, we need the following trivial (but surprisingly deep) observation.
Observation 28.2.6 For a random variable $X$, if $\mathrm{E}[X] \le \psi$, then there exists an event in the probability space that assigns $X$ a value $\le \psi$.
Lemma 28.2.7 For the codewords $X_0, \ldots, X_K$, the probability of failure in recovering them when sending them over the noisy channel is at most $\gamma$.
Proof: We just proved that when using $Y_0, \ldots, Y_K$, the expected probability of failure when sending $Y_i$ is $\mathrm{E}[W_i] \le \gamma/4$, where $K = 2^{k+1} - 1$. As such, the expected total probability of failure is
$$\mathrm{E}\Bigl[\sum_{i=0}^{K} W_i\Bigr] = \sum_{i=0}^{K} \mathrm{E}[W_i] \le \frac{\gamma}{4}\, 2^{k+1} = \gamma\, 2^{k-1},$$
by Lemma 28.2.5. As such, by Observation 28.2.6, there exists a choice of $Y_i$s, such that
$$\sum_{i=0}^{K} W_i \le \gamma\, 2^{k-1}.$$
Now, we use an argument similar to the one used in proving Markov's inequality. Indeed, the $W_i$ are always positive, and it can not be that $2^k$ of them have value larger than $\gamma/2$, because then, in the summation, we would get that
$$\sum_{i=0}^{K} W_i > 2^k\, \frac{\gamma}{2} = \gamma\, 2^{k-1},$$
which is a contradiction. As such, there are $2^k$ codewords with failure probability at most $\gamma/2$. We set the $2^k$ codewords $X_0, \ldots, X_K$ to be these words, where $K = 2^k - 1$. Since we picked only a subset of the codewords for our code, the probability of failure for each codeword only shrinks, and is at most $\gamma$.
Lemma 28.2.7 concludes the proof of the constructive part of Shannon's theorem.
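The counting step in the proof of Lemma 28.2.7 can be illustrated numerically. The sketch below uses a hypothetical helper `prune` and made-up failure probabilities: given $2^{k+1}$ nonnegative values summing to at most $\gamma\, 2^{k-1}$, it keeps the $2^k$ smallest, and the counting argument guarantees each kept value is at most $\gamma/2$:

```python
def prune(failure_probs, k, gamma):
    """Given 2^(k+1) failure probabilities summing to at most gamma * 2^(k-1),
    return the indices of the 2^k codewords with the smallest values.
    By the Markov-style counting argument, each kept value is <= gamma/2."""
    assert len(failure_probs) == 2 ** (k + 1)
    assert sum(failure_probs) <= gamma * 2 ** (k - 1)
    order = sorted(range(len(failure_probs)), key=failure_probs.__getitem__)
    kept = order[: 2 ** k]
    # If 2^k values exceeded gamma/2, the sum would exceed gamma * 2^(k-1).
    assert all(failure_probs[i] <= gamma / 2 for i in kept)
    return kept

# Illustrative values: 8 probabilities (k = 2) summing to 0.78 <= 0.4 * 2.
kept = prune([0.05, 0.01, 0.3, 0.02, 0.15, 0.1, 0.05, 0.1], k=2, gamma=0.4)
```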
28.2.2 Lower bound on the message size
We omit the proof of this part. It follows a similar argumentation, showing that for every ring associated with a codeword, most of the ring must be covered only by this ring (otherwise, there is no hope for recovery). Then an easy packing argument implies the claim.
28.3 From previous lectures
Lemma 28.3.1 Suppose that $nq$ is an integer in the range $[0, n]$. Then
$$\frac{2^{nH(q)}}{n+1} \le \binom{n}{nq} \le 2^{nH(q)}.$$
Lemma 28.3.1 can be extended to handle non-integer values of $q$. This is straightforward, and we omit the easy details.
Corollary 28.3.2 We have:
(i) $\forall q \in [0, 1/2]$: $\binom{n}{\lfloor nq \rfloor} \le 2^{nH(q)}$.
(ii) $\forall q \in [1/2, 1]$: $\binom{n}{\lceil nq \rceil} \le 2^{nH(q)}$.
(iii) $\forall q \in [1/2, 1]$: $\frac{2^{nH(q)}}{n+1} \le \binom{n}{\lceil nq \rceil}$.
(iv) $\forall q \in [0, 1/2]$: $\frac{2^{nH(q)}}{n+1} \le \binom{n}{\lfloor nq \rfloor}$.
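For integer $nq$, the bounds of Lemma 28.3.1 are easy to verify numerically; a small check, with the binary entropy function $H$ written out explicitly:

```python
import math

def H(q):
    """Binary entropy function (in bits)."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

# Check 2^(nH(q)) / (n+1) <= C(n, nq) <= 2^(nH(q)) for all integer nq.
n = 100
for nq in range(n + 1):
    q = nq / n
    c = math.comb(n, nq)
    assert 2 ** (n * H(q)) / (n + 1) <= c <= 2 ** (n * H(q))
```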
Theorem 28.3.3 Suppose that the value of a random variable $X$ is chosen uniformly at random from the integers $\{0, \ldots, m-1\}$. Then there is an extraction function for $X$ that outputs on average at least $\lfloor \lg m \rfloor - 1 = \lfloor H(X) \rfloor - 1$ independent and unbiased bits.
28.4 Bibliographical Notes
The presentation here follows [MU05, Sections 9.1–9.3].
Bibliography
[MU05] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.