Index Terms-Shannon theory, channel capacity, channel coding theorem, channels with memory, strong converse.
I. INTRODUCTION
SHANNON'S formula [1] for channel capacity (the supremum of all rates $R$ for which there exist sequences of codes with vanishing error probability and whose size grows with the block length $n$ as $\exp(nR)$),
$$C = \max_{X} I(X; Y), \qquad (1.1)$$
holds for memoryless channels. If the channel has memory, then (1.1) generalizes to the familiar limiting expression
$$C = \lim_{n \to \infty} \sup_{X^n} \frac{1}{n} I(X^n; Y^n). \qquad (1.2)$$
The classical converse obtained from the Fano inequality yields the bound
$$C \le \liminf_{n \to \infty} \sup_{X^n} \frac{1}{n} I(X^n; Y^n). \qquad (1.3)$$
The main result of this paper is a completely general formula for channel capacity,
$$C = \sup_{X} \underline{I}(X; Y), \qquad (1.4)$$
where $\underline{I}(X; Y)$ denotes the inf-information rate of $(X, Y)$: the liminf in probability of the normalized information density $\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n)$.
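To fix ideas about the information spectrum entering (1.4), the following minimal sketch (assuming, purely for concreteness, a memoryless binary symmetric channel with equiprobable inputs and an arbitrary crossover probability) samples the normalized information density and shows that it concentrates around the mutual information, so that for such a channel the inf-information rate coincides with the mutual information rate.

```python
# Illustrative sketch: empirical information spectrum of a BSC(p) with
# i.i.d. equiprobable inputs. The normalized information density is
#   (1/n) i(X^n; Y^n) = log 2 + (K/n) log p + (1 - K/n) log(1 - p)
# in nats, with K the number of channel errors; it concentrates around
# I(X;Y) = log 2 - h(p), so inf- and sup-information rates coincide here.

import math
import random

def information_density_sample(n: int, p: float) -> float:
    k = sum(random.random() < p for _ in range(n))   # number of bit flips
    return math.log(2) + (k / n) * math.log(p) + (1 - k / n) * math.log(1 - p)

random.seed(0)
n, p = 2000, 0.11
samples = [information_density_sample(n, p) for _ in range(1000)]
mutual_info = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)
print("I(X;Y) =", round(mutual_info, 4), "nats")
print("spread =", round(min(samples), 4), "to", round(max(samples), 4))
```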
The conventional definition of channel capacity is (e.g., [13]) the following.

Definition 1: An $(n, M, \epsilon)$ code has block length $n$, $M$ codewords, and error probability⁵ not larger than $\epsilon$. $R \ge 0$ is an $\epsilon$-achievable rate if, for every $\delta > 0$, there exist, for all sufficiently large $n$, $(n, M, \epsilon)$ codes with rate
$$\frac{1}{n}\log M > R - \delta.$$
The maximum $\epsilon$-achievable rate is called the $\epsilon$-capacity $C_\epsilon$. The channel capacity $C$ is defined as the maximal rate that is $\epsilon$-achievable for all $0 < \epsilon < 1$. It follows immediately from the definition that $C = \lim_{\epsilon \to 0} C_\epsilon$.

⁵We work throughout with average error probability. It is well known that the capacity of a single-user channel with known statistical description remains the same under the maximal error probability criterion.

The basis to prove the desired lower and upper bounds on capacity are respective upper and lower bounds on the error probability of a code as a function of its size. The following classical result (Feinstein's lemma) [15] shows the existence of a code with a guaranteed error probability as a function of its size.

Theorem 1: Fix a positive integer $n$ and $0 < \epsilon < 1$. For every $\gamma > 0$ and input distribution $P_{X^n}$ on $A^n$, there exists an $(n, M, \epsilon)$ code for the transition probability $W^n = P_{Y^n|X^n}$ that satisfies
$$\epsilon \le P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le \frac{1}{n}\log M + \gamma\right] + \exp(-\gamma n). \qquad (2.1)$$

II. DIRECT CODING THEOREM: $C \ge \sup_X \underline{I}(X; Y)$

Theorem 2:
$$C \ge \sup_X \underline{I}(X; Y). \qquad (2.2)$$

Proof: ... where the second inequality holds for all sufficiently large $n$ because of the definition of $\underline{I}(X; Y)$. In view of (2.4), Theorem 1 guarantees the existence of the desired codes. □

III. CONVERSE CODING THEOREM: $C \le \sup_X \underline{I}(X; Y)$

This section is devoted to our main result: a tight converse that holds in full generality. To that end, we need to obtain for any arbitrary code a lower bound on its error probability as a function of its size or, equivalently, an upper bound on its size as a function of its error probability. One such bound is the standard one resulting from the Fano inequality.

Theorem 3: Every $(n, M, \epsilon)$ code satisfies
$$\log M \le \frac{1}{1 - \epsilon}\left[ I(X^n; Y^n) + h(\epsilon) \right] \qquad (3.1)$$
where $h$ is the binary entropy function, $X^n$ is the input distribution that places probability mass $1/M$ on each of the input codewords, and $Y^n$ is its corresponding output distribution.
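As a minimal numerical sketch of Theorem 1 (assuming, for illustration only, a memoryless binary symmetric channel with equiprobable inputs; the rate and parameter values are arbitrary choices), the right-hand side of (2.1) can be evaluated exactly, since the information density depends only on the number of channel errors.

```python
# Evaluating the Feinstein bound (2.1) for a BSC(p) with equiprobable
# inputs. The normalized information density equals
#   log 2 + (K/n) log p + (1 - K/n) log(1 - p)   (nats),
# where K ~ Binomial(n, p) is the number of channel errors.

import math

def feinstein_bound(n: int, rate: float, gamma: float, p: float) -> float:
    """Upper bound on error probability guaranteed by (2.1), code size
    M = exp(n * rate); rate and gamma are in nats per channel use."""
    threshold = rate + gamma                      # (1/n) log M + gamma
    prob_atypical = 0.0
    for k in range(n + 1):                        # K = number of bit flips
        info_density = math.log(2) + (k / n) * math.log(p) \
                       + (1 - k / n) * math.log(1 - p)
        if info_density <= threshold:
            prob_atypical += math.comb(n, k) * p**k * (1 - p)**(n - k)
    return prob_atypical + math.exp(-gamma * n)

# BSC(0.11) has capacity ~0.347 nat; a rate below capacity gives a bound
# that decays with the block length n.
for n in (100, 500, 1000):
    print(n, feinstein_bound(n, rate=0.25, gamma=0.02, p=0.11))
```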
The use of Theorem 3 to prove converses is standard: if $R$ is an $\epsilon$-achievable rate, then for every $\delta > 0$ and all sufficiently large $n$ there exist $(n, M, \epsilon)$ codes with $\frac{1}{n}\log M > R - \delta$, and we can write
$$R - \delta < \frac{1}{n}\log M \le \frac{1}{n(1-\epsilon)}\left[ I(X^n; Y^n) + h(\epsilon) \right], \qquad (3.2)$$
which results in
$$R \le \frac{1}{1-\epsilon}\liminf_{n\to\infty}\sup_{X^n}\frac{1}{n} I(X^n; Y^n). \qquad (3.3)$$
The key new tool in the general converse is, instead, the following lower bound on the error probability of an arbitrary code.

Theorem 4: Every $(n, M, \epsilon)$ code satisfies
$$\epsilon \ge P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le \frac{1}{n}\log M - \gamma\right] - \exp(-\gamma n) \qquad (3.4)$$
for every $\gamma > 0$, where $X^n$ places probability mass $1/M$ on each codeword.

Proof: Denote $\beta = \exp(-\gamma n)$. Note first that the event whose probability appears in (3.4) is equal to the set of "atypical" input-output pairs
$$L = \left\{ (a^n, b^n) \in A^n \times B^n : P_{X^n|Y^n}(a^n|b^n) \le \beta \right\}. \qquad (3.5)$$
This is because the information density can be written as
$$i_{X^n W^n}(a^n; b^n) = \log\frac{P_{X^n|Y^n}(a^n|b^n)}{P_{X^n}(a^n)} \qquad (3.6)$$
and $P_{X^n}(c_i) = 1/M$ for each of the $M$ codewords $c_i \in A^n$. We need to show that
$$P_{X^n Y^n}[L] \le \epsilon + \beta. \qquad (3.7)$$
Now, denoting the decoding set corresponding to $c_i$ by $D_i$, and letting $B_i = \{ b^n : (c_i, b^n) \in L \}$, we note that on $L$
$$P_{X^n Y^n}(c_i, b^n) = P_{X^n|Y^n}(c_i|b^n)\, P_{Y^n}(b^n) \le \beta\, P_{Y^n}(b^n), \qquad (3.8)$$
and we can write
$$P_{X^n Y^n}[L] = \sum_{i=1}^{M} P_{X^n Y^n}[(c_i, B_i)] = \sum_{i=1}^{M} P_{X^n Y^n}[(c_i, B_i \cap D_i^c)] + \sum_{i=1}^{M} P_{X^n Y^n}[(c_i, B_i \cap D_i)] \le \epsilon + \sum_{i=1}^{M} P_{X^n Y^n}[(c_i, B_i \cap D_i)] \le \epsilon + \beta \qquad (3.9)$$
where the second inequality is due to (3.8) and the disjointness of the decoding sets. □

Theorems 3 and 4 hold for arbitrary random transformations, in which general setting they are nothing but lower bounds on the minimum error probability of $M$-ary equiprobable hypothesis testing. If, in that general setting, we denote the observations by $Y$ and the true hypothesis by $X$ (equiprobable on $\{1, \dots, M\}$), the $M$ hypothesized distributions are the conditional distributions $\{P_{Y|X=i},\ i = 1, \dots, M\}$. The bound in Theorem 3 yields
$$\epsilon \ge 1 - \frac{I(X; Y) + \log 2}{\log M}.$$
A slightly weaker result is known in statistical inference as Fano's lemma [16]. The bound in Theorem 4 can easily be seen to be equivalent to the more general version
$$\epsilon \ge P\left[ P_{X|Y}(X|Y) \le \alpha \right] - \alpha$$
for arbitrary $0 \le \alpha \le 1$. A stronger bound which holds without the assumption of equiprobable hypotheses has been found recently in [17].

Theorem 4 gives a family (parametrized by $\gamma$) of lower bounds on the error probability. To obtain the best bound, we simply maximize the right-hand side of (3.4) over $\gamma$. However, a judicious, if not optimum, choice of $\gamma$ is sufficient for the purposes of proving the general converse.
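The hypothesis-testing form of the bound above can be checked numerically. The following minimal sketch (assuming an arbitrary toy test with three equiprobable hypotheses and a four-letter observation alphabet) verifies that the minimum error probability dominates $P[P_{X|Y}(X|Y) \le \alpha] - \alpha$ for every $\alpha$.

```python
# Numerical check of: error probability >= P[ P_{X|Y}(X|Y) <= alpha ] - alpha
# for equiprobable M-ary hypothesis testing with a finite observation
# alphabet. The conditional distributions below are arbitrary choices.

import numpy as np

# M = 3 hypotheses, |Y| = 4 observation values: rows are P_{Y|X=i}.
P_y_given_x = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.15, 0.40, 0.40],
])
M = P_y_given_x.shape[0]
P_xy = P_y_given_x / M                      # joint distribution (X equiprobable)
P_y = P_xy.sum(axis=0)                      # marginal of Y
P_x_given_y = P_xy / P_y                    # posterior P_{X|Y}(x|y)

# Exact minimum (MAP) error probability.
eps_min = 1.0 - P_xy.max(axis=0).sum()

# The bound must hold for every alpha in [0, 1].
for alpha in np.linspace(0.0, 1.0, 21):
    lower = P_xy[P_x_given_y <= alpha].sum() - alpha
    assert eps_min >= lower - 1e-12, (alpha, eps_min, lower)

print("MAP error probability:", round(eps_min, 4))
```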
Theorem 5:
$$C \le \sup_X \underline{I}(X; Y). \qquad (3.10)$$

Proof: The intuition behind the use of Theorem 4 to prove the converse is very simple. As a shorthand, let us refer to a sequence of codes with vanishingly small error probability (i.e., a sequence of $(n, M, \epsilon_n)$ codes such that $\epsilon_n \to 0$) as a reliable code sequence. Also, we will say that the information spectrum of a code (a term coined in [10]) is the distribution of the normalized information density evaluated with the input distribution $X^n$ that places equal probability mass on each of the codewords of the code. Theorem 4 implies that if a reliable code sequence has rate $R$, then the mass of its information spectrum lying strictly to the left of $R$ must be asymptotically negligible. In other words, $R \le \underline{I}(X; Y)$ where $X$ corresponds to the sequence of input distributions generated by the sequence of codebooks.

To formalize this reasoning, let us argue by contradiction and assume that for some $\beta > 0$,
$$C = \sup_X \underline{I}(X; Y) + 3\beta. \qquad (3.11)$$
By definition of capacity, there exists a reliable code sequence with rate
$$\frac{1}{n}\log M > C - \beta. \qquad (3.12)$$
Now, letting $X^n$ be the distribution that places probability mass $1/M$ on the codewords of that code, Theorem 4 (choosing $\gamma = \beta$), (3.11) and (3.12) imply that the error probability must be lower bounded by
$$\epsilon_n \ge P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le \sup_X \underline{I}(X; Y) + \beta\right] - \exp(-n\beta). \qquad (3.13)$$
But, by definition of $\underline{I}(X; Y)$, the probability on the right-hand side of (3.13) cannot vanish asymptotically, thereby contradicting the fact that $\epsilon_n \to 0$. □

Besides the behavior of the information spectrum of a reliable code sequence revealed in the proof of Theorem 5, it is worth pointing out that the information spectrum of any code places no probability mass above its rate. To see this, simply note that (3.6) implies
$$P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le \frac{1}{n}\log M\right] = 1. \qquad (3.14)$$
Thus, we can conclude that the normalized information density of a reliable code sequence converges in probability to its rate. For finite-input channels, this implies [10, Lemma 1] the same behavior for the sequence of normalized mutual informations, thereby yielding the classical bound (1.3). However, that bound is not tight for information unstable channels because, in that case, the mutual information is maximized by input distributions whose information spectrum does not converge to a single point mass (unlike the behavior of the information spectrum of a reliable code sequence).

Upon reflecting on the proofs of the general direct and converse theorems presented in Sections II and III, we can see that those results follow from asymptotically tight upper and lower bounds on error probability, and are decoupled from ergodic results such as the law of large numbers or the asymptotic equipartition property. Those ergodic results enter the picture only as a way to particularize the general capacity formula to special classes of channels (such as memoryless or information stable channels) so that capacity can be written in terms of the mutual information rate.

Unlike the conventional approach to the converse coding theorem (Theorem 3), Theorem 4 can be used to provide a formula for ε-capacity, as we show in Section IV.

Another problem where Theorem 4 proves to be the key result is that of combined source/channel coding [18]. It turns out that when dealing with arbitrary sources and channels, the separation theorem may not hold because, in general, it could happen that a source is transmissible over a channel even if the minimum achievable source coding rate (sup-entropy rate) exceeds the channel capacity. Necessary and sufficient conditions for the transmissibility of a source over a channel are obtained in [18].

Definition 1 is the conventional definition of channel capacity (cf. [15] and [13]) where codes are required to be reliable for all sufficiently large block lengths. An alternative, more optimistic, definition of capacity can be considered where codes are required to be reliable only infinitely often. This definition is less appealing in many practical situations because of the additional uncertainty in the favorable block lengths. Both definitions turn out to lead to the same capacity formula for specific channel classes such as discrete memoryless channels [13]. However, in general, both quantities need not be equal, and the optimistic definition does not appear to admit a simple general formula such as the one in (1.4) for the conventional definition. In particular, the optimistic capacity need not be equal to the supremum of sup-information rates. See [18] for further characterization of this quantity.

The conventional definition of capacity may be faulted for being too conservative in those rare situations where the maximum amount of reliably transmissible information does not grow linearly with the block length, but, rather, as $O(b(n))$. For example, consider the case $b(n) = n + n\sin(\alpha n)$. This can be easily taken into account by "seasonal adjusting": substitution of $n$ by $b(n)$ in the definition of rate and in all previous results.

IV. ε-CAPACITY

The fundamental tools (Theorems 1 and 4) we used in Section III to prove the general capacity formula are used in this section to find upper and lower bounds on $C_\epsilon$, the ε-capacity of the channel, for $0 < \epsilon < 1$. These bounds coincide at the points where the ε-capacity is a continuous function of $\epsilon$.

Theorem 6: For $0 < \epsilon < 1$, the ε-capacity $C_\epsilon$ satisfies
$$C_\epsilon \le \sup_X \sup\{ R : F_X(R) \le \epsilon \} \qquad (4.1)$$
$$C_\epsilon \ge \sup_X \sup\{ R : F_X(R) < \epsilon \} \qquad (4.2)$$
where $F_X(R)$ denotes the limit of cumulative distribution functions
$$F_X(R) = \limsup_{n\to\infty} P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right]. \qquad (4.3)$$
The bounds (4.1) and (4.2) hold with equality, except possibly at the points of discontinuity of $C_\epsilon$, of which there are, at most, countably many.
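The quantile form of (4.1) is easy to visualize on a sketch example. Assume, purely for illustration, a channel that equals one of two binary symmetric channels, selected once and for all with probability 1/2 each; with i.i.d. equiprobable inputs the limiting spectrum $F(R)$ has two jumps, and $\sup\{R : F(R) \le \epsilon\}$ becomes a step function of $\epsilon$.

```python
# Illustrative sketch: epsilon-capacity via the quantile of the limiting
# information spectrum, for a channel that is BSC(p1) or BSC(p2) forever,
# each with probability 1/2. With i.i.d. equiprobable inputs the normalized
# information density converges to 1 - h(p1) or 1 - h(p2) (bits), so F(R)
# is a two-step function.

import math

def h2(p: float) -> float:
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def F(R: float, c_low: float, c_high: float) -> float:
    """Limiting CDF of the normalized information density (two point masses)."""
    if R < c_low:
        return 0.0
    if R < c_high:
        return 0.5
    return 1.0

def eps_capacity(eps: float, c_low: float, c_high: float, grid: int = 10000) -> float:
    """sup{R : F(R) <= eps}, evaluated on a grid (bound (4.1) for this input)."""
    rates = [i / grid for i in range(grid + 1)]
    return max(R for R in rates if F(R, c_low, c_high) <= eps)

c1, c2 = 1 - h2(0.11), 1 - h2(0.4)          # roughly 0.5 and 0.03 bits
c_low, c_high = min(c1, c2), max(c1, c2)
for eps in (0.1, 0.4, 0.5, 0.9):
    print(eps, round(eps_capacity(eps, c_low, c_high), 3))
```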
Proof: If $R$ is an $\epsilon$-achievable rate, then for every $\delta > 0$ there exist, for all sufficiently large $n$, $(n, M, \epsilon)$ codes with rate $\frac{1}{n}\log M > R - \delta$. If we apply Theorem 4 to those codes, and we let $X^n$ distribute its probability mass evenly on the $n$th codebook, we obtain
$$\epsilon \ge P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le \frac{1}{n}\log M - \delta\right] - \exp(-\delta n) \ge P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R - 2\delta\right] - \exp(-\delta n). \qquad (4.5)$$
Since (4.5) holds for all sufficiently large $n$ and every $\delta > 0$, we must have
$$F_X(R - 2\delta) \le \epsilon, \qquad \text{for all } \delta > 0, \qquad (4.6)$$
but $R$ satisfies (4.6) if and only if $R \le \sup\{R : F_X(R) \le \epsilon\}$. Concluding, any $\epsilon$-achievable rate is upper bounded by the right-hand side of (4.1), as we wanted to show.

In order to prove the direct part (4.2), we will show that for every $X$, any $R$ belonging to the set in the following right-hand side is $\epsilon$-achievable:
$$\{ R : F_X(R - \delta) < \epsilon, \ \text{for all } \delta > 0 \}.$$
For any such $R$ and any $\delta > 0$, the definition of $F_X$ implies that there exists $n_0$ such that
$$P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R - \delta\right] \le \epsilon - \exp(-\delta n) \qquad \text{if } n > n_0. \qquad (4.7)$$
Theorem 1 ensures the existence (for any $\delta > 0$) of a sequence of codes with rate
$$R - 3\delta \le \frac{1}{n}\log M \le R - 2\delta$$
and error probability
$$\epsilon_n \le P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le \frac{1}{n}\log M + \delta\right] + \exp(-\delta n) \le \epsilon$$
for all sufficiently large $n$. Thus, $R$ is $\epsilon$-achievable.

It is easy to see that the bounds are tight except at the points of discontinuity of $C_\epsilon$. Let $U(\epsilon)$ and $L(\epsilon)$ denote the right-hand sides of (4.1) and (4.2), respectively. Since $U(\epsilon)$ is monotone nondecreasing, the set $D \subset (0,1)$ of $\epsilon$ at which it is discontinuous is, at most, countable. Select any $\epsilon \in (0,1) - D$ and a strictly increasing sequence $(\epsilon_1, \epsilon_2, \dots)$ in $(0,1)$ converging to $\epsilon$. Since $F_X(R)$ is nondecreasing, we have
$$L(\epsilon) = \sup_X \sup\{R : F_X(R) < \epsilon\} = \sup_X \sup_i \sup\{R : F_X(R) \le \epsilon_i\} = \sup_i \sup_X \sup\{R : F_X(R) \le \epsilon_i\} = \sup_i U(\epsilon_i) = U(\epsilon)$$
where the last equality holds because $U(\cdot)$ is continuous and nondecreasing at $\epsilon$. □

In the special case of stationary discrete channels, the functional in (4.1) boils down to the quantile introduced in [7] to determine ε-capacity, except for a countable number of values of $\epsilon$. The ε-capacity formula in [7] was actually proved for a class of discrete stationary channels (so-called regular decomposable channels) that includes ergodic channels and a narrow class of nonergodic channels. The formula for the capacity of discrete stationary nonanticipatory channels given in [9] is the limit as $\epsilon \to 0$ of the right-hand side of (4.2) specialized to that particular case.

The inability to obtain an expression for ε-capacity at its points of discontinuity is a consequence of the definition itself rather than of our methods of analysis. In fact, it is easily checked by slightly modifying the proof of Theorem 6 that (4.1) would hold with equality for all $0 \le \epsilon < 1$ had ε-achievable rates been defined in a slightly different (and more regular) way, by requiring sequences of codes with both rate and error probability arbitrarily close to $R$ and $\epsilon$, respectively. More precisely, consider an alternative definition of $R$ as an ε-achievable rate ($0 \le \epsilon < 1$) when there exists a sequence of $(n, M, \epsilon_n)$ codes with
$$\liminf_{n\to\infty} \frac{1}{n}\log M \ge R$$
and
$$\limsup_{n\to\infty} \epsilon_n \le \epsilon.$$
With this definition, the resulting $C_\epsilon$ would be the right-continuous version of the conventional ε-capacity, (4.1) would hold with equality for all $0 \le \epsilon < 1$, and the channel capacity could be written as
$$C_0 = \lim_{\epsilon \downarrow 0} C_\epsilon = \sup_X \underline{I}(X; Y) = \sup_X \sup\{ R : F_X(R) = 0 \}.$$
V. STRONG CONVERSE

A channel with capacity $C$ is said to satisfy the strong converse if, for every $\delta > 0$ and every sequence of $(n, M, \lambda_n)$ codes with rate
$$\frac{1}{n}\log M > C + \delta,$$
it holds that $\lambda_n \to 1$ as $n \to \infty$. For any such code sequence, Theorem 4 (with $\gamma = \delta/2$ and $X^n$ equiprobable on the codebook) yields
$$\lambda_n \ge P\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le \frac{1}{n}\log M - \frac{\delta}{2}\right] - \exp(-\delta n/2). \qquad (5.1)$$
Since $C = \sup_X \underline{I}(X; Y)$, the channel satisfies the strong converse if and only if
$$C = \sup_X \overline{I}(X; Y), \qquad (5.2)$$
where $\overline{I}(X; Y)$ is the sup-information rate, i.e., the limsup in probability of the normalized information density.

As an example of a channel that does not satisfy the strong converse, consider a channel of the form
$$W^n(y_1, \dots, y_n \mid x_1, \dots, x_n) = \begin{cases} \;\cdots & \text{if } (x_1, \dots, x_n) \in \cdots \\ \;\cdots & \text{if } (x_1, \dots, x_n) \notin \cdots \end{cases}$$
for which
$$C = \underline{I}(X_1; Y_1) = \lim_{n\to\infty}\frac{1}{n} I(X^n; Y^n) = 1 \text{ bit}$$
and
$$\sup_X \overline{I}(X; Y) = \overline{I}(X_2; Y_2) = 2 \text{ bits},$$
so that $C < \sup_X \overline{I}(X; Y)$.
VI. PROPERTIES OF INF-INFORMATION RATE

Many of the familiar properties satisfied by mutual information turn out to be inherited by the inf-information rate. Those properties are particularly useful in the computation of $\sup_X \underline{I}(X; Y)$ for specific channels. Here, we will show a sample of the most important properties. As is well known, the nonnegativity of divergence (itself a consequence of Jensen's inequality) is the key to the proof of many of the mutual information properties. In the present context, the property that plays a major role is unrelated to convex inequalities. That property is the nonnegativity of the inf-divergence rate (property a) in Theorem 8 below).
The sup-entropy rate $\overline{H}(Y)$ and inf-entropy rate $\underline{H}(Y)$ introduced in [10] are defined as the limsup and liminf, respectively, in probability of the normalized entropy density
$$\frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)}.$$
Analogously, the conditional sup-entropy rate $\overline{H}(Y|X)$ is the limsup in probability (according to $\{P_{X^n Y^n}\}$) of
$$\frac{1}{n}\log\frac{1}{P_{Y^n|X^n}(Y^n|X^n)}.$$

Theorem 8: $(X, Y)$ satisfies
a) $\underline{D}(X \| Y) \ge 0$
b) $\underline{I}(X; Y) = \underline{I}(Y; X)$
c) $\underline{I}(X; Y) \ge 0$
d) $\underline{I}(X; Y) \le \overline{H}(Y) - \overline{H}(Y|X)$
   $\underline{I}(X; Y) \le \underline{H}(Y) - \underline{H}(Y|X)$
   $\underline{I}(X; Y) \ge \underline{H}(Y) - \overline{H}(Y|X)$
e) $0 \le \overline{H}(Y) \le \log |B|$
f) $\underline{I}(X, Y; Z) \ge \underline{I}(X; Z)$
g) If $\overline{I}(X; Y) = \underline{I}(X; Y)$ and the input alphabet is finite, then $\underline{I}(X; Y) = \lim_{n\to\infty} \frac{1}{n} I(X^n; Y^n)$
h) $\underline{I}(X; Y) \le \liminf_{n\to\infty} \frac{1}{n} I(X^n; Y^n)$.

Proof: Property a) holds because, for every $\delta > 0$,
$$P\left[\frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{Y^n}(X^n)} \le -\delta\right] \le \exp(-\delta n).$$
Property b) is an immediate consequence of the definition. Property c) holds because the inf-information rate is equal to the inf-divergence rate between the processes $(X, Y)$ and $(\bar{X}, \bar{Y})$, where $\bar{X}$ and $\bar{Y}$ are independent and have the same individual statistics as $X$ and $Y$, respectively. The inequalities in d) follow from the decomposition of the information density into entropy densities,
$$\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) = \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)} - \frac{1}{n}\log\frac{1}{P_{Y^n|X^n}(Y^n|X^n)},$$
together with the elementary liminf/limsup inequalities for sums of random sequences. The proof of f) uses the decomposition
$$\frac{1}{n}\, i_{(X^n Y^n) Z^n}(X^n, Y^n; Z^n) = \frac{1}{n}\, i_{X^n Z^n}(X^n; Z^n) + \frac{1}{n}\log\frac{P_{Z^n|X^n Y^n}(Z^n|X^n, Y^n)}{P_{Z^n|X^n}(Z^n|X^n)}, \qquad (6.3)$$
in which the liminf in probability of the second term on the right-hand side is nonnegative by the same argument as in a). □
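To make the entropy-density definitions concrete, here is a minimal simulation sketch (assuming, as an arbitrary example, an i.i.d. Bernoulli source) showing that the normalized entropy density concentrates around the entropy, so that sup- and inf-entropy rates coincide in the stationary memoryless case.

```python
# Illustrative sketch: normalized entropy density (1/n) log 1/P(Y^n) of an
# i.i.d. Bernoulli(q) source. By the law of large numbers it concentrates
# around the entropy h(q), so sup- and inf-entropy rates both equal h(q).

import math
import random

def entropy_density_sample(n: int, q: float) -> float:
    ones = sum(random.random() < q for _ in range(n))
    # -log P(y^n) = -ones*log(q) - (n-ones)*log(1-q), normalized by n (nats)
    return -(ones / n) * math.log(q) - (1 - ones / n) * math.log(1 - q)

random.seed(1)
n, q = 5000, 0.2
samples = sorted(entropy_density_sample(n, q) for _ in range(500))
h = -q * math.log(q) - (1 - q) * math.log(1 - q)
print("h(q)          =", round(h, 4), "nats")
print("5%-95% spread =", round(samples[25], 4), "to", round(samples[475], 4))
```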
F") I 111
I ( X ; Y) I!(X;P)
(6.7)
2 e-llog e-l
where y is the output due to X,which is an independent
process with the same first-order statistics as X, i.e., where y" is independent of X" and
n
Pzn = n,",lPx,.
g(x", y") =
Fl$(yilxi)/PyJyi).
Proofi Let a denote the liminf in probability (accordi=l
ing to {Pxayn})
of the sequence
Finally, note that (6.13) and (6.14) are incompatible for
sufficiently large n.
0
Analogous proofs can be used to show corresponding
properties for the sup-information rate i (Y)~
arising in
We will show that
the problem of channel resolvability [lo].
ff 2 I ( X ; Y)
(6.9)
VII. EXAMPLES
and
As
an
example
of
a
simple channel which is not encoma II ( X ; P).
(6.10)
passed by previously published formulas, consider the
To show (6.91, we can write (6.8) as
following binary channel.
Let the alphabets be binary A = B = (0, l), and let
1
Jq(Y;.IXi) 1 W"(Y"lX")
= -log
every output be given by
n j - 1 log Pyt(x) n
P,.(Y")
y; = x, + 2,
(7.1)
1
Py.(Y")
+ -log
(6.11) where addition is modulo-2 and Z is an urbitrury binary
n
rI~==,Py,(x>
random process independent of the input X.The evaluawhere the liminf in probability of the first term on the tion of the general capacity formula yields
right-hand side is f ( X ; Y) and the liminf in probability of
the second term is nonnegative owing to Theorem 8a).
supf(X; Y) = log2 - H(Z)
(7.2)
X
To show (6.101, note first (from the independence of the
sequence ( F i , x ) , the Chebyshev inequality, and the dis- where H(Z) is the sup-entropy rate of the additive noise
creteness of the alphabets) that
process Z (cf. definition in Section VI). A special case of
formula (7.2) was proved by Parthasarathy 1221 in the
context of stationary noise processes in a form which, in
1
n
lieu of the sup-entropy rate, involves the supremum over
I .= liminf I(Xi;
(6.12) the entropies of almost every ergodic component of the
n+m
nii=l
stittionary noise.
Then (6.10) will follow once we show that a,the liminf in
In order to verify (7.21, we note first that, according to
probability of Z,, cannot exceed (6.12). To see this, let us properties d) and e) in Theorem 8, every X satisfies
argue by contradiction and assume otherwise, i.e., for
I ( X ; Y ) Ilog2 - H ( Y I X ) .
(7.3)
some y > 0,
x).
P[ 2" > I ( X ; P)
+ y]
1.
(6.13)
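For concreteness, formula (7.2) is easy to evaluate on sketch noise models (the models and parameter values below are arbitrary assumptions for illustration). For i.i.d. Bernoulli$(p)$ noise the sup-entropy rate is $h(p)$ and (7.2) recovers the BSC capacity $1 - h(p)$ bits; for noise that is Bernoulli$(p_1)$ or Bernoulli$(p_2)$ for all time, chosen once with probability 1/2 each, the normalized entropy density converges to $h(p_1)$ or $h(p_2)$, so the sup-entropy rate is $\max(h(p_1), h(p_2))$ and the capacity is governed by the worse component.

```python
# Evaluating (7.2), C = log 2 - sup-entropy rate of the noise, in bits.

import math

def h2(p: float) -> float:
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# i.i.d. Bernoulli(p) noise: sup-entropy rate = h(p), so C = 1 - h(p).
p = 0.11
print("i.i.d. noise:", round(1 - h2(p), 4), "bits")

# Mixed noise: Bernoulli(p1) or Bernoulli(p2) forever, each w.p. 1/2.
# Sup-entropy rate = max(h(p1), h(p2)), so C = 1 - max(h(p1), h(p2)).
p1, p2 = 0.05, 0.30
print("mixed noise :", round(1 - max(h2(p1), h2(p2)), 4), "bits")
```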
$$\overline{H}(Z) = \limsup_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} h(\delta_i),$$
$$w_k = 2^{-k}\log\frac{1}{P_{Z^{2k}}(Z^{2k})}, \qquad w_{k+1} = (w_k + \cdots + \Delta_k)/2, \ \dots$$