INFORMATION THEORY (ELL714)
MANAV BHATNAGAR
ENTROPY
Entropy is a measure of the uncertainty of a random variable.
The entropy H(X) of a discrete random variable X is defined by
H(X) = −∑_x p(x) log p(x)

Equivalently, H(X) = E_p[ log (1/p(X)) ].
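As a quick numerical companion to the definition, here is a minimal Python sketch (the helper name entropy_bits is our own, not from any library):

```python
from math import log2

def entropy_bits(probs):
    """H(X) = -sum_x p(x) log2 p(x); terms with p(x) = 0 contribute 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy_bits([1.0]))        # deterministic outcome: 0.0 bits
```

A fair coin carries exactly one bit of uncertainty; a deterministic variable carries none.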
Lemma 2.1.1: H(X) ≥ 0.
Proof: Since 0 ≤ p(x) ≤ 1, we have log (1/p(x)) ≥ 0 for every x, so the sum is nonnegative.

Lemma 2.1.2: H_b(X) = (log_b a) H_a(X).
Proof: log_b p = (log_b a) log_a p, so changing the base of the logarithm only rescales the entropy.
Properties of entropy:
a) H(X) ≥ 0, since 0 ≤ p(x) ≤ 1 implies log p(x) ≤ 0.
b) H_b(X) = (log_b a) H_a(X).
EXAMPLE
Let
X = 1 with probability p,
X = 0 with probability 1 − p.
Then
H(X) = −p log p − (1 − p) log(1 − p) ≡ H(p),
the binary entropy function.
To locate the maximum of the binary entropy H(p), differentiate:

d/dp H(p) = d/dp [ −p log p − (1 − p) log(1 − p) ] = −log p + log(1 − p).

Setting d/dp H(p) = 0 gives log((1 − p)/p) = 0, i.e., p = 1/2, and

H(1/2) = (1/2) log 2 + (1/2) log 2 = 1 bit.
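A small grid search confirms the calculus: H(p) peaks at p = 1/2 with value 1 bit (illustrative sketch; h_binary is our own helper):

```python
from math import log2

def h_binary(p):
    """Binary entropy H(p) = -p log2 p - (1 - p) log2(1 - p), for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

# H(p) should peak at p = 1/2 with value 1 bit, as the derivative shows.
grid = [i / 1000 for i in range(1, 1000)]
p_star = max(grid, key=h_binary)
print(p_star, h_binary(p_star))   # 0.5 1.0
```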
Example 1.1.1
Consider a random variable that has a uniform distribution over 32 outcomes. To identify an outcome, we need a label that takes on 32 different values; thus, 5-bit strings suffice as labels. The entropy of this random variable is

H(X) = −∑_{i=1}^{32} (1/32) log (1/32) = log 32 = 5 bits.
Example 1.1.2
Suppose that we have a horse race with eight horses taking part. Assume that the probabilities of winning for the eight horses are (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64). The entropy of the horse race is

H(X) = (1/2) log 2 + (1/4) log 4 + (1/8) log 8 + (1/16) log 16 + 4 · (1/64) log 64 = 2 bits.
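Both examples are easy to verify numerically (a sketch reusing the entropy_bits helper from above, re-declared so the snippet stands alone):

```python
from math import log2

def entropy_bits(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Example 1.1.1: uniform over 32 outcomes -> log2(32) = 5 bits.
print(entropy_bits([1 / 32] * 32))                      # 5.0

# Example 1.1.2: the horse-race distribution -> 2 bits.
race = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
print(entropy_bits(race))                               # 2.0
```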
Example 2.1.2
Let
X = a with probability 1/2,
X = b with probability 1/4,
X = c with probability 1/8,
X = d with probability 1/8.
The entropy of X is

H(X) = (1/2) log 2 + (1/4) log 4 + (1/8) log 8 + (1/8) log 8 = 7/4 bits.
The conditional entropy of Y given X is

H(Y|X) = ∑_{x∈X} p(x) H(Y|X = x)
= −∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log p(y|x)
= −∑_{x∈X} ∑_{y∈Y} p(x, y) log p(y|x)
= −E[ log p(Y|X) ].
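A minimal sketch of this definition in Python, computing H(Y|X) from a joint table (the dict layout and helper name are our own choices):

```python
from math import log2

def conditional_entropy(joint):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x); joint is a dict {(x, y): p}."""
    px = {}
    for (x, _y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return -sum(p * log2(p / px[x]) for (x, y), p in joint.items() if p > 0)

# Y is a noisy copy of a fair bit X, flipped with probability 0.25:
joint = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}
print(conditional_entropy(joint))   # ~0.811 bits = H(0.25)
```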
Example 2.2.1
Let (X, Y) have the following joint distribution p(x, y), with X along the columns and Y along the rows:

         X=1     X=2     X=3     X=4
  Y=1    1/8    1/16    1/32    1/32
  Y=2   1/16     1/8    1/32    1/32
  Y=3   1/16    1/16    1/16    1/16
  Y=4    1/4       0       0       0

The marginals are p(X) = (1/2, 1/4, 1/8, 1/8) and p(Y) = (1/4, 1/4, 1/4, 1/4). Then

H(X|Y) = ∑_{i=1}^{4} p(Y = i) H(X|Y = i)
= (1/4) H(1/2, 1/4, 1/8, 1/8) + (1/4) H(1/4, 1/2, 1/8, 1/8) + (1/4) H(1/4, 1/4, 1/4, 1/4) + (1/4) H(1, 0, 0, 0)
= (1/4)(7/4) + (1/4)(7/4) + (1/4)(2) + (1/4)(0)
= 11/8 bits.
Similarly, H(Y|X) = 13/8 bits and H(X, Y) = 27/8 bits. Note that H(X|Y) ≠ H(Y|X); however, H(X) − H(X|Y) = H(Y) − H(Y|X).
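All of these values can be checked from the joint table above (a sketch; the chain rule H(X,Y) = H(X) + H(Y|X) is used to extract the conditional entropies):

```python
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint p(x, y) from Example 2.2.1; rows indexed by y, columns by x.
P = [[1/8, 1/16, 1/32, 1/32],
     [1/16, 1/8, 1/32, 1/32],
     [1/16, 1/16, 1/16, 1/16],
     [1/4, 0, 0, 0]]

flat = [p for row in P for p in row]
py = [sum(row) for row in P]
px = [sum(P[y][x] for y in range(4)) for x in range(4)]

print(H(flat))             # H(X,Y) = 27/8 = 3.375
print(H(flat) - H(px))     # H(Y|X) = 13/8 = 1.625
print(H(flat) - H(py))     # H(X|Y) = 11/8 = 1.375
```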
RELATIVE ENTROPY
The relative entropy (or Kullback–Leibler divergence) between two probability mass functions p(x) and q(x) is defined as

D(p||q) = ∑_{x∈X} p(x) log ( p(x) / q(x) ) = E_p[ log ( p(X) / q(X) ) ].

We use the convention that 0 log(0/q) = 0; if there is any symbol x ∈ X such that p(x) > 0 and q(x) = 0, then D(p||q) = ∞.
Relative entropy is always nonnegative and is zero if and only if p = q. However, it is not a true distance between distributions, since it is not symmetric and does not satisfy the triangle inequality.
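The following sketch computes D(p||q) and illustrates the asymmetry (the function name is our own; the ∞ convention is handled explicitly):

```python
from math import log2, inf

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)); infinite if q(x) = 0 < p(x)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return inf
            total += pi * log2(pi / qi)
    return total

p, q = [0.5, 0.5], [0.75, 0.25]
print(kl_divergence(p, q))   # ~0.208
print(kl_divergence(q, p))   # ~0.189 -- not symmetric
print(kl_divergence(p, p))   # 0.0    -- zero iff p = q
```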
Mutual Information
Consider two random variables X and Y with a joint probability mass function
p(x, y) and marginal probability mass functions p(x) and p(y). The mutual
information I (X; Y) is the relative entropy between the joint distribution and
the product distribution p(x)p(y):
I(X; Y) = ∑_{x∈X} ∑_{y∈Y} p(x, y) log ( p(x, y) / (p(x) p(y)) )
= D( p(x, y) || p(x) p(y) )
= E_{p(x,y)}[ log ( p(X, Y) / (p(X) p(Y)) ) ].
Rewriting the definition,

I(X; Y) = ∑_{x,y} p(x, y) log ( p(x, y) / (p(x) p(y)) )
= ∑_{x,y} p(x, y) log ( p(x|y) / p(x) )
= −∑_{x,y} p(x, y) log p(x) + ∑_{x,y} p(x, y) log p(x|y)
= −∑_x p(x) log p(x) − ( −∑_{x,y} p(x, y) log p(x|y) )
= H(X) − H(X|Y).

By symmetry, it also follows that
I(X; Y) = H(Y) − H(Y|X).
Since H(X, Y) = H(X) + H(Y|X), we have
I(X; Y) = H(X) + H(Y) − H(X, Y).
Finally, we note that
I(X; X) = H(X) − H(X|X) = H(X).
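These identities can be verified numerically on the joint distribution of Example 2.2.1 (a sketch; I(X;Y) should come out to 3/8 bit both ways):

```python
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint distribution of Example 2.2.1 again.
P = [[1/8, 1/16, 1/32, 1/32],
     [1/16, 1/8, 1/32, 1/32],
     [1/16, 1/16, 1/16, 1/16],
     [1/4, 0, 0, 0]]
flat = [p for row in P for p in row]
py = [sum(row) for row in P]
px = [sum(P[y][x] for y in range(4)) for x in range(4)]

I_direct = sum(P[y][x] * log2(P[y][x] / (px[x] * py[y]))
               for y in range(4) for x in range(4) if P[y][x] > 0)
print(I_direct)                     # I(X;Y) = 3/8 = 0.375, from the definition
print(H(px) + H(py) - H(flat))      # same value via H(X) + H(Y) - H(X,Y)
```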
CHAIN RULES
Chain rule for entropy:

H(X_1, X_2, ..., X_n) = ∑_{i=1}^n H(X_i | X_{i−1}, ..., X_1).

Chain rule for mutual information:

I(X_1, X_2, ..., X_n; Y) = ∑_{i=1}^n I(X_i; Y | X_{i−1}, ..., X_1).

Both follow by repeated application of the two-variable expansion H(X, Y) = H(X) + H(Y|X).
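A quick numerical check of the entropy chain rule on a random two-bit joint distribution (a sketch; the flattened indexing 2*x1 + x2 is our own convention):

```python
from math import log2
import random

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

random.seed(0)
w = [random.random() for _ in range(4)]
s = sum(w)
p = [wi / s for wi in w]   # joint pmf over (x1, x2) in {0,1}^2, index 2*x1 + x2

p1 = [p[0] + p[1], p[2] + p[3]]   # marginal of X1
# H(X2|X1) computed directly from the conditional pmfs:
h_x2_given_x1 = sum(p1[a] * H([p[2 * a] / p1[a], p[2 * a + 1] / p1[a]])
                    for a in (0, 1))
print(H(p))                      # H(X1, X2)
print(H(p1) + h_x2_given_x1)     # equals H(X1) + H(X2|X1): the chain rule
```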
Definition For joint probability mass functions p(x, y) and q(x, y), the
conditional relative entropy D(p(y|x)||q(y|x)) is the average of the relative
entropies between the conditional probability mass functions p(y|x) and
q(y|x) averaged over the probability mass function p(x).
D( p(y|x) || q(y|x) ) = ∑_x p(x) ∑_y p(y|x) log ( p(y|x) / q(y|x) )
= E_{p(x,y)}[ log ( p(Y|X) / q(Y|X) ) ].
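A minimal sketch of this definition, with p(x) and the two conditional pmfs supplied as plain lists (all names and the toy numbers are our own):

```python
from math import log2

def cond_kl(px, p_cond, q_cond):
    """D(p(y|x)||q(y|x)) = sum_x p(x) sum_y p(y|x) log2(p(y|x)/q(y|x))."""
    return sum(px[x] * p * log2(p / q)
               for x in range(len(px))
               for p, q in zip(p_cond[x], q_cond[x]) if p > 0)

px = [0.5, 0.5]
p_cond = [[0.9, 0.1], [0.2, 0.8]]   # p(y|x) for x = 0, 1
q_cond = [[0.5, 0.5], [0.5, 0.5]]   # q(y|x) for x = 0, 1
print(cond_kl(px, p_cond, q_cond))  # per-x relative entropies averaged over p(x)
```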
CONVEXITY
Convex functions: x², |x|, e^x, x log x (for x ≥ 0).
Concave functions: log x and √x (for x ≥ 0).
Convexity test: f is convex if f″(x) ≥ 0 everywhere, and strictly convex if f″(x) > 0 everywhere.
Jensen's inequality: if f is convex, then E[f(X)] ≥ f(E[X]). Proof for finite support, by induction on the number of mass points k, writing p_i' = p_i / (1 − p_k) for i = 1, ..., k − 1:

∑_{i=1}^k p_i f(x_i) = p_k f(x_k) + (1 − p_k) ∑_{i=1}^{k−1} p_i' f(x_i)
≥ p_k f(x_k) + (1 − p_k) f( ∑_{i=1}^{k−1} p_i' x_i )
≥ f( p_k x_k + (1 − p_k) ∑_{i=1}^{k−1} p_i' x_i )
= f( ∑_{i=1}^k p_i x_i ),

where the first inequality follows from the induction hypothesis and the second from the definition of convexity.
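A one-off numerical illustration of Jensen's inequality for a convex f (here f(t) = t², chosen arbitrarily):

```python
import random

random.seed(4)
f = lambda t: t * t                                   # a convex function
xs = [random.uniform(-2, 2) for _ in range(5)]        # mass points
w = [random.random() for _ in range(5)]
p = [wi / sum(w) for wi in w]                         # a random pmf

Ef = sum(pi * f(xi) for pi, xi in zip(p, xs))         # E f(X)
fE = f(sum(pi * xi for pi, xi in zip(p, xs)))         # f(E X)
print(Ef >= fE)   # True: E f(X) >= f(E X) for convex f
```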
Information inequality: D(p||q) ≥ 0, with equality if and only if p(x) = q(x) for all x.
Proof: Let A = {x : p(x) > 0} be the support set of p. Then

−D(p||q) = ∑_{x∈A} p(x) log ( q(x) / p(x) )
≤ log ∑_{x∈A} p(x) ( q(x) / p(x) )     (Jensen's inequality; log is concave)
= log ∑_{x∈A} q(x)
≤ log ∑_{x∈X} q(x)
= log 1 = 0.

Equality in Jensen's inequality requires q(x) = c p(x) for a constant c; equality in the last step requires ∑_{x∈A} q(x) = ∑_{x∈X} q(x) = 1, which forces c = 1, i.e., p = q.
Theorem (nonnegativity of mutual information): For any two random variables X and Y,
I(X; Y) = D( p(x, y) || p(x) p(y) ) ≥ 0,
with equality if and only if p(x, y) = p(x)p(y) (i.e., X and Y are independent).
Corollary
D( p(y|x) || q(y|x) ) ≥ 0,
with equality if and only if p(y|x) = q(y|x) for all y and x such that p(x) > 0.
Corollary
I(X; Y|Z) ≥ 0,
with equality if and only if X and Y are conditionally independent given Z.
Theorem: H(X) ≤ log |X|, with equality if and only if X is uniform over X.
Proof: Let u(x) = 1/|X| be the uniform probability mass function over X. Then

D(p||u) = ∑_x p(x) log ( p(x) / u(x) ) = log |X| − H(X).

Since D(p||u) ≥ 0, it follows that H(X) ≤ log |X|.
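A quick randomized check of the bound H(X) ≤ log|X| (a sketch; five random pmfs on an 8-letter alphabet):

```python
from math import log2
import random

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

random.seed(1)
for _ in range(5):
    w = [random.random() for _ in range(8)]
    p = [wi / sum(w) for wi in w]
    print(H(p) <= log2(8) + 1e-12)   # True: H(X) <= log |X| for |X| = 8

print(H([1 / 8] * 8))                # the uniform pmf achieves the bound: 3.0
```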
Theorem 2.6.6
Let X_1, X_2, ..., X_n be drawn according to p(x_1, x_2, ..., x_n). Then

H(X_1, X_2, ..., X_n) ≤ ∑_{i=1}^n H(X_i),

with equality if and only if the X_i are independent.
Proof: By the chain rule,

H(X_1, X_2, ..., X_n) = ∑_{i=1}^n H(X_i | X_{i−1}, ..., X_1) ≤ ∑_{i=1}^n H(X_i),

since conditioning reduces entropy.
LOG SUM INEQUALITY
For nonnegative numbers a_1, a_2, ..., a_n and b_1, b_2, ..., b_n,

∑_{i=1}^n a_i log ( a_i / b_i ) ≥ ( ∑_{i=1}^n a_i ) log ( ∑_{i=1}^n a_i / ∑_{i=1}^n b_i ),

with equality if and only if a_i / b_i is constant.
Proof: Let A = ∑_{i=1}^n a_i and B = ∑_{i=1}^n b_i, and normalize: a_i' = a_i / A, b_i' = b_i / B. Then

∑_{i=1}^n a_i log ( a_i / b_i ) = A ∑_{i=1}^n a_i' log ( A a_i' / (B b_i') )
= A ∑_{i=1}^n a_i' log ( a_i' / b_i' ) + A log ( A / B )
= A · D(a' || b') + A log ( A / B )
≥ A log ( A / B ),

since D(a'||b') ≥ 0, with equality if and only if a_i' = b_i' for all i, i.e., a_i / b_i = A / B for all i.
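A numerical check of both the inequality and its equality condition (a sketch with arbitrary positive numbers):

```python
from math import log2
import random

random.seed(2)
a = [random.uniform(0.1, 2.0) for _ in range(6)]
b = [random.uniform(0.1, 2.0) for _ in range(6)]

lhs = sum(ai * log2(ai / bi) for ai, bi in zip(a, b))
rhs = sum(a) * log2(sum(a) / sum(b))
print(lhs >= rhs - 1e-12)   # True on every draw

# Equality when a_i / b_i is constant (here a_i = 2 b_i):
c = [2.0 * bi for bi in b]
lhs_eq = sum(ci * log2(ci / bi) for ci, bi in zip(c, b))
rhs_eq = sum(c) * log2(sum(c) / sum(b))
print(abs(lhs_eq - rhs_eq) < 1e-12)   # True
```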
We now use the log sum inequality to prove various convexity results.
We begin by reproving Theorem 2.6.3, which states that D(p||q) ≥ 0 with
equality if and only if p(x) = q(x). By the log sum inequality,
D(p||q) = ∑ p(x) log ( p(x) / q(x) )
≥ ( ∑ p(x) ) log ( ∑ p(x) / ∑ q(x) )
= 1 · log (1/1) = 0,
with equality if and only if p(x)/q(x) = c. Since both p and q are probability
mass functions, c = 1, and hence we have D(p||q) = 0 if and only if p(x) =
q(x) for all x.
Theorem (convexity of relative entropy): D(p||q) is convex in the pair (p, q); that is, if (p_1, q_1) and (p_2, q_2) are two pairs of probability mass functions, then for all 0 ≤ λ ≤ 1,

D( λp_1 + (1 − λ)p_2 || λq_1 + (1 − λ)q_2 ) ≤ λ D(p_1||q_1) + (1 − λ) D(p_2||q_2).

Proof: Apply the log sum inequality termwise, with a_1 = λp_1(x), a_2 = (1 − λ)p_2(x), b_1 = λq_1(x), b_2 = (1 − λ)q_2(x):

( λp_1(x) + (1 − λ)p_2(x) ) log ( ( λp_1(x) + (1 − λ)p_2(x) ) / ( λq_1(x) + (1 − λ)q_2(x) ) )
≤ λp_1(x) log ( p_1(x) / q_1(x) ) + (1 − λ)p_2(x) log ( p_2(x) / q_2(x) ),

and sum over all x.
DATA-PROCESSING INEQUALITY
Definition
Random variables X, Y, Z are said to form a Markov chain in that order (denoted by X → Y → Z) if the conditional distribution of Z depends only on Y and is conditionally independent of X. Specifically, X, Y, and Z form a Markov chain X → Y → Z if the joint probability mass function can be written as

p(x, y, z) = p(x) p(y|x) p(z|y) = p(x, y) p(z|y).
Some simple consequences are as follows:
X → Y → Z if and only if X and Z are conditionally independent given Y. Markovity implies conditional independence because

p(x, z | y) = p(x, y, z) / p(y) = p(x, y) p(z|y) / p(y) = p(x|y) p(z|y).
Corollary
If X → Y → Z, then I(X; Y|Z) ≤ I(X; Y).
Proof:
Expanding I(X; Y, Z) by the chain rule in two different ways gives I(X; Z) + I(X; Y|Z) = I(X; Y) + I(X; Z|Y). Now I(X; Z|Y) = 0 by Markovity, and I(X; Z) ≥ 0; thus
I(X; Y|Z) ≤ I(X; Y).
Thus, the dependence of X and Y is decreased (or remains unchanged) by the observation of a "downstream" random variable Z. Note that it is also possible that I(X; Y|Z) > I(X; Y) when X, Y, and Z do not form a Markov chain. For example, let X and Y be independent fair binary random variables, and let Z = X + Y. Then I(X; Y) = 0, but I(X; Y|Z) = H(X|Z) − H(X|Y, Z) = H(X|Z) = P(Z = 1) H(X|Z = 1) = 1/2 bit.
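The Z = X + Y example can be verified directly (a sketch; it computes I(X;Y|Z) = H(X|Z), since H(X|Y,Z) = 0 here because X = Z − Y):

```python
from math import log2
from itertools import product

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# X, Y independent fair bits; Z = X + Y (so X -> Y -> Z is NOT a Markov chain).
joint = {(x, y, x + y): 0.25 for x, y in product([0, 1], repeat=2)}

pz = {z: sum(p for (x, y, zz), p in joint.items() if zz == z) for z in [0, 1, 2]}
h_x_given_z = sum(
    pz[z] * H([sum(p for (x, y, zz), p in joint.items() if zz == z and x == xv) / pz[z]
               for xv in [0, 1]])
    for z in [0, 1, 2])
# Given (Y, Z), X = Z - Y is determined, so H(X|Y,Z) = 0.
print(h_x_given_z)   # 0.5 = I(X;Y|Z) > I(X;Y) = 0
```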
Sufficient statistics
Let {f_θ(x)} be a family of probability mass functions indexed by a parameter θ, and let X be a sample from a distribution in this family. Let T(X) be any statistic (function of the sample X). Then θ → X → T(X), and by the data-processing inequality,

I(θ; T(X)) ≤ I(θ; X).

T(X) is a sufficient statistic for θ if

I(θ; X) = I(θ; T(X)),

i.e., if θ and X are conditionally independent given T(X), so that θ → T(X) → X also forms a Markov chain.
Example: Let X_1, X_2, ..., X_n be i.i.d. N(θ, 1). The joint density of the sample is

f_θ(x_1, ..., x_n) = (1/(2π))^{n/2} exp( −(1/2) ∑_{i=1}^n (x_i − θ)² ).

Writing x_i − θ = (x_i − x̄) + (x̄ − θ), where x̄ = (1/n) ∑_{i=1}^n x_i is the sample mean,

∑_{i=1}^n (x_i − θ)² = ∑_{i=1}^n (x_i − x̄)² + 2(x̄ − θ) ∑_{i=1}^n (x_i − x̄) + n (x̄ − θ)²
= ∑_{i=1}^n (x_i − x̄)² + n (x̄ − θ)²,

since ∑_{i=1}^n (x_i − x̄) = 0. Hence

f_θ(x_1, ..., x_n) = (1/(2π))^{n/2} exp( −(1/2) ∑_{i=1}^n (x_i − x̄)² ) · exp( −(n/2) (x̄ − θ)² ).

The first factor does not depend on θ, and the second depends on the sample only through x̄. Therefore the sample mean T(X_1, ..., X_n) = X̄ is a sufficient statistic for θ.
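A small numerical illustration of the factorization (a sketch under the assumed model X_i ∼ N(θ, 1)): two different samples with the same sample mean give identical likelihood ratios between any two parameter values.

```python
from math import exp, pi

def likelihood(xs, theta):
    """Joint N(theta, 1) density of the sample xs."""
    n = len(xs)
    return (1 / (2 * pi)) ** (n / 2) * exp(-0.5 * sum((x - theta) ** 2 for x in xs))

# Two different samples with the same sample mean (1.0):
xs1 = [0.0, 1.0, 2.0]
xs2 = [0.5, 1.0, 1.5]

# Likelihood ratios between theta = 1 and theta = 0 agree,
# because the theta-dependence enters only through the mean:
print(likelihood(xs1, 1.0) / likelihood(xs1, 0.0))
print(likelihood(xs2, 1.0) / likelihood(xs2, 0.0))   # same value, exp(1.5)
```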
FANO'S INEQUALITY
Suppose that we know a random variable Y and we wish to guess the value of a correlated random variable X.
Fano's inequality relates the probability of error in guessing the random variable X to its conditional entropy H(X|Y).
The conditional entropy of a random variable X given another random variable Y is zero if and only if X is a function of Y.
Hence we can estimate X from Y with zero probability of error if and only if H(X|Y) = 0.
We expect to be able to estimate X with a low probability of error only if the conditional entropy H(X|Y) is small.
Fano's inequality quantifies this idea.
FANO'S INEQUALITY
Suppose we wish to estimate a random variable X with distribution p(x).
We observe a random variable Y that is related to X by the conditional distribution p(y|x).
From Y, we calculate a function g(Y) = X̂, where X̂ is an estimate of X.
We wish to bound the probability that X̂ ≠ X.
We observe that X → Y → X̂ forms a Markov chain.
Define the probability of error

P_e = Pr{X̂ ≠ X}.
FANO'S INEQUALITY
Theorem 2.10.1 (Fano's inequality): For any estimator X̂ such that X → Y → X̂, with P_e = Pr(X ≠ X̂), we have

H(P_e) + P_e log |X| ≥ H(X|X̂) ≥ H(X|Y).

This inequality can be weakened to

P_e ≥ ( H(X|Y) − 1 ) / log |X|.

Proof:
Define an error random variable

E = 1 if X̂ ≠ X, and E = 0 if X̂ = X.
Then, using the chain rule for entropies to expand H(E, X | X̂) in two different ways, we have

H(E, X | X̂) = H(X | X̂) + H(E | X, X̂)
= H(E | X̂) + H(X | E, X̂).

Since E is a function of X and X̂, we have H(E | X, X̂) = 0. Also, since E is a binary-valued random variable and conditioning reduces entropy, H(E | X̂) ≤ H(E) = H(P_e). The remaining term, H(X | E, X̂), can be bounded as

H(X | E, X̂) = Pr(E = 0) H(X | X̂, E = 0) + Pr(E = 1) H(X | X̂, E = 1)
≤ (1 − P_e) · 0 + P_e log |X|,

since given E = 0, X = X̂, and given E = 1, X ranges over at most |X| values. Combining the pieces gives H(X | X̂) ≤ H(P_e) + P_e log |X|; finally, by the data-processing inequality for the Markov chain X → Y → X̂, H(X | X̂) ≥ H(X | Y).
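Fano's inequality can be checked on the joint distribution of Example 2.2.1, using the optimal estimator g(y) = argmax_x p(x|y) (a sketch; P_e works out to 7/16):

```python
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

P = [[1/8, 1/16, 1/32, 1/32],
     [1/16, 1/8, 1/32, 1/32],
     [1/16, 1/16, 1/16, 1/16],
     [1/4, 0, 0, 0]]
py = [sum(row) for row in P]

h_x_given_y = H([p for row in P for p in row]) - H(py)

# Best estimator g(y) = argmax_x p(x|y); its error probability:
pe = 1 - sum(max(row) for row in P)
print(pe)                                # 0.4375 = 7/16
print(H([pe, 1 - pe]) + pe * log2(4))    # Fano bound ~1.86 ...
print(h_x_given_y)                       # ... >= H(X|Y) = 1.375
```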
Corollary:
For any two random variables X and Y, let p = Pr(X ≠ Y). Then
H(p) + p log |X| ≥ H(X|Y).
Corollary:
The bound is sharp: with m = |X|, equality is achieved when the error mass is spread uniformly over the remaining m − 1 outcomes, i.e., by the conditional distribution

( 1 − P_e, P_e/(m − 1), ..., P_e/(m − 1) ).
Lemma 2.10.1
If X and X′ are i.i.d. with entropy H(X), then

Pr(X = X′) ≥ 2^{−H(X)},

with equality if and only if X has a uniform distribution.
Proof: Since 2^t is convex, Jensen's inequality gives

2^{E log p(X)} ≤ E[ 2^{log p(X)} ],

which implies that

2^{−H(X)} = 2^{∑ p(x) log p(x)} ≤ ∑ p(x) 2^{log p(x)} = ∑ p²(x) = Pr(X = X′).
Corollary
Let X, X′ be independent with X ∼ p(x), X′ ∼ r(x), x, x′ ∈ X. Then

Pr(X = X′) ≥ 2^{−H(p) − D(p||r)},
Pr(X = X′) ≥ 2^{−H(r) − D(r||p)}.

Proof:

2^{−H(p) − D(p||r)} = 2^{∑ p(x) log p(x) + ∑ p(x) log ( r(x)/p(x) )}
= 2^{∑ p(x) log r(x)}
≤ ∑ p(x) 2^{log r(x)}
= ∑ p(x) r(x)
= Pr(X = X′).
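Both collision-probability bounds are easy to test numerically (a sketch with random pmfs; rand_pmf is our own helper):

```python
from math import log2
import random

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def D(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rand_pmf(n):
    w = [random.random() for _ in range(n)]
    return [wi / sum(w) for wi in w]

random.seed(3)
p, r = rand_pmf(6), rand_pmf(6)

print(sum(pi * pi for pi in p) >= 2 ** (-H(p)))                        # Lemma 2.10.1
print(sum(pi * ri for pi, ri in zip(p, r)) >= 2 ** (-H(p) - D(p, r)))  # Corollary
```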
Example:
Entropy of a disjoint mixture. Let X_1 and X_2 be discrete random variables drawn according to probability mass functions p_1(·) and p_2(·) over the respective alphabets X_1 = {1, 2, ..., m} and X_2 = {m + 1, ..., n}. Let

X = X_1 with probability α,
X = X_2 with probability 1 − α.

Find H(X) in terms of H(X_1), H(X_2), and α.
Solution
Since X_1 and X_2 have disjoint support sets, we can determine from X which of the two regimes occurred. Define the function

f(X) = 1 when X = X_1 (i.e., X ≤ m),
f(X) = 2 when X = X_2 (i.e., X > m).

Since f(X) is a deterministic function of X, H(X) = H(X, f(X)), and by the chain rule

H(X) = H(X, f(X)) = H(f(X)) + H(X | f(X))
= H(α) + p(f = 1) H(X | f = 1) + p(f = 2) H(X | f = 2)
= H(α) + α H(X_1) + (1 − α) H(X_2),

where H(α) = −α log α − (1 − α) log(1 − α).
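A numerical check of the mixture formula (a sketch with arbitrary α, p_1, p_2 on disjoint supports):

```python
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

alpha = 0.3
p1 = [0.5, 0.5]            # X1 on {1, 2}
p2 = [0.25, 0.25, 0.5]     # X2 on {3, 4, 5}

# Mixture with disjoint supports:
mix = [alpha * p for p in p1] + [(1 - alpha) * p for p in p2]

lhs = H(mix)
rhs = H([alpha, 1 - alpha]) + alpha * H(p1) + (1 - alpha) * H(p2)
print(lhs, rhs)   # equal: H(X) = H(alpha) + alpha H(X1) + (1 - alpha) H(X2)
```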