Solutions To Problems Related To Information Theory

Harvard SEAS
ES250 Information Theory
Homework 5 Solutions
1. Suppose that $(X, Y, Z)$ are jointly Gaussian and that $X \to Y \to Z$ forms a Markov chain. Let $X$ and $Y$ have correlation coefficient $\rho_1$ and let $Y$ and $Z$ have correlation coefficient $\rho_2$. Find $I(X; Z)$.

Solution: First note that we may, without loss of generality, assume that the means of $X$, $Y$ and $Z$ are zero. If the means are not zero, one can subtract the vector of means without affecting the mutual information or the conditional independence of $X$ and $Z$ given $Y$. Similarly, we can assume the variances of $X$, $Y$ and $Z$ to be 1. (The scaling may change the differential entropy, but not the mutual information.) Let

$$\Lambda = \begin{pmatrix} 1 & \rho_{xz} \\ \rho_{xz} & 1 \end{pmatrix}$$

be the covariance matrix of $X$ and $Z$. From Eqs. (9.93) and (9.94),

$$I(X; Z) = h(X) + h(Z) - h(X, Z) = \frac{1}{2}\log(2\pi e) + \frac{1}{2}\log(2\pi e) - \frac{1}{2}\log\left((2\pi e)^2 |\Lambda|\right) = -\frac{1}{2}\log(1 - \rho_{xz}^2).$$

Now from the conditional independence of $X$ and $Z$ given $Y$, we have

$$\rho_{xz} = E[XZ] = E[E[XZ \mid Y]] = E[E[X \mid Y]\, E[Z \mid Y]] = E[\rho_1 Y \cdot \rho_2 Y] = \rho_1 \rho_2.$$

We can thus conclude that

$$I(X; Z) = -\frac{1}{2}\log(1 - \rho_1^2 \rho_2^2).$$
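As a sanity check on Problem 1, the following sketch (not part of the original solution) simulates one particular Gaussian Markov chain with unit variances and compares the sample correlation of X and Z with the product of the two given correlation coefficients. The construction X = rho1*Y + noise, Z = rho2*Y + noise with independent standard normal noise terms is an assumption chosen for illustration, as are the function names and the values 0.8 and 0.5.

```python
import math
import random


def gaussian_mi_nats(rho):
    """Mutual information (nats) of two jointly Gaussian variables with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


def sample_rho_xz(rho1, rho2, n=100_000, seed=0):
    """Estimate corr(X, Z) for an illustrative Gaussian Markov chain X -> Y -> Z."""
    rng = random.Random(seed)
    sxz = sxx = szz = 0.0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        # X and Z are linked only through Y; the extra noise terms are independent.
        x = rho1 * y + math.sqrt(1.0 - rho1 ** 2) * rng.gauss(0.0, 1.0)
        z = rho2 * y + math.sqrt(1.0 - rho2 ** 2) * rng.gauss(0.0, 1.0)
        sxz += x * z
        sxx += x * x
        szz += z * z
    # All means are zero by construction, so raw second moments suffice.
    return sxz / math.sqrt(sxx * szz)


rho1, rho2 = 0.8, 0.5  # illustrative values
print(sample_rho_xz(rho1, rho2), rho1 * rho2)  # sample vs. predicted correlation
print(gaussian_mi_nats(rho1 * rho2))           # I(X;Z) in nats
```

The estimate is a Monte Carlo value, so it should agree with the product only up to sampling error.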
2. Evaluate the differential entropy $h(X) = -\int f \ln f$ for the following cases:
(a) The Laplace density, $f(x) = \frac{1}{2}\lambda e^{-\lambda |x|}$. Relate this to the entropy of the exponential density $\lambda e^{-\lambda x}$, $x \ge 0$.
(b) The sum of $X_1$ and $X_2$, where $X_1$ and $X_2$ are independent normal random variables with means $\mu_i$ and variances $\sigma_i^2$, $i = 1, 2$.

Solution:
(a) Note that the Laplace density is a two-sided exponential density, so each side has the differential entropy of the exponential density, and one bit is needed to specify which side. So for $f_0(x) = \lambda e^{-\lambda x}$, $x \ge 0$, we have

$$h(f) = \frac{1}{2} h(f_0(x)) + \frac{1}{2} h(f_0(-x)) + H\!\left(\tfrac{1}{2}\right) = \log\frac{e}{\lambda} + \log 2 \ \text{bits} = \log\frac{2e}{\lambda} \ \text{bits}.$$

(b) The sum of two independent normal random variables is also normal, so $X_1 + X_2 \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$ and

$$h(f) = \frac{1}{2}\log 2\pi e(\sigma_1^2 + \sigma_2^2) \ \text{bits}.$$
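The relation between the Laplace and exponential entropies in part (a) can be checked numerically. The sketch below is an illustration only: it integrates -f log2 f with a simple midpoint rule. The rate value 1.5, the truncation point, and the helper names are assumptions made for this example.

```python
import math


def entropy_bits(pdf, lo, hi, n=200_000):
    """Approximate -integral of f*log2(f) over [lo, hi] with the midpoint rule."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        fx = pdf(lo + (i + 0.5) * step)
        if fx > 0.0:
            total -= fx * math.log2(fx) * step
    return total


lam = 1.5  # illustrative rate parameter


def laplace(x):
    return 0.5 * lam * math.exp(-lam * abs(x))


def exponential(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


cut = 40.0 / lam  # the tails beyond this point contribute a negligible amount
h_laplace = entropy_bits(laplace, -cut, cut)
h_exp = entropy_bits(exponential, 0.0, cut)
print(h_laplace, math.log2(2 * math.e / lam))  # two-sided density vs. log(2e/lambda)
print(h_exp, math.log2(math.e / lam))          # one-sided density vs. log(e/lambda)
```

The two numerical values should differ by about one bit, matching the extra bit needed to specify the side.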
3. We wish to show that any density $f_0$ can be considered to be a maximum entropy density. Let $f_0(x)$ be a density and consider the problem of maximizing $h(f)$ subject to the constraint

$$\int f(x)\, r(x)\, dx = \alpha,$$

where $r(x) = \ln f_0(x)$. Show that there is a choice of $\alpha$, $\alpha = \alpha_0$, such that the maximizing distribution is $f^*(x) = f_0(x)$. Thus $f_0(x)$ is indeed a maximum entropy density under the constraint $\int f \ln f_0 = \alpha_0$.

Solution: Given the constraint $\int r(x) f(x)\, dx = \alpha$, the maximum entropy density is

$$f^*(x) = e^{\lambda_0 + \lambda_1 r(x)}.$$

With $r(x) = \ln f_0(x)$, we have

$$f^*(x) = \frac{f_0^{\lambda_1}(x)}{\int f_0^{\lambda_1}(x)\, dx},$$

where $\lambda_1$ is chosen to satisfy the constraint. We can choose the value of the constraint to correspond to the value $\lambda_1 = 1$, i.e., $\alpha_0 = \int f_0 \ln f_0$, in which case $f^* = f_0$. So $f_0$ is a maximum entropy density under appropriate constraints.

4. Let $Y = X_1 + X_2$. Find the maximum entropy (over all distributions on $X_1$ and $X_2$) of $Y$ under the constraint $E[X_1^2] = P_1$, $E[X_2^2] = P_2$,
(a) if $X_1$ and $X_2$ are independent.
(b) if $X_1$ and $X_2$ are allowed to be dependent.
Solution: Assume that $Y = X_1 + X_2$ has some distribution with variance $\sigma^2$. Since the Gaussian distribution maximizes the entropy under a variance constraint,

$$h(Y) \le h(Y^*) = \frac{1}{2}\log(2\pi e \sigma^2),$$

where $Y^* = X_1^* + X_2^*$ is a Gaussian random variable with the same variance. Therefore, to maximize the entropy, we should maximize the variance $\sigma^2$ over all jointly Gaussian $X_1$ and $X_2$ with $E[X_1^2] = P_1$, $E[X_2^2] = P_2$. We take $X_1$ and $X_2$ to have zero mean, since a nonzero mean only reduces the variance available under the second-moment constraints.

(a) Independence implies that

$$\mathrm{Var}(Y) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) = P_1 + P_2.$$

This yields the variance $P_1 + P_2$ and the entropy $\frac{1}{2}\log 2\pi e(P_1 + P_2)$.

(b) In this case,

$$\mathrm{Var}(Y) = P_1 + P_2 + 2\rho\sqrt{P_1 P_2},$$

where $\rho \in [-1, 1]$ is the correlation coefficient between $X_1$ and $X_2$. The variance of $Y$ is maximized for $\rho = 1$. Note that $\rho = 1$ means that the two random variables are adding coherently. The maximum entropy is then $\frac{1}{2}\log 2\pi e\left(P_1 + P_2 + 2\sqrt{P_1 P_2}\right)$.

5. Let the input random variable $X$ to a channel be uniformly distributed over the interval $-\frac{1}{2} \le x \le +\frac{1}{2}$. Let the output of the channel be $Y = X + Z$, where the noise random variable $Z$ is uniformly distributed over the interval $-\frac{a}{2} \le z \le +\frac{a}{2}$.
(a) Find $I(X; Y)$ as a function of $a$.
(b) For $a = 1$ find the capacity of the channel when the input $X$ is peak-limited; that is, the range of $X$ is limited to $-\frac{1}{2} \le x \le +\frac{1}{2}$. What probability distribution on $X$ maximizes the mutual information $I(X; Y)$?
(c) [Optional] Find the capacity of the channel for all values of $a$, again assuming that the range of $X$ is limited to $-\frac{1}{2} \le x \le +\frac{1}{2}$.

Solution: The probability density function of $Y = X + Z$ is the convolution of the densities of $X$ and $Z$. Since both $X$ and $Z$ have rectangular densities, the density of $Y$ is a trapezoid. For $a < 1$ the density of $Y$ is

$$p_Y(y) = \begin{cases} \frac{1}{a}\left(y + \frac{1+a}{2}\right), & -\frac{1+a}{2} \le y \le -\frac{1-a}{2}, \\ 1, & -\frac{1-a}{2} \le y \le \frac{1-a}{2}, \\ \frac{1}{a}\left(-y + \frac{1+a}{2}\right), & \frac{1-a}{2} \le y \le \frac{1+a}{2}, \end{cases}$$

and for $a > 1$ the density of $Y$ is

$$p_Y(y) = \begin{cases} \frac{1}{a}\left(y + \frac{1+a}{2}\right), & -\frac{1+a}{2} \le y \le -\frac{a-1}{2}, \\ \frac{1}{a}, & -\frac{a-1}{2} \le y \le \frac{a-1}{2}, \\ \frac{1}{a}\left(-y + \frac{1+a}{2}\right), & \frac{a-1}{2} \le y \le \frac{1+a}{2}. \end{cases}$$

(When $a = 1$, the density of $Y$ is triangular over the interval $[-1, +1]$.)
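The trapezoidal shape can be reproduced directly from the convolution. The sketch below is illustrative and not part of the original solution: it evaluates the density of Y as 1/a times the length of overlap between the support of X and a window of width a centred at y, then checks the total mass and the height of the flat top. The helper names and the sample values of a are assumptions.

```python
def p_y(y, a):
    """Density of Y = X + Z for X ~ U[-1/2, 1/2] and Z ~ U[-a/2, a/2]."""
    # p_Y(y) = (1/a) * length of {x in [-1/2, 1/2] : |y - x| <= a/2}
    overlap = min(0.5, y + a / 2) - max(-0.5, y - a / 2)
    return max(0.0, overlap) / a


def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * step) for i in range(n)) * step


for a in (0.5, 1.0, 2.0):
    half_width = (1 + a) / 2  # Y is supported on [-(1+a)/2, (1+a)/2]
    area = integrate(lambda y: p_y(y, a), -half_width, half_width)
    print(a, area, p_y(0.0, a))  # total mass and density at the centre
```

Writing the density as an overlap length covers the cases a < 1, a = 1 and a > 1 with a single expression.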
(a) We use the identity $I(X; Y) = h(Y) - h(Y \mid X)$. It is easy to compute $h(Y)$ directly, but it is even easier to use the grouping property of entropy. First suppose that $a < 1$. With probability $1 - a$, the output $Y$ is conditionally uniformly distributed in the interval $\left[-\frac{1-a}{2}, \frac{1-a}{2}\right]$; whereas with probability $a$, $Y$ has a split triangular density in which each of the two halves has base width $a$. Hence

$$h(Y) = H(a) + (1 - a)\ln(1 - a) + a\left(\frac{1}{2} + \ln a\right) = -a\ln a - (1 - a)\ln(1 - a) + (1 - a)\ln(1 - a) + \frac{a}{2} + a\ln a = \frac{a}{2} \ \text{nats}.$$

If $a > 1$ the trapezoidal density of $Y$ can be scaled by a factor $a$, which yields $h(Y) = \ln a + \frac{1}{2a}$. Given any value of $x$, the output $Y$ is conditionally uniformly distributed over an interval of length $a$, so the conditional differential entropy in nats is $h(Y \mid X) = h(Z) = \ln a$, for all $a > 0$. Therefore the mutual information in nats is

$$I(X; Y) = \begin{cases} \frac{a}{2} - \ln a, & \text{if } a \le 1, \\ \frac{1}{2a}, & \text{if } a > 1. \end{cases}$$
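The closed form for the mutual information in part (a) can be compared with a direct numerical evaluation of h(Y) - h(Z). This is a rough sketch under stated assumptions: the density of Y is computed as an overlap length, h(Y) is approximated with a midpoint rule, and the function names and test values of a are chosen only for illustration.

```python
import math


def p_y(y, a):
    """Density of Y = X + Z for X ~ U[-1/2, 1/2] and Z ~ U[-a/2, a/2]."""
    overlap = min(0.5, y + a / 2) - max(-0.5, y - a / 2)
    return max(0.0, overlap) / a


def mutual_information_nats(a, n=200_000):
    """I(X;Y) = h(Y) - h(Z), with h(Y) from midpoint-rule integration."""
    lo, hi = -(1 + a) / 2, (1 + a) / 2
    step = (hi - lo) / n
    h_y = 0.0
    for i in range(n):
        p = p_y(lo + (i + 0.5) * step, a)
        if p > 0.0:
            h_y -= p * math.log(p) * step
    return h_y - math.log(a)  # h(Z) = ln a for uniform noise of width a


def closed_form_nats(a):
    """Piecewise expression for I(X;Y) from part (a)."""
    return a / 2 - math.log(a) if a <= 1 else 1 / (2 * a)


for a in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(a, mutual_information_nats(a), closed_form_nats(a))
```

The two columns should agree up to the discretization error of the integration.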
As expected, $I(X; Y) \to \infty$ as $a \to 0$ and $I(X; Y) \to 0$ as $a \to \infty$.

(b) As usual with additive noise, we can express $I(X; Y)$ in terms of $h(Y)$ and $h(Z)$:

$$I(X; Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z).$$

Since both $X$ and $Z$ are limited to the interval $\left[-\frac{1}{2}, \frac{1}{2}\right]$, their sum $Y$ is limited to the interval $[-1, 1]$. The differential entropy of $Y$ is at most that of a random variable uniformly distributed on that interval; that is, $h(Y) \le 1$ bit. This maximum entropy can be achieved if the input $X$ takes on its extreme values $x = \pm\frac{1}{2}$, each with probability $\frac{1}{2}$. In this case, $I(X; Y) = h(Y) - h(Z) = 1 - 0 = 1$ bit. Decoding for this channel is quite simple:

$$\hat{X} = \begin{cases} -\frac{1}{2}, & \text{if } y \le 0, \\ +\frac{1}{2}, & \text{if } y > 0. \end{cases}$$

This coding scheme transmits one bit per channel use with zero error probability. (Only a received value $y = 0$ is ambiguous, and this occurs with probability 0.)
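The zero-error claim for a = 1 can be illustrated with a short simulation. This is only a sketch: it draws the two extreme inputs with equal probability, adds uniform noise of width 1, and applies a threshold at zero. The function names, the trial count and the seed are assumptions made for this example.

```python
import random


def decode(y):
    """Threshold decoder: map the received value to one of the two extreme inputs."""
    return -0.5 if y <= 0 else 0.5


def count_errors(trials=100_000, seed=0):
    """Count decoding errors over simulated uses of the channel with a = 1."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        x = rng.choice((-0.5, 0.5))  # binary input on the extreme values
        z = rng.uniform(-0.5, 0.5)   # uniform noise of width a = 1
        if decode(x + z) != x:
            errors += 1
    return errors


print(count_errors())
```

An error could occur only when the received value is exactly zero, which has probability zero for continuous noise.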
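(c) When $a$ is of the form $\frac{1}{m}$ for $m = 2, 3, \ldots$, we can achieve the maximum possible value $I(X; Y) = \log(m + 1)$ when $X$ is uniformly distributed over the discrete points $\left\{-\frac{1}{2},\ -\frac{1}{2} + \frac{1}{m},\ \ldots,\ \frac{1}{2} - \frac{1}{m},\ \frac{1}{2}\right\}$. In this case $Y$ has a uniform probability density on the interval $\left[-\frac{1}{2} - \frac{1}{2m},\ \frac{1}{2} + \frac{1}{2m}\right]$, whose length $1 + a$ is the largest possible for the support of $Y$. Other values of $a$ are left as an exercise.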
