
Harvard SEAS

ES250 Information Theory

Homework 5 Solutions

1. Suppose that $(X, Y, Z)$ are jointly Gaussian and that $X \to Y \to Z$ forms a Markov chain. Let $X$ and $Y$ have correlation coefficient $\rho_1$ and let $Y$ and $Z$ have correlation coefficient $\rho_2$. Find $I(X; Z)$.

Solution: First note that we may assume without loss of generality that the means of $X$, $Y$ and $Z$ are zero. If the means are not zero, one can subtract the vector of means without affecting the mutual information or the conditional independence of $X$ and $Z$ given $Y$. Similarly, we can assume that the variances of $X$, $Y$ and $Z$ are 1. (The scaling may change the differential entropy, but not the mutual information.) Let

$$\Lambda = \begin{pmatrix} 1 & \rho_{xz} \\ \rho_{xz} & 1 \end{pmatrix}$$

be the covariance matrix of $X$ and $Z$. From Eqs. (9.93) and (9.94),

$$\begin{aligned}
I(X; Z) &= h(X) + h(Z) - h(X, Z) \\
&= \tfrac{1}{2} \log(2\pi e) + \tfrac{1}{2} \log(2\pi e) - \tfrac{1}{2} \log\big((2\pi e)^2 |\Lambda|\big) \\
&= -\tfrac{1}{2} \log(1 - \rho_{xz}^2).
\end{aligned}$$

Now from the conditional independence of $X$ and $Z$ given $Y$, we have

$$\rho_{xz} = E[XZ] = E[E[XZ \mid Y]] = E[E[X \mid Y]\, E[Z \mid Y]] = E[\rho_1 Y \cdot \rho_2 Y] = \rho_1 \rho_2.$$

We can thus conclude that

$$I(X; Z) = -\tfrac{1}{2} \log(1 - \rho_1^2 \rho_2^2).$$
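The result of Problem 1 can be checked numerically. The following is a minimal sketch, not part of the original solution: the function names are illustrative, and the construction $Y = \rho_1 X + \sqrt{1-\rho_1^2}\,W_1$, $Z = \rho_2 Y + \sqrt{1-\rho_2^2}\,W_2$ with independent standard normal $W_1, W_2$ is just one assumed way to realize a unit-variance Gaussian Markov chain.

```python
import math
import random


def mutual_information_xz(rho1, rho2):
    """I(X;Z) in nats for a unit-variance Gaussian Markov chain X -> Y -> Z."""
    rho_xz = rho1 * rho2          # correlation of X and Z through the chain
    det = 1.0 - rho_xz ** 2       # determinant of [[1, rho_xz], [rho_xz, 1]]
    return -0.5 * math.log(det)


def simulate_rho_xz(rho1, rho2, n=200_000, seed=0):
    """Monte Carlo estimate of E[XZ] for one assumed construction of the chain."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Y depends on X plus fresh noise; Z depends on Y plus fresh noise,
        # so X and Z are conditionally independent given Y.
        y = rho1 * x + math.sqrt(1.0 - rho1 ** 2) * rng.gauss(0.0, 1.0)
        z = rho2 * y + math.sqrt(1.0 - rho2 ** 2) * rng.gauss(0.0, 1.0)
        total += x * z
    return total / n
```

For example, `simulate_rho_xz(0.8, 0.5)` should come out close to the product 0.4, and `mutual_information_xz(0.8, 0.5)` evaluates the closed form with that product.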

2. Evaluate the differential entropy $h(X) = -\int f \ln f$ for the following cases:

(a) The Laplace density, $f(x) = \frac{1}{2} \lambda e^{-\lambda |x|}$. Relate this to the entropy of the exponential density $\lambda e^{-\lambda x}$, $x \ge 0$.

(b) The sum of $X_1$ and $X_2$, where $X_1$ and $X_2$ are independent normal random variables with means $\mu_i$ and variances $\sigma_i^2$, $i = 1, 2$.

Solution:


(a) Note that the Laplace density is a two-sided exponential density, so each side has the differential entropy of the exponential density, and one bit is needed to specify which side. So for $f_0(x) = \lambda e^{-\lambda x}$, $x \ge 0$, we have

$$h(f) = \tfrac{1}{2} h(f_0(x)) + \tfrac{1}{2} h(f_0(-x)) + H\!\left(\tfrac{1}{2}\right) = \log \frac{e}{\lambda} + \log 2 = \log \frac{2e}{\lambda} \text{ bits}.$$

(b) The sum of two independent normal random variables is also normal, so $X_1 + X_2 \sim \mathcal{N}(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$ and

$$h(f) = \tfrac{1}{2} \log 2\pi e (\sigma_1^2 + \sigma_2^2) \text{ bits}.$$
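The Laplace entropy in part (a) can be sanity-checked by direct numerical integration of $-\int f \ln f$. This is a rough sketch with assumed names and an assumed truncation of the integral to a wide finite interval. It works in nats, where the closed form reads $\ln(2e/\lambda)$.

```python
import math


def laplace_entropy_numeric(lam, half_width=40.0, steps=200_000):
    """Midpoint-rule estimate of -integral(f ln f) in nats for the Laplace density."""
    limit = half_width / lam      # integrate over [-limit, limit]; tails are negligible
    dx = 2.0 * limit / steps
    total = 0.0
    for i in range(steps):
        x = -limit + (i + 0.5) * dx
        log_f = math.log(lam / 2.0) - lam * abs(x)   # ln f(x), f(x) = (lam/2) exp(-lam|x|)
        total -= math.exp(log_f) * log_f * dx
    return total


def laplace_entropy_closed_form(lam):
    """ln(2e/lam) nats: the closed form above, expressed with natural logarithms."""
    return math.log(2.0 * math.e / lam)
```

The two functions should agree to several decimal places for any $\lambda > 0$; dividing either value by $\ln 2$ converts it to bits.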

3. We wish to show that any density $f_0$ can be considered to be a maximum entropy density. Let $f_0(x)$ be a density and consider the problem of maximizing $h(f)$ subject to the constraint

$$\int f(x)\, r(x)\, dx = \alpha,$$

where $r(x) = \ln f_0(x)$. Show that there is a choice of $\alpha$, $\alpha = \alpha_0$, such that the maximizing distribution is $f^*(x) = f_0(x)$. Thus $f_0(x)$ is indeed a maximum entropy density under the constraint $\int f \ln f_0 = \alpha_0$.

Solution: Given the constraint $\int r(x) f(x)\, dx = \alpha$, the maximum entropy density is

$$f^*(x) = e^{\lambda_0 + \lambda_1 r(x)}.$$

With $r(x) = \ln f_0(x)$, we have

$$f^*(x) = \frac{f_0^{\lambda_1}(x)}{\int f_0^{\lambda_1}(x)\, dx},$$

where $\lambda_1$ is chosen to satisfy the constraint. We can choose the value of the constraint to correspond to the value $\lambda_1 = 1$, i.e., $\alpha_0 = \int f_0 \ln f_0$, in which case $f^* = f_0$. So $f_0$ is a maximum entropy density under appropriate constraints.

4. Let $Y = X_1 + X_2$. Find the maximum entropy (over all distributions on $X_1$ and $X_2$) of $Y$ under the constraint $E[X_1^2] = P_1$, $E[X_2^2] = P_2$,

(a) if $X_1$ and $X_2$ are independent;

(b) if $X_1$ and $X_2$ are allowed to be dependent.


Solution: Assume that $Y = X_1 + X_2$ has some distribution with variance $\sigma^2$. Since the Gaussian distribution maximizes the entropy under a variance constraint,

$$h(Y) \le h(Y^*) = \tfrac{1}{2} \log(2\pi e \sigma^2),$$

where $Y^* = X_1^* + X_2^*$ is a Gaussian random variable with the same variance. Therefore, to maximize the entropy, we should maximize the variance $\sigma^2$ over all jointly Gaussian $X_1$ and $X_2$ with $E[X_1^2] = P_1$, $E[X_2^2] = P_2$. We may take $X_1$ and $X_2$ to be zero-mean, since a nonzero mean only reduces the variance available under the second-moment constraints.

(a) Independence implies that

$$\mathrm{Var}(Y) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) = P_1 + P_2.$$

This yields the variance $P_1 + P_2$ and the entropy

$$\tfrac{1}{2} \log 2\pi e (P_1 + P_2).$$

(b) In this case,

$$\mathrm{Var}(Y) = P_1 + P_2 + 2\rho \sqrt{P_1 P_2},$$

where $\rho \in [-1, 1]$ is the correlation coefficient between $X_1$ and $X_2$. The variance of $Y$ is maximized for $\rho = 1$. Note that $\rho = 1$ means that the two random variables add coherently. The maximum entropy is then

$$\tfrac{1}{2} \log 2\pi e \big(P_1 + P_2 + 2\sqrt{P_1 P_2}\big).$$

5. Let the input random variable $X$ to a channel be uniformly distributed over the interval $-\frac{1}{2} \le x \le +\frac{1}{2}$. Let the output of the channel be $Y = X + Z$, where the noise random variable $Z$ is uniformly distributed over the interval $-\frac{a}{2} \le z \le +\frac{a}{2}$.

(a) Find $I(X; Y)$ as a function of $a$.

(b) For $a = 1$ find the capacity of the channel when the input $X$ is peak-limited; that is, the range of $X$ is limited to $-\frac{1}{2} \le x \le +\frac{1}{2}$. What probability distribution on $X$ maximizes the mutual information $I(X; Y)$?

(c) [Optional] Find the capacity of the channel for all values of $a$, again assuming that the range of $X$ is limited to $-\frac{1}{2} \le x \le +\frac{1}{2}$.

Solution: The probability density function of $Y = X + Z$ is the convolution of the densities of $X$ and $Z$. Since both $X$ and $Z$ have rectangular densities, the density of $Y$ is a trapezoid. For $a < 1$ the density of $Y$ is

$$p_Y(y) = \begin{cases}
\frac{1}{a}\left(y + \frac{1+a}{2}\right), & -\frac{1+a}{2} \le y \le -\frac{1-a}{2}, \\
1, & -\frac{1-a}{2} \le y \le \frac{1-a}{2}, \\
\frac{1}{a}\left(-y + \frac{1+a}{2}\right), & \frac{1-a}{2} \le y \le \frac{1+a}{2},
\end{cases}$$

and for $a > 1$ the density of $Y$ is

$$p_Y(y) = \begin{cases}
\frac{1}{a}\left(y + \frac{1+a}{2}\right), & -\frac{a+1}{2} \le y \le -\frac{a-1}{2}, \\
\frac{1}{a}, & -\frac{a-1}{2} \le y \le \frac{a-1}{2}, \\
\frac{1}{a}\left(-y + \frac{1+a}{2}\right), & \frac{a-1}{2} \le y \le \frac{a+1}{2}.
\end{cases}$$

(When $a = 1$, the density of $Y$ is triangular over the interval $[-1, +1]$.)
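The trapezoidal shape can be explored numerically. The sketch below is illustrative only (the function names are assumptions): it evaluates the density of $Y$ directly from the convolution, as $1/a$ times the length of the set of $x \in [-\frac{1}{2}, \frac{1}{2}]$ with $|y - x| \le \frac{a}{2}$, and then integrates $-p \ln p$ by the midpoint rule to estimate $h(Y)$ in nats.

```python
import math


def density_y(y, a):
    """Density of Y = X + Z with X ~ U[-1/2, 1/2] and Z ~ U[-a/2, a/2], a > 0."""
    # Overlap of [-1/2, 1/2] with [y - a/2, y + a/2], scaled by the noise density 1/a.
    lo = max(-0.5, y - a / 2.0)
    hi = min(0.5, y + a / 2.0)
    return max(0.0, hi - lo) / a


def integrate_over_support(g, a, steps=100_000):
    """Midpoint-rule integral of g over the support [-(1+a)/2, (1+a)/2] of Y."""
    left = -(1.0 + a) / 2.0
    dy = (1.0 + a) / steps
    return sum(g(left + (i + 0.5) * dy) for i in range(steps)) * dy


def entropy_y_numeric(a):
    """Numerical estimate of the differential entropy h(Y) in nats."""
    def integrand(y):
        p = density_y(y, a)
        return -p * math.log(p) if p > 0.0 else 0.0
    return integrate_over_support(integrand, a)
```

Integrating `density_y` over the support should give 1 for any $a > 0$, and the values of `entropy_y_numeric(a)` can be compared with the closed forms for $h(Y)$ obtained in part (a) of the solution.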


(a) We use the identity $I(X; Y) = h(Y) - h(Y \mid X)$. It is easy to compute $h(Y)$ directly, but it is even easier to use the grouping property of entropy. First suppose that $a < 1$. With probability $1 - a$, the output $Y$ is conditionally uniformly distributed in the interval $\left[-\frac{1-a}{2}, \frac{1-a}{2}\right]$; whereas with probability $a$, $Y$ has a split triangular density where each half of the triangle has base width $a$. Hence

$$\begin{aligned}
h(Y) &= H(a) + (1 - a) \ln(1 - a) + a\left(\tfrac{1}{2} + \ln a\right) \\
&= -a \ln a - (1 - a) \ln(1 - a) + (1 - a) \ln(1 - a) + \tfrac{a}{2} + a \ln a \\
&= \tfrac{a}{2} \text{ nats}.
\end{aligned}$$

If $a > 1$, the trapezoidal density of $Y$ can be scaled by a factor $a$, which yields $h(Y) = \ln a + \frac{1}{2a}$. Given any value of $x$, the output $Y$ is conditionally uniformly distributed over an interval of length $a$, so the conditional differential entropy in nats is $h(Y \mid X) = h(Z) = \ln a$, for all $a > 0$. Therefore the mutual information in nats is

$$I(X; Y) = \begin{cases}
\frac{a}{2} - \ln a, & \text{if } a \le 1, \\
\frac{1}{2a}, & \text{if } a > 1.
\end{cases}$$

As expected, $I(X; Y) \to \infty$ as $a \to 0$ and $I(X; Y) \to 0$ as $a \to \infty$.

(b) As usual with additive noise, we can express $I(X; Y)$ in terms of $h(Y)$ and $h(Z)$:

$$I(X; Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z).$$

Since both $X$ and $Z$ are limited to the interval $\left[-\frac{1}{2}, \frac{1}{2}\right]$, their sum $Y$ is limited to the interval $[-1, 1]$. The differential entropy of $Y$ is at most that of a random variable uniformly distributed on that interval; that is, $h(Y) \le 1$ bit. This maximum entropy can be achieved if the input $X$ takes on its extreme values $x = \pm\frac{1}{2}$, each with probability $\frac{1}{2}$. In this case, $I(X; Y) = h(Y) - h(Z) = 1 - 0 = 1$ bit. Decoding for this channel is quite simple:

$$\hat{X} = \begin{cases}
-\frac{1}{2}, & \text{if } y \le 0, \\
+\frac{1}{2}, & \text{if } y > 0.
\end{cases}$$

This coding scheme transmits one bit per channel use with zero error probability. (Only a received value $y = 0$ is ambiguous, and this occurs with probability 0.)
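The zero-error claim for the threshold decoder in part (b) can be illustrated with a short simulation. This is only a sketch under the stated setup ($a = 1$, input equally likely to be $-\frac{1}{2}$ or $+\frac{1}{2}$); the function names, trial count, and seed are assumptions made for the example.

```python
import random


def decode(y):
    """Threshold decoder: estimate -1/2 when y <= 0 and +1/2 when y > 0."""
    return -0.5 if y <= 0.0 else 0.5


def count_decoding_errors(trials=100_000, seed=0):
    """Send x = -1/2 or +1/2 (equally likely) through Y = X + Z, Z ~ U[-1/2, 1/2]."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        x = rng.choice((-0.5, 0.5))
        z = rng.uniform(-0.5, 0.5)
        if decode(x + z) != x:
            errors += 1
    return errors
```

Because the two noise intervals around $x = -\frac{1}{2}$ and $x = +\frac{1}{2}$ touch only at $y = 0$, `count_decoding_errors()` should report no errors.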
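(c) Since $Y$ is confined to an interval of length $1 + a$, we have $h(Y) \le \log(1 + a)$, and therefore $I(X; Y) = h(Y) - h(Z) \le \log(1 + a) - \log a = \log\left(1 + \frac{1}{a}\right)$. When $a$ is of the form $\frac{1}{m}$ for $m = 1, 2, 3, \ldots$, we can achieve this maximum possible value, $I(X; Y) = \log(m + 1)$, when $X$ is uniformly distributed over the $m + 1$ discrete points $x_k = -\frac{1}{2} + \frac{k}{m}$, $k = 0, 1, \ldots, m$. In this case $Y$ has a uniform probability density on the interval $\left[-\frac{1}{2} - \frac{1}{2m},\ \frac{1}{2} + \frac{1}{2m}\right]$. Other values of $a$ are left as an exercise.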