
EE376B/Stat 376B Information Theory
Prof. T. Cover
Handout #5
Tuesday, April 12, 2011

Solutions to Homework Set #1

1. Differential entropy. Evaluate the differential entropy $h(X) = -\int f \ln f$ for the following:

(a) The Laplace density, $f(x) = \frac{1}{2}\lambda e^{-\lambda |x|}$. Relate this to the entropy of the exponential density $\lambda e^{-\lambda x}$, $x \ge 0$.

(b) The sum of $X_1$ and $X_2$, where $X_1$ and $X_2$ are independent normal random variables with means $\mu_i$ and variances $\sigma_i^2$, $i = 1, 2$.

Solution: Differential entropy.

(a) Laplace density. Note that the Laplace density is a two-sided exponential density, so each side has the differential entropy of the exponential, and one bit is needed to specify which side. So for $f_0(x) = \lambda e^{-\lambda x}$, $x \ge 0$, we have
$$h(f) = \frac{1}{2} h(f_0(x)) + \frac{1}{2} h(f_0(-x)) + H\!\left(\frac{1}{2}\right) \quad (1)$$
$$= \log \frac{e}{\lambda} + \log 2 \text{ bits} \quad (2)$$
$$= \log \frac{2e}{\lambda} \text{ bits.} \quad (3)$$

(b) Sum of two independent normal distributions. The sum of two independent normal random variables is also normal, so applying the result derived in class for the normal distribution, since $X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$,
$$h(f) = \frac{1}{2} \log 2\pi e (\sigma_1^2 + \sigma_2^2) \text{ bits.} \quad (4)$$
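As a quick numerical sanity check of part (a), the sketch below (not part of the original handout; the value of $\lambda$ is an arbitrary assumption) compares a Monte Carlo estimate of $-E \log_2 f(X)$ against the closed form $\log_2(2e/\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5                                   # assumed rate parameter, for illustration only
x = rng.laplace(loc=0.0, scale=1.0 / lam, size=200_000)

# Laplace density f(x) = (lam/2) * exp(-lam*|x|), so -log2 f(X) = -log2(lam/2) + lam*|X|/ln 2
h_mc = -(np.log2(lam / 2) - lam * np.abs(x) / np.log(2)).mean()

h_closed = np.log2(2 * np.e / lam)          # log(2e/lambda) bits, as in (3)
print(h_mc, h_closed)                       # the two estimates should agree closely
```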

2. Maximum entropy. A die comes up 6 twice as often as it comes up 1. What is the maximum entropy $(p_1, p_2, \ldots, p_6)$?

Solution: We have the constraint $p_6 = 2 p_1$, i.e. $2 p_1 - p_6 = 0$. This can be written as a constraint
$$E f(X) = 0, \quad (5)$$
where
$$f(x) = \begin{cases} -1, & x = 6 \\ 2, & x = 1 \\ 0, & \text{otherwise.} \end{cases}$$
Now the optimal distribution is $p^*(x) = \exp(\lambda_0 + \lambda_1 f(x))$, where $\lambda_0$ and $\lambda_1$ are determined using the constraints $\sum_x p(x) = 1$ and $E f(X) = 0$. From equation (5), we have
$$2 \exp(\lambda_0 + 2\lambda_1) - \exp(\lambda_0 - \lambda_1) = 0$$
$$2 \exp(\lambda_0 + 2\lambda_1) = \exp(\lambda_0 - \lambda_1)$$
$$\exp(3\lambda_1) = \tfrac{1}{2}$$
$$\lambda_1 = -\tfrac{1}{3} \ln 2.$$
Finding $\lambda_0$ is equivalent to normalizing the distribution. Thus
$$p^*(x) = \begin{cases} \dfrac{\exp(-\lambda_1)}{\exp(2\lambda_1) + \exp(-\lambda_1) + 4}, & x = 6 \\[2ex] \dfrac{\exp(2\lambda_1)}{\exp(2\lambda_1) + \exp(-\lambda_1) + 4}, & x = 1 \\[2ex] \dfrac{1}{\exp(2\lambda_1) + \exp(-\lambda_1) + 4}, & \text{otherwise,} \end{cases}$$
that is, $p^*(6) = \frac{2^{1/3}}{Z}$, $p^*(1) = \frac{2^{-2/3}}{Z}$, and $p^*(x) = \frac{1}{Z}$ for $x = 2, 3, 4, 5$, with $Z = 2^{-2/3} + 2^{1/3} + 4$.
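As a numerical check of this answer (a sketch, not from the handout; the use of scipy's SLSQP optimizer is an assumption), one can maximize the entropy directly over distributions satisfying $p_6 = 2p_1$ and compare with the closed form above:

```python
import numpy as np
from scipy.optimize import minimize

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)              # guard against log(0)
    return np.sum(p * np.log2(p))

constraints = [{'type': 'eq', 'fun': lambda p: p.sum() - 1.0},
               {'type': 'eq', 'fun': lambda p: p[5] - 2.0 * p[0]}]   # p6 = 2 p1
res = minimize(neg_entropy, np.ones(6) / 6, bounds=[(0, 1)] * 6, constraints=constraints)

# Closed form: p1 = 2^(-2/3)/Z, p6 = 2^(1/3)/Z, others 1/Z, with Z = 2^(-2/3) + 2^(1/3) + 4
Z = 2**(-2/3) + 2**(1/3) + 4
p_closed = np.array([2**(-2/3), 1, 1, 1, 1, 2**(1/3)]) / Z
print(np.round(res.x, 4))
print(np.round(p_closed, 4))                # the two distributions should match
```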

3. Maximum entropy of atmosphere. Maximize $h(Z, V_x, V_y, V_z)$, $Z \ge 0$, $(V_x, V_y, V_z) \in \mathbb{R}^3$, subject to the energy constraint $E\left( \frac{1}{2} m \|V\|^2 + m g Z \right) = E_0$. Show that the resulting distribution yields
$$E \, \tfrac{1}{2} m \|V\|^2 = \tfrac{3}{5} E_0, \qquad E \, m g Z = \tfrac{2}{5} E_0.$$
Thus $\frac{2}{5}$ of the energy is stored in the potential field, regardless of its strength $g$.

Solution: Maximum entropy of atmosphere. As derived in class, the maximum entropy distribution subject to the constraint
$$E \left( \frac{1}{2} m v^2 + m g Z \right) = E_0 \quad (6)$$
is of the form
$$f(z, v_x, v_y, v_z) = C e^{-\lambda \left( \frac{1}{2} m v^2 + m g Z \right)} = C e^{-\frac{\lambda m}{2} v_x^2} e^{-\frac{\lambda m}{2} v_y^2} e^{-\frac{\lambda m}{2} v_z^2} e^{-\lambda m g Z}.$$
We recognize this as a product distribution of four independent random variables with $V_x, V_y, V_z \sim N\left(0, \frac{1}{\lambda m}\right)$ and $Z \sim \text{Exp}(\lambda m g)$. Therefore,
$$E(m g Z) = m g \cdot \frac{1}{\lambda m g} = \frac{1}{\lambda} \quad (7)$$
$$E\left( \frac{1}{2} m V_x^2 \right) = \frac{1}{2} m \cdot \frac{1}{\lambda m} = \frac{1}{2\lambda}. \quad (8)$$
The constraint on energy yields $\frac{5}{2\lambda} = E_0$. This immediately gives $E(m g Z) = \frac{2}{5} E_0$ and $E\left( \frac{1}{2} m \|V\|^2 \right) = \frac{3}{5} E_0$. The split of energy between kinetic energy and potential energy is $3:2$, regardless of the strength of the gravitational field $g$.
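A Monte Carlo sketch of this conclusion (not from the handout; the values of $m$, $g$, and $\lambda$ are arbitrary assumptions): sample from the product-form maximum entropy distribution and check that the potential energy carries $\frac{2}{5}$ and the kinetic energy $\frac{3}{5}$ of the total, whatever $g$ is.

```python
import numpy as np

rng = np.random.default_rng(1)
m, g, lam = 2.0, 9.8, 0.7                   # arbitrary illustrative constants
n = 500_000

# Maximum entropy form: V_x, V_y, V_z ~ N(0, 1/(lam*m)), Z ~ Exponential(rate lam*m*g)
v = rng.normal(0.0, np.sqrt(1.0 / (lam * m)), size=(n, 3))
z = rng.exponential(scale=1.0 / (lam * m * g), size=n)

kinetic = 0.5 * m * np.sum(v**2, axis=1)
potential = m * g * z
E0 = kinetic.mean() + potential.mean()

print(potential.mean() / E0)                # close to 2/5, independent of g
print(kinetic.mean() / E0)                  # close to 3/5
```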

4. Nonergodic Gaussian process. Consider a constant signal $V$ in the presence of i.i.d. observational noise $\{Z_i\}$. Thus $X_i = V + Z_i$, where $V \sim N(0, S)$, and the $Z_i$ are i.i.d. $\sim N(0, N)$. Assume $V$ and $\{Z_i\}$ are independent.

(a) Is $\{X_i\}$ stationary?

(b) Find $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i$. Is the limit random?

(c) What is the entropy rate $h$ of $\{X_i\}$?

(d) Find the least mean squared error predictor $\hat{X}_{n+1}(X^n)$ and find $\sigma^2 = \lim_{n \to \infty} E(\hat{X}_n - X_n)^2$.

(e) Does $\{X_i\}$ have an AEP? That is, does $-\frac{1}{n} \log f(X^n) \to h$?

Solution: Nonergodic Gaussian process.

(a) Yes. $E X_i = E(V + Z_i) = 0$ for all $i$, and
$$E X_i X_j = E (V + Z_i)(V + Z_j) = \begin{cases} S + N, & i = j \\ S, & i \neq j. \end{cases} \quad (9)$$
Since $X_i$ is Gaussian distributed, it is completely characterized by its first and second moments. Since the moments are shift invariant, $\{X_i\}$ is wide-sense stationary, which for a Gaussian process implies that $\{X_i\}$ is stationary.
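A quick empirical check of the covariance structure in (9) (a sketch; the values of $S$ and $N$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
S, N, n, trials = 4.0, 1.0, 4, 200_000

V = rng.normal(0.0, np.sqrt(S), size=(trials, 1))
X = V + rng.normal(0.0, np.sqrt(N), size=(trials, n))   # X_i = V + Z_i, one realization per row

# Empirical E[X_i X_j]: roughly S+N on the diagonal and S off the diagonal
print(np.round(X.T @ X / trials, 2))
```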

(b)
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} (Z_i + V) \quad (10)$$
$$= V + \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} Z_i \quad (11)$$
$$= V + E Z_i \quad \text{(by the strong law of large numbers)} \quad (12)$$
$$= V. \quad (13)$$

The limit is a random variable $\sim N(0, S)$.

(c) Note that $X^n \sim N(0, K_{X^n})$, where $K_{X^n}$ has diagonal values $S + N$ and off-diagonal values $S$. Also observe that the determinant is $|K_{X^n}| = N^n (nS/N + 1)$. We now compute the entropy rate as
$$h(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} h(X_1, X_2, \ldots, X_n) \quad (14)$$
$$= \lim_{n \to \infty} \frac{1}{2n} \log \left( (2\pi e)^n |K_{X^n}| \right) \quad (15)$$
$$= \lim_{n \to \infty} \frac{1}{2n} \log \left( (2\pi e)^n N^n \left( \frac{nS}{N} + 1 \right) \right) \quad (16)$$
$$= \lim_{n \to \infty} \left[ \frac{1}{2n} n \log (2\pi e N) + \frac{1}{2n} \log \left( \frac{nS}{N} + 1 \right) \right] \quad (17)$$
$$= \frac{1}{2} \log 2\pi e N + \lim_{n \to \infty} \frac{1}{2n} \log \left( \frac{nS}{N} + 1 \right) \quad (18)$$
$$= \frac{1}{2} \log 2\pi e N. \quad (19)$$

(d) By iterated expectation we can write
$$E \left( X_{n+1} - \hat{X}_{n+1}(X^n) \right)^2 = E \left[ E \left( \left( X_{n+1} - \hat{X}_{n+1}(X^n) \right)^2 \,\middle|\, X^n \right) \right]. \quad (20)$$
We note that minimizing this expression is equivalent to minimizing the inner expectation, and that inside the inner expectation the predictor is a nonrandom quantity. Expanding the inner expectation,
$$E \left( \left( X_{n+1} - \hat{X}_{n+1}(X^n) \right)^2 \middle| X^n \right) = E \left( X_{n+1}^2 - 2 X_{n+1} \hat{X}_{n+1}(X^n) + \hat{X}_{n+1}^2(X^n) \middle| X^n \right), \quad (21)$$
and taking the derivative with respect to the estimator $\hat{X}_{n+1}(X^n)$, we get
$$\frac{d}{d \hat{X}_{n+1}(X^n)} E \left( \left( X_{n+1} - \hat{X}_{n+1}(X^n) \right)^2 \middle| X^n \right) = E \left( -2 X_{n+1} + 2 \hat{X}_{n+1}(X^n) \middle| X^n \right) \quad (22)$$
$$= -2 E(X_{n+1} | X^n) + 2 \hat{X}_{n+1}(X^n). \quad (23)$$
Setting the derivative equal to 0, we see that the optimal predictor is $\hat{X}_{n+1}(X^n) = E(X_{n+1} | X^n)$. To find the limiting squared error for this estimator, we use the fact that $V$ and $X^n$ are jointly normal with known covariance matrix, and therefore the conditional distribution of $V$ given $X^n$ is
$$f(V | X^n) \sim N \left( \frac{S}{nS + N} \sum_{i=1}^{n} X_i, \; \frac{SN}{nS + N} \right). \quad (24)$$
Now
$$\hat{X}_{n+1}(X^n) = E(X_{n+1} | X^n) \quad (25)$$
$$= E(V | X^n) + E(Z_{n+1} | X^n) \quad (26)$$
$$= \frac{S}{nS + N} \sum_{i=1}^{n} X_i + 0, \quad (27)$$
and hence the limiting squared error is
$$\sigma^2 = \lim_{n \to \infty} E \left( \hat{X}_n - X_n \right)^2 \quad (28)$$
$$= \lim_{n \to \infty} E \left( \frac{S}{(n-1)S + N} \sum_{i=1}^{n-1} X_i - X_n \right)^2 \quad (29)$$
$$= \lim_{n \to \infty} E \left( \frac{S}{(n-1)S + N} \sum_{i=1}^{n-1} (Z_i + V) - Z_n - V \right)^2 \quad (30)$$
$$= \lim_{n \to \infty} E \left( \frac{S}{(n-1)S + N} \sum_{i=1}^{n-1} Z_i - Z_n - \frac{N}{(n-1)S + N} V \right)^2 \quad (31)$$
$$= \lim_{n \to \infty} \left[ \left( \frac{S}{(n-1)S + N} \right)^2 \sum_{i=1}^{n-1} E Z_i^2 + E Z_n^2 + \left( \frac{N}{(n-1)S + N} \right)^2 E V^2 \right] \quad (32)$$
$$= \lim_{n \to \infty} \left[ \left( \frac{S}{(n-1)S + N} \right)^2 (n-1) N + N + \left( \frac{N}{(n-1)S + N} \right)^2 S \right] \quad (33)$$
$$= 0 + N + 0 \quad (34)$$
$$= N, \quad (35)$$
where the cross terms in (32) vanish because $Z_1, \ldots, Z_n$ and $V$ are independent with zero mean.
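A simulation sketch of parts (b) and (d) (not from the handout; the values of $S$, $N$, $n$ are illustrative assumptions): the sample mean of each realization settles near that realization's draw of $V$, so the limit is random with variance close to $S$, while the squared error of the predictor $\hat{X}_{n+1}(X^n) = \frac{S}{nS+N} \sum_i X_i$ approaches $N$.

```python
import numpy as np

rng = np.random.default_rng(2)
S, N = 4.0, 1.0                            # illustrative signal and noise variances
n, trials = 2_000, 5_000

V = rng.normal(0.0, np.sqrt(S), size=trials)              # one V per realization
Z = rng.normal(0.0, np.sqrt(N), size=(trials, n + 1))
X = V[:, None] + Z                                        # X_i = V + Z_i

# (b) the sample mean over the first n samples converges to V, a random limit
sample_mean = X[:, :n].mean(axis=1)
print(np.mean((sample_mean - V) ** 2))     # ~N/n, i.e. each path converges to its own V
print(sample_mean.var())                   # ~S: the limit is distributed N(0, S)

# (d) best predictor of X_{n+1} from X^n and its limiting squared error (should be ~N)
pred = S / (n * S + N) * X[:, :n].sum(axis=1)
print(np.mean((X[:, n] - pred) ** 2))      # ~N
```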

(e) Even though the process is not ergodic, it is stationary, and it does have an AEP, because
$$-\frac{1}{n} \ln f(X^n) = -\frac{1}{n} \ln \left[ \frac{1}{(2\pi)^{n/2} |K_{X^n}|^{1/2}} e^{-X^t K_{X^n}^{-1} X / 2} \right] \quad (36)$$
$$= \frac{1}{2n} \ln (2\pi)^n + \frac{1}{2n} \ln |K_{X^n}| + \frac{1}{2n} X^t K_{X^n}^{-1} X \quad (37)$$
$$= \frac{1}{2n} \ln \left( (2\pi e)^n |K_{X^n}| \right) - \frac{1}{2} + \frac{1}{2n} X^t K_{X^n}^{-1} X \quad (38)$$
$$= \frac{1}{n} h(X^n) - \frac{1}{2} + \frac{1}{2n} X^t K_{X^n}^{-1} X \quad (39)$$
$$= h_n - \frac{1}{2} + \frac{1}{2n} X^t K_{X^n}^{-1} X, \quad (40)$$
where $h_n = \frac{1}{n} h(X^n)$.

Since $X \sim N(0, K)$, we can write $X = K^{1/2} W$, where $W \sim N(0, I)$. Then $X^t K^{-1} X = W^t K^{1/2} K^{-1} K^{1/2} W = W^t W = \sum W_i^2$, and therefore $X^t K^{-1} X$ has a $\chi^2$ distribution with $n$ degrees of freedom. The density of the $\chi^2$ distribution is
$$f(x) = \frac{x^{\frac{n}{2} - 1} e^{-\frac{x}{2}}}{\Gamma\!\left(\frac{n}{2}\right) 2^{\frac{n}{2}}}. \quad (41)$$
The moment generating function of the $\chi^2$ distribution is
$$M(t) = \int f(x) e^{tx} \, dx \quad (42)$$
$$= \int \frac{x^{\frac{n}{2} - 1} e^{-\frac{x}{2}}}{\Gamma\!\left(\frac{n}{2}\right) 2^{\frac{n}{2}}} \, e^{tx} \, dx \quad (43)$$
$$= \frac{1}{(1 - 2t)^{\frac{n}{2}}} \int \frac{((1 - 2t)x)^{\frac{n}{2} - 1} e^{-(1 - 2t)x/2}}{\Gamma\!\left(\frac{n}{2}\right) 2^{\frac{n}{2}}} (1 - 2t) \, dx \quad (44)$$
$$= \frac{1}{(1 - 2t)^{\frac{n}{2}}}. \quad (45)$$
By the Chernoff bound (Lemma 11.19.1),
$$\Pr \left\{ \frac{1}{n} \sum W_i^2 > 1 + \epsilon \right\} \le \min_{s} e^{-s(1+\epsilon)n} (1 - 2s)^{-\frac{n}{2}} \quad (46)$$
$$= e^{-\frac{n}{2} (\epsilon - \ln(1+\epsilon))}, \quad (47)$$
setting $s = \frac{\epsilon}{2(1+\epsilon)}$. Thus
$$\Pr \left\{ -\frac{1}{n} \ln f(X^n) - h_n > \epsilon \right\} = \Pr \left\{ \frac{1}{n} \sum W_i^2 > 1 + 2\epsilon \right\} \quad (48)$$
$$\le e^{-\frac{n}{2} (2\epsilon - \ln(1+2\epsilon))}, \quad (49)$$
and the bound decays exponentially in $n$, hence is summable. (The lower tail $\Pr\{-\frac{1}{n} \ln f(X^n) - h_n < -\epsilon\}$ is bounded similarly.) Therefore, by the Borel-Cantelli lemma,
$$-\frac{1}{n} \ln f(X^n) - h_n \to 0 \quad (50)$$
with probability 1. So $\{X_i\}$ satisfies the AEP even though it is not ergodic.
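A simulation sketch of this AEP (not from the handout; the parameter values are illustrative assumptions): evaluate $-\frac{1}{n} \ln f(X^n)$ for many realizations and check that it concentrates around $h_n = \frac{1}{2n} \ln\left((2\pi e)^n |K_{X^n}|\right)$.

```python
import numpy as np

rng = np.random.default_rng(3)
S, N, n, trials = 4.0, 1.0, 500, 2_000

K = S * np.ones((n, n)) + N * np.eye(n)                  # covariance of X^n
V = rng.normal(0.0, np.sqrt(S), size=(trials, 1))
X = V + rng.normal(0.0, np.sqrt(N), size=(trials, n))    # X_i = V + Z_i, one realization per row

# -(1/n) ln f(X^n) for a zero-mean Gaussian with covariance K
_, logdetK = np.linalg.slogdet(K)
quad = np.einsum('ij,jk,ik->i', X, np.linalg.inv(K), X)  # X^t K^{-1} X, one value per row
neg_log_f = 0.5 * (n * np.log(2 * np.pi) + logdetK + quad) / n

h_n = 0.5 * (n * np.log(2 * np.pi * np.e) + logdetK) / n # (1/n) h(X^n), in nats
print(neg_log_f.mean(), h_n)                             # the mean matches h_n
print(neg_log_f.std())                                   # small spread: concentration around h_n
```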

5. Maximum entropy.

(a) What is the parametric-form maximum entropy density $f(x)$ satisfying the two conditions
$$E X^8 = a, \qquad E X^{16} = b \, ?$$

(b) What is the parametric-form maximum entropy density satisfying the condition
$$E(X^8 + X^{16}) = a + b \, ?$$

(c) Which entropy is higher?

Solution: Maximum entropy.

(a) By Burg's maximum entropy theorem, the parametric form of the density $f(x)$ is given by
$$f(x) = \exp(\lambda_0 + \lambda_1 x^8 + \lambda_2 x^{16}),$$
where $\lambda_0, \lambda_1, \lambda_2$ are determined from the constraints
$$\int \exp(\lambda_0 + \lambda_1 x^8 + \lambda_2 x^{16}) \, dx = 1,$$
$$\int x^8 \exp(\lambda_0 + \lambda_1 x^8 + \lambda_2 x^{16}) \, dx = a,$$
$$\int x^{16} \exp(\lambda_0 + \lambda_1 x^8 + \lambda_2 x^{16}) \, dx = b.$$

(b) In this case,
$$f_1(x) = \exp(\lambda_0 + \lambda_1 (x^8 + x^{16})),$$
where $\lambda_0, \lambda_1$ are determined from the constraints
$$\int \exp(\lambda_0 + \lambda_1 (x^8 + x^{16})) \, dx = 1,$$
$$\int (x^8 + x^{16}) \exp(\lambda_0 + \lambda_1 (x^8 + x^{16})) \, dx = a + b.$$

(c) If the constraints of part (a), namely $E X^8 = a$ and $E X^{16} = b$, are satisfied, then the constraint of part (b), namely $E(X^8 + X^{16}) = a + b$, is also satisfied. Thus $f(x)$ is admissible for part (b) as well. Hence part (a) imposes a stricter constraint than part (b) does, and therefore $f_1(x)$ has an entropy at least as high as that of $f(x)$. Equality occurs iff $\lambda_1 = \lambda_2$ in part (a).
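The multipliers have no closed form, but the structure of the family in part (a) can be explored numerically. The sketch below (not from the handout; the particular $\lambda_1, \lambda_2$ values are arbitrary assumptions, with $\lambda_2 < 0$ required for integrability) computes, for a given $(\lambda_1, \lambda_2)$, the normalizer, the moment pair $(a, b)$ that this member of the family matches, and its differential entropy.

```python
import numpy as np
from scipy.integrate import quad

lam1, lam2 = 0.5, -1.0                      # arbitrary; lam2 < 0 so the density is normalizable

def g(x):
    """Unnormalized density exp(lam1*x^8 + lam2*x^16)."""
    return np.exp(lam1 * x**8 + lam2 * x**16)

# g is negligible for |x| > 2, so integrating over [-5, 5] is effectively (-inf, inf)
Z, _ = quad(g, -5, 5)                       # normalizer, i.e. exp(-lam0)
a, _ = quad(lambda x: x**8 * g(x), -5, 5)
b, _ = quad(lambda x: x**16 * g(x), -5, 5)
a, b = a / Z, b / Z                         # the (EX^8, EX^16) pair this (lam1, lam2) realizes

# Differential entropy of f = g/Z:  h = -E ln f = ln Z - lam1*a - lam2*b  (nats)
h = np.log(Z) - lam1 * a - lam2 * b
print(a, b, h)
```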

6. Maximum entropy processes. Consider the set of all stochastic processes $\{X_i\}$, $X_i \in \mathbb{R}$, with
$$R_0 = E X_i^2 = 1, \qquad R_1 = E X_i X_{i+1} = \tfrac{1}{4}.$$
Find the maximum entropy rate.

Solution: Maximum entropy processes. The maximum entropy process is the Gauss-Markov process with $R_0 = 1$ and $R_1 = \frac{1}{4}$. Such a process is indeed Markov, with
$$(X_n, X_{n+1}) \sim N \left( 0, \begin{pmatrix} 1 & \frac{1}{4} \\ \frac{1}{4} & 1 \end{pmatrix} \right).$$
The entropy rate of the Gauss-Markov process is given by
$$h(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} h(X^n) = \lim_{n \to \infty} h(X_n | X^{n-1}).$$
Since we have a first-order Gauss-Markov process, the entropy rate is given by $h(\mathcal{X}) = h(X_2 | X_1)$. Thus
$$h(\mathcal{X}) = h(X_2 | X_1) = h(X_1, X_2) - h(X_1)$$
$$= \frac{1}{2} \log \left( (2\pi e)^2 |K^{(2)}| \right) - \frac{1}{2} \log \left( 2\pi e |K^{(1)}| \right)$$
$$= \frac{1}{2} \log \left( 2\pi e \, \frac{\left| \begin{smallmatrix} 1 & \frac{1}{4} \\ \frac{1}{4} & 1 \end{smallmatrix} \right|}{|(1)|} \right)$$
$$= \frac{1}{2} \log \frac{15 \pi e}{8}.$$
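A short numerical check (a sketch, not from the handout): compute $h(X_2 | X_1) = h(X_1, X_2) - h(X_1)$ directly from the covariance matrices and compare with $\frac{1}{2}\log_2 \frac{15\pi e}{8}$ bits.

```python
import numpy as np

K2 = np.array([[1.0, 0.25],
               [0.25, 1.0]])                         # covariance of (X_n, X_{n+1})
K1 = np.array([[1.0]])

def h_gauss_bits(K):
    """Differential entropy (in bits) of N(0, K)."""
    k = K.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** k * np.linalg.det(K))

h_rate = h_gauss_bits(K2) - h_gauss_bits(K1)         # h(X_2 | X_1)
h_closed = 0.5 * np.log2(15 * np.pi * np.e / 8)
print(h_rate, h_closed)                              # the two values agree
```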

7. Mutual information. Let $Z_1, Z_2$ be i.i.d. $\sim N(0, \sigma^2)$. Find the mutual information $I(X; Y)$ for
$$(X, Y) = \begin{cases} (1, 1), & \text{with probability } \frac{1}{2} \\ (Z_1, Z_2), & \text{with probability } \frac{1}{2}. \end{cases}$$

Solution:

In general we define mutual information for mixed random variables as
$$I(X; Y) = \lim_{\Delta \to 0} I(X^{\Delta}; Y^{\Delta}),$$
where $X^{\Delta}, Y^{\Delta}$ are quantized versions of $X, Y$ with quantization error $\Delta$. (More precisely, we bin $X$ into intervals of width $\Delta$ and say $P(X^{\Delta} = n\Delta) = P(n\Delta < X \le (n+1)\Delta)$.) If we use this definition, we can prove that the chain rule of mutual information still holds, namely,
$$I(X; Y, Z) = I(X; Y) + I(X; Z | Y).$$
We use this and one other observation to solve the problem, namely that $P\{(Z_1, Z_2) = (1, 1)\} = 0$; that is, a continuous random variable almost surely never hits a given point. We define the random variable
$$\theta = \begin{cases} 1, & \text{if } (X, Y) = (1, 1) \\ 0, & \text{otherwise.} \end{cases}$$
Note that $\theta$ is a deterministic function of both $X$ and $Y$. We have
$$I(X; Y, \theta) = I(X; Y) + I(X; \theta | Y) = I(X; \theta) + I(X; Y | \theta).$$
Note that $I(X; \theta | Y) \le H(\theta | Y) = 0$, so $I(X; \theta | Y) = 0$. Note also that
$$I(X; Y | \theta) = \sum_{\theta' \in \{0, 1\}} P(\theta = \theta') \, I(X; Y | \theta = \theta') = \tfrac{1}{2} I(X; Y | \theta = 1) + \tfrac{1}{2} I(X; Y | \theta = 0) = 0 + 0 = 0,$$
since given $\theta = 1$ the pair $(X, Y) = (1, 1)$ is deterministic, and given $\theta = 0$ we have $X = Z_1$ and $Y = Z_2$ independent. Thus we have $I(X; Y) = I(X; Y, \theta) = I(X; \theta)$. Now, $I(X; \theta) = H(\theta) - H(\theta | X) = 1 - 0 = 1$, since $\theta$ is determined by $X$ with probability 1 (the event $Z_1 = 1$ has probability zero). Thus $I(X; Y) = 1$ bit.
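A small simulation sketch of the key step (not from the handout; the sample size and $\sigma$ are arbitrary assumptions): since $\theta$ is recoverable from $X$ almost surely, $H(\theta | X) = 0$ and the answer reduces to the entropy of a fair coin, $H(\theta) = 1$ bit.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n = 1.0, 200_000

coin = rng.integers(0, 2, size=n).astype(bool)           # theta ~ Bernoulli(1/2)
Z = rng.normal(0.0, sigma, size=(n, 2))
X = np.where(coin, 1.0, Z[:, 0])
Y = np.where(coin, 1.0, Z[:, 1])

theta = (X == 1.0) & (Y == 1.0)         # recovers the coin a.s., since P(Z_1 = 1) = 0
p = theta.mean()
H_theta = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
print(H_theta)                          # ~1 bit, i.e. I(X;Y) = H(theta) - H(theta|X) = 1
```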
