
29 A Modern Introduction to Probability and Statistics

Full Solutions February 24, 2006

F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester


Full solutions from MIPS: DO NOT DISTRIBUTE

29.1 Full solutions


2.1 Using the relation P(A ∪ B) = P(A) + P(B) − P(A ∩ B), we obtain P(A ∪ B) = 2/3 + 1/6 − 1/9 = 13/18.

2.2 The event "at least one of E and F occurs" is the event E ∪ F. Using the second DeMorgan law we obtain: P(E^c ∩ F^c) = P((E ∪ F)^c) = 1 − P(E ∪ F) = 1 − 3/4 = 1/4.

2.3 By additivity we have P(D) = P(C^c ∩ D) + P(C ∩ D). Hence 0.4 = P(C^c ∩ D) + 0.2, so P(C^c ∩ D) = 0.2. (We did not need the knowledge P(C) = 0.3!)

2.4 The event "only A occurs and not B or C" is the event A ∩ B^c ∩ C^c. Using DeMorgan's law and additivity we then have P(A ∩ B^c ∩ C^c) = P(A ∩ (B ∪ C)^c) = P(A ∪ B ∪ C) − P(B ∪ C). The answer is "yes", because of P(B ∪ C) = P(B) + P(C) − P(B ∩ C).

2.5 The crux is that B ⊂ A implies P(A ∩ B) = P(B). Using additivity we obtain P(A) = P(A ∩ B) + P(A ∩ B^c) = P(B) + P(A \ B). Hence P(A \ B) = P(A) − P(B).

2.6 a Using the relation P(A ∪ B) = P(A) + P(B) − P(A ∩ B), we obtain 3/4 = 1/3 + 1/2 − P(A ∩ B), yielding P(A ∩ B) = 4/12 + 6/12 − 9/12 = 1/12.

2.6 b Using DeMorgan's laws we get P(A^c ∪ B^c) = P((A ∩ B)^c) = 1 − P(A ∩ B) = 11/12.

2.7 P((A ∪ B) ∩ (A ∩ B)^c) = 0.7.

2.8 From the rule for the probability of a union we obtain P(D1 ∪ D2) ≤ P(D1) + P(D2) = 2·10⁻⁶. Since D1 ∩ D2 is contained in both D1 and D2, we obtain P(D1 ∩ D2) ≤ min{P(D1), P(D2)} = 10⁻⁶. Equality may hold in both cases: for the union, take D1 and D2 disjoint; for the intersection, take D1 and D2 equal to each other.

2.9 a Simply by inspection we find that A = {TTH, THT, HTT}, B = {TTH, THT, HTT, TTT}, C = {HHH, HHT, HTH, HTT}, D = {TTT, TTH, THT, THH}.

2.9 b Here we find that A^c = {TTT, THH, HTH, HHT, HHH}, A ∪ (C ∩ D) = A ∪ ∅ = A, and A ∩ D^c = {HTT}.

2.10 Cf. Exercise 2.7: the event "A or B occurs, but not both" equals C = (A ∪ B) ∩ (A ∩ B)^c. Rewriting this using DeMorgan's laws (or paraphrasing "A or B occurs, but not both" as "A occurs but not B, or B occurs but not A"), we can also write C = (A ∩ B^c) ∪ (B ∩ A^c).

2.11 Let the two outcomes be called 1 and 2. Then Ω = {1, 2}, and P(1) = p, P(2) = p². We must have P(1) + P(2) = P(Ω) = 1, so p + p² = 1. This has two solutions: p = (−1 + √5)/2 and p = (−1 − √5)/2. Since we must have 0 ≤ p ≤ 1, only one is allowed: p = (−1 + √5)/2.

2.12 a This is the same situation as with the three envelopes on the doormat, but now with ten possibilities. Hence an outcome has probability 1/10! to occur.

2.12 b For the five envelopes labeled 1, 2, 3, 4, 5 there are 5! possible orders, and for each of these there are 5! possible orders for the envelopes labeled 6, 7, 8, 9, 10. Hence in total there are 5!·5! outcomes.
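The quadratic step in 2.11 is easy to confirm numerically; the snippet below is just a check of the computation (the variable names are ours):

```python
import math

# Check of 2.11: p must satisfy p + p^2 = 1; the quadratic formula gives
# the two roots (-1 ± sqrt(5))/2, and only one of them lies in [0, 1].
roots = [(-1 + math.sqrt(5)) / 2, (-1 - math.sqrt(5)) / 2]
valid = [p for p in roots if 0 <= p <= 1]
p = valid[0]
print(p)            # 0.6180339887...
print(p + p ** 2)   # equals 1 up to rounding
```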


2.12 c There are 32·5!·5! outcomes in the event "dream draw". Hence the probability is 32·5!·5!/10! = 32·1·2·3·4·5/(6·7·8·9·10) = 8/63 = 12.7 percent.

2.13 a The outcomes are pairs (x, y) with x ≠ y. The outcome (a, a) has probability 0 to occur. The outcome (a, b) has probability (1/4)·(1/3) = 1/12 to occur. The table becomes:

      a     b     c     d
a     0    1/12  1/12  1/12
b    1/12   0    1/12  1/12
c    1/12  1/12   0    1/12
d    1/12  1/12  1/12   0

2.13 b Let C be the event "c is one of the chosen possibilities". Then C = {(c, a), (c, b), (a, c), (b, c)}. Hence P(C) = 4/12 = 1/3.

2.14 a Since door a is never opened, P((a, a)) = P((b, a)) = P((c, a)) = 0. If the candidate chooses a (which happens with probability 1/3), then the quizmaster chooses without preference from doors b and c. This yields P((a, b)) = P((a, c)) = 1/6. If the candidate chooses b (which happens with probability 1/3), then the quizmaster can only open door c. Hence P((b, c)) = 1/3. Similarly, P((c, b)) = 1/3. Clearly, P((b, b)) = P((c, c)) = 0.

2.14 b If the candidate chooses a then she or he wins; hence the corresponding event is {(a, a), (a, b), (a, c)}, and its probability is 1/3.

2.14 c To end with a the candidate should have chosen b or c. So the event is {(b, c), (c, b)} and P({(b, c), (c, b)}) = 2/3.

2.15 The rule is:
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
That this is true can be shown by applying the sum rule twice (and using the set property (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)):
P(A ∪ B ∪ C) = P((A ∪ B) ∪ C) = P(A ∪ B) + P(C) − P((A ∪ B) ∩ C)
 = P(A) + P(B) − P(A ∩ B) + P(C) − P((A ∩ C) ∪ (B ∩ C))
 = s − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P((A ∩ C) ∩ (B ∩ C))
 = s − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
Here we did put s := P(A) + P(B) + P(C) for typographical convenience.

2.16 Since E ∩ F ∩ G = ∅, the three sets E ∩ F, F ∩ G, and E ∩ G are disjoint. Since each has probability 1/3, they have probability 1 together. From these two facts one deduces P(E) = P(E ∩ F) + P(E ∩ G) = 2/3 (make a diagram or use that E = E ∩ (E ∩ F) ∪ E ∩ (F ∩ G) ∪ E ∩ (E ∩ G)).

2.17 Since there are two queues we use pairs (i, j) of natural numbers to indicate the number of customers i in the first queue, and the number j in the second queue. Since we have no reasonable bound on the number of people that will queue, we take Ω = {(i, j) : i = 0, 1, 2, . . . , j = 0, 1, 2, . . .}.

2.18 The probability r of no success at a certain day is equal to the probability that both experiments fail, hence r = (1 − p)². The probability of success for the first time on day n therefore equals r^(n−1)(1 − r). (Cf. Section 2.5.)
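The first-success-day distribution of 2.18 can be checked by simulation. The parameter value p = 0.3 and the sample size below are illustrative choices, not part of the exercise:

```python
import random

# Simulation of 2.18: two independent experiments per day, each succeeding
# with probability p; a day fails only when both fail, so r = (1 - p)^2 and
# the first successful day is n with probability r^(n-1) * (1 - r).
random.seed(1)
p = 0.3
r = (1 - p) ** 2

def first_success_day():
    day = 1
    while not (random.random() < p or random.random() < p):
        day += 1
    return day

trials = 100_000
counts = {}
for _ in range(trials):
    d = first_success_day()
    counts[d] = counts.get(d, 0) + 1

for n in (1, 2, 3):
    print(n, round(counts.get(n, 0) / trials, 3),
          round(r ** (n - 1) * (1 - r), 3))
```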


2.19 a We need at least two days to see two heads, hence Ω = {2, 3, 4, . . .}.

2.19 b It takes 5 tosses if and only if the fifth toss is heads (which has probability p), and exactly one of the first 4 tosses is heads (which has probability 4p(1 − p)³). Hence the probability asked for equals 4p²(1 − p)³.

3.1 Define the following events: B is the event "point B is reached on the second step", C is the event "the path to C is chosen on the first step", and similarly we define D and E. Note that the events C, D, and E are mutually exclusive and that one of them must occur. Furthermore, we can only reach B by first going to C or D. For the computation we use the law of total probability, by conditioning on the result of the first step:
P(B) = P(B ∩ C) + P(B ∩ D) + P(B ∩ E) = P(B | C)P(C) + P(B | D)P(D) + P(B | E)P(E) = (1/3)·(1/3) + (1/4)·(1/3) + 0·(1/3) = 7/36.

3.2 a Event A has three outcomes, event B has 11 outcomes, and A ∩ B = {(1, 3), (3, 1)}. Hence we find P(B) = 11/36 and P(A ∩ B) = 2/36, so that
P(A | B) = P(A ∩ B)/P(B) = (2/36)/(11/36) = 2/11.

3.2 b Because P(A) = 3/36 = 1/12 and this is not equal to 2/11 = P(A | B), the events A and B are dependent.

3.3 a There are 13 spades in the deck and each has probability 1/52 of being chosen, hence P(S1) = 13/52 = 1/4. Given that the first card is a spade there are 13 − 1 = 12 spades left in the deck with 52 − 1 = 51 remaining cards, so P(S2 | S1) = 12/51. If the first card is not a spade there are 13 spades left in the deck of 51, so P(S2 | S1^c) = 13/51.

3.3 b We use the law of total probability (based on Ω = S1 ∪ S1^c):
P(S2) = P(S2 ∩ S1) + P(S2 ∩ S1^c) = P(S2 | S1)P(S1) + P(S2 | S1^c)P(S1^c) = (12/51)·(1/4) + (13/51)·(3/4) = (12 + 39)/(51·4) = 1/4.

3.4 We repeat the calculations from Section 3.3, now based on P(B) = 1.3·10⁻⁵:
P(T ∩ B) = P(T | B)P(B) = 0.7 × 0.000 013 = 0.000 0091,
P(T ∩ B^c) = P(T | B^c)P(B^c) = 0.1 × 0.999 987 = 0.099 9987,
so P(T) = P(T ∩ B) + P(T ∩ B^c) = 0.000 0091 + 0.099 9987 = 0.100 0078, and
P(B | T) = P(T ∩ B)/P(T) = 0.000 0091/0.100 0078 = 0.000 0910 = 9.1·10⁻⁵.
Further, we find P(T^c ∩ B) = P(T^c | B)P(B) = 0.3 × 0.000 013 = 0.000 0039, and combining this with P(T^c) = 1 − P(T) = 0.899 9922:
P(B | T^c) = P(T^c ∩ B)/P(T^c) = 0.000 0039/0.899 9922 = 0.000 0043 = 4.3·10⁻⁶.
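The arithmetic of 3.4 is error-prone by hand; this short recomputation (variable names ours) reproduces both posterior probabilities:

```python
# Recomputation of 3.4 with Bayes' rule:
# P(B) = 1.3e-5, P(T | B) = 0.7, P(T | B^c) = 0.1.
p_b = 1.3e-5
p_t_and_b = 0.7 * p_b              # P(T and B)
p_t_and_bc = 0.1 * (1 - p_b)       # P(T and B^c)
p_t = p_t_and_b + p_t_and_bc       # law of total probability

p_b_given_t = p_t_and_b / p_t      # ≈ 9.1e-5
p_b_given_tc = (0.3 * p_b) / (1 - p_t)   # ≈ 4.3e-6
print(p_b_given_t, p_b_given_tc)
```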


3.5 Define the events R1 and R2 meaning: "a red ball is drawn on the first and second draw", respectively. We are asked to compute P(R1 ∩ R2). By conditioning on R1 we find:
P(R1 ∩ R2) = P(R2 | R1)P(R1) = (3/4)·(1/2) = 3/8,
where the conditional probability P(R2 | R1) follows from the contents of the urn after R1 has occurred: one white and three red balls.

3.6 a Let E denote the event "outcome is an even numbered month" and H the event "outcome is in the first half of the year". Then P(E) = 1/2 and, because in the first half of the year there are three even and three odd numbered months, P(E | H) = 1/2 as well; the events are independent.

3.6 b Let S denote the event "outcome is a summer month". Of the three summer months, June and August are even numbered, so P(E | S) = 2/3 ≠ 1/2. Therefore, E and S are dependent.

3.7 a The best approach to a problem like this one is to write out the conditional probability and then see if we can somehow combine this with P(A) = 1/3 to solve the puzzle. Note that P(B ∩ A^c) = P(B | A^c)P(A^c) and that P(A ∪ B) = P(A) + P(B ∩ A^c). So
P(A ∪ B) = 1/3 + (1/4)·(1 − 1/3) = 1/3 + 1/6 = 1/2.

3.7 b From the conditional probability we find P(A^c ∩ B^c) = P(A^c | B^c)P(B^c) = (1/2)(1 − P(B)). Recalling DeMorgan's law we know P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1/3. Combined this yields an equation for P(B): (1/2)(1 − P(B)) = 1/3, from which we find P(B) = 1/3.

3.8 a This asks for P(W). We use the law of total probability, decomposing Ω = F ∪ F^c. Note that P(W | F) = 0.99. So
P(W) = P(W ∩ F) + P(W ∩ F^c) = P(W | F)P(F) + P(W | F^c)P(F^c) = 0.99 × 0.1 + 0.02 × 0.9 = 0.099 + 0.018 = 0.117.

3.8 b We need to determine P(F | W), and this can be done using Bayes' rule. Some of the necessary computations have already been done in a; we can copy P(W ∩ F) and P(W) and get:
P(F | W) = P(F ∩ W)/P(W) = 0.099/0.117 = 0.846.

3.9 Deciphering the symbols we conclude that P(B | A) is to be determined. From the probabilities listed we find P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 3/4 + 2/5 − 4/5 = 7/20, so that P(B | A) = P(A ∩ B)/P(A) = (7/20)/(3/4) = 28/60 = 7/15.

3.10 Let K denote the event "the student knows the answer" and C the event "the answer that is given is the correct one". We are to determine P(K | C). From the information given, we know that P(C | K) = 1, P(C | K^c) = 1/4, and P(K) = 0.6. Therefore
P(C) = P(C | K)P(K) + P(C | K^c)P(K^c) = 1 × 0.6 + (1/4) × 0.4 = 0.6 + 0.1 = 0.7,
and P(K | C) = 0.6/0.7 = 6/7.
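The 3.10 computation is a standard Bayes update and is easy to verify (variable names ours):

```python
# Check of 3.10: P(K) = 0.6, P(C | K) = 1, P(C | K^c) = 1/4.
p_k = 0.6
p_c = 1.0 * p_k + 0.25 * (1 - p_k)   # law of total probability
p_k_given_c = 1.0 * p_k / p_c        # Bayes' rule
print(p_c)            # ≈ 0.7
print(p_k_given_c)    # ≈ 6/7
```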


3.11 a The probability that a driver who is classified as exceeding the legal limit in fact does not exceed the limit.

3.11 b It is given that P(B) = 0.05. We determine the answer via P(B^c | A) = P(B^c ∩ A)/P(A). We find P(B^c ∩ A) = P(A | B^c)P(B^c) = 0.95(1 − p), P(B ∩ A) = P(A | B)P(B) = 0.05p, and by adding them P(A) = 0.95 − 0.9p. So
P(B^c | A) = 0.95(1 − p)/(0.95 − 0.9p) = 95(1 − p)/(95 − 90p) = 0.5 when p = 0.95.

3.11 c From b we find
P(B | A) = 1 − P(B^c | A) = 1 − 95(1 − p)/(95 − 90p) = 5p/(95 − 90p).
Setting this equal to 0.9 and solving for p yields p = 171/172 ≈ 0.9942.

3.12 We start by deriving some unconditional probabilities from what is given: P(B ∩ C) = P(B | C)P(C) = 1/6 and P(A ∩ B ∩ C) = P(B ∩ C)P(A | B ∩ C) = 1/24. Next, we should realize that B ∩ C is the union of the disjoint events A ∩ B ∩ C and A^c ∩ B ∩ C, so that
P(A^c ∩ B ∩ C) = P(B ∩ C) − P(A ∩ B ∩ C) = 1/6 − 1/24 = 1/8.

3.13 a There are several ways to see that P(D1) = 5/9. Method 1: the first team we draw is irrelevant; for the second team there are always 5 good choices out of 9 remaining teams. Method 2: there are (10 choose 2) = 45 possible outcomes when two teams are drawn; among these, there are 5·5 = 25 favorable outcomes (all weak–strong pairings), resulting in P(D1) = 25/45 = 5/9.

3.13 b Given D1, there are 4 strong and 4 weak teams left. Using one of the methods from a on this reduced number of teams, we find P(D2 | D1) = 4/7 and P(D1 ∩ D2) = P(D2 | D1)P(D1) = (4/7)·(5/9) = 20/63.

3.13 c Proceeding in the same fashion, we find P(D3 | D1 ∩ D2) = 3/5 and P(D1 ∩ D2 ∩ D3) = P(D3 | D1 ∩ D2)P(D1 ∩ D2) = (3/5)·(20/63) = 12/63.

3.13 d Subsequently, we find P(D4 | D1 ∩ ··· ∩ D3) = 2/3 and P(D5 | D1 ∩ ··· ∩ D4) = 1. The final result can be written as
P(D1 ∩ ··· ∩ D5) = (5/9)·(4/7)·(3/5)·(2/3)·1 = 8/63.
The probability of a dream draw with n strong and n weak teams is P(D1 ∩ ··· ∩ Dn) = 2ⁿ/(2n choose n).

3.14 a If you chose the right door, switching will make you lose, so P(W | R) = 0. If you chose the wrong door, switching will make you win for sure: P(W | R^c) = 1.

3.14 b Using P(R) = 1/3, we find:
P(W) = P(W ∩ R) + P(W ∩ R^c) = P(W | R)P(R) + P(W | R^c)P(R^c) = 0·(1/3) + 1·(2/3) = 2/3.
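The always-switch strategy of 3.14 can also be checked by simulation; the sample size is an arbitrary choice of ours:

```python
import random

# Monty Hall simulation for 3.14: a candidate who always switches
# should win with probability 2/3.
random.seed(7)
trials = 100_000
wins = 0
for _ in range(trials):
    doors = ['a', 'b', 'c']
    car = random.choice(doors)
    pick = random.choice(doors)
    # the quizmaster opens a door that was not picked and hides no car
    opened = random.choice([d for d in doors if d != pick and d != car])
    switched = next(d for d in doors if d != pick and d != opened)
    wins += (switched == car)
print(wins / trials)   # close to 2/3
```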


3.15 This is a puzzle and there are many ways to solve it. First note that P(A) = 1/2. We condition on A ∪ B:
P(B) = P(B | A ∪ B)·P(A ∪ B) = (2/3)·{P(A) + P(B) − P(A)P(B)} = (2/3)·{1/2 + (1/2)P(B)} = 1/3 + (1/3)P(B).
Solving the resulting equation yields P(B) = 1/2.

3.16 a Using Bayes' rule we find:
P(D | T) = P(D ∩ T)/P(T) = (0.98 × 0.01)/(0.98 × 0.01 + 0.05 × 0.99) ≈ 0.1653.

3.16 b Denote the event "the second test indicates you have the disease" with the letter S. Method 1 (unconscious): from the first test we know that the probability we have the disease is not 0.01 but 0.1653, and we reason that we should redo the calculation from a with P(D) = 0.1653:
P(D | S) = P(D ∩ S)/P(S) = (0.98 × 0.1653)/(0.98 × 0.1653 + 0.05 × 0.8347) ≈ 0.7951.
This is the correct answer, but a more thorough consideration is warranted. Method 2 (conscientious): we are in fact to determine
P(D | S ∩ T) = P(D ∩ S ∩ T)/P(S ∩ T),
and we should wonder what "independent repetition of the test" exactly means. Clearly, both tests are dependent on whether you have the disease or not (and as a result, S and T are dependent), but given that you have the disease the outcomes are independent, and the same when you do not have the disease. Formally put: given D, the events S and T are independent; given D^c, the events S and T are independent. In formulae:
P(S ∩ T | D) = P(S | D)P(T | D),  P(S ∩ T | D^c) = P(S | D^c)P(T | D^c).
So P(D ∩ S ∩ T) = P(S | D)P(T | D)P(D) = (0.98)² × 0.01 = 0.009604 and P(D^c ∩ S ∩ T) = (0.05)² × 0.99 = 0.002475. Adding them yields P(S ∩ T) = 0.012079, and so P(D | S ∩ T) = 0.009604/0.012079 ≈ 0.7951. Note that P(S | T) ≈ 0.2037, which is much larger than P(S) ≈ 0.0593 (and for a good test, it should be).

3.17 a I win the game after the next two rallies if I win both: P(W ∩ G) = p². Similarly for losing the game if you win both rallies: P(W^c ∩ G) = (1 − p)². So P(W | G) = p²/(p² + (1 − p)²).
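Both routes through 3.16 b lead to the same posterior; the check below computes the one-test and two-test posteriors directly (variable names ours):

```python
# Check of 3.16: prior P(D) = 0.01, sensitivity P(T|D) = 0.98,
# false-positive rate P(T|D^c) = 0.05; the two tests are conditionally
# independent given D and given D^c.
p_d, sens, fp = 0.01, 0.98, 0.05

one = sens * p_d / (sens * p_d + fp * (1 - p_d))
two = sens**2 * p_d / (sens**2 * p_d + fp**2 * (1 - p_d))
print(round(one, 4))   # 0.1653
print(round(two, 4))   # 0.7951
```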


3.17 b Note that D = G^c and so P(D) = 1 − p² − (1 − p)² = 2p(1 − p). Using the law of total probability we find:
P(W) = P(W ∩ G) + P(W ∩ D) = p² + P(W | D)P(D) = p² + 2p(1 − p)P(W | D).
Substituting P(W | D) = P(W) (when it becomes deuce again, the initial situation repeats itself) and solving yields P(W) = p²/(p² + (1 − p)²).

3.17 c If the game does not end after the next two rallies, each of us has again the same probability to win the game as we had initially. This implies that the probability of winning the game is split between us in the same way as the probability of an immediate win, i.e., in the next two rallies. So P(W) = P(W | G).

3.18 a No: A ∩ B = ∅, so P(A | B) = 0 ≠ P(A), which is positive.

3.18 b Since, by independence, P(A ∩ B) = P(A)P(B) > 0, it must be that A ∩ B ≠ ∅.

3.18 c If A ⊂ B then P(B | A) = 1. However, P(B) < 1, so P(B) ≠ P(B | A): A and B are dependent.

3.18 d Since A ⊂ A ∪ B, P(A ∪ B | A) = 1, so for the required independence P(A ∪ B) = 1 should hold. On the other hand, P(A ∪ B) = P(A) + P(B) − P(A)P(B), and so P(A ∪ B) = 1 can be rewritten as (1 − P(A))(1 − P(B)) = 0, which clearly contradicts the assumptions.

4.1 a In two independent throws of a die there are 36 possible outcomes, each occurring with probability 1/36. Since there are 25 ways to have no 6s, 10 ways to have one 6, and one way to have two 6s, we find that pZ(0) = 25/36, pZ(1) = 10/36, and pZ(2) = 1/36. So the probability mass function pZ of Z is given by the following table:

z      0      1      2
pZ(z)  25/36  10/36  1/36

The distribution function FZ is given by
FZ(a) = 0 for a < 0,
FZ(a) = 25/36 for 0 ≤ a < 1,
FZ(a) = 25/36 + 10/36 = 35/36 for 1 ≤ a < 2,
FZ(a) = 25/36 + 10/36 + 1/36 = 1 for a ≥ 2.
Z is the sum of two independent Ber(1/6) distributed random variables, so Z has a Bin(2, 1/6) distribution.

4.1 b If we denote the outcome of the two throws by (i, j), where i is the outcome of the first throw and j the outcome of the second, then
{M = 2, Z = 0} = {(2, 1), (1, 2), (2, 2)}, {S = 5, Z = 1} = ∅, {S = 8, Z = 1} = {(6, 2), (2, 6)}.
Furthermore, P(M = 2, Z = 0) = 3/36, P(S = 5, Z = 1) = 0, and P(S = 8, Z = 1) = 2/36.

4.1 c The events are dependent, because, e.g., P(M = 2, Z = 0) = 3/36 differs from P(M = 2)·P(Z = 0) = (3/36)·(25/36).

4.2 a Since P(Y = 0) = P(X = 0), P(Y = 1) = P(X = 1) + P(X = −1), and P(Y = 4) = P(X = 2), the following table gives the probability mass function of Y:

a      0    1    4
pY(a)  1/8  3/8  1/2



4.2 b FX(1) = P(X ≤ 1) = 1/2; FY(1) = 1/2; FX(3/4) = 3/8; FY(3/4) = 1/8; FX(π − 3) = FX(0.1415...) = 3/8; FY(π − 3) = 1/8.


4.3 For every random variable X and every real number a one has that {X = a} = {X ≤ a} \ {X < a}, so P(X = a) = P(X ≤ a) − P(X < a). Since F is right-continuous (see p. 49) we have that F(a) = lim_{ε↓0} F(a + ε) = lim_{ε↓0} P(X ≤ a + ε). Moreover, P(X < a) = lim_{ε↓0} P(X ≤ a − ε) = lim_{ε↓0} F(a − ε), so we see that P(X = a) > 0 precisely for those a for which F(a) makes a jump. In this exercise this is for a = 0, a = 1/2, and a = 3/4, and p(0) = P(X ≤ 0) − P(X < 0) = 1/3, etc. We find

a     0    1/2  3/4
p(a)  1/3  1/6  1/2

4.4 a By conditioning one finds that each coin has probability 2p − p² to give heads. The outcomes of the coins are independent, so the total number of heads has a binomial distribution with parameters n and 2p − p².

4.4 b Since the total number of heads has a binomial distribution with parameters n and 2p − p², we find for k = 0, 1, . . . , n
pX(k) = (n choose k)·(2p − p²)^k·(p² − 2p + 1)^(n−k),
and pX(k) = 0 otherwise.

4.5 In one throw of a die one cannot exceed 6, so F(1) = P(X ≤ 1) = 0. Since P(X = 1) = 0, we find that F(2) = P(X = 2), and from Table 4.1 it follows that F(2) = 21/36. No matter the outcomes, after seven throws we have that the sum certainly exceeds 6, so F(7) = 1.

4.6 a Let (ω1, ω2, ω3) denote the outcome of the three draws, where ω1 is the outcome of the first draw, ω2 the outcome of the second draw, and ω3 the outcome of the third one. Then the sample space is given by
Ω = {(ω1, ω2, ω3) : ω1, ω2, ω3 ∈ {1, 2, 3}} = {(1,1,1), (1,1,2), (1,1,3), (1,2,1), . . . , (3,3,3)},
and X(ω1, ω2, ω3) = (ω1 + ω2 + ω3)/3 for (ω1, ω2, ω3) ∈ Ω. The possible outcomes of X are 1, 4/3, 5/3, 2, 7/3, 8/3, 3. Furthermore, {X = 1} = {(1,1,1)}, and so P(X = 1) = 1/27. Since {X = 4/3} = {(1,1,2), (1,2,1), (2,1,1)}, P(X = 4/3) = 3/27, and since {X = 5/3} = {(1,1,3), (1,3,1), (3,1,1), (1,2,2), (2,1,2), (2,2,1)}, we find that P(X = 5/3) = 6/27, etc. Continuing in this way we find that the probability mass function pX of X is given by


a      1     4/3   5/3   2     7/3   8/3   3
pX(a)  1/27  3/27  6/27  7/27  6/27  3/27  1/27
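The 4.6 a table can be reproduced exactly by enumerating all 27 equally likely outcomes (the enumeration is ours; `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction
from itertools import product

# Exact pmf of the average X of three independent draws from {1, 2, 3}.
pmf = {}
for draw in product([1, 2, 3], repeat=3):
    x = Fraction(sum(draw), 3)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 27)

for value in sorted(pmf):
    print(value, pmf[value])
```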

4.6 b Setting, for i = 1, 2, 3,
Yi = 1 if Xi < 2, and Yi = 0 if Xi ≥ 2,
and Y = Y1 + Y2 + Y3, then Y is Bin(3, 1/3). It follows that the probability that exactly two draws are smaller than 2 is equal to
P(Y = 2) = (3 choose 2)·(1/3)²·(1 − 1/3) = 6/27.

Another way to find this answer is to realize that the event that exactly two draws are equal to 1 is equal to {(1,1,2), (1,1,3), (1,2,1), (1,3,1), (2,1,1), (3,1,1)}.

4.7 a Setting Xi = 1 if the ith lamp is defective, and Xi = 0 if not, for i = 1, 2, . . . , 1000, we see that each Xi is a Ber(0.001) distributed random variable. Assuming that these Xi are independent, we find that X (as the sum of 1000 independent Ber(0.001) random variables) has a Bin(1000, 0.001) distribution.

4.7 b Since X has a Bin(1000, 0.001) distribution, these probabilities are P(X = 0) = (999/1000)¹⁰⁰⁰ = 0.367695, P(X = 1) = 1000 × 0.001 × (999/1000)⁹⁹⁹ = 0.36806, and P(X > 2) = 1 − P(X ≤ 2) = 0.08021.

4.8 a Assuming that O-rings fail independently of one another, X can be seen as the sum of six independent random variables with a Ber(0.8178) distribution. Consequently, X has a Bin(6, 0.8178) distribution.

4.8 b Since X has a Bin(6, 0.8178) distribution, we find that P(X ≥ 1) = 1 − P(X = 0) = 1 − (1 − 0.8178)⁶ = 0.9999634.

4.9 a With these assumptions, X has a Bin(6, 0.0137) distribution, so we find that the probability that during a launch at least one O-ring fails is equal to P(X ≥ 1) = 1 − P(X = 0) = 1 − (1 − 0.0137)⁶ = 0.079436. But then the probability that at least one O-ring fails for the first time during the 24th launch is equal to P(X = 0)²³ × P(X ≥ 1) = 0.01184.

4.9 b The probability that no O-ring fails during a launch is P(X = 0) = (1 − 0.0137)⁶ = 0.92056, so the probability that no O-ring fails during 24 launches is equal to P(X = 0)²⁴ = 0.13718.

4.10 a Each Ri has a Bernoulli distribution, because it can only attain the values 0 and 1. The parameter is p = P(Ri = 1). It is not easy to determine P(Ri = 1), but it is fairly easy to determine P(Ri = 0). The event {Ri = 0} occurs when none of the m people has chosen the ith floor. Since they make their choices independently of each other, and each floor is selected by each of these m people with probability 1/21, it follows that
P(Ri = 0) = (20/21)^m.
Now use that p = P(Ri = 1) = 1 − P(Ri = 0) to find the desired answer.
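The last step of 4.10 a can be tabulated for a few crowd sizes m (the chosen values of m are illustrative):

```python
# 4.10 a: with m passengers each independently and uniformly choosing one
# of 21 floors, the elevator stops on a given floor with probability
# p = P(R_i = 1) = 1 - (20/21)^m.
def stop_prob(m):
    return 1 - (20 / 21) ** m

for m in (1, 2, 5, 10, 21):
    print(m, round(stop_prob(m), 4))
```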


4.10 b If {R1 = 0}, . . . , {R20 = 0} occur, we must have that {R21 = 1}, so we cannot conclude that the events {R1 = a1}, . . . , {R21 = a21}, where each ai is 0 or 1, are independent. Consequently, we cannot use the argument from Section 4.3 to conclude that Sm is Bin(21, p). In fact, Sm is not Bin(21, p) distributed, as the following shows. The elevator will stop at least once, so P(Sm = 0) = 0. However, if Sm would have a Bin(21, p) distribution, then P(Sm = 0) = (1 − p)²¹ > 0, which is a contradiction.

4.10 c This exercise is a variation on finding the probability of no coincident birthdays from Section 3.2. For m = 2, S2 = 1 occurs precisely if the two persons entering the elevator select the same floor. The first person selects any of the 21 floors, the second selects the same floor with probability 1/21, so P(S2 = 1) = 1/21. For m = 3, S3 = 1 occurs if the second and third persons entering the elevator both select the same floor as was selected by the first person, so P(S3 = 1) = (1/21)² = 1/441. Furthermore, S3 = 3 occurs precisely when all three persons choose a different floor. Since there are 21 × 20 × 19 ways to do this out of a total of 21³ possible ways, we find that P(S3 = 3) = 380/441. Since S3 can only attain the values 1, 2, 3, it follows that P(S3 = 2) = 1 − P(S3 = 1) − P(S3 = 3) = 60/441.

4.11 Since the lotteries are different, the event to win something (or not) in one lottery is independent of the other. So the probability to win a prize the first time you play is p1p2 + p1(1 − p2) + (1 − p1)p2. Clearly M has a geometric distribution, with parameter p = p1p2 + p1(1 − p2) + (1 − p1)p2.

4.12 The probability that your friend wins is equal to
p + p(1 − p)² + p(1 − p)⁴ + ··· = p·1/(1 − (1 − p)²) = 1/(2 − p).
The procedure is favorable to your friend if 1/(2 − p) > 1/2, and this is true if p > 0.

4.13 a Since we wait for the first time we draw the marked bolt in independent draws, each with a Ber(p) distribution, where p is the probability to draw the bolt (so p = 1/N), we find, using a reasoning as in Section 4.4, that X has a Geo(1/N) distribution.

4.13 b Clearly, P(Y = 1) = 1/N. Let Di be the event that the marked bolt was drawn (for the first time) in the ith draw. For k = 2, . . . , N we have that
P(Y = k) = P(D1^c ∩ ··· ∩ D_{k−1}^c ∩ Dk) = P(Dk | D1^c ∩ ··· ∩ D_{k−1}^c)·P(D1^c ∩ ··· ∩ D_{k−1}^c).
Now P(Dk | D1^c ∩ ··· ∩ D_{k−1}^c) = 1/(N − k + 1),
P(D1^c ∩ ··· ∩ D_{k−1}^c) = P(D_{k−1}^c | D1^c ∩ ··· ∩ D_{k−2}^c)·P(D1^c ∩ ··· ∩ D_{k−2}^c),
and
P(D_{k−1}^c | D1^c ∩ ··· ∩ D_{k−2}^c) = 1 − P(D_{k−1} | D1^c ∩ ··· ∩ D_{k−2}^c) = 1 − 1/(N − k + 2).
Continuing in this way, we find after k steps that
P(Y = k) = [1/(N − k + 1)]·[(N − k + 1)/(N − k + 2)]·[(N − k + 2)/(N − k + 3)] ··· [(N − 2)/(N − 1)]·[(N − 1)/N] = 1/N.
See also Section 9.3, where the distribution of Y is derived in a different way.
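The uniform answer P(Y = k) = 1/N of 4.13 b can be checked by shuffling; the values N = 10 and the sample size are our choices:

```python
import random

# 4.13 b: drawing bolts without replacement, the marked bolt is equally
# likely to appear at any position, so P(Y = k) = 1/N for every k.
random.seed(3)
N, trials = 10, 100_000
counts = [0] * (N + 1)
for _ in range(trials):
    order = list(range(N))       # bolt 0 is the marked one
    random.shuffle(order)
    counts[order.index(0) + 1] += 1

for k in range(1, N + 1):
    print(k, counts[k] / trials)  # each close to 1/N = 0.1
```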


4.13 c For k = 0, 1, . . . , r, the probability P(Z = k) is equal to the number of ways the event {Z = k} can occur, divided by the number of ways (N choose r) we can select r objects from N objects; see also Section 4.3. Since one can select k marked bolts from m marked ones in (m choose k) ways, and r − k nonmarked bolts from N − m nonmarked ones in (N − m choose r − k) ways, it follows that
P(Z = k) = (m choose k)·(N − m choose r − k)/(N choose r) for k = 0, 1, 2, . . . , r.
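A quick sanity check that this hypergeometric pmf sums to 1 (the values of N, m, r below are illustrative, not from the exercise):

```python
import math

# Hypergeometric pmf from 4.13 c: P(Z = k) = C(m,k) C(N-m, r-k) / C(N,r).
# math.comb returns 0 when k exceeds m, which handles impossible cases.
def pmf_Z(N, m, r, k):
    return math.comb(m, k) * math.comb(N - m, r - k) / math.comb(N, r)

N, m, r = 20, 5, 6
probs = [pmf_Z(N, m, r, k) for k in range(r + 1)]
print(sum(probs))   # a pmf sums to 1
```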

4.14 a Denoting heads by H and tails by T, the event {X = 2} occurs if we have thrown HH, i.e., if the outcome of both the first and the second throw was heads. Since the probability of heads is p, we find that P(X = 2) = p·p = p². Furthermore, X = 3 occurs if we either have thrown HTH or THH, and since the probability of throwing tails is 1 − p, we find that P(X = 3) = p(1 − p)p + (1 − p)pp = 2p²(1 − p). Similarly, X = 4 can only occur if we throw TTHH, THTH, or HTTH, so P(X = 4) = 3p²(1 − p)².

4.14 b If X = n, then the nth throw was heads (denoted by H), and all but one of the previous n − 1 throws were tails (denoted by T). So the possible outcomes are
HT···TH (with n − 2 T's), THT···TH, TTHT···TH, . . . , T···THH,
each containing two H's and n − 2 T's. Notice there are exactly (n − 1 choose 1) = n − 1 of such possible outcomes, each with probability p²(1 − p)^(n−2), so P(X = n) = (n − 1)p²(1 − p)^(n−2), for n ≥ 2.

5.1 a Sketch of the probability density f: f(x) = 3/4 on [0, 1], f(x) = 0 on (1, 2), f(x) = 1/4 on [2, 3], and f(x) = 0 outside [0, 3].

5.1 b Since f(x) = 0 for x < 0, F(b) = 0 for b < 0. For 0 ≤ b ≤ 1 we compute F(b) = ∫_{−∞}^b f(x) dx = ∫₀^b 3/4 dx = 3b/4. Since f(x) = 0 for 1 < x < 2, F(b) = F(1) = 3/4 for 1 ≤ b ≤ 2. For 2 ≤ b ≤ 3 we compute
F(b) = ∫₀¹ f(x) dx + ∫₂^b f(x) dx = F(1) + ∫₂^b 1/4 dx = 3/4 + (1/4)(b − 2) = b/4 + 1/4.
Since f(x) = 0 for x > 3, F(b) = F(3) = 1 for b > 3. Sketch of the distribution function F: piecewise linear, rising from 0 at x = 0 to 3/4 at x = 1, constant on [1, 2], then rising to 1 at x = 3.

5.2 The event {1/2 < X ≤ 3/4} can be written as {1/2 < X ≤ 3/4} = {X ≤ 3/4} ∩ {X ≤ 1/2}^c = {X ≤ 3/4} \ {X ≤ 1/2}. Noting that {X ≤ 1/2} ⊂ {X ≤ 3/4}, we find
P(1/2 < X ≤ 3/4) = P(X ≤ 3/4) − P(X ≤ 1/2) = (3/4)² − (1/2)² = 5/16.

(cf. Exercise 2.5.)

5.3 a From P(X ≤ 3/4) = P(X ≤ 1/4) + P(1/4 < X ≤ 3/4) we obtain
P(1/4 < X ≤ 3/4) = P(X ≤ 3/4) − P(X ≤ 1/4) = F(3/4) − F(1/4) = 11/16.

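A quick numerical check of 5.3 a, using the distribution function F(x) = 2x² − x⁴ that appears in 5.3 b (the function name is ours):

```python
# Check of 5.3 a with F(x) = 2x^2 - x^4 on [0, 1].
def F(x):
    return 2 * x**2 - x**4

prob = F(3/4) - F(1/4)
print(prob)   # 0.6875 = 11/16
```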
5.3 b The probability density function f(x) of X is obtained by differentiating the distribution function F(x): f(x) = d/dx (2x² − x⁴) = 4x − 4x³ for 0 ≤ x ≤ 1 (and f(x) = 0 elsewhere).

5.4 a Let T be the time until the next arrival of a bus. Then T has a U(4, 6) distribution. Hence P(X ≤ 4.5) = P(T ≤ 4.5) = ∫₄^4.5 1/2 dx = 1/4.

5.4 b Since Jensen leaves when the next bus arrives after more than 5 minutes, P(X = 5) = P(T > 5) = ∫₅⁶ 1/2 dx = 1/2.

5.4 c Since P(X = 5) = 0.5 > 0, X cannot be continuous. Since X can take any of the uncountably many values in [4, 5], it can also not be discrete.

5.5 a A probability density f has to satisfy (I) f(x) ≥ 0 for all x, and (II) ∫ f(x) dx = 1. Start with property (II):
∫ f(x) dx = ∫_{−3}^{−2} (cx + 3) dx + ∫₂³ (3 − cx) dx = [cx²/2 + 3x] from −3 to −2 + [3x − cx²/2] from 2 to 3 = 6 − 5c.

So (II) leads to the conclusion that c = 1. Substituting this yields f(x) = x + 3 for −3 ≤ x ≤ −2, and f(x) = 3 − x for 2 ≤ x ≤ 3 (and f(x) = 0 elsewhere), so f also satisfies (I).

5.5 b Since f(x) = 0 for x < −3, F(b) = 0 for b ≤ −3. For −3 ≤ b ≤ −2 we compute F(b) = ∫_{−∞}^b f(x) dx = ∫_{−3}^b (x + 3) dx = (b + 3)²/2. Similarly one finds F(b) = 1 − (3 − b)²/2 for 2 ≤ b ≤ 3. For −2 ≤ x ≤ 2, f(x) = 0, and hence F(b) = F(−2) = 1/2 for −2 ≤ b ≤ 2. Finally F(b) = 1 for b ≥ 3.

5.6 If X has an Exp(0.2) distribution, then its distribution function is given by F(b) = 1 − e^(−λb). Hence P(X > 5) = 1 − P(X ≤ 5) = 1 − F(5) = e^(−5λ). Here λ = 0.2, so P(X > 5) = e^(−1) = 0.367879 . . .

5.7 a P(failure) = P(S < 0.55) = ∫_{−∞}^{0.55} f(x) dx = ∫₀^{0.5} 4x dx + ∫_{0.5}^{0.55} (4 − 4x) dx = 0.595.

5.7 b The 50th percentile or median q₀.₅ is obtained by solving F(q₀.₅) = 0.5. For b ≤ 0.5 we obtain F(b) = ∫₀^b 4x dx = 2b². Here F(q₀.₅) = 0.5 yields 2q₀.₅² = 0.5, or q₀.₅ = 0.5.

5.8 a The probability density g(y) = 1/(2√(ry)) has an asymptote in 0 and decreases to 1/(2r) in the point r. Outside [0, r] the function is 0.

5.8 b The second darter is better: for each 0 < b < r one has (b/r)² < b/r, so the second darter always has a larger probability to get closer to the center.

5.8 c Any function F that is 0 left from 0, increasing on [0, r], takes the value 0.9 in r/10, and takes the value 1 in r and to the right of r is a correct answer to this question.

5.9 a The area of a triangle is 1/2 times base times height. So the largest area that can occur is 1/2 (when the height of the triangle is 1). The event {A ≤ 1/4} occurs if and only if the height of the triangle is at most 1/2, so the answer is {(x, y) : 2 ≤ x ≤ 3, 1 ≤ y ≤ 3/2}.


5.9 b Generalising a, we see that A ≤ a if and only if the height of the triangle is at most 2a, i.e., the y-coordinate of the random point is at most 1 + 2a. Hence F(a) = P(A ≤ a) = 2a for 0 ≤ a ≤ 1/2. Furthermore F(a) = 0 for a < 0, and F(a) = 1 for a ≥ 1/2.

5.9 c Differentiating F yields f(x) = 2 for 0 ≤ x ≤ 1/2; f(x) = 0 elsewhere.

5.10 The residence time T has an Exp(0.5) distribution, so its distribution function is F(x) = 1 − e^(−x/2). We are looking for P(T ≤ 2) = F(2) = 1 − e^(−1) = 63.2%.

5.11 We have to solve 1 − e^(−λx) = 1/2, or e^(−λx) = 1/2, which means that x = −ln(1/2)/λ, or x = ln(2)/λ.

5.12 We have to solve 1 − x^(−α) = 1/2, or x^(−α) = 1/2, which means that x^α = 2, or x = 2^(1/α).

5.13 a This follows with the change of variable x → −x in the integral:
Φ(−a) = ∫_{−∞}^{−a} φ(x) dx = ∫_a^∞ φ(−x) dx = ∫_a^∞ φ(x) dx = 1 − Φ(a).

5.13 b This is straightforward: P(Z ≤ −2) = Φ(−2) = 1 − Φ(2) = 0.0228.

5.14 The 10th percentile q₀.₁ is given by Φ(q₀.₁) = 0.1. To solve this we have to use the table of the standard normal distribution. Since the table only contains tail probabilities larger than 1/2, we have to use the symmetry of the normal distribution (φ(−x) = φ(x)): Φ(a) = 0.1 if and only if Φ(−a) = 0.9, if and only if 1 − Φ(a) = 0.9. This gives −q₀.₁ = 1.28, hence q₀.₁ = −1.28.

6.1 a If 0 ≤ U ≤ 1/6, put X = 1; if 1/6 < U ≤ 2/6, put X = 2; etc., i.e., if (i − 1)/6 ≤ U < i/6, then set X = i.

6.1 b The number 6U + 1 is in the interval [1, 7], so Y is one of the numbers 1, 2, 3, . . . , 7. For k = 1, 2, . . . , 6 we see that P(Y = k) = P(k ≤ 6U + 1 < k + 1) = 1/6; P(Y = 7) = P(6U + 1 = 7) = P(U = 1) = 0.

6.2 a Substitute u = 0.3782739 in x = 1 + 2√u: x = 1 + 2√0.3782739 = 2.2300795.

6.2 b The map u → 1 + 2√u is increasing, so u = 0.3 will result in a smaller value, since 0.3 < 0.3782739.

6.2 c Any number u smaller than 0.3782739 will result in a smaller realization than the one found in a. This happens with probability P(U < 0.3782739) = 0.3782739.

6.3 The random variable Z attains values in the interval [0, 1], so to show that its distribution is the U(0, 1), it suffices to show that FZ(a) = a for 0 ≤ a ≤ 1. We find:
FZ(a) = P(Z ≤ a) = P(1 − U ≤ a) = P(U ≥ 1 − a) = 1 − FU(1 − a) = 1 − (1 − a) = a.

6.4 Since 0 ≤ U ≤ 1, 1 ≤ X ≤ 3 follows, so FX(a) = 0 for a < 1 and FX(a) = 1 for a > 3. If we show that FX(a) = F(a) for 1 ≤ a ≤ 3, then we have shown FX = F:
FX(a) = P(X ≤ a) = P(1 + 2√U ≤ a) = P(√U ≤ (a − 1)/2) = P(U ≤ ((a − 1)/2)²) = FU((a − 1)²/4) = (a − 1)²/4 = F(a).
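The inverse-transform construction of 6.4 lends itself to an empirical check (seed and sample size are our choices):

```python
import random

# Check of 6.4: X = 1 + 2*sqrt(U) should have distribution function
# F(a) = (a - 1)^2 / 4 on [1, 3].
random.seed(4)
trials = 100_000
xs = [1 + 2 * random.random() ** 0.5 for _ in range(trials)]

for a in (1.5, 2.0, 2.5):
    empirical = sum(x <= a for x in xs) / trials
    print(a, round(empirical, 3), (a - 1) ** 2 / 4)
```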

6.5 We see that X ≤ a ⟺ −ln U ≤ a ⟺ ln U ≥ −a ⟺ U ≥ e^(−a),


and so P(X ≤ a) = P(U ≥ e^(−a)) = 1 − P(U < e^(−a)) = 1 − e^(−a), where we use P(U ≤ p) = p for 0 ≤ p ≤ 1, applied to p = e^(−a) (remember that a ≥ 0).

6.6 We know that X = −(1/2) ln U has an Exp(2) distribution when U is U(0, 1). Inverting this relationship, we find U = e^(−2X), which should be the way to get U from X. We check this, for 0 < a ≤ 1:
P(U ≤ a) = P(e^(−2X) ≤ a) = P(−2X ≤ ln a) = P(X ≥ −(1/2) ln a) = e^(−2·(−(1/2) ln a)) = a = FU(a).

6.7 We need to obtain F^inv, and do this by solving F(x) = u, for 0 ≤ u ≤ 1:
1 − e^(−5x²) = u ⟺ e^(−5x²) = 1 − u ⟺ −5x² = ln(1 − u) ⟺ x² = −0.2 ln(1 − u) ⟺ x = √(−0.2 ln(1 − u)).
The solution is Z = √(−0.2 ln U) (replacing 1 − U by U, see Exercise 6.3). Note that Z² has an Exp(5) distribution.

6.8 We need to solve F(x) = u for 0 ≤ u ≤ 1 to obtain F^inv(x):
1 − x^(−3) = u ⟺ 1 − u = x^(−3) ⟺ x = (1 − u)^(−1/3).

So, X = (1 − U)^(−1/3) has a Par(3) distribution, and the same holds for X = U^(−1/3) = 1/∛U.

6.9 a For six of the eight possible outcomes our algorithm terminates successfully, so the probability of success is 6/8 = 3/4.

6.9 b If the first toss (of the three coins) is unsuccessful, we try again, and repeat until the first success. This way, we generate a sequence of Ber(3/4) trials and stop after the first success. From Section 4.4 we know that the number N of trials needed has a Geo(3/4) distribution.

6.10 a Define random variables Bi = 1 if Ui ≤ p and Bi = 0 if Ui > p. Then P(Bi = 1) = p and P(Bi = 0) = 1 − p: each Bi has a Ber(p) distribution. If B1 = B2 = ··· = B_{k−1} = 0 and Bk = 1, then N = k, i.e., N is the position in the sequence of Bernoulli random variables where the first 1 occurs. This is a Geo(p) distribution. This can be verified by computing the probability mass function: for k ≥ 1,
P(N = k) = P(B1 = B2 = ··· = B_{k−1} = 0, Bk = 1) = P(B1 = 0)P(B2 = 0) ··· P(B_{k−1} = 0)P(Bk = 1) = (1 − p)^(k−1) p.
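The Geo(3/4) claim of 6.9 b can be simulated. The solution does not say which six of the eight coin patterns count as success, so the choice below (000 and 111 as the two failures) is an assumption for illustration:

```python
import random

# 6.9: each attempt tosses three fair coins; we assume the outcomes 000
# and 111 are the failures, so an attempt succeeds with probability 3/4
# and the number N of attempts is Geo(3/4), with mean 1/(3/4) = 4/3.
random.seed(5)

def attempts():
    n = 1
    while random.getrandbits(3) in (0, 7):   # 0 = 000, 7 = 111 (assumed failures)
        n += 1
    return n

trials = 100_000
mean = sum(attempts() for _ in range(trials)) / trials
print(mean)   # close to 4/3
```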

6.10 b If Y is (a real number!) greater than n, then rounding upwards means we obtain n + 1 or higher, so {Y > n} = {Z ≥ n + 1} = {Z > n}. Therefore P(Z > n) = P(Y > n) = e^(−λn) = (e^(−λ))^n. From λ = −ln(1 − p) we see: e^(−λ) = 1 − p, so the last probability is (1 − p)^n. From P(Z > n − 1) = P(Z = n) + P(Z > n) we find:
P(Z = n) = P(Z > n − 1) − P(Z > n) = (1 − p)^(n−1) − (1 − p)^n = (1 − p)^(n−1)·p.
So Z has a Geo(p) distribution.
6.11 a There are many possibilities; one is Y1 = g + 0.5 + Z1, where g and Z1 are as defined earlier. In this case, the bribed jury member adds a bonus of 0.5 points. Of course, this assumes that she was bribed to support the candidate. If she was bribed against the candidate, Y1 = g − 0.5 + Z1 would be more appropriate.
6.11 b We call the resulting deviation R; it is the average of Z1, Z2, . . . , Z7 after we have removed two of them, chosen randomly. This can be done as follows. Let I and J be independent random variables, such that P(I = n) = 1/7 for n = 1, 2, . . . , 7 and P(J = m) = 1/6 for m = 1, 2, . . . , 6. Put K = J if J < I; otherwise, put K = J + 1. Now the pair {I, K} is a random pair chosen from the set of indices {1, 2, . . . , 7}. This can be verified; for example, the probability that 1 and 2 are selected equals P(I = 1, J = 1) + P(I = 2, J = 1) = 1/21, which is correct, because there are (7 choose 2) = 21 pairs to choose from, each with the same probability. Removing ZI and ZK from the jury list, we can compute R as the average of the remaining ones. We expect this rule to be more sensitive to bribery, because the bribed jury member is more likely to have assigned one of the extreme scores (highest or lowest). With either of the two other rules, this score has no influence at all, because it is not taken into account.
6.12 We need to generate stock prices for the next five years, or 60 months. So we need sixty U(0, 1) random variables U1, . . . , U60. Let Si denote the stock price in month i, and set S0 = 100, the initial stock price.
From the Ui we obtain the stock movement as follows, for i = 1, 2, . . .:

Si = 0.95·S(i−1)   if Ui < 0.25,
Si = S(i−1)        if 0.25 ≤ Ui ≤ 0.75,
Si = 1.05·S(i−1)   if Ui > 0.75.

We have carried this out, using the realizations below:

1–10:  0.72 0.88 0.38 0.34 0.37 0.19 0.03 0.25 0.88 0.34
11–20: 0.24 0.76 0.01 0.89 0.81 0.37 0.09 0.98 0.81 0.95
21–30: 0.09 0.30 0.69 0.31 0.97 0.82 0.36 0.74 0.91 0.70
31–40: 0.31 0.52 0.93 0.03 0.04 0.36 0.76 0.37 0.00 0.16
41–50: 0.81 0.56 0.70 0.40 0.14 0.92 0.95 0.22 0.71 0.82
51–60: 0.74 0.25 0.29 0.78 0.25 0.04 0.48 0.20 0.47 0.41
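The price model above is easy to simulate. The following sketch (function and variable names are ours) turns a sequence of U(0, 1) realizations into a price path, starting from S0 = 100:

```python
import random

def simulate_prices(us, s0=100.0):
    """Price path for the model above: down 5%, flat, or up 5% per month."""
    prices = [s0]
    for u in us:
        if u < 0.25:
            prices.append(0.95 * prices[-1])
        elif u <= 0.75:
            prices.append(prices[-1])
        else:
            prices.append(1.05 * prices[-1])
    return prices

random.seed(1)
path = simulate_prices([random.random() for _ in range(60)])
# path contains the 61 prices S0, S1, ..., S60
```

Repeating this, say, one thousand times — as the solution suggests — gives an impression of the distribution of the net result of the investment strategy.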

We do not list all the stock prices, just the ones that matter for our investment strategy (you can verify this). We first wait until the price drops below 95, which happens at S4 = 94.76. Our money has been in the bank for four months, so we own 1000·1.005^4 = 1020.15, for which we can buy 1020.15/94.76 = 10.77 shares. Next we wait until the price hits 110; this happens at S15 = 114.61. We sell our shares for 10.77·114.61 = 1233.85, and put the money in the bank. At S42 = 92.19 we buy stock again, for the 1233.85·1.005^27 = 1411.71 that has accrued in the bank. We can buy 15.31 shares. For the rest of the five-year period
nothing happens; the final price is S60 = 100.63, which puts the value of our portfolio at 1540.65. For a real simulation the above should be repeated, say, one thousand times. The one thousand net results then give us an impression of the probability distribution that corresponds to this model and strategy.
6.13 Set p = P(H). Toss the coin twice. If HT occurs, I win; if TH occurs, you win; if TT or HH occurs, we repeat. Since P(HT) = P(TH) = p(1 − p), we have the same probability of winning. The probability that the game ends is 2p(1 − p) for each double toss. So the total number of tosses needed is 2N, where N has a Geo(2p(1 − p)) distribution.
7.1 a Outcomes: 1, 2, 3, 4, 5, and 6. Each has probability 1/6.
7.1 b E[T] = (1/6)·1 + (1/6)·2 + · · · + (1/6)·6 = (1 + 2 + · · · + 6)/6 = 7/2. For the variance, first compute E[T²]:
E[T²] = (1/6)·1 + (1/6)·4 + · · · + (1/6)·36 = 91/6.
Then: Var(T) = E[T²] − (E[T])² = 91/6 − 49/4 = 35/12.
7.2 a Here E[X] = (−1)·P(X = −1) + 0·P(X = 0) + 1·P(X = 1) = −1/5 + 0 + 2/5 = 1/5.
7.2 b The discrete random variable Y can take the values 0 and 1. The probabilities are P(Y = 0) = P(X² = 0) = P(X = 0) = 2/5, and P(Y = 1) = P(X² = 1) = P(X = −1) + P(X = 1) = 1/5 + 2/5 = 3/5. Moreover, E[Y] = 0·(2/5) + 1·(3/5) = 3/5.
7.2 c According to the change-of-variable formula: E[X²] = (−1)²·(1/5) + 0²·(2/5) + 1²·(2/5) = 3/5.
7.2 d With the alternative expression for the variance: Var(X) = E[X²] − (E[X])² = 3/5 − (1/5)² = 14/25.
7.3 Since Var(X) = E[X²] − (E[X])², we find E[X²] = Var(X) + (E[X])² = 7.
7.4 By the change-of-units rule: E[3 − 2X] = 3 − 2E[X] = −1, and Var(3 − 2X) = 4·Var(X) = 16.
7.5 If X has a Ber(p) distribution, then E[X] = 0·(1 − p) + 1·p = p and E[X²] = 0²·(1 − p) + 1²·p = p. So Var(X) = E[X²] − (E[X])² = p − p² = p(1 − p).
7.6 Since f is increasing on the interval [2, 3], we know from the interpretation of expectation as center of gravity that the expectation should lie closer to 3 than to 2. The computation: E[Z] = ∫ from 2 to 3 of (3/19)z³ dz = [(3/76)z⁴] from 2 to 3 = 2 + 43/76.
7.7 We use the rule for expectation and variance under change of units. First, E[X] = ∫ from 0 to 1 of x(4x − 4x³) dx = ∫ from 0 to 1 of (4x² − 4x⁴) dx = [(4/3)x³ − (4/5)x⁵] from 0 to 1 = 4/3 − 4/5 = 8/15. Then, changing units, E[2X + 3] = 2·(8/15) + 3 = 61/15. For the variance, first compute E[X²] = ∫ from 0 to 1 of x²(4x − 4x³) dx = 1/3. Hence Var(X) = 1/3 − (8/15)² = 11/225, and changing units, Var(2X + 3) = 44/225.
7.8 Let f be the probability density of X. Then f(x) = (d/dx)F(x) = 2 − 2x for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere. So E[X] = ∫ xf(x) dx = ∫ from 0 to 1 of (2x − 2x²) dx = [x² − (2/3)x³] from 0 to 1 = 1/3.
7.9 a If X has a U(α, β) distribution, then X has probability density function f(x) = 1/(β − α) for α ≤ x ≤ β, and 0 elsewhere. So
E[X] = ∫ from α to β of x/(β − α) dx = (1/(β − α))·[x²/2] from α to β = (β² − α²)/(2(β − α)) = (α + β)/2.
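The formulas of 7.9 a (and, below, 7.9 b) are easy to confirm by Monte Carlo. A minimal sketch — the values α = 2 and β = 5 are an arbitrary choice of ours, not from the exercise:

```python
import random

# Check E[X] = (alpha + beta)/2 and Var(X) = (beta - alpha)^2 / 12
# for X with a U(alpha, beta) distribution.
alpha, beta = 2.0, 5.0
random.seed(0)
xs = [random.uniform(alpha, beta) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
# Expect mean near (2 + 5)/2 = 3.5 and var near 9/12 = 0.75
```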


7.9 b First compute E[X²]:
E[X²] = ∫ from α to β of x²/(β − α) dx = (β³ − α³)/(3(β − α)) = (α² + αβ + β²)/3.
Then Var(X) = E[X²] − (E[X])² = (α² + αβ + β²)/3 − (α + β)²/4 = (β − α)²/12. A quicker way to do this is to note that X has the same variance as a U(0, β − α) distributed random variable U. Hence Var(X) = Var(U) = (β − α)²/12.
7.10 a If X has an Exp(λ) distribution, then X has probability density f(x) = λe^(−λx) for x ≥ 0, and f(x) = 0 elsewhere. So, using the partial integration formula,
E[X] = ∫ from 0 to ∞ of λx e^(−λx) dx = [−x e^(−λx)] from 0 to ∞ + ∫ from 0 to ∞ of e^(−λx) dx = [−(1/λ)e^(−λx)] from 0 to ∞ = 1/λ.
7.10 b First compute E[X²], using partial integration and the result from part a:
E[X²] = ∫ from 0 to ∞ of λx² e^(−λx) dx = [−x² e^(−λx)] from 0 to ∞ + 2 ∫ from 0 to ∞ of x e^(−λx) dx = (2/λ) ∫ from 0 to ∞ of λx e^(−λx) dx = 2/λ².
Then Var(X) = E[X²] − (E[X])² = 2/λ² − (1/λ)² = 1/λ².
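The inversion method of 6.7 and the moments of 7.10 can be combined in one simulation sketch. We sample an Exp(λ) variable via F^inv(u) = −ln(u)/λ (using that U and 1 − U have the same distribution, as in Exercise 6.3) and compare the sample mean and variance with 1/λ and 1/λ²; the value λ = 5 is our arbitrary choice:

```python
import math
import random

def exp_by_inversion(lam, rng=random):
    """Sample Exp(lam) via the inverse distribution function -ln(U)/lam."""
    return -math.log(rng.random()) / lam

lam = 5.0
random.seed(0)
xs = [exp_by_inversion(lam) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
# Expect mean near 1/lam = 0.2 and var near 1/lam^2 = 0.04
```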

7.11 a If X has a Par(α) distribution, then X has probability density f(x) = αx^(−α−1) for x ≥ 1, and f(x) = 0 elsewhere. So the Par(2) distribution has probability density f(x) = 2x^(−3), and then E[X] = ∫ from 1 to ∞ of x·2x^(−3) dx = 2 ∫ from 1 to ∞ of x^(−2) dx = [−2x^(−1)] from 1 to ∞ = 2.
7.11 b Now X has probability density function f(x) = (1/2)x^(−3/2), so E[X] = ∫ from 1 to ∞ of (1/2)x^(−1/2) dx = [√x] from 1 to ∞ = ∞.
7.11 c In the general case
E[X] = ∫ from 1 to ∞ of x·αx^(−α−1) dx = ∫ from 1 to ∞ of αx^(−α) dx = [α·x^(−α+1)/(−α + 1)] from 1 to ∞ = α/(α − 1),
provided x^(−α+1) → 0 as x → ∞, which happens for all α > 1.
7.12 From 7.11 c we already know that E[X] = α/(α − 1) if and only if α > 1. Now we compute E[X²]:
E[X²] = ∫ from 1 to ∞ of αx^(−α+1) dx = [α·x^(−α+2)/(−α + 2)] from 1 to ∞ = α/(α − 2),
provided α > 2 (otherwise E[X²] = ∞). It follows that Var(X) exists only if α > 2, and then
Var(X) = E[X²] − (E[X])² = α/(α − 2) − α²/(α − 1)² = α[α² − 2α + 1 − α² + 2α]/((α − 1)²(α − 2)) = α/((α − 1)²(α − 2)).
7.13 Since Var(X) = E[(X − E[X])²] ≥ 0, but also Var(X) = E[X²] − (E[X])²,

we must have that E[X²] ≥ (E[X])².
7.14 The area of a triangle is half the length of the base times the height. Hence A = (1/2)·Y, where Y is U(0, 1) distributed. It follows that E[A] = (1/2)·E[Y] = 1/4.
7.15 a We use the change-of-units rule for the expectation twice:
Var(rX) = E[(rX − E[rX])²] = E[(rX − rE[X])²] = E[r²(X − E[X])²] = r²·E[(X − E[X])²] = r²·Var(X).
7.15 b Now we use the change-of-units rule for the expectation once:
Var(X + s) = E[((X + s) − E[X + s])²] = E[((X + s) − (E[X] + s))²] = E[(X − E[X])²] = Var(X).
7.15 c With first b, and then a: Var(rX + s) = Var(rX) = r²·Var(X).
7.16 We have to use partial integration:
E[X] = ∫ from 0 to 1 of −4x² ln x dx = [−(4/3)x³ ln x] from 0 to 1 + ∫ from 0 to 1 of (4/3)x² dx = 0 + 4/9 = 4/9.
7.17 a Since ai ≥ 0 and pi ≥ 0, each term of a1p1 + · · · + arpr is nonnegative. So from 0 = E[U] = a1p1 + · · · + arpr it follows that every term aipi equals zero. As we may assume that all pi > 0, it follows that a1 = a2 = · · · = ar = 0.
7.17 b Let m = E[V] = p1b1 + · · · + prbr. Then the random variable U = (V − E[V])² takes the values a1 = (b1 − m)², . . . , ar = (br − m)². Since E[U] = Var(V) = 0, part a tells us that 0 = a1 = (b1 − m)², . . . , 0 = ar = (br − m)². But this is only possible if b1 = m, . . . , br = m. Since m = E[V], this is the same as saying that P(V = E[V]) = 1.
8.1 The random variable Y can take the values |80 − 100| = 20, |90 − 100| = 10, |100 − 100| = 0, |110 − 100| = 10, and |120 − 100| = 20. We see that the possible values are 0, 10, and 20, and the latter two each occur in two ways: P(Y = 0) = P(X = 100) = 0.2, but P(Y = 10) = P(X = 110) + P(X = 90) = 0.4 and P(Y = 20) = P(X = 120) + P(X = 80) = 0.4.
8.2 a First we determine the possible values that Y can take. Here these are −1, 0, and 1. Then we investigate which x-values lead to these y-values and sum the probabilities of the x-values to obtain the probability of the y-value. For instance,
P(Y = 0) = P(X = 2) + P(X = 4) + P(X = 6) = 1/6 + 1/6 + 1/6 = 1/2.
Similarly, we obtain for the two other values P(Y = 1) = P(X = 3) = 1/6 and P(Y = −1) = P(X = 1) + P(X = 5) = 1/3.
8.2 b The values taken by Z are −1, 0, and 1. Furthermore,
P(Z = 0) = P(X = 1) + P(X = 3) + P(X = 5) = 1/6 + 1/6 + 1/6 = 1/2,
and similarly P(Z = 1) = 1/3 and P(Z = −1) = 1/6.
8.2 c Since for any φ one has sin²(φ) + cos²(φ) = 1, W can only take the value 1, so P(W = 1) = 1.
8.3 a Let F be the distribution function of U, and G the distribution function of V. Then we know that F(x) = 0 for x < 0, F(x) = x for 0 ≤ x ≤ 1, and F(x) = 1 for x > 1. Thus
G(y) = P(V ≤ y) = P(2U + 7 ≤ y) = P(U ≤ (y − 7)/2) = (y − 7)/2,
provided 0 ≤ (y − 7)/2 ≤ 1, which happens if and only if 0 ≤ y − 7 ≤ 2, if and only if 7 ≤ y ≤ 9. Furthermore, G(y) = 0 if y < 7 and G(y) = 1 if y > 9. We recognize G as the distribution function of a U(7, 9) random variable.
8.3 b Let F and G be as before. Now
G(y) = P(V ≤ y) = P(rU + s ≤ y) = P(U ≤ (y − s)/r) = (y − s)/r,
provided 0 ≤ (y − s)/r ≤ 1, which happens if and only if 0 ≤ y − s ≤ r (note that we use that r > 0), if and only if s ≤ y ≤ s + r. We see that V has a U(s, s + r) distribution.
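The conclusion of 8.3 b — that rU + s is U(s, s + r) distributed — can be checked empirically. In the sketch below we take r = 2 and s = 7 (the values of part a) and verify that all realizations fall in [7, 9] and that about half fall below the midpoint 8:

```python
import random

random.seed(3)
r, s = 2.0, 7.0
vs = [r * random.random() + s for _ in range(100_000)]
frac_below_8 = sum(v <= 8 for v in vs) / len(vs)
# For a U(7, 9) random variable, P(V <= 8) = (8 - 7)/2 = 0.5
```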

8.4 a Let F be the distribution function of X, and G that of Y. Then we know that F(x) = 1 − e^(−x/2) for x ≥ 0, and we find that G(y) = P(Y ≤ y) = P((1/2)X ≤ y) = P(X ≤ 2y) = 1 − e^(−y). We recognize G as the distribution function of an Exp(1) distribution.
8.4 b Let F be the distribution function of X, and G that of Y. Then we know that F(x) = 1 − e^(−λx) for x ≥ 0, and we find that G(y) = P(Y ≤ y) = P(λX ≤ y) = P(X ≤ y/λ) = 1 − e^(−y). We recognize G as the distribution function of an Exp(1) distribution.
8.5 a For 0 ≤ b ≤ 2 we have that
FX(b) = ∫ from 0 to b of (3/4)x(2 − x) dx = [(3/4)x² − (1/4)x³] from 0 to b = (3/4)b² − (1/4)b³.
Furthermore FX(b) = 0 for b < 0, and FX(b) = 1 for b > 2.
8.5 b For 0 ≤ y ≤ √2 we have FY(y) = P(Y ≤ y) = P(√X ≤ y) = P(X ≤ y²) = (3/4)y⁴ − (1/4)y⁶.
8.5 c We simply differentiate FY: fY(y) = (d/dy)FY(y) = 3y³ − (3/2)y⁵ for 0 ≤ y ≤ √2, and fY(y) = 0 elsewhere.
8.6 a Compute the distribution function of Y: for y > 0,
FY(y) = P(Y ≤ y) = P(1/X ≤ y) = P(X ≥ 1/y) = 1 − P(X < 1/y) = 1 − FX(1/y),
where you use that P(X < 1/y) = P(X ≤ 1/y) since X has a continuous distribution. Differentiating, we obtain
fY(y) = (d/dy)(1 − FX(1/y)) = (1/y²)·fX(1/y)
for y > 0 (and fY(y) = 0 for y ≤ 0).
8.6 b Applying part a with Z = 1/Y we obtain fZ(z) = (1/z²)·fY(1/z). Then applying a again:
fZ(z) = (1/z²)·fY(1/z) = (1/z²)·(1/(1/z))²·fX(1/(1/z)) = (1/z²)·z²·fX(z) = fX(z).
This is of course what should happen: Z = 1/Y = 1/(1/X) = X, so Z and X have the same probability density function.
8.7 Let X be any random variable that only takes positive values, and let Y = ln(X). Then for y > 0:
FY(y) = P(Y ≤ y) = P(ln(X) ≤ y) = P(X ≤ e^y) = FX(e^y).
If X has a Par(α) distribution, then FX(x) = 1 − x^(−α) for x ≥ 1. Hence FY(y) = 1 − e^(−αy) for y ≥ 0 is the distribution function of Y. We recognize this as the distribution function of an Exp(α) distribution.
8.8 Let X be any random variable that only takes positive values, and let W = X^(1/α)/λ. Then for w > 0:
FW(w) = P(W ≤ w) = P(X^(1/α)/λ ≤ w) = P(X ≤ (λw)^α) = FX((λw)^α).
If X has an Exp(1) distribution, then FX(x) = 1 − e^(−x) for x ≥ 0. Hence FW(w) = 1 − e^(−(λw)^α) for w ≥ 0.
8.9 If Y = −X, then FY(y) = P(Y ≤ y) = P(−X ≤ y) = P(X ≥ −y) = 1 − FX(−y) for all y (where you use that X has a continuous distribution). Differentiating, we obtain fY(y) = fX(−y) for all y.
8.10 Because of symmetry: P(X ≤ 3) = 0.500. Furthermore, σ² = 4, so σ = 2. Then Z = (X − 3)/2 is an N(0, 1) distributed random variable, so that
P(X ≤ 1) = P((X − 3)/2 ≤ (1 − 3)/2) = P(Z ≤ −1) = P(Z ≥ 1) = 0.1587.
8.11 Since −g is a convex function, Jensen's inequality yields that −g(E[X]) ≤ E[−g(X)]. Since E[−g(X)] = −E[g(X)], the inequality g(E[X]) ≥ E[g(X)] follows by multiplying both sides by −1.
8.12 a The possible values Y can take are √0 = 0, √1 = 1, √100 = 10, and √10 000 = 100. Hence the probability mass function is given by

y:        0    1    10   100
P(Y = y): 1/4  1/4  1/4  1/4

8.12 b Compute the second derivative: (d²/dx²)√x = −(1/4)x^(−3/2) < 0. Hence g(x) = √x is a concave function, and Jensen's inequality (see 8.11) yields that √(E[X]) ≥ E[√X].
8.12 c We obtain √(E[X]) = √((0 + 1 + 100 + 10 000)/4) = √2525.25 = 50.25, but
E[√X] = E[Y] = (0 + 1 + 10 + 100)/4 = 27.75.
8.13 On the interval [π, 2π] the function sin w is a convex function, so by Jensen's inequality sin(E[W]) ≤ E[sin(W)]. Verified by computation: sin(E[W]) = sin(3π/2) = −1, while
E[sin(W)] = ∫ from π to 2π of sin(w)/π dw = [−cos(w)/π] from π to 2π = −2/π,
and indeed −1 ≤ −2/π.
8.14 a An example is X with P(X = 1) = P(X = −1) = 1/2. Then E[X] = (1/2)·1 + (1/2)·(−1) = 0 and also E[X³] = (1/2)·1 + (1/2)·(−1) = 0, so E[X³] = (E[X])³.
8.14 b The function g(x) = x³ is strictly convex on the interval (0, ∞). Hence if X > 0 (and X is not constant), the strict inequality E[X³] > (E[X])³ will hold.
8.15 a We know that P(Z ≤ z) = [P(X1 ≤ z)]². So FZ(z) = z² for 0 ≤ z ≤ 1, and fZ(z) = 2z on this interval. Therefore E[Z] = ∫ from 0 to 1 of 2z² dz = 2/3. For V we have FV(v) = 1 − (1 − v)² = 2v − v². Hence fV(v) = 2 − 2v. Therefore E[V] = ∫ from 0 to 1 of (2v − 2v²) dv = 1 − 2/3 = 1/3.
8.15 b For general n we have FZ(z) = zⁿ for 0 ≤ z ≤ 1, and fZ(z) = nz^(n−1). Therefore E[Z] = ∫ from 0 to 1 of nzⁿ dz = n/(n + 1). For V we have FV(v) = 1 − (1 − v)ⁿ, and fV(v) = n(1 − v)^(n−1). Therefore, substituting u = 1 − v,
E[V] = ∫ from 0 to 1 of nv(1 − v)^(n−1) dv = ∫ from 0 to 1 of n(1 − u)u^(n−1) du = n ∫ from 0 to 1 of u^(n−1) du − n ∫ from 0 to 1 of uⁿ du = 1 − n/(n + 1) = 1/(n + 1).
8.15 c Since the Yi = 1 − Xi are also uniform, we get E[Z] = E[max{Y1, . . . , Yn}] = E[max{1 − X1, . . . , 1 − Xn}] = 1 − E[min{X1, . . . , Xn}] = 1 − E[V].
8.16 a Suppose first that a ≤ b. Then min{a, b} = a, and also |a − b| = b − a. So a + b − |a − b| = a + b − b + a = 2a, and the formula holds. If a > b, then min{a, b} = b, and a + b − |a − b| = a + b − a + b = 2b, so the formula also holds in this case.
8.16 b From part a and linearity of expectations we have
E[min{X, Y}] = E[(X + Y − |X − Y|)/2] = (1/2)E[X] + (1/2)E[Y] − (1/2)E[|X − Y|] = E[X] − (1/2)E[|X − Y|],
since E[X] = E[Y].
8.16 c From b we obtain, since E[|X − Y|] ≥ 0, that E[min{X, Y}] ≤ E[X]. Interchanging X and Y, we also have E[min{X, Y}] ≤ E[Y]. Combining these two inequalities, we obtain E[min{X, Y}] ≤ min{E[X], E[Y]}.
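The formulas of 8.15 b — E[max] = n/(n + 1) and E[min] = 1/(n + 1) for n independent U(0, 1) variables — are easy to check by simulation. A minimal sketch with our own choice n = 4:

```python
import random

n = 4
random.seed(7)
maxima, minima = [], []
for _ in range(100_000):
    xs = [random.random() for _ in range(n)]
    maxima.append(max(xs))
    minima.append(min(xs))
mean_max = sum(maxima) / len(maxima)
mean_min = sum(minima) / len(minima)
# Expect mean_max near n/(n+1) = 4/5 and mean_min near 1/(n+1) = 1/5
```

Note that mean_max + mean_min is close to 1, in agreement with 8.15 c.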

8.17 a For n = 2: if x1 ≤ x2, then min{x1, x2} = x1 and max{−x1, −x2} = −x1, since −x1 ≥ −x2; similarly for the case x1 > x2. In general, if xi is the smallest of x1, . . . , xn, then −xi will be the largest of −x1, . . . , −xn, so min{x1, . . . , xn} = −max{−x1, . . . , −xn}.
8.17 b Using part a we have
FV(a) = P(min{X1, . . . , Xn} ≤ a) = P(−max{−X1, . . . , −Xn} ≤ a) = P(max{−X1, . . . , −Xn} ≥ −a) = 1 − (P(−X1 ≤ −a))ⁿ = 1 − (1 − P(X1 ≤ a))ⁿ,
using Exercise 8.9 in the last step.
8.18 The distribution function of each of the Xi is F(x) = 1 − e^(−x). Then the distribution function of V is 1 − (1 − F(x))ⁿ = 1 − e^(−nx). This is the distribution function of an Exp(n) random variable.
8.19 a This happens for all φ in the interval [π/4, π/2], which corresponds to the upper right quarter of the circle.
8.19 b Since {Z ≤ t} = {X ≤ arctan(t)}, we obtain
FZ(t) = P(Z ≤ t) = P(X ≤ arctan(t)) = 1/2 + arctan(t)/π.
8.19 c Differentiating FZ, we obtain that the probability density function of Z is
fZ(z) = (d/dz)FZ(z) = (d/dz)(1/2 + arctan(z)/π) = 1/(π(1 + z²))
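The distribution function found in 8.19 b can be checked by simulation: taking the tangent of a uniformly distributed angle should produce a variable with F(t) = 1/2 + arctan(t)/π (the standard Cauchy distribution). A sketch, with the angle drawn from (−π/2, π/2):

```python
import math
import random

random.seed(11)
zs = [math.tan(random.uniform(-math.pi / 2, math.pi / 2))
      for _ in range(100_000)]
frac_below_1 = sum(z <= 1 for z in zs) / len(zs)
# F(1) = 1/2 + arctan(1)/pi = 1/2 + 1/4 = 3/4
```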

9.1 For a and b from 1 to 4 we have
P(X = a) = P(X = a, Y = 1) + · · · + P(X = a, Y = 4) = 1/4,
and
P(Y = b) = P(X = 1, Y = b) + · · · + P(X = 4, Y = b) = 1/4.
9.2 a From P(X = 1, Y = 1) = 1/2, P(X = 1) = 2/3, and the fact that P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = −1), it follows that P(X = 1, Y = −1) = 1/6. Since P(Y = 1) = 1/2 and P(X = 1, Y = 1) = 1/2, we must have that P(X = 0, Y = 1) and P(X = 2, Y = 1) are both zero. From this and the fact that P(X = 0) = 1/6 = P(X = 2), one finds that P(X = 0, Y = −1) = 1/6 = P(X = 2, Y = −1).
9.2 b Since, e.g., P(X = 2, Y = 1) = 0 is different from P(X = 2)·P(Y = 1) = (1/6)·(1/2), one finds that X and Y are dependent.
9.3 a P(X = Y) = P(X = 1, Y = 1) + · · · + P(X = 4, Y = 4) = 1/4.
9.3 b P(X + Y = 5) = P(X = 1, Y = 4) + P(X = 2, Y = 3) + P(X = 3, Y = 2) + P(X = 4, Y = 1) = 1/4.
9.3 c P(1 < X ≤ 3, 1 < Y ≤ 3) = P(X = 2, Y = 2) + P(X = 2, Y = 3) + P(X = 3, Y = 2) + P(X = 3, Y = 3) = 1/4.
9.3 d P((X, Y) ∈ {1, 4} × {1, 4}) = P(X = 1, Y = 1) + P(X = 1, Y = 4) + P(X = 4, Y = 1) + P(X = 4, Y = 4) = 1/4.
9.4 Since P(X = i, Y = j) is either 0 or equal to 1/14 for each i and j from 1 to 5, we know that all the entries of the first row and the second column of the table are equal to 1/14. Since P(X = 1) = 1/14, this then determines the rest of the first column (apart from the first entry it contains only zeroes). Similarly, since P(Y = 5) = 1/14, one must have that — apart from the second entry — all the entries in the fifth row are zero. Continuing in this way we find:

b\a       1     2     3     4     5   | P(Y = b)
1:      1/14  1/14  1/14  1/14  1/14 |  5/14
2:       0    1/14  1/14  1/14  1/14 |  4/14
3:       0    1/14  1/14   0     0   |  2/14
4:       0    1/14  1/14   0     0   |  2/14
5:       0    1/14   0     0     0   |  1/14
P(X = a): 1/14 5/14 4/14 2/14 2/14  |  1
9.5 a From the first row it follows that −1/16 ≤ ε ≤ 1/4. The second and third rows do not add extra information, so we find that −1/16 ≤ ε ≤ 1/4.
9.5 b For any (allowed) value of ε we have that P(X = 1, Y = 4) = 0. Since P(X = 1) ≥ 1/8 and P(Y = 4) = 3/16, we find that P(X = 1, Y = 4) = 0 ≠ P(X = 1)·P(Y = 4). Hence there does not exist a value of ε for which X and Y are independent.
9.6 a U attains the values 0, 1, and 2, while V attains the values 0 and 1. Then P(U = 0, V = 0) = P(X = 0, Y = 0) = P(X = 0)·P(Y = 0) = 1/4, due to the independence of X and Y. In a similar way the other joint probabilities of U and V are obtained, yielding the following table:

v\u       0    1    2  | P(V = v)
0:       1/4   0   1/4 |  1/2
1:        0   1/2   0  |  1/2
P(U = u): 1/4 1/2 1/4  |  1

9.6 b Since P(U = 0, V = 0) = 1/4 ≠ 1/8 = P(U = 0)·P(V = 0), we find that U and V are dependent.
9.7 a The joint probability distribution of X and Y is given by:

a\b       1     2   | P(X = a)
1:      0.22  0.11 |  0.33
2:      0.15  0.24 |  0.39
3:      0.06  0.22 |  0.28
P(Y = b): 0.43 0.57 |  1
9.7 b Since P(X = 1, Y = 1) = 1168/5383, which differs from P(X = 1)·P(Y = 1) = (1741/5383)·(2298/5383), we find that X and Y are dependent.
9.8 a Since X can attain the values 0 and 1 and Y the values 0 and 2, Z can attain the values 0, 1, 2, and 3, with probabilities P(Z = 0) = P(X = 0, Y = 0) = 1/4, P(Z = 1) = P(X = 1, Y = 0) = 1/4, P(Z = 2) = P(X = 0, Y = 2) = 1/4, and P(Z = 3) = P(X = 1, Y = 2) = 1/4.
9.8 b Since X = Z − Y, X can attain the values −2, −1, 0, 1, 2, and 3, with probabilities
P(X = −2) = P(Z = 0, Y = 2) = 1/8,
P(X = −1) = P(Z = 1, Y = 2) = 1/8,
P(X = 0) = P(Z = 0, Y = 0) + P(Z = 2, Y = 2) = 1/4,
P(X = 1) = P(Z = 1, Y = 0) + P(Z = 3, Y = 2) = 1/4,
P(X = 2) = P(Z = 2, Y = 0) = 1/8,
P(X = 3) = P(Z = 3, Y = 0) = 1/8.
We have the following table:

z:      −2   −1    0    1    2    3
pX(z):  1/8  1/8  1/4  1/4  1/8  1/8

9.9 a One has that FX(x) = lim(y→∞) F(x, y). So for x ≤ 0: FX(x) = 0, and for x > 0: FX(x) = F(x, ∞) = 1 − e^(−2x). Similarly, FY(y) = 0 for y ≤ 0, and for y > 0: FY(y) = F(∞, y) = 1 − e^(−y).
9.9 b For x > 0 and y > 0:
f(x, y) = ∂²F(x, y)/∂x∂y = 2e^(−(2x+y)).
9.9 c There are two ways to determine fX(x):
fX(x) = ∫ f(x, y) dy = ∫ from 0 to ∞ of 2e^(−(2x+y)) dy = 2e^(−2x) for x > 0,
and
fX(x) = (d/dx)FX(x) = 2e^(−2x) for x > 0.
Using either way, one finds that fY(y) = e^(−y) for y > 0.
9.9 d Since F(x, y) = FX(x)·FY(y) for all x, y, we find that X and Y are independent.
9.10 a
P(1/4 ≤ X ≤ 1/2, 1/3 ≤ Y ≤ 2/3) = ∫ from 1/4 to 1/2 ∫ from 1/3 to 2/3 of (12/5)xy(1 + y) dy dx = ∫ from 1/4 to 1/2 of (82/135)x dx = 41/720.
9.10 b Since f(x, y) = 0 for x < 0 or y < 0,
F(a, b) = P(X ≤ a, Y ≤ b) = ∫ from 0 to a ∫ from 0 to b of (12/5)xy(1 + y) dy dx = (3/5)a²b² + (2/5)a²b³.
9.10 c Since f(x, y) = 0 for y > 1, we find for a between 0 and 1 and b ≥ 1:
F(a, b) = P(X ≤ a, Y ≤ b) = P(X ≤ a, Y ≤ 1) = F(a, 1) = a².
Hence, applying (9.1), one finds that FX(a) = a² for a between 0 and 1.
9.10 d Another way to obtain fX is by differentiating FX.
9.10 e f(x, y) = fX(x)·fY(y), so X and Y are independent.
9.11 To determine P(X < Y) we must integrate f(x, y) over the region G of points (x, y) in R² for which x is smaller than y:
P(X < Y) = ∫∫ over {(x, y): x < y} of f(x, y) dx dy = ∫ from 0 to 1 ∫ from 0 to y of (12/5)xy(1 + y) dx dy = ∫ from 0 to 1 of y(1 + y)·(12/5)·(y²/2) dy = (12/10) ∫ from 0 to 1 of y³(1 + y) dy = 27/50.
9.12 a Since the integral over R² of the joint probability density function is equal to 1, we find from
∫ from 0 to 2 ∫ from 0 to 1 of (3u² + 8uv) du dv = 10
that K = 1/10. Here we used that f(x, y) = 0 for (x, y) outside [0, 1] × [0, 2].
9.12 b
P(2X ≤ Y) = ∫∫ over {(x, y): 2x ≤ y} of (1/10)(3x² + 8xy) dy dx = ∫ from x = 0 to 1 ∫ from y = 2x to 2 of (1/10)(3x² + 8xy) dy dx = 0.45.
9.13 a For r ≥ 0, let Dr be the disc with origin (0, 0) and radius r. Since
1 = ∫∫ f(x, y) dx dy = ∫∫ over D1 of c dx dy = c · (area of D1) = cπ,
we find that c = 1/π.
9.13 b Clearly FR(r) = 0 for r < 0, and FR(r) = 1 for r > 1. For r between 0 and 1,
FR(r) = P((X, Y) ∈ Dr) = ∫∫ over Dr of (1/π) dx dy = πr²/π = r².
9.13 c For x between −1 and 1,
fX(x) = ∫ from −√(1 − x²) to √(1 − x²) of (1/π) dy = (2/π)√(1 − x²).
For x outside the interval [−1, 1] we have that fX(x) = 0.
9.14 a Let f be the joint probability density function of the pair (X, Y), and F their joint distribution function. Since f(x, y) = ∂²F(x, y)/∂x∂y, we first determine F. Setting G = (−∞, a] × (−∞, b], for −1 ≤ a, b ≤ 1 we have that
F(a, b) = P((X, Y) ∈ G) = (a + 1)(b + 1)/4.
Furthermore, if a ≤ −1 or b ≤ −1, we have that F(a, b) = 0, since G then does not intersect the support of (X, Y). In a similar way, we find for a > 1 and −1 ≤ b ≤ 1 that F(a, b) = (b + 1)/2, while for −1 ≤ a ≤ 1 and b > 1 we have that F(a, b) = (a + 1)/2. Finally, if a, b > 1, then F(a, b) = 1. Taking derivatives, it then follows that f(x, y) = 1/4 for x and y between −1 and 1, and f(x, y) = 0 for all other values of x and y.
9.14 b Note that for x between −1 and 1, the marginal probability density function fX of X is given by
fX(x) = ∫ from −1 to 1 of f(x, y) dy = 1/2,
and that fX(x) = 0 for all other values of x. Similarly, fY(y) = 1/2 for y between −1 and 1, and fY(y) = 0 otherwise. But then we find that f(x, y) = fX(x)·fY(y) for all possible x's and y's, and we conclude that X and Y are independent, U(−1, 1) distributed random variables.
9.15 a Writing Δ(a, b) for the set of points (x, y) in the triangle Δ for which x ≤ a and y ≤ b, we have that
F(a, b) = area(Δ(a, b)) / area of Δ = 2·area(Δ(a, b)).
If a < 0 or if b < 0 (or both), then area(Δ(a, b)) = 0, so F(a, b) = 0.
If (a, b) ∈ Δ, then area(Δ(a, b)) = a(b − a/2), so F(a, b) = a(2b − a).
If 0 ≤ b ≤ 1 and a > b, then area(Δ(a, b)) = b²/2, so F(a, b) = b².
If 0 ≤ a ≤ 1 and b > 1, then area(Δ(a, b)) = a − a²/2, so F(a, b) = 2a − a².
If both a > 1 and b > 1, then area(Δ(a, b)) = 1/2, so F(a, b) = 1.
9.15 b Since f(x, y) = ∂²F(x, y)/∂x∂y, we find for (x, y) ∈ Δ that f(x, y) = 2. Furthermore, f(x, y) = 0 for (x, y) outside the triangle Δ.

9.15 c For x between 0 and 1,


Z

fX (x) =

f (x, y) dy =
x

2 dy = 2(1 x).

For y between 0 and 1,


Z

fY (y) =

f (x, y) dy =
0

2 dx = 2y.

29.1 Full solutions

483

9.16 Following the solution of Exercise 9.14, one finds that U and V are independent U(0, 1) distributed random variables. Using the results from Section 8.4, we find that FU(x) = 1 − (1 − x)² for x between 0 and 1, and that FV(y) = y² for y between 0 and 1. Differentiating yields the desired result.
9.17 For 0 ≤ s ≤ t ≤ a, it follows from the fact that U1 and U2 are independent, uniformly distributed random variables over [0, a] that P(U1 ≤ t, U2 ≤ t) = t²/a², and that
P(s < U1 ≤ t, s < U2 ≤ t) = (t − s)²/a².
But then the answer is an immediate consequence of the hint. The statement can also be obtained as follows. Note that
P(V ≤ s, Z ≤ t) = P(U2 ≤ s, s < U1 ≤ t) + P(U1 ≤ s, s < U2 ≤ t) + P(U1 ≤ s, U2 ≤ s).
Using independence and the fact that each Ui has distribution function FUi(u) = u/a, we find for 0 ≤ s ≤ t ≤ a:
P(V ≤ s, Z ≤ t) = s(t − s)/a² + s(t − s)/a² + s²/a² = (t² − (t − s)²)/a².
9.18 a By definition,
E[Xi] = Σ from k = 1 to N of k·pXi(k) = (1/N)(1 + 2 + · · · + N) = (1/N)·N(N + 1)/2 = (N + 1)/2.
9.18 b Using the identity, we find
E[Xi²] = Σ from k = 1 to N of k²·P(Xi = k) = (1/N) Σ from k = 1 to N of k² = (N + 1)(2N + 1)/6.
But then we have that
Var(Xi) = E[Xi²] − (E[Xi])² = (N + 1)(2N + 1)/6 − ((N + 1)/2)² = (N² − 1)/12.
9.19 a Clearly we must have that a = √50 = 5√2, but then it follows that 2ab = 80, implying that b = 4√2. Since b² + c = 32 + c = 50, we find that c = 18.
9.19 b Note that
∫ from −∞ to ∞ of e^(−(5√2·y − 4√2·x)²) dy = ∫ from −∞ to ∞ of e^(−(1/2)((y − μ)/σ)²) dy = σ√(2π) = √(2π)/10,
with μ = 4x/5 and σ = 1/10, since
(1/(σ√(2π)))·e^(−(1/2)((y − μ)/σ)²) for −∞ < y < ∞
is the probability density function of an N(μ, σ²) distributed random variable (and therefore integrates to 1).
9.19 c
fX(x) = ∫ f(x, y) dy = (30/π) ∫ from −∞ to ∞ of e^(−50x² − 50y² + 80xy) dy = (30/π)·e^(−18x²) ∫ from −∞ to ∞ of e^(−(5√2·y − 4√2·x)²) dy = (30/π)·e^(−18x²)·√(2π)/10 = (1/(√(2π)·(1/6)))·e^(−(1/2)(x/(1/6))²)
for −∞ < x < ∞. So we see that X has an N(0, 1/36) distribution.
9.20 a Since the needle hits the sheet of paper at a random position, the midpoint (X, Y) falls completely randomly between some lines. Consequently, the distance Z between (X, Y) and the line under (X, Y) has a U(0, 1) distribution. Also the orientation of the needle is completely random, so the angle between the needle and the positive x-axis can be anything between 0 and 180 degrees. But then H has a U(0, π) distribution.
9.20 b From Figure 29.1 we see that — in case Z ≤ 1/2 — the needle hits the line under it when Z ≤ (1/2) sin H. In case Z > 1/2 we have a similar picture, but then the needle hits the line above it when 1 − Z ≤ (1/2) sin H.

Fig. 29.1. Solution of Exercise 9.20, case Z ≤ 1/2: the needle with midpoint (X, Y) at height Z above the line, and half its vertical extent equal to (1/2) sin H.

9.20 c Since Z and H are independent, uniformly distributed random variables, the probability we are looking for is equal to the area of the two regions in [0, π) × [0, 1] for which either
Z ≤ (1/2) sin H or 1 − Z ≤ (1/2) sin H,
divided (!) by the total area π. Note that these two regions have the same area, and that this area is equal to
∫ from 0 to π of (1/2) sin h dh = (1/2)·2 = 1.
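The crossing probability 2/π derived in 9.20 (Buffon's needle) can be checked by simulating Z ~ U(0, 1) and H ~ U(0, π) directly and applying the hit condition of part b:

```python
import math
import random

random.seed(5)
trials = 200_000
hits = 0
for _ in range(trials):
    z = random.random()                 # distance to the line below, U(0, 1)
    h = random.uniform(0, math.pi)      # needle angle, U(0, pi)
    # the needle crosses a line iff Z <= (1/2) sin H or 1 - Z <= (1/2) sin H
    if z <= 0.5 * math.sin(h) or 1 - z <= 0.5 * math.sin(h):
        hits += 1
frac = hits / trials
# Expect frac near 2/pi, approximately 0.6366
```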

So the probability we are after is equal to
(1 + 1)/π = 2/π.
10.1 a The joint probability distribution of X and Y is given by:

a\b       1      2    | P(X = a)
1:      0.217  0.106 |  0.323
2:      0.153  0.244 |  0.397
3:      0.057  0.223 |  0.280
P(Y = b): 0.427 0.573 |  1

This means that
E[X] = 1·0.323 + 2·0.397 + 3·0.280 = 1.957,
E[Y] = 1·0.427 + 2·0.573 = 1.573,
E[XY] = 1·1·0.217 + · · · + 3·2·0.223 = 3.220.
It follows that Cov(X, Y) = E[XY] − E[X]·E[Y] = 3.220 − 1.957·1.573 = 0.142.
10.1 b From the joint probability distribution of X and Y we compute
E[X²] = 1·0.323 + 4·0.397 + 9·0.280 = 4.431, so Var(X) = E[X²] − (E[X])² = 0.601;
E[Y²] = 1·0.427 + 4·0.573 = 2.719, so Var(Y) = E[Y²] − (E[Y])² = 0.245.
Using the result from a we find
ρ(X, Y) = Cov(X, Y)/√(Var(X)·Var(Y)) = 0.142/√(0.601·0.245) = 0.369.
10.2 a From Exercise 9.2:

y\x      0    1    2  | pY(y)
−1:     1/6  1/6  1/6 |  1/2
1:       0   1/2   0  |  1/2
pX(x):  1/6  2/3  1/6 |  1

so that E[XY] = 1·(−1)·(1/6) + 2·(−1)·(1/6) + 1·1·(1/2) = 0.
10.2 b E[Y] = (−1)·(1/2) + 1·(1/2) = 0, so that Cov(X, Y) = E[XY] − E[X]·E[Y] = 0 − 0 = 0.
10.2 c From the marginal distributions we find
E[X] = 0·(1/6) + 1·(2/3) + 2·(1/6) = 1,
E[X²] = 0·(1/6) + 1·(2/3) + 4·(1/6) = 4/3,
Var(X) = E[X²] − (E[X])² = 4/3 − 1² = 1/3,
E[Y²] = (−1)²·(1/2) + 1²·(1/2) = 1,
Var(Y) = E[Y²] − (E[Y])² = 1 − 0² = 1.
Since Cov(X, Y) = 0 we have Var(X + Y) = Var(X) + Var(Y) = 1/3 + 1 = 4/3.
10.2 d Since Cov(X, −Y) = −Cov(X, Y) = 0, we have Var(X − Y) = Var(X) + Var(−Y) = Var(X) + (−1)²·Var(Y) = 1/3 + 1 = 4/3.
10.3 We have E[U] = 1, E[V] = 1/2, E[U²] = 3/2, E[V²] = 1/2, Var(U) = 1/2, Var(V) = 1/4, and E[UV] = 1/2. Hence Cov(U, V) = E[UV] − E[U]·E[V] = 0 and ρ(U, V) = 0.
10.4 Both X and Y have marginal probabilities (1/4, 1/4, 1/4, 1/4), so that
E[X] = 1·(1/4) + 2·(1/4) + 3·(1/4) + 4·(1/4) = 2.5,
E[Y] = 1·(1/4) + 2·(1/4) + 3·(1/4) + 4·(1/4) = 2.5,
E[XY] = 1·1·(1/16) + · · · + 4·4·(1/16) = 6.25.
Hence Cov(X, Y) = E[XY] − E[X]·E[Y] = 6.25 − 2.5·2.5 = 0.
10.5 a First find P(X = 1, Y = 0) = 1/3 − 8/72 − 10/72 = 6/72, and similarly P(X = 0, Y = 2) = 4/72 and P(Y = 2) = 1/6. Then P(X = 1) = 1/4 and P(X = 2) = 5/12, and finally P(X = 2, Y = 2) = 5/72:

b\a       0     1     2   | P(Y = b)
0:      8/72  6/72  10/72 |  1/3
1:      12/72 9/72  15/72 |  1/2
2:      4/72  3/72  5/72  |  1/6
P(X = a): 1/3  1/4  5/12  |  1

10.5 b With a we find:
E[X] = 0·(1/3) + 1·(1/4) + 2·(5/12) = 13/12,
E[Y] = 0·(1/3) + 1·(1/2) + 2·(1/6) = 5/6,
E[XY] = 1·1·(9/72) + · · · + 2·2·(5/72) = 65/72.
Hence Cov(X, Y) = E[XY] − E[X]·E[Y] = 65/72 − (13/12)·(5/6) = 0.
10.5 c Yes: for all combinations (a, b) it holds that P(X = a, Y = b) = P(X = a)·P(Y = b), so X and Y are independent.
10.6 a When c = 0, the joint distribution becomes:

a\b       −1    0     1   | P(X = a)
−1:      2/45  9/45  4/45 |  1/3
0:       7/45  5/45  3/45 |  1/3
1:       6/45  1/45  8/45 |  1/3
P(Y = b): 1/3  1/3  1/3   |  1

We find E[X] = (−1)·(1/3) + 0·(1/3) + 1·(1/3) = 0, and similarly E[Y] = 0. By leaving out terms where either X = 0 or Y = 0, we find
E[XY] = (−1)·(−1)·(2/45) + (−1)·1·(4/45) + 1·(−1)·(6/45) + 1·1·(8/45) = 0,
which implies that Cov(X, Y) = E[XY] − E[X]·E[Y] = 0.
10.6 b Note that the variables X and Y in part b are equal to the ones from part a, shifted by c. If we write U and V for the variables from a, then X = U + c and Y = V + c. According to the rule on the covariance under change of units, we then immediately find Cov(X, Y) = Cov(U + c, V + c) = Cov(U, V) = 0. Alternatively, one could also compute the covariance from Cov(X, Y) = E[XY] − E[X]·E[Y]. We find E[X] = (c − 1)·(1/3) + c·(1/3) + (c + 1)·(1/3) = c, and similarly E[Y] = c. Since
E[XY] = (c − 1)(c − 1)·(2/45) + (c − 1)c·(9/45) + (c − 1)(c + 1)·(4/45) + c(c − 1)·(7/45) + c·c·(5/45) + c(c + 1)·(3/45) + (c + 1)(c − 1)·(6/45) + (c + 1)c·(1/45) + (c + 1)(c + 1)·(8/45) = c²,
we find Cov(X, Y) = E[XY] − E[X]·E[Y] = c² − c·c = 0.
10.6 c No, X and Y are not independent. For instance, P(X = c + 1, Y = c) = 1/45, which differs from P(X = c + 1)·P(Y = c) = 1/9.
10.7 a E[XY] = 1/8, E[X] = E[Y] = 1/2, so that Cov(X, Y) = E[XY] − E[X]·E[Y] = 1/8 − (1/2)² = −1/8.
10.7 b Since X has a Ber(1/2) distribution, Var(X) = 1/4. Similarly, Var(Y) = 1/4, so that
ρ(X, Y) = Cov(X, Y)/√(Var(X)·Var(Y)) = (−1/8)/(1/4) = −1/2.
10.7 c For any ε between −1/4 and 1/4, Cov(X, Y) = E[XY] − (1/2)² = (1/4 + ε) − 1/4 = ε, so that
ρ(X, Y) = Cov(X, Y)/√(Var(X)·Var(Y)) = ε/(1/4) = 4ε.
Hence ρ(X, Y) is equal to −1, 0, or 1, for ε equal to −1/4, 0, or 1/4.
10.8 a E[X²] = Var(X) + (E[X])² = 8.
10.8 b E[2X² + Y] = 2E[X²] + E[Y] = 2·8 − 3 = 13.

488

Full solutions from MIPS: DO NOT DISTRIBUTE

10.9 a If the aggregated blood sample tests negative, we do not have to perform additional tests, so Xi takes on the value 1. If the aggregated blood sample tests positive, we have to perform 40 additional tests, one for the blood sample of each person in the group, so that Xi takes on the value 41. We first find that
P(Xi = 1) = P(no infections in group of 40) = (1 − 0.001)^40 = 0.96,
and therefore P(Xi = 41) = 1 − P(Xi = 1) = 0.04.
10.9 b First compute E[Xi] = 1·0.96 + 41·0.04 = 2.6. The expected total number of tests is
E[X1 + X2 + · · · + X25] = E[X1] + E[X2] + · · · + E[X25] = 25·2.6 = 65.
With the original procedure of blood testing, the total number of tests is 25·40 = 1000. On average, the alternative procedure would only require 65 tests. Only with very small probability would one end up doing more than 1000 tests, so the alternative procedure is better.
10.10 a We find
Z

E [X] =
Z

xfX (x) dx =
Z
0 2 1

2 2 9 4 7 3 9x3 + 7x2 dx = x + x 225 225 4 3 1 3 4 1 (3y 3 + 12y 2 ) dy = y + 4y 3 25 25 4


 2
1

3

=
0

109 , 50

E [Y ] =

yfY (y) dy =

157 , 100

so that E [X + Y ] = E [X] + E [Y ] = 15/4. 10.10 b We nd E X2 = E Y2 = E [XY ] =


0

Z Z Z

x2 fX (x) dx = y 2 fY (y) dy =

Z Z
0

2 2 9 5 7 4 9x4 + 7x3 dx = x + x 225 225 5 4 1 3 5 1 (3y 4 + 12y 3 ) dy = y + 3y 4 25 25 5


Z
3 0

3

=
0

2 1

2
1

1287 , 250

3Z 2

318 , 125

xyf (x, y) dy dx =
Z
1 3 0

4 = 75 =

Z

x
Z
0 3

3 1

y dy 2 15 75 4
2

2 dx + 75
Z
3 0

2 2x3 y 2 + x2 y 3 dy dx 75
3

Z

x2
1

y 3 dy

dx

4 7 75 3

x3 dx +
2

x2 dx =

171 , 50

so that E (X + Y ) 10.10 c We nd

=E X

+ E Y 2 + 2E [XY ] = 3633/250.
 2

Var(X) = E X 2 (E [X])2 = Var(Y ) = E Y 2 (E [Y ])2 =

1287 250 318 125

109 50

= =

157 100

2

989 , 2500

791 , 10 000


Var(X + Y ) = E (X + Y )2 (E [X + Y ])2 =

3633 250

15 4

2

939 . 2000

Hence, Var(X) + Var(Y ) = 0.4747, which diers from Var(X + Y ) = 0.4695.
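The fractions in 10.10 can be double-checked numerically. A minimal sketch, assuming (as read off from the integrands above) the marginal densities f_X(x) = (2/225)(9x^2 + 7x) on [0, 3] and f_Y(y) = (1/25)(3y^2 + 12y) on [1, 2]:

```python
# Midpoint-rule check of the moments computed in Exercise 10.10.
def midpoint(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

fX = lambda x: 2.0 / 225.0 * (9 * x**2 + 7 * x)   # assumed density of X on [0, 3]
fY = lambda y: 1.0 / 25.0 * (3 * y**2 + 12 * y)   # assumed density of Y on [1, 2]

EX = midpoint(lambda x: x * fX(x), 0, 3)        # should be 109/50  = 2.18
EY = midpoint(lambda y: y * fY(y), 1, 2)        # should be 157/100 = 1.57
EX2 = midpoint(lambda x: x * x * fX(x), 0, 3)   # should be 1287/250
EY2 = midpoint(lambda y: y * y * fY(y), 1, 2)   # should be 318/125
```

Both densities integrate to one, and the four moments match the fractions in the text to within the discretization error.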


10.11 Cov(T, S) = Cov((9/5)X + 32, (9/5)Y + 32) = (9/5)^2 Cov(X, Y) = 9.72, and ρ(T, S) = ρ((9/5)X + 32, (9/5)Y + 32) = ρ(X, Y) = 0.8.


10.12 Since H has a U(25, 35) distribution, it follows immediately that E[H] = 30. Furthermore,
\[
E[R^2] = \int_{7.5}^{12.5} r^2\cdot\frac{1}{5}\,dr = \frac{1}{5}\left[\frac{r^3}{3}\right]_{7.5}^{12.5} = 102.0833.
\]
Hence the expected volume is π E[H] E[R^2] = π · 30 · 102.0833 = 9621.128.

10.13 a You can apply Exercise 8.15 directly for the case n = 2 to conclude that E[X] = 1/3 and E[Y] = 2/3. In order to compute the variances we need the probability densities of X and Y. Using the rules about the distribution of the minimum and maximum of independent random variables in Section 8.4, we find
\[
f_X(x) = 2(1 - x) \quad\text{for } 0 \le x \le 1, \qquad f_Y(y) = 2y \quad\text{for } 0 \le y \le 1.
\]
This means that E[X^2] = \int_0^1 2x^2(1 - x)\,dx = 1/6, so that Var(X) = 1/6 - (1/3)^2 = 1/18. Similarly, E[Y^2] = \int_0^1 2y^3\,dy = 1/2, so that Var(Y) = 1/2 - (2/3)^2 = 1/18.

10.13 b Since U and V are independent, each with variance 1/12, it follows that Var(X + Y) = Var(U + V) = Var(U) + Var(V) = 1/6.

10.13 c To compute Cov(X, Y) we need E[XY]. This can be solved from
\[
\frac{1}{6} = \mathrm{Var}(X + Y) = E[(X + Y)^2] - (E[X + Y])^2 = E[X^2] + 2E[XY] + E[Y^2] - (E[X] + E[Y])^2.
\]
Substitute E[X] = 1/3 and E[X^2] = Var(X) + (E[X])^2 = 1/18 + 1/9 = 3/18, and similarly E[Y] = 2/3 and E[Y^2] = Var(Y) + (E[Y])^2 = 1/18 + 4/9 = 1/2. This leads to E[XY] = 1/4.

10.14 a By using the alternative expression for the covariance and linearity of expectations, we find
\[
\begin{aligned}
\mathrm{Cov}(X + s, Y + u) &= E[(X + s)(Y + u)] - E[X + s]\,E[Y + u]\\
&= E[XY + sY + uX + su] - (E[X] + s)(E[Y] + u)\\
&= (E[XY] + sE[Y] + uE[X] + su) - (E[X]E[Y] + sE[Y] + uE[X] + su)\\
&= E[XY] - E[X]E[Y] = \mathrm{Cov}(X, Y).
\end{aligned}
\]
10.14 b By using the alternative expression for the covariance and the rule on expectations under change of units, we find
\[
\mathrm{Cov}(rX, tY) = E[(rX)(tY)] - E[rX]\,E[tY] = rtE[XY] - rtE[X]E[Y] = rt\,\mathrm{Cov}(X, Y).
\]
10.14 c First applying part a and then part b yields Cov(rX + s, tY + u) = Cov(rX, tY) = rt Cov(X, Y).


10.15 a Left plot: looks like 500 realizations from a pair (X, Y) whose two-dimensional distribution has contour lines that are circles. This means X and Y are uncorrelated. Middle plot: looks like 500 realizations from a pair (X, Y) whose two-dimensional distribution has contour lines that are ellipses with the line y = x as main axis. This means X and Y are positively correlated. Right plot: looks like 500 realizations from a pair (X, Y) whose two-dimensional distribution has contour lines that are ellipses with the line y = -x as main axis. This means X and Y are negatively correlated.

10.15 b In the right picture the points are concentrated more closely around a line than in the other pictures. Hence |ρ(X, Y)| will be the largest for the right picture.

10.16 a Cov(X, X + Y) = Cov(X, X) + Cov(X, Y) = Var(X) + Cov(X, Y).

10.16 b Anything can happen, depending on the sign of Cov(X, Y) and its magnitude compared to Var(X).

10.16 c If X and Y are uncorrelated, Cov(X, X + Y) = Var(X) ≥ 0, so apart from the special case where X is constant, X and X + Y are positively correlated.

10.17 a Since all expectations are zero, it is sufficient to show
\[
E[(X + Y + Z)^2] = E[X^2] + E[Y^2] + E[Z^2] + 2E[XY] + 2E[XZ] + 2E[YZ].
\]
This follows immediately from the hint with n = 3, using linearity of expectations.

10.17 b Write X̃ = X - E[X], and similarly Ỹ and Z̃. Then part a applies to X̃, Ỹ, and Z̃:
\[
\mathrm{Var}(\tilde X + \tilde Y + \tilde Z) = \mathrm{Var}(\tilde X) + \mathrm{Var}(\tilde Y) + \mathrm{Var}(\tilde Z) + 2\mathrm{Cov}(\tilde X, \tilde Y) + 2\mathrm{Cov}(\tilde X, \tilde Z) + 2\mathrm{Cov}(\tilde Y, \tilde Z).
\]
According to the rules on pages 104 and 151 about the variance and covariance under a change of units, it follows that Var(X̃) = Var(X), Cov(X̃, Ỹ) = Cov(X, Y), and similarly for all other variances and covariances.

10.17 c Similar to part a, first consider the case with all random variables having expectation zero. As in part a, the hint in a together with linearity of expectations yields the desired equality. Then argue as in part b by introducing X̃i = Xi - E[Xi].

10.17 d Use part c and note that there are n terms Var(Xi) and n(n - 1) terms Cov(Xi, Xj) with i ≠ j.

10.18 First note that X1 + X2 + ... + XN is the sum of all numbers, which is a nonrandom constant. Therefore, Var(X1 + X2 + ... + XN) = 0. In Section 9.3 we argued that, although we draw without replacement, each Xi has the same distribution. By the same reasoning, we find that each pair (Xi, Xj) with i ≠ j has the same joint distribution, so that Cov(Xi, Xj) = Cov(X1, X2) for all pairs with i ≠ j. Direct application of Exercise 10.17 with σ^2 = (N - 1)(N + 1)/12 and γ = Cov(X1, X2) gives
\[
0 = \mathrm{Var}(X_1 + X_2 + \cdots + X_N) = N\cdot\frac{(N - 1)(N + 1)}{12} + N(N - 1)\,\mathrm{Cov}(X_1, X_2).
\]
Solving this identity gives Cov(X1, X2) = -(N + 1)/12.
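The value Cov(X1, X2) = -(N + 1)/12 can also be confirmed by brute force for small N, since every ordered pair of distinct numbers is equally likely. A sketch with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import permutations

def cov_first_two(N):
    # Enumerate all ordered pairs (x1, x2) with x1 != x2 drawn from {1, ..., N};
    # each pair is equally likely when drawing without replacement.
    pairs = list(permutations(range(1, N + 1), 2))
    n = Fraction(len(pairs))
    e1 = sum(Fraction(x) for x, _ in pairs) / n
    e2 = sum(Fraction(y) for _, y in pairs) / n
    e12 = sum(Fraction(x * y) for x, y in pairs) / n
    return e12 - e1 * e2

for N in range(2, 9):
    assert cov_first_two(N) == Fraction(-(N + 1), 12)
```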

10.19 By definition and linearity of expectations:
\[
\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - E[X])(Y - E[Y])]\\
&= E[XY - X E[Y] - E[X]\,Y + E[X]E[Y]]\\
&= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y]\\
&= E[XY] - E[X]E[Y].
\end{aligned}
\]

10.20 To compute the correlation, we need
\[
\mathrm{Cov}(U, U^2) = E[U^3] - E[U]E[U^2], \quad \mathrm{Var}(U) = E[U^2] - (E[U])^2, \quad \mathrm{Var}(U^2) = E[U^4] - (E[U^2])^2.
\]
Hence we determine
\[
E[U^k] = \int_0^a u^k\,\frac{1}{a}\,du = \frac{a^k}{k + 1},
\]
which yields
\[
\mathrm{Cov}(U, U^2) = \frac{a^3}{4} - \frac{a}{2}\cdot\frac{a^2}{3} = \frac{a^3}{12}, \qquad
\mathrm{Var}(U) = \frac{a^2}{3} - \left(\frac{a}{2}\right)^2 = \frac{a^2}{12}, \qquad
\mathrm{Var}(U^2) = \frac{a^4}{5} - \left(\frac{a^2}{3}\right)^2 = \frac{4a^4}{45}.
\]
Hence
\[
\rho(U, U^2) = \frac{a^3/12}{\sqrt{(a^2/12)(4a^4/45)}} = \frac{1}{12}\sqrt{135} = \frac{1}{4}\sqrt{15} = 0.968.
\]
Note that the answer does not depend on a.
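That ρ(U, U^2) = 0.968 independently of a is easy to confirm by simulation. A sketch:

```python
import random

def sample_corr(a, n=200_000, seed=1):
    # Monte Carlo estimate of the correlation of U and U^2 for U ~ U(0, a).
    rng = random.Random(seed)
    u = [rng.uniform(0.0, a) for _ in range(n)]
    v = [x * x for x in u]
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n
    var_u = sum((x - mu) ** 2 for x in u) / n
    var_v = sum((y - mv) ** 2 for y in v) / n
    return cov / (var_u * var_v) ** 0.5

rho_small = sample_corr(1.0)    # both estimates land near sqrt(15)/4 = 0.968
rho_large = sample_corr(50.0)
```

The two estimates agree, illustrating that the answer does not depend on a.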

11.1 a In the addition rule, k is between 2 and 12. We must always have 1 ≤ k - ℓ ≤ 6 and 1 ≤ ℓ ≤ 6, or equivalently, k - 6 ≤ ℓ ≤ k - 1 and 1 ≤ ℓ ≤ 6. For k = 2, ..., 6, this means
\[
p_Z(k) = P(X + Y = k) = \sum_{\ell=1}^{k-1} p_X(k - \ell)\,p_Y(\ell) = \sum_{\ell=1}^{k-1} \frac{1}{6}\cdot\frac{1}{6} = \frac{k - 1}{36},
\]
and for k = 7, ..., 12,
\[
p_Z(k) = P(X + Y = k) = \sum_{\ell=k-6}^{6} p_X(k - \ell)\,p_Y(\ell) = \sum_{\ell=k-6}^{6} \frac{1}{6}\cdot\frac{1}{6} = \frac{13 - k}{36}.
\]

11.1 b In the addition rule, k is between 2 and 2N. We must always have 1 ≤ k - ℓ ≤ N and 1 ≤ ℓ ≤ N, or equivalently, k - N ≤ ℓ ≤ k - 1 and 1 ≤ ℓ ≤ N. For k = 2, ..., N, this means
\[
p_Z(k) = P(X + Y = k) = \sum_{\ell=1}^{k-1} p_X(k - \ell)\,p_Y(\ell) = \sum_{\ell=1}^{k-1} \frac{1}{N}\cdot\frac{1}{N} = \frac{k - 1}{N^2},
\]


and for k = N + 1, ..., 2N,
\[
p_Z(k) = P(X + Y = k) = \sum_{\ell=k-N}^{N} p_X(k - \ell)\,p_Y(\ell) = \sum_{\ell=k-N}^{N} \frac{1}{N}\cdot\frac{1}{N} = \frac{2N - k + 1}{N^2}.
\]

11.2 a By using the rule on addition of two independent discrete random variables, we have
\[
P(X + Y = k) = p_Z(k) = \sum_{\ell=0}^{\infty} p_X(k - \ell)\,p_Y(\ell).
\]
Because p_X(a) = 0 for a ≤ -1, all terms with ℓ ≥ k + 1 vanish, so that
\[
P(X + Y = k) = \sum_{\ell=0}^{k} \frac{1^{k-\ell}}{(k - \ell)!}e^{-1}\cdot\frac{1^{\ell}}{\ell!}e^{-1} = \frac{e^{-2}}{k!}\sum_{\ell=0}^{k}\binom{k}{\ell} = \frac{2^k}{k!}e^{-2},
\]
also using \(\sum_{\ell=0}^{k}\binom{k}{\ell} = 2^k\) in the last equality.
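The computation in 11.2 a (with both parameters equal to 1) can be checked by carrying out the convolution numerically. A sketch:

```python
from math import exp, factorial

def pois(lam, k):
    # Poisson pmf: P(X = k) for X ~ Pois(lam).
    return exp(-lam) * lam**k / factorial(k)

# Convolve two Pois(1) mass functions and compare with the Pois(2) pmf.
max_err = max(
    abs(sum(pois(1, k - l) * pois(1, l) for l in range(k + 1)) - pois(2, k))
    for k in range(20)
)
```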

11.2 b Similar to part a, by using the rule on addition of two independent discrete random variables and leaving out terms for which p_X(a) = 0, we have
\[
P(X + Y = k) = \sum_{\ell=0}^{k} \frac{\lambda^{k-\ell}}{(k - \ell)!}e^{-\lambda}\cdot\frac{\mu^{\ell}}{\ell!}e^{-\mu} = \frac{(\lambda + \mu)^k}{k!}e^{-(\lambda + \mu)}\sum_{\ell=0}^{k}\binom{k}{\ell}\frac{\mu^{\ell}\lambda^{k-\ell}}{(\lambda + \mu)^k}.
\]
Next, write
\[
\binom{k}{\ell}\frac{\mu^{\ell}\lambda^{k-\ell}}{(\lambda + \mu)^k} = \binom{k}{\ell}\left(\frac{\mu}{\lambda + \mu}\right)^{\ell}\left(\frac{\lambda}{\lambda + \mu}\right)^{k-\ell} = \binom{k}{\ell}p^{\ell}(1 - p)^{k-\ell}
\]
with p = μ/(λ + μ). This means that
\[
P(X + Y = k) = \frac{(\lambda + \mu)^k}{k!}e^{-(\lambda + \mu)}\sum_{\ell=0}^{k}\binom{k}{\ell}p^{\ell}(1 - p)^{k-\ell} = \frac{(\lambda + \mu)^k}{k!}e^{-(\lambda + \mu)},
\]
using that \(\sum_{\ell=0}^{k}\binom{k}{\ell}p^{\ell}(1 - p)^{k-\ell} = 1\).

11.3
\[
p_Z(k) = \sum_{\ell=0}^{k}\binom{n}{k - \ell}\left(\tfrac12\right)^{k-\ell}\left(\tfrac12\right)^{n-(k-\ell)}\binom{m}{\ell}\left(\tfrac14\right)^{\ell}\left(\tfrac34\right)^{m-\ell}
= \left(\tfrac12\right)^{n}\sum_{\ell=0}^{k}\binom{n}{k - \ell}\binom{m}{\ell}\left(\tfrac14\right)^{\ell}\left(\tfrac34\right)^{m-\ell}.
\]
This cannot be simplified and is not equal to a binomial probability of the type \(\binom{n+m}{k}r^k(1 - r)^{n+m-k}\) for some 0 < r < 1.

11.4 a From the fact that X has an N(2, 5) distribution, it follows that E[X] = 2 and Var(X) = 5. Similarly, E[Y] = 5 and Var(Y) = 9. Hence, by linearity of expectations, E[Z] = E[3X - 2Y + 1] = 3E[X] - 2E[Y] + 1 = 3·2 - 2·5 + 1 = -3. By the rules for the variance and covariance, Var(Z) = 9Var(X) + 4Var(Y) - 12Cov(X, Y) = 9·5 + 4·9 - 12·0 = 81, using that Cov(X, Y) = 0, due to the independence of X and Y.


11.4 b The random variables 3X and -2Y + 1 are independent and, according to the rule for the normal distribution under a change of units (page 112), both have a normal distribution. Next, the sum rule for independent normal random variables yields that Z = (3X) + (-2Y + 1) also has a normal distribution. Its parameters are the expectation and variance of Z. From a it follows that Z has an N(-3, 81) distribution.

11.4 c From b we know that Z has an N(-3, 81) distribution, so that (Z + 3)/9 has a standard normal distribution. Therefore
\[
P(Z \le 6) = P\left(\frac{Z + 3}{9} \le \frac{6 + 3}{9}\right) = \Phi(1),
\]
where Φ is the standard normal distribution function. From Table ?? we find that Φ(1) = 1 - 0.1587 = 0.8413.

11.5 According to the addition rule, the probability density of Z = X + Y is given by
\[
f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y)f_Y(y)\,dy = \int_0^1 f_X(z - y)\,dy,
\]
where we use that f_Y(y) = 0 for y ∉ [0, 1]. When 0 ≤ y ≤ 1, the following holds: for z < 0, also z - y < 0, so that f_X(z - y) = 0, and for z > 2, z - y > 1, so that f_X(z - y) = 0. For 0 ≤ z < 1:
\[
f_Z(z) = \int_0^1 f_X(z - y)f_Y(y)\,dy = \int_0^z 1\,dy = z,
\]
whereas for 1 ≤ z ≤ 2:
\[
f_Z(z) = \int_0^1 f_X(z - y)f_Y(y)\,dy = \int_{z-1}^1 1\,dy = 2 - z.
\]
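The triangular density found in 11.5 matches simulation. A sketch, with X and Y independent U(0, 1):

```python
import random

rng = random.Random(42)
n = 100_000
z = [rng.random() + rng.random() for _ in range(n)]

# f_Z(z) = z on [0, 1) implies P(Z <= 1) = 1/2 and P(Z <= 0.5) = 0.125.
p_one = sum(1 for v in z if v <= 1.0) / n
p_half = sum(1 for v in z if v <= 0.5) / n
```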

11.6 According to the addition rule,
\[
f_Z(z) = \int_0^z f_X(z - y)f_Y(y)\,dy = \int_0^z \frac{1}{4}(z - y)e^{-(z-y)/2}\cdot\frac{1}{4}ye^{-y/2}\,dy
= \frac{1}{16}e^{-z/2}\int_0^z (z - y)y\,dy = \frac{1}{16}e^{-z/2}\left[\frac{1}{2}zy^2 - \frac{1}{3}y^3\right]_0^z = \frac{z^3}{96}e^{-z/2}.
\]

11.7 Each Xi has the same distribution as the sum of k independent Exp(λ) distributed random variables. Since the Xi are independent, X1 + X2 + ... + Xn has the same distribution as the sum of nk independent Exp(λ) distributed random variables, which is a Gam(nk, λ) distribution.

11.8 a Y = rX + s has probability density function
\[
f_Y(y) = \frac{1}{r}f_X\!\left(\frac{y - s}{r}\right) = \frac{1}{r}\cdot\frac{1}{\pi\left(1 + \left(\frac{y - s}{r}\right)^2\right)} = \frac{r}{\pi\left(r^2 + (y - s)^2\right)}.
\]
We see that Y = rX + s has a Cau(s, r) distribution.

11.8 b If Y = (X - α)/β then Y has a Cau(0, 1) distribution.

11.9 a According to the product rule on page 172,
\[
f_Z(z) = \int_1^z f_Y\!\left(\frac{z}{x}\right)f_X(x)\,\frac{1}{x}\,dx = \int_1^z \frac{x^2}{z^2}\cdot\frac{3}{x^4}\cdot\frac{1}{x}\,dx
= \frac{3}{z^2}\int_1^z \frac{1}{x^3}\,dx = \frac{3}{2z^2}\left(1 - \frac{1}{z^2}\right) = \frac{3}{2}\left(\frac{1}{z^2} - \frac{1}{z^4}\right)
\]
for z ≥ 1.

11.9 b According to the product rule,
\[
f_Z(z) = \int_1^z f_Y\!\left(\frac{z}{x}\right)f_X(x)\,\frac{1}{x}\,dx = \int_1^z \frac{x^2}{z^2}\cdot\frac{\alpha}{x^{\alpha+1}}\cdot\frac{1}{x}\,dx
= \frac{\alpha}{z^2}\int_1^z x^{-\alpha}\,dx = \frac{\alpha}{(\alpha - 1)z^2}\left(1 - z^{1-\alpha}\right) = \frac{\alpha}{\alpha - 1}\left(\frac{1}{z^2} - \frac{1}{z^{\alpha+1}}\right)
\]
for z ≥ 1. (For α = 3 this recovers the answer of part a.)
11.10 a In the quotient rule for Z = X/Y, for fixed 0 < z < ∞ we must have 1 ≤ zx < ∞ and 1 ≤ x < ∞. Hence for 0 < z < 1,
\[
f_Z(z) = \int_{1/z}^{\infty} f_X(zx)f_Y(x)\,x\,dx = \int_{1/z}^{\infty} \frac{2}{(zx)^3}\cdot\frac{2}{x^3}\cdot x\,dx = \frac{4}{z^3}\int_{1/z}^{\infty} x^{-5}\,dx = \frac{4}{z^3}\cdot\frac{z^4}{4} = z.
\]
For z ≥ 1, we find
\[
f_Z(z) = \int_1^{\infty} f_X(zx)f_Y(x)\,x\,dx = \frac{4}{z^3}\int_1^{\infty} x^{-5}\,dx = \frac{4}{z^3}\cdot\frac{1}{4} = \frac{1}{z^3}.
\]
11.10 b In the quotient rule for Z = X/Y, for fixed 0 < z < ∞ we must have 1 ≤ zx < ∞ and 1 ≤ x < ∞. Hence for 0 < z < 1,
\[
f_Z(z) = \int_{1/z}^{\infty} f_X(zx)f_Y(x)\,x\,dx = \int_{1/z}^{\infty} \frac{\alpha}{(zx)^{\alpha+1}}\cdot\frac{\alpha}{x^{\alpha+1}}\cdot x\,dx
= \frac{\alpha^2}{z^{\alpha+1}}\int_{1/z}^{\infty} x^{-2\alpha-1}\,dx = \frac{\alpha^2}{z^{\alpha+1}}\cdot\frac{z^{2\alpha}}{2\alpha} = \frac{\alpha}{2}z^{\alpha-1}.
\]
For z ≥ 1, we find
\[
f_Z(z) = \int_1^{\infty} f_X(zx)f_Y(x)\,x\,dx = \frac{\alpha^2}{z^{\alpha+1}}\int_1^{\infty} x^{-2\alpha-1}\,dx = \frac{\alpha}{2}\cdot\frac{1}{z^{\alpha+1}}.
\]


11.11 a Put T3 = X1 + X2 + X3. Then T3 can be interpreted as the time of the third success in a series of independent experiments with success probability p. Now T3 = k means that the first k - 1 experiments contained exactly 2 successes, and the k-th experiment was also a success. Since the number of successes in k - 1 experiments has a Bin(k - 1, p) distribution, it follows that
\[
P(T_3 = k) = \binom{k - 1}{2}p^2(1 - p)^{(k-1)-2}\cdot p = \frac{1}{2}(k - 1)(k - 2)p^3(1 - p)^{k-3}.
\]

11.11 b This follows from a smart calculation:
\[
\begin{aligned}
\sum_{k=3}^{\infty} p_Z(k) = 1
&\iff \sum_{k=3}^{\infty} \tfrac12(k - 1)(k - 2)p^3(1 - p)^{k-3} = 1\\
&\iff \tfrac12 p^2\sum_{k=3}^{\infty}(k - 1)(k - 2)p(1 - p)^{k-3} = 1 \quad(\text{now put } k - 2 = m)\\
&\iff \tfrac12 p^2\sum_{m=1}^{\infty} m(m + 1)p(1 - p)^{m-1} = 1\\
&\iff \tfrac12 p^2\sum_{m=1}^{\infty}(m^2 + m)P(X_1 = m) = 1\\
&\iff \tfrac12 p^2\left(E[X_1^2] + E[X_1]\right) = 1 \iff p^2\left(E[X_1^2] + E[X_1]\right) = 2.
\end{aligned}
\]

11.11 c The first part follows directly from b:
\[
E[X_1^2] = \frac{2}{p^2} - E[X_1] = \frac{2}{p^2} - \frac{1}{p} = \frac{2 - p}{p^2}.
\]
For the second part:
\[
\mathrm{Var}(X_1) = E[X_1^2] - (E[X_1])^2 = \frac{2 - p}{p^2} - \left(\frac{1}{p}\right)^2 = \frac{1 - p}{p^2}.
\]
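The pmf of T3 derived in 11.11 a can be checked numerically: it should sum to one, and T3, being a sum of three independent Geo(p) variables, should have mean 3/p. A sketch:

```python
def p_t3(k, p):
    # P(T3 = k) = (1/2)(k-1)(k-2) p^3 (1-p)^(k-3), for k = 3, 4, ...
    return 0.5 * (k - 1) * (k - 2) * p**3 * (1 - p) ** (k - 3)

p = 0.3
ks = range(3, 2000)               # the truncated tail is negligible here
total = sum(p_t3(k, p) for k in ks)
mean = sum(k * p_t3(k, p) for k in ks)
```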

11.12 Γ(1) = \(\int_0^{\infty} e^{-t}\,dt = 1\), and integration by parts gives
\[
\Gamma(x + 1) = \int_0^{\infty} t^x e^{-t}\,dt = \left[-t^x e^{-t}\right]_{t=0}^{t=\infty} + \int_0^{\infty} x t^{x-1}e^{-t}\,dt.
\]
Since x > 0, the first term on the right-hand side is zero and the second term is equal to xΓ(x).

11.13 a
\[
\begin{aligned}
P(Z_n \le a) &= \int_0^a \frac{\lambda^n z^{n-1}}{(n - 1)!}e^{-\lambda z}\,dz\\
&= \left[-\frac{\lambda^{n-1}z^{n-1}}{(n - 1)!}e^{-\lambda z}\right]_{z=0}^{z=a} + \int_0^a \frac{\lambda^{n-1}z^{n-2}}{(n - 2)!}e^{-\lambda z}\,dz\\
&= -\frac{(\lambda a)^{n-1}}{(n - 1)!}e^{-\lambda a} + P(Z_{n-1} \le a).
\end{aligned}
\]

11.13 b Use part a recursively:
\[
\begin{aligned}
P(Z_n \le a) &= -\frac{(\lambda a)^{n-1}}{(n - 1)!}e^{-\lambda a} + P(Z_{n-1} \le a)\\
&= -\frac{(\lambda a)^{n-1}}{(n - 1)!}e^{-\lambda a} - \frac{(\lambda a)^{n-2}}{(n - 2)!}e^{-\lambda a} + P(Z_{n-2} \le a)\\
&\ \ \vdots\\
&= -\sum_{i=1}^{n-1}\frac{(\lambda a)^i}{i!}e^{-\lambda a} + P(Z_1 \le a).
\end{aligned}
\]

11.13 c Since P(Z1 ≤ a) = 1 - e^{-λa}, we find
\[
P(Z_n \le a) = P(Z_1 \le a) - \sum_{i=1}^{n-1}\frac{(\lambda a)^i}{i!}e^{-\lambda a} = 1 - \sum_{i=0}^{n-1}\frac{(\lambda a)^i}{i!}e^{-\lambda a}.
\]
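The identity from 11.13 c can be compared with a direct numerical integration of the Gam(n, λ) density. A sketch:

```python
from math import exp, factorial

def gam_cdf_poisson(n, lam, a):
    # 11.13 c: P(Z_n <= a) = 1 - sum_{i=0}^{n-1} (lam a)^i e^{-lam a} / i!
    return 1.0 - sum((lam * a) ** i * exp(-lam * a) / factorial(i) for i in range(n))

def gam_cdf_integral(n, lam, a, steps=100_000):
    # Midpoint rule applied to the Gam(n, lam) density on [0, a].
    h = a / steps
    c = lam**n / factorial(n - 1)
    return sum(c * ((i + 0.5) * h) ** (n - 1) * exp(-lam * (i + 0.5) * h)
               for i in range(steps)) * h

err = abs(gam_cdf_poisson(4, 2.0, 3.0) - gam_cdf_integral(4, 2.0, 3.0))
```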

12.1 e This is certainly open to discussion. Bankruptcies: no (they come in clusters, don't they?). Eggs: no (I suppose after one egg it takes the chicken some time to produce another). Examples 3 and 4 are the best candidates. Example 5 could be modeled by the Poisson process if the crossing is not a dangerous one; otherwise authorities might take measures and destroy the homogeneity.

12.2 Let X be the number of customers on a day. Given is that P(X = 0) = 10^{-5}. Since X is Poisson distributed, P(X = 0) = e^{-μ}. So e^{-μ} = 10^{-5}, which implies μ = 5 ln(10), and hence μ = 11.5. Then also E[X] = μ = 11.5.

12.3 When N has a Pois(4) distribution, P(N = 4) = 4^4 e^{-4}/4! = 0.195.

12.4 When X has a Pois(2) distribution, P(X ≤ 1) = P(X = 0) + P(X = 1) = e^{-2} + 2e^{-2} = 3e^{-2} = 0.406.

12.5 a Model the errors in the bytes of the hard disk as a Poisson process, with intensity λ per byte. Given is that λ·2^{20} = 1, or λ = 2^{-20}. The expected number of errors in 512 = 2^9 bytes is 2^9 λ = 2^{-11} = 0.00049.

12.5 b Let Y be the number of errors on the hard disk. Then Y has a Poisson distribution with parameter μ = 39054015 · 0.00049 = 19069.34. Then P(Y ≥ 1) = 1 - P(Y = 0) = 1 - e^{-19069.34} = 1.00000... For all practical purposes this happens with probability 1.


12.6 The expected number of flaws in 1 meter is 100/40 = 2.5, and hence the number of flaws X has a Pois(2.5) distribution. The answer is P(X = 2) = (2.5)^2 e^{-2.5}/2! = 0.256.

12.7 a It is reasonable to estimate λ with (nr. of cars)/(total time in sec.) = 0.192.

12.7 b 19/120 = 0.1583, and if λ = 0.192 then P(N(10) = 0) = e^{-0.192·10} = 0.147.

12.7 c P(N(10) = 10) with λ from a seems a reasonable approximation of this probability. It equals e^{-1.92}(0.192·10)^{10}/10! = 2.71·10^{-5}.

12.8 a We have
\[
E[X(X - 1)] = \sum_{k=0}^{\infty} k(k - 1)e^{-\mu}\frac{\mu^k}{k!} = \sum_{k=2}^{\infty} e^{-\mu}\frac{\mu^k}{(k - 2)!} = \mu^2\sum_{k=2}^{\infty} e^{-\mu}\frac{\mu^{k-2}}{(k - 2)!} = \mu^2\sum_{j=0}^{\infty} e^{-\mu}\frac{\mu^j}{j!} = \mu^2.
\]

12.8 b Since Var(X) = E[X(X - 1)] + E[X] - (E[X])^2, we have Var(X) = μ^2 + μ - μ^2 = μ.

12.9 In a Poisson process with intensity 1, the number of points in an interval of length t has a Poisson distribution with parameter t. So the number Y1 of points in [0, λ1] is Pois(λ1) distributed, and the number Y2 in (λ1, λ1 + λ2] is Pois(λ2) distributed. But the sum Y1 + Y2 of these is equal to the number of points in [0, λ1 + λ2], and so is Pois(λ1 + λ2) distributed.

12.10 We have to consider the numbers p_k = P(X = k) = e^{-μ}μ^k/k!. To compare them, divide: p_{k+1}/p_k = μ/(k + 1), and note that p_k < (or >) p_{k+1} is equivalent to p_{k+1}/p_k > (or <) 1. So it follows immediately that for μ < 1 the probabilities P(X = k) are decreasing, and for μ > 1 the probabilities are increasing as long as μ/(k + 1) > 1, and decreasing from the moment this fraction has become less than 1. Lastly, if μ = 1, then p_0 = p_1 > p_2 > p_3 > ...

12.11 Following the hint, we obtain:
\[
\begin{aligned}
P(N([0, s]) = k,\, N([0, 2s]) = n) &= P(N([0, s]) = k,\, N((s, 2s]) = n - k)\\
&= P(N([0, s]) = k)\cdot P(N((s, 2s]) = n - k)\\
&= \frac{(\lambda s)^k e^{-\lambda s}}{k!}\cdot\frac{(\lambda s)^{n-k} e^{-\lambda s}}{(n - k)!} = \frac{(\lambda s)^n e^{-2\lambda s}}{k!(n - k)!}.
\end{aligned}
\]
So
\[
P(N([0, s]) = k \mid N([0, 2s]) = n) = \frac{P(N([0, s]) = k,\, N([0, 2s]) = n)}{P(N([0, 2s]) = n)} = \frac{n!}{k!(n - k)!}\cdot\frac{(\lambda s)^n}{(2\lambda s)^n} = \binom{n}{k}\left(\frac{1}{2}\right)^n.
\]
This holds for k = 0, ..., n, so we find the Bin(n, 1/2) distribution.

12.12 a The event {X2 ≤ t} is a disjoint union of the events {X1 ≤ s, X2 ≤ t} and {X1 > s, X2 ≤ t}.


12.12 b If X2 ≤ t and Na = 2, then the first two points of the Poisson process must lie in [0, t], and no points lie in (t, a], and conversely. Therefore
\[
P(X_2 \le t,\, N_a = 2) = P(N([0, t]) = 2,\, N((t, a]) = 0) = \frac{(\lambda t)^2}{2!}e^{-\lambda t}\cdot e^{-\lambda(a - t)} = \frac{1}{2}\lambda^2 t^2 e^{-\lambda a}.
\]
Similarly,
\[
P(X_1 > s,\, X_2 \le t,\, N_a = 2) = P(N([0, s)) = 0,\, N([s, t]) = 2,\, N((t, a]) = 0) = e^{-\lambda s}\cdot\frac{(\lambda(t - s))^2}{2!}e^{-\lambda(t - s)}\cdot e^{-\lambda(a - t)} = \frac{1}{2}\lambda^2(t - s)^2 e^{-\lambda a}.
\]
The result now follows by using part a.

12.12 c This follows immediately from part b since
\[
P(X_1 \le s,\, X_2 \le t \mid N_a = 2) = \frac{P(X_1 \le s,\, X_2 \le t,\, N_a = 2)}{P(N_a = 2)},
\]
where the denominator equals \(\tfrac12\lambda^2 a^2 e^{-\lambda a}\).

12.13 a Given that Nt = n + m, the number of daisies Xt has a binomial distribution with parameters n + m and 1/4. So
\[
P(X_t = n,\, Y_t = m \mid N_t = n + m) = P(X_t = n \mid N_t = n + m) = \binom{n + m}{n}\left(\frac{1}{4}\right)^n\left(\frac{3}{4}\right)^m.
\]

12.13 b Using part a we find that
\[
P(X_t = n,\, Y_t = m) = P(X_t = n,\, Y_t = m \mid N_t = n + m)\,P(N_t = n + m) = \binom{n + m}{n}\left(\frac{1}{4}\right)^n\left(\frac{3}{4}\right)^m\frac{(\lambda t)^{n+m}e^{-\lambda t}}{(n + m)!} = \frac{1}{n!}\frac{1}{m!}\left(\frac{\lambda t}{4}\right)^n\left(\frac{3\lambda t}{4}\right)^m e^{-\lambda t}.
\]

12.13 c Using b we find that
\[
P(X_t = n) = \sum_{m=0}^{\infty} P(X_t = n,\, Y_t = m) = \frac{1}{n!}\left(\frac{\lambda t}{4}\right)^n e^{-(\lambda/4)t}\sum_{m=0}^{\infty}\frac{1}{m!}\left(\frac{3\lambda t}{4}\right)^m e^{-(3\lambda/4)t} = \frac{1}{n!}\left(\frac{\lambda}{4}t\right)^n e^{-(\lambda/4)t},
\]
since the sum over m just adds all the probabilities P(Z = m), where Z has a Pois(3λt/4) distribution.

12.14 a This follows since 1 - 1/n → 1 and 1/n → 0.
12.14 b This is an easy computation: E[Xn] = (1 - 1/n)·0 + (1/n)·7n = 7 for all n.

13.1 For U(-1, 1), μ = 0 and σ = 1/√3, and we obtain the following table:

k                1      2  3  4
P(|Y - μ| < kσ)  0.577  1  1  1

For U(-a, a), μ = 0 and σ = a/√3, and we obtain the same table:

k                1      2  3  4
P(|Y - μ| < kσ)  0.577  1  1  1

For N(0, 1), μ = 0 and σ = 1, and we obtain the following table:

k                1       2       3       4
P(|Y - μ| < kσ)  0.6826  0.9544  0.9974  1


For N(μ, σ^2), P(|Y - μ| < kσ) = P(|Y - μ|/σ < k), and we obtain the same result as for N(0, 1). For Par(3), μ = 3/2 and σ = √(3/4), and
\[
P(|Y - \mu| < k\sigma) = \int_1^{3/2 + k\sqrt{3}/2} 3x^{-4}\,dx = 1 - \left(\frac{3}{2} + k\frac{\sqrt{3}}{2}\right)^{-3},
\]
and we obtain the following table:

k                1      2      3      4
P(|Y - μ| < kσ)  0.925  0.970  0.985  0.992

For Geo(1/2), μ = 2 and σ = √2, and
\[
P(|Y - \mu| < k\sigma) = P(|Y - 2| < k\sqrt{2}) = P(2 - k\sqrt{2} < Y < 2 + k\sqrt{2}) = P(Y < 2 + k\sqrt{2}),
\]
and we obtain the following table:

k                1      2       3       4
P(|Y - μ| < kσ)  0.875  0.9375  0.9844  0.9922

13.2 a From the formulas for the U(a, b) distribution, substituting a = -1/2 and b = 1/2, we derive that E[Xi] = 0 and Var(Xi) = 1/12.

13.2 b We write S = X1 + X2 + ... + X100, for which we find E[S] = E[X1] + ... + E[X100] = 0 and, by independence, Var(S) = Var(X1) + ... + Var(X100) = 100·(1/12) = 100/12. We find from Chebyshev's inequality:
\[
P(|S| > 10) = P(|S - 0| > 10) \le \frac{\mathrm{Var}(S)}{10^2} = \frac{1}{12}.
\]

13.3 We can apply the law of large numbers to the sequence (Yi), with Yi = |Xi| for i = 1, 2, .... Since E[Yi] = E[|Xi|] = 2\int_0^{0.5} x\,dx = 2·(1/2)(0.5)^2 = 0.25, and since the Yi have finite variance, it follows that (1/n)\sum_{i=1}^n |Xi| → 0.25 as n → ∞.

13.4 a Because Xi has a Ber(p) distribution, E[Xi] = p and Var(Xi) = p(1 - p), and so E[X̄n] = p and Var(X̄n) = Var(Xi)/n = p(1 - p)/n. By Chebyshev's inequality:
\[
P(|\bar X_n - p| \ge 0.2) \le \frac{p(1 - p)/n}{(0.2)^2} = \frac{25p(1 - p)}{n}.
\]
The right-hand side should be at most 0.1 (note that we switched to the complement). If p = 1/2 we therefore require 25/(4n) ≤ 0.1, or n ≥ 25/(4·0.1) = 62.5, i.e., n ≥ 63. Now suppose p ≠ 1/2; using n = 63 and p(1 - p) ≤ 1/4 we conclude that 25p(1 - p)/n ≤ 25·(1/4)/63 = 0.0992 < 0.1, so (because of the inequality) the computed value suffices for other values of p as well.

13.4 b For arbitrary a > 0 we conclude from Chebyshev's inequality:
\[
P(|\bar X_n - p| \ge a) \le \frac{p(1 - p)/n}{a^2} \le \frac{1}{4na^2},
\]
where we used p(1 - p) ≤ 1/4 again. The question now becomes: when a = 0.1, for what n is 1/(4na^2) ≤ 0.1? We find n ≥ 1/(4·0.1·(0.1)^2) = 250, so n = 250 is large enough.
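The bookkeeping in 13.4 a and b reduces to one formula for the Chebyshev-based sample size. A sketch:

```python
from math import ceil

def chebyshev_n(a, eps, pq=0.25):
    # Smallest n with pq / (n a^2) <= eps, bounding P(|Xbar_n - p| >= a)
    # via Chebyshev and p(1 - p) <= 1/4.
    return ceil(pq / (eps * a * a))

n_part_a = chebyshev_n(0.2, 0.1)    # 63, as in part a
n_part_b = chebyshev_n(0.1, 0.1)    # 250, as in part b
n_part_c = chebyshev_n(0.2, 0.05)   # 125, as in part c
```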


13.4 c From part a we know that an error of size 0.2 or larger occurs with probability at most 25/(4n), regardless of the value of p. So we need 25/(4n) ≤ 0.05, i.e., n ≥ 25/(4·0.05) = 125.

13.4 d We compute P(X̄n ≤ 0.5) for the case that p = 0.6. Then E[X̄n] = 0.6 and Var(X̄n) = 0.6·0.4/n. Chebyshev's inequality cannot be used directly; we need an intermediate step: the event X̄n ≤ 0.5 is contained in the event "the prediction is off by at least 0.1, in either direction." So
\[
P(\bar X_n \le 0.5) \le P(|\bar X_n - 0.6| \ge 0.1) \le \frac{0.6\cdot 0.4/n}{(0.1)^2} = \frac{24}{n}.
\]
For n ≥ 240 this probability is 0.1 or smaller.

13.5 To get P(|Mn - c| ≤ 0.5) ≥ 0.9, we must have P(|Mn - c| ≥ 0.5) ≤ 0.1. Now
\[
P(|M_n - c| \ge 0.5) \le \frac{\mathrm{Var}(U_i)/n}{0.25} = \frac{3/n}{0.25} = \frac{12}{n}.
\]
If we want this below 0.1, we should take n ≥ 120.

13.6 The probability distribution for an individual game has P(X = 1) = 18/37 and P(X = -1) = 19/37. From this we compute E[X] = (-1)·(19/37) + 1·(18/37) = -1/37. If the game is played 365·1000 times, and we suppose that the roulette machine is fair, the total gain will, according to the law of large numbers, be close to 365·1000·(-1/37) = -9865 (euro). (Using Chebyshev's inequality one can, for example, find a lower bound for the probability that the total gain will be between -9865 - 250 and -9865 + 250.)

13.7 Following the hint we define Yi = 1 when Xi ∈ (-∞, a], and 0 otherwise. In this way we have written Fn(a) = (1/n)\sum_{i=1}^n Yi. The expectation of Yi equals E[Yi] = P(Xi ∈ (-∞, a]) = F(a). Moreover, Var(Yi) = F(a)(1 - F(a)) is finite. Since Fn(a) = Ȳn, the law of large numbers tells us that
\[
\lim_{n\to\infty} P(|F_n(a) - F(a)| > \varepsilon) = 0.
\]

13.8 a We have p = E[Y] = \int_{a-h}^{a+h} xe^{-x}\,dx ≈ 2h·2e^{-2}. (Of course the integral can be computed exactly, but this approximation is excellent: for h = 0.25 we get p = 0.13534, while the exact result is p = 0.13533.) Now Var(Y) = p(1 - p) = 4he^{-2}(1 - 4he^{-2}), and
\[
\mathrm{Var}\!\left(\frac{\bar Y_n}{2h}\right) = \frac{1}{4h^2 n}\,4he^{-2}\left(1 - 4he^{-2}\right).
\]

13.8 b What is required is that P(|Ȳn/(2h) - f(a)| ≥ 0.2 f(a)) ≤ 0.2. Chebyshev's inequality yields that
\[
P\!\left(\left|\frac{\bar Y_n}{2h} - f(a)\right| \ge 0.2 f(a)\right) \le \frac{\mathrm{Var}(\bar Y_n/(2h))}{0.04 f(a)^2},
\]
so we want Var(Ȳn/(2h)) ≤ 0.008·(2e^{-2})^2. Using part a with h = 0.25, this leads to
\[
\frac{4e^{-2}(1 - e^{-2})}{n} \le 0.008\cdot 4e^{-4} \iff n \ge \frac{e^2 - 1}{0.008} = 798.6 \implies n = 799.
\]


13.9 a The statement looks like the law of large numbers, and indeed, if we look more closely, we see that Tn is the average of an i.i.d. sequence: define Yi = Xi^2, then Tn = Ȳn. The law of large numbers now states: if Ȳn is the average of n independent random variables with expectation μ and variance σ^2, then for any ε > 0: lim_{n→∞} P(|Ȳn - μ| > ε) = 0. So if a = μ and the variance σ^2 is finite, then it is true.

13.9 b We compute the expectation and variance of Yi:
\[
E[Y_i] = E[X_i^2] = \int_{-1}^{1} x^2\cdot\tfrac12\,dx = \tfrac13, \qquad
E[Y_i^2] = E[X_i^4] = \int_{-1}^{1} x^4\cdot\tfrac12\,dx = \tfrac15,
\]
so Var(Yi) = 1/5 - (1/3)^2 = 4/45. The variance is finite, so indeed the law of large numbers applies, and the statement is true if a = E[Xi^2] = 1/3.

13.10 a For 0 ≤ ε ≤ 1: P(|Mn - 1| > ε) = P(Mn < 1 - ε) = (1 - ε)^n, since Mn ≤ 1 always.

13.10 b For any 0 < ε ≤ 1: lim_{n→∞}(1 - ε)^n = 0. The conclusion is that Mn converges to 1 as n goes to infinity. This cannot be obtained by a straightforward application of Chebyshev's inequality or the law of large numbers.

13.11 a We have P(|X - 1| > 8) = P(X = 10) = 1/10. On the other hand, we get from Chebyshev's inequality P(|X - 1| > 8) ≤ (t - 1)/64 = 9/64.

13.11 b For a = 5 we have exact: P(|X - 1| > 5) = P(X = 10) = 1/10, and Chebyshev: P(|X - 1| > 5) ≤ 9/25, so the Chebyshev gap equals 9/25 - 1/10 = 0.26. For a = 10 we have exact: P(|X - 1| > 10) = 0, and Chebyshev: P(|X - 1| > 10) ≤ 9/100, so the Chebyshev gap equals 9/100 = 0.09.

13.11 c Choosing a = t as in the previous question, we have P(|X - 1| > t) = 0, and P(|X - 1| > t) ≤ (t - 1)/t^2 < 1/t. So to answer the question we can simply choose t = 100, t = 1000, and t = 10 000.

13.11 d As we have seen that the Chebyshev gap can be made arbitrarily small (taking arbitrarily large t in part c), we cannot find a closer bound for this family of probability distributions.

13.12 a Since E[X̄n] = (1/n)\sum_{i=1}^n μi and Var(X̄n) = (1/n^2)\sum_{i=1}^n Var(Xi), this inequality follows directly from Chebyshev's.

13.12 b This follows directly from
\[
\mathrm{Var}(\bar X_n) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i) \le \frac{M}{n} \to 0 \quad\text{as}\quad n \to \infty.
\]

14.1 Since μ = 2 and σ = 2, we find that
\[
P(X_1 + X_2 + \cdots + X_{144} > 144) = P(\bar X_{144} > 1) = P\left(\frac{\bar X_{144} - 2}{2/\sqrt{144}} > \frac{1 - 2}{2/12}\right) = P(Z_{144} > -6) \approx 1 - \Phi(-6) \approx 1.
\]


14.2 Since E[Xi] = 1/4 and Var(Xi) = 3/80, using the central limit theorem we find
\[
P(X_1 + X_2 + \cdots + X_{625} < 170) = P\left(\sqrt{625}\,\frac{\bar X_{625} - 1/4}{\sqrt{3/80}} < \sqrt{625}\,\frac{170/625 - 1/4}{\sqrt{3/80}}\right) \approx \Phi(2.8402) = 1 - 0.0023 = 0.9977.
\]

14.3 First note that P(|X̄n - p| < 0.2) = 1 - P(X̄n - p ≤ -0.2) - P(X̄n - p ≥ 0.2). Because μ = p and σ^2 = p(1 - p), we find, using the central limit theorem:
\[
P(\bar X_n - p \ge 0.2) = P\left(\sqrt{n}\,\frac{\bar X_n - p}{\sqrt{p(1 - p)}} \ge \frac{0.2\sqrt{n}}{\sqrt{p(1 - p)}}\right) = P\left(Z_n \ge \frac{0.2\sqrt{n}}{\sqrt{p(1 - p)}}\right) \approx P\left(Z \ge \frac{0.2\sqrt{n}}{\sqrt{p(1 - p)}}\right),
\]
where Z has an N(0, 1) distribution. Similarly,
\[
P(\bar X_n - p \le -0.2) \approx P\left(Z \le -\frac{0.2\sqrt{n}}{\sqrt{p(1 - p)}}\right),
\]
so we are looking for the smallest positive integer n such that
\[
1 - 2P\left(Z \ge \frac{0.2\sqrt{n}}{\sqrt{p(1 - p)}}\right) \ge 0.9,
\]
i.e., the smallest positive integer n such that P(Z ≥ 0.2√n/√(p(1 - p))) ≤ 0.05. From Table ?? it follows that
\[
\frac{0.2\sqrt{n}}{\sqrt{p(1 - p)}} \ge 1.645.
\]
Since p(1 - p) ≤ 1/4 for all p between 0 and 1, we see that n should be at least 17.

14.4 Using the central limit theorem, with μ = 2 and σ^2 = 4,
\[
P(T_1 + T_2 + \cdots + T_{30} \le 60) = P\left(\frac{\bar T_{30} - 2}{2/\sqrt{30}} \le \frac{2 - 2}{2/\sqrt{30}}\right) \approx \Phi(0) = \frac{1}{2}.
\]
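The conclusion of 14.3 follows from one line of arithmetic once the critical value is fixed. A sketch, using z(0.05) = 1.645 and the worst case p = 1/2:

```python
from math import ceil, sqrt

z = 1.645                  # critical value: P(Z >= 1.645) = 0.05
p = 0.5                    # p(1 - p) is maximal at p = 1/2
n_min = ceil((z * sqrt(p * (1 - p)) / 0.2) ** 2)   # smallest n that works
```

This gives n_min = 17, matching the text.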
 

14.5 In Section 4.3 we have seen that X has the same probability distribution as X1 + X2 + ... + Xn, where X1, X2, ..., Xn are independent Ber(p) distributed random variables. Recall that E[Xi] = p and Var(Xi) = p(1 - p). But then we have for any real number a that
\[
P\left(\frac{X - np}{\sqrt{np(1 - p)}} \le a\right) = P\left(\frac{X_1 + X_2 + \cdots + X_n - np}{\sqrt{np(1 - p)}} \le a\right) = P(Z_n \le a);
\]
see also (14.1). It follows from the central limit theorem that
\[
P\left(\frac{X - np}{\sqrt{np(1 - p)}} \le a\right) \approx \Phi(a),
\]
i.e., the random variable (X - np)/√(np(1 - p)) has a distribution that is approximately standard normal.

14.6 a Using the result of Exercise 14.5 yields that
\[
P(X \le 25) = P\left(\frac{X - 25}{\sqrt{75/4}} \le \frac{25 - 25}{\sqrt{75/4}}\right) = P\left(\frac{X - 25}{\sqrt{75/4}} \le 0\right) \approx P(Z \le 0) = \frac{1}{2},
\]
where Z has a standard normal distribution. In the same way,
\[
P(X < 26) = P\left(\frac{X - 25}{\sqrt{75/4}} \le \frac{26 - 25}{\sqrt{75/4}}\right) = P\left(\frac{X - 25}{\sqrt{75/4}} \le \frac{2}{5\sqrt{3}}\right) \approx P(Z \le 0.2309) = 0.591.
\]

14.6 b P(X ≤ 2) ≈ P(Z ≤ -5.31) ≈ 0 (the table ends with z = 3.49).

14.7 Setting Y = n^{1/4} X̄n, one easily shows that E[Y] = 0 and Var(Y) = 1/√n. Since
\[
P\left(n^{1/4}|\bar X_n| \ge a\right) = P(|Y - E[Y]| \ge a),
\]
it follows from Chebyshev's inequality that
\[
P\left(n^{1/4}|\bar X_n| \ge a\right) \le \frac{1}{a^2\sqrt{n}}.
\]
As a consequence, if n goes to innity, we see that most of the probability mass 1 of the random variable n 4 Xn is concentrated in the interval between a and a, for every a > 0. Since we can choose a arbitrarily small, this explains the spike in Figure 14.1.
2 2 2 14.8 a 1 = Var(Xi ) = E Xi (E [Xi ])2 = E Xi 0, so E Xi = 1. 4 14.8 b (integration by parts: E Xi =

2 3 xe 2 x dx = 3E Xi = 3) 2 4 2 Var Xi = E Xi (E Xi )2 = 3 1 = 2.

x4 1 e 2 x dx = 2

1 2

[x3 e 2 x ]

3x2 e 2 x dx =

504

Full solutions from MIPS: DO NOT DISTRIBUTE



1 2

14.8 c P(Y100 > 110) P Z >

0.242.

14.9 a The probability that for a chain of at least 50 meters more than 1002 links are needed is the same as the probability that a chain of 1002 chains is shorter than 50 meters. Assuming that the random variables X1 , X2 , . . . , X1002 are independent, and using the central limit theorem, we have that


P(X1 + X2 + + X1002 < 5000) P Z <

5000 5 1002 1002 0.04

= 0.0571,

where Z has an N(0, 1) distribution. So about 6% of the customers will receive a free chain.

14.9 b We now have that P(X1 + X2 + ... + X1002 < 5000) ≈ P(Z < 0.0032), which is slightly larger than 1/2. So about half of the customers will receive a free chain. Clearly something has to be done: a seemingly minor change of expected value has major consequences!

14.10 a Note that
\[
P(|M_n - c| \le 0.5) = P\left(-\frac{\sqrt{3}}{6}\sqrt{n} < \frac{M_n - c}{\sigma/\sqrt{n}} < \frac{\sqrt{3}}{6}\sqrt{n}\right) \approx P\left(-\frac{\sqrt{3}}{6}\sqrt{n} < Z < \frac{\sqrt{3}}{6}\sqrt{n}\right) = 1 - 2P\left(Z \ge \frac{\sqrt{3}}{6}\sqrt{n}\right),
\]

where Z is N(0, 1). Now choose n such that 1 - 2P(Z ≥ (√3/6)√n) = 0.9, i.e., let n be such that P(Z ≥ (√3/6)√n) = 0.05. Then (√3/6)√n = 2.75, yielding n = 90.75. Since n is an integer, we find n = 91.

14.10 b In case the Ui are normally distributed, the random variable (Mn - c)/(σ/√n) has an N(0, 1) distribution, and the calculations in the answer of a are exact (cf. Section 5.5). In all other cases this random variable has a distribution which is approximately equal to an N(0, 1) distribution (see also the discussion in Section 14.2 on the size of n).

15.1 a For bin (32.5, 33.5], the height equals the number of xj in Bi divided by n|Bi|: 3/(5732·1) = 0.00052. Similar computations for the other bins give:

bin           height    bin           height
(32.5, 33.5]  0.00052   (40.5, 41.5]  0.16312
(33.5, 34.5]  0.00331   (41.5, 42.5]  0.11270
(34.5, 35.5]  0.01413   (42.5, 43.5]  0.05461
(35.5, 36.5]  0.03297   (43.5, 44.5]  0.02931
(36.5, 37.5]  0.07135   (44.5, 45.5]  0.00872
(37.5, 38.5]  0.13137   (45.5, 46.5]  0.00314
(38.5, 39.5]  0.18526   (46.5, 47.5]  0.00052
(39.5, 40.5]  0.18876   (47.5, 48.5]  0.00017


15.1 b Symmetric, since the histogram has one mode with more or less the same decay in both directions.
[Figure: histogram of the dataset on the bins above; vertical axis from 0 to 0.20, horizontal axis from 32 to 50.]

15.2 a For bin (50, 55], the height equals the number of xj in Bi divided by n|Bi|: 1/(23·5) = 0.0087. The heights of the other bins can be computed similarly:

bin     (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85]
height  0.0087   0.0174   0.0087   0.0870   0.0348   0.0348   0.0087

15.2 b No, 31 degrees is extremely low compared to the temperatures experienced on earlier takeoffs.

15.3 a For bin (0, 250], the height equals the number of xj in Bi divided by n|Bi|: 141/(190·250) = 0.00297. The heights of the other bins can be computed similarly:

Bin           Height    Bin           Height
(0, 250]      0.00297   (1250, 1500]  0.00004
(250, 500]    0.00067   (1500, 1750]  0.00004
(500, 750]    0.00015   (1750, 2000]  0
(750, 1000]   0.00008   (2000, 2250]  0
(1000, 1250]  0.00002   (2250, 2500]  0.00002


15.3 b Skewed, since the histogram has one mode and only decays to the right.
[Figure: histogram of the dataset on (0, 2500]; vertical axis from 0 to 0.003.]

15.4 a For bin (0, 500], the height equals the number of xj in Bi divided by n|Bi|: 86/(135·500) = 0.0012741. The heights of the other bins can be computed similarly.

Bin           Height       Bin           Height
[0, 500]      0.0012741    (3500, 4000]  0
(500, 1000]   0.0003556    (4000, 4500]  0.0000148
(1000, 1500]  0.0001778    (4500, 5000]  0
(1500, 2000]  0.0000741    (5000, 5500]  0.0000148
(2000, 2500]  0.0000148    (5500, 6000]  0.0000148
(2500, 3000]  0.0000148    (6000, 6500]  0.0000148
(3000, 3500]  0.0000296

15.4 b Since all elements (not rounded) are strictly positive, the value of Fn at zero is zero. At 500 the value of Fn is the number of xi ≤ 500 divided by n: 86/135 = 0.6370. The values at the other endpoints can be computed similarly:

t      0       500     1000    1500    2000    2500    3000
Fn(t)  0       0.6370  0.8148  0.9037  0.9407  0.9481  0.9556

t      3500    4000    4500    5000    5500    6000    6500
Fn(t)  0.9704  0.9704  0.9778  0.9778  0.9852  0.9926  1

15.4 c The area under the histogram on bin (1000, 1500] is 500·0.0001778 = 0.0889, and Fn(1500) - Fn(1000) = 0.9037 - 0.8148 = 0.0889.
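The height and Fn computations of 15.1 to 15.4 can be expressed in a few lines. A sketch with a small made-up dataset:

```python
def hist_height(data, a, b):
    # Height of the histogram on bin (a, b]: count / (n * bin width).
    count = sum(1 for x in data if a < x <= b)
    return count / (len(data) * (b - a))

def ecdf(data, t):
    # Empirical distribution function Fn(t).
    return sum(1 for x in data if x <= t) / len(data)

toy = [1, 2, 2, 3, 5, 8, 13]          # hypothetical data, just for illustration
h = hist_height(toy, 0, 5)            # 5 of the 7 points fall in (0, 5]
area = h * (5 - 0)                    # area under the histogram on (0, 5]
gap = abs(area - (ecdf(toy, 5) - ecdf(toy, 0)))
```

As in 15.4 c, the area under the histogram on a bin equals the increase of Fn on that bin, so gap is zero.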


15.5 Since the increase of Fn on bin (0, 1] is equal to the area under the histogram on the same bin, the height on this bin must be 0.2250/(1 - 0) = 0.2250. Similarly, the height on bin (1, 3] is (0.445 - 0.225)/(3 - 1) = 0.1100, and so on:

Bin     (0, 1]  (1, 3]  (3, 5]  (5, 8]  (8, 11]  (11, 14]  (14, 18]
Height  0.2250  0.1100  0.0850  0.0400  0.0230   0.0350    0.0225

15.6 Because (2 - 0)·0.245 + (4 - 2)·0.130 + (7 - 4)·0.050 + (11 - 7)·0.020 + (15 - 11)·0.005 = 1, there are no data points outside the listed bins. Hence
\[
F_n(7) = \frac{\text{number of } x_i \le 7}{n} = \frac{\text{number of } x_i \text{ in bins } (0, 2],\ (2, 4] \text{ and } (4, 7]}{n} = \frac{n(2 - 0)0.245 + n(4 - 2)0.130 + n(7 - 4)0.050}{n} = 0.490 + 0.260 + 0.150 = 0.9.
\]

15.7 Note that the increase of Fn on each bin is equal to the area under the histogram on the same bin. Since all bins have the same width, it follows from the results of Exercise 15.2 that the increase of Fn is largest (5·0.0870 = 0.435) on (65, 70].

15.8 K does not satisfy (K1), since it is negative between 0 and 1 and does not integrate to one.

15.9 a The scatterplot indicates that larger durations correspond to longer waiting times.

15.9 b By judgement by eye, the authors predicted a waiting time of about 80.

15.9 c If you project the points on the vertical axis you get the dataset of waiting times. Since the two groups of points in the scatterplot are separated in a northeast direction (both vertically and horizontally), you will see two modes.

15.10 Each steep part of Fn corresponds to a mode of the dataset. One counts four such parts.

15.11 The height of the histogram on a bin (a, b] is
\[
\frac{\text{number of } x_i \text{ in } (a, b]}{n(b - a)} = \frac{(\text{number of } x_i \le b) - (\text{number of } x_i \le a)}{n(b - a)} = \frac{F_n(b) - F_n(a)}{b - a}.
\]


15.12 a By inserting the expression for fn,h(t), we get

∫ t fn,h(t) dt = ∫ t · (1/(nh)) Σ_{i=1}^n K((t − xi)/h) dt = (1/n) Σ_{i=1}^n ∫ (t/h) K((t − xi)/h) dt.

For each fixed i we find, with the change of integration variable u = (t − xi)/h,

∫ (t/h) K((t − xi)/h) dt = ∫ (xi + hu) K(u) du = xi ∫ K(u) du + h ∫ u K(u) du = xi,

using that K integrates to one and that ∫ u K(u) du = 0, because K is symmetric. Hence

∫ t fn,h(t) dt = (1/n) Σ_{i=1}^n xi = x̄n.

15.12 b By means of a similar reasoning,

∫ t² fn,h(t) dt = ∫ t² · (1/(nh)) Σ_{i=1}^n K((t − xi)/h) dt = (1/n) Σ_{i=1}^n ∫ (t²/h) K((t − xi)/h) dt.

For each i:

∫ (t²/h) K((t − xi)/h) dt = ∫ (xi + hu)² K(u) du = ∫ (xi² + 2 xi h u + h² u²) K(u) du
= xi² ∫ K(u) du + 2 xi h ∫ u K(u) du + h² ∫ u² K(u) du = xi² + h² ∫ u² K(u) du,

again using that K integrates to one and that K is symmetric. Hence

∫ t² fn,h(t) dt = (1/n) Σ_{i=1}^n xi² + h² ∫ u² K(u) du.

16.1 a The dataset has 135 elements, so the sample median is the 68th element in order of magnitude. From the table in Exercise 15.4 we see that this is 290.

16.1 b The dataset has n = 135 elements. The lower quartile is the 25th empirical percentile. We have k = ⌊0.25 · (135 + 1)⌋ = 34, so that α = 0, and qn(0.25) = x(34) = 81, according to the table in Exercise 15.4. Similarly, the upper quartile is qn(0.75) = x(102) = 843, and the IQR is 843 − 81 = 762.

16.1 c We have k = ⌊0.37 · (135 + 1)⌋ = ⌊50.32⌋ = 50, so that α = 0.32, and qn(0.37) = x(50) + 0.32 · (x(51) − x(50)) = 143 + 0.32 · (148 − 143) = 144.6.
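The quantile recipe used in 16.1 b and c (k = ⌊p(n + 1)⌋, α = p(n + 1) − k, then interpolate between order statistics) is easy to check numerically. A minimal Python sketch, run on a small hypothetical dataset rather than the 135-element one from Exercise 15.4:

```python
def empirical_quantile(data, p):
    """q_n(p) = x_(k) + alpha * (x_(k+1) - x_(k)) with k = floor(p * (n + 1))."""
    xs = sorted(data)
    n = len(xs)
    k = int(p * (n + 1))        # integer part of p(n+1)
    alpha = p * (n + 1) - k     # fractional part
    if k < 1:                   # p too small for this n: fall back to the minimum
        return xs[0]
    if k >= n:                  # p too large for this n: fall back to the maximum
        return xs[-1]
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])   # x_(k) is the k-th smallest

data = [1, 3, 5, 7, 9, 11, 13]          # hypothetical dataset, n = 7
print(empirical_quantile(data, 0.5))    # k = 4, alpha = 0, so x_(4) = 7
print(empirical_quantile(data, 0.25))   # k = 2, alpha = 0, so x_(2) = 3
```

Applied to the dataset of Exercise 15.4 with p = 0.37, this function would go through exactly the k = 50, α = 0.32 computation of 16.1 c.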


16.2 To compute the lower quartile, we use that n = 272, so that k = ⌊0.25 · 273⌋ = ⌊68.25⌋ = 68 and α = 0.25, and hence from Table 15.2: qn(0.25) = x(68) + 0.25 · (x(69) − x(68)) = 129 + 0.25 · (130 − 129) = 129.25. Similarly, qn(0.5) = 240 and qn(0.75) = 267.75. Hence |qn(0.75) − qn(0.5)| = 27.75 and |qn(0.25) − qn(0.5)| = 110.75. The kernel density estimate (see Figure 15.3) has a peak just right of 240, so there is relatively a lot of probability mass just right of the sample median: 25% of the data lies within distance 27.75 to the right of the sample median, whereas to the left 25% is spread out over a distance of 110.75.

16.3 a Because n = 24, the sample median is the average of the 12th and 13th elements. Since these are both equal to 70, the sample median is also 70. The lower quartile is the pth empirical quantile for p = 1/4. We get k = ⌊p(n + 1)⌋ = ⌊0.25 · 25⌋ = ⌊6.25⌋ = 6 and α = 0.25, so that qn(0.25) = x(6) + 0.25 · (x(7) − x(6)) = 66 + 0.25 · (67 − 66) = 66.25. Similarly, the upper quartile is the pth empirical quantile for p = 3/4: qn(0.75) = x(18) + 0.75 · (x(19) − x(18)) = 75 + 0.75 · (75 − 75) = 75.

16.3 b In part a we found the sample median and the two quartiles. From these we compute the IQR: qn(0.75) − qn(0.25) = 75 − 66.25 = 8.75. This means that

qn(0.25) − 1.5 · IQR = 66.25 − 1.5 · 8.75 = 53.125,
qn(0.75) + 1.5 · IQR = 75 + 1.5 · 8.75 = 88.125.

The last element below 88.125 is 88, and the first element above 53.125 is 57. Therefore, the upper whisker runs until 88 and the lower whisker until 57, with the two elements 53 and 31 below. This leads to the following boxplot:
[Boxplot: box from 66.25 to 75 with median line at 70; whiskers up to 88 and down to 57; the outliers 53 and 31 plotted separately below.]
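The 1.5 · IQR fences used above can be recomputed directly from the quartiles found in 16.3 a; a small Python check:

```python
q1, q3 = 66.25, 75.0          # quartiles from Exercise 16.3 a
iqr = q3 - q1                 # 8.75
lower_fence = q1 - 1.5 * iqr  # 53.125
upper_fence = q3 + 1.5 * iqr  # 88.125
print(iqr, lower_fence, upper_fence)
```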

16.3 c The values 53 and 31 are outliers. Value 31 is far away from the bulk of the data and appears to be an extreme outlier.

16.4 a Check that ȳn = 700/99 and x̄n = 492/11, so that

(5/9)(x̄n − 32) = (5/9)(492/11 − 32) = 700/99,

which equals ȳn.

16.4 b Since n = 11, for both datasets the sample median is the 6th element in order of magnitude. Check that Med(y1, ..., yn) = 50/9 and Med(x1, ..., xn) = 42, so that

(5/9)(Med(x1, ..., xn) − 32) = (5/9)(42 − 32) = 50/9,

which is equal to Med(y1, ..., yn).


16.4 c For any real numbers a and b, we have

ȳn = (1/n) Σ_{i=1}^n yi = (1/n) Σ_{i=1}^n (a·xi + b) = a · ((1/n) Σ_{i=1}^n xi) + b = a·x̄n + b.

For a ≥ 0, the order of magnitude of the yi = a·xi + b is the same as that of the xi's, and therefore Med(y1, ..., yn) = a·Med(x1, ..., xn) + b. When a < 0, the order of the yi = a·xi + b is the reverse of that of the xi's, but the position of the middle number in order of magnitude (or the average of the two middle numbers) remains the same, so that the above rule still holds.

16.5 a Check that the sample variance of the yi's is (sC)² = 132550/99², so that

(5/9)² · (sF)² = (5/9)² · (482/11) = 132550/99²,

which is equal to (sC)², and hence sC = (5/9)·sF.

16.5 b For the xi's we have Med(x1, ..., xn) = 42. This leads to the following table:

xi                  43 43 41 41 41 42 43 58 58 41 41
|xi − 42|            1  1  1  1  1  0  1 16 16  1  1
|xi − 42| ordered    0  1  1  1  1  1  1  1  1 16 16

so that MAD_F = Med|xi − 42| = 1. Similarly, we have Med(y1, ..., yn) = 50/9 and

yi                  55/9 55/9 5 5 5 50/9 55/9 130/9 130/9 5 5
|yi − 50/9|          5/9  5/9 5/9 5/9 5/9 0 5/9 80/9 80/9 5/9 5/9
|yi − 50/9| ordered  0 5/9 5/9 5/9 5/9 5/9 5/9 5/9 5/9 80/9 80/9

so that MAD_C = Med|yi − 50/9| = 5/9. Therefore MAD_C = (5/9)·MAD_F.

16.5 c Since ȳn = a·x̄n + b, for any real number a we have for the sample variance of the yi's

s²_Y = (1/(n−1)) Σ_{i=1}^n (yi − ȳn)² = (1/(n−1)) Σ_{i=1}^n ((a·xi + b) − (a·x̄n + b))²
     = a² · (1/(n−1)) Σ_{i=1}^n (xi − x̄n)² = a² s²_X,

so that sY = |a|·sX. Because Med_Y = a·Med_X + b, it follows that

MAD_Y = Med|yi − Med_Y| = Med|(a·xi + b) − (a·Med_X + b)| = |a|·Med|xi − Med_X| = |a|·MAD_X.

16.6 a Yes: we find x̄ = (1 + 5 + 9)/3 = 15/3 = 5 and ȳ = (2 + 4 + 6 + 8)/4 = 20/4 = 5, so that (x̄ + ȳ)/2 = 5. The average of the combined dataset is also equal to 5: (15 + 20)/7 = 5.
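The scaling rules of 16.5, namely sC = (5/9)·sF and MAD_C = (5/9)·MAD_F, can be verified numerically on the Fahrenheit data listed in 16.5 b; a Python sketch:

```python
def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):                 # sample variance with divisor n - 1
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def median(xs):
    ys = sorted(xs)
    n = len(ys)
    return ys[n // 2] if n % 2 else (ys[n // 2 - 1] + ys[n // 2]) / 2

def mad(xs):                        # median of absolute deviations from the median
    m = median(xs)
    return median([abs(x - m) for x in xs])

fahrenheit = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]   # data from 16.5 b
celsius = [5 / 9 * (x - 32) for x in fahrenheit]

print(sample_var(celsius), (5 / 9) ** 2 * sample_var(fahrenheit))  # equal
print(mad(celsius), (5 / 9) * mad(fahrenheit))                     # equal
```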

16.6 b The mean of x1, ..., xn, y1, ..., ym equals

(x1 + ··· + xn + y1 + ··· + ym)/(n + m) = (n·x̄n + m·ȳm)/(n + m) = (n/(n+m))·x̄n + (m/(n+m))·ȳm.

In general this is not equal to (x̄n + ȳm)/2. For instance, replace the element 1 in the first dataset by 4. Then x̄n = 6 and ȳm = 5, so that (x̄n + ȳm)/2 = 5 1/2. However, the average of the combined dataset is 38/7 = 5 3/7.

16.6 c Yes: m = n implies n/(n + m) = m/(n + m) = 1/2. From the expression found in part b we see that the sample mean of the combined dataset then equals (x̄n + ȳm)/2.

16.7 a We have Med_x = 5 and Med_y = (4 + 6)/2 = 5, whereas for the combined dataset 1 2 4 5 6 8 9 the sample median is the fourth number in order of magnitude: 5.

16.7 b This is not true in general: take 1, 2, 3 and 5, 7 with sample medians 2 and 6. The combined dataset has sample median 3, whereas (2 + 6)/2 = 4.

16.7 c This is not true in general: take 1, 9 and 2, 4 with sample medians 5 and 3. The combined dataset has sample median 3, whereas (5 + 3)/2 = 4.

16.8 The ordered combined dataset is 1, 2, 4, 5, 6, 8, 9, so that the sample median equals 5. The absolute deviations from 5 are 4, 3, 1, 0, 1, 3, 4, and in order: 0, 1, 1, 3, 3, 4, 4. The MAD is the sample median of the absolute deviations, which is 3.

16.9 a One can easily check that x̄n = 10/3 and that the average of the yi = 1/xi equals 3/10, so that ȳn = 1/x̄n for this particular dataset.

16.9 b No; for instance, take 1, 2, and 3. Then x̄n = 2, whereas ȳn = (1 + 1/2 + 1/3)/3 = 11/18 ≠ 1/2 = 1/x̄n.

16.9 c First note that the transformation y = 1/x puts the yi's in the reverse order of the xi's. If n is odd, the middle element in order of magnitude remains the middle element under the transformation y = 1/x; therefore, if n is odd, the answer is yes. If n is even, the answer is no: for example, take 1 and 2. Then Med_X = 3/2 and Med_Y = Med(1, 1/2) = (1 + 1/2)/2 = 3/4 ≠ 2/3 = 1/Med_X.

16.10 a The sum of the elements is 16.8 + y, which goes to infinity as y → ∞, and therefore the sample mean also goes to infinity, because we only divide the sum by the sample size n. When y → ∞, the ordered data will be

3.0  4.2  4.6  5.0  y

Therefore, no matter how large y gets, the sample median, which is the middle number, remains 4.6.

16.10 b In order to let the middle number go to infinity, we must replace at least three numbers. For instance, replace 3.2, 4.2, and 5.0 by some real number y that goes to infinity. In that case the ordered data will be

3.0  4.6  y  y  y

of which the middle number y goes to infinity.
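The contrast in 16.10 a, one wild observation ruining the mean but not the median, can be watched numerically; a Python sketch using the dataset 3.0, 3.2, 4.2, 4.6, 5.0:

```python
def median(xs):
    ys = sorted(xs)
    n = len(ys)
    return ys[n // 2] if n % 2 else (ys[n // 2 - 1] + ys[n // 2]) / 2

for y in [10.0, 1e3, 1e6]:                 # replace the element 3.2 by ever larger y
    data = [3.0, y, 4.2, 4.6, 5.0]
    # the mean grows without bound, while the median stays at 4.6
    print(sum(data) / len(data), median(data))
```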


16.10 c Without loss of generality, suppose that the dataset is already ordered. By the same reasoning as in part a, one can argue that we only have to replace one element in order to let the sample mean go to infinity. When n = 2k + 1 is odd, the middle number is the (k + 1)st number. By the same reasoning as in part b, in order to let that go to infinity we must replace the middle number, as well as k other elements, by y; this means we have to replace k + 1 = (n + 1)/2 elements. When n = 2k is even, the middle number is the average of the kth and (k + 1)st numbers. In order to let that go to infinity, it suffices to replace k elements that include either the kth or the (k + 1)st number by y. This means we have to replace k = n/2 elements.

16.11 a From Exercise 16.10 we already know that the sample mean goes to infinity. This implies that (4.6 − x̄n)² also goes to infinity, and therefore so does the sample variance, as well as the sample standard deviation. From Exercise 16.10 we also know that the sample median remains 4.6. Hence the ordered absolute deviations are

0  0.4  0.4  1.6  |y − 4.6|

Therefore, no matter how large y gets, the MAD, which is the middle number of the ordered absolute deviations, remains 0.4.

16.11 b In order to let the middle number of the ordered absolute deviations go to infinity, we must replace at least three numbers. For instance, replace 3.0 and 3.2 by y and 5.0 by −y, where y is some real number that goes to infinity. In that case the ordered data will be

−y  4.2  4.6  y  y

of which the middle number is 4.6. The ordered absolute deviations are

0  0.4  |y − 4.6|  |y − 4.6|  |4.6 + y|

The MAD is the middle number, which goes to infinity.

16.11 c Without loss of generality, suppose that the dataset is already ordered. By the same reasoning as in part a, we only need to replace one element in order to let the sample standard deviation go to infinity. When n = 2k + 1 is odd, the middle number is the (k + 1)st number. By the same reasoning as in part b, in order to let the MAD go to infinity we must replace the middle number as well as k other numbers, i.e., k + 1 = (n + 1)/2 elements, for instance one by −y and the rest by y. In that case the sample median remains finite and the majority of the absolute deviations from the median tends to infinity. When n = 2k is even, the middle number is the average of the kth and (k + 1)st numbers. In order to let the MAD go to infinity, it suffices to replace k = n/2 elements that include either the kth or the (k + 1)st number, for instance one by −y and the rest by y. In that case the sample median remains finite and half of the absolute deviations from the median tends to infinity.

16.12 The sample mean is

(1/N)(1 + 2 + ··· + N) = (1/N) · N(N + 1)/2 = (N + 1)/2.


When N = 2k + 1 is odd, the sample median is the (k + 1)st number, which is k + 1 = (N − 1)/2 + 1 = (N + 1)/2. When N = 2k is even, the sample median is the average of the kth and (k + 1)st numbers:

(N/2 + (N/2 + 1))/2 = (N + 1)/2.

16.13 First note that the sample mean is zero and n = 2N + 1. Therefore the sample variance is

s²n = (1/(n − 1)) · ((−N)² + ··· + (−1)² + 0² + 1² + ··· + N²)
    = (1/(2N)) · 2 · (1² + ··· + N²) = (1/N) · N(N + 1)(2N + 1)/6 = (N + 1)(2N + 1)/6,

so that sn = √((N + 1)(2N + 1)/6). Since n = 2N + 1 is always odd, the sample median is the middle number, which is zero. Therefore we have the following:

xi             −N ··· −1  0  1 ··· N
|xi|            N ···  1  0  1 ··· N
|xi| ordered    0  1  1  2  2 ··· N  N

Since n = 2N + 1 is odd, MAD = Med|xi| is the (N + 1)st number in the bottom row: ⌊(N + 1)/2⌋.
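The closed forms found in 16.13, sample variance (N + 1)(2N + 1)/6 and MAD ⌊(N + 1)/2⌋ for the dataset −N, ..., N, can be confirmed by brute force; a Python sketch:

```python
def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def median(xs):
    ys = sorted(xs)
    return ys[len(ys) // 2]          # n = 2N + 1 is odd here

for N in [3, 10, 25]:
    xs = list(range(-N, N + 1))
    assert abs(sample_var(xs) - (N + 1) * (2 * N + 1) / 6) < 1e-9
    m = median(xs)                   # 0
    mad = median([abs(x - m) for x in xs])
    assert mad == (N + 1) // 2       # floor of (N + 1)/2
print("formulas confirmed")
```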
16.14 When n = 2i + 1 is odd, then in the formula qn(1/2) = x(k) + α·(x(k+1) − x(k)) for the 50th empirical percentile we find k = ⌊(n + 1)/2⌋ = ⌊i + 1⌋ = i + 1 and α = 0, so that qn(1/2) = x(i+1), which is the middle number in order of magnitude. When n = 2i is even, we find k = ⌊(n + 1)/2⌋ = ⌊i + 1/2⌋ = i and α = 1/2, so that

qn(1/2) = x(i) + (1/2)·(x(i+1) − x(i)) = (x(i) + x(i+1))/2,

which is the average of the two middle numbers in order of magnitude.

16.15 First write

(1/n) Σ_{i=1}^n (xi − x̄n)² = (1/n) Σ_{i=1}^n xi² − 2·x̄n · (1/n) Σ_{i=1}^n xi + (1/n) Σ_{i=1}^n x̄n².

Next, by inserting

(1/n) Σ_{i=1}^n xi = x̄n   and   (1/n) Σ_{i=1}^n x̄n² = (1/n) · n · x̄n² = x̄n²,

we find

(1/n) Σ_{i=1}^n (xi − x̄n)² = (1/n) Σ_{i=1}^n xi² − 2·x̄n² + x̄n² = (1/n) Σ_{i=1}^n xi² − x̄n².
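The algebraic identity of 16.15 is easy to sanity-check on arbitrary numbers; a Python sketch:

```python
def check_identity(xs):
    n = len(xs)
    m = sum(xs) / n
    lhs = sum((x - m) ** 2 for x in xs) / n          # (1/n) * sum (xi - xbar)^2
    rhs = sum(x * x for x in xs) / n - m * m         # (1/n) * sum xi^2 - xbar^2
    return abs(lhs - rhs) < 1e-12

print(check_identity([1.0, 3.5, 2.2, 8.9, 4.4]))     # True
```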

16.16 According to Exercise 15.12,

∫ t fn,h(t) dt = x̄n   and   ∫ t² fn,h(t) dt = (1/n) Σ_{i=1}^n xi² + h² ∫ u² K(u) du.

Therefore, by the rule for the variance, Var(X) = E[X²] − (E[X])², the variance corresponding to fn,h is

∫ t² fn,h(t) dt − (∫ t fn,h(t) dt)² = (1/n) Σ_{i=1}^n xi² − x̄n² + h² ∫ u² K(u) du
                                   = (1/n) Σ_{i=1}^n (xi − x̄n)² + h² ∫ u² K(u) du,

according to Exercise 16.15.

16.17 For p = i/(n + 1) we find that k = ⌊p(n + 1)⌋ = i and α = p(n + 1) − k = 0, so that

qn(p) = x(k) + α·(x(k+1) − x(k)) = x(i).

17.1 Before one starts looking at the figures, it is useful to recall a couple of features of the distributions involved:
1. most of the probability mass of the N(0, 1) distribution lies between ±3, and between ±9 for the N(0, 9) distribution;
2. the height at zero of the exponential density is λ, and the median of the Exp(λ) distribution is ln(2)/λ (see Exercise 5.11).

We then have the following:
1. N(3, 1), since the mode is about 3.
2. N(0, 1), since Fn is 0.5 at about zero and all elements are between about ±2.
3. N(0, 1), since the mode is about zero and all elements are between about −4 and 3.
4. N(3, 1), since the mode is about 3.
5. Exp(1/3), since the sample median is about 2.
6. Exp(1), since the sample median is less than 1.
7. N(0, 1), since the mode is about zero and all elements are between about ±4.
8. N(0, 9), since the mode is about zero and all elements are between about −6 and 9.
9. Exp(1), since the height of the histogram at zero is about 0.7. Moreover, almost all elements are less than 6, whereas the probability of exceeding 6 is 0.135 for the Exp(1/3) distribution.
10. N(3, 1), since Fn is 0.5 at about 3.
11. N(0, 9), since the mode is about zero and all elements are between about ±12.
12. Exp(1/3), since the height of the histogram at zero is about 0.24. Moreover, there are several elements beyond 10, which has probability 0.000045 for the Exp(1) distribution.
13. N(0, 9), since the mode is about zero and all elements are between about −5 and 8.


14. Exp(1/3), since the height of the kernel density estimate at zero is about 0.35.
15. Exp(1), since the sample median is about 1.

17.2 We continue as in Exercise 17.1:
1. Exp(1/3), since the sample median is about 2.
2. N(0, 9), since the sample median is about zero and all elements are between about −6 and 8.
3. Exp(1/3), since the sample median is about 2.
4. N(0, 1), since the sample median is about zero and all elements are between about −2 and 3.
5. N(3, 1), since the sample median is about 3.
6. Exp(1), since the sample median is less than one.
7. N(0, 9), since the sample median is about zero and all elements are between about −9 and 7.
8. N(0, 9), since the sample median is about zero and all elements are between about −7 and 9.
9. N(3, 1), since the sample median is about 3.
10. Exp(1), since the sample median is less than one.
11. N(3, 1), since the sample median is about 3.
12. Exp(1), since the sample median is less than one.
13. N(0, 1), since the sample median is about zero and all elements are between about ±3.
14. N(0, 1), since the sample median is about zero and all elements are between about −3 and 4.
15. Exp(1/3), since the sample median is about 2.

17.3 a The model distribution corresponds to the number of women in a queue. A queue has 10 positions. The occurrence of a woman in any position is independent of the occurrence of women in the other positions, and at each position a woman occurs with probability p. Counting the occurrence of a woman as a success, the number of women in a queue corresponds to the number of successes in 10 independent experiments with success probability p, and is therefore modeled by a Bin(10, p) distribution.

17.3 b We have 100 queues, and the number of women xi in the ith queue is a realization of a Bin(10, p) random variable. Hence, according to Table 17.2, the average number of women x̄100 resembles the expectation 10p of the Bin(10, p) distribution. We find x̄100 = 435/100 = 4.35, so an estimate for p is 4.35/10 = 0.435.

17.4 a Recall that the parameter μ is the expectation of the Pois(μ) distribution. Hence, according to Table 17.2, the sample mean is a reasonable estimate. Since the dataset contains 229 zeros, 211 ones, etc.,

x̄n = (211·1 + 93·2 + 35·3 + 7·4 + 1·7)/576 = 537/576 = 0.9323.

17.4 b From part a the parameter μ is estimated by 0.9323. Then P(X = 0) = e^{−0.9323} = 0.393, and the other Poisson probabilities can be computed similarly. The following table compares the relative frequencies to the Poisson probabilities:

hits        0      1      2      3      4      5       6       7
rel. freq.  0.397  0.367  0.162  0.061  0.012  0.0000  0.0000  0.0017
prob.       0.393  0.367  0.171  0.053  0.012  0.0023  0.0004  0.0000
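The fit in 17.4 can be reproduced in a few lines; a Python sketch using the frequency table of part a:

```python
from math import exp, factorial

counts = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 7: 1}        # frequencies from 17.4 a
n = sum(counts.values())                                    # 576
mu = sum(k * c for k, c in counts.items()) / n              # 537/576

print(round(mu, 4))                                         # 0.9323
for k in range(5):
    rel_freq = counts.get(k, 0) / n
    pois_prob = exp(-mu) * mu ** k / factorial(k)
    print(k, round(rel_freq, 3), round(pois_prob, 3))
```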


17.5 a One possibility is to use the fact that the geometric distribution has expectation 1/p, so that the sample mean will be close to 1/p. From the average number of cycles needed, 331/93, we then estimate p by 93/331 = 0.2809. Another possibility is to use the fact that p is equal to the probability of getting pregnant in the first cycle. We can estimate this by the relative frequency of women who got pregnant in the first cycle: 29/93 = 0.3118.

17.5 b From the average number of cycles needed, 1285/474, we could estimate p by 474/1285 = 0.3689. Another possibility is to estimate p by the relative frequency of women who got pregnant in the first cycle: 198/474 = 0.4177.

17.5 c The estimated probability of becoming pregnant within three cycles, p + p(1 − p) + p(1 − p)², for each of the four estimates:

              p̂        p̂ + p̂(1 − p̂) + p̂(1 − p̂)²
smokers       0.2809    0.6281
              0.3118    0.6741
nonsmokers    0.3689    0.7486
              0.4177    0.8026
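The last column of the table in 17.5 c uses only the geometric tail: since p + p(1 − p) + p(1 − p)² = 1 − (1 − p)³, each entry can be recomputed in one line; a Python sketch:

```python
def within_three_cycles(p):
    # p + p(1 - p) + p(1 - p)^2 telescopes to 1 - (1 - p)^3
    return 1 - (1 - p) ** 3

for p in [0.2809, 0.3118, 0.3689, 0.4177]:   # the four estimates from 17.5 a and b
    print(round(within_three_cycles(p), 4))
```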

17.6 a The parameters μ and σ are the expectation and standard deviation of the N(μ, σ²) distribution. Hence, according to Table 17.2, we can estimate μ by the sample mean

x̄n = (1/n) Σ_{i=1}^n xi = 39.84,

and σ by the sample standard deviation sn. To compute this, use Exercise 16.15 to get

s²n = (1/(n − 1)) Σ_{i=1}^n (xi − x̄n)² = (n/(n − 1)) · ((1/n) Σ_{i=1}^n xi² − x̄n²)
    = (5732/5731) · (9124064/5732 − (39.84)²) = 4.35.

Hence, take sn = √4.35 = 2.09 as an estimate for σ.

17.6 b Since we model the chest circumference by means of a random variable X with an N(μ, σ²) distribution, the required probability is equal to

P(38.5 < X < 42.5) = P(X < 42.5) − P(X < 38.5) = P(Z < (42.5 − μ)/σ) − P(Z < (38.5 − μ)/σ),

where Z = (X − μ)/σ has a standard normal distribution. We can estimate this by plugging in the estimates 39.84 and 2.09 for μ and σ. This gives

Φ(1.27) − Φ(−0.64) = 0.8980 − 0.2611 = 0.6369.

Another possibility is to estimate P(38.5 < X < 42.5) by the relative frequency of chest circumferences between 38.5 and 42.5. Using the information in Exercise 15.1, we find 3725/5732 = 0.6499.
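The normal probability in 17.6 b can be evaluated without tables via the error function; a Python sketch (the small difference from 0.6369 comes from rounding the z-scores to two decimals above):

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 39.84, 2.09     # estimates from 17.6 a
p = Phi((42.5 - mu) / sigma) - Phi((38.5 - mu) / sigma)
print(round(p, 4))
```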


17.7 a If we model the series of disasters by a Poisson process, then, as a property of the Poisson process, the interdisaster times should follow an exponential distribution (see Section 12.3). This is indeed confirmed by the histogram and empirical distribution function of the observed interdisaster times: they resemble the probability density and distribution function of an exponential distribution.

17.7 b The average length of a time interval is 40 549/190 = 213.4 days. Following Table 17.2, this should resemble the expectation of the Exp(λ) distribution, which is 1/λ. Hence, as an estimate for λ we could take 190/40 549 = 0.00469.

17.8 a The distribution function of Y = X^α is given by

FY(a) = P(Y ≤ a) = P(X ≤ a^{1/α}) = 1 − e^{−(λ a^{1/α})^α} = 1 − e^{−λ^α a}.

This is the distribution function of the Exp(λ^α) distribution, which has expectation 1/λ^α. Therefore E[X^α] = E[Y] = 1/λ^α.

17.8 b Take α-powers of the data: yi = xi^α. Then, according to part a, the yi's are a realization of a sample from an Exp(λ^α) distribution, with expectation 1/λ^α. Hence 1/λ^α can be estimated by the average of these numbers:

(1/n) Σ_{i=1}^n xi^α.

Next, solve for λ to get an estimate for λ:

λ̂ = ((1/n) Σ_{i=1}^n xi^α)^{−1/α}.

When we plug in α = 2.102 and apply this formula to the dataset, we get λ̂ = (10654.85)^{−1/2.102} = 0.0121.

17.9 a A (perfect) cylindrical cone with diameter d (at the base) and height h has volume πd²h/12, or about 0.26·d²h. The effective wood of a tree is the trunk without the branches. Since the trunk is similar to a cylindrical cone, one can expect a linear relation between the effective wood y and x = d²h.

17.9 b We find

z̄n = (1/n) Σ (yi/xi) = 9.369/31 = 0.3022,
ȳn/x̄n = (Σ yi/n)/(Σ xi/n) = (26.486/31)/(87.456/31) = 0.3028,
least squares: Σ xi·yi / Σ xi² = 95.498/314.644 = 0.3035.

17.10 a G(y) = P(Y ≤ y) = P(|X − m| ≤ y) = P(m − y ≤ X ≤ m + y) = P(X ≤ m + y) − P(X ≤ m − y) = F(m + y) − F(m − y).
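The three slope estimates of 17.9 b can each be computed with a one-line helper; a Python sketch on a small hypothetical dataset (the tree data itself is not reproduced here):

```python
def slope_mean_of_ratios(xs, ys):
    return sum(y / x for x, y in zip(xs, ys)) / len(xs)

def slope_ratio_of_means(xs, ys):
    return (sum(ys) / len(ys)) / (sum(xs) / len(xs))

def slope_least_squares(xs, ys):
    # least squares for the no-intercept model y = beta * x:
    # beta = sum(xi * yi) / sum(xi^2)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0, 4.0]           # hypothetical x = d^2 h values
ys = [0.31, 0.59, 0.92, 1.22]       # hypothetical effective-wood values
print(slope_mean_of_ratios(xs, ys))
print(slope_ratio_of_means(xs, ys))
print(slope_least_squares(xs, ys))
```

As in the exercise, the three estimates are close but not identical.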


17.10 b If f is symmetric around its median m, then f(m − y) = f(m + y) for all y. This implies that the area under f on (−∞, m − y] is equal to the area under f on [m + y, ∞), i.e., F(m − y) = 1 − F(m + y). By part a this means G(y) = 2F(m + y) − 1. To derive G^inv(1/2), put 1/2 = G(y). Then, since m = F^inv(1/2),

1/2 = 2F(m + y) − 1  ⟺  F(m + y) = 3/4  ⟺  y = F^inv(3/4) − m = F^inv(3/4) − F^inv(1/2).

17.10 c First determine an expression for F^inv(x) by putting x = F(u). Then F^inv(x) = μ + σ·Φ^inv(x). With part b it follows that the MAD of an N(μ, σ²) distribution is equal to

F^inv(3/4) − F^inv(1/2) = σ·Φ^inv(3/4) − σ·Φ^inv(1/2) = σ·Φ^inv(3/4),

using that Φ^inv(1/2) = 0. For the N(5, 4) distribution the MAD is 2·Φ^inv(3/4) = 2 · 0.6745 = 1.3490.

17.11 a Using Exercise 17.10 a, it follows that

G(y) = F(m + y) − F(m − y) = (1 − e^{−λ(m+y)}) − (1 − e^{−λ(m−y)})
     = e^{−λ(m−y)} − e^{−λ(m+y)} = e^{−λm}(e^{λy} − e^{−λy}) = (1/2)(e^{λy} − e^{−λy}),

using that m is the median of the Exp(λ) distribution and satisfies 1 − e^{−λm} = 1/2.

17.11 b Combining part a with Exercise 17.10 b, the MAD of the Exp(λ) distribution is a solution of G(y) = 1/2, so that (1/2)(e^{λy} − e^{−λy}) = 1/2. Multiplying this equation by e^{λy} yields that the MAD must satisfy

e^{2λy} − e^{λy} − 1 = 0.

17.11 c Put x = e^{λy} and solve x² − x − 1 = 0 for x. This gives

x = (1 ± √5)/2.

Since the MAD must be positive, it can be found from the relation e^{λy} = (1 + √5)/2. It follows that the MAD of the Exp(λ) distribution is equal to

(ln(1 + √5) − ln 2)/λ.

18.1 If we view a bootstrap dataset as a vector (x*₁, ..., x*₆), we have five possibilities (1, 2, 3, 4, and 6) for each of the six positions. Therefore there are 5⁶ different vectors possible. They are not equally likely: for instance, (1, 1, 1, 1, 1, 1) has probability (1/3)⁶, whereas (2, 2, 2, 2, 2, 2) has probability (1/6)⁶.
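The closed form found in 17.11 c can be checked against the defining equation G(y) = 1/2; a Python sketch with an arbitrary λ:

```python
from math import exp, log, sqrt

lam = 2.0                                    # arbitrary rate parameter
mad = log((1 + sqrt(5)) / 2) / lam           # the formula from 17.11 c

# G(y) = (exp(lam*y) - exp(-lam*y)) / 2 from 17.11 a; at y = MAD it must be 1/2
g = (exp(lam * mad) - exp(-lam * mad)) / 2
print(g)
```

The check works out exactly because x = (1 + √5)/2 satisfies x − 1/x = 1.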


18.2 a Because the bootstrap sample mean can be 1 if and only if all elements in the bootstrap random sample are 1, we get

P(X̄*n = 1) = P(X*₁ = 1, ..., X*₄ = 1) = P(X*₁ = 1) ··· P(X*₄ = 1) = (1/4)⁴ = 0.0039.

18.2 b Because the maximum not being 6 is equivalent to all numbers being at most 4, we get

P(max X*ᵢ = 6) = 1 − P(max X*ᵢ ≤ 4) = 1 − P(X*₁ ≤ 4, ..., X*₄ ≤ 4)
              = 1 − P(X*₁ ≤ 4) ··· P(X*₄ ≤ 4) = 1 − (3/4)⁴ = 0.6836.

18.3 a Note that generating from the empirical distribution function is the same as choosing one of the elements of the original dataset with equal probability. Hence, an element in the bootstrap dataset equals 0.35 with probability 0.1. The number of ways to have exactly three out of ten elements equal to 0.35 is C(10, 3), and each has probability (0.1)³(0.9)⁷. Therefore, the probability that the bootstrap dataset has exactly three elements equal to 0.35 is C(10, 3) · (0.1)³ · (0.9)⁷ = 0.0574.

18.3 b Having at most two elements less than or equal to 0.38 means that 0, 1, or 2 elements are less than or equal to 0.38. Five elements of the original dataset are smaller than or equal to 0.38, so that an element in the bootstrap dataset is less than or equal to 0.38 with probability 0.5. Hence, the probability that the bootstrap dataset has at most two elements less than or equal to 0.38 is

(0.5)¹⁰ + C(10, 1) · (0.5)¹⁰ + C(10, 2) · (0.5)¹⁰ = 0.0547.

18.3 c Five elements of the dataset are smaller than or equal to 0.38 and two are greater than 0.42. Therefore, obtaining a bootstrap dataset with two elements less than or equal to 0.38 and the other elements greater than 0.42 has probability (0.5)²(0.2)⁸. The number of such bootstrap datasets is C(10, 2). So the answer is C(10, 2) · (0.5)² · (0.2)⁸ = 0.000029.
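The binomial counts in 18.3 can be evaluated directly; a Python sketch:

```python
from math import comb

# 18.3 a: exactly three of ten draws equal 0.35 (each draw hits it w.p. 1/10)
p_a = comb(10, 3) * 0.1 ** 3 * 0.9 ** 7
# 18.3 b: at most two of ten draws are <= 0.38 (each draw is, w.p. 1/2)
p_b = sum(comb(10, k) * 0.5 ** 10 for k in range(3))
# 18.3 c: two draws <= 0.38 and the other eight > 0.42
p_c = comb(10, 2) * 0.5 ** 2 * 0.2 ** 8

print(round(p_a, 4), round(p_b, 4), round(p_c, 6))
```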
18.4 a There are 9 elements strictly less than 0.46, so that P(M*₁₀ < 0.46) = (9/10)¹⁰ = 0.3487.

18.4 b There are n − 1 elements strictly less than mn, so that P(M*n < mn) = ((n − 1)/n)ⁿ = (1 − 1/n)ⁿ.

18.5 Since each X*ᵢ is either 0, 3, or 6, it is not so difficult to see that X*₁ + X*₂ + X*₃ can only take the values 0, 3, 6, 9, 12, 15, and 18. The sample mean of the dataset is x̄n = 3. Therefore X̄*₃ − x̄n = (X*₁ + X*₂ + X*₃)/3 − 3 can only take the values −3, −2, −1, 0, 1, 2, and 3. The value X̄*₃ − x̄n = −3 corresponds to (X*₁, X*₂, X*₃) = (0, 0, 0). Because there are 3³ = 27 equally likely possibilities for (X*₁, X*₂, X*₃), the probability P(X̄*₃ − x̄n = −3) = 1/27. Similarly, the value X̄*₃ − x̄n = −2 corresponds to (X*₁, X*₂, X*₃) being equal to (3, 0, 0), (0, 3, 0), or (0, 0, 3), so that P(X̄*₃ − x̄n = −2) = 3/27. The other probabilities can be computed in the same way, which leads to

a                  −3    −2    −1    0     1     2     3
P(X̄*₃ − x̄n = a)  1/27  3/27  6/27  7/27  6/27  3/27  1/27
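The table in 18.5 can be generated by enumerating all 27 equally likely bootstrap vectors; a Python sketch:

```python
from itertools import product
from collections import Counter

data = [0, 3, 6]
xbar = sum(data) / len(data)                   # 3
counts = Counter()
for draw in product(data, repeat=3):           # all 3^3 = 27 bootstrap samples
    counts[sum(draw) / 3 - xbar] += 1

for a in sorted(counts):
    print(a, counts[a], "/ 27")
```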


18.6 a We have

F(m) = 1/2  ⟺  1 − e^{−λm} = 1/2  ⟺  −λm = ln(1/2) = −ln 2  ⟺  m = (ln 2)/λ.

18.6 b Since we know that the dataset is a realization of a sample from an Exp(λ) distribution, we are dealing with a parametric bootstrap. Therefore we must generate the bootstrap datasets from the distribution function of the Exp(λ) distribution, with λ estimated by λ̂ = 1/x̄n = 0.2. Because the bootstrap simulation is for Med(X1, X2, ..., Xn) − m, in each iteration we must compute Med(x*₁, x*₂, ..., x*n) − m̂, where m̂ = (ln 2)/λ̂ = 5 ln 2 denotes the median of the estimated exponential distribution. This leads to the following parametric bootstrap simulation procedure:
1. Generate a bootstrap dataset x*₁, x*₂, ..., x*n from F̂(x) = 1 − e^{−0.2x}.
2. Compute the centered sample median for the bootstrap dataset: Med(x*₁, x*₂, ..., x*n) − 5 ln 2, where Med(x*₁, x*₂, ..., x*n) is the sample median of x*₁, x*₂, ..., x*n.
Repeat steps 1 and 2 many times.

18.7 For the parametric bootstrap, we must estimate θ by θ̂ = (n + 1)mn/n and generate bootstrap samples from the U(0, θ̂) distribution. This distribution has expectation μ̂ = θ̂/2 = (n + 1)mn/(2n). Hence, for each bootstrap sample x*₁, x*₂, ..., x*n, compute x̄*n − (n + 1)mn/(2n). Note that this is different from the empirical bootstrap simulation, where one would estimate μ by x̄n and compute x̄*n − x̄n.

18.8 a Since we know nothing about the distribution of the interfailure times, we estimate F by the empirical distribution function Fn of the software data, and we estimate the expectation μ of F by the expectation μ̂ = x̄n = 656.8815 of Fn. The bootstrapped centered sample mean is the random variable X̄*n − 656.8815. The corresponding empirical bootstrap simulation is described as follows:
1. Generate a bootstrap dataset x*₁, x*₂, ..., x*n from Fn, i.e., draw with replacement 135 numbers from the software data.
2. Compute the centered sample mean for the bootstrap dataset: x̄*n − 656.8815, where x̄*n is the sample mean of x*₁, x*₂, ..., x*n.
Repeat steps 1 and 2 one thousand times.

18.8 b Because the interfailure times are now assumed to have an Exp(λ) distribution, we must estimate λ by λ̂ = 1/x̄n = 0.0015 and estimate F by the distribution function of the Exp(0.0015) distribution. Estimate the expectation μ = 1/λ of the Exp(λ) distribution by μ̂ = 1/λ̂ = x̄n = 656.8815. Also now, the bootstrapped centered sample mean is the random variable X̄*n − 656.8815. The corresponding parametric bootstrap simulation is described as follows:
1. Generate a bootstrap dataset x*₁, x*₂, ..., x*n from the Exp(0.0015) distribution.
2. Compute the centered sample mean for the bootstrap dataset: x̄*n − 656.8815, where x̄*n is the sample mean of x*₁, x*₂, ..., x*n.
Repeat steps 1 and 2 one thousand times. We see that in this simulation the bootstrapped centered sample mean is the same in both cases, X̄*n − x̄n, but the corresponding simulation procedures differ in step 1.

18.8 c Estimate λ by λ̂ = (ln 2)/mn = 0.0024 and estimate F by the distribution function of the Exp(0.0024) distribution. Estimate the expectation μ = 1/λ of the Exp(λ) distribution by μ̂ = 1/λ̂ = 418.3816. The corresponding parametric bootstrap simulation is described as follows:
1. Generate a bootstrap dataset x*₁, x*₂, ..., x*n from the Exp(0.0024) distribution.
2. Compute the centered sample mean for the bootstrap dataset: x̄*n − 418.3816, where x̄*n is the sample mean of x*₁, x*₂, ..., x*n.
Repeat steps 1 and 2 one thousand times. We see that in this parametric bootstrap simulation the bootstrapped centered sample mean is different from the one in the empirical bootstrap simulation: X̄*n − mn/ln 2 instead of X̄*n − x̄n.

18.9 Estimate μ by x̄n = 39.85, σ by sn = 2.09, and estimate F by the distribution function of the N(39.85, 4.37) distribution.
1. Generate a bootstrap dataset x*₁, x*₂, ..., x*n from the N(39.85, 4.37) distribution.
2. Compute the centered sample mean for the bootstrap dataset: x̄*n − 39.85, where x̄*n is the sample mean of x*₁, x*₂, ..., x*n.
Repeat steps 1 and 2 one thousand times. Estimate P(|X̄n − μ| > 1) by the relative frequency of bootstrapped centered sample means that are greater than 1 in absolute value:

(number of x̄*n with |x̄*n − 39.85| greater than 1)/1000.

18.10 a Perform the empirical bootstrap simulation as in part a of Exercise 18.8:
1. Generate a bootstrap dataset x*₁, x*₂, ..., x*n from Fn, i.e., draw with replacement 135 numbers from the software data.
2. Compute the centered sample mean for the bootstrap dataset: x̄*n − 656.8815, where x̄*n is the sample mean of x*₁, x*₂, ..., x*n.
Repeat steps 1 and 2 one thousand times. Estimate P(|X̄n − μ| > 10) by the relative frequency of bootstrapped centered sample means that are greater than 10 in absolute value:

(number of x̄*n with |x̄*n − 656.8815| greater than 10)/1000.
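The thousand-iteration recipe of 18.8 a and 18.10 a takes only a few lines; a Python sketch on a small hypothetical stand-in dataset (the 135 software interfailure times are not reproduced here):

```python
import random

def bootstrap_centered_means(data, iterations=1000, seed=1):
    """Empirical bootstrap: resample with replacement from the data itself."""
    rng = random.Random(seed)
    n = len(data)
    mu_hat = sum(data) / n
    return [sum(rng.choice(data) for _ in range(n)) / n - mu_hat
            for _ in range(iterations)]

data = [880, 30, 113, 81, 1070, 10, 50, 330, 16, 29]   # hypothetical stand-in
centered = bootstrap_centered_means(data)
frac = sum(abs(c) > 10 for c in centered) / len(centered)
print(frac)     # bootstrap estimate of P(|Xbar_n - mu| > 10)
```

For the parametric variant of 18.8 b, step 1 would instead draw from the fitted exponential distribution, e.g. via `rng.expovariate(0.0015)`.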


18.10 b Perform the parametric bootstrap simulation as in part b of Exercise 18.8 1. Generate a bootstrap dataset x , x , . . . , x from the Exp (0.0015) distribution. 1 2 n 2. Compute the centered sample mean for the bootstrap dataset: x 656.8815, n where x is the sample mean of x , x , . . . , x . n 1 2 n Repeat steps 1 and 2 one thousand times. Estimate P |Xn | > 10 as in part a. 18.11 Estimate by xn = 39.85, by sn = 2.09, and estimate F by the distribution function of the N (39.85, 4.37) distribution. 1. Generate a bootstrap dataset x , x , . . . , x from the N (39.85, 4.37) distribution. n 2 1 2. Compute the sample mean x , sample standard deviation s and empirical disn n tribution function Fn of x , x , . . . , x . Use these to compute the bootstrapped 1 2 n KS distance t = sup |Fn (a) Fx ,s (a)| n n ks
a

Repeat steps 1 and 2 a large number of times. To investigate to which degree the value 0.0987 agrees with the assumed normality of the dataset, one may compute the percentage of t values that are greater then 0.0987. The closer this percentage ks is to one, the better the normal distribution ts the data. 18.12 a Note that P(Tn 0) = P(Mn ) = 0, because all Xi are less than with probability one. To compute P(Tn 0), note that, since the bootstrap random sample is from the original dataset, we always have that Mn mn . Hence
P(Tn 0) = P(Mn mn ) = P(Mn = mn ) = 1 P(Mn < mn ) . Furthermore, from Exercise 18.4 we know that P(Mn < mn ) = 1 1 n . n

18.12 b Note that Gn (0) = P(Tn 0) = P(Mn 0) = P(X1 0) P(Xn 0) , which is zero because the Xi have a U (0, ) distribution. With part a it follows that

sup |G (t) Gn (t)| |G (0) Gn (0)| = P(Tn 0) = 1 1 n n t

1 n

n

18.12 c From the inequality ex 1 x, it follows that




1 1

1 n

n

1 e1/n

n

= 1 e1 .

18.13 a The density of a U(0, θ) distribution is given by f(x) = 1/θ, for 0 ≤ x ≤ θ. Hence for 0 ≤ a ≤ θ,
F(a) = ∫_0^a (1/θ) dx = a/θ.

18.13 b We have
P(T_n ≤ t) = P(1 − M_n/θ ≤ t) = P(M_n ≥ θ(1 − t)) = 1 − P(M_n < θ(1 − t)).
Using the rule on page 115 about the distribution function of the maximum, it follows that for 0 ≤ t ≤ 1,
G_n(t) = 1 − P(M_n < θ(1 − t)) = 1 − (F(θ(1 − t)))^n = 1 − (1 − t)^n.

18.13 c By the same argument as before,
G*_n(t) = P(T*_n ≤ t) = 1 − P(M*_n < m_n(1 − t)).
Since M*_n is the maximum of a random sample X*_1, X*_2, ..., X*_n from a U(0, m_n) distribution, again the rule on page 115 about the distribution function of the maximum yields that for 0 ≤ t ≤ 1,
G*_n(t) = 1 − P(M*_n < m_n(1 − t)) = 1 − (1 − t)^n.
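The failure of the empirical bootstrap for the maximum can be seen numerically. The following sketch (an illustration, not part of the book's solutions; the dataset, seed, and sample size are chosen arbitrarily) estimates P(M*_n = m_n) by simulation and compares it with the formula 1 − (1 − 1/n)^n derived above.

```python
import random

# Empirical illustration of Exercises 18.12-18.13 (a sketch, not from the book):
# for a bootstrap sample drawn with replacement from the original dataset, the
# bootstrap maximum M*_n equals the sample maximum m_n with probability
# 1 - (1 - 1/n)^n, which stays bounded away from 0 as n grows.
random.seed(1)
n = 20
data = [random.uniform(0, 1) for _ in range(n)]  # U(0, theta) sample, theta = 1
m_n = max(data)

iterations = 20000
hits = 0
for _ in range(iterations):
    boot = [random.choice(data) for _ in range(n)]  # empirical bootstrap sample
    if max(boot) == m_n:
        hits += 1

observed = hits / iterations
expected = 1 - (1 - 1 / n) ** n       # P(M*_n = m_n)
print(round(observed, 3), round(expected, 3))
assert abs(observed - expected) < 0.02
```

Because this probability does not vanish, the bootstrap distribution G*_n cannot converge to G_n, in line with part b.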

19.1 a We must show that E[T] = θ². From the formulas for the expectation and variance of uniform random variables we deduce that E[X_i] = 0 and Var(X_i) = (2θ)²/12 = θ²/3. Hence E[X_i²] = Var(X_i) + (E[X_i])² = θ²/3. Therefore, by linearity of expectations,
E[T] = (3/n)(E[X_1²] + E[X_2²] + · · · + E[X_n²]) = (3/n)(θ²/3 + · · · + θ²/3) = (3/n) · n · (θ²/3) = θ².
Since E[T] = θ², the random variable T is an unbiased estimator for θ².

19.1 b The function g(x) = √x is a strictly concave function, because g''(x) = −x^{−3/2}/4 < 0. Therefore, by Jensen's inequality, E[√T] < √(E[T]). Since from part a we know that E[T] = θ², this means that E[√T] < θ. In other words, √T is a biased estimator for θ, with negative bias.

19.2 a We must check whether E[S] = μ. By linearity of expectations, we have
E[S] = E[X_1/2 + X_2/3 + X_3/6] = E[X_1]/2 + E[X_2]/3 + E[X_3]/6 = (1/2 + 1/3 + 1/6)μ = μ.
So indeed, S is an unbiased estimator for μ.

19.2 b We must check under what conditions E[T] = μ for all μ. By linearity of expectations, we have
E[T] = E[a_1X_1 + a_2X_2 + · · · + a_nX_n] = a_1E[X_1] + a_2E[X_2] + · · · + a_nE[X_n] = (a_1 + a_2 + · · · + a_n)μ.
This is equal to μ if and only if a_1 + a_2 + · · · + a_n = 1. Therefore, T is an unbiased estimator for μ if and only if a_1 + a_2 + · · · + a_n = 1.

19.3 We must check under what conditions E[T] = μ for all μ. By linearity of expectations, we have
E[T] = E[a(X_1 + X_2 + · · · + X_n) + b] = a(E[X_1] + E[X_2] + · · · + E[X_n]) + b = a(μ + · · · + μ) + b = anμ + b.
For T to be an unbiased estimator for μ, one must have anμ + b = μ for all μ, with a and b constants not depending on μ. This can only be the case when b = 0 and a = 1/n.

19.4 a The function g(x) = 1/x is convex, so that according to Jensen's inequality
E[T] = E[1/X̄_n] > 1/E[X̄_n].
Furthermore, E[X̄_n] = E[X_1] = 1/p, so that E[T] > 1/(1/p) = p. We conclude that T is a biased estimator for p, with positive bias.

19.4 b For each i label X_i ≤ 3 as a success; then the total number Y of all X_i ≤ 3 has a Bin(n, θ) distribution, where θ = p + (1 − p)p + (1 − p)²p represents the probability that a woman becomes pregnant within three or fewer cycles. This implies that E[Y] = nθ, and therefore E[S] = E[Y/n] = E[Y]/n = θ. This means that S is an unbiased estimator for θ.

19.5 We must find out for which c, E[T] = μ, where μ = 1/λ is the expectation of the Exp(λ) distribution. Because M_n has an Exp(nλ) distribution with expectation 1/(nλ), we have
E[T] = E[cM_n] = cE[M_n] = c/(nλ).
Hence, with the choice c = n, we get E[T] = 1/λ = μ, which means that T is an unbiased estimator for μ.

19.6 a We must show that E[T] = 1/λ. By linearity of expectations,
E[T] = (n/(n − 1))E[X̄_n] − (n/(n − 1))E[M_n] = (n/(n − 1))(δ + 1/λ) − (n/(n − 1))(δ + 1/(nλ)) = (n/(n − 1)) · (1/λ)(1 − 1/n) = 1/λ.

19.6 b Find a linear combination of X̄_n and M_n of which the expectation is δ. From the expressions for E[X̄_n] and E[M_n] we see that we can eliminate 1/λ by subtracting E[X̄_n] from nE[M_n]. Therefore, first consider nM_n − X̄_n, which has expectation
E[nM_n − X̄_n] = nE[M_n] − E[X̄_n] = nδ + 1/λ − δ − 1/λ = (n − 1)δ.
This means that
T = (nM_n − X̄_n)/(n − 1)
has expectation δ: E[T] = E[(nM_n − X̄_n)/(n − 1)] = δ, so that T is an unbiased estimator for δ.

19.6 c Plug in x̄_n = 8563.5, m_n = 2398, and n = 20 in the estimator of part b:
t = (20 · 2398 − 8563.5)/19 = 2073.5.
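The unbiasedness result of Exercise 19.5 is easy to confirm by simulation. The sketch below (an illustration, not part of the book's solutions; the rate λ = 0.5, sample size, and seed are arbitrary choices) averages the estimator T = nM_n over many samples.

```python
import random

# Quick check of Exercise 19.5 (illustration only): if M_n = min(X_1, ..., X_n)
# with X_i ~ Exp(lambda), then M_n ~ Exp(n*lambda), so E[n * M_n] = 1/lambda.
random.seed(2)
lam = 0.5          # hypothetical rate; the target expectation is mu = 1/lam = 2
n = 10
runs = 50000

total = 0.0
for _ in range(runs):
    sample = [random.expovariate(lam) for _ in range(n)]
    total += n * min(sample)       # the unbiased estimator T = n * M_n

estimate = total / runs
print(round(estimate, 2))          # should be close to 1/lam = 2
assert abs(estimate - 1 / lam) < 0.05
```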


19.7 a Note that by linearity of expectations, E[T_1] = (4/n)E[N_1] − 2. Because N_1 has a Bin(n, p_1) distribution, with p_1 = (θ + 2)/4, it follows that E[N_1] = np_1 = n(θ + 2)/4, so that E[T_1] = θ. The argument for T_2 is similar.

19.7 b Plug in n = 3839, n_1 = 1997, and n_2 = 32 in the estimators T_1 and T_2:
t_1 = 4 · (1997/3839) − 2 = 0.0808,
t_2 = 4 · (32/3839) = 0.0333.

19.8 From the model assumptions it follows that E[Y_i] = βx_i for each i. Using linearity of expectations, this implies that
E[B_1] = (1/n)(E[Y_1]/x_1 + · · · + E[Y_n]/x_n) = (1/n)(βx_1/x_1 + · · · + βx_n/x_n) = β,
E[B_2] = (E[Y_1] + · · · + E[Y_n])/(x_1 + · · · + x_n) = β(x_1 + · · · + x_n)/(x_1 + · · · + x_n) = β,
E[B_3] = (x_1E[Y_1] + · · · + x_nE[Y_n])/(x_1² + · · · + x_n²) = β(x_1² + · · · + x_n²)/(x_1² + · · · + x_n²) = β.

19.9 Write T = e^{−Z/n}, where Z = X_1 + X_2 + · · · + X_n has a Pois(nμ) distribution, with probabilities P(Z = k) = e^{−nμ}(nμ)^k/k!. Therefore
E[T] = E[e^{−Z/n}] = Σ_{k=0}^∞ e^{−k/n} P(Z = k) = e^{−nμ} Σ_{k=0}^∞ (nμe^{−1/n})^k/k!.
Use that Σ_{k=0}^∞ x^k/k! = e^x, with x = nμe^{−1/n}, and conclude that
E[T] = e^{−nμ} e^{nμe^{−1/n}} = e^{−nμ(1 − e^{−1/n})}.

20.1 We have Var(X̄_n) = MSE(X̄_n) = σ²/n, which is decreasing in n, so the larger the sample, the more efficient X̄_n is. In particular, Var(X̄_n)/Var(X̄_2n) = (σ²/n)/(σ²/2n) = 2 shows that X̄_2n is twice as efficient as X̄_n.

20.2 a Compute the mean squared errors of S and T:
MSE(S) = Var(S) + [bias(S)]² = 40 + 0 = 40;
MSE(T) = Var(T) + [bias(T)]² = 4 + 9 = 13.
We prefer T, because it has a smaller MSE.

20.2 b Compute the mean squared errors of S and T: MSE(S) = 40, as in a; MSE(T) = Var(T) + [bias(T)]² = 4 + a². So, if |a| < 6: prefer T. If |a| ≥ 6: prefer S. The preferences are based on the MSE criterion.

20.3 Var(T_1) = 1/(nλ²) and Var(T_2) = 1/λ²; hence we prefer T_1, because of its smaller variance.

20.4 a Since L has the same distribution as N + 1 − M, we find
E[T_3] = E[3L − 1] = E[3N + 3 − 3M − 1] = 3N + 2 − E[3M] = 3N + 2 − E[2T_2 + 2] = 3N + 2 − (2N + 2) = N.
Here we use that E[2T_2] = 2N, since T_2 is unbiased.

20.4 b We have
Var(T_3) = Var(3L − 1) = 9Var(L) = 9Var(M) = (1/2)(N + 1)(N − 2).

20.4 c We compute:
Var(T_3)/Var(T_2) = 9Var(M)/Var((3/2)M − 1) = 9Var(M)/((9/4)Var(M)) = 4.
So using the maximum is 4 times as efficient as using the minimum!

20.5 From the variance-of-the-sum rule: Var((U + V)/2) = (1/4)(Var(U) + Var(V) + 2Cov(U, V)). Using that Var(V) = Var(U), we get from this that the relative efficiency of W with respect to U is equal to
Var(W)/Var(U) = Var((U + V)/2)/Var(U) = (1/4)(1 + 1 + 2Cov(U, V)/Var(U)) = 1/2 + (1/2)ρ(U, V).
Since the correlation coefficient is always between −1 and 1, it follows that this relative efficiency is always between 0 and 1. Hence it is always better (in the MSE sense) to use W.

20.6 a By linearity of expectations, E[U_1] = E[T_1] + (1/3)(π − E[T_1] − E[T_2] − E[T_3]), which equals θ_1 + (1/3)(π − θ_1 − θ_2 − θ_3) = θ_1. To compute the variance, rewrite U_1 as U_1 = (2/3)T_1 − (1/3)T_2 − (1/3)T_3 + π/3. Then, by independence, Var(U_1) = (4/9)Var(T_1) + (1/9)Var(T_2) + (1/9)Var(T_3) = (2/3)σ².

20.6 b We have Var(T_1)/Var(U_1) = 3/2, so U_1 is 50% more efficient than T_1.

20.6 c There are at least two ways to obtain an efficient estimator for θ_1 = θ_2. The first is via the insight that T_1 and T_2 both estimate θ_1, so (T_1 + T_2)/2 is an efficient estimator for θ_1 (cf. Exercise 20.1). Then we can improve the efficiency in the same way as in part a. This yields the estimator
V_1 = (1/2)(T_1 + T_2) + (1/3)(π − T_1 − T_2 − T_3) = (1/6)T_1 + (1/6)T_2 − (1/3)T_3 + π/3.
The second way is to find the linear estimator V_1 = uT_1 + vT_2 + wT_3 + t with the smallest MSE, optimising over u, v, w, and t. This will result in the same estimator as obtained with the first way.

20.7 We compare the MSEs, which by unbiasedness are equal to the variances. Both N_1 and N_2 are binomially distributed, so Var(N_1) = np_1(1 − p_1) = n((θ + 2)/4)(1 − (θ + 2)/4), and Var(N_2) = np_2(1 − p_2) = n(θ/4)(1 − θ/4). It follows that
Var(T_1) = (16/n²)Var(N_1) = (4 − θ²)/n; Var(T_2) = (16/n²)Var(N_2) = θ(4 − θ)/n.
Since 4 − θ² > 4θ − θ² for all 0 < θ < 1, we prefer T_2.

20.8 a This follows directly from linearity of expectations:
E[T] = E[rX̄_n + (1 − r)Ȳ_m] = rE[X̄_n] + (1 − r)E[Ȳ_m] = rμ + (1 − r)μ = μ.

20.8 b Using that X̄_n and Ȳ_m are independent, we find
MSE(T) = Var(T) = r²Var(X̄_n) + (1 − r)²Var(Ȳ_m) = r²σ²/n + (1 − r)²σ²/m.
To find the minimum of this parabola we differentiate with respect to r and equate the result to 0: 2rσ²/n − 2(1 − r)σ²/m = 0, that is, 2rm − 2n(1 − r) = 0, or r = n/(n + m). This gives the minimum value.

20.9 a Since E[T_1] = p, T_1 is unbiased. The estimator T_2 takes only the values 0 and 1, the latter with probability p^n. So E[T_2] = p^n, and T_2 is biased for all n > 1.

20.9 b Since T_1 is unbiased, MSE(T_1) = Var(T_1) = np(1 − p)/n² = p(1 − p)/n. Now for T_2: this random variable has a Ber(p^n) distribution, hence
MSE(T_2) = E[(T_2 − p)²] = p²P(T_2 = 0) + (1 − p)²P(T_2 = 1) = p²(1 − p^n) + (1 − p)²p^n = p^n − 2p^{n+1} + p².

20.9 c For n = 2: MSE(T_2) = 4p · MSE(T_1), so for p < 1/4, T_2 is more efficient than T_1, but otherwise T_1 is more efficient.

20.10 a Recall that the variance of an Exp(λ) distribution equals 1/λ², hence the mean squared error of T equals
MSE(T) = Var(T) + (E[T] − 1/λ)² = c²n/λ² + (cn/λ − 1/λ)² = (1/λ²)(c²n + (cn − 1)²) = (1/λ²)(c²(n² + n) − 2cn + 1).

20.10 b This is a parabola in c, taking its smallest value when 2c(n² + n) = 2n, which happens for c = 1/(n + 1). So the estimator with the smallest MSE is
U = (X_1 + X_2 + · · · + X_n)/(n + 1).
Substituting c = 1/n and c = 1/(n + 1) in the formula for the MSE, we find that
MSE(X̄_n) = 1/(nλ²) and MSE(U) = 1/((n + 1)λ²).
So U performs better in terms of MSE, but not much.

20.11 We have MSE(T_1) = Var(T_1), which equals
Var(T_1) = Var(Σ_{i=1}^n x_iY_i / Σ_{i=1}^n x_i²) = Σ_{i=1}^n x_i²Var(Y_i) / (Σ_{i=1}^n x_i²)² = σ² / Σ_{i=1}^n x_i².
Also, MSE(T_2) = Var(T_2), which equals
Var(T_2) = Var((1/n) Σ_{i=1}^n Y_i/x_i) = (1/n²) Σ_{i=1}^n Var(Y_i)/x_i² = (σ²/n²) Σ_{i=1}^n 1/x_i².
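The optimal weight derived in Exercise 20.8 b can be checked numerically. The following sketch (illustrative only; σ², n, and m are hypothetical values, not from the book) minimizes the MSE expression over a grid of weights r.

```python
# Sketch for Exercise 20.8 b (not part of the book's solution): the MSE of
# T = r*Xbar_n + (1 - r)*Ybar_m is r^2*sigma^2/n + (1 - r)^2*sigma^2/m,
# which is minimized at r = n/(n + m).
def mse(r, sigma2, n, m):
    return r**2 * sigma2 / n + (1 - r) ** 2 * sigma2 / m

sigma2, n, m = 4.0, 10, 30        # hypothetical variance and sample sizes
r_opt = n / (n + m)               # analytic optimum = 0.25
grid = [i / 1000 for i in range(1001)]
r_best = min(grid, key=lambda r: mse(r, sigma2, n, m))
print(r_opt, r_best)
assert abs(r_best - r_opt) < 1e-3
```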

Finally, MSE(T_3) = Var(T_3), which equals
Var(T_3) = Var(Σ_{i=1}^n Y_i / Σ_{i=1}^n x_i) = Σ_{i=1}^n Var(Y_i) / (Σ_{i=1}^n x_i)² = nσ² / (Σ_{i=1}^n x_i)².
To show that Var(T_3) < Var(T_2): introduce a random variable X by P(X = x_i) = 1/n for i = 1, ..., n, and apply Jensen's inequality with the function g(x) = 1/x², which is strictly convex on (0, ∞) (note that all x_i in the cherry tree example are positive): Var(T_2) = (σ²/n)E[g(X)] > (σ²/n)g(E[X]) = nσ²/(Σ x_i)² = Var(T_3).

20.12 a Following the hint:
P(M_n ≤ k) = P(X_1 ≤ k, X_2 ≤ k, ..., X_n ≤ k) = (k/N) · ((k − 1)/(N − 1)) · · · ((k − n + 1)/(N − n + 1)) = (k!/(k − n)!) · ((N − n)!/N!).

20.12 b To have M_n = n we should have drawn all the numbers 1, 2, ..., n, and conversely, so
P(M_n = n) = P(M_n ≤ n) = n!(N − n)!/N!.

20.12 c This also follows directly from part a:
P(M_n = k) = P(M_n ≤ k) − P(M_n ≤ k − 1)
= (k!/(k − n)!) · ((N − n)!/N!) − ((k − 1)!/(k − 1 − n)!) · ((N − n)!/N!)
= ((k − 1)!/(k − n)!) · [k − (k − n)] · ((N − n)!/N!)
= n · ((k − 1)!/(k − n)!) · ((N − n)!/N!).

21.1 Setting X_i = j if red appears in the ith experiment for the first time on the jth throw, we have that X_1, X_2, and X_3 are independent Geo(p) distributed random variables, where p is the probability that red appears when throwing the selected die. The likelihood function is
L(p) = P(X_1 = 3, X_2 = 5, X_3 = 4) = (1 − p)²p · (1 − p)⁴p · (1 − p)³p = p³(1 − p)⁹,
so for D_1 one has that L(5/6) = (5/6)³(1/6)⁹, whereas for D_2 one has that L(1/6) = (1/6)³(5/6)⁹ = 5⁶ · L(5/6). It is very likely that we picked D_2.

21.2 As in the solution of Exercise 21.1, the likelihood is given by L(p) = p³(1 − p)⁹, where p is between 0 and 1. So the loglikelihood is given by
ℓ(p) = 3 ln p + 9 ln(1 − p),
and differentiating the loglikelihood with respect to p gives
dℓ(p)/dp = 3/p − 9/(1 − p).
We find that dℓ(p)/dp = 0 if and only if p = 1/4, and since ℓ(p) (and also L(p)!) attains its maximum for this value of p, we find that the maximum likelihood estimate of p is p̂ = 1/4.
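The maximum of the loglikelihood ℓ(p) = 3 ln p + 9 ln(1 − p) can also be located numerically. The grid search below is a sketch for illustration, not part of the book's solutions.

```python
import math

# Numerical check of Exercises 21.1-21.2 (illustration only): the loglikelihood
# l(p) = 3*ln(p) + 9*ln(1 - p) is maximized at p = 1/4, matching the calculus
# solution d/dp l(p) = 3/p - 9/(1 - p) = 0.
def loglik(p):
    return 3 * math.log(p) + 9 * math.log(1 - p)

grid = [i / 10000 for i in range(1, 10000)]   # p in (0, 1), endpoints excluded
p_hat = max(grid, key=loglik)
print(p_hat)
assert abs(p_hat - 0.25) < 1e-3
```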

21.3 a The likelihood L(μ) is given by
L(μ) = C · P(X = 0)^229 P(X = 1)^211 P(X = 2)^93 P(X = 3)^35 P(X = 4)^7 P(X = 7) = C μ^537 e^{−576μ}.
But then the loglikelihood ℓ(μ) satisfies ℓ(μ) = ln C + 537 ln μ − 576μ. We then have that ℓ′(μ) = 537/μ − 576, from which it follows that ℓ′(μ) = 0 if and only if μ = 537/576 = 0.9323.

21.3 b Now the likelihood L(μ) is given by
L(μ) = C · P(X = 0 or 1)^440 P(X = 2)^93 P(X = 3)^35 P(X = 4)^7 P(X = 7) = C (1 + μ)^440 μ^326 e^{−576μ}.
But then the loglikelihood ℓ(μ) satisfies ℓ(μ) = ln C + 440 ln(1 + μ) + 326 ln μ − 576μ. We then have that
ℓ′(μ) = 440/(1 + μ) + 326/μ − 576,
from which it follows that ℓ′(μ) = 0 if and only if 576μ² − 190μ − 326 = 0. We find that μ̂ = (190 + √787204)/1152 = 0.9351.

21.4 a The likelihood L(μ) is given by
L(μ) = P(X_1 = x_1, ..., X_n = x_n) = P(X_1 = x_1) · · · P(X_n = x_n)
= (e^{−μ}μ^{x_1}/x_1!) · · · (e^{−μ}μ^{x_n}/x_n!) = e^{−nμ} μ^{x_1+x_2+···+x_n}/(x_1! · · · x_n!).

21.4 b We find that the loglikelihood ℓ(μ) is given by
ℓ(μ) = (Σ_{i=1}^n x_i) ln(μ) − ln(x_1! · · · x_n!) − nμ.
Hence
dℓ/dμ = (Σ x_i)/μ − n,
and we find, after checking that we indeed have a maximum, that x̄_n is the maximum likelihood estimate for μ.

21.4 c In b we have seen that x̄_n is the maximum likelihood estimate for μ. Due to the invariance principle from Section 21.4 we thus find that e^{−x̄_n} is the maximum likelihood estimate for e^{−μ}.

21.5 a By definition, the likelihood L(μ) is given by
L(μ) = f(x_1) · · · f(x_n) = (1/√(2π)) e^{−(x_1−μ)²/2} · · · (1/√(2π)) e^{−(x_n−μ)²/2} = (2π)^{−n/2} e^{−(1/2) Σ_{i=1}^n (x_i−μ)²}.
But then the loglikelihood ℓ(μ) satisfies
ℓ(μ) = −(n/2) ln(2π) − (1/2) Σ_{i=1}^n (x_i − μ)².
Differentiating ℓ(μ) with respect to μ yields
ℓ′(μ) = Σ_{i=1}^n (x_i − μ) = −nμ + Σ_{i=1}^n x_i.
So ℓ′(μ) = 0 if and only if μ = x̄_n. Since ℓ(μ) attains a maximum at this value of μ (check this!), we find that x̄_n is the maximum likelihood estimate for μ.

21.5 b Again by the definition of the likelihood, we find that L(σ) is given by
L(σ) = f(x_1) · · · f(x_n) = (1/(σ√(2π))) e^{−x_1²/(2σ²)} · · · (1/(σ√(2π))) e^{−x_n²/(2σ²)} = σ^{−n}(2π)^{−n/2} e^{−(1/(2σ²)) Σ_{i=1}^n x_i²}.
But then the loglikelihood ℓ(σ) satisfies
ℓ(σ) = −n ln σ − (n/2) ln(2π) − (1/(2σ²)) Σ_{i=1}^n x_i².
Differentiating ℓ(σ) with respect to σ yields
ℓ′(σ) = −n/σ + (1/σ³) Σ_{i=1}^n x_i².
So ℓ′(σ) = 0 if and only if σ = √((1/n) Σ_{i=1}^n x_i²). Since ℓ(σ) attains a maximum at this value of σ (check this!), we find that √((1/n) Σ_{i=1}^n x_i²) is the maximum likelihood estimate for σ.

21.6 a In this case the likelihood function is L(δ) = 0 for δ > x_(1), and
L(δ) = e^{nδ − Σ x_i} for δ ≤ x_(1).

21.6 b It follows from the graph of L(δ) in a that x_(1) is the maximum likelihood estimate for δ.

21.7 By definition, the likelihood L(θ) is given by
L(θ) = f(x_1) · · · f(x_n) = (x_1/θ²) e^{−x_1²/(2θ²)} · · · (x_n/θ²) e^{−x_n²/(2θ²)} = θ^{−2n} (Π_{i=1}^n x_i) e^{−(1/(2θ²)) Σ_{i=1}^n x_i²}.
But then the loglikelihood ℓ(θ) is equal to
ℓ(θ) = −2n ln θ + ln(Π_{i=1}^n x_i) − (1/(2θ²)) Σ_{i=1}^n x_i².
Differentiating ℓ(θ) with respect to θ yields
ℓ′(θ) = −2n/θ + (1/θ³) Σ_{i=1}^n x_i²,
and we find that ℓ′(θ) = 0 if and only if
θ = √((1/(2n)) Σ_{i=1}^n x_i²).
Since ℓ(θ) attains a maximum at this value of θ (check this!), we find that the maximum likelihood estimate for θ is given by √((1/(2n)) Σ_{i=1}^n x_i²).

21.8 a The likelihood L(θ) is given by
L(θ) = C ((2 + θ)/4)^1997 (θ/4)^32 ((1 − θ)/4)^906 ((1 − θ)/4)^904 = (C/4^3839) (2 + θ)^1997 θ^32 (1 − θ)^1810,
where C is the number of ways we can assign 1997 starchy-greens, 32 sugary-whites, 906 starchy-whites, and 904 sugary-greens to 3839 plants. Hence the loglikelihood ℓ(θ) is given by
ℓ(θ) = ln(C) − 3839 ln(4) + 1997 ln(2 + θ) + 32 ln(θ) + 1810 ln(1 − θ).

21.8 b A short calculation shows that
dℓ(θ)/dθ = 0 ⟺ 3839θ² + 1655θ − 64 = 0,
so the maximum likelihood estimate of θ is (after checking that L(θ) indeed attains a maximum for this value of θ):
θ̂ = (−1655 + √(1655² + 4 · 3839 · 64))/(2 · 3839) = (−1655 + √3721809)/7678 = 0.0357.

21.8 c In this general case the likelihood L(θ) is given by
L(θ) = C ((2 + θ)/4)^{n_1} (θ/4)^{n_2} ((1 − θ)/4)^{n_3} ((1 − θ)/4)^{n_4} = (C/4^n) (2 + θ)^{n_1} θ^{n_2} (1 − θ)^{n_3 + n_4},
where C is the number of ways we can assign n_1 starchy-greens, n_2 sugary-whites, n_3 starchy-whites, and n_4 sugary-greens to n plants. Hence the loglikelihood ℓ(θ) is given by
ℓ(θ) = ln(C) − n ln(4) + n_1 ln(2 + θ) + n_2 ln(θ) + (n_3 + n_4) ln(1 − θ).
A short calculation shows that
dℓ(θ)/dθ = 0 ⟺ nθ² − (n_1 − n_2 − 2n_3 − 2n_4)θ − 2n_2 = 0,
so the maximum likelihood estimate of θ is (after checking that L(θ) indeed attains a maximum for this value of θ):
θ̂ = (n_1 − n_2 − 2n_3 − 2n_4 + √((n_1 − n_2 − 2n_3 − 2n_4)² + 8nn_2))/(2n).

21.9 The probability density of this distribution is given by f_{α,β}(x) = 0 if x is not between α and β, and
f_{α,β}(x) = 1/(β − α) for α ≤ x ≤ β.
Since the x_i must be in the interval between α and β, the likelihood (which is a function of two variables!) is given by
L(α, β) = 1/(β − α)^n for α ≤ x_(1) and β ≥ x_(n),
and L(α, β) = 0 for all other values of α and β. So outside the rectangle (−∞, x_(1)] × [x_(n), ∞) the likelihood is zero, and clearly on this rectangle it attains its maximum in (x_(1), x_(n)). The maximum likelihood estimates of α and β are therefore α̂ = x_(1) and β̂ = x_(n).

21.10 The likelihood is
L(α) = Π_{i=1}^n α/x_i^{α+1} = α^n (Π_{i=1}^n x_i)^{−(α+1)},
so the loglikelihood is
ℓ(α) = n ln α − (α + 1) ln(Π_{i=1}^n x_i).
Differentiating ℓ(α) with respect to α yields as maximum likelihood estimate α̂ = n/ln(Π_{i=1}^n x_i).

21.11 a Since the dataset is a realization of a random sample from a Geo(1/N) distribution, the likelihood is L(N) = P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n), where each X_i has a Geo(1/N) distribution. So
L(N) = (1 − 1/N)^{x_1−1}(1/N) · (1 − 1/N)^{x_2−1}(1/N) · · · (1 − 1/N)^{x_n−1}(1/N) = (1/N)^n (1 − 1/N)^{−n + Σ_{i=1}^n x_i}.
But then the loglikelihood is equal to
ℓ(N) = −n ln N + (−n + Σ_{i=1}^n x_i) ln(1 − 1/N).
Differentiating with respect to N yields
dℓ(N)/dN = −n/N + (−n + Σ_{i=1}^n x_i) · 1/(N(N − 1)).
Now dℓ(N)/dN = 0 if and only if N = x̄_n. Because ℓ(N) attains its maximum at x̄_n, we find that the maximum likelihood estimate of N is N̂ = x̄_n.

21.11 b Since P(Y = k) = 1/N for k = 1, 2, ..., N, the likelihood is given by
L(N) = (1/N)^n for N ≥ y_(n),
and L(N) = 0 for N < y_(n). So L(N) attains its maximum at y_(n); the maximum likelihood estimate of N is N̂ = y_(n).

21.12 Since L(N) = P(Z = k), it follows from Exercise 4.13c that
L(N) = C(m, k) · C(N − m, r − k) / C(N, r),
where C(a, b) denotes the binomial coefficient "a choose b". In order to see that L(N) increases for N < mr/k and decreases for N > mr/k, consider the ratio L(N)/L(N − 1). After some elementary calculations one finds that
L(N)/L(N − 1) = (N − m)(N − r) / (N(N − m − r + k)).
So L(N) is increasing if L(N)/L(N − 1) = (N − m)(N − r)/(N(N − m − r + k)) > 1, and another elementary calculation shows that this is the case when N < mr/k.

21.13 Let N(t_1, t_2) be the number of customers arriving in the shop between time t_1 and time t_2. Then it follows from the assumption that customers arrive at the shop according to a Poisson process with rate λ, that
P(N(t_1, t_2) = k) = (λ(t_2 − t_1))^k e^{−λ(t_2 − t_1)}/k!, for k = 0, 1, 2, ....
The likelihood L(λ) is given by
L(λ) = P(N(12.00, 12.15) = 2, N(12.15, 12.45) = 0, N(12.45, 13.00) = 1) + P(N(12.00, 12.15) = 1, N(12.15, 12.45) = 1, N(12.45, 13.00) = 0).
Since N(12.00, 12.15), N(12.15, 12.45), and N(12.45, 13.00) are independent random variables, we find that
L(λ) = [(λ/4)²/2!] e^{−λ/4} · [(λ/2)⁰/0!] e^{−λ/2} · [(λ/4)¹/1!] e^{−λ/4} + [(λ/4)¹/1!] e^{−λ/4} · [(λ/2)¹/1!] e^{−λ/2} · [(λ/4)⁰/0!] e^{−λ/4}
= (λ³/128 + λ²/8) e^{−λ}.
Now L′(λ) = 0 if and only if −λ³ − 13λ² + 32λ = 0, that is, λ² + 13λ − 32 = 0. Since λ > 0, we find that λ̂ = (−13 + √297)/2 = 2.1168.
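As a numerical cross-check (an illustration, not part of the book's solution), one can verify that the likelihood above is indeed stationary and maximal at λ̂ = (−13 + √297)/2:

```python
import math

# Check of Exercise 21.13 (illustration only): L(lam) = (lam^3/128 + lam^2/8)*exp(-lam)
# has a stationary point at the positive root of lam^2 + 13*lam - 32 = 0.
lam_hat = (-13 + math.sqrt(297)) / 2
print(round(lam_hat, 4))            # approximately 2.1168

def L(lam):
    return (lam**3 / 128 + lam**2 / 8) * math.exp(-lam)

eps = 1e-6
deriv = (L(lam_hat + eps) - L(lam_hat - eps)) / (2 * eps)  # central difference
assert abs(deriv) < 1e-6                                   # stationary point
assert L(lam_hat) > L(lam_hat + 0.1) and L(lam_hat) > L(lam_hat - 0.1)
```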

21.14 For i = 1, 2, ..., n, r_i is the realization of a continuous random variable R_i. Since the shots at the disc do not influence each other, the R_i are independent, and all have the same distribution function F(x) = x²/θ² if x is between 0 and θ. But then f(x) = 2x/θ² for x between 0 and θ (and f(x) = 0 otherwise). Since the disc is hit at each of the n shots, the likelihood is
L(θ) = f(r_1)f(r_2) · · · f(r_n) = (2r_1/θ²)(2r_2/θ²) · · · (2r_n/θ²) = 2^n (Π_{i=1}^n r_i)/θ^{2n},
for θ ≥ r_(n), and L(θ) = 0 otherwise. But then we at once see that L(θ) attains its maximum at θ = r_(n), i.e., the maximum likelihood estimate for θ is θ̂ = r_(n).

21.15 At temperature t, the probability of failure of an O-ring is given by p(t), so the probability of a failure of k O-rings at this temperature, for k = 0, 1, ..., 6, is given by
C(6, k) (p(t))^k (1 − p(t))^{6−k},
where C(6, k) denotes the binomial coefficient "6 choose k". Setting
C = C(6, 0)^16 · C(6, 1)^5 · C(6, 2)^2,
we find that the likelihood L(a, b) is given by
L(a, b) = C (p(53))² (1 − p(53))⁴ · · · = C Π_{i=1}^{23} (p(t_i))^{n_i} (1 − p(t_i))^{6−n_i},
where t_i is the temperature and n_i is the number of failing O-rings during the ith launch, for i = 1, 2, ..., 23. But then the loglikelihood ℓ(a, b) is given by
ℓ(a, b) = ln C + Σ_{i=1}^{23} n_i ln p(t_i) + Σ_{i=1}^{23} (6 − n_i) ln(1 − p(t_i)).

21.16 Since s_n is the realization of a Bin(n, p) distributed random variable S_n, we find that the likelihood L(p) is given by
L(p) = C(n, s_n) p^{s_n} (1 − p)^{n − s_n},
from which we see that the loglikelihood ℓ(p) satisfies
ℓ(p) = ln C(n, s_n) + s_n ln p + (n − s_n) ln(1 − p).
But then differentiating ℓ(p) with respect to p yields
ℓ′(p) = s_n/p − (n − s_n)/(1 − p),
and we find that ℓ′(p) = 0 if and only if p = s_n/n. Since ℓ(p) attains a maximum at this value of p (check this!), we have obtained that s_n/n is the maximum likelihood estimate (and S_n/n the maximum likelihood estimator) for p. Due to the invariance principle from Section 21.4 we find that 2n/s_n is the maximum likelihood estimate for μ, and that
T = 2n/S_n
is the maximum likelihood estimator for μ.

22.1 a Since Σx_iy_i = 12.4, Σx_i = 9, Σy_i = 4.8, Σx_i² = 35, and n = 3, we find (cf. (22.1) and (22.2)) that
β̂ = (nΣx_iy_i − (Σx_i)(Σy_i))/(nΣx_i² − (Σx_i)²) = (3 · 12.4 − 9 · 4.8)/(3 · 35 − 9²) = −1/4,
and α̂ = ȳ_n − β̂x̄_n = 2.35.

22.1 b Since r_i = y_i − α̂ − β̂x_i, for i = 1, ..., n, we find
r_1 = 2 − 2.35 + 0.25 = −0.1, r_2 = 1.8 − 2.35 + 0.75 = 0.2, r_3 = 1 − 2.35 + 1.25 = −0.1,
and r_1 + r_2 + r_3 = −0.1 + 0.2 − 0.1 = 0.

22.1 c See Figure 29.2.
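The computation of Exercise 22.1 can be reproduced directly from formulas (22.1) and (22.2). In the sketch below the three data points are inferred from the sums used above (Σx = 9, Σx² = 35, Σy = 4.8, Σxy = 12.4); they are an assumption for illustration, not quoted from the book.

```python
# Least squares slope and intercept for Exercise 22.1, using formulas (22.1)
# and (22.2). The points (1, 2), (3, 1.8), (5, 1) are inferred from the sums
# in the solution above and are assumed, not copied from the book.
x = [1.0, 3.0, 5.0]
y = [2.0, 1.8, 1.0]
n = len(x)

sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))

beta = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope, formula (22.1)
alpha = sy / n - beta * sx / n                   # intercept, formula (22.2)
residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]

print(beta, alpha)                               # -0.25 and 2.35
assert abs(beta + 0.25) < 1e-9 and abs(alpha - 2.35) < 1e-9
assert abs(sum(residuals)) < 1e-9                # residuals sum to zero
```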
[Figure: the three data points, the point (x̄_3, ȳ_3), and the estimated regression line y = 2.35 − 0.25x.]
Fig. 29.2. Solution of Exercise 22.1 c.


22.2 As in the previous exercise, we have that Σx_iy_i = 12.4, Σx_i = 9, Σy_i = 4.8, and Σx_i² = 35. However, now n = 4, and we find that
β̂ = (nΣx_iy_i − (Σx_i)(Σy_i))/(nΣx_i² − (Σx_i)²) = (4 · 12.4 − 9 · 4.8)/(4 · 35 − 9²) = 0.10847.
Since we now have that x̄_n = 9/4 and ȳ_n = 1.2, we find that α̂ = ȳ_n − β̂x̄_n = 0.9559.

22.3 a Ignoring the subscripts to the sums, we have
Σx_i = 10, Σx_i² = 21.84, Σy_i = 20, and Σx_iy_i = 41.61.
From (22.1) we find that the least squares estimate of β is given by
β̂ = (5 · 41.61 − 10 · 20)/(5 · 21.84 − 10²) = 0.875,
while from (22.2) it follows that the least squares estimate of α is given by
α̂ = 20/5 − 0.875 · (10/5) = 2.25.

22.3 b See Figure 29.3.

[Figure: the five data points, the point (x̄_5, ȳ_5), and the estimated regression line y = 2.25 + 0.875x.]

Fig. 29.3. Solution of Exercise 22.3 b.

22.4 Since the least squares estimate of β is given by (22.1), we find that
β̂ = (100 · 5189 − 231.7 · 321)/(100 · 2400.8 − 231.7²) = 2.385.
From (22.2) we find that α̂ = 3.21 − 2.385 · 2.317 = −2.316.

22.5 With the assumption that α = 0, the method of least squares tells us now to minimize
S(β) = Σ_{i=1}^n (y_i − βx_i)².
Now
dS(β)/dβ = −2 Σ_{i=1}^n (y_i − βx_i)x_i = −2 (Σ_{i=1}^n x_iy_i − β Σ_{i=1}^n x_i²),
so
dS(β)/dβ = 0 ⟺ β = Σ_{i=1}^n x_iy_i / Σ_{i=1}^n x_i².
Because S(β) has a minimum for this last value of β, we see that the least squares estimator of β is given by
β̂ = Σ_{i=1}^n x_iY_i / Σ_{i=1}^n x_i².

22.6 In Exercise 22.5 we have seen that
β̂ = Σ_{i=1}^n x_iY_i / Σ_{i=1}^n x_i²,
so for the timber dataset we find that β̂ = 34.13. We use the estimated regression line y = 34.13x to predict the Janka hardness. For density x = 65 we find as a prediction for the Janka hardness y = 34.13 · 65 = 2218.45.

22.7 The sum of squares function S(α, β) is now given by
S(α, β) = Σ_{i=1}^n (y_i − e^{α + βx_i})².
In order to find the values of α and β for which S(α, β) attains a minimum, one could differentiate S(α, β) with respect to α and β. However, this does not yield workable expressions such as those in (22.1) and (22.2). In order to find α̂ and β̂ for a given bivariate dataset (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), an iterative method is best suited.

22.8 In the model with intercept, α and β are chosen in such a way that S(α, β) = Σ_{i=1}^n r_i² is minimal. In the model with no intercept we choose β in such a way that S(0, β) = Σ_{i=1}^n r_i² is minimal. Clearly, in this last model the residual sum of squares Σ_{i=1}^n r_i² is greater than or equal to the residual sum of squares in the model with intercept.
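The inequality of Exercise 22.8 between the two residual sums of squares is easy to see numerically. The data below are invented for illustration (not the book's timber dataset); the comparison holds for any data, since the intercept model minimizes over a larger set of lines.

```python
# Sketch for Exercises 22.5 and 22.8 (illustrative data, not the book's):
# the no-intercept least squares slope is beta0 = Sum(x_i*y_i)/Sum(x_i^2),
# and its residual sum of squares can never be smaller than that of the
# model that also fits an intercept.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.2, 3.8]
n = len(x)

sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
beta0 = sxy / sxx                                  # no-intercept slope (22.5)

sx, sy = sum(x), sum(y)
beta = (n * sxy - sx * sy) / (n * sxx - sx**2)     # slope with intercept
alpha = sy / n - beta * sx / n                     # intercept

ss_no_int = sum((yi - beta0 * xi) ** 2 for xi, yi in zip(x, y))
ss_int = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
print(round(ss_int, 4), round(ss_no_int, 4))
assert ss_int <= ss_no_int                          # Exercise 22.8
```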

22.9 With the assumption that β = 0, the method of least squares tells us now to minimize
S(α) = Σ_{i=1}^n (y_i − α)².
Now
dS(α)/dα = −2 Σ_{i=1}^n (y_i − α) = −2(nȳ_n − nα) = −2n(ȳ_n − α).
So dS(α)/dα = 0 if and only if α = ȳ_n. Since S(α) has a minimum for this last value of α, we see that Ȳ_n is the least squares estimator of α.

22.10 a One has that n = 3, Σx_i = 3, Σx_iy_i = 2, Σy_i = 4, and Σx_i² = 5, so
β̂ = (3 · 2 − 3 · 4)/(3 · 5 − 3²) = −1.
Since x̄_3 = 1 and ȳ_3 = 4/3, we find that α̂ = ȳ_3 − β̂x̄_3 = 7/3. Since r_1 = −1/3, r_2 = 2/3, and r_3 = −1/3, we find that A(α̂, β̂) = |r_1| + |r_2| + |r_3| = 4/3. Note that S(α̂, β̂) = r_1² + r_2² + r_3² = 2/3.

22.10 b A(α, −1) = |2 − α| + |3 − α| + |2 − α|. For 2 ≤ α ≤ 3 we find that A(α, −1) = α − 1. For α ≥ 3 we find that A(α, −1) = 3α − 7, while for α < 2 we find that A(α, −1) = 7 − 3α. We find that A(α, −1) < A(α̂, β̂) = 4/3 when α is between 17/9 and 7/3. Note that A(α, −1) is minimal (and equal to 1) for α = 2.

22.10 c Since A(2, −1) = 1, we must have that 1 < α < 3 (since otherwise A(α, β) ≥ |2 − α + 0 · β| ≥ 1). Clearly we must have that β < 0 (since otherwise A(α, β) ≥ |0 − α − 2β| > 1). Considering the various cases for β yields that A(α, β) attains its minimum at α = 2 and β = −1.

22.11 a In the present set-up one has to minimize
S(β, γ) = Σ_{i=1}^n (y_i − (βx_i + γx_i²))².
Differentiating S(β, γ) with respect to β and γ yields
∂S/∂β = −2 Σ_{i=1}^n x_i(y_i − βx_i − γx_i²) and ∂S/∂γ = −2 Σ_{i=1}^n x_i²(y_i − βx_i − γx_i²).
We find that
∂S/∂β = 0 ⟺ β Σ_{i=1}^n x_i² + γ Σ_{i=1}^n x_i³ = Σ_{i=1}^n x_iy_i,
∂S/∂γ = 0 ⟺ β Σ_{i=1}^n x_i³ + γ Σ_{i=1}^n x_i⁴ = Σ_{i=1}^n x_i²y_i.

22.11 b Using Cramer's rule from linear algebra now yields that
β̂ = [(Σ x_iY_i)(Σ x_i⁴) − (Σ x_i³)(Σ x_i²Y_i)] / [(Σ x_i²)(Σ x_i⁴) − (Σ x_i³)²]
and
γ̂ = [(Σ x_i²)(Σ x_i²Y_i) − (Σ x_i³)(Σ x_iY_i)] / [(Σ x_i²)(Σ x_i⁴) − (Σ x_i³)²].
Since the graph of S(β, γ) is bowl-shaped, the above stationary point (β̂, γ̂) is a global minimum for S(β, γ). This finishes the exercise.

22.12 a Since the denominator of β̂ is a number, not a random variable, one has that
E[β̂] = E[n(Σ x_iY_i) − (Σ x_i)(Σ Y_i)] / (n Σ x_i² − (Σ x_i)²).
Furthermore, the numerator of this last fraction can be written as

E[n Σ x_iY_i] − E[(Σ x_i)(Σ Y_i)],
which is equal to
n Σ x_i E[Y_i] − (Σ x_i) Σ E[Y_i].

22.12 b Substituting E[Y_i] = α + βx_i in the last expression, we find that
E[β̂] = [n Σ x_i(α + βx_i) − (Σ x_i) Σ (α + βx_i)] / (n Σ x_i² − (Σ x_i)²).

22.12 c The numerator of the previous expression for E[β̂] can be simplified to
nα Σ x_i + nβ Σ x_i² − nα Σ x_i − β(Σ x_i)²,
which is equal to
β(n Σ x_i² − (Σ x_i)²).

22.12 d From c it now follows that E[β̂] = β, i.e., β̂ is an unbiased estimator for β.

23.1 This is the case: normal data with variance known. So we should use the formula from Section 23.2 (the case variance known):
(x̄_n − z_{α/2} σ/√n, x̄_n + z_{α/2} σ/√n),
where x̄_n = 743, σ = 5, and n = 16. Because z_{α/2} = z_{0.025} = 1.96, the 95% confidence interval is:
(743 − 1.96 · 5/√16, 743 + 1.96 · 5/√16) = (740.55, 745.45).
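The interval of Exercise 23.1 can be recomputed in a few lines. This sketch is illustrative and simply plugs the numbers from the solution into the known-variance formula.

```python
import math

# Recomputing the interval of Exercise 23.1: normal data with known sigma = 5,
# n = 16, sample mean 743, and critical value z_{0.025} = 1.96.
xbar, sigma, n, z = 743.0, 5.0, 16, 1.96
half = z * sigma / math.sqrt(n)
lo, hi = xbar - half, xbar + half
print(lo, hi)                      # (740.55, 745.45)
assert abs(lo - 740.55) < 1e-9 and abs(hi - 745.45) < 1e-9
```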

23.2 This is the case: normal data with variance unknown. So we should use the formula from Section 23.2 (the case variance unknown):
(x̄_n − t_{n−1,α/2} s_n/√n, x̄_n + t_{n−1,α/2} s_n/√n),
where x̄_n = 3.54, s_n = 0.13, and n = 34. Because t_{n−1,α/2} = t_{33,0.01} ≈ t_{30,0.01} = 2.457, the 98% confidence interval is:
(3.54 − 2.457 · 0.13/√34, 3.54 + 2.457 · 0.13/√34) = (3.485, 3.595).
One can redo the same calculation using t_{33,0.01} = 2.445 (obtained from a software package), and find (3.4855, 3.5945).

23.3 This is the case: normal data with variance unknown. So we should use the formula from Section 23.2 (the case variance unknown):
(x̄_n − t_{n−1,α/2} s_n/√n, x̄_n + t_{n−1,α/2} s_n/√n),
where x̄_n = 93.5, s_n = 0.75, and n = 10. Because t_{n−1,α/2} = t_{9,0.025} = 2.262, the 95% confidence interval is:
(93.5 − 2.262 · 0.75/√10, 93.5 + 2.262 · 0.75/√10) = (92.96, 94.04).

23.4 This is the case: normal data with variance unknown. So we should use the formula from Section 23.2 (the case variance unknown):
(x̄_n − t_{n−1,α/2} s_n/√n, x̄_n + t_{n−1,α/2} s_n/√n),
where x̄_n = 195.3, s_n = 16.7, and n = 18. Because t_{n−1,α/2} = t_{17,0.025} = 2.110, the 95% confidence interval is:
(195.3 − 2.110 · 16.7/√18, 195.3 + 2.110 · 16.7/√18) = (186.99, 203.61).

23.5 a The standard confidence interval for the mean of a normal sample with unknown variance applies, with n = 23, x̄ = 0.82, and s = 1.78, so:
(x̄ − t_{22,0.025} s/√23, x̄ + t_{22,0.025} s/√23).
The critical values come from the t(22) distribution: t_{22,0.025} = 2.074. The actual interval becomes:
(0.82 − 2.074 · 1.78/√23, 0.82 + 2.074 · 1.78/√23) = (0.050, 1.590).

23.5 b Generate one thousand samples of size 23, by drawing with replacement from the 23 numbers
1.06, −1.04, 2.62, ..., −2.01.
For each sample x*_1, x*_2, ..., x*_23 compute
t* = (x̄*_23 − 0.82)/(s*_23/√23), where (s*_23)² = (1/22) Σ (x*_i − x̄*_23)².

23.5 c We need to estimate the critical value c*_l such that P(T* ≤ c*_l) ≈ 0.025. We take c*_l = −2.101, the 25th of the ordered values, an estimate for the 25/1000 = 0.025 quantile. Similarly, c*_u is estimated by the 976th, which is 2.088.
The bootstrap confidence interval uses the c* values instead of the t-distribution values t_{n−1,α/2}, but beware: c*_l is from the left tail and appears on the right-hand side of the interval, and c*_u on the left-hand side:
(x̄_n − c*_u s_n/√n, x̄_n − c*_l s_n/√n).
Substituting c*_u = 2.088 and c*_l = −2.101, the confidence interval becomes:
(0.82 − 2.088 · 1.78/√23, 0.82 + 2.101 · 1.78/√23) = (0.045, 1.600).
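The procedure of parts b and c can be sketched in code. The data below are simulated stand-ins (the book's 23 numbers are only partially listed above), so the resulting interval is illustrative, not the one in the solution.

```python
import random

# Minimal empirical bootstrap-t sketch in the spirit of Exercise 23.5 b-c
# (illustrative data, not the 23 numbers from the book).
random.seed(3)
data = [random.gauss(0.8, 1.8) for _ in range(23)]
n = len(data)

def mean(xs):
    return sum(xs) / len(xs)

def sdev(xs):
    m = mean(xs)
    return (sum((v - m) ** 2 for v in xs) / (len(xs) - 1)) ** 0.5

xbar, s = mean(data), sdev(data)
tstars = []
for _ in range(1000):
    boot = [random.choice(data) for _ in range(n)]   # resample with replacement
    tstars.append((mean(boot) - xbar) / (sdev(boot) / n**0.5))
tstars.sort()

cl, cu = tstars[24], tstars[975]       # estimated 0.025 and 0.975 quantiles
ci = (xbar - cu * s / n**0.5, xbar - cl * s / n**0.5)
print(ci)
assert ci[0] < xbar < ci[1]
```

Note how the upper bootstrap quantile c*_u builds the lower endpoint and the lower quantile c*_l the upper endpoint, exactly as in part c.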

23.6 a Because events described by inequalities do not change when we multiply the inequalities by a positive constant or add or subtract a constant, the following equalities hold:
P(3L_n + 7 < 3μ + 7 < 3U_n + 7) = P(3L_n < 3μ < 3U_n) = P(L_n < μ < U_n),
and this equals 0.95, as is given.

23.6 b The confidence interval for θ = 3μ + 7 is obtained as the realization of (3L_n + 7, 3U_n + 7), that is: (3l_n + 7, 3u_n + 7). It is obtained by transforming the confidence interval for μ (using the transformation that is applied to μ to get θ).

23.6 c We start with P(L_n < μ < U_n) = 0.95 and try to get 1 − μ in the middle:
P(L_n < μ < U_n) = P(−L_n > −μ > −U_n) = P(1 − L_n > 1 − μ > 1 − U_n) = P(1 − U_n < 1 − μ < 1 − L_n),
where we see that the minus sign causes an interchange: the new lower bound is 1 − U_n and the new upper bound is 1 − L_n. The confidence interval: (1 − 5, 1 − (−2)) = (−4, 3).

23.6 d If we knew that L_n and U_n were always positive, then we could conclude P(L_n < μ < U_n) = P(L_n² < μ² < U_n²), and we could just square the numbers in the confidence interval for μ to get the one for μ². Without the positivity assumption, the sharpest conclusion you can draw from L_n < μ < U_n is that μ² is smaller than the maximum of L_n² and U_n². So
0.95 = P(L_n < μ < U_n) ≤ P(0 ≤ μ² < max{L_n², U_n²}),
and the confidence interval [0, max{l_n², u_n²}) = [0, 25) has a confidence of at least 95%. This kind of problem may occur when the transformation is not one-to-one (both −1 and 1 are mapped to 1 by squaring).

23.7 We know that (l_n, u_n) = (2, 3), where l_n and u_n are the realizations of L_n and U_n, which have the property P(L_n < θ < U_n) = 0.95. This is equivalent to
P(e^{−U_n} < e^{−θ} < e^{−L_n}) = 0.95,
so that (e^{−3}, e^{−2}) = (0.050, 0.135) is a 95% confidence interval for P(X = 0) = e^{−θ}.

23.8 Define random variables
X_i = weight of ith bottle together with filling amount,
W_i = weight of ith bottle alone,
Y_i = weight of ith filling amount,
so that X_i = W_i + Y_i. It is given that W_i has a N(250, 15²) distribution and, according to Exercise 23.1, Y_i has a N(μ_y, 5²) distribution. Since they are independent, X_i = W_i + Y_i has a normal distribution with expectation μ = 250 + μ_y and variance 15² + 5² = 250:
X_i ∼ N(250 + μ_y, 250).
Our data consist of the weights of 16 filled bottles of wine, x_1, ..., x_16. On the basis of these we can construct a confidence interval for μ. Since we are in the case normal data with known variance, this is
(x̄_n − z_{α/2} σ/√n, x̄_n + z_{α/2} σ/√n),
where x̄_n = 998, σ = √250, and n = 16. Because z_{α/2} = z_{0.025} = 1.96, the 95% confidence interval is:
(998 − 1.96 · √250/√16, 998 + 1.96 · √250/√16) = (990.25, 1005.75).
Since this is a 95% confidence interval for μ = 250 + μ_y, the 95% confidence interval for μ_y is given by (990.25 − 250, 1005.75 − 250) = (740.25, 755.75).


a      0         1         2         3         4         5         6         7
p(a)   57/2608   203/2608  383/2608  525/2608  532/2608  408/2608  273/2608  139/2608

a      8         9         10        11        12   13        14
p(a)   45/2608   27/2608   10/2608   4/2608    0    1/2608    1/2608

Then determine

t* = (x̄*n − 3.8715) / (s*n/√2608).

Repeat this one thousand times. Next estimate the values c*_l and c*_u such that

P(T* ≤ c*_l) ≈ 0.025   and   P(T* ≥ c*_u) ≈ 0.025.

The bootstrap confidence interval is then given by:

(x̄n − c*_u · sn/√n, x̄n − c*_l · sn/√n).

23.9 b For a 95% confidence interval we need the empirical 0.025-quantile and 0.975-quantile of the bootstrap values, which we estimate by the 25th and 976th order statistics: c*_l ≈ −1.888 and c*_u ≈ 1.862. This results in

(3.8715 − 1.862 · 1.9225/√2608, 3.8715 − (−1.888) · 1.9225/√2608) = (3.801, 3.943).
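The whole bootstrap recipe of 23.9 a–b can be sketched in a few lines. The code below is a rough reconstruction, not the authors' program: the function name is mine, the dataset is rebuilt from the frequency table above, and the quantile indices are only approximately the 25th and 976th order statistics used in the solution.

```python
import math
import random

def bootstrap_t_interval(data, n_boot=1000, alpha=0.05, seed=1):
    """Empirical bootstrap for the studentized mean, as described in 23.9 a."""
    rng = random.Random(seed)
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    tstars = []
    for _ in range(n_boot):
        boot = rng.choices(data, k=n)  # resample from the empirical distribution
        bbar = sum(boot) / n
        bs = math.sqrt(sum((x - bbar) ** 2 for x in boot) / (n - 1))
        tstars.append((bbar - xbar) / (bs / math.sqrt(n)))
    tstars.sort()
    cl = tstars[int(n_boot * alpha / 2)]        # ~0.025-quantile of T*
    cu = tstars[int(n_boot * (1 - alpha / 2))]  # ~0.975-quantile of T*
    return xbar - cu * s / math.sqrt(n), xbar - cl * s / math.sqrt(n)

# Rebuild the n = 2608 counts from the table in 23.9 a
freqs = {0: 57, 1: 203, 2: 383, 3: 525, 4: 532, 5: 408, 6: 273,
         7: 139, 8: 45, 9: 27, 10: 10, 11: 4, 13: 1, 14: 1}
data = [a for a, m in freqs.items() for _ in range(m)]
```

A run with these data gives an interval close to the (3.801, 3.943) reported above; the exact endpoints depend on the bootstrap seed.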

23.9 c The (estimated) critical values that we would obtain from the table are 2.228 and 2.234, instead of the 1.888 and 1.862 we used in part b. Hence, the resulting interval would be wider.

23.10 a This interval has been obtained from

(x̄n − t_{n−1,α/2} · sn/√n, x̄n + t_{n−1,α/2} · sn/√n),

so that the sample mean is in the middle and must be equal to (1.6 + 7.8)/2 = 4.7.

23.10 b From the formula we see that half the width of the 95% confidence interval is

(7.8 − 1.6)/2 = 3.1 = t15,0.025 · sn/√n = 2.131 · sn/√n.

Similarly, half the width of the 99% confidence interval is

t15,0.005 · sn/√n = 2.947 · sn/√n = (2.947/2.131) · 3.1 = 4.287.

Hence the 99% confidence interval is (4.7 − 4.287, 4.7 + 4.287) = (0.413, 8.987).

23.11 a For the 98% confidence interval the same formula is used as for the 95% interval, replacing the critical values by larger ones. This is the case no matter whether the critical values are from the normal or t-distribution, or from a bootstrap experiment. Therefore, the 98% interval contains the 95% interval, and so must also contain the number 0.
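The rescaling argument of 23.10 b is a one-liner to verify. The sketch below (function name my own; the two t critical values are taken as given from the table) reconstructs the 99% interval from the endpoints of the 95% one.

```python
def widen_interval(low95, high95, crit95, crit99):
    # The sample mean sits in the middle; the half-width scales with the
    # ratio of the critical values, since s_n/sqrt(n) is unchanged.
    center = (low95 + high95) / 2
    half99 = (high95 - low95) / 2 * crit99 / crit95
    return center - half99, center + half99

# Values from 23.10: t_{15,0.025} = 2.131, t_{15,0.005} = 2.947
low, high = widen_interval(1.6, 7.8, 2.131, 2.947)
```

This reproduces the interval (0.413, 8.987) computed above.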


23.11 b From a new bootstrap experiment we would obtain new and, most probably, different values c*_l and c*_u. It therefore could be, if the number 0 is close to the edge of the first bootstrap confidence interval, that it is just outside the new interval.

23.11 c The new dataset will resemble the old one in many ways, but things like the sample mean would most likely differ from the old one, and so there is no guarantee that the number 0 will again be in the confidence interval.

23.12 a This follows immediately from the change-of-units rule for normal random variables on page 112 and the fact that if the Zi are independent, so are the μ + σZi.

23.12 b We have

X̄ = (1/n) Σ Xi = (1/n) Σ (μ + σZi) = μ + σ · (1/n) Σ Zi = μ + σZ̄.

From this we also find that Xi − X̄ = (μ + σZi) − (μ + σZ̄) = σ(Zi − Z̄). It follows that

S²X = (1/(n − 1)) Σ (Xi − X̄)² = σ² · (1/(n − 1)) Σ (Zi − Z̄)² = σ²S²Z,

so that SX = σSZ.

23.12 c The equality follows immediately from part b, by inserting X̄ = μ + σZ̄ and SX = σSZ. The right hand side is the studentized mean for a random sample from an N(0, 1) distribution, and therefore its distribution does not depend on μ and σ. The left hand side is the studentized mean for a random sample from an N(μ, σ²) distribution. Since the two are equal, so are their distributions, which means that the distribution of the studentized mean for a random sample from an N(μ, σ²) distribution (the left hand side) also does not depend on μ and σ.

24.1 From Section 24.1, using z0.05 = 1.645, we find the equation

(70/100 − p)² < (1.645)² · p(1 − p)/100.

This reduces to

1.0271 p² − 1.4271 p + 0.49 < 0.

The zeroes of this parabola are

p1,2 = (1.4271 ± √(1.4271² − 4 · 1.0271 · 0.49)) / (2 · 1.0271) = 0.6947 ± 0.0746,

and the 90% confidence interval is (0.6202, 0.7693).

24.2 a Since n = 6 is too small to expect a good approximation from the central limit theorem, we cannot apply the Wilson method.
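The Wilson computation of 24.1 (and the analogous ones in 24.2 b and 24.8) always solves the same quadratic in p. A small sketch of that recipe (the function name is my own):

```python
import math

def wilson_interval(x, n, z):
    """Solve (x/n - p)^2 = z^2 * p(1-p)/n for p, as in Section 24.1."""
    phat = x / n
    # Expanding gives a*p^2 + b*p + c = 0
    a = 1 + z * z / n
    b = -(2 * phat + z * z / n)
    c = phat * phat
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)
```

For example, wilson_interval(70, 100, 1.645) reproduces the interval (0.6202, 0.7693) up to rounding.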





24.2 b Solve

(140/250 − p)² = (1.96)² · p(1 − p)/250,

that is,

1.0154 p² − 1.1354 p + 0.3136 = 0.

The zeroes are

p1,2 = (1.1354 ± √(1.1354² − 4 · 1.0154 · 0.3136)) / (2 · 1.0154) = 0.5591 ± 0.0611,

and the 95% confidence interval is (0.4980, 0.6202).

24.3 The width of the confidence interval is 2 · 2.576 · σ/√n, where σ = 5. So, we require (see Section 24.4):

n ≥ (2 · 2.576 · 5 / 1)² = (25.76)² = 663.6,

that is, a sample size of at least 664.

24.4 a The width of the confidence interval will be about 2 · t_{n−1,0.05} · s/√n. For s we substitute our current estimate of σ, 0.75, and we use t_{n−1,0.05} ≈ z0.05 = 1.645, for the moment assuming that n will be large. This results in

n ≥ (2 · 1.645 · 0.75 / 0.1)² = 608.9,

so we use n = 609 (which is indeed large, so it is appropriate to use the critical value from the normal distribution).

24.4 b In our computation we used s = 0.75. From the new dataset of size 609 we are going to compute s609 and use that in the computation. If s609 > 0.75, then the confidence interval will be too wide.

24.5 a From

P(−z_{α/2} < (X − np)/√(n/4) < z_{α/2}) ≈ 1 − α

we deduce

P(X/n − z_{α/2}/(2√n) < p < X/n + z_{α/2}/(2√n)) ≈ 1 − α.

So, the approximate 95% confidence interval is

(x/n − z0.025/(2√n), x/n + z0.025/(2√n)).

The width is 2 · z0.025/(2√n) = z0.025/√n, and so n should satisfy 1.96/√n ≤ 0.01, or n ≥ (196)² = 38416.

24.5 b The confidence interval is

19477/38416 ± 1.96/(2√38416) = 0.5070 ± 0.005,

so it is of the intended width.
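The conservative interval of 24.5 rests on the bound p(1 − p) ≤ 1/4, which makes the half-width free of the unknown p. A sketch (function name my own):

```python
import math

def conservative_binomial_interval(x, n, z=1.96):
    # Var(X/n) = p(1-p)/n <= 1/(4n), so the half-width z/(2*sqrt(n))
    # does not depend on the unknown p.
    phat = x / n
    half = z / (2 * math.sqrt(n))
    return phat - half, phat + half
```

For instance, conservative_binomial_interval(19477, 38416) gives 0.5070 ± 0.005, the interval computed in 24.5 b.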



24.6 a The environmentalists are interested in a lower confidence bound, because they would like to make a statement like "we are 97.5% confident that the concentration exceeds 1.68 ppm [and that is much too high]." We have normal data with unknown σ, so we use s16 = √1.12 = 1.058 as an estimate and use the critical value corresponding to 2.5% from the t(15) distribution: t15,0.025 = 2.131. The lower confidence bound is 2.24 − 2.131 · 1.058/√16 = 2.24 − 0.56 = 1.68; the interval: (1.68, ∞).

24.6 b For similar reasons, the plant management constructs an upper confidence bound ("we are 97.5% confident pollution does not exceed 2.80 [and this is acceptable]."). The computation is the same except for a minus sign: 2.24 + 2.131 · 1.058/√16 = 2.24 + 0.56 = 2.80, so the interval is [0, 2.80). Note that the computed upper and lower bounds are in fact the endpoints of the 95% two-sided confidence interval.

24.7 a From the normal approximation we know

P(−z0.025 < (X̄n − λ)/(√λ/√n) < z0.025) ≈ 0.95,

or

P((X̄n − λ)²/(λ/n) < z0.025²) ≈ 0.95,

i.e.,

P(n(X̄n − λ)² < (1.96)²λ) ≈ 0.95.

Just as with the derivation of the Wilson method (Section 24.1), we now conclude that the 95% confidence interval contains those λ for which n(x̄n − λ)² ≤ (1.96)²λ.

24.7 b We need to solve n(x̄n − λ)² − (1.96)²λ = 0, where x̄n = 3.8715 and n = 2608, resulting in

λ² − (2 · 3.8715 + (1.96)²/2608)λ + (3.8715)² = 0,

or

λ² − 7.7446λ + 14.9889 = 0.

From the roots we find the confidence interval (3.7967, 3.9478).

24.7 c The confidence interval (3.7967, 3.9478) is almost the same as the one in Exercise 23.9 b. This is not surprising: n is very large, so the normal approximation should be very good.

24.8 a We solve

(15/23 − p)² = (1.96)² · p(1 − p)/23,

or

1.1670 p² − 1.4714 p + 0.4253 = 0,

from which we find the confidence interval (0.4489, 0.8119).
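The quadratic of 24.7 b can be solved the same way as the Wilson quadratic. A sketch (function name my own), which also cross-checks the bootstrap interval of 23.9 b:

```python
import math

def poisson_mean_interval(xbar, n, z=1.96):
    """Solve n*(xbar - lam)^2 = z^2 * lam for lam, as in 24.7."""
    # Expanding gives lam^2 - (2*xbar + z^2/n)*lam + xbar^2 = 0
    b = -(2 * xbar + z * z / n)
    c = xbar * xbar
    disc = math.sqrt(b * b - 4 * c)
    return (-b - disc) / 2, (-b + disc) / 2
```

poisson_mean_interval(3.8715, 2608) reproduces (3.7967, 3.9478) up to rounding.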


24.8 b Replacing the 1.96 in the equation above by z0.05 = 1.645, we find (0.4808, 0.7915). The appropriate one-sided confidence interval will provide us with a lower bound on the outer-lane winning probability p, so it is (0.4808, 1].

24.9 a From Section 8.4 we know P(M ≤ a) = (FX(a))^12, so P(M/θ ≤ t) = P(M ≤ tθ) = (FX(tθ))^12. Since Xi has a U(0, θ) distribution, FX(tθ) = t for 0 ≤ t ≤ 1. Substituting this shows the result.

24.9 b For cl we need to solve (cl)^12 = α/2, or cl = (α/2)^(1/12) = (0.05)^(1/12) = 0.7791. For cu we need to solve (cu)^12 = 1 − α/2, or cu = (1 − α/2)^(1/12) = (0.95)^(1/12) = 0.9958.

24.9 c From b we know that P(cl < M/θ < cu) = P(0.7791 < M/θ < 0.9958) = 0.90. Rewriting this equation, we get P(0.7791 θ < M < 0.9958 θ) = 0.90 and P(M/0.9958 < θ < M/0.7791) = 0.90. This means that (m/0.9958, m/0.7791) = (3.013, 3.851) is a 90% confidence interval for θ.

24.9 d From b we derive the general formula

P((α/2)^(1/n) < M/θ < (1 − α/2)^(1/n)) = 1 − α.

The left hand inequality can be rewritten as θ < M/(α/2)^(1/n) and the right hand one as M/(1 − α/2)^(1/n) < θ. So, the statement above can be rewritten as

P(M/(1 − α/2)^(1/n) < θ < M/(α/2)^(1/n)) = 1 − α,

so that the general formula for the confidence interval becomes

(m/(1 − α/2)^(1/n), m/(α/2)^(1/n)).

24.10 a From Section 11.2 we know that Sn, being the sum of n independent Exp(λ) random variables, has a Gam(n, λ) distribution. From Exercise 8.4 we know that λXi has an Exp(1) distribution. Combining these facts, it follows that λSn = λX1 + ··· + λXn has a Gam(n, 1) distribution.

24.10 b From the quantiles we see

0.9 = P(q0.05 ≤ λS20 ≤ q0.95) = P(13.25 ≤ λS20 ≤ 27.88) = P(13.25/S20 ≤ λ ≤ 27.88/S20).

Noting that the realization of S20 is x1 + ··· + x20 = 20 x̄20, we conclude that

(13.25/(20 x̄20), 27.88/(20 x̄20)) = (0.6625/x̄20, 1.394/x̄20)

is a 90% confidence interval for λ.

25.1 The alternative hypothesis should reflect the belief that arrival delays of trains exhibit more variation during rush hours than during quiet hours. Therefore take H1: σ1 > σ2.

25.2 The alternative hypothesis should reflect the belief that the number of babies born in Cleveland, Ohio, in the month of September 1977 is higher than 1472. Therefore, take H1: μ > 1472.
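The exact 90% coverage of the interval in 24.9 c can be supported by a quick simulation (the true θ below is an arbitrary choice of mine, not part of the exercise):

```python
import random

def coverage_max_uniform(theta, n=12, n_sim=20_000, seed=7):
    # c_l = 0.05**(1/12) and c_u = 0.95**(1/12), as computed in 24.9 b
    cl, cu = 0.05 ** (1 / n), 0.95 ** (1 / n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        m = max(rng.uniform(0, theta) for _ in range(n))
        hits += m / cu < theta < m / cl
    return hits / n_sim
```

By construction P(cl < M/θ < cu) = 0.95 − 0.05 = 0.90 exactly, so the simulated frequency should be close to 0.90 for any θ.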


25.3 When the regression line runs through the origin, then α = 0. One possible testing problem is to test H0: α = 0 against H1: α ≠ 0. If H0: α = 0 is rejected in favor of H1: α ≠ 0, the parameter α should be left in the model.

25.4 a Denote the observed numbers of cycles for the smokers by X1, X2, ..., Xn1 and similarly Y1, Y2, ..., Yn2 for the nonsmokers. A test statistic should compare estimators for p1 and p2. Since the geometric distributions have expectations 1/p1 and 1/p2, we could compare the estimator 1/X̄n1 for p1 with the estimator 1/Ȳn2 for p2, or simply compare X̄n1 with Ȳn2. For instance, take test statistic T = X̄n1 − Ȳn2. Values of T close to zero are in favor of H0, and values far away from zero are in favor of H1. Another possibility is T = X̄n1/Ȳn2.

25.4 b In this case, the maximum likelihood estimators p̂1 and p̂2 give better indications about p1 and p2. They can be compared in the same way as the estimators in a.

25.4 c The probability of getting pregnant during a cycle is p1 for the smoking women and p2 for the nonsmokers. The alternative hypothesis should express the belief that smoking women are less likely to get pregnant than nonsmoking women. Therefore take H1: p1 < p2.

25.5 a When the maximum is greater than 5, at least one Xi is greater than 5, so that θ must be greater than 5, and we know for sure that the null hypothesis is false. Therefore the set of relevant values of T1 = max{X1, X2, ..., Xn} is the interval [0, 5]. Similar to Exercise 8.15, one can argue that E[T1] = nθ/(n + 1). Hence values of T1 close to 5n/(n + 1) are in favor of H0. Values of T1 in the neighborhood of 0 indicate that θ < 5, and values of T1 very close to 5 indicate that θ > 5. Both these regions are in favor of H1. On the number line from 0 to 5: values of T1 near 0 are in favor of H1, values around 5n/(n + 1) are in favor of H0, and values near 5 are again in favor of H1.

25.5 b When the distance between 2X̄n and 5 is greater than 5, then 2X̄n must be greater than 10, which means that at least one Xi is greater than 5. In that case we know for sure that the null hypothesis is false. Therefore the set of relevant values of T2 = |2X̄n − 5| is the interval [0, 5]. Since X̄n will be close to θ/2, values of T2 close to zero are in favor of H0. Values of T2 far away from zero either correspond to 2X̄n far below 5, which indicates θ < 5, or correspond to 2X̄n far above 5, which indicates θ > 5. Hence values of T2 far away from zero are in favor of H1.

25.6 a The p-value P(T ≥ 2.34) = 0.23 is larger than 0.05, so do not reject.
25.6 b The p-value P(T ≤ 2.34) = 1 − P(T ≥ 2.34) = 0.77 is larger than 0.05, so do not reject.
25.6 c The p-value P(T ≥ 0.03) = 0.968 is larger than 0.05, so do not reject.


25.6 d The p-value P(T ≥ 1.07) = 1 − P(T ≤ 1.07) = 0.019 is less than 0.05, so reject.
25.6 e The p-value P(T ≥ −1.07) ≥ P(T ≥ 2.34) = 0.99, which is larger than 0.05, so do not reject.
25.6 f The p-value P(T ≤ −2.34) ≤ P(T ≤ −1.07) = 0.019, which is smaller than 0.05, so reject.
25.6 g The p-value P(T ≥ 2.34) ≤ P(T ≥ 1.07) = 0.200. Therefore, the p-value is smaller than 0.200, but that does not give enough information to decide about the null hypothesis.

25.7 a Since the parameter μ is the expectation of T, values of T much larger than 1472 suggest that μ > 1472. Because we test H0: μ = 1472 against H1: μ > 1472, the farther values of T are to the right, the stronger evidence they provide in favor of H1.

25.7 b According to part a, values to the right of t = 1718 are stronger evidence in favor of H1. Therefore, the p-value is P(T ≥ 1718), where T has a Poisson distribution with μ = 1472. Because the distribution of T can be approximated by a normal distribution with mean 1472 and variance 1472, we can approximate this probability as follows:

P(T ≥ 1718) = P((T − 1472)/√1472 ≥ (1718 − 1472)/√1472) ≈ P(Z ≥ 6.412),

where Z has an N(0, 1) distribution. From the normal table we see that the latter probability is almost zero (to be precise, 7.28 · 10^(−11), which was obtained using a statistical software package).

25.8 The values of Fn and Φ lie between 0 and 1, so that the maximal distance between the two graphs also lies between 0 and 1. In fact, since Fn has jumps of size 1/n, the minimal value of T must be half the size of a jump: 1/(2n). This would correspond to the situation where, at each observation, the graph of Φ precisely runs through the middle of the two heights of Fn. When the graph of Fn lies far to the right (or to the left) of that of Φ, the maximum distance between the two graphs can be arbitrarily close to 1. When the dataset is a realization from a distribution different from the standard normal, the corresponding distribution function F will differ from Φ. Since Fn ≈ F (recall Table 17.2), the graph of Fn will show large differences with that of Φ, resulting in a relatively large value of T. On the other hand, when the dataset is indeed a realization from the standard normal, then Fn ≈ Φ, resulting in a relatively small value of T. We conclude that only large values of T close to 1 are evidence against the null hypothesis.

25.9 Only values of Tks close to 1 are evidence against the null hypothesis. Therefore the p-value is P(Tks ≥ 0.176). On the basis of the bootstrap results, this probability is approximated by the relative frequency of Tks-values greater than or equal to 0.176, which is zero.

25.10 a The alternative hypothesis should express the belief that the gross caloric value exceeds 23.75 MJ/kg. Therefore take H1: μ > 23.75.
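The tiny normal tail probability in 25.7 b can be reproduced without tables, using the identity P(Z ≥ z) = erfc(z/√2)/2 from the standard library (the function name below is my own):

```python
import math

def poisson_pvalue_normal_approx(t, mu):
    # Pois(mu) is approximately N(mu, mu) for large mu
    z = (t - mu) / math.sqrt(mu)
    return z, 0.5 * math.erfc(z / math.sqrt(2))

z, p = poisson_pvalue_normal_approx(1718, 1472)
```

This gives z ≈ 6.412 and a p-value around 7.3 · 10^(−11), in line with the value quoted above.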


25.10 b The p-value is the probability P(X̄n ≥ 23.788) under the null hypothesis. We can compute this probability by using that under the null hypothesis X̄n has an N(23.75, (0.1)²/23) distribution:

P(X̄n ≥ 23.788) = P((X̄n − 23.75)/(0.1/√23) ≥ (23.788 − 23.75)/(0.1/√23)) = P(Z ≥ 1.82),

where Z has an N(0, 1) distribution. From the normal table we find P(Z ≥ 1.82) = 0.0344.

25.11 A type I error occurs when μ = 0 and |t| ≥ 2. When μ = 0, then T has an N(0, 1) distribution. Hence, by symmetry of the N(0, 1) distribution and the normal table, we find that the probability of committing a type I error is

P(|T| ≥ 2) = P(T ≤ −2) + P(T ≥ 2) = 2 · P(T ≥ 2) = 2 · 0.0228 = 0.0456.

26.1 A type I error is to falsely reject the null hypothesis, i.e., to falsely conclude that the suspect is guilty. This happened in 9 out of 140 cases. Hence, the probability of a type I error is 9/140 = 0.064. A type II error is to falsely accept the null hypothesis, i.e., to falsely conclude that the suspect is innocent. This happened in 15 out of 140 cases. Hence, the probability of a type II error is 15/140 = 0.107.

26.2 According to Exercise 25.11, we do not reject if |T| < 2. Therefore the probability of a type II error is

P(|T| < 2) = P(−2 < T < 2) = P(T < 2) − P(T < −2),

where T has an N(1, 1) distribution. Using that T − 1 has an N(0, 1) distribution, we find that

P(T < 2) − P(T < −2) = P(T − 1 < 1) − P(T − 1 < −3) = 0.8413 − 0.0013 = 0.84.

26.3 a A type I error is to falsely reject the null hypothesis H0: θ = 2. We reject when X ≤ 0.1 or X ≥ 1.9. Because under the null hypothesis X has a U(0, 2) distribution, the probability of committing a type I error is

P(X ≤ 0.1 | θ = 2) + P(X ≥ 1.9 | θ = 2) = 0.05 + 0.05 = 0.1.

26.3 b In this case a type II error is to falsely accept the null hypothesis when θ = 2.5. We accept when 0.1 < X < 1.9. Because under θ = 2.5, X has a U(0, 2.5) distribution, the probability of committing a type II error is

P(0.1 < X < 1.9 | θ = 2.5) = (1.9 − 0.1)/2.5 = 0.72.
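The two error probabilities of 26.3 (exactly 0.1 and 0.72) can also be checked by simulation; a sketch (function name my own):

```python
import random

def error_probs(n_sim=100_000, seed=3):
    rng = random.Random(seed)
    # Type I: theta = 2 (H0 true) but X falls in the critical region
    type1 = sum(not 0.1 < rng.uniform(0, 2.0) < 1.9 for _ in range(n_sim)) / n_sim
    # Type II: theta = 2.5 (H0 false) but X falls in the acceptance region
    type2 = sum(0.1 < rng.uniform(0, 2.5) < 1.9 for _ in range(n_sim)) / n_sim
    return type1, type2
```

The simulated frequencies land close to the exact values 0.1 and 0.72.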

26.4 a Since T has a Bin(144, p) distribution, values of T close to 144/8 = 18 are in favor of the null hypothesis. Values of T far above 18 indicate that p > 1/8, whereas values far below 18 indicate that p < 1/8. This means we reject only for values of T far above 18. Hence we are only dealing with a right critical value.

26.4 b Denote the right critical value by c. Then we must solve P(T ≥ c) = 0.01. Using the normal approximation for the binomial distribution,

P(T ≥ c) ≈ P(Z ≥ (c − np)/√(np(1 − p))) = P(Z ≥ (c − 18)/√(18 · 7/8)),

where Z has an N(0, 1) distribution. We find c by solving

(c − 18)/√(18 · 7/8) = z0.01 = 2.326,

which gives c = 27.2. Because T can only take integer values, the right critical value is taken to be 28, and the critical region is {28, 29, ..., 144}. Since the observed number t = 29 falls in the critical region, we reject H0: p = 1/8 in favor of H1: p > 1/8.

26.5 a The p-value is P(X ≥ 15) under the null hypothesis H0: p = 1/2. Using Table 26.3 we find P(X ≥ 15) = 1 − P(X ≤ 14) = 1 − 0.8950 = 0.1050.

26.5 b Only values close to 23 are in favor of H1: p > 1/2, so the critical region is of the form K = {c, c + 1, ..., 23}. The critical value c is the smallest value such that P(X ≥ c) ≤ 0.05 under H0: p = 1/2, or equivalently, 1 − P(X ≤ c − 1) ≤ 0.05, which means P(X ≤ c − 1) ≥ 0.95. From Table 26.3 we conclude that c − 1 = 15, so that K = {16, 17, ..., 23}.

26.5 c A type I error occurs if p = 1/2 and X ≥ 16. The probability that this happens is P(X ≥ 16 | p = 1/2) = 1 − P(X ≤ 15 | p = 1/2) = 1 − 0.9534 = 0.0466, where we have used Table 26.3 once more.

26.5 d In this case, a type II error occurs if p = 0.6 and X ≤ 15. To approximate P(X ≤ 15 | p = 0.6), we use the same reasoning as in Section 14.2, but now with n = 23 and p = 0.6. Write X as the sum of independent Bernoulli random variables: X = R1 + ··· + Rn, and apply the central limit theorem with μ = p = 0.6 and σ² = p(1 − p) = 0.24. Then

P(X ≤ 15) = P((R1 + ··· + Rn − nμ)/(σ√n) ≤ (15 − 13.8)/√(0.24 · 23)) = P(Z23 ≤ 0.51) ≈ Φ(0.51) = 0.6950.

26.6 a Because T has a Pois(μ) distribution (see Exercise 25.7), we always have E[T] = μ. Therefore values of T around 1472 are in favor of the null hypothesis, values of T far to the left of 1472 are in favor of μ < 1472, and values of T far to the right of 1472 are in favor of μ > 1472. Therefore, only values far to the right of 1472 are in favor of H1: μ > 1472, so that we only have a right critical value.

26.6 b Since, according to part a, we only have a right critical value c, we must solve P(T ≥ c) = 0.05. Using the normal approximation,

P(T ≥ c) = P((T − μ)/√μ ≥ (c − 1472)/√1472) ≈ P(Z ≥ (c − 1472)/√1472),

where Z has an N(0, 1) distribution. We find c by solving

(c − 1472)/√1472 = z0.05 = 1.645,

which gives c = 1535.1. Because T can only take integer values, we take right critical value 1536 and critical region {1536, 1537, ...}. The observed number 1718 falls into the critical region, so that we reject H0: μ = 1472 in favor of H1: μ > 1472.

26.7 We must solve P(X1 + X2 ≤ c) = 0.05 under the null hypothesis. Under the null hypothesis, X1 and X2 are independent random variables with a U(0, 1) distribution. According to Exercise 11.5, the random variable T = X1 + X2 has density f(t) = t for 0 ≤ t ≤ 1, and f(t) = 2 − t for 1 ≤ t ≤ 2. Since this density integrates to 0.5 over 0 ≤ t ≤ 1, we can find c by solving

∫₀ᶜ t dt = P(T ≤ c) = 0.05,

or equivalently c²/2 = 0.05, which gives left critical value c = 0.316. The corresponding critical region for T = X1 + X2 is [0, 0.316].

26.8 a Test statistic T = X̄n takes values in (0, ∞). Recall that the Exp(λ) distribution has expectation 1/λ, and that according to the law of large numbers X̄n will be close to 1/λ. Hence, values of X̄n close to 1 are in favor of H0: λ = 1, and only values of X̄n close to zero are in favor of H1: λ > 1. Large values of X̄n also provide evidence against H0: λ = 1, but even stronger evidence against H1: λ > 1. We conclude that T = X̄n has critical region K = (0, cl]. This is an example in which the alternative hypothesis and the test statistic deviate from the null hypothesis in opposite directions. Test statistic T = e^(−X̄n) takes values in (0, 1). Values of X̄n close to zero correspond to values of T close to 1, and large values of X̄n correspond to values of T close to 0. Hence, only values of T close to 1 are in favor of H1: λ > 1. We conclude that T has critical region K = [cu, 1). Here the alternative hypothesis and the test statistic deviate from the null hypothesis in the same direction.

26.8 b Again, values of X̄n close to 1 are in favor of H0: λ = 1. Values of X̄n close to zero suggest λ > 1, whereas large values of X̄n suggest λ < 1. Hence, both small and large values of X̄n are in favor of H1: λ ≠ 1. We conclude that T = X̄n has critical region K = (0, cl] ∪ [cu, ∞). Small and large values of X̄n correspond to values of T = e^(−X̄n) close to 1 and 0, respectively. Hence, values of T both close to 0 and close to 1 are in favor of H1: λ ≠ 1. We conclude that T has critical region K = (0, cl] ∪ [cu, 1). Both test statistics deviate from the null hypothesis in the same directions as the alternative hypothesis.

26.9 a Test statistic T = (X̄n)² takes values in [0, ∞). Since μ is the expectation of the N(μ, 1) distribution, according to the law of large numbers X̄n is close to μ. Hence, values of X̄n close to zero are in favor of H0: μ = 0. Large negative values of X̄n suggest μ < 0, and large positive values of X̄n suggest μ > 0. Therefore, both large negative and large positive values of X̄n are in favor of H1: μ ≠ 0. These values correspond to large positive values of T, so T has critical region K = [cu, ∞). This is an example in which the test statistic deviates from the null hypothesis in one direction, whereas the alternative hypothesis deviates in two directions. The other test statistic T takes values in (−∞, 0) ∪ (0, ∞). Large negative and large positive values of X̄n correspond to values of T close to zero. Therefore, T has critical region K = [cl, 0) ∪ (0, cu]. This is an example in which the test statistic deviates from the null hypothesis for small values, whereas the alternative hypothesis deviates for large values.


26.9 b Only large positive values of X̄n are in favor of μ > 0, which correspond to large values of T = (X̄n)². Hence, T has critical region K = [cu, ∞). This is an example where the test statistic has the same type of critical region with a one-sided or two-sided alternative. Of course, the critical value cu in part b is different from the one in part a. Large positive values of X̄n correspond to small positive values of the other test statistic T. Hence, this T has critical region K = (0, cu]. This is another example where the test statistic deviates from the null hypothesis for small values, whereas the alternative hypothesis deviates for large values.

27.1 a The value of the t-test statistic is

t = (x̄n − 10)/(sn/√n) = (11 − 10)/(2/√16) = 2.

The right critical value is t_{n−1,α/2} = t15,0.025 = 2.131. The observed value t = 2 is smaller than this, so we do not reject H0: μ = 10 in favor of H1: μ ≠ 10.

27.1 b The right critical value is now t_{n−1,α} = t15,0.05 = 1.753. The observed value t = 2 is larger than this, so we reject H0: μ = 10 in favor of H1: μ > 10.

27.2 a The belief that the pouring temperature is at the right target setting is put to the test. The alternative hypothesis should represent the belief that the pouring temperature differs from the target setting. Hence, test H0: μ = 2550 against H1: μ ≠ 2550.

27.2 b The value of the t-test statistic is

t = (x̄n − 2550)/(sn/√n) = (2558.7 − 2550)/(√517.34/√10) = 1.21.

Because H1: μ ≠ 2550, both small and large values of T are in favor of H1. Therefore, the right critical value is t_{n−1,α/2} = t9,0.025 = 2.262. The observed value t = 1.21 is smaller than this, so we do not reject H0: μ = 2550 in favor of H1: μ ≠ 2550.

27.3 a The alternative hypothesis should represent the belief that the load at failure exceeds 10 MPa. Therefore, take H1: μ > 10.

27.3 b The value of the t-test statistic is

t = (x̄n − 10)/(sn/√n) = (13.71 − 10)/(3.55/√22) = 4.902.

Because H1: μ > 10, only large values of T are in favor of H1. Therefore, the right critical value is t_{n−1,α} = t21,0.05 = 1.721. The observed value t = 4.902 is larger than this, so we reject H0: μ = 10 in favor of H1: μ > 10.

27.4 The value of the t-test statistic is

t = (x̄n − 31)/(sn/√n) = (31.012 − 31)/(0.1294/√22) = 0.435.

Because H1: μ > 31, only large values of T are in favor of H1. Therefore, the right critical value is t_{n−1,α} = t21,0.01 = 2.518. The observed value t = 0.435 is smaller than this, so we do not reject H0: μ = 31 in favor of H1: μ > 31.
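All of the 27.x computations above follow one pattern. A sketch of the one-sample t statistic and the comparison with a table value (critical values are taken as given; function name my own):

```python
import math

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic (xbar - mu0)/(s/sqrt(n)), as used in Chapter 27."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Exercise 27.1: xbar = 11, s = 2, n = 16, H0: mu = 10
t = t_statistic(11, 10, 2, 16)
reject_two_sided = abs(t) >= 2.131   # t_{15,0.025}
reject_right_sided = t >= 1.753      # t_{15,0.05}
```

This reproduces the conclusions of 27.1: the two-sided test does not reject, while the right-sided test does.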


27.5 a The interest is whether the inbreeding coefficient exceeds 0. Let μ represent this coefficient for the species of wasps. The value 0 is the a priori specified value of the parameter, so test null hypothesis H0: μ = 0. The alternative hypothesis should express the belief that the inbreeding coefficient exceeds 0. Hence, we take alternative hypothesis H1: μ > 0. The value of the test statistic is

t = 0.044/(0.884/√197) = 0.70.

27.5 b Because n = 197 is large, we approximate the distribution of T under the null hypothesis by an N(0, 1) distribution. The value t = 0.70 lies to the right of zero, so the p-value is the right tail probability P(T ≥ 0.70). By means of the normal approximation we find from the normal table that P(T ≥ 0.70) ≈ 1 − Φ(0.70) = 0.2420. This means that the value of the test statistic is not very far in the (right) tail of the distribution and is therefore not to be considered exceptionally large. We do not reject the null hypothesis.

27.6 The belief that the intercept is zero is put to the test. The alternative hypothesis should represent the belief that the intercept differs from zero. Therefore, test H0: α = 0 against H1: α ≠ 0. The value of the t-test statistic is

ta = â/sa = 5.388/1.874 = 2.875.

Because H1: α ≠ 0, both small and large values of Ta are in favor of H1. Therefore, the right critical value is t5,0.05 = 2.015. The observed value t = 2.875 is larger than this, so we reject H0: α = 0 in favor of H1: α ≠ 0.

27.7 a The data are modeled by a simple linear regression model: Yi = α + βxi + Ui, where Yi is the gas consumption and xi is the average outside temperature in the ith week. Higher gas consumption as a consequence of lower temperatures corresponds to β < 0. It is natural to consider the value 0 as the a priori specified value of the parameter (it corresponds to no change of gas consumption). Therefore, we take null hypothesis H0: β = 0. The alternative hypothesis should express the belief that the gas consumption increases as a consequence of lower temperatures. Hence, we take alternative hypothesis H1: β < 0. The value of the test statistic is

tb = β̂/sb = −0.3932/0.0196 = −20.06.

The test statistic Tb has a t-distribution with n − 2 = 24 degrees of freedom. The value −20.06 is smaller than the left critical value −t24,0.05 = −1.711, so we reject.

27.7 b For the data after insulation, the value of the test statistic is

tb = −0.2779/0.0252 = −11.03,

and Tb has a t(28) distribution. The value −11.03 is smaller than the left critical value −t28,0.05 = −1.701, so we reject.


28.1 a H0: μ1 = μ2 and H1: μ1 ≠ μ2. The value of the test statistic is tp = −2.130. The critical values are ±t142,0.025. These are not in the table, but 1.96 < t142,0.025 < 2.009, so that we reject the null hypothesis.

28.1 b The value of the test statistic is the same (see Exercise 28.4). The normal approximation yields that the critical values are ±1.96. The observed value td = −2.130 is smaller than the left critical value −1.96, so that we reject the null hypothesis.

28.1 c The observed value td = −2.130 is smaller than the left critical value −2.004, so that we reject the null hypothesis. The salaries differ significantly.

28.2 First consider testing H0: μ1 = μ2 against H1: μ1 ≠ μ2. In view of the observed sample variances, there is no reason to assume equal variances, so compute

s²d = s²x/n + s²y/m = 7.77/775 + 25.33/261 = 0.1071

and

td = (x̄n − ȳm)/sd = (39.08 − 37.59)/√0.1071 = 4.553.

With sample sizes n1 = 775 and n2 = 261 we can use the normal approximation. The p-value is P(Td ≥ 4.553) ≈ 0, so that we reject H0: μ1 = μ2 in favor of H1: μ1 ≠ μ2. The other testing problems are handled in the same way. For testing H0: μ1 = μ3 against H1: μ1 ≠ μ3, we find

s²d = 7.77/775 + 4.95/633 = 0.0178 and td = (39.08 − 39.60)/√0.0178 = −3.898,

with p-value P(Td ≤ −3.898) ≈ 0, so that we reject H0: μ1 = μ3 in favor of H1: μ1 ≠ μ3. For testing H0: μ2 = μ3 against H1: μ2 ≠ μ3, we find

s²d = 25.33/261 + 4.95/633 = 0.1049 and td = (37.59 − 39.60)/√0.1049 = −6.206,

with p-value P(Td ≤ −6.206) ≈ 0, so that we reject H0: μ2 = μ3 in favor of H1: μ2 ≠ μ3.

28.3 a The value of the test statistic is

tp = (22.43 − 11.01)/4.58 = 2.492.

Under the assumption of normal data with equal variances, we must compare this with the right critical value t43,0.025. This is not in the table, but 2.009 < t43,0.025 < 2.021. Hence, tp > t43,0.025, so that we reject the null hypothesis.

28.3 b The value tp = 2.492 is greater than the right critical value 1.959, so that we reject on the basis of the bootstrap simulation.

28.3 c The value of the test statistic is

td = (22.43 − 11.01)/4.64 = 2.463.

Without the assumption of equal variances, and using the normal approximation, we must compare this with the right critical value z0.025 = 1.96. Since td > 1.96, we reject the null hypothesis.
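The nonpooled computations of 28.2 can be sketched as follows (function name my own; inputs are the summary statistics from the exercise):

```python
import math

def nonpooled_t(xbar, ybar, sx2, sy2, n, m):
    """Two-sample statistic without the equal-variance assumption."""
    sd = math.sqrt(sx2 / n + sy2 / m)
    return (xbar - ybar) / sd

# First comparison in 28.2
td = nonpooled_t(39.08, 37.59, 7.77, 25.33, 775, 261)
```

This reproduces td ≈ 4.55 for the first comparison; the other two comparisons follow by swapping in the other summary statistics.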


28.3 d Because we test at level 0.05, we reject if the right tail probability corresponding to td = 2.463 is smaller than 0.025. Since this is the case, we reject on the basis of the bootstrap simulation.

28.4 When n = m, then

S²p (1/n + 1/n) = ((n − 1)S²X + (n − 1)S²Y)/(n + n − 2) · (2/n) = ((n − 1)(S²X + S²Y))/(2(n − 1)) · (2/n) = S²X/n + S²Y/n = S²d.

28.5 a When aS²X + bS²Y is unbiased for σ², we should have E[aS²X + bS²Y] = σ². Using that S²X and S²Y are both unbiased for σ², i.e., E[S²X] = σ² and E[S²Y] = σ², we get

E[aS²X + bS²Y] = aE[S²X] + bE[S²Y] = (a + b)σ².

Hence, E[aS²X + bS²Y] = σ² for all σ > 0 if and only if a + b = 1.

28.5 b By independence of S²X and S²Y, write

Var(aS²X + (1 − a)S²Y) = a² Var(S²X) + (1 − a)² Var(S²Y) = (a²/(n − 1) + (1 − a)²/(m − 1)) · 2σ⁴.

To find the value of a that minimizes this, differentiate with respect to a and put the derivative equal to zero. This leads to

2a/(n − 1) − 2(1 − a)/(m − 1) = 0.

Solving for a yields a = (n − 1)/(n + m − 2). Note that the second derivative of Var(aS²X + (1 − a)S²Y) is positive, so that this is indeed a minimum.

28.6 a By independence of X1, X2, ..., Xn and Y1, Y2, ..., Ym, we have

Var(X̄n − Ȳm) = Var(X̄n) + Var(Ȳm) = σ²X/n + σ²Y/m.

28.6 b

E[S²p (1/n + 1/m)] = ((n − 1)E[S²X] + (m − 1)E[S²Y])/(n + m − 2) · (1/n + 1/m) = ((n − 1)σ²X + (m − 1)σ²Y)/(n + m − 2) · (1/n + 1/m).

In principle this may differ from Var(X̄n − Ȳm) = σ²X/n + σ²Y/m.

28.6 c First note that

E[aS²X + bS²Y] = aE[S²X] + bE[S²Y] = aσ²X + bσ²Y.

For unbiasedness this must equal σ²X/n + σ²Y/m for all σX > 0 and σY > 0, with a, b not depending on σX and σY. This is only possible for a = 1/n and b = 1/m.
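The algebraic identity of 28.4, S²p (1/n + 1/n) = S²d when the sample sizes are equal, can be checked numerically on any pair of samples (the data below are arbitrary simulated samples of my own choosing):

```python
import random

def sample_var(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def compare_pooled_and_d(n=25, seed=11):
    # With equal sample sizes, S_p^2 * (1/n + 1/n) equals S_d^2 exactly,
    # even when the two samples have different true variances.
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 2) for _ in range(n)]
    sx2, sy2 = sample_var(xs), sample_var(ys)
    pooled = ((n - 1) * sx2 + (n - 1) * sy2) / (2 * n - 2) * (2 / n)
    sd2 = sx2 / n + sy2 / n
    return pooled, sd2
```

The two returned values agree up to floating-point rounding, regardless of the seed or the true variances.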


28.6 d From part b together with σ²X = σ²Y = σ², we have

E[S²p (1/n + 1/m)] = ((n − 1)σ²X + (m − 1)σ²Y)/(n + m − 2) · (1/n + 1/m) = σ² · ((n − 1) + (m − 1))/(n + m − 2) · (1/n + 1/m) = σ² (1/n + 1/m),

which by part a equals Var(X̄n − Ȳm).

28.6 e No, not in general; see part b. If n = m, then according to (the computations in) Exercise 28.4, S²p (1/n + 1/m) = S²d. Since, according to part c, S²d is always an unbiased estimator for Var(X̄n − Ȳm), it follows that S²p (1/n + 1/m) is also an unbiased estimator for Var(X̄n − Ȳm). One may also check this as follows:

E[S²p (1/n + 1/m)] = ((n − 1)σ²X + (n − 1)σ²Y)/(n + n − 2) · (1/n + 1/n) = (σ²X + σ²Y) · ((n − 1)/(2n − 2)) · (2/n) = σ²X/n + σ²Y/n.
