
# Statistics Solutions: Weeks 1-7

CVEN2002/2702

Solutions Week 1-7

## Adapted from a 2011 exam

In an air-pollution study, ozone concentrations were taken in a large
California city at 5.00 p.m. The eight readings (in parts per million)
were
7.9, 11.3, 6.9, 12.7, 13.2, 8.8, 9.3, 10.6
1. Draw a boxplot of the data and comment on its features.

CVEN2002/2702 (Statistics)

Dr Joanna Wang

Solutions Week 1-7


Sample mean $\bar{x} = 10.0875$, sample variance $s^2 = 5.0670$, sample sd $s = 2.2510$.

Median $m = 9.95$, $q_1 = 8.35$, $q_3 = 12$, hence the five-number summary is
$\{6.9,\ 8.35,\ 9.95,\ 12,\ 13.2\}$.

$\text{iqr} = 12 - 8.35 = 3.65$, $q_1 - 1.5\,\text{iqr} = 2.875$, $q_3 + 1.5\,\text{iqr} = 17.475$, hence no outlier.
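The summary statistics above can be checked with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which reproduces $q_1 = 8.35$ and $q_3 = 12$; the helper name `five_number_summary` is illustrative only.

```python
from statistics import mean, median, stdev, variance

readings = [7.9, 11.3, 6.9, 12.7, 13.2, 8.8, 9.3, 10.6]


def five_number_summary(data):
    """Return (min, q1, median, q3, max).

    Assumption: quartiles are the medians of the lower and upper halves
    of the sorted data (the middle value is left out when n is odd).
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[-half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


lo, q1, m, q3, hi = five_number_summary(readings)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
# Points outside the fences would be drawn as outliers on the boxplot.
outliers = [x for x in readings if x < lower_fence or x > upper_fence]

print("mean, variance, sd:", mean(readings), variance(readings), stdev(readings))
print("five-number summary:", lo, q1, m, q3, hi)
print("fences:", lower_fence, upper_fence, "outliers:", outliers)
```

Other quartile conventions (for example the interpolation rules used by `statistics.quantiles`) give slightly different $q_1$ and $q_3$ for a sample this small.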


Week 3

## Equally likely outcomes: example

Example
A computer system uses passwords that are 6 characters long, and each character
is one of the 26 letters (a-z) or 10 digits (0-9). Uppercase letters are not
used. Let A denote the event that a password begins with a vowel (a, e, i, o or
u) and let B denote the event that a password ends with an even number
(0, 2, 4, 6 or 8). Suppose a hacker selects a password at random.
What are the probabilities $P(A)$, $P(B)$, $P(A \cap B)$ and $P(A \cup B)$?

All passwords are equally likely to be selected, so the classical definition of
probability applies. The total number of cases is $36^6 = 2{,}176{,}782{,}336$.

$$P(A) = \frac{5 \times 36^5}{36^6} = \frac{5}{36} = 0.1389$$

$$P(B) = \frac{36^5 \times 5}{36^6} = \frac{5}{36} = 0.1389$$

$$P(A \cap B) = \frac{5 \times 36^4 \times 5}{36^6} = \frac{25}{36^2} = 0.0193$$

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 2 \times 0.1389 - 0.0193 = 0.2585$$
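As a check on the counting argument, the following sketch recomputes the four probabilities with exact fractions. The variable names are illustrative; the counts follow the classical definition (favourable cases over total cases).

```python
from fractions import Fraction

ALPHABET = 36  # 26 lowercase letters + 10 digits
LENGTH = 6
total = ALPHABET ** LENGTH

# Favourable counts: fix the constrained positions, leave the rest free.
n_a = 5 * ALPHABET ** (LENGTH - 1)        # first character is a vowel
n_b = ALPHABET ** (LENGTH - 1) * 5        # last character is an even digit
n_ab = 5 * ALPHABET ** (LENGTH - 2) * 5   # both constraints at once

p_a = Fraction(n_a, total)
p_b = Fraction(n_b, total)
p_ab = Fraction(n_ab, total)
p_union = p_a + p_b - p_ab  # inclusion-exclusion

print(total, p_a, p_b, p_ab, p_union, float(p_union))
```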


## Equally likely outcomes: example

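Example: the birthday problem
If n people are present in a room, what is the probability that at least two of
them celebrate their birthday on the same day of the year? How large does n need
to be so that this probability is more than 1/2?

We have:
$$P(\text{all } n \text{ birthdays are different}) = \binom{365}{n} \frac{n!}{365^n},$$
so that
$$P(\text{at least two share a birthday}) = 1 - \binom{365}{n} \frac{n!}{365^n}$$

Prob > 1/2 $\iff n \geq 23$ (at $n = 23$ the probability is about 0.507).

[Figure: this probability ("prob", from 0 to 1) plotted against n, for n up to 100.]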
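A minimal sketch of the birthday computation, assuming 365 equally likely birthdays and independence between people. It builds the "all different" probability as a running product rather than through factorials, which avoids very large intermediate numbers.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), equally likely days assumed."""
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


# Smallest group size for which the probability exceeds 1/2.
smallest_n = next(n for n in range(1, 366) if p_shared_birthday(n) > 0.5)

print(smallest_n, p_shared_birthday(smallest_n))
```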


Example
A bin contains 5 defective, 10 partially defective and 25 acceptable
transistors. Defective transistors immediately fail when put in use, while
partially defective ones fail after a couple of hours of use. A transistor is
chosen at random from the bin and put into use. If it does not immediately
fail, what is the probability it is acceptable?

Define the following events:
A = the selected transistor is acceptable, $P(A) = \frac{25}{40}$
PD = it is partially defective, $P(PD) = \frac{10}{40}$
D = it is defective, $P(D) = \frac{5}{40}$
F = it fails immediately, $P(F) = P(D) = \frac{5}{40}$

Now,
$$P(A \mid F^c) = P(F^c \mid A)\,\frac{P(A)}{P(F^c)} = 1 \times \frac{25/40}{1 - 5/40} = \frac{25}{35}$$


Example
We toss two fair dice and denote $E_1$ = "the sum of the dice is six", $E_2$ = "the sum
of the dice is seven" and $F$ = "the first die shows four". Are $E_1$ and $F$
independent? Are $E_2$ and $F$ independent?

Recall that $S = \{(1, 1), (1, 2), (1, 3), \ldots, (6, 5), (6, 6)\}$ (there are thus 36
possible outcomes).

$E_1 = \{(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)\}$, so $P(E_1) = 5/36$

$E_2 = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\}$, so $P(E_2) = 6/36$

$F = \{(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)\}$, so $P(F) = 6/36$

$E_1 \cap F = \{(4, 2)\}$, so $P(E_1 \cap F) = 1/36$

$E_2 \cap F = \{(4, 3)\}$, so $P(E_2 \cap F) = 1/36$

Hence, $P(E_1 \cap F) \neq P(E_1)P(F)$ and $P(E_2 \cap F) = P(E_2)P(F)$:
$E_2$ and $F$ are independent, but $E_1$ and $F$ are not.
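The independence check can also be done by brute-force enumeration of the 36 outcomes. This is a sketch only; the helper names `prob` and `independent` are made up for the illustration, and exact fractions are used so the equality test is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def independent(e, f):
    """True when P(E and F) equals P(E) P(F)."""
    return prob(lambda o: e(o) and f(o)) == prob(e) * prob(f)


def sum_is_six(o):
    return o[0] + o[1] == 6


def sum_is_seven(o):
    return o[0] + o[1] == 7


def first_is_four(o):
    return o[0] == 4


print(independent(sum_is_six, first_is_four))
print(independent(sum_is_seven, first_is_four))
```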

Example
A new medical procedure has been shown to be effective in the early
detection of an illness and a medical screening of the population is proposed.
The probability that the test correctly identifies someone with the illness as
positive is 0.99, and the probability that someone without the illness is
correctly identified by the test is 0.95. The incidence of the illness in the
general population is 0.0001. You take the test, and the result is positive.
What is the probability that you have the illness?
Let I = the event that you have the illness and T = a positive outcome of the screening
test for the illness. From the question we have
$$P(T \mid I) = 0.99, \qquad P(T^c \mid I^c) = 0.95, \qquad P(I) = 0.0001.$$

We aim to find $P(I \mid T)$. Using Bayes' second rule,
$$P(I \mid T) = \frac{P(T \mid I)P(I)}{P(T \mid I)P(I) + P(T \mid I^c)P(I^c)} = \frac{0.99 \times 0.0001}{0.99 \times 0.0001 + 0.05 \times 0.9999} = 0.001976$$
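The Bayes computation can be sketched as a small function. The parameter names `sensitivity` (for $P(T \mid I)$) and `specificity` (for $P(T^c \mid I^c)$) are labels chosen for this illustration and do not appear in the example itself.

```python
def posterior(prior, sensitivity, specificity):
    """P(ill | positive test) via Bayes' rule.

    prior       = P(I)
    sensitivity = P(T | I)      (assumed name)
    specificity = P(T^c | I^c)  (assumed name)
    """
    true_positive = sensitivity * prior
    false_positive = (1 - specificity) * (1 - prior)
    return true_positive / (true_positive + false_positive)


p_ill_given_positive = posterior(prior=0.0001, sensitivity=0.99, specificity=0.95)
print(p_ill_given_positive)
```

Even with an accurate test, the posterior stays small because the illness is so rare that false positives vastly outnumber true positives.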

Example
Consider a multiple-choice test with m alternatives for each
question. A student knows the answer to a given question with probability p.
If she does not know, she guesses. Given that the student correctly answered
a question, what is the probability that she actually knew the answer?

Let C = "she answers the question correctly" and K = "she knows the
answer". Then, we desire $P(K \mid C)$. We have
$$P(K \mid C) = P(C \mid K)\,\frac{P(K)}{P(C)} = \frac{P(C \mid K)P(K)}{P(C \mid K)P(K) + P(C \mid K^c)P(K^c)} = \frac{1 \times p}{1 \times p + (1/m)(1 - p)} = \frac{mp}{1 + (m - 1)p}$$
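To see that the simplification to $mp / (1 + (m - 1)p)$ is right, the sketch below evaluates both the Bayes form and the closed form with exact fractions. The function names are illustrative, and the values $m = 4$, $p = 1/2$ are an arbitrary test case rather than part of the example.

```python
from fractions import Fraction


def p_knew_given_correct(m, p):
    """Bayes form: P(C|K)P(K) / (P(C|K)P(K) + P(C|K^c)P(K^c))."""
    p = Fraction(p)
    return (1 * p) / (1 * p + Fraction(1, m) * (1 - p))


def closed_form(m, p):
    """Simplified form mp / (1 + (m - 1)p)."""
    p = Fraction(p)
    return m * p / (1 + (m - 1) * p)


# Arbitrary illustration: 4 alternatives, student knows half the answers.
example = p_knew_given_correct(4, Fraction(1, 2))
print(example, closed_form(4, Fraction(1, 2)))
```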


Week 4

## Continuous random variables: examples

Examples of continuous random variables include: electrical current, length,
pressure, temperature, time, voltage, weight, speed of a car, amount of
alcohol in a person's blood, efficiency of a solar collector, strength of a new
alloy, ...
Continuous random variables generally arise when we measure things.

Example
Let X denote the current measured in a thin copper wire (in mA). Assume
that the pdf of X is
$$f(x) = \begin{cases} C(4x - 2x^2) & \text{if } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$
What is the value of C? Find $P(X > 1.8)$.

We must have $\int_{-\infty}^{+\infty} f(x)\,dx = 1$, so $C \int_0^2 (4x - 2x^2)\,dx = C \times \frac{8}{3} = 1$, that is,
$C = 3/8$.

Then, $P(X > 1.8) = \int_{1.8}^{+\infty} f(x)\,dx = \frac{3}{8} \int_{1.8}^{2} (4x - 2x^2)\,dx = 0.028$.
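Both integrals can be checked with the antiderivative $2x^2 - \tfrac{2}{3}x^3$ of $4x - 2x^2$. The sketch below is a plain numerical check in floating point; the function name `antiderivative` is illustrative.

```python
def antiderivative(x):
    """An antiderivative of 4x - 2x^2."""
    return 2 * x**2 - 2 * x**3 / 3


# Normalising constant: C times the integral over (0, 2) must equal 1.
c = 1 / (antiderivative(2) - antiderivative(0))

# The pdf is zero beyond x = 2, so the tail integral stops there.
p_above_1_8 = c * (antiderivative(2) - antiderivative(1.8))

print(c, p_above_1_8)
```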

Expectation: examples
Example 1
What is the expectation of the outcome when a fair die is rolled?
$X$ = outcome, $S_X = \{1, 2, 3, 4, 5, 6\}$ with $p(x) = 1/6$ for any $x \in S_X$, so
$$\mu = E(X) = 1 \times \tfrac{1}{6} + 2 \times \tfrac{1}{6} + 3 \times \tfrac{1}{6} + 4 \times \tfrac{1}{6} + 5 \times \tfrac{1}{6} + 6 \times \tfrac{1}{6} = 3.5$$
Note that $\mu$ need not be a possible outcome!
Also, $\mu$ is not the most likely outcome (that is called the mode).


Example 2
What is the expected sum when two fair dice are rolled?
$X$ = sum of the two dice,
$S_X = \{2, 3, \ldots, 12\}$ with
$p(x) = (6 - |7 - x|)/36$ for any $x \in S_X$, so
$$\mu = E(X) = 2 \times \tfrac{1}{36} + 3 \times \tfrac{2}{36} + \ldots + 12 \times \tfrac{1}{36} = 7$$

Example 3: Bernoulli r.v. (see Slide 13)
What is the expectation of a Bernoulli r.v.?
With $\pi = P(X = 1)$ the success probability,
$$E(X) = 0 \times (1 - \pi) + 1 \times \pi = \pi$$


Variance: examples
Example 1
What is the variance of the number of points shown when a fair die is rolled?
$X$ = outcome, $S_X = \{1, 2, 3, 4, 5, 6\}$ with $p(x) = 1/6$ for any $x \in S_X$
$$E(X^2) = 1^2 \times \tfrac{1}{6} + 2^2 \times \tfrac{1}{6} + 3^2 \times \tfrac{1}{6} + 4^2 \times \tfrac{1}{6} + 5^2 \times \tfrac{1}{6} + 6^2 \times \tfrac{1}{6} = 91/6$$
We know that $\mu = 3.5$ (Slide 23), so that
$$\sigma^2 = E(X^2) - \mu^2 = 91/6 - 3.5^2 \simeq 2.92$$
The standard deviation is $\sigma = \sqrt{2.92} \simeq 1.71$.

Example 2
What is the variance of the sum of the points when 2 fair dice are rolled?
(Exercise) Check that $\sigma^2 \simeq 5.83$, $\sigma \simeq 2.41$
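One way to work the exercise is to build each probability mass function explicitly and apply $\sigma^2 = E(X^2) - \mu^2$. The sketch below does this with exact fractions; the helper name `moments` is illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def moments(pmf):
    """Return (mean, variance) of a discrete pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    return mu, second_moment - mu**2


one_die = {x: Fraction(1, 6) for x in range(1, 7)}

# pmf of the sum of two fair dice, built by enumerating the 36 outcomes.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

mu1, var1 = moments(one_die)
mu2, var2 = moments(two_dice)
print(mu1, var1, sqrt(var1))
print(mu2, var2, sqrt(var2))
```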

## Expectation of a function of two random variables

For instance, in the continuous case,
$$\begin{aligned}
E(aX + bY) &= \int_{S_X} \int_{S_Y} (ax + by)\, f_{XY}(x, y)\,dy\,dx \\
&= \int_{S_X} \int_{S_Y} ax\, f_{XY}(x, y)\,dy\,dx + \int_{S_X} \int_{S_Y} by\, f_{XY}(x, y)\,dy\,dx \\
&= a \int_{S_X} x \int_{S_Y} f_{XY}(x, y)\,dy\,dx + b \int_{S_Y} y \int_{S_X} f_{XY}(x, y)\,dx\,dy \\
&= a \int_{S_X} x f_X(x)\,dx + b \int_{S_Y} y f_Y(y)\,dy \\
&= aE(X) + bE(Y)
\end{aligned}$$

Example
What is the expected sum obtained when two fair dice are rolled?
Let $X$ be the sum and $X_i$ the value shown on the $i$th die. Then, $X = X_1 + X_2$,
and
$$E(X) = E(X_1) + E(X_2) = 2 \times 3.5 = 7$$

Week 5

## Binomial distribution: examples

Example
It is known that disks produced by a certain company will be defective with
probability 0.01 independently of each other. The company sells the disks in
packages of 10 and offers a money-back guarantee if more than 1 of the
disks is defective. a) In the long run, what proportion of packages is
returned? b) If someone buys three packages, what is the probability that
exactly one of them will be returned?

a) Let X be the number of defective disks in a package. Then, it is clear that
$X \sim \text{Bin}(10, 0.01)$.
Hence,
$$P(X > 1) = 1 - P(X = 0) - P(X = 1) = 1 - \binom{10}{0} 0.01^0\, 0.99^{10} - \binom{10}{1} 0.01^1\, 0.99^9 \simeq 0.0043$$
In the long run, about 0.4 percent of the packages will have to be replaced.
b) Let Y be the number of packages that the person will have to return. We
have
$Y \sim \text{Bin}(3, \pi)$
where $\pi$ is the probability that a package is returned, that is, contains more
than 1 defective disk. In a), we found that $\pi \simeq 0.004$ (more precisely, $\pi \simeq 0.0043$).
Thus, the probability that exactly one of the three packages will be returned is
$$P(Y = 1) = \binom{3}{1} 0.0043^1 \times 0.9957^2 \simeq 0.013$$
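Both parts of the disk example can be recomputed without intermediate rounding. This is a sketch with a hand-written binomial pmf built on `math.comb`; the function name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# a) A package is returned when more than 1 of its 10 disks is defective.
p_return = 1 - binom_pmf(0, 10, 0.01) - binom_pmf(1, 10, 0.01)

# b) Exactly one of three independent packages is returned.
p_one_of_three = binom_pmf(1, 3, p_return)

print(p_return, p_one_of_three)
```

Carrying the unrounded return probability into part b) matters here: rounding it to 0.004 or 0.005 first shifts the final answer in the third decimal place.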

## Poisson distribution: examples

Example
Over a 10-minute period, a counter records an average of 1.3 gamma
particles per millisecond coming from a radioactive substance. To a good
approximation, the count, X, of gamma particles during the
next millisecond is Poisson distributed. Determine a) $\lambda$, b) the probability of
observing one or more gamma particles during the next millisecond and c)
the variance of this number.

a) The mean of the Poisson distribution is $\lambda$, so we can approximate $\lambda$ by the
long-run average of the number of particles per millisecond, that is, $\lambda \simeq 1.3$.
So we have
$X \sim \mathcal{P}(1.3)$

b) Thus,
$$P(X \geq 1) = 1 - P(X = 0) = 1 - e^{-1.3}\,\frac{1.3^0}{0!} = 1 - e^{-1.3} = 0.727$$

c) The variance of the Poisson distribution is also equal to $\lambda$, hence
$\text{Var}(X) = 1.3$ (particles$^2$)
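A short sketch of parts b) and c), using a hand-written Poisson pmf. The variance is approximated by summing the series up to a cut-off of 60 terms, which is an arbitrary choice that is far more than enough for a mean of 1.3.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 1.3

# b) One or more particles is the complement of observing none.
p_at_least_one = 1 - poisson_pmf(0, lam)

# c) Numerical check that the variance equals lam (series truncated at 60 terms).
ks = range(60)
mean_x = sum(k * poisson_pmf(k, lam) for k in ks)
var_x = sum(k * k * poisson_pmf(k, lam) for k in ks) - mean_x**2

print(p_at_least_one, mean_x, var_x)
```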


## Uniform distribution: example

The probability that X lies in any subinterval $[a, b]$ of $[\alpha, \beta]$ is:
$$P(a < X < b) = \frac{b - a}{\beta - \alpha} \qquad \text{(area of a rectangle)}$$

Example
Buses arrive at a specified stop at 15-minute intervals starting at 7 a.m. That
is, they arrive at 7, 7:15, 7:30, 7:45, etc. If a passenger arrives at the stop at a
time uniformly distributed between 7 and 7:30, find the probability that he
waits less than 5 minutes for a bus.

Let X denote the time (in minutes) past 7 a.m. that the passenger arrives at
the stop. We have $X \sim U_{[0,30]}$.
The passenger will have to wait less than 5 min if he arrives between 7:10
and 7:15 or between 7:25 and 7:30. This happens with probability
$$P((10 < X < 15) \cup (25 < X < 30)) = P(10 < X < 15) + P(25 < X < 30) = \frac{5}{30} + \frac{5}{30} = \frac{1}{3}$$

## Exponential distribution: example

Example
Suppose that, on average, 3 trucks arrive per hour to be unloaded at a
warehouse. What is the probability that the time between the arrivals of two
successive trucks will be a) less than 5 minutes? b) at least 45 minutes?

Assuming the number of trucks arriving during one hour is Poisson distributed
(with parameter $\lambda = 3$), the amount of time X (in hours) between two truck arrivals
follows the Exp(3) distribution. Since 5 minutes is 1/12 of an hour and 45 minutes is 3/4 of an hour,

a) $P(X \leq 1/12) = \int_0^{1/12} 3e^{-3x}\,dx = 1 - e^{-1/4} = 0.221$

b) $P(X > 3/4) = \int_{3/4}^{+\infty} 3e^{-3x}\,dx = e^{-9/4} = 0.105$
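The two exponential probabilities follow directly from the cdf $1 - e^{-\lambda x}$. In this sketch time is measured in hours to match the rate of 3 trucks per hour; the function name `exp_cdf` is illustrative.

```python
from math import exp


def exp_cdf(x, rate):
    """P(X <= x) for X ~ Exp(rate); zero for negative x."""
    return 1 - exp(-rate * x) if x > 0 else 0.0


rate = 3  # trucks per hour, so x must be expressed in hours

p_less_than_5_min = exp_cdf(5 / 60, rate)
p_at_least_45_min = 1 - exp_cdf(45 / 60, rate)

print(p_less_than_5_min, p_at_least_45_min)
```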


Week 6

Example
A sample of n = 28 heavy-smoking men aged between 35 and 57 had
a sample mean testosterone level of 795.1 nanograms per decilitre. In
the general population of non-smoking men of this age, we can
assume that testosterone is normally distributed with mean 620.6 and
standard deviation 201.5. Are the results for the sample of smokers
consistent with the distribution for non-smokers?

Let $X_i$ represent the testosterone level of a generic non-smoking man aged
between 35 and 57. Then $X_i \sim N(620.6, 201.5)$ for $i = 1, 2, \ldots, n$ (the second argument here is the standard deviation). The
sample average of n = 28 non-smoking men in this age bracket therefore satisfies
$\bar{X} \sim N(620.6, 38.08)$, since $\sigma/\sqrt{n} = 201.5/\sqrt{28} \simeq 38.08$. Compare the
observed value of the average testosterone level in the smoking men with the
distribution of the non-smoking men:
$$P(\bar{X} \geq 795.1) = P\left(\frac{\bar{X} - 620.6}{38.08} \geq \frac{795.1 - 620.6}{38.08}\right) \simeq P(Z \geq 4.58) = 1 - \Phi(4.58) < 0.0002$$
(Tables finish at 3.49.)

A sample mean this large would be extremely unlikely for non-smokers, so the results for the smokers are not consistent with the non-smoker distribution.
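Where printed tables stop at 3.49, the tail probability can be evaluated numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 620.6, 201.5, 28
observed_mean = 795.1

# Standard deviation of the sample mean of n independent observations.
standard_error = sigma / sqrt(n)

# Standardise the observed mean and take the upper tail of the standard normal.
z = (observed_mean - mu) / standard_error
upper_tail = 1 - NormalDist().cdf(z)

print(standard_error, z, upper_tail)
```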


Week 7

## Confidence interval on the mean of a normal distribution, variance known: example
Example
The Charpy V-notch (CVN) technique measures impact energy and is often
used to determine whether or not a material experiences a ductile-to-brittle
transition with decreasing temperature. Ten measurements of impact energy
(in J) on specimens of steel cut at 60 °C are as follows:
64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3
Assume that impact energy is normally distributed with $\sigma = 1$ J. a) Find a 95%
CI for $\mu$, the mean impact energy for that kind of steel.

An elementary computation yields $\bar{x} = 64.46$ J. With $n = 10$, $\sigma = 1$ and
$\alpha = 0.05$, direct application of the previous results gives a 95% CI as follows:
$$\left[\bar{x} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = \left[64.46 - 1.96 \times \frac{1}{\sqrt{10}},\ 64.46 + 1.96 \times \frac{1}{\sqrt{10}}\right] = [63.84, 65.08]$$
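The interval in part a) can be reproduced with the standard library. In this sketch the quantile $z_{1-\alpha/2}$ is obtained from `statistics.NormalDist().inv_cdf` rather than read from a table, so it is 1.95996... instead of the rounded 1.96; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

energies = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0   # known standard deviation (J)
alpha = 0.05  # 1 - confidence level

x_bar = mean(energies)
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * sigma / sqrt(len(energies))

ci = (x_bar - half_width, x_bar + half_width)
print(x_bar, ci)
```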
Example (ctd.)
b) Determine how many specimens we should test to ensure that the 95% CI
on the mean impact energy has a length of at most 1 J.

The length of the CI in part a) is 1.24 J. If we desire a higher precision, namely
a confidence interval length of 1 J, then we need more than 10 observations.
The bound on the estimation error $e$ is one-half of the length of the CI, so we use
the expression on Slide 26 with $e = 0.5$, $\sigma = 1$ and $\alpha = 0.05$:
$$n = \left(\frac{z_{1-\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{1.96 \times 1}{0.5}\right)^2 = 15.37$$

As n must be an integer, the required sample size is 16.
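The sample-size rule can be wrapped in a small helper that rounds up to the next integer. This is a sketch; the function name `required_sample_size` is made up for the illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, max_error, alpha):
    """Smallest n for which z_{1-alpha/2} * sigma / sqrt(n) <= max_error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / max_error) ** 2)


# A CI of length at most 1 J means an error bound of 0.5 J.
n_required = required_sample_size(sigma=1.0, max_error=0.5, alpha=0.05)
print(n_required)
```

Rounding up rather than to the nearest integer is what guarantees the length requirement: with 15 specimens the interval would still be slightly longer than 1 J.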
