
Study Material

for

Probability and Statistics
AAOC ZC111

Distance Learning Programmes Division


Birla Institute of Technology & Science
Pilani 333031 (Rajasthan)
July 2003

Course Developed by

M.S.Radhakrishnan

Word Processing & Typesetting by


Narendra Saini
Ashok Jitawat

Contents
                                                                        Page No.
INTRODUCTION, SAMPLE SPACES & EVENTS
Probability
Events
AXIOMS OF PROBABILITY
Some elementary consequences of the Axioms
Finite Sample Space (in which all outcomes are equally likely)
CONDITIONAL PROBABILITY                                                   11
Independent events                                                        11
Theorem on Total Probability                                              14
BAYES THEOREM                                                             16
MATHEMATICAL EXPECTATION & DECISION MAKING                                22
RANDOM VARIABLES                                                          26
Discrete Random Variables                                                 27
Binomial Distribution                                                     28
Cumulative Binomial Probabilities                                         29
Binomial Distribution (Sampling with replacement)                         31
Mode of a Binomial distribution                                           31
Hypergeometric Distribution (Sampling without replacement)               32
Binomial distribution as an approximation to the Hypergeometric
Distribution                                                              34
THE MEAN AND VARIANCE OF PROBABILITY DISTRIBUTIONS                        36
The mean of a Binomial Distribution                                       37
Digression                                                                37
Chebychev's theorem                                                       39
Law of large numbers                                                      41
Poisson Distribution                                                      42
Poisson approximation to binomial distribution                            42
Cumulative Poisson distribution                                           43
Poisson Process                                                           43
The Geometric Distribution                                                46
Multinomial Distribution                                                  52
Simulation                                                                54
CONTINUOUS RANDOM VARIABLES                                               56
Probability Density Function (pdf)                                        57
Normal Distribution                                                       64
Normal Approximation to Binomial Distribution                             69
Correction for Continuity                                                 70
Other Probability Densities                                               71
The Uniform Distribution                                                  71
Gamma Function                                                            73
Properties of Gamma Function                                              74
The Gamma Distribution                                                    74
Exponential Distribution                                                  74
Beta Distribution                                                         78
The Log-Normal Distribution                                               79
JOINT DISTRIBUTIONS: TWO AND HIGHER DIMENSIONAL RANDOM VARIABLES          83
Conditional Distribution                                                  86
Independence                                                              87
Two-Dimensional Continuous Random Variables                               88
Marginal and Conditional Densities                                        90
Independence                                                              91
The Cumulative Distribution Function                                      93
Properties of Expectation                                                100
Sample Mean                                                              101
Sample Variance                                                          102
SAMPLING DISTRIBUTION                                                    115
Statistical Inference                                                    115
Statistics                                                               116
The Sampling Distribution of the Sample Mean X                           117
Inferences Concerning Means                                              128
Point Estimation                                                         128
Estimation of n                                                          130
Estimation of Sample proportion                                          143
Large Samples                                                            143
Tests of Statistical Hypothesis                                          148
Notation                                                                 149
REGRESSION AND CORRELATION                                               164
Regression                                                               164
Correlation                                                              167
Sample Correlation Coefficient                                           167

INTRODUCTION, SAMPLE SPACES & EVENTS


Probability
Let E be a random experiment (where we know all possible outcomes but can't predict
what the particular outcome will be when the experiment is conducted). The set of all
possible outcomes is called a sample space for the random experiment E.
Example 1:
Let E be the random experiment:
Toss two coins and observe the sequence of heads and tails. A sample space for this
experiment could be S = {HH, TH, HT, TT}. If however we only observe the number
of heads obtained, the sample space would be S = {0, 1, 2}.
Example 2:
Let E be the random experiment:
Toss two fair dice and observe the two numbers on the top. A sample space would be

S = { (1,1), (1,2), (1,3), ..., (1,6),
      (2,1), (2,2), (2,3), ..., (2,6),
      (3,1), ...
      ...
      (6,1), ...        ..., (6,6) }

If however, we are interested only in the sum of the two numbers on the top, the
sample space could be S = {2, 3, ..., 12}.
Example 3:
Let E be the random experiment:
Count the number of machines produced by a factory until a defective machine is
produced. A sample space for this experiment could be S = {1, 2, 3, ...}.

Example 4:
Let E be the random experiment:
Count the life length of a bulb produced by a factory.
Here S will be {t | t ≥ 0} = [0, ∞).
Events
An event is a subset of the sample space.
Example 5:
Suppose a balanced die is rolled and we observe the number on the top. Let A be the
event: an even number occurs.
Thus in symbols,

A = {2, 4, 6} ⊂ S = {1, 2, 3, 4, 5, 6}
Two events are said to be mutually exclusive if they cannot occur together; that is, they
have no element in common.
In the above example if B is the event: an odd number occurs, i.e. B = {1,3,5} , then A and
B are mutually exclusive.
Solved Examples
Example 1:
A manufacturer of small motors is concerned with three major types of defects. If A is
the event that the shaft size is too large, B is the event that the windings are improper and
C is the event that the electrical connections are unsatisfactory, express in words what
events are represented by the following regions of the Venn diagram given below:
(a) region 2 (b) regions 1 and 3 together (c) regions 3, 5, 6 and 8 together.

(Venn diagram: three intersecting circles A, B and C, with the eight regions numbered 1 to 8; figure not reproduced here.)

Solution:
(a) Since this region is contained in A and B but not in C, it represents the event that
the shaft is too large and the windings improper but the electrical connections are
satisfactory.
(b) Since this region is common to B and C, it represents the event that the windings
are improper and the electrical connections are unsatisfactory.
(c) Since this is the entire region outside A, it represents the event that the shaft size
is not too large.
Example 2:
A carton of 12 rechargeable batteries contains one that is defective. In how many ways can
the inspector choose three of the batteries and
(a) get the one that is defective
(b) not get the one that is defective.
Solution:
(a) The one defective can be chosen in one way and two good ones can be chosen in
11C2 = 55 ways. Hence one defective and two good ones can be chosen in 1 x 55 = 55
ways.
(b) Three good ones can be chosen in 11C3 = 165 ways.

AXIOMS OF PROBABILITY
Let E be a random experiment. Suppose to each event A, we associate a real number
P(A) satisfying the following axioms:

(i) 0 ≤ P(A) ≤ 1
(ii) P(S) = 1
(iii) If A and B are any two mutually exclusive events, then P(A ∪ B) = P(A) + P(B)
(iv) If {A1, A2, ..., An, ...} is a sequence of pair-wise mutually exclusive events, then
     P(A1 ∪ A2 ∪ ... ∪ An ∪ ...) = P(A1) + P(A2) + ... + P(An) + ...
We call P(A) the probability of the event A.


Axiom 1 says that the probability of an event is always a number between 0 and 1.
Axiom 2 says that the probability of the certain event S is 1. Axiom 3 says that the
probability is an additive set function.
Some elementary consequences of the Axioms
1. P(∅) = 0
Proof: S = S ∪ ∅. Now S and ∅ are disjoint.
Hence P(S) = P(S) + P(∅), so P(∅) = 0. Q.E.D.

2. If A1, A2, ..., An are any n pair-wise mutually exclusive events, then
P(A1 ∪ A2 ∪ ... ∪ An) = Σ (i = 1 to n) P(Ai).
Proof: By induction on n.

Def.: If A is an event, Ā (the complementary event) = S − A. (It is the shaded portion of the Venn diagram, not reproduced here.)

3. P(Ā) = 1 − P(A)
Proof: S = A ∪ Ā.
Now P(S) = P(A) + P(Ā), as A and Ā are disjoint, or 1 = P(A) + P(Ā).
Thus P(Ā) = 1 − P(A). Q.E.D.

4. Probability is a subtractive set function; i.e.
if A ⊂ B, then P(B − A) = P(B) − P(A).

5. Probability is a monotone set function:
i.e. A ⊂ B ⟹ P(A) ≤ P(B)
Proof: B = A ∪ (B − A), where A and B − A are disjoint.
Thus P(B) = P(A) + P(B − A) ≥ P(A).

6. If A, B are any two events,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof:
A ∪ B = A ∪ (Ā ∩ B), where A and Ā ∩ B are disjoint.
Hence P(A ∪ B) = P(A) + P(Ā ∩ B).
But B = (A ∩ B) ∪ (Ā ∩ B), a union of two disjoint sets, so
P(B) = P(A ∩ B) + P(Ā ∩ B),
or P(Ā ∩ B) = P(B) − P(A ∩ B).
Therefore P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Q.E.D.

7. If A, B, C are any three events,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(C ∩ A) + P(A ∩ B ∩ C).

Proof:
P(A ∪ B ∪ C) = P(A ∪ B) + P(C) − P((A ∪ B) ∩ C)
= P(A) + P(B) − P(A ∩ B) + P(C) − P((A ∩ C) ∪ (B ∩ C))
= P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

More generally,
8. If A1, A2, ..., An are any n events,
P(A1 ∪ A2 ∪ ... ∪ An)
= Σ (i = 1 to n) P(Ai) − Σ (1 ≤ i < j ≤ n) P(Ai ∩ Aj) + Σ (1 ≤ i < j < k ≤ n) P(Ai ∩ Aj ∩ Ak) − ...
  + (−1)^(n−1) P(A1 ∩ A2 ∩ ... ∩ An).

Finite Sample Space (in which all outcomes are equally likely)
Let E be a random experiment having only a finite number of outcomes.
Let all the (finite no. of) outcomes be equally likely.
If S = {a1, a2, ..., an} (a1, a2, ..., an being equally likely outcomes), then
S = {a1} ∪ {a2} ∪ ... ∪ {an}, a union of m.e. events.
Hence P(S) = P({a1}) + P({a2}) + ... + P({an}).
But P({a1}) = P({a2}) = ... = P({an}) = p (say).
Hence 1 = p + p + ... + p (n terms), or p = 1/n.
Hence if A is a subset consisting of k of these outcomes, A = {a1, a2, ..., ak}, then
P(A) = k/n = (No. of favourable outcomes)/(Total no. of outcomes).
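As an illustrative aside (a minimal Python sketch, not part of the original notes), the rule P(A) = favourable/total can be checked by brute-force enumeration; here it reproduces the probability of a sum of 7 with two balanced dice, worked out in Example 2 below.

```python
from itertools import product

# Enumerate the 36 equally likely outcomes for two balanced dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the sum of the two numbers on top is 7.
favourable = [outcome for outcome in sample_space if sum(outcome) == 7]

# Classical probability: favourable outcomes / total outcomes.
p = len(favourable) / len(sample_space)
print(len(favourable), len(sample_space), p)   # 6 36 0.1666...
```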

Example 1:
If a card is drawn from a well-shuffled pack of 52 cards, find the probability of drawing
(a) a red king                       Ans: 2/52
(b) a 3, 4, 5 or 6                   Ans: 16/52
(c) a black card                     Ans: 1/2
(d) a red ace or a black queen       Ans: 4/52

Example 2:
When a pair of balanced dice is thrown, find the probability of getting a sum equal to
(a) 7.           Ans: 6/36 = 1/6 (The total number of equally likely outcomes is 36 and
                 the favourable number of outcomes is 6, namely (1,6), (2,5), ..., (6,1).)
(b) 11.          Ans: 2/36
(c) 7 or 11.     Ans: 8/36
(d) 2, 3 or 12.  Ans: 1/36 + 2/36 + 1/36 = 4/36.

Example 3:
10 persons in a room are wearing badges marked 1 through 10. 3 persons are chosen at
random and asked to leave the room simultaneously and their badge nos are noted. Find
the probability that
(a) the smallest badge number is 5.
(b) the largest badge number is 5.
Solution:
(a) 3 persons can be chosen in 10C3 equally likely ways. If the smallest badge
number is to be 5, the badge numbers should be 5 and any two of the 5
numbers 6, 7, 8, 9, 10. Now 2 numbers out of 5 can be chosen in 5C2 ways.
Hence the probability that the smallest badge number is 5 is 5C2 / 10C3.
(b) Ans. 4C2 / 10C3.

Example 4:
A lot consists of 10 good articles, 4 articles with minor defects and 2 with major defects.
Two articles are chosen at random. Find the probability that
(a) both are good                Ans: 10C2 / 16C2
(b) both have major defects      Ans: 2C2 / 16C2
(c) at least one is good         Ans: 1 − P(none is good) = 1 − 6C2 / 16C2
(d) exactly one is good          Ans: (10C1 x 6C1) / 16C2
(e) at most one is good          Ans: P(none is good) + P(exactly one is good)
                                      = 6C2 / 16C2 + (10C1 x 6C1) / 16C2
(f) neither has major defects    Ans: 14C2 / 16C2
(g) neither is good              Ans: 6C2 / 16C2

Example 5:
From 6 positive and 8 negative integers, 4 integers are chosen at random and multiplied.
Find the probability that their product is positive.

Solution:
The product is positive if all the 4 integers are positive or all of them are negative or two
of them are positive and the other two are negative. Hence the probability is
(6C4 + 8C4 + 6C2 x 8C2) / 14C4.

Example 6:
If A, B are mutually exclusive events and if P(A) = 0.29, P(B) = 0.43, then
(a) P(Ā) = 1 − 0.29 = 0.71
(b) P(A ∪ B) = 0.29 + 0.43 = 0.72
(c) P(A ∩ B̄) = P(A) = 0.29 (as A is a subset of B̄, since A and B are m.e.)
(d) P(Ā ∩ B̄) = 1 − P(A ∪ B) = 1 − 0.72 = 0.28

Example 7:
P(A) = 0.35, P(B) = 0.73, P(A ∩ B) = 0.14. Find
(a) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.94
(b) P(Ā ∩ B) = P(B) − P(A ∩ B) = 0.59
(c) P(A ∩ B̄) = P(A) − P(A ∩ B) = 0.21
(d) P(Ā ∪ B̄) = 1 − P(A ∩ B) = 1 − 0.14 = 0.86

Example 8:
A, B, C are 3 mutually exclusive events. Is this assignment of probabilities possible?
P(A) = 0.3, P(B) = 0.4, P(C) = 0.5

Ans. P(A ∪ B ∪ C) = P(A) + P(B) + P(C) = 1.2 > 1, so this assignment is NOT POSSIBLE.

Example 9:
Three newspapers are published in a city. A recent survey of readers indicated the
following:
20% read A
16% read B
14% read C

8% read A and B
5% read A and C
4% read B and C

2% read all

Find probability that an adult chosen at random reads


(a) none of the papers.
Ans. 1 − P(A ∪ B ∪ C) = 1 − (20 + 16 + 14 − 8 − 5 − 4 + 2)/100 = 1 − 0.35 = 0.65

(b) reads exactly one paper.
P(reading exactly one paper) = (9 + 6 + 7)/100 = 0.22
(A Venn diagram of the given percentages shows that 9%, 6% and 7% read only A, only B and only C respectively.)

(c) reads at least A and B given he reads at least one of the papers.
P(at least reading A and B | he reads at least one of the papers)
= P(A ∩ B) / P(A ∪ B ∪ C) = 8/35

CONDITIONAL PROBABILITY
Let A, B be two events. Suppose P(B) ≠ 0. The conditional probability of A occurring
given that B has occurred is defined as
P(A | B) = probability of A given B = P(A ∩ B)/P(B).
Similarly we define P(B | A) = P(A ∩ B)/P(A), if P(A) ≠ 0.

Hence we get the multiplication theorem
P(A ∩ B) = P(A) P(B | A)   (if P(A) ≠ 0)
         = P(B) P(A | B)   (if P(B) ≠ 0)

Example 10
A bag contains 4 red balls and 6 black balls. 2 balls are chosen at random one by one
without replacement. Find the probability that both are red.

Solution
Let A be the event that the first ball drawn is red, B the event the second ball drawn is
red. Hence the probability that both balls drawn are red is
P(A ∩ B) = P(A) P(B | A) = (4/10) x (3/9) = 2/15.
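A quick way to sanity-check such multiplication-theorem calculations is Monte Carlo simulation. The Python sketch below is an illustrative addition (the sample size of 100,000 is an arbitrary choice): it draws two balls without replacement from the bag of Example 10 and estimates the probability that both are red, which should be close to 2/15.

```python
import random

def both_red(trials=100_000):
    """Estimate P(both balls red) when 2 balls are drawn without replacement
    from a bag of 4 red and 6 black balls."""
    bag = ["R"] * 4 + ["B"] * 6
    hits = 0
    for _ in range(trials):
        draw = random.sample(bag, 2)      # sampling without replacement
        if draw == ["R", "R"]:
            hits += 1
    return hits / trials

print(both_red())        # close to 2/15 = 0.1333...
print(4/10 * 3/9)        # exact value from the multiplication theorem
```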

Independent events:
Definition: We say two events A, B are independent if P(A ∩ B) = P(A) P(B).
Equivalently, A and B are independent if P(B | A) = P(B) or P(A | B) = P(A).
Theorem: If A, B are independent, then
(a) Ā, B are independent
(b) A, B̄ are independent
(c) Ā, B̄ are independent

Proof: B = (A ∩ B) ∪ (Ā ∩ B), a union of two mutually exclusive events.
P(B) = P(A ∩ B) + P(Ā ∩ B)
P(Ā ∩ B) = P(B) − P(A ∩ B)
= P(B) − P(A) P(B)
= P(B) [1 − P(A)]
= P(B) P(Ā)
∴ Ā and B are also independent.
By the same reasoning, A and B̄ are independent, and so again Ā and B̄ are independent.

Example 11
Find the probability of getting 8 heads in a row in 8 tosses of a fair coin.

Solution
If Ai is the event of getting a head in the ith toss, then A1, A2, ..., A8 are independent and
P(Ai) = 1/2 for all i. Hence P(getting all heads) =
P(A1) P(A2) ... P(A8) = (1/2)^8 = 1/256.

Example 12
It is found that in manufacturing a certain article, defects of one type occur with
probability 0.1 and defects of other type occur with probability 0.05. Assume
independence between the two types of defects. Find the probability that an article chosen
at random has exactly one type of defect given that it is defective.

Let A be the event that the article has exactly one type of defect.
Let B be the event that the article is defective.
Required: P(A | B) = P(A ∩ B)/P(B).
P(B) = P(D ∪ E), where D is the event that it has the type-one defect and E is the event
that it has the type-two defect.
= P(D) + P(E) − P(D ∩ E) = 0.1 + 0.05 − (0.1)(0.05) = 0.145
P(A ∩ B) = P(article has exactly one type of defect)
= P(D) + P(E) − 2 P(D ∩ E) = 0.1 + 0.05 − 2(0.1)(0.05) = 0.14
Probability = 0.14/0.145

[Note: If A and B are two events, the probability that exactly one of them occurs
is P(A) + P(B) − 2 P(A ∩ B).]

Example 13
An electronic system has 2 subsystems A and B. It is known that
P (A fails) = 0.2
P (B fails alone) = 0.15
P (A and B fail) = 0.15
Find (a) P (A fails | B has failed)
(b) P (A fails alone)

Solution
P(B fails) = P(B fails alone) + P(A and B fail) = 0.15 + 0.15 = 0.30.
(a) P(A fails | B has failed) = P(A and B fail)/P(B fails) = 0.15/0.30 = 1/2
(b) P(A fails alone) = P(A fails) − P(A and B fail) = 0.20 − 0.15 = 0.05

Example 14
A binary number is a number having digits 0 and 1. Suppose a binary number is made up
of n digits. Suppose the probability of forming an incorrect binary digit is p. Assume
independence between errors. What is the probability of forming an incorrect binary
number?
Ans: 1 − P(forming a correct number) = 1 − (1 − p)^n.
Example 15

A question paper consists of 5 Multiple choice questions each of which has 4 choices (of
which only one is correct). If a student answers all the five questions randomly, find the
probability that he answers all questions correctly.
Ans: (1/4)^5 = 1/1024.

Theorem on Total Probability


Let B1, B2, ..., Bn be n mutually exclusive events of which one must occur. If A is any
other event, then
P(A) = P(A ∩ B1) + P(A ∩ B2) + ... + P(A ∩ Bn) = Σ (i = 1 to n) P(Bi) P(A | Bi)

(For a proof, see your text book.)

Example 16
There are 2 urns. The first one has 4 red balls and 6 black balls. The second has 5 red
balls and 4 black balls. A ball is chosen at random from the 1st and put in the 2nd. Now a
ball is drawn at random from the 2nd urn. Find the probability it is red.
Solution:
Let B1 be the event that the first ball drawn is red and B2 be the event that the first ball
drawn is black. Let A be the event that the second ball drawn is red. By the theorem on
total probability,
P(A) = P(B1) P(A | B1) + P(B2) P(A | B2) = (4/10)(6/10) + (6/10)(5/10) = 54/100 = 0.54.

Example 17:
A consulting firm rents cars from three agencies D, E, F. 20% of the cars are rented from
D, 20% from E and the remaining 60% from F. If 10% of cars rented from D, 12% of
cars rented from E, 4% of cars rented from F have bad tires, find the probability that a
car rented from the consulting firm will have bad tires.

Ans.

(0.2) (0.1) + (0.2) (0.12) + (0.6) (0.04)

Example 18:
A bolt factory has three divisions B1, B2, B3 that manufacture bolts. 25% of output is
from B1, 35% from B2 and 40% from B3. 5% of the bolts manufactured by B1 are
defective, 4% of the bolts manufactured by B2 are defective and 2% of the bolts
manufactured by B3 are defective. Find the probability that a bolt chosen at random from
the factory is defective.

Ans.
(25/100)(5/100) + (35/100)(4/100) + (40/100)(2/100)

BAYES THEOREM
Let B1, B2, .Bn be n mutually exclusive events of which one of them must occur.
If A is any event, then
P(Bk | A) = P(A ∩ Bk)/P(A) = P(Bk) P(A | Bk) / Σ (i = 1 to n) P(Bi) P(A | Bi)
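The formula translates directly into a few lines of code. The Python sketch below is an illustration added here (not from the original notes): it computes the posteriors P(Bk | A) from priors and likelihoods, and is applied to the numbers of Example 20 further below, so the printed value should match that worked answer.

```python
def bayes(priors, likelihoods):
    """Return the posterior probabilities P(B_k | A) given
    priors P(B_k) and likelihoods P(A | B_k)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)            # P(A), by the theorem on total probability
    return [j / total for j in joints]

# Example 20: B1 = attended training (0.8), B2 = did not (0.2);
# P(meets quota | B1) = 0.86, P(meets quota | B2) = 0.35.
posterior = bayes([0.8, 0.2], [0.86, 0.35])
print(posterior[0])     # P(attended training | met quota), about 0.908
```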

Example 19
Miss X is fond of seeing films. The probability that she sees a film on the day before
the test is 0.7. Miss X is any way good at studies. The probability that she maxes the test
is 0.3 if she sees the film on the day before the test and the corresponding probability is
0.8 if she does not see the film. If Miss X maxed the test, find the probability that she
saw the film on the day before the test.

Solution
Let B1 be the event that Miss X saw the film before the test and let B2 be the
complementary event. Let A be the event that she maxed the test.
Required: P(B1 | A) = P(B1) P(A | B1) / [P(B1) P(A | B1) + P(B2) P(A | B2)]
= (0.7 x 0.3)/(0.7 x 0.3 + 0.3 x 0.8)

Example 20
At an electronics firm, it is known from past experience that the probability a new worker
who attended the company's training program meets the production quota is 0.86. The
corresponding probability for a new worker who did not attend the training program is
0.35. It is also known that 80% of all new workers attend the company's training
program. Find the probability that a new worker who met the production quota attended
the company's training programme.

Solution
Let B1 be the event that a new worker attended the company's training programme. Let
B2 be the complementary event, namely that a new worker did not attend the training
programme. Let A be the event that a new worker met the production quota. Then we
want P(B1 | A) = (0.8 x 0.86)/(0.8 x 0.86 + 0.2 x 0.35).

Example 21
A printing machine can print any one of n letters L1, L2,.Ln. It is operated by
electrical impulses, each letter being produced by a different impulse. Assume that there
is a constant probability p that any impulse prints the letter it is meant to print. Also
assume independence. One of the impulses is chosen at random and fed into the machine
twice. Both times, the letter L1 was printed. Find the probability that the impulse chosen
was meant to print the letter L1.

Solution:
Let B1 be the event that the impulse chosen was meant to print the letter L1. Let B2 be the
complementary event. Let A be the event that both the times the letter L1 was printed.
1
P(B1) = . P(A|B1) = p2. Now the probability that an impulse prints a wrong letter is (1n
1 p
p). Since there are n-1 ways of printing a wrong letter, P(A|B2) =
. Hence P(B1|A)
n 1
P(B1 ) P(A | B1 )
=
P(B1 ) P(A | B1 ) + P(B 2 ) P(A | B 2 )

1 2
p
n
1 2
1
p + 1
n
n

1 p
n 1

. This is the required probability.

17

Miscellaneous problems
1 (a). Suppose the digits 1,2,3 are written in a random order. Find probability that at
least one digit occupies its proper place.

Solution
There are 3! = 6 ways of arranging the 3 digits:
123, 132, 213, 231, 312, 321.
In 4 of these arrangements at least one digit occupies its proper place. Hence the
probability is 4/3! = 4/6.
(Remark: An arrangement like 231, where no digit occupies its proper place, is
called a derangement.)

(b) Same as (a) but with 4 digits 1, 2, 3, 4.

Solution
Ans. 15/24. (Try proving this.)

Let A1 be the event "the 1st digit occupies its proper place",
A2 be the event "the 2nd digit occupies its proper place",
A3 be the event "the 3rd digit occupies its proper place",
A4 be the event "the 4th digit occupies its proper place".

P(at least one digit occupies its proper place)
= P(A1 ∪ A2 ∪ A3 ∪ A4)
= P(A1) + P(A2) + P(A3) + P(A4)
  (there are 4C1 terms, each with the same probability)
− P(A1 ∩ A2) − P(A1 ∩ A3) − P(A1 ∩ A4) − ... − P(A3 ∩ A4)
  (there are 4C2 terms, each with the same probability)
+ P(A1 ∩ A2 ∩ A3) + P(A1 ∩ A2 ∩ A4) + ... + P(A2 ∩ A3 ∩ A4)
  (there are 4C3 terms, each with the same probability)
− P(A1 ∩ A2 ∩ A3 ∩ A4)
= 4C1 (3!/4!) − 4C2 (2!/4!) + 4C3 (1!/4!) − 4C4 (0!/4!)
= 1 − 1/2 + 1/6 − 1/24 = (24 − 12 + 4 − 1)/24 = 15/24.

(c)

Same as (a) but with n digits.

Solution

Let A1 be the event "the 1st digit occupies its proper place",
A2 be the event "the 2nd digit occupies its proper place",
...
An be the event "the nth digit occupies its proper place".

P(at least one digit occupies its proper place)
= P(A1 ∪ A2 ∪ ... ∪ An)
= nC1 (n−1)!/n! − nC2 (n−2)!/n! + nC3 (n−3)!/n! − ... + (−1)^(n−1) (1/n!)
= 1 − 1/2! + 1/3! − 1/4! + ... + (−1)^(n−1) (1/n!) ≈ 1 − e^(−1)   (for n large).

2.  In a party there are n married couples. If each male chooses at random a
    female for dancing, find the probability that no man chooses his wife.

Ans:  1 − (1 − 1/2! + 1/3! − 1/4! + ... + (−1)^(n−1) (1/n!)) ≈ e^(−1).

3.
A and B play the following game. They throw alternatively a pair of dice.
Whosoever gets sum of the two numbers on the top as seven wins the game
and the game stops. Suppose A starts the game. Find the probability (a) A
wins the game (b) B wins the game.

Solution
A wins the game if he gets seven in the 1st throw or in the 3rd throw or in the
5th throw or ... . Hence
P(A wins) = 1/6 + (5/6)(5/6)(1/6) + (5/6)(5/6)(5/6)(5/6)(1/6) + ...
= (1/6) / (1 − 25/36) = (1/6)/(11/36) = 6/11.
P(B wins) = complementary probability = 5/11.

4.

Birthday Problem
There are n persons in a room. Assume that nobody is born on 29th Feb.
Assume that any one birthday is as likely as any other birth day. Find the
probability that no two persons will have same birthday.

Solution
If n > 365, at least two will have the same birthday and hence the probability
that no two will have the same birthday is 0.
If n ≤ 365, the desired probability is
[365 x 364 x ... x (365 − (n − 1))] / (365)^n.

5.

A die is rolled until all the faces have appeared on top.


(a) What is probability that exactly 6 throws are needed?
Ans.  6!/6^6

(b) What is probability that exactly n throws are needed? (n > 6)


6.

Polya's urn problem


An urn contains g green balls and r red balls. A ball is chosen at random and
its color is noted. Then the ball is returned to the urn and c more balls of same
color are added. Now a ball is drawn. Its color is noted and the ball is
replaced. This process is repeated.
(a) Find probability that 1st ball drawn is green.
Ans.  g/(g + r)

(b) Find the probability that the 2nd ball drawn is green.
Ans.  [g/(g + r)] x [(g + c)/(g + r + c)] + [r/(g + r)] x [g/(g + r + c)] = g/(g + r)

(c) Find the probability that the nth ball drawn is green.
The surprising answer is again g/(g + r).

7.

There are n urns and each urn contains a white and b red balls. A ball is
chosen from Urn 1 and put into Urn 2. Now a ball is chosen at random from
urn 2 and put into urn 3 and this is continued. Finally a ball drawn from Urn n.
Find the probability that it is white.

Solution
Let p_r = probability that the ball drawn from Urn r is white. Then
p_r = p_{r−1} (a + 1)/(a + b + 1) + (1 − p_{r−1}) a/(a + b + 1),   r = 2, 3, ..., n.
This is a recurrence relation for p_r. Noting that p_1 = a/(a + b), we can find p_n.

MATHEMATICAL EXPECTATION & DECISION MAKING


Suppose we roll a die n times. What is the average of the n numbers that appear on the
top?
Suppose 1 occurs on the top n1 times
Suppose 2 occurs on the top n2 times
Suppose 3 occurs on the top n3 times
Suppose 4 occurs on the top n4 times
Suppose 5 occurs on the top n5 times
Suppose 6 occurs on the top n6 times
Total of the n numbers on the top = 1·n1 + 2·n2 + ... + 6·n6.
Average of the n numbers
= (1·n1 + 2·n2 + ... + 6·n6)/n = 1·(n1/n) + 2·(n2/n) + ... + 6·(n6/n).
Here clearly n1, n2, ..., n6 are unknown. But by the relative frequency definition of
probability, we may approximate n1/n by P(getting 1 on the top) = 1/6, n2/n by
P(getting 2 on the top) = 1/6, and so on. So we can expect the average of the n
numbers to be 1·(1/6) + 2·(1/6) + ... + 6·(1/6) = 7/2 = 3.5. We call this the
Mathematical Expectation of the number on the top.

Definition
Let E be a random experiment with n outcomes a1, a2 .an. Suppose P({a1})=p1,
P({a2})=p2, , P({an})=pn. Then we define the mathematical expectation as
a1 p1 + a2 p2 + ... + an pn.
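In code the definition is just a weighted sum. The short Python sketch below (added here for illustration) computes the mathematical expectation of the number shown by a balanced die, reproducing the value 3.5 derived above.

```python
def expectation(values, probs):
    """Mathematical expectation: sum of value x probability."""
    return sum(v * p for v, p in zip(values, probs))

faces = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6                 # balanced die: all faces equally likely
print(expectation(faces, probs))  # 3.5
```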

Problems
1.  If a service club sells 4000 raffle tickets for a cash prize of $800, what is the
    mathematical expectation of a person who buys one of these tickets?
Solution.  800 x (1/4000) + 0 x (3999/4000) = 1/5 = $0.20.

2.  A charitable organization raises funds by selling 2000 raffle tickets for a 1st prize
    worth $5000 and a second prize worth $100. What is the mathematical expectation
    of a person who buys one of the tickets?
Solution.  5000 x (1/2000) + 100 x (1/2000) + 0 x (1998/2000) = $2.55.

3.  A game between 2 players is called fair if each player has the same mathematical
    expectation. If someone gives us $5 whenever we roll a 1 or a 2 with a balanced
    die, what must we pay him when we roll a 3, 4, 5 or 6 to make the game fair?
Solution.  If we pay $x when we roll a 3, 4, 5 or 6, the game is fair when
    (2/6) x 5 = (4/6) x x, i.e. x = 2.50. That is, we must pay $2.50.

4.  Gambler's Ruin
A and B are betting on repeated flips of a balanced coin. At the beginning, A has
m dollars and B has n dollars. After each flip the loser pays the winner 1 dollar
and the game stops when one of them is ruined. Find probability that A will win
Bs n dollars before he loses his m dollars.

Solution.
Let p be the probability that A wins (so that 1-p is the probability that B wins).
Since the game is fair, A's math exp = B's math exp. Thus
n·p + 0·(1 − p) = m·(1 − p) + 0·p,  or  p = m/(m + n).
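The answer p = m/(m + n) can also be checked by simulating the game. The Python sketch below is illustrative only (the stakes m = 3, n = 7 and the number of games are arbitrary choices): it plays the fair coin-flipping game repeatedly and estimates the probability that A wins B's n dollars before losing his own m dollars.

```python
import random

def a_wins(m, n):
    """Play one game of gambler's ruin with a balanced coin.
    Return True if A reaches m + n dollars before reaching 0."""
    capital = m
    while 0 < capital < m + n:
        capital += 1 if random.random() < 0.5 else -1
    return capital == m + n

m, n, games = 3, 7, 20_000
wins = sum(a_wins(m, n) for _ in range(games))
print(wins / games, m / (m + n))   # estimate vs. exact value 0.3
```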

5.

An importer is offered a shipment of machines for $140,000. The probabilities that
he will sell them for $180,000, $170,000 or $150,000 are respectively 0.32,
0.55 and 0.13. What is his expected profit?

Solution.
Expected profit = 40,000 x 0.32 + 30,000 x 0.55 + 10,000 x 0.13 = $30,600.

6.

The manufacturer of a new battery additive has to decide whether to sell her
product for $0.80 a can, or for $1.20 a can with a "double your money back if not
satisfied" guarantee. How does she feel about the chances that a person will ask
for double his/her money back if
(a) she decides to sell the product for $0.80
(b) she decides to sell the product for $1.20
(c) she cannot make up her mind?

Solution.
Let p be the probability that a person will ask for double his money back.
In the 1st case, she gets a fixed amount of $0.80 a can.
In the 2nd case, she expects to get for each can
(1.20)(1 − p) + (−1.20)(p) = 1.20 − (2.40)p.
(a) happens if 0.80 > 1.20 − 2.40p, i.e. if p > 1/6.
(b) happens if p < 1/6.
(c) happens if p = 1/6.

7.

A manufacturer buys an item for $2.10 and sells it for $4.50. The probabilities for
a demand of 0, 1, 2, 3, 4, 5 or more items are 0.05, 0.15, 0.30, 0.25, 0.15, 0.10
respectively. How many items must he stock to maximize his expected profit?

No. of items stocked   No. sold (with prob.)                       Expected profit
0                      0 (1.00)                                    0
1                      0 (0.05), 1 (0.95)                          0 x 0.05 + 4.5 x 0.95 − 2.1 = 2.175
2                      0 (0.05), 1 (0.15), 2 (0.80)                0 x 0.05 + 4.5 x 0.15 + 9 x 0.80 − 4.2 = 3.675
3                      0 (0.05), 1 (0.15), 2 (0.30), 3 (0.50)      0 x 0.05 + 4.5 x 0.15 + 9 x 0.30 + 13.5 x 0.50 − 6.3 = 3.825

(Stocking 4 or more items gives a smaller expected profit; for example, stocking 4 items gives 2.85.)

Hence he must stock 3 items to maximize his expected profit.


8.

A contractor has to choose between 2 jobs. The 1st job promises a profit of
$240,000 with probability 0.75 and a loss of $60,000 with probability 0.25. The
2nd job promises a profit of $360,000 with probability 0.5 and a loss of $90,000
with probability 0.5.
(a) Which job should the contractor choose to maximize his expected profit?
Exp. profit for job 1 = (3/4) x 240,000 − (1/4) x 60,000 = $165,000.
Exp. profit for job 2 = (1/2) x 360,000 − (1/2) x 90,000 = $135,000.
So he should go in for job 1.

(b) What job would the contractor probably choose if her business is in bad
shape and she goes broke unless she makes a profit of $300,000 on her
next job?
Ans: She takes job 2, since only job 2 offers a chance of a profit of at least
$300,000 (job 1 can yield at most $240,000).


RANDOM VARIABLES
Let E be a random experiment. A random variable (r.v) X is a function that associates to
each outcome s, a unique real number X (s).

Example 1
Let E be the random experiment of tossing a fair coin 3 times. We see that there are
2 3 = 8 outcomes TTT, HTT, THT, TTH, HHT, HTH, THH, HHH all of which are
equally likely. Let X be the random variable that counts the number of heads obtained.
Thus X can take only 4 values: 0, 1, 2, 3. We note that
P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8.
This is called the probability distribution of the rv X. Thus the probability distribution of a rv X is the
listing of the probabilities with which X takes all its values.

Example 2
Let E be the random experiment of rolling a pair of balanced dice. There are 36 possible
equally likely outcomes, namely (1,1), (1,2), ..., (6,6). Let X be the rv that gives the sum
of the two numbers on the top. Hence X takes 11 values, namely 2, 3, ..., 12. We note that the
probability distribution of X is
P(X = 2) = P(X = 12) = 1/36,  P(X = 3) = P(X = 11) = 2/36,
P(X = 4) = P(X = 10) = 3/36,  P(X = 5) = P(X = 9) = 4/36,
P(X = 6) = P(X = 8) = 5/36,   P(X = 7) = 6/36 = 1/6.

Example 3
Let E be the random experiment of rolling a die till a 6 appears on the top. Let X be the
no of rolls needed to get the first six. Thus X can take values 1,2,3 Here X takes
an infinite number of values. So it is not possible to list all the probabilities with which X
takes its values. But we can give a formula.
P(X = x) = (5/6)^(x−1) (1/6),   x = 1, 2, ...

(Justification: X = x means the first (x − 1) rolls each gave a number other than 6 and
the xth roll gave the first 6. Hence P(X = x) = (5/6)(5/6)...(5/6) x (1/6) = (5/6)^(x−1) (1/6),
with x − 1 factors of 5/6.)

Discrete Random Variables


We say X is a discrete rv if it can take only a finite number of values (as in Examples 1 and 2
above) or a countably infinite number of values (as in Example 3).
On the other hand, the annual rainfall in a city, the life length of an electronic device, the
diameter of washers produced by a factory are all continuous random variables in the
sense that they can take (theoretically at least) all values in an interval of the x-axis. We
shall discuss continuous rvs a little later.
Probability distribution of a Discrete RV
Let X be a discrete rv with values x1 , x 2 ......

Let f (x i ) = P(X = x i )(i = 1,2.....)

We say that {f (x i )}i =1, 2.... is the probability distribution of the rv X.

Properties of the probability distribution

(i) f(xi) ≥ 0 for all i = 1, 2, ...
(ii) Σ_i f(xi) = 1

The first condition follows from the fact that a probability is always ≥ 0. The second
condition follows from the fact that the probability of the certain event is 1.

Example 4
Determine whether the following can be the probability distribution of a rv which can
take only 4 values 1,2,3 and 4.
(a) f(1) = 0.26, f(2) = 0.26, f(3) = 0.26, f(4) = 0.26.
    No, as the sum of all the probabilities is 1.04 > 1.
(b) f(1) = 0.15, f(2) = 0.28, f(3) = 0.29, f(4) = 0.28.
    Yes, as these are all ≥ 0 and add up to 1.
(c) f(x) = (x + 1)/16, x = 1, 2, 3, 4.
    No, as the sum of all the probabilities is 14/16 < 1.

Binomial Distribution
Let E be a random experiment having only 2 outcomes, say success and failure.
Suppose that P(success) = p and so P(failure) = q (=1-p). Consider n independent
repetitions of E (This means the outcome in any one repetition is not dependent upon the
outcome in any other repetition). We also make the important assumption that P(success)
= p remains the same for all such independent repetitions of E. Let X be the rv that
counts the number of successes obtained in n such independent repetitions of E. Clearly
X is a discrete rv that can take n+1 values namely 0,1,2,.n. We note that there are
2 n outcomes each of which is a string of n letters each of which is an S or F (if n =3, it
will be FFF, SFF, FSF, FFS, SSF, SFS, FSS, SSS).
X = x means that in any such outcome there are x successes and (n − x) failures in some
order. One such outcome is SS...S FF...F (x S's followed by n − x F's). Since all the
repetitions are independent, the prob of this outcome is p^x q^(n−x). Exactly the same
prob is associated with any other outcome for which X = x. But x successes can occur
out of n repetitions in nCx mutually exclusive ways. Hence
P(X = x) = nCx p^x q^(n−x),   x = 0, 1, ..., n.

We say X has a Binomial distribution with parameters n ( the number of repetitions)


and p (Prob of success in any one repetition).
We denote P(X = x ) by b(x; n , p ) to show its dependence on x, n and p. The letter b
stands for binomial.
Since all the above (n+1) probabilities are the (n+1) terms in the expansion of the
binomial (q + p)^n, X is said to have a binomial distribution. We at once see that the sum
of all the binomial probabilities = (q + p)^n = 1^n = 1.

The independent repetitions are usually referred to as the Bernoulli trials. We note that
b(x; n, p ) = b(n x; n, q )
(LHS = Prob of getting x successes in n Bernoulli trials = prob of getting n-x failures in
n Bernoulli trials = R.H.S.)
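For concreteness, the binomial probabilities can be computed directly from the formula. The Python sketch below (an added illustration, not part of the original notes) defines b(x; n, p) and checks that the n + 1 probabilities add up to 1, as noted above.

```python
from math import comb

def b(x, n, p):
    """Binomial probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 15, 0.70
probs = [b(x, n, p) for x in range(n + 1)]
print(sum(probs))          # 1.0 (up to rounding)
print(b(10, 15, 0.70))     # P(X = 10), compare Example 5(c) below
```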

Cumulative Binomial Probabilities


Let X have a binomial distribution with parameters n and p. Then
P(X ≤ x) = P(X = 0) + P(X = 1) + ... + P(X = x) = Σ (k = 0 to x) b(k; n, p)
is denoted by B(x; n, p) and is called the cumulative Binomial distribution function. This
is tabulated in Table 1 of your text book. We note that
b(x; n, p) = P(X = x) = P(X ≤ x) − P(X ≤ x − 1)
           = B(x; n, p) − B(x − 1; n, p).

Thus b(9; 12, 0.60) = B(9; 12, 0.60) − B(8; 12, 0.60)
= 0.9166 − 0.7747
= 0.1419

(You can verify this by directly calculating b(9; 12, 0.60).)

Example 5 (Exercise 4.15 of your book)


During one stage in the manufacture of integrated circuit chips, a coating must be
applied. If 70% of the chips receive a thick enough coating find the probability that
among 15 chips.
(a) At least 12 will have thick enough coatings.
(b) At most 6 will have thick enough coatings.
(c) Exactly 10 will have thick enough coatings.

Solution
Among 15 chips, let X be the number of chips that will have thick enough coatings.
Hence X is a rv having Binomial distribution with parameters n =15 and p = 0.70.
(a) P(X ≥ 12) = 1 − P(X ≤ 11)
    = 1 − B(11; 15, 0.70)
    = 1 − 0.7031 = 0.2969
(b) P(X ≤ 6) = B(6; 15, 0.70)
    = 0.0152
(c) P(X = 10) = B(10; 15, 0.70) − B(9; 15, 0.70)
    = 0.4845 − 0.2784
    = 0.2061

Example 6 (Exercise 4.19 of your text book)


A food processor claims that at most 10% of her jars of instant coffee contain less coffee
than printed on the label. To test this claim, 16 jars are randomly selected and contents
weighed. Her claim is accepted if fewer than 3 of the 16 jars contain less coffee (note that
10% of 16 = 1.6 and rounds to 2). Find the probability that the food processors claim
will be accepted if the actual percent of the jars containing less coffee is
(a) 5%

(b) 10%

(c) 15% (d) 20%

Solution:
Let X be the number of jars (among the 16 randomly chosen) that contain less coffee
than printed on the label. Thus X is a random variable having a Binomial distribution
with parameters n = 16 and p (the prob of "success", i.e. the prob that a jar chosen at
random will have less coffee).
(a) Here p = 5% = 0.05.
Hence P(claim is accepted) = P(X ≤ 2) = B(2; 16, 0.05) = 0.9571.
(b) Here p = 10% = 0.10.
Hence P(claim is accepted) = B(2; 16, 0.10) = 0.7892.
(c) Here p = 15% = 0.15.
Hence P(claim is accepted) = B(2; 16, 0.15) = 0.5614.
(d) Here p = 20% = 0.20.
Hence P(claim is accepted) = B(2; 16, 0.20) = 0.3518.

Binomial Distribution Sampling with replacement


Suppose there is an urn containing 10 marbles of which 4 are white and the rest are black.
Suppose 5 marbles are chosen with replacement. Let X be the rv that counts the no of
white marbles drawn. Thus X = 0,1,2,3,4 or 5 (Remember that we replace each marble in
the urn before drawing the next one. Hence we can draw 5 white marbles)
P(Success) = P(drawing a white marble in any one of the 5 draws) = 4/10 (remember
we draw with replacement).

Thus X has a Binomial distribution with parameters n = 5 and p = 4/10.
Hence P(X = x) = b(x; 5, 4/10).

Mode of a Binomial distribution


We say x0 is the mode of the Binomial distribution with parameters n and p if
P(X = x0) is the greatest. From the binomial tables given in the book we can easily see
that when n = 10, p = 1/2, P(X = 5) is the greatest, i.e. 5 is the mode.

Fact
b(x + 1; n, p) / b(x; n, p) = [(n − x)/(x + 1)] x [p/(1 − p)], which is
  > 1 if x < np − (1 − p),
  = 1 if x = np − (1 − p),
  < 1 if x > np − (1 − p).

Thus so long as x < np − (1 − p) the binomial probabilities increase, and if x > np − (1 − p) they
decrease. Hence if np − (1 − p) = x0 is an integer, then the mode is x0 and x0 + 1. If np − (1 − p)
is not an integer and x0 = smallest integer ≥ np − (1 − p), the mode is x0.

Hypergeometric Distribution (Sampling without replacement)


An urn contains 10 marbles of which 4 are white. 5 marbles are chosen at random
without replacement. Let X be the rv that counts the number of white marbles drawn.
Thus X can take 5 values, namely 0, 1, 2, 3, 4. What is P(X = x)? Now out of 10 marbles, 5
can be chosen in 10C5 equally likely ways, out of which there are 4Cx x 6C(5−x) ways of
drawing x white marbles (and so 5 − x red marbles). (Reason: out of 4 white marbles, x can
be chosen in 4Cx ways, and out of 6 red marbles, 5 − x can be chosen in 6C(5−x) ways.)
Hence P(X = x) = [4Cx x 6C(5−x)] / 10C5,   x = 0, 1, 2, 3, 4.

We generalize the above result.


A box contains N marbles out of which a are white. n marbles are chosen without
replacement. Let X be the random variable that counts the number of white marbles
drawn. X can take the values 0,1,2. n.

P(X = x) = [aCx x (N−a)C(n−x)] / NCn,   x = 0, 1, 2, ..., n.

(Note: x must be less than or equal to a, and n − x must be less than or equal to N − a.)
We say the rv X has a hypergeometric distribution with parameters n, a and N. We denote
P(X = x) by h(x; n, a, N).
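The hypergeometric probabilities h(x; n, a, N) are equally easy to compute. The Python sketch below (an illustration added here) implements the formula and evaluates the answer of Example 9(a) further below.

```python
from math import comb

def h(x, n, a, N):
    """Hypergeometric probability: x white marbles in a sample of n drawn
    without replacement from N marbles of which a are white."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

print(h(1, 3, 5, 120))   # Example 9(a): about 0.1167
```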

Example 7 (Exercise 4.22 of your text book)


Among the 12 solar collectors on display, 9 are flat plate collectors and the other three
are concentrating collectors. If a person chooses at random 4 collectors, find the prob that
3 are flat plate ones.

Ans
h(3; 4, 9, 12) = [9C3 x 3C1] / 12C4

Example 8 (Exercise 4.24 of your text book)


If 6 of 18 new buildings in a city violate the building code, what is the probability that a
building inspector, who randomly selects 4 of the new buildings for inspection, will catch
(a) None of the new buildings that violate the building code

Ans    h(0; 4, 6, 18) = [6C0 x 12C4] / 18C4 = 12C4 / 18C4

(b) One of the new buildings that violate the building code

Ans    h(1; 4, 6, 18) = [6C1 x 12C3] / 18C4

(c) Two of the new buildings that violate the building code

Ans    h(2; 4, 6, 18) = [6C2 x 12C2] / 18C4

(d) At least three of the new buildings that violate the building code

Ans    h(3; 4, 6, 18) + h(4; 4, 6, 18)

(Note: We choose 4 buildings out of 18 without replacement. Hence hypergeometric


distribution is appropriate)

Binomial distribution as an approximation to the Hypergeometric Distribution


We can show that h(x; n, a, N) → b(x; n, p) as N → ∞, where p = a/N = "prob of a success".
Hence if N is large, the hypergeometric probability h(x; n, a, N) can be approximated by the
binomial probability b(x; n, p) where p = a/N.

Example 9 (exercise 4.26 of your text)


A shipment of 120 burglar alarms contains 5 that are defective. If 3 of these alarms are
randomly selected and shipped to a customer, find the probability that the customer will
get one defective alarm.
(a) By using the hypergemetric distribution
(b) By approximating the hypergeometric probability by a binomial probability.
Solution
Here N = 120 (large!), a = 5, n = 3, x = 1.
(a) Reqd prob = h(1; 3, 5, 120) = [5C1 x 115C2] / 120C3 = (5 x 6555)/280840 ≈ 0.1167
(b) h(1; 3, 5, 120) ≈ b(1; 3, 5/120) = 3C1 x (5/120) x (1 − 5/120)^2 ≈ 0.1148

Example 10 (Exercise 4.27 of your text)


Among the 300 employees of a company, 240 are union members, while the others are
not. If 8 of the employees are chosen by lot to serve on the committee which
administrates the provident fund, find the prob that 5 of them will be union members
while the others are not.
(a) Using hypergemoretric distribution
(b) Using binomial approximation

Solution
Here N = 300, a = 240, n = 8 x = 5
(a) h(5; 8, 240, 300)
(b) b(5; 8, 240/300)

THE MEAN AND VARIANCE OF PROBABILITY DISTRIBUTIONS


We know that the equation of a line can be written as y = mx + c. Here m is the slope and
c is the y intercept. Different m,c give different lines. Thus m and c characterize a line.
Similarly we define certain numbers that characterize a probability distribution.

The mean of a probability distribution is simply the mathematical expectation of the
corresponding r.v. If a rv X takes on the values x1, x2, ... with probabilities
f(x1), f(x2), ..., its mathematical expectation or expected value is
x1 f(x1) + x2 f(x2) + ... = Σ xi P(X = xi)   (value x probability).

We use the symbol μ to denote the mean of X.
Thus μ = E(X) = Σ xi P(X = xi)   (summation over all xi in the range of X).

Example 11
Suppose X is a rv having the probability distribution
X:      1     2     3
Prob:   1/2   1/3   1/6
Hence the mean of the prob distribution (of X) is
μ = 1 x (1/2) + 2 x (1/3) + 3 x (1/6) = 5/3.

Example 12
Let X be the rv having the distribution
X:      0     1
Prob:   q     p
where q = 1 − p. Thus μ = 0 x q + 1 x p = p.

The mean of a Binomial Distribution
Suppose X is a rv having Binomial distribution with parameters n and p. Then
Mean of X = μ = np.
(Read the proof on pages 107-108 of your text book.)

The mean of a Hypergeometric Distribution
If X is a rv having hypergeometric distribution with parameters N, n, a, then μ = n(a/N).

Digression
The mean of a rv X gives the average of the values taken by the rv X. Thus "the
average marks in a test is 40" means the students would have got marks less than 40
and greater than 40, but it averages out to 40. However we do not get an idea about the
spread (deviation from the mean) of the marks. This spread is measured by the
variance; informally speaking, it is the average of the squares of the deviations from the
mean.
The Variance of a Probability Distribution of X is defined as the expected value of
(X − μ)²:
Variance of X = σ² = Σ (xi in R_X) (xi − μ)² P(X = xi)
Note that the R.H.S. is always ≥ 0 (as it is the sum of non-negative numbers).
The positive square root σ of σ² is called the standard deviation of X and has the
same units as X and μ.

Example 13
For the rv X having the prob distribution given in Example 11, the variance is
σ² = (1 − 5/3)² x (1/2) + (2 − 5/3)² x (1/3) + (3 − 5/3)² x (1/6)
   = (4/9) x (1/2) + (1/9) x (1/3) + (16/9) x (1/6) = 5/9.

We could have also used the equivalent formula
σ² = E[(X − μ)²] = E(X²) − μ².
Here E(X²) = 1² x (1/2) + 2² x (1/3) + 3² x (1/6) = (9 + 24 + 27)/18 = 60/18 = 10/3,
so σ² = 10/3 − 25/9 = 5/9.

Example 14
For the probability distribution of Example 12,
E(X²) = 0² x q + 1² x p = p,
σ² = p − p² = p(1 − p) = pq.

Variance of the Binomial Distribution
σ² = npq

Variance of the Hypergeometric Distribution
σ² = n x (a/N) x (1 − a/N) x (N − n)/(N − 1)
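These formulas can be verified numerically from the definitions. The Python sketch below (added here for illustration; the parameters n = 12, p = 0.3 are arbitrary) computes the mean and variance of a binomial distribution directly from Σ x P(X = x) and Σ (x − μ)² P(X = x) and compares them with np and npq.

```python
from math import comb

n, p = 12, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mu = sum(x * f for x, f in enumerate(pmf))                 # mean from the definition
var = sum((x - mu)**2 * f for x, f in enumerate(pmf))      # variance from the definition

print(mu, n * p)                # both 3.6
print(var, n * p * (1 - p))     # both 2.52
```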


CHEBYCHEV'S THEOREM
Suppose X is a rv with mean μ and variance σ². Chebychev's theorem states that: if k
is a constant > 0,
P(|X − μ| ≥ kσ) ≤ 1/k².

In words, the prob of getting a value which deviates from its mean by at least kσ is at
most 1/k².
Note: Chebyshev's theorem gives us an upper bound on the prob of an event. Mostly it is
of theoretical interest.

Example 15 (Exercise 4.44 of your text)


In one out of 6 cases, material for bullet proof vests fails to meet puncture standards. If
405 specimens are tested, what does Chebyshev theorem tell us about the prob of getting
at most 30 or at least 105 cases that do not meet puncture standards?
Here μ = np = 405 x (1/6) = 135/2 and
σ² = npq = 405 x (1/6) x (5/6), so σ = 15/2.
Let X = no of cases out of 405 that do not meet puncture standards.
Reqd: P(X ≤ 30 or X ≥ 105).
Now X ≤ 30 ⟹ X − μ ≤ −75/2 and X ≥ 105 ⟹ X − μ ≥ 75/2.
Thus X ≤ 30 or X ≥ 105 ⟺ |X − μ| ≥ 75/2 = 5σ.
Hence P(X ≤ 30 or X ≥ 105) = P(|X − μ| ≥ 5σ) ≤ 1/5² = 1/25 = 0.04.

Example 16 (Exercise 446 of your text)


How many times do we have to flip a balanced coin to be able to assert with a prob of at
most 0.01 that the difference between the proportion of tails and 0.50 will be at least
0.04?

Solution:
Suppose we flip the coin n times and suppose X is the no of tails obtained. Thus the
proportion of tails = (no of tails)/(total no of flips) = X/n. We must find n so that
P(|X/n − 0.50| ≥ 0.04) ≤ 0.01.

Now X = no of tails among n flips of a balanced coin is a rv having Binomial distribution
with parameters n and 0.5.
Hence μ = E(X) = np = n x 0.50 and σ = √(npq) = 0.50 √n   (as p = q = 0.50).
Now |X/n − 0.50| ≥ 0.04 is equivalent to |X − n x 0.50| ≥ 0.04n.
We know P(|X − μ| ≥ kσ) ≤ 1/k². Here kσ = 0.04n, so
k = 0.04n / (0.50 √n) = 0.08 √n.
Hence P(|X/n − 0.50| ≥ 0.04) = P(|X − μ| ≥ kσ) ≤ 1/k² ≤ 0.01
if k² ≥ 1/0.01 = 100, i.e. if (0.08)² n ≥ 100, i.e. if
n ≥ 100/(0.08)² = 15625.
Law of large Numbers


Suppose a factory manufactures items. Suppose there is a constant prob p that an item is
defective. Suppose we choose n items at random and let X be the no of defectives found.
Then X is a rv having binomial distribution with parameters n and p.
Mean μ = E(X) = np, variance σ² = npq.
Let ε be any number > 0.
Now P(|X/n − p| ≥ ε) = P(|X − np| ≥ nε) = P(|X − μ| ≥ kσ)   (where kσ = nε)
≤ 1/k² = σ²/(n²ε²) = npq/(n²ε²) = pq/(nε²) → 0 as n → ∞   (by Chebyshev's theorem).

Thus the prob that the proportion of defective items differs from the actual prob p by
any positive number ε tends to 0 as n → ∞. (This is called the Law of Large Numbers.)
This means most of the time the proportion of defectives will be close to the actual
(unknown) prob p that an item is defective, for large n. So we can estimate p by X/n, the
(sample) proportion of defectives.

POISSON DISTRIBUTION
A random variable X is said to have a Poisson distribution with parameter λ > 0 if its
probability distribution is given by
P(X = x) = f(x; λ) = e^(−λ) λ^x / x!,   x = 0, 1, 2, ...

We can easily show: mean of X = μ = λ and variance of X = σ² = λ.
Also P(X = x) is largest when x = λ − 1 and x = λ if λ is an integer, and when x = [λ] (the
greatest integer ≤ λ) when λ is not an integer. Also note that P(X = x) → 0 as x → ∞.

POISSON APPROXIMATION TO BINOMIAL DISTRIBUTION


Suppose X is a rv having Binomial distribution with parameters n and p. We can easily
show that b(x; n, p) = P(X = x) → f(x; λ) as n → ∞ in such a way that np remains a constant λ.
Hence for n large and p small, the binomial prob b(x; n, p) can be approximated by the
Poisson prob f(x; λ) where λ = np.
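The quality of the approximation is easy to inspect numerically. The Python sketch below (an added illustration) compares the binomial probability b(x; n, p) with the Poisson probability f(x; np) for the numbers of Example 18 further below (n = 400, p = 0.008, x = 4).

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

n, p, x = 400, 0.008, 4
print(binom_pmf(x, n, p))        # exact binomial probability
print(poisson_pmf(x, n * p))     # Poisson approximation, lambda = 3.2 (about 0.178)
```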

Example 17

b(3; 100, 0.03) ≈ f(3; 3) = e^(−3) 3³/3!

Example 18 (Exercise 4.54 of your text)


If 0.8% of the fuses delivered to an arsenal are defective, use the Poisson approximation
to determine the probability that 4 fuses will be defective in a random sample of 400.

Solution
If X is the number of defectives in a sample of 400, X has the binomial distribution with
parameters n = 400 and p = 0.8% = 0.008.
Thus P(4 out of 400 are defective)
= b(4; 400, 0.008) ≈ f(4; λ)   (where λ = 400 x 0.008 = 3.2)
= e^(−3.2) (3.2)⁴ / 4!
= F(4; 3.2) − F(3; 3.2) = 0.781 − 0.603   (from Table 2 at the end of the text)
= 0.178

Cumulative Poisson Distribution Function


If X is a rv having Poisson Distribution with parameter , the cumulative Poisson Prob
= F(x; ) = P(X x ) =

x
k =0

P(X = k ) =

x
k =0

f (k; )

For various and x, F(x; ) has been tabulated in table 2 (of your text book on page 581
to 585) .We use the table 2 as follows.
f (x; ) = P(X = x ) = P(X x ) P(X x 1)
= F(x; ) F(x 1; )

Thus f (4;3.2) = F (4;3.2) F (3;3.2) = 0.781 0.603 = 0.178.

Poisson Process
There are many situations in which events occur randomly in regular intervals of time.
For example in a time period t, let X t be the number of accidents at a busy road junction
in New Delhi; X t be the number of calls received at a telephone exchange; X t be the
number of radio active particles emitted by a radioactive source etc. In all such examples
we find X t is a discrete rv which can take non-ve integral values 0,1,2,.. The important
thing to note is that all such random variables have same distribution except that the
parameter(s) depend on time t.
The collection of random variables (X_t)_{t>0} is said to constitute a random process. If
each X_t has a Poisson distribution, we say (X_t) is a Poisson process. Now we show that
the rvs X_t, which count the number of occurrences of a random phenomenon in a time
period t, constitute a Poisson process under suitable assumptions. Suppose in a time
period t, a random phenomenon which we call "success" occurs. We let X_t = number of
successes in time period t. We assume:
1. In a small time period Δt, either no success or one success occurs.
2. The prob of a success in a small time period Δt is proportional to Δt, i.e.
   P(X_Δt = 1) = α Δt   (α = constant of proportionality).
3. The prob of a success during any time period does not depend on what
   happened prior to that period.

Divide the time period t into n small time periods each of length Δt = t/n. Hence by the
assumptions above, X_t = no of successes in time period t is a rv having
Binomial distribution with parameters n and p = α Δt. Hence
P(X_t = x) = b(x; n, α Δt) → f(x; λ) as n → ∞, where λ = n α Δt = α t.

So we can say that X_t = no of successes in time period t is a rv having Poisson
distribution with parameter α t.
Meaning of the proportionality constant α
Since the mean of X_t is λ = α t, we find α = mean no of successes in unit time.
(Note: For a more rigorous derivation of the distribution of X_t, you may see Meyer,
Introductory Probability and Statistical Applications, pages 165-169.)

Example 19 (Exercise 4.56 of your text)


Given that the switch board of a consultants office receives on the average 0.6 call per
minute, find the probability that
(a) In a given minute there will be at least one call.
(b) In a 4-minute interval, there will be at least 3 calls.

Solution
X_t = no of calls in a t-minute interval is a rv having Poisson distribution with parameter
λt = 0.6t.
(a) P(X_1 ≥ 1) = 1 − P(X_1 = 0) = 1 − e^(−0.6) = 1 − 0.549 = 0.451.
(b) P(X_4 ≥ 3) = 1 − P(X_4 ≤ 2) = 1 − F(2; 2.4) = 1 − 0.570 = 0.430.

Example 20
Suppose that Xt, the number of particles emitted in t hours from a radio active source
has a Poisson distribution with parameter 20t. What is the probability that exactly 5
particles are emitted during a 15 minute period?

Solution
15 minutes = 1/4 hour.
Hence if X_{1/4} = no of particles emitted in 1/4 hour,
P(X_{1/4} = 5) = e^(−20 x 1/4) (20 x 1/4)⁵ / 5! = e^(−5) 5⁵/5!
= F(5; 5) − F(4; 5) = 0.616 − 0.440 = 0.176   (from Table 2).

THE GEOMETRIC DISTRIBUTION


Suppose there is a random experiment having only two possible outcomes, called
success and failure. Assume that the prob of a success in any one trial ( repetition
of the experiment) is p and remains the same for all trials. Also assume the trials are
independent. The experiment is repeated till a success is got. Let X be the rv that counts
the number of trials needed to get the 1st success. Clearly X = x if the first (x-1) trials
were failures and the xth trial gave the first success. Hence
P(X = x ) = g (x; p ) = (1 p )

x 1

p = q x 1 p

(x = 1,2......)

We say X has a geometric distribution with parameter p (as the respective probabilities
form a geometric progression with common ratio q).
We can show the mean of this distribution is

q
1
and the variance is 2 = 2
p
p

(For example suppose a die is rolled till a 6 is got. It is reasonable to expect on an average
1
we will need 1 = 6 rolls as there are 6 nos!)
6
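The claim that on average 1/p trials are needed can be checked by simulation. The Python sketch below is illustrative only (the number of repetitions is arbitrary): it rolls a die until a 6 appears and averages the number of rolls needed, which should come out close to 6.

```python
import random

def rolls_until_six():
    """Number of rolls of a fair die needed to get the first 6."""
    count = 0
    while True:
        count += 1
        if random.randint(1, 6) == 6:
            return count

trials = 50_000
average = sum(rolls_until_six() for _ in range(trials)) / trials
print(average)    # close to the theoretical mean 1/p = 6
```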

Example 21 (Exercise 4.60 of your text)


An expert hits a target 95% of the time. What is the probability that the expert will miss
the target for the first time on the fifteenth shot?

Solution
Here "success" means the expert misses the target. Hence p = P(success) = 5% = 0.05. If
X is the rv that counts the no. of shots needed to get a success, we want
P(X = 15) = q^14 p = (0.95)^14 x 0.05.

Example 22
The probability of a successful rocket launching is 0.8. Launching attempts are made till
a successful launching has occurred. Find the probability that exactly 6 attempts will be
necessary.

Solution

(0.2)^5 x 0.8

Example 23
X has a geometric distribution with parameter p. Show that
(a) P(X ≥ r) = q^(r−1),   r = 1, 2, ...
(b) P(X ≥ s + t | X > s) = P(X ≥ t).

Solution
(a) P(X ≥ r) = Σ (x = r to ∞) q^(x−1) p = q^(r−1) p / (1 − q) = q^(r−1).
(b) P(X ≥ s + t | X > s) = P(X ≥ s + t) / P(X > s) = q^(s+t−1) / q^s = q^(t−1) = P(X ≥ t).

Application to Queuing Systems


(Schematic: customers arrive in a Poisson fashion → service facility → depart after service.)

There is a service facility. Customers arrive in a random fashion and get service if the
server is idle. Else they stand in a Queue and wait to get service.
Examples of Queuing systems
1. Cars arriving at a petrol pump to get petrol
2. Men arriving at a Barbers shop to get hair cut.
3. Ships arriving at a port to deliver goods.

Questions that one can ask are :


1. At any point of time on an average how many customers are in the system
(getting service and waiting to get service)?
2. What is the mean time a customer waits in the system?
3. What proportion of time a server is idle? And so on.
We shall consider only the simplest queueing system where there is only one server. We
assume that the population of customers is infinite and that there is no limit on the
number of customers that can wait in the queue.
We also assume that the customers arrive in a Poisson fashion at the mean rate of λ.
This means that X_t, the number of customers that arrive in a time period t, is a rv having
Poisson distribution with parameter λt. We also assume that so long as the service
station is not empty, customers depart in a Poisson fashion at a mean rate of μ. This
means, when there is at least one customer, Y_t, the number of customers that depart
(after getting service) in a time period t, is a r.v. having Poisson distribution with
parameter μt (where μ > λ).
Further assumptions are: in a small time interval Δt, there will be a single arrival or a
single departure but not both. (Note that by the assumptions of a Poisson process, in a small
time interval Δt there can be at most one arrival and at most one departure.) Let N_t be the
number of customers in the system at time t. Let P(N_t = n) = p_n(t). We make another
assumption:
p_n(t) → π_n as t → ∞. π_n is known as the steady-state probability distribution of the
number of customers in the system. It can be shown:
π_0 = 1 − λ/μ,   π_n = (1 − λ/μ)(λ/μ)^n   (n = 0, 1, 2, . . .)

Thus L = mean number of customers in the system (getting service and waiting to get
service)
= Σ (n = 0 to ∞) n π_n = λ/(μ − λ).

Lq = mean no of customers in the queue (waiting to get service)
= Σ (n = 1 to ∞) (n − 1) π_n = λ²/(μ(μ − λ)).

W = mean time a customer spends in the system = L/λ = 1/(μ − λ).

Wq = mean time a customer spends in the queue = Lq/λ = λ/(μ(μ − λ)) = W − 1/μ.

(For a derivation of these results, see Operations Research Vol. 3 by Dr. S.
Venkateswaran and Dr. B. Singh, EDD Notes of BITS, Pilani.)
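The four formulas can be packaged in a few lines. The Python sketch below (an added illustration, valid only when λ < μ) computes L, Lq, W and Wq for given arrival and service rates; with λ = 2, μ = 3 it reproduces the values used in Example 24 below.

```python
def mm1_measures(lam, mu):
    """Steady-state measures of the single-server queue described above (lam < mu)."""
    L  = lam / (mu - lam)               # mean number in the system
    Lq = lam**2 / (mu * (mu - lam))     # mean number waiting in the queue
    W  = 1 / (mu - lam)                 # mean time in the system
    Wq = lam / (mu * (mu - lam))        # mean waiting time in the queue
    return L, Lq, W, Wq

print(mm1_measures(2, 3))   # (2.0, 1.333..., 1.0, 0.666...)
```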

Example 24 (Exercise 4.64 of your text)


Trucks arrive at a receiving dock in a Poisson fashion at a mean rate of 2 per hour. The
trucks can be unloaded at a mean rate of 3 per hour in a Poisson fashion (so long as the
receiving dock is not empty).
(a) What is the average number of trucks being unloaded and waiting to get
unloaded?
(b) What is the mean no of trucks in the queue?
(c) What is the mean time a truck spends waiting in the queue?
(d) What is the prob that there are no trucks waiting to be unloaded?
(e) What is the prob that an arriving truck need not wait to get unloaded?

Solution
Here λ = arrival rate = 2 per hour and μ = unloading (departure) rate = 3 per hour.

(a) L = λ/(μ − λ) = 2/(3 − 2) = 2.

(b) Lq = λ²/(μ(μ − λ)) = 2²/(3 x 1) = 4/3.

(c) Wq = λ/(μ(μ − λ)) = 2/3 hr.

(d) P(no trucks are waiting to be unloaded)
= P(no of trucks in the dock is 0 or 1)
= π_0 + π_1 = (1 − 2/3) + (1 − 2/3)(2/3) = 1/3 + 2/9 = 5/9.

(e) P(arriving truck need not wait)
= P(dock is empty)
= π_0 = 1/3.

Example 25
With reference to example 24, suppose that the cost of keeping a truck in the system is
Rs. 15/hour. If it were possible to increase the mean loading rate to 3.5 trucks per hour at
a cost of Rs. 12 per hour, would this be worth while?

Solution
In the old scheme, λ = 2, μ = 3, L = 2.
Mean cost per hour to the dock = 2 x 15 = Rs. 30/hr.
In the new scheme, λ = 2, μ = 3.5, L = 2/(3.5 − 2) = 4/3 (verify!).
Net cost per hour to the dock = (4/3) x 15 + 12 = Rs. 32/hr.
Hence it is not worthwhile to go in for the new scheme.


MULTINOMIAL DISTRIBUTION
Consider a random experiment E and suppose it has k possible outcomes A1 , A2 ,.... Ak .

Suppose P ( Ai ) = pi for all i and that pi remains the same for all independent repetitions
of E. Consider n independent repetitions of E. Suppose A1 occurs X1 times, A2 occurs X2
times, ..., Ak occurs Xk times. Then
P(X1 = x1, X2 = x2, ..., Xk = xk) = [n!/(x1! x2! ... xk!)] p1^x1 p2^x2 ... pk^xk
for all non-negative integers x1, x2, ..., xk with x1 + x2 + ... + xk = n.

Proof. The probability of getting A1 x1 times, A2 x2 times, ..., Ak xk times in any one
particular order is p1^x1 p2^x2 ... pk^xk, as all the repetitions are independent. Now among
the n repetitions, A1 can occur x1 times in
nCx1 = n!/(x1!(n − x1)!) ways.
From the remaining n − x1 repetitions, A2 can occur x2 times in
(n − x1)Cx2 = (n − x1)!/(x2!(n − x1 − x2)!) ways, and so on.
Hence the total number of ways of getting A1 x1 times, A2 x2 times, ..., Ak xk times is
[n!/(x1!(n − x1)!)] x [(n − x1)!/(x2!(n − x1 − x2)!)] x ... x [(n − x1 − x2 − ... − x_(k−1))!/(xk! 0!)]
= n!/(x1! x2! ... xk!),   since x1 + x2 + ... + xk = n and 0! = 1.

Hence P(X1 = x1, X2 = x2, ..., Xk = xk) = [n!/(x1! x2! ... xk!)] p1^x1 p2^x2 ... pk^xk.

52

Example 26
A die is rolled 30 times. Find the probability of getting 1 2 times, 2 3 times, 3 4 times,
4 6 times, 5 7 times and 6 8 times.

Ans

30!
1

2! 3! 4! 6! 7! 8! 6

1
6

1
6

1
6

1
6

1
6

Example 27 (See exercise 4.72 of your text)


The probabilities are, respectively, 0.40, 0.40, and 0.20 that in city driving a certain type
of imported car will average less than 10 kms per litre, anywhere between 10 and 15 kms
per litre, or more than 15 kms per litre. Find the probability that among 12 such cars
tested, 4 will average less than 10 kms per litre, 6 will average anywhere from 10 to 15
kms per litre and 2 will average more than 15 kms per litre.

Solution
12!
(.40)4 (.40)6 (.20)2 .
4! 6! 2!
Remark
1.
Note that the different probabilities are the various terms in the expansion of the
multinomial

( p1 + p 2 + ...... p k )n .
Hence the name multinomial distribution.
2.
3.

The binomial distribution is a special case got by taking k =2.


For any fixed i (1 i k )X i (the number of ways of getting Ai ) is a random
variable having binomial distribution with parameters n and pi. Thus
E ( X i ) = n p i and V(X i ) = np i (1 p i ). i = 1,2..........k
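As a quick numerical check of the formula (this code is illustrative and not part of the original
text), scipy evaluates the multinomial probability directly; the call below reproduces the answer
to Example 27.

    # Sketch: evaluating a multinomial probability with scipy.
    from scipy.stats import multinomial

    # Example 27: n = 12 cars, category probabilities 0.40, 0.40, 0.20
    p = multinomial.pmf([4, 6, 2], n=12, p=[0.40, 0.40, 0.20])
    print(p)   # same value as 12!/(4!6!2!) * 0.4**4 * 0.4**6 * 0.2**2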

53

SIMULATION
Nowadays simulation techniques are being applied to many problems in Science and
Engineering. If the processes being simulated involve an element of chance, these
techniques are referred to as Monte Carlo methods. For example to study the distribution
of number of calls arriving at a telephone exchange, we can use simulation techniques.
Random Numbers: In simulation problems one uses tables of random numbers to
generate random deviates (values assumed by a random variable). A table of random
numbers consists of many pages on which the digits 0, 1, 2, ..., 9 are distributed in such a
way that the probability of any one digit appearing is the same, namely 0.1 = 1/10.

Use of random numbers to generate heads and tails. For example choose the 4th
column of the fourth page of Table 7, start at the top and go down the page. Thus we get
6, 2, 7, 5, 5, 0, 1, 8, 6, 3, ... We can interpret this as H, H, T, T, T, H, T, H, H, T, because the
prob of getting an odd number = the prob of getting an even number = 0.5. Thus we associate
a head with the occurrence of an even number and a tail with that of an odd number. We can also
associate a head if we get 5, 6, 7, 8 or 9 and a tail otherwise. Then we would say we got
H, T, H, H, H, T, T, H, H, T, ... In problems on simulation we shall adopt the second scheme
as it is easy to use and is easily extendable to more than two outcomes. Suppose for
example, we have an experiment having 4 outcomes with prob. 0.1, 0.2, 0.3 and 0.4
respectively.
Thus to simulate the above experiment, we have to allot one of the 10 digits 0, 1, ..., 9 to
the first outcome, two of them to the second outcome, three of them to the third outcome
and the remaining four to the fourth outcome. Though this can be done in a variety of
ways, we choose the simplest way as follows:
Associate the first digit 0 to the 1st outcome O1.
Associate the next 2 digits 1, 2 to the 2nd outcome O2.
Associate the next 3 digits 3, 4, 5 to the 3rd outcome O3.
And associate the last 4 digits 6, 7, 8, 9 to the 4th outcome O4.
Hence the above sequence 6, 2, 7, 5, 5, 0, 1, 8, 6, 3 of random numbers would correspond to
the sequence of outcomes O4, O2, O4, O3, O3, O1, O2, O4, O4, O3, ...

Using Two and Higher Digit Random Numbers in Simulation

Suppose we have a random experiment with three outcomes with probabilities 0.80, 0.15
and 0.05 respectively. How can we now use the table of random numbers to simulate this
experiment? We now read 2 digits at a time: say (starting from page 593, Row 12,
Column 4) 84, 71, 14, 24, 20, 31, 78, 03, ... Since P(any one digit) = 1/10, P(any two
digits) = (1/10)(1/10) = 0.01. Thus each 2-digit random number occurs with prob 0.01.
Note that there are 100 two-digit random numbers: 00, 01, ..., 10, 11, ..., 20, 21, ..., 98, 99.
Thus we associate the first 80 numbers 00, 01, ..., 79 to the first outcome, the next
15 numbers (80, 81, ..., 94) to the second outcome and the last 5 numbers (95, 96, ..., 99)
to the 3rd outcome. Thus the above sequence of 2-digit random numbers would simulate
the outcomes:

    O2, O1, O1, O1, O1, O1, O1, O1, ...

We describe the above scheme in a table as follows:

    Outcome    Probability    Cumulative Probability*    Random Numbers**
    O1         0.80           0.80                        00-79
    O2         0.15           0.95                        80-94
    O3         0.05           1.00                        95-99

*  Cumulative prob is got by adding all the probabilities at that position and above; thus the
   cumulative prob at O2 = Prob of O1 + Prob of O2 = 0.80 + 0.15 = 0.95.
** Observe that the beginning random number is 00 for the 1st outcome; for the remaining
   outcomes, it is one more than the ending random number of the immediately preceding outcome.
   Also the ending random number for each outcome is one less than 100 times the cumulative probability.

Similarly three digit random numbers are used if the prob of an outcome has 3 decimal
places. Read the example on page 133 of your text book.
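The digit-allotment scheme above is exactly an inverse lookup in the cumulative probabilities.
A small Python sketch (added for illustration; the outcome labels and digit stream are the ones
used in the text above) is:

    # Sketch: mapping 2-digit random numbers to outcomes via cumulative probabilities.
    import bisect

    outcomes = ["O1", "O2", "O3"]
    probs    = [0.80, 0.15, 0.05]

    # cumulative upper limits scaled to 2-digit integers: 79, 94, 99
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(round(total * 100) - 1)

    def outcome_of(two_digit_number):
        # first category whose upper limit is >= the random number
        return outcomes[bisect.bisect_left(cum, two_digit_number)]

    stream = [84, 71, 14, 24, 20, 31, 78, 3]
    print([outcome_of(r) for r in stream])   # ['O2', 'O1', 'O1', 'O1', 'O1', 'O1', 'O1', 'O1']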

55

Exercise 4.97 on page 136

    No. of polluting    Probability    Cumulative      Random Numbers
    species                            Probability
                        0.2466         0.2466          0000-2465
                        0.3452         0.5918          2466-5917
                        0.2417         0.8335          5918-8334
                        0.1128         0.9463          8335-9462
                        0.0395         0.9858          9463-9857
                        0.0111         0.9969          9858-9968
                        0.0026         0.9995          9969-9994
                        0.0005         1.0000          9995-9999

Starting with page 592, Row 14, Column 7, we read off the 4-digit random numbers:

    R. No.:  5095, 2631, 0150, 3033, 8043, 9167, 9079, 4998, 6440, 7036

Each random number is converted to a number of polluting species by locating the range in
which it falls in the table above.

CONTINUOUS RANDOM VARIABLES

In many situations, we come across random variables that take all values lying in a
certain interval of the x-axis.

Example
(1) The life length X of a bulb is a continuous random variable that can take all
    non-negative real values.
(2) The time between two consecutive arrivals in a queuing system is a random
    variable that can take all non-negative real values.
(3) The distance R of the point where a dart hits (measured from the centre) is a
    continuous random variable that can take all values in the interval (0, a) where
    a is the radius of the board.

It is clear that in all such cases, the probability that the random variable takes any one
particular value is not the question of interest (in fact it turns out to be zero); what matters is the
probability that the variable lies in an interval. For example, when you buy a bulb, you ask:
what are the chances that it will work for at least 500 hours?

Probability Density Function (pdf)

If X is a continuous random variable, questions about the probability that X takes
values in an interval (a, b) are answered by defining a probability density function.

Def. Let X be a continuous rv. A real function f(x) is called the prob density function of X if
(1) f(x) ≥ 0 for all x
(2) ∫ from −∞ to ∞ of f(x) dx = 1
(3) P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx.

Condition (1) is needed as a probability is always ≥ 0.
Condition (2) says that the probability of the certain event is 1.
Condition (3) says that to get the prob that X takes a value between a and b, we integrate the
function f(x) between a and b. (This is similar to finding the mass of a rod by integrating
its density function.)

Remarks
1. P(X = a) = P(a ≤ X ≤ a) = ∫ from a to a of f(x) dx = 0.
2. Hence P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b).
   Please note that, unlike the discrete case, it is immaterial whether we include or
   exclude one or both of the end points.
3. P(x ≤ X ≤ x + Δx) ≈ f(x) Δx.
   This is proved using the Mean Value Theorem.

Definition (Cumulative Distribution Function)

If X is a continuous rv and if f(x) is its density,

    P(X ≤ x) = P(−∞ < X ≤ x) = ∫ from −∞ to x of f(t) dt

We denote the above by F(x) and call it the cumulative distribution function (cdf) of X.

Properties of cdf
1. 0 ≤ F(x) ≤ 1 for all x.
2. x1 < x2 implies F(x1) ≤ F(x2), i.e., F(x) is a non-decreasing function of x.
3. F(−∞) = lim (x → −∞) F(x) = 0;  F(+∞) = lim (x → +∞) F(x) = 1.
4. dF(x)/dx = d/dx ∫ from −∞ to x of f(t) dt = f(x).
   (Thus we can get the density function f(x) by differentiating the distribution function F(x).)

Example 1 (Exercise 5.2 of your book)

If the prob density of a rv is given by f(x) = k x², 0 < x < 1 (and 0 elsewhere), find the
value of k and the probability that the rv takes on a value
(a) between 1/4 and 3/4
(b) greater than 2/3.
Find the distribution function F(x) and hence answer the above questions.

Solution
∫ from −∞ to ∞ of f(x) dx = 1 gives ∫ from 0 to 1 of f(x) dx = 1 (as f(x) = 0 if x < 0 or x > 1),
i.e. ∫ from 0 to 1 of k x² dx = 1, or k/3 = 1, or k = 3.
Thus f(x) = 3x², 0 < x < 1, and 0 otherwise.

(a) P(1/4 < X < 3/4) = ∫ from 1/4 to 3/4 of 3x² dx = (3/4)³ − (1/4)³ = 26/64 = 13/32

(b) P(X > 2/3) = P(2/3 < X < 1) = ∫ from 2/3 to 1 of 3x² dx = 1 − (2/3)³ = 19/27

Distribution function F(x) = ∫ from −∞ to x of f(t) dt.

Case (i)   x ≤ 0. In this case f(t) = 0 between −∞ and x, so F(x) = 0.
Case (ii)  0 < x < 1. In this case f(t) = 3t² between 0 and x, and 0 for t < 0, so
           F(x) = ∫ from 0 to x of 3t² dt = x³.
Case (iii) x ≥ 1. Now f(t) = 0 for t > 1, so
           F(x) = ∫ from −∞ to 1 of f(t) dt = 1 (by case ii).

Hence the distribution function is

    F(x) = 0     x ≤ 0
         = x³    0 < x ≤ 1
         = 1     x > 1

Now P(1/4 < X < 3/4) = P(X ≤ 3/4) − P(X ≤ 1/4) = F(3/4) − F(1/4) = (3/4)³ − (1/4)³ = 13/32,
and P(X > 2/3) = 1 − P(X ≤ 2/3) = 1 − F(2/3) = 1 − (2/3)³ = 19/27.
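The densities and distribution functions in these examples are easy to check on a computer.
A short sketch (illustrative only) using scipy's numerical integration reproduces the answers
of Example 1:

    # Sketch: numerical check of Example 1, f(x) = 3x^2 on (0, 1).
    from scipy.integrate import quad

    f = lambda x: 3 * x**2 if 0 < x < 1 else 0.0

    total, _ = quad(f, 0, 1)            # should be 1 (so k = 3 is right)
    p_a, _   = quad(f, 0.25, 0.75)      # P(1/4 < X < 3/4) = 13/32 = 0.40625
    p_b, _   = quad(f, 2/3, 1)          # P(X > 2/3) = 19/27 ≈ 0.7037
    print(total, p_a, p_b)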

Example 2 (Exercise 5.4 of your book)

The prob density of a rv X is given by

    f(x) = x        0 < x < 1
         = 2 − x    1 ≤ x < 2
         = 0        elsewhere

Find the prob that the rv takes a value
(a) between 0.2 and 0.8
(b) between 0.6 and 1.2.
Find the distribution function and answer the same questions.

Solution
(a) P(0.2 < X < 0.8) = ∫ from 0.2 to 0.8 of f(x) dx = ∫ from 0.2 to 0.8 of x dx
                     = (0.8² − 0.2²)/2 = 0.3

(b) P(0.6 < X < 1.2) = ∫ from 0.6 to 1.2 of f(x) dx
                     = ∫ from 0.6 to 1 of x dx + ∫ from 1 to 1.2 of (2 − x) dx   (why?)
                     = (1 − 0.36)/2 + [1/2 − (2 − 1.2)²/2]
                     = 0.32 + 0.18 = 0.5

To find the distribution function F(x) = P(X ≤ x) = ∫ from −∞ to x of f(t) dt:

Case (i)   x ≤ 0. In this case f(t) = 0 for t ≤ x, so F(x) = 0.
Case (ii)  0 < x ≤ 1. In this case f(t) = 0 for t ≤ 0 and f(t) = t for 0 < t ≤ x. Hence
           F(x) = ∫ from 0 to x of t dt = x²/2.
Case (iii) 1 < x ≤ 2. In this case
           F(x) = ∫ from −∞ to 1 of f(t) dt + ∫ from 1 to x of (2 − t) dt
                = 1/2 + [1/2 − (2 − x)²/2] = 1 − (2 − x)²/2.
Case (iv)  x > 2. In this case f(t) = 0 for 2 < t < x, so
           F(x) = ∫ from −∞ to 2 of f(t) dt + ∫ from 2 to x of 0 dt = 1 (by case iii).

Thus
    F(x) = 0                 x ≤ 0
         = x²/2              0 < x ≤ 1
         = 1 − (2 − x)²/2    1 < x ≤ 2
         = 1                 x > 2

P(0.6 < X < 1.2) = P(X ≤ 1.2) − P(X ≤ 0.6) = F(1.2) − F(0.6)
                 = [1 − (0.8)²/2] − (0.6)²/2 = 0.68 − 0.18 = 0.5

P(X > 1.8) = 1 − P(X ≤ 1.8) = 1 − F(1.8) = (0.2)²/2 = 0.02

The Mean and Variance of a Continuous r.v.

Let X be a continuous rv with density f(x).
We define its mean as

    μ = E(X) = ∫ from −∞ to ∞ of x f(x) dx

We define its variance σ² as

    σ² = E[(X − μ)²] = ∫ from −∞ to ∞ of (x − μ)² f(x) dx = E(X²) − μ²

Here E(X²) = ∫ from −∞ to ∞ of x² f(x) dx.

Example 3  The density of a rv X is

    f(x) = 3x²   0 < x < 1 (and 0 elsewhere)

Its mean μ = E(X) = ∫ from 0 to 1 of x · 3x² dx = 3/4.
E(X²) = ∫ from 0 to 1 of x² · 3x² dx = 3/5.
Hence σ² = 3/5 − (3/4)² = 0.0375, and its sd is σ = 0.1936.

Example 4  The density of a rv X is

    f(x) = (1/20) e^(−x/20)   x > 0
         = 0                  elsewhere

μ = E(X) = ∫ from 0 to ∞ of x · (1/20) e^(−x/20) dx.
Integrating by parts we get

    μ = [ −x e^(−x/20) − 20 e^(−x/20) ] evaluated from 0 to ∞ = 20.

E(X²) = ∫ from 0 to ∞ of x² · (1/20) e^(−x/20) dx.
On integrating by parts we get

    E(X²) = [ −x² e^(−x/20) − 40x e^(−x/20) − 800 e^(−x/20) ] evaluated from 0 to ∞ = 800.

    σ² = E(X²) − μ² = 800 − 400 = 400,  so  σ = 20.

NORMAL DISTRIBUTION
A random variable X is said to have the normal distribution (or Gaussian distribution) if
its density is

    f(x; μ, σ²) = [1/(σ√(2π))] e^(−(x − μ)²/(2σ²)),   −∞ < x < ∞

Here μ, σ are fixed (called parameters) and σ > 0. The graph of the normal density is
a bell-shaped curve:

[Figure: the bell-shaped normal density curve]

It is symmetrical about the line x = μ and has points of inflection at x = μ ± σ.
One can use integration and show that ∫ from −∞ to ∞ of f(x) dx = 1. We also see that E(X) = μ and
the variance of X = E[(X − μ)²] = σ².

If μ = 0, σ = 1, we say that X has the standard normal distribution. We usually use the
symbol Z to denote the variable having the standard normal distribution. Thus when Z is
standard normal, its density is

    f(z) = [1/√(2π)] e^(−z²/2),   −∞ < z < ∞.

The cumulative distribution function of Z is

    F(z) = P(Z ≤ z) = [1/√(2π)] ∫ from −∞ to z of e^(−t²/2) dt

and represents the area under the density up to z. It is the shaded portion in the figure.

[Figure: area under the standard normal density up to z]

We at once see from the symmetry of the graph that F(0) = 1/2 = 0.5 and

    F(−z) = 1 − F(z).

F(z) for various positive z has been tabulated in Table 3 (at the end of your book).
We thus see from Table 3 that

    F(0.37) = 0.6443,  F(1.645) = 0.95,  F(2.33) = 0.99,  F(z) ≈ 1 for z ≥ 3.

Hence F(−0.37) = 1 − 0.6443 = 0.3557, F(−1.645) = 1 − 0.95 = 0.05, etc.

Definition of z_α
If Z is standard normal, we define z_α to be that number such that

    P(Z > z_α) = α,  or  F(z_α) = 1 − α.

Since F(1.645) = 0.95 = 1 − 0.05, we see that z_0.05 = 1.645. Similarly z_0.01 = 2.33.
We also note z_(1−α) = −z_α. Thus

    z_0.95 = −z_0.05 = −1.645,   z_0.99 = −z_0.01 = −2.33.

Important
If X is normal with mean μ and variance σ², it can be shown that the standardized r.v.
Z = (X − μ)/σ has the standard normal distribution. Thus questions about the prob that X
assumes a value between say a and b can be translated into the prob that Z assumes
values in a corresponding range. Specifically:

    P(a < X < b) = P((a − μ)/σ < Z < (b − μ)/σ) = F((b − μ)/σ) − F((a − μ)/σ).
Example 1 (See Exercise 5.24 on page 152)

Given that X has a normal distribution with mean μ = 16.2 and variance σ² = 1.5625,
find the prob that it will take on a value
(a) greater than 16.8
(b) less than 14.9
(c) between 13.6 and 18.8
(d) between 16.5 and 16.7.

Here σ = √1.5625 = 1.25.

(a) P(X > 16.8) = P(Z > (16.8 − 16.2)/1.25) = P(Z > 0.48)
               = 1 − P(Z ≤ 0.48) = 1 − F(0.48) = 1 − 0.6844 = 0.3156

(b) P(X < 14.9) = P(Z < (14.9 − 16.2)/1.25) = P(Z < −1.04)
               = F(−1.04) = 1 − F(1.04) = 1 − 0.8508 = 0.1492

(c) P(13.6 < X < 18.8) = P((13.6 − 16.2)/1.25 < Z < (18.8 − 16.2)/1.25)
                       = P(−2.08 < Z < 2.08) = F(2.08) − F(−2.08)
                       = 2F(2.08) − 1 = 2 × 0.9812 − 1 = 0.9624
    (Note that P(−c < Z < c) = 2F(c) − 1 for c > 0.)

(d) P(16.5 < X < 16.7) = P((16.5 − 16.2)/1.25 < Z < (16.7 − 16.2)/1.25)
                       = P(0.24 < Z < 0.4) = F(0.4) − F(0.24)
                       = 0.6554 − 0.5948 = 0.0606
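The same probabilities can be read off a computer instead of Table 3. The sketch below
(illustrative, not part of the original text) uses scipy's normal cdf for part (c) of Example 1:

    # Sketch: normal probabilities without tables (Example 1, part (c)).
    from scipy.stats import norm

    mu, sigma = 16.2, 1.25
    p = norm.cdf(18.8, loc=mu, scale=sigma) - norm.cdf(13.6, loc=mu, scale=sigma)
    print(p)   # about 0.9624, matching 2*F(2.08) - 1 from Table 3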

Example 2
A rv X has a normal distribution with σ = 10. If the prob is 0.8212 that it will take on a
value < 82.5, what is the prob that it will take on a value > 58.3?

Solution
Let the mean (unknown) be μ.
Given P(X < 82.5) = 0.8212.
Thus P(Z < (82.5 − μ)/10) = 0.8212, i.e. F((82.5 − μ)/10) = 0.8212.
From Table 3, (82.5 − μ)/10 = 0.92, or μ = 82.5 − 9.2 = 73.3.

Hence P(X > 58.3) = P(Z > (58.3 − 73.3)/10) = P(Z > −1.5)
                  = 1 − P(Z ≤ −1.5) = 1 − F(−1.5)
                  = 1 − (1 − F(1.5)) = F(1.5) = 0.9332

Example 3 (See Exercise 5.33 on page 152)

In a photographic process the developing time of prints may be looked upon as a r.v. X
having normal distribution with μ = 16.28 seconds and s.d. σ = 0.12 second. Find the
value which will be exceeded by the time it takes to develop one of the prints with
probability 0.95.

Solution
That is, find a number c so that P(X > c) = 0.95,
i.e. P(Z > (c − 16.28)/0.12) = 0.95,
hence P(Z ≤ (c − 16.28)/0.12) = 0.05, so that

    (c − 16.28)/0.12 = −z_0.05 = −1.645
    c = 16.28 − 0.12 × 1.645 = 16.08 (approx.)

NORMAL APPROXIMATION TO BINOMIAL DISTRIBUTION

Suppose X is a r.v. having Binomial distribution with parameters n and p. Then it can be
shown that

    P((X − np)/√(npq) ≤ z) → P(Z ≤ z) = F(z)  as n → ∞,

i.e. in words, the standardized binomial tends to the standard normal.

Thus when n is large, the binomial probabilities can be approximated using the normal
distribution function.

Example 4 (See Exercise 5.36 on page 153)

A manufacturer knows that on the average 2% of the electric toasters that he makes will
require repairs within 90 days after they are sold. Use the normal approximation to the
binomial distribution to determine the prob that among 1200 of these toasters at least 30
will require repairs within the first 90 days after they are sold.

Solution
Let X = no. of toasters (among 1200) that require repairs within the first 90 days after
they are sold. Hence X is a rv having Binomial distribution with parameters n = 1200 and
p = 2/100 = 0.02. Thus np = 24 and √(npq) = √(1200 × 0.02 × 0.98) = 4.85.

Required: P(X ≥ 30) = P((X − np)/√(npq) ≥ (30 − 24)/4.85)
                    ≈ P(Z ≥ 1.24) = 1 − P(Z < 1.24)
                    = 1 − F(1.24) = 1 − 0.8925 = 0.1075

Correction for Continuity

Since for continuous rvs P(X ≥ c) = P(X > c) (which is not true for discrete rvs), when we
approximate a binomial prob by a normal prob, we must ensure that we do not lose the end
point. This is achieved by what we call the continuity correction: in the previous example,
P(X ≥ 30) is also = P(X ≥ 29.5) (read the justification given in your book on page 150,
lines 1 to 7).

    P(X ≥ 29.5) = P((X − np)/√(npq) ≥ (29.5 − 24)/4.85)
                ≈ P(Z ≥ 5.5/4.85) = P(Z ≥ 1.13)
                = 1 − F(1.13) = 1 − 0.8708 = 0.1292

(probably a better answer).
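It is instructive to compare the approximation with the exact binomial value; the sketch below
(added for illustration) does this for Example 4.

    # Sketch: exact binomial tail vs normal approximation with continuity correction.
    from scipy.stats import binom, norm
    from math import sqrt

    n, p = 1200, 0.02
    mu, sd = n * p, sqrt(n * p * (1 - p))

    exact  = binom.sf(29, n, p)            # P(X >= 30) exactly
    approx = norm.sf((29.5 - mu) / sd)     # with continuity correction
    print(exact, approx)                   # both close to 0.13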


70

Example 5 (See Exercise 5.38 on page 153)

A safety engineer feels that 30% of all industrial accidents in her plant are caused by
failure of employees to follow instructions. Find approximately the prob that among 84
industrial accidents anywhere from 20 to 30 (inclusive) will be due to failure of
employees to follow instructions.

Solution
Let X = no. of accidents (among 84) due to failure of employees to follow instructions.
Thus X is a rv having Binomial distribution with parameters n = 84 and p = 0.3.
Thus np = 25.2 and √(npq) = √(84 × 0.3 × 0.7) = 4.2.

Required: P(20 ≤ X ≤ 30)
        = P(19.5 ≤ X ≤ 30.5)   (continuity correction)
        = P((19.5 − 25.2)/4.2 ≤ (X − np)/√(npq) ≤ (30.5 − 25.2)/4.2)
        ≈ P(−1.36 ≤ Z ≤ 1.26)
        = F(1.26) − F(−1.36) = F(1.26) + F(1.36) − 1
        = 0.8962 + 0.9131 − 1 = 0.8093

OTHER PROBABILITY DENSITIES

The Uniform Distribution
A r.v. X is said to have the uniform distribution over the interval (α, β) if its density is given by

    f(x) = 1/(β − α)   α < x < β
         = 0           elsewhere

Thus the graph of the density is a constant over the interval (α, β).

If α < c < d < β,

    P(c < X < d) = ∫ from c to d of 1/(β − α) dx = (d − c)/(β − α)

and thus is proportional to the length of the interval (c, d).

You may verify that
The mean of X = E(X) = μ = (α + β)/2   (the midpoint of the interval (α, β))
The variance of X = σ² = (β − α)²/12.
The cumulative distribution function is

    F(x) = 0                   x ≤ α
         = (x − α)/(β − α)     α < x ≤ β
         = 1                   x > β

Example 6 (See page 165, Exercise 5.46)

In certain experiments, the error X made in determining the solubility of a substance is a
rv having the uniform density with α = −0.025 and β = 0.025. What is the prob that such an
error will be
(a) between 0.010 and 0.015?
(b) between −0.012 and 0.012?

Solution
(a) P(0.010 < X < 0.015) = (0.015 − 0.010)/(0.025 − (−0.025)) = 0.005/0.050 = 0.1

(b) P(−0.012 < X < 0.012) = (0.012 − (−0.012))/(0.025 − (−0.025)) = 12/25 = 0.48

Example 7 (See Exercise 5.47 on page 165)

From experience, Mr. Harris has found that the low bid on a construction job can be
regarded as a rv X having the uniform density

    f(x) = 3/(4C)   2C/3 < x < 2C
         = 0        elsewhere

where C is his own estimate of the cost of the job. What percentage should Mr. Harris
add to his cost estimate when submitting bids to maximize his expected profit?

Solution
Suppose Mr. Harris adds k% of C when submitting his bid. Thus Mr. Harris gets a profit of
kC/100 if he gets the contract, which happens if the lowest competing bid X ≥ C + kC/100, and
he gets no profit if the lowest bid is < C + kC/100. Thus the prob that he gets the contract is

    P(C + kC/100 < X < 2C) = [3/(4C)] [2C − (C + kC/100)] = (3/4)(1 − k/100).

Thus the expected profit of Mr. Harris is

    (kC/100)(3/4)(1 − k/100) + 0 × (...) = (3C/400)(k − k²/100)

which is maximum (by using calculus) when k = 50.
Thus Mr. Harris's expected profit is a maximum when he adds 50% of C to C when
submitting bids.

Gamma Function
This is one of the most useful functions in Mathematics. If x > 0, it can be shown that the
improper integral ∫ from 0 to ∞ of e^(−t) t^(x−1) dt converges to a finite real number which we
denote by Γ(x) (capital gamma of x). Thus for all real x > 0, we define

    Γ(x) = ∫ from 0 to ∞ of e^(−t) t^(x−1) dt.

Properties of the Gamma Function

1. Γ(x + 1) = x Γ(x),  x > 0
2. Γ(1) = 1
3. Γ(2) = 1·Γ(1) = 1,  Γ(3) = 2·Γ(2) = 2 × 1 = 2!
   More generally Γ(n + 1) = n! whenever n is a +ve integer or zero.
4. Γ(1/2) = √π.
5. Γ(x) decreases in the interval (0, 1), increases in the interval (2, ∞), and has a
   minimum somewhere between 1 and 2.

THE GAMMA DISTRIBUTION

Let α, β be 2 +ve real numbers. A r.v. X is said to have a Gamma distribution with
parameters α, β if its density is

    f(x) = [1/(β^α Γ(α))] x^(α−1) e^(−x/β)   x > 0
         = 0                                  elsewhere

It can be shown that

    Mean of X = E(X) = μ = αβ   (see the working on page 159 of your text book)
    Variance of X = σ² = αβ².

Exponential Distribution
If α = 1, we say X has the exponential distribution. Thus X has an exponential distribution
(with parameter β > 0) if its density is

    f(x) = (1/β) e^(−x/β)   x > 0
         = 0                elsewhere

We also see easily that:

1. Mean of X = E(X) = β
2. Variance of X = σ² = β²
3. The cumulative distribution function of X is
       F(x) = 1 − e^(−x/β)   x > 0
            = 0              elsewhere
4. X has the memoryless property:
       P(X > s + t | X > s) = P(X > t),   s, t > 0.

Proof of (4): P(X > s) = 1 − P(X ≤ s) = 1 − F(s) = e^(−s/β)   (by (3)).

    P(X > s + t | X > s) = P((X > s + t) ∩ (X > s)) / P(X > s)
                         = P(X > s + t) / P(X > s)
                         = e^(−(s+t)/β) / e^(−s/β) = e^(−t/β) = P(X > t).   QED
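The memoryless property is easy to see numerically; the sketch below (illustrative only)
checks it for an exponential variable with β = 50 using scipy's survival function.

    # Sketch: numerical check of P(X > s+t | X > s) = P(X > t) for the exponential.
    from scipy.stats import expon

    beta, s, t = 50, 30, 20
    X = expon(scale=beta)              # exponential with mean beta
    lhs = X.sf(s + t) / X.sf(s)        # conditional probability
    rhs = X.sf(t)
    print(lhs, rhs)                    # both equal exp(-t/beta) ≈ 0.6703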

Example 8 (See Exercise 5.54 on page 166)

In a certain city, the daily consumption of electric power (in millions of kw hours) can be
treated as a r.v. X having a Gamma distribution with α = 3, β = 2. If the power plant in
the city has a daily capacity of 12 million kw hrs, what is the prob. that the power supply
will be inadequate on any given day?

Solution
The power supply will be inadequate if the demand exceeds the daily capacity.
Hence the prob that the power supply is inadequate

    = P(X > 12) = ∫ from 12 to ∞ of f(x) dx.

Now as α = 3, β = 2,

    f(x) = [1/(2³ Γ(3))] x² e^(−x/2) = (1/16) x² e^(−x/2),

hence P(X > 12) = (1/16) ∫ from 12 to ∞ of x² e^(−x/2) dx.
Integrating by parts, we get

    = (1/16) [ −2x² e^(−x/2) − 8x e^(−x/2) − 16 e^(−x/2) ] evaluated from 12 to ∞
    = (1/16) [ 2 × 144 e^(−6) + 8 × 12 e^(−6) + 16 e^(−6) ]
    = (400/16) e^(−6) = 25 e^(−6) = 0.062
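With scipy the same tail probability comes straight from the gamma survival function
(illustrative sketch; note that scipy's `scale` parameter plays the role of β here):

    # Sketch: P(X > 12) for a Gamma(alpha = 3, beta = 2) variable.
    from scipy.stats import gamma

    p = gamma.sf(12, a=3, scale=2)   # survival function = 1 - cdf
    print(p)                         # ≈ 0.0620, i.e. 25 * exp(-6)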

Example 9 (See Exercise 5.58 on page 166)

The amount of time that a surveillance camera will run without having to be reset is a r.v.
X having exponential distribution with β = 50 days. Find the prob that such a camera
(a) will have to be reset in less than 20 days;
(b) will not have to be reset in at least 60 days.

Solution
The density of X is f(x) = (1/50) e^(−x/50), x > 0 (and 0 elsewhere).

(a) P(the camera has to be reset in < 20 days) = P(the running time < 20)
    = P(X < 20) = ∫ from 0 to 20 of (1/50) e^(−x/50) dx = [−e^(−x/50)] from 0 to 20
    = 1 − e^(−20/50) = 1 − e^(−2/5) = 0.3297

(b) P(the camera will not have to be reset in at least 60 days)
    = P(X > 60) = ∫ from 60 to ∞ of (1/50) e^(−x/50) dx = e^(−60/50) = e^(−6/5) = 0.3012

Example 10 (See Exercise 5.61 on page 166)

Given a Poisson process with on the average α arrivals per unit time, find the prob density
of the inter-arrival time (i.e. the time between two consecutive arrivals).

Solution
Let T be the time between two consecutive arrivals. Thus clearly T is a continuous r.v.
with values > 0. Now T > t if and only if there is no arrival in the time period t.
Thus P(T > t) = P(X_t = 0)   (X_t = number of arrivals in the time period t)
              = e^(−αt)   (as X_t has a Poisson distribution with parameter λ = αt).
Hence the distribution function of T is

    F(t) = P(T ≤ t) = 1 − P(T > t) = 1 − e^(−αt),  t > 0
    (F(t) = 0 clearly for all t ≤ 0).

Hence the density of T is

    f(t) = dF(t)/dt = α e^(−αt)   if t > 0
         = 0                      elsewhere

Hence we would say the inter-arrival time is a continuous rv with exponential density with
parameter (mean) β = 1/α.

The Beta Function

If x, y > 0 the beta function B(x, y) (read capital Beta of x, y) is defined by

    B(x, y) = ∫ from 0 to 1 of t^(x−1) (1 − t)^(y−1) dt

It is well known that B(x, y) = Γ(x)Γ(y)/Γ(x + y),  x, y > 0.

BETA DISTRIBUTION
A r.v. X is said to have a Beta distribution with parameters α, β > 0 if its density is

    f(x) = [1/B(α, β)] x^(α−1) (1 − x)^(β−1)   0 < x < 1
         = 0                                    elsewhere

It is easily shown that

(1) E(X) = μ = α/(α + β)
(2) V(X) = σ² = αβ / [(α + β)²(α + β + 1)]

Example 11 (See Exercise 5.64)

If the annual proportion of erroneous income tax returns can be looked upon as a rv
having a Beta distribution with α = 2, β = 9, what is the prob that in any given year
there will be fewer than 10% erroneous returns?

Solution
Let X = annual proportion of erroneous income tax returns. Thus X has a Beta density
with α = 2, β = 9.

    P(X < 0.1) = ∫ from 0 to 0.1 of f(x) dx   (note the proportion cannot be < 0)
               = [1/B(2, 9)] ∫ from 0 to 0.1 of x (1 − x)^8 dx

    B(2, 9) = Γ(2)Γ(9)/Γ(11) = 1 × 8!/10! = 1/(9 × 10) = 1/90

Integrating by parts,

    ∫ from 0 to 0.1 of x (1 − x)^8 dx = [ −x(1 − x)^9/9 ] from 0 to 0.1 + (1/9) ∫ from 0 to 0.1 of (1 − x)^9 dx
    = −(0.1)(0.9)^9/9 + (1/90)[1 − (0.9)^10] = 0.00293

Hence P(X < 0.1) = 90 × 0.00293 = 0.264 (approx.)

The Log-Normal Distribution

A r.v. X is said to have a log-normal distribution if its density is

    f(x) = [1/(βx√(2π))] e^(−(ln x − α)²/(2β²))   x > 0, β > 0
         = 0                                        elsewhere

It can be shown that if X has the log-normal distribution, Y = ln X has a normal distribution
with mean α and s.d. β.
Thus P(a < X < b) = P(ln a < ln X < ln b)
                  = P((ln a − α)/β < Z < (ln b − α)/β)
                  = F((ln b − α)/β) − F((ln a − α)/β),
where F(z) = cdf of the standard normal variable Z.

Lengthy calculations show that if X has the log-normal distribution, its mean is
E(X) = e^(α + β²/2) and its variance is e^(2α + β²) (e^(β²) − 1).

More Problems on Normal Distribution

Example 12
Let X be normal with mean μ and sd σ. Determine c as a function of μ and σ such that

    P(X ≤ c) = 2 P(X ≥ c).

Solution
P(X ≤ c) = 2 P(X ≥ c) implies P(X ≤ c) = 2[1 − P(X ≤ c)].
Let P(X ≤ c) = p. Thus 3p = 2, or p = 2/3.
Now P(X ≤ c) = P(Z ≤ (c − μ)/σ) = F((c − μ)/σ) = 2/3 = 0.6667.
This implies (c − μ)/σ = 0.43 (approx., from Table 3), so

    c = μ + 0.43 σ.

Example 13
Suppose X is normal with mean 0 and sd 5. Find P(1 < X² < 4).

Solution
P(1 < X² < 4) = P(1 < |X| < 2) = 2 P(1 < X < 2)   (by symmetry about 0)
             = 2 [P(Z < 2/5) − P(Z < 1/5)] = 2 [F(2/5) − F(1/5)]
             = 2 (0.6554 − 0.5793)   (from Table 3)
             = 2 (0.0761) = 0.1522

Example 14
The annual rainfall in a certain locality is a r.v. X having normal distribution with mean
29.5 and sd 2.5. How many inches of rain (annually) is exceeded about 5% of the time?

Solution
That is, we have to find a number C such that P(X > C) = 0.05,
i.e. P(Z > (C − 29.5)/2.5) = 0.05.
Hence (C − 29.5)/2.5 = z_0.05 = 1.645, so

    C = 29.5 + 2.5 × 1.645 = 33.6125.

Example 15
A rocket fuel is to contain a certain percent (say X) of a particular compound. The
specification calls for X to lie between 30 and 35. The manufacturer will make a net
profit on the fuel per gallon which is the following function of X:

    T(X) =  $0.10 per gallon   if 30 < X < 35
            $0.05 per gallon   if 35 ≤ X < 40 or 25 < X ≤ 30
           −$0.10 per gallon   elsewhere.

If X has a normal distribution with mean 33 and s.d. 3, find the prob distribution of T and
hence the expected profit per gallon.

Solution
T = 0.10 if 30 < X < 35:

    P(T = 0.10) = P(30 < X < 35) = P((30 − 33)/3 < Z < (35 − 33)/3) = P(−1 < Z < 2/3)
                = F(2/3) − F(−1) = F(2/3) + F(1) − 1
                = 0.7486 + 0.8413 − 1 = 0.5899

    P(T = 0.05) = P(35 ≤ X < 40) + P(25 < X ≤ 30)
                = P(2/3 ≤ Z < 7/3) + P(−8/3 < Z ≤ −1)
                = F(7/3) − F(2/3) + F(8/3) − F(1)
                = 0.9901 − 0.7486 + 0.9961 − 0.8413 = 0.3963

Hence P(T = −0.10) = 1 − 0.5899 − 0.3963 = 0.0138.

Hence the expected profit = E(T)
    = 0.10 × 0.5899 + 0.05 × 0.3963 + (−0.10) × 0.0138
    = $0.077425 per gallon.

JOINT DISTRIBUTIONS - Two and Higher Dimensional Random Variables

Suppose X, Y are 2 discrete rvs and suppose X can take values x1, x2, ... and Y can take
values y1, y2, .... We refer to the function f(x, y) = P(X = x, Y = y) as the joint prob
distribution of X and Y. The ordered pair (X, Y) is sometimes referred to as a two-
dimensional discrete r.v.

Example 16
Two cards are drawn at random from a pack of 52 cards. Let X be the number of aces
drawn and Y be the number of queens drawn.
Find the joint prob distribution of X and Y.

Solution
Clearly X can take any one of the three values 0, 1, 2 and Y one of the three values 0, 1, 2.
The joint prob distribution of X and Y is depicted in the following 3 × 3 table, where every
entry is to be divided by C(52, 2), the number of ways of choosing 2 cards:

    y \ x        x = 0              x = 1              x = 2
    y = 0        C(44, 2)           C(4,1) C(44,1)     C(4, 2)
    y = 1        C(4,1) C(44,1)     C(4,1) C(4,1)      0
    y = 2        C(4, 2)            0                  0

Justification
P(X = 0, Y = 0) = P(no aces and no queens in the 2 cards) = C(44, 2)/C(52, 2).

P(X = 1, Y = 0) (the entry in the 2nd column and 1st row)
    = P(one ace and one other card which is neither an ace nor a queen)
    = C(4,1) C(44,1)/C(52, 2),
etc.

Can we write down the distribution of X? X can take any one of the 3 values 0, 1, 2.
What is P(X = 0)?
X = 0 means no ace is drawn, but we might draw 2 queens, or 1 queen and one non-queen,
or 2 cards which are neither aces nor queens. Thus

    P(X = 0) = P(X = 0, Y = 0) + P(X = 0, Y = 1) + P(X = 0, Y = 2)
             = sum of the 3 probabilities in column 1
             = [C(44, 2) + C(4,1) C(44,1) + C(4, 2)]/C(52, 2) = C(48, 2)/C(52, 2)   (verify!)

Similarly

    P(X = 1) = P(X = 1, Y = 0) + P(X = 1, Y = 1) + P(X = 1, Y = 2)
             = sum of the 3 probabilities in column 2
             = [C(4,1) C(44,1) + C(4,1) C(4,1) + 0]/C(52, 2) = C(4,1) C(48,1)/C(52, 2)   (verify!)

    P(X = 2) = P(X = 2, Y = 0) + P(X = 2, Y = 1) + P(X = 2, Y = 2)
             = sum of the 3 probabilities in column 3
             = [C(4, 2) + 0 + 0]/C(52, 2) = C(4, 2)/C(52, 2)

The distribution of X derived from the joint distribution of X and Y is referred to as the
marginal distribution of X.
Similarly the marginal distribution of Y is given by the 3 row totals.
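Marginal distributions are just row and column sums of the joint table; the short sketch below
(illustrative, using the 52-card example above) builds the joint table and the marginal of X
with numpy.

    # Sketch: joint distribution of (aces X, queens Y) in 2 cards, and the marginal of X.
    import numpy as np
    from math import comb

    total = comb(52, 2)
    joint = np.zeros((3, 3))            # joint[y, x] = P(X = x, Y = y)
    for x in range(3):
        for y in range(3):
            if x + y <= 2:
                joint[y, x] = comb(4, x) * comb(4, y) * comb(44, 2 - x - y) / total

    marginal_X = joint.sum(axis=0)      # column sums
    print(marginal_X)                   # [C(48,2), 4*48, C(4,2)] / C(52,2)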

Example 17
The joint prob distribution of X and Y is given by

    y \ x            x = −1    x = 0    x = 1    Marginal dist. of Y
    y = −1            1/8       1/8      1/8            3/8
    y =  0            1/8        0       1/8            2/8
    y =  1            1/8       1/8      1/8            3/8
    Marginal
    dist. of X        3/8       2/8      3/8

Write the marginal distributions of X and Y. To get the marginal distribution of X, we find
the column totals and write them in the (bottom) margin. Thus the (marginal) distribution
of X is

    X       −1     0     1
    Prob    3/8   2/8   3/8

(Do you see why we call it the marginal distribution?)

Similarly, to get the marginal distribution of Y, we find the 3 row totals and write them in
the (right) margin. Thus the marginal distribution of Y is

    Y       −1     0     1
    Prob    3/8   2/8   3/8

Notation: If f(x, y) = P(X = x, Y = y) is the joint prob distribution of the 2-dimensional
discrete r.v. (X, Y), we denote by g(x) the marginal distribution of X and by h(y) the
marginal distribution of Y. Thus

    g(x) = P(X = x) = Σ over all y of P(X = x, Y = y) = Σ over all y of f(x, y)
and
    h(y) = P(Y = y) = Σ over all x of P(X = x, Y = y) = Σ over all x of f(x, y)

Conditional Distribution
The conditional prob distribution of Y for a given X = x is defined as

    h(y | x) = P(Y = y | X = x)   (read: prob of Y = y given X = x)
             = P(X = x, Y = y)/P(X = x) = f(x, y)/g(x)

where g(x) is the marginal distribution of X.

Thus in the above Example 17,

    h(0 | 1) = P(Y = 0 | X = 1) = P(X = 1, Y = 0)/P(X = 1) = (1/8)/(3/8) = 1/3.

Similarly, the conditional prob distribution of X for a given Y = y is defined as

    g(x | y) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = f(x, y)/h(y)

where h(y) is the marginal distribution of Y.

In the above example,

    g(0 | 0) = P(X = 0 | Y = 0) = P(X = 0, Y = 0)/P(Y = 0) = 0/(2/8) = 0.

Independence
We say X, Y are independent if

    P(X = x, Y = y) = P(X = x) P(Y = y) for all x, y.

Thus X, Y are independent if and only if f(x, y) = g(x) h(y) for all x and y,
which is the same as saying g(x | y) = g(x) for all x and y, which is the same as saying
h(y | x) = h(y) for all x, y.
In the above example X, Y are not independent as P(X = 0, Y = 0) ≠ P(X = 0) P(Y = 0).

Example 18
The joint prob distribution of X and Y is given by

    x \ y      y = 2    y = 0    y = 1
    x = 2       0.1      0.05     0.1
    x = 0       0.2      0.1      0.1
    x = 1       0.1      0.15     0.1

(a) Find the marginal distribution of X.
    Ans
        X        2      0      1
        Prob    0.25   0.4    0.35

(b) Find the marginal distribution of Y.
    Ans
        Y        2      0      1
        Prob    0.4    0.3    0.3

(c) Find P(X + Y = 2).
    Ans: X + Y = 2 if (X = 2, Y = 0) or (X = 1, Y = 1) or (X = 0, Y = 2).
    Thus P(X + Y = 2) = 0.05 + 0.1 + 0.2 = 0.35

(d) Find P(X − Y = 0).
    Ans: X − Y = 0 if (X = 2, Y = 2) or (X = 0, Y = 0) or (X = 1, Y = 1).
    P(X − Y = 0) = 0.1 + 0.1 + 0.1 = 0.3

(e) Find P(X ≥ 0).  Ans: 1.

(f) Find P(X − Y = 0 | X ≥ 0).  Ans: 0.3/1 = 0.3.

(g) Find P(X − Y = 0 | X ≥ 1).  Ans: 0.2/0.6 = 1/3.

(h) Are X, Y independent?
    Ans: No! P(X = 1, Y = 1) ≠ P(X = 1) P(Y = 1).

Two-Dimensional Continuous Random Variables

Let (X, Y) be a continuous 2-dimensional r.v. This means (X, Y) can take all values in a
certain region of the x-y plane. For example, suppose a dart is thrown at a circular board
of radius 2. Then the position where the dart hits the board, (X, Y), is a continuous two-
dimensional r.v. as it can take all values (x, y) such that x² + y² ≤ 4.
A function f(x, y) is said to be the joint prob density of (X, Y) if
(i)   f(x, y) ≥ 0 for all x, y
(ii)  ∫∫ over the whole plane of f(x, y) dy dx = 1
(iii) P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫ from a to b ∫ from c to d of f(x, y) dy dx.

Example 19(a)
Let the joint prob density of (X, Y) be

    f(x, y) = 1/4   0 ≤ x ≤ 2, 0 ≤ y ≤ 2
            = 0     elsewhere

Find P(X + Y ≤ 1).
Ans: The region x + y ≤ 1 (inside the square) is the shaded triangle with vertices (0,0), (1,0), (0,1).

    P(X + Y ≤ 1) = ∫ from x=0 to 1 ∫ from y=0 to 1−x of (1/4) dy dx
                 = ∫ from 0 to 1 of (1/4)(1 − x) dx = [−(1 − x)²/8] from 0 to 1 = 1/8.

Example 19(b)
The joint prob density of (X, Y) is

    f(x, y) = (1/8)(6 − x − y)   0 < x < 2, 2 < y < 4
            = 0                  elsewhere

Find P(X < 1, Y < 3).

Solution
    P(X < 1, Y < 3) = ∫ from x=0 to 1 ∫ from y=2 to 3 of (1/8)(6 − x − y) dy dx
                    = ∫ from x=0 to 1 of (1/8)[(6 − x)y − y²/2] evaluated from y=2 to 3 dx
                    = ∫ from x=0 to 1 of (1/8)[(6 − x) − 5/2] dx
                    = (1/8)[6x − x²/2 − 5x/2] from 0 to 1 = (1/8)(6 − 1/2 − 5/2) = 3/8

Marginal and Conditional Densities

If f(x, y) is the joint prob density of the 2-dimensional continuous rv (X, Y), we define
the marginal prob density of X as

    g(x) = ∫ from −∞ to ∞ of f(x, y) dy

That is, fix x and integrate f(x, y) w.r.t. y.

Similarly the marginal prob density of Y is

    h(y) = ∫ from −∞ to ∞ of f(x, y) dx

The conditional prob density of Y for a given x is

    h(y | x) = f(x, y)/g(x)   (defined only for those x for which g(x) ≠ 0)

The conditional prob density of X for a given y is

    g(x | y) = f(x, y)/h(y)   (defined only for those y for which h(y) ≠ 0)

Independence
We say X, Y are independent if and only if f(x, y) = g(x) h(y),
which is the same as saying g(x | y) = g(x) or h(y | x) = h(y).

Example 20
Consider the density of (X, Y) as given in Example 19(b).
The marginal density of X is

    g(x) = ∫ from y=2 to 4 of (1/8)(6 − x − y) dy = (1/8)[(6 − x)y − y²/2] from y=2 to 4
         = (1/8)[2(6 − x) − 6] = (1/8)(6 − 2x)   0 < x < 2
    and  = 0 elsewhere.

We verify this is a valid density. Firstly g(x) = (1/8)(6 − 2x) ≥ 0 for 0 < x < 2. Secondly

    ∫ from 0 to 2 of g(x) dx = (1/8)[6x − x²] from 0 to 2 = (1/8)(12 − 4) = 1.

The marginal density of Y is

    h(y) = ∫ from x=0 to 2 of (1/8)(6 − x − y) dx = (1/8)[(6 − y)x − x²/2] from x=0 to 2
         = (1/8)[2(6 − y) − 2] = (1/8)(10 − 2y)   2 < y < 4
    and  = 0 elsewhere.

Again h(y) ≥ 0 and

    ∫ from 2 to 4 of h(y) dy = (1/8)[10y − y²] from 2 to 4 = (1/8)(24 − 16) = 1.

The conditional density of Y for X = 1 is

    h(y | 1) = f(1, y)/g(1) = [(1/8)(6 − 1 − y)] / [(1/8)(6 − 2)] = (1/4)(5 − y),   2 < y < 4
    and 0 elsewhere.

Again this is a valid density as h(y | 1) ≥ 0 and

    ∫ from 2 to 4 of h(y | 1) dy = (1/4)[−(5 − y)²/2] from 2 to 4 = (1/4)(9/2 − 1/2) = 1.

Now

    P(X < 1 | Y < 3) = P(X < 1, Y < 3)/P(Y < 3).

Numerator = P(X < 1, Y < 3) = 3/8 (from Example 19(b)).

Denominator = P(Y < 3) = ∫ from 2 to 3 of h(y) dy = (1/8)[10y − y²] from 2 to 3
            = (1/8)(21 − 16) = 5/8.

Hence P(X < 1 | Y < 3) = (3/8)/(5/8) = 3/5.

The Cumulative Distribution Function

Let f(x, y) be the joint density of (X, Y). We define the cumulative distribution function as

    F(x, y) = P(X ≤ x, Y ≤ y) = ∫ from −∞ to x ∫ from −∞ to y of f(u, v) dv du.
93

Example 21 (See Exercise 5.77 on page 180)

The joint prob density of X and Y is given by

    f(x, y) = (6/5)(x + y²)   0 < x < 1, 0 < y < 1
            = 0               elsewhere

Find the cumulative distribution function F(x, y).

Solution
Case (i)   x < 0. F(x, y) = ∫∫ f(u, v) dv du = 0 as f(u, v) = 0 for u < 0.
Case (ii)  y < 0. Again F(x, y) = 0 whatever be x.
Case (iii) 0 < x < 1, 0 < y < 1.

    F(x, y) = ∫ from u=0 to x ∫ from v=0 to y of (6/5)(u + v²) dv du
            = (6/5) ∫ from u=0 to x of [uv + v³/3] from v=0 to y du
            = (6/5) ∫ from u=0 to x of (uy + y³/3) du = (6/5)(x²y/2 + xy³/3).

Case (iv)  0 < x < 1, y ≥ 1.

    F(x, y) = ∫ from u=0 to x ∫ from v=0 to 1 of (6/5)(u + v²) dv du
            = (6/5) ∫ from u=0 to x of (u + 1/3) du = (6/5)(x²/2 + x/3).

Case (v)   x ≥ 1, 0 < y < 1. As in case (iii) we can show

    F(x, y) = (6/5)(y/2 + y³/3).

Case (vi)  x ≥ 1, y ≥ 1.

    F(x, y) = ∫ from u=0 to 1 ∫ from v=0 to 1 of (6/5)(u + v²) dv du = (6/5)(1/2 + 1/3) = 1.

(Did you anticipate this?)

Hence

    P(0.2 < X < 0.5, 0.4 < Y < 0.6)
    = F(0.5, 0.6) − F(0.2, 0.6) − F(0.5, 0.4) + F(0.2, 0.4)   (why?)
    = (6/5)[ ((0.5)² − (0.2)²)(0.6 − 0.4)/2 + (0.5 − 0.2)((0.6)³ − (0.4)³)/3 ]
    = (6/5)[ (0.21)(0.2)/2 + (0.3)(0.152)/3 ]
    = (6/5)(0.021 + 0.0152) = (6/5)(0.0362) = 0.04344

Example 22
The joint density of X and Y is

    f(x, y) = (6/5)(x + y²)   0 < x < 1, 0 < y < 1
            = 0               elsewhere

(a) Find the conditional prob density g(x | y).
(b) Find g(x | 1/2).
(c) Find the mean of the conditional density of X given that Y = 1/2.

Solution
g(x | y) = f(x, y)/h(y), where h(y) is the marginal density of Y. Thus

    h(y) = ∫ from x=0 to 1 of (6/5)(x + y²) dx = (6/5)(1/2 + y²),   0 < y < 1.

Hence

    g(x | y) = [(6/5)(x + y²)] / [(6/5)(1/2 + y²)] = (x + y²)/(1/2 + y²),   0 < x < 1
    (and 0 elsewhere).

    g(x | 1/2) = (x + 1/4)/(1/2 + 1/4) = (4/3)(x + 1/4),   0 < x < 1.

Hence

    E(X | Y = 1/2) = ∫ from 0 to 1 of x g(x | 1/2) dx = ∫ from 0 to 1 of (4/3)(x² + x/4) dx
                   = (4/3)[x³/3 + x²/8] from 0 to 1 = (4/3)(1/3 + 1/8) = 11/18.

Example 23
(X, Y) has a joint density which is uniform on the rhombus with vertices (1, 0), (0, 1),
(−1, 0), (0, −1). Find
(a) the marginal density of X;
(b) the marginal density of Y;
(c) the conditional density of Y given X = 1/2.

Solution
(X, Y) has uniform density on the rhombus means

    f(x, y) = 1/(area of the rhombus) = 1/2 over the rhombus, and 0 elsewhere.

(a) Marginal density of X.
Case (i)  0 < x < 1:  g(x) = ∫ from y = x − 1 to 1 − x of (1/2) dy = 1 − x.
Case (ii) −1 < x < 0: g(x) = ∫ from y = −1 − x to 1 + x of (1/2) dy = 1 + x.
Thus
    g(x) = 1 + x   −1 < x < 0
         = 1 − x    0 < x < 1
         = 0        elsewhere.

(b) By symmetry the marginal density of Y is
    h(y) = 1 + y   −1 < y < 0
         = 1 − y    0 < y < 1
         = 0        elsewhere.

(c) For x = 1/2, y ranges from −1/2 to 1/2.
Thus the conditional density of Y for X = 1/2 is

    h(y | 1/2) = f(1/2, y)/g(1/2) = (1/2)/(1/2) = 1,   −1/2 < y < 1/2
               = 0 elsewhere.

(Similarly, for x = 1/3, y ranges from −2/3 to 2/3 and

    h(y | 1/3) = (1/2)/(2/3) = 3/4,   −2/3 < y < 2/3, and 0 elsewhere.)

PROPERTIES OF EXPECTATIONS
Let X be a r.v. and a, b be constants. Then
(a) E(aX + b) = a E(X) + b
(b) Var(aX + b) = a² Var(X)

If X1, X2, ..., Xn are any n rvs,

    E(X1 + X2 + ... + Xn) = E(X1) + E(X2) + ... + E(Xn).

But if X1, ..., Xn are n independent rvs then

    Var(X1 + X2 + ... + Xn) = Var(X1) + Var(X2) + ... + Var(Xn).

In particular, if X, Y are independent,

    Var(X + Y) = Var(X − Y) = Var(X) + Var(Y).

Please note: whether we add X and Y or subtract Y from X, we always must add their
variances.
If X, Y are two rvs, we define their covariance as

    Cov(X, Y) = E[(X − μ1)(Y − μ2)],   where μ1 = E(X), μ2 = E(Y).

Theorem. If X, Y are independent, E(XY) = E(X) E(Y) and Cov(X, Y) = 0.

100

Sample Mean
Let X1, X2, ..., Xn be n independent rvs each having the same mean μ and the same variance σ².
We define

    X̄ = (X1 + X2 + ... + Xn)/n

X̄ is called the mean of the rvs X1, ..., Xn. Please note that X̄ is also a rv.

Theorem
1. E(X̄) = μ
2. Var(X̄) = σ²/n.

Proof
(1) E(X̄) = (1/n)[E(X1) + E(X2) + ... + E(Xn)] = (1/n)[μ + μ + ... + μ (n times)] = μ.

(2) Var(X̄) = (1/n²)[Var(X1) + Var(X2) + ... + Var(Xn)]   (as the variables are independent)
           = (1/n²)[σ² + σ² + ... + σ² (n times)] = nσ²/n² = σ²/n.
101

Sample Variance
Let X1, ..., Xn be n independent rvs each having the same mean μ and the same variance σ². Let
X̄ = (X1 + X2 + ... + Xn)/n be their sample mean. We define the sample variance as

    S² = [1/(n − 1)] Σ from i=1 to n of (Xi − X̄)².

Note S² is also a r.v. It can be shown that

    E(S²) = σ².

Proof. Read it on page 179.

Simulation
To simulate the values taken by a continuous r.v. X, we use the following theorem.

Theorem
Let X be a continuous r.v. with density f(x) and cumulative distribution function F(x). Let
U = F(X). Then U is a r.v. having the uniform distribution on (0, 1).
In other words, U is a random number. Thus to simulate a value taken by X, we take a
random number U from Table 7 (now you must put a decimal point before the number) and
solve for X the equation

    F(X) = U.
102

Example 24
Let X have the uniform density on (α, β). Simulate the values of X using the 3-digit random
numbers 937, 133, 753, 503, ...

Solution
Since X has the uniform density on (α, β), its density is

    f(x) = 1/(β − α)   α < x < β
         = 0           elsewhere

Thus the cumulative distribution function is

    F(x) = 0                  x ≤ α
         = (x − α)/(β − α)    α < x ≤ β
         = 1                  x > β

F(X) = U means (X − α)/(β − α) = U, i.e.

    X = α + (β − α) U.

Hence if
    U = 0.937, X = α + (β − α)(0.937);
    U = 0.133, X = α + (β − α)(0.133);
etc.

Let X have the exponential density (with parameter β)

    f(x) = (1/β) e^(−x/β)   x > 0
         = 0                elsewhere

Hence the cumulative distribution function is

    F(x) = 0               x ≤ 0
         = 1 − e^(−x/β)    x > 0

Thus solving F(X) = U, i.e. 1 − e^(−X/β) = U for X, we get

    X = β ln[1/(1 − U)].

Since U is a random number implies 1 − U is also a random number, we can as well use the
formula

    X = β ln(1/U) = −β ln U.

Example 25
X has exponential density with parameter β = 2. Simulate a few values of X.

Solution
The defining equation for X is X = −2 ln U.
Taking 3-digit random numbers from Table 7, page 595, Row 21, Col. 3, we get the random
numbers: 913, 516, 692, 007, etc.
The corresponding X values are:

    −2 ln(0.913), −2 ln(0.516), −2 ln(0.692), ...
104

Example 26
The density of a rv X is given by

    f(x) = |x|   −1 < x < 1
         = 0     elsewhere

Simulate a few values of X.

Solution
First let us find the cumulative distribution function F(x).

Case (i)   x ≤ −1. In this case F(x) = 0.
Case (ii)  −1 < x ≤ 0.
    F(x) = ∫ from −1 to x of (−t) dt = (1 − x²)/2.
Case (iii) 0 < x ≤ 1.
    F(x) = ∫ from −1 to 0 of (−t) dt + ∫ from 0 to x of t dt = 1/2 + x²/2 = (1 + x²)/2.
Case (iv)  x > 1. In this case F(x) = 1.

Thus
    F(x) = 0              x ≤ −1
         = (1 − x²)/2     −1 < x ≤ 0
         = (1 + x²)/2     0 < x ≤ 1
         = 1              x > 1

To simulate a value for X, we have to solve the equation F(X) = U for X.

Case (i)  0 ≤ U < 1/2.
In this case we use the equation F(X) = (1 − X²)/2 = U (why?), giving

    X = −√(1 − 2U)   (why the minus sign?)

Case (ii) 1/2 ≤ U < 1.
In this case we solve for X the equation F(X) = (1 + X²)/2 = U, giving

    X = +√(2U − 1).

Thus the defining conditions are:

    If 0 ≤ U < 1/2,  X = −√(1 − 2U)
    If 1/2 ≤ U < 1,  X = +√(2U − 1).

Let us consider the 3-digit random numbers on page 594, Row 17, Col. 5:
726, 282, 272, 022, ...

    U = 0.726 ≥ 1/2. Thus X = +√(2 × 0.726 − 1) = 0.672.
    U = 0.282 < 1/2. Thus X = −√(1 − 2 × 0.282) = −0.660.
Note : Most of the computers have built in programs which generate random deviates
from important distributions. Especially, we can invoke the random deviates from a
standard normal distribution. You may also want to study how to simulate values from a
standard normal distribution by Box-Muller-Marsaglia method given on page 190 of the
text book.

Example 27
Suppose the number of hours it takes a person to learn how to operate a certain machine is a
random variable having normal distribution with μ = 5.8 and σ = 1.2. Suppose it takes
two persons to operate the machine. Simulate the time it takes four pairs of persons to
learn how to operate the machine. That is, for each pair, calculate the maximum of the
two learning times.

Solution
We use the Box-Muller-Marsaglia method to generate pairs of values z1, z2 taken by a
standard normal distribution. Then we use the formulas

    x1 = μ + σ z1,   x2 = μ + σ z2   (where μ = 5.8, σ = 1.2)

to simulate the times taken by a pair of persons.
We start with the random numbers from Table 7, page 593, Row 19, Column 4:

    729, 016, 672, 823, 375, 556, 424, 854

Note

    z1 = √(−2 ln u2) cos(2π u1)
    z2 = √(−2 ln u2) sin(2π u1)

(The angles are expressed in radians.)

    u1      u2       z1        z2        x1       x2
    .729    .016    −0.378    −2.851     5.346    2.379
    etc.
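A direct implementation of the Box-Muller step used above is shown below (an illustrative
sketch; in practice numpy's own normal generator would normally be used instead):

    # Sketch: Box-Muller transform turning two uniforms into two standard normal deviates.
    import math

    def box_muller(u1, u2):
        r = math.sqrt(-2.0 * math.log(u2))
        return r * math.cos(2 * math.pi * u1), r * math.sin(2 * math.pi * u1)

    mu, sigma = 5.8, 1.2
    z1, z2 = box_muller(0.729, 0.016)
    x1, x2 = mu + sigma * z1, mu + sigma * z2   # learning times for one pair of persons
    print(z1, z2, x1, x2)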

Review Exercises
5.108. If the probability density of a r.v. X is given by

    f(x) = k(1 − x²)   0 < x < 1
         = 0           elsewhere

find the value of k and the probabilities
(a) P(0.1 < X < 0.2)
(b) P(X > 0.5)

Solution
∫ f(x) dx = 1 gives ∫ from 0 to 1 of k(1 − x²) dx = 1, or k(1 − 1/3) = 1, so k = 3/2.

The cumulative distribution function F(x) of X is:

Case (i)   x ≤ 0: F(x) = 0.
Case (ii)  0 < x ≤ 1: F(x) = ∫ from 0 to x of (3/2)(1 − t²) dt = (3/2)(x − x³/3).
Case (iii) x > 1: F(x) = 1.

(a) P(0.1 < X < 0.2) = F(0.2) − F(0.1)
    = (3/2)[0.2 − (0.2)³/3] − (3/2)[0.1 − (0.1)³/3] = 0.1465

(b) P(X > 0.5) = 1 − P(X ≤ 0.5) = 1 − F(0.5)
    = 1 − (3/2)[0.5 − (0.5)³/3] = 0.3125

5.113: The burning time X of an experimental rocket is a r.v. having the normal
distribution with = 4.76 sec and = 0.04 sec . What is the prob that this kind of rocket
will burn
(a)
(b)
(c)

<4.66 Sec
> 4.80 se
anywhere from 4.70 to 4.82 sec?

Solution
(a)

P(X < 4.66 ) = P

X 4.66 4.76
<

0.04

= P(Z < 0.25) = 1 P(Z < 0.25)

= 1 F (0.25) = 1 0.5987 = 04013


109

P(X > 4.80 ) = P

(b)

X 4.80 4.76
>

0.04

= P(Z > 1) = 1 F (1) = 1 0.8413 = 0.1587


P(4.70 < X < 4.82)

(c)

=P

4.70 4.76 X 4.82 4.76


<
<
0.04

0.04

= P( 1.5 < Z < 1.5)

= 2F(1.5) 1 = 2 0.9332 1 = 0.8664

5.11 The prob density of the time (in milliseconds) between the emission of beta particles
is a r.v. X having the exponential density

0.25e 0.25
f (x ) =
0

x>0
elsewhere

Find the probability that


The time to observe a particle is more than 200 microseconds (=200x 10-3
milliseconds)
The time to observe a particle is < 10 microseconds

(a)
(b)

Solution
(a)

P(> 200 micro sec ) = P X > 200 10 3 milli sec


=

20010

0.25e 0.25 x dx = e 0.25 x


3

= e 5010

20010 3

110

P(X < 10 micro sec onds ) = P X < 10 10 3

(b)

1010 3

0.25 e 0.25 x dx = e 0.25b

1010 3
0

= 1 e 2.510

5.120: If n sales people are employed in a door-to-door selling campaign, the gross sales
volume in thousands of dollars may be regarded as a r.v. having the Gamma distribution
1
with = 100 n and = . If the sales costs are $5,000 per salesperson, how many
2
sales persons should be employed to maximize the profit.
Solution
For a Gamma distribution = = 50 n . Thus (in thousands of dollars) the average
profit when n persons are employed.
= T = 50 n 5n (5 x 1000 per person is the cost per person)
This is a maximum (using calculus) when n = 25.

5.122: Let the times to breakdown for the processors of a parallel processing machine
have joint density

f ( x, y ) =

0.04e 0.2 x 0.2 y


0

x > 0, y > 0
elsewhere

where X is the time for the first processor and Y is the time for the 2nd processor. Find
(a)
(b)
(c)

The marginal distributions and their means


The expected value of the sum of the X and Y.
Verify that the mean of a sum is the sum of the means.

111

Solution
(a)

Marginal density of X

= g (x ) =

f (x , y )dy =

y =

= 0 .2 e

0.2 x

0.04e 0.2 x 0.2 y dy

y =0

0.2e 0.2 y dy = 0.2e 0.2 x , x > 0

y=0

(and

= 0 if x 0 )

By symmetry, the marginal distribution of Y is

0.2e 0.2 y
h( y ) =
0

y>0
elsewhere

Since X (& Y) have exponential distributions (with parameters


= E(Y) = 5.
E since f(x,y) = g (x) h (y), X,Y are independent.

E(X + Y ) =

(x + y ) f (x, y )dydx

(x + y )(0.04)e 0.2 x 0.2 y dydx

x =0 y = 0

x.0.04e 0.2 x 0.02 y dydx

x =0 y = 0

112

1
= 5 ) E(X)
0 .2

y 0.04e 0.2 x 0.2 y dydx

x =0 y =0

= 5 + 5 = 10 (verify!)
= E(X ) + E(Y )
5.123: Two random variable are independent and each has binomial distribution with
success prob 0.7 and 2 trials.
(a)
(b)

Find the joint prob distribution.


Find the prob that the 2nd variable is greater than the first.

Solution
Let X,Y be independent and have Binomial distribution with parameters n = 2, and
p = 0.7 Thus

P(X = k ) =

2
(0.7 )k (0.3)2 k k = 0,1,2
k

P(Y = r ) =

2
(0.7 )r (0.3)2 r r = 0,1,2
r

P(X = k , Y = r ) = P(X = k )P(Y = r ) as X, Y are independent.


=

2
k

2
r

(0.7 )k + r (.3)4(k + r )
0 k, r 2

113

(b)

P(Y > X )
= P(Y = 2, X = 0 or1) + P(Y = 1, X = 0 )

2
2
2
1

(0.7 )2 (.3)0
(0.7 )1 (0.3)1

(0.7 )0 (0.3)2 +

0
2
0

2
1

(0.7 )1 (0.3)1

(0.7 )0 (0.3)2

5.124 If X1 has mean 5, variance 3 while X2 has mean 1 and variance 4, and the two are
independent, find
(a)

E(3X 1 + 5X 2 + 2)

(b)

Var (3X 1 + 5X 2 + 2)

Ans:
(a)

3 ( 5) + 5(1) + 2 = 8

(b)

9 3 + 25 4 = 127

114

Sampling Distribution
Statistical Inference
Suppose we want to know the average height of an Indian or the average life length of a
bulb manufactured by a company, etc. Obviously we cannot burn out every bulb and find
the mean life length. One chooses at random, say, n bulbs, finds their life lengths
X1, X2, ..., Xn and takes the mean life length X̄ = (X1 + X2 + ... + Xn)/n as an approximation
to the actual (unknown) mean life length. Thus we make a statement about the
population (of all life lengths) by looking at a sample of it. This is the basis behind
statistical inference. The whole theory of statistical inference tells us how close we are to
the true (unknown) characteristic of the population.

Random Sample of Size n
In the above example, let X be the life length of a bulb manufactured by the company.
Thus X is a rv which can assume values > 0. It will have a certain distribution and a
certain mean etc. When we make n independent observations, we get n values
x1, x2, ..., xn. Clearly if we again take n observations, we would get different values
y1, y2, ..., yn. Thus we may say:

Definition
Let X be a random variable. A random sample of size n from X is a finite ordered
sequence {X1, X2, ..., Xn} of n independent rvs such that each Xi has the same
distribution as that of X.

Sampling from a Finite Population

Suppose there is a universe having a finite number of elements only (like the number of
Indians, the number of females in the USA who are blondes, etc.). A random sample of size n
from the above is a subset of n elements such that each subset of n elements has the same prob
of being selected.
115

Statistics
Whenever we sample, we use a characteristic of the sample to make a statement about the
population. For example, suppose the true mean height of an Indian is μ (cms). To make a
statement about μ, we randomly select n Indians, find their heights {X1, X2, ..., Xn} and
then their mean, namely

    X̄ = (X1 + X2 + ... + Xn)/n.

We then use X̄ as an estimate of the unknown parameter μ. Remember μ is a
parameter, a constant that is unchanged. But the sample mean X̄ is a r.v. It may assume
different values depending on the sample of n Indians chosen.

Definition: Let X be a r.v. Let {X1, X2, ..., Xn} be a sample of size n from X. A statistic
is a function of the sample {X1, X2, ..., Xn}.

Some Important Statistics

1. The sample mean X̄ = (X1 + X2 + ... + Xn)/n
2. The sample variance S² = [1/(n − 1)] Σ from i=1 to n of (Xi − X̄)²
3. The minimum of the sample K = min{X1, X2, ..., Xn}
4. The maximum of the sample M = max{X1, X2, ..., Xn}
5. The range of the sample R = M − K

Definition
If X1, ..., Xn is a random sample of size n and if X̄ is a statistic, then (remember X̄ is
also a r.v.) its distribution is referred to as the sampling distribution of X̄.


116

The Sampling Distribution of the Sample Mean X̄

Suppose X is a r.v. with mean μ and variance σ². Let X1, X2, ..., Xn be a random sample
of size n from X. Let X̄ = (X1 + X2 + ... + Xn)/n be the sample mean. Then

(a) E(X̄) = μ.
(b) Var(X̄) = σ²/n.
(c) If X1, ..., Xn is a random sample from a finite population with N elements, then
    Var(X̄) = (σ²/n)(N − n)/(N − 1).
(d) If X is normal, X̄ is also normal.
(e) Whatever be the distribution of X, if n is large, (X̄ − μ)/(σ/√n) has approximately the
    standard normal distribution. (This result is known as the Central Limit Theorem.)

Explanation
(a) tells us that we can expect the sample mean X̄ to be an approximation to
    the population mean μ.
(b) tells us that the departure of X̄ from μ tends to be small when the sample size n is large.
(d) says that if X has a normal distribution, (X̄ − μ)/(σ/√n) has exactly the standard normal
    distribution.
(e) says that whatever be the distribution of X, discrete or continuous, (X̄ − μ)/(σ/√n)
    has approximately the standard normal distribution if n is large.
117

Example 1 (See Exercise 6.14, page 207)

The mean of a random sample of size n = 25 is used to estimate the mean of an infinite
population with standard deviation σ = 2.4. What can we assert about the prob that the
error will be less than 1.2 if we use
(a) Chebyshev's theorem
(b) the Central Limit Theorem?

Solution
(a) We know the sample mean X̄ is a rv with E(X̄) = μ and Var(X̄) = σ²/n = (2.4)²/25.
Chebyshev's theorem tells us that for any r.v. T,

    P(|T − E(T)| < k √Var(T)) ≥ 1 − 1/k².

Taking T = X̄, and noting E(T) = E(X̄) = μ and √Var(T) = σ/√n = 2.4/5, we find

    P(|X̄ − μ| < k(2.4/5)) ≥ 1 − 1/k².

Desired: P(|X̄ − μ| < 1.2). Setting k(2.4/5) = 1.2 gives k = 5/2.
Thus we can assert using Chebyshev's theorem that

    P(|X̄ − μ| < 1.2) ≥ 1 − 4/25 = 21/25 = 0.84.

(b) The Central Limit Theorem says (X̄ − μ)/(σ/√n) is approximately standard normal. Thus

    P(|X̄ − μ| < 1.2) ≈ P(|Z| < 1.2/(2.4/5)) = P(|Z| < 5/2)
                      = 2F(2.5) − 1 = 2 × 0.9938 − 1 = 0.9876.

Example 2 (See Exercise 6.15 on page 207)

A random sample of size 100 is taken from an infinite population having mean μ = 76
and variance σ² = 256. What is the prob that X̄ will be between 75 and 78?

Solution
We use the Central Limit Theorem, namely that (X̄ − μ)/(σ/√n) is approximately standard
normal. Here σ/√n = 16/10.

Required: P(75 < X̄ < 78) = P((75 − 76)/(16/10) < Z < (78 − 76)/(16/10))
                         = P(−5/8 < Z < 5/4)
                         = F(5/4) − F(−5/8) = F(5/4) + F(5/8) − 1
                         = 0.8944 + 0.7340 − 1 = 0.8284
119

Example 3 (See Exercise 6.17 on page 217)

If the distribution of weights of all men travelling by air between Dallas and El Paso has
a mean of 163 pounds and a s.d. of 18 pounds, what is the prob that the combined gross
weight of 36 men travelling on a plane between these two cities is more than 6000 pounds?

Solution
Let X be the weight of a man travelling by air between D and E. It is given that X is a rv
with mean E(X) = μ = 163 lbs and sd σ = 18 lbs.
Let X1, X2, ..., X36 be the weights of 36 men travelling on a plane between these two cities.
Thus we can regard {X1, X2, ..., X36} as a random sample of size 36 from X.

Required: P(X1 + X2 + ... + X36 > 6000)
        = P(X̄ > 6000/36) = P(X̄ > 1000/6)
        ≈ P(Z > (1000/6 − 163)/(18/6))   (by the Central Limit Theorem)
        = P(Z > 22/18) ≈ P(Z > 1.22)
        = 1 − F(1.22) = 1 − 0.8888 = 0.1112

The Sampling Distribution of the Sample Mean X̄ (when σ is unknown)

Theorem
Let X be a rv having normal distribution with mean E(X) = μ. Let X̄ be the sample
mean and S² the sample variance of a random sample of size n from X.
Then the rv

    t = (X̄ − μ)/(S/√n)

has (Student's) t-distribution with ν = n − 1 degrees of freedom.

Remark
(1) The shape of the density curve of the t-distribution (with parameter ν, Greek nu)
    is like that of the standard normal distribution and is symmetrical about the y-axis.
(2) t_(ν,α) is that unique number such that P(t > t_(ν,α)) = α (ν the parameter).
(3) By symmetry t_(ν,1−α) = −t_(ν,α).
(4) The values of t_(ν,α) for various ν and α are tabulated in Table 4.
(5) For ν large, t_(ν,α) ≈ z_α.

Example 4 (See Exercise 6.20 on page 213)

A random sample of size 25 from a normal population has the mean x̄ = 47.5 and the s.d.
s = 8.4. Does this information tend to support or refute the claim that the mean of the
population is μ = 42.1?

Solution:
t = (x̄ − μ)/(s/√n) has a t-distribution with parameter ν = n − 1.
Here μ = 42.1, s = 8.4, n = 25, and

    t_(n−1, 0.005) = t_(24, 0.005) = 2.797.

Thus P(t > 2.797) = 0.005, i.e. P((X̄ − μ)/(S/√n) > 2.797) = 0.005, or

    P(X̄ > 42.1 + 2.797 × 8.4/5) = 0.005,  i.e.  P(X̄ > 46.80) = 0.005.

This means that when μ = 42.1, only in about 0.5 percent of the cases would we get an
X̄ > 46.80. Since the observed x̄ = 47.5 > 46.80, we will have to refute the claim μ = 42.1
(in favour of μ > 42.1).
Example 5 (See exercise 6.21 on page 213)
The following are the times between six calls for an ambulance (in a certain city) and the patient's arrival at the hospital: 27, 15, 20, 32, 18 and 26 minutes. Use these figures to judge the reasonableness of the ambulance service's claim that it takes on the average 20 minutes between the call for an ambulance and the patient's arrival at the hospital.

Solution
Let X = time (in minutes) between the call for an ambulance and the patient's arrival at the hospital. We assume X has a normal distribution. (When nothing is given, we assume normality.) We want to judge the reasonableness of the claim that E(X) = μ = 20 minutes. For this we recorded the times for 6 calls. So we have a random sample of size 6 from X with
X1 = 27, X2 = 15, X3 = 20, X4 = 32, X5 = 18, X6 = 26. Thus X̄ = (27 + 15 + 20 + 32 + 18 + 26)/6 = 138/6 = 23.
S² = (1/(6−1)) [ (27−23)² + (15−23)² + (20−23)² + (32−23)² + (18−23)² + (26−23)² ]
 = (1/5) [16 + 64 + 9 + 81 + 25 + 9] = 204/5
Hence S = √(204/5).
We calculate
t = (x̄ − μ)/(s/√n) = (23 − 20)/( √(204/5) / √6 ) = 1.150
Now t_{n−1, α} = t_{5, α} = 2.015 for α = 0.05
 = 1.476 for α = 0.10
Since our observed t = 1.150 < t_{5, 0.10},
we can say that it is reasonable to assume that the average time is μ = 20 minutes.
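The statistic above can be computed directly; a minimal sketch (assuming Python with scipy; not part of the original text):

    import numpy as np
    from scipy.stats import t

    times = np.array([27, 15, 20, 32, 18, 26])
    n = len(times)
    t_stat = (times.mean() - 20) / (times.std(ddof=1) / np.sqrt(n))   # ddof=1 gives the sample sd S
    print(round(t_stat, 3))                      # ≈ 1.150
    print(round(t.ppf(1 - 0.10, df=n - 1), 3))   # t_{5, 0.10} ≈ 1.476, so the claim is not unreasonable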

Example 6
A process for making certain bearings is under control if the diameters of the bearings
have a mean of 0.5000 cm. What can we say about this process if a sample of 10 of these
bearings has a mean diameter of 0.5060 cm and sd 0.0040 cm?

Hint: P( −3.25 < (X̄ − 0.5)/(0.004/√10) < 3.25 ) = 0.99  (since t_{9, 0.005} = 3.25)
or P( 0.496 < X̄ < 0.504 ) = 0.99.
Since the observed X̄ = 0.506 > 0.504,
the process is not under control (at the 0.01 level of significance).

Sampling Distribution of S² (The sample variance)

Theorem
If S² is the sample variance of a random sample of size n taken from a normal population with (population) variance σ², then
χ² = (n − 1) S²/σ² = (1/σ²) Σ_{i=1}^{n} (Xi − X̄)²
is a random variable having chi-square distribution with parameter ν = n − 1.

Remark
Since S² > 0, the rv χ² has +ve density only to the right of the origin. χ²_{ν,α} is that unique number such that P(χ² > χ²_{ν,α}) = α and is tabulated for some ν's and α's in Table 5.

Example 7 (See exercise 6.24 on page 213)


A random sample of 10 observations is taken from a normal population having the variance σ² = 42.5. Find approximately the prob of obtaining a sample standard deviation S between 3.14 and 8.94.

Solution
Required P(3.14 < S < 8.94)
 = P( (3.14)² < S² < (8.94)² )
 = P( (9/42.5)(3.14)² < (n − 1)S²/σ² < (9/42.5)(8.94)² )
 = P( 2.088 < χ² < 16.925 )
(From Table 5, χ²_{9, 0.05} = 16.919, χ²_{9, 0.99} = 2.088)
 = P(χ² > 2.088) − P(χ² > 16.919) (approx)
 = 0.99 − 0.05 = 0.94 (approx)
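Example 7 can also be checked without the tables; a minimal sketch (assuming scipy; not part of the original text):

    from scipy.stats import chi2

    n, var = 10, 42.5
    lower = (n - 1) * 3.14**2 / var        # ≈ 2.088
    upper = (n - 1) * 8.94**2 / var        # ≈ 16.925
    print(round(chi2.cdf(upper, df=n - 1) - chi2.cdf(lower, df=n - 1), 4))   # ≈ 0.94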

Example 8 (See exercise 6.23 on page 213)


The claim that the variance of a normal population is σ² = 21.3 is rejected if the variance of a random sample of size 15 exceeds 39.74. What is the prob that the claim will be rejected even though σ² = 21.3?

Solution
The prob that the claim is rejected
 = P( S² > 39.74 )
 = P( (n − 1)S²/σ² > (14/21.3) × 39.74 ) = P( χ² > 26.12 )
 = 0.025  (as, from Table 5, χ²_{14, 0.025} = 26.12)
Theorem
If S1², S2² are the variances of two independent random samples of sizes n1, n2 respectively taken from two normal populations having the same variance, then
F = S1²/S2²
is a rv having the (Snedecor's) F distribution with parameters ν1 = n1 − 1 and ν2 = n2 − 1.

Remark
1. n1 − 1 is called the numerator degrees of freedom and n2 − 1 is called the denominator degrees of freedom.
2. If F is a rv having (ν1, ν2) degrees of freedom, then F_{ν1, ν2, α} is that unique number such that P(F > F_{ν1, ν2, α}) = α and is tabulated for α = 0.05 in Table 6(a) and for α = 0.01 in Table 6(b).
We also note the fact: F_{ν1, ν2, 1−α} = 1/F_{ν2, ν1, α}.
Thus F_{10, 20, 0.95} = 1/F_{20, 10, 0.05} = 1/2.77 = 0.36

Example 9
(a) F_{12, 15, 0.95} = 1/F_{15, 12, 0.05} = 1/2.62 = 0.38
(b) F_{6, 20, 0.99} = 1/F_{20, 6, 0.01} = 1/7.40 = 0.135

Example 10 (See Exercise on page 213)


If independent random samples of size n1 = n2 = 8 come from two normal populations
having the same variance, what is the prob that either sample variance will be at least
seven times as large as the other?

Solution
Let S1², S2² be the sample variances of the two samples.
Reqd P( S1² > 7 S2²  or  S2² > 7 S1² )
 = P( S1²/S2² > 7  or  S2²/S1² > 7 )
 = 2 P(F > 7)
where F is a rv having the F distribution with (7, 7) degrees of freedom
 = 2 × 0.01 = 0.02 (from Table 6(b)).
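These F probabilities can also be obtained without the tables; a minimal sketch (assuming scipy; not part of the original text):

    from scipy.stats import f

    print(round(1 / f.ppf(1 - 0.05, dfn=15, dfd=12), 2))   # F_{12,15,0.95} = 1/F_{15,12,0.05} ≈ 0.38
    print(round(2 * f.sf(7, dfn=7, dfd=7), 3))             # Example 10: 2 P(F > 7) ≈ 0.02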

Example 11 (see exercise 6.38 on page 215)


If two independent random samples of size n1 = 9 and n2 = 16 are taken from a normal
population, what is the prob that the variance of the first sample will be at least four times
as large as the variance of the second sample?

Hint: Reqd prob = P( S1² > 4 S2² )
 = P( S1²/S2² > 4 ) = P(F > 4)
 = 0.01  (as F_{8, 15, 0.01} = 4)

Example 12 (See Exercise 6.29 on page 214)


The F distribution with (4, 4) degrees of freedom is given by
f(F) = 6F/(1 + F)^4 for F > 0, and f(F) = 0 for F ≤ 0.
If random samples of size 5 are taken from two normal populations having the same variance, find the prob that the ratio of the larger to the smaller sample variance will exceed 3.

Solution
Let S1², S2² be the sample variances of the two random samples.
Reqd P( S1² > 3 S2²  or  S2² > 3 S1² )
 = 2 P( S1²/S2² > 3 ) = 2 P(F > 3)
where F is a rv having the F distribution with (4, 4) degrees of freedom
 = 2 ∫ from 3 to ∞ of 6F/(1 + F)^4 dF = 12 ∫ from 3 to ∞ of F/(1 + F)^4 dF
(substituting u = 1 + F, this is 12 ∫ from 4 to ∞ of (u − 1)/u^4 du = 12 (1/32 − 1/192) = 12 × 5/192)
 = 5/16
Inferences Concerning Means


We shall discuss how we can make statements about the mean of a population from the knowledge of the mean of a random sample. That is, we estimate the mean of a population based on a random sample.

Point Estimation
Here we use a statistic to estimate the parameter of a distribution representing a population. For example, if we can assume that the life length of a transistor is a r.v. having exponential distribution with (unknown) parameter θ, then θ can be estimated by some statistic, say X̄, the mean of a random sample. Or we may say the sample mean is an estimate of the parameter θ.

Definition
Let θ be a parameter associated with the distribution of a r.v. A statistic θ̂ (based on a random sample of size n) is said to be an unbiased estimate (estimator) of θ if E(θ̂) = θ. That is, θ̂ will be on the average close to θ.

Example

Let X be a rv; μ the mean of X. If X̄ is the sample mean then we know E(X̄) = μ. Thus we may say the sample mean X̄ is an unbiased estimate of μ. (Note X̄ is a rv, a statistic, a function of the random sample (X1, X2, ..., Xn):
X̄ = (X1 + X2 + ... + Xn)/n.)
If α1, α2, ..., αn are any non-negative numbers such that α1 + α2 + ... + αn = 1, then we can easily see that α1 X1 + α2 X2 + ... + αn Xn is also an unbiased estimate of μ. (Prove this.) X̄ is got as a special case by taking α1 = α2 = ... = αn = 1/n. Thus we have a large number of unbiased estimates for μ.
Hence the question arises: if θ̂1, θ̂2 are both unbiased estimates of θ, which one do we prefer? The answer is given by the following definition.

Definition
Let θ̂1, θ̂2 be both unbiased estimates of the parameter θ. We say θ̂1 is more efficient than θ̂2 if Var(θ̂1) ≤ Var(θ̂2).

Remark
That is, the above definition says: prefer that unbiased estimate which is closer to θ. Remember the variance is a measure of the closeness of the estimate to θ.


Maximum Error in estimating μ by X̄
Let X̄ be the sample mean of a random sample of size n from a population with (unknown) mean μ. Suppose we use X̄ to estimate μ. |X̄ − μ| is called the error in estimating μ by X̄. Can we find an upper bound on this error? We know that if X is normal (or if n is large) then, by the Central Limit Theorem,
Z = (X̄ − μ)/(σ/√n)
is a r.v. having (approximately) the standard normal distribution. And we can say
P( −Z_{α/2} < (X̄ − μ)/(σ/√n) < Z_{α/2} ) = 1 − α.
Thus we can say with prob (1 − α) that the max absolute error |X̄ − μ| in estimating μ by X̄ is at most Z_{α/2} σ/√n. (Here obviously we assume the population s.d. σ is known. And Z_{α/2} is that unique no. such that P(Z > Z_{α/2}) = α/2.)
We also say that we can assert with 100(1 − α) percent confidence that the max. abs. error is at most Z_{α/2} σ/√n. The book denotes this by E.

Estimation of n
Thus to find the size n of the sample so that we may say with 100(1 − α) percent confidence that the max. abs. error is a given quantity E, we solve for n the equation
Z_{α/2} σ/√n = E,  or  n = (Z_{α/2} σ / E)².

Example 1
What is the maximum error one can expect to make with prob 0.90 when using the mean of a random sample of size n = 64 to estimate the mean of a population with σ² = 2.56?

Solution
Substituting n = 64, σ = 1.6 and Z_{α/2} = Z_{0.05} = 1.645 (note 1 − α = 0.90 implies α/2 = 0.05) in the formula for the maximum error E = Z_{α/2} σ/√n, we get
E = 1.645 × 1.6/√64 = 1.645 × 0.2 = 0.3290
Thus the maximum error one can expect to make with prob 0.90 is 0.3290.

Example 2
If we want to determine the average mechanical aptitude of a large group of workers, how large a random sample will we need to be able to assert with prob 0.95 that the sample mean will not differ from the population mean by more than 3.0 points? Assume that it is known from past experience that σ = 20.0.

Solution
Here 1 − α = 0.95 so that α/2 = 0.025, hence Z_{α/2} = Z_{0.025} = 1.96.
Thus we want n so that we can assert with prob 0.95 that the max error is E = 3.0:
n = (Z_{α/2} σ / E)² = (1.96 × 20 / 3)² = 170.74
Since n must be an integer, we take it as 171.
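Both calculations can be reproduced in a couple of lines; a minimal sketch (assuming scipy; not part of the original text):

    import math
    from scipy.stats import norm

    # Example 1: maximum error with prob 0.90, n = 64, sigma = 1.6
    print(round(norm.ppf(1 - 0.05) * 1.6 / math.sqrt(64), 4))     # E ≈ 0.329

    # Example 2: sample size for E = 3.0 with prob 0.95, sigma = 20
    print(math.ceil((norm.ppf(1 - 0.025) * 20 / 3.0) ** 2))       # n = 171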

Small Samples
If the population is normal and we take a random sample of size n (n small) from it, we note that
t = (X̄ − μ)/(S/√n)   (X̄ = sample mean, S = sample s.d.)
is a rv having t-distribution with (n − 1) degrees of freedom.
Thus we can assert with prob 1 − α that |t| ≤ t_{n−1, α/2}, where t_{n−1, α/2} is that unique no. such that P(t > t_{n−1, α/2}) = α/2. Thus if we use X̄ to estimate μ, we can assert with prob (1 − α) that the max error will be
E = t_{n−1, α/2} S/√n
(Note: If n is large, then t is approx standard normal. Thus for n large, the above formula becomes E = Z_{α/2} S/√n.)

Example 3
20 fuses were subjected to a 20% overload, and the times it took them to blow had a mean x̄ = 10.63 minutes and a s.d. S = 2.48 minutes. If we use x̄ = 10.63 minutes as a point estimate of the true average time it takes for such fuses to blow with a 20% overload, what can we assert with 95% confidence about the maximum error?

Solution
Here n = 20 (fuses), x̄ = 10.63, S = 2.48
1 − α = 95/100 = 0.95 so that α/2 = 0.025
Hence t_{n−1, α/2} = t_{19, 0.025} = 2.093
Hence we can assert with 95% confidence (i.e. with prob 0.95) that the max error will be
E = t_{n−1, α/2} S/√n = 2.093 × 2.48/√20 = 1.16
Interval Estimation
If X̄ is the mean of a random sample of size n from a population with known sd σ, then we know, by the central limit theorem, that
Z = (X̄ − μ)/(σ/√n)
is (approximately) standard normal. So we can say with prob (1 − α) that
−Z_{α/2} < (X̄ − μ)/(σ/√n) < Z_{α/2},
which can be rewritten as
X̄ − Z_{α/2} σ/√n < μ < X̄ + Z_{α/2} σ/√n.
Thus we can assert with prob (1 − α) (i.e. with (1 − α)100% confidence) that μ lies in the interval
( X̄ − Z_{α/2} σ/√n,  X̄ + Z_{α/2} σ/√n ).
We refer to the above interval as a (1 − α)100% confidence interval for μ. The end points X̄ ∓ Z_{α/2} σ/√n are known as (1 − α)100% confidence limits for μ.

Example 4
Suppose the mean of a random sample of size 25 from a normal population (with σ = 2) is x̄ = 78.3. Obtain a 99% confidence interval for μ, the population mean.

Solution
Here n = 25, σ = 2, (1 − α) = 99/100 = 0.99
α/2 = 0.005, Z_{α/2} = Z_{0.005} = 2.575, x̄ = 78.3
Hence a 99% confidence interval for μ is
( x̄ − Z_{α/2} σ/√n,  x̄ + Z_{α/2} σ/√n )
 = ( 78.3 − 2.575 × 2/√25, 78.3 + 2.575 × 2/√25 )
 = ( 78.3 − 1.0300, 78.3 + 1.0300 )
 = ( 77.27, 79.33 )
σ unknown
Suppose X̄ is the sample mean and S is the sample sd of a random sample of size n taken from a normal population with (unknown) mean μ. Then we know the r.v.
t = (X̄ − μ)/(S/√n)
has a t-distribution with (n − 1) degrees of freedom. Thus we can say with prob 1 − α that
−t_{n−1, α/2} < t < t_{n−1, α/2}
or −t_{n−1, α/2} < (X̄ − μ)/(S/√n) < t_{n−1, α/2}
or X̄ − t_{n−1, α/2} S/√n < μ < X̄ + t_{n−1, α/2} S/√n.
Thus a (1 − α)100% confidence interval for μ is
( X̄ − t_{n−1, α/2} S/√n,  X̄ + t_{n−1, α/2} S/√n )

Note:
(1) If n is large, t has approx the standard normal distribution, in which case the (1 − α)100% confidence interval for μ will be
( x̄ − Z_{α/2} S/√n,  x̄ + Z_{α/2} S/√n )
(2) If nothing is mentioned, we assume that the sample is taken from a normal population so that the above is valid.

Example 5
Material manufactured continuously before being cut and wound into large rolls must be
monitored for thickness (caliper). A sample of ten measurements on paper, in mm,
yielded
32.2, 32.0, 30.4, 31.0, 31.2, 31.2, 30.3, 29.6, 30.5, 30.7
Obtain a 95% confidence interval for the mean thickness.

Solution
Here n = 10, x̄ = 30.91, S = 0.788
1 − α = 0.95 or α/2 = 0.025
t_{n−1, α/2} = t_{9, 0.025} = 2.262
Hence a 95% confidence interval for μ is
( 30.91 − 2.262 × 0.788/√10, 30.91 + 2.262 × 0.788/√10 )
 = ( 30.35, 31.47 )
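The interval in Example 5 can be reproduced as follows; a minimal sketch (assuming Python with scipy; not part of the original text):

    import numpy as np
    from scipy.stats import t

    thickness = np.array([32.2, 32.0, 30.4, 31.0, 31.2, 31.2, 30.3, 29.6, 30.5, 30.7])
    n, xbar, s = len(thickness), thickness.mean(), thickness.std(ddof=1)
    half = t.ppf(1 - 0.025, df=n - 1) * s / np.sqrt(n)
    print(round(xbar - half, 2), round(xbar + half, 2))   # ≈ (30.35, 31.47)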

Example 6:
Ten bearings made by a certain process have a mean diameter of 0.5060 cm with a sd of
0.0040 cm. Assuming that the data may be looked upon as a random sample from a
normal population, construct a 99% confidence interval for the actual average diameter of
bearings made by this process.

Solution
Here n = 10, x̄ = 0.5060, S = 0.0040
(1 − α) = 99/100 = 0.99. Hence α/2 = 0.005
t_{n−1, α/2} = t_{9, 0.005} = 3.250
Thus a 99% confidence interval for the mean μ is
( x̄ − t_{n−1, α/2} S/√n,  x̄ + t_{n−1, α/2} S/√n )
 = ( 0.5060 − 3.250 × 0.0040/√10, 0.5060 + 3.250 × 0.0040/√10 )
 = ( 0.5019, 0.5101 )
Example 7
In a random sample of 100 batteries the lifetimes have a mean of 148.2 hours with a s.d.
of 24.9 hours. Construct a 76.60% confidence interval for the mean life of the batteries.

Solution
Here n = 100, x̄ = 148.2, S = 24.9
1 − α = 76.60/100 = 0.7660 so that α/2 = 0.1170
Thus t_{n−1, α/2} = t_{99, 0.1170} ≈ Z_{0.1170} = 1.19
Hence a 76.60% confidence interval is
( 148.2 − 1.19 × 24.9/√100, 148.2 + 1.19 × 24.9/√100 )
 = ( 145.2, 151.2 ).

Example 8
A random sample of 100 teachers in a large metropolitan area revealed a mean weekly
salary of $487 with a sd of $48. With what degree of confidence can we assert that the
average weekly salary of all teachers in the metropolitan area is between $472 and $502?

Solution
Suppose the degree of confidence is (1 − α)100%.
Thus x̄ + t_{n−1, α/2} S/√n = $502.
Here x̄ = 487, S = 48, n = 100. Since n − 1 = 99 is large, t_{99, α/2} ≈ Z_{α/2}.
Thus we get 487 + Z_{α/2} × 48/10 = 502
Or Z_{α/2} = 15/4.8 = 3.125
α/2 = 0.0009 or 1 − α = 0.9982
We can assert with 99.82% confidence that the true mean salary of all teachers in the area is between $472 and $502.

Maximum Likelihood Estimates (See exercise 7.23, 7.24)


Definition
Let X be a rv. Let f(x, θ) = P(X = x) be the point prob function if X is discrete, and let f(x, θ) be the pdf of X if X is continuous (here θ is a parameter). Let X1, X2, ..., Xn be a random sample of size n from X. Then the likelihood function based on the random sample is defined as
L(θ) = L(x1, x2, ..., xn; θ) = f(x1, θ) f(x2, θ) ... f(xn, θ).
Thus the likelihood function L(θ) = P(X1 = x1) P(X2 = x2) ... P(Xn = xn) if X is discrete, and is the joint pdf of X1, ..., Xn when X is continuous. The maximum likelihood estimate (MLE) of θ is that θ which maximizes L(θ).

Example 8
Let X be a rv having Poisson distribution with parameter λ.
Thus f(x, λ) = P(X = x) = e^{−λ} λ^x / x!;  x = 0, 1, 2, ...
Hence the likelihood function is
L(λ) = (e^{−λ} λ^{x1}/x1!)(e^{−λ} λ^{x2}/x2!) ... (e^{−λ} λ^{xn}/xn!)
 = e^{−nλ} λ^{x1 + x2 + ... + xn} / (x1! x2! ... xn!);  xi = 0, 1, 2, ...
To find the value of λ which maximizes L(λ), we use calculus.
First we take ln (log to base e, the natural logarithm):
ln L(λ) = −nλ + (x1 + ... + xn) ln λ − ln(x1! ... xn!)
Differentiating w.r.t. λ (noting x1, ..., xn are not to be varied), we get
(1/L) ∂L/∂λ = −n + (x1 + ... + xn)/λ
Setting this = 0 gives λ = (x1 + ... + xn)/n, and we can easily verify ∂²L/∂λ² < 0 for this λ.
Hence the MLE of λ is λ̂ = (x1 + ... + xn)/n = x̄ (the sample mean).
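The calculus result can be confirmed by maximizing the log-likelihood numerically; a minimal sketch (assuming numpy/scipy, with made-up sample data purely for illustration):

    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([2, 0, 3, 1, 1, 4, 2, 2])                  # hypothetical Poisson observations
    neg_log_lik = lambda lam: lam * len(x) - x.sum() * np.log(lam)   # -ln L(lambda), dropping the constant term
    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 20), method='bounded')
    print(round(res.x, 3), x.mean())                        # the maximizer agrees with the sample mean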

Example 9 MLE of Proportion


Suppose p is the proportion of defective bolts produced by a factory. To estimate p, we proceed as follows. We take n bolts at random and calculate
fD = sample proportion of defectives = (no. of defectives found among the n chosen ones)/n.
We show fD is the MLE of p.
We define a rv X as follows:
X = 0 if the bolt chosen is not defective, X = 1 if the bolt chosen is defective.
Thus X has the prob distribution

x        0        1
Prob     1 − p    p

It is clear that the point prob function f(x; p) of X is given by
f(x; p) = p^x (1 − p)^{1−x};  x = 0, 1
(Note f(0; p) = P(X = 0) = 1 − p and f(1; p) = P(X = 1) = p.)
Choosing n bolts at random amounts to choosing a random sample {X1, X2, ..., Xn} from X, where Xi = 0 if the ith bolt chosen is not defective and = 1 if it is defective (i = 1, 2, ..., n).
Hence X1 + X2 + ... + Xn = no. of defective bolts among the n chosen (can you guess why?).
The likelihood function of the sample is
L(p) = f(x1; p) f(x2; p) ... f(xn; p)
 = p^{x1 + ... + xn} (1 − p)^{n − (x1 + x2 + ... + xn)}
 = p^s (1 − p)^{n−s};  xi = 0 or 1 for all i = 1, ..., n  (s = x1 + ... + xn)
Taking ln and differentiating (partially) w.r.t. p, we get
(1/L) ∂L/∂p = s/p − (n − s)/(1 − p)
For a maximum, ∂L/∂p = 0, or s/p = (n − s)/(1 − p),
i.e. p = s/n = (x1 + x2 + ... + xn)/n
 = (no. of defectives among the n chosen)/n
 = sample proportion of defectives.
(One can easily see this p makes ∂²L/∂p² < 0, so that L is a maximum for this p.)

Example 10
Let X be a rv having exponential distribution with (unknown) parameter θ. Hence the density of X is f(x; θ) = (1/θ) e^{−x/θ}  (x > 0).
Let {X1, X2, ..., Xn} be a random sample of size n. Hence the likelihood function is
L(θ) = f(x1; θ) f(x2; θ) ... f(xn; θ) = (1/θ^n) e^{−(x1 + x2 + ... + xn)/θ}  (xi > 0)
Taking ln and differentiating (partially) w.r.t. θ, we get
(1/L) ∂L/∂θ = −n/θ + (x1 + ... + xn)/θ² = 0 (for a maximum),
which gives θ = (x1 + x2 + ... + xn)/n = x̄.
Thus the sample mean x̄ is the MLE of θ.

Example 11
A r.v. X has density
f(x; θ) = (θ + 1) x^θ;  0 < x < 1
Obtain the ML estimate of θ based on a random sample {X1, X2, ..., Xn} of size n from X.

Solution
The likelihood function is
L(θ) = (θ + 1)^n (x1 x2 ... xn)^θ;  0 < xi < 1
Taking ln and differentiating (partially) w.r.t. θ, we get
(1/L) ∂L/∂θ = n/(θ + 1) + ln(x1 ... xn)
 = 0 (for L to be maximum),
which gives θ̂ = −1 − n/ln(x1 ... xn),
which is the ML estimate for θ.


So far we have considered situations where the ML estimate is got by differentiating L
(and equalizing the derivative) to zero. The following example is one where the
differentiation will not work.

Example 12
A rv X has uniform density over [0, ]
(ie) The density of X is f (x; ) =

1
; 0 x (and 0 elsewhere)

The likelihood function based on a random sample of size n from X is


L( ) = f (x 1 ; )f (x 2 ; ).....f (x n ; )
=

1
; 0 x 1 , 0 x 2 , ....,0 x n
n

This is a maximum when the Dr is least


(ie) when is least. But > xi i = 1,2....n
Hence the least is max {x1 .....x n } which is the MLE of

142

Estimation of Sample proportion


We have just seen above that if p = population proportion (i.e. the proportion of persons, things etc. having a characteristic), then the ML estimate of p = sample proportion. Now we would like to find a (1 − α)100% confidence interval for p.
(This is treated in chapter 9 of your text book.)

Large Samples
Suppose we have a dichotomous universe; that is, a population whose members are either 'haves' or 'have-nots'; that is, a member has a property or does not.
For example, we can think of the population of all bulbs produced by a factory. Any bulb is either a 'have' (i.e. defective) or a 'have-not' (i.e. it is good), and p = proportion of 'haves' = prob that a randomly chosen member is a 'have'.
As another example, we can think of the population of all females in USA. A member is a 'have' if she is a blonde and a 'have-not' if she is not a blonde. As a last example, consider the population of all voters in India. A member is a 'have' if he follows BJP and is a 'have-not' otherwise.
To estimate p, we choose n members at random and count the number X of 'haves'. Thus X is a rv having binomial distribution with parameters n and p:
P(X = x) = f(x; p) = (n choose x) p^x (1 − p)^{n−x};  x = 0, 1, 2, ..., n
and if n is large, we know the standardized binomial
(X − np)/√(np(1 − p))
has approx the standard normal distribution. So we can say with prob (1 − α) that

−Z_{α/2} < (X − np)/√(np(1 − p)) < Z_{α/2}
or −Z_{α/2} < (X/n − p)/√(p(1 − p)/n) < Z_{α/2}
or X/n − Z_{α/2} √(p(1 − p)/n) < p < X/n + Z_{α/2} √(p(1 − p)/n).
In the end points, we replace p by the MLE X/n (= sample proportion).
Thus we can say with prob (1 − α) that
x/n − Z_{α/2} √( (x/n)(1 − x/n)/n ) < p < x/n + Z_{α/2} √( (x/n)(1 − x/n)/n ).
Hence a (1 − α)100% confidence interval for p is
( x/n − Z_{α/2} √( (x/n)(1 − x/n)/n ),  x/n + Z_{α/2} √( (x/n)(1 − x/n)/n ) )

Remark: We can say with prob (1 − α) that the max error |p − X/n| in approximating p by X/n is
E = Z_{α/2} √( p(1 − p)/n ).
We can replace p by X/n and say the max error = Z_{α/2} √( (X/n)(1 − X/n)/n ).
Or we note that p(1 − p) for 0 ≤ p ≤ 1 is at most 1/4 (the maximum, obtained when p = 1/2).
Thus we can also say with prob (1 − α) that the max error is at most
E = Z_{α/2} √( 1/(4n) ).
This last equation tells us that to assert with prob (1 − α) that the max error is E, n must be
n = (1/4)(Z_{α/2}/E)².

Example 13
In a random sample of 400 industrial accidents, it was found that 231 were due at least
partially to unsafe working conditions. Construct a 99% confidence interval for the
corresponding true proportion p.

Solution
Here n = 400, x = 231, (1 − α) = 0.99 so that α/2 = 0.005; hence Z_{α/2} = 2.575.
Thus a 99% confidence interval for p will be
( x/n − Z_{α/2} √( (x/n)(1 − x/n)/n ),  x/n + Z_{α/2} √( (x/n)(1 − x/n)/n ) )
 = ( 231/400 − 2.575 √( (231/400)(1 − 231/400)/400 ),  231/400 + 2.575 √( (231/400)(1 − 231/400)/400 ) )
 = ( 0.5139, 0.6411 )
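A quick numerical check of Example 13; a minimal sketch (assuming scipy; not part of the original text):

    from math import sqrt
    from scipy.stats import norm

    n, x = 400, 231
    phat = x / n
    half = norm.ppf(1 - 0.005) * sqrt(phat * (1 - phat) / n)
    print(round(phat - half, 4), round(phat + half, 4))   # ≈ (0.5139, 0.6411)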
Example 14
In a sample survey of the safety explosives used in certain mining operations, explosives containing potassium nitrate were found to be used in 95 out of 250 cases. If 95/250 = 0.38 is used as an estimate of the corresponding true proportion, what can we say with 95% confidence about the maximum error?

Solution
Here n = 250, X = 95, 1 − α = 0.95 so that α/2 = 0.025; hence Z_{α/2} = 1.96.
Hence we can say with 95% confidence that the max. error is
E = Z_{α/2} √( (x/n)(1 − x/n)/n ) = 1.96 √( 0.38 × 0.62/250 ) = 0.0602
Example 15:
Among 100 fish caught in a large lake, 18 were inedible due to the pollution of the environment. If we use 18/100 = 0.18 as an estimate of the corresponding true proportion, with what confidence can we assert that the error of this estimate is at most 0.065?

Solution
Here n = 100, X = 18, max error E = 0.065.
We note E = Z_{α/2} √( (X/n)(1 − X/n)/n ) = Z_{α/2} √( 0.18 × 0.82/100 ) = Z_{α/2} × 0.03842.
Hence Z_{α/2} = 0.065/0.03842 = 1.69
α/2 = 1 − 0.9545 = 0.0455
α = 0.0910, or 1 − α = 0.9090.
So we can assert with (1 − α)100% = 90.9% confidence that the error is at most 0.065.

Example 16
What is the size of the smallest sample required to estimate an unknown proportion to
within a max. error of 0.06 with at least 95% confidence?

Solution
Here E = 0.06; 1 − α = 0.95 or α/2 = 0.025
Z_{α/2} = Z_{0.025} = 1.96
Hence the smallest sample size n is
n = (1/4)(Z_{α/2}/E)² = (1/4)(1.96/0.06)² = 266.78
Since n must be an integer, we take the size to be 267.

Remark
Read the relevant material in your text on pages 279-281 on finding the confidence interval for the proportion in the case of small samples.

Tests of Statistical Hypothesis


In many problems, instead of estimating a parameter, we must decide whether a statement concerning the parameter is true or false. For instance, one may like to test the truth of the statement: "The mean life length of a bulb is 500 hours."
In fact, we may even have to decide whether the mean life is 500 hours or more (!)
In such situations, we have a statement whose truth or falsity we want to test. We then say we want to test the null hypothesis H0 = "the mean life length is 500 hours". (Here onwards, when we say we want to test a statement, it shall mean we want to test whether the statement is true.) We then have another (usually called the alternative) hypothesis. We make some experiment and on the basis of that we decide whether to accept the null hypothesis or reject it. (When we reject the null hypothesis, we automatically accept the alternative hypothesis.)

Example
Suppose we wish to test the null hypothesis H0 = The mean life length of a bulb is 500
hours against the alternative H1 = The mean life length is > 500 hours. Suppose we take a
random sample of 50 bulbs and found that the sample mean is 520 hours. Should we
accept H0 or reject H0 ? We have to note that even though the population mean is 500
hours the sample mean could be more or less. Similarly even though the population mean
is > 500 hours, say 550 hours, even then the sample mean could be less than 550 hours.
Thus whatever decision we may make, there is a possibility of making an error. That is
falsely rejecting H0 (when it should have been accepted) and falsely accepting H0 (when
it should have been rejected). We put this in a tabular form as follows:
                Accept H0            Reject H0
H0 is true      Correct decision     Type I error
H0 is false     Type II error        Correct decision

Thus the Type I error is the error of falsely rejecting H0, and the Type II error is the error of falsely accepting H0. A good decision procedure (test) is one where the prob of making these errors is small.

Notation
The prob of committing a Type I error is denoted by α. It is also referred to as the size of the test or the level of significance of the test. The prob of committing a Type II error is denoted by β.

Example 1
Suppose we want to test the null hypothesis μ = 80 against the alternative hypothesis μ = 83 on the basis of a random sample of size n = 100 (assume that the population s.d. σ = 8.4). The null hypothesis is rejected if the sample mean x̄ > 82; otherwise it is accepted. What is the prob of Type I error? The prob of Type II error?

Solution
We know that when μ = 80 (and σ = 8.4) the r.v. (X̄ − μ)/(σ/√n) has a standard normal distribution. Thus,
P(Type I error)
 = P(rejecting the null hyp when it is true)
 = P( X̄ > 82 given μ = 80 )
 = P( (X̄ − 80)/(8.4/10) > (82 − 80)/(8.4/10) )
 = P(Z > 2.38)
 = 1 − P(Z ≤ 2.38) = 1 − 0.9913 = 0.0087
Thus in roughly about 1% of the cases we will be (falsely) rejecting H0. Recall this is also called the size of the test or level of significance of the test.
P(Type II error) = P(falsely accepting H0)
 = P(accepting H0 when it is false)
 = P( X̄ ≤ 82 given μ = 83 )
 = P( (X̄ − 83)/(8.4/10) ≤ (82 − 83)/(8.4/10) )
 = P(Z ≤ −1.19)
 = 1 − P(Z ≤ 1.19) = 1 − 0.8830 = 0.1170
Thus roughly in 12% of the cases we will be falsely accepting H0.
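These two error probabilities can be computed directly; a minimal sketch (assuming scipy; not part of the original text):

    from scipy.stats import norm

    se = 8.4 / 100**0.5                    # sigma / sqrt(n)
    alpha = 1 - norm.cdf((82 - 80) / se)   # P(reject | mu = 80) ≈ 0.0087
    beta = norm.cdf((82 - 83) / se)        # P(accept | mu = 83) ≈ 0.1170
    print(round(alpha, 4), round(beta, 4))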

Definition (Critical Region)


In the previous example we rejected the null hypothesis when x̄ > 82, (i.e.) when x̄ lies in the region x̄ > 82 (of the x̄ axis). This portion of the horizontal axis is then called the critical region and denoted by C. Thus the critical region for the above situation is C = {x̄ : x̄ > 82}, and remember we reject H0 when the (test) statistic X̄ lies in the critical region (i.e. takes a value > 82). So the size of the critical region (= prob that X̄ lies in C) is the size of the test or level of significance.
(In the accompanying figure, the shaded portion of the axis is the critical region; the remaining portion is the region of acceptance of H0.)
Critical regions for hypotheses concerning the mean

Let X be a rv having a normal distribution with (unknown) mean μ and (known) s.d. σ. Suppose we wish to test the null hypothesis μ = μ0.
The following table gives the critical regions (criteria for rejecting H0) for various alternative hypotheses.
Null hypothesis: μ = μ0  (normal population, σ known),  Z = (x̄ − μ0)/(σ/√n)

Alternative hypothesis H1    Reject H0 if                     Prob of Type I error    Prob of Type II error
μ = μ1 (μ1 < μ0)             Z < −Z_α                         α                       1 − F( (μ0 − μ1)/(σ/√n) − Z_α )
μ < μ0                       Z < −Z_α                         α                       —
μ = μ1 (μ1 > μ0)             Z > Z_α                          α                       F( (μ0 − μ1)/(σ/√n) + Z_α )
μ > μ0                       Z > Z_α                          α                       —
μ ≠ μ0                       Z < −Z_{α/2} or Z > Z_{α/2}      α                       —

F(x) = cdf of the standard normal distribution.
Remark:
The prob of Type II error is left blank when H1 (the alternative hypothesis) is one of the following three: μ < μ0, μ > μ0, μ ≠ μ0. This is because the Type II error can then happen in various ways and so we cannot determine the prob of its occurrence.

Example 2:
According to norms established for a mechanical aptitude test, persons who are 18 years old should average 73.2 with a standard deviation of 8.6. If 45 randomly selected persons averaged 76.7, test the null hypothesis μ = 73.2 against the alternative μ > 73.2 at the 0.01 level of significance.

Solution
Step I: Null hypothesis H0: μ = 73.2
 Alternative hypothesis H1: μ > 73.2
 (Thus here μ0 = 73.2)
Step II: The level of significance α = 0.01
Step III: Reject the null hypothesis if Z > Z_α = Z_{0.01} = 2.33
Step IV: Calculations
 Z = (x̄ − μ0)/(σ/√n) = (76.7 − 73.2)/(8.6/√45) = 2.73
Step V: Decision. Since Z = 2.73 > Z_α = 2.33,
 we reject H0 (at the 0.01 level of significance),
 (i.e.) we would say μ > 73.2 (and the prob of falsely saying this is ≤ 0.01).

Example 3
It is desired to test the null hypothesis μ = 100 against the alternative hypothesis μ < 100 on the basis of a random sample of size n = 40 from a population with σ = 12. For what values of x̄ must the null hypothesis be rejected if the prob of Type I error is to be α = 0.01?

Solution
Z_α = Z_{0.01} = 2.33. Hence from the table we reject H0 if Z < −Z_α = −2.33, where Z = (x̄ − μ0)/(σ/√n).
(x̄ − 100)/(12/√40) < −2.33 gives
x̄ < 100 − 2.33 × 12/√40 = 95.58
Example 4
To test a paint manufacturer's claim that the average drying time of his new fast-drying paint is 20 minutes, a random sample of 36 boards is painted with his new paint and his claim is rejected if the mean drying time x̄ is > 20.50 minutes. Find
(a) the prob of Type I error
(b) the prob of Type II error when μ = 21 minutes.
(Assume that σ = 2.4 minutes.)

Solution
Here the null hypothesis is H0: μ = 20 and the alt hypothesis is H1: μ > 20.
(a) P(Type I error) = P(rejecting H0 when it is true).
Now when H0 is true, μ = 20 and hence
(X̄ − μ)/(σ/√n) = (X̄ − 20)/(2.4/√36) = (6/2.4)(X̄ − 20) is standard normal.
Thus P(Type I error)
 = P( X̄ > 20.50 given that μ = 20 )
 = P( (X̄ − 20)/(2.4/6) > (20.50 − 20)/(2.4/6) )
 = P(Z > 1.25) = 1 − P(Z ≤ 1.25) = 1 − F(1.25)
 = 1 − 0.8944 = 0.1056
(b) P(Type II error when μ = 21)
 = P(accepting H0 when μ = 21)
 = P( X̄ ≤ 20.50 when μ = 21 )
 = P( (X̄ − 21)/(2.4/6) ≤ (20.50 − 21)/(2.4/6) )
 = P(Z ≤ −1.25) = P(Z > 1.25)
 = 0.1056

Example 5
It is desired to test the null hypothesis μ = 100 pounds against the alternative hypothesis μ < 100 pounds on the basis of a random sample of size n = 50 from a population with σ = 12. For what values of x̄ must the null hypothesis be rejected if the prob of Type I error is to be α = 0.01?

Solution
We want to test the null hypothesis H0: μ = 100 against the alt hypothesis H1: μ < 100, given σ = 12, n = 50.
Suppose we reject H0 when x̄ < C.
Thus P(Type I error) = P(rejecting H0 when it is true)
 = P( X̄ < C given μ = 100 )
 = P( (X̄ − 100)/(12/√50) < (C − 100)/(12/√50) ) = P( Z < (C − 100)/(12/√50) )
 = F( (C − 100)/(12/√50) ) = 0.01,
which implies (C − 100)/(12/√50) = −2.33
Or C = 100 − (12/√50) × 2.33 = 96.05
Thus reject H0 if X̄ < 96.05.

Example 6
Suppose that for a given population with σ = 8.4 in² we want to test the null hypothesis μ = 80.0 in² against the alternative hypothesis μ < 80.0 in² on the basis of a random sample of size n = 100.
(a) If the null hypothesis is rejected for x̄ < 78.0 in² and otherwise it is accepted, what is the probability of Type I error?
(b) What is the answer to part (a) if the null hypothesis is μ ≥ 80.0 in² instead of μ = 80.0 in²?

Solution
(a) Null hypothesis H0: μ = 80
 Alt hypothesis H1: μ < 80
 Given σ = 8.4, n = 100
 P(Type I error) = P(rejecting H0 when it is true)
 = P( X̄ < 78.0 given μ = 80 )
 = P( (X̄ − 80.0)/(8.4/10) < (78.0 − 80.0)/(8.4/10) ) = P( Z < −10/4.2 )
 = 1 − P( Z < 10/4.2 ) = 1 − F(2.38)
 = 1 − 0.9913 = 0.0087
(b) In this case we define the Type I error as the max prob of rejecting H0 when it is true
 = max over μ ≥ 80.0 of P( x̄ < 78.0 given the population mean is μ ).
 Now P( x̄ < 78.0 when the population mean is μ )
 = P( (X̄ − μ)/(8.4/10) < (78.0 − μ)/(8.4/10) ) = P( Z < (10/8.4)(78 − μ) )
 = F( 1.19(78 − μ) ).
 We note that the cdf of Z, viz F(z), is an increasing function of z. Thus when μ ≥ 80, F(1.19(78 − μ)) is largest when μ is smallest, i.e. μ = 80. Hence
 P(Type I error) = max over μ ≥ 80 of F(1.19(78 − μ)) = F(1.19(78 − 80))
 = 0.0087

Example 7
If the null hypothesis μ = μ0 is to be tested against the one-sided alternative hypothesis μ < μ0 (or μ > μ0), and if the prob of Type I error is to be α and the prob of Type II error is to be β when μ = μ1, it can be shown that this is possible when the required sample size is
n = σ² (Z_α + Z_β)² / (μ1 − μ0)²,
where σ² is the population variance.

(a) It is desired to test the null hypothesis μ = 40 against the alternative hypothesis μ < 40 on the basis of a large random sample from a population with σ = 4. If the prob of Type I error is to be 0.05 and the prob of Type II error is to be 0.12 for μ = 38, find the required size of the sample.
(b) Suppose we want to test the null hypothesis μ = 64 against the alternative hypothesis μ < 64 for a population with standard deviation σ = 7.2. How large a sample must we take if α is to be 0.05 and β is to be 0.01 for μ = 61? Also, for what values of x̄ will the null hypothesis have to be rejected?

Solution
(a) Here α = 0.05, β = 0.12, μ0 = 40, μ1 = 38, σ = 4
 Z_α = Z_{0.05} = 1.645, Z_β = Z_{0.12} = 1.175
 Thus the required sample size is
 n = 16(1.645 + 1.175)²/(38 − 40)² = 31.8, so n ≈ 32.
(b) Here α = 0.05, β = 0.01, μ0 = 64, μ1 = 61, σ = 7.2
 n = (7.2)²(1.645 + 2.33)²/(61 − 64)² = 91.01, so n ≈ 92.
 We reject H0 if Z < −Z_α, i.e. (X̄ − 64)/(7.2/√92) < −1.645, or X̄ < 62.76.
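A quick check of part (a) of Example 7; a minimal sketch (assuming scipy; not part of the original text):

    from math import ceil
    from scipy.stats import norm

    sigma, mu0, mu1 = 4, 40, 38
    n = sigma**2 * (norm.ppf(0.95) + norm.ppf(0.88))**2 / (mu1 - mu0)**2   # Z_{0.05} and Z_{0.12}
    print(ceil(n))   # 32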

Tests concerning mean when the sample is small


If X̄ is the sample mean and S the sample s.d. of a (small) random sample of size n from a normal population (with mean μ0), we know that the statistic t = (X̄ − μ0)/(S/√n) has a t-distribution with (n − 1) degrees of freedom. Thus, to test the null hypothesis H0: μ = μ0 against the alternative hypothesis H1: μ > μ0, we note that when H0 is true, (i.e.) when μ = μ0, P(t > t_{n−1, α}) = α.
Thus if we reject the null hypothesis when t > t_{n−1, α}, (i.e.) when X̄ > μ0 + t_{n−1, α} S/√n, we shall be committing a Type I error with prob α.
The corresponding tests when the alternative hypothesis is μ < μ0 (and μ ≠ μ0) are described below.

Note: If n is large, we can approximate t_{n−1, α} by Z_α in these tests.

Critical regions for testing H0: μ = μ0 (normal population, σ unknown)

Alt hypothesis     Reject null hypothesis if
μ < μ0             t < −t_{n−1, α}
μ > μ0             t > t_{n−1, α}
μ ≠ μ0             t < −t_{n−1, α/2} or t > t_{n−1, α/2}

where t = (X̄ − μ0)/(S/√n)  (n = sample size).
In each case P(Type I error) = α.

Example 8
A random sample of six steel beams has a mean compressive strength of 58,392 psi (pounds per square inch) with a s.d. of 648 psi. Use this information and the level of significance α = 0.05 to test whether the true average compressive strength of the steel from which this sample came is 58,000 psi. Assume normality.

Solution
1. Null hypothesis: μ = μ0 = 58,000
 Alt hypothesis: μ > 58,000 (why?)
2. Level of significance α = 0.05
3. Criterion: Reject the null hypothesis if t > t_{n−1, α} = t_{5, 0.05} = 2.015
4. Calculations
 t = (x̄ − μ0)/(S/√n) = (58,392 − 58,000)/(648/√6) = 1.48
5. Decision
 Since t observed = 1.48 ≤ 2.015, we cannot reject the null hypothesis. That is, we can say the true average compressive strength is 58,000 psi.
Example 9
Test runs with six models of an experimental engine showed that they operated for
24,28,21,23,32 and 22 minutes with a gallon of a certain kind of fuel. If the prob of type I
error is to be at most 0.01, is this evidence against a hypothesis that on the average this
kind of engine will operate for at least 29 minutes per gallon with this kind of fuel?
Assume normality.

Solution
1. Null hypothesis H 0 : 0 = 29
Alt hypothesis: H 1 : < 0
2. Level of significance = 0.01
3. Criterion : Reject the null hypothesis if t < t n 1, = t 5, 0.01 = 3.365 (Note n = 6 )
X 0
S
n
4. Calculations
24 + 28 + 21 + 23 + 32 + 22
X=
= 25
6
where t =

160

S2 =

1
(24 25)2 + (28 25)2 + (21 25)2 + (23 25)2 + (32 25)2 + (22 25)2
6 1

= 17.6
t =

25 29
17.6

= 2.34

5. Decision
Since t obs = 2.34 3.365 , we cannot reject the null hypothesis. That is we can
say that this kind of engine will operate for at least 29 minute per gallon with this
kind of fuel.

Example 10
A random sample from a company's very extensive files shows that orders for a certain piece of machinery were filled, respectively, in 10, 12, 19, 14, 15, 18, 11 and 13 days. Use the level of significance α = 0.01 to test the claim that on the average such orders are filled in 10.5 days. Choose the alternative hypothesis so that rejection of the null hypothesis μ = 10.5 indicates that it takes longer than indicated. Assume normality.

Solution
1. Null hypothesis H0: μ = μ0 = 10.5
 Alt hypothesis H1: μ > 10.5
2. Level of significance α = 0.01
3. Criterion: Reject the null hypothesis if t > t_{n−1, α} = t_{8−1, 0.01} = t_{7, 0.01} = 2.998,
 where t = (X̄ − μ0)/(S/√n)  (μ0 = 10.5, n = 8)
4. Calculations
 X̄ = (10 + 12 + 19 + 14 + 15 + 18 + 11 + 13)/8 = 14
 S² = (1/(8−1)) [ (10−14)² + (12−14)² + (19−14)² + (14−14)² + (15−14)² + (18−14)² + (11−14)² + (13−14)² ] = 10.29
 t = (14 − 10.5)/√(10.29/8) = 3.09
5. Decision
 Since t observed = 3.09 > 2.998, we have to reject the null hypothesis. That is, we can say that on the average such orders are filled in more than 10.5 days.

Example 11
Tests performed with a random sample of 40 diesel engines produced by a large manufacturer show that they have a mean thermal efficiency of 31.4% with a sd of 1.6%. At the 0.01 level of significance, test the null hypothesis μ = 32.3% against the alternative hypothesis μ ≠ 32.3%.

Solution
1. Null hypothesis: μ = μ0 = 32.3
 Alt hypothesis: μ ≠ 32.3
2. Level of significance α = 0.01
3. Criterion: Reject H0 if t < −t_{n−1, α/2} or t > t_{n−1, α/2}, (i.e.) if t < −t_{39, 0.005} or t > t_{39, 0.005}.
 Now t_{39, 0.005} ≈ Z_{0.005} = 2.575.
 Thus we reject H0 if t < −2.575 or t > 2.575, where t = (X̄ − μ0)/(S/√n).
4. Calculations
 t = (31.4 − 32.3)/(1.6/√40) = −3.558
5. Decision
 Since t observed = −3.558 < −2.575, we reject H0. That is, we can say the mean thermal efficiency ≠ 32.3%.

Example 12
In 64 randomly selected hours of production, the mean and the s.d. of the number of acceptable pieces produced by an automatic stamping machine are X̄ = 1,038 and S = 146. At the 0.05 level of significance, does this enable us to reject the null hypothesis μ = 1000 against the alt hypothesis μ > 1000?

Solution
1. Null hypothesis H0: μ = μ0 = 1000
 Alt hypothesis H1: μ > 1000
2. Level of significance α = 0.05
3. Criterion: Reject H0 if t > t_{n−1, α} = t_{64−1, 0.05}.
 Now t_{63, 0.05} ≈ Z_{0.05} = 1.645. Thus we reject H0 if t > 1.645.
4. Calculations: t = (X̄ − μ0)/(S/√n) = (1,038 − 1,000)/(146/√64) = 2.082
5. Decision: Since t observed = 2.082 > 1.645, we reject H0 at the 0.05 level of significance.

REGRESSION AND CORRELATION


Regression
A major objective of many statistical investigations is to establish relationships that make it possible to predict one or more variables in terms of others. Thus studies are made to predict the potential sales of a new product in terms of the money spent on advertising, the patient's weight in terms of the number of weeks he/she has been on a diet, the marks obtained by a student in terms of the number of classes he attended, etc.
Although it is desirable to predict a quantity exactly in terms of the others, this is seldom possible, and in most cases we have to be satisfied with predicting average or expected values. Thus we would like to predict the average sales in terms of the money spent on advertising, or the average income of a college student in terms of the number of years he/she has been out of the college.
Thus, given two random variables X, Y and given that X takes the value x, the basic problem of bivariate regression is to determine the conditional expected value E(Y|x) as a function of x. In most cases, we may find that E(Y|x) is a linear function of x:
E(Y|x) = α + βx, where the constants α, β are called the regression coefficients.
Denoting E(X) = μ1, E(Y) = μ2, Var(X) = σ1², Var(Y) = σ2², cov(X, Y) = σ12 and ρ = σ12/(σ1 σ2), we can show:

Theorem: (a) If the regression of Y on X is linear, then
E(Y|x) = μ2 + ρ (σ2/σ1)(x − μ1)
(b) If the regression of X on Y is linear, then
E(X|y) = μ1 + ρ (σ1/σ2)(y − μ2)

Note: ρ is called the correlation coefficient between X and Y.

In actual situations, we have to estimate the regression coefficients α, β from a random sample {(x1, y1), (x2, y2), ..., (xn, yn)} of size n from the 2-dimensional random variable (X, Y). We then fit a straight line y = a + bx to the above data by the method of least squares. The method of least squares says: choose constants a and b for which the sum of the squares of the vertical deviations of the sample points (xi, yi) from the line y = a + bx is a minimum, i.e. find a, b so that
T = Σ_{i=1}^{n} [yi − (a + bxi)]²
is a minimum. Using 2-variable calculus, we determine a, b so that ∂T/∂a = 0 and ∂T/∂b = 0. Thus we get the following two equations:
Σ_{i=1}^{n} (−2)[yi − (a + bxi)] = 0 and Σ_{i=1}^{n} (−2xi)[yi − (a + bxi)] = 0.
Simplifying, we get the so-called normal equations:
na + (Σ xi) b = Σ yi
(Σ xi) a + (Σ xi²) b = Σ xi yi
Solving, we get
b = [ n Σ xi yi − (Σ xi)(Σ yi) ] / [ n Σ xi² − (Σ xi)² ],  a = [ Σ yi − (Σ xi) b ] / n.
These constants a and b are used to estimate the unknown regression coefficients α, β. Now if x = xg, we predict y as yg = a + b xg.

Problem 1.
Various doses of a poisonous substance were given to groups of 25 mice and the
following results were observed:

Dose (mg) x:           4    6    8   10   12   14   16
Number of deaths y:    1    3    6    8   14   16   20

(a) Find the equation of the least squares line fit to these data.
(b) Estimate the number of deaths in a group of 25 mice who receive a 7 mg dose of this poison.

Solution:
(a) n = number of sample pairs (xi, yi) = 7
 Σxi = 70, Σyi = 68, Σxi² = 812, Σxi yi = 862
 Hence b = (7 × 862 − 70 × 68)/(7 × 812 − (70)²) = 1274/784 = 1.625
 a = (68 − 70 × 1.625)/7 = −6.536
 Thus the least squares line that fits the given data is: y = −6.536 + 1.625 x
(b) If x = 7, y = −6.536 + 1.625 × 7 = 4.839.
Problem 2:
The following are the scores that 12 students obtained in the midterm and final
examinations in a course in Statistics:

Mid Term Examination x:  71  49  80  73  93  85  58  82  64  32  87  80
Final Examination y:     83  62  76  77  89  74  48  78  76  51  73  89

(a) Fit a straight line to the above data.
(b) Hence predict the final exam score of a student who received a score of 84 in the midterm examination.

Solution:
(a) n = number of sample pairs (xi, yi) = 12
 Σxi = 854, Σyi = 876, Σxi² = 64222, Σxi yi = 64346
 Hence b = (12 × 64346 − 854 × 876)/(12 × 64222 − (854)²) = 24048/41348 = 0.5816
 a = (876 − 854 × 0.5816)/12 = 31.609
 Thus the least squares line that fits the given data is: y = 31.609 + 0.5816 x
(b) If x = 84, y = 31.609 + 0.5816 × 84 = 80.46

Correlation
If X, Y are two random variables, the correlation coefficient ρ between X and Y is defined as
ρ = cov(X, Y) / √( Var(X) Var(Y) )
It can be shown that
(a) −1 ≤ ρ ≤ 1
(b) If Y is a linear function of X, |ρ| = 1
(c) If X and Y are independent, then ρ = 0
(d) If X, Y have a bivariate normal distribution and if ρ = 0, then X and Y are independent.

Sample Correlation Coefficient


If {(x1, y1), (x2, y2), ..., (xn, yn)} is a random sample of size n from the 2-dimensional random variable (X, Y), then the sample correlation coefficient, r, is defined by
r = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / √( Σ_{i=1}^{n} (xi − x̄)² · Σ_{i=1}^{n} (yi − ȳ)² ).
We shall use r to estimate the (unknown) population correlation coefficient ρ. If (X, Y) has a bivariate normal distribution, we can show that the random variable
Z = (1/2) ln( (1 + r)/(1 − r) )
is approximately normal with mean (1/2) ln( (1 + ρ)/(1 − ρ) ) and variance 1/(n − 3).

Note: A computational formula for r is given by r = Sxy / √(Sxx Syy), where
Sxx = Σ (xi − x̄)² = Σ xi² − (Σ xi)²/n,
Syy = Σ (yi − ȳ)² = Σ yi² − (Σ yi)²/n,
Sxy = Σ (xi − x̄)(yi − ȳ) = Σ xi yi − (Σ xi)(Σ yi)/n.

Problem 3.
Calculate r for the data { (8, 3), (1, 4), (5, 0), (4, 2), (7, 1) }.

Solution
x = 25/5 = 5. y = 10/5 = 2.
n
i =1
n
i =1
n
i =1

( xi x ) ( y i y ) = 3 x 1 + (-4) x 2 + 0 x (-2) + (-1) x 0 + 2 x (-1) = -7


( xi x ) 2 = 9 + 16 + 0 + 1 + 4 = 30
( y i y ) 2 = 1 + 4 + 4 + 0 + 1 = 10

Hence r =

7
(30) (10)

= - 0.404.
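The value of r can be checked with a library call; a minimal sketch (assuming numpy; not part of the original text):

    import numpy as np

    x = np.array([8, 1, 5, 4, 7])
    y = np.array([3, 4, 0, 2, 1])
    print(round(np.corrcoef(x, y)[0, 1], 3))   # sample correlation coefficient ≈ -0.404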


Problem 4.
The following are the measurements of the air velocity and evaporation coefficient of
burning fuel droplets in an impulse engine:

Air velocity x:             20    60   100   140   180   220   260   300   340   380
Evaporation coefficient y:  0.18  0.37  0.35  0.78  0.56  0.75  1.18  1.36  1.17  1.65

Find the sample correlation coefficient, r.

Solution.
Sxx = Σ (xi − x̄)² = Σ xi² − (Σ xi)²/n = 532000 − (2000)²/10 = 132000
Syy = Σ (yi − ȳ)² = Σ yi² − (Σ yi)²/n = 9.1097 − (8.35)²/10 = 2.13745
Sxy = Σ (xi − x̄)(yi − ȳ) = Σ xi yi − (Σ xi)(Σ yi)/n = 2175.4 − (2000)(8.35)/10 = 505.4
Hence r = Sxy/√(Sxx Syy) = 505.4/√(132000 × 2.13745) = 0.9515.

**************