UNIT 2
PROBABILITY DISTRIBUTIONS
OBJECTIVES
General Objective
Specific Objectives
INPUT
2.0 INTRODUCTION
If a variable can assume only a specific number of values, such as the outcomes
for the roll of a die or the outcomes for the toss of a coin, then the variable is
called a discrete variable. Discrete variables have values that can be
counted.
Variables that can assume all values in the interval between any given two
values are called continuous variables. For example, if the temperature goes
from 60° to 75° in a 24-hour period, it has passed through all possible numbers
PROBABILITY DISTRIBUTIONS C 5606/2/ 3
from 60 to 75. Continuous random variables are obtained from data that can
be measured rather than counted.
The procedure shown here for constructing a probability distribution for a discrete
random variable uses the probability experiment of tossing three coins. Recall
that when three coins are tossed, the sample space is represented as TTT, TTH,
THT, HTT, HHT, HTH, THH, HHH, and if X is the random variable for the number
of heads, then X assumes the value 0, 1, 2, or 3.
Hence, the probability of getting no heads is 1/8, one head is 3/8, two heads is
3/8, and three heads is 1/8. From these values, a probability distribution can be
constructed by listing the outcomes and assigning the probability of each
outcome, as shown.
Number of heads, X    0     1     2     3
Probability, P(X)     1/8   3/8   3/8   1/8
The sum of the probabilities of all events in the sample space must equal 1; that is,
ΣP(X) = 1.
The probability of each event in the sample space must be between or equal to 0
and 1. That is, 0 ≤ P(X) ≤ 1.
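The construction described above can be sketched in a few lines of Python. This is only an illustration, assuming fair coins; the function name `heads_distribution` is hypothetical and not part of the module.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins=3):
    """Return {number of heads: probability} for n_coins fair coins."""
    # Enumerate the sample space, e.g. ('H', 'T', 'T') for HTT.
    outcomes = list(product("HT", repeat=n_coins))
    # Count how many outcomes give each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}


dist = heads_distribution(3)
for x, p in dist.items():
    print(f"P({x}) = {p}")

# The two requirements of a probability distribution.
total_is_one = sum(dist.values()) == 1
all_in_range = all(0 <= p <= 1 for p in dist.values())
print(total_is_one, all_in_range)
```

Exact fractions are used so that the probabilities can be compared directly with 1/8 and 3/8 rather than with rounded decimals.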
Example 2.1
2. Represent graphically the probability distribution for the sample space for
tossing three coins.
3. In the rainy months, a store selling umbrellas keeps track of the number of
umbrellas it sells each day during a period of 90 days. The number of
umbrellas sold per day is represented by the variable X. The results are
shown here.
X Number of days
0 45
1 30
2 15
Total 90
1. Since the sample space is S = {1, 2, 3, 4, 5, 6}, and each outcome has a
probability of 1/6, the distribution is as follows:
Outcome, X           1     2     3     4     5     6
Probability, P(X)    1/6   1/6   1/6   1/6   1/6   1/6
2. The values X assumes are located on the x-axis, and the values for P(X)
are located on the y-axis.
Number of heads, X    0     1     2     3
Probability, P(X)     1/8   3/8   3/8   1/8
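3. The probability P(X) can be computed for each X by dividing the number
of days on which X umbrellas were sold by the total number of days.
For 0 umbrellas: 45/90 = 0.5
For 1 umbrella: 30/90 ≈ 0.33
For 2 umbrellas: 15/90 ≈ 0.17
Number of umbrellas, X    0     1      2
Probability, P(X)         0.5   0.33   0.17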
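The same division can be checked with a short Python sketch. The day counts come from the table in question 3; the variable names are chosen here for illustration only.

```python
# Number of days on which 0, 1 or 2 umbrellas were sold (from the table).
days_by_sales = {0: 45, 1: 30, 2: 15}

total_days = sum(days_by_sales.values())

# Relative frequency of each value of X, used as its probability.
umbrella_dist = {x: days / total_days for x, days in days_by_sales.items()}

for x, p in umbrella_dist.items():
    print(f"P({x}) = {p:.2f}")
```

Because the probabilities are relative frequencies, they add up to 1 automatically.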
ACTIVITY 2A
4.
X       3     6     9     12    15
P(X)    4/9   2/9   1/9   1/9   1/9
5.
X       1      2      3      4      5
P(X)    3/10   1/10   1/10   2/10   3/10
6.
X 5 10 15
P(X) 1.2 0.3 0.5
For questions 10 to 12, construct a probability distribution for the data and draw a
graph for the distribution.
10. The probabilities that a patient will have 0, 1, 2, or 3 medical tests performed
on entering a hospital are 6/15, 5/15, 3/15 and 1/15, respectively.
FEEDBACK TO ACTIVITY 2A
INPUT
$P(a \le X \le b) = \int_a^b f(x)\,dx$, with the following requirements satisfied:
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Example 2.2
$f(y) = \begin{cases} \frac{1}{2}(2 - y), & 0 \le y \le 2 \\ 0, & \text{otherwise} \end{cases}$
i) Find P(Y > 1)
ii) Find P(Y < 1)
iii) Find P(0.5 < Y ≤ 1)
iv) Draw the graph for f(y) and double check your answers to (i)
through (iii) by finding the area under the graph.
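(i) $P(Y > 1) = \int_1^{\infty} f(y)\,dy = \int_1^2 \frac{1}{2}(2 - y)\,dy + \int_2^{\infty} 0\,dy = \frac{1}{2}\left[2y - \frac{y^2}{2}\right]_1^2 = \frac{1}{4}$
[Figure: graph of f(y) against y, with f(y) = 1/2 marked on the vertical axis and y = 1, 2 on the horizontal axis]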
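(ii) P(Y < 1) = area of trapezium = $\frac{1}{2}\left(1 + \frac{1}{2}\right)(1) = \frac{3}{4}$
[Figure: graph of f(y) against y, with f(y) = 1/2 marked on the vertical axis and y = 2 on the horizontal axis]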
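(iii) P(0.5 < Y ≤ 1) = area of trapezium = $\frac{1}{2}\left(\frac{3}{4} + \frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{5}{16}$
[Figure: graph of f(y) against y, with f(y) = 1/2, 3/4, 1 marked on the vertical axis and y = 1/2, 1, 2 on the horizontal axis]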
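The three answers of Example 2.2 can be cross-checked with a small Python sketch. It assumes the density f(y) = (2 − y)/2 on 0 ≤ y ≤ 2 and integrates it by hand into a cumulative function; the helper name `cdf` is introduced here for illustration and does not appear in the module.

```python
from fractions import Fraction


def cdf(y):
    """P(Y <= y) for f(y) = (2 - y)/2 on 0 <= y <= 2, zero elsewhere."""
    y = Fraction(y)
    if y <= 0:
        return Fraction(0)
    if y >= 2:
        return Fraction(1)
    # Integral of (2 - t)/2 from 0 to y is y - y^2/4.
    return y - y * y / 4


p_gt_1 = 1 - cdf(1)                          # part (i)
p_lt_1 = cdf(1)                              # part (ii)
p_half_to_1 = cdf(1) - cdf(Fraction(1, 2))   # part (iii)

print(p_gt_1, p_lt_1, p_half_to_1)
```

The results agree with the areas found geometrically: 1/4, 3/4 and 5/16. Because the variable is continuous, using < or ≤ at an endpoint does not change any of these probabilities.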
ACTIVITY 2B
$f(y) = \begin{cases} 0.2, & -1 < y \le 0 \\ 0.2 + cy, & 0 < y \le 1 \\ 0, & \text{otherwise} \end{cases}$
c) P(Y > 0.5)
d) P(Y > 0.5) / P(Y > 0.1)
a) $P(X = x) = \left(\frac{1}{2}\right)^x$, x = 1, 2, 3, ...
b) $f(x) = 12x^2(1 - x)$, 0 < x < 1
c) $f(x) = \frac{1}{2}x^2 e^{-x}$, 0 < x < ∞
FEEDBACK TO ACTIVITY 2B
1. c = 1.2 (a) 0.25, (b) 0.35, (c) 0.55, (d) P(Y > 0.5) = 0.55, P(Y > 0.1) = 0.774,
P(Y > 0.5) / P(Y > 0.1) = 0.71
2. a) c = ¼
[Figure: graph with the line y = ¼(2x − 3) and the level y = 1/4; vertical axis marked at 1/4, 1/2, 3/4, 1 and horizontal axis at 1, 2, 3]
c) P = ¼
d) P = 5/16
e) P = 0.3475
3. a) P(A∩B) = 0
b) P(A∪B) = 11/20
4. a) X = 1
b) X = 2/3
c) X = 2
INPUT
Many types of probability problems have only two outcomes, or they can be
reduced to two outcomes. For example when a coin is tossed, it can land heads
or tails. When a baby is born, it will be either male or female. A multiple-choice
question, even though there are four or five answer choices, can be classified as
correct or incorrect. Situations like these are called binomial experiments.
1. Each trial can have only two outcomes or outcomes that can be reduced to
two outcomes. The outcomes can be considered as either success or failure.
2. There must be a fixed number of trials.
3. The outcomes of each trial must be independent of each other.
4. The probability of a success must remain the same for each trial.
$P(X) = \frac{n!}{(n - X)!\,X!} \cdot p^X \cdot q^{n - X}$
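The binomial formula can be written as a short Python function. This is a minimal sketch: the name `binomial_pmf` is hypothetical, and `math.comb(n, x)` is used for the factor n! / ((n − X)! X!).

```python
from math import comb


def binomial_pmf(n, x, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, x) * p**x * q ** (n - x)


# Illustration: exactly two heads in three tosses of a fair coin.
p_two_heads = binomial_pmf(3, 2, 0.5)
print(p_two_heads)
```

Summing `binomial_pmf(n, x, p)` over x = 0, 1, ..., n gives 1, which is a quick way to check that the four requirements produce a valid probability distribution.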
Example 2.3
1. A coin is tossed three times. Find the probability of getting exactly two
heads.
i) There are exactly five people in the sample who are afraid of
being alone at night.
ii) There are at most three people in the sample who are afraid of
being alone at night.
iii) There are at least three people in the sample who are afraid
of being alone at night.
1. This problem can be solved by looking at the sample space. There are
three ways to get two heads.
S = {HHH, HHT,HTH, THH, TTH, THT, HTT, TTT}
The answer is 3/8, or 0.375.
2. From the standpoint of a binomial experiment, you can show that it meets
the four requirements.
a. There are only two outcomes for each trial, heads or tails.
b. There is a fixed number of trials (three)
c. The outcomes are independent of each other (the outcome of one toss in
no way affects the outcome of another toss).
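d. The probability of a success (heads) is 1/2 in each case.
In this case, n = 3, X = 2, p = 1/2, and q = 1/2. Hence substituting in the
formula gives
$P(2 \text{ heads}) = \frac{3!}{(3 - 2)!\,2!} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^1 = \frac{3}{8} = 0.375$
$P(3) = \frac{5!}{(5 - 3)!\,3!} \left(\frac{1}{5}\right)^3 \left(\frac{4}{5}\right)^2 \approx 0.05$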
4. You need the binomial distribution table. Fit in the proper values and the
answer is 0.375.
iii) n = 20 and p = 0.05. “At least three people” means 3, 4, 5,…, 20. This
problem can best be solved by finding P(0) + P(1) + P(2) and
subtracting from 1.
1 – 0.924 = 0.076
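The complement calculation for "at least three" can be reproduced without a table. The sketch below uses n = 20 and p = 0.05 from part iii); the helper `binomial_pmf` is a hypothetical name defined here so that the block runs on its own.

```python
from math import comb


def binomial_pmf(n, x, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 20, 0.05

# P(0) + P(1) + P(2): at most two people in the sample.
p_at_most_2 = sum(binomial_pmf(n, x, p) for x in range(3))

# "At least three" is the complement of "at most two".
p_at_least_3 = 1 - p_at_most_2

print(round(p_at_most_2, 4), round(p_at_least_3, 4))
```

Exact arithmetic gives roughly 0.9245 and 0.0755. The figures 0.924 and 0.076 in the text come from adding table entries that are each rounded to three decimal places, so a difference in the last digit is to be expected.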
ACTIVITY 2C
iii) Testing four different brands of aspirin to see which brands are
effective.
iv) Testing one brand of aspirin using 10 people to see whether it
is effective.
Feedback to Activity 2C
INPUT
In the use of the binomial distribution, the outcomes must be independent. For
example, in a selection of components from a batch to be tested, each
component must be replaced before the next one is selected. Otherwise, the
outcomes are not independent. However, a dilemma arises because there is a
chance that the same component could be selected again. This situation can be
avoided by not replacing the component and using the hypergeometric
distribution to calculate the probabilities.
mean: $\mu = n \cdot p$
variance: $\sigma^2 = n \cdot p \cdot q$
standard deviation: $\sigma = \sqrt{n \cdot p \cdot q}$
Example 2.4
1. A coin is tossed four times. Find the mean, variance, and standard
deviation of the number of heads that will be obtained.
2. A die is rolled 480 times. Find the mean, variance, and standard deviation
of the number of 2s that will be rolled.
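1. $\mu = n \cdot p = 4 \cdot \frac{1}{2} = 2$
$\sigma^2 = n \cdot p \cdot q = 4 \cdot \frac{1}{2} \cdot \frac{1}{2} = 1$
$\sigma = \sqrt{1} = 1$
2. $\mu = n \cdot p = 480 \cdot \frac{1}{6} = 80$
$\sigma^2 = n \cdot p \cdot q = 480 \cdot \frac{1}{6} \cdot \frac{5}{6} \approx 66.7$
$\sigma = \sqrt{66.7} \approx 8.2$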
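Both parts of Example 2.4 follow the same three formulas, so they can be checked with one small Python function. The name `binomial_summary` is hypothetical and used only for this sketch.

```python
from math import sqrt


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial variable."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)


coin = binomial_summary(4, 1 / 2)    # number of heads in four tosses
die = binomial_summary(480, 1 / 6)   # number of 2s in 480 rolls

print([round(v, 2) for v in coin])
print([round(v, 2) for v in die])
```

The values are rounded before printing because 1/6 cannot be stored exactly as a floating-point number.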
ACTIVITY 2C
1. A statistical bulletin reported that 2% of all births result in twins. If a random
sample of 8000 births is taken, find the mean, variance, and standard deviation
of the number of births that would result in twins.
2. If 2% of automobile carburetors are defective, find the mean, variance, and
standard deviation of the number of defective carburetors in a sample of 500.
FEEDBACK TO ACTIVITY 2C
1
2. 108
INPUT
What is Normal?
It is known that the normal range of systolic blood pressure is 110 to 140. The
normal interval for a person’s triglycerides is from 30 to 200 mg/dl. By measuring
these variables, a doctor can determine if a patient’s vital statistics are within the
normal interval, or if some type of treatment is needed to correct the condition
and avoid future illness. The question then is ‘How does one determine the so-
called normal interval?’
Recall that a continuous variable can assume all values between any two given
values of the variables. Many continuous variables have distributions that are
bell-shaped and are called approximately normally distributed variables. For
example, if we select a random sample of 100 adult women, measure their
heights, and construct a histogram, we get a graph similar to the one in fig (a).
Now if we increase the sample size and decrease the width of the classes, the
histogram will look like the ones shown in fig. (b) and (c). Finally, if it were
possible to measure exactly the heights of all adult females in Malaysia and plot
them, the histogram would approach what is called the normal distribution,
shown in (d).
When the data values are evenly distributed about the mean, the distribution is
said to be symmetrical. See fig (a) below. When the majority of the data values
fall to the left or right of the mean, the distribution is said to be skewed. See fig
(b) and (c)
For the sake of simplicity, a theoretical curve, called the normal distribution curve,
can be used to study many variables that are not perfectly normally distributed
but are nevertheless approximately normal. The mathematical equation for the
normal distribution curve is
$y = \frac{e^{-(X - \mu)^2 / (2\sigma^2)}}{\sigma\sqrt{2\pi}}$
where e ≈ 2.718, π ≈ 3.14, μ = population mean, and σ = population standard
deviation.
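To see how the equation of the normal distribution curve behaves, it can be evaluated directly. The Python sketch below is illustrative only; the function name `normal_curve` and the sample values of the mean and standard deviation are assumptions made for the demonstration.

```python
from math import exp, pi, sqrt


def normal_curve(x, mu, sigma):
    """Height y of the normal distribution curve at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


mu, sigma = 100, 15  # assumed values, for illustration only

peak = normal_curve(mu, mu, sigma)           # the curve is highest at the mean
left = normal_curve(mu - sigma, mu, sigma)   # one standard deviation below
right = normal_curve(mu + sigma, mu, sigma)  # one standard deviation above

print(round(peak, 5), round(left, 5), round(right, 5))
```

The heights one standard deviation below and above the mean are equal, which reflects the symmetry of the curve about the mean.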
Another important aspect in statistics is that the area under the normal
distribution curve is more important than frequencies. Therefore, when the
normal distribution is pictured, the y axis, which indicates the frequencies, is
sometimes omitted.
The shape and position of the normal distribution curve depend on two
parameters, the mean and standard deviation. Each normally distributed variable
has its own normal distribution curve, which depends on the values of the
variable’s mean and standard deviation.
Fig (a) shows two normal distributions with the same mean values but different
standard deviations. The larger the standard deviation, the more dispersed, or
spread out the distribution is. Fig (b) shows two normal distributions with the
same standard deviation but with different
means. These curves have the same shapes but are located at different positions
on the x axis. Fig (c) shows two normal distributions with different means and
different standard deviation.
Since each normally distributed variable has its own mean and standard
deviation, the shape and location of these curves will vary. In practical
applications, one would have to have a table of areas under the curve for each
variable. To simplify this situation, statisticians use what is called the standard
normal distribution.
The values under the curve indicate the proportion of area in each section. For
example, the area between the mean and one standard deviation above or below
the mean is about 0.3413, or 34.13%.
The equation of the standard normal distribution curve is $y = \frac{e^{-x^2/2}}{\sqrt{2\pi}}$.
All normally distributed variables can be transformed into the standard normally
distributed variable by using the formula for the standard score:
$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$ or $z = \frac{X - \mu}{\sigma}$
This is the same formula used in Section 3-4. The use of this formula will be
explained in the next section.
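The standard score and the corresponding area can be computed with the Python standard library instead of a printed table. This is a sketch under the assumption that the area to the left of z is obtained from the error function `math.erf`; the helper names `z_score` and `area_left_of` are hypothetical.

```python
from math import erf, sqrt


def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / sd


def area_left_of(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Area between the mean (z = 0) and one standard deviation above it (z = 1).
area_mean_to_one_sd = area_left_of(1) - area_left_of(0)
print(round(area_mean_to_one_sd, 4))
```

A printed table of areas between 0 and z corresponds to `area_left_of(z) - 0.5` for positive z.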
As stated earlier, the area under the normal distribution curve is used to solve practical
application problems, such as finding the percentage of adult women whose
height is between 5 feet 4 inches and 5 feet 7 inches, or finding the probability
that a new battery will last longer than four years. Hence, the major emphasis of
this section will be to show the procedure for finding the area under the standard
normal distribution curve for any z value. The application will be shown in the
next section. Once the X values are transformed using the preceding formula,
they are called z values. The z value is actually the number of standard deviations
that a particular X value is away from the mean.
For the solution of problems using the normal distribution, a four-step procedure
is recommended with the use of the Procedure Table shown below.
Example 2.5
Find the probability (i.e. the area under the curve) for each question 1 through
17.
Question 1
a. P( Z 0.5)
b. P(Z 1. 0)
c. P(Z 1.53)
d. P( Z 1.998)
e. P( Z 1.86)
Question 2
Question 1
Question 2
ACTIVITY 2D
1. Find the area to the right of Z = 2.43 and to the left of Z = -3.01.
3. Find the z value such that the area under the normal distribution curve
between 0 and the z value is 0.2123.
FEEDBACK TO ACTIVITY 2D
1. 0.0088, or 0.88%
2. i) 0.4898, or 48.98%
ii) 0.9505, or 95.05%
iii) 0.0281, or 2.81%
3. 0.56
INPUT
For example, suppose that the scores for a standardized test are normally
distributed, have a mean of 100, and have a standard deviation of 15. When the
scores are transformed into z values, the two distributions coincide, as shown in
the figure below. (Recall that the z distribution has a mean of 0 and a standard
deviation of 1)
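That the two distributions coincide can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library with the mean of 100 and standard deviation of 15 given above; the score 112 is chosen here purely as an illustration.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)  # distribution of the test scores
standard = NormalDist()                # z distribution: mean 0, sd 1

x = 112                                # illustrative score
z = (x - scores.mean) / scores.stdev   # standard score of x

# Area to the left of x on the score scale and of z on the z scale.
p_from_scores = scores.cdf(x)
p_from_z = standard.cdf(z)

print(z, round(p_from_scores, 4), round(p_from_z, 4))
```

Both areas are the same, which is why a single table for the standard normal distribution is enough for any normally distributed variable.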
Example 2.6
1. If the scores for the test have a mean of 100 and a standard deviation of
15, find the percentage of scores that will fall below 112.
1.
3.
STEP 3 Find the appropriate area. The area obtained from the table is
0.4868, which corresponds to the area between z = 0 and z = -2.22.
STEP 5 To find how many calls will be made in less than 15 minutes,
multiply the sample size (80) by the area (0.0132) to get 1.056.
Hence, 1.056, or approximately one, call will be responded to in
under 15 minutes.
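The last steps of this solution can be reproduced with `statistics.NormalDist`. The sketch assumes z = -2.22, the value whose table area between 0 and z is 0.4868, and the sample size of 80 stated in STEP 5; the variable names are illustrative.

```python
from statistics import NormalDist

z = -2.22    # assumed z value matching the table area 0.4868
calls = 80   # sample size from STEP 5

# Area in the left tail: 0.5000 minus the area between 0 and z.
area_left = NormalDist().cdf(z)

# Expected number of calls answered in under 15 minutes.
expected_calls = calls * area_left

print(round(area_left, 4), round(expected_calls, 2))
```

The tail area agrees with 0.5000 − 0.4868 = 0.0132 to four decimal places, and the product is about one call, as in the text.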
4.
STEP 1 Subtract 0.1000 from 0.5000 to get area under the normal
distribution between 200 and X: 0.5000 – 0.1000 = 0.4000
STEP 3 Substitute in the formula $z = \frac{X - \mu}{\sigma}$ and solve for X:
$1.28 = \frac{X - 200}{20}$, of which the answer is X = 225.6, or approximately 226.
A score of 226 should be used as cutoff. Anybody scoring 226 or
higher qualifies.
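Working backwards from an area to a cutoff score can also be done with the inverse of the cumulative distribution. The sketch below assumes the mean of 200 and standard deviation of 20 used in the substitution above, and that the cutoff leaves an area of 0.1000 in the upper tail, as in STEP 1.

```python
from statistics import NormalDist

mean, sd = 200, 20
top_fraction = 0.10  # area to the right of the cutoff

# z value with an area of 0.9000 to its left (0.4000 between 0 and z).
z = NormalDist().inv_cdf(1 - top_fraction)

# Rearranged standard score formula: X = z * sd + mean.
cutoff = z * sd + mean

print(round(z, 2), round(cutoff, 1))
```

The z value rounds to 1.28, the closest table entry, and the cutoff rounds to 226.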
5. Assume that blood pressure readings are normally distributed; then cutoff
points are as shown in the figure below.
Note that two values are needed, one above the mean and one below the
mean. Find the value to the right of the mean first. The closest z value for an
area of 0.3000 is 0.84. Substituting in the formula X = z · σ + μ, one gets
ACTIVITY 2E
1. The mean income of daily laborers is RM60. Assume the standard deviation
is RM10. If a laborer is selected at random, what is the probability that his
salary is:
a) more than RM70
b) below RM62
c) more than RM54
d) between RM62 and RM72
e) between RM55 and RM80
3. Chip-board partition units are produced with a mean length of 75 mm and
standard deviation 6 mm. Determine the rejection rate if the permissible
length deviations are ±15 mm (assume the lengths to be normally
distributed).
FEEDBACK TO ACTIVITY 2E
1. a) P(X>70) = 0.1587
b) P(X<62) = 0.5793
c) P(X>54) = 0.7257
d) P(62 ≤ X ≤ 72) = 0.3056
e) P(55 ≤ X ≤ 80) = 0.6687
2. a) P(8.9<X<11.1) = 0.3966
b) P(X>15) = 2.24
SELF-ASSESSMENT 2
You are approaching success. Try all the questions in this self-assessment
section and check your answers against those given on the next page. If you face
any problems, consult your instructor. Good luck.
X 3 6 9 12 15
P(X) 4/9 2/9 1/9 1/9 1/9
Construct a probability distribution for the data and draw a graph for the
distribution.
3. The number of cups of coffee a fast food restaurant serves each day.
4. The weight of a rhinoceros
5. The probabilities that a patient will have 0, 1, 2, or 3 medical tests performed
on entering a hospital are 6/15, 5/15, 3/15 and 1/15, respectively.
6. The probabilities that a customer will purchase 0, 1, 2, or 3 books are 0.45,
0.30, 0.15 and 0.10, respectively.
FEEDBACK TO SELF-ASSESSMENT 2
Have you tried the questions??? If “YES”, check your answers now.
2. Yes
3. Discrete
4. Continuous