
Chapter 2

Probability and Statistics Review
2.1 Probability Review
2.1.1 Introduction
The intent of this chapter is to provide a review of the basic probability
and statistical topics needed for the study of reliability. This chapter is
not a substitute for a calculus-based probability and statistics course. This
chapter covers primarily those topics from probability and statistics which
are necessary to understand the statistical treatments used in this book.
It has become increasingly clear that a basic uncertainty exists in the
outcomes of real-world processes. Often it is useful to be able to predict the
likelihood of the occurrence of certain of these outcomes. Probability theory
often employs mathematical models which have the necessary quality of consistency
and also have sufficient flexibility to describe realistic situations. In
addition, probability theory has been used successfully and practically to extend
the uncertainty of basic outcomes to a determination of the likelihood of
complex events. What follows is a sketch of the models of probability theory.
2.1.2 Experiments, Sample Spaces and Events
Definition: An experiment is any process whose possible outcomes can
be identified and whose actual outcome can be observed but not determined
in advance. Although the actual outcome of a particular experimental trial
cannot be determined in advance of the trial, the set of possible outcomes
can be known and that set is called the sample space and denoted by S.
SAMPLE SPACE:
Definition: A sample space of an experiment is the set of possible outcomes
of the experiment. Sample spaces are often classified into discrete sample
spaces, in which there are either a finite number of outcomes or a countably
infinite number of outcomes, and continuous sample spaces, in which there
are a non-countable number of outcomes.
EXAMPLES OF EXPERIMENTS AND SAMPLE SPACES:
Experiment 1: A coin is tossed 3 times
S={HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, 8 possible outcomes.
Experiment 2: A black die and a red die are rolled
S={(1,1),(1,2),...,(1,6),(2,1),...,(6,6)},36 possible outcomes, see Figure 2.1
Figure 2.1: Sample space of the dice experiment
Experiment 3: A spinner is spun and the point on the chord of a circle
is noted.
$S = \{x : x \in (0, 2)\}$, a continuous sample space
Experiment 4: A spinner, as in Figure 2.2, is spun twice and the number
pair noted
S={(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)}
Figure 2.2: Simple Spinner
Experiment 5: The number of phone calls coming into a major telephone
exchange in 1 hour is observed
S={0,1,2,3,...}, a countably infinite sample space
Experiment 6: The time to failure of an electrical component is observed (hours)
$S = \{t : t \in \mathbb{R},\ t > 0\}$, a non-countable infinite sample space
Experiment 7: A sample of 3 components is observed; each component
can be non-defective(N) or defective(D)
S = {NNN, NND, NDN, DNN, NDD, DND, DDN, DDD} ,
EVENTS: Collections of outcomes from S are called events.
Definition: An event is a subset of the outcomes of S. Events are most
often denoted by capital letters, $A, B, \ldots$ or $A_1, A_2, \ldots$. Note that both the
empty set $\emptyset$ and S itself are events, that is, subsets of S. $\emptyset$ might be called
an impossible event since it contains no possible outcomes; likewise, S might
be called a certain event.
EXAMPLES OF EVENTS:
Experiment 1: Event A: the outcomes which result in at least 2 heads
A={HHH,HHT,HTH,THH}
Experiment 2: Event B: the outcomes where the sum of the spots is 8
B={(2,6),(3,5),(4,4),(5,3),(6,2)}, in Figure 2.1, B is the diagonal of points.
Experiment 5: Event C: the number of calls is greater than 5, C={6,7,8,...}
Experiment 6: Event D: the time to failure is between 100 and 200
hours, $D = \{t : 100 < t < 200\}$
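The discrete sample spaces and events listed above can be enumerated mechanically. The following is a minimal sketch in Python, not part of the original text; the variable names (`coin_space`, `dice_space`, `event_a`, `event_b`) are illustrative assumptions.

```python
from itertools import product

# Experiment 1: three coin tosses, each toss is H or T.
coin_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Experiment 2: ordered pairs (black die, red die).
dice_space = list(product(range(1, 7), repeat=2))

# Event A: at least 2 heads.  Event B: the spots sum to 8.
event_a = {o for o in coin_space if o.count("H") >= 2}
event_b = {(black, red) for (black, red) in dice_space if black + red == 8}

print(len(coin_space), len(dice_space))  # sizes of the two sample spaces
print(sorted(event_a))
print(sorted(event_b))
```

Representing events as Python sets means that union, intersection and complement can be written with the `|`, `&` and `-` operators.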
Since events are point sets, the language and operations of set theory are
useful in the discussion of probability. Some basic definitions of set operations
are reviewed below and their relationships to events are indicated in
the following table.
Table
| NOTATION | SET LANGUAGE | EVENT LANGUAGE |
|---|---|---|
| $S$ | Universal Set | Sample Space (Certain Event) |
| $\emptyset$ | Empty Set | Impossible Event |
| $A^c$ | Complement of A (points in S that are not in set A) | Event A does not occur |
| $A \cup B$ | A union B (points that are in Set A or Set B or both) | Event A or Event B or both occur |
| $A \cap B$ | A intersect B (points that are in both Set A and Set B) | Event A and Event B both occur |
| $A \cap B = \emptyset$ | Set A and Set B are disjoint | Event A and Event B are mutually exclusive |
Figure 2.3 shows a set or Venn diagram indicating a sample space with two
sets, A and B, their intersection and their union.
2.1.3 Definition of probability
Usually the ways of describing probabilities are more the ways of examining
the relevance of a probability model to a realistic situation at hand. Although
the theory of probability often aids in the modeling and understanding of the
occurrence of random physical phenomena, probability theory is a mathematical
model constructed by means of the axiomatic method. Viewed in this
way then, the ways of describing probabilities are methods for identifying the
undefined terms of a mathematical theory, hopefully with some relationship
to real phenomena.
Figure 2.3: Venn Diagram with 2 Sets
In the axiomatic approach, probability is defined as a function on the
events.
Definition: If an experiment has sample space S and event A is defined
on S, then P(A) is a real number called the probability of A. The probability
function must follow three axioms:
1) $0 \le P(A) \le 1$, for every event A
2) $P(S) = 1$
3) For any sequence of events $A_1, A_2, \ldots$ which are mutually exclusive
(that is, $A_i \cap A_j = \emptyset$ for $i \ne j$), $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.
Whenever values P(A) satisfy the above axioms, it has been shown that
a complete theory of probability can be developed as a consistent mathe-
matical system. That is true regardless of how the values P(A) are assigned,
other than satisfying the axioms. It is entirely another question to have the
values P(A) model reality. The modeling of reality by a probability system
will be discussed in the next section.
2.1.4 The assignment of probabilities to finite sample spaces
The assignment of probabilities is most easily examined and most easily accomplished
in a physically meaningful way when the sample space S is finite.
Three usual ways of assigning probabilities will be examined in this section.
In the case of finite S, $S = \{O_1, O_2, \ldots, O_n\}$, where $O_i$ represents the ith possible
outcome. With each $O_i$ is associated a value $p_i$, which is assigned so
that:
1. $p_i \ge 0$, for all i;
2. $\sum_{i=1}^{n} p_i = 1$.
In addition, the probability of an event is the sum of the $p_i$ values associated
with the $O_i$ contained in the event subset. That is, $P(A) = \sum_{O_i \in A} p_i$, where the sum
is over the $N_A$ outcomes in the event A. Probabilities assigned
this way satisfy the three axioms and thus have the properties of probability
theory which follow from the axioms. If probability theory is to be a model
for which the probability of an event has physical meaning, it remains to
assign the values $p_i$ in a physically meaningful way. Three ways of doing so
are now outlined.
a) EQUALLY LIKELY OUTCOMES: If there are N possible outcomes
in S and it is determined that the outcomes are equally likely, then
$p_i = \frac{1}{N}$ for all i. This is easily implemented as long as the number of possible
outcomes in S is known and results in an association of probability with likelihood.
It is also possible that some experiments that do not fit the above
criteria can be described in such a way as to use the equally likely method of
assignment. For example, some infinite sample spaces can be viewed in such
a way, although care must be used in doing so and it is possible that such
fitted assignments are not unique. In the case of equally likely outcomes,
$P(A) = \frac{N_A}{N}$, where $N_A$ is as before, the number of outcomes in the event A.
EXAMPLES OF EQUALLY LIKELY PROBABILITY ASSIGNMENTS:
Experiment 1: Probability of $\frac{1}{8}$ is assigned to each possible outcome
Event A: the outcomes that result in at least 2 heads
$P(A) = \frac{N_A}{N} = \frac{4}{8} = \frac{1}{2}$
Experiment 2: Probability of 1/36 is assigned to each possible outcome
Event B: the outcomes where the sum of spots is 8
$P(B) = \frac{N_B}{N} = \frac{5}{36}$
Experiment 4: In this experiment, the 9 possible outcomes are not
intuitively equally likely. However, the equally likely concept can be invoked
to induce an intuitively appealing assignment of probabilities. Also,
to stay within the finite sample space case, assume that the spinner stops
only on the degree marks of the circle. Then, within the 1 area, there
are 180 degrees and in each of the other two, there are 90 degrees. In
this case, each of the 360 degree marks can intuitively be considered as
an equally likely stopping place. In the case of one spin:
$P(1) = \frac{180}{360} = \frac{1}{2}$, $P(2) = \frac{90}{360} = \frac{1}{4}$ and $P(3) = \frac{90}{360} = \frac{1}{4}$.
For the sample space of two spins,
S={(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)}, the possible outcomes
are again not intuitively equally likely. However, by again assuming that the
degree marks are all equally likely and using a simple argument on lengths of
arc, it can be seen that the equally likely assumption induces the following
assignment of probabilities for the points in S:
$P(1,1) = \frac{1}{4}$, $P(1,2) = P(2,1) = P(1,3) = P(3,1) = \frac{1}{8}$,
$P(2,3) = P(3,2) = P(2,2) = P(3,3) = \frac{1}{16}$
For this experiment, an easier method of assignment for the sample space
of two spins will be available as more theory is developed. In any case, the
question of assigned probabilities versus induced probabilities is not always
obvious. As a general rule, it seems more straightforward to assign probabilities
at the most basic or primary level of the experiment, at least for finite
sample spaces. In other cases, as will be shown, it may be easier to assign
probabilities at a more developed stage of the problem.
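The equally likely assignments above can be checked by direct counting. The sketch below is illustrative only: the helper `equally_likely` and the mapping `area` are hypothetical names, and the placement of the three spinner areas on the degree marks (0-179, 180-269, 270-359) is an assumption, since only the sizes of the areas are given in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def equally_likely(event, space):
    """P(A) = N_A / N when all N outcomes of the space are equally likely."""
    return Fraction(len(event), len(space))


# Experiment 2: event B, the spots sum to 8.
dice = list(product(range(1, 7), repeat=2))
p_b = equally_likely([o for o in dice if sum(o) == 8], dice)


def area(mark):
    """Assumed layout: area 1 covers 180 degree marks, areas 2 and 3 cover 90 each."""
    if mark < 180:
        return 1
    return 2 if mark < 270 else 3


# Experiment 4, two spins: 360 * 360 equally likely pairs of degree marks
# induce the probabilities of the 9 outcomes (area, area).
counts = Counter((area(m1), area(m2)) for m1, m2 in product(range(360), repeat=2))
two_spin = {pair: Fraction(n, 360 * 360) for pair, n in counts.items()}

print(p_b)
for pair in sorted(two_spin):
    print(pair, two_spin[pair])
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions in the text.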
b) FREQUENCY OF OCCURRENCE OF OUTCOMES:
If the experiment can be thought of as a random experiment that is repeat-
able, then the probability of an event may be thought of as the relative
frequency of occurrence of the outcomes in the event. The rolling of dice
experiment can obviously be thought of as repeatable and the probability of
event B above, the outcomes where the sum is 8, is the relative frequency of
the sum of 8 in a large number of rolls of the dice. In addition, the relative
frequency assignment satisfies the three axioms and results in a consistent
model for probability.
Note that in the equally likely assignment of probability, a drawback to
the method of assignment is the necessity that there be a finite number of
possible outcomes and that they all be equally likely. In the case of the
relative frequency assignment, a drawback is the necessity that the experiment
be assumed to be repeatable and repeatable under essentially the same
conditions. The relative frequency method is more often used to verify a particular
assignment of probability or to check its reasonableness rather than
be used as an assignment method itself.
The following example indicates some of the ambiguity that might arise
using the relative frequency method of probability assignment. Note, however,
that differences that result between using the relative frequency method
of assignment and using the equally likely method, if both are applicable, are
most often quite small. Some sources recommend that $\frac{N_A}{N}$, where $N_A$ is the
number of occurrences of event A and N is the total number of occurrences,
be used as the assignment as N goes to infinity. The difficulties induced by
questions of an optimal number of occurrences to observe before using the
assignment raise questions as to the practicality of this recommendation.
EXAMPLES OF RELATIVE FREQUENCY PROBABILITY
ASSIGNMENTS:
Experiment 2: In 1000 rolls of the two dice, the event B, sum=8, was
observed 137 times. On this basis P(B) could be chosen to be .137, which is
close to the equally likely assignment of 5/36 (approximately .139). It is possible that for these
dice, P(B) is .137, or that if more rolls were observed, the relative frequency
of event B would be closer to $\frac{5}{36}$. If P(B)=.137 is used as the assignment,
$P(B^c) = .863$ and, of course, it is true that B-complement was observed
on 863 of the 1000 rolls. In any case, assignments on the basis of relative
frequency are consistent with the theory of probability.
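The relative frequency idea can be illustrated by simulation. This is a sketch under stated assumptions rather than a description of the 1000 rolls reported above: the dice are simulated as fair with Python's `random` module, and the function name, seed and number of rolls are arbitrary choices.

```python
import random


def relative_frequency_sum8(n_rolls, seed=0):
    """Relative frequency of the event 'sum of spots is 8' in n_rolls simulated rolls."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    hits = 0
    for _ in range(n_rolls):
        black = rng.randint(1, 6)
        red = rng.randint(1, 6)
        if black + red == 8:
            hits += 1
    return hits / n_rolls


freq = relative_frequency_sum8(100_000)
print(freq, 5 / 36)  # compare with the equally likely assignment
```

With a large number of rolls the relative frequency is expected to settle near 5/36, although any single run will differ from it slightly.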
Experiment 7: In this experiment it is obvious that the 8 outcomes are not equally likely.
If there is a past history of samples of size 3 for the testing of this compo-
nent, then the relative frequency method of assignment could be used in a
straightforward manner. If only data for individual components is available,
the relative frequency assignment can still be used, but more theory of prob-
ability is needed to do so. This example will be treated again in a later
section of this chapter.
c) SUBJECTIVE PROBABILITY ASSIGNMENTS:
Not all experiments are repeatable and not all result in an equally likely set
of possible outcomes. In fact, there are many instances in practice where
neither of these conditions is true and it is still important to attach a value
of likelihood to an event. For example, a space vehicle with a very specific
mission is to be launched and a value for the likelihood that this specific
mission will succeed is required.
In this situation, probabilities can be assigned subjectively and can be
taken to denote the degree of belief in the occurrence of the event. If probabilities
are assigned subjectively, it allows one to incorporate one's experience
with similar experiments or one's engineering judgment into the assignment.
Other types of information can be included as well, including information
that would result in a relative frequency assignment or an equally likely assignment.
The requirements of a subjective probability assignment are that
the probabilities be assigned in a coherent and consistent manner so that the assignment
is consistent in preferences and so that the assignment does not contradict
the three axioms of probability.
There is some concern that this method of assignment will result in prob-
abilities with which reasonable people will fail to agree. This implies that
using the other methods reasonable people will agree with the assignment
and that the other methods do not include any subjectivity. However, it is
the experience of many that subjective elements are present in all methods
of assignment. Thus all methods should include criteria for reasonableness
of judgment. Such criteria for consensus and consistency of agreement have
been suggested for the subjective method of assignment of probability. An
interested reader is referred to Savage (1954) or Lindley (1969).
It seems to be important that the theory of probability include as applica-
tions the many cases of interest where the outcomes of an experiment are not
equally likely nor repeatable. It is still useful that the guidance provided by
probability theory be available. In many cases, incorporation of engineering
judgment, for example, into an experiment is extremely valuable; and, the
subjective definition of probability allows the incorporation of this kind of
information. In fact, many feel that it is neglectful to ignore it.
The subjective assignment of probability allows one to examine and manipulate
probabilistically one's degree of belief in an outcome and to examine
its effect on the degrees of belief in more complex events. In addition, the use
of the subjective assignment widens the range of applicability of probability
theory.
EXAMPLE OF SUBJECTIVE PROBABILITY ASSIGNMENT:
Suppose it is required to state the likelihood of success of the next space shuttle
flight. In this case, success is defined as the lack of a catastrophic failure.
This event will only occur once and yet it is important to indicate the chance
that it will succeed. There have been only 26 shuttle flights so far and there
would be consensus that this number is too small to use the relative frequency
assignment as a subjective probability. There would also be consensus that
the probability of failure is smaller than the $\frac{1}{26} = .038$ that would be used if
the assignment were done this way.
Prior to the Challenger accident in January, 1986, NASA estimated that
the chance of such a catastrophic solid rocket booster failure was roughly
1 in 100,000 or .00001. This estimate was produced for the Department of
Energy for use in a risk analysis. After the Challenger disaster, Richard
Feynman did a subjective analysis by adjusting (for improved technology)
the estimate obtained by using data from 2900 solid rocket booster flights
across all military and NASA programs. He proposed an adjusted estimate
of 1 failure per 50-100 launchings or a .02 to .01 chance of a solid rocket
booster failure.
It is not easy to attach an exact value to the probability of success of
such an event but to attempt to do so is thought by many analysts to be
worth the effort. The analysis in the short description here of the assessment
attempts contains information, not only of the probability values but also of
the difficulty and range of the assessments. The discrepancy among informed
experts in the assignment of probabilities gives additional information about
the precision of an assignment. This information is useful in subsequent analyses.
This topic will be taken up again in a later section on Bayes analysis.
2.1.5 Some theorems of probability
Using only the three axioms, a number of useful properties of probabilities
can be proved. Note that the events $A$ and $A^c$ are mutually exclusive and
that $A \cup A^c = S$.
The law of complement follows:
LAW OF COMPLEMENT:
$P(A^c) = 1 - P(A)$ (2.1)
A second property which follows from the axioms is called the general addition
law. It states that:
GENERAL ADDITION LAW:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (2.2)
Note that, if A and B are mutually exclusive, $A \cap B = \emptyset$, and (2.2) becomes:
$P(A \cup B) = P(A) + P(B)$ (2.3)
Also, notice that the probabilities of more complicated events are often
more easily computed using some of these developed properties of probabil-
ity theory. For example, to illustrate the use of (2.1) and (2.2), consider the
following examples.
EXAMPLES OF USE OF THEOREMS:
Experiment 2: Event C: the outcomes where both dice exhibit even
numbers, $P(C) = \frac{N_C}{N} = \frac{9}{36} = \frac{1}{4}$. Recall Event B: outcomes where sum is 8
$P(B \cup C) = P(B) + P(C) - P(B \cap C) = \frac{5}{36} + \frac{9}{36} - \frac{3}{36} = \frac{11}{36}$
In this case, $P(B \cup C)$ could have been determined as easily by counting.
Experiment 4: Event D: both spins result in the same value,
$P(D) = P(1,1) + P(2,2) + P(3,3) = \frac{1}{4} + \frac{1}{16} + \frac{1}{16} = \frac{6}{16} = \frac{3}{8}$.
Event $D^c$: both spins result in different values, $P(D^c) = 1 - P(D) = \frac{5}{8}$.
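The law of complement and the general addition law can be verified by counting on the dice sample space. The following is a small sketch with hypothetical names (`prob`, `b`, `c`), assuming the equally likely assignment for the 36 outcomes.

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))  # (black, red)


def prob(event):
    """Equally likely assignment: P(A) = N_A / N."""
    return Fraction(len(event), len(space))


b = {o for o in space if sum(o) == 8}                       # sum of spots is 8
c = {o for o in space if o[0] % 2 == 0 and o[1] % 2 == 0}   # both dice even

# General addition law: P(B or C) = P(B) + P(C) - P(B and C)
by_counting = prob(b | c)
by_addition_law = prob(b) + prob(c) - prob(b & c)

# Law of complement: P(B^c) = 1 - P(B)
complement = prob(space - b)

print(by_counting, by_addition_law, complement)
```

Counting the union directly and applying the addition law give the same value, as the example in the text notes.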
2.1.6 Conditional probability and independent events
Most probabilities are functions of two events. The first is the event being
considered and the second, the conditioning event, describes the conditions
under which the first is being considered. In many applications the conditioning
event is ignored or averaged over, resulting in what are denoted
as marginal probabilities. However, it is important to consider conditioning
events and to present probability rules relating to them. Conditional probabilities
(probabilities of conditional events) follow the same three axioms. The
resulting theory is based on a subspace of the sample space which is induced
by the restrictions imposed by the condition.
Suppose that events $A_1$ and $A_2$ are among the subsets of S and suppose
that interest lies in the probability of $A_1$. Now suppose also that there is
information that $A_2$ has occurred. Then the interest is in the probability
of event $A_1$, given that event $A_2$ has occurred. This probability is written
$P(A_1|A_2)$. The information that $A_2$ has occurred is a restriction of S, namely
a restriction to a consideration of only the possible outcomes contained in the
event $A_2$. See Figure 2.4. The shaded area consists of the possible outcomes
of which the set $(A_1|A_2)$ is comprised. In a sense, the event $A_2$ is considered
a new sample space on the basis of the information that $A_2$ has occurred.
This also indicates that the probabilities assigned to the possible outcomes
within event $A_2$ (the new restricted sample space) must be adjusted to sum to
one. Then the conditional probability $P(A_1|A_2)$ is the sum of these adjusted
probabilities assigned to the possible outcomes in the set $(A_1|A_2)$, that is,
the shaded area of Figure 2.4. This can also be achieved by multiplying
$P(A_1 \cap A_2)$ by $\frac{1}{P(A_2)}$ so that:
$P(A_1|A_2) = \frac{P(A_1 \cap A_2)}{P(A_2)}$ (2.4)
Figure 2.4: Venn diagram
EXAMPLES OF CONDITIONAL PROBABILITY:
Experiment 2: S={(1,1),...,(6,6)}. See Figure 2.1.
a) Suppose an observer notes that the black die is a 4. What is the probability
that the sum of spots is 6? The information restricts the new sample space
to the 6 possible outcomes {(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)}. The probability
$\frac{1}{6}$ is assigned to each outcome in the restricted space, whereas the other
30 outcomes in S have probability zero. Since only (4,2) gives a sum of 6,
$P(\text{Sum} = 6 | \text{Black} = 4) = \frac{1}{6}$.
b) Using (2.4) to adjust the probabilities assigned to the outcomes of the
original S,
$P(\text{Sum} = 6 | \text{Black} = 4) = \frac{P(\text{Sum}=6 \cap \text{Black}=4)}{P(\text{Black}=4)} = \frac{1}{36} \Big/ \frac{6}{36} = \frac{1}{6}$.
Note that knowledge of the outcome of the black die changes the probability
that the sum is 6. Without knowledge of the outcome of the black die,
P(Sum=6)=5/36 and with the knowledge that the black die is 4,
$P(\text{Sum} = 6 | \text{Black} = 4) = \frac{6}{36}$. Thus the information that the black die is a 4 slightly
increases the probability that the sum is a 6. The event (Sum=6) is said to
be dependent on the event (Black=4).
An extreme example of dependency of two events is the relationship between
the event (Sum=6) and the event (Black=6). In this case, knowledge
that the black die has the outcome 6 specifies that the sum cannot be 6,
that is, if the black die is a 6, the event (Sum=6) cannot occur. Thus
mutually exclusive, non-empty, events are dependent because the occurrence
of one prohibits the occurrence of the other. Next, consider the relationship
between the events, (Black=4) and (Sum=7).
$P(\text{Sum} = 7 | \text{Black} = 4) = \frac{P(\text{Sum}=7 \cap \text{Black}=4)}{P(\text{Black}=4)} = \frac{1}{6} = P(\text{Sum} = 7)$
The occurrence of the event, (Black=4), does not change the probability
that the sum is 7. That is, the probability of the event, (Sum=7), in the
restricted sample space is the same as the probability of that event in S.
When this occurs, it is said that the two events are independent.
Definition: Two events, $A_1$ and $A_2$, are independent if, and only if,
$P(A_1|A_2) = P(A_1)$ (2.5)
Note that (2.5) is equivalent to the definition of independence:
$P(A_1 \cap A_2) = P(A_1) \cdot P(A_2)$ (2.6)
Note also, from (2.4), that in general, there is a multiplication rule
$P(A_1 \cap A_2) = P(A_1|A_2) \cdot P(A_2)$ (2.7)
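The conditional probabilities and the independence check for the dice example can be reproduced by counting. This is a sketch only; the helper names `cond_prob` and `independent` are assumptions, and the equally likely assignment on the 36 outcomes is taken from Experiment 2.

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))  # (black, red)


def prob(event):
    return Fraction(len(event), len(space))


def cond_prob(a, given):
    """P(A | B) = P(A and B) / P(B), as in (2.4); requires P(B) > 0."""
    return prob(a & given) / prob(given)


def independent(a, b):
    """Product form (2.6): P(A and B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


black4 = {o for o in space if o[0] == 4}
sum6 = {o for o in space if sum(o) == 6}
sum7 = {o for o in space if sum(o) == 7}

print(cond_prob(sum6, black4), prob(sum6))   # conditioning changes the probability
print(cond_prob(sum7, black4), prob(sum7))   # conditioning leaves it unchanged
print(independent(sum6, black4), independent(sum7, black4))
```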
Probability Trees: The additional theory of conditional and independent
events allows a choice for the determination of a sample space for experiments.
It also allows the use of an appealing graphical procedure for outlining
a sample space called a probability tree. Consider experiment 4, where a
spinner is spun twice. On the first spin, the outcome could be 1, 2 or 3, with
probability 1/2, 1/4 and 1/4, respectively. See Figure 2.2 and Figure 2.4. On
the second spin, the same outcomes are possible, with the same probabilities.
If it is assumed that the two spins result in independent outcomes, then the
probabilities of the 2-spin outcomes are the same as those determined earlier.
The diagram in Figure 2.5 is called a probability tree.
Figure 2.5: Tree diagram
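The leaves of such a probability tree can be computed by multiplying branch probabilities along each path. The sketch below assumes, as the text does, that the two spins are independent; the names `one_spin` and `tree` are illustrative.

```python
from fractions import Fraction
from itertools import product

# First-level branches: outcome of one spin and its probability.
one_spin = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

# Each leaf (first, second) gets the product of the probabilities on its path,
# which is the multiplication rule under independence.
tree = {(a, b): one_spin[a] * one_spin[b] for a, b in product(one_spin, repeat=2)}

# Event D: both spins give the same value.
p_same = sum(p for (a, b), p in tree.items() if a == b)

for leaf in sorted(tree):
    print(leaf, tree[leaf])
print("P(same value) =", p_same)
```

The leaf probabilities agree with those induced earlier from equally likely degree marks, which is the easier method of assignment anticipated in section 2.1.4.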
2.1.7 Rule of total probability and Bayes rule
Let $A_1, A_2, \ldots, A_k$ form a partition of S, that is, the $A_i$ are mutually exclusive
and their union is S. For example, $A$ and $A^c$ form a partition of S. Then
if B is any event, that is, B is a set in S, then:
$B = (A_1 \cap B) \cup (A_2 \cap B) \cup \cdots \cup (A_k \cap B)$ (2.8)
Since the $A_i$ are mutually exclusive, the sets $(A_i \cap B)$ are all mutually
exclusive and (2.3) gives the rule of total probability:
$P(B) = P(A_1 \cap B) + P(A_2 \cap B) + \cdots + P(A_k \cap B)$ (2.9)
For example, for any events A and B in S, $B = (A \cap B) \cup (A^c \cap B)$. See
Figure 2.6. Then, $P(B) = P(A \cap B) + P(A^c \cap B)$.
Figure 2.6: Venn diagram
This sets the stage for a probabilistic relationship that has potential for
being extremely useful in applications of probability and thus reliability. This
relationship, called Bayes Rule, is now seen to be a simple extension of the
rule of conditional probability. To be first to discover an obvious relationship,
however, is still quite an achievement. Bayes Rule was first published
by Rev. Thomas Bayes in 1764. It follows from the multiplication rule:
$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$.
Using the right-hand sides of the equation and the rule of total probability,
Bayes Rule is:
$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} = \frac{P(B|A)\,P(A)}{P(A_1)P(B|A_1) + P(A_2)P(B|A_2) + \cdots + P(A_k)P(B|A_k)}$, (2.10)
where $A_1, A_2, \ldots, A_k$ is a partition of S.
Now, examine the potential of Bayes Rule. Suppose P(B|A) and P(A)
are known, where A is one of the $A_i$ in the partition. Also, suppose that
the event B is observed to occur. In this case, the question is asked: Does
the occurrence of B influence the probability of occurrence of A? And, if it
does, how? Bayes Rule tells how. One view of statistics is as a means of
using observation to adjust one's beliefs or prior probability. Bayes Rule
indicates how to make that adjustment under certain restrictions.
Consider the case where an experimenter begins an analysis with a prior
probabilistic belief about the value of a parameter. The prior belief usually
comes from information, in the case of engineering studies, that is based
on engineering judgment of past experience. Bayes Rule is a way of probabilistically
adjusting that prior belief using some observed test results. The
difficulty in applying Bayes Rule often comes from the difficulty in getting
the prior belief into a proper form.
In the form of Bayes Rule of equation (2.10), the P(A) is called the prior
probability and it represents the prior belief about the event A. The P(A|B)
is called the posterior probability and it represents the probability of event
A after observing that the event B has occurred.
EXAMPLE OF THE USE OF BAYES RULE:
A new component is being developed which is essentially the same as a
previously produced component. However, improved materials and processes
have indicated that the new component may exhibit improved performance
over the prototype. Tests with the prototype led engineers to believe that
its production results in components of 3 types. The first type $T_1$ rarely fails
(assume the probability of failure to be .00001); the second type $T_2$ fails 1%
of the time; and, the third type $T_3$ fails 10% of the time. In the experience,
the production results in approximately equal numbers of the 3 types. A
random group of 10 of the new components is tested with no failures. Bayes
Rule can be used to determine the new frequency of the 3 types. Let event
N represent non-defectiveness (not a failure):
$P(T_1) = 0.333$, $P(T_2) = 0.333$, $P(T_3) = 0.333$
$P(N|T_1) = 0.99999$, $P(N|T_2) = 0.99$, $P(N|T_3) = 0.9$
$P(10N|T_1) = 0.9999$, $P(10N|T_2) = 0.904$, $P(10N|T_3) = 0.349$
Using the rule of total probability, $P(10N) = 0.75$. Then
$P(T_1|10N) = 0.444$, $P(T_2|10N) = 0.401$, $P(T_3|10N) = 0.155$
Thus, the probability of $T_1$ has increased from 0.333 to 0.444, the probability
of $T_2$ has increased from 0.333 to 0.401 and the probability of $T_3$ has
decreased from 0.333 to 0.155.
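The Bayes Rule calculation in this example can be sketched in a few lines. The code assumes, as the quoted values of P(10N|T) suggest, that the 10 tested components are all of one type and fail independently, so that the likelihood of 10 non-failures is the single-component value raised to the 10th power. The dictionary names are illustrative, and the priors are taken as exactly 1/3.

```python
# Prior probabilities of the three types (approximately equal numbers of each).
priors = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}

# Probability that a single component of each type fails.
p_fail = {"T1": 0.00001, "T2": 0.01, "T3": 0.10}

n_tested = 10  # components tested, none failed

# Likelihood of observing 10 non-failures given each type (assumed independent trials).
likelihood = {t: (1 - p_fail[t]) ** n_tested for t in priors}

# Rule of total probability (2.9): P(10N) = sum over types of P(T) * P(10N | T).
evidence = sum(priors[t] * likelihood[t] for t in priors)

# Bayes Rule (2.10): posterior = prior * likelihood / evidence.
posterior = {t: priors[t] * likelihood[t] / evidence for t in priors}

print(round(evidence, 3))
for t in sorted(posterior):
    print(t, round(posterior[t], 3))
```

The posterior values come out close to those quoted in the example, and they sum to one because the three types form a partition.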
2.1.8 Random variables and probability distributions
In many applications of probability theory, statistics and reliability especially,
it is important that a numerical value be allocated to an outcome from an
experiment. Notice that in the experiments described earlier, this was not
necessarily the case. In fact, experiment 1 is such an experiment with sample
space:
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
If this occurs, it can be of interest to assign a real number to each outcome
of S. A natural association of outcomes from experiment 1 to real numbers
is the number of heads in each outcome. Then the outcome HHH is asso-
ciated with the number 3; HHT, HTH and THH are associated with 2 and
so on, where each outcome is associated with one, unique real number. If
X=number of heads, then X is a function of the outcomes that associates
each outcome with a unique real number. In this case, X is called a random
variable.
Definition: A random variable is a rule or function that associates
each outcome of the sample space with a real number.
Notice that for a discrete sample space, values of a random variable can
be considered events because each value of a random variable is induced by a
subset of outcomes from S. Thus it is natural, in this case, to assign a proba-
bility to that value of the random variable in the same way that probabilities
were assigned to events. That is, the probability that a random variable has
a particular value is the sum of the probabilities that were assigned to the
outcomes which induce that particular value of the random variable.
EXAMPLE OF A RANDOM VARIABLE: As in Experiment 1, a
coin is tossed 3 times and each of the 8 possible outcomes in S is assigned
probability $\frac{1}{8}$. Let X = number of H's:
$P(X = 0) = P(TTT) = \frac{1}{8}$
$P(X = 1) = P(HTT) + P(THT) + P(TTH) = \frac{3}{8}$
$P(X = 2) = P(HHT) + P(HTH) + P(THH) = \frac{3}{8}$
$P(X = 3) = P(HHH) = \frac{1}{8}$
The association of each value of a random variable with the probability
of that value occurring is called the probability distribution of that random
variable.
Figure 2.7: Outcomes:Sum of spots on a Toss of a Single Pair of Dice
EXAMPLES OF DISCRETE PROBABILITY DISTRIBUTIONS:
a) In the above example of random variable X, the distribution of X is given.
b) Experiment 2: Let Y=sum of the spots showing on the dice. The proba-
bility distribution of Y is illustrated in Figure 2.7.
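Both discrete distributions above can be built by grouping equally likely outcomes according to the value of the random variable. The function `pmf` below is a hypothetical helper written for this illustration, not something defined in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pmf(space, random_variable):
    """Probability distribution of a random variable on an equally likely sample space."""
    counts = Counter(random_variable(outcome) for outcome in space)
    n = len(space)
    return {value: Fraction(count, n) for value, count in sorted(counts.items())}


coin = list(product("HT", repeat=3))        # Experiment 1
dice = list(product(range(1, 7), repeat=2))  # Experiment 2

pmf_x = pmf(coin, lambda outcome: outcome.count("H"))  # X = number of heads
pmf_y = pmf(dice, sum)                                  # Y = sum of the spots

print(pmf_x)
print(pmf_y)
```

Each probability is the sum of the probabilities of the outcomes that induce that value, which for equally likely outcomes reduces to a count divided by the size of the sample space.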
Usually, random variables are denoted by capital letters taken from near
the end of the alphabet. Since a random variable is a function or rule, it takes
on specific values at times and these particular values are usually denoted by
a lower case of the same letter as the random variable. Often of interest is
the interval of values where $X \le x$. The probability that $X \le x$, $P(X \le x)$,
is usually expressed as a function of x:
$F_X(x) = P(X \le x)$.
The function $F_X(x)$ is called the cumulative distribution function (cdf)
of the random variable X. If it is entirely clear that the distribution function
F is the cdf of a particular random variable X, the subscript of F will be
deleted. Cumulative distribution functions have the properties:
a) $\lim_{x \to \infty} F_X(x) = 1$, $\lim_{x \to -\infty} F_X(x) = 0$,
b) $F_X(x)$ is non-decreasing,
c) $P(a < X \le b) = F_X(b) - F_X(a)$.
The cdf may be used to classify random variables into discrete or continuous
types. A random variable X is said to be discrete (continuous) if its cdf
$F_X(x)$ is a discrete (continuous) function. If X is a discrete random variable,
a number $p_X(x_i) = P(X = x_i)$ is associated with the value $x_i$ of the random
variable. The numbers $p_X(x_i)$ satisfy:
a) $p_X(x_i) \ge 0$, for all i
b) $\sum_{\text{all } i} p_X(x_i) = 1$.
Also, $F_X(x) = \sum_{x_i \le x} p_X(x_i)$.
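EXAMPLE OF A DISCRETE CUMULATIVE DISTRIBUTION
FUNCTION: The cdf of the random variable, X=number of heads, in experiment 1:
3 tosses of a coin.
$$F_X(x) = \begin{cases}
0 & \text{for } x < 0 \\
\frac{1}{8} & \text{for } 0 \le x < 1 \\
\frac{4}{8} & \text{for } 1 \le x < 2 \\
\frac{7}{8} & \text{for } 2 \le x < 3 \\
1 & \text{for } 3 \le x
\end{cases}$$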
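The step-function cdf of a discrete random variable follows directly from its probabilities by summation. A minimal sketch, with the distribution of X = number of heads entered by hand and a hypothetical function name `cdf`:

```python
from fractions import Fraction

# Distribution of X = number of heads in 3 tosses of a coin.
pmf_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def cdf(x, pmf=pmf_x):
    """F_X(x) = sum of p_X(x_i) over all values x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))


for x in (-1, 0, 0.5, 1, 2, 3, 10):
    print(x, cdf(x))

# Interval probability from the cdf: P(0 < X <= 2) = F(2) - F(0).
print(cdf(2) - cdf(0))
```

Because the cdf jumps at each possible value, differences of the cdf give probabilities of half-open intervals, which include the right end point but not the left.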
Since a continuous random variable has a non-countable infinity of possible
values, a positive probability cannot be assigned to each possible value.
Thus, the probability that a continuous random variable will assume any
particular one of its values is zero. Instead, a continuous random variable
has a density function, a non-negative function defined for all real values x,
which represents a continuous distribution of mass over the values of the
random variable. The mass over any interval of the random variable is numerically
equal to the probability that the random variable assumes a value
in that interval. The density function f(x) is a function with the property
that this mass is given by the integral of f(x) over the interval. The integral
represents the area under the graph of f(x) between the end points of the
interval. Thus,
$P(a < X < b) = \int_a^b f(x)\,dx$.
Since probabilities are non-negative, the density function must be non-negative
and
_

f(x)dx = P(< X < ) = 1


Also, since F(x) is continuous,F

(x) exists and F

(x) = f(x).
EXAMPLE OF A CONTINUOUS PROBABILITY DISTRIBUTION: The density of a continuous random variable $X$ is given by:
$$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ 4 - 8x & \text{for } 0 < x < 0.5 \\ 0 & \text{for } x > 0.5 \end{cases}$$
The probability that $X$ is between 0 and 0.1 is given by:
$$P(X < 0.1) = \int_0^{0.1} (4 - 8x)\,dx = \left[ 4x - \frac{8x^2}{2} \right]_0^{0.1} = 0.36$$
Also, since
$$F(x) = \int_0^x (4 - 8y)\,dy = 4x - 4x^2, \quad 0 < x < 0.5,$$
$$P(0.2 < X < 0.4) = F(0.4) - F(0.2) = 0.96 - 0.64 = 0.32$$
Figure 2.8: Triangular Density Function
$$P(0.2 < X < 0.4 \mid X > 0.2) = \frac{0.32}{1 - F(0.2)} = \frac{0.32}{0.36} = 0.89$$
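The three probabilities in this example follow directly from the cdf $F(x) = 4x - 4x^2$. A short sketch (the function name `F` mirrors the text; the variable names are illustrative) evaluates them:

```python
def F(x):
    """cdf of the density f(x) = 4 - 8x on (0, 0.5)."""
    if x <= 0:
        return 0.0
    if x >= 0.5:
        return 1.0
    return 4 * x - 4 * x ** 2

p_below_01 = F(0.1)                               # P(X < 0.1)
p_between = F(0.4) - F(0.2)                       # P(0.2 < X < 0.4)
p_conditional = p_between / (1 - F(0.2))          # P(0.2 < X < 0.4 | X > 0.2)

print(round(p_below_01, 4), round(p_between, 4), round(p_conditional, 4))
```

The conditional probability is simply the interval probability rescaled by $P(X > 0.2) = 1 - F(0.2)$.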
Thus far in the review of probability, the structure underlying a probability distribution has been developed from the definitions of experiment, sample space and random variable. Although this structure exists and is sometimes important to consider, it is more often used as a background support that is not detailed. Most of the time, the probability distribution of a random variable is assumed to be of a certain known type, based on past experience or judgment.
Often, a random variable is a mechanism that arises as a result of a process with certain characteristics and these characteristics of the process give rise to the probability distribution of the random variable. In the case of experiments which result in continuous sample spaces, it can often be assumed that the outcomes themselves are values of a continuous random variable and a distribution on that random variable can be specified without the intervening complex structure.
EXAMPLES OF PROBABILITY DISTRIBUTION SPECIFICATIONS:
a) Consider the case in which an extremely large lot (so large that we can consider it to be infinite) is tested until the first defective is found. If the lot has 2% defectives, and once again letting $X$ represent the number of trials until the first defective is found, we have $P(X = x) = 0.98^{(x-1)}(0.02)$. This means that the first $(x-1)$ trials have non-defectives (each with probability 0.98) and then we follow these trials with the first defective (with probability 0.02). The distribution of the random variable $X$ for this case is called the geometric distribution.
In general, the probability that the $k$th success occurs on trial number $x$ is given by
$$\binom{x-1}{k-1} p^{k-1} q^{x-k}\, p = \binom{x-1}{k-1} p^{k} q^{x-k}.$$
To obtain this result, we have applied the binomial distribution to obtain $(k-1)$ successes anywhere among the first $(x-1)$ trials, followed by a success. This distribution is known as the negative binomial or Pascal distribution. If $p = 2\% = 0.02$, the probability that the 4th defective is found on trial 9 is $\binom{8}{3}(0.02)^4(0.98)^5 = 8.1 \times 10^{-6}$. The geometric distribution is a special case of the negative binomial distribution with $k = 1$.
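The negative binomial formula, with the geometric distribution as its $k = 1$ case, can be sketched as follows. The function names are illustrative assumptions, not names used in the text.

```python
from math import comb

def negative_binomial_pmf(x, k, p):
    """P(the k-th success occurs on trial x) = C(x-1, k-1) p^k q^(x-k)."""
    if x < k:
        return 0.0
    return comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

def geometric_pmf(x, p):
    """Geometric distribution: the negative binomial with k = 1."""
    return negative_binomial_pmf(x, 1, p)

# Fourth defective found on trial 9 when 2% of the lot is defective.
print(negative_binomial_pmf(9, 4, 0.02))
# First defective found on trial 3.
print(geometric_pmf(3, 0.02))
```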
b) A random number generator generates a 10-digit number $X$ between 0 and 1. Every value is equally likely to occur. $X$ is assumed to be a continuous random variable. The density of $X$ is then
$$f(x) = \begin{cases} 1 & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Hence we may say that random numbers are uniform on the unit interval. The continuous uniform distribution is given by
$$f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a < x < b \\ 0 & \text{otherwise.} \end{cases}$$
For the special case of the random variable $X$ representing a random number as described above, $a = 0$ and $b = 1$ and
$$f(x) = \begin{cases} 1 & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Figure 2.9 illustrates the distribution of random numbers or equivalently
the distribution of a random variable which is uniform on the unit interval.
Figure 2.9: Distribution of a Uniform Random Variable on the Unit Interval
2.2 Some useful discrete distributions
2.2.1 Binomial distribution
The binomial arises from the Bernoulli trials situation in which there are $n$ independent trials with two outcomes on each trial (arbitrarily called success and failure) and the probability of either outcome is constant from trial to trial. Suppose that $p$ is the probability of success and that $q = 1 - p$ is the probability of failure. Let $X$ = the number of successes in $n$ Bernoulli trials. Then
$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n,$$
where
$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}; \quad n! = n(n-1)(n-2)\cdots(2)(1).$$
For example, consider 10 tosses of a pair of fair dice. We wish to know the probability of exactly three 7s in 10 tosses. Although we have more than two outcomes here, we may arbitrarily say that the other outcome for use with the binomial distribution is a non-7. Hence $p = \frac{6}{36} = \frac{1}{6}$ and $q = \frac{5}{6}$, then
$$P(X = 3) = \binom{10}{3} \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^7 = 120 \left(\frac{1}{216}\right) \left(\frac{78125}{279936}\right) = 0.155$$
In reliability, p would usually represent the probability of a single unit, sub-
system or system surviving t time units.
Example: 8 devices having the exponential distribution with parameter $\theta = 1000$ cycles are placed on test for 500 cycles. What is the probability that a) exactly 4 survive and b) at least 5 survive?
a) $p = P(\text{survive 500 cycles}) = R(500) = e^{-500/1000} = e^{-1/2} = 0.6065$
$q = P(\text{fail within 500 cycles}) = F(500) = 1 - e^{-500/1000} = 0.3935$, thus,
$$P(\text{exactly 4 of 8 survive 500 cycles}) = P(X = 4) = \binom{8}{4}(0.6065)^4(0.3935)^4 = 70(0.1353)(0.0240) = 0.2271$$
b) P(at least 5 of 8 survive 500 cycles) =
$$P(X \ge 5) = \sum_{k=5}^{8} \binom{8}{k}(0.6065)^k(0.3935)^{8-k}$$
$$= 56(0.0821)(0.0609) + 28(0.0498)(0.1548) + 8(0.0302)(0.3935) + 1(0.0183)(1)$$
$$= 0.2800 + 0.2158 + 0.0951 + 0.0183 = 0.6092$$
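Both binomial examples above (the dice tosses and the eight devices on test) can be checked with a few lines of code. This is a sketch under the stated assumptions of the examples; `binomial_pmf` and the variable names are illustrative.

```python
from math import comb, exp

def binomial_pmf(x, n, p):
    """P(X = x) for n Bernoulli trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Exactly three 7s in 10 tosses of a pair of fair dice.
p_three_sevens = binomial_pmf(3, 10, 1 / 6)

# Eight exponential devices with mean life theta, tested for t cycles.
theta = 1000.0
t = 500.0
n = 8
p_survive = exp(-t / theta)          # reliability of one device at t
p_exactly_4 = binomial_pmf(4, n, p_survive)
p_at_least_5 = sum(binomial_pmf(k, n, p_survive) for k in range(5, n + 1))

print(round(p_three_sevens, 4))
print(round(p_survive, 4), round(p_exactly_4, 4), round(p_at_least_5, 4))
```

Because the code carries full floating-point precision, its results can differ in the last digit from hand calculations that round intermediate factors to four places.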
Cumulative values of the binomial distribution are found in most statistics texts and in several computer software packages. That is, values of
$$B(x, n, p) = \sum_{k=0}^{x} \binom{n}{k} p^k q^{n-k}, \quad x = 0, 1, 2, \ldots, n$$
are found. To obtain individual or marginal probabilities, simply subtract two consecutive cumulative values. That is, $P(X = x) = B(x, n, p) - B(x-1, n, p)$. The mean of the binomial variate is $np$ and the variance is $np(1-p)$.
2.2.2 Poisson Distribution
A random variable $X$ with a Poisson distribution takes the values $x = 0, 1, 2, \ldots$ with a probability mass function
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$$
where $\lambda$ is the parameter of the distribution.
We note that, compared to the binomial, there is no $n$ (number of trials) and that the random variable, $X$, can assume all non-negative integer values. Again, the random variable, $X$, counts successes. However, the number of trials is infinite and hence the number of successes is unlimited. The parameter, $\lambda$, is the mean number of successes. Thus, it is necessary to know, estimate or hypothesize the mean in order to obtain probabilities of $x$ successes. In practice, the number of occurrences of any event must be finite, but the Poisson works well even though only the first few values of $x$ have probabilities significantly different from zero. For example, there are many situations where random arrivals are of interest. The number of automobiles arriving at a particular point on a highway or at a particular intersection, buzz-bombs falling on London during World War II and semi-finished product arriving on a conveyor belt to the next stage of manufacturing have all been tracked successfully using the Poisson distribution. The distribution of flaws in materials or goods of a fixed size or area tends to be Poisson.
Example: Suppose that the number of violent storms arriving on the gulf coast of the United States is a Poisson random variable with a mean of 4.2 per year. What is the probability that in a given year there will be
a) no violent storms,
b) exactly 3,
c) four or fewer?
SOLUTION:
a) $P(X = 0) = \frac{e^{-4.2}(4.2)^0}{0!} = e^{-4.2} = 0.015$
b) $P(X = 3) = \frac{e^{-4.2}(4.2)^3}{3!} = \frac{0.015(74.088)}{6} = 0.185$
c) $P(X \le 4) = \sum_{k=0}^{4} \frac{e^{-4.2}(4.2)^k}{k!} = 0.015 + 0.063 + 0.1323 + 0.185 + 0.194 = 0.590$
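The storm probabilities can be verified directly from the Poisson probability mass function. A minimal sketch, with illustrative names:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x! for a Poisson mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 4.2  # mean number of violent storms per year
p_none = poisson_pmf(0, lam)
p_three = poisson_pmf(3, lam)
p_four_or_fewer = sum(poisson_pmf(k, lam) for k in range(5))

print(round(p_none, 3), round(p_three, 3), round(p_four_or_fewer, 3))
```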
In computing we have used the recursive relationship $P(X = x) = P(x, \lambda) = P(x-1, \lambda)\,\frac{\lambda}{x}$. Cumulative tables and computer software for the Poisson are also widely available. In reliability, the Poisson has many uses. One is illustrated here: Suppose that we have a single unit operating with 4 identical units in standby. Suppose further that each unit has the exponential distribution with parameter $\lambda = 0.001$ failures per cycle ($\theta$ = MTBF = $\frac{1}{\lambda}$ = 1000 cycles). We seek the reliability of this system for a mission of 2000 cycles. Assuming that the switch used to turn on the next stand-by unit when the previous unit fails is completely reliable, we solve this problem using the Poisson. The system is reliable as long as we have 4 or fewer failures. This is equivalent, with identical units, to allowing one unit to fail 4 or fewer times in 2000 cycles. Furthermore, we take advantage of the relationship between the Poisson and the exponential distributions. The Poisson counts random occurrences and the exponential measures time between occurrences. The failure rate, $\lambda$, is the same for both. The Poisson mean is the expected number of failures in 2000 cycles, which is $(0.001)(2000) = 2$. Thus, writing $\lambda = 2$ for this mean,
$$P(X \text{ failures} \mid \text{failure mean } \lambda = 2 \text{ per 2000 cycles}) = \frac{e^{-\lambda}\lambda^x}{x!}$$
$$P(\text{4 or fewer failures} \mid \lambda = 2) = e^{-2}\left(1 + \frac{2^1}{1!} + \frac{2^2}{2!} + \frac{2^3}{3!} + \frac{2^4}{4!}\right) = 0.9473.$$
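The standby calculation can be sketched with the recursive relationship $P(x, \lambda) = P(x-1, \lambda)\,\lambda/x$, which avoids computing factorials. The function and variable names below are illustrative, and perfect switching is assumed, as in the text.

```python
from math import exp

def poisson_cdf(x_max, mean):
    """P(X <= x_max), building each term from the previous one."""
    term = exp(-mean)          # P(X = 0)
    total = term
    for x in range(1, x_max + 1):
        term *= mean / x       # P(X = x) = P(X = x - 1) * mean / x
        total += term
    return total

failure_rate = 0.001   # failures per cycle for each unit
mission = 2000         # mission length in cycles
spares = 4             # standby units; the system survives up to 4 failures

reliability = poisson_cdf(spares, failure_rate * mission)
print(round(reliability, 4))
```

Changing `spares` shows how quickly each additional standby unit improves mission reliability for a fixed expected number of failures.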
2.3 More about distributions
2.3.1 Multivariate, marginal and conditional distributions
There is sometimes interest in probability computations involving two or more random variables. The term multivariate probability distribution refers to a joint probability distribution of $r$ random variables. Since the details of distributions of $r$ random variables are exactly the same as for distributions of 2 random variables, only probability distributions of 2 random variables, called joint distributions, will be defined here.
Definition: The joint cumulative probability distribution of random variables $X$ and $Y$ is
$$F_{X,Y}(x, y) = P(X \le x, Y \le y), \quad \text{for all } x, y.$$
The distribution of $X$, called the marginal distribution in this case, can be obtained from the joint distribution of $X$ and $Y$ by:
$$F_X(x) = P(X \le x) = P(X \le x, Y < \infty)$$
If $X$ and $Y$ are both discrete random variables, there is a joint probability mass function of $X$ and $Y$ denoted by $p(x, y) = P(X = x, Y = y)$. If $X$ and $Y$ are both continuous random variables, there exists a joint probability density function $f(x, y)$ defined for all real $x$ and $y$ such that
$$P(x_1 < X < x_2,\; y_1 < Y < y_2) = \int_{y_1}^{y_2} \int_{x_1}^{x_2} f(x, y)\,dx\,dy$$
Definition: Random variables $X$ and $Y$ are independent if, for all $x$ and $y$, $P(X < x, Y < y) = P(X < x)P(Y < y)$. In terms of the joint distribution function $F$ of $X$ and $Y$, $X$ and $Y$ are independent if $F(x, y) = F_X(x)F_Y(y)$. The marginal distribution can be regarded as a distribution obtained by summing across the joint distribution. If the joint distribution is viewed as a 3-dimensional plot with the up-dimension being the value of the probability or the density, then the marginal can be seen as the reflection of the joint distribution on one of the upright sides of the 3-dimensional cube that encloses the joint distribution. Also, the darkness of the shadow of the reflection indicates an increase of the height of the reflected shadow. This is meant as an intuitive representation only; the actual determination of the marginal distribution from the joint distribution is as was outlined earlier.
It is often also of interest to find the distribution of one of two jointly distributed random variables, given a particular value of the other. That is, the distribution of $X$, given that $Y = y$, may be of interest and this distribution is called the conditional distribution of $X$ given $Y = y$. If both $X$ and $Y$ are discrete random variables, this conditional distribution is denoted by
$$p_{X|Y}(x \mid Y = y) = P(X = x \mid Y = y) = \frac{p(x, y)}{p_Y(y)}$$
If both $X$ and $Y$ are continuous random variables, the density of the conditional distribution can be written in terms of the joint and marginal densities:
$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, \quad \text{if } f_Y(y) > 0$$
In terms of the geometrical description, the conditional distribution is the adjusted distribution on a slice through the joint distribution, where the adjusting is to make the probabilities sum or integrate to 1.
EXAMPLES OF JOINT, MARGINAL AND CONDITIONAL DISTRIBUTIONS:
a) Consider the situation where 2 samples of 5 units are taken from a large group of units of which 1% are defective and 99% are non-defective. Let $X$ = number of samples of the 2 that contain all non-defective units and let $Y$ = number of samples that contain exactly 1 defective unit and 4 non-defective units. It can be noticed that, although the samples are independent with respect to the number of defectives in each sample, the random variables $X$ and $Y$ are dependent. The values of the random variables are determined by the number of samples that contain a specified number of defective units and so, since there are only 2 samples, $X$ and $Y$ are dependent. For example, if it is known that $X = 2$, it is clear that $Y$ must be 0. Further, $P(Y = 0 \mid X = 2) = 1$. The joint probabilities are given in Table 2.1 and the marginal distributions are given by the marginals of the table. The computations in the table are based on the rounded values P(5 non-def) = $(0.99)^5 = 0.95$ and P(4 non-def) = $5(0.99)^4(0.01) = 0.048$.
              Y = 0     Y = 1     Y = 2     Total
X = 0         0         0.0002    0.0023    0.0025
X = 1         0.0038    0.0912    0         0.0950
X = 2         0.9025    0         0         0.9025
Total         0.9063    0.0914    0.0023    1
Table 2.1: Joint Distribution of X and Y
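The joint table can be regenerated from the sampling description. The sketch below treats each sample as falling into one of three categories (all good, exactly one defective, anything else) and uses unrounded category probabilities, so its entries differ slightly from those in Table 2.1; all names are illustrative.

```python
from math import comb

p_defective = 0.01
units = 5      # units per sample
samples = 2    # number of samples

p_all_good = (1 - p_defective) ** units
p_one_bad = comb(units, 1) * p_defective * (1 - p_defective) ** (units - 1)
p_other = 1 - p_all_good - p_one_bad   # two or more defectives in a sample

def joint(x, y):
    """P(X = x, Y = y): x all-good samples and y one-defective samples."""
    z = samples - x - y                 # samples in the remaining category
    if z < 0:
        return 0.0
    ways = comb(samples, x) * comb(samples - x, y)
    return ways * p_all_good ** x * p_one_bad ** y * p_other ** z

table = [[joint(x, y) for y in range(3)] for x in range(3)]
marginal_x = [sum(row) for row in table]
marginal_y = [sum(table[x][y] for x in range(3)) for y in range(3)]
cond_y0_given_x2 = table[2][0] / marginal_x[2]   # P(Y = 0 | X = 2)

for x, row in enumerate(table):
    print(x, [round(v, 4) for v in row], round(marginal_x[x], 4))
print("Total", [round(v, 4) for v in marginal_y])
```

The row and column sums give the marginal distributions, and dividing a cell by its row total gives the conditional probability of $Y$ given $X$.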
b) Consider the joint density
$$f(x, y) = \begin{cases} \frac{4(1+xy)}{5} & \text{for } 0 < x, y < 1 \\ 0 & \text{otherwise.} \end{cases}$$
It follows that
$$f(x) = \begin{cases} \frac{4+2x}{5} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases} \qquad f(y) = \begin{cases} \frac{4+2y}{5} & \text{for } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$
$X$ and $Y$ are not independent. The dependence of $X$ and $Y$ is true although $f(y \mid x = 0.5) = \frac{4+2y}{5} = f(y)$. Thus, it can occur that a conditional adjusted slice through the joint density can result in the marginal projection, but for independence it is necessary for the condition $f(y \mid X = x) = f(y)$ to hold for all values of $x$. The contours of the joint density of $X$, $Y$ are given in Figure 2.10.
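The claims in example b) can be checked numerically. The sketch below integrates the joint density with a simple midpoint rule (an illustrative choice, not a method from the text) and compares the conditional density with the marginal at two values of $x$.

```python
def joint_density(x, y):
    """f(x, y) = 4(1 + xy)/5 on the open unit square, 0 elsewhere."""
    return 4 * (1 + x * y) / 5 if 0 < x < 1 and 0 < y < 1 else 0.0

def marginal(v):
    """Marginal density (4 + 2v)/5 on (0, 1); the same form holds for X and Y."""
    return (4 + 2 * v) / 5 if 0 < v < 1 else 0.0

def conditional_y_given_x(y, x):
    """f(y | X = x) = f(x, y) / f_X(x)."""
    return joint_density(x, y) / marginal(x)

def midpoint_integral(func, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of func over (a, b)."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

x0 = 0.3
numeric_marginal = midpoint_integral(lambda y: joint_density(x0, y), 0, 1)
print(numeric_marginal, marginal(x0))

# At x = 0.5 the conditional density matches the marginal; at x = 0.9 it does not.
print(conditional_y_given_x(0.7, 0.5), marginal(0.7))
print(conditional_y_given_x(0.7, 0.9), marginal(0.7))
```

The agreement at $x = 0.5$ and disagreement at $x = 0.9$ illustrates why the condition must hold for every $x$ before independence can be claimed.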
Figure 2.10: Contours of Joint Density Function
2.3.2 Empirical distributions
a) Histogram: Let the range of values of a particular random variable $X$ be partitioned into intervals of equal length. Also, let the probability that the values of $X$ lie in any interval be interpreted as a relative frequency. If a group of $X$ values is now observed, then the observed relative frequencies in the groups can be used to represent the probabilities and an approximate distribution of $X$ can be displayed visually. Such a display is called an empirical distribution or histogram and is often used to examine if the probability distribution of $X$ is of a certain type.
To draw a histogram, let the horizontal axis represent the values of the
random variable and draw the boundaries of the intervals, called classes. Let
the vertical axis represent the relative frequencies. It is easiest to choose the
classes of equal length so that the heights of the rectangle or bar over the
classes are proportional to the relative frequencies.
The empirical distribution can show interesting features of the values that
might otherwise be unnoticed. Also, from the display one can more easily
notice the range of the values, the shape of the concentration of the values
and whether that shape is symmetric or skewed, whether there are gaps in
the values and whether there are outliers, that is, values that are markedly
different from the others.
b) Stem and Leaf Plot: Another method of illustrating the empirical distribution, which has the additional benefit of preserving the actual individual values (which are lost in the use of the histogram method), is the stem and leaf plot. The observed values of the random variable are considered to be of two parts, a stem, consisting of one or two of the leading digits, and a leaf, consisting of the remaining digits. The digits then are used to sort the values into groups of numerical order and at the same time are used to display the values and their frequency.
EXAMPLE OF A STEM AND LEAF PLOT: Table 2.2 presents
25 observed values of the time to failure of a particular component in hours:
41.3 32.7 65.4 53.4 27.3
29.8 21.3 52.6 35.6 26.5
75.1 31.2 57.7 20.2 45.8
46.9 55.2 39.8 28.9 24.3
22.8 36.5 44.8 33.4 21.7
Table 2.2: Time to Failure for 25 Components
In this case, the first digit can be chosen for the stem and the remaining two digits for the leaf. This choice results in 6 stems (2 through 7) and the plot in Figure 2.11. The number of stems is chosen for viewing ease and there is some personal choice. In general, too few stems result in a lack of discrimination in the view and too many stems result in too much noise in the view. Most often the advice is given that an effective view is obtained when between 5 and 20 stems are used. If there are too few stems with a natural choice for stems and leaves, it is possible to increase the number of stems while keeping the same stem and leaf choice. For this, each stem is used twice, where leaves beginning with 0, 1, 2, 3, 4 are recorded on the first line of the stem and leaves beginning with 5, 6, 7, 8, 9 are on the second. Then the plot is said to have 2 lines per stem. This procedure can be extended and the next natural choice for extension is to 5 lines per stem. In the example of Figure 2.11, there may be some who believe that there are too few stems to get a good view of the empirical distribution. In this case, one would use 2 lines per stem and redo the figure.
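A stem and leaf plot of the Table 2.2 data can be sketched in a few lines. The function below is an illustrative helper, not part of the text; it takes the tens digit as the stem, the remaining two digits as the leaf, and supports 1, 2 or 5 lines per stem.

```python
times = [
    41.3, 32.7, 65.4, 53.4, 27.3,
    29.8, 21.3, 52.6, 35.6, 26.5,
    75.1, 31.2, 57.7, 20.2, 45.8,
    46.9, 55.2, 39.8, 28.9, 24.3,
    22.8, 36.5, 44.8, 33.4, 21.7,
]

def stem_and_leaf(values, lines_per_stem=1):
    """Group sorted values by (stem, line within stem); leaves keep the data."""
    rows = {}
    for value in sorted(values):
        tenths = round(value * 10)         # integer tenths avoid float drift
        stem, leaf = divmod(tenths, 100)   # stem = tens digit, leaf = two digits
        line = leaf * lines_per_stem // 100
        rows.setdefault((stem, line), []).append(f"{leaf / 10:.1f}")
    return rows

def show(rows):
    for (stem, _line), leaves in sorted(rows.items()):
        print(f"{stem} | {' '.join(leaves)}")

show(stem_and_leaf(times))
print()
show(stem_and_leaf(times, lines_per_stem=2))
```

The second call splits each stem into a low line (leaves 0.0 to 4.9) and a high line (leaves 5.0 to 9.9), which is the 2 lines per stem layout described above.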
c) Box Plot: Another display of the values of a random variable that provides visual shape information about the relative frequency distribution is called a box plot. The box plot also provides clearer information about the location or center of the distribution, about the dispersion and skewness of the distribution and about the tails of the distribution. Box plots, because of the information about the tails, can be useful in determining whether an extreme value or values should be considered outliers from the distribution. An outlier would be a value that should not be considered as an observation from the distribution in question. The construction of box plots will be discussed in a later section when more of the necessary statistical tools have been defined.
d) Other Empirical Distribution Methods: There are a number of other important methods for examining empirical distributions, especially for the distributions which typically arise in reliability studies. Because several of these are so closely related to reliability distributions, their presentation will be postponed until more statistical and reliability tools have been defined.
Figure 2.11: Stem and leaf plot
2.3.3 Transformation of variables
In reliability engineering, it is often necessary to transform a variable in order to understand the relationship between densities. If we are given a density function, say $f(x)$, and we wish to know the density of some function of $X$, say $Y = u(X)$, then we obtain this density, $g(y)$, using a variable transformation. This is a straightforward procedure which is usually taught in any calculus sequence. For example, a transformation of two variables is made to develop the relationship between rectangular and polar coordinates. In our treatment of reliability, we will always transform only one variable. Shown below are two different means of getting the same result. Method 1 uses the Jacobian determinant and Method 2 uses the definition of the cumulative distribution function to obtain the density of the new variable. In the example shown, we are given the exponential density as $f(t)$ and we wish to know the density of the new random variable, $Y = \frac{1}{T}$. In the Jacobian method, we first solve for $T$ in terms of $Y$, then substitute this expression for $t$ in the density of $T$ (the exponential) and multiply the result by the absolute value of the determinant of the single-element matrix, $\frac{dt}{dy}$. If the cumulative distribution function of the original variable can be easily obtained, Method 2 becomes quite simple.
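Method 1. Given $f(x)$, suppose that $Y = u(X)$. Then $X = w(Y)$. Let $w$ be a monotone function. Then
$$g(y) = f[w(y)] \left| \frac{dx}{dy} \right|$$
Example:
$$f(t) = \begin{cases} \frac{1}{\theta} e^{-t/\theta} & \text{for } t > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Let $Y = \frac{1}{T}$. Find $g(y)$.
$$T = \frac{1}{y}, \quad g(y) = \frac{1}{\theta} e^{-\frac{1}{\theta y}} \left| -\frac{1}{y^2} \right| = \frac{1}{y^2 \theta} e^{-\frac{1}{\theta y}}, \quad y > 0.$$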
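Method 2. Obtain the cdf of $Y$, $G(y)$, and then differentiate to obtain $g(y)$.
EXAMPLE
$$f(t) = \begin{cases} \frac{1}{\theta} e^{-t/\theta} & \text{for } t > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Let $Y = \frac{1}{T}$. Find $g(y)$. Since $T > 0$, the event $Y \le y$ occurs exactly when $T \ge \frac{1}{y}$, so for $y > 0$
$$G(y) = P(Y \le y) = P\left(T \ge \frac{1}{y}\right) = \int_{1/y}^{\infty} \frac{1}{\theta} e^{-t/\theta}\,dt = e^{-\frac{1}{\theta y}}$$
therefore,
$$g(y) = \frac{d}{dy}\left( e^{-\frac{1}{\theta y}} \right) = \frac{1}{y^2 \theta} e^{-\frac{1}{\theta y}}$$
as before.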
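The agreement between the two methods can be checked numerically. Because $T > 0$, the event $Y \le y$ is the event $T \ge 1/y$, so the cdf of $Y$ is $G(y) = e^{-1/(\theta y)}$; the sketch below differentiates it numerically and compares the result with the Jacobian formula. The value of `theta` is an arbitrary illustrative choice.

```python
from math import exp

theta = 2.0  # assumed mean life, chosen only for illustration

def G(y):
    """cdf of Y = 1/T for exponential T: P(T >= 1/y) = exp(-1/(theta*y))."""
    return exp(-1.0 / (theta * y)) if y > 0 else 0.0

def g(y):
    """Density from the Jacobian method: f(1/y) * |dt/dy|."""
    return exp(-1.0 / (theta * y)) / (theta * y ** 2) if y > 0 else 0.0

def numeric_derivative(func, y, h=1e-6):
    """Central-difference approximation of func'(y)."""
    return (func(y + h) - func(y - h)) / (2 * h)

for y in (0.2, 1.0, 3.0):
    print(y, g(y), numeric_derivative(G, y))
```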
2.4 Elements of statistics
2.4.1 Introduction
Usually the result of an experiment or test is a set of observations, mea-
surements or data. One view of statistics is that it is the science of making
inferences about a population based on an analysis of sample data from that
population. The process of taking a sample of data is important and basically
one wishes that the sample be representative of the population. An impor-
tant method of selecting a sample is random sampling and most statistical
techniques are based on the assumption of a random sample. In this section,
only a brief outline of the techniques that are used in statistics to examine
the data once it is obtained will be presented. These techniques will include
only computational techniques, as the graphical techniques will be presented
elsewhere. The techniques will include the summarization of the data by
computation of estimates that represent the characteristics of the population
from which the data are a sample and the making of statistical inferences
from the summarized data. Distributions that are useful in reliability are
treated in Chapter 3, but some discussion of sampling distributions, as
needed, will be provided in this chapter.
2.4.2 Moments and parametric estimation
The outline begins with the definition of the characteristics of the population (or its representative distribution) called moments or expected values and designated by the Greek letter $\mu_i$ or $E(x^i)$. Characteristics of the population are called parameters and the moments represent certain of these characteristics, such as the center of gravity (first moment or mean), the dispersion (the second moment about the mean), the skewness (a function of the third moment) or the kurtosis (a function of the fourth moment). Also, any function of the data that does not depend on unknown parameters is called a statistic. The statistics that are usually used to estimate the moments are presented here with a short description of some of the properties of these estimates. First, the definitions of moments:
Definition: The $i$th moment of a distribution represented by a density $f(x)$ is
$$\mu_i = \int x^i f(x)\,dx = E(x^i). \qquad (2.11)$$
The first moment, the center of gravity of the distribution or the measure of its central tendency, is usually denoted by $\mu$, the population mean, and estimated by the sample arithmetic mean if the data are somewhat symmetrically mound-shaped. The sample mean is denoted by:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad (2.12)$$
where $n$ is the number of observations in the sample. The measure of dispersion of the distribution is usually characterized by the second moment about the mean, $E(x^2) - \{E(x)\}^2 = \sigma^2$, called the population variance. The sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \qquad (2.13)$$
and the square root of the variance is the standard deviation. Note that the denominator of the sample variance is $n-1$. Both of these sample measures, $\bar{X}$ and $s^2$, have the property of being unbiased, which is the property that the mean of the sample measure over all possible samples is the characteristic itself. It is the divisor $n-1$ that allows this property for $s^2$. The sample skewness is measured by:
$$\sqrt{b_1} = \frac{\sum (x_i - \bar{x})^3 / n}{\left[ \sum (x_i - \bar{x})^2 / n \right]^{3/2}}$$
or by
$$\sqrt{b_1} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 \qquad (2.14)$$
The distribution skewness, usually the value $\sqrt{\beta_1} = \frac{\mu_3}{\sigma^3}$, indicates the direction and length of the tail of the distribution. Here $\mu_3$, and $\mu_2$ and $\mu_4$ below, denote moments about the mean. A negative value of skewness indicates that the data tail off to the left, a value near zero indicates that the data tend to look symmetric and a positive value indicates that the data tail off to the right. The sample kurtosis is measured by:
$$b_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} + 3 \qquad (2.15)$$
The distribution kurtosis, denoted by $\beta_2 = \frac{\mu_4}{\mu_2^2}$, indicates the heaviness of the tails of the distribution. The normal distribution has a kurtosis value of 3.
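The sample statistics of equations (2.12) to (2.15) can be computed directly. The sketch below applies them to the 25 failure times of Table 2.2; the moment-ratio skewness uses the conventional 3/2 power in the denominator, and all variable names are illustrative.

```python
import statistics
from math import sqrt

times = [
    41.3, 32.7, 65.4, 53.4, 27.3,
    29.8, 21.3, 52.6, 35.6, 26.5,
    75.1, 31.2, 57.7, 20.2, 45.8,
    46.9, 55.2, 39.8, 28.9, 24.3,
    22.8, 36.5, 44.8, 33.4, 21.7,
]
n = len(times)

mean = sum(times) / n                                   # equation (2.12)
s2 = sum((x - mean) ** 2 for x in times) / (n - 1)      # equation (2.13)
s = sqrt(s2)

# Skewness as a ratio of sample central moments.
m2 = sum((x - mean) ** 2 for x in times) / n
m3 = sum((x - mean) ** 3 for x in times) / n
skew_moment = m3 / m2 ** 1.5

# Small-sample adjusted skewness, equation (2.14).
skew_adjusted = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in times)

# Sample kurtosis, equation (2.15).
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    * sum(((x - mean) / s) ** 4 for x in times)
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    + 3
)

print(round(mean, 3), round(s2, 3), round(s, 3))
print(round(skew_moment, 3), round(skew_adjusted, 3), round(kurtosis, 3))
```

A positive skewness here is consistent with the long right tail produced by the two largest failure times.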
Examples: Means and Variances
Recall that the mean of a discrete-valued random variable is given by
$$\mu = E(X) = \sum_{\text{all } x} x\,P_X(x).$$
Thus, we see that the mean is a weighted average: the sum of the values of $X$ weighted by the probability that each value occurs. The mean is a measure of central tendency. It is the first moment about the origin. The variance of a discrete-valued random variable, $X$, is given by
$$\sigma^2 = \mathrm{VAR}(X) = E[(x - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 P_X(x)$$
or alternatively,
$$\sigma^2 = \mathrm{VAR}(X) = E(x^2) - [E(x)]^2 = E(x^2) - \mu^2 = \sum_{\text{all } x} x^2 P(x) - \left[ \sum_{\text{all } x} x P(x) \right]^2$$
The variance is a measure of dispersion or spread. It, in essence, represents the average squared distance of all values from the mean. The standard deviation, $\sigma$, is the square root of the variance.
Example: For the experiment in which two dice were tossed and $X$ represented the sum of the upturned faces,
$$E(X) = 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36) + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36) = 7$$
$$\mathrm{VAR}(X) = 2^2(1/36) + 3^2(2/36) + 4^2(3/36) + 5^2(4/36) + 6^2(5/36) + 7^2(6/36) + 8^2(5/36) + 9^2(4/36) + 10^2(3/36) + 11^2(2/36) + 12^2(1/36) - 7^2$$
$$= 1974/36 - 49 = 54.83 - 49.00 = 5.83$$
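The dice calculation is easy to reproduce exactly by enumerating the 36 equally likely outcomes. A minimal sketch with illustrative names:

```python
from fractions import Fraction
from itertools import product

# pmf of X = sum of the faces of two fair dice.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

mean = sum(x * p for x, p in pmf.items())               # E(X)
second_moment = sum(x * x * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean ** 2                    # E(X^2) - [E(X)]^2

print(mean, second_moment, variance, float(variance))
```

Working in exact fractions shows that the variance is 35/6, which rounds to the 5.83 obtained above.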
The mean of a continuous-valued random variable is given by
$$\mu = E(X) = \int_{\text{all } x} x f(x)\,dx$$
As in the discrete case, we see that the mean is a weighted average; it is the first moment about the origin. The variance of a continuous-valued random variable, $X$, is given by
$$\sigma^2 = \mathrm{VAR}(X) = E[(X - \mu)^2] = \int_{\text{all } x} (x - \mu)^2 f(x)\,dx$$
or alternatively,
$$\sigma^2 = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2 = \int_{\text{all } x} x^2 f(x)\,dx - \left[ \int_{\text{all } x} x f(x)\,dx \right]^2$$
As before, the standard deviation, $\sigma$, is the square root of the variance.
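Example: For the density function given earlier,
$$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ 4 - 8x & \text{for } 0 < x < 0.5 \\ 0 & \text{for } x > 0.5 \end{cases}$$
$$E(X) = \int_0^{0.5} x(4 - 8x)\,dx = \left[ 2x^2 - \frac{8x^3}{3} \right]_0^{0.5} = \frac{1}{2} - \frac{1}{3} = \frac{1}{6} = 0.1667$$
$$V(X) = \int_0^{0.5} x^2(4 - 8x)\,dx - \left( \frac{1}{6} \right)^2 = \left[ \frac{4x^3}{3} - 2x^4 \right]_0^{0.5} - \left( \frac{1}{6} \right)^2 = \frac{1}{6} - \frac{1}{8} - \frac{1}{36} = \frac{1}{72} = 0.0139$$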
2.4.3 Samples, statistics and sampling distributions
Random Sample: A sample $x_1, x_2, \ldots, x_n$ is said to be a random sample from a population if (1) each item in the population has an equal likelihood of being a part of the sample and (2) the selection of any item in the sample does not influence the selection of any other item and is not influenced by the selection of any other item in the sample.
Characteristics of the sample are called statistics. Examples of statistics are the mean, the median, the range and the standard deviation (of the sample values). Characteristics of the population are called parameters. The word parameter also has other meanings in mathematics and statistics. Examples of parameters are the population mean, the population variance and the population skewness. Statistics are usually, by convention, assigned Latin letters and parameters are assigned Greek letters, e.g., $\bar{x}$ and $s$ are statistics; $\mu$ and $\sigma$ are parameters.
Parameter Estimation. Statistics are used to estimate parameters. For example, the sample average is used to estimate the population mean and the sample standard deviation is used to estimate the population standard deviation. $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ is used to estimate $\mu$ and
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n - 1}}$$
is used to estimate $\sigma$. In general, the estimator of a parameter $\theta$ is given the symbol $\hat{\theta}$. $\hat{\theta}$ is called theta estimate or theta hat.
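Desirable Properties of Estimators
1) Unbiasedness: An estimator $\hat{\theta}$ is said to be an unbiased estimate of $\theta$ if $E(\hat{\theta}) = \theta$. $\bar{X}$ is an unbiased estimate of $\mu$. When the distribution is symmetric, the median (and, for a unimodal distribution, the mode) coincides with $\mu$, and the sample median is then also an unbiased estimate of $\mu$. $s^2$ is an unbiased estimate of $\sigma^2$. The bias of an estimator $\hat{\theta}$ is given by $E(\hat{\theta}) - \theta$.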
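2) Small Variance: The variance of an estimator $\hat{\theta}$ is given by
$$\sigma^2(\hat{\theta}) = \mathrm{VAR}(\hat{\theta}) = E\left[ \left( \hat{\theta} - E(\hat{\theta}) \right)^2 \right] = \sum_{\text{all } \hat{\theta}} \left[ \hat{\theta} - E(\hat{\theta}) \right]^2 P(\hat{\theta})$$
or
$$\int_{\text{all } \hat{\theta}} \left[ \hat{\theta} - E(\hat{\theta}) \right]^2 f(\hat{\theta})\,d\hat{\theta}$$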
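Obviously, if $\hat{\theta}$ has a small variance then the spread around its mean is small and any selected value of $\hat{\theta}$ will be close to the mean of $\hat{\theta}$, and if the mean of $\hat{\theta}$ is close to $\theta$, then $\hat{\theta}$ is a good estimator. The mean square error (MSE) combines the variance and bias of an estimator as a single measure.
$$\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = E(\hat{\theta}^2 - 2\theta\hat{\theta} + \theta^2) = E(\hat{\theta}^2) - 2\theta E(\hat{\theta}) + \theta^2$$
We now add and subtract the same quantity, $[E(\hat{\theta})]^2$:
$$E(\hat{\theta} - \theta)^2 = E(\hat{\theta}^2) - 2\theta E(\hat{\theta}) + \theta^2 - [E(\hat{\theta})]^2 + [E(\hat{\theta})]^2.$$
Rearranging,
$$E(\hat{\theta} - \theta)^2 = E(\hat{\theta}^2) - [E(\hat{\theta})]^2 + [E(\hat{\theta})]^2 - 2\theta E(\hat{\theta}) + \theta^2$$
$$\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = \mathrm{VAR}(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2$$
or
$$\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = \mathrm{VAR}(\hat{\theta}) + [\text{bias}]^2.$$
Thus the MSE of an estimator $\hat{\theta}$ combines information about its variance and bias. The better the estimator, the smaller the MSE. If the estimator is required to be unbiased, that is, $E(\hat{\theta}) = \theta$, then $E(\hat{\theta} - \theta)^2 = V(\hat{\theta})$. In this case, minimizing MSE yields the $\hat{\theta}$ that is the minimum variance unbiased estimator (MVUE), a desirable property for an estimator.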
When we have two or more estimators of the same parameter, one way of comparing them pairwise is to calculate their relative efficiency. Given two estimators of $\theta$, $\hat{\theta}_1$ and $\hat{\theta}_2$, the efficiency of $\hat{\theta}_2$ relative to $\hat{\theta}_1$ is given by
$$\text{Relative efficiency} = \frac{\mathrm{VAR}(\hat{\theta}_1)}{\mathrm{VAR}(\hat{\theta}_2)}$$
Example: It can be shown that the variance of the sample median for a normal distribution is, for large $n$, $\mathrm{VAR}(\text{med}) = (1.2533)^2 \frac{\sigma^2}{n}$. We know that the variance of the sample mean $\bar{X}$ is $\frac{\sigma^2}{n}$. Thus, the efficiency of the sample median relative to the sample mean is
$$\frac{\mathrm{VAR}(\bar{X})}{\mathrm{VAR}(\text{med})} = \frac{\sigma^2/n}{(1.2533)^2 \sigma^2/n} = \frac{1}{1.2533^2} = 0.6366$$
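The large-sample efficiency of the median can be illustrated by simulation. The following is a rough Monte Carlo sketch, not a derivation: the sample size, number of replications and seed are arbitrary assumptions, and the estimated ratio will only approximate the theoretical value.

```python
import random
import statistics

random.seed(0)          # fixed seed so the run is repeatable
n = 101                 # observations per sample (odd, so the median is one value)
replications = 4000     # number of simulated samples

means, medians = [], []
for _ in range(replications):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(statistics.median(sample))

# Efficiency of the sample median relative to the sample mean.
efficiency = statistics.variance(means) / statistics.variance(medians)
theoretical = 1 / 1.2533 ** 2

print(round(efficiency, 3), round(theoretical, 4))
```

In words, the median needs roughly 1/0.64, or about 1.6 times, as many normal observations as the mean to reach the same precision.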
3) Consistency: Let $\hat{\theta}_n$ be the estimate of $\theta$ after $n$ observations have been taken. The estimator $\hat{\theta}_n$ is said to be a consistent estimator of $\theta$ if, for any positive number $\epsilon$,
$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| \le \epsilon) = 1$$
This means that as the sample size increases, $\hat{\theta}_n$ gets closer and closer to $\theta$. The minimum variance unbiased estimator (MVUE) will possess the smallest variance possible among unbiased estimators. If, in addition, $\mathrm{VAR}(\hat{\theta}_n) \to 0$ as $n \to \infty$, then this estimator (or, rather, this sequence of estimators) will be consistent. It can be shown that for a normal distribution, the sample mean $\bar{X}$ is the MVUE of the population mean, $\mu$.
4) Sufficiency: Another important property of an estimator is the property of sufficiency. An estimator is said to be sufficient if it contains all of the information in the sample regarding the parameter. Furthermore, if an unbiased estimator and a sufficient statistic exist for $\theta$, the best unbiased estimator of $\theta$ is an explicit function of the sufficient statistic. If there exists a unique function of the sufficient statistic for $\theta$, then this is necessarily the best estimator for $\theta$.
E.g., for a normal distribution, the sample mean $\bar{X}$ is a sufficient statistic for estimating $\mu$.
2.4.4 Normal Distribution
The Central Limit Theorem states that, under fairly general conditions, the distribution of a sum of $n$ independent random variables, for sufficiently large $n$, is approximately a normal distribution. Furthermore, the normal distribution can be effectively used to approximate other important sampling distributions, such as the binomial. It follows that the distribution of the random variable $\bar{X}$, from a random sample, is approximately normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the individual random variables in the sample and $n$ is the sample size. That is,
$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = Z, \quad \text{where } f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \qquad (2.16)$$
$Z$ is said to have a standard normal distribution, with mean 0 and standard deviation 1, or one writes $Z \sim N(0, 1)$.
Values of the cdf (the area below) $\Phi(z) = P(Z \le z)$ are found in tables
like that in the appendix, or software. The value of $z$ which has an area of
$\alpha$ above that value will be denoted as $z_\alpha$. Similarly, the value of $z$ which
leaves an area of $\alpha$ below that value is given by $-z_\alpha$ (that is,
$z_{1-\alpha} = -z_\alpha$), due to the symmetry of the standard normal distribution.
E.g., $z_{0.025} = 1.96$ and $z_{0.975} = -1.96$. A $(1 - \alpha)$ confidence interval
for $\mu$ is given by:
$$1 - \alpha = P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)$$
Example: Suppose that the lifetime of wooden telephone poles is normally
distributed with a mean of 20.2 years and a standard deviation of 2.1 years.
The probability that a pole will survive beyond 23 years is
$$1 - \Phi\left(\frac{23 - 20.2}{2.1}\right) = 1 - \Phi(1.3333) = 1 - 0.9088 = 0.0912$$
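The telephone pole calculation can be checked without tables. The sketch below is only an illustration: the helper names `standard_normal_cdf` and `normal_survival` are hypothetical, and the cdf is obtained from the error function through the identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_survival(x: float, mu: float, sigma: float) -> float:
    """P(X > x) for X normal with mean mu and standard deviation sigma."""
    return 1.0 - standard_normal_cdf((x - mu) / sigma)


# Pole lifetimes: mean 20.2 years, standard deviation 2.1 years
p_survive = normal_survival(23.0, mu=20.2, sigma=2.1)
print(f"P(lifetime > 23 years) = {p_survive:.4f}")
```

The printed probability agrees with the tabulated result to four decimal places.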
The probability distribution of a statistic is called a sampling distribution.
There are several important sampling distributions to consider for various
statistical techniques, such as the normal, chi-square, Student's t and the F
distributions. These distributions will be outlined briefly here.
2.4.5 The Chi-Square Distribution
The sum of squared independent standard normal random variables is said to
have a chi-square distribution, with parameter $r$, where $r$ is the number of
variables in the sum. That is:
$$\chi^2 = Z_1^2 + Z_2^2 + \dots + Z_r^2 \quad \text{and} \quad f_{\chi^2}(u) = \frac{1}{2^{r/2}\,\Gamma\left(\frac{r}{2}\right)} u^{\frac{r}{2} - 1} e^{-u/2}, \quad u > 0 \tag{2.17}$$
where:
$$\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx \quad \text{for } n > 0 \tag{2.18}$$
The chi-square distribution is a special case of the gamma distribution, which
is important in reliability theory as it is the distribution of a sum of exponential
random variables. It is also important as the distribution of the sample
variance. In addition, the sum of independent chi-square random variables is
chi-square distributed with the parameter equal to the sum of the individual
parameters.
The gamma function of equation (2.18) is used quite regularly in reliability
engineering. If $n$ is a positive integer then $\Gamma(n) = (n - 1)!$. Also, the relationship
$\Gamma(n) = (n - 1)\Gamma(n - 1)$ holds for any $n > 1$, whether or not $n$ is an integer. Much
more will be said about the gamma function in Chapter 4.
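The two gamma function properties just stated are easy to confirm numerically. The following is a small sketch using Python's standard `math.gamma`; the test value `n = 3.7` is an arbitrary choice made only for illustration.

```python
import math

# Gamma(n) = (n - 1)! for positive integers n
factorial_ok = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 8)
)

# Gamma(n) = (n - 1) * Gamma(n - 1) also holds for non-integer n > 1
n = 3.7  # arbitrary non-integer value, chosen only for illustration
recurrence_ok = math.isclose(math.gamma(n), (n - 1) * math.gamma(n - 1))

# Gamma(1/2) = sqrt(pi); this value appears in the chi-square density for odd r
gamma_half = math.gamma(0.5)

print(factorial_ok, recurrence_ok, gamma_half)
```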
In reliability, owing to its relationship with the exponential density, the
chi-square might be used for those situations where the exponential is appro-
priate.
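It can be shown that for a sample of $n$ independent normals $X_i \sim N(\mu, \sigma)$,
the quantity
$$\frac{S^2 (n - 1)}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}$$
has a chi-square distribution with $n - 1$ d.f. Using the above relationship, we
may state,
$$1 - \alpha = P\left(\frac{(n - 1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$
where $\chi^2_{\alpha,\,n-1}$ is the value of the chi-square statistic with an area $\alpha$ beyond
this chi-square value.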
Example: Consider a random sample of 8 items drawn from a population
known to be normal: 45.2, 67.8, 34.6, 21.7, 89.3, 55.5, 78.3 and 49.0. We will
use this sample data to first obtain a 90% confidence interval on $\sigma^2$ and then
to get a lower 80% confidence bound for $\sigma^2$. From the data, $s^2 = 505.759$.
$$0.90 = P\left(\frac{7(505.759)}{\chi^2_{0.05,7}} \le \sigma^2 \le \frac{7(505.759)}{\chi^2_{0.95,7}}\right)$$
$$0.90 = P\left(\frac{3540.31}{14.07} \le \sigma^2 \le \frac{3540.31}{2.17}\right)$$
$$0.90 = P(251.62 \le \sigma^2 \le 1631.48)$$
Now suppose that an 80% lower bound for $\sigma^2$ is desired.
$$0.80 = P\left(\frac{7(505.759)}{\chi^2_{0.20,7}} \le \sigma^2\right)$$
$$0.80 = P\left(\frac{3540.31}{9.80} \le \sigma^2\right)$$
$$0.80 = P(361.26 \le \sigma^2)$$
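The arithmetic of this example can be reproduced in a few lines. The sketch below is illustrative only: the variable names are hypothetical, and the chi-square critical values are hard-coded from the tabulated values quoted in the example rather than computed.

```python
from statistics import variance

data = [45.2, 67.8, 34.6, 21.7, 89.3, 55.5, 78.3, 49.0]
n = len(data)
s2 = variance(data)            # sample variance, divisor n - 1
sum_sq = (n - 1) * s2          # (n - 1) s^2

# Tabulated chi-square values with 7 d.f. (subscript = area beyond the value)
chi2_005_7 = 14.07
chi2_095_7 = 2.17
chi2_020_7 = 9.80

ci_90 = (sum_sq / chi2_005_7, sum_sq / chi2_095_7)   # two-sided 90% interval
lower_80 = sum_sq / chi2_020_7                       # one-sided 80% lower bound

print(f"s^2 = {s2:.3f}")
print(f"90% CI for sigma^2: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
print(f"80% lower bound for sigma^2: {lower_80:.2f}")
```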
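2.4.6 The Student's t Distribution:
If $Z \sim N(0, 1)$ and $V$ is chi-square with parameter $r$, independent of $Z$, then
$T$ has the Student's t distribution with $r$ degrees of freedom, where:
$$T = \frac{Z}{\sqrt{V/r}}$$
and
$$f(t) = \frac{\Gamma\left(\frac{r+1}{2}\right)}{\sqrt{\pi r}\,\Gamma\left(\frac{r}{2}\right)\left(\frac{t^2}{r} + 1\right)^{(r+1)/2}}, \quad -\infty < t < \infty \tag{2.19}$$
Thus,
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}}$$
has a t distribution with parameter $n - 1$ since
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = Z \sim N(0, 1) \quad \text{and} \quad \frac{S^2}{\sigma^2} = \frac{\sum_{i}^{n} (X_i - \bar{X})^2}{(n - 1)\sigma^2} = \frac{V}{(n - 1)}$$
Critical values of t are given in the Appendix.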
To obtain values of t from the Appendix, one needs the degrees of freedom,
$\nu = n - 1$, for example $t_{0.05,6} = 1.943$. This means that a t-value of 1.943
with 6 degrees of freedom (sample size is 7) leaves an area of 0.05 beyond it.
Note, from the t tables, that $\lim_{\nu \to \infty} t_{\alpha,\nu} = z_\alpha$. E.g., $t_{0.05,100} = 1.660$ is fairly
close to $z_{0.05} = 1.645$. A $(1 - \alpha)100\%$ confidence interval for $\mu$ is given by:
$$1 - \alpha = P\left(\bar{X} - t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}}\right)$$
Example: Suppose that a sample of five telephone failure times (believed to be
distributed normally) is: 16.5, 21.4, 11.8, 19.7 and 22.9. Find a 95% confidence
interval on the true mean time to failure. Here $\bar{X} = 18.46$ and $s = 4.418$.
With 4 degrees of freedom, $t_{0.025,4} = 2.776$; thus
$$0.95 = P\left(18.46 - t_{0.025,4} \frac{4.418}{\sqrt{5}} \le \mu \le 18.46 + t_{0.025,4} \frac{4.418}{\sqrt{5}}\right)$$
$$= P[18.46 - 2.776(1.976) \le \mu \le 18.46 + 2.776(1.976)]$$
$$= P(12.975 \le \mu \le 23.945)$$
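A short numerical sketch of this interval follows. It assumes the standard tabulated critical value $t_{0.025,4} = 2.776$ for 4 degrees of freedom, entered by hand; the variable names are hypothetical.

```python
import math
from statistics import mean, stdev

times = [16.5, 21.4, 11.8, 19.7, 22.9]
n = len(times)
x_bar = mean(times)
s = stdev(times)               # sample standard deviation, divisor n - 1

t_crit = 2.776                 # tabulated t_{0.025,4} (assumed from standard t tables)
half_width = t_crit * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}")
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

Note that the degrees of freedom are $n - 1 = 4$, so the critical value must be read from the row for 4 d.f., not 5.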
2.4.7 The F Distribution
The ratio of two independent chi-square random variables, each divided by its
degrees of freedom, has an F-distribution with parameters $r_1$ and $r_2$. In this
case, a ratio of sample variances has an F-distribution with parameters
$n_1 - 1$ and $n_2 - 1$, where $n_1$ and $n_2$ are the corresponding sample sizes of the
samples from which the variances were computed. Thus,
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F-distribution with the above parameters.
This model can be used for inferences about the ratio of population variances.
The F distribution density, with degrees of freedom $\nu_1$ and $\nu_2$, is given by
$$f_F(t) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{t^{(\nu_1/2) - 1}}{\left(1 + \frac{\nu_1 t}{\nu_2}\right)^{(\nu_1 + \nu_2)/2}}, \quad t \ge 0$$
The F distribution comes about as the ratio of two chi-square variates divided
by their respective degrees of freedom, i.e.,
$$F_{n_1 - 1,\,n_2 - 1} = \frac{\chi^2_{(n_1 - 1)}/(n_1 - 1)}{\chi^2_{(n_2 - 1)}/(n_2 - 1)}$$
From above, we can see that the ratio of two sample variances drawn from normal
populations having the same variance follows an F distribution, i.e.,
$\frac{S_1^2}{S_2^2} \sim F_{(n_1 - 1,\,n_2 - 1)}$. We can use this fact to produce a confidence interval for
the ratio of two variances.
Example: Consider the following two samples, drawn from normal populations,
and test whether they have the same variance ($\alpha = 0.10$).
Sample 1: 23.4, 31.6, 29.6, 19.9, 26.4, 28.5, 26.7, 20.3, 32.4
Sample 2: 67.4, 69.1, 72.5, 66.8, 80.8, 79.9, 77.4, 66.8, 73.6, 71.1, 74.4, 76.1
$s_1^2 = 20.76$, $s_2^2 = 24.28$, $n_1 = 9$, $n_2 = 12$
$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \ne \sigma_2^2 \qquad \alpha = 0.10$$
Placing the larger sample variance in the numerator, so that the degrees of
freedom are $(n_2 - 1, n_1 - 1) = (11, 8)$, we have
$$\frac{s_2^2}{s_1^2} = 1.170, \qquad F_{0.05,11,8} = 3.31, \qquad F_{0.95,11,8} = 0.34$$
Since $0.34 < 1.170 < 3.31$, we cannot reject $H_0$.
In order to obtain $F_{0.95,11,8}$, we used the relationship
$$F_{0.95,11,8} = \frac{1}{F_{0.05,8,11}} = \frac{1}{2.95} = 0.34$$
In general,
$$F_{1-\alpha,\,n_1 - 1,\,n_2 - 1} = \frac{1}{F_{\alpha,\,n_2 - 1,\,n_1 - 1}}$$
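The variance ratio test in this example can be sketched as follows. This is an illustration only: the F critical values are the tabulated numbers quoted in the example (entered by hand, not computed), and the variable names are hypothetical.

```python
from statistics import variance

sample_1 = [23.4, 31.6, 29.6, 19.9, 26.4, 28.5, 26.7, 20.3, 32.4]
sample_2 = [67.4, 69.1, 72.5, 66.8, 80.8, 79.9, 77.4, 66.8, 73.6, 71.1, 74.4, 76.1]

var_1 = variance(sample_1)     # sample variance of the 9-item sample
var_2 = variance(sample_2)     # sample variance of the 12-item sample

# Ratio with the 12-item sample on top: degrees of freedom (11, 8)
ratio = var_2 / var_1

f_upper = 3.31                 # tabulated F_{0.05,11,8}
f_lower = 1.0 / 2.95           # F_{0.95,11,8} = 1 / F_{0.05,8,11}

reject_h0 = not (f_lower < ratio < f_upper)
print(f"var_1 = {var_1:.2f}, var_2 = {var_2:.2f}, ratio = {ratio:.3f}")
print("reject H0" if reject_h0 else "cannot reject H0")
```

Computing the variances directly from the data also guards against attaching a variance to the wrong sample, which matters because the order of the degrees of freedom depends on which variance is in the numerator.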
2.4.8 Tables of Sampling Distributions:
There are tables of the sampling distributions which include the background
information as well as the necessary values for use of the distributions. One
comprehensive book of tables is by Owen (1962) which is directed toward
students, practitioners and researchers in Statistics.
2.5 PARAMETER ESTIMATION
The general problem of estimating a characteristic of a population or
distribution, called a parameter, is that of deriving a function of the sample
observations, the data, such that the value computed from the sample is
usually close to the actual value of the parameter. There may be several different
potential estimators for a parameter. For example, if the mean of a
distribution is to be estimated, one might consider the sample mean, the sample
median or some other function of the data as an estimator. Candidate
estimators are often found by the method of maximum likelihood, least squares
or the method of moments.
2.5.1 Maximum Likelihood:
One method for choosing a point estimator, and in fact one of the best
methods, is the method of maximum likelihood. Let $X$ be a random variable
with probability density function $f(x; \theta)$, where $\theta$ is an unknown parameter.
Let $X_1 = x_1, X_2 = x_2, \dots, X_n = x_n$ be a random sample of $n$ observed values
$x_1, x_2, \dots, x_n$ with likelihood function:
$$L(\theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta).$$
The likelihood function, a function of only the unknown parameter $\theta$ since
the $x$'s are observed values, is essentially then the likelihood of $\theta$ with these
observed values of $x$. The maximum likelihood estimator of $\theta$ is the value of
$\theta$ that maximizes the function $L(\theta)$.
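Maximum likelihood estimators are not necessarily unbiased but they can
usually be easily adjusted to be unbiased. Also, maximum likelihood
estimators have excellent large sample properties, since they are asymptotically
normally distributed, asymptotically unbiased and, under mild conditions of
regularity, asymptotically efficient.
Using the likelihood function, approximate $(1 - \alpha)100\%$ confidence
intervals on parameters of interest may be obtained by inverting Fisher's
Information Matrix. Fisher's Information Matrix, for the two parameter case,
$\theta_i, \theta_j$, is given by
$$I_{ij} = E\left[-\frac{\partial^2 \ln L}{\partial \theta_i\, \partial \theta_j}\right], \quad i, j = 1, 2$$
This leads to
$$\begin{bmatrix} Var(\hat{\theta}_1) & Cov(\hat{\theta}_1, \hat{\theta}_2) \\ Cov(\hat{\theta}_1, \hat{\theta}_2) & Var(\hat{\theta}_2) \end{bmatrix} = \begin{bmatrix} -\dfrac{\partial^2 \ln L}{\partial \theta_1^2} & -\dfrac{\partial^2 \ln L}{\partial \theta_1\, \partial \theta_2} \\ -\dfrac{\partial^2 \ln L}{\partial \theta_1\, \partial \theta_2} & -\dfrac{\partial^2 \ln L}{\partial \theta_2^2} \end{bmatrix}^{-1}$$
with each second derivative evaluated at $(\hat{\theta}_1, \hat{\theta}_2)$.
Using the asymptotic normality property of the MLEs, the $(1 - \alpha)100\%$
confidence intervals for $\theta_i$ are calculated using
$$\hat{\theta}_i \pm z_{\alpha/2} \sqrt{Var(\hat{\theta}_i)}$$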
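Example:
Consider the exponential distribution for the case in which all units tested
fail. The times to failure are $t_1, t_2, \dots, t_n$.
$$L = \left(\frac{1}{\theta} e^{-t_1/\theta}\right)\left(\frac{1}{\theta} e^{-t_2/\theta}\right) \cdots \left(\frac{1}{\theta} e^{-t_n/\theta}\right)$$
$$L = \left(\frac{1}{\theta}\right)^n e^{-\sum_i^n t_i/\theta}$$
$$\ln L = n \ln\left(\frac{1}{\theta}\right) - \frac{\sum_i^n t_i}{\theta} = -n \ln(\theta) - \frac{\sum_i^n t_i}{\theta}$$
$$\frac{\partial \ln L}{\partial \theta} = -\frac{n}{\theta} + \frac{\sum_i^n t_i}{\theta^2}$$
Set
$$\frac{\partial \ln L}{\partial \theta} = 0 \;\Rightarrow\; -n + \frac{\sum_i^n t_i}{\theta} = 0$$
$$\hat{\theta} = \frac{\sum_i^n t_i}{n} = \bar{t}$$
Example: Suppose that 6 units are tested until each fails. The time to
failure density is believed to be exponential. The failure times are 450, 540,
670, 710, 990 and 1210. Then
$$\hat{\theta} = \frac{\sum_{i=1}^{6} t_i}{6} = \frac{4570}{6} = 761.67$$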
Often the simultaneous equations represented by the partial derivatives
of $\ln L$ with respect to each unknown parameter are difficult to solve, and
approximate and/or iterative methods such as the Newton-Raphson method
must be used. More will be said about this with respect to MLEs for the
parameters of the Weibull density in Chapter 4.
Example: Consider a censored sample from the exponential distribution
with parameter $\theta$. Censored, for our purposes here, means that the unit ran
for a certain time and did not fail, and we have recorded its non-failure
running time. Items in our sample either represent failure times, $t_{i,f}$, or
censored times, $t_{i,s}$. We will first develop a general expression for the maximum
likelihood estimate of $\theta$ and then look at a specific case. Suppose that our sample,
ordered by time to failure and running time, looks like this:
$$\{t_{1,f}, t_{2,f}, t_{3,s}, t_{4,f}, t_{5,s}, t_{6,s}, t_{7,f}, t_{8,f}, t_{9,f}, t_{10,f}, t_{11,s}, t_{12,s}, t_{13,f}\}.$$
This indicates that units 1, 2, 4, 7, 8, 9, 10 and 13 failed at times
$t_1, t_2, t_4, t_7, t_8, t_9, t_{10}$ and $t_{13}$, and units 3, 5, 6, 11 and 12 survived through
times $t_3, t_5, t_6, t_{11}$ and $t_{12}$.
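Then, using the exponential density for each failure and the survival
probability $e^{-t/\theta}$ for each censored unit,
$$L = \prod_{k=f} \left(\frac{1}{\theta} e^{-t_{i,k}/\theta}\right) \prod_{k=s} \left(e^{-t_{j,k}/\theta}\right)$$
$$\ln L = \ln\left(\frac{1}{\theta^8}\right) - \sum_{k=f} \frac{t_{i,k}}{\theta} - \sum_{k=s} \frac{t_{j,k}}{\theta}$$
$$= -\ln \theta^8 - \frac{1}{\theta}\left[\sum_{k=f} t_{i,k} + \sum_{k=s} t_{j,k}\right]$$
$$= -8 \ln \theta - \frac{1}{\theta}\left[\sum_{k=f} t_{i,k} + \sum_{k=s} t_{j,k}\right]$$
$$\frac{\partial \ln L}{\partial \theta} = -\frac{8}{\theta} + \frac{1}{\theta^2} \sum_{k=f} t_{i,k} + \frac{1}{\theta^2} \sum_{k=s} t_{j,k} = 0$$
$$-8 + \frac{1}{\theta}\left[\sum_{k=f} t_{i,k} + \sum_{k=s} t_{j,k}\right] = 0$$
$$\hat{\theta} = \frac{\sum_i t_{i,f} + \sum_j t_{j,s}}{8}$$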
Thus, the maximum likelihood estimator for the multiply censored case is
simply the sum of all failure times and running times divided by the number
of failures. For the example above suppose that the data is as follows:
$t_{1,f} = 114$     $t_{6,s} = 520$      $t_{11,s} = 1692$
$t_{2,f} = 237$     $t_{7,f} = 774$      $t_{12,s} = 1748$
$t_{3,s} = 251$     $t_{8,f} = 892$      $t_{13,f} = 2107$
$t_{4,f} = 495$     $t_{9,f} = 1055$
$t_{5,s} = 520$     $t_{10,f} = 1278$
$$\sum_{k=f} t_{i,k} = 6952 \qquad \sum_{k=s} t_{j,k} = 4731$$
$$\hat{\theta} = \frac{6952 + 4731}{8} = \frac{11683}{8} = 1460.38$$
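In general, for a sample taken from an exponential distribution with $r$
failures and $c$ censored items (or non-failures),
$$\hat{\theta} = \frac{\sum_{i=1}^{r} t_{i,f} + \sum_{j=1}^{c} t_{j,s}}{r}$$
where $f$ indicates a failed unit and $s$ indicates a censored (surviving) unit.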
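The exponential estimator is simple enough to express as one function covering both the complete and the censored case. The sketch below is illustrative; the function name `exponential_mle` is hypothetical, and it is applied to the two data sets used in the examples of this section.

```python
def exponential_mle(failure_times, censored_times=()):
    """MLE of the exponential mean: total time on test divided by the number of failures."""
    r = len(failure_times)
    if r == 0:
        raise ValueError("at least one failure is required")
    return (sum(failure_times) + sum(censored_times)) / r


# Complete sample: all six units failed
complete = [450, 540, 670, 710, 990, 1210]
theta_complete = exponential_mle(complete)

# Multiply censored sample: 8 failures and 5 censored running times
failures = [114, 237, 495, 774, 892, 1055, 1278, 2107]
censored = [251, 520, 520, 1692, 1748]
theta_censored = exponential_mle(failures, censored)

print(f"complete sample: {theta_complete:.2f}")
print(f"censored sample: {theta_censored:.2f}")
```

With no censored items the function reduces to the ordinary sample mean of the failure times.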
2.5.2 Moment Estimators
The method of moments equates sample moments to population moments
and solves for the parameters to be estimated. Population moments have
been observed earlier and they are formally defined now. The $k$th population
moment about the origin (defined earlier in section 2.3.2) is given by
$\mu'_k = \sum_{\text{all } x} x^k P_X(x)$ for the discrete case and
$\mu'_k = \int_{\text{all } x} x^k f(x)\,dx$ for the continuous case. The $k$th sample moment about
the origin is given by
$$m'_k = \frac{\sum_{i=1}^{n} x_i^k}{n}$$
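Example: Consider the gamma density with shape parameter $\eta$ and scale
parameter $\theta$, which is given by
$$f(x) = \frac{\left(\frac{x}{\theta}\right)^{\eta - 1} e^{-x/\theta}}{\theta\,\Gamma(\eta)}, \quad x > 0$$
where
$$\Gamma(\eta) = \int_0^\infty x^{\eta - 1} e^{-x}\,dx$$
The first two moments about the origin for this distribution are
$\mu'_1 = \eta\theta$ and $\mu'_2 = \eta\theta^2 + \eta^2\theta^2$.
Now consider a sample of size $n$ taken on a process variable believed to
be described best by the gamma density. The first two sample moments are
given by $m'_1 = \frac{\sum_i^n x_i}{n} = \bar{X}$ and $m'_2 = \frac{\sum_i^n x_i^2}{n}$.
Hence, we equate
$$m'_1 = \frac{\sum_i^n x_i}{n} = \bar{X} = \eta\theta \quad \text{and} \quad m'_2 = \frac{\sum_i^n x_i^2}{n} = \eta\theta^2 + \eta^2\theta^2.$$
From the first equation, $\theta = \frac{m'_1}{\eta}$. Substituting in the second equation and
solving for $\eta$, we have
$$\hat{\eta} = \frac{{m'_1}^2}{m'_2 - {m'_1}^2} = \frac{(\bar{X})^2}{\left(\frac{\sum_i^n X_i^2}{n}\right) - \bar{X}^2} = \frac{n(\bar{X})^2}{\sum_i^n (X_i - \bar{X})^2}$$
Substituting in the first equation, we obtain
$$\hat{\theta} = \frac{m'_1}{\hat{\eta}} = \frac{\sum_i^n (X_i - \bar{X})^2}{n\bar{X}}.$$
Thus the moment estimators of the parameters $\eta$ and $\theta$ for the gamma
distribution are
$$\hat{\eta} = \frac{n(\bar{X})^2}{\sum_i^n (X_i - \bar{X})^2} \quad \text{and} \quad \hat{\theta} = \frac{\sum_i^n (X_i - \bar{X})^2}{n\bar{X}}.$$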
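The gamma moment estimators can be sketched in a few lines. In the illustration below, the function name `gamma_moment_estimates` and the sample values are hypothetical (the data are made up purely to exercise the formulas); the function returns the shape estimate followed by the scale estimate.

```python
from statistics import mean


def gamma_moment_estimates(xs):
    """Method-of-moments estimates (shape, scale) for a gamma sample."""
    n = len(xs)
    x_bar = mean(xs)
    sum_sq_dev = sum((x - x_bar) ** 2 for x in xs)
    shape = n * x_bar ** 2 / sum_sq_dev      # n * X_bar^2 / sum (X_i - X_bar)^2
    scale = sum_sq_dev / (n * x_bar)         # sum (X_i - X_bar)^2 / (n * X_bar)
    return shape, scale


# Hypothetical observations, for illustration only
sample = [2.1, 3.4, 1.7, 5.2, 4.4, 2.9, 3.8]
shape_hat, scale_hat = gamma_moment_estimates(sample)

# By construction the fitted distribution reproduces the first two sample moments
m1 = mean(sample)
m2 = sum(x * x for x in sample) / len(sample)
print(shape_hat, scale_hat, m1, m2)
```

A quick consistency check is that shape times scale returns the sample mean, and that shape times scale squared plus the squared mean returns the second sample moment.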
2.5.3 Least Squares Procedure:
Suppose that there is a single dependent variable or response $y$ which is
uncontrolled and depends on one or more independent regressor variables, say
$X_1, X_2, \dots, X_n$, which are measured with negligible error and are controlled.
The relationship fit to such a set of experimental data is characterized by a
prediction equation called a regression equation. Simple linear regression
treats only the case of a single regressor variable. Let us denote a random
sample of size $n$ by the set $\{(x_i, y_i);\ i = 1, 2, \dots, n\}$. Each observation in this
sample satisfies the equation $y_i = \alpha + \beta x_i + \epsilon_i$.
The $\alpha$ and $\beta$ in the above model are called regression coefficients. The $\beta$
is the slope of the regression line and $\alpha$ is the Y intercept of the regression
line. $\epsilon_i$ is a random error term with mean 0 and variance $\sigma^2$. The random
errors are also assumed to be uncorrelated. The estimated regression line is,
$$\hat{y}_i = a + b x_i$$
Each pair of observations satisfies the relation,
$$y_i = a + b x_i + e_i$$
where $e_i = y_i - \hat{y}_i$, and $a$ and $b$ in the above equation are estimators for the
parameters $\alpha$ and $\beta$. These parameters are estimated by minimizing the sum
of squares of the residuals $e_i$.
Least Squares Estimation: The process is to find $a$ and $b$, the estimates
of $\alpha$ and $\beta$, such that the sum of the squares of the residuals is a minimum.
The residual sum of squares is called the sum of squares error (SSE).
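$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
Differentiating SSE with respect to $a$ and $b$ and setting the derivatives to 0, we get
$$\frac{\partial (SSE)}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0$$
or
$$\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i \tag{2.20}$$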
   i      X      Y     X^2      XY
   1     10    143     100    1430
   2     17    137     289    2329
   3     22    129     484    2838
   4     29    114     841    3306
   5     35     98    1225    3430
   6     41     87    1681    3567
   7     48     79    2304    3792
   8     57     59    3249    3363
   9     66     48    4356    3168
  10     78     41    6084    3198
  11     91     35    8281    3185
SUM:    494    970   28894   33606
Table 2.3: Data and calculation for Least squares Example
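$$\frac{\partial (SSE)}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$$
or
$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 \tag{2.21}$$
The least squares estimates $a$ and $b$ of the regression coefficients are
computed by solving equations (2.20) and (2.21) simultaneously, resulting in
$$b = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \tag{2.22}$$
and
$$a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} = \bar{y} - b\bar{x} \tag{2.23}$$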
Example
The data in Table 2.3 will now be analyzed using the least squares
intercept and slope parameter estimates shown above.
$$b = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{11(33606) - (494)(970)}{11(28894) - (494)^2} = -1.484$$
$$a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} = \bar{y} - b\bar{x} = \frac{970}{11} - (-1.484)\left(\frac{494}{11}\right) = 154.827$$
Figure 2.12 presents a plot of the data and the fitted line.
Figure 2.12: Least Squares Regression line Fitted to the data of Table 2.3
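The least squares fit for the data of Table 2.3 can be reproduced directly from equations (2.22) and (2.23). The sketch below is only an illustration with hypothetical variable names; it recomputes the column sums rather than copying them from the table.

```python
x = [10, 17, 22, 29, 35, 41, 48, 57, 66, 78, 91]
y = [143, 137, 129, 114, 98, 87, 79, 59, 48, 41, 35]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Slope from equation (2.22) and intercept from equation (2.23)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Residual sum of squares for the fitted line
sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

print(f"b = {b:.3f}, a = {a:.3f}, SSE = {sse:.2f}")
```

The slope is negative because the response decreases as the regressor increases. The intercept differs slightly in the third decimal from a hand calculation that uses the slope rounded to three places.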
2.6 Statistical inference
Statistical inference is comprised of methods of making inferences about the
population from sample observations. Of course, the use of estimates as in
the previous section is a method of making an inference about the population
from sample data and, as such, is considered a statistical inference. There
are two other commonly used methods of statistical inference that will be
outlined here: interval estimation and tests of hypotheses.
2.6.1 Interval Estimation:
In many cases, a point estimate of a parameter does not provide enough
information about the parameter of interest. For example, the point estimate
does not reveal any information about the variability of the estimate. This
situation can be rectified by the use of an interval estimate or confidence
interval, whose length is a function of the variability of the estimate.
There are several formal methods of choosing the appropriate values of
the parameter for the confidence set.
Often, it is possible to compute a confidence interval in a fairly simple
manner, but this simple way of constructing confidence intervals requires
that functions of the sample and parameter be found which are distributed
independently of the parameter. Thus, in order to compute a confidence
interval on a parameter $\theta$ in this simple way, two functions $\hat{\theta}_l$ and $\hat{\theta}_u$ of the
data are determined such that $\Pr\{\hat{\theta}_l < \theta < \hat{\theta}_u\} = 1 - \alpha$, where $\alpha$ is usually
small and where it is possible to compute this probability independently
of the value of the parameter. Then suppose this probability is
computed for a large number of similarly taken samples, that is, the limits
$\hat{\theta}_l$ and $\hat{\theta}_u$ are calculated for each sample. One could then interpret the
probability as meaning that approximately $100(1 - \alpha)\%$ of the intervals will
cover the true value of $\theta$. Using this procedure with one sample results in a method
for obtaining a random interval which covers the true value of the parameter
with a specified probability. This method can be used to construct confidence
intervals for the mean of a normal distribution because it is known that
the function $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ has a t distribution independent of $\mu$.
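Now, suppose that the distribution of an estimator of $\theta$ depends on $\theta$ and
one is interested in computing a $100(1 - \alpha)\%$ confidence interval. In the case
where one can determine the density of an estimator of the parameter, say
$g(\hat{\theta}; \theta)$, where $\hat{\theta}$ is the estimator and $\theta$ is the parameter, one proceeds as
follows. For a particular specified value of $\theta$, say $\theta_0$, it is possible to find
two numbers, $g_1(\theta_0)$ and $g_2(\theta_0)$, such that:
$$P\{\hat{\theta} < g_1(\theta_0)\} = \int_{-\infty}^{g_1(\theta_0)} g(\hat{\theta}; \theta_0)\,d\hat{\theta} = \frac{\alpha}{2}$$
and
$$P\{\hat{\theta} > g_2(\theta_0)\} = \int_{g_2(\theta_0)}^{\infty} g(\hat{\theta}; \theta_0)\,d\hat{\theta} = \frac{\alpha}{2}$$
The values $g_1(\theta)$ and $g_2(\theta)$ are functions of the values $\theta$ and it follows
that:
$$P\{g_1(\theta) < \hat{\theta} < g_2(\theta)\} = \int_{g_1(\theta)}^{g_2(\theta)} g(\hat{\theta}; \theta)\,d\hat{\theta} = 1 - \alpha$$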
When the functions $g_1(\theta)$ and $g_2(\theta)$ are plotted in the $\theta$, $\hat{\theta}$ space, as in
Figure 2.13, a confidence interval for $\theta$ can be constructed as follows: from
a sample of $n$, compute $\hat{\theta}_n$ and draw a horizontal line through $\hat{\theta}_n$ on the
$\hat{\theta}$-axis. This line will intersect the two curves at points labeled $U_n$ and
$L_n$ as in the figure. These two numbers, $U_n$ and $L_n$, when projected on the
$\theta$-axis as $\theta_L$ and $\theta_U$, define a confidence interval for $\theta$. To examine why this
is true, consider that the sample comes from a population with parameter
value $\theta_0$. The probability that the estimate $\hat{\theta}$ is between $g_1(\theta_0)$ and $g_2(\theta_0)$ is
$1 - \alpha$. When the estimate does fall between these values, a horizontal line
through $\hat{\theta}$ will cut a vertical line through $\theta_0$ at some point between the
curves and the corresponding interval $(\theta_L, \theta_U)$ will cover $\theta_0$. It follows that
the probability is $1 - \alpha$ that such an interval will cover $\theta_0$. This statement is
true for any population parameter value $\theta_0$.
Figure 2.13: Plot of $L(\theta)$ and $L(\hat{\theta})$ in the $\theta$, $\hat{\theta}$ space
Sometimes it is possible to determine the limits $(\theta_L, \theta_U)$ for a given
estimate without actually finding the functions $g_1(\theta)$ and $g_2(\theta)$. Refer to
Figure 2.13 and note that the limits for $\theta$ are at points $(\theta_L, \theta_U)$ where
$g_1(\theta_U) = \hat{\theta}_n$ and $g_2(\theta_L) = \hat{\theta}_n$. In terms of $g_1$ and $g_2$, it follows that $\theta_U$ is
the value of $\theta$ for which:
$$\int_{-\infty}^{\hat{\theta}_n} g(\hat{\theta}; \theta)\,d\hat{\theta} = \frac{\alpha}{2}$$
and $\theta_L$ is the value of $\theta$ for which:
$$\int_{\hat{\theta}_n}^{\infty} g(\hat{\theta}; \theta)\,d\hat{\theta} = \frac{\alpha}{2}$$
If these equations can be solved for $\theta$, then the solutions are the $100(1 - \alpha)\%$
confidence limits for $\theta$.
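EXAMPLE: Consider the exponential distribution with mean time to
failure $\theta$. For a sample of 1 observation $x$, $\hat{\theta} = x$ and the above two integrals
result in the equations:
$$1 - e^{-x/\theta_U} = \frac{\alpha}{2} \quad \text{and} \quad e^{-x/\theta_L} = \frac{\alpha}{2}$$
which can be solved for $\theta_U$ and $\theta_L$. That is,
$$\theta_U = \frac{-x}{\ln(1 - \alpha/2)} \quad \text{and} \quad \theta_L = \frac{-x}{\ln(\alpha/2)}.$$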
In the case of a discrete random variable, the above integrals become
sums, but confidence intervals having confidence coefficients exactly equal
to $1 - \alpha$ are not available. However, under certain conditions, one can find
confidence intervals having confidence coefficients not less than $1 - \alpha$. For
example, consider the case of a sample $(x_1, x_2, \dots, x_n)$ which represents a
sample of observations from a binomial random variable with:
$$f(x, p) = p^x (1 - p)^{1 - x}, \quad x = 0, 1$$
Now suppose that $k$ of the $x$'s are 1's. The estimator of $p$ is $\hat{p} = \frac{k}{n}$, where
$k = \sum_{i=1}^{n} x_i$ can have values $0, 1, 2, \dots, n$. Then
$$g(n\hat{p}; p) = \binom{n}{n\hat{p}} p^{n\hat{p}} (1 - p)^{n - n\hat{p}}, \quad n\hat{p} = 0, 1, 2, \dots, n$$
The upper $100(1 - \alpha)\%$ confidence limit $P_U$ can be determined by finding
the value of $p$ for which:
$$\sum_{y=0}^{k} \binom{n}{y} p^y (1 - p)^{n - y} = \frac{\alpha}{2}$$
and the lower limit $P_L$ is the value of $p$ for which:
$$\sum_{y=k}^{n} \binom{n}{y} p^y (1 - p)^{n - y} = \frac{\alpha}{2}$$
If $k = 0$, the lower limit is taken to be 0, and if $k = n$, the upper limit is
taken to be 1.
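The two binomial equations have no closed-form solution in general, but each side is monotone in $p$, so a simple bisection search finds the limits. The sketch below is one possible way to do this, not a prescribed method; the names `binom_cdf`, `bisect_root` and `binomial_limits` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(Y <= k) for Y binomial with n trials and success probability p."""
    return sum(comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return (lo + hi) / 2


def binomial_limits(k, n, alpha):
    """Two-sided limits (P_L, P_U) for p with confidence at least 1 - alpha."""
    # Lower limit solves P(Y >= k) = alpha / 2; taken as 0 when k = 0
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit solves P(Y <= k) = alpha / 2; taken as 1 when k = n
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


p_low, p_high = binomial_limits(3, 10, 0.05)
print(f"k = 3, n = 10: ({p_low:.4f}, {p_high:.4f})")
```

For $k = 0$ the upper equation reduces to $(1 - p)^n = \alpha/2$, which gives a convenient closed-form check on the search.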
Another procedure for finding binomial parameter confidence intervals,
and a procedure that is more easily extendible to multidimensional parameters,
is as follows: First construct a set which is called a $1 - \alpha$ acceptance set
for each possible value of $p$. For each value of $p$, include in the acceptance
set values of $k$ such that the sum of the probabilities of the values in the
acceptance set is greater than or equal to $1 - \alpha$. The usual way to do this
is to select the value of $k$ which has the largest probability of occurrence for
the given $p$. Then continue adding values of $k$ into the acceptance set by
descending order of probability until the total probability of entries in the
acceptance set is greater than or equal to $1 - \alpha$. Confidence sets are
constructed from acceptance sets. For the outcome $k$ that is observed, check the
acceptance set for a value of $p$ and, if $k$ is in the acceptance set for the value
of $p$ checked, then that value of $p$ is in the confidence set for $p$ for the $k$ value
that was observed. In regular cases, this procedure results in the same limits
as the procedure discussed earlier.
2.6.2 Hypothesis Testing:
Sometimes, the situation requires that a statement about the parameter in
question be statistically verified as acceptable. The statement is usually
called a hypothesis and the decision-making procedure is called hypothesis
testing. In a statistical test of a hypothesis, the procedure depends on
the statement of two opposing hypotheses, the null hypothesis $H_0$, and
the alternative hypothesis $H_1$. The hypotheses are statements about the
population or distribution of interest and not statements about the sample.
The hypothesis testing procedures use the sample information to make a
statistical decision about the population hypothesis. If the information
from the sample is consistent with the statement of the hypothesis, then
the hypothesis is not rejected. If the information from the
sample is inconsistent with the hypothesis, then the hypothesis is rejected.
One method of testing hypotheses can be easily based on confidence
interval estimation.
The decision to accept or reject the hypothesis is based on a test statistic
computed from the data. Two regions are determined such that when the
test statistic is calculated to be in one region, the hypothesis $H_0$ is rejected
and when the test statistic is in the other region, $H_0$ is not rejected. The
first region is called the rejection region and is determined so that the
probability of the test statistic being in the rejection region when $H_0$ is true
is $\alpha$, a value denoted as the significance level of the test.
When a decision is made using the sample data, it is possible to make an
error in this decision. There are two kinds of error that can be made when
testing hypotheses: 1) if the null hypothesis is rejected when it is true, an
error, called a Type I error, is made, with probability $\alpha$; 2) if the alternative
hypothesis is rejected when it is true, an error, called a Type II error, is
made, with probability $\beta$.
The methods for constructing confidence intervals and tests of hypotheses
in reliability situations will be discussed in later chapters and more general
methods can be found in many statistical texts; see, for example, Hines and
Montgomery (1990).
2.6.3 Tolerance Limits:
In many applications of statistics it is useful to compare the data on an
item to a set of specifications to determine how many of the items satisfy the
specifications. Often these specifications are called tolerance limits and are
determined by the requirements on the item. Sometimes, tolerance limits are
limits computed from the sample data which have a certain proportion of the
population between them with a certain probability.
If the distribution is known and the values of its parameters specified, then
the computation of tolerance limits is straightforward from the distribution
function and it is unnecessary to add a probability statement because no
sample data are involved.
If the distribution and/or the parameters of the distribution are unknown,
then the tolerance limits must be computed using some sample data and it
is possible to compute tolerance limits such that with a certain probability
or confidence they contain a certain proportion of the population.
Tolerance limits for known distributions are based on estimates of the
parameters of the distribution and are tabled for several important
distributions. Non-parametric tolerance limits can also be constructed, based on
sample order statistics. The non-parametric limits are efficient and useful in
many situations.
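EXAMPLE: Consider a normal distribution with known standard
deviation $\sigma$; that is, $X \sim N(\mu, \sigma)$. In this case, the sample mean based on a
sample of $n$ items is such that $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$. The tolerance limits are bounds
$\bar{X} \pm k\sigma$, where $k$ is such that:
$$P\{F_X(\bar{X} + k\sigma) - F_X(\bar{X} - k\sigma) \ge 1 - \alpha\} \ge 1 - \gamma$$
or, in words, the probability is greater than or equal to $1 - \gamma$ that the area
between $\bar{X} \pm k\sigma$ is greater than or equal to $1 - \alpha$. Since
$P\{\bar{X} - k\sigma \le X \le \bar{X} + k\sigma\} \ge 1 - \alpha$ will be true whenever
$\bar{X} - k\sigma \le \mu - z_{\alpha/2}\sigma$ and $\bar{X} + k\sigma \ge \mu + z_{\alpha/2}\sigma$ (where $z_{\alpha/2}$ is the
value of the standard normal (see 2.2.6) for which there is area $\alpha/2$ above),
it follows that this will occur whenever
$$(z_{\alpha/2} - k)\sqrt{n} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le (k - z_{\alpha/2})\sqrt{n}$$
or whenever $k = z_{\alpha/2} + \left(\frac{1}{\sqrt{n}}\right) z_{\gamma/2}$. Then this value of $k$ provides tolerance
limits for the distribution of $X$.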
2.6.4 Prediction Intervals:
Another type of statistical limit that reliability practitioners find useful, in
addition to confidence intervals and tolerance limits, is called a prediction
interval. Confidence intervals give bounds on characteristics (parameters) of
the population and tolerance limits give bounds on areas or proportions of
the population enclosed in a region. Prediction intervals give bounds on the
values of the next $k$ observations from the same population; that is, a
prediction interval is an interval formed from two statistics, $L(X_1, X_2, \dots, X_n)$
and $U(X_1, X_2, \dots, X_n)$, from a random sample from the population under
consideration such that the interval $(L, U)$ contains the next $k$ observations
from that population, with probability $\gamma$. Note that the prediction interval
does not mean that the interval $(L, U)$ determined from a first sample of
size $n$ will contain $100\gamma\%$ of a large number of second samples of size $k$. It
does mean: if pairs of samples of sizes $n$ and $k$ are drawn repeatedly, and an
interval $(L, U)$ is computed from each of the samples of size $n$ of a given pair,
then $100\gamma\%$ of the samples of size $k$ will fall in the interval corresponding to
that pair. Prediction intervals will be presented for specific situations later
in this text. At this point, only an example of their possible use in reliability
will be presented. Suppose one has $n$ observations on the time to failure of a
particular device and there is interest in buying $k$ more of these devices. An
important question is: How large should $k$ be so that there is a probability
$\gamma$ that at least $m$ ($m \le k$) of these devices will operate $t_0$ units of time or
longer? Prediction intervals can effectively answer that question. See
Engelhardt and Bain (1978), Hall and Prairie (1971) and Hall and Prairie (1973).
EXAMPLE: Suppose one has a sample of the failure times of 10 devices
which can be assumed to follow a normal distribution. In addition, it is
planned that $k$ more of the devices are to be put into use. How large should $k$
be to give 95% confidence that at least $m \le k$ of the devices will survive past
$\bar{x}_{10} + r_m s_{10}$, where $\bar{x}_{10}$ and $s_{10}$ are the sample mean and standard deviation
from the previous sample of $n = 10$? The value $r_m = r_{10} = 1.0$ is found from
Hall and Prairie (1973), when $k = 16$ and $m = 10$.
2.7 Goodness of Fit tests
We review two general tests and then two specific tests for goodness of fit
(for the exponential and for the Weibull). The general nature of a
goodness-of-fit test is to set up hypotheses such that the null hypothesis reflects that
the data is a representative sample from the distribution under consideration.
E.g.,
$H_0$: data represents a sample from the normal distribution (with/without
$\mu$ and $\sigma$ given)
$H_1$: data does not represent a sample from the normal distribution (with/without
$\mu$ and $\sigma$ given).
The test is conducted at a level of risk $\alpha$, representing the
probability of falsely rejecting $H_0$.
2.7.1 Chi-square Goodness-of-Fit Test
The most popular of all goodness-of-fit tests, it requires a large number of
data points, say at least 50 and preferably at least 100 data points. We
categorize the data. We note the observed and expected frequencies in each
cell or category. For the continuous case, the data might be categorized as:
CELL                        OBSERVED   EXPECTED
$a_1 \le x \le a_2$           $O_1$      $E_1$
$a_2 \le x \le a_3$           $O_2$      $E_2$
. . .
. . .
$a_n \le x \le a_{n+1}$       $O_n$      $E_n$
Calculate $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$ and compare it with $\chi^2_{\alpha,\nu}$, where the degrees
of freedom $\nu$ depend on the number of parameters estimated from the data.
For example, if the null hypothesis is about the normal distribution and $\mu$
and $\sigma$ are estimated from the data and 12 cells are used, then the degrees of
freedom are $12 - 1 - 2 = 9$. A rule of thumb indicates that if any cells have
expected values less than 5, they should be combined with adjoining cells
until the expectation is greater than 5. In this case, the number of cells, $n$,
is reduced and the new $n$ should be used in determining degrees of freedom.
Consider the following two sets of hypotheses:
SET 1                                  SET 2
$H_0$: data from normal dist.          $H_0$: data from normal dist. with $(\mu, \sigma)$
$H_1$: data not from normal dist.      $H_1$: data not from normal dist. with $(\mu, \sigma)$
Note that if $H_0$ in set 1 is rejected, we may conclude that the data is not
from a normal distribution, but if $H_0$ in set 2 is rejected, we conclude only
that the data is not from a normal distribution with parameters $\mu$ and $\sigma$,
which implies that the data could be from another distribution with different
parameters, $\mu'$ and/or $\sigma'$.
Example:
Table 2.4 presents data taken from a population considered
to be exponential. The mean $\theta$ is estimated from the grouped data as:
$$\hat{\theta} = \frac{100(17) + 300(14) + \cdots + 3000(8)}{100} = 960$$
Note that cell mid-points were used to estimate $\theta$ and that 3000 was
arbitrarily assigned to represent values beyond 2400. Expected values are now
calculated using the cumulative exponential, $F(t) = 1 - e^{-t/960}$.
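As a rough check on the expected column, the cell expectations can be computed from the cumulative exponential. This is a sketch under assumptions: the helper names are hypothetical, and the total of 100 observations is inferred from the fact that the tabulated expected values sum to 100. The results agree with the tabulated column only to within rounding.

```python
import math


def exponential_cdf(t, mean):
    """Cumulative exponential distribution, F(t) = 1 - exp(-t / mean)."""
    return 1.0 - math.exp(-t / mean)


def expected_counts(edges, mean, total):
    """Expected frequency per cell; a final open-ended cell beyond
    edges[-1] is appended so the expectations sum to `total`."""
    counts = [
        total * (exponential_cdf(hi, mean) - exponential_cdf(lo, mean))
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    counts.append(total * (1.0 - exponential_cdf(edges[-1], mean)))
    return counts


edges = list(range(0, 2401, 200))   # 0, 200, ..., 2400
expected = expected_counts(edges, mean=960.0, total=100)  # total=100 is an assumption

for lo, hi, e in zip(edges[:-1], edges[1:], expected):
    print(f"{lo}-{hi}: {e:.2f}")
print(f"> {edges[-1]}: {expected[-1]:.2f}")
```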
CELL OBSERVED
0-200 17
200-400 14
400-600 16
600-800 12
800-1000 4
1000-1200 6
1200-1400 6
1400-1600 4
1600-1800 4
1800-2000 2
2000-2200 5
2200-2400 2
> 2400 5
Table 2.4: Data for Chi-Square Goodness-of-Fit Example
CELL OBSERVED EXPECTED
0-200 17 18.81
200-400 14 15.27
400-600 16 12.39
600-800 12 10.07
800-1000 4 8.17
1000-1200 6 6.64
1200-1400 6 5.39
1400-1600 4 4.37
1600-1800 4 3.55
1800-2000 2 2.90
2000-2200 5 2.34
2200-2400 2 1.90
> 2400 5 8.20
The calculated chi-square value is 5.504. This is compared with the chi-square
critical value at $\alpha = 0.05$ with $10 - 1 - 1 = 8$ degrees of freedom (10 cells
after combining those with expectations less than 5, and one estimated parameter):
$\chi^2_{0.05,8} = 15.51$.
Since 5.504 < 15.51, we cannot reject the null hypothesis, and we may conclude
that the exponential distribution is appropriate for this data.
Combining cells with expectations less than 5 we have
CELL OBSERVED EXPECTED
0-200 17 18.81
200-400 14 15.27
400-600 16 12.39
600-800 12 10.07
800-1000 4 8.17
1000-1200 6 6.64
1200-1400 6 5.39
1400-1800 8 7.92
1800-2200 7 5.24
> 2200 7 10.10
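The chi-square calculation for the combined table can be reproduced with a short Python sketch. The function name is hypothetical, and the critical value 15.51 is simply taken from the text rather than computed.

```python
# Observed and expected counts from the combined table (10 cells).
observed = [17, 14, 16, 12, 4, 6, 6, 8, 7, 7]
expected = [18.81, 15.27, 12.39, 10.07, 8.17, 6.64, 5.39, 7.92, 5.24, 10.10]


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1 - 1   # cells - 1 - one estimated parameter
critical = 15.51              # chi-square critical value for alpha = 0.05, 8 d.f. (from the text)
reject = chi2 > critical

print(f"chi-square = {chi2:.3f}, d.f. = {dof}, reject H0: {reject}")
```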
2.7.2 Kolmogorov-Smirnov Test
This non-parametric goodness-of-fit test estimates the empirical distribution
function at each ordered data point $x_{(i)}$ by
$$F_n(x_{(i)}) = \frac{i}{n}, \quad i = 1, 2, \ldots, n.$$
We let $F_0(x_{(i)})$ represent the value of the distribution function evaluated
under the hypothesized density. For each i, we calculate $|F_n(x_{(i)}) - F_0(x_{(i)})|$.
We then obtain the maximum of these absolute deviations and compare
it with the critical K-S value, $D_{n,\alpha}$ (with the risk level $\alpha$). Specifically, if
$$D_n = \max_i |F_n(x_{(i)}) - F_0(x_{(i)})| > D_{n,\alpha},$$
then we may reject the null hypothesis
that the data is a representative sample of the proposed distribution. The
critical values $D_{n,\alpha}$ are given in the Appendix. Also, for n > 35, $D_{n,0.05}$ can
be approximated by $1.36/\sqrt{n}$.
Example: Shown in Table 2.5 below are 10 computer-generated random
numbers. Test the hypothesis that they are from a population uniform on
the unit interval. Note that, for the uniform distribution, $F_0(x_{(i)}) = x_{(i)}$. The
last column of Table 2.5 is for use with the Lohrding test, described in the
next section.
From Table 2.5, we compute $D_n = \max_i |F_n(x_{(i)}) - F_0(x_{(i)})| = 0.161$.
From Table 2.8, $D_{10,0.05} = 0.41$. Since 0.161 < 0.41, we cannot reject the hypothesis that
the data is a sample from a uniform (0,1) distribution.
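The K-S calculation for this example can be sketched in Python as follows. The function names are hypothetical, and the statistic follows the definition used in this section, where the empirical distribution function is evaluated only as i/n at each ordered point (some formulations also compare against (i-1)/n). The critical value 0.41 is taken from Table 2.8.

```python
import math


def ks_statistic(data, cdf):
    """D_n = max_i |i/n - F0(x_(i))|, following the definition in the text."""
    xs = sorted(data)
    n = len(xs)
    return max(abs(i / n - cdf(x)) for i, x in enumerate(xs, start=1))


def ks_critical_05_large_n(n):
    """Approximation 1.36 / sqrt(n) for alpha = 0.05, intended for n > 35."""
    return 1.36 / math.sqrt(n)


# The 10 random numbers of Table 2.5, in their original order.
data = [0.394, 0.639, 0.748, 0.330, 0.539, 0.984, 0.620, 0.240, 0.487, 0.363]

d_n = ks_statistic(data, cdf=lambda x: x)   # uniform (0,1): F0(x) = x
critical = 0.41                             # D_{10,0.05} from Table 2.8
reject = d_n > critical

print(f"D_n = {d_n:.3f}, critical = {critical}, reject H0: {reject}")
```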
Data    Order No.   $x_{(i)}$   $F_n(x_{(i)})$   $F_0(x_{(i)})$   $|F_n - F_0|$   $d_k$
0.394   1           0.240       0.1              0.240            0.140           1.79
0.639   2           0.330       0.2              0.330            0.130           1.33
0.748   3           0.363       0.3              0.363            0.063           0.698
0.330   4           0.394       0.4              0.394            0.006           0.218
0.539   5           0.487       0.5              0.487            0.013           0.223
0.984   6           0.539       0.6              0.539            0.061           0.042
0.620   7           0.620       0.7              0.620            0.080           0.118
0.240   8           0.639       0.8              0.639            0.161           0.684
0.487   9           0.748       0.9              0.748            0.152           0.631
0.363   10          0.984       1.0              0.984            0.016           1.078
Table 2.5: Random numbers and their use in the K-S test and Lohrding's test
2.7.3 The Lohrding Test
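A generally more powerful extension of the K-S goodness-of-fit test has
been developed by Lohrding (1973). The Lohrding test, whose power was
examined using an extensive simulation study, is based on a statistic implied
by Pyke (1959) and used by Maag and Dicaire (1971). Three versions of the
statistic were considered, and the most powerful generally is
$$T = \frac{1}{n} \sum_{k=1}^{n} d_k,$$
where
$$d_k = \frac{\left| F_0(x_{(k)}) - \frac{k}{n+1} \right|}{\sqrt{\frac{(n-k+1)k}{(n+1)^2 (n+2)}}}$$
Here $F_0(x_{(k)})$ is the hypothesized distribution function evaluated at the k-th
ordered observation, while $k/(n+1)$ and the quantity under the square root are the
mean and variance of the k-th order statistic of a uniform (0,1) sample of size n.
Critical values for T are given by $T(\alpha) = P_1 + P_2 (n + P_3)^{P_4}$, where for
$\alpha$ = 0.10, 0.05, 0.01, the P values are presented in Table 2.6.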
Example:
Consider the data in Table 2.5. Recall that the data represents 10 random
numbers. Test the hypothesis that they represent the uniform distribution
on (0,1). Note that for the uniform distribution, $F_0(x_{(i)}) = x_{(i)}$.
Using the Lohrding test with the data in Table 2.5, the average of the $d_k$'s
(last column of the table) is T = 0.6813 and the critical value is T(0.05) = 1.47.
Since 0.6813 < 1.47, this results in the same conclusion, not to reject the
hypothesis, that was reached using the K-S test with this example.
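The Lohrding statistic and its critical value can be sketched in Python as below. Two assumptions are made and should be read as such: the numerator of each $d_k$ uses the hypothesized distribution function $F_0$ evaluated at the ordered observation (this is what the tabulated $d_k$ column is consistent with), and the critical value uses the form $P_1 + P_2 (n + P_3)^{P_4}$ with the P values of Table 2.6. Because the code works with unrounded arithmetic, its value of T need not match the tabulated 0.6813 exactly.

```python
import math

# P values from Table 2.6, keyed by the risk level alpha.
P_VALUES = {
    0.10: (1.23, 6.48, 3.18, -2.05),
    0.05: (1.42, 2.13, 2.26, -1.54),
    0.01: (1.84, 0.17, -4.02, -0.753),
}


def lohrding_statistic(data, cdf):
    """T = average of d_k, where d_k standardizes |F0(x_(k)) - k/(n+1)|
    (F0 in the numerator is an assumption, see the lead-in)."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        scale = math.sqrt((n - k + 1) * k / ((n + 1) ** 2 * (n + 2)))
        total += abs(cdf(x) - k / (n + 1)) / scale
    return total / n


def lohrding_critical(n, alpha):
    """Critical value T(alpha) = P1 + P2 * (n + P3) ** P4."""
    p1, p2, p3, p4 = P_VALUES[alpha]
    return p1 + p2 * (n + p3) ** p4


data = [0.394, 0.639, 0.748, 0.330, 0.539, 0.984, 0.620, 0.240, 0.487, 0.363]

t_stat = lohrding_statistic(data, cdf=lambda x: x)   # uniform (0,1): F0(x) = x
t_crit = lohrding_critical(len(data), alpha=0.05)
reject = t_stat > t_crit

print(f"T = {t_stat:.4f}, T(0.05) = {t_crit:.2f}, reject H0: {reject}")
```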
Also given in Lohrding (1973) are $100(1 - \alpha)\%$ confidence bounds on the
cdf F that are comparable to the corresponding K-S bounds but which, for
large sample sizes, are generally better, that is, narrower in the tails of
the distribution than the K-S bounds. The Lohrding bounds are:
Lower $100(1 - \alpha)\%$ bound: $\frac{k}{n+1} - B(\alpha) \sqrt{\frac{(n-k+1)k}{(n+1)^2 (n+2)}}$
Upper $100(1 - \alpha)\%$ bound: $\frac{k}{n+1} + B(\alpha) \sqrt{\frac{(n-k+1)k}{(n+1)^2 (n+2)}}$
where $B(\alpha) = P_1 + P_2 (n + P_3)^{P_4}$ and the P values are given in Table 2.6.
$\alpha$   $P_1$   $P_2$   $P_3$    $P_4$
0.10       1.23    6.48    3.18     -2.05
0.05       1.42    2.13    2.26     -1.54
0.01       1.84    0.17    -4.02    -0.753
Table 2.6: Values of P used to compute the critical values of $T(\alpha)$
For the above example, the 95% confidence bounds on F for both the K-S
and the Lohrding techniques are given in Table 2.7. Note that the confidence
width for the Lohrding technique is smaller than that of the K-S in the tails
of the distribution.
Data interval   LL      CL       UL      LK-S   CK-S   UK-S
0-0.240         0       0        0.245   0      0      0.41
0.240-0.33      0       0.0909   0.418   0      0.1    0.51
0.33-0.363      0       0.1818   0.562   0      0.2    0.61
0.363-0.394     0       0.2727   0.683   0      0.3    0.71
0.394-0.487     0       0.3636   0.785   0      0.4    0.81
0.487-0.539     0.033   0.4545   0.876   0.09   0.5    0.91
0.539-0.620     0.124   0.5454   0.955   0.19   0.6    1
0.620-0.639     0.226   0.6364   1       0.29   0.7    1
0.639-0.748     0.347   0.7273   1       0.39   0.8    1
0.748-0.984     0.491   0.8182   1       0.49   0.9    1
0.984-1         0.664   0.9091   1       0.59   1      1
Table 2.7: Examples of K-S (LK-S and UK-S) and Lohrding (LL and UL)
95% Confidence Bounds
2.8 References
Advisory Group on Reliability of Electronic Equipment (AGREE) (1957),
Reliability of Military Electronic Equipment, Task Group 9 Report,
Washington, DC, US Government Printing Office, June.
Englehardt, M. and Bain, L. J. (1978), Prediction Intervals for the
Weibull Process, Technometrics, 20, pp. 167-169.
Hahn, G. J. and Nelson, W. B. (1973), A Survey of Prediction Intervals
and Their Applications, Journal of Quality Technology, 5, pp. 178-188.
Hall, I. J. and Prairie, R. R. (1971), Prediction Intervals in Reliability
Work, Sandia Laboratories Report, SC-DR-70-833, Albuquerque, NM.
Hall, I. J. and Prairie, R. R. (1973), One-Sided Prediction Intervals to
Contain at Least m Out of k Future Observations, Technometrics, 15, No.
4, pp. 897-914.
Hines, William W. and Montgomery, Douglas C., Probability and Statis-
tics in Engineering and Management Science, John Wiley & Sons, New York,
1990.
Lindley, D. V. (1969), Introduction to Probability and Statistics From a
Bayesian Viewpoint, Part I Probability , Cambridge University Press, Cam-
bridge, England.
Lohrding, Ronald K. (1973), Three Kolmogorov-Smirnov Type One-Sample
Tests with Improved Power Properties, Journal of Stat. Computing and Sim-
ulation, Vol. 2, pp. 139-18.
Maag, U. R. and Dicaire, G. (1971), On Kolmogorov-Smirnov Type One-
Sample Statistics, Biometrika, Vol. 54, pp. 653-656.
Owen, D. B. (1962), Handbook of Statistical Tables, Addison-Wesley Pub-
lishing Company, Inc., Reading, MA.
Pyke, R. (1959), The Supremum and Infimum of the Poisson Process,
Annals of Math. Stat., Vol. 30, pp. 568-576.
Savage, L. J. (1954), The Foundations of Statistics, John Wiley & Sons,
New York.
2.9 Problems for Chapter 2
Problem 2.1 Let a sample space be made up of 7 simple events: $e_1, e_2, e_3, e_4, e_5, e_6, e_7$.
Let $P\{e_1\} = P\{e_3\} = P\{e_5\} = P\{e_7\} = 0.1$, $P\{e_2\} = 0.2$, $P\{e_4\} = 0.05$, $P\{e_6\} = 0.35$.
Event A is made up of $e_1, e_3$; event B is made up of $e_1, e_2, e_5, e_7$; and
event C is made up of $e_3, e_4, e_5, e_6, e_7$. Determine:
a) $P(A)$ b) $P(B)$ c) $P(A \cup B)$ d) $P(A \cap B)$ e) $P(C|A)$ f) $P(A^c)$
g) if A and C are independent h) if A and B are independent
i) if there can be any other event D in the sample space that is mutually
exclusive of $B \cup C$.
Problem 2.2 A small lot of ten items contains 2 defective items. The experiment is:
a random sample of 3 items is selected from the lot and the type of item
(defective [D] or non-defective [N]) is determined. Case A: replace-
ment sampling is used, that is, an item is selected, the defectiveness
determined and the item is replaced in the sample before the next item
is selected. Case B: the sampling is performed using non-replacement.
a) Write out the simple events that make up the sample space S.
b) Let A be the event that 2 non-defective items are selected in a row.
Find P{A}.
c) Let B be the event that the 2 defective items are selected in a row.
Find P{B}.
Problem 2.3 Why aren't there many 5-engine airplanes? Well, probably for other
reasons, but consider the following problems. If at least half of a plane's
engines must be functioning for the plane to fly,
a) are there any values of q for which a 3-engine plane is safer than
a 4-engine plane, where q is the probability that an engine will be
functioning?
b) for which q is a 3-engine plane safer than a 5-engine plane? Assume
that the engines function independently.
Problem 2.4 Six dice are tossed.
a) What is the probability that they are all 2's?
b) What is the probability that every possible number appears?
Answer a) and b) if 7 dice are tossed.
Problem 2.5 What is the probability of winning at craps? (One wins on the first toss
with a 7 or 11. If one gets a 4, 5, 6, 8, 9, 10 on the first toss, one tosses
again until the first toss is repeated for a win or a 7 is tossed for a loss.)
Problem 2.6 What is the probability that in a group of 23 people, no two people
will have the same birthday?
Problem 2.7 The proportion of women who vote Republican is .45, while the pro-
portion of men who vote Republican is .55. What is the probability
that a person chosen at random from the population is a Republican,
if women make up 55% of the population?
Problem 2.8 A red die and a black die are tossed. What is the probability that
the black die is a 2, if the sum of the numbers is known to be 5? Are
these events independent? What is the probability that the black die
is a 2, if the sum of the numbers is known to be 7? Are these events
independent?
Problem 2.9 An urn contains 10 identical balls, of which 2 are white and 8 are black.
If 3 balls are drawn without replacement,
a) what is the probability that 2 are black,
b) what is the probability that at least 2 are black?
c) Show that the answers to a) and b) can be obtained using the hy-
pergeometric distribution.
Problem 2.10 The Arizona lottery has the game Lotto, for which one wins by match-
ing 3, 4, 5, or all 6 numbers drawn in any order, where the numbers
range from 1 to 42. The odds for winning the jackpot, that is, match-
ing all 6 out of the 6 numbers drawn are listed as 1:5,245,786. The
second prize, matching any 5 out of the 6, has odds of 1:24,286.05. For
matching any 3 out of 6, one wins $2, at odds of 1:36.74.
a) Compute the odds of winning the jackpot.
b) Verify the odds of winning the second prize.
c) Verify the odds of winning $2.
d) Which distribution can be used to compute the above odds and redo
the above using the distribution.
Problem 2.11 A consumer is deciding whether to buy a lot of N=200 items which the
producer guarantees to contain D=2 or fewer defective items. If the lot
contains as many as D=6 defective items, the consumer will not buy
the lot.
a) Which distribution represents the distribution of the number of de-
fective items in a sample of n items, where n > 20?
b) How large a sample must be taken, and which value of c must
be used, so that:
$P\{\text{accept lot} \mid D = 2\} \ge 0.90$
$P\{\text{reject lot} \mid D = 6\} \ge 0.90$
where the lot is accepted if the observed number of defectives in the
sample is less than or equal to c and rejected otherwise.
c) Could one do better, that is, save money with the same statistical
requirements, if one sampled with replacement?
Problem 2.12 A production line has a constant probability p that an item from the
line is defective. Assume that items are sampled from the line in an
independent manner.
a) What distribution usually is chosen to represent the number of fail-
ures in a sample of n items?
b) What distribution usually is chosen to represent the number of items
sampled until the first failure is found?
c) What distribution usually is chosen to represent the number of items
sampled until the rth failure is found?
d) What distribution usually is chosen to represent the number of failures
in a sample of n items, if the size of the sample is very large and p
is small?
e) What distribution usually is chosen to represent the number of fail-
ures in a sample of n items, if the size of the sample is very large and
p is not small?
f) What are the continuous analogues of the above distributions?
Problem 2.13 One is trying to assess if a large production line has an acceptable
percent defective $p_a = 0.02$, or an unacceptable percent defective $p_u = 0.06$.
Devise a sampling plan, based on the observed number of defectives
c and the sample size n, so that:
$P\{X \le c \mid p = 0.02\} \ge 0.95$
$P\{X \le c \mid p = 0.06\} \le 0.05$
where X represents the random variable of the number of defectives in
the sample.
Problem 2.14 The Poisson distribution is an adequate approximation to the binomial
if the number of trials n is large and the proportion p is small. Repeat
Problem 2.13 using the Poisson distribution and show that the results
hold whenever the ratio $p_u / p_a = 3$ with the appropriate sample size.
Problem 2.15 A function f is defined by:
$$f(x) = \begin{cases} kx, & \text{for } 0 < x < 1 \\ \frac{k}{2}(3 - x), & \text{for } 1 \le x \le 3 \\ 0, & \text{elsewhere} \end{cases}$$
a) What is the value of k so that f is a density?
b) What is E(X), where X is the random variable whose density is f(x)?
c) What is V(X), the variance of X?
d) What is the median of X?
e) What is the coefficient of skewness of X?
f) What is the coefficient of kurtosis of X?
Problem 2.16 For the following data, draw a stem plot and a box plot. Compute the
sample 5-number summary and the sample coefficients of skewness and
kurtosis. 23.20, 17.33, 26.47, 32.66, 32.19, 35.40, 11.90, 23.59, 31.57,
18.48, 20.86, 14.86, 13.92, 19.13, 20.36, 12.29, 21.59, 22.58, 25.81, 22.81,
23.04, 22.78, 33.06, 24.49
Problem 2.17 Apply the K-S and Lohrding tests for the goodness of fit of the data
4.8 8.0 13.0 46.6 10.1 6.6 22.6 4.5
3.9 17.4 6.3 5.4 5.2 9.2 9.1
to the exponential distribution with the mean equal to 10. Compare the
results to the chi-square test.
n   $\alpha$ = 0.20   0.15   0.10   0.05   0.01
1 0.900 0.925 0.950 0.975 0.995
2 0.684 0.726 0.776 0.842 0.929
3 0.565 0.597 0.642 0.708 0.828
4 0.494 0.525 0.564 0.624 0.733
5 0.446 0.474 0.510 0.565 0.669
6 0.410 0.436 0.470 0.521 0.618
7 0.381 0.405 0.438 0.486 0.577
8 0.358 0.381 0.411 0.457 0.543
9 0.339 0.360 0.388 0.432 0.514
10 0.322 0.342 0.368 0.410 0.490
11 0.307 0.326 0.352 0.391 0.468
12 0.295 0.313 0.338 0.375 0.450
13 0.284 0.302 0.325 0.361 0.433
14 0.274 0.292 0.314 0.349 0.418
15 0.266 0.283 0.304 0.338 0.404
16 0.258 0.274 0.295 0.328 0.392
17 0.250 0.266 0.286 0.318 0.381
18 0.244 0.259 0.278 0.309 0.371
19 0.237 0.252 0.272 0.301 0.363
20 0.231 0.246 0.264 0.294 0.356
25 0.210 0.220 0.240 0.270 0.320
30 0.190 0.200 0.220 0.240 0.290
35 0.180 0.190 0.210 0.230 0.270
> 35 $1.07/\sqrt{n}$ $1.14/\sqrt{n}$ $1.22/\sqrt{n}$ $1.36/\sqrt{n}$ $1.63/\sqrt{n}$
Table 2.8: Critical values of K-S test