1. Probability Theory
Our discussion of probability starts with set theory. We first state the basic concepts, then the set operations (with their laws) and finally define functions. We then discuss the different approaches to the theory of probability, the laws of probability, conditional probability and Bayes’ theorem, illustrating them with examples.
1.1 Basic Concepts
• An experiment is any process that generates well-defined outcomes.
• The sample space for an experiment is the set of all experimental outcomes.
• A sample point is an element of the sample space, any one particular experimental outcome.
• The outcomes in a sample space are mutually exclusive (two outcomes cannot occur at the same time) and exhaustive (one outcome must occur).
Sets are unordered collections of elements. Elements are usually named with lower case letters. Sets are usually named with capital letters.
• Finite Sets: Finite number of elements.
• Denumerable Sets: Each element is identifiable, countable and distinct. In nondenumerable sets, the elements can be classified as having certain characteristics, but each element cannot be separated or identified.
There are three main ways to specify a set:
(1) Listing all its members (list notation);
Example: {zen, astra, santro}
(2) Stating a property of its elements (predicate notation);
Example: {x | x is a natural number and x < 8}
Reading: “the set of all x such that x is a natural number and is less than 8”. The second part of this notation is a property that the members of the set share (a condition or a predicate which holds for members of this set).
General form: {x | P(x)}, where P is some predicate (condition, property).
(3) Defining a set of rules which generates (defines) its members (recursive rules).
Example – the set E of even numbers greater than 3:
a) 4 ∈ E
b) if x ∈ E, then x + 2 ∈ E
c) nothing else belongs to E.
The first rule is the basis of recursion, the second one generates new elements from the elements defined before and the third rule restricts the defined set to the elements generated by rules a) and b).
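The three ways of specifying a set can be mirrored in a few lines of Python. This is only an illustrative sketch: the function name, the search range of 100, the cut-off of 20 and the convention that natural numbers start at 1 are assumptions made for the example, not part of the notes.

```python
# Illustrative sketch; names and bounds are assumptions made for this example.

# (1) List notation: write out every member.
cars = {"zen", "astra", "santro"}

# (2) Predicate notation: {x | x is a natural number and x < 8}.
# Natural numbers are taken to start at 1 here (an assumed convention).
small_naturals = {x for x in range(1, 100) if x < 8}

# (3) Recursive rules for E, the set of even numbers greater than 3.
def even_numbers_above_three(limit):
    """Return the members of E that do not exceed `limit`."""
    members = set()
    x = 4                  # rule a): 4 belongs to E
    while x <= limit:
        members.add(x)
        x += 2             # rule b): if x is in E, then x + 2 is in E
    return members         # rule c): nothing else is ever added

E_up_to_20 = even_numbers_above_three(20)
print(sorted(small_naturals))
print(sorted(E_up_to_20))
```

Because E is infinite, the recursive rules can only be unrolled up to a chosen bound; the bound is a property of the sketch, not of the set.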
Two sets A and B are equal if and only if every element of A belongs to B and every element of B belongs to A.
A set A is a subset of a set B if and only if every element of A is also an element of B.
Such a relation between sets is denoted by A ⊆ B. If A ⊆ B and A ≠ B we call A a proper subset of B and write A ⊂ B. Note the largest possible subset of a given set is the set itself whereas the smallest possible subset is the null set for any given set.
If two sets have no elements in common, they are called disjoint sets (mutually exclusive sets).
1.2 Set operations
Union Set: Set obtained by combining two or more sets.
A ∪ B = {x : x ∈ A or x ∈ B}
Intersection Set: Set obtained by combining two or more sets using only common elements.
A ∩ B = {x : x ∈ A and x ∈ B}
Universal Set (Ω): Totality of all the elements under consideration.
Complement: Given Ω and A, A^C = {x : x ∉ A and x ∈ Ω}
Set Difference: A − B is the set of all elements that are in A but not in B.
1.2.1 Laws of Set Operations
a) Commutative Law: A ∪ B = B ∪ A; A ∩ B = B ∩ A
b) Associative Law: A ∪ (B ∪ C) = ( A ∪ B) ∪ C; A ∩ (B ∩ C) = ( A ∩ B) ∩ C
c) Distributive Law: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Note that, given the above:
(A^C)^C = A
A ∩ Ω = A
A ∩ A^C = Φ
A ∪ A^C = Ω
1.2.2 De Morgan’s Law
(i) (A ∪ B)^C = A^C ∩ B^C
(ii) (A ∩ B)^C = A^C ∪ B^C
Proof:
Method 1
Let (A ∪ B)^C ≠ A^C ∩ B^C. Suppose the following:
x ∈ (A ∪ B)^C and x ∉ A^C ∩ B^C
Then
x ∉ A^C ∩ B^C
⇒ x ∉ (Ω − A) ∩ (Ω − B)
⇒ x ∈ A or x ∈ B
⇒ x ∉ Ω − (A ∪ B)
⇒ x ∉ (A ∪ B)^C
This is contradictory, since x ∈ (A ∪ B)^C was supposed. Supposing instead that x ∈ A^C ∩ B^C and x ∉ (A ∪ B)^C leads to a contradiction in the same way. Hence the assumption (A ∪ B)^C ≠ A^C ∩ B^C is not correct. Therefore we conclude:
(A ∪ B)^C = A^C ∩ B^C
Method 2
Let x ∈ (A ∪ B)^C
⇒ x ∉ A ∪ B
⇒ x ∉ A and x ∉ B
⇒ x ∈ A^C and x ∈ B^C
⇒ x ∈ A^C ∩ B^C
⇒ (A ∪ B)^C ⊂ A^C ∩ B^C
Let x ∈ A^C ∩ B^C
⇒ x ∈ A^C and x ∈ B^C
⇒ x ∉ A and x ∉ B
⇒ x ∉ A ∪ B
⇒ x ∈ (A ∪ B)^C
⇒ A^C ∩ B^C ⊂ (A ∪ B)^C
Since each set is a subset of the other, (A ∪ B)^C = A^C ∩ B^C.
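De Morgan's laws, (A ∪ B)^C = A^C ∩ B^C and (A ∩ B)^C = A^C ∪ B^C, can also be checked by brute force on a small universal set. The sketch below is a numerical sanity check, not a proof; the five-element universe and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations

# Small universal set, chosen arbitrarily for the check.
omega = frozenset(range(5))

def all_subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def complement(a, universe=omega):
    """Complement of `a` relative to the universal set."""
    return universe - a

subsets = all_subsets(omega)

# Law (i): complement of a union equals the intersection of complements.
law_one_holds = all(complement(a | b) == complement(a) & complement(b)
                    for a in subsets for b in subsets)

# Law (ii): complement of an intersection equals the union of complements.
law_two_holds = all(complement(a & b) == complement(a) | complement(b)
                    for a in subsets for b in subsets)

print(law_one_holds, law_two_holds)
```

The check runs over all pairs of subsets of the chosen universe, so it confirms the laws only for that universe; the element-wise argument is what establishes them in general.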
The second part of the theorem is left for you to prove. We summarize the laws on set operations below:
S.No | Operator | Symbol | Example | Meaning
1 | Union | ∪ | A ∪ B | The event of either A or B occurring.
2 | Finite union | ⋃_{i=1}^{n} A_i | ⋃_{i=1}^{3} A_i | The event of any one of the events A_1, A_2 and A_3 occurring.
3 | Countable union | ⋃_{i=1}^{∞} A_i | ⋃_{i=1}^{∞} A_i | The event of any one of the events A_1, A_2, … occurring.
4 | Intersection | ∩ | A ∩ B | The event of both A and B occurring.
5 | Finite intersection | ⋂_{i=1}^{n} A_i | ⋂_{i=1}^{3} A_i | The event of all the events A_1, A_2 and A_3 occurring.
6 | Countable intersection | ⋂_{i=1}^{∞} A_i | ⋂_{i=1}^{∞} A_i | The event of all the events A_1, A_2, … occurring.
7 | Complementation | c or ‾ | A^c or Ā | The event of A not occurring.
8 | Subtraction | − | A − B | The event of A occurring and B not occurring.
1.3 Function
A function is a rule or law that associates each element in one set with one and only one element in another set of elements.
Let a_1 ∈ A and a_2 ∈ B, with a_2 = f(a_1). Then
f(.) = {(a_1, a_2) : a_1 ∈ A, a_2 ∈ B, a_2 = f(a_1)}
Let A ⊂ Ω and x ∈ Ω. We can define an indicator function as
I_A(x) = 1 if x ∈ A
I_A(x) = 0 if x ∉ A
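An indicator function is easy to express in code. The sketch below is a minimal illustration; the die-throw sample space and the names `indicator`, `omega` and `odd` are hypothetical choices made for the example.

```python
def indicator(a):
    """Return the indicator function I_A of the set `a`."""
    def i_a(x):
        # 1 if x belongs to A, 0 otherwise
        return 1 if x in a else 0
    return i_a

omega = {1, 2, 3, 4, 5, 6}   # hypothetical sample space: one throw of a die
odd = {1, 3, 5}              # the subset A

i_odd = indicator(odd)
values = [i_odd(x) for x in sorted(omega)]
print(values)
```

Summing the indicator over Ω counts the elements of A, which is one reason indicator functions are convenient when counting outcomes.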
1.4 Sample Space and Event
The sample space for an experiment is the set of all experimental outcomes. The outcomes in a sample space are mutually exclusive (two outcomes cannot occur at the same time) and exhaustive (one outcome must occur).
Examples
Experiment | Sample Space
Toss a Coin, Note Face | {Head, Tail}
Toss 2 Coins, Note Faces | {HH, HT, TH, TT}
Play a Soccer Game | {Win, Lose, Draw}
Inspect a Part, Note Quality | {Defective, Good}
Observe Gender | {Male, Female}
An event is any collection of sample points. A simple event is an outcome with one characteristic, whereas a compound event refers to:
• A collection of outcomes or simple events
• Two or more characteristics
• A joint event, which is a special case
Examples
Experiment: Toss 2 Coins. Note Faces.
Sample Space: HH, HT, TH, TT
Event | Outcomes in Event
1 Head & 1 Tail | HT, TH
Head on 1st Coin | HH, HT
At Least 1 Head | HH, HT, TH
Heads on Both | HH
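The two-coin experiment can be enumerated directly. The following sketch builds the sample space and the four events from the table; the dictionary layout and the string encoding of outcomes ("HT" meaning head then tail) are assumptions made for illustration.

```python
from itertools import product

# Sample space for tossing two coins: every ordered pair of faces.
sample_space = {"".join(faces) for faces in product("HT", repeat=2)}

# Each event is a subset of the sample space.
events = {
    "1 Head & 1 Tail": {s for s in sample_space if s.count("H") == 1},
    "Head on 1st Coin": {s for s in sample_space if s[0] == "H"},
    "At Least 1 Head": {s for s in sample_space if "H" in s},
    "Heads on Both": {s for s in sample_space if s == "HH"},
}

for name, outcomes in events.items():
    print(name, sorted(outcomes))
```

"Heads on Both" contains a single sample point and is a simple event; the other three are compound events.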
1.5 Different approaches to the theory of Probability
There are three major approaches to the theory of Probability: (1) Classical approach (2) Frequency approach and (3) Axiomatic approach.
1.5.1 Classical approach
The classical approach makes the following assumptions:
(a) The sample space is a finite set of elementary events.
(b) All elementary events are equally likely to occur in a single trial of the experiment.
Classical (or Laplace) definition of Probability:
Under the above assumptions, the probability of any event E is given by the ratio
$$P(E) = \frac{\text{number of elementary events in } E}{\text{total number of elementary events in } S}.$$
This formula will help us to compute the probabilities of many events under the classical (finite sample space equally likely elementary events) set up. Note that the probability of any event lies between 0 and 1.
Example 1: The probability of getting an odd number when a die is thrown is ½.
Example 2: In a class of 15 male students and 14 female students, two class representatives are to be selected at random. Then the probability of selecting one male student and one female student is:
$$\frac{\binom{15}{1}\binom{14}{1}}{\binom{29}{2}} = \frac{210}{406} = \frac{15}{29}.$$
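The classical definition reduces to counting, so both examples can be evaluated exactly. In the sketch below, Example 2 is read as choosing one male and one female representative (the reading that matches the ratio of binomial coefficients); `classical_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def classical_probability(favourable, total):
    """Ratio of favourable to total equally likely elementary events."""
    return Fraction(favourable, total)

# Example 1: an odd number on one throw of a die.
p_odd = classical_probability(len({1, 3, 5}), 6)

# Example 2: one male out of 15 and one female out of 14,
# when 2 representatives are drawn from 29 students.
p_one_each = classical_probability(comb(15, 1) * comb(14, 1), comb(29, 2))

print(p_odd, p_one_each)
```

Using `Fraction` keeps the results exact, so they can be compared with hand calculations without rounding error.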
Note that the definition will give P(S) = 1 and P(φ) = 0. The certain event has probability 1 and the impossible event has probability 0!
Two events A and B are said to be mutually exclusive if A ∩ B = φ. It is easy to prove that for any two mutually exclusive events P(A ∪ B) = P(A) + P(B).
In the classical set-up (finite sample space, equally likely elementary events), two events A and B are said to be independent if P(A ∩ B) = P(A) P(B).
One can easily establish the following results using the definition:
1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
2. P(A^c) = 1 − P(A).
3. P(A − B) = P(A) − P(B) if A ⊃ B.
4. If A_1, A_2, …, A_k are k mutually exclusive events then
P(A_1 ∪ A_2 ∪ … ∪ A_k) = P(A_1) + P(A_2) + … + P(A_k).
5. If A_1, A_2, …, A_k are k mutually exclusive and exhaustive events (i.e. their union is S), then for any event B
P(B) = P(B ∩ A_1) + P(B ∩ A_2) + … + P(B ∩ A_k).
6. For any event E we have 0 ≤ P(E) ≤ 1.
This approach has some limitations:
1. The method fails if the sample space is infinite.
2. The method fails when the elementary events are not equally likely.
1.5.2 Relative frequency approach
If the elementary events are not equally likely, and even if the sample space is infinite, one can adopt this approach for any event. Let E be the event whose probability of occurrence we want to compute. Let the experiment be repeated a large number of times, say N times, and let M denote the number of times E has occurred. Then the probability of E is defined by
$$P(E) = \lim_{N \to \infty} \left[ \frac{M}{N} \right]$$
Note that this definition is based on a limiting concept. The limit has been shown, mathematically, to exist, but from a practical point of view one cannot conduct the experiment an infinite number of times to find P(E). So one approximates P(E) by the ratio M/N for sufficiently large N. The drawback of this approach is that it requires a large number of replications of the experiment.
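The relative frequency idea can be illustrated by simulation: repeat an experiment N times, count the M occurrences of E, and report M/N. The sketch below uses a simulated fair die and the event "odd number"; the function names, the seed and the values of N are assumptions made for the example.

```python
import random

def relative_frequency(event, trial, n, seed=0):
    """Estimate P(E) as M/N from n simulated repetitions of `trial`."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n) if event(trial(rng)))
    return hits / n

def throw_die(rng):
    """One throw of a fair six-sided die."""
    return rng.randint(1, 6)

def is_odd(face):
    """The event E: the face shown is odd."""
    return face % 2 == 1

estimates = {n: relative_frequency(is_odd, throw_die, n)
             for n in (100, 10_000, 100_000)}
for n, estimate in estimates.items():
    print(n, estimate)
```

The estimates should settle near the classical value ½ as N grows, but any finite N gives only an approximation.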
1.5.3 Axiomatic approach
The sample space, the set of all possible outcomes of the random experiment, can be uncountably infinite, like the real line R = (−∞, ∞). Note that this includes the finite sample space of the classical approach. Let us denote the sample space by Ω. We do not assign probabilities to all subsets of Ω, as that can be impossible when Ω is uncountably infinite. Instead we concentrate on a class of interesting events that is sufficient for our inference. Such a class, say F, should satisfy the following three requirements:
1. The sample space Ω belongs to F.
2. If A belongs to F then A^c also belongs to F.
3. If {A_n} is any sequence of events in F then ⋃_{n=1}^{∞} A_n also belongs to F.
Any collection of events satisfying the above requirements is called a σ-field of events. We will call this class the event space.
1.5.3.1 Axioms of Probability
Given the sample space Ω and the event space F, the probability P is a nonnegative real-valued function on the event space F satisfying the following three axioms:
Axiom 1: P(A_i) ≥ 0 for any event A_i in F.
Axiom 2: P(Ω) = 1.
Axiom 3: For any sequence {A_n} of mutually disjoint (exclusive) events in F,
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n).$$
The triplet (Ω, F, P) is called the probability space.
We can use the above axioms to state the following theorems.
Theorem 1
P(Φ) = 0
Proof: Choose an infinite sequence of events A_i such that A_i = Φ for all i = 1, 2, 3, …. By definition they are mutually exclusive, and their union is Φ.
Using Axiom 3,
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i), \quad \text{i.e.} \quad P(\Phi) = \sum_{i=1}^{\infty} P(\Phi).$$
A nonnegative number can equal the infinite sum of copies of itself if and only if the number itself is zero. This implies P(Φ) = 0.
Theorem 2
P(A^C) = 1 − P(A)
Proof:
A^C = Ω − A
A ∩ A^C = Φ
P(Ω) = P(A ∪ A^C) = P(A) + P(A^C) = 1
⇒ P(A^C) = 1 − P(A)
Theorem 3
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof:
A ∪ B = A ∪ (A^C ∩ B), with A ∩ (A^C ∩ B) = Φ
P(A ∪ B) = P(A) + P(A^C ∩ B)
Since B = (A ∩ B) ∪ (A^C ∩ B) is also a union of disjoint events, P(A^C ∩ B) = P(B) − P(A ∩ B). Hence
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
General Result:
If A_1, A_2, …, A_n are not mutually exclusive events and A_i ∈ F for any i = 1,2,…,n then
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{j=1}^{n} P(A_j) - \sum\sum_{i<j} P(A_i \cap A_j) + \sum\sum\sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n)$$
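The general addition rule (inclusion-exclusion) can be checked numerically on a finite, equally likely sample space. The sketch below compares the probability of a union computed directly with the alternating sum over intersections; the twelve-point sample space and the three events are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(1, 13))    # 12 equally likely outcomes (hypothetical)

def prob(event):
    """Classical probability of an event in omega."""
    return Fraction(len(event), len(omega))

events = [
    {x for x in omega if x % 2 == 0},   # even outcomes
    {x for x in omega if x % 3 == 0},   # multiples of three
    {x for x in omega if x > 8},        # outcomes above eight
]

def inclusion_exclusion(events):
    """Alternating sum over all r-fold intersections, r = 1..n."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group))
    return total

direct = prob(set.union(*events))
via_formula = inclusion_exclusion(events)
print(direct, via_formula)
```

With n = 2 the loop reduces to Theorem 3: the single-event terms are added and the one pairwise intersection is subtracted.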
Theorem 4
Let A, B ∈ F and A ⊂ B, then P(A) ≤ P(B)
Proof:
B = (B ∩ A) ∪ (B ∩ A^C), with (B ∩ A) ∩ (B ∩ A^C) = Φ
P(B) = P(B ∩ A) + P(B ∩ A^C)
Since A ⊂ B, B ∩ A = A, and P(B ∩ A^C) ≥ 0. Hence
P(B) = P(A) + P(B ∩ A^C) ≥ P(A), i.e. P(A) ≤ P(B)
Boole’s inequality (Proof omitted)
If A_i ∈ F for any i = 1,2,…,n then
P(A_1 ∪ A_2 ∪ … ∪ A_n) ≤ P(A_1) + P(A_2) + … + P(A_n)
1.6 Counting Rules
A useful counting rule enables us to count the number of experimental outcomes when n objects are to be selected from a set of N objects where the order of selection is important.
Number of permutations of N objects taken n at a time:
$$P^N_n = n!\binom{N}{n} = n!\,\frac{N!}{n!\,(N-n)!} = \frac{N!}{(N-n)!}$$
Another useful counting rule enables us to count the number of experimental outcomes when n objects are to be selected from a set of N objects. Here the order of selection is not important. The number of combinations of N objects taken n at a time is:
$$C^N_n = \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
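The two counting rules translate directly into factorial arithmetic. The sketch below implements both formulas and cross-checks them against Python's built-in `math.perm` and `math.comb` (available from Python 3.8); the values N = 5 and n = 2 are arbitrary.

```python
from math import comb, factorial, perm

def permutations_count(N, n):
    """Ordered selections of n objects from N: N! / (N - n)!."""
    return factorial(N) // factorial(N - n)

def combinations_count(N, n):
    """Unordered selections of n objects from N: N! / (n! (N - n)!)."""
    return factorial(N) // (factorial(n) * factorial(N - n))

N, n = 5, 2   # arbitrary illustrative values
print(permutations_count(N, n), combinations_count(N, n))

# Each unordered selection can be arranged in n! orders.
print(permutations_count(N, n) == factorial(n) * combinations_count(N, n))
```

The last line confirms the relation between the two rules: the number of permutations is n! times the number of combinations.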
Examples
Example 1
A problem is given to three students whose chances of solving it are ½, ¾, and ¼ respectively.
a) What is the chance that the problem is being solved?
b) What is the chance that exactly one of them solves it?
Let A, B and C denote the events that the first, second and third student solve the problem; the three events are taken to be independent.
a) P(solved) = 1 – P(not solved) = 1 – (1/2)(1/4)(3/4) = 1 – (3/32) = 29/32
b) P(exactly one of them solves it)
= P(A solves, B and C do not) + P(B solves, A and C do not) + P(C solves, A and B do not)
= P(A ∩ B^C ∩ C^C) + P(B ∩ A^C ∩ C^C) + P(C ∩ A^C ∩ B^C)
= (1/2)(1/4)(3/4) + (3/4)(1/2)(3/4) + (1/4)(1/2)(1/4) = 3/32 + 9/32 + 1/32
= 13/32
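Example 1 can be verified by enumerating all eight solve/fail patterns for the three students. The sketch assumes, as the worked solution does implicitly, that the students solve the problem independently; the names `p_solve` and `prob_of` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Chances of solving for the three students, as given in Example 1.
p_solve = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 4)]

def prob_of(outcome):
    """Probability of one solve/fail pattern, assuming independence."""
    result = Fraction(1)
    for solved, p in zip(outcome, p_solve):
        result *= p if solved else 1 - p
    return result

# All 2^3 patterns; True means that student solves the problem.
outcomes = list(product([True, False], repeat=3))

p_solved = sum(prob_of(o) for o in outcomes if any(o))
p_exactly_one = sum(prob_of(o) for o in outcomes if sum(o) == 1)
print(p_solved, p_exactly_one)
```

Enumerating patterns avoids having to spot the complement trick, at the cost of more terms; both routes give the same fractions.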
Example 2
An urn contains 4 Red, 3 White and 2 Blue balls. A person draws 4 balls without replacement. What is the probability that amongst the balls drawn there is at least one ball of each colour?
Note the events are:
{2 R, 1 W, 1 B}, {1 R, 2 W, 1 B}, {1 R, 1 W, 2 B}
$$P = \frac{\binom{4}{2}\binom{3}{1}\binom{2}{1}}{\binom{9}{4}} + \frac{\binom{4}{1}\binom{3}{2}\binom{2}{1}}{\binom{9}{4}} + \frac{\binom{4}{1}\binom{3}{1}\binom{2}{2}}{\binom{9}{4}} = \frac{36 + 24 + 12}{126} = \frac{4}{7}$$
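The urn example can be confirmed by listing every possible draw of 4 balls from the 9 and counting those that contain all three colours. This is a brute-force sketch; labelling the balls by position in a list is an assumption made so that balls of the same colour remain distinguishable.

```python
from fractions import Fraction
from itertools import combinations

# 4 Red, 3 White and 2 Blue balls; the list index identifies each ball.
balls = ["R"] * 4 + ["W"] * 3 + ["B"] * 2

# Every unordered draw of 4 distinct balls is equally likely.
draws = list(combinations(range(len(balls)), 4))

# Keep the draws in which all three colours appear.
favourable = [d for d in draws if {balls[i] for i in d} == {"R", "W", "B"}]

p_all_colours = Fraction(len(favourable), len(draws))
print(len(favourable), len(draws), p_all_colours)
```

The counts agree with the three cases of the worked solution: 36 + 24 + 12 favourable draws out of 126.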
Example 3
An urn contains 4 tickets numbered 1, 2, 3, and 4. Another urn contains 6 tickets numbered 2, 4, 6, 7, 8 and 9. If one of the 2 urns is chosen at random and a ticket is drawn (at random) from the chosen urn, what is the probability that ticket drawn bears the number 2 or 4?
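Let U_1 and U_2 denote the events that the first and the second urn are chosen. Then
P(2 or 4) = P(U_1) P(2 or 4 | U_1) + P(U_2) P(2 or 4 | U_2) = (1/2)(2/4) + (1/2)(2/6) = 1/4 + 1/6 = 5/12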
1.7 Conditional Probability
The conditional probability refers to the probability of an event given that another event has occurred. The conditional probability of A given B is denoted by P(A | B). For P(B) > 0, a conditional probability is computed as follows:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{equivalently} \quad P(A \cap B) = P(A \mid B)\,P(B).$$
1.7.1 Bayes’ Theorem
If A is any event such that P(A) > 0, and B_1, B_2, …, B_n is any finite set of mutually exclusive and exhaustive events such that P(B_i) > 0 for all i = 1,2,…,n, then
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)} \quad \text{for } i = 1, 2, \ldots, n.$$
Proof: By the definition of conditional probability, for each i = 1,2,…,n, we can write P(A ∩ B_i) = P(A | B_i) P(B_i) and also P(A ∩ B_i) = P(B_i | A) P(A). Therefore,
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A)} = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)}$$
using the law of total probability for P(A) in the denominator.
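Bayes' theorem can be illustrated with the two urns of Example 3 in Section 1.6: given that the ticket drawn bears the number 2 or 4, how likely is it that each urn was the one chosen? That question is not posed in the notes; it is added here as an illustration, and the function name `bayes_posteriors` is hypothetical.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return P(B_i | A) for every i, given P(B_i) and P(A | B_i)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]   # P(A | B_i) P(B_i)
    total = sum(joint)                                      # P(A), by total probability
    return [j / total for j in joint]

# B_1, B_2: the first or second urn is chosen, each with probability 1/2.
priors = [Fraction(1, 2), Fraction(1, 2)]

# A: the ticket bears 2 or 4. Urn 1 holds {1,2,3,4}, urn 2 holds {2,4,6,7,8,9}.
likelihoods = [Fraction(2, 4), Fraction(2, 6)]

posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors)
```

The denominator computed inside the function is the same 5/12 obtained in Example 3, and the posteriors sum to 1 because the urns are mutually exclusive and exhaustive.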
1.7.2 Multiplication Rule
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1})$$
1.8 Independent Events
Independence plays an important role in probability theory. To understand it we must distinguish between pairwise independence and mutual independence. Pairwise independence refers to two events, while mutual independence refers to more than two events.
Definition: (Pairwise independence) Two events, say A and B, are said to be independent if P(A ∩ B) = P(A)P(B).
Definition: (Mutual independence) A set of k events, say A_1, A_2, …, A_k (k > 2), are said to be mutually independent if all the following 2^k − k − 1 conditions hold:
• For any two events in the given set, say A_i and A_j with i ≠ j, P(A_i ∩ A_j) = P(A_i)P(A_j).
• For any three events, say A_i, A_j and A_l with i, j and l all distinct, P(A_i ∩ A_j ∩ A_l) = P(A_i)P(A_j)P(A_l), and so on up to P(A_1 ∩ A_2 ∩ … ∩ A_k) = P(A_1) P(A_2) … P(A_k).
Note that mutual independence implies pairwise independence (see the first condition above). But mere pairwise independence need not imply mutual independence as the following example shows.
Example:
Let Ω = {w_1, w_2, w_3, w_4}; A = {w_1, w_2}, B = {w_1, w_3} and C = {w_1, w_4}. Let P{w_1} = P{w_2} = P{w_3} = P{w_4} = 1/4. Using the classical definition one can see that P(A) = 1/2, P(B) = 1/2 and P(C) = 1/2. Each pairwise intersection is {w_1}, so P(A∩B) = P(A∩C) = P(B∩C) = 1/4, which equals the product of the two probabilities involved. But P(A∩B∩C) = 1/4, which is not equal to P(A)P(B)P(C) = 1/8. Therefore A, B and C are pairwise independent but are not mutually independent!
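The example can be checked mechanically by testing each product condition on the four-point sample space. In the sketch below the outcomes are encoded as the strings "w1" to "w4", which is an assumption of the illustration.

```python
from fractions import Fraction
from itertools import combinations

omega = {"w1", "w2", "w3", "w4"}    # four equally likely outcomes
A = {"w1", "w2"}
B = {"w1", "w3"}
C = {"w1", "w4"}

def prob(event):
    """Classical probability on omega."""
    return Fraction(len(event), len(omega))

# Pairwise condition: P(X ∩ Y) = P(X) P(Y) for every pair of events.
pairwise_independent = all(prob(x & y) == prob(x) * prob(y)
                           for x, y in combinations([A, B, C], 2))

# Triple condition: P(A ∩ B ∩ C) = P(A) P(B) P(C).
triple_condition = prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise_independent, triple_condition)
```

For k = 3 events there are 2^3 − 3 − 1 = 4 product conditions in total: the three pairwise ones and the single triple one tested here.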
Immediate consequences of the definition of independence are listed below:
1. If A and B are independent, then A^c and B, A and B^c, and A^c and B^c are also independent.
2. If A and B are independent, then P(A | B) = P(A) and P(B | A) = P(B), provided P(B) > 0 and P(A) > 0 respectively.
1.9 Random Variables
Let Ω be the given sample space and F be the event space. A random variable X is a real-valued function on the sample space Ω such that the inverse image of any interval of the type (−∞, x] lies in F for all values of x.
Note that there are two requirements for X to be a random variable.
1. For any elementary event w in the sample space Ω, X(w) is a real number.
2. The set of elementary events w such that X(w) ≤ x is an event for which the probability can be determined, for all real values of x. In other words, one should be able to find P(X ≤ x) for all values of x.
Let us denote by F(x) the value P(X ≤ x). This function is called the distribution function of the random variable. Note, therefore, that every random variable has a distribution function, which is determined through the probability P on the event space.
1.9.1 Discrete Variables
If an ‘experiment’ (a trial, the empirical observation of a phenomenon) can result in a finite (x_1, x_2, …, x_n) or countably infinite (−∞, …, x_{i−1}, x_i, x_{i+1}, …, +∞) set of specified measurable outcomes and no other, the random variable X which measures the outcome is said to be discrete:
X = [x_1, x_2, …, x_n]   finite
X = […, x_{i−1}, x_i, x_{i+1}, …]   countably infinite
For example, if a random sample of n items is taken from a production line, the random variable X, representing the number of defectives in the sample, is discrete as it can take the set of values x = 0, 1, 2, …, n but no other, i.e. the variable cannot take values like ½, 0.3, 0.0001 etc. Discrete variables need not be integers, e.g. the proportion of defective items in the sample would take the values 0, 1/n, 2/n, …, 1 and would also be a discrete variable.
1.9.2 Continuous Variables
If an ‘experiment’ can result in an uncountably infinite set of measurable outcomes which can take any value within a generic range (i.e. from − ∞ to + ∞ ) or a specified range (e.g. from a to b), the random variable X which measures the outcome is said to be continuous:
X = [−∞ < x < +∞]   generic range
X = [a < x < b]   specified range
For example, the arrival time of students for a 9:00am lecture may be a random variable X taking values in the range 8:30 to 9:30. Here any value of the variable can occur.
Note the notational convention: capital letters, like X, Y, Z, refer to random variables in a general sense; small letters, like x, y, z, refer to possible values taken by random variables (collectively); small letters with a subscript, like x_1, x_2, x_3, refer to specific values of a random variable.
1.10 Probability Distributions
Discrete Variables
The probability distribution of a discrete random variable is a function (in the mathematical sense) that gives the probabilities of possible values of the random variable. It can take the form of a table (empirical distribution) which lists the individual probabilities, or the form of an algebraic expression (theoretical distribution) which generates the individual probabilities. Using general notation, a probability distribution of a discrete random variable can be defined as
f(x) = P(X = x) for x = …, x_{i−1}, x_i, x_{i+1}, …,
assuming the random variable X is defined as X = […, x_{i−1}, x_i, x_{i+1}, …].
The probability distribution f(x) has the following properties:
(a) The outcomes x_i are mutually exclusive and collectively exhaustive;
(b) f(x_i) ≥ 0 for all i, i.e. probabilities are nonnegative;
(c) Σ_x f(x) = 1, i.e. the sum of all probabilities is equal to 1.
Example 1 Empirical Distribution (Table)
Let X be daily sales (in number of loaves) of a special rye bread in a shop.
x | 0 | 1 | 2 | 3 | 4 | 5 | Total
f(x) | 0.1 | 0.2 | 0.3 | 0.2 | 0.1 | 0.1 | Σ = 1
Example 2 Theoretical distribution (Algebraic expression)
If a random sample of n items is taken from a production line with probability of a defective being equal to P, the probability distribution of ‘the number of defectives in the sample’ (X) is given by
$$f(x) = {}^{n}C_{x}\,P^{x}(1-P)^{n-x}, \quad \text{for } x = 0, 1, 2, \ldots, n, \quad \text{where } {}^{n}C_{x} = \frac{n!}{x!\,(n-x)!}.$$
So, if n = 5 and P = 0.1, then
$$f(x = 2) = f(2) = {}^{5}C_{2}\,(0.1)^{2}(0.9)^{3} = (10)(0.01)(0.729) = 0.0729.$$
Here f(x) is given by an algebraic expression known as the binomial distribution.
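The binomial expression is straightforward to evaluate in code. The sketch below computes f(2) for n = 5 and P = 0.1 and also confirms that the probabilities over x = 0, …, n sum to 1; the function name `binomial_pmf` is an assumption.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = nCx * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.1
f2 = binomial_pmf(2, n, p)
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(round(f2, 4))
print(round(total, 10))
```

Floating-point arithmetic is used here, so results are compared with the hand calculation only up to rounding.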
Continuous Variables
For a continuous random variable, the probability distribution is a function (in the mathematical sense) that describes a curve, the probability density function (pdf), so that areas under the curve give probabilities associated with corresponding intervals of the variable. Using general notation, a probability distribution of a continuous random variable can be defined as
f(x) for −∞ < x < +∞,
assuming the random variable X is defined as X = [−∞ < x < +∞].
Here the probability distribution f (x) always takes the form of an algebraic expression.
The pdf f(x) has the following properties:
(a) f(x) ≥ 0, i.e. probabilities are nonnegative;
(b) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$, i.e. probabilities sum to 1;
(c) $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$, i.e. the area under the pdf between a and b represents the probability that X lies in the interval a to b, with a ≤ b.
Note also that P(X = x) = 0, as f(x) is the ordinate of the pdf and thus has no area.
Example
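Daily sales of petrol (X, measured in ‘000 litres) of a garage has the following pdf:
$$f(x) = \frac{x}{2}, \quad 0 < x < 2.$$
[Figure: plot of the pdf f(x) against x.]
(1) $\int_{0}^{2} f(x)\,dx = \int_{0}^{2} \frac{x}{2}\,dx = \frac{1}{2}\left[\frac{x^{2}}{2}\right]_{0}^{2} = \frac{1}{4}(4 - 0) = 1$, i.e. the pdf is properly specified.
(2) $P\left(\frac{1}{2} \le X \le \frac{3}{2}\right) = \int_{1/2}^{3/2} \frac{x}{2}\,dx = \frac{1}{4}\left[x^{2}\right]_{1/2}^{3/2} = \frac{1}{4}\left(\frac{9}{4} - \frac{1}{4}\right) = \frac{1}{2}$, i.e. the probability that daily sales will be between 500 and 1500 litres is 0.5.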
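Both integrals in the petrol example can be checked numerically. The sketch below uses a simple midpoint rule, which is one of several possible quadrature methods and is an assumption of this illustration rather than anything prescribed by the notes; the step count is arbitrary.

```python
def pdf(x):
    """f(x) = x/2 on 0 < x < 2, and 0 elsewhere."""
    return x / 2 if 0 < x < 2 else 0.0

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = integrate(pdf, 0, 2)            # should be close to 1
p_mid_range = integrate(pdf, 0.5, 1.5)       # P(1/2 <= X <= 3/2)

print(round(total_area, 6), round(p_mid_range, 6))
```

For a straight-line density the midpoint rule is exact apart from rounding, so the numerical values agree with the hand integration to many decimal places.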
Note that if the pdf was specified as, for example, f(x) = x/k, where k is an unknown constant, the value of k could be found by using the relation
$$\int_{0}^{2} f(x)\,dx = 1.$$
In our example,
$$\int_{0}^{2} f(x)\,dx = \int_{0}^{2} \frac{x}{k}\,dx = \frac{1}{k}\left[\frac{x^{2}}{2}\right]_{0}^{2} = \frac{1}{k}\left(\frac{1}{2}\right)(4 - 0) = \frac{2}{k} = 1$$
and, therefore, k = 2.
1.11 Cumulative Probability Distributions (cpd) (or Distribution Functions)
Discrete Variables
For a discrete random variable, the cumulative probability distribution (cpd) (or distribution function) is a function that gives the probability that X does not exceed a specific value x. Using general notation,
$$F(x) = P(X \le x) = \sum_{t \le x} f(t) \quad \text{for } x = \ldots, x_{i-1}, x_{i}, x_{i+1}, \ldots$$
Note that f (x) has been replaced by f (t) whose form is the same as f (x). The change
of x into t is simply to distinguish the variable (t) from the upper limit of the summation (x).
Properties of the cpd F(x):
(a) F(−∞) = 0 and F(+∞) = 1;
(b) If a < b, then F(a) ≤ F(b), for any real numbers a and b;
(c) For consecutive values x_1 < x_2 < x_3, f(x_3) = F(x_3) − F(x_2), and f(x_2 or x_3) = F(x_3) − F(x_1).
Example 1 Empirical Distribution (Table)
Let X be daily sales (in number of loaves) of a special rye bread in a shop.
x | 0 | 1 | 2 | 3 | 4 | 5
f(x) | 0.1 | 0.2 | 0.3 | 0.2 | 0.1 | 0.1
F(x) | 0.1 | 0.3 | 0.6 | 0.8 | 0.9 | 1.0
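The F(x) row of the rye bread table is just the running total of the f(x) row. The sketch below rebuilds it with `itertools.accumulate` and uses exact fractions to avoid floating-point drift; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import accumulate

# Daily sales of rye bread (loaves) and their probabilities f(x).
values = [0, 1, 2, 3, 4, 5]
f = [Fraction(s) for s in ("0.1", "0.2", "0.3", "0.2", "0.1", "0.1")]

# F(x) = sum of f(t) over all t <= x, i.e. a running total.
F = list(accumulate(f))
cdf = dict(zip(values, F))

# Differences of F recover probabilities: P(X = 2 or X = 3) = F(3) - F(1).
p_two_or_three = cdf[3] - cdf[1]

print([float(v) for v in F])
print(float(p_two_or_three))
```

The last value of the running total is 1, and F never decreases, in line with the properties of a cumulative distribution.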
Example 2 Theoretical Distribution (e.g. binomial) (Algebraic expression)
$$f(x) = {}^{n}C_{x}\,P^{x}(1-P)^{n-x}$$
$$F(x) = \sum_{t \le x} {}^{n}C_{t}\,P^{t}(1-P)^{n-t}$$