Elements of Probability and Statistics
Probability Theory provides the mathematical models of phenomena governed by chance. Examples of such phenomena include weather, lifetime of batteries, traffic congestion, stock exchange indices, laboratory measurements, etc. Statistical Theory provides the mathematical methods to gauge the accuracy of the probability models based on observations or data. The remaining Lectures are about this topic. “Essentially, all models are wrong, but some are useful.” (George E. P. Box)
Contents
1 Sets, Experiments and Probability
  1.1 Rudiments of Set Theory
  1.2 Experiments
  1.3 Probability
  1.4 Conditional Probability
2 Random Variables
  2.1 Discrete Random Variables and their Distributions
    2.1.1 Discrete uniform random variables with finitely many possibilities
    2.1.2 Discrete nonuniform random variables with finitely many possibilities
    2.1.3 Discrete nonuniform random variables with infinitely many possibilities
  2.2 Continuous Random Variables and Distributions
3 Expectations
4 Tutorial for Week 1
  4.1 Preparation Problems (Homework)
  4.2 In Tutorial Problems
5 Tutorial for Week 2
  5.1 Preparation Problems (Homework)
  5.2 In Tutorial Problems
List of Tables 

1 f(x) and F(x) for the sum of two independent tosses of a fair die RV X
2 DF Table for the Standard Normal
3 Quantile Table for the Standard Normal
List of Figures
1 f(x) = P(x) = 1/6 and F(x) of the fair die toss RV X of Example 2.4
2 f(x) and F(x) of an astragali toss RV X of Example 2.6
3 f(x) and F(x) of RV X for the sum of two independent tosses of a fair die
4 Probability density function of the volume of rain in cubic inches over the lecture theatre tomorrow
5 PDF and DF of Normal(µ, σ^2) RV for different values of µ and σ^2
1 Sets, Experiments and Probability
1.1 Rudiments of Set Theory
1. A set is a collection of distinct objects or elements and we enclose the elements by curly braces. For example, the collection of the two letters H and T is a set and we denote it by {H, T}. But the collection {H, T, T} is not a set (do you see why? think distinct!). Also, recognise that there is no order to the elements in a set, i.e. {H, T} is the same as {T, H}.

2. We give convenient names to sets. For example, we can call the set {H, T} by A and write A = {H, T} to mean it.

3. If a is an element of A, we write a ∈ A. For example, if A = {1, 2, 3}, then 1 ∈ A.

4. If a is not an element of A, we write a ∉ A. For example, if A = {1, 2, 3}, then 13 ∉ A.

5. We say that a set A is a subset of a set B if every element of A is also an element of B and write A ⊆ B. For example, {1, 2} ⊆ {1, 2, 3, 4}.

6. We say that a set A is not a subset of a set B if at least one element of A is not an element of B and write A ⊈ B. For example, {1, 2} is not a subset of {1, 3, 4} since 2 ∈ {1, 2} but 2 ∉ {1, 3, 4}, and we write {1, 2} ⊈ {1, 3, 4} to mean this.

7. We say a set A is equal to a set B and write A = B if and only if A ⊆ B and B ⊆ A.

8. The union A ∪ B of A and B consists of elements that are in A or in B or in both A and B. For example, if A = {1, 2} and B = {3, 2} then A ∪ B = {1, 2, 3}.

9. The intersection A ∩ B of A and B consists of elements that are in both A and B. For example, if A = {1, 2} and B = {3, 2} then A ∩ B = {2}.

10. The empty set contains no elements and it is the collection of nothing. It is denoted by ∅ = {}.

11. Given some universal set, say Ω, the Greek letter Omega, the complement of a set A, denoted by A^c, is the set of all elements in Ω that are not in A. For example, if Ω = {H, T} and A = {H} then A^c = {T}. Note that for any set A ⊆ Ω:

$$A^c \cap A = \emptyset, \quad A \cup A^c = \Omega, \quad \Omega^c = \emptyset, \quad \emptyset^c = \Omega.$$

12. When we have more than two sets, we can define unions and intersections similarly. The union of m sets

$$\bigcup_{j=1}^{m} A_j = A_1 \cup A_2 \cup \cdots \cup A_m$$

consists of elements that are in at least one of the m sets A_1, A_2, . . . , A_m, and the union of infinitely many sets

$$\bigcup_{j=1}^{\infty} A_j = A_1 \cup A_2 \cup \cdots$$

consists of elements that are in at least one of the sets A_1, A_2, . . . . Similarly, the intersection

$$\bigcap_{j=1}^{m} A_j = A_1 \cap A_2 \cap \cdots \cap A_m$$

of m sets consists of elements that are in each of the m sets and the intersection of infinitely many sets

$$\bigcap_{j=1}^{\infty} A_j = A_1 \cap A_2 \cap \cdots$$

consists of elements that are in each of the infinitely many sets.
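The set operations in the list above can be tried out directly. The following is a minimal sketch using Python's built-in set type; the three sets in the list named sets are hypothetical stand-ins for A_1, A_2, A_3 and do not come from the notes, and the helper complement is our own name.

```python
# Universal set and two subsets, taken from the examples in the list above.
omega = {"H", "T"}
A = {1, 2}
B = {3, 2}

# A Python set literal silently drops the repeated T and ignores order,
# which mirrors "distinct elements" and "no order" in item 1.
same_set = {"H", "T", "T"} == {"T", "H"}

union = A | B                 # elements in A or in B or in both
intersection = A & B          # elements in both A and B
is_subset = {1, 2} <= {1, 2, 3, 4}
is_not_subset = not ({1, 2} <= {1, 3, 4})   # 2 is missing from {1, 3, 4}


def complement(event, universe):
    """Return the elements of the universe that are not in the event."""
    return universe - event


heads_complement = complement({"H"}, omega)

# Union and intersection of m sets (hypothetical A_1, A_2, A_3).
sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
big_union = set().union(*sets)
big_intersection = set.intersection(*sets)

print(union, intersection, heads_complement, big_union, big_intersection)
```

Note that the complement always needs the universal set as a second argument: A^c has no meaning until Ω is fixed.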
Exercise 1.1 Let Ω = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5} and B = {2, 4, 6}. By using the definitions of sets and set operations find the following sets:
A^c = ____    B^c = ____    Ω^c = ____    ∅^c = ____    {1}^c = {____}
A ∪ B = ____    A ∩ B = ____    A ∪ Ω = ____    A ∩ Ω = ____
B ∪ Ω = ____    B ∩ Ω = ____    A ∪ A^c = ____    B ∪ B^c = ____
etc.
Venn diagrams are visual aids for set operations.
Example 1.1 For three sets A, B and C, the Venn diagrams for A ∪ B, A ∩ B and A ∩ B ∩ C are:
Exercise 1.2 Let A = {1, 3, 5, 7, 9, 11}, B = {1, 2, 3, 5, 8, 13} and C = {1, 2, 4, 8, 16, 32} denote three sets. Let us use a Venn diagram to visualise these three sets and their intersections. Can you mark which sets correspond to A, B and C in the figure below?
1.2 Experiments
Definition 1.1 An experiment is an activity or procedure that produces distinct, well-defined possibilities called outcomes. The set of all outcomes is called the sample space and is denoted by Ω, the uppercase Greek letter Omega. We denote a typical outcome in Ω by ω, the lowercase Greek letter omega, and a typical sequence of possibly distinct outcomes by ω_1, ω_2, ω_3, . . . .
Example 1.2 Ω = {Defective, Nondefective} if our experiment is to inspect a light bulb.
Example 1.3 Ω = {Heads, Tails} if our experiment is to note the outcome of a coin toss.
In Examples 1.2 and 1.3, Ω only has two outcomes and we can refer to the sample space of such two-outcome experiments generically as Ω = {ω_1, ω_2}. For instance, the two outcomes of Example 1.2 are ω_1 = Defective and ω_2 = Nondefective while those of Example 1.3 are ω_1 = Heads and ω_2 = Tails.
Example 1.4 If our experiment is to roll a die whose faces are marked with the six numerical symbols or numbers 1, 2, 3, 4, 5, 6 then there are six outcomes corresponding to the number that shows on the top. Thus, the sample space Ω for this experiment is {1, 2, 3, 4, 5, 6}.
Exercise 1.3 Suppose our experiment is to observe whether it will rain or shine tomorrow. What is the sample space for this experiment? Answer: Ω = {____}.
The subsets of Ω are called events. The outcomes ω_1, ω_2, . . ., when seen as subsets of Ω, such as {ω_1}, {ω_2}, . . ., are simple events.
Example 1.5 In our roll a die experiment of Example 1.4 with Ω = {1, 2, 3, 4, 5, 6}, the set of odd numbered outcomes A = {1, 3, 5} or the set of even numbered outcomes B = {2, 4, 6} are examples of events. The simple events are {1}, {2}, {3}, {4}, {5}, and {6}.
Example 1.6 Consider a generic die-tossing experiment by a human experimenter. Clearly, Ω = {____} and A = {____}, B = {____} and C = {ω_3} are examples of events. This experiment could correspond to rolling a die whose faces are:
1. sprayed with six different scents (nose!), or
2. studded with six distinctly flavoured candies (tongue!), or
3. contoured with six distinct bumps and pits (touch!), or
4. acoustically discernible at six different frequencies (ears!), or
5. painted with six different colours (eyes!), or
6. marked with six different numbers 1, 2, 3, 4, 5, 6 (eyes!), or . . .
This example is meant to concretely convince you that an experiment’s sample space is merely a collection of distinct elements called outcomes and these outcomes have to be discernible in some well-specified sense to the experimenter!
Definition 1.2 A trial is a single performance of an experiment and it results in an outcome.
Example 1.7 A single roll of a die is a trial.
Example 1.8 A single toss of a coin is a trial.
An experimenter often performs more than one trial. Repeated trials of an experiment form the basis of science and engineering, as the experimenter learns about the phenomenon by repeatedly performing the same mother experiment with possibly different outcomes. This repetition of trials in fact provides the very motivation for the definition of probability in § 1.3.
Definition 1.3 An n-product experiment is obtained by repeatedly performing n trials of a mother experiment.
Example 1.9 Suppose we toss a coin twice by performing two trials of the coin toss experiment of Example 1.3 and use the shorthand H and T to denote the outcome of Heads and Tails, respectively. Then our sample space Ω = {HH, HT, TH, TT}. Note that this is the 2-product experiment of the coin toss mother experiment.
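The sample space of an n-product experiment can be enumerated mechanically. The sketch below assumes that each outcome of the mother experiment is written as a short string such as "H" or "T", so that a sequence of n outcomes can be joined into one string; the function name n_product_sample_space is our own and is not notation from these notes.

```python
from itertools import product


def n_product_sample_space(mother, n):
    """Return all length-n sequences of outcomes of the mother experiment.

    Assumes every outcome in `mother` is a string, so that a sequence of
    outcomes such as ("H", "T") can be written compactly as "HT".
    """
    return {"".join(sequence) for sequence in product(sorted(mother), repeat=n)}


coin = {"H", "T"}

# The 2-product experiment of Example 1.9.
omega_2 = n_product_sample_space(coin, 2)
print(sorted(omega_2))

# The number of outcomes grows as (number of mother outcomes) ** n.
for n in (1, 2, 5, 10):
    print(n, len(n_product_sample_space(coin, n)))
```

The rapid growth in the number of outcomes is one reason we reason about events with rules rather than by listing Ω whenever n is large.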
Exercise 1.4 What is the event that at least one Heads occurs in the 2-product experiment of Example 1.9, i.e., tossing a fair coin twice?
Exercise 1.5 What is the sample space of the 3-product experiment of the coin toss experiment, i.e., tossing a fair coin thrice?
Definition 1.4 An ∞-product experiment is defined as lim_{n→∞} n-product experiment of some mother experiment.
Remark 1.5 Loosely speaking, a set that can be enumerated or tagged uniquely by natural numbers N = {1, 2, 3, . . .} is said to be countable, or to contain countably many elements; a countable set with infinitely many elements is said to be countably infinite. Some examples of countable sets include any finite set, the set of natural numbers N = {1, 2, 3, . . .}, the set of non-negative integers {0, 1, 2, 3, . . .}, the set of all integers Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}, and the set of all rational numbers Q = {p/q : p, q ∈ Z, q ≠ 0}, but the set of real numbers R = (−∞, ∞) is uncountably infinite.
Example 1.10 The sample space Ω of the ∞-product experiment of tossing a coin infinitely many times has uncountably infinitely many elements and is in bijection with all binary numbers in the unit interval [0, 1]: just replace H with 1 and T with 0. We cannot enumerate all outcomes in Ω but can show some outcomes:
Ω = {HHHH···, HTHH···, THHH···, TTHH···, . . . , TTTT···, HTTT···, THTT···, HHTT···, . . .}.
1.3 Probability
Deﬁnition 1.6 Probability is a function P that assigns real numbers to events, which satisﬁes the following four Axioms:
Axiom (1): for any event A, 0 ≤ P (A) ≤ 1
Axiom (2): if Ω is the sample space then P (Ω) = 1
Axiom (3): if A and B are disjoint, i.e., A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B)
Axiom (4): if A_1, A_2, . . . is an infinite sequence of pairwise disjoint or mutually exclusive events, i.e., A_i ∩ A_j = ∅ whenever i ≠ j, then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$
These axioms are merely assumptions that are justified and motivated by the frequency interpretation of probability in n-product experiments as n tends to infinity, which states that if we repeat an experiment a large number of times then the fraction of times the event A occurs will be close to P(A). To be precise, if we let N(A, n) be the number of times A occurs in the first n trials then
$$P(A) = \lim_{n \to \infty} \frac{N(A, n)}{n}$$
Given P(A) = lim_{n→∞} N(A, n)/n, Axiom (1) simply affirms that the fraction of times a given event A occurs must be between 0 and 1. If Ω has been defined properly to be the set of ALL possible outcomes, then Axiom (2) simply affirms that the fraction of times something in Ω happens is 1. To explain Axiom (3), note that if A and B are disjoint then
N (A ∪ B, n) = N (A, n) + N (B, n)
since A ∪ B occurs if either A or B occurs but it is impossible for both to occur. Dividing both sides of the previous equality by n and letting n → ∞, we arrive at Axiom (3). Axiom (3) implies that Axiom (4) holds for a ﬁnite number of sets. In many cases the sample space is ﬁnite so Axiom (4) is not relevant or necessary. Axiom (4) is a new assumption for inﬁnitely many sets as it does not simply follow from Axiom (3) any longer. Axiom (4) is more diﬃcult to motivate but without it the theory of probability becomes more diﬃcult and less useful, so we will impose this assumption on utilitarian grounds.
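The frequency interpretation can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the die is modelled as fair, the trials are produced by Python's pseudo-random number generator rather than by a physical experiment, and the function name relative_frequency and the seed value are our own choices.

```python
import random


def relative_frequency(event, n, seed=0):
    """Return N(A, n) / n for n simulated rolls of a fair six-sided die.

    `event` is a set of faces; `seed` fixes the pseudo-random sequence so
    that the run can be repeated.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        outcome = rng.randint(1, 6)   # one trial of the mother experiment
        if outcome in event:
            count += 1                # the event A occurred in this trial
    return count / n


A = {1, 3, 5}   # odd-numbered outcomes, as in Example 1.5
for n in (10, 1000, 100000):
    print(n, relative_frequency(A, n))
```

For a fair die we would expect the printed fractions to settle near 1/2 as n grows, although any single run with small n may be far from it.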
The following three Theorems are merely properties of probability.
Theorem 1.7 Complementation Rule. The probability of an event A and its complement A^c in a sample space Ω satisfy
$$P(A^c) = 1 - P(A). \tag{1}$$
Proof: By the definition of complement, we have Ω = A ∪ A^c and A ∩ A^c = ∅. Hence by Axioms (2) and (3),
$$1 = P(\Omega) = P(A) + P(A^c), \quad \text{thus} \quad P(A^c) = 1 - P(A).$$
Example 1.11 Recall the coin toss experiment of Example 1.3 with Ω = {Heads, Tails}. Suppose that our coin happens to be fair with P(Heads) = 1/2. Since {Tails}^c = {Heads}, we can apply the complementation rule to find the probability of observing a Tails from P(Heads) as follows:
$$P(\text{Tails}) = 1 - P(\text{Heads}) = \frac{1}{2}.$$
Theorem 1.8 Addition Rule for Mutually Exclusive Events. For mutually exclusive or pairwise disjoint events A_1, . . . , A_m in a sample space Ω,
$$P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_m) = P(A_1) + P(A_2) + P(A_3) + \cdots + P(A_m). \tag{2}$$
Proof: This is a consequence of applying Axiom (3) repeatedly:
$$\begin{aligned}
P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_m) &= P(A_1 \cup (A_2 \cup \cdots \cup A_m)) \\
&= P(A_1) + P(A_2 \cup (A_3 \cup \cdots \cup A_m)) \\
&= P(A_1) + P(A_2) + P(A_3 \cup \cdots \cup A_m) = \cdots \\
&= P(A_1) + P(A_2) + P(A_3) + \cdots + P(A_m).
\end{aligned}$$
Example 1.12 Let us observe the number on the first ball that pops out in a New Zealand Lotto trial. There are forty balls labelled 1 through 40 for this experiment and so the sample space Ω = {1, 2, 3, . . . , 39, 40}. Because the balls are vigorously whirled around inside the Lotto machine before the first one pops out, we can model each ball to pop out first with the same probability. So, we assign each outcome ω ∈ Ω the same probability of 1/40, i.e., our probability model for this experiment is:
$$P(\omega) = \frac{1}{40}, \quad \text{for each } \omega \in \Omega = \{1, 2, 3, \ldots, 39, 40\}.$$
NOTE: we sometimes abuse notation and write P (ω) instead of the more accurate but cumbersome P ({ω}) when writing down probabilities of simple events. Now, let’s check if Axiom (1) is satisﬁed for simple events in our model for this Lotto experiment,
$$0 \le P(1) = P(2) = \cdots = P(40) = \frac{1}{40} \le 1$$
Is Axiom (3) satisfied? For example, for the disjoint simple events {1} and {2},
$$P(\{1, 2\}) = P(\{1\} \cup \{2\}) = P(\{1\}) + P(\{2\}) = \frac{1}{40} + \frac{1}{40} = \frac{2}{40}$$
Is Axiom (2) satisfied? Yes, by Equation (2) of the addition rule for mutually exclusive events (Theorem 1.8):
$$P(\Omega) = P(\{1, 2, \ldots, 40\}) = P\left(\bigcup_{i=1}^{40} \{i\}\right) = \sum_{i=1}^{40} P(i) = \frac{1}{40} + \frac{1}{40} + \cdots + \frac{1}{40} = 1$$
(a) 1114 NZ Lotto draw frequency from 1987 to 2008.
(b) 1114 NZ Lotto draw relative frequency from 1987 to 2008.
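The checks carried out by hand in Example 1.12 can also be written as a short program. This is a minimal sketch of the equally likely model only: it uses exact fractions to avoid rounding, and the helper name prob is our own. It says nothing about whether real Lotto draws follow the model.

```python
from fractions import Fraction

# Sample space and probability model of Example 1.12.
omega = set(range(1, 41))
p = {w: Fraction(1, 40) for w in omega}   # P(w) = 1/40 for each ball w


def prob(event):
    """P(event) as the sum of simple-event probabilities (Theorem 1.8)."""
    return sum((p[w] for w in event), Fraction(0))


# Axiom (1) for simple events: each probability lies between 0 and 1.
axiom1_holds = all(0 <= prob({w}) <= 1 for w in omega)

# Axiom (2): the whole sample space has probability 1.
axiom2_holds = prob(omega) == 1

# Axiom (3) for the disjoint simple events {1} and {2}.
axiom3_holds = prob({1, 2}) == prob({1}) + prob({2})

print(axiom1_holds, axiom2_holds, axiom3_holds, prob({1, 2}))
```

Defining P on events through the sum of simple-event probabilities is a modelling choice that makes the addition rule hold by construction for this finite Ω.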
Recommended Activity 1.1 Explore the following web sites to learn more about NZ and British Lotto. The second link has animations of the British equivalent of NZ Lotto. http://lotto.nzpages.co.nz/
Theorem 1.9 Addition Rule for Two Arbitrary Events. For events A and B in a sample space,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B). \tag{3}$$
Proof:
$$\begin{aligned}
P(A \cup B) &= P(A \cup (B \cap A^c)) \\
&= P(A) + P(B \cap A^c) \quad \text{by Axiom (3) and disjointness} \\
&= P(A) + P(B) - P(A \cap B)
\end{aligned}$$
The last equality P(B ∩ A^c) = P(B) − P(A ∩ B) is due to Axiom (3) and the disjoint union B = (B ∩ A^c) ∪ (A ∩ B), giving P(B) = P(B ∩ A^c) + P(A ∩ B). It is easy to see this with a Venn diagram.
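To see the addition rule for two arbitrary events at work, here is a minimal sketch on the die experiment of Example 1.4. It assumes a fair die, so that every outcome has probability 1/6; the event B = {1, 2, 3} is a hypothetical choice made for illustration and the helper name prob is our own.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space of Example 1.4


def prob(event):
    """P(event) under the assumed equally likely model on omega."""
    return Fraction(len(event & omega), len(omega))


A = {1, 3, 5}   # odd-numbered outcomes
B = {1, 2, 3}   # hypothetical event: the outcome is at most three

# Both sides of Equation (3).
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)

# The step used in the proof: P(B ∩ A^c) = P(B) - P(A ∩ B).
b_without_a = prob(B - A)
step_holds = b_without_a == prob(B) - prob(A & B)

print(lhs, rhs, b_without_a, step_holds)
```

Simply adding P(A) and P(B) here would give 1, which counts the outcomes 1 and 3 twice; subtracting P(A ∩ B) corrects this.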
Exercise 1.6 In English language text, the twenty six letters in the alphabet occur with the following frequencies:
E 13%    R 7.7%   A 7.3%   H 3.5%   F 2.8%   M 2.5%   W 1.6%   X 0.5%   J 0.2%
T 9.3%   O 7.4%   S 6.3%   L 3.5%   P 2.7%   Y 1.9%   V 1.3%   K 0.3%   Z 0.1%
N 7.8%   I 7.4%   D 4.4%   C 3%     U 2.7%   G 1.6%   B 0.9%   Q 0.3%
Suppose you pick one letter at random from a randomly chosen English book from our central library with Ω = {A, B, C, . . . , Z}, ignoring upper/lower cases, then what is the probability of the following events?
(a) P({Z}) = ____
(b) What is the most likely outcome? ____
(c) P(‘picking any letter’) = P(Ω) = ____
(d) P({E, Z}) = ____ by Axiom (3)
(e) P(‘picking a vowel’) = ____ by Equation (2) of addition rule for mutually exclusive events (Theorem 1.8).
(f) P(‘picking any letter in the word WAZZZUP’) = ____ by Equation (2) of addition rule for mutually exclusive events (Theorem 1.8).
(g) P(‘picking any letter in the word WAZZZUP or a vowel’) = ____ = 42.2% by Equation (3) of addition rule for two arbitrary events (Theorem 1.9).
1.4 Conditional Probability
Conditional probability allows us to make decisions from partial information about an experiment.
Definition 1.10 The probability of an event B under the condition that an event A occurs is called the conditional probability of B given A and is denoted by P(B|A). In this case A serves as a new (reduced) sample space, and that probability is the fraction of P(A) which corresponds to A ∩ B. Thus,
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{if } P(A) \ne 0. \tag{4}$$
Similarly, the conditional probability of A given B is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{if } P(B) \ne 0. \tag{5}$$
Conditional probability is a probability and therefore all four Axioms of probability also hold for conditional probability of events, provided the conditioning event A has P(A) > 0.
Axiom (1): For any event B, 0 ≤ P(B|A) ≤ 1.
Axiom (2): P(Ω|A) = 1.
Axiom (3): For any two disjoint events B_1 and B_2, P(B_1 ∪ B_2|A) = P(B_1|A) + P(B_2|A).
Axiom (4): For mutually exclusive or pairwise-disjoint events B_1, B_2, . . ., P(B_1 ∪ B_2 ∪ ···|A) = P(B_1|A) + P(B_2|A) + ··· .
Note that the complementation and addition rules also follow for conditional probability.
1. complementation rule for conditional probability:
$$P(B \mid A) = 1 - P(B^c \mid A). \tag{6}$$
2. addition rule for two arbitrary events B_1 and B_2:
$$P(B_1 \cup B_2 \mid A) = P(B_1 \mid A) + P(B_2 \mid A) - P(B_1 \cap B_2 \mid A). \tag{7}$$
Theorem 1.11 Multiplication Rule. If A and B are events and P(A) ≠ 0, P(B) ≠ 0, then
$$P(A \cap B) = P(A) P(B \mid A) = P(B) P(A \mid B). \tag{8}$$
Proof: Solving for P (A ∩ B) in the Deﬁnitions (4) and (5) of conditional probability, we obtain Equation (8) of the above theorem.
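The definition of conditional probability and the multiplication rule can be checked on a small example. The sketch below again assumes a fair die with Ω = {1, 2, 3, 4, 5, 6}; the events A and B are hypothetical choices for illustration, and the helper names prob and cond_prob are our own.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """P(event) under the assumed equally likely model on omega."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(b, a):
    """P(B | A) = P(A ∩ B) / P(A), defined only when P(A) is not zero."""
    if prob(a) == 0:
        raise ValueError("the conditioning event has probability zero")
    return prob(a & b) / prob(a)


A = {2, 4, 6}   # hypothetical event: the outcome is even
B = {4, 5, 6}   # hypothetical event: the outcome is at least four

p_b_given_a = cond_prob(B, A)
p_a_given_b = cond_prob(A, B)

# Multiplication rule: P(A ∩ B) = P(A) P(B | A) = P(B) P(A | B).
multiplication_holds = (
    prob(A & B) == prob(A) * p_b_given_a == prob(B) * p_a_given_b
)

# Complementation rule for conditional probability.
complement_holds = p_b_given_a == 1 - cond_prob(omega - B, A)

print(p_b_given_a, p_a_given_b, multiplication_holds, complement_holds)
```

Conditioning on A shrinks the sample space to {2, 4, 6}, and two of those three outcomes lie in B, which is why P(B|A) differs from P(B).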
Example 1.13 Suppose the NZ All Blacks team is playing in a four-team rugby tournament. In the first round they have a tough opponent that they will beat 40% of the time, but if they win that game they will play against an easy opponent where their probability of success is 0.8. What is the probability that they will win the tournament?
If A and B are the events of victory in the first and second games, respectively, then P(A) = 0.4 and P(B|A) = 0.8, so by the multiplication rule, the probability that they will win the tournament is:
P(A ∩ B) = P(A)P(B|A) = 0.4 × 0.8 = 0.32.
Exercise 1.7 In Example 1.13, what is the probability that the All Blacks will win the first game but lose the second?
Definition 1.12 Independent events. If events A and B are such that
P(A ∩ B) = P(A)P(B),
they are called independent events. Assuming P(A) ≠ 0, P(B) ≠ 0, we have P(A|B) = P(A), and P(B|A) = P(B). This means that the probability of A does not depend on the occurrence or non-occurrence of B, and conversely. This justifies the term “independent”.
Example 1.14 Suppose you toss a fair coin twice such that the first toss is independent of the second. Then,
$$P(\text{HT}) = P(\text{Heads on the first toss} \cap \text{Tails on the second toss}) = P(\text{H}) P(\text{T}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}.$$
Similarly, P(HH) = P(TH) = P(TT) = 1/2 × 1/2 = 1/4. Thus, P(ω) = 1/4 for every ω in the sample space Ω = {HT, HH, TH, TT}.
Accordingly, three events A, B, C are independent if and only if
P(A ∩ B) = P(A)P(B), P(B ∩ C) = P(B)P(C), P(C ∩ A) = P(C)P(A), P(A ∩ B ∩ C) = P(A)P(B)P(C).
Example 1.15 Suppose you independently toss a fair die thrice. What is the probability of getting an even outcome in all three trials? Let E_i be the event that the outcome is an even number on the i-th trial. Then, the probability of getting an even number in all three trials is:
$$\begin{aligned}
P(E_1 \cap E_2 \cap E_3) &= P(E_1) P(E_2) P(E_3) = (P(\{2, 4, 6\}))^3 = (P(\{2\} \cup \{4\} \cup \{6\}))^3 \\
&= (P(\{2\}) + P(\{4\}) + P(\{6\}))^3 = \left(\frac{1}{6} + \frac{1}{6} + \frac{1}{6}\right)^3 = \left(\frac{3}{6}\right)^3 = \left(\frac{1}{2}\right)^3 = \frac{1}{8}.
\end{aligned}$$
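The answer of Example 1.15 can be cross-checked by brute-force enumeration. The sketch below assumes that all 6^3 = 216 outcomes of the 3-product experiment are equally likely, which is one way of modelling three independent tosses of a fair die; the helper names prob and even_on are our own.

```python
from fractions import Fraction
from itertools import product

# Sample space of the 3-product experiment of rolling a die: 216 triples.
omega = list(product(range(1, 7), repeat=3))


def prob(event):
    """P(event) under the assumed equally likely model on the triples."""
    return Fraction(len(event), len(omega))


def even_on(i):
    """Event E_(i+1): the outcome of trial i (counted from 0) is even."""
    return {w for w in omega if w[i] % 2 == 0}


E1, E2, E3 = even_on(0), even_on(1), even_on(2)

all_even = prob(E1 & E2 & E3)
product_of_probs = prob(E1) * prob(E2) * prob(E3)

# The pairwise conditions required for three independent events.
pairwise_holds = (
    prob(E1 & E2) == prob(E1) * prob(E2)
    and prob(E2 & E3) == prob(E2) * prob(E3)
    and prob(E3 & E1) == prob(E3) * prob(E1)
)

print(all_even, product_of_probs, pairwise_holds)
```

Here independence is a consequence of the equally likely model on triples; in the example itself it is an assumption about how the die is tossed.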
Definition 1.13 Independence of n Events. Similarly, n events A_1, . . . , A_n are called independent if
P(A_1 ∩ ··· ∩ A_n) = P(A_1)P(A_2) ··· P(A_n).
Example 1.16 Suppose you toss a fair coin independently m times. Then each of the 2^m possible outcomes in the sample space Ω has equal probability of 1/2^m due to independence.
Theorem 1.14 Total probability theorem. Suppose B_1, B_2, . . . , B_n is a sequence of events with positive probability that partition the sample space, i.e., B_1 ∪ B_2 ∪ ··· ∪ B_n = Ω and B_i ∩ B_j = ∅ for i ≠ j, then
$$P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i) P(B_i). \tag{9}$$
Proof: The first equality is due to the addition rule for mutually exclusive events A ∩ B_1, A ∩ B_2, . . . , A ∩ B_n, and the second equality is due to the multiplication rule.
Exercise 1.8 An wellmixed urn contain ﬁve red and ten black balls. We draw two balls from the urn without replacement. What is the probability that the second ball drawn is black?
Theorem 1.15 Bayes theorem.
$$P(A \mid B) = \frac{P(A) P(B \mid A)}{P(B)}. \tag{10}$$
Proof: The proof is a consequence of the definition of conditional probability and the multiplication rule.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \cap A)}{P(B)} = \frac{P(B \mid A) P(A)}{P(B)} = \frac{P(A) P(B \mid A)}{P(B)}.$$
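The total probability theorem and Bayes theorem can be checked together on a small example. The sketch below assumes a fair die; the partition B_1 = {1, 2}, B_2 = {3, 4}, B_3 = {5, 6} and the event A = {1, 2, 3} are hypothetical choices for illustration, and the helper names prob and cond_prob are our own.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """P(event) under the assumed equally likely model on omega."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(b, a):
    """P(B | A) = P(A ∩ B) / P(A); requires P(A) to be non-zero."""
    return prob(a & b) / prob(a)


# A hypothetical partition of omega and a hypothetical event A.
partition = [{1, 2}, {3, 4}, {5, 6}]
A = {1, 2, 3}

# Total probability theorem, Equation (9).
total = sum((cond_prob(A, b) * prob(b) for b in partition), Fraction(0))

# Bayes theorem, Equation (10), written here for P(B_1 | A):
# the roles of A and B in the theorem are played by B_1 and A.
b1 = partition[0]
bayes = prob(b1) * cond_prob(A, b1) / prob(A)
direct = cond_prob(b1, A)

print(total, prob(A), bayes, direct)
```

In this sketch the denominator P(A) in Bayes theorem is exactly the quantity that the total probability theorem computes, which is how the two results are typically used together.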
Exercise 1.9 Approximately 1% of women aged 40–50 have breast cancer. A woman with breast cancer has a 90% chance of a positive test from a mammogram, while a woman without breast cancer has a 10% chance of a false positive result from the test. What is the probability that a woman indeed has breast cancer given that she just had a positive test?
2 Random Variables
We are used to traditional variables such as x as an “unknown” in the equation:
x + 3 = 7,
where we can solve for x = 7 − 3 = 4. Another example is to use traditional variables to represent geometric objects such as a line:
y = 3x − 2,
where the variable y for the y-axis is determined by the value taken by the variable x, as x varies over the real line R = (−∞, ∞). The variables we have used to represent sequences such as:
$$\{a_n\}_{n=1}^{\infty} = a_1, a_2, a_3, \ldots,$$
are also traditional. When we wrote functions of a variable, such as x, in:
$$f(x) = \frac{x}{x + 1}, \quad \text{for } x \ge 0,$$
the argument x is also a traditional variable. In fact, all of Calculus you have been taught is by means of such traditional variables.
Question: What is common to all these variables above, such as x, y, a_1, a_2, a_3, . . . , f(x)?
Answer: They are instances of deterministic variables, that is, these traditional variables take a fixed or deterministic value when we can solve for them.
We need a new kind of variable to deal with real-world situations where the same variable may take different values in a non-deterministic manner. Random variables do this job for us. Random variables, unlike traditional deterministic variables, can take a bunch of different values! In fact, random variables are actually functions! They take you from the “world of random processes and phenomena” to the world of real numbers. In other words, a random variable is a numerical value determined by the outcome of the experiment.
Definition 2.1 A Random variable or RV is a function from the sample space Ω to the set of real numbers R:
X(ω) : Ω → R,
such that, for every real number x, the corresponding set {ω ∈ Ω : X(ω) ≤ x}, i.e. the set of outcomes whose numerical value is less than or equal to x, is an event. The probability of such events is given by the function F(x) : R → [0, 1] called the distribution function or DF of the random variable X:
$$F(x) = P(X \le x) = P(\{\omega : X(\omega) \le x\}), \quad \text{for any } x \in \mathbb{R}. \tag{11}$$
NOTE: Distribution function or DF is sometimes called cumulative distribution function or CDF in precalculus treatments of the subject. We will avoid the CDF nomenclature in our treatment.
Example 2.1 Recall the rain or shine experiment of Exercise 1.3 with sample space Ω = {rain, shine}. We can associate a random variable X with this experiment as follows:
$$X(\omega) = \begin{cases} 1, & \text{if } \omega = \text{rain} \\ 0, & \text{if } \omega = \text{shine} \end{cases}$$
Thus, X is 1 if it will rain tomorrow and 0 otherwise. Note that another equally valid discrete random variable, say Y, for this experiment is:
$$Y(\omega) = \begin{cases} \pi, & \text{if } \omega = \text{rain} \\ \sqrt{2}, & \text{if } \omega = \text{shine} \end{cases}$$
A random variable can be chosen to assign each outcome ω ∈ Ω to any real number as the experimenter desires.
Recall the experiments of Example 1.6 that involved smelling, tasting, touching, hearing, or seeing to discern between outcomes. It becomes very difficult to communicate, process and make decisions based on outcomes of experiments that are discerned in this manner and even more difficult to record them unambiguously. This is where real numbers can give us a helping hand. Data are typically random variables that act as numerical placeholders for outcomes of an experiment about some real-world random process or phenomenon. We said that the random variable can take one of many values, but we cannot be certain of which value it will take. However, we can make probabilistic statements about the value x the random variable X will take. This can be done with probabilities.
Theorem 2.2 The probability that the RV X takes a value x in the half-open interval (a, b], i.e., a < x ≤ b, is:
$$P(a < X \le b) = F(b) - F(a). \tag{12}$$
Proof: Since the events (X ≤ a) = {ω : X(ω) ≤ a} and (a < X ≤ b) = {ω : a < X(ω) ≤ b} are mutually exclusive or disjoint events whose union is the event (X ≤ b) = {ω : X(ω) ≤ b}, by Axiom (3) of Definition 1.6 of probability and by Equation (11) in Definition 2.1 of DF,
F(b) = P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b) = F(a) + P(a < X ≤ b).
Subtracting F(a) from both sides gives Equation (12).
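A distribution function is easy to compute when the sample space is finite. The sketch below assumes a fair die and takes the random variable X to be the number shown, i.e., X(ω) = ω; these are illustrative assumptions and the function names F and prob_half_open are our own. It then compares P(a < X ≤ b) computed directly from the outcomes with F(b) − F(a).

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # assumed equally likely outcomes of a fair die


def X(w):
    """The random variable: here simply the number shown on the die."""
    return w


def F(x):
    """Distribution function F(x) = P(X <= x) = P({w : X(w) <= x})."""
    favourable = {w for w in omega if X(w) <= x}
    return Fraction(len(favourable), len(omega))


def prob_half_open(a, b):
    """P(a < X <= b) computed directly from the outcomes."""
    favourable = {w for w in omega if a < X(w) <= b}
    return Fraction(len(favourable), len(omega))


a, b = 2, 5
direct = prob_half_open(a, b)
via_df = F(b) - F(a)

print(direct, via_df, F(0.5), F(3.7), F(6))
```

Note that F is defined for every real x, not only for the values X can take, so it is constant between consecutive faces of the die and jumps at each face.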