
ST1051-ST3905-ST5005-ST6030

ST1051 - Introduction to Probability and Statistics
ST3905 - Applied Probability and Statistics
ST5005 - Introduction to Probability and Statistics
ST6030 - Foundations of Statistical Data Analytics

Eric Wolsztynski

eric.w@ucc.ie

Department of Statistics
School of Mathematical Sciences
University College Cork, Ireland

2015-2016
Version 1.0

Acknowledgment
These lecture notes make use of former material written by Dr
Kingshuk Roy Choudhury and Dr Supratik Roy for previous course
syllabi. This material largely used [Dekking et al 2005].

However, the structure of the course was completely reviewed in
2014-15 and updated again for 2015-16. Updates are based mainly
on [Rice 1995].

All mistakes and inaccuracies are the sole responsibility of their
author, Eric Wolsztynski.

For any comment or query about this document, please contact
eric.w@ucc.ie

Course information

References
[1] J. A. Rice, Mathematical Statistics and Data Analysis, 2nd Edition, ITP Duxbury Press 1995

[2] J. L. Devore, Probability and Statistics for Engineering and the Sciences, 3rd Edition, Brooks-Cole 1991

[3] F. M. Dekking, C. Kraaikamp, H. P. Lopuhaä and L. E. Meester, A Modern Introduction to Probability and
Statistics, Springer 2005

[4] B.W. Lindgren, Statistical Theory, Fourth Edition, Chapman & Hall, 1993

[5] D.A. Berry and B.W. Lindgren, Statistics: Theory and Methods, 2nd edition, 1995

[6] MIT OpenCourseWare (MIT online lecture material):
http://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/

[7] J. D. Gibbons and S. Chakraborti, Nonparametric Statistical Inference, 4th Edition, Dekker 2014

[8] B. S. Everitt and T. Hothorn, A Handbook of Statistical Analyses Using R, Second Edition, Chapman & Hall
2010

[9] M. J. Crawley, Statistics: an Introduction Using R, Wiley 2005

[10] R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. URL http://www.R-project.org/.


Timetable

This module is taught in Period 1.

Lectures: Mondays 3-4pm in BHSC G01; Fridays 3-4pm in WGB G05

Tutorials: Fridays 11am-12pm in Windle ANLT; Fridays 4-5pm in WGB G05

Practicals (ST1051): Monday 4-5pm in lab WGB G34 (TBC); Tuesday 3-4pm in lab WGB G33 (TBC)


Assessment

ST1051/ST3905: 2 home assignments (10 + 10 marks) + 90-minute exam (80 marks)

ST5005/ST6030: 3 home assignments (10 + 10 + 30 marks) + 90-minute exam (50 marks)


Module objective

To provide an understanding of fundamental notions of Probability
and Statistics, and explore basic probability and statistical notions
underlying hypothesis-driven data analytic methods.

Outline

1 Motivation

2 Elements of Probability Theory

3 Discrete Random Variables

4 Continuous Random Variables

5 Limit theorems

6 Statistical Inference

7 Estimation

8 Hypothesis Testing

Section I

Motivation

General concepts

Probability? Statistics?

Focus on random or unpredictable phenomena
The goal is usually to understand, represent, describe or predict
Probability theory aims at describing reality: a mathematical framework for representing real-life phenomena
Statistics aims at providing models and techniques to analyse observations: a data-driven approach

The central feature is always the information (data).


Statistics consists of the collection and analysis of data.

Probability theory provides a mathematical foundation for statistics.
Examples

Typical examples

Business, financial mathematics and actuarial science:
decision making, investment strategies
trading (high-probability trading, return plans, strategies, ...)
insurance / pensions (premium pricing, risk assessment, ...)

Engineering:
tracking mobile terminals in wireless networks
image and video processing

Medical and biostatistics:
clinical trials
diagnostic and prognostic analyses
genomics

Why probability and statistics: space shuttle Challenger

[Dekking et al 2005]

On 28th January 1986, the space shuttle Challenger exploded about one minute after it had taken off from the launch pad at Kennedy Space Center in Florida

Root cause of the disaster: failure of O-rings (sealed joints that link rocket boosters)

Apparently, a management decision was made to overrule the engineers' recommendation not to launch


Why probability and statistics: space shuttle Challenger

The Challenger launch was the 24th of the space shuttle program, and we can look at the data on the number of failed O-rings, available from previous launches

Each rocket has three O-rings, and two rocket boosters are used per launch

Because low temperatures are known to adversely affect the O-rings, we also look at the corresponding launch temperature


Figure: number of failed O-rings per mission. There are 23 dots: one time the boosters could not be recovered from the ocean; temperatures are rounded to the nearest degree Fahrenheit; in case of two or more equal data points these are shifted slightly.


Modelling...

The probability p(t) that an individual O-ring fails should depend on the launch temperature t. Use the data to calibrate this model (a Binomial distribution) and estimate the expected number of failures, 6p(t).


Aftermath...

Combining these with estimated probabilities of other events needed for a complete failure of the joint, the estimated probability of failure is 0.023...

Six field joints imply that the probability of at least one complete failure is 1 - (1 - 0.023)^6 ≈ 0.13

Would you hop on the shuttle?
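As a quick sanity check on this arithmetic, here is a short sketch in Python (not part of the original slides), assuming the six field joints fail independently, each with the per-joint probability 0.023 quoted above.

```python
# Probability that at least one of six independent field joints fails
# completely, each with failure probability p = 0.023 (the slide's estimate).
p_joint = 0.023
p_at_least_one = 1 - (1 - p_joint) ** 6
print(round(p_at_least_one, 2))  # 0.13
```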


Section II

Elements of Probability Theory


Outline

Introduction

Events and set operations

Computing probabilities

Conditional probability and independence

Random variables and distributions

Introduction

Probability

Probability, chance, randomness, likelihood, ...

Probability theory aims at representing chance phenomena mathematically.

Mathematics allows us to organise the information and its complexity.

Ultimately, probabilities are always ratios of counts.


Outcomes, events, and sample spaces

Random or unpredictable phenomenon = experiment outcome
The outcomes are elements of a sample space Ω
Subsets of Ω are called events
An event is assigned a probability, between 0 and 1, that expresses its likelihood
Sample spaces: sets whose elements describe the outcomes
Basic experiment: the tossing of a coin; 2 possible outcomes, heads and tails. Sample space Ω = {H, T}


Outcomes, events, and sample spaces

A commuter drives through a sequence of 3 intersections with traffic lights. Each time, she either stops (s) or continues (c). The sample space is the set of all possible outcomes:

Ω = {ccc, ccs, css, csc, sss, ssc, scc, scs}

Experiment: ask the next person we meet on the street in which month her birthday falls. Sample space:

Ω = {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec}

Question: the length of time between successive earthquakes in Nice (France) that are greater than a given magnitude may also be considered an experiment. What is the sample space for this experiment?

Products of sample spaces

Common scenario: same experiment performed several times
Ex: throw a coin twice. What is the sample space?

Ω = {H, T} × {H, T} = {(H, H), (H, T), (T, H), (T, T)}

If we had a fair coin, i.e., P(H) = P(T), then

P((H, H)) = P((H, T)) = P((T, H)) = P((T, T)) = 1/4

Generally, for two experiments with sample spaces Ω1 and Ω2, the sample space for the combined experiment is

Ω = Ω1 × Ω2 = {(ω1, ω2) : ω1 ∈ Ω1, ω2 ∈ Ω2}

If |Ω1| = r and |Ω2| = s, then |Ω1 × Ω2| = rs

Events

Recall: subsets of the sample space are called events
Event A occurs if the experiment outcome is an element of set A
Example (birthday experiment):
events = outcomes corresponding to a long month (31 days)
L = {Jan, Mar, May, Jul, Aug, Oct, Dec}
Events may be combined according to the usual set operations
Example: event = the months having "r" in their name
R = {Jan, Feb, Mar, Apr, Sep, Oct, Nov, Dec}
Then long months having the letter "r" are
L ∩ R = {Jan, Mar, Oct, Dec}
Events and set operations

Events

The set L ∩ R is called the intersection of L and R; it occurs if both L and R occur
Similarly, we have the union A ∪ B of two sets A and B, which occurs if at least one of the events A and B occurs
Another common operation is taking complements
The event A^c = {ω ∈ Ω : ω ∉ A} is called the complement of A; it occurs if and only if A does not occur
The complement of Ω is denoted ∅, the empty set, or impossible event


Sets, set operations, and events

Events A and B are disjoint or mutually exclusive if A and B have no outcomes in common, i.e. A ∩ B = ∅
Ex: {the birthday falls in a long month} ∩ {Feb} = ∅
Event A implies event B if the outcomes of A also lie in B: A ⊂ B
De Morgan's laws: for any two events A and B,

(A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c
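De Morgan's laws are easy to check mechanically. Here is a sketch in Python (not part of the original slides), taking complements relative to the birthday sample space and using the sets L and R defined above.

```python
# Checking De Morgan's laws on the birthday sample space with Python sets;
# the complement of A is taken as months - A.
months = {"Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"}
L = {"Jan","Mar","May","Jul","Aug","Oct","Dec"}        # long months
R = {"Jan","Feb","Mar","Apr","Sep","Oct","Nov","Dec"}  # months with an "r"

assert months - (L | R) == (months - L) & (months - R)  # (A ∪ B)^c = A^c ∩ B^c
assert months - (L & R) == (months - L) | (months - R)  # (A ∩ B)^c = A^c ∪ B^c
print(sorted(L & R))  # ['Dec', 'Jan', 'Mar', 'Oct']
```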

Sets, set operations, and events

Let:
J be the event "John is to blame"
M be the event "Mary is to blame"

Express the following two statements in terms of the events J, J^c, M, M^c:
It is certainly not true that neither John nor Mary is to blame
John or Mary is to blame, or both

Check the equivalence of the statements by means of De Morgan's laws


Disjoint and contained events

Figure: minimal and maximal intersection of two sets.

Computing probabilities

Probability

Probability = measure of how likely it is that an event occurs
A probability is a ratio of counts:

P(A) = (number of ways event A can occur) / (total number of possible outcomes)

The number P(A) is called the probability that A occurs
We assign a probability to each event
Since each event has to be assigned a probability, we speak of a probability function


Probability

Definition: a probability function P on a finite sample space Ω assigns to each event A ⊂ Ω a number P(A) in [0, 1] such that

(i) P(Ω) = 1
(ii) P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅


Probability

Recall:
(i) P(Ω) = 1
(ii) P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅

(i) states that the outcome of the experiment is always an element of the sample space
(ii) is the additivity property of a probability function
(ii) implies additivity of the probability function over more than two sets
If A, B, C are disjoint events, then (A ∪ B) ∩ C = ∅ and

P(A ∪ B ∪ C) = P(A ∪ B) + P(C)
             = P(A) + P(B) + P(C)

Probability

Example: to decide whether Peter or Paul has to wash the dishes, we may toss a coin
We consider this coin fair, which implies heads and tails are equally likely to occur
So we put P({H}) = P({T}) = 1/2
We write {H} for the set consisting of the single element H, because a probability function is defined on events, not on outcomes


Probability

Example: due to an asymmetric distribution of the mass over the coin, the coin is not completely fair
For example, P(H) = 0.4999 and P(T) = 0.5001
Bernoulli experiment: two possible outcomes, say "failure" and "success", with probabilities 1 - p and p of occurring, where p ∈ [0, 1]
Example: buying a ticket in a lottery with 10,000 tickets and only one prize, where "success" stands for winning the prize; then p = 10^-4


Probability

How should we assign probabilities in the experiment where we ask for the birthday month?
P(Jan) = P(Feb) = ... = P(Dec) = 1/12
What about long/short months?
P(Jan) = 31/365 and P(Apr) = 30/365
Assuming that one in every four years is a leap year, how would you assign a probability to each month?
When outcomes are real numbers (e.g. time to next earthquake), it is impossible to assign a positive probability to each outcome (there are just too many outcomes!)


Probability: extensions to non-disjoint events

In general, additivity of P implies that the probability of an event is obtained by summing the probabilities of the outcomes belonging to the event
Exercise: compute P(L) and P(R) in the birthday experiment, where

L = {Jan, Mar, May, Jul, Aug, Oct, Dec}
R = {Jan, Feb, Mar, Apr, Sep, Oct, Nov, Dec}


Probability: extensions to non-disjoint events

Rule to compute probabilities of events A and B that are not disjoint?
Note that we can write

A = (A ∩ B) ∪ (A ∩ B^c)

which is a disjoint union
Hence

P(A) = P(A ∩ B) + P(A ∩ B^c)


Probability: extensions to non-disjoint events

We can split A ∪ B in the same way with B and B^c
We obtain (A ∪ B) ∩ B and (A ∪ B) ∩ B^c
These boil down to B and A ∩ B^c respectively
Thus

P(A ∪ B) = P(B) + P(A ∩ B^c)


Probability: extensions to non-disjoint events

Recall: for any two events A and B,

P(A) = P(A ∩ B) + P(A ∩ B^c)
P(A ∪ B) = P(B) + P(A ∩ B^c)

Eliminating P(A ∩ B^c) from these 2 equations we obtain the rule:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

From the additivity property we can also find a way to compute probabilities of complements of events:
since A ∪ A^c = Ω, P(A^c) = 1 - P(A)
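This rule can be verified numerically on the birthday experiment. The sketch below (in Python, not part of the original slides) assumes equally likely months, 1/12 each, and uses the sets L and R from the exercise above.

```python
# Numerical check of P(A ∪ B) = P(A) + P(B) - P(A ∩ B) for the birthday
# experiment with 12 equally likely months.
from fractions import Fraction

months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
L = {"Jan","Mar","May","Jul","Aug","Oct","Dec"}
R = {"Jan","Feb","Mar","Apr","Sep","Oct","Nov","Dec"}

def P(event):  # probability as a ratio of counts
    return Fraction(len(event), len(months))

assert P(L | R) == P(L) + P(R) - P(L & R)
print(P(L | R))  # 11/12 (every month except Jun)
```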


Counting methods: combinations and permutations

Permutation: ordered arrangement of objects
Given a set of size n and a sample of size k, there are...
with replacement: n^k different ordered samples
without replacement:

A_k^n = n! / (n - k)! = n(n - 1) ... (n - k + 1)

different ordered samples
Corollary: the number of orderings of n elements is

n! = n(n - 1)(n - 2) ... 1

Ex: there are 5! = 120 ways to line up five children
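These three counts are available directly in Python's standard library; a quick sketch (not from the slides) with n = 5 and k = 3:

```python
# Counting ordered samples: n^k with replacement, n!/(n-k)! without
# replacement (math.perm), and n! orderings (math.factorial).
import math

n, k = 5, 3
print(n ** k)             # ordered samples with replacement: 125
print(math.perm(n, k))    # ordered samples without replacement: 60
print(math.factorial(5))  # ways to line up five children: 120
```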



Counting methods: combinations and permutations

Combinations:

C_k^n = (n choose k) = n! / (k!(n - k)!)

enumerates the number of possible combinations of k out of n items
Using C_k^n = n!/(k!(n - k)!) implies that order does not matter
Application: these binomial coefficients occur in

(a + b)^n = Σ_{k=0}^{n} C_k^n a^k b^{n-k}

(try with a = b = 1)
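The suggested check with a = b = 1 can be carried out in a couple of lines; here is a sketch in Python (not part of the original slides), using math.comb for the binomial coefficients.

```python
# With a = b = 1 the binomial theorem gives sum_k C(n, k) = 2^n.
import math

n = 10
total = sum(math.comb(n, k) for k in range(n + 1))
assert total == 2 ** n
print(total)  # 1024
```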

Counting methods: contingency tables

Ex: At a particular police checkpoint, 20% of females fail a breath test for drunken driving. The corresponding percentage for males is 40%. Of the individuals tested, 70% are male.

1 How likely is it that a randomly selected individual passes the breath test?
2 How likely is it that a randomly selected male passes the breath test?
3 Suppose that an individual fails a breath test; what is the probability this individual is female?


Ex: At a particular police checkpoint, 20% of females fail a breath test for drunken driving. The corresponding percentage for males is 40%. Of the individuals tested, 70% are male...

First, pick a hypothetical number of participants, then apply these proportions in a contingency table:

              Gender
Breath test   Male   Female   Total
Pass           420      240     660
Fail           280       60     340
Total          700      300   1,000

Now we can answer the questions (cf. tutorial)...
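As a sketch of how the table answers the three questions (worked in Python rather than the course's R, and not part of the original slides), each answer is just a ratio of the relevant counts:

```python
# Answers from the contingency table (counts out of 1,000 individuals).
from fractions import Fraction

pass_m, pass_f = 420, 240
fail_m, fail_f = 280, 60
total = pass_m + pass_f + fail_m + fail_f  # 1,000

p_pass = Fraction(pass_m + pass_f, total)                 # P(pass) = 660/1000
p_pass_given_male = Fraction(pass_m, pass_m + fail_m)     # P(pass | male)
p_female_given_fail = Fraction(fail_f, fail_m + fail_f)   # P(female | fail)
print(p_pass, p_pass_given_male, p_female_given_fail)  # 33/50 3/5 3/17
```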

Conditional probability and independence

Conditional probability

Ex [Rice 1995 p.15]: digitalis therapy is beneficial to patients with a particular heart condition, but it has a risk of intoxication (a serious side-effect that is difficult to diagnose).
For diagnosis purposes, the concentration of digitalis in the blood is measured in 135 patients and results are arranged as follows:

T+ high blood concentration (positive test)
T- low blood concentration (negative test)
D+ toxicity (disease present)
D- no toxicity (disease absent)


Conditional probability

T+ high blood concentration (positive test)
T- low blood concentration (negative test)
D+ toxicity (disease present)
D- no toxicity (disease absent)

        Toxicity
         D+    D-   Total
T+       25    14      39
T-       18    78      96
Total    43    92     135


Conditional probability

Converting the frequencies to proportions (out of 135):

         D+    D-   Total             D+     D-   Total
T+       25    14      39    T+     .185   .104    .289
T-       18    78      96    T-     .135   .578    .711
Total    43    92     135    Total  .318   .682   1.000

From the table: P(T+) = .289, P(D+) = .318.

If one knows that the test for high blood concentration was positive, what is the probability of disease (toxicity)?

P(D+ | T+) = P(D+ ∩ T+) / P(T+) = 25/39 = .185/.289 = .640 = 64%


Definition of conditional probability

Definition: the conditional probability of A given B is

P(A | B) = P(A ∩ B) / P(B) if P(B) > 0, and 0 otherwise

The multiplication rule follows: for any events A and B,

P(A ∩ B) = P(A|B) P(B)

Show that P(A|B) + P(A^c|B) = 1
Let B be a fixed conditioning event and define Q(A) = P(A | B) for events A ⊂ Ω; then Q is a probability function and hence satisfies all the rules


The law of total probability and Bayes' rule

2001: the EC introduced massive testing of cattle to determine infection with the transmissible form of Bovine Spongiform Encephalopathy (BSE, mad cow disease) [Dekking et al 2005]
As no test is 100% accurate, most tests have the problem of false positives and false negatives
False positive: the test says the cow is infected, although it actually isn't
False negative: an infected cow is not detected by the test
Let B = "cow has BSE" and T = "test comes up positive"
"Test the test" by analyzing samples from cows that are known to be infected or known to be healthy, and so determine the effectiveness of the test

Results may be summarized as follows: an infected cow has a 70% chance of testing positive, and a healthy cow just 10%; i.e. P(T|B) = 0.70, P(T|B^c) = 0.10
Probability P(T) that an arbitrary cow tests positive?
The tested cow is either infected or it is not: T occurs in combination with B or with B^c (no other possibilities)
In terms of events, T = (T ∩ B) ∪ (T ∩ B^c), so that

P(T) = P(T ∩ B) + P(T ∩ B^c)
P(T ∩ B) = P(T|B) P(B)
P(T ∩ B^c) = P(T|B^c) P(B^c)
P(T) = P(T|B) P(B) + P(T|B^c) P(B^c)


Recall:

P(T) = P(T|B) P(B) + P(T|B^c) P(B^c)

This is an application of the law of total probability
Computing a probability through conditioning on several disjoint events that make up the whole sample space
Suppose P(B) = 0.02; then:

P(T) = 0.70 × 0.02 + 0.10 × (1 - 0.02) = 0.112
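The same computation as a short Python sketch (not from the slides), which also makes the exercise on the next slide easy to redo with other test accuracies:

```python
# Law of total probability for the BSE test:
# P(T) = P(T|B)·P(B) + P(T|B^c)·P(B^c).
def p_positive(p_b, p_t_given_b, p_t_given_not_b):
    return p_t_given_b * p_b + p_t_given_not_b * (1 - p_b)

p_t = p_positive(p_b=0.02, p_t_given_b=0.70, p_t_given_not_b=0.10)
print(round(p_t, 3))  # 0.112
```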


Total probability

Exercise: calculate P(T) when P(T|B) = 0.99 and P(T|B^c) = 0.05

The law of total probability:
Suppose B_1, B_2, ..., B_m are disjoint events such that B_1 ∪ B_2 ∪ ... ∪ B_m = Ω
The probability of an arbitrary event A can be expressed as:

P(A) = Σ_{i=1}^{m} P(A|B_i) P(B_i)


Bayes' Theorem

Suppose a cow tests positive; what is the probability it really has BSE?
I.e. what is P(B|T) given P(T|B)? We have:

P(B|T) = P(T ∩ B) / P(T)
       = P(T|B) P(B) / (P(T|B) P(B) + P(T|B^c) P(B^c))

This is Bayes' rule. So with P(B) = 0.02 we find

P(B|T) = (0.70 × 0.02) / (0.70 × 0.02 + 0.10 × (1 - 0.02)) = 0.125

Similarly: P(B|T^c) = 0.0068
This is not a very good test; a perfect test would result in P(B|T) = 1 and P(B|T^c) = 0.
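The posterior probability can be sketched in Python (not part of the original slides), combining Bayes' rule with the total-probability denominator from the previous slide:

```python
# Bayes' rule for the BSE test: P(B|T) = P(T|B)·P(B) / P(T),
# with P(B)=0.02, P(T|B)=0.70, P(T|B^c)=0.10 as on the slide.
def posterior(p_b, p_t_given_b, p_t_given_not_b):
    p_t = p_t_given_b * p_b + p_t_given_not_b * (1 - p_b)  # total probability
    return p_t_given_b * p_b / p_t

p = posterior(0.02, 0.70, 0.10)
print(round(p, 3))  # 0.125
```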

Bayes' Theorem

Bayes' rule:
Suppose the events B_1, B_2, ..., B_m are disjoint and B_1 ∪ B_2 ∪ ... ∪ B_m = Ω. Then

P(B_i|A) = P(A|B_i) P(B_i) / Σ_{j=1}^{m} P(A|B_j) P(B_j)

It follows from P(B_i|A) P(A) = P(A|B_i) P(B_i) in combination with the law of total probability applied to P(A)

Mad cow example:
Calculate P(B|T) and P(B|T^c) if P(T|B) = 0.99 and P(T|B^c) = 0.05


Independence

Consider the three probabilities

P(B) = 0.02, P(B|T) = 0.125, P(B|T^c) = 0.0068

If we know nothing about a cow, we would say that there is a 2% chance it is infected (B)
But if we know it tested positive (T), we can say there is a 12.5% chance the cow is infected
If it tested negative (T^c), there is only a 0.68% chance
Knowing whether T occurs affects our assessment of the likelihood of B

Independence

Consider the three probabilities

P(B) = 0.02, P(B|T) = 0.125, P(B|T^c) = 0.0068

Imagine the opposite: the test is useless
Whether the cow is infected is unrelated to the outcome of the test, and knowing the outcome of the test does not change our probability of B: P(B|T) = P(B)
In this case we would call B independent of T


Independence

Definition: an event A is called independent of B if P(A|B) = P(A)
By application of the multiplication rule, if A is independent of B, then

P(A ∩ B) = P(A|B) P(B) = P(A) P(B)

On the other hand, if P(A ∩ B) = P(A) P(B), then P(A|B) = P(A) follows from the definition of conditional probability
Therefore A independent of B is equivalent to P(A ∩ B) = P(A) P(B)


Independence

Finally, by definition of conditional probability, if A is independent of B, then

P(B|A) = P(A ∩ B) / P(A) = P(A) P(B) / P(A) = P(B)

that is, B is independent of A

To show that A and B are independent it suffices to prove just one of the following:
P(A | B) = P(A)
P(B | A) = P(B)
P(A ∩ B) = P(A) P(B)
where A may be replaced by A^c, B replaced by B^c, or both
If one of these statements holds, all of them are true


Independence: example [Rice 1995 p.22]

A card is selected at random from a deck. Let A = "card is an ace" and D = "card is a diamond".
Knowing the card is an ace gives no information about its suit
Checking formally for independence: P(A) = 4/52 = 1/13 and P(D) = 1/4
Also, P(A ∩ D) = 1/52
Since P(A) P(D) = (1/4) × (1/13) = 1/52, the events are in fact independent
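This check can also be done by brute-force enumeration of the deck; a sketch in Python (not part of the original slides), with probabilities kept as exact fractions:

```python
# Independence of "ace" and "diamond": compare P(A ∩ D) with P(A)·P(D)
# over the 52 equally likely cards of a standard deck.
from fractions import Fraction
from itertools import product

ranks = ["A","2","3","4","5","6","7","8","9","10","J","Q","K"]
suits = ["clubs","diamonds","hearts","spades"]
deck = list(product(ranks, suits))  # 52 cards

A = [c for c in deck if c[0] == "A"]         # aces
D = [c for c in deck if c[1] == "diamonds"]  # diamonds
AD = [c for c in deck if c in A and c in D]  # ace of diamonds

P = lambda e: Fraction(len(e), len(deck))
assert P(AD) == P(A) * P(D)  # 1/52 = (1/13)·(1/4)
print(P(AD))  # 1/52
```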


Independence of more than two events

Events A_1, A_2, ..., A_m are called independent if

P(A_1 ∩ A_2 ∩ ... ∩ A_m) = P(A_1) P(A_2) ... P(A_m)

and this also holds when any subset of the events is replaced by their complements
Suppose A and B are independent, and B and C are independent. Are A and C then independent?

Independence of more than two events: example

Perform two independent tosses of a coin
Let A = {heads on toss 1}, B = {heads on toss 2}, and C = {the two tosses are equal}
P(A) = P(B) = 1/2,
P(C) = P(A ∩ B) + P(A^c ∩ B^c) = 1/4 + 1/4 = 1/2
A, B are independent by assumption

Independence of more than two events: example

Given that the first toss is heads (A occurs), C occurs if and only if the second toss is heads as well (B occurs), so

P(C|A) = P(B|A) = P(B) = 1/2 = P(C)

By symmetry, P(C|B) = P(C)
So all pairs taken from A, B, C are independent: the three are called pairwise independent
But P(A ∩ B ∩ C) = P(A ∩ B) = 1/4, whereas P(A) P(B) P(C) = 1/8
And P(A ∩ B ∩ C^c) = P(∅) = 0, whereas P(A) P(B) P(C^c) = 1/8
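The pairwise-but-not-mutual independence can be confirmed by enumerating the four outcomes; a sketch in Python (not part of the original slides):

```python
# Two fair coin tosses: A = heads on toss 1, B = heads on toss 2,
# C = both tosses equal. Pairwise independent, but not mutually.
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))  # 4 equally likely outcomes
P = lambda e: Fraction(len(e), len(omega))

A = [w for w in omega if w[0] == "H"]
B = [w for w in omega if w[1] == "H"]
C = [w for w in omega if w[0] == w[1]]
ABC = [w for w in omega if w in A and w in B and w in C]

# every pair is independent...
assert P([w for w in omega if w in A and w in B]) == P(A) * P(B)
assert P([w for w in omega if w in A and w in C]) == P(A) * P(C)
assert P([w for w in omega if w in B and w in C]) == P(B) * P(C)
# ...but the triple intersection breaks the product rule
print(P(ABC), P(A) * P(B) * P(C))  # 1/4 1/8
```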
Random variables and distributions

Random variables

A random variable is a variable of interest whose values are not known in advance and are subject to chance (variability).
Each possible value of a r.v. has an associated likelihood, or probability (or mass, depending on the context).
A r.v. actually is a mapping defined over the whole sample space, i.e. it is a function. The development of random variables is associated with measure theory.

Discrete random variables

Let Ω be a sample space. A discrete random variable is a function X : Ω → R that takes on a finite number of values a_1, a_2, ..., a_n or an infinite number of values a_1, a_2, ...
In a way, a discrete random variable X transforms a sample space Ω to a more tangible sample space Ω̃, whose events are more relevant
Example: two throws with a die and the corresponding sums and maximum
For instance, S = "sum" transforms

Ω = {(1, 1), (1, 2), ..., (1, 6), (2, 1), ..., (6, 5), (6, 6)}

to Ω̃ = {2, ..., 12}
M = "maximum" transforms Ω to Ω̃ = {1, ..., 6}

Formally, we must determine the probability distribution of X, i.e. describe how the probability mass is distributed over possible values of X
Once a discrete r.v. X is introduced, we can list the possible values of X and their corresponding probabilities, and the sample space is no longer important
This information is contained in the probability mass function (pmf) of X
Ex (maximum): what is the pmf of M?

M = a     1     2     3     4     5      6
p(a)   1/36  3/36  5/36  7/36  9/36  11/36
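This pmf can be derived by brute force; here is a sketch in Python (not part of the original slides) that enumerates the 36 equally likely throws:

```python
# pmf of M = max of two dice by enumerating all 36 outcomes:
# the count for M = a turns out to be 2a - 1, so p(a) = (2a - 1)/36.
from itertools import product
from collections import Counter

counts = Counter(max(i, j) for i, j in product(range(1, 7), repeat=2))
print([counts[a] for a in range(1, 7)])  # [1, 3, 5, 7, 9, 11]
```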


The probability mass function p of a discrete random variable X is the function

p : R → [0, 1]

defined by p(a) = P(X = a) for -∞ < a < ∞. If X is a discrete random variable that takes on the values a_1, a_2, ..., then

p(a_i) > 0 and Σ_i p(a_i) = 1

and p(a) = 0 for all other a.


Definition: the distribution function F of a random variable X is the function

F : R → [0, 1]

defined by F(a) = P(X ≤ a) for -∞ < a < ∞.

Both the probability mass function and the distribution function of a discrete random variable X contain all the probabilistic information of X
The probability distribution of X is determined by either of them

Properties of pmf and cdf

Figure: example plots of the pmf and distribution function of M.


Properties of the distribution function F of a random variable X:

1 For a ≤ b one has that F(a) ≤ F(b)
2 Since F(a) is a probability, 0 ≤ F(a) ≤ 1, and

lim_{a→+∞} F(a) = 1
lim_{a→-∞} F(a) = 0

3 F is right-continuous, i.e., one has

lim_{ε↓0} F(a + ε) = F(a)

NB: a ≤ b implies that the event {X ≤ a} is contained in the event {X ≤ b}
Conversely, any function F satisfying 1, 2, and 3 is the distribution function of some random variable

Exercise: let X be a discrete random variable, and let a be such that p(a) > 0. Show that

F(a) = P(X < a) + p(a)


Continuous random variables

Let Ω be a sample space. A continuous random variable is a function X : Ω → R that takes on any value a ∈ R
We no longer consider the mass of each possible value of X
Instead we consider the likelihood that X ∈ (a, b) for a < b
Example: the pH level X of some chemical compound can take any value between 0 and 14. We would then evaluate e.g. the probability that 5.5 ≤ X ≤ 6.5.


Continuous random variables

The probability density function (pdf) f(x) of X is an integrable function such that

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Conditions on f:
f(x) ≥ 0 for all x
∫_{-∞}^{+∞} f(x) dx = 1

The cdf of a continuous r.v. X is defined as

F(x) = ∫_{-∞}^{x} f(u) du = P(X ≤ x)


Expectation and variance

Definition: the expected value of a discrete random variable X is defined as

E(X) = Σ_{x_i} x_i p(x_i)

Definition: the expected value of a continuous random variable X is defined as

E(X) = ∫_{-∞}^{+∞} x f(x) dx

E[g(X)] is said to exist if the corresponding sum or integral exists.


Definition: the variance of a random variable X is defined as

Var(X) = E[(X - E(X))^2] = E(X^2) - E(X)^2

The variance of a discrete r.v. X is obtained from the pmf:

Var(X) = Σ_{x_i} x_i^2 p(x_i) - (Σ_{x_i} x_i p(x_i))^2

The variance of a continuous r.v. X is obtained from the pdf:

Var(X) = ∫ x^2 f(x) dx - (∫ x f(x) dx)^2
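These formulas can be applied directly to the two-dice maximum M whose pmf was derived earlier; a sketch in Python (not part of the original slides), with exact fractions:

```python
# E(M) and Var(M) for the maximum of two dice, from its pmf p(a) = (2a-1)/36.
from fractions import Fraction

pmf = {a: Fraction(2 * a - 1, 36) for a in range(1, 7)}
E = sum(a * p for a, p in pmf.items())        # E(M)   = Σ a·p(a)
E2 = sum(a * a * p for a, p in pmf.items())   # E(M^2) = Σ a²·p(a)
var = E2 - E ** 2                             # Var(M) = E(M^2) - E(M)^2
print(E, var)  # 161/36 2555/1296
```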


The standard deviation of a r.v. X is

σ(X) = √Var(X)

It has the same dimension as the measure itself: e.g. if X is expressed in metres, then so is σ(X)

Some properties of E(X) and Var(X)

Expectation:
E(aX) = a E(X), a constant
E(XY) = E(X) E(Y) if X and Y are independent
E(a + bX) = a + b E(X) (linearity)
E(X + Y) = E(X) + E(Y) (linearity)
E[Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} E[X_i]

Variance:
Var(aX) = a² Var(X), a constant
Var(a + X) = Var(X), a constant

Section III

Discrete Random Variables


Outline

The Binomial distribution

The Geometric distribution

The Poisson distribution

The Binomial distribution

Binomial experiments
Consider an experiment with outcomes 1 (success) and 0
(failure), repeated five times

Then Ω = {0, 1} × {0, 1} × {0, 1} × {0, 1} × {0, 1}

Consider A = exactly one experiment was a success

This event is given by the set

A = {(0, 0, 0, 0, 1), (0, 0, 0, 1, 0), (0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, 0, 0, 0)}

Let success have probability p and failure probability 1 − p

Then P(A) = 5(1 − p)⁴p, since there are five outcomes in the
event A, each having probability (1 − p)⁴p


Binomial experiments

Exercise: What is the probability of the event B exactly two


experiments were successful?


The Bernoulli and Binomial distributions


The Bernoulli distribution is used to model an experiment with
only two possible outcomes, often referred to as success and
failure, usually encoded as 1 and 0.

Definition: A discrete random variable X has a Bernoulli
distribution with parameter p, where 0 ≤ p ≤ 1, if its probability
mass function is given by

    p_X(1) = P(X = 1) = p

and
    p_X(0) = P(X = 0) = 1 − p

Notation: X ∼ Ber(p).



Suppose you attend, completely unprepared, a multiple-choice


exam

It consists of 10 questions, and each question has four


alternatives (of which only one is correct)

You will pass the exam if you answer six or more questions
correctly

You decide to answer each of the questions in a random way,


in such a way that the answer of one question is not affected
by the answers of the others

What is the probability that you will pass?


Bernoulli / Binomial

Setting, for i = 1, 2, . . . , 10,

    R_i = 1 if the i-th answer is correct
    R_i = 0 if the i-th answer is wrong

The number of correct answers X is given by X = Σ_{i=1}^{10} R_i

Exercise:
Calculate the probability that you answered the first question
correctly and the second one incorrectly


X attains only the values 0, 1, . . . , 10

Let us first consider the case X = 0

Since the answers to the different questions do not influence


each other, we conclude that the events {R1 = a1 },. . . ,
{R10 = a10 } are independent for every choice of the ai , where
each ai is 0 or 1

We have
    P(X = 0) = P(R_1 = 0, R_2 = 0, . . . , R_10 = 0)
             = P(R_1 = 0) P(R_2 = 0) . . . P(R_10 = 0)
             = (3/4)^10
The probability that we have answered exactly one question
correctly equals
    P(X = 1) = 10 × (1/4) × (3/4)^9

The probability of observing exactly k successes (in the 10
independent trials) is

    P(X = k) = (10 choose k) (1/4)^k (3/4)^(10−k)

When order matters:

Choose k different objects out of an ordered list of n objects:

  n possibilities for the first object

  n − 1 possibilities for the second object

  n − 2 possibilities for the third object

  ...

  n − (k − 1) possibilities for the k-th object

So there are n(n − 1) . . . (n − (k − 1)) ways to choose the k
objects

When order does not matter:

Any two arrangements represent the same choice if they
are composed of the same objects

Thus a single unordered collection of k objects corresponds to
k! ordered arrangements

So the number of distinct choices is obtained by dividing the
number of ordered arrangements by k!

The probability that you will pass is P(X ≥ 6) ≈ 0.0197

It pays to study, doesn't it?!


Definition: A discrete random variable X has a Binomial
distribution with parameters n and p, where n = 1, 2, . . . and
0 ≤ p ≤ 1, if its probability mass function is given by

    p_X(k) = P(X = k) = (n choose k) p^k (1 − p)^(n−k)

for k = 0, 1, . . . , n

We denote this distribution by Bin(n, p)

The expectation of a Binomial distribution Bin(n, p) is

    E(X) = np

Its variance is

    Var(X) = np(1 − p)
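The exam example above can be sketched with this pmf (a numerical aside, not part of the notes), recovering the pass probability P(X ≥ 6) ≈ 0.0197 for n = 10, p = 1/4:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.25                                  # the multiple-choice exam
p_pass = sum(binom_pmf(k, n, p) for k in range(6, n + 1))
print(round(p_pass, 4))                          # about 0.0197

mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(round(mean, 6))                            # E(X) = n*p = 2.5
```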


The Geometric distribution


Example of infinite experiments: the geometric experiment

1 Each observation falls into one of two categories, either


success or failure

2 The probability of a success, call it p, is the same for each


observation

3 The observations are all independent (this allows us to


multiply probabilities)

4 The variable of interest is the number of trials required to


obtain the first success


Definition: A discrete random variable X has a geometric
distribution with parameter p, where 0 < p ≤ 1, if its probability
mass function is given by

    p_X(k) = P(X = k) = (1 − p)^(k−1) p

for k = 1, 2, . . .

We denote this distribution by Geo(p)

The expectation of a Geometric distribution Geo(p) is

    E(X) = Σ_{k=1}^∞ k p (1 − p)^(k−1) = 1/p

Its variance is

    Var(X) = (1 − p)/p²
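A quick numerical check of these formulas (not part of the notes), summing the pmf over a long range; p = 0.3 is an arbitrary choice:

```python
# Sketch: checking E(X) = 1/p and P(X > n) = (1-p)^n for X ~ Geo(p)
# by summing the pmf p_X(k) = (1-p)^(k-1) p over a long truncated range.

p = 0.3

def pmf(k):
    return (1 - p)**(k - 1) * p

mean = sum(k * pmf(k) for k in range(1, 1000))
print(round(mean, 6))                           # close to 1/p = 3.333333

n = 5
tail = 1 - sum(pmf(k) for k in range(1, n + 1))  # P(X > n)
print(round(tail, 6), round((1 - p)**n, 6))      # both ~ 0.16807
```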

Geometric distribution

Exercise:
Let X have a Geo(p) distribution. For n ≥ 0, show that
P(X > n) = (1 − p)^n.


Memoryless property
Memoryless property: for n, k = 0, 1, 2, . . . one has

    P(X > n + k | X > k) = P(X > n)

We have:

    P(X > n + k | X > k) = P({X > k + n} ∩ {X > k}) / P(X > k)
                         = P(X > k + n) / P(X > k)
                         = (1 − p)^(n+k) / (1 − p)^k
                         = (1 − p)^n
                         = P(X > n)

The Poisson distribution


One may be interested in counts per unit time/space interval

If counts per unit interval are typically relatively low (rare),


the situation may be modelled by a Poisson distribution

Example: we observe the number X of incoming calls (events)


at a call centre per hour
Two assumptions:
Homogeneity: the rate at which events occur is constant
over time/space

Independence: the numbers of events in disjoint intervals are


independent of each other

Homogeneity implies that for any unit interval we require

    E(X) = λ

Definition: A discrete random variable X has a Poisson
distribution with parameter λ > 0 if its probability mass function p
is given by

    p(k) = P(X = k) = e^(−λ) λ^k / k!

We denote this distribution by Poi(λ).

Derivation of the expectation of a Poisson rv X with rate λ:

    E(X) = Σ_{k=0}^∞ k e^(−λ) λ^k / k!
         = λ e^(−λ) Σ_{k=1}^∞ λ^(k−1) / (k − 1)!
         = λ e^(−λ) Σ_{j=0}^∞ λ^j / j! = λ

The variance can be derived in a similar way:

    Var(X) = λ
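The property E(X) = Var(X) = λ can be checked numerically from the pmf (a sketch, not part of the notes; λ = 4 is an arbitrary choice):

```python
from math import exp, factorial

# Sketch: Poisson pmf and a numerical check that E(X) = Var(X) = lambda.
lam = 4.0

def pmf(k):
    return exp(-lam) * lam**k / factorial(k)

ks = range(100)                      # truncation; tail mass is negligible
mean = sum(k * pmf(k) for k in ks)
var  = sum(k**2 * pmf(k) for k in ks) - mean**2
print(round(mean, 6), round(var, 6))   # both close to 4.0
```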

Section IV

Continuous Random Variables


Outline

Continuous random variables

The Uniform distribution

The Exponential distribution

The Normal distribution

Moments


Continuous random variables

Many experiments have outcomes that take values on a


continuous scale (e.g. of weight, length, duration, etc.)

Probability density functions may be seen as arising from a
(never-ending) process of refinement of discrete random variables

Ex: a discrete rv takes on the value 6.283 with probability p.


This value may be refined (updated at a smaller scale), and
then the probability p is spread over the outcomes
6.2830, 6.2831, . . . , 6.2839

Each of these new values is taken on with a probability


smaller than p, and the sum of the ten probabilities is p


Continuous random variables

Continuing the refinement process to more and more


decimals, the probabilities of the possible values of the
outcomes become smaller and smaller, approaching zero

However, the probability that the possible values lie in some


fixed interval [a, b] will settle down


Continuous random variables


A random variable X is continuous if for some function f : R → R
and for any numbers a, b with a ≤ b,

    P(a ≤ X ≤ b) = ∫_a^b f(x) dx

and f satisfies f(x) ≥ 0 for all x and ∫_{−∞}^{+∞} f(x) dx = 1. We call f the
probability density function (or probability density) of X .


Continuous random variables


P(a ≤ X ≤ b) = area under the probability density function f
on the interval [a, b]


Continuous random variables

Let X be a continuous random variable. Then for all ε > 0, we
have:

    P(a − ε ≤ X ≤ a + ε) = ∫_{a−ε}^{a+ε} f(x) dx

For ε → 0, it follows that for all a, P(X = a) = 0

For all constants a, b:

    P(a ≤ X ≤ b) = P(a < X ≤ b)
                 = P(a ≤ X < b)
                 = P(a < X < b)


Continuous random variables


For small ε > 0,

    P(a − ε ≤ X ≤ a + ε) = ∫_{a−ε}^{a+ε} f(x) dx ≈ 2εf(a)

Hence f(a) can be interpreted as a (relative) measure of how
likely it is that X will be near a

f(a) is not a probability: f(a) can be arbitrarily large

Ex: the function

    f(x) = 0           if x ≤ 0
    f(x) = 1/(2√x)     if 0 < x < 1
    f(x) = 0           if x ≥ 1

is a probability density function
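A numerical aside (not from the notes): this example density integrates to 1 even though it is unbounded near 0, which illustrates that f(a) is not itself a probability:

```python
from math import sqrt

# Sketch: f(x) = 1/(2*sqrt(x)) on (0, 1) integrates to 1 although
# f(a) grows without bound as a -> 0.

def f(x):
    return 1 / (2 * sqrt(x)) if 0 < x < 1 else 0.0

n = 10**5
h = 1 / n
total = sum(f((i + 0.5) * h) for i in range(n)) * h   # midpoint rule
print(round(total, 2))    # close to 1.0
print(f(1e-8) > 1000)     # True: the density is huge near 0
```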

Continuous random variables


Discrete rvs do not have a probability density function f

Continuous rvs do not have a probability mass function p

But both have a distribution function F(a) = P(X ≤ a)

For a < b, the event {X ≤ b} is a disjoint union of the events
{X ≤ a} and {a < X ≤ b}

We can express the probability that X lies in an interval (a, b]
directly in terms of F for both cases:

    P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a)
                 = F(b) − F(a)

    F(b) = ∫_{−∞}^b f(x) dx

Continuous random variables - example

Suppose we want to make a probability model for an experiment


that can be described as an object hits a disc of radius r in a
completely arbitrary way. We are interested in the distance X
between the hitting point and the center of the disc.

Since distances cannot be negative, we have
F(b) = P(X ≤ b) = 0 when b < 0

Since the object hits the disc, we have F(b) = 1 when b > r

Probability of hitting any region is proportional to the area of


that region


Continuous random variables - example

The original disc has area πr²

The inner disc defined by the hitting point has radius b and
area πb²

We should put F(b) = P(X ≤ b) = πb²/(πr²) = b²/r² for 0 ≤ b ≤ r

The pdf f of X is equal to 0 outside the interval [0, r] and, for
0 ≤ x ≤ r,

    f(x) = dF(x)/dx = (1/r²) d(x²)/dx = 2x/r²
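This cdf makes the exercise probabilities immediate; a short sketch (not part of the notes, with r = 1 as an arbitrary choice):

```python
# Sketch: the disc-hitting example, where F(b) = b^2 / r^2 for 0 <= b <= r.

r = 1.0

def F(b):
    b = min(max(b, 0.0), r)   # clip to [0, r] so F = 0 below and 1 above
    return b**2 / r**2

print(F(r / 2) - F(0))     # P(0 < X <= r/2) = 0.25
print(F(r) - F(r / 2))     # P(r/2 < X <= r) = 0.75
```

Note that the two probabilities differ even though the two intervals have the same length, since hitting probability is proportional to area, not radius.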


Continuous random variables - exercise

Exercise:
Compute for the darts example the probability that
0 < X r /2, and the probability that r /2 < X r .


Expected value of a functional of a random variable

Let g(x) be a function of rv X ; then g(X) is also a random
variable

Then, for the discrete and continuous cases respectively,

    E[g(X)] = Σ_x g(x) P(X = x)

and

    E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx

E[g(X)] is said to exist if the corresponding sum or integral
exists


The Uniform distribution

The Uniform distribution corresponds to an experiment where


the outcome is completely arbitrary, except that we know that
it lies between certain bounds

Example: measure for a long time the emission of radioactive


particles of some material


The Uniform distribution

Suppose the experiment consists of recording in each hour at


what times the particles are emitted

Then the outcomes will lie in the interval [0,60] minutes

The measurements must not concentrate in any temporal


way (in our physical world anyway)

Not concentrating in any way means that subintervals of


the same length should have the same probability

The pdf should be constant on [0, 60]


The Uniform distribution

A continuous rv has a Uniform distribution on the interval [α, β] if
its probability density function f is given by

    f(x) = 0             if x ∉ [α, β]
    f(x) = 1/(β − α)     for α ≤ x ≤ β

We denote this distribution by U(α, β).

Exercise:
Argue that the distribution function F of a rv that has a
U(α, β) distribution is given by F(x) = 0 if x < α, F(x) = 1
if x > β, and F(x) = (x − α)/(β − α) for α ≤ x ≤ β.


The Uniform distribution


pdf and the distribution function of a U(0, 1/3) distribution:


The Exponential distribution


Describes how long until something happens (e.g. time
between emissions of particles from a radioactive source)

May be obtained as a continuous limit of the Geometric
distribution

Also has the memoryless property:

    P(X > s + t | X > s) = P(X > t)

Notation: Exp(λ), with rate λ > 0

If X ∼ Exp(λ), then the range of X is R+

Cumulative distribution function:

    F(a) = 1 − e^(−λa)   for a ≥ 0

Probability density function:

    f(x) = λ e^(−λx)     for x ≥ 0
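The memoryless property can be verified directly from the survival function P(X > a) = e^(−λa); a short sketch (not part of the notes, with arbitrary λ, s, t):

```python
from math import exp

# Sketch: checking P(X > s + t | X > s) = P(X > t) for X ~ Exp(lambda),
# using the survival function P(X > a) = exp(-lambda * a).

lam = 0.5

def surv(a):
    return exp(-lam * a)        # P(X > a) = 1 - F(a)

s, t = 2.0, 3.0
cond = surv(s + t) / surv(s)    # P(X > s+t | X > s)
print(round(cond, 6) == round(surv(t), 6))   # True
```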

The Exponential distribution


Exponential distribution for various rates (en.wikipedia.org):

The Normal distribution

Illustration
Example: relative frequency histogram of lifetimes of a
computer component

What happens if one uses finer bins (with a large enough


sample)?

Using finer bins (classes):

This bell-shaped curve is typical of the Normal distribution



The Normal distribution

The normal distribution has two parameters: its mean μ and
its standard deviation σ

Notation: N(μ, σ²)

If X ∼ N(μ, σ²), then the range of X is R, with μ ∈ R and σ > 0

The density is given by

    f(x) = (1 / (σ√(2π))) e^(−(x−μ)² / (2σ²))


The Normal distribution


The shape of the normal distribution varies according to the
values of μ and σ

However the distribution is always bell-shaped and symmetric
about the mean μ


The Standard Normal distribution: probability table


To find probabilities, we would need to integrate the pdf

But integrating this pdf is not straightforward

Instead, use a table of standard normal probabilities

The Normal table gives the areas to the right for a series of
z-values, i.e. right-hand tails P(Z ≥ z)


The Standard Normal distribution: probability table

Table 1. Areas in the Tail of the Standard Normal Distribution

This table gives the probability that a standardised normal variable
will be at least (x − μ)/σ,
where μ is the mean and σ is the standard deviation of the normal
variable.

.00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641
0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247
0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859
0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121

0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776
0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451
0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611

1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379

The Standard Normal distribution: exercises

Find P(Z < 0.45)

Find P(Z < 1.03)

Find P(0.36 ≤ Z ≤ 1.04)

Find P(0.48 ≤ Z ≤ 0.60)

Find P(−1.96 ≤ Z ≤ 0.63)


Percentiles of the Normal distribution


We can also use the standard normal table to find percentiles
of the standard normal distribution

Ex: 80th percentile = value below which lies 80% of the


distribution, i.e. P(Z < P0.80 ) = 0.80

We can use the table of probabilities, working backwards

Since P(Z > P0.80 ) = 0.20, we rather look for 0.2000 within
the table

We read that
    P(Z > 0.84) = 0.2005
    P(Z > 0.85) = 0.1977
Therefore 0.84 < P0.80 < 0.85

Interpolating, we get P0.80 ≈ 0.841



Percentiles of the Normal distribution

Exercise: find the 95th percentile of the standard normal


distribution...

This knowledge will become useful in further sections...


Standardization

Most commonly, normal rvs are not standard, i.e. μ ≠ 0
and/or σ ≠ 1

Standardizing these allows one to apply the table of normal
probabilities

Standardization: given a rv X ∼ N(μ, σ²),

    Z = (X − μ)/σ ∼ N(0, 1)

This principle is also (implicitly) fundamental in many
statistical inference methods


Standardization: example
A life assurance company has established that the lifetimes of
a certain subgroup of policy-holders are normally distributed
with a mean of 72 years and a standard deviation of 4 years,
i.e. the continuous lifetime H ∼ N(72, 4²)

What percentage of policy-holders lives longer than 78 years?


Standardization: example

Standardize:

    P(H > 78) = P( (H − 72)/4 > (78 − 72)/4 ) = P(Z > 1.50)
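As a numerical aside (not from the notes), the standard normal cdf Φ can be written with the error function, which lets us finish the example without a printed table:

```python
from math import erf, sqrt

# Sketch: standardizing H ~ N(72, 4^2) and evaluating P(H > 78)
# with the standard normal cdf Phi written via math.erf.

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 72, 4
z = (78 - mu) / sigma          # standardized value
print(z)                       # 1.5
print(round(1 - phi(z), 4))    # P(Z > 1.5), about 0.0668
```

So roughly 6.7% of these policy-holders live longer than 78 years.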


Adding Normal variables

The sum (or difference) of independent normal variables is
also normally distributed

Suppose we have n independent normal variables
X_1, X_2, . . . , X_n where X_i ∼ N(μ_i, σ_i), i = 1, . . . , n

Then, for a sequence of constants {a_1, . . . , a_n},

    Y = Σ_{i=1}^n a_i X_i ∼ N( Σ_{i=1}^n a_i μ_i , √( Σ_{i=1}^n a_i² σ_i² ) )
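This rule solves the detergent exercise below; a sketch (not part of the notes) with Y = 2A + 3B, A ∼ N(50, 1.5²), B ∼ N(75, 2.5²):

```python
from math import erf, sqrt

# Sketch: distribution of Y = 2A + 3B for the detergent exercise,
# with A ~ N(50, 1.5^2) and B ~ N(75, 2.5^2) independent.

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))     # standard normal cdf

mean = 2 * 50 + 3 * 75                      # 325 ml
sd = sqrt(2**2 * 1.5**2 + 3**2 * 2.5**2)    # sqrt(65.25) ~ 8.08 ml
print(mean, round(sd, 3))
print(round(1 - phi((330 - mean) / sd), 3))   # P(Y > 330), about 0.268
```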


Adding Normal variables: exercises

A chemical detergent is made by mixing 2 ingredients, A and


B.
The volumes of A are normally distributed with a mean of
50ml and a standard deviation of 1.5ml.

The volumes of B are normally distributed with a mean of


75ml and a standard deviation of 2.5ml.

The detergent is made by mixing 2 parts of A with 3 parts of


B.

What proportions of detergent will have volumes greater than


330ml?


Adding Normal variables: exercises

At a certain bank, account balances are normally distributed


with a mean of 1700 and a standard deviation of 100.
A random sample of n accounts is taken.

What is the distribution of the sample total?

NB: Each account balance is normally distributed with mean


1700 and standard deviation 100.


Moments

E [X k ] is called the kth raw moment of X , if the expectation


exists, where k is any positive integer.

E [|X |k ] is called the kth absolute moment.

E [(X E [X ])k ] is called the kth central moment.


Moment Generating Function


Special expectation: the Moment Generating Function (MGF)

    M_X(t) = E[e^(tX)]

if it exists. The MGF is an alternative way of specifying the
distribution of a random variable.

For a continuous distribution,

    M_X(t) = ∫ e^(tx) f(x) dx
           = ∫ (1 + tx + (1/2!) t²x² + . . . ) f(x) dx
           = 1 + t E[X] + (1/2!) t² E[X²] + . . .

Moment Generating Function

Useful properties:
1. Derivatives at 0:

       lim_{t→0} d^k M_X(t) / dt^k = E[X^k]

2. If X, Y are independent,

       M_{X+Y}(t) = E[e^(t(X+Y))]
                  = E[e^(tX) e^(tY)]
                  = E[e^(tX)] E[e^(tY)]
                  = M_X(t) M_Y(t)


Examples of MGFs
X ∼ Exp(λ) (scale parameterization, with pdf f(x) = (1/λ)e^(−x/λ)):

    M_X(t) = ∫_0^∞ (1/λ) e^(tx) e^(−x/λ) dx
           = (1/λ) ∫_0^∞ e^(x[t − (1/λ)]) dx
           = (1/λ) [ e^(x[t − (1/λ)]) / (t − (1/λ)) ]_0^∞
           = (1/λ) ( lim_{x→∞} e^(x[t − (1/λ)]) / (t − (1/λ)) − 1/(t − (1/λ)) )
           = 1 / (1 − λt)

as long as t < 1/λ, since the upper limit will then vanish

Examples of MGFs
X ∼ N(μ, σ²):

    M_X(t) = ∫ e^(tx) (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) dx
           = ∫ (1/(σ√(2π))) e^(tx) e^(−(x² − 2μx + μ²)/(2σ²)) dx
           = ∫ (1/(σ√(2π))) e^(−(x² − 2x(μ + tσ²) + μ²)/(2σ²)) dx
           = e^((μ′² − μ²)/(2σ²)) ∫ (1/(σ√(2π))) e^(−(x − μ′)²/(2σ²)) dx

where μ′ = μ + tσ²

The integrand is another Gaussian density with a different mean

Therefore it integrates to 1, and expanding the exponent gives

    M_X(t) = e^(μt + σ²t²/2)

Characteristic function
The MGF of a random variable does not always exist

Its characteristic function φ_X always exists, where

    φ_X(t) = E[e^(itX)] = ∫ e^(itx) f(x) dx

is the Fourier transform of f(x)

Connection with the MGF:

    φ_X(t) = M_X(it)

Multiplicative property: if X and Y are independent,

    φ_{X+Y}(t) = φ_X(t) φ_Y(t)


Cumulants
The log of the characteristic function is used to generate
cumulants κ_n:

    log(φ_X(t)) = Σ_{n=1}^∞ κ_n (it)^n / n!

Cumulants are related to moments:

    κ_1 = E[X]
    κ_2 = E[X²] − E[X]²
    κ_3 = 2E[X]³ − 3E[X]E[X²] + E[X³]
    . . .

Their relationship to centred moments is simpler


Section V

Limit theorems


Outline

Motivation

Limit theorems


The Normal distribution: utility

Central Limit Theorem: under very general conditions, the


distribution of the sum of a large number of mutually
independent rvs may be approximated by a Normal
distribution

This is a very important result that allows one to use the Normal
distribution in a very large variety of situations

Confidence intervals, hypothesis tests, and regression models


(describing relationships between variables) are some of the
key elements of the theory of Statistics that largely rely on the
normal distribution


Chebyshev's inequality

Let X_1, . . . , X_n be a sequence of iid rvs with E(X_i) = μ and
Var(X_i) = σ². Let

    X̄_n = (1/n) Σ_{i=1}^n X_i

Then for any ε > 0, Chebyshev's inequality states that

    P( |X̄_n − μ| > ε ) ≤ Var(X̄_n) / ε²


The Law of Large Numbers Theorem

Let X_1, . . . , X_n be a sequence of iid rvs with E(X_i) = μ and
Var(X_i) = σ². Let

    X̄_n = (1/n) Σ_{i=1}^n X_i

Then for any ε > 0,

    P( |X̄_n − μ| > ε ) → 0

as n → ∞


Definition: convergence in distribution

Let X_1, X_2, . . . be a sequence of rvs with cdfs F_1, F_2, . . . and let X
be a rv with cdf F. We say that X_n converges in distribution to X
if

    lim_{n→∞} F_n(x) = F(x)

at every point x where F is continuous
n


The Central Limit Theorem

Let X_1, X_2, . . . be a sequence of iid rvs having mean 0 and
variance σ², and common distribution function F and MGF M
defined in a neighbourhood of 0. Let

    S_n = Σ_{i=1}^n X_i

Then for −∞ < x < +∞,

    lim_{n→∞} P( S_n / (σ√n) ≤ x ) = Φ(x)

where Φ(x) denotes the cdf of the Standard Normal distribution


Examples:

Let X_1, . . . , X_n be iid N(μ, σ²); then

    Z_n = (S_n − nμ) / (σ√n) = ( Σ_{i=1}^n X_i − nμ ) / (σ√n) ∼ N(0, 1)

Let X_1, . . . , X_12 (i.e. n = 12) be iid U(0, 1); then

    S_n − 6 = Σ_{i=1}^{12} X_i − 6 ∼ N(0, 1) (approximately)

(since X ∼ U(0, 1) has E(X) = 1/2 and Var(X) = 1/12)
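The sum-of-12-uniforms example can be checked by simulation (a sketch, not part of the notes; the seed and sample size are arbitrary):

```python
import random
import statistics

# Sketch: the sum of 12 iid U(0,1) draws, minus 6, is approximately N(0, 1),
# a classical quick normal generator.

random.seed(42)
samples = [sum(random.random() for _ in range(12)) - 6
           for _ in range(100000)]

print(round(statistics.mean(samples), 2))    # close to 0
print(round(statistics.stdev(samples), 2))   # close to 1
```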


Section VI

Statistical Inference


Outline

Exploratory Analysis and Descriptive statistics

Sampling

Exploratory data analysis

Exploratory Analysis and Descriptive statistics

Probability? Statistics?


Statistics!

Moneyball (2011)

Sampling

Population parameters
Statistical inference consists in estimating population features

Ex: population of N = 393 hospitals in a given country, for
which the mean number of discharges is

    μ = (1/N) Σ_{i=1}^N x_i = 814.6

The population total (total number of discharges) is

    τ = Σ_{i=1}^N x_i = Nμ = 320,138

Population variance on number of discharges per hospital:

    σ² = (1/N) Σ_{i=1}^N (x_i − μ)² = (1/N) Σ_{i=1}^N x_i² − μ²

Simple Random Sampling (SRS)

A sample of size n is picked to represent a population of size N

Most elementary form of sampling is SRS

Each sample of size n has the same probability of occurrence

There are (N choose n) such samples (without replacement)

Each item gets picked at most once

Can be performed using a (pseudo-)random generator, balls in


urn, etc.


A sample of size n is picked to represent a population of size N

The sample mean number of discharges approximates μ with

    X̄ = (1/n) Σ_{i=1}^n X_i

An estimate of the population total (number of discharges) is
then

    τ̂ = N X̄

The population variance on the number of discharges per
hospital is estimated with

    s² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)²
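These estimators can be sketched on a synthetic population (not the hospital data, whose individual values are not given in the notes; the values below are made up for illustration):

```python
import random
import statistics

# Sketch: simple random sampling, with the sample mean estimating mu
# and N * X-bar estimating the population total tau. Population values
# here are synthetic, for illustration only.

random.seed(1)
N = 393
population = [random.randint(100, 1500) for _ in range(N)]
mu  = statistics.mean(population)
tau = sum(population)

n = 50
sample = random.sample(population, n)    # SRS without replacement
xbar = statistics.mean(sample)

print(round(mu, 1), round(xbar, 1))      # xbar should be near mu
print(tau, round(N * xbar))              # N * xbar estimates the total
```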


Stratified Random Sampling

For a variety of reasons a population may be partitioned into


groups (strata)

These strata can then be sampled independently


Ex:
human populations organised in geographical areas

Irish pupils stratified by school

shipments of goods stratified by carrier size (large, medium


and small)

A final sample is obtained by combining the results from the


strata


Consider L strata, each of size N_l, l = 1, . . . , L (before sampling):

The sample mean for each stratum is

    X̄_l = (1/n_l) Σ_{i=1}^{n_l} X_il   for each l

and the overall population mean estimate is

    X̄ = Σ_{l=1}^L (N_l / N) X̄_l = Σ_{l=1}^L W_l X̄_l

The estimate of σ_l² is

    s_l² = (1/(n_l − 1)) Σ_{i=1}^{n_l} (X_il − X̄_l)²

Cluster Sampling

In stratified random sampling, all strata get sampled from

This requirement may be unrealistic in some cases

Cluster sampling consists in grouping the population into


clusters

SRS is then applied by selecting whole clusters

Usually produces greater sampling error than random or


stratified sampling

But loss of precision may be outweighed by the efficiency of


data collection


Systematic Sampling

Sampling at regular intervals

Ex: select every 10th member of a list

Requires the sequence of members to be random (i.e. not


sorted) so as to avoid bias


Exploratory Data Analysis


The durations of 272 eruptions of the Old Faithful geyser at
Yellowstone National Park, Wyoming, USA, were recorded from
1st to 15th Aug 1985 (in seconds)


Exploratory Data Analysis


The durations of 272 eruptions of the Old Faithful geyser at
Yellowstone National Park, Wyoming, USA, were recorded
from 1st to 15th Aug 1985 (in seconds)

The variety in the lengths of the eruptions indicates that


randomness is involved
By exploring the dataset we might learn about this
randomness:
which durations are more likely to occur?

is there something like the typical duration of an eruption?

do the durations vary symmetrically around the center of the


dataset?


Yellowstone data: duration (seconds) of 272 eruptions


216 108 200 137 272 173 282 216 117 261 110 235 252 105 282
130 105 288 96 255 108 105 207 184 272 216 118 245 231 266
258 268 202 242 230 121 112 290 110 287 261 113 274 105 272
199 230 126 278 120 288 283 110 290 104 293 223 100 274 259
134 270 105 288 109 264 250 282 124 282 242 118 270 240 119
304 121 274 233 216 248 260 246 158 244 296 237 271 130 240
132 260 112 289 110 258 280 225 [......] 200 250 260 270 145 240
250 113 275 255 226 122 266 245 110 265 131 288 110 288 246
238 254 210 262 135 280 126 261 248 112 276 107 262 231 116
270 143 282 112 230 205 254 144 288 120 249 112 256 105 269
240 247 245 256 235 273 245 145 251 133 267 113 111 257 237
140 249 141 296 174 275 230 125 262 128 261 132 267 214 270
249 229 235 267 120 257 286 272 111 255 119 135 285 247 129
265 109 268 (Source: W. Hardle. Smoothing techniques with
implementation in S. 1991; Table 3, page 201. Springer New York)

Exploratory Data Analysis

In order to retrieve this type of information, just listing the


observed durations does not help much

Somehow we must summarize the observed data

We could start by computing the mean of the data, which is


209.3 for the Old Faithful data

However, this is a poor summary of the dataset, because there


is a lot more information in the observed durations

How do we get hold of this?


Ordered durations
96 100 102 104 105 105 105 105 105 105 107 107 108 108 108 108 109
109 109 110 110 110 110 110 110 110 111 111 112 112 112 112 112 112
112 112 113 113 113 113 115 115 116 116 117 118 118 118 119 119 119
120 120 120 120 121 121 121 122 122 124 125 125 126 126 126 128 129
130 130 131 132 132 132 133 134 134 135 135 136 137 138 139 140 141
142 143 144 144 145 145 149 157 158 168 173 174 184 199 200 200 202
205 207 210 210 214 214 216 216 216 216 221 223 224 225 226 226 229
230 230 230 230 230 231 231 233 235 235 235 237 237 238 238 240 240
240 240 240 240 242 242 243 244 244 245 245 245 245 [.....] 274 274
275 275 275 275 276 276 276 276 277 278 278 278 279 280 280 282 282
282 282 282 282 283 284 285 286 287 288 288 288 288 288 288 289 289
290 290 291 293 294 294 296 296 296 300 302 304 306
Middle elements (136th and 137th) = 240, much closer to max
(306) than to min (96) - implies asymmetry

Numerical summaries

A range of descriptive statistics can be used to build a


numerical summary of a sample

This is useful in particular to focus on specific features

Depending on the nature and characteristics of the sampled


data, some statistics are more adequate than others

Their names usually carry the terms "sample" or "empirical"

Some require the ordered sample

    X_[1], . . . , X_[n]

IPS 156
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Sample {x1 , . . . , xn } = empirical information

Each observation xi has empirical probability 1/n

Under usual regularity conditions, large sample theory


highlights probabilistic notions

Ex: sample mean


X̄ = (1/n) ∑_{i=1}^n X_i = ∑_{i=1}^n (1/n) X_i

E(X) = ∑_x x p(x)   (discrete)

E(X) = ∫ x f(x) dx   (continuous)

IPS 157
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Common numerical summaries


Median:

more robust than sample mean

is the value xM such that P(X < xM ) = 0.5

from an ordered sample X[1] , . . . , X[n] : median = X[(n+1)/2]

Ex: median of (1, 3, 4, 7, 9) is 4

Ex: median of (1, 3, 4, 7, 100) is still 4

Ex: median of (1, 3, 4, 7, 9, 10) is 4+0.5(7-4) = 5.5


(interpolation)
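These rules match what R's median() returns (a quick check, in R as used elsewhere in these notes):

```r
x.odd  <- c(1, 3, 4, 7, 9)
x.even <- c(1, 3, 4, 7, 9, 10)
median(x.odd)   # middle order statistic X[(n+1)/2]: 4
median(x.even)  # mean of the two middle values, (4 + 7)/2 = 5.5
```

For even n, R averages the two middle order statistics, which agrees with the interpolation 4 + 0.5(7 − 4) above.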

IPS 158
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Given a sample X = {X1 , . . . , Xn } , most commonly used


statistics are:
For centrality: the sample mean X̄ and/or median X[(n+1)/2]

For shape: the empirical quartiles

q_n(0.25) = Q1(X) = X[(n+1)/4]

q_n(0.75) = Q3(X) = X[3(n+1)/4]

For variability: the sample standard deviation s_n or variance

s_n² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄_n)²

or the inter-quartile range IQR(X) = Q3(X) − Q1(X)

IPS 159
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Numerical summaries

A typical summary of a sample X = {X1 , . . . , Xn } would


include:
min(X ) (or alternative)

Q1 (X )

median(X )

Q3 (X )

max(X ) (or alternative)

This summary matches that provided by a typical boxplot
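In R this five-number summary is produced directly by summary() or fivenum(); a minimal sketch on simulated data (the rnorm() sample is illustrative only):

```r
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)  # simulated sample, for illustration
summary(x)    # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
fivenum(x)    # Tukey's five-number summary, as used by boxplot()
IQR(x)        # inter-quartile range Q3 - Q1
```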

IPS 160
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Robust numerical summaries


Often with real datasets one has to deal with outlying values

Typically, outliers are defined as values that stand outside

(Q1(X) − 1.5 IQR, Q3(X) + 1.5 IQR)

They may affect summaries significantly and prompt the use

of robust statistics:
use the median rather than the sample mean

use Q1(X) − 1.5 IQR or q_n(0.02) rather than min(X)

use Q3(X) + 1.5 IQR or q_n(0.98) rather than max(X)

A boxplot should represent these outliers

IPS 161
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Ex: sepal width on the Iris data (50 flowers from each of 3 species
of iris)
[Figure: boxplots of sepal width by species, "Iris data (2nd component)", with values ranging roughly from 2.0 to 4.0]

Source: Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The


New S Language. Wadsworth & Brooks/Cole
IPS 162
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Histogram of the Old Faithful data


Histogram reveals the asymmetry of the dataset and the fact that
the elements accumulate somewhere near 120 and 270, which was
not clear from the list of values

IPS 163
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Drawing a histogram
Whenever feasible let the software do it!

Know the theory behind the histogram, so as to interpret it


and modify its calibration
Essential features:
1 Total area under graph taken to represent 1

2 Rectangles put on bin-widths

3 m = 1 + 3.3 log10(n) as the number of bins (Sturges' formula), or

4 b = 3.49 s n^(−1/3) as the bin-width (Scott's rule), where s is the

sample standard deviation

Also refer to the Normal Reference Curve method
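Both rules are built into R's histogram machinery; a sketch on simulated data (hist() uses Sturges' formula by default):

```r
set.seed(1)
x <- rnorm(200)
nclass.Sturges(x)          # bins from 1 + 3.3 log10(n), rounded up
nclass.scott(x)            # bins implied by bin-width 3.49 s n^(-1/3)
hist(x, breaks = "Scott")  # let the software do it
```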

IPS 164
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Results of different bin-widths

IPS 165
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Interfailure times data (in CPU seconds)

30 113 81 115 9 2 91 112 15 138 50 77 24 108 88 670 120 26 114


325 55 242 68 422 180 10 1146 600 15 36 4 0 8 227 65 176 58
457 300 97 263 452 255 197 193 6 79 816 1351 148 21 233 134
357 193 236 31 369 748 0 232 330 365 1222 543 10 16 529 379 44
129 810 290 300 529 281 160 828 1011 445 296 1755 1064 1783
860 983 707 33 868 724 2323 2930 1461 843 12 261 1800 865
1435 30 143 108 0 3110 1247 943 700 875 245 729 1897 447 386
446 122 990 948 1082 22 75 482 5509 100 10 1071 371 790 6150
3321 1045 648 5485 1160 1864 4116
Source: J.D. Musa, A. Iannino, and K. Okumoto. Software
reliability: measurement, prediction, application. McGraw-Hill,
New York, 1987; Table on page 305

IPS 166
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

The empirical distribution function


Cumulative representation of the data

Empirical cumulative distribution function of the data Fn is


F_n(x) = (1/n) × (number of elements in the dataset ≤ x)
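In R, F_n is computed by ecdf(), which returns a step function; a sketch on a few of the ordered durations:

```r
x  <- c(96, 100, 102, 104, 105)  # a few durations, for illustration
Fn <- ecdf(x)                    # step function: fraction of data <= x
Fn(104)                          # 4 of the 5 values are <= 104: 0.8
plot(Fn, main = "Empirical distribution function")
```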

IPS 167
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Boxplot
Another way of summarising the underlying data distribution

Given a sample, a boxplot indicates quartiles and outliers:


[Figure: boxplot of the Faithful data, values roughly between 50 and 90, showing the quartile box, whiskers and outliers]

IPS 168
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Scatterplot

To investigate the relationship between two or more variables

Given x and y , the dataset consists of pairs of observations:


(x1 , y1 ), (x2 , y2 ), . . ., (xn , yn ) (bivariate dataset)

Does y depend on x? If so, can we describe their relationship?

A first step is to plot the points (xi , yi ) for i = 1, 2, . . . , n

This plot is called a scatterplot

IPS 169
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

Scatterplot
Example: daily readings of air quality values in NYC,
1st May - 30th Sept 1973 (R dataset airquality)

Ozone: Mean ozone in parts per billion from 1300 to 1500


hours at Roosevelt Island

Solar.R: Solar radiation in Langleys in the frequency band


40007700 Angstroms from 0800 to 1200 hours at Central Park

Wind: Average wind speed in miles per hour at 0700 and


1000 hours at LaGuardia Airport

Temp: Maximum daily temperature in degrees Fahrenheit at


La Guardia Airport.

IPS 170
ST1051-ST3905-ST5005-ST6030
Statistical Inference
Exploratory data analysis

[Figure: two scatterplots of the NYC air quality data, May-Sep 1973: Temperature (degrees F) vs Ozone (parts per billion), and Wind (miles per hour) vs Ozone (parts per billion)]

IPS 171
ST1051-ST3905-ST5005-ST6030
Estimation

Section VII

Estimation

IPS 172
ST1051-ST3905-ST5005-ST6030
Estimation

Outline

Statistical Inference

Estimation

Confidence intervals

Linear regression

IPS 173
ST1051-ST3905-ST5005-ST6030
Estimation
Statistical Inference

Statistical inference
Detection:
Discrete probabilities (most of the time)

Hypothesis testing: minimise probability of incorrect decision

Estimation:
Discrete or continuous probabilities

Classical statistics: estimate a real number, not a r.v.


(e.g. the mass of an electron)

Bayesian inference: estimate a r.v. (with its distribution)

Always minimise (some form of) estimation error

The central feature is always the information (data).

Statistical inference techniques can easily be applied very badly.


IPS 174
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

Estimation, estimators and estimates

Why estimation?

An estimate t is a realization of a random variable T ...

We cannot say anything with certainty about which of the


estimators is closer to the parameters of interest

When is one estimate better than another?

Does there exist a best possible estimate?

How likely is it that an estimate lies within a given distance


from the parameter?

IPS 175
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

Estimators
Let t = h(x1 , x2 , . . . , xn ) be an estimate based on the dataset
x1 , x2 , . . . , xn

Then t is a realization of the random variable


T = h(X1 , X2 , . . . , Xn )

The random variable T is called an estimator

The word estimator refers to the method or device for


estimation

This is distinguished from estimate, which refers to the actual


value computed from a dataset

Note that estimators are special cases of sample statistics

IPS 176
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

An estimator is a statistic - or an appropriate function of the


sample observations that gives us an estimated value for the
unknown parameter

Ex: sample mean X̄ and sample variance s² are estimators for

the mean µ and variance σ² of the Normal distribution
N(µ, σ²)

There can be more than one estimator feasible

How to choose one of them or select a best one?

We need criteria to determine what is desirable

The two most common criteria used are (a) Unbiasedness (b)
Minimum Variance

IPS 177
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

Unbiasedness and minimum variance


Unbiasedness: If T is an estimator for θ, then T is called
unbiased if
E[T] = θ

Ex: for a Normal sample from N(µ, σ²), E[X̄] = µ

Minimum Variance: If an estimator T of achieves


minimum variance, then under regular conditions T achieves
the best possible estimation accuracy

The sample mean often turns out to be a Minimum Variance


unbiased estimator for its expected value

For N(µ, σ²), X̄ is a Minimum Variance unbiased estimator

for µ

Harder to check, but feasible


IPS 178
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

Constructing good estimators: Maximum Likelihood

Let
L(θ) = f(x1 , x2 , . . . , xn ; θ),
be the joint pdf of X1 , X2 , . . . , Xn

For a given set of observations (x1 , x2 , . . . , xn ), a value θ̂

at which L(θ) is maximum is called a maximum likelihood
estimate (MLE) of θ

That is, θ̂ is a value of θ that satisfies

f(x1 , x2 , . . . , xn ; θ̂) = max_θ f(x1 , x2 , . . . , xn ; θ)

IPS 179
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

Maximum Likelihood Estimator

MLE:
f(x1 , x2 , . . . , xn ; θ̂) = max_θ f(x1 , x2 , . . . , xn ; θ)

What makes this estimator attractive? Under very general

conditions on the density or pmf:
θ̂_ML converges to the true θ in probability as sample size
increases

as sample size increases, √n(θ̂ − θ) converges to a Normal
distribution with mean 0

IPS 180
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

MLE: interfailure data example


Ex: the sample is a realization of random variables
X1 , X2 , . . . , Xn , with n = 135, and Xi ∼ EXP(θ)

Let the sample be denoted by x1 , . . . , xn

The pdf is

f(x, θ) = (1/θ) e^(−x/θ),   x, θ > 0

Joint pdf is

L(θ) = ∏_i f(xi , θ) = ∏_{i=1}^n (1/θ) e^(−xi/θ) = θ^(−n) e^(−∑_{i=1}^n xi / θ)
IPS 181
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

Note that L(θ) is a strictly positive function of θ

Therefore we can write

ln L(θ) = −n ln θ − (∑_{i=1}^n xi) / θ

Note that ln L(θ) is a differentiable function of θ and

d ln L(θ)/dθ = −n/θ + (∑_{i=1}^n xi)/θ² = 0   ⟹   θ̂ = x̄

Further,

d² ln L(θ)/dθ² = n/θ² − 2(∑_{i=1}^n xi)/θ³ < 0 at θ = θ̂

which implies that θ̂ is a local maximum of the ML function

IPS 182
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

X̄ is an unbiased estimator for θ, because E[X̄] = E[X1] = θ

The estimate is θ̂ = 656.8815

It can be shown that X is also the Minimum Variance one


among all possible Unbiased estimators [But beyond our scope
now]
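Since the MLE is θ̂ = x̄, computing it in R is immediate; a sketch using only the first ten interfailure times from the listing above (the full 135-value sample gives 656.8815):

```r
x <- c(30, 113, 81, 115, 9, 2, 91, 112, 15, 138)  # first ten values only
theta.hat <- mean(x)  # MLE of the exponential mean theta
theta.hat             # 70.6 for this subset
```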

IPS 183
ST1051-ST3905-ST5005-ST6030
Estimation
Estimation

How reliable is our estimator?


So we have a supposedly good estimator, X

This is still a rv and therefore will vary in an unpredictable


manner from one sample to another if we repeat the
experiment

We can find a range of values within which we can claim that


the true value lies with a high probability

This is called a Confidence Interval

To find a confidence interval, we need to know the probability


distribution of the estimator statistic

In our example, we need to find the pdf for X

IPS 184
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Confidence Intervals
A confidence interval is an interval in which we are very
confident the population parameter of interest lies

The level of confidence is stated and is frequently 95%

When we estimate a statistic about a population (e.g. a


mean), we calculate a single estimate, known as a point
estimate

This makes no use or mention of the sampling error

Knowledge of the standard error of the estimate will allow us


to give a measure of the sampling error

The standard error (and the sampling distribution) is used to


calculate a confidence interval
IPS 185
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Confidence Intervals for the sample mean


The confidence interval for the mean X̄ is obtained based on
the Normal distribution N(µ, σ²)

Given a sample of size n and a critical value Z, this CI is

X̄ ± Z σ/√n
In practice the standard deviation is not known, and one
usually uses its sample estimate instead

If a confidence level of say 95% is set, then the significance


level is 100%-95%=5%

The critical value Z sets the level of confidence

Z is the percentile of the Standard Normal distribution


yielding a rhs area of half the significance level
IPS 186
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Ex: for a 95% CI, one needs to remove the most extreme
2.5% from each tail of the distribution

i.e. one truncates the Standard Normal distribution beyond

(−Z, Z) = (−1.96, 1.96)

P(|Z| > 1.96) = 0.05

[Figure: standard Normal density, with the 2.5% tails beyond ±1.96 shaded]

Z = (X̄ − µ) / (σ/√n)

IPS 187
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Ex: a random sample of 50 transactions was selected from a


travel agency

The mean value was 732.16 and the standard deviation was
83.14

95% CI for the mean transaction value is given by

X̄ ± Z s/√n = 732.16 ± 1.96 × 83.14/√50 = 732.16 ± 23.05
= (709.11, 755.21)

We can be 95% confident that the true (population) mean


transaction value lies between 709.11 and 755.21
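The same interval in R (with the rounded critical value z = 1.96, as on the slide):

```r
xbar <- 732.16; s <- 83.14; n <- 50
me <- 1.96 * s / sqrt(n)         # margin of error, about 23.05
round(xbar + c(-1, 1) * me, 2)   # (709.11, 755.21)
```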

IPS 188
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Confidence Intervals

In previous example, a 99% CI will be wider: because of the


higher confidence level, more values must be included

Generally speaking 2 parameters control the width of a CI: the


sample size and the level of confidence

CI will be narrower when either increasing n or decreasing Z

It is usually preferred to increase the sample size

If not possible one must then decrease the confidence level

IPS 189
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Confidence Intervals for a proportion

Consider an estimated proportion p̂ of a certain

sub-population

Under usual conditions the maximum likelihood estimator for

a proportion given a sample of n observations is p̂ = n+/n,
where n+ is the number of observations in the sub-population

The associated confidence interval is given by

p̂ ± Z √( p̂(1 − p̂) / n )

IPS 190
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Ex: from a random sample of 250 people in a certain electoral


ward, 65 were in favour of a proposed amendment to the
constitution

Find a 99% CI for the proportion of all people in favour...


We have p̂ = 65/250 = 0.26, Z = 2.575 and

p̂ ± Z √( p̂(1 − p̂)/n ) = 0.26 ± 2.575 √(0.26 × 0.74 / 250) = 0.26 ± 0.0714

This means that we can be 99% confident that the true
(population) proportion of people in favour of a proposed
amendment to the constitution is between 18.86% and
33.14%
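Checking in R (2.575 is the rounded 99.5th percentile of the standard Normal, qnorm(0.995) ≈ 2.5758):

```r
phat <- 65 / 250                             # 0.26
me <- 2.575 * sqrt(phat * (1 - phat) / 250)  # about 0.0714
round(phat + c(-1, 1) * me, 4)               # (0.1886, 0.3314)
```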

IPS 191
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Sample size determination


When we estimate a population feature using a statistic, we
do not know in advance how wide the CI will be

If we use too small a sample size, the confidence interval may


be too wide to be meaningful

If we use too large a sample size, the confidence interval may


be unnecessarily narrow, meaning valuable resources were
wasted in the process

If we have some previous knowledge of the population


variability, we can calculate the sample size required to
estimate the population feature to within a stated range
(precision) with a stated level of confidence

IPS 192
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Sample size for a sample mean

Given a population variance σ² and a critical value Z, the

sample size required to estimate a mean to within an
allowable error ε is

n = Z² σ² / ε²

Usually, the population standard deviation will not be known
but an estimate, say s, may be available e.g. from a pilot
study or from similar previous studies

We can substitute s into the formula

The accuracy of our calculated sample size depends on the


accuracy of the previous estimate of the standard deviation

IPS 193
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Ex: from a pilot study, the weekly bank charges to private


customers was found to have a standard deviation of 10

How large a sample would be needed to estimate the


population mean bank charge to within 1.50 with 95%
confidence?

We have Z = 1.96, s = 10, ε = 1.50 so

n = (1.96 × 10 / 1.50)² = 170.74

i.e. we must use n = 171

Always round up!!!
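Rounding up is ceiling() in R (values from the example):

```r
z <- 1.96; s <- 10; eps <- 1.50
n <- (z * s / eps)^2   # 170.74 approximately
ceiling(n)             # 171: always round the sample size up
```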

IPS 194
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Sample size for a proportion

Consider an estimated proportion p of a certain


sub-population

The sample size required to estimate this proportion to within

an allowable error ε and with 100(1 − α)% confidence is
determined by

n = Z² p̂(1 − p̂) / ε²

The allowable error ε is expressed as a decimal, i.e. ε ∈ (0, 1)

IPS 195
ST1051-ST3905-ST5005-ST6030
Estimation
Confidence intervals

Ex: after performing a study, we judge the confidence interval

about an estimated proportion p̂ = 0.20 to be too wide

We wish to repeat the study so that we estimate the

proportion to within 4 percentage points with 95%
confidence...

p̂ = 0.20, Z = 1.96, ε = 0.04

The sample size required to do so is

n = 1.96² × 0.20 × 0.80 / 0.04² = 384.16

i.e. we must take n = 385

Always round up!!!
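The same calculation in R:

```r
z <- 1.96; phat <- 0.20; eps <- 0.04
n <- z^2 * phat * (1 - phat) / eps^2   # 384.16
ceiling(n)                             # 385
```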

IPS 196
ST1051-ST3905-ST5005-ST6030
Estimation
Linear regression

Regression
Let Y be a random variable and x a deterministic variable
(that is, non-random)

Given a random sample (x1 , Y1 ),. . ., (xn , Yn ) we want to find


a mathematical relationship that expresses Y in terms of x

The variable x is called the independent variable and Y is


called the dependent or response variable

In the case of simple linear regression, the model that we


propose is of the form

Y = β0 + β1 x + ε

where ε is an error term


IPS 197
ST1051-ST3905-ST5005-ST6030
Estimation
Linear regression

We assume that each observation Yi of Y satisfies

Yi = β0 + β1 xi + εi

where εi ∼ N(0, σ²) for i = 1, . . . , n, and that the random

variables εi are independent

Note that we take for granted that the variance of εi is the

same for all values of i
Note also that
the Yi 's are observed

the xi 's are known

the εi 's are unobservable

IPS 198
ST1051-ST3905-ST5005-ST6030
Estimation
Linear regression

Regression and Least Squares

For this model, the best estimators of the parameters β0 and

β1, that is, the minimum variance unbiased estimators of β0
and β1, are obtained using the method of least squares

We define the sum

SS = ∑_{i=1}^n εi² = ∑_{i=1}^n (Yi − β0 − β1 xi)²

The estimators β̂0 and β̂1 of β0 and β1 by the method of least

squares are the values of β0 and β1 that minimize the sum SS

IPS 199
ST1051-ST3905-ST5005-ST6030
Estimation
Linear regression

We set the normal equations

∂SS/∂β0 = −2 ∑_{i=1}^n (Yi − β0 − β1 xi) = 0

∂SS/∂β1 = −2 ∑_{i=1}^n xi (Yi − β0 − β1 xi) = 0

The solutions of these equations are

β̂1 = ∑_{i=1}^n (xi − x̄)(Yi − Ȳ) / ∑_{i=1}^n (xi − x̄)²
   = ( ∑_{i=1}^n xi Yi − n x̄ Ȳ ) / ( ∑_{i=1}^n xi² − n x̄² )

β̂0 = Ȳ − β̂1 x̄

IPS 200
ST1051-ST3905-ST5005-ST6030
Estimation
Linear regression

Example: tensile strength

We want to determine how the tensile strength of a certain


alloy depends on the percentage of zinc it contains. We have
the following data:

% of zinc 4.7 4.8 4.9 5.0 5.1


Tensile strength 1.2 1.4 1.5 1.5 1.7

Consider the simple linear regression model: Y = 0 +1 x +,


where x is the percentage of zinc and Y is the tensile strength

IPS 201
ST1051-ST3905-ST5005-ST6030
Estimation
Linear regression

x̄ = 4.9, ȳ = 1.46, ∑_{i=1}^5 xi² = 120.15 and ∑_{i=1}^5 xi yi = 35.88

Then,

β̂1 = ( ∑_{i=1}^5 xi yi − 5 x̄ ȳ ) / ( ∑_{i=1}^5 xi² − 5 x̄² )
   = (35.88 − 5(4.9)(1.46)) / (120.15 − 5(4.9)²) = 1.1

and
β̂0 = ȳ − β̂1 x̄ = 1.46 − (1.1)(4.9) = −3.93

Thus the prediction equation is given by ŷ = −3.93 + 1.1x
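The same fit with R's lm(), using the data from the table above:

```r
zinc     <- c(4.7, 4.8, 4.9, 5.0, 5.1)
strength <- c(1.2, 1.4, 1.5, 1.5, 1.7)
fit <- lm(strength ~ zinc)   # least-squares fit of Y = b0 + b1 x
round(coef(fit), 2)          # intercept -3.93, slope 1.10
```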

IPS 202
ST1051-ST3905-ST5005-ST6030
Estimation
Linear regression

Example: air quality (R dataset)


[Figure: scatterplots of Temperature (degrees F) and Wind (miles per hour) against Ozone (parts per billion), NYC, May-Sep 1973, with fitted regression lines]

Temp ≈ 69.4 + 0.20 Ozone   (ρ = 0.698)

Wind ≈ 12.6 − 0.07 Ozone   (ρ = −0.602)


IPS 203
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing

Section VIII

Hypothesis Testing

IPS 204
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing

Outline

Concepts in hypothesis testing

One-sample, one-sided tests of the population mean

One-sample, one-sided tests of the population proportion

One-sample, two-sided tests

Two-sample tests

Goodness-of-fit tests

Testing for significance in linear regression

Summary

IPS 205
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

Hypothesis Testing
We know that X is an unbiased estimator

We further have a Confidence Interval at 95% confidence level


for the unknown parameter

Can we check whether the unknown parameter actually takes


values that are not dependent on the sample?

[Note that the range of values given in the Confidence Interval


is dependent on the sample]

i.e. instead of deriving ranges of values (even if in probability)


from the sample, can we start out by making assumptions
about the range of possible values and then test the
assumption based on the sample?
IPS 206
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

Hypothesis Testing: Interfailure example

Ex: we have x̄ = 656.8815 for unknown parameter θ

Can we make a hypothesis that the true value of the unknown

θ is equal to 700?

Well we definitely can!

Having made that hypothesis, we need to test it based on the

sample of observations

NB: we cannot assume that, say, θ < 500 as a statement about
the rate, since θ is not the rate of the Exponential distribution
(it is its mean)

IPS 207
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

Forming Hypotheses

Typically, hypotheses are expressed as restriction of possible


values for the true unknown parameter

i.e. they represent a partition of the parameter space

Ex: parameter space is Θ = R+ ; i.e. θ ∈ R+

Null hypothesis H0 : θ = θ0 where θ0 = 700 is to be tested

We can see that H0 represents a proper subset of Θ

H0 is assumed true until data indicate otherwise

Typically, H0 assumes no effect

IPS 208
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

We also need to formally define an alternative hypothesis HA

HA typically states a significant effect was observed

The set represented by HA is not allowed any overlap with the


set represented by H0

HA contains values that would lead to reject the null H0

Ex: given H0 : θ = θ0 , a reasonable HA is HA : θ = θ1 = 600

IPS 209
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

Some standard forms of Rejection Regions


Forms of hypothesis                            Reject if
H0 : θ = θ0 vs Ha : θ = θ1 , θ0 < θ1           X̄ > k
H0 : θ = θ0 vs Ha : θ = θ1 , θ0 > θ1           X̄ < k
H0 : θ = θ0 vs Ha : θ ≠ θ0                     X̄ < k1 or X̄ > k2
H0 : θ ≤ θ0 vs Ha : θ > θ0                     X̄ > k
H0 : θ = θ0 vs Ha : θ > θ0                     X̄ > k
H0 : θ = θ0 vs Ha : θ < θ0                     X̄ < k

Note that
P[N(0, 1) < −1.645] = P[N(0, 1) > 1.645] = 0.05
P[N(0, 1) < −1.96] = P[N(0, 1) > 1.96] = 0.025
IPS 210
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

Errors in detection
Recall: one seeks to retain or reject a null hypothesis H0 on the
basis of evidence. Let us denote H1 the alternative hypothesis.

H0 is true H1 is true
H0 is accepted Correct decision Type II error
H1 is accepted Type I error Correct decision

The Null hypothesis can never be proven

Type I error occurs when H0 is true but rejected

P(Type I error) = significance level of the test

P(Type II error) = false negative rate


IPS 211
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

Test statistic
We test the hypotheses based on the sample

To do this, we can only use functions of sample values that do


not involve the unknown

Hence the use of a statistic... but how to pick one?


Idea: if
the statistic in question, say T, is an unbiased estimator for θ

and the underlying model distribution has finite variance so

that the weak Law of Large Numbers applies
then for large samples T will be quite close in probability to the
true value θ

T will then reflect the behaviour of the unknown θ

If T increases we'd expect θ to increase and vice-versa


IPS 212
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

p-value
The test procedure becomes: Reject H0 if T > tc for some
unknown but computable tc

If T < tc , we will say that based on this sample we fail to


reject H0 . Now how to decide on tc ?
The p-value is the probability of obtaining a value of the test
statistic
at least as extreme as the one computed from the sample data

under the assumption that the null hypothesis is true

The smaller the p-value, the less likely H0 is to be true and


therefore the more evidence there is against it

Typically, reject H0 if p < 0.05 (i.e. 5%)

If the decision is to reject H0 , the results are termed

statistically significant
IPS 213
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
Concepts in hypothesis testing

Finding the p-value


In the example, to set up the test:
use an unbiased estimator, e.g. T = X̄

fix the Type I error (i.e. significance level) e.g. at 5%

We need to find tc such that

Type I Error = P(Reject H0 | H0 is True)

= P(T > tc | θ = 700)
= 0.05

i.e.
P(X̄ > tc | θ = 700) = 0.05

We need to know the distribution of the test statistic...


IPS 214
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population mean

One-sided tests of the population mean


Let us consider the case where the hypothesized value µ0 is
an upper bound on the true population mean µ
H0 : µ ≤ µ0
Where the population variance σ² is known, one defines
the z-test statistic for a sample of size n as

z = (x̄ − µ0) / (σ/√n)

(for a Normal population or n > 30)

Then H0 is to be rejected if z ≥ z_α , where z_α is the

100(1 − α)th percentile of the Normal distribution (e.g. P95 = 1.645)

NB: this example implements an upper-tail test. In a lower-

tail test, H0 : µ ≥ µ0 is rejected when z ≤ −z_α .
IPS 215
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population mean

One-sided tests of the population mean

[Figure: standard Normal density with the upper 5% tail beyond 1.645 shaded; P(Z > 1.645 | H0) = 0.05]

z = (x̄ − µ0) / (σ/√n)

IPS 216
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population mean

z-test in R

Given values n, xbar and mu0, the one-sided z-test may be


carried out by comparing the test statistic
z = (xbar-mu0)/(sigma/sqrt(n))
with the critical value set e.g. for alpha=.05:
z.alpha = qnorm(1-alpha)

Alternatively, to obtain a p-value, one may instead instruct:


pval=pnorm(z, lower.tail=FALSE)
pval > alpha

IPS 217
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population mean

Case when is unknown: the t-test

Where the population variance σ² is unknown, one uses the

t-test instead of the z-test. The t-test statistic for a sample
of size n is defined using the sample standard deviation s as

t = (x̄ − µ0) / (s/√n),   df = n − 1

(for a Normal population or n > 30)

Then H0 : µ ≤ µ0 is to be rejected if t ≥ t_α , where t_α is the

100(1 − α)th percentile of the Student t-distribution with n − 1 dfs

NB: this example implements an upper-tail test. In a lower-

tail test, H0 : µ ≥ µ0 is rejected when t ≤ −t_α .

IPS 218
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population mean

If X1 , X2 , . . . , Xn is a sample from N(µ, σ²), then:

X̄ ∼ N(µ, σ²/n)

∑_{i=1}^n (Xi − X̄)² / σ² ∼ χ²(n − 1)

(X̄ − µ) / (s/√n) ∼ t(n − 1)

These properties are useful for deriving p-values

The test statistic is often standardized in some way so as to


use known probabilities to derive the p-value

IPS 219
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population mean

A continuous rv has a t-distribution with parameter m, which

is also called the degrees of freedom, where m ≥ 1 is an
integer, if its probability density is given by

f(x) = k_m (1 + x²/m)^(−(m+1)/2)

for x ∈ R, where

k_m = Γ((m + 1)/2) / ( Γ(m/2) √(mπ) )

and

Γ(u) = ∫₀^∞ e^(−x) x^(u−1) dx

IPS 220
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population mean

In this case, we find zL , zU such that

P(t(n − 1) < zL ) = P(t(n − 1) > zU ) = 0.025

Everything else is the same as in the known-σ case:
P( X̄ − zU s/√n < µ < X̄ − zL s/√n ) = 0.95

x̄ = 23.78778, s = 0.07827513, n = 23

Using R:
qt(0.025,22) = -2.073873

qt(0.975,22) = 2.073873

i.e. zL = −2.073873 and zU = 2.073873

So the 95% CI becomes

( 23.79 − 2.07 × 0.08/√23 , 23.79 + 2.07 × 0.08/√23 ) = (23.75, 23.82)
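Verifying in R with the stated summary values (a sketch; the raw coal measurements are listed on the slides below):

```r
xbar <- 23.78778; s <- 0.07827513; n <- 23
tcrit <- qt(0.975, df = n - 1)                    # about 2.0739
round(xbar + c(-1, 1) * tcrit * s / sqrt(n), 2)   # (23.75, 23.82)
```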
IPS 221
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population mean

t-test in R

The (very popular) t-test is readily available, with synopsis:


t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

Be careful during implementation:

t.test(1:10, y=c(7:20))                  # p-value = .00001855
t.test(1:10, y=c(7:20, 200))             # p-value = .1245
t.test(1:10, y=c(7:20), alt="less")      # comment?
t.test(1:10, y=c(7:20), alt="greater")   # comment?

IPS 222
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population proportion

One-sided tests of the population proportion


The null hypothesis of an upper-tail test of the population
proportion is formulated as
H0 : p ≤ p0
where p0 is a hypothesized upper bound on the true
population proportion p

Under an adequately randomized sample, and when np0 and

n(1 − p0 ) are > 10, the one-proportion z-test is defined as

z = (p̂ − p0) / √( p0 (1 − p0 ) / n )

The null hypothesis is to be rejected when z ≥ z_α , where z_α is
the 100(1 − α)th percentile of the standard Normal distribution

In a lower-tail test, H0 : p ≥ p0 is rejected when z ≤ −z_α

IPS 223
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, one-sided tests of the population proportion

Testing proportions in R

As for the z-test for the mean, implementation is direct

Given values n, pbar and p0, the one-sided z-test may be


carried out by comparing the test statistic
z = (pbar-p0)/sqrt(p0*(1-p0)/n)
with the critical value set e.g. for alpha=.05:
z.alpha = qnorm(1-alpha)

Alternatively, to obtain a p-value, one may instead instruct:


pval=pnorm(z, lower.tail=FALSE)
pval > alpha

IPS 224
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, two-sided tests

Two-sided tests of the population mean

For a two-sided z-test, one must check whether

z ≤ −z_{α/2} or z ≥ z_{α/2}

In R, the two-tailed p-value of the statistic may be obtained by


pval = 2 * pnorm(z, lower.tail=FALSE)

The two-sided t-test is derived using argument alternative:

t.test(x, alternative="two.sided")
There is actually no need to specify it as it is the default value.

IPS 225
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, two-sided tests

Two-sided tests of the population proportion

A two-tailed test on proportions may be implemented in R as


follows (e.g. at the 5% significance level):
z = (pbar-p0) / sqrt(p0*(1-p0)/n)
alpha = .05
z.half.alpha = qnorm(1-alpha/2)
pval = 2 * pnorm(abs(z), lower.tail=FALSE)

IPS 226
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, two-sided tests

Coal example: the Z -test


Combine these values into a confidence statement about the
true gross calorific content of Osterfeld 262DE27?

First assume that the unknown variance component, or

equivalently, the standard deviation σ is known

Under this assumption X̄ ∼ N(µ, σ²/n)

Here σ and n are known, µ is unknown

Moreover, X̄ − µ ∼ N(0, σ²/n), that is a distribution which is

free of the unknown parameter

Using standardization,

(X̄ − µ) / (σ/√n) ∼ N(0, 1)
IPS 227
ST1051-ST3905-ST5005-ST6030
Hypothesis Testing
One-sample, two-sided tests

Example: gross calorific content of coal


When a shipment of coal is traded, a number of its properties
should be known accurately, because the value of the
shipment is determined by them

Gross calorific value characterizes the heat content (in


megajoules per kilogram, MJ/kg)

The ISO 1928 method is carried out to determine its value

Resulting measurement errors are known to be approximately


normal, with a standard deviation of about 0.1 MJ/kg

Laboratories that operate according to standard procedures


receive ISO certificates

The next table shows a number of such ISO 1928

measurements for a shipment of Osterfeld coal coded
262DE27

Example: gross calorific content of coal

Gross calorific value measurements for Osterfeld 262DE27:

23.870 23.730 23.712 23.760 23.640 23.850 23.840 23.860


23.940 23.830 23.877 23.700 23.796 23.727 23.778 23.740
23.890 23.780 23.678 23.771 23.860 23.690 23.800

[Source: A.M.H. van der Veen and A.J.M. Broos, "Interlaboratory study programme ILS coal characterization: reported data". Technical report, NMi Van Swinden Laboratorium B.V., The Netherlands, 1996]


Coal example: the Z-test

We need to find two points, zL and zU, such that

P(Z < zL) = P(Z > zU) = 0.025

This will give us the probability equation

P( zL < (X̄ − μ)/(σ/√n) < zU ) = P( zL σ/√n < X̄ − μ < zU σ/√n )
                               = P( X̄ − zU σ/√n < μ < X̄ − zL σ/√n )
                               = 0.95

From tables or software, zU = 1.96 and zL = −1.96


Coal example - confidence interval

From the data, we compute x̄n = 23.788

Using the given σ = 0.1 and α = 0.05, we find the 95% CI:

( 23.788 − 1.96 × 0.1/√23 , 23.788 + 1.96 × 0.1/√23 )

i.e.

(23.747, 23.829) MJ/kg
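This interval is easy to reproduce in R from the measurements in the table:

```r
# Reproducing the 95% z-interval for the Osterfeld 262DE27 measurements
x <- c(23.870, 23.730, 23.712, 23.760, 23.640, 23.850, 23.840, 23.860,
       23.940, 23.830, 23.877, 23.700, 23.796, 23.727, 23.778, 23.740,
       23.890, 23.780, 23.678, 23.771, 23.860, 23.690, 23.800)
sigma <- 0.1; n <- length(x)   # n = 23
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * sigma / sqrt(n)
round(ci, 3)                   # (23.747, 23.829)
```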

Two-sample tests

Two-sample z-test

In a two-sample z-test, one compares the means of two samples w.r.t. a hypothesized difference in means d0, using a test statistic of the form

z = ( (x̄1 − x̄2) − d0 ) / √( σ1²/n1 + σ2²/n2 )
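A minimal sketch in R (simulated samples, with the σ's assumed known for illustration):

```r
# Two-sample z-test of H0: mu1 - mu2 = d0, with known sigmas (simulated data)
set.seed(42)
x1 <- rnorm(50, mean = 10, sd = 2)
x2 <- rnorm(60, mean =  9, sd = 3)
d0 <- 0; s1 <- 2; s2 <- 3
z <- ((mean(x1) - mean(x2)) - d0) / sqrt(s1^2 / length(x1) + s2^2 / length(x2))
pval <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value
```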


Paired t-test
In a paired t-test, one compares the mean d̄ of the differences between two samples with a hypothesized difference in means d0, using a test statistic of the form

t = (d̄ − d0) / (s/√n),   df = n − 1

Recall the synopsis for the t-test:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

In the paired case, both x and y must be specified, and must be the same length. Example:

t.test(x, y, alt="less", paired=TRUE)
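A small illustration with hypothetical before/after data; note that the paired test is equivalent to a one-sample t-test on the differences:

```r
# Paired t-test on hypothetical before/after measurements (n = 6 pairs)
before <- c(12.1, 11.8, 12.6, 12.0, 11.5, 12.3)
after  <- c(11.7, 11.6, 12.2, 11.8, 11.4, 12.1)
res.paired  <- t.test(before, after, paired = TRUE)
res.onesamp <- t.test(before - after)  # same statistic, df = n - 1 = 5
```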

Other tests in R

F-test to compare the variances of two samples from normal populations: var.test()

x <- rnorm(50, mean = 0, sd = 2)
y <- rnorm(30, mean = 1, sd = 1)
var.test(x, y)

Testing for null correlation (Pearson's coefficient):

z <- rnorm(30, mean = 0, sd = 2)
cor.test(y, z)

Nonparametric tests (cf. next section): wilcox.test() (location), ks.test() (distribution fit), ...

And many more...

Goodness-of-fit tests

Pearson's Chi-Square Goodness-of-Fit Test

Testing for the nature of a distribution is also very useful and often needed

Let X be a random variable whose probability density (or mass) function fX(x) is unknown

We want to test the null hypothesis H0: fX(x) = f0(x) against the alternative hypothesis HA: fX(x) ≠ f0(x), where f0 is a given distribution function

Pearson's χ² goodness-of-fit test checks whether the observed frequencies are consistent with those expected under f0(x)


1. Divide the set SX of possible values of X into k disjoint and exhaustive classes (or intervals)

2. Take a random sample of size n from the population X

3. Calculate D² = Σ_{j=1}^{k} (nj − mj)²/mj, where nj is the number of observations in the j-th class and mj the expected frequency under H0

4. If H0 is true and if n is large enough, then D² ∼ χ²_{k−r−1} (approximately), where r is the number of unknown parameters of the function f0(x) that we must estimate

5. Reject H0 at the significance level α if and only if D² > χ²_{α, k−r−1}


Example of a Goodness-of-Fit Test

Suppose we have a random sample from a discrete random variable, summarized in the following table:

value in sample   0   1   2   3   >3
frequency        31  33  22  12   2

Test the null hypothesis that this random sample comes from a Poisson distribution with λ = 1, at significance level α = 0.05

Compute the probability P[X = 0] = e^{−λ} λ^0/0! = 0.3679 and multiply by the sample size n to get the estimated number of 0s expected if the sample really came from Poi(1)

Similarly, estimate the probabilities for X = 1, 2, 3 and P[X > 3], and multiply by n = 100

value in sample    0      1      2     3     >3
frequency         31     33     22    12      2
estimated freq    36.78  36.78  18.39  6.13   1.90

D² = Σ_{j=1}^{k} (nj − mj)²/mj

D² = 7.63 < 7.81 (the critical value χ²_{0.05, 5−1−1}), so we do not reject H0 at the 5% level
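These computations can be checked in R (the last class probability is the upper tail P[X > 3]):

```r
# Pearson chi-square GOF test of Poi(1) for the observed table (n = 100)
obs  <- c(31, 33, 22, 12, 2)
p    <- c(dpois(0:3, lambda = 1), 1 - ppois(3, lambda = 1))
m    <- 100 * p                 # expected frequencies under H0
D2   <- sum((obs - m)^2 / m)    # D^2 ~ 7.63
crit <- qchisq(0.95, df = 3)    # ~ 7.81: D2 < crit, do not reject H0
```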

Testing for significance in linear regression

Testing for significance of regression coefficients


Tests are based on two quantities:

SSx = Σ_{i=1}^{n} (xi − x̄)² = n sx²

where sx² is the sample variance of the xi's, and the sum of squared errors (or residuals)

SSE = Σ_{i=1}^{n} (Yi − Ŷi)²

Additionally we define the quantity called the Mean Squared Error:

MSE = SSE / (n − 2)

Regression Tests

To test the null hypothesis H0: β0 = β00, we use the statistic

T0 := (β̂0 − β00) / √( MSE Σi xi² / (n SSx) ) ∼ t_{n−2}

We then reject H0 at significance level α if and only if

|T0| > t_{α/2, n−2}   if HA: β0 ≠ β00
T0 > t_{α, n−2}       if HA: β0 > β00
T0 < −t_{α, n−2}      if HA: β0 < β00


Regression Tests

To test the null hypothesis H0: β1 = β10, we use the statistic

T0 := (β̂1 − β10) / √( MSE / SSx ) ∼ t_{n−2}

We then reject H0 at significance level α if and only if

|T0| > t_{α/2, n−2}   if HA: β1 ≠ β10
T0 > t_{α, n−2}       if HA: β1 > β10
T0 < −t_{α, n−2}      if HA: β1 < β10


There is an easy way to calculate SSE

First find the Total Sum of Squares:

SST = Σ_{i=1}^{n} (Yi − Ȳ)² = Σ_{i=1}^{n} Yi² − n Ȳ²

Then find the Regression Sum of Squares:

SSR = β̂1² SSx = SST − SSE

Tensile strength example: testing for significance

Test H0: β1 = 0 against HA: β1 ≠ 0

SST = Σ_{i=1}^{5} yi² − 5ȳ² = 10.79 − 5(1.46)² = 0.132

SSR = β̂1² SSx = (0.1)(1.1)² = 0.121

Thus, SSE = SST − SSR = 0.011

Note that the test statistic reduces to √(SSR/MSE), or

T0 = √( 0.121 / (0.011/(5 − 2)) ) = √33 = 5.744563

From tables, t_{0.025,3} = 3.18; since |T0| > 3.18, we reject the null hypothesis
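A quick check in R using only the summary quantities quoted above:

```r
# Significance test for beta1 in the tensile strength example
SST <- 10.79 - 5 * 1.46^2   # total sum of squares      = 0.132
SSR <- 0.1 * 1.1^2          # regression sum of squares = 0.121
SSE <- SST - SSR            # error sum of squares      = 0.011
MSE <- SSE / (5 - 2)
T0  <- sqrt(SSR / MSE)      # = sqrt(33) ~ 5.745
# reject H0 at the 5% level since T0 > qt(0.975, df = 3) ~ 3.18
```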

Summary
1. What we need to do a test:
   1. null and alternative hypotheses
   2. a test statistic T
   3. a significance level α
   4. a rejection region C* (if the test statistic value lies in C*, reject H0)

2. Form of the test: Reject H0 if T ∈ C*

3. Two possible errors:
   1. Type I error: α = P[Reject H0 | H0 True]
   2. Type II error: β = P[Fail to Reject H0 | H0 False]

It is the Type I error which is fixed, by setting it equal to the significance level α