
ST102

Elementary Statistical Theory


Revision lectures Michaelmas Term material
Dr James Abdey

Department of Statistics
London School of Economics and Political Science

ST102 Elementary Statistical Theory Dr James Abdey ST 2014 Revision lectures MT material 1

Examination arrangements
Thursday, May 22nd 2014, 10:00-13:00.

Please double-check the time in your examination timetable on LSE
for You (published by week 1 of ST), in case of (extremely unlikely)
changes to the date and time.
In the examination, you will be provided with:

Murdoch and Barnes: Statistical Tables, 4th edition.

The only tables from this that you may need are for the
standard normal, $t$, $\chi^2$, $F$ and Wilcoxon distributions. These tables
are also on the ST102 Moodle site, so make sure you are familiar with
their layout.

A formula sheet (at the end of the examination paper). This is also
on the ST102 Moodle site.
For general administrative matters on examinations see:
http://www2.lse.ac.uk/intranet/students/registrationTimetablesAssessment/examinationsAndResults/examTimetables/ExamTimetable.aspx

A word on calculators
You can also use a scientific calculator, as prescribed by examination
procedures.
The rubric on the front of the examination paper will say:
"Scientific calculators are permitted in the examination, as prescribed
by the School's regulations. If you have a programmable calculator,
you must delete anything stored in the memory in the presence of an
invigilator at the start of the examination."
In short, graphics calculators are permitted (the graphics capability
will be of no benefit in the examination). However, any
programmable memory must be reset in the presence of an
invigilator at the start of the examination.
Although many statistical calculations can be performed on scientific
calculators, you must still show all your working in your answer
booklet.

Structure of the examination paper
The question paper contains seven questions, all given equal weight
(20 marks each).
Section A: two compulsory questions; Section B: five questions.
Answer both questions from Section A, and three questions from
Section B.
If you answer more than three questions from Section B, only your
best three answers will count towards the final mark. However, you are
strongly advised to attempt only three questions from Section B to
make efficient use of your time.
Each question in each section could cover any part of the syllabus.

Structure of the examination paper
The final mark is out of 100.
The pass mark for the examination is 40.
Important note: Only re-sit candidates will sit the old
examination paper structure, which was in place in the 2012-13
academic year. All past examination papers on Moodle have the
old structure.
A specimen 2014 examination paper with the new structure is also
available on Moodle.

Notes on examination tactics
Make sure you do not miss out on any marks that you can get!
That is, try to give time to all the questions you attempt, and do not get
stuck on any single question.
Some (parts of) questions are entirely standard and straightforward,
some are more challenging. So try to make sure you do not miss out
on the standard ones at least, bearing in mind that:
The questions are not in order of difficulty.
Parts of questions (e.g. 1(a), 1(b), ...) are not in order of difficulty.
Questions may be answered in any order, but keep the answers to each
question in one place in your answer booklet!
Remember that partial credit is given for partially correct answers.
The only guaranteed way to get a 0 is an empty answer book!

Preparing for the examination
Only the topics covered in the lecture notes are included in the
examination, except for any topics explicitly stated as not examinable,
mainly the LT material after Section 6.10 and Section 7 on ANOVA.
All of the remaining topics are potentially examinable.
However, some are, of course, more central than others, and more
likely to turn up in the examination.
Use the lecture notes, exercises and (especially) recent examination
papers to form an idea of which topics are most prominent in this
respect, and to decide which ones to give most weight to in your
preparation.
Also read the textbook on these topics, if it helps you.
Substantive queries about the material are best posted in the Moodle
Q&A forum (so everyone can read the response).

Past examination papers
Most relevant are papers from 2008 onwards (including the 2008
mock).
Older examination papers exist, but...

they include further topics that are no longer covered

the solutions are not always complete and contain some errors

the style of the questions is generally different from that of more recent ones.
These can be accessed (with LSE username and password) at:
https://library-2.lse.ac.uk/protected-exam/index.html

The rest of today
Outline of the most important topics and results from MT.

You should definitely at least remember these!

In the examination you can take these results as known and use them,
unless told otherwise (i.e. unless a question explicitly asks you to
prove some of them).
Examples of common types of questions, from past examinations.

If you suspect any typos or errors in the solutions, please query them
in the Moodle Q&A forum.

Key topics covered in MT
1. Descriptive statistics

Not often the subject of separate examination questions, but some of these
(e.g. $\bar{X}$ and $S^2$) appear in other questions.
2. Set theory and counting rules (used in probability questions).
3. Probability: definition, classical probability, independence,
conditional probability, Bayes' theorem.
4. Random variables: definition, pf/pdf and cdf, expected values and
variances, medians, moment generating functions.
5. Common probability distributions: discrete and continuous uniform,
Poisson, binomial, exponential and normal.
6. Multivariate probability distributions: independence of random
variables, conditional and marginal distributions, sums and products
of random variables, covariance and correlation.
7. Sampling distributions: random (IID) samples, statistics and their
sampling distributions, the central limit theorem.

Set theory
Basic rules of set-theoretic operations:
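$A \cup B = B \cup A$ and $A \cap B = B \cap A$
$A \cup (B \cup C) = (A \cup B) \cup C$ and $A \cap (B \cap C) = (A \cap B) \cap C$
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
$(A \cap B)^c = A^c \cup B^c$ and $(A \cup B)^c = A^c \cap B^c$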

Set theory
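Let $S$ be the sample space and $A \subset S$. Then:
$\emptyset \subset A$ and $\emptyset^c = S$
$A \cap \emptyset = \emptyset$ and $A \cup \emptyset = A$
$A \cap S = A$ and $A \cup S = S$
$A \cap A^c = \emptyset$ and $A \cup A^c = S$
$A \cap A = A$ and $A \cup A = A$
(For these and other similar results, see slides 122-123.)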
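As a quick sanity check (an illustration, not a proof), the set identities above can be spot-checked with Python's built-in `set` type. The sample space `S` and the events `A`, `B` and `C` below are arbitrary choices made only for this sketch; `comp` is a hypothetical helper for the complement relative to `S`.

```python
# Spot-check set identities on a small, arbitrarily chosen sample space S.
S = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 7, 9}
empty = set()

def comp(E):
    """Complement of E relative to the sample space S."""
    return S - E

# Commutative, associative and distributive laws
assert A | B == B | A and A & B == B & A
assert A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)
# De Morgan's laws
assert comp(A & B) == comp(A) | comp(B)
assert comp(A | B) == comp(A) & comp(B)
# Identities involving S and the empty set
assert empty <= A and comp(empty) == S
assert A & empty == empty and A | empty == A
assert A & S == A and A | S == S
assert A & comp(A) == empty and A | comp(A) == S
assert A & A == A and A | A == A
print("all identities hold for this example")
```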

Counting rules
Remember that in classical probability problems, where all
outcomes are equally likely, probability calculations involve counting
outcomes (see slides 141-142).
See slide 155 for the basic counting formulae.
Counting possibilities directly (without the formulae) is also fine, if
you can do it (i.e. in small problems).
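The sketch below compares the four standard counts for selecting k objects from n (ordered or unordered, with or without replacement) against brute-force enumeration with `itertools`. It assumes these are the formulae on slide 155, which is not reproduced here; for small n, enumeration is exactly the "counting directly" approach mentioned above.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

def count_formulae(n, k):
    """Standard counts for selecting k items from n."""
    return {
        "ordered, with replacement": n ** k,
        "ordered, without replacement": perm(n, k),
        "unordered, without replacement": comb(n, k),
        "unordered, with replacement": comb(n + k - 1, k),
    }

def count_by_enumeration(n, k):
    """The same counts, obtained by listing every selection explicitly."""
    items = range(n)
    return {
        "ordered, with replacement": sum(1 for _ in product(items, repeat=k)),
        "ordered, without replacement": sum(1 for _ in permutations(items, k)),
        "unordered, without replacement": sum(1 for _ in combinations(items, k)),
        "unordered, with replacement": sum(
            1 for _ in combinations_with_replacement(items, k)),
    }

for name, value in count_formulae(5, 3).items():
    print(f"{name}: {value}")
# ordered, with replacement: 125
# ordered, without replacement: 60
# unordered, without replacement: 10
# unordered, with replacement: 35
```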

Probability: definition and key properties
See slide 128 for the axioms of probability.
The basic properties of the probability function $P$ (slide 136):
$P(S) = 1$ and $P(\emptyset) = 0$.
$0 \le P(A) \le 1$ for all events $A$.
$P(A^c) = 1 - P(A)$.
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
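To see these properties in a classical (equally likely) setting, the sketch below uses exact fractions for one roll of a fair die. The events `A` and `B` are arbitrary illustrative choices, not taken from the notes.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space for one roll of a fair die

def P(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # "even score"
B = {4, 5, 6}   # "score of at least 4"

assert P(S) == 1 and P(set()) == 0
assert 0 <= P(A) <= 1
assert P(S - A) == 1 - P(A)                 # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)   # addition rule
print(P(A | B))  # 2/3
```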

Some crucial results and definitions
Independence: $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$,
and if $A_1, A_2, \ldots, A_n$ are independent, then
$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2) \cdots P(A_n)$.
Conditional probability:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
Multiplication rule: $P(A \cap B) = P(A \mid B)P(B)$ and its extensions
(see slides 181-182).
If $A_1, A_2, \ldots, A_n$ form a partition of the sample space (see slide
124):
Total probability formula: $P(B) = \sum_{j=1}^{n} P(B \mid A_j)P(A_j)$
Bayes' theorem:
$P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{P(B)} = \frac{P(B \mid A_i)P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)P(A_j)}$
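The total probability formula and Bayes' theorem translate directly into two small functions over a partition. This is a minimal sketch; the three-machine example (production shares and defect rates) is entirely hypothetical and only serves to exercise the code.

```python
from fractions import Fraction

def total_probability(prior, likelihood):
    """P(B) = sum over j of P(B | A_j) P(A_j), for a partition {A_j}."""
    return sum(likelihood[a] * prior[a] for a in prior)

def bayes(prior, likelihood):
    """Return P(A_i | B) for every event A_i in the partition."""
    p_b = total_probability(prior, likelihood)
    return {a: likelihood[a] * prior[a] / p_b for a in prior}

# Hypothetical example: machines A1, A2, A3 produce 50%, 30%, 20% of items,
# with defect rates 1%, 2%, 3%; B = "item is defective".
prior = {"A1": Fraction(1, 2), "A2": Fraction(3, 10), "A3": Fraction(1, 5)}
likelihood = {"A1": Fraction(1, 100), "A2": Fraction(2, 100), "A3": Fraction(3, 100)}

print(total_probability(prior, likelihood))  # 17/1000
print(bayes(prior, likelihood)["A1"])        # 5/17
```

Exact `Fraction` arithmetic is used so that the posterior probabilities sum to exactly 1.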

Example question
Question: $A$, $B$ and $C$ are independent events. Prove that $A$ and
$(B \cup C)$ are independent.
Solution: Using rules for set-theoretic operations and basic properties of
probability:
$P[A \cap (B \cup C)]$
$= P[(A \cap B) \cup (A \cap C)]$
$= P(A \cap B) + P(A \cap C) - P[(A \cap B) \cap (A \cap C)]$
$= P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)$
$= P(A)P(B) + P(A)P(C) - P(A)P(B)P(C)$
$= P(A)[P(B) + P(C) - P(B)P(C)]$
$= P(A)[P(B) + P(C) - P(B \cap C)] = P(A)P(B \cup C)$
and thus $A$ and $(B \cup C)$ are independent.
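The proof above is general. As a numerical illustration only, the sketch below builds the sample space for three independent events with hypothetical probabilities 0.3, 0.6 and 0.2 (chosen arbitrarily) and confirms with exact arithmetic that $P[A \cap (B \cup C)] = P(A)P(B \cup C)$.

```python
from fractions import Fraction
from itertools import product

pA, pB, pC = Fraction(3, 10), Fraction(6, 10), Fraction(2, 10)

def weight(p, occurs):
    """Probability factor for one event occurring (p) or not (1 - p)."""
    return p if occurs else 1 - p

# Outcomes are triples of indicators (A occurs, B occurs, C occurs);
# independence means each outcome's probability is a product of factors.
space = {w: weight(pA, w[0]) * weight(pB, w[1]) * weight(pC, w[2])
         for w in product([True, False], repeat=3)}

def P(event):
    """Sum the probabilities of outcomes satisfying the predicate `event`."""
    return sum(p for w, p in space.items() if event(w))

lhs = P(lambda w: w[0] and (w[1] or w[2]))
rhs = P(lambda w: w[0]) * P(lambda w: w[1] or w[2])
print(lhs, rhs)  # 51/250 51/250
```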

Example question
Question: If a committee of three persons is to be formed from a group
of five men and four women, what is the probability that at least two of
the committee are women, given that there is at least one woman on the
committee?
Solution: Let A = "There are at least two women on the committee" and
B = "There is at least one woman on the committee". Note that $A \subset B$,
so $A \cap B = A$. Calculate first:
$P(\text{No women on the committee}) = \binom{5}{3} / \binom{9}{3} = 10/84$
$P(\text{One woman on the committee}) = \binom{5}{2}\binom{4}{1} / \binom{9}{3} = 40/84$
Then $P(B) = 1 - 10/84 = 74/84$, $P(A) = 1 - [10/84 + 40/84] = 34/84$,
and
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)} = \frac{34}{74} = 0.4595.$
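Because the problem is small, the counts can also be confirmed by listing every possible committee, which is the "counting directly" approach. This sketch labels the nine people by index so that each committee of distinct people is counted once.

```python
from fractions import Fraction
from itertools import combinations

people = ["M"] * 5 + ["W"] * 4                # five men, four women
committees = list(combinations(range(9), 3))  # all equally likely committees

def women(committee):
    """Number of women on a committee (a tuple of person indices)."""
    return sum(people[i] == "W" for i in committee)

n_B = sum(1 for c in committees if women(c) >= 1)  # at least one woman
n_A = sum(1 for c in committees if women(c) >= 2)  # at least two women (A within B)
p = Fraction(n_A, n_B)                             # P(A | B) = P(A) / P(B)
print(len(committees), n_B, n_A, p, round(float(p), 4))  # 84 74 34 17/37 0.4595
```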

Example question
Question: We know that 3% of the population have a particular heart
condition. A screening test has a 70% chance of identifying the condition
if it is present, but has a 20% chance of recording a positive result when
it is not. Evaluate the probability that a patient who gets a positive test
result actually has the condition.
Solution: Let H = "Person has the condition" and D = "Test is positive".
Then $P(H) = 0.03$, $P(H^c) = 0.97$, $P(D \mid H) = 0.7$ and $P(D \mid H^c) = 0.2$.
Using Bayes' theorem, we get:
$P(H \mid D) = \frac{P(D \mid H)P(H)}{P(D)} = \frac{P(D \mid H)P(H)}{P(D \mid H)P(H) + P(D \mid H^c)P(H^c)}$
$= \frac{0.7 \times 0.03}{0.7 \times 0.03 + 0.2 \times 0.97} = \frac{0.021}{0.021 + 0.194} = 0.098.$
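The same two steps (total probability for $P(D)$, then Bayes' theorem) can be written out in a few lines, using only the numbers given in the question. The variable names are chosen for this sketch.

```python
# Screening-test example: H = "has the condition", D = "test is positive".
p_H = 0.03
p_D_given_H = 0.70      # chance the test identifies the condition when present
p_D_given_notH = 0.20   # chance of a positive result when it is absent

p_D = p_D_given_H * p_H + p_D_given_notH * (1 - p_H)   # total probability
p_H_given_D = p_D_given_H * p_H / p_D                  # Bayes' theorem
print(round(p_D, 3), round(p_H_given_D, 3))  # 0.215 0.098
```

The small posterior reflects the low prior: most positive results come from the 97% of people without the condition.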

Probability distributions of random variables
Suppose X is a continuous random variable, f (x) is its probability density
function (pdf) and F(x) is its cumulative distribution function (cdf). Key
results for these:
The pdf must satisfy (i.) $f(x) \ge 0$ for all $x$, and (ii.) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$.
$P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$ for any $a \le b$.
$F'(x) = f(x)$.
Similar results, except for the last one, also hold for the cdf and
probability function p(x) = P(X = x) of a discrete random variable (with
integration replaced by summation over the possible values of X).
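These relations can be checked numerically. The sketch below uses the exponential pdf $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, with $\lambda = 2$ assumed purely for illustration, whose cdf is $F(x) = 1 - e^{-\lambda x}$. The integrals are approximated by a simple midpoint rule, which is a numerical technique chosen for this sketch, not one from the notes.

```python
import math

lam = 2.0  # illustrative rate parameter

def f(x):
    """Exponential pdf; zero for negative x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def F(x):
    """Exponential cdf."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total = integrate(f, 0, 20)      # close to 1 (the tail beyond 20 is negligible)
a, b = 0.5, 1.5
interval = integrate(f, a, b)    # close to F(b) - F(a)
h = 1e-6
derivative = (F(1 + h) - F(1 - h)) / (2 * h)   # close to f(1)
print(total, interval - (F(b) - F(a)), derivative - f(1))
```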

Expected values, variances and medians
For a continuous random variable X, we have:
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$ for any function $g$
$\mathrm{Var}(X) = E[(X - E(X))^2] = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx = E(X^2) - (E(X))^2$
$F(m) = 0.5$
where $m$ denotes the median of $X$.
Similar definitions apply for a discrete random variable, with sums instead of
integrals (and a modified definition of the median, see slide 288).
Expected values and variances of sums and products of random variables:
see slide 433.
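For the discrete case, the sketch below applies the sum versions of these definitions to a fair six-sided die (an illustrative choice) and confirms that the definition of the variance and the shortcut $E(X^2) - (E(X))^2$ agree exactly.

```python
from fractions import Fraction

# Fair six-sided die: p(x) = 1/6 for x = 1, ..., 6 (illustrative choice).
pf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g, pf):
    """E[g(X)] = sum over x of g(x) p(x)."""
    return sum(g(x) * p for x, p in pf.items())

mean = expect(lambda x: x, pf)
var_definition = expect(lambda x: (x - mean) ** 2, pf)   # E[(X - E(X))^2]
var_shortcut = expect(lambda x: x ** 2, pf) - mean ** 2  # E(X^2) - (E(X))^2
print(mean, var_definition, var_shortcut)  # 7/2 35/12 35/12
```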

Moment generating functions
The moment generating function (mgf) of a continuous r.v. $X$ is:
$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$.
For discrete random variables, integration is replaced by summation, and
$f(x)$ by $p(x)$.
In both cases:
$M_X'(0) = E(X)$ and $M_X''(0) = E(X^2)$
which also gives:
$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = M_X''(0) - (M_X'(0))^2$.
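In the examination you differentiate the mgf analytically. As a numerical check only, the sketch below builds the mgf of a fair die (an illustrative choice) from its pf and approximates $M_X'(0)$ and $M_X''(0)$ with central finite differences, a technique chosen for this sketch. The results should match $E(X) = 7/2$, $E(X^2) = 91/6$ and $\mathrm{Var}(X) = 35/12$.

```python
import math

# Fair six-sided die as an illustrative discrete distribution.
pf = {x: 1 / 6 for x in range(1, 7)}

def mgf(t):
    """M_X(t) = E(e^{tX}) = sum of e^{tx} p(x)."""
    return sum(math.exp(t * x) * p for x, p in pf.items())

h = 1e-4
m1 = (mgf(h) - mgf(-h)) / (2 * h)              # approximates M'_X(0) = E(X)
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approximates M''_X(0) = E(X^2)
variance = m2 - m1 ** 2
print(round(m1, 4), round(m2, 4), round(variance, 4))
```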

Example question
Question: The weights, in kilograms, of a certain species of fish caught
off the coast of Cornwall have a continuous distribution well-described by
the probability density function:
$f(x) = \begin{cases} c(6x - x^2 - 5) & 1 \le x \le 5 \\ 0 & \text{otherwise.} \end{cases}$
(a) Determine the value of the constant c; (b) Derive the cumulative
distribution function and evaluate the median and expected value of X.
Solution: This very common type of question requires integration. In an
examination answer you must show the intermediate steps of the
integration, even though they are omitted here for brevity.

Example question
Solution continued: (a) Here:
$\int_{-\infty}^{\infty} f(x)\,dx = c \int_{1}^{5} (6x - x^2 - 5)\,dx = c \times 32/3$
and since the integral must be 1, we have $c = 3/32$.

Example question
Solution continued: (b) We have:
$\int_{1}^{x} (3/32)(6t - t^2 - 5)\,dt = (9x^2 - x^3 - 15x + 7)/32$
so:
$F(x) = \begin{cases} 0 & \text{for } x < 1 \\ (9x^2 - x^3 - 15x + 7)/32 & \text{for } 1 \le x \le 5 \\ 1 & \text{for } x > 5. \end{cases}$
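The integration in parts (a) and (b) can be verified with exact rational arithmetic using Python's `fractions` module. In this sketch the helper names are illustrative. It evaluates the antiderivative $3x^2 - x^3/3 - 5x$ of $6x - x^2 - 5$ to recover $c$, then checks the piecewise cdf against the closed form.

```python
from fractions import Fraction as Fr

def antiderivative(x):
    """An antiderivative of 6x - x^2 - 5, namely 3x^2 - x^3/3 - 5x."""
    return 3 * x**2 - Fr(x)**3 / 3 - 5 * x

area = antiderivative(5) - antiderivative(1)  # integral of (6x - x^2 - 5) over [1, 5]
c = 1 / area                                  # the pdf must integrate to 1

def F(x):
    """cdf of X, defined piecewise."""
    if x < 1:
        return Fr(0)
    if x > 5:
        return Fr(1)
    return c * (antiderivative(x) - antiderivative(1))

def F_closed_form(x):
    """The closed form (9x^2 - x^3 - 15x + 7)/32 on [1, 5]."""
    return Fr(9 * x**2 - x**3 - 15 * x + 7, 32)

print(area, c)  # 32/3 3/32
```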

Example question
Solution (b) continued:
The median $m$ is the solution to $F(m) = 0.5$.
This is a cubic equation in $m$, which is not easy to solve directly, so there
must be another way.
Since the expected value $E(X) = 3$ (derived below) is exactly half-way between 1 and 5,
you might guess that this is because the distribution is symmetric around
3; indeed, $6x - x^2 - 5 = 4 - (x - 3)^2$. If this is the case, the median is also equal
to 3. Direct calculation then confirms that $F(3) = (81 - 27 - 45 + 7)/32 = 0.5$, so $m = 3$.
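The symmetry argument is the intended route. As an alternative check (not needed in the examination), $F(m) = 0.5$ can also be solved numerically. The sketch below uses bisection, a technique chosen here rather than taken from the notes, which works because $F$ is continuous and increasing on $[1, 5]$.

```python
def F(x):
    """cdf of X on [1, 5], from the previous slide."""
    return (9 * x**2 - x**3 - 15 * x + 7) / 32

def bisect_quantile(F, target, lo, hi, tol=1e-10):
    """Find x in [lo, hi] with F(x) = target, assuming F is increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

m = bisect_quantile(F, 0.5, 1.0, 5.0)
print(round(m, 6))  # 3.0
```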
(This reminds us that in any question some parts may be routine, while
others may involve a twist!)

Example question
Solution (b) continued:
The expected value is given by:
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = (3/32) \int_{1}^{5} x(6x - x^2 - 5)\,dx = 3$
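The integral for $E(X)$ can likewise be checked exactly. The sketch evaluates an antiderivative of $x(6x - x^2 - 5) = 6x^2 - x^3 - 5x$, namely $2x^3 - x^4/4 - 5x^2/2$ (helper name `G` is illustrative), at the limits 5 and 1.

```python
from fractions import Fraction as Fr

def G(x):
    """An antiderivative of 6x^2 - x^3 - 5x."""
    x = Fr(x)
    return 2 * x**3 - x**4 / 4 - Fr(5, 2) * x**2

EX = Fr(3, 32) * (G(5) - G(1))
print(EX)  # 3
```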

Common probability distributions
You should memorise, and can then use, the pf/pdf, cdf, mean, variance
and median (if given) of the following distributions:
Binomial
Poisson
Discrete and continuous uniform
Exponential
Normal.
If any other distribution is used in a question, you will be given formulae
for them (or asked to derive them, as part of the question).

Common discrete distributions
Poisson distribution for counts $x = 0, 1, 2, \ldots$.
Binomial distribution for the number $x$ of successes out of $n$ trials.
Common type of question: calculate probabilities or expected values
for these distributions, given some values of their parameters.
For the binomial distribution with large $n$, the normal approximation
is often used:
i.e. $\mathrm{Bin}(n, \pi)$ is approximately $N(n\pi, n\pi(1 - \pi))$ (see slide 377)
the table of the standard normal distribution is then used, and
you should remember to include the continuity correction (see slide 379).
Remember also the results for sums of independent binomial and
Poisson random variables: see slide 443.
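The sums results are assumed here to be the standard ones: independent Poisson($\lambda_1$) and Poisson($\lambda_2$) variables sum to Poisson($\lambda_1 + \lambda_2$), and independent $\mathrm{Bin}(n_1, \pi)$ and $\mathrm{Bin}(n_2, \pi)$ variables with the same $\pi$ sum to $\mathrm{Bin}(n_1 + n_2, \pi)$. The sketch checks both by direct convolution of the pfs. The parameter values are arbitrary.

```python
import math

def poisson_pf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

def binom_pf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x) if 0 <= x <= n else 0.0

def convolve_at(s, pf1, pf2):
    """P(X1 + X2 = s) for independent non-negative integer X1, X2."""
    return sum(pf1(k) * pf2(s - k) for k in range(s + 1))

lam1, lam2 = 1.5, 2.5   # illustrative Poisson means
n1, n2, p = 4, 6, 0.3   # illustrative binomial parameters (same p)

for s in range(6):
    conv_pois = convolve_at(s, lambda k: poisson_pf(k, lam1), lambda k: poisson_pf(k, lam2))
    conv_bin = convolve_at(s, lambda k: binom_pf(k, n1, p), lambda k: binom_pf(k, n2, p))
    print(s, round(conv_pois, 6), round(poisson_pf(s, lam1 + lam2), 6),
          round(conv_bin, 6), round(binom_pf(s, n1 + n2, p), 6))
```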

The normal distribution
For a normal distribution $X \sim N(\mu, \sigma^2)$, it is important to remember (in
addition to the pdf, mean $E(X) = \mu$ and variance $\mathrm{Var}(X) = \sigma^2$) that:
Linear combinations and sums of independent normally distributed random
variables are also normally distributed (see slide 445).
In particular, the standardised variable
$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
With standardisation, calculations of probabilities for any normal
distribution can be transformed into calculations for a standard
normal [$N(0, 1)$] distribution.
The normal distribution tables that you have in the examination
show values of $1 - \Phi(z) = P(Z > z)$ of the standard normal
distribution.
You should know how to do these calculations (see slides 365-376 for
the rules and examples).
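Standardisation can be checked with `statistics.NormalDist` from the Python standard library. The parameters $\mu = 10$, $\sigma = 2$ and the interval $(9, 13]$ are illustrative choices. Note that `NormalDist` takes the standard deviation, not the variance.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0      # illustrative parameters: X ~ N(10, 4)
X = NormalDist(mu, sigma)  # NormalDist takes the standard deviation
Z = NormalDist()           # standard normal N(0, 1)

a, b = 9.0, 13.0
direct = X.cdf(b) - X.cdf(a)
standardised = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

# The tables give the upper tail 1 - Phi(z) = P(Z > z), e.g. for z = 1.5:
upper_tail = 1 - Z.cdf(1.5)
print(round(direct, 4), round(standardised, 4), round(upper_tail, 4))
```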

Example question
Question: In the construction of a certain assembly, four rods of lengths
$X_1$, $X_2$, $X_3$ and $X_4$ are connected end-to-end to form a composite rod to
span a gap of width $Y$. To function satisfactorily the length of the
composite rod must exceed the size of the gap by not less than 0.10 cm.
The lengths $X_1$, $X_2$, $X_3$ and $X_4$ are independently normally distributed
with mean 4.0 cm and variance 0.015 cm². $Y$ is also normally distributed
with mean 15.94 cm and variance 0.024 cm², independently of the lengths
of the rods.
Find the probability that the assembly is satisfactorily formed at the first
attempt. Out of 10 independent composite rods, what is the probability
that two and only two are satisfactory?

Example question
Solution: Let $C = X_1 + X_2 + X_3 + X_4$ be the length of the composite
rod.
Then $C$ is also normally distributed with mean $4 \times 4.0 = 16.0$ and
variance $4 \times 0.015 = 0.06$.
Since $Y \sim N(15.94, 0.024)$ independently of $C$, the difference is also
normally distributed with:
$D = C - Y \sim N(16.0 - 15.94, 0.06 + 0.024) = N(0.06, 0.084)$.
The probability we need is:
$P(D \ge 0.1) = P\left(\frac{D - 0.06}{\sqrt{0.084}} \ge \frac{0.1 - 0.06}{\sqrt{0.084}}\right) = P(Z \ge 0.14) = 1 - \Phi(0.14) = 0.4443$
where $Z \sim N(0, 1)$.
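The first part of the solution can be reproduced with `statistics.NormalDist`. The slide rounds $z$ to 0.14 to use the tables, giving 0.4443; the unrounded $z \approx 0.138$ gives a slightly higher value of about 0.445.

```python
import math
from statistics import NormalDist

# C = X1 + X2 + X3 + X4, each N(4.0, 0.015) and independent
mean_C, var_C = 4 * 4.0, 4 * 0.015
# D = C - Y with Y ~ N(15.94, 0.024) independent of C: the variances add
mean_D, var_D = mean_C - 15.94, var_C + 0.024

D = NormalDist(mean_D, math.sqrt(var_D))  # NormalDist needs the standard deviation
p_satisfactory = 1 - D.cdf(0.1)           # P(D >= 0.1); D is continuous
z = (0.1 - mean_D) / math.sqrt(var_D)
print(round(z, 3), round(p_satisfactory, 4))
```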

Example question
Solution continued: For the second part of the question, let $X$ now
denote the number of satisfactory rods out of 10 independent rods.
Then $X \sim \mathrm{Bin}(10, 0.4443)$. The probability we need is
$P(X = 2) = \binom{10}{2} (0.4443)^2 (1 - 0.4443)^8 = 45 \times (0.4443)^2 (1 - 0.4443)^8 = 0.081.$
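The binomial probability in the second part is a one-line calculation with `math.comb`, using the value 0.4443 from the first part.

```python
from math import comb

p = 0.4443   # probability that a single composite rod is satisfactory
n, k = 10, 2
prob = comb(n, k) * p**k * (1 - p) ** (n - k)
print(comb(n, k), round(prob, 3))  # 45 0.081
```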

Multivariate distributions
If random variables $X_1, X_2, \ldots, X_n$ are independent, the pf/pdf of
their joint distribution is the product of their univariate marginal
pfs/pdfs (see slides 426-430).
Key concepts for the general (possibly non-independent) case were
introduced mainly in the context of a bivariate discrete random
variable $(X, Y)$:
Marginal distributions (see slides 395-396):
$p_X(x) = \sum_y p(x, y)$ and $p_Y(y) = \sum_x p(x, y)$
Conditional distributions, for example (see slide 404):
$p_{Y \mid X}(y \mid x) = P(Y = y \mid X = x) = \frac{P(X = x \text{ and } Y = y)}{P(X = x)} = \frac{p(x, y)}{p_X(x)}$
Covariance and correlation: measures of (linear) association between any
two random variables (see slides 415-419).

Example question
Question: The table below specifies the joint probability distribution of
the random variables X and Y:

              X = -1   X = 0   X = 1   p_Y(y)
    Y = -1     0.05     0.15    0.10    0.30
    Y = 0      0.10     0.05    0.25    0.40
    Y = 1      0.10     0.05    0.15    0.30
    p_X(x)     0.25     0.25    0.50    1

(a) Identify the marginal distribution of Y, and the conditional
distribution of X | Y = 1.
(b) Evaluate the covariance of X and Y.
(c) Are X and Y independent?

Example question
Solution: (a) Conveniently, the summation to give the marginal
distribution $p_Y(y)$ is already included in the table. So:
$p_Y(-1) = 0.30$, $p_Y(0) = 0.40$, $p_Y(1) = 0.30$
and $p_Y(y) = 0$ for all other $y$.
The conditional pf is:
$p_{X \mid Y}(x \mid Y = 1) = p_{X,Y}(x, 1)/p_Y(1) = p_{X,Y}(x, 1)/0.30$
i.e.
$p_{X \mid Y}(-1 \mid Y = 1) = 0.10/0.30 = 0.33$
$p_{X \mid Y}(0 \mid Y = 1) = 0.05/0.30 = 0.17$
$p_{X \mid Y}(1 \mid Y = 1) = 0.15/0.30 = 0.50$
and $p_{X \mid Y}(x \mid 1) = 0$ for all other $x$.

Example question
Solution: (b) First we need the marginal expected values:
$E(Y) = \sum_y y\,p_Y(y) = (-1) \times 0.30 + 0 \times 0.40 + 1 \times 0.30 = 0$
and, similarly, $E(X) = (-1) \times 0.25 + 0 \times 0.25 + 1 \times 0.50 = 0.25$. We also need:
$E(XY) = \sum_x \sum_y xy\,p_{X,Y}(x, y) = (-1) \times (0.10 + 0.10) + 1 \times (0.05 + 0.15) + 0 = 0$
so
$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0 - 0.25 \times 0 = 0$.
(c) Even though the covariance is 0, $X$ and $Y$ are not independent. For
example, $p_X(1)p_Y(0) = 0.20 \neq 0.25 = p_{X,Y}(1, 0)$. Therefore it is not the
case that $p_{X,Y}(x, y) = p_X(x)p_Y(y)$ for all $x, y$.
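All three parts of this example can be reproduced from the joint table with exact fractions. The sketch stores the joint pf as a dictionary keyed by $(x, y)$; the variable names are chosen for illustration.

```python
from fractions import Fraction as Fr

# Joint pf p(x, y) from the table; keys are (x, y).
joint = {
    (-1, -1): Fr("0.05"), (0, -1): Fr("0.15"), (1, -1): Fr("0.10"),
    (-1, 0): Fr("0.10"),  (0, 0): Fr("0.05"),  (1, 0): Fr("0.25"),
    (-1, 1): Fr("0.10"),  (0, 1): Fr("0.05"),  (1, 1): Fr("0.15"),
}
xs = ys = (-1, 0, 1)

p_X = {x: sum(joint[x, y] for y in ys) for x in xs}   # marginal pf of X
p_Y = {y: sum(joint[x, y] for x in xs) for y in ys}   # marginal pf of Y
p_X_given_Y1 = {x: joint[x, 1] / p_Y[1] for x in xs}  # conditional pf of X | Y = 1

E_X = sum(x * p for x, p in p_X.items())
E_Y = sum(y * p for y, p in p_Y.items())
E_XY = sum(x * y * p for (x, y), p in joint.items())
cov = E_XY - E_X * E_Y

# Independence requires p(x, y) = p_X(x) p_Y(y) for every pair (x, y).
independent = all(joint[x, y] == p_X[x] * p_Y[y] for x in xs for y in ys)
print(p_Y, p_X_given_Y1)
print(E_X, E_Y, E_XY, cov, independent)
```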

Sampling distributions
Random sample from a distribution $f(x; \theta)$: random variables
$X_1, X_2, \ldots, X_n$ which are independent and each has the same
distribution $f(x; \theta)$ (see slide 454)
i.e. $n$ independent and identically distributed (IID) random variables.
A statistic is a function of the variables in the sample which does
not depend on unknown parameters (i.e. its value in a sample can
be calculated when a sample is observed) (see slide 458).
A statistic is a random variable. Its distribution is the sampling
distribution of the statistic (see slide 459).

Sampling distribution of the sample mean
Consider a random sample $X_1, \ldots, X_n$ from a distribution with mean
$E(X_i) = \mu$ and variance $\mathrm{Var}(X_i) = \sigma^2$.
For the sampling distribution of $\bar{X} = \sum_{i=1}^{n} X_i / n$, the mean and
variance are always $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$, respectively.
About the shape of the sampling distribution we know the following:
If the $X_i$s are normally distributed:
$\bar{X} \sim N(\mu, \sigma^2/n)$   (1)
Even when the $X_i$s are not normally distributed, (1) holds approximately
when $n$ is large enough. This is the central limit theorem (CLT)
(see slides 475-477).
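A small simulation illustrates these results. The sketch draws repeated samples of size $n = 50$ from an exponential distribution with mean 1 (so $\mu = \sigma^2 = 1$), which is clearly non-normal; the sample size, number of replications and seed are arbitrary choices. The simulated sample means should have mean close to 1, variance close to $\sigma^2/n = 0.02$, and roughly 95% of them should lie within $1.96\sigma/\sqrt{n}$ of $\mu$, as (1) predicts.

```python
import random
import statistics

random.seed(102)        # fixed seed so the run is reproducible
n, reps = 50, 10_000    # sample size and number of simulated samples

# X_i ~ Exponential with mean 1, so mu = 1 and sigma^2 = 1.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

print(round(statistics.fmean(means), 3))     # close to mu = 1
print(round(statistics.variance(means), 4))  # close to sigma^2 / n = 0.02

# Share of sample means within 1.96 standard errors of mu.
se = (1 / n) ** 0.5
coverage = sum(abs(m - 1) <= 1.96 * se for m in means) / reps
print(coverage)  # roughly 0.95 if the normal approximation (1) is good
```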

Sampling distribution of the sample mean
A common application is when $X_i \sim \mathrm{Bin}(1, \pi)$.
Let $S = \sum_{i=1}^{n} X_i$, which is distributed as $S \sim \mathrm{Bin}(n, \pi)$.
Then $\bar{X} = S/n = \hat{\pi}$ is the sample proportion of observations with
value $X_i = 1$.
The CLT then says that approximately:
$\hat{\pi} \sim N(\pi, \pi(1 - \pi)/n)$.
This also implies that approximately:
$S = n\hat{\pi} \sim N(n\pi, n\pi(1 - \pi))$.

Example question
Question: A random sample of 100 individuals is telephoned and asked
various questions. One of the questions was "Do you consider that in
general the products for sale in AB Stores are of high quality?". Suppose
the percentage in the general population who believe that the products
are of high quality is 30%. Let $S$ denote the number of people in the
sample who answer "Yes" to the question. Evaluate $P(S \le 25)$.
Solution: Here the 100 individual responses are a random sample from
the distribution $X_i \sim \mathrm{Bin}(1, 0.3)$, so $S \sim \mathrm{Bin}(100, 0.3)$.
Let $Y \sim N(100 \times 0.3, 100 \times 0.3 \times 0.7) = N(30, 21)$.
Here it is useful to use a continuity correction in the calculation, so:
$P(S \le 25) \approx P(Y \le 25.5) = P\left(\frac{Y - 30}{\sqrt{21}} \le \frac{25.5 - 30}{\sqrt{21}}\right) = P(Z \le -0.98) = 0.1635$
(where $Z \sim N(0, 1)$), using the table of the standard normal distribution.
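To see why the continuity correction helps, the sketch compares the exact binomial probability $P(S \le 25)$ with the normal approximation both with and without the correction. The corrected value should be noticeably closer to the exact one; the uncorrected value is included only for comparison and is not part of the model solution.

```python
import math
from statistics import NormalDist

n, p, k = 100, 0.3, 25
mu, var = n * p, n * p * (1 - p)   # 30 and 21

# Exact binomial probability P(S <= 25)
exact = sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

# Normal approximation with continuity correction: P(Y <= 25.5)
z = (k + 0.5 - mu) / math.sqrt(var)
approx_cc = NormalDist().cdf(z)

# Without continuity correction, for comparison: P(Y <= 25)
approx_no_cc = NormalDist().cdf((k - mu) / math.sqrt(var))

print(round(z, 2), round(approx_cc, 4), round(exact, 4), round(approx_no_cc, 4))
```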