CVEN2002/2702
Week 10
This lecture
7. Inferences concerning a mean
7.11 Hypothesis tests for a proportion
9. Inferences concerning a difference of means
9.2 Two independent populations
9.3 Paired observations
8. Inferences concerning a variance
8.2 Estimation of a variance
8.3 Confidence interval for a variance
8.4 Hypothesis tests for a variance
Additional reading: Sections 7.3, 7.5 and 8.2 (pp.359-367)
in the textbook
CVEN2002/2702 (Statistics)
Dr Justin Wishart
2 / 53
Revision
In Section 7.8 (Week 8), we explained that, when a
proportion/probability is the population parameter of interest, it can
naturally be estimated from the sample by the sample proportion
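$$\hat P = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
for which
$$\sqrt{n}\,\frac{\hat P - \pi}{\sqrt{\pi(1-\pi)}} \overset{a}{\sim} N(0,1) \qquad \text{(for $n$ large)}$$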
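Since
$$\sqrt{n}\,\frac{\hat P - \pi}{\sqrt{\pi(1-\pi)}} \overset{a}{\sim} N(0,1),$$
a test of $H_0: \pi = \pi_0$
against
$H_a: \pi \neq \pi_0$
at approximate level $\alpha$ is
$$\text{reject } H_0 \text{ if } \hat p \notin \left[\pi_0 - z_{1-\alpha/2}\sqrt{\frac{\pi_0(1-\pi_0)}{n}},\ \pi_0 + z_{1-\alpha/2}\sqrt{\frac{\pi_0(1-\pi_0)}{n}}\right]$$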
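Under $H_0$, the variance of $\hat P$ is $\pi_0(1-\pi_0)/n$, so the z-score of the observed proportion $\hat p$ is
$$z_0 = \sqrt{n}\,\frac{\hat p - \pi_0}{\sqrt{\pi_0(1-\pi_0)}}$$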
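For the one-sided alternatives $H_a: \pi > \pi_0$ or $H_a: \pi < \pi_0$, the decision rules
$$\text{reject } H_0 \text{ if } \hat p > \pi_0 + z_{1-\alpha}\sqrt{\frac{\pi_0(1-\pi_0)}{n}}$$
or
$$\text{reject } H_0 \text{ if } \hat p < \pi_0 - z_{1-\alpha}\sqrt{\frac{\pi_0(1-\pi_0)}{n}}$$
will have approximate significance level $\alpha$
The associated p-values are, respectively,
$$p = 1 - \Phi(z_0) \qquad \text{or} \qquad p = \Phi(z_0)$$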
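The estimate of $\pi$ is
$$\hat p = \frac{48}{60} = 0.80$$
With $z_{0.95} = 1.645$, the rule is: reject $H_0$ if
$$\hat p > 0.70 + 1.645\sqrt{\frac{0.70\,(1-0.70)}{60}} = 0.7973$$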
$$z_0 = \sqrt{60}\,\frac{0.80 - 0.70}{\sqrt{0.70\,(1-0.70)}} = 1.69$$
$$p = 1 - \Phi(1.69) = 0.0455$$
⇒ at level $\alpha = 0.05$, we do reject $H_0$
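The arithmetic of this example (48 successes out of 60, tested against $\pi_0 = 0.70$ with a one-sided "greater" alternative) can be checked with a short sketch. The function name `proportion_z_test` and its `alternative` argument are illustrative choices, not part of the lecture; only the Python standard library is assumed.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, pi0, alternative="greater"):
    """Large-sample z-test for H0: pi = pi0 (hypothetical helper).

    Returns the sample proportion, the z-score z0 and the p-value.
    """
    p_hat = successes / n
    z0 = sqrt(n) * (p_hat - pi0) / sqrt(pi0 * (1 - pi0))
    phi = NormalDist().cdf  # standard normal cdf
    if alternative == "greater":
        p_value = 1 - phi(z0)
    elif alternative == "less":
        p_value = phi(z0)
    else:  # two-sided
        p_value = 2 * (1 - phi(abs(z0)))
    return p_hat, z0, p_value


p_hat, z0, p_value = proportion_z_test(48, 60, 0.70)

# Rejection threshold for the one-sided test at level alpha = 0.05
alpha = 0.05
threshold = 0.70 + NormalDist().inv_cdf(1 - alpha) * sqrt(0.70 * 0.30 / 60)

print(f"p_hat = {p_hat:.2f}, z0 = {z0:.2f}, p = {p_value:.4f}")
print(f"reject H0 if p_hat > {threshold:.4f}")
```

Both views of the test agree: the observed proportion exceeds the threshold exactly when the p-value is below $\alpha$.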
9. Inferences concerning a
difference of means
9.1 Introduction
$H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 \neq \mu_2$ (two-sided alternative)
or against $H_a: \mu_1 > \mu_2$ or $H_a: \mu_1 < \mu_2$ (one-sided alternatives)
Session 2, 2012 - Week 10
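$$\bar X_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1i} \overset{(a)}{\sim} N\!\left(\mu_1, \frac{\sigma_1}{\sqrt{n_1}}\right) \quad \text{and} \quad \bar X_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} X_{2i} \overset{(a)}{\sim} N\!\left(\mu_2, \frac{\sigma_2}{\sqrt{n_2}}\right)$$
($\overset{(a)}{\sim}$ means that these are exact results for any $n_1$, $n_2$ if the populations are normal, approximate results for large $n_1$, $n_2$ if they are not)
We also know (Slide 20 Week 6) that if $X_1 \sim N(\mu_1, \sigma_1)$ and $X_2 \sim N(\mu_2, \sigma_2)$ are independent, then
$$aX_1 + bX_2 \sim N\!\left(a\mu_1 + b\mu_2,\ \sqrt{a^2\sigma_1^2 + b^2\sigma_2^2}\right)$$
⇒ we deduce the sampling distribution of $\bar X_1 - \bar X_2$:
$$\bar X_1 - \bar X_2 \overset{(a)}{\sim} N\!\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$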
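From the observed samples, compute
$$\bar x_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i} \quad \text{and} \quad \bar x_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}$$
$$\text{reject } H_0 \text{ if } \bar x_1 - \bar x_2 \notin \left[-z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right]$$
(interval obviously centred at 0 by $H_0$)
The associated p-value is given by $p = 2\,(1 - \Phi(|z_0|))$
where $z_0$ is the z-score of $\bar x_1 - \bar x_2$ if $\mu_1 - \mu_2 = 0$, i.e.
$$z_0 = \frac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$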
$$z_0 = \frac{1.5}{\sqrt{\dfrac{1.3^2}{10} + \dfrac{1.3^2}{10}}} = 2.58$$
$$p = 1 - \Phi(2.58) = 0.0049$$
⇒ adding the new ingredient significantly reduces the drying time
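As a quick check of the drying-time figures, the sketch below recomputes the z-score and the one-sided p-value from the difference of sample means (1.5), the known standard deviations (1.3 for both populations) and the sample sizes (10 each). The helper name `two_sample_z` is hypothetical; only the standard library is assumed.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(diff, sigma1, sigma2, n1, n2):
    """z-score of an observed difference of means under H0: mu1 = mu2,
    with known population standard deviations (hypothetical helper)."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return diff / standard_error


z0 = two_sample_z(1.5, 1.3, 1.3, 10, 10)
p_one_sided = 1 - NormalDist().cdf(z0)             # Ha: mu1 > mu2
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z0)))  # Ha: mu1 != mu2

print(f"z0 = {z0:.2f}, one-sided p = {p_one_sided:.4f}")
```

The two-sided p-value is simply twice the one-sided one here, since the normal distribution is symmetric.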
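We note that $\bar X_1 - \bar X_2 \overset{(a)}{\sim} N\!\left(\mu_1 - \mu_2, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$, so
$$1 - \alpha = P\left(-z_{1-\alpha/2} \le \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \le z_{1-\alpha/2}\right)$$
$$= P\left(\mu_1 - \mu_2 \in \left[(\bar X_1 - \bar X_2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right]\right)$$
⇒ a two-sided $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is
$$\left[(\bar x_1 - \bar x_2) - z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar x_1 - \bar x_2) + z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right]$$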
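A minimal sketch of this interval, assuming known standard deviations. For illustration only, it reuses the numbers of the drying-time example (difference 1.5, $\sigma_1 = \sigma_2 = 1.3$, $n_1 = n_2 = 10$); the resulting interval is not reported in the lecture, and the function name is a hypothetical choice.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(diff, sigma1, sigma2, n1, n2, alpha=0.05):
    """Two-sided 100(1 - alpha)% confidence interval for mu1 - mu2
    when sigma1 and sigma2 are known (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half_width = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return diff - half_width, diff + half_width


lower, upper = z_confidence_interval(1.5, 1.3, 1.3, 10, 10)
print(f"95% CI for mu1 - mu2: [{lower:.4f}, {upper:.4f}]")
```

The interval is centred at the observed difference, whereas the acceptance region of the test is centred at 0.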
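In the one-sample case:
$$\sqrt{n}\,\frac{\bar X - \mu}{\sigma} \sim N(0,1) \quad + \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2 \quad \Rightarrow \quad \sqrt{n}\,\frac{\bar X - \mu}{S} \sim t_{n-1}$$
Similarly, for two samples with a common standard deviation $\sigma$:
$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1) \quad + \quad S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$$
$$\Rightarrow \quad \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \overset{(a)}{\sim} t_{n_1+n_2-2}$$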
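$$\bar x_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i} \quad \text{and} \quad \bar x_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}$$
and
$$s_1 = \sqrt{\frac{1}{n_1-1}\sum_{i=1}^{n_1}(x_{1i} - \bar x_1)^2} \quad \text{and} \quad s_2 = \sqrt{\frac{1}{n_2-1}\sum_{i=1}^{n_2}(x_{2i} - \bar x_2)^2}$$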
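$$\text{reject } H_0 \text{ if } \bar x_1 - \bar x_2 \notin \left[-t_{n_1+n_2-2;1-\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ t_{n_1+n_2-2;1-\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right]$$
The observed value of the test statistic is
$$t_0 = \frac{\bar x_1 - \bar x_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$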
$$\left[(\bar x_1 - \bar x_2) - t_{n_1+n_2-2;1-\alpha}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ +\infty\right)$$
[Figure: normal quantile plots (Sample Quantiles against Theoretical Quantiles), one panel for each of the two samples]
With the data we have, we easily find the observed pooled standard deviation
$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{7 \times 2.39^2 + 7 \times 2.98^2}{8+8-2}} = 2.70$$
The rule is: reject $H_0$ if
$$\bar x_1 - \bar x_2 \notin \left[-2.145 \times 2.70\sqrt{\frac{1}{8} + \frac{1}{8}},\ 2.145 \times 2.70\sqrt{\frac{1}{8} + \frac{1}{8}}\right] = [-2.895, 2.895]$$
Here, $\bar x_1 - \bar x_2 = 0.478$
⇒ do not reject $H_0$!
Conclusion: at the 0.05 level of significance, we do not have enough evidence to conclude that catalyst 2 results in a mean yield that differs from the mean yield when catalyst 1 is used
⇒ cheaper catalyst 2 can be used without (significantly) affecting the mean process yield
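The observed value of the test statistic is
$$t_0 = \frac{\bar x_1 - \bar x_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{0.478}{2.70\sqrt{\dfrac{1}{8} + \dfrac{1}{8}}} = 0.35,$$
so that
$$p = 2\,P(T > 0.35) = 2 \times 0.365 = 0.73, \qquad \text{for } T \sim t_{14}$$
⇒ with such a large p-value, it would be very risky to claim that the yield is significantly affected
A 95% confidence interval can also be derived for $\mu_1 - \mu_2$:
$$\left[0.478 - 2.145 \times 2.70\sqrt{\frac{1}{8} + \frac{1}{8}},\ 0.478 + 2.145 \times 2.70\sqrt{\frac{1}{8} + \frac{1}{8}}\right] = [-2.418, 3.374]$$
Of course, 0 belongs to this interval of plausible values for $\mu_1 - \mu_2$ (why?)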
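The catalyst calculations can be retraced from the summary statistics quoted above ($n_1 = n_2 = 8$, $s_1 = 2.39$, $s_2 = 2.98$, observed difference 0.478, table value $t_{14;0.975} = 2.145$). This is only a sketch: the helper name is hypothetical, the critical value is typed in from the table rather than computed (the Python standard library has no t-distribution), and tiny discrepancies with the text come from the text rounding $s_p$ to 2.70.

```python
from math import sqrt


def pooled_sd(s1, s2, n1, n2):
    """Pooled standard deviation s_p of two samples (hypothetical helper)."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


n1 = n2 = 8
s1, s2 = 2.39, 2.98
diff = 0.478      # observed x1_bar - x2_bar, as quoted in the text
t_crit = 2.145    # t_{14; 0.975}, read from the table

sp = pooled_sd(s1, s2, n1, n2)
standard_error = sp * sqrt(1 / n1 + 1 / n2)
t0 = diff / standard_error

half_width = t_crit * standard_error
reject_h0 = abs(diff) > half_width                 # two-sided test, alpha = 0.05
ci = (diff - half_width, diff + half_width)        # 95% CI for mu1 - mu2

print(f"sp = {sp:.2f}, t0 = {t0:.2f}, reject H0: {reject_h0}")
print(f"95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The test and the interval use the same half-width, which is why 0 lies in the confidence interval exactly when $H_0$ is not rejected.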
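Degrees of freedom when the two variances are not assumed equal:
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}$$
Estimated standard error of $\bar x_1 - \bar x_2$:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$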
against
$H_a: \sigma_1^2 \neq \sigma_2^2$
Sample means: $\bar x_1 = 51.71$, $\bar x_2 = 136.14$
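$$\nu = \frac{\left(\dfrac{3.59^2}{10} + \dfrac{0.79^2}{10}\right)^2}{\dfrac{(3.59^2/10)^2}{9} + \dfrac{(0.79^2/10)^2}{9}} = 9.87$$
From the table, we can find $t_{9;0.975} = 2.262$, so a 95% confidence interval is
$$\left[\bar x_1 - \bar x_2 \pm t_{\nu;1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right] = \left[51.71 - 136.14 \pm 2.262\sqrt{\frac{3.59^2}{10} + \frac{0.79^2}{10}}\right] = [-87.06, -81.80]$$
⇒ we can be 95% confident that the true average permeability for acetate fabric exceeds that for cotton by between 81.80 and 87.06 cm³/cm²/sec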
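The degrees of freedom and the interval of the permeability example can be recomputed with the sketch below. It assumes the unequal-variance (Welch-Satterthwaite) approximation, which is what the value 9.87 corresponds to; the function name `welch_df` is hypothetical, and the critical value 2.262 is typed in from the table quoted above rather than computed.

```python
from math import floor, sqrt


def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the two-sample t procedure
    with unequal variances (hypothetical helper)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


s_a, s_b, n = 3.59, 0.79, 10   # the two sample standard deviations, n = 10 each
nu = welch_df(s_a, n, s_b, n)  # symmetric in the two samples
df_for_table = floor(nu)       # round down before using the table

t_crit = 2.262                 # t_{9; 0.975}, read from the table
diff = 51.71 - 136.14          # x1_bar - x2_bar
half_width = t_crit * sqrt(s_a**2 / n + s_b**2 / n)
ci = (diff - half_width, diff + half_width)

print(f"nu = {nu:.2f} -> use {df_for_table} d.f.")
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Both the degrees of freedom and the standard error are symmetric in the two samples, so the order in which the standard deviations are passed does not affect the result.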
Paired observations
In the application of the two-sample t-test we need to be certain the
two populations (and thus the two random samples) are independent
⇒ this test cannot be used when we deal with "before and after" data, the ages of husbands and wives, and numerous situations where the data are naturally paired (and thus, not independent!)
Let $(X_{11}, X_{12}), (X_{21}, X_{22}), \ldots, (X_{n1}, X_{n2})$ be a random sample of $n$ pairs of observations drawn from two subpopulations $X_1$ and $X_2$, with respective means $\mu_1$ and $\mu_2$
Because $X_{i1}$ and $X_{i2}$ share some common information, they are certainly not independent, but they can be represented as
$$X_{i1} = W_i + Y_{i1}, \qquad X_{i2} = W_i + Y_{i2},$$
An easy way to get rid of the dependence implied by $W_i$ is just to consider the differences
$$D_i = X_{i1} - X_{i2} = (W_i + Y_{i1}) - (W_i + Y_{i2}) = Y_{i1} - Y_{i2}$$
⇒ we have just a sample of independent observations $D_1, D_2, \ldots, D_n$, one for each pair, drawn from a distribution with mean
$$\mu_D = \mu_1 - \mu_2$$
⇒ testing for $H_0: \mu_1 = \mu_2$ is exactly equivalent to testing for
$$H_0: \mu_D = 0$$
This can be accomplished by performing the usual one-sample t-test (Slides 33-34, Week 9) (or a large-sample test, Slides 39-40 Week 9) on $\mu_D$, from the observed sample of differences
Note: the test will be performed on the sample of differences only
⇒ check if the population of differences is normal or not (the initial distributions of $X_1$ and $X_2$ do not matter)
Pair      1    2    3    4    5    6    7    8    9   10
Before   47   73   46  124   33   58   83   32   26   17
After    36   60   44  119   35   51   77   29   26   11
Define $\mu_1$ the true mean weekly loss before the safety program was put into operation, and $\mu_2$ the true mean weekly loss after the safety program was put into operation. We would like to test:
$H_0: \mu_1 = \mu_2$
against
$H_a: \mu_1 > \mu_2$
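A paired test reduces to a one-sample t-test on the differences. The sketch below applies that recipe to the ten before/after pairs exactly as they are listed in the table above, so its outputs depend on that transcription. The critical value $t_{9;0.95} = 1.833$ is a standard table value typed in by hand (an assumption, it is not quoted in the text), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weekly losses before and after the safety program, as listed in the table
before = [47, 73, 46, 124, 33, 58, 83, 32, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 26, 11]

# One difference per pair: the common component of each pair cancels out
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample standard deviation (divisor n - 1)
t0 = sqrt(n) * d_bar / s_d

t_crit = 1.833    # t_{9; 0.95} from a standard t table (assumed value)
reject_h0 = t0 > t_crit   # one-sided test of H0: mu_D = 0 vs Ha: mu_D > 0

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t0 = {t0:.3f}, reject H0: {reject_h0}")
```

Only the list of differences enters the test statistic; the individual before and after values play no further role.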
d
5.2
n = 10
= 3.347
s
4.296
8. Inferences concerning a
variance
8.1 Introduction
Estimation of a variance
In Chapter 7, there were several instances where we estimated a
population standard deviation by means of a sample standard
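deviation (e.g. in the derivation of the t-confidence interval for $\mu$)
The sample variance of a random sample $\{X_1, X_2, \ldots, X_n\}$ with mean $\bar X$ is given by
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2,$$
Expanding around $\mu$:
$$\sum_{i=1}^{n}(X_i - \bar X)^2 = \sum_{i=1}^{n}(X_i - \mu)^2 + n(\mu - \bar X)^2 + 2(\mu - \bar X)\sum_{i=1}^{n}(X_i - \mu)$$
$$= \sum_{i=1}^{n}(X_i - \mu)^2 + n(\bar X - \mu)^2 - 2n(\bar X - \mu)^2 = \sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar X - \mu)^2$$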
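We know that $\text{Var}(X_i) = E((X_i - \mu)^2) = \sigma^2$ and $\text{Var}(\bar X) = E((\bar X - \mu)^2) = \dfrac{\sigma^2}{n}$, hence
$$E\left(\sum_{i=1}^{n}(X_i - \bar X)^2\right) = n\sigma^2 - n\,\frac{\sigma^2}{n} = (n-1)\,\sigma^2$$
and thus
$$E(S^2) = \frac{1}{n-1}\,E\left(\sum_{i=1}^{n}(X_i - \bar X)^2\right) = \frac{(n-1)\,\sigma^2}{n-1} = \sigma^2$$
⇒ $S^2$ is an unbiased estimator of $\sigma^2$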
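The result $E(S^2) = \sigma^2$ can be verified exactly on a toy case. The sketch below assumes a hypothetical population in which the values 1, 2, 3, 4 are equally likely, enumerates every possible sample of size 3 drawn independently, and averages the sample variance over all of them using exact fractions. Dividing by $n$ instead of $n-1$ is included for contrast.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4]   # hypothetical population, each value equally likely
n = 3                   # sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)  # population variance


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    x_bar = Fraction(sum(xs), len(xs))
    return sum((x - x_bar) ** 2 for x in xs)


# All 4^3 equally likely samples (independent draws)
samples = list(product(values, repeat=n))

mean_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(f"sigma^2 = {sigma2}, E(S^2) = {mean_s2}, E(divisor-n version) = {mean_biased}")
```

The divisor $n-1$ recovers $\sigma^2$ exactly, while the divisor $n$ underestimates it by the factor $(n-1)/n$.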
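The $\chi^2$-distribution
A random variable, say $X$, is said to follow the chi-square distribution with $\nu$ degrees of freedom, i.e.
$$X \sim \chi^2_\nu$$
if its probability density function is given by
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2-1} e^{-x/2} \qquad \text{for } x > 0$$
⇒ $S_X = [0, +\infty)$
where $\Gamma$ is the gamma function, $\Gamma(y) = \int_0^{+\infty} x^{y-1} e^{-x}\,dx$ for $y > 0$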
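To make the density concrete, here is a small sketch that evaluates the $\chi^2$ pdf with `math.gamma` and approximates the cdf by composite Simpson integration. The function names and the number of integration steps are illustrative choices, and the numerical cdf assumes $\nu \ge 2$ so that the density is finite at 0. It can be used to check table values such as $\chi^2_{14;0.95} = 23.68$.

```python
from math import exp, gamma


def chi2_pdf(x, nu):
    """Density of the chi-square distribution with nu degrees of freedom."""
    if x < 0:
        return 0.0
    return x ** (nu / 2 - 1) * exp(-x / 2) / (2 ** (nu / 2) * gamma(nu / 2))


def chi2_cdf(x, nu, steps=2000):
    """F(x) = P(X <= x), approximated by composite Simpson's rule on [0, x].

    Assumes nu >= 2 and an even number of steps.
    """
    h = x / steps
    total = chi2_pdf(0.0, nu) + chi2_pdf(x, nu)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * chi2_pdf(k * h, nu)
    return total * h / 3


prob = chi2_cdf(23.68, 14)
print(f"P(X <= 23.68) for 14 d.f. is approximately {prob:.3f}")
```

For $\nu = 2$ the cdf has the closed form $1 - e^{-x/2}$, which gives an independent check of the integration.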
[Figure: pdf $f(x)$ and cdf $F(x)$ of the $\chi^2$-distribution with 2, 5 and 10 d.f.]
It can be shown that the mean and the variance of the $\chi^2_\nu$-distribution are
$$E(X) = \nu \quad \text{and} \quad \text{Var}(X) = 2\nu$$
Note that a $\chi^2$-distributed random variable is nonnegative, as expected, and the distribution is skewed to the right
However, as $\nu$ increases, the distribution becomes more and more symmetric
In fact, it can be shown that the standardised $\chi^2$-distribution with $\nu$ degrees of freedom approaches the standard normal distribution as $\nu \to \infty$
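Let $\chi^2_{\nu;\alpha}$ be the value such that $P(X \le \chi^2_{\nu;\alpha}) = \alpha$, for $X \sim \chi^2_\nu$
Careful! unlike the standard normal distribution (or the t-distribution), the $\chi^2$-distribution is not symmetric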
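Since
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$
we can write $P\left(\chi^2_{n-1;\alpha/2} \le \dfrac{(n-1)S^2}{\sigma^2} \le \chi^2_{n-1;1-\alpha/2}\right) = 1 - \alpha$, which can be rearranged as
$$P\left(\frac{(n-1)S^2}{\chi^2_{n-1;1-\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{n-1;\alpha/2}}\right) = 1 - \alpha$$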
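The rearranged inequality translates directly into a confidence interval for $\sigma^2$. The sketch below uses purely hypothetical data ($n = 15$ observations with sample variance $s^2 = 0.30$) and the standard table values $\chi^2_{14;0.025} = 5.629$ and $\chi^2_{14;0.975} = 26.119$, none of which come from the lecture. The quantiles are passed in by hand because the Python standard library has no $\chi^2$ quantile function.

```python
def variance_confidence_interval(s2, n, chi2_lower, chi2_upper):
    """Two-sided confidence interval for sigma^2 of a normal population.

    chi2_lower = chi^2_{n-1; alpha/2} and chi2_upper = chi^2_{n-1; 1-alpha/2}
    must be read from a table (hypothetical helper).
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


# Hypothetical sample: n = 15, s^2 = 0.30; 95% interval with 14 d.f.
lower, upper = variance_confidence_interval(0.30, 15, 5.629, 26.119)
print(f"95% CI for sigma^2: [{lower:.4f}, {upper:.4f}]")
```

Note that the larger quantile gives the lower bound, and that the interval is not symmetric around $s^2$ because the $\chi^2$-distribution is skewed.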
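$$\alpha = P\left(S^2 \notin [\ell, u] \text{ when } \sigma^2 = \sigma_0^2\right) = P\left(\frac{(n-1)S^2}{\sigma_0^2} \notin \left[\frac{(n-1)\,\ell}{\sigma_0^2}, \frac{(n-1)\,u}{\sigma_0^2}\right]\right)$$
$$\Rightarrow \quad \ell = \frac{\chi^2_{n-1;\alpha/2}\,\sigma_0^2}{n-1} \quad \text{and} \quad u = \frac{\chi^2_{n-1;1-\alpha/2}\,\sigma_0^2}{n-1}$$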
The rejection threshold is
$$\frac{\chi^2_{n-1;1-\alpha}\,\sigma_0^2}{n-1}$$
Here, $n = 15$, and we can find on the table that $\chi^2_{14;0.95} = 23.68$, so the rule is
$$\text{reject } H_0 \text{ if } s^2 > \frac{23.68 \times 0.25}{14} = 0.4229$$
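The threshold above is a one-line computation. The sketch below reproduces it with $\sigma_0^2 = 0.25$, $n = 15$ and the table value $\chi^2_{14;0.95} = 23.68$ quoted in the text; the function name and the sample variance used in the decision step are hypothetical.

```python
def variance_upper_threshold(chi2_quantile, sigma0_sq, n):
    """Threshold above which s^2 leads to rejecting H0: sigma^2 = sigma0^2
    in a one-sided (upper) test; the quantile comes from a table."""
    return chi2_quantile * sigma0_sq / (n - 1)


threshold = variance_upper_threshold(23.68, 0.25, 15)
print(f"reject H0 if s^2 > {threshold:.4f}")

# Decision for a hypothetical observed sample variance
s2_observed = 0.50
reject_h0 = s2_observed > threshold
```

Equivalently, one can compare $(n-1)s^2/\sigma_0^2$ directly with the quantile 23.68; the two forms of the rule always agree.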
Objectives
Now you should be able to:
- test hypotheses on a population proportion
- test hypotheses and construct confidence intervals on the variance of a normal population
- structure comparative experiments involving two samples as hypothesis tests
- test hypotheses and construct confidence intervals on the difference in means of two independent populations
- test hypotheses and construct confidence intervals on the difference in means of two paired (sub)populations
Recommended exercises: Q25(a-b) p.311, Q31 p.312, Q47, Q49 p.326, Q51(b), Q53 p.327, Q75 p.341, Q25, Q27 p.368, Q29, Q31 p.369, Q35, Q37 p.370