Stat 491 Chapter 8 - Hypothesis Testing - Two Sample Inference

Introduction
Inference about $\mu_1 - \mu_2$: Paired Samples
Inference about $\mu_1 - \mu_2$: Independent Samples
Stat 491: Biostatistics

Chapter 8: Hypothesis Testing: Two-Sample Inference
Solomon W. Harrar
The University of Montana
Fall 2012
Introduction
Two-Sample Inference
In Chapters 6 and 7, we had only one sample.
The underlying mean $\mu$ (or proportion $p$) of the population from which the sample was drawn was compared with the known mean (or prevalence rate) of the general population.
Example: the mean cholesterol of Asian immigrants was compared with the general US mean cholesterol, known to be 190 mg/dL.
In this chapter, we have two samples, each from a different population.
Interest lies in comparing the unknown underlying means of the two populations.
Randomized Clinical Trials (RCT)

Patients are assigned to treatments by a random mechanism.
If the sample sizes are large, we expect the types of patients assigned to the different treatment modalities to be similar.
If the sample sizes are small, the patient characteristics of the treatment groups may not be comparable.
A table of characteristics of the treatment groups is customarily presented to check that the randomization worked well.
Design features of RCTs:
Randomization: complete, block, cluster (group), or stratified (by age, sex, or overall clinical condition).
Blinding: single, double, triple, or unblinded.
Example: Greek Health Project
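As a rough illustration of complete versus block randomization, the R sketch below assigns 12 hypothetical patients to two hypothetical treatments "A" and "B". Block randomization permutes a balanced set of labels within each block of 4, so the groups stay balanced after every block; complete randomization assigns each patient independently, so the group sizes can differ. The seed, block size, and labels are arbitrary choices made only for this sketch.

```r
# Hypothetical example: 12 patients, treatments "A" and "B", blocks of size 4
set.seed(491)                       # arbitrary seed, for reproducibility only
n_patients <- 12
block_size <- 4
n_blocks   <- n_patients / block_size

# Block randomization: within each block, randomly permute 2 A's and 2 B's
block_assign <- unlist(lapply(seq_len(n_blocks), function(b) {
  sample(rep(c("A", "B"), each = block_size / 2))
}))

# Complete randomization: each patient independently gets A or B
complete_assign <- sample(c("A", "B"), n_patients, replace = TRUE)

table(block_assign)      # always 6 A's and 6 B's
table(complete_assign)   # group sizes may be unequal
```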

Two Types of Samples

Paired samples: each data point in one sample is matched and related to a unique data point in the other sample.
Independent samples: the data points in one sample are unrelated to the data points in the other sample.
Example: Suppose we are interested in studying the association between oral contraceptive (OC) use and blood pressure.
One can start with non-OC-using women of childbearing age (16-49 years) and follow them for one year. For those who start using OCs within the one-year period, compare blood pressure at baseline and at follow-up (paired samples).
Alternatively, one can identify a group of OC users and a group of non-users and compare their blood pressures (independent samples).
Paired Samples Arise When

The same set of experimental units receives both treatments (cross-over design).
Measurements are taken on each unit before and after treatment (repeated-measures design); there is no randomization.
Subjects are matched (matched-pair design):
using naturally occurring pairs such as twins or husbands and wives;
matching with respect to extraneous factors that may mask differences between the treatments;
block randomization;
matched case-control study (observational study).
Paired or Independent Samples?

In repeated measures, each subject serves as their own control. This design may benefit from having a control group, as it allows other factors that may cause changes between the two time points to be ruled out.
In matching, extraneous factors are expected to influence both members of a pair equally.
Hence, a paired design is definitive in the sense that, if a difference is present, it is highly likely that it occurred because of the difference in treatment.
Differences between independent samples are only suggestive: differences among the subjects may mask true treatment or group differences.
A paired design is sometimes NOT practical and is usually expensive.
Inference about $\mu_1 - \mu_2$: Paired Samples
Paired t Test
Let $\mu_d = \mu_1 - \mu_2$.
Let $n$ denote the number of pairs of measurements in the sample.
Let $d_i$ denote the difference between the first and second measurements in the $i$th pair.
Assumption: $d_1, d_2, \ldots, d_n$ constitute a random sample from a normally distributed population with mean $\mu_d$ and unknown variance $\sigma_d^2$.
We can look at a Q-Q plot and a box plot of the $d_i$'s to check for violations of the normality assumption.
Compute
$$\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i \quad\text{and}\quad s_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar d)^2}{n-1}}.$$
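The two summary statistics above can be computed directly from their definitions and checked against R's built-in mean() and sd() (which uses the divisor $n-1$). The paired measurements below are made-up numbers for illustration only.

```r
# Hypothetical paired measurements (first and second measurement in each pair)
x1 <- c(120, 132, 118, 141, 125)
x2 <- c(115, 130, 121, 133, 119)
d  <- x1 - x2                 # d_i = first minus second measurement
n  <- length(d)               # number of pairs

d_bar <- sum(d) / n                           # mean of the differences
s_d   <- sqrt(sum((d - d_bar)^2) / (n - 1))   # SD of the differences

c(d_bar = d_bar, s_d = s_d)
c(mean(d), sd(d))             # built-in versions give the same values
```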
The Paired t-test

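Hypotheses:
Case 1. $H_0: \mu_d = 0$ vs $H_a: \mu_d > 0$
Case 2. $H_0: \mu_d = 0$ vs $H_a: \mu_d < 0$
Case 3. $H_0: \mu_d = 0$ vs $H_a: \mu_d \ne 0$
T.S.: $t = \dfrac{\bar d}{s_d/\sqrt{n}}$
R.R.: For a specified value of $\alpha$,
Case 1. Reject $H_0$ if $t \ge t_{n-1,\,1-\alpha}$.
Case 2. Reject $H_0$ if $t \le -t_{n-1,\,1-\alpha}$.
Case 3. Reject $H_0$ if $|t| \ge t_{n-1,\,1-\alpha/2}$.
p-Value:
Case 1. $P(t_{n-1} > t_{\text{computed}})$
Case 2. $P(t_{n-1} < t_{\text{computed}})$
Case 3. $2P(t_{n-1} > |t_{\text{computed}}|)$ for the two-sided test.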
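A minimal R sketch of the paired t test: the test statistic, the three p-values, and the two-sided rejection rule are computed by hand and cross-checked against t.test(). The differences are hypothetical; with the two original vectors x1 and x2, t.test(x1, x2, paired = TRUE) gives the same statistic and p-value as t.test(x1 - x2).

```r
d <- c(5, 2, -3, 8, 6, 1, 4)     # hypothetical differences d_i
n <- length(d)
df <- n - 1
alpha <- 0.05

t_stat <- mean(d) / (sd(d) / sqrt(n))           # t = d_bar / (s_d / sqrt(n))

p_greater   <- pt(t_stat, df, lower.tail = FALSE)          # Case 1: Ha: mu_d > 0
p_less      <- pt(t_stat, df)                              # Case 2: Ha: mu_d < 0
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE) # Case 3: Ha: mu_d != 0

# Case 3 rejection rule: reject H0 if |t| >= t_{n-1, 1-alpha/2}
reject_two_sided <- abs(t_stat) >= qt(1 - alpha / 2, df)

# Cross-check with R's built-in test
res_two <- t.test(d, mu = 0, alternative = "two.sided")
res_gt  <- t.test(d, mu = 0, alternative = "greater")
c(t = t_stat, p = p_two_sided)
c(res_two$statistic, p = res_two$p.value)
```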
Confidence Interval for $\mu_d$

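A $100(1-\alpha)\%$ two-sided confidence interval estimate of the size of the difference ($\mu_d$) is
$$\bar d \pm t_{n-1,\,1-\alpha/2}\,\frac{s_d}{\sqrt{n}}.$$
A $100(1-\alpha)\%$ lower-sided confidence interval for the size of the difference ($\mu_d$) is
$$\mu_d < \bar d + t_{n-1,\,1-\alpha}\,\frac{s_d}{\sqrt{n}}.$$
A $100(1-\alpha)\%$ upper-sided confidence interval for the size of the difference ($\mu_d$) is
$$\mu_d > \bar d - t_{n-1,\,1-\alpha}\,\frac{s_d}{\sqrt{n}}.$$
If $n$ is large, the z test can be used instead, and normality is not needed.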
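The intervals above can be sketched in R as follows (hypothetical differences again). The finite endpoint of the lower-sided interval, $\bar d + t_{n-1,1-\alpha}\,s_d/\sqrt{n}$, is the upper end of the interval R reports for alternative = "less", and the finite endpoint of the upper-sided interval is the lower end of the interval reported for alternative = "greater".

```r
d <- c(5, 2, -3, 8, 6, 1, 4)      # hypothetical differences d_i
n <- length(d)
alpha <- 0.05
se <- sd(d) / sqrt(n)             # estimated standard error of d_bar

# Two-sided 100(1 - alpha)% CI
ci_two <- mean(d) + c(-1, 1) * qt(1 - alpha / 2, n - 1) * se

# One-sided limits
lower_sided_limit <- mean(d) + qt(1 - alpha, n - 1) * se   # interval (-Inf, limit)
upper_sided_limit <- mean(d) - qt(1 - alpha, n - 1) * se   # interval (limit, Inf)

ci_two
t.test(d, conf.level = 1 - alpha)$conf.int                 # same two-sided interval
```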
Example: Nutrition
An important hypothesis in hypertension research is that sodium restriction may lower blood pressure. However, it is difficult to achieve sodium restriction over the long term, and dietary counseling in a group setting is sometimes used to achieve this goal. Data on overnight urinary sodium excretion (mEq/8 hr) were obtained on eight individuals enrolled in a sodium-restricted group, at baseline and after one week of dietary counseling ($\bar d = 1.14$ and $s_d = 12.22$).
Overnight urinary sodium excretion (mEq/8 hr); d_i = Baseline - Week 1

Person   Baseline   Week 1     d_i
1          7.85       9.59    -1.74
2         12.03      34.50   -22.47
3         21.84       4.55    17.29
4         13.94      20.78    -6.84
5         16.68      11.69     4.99
6         41.78      32.51     9.27
7         14.97       5.46     9.51
8         12.072     12.95    -0.88
Test the appropriate hypothesis and report the p-value. Construct a 95% CI for the true mean change in overnight sodium excretion over a one-week period. Verify the validity of the normality assumption.
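One way to carry out this example in R is sketched below, with the data typed in from the table above. The two-sided alternative is an assumption; if the research hypothesis is specifically that sodium excretion decreases (baseline minus week 1 greater than 0), alternative = "greater" would be used instead. The Q-Q plot and box plot of the differences address the normality check.

```r
# Overnight urinary sodium excretion (mEq/8 hr), from the table above
baseline <- c(7.85, 12.03, 21.84, 13.94, 16.68, 41.78, 14.97, 12.072)
week1    <- c(9.59, 34.50, 4.55, 20.78, 11.69, 32.51, 5.46, 12.95)
d <- baseline - week1                 # d_i = baseline - week 1

c(d_bar = mean(d), s_d = sd(d))       # compare with 1.14 and 12.22

# Paired t test of H0: mu_d = 0 (two-sided) with a 95% CI for mu_d
res <- t.test(baseline, week1, paired = TRUE, conf.level = 0.95)
res

# Normality check of the differences
qqnorm(d); qqline(d)
boxplot(d, main = "Baseline - Week 1")
```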
Power Analysis and Sample-Size Estimation

Note that $d_i = x_{1i} - x_{2i}$, where $x_{1i}$ and $x_{2i}$ are the measurements on the $i$th subject at baseline and follow-up, respectively.
We assumed that $d_1, \ldots, d_n$ constitute a random sample from $N(\mu_d, \sigma_d^2)$.
If we can get a good working estimate of $\sigma_d$ from a previous, pilot, or reproducibility study, we can use the power and sample-size formulas from the one-sample problem here.
More specifically, for the two-sided alternative,
$$\mathrm{PWR}(\mu_d) \approx P\left(Z \le -z_{1-\alpha/2} + \frac{|\mu_d|}{\sigma_d/\sqrt{n}}\right)$$
and
$$n = \frac{\sigma_d^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\mu_d^2}.$$
For a one-sided test, replace $\alpha/2$ with $\alpha$; the power formula is then exact.
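The two formulas translate directly into R. The planning values below ($\sigma_d = 12$, $\mu_d = 5$, $\alpha = 0.05$, power 0.80) are hypothetical, and the helper names paired_power() and paired_n() are made up for this sketch.

```r
# Hypothetical helper: approximate power of the two-sided paired test
paired_power <- function(mu_d, sigma_d, n, alpha = 0.05) {
  pnorm(-qnorm(1 - alpha / 2) + abs(mu_d) / (sigma_d / sqrt(n)))
}

# Hypothetical helper: number of pairs needed for power 1 - beta, rounded up
paired_n <- function(mu_d, sigma_d, alpha = 0.05, beta = 0.20) {
  ceiling(sigma_d^2 * (qnorm(1 - alpha / 2) + qnorm(1 - beta))^2 / mu_d^2)
}

# Hypothetical planning values
n_pairs <- paired_n(mu_d = 5, sigma_d = 12)
n_pairs
paired_power(mu_d = 5, sigma_d = 12, n = n_pairs)   # at least 0.80
```

Because $n$ is rounded up, the achieved power is slightly above the target, while one pair fewer falls just short of it.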

Power Analysis and Sample-Size Estimation Contd...

However, caution is needed when using an estimate of $\sigma_d$ from a previous study, particularly in longitudinal studies.
Recall that
$$\sigma_d^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2,$$
where $\rho$ is the correlation between $X_1$ and $X_2$.
Thus $\sigma_d^2$ depends on the correlation $\rho$.
The correlation typically decreases as the time separation increases.
To use $\sigma_d$ from a previous study, we have to make sure that the time separation between baseline and follow-up in the previous study and in the planned study is about the same.
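To see how strongly the correlation drives $\sigma_d$, the sketch below evaluates $\sigma_d^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$ for a few hypothetical correlations with $\sigma_1 = \sigma_2 = 15$, together with the number of pairs each would require (two-sided $\alpha = 0.05$, power 0.80, $\mu_d = 5$; all values hypothetical).

```r
sigma1 <- 15; sigma2 <- 15          # hypothetical SDs at baseline and follow-up
rho <- c(0.9, 0.7, 0.5, 0.3)       # correlation, weakening with longer follow-up

sigma_d <- sqrt(sigma1^2 + sigma2^2 - 2 * rho * sigma1 * sigma2)

# Pairs needed to detect mu_d = 5 (two-sided alpha = 0.05, power = 0.80)
n_needed <- ceiling(sigma_d^2 * (qnorm(0.975) + qnorm(0.80))^2 / 5^2)

data.frame(rho, sigma_d = round(sigma_d, 2), n_needed)
```

Using a $\sigma_d$ estimated over a short time separation for a study with a longer one would therefore understate the required sample size.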
Inference about $\mu_1 - \mu_2$: Independent Samples
Background
Notation: Let us denote the population means and standard deviations of the two populations as
Population 1: $\mu_1$ and $\sigma_1$
Population 2: $\mu_2$ and $\sigma_2$
Notation: Let us denote the means, standard deviations, and sample sizes of the two independent samples from the two populations as
Sample 1: $\bar x_1$, $s_1$, and $n_1$
Sample 2: $\bar x_2$, $s_2$, and $n_2$
We are interested in making inference about $\mu_1 - \mu_2$.
A natural estimator of $\mu_1 - \mu_2$ is $\bar X_1 - \bar X_2$.
The Sampling Distribution of $\bar X_1 - \bar X_2$
If the two populations are normally distributed, then the sampling distribution of $\bar X_1 - \bar X_2$ is normal with mean $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$ and standard deviation $\sigma_{\bar X_1 - \bar X_2}$, where
$$\sigma^2_{\bar X_1 - \bar X_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
If either of the two populations is non-normal but $n_1$ and $n_2$ are both large, then the above sampling distribution of $\bar X_1 - \bar X_2$ holds approximately. This is a consequence of the CLT.
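A quick simulation makes these facts concrete. The sketch below draws many pairs of independent samples from two hypothetical normal populations and compares the mean and standard deviation of the simulated $\bar X_1 - \bar X_2$ values with $\mu_1 - \mu_2$ and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. All parameter values and the seed are arbitrary.

```r
set.seed(8)                                # arbitrary seed
mu1 <- 10; sigma1 <- 2; n1 <- 15           # hypothetical population 1
mu2 <- 8;  sigma2 <- 3; n2 <- 20           # hypothetical population 2

# 20,000 simulated values of X1bar - X2bar from independent samples
diffs <- replicate(20000, mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))

theory_sd <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)
c(sim_mean = mean(diffs), theory_mean = mu1 - mu2,
  sim_sd = sd(diffs), theory_sd = theory_sd)
```

Replacing rnorm() with a skewed generator such as rexp() and using larger $n_1$, $n_2$ illustrates the CLT statement in the same way.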
The three cases
Case 1: Both populations are normally distributed with
(a) $\sigma_1 = \sigma_2 = \sigma$ (pooled-variance t procedures);
(b) $\sigma_1 \ne \sigma_2$ (Welch-Satterthwaite t procedures).
Case 2: Both sample sizes $n_1$ and $n_2$ are large (z procedures).
Case 3: Either $n_1$ or $n_2$ is small and the population is non-normal (bootstrap or nonparametric procedures).
The Equal-Variance Case

If the two populations are normally distributed with equal standard deviations, then
$$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1+n_2-2},$$
where
$$S^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}.$$
Notice that the degrees of freedom $n_1+n_2-2$ come from $S^2$.
We will use this quantity to construct tests and confidence intervals when the two populations are normal and the standard deviations are equal.
Large-Samples Case
When the sample sizes $n_1$ and $n_2$ are large, we use the quantity
$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \approx N(0, 1).$$
This approximation holds whether or not normality or equality of variances holds.
This quantity is used for tests and confidence intervals when $n_1$ and $n_2$ are large.
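A sketch of the large-sample z procedure in R. The two samples are simulated from skewed (exponential) hypothetical populations to emphasize that normality is not required when $n_1$ and $n_2$ are large; the sample sizes, rates, and seed are arbitrary.

```r
set.seed(9)                                  # arbitrary seed
x <- rexp(200, rate = 1 / 5)                 # hypothetical sample 1 (mean 5)
y <- rexp(150, rate = 1 / 4)                 # hypothetical sample 2 (mean 4)

est <- mean(x) - mean(y)
se  <- sqrt(var(x) / length(x) + var(y) / length(y))   # uses S1^2 and S2^2

z <- (est - 0) / se                          # test of H0: mu1 - mu2 = 0
p_two_sided <- 2 * pnorm(-abs(z))
ci95 <- est + c(-1, 1) * qnorm(0.975) * se   # approximate 95% CI

c(z = z, p = p_two_sided)
ci95
```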
The Independent-Samples t-test for $\mu_1 - \mu_2$

Hypotheses:
Case 1. $H_0: \mu_1 - \mu_2 \le 0$ vs $H_a: \mu_1 - \mu_2 > 0$
Case 2. $H_0: \mu_1 - \mu_2 \ge 0$ vs $H_a: \mu_1 - \mu_2 < 0$
Case 3. $H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \ne 0$
T.S.: $t = (\bar x_1 - \bar x_2)\big/\left(s\sqrt{1/n_1 + 1/n_2}\right)$
R.R.: For a specified value of $\alpha$,
Case 1. Reject $H_0$ if $t \ge t_{n_1+n_2-2,\,1-\alpha}$.
Case 2. Reject $H_0$ if $t \le -t_{n_1+n_2-2,\,1-\alpha}$.
Case 3. Reject $H_0$ if $|t| \ge t_{n_1+n_2-2,\,1-\alpha/2}$.
p-Value:
Case 1. $P(t_{n_1+n_2-2} > t_{\text{computed}})$
Case 2. $P(t_{n_1+n_2-2} < t_{\text{computed}})$
Case 3. $2P(t_{n_1+n_2-2} > |t_{\text{computed}}|)$
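The pooled test can be computed by hand in R and checked against t.test(..., var.equal = TRUE). The two short samples are illustrative numbers (the same ones used in the t.test() example in these notes).

```r
x <- c(2.3, 3.4, 1.2, 4.4)          # sample 1
y <- c(3.2, 1.5, 2.6, 3.3, 4.5)     # sample 2
n1 <- length(x); n2 <- length(y)
df <- n1 + n2 - 2

# Pooled variance s^2 and the two-sample t statistic
s2_pooled <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df
t_stat <- (mean(x) - mean(y)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))

p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)   # Case 3
p_greater   <- pt(t_stat, df, lower.tail = FALSE)            # Case 1
p_less      <- pt(t_stat, df)                                # Case 2

res <- t.test(x, y, var.equal = TRUE)
c(t = t_stat, df = df, p = p_two_sided)
```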
$100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ when $\sigma_1 = \sigma_2$

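A $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by
$$(\bar x_1 - \bar x_2) \pm t_{n_1+n_2-2,\,1-\alpha/2}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
Lower-sided confidence interval for $\mu_1 - \mu_2$:
$$\mu_1 - \mu_2 < (\bar x_1 - \bar x_2) + t_{n_1+n_2-2,\,1-\alpha}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
Upper-sided confidence interval for $\mu_1 - \mu_2$:
$$\mu_1 - \mu_2 > (\bar x_1 - \bar x_2) - t_{n_1+n_2-2,\,1-\alpha}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$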
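The three intervals in R, using illustrative samples x and y: the two-sided interval matches t.test(x, y, var.equal = TRUE)$conf.int, and the one-sided limits match the finite endpoints that t.test() reports for alternative = "less" and alternative = "greater".

```r
x <- c(2.3, 3.4, 1.2, 4.4)
y <- c(3.2, 1.5, 2.6, 3.3, 4.5)
n1 <- length(x); n2 <- length(y); alpha <- 0.05
df <- n1 + n2 - 2

s   <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df)   # pooled SD
se  <- s * sqrt(1 / n1 + 1 / n2)
est <- mean(x) - mean(y)

ci_two <- est + c(-1, 1) * qt(1 - alpha / 2, df) * se    # two-sided
lower_sided_limit <- est + qt(1 - alpha, df) * se        # mu1 - mu2 below this
upper_sided_limit <- est - qt(1 - alpha, df) * se        # mu1 - mu2 above this

ci_two
t.test(x, y, var.equal = TRUE)$conf.int
```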
t test in R
Inference for a difference in means can be carried out in R in one of the following two ways, depending on how your data are organized.
If the two samples are entered as vectors x and y, then use
t.test(x, y, mu = 0, paired = FALSE, var.equal = TRUE, alternative = "two.sided")
If all the data from the two samples are in one vector y and the vector x contains indicators of the sample, then use
t.test(y ~ x, mu = 0, var.equal = TRUE, alternative = "two.sided")
Examples:
x = c(2.3, 3.4, 1.2, 4.4)
y = c(3.2, 1.5, 2.6, 3.3, 4.5)
t.test(x, y, var.equal = TRUE)
x = c(1, 1, 1, 1, 2, 2, 2, 2, 2)
y = c(2.3, 3.4, 1.2, 4.4, 3.2, 1.5, 2.6, 3.3, 4.5)
t.test(y ~ x, var.equal = TRUE)
Example: Veterinary Science

An experiment was conducted to evaluate the effectiveness of a treatment for tapeworm in the stomachs of sheep. A random sample of 24 worm-infected lambs of approximately the same age and health was randomly divided into two groups. Twelve of the lambs were injected with the drug and the remaining twelve were left untreated. After a 6-month period, the lambs were slaughtered and the following worm counts were recorded:
Drug Treated: 18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7
Untreated: 40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39
(a) Do any of the assumptions of the pooled t test appear to be an issue? (b) Test whether the mean number of tapeworms in the stomachs of the treated lambs is less than the mean for untreated lambs. Use $\alpha = 0.05$. (c) What is the level of significance for this test? (d) Place a 95% CI on $\mu_1 - \mu_2$ to assess the size of the difference in the two means.
Pooled-Variance t-test for $\mu_1 - \mu_2$: An Example Contd...

$\bar x_1 = 26.58$, $s_1 = 14.36$, $\bar x_2 = 39.67$, and $s_2 = 13.86$
[Figure: Normal Q-Q plots (Sample Quantiles vs. Theoretical Quantiles) and box plots for the Drug Treated and Untreated groups]
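One possible R workflow for parts (a)-(d), assuming the pooled-variance test is appropriate (the equal-variance question is taken up next). Group 1 is the drug-treated lambs, so part (b) corresponds to alternative = "less"; reading part (c) as asking for the p-value of that test is an interpretation.

```r
treated   <- c(18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7)     # group 1
untreated <- c(40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39)   # group 2

# Summary statistics (compare with 26.58, 14.36, 39.67, 13.86)
round(c(mean(treated), sd(treated), mean(untreated), sd(untreated)), 2)

# (a) Normality and spread: Q-Q plots and side-by-side box plots
qqnorm(treated, main = "Drug Treated"); qqline(treated)
qqnorm(untreated, main = "Untreated"); qqline(untreated)
boxplot(list(Treated = treated, Untreated = untreated))

# (b) Pooled t test of H0: mu1 - mu2 >= 0 vs Ha: mu1 - mu2 < 0; p-value for (c)
res_b <- t.test(treated, untreated, var.equal = TRUE, alternative = "less")
res_b$statistic; res_b$p.value

# (d) Two-sided 95% CI for mu1 - mu2
t.test(treated, untreated, var.equal = TRUE, conf.level = 0.95)$conf.int
```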
Test for Equality of Variances

The choice between the pooled-variance and Welch-Satterthwaite procedures depends on whether the variances of the two populations are equal.
In reality, it may not always be clear whether equality holds.
However, we can conduct a statistical test to assess departure from equality using sample data.
Assume the two populations are normally distributed.
We are interested in testing
$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \ne \sigma_2^2$.
Test for Equality of Variances Contd...

The quantity
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1},$$
where
$$S_1^2 = \frac{\sum_{i=1}^{n_1}(X_{1i} - \bar X_1)^2}{n_1 - 1} \quad\text{and}\quad S_2^2 = \frac{\sum_{i=1}^{n_2}(X_{2i} - \bar X_2)^2}{n_2 - 1}.$$
The $F_{d_1,d_2}$ distribution depends on two degrees of freedom, known as the numerator and denominator degrees of freedom.
The $F_{d_1,d_2}$ distribution is a right-skewed distribution over the interval $(0, \infty)$.
We want to reject $H_0$ when the test statistic $F = S_1^2/S_2^2$ is small or large compared to 1.

For a size-$\alpha$ test, we reject $H_0$ if
$F \le F_{n_1-1,\,n_2-1,\,\alpha/2}$ or $F \ge F_{n_1-1,\,n_2-1,\,1-\alpha/2}$.
In R, the quantiles of $F_{n_1-1,n_2-1}$ can be obtained as
qf(alpha/2, n1-1, n2-1)
qf(1-alpha/2, n1-1, n2-1)
p-value:
$$p\text{-value} = \begin{cases} 2P(F > F_{\text{computed}}) & \text{if } F_{\text{computed}} \ge 1, \\ 2P(F < F_{\text{computed}}) & \text{if } F_{\text{computed}} < 1. \end{cases}$$
The area under the curve of $F_{n_1-1,n_2-1}$ to the left of $F_{\text{computed}}$ can be found in R by
pf(F_computed, n1-1, n2-1)
For the tapeworm data, test the hypothesis of equality of variances.
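A sketch of the F test for the tapeworm data, following the rejection region and p-value rule above and cross-checked with var.test(). Note that var.test() reports $2\min\{P(F < F_{\text{computed}}), P(F > F_{\text{computed}})\}$; this coincides with the rule above here because $n_1 = n_2$, so the F distribution has median 1.

```r
treated   <- c(18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7)
untreated <- c(40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39)
n1 <- length(treated); n2 <- length(untreated); alpha <- 0.05

F_computed <- var(treated) / var(untreated)        # S1^2 / S2^2

# Rejection region for a size-alpha two-sided test
crit <- c(qf(alpha / 2, n1 - 1, n2 - 1), qf(1 - alpha / 2, n1 - 1, n2 - 1))
reject <- F_computed <= crit[1] || F_computed >= crit[2]

# p-value rule from the slide
p_value <- if (F_computed >= 1) {
  2 * pf(F_computed, n1 - 1, n2 - 1, lower.tail = FALSE)
} else {
  2 * pf(F_computed, n1 - 1, n2 - 1)
}

c(F = F_computed, p = p_value, reject = reject)
var.test(treated, untreated, ratio = 1, alternative = "two.sided")$p.value
```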

In the test statistic
$$F = \frac{S_1^2}{S_2^2} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1},$$
we are using the variance of the sample from population 1 in the numerator and that of population 2 in the denominator.
The labeling of the populations is arbitrary.
We could define the test statistic as
$$F = \frac{S_2^2}{S_1^2} \overset{H_0}{\sim} F_{n_2-1,\,n_1-1}.$$
Do we get the same conclusion? YES.


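We observe, under the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, that
$$P\left(\frac{S_2^2}{S_1^2} < F_{n_2-1,\,n_1-1,\,\alpha/2}\right) = \frac{\alpha}{2} = P\left(\frac{S_1^2}{S_2^2} > F_{n_1-1,\,n_2-1,\,1-\alpha/2}\right) = P\left(\frac{S_2^2}{S_1^2} < \frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}\right).$$
Therefore,
$$F_{n_2-1,\,n_1-1,\,\alpha/2} = \frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}.$$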
In R, equality of variances can be tested in one of the following two ways, depending on how your data are organized:
var.test(x, y, ratio = 1, alternative = "two.sided")
var.test(y ~ x, ratio = 1, alternative = "two.sided")
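The two facts derived above are easy to confirm numerically: the quantile identity $F_{n_2-1,n_1-1,\alpha/2} = 1/F_{n_1-1,n_2-1,1-\alpha/2}$, and the fact that swapping the roles of the two samples in var.test() leaves the two-sided p-value unchanged. The sample sizes and data below are arbitrary.

```r
n1 <- 12; n2 <- 8; alpha <- 0.05              # hypothetical sample sizes
lhs <- qf(alpha / 2, n2 - 1, n1 - 1)
rhs <- 1 / qf(1 - alpha / 2, n1 - 1, n2 - 1)
c(lhs = lhs, rhs = rhs)                        # equal up to rounding

x <- c(2.3, 3.4, 1.2, 4.4)                     # illustrative samples
y <- c(3.2, 1.5, 2.6, 3.3, 4.5)
p_xy <- var.test(x, y, ratio = 1, alternative = "two.sided")$p.value
p_yx <- var.test(y, x, ratio = 1, alternative = "two.sided")$p.value
c(p_xy = p_xy, p_yx = p_yx)                    # same conclusion either way
```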
The Behrens-Fisher Problem
Assume two independent samples from normal populations.
We know, by conducting a test or otherwise, that $\sigma_1 \ne \sigma_2$.
Inference about $\mu_1 - \mu_2$ in this situation is known as the Behrens-Fisher problem.
The test and confidence interval procedure was developed by Welch (1938) using the Satterthwaite approximation for the degrees of freedom and hence is referred to as the Welch-Satterthwaite method.
The Behrens-Fisher Problem Contd...

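The quantity
$$t' = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \approx t_d,$$
where
$$d = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}.$$
This quantity is used for tests and confidence intervals concerning $\mu_1 - \mu_2$.
For example, a $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ is given by
$$(\bar x_1 - \bar x_2) \pm t_{d,\,1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
The effect of unequal variances on the pooled procedure is largest when the sample sizes are unequal.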
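A sketch of the Welch-Satterthwaite computation in R, with the degrees of freedom coded from the formula for $d$ above (not rounded) and compared with t.test(x, y, var.equal = FALSE), which is R's default. The data are small illustrative vectors, and the helper welch_df() is a name made up for this sketch.

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom
welch_df <- function(s1, s2, n1, n2) {
  v1 <- s1^2 / n1; v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

x <- c(2.3, 3.4, 1.2, 4.4)
y <- c(3.2, 1.5, 2.6, 3.3, 4.5)
n1 <- length(x); n2 <- length(y); alpha <- 0.05

se <- sqrt(var(x) / n1 + var(y) / n2)
d  <- welch_df(sd(x), sd(y), n1, n2)
t_prime <- (mean(x) - mean(y)) / se            # under H0: mu1 - mu2 = 0
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(1 - alpha / 2, d) * se

res <- t.test(x, y, var.equal = FALSE)
c(t = t_prime, df = d)
ci
```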
Strategy for Testing Equality of Means

When it is not clear whether $\sigma_1^2 = \sigma_2^2$ but normality appears to hold, use the following strategy:
Test $H_0: \sigma_1^2 = \sigma_2^2$.
Fail to reject: use the pooled t test for $H_0: \mu_1 = \mu_2$.
Reject: use Welch's t test for $H_0: \mu_1 = \mu_2$.
The test for equality of variances is sensitive to departures from normality.
When normality fails, nonparametric methods must be used instead.
Behrens-Fisher Problem: Example

A possibly important environmental determinant of lung function in children is the amount of cigarette smoking in the home. Suppose this question is studied by selecting two groups: group 1 consists of 23 nonsmoking children 5-9 years of age, both of whose parents smoke, who have a mean forced expiratory volume (FEV) of 2.1 L and a standard deviation of 0.7 L; group 2 consists of 20 nonsmoking children of comparable age, neither of whose parents smoke, who have a mean FEV of 2.3 L and a standard deviation of 0.4 L. (a) What are the appropriate null and alternative hypotheses in this situation? (b) What is the appropriate test procedure for the hypotheses above? (c) Carry out the test and report the p-value. (d) Provide a 95% CI for the true mean difference in FEV between 5- to 9-year-old children whose parents smoke and comparable children whose parents do not smoke.
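Since only summary statistics are given, t.test() cannot be applied directly. The sketch below computes the F test for equal variances and then the Welch-Satterthwaite test and CI from the summaries, with group 1 = parents smoke; a two-sided alternative is assumed.

```r
n1 <- 23; m1 <- 2.1; s1 <- 0.7     # group 1: both parents smoke
n2 <- 20; m2 <- 2.3; s2 <- 0.4     # group 2: neither parent smokes

# Equality of variances: F = s1^2 / s2^2 (F >= 1, so use the upper tail)
F_computed <- s1^2 / s2^2
p_F <- 2 * pf(F_computed, n1 - 1, n2 - 1, lower.tail = FALSE)

# Welch-Satterthwaite test of H0: mu1 = mu2 vs Ha: mu1 != mu2
v1 <- s1^2 / n1; v2 <- s2^2 / n2
se <- sqrt(v1 + v2)
d  <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
t_prime <- (m1 - m2) / se
p_t <- 2 * pt(abs(t_prime), d, lower.tail = FALSE)

# 95% CI for mu1 - mu2
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, d) * se

round(c(F = F_computed, p_F = p_F, t = t_prime, df = d, p_t = p_t), 4)
round(ci, 3)
```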
Power Analysis
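For given sample sizes $n_1$ and $n_2$ and significance level $\alpha$, the power the study will have in detecting a difference of $\Delta = |\mu_1 - \mu_2|$ is
$$\mathrm{PWR}(\Delta) \approx P\left(Z < -z_{1-\alpha/2} + \frac{\Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right),$$
which can be computed in R as pnorm(-qnorm(1 - alpha/2) + Delta/sqrt(sigma1^2/n1 + sigma2^2/n2), 0, 1).
For a one-sided alternative, we replace $\alpha/2$ with $\alpha$.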
Power Analysis : Example
Suppose 100 OC users and 100 non-OC users are available for study and a true mean difference of $\mu_1 - \mu_2 = 5$ mm Hg is anticipated, with OC users having the higher mean SBP. How much power would such a study have if estimates of the standard deviations for OC users and non-users were obtained from a pilot study as 15.34 mm Hg and 18.23 mm Hg, respectively?
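The power formula, coded as a small hypothetical helper and applied to this example ($\Delta = 5$ mm Hg, $\sigma_1 = 15.34$, $\sigma_2 = 18.23$, $n_1 = n_2 = 100$, two-sided $\alpha = 0.05$).

```r
# Hypothetical helper: approximate power of the two-sided two-sample z test
power_two_sample <- function(delta, sigma1, sigma2, n1, n2, alpha = 0.05) {
  pnorm(-qnorm(1 - alpha / 2) + abs(delta) / sqrt(sigma1^2 / n1 + sigma2^2 / n2))
}

pwr <- power_two_sample(delta = 5, sigma1 = 15.34, sigma2 = 18.23,
                        n1 = 100, n2 = 100)
round(pwr, 3)
```

This gives a power of roughly 0.55, so a study of this size would have only about a 55% chance of detecting the anticipated difference.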
Sample-Size Estimation
The appropriate sample size to have a probability of $1-\beta$ of finding a significant difference based on a two-sided test with significance level $\alpha$, when the absolute difference in means between the two groups is $\Delta = |\mu_1 - \mu_2|$, is:
a. Equal sample sizes anticipated:
$$n_1 = n_2 = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$
b. A known ratio $n_2 = k n_1$ anticipated:
$$n_1 = \frac{(\sigma_1^2 + \sigma_2^2/k)(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$
For a one-sided test, we replace $\alpha/2$ with $\alpha$.
When $\sigma_1 = \sigma_2$, the smallest total sample size for a given $\alpha$ and $\beta$ is achieved by equal sample size allocation.
Sample-Size Estimation: Example
Suppose we anticipate twice as many non-OC users as OC users entering the study. From a pilot study, estimates of the standard deviations for OC users and non-users were obtained as 15.34 and 18.23 mm Hg, respectively. Project the required sample size to find a significant difference in a two-sided test with a 5% significance level and 80% power when there is a 5 mm Hg difference in the true SBP means of OC users and non-OC users.
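Formula (b) with $k = 2$, coded as a hypothetical helper (two-sided $\alpha = 0.05$, power 0.80, $\Delta = 5$ mm Hg, $\sigma_1 = 15.34$ for OC users as group 1 and $\sigma_2 = 18.23$ for non-users as group 2). Sample sizes are rounded up to whole subjects.

```r
# Hypothetical helper: n1 for a two-sided test with n2 = k * n1
n1_required <- function(delta, sigma1, sigma2, k = 1, alpha = 0.05, power = 0.80) {
  ceiling((sigma1^2 + sigma2^2 / k) *
            (qnorm(1 - alpha / 2) + qnorm(power))^2 / delta^2)
}

n1 <- n1_required(delta = 5, sigma1 = 15.34, sigma2 = 18.23, k = 2)  # OC users
n2 <- 2 * n1                                                          # non-OC users
c(n1 = n1, n2 = n2)
```

This yields 127 OC users and 254 non-OC users.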
Paired-Samples versus Independent-Samples t Test

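For the one-sided test of $H_0: \mu_1 \le \mu_2$ versus $H_a: \mu_1 > \mu_2$, the power of the Z test is given by
$$\mathrm{PWR}(\Delta) = P\left(Z < -z_{1-\alpha} + \frac{\Delta}{\sigma_{\bar X_1 - \bar X_2}}\right),$$
where $\Delta = \mu_1 - \mu_2$.
For paired samples ($n$ pairs),
$$\sigma^2_{\bar X_1 - \bar X_2} = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n} - 2\rho\,\frac{\sigma_1\sigma_2}{n}.$$
For independent samples ($n$ subjects per group),
$$\sigma^2_{\bar X_1 - \bar X_2} = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}.$$
When $\rho > 0$, which is typically the case, the paired design has a smaller $\sigma_{\bar X_1 - \bar X_2}$ and hence higher power than independent samples.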
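A numerical comparison of the two variance formulas, using hypothetical values $\sigma_1 = \sigma_2 = 10$, $n = 30$, $\Delta = 3$, one-sided $\alpha = 0.05$, and within-pair correlation $\rho = 0.6$.

```r
sigma1 <- 10; sigma2 <- 10; n <- 30          # hypothetical values
delta <- 3; alpha <- 0.05; rho <- 0.6

sd_indep  <- sqrt(sigma1^2 / n + sigma2^2 / n)
sd_paired <- sqrt((sigma1^2 + sigma2^2 - 2 * rho * sigma1 * sigma2) / n)

# One-sided power: P(Z < -z_{1-alpha} + delta / SD(X1bar - X2bar))
power_one_sided <- function(sd_diff) pnorm(-qnorm(1 - alpha) + delta / sd_diff)

c(independent = power_one_sided(sd_indep), paired = power_one_sided(sd_paired))
```

Setting rho to 0 makes the two powers identical, which shows that the gain from pairing comes entirely from the positive within-pair correlation.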
Efficiency of Pairing: An Example
A study was designed to measure the effect of home environment on the academic achievement of 12-year-old students. Because genetic differences may also contribute to academic achievement, the researcher wanted to control for this factor. Thirty sets of identical twins were identified who had been adopted prior to their first birthday, with one twin placed in a home in which academics were emphasized (Academic) and the other twin placed in a home in which academics were not emphasized (Nonacademic). The p-values for comparing the mean scores for the academic and nonacademic environments were 0.000 and 0.24 for the paired and independent-samples t tests, respectively.
Efficiency of Pairing: An Example Contd...
(a) Is there a difference in the mean final grade between the students in an academically oriented home environment and those in a nonacademic home environment?
(b) Does it appear that using twins in this study to control for variation in the final scores was effective, compared with taking a random sample of 30 students in each type of environment? Justify your answer. See the scatter plot on the next page.
Efficiency of Pairing: An Example Contd...

[Figure: Scatter plot of scores of academic and nonacademic twins; x-axis: Academic Environment, y-axis: Nonacademic Environment (both axes roughly 50 to 90)]
