
Introduction

Inference about $\mu_1 - \mu_2$: Paired Samples
Inference about $\mu_1 - \mu_2$: Independent Samples

Stat 491: Biostatistics


Chapter 8: Hypothesis Testing - Two-Sample Inference

Solomon W. Harrar
The University of Montana

Fall 2012

Two-Sample Inference
In Chapters 6 and 7, we had only one sample.
The underlying $\mu$ (or $p$) of the population from which the sample
was drawn was compared with the known mean (prevalence rate)
of the general population.
Example: The mean cholesterol of Asian immigrants was compared
with the general US mean cholesterol, known to be 190
mg/dL.
In this chapter, we have two samples, each from a different
population.
Interest lies in comparing the underlying unknown means of
the two populations.

Randomized Clinical Trials (RCT)


Patients are assigned to treatments by some random
mechanism.
If sample sizes are large, we expect the types of patients assigned
to the different treatment modalities to be similar.
If sample sizes are small, patient characteristics of the treatment
groups may not be comparable.
A table of characteristics of the treatment groups is
customarily presented to check that the randomization is
working well.
Design features of RCT
Randomization: Complete, Block, Cluster (Group), Stratified
(by age, sex, or overall clinical condition).
Blinding: Single, Double, Triple and Unblinded

Example: Greek Health Project


Two Types of Samples


Paired Samples: Each data point in one sample is matched
and related to a unique data point in the other sample.
Independent Samples: The data points in one sample are
unrelated to the data points in the other sample.
Example: Suppose we are interested in studying the
association between Oral Contraceptive (OC) use and blood
pressure.
One can start with women of childbearing age (16-49 years)
who do not use OCs and follow them for one year.
For those who start using OCs within the one-year period,
compare the blood pressure at baseline and follow-up (paired samples).
Alternatively, one can identify a group of OC users and
another group of non-users and compare their blood pressures
(independent samples).

Paired Samples Arise When


Having the same set of experimental units receive both
treatments (Cross-Over Design)
Having measurements taken before and after treatment
(Repeated-Measures Design)
No randomization.

Matching Subjects (Matched-Pair Design)


Using naturally occurring pairs such as twins or husbands and
wives.
Matching with respect to extraneous factors that may mask
differences in the treatments.
Block Randomization

Matched Case-Control Study (Observational study)

Paired or Independent Sample


In repeated measures, each subject serves as their own
control. This design may benefit from having a control group,
as it allows us to rule out other factors that may cause changes
between the two time points.
In matching, extraneous factors are expected to influence both
members of the pair equally.
Hence, a paired design is definitive in that if a difference is
present, it is highly likely that it occurred because of the
difference in treatment.
Differences in independent samples are only suggestive.
The differences in the subjects may mask true treatment or
group differences.
A paired design may sometimes NOT be practical and is usually
expensive.

Paired t Test
Let $\mu_d = \mu_1 - \mu_2$.
Let $n$ denote the number of pairs of measurements in the
sample.
Let $d_i$ denote the difference between the first and second
measurement in the $i$th pair.
Assumption: $d_1, d_2, \ldots, d_n$ constitute a random sample from
a normally distributed population with mean $\mu_d$ and unknown
variance $\sigma_d^2$.
We can look at a Q-Q plot and box plots of the $d_i$ to check for
violations of the normality assumption.
Compute
\[
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i
\quad\text{and}\quad
s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}.
\]

The Paired t-test


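Hypotheses:
Case 1. $H_0: \mu_d = 0$ vs $H_a: \mu_d > 0$
Case 2. $H_0: \mu_d = 0$ vs $H_a: \mu_d < 0$
Case 3. $H_0: \mu_d = 0$ vs $H_a: \mu_d \neq 0$
T.S.:
\[
t = \frac{\bar{d}}{s_d/\sqrt{n}}
\]
R.R.: For a specified value of $\alpha$,
Case 1. Reject $H_0$ if $t \geq t_{n-1,1-\alpha}$.
Case 2. Reject $H_0$ if $t \leq -t_{n-1,1-\alpha}$.
Case 3. Reject $H_0$ if $|t| \geq t_{n-1,1-\alpha/2}$.
p-Value:
Case 1. $P(t_{n-1} > t_{\text{computed}})$
Case 2. $P(t_{n-1} < t_{\text{computed}})$
Case 3. $2\,P(t_{n-1} > |t_{\text{computed}}|)$ for the two-sided test.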

Confidence Interval for d


A $100(1-\alpha)\%$ two-sided confidence interval estimate of the
size of the difference ($\mu_d$) is
\[
\bar{d} \pm t_{n-1,1-\alpha/2}\,\frac{s_d}{\sqrt{n}}.
\]
A $100(1-\alpha)\%$ lower-sided confidence limit for the size of the
difference ($\mu_d$) is
\[
\bar{d} + t_{n-1,1-\alpha}\,\frac{s_d}{\sqrt{n}}.
\]
A $100(1-\alpha)\%$ upper-sided confidence limit for the size of the
difference ($\mu_d$) is
\[
\bar{d} - t_{n-1,1-\alpha}\,\frac{s_d}{\sqrt{n}}.
\]
If $n$ is large, then the z-test is used and normality is not
needed.

Example: Nutrition
An important hypothesis in hypertension research is that sodium
restriction may lower blood pressure. However, it is difficult to
achieve sodium restriction over the long term, and dietary
counseling in a group setting is sometimes used to achieve this
goal. The data on overnight urinary sodium excretion (mEq/8hr)
were obtained on eight individuals enrolled in a sodium-restricted
group. Data were collected at baseline
and after one week of dietary counseling. ($\bar{d} = 1.14$ and $s_d = 12.22$)

Person   Baseline   Week 1     d_i
1          7.85       9.59    -1.74
2         12.03      34.50   -22.47
3         21.84       4.55    17.29
4         13.94      20.78    -6.84
5         16.68      11.69     4.99
6         41.78      32.51     9.27
7         14.97       5.46     9.51
8         12.072     12.95    -0.88

Test the appropriate hypothesis and report the p-value. Construct a 95%
CI for the true mean change in overnight sodium excretion over a
one-week period. Verify the validity of the normality assumption.
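The requested test and interval can be sketched in R as follows. This is a minimal sketch, not the author's worked solution: it assumes a two-sided alternative and differences defined as baseline minus week 1, and the variable names (`baseline`, `week1`, `d`, `fit`) are illustrative.

```r
# Overnight urinary sodium excretion (mEq/8hr) from the table above
baseline <- c(7.85, 12.03, 21.84, 13.94, 16.68, 41.78, 14.97, 12.072)
week1    <- c(9.59, 34.50, 4.55, 20.78, 11.69, 32.51, 5.46, 12.95)

d     <- baseline - week1          # paired differences d_i
n     <- length(d)
d_bar <- mean(d)                   # about 1.14
s_d   <- sd(d)                     # about 12.22

# Test statistic and two-sided p-value (assumed alternative: mu_d != 0)
t_stat  <- d_bar / (s_d / sqrt(n))
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# 95% two-sided confidence interval for mu_d
ci <- d_bar + c(-1, 1) * qt(0.975, df = n - 1) * s_d / sqrt(n)

# Built-in equivalent, used here only as a cross-check
fit <- t.test(baseline, week1, paired = TRUE)

# Q-Q coordinates of the differences; drop plot.it = FALSE to draw the plot
qq <- qqnorm(d, plot.it = FALSE)
```

With only eight pairs the normality check is necessarily rough, so the Q-Q plot is best read as a screen for gross departures rather than as confirmation.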

Power Analysis and Sample-Size Estimation


Note that $d_i = x_{1i} - x_{2i}$, where $x_{1i}$ and $x_{2i}$ are the
measurements on the $i$th subject at the baseline and
follow-up, respectively.
Assume $d_1, \ldots, d_n$ constitute a random sample from
$N(\mu_d, \sigma_d^2)$.
If we can get a good working estimate of $\sigma_d$ from a previous
or pilot or reproducibility study, we can use the power and
sample-size formulae from the one-sample problem here.
More specifically, for the two-sided alternative,
\[
PWR(\mu_d) \approx P\left(Z \leq -z_{1-\alpha/2} + \frac{|\mu_d|}{\sigma_d/\sqrt{n}}\right)
\]
and
\[
n = \sigma_d^2\,\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\mu_d^2}.
\]
For a one-sided test, replace $\alpha/2$ with $\alpha$, and the power is exact.
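The two formulae above translate directly into R. The sketch below is illustrative only: the function names `power_paired` and `n_paired` and the planning values ($\mu_d = 5$, $\sigma_d = 12$) are hypothetical, not taken from the slides.

```r
# Approximate power of the two-sided paired test for a true mean difference mu_d
power_paired <- function(mu_d, sigma_d, n, alpha = 0.05) {
  pnorm(-qnorm(1 - alpha / 2) + abs(mu_d) / (sigma_d / sqrt(n)))
}

# Number of pairs needed for power 1 - beta (unrounded)
n_paired <- function(mu_d, sigma_d, alpha = 0.05, beta = 0.20) {
  sigma_d^2 * (qnorm(1 - alpha / 2) + qnorm(1 - beta))^2 / mu_d^2
}

# Hypothetical planning values, chosen only for illustration
n_raw <- n_paired(mu_d = 5, sigma_d = 12)
n_req <- ceiling(n_raw)            # round up to a whole number of pairs
power_paired(mu_d = 5, sigma_d = 12, n = n_req)
```

Plugging the unrounded `n_raw` back into `power_paired` returns the target power, which is a quick consistency check on the two formulae.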


Power Analysis and Sample-Size Estimation Contd...


However, caution has to be used when using an estimate of $\sigma_d$
from a previous study, in particular in longitudinal studies.
Note that
\[
\sigma_d^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2,
\]
where $\rho$ is the correlation between $X_1$ and $X_2$.
$\sigma_d^2$ depends on the correlation $\rho$.
The correlation typically decreases as the time separation
increases.
To use $\sigma_d$ from a previous study, we have to make sure that
the time separation between baseline and follow-up in the
previous study and the planned study are about the same.
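To see how strongly $\sigma_d$ reacts to the correlation, the identity above can be evaluated over a grid of $\rho$ values. This is a small sketch with hypothetical standard deviations ($\sigma_1 = \sigma_2 = 10$); `sigma_d2` is an illustrative helper name.

```r
# Variance of the difference X1 - X2 for a given correlation rho
sigma_d2 <- function(sigma1, sigma2, rho) {
  sigma1^2 + sigma2^2 - 2 * rho * sigma1 * sigma2
}

# Hypothetical standard deviations and a grid of correlations
rho_grid <- c(0, 0.3, 0.6, 0.9)
data.frame(rho = rho_grid, sigma_d = sqrt(sigma_d2(10, 10, rho_grid)))
```

Because the required number of pairs is proportional to $\sigma_d^2$, borrowing $\sigma_d$ from a study with a shorter follow-up (higher $\rho$) would tend to understate the sample size needed.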

Background

Notations: Let us denote the population means and standard
deviations from the two populations as
Population 1: $\mu_1$ and $\sigma_1$
Population 2: $\mu_2$ and $\sigma_2$

Notations: Let us denote the means, standard deviations and
sample sizes of the two independent samples from the two
populations as
Sample 1: $\bar{x}_1$, $s_1$ and $n_1$
Sample 2: $\bar{x}_2$, $s_2$ and $n_2$

We are interested in making inference about $\mu_1 - \mu_2$.
A natural estimator of $\mu_1 - \mu_2$ is $\bar{X}_1 - \bar{X}_2$.

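The Sampling Distribution of $\bar{X}_1 - \bar{X}_2$

If the two populations are normally distributed, then the
sampling distribution of $\bar{X}_1 - \bar{X}_2$ is normal with mean
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ and variance
\[
\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},
\]
so the standard deviation is the square root of this quantity.
If either of the two populations is non-normal but $n_1$ and $n_2$
are both large, then the above sampling distribution of
$\bar{X}_1 - \bar{X}_2$ holds approximately. This is a consequence of the
CLT.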
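A short simulation can make the sampling distribution concrete. The sketch below assumes two hypothetical normal populations; all parameter values and the seed are arbitrary choices for illustration.

```r
set.seed(491)                      # arbitrary seed for reproducibility
mu1 <- 10; sigma1 <- 3; n1 <- 15   # hypothetical population 1 and sample size
mu2 <- 8;  sigma2 <- 5; n2 <- 20   # hypothetical population 2 and sample size

# Draw many pairs of independent samples and record the difference in means
diffs <- replicate(20000,
                   mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))

theory_mean <- mu1 - mu2
theory_sd   <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)

c(simulated = mean(diffs), theory = theory_mean)
c(simulated = sd(diffs),   theory = theory_sd)
```

The simulated mean and standard deviation should land close to the theoretical values; replacing `rnorm` with a skewed generator and increasing `n1` and `n2` is one way to explore the CLT statement.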

The three cases

Case 1: Both populations are normally distributed with
(a) $\sigma_1 = \sigma_2 = \sigma$ (Pooled-variance t-procedures).
(b) $\sigma_1 \neq \sigma_2$ (Welch-Satterthwaite t-procedures).

Case 2: Both sample sizes $n_1$ and $n_2$ are large (z procedures)

Case 3: Either $n_1$ or $n_2$ is small and the population is non-normal.
(Bootstrap or Nonparametric procedures)

The Equal-Variance Case


When the two populations are normally distributed with equal
standard deviations,
\[
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2}
\]
where
\[
S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.
\]
Notice that the degrees of freedom $n_1 + n_2 - 2$ come from $S^2$.
We will use this quantity to construct tests and confidence
intervals when the two populations are normal and the
standard deviations are equal.

Large-Samples Case

When the sample sizes $n_1$ and $n_2$ are large, we use the
quantity
\[
Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim N(0, 1) \quad\text{(approximately)}.
\]
This is true whether or not normality or equality of variance
holds.
This quantity is used for tests and confidence intervals when
$n_1$ and $n_2$ are large.

The Independent-Samples t-test for $\mu_1 - \mu_2$

Hypotheses:
Case 1. $H_0: \mu_1 - \mu_2 \leq 0$ vs $H_a: \mu_1 - \mu_2 > 0$
Case 2. $H_0: \mu_1 - \mu_2 \geq 0$ vs $H_a: \mu_1 - \mu_2 < 0$
Case 3. $H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
T.S.:
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}
\]
R.R.: For a specified value of $\alpha$,
Case 1. Reject $H_0$ if $t \geq t_{n_1+n_2-2,1-\alpha}$.
Case 2. Reject $H_0$ if $t \leq -t_{n_1+n_2-2,1-\alpha}$.
Case 3. Reject $H_0$ if $|t| \geq t_{n_1+n_2-2,1-\alpha/2}$.
p-Value:
Case 1. $P(t_{n_1+n_2-2} > t_{\text{computed}})$.
Case 2. $P(t_{n_1+n_2-2} < t_{\text{computed}})$.
Case 3. $2\,P(t_{n_1+n_2-2} > |t_{\text{computed}}|)$.

$100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ when $\sigma_1 = \sigma_2$

A $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by
\[
(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,1-\alpha/2}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.
\]
Lower-sided confidence interval for $\mu_1 - \mu_2$
\[
(\bar{x}_1 - \bar{x}_2) + t_{n_1+n_2-2,1-\alpha}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.
\]
Upper-sided confidence interval for $\mu_1 - \mu_2$
\[
(\bar{x}_1 - \bar{x}_2) - t_{n_1+n_2-2,1-\alpha}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.
\]

t test in R
Inference for the difference in means can be computed in R in one
of the following two ways, depending on how your data are
organized.
If the two samples are entered as vectors x and y, then
t.test(x, y, mu=0, paired=FALSE, var.equal=TRUE,
       alternative="two.sided")
If all the data from the two samples are in one vector y and
the vector x contains indicators of the sample, then we use
t.test(y~x, mu=0, var.equal=TRUE,
       alternative="two.sided")

Examples:
# Two samples stored as separate vectors
x=c(2.3,3.4,1.2,4.4)
y=c(3.2,1.5,2.6,3.3,4.5)
t.test(x,y,var.equal=TRUE)
# Same data in one vector y, with sample labels in x
x=c(1,1,1,1,2,2,2,2,2)
y=c(2.3,3.4,1.2,4.4,3.2,1.5,2.6,3.3,4.5)
t.test(y~x,var.equal=TRUE)

Example: Veterinary Science


An experiment was conducted to evaluate the effectiveness of a
treatment for tapeworm in the stomachs of sheep. A random
sample of 24 worm-infected lambs of approximately the same age
and health was randomly divided into two groups. Twelve of the
lambs were injected with the drug and the remaining twelve were
left untreated. After a 6-month period, the lambs were slaughtered
and the following worm counts were recorded:
Drug Treated: 18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7
Untreated: 40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39
(a) Do any of the assumptions of the pooled t-test appear to be an
issue? (b) Test whether the mean number of tapeworms in the
stomachs of the treated lambs is less than the mean for untreated
lambs. Use $\alpha = 0.05$. (c) What is the level of significance for this
test? (d) Place a 95% CI on $\mu_1 - \mu_2$ to assess the size of the
difference in the two means.
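Parts (b) to (d) can be sketched in R by applying the pooled-variance formulas directly and cross-checking against `t.test`. This is one possible working under the assumption that population 1 is the drug-treated group; the variable names are illustrative, and part (c) is read here as asking for the observed significance level (p-value).

```r
treated   <- c(18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7)
untreated <- c(40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39)
n1 <- length(treated); n2 <- length(untreated)

# Pooled standard deviation and test statistic
s_pooled <- sqrt(((n1 - 1) * var(treated) + (n2 - 1) * var(untreated)) /
                   (n1 + n2 - 2))
se       <- s_pooled * sqrt(1 / n1 + 1 / n2)
t_stat   <- (mean(treated) - mean(untreated)) / se

# (b), (c): lower-tailed p-value for Ha: mu1 - mu2 < 0
p_value <- pt(t_stat, df = n1 + n2 - 2)

# (d): 95% two-sided CI for mu1 - mu2
ci <- (mean(treated) - mean(untreated)) +
  c(-1, 1) * qt(0.975, n1 + n2 - 2) * se

# Cross-check with the built-in pooled test
fit <- t.test(treated, untreated, alternative = "less", var.equal = TRUE)
```

The manual `p_value` and `fit$p.value` should agree, which confirms that `var.equal = TRUE` selects the pooled procedure rather than the Welch default.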

Pooled-Variance t-test for $\mu_1 - \mu_2$: An Example Contd...

$\bar{x}_1 = 26.58$, $s_1 = 14.36$, $\bar{x}_2 = 39.67$ and $s_2 = 13.86$

[Figure: Normal Q-Q plots (Sample Quantiles vs. Theoretical Quantiles) and box plots for the Drug Treated and Untreated groups]

Test for Equality of Variances


The choice between the pooled-variance and
Welch-Satterthwaite procedures depends on whether the
variances of the two populations are equal or not.
In reality, it may not always be clear if equality holds or not.
However, we can conduct a statistical test to assess the
departure from equality using sample data.
Assume the two populations are normally distributed.
We are interested in testing
\[
H_0: \sigma_1^2 = \sigma_2^2 \quad\text{vs}\quad H_a: \sigma_1^2 \neq \sigma_2^2
\]

Test for Equality of Variances Contd...


The quantity
\[
F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,n_2-1}
\]
where
\[
S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}
\quad\text{and}\quad
S_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}.
\]
The $F_{d_1,d_2}$ distribution depends on two degrees of freedom
known as the numerator and denominator degrees of freedom.
The $F_{d_1,d_2}$ distribution is a right-skewed distribution over the
interval $(0, \infty)$.
We want to reject $H_0$ when the test statistic $F = S_1^2/S_2^2$ is
small or large compared to 1.

Test for Equality of Variances Contd...


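For a size-$\alpha$ test, we reject $H_0$ if
\[
F \leq F_{n_1-1,n_2-1,\alpha/2} \quad\text{or}\quad F \geq F_{n_1-1,n_2-1,1-\alpha/2}.
\]
In R, the quantiles of $F_{n_1-1,n_2-1}$ can be obtained as
qf(alpha/2, n1-1, n2-1)
qf(1-alpha/2, n1-1, n2-1)
p-value:
\[
p\text{-value} =
\begin{cases}
2\,P(F_{n_1-1,n_2-1} > F_{\text{computed}}) & \text{if } F_{\text{computed}} \geq 1 \\
2\,P(F_{n_1-1,n_2-1} < F_{\text{computed}}) & \text{if } F_{\text{computed}} < 1
\end{cases}
\]
The area under the curve of $F_{n_1-1,n_2-1}$ to the left of $F_{\text{computed}}$
can be found in R by
pf(F_computed,n1-1,n2-1)
For the tapeworm data, test the hypothesis of equality of
variance.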
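For the tapeworm data, the rejection region and p-value above might be computed as in the sketch below. The level `alpha = 0.05` is an assumption (the slide does not fix it for this exercise), and the variable names are illustrative.

```r
treated   <- c(18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7)
untreated <- c(40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39)
n1 <- length(treated); n2 <- length(untreated)
alpha <- 0.05                      # assumed test size

F_computed <- var(treated) / var(untreated)

# Rejection region for a size-alpha test
crit   <- c(qf(alpha / 2, n1 - 1, n2 - 1), qf(1 - alpha / 2, n1 - 1, n2 - 1))
reject <- F_computed <= crit[1] || F_computed >= crit[2]

# Two-sided p-value following the piecewise definition above
p_value <- if (F_computed >= 1) {
  2 * pf(F_computed, n1 - 1, n2 - 1, lower.tail = FALSE)
} else {
  2 * pf(F_computed, n1 - 1, n2 - 1)
}
```

The same p-value should be returned by `var.test(treated, untreated)$p.value`, which implements this test directly.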

Test for Equality of Variances Contd...


In the test statistic
\[
F = \frac{S_1^2}{S_2^2} \overset{H_0}{\sim} F_{n_1-1,n_2-1},
\]
we are using the variance of the sample from population
1 in the numerator and that of population 2 in the denominator.
The labeling of the populations is arbitrary.
We could define the test statistic as
\[
F = \frac{S_2^2}{S_1^2} \overset{H_0}{\sim} F_{n_2-1,n_1-1}.
\]
Do we get the same conclusion? YES.


Test for Equality of Variances Contd...


We observe, under the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, that
\[
P\left(\frac{S_2^2}{S_1^2} < F_{n_2-1,n_1-1,\alpha/2}\right) = \alpha/2
= P\left(\frac{S_1^2}{S_2^2} > F_{n_1-1,n_2-1,1-\alpha/2}\right)
= P\left(\frac{S_2^2}{S_1^2} < \frac{1}{F_{n_1-1,n_2-1,1-\alpha/2}}\right).
\]
Therefore,
\[
F_{n_2-1,n_1-1,\alpha/2} = \frac{1}{F_{n_1-1,n_2-1,1-\alpha/2}}.
\]
In R, equality of variance can be tested in one of the following
two ways, depending on how your data are organized.
var.test(x,y,ratio=1,alternative="two.sided")
var.test(y~x,ratio=1,alternative="two.sided")
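The reciprocal relation between the quantiles, and the fact that swapping the sample labels leaves the conclusion unchanged, can be checked numerically. The sample sizes and data below are illustrative choices, not part of the derivation.

```r
alpha <- 0.05
n1 <- 23; n2 <- 20                 # hypothetical sample sizes

# Lower quantile with swapped degrees of freedom vs. reciprocal of upper quantile
lhs <- qf(alpha / 2, n2 - 1, n1 - 1)
rhs <- 1 / qf(1 - alpha / 2, n1 - 1, n2 - 1)
all.equal(lhs, rhs)

# Swapping the roles of the two samples gives the same two-sided p-value
x <- c(2.3, 3.4, 1.2, 4.4)
y <- c(3.2, 1.5, 2.6, 3.3, 4.5)
p_xy <- var.test(x, y)$p.value
p_yx <- var.test(y, x)$p.value
```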

The Behrens-Fisher Problem

Assume two independent samples from normal populations.
We know, by conducting a test or otherwise, that $\sigma_1 \neq \sigma_2$.
Inference about $\mu_1 - \mu_2$ in this situation is known as the
Behrens-Fisher problem.
The test and confidence interval procedure was developed by
Welch (1938) using the Satterthwaite approximation for the
degrees of freedom and, hence, is referred to as the
Welch-Satterthwaite Method.

The Behrens-Fisher Problem Contd...


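The quantity
\[
t' = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t_d \quad\text{(approximately)}
\]
where
\[
d = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}.
\]
This quantity is used for tests and confidence intervals
concerning $\mu_1 - \mu_2$.
For example, a $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ is given by
\[
(\bar{x}_1 - \bar{x}_2) \pm t_{d,1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.
\]
The effect of unequal variances is large for unequal sample sizes.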

Strategy for Testing Equality of Means


When it is not clear whether $\sigma_1^2 = \sigma_2^2$ but normality appears
to hold, use the following strategy.
Test $H_0: \sigma_1^2 = \sigma_2^2$.
Fail to Reject: Use the Pooled t Test for $H_0: \mu_1 = \mu_2$.
Reject: Use Welch's t Test for $H_0: \mu_1 = \mu_2$.

The test for equality of variance is sensitive to departures from
normality.
Non-parametric methods must be used in these cases.

Behrens-Fisher Problem: Example


A possible important environmental determinant of lung function
in children is amount of cigarette smoking in the home. Suppose
this question is studied by selecting two groups: group 1 consists
of 23 nonsmoking children 5-9 years of age, both of whose parents
smoke, who have a mean forced expiratory volume (FEV) of 2.1 L
and standard deviation of 0.7 L; group 2 consists of 20 nonsmoking
children of comparable age, neither of whose parents smoke, who
have mean FEV of 2.3 L and a standard deviation of 0.4 L. (a)
What are the appropriate null and alternative hypotheses in this
situation? (b) What is the appropriate test procedure for the
hypotheses above? (c) Carry out the test and report p-value. (d)
Provide 95% CI for the true mean difference in FEV between 5- to
9-year-old children whose parents smoke and comparable children
whose parents do not smoke.
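Because only summary statistics are given, `t.test` cannot be called directly; the formulas can instead be evaluated by hand. The following is a sketch of one possible working, assuming a two-sided alternative and a preliminary F test at the 5% level; the variable names are illustrative.

```r
n1 <- 23; xbar1 <- 2.1; s1 <- 0.7   # group 1: both parents smoke
n2 <- 20; xbar2 <- 2.3; s2 <- 0.4   # group 2: neither parent smokes

# Preliminary F test for equality of variances (two-sided)
F_computed <- s1^2 / s2^2
p_var <- 2 * min(pf(F_computed, n1 - 1, n2 - 1),
                 pf(F_computed, n1 - 1, n2 - 1, lower.tail = FALSE))

# Welch-Satterthwaite statistic and degrees of freedom
v1 <- s1^2 / n1; v2 <- s2^2 / n2
t_stat <- (xbar1 - xbar2) / sqrt(v1 + v2)
df_w   <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% CI for mu1 - mu2
p_value <- 2 * pt(abs(t_stat), df_w, lower.tail = FALSE)
ci <- (xbar1 - xbar2) + c(-1, 1) * qt(0.975, df_w) * sqrt(v1 + v2)
```

The non-integer degrees of freedom `df_w` can be passed to `pt` and `qt` as is; rounding down to an integer is a conservative alternative when using printed tables.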

Power Analysis

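For given sample sizes $n_1$ and $n_2$ and significance level $\alpha$, the
power the study will have in detecting a difference of
$\Delta = |\mu_1 - \mu_2|$ is
\[
\begin{aligned}
PWR(\Delta) &= P\left(Z < -z_{1-\alpha/2} + \frac{\Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right) \\
&= \mathtt{pnorm}\left(-z_{1-\alpha/2} + \frac{\Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}},\; 0,\; 1\right).
\end{aligned}
\]
For a one-sided alternative, we replace $\alpha/2$ with $\alpha$.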

Power Analysis : Example

Suppose 100 OC users and 100 non-OC users are available for
study and a true mean difference of $\mu_1 - \mu_2 = 5$ mm Hg is
anticipated, with OC users having the higher mean SBP. How
much power would such a study have if estimates of the
standard deviations for OC users and non-users were obtained
from a pilot study as 15.34 mm Hg and 18.23 mm Hg,
respectively?
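The power formula from the previous slide can be applied to this example as sketched below. The example does not state a significance level, so a two-sided test with `alpha = 0.05` is assumed; `power_two_sample` is an illustrative helper name.

```r
# Power of the two-sided two-sample z test to detect a difference delta
power_two_sample <- function(delta, sigma1, sigma2, n1, n2, alpha = 0.05) {
  se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)
  pnorm(-qnorm(1 - alpha / 2) + abs(delta) / se, 0, 1)
}

# OC users (sigma1) vs. non-users (sigma2), 100 subjects per group
pw <- power_two_sample(delta = 5, sigma1 = 15.34, sigma2 = 18.23,
                       n1 = 100, n2 = 100)
round(pw, 3)
```

Under these assumptions the power comes out at a little over one half, which suggests that 100 subjects per group would be too few to detect a 5 mm Hg difference reliably.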

Sample-Size Estimation
The appropriate sample size to have a probability of $1-\beta$ of
finding a significant difference based on a two-sided test with
significance level $\alpha$ when the absolute difference in means
between the two groups is $\Delta = |\mu_1 - \mu_2|$ is:
a. Equal sample sizes anticipated
\[
n_1 = n_2 = (\sigma_1^2 + \sigma_2^2)\,\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.
\]
b. A known proportion $n_2 = k n_1$ anticipated
\[
n_1 = (\sigma_1^2 + \sigma_2^2/k)\,\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.
\]
For a one-sided test, we replace $\alpha/2$ with $\alpha$.
When $\sigma_1 = \sigma_2$, the smallest total sample size for a given
$\alpha$ and $\beta$ is achieved by the equal sample size allocation.

Sample-Size Estimation: Example

Suppose we anticipate twice as many non-OC users as OC
users entering the study. From a pilot study, estimates of the
standard deviations for OC users and non-users were obtained
as 15.34 and 18.23, respectively. Project the required sample
size to find a significant difference in a two-sided test with 5%
significance level and 80% power when there is a 5 mm Hg
difference in the true SBP means of OC users and non-OC
users.
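A sketch of the calculation in R follows, taking OC users as population 1 and $k = 2$. The helper name `sample_size_two` and the rounding convention (round $n_1$ up, then set $n_2 = k \cdot n_1$) are assumptions made for illustration.

```r
# Sample sizes for a two-sided test with allocation ratio n2 = k * n1
sample_size_two <- function(delta, sigma1, sigma2, k = 1,
                            alpha = 0.05, power = 0.80) {
  n1 <- (sigma1^2 + sigma2^2 / k) *
    (qnorm(1 - alpha / 2) + qnorm(power))^2 / delta^2
  n1 <- ceiling(n1)                # round up to whole subjects
  c(n1 = n1, n2 = ceiling(k * n1))
}

# OC users (group 1) and twice as many non-users (group 2)
ss <- sample_size_two(delta = 5, sigma1 = 15.34, sigma2 = 18.23, k = 2)

# Equal allocation, for comparison
ss_equal <- sample_size_two(delta = 5, sigma1 = 15.34, sigma2 = 18.23, k = 1)
```

Comparing `sum(ss)` with `sum(ss_equal)` shows how much the unequal allocation costs in total sample size for the same power.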

Paired-Samples versus Independent-Samples t Test


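For the one-sided test $H_0: \mu_1 \leq \mu_2$ versus $H_a: \mu_1 > \mu_2$, the
power of the Z test is given by
\[
PWR(\Delta) = P\left(Z < -z_{1-\alpha} + \frac{\Delta}{\sigma_{\bar{X}_1 - \bar{X}_2}}\right)
\]
where $\Delta = \mu_1 - \mu_2$.
For paired samples,
\[
\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n} - 2\rho\,\frac{\sigma_1\sigma_2}{n}.
\]
For independent samples,
\[
\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}.
\]
When $\rho > 0$, which is typically the case, paired samples will
have higher power than independent samples.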
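The gain from pairing can be illustrated by evaluating the power formula over several values of $\rho$, with $\rho = 0$ reproducing the independent-samples case. All numerical inputs below are hypothetical, and `power_one_sided` is an illustrative helper name.

```r
# One-sided power; rho = 0 corresponds to independent samples of size n each
power_one_sided <- function(delta, sigma1, sigma2, n, rho = 0, alpha = 0.05) {
  var_diff <- sigma1^2 / n + sigma2^2 / n - 2 * rho * sigma1 * sigma2 / n
  pnorm(-qnorm(1 - alpha) + delta / sqrt(var_diff))
}

# Hypothetical design: delta = 5, both standard deviations 15, n = 30
rho_grid <- c(0, 0.25, 0.5, 0.75)
pw <- power_one_sided(delta = 5, sigma1 = 15, sigma2 = 15, n = 30,
                      rho = rho_grid)
data.frame(rho = rho_grid, power = round(pw, 3))
```

Power rises steadily with $\rho$ because the variance of the mean difference shrinks while $\Delta$ stays fixed.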

Efficiency of Pairing: An Example

A study was designed to measure the effect of home environment


on academic achievement of 12-year-old students. Because genetic
differences may also contribute to academic achievement, the
researcher wanted to control for this factor. Thirty sets of identical
twins were identified who had been adopted prior to their first
birthday, with one twin placed in a home in which academics were
emphasized (Academic) and the other twin placed in a home in
which academics were not emphasized (Nonacademic). The p
values for comparing the mean scores for the academic and
nonacademic environments were 0.000 and 0.24 for the paired and
independent sample t tests, respectively.

Efficiency of Pairing: An Example Contd...

(a) Is there a difference in the mean final grade between the
students in an academically oriented home environment and
those in a nonacademic home environment?
(b) Does it appear that using twins in this study to control for
variation in the final scores was effective as compared to
taking a random sample of 30 students in both types of
environments? Justify your answer. See the scatter plot on the
next page.

Efficiency of Pairing: An Example Contd...


[Figure: Scatter Plot of Scores of Academic and Nonacademic Twins. x-axis: Academic Environment; y-axis: Nonacademic Environment; axis ticks from 50 to 90.]