
NSTA-116

Hypothesis Testing: Two-Sample Inference

Professor Andreas Evangelou, Ph.D.
(Lecture #9: Nov. 18 & Nov. 19, 2014)

TOPICS
The Paired t-Test
Interval Estimation for the Comparison of Means from Two Paired Samples
Two-Sample t-Test for Independent Samples with Equal Variances
Two-Sample t-Test for Independent Samples with Unequal Variances
Interval Estimation for the Comparison of Means from Two Independent Samples with Equal Variances

EXPECTATIONS
1. What is hypothesis testing for two-sample
inference?
2. What is the difference between paired- and
independent-samples?
3. What is a paired t-test?
4. How do we perform a two-sample t-test for
independent samples with equal variance?
5. How do we perform a two-sample t-test for
independent samples with unequal variance?

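Inference About the Population Variance
We use the following test statistic, which is $\chi^2$ (chi-squared) distributed with $n-1$ degrees of freedom:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$

We use this relationship to test and estimate the population variance.
Rejection region (the second subscript is the area to the left of the critical value):

$$\chi^2 < \chi^2_{n-1,\,\alpha} \ \text{(lower-tailed test)} \quad \text{or} \quad \chi^2 > \chi^2_{n-1,\,1-\alpha} \ \text{(upper-tailed test)}$$

For a two-sided test, use both tails with $\alpha/2$ in place of $\alpha$.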

[Table: Chi-Square ($\chi^2$) Distribution Table, Area to the Left of the Critical Value; columns 0.005, 0.01, 0.025, 0.05, 0.10, 0.90, 0.95, 0.975, 0.99, 0.995. Table values not reproduced here.]
Source: Bernard Rosner (2011). Fundamentals of Biostatistics (7th Ed.)

Example #9: Blood Pressure
The variability of an Arteriosonde machine (AM)
(ultrasonic blood pressure meter or monitor) differs
from that of a standard blood pressure cuff
(SBPC). It is anticipated that the variability in AM
should be lower than that of SBPC. However, the
variability could possibly be higher. Investigators
decide to use a two-sided test to study this
phenomenon. Suppose we know from previous
published work that $\sigma^2 = 35$ ($\sigma_0^2$, obtained from
SBPC readings). Assess the statistical significance of
the Arteriosonde machine data ($s^2 = 8.178$, $n = 10$).
Thus we want to test the hypothesis $H_0: \sigma^2 = \sigma_0^2$ vs.
$H_a: \sigma^2 \neq \sigma_0^2$.
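As a rough check of Example #9, the sketch below computes the chi-square statistic from the stated data and compares it with two table values. The significance level of 0.05 is an assumption (the example does not state one), and the critical values 2.70 and 19.02 for 9 degrees of freedom are entered by hand from a chi-square table rather than computed. The function names are illustrative only.

```python
def chi_square_variance_statistic(sample_variance: float, n: int, sigma0_sq: float) -> float:
    """Return (n - 1) * s^2 / sigma0^2, the statistic for H0: sigma^2 = sigma0^2."""
    return (n - 1) * sample_variance / sigma0_sq


def two_sided_decision(x2: float, lower_crit: float, upper_crit: float) -> str:
    """Reject H0 when the statistic falls outside [lower_crit, upper_crit]."""
    if x2 < lower_crit or x2 > upper_crit:
        return "reject H0"
    return "do not reject H0"


# Data stated in Example #9.
x2 = chi_square_variance_statistic(sample_variance=8.178, n=10, sigma0_sq=35.0)

# Assumption: alpha = 0.05, so the table is read at areas 0.025 and 0.975
# to the left, with n - 1 = 9 degrees of freedom.
LOWER_CRIT_9DF = 2.70
UPPER_CRIT_9DF = 19.02

decision = two_sided_decision(x2, LOWER_CRIT_9DF, UPPER_CRIT_9DF)
print(f"chi-square statistic = {x2:.3f}: {decision}")
```

Under these assumptions the statistic falls below the lower critical value, which is consistent with the AM readings being less variable than the SBPC readings.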

Example #10: Car Seat Springs
Car seat springs are designed to be 500 mm long,
i.e., springs too long or too short must be
reworked. A standard deviation ($\sigma$) of 2 mm in
spring length will result in an acceptable number
of reworked springs. A sample of 100 springs
was taken and measured. Can we infer at the 10%
significance level that the number of springs
requiring reworking is unacceptably large, given
that the sample variance is 6.52 mm²?
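One way to set up Example #10 is as an upper-tailed test of $H_0: \sigma^2 = 4$ against $H_a: \sigma^2 > 4$; that reading of "unacceptably large" is an interpretation, not something the slide states. Because chi-square tables rarely list 99 degrees of freedom, the sketch below swaps in the Wilson-Hilferty normal approximation for the critical value. This is a substitute for a table lookup, not the lecture's method, and `chi2_quantile_wh` is a hypothetical helper name.

```python
from statistics import NormalDist


def chi2_quantile_wh(p: float, df: int) -> float:
    """Approximate the chi-square quantile with area p to the left (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


n = 100
sample_variance = 6.52   # mm^2, stated in the example
sigma0_sq = 2.0 ** 2     # acceptable variance: (2 mm)^2
alpha = 0.10

# Test statistic (n - 1) * s^2 / sigma0^2.
x2 = (n - 1) * sample_variance / sigma0_sq

# Upper-tailed critical value with area 1 - alpha to the left (approximate).
critical = chi2_quantile_wh(1.0 - alpha, n - 1)
reject = x2 > critical

print(f"statistic = {x2:.2f}, approximate critical value = {critical:.1f}, reject H0: {reject}")
```

With 99 degrees of freedom the approximation is close to tabulated values, but a table or statistical package should be preferred where one is available.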

Example #11: Random Sample of 100 Observations
A random sample of 100 observations was taken
from a normal population. The sample variance
was 29.76. Can we infer at the 2.5% significance
level that the population variance does not
exceed 30? Estimate the population variance
with 90% confidence.
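The sketch below works through both parts of Example #11 under stated assumptions. The test is read as lower-tailed ($H_a: \sigma^2 < 30$), which is an interpretation of the question, and the interval uses the standard form $\left[(n-1)s^2/\chi^2_{n-1,0.95},\ (n-1)s^2/\chi^2_{n-1,0.05}\right]$. Quantiles come from the Wilson-Hilferty approximation instead of a table, so the results are approximate; `chi2_quantile_wh` is a hypothetical helper name.

```python
from statistics import NormalDist


def chi2_quantile_wh(p: float, df: int) -> float:
    """Approximate the chi-square quantile with area p to the left (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


n = 100
sample_variance = 29.76
df = n - 1

# Lower-tailed test of H0: sigma^2 = 30 at the 2.5% level (assumed reading).
x2 = df * sample_variance / 30.0
critical = chi2_quantile_wh(0.025, df)
reject = x2 < critical

# 90% confidence interval for the population variance.
ci_lower = df * sample_variance / chi2_quantile_wh(0.95, df)
ci_upper = df * sample_variance / chi2_quantile_wh(0.05, df)

print(f"statistic = {x2:.3f}, approximate critical value = {critical:.2f}, reject H0: {reject}")
print(f"approximate 90% CI for the variance: ({ci_lower:.2f}, {ci_upper:.2f})")
```

Under this reading the statistic is well above the lower-tail critical value, so the data do not support the claim that the variance is below 30.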

INTRODUCTION
Two-Sample Hypothesis Testing: comparing the
underlying parameters of two different populations,
neither of whose values are assumed known
Paired-samples: data points from each sample are
matched uniquely (used in longitudinal studies)
Independent-samples: data points in one sample
are unrelated to the data points in the second
sample (used in cross-sectional studies)
Two hypothesis statements for testing:
$H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$ (two-sided test)

Two-Sample Statistical Inference for Normal-Theory Methods (flowchart)
1. Two samples? If NO: use methods for comparing more than two samples, e.g., ANOVA. If YES: go to 2.
2. Normal distribution, or can the CLT be applied? If YES: go to 3.
3. Inferences concerning the mean? If NO: inferences concerning variances, e.g., F-test. If YES: go to 4.
4. Independent samples? If NO (paired samples): paired t-test. If YES: go to 5.
5. Are $\sigma_1^2$ and $\sigma_2^2$ significantly different? If NO: two-sample t-test with equal variances. If YES: two-sample t-test with unequal variances.

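The Paired t Test
Used for paired-sample data, assuming both
variables are normally distributed
Test statistic:

$$t = \frac{\bar{d}}{s_d/\sqrt{n}} \sim t_{n-1} \ \text{under } H_0$$

where $\bar{d}$ is the mean of the observed differences $d_i$ and $s_d$ is the sample standard deviation of the observed differences:

$$s_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1}}, \quad n = \text{number of matched pairs}$$

Our test hypotheses:
$H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$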

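Acceptance and Rejection Regions for the Paired t-Test
[Figure: $t_{n-1}$ distribution with the acceptance region in the center and a rejection region of area $\alpha/2$ in each tail]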

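Interval Estimation for the Comparison of Means from Two Paired Samples
Construct confidence limits for the true
mean difference ($\Delta$)
$d_i$ are normally distributed with mean $\Delta$ and
variance $\sigma_d^2$
$\bar{d}$ is normally distributed with mean $\Delta$ and
variance $\sigma_d^2/n$, where $\sigma_d^2$ is unknown.

$100\% \times (1-\alpha)$ CI for the true mean difference $\Delta$:

$$\left[\bar{d} - t_{n-1,\,1-\alpha/2}\frac{s_d}{\sqrt{n}},\ \ \bar{d} + t_{n-1,\,1-\alpha/2}\frac{s_d}{\sqrt{n}}\right] = \bar{d} \pm t_{n-1,\,1-\alpha/2}\frac{s_d}{\sqrt{n}}$$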

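Two-Sample t Test for Independent Samples with Equal Variance
$x_1$ is normally distributed with mean $\mu_1$ and variance $\sigma_1^2$
$x_2$ is normally distributed with mean $\mu_2$ and variance $\sigma_2^2$
$\sigma_1^2 = \sigma_2^2 = \sigma^2$ (i.e., equal variance)
$H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$
The two samples (of sizes $n_1$ and $n_2$) are independent
$(\bar{x}_1 - \bar{x}_2)$ is normally distributed with mean $(\mu_1 - \mu_2)$ and variance $\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$:

$$\bar{x}_1 - \bar{x}_2 \sim N\left[\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]$$

Under $H_0: \mu_1 = \mu_2$:

$$\bar{x}_1 - \bar{x}_2 \sim N\left[0,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]$$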

Two-Sample t Test for Independent Samples with Equal Variance
If $\sigma^2$ were known, the test statistic used as the basis for
hypothesis testing is:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1) \ \text{under } H_0$$

However, $\sigma^2$ is unknown, and we must estimate it from
the data, using $s_1^2$ and $s_2^2$
In addition, $n_1$ and $n_2$ may differ, so $s_1^2$ and $s_2^2$ do not
estimate $\sigma^2$ with equal precision
Therefore we take a weighted average of the two
sample variances, where the weights are the degrees
of freedom of each sample ($n_1 - 1$ and $n_2 - 1$)

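Two-Sample t Test for Independent Samples with Equal Variance
The Pooled Estimate of the Variance from the
two independent samples:

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Therefore we can substitute $s$ below for $\sigma$:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

The resulting test statistic follows a t-distribution with
$n_1 + n_2 - 2$ degrees of freedom, rather than an $N(0,1)$
distribution, because $\sigma^2$ is unknown.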
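To make the weighting in the pooled estimate concrete, here is a minimal sketch of the formula. The function name `pooled_variance` and the input numbers are hypothetical, chosen only for illustration; they do not come from the lecture.

```python
def pooled_variance(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Average two sample variances, weighting each by its degrees of freedom."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Hypothetical inputs: the larger sample (n2 = 11) pulls the estimate toward 10.0.
unequal_sizes = pooled_variance(4.0, 5, 10.0, 11)

# With equal sample sizes the pooled estimate is the plain average of 4.0 and 10.0.
equal_sizes = pooled_variance(4.0, 6, 10.0, 6)

print(f"unequal sizes: {unequal_sizes:.4f}, equal sizes: {equal_sizes:.4f}")
```

The pooled value always lies between the two sample variances, and it sits closer to the variance of the sample with more degrees of freedom.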

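Two-Sample t Test for Independent Samples with Equal Variance
Therefore, to test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$ at
significance level $\alpha$, for two normally distributed
populations where $\sigma_1^2 = \sigma_2^2$, compute the
following test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{where } s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$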

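Acceptance and Rejection Regions for Two Independent Samples t-Test with Equal Variance
[Figure: $t_{n_1+n_2-2}$ distribution with the acceptance region in the center and a rejection region of area $\alpha/2$ in each tail]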

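Interval Estimation for the Comparison of Means from Two Independent Samples (Equal Variance Case)
The $100\% \times (1-\alpha)$ CI for the true mean difference
between two population groups, $\mu_1 - \mu_2$:
o If $\sigma$ is known:

$$\bar{x}_1 - \bar{x}_2 \sim N\left[\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right], \quad \text{i.e.,} \quad \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1)$$

o If $\sigma$ is unknown:

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2}$$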

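Interval Estimation for the Comparison of Means from Two Independent Samples (Equal Variance Case)
o The two-sided $100\% \times (1-\alpha)$ CI:

$$\Pr\left[-t_{n_1+n_2-2,\,1-\alpha/2} < \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} < t_{n_1+n_2-2,\,1-\alpha/2}\right] = 1 - \alpha$$

OR

$$\text{CI}: \left[(\bar{x}_1 - \bar{x}_2) - t_{n_1+n_2-2,\,1-\alpha/2}\, s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ \ (\bar{x}_1 - \bar{x}_2) + t_{n_1+n_2-2,\,1-\alpha/2}\, s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right]$$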

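Two-Sample t-Test for Independent Samples with Unequal Variance
We assume two normally distributed random
samples (of sizes $n_1$ and $n_2$) from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$
distributions, respectively, and $\sigma_1^2 \neq \sigma_2^2$
We test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$

$\bar{x}_1$ is normally distributed with mean $\mu_1$ and variance $\sigma_1^2/n_1$
$\bar{x}_2$ is normally distributed with mean $\mu_2$ and variance $\sigma_2^2/n_2$

$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

$$\bar{x}_1 - \bar{x}_2 \sim N\left(0,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) \ \text{under } H_0: \mu_1 = \mu_2$$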

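Two-Sample t-Test for Independent Samples with Unequal Variance
If $\sigma_1^2$ and $\sigma_2^2$ are known:
o Use test statistic:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}, \quad \text{under } H_0: z \sim N(0,1)$$

If $\sigma_1^2$ and $\sigma_2^2$ are unknown:
o Use test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

However, the exact distribution of $t$ under $H_0$ is difficult to derive, so we
use THE SATTERTHWAITE APPROXIMATION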

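Two-Sample t-Test for Independent Samples with Unequal Variance
(Satterthwaite's Method)
(1) Compute the test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

(2) Compute the approximate degrees of freedom:

$$df' = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$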

Two-Sample t-Test for Independent Samples with Unequal Variance
(Satterthwaite's Method Cont'd)
(3) Round $df'$ down to the nearest integer $df''$:
if $t < -t_{df'',\,1-\alpha/2}$ or $t > +t_{df'',\,1-\alpha/2}$: Reject $H_0$
if $-t_{df'',\,1-\alpha/2} \leq t \leq +t_{df'',\,1-\alpha/2}$: Accept $H_0$
(4) Compute the approximate p-value as follows, using a $t_{df''}$ distribution:
o if $t \leq 0$, $p = 2 \times$ [area to the left of $t$]
o if $t > 0$, $p = 2 \times$ [area to the right of $t$]

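Two-Sample t-Test for Independent Samples with Unequal Variance
(Satterthwaite's Method Cont'd)
(5) Compute the two-sided $100\% \times (1-\alpha)$ CI:
o for $\mu_1 - \mu_2$ ($\sigma_1^2 \neq \sigma_2^2$):

$$(\bar{x}_1 - \bar{x}_2) \pm t_{df'',\,1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$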

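Acceptance and Rejection Regions for Two Independent Samples t-Test with Unequal Variance
[Figure: $t_{df''}$ distribution with the acceptance region in the center and a rejection region of area $\alpha/2$ in each tail]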

Example Problem #1: Paired t-Test
We are interested in the relationship between oral
contraceptive (OC) use and systolic blood pressure (SBP) in
women. In one study, the SBP levels (mm Hg) in 10
women while not using (baseline; $x_{i1}$) vs. while
using (follow-up; $x_{i2}$) OCs were collected
($x_{i1}$, $x_{i2}$, $d_i = x_{i2} - x_{i1}$) as follows: (115,128,13), (112,115,3),
(107,106,-1), (119,128,9), (115,122,7), (138,145,7),
(126,132,6), (105,109,4), (104,102,-2), and
(115,117,2). Assume that SBP is normally distributed
for each woman in the study. (a) Assess the statistical
significance of this SBP-OC data. (b) Compute a 95%
CI for the true increase in mean SBP after starting
OCs.
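A minimal sketch of parts (a) and (b) of Example Problem #1, using only the ten pairs listed in the problem. The significance level of 0.05 for part (a) is an assumption, and the critical value 2.262 for 9 degrees of freedom is entered by hand from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# SBP (mm Hg) for the 10 women, as listed in the problem.
baseline = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
follow_up = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

# Paired differences: follow-up minus baseline.
d = [after - before for before, after in zip(baseline, follow_up)]
n = len(d)

d_bar = mean(d)           # mean difference
s_d = stdev(d)            # sample standard deviation (n - 1 denominator)
se = s_d / sqrt(n)        # standard error of the mean difference
t_stat = d_bar / se

# Assumption: alpha = 0.05, critical value for n - 1 = 9 df read from a t table.
T_CRIT_9DF = 2.262
reject = abs(t_stat) > T_CRIT_9DF

# (b) 95% confidence interval for the true mean difference.
ci = (d_bar - T_CRIT_9DF * se, d_bar + T_CRIT_9DF * se)

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t_stat:.2f}, reject H0: {reject}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval excludes zero, which agrees with the test rejecting $H_0$ at the assumed 0.05 level.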

Example Problem #2: Paired t-Test


Effect of different contraceptive methods on fertility.
Researchers are interested in comparing how long it
takes users of either OCs or diaphragms to become
pregnant after stopping contraception. One study group
of 20 OC users was formed, and diaphragm users who
match each OC user with regard to age, race, parity, and
SES were found. The investigators compute the
differences in time to fertility between previous OC and
diaphragm users, and find that the mean difference
(OC-diaphragm) in time to fertility is 4 months with a
standard deviation ($s_d$) of 8 months. (a) What can we
conclude from these data? (b) Compute a 95% CI for the
true mean difference between OC users and diaphragm
users in time to fertility.
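Example Problem #2 gives only summary statistics, so the paired t statistic can be sketched directly from the mean difference, its standard deviation, and the number of pairs. The 0.05 significance level is an assumption, the critical value 2.093 for 19 degrees of freedom is entered by hand from a t table, and `paired_t_from_summary` is a hypothetical helper name.

```python
from math import sqrt


def paired_t_from_summary(d_bar: float, s_d: float, n: int) -> tuple[float, float]:
    """Return (t statistic, standard error) for a paired t-test from summary data."""
    se = s_d / sqrt(n)
    return d_bar / se, se


# Summary data stated in the problem: 20 matched pairs, mean 4 months, SD 8 months.
d_bar, s_d, n = 4.0, 8.0, 20
t_stat, se = paired_t_from_summary(d_bar, s_d, n)

# Assumption: alpha = 0.05, critical value for n - 1 = 19 df read from a t table.
T_CRIT_19DF = 2.093
reject = abs(t_stat) > T_CRIT_19DF

# (b) 95% confidence interval for the true mean difference (OC minus diaphragm).
ci = (d_bar - T_CRIT_19DF * se, d_bar + T_CRIT_19DF * se)

print(f"t = {t_stat:.3f}, reject H0: {reject}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}) months")
```

The statistic only just exceeds the critical value, and the lower confidence limit is correspondingly close to zero.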

Example Problem #3: Independent Samples t-Test (Equal Variance)
Suppose a sample of eight 35- to 39-year-old non-pregnant,
premenopausal oral contraceptive (OC)
users who work in a company and have a mean
systolic blood pressure (SBP) of 132.86 mm Hg and
sample standard deviation s = 15.34 mm Hg are
identified.
A sample of 21 non-pregnant, premenopausal, non-OC users in the same age group
is similarly identified; they have a mean SBP of
127.44 mm Hg. What can be said about the
underlying mean difference in blood pressure
between the two groups?
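The sketch below applies the pooled-variance t-test to Example Problem #3. The problem as written here gives no standard deviation for the non-OC group, so the value 18.23 mm Hg used below is an assumption, taken from the version of this example in Rosner's textbook. The 0.05 significance level is also assumed, and the critical value 2.052 for 27 degrees of freedom is entered by hand from a t table.

```python
from math import sqrt

# OC users (stated in the problem).
mean1, s1, n1 = 132.86, 15.34, 8
# Non-OC users: the mean and n are stated; s2 = 18.23 is an ASSUMPTION (see lead-in).
mean2, s2, n2 = 127.44, 18.23, 21

# Pooled variance: degrees-of-freedom-weighted average of the sample variances.
pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
se = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)

diff = mean1 - mean2
t_stat = diff / se

# Assumption: alpha = 0.05, critical value for n1 + n2 - 2 = 27 df from a t table.
T_CRIT_27DF = 2.052
reject = abs(t_stat) > T_CRIT_27DF

# 95% confidence interval for mu1 - mu2.
ci = (diff - T_CRIT_27DF * se, diff + T_CRIT_27DF * se)

print(f"pooled variance = {pooled_var:.2f}, t = {t_stat:.2f}, reject H0: {reject}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}) mm Hg")
```

With these inputs the interval is wide and includes zero, so the data would not show a significant difference in mean SBP between the two groups.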

Example Problem #4: Independent Samples t-Test (Unequal Variance)
Cholesterol levels are assessed in 100 children (4-14 years old)
of men who have died from heart disease. The mean
cholesterol level in the group ($\bar{x}_1$) is 207.3 mg/dL and the
sample standard deviation ($s_1$) is 35.6 mg/dL. Previously, the
cholesterol levels in this group of children were compared with
175 mg/dL, which was assumed to be the underlying mean level
in children in this age group based on previous large studies.
Suppose that researchers found that among 74 control children
(i.e., fathers have no history of heart disease), the mean
cholesterol level ($\bar{x}_2$) is 193.4 mg/dL with a sample standard
deviation ($s_2$) of 17.3 mg/dL. Compare the means of the two
groups using the two-sample t-test for independent samples,
assuming unequal variance between the two samples. That
is, test for the equality of the mean cholesterol levels of the
children whose fathers have died from heart disease vs. the
children whose fathers do not have a history of heart disease.
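A minimal sketch of Satterthwaite's method applied to Example Problem #4. The function name `welch_t_and_df` is hypothetical. The 0.05 significance level is an assumption, and because many t tables do not list 151 degrees of freedom, the code compares against the tabulated value for 120 degrees of freedom (1.980), which is slightly conservative.

```python
from math import floor, sqrt


def welch_t_and_df(mean1, s1, n1, mean2, s2, n2):
    """Return (t, approximate df) for the unequal-variance two-sample t-test."""
    v1 = s1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2   # estimated variance of the second sample mean
    t_stat = (mean1 - mean2) / sqrt(v1 + v2)
    df_approx = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df_approx


# Summary data stated in the problem (mg/dL).
t_stat, df_approx = welch_t_and_df(207.3, 35.6, 100, 193.4, 17.3, 74)

# Step (3): round the approximate degrees of freedom down.
df_rounded = floor(df_approx)

# Assumption: alpha = 0.05; 1.980 is the table value for 120 df, used as a
# conservative stand-in for the rounded df found above.
T_CRIT_CONSERVATIVE = 1.980
reject = abs(t_stat) > T_CRIT_CONSERVATIVE

print(f"t = {t_stat:.2f}, df' = {df_approx:.1f}, df'' = {df_rounded}, reject H0: {reject}")
```

The statistic is far beyond the critical value, so the conclusion does not depend on the exact degrees of freedom used for the lookup.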

Example Problem #5: Independent Samples t-Test (Unequal Variance)
Compare the mean duration of hospitalization between
antibiotic users and non-antibiotic users. Among 7
antibiotic users, mean duration of hospitalization was
11.57 days with standard deviation 8.81 days. Among
the 18 non-antibiotic users, mean duration of
hospitalization was 7.44 days with standard deviation
3.70 days.
(a) Using a statistical F-test it was
determined that the two variances above differ
significantly (p = 0.004, F = 5.68), and thus a two-sample t test with unequal variance should be used.
(b) Compute a 95% CI for the mean difference in
duration of hospital stay between patients who do and
do not receive antibiotics.
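For part (b) of Example Problem #5, the sketch below computes the Satterthwaite degrees of freedom and the resulting 95% interval from the stated summary data. The critical value 2.447 for 6 degrees of freedom is entered by hand from a t table, and the variable names are illustrative.

```python
from math import floor, sqrt

# Summary data stated in the problem (days).
mean1, s1, n1 = 11.57, 8.81, 7    # antibiotic users
mean2, s2, n2 = 7.44, 3.70, 18    # non-antibiotic users

v1 = s1 ** 2 / n1                 # estimated variance of the first sample mean
v2 = s2 ** 2 / n2                 # estimated variance of the second sample mean
se = sqrt(v1 + v2)

# Satterthwaite approximation, then round down to an integer.
df_approx = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
df_rounded = floor(df_approx)

# Critical value t_{6, 0.975}, read from a t table for the rounded df.
T_CRIT_6DF = 2.447

diff = mean1 - mean2
ci = (diff - T_CRIT_6DF * se, diff + T_CRIT_6DF * se)

print(f"df' = {df_approx:.2f}, df'' = {df_rounded}")
print(f"95% CI for the mean difference: ({ci[0]:.2f}, {ci[1]:.2f}) days")
```

The interval includes zero, so with samples this small the difference in mean hospital stay would not be significant at the 0.05 level.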

Textbook Pages for Reading Consideration and Practice Problems
Rosner, B. (2011). Fundamentals of Biostatistics (7th
ed.). Canada: Thomson Learning Inc., pp. 269-281, 287-293, and 301-307.
No Practice Problems Set #9 will be given for this lecture
material, but make sure you know how to do the
example problems given here.

