08 T-Tests

STAT 2020
STATISTICS FOR BIOLOGISTS

TOPIC 8: INFERENCE FOR A POPULATION MEAN WITH UNKNOWN STANDARD DEVIATION
OBJECTIVES (PSLS CHAPTER 17)
Inference for the mean of one population (σ unknown)
When σ is unknown
The t distributions
The t test
Confidence intervals
Matched pairs t procedures
Robustness
WHEN σ IS UNKNOWN
The sample standard deviation s provides an estimate of the population standard deviation σ.
Larger samples give more reliable estimates of σ.
[Figure: population distribution compared with the distributions of a large sample and a small sample]

THE t DISTRIBUTIONS
We take one random sample of size n from a Normal population N(μ, σ).
When σ is known, the sampling distribution of the mean is Normal, N(μ, σ/√n), and the statistic

$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$

follows the standard Normal, N(0,1), distribution.
When σ is estimated from the sample standard deviation s, the statistic

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$

follows the t distribution with n - 1 degrees of freedom.

[Figure: density curves of the standard Normal distribution and the t distribution with df = 4]
When n is large, s is a good estimate of σ and the t distribution with df = n - 1 is close to the standard Normal distribution.
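As a rough illustration of how the t distribution approaches the standard Normal as df grows, the two densities can be compared at the center and in the tail. This is a minimal sketch using only the Python standard library; the function names `normal_pdf` and `t_pdf` are illustrative choices, not part of the course material.

```python
import math

def normal_pdf(x: float) -> float:
    """Density of the standard Normal distribution N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def t_pdf(x: float, df: float) -> float:
    """Density of the t distribution with df degrees of freedom."""
    # lgamma keeps the normalizing constant finite even for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

if __name__ == "__main__":
    # Compare the height at the center (x = 0) and in the tail (x = 3).
    for df in (4, 30, 1000):
        print(f"t, df={df}: center={t_pdf(0.0, df):.4f} tail={t_pdf(3.0, df):.4f}")
    print(f"Normal:      center={normal_pdf(0.0):.4f} tail={normal_pdf(3.0):.4f}")
```

With df = 4 the t density is lower at the center and higher in the tail than the Normal density; the gap shrinks as df increases.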
STANDARD DEVIATION VERSUS STANDARD ERROR
For a sample of size n, the sample standard deviation s is:

$$ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} $$

n - 1 is the degrees of freedom.
The value s/√n is called the standard error of the mean (SEM).
Scientists often present their sample results as the mean ± SEM.
A medical study examined the effect of a new medication on seated systolic blood pressure. The results, presented as mean ± SEM for 25 patients, are 113.5 ± 8.9. What is the standard deviation s of the sample data?
SEM = s/√n <=> s = SEM × √n
s = 8.9 × √25 = 44.5
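The conversion between SEM and s in the blood pressure example can be sketched in a few lines. The helper names `sem_from_sd` and `sd_from_sem` are hypothetical, chosen here only for illustration.

```python
import math

def sem_from_sd(s: float, n: int) -> float:
    """Standard error of the mean: SEM = s / sqrt(n)."""
    return s / math.sqrt(n)

def sd_from_sem(sem: float, n: int) -> float:
    """Recover the sample standard deviation: s = SEM * sqrt(n)."""
    return sem * math.sqrt(n)

# Blood pressure example: mean ± SEM = 113.5 ± 8.9 for n = 25 patients.
s = sd_from_sem(8.9, 25)
print(round(s, 1))  # 44.5
```

Because SEM shrinks with √n while s does not, a reported "mean ± SEM" can look much tighter than the spread of the underlying data.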
TABLE C
When σ is unknown, we use a t distribution with n - 1 degrees of freedom (df).
Table C shows the z-values and t-values corresponding to landmark P-values/confidence levels, where

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$

When σ is known, we use the Normal distribution and the z statistic.
THE ONE-SAMPLE t TEST
As before, a test of hypotheses requires a few steps:
1. Stating the null hypothesis (H0)
2. Deciding on a one-sided or two-sided alternative (Ha)
3. Choosing a significance level α
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with Table C or software
6. Stating the P-value and concluding

We draw a random sample of size n from an N(μ, σ) population.
When σ is estimated from s, the distribution of the test statistic t is a t distribution with df = n - 1.

$$ H_0: \mu = \mu_0, \qquad t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

This resulting t test is robust to deviations from
Normality as long as the sample size is large enough.
The P-value is the probability, if H0 were true, of
randomly drawing a sample like the one obtained or
more extreme in the direction of Ha.
[Figure: P-value shown as tail area under the t distribution for one-sided (one-tailed) and two-sided (two-tailed) alternatives]
USING TABLE C
For Ha: μ > μ0, if n = 10 and t = 2.70, then df = 9 and
2.398 < t = 2.70 < 2.821
so
0.02 > P-value > 0.01
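Software replaces the table lookup with a direct tail-area calculation. The following is a minimal sketch, not a reference implementation: it integrates the t density numerically with Simpson's rule using only the standard library, and the names `t_pdf` and `t_upper_tail` are illustrative assumptions. A statistics package would normally be used instead.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """Approximate P(T >= t) using Simpson's rule on the interval [0, t]."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry half of the area lies above 0; subtract the area from 0 to t.
    return 0.5 - total * h / 3

# One-sided test Ha: mu > mu0 with n = 10 (df = 9) and t = 2.70.
p = t_upper_tail(2.70, 9)
print(f"P(T >= 2.70), df = 9: {p:.4f}")
```

The computed value falls between 0.01 and 0.02, in agreement with the bracketing obtained from Table C. For a two-sided alternative the tail area would be doubled.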
CONFIDENCE INTERVALS
A confidence interval is a range of values computed by a method that captures the true population parameter with probability (confidence level) C.
We have a set of data from a population with both μ and σ unknown. We use x̄ to estimate μ, and s to estimate σ, using a t distribution (df = n - 1).
C is the area between -t* and t*.
We find t* in the t table.
The margin of error m is:

$$ m = t^* \frac{s}{\sqrt{n}} $$

[Figure: t density with area C between -t* and t*, and the margin of error m on either side of the estimate]
Data on the blood cholesterol levels (mg/dl) of 24 lab rats give a sample mean
of 85 and a standard deviation of 12. We want a 95% confidence interval for the
mean blood cholesterol of all lab rats.
df = n - 1 = 23

$$ m = t^* \frac{s}{\sqrt{n}} = (2.069)\left(\frac{12}{\sqrt{24}}\right) = 5.07 $$

x̄ ± m = 85 ± 5.07, that is, 79.9 to 90.1 mg/dl
We are 95% confident that the true mean blood cholesterol
of all lab rats is between 79.9 and 90.1 mg/dl.
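The interval for the lab rats can be reproduced from the summary statistics. This sketch assumes the critical value t* = 2.069 (df = 23, 95% confidence) is read from the t table, as in the example; the function name `t_confidence_interval` is hypothetical.

```python
import math

def t_confidence_interval(mean: float, s: float, n: int, t_star: float):
    """Return (lower, upper, margin) for the interval mean ± t* · s/√n."""
    margin = t_star * s / math.sqrt(n)
    return mean - margin, mean + margin, margin

# 24 lab rats, sample mean 85 mg/dl, standard deviation 12, t* = 2.069 (df = 23).
low, high, m = t_confidence_interval(85, 12, 24, 2.069)
print(round(m, 2), round(low, 1), round(high, 1))  # 5.07 79.9 90.1
```

Passing t* in explicitly keeps the sketch free of any statistical library; with such a library, t* would instead be obtained from the inverse CDF of the t distribution.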
Data on the blood cholesterol levels (mg/dl) of 24 lab rats give a sample mean of 85 and a standard deviation of 12. At a significance level of 0.05, and then at a significance level of 0.10, is there sufficient evidence to support the claim that average blood cholesterol in rats is different from that in mice, μ = 90?
H0: μ = 90
Ha: μ ≠ 90

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{85 - 90}{12/\sqrt{24}} = -2.04 $$

df = 23
0.05 < P < 0.10
The difference is therefore significant at the 0.10 level but not at the 0.05 level.
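The two-sided test for the lab rats can be sketched by comparing |t| with critical values. The two-sided critical values for df = 23 used below (2.069 for α = 0.05 and 1.714 for α = 0.10) are taken from a standard t table and are an addition for illustration; the function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(mean: float, mu0: float, s: float, n: int):
    """Return the one-sample t statistic and its degrees of freedom."""
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

# 24 lab rats: sample mean 85, s = 12, hypothesized mean mu0 = 90.
t, df = one_sample_t(85, 90, 12, 24)
print(round(t, 2), df)  # -2.04 23

# Two-sided critical values for df = 23 (assumed, from a standard t table).
critical = {0.05: 2.069, 0.10: 1.714}
for alpha, t_crit in critical.items():
    decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The same data lead to different decisions at the two significance levels, which is why the level should be chosen before looking at the results.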
MATCHED PAIRS t PROCEDURES
Sometimes we want to compare treatments or conditions at
the individual level. The data sets produced this way are not
independent.
The individuals in one sample are related to those in the other sample.
Pre-test and post-test studies look at data collected on the same sample elements before and after some experiment is performed.
Twin studies often try to sort out the influence of genetic factors by
comparing a variable between sets of twins.
Using people matched for age, sex, and education in social studies
allows us to cancel out the effect of these potential lurking variables.
In these cases, we use the paired data to test for
the difference in the two population means.
The variable studied becomes the difference within each pair, with population mean μdiff (the average difference),
and
H0: μdiff = 0; Ha: μdiff > 0 (or < 0, or ≠ 0)
Conceptually, this is just like a test for one
population mean.
Study Participants: 53 obese children ages 9 to 12 with a BMI above
the 95th percentile for age and gender
Intervention: family counseling sessions on the stoplight diet
(green/yellow/red approach to eating food) - after 8 weekly sessions and
3 follow-up sessions
Assessment: Weight change at 15 weeks of intervention
Was the intervention effective in helping obese children lose
weight?
H0: μ = 0 versus Ha: μ < 0 (one-sided test)

Variable       N    Mean     SE Mean   StDev
Weightchange   53   -2.404   0.720     5.243

$$ t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{-2.404 - 0}{0.72} = -3.34 $$

df = 52, P < 0.005
There is a significant weight loss, on average, following intervention.
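The matched pairs calculation for the weight change data reduces to a one-sample t statistic on the differences. A minimal sketch from the reported summary statistics (the variable names are illustrative):

```python
import math

# Summary of the weight change (after minus before) for the 53 children.
n = 53
mean_diff = -2.404
sd_diff = 5.243

# Standard error of the mean difference, matching the "SE Mean" column.
se = sd_diff / math.sqrt(n)

# Test statistic for H0: mean difference = 0.
t = (mean_diff - 0) / se
df = n - 1

print(round(se, 3), round(t, 2), df)  # 0.72 -3.34 52
```

Only the differences enter the calculation; the separate before and after measurements are not needed once each pair has been reduced to a single number.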

ROBUSTNESS
The t procedures are exactly correct when the population is exactly Normal. This is rare.
The t procedures are robust to small deviations from Normality, but:
The sample must be a random sample from the population.
Outliers and skewness strongly influence the mean and therefore
the t procedures. Their impact diminishes as the sample size gets
larger because of the Central Limit Theorem.
As a guideline:
When n < 15, the data must be close to Normal and without
outliers.
When 15 ≤ n ≤ 40, mild skewness is acceptable, but not
outliers.
When n > 40, the t statistic will be valid even with strong
skewness.
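The sample size guideline can be written as a small lookup, which makes the cutoffs explicit. This is only a sketch of the rule of thumb stated above, with the hypothetical function name `t_procedure_guidance`; it does not replace plotting the data and checking for outliers.

```python
def t_procedure_guidance(n: int) -> str:
    """Return the rule-of-thumb condition on the data for a sample of size n."""
    if n < 1:
        raise ValueError("sample size must be positive")
    if n < 15:
        return "data must be close to Normal and without outliers"
    if n <= 40:
        return "mild skewness is acceptable, but not outliers"
    return "t procedures are valid even with strong skewness"

for n in (9, 24, 53):
    print(n, "->", t_procedure_guidance(n))
```

In every case the sample must still be a random sample from the population; no sample size compensates for a biased sampling design.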
Does oligofructose consumption stimulate calcium absorption?
Healthy adolescent males took a pill for nine days and had their calcium
absorption tested on the ninth day. The experiment was repeated three
weeks later. Subjects received either an oligofructose pill first or a
control sucrose pill first. The order was randomized and the experiment
was double-blind.
Fractional calcium absorption data (in percent of intake) for 11 subjects:

Subject   Control   Oligofructose   Difference (O-C)
1         78.4      62.0            -16.4
2         76.6      95.1             18.5
3         57.4      46.5            -10.9
4         51.5      49.4             -2.1
5         49.0      89.7             40.7
6         46.6      43.8             -2.8
7         44.2      50.3              6.1
8         42.9      51.6              8.7
9         37.2      66.6             29.4
10        34.1      52.7             18.6
11        24.6      54.0             29.4
xbar      49.32     60.15            10.84
s         16.51     17.24            18.15

[Figure: plot of the difference in percent intake (O-C) against score]
Can we use a t inference procedure for this study? Discuss the assumptions.
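Before discussing the assumptions, it can help to recompute the differences and their summary statistics from the raw data. The sketch below uses the standard library `statistics` module; the list names are illustrative, and sorting the differences is just one simple way to look for outliers with n = 11.

```python
import math
import statistics

control = [78.4, 76.6, 57.4, 51.5, 49.0, 46.6, 44.2, 42.9, 37.2, 34.1, 24.6]
oligo = [62.0, 95.1, 46.5, 49.4, 89.7, 43.8, 50.3, 51.6, 66.6, 52.7, 54.0]

# Matched pairs: one difference (oligofructose minus control) per subject.
diffs = [round(o - c, 1) for o, c in zip(oligo, control)]

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)          # sample standard deviation (n - 1)
sem_diff = sd_diff / math.sqrt(len(diffs))

print("differences (sorted):", sorted(diffs))
print(f"mean = {mean_diff:.2f}, s = {sd_diff:.2f}, SEM = {sem_diff:.2f}")
```

The inference is carried out on the 11 differences, so it is their distribution, not that of the control or oligofructose columns, that needs to be roughly Normal and free of outliers.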
Red wine, in moderation
Does drinking red wine in moderation increase blood polyphenol levels, thus maybe protecting against heart attacks?
Nine randomly selected healthy men were assigned to drink half a bottle of red wine daily for two weeks. The percent change in their blood polyphenol levels was assessed:
0.7 3.5 4 4.9 5.5 7 7.4 8.1 8.4
x̄ = 5.5; s = 2.517; df = n - 1 = 8
[Figure: plot of percent change in blood polyphenol level, axis from 1.2 to 8.4]
Can we use a t inference procedure for this study? Discuss the assumptions.
