
STAT 2020

STATISTICS FOR BIOLOGISTS


TOPIC 8: INFERENCE FOR A POPULATION MEAN WITH UNKNOWN STANDARD DEVIATION
OBJECTIVES (PSLS CHAPTER 17)

Inference for the mean of one population (σ unknown)

When σ is unknown
The t distributions
The t test
Confidence intervals
Matched pairs t procedures
Robustness
WHEN σ IS UNKNOWN

The sample standard deviation s provides an estimate of the population
standard deviation σ.

Larger samples give more reliable estimates of σ.

[Figure: a population distribution compared with the distributions of a large sample and a small sample.]


THE t DISTRIBUTIONS

We take 1 random sample of size n from a Normal population N(μ, σ).

When σ is known, the sampling distribution of the mean x̄ is Normal
N(μ, σ/√n), and the statistic

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

follows the standard Normal, N(0,1), distribution.

When σ is estimated from the sample standard deviation s, the statistic

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

follows the t distribution with n - 1 degrees of freedom.


[Figure: density curves of the standard Normal and of t distributions. First panel: df = 1 and df = 4. Second panel: df = 20 and df = 100.]

When n is large, s is a good estimate of σ and the t distribution with
df = n - 1 is close to the standard Normal distribution.
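One way to see this convergence numerically is to compare the height of each curve at its center. The sketch below is illustrative only: `t_pdf` is a helper name chosen here, built from the standard formula for the t density, and the degrees of freedom are the ones named in the figure.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

# Height of the standard Normal curve at 0.
normal_peak = NormalDist().pdf(0.0)

# Height of each t curve at 0 for the df values shown in the figure.
peaks = {df: t_pdf(0.0, df) for df in (1, 4, 20, 100)}

for df, peak in peaks.items():
    print(f"df = {df:>3}: t density at 0 = {peak:.4f} (Normal: {normal_peak:.4f})")
```

A lower peak means more area in the tails, which is why small-df t distributions give larger critical values than the Normal.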
STANDARD DEVIATION VERSUS STANDARD ERROR
For a sample of size n, the sample standard deviation s is:

$$s = \sqrt{\frac{1}{n-1} \sum_{i} (x_i - \bar{x})^2}$$

n - 1 is the degrees of freedom.

The value s/√n is called the standard error of the mean (SEM).
Scientists often present their sample results as the mean ± SEM.
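As a minimal sketch of these two definitions, the code below computes s and the SEM directly from the formulas. It borrows the nine percent-change values from the red wine example at the end of these notes; the variable names are chosen here for illustration.

```python
import math

# Percent change in blood polyphenol level for the nine men
# in the red wine example.
data = [0.7, 3.5, 4, 4.9, 5.5, 7, 7.4, 8.1, 8.4]

n = len(data)
mean = sum(data) / n

# Sample standard deviation: divide the sum of squares by n - 1.
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Standard error of the mean.
sem = s / math.sqrt(n)

print(f"n = {n}, mean = {mean:.1f}, s = {s:.3f}, SEM = {sem:.3f}")
```

The mean and s should agree with the summary values quoted in that example.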

A medical study examined the effect of a new medication on the seated
systolic blood pressure. The results, presented as mean ± SEM for 25
patients, are 113.5 ± 8.9. What is the standard deviation s of the sample data?

SEM = s/√n <=> s = SEM × √n

s = 8.9 × √25 = 44.5
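The same conversion can be written as a short calculation. This is only a sketch of the arithmetic in the blood pressure example; the variable names are chosen here.

```python
import math

n = 25       # number of patients
sem = 8.9    # reported standard error of the mean

# Recover the sample standard deviation from the SEM.
s = sem * math.sqrt(n)

print(f"s = {s:.1f}")
```

Confusing the SEM with s understates the spread of the individual measurements by a factor of √n.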
TABLE C

When σ is unknown, we use a t distribution with n - 1 degrees of freedom (df).

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Table C shows the z-values and t-values corresponding to landmark
P-values/confidence levels.

When σ is known, we use the Normal distribution and the z statistic.
THE ONE-SAMPLE t TEST

As before, a test of hypotheses requires a few steps:

1. Stating the null hypothesis (H0)

2. Deciding on a one-sided or two-sided alternative (Ha)

3. Choosing a significance level α

4. Calculating t and its degrees of freedom

5. Finding the area under the curve with Table C or software

6. Stating the P-value and concluding


We draw a random sample of size n from an N(μ, σ) population.

When σ is estimated from s, the distribution of the test statistic t is a
t distribution with df = n - 1.

H0: μ = μ0

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$


This resulting t test is robust to deviations from
Normality as long as the sample size is large enough.
The P-value is the probability, if H0 were true, of
randomly drawing a sample like the one obtained or
more extreme in the direction of Ha.

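[Figure: P-value as the area under the t curve beyond t = (x̄ - μ0)/(s/√n). One-sided (one-tailed) test: one tail. Two-sided (two-tailed) test: both tails.]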
USING TABLE C:

For Ha: μ > μ0, if n = 10 and t = 2.70, then df = 9 and

2.398 < t = 2.70 < 2.821

so

0.02 > P-value > 0.01
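The Table C bracket can be cross-checked without a table. The sketch below is not the method used in these notes: it integrates the t density numerically with Simpson's rule, using only the Python standard library. The helper names `t_pdf` and `upper_tail` are chosen here for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson-rule area from 0 to t.

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# One-sided test Ha: mu > mu0 with n = 10, so df = 9.
p_value = upper_tail(2.70, df=9)
print(f"P(T > 2.70) with df = 9: {p_value:.4f}")
```

The computed P-value should fall inside the interval read from Table C; statistical software reports the same quantity directly.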
CONFIDENCE INTERVALS
A confidence interval is a range of values, computed by a method that captures
the true population parameter with probability (confidence level) C.

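We have a set of data from a population with both μ and σ unknown. We use x̄
to estimate μ, and s to estimate σ, using a t distribution (df = n - 1).

C is the area between -t* and t*.
We find t* in the t table.
The margin of error m is:

$$m = t^* \frac{s}{\sqrt{n}}$$

[Figure: t density curve with central area C between -t* and t*, and the margin of error m on either side of the center.]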
Data on the blood cholesterol levels (mg/dl) of 24 lab rats give a sample mean
of 85 and a standard deviation of 12. We want a 95% confidence interval for the
mean blood cholesterol of all lab rats.

df = n - 1 = 23

$$m = t^* \frac{s}{\sqrt{n}} = (2.069)\left(\frac{12}{\sqrt{24}}\right) = 5.07$$

x̄ ± m = 85 ± 5.07, that is, 79.9 to 90.1 mg/dl
We are 95% confident that the true mean blood cholesterol
of all lab rats is between 79.9 and 90.1 mg/dl.
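The lab rat interval can be reproduced in a few lines. This is a sketch of the arithmetic only: the critical value 2.069 is the Table C value used in the example (df = 23, 95% confidence), and the variable names are chosen here.

```python
import math

n = 24          # number of lab rats
xbar = 85.0     # sample mean, mg/dl
s = 12.0        # sample standard deviation, mg/dl
t_star = 2.069  # Table C critical value for df = 23 and C = 95%

# Margin of error and the two ends of the interval.
m = t_star * s / math.sqrt(n)
low, high = xbar - m, xbar + m

print(f"m = {m:.2f}; 95% CI: {low:.1f} to {high:.1f} mg/dl")
```

A different confidence level only changes `t_star`; a larger sample shrinks the margin through both √n and the smaller critical value.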
Data on the blood cholesterol levels (mg/dl) of 24 lab rats give a sample mean
of 85 and a standard deviation of 12. At a significance level of 0.05, is there
sufficient evidence to support the claim that average blood cholesterol in rats is
different from that in mice, μ = 90?
Data on the blood cholesterol levels (mg/dl) of 24 lab rats give a sample mean
of 85 and a standard deviation of 12. At a significance level of 0.10, is there
sufficient evidence to support the claim that average blood cholesterol in rats is
different from that in mice, μ = 90?

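H0: μ = 90
Ha: μ ≠ 90

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{85 - 90}{12 / \sqrt{24}} = -2.04$$

df = 23
0.05 < P < 0.10

The P-value lies between the two significance levels, so H0 is rejected at the
0.10 level but not at the 0.05 level.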
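The test statistic and the decision at each significance level can be checked as follows. This is a sketch under stated assumptions: the two critical values are the standard two-sided t table entries for df = 23 (2.069 appears in the confidence interval example; 1.714 is the corresponding value for a two-sided level of 0.10), and the variable names are chosen here.

```python
import math

n, xbar, s = 24, 85.0, 12.0
mu0 = 90.0  # hypothesized mean under H0

# One-sample t statistic and its degrees of freedom.
t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

# Two-sided critical values for df = 23 (t table entries).
t_crit_10 = 1.714  # alpha = 0.10
t_crit_05 = 2.069  # alpha = 0.05

reject_at_10 = abs(t) > t_crit_10
reject_at_05 = abs(t) > t_crit_05

print(f"t = {t:.2f}, df = {df}")
print(f"reject H0 at 0.10: {reject_at_10}; at 0.05: {reject_at_05}")
```

Because |t| falls between the two critical values, the two-sided P-value falls between 0.05 and 0.10.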
MATCHED PAIRS t PROCEDURES
Sometimes we want to compare treatments or conditions at
the individual level. The data sets produced this way are not
independent.

The individuals in one sample are related to those in the other sample.

Pre-test and post-test studies look at data collected on the same sample
elements before and after some experiment is performed.

Twin studies often try to sort out the influence of genetic factors by
comparing a variable between sets of twins.

Using people matched for age, sex, and education in social studies
allows us to cancel out the effect of these potential lurking variables.

In these cases, we use the paired data to test for the difference in the two
population means.

The variable studied becomes the difference within each pair, with mean
μ_diff (the average difference), and

H0: μ_diff = 0; Ha: μ_diff > 0 (or < 0, or ≠ 0)

Conceptually, this is just like a test for one population mean.
Study Participants: 53 obese children ages 9 to 12 with a BMI above
the 95th percentile for age and gender
Intervention: family counseling sessions on the stoplight diet
(green/yellow/red approach to eating food) - after 8 weekly sessions and
3 follow-up sessions
Assessment: Weight change at 15 weeks of intervention
Was the intervention effective in helping obese children lose
weight?

H0: μ = 0 versus Ha: μ < 0 (one-sided test)

Variable       N    Mean     SE Mean  StDev
Weightchange   53   -2.404   0.720    5.243
$$t = \frac{\bar{x} - 0}{s / \sqrt{n}} = \frac{-2.404 - 0}{0.72} = -3.34$$

df = 52, P < 0.005

There is a significant weight loss, on average, following intervention.
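The summary table contains everything needed to rebuild this result. The sketch below checks that the reported SE Mean equals StDev/√N and recomputes t; the variable names are chosen here.

```python
import math

n = 53                # number of children
mean_change = -2.404  # mean weight change
stdev = 5.243         # standard deviation of the weight changes

# Standard error of the mean difference, then the paired t statistic
# for H0: mean difference = 0.
se = stdev / math.sqrt(n)
t = (mean_change - 0) / se
df = n - 1

print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}")
```

Only the within-child differences enter the calculation, so the paired test is a one-sample t test on a single column of numbers.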


ROBUSTNESS

The t procedures are exactly correct when the population is exactly Normal.
This is rare.

The t procedures are robust to small deviations from Normality, but:
The sample must be a random sample from the population.
Outliers and skewness strongly influence the mean and therefore
the t procedures. Their impact diminishes as the sample size gets
larger because of the Central Limit Theorem.

As a guideline:
When n < 15, the data must be close to Normal and without outliers.
When 15 ≤ n ≤ 40, mild skewness is acceptable, but not outliers.
When n > 40, the t statistic will be valid even with strong skewness.
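The guideline can be written as a small lookup, shown here as a sketch. The function name `t_guideline` is chosen here, and treating n = 15 and n = 40 as part of the middle band is an assumption about how the boundaries are meant to be read.

```python
def t_guideline(n):
    """Return the sample-size guideline for using t procedures.

    Boundary handling (15 and 40 placed in the middle band) is an assumption.
    """
    if n < 15:
        return "data must be close to Normal, no outliers"
    if n <= 40:
        return "mild skewness acceptable, no outliers"
    return "valid even for markedly non-Normal data"

# Sample sizes of the worked examples in these notes.
for n in (9, 11, 24, 53):
    print(f"n = {n}: {t_guideline(n)}")
```

The guideline always assumes a random sample; no sample size compensates for biased data collection.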
Does oligofructose consumption stimulate calcium absorption?
Healthy adolescent males took a pill for nine days and had their calcium
absorption tested on the ninth day. The experiment was repeated three
weeks later. Subjects received either an oligofructose pill first or a
control sucrose pill first. The order was randomized and the experiment
was double-blind.

Fractional calcium absorption data (in percent of intake) for 11 subjects:

Subject   Control   Oligofructose   Difference (O-C)
1         78.4      62.0            -16.4
2         76.6      95.1             18.5
3         57.4      46.5            -10.9
4         51.5      49.4             -2.1
5         49.0      89.7             40.7
6         46.6      43.8             -2.8
7         44.2      50.3              6.1
8         42.9      51.6              8.7
9         37.2      66.6             29.4
10        34.1      52.7             18.6
11        24.6      54.0             29.4
xbar      49.32     60.15            10.84
s         16.51     17.24            18.15

[Figure: plot of the difference in percent intake (O-C) against score.]

Can we use a t inference procedure for this study? Discuss the assumptions.
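Before discussing the assumptions, it helps to see that the whole analysis rests on the single column of differences. The sketch below recomputes that column and its summary statistics from the table; the variable names are chosen here.

```python
import math

# Fractional calcium absorption (percent of intake), subjects 1 to 11.
control = [78.4, 76.6, 57.4, 51.5, 49.0, 46.6, 44.2, 42.9, 37.2, 34.1, 24.6]
oligo = [62.0, 95.1, 46.5, 49.4, 89.7, 43.8, 50.3, 51.6, 66.6, 52.7, 54.0]

# Matched pairs: one difference (O - C) per subject.
diffs = [o - c for o, c in zip(oligo, control)]

n = len(diffs)
mean_diff = sum(diffs) / n
s_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))

print(f"n = {n}, mean difference = {mean_diff:.2f}, s = {s_diff:.2f}")
```

With n = 11, the Normality and outlier checks apply to these differences, not to the two original columns.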
Red wine, in moderation
Does drinking red wine in moderation increase blood polyphenol levels,
thus maybe protecting against heart attacks?

Nine randomly selected healthy men were assigned to drink half a bottle of
red wine daily for two weeks. The percent change in their blood polyphenol
levels was assessed:

0.7 3.5 4 4.9 5.5 7 7.4 8.1 8.4

x̄ = 5.5; s = 2.517; df = n - 1 = 8

[Figure: plot of the nine values; axis: Percent change in blood polyphenol level.]

Can we use a t inference procedure for this study? Discuss the assumptions.
