
Biostatistics for Public Health

Chapter 11 - Inference About a Mean

Kevin Brooks MSc., PhD.


Objective

Perform and interpret one-sample, two-sample, and paired t hypothesis tests on means.
✓ Estimated Standard Error of the Mean
✓ Student’s t Distribution
✓ One-Sample t Test
✓ Confidence Interval for μ
✓ Paired Samples
✓ Conditions for Inference



Estimated Standard Error of the Mean

• We rarely know the population standard deviation σ
⇒ instead, we calculate the sample standard deviation s and use it as an estimate of σ
• We then use s to calculate the estimated standard error of the mean:

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$

• Using s instead of σ adds a source of uncertainty
⇒ z procedures no longer apply
⇒ use t procedures instead
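As a minimal sketch of the formula above, the following Python snippet computes the estimated standard error from a sample standard deviation and a sample size. The function name `standard_error` is hypothetical; the illustrative numbers (s = 720, n = 10) are borrowed from the SIDS example used later in this chapter.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Estimated standard error of the mean: s / sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return s / math.sqrt(n)

# Illustration with s = 720 and n = 10 (values from the SIDS example)
se = standard_error(720, 10)
print(round(se, 2))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.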

Student’s t distributions

• A family of distributions identified by “Student” (William Sealy Gosset) in 1908
• t family members are identified by their degrees of freedom, df
• t distributions are similar to z distributions but with broader tails
• As df increases → t tails get skinnier → t becomes more like z
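The "broader tails" claim can be checked numerically. The sketch below evaluates the standard t density formula at x = 3 for a few degrees of freedom and compares it with the standard Normal density. The function names `t_pdf` and `z_pdf` and the choice of x = 3 and df = 1, 9, 100 are illustrative assumptions, not part of the slides.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so large df values do not overflow the gamma function
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def z_pdf(x: float) -> float:
    """Density of the standard Normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Density far out in the tail (x = 3): larger values mean a heavier tail
for df in (1, 9, 100):
    print(f"df = {df:>3}: {t_pdf(3.0, df):.4f}")
print(f"z       : {z_pdf(3.0):.4f}")
```

The tail density shrinks as df grows and approaches the z value, which is the behaviour the last bullet describes.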

[Figure: t probability density functions with 1, 9, and ∞ degrees of freedom.]

t table (Table C)

• Use Table C to look up t values and probabilities
– Entries ⇒ t values
– Rows ⇒ df
– Columns ⇒ probabilities

Understanding Table C

Let $t_{df,\,p}$ ≡ a t value with df degrees of freedom and cumulative probability p. For example, $t_{9,\,0.90} = 1.383$.

Table C. Traditional t table (excerpt)

Cumulative p   0.75    0.80    0.85    0.90    0.95    0.975
Upper-tail p   0.25    0.20    0.15    0.10    0.05    0.025
------------   -----   -----   -----   -----   -----   -----
df = 9         0.703   0.883   1.100   1.383   1.833   2.262

The 10th and 90th percentiles on t9.

Left tail: Pr(T9 < −1.383) = 0.10
Right tail: Pr(T9 > 1.383) = 0.10
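These two tail areas can be reproduced without a table. The sketch below integrates the t density numerically with Simpson's rule; this is one possible approach chosen for illustration, not the method used to build Table C. The helper names `t_pdf` and `t_cdf` and the step count are assumptions.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Cumulative probability Pr(T <= x), via Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # Pr(0 < T < |x|)
    # The distribution is symmetric about 0, so half the mass lies on each side
    return 0.5 + area if x >= 0 else 0.5 - area

left = t_cdf(-1.383, 9)       # Pr(T9 < -1.383)
right = 1 - t_cdf(1.383, 9)   # Pr(T9 > 1.383)
print(round(left, 3), round(right, 3))
```

Both tails come out at about 0.10, matching the table entry for df = 9 and cumulative p = 0.90.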

One-Sample t Test

A. Hypotheses. H0: µ = µ0 vs. Ha: µ ≠ µ0 (two-sided) [or Ha: µ < µ0 (left-sided) or Ha: µ > µ0 (right-sided)]
B. Test statistic.

$$t_{\text{stat}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad \text{with } df = n - 1$$

C. P-value. Convert tstat to a P-value [Table C or software]. Small P ⇒ strong evidence against H0
D. Significance level (optional). See Ch 9 for guidelines.
One-Sample t Test: Statement of the Problem

• Do SIDS babies have lower than average birth weights?
• We know from prior research that the mean birth weight of the non-SIDS babies in this population is 3300 grams
• We study n = 10 SIDS babies, determine their birth weights, and calculate x-bar = 2890.5 grams and s = 720 grams
• Do these data provide significant evidence that SIDS babies have different birth weights than the rest of the population?

One-Sample t Test: Example
A. H0: µ = 3300 versus Ha: µ ≠ 3300 (two-sided)
B. Test statistic

$$t_{\text{stat}} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \frac{2890.5 - 3300}{720 / \sqrt{10}} = -1.80$$

df = n − 1 = 10 − 1 = 9
C. P = 0.1054 [next slide]
Weak evidence against H0
(optional) Data are not significant at α = 0.10
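The arithmetic in step B can be sketched in a few lines of Python. The function name `one_sample_t` is hypothetical; the inputs are the summary statistics stated for the SIDS sample.

```python
import math

def one_sample_t(xbar: float, mu0: float, s: float, n: int):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    se = s / math.sqrt(n)           # estimated standard error of the mean
    return (xbar - mu0) / se, n - 1

# SIDS example: x-bar = 2890.5, mu0 = 3300, s = 720, n = 10
t_stat, df = one_sample_t(xbar=2890.5, mu0=3300, s=720, n=10)
print(round(t_stat, 2), df)
```

The negative sign simply records that the sample mean lies below the hypothesized value of 3300 grams.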
Converting the tstat to a P-value

tstat ⇒ P-value via Table C. Wedge |tstat| between critical value landmarks on Table C. Here |tstat| = 1.80 falls between 1.383 and 1.833, so one-tailed 0.05 < P < 0.10 and two-tailed 0.10 < P < 0.20.

Table C. Traditional t table (excerpt)

Cumulative p   0.75    0.80    0.85    0.90    0.95    0.975
Upper-tail p   0.25    0.20    0.15    0.10    0.05    0.025
------------   -----   -----   -----   -----   -----   -----
df = 9         0.703   0.883   1.100   1.383   1.833   2.262

tstat ⇒ P-value via software. Use a software utility to determine that a t of −1.80 with 9 df has a two-tailed P-value of 0.1054.
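The slides do not say which software utility was used. As one possible stand-in, the sketch below obtains the two-tailed P-value by numerically integrating the t density with Simpson's rule, using only the Python standard library. The names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_stat: float, df: float, steps: int = 2000) -> float:
    """Two-tailed P-value: Pr(|T| > |t_stat|), via Simpson's rule."""
    a = abs(t_stat)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3         # Pr(-|t| < T < |t|), by symmetry
    return 1 - central

p = two_sided_p(-1.80, 9)
print(round(p, 4))
```

The result agrees with the 0.1054 quoted above and falls inside the 0.10 < P < 0.20 bracket obtained from Table C.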

[Figure: Two-tailed P-value, SIDS illustrative example]

Confidence Interval for µ
$$(1 - \alpha)100\%\ \text{CI for } \mu = \bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

• Typical “point estimate ± margin of error” formula
• $t_{n-1,\,1-\alpha/2}$ is from the t table (see bottom row for conf. level)
• Similar to z procedure except uses s instead of σ
• Similar to z procedure except uses t instead of z
• Alternative formula:

$$\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot SE_{\bar{x}} \quad \text{where } SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
Confidence Interval: Example 1

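Let us calculate a 95% confidence interval for μ for the birth weight of SIDS babies.

x̄ = 2890.5, s = 720.0, n = 10

$$\begin{aligned}
95\%\ \text{CI for } \mu &= \bar{x} \pm t_{10-1,\,1-0.05/2} \cdot \frac{s}{\sqrt{n}} \\
&= 2890.5 \pm 2.262 \cdot \frac{720}{\sqrt{10}} \\
&= 2890.5 \pm 515.1 \\
&= (2375.4 \text{ to } 3405.6) \text{ grams}
\end{aligned}$$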
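The same interval can be sketched in Python. The function name `t_confidence_interval` is hypothetical, and the critical value is typed in from Table C rather than computed. With the three-decimal table value the margin comes out near 515.0; the 515.1 shown above presumably reflects a less heavily rounded t value, so the endpoints may differ in the last digit.

```python
import math

def t_confidence_interval(xbar: float, s: float, n: int, t_crit: float):
    """Return (lower, upper, margin) for xbar ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin, margin

# t_crit = 2.262 is the Table C entry for df = 9, cumulative p = 0.975
lower, upper, margin = t_confidence_interval(2890.5, 720.0, 10, 2.262)
print(f"{lower:.1f} to {upper:.1f} grams (margin {margin:.1f})")
```

Note that the interval contains 3300 grams, which fits with the non-significant two-sided test result for these data.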
Confidence Interval: Example 2

Data are “% of ideal body weight” in 18 diabetics: {107, 119, 99, 114, 120, 104, 88, 114, 124, 116, 101, 121, 152, 100, 125, 114, 95, 117}. Based on these data we calculate a 95% CI for µ.

x̄ = 112.778, s = 14.424, n = 18

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{14.424}{\sqrt{18}} = 3.400$$

$$t_{n-1,\,1-\alpha/2} = t_{18-1,\,1-0.05/2} = t_{17,\,0.975} = 2.110 \quad \text{(from t table)}$$

$$\bar{x} \pm (t_{n-1,\,1-\alpha/2})(SE_{\bar{x}}) = 112.778 \pm (2.110)(3.400) = 112.778 \pm 7.17 = (105.6,\ 120.0)$$
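Starting from the raw data rather than the summary statistics, the calculation might look like the sketch below. It uses Python's standard `statistics` module; the variable names are illustrative, and the critical value 2.110 is taken from the t table as quoted above rather than computed.

```python
import math
import statistics

# "% of ideal body weight" for the 18 diabetics listed above
pct_ideal = [107, 119, 99, 114, 120, 104, 88, 114, 124,
             116, 101, 121, 152, 100, 125, 114, 95, 117]

n = len(pct_ideal)
xbar = statistics.mean(pct_ideal)
s = statistics.stdev(pct_ideal)      # sample SD (n - 1 in the denominator)
se = s / math.sqrt(n)
t_crit = 2.110                       # t(17, 0.975), from the t table
lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(f"mean={xbar:.3f} s={s:.3f} SE={se:.3f} CI=({lower:.1f}, {upper:.1f})")
```

`statistics.stdev` is the sample standard deviation; `statistics.pstdev` would divide by n instead and is not what the t procedure calls for.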


Paired Samples

• Paired samples: Each point in one sample is matched to a unique point in the other sample
• Pairs can be achieved via sequential samples within individuals (e.g., pre-test/post-test), cross-over trials, and matching procedures
• Also called “matched-pairs” and “dependent samples”

Example: Paired Samples

• A study addresses whether oat bran reduces LDL cholesterol with a cross-over design.
• Subjects “cross over” from a cornflake diet to an oat bran diet.
– Half of the subjects start on CORNFLK, half on OATBRAN
– Two weeks on diet 1
– Measure LDL cholesterol
– Washout period
– Switch diet
– Two weeks on diet 2
– Measure LDL cholesterol
Example, Data
Subject  CORNFLK  OATBRAN
-------  -------  -------
1        4.61     3.84
2        6.42     5.57
3        5.40     5.85
4        4.54     4.80
5        3.98     3.68
6        3.82     2.96
7        5.01     4.41
8        4.34     3.72
9        3.80     3.49
10       4.56     3.84
11       5.35     5.26
12       3.89     3.73
Calculate Difference Variable “DELTA”

• Step 1 is to create difference variable “DELTA”
• Let DELTA = CORNFLK − OATBRAN
• Order of subtraction does not materially affect results (but does change the sign of the differences)
• Positive values represent lower LDL on oat bran
• Here are the first three observations:

ID    CORNFLK  OATBRAN  DELTA
----  -------  -------  -----
1     4.61     3.84      0.77
2     6.42     5.57      0.85
3     5.40     5.85     -0.45
↓     ↓        ↓         ↓
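A minimal sketch of this step in Python, using the twelve pairs from the data table. The list names `cornflk`, `oatbran`, and `delta` are chosen to mirror the column names and are otherwise arbitrary.

```python
# LDL cholesterol for subjects 1-12 on each diet (from the data table)
cornflk = [4.61, 6.42, 5.40, 4.54, 3.98, 3.82, 5.01, 4.34, 3.80, 4.56, 5.35, 3.89]
oatbran = [3.84, 5.57, 5.85, 4.80, 3.68, 2.96, 4.41, 3.72, 3.49, 3.84, 5.26, 3.73]

# DELTA = CORNFLK - OATBRAN, pair by pair; rounding removes floating-point noise
delta = [round(c - o, 2) for c, o in zip(cornflk, oatbran)]
print(delta)
```

Pairing with `zip` keeps each subject's two measurements together, which is the whole point of the paired design: the analysis proceeds on the single DELTA column.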
Explore DELTA Values
Here are all twelve paired differences (DELTAs):

0.77, 0.85, −0.45, −0.26, 0.30, 0.86, 0.60, 0.62, 0.31, 0.72, 0.09, 0.16

[Figure: plot of the DELTA values, horizontal axis from −0.5 to 1.5]

EDA shows a slight negative skew, a median of about 0.45, with results varying from −0.45 to 0.86.
Descriptive stats for DELTA

• Data (DELTAs): 0.77, 0.85, −0.45, −0.26, 0.30, 0.86, 0.60, 0.62, 0.31, 0.72, 0.09, 0.16
• The subscript d will be used to denote statistics for the difference variable DELTA

n = 12
$\bar{x}_d = 0.3808$
$s_d = 0.4335$
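These summary statistics can be reproduced with Python's standard `statistics` module, as in the sketch below. The variable names are illustrative; the median, minimum, and maximum are included only as a quick exploratory check.

```python
import statistics

# The twelve paired differences (CORNFLK - OATBRAN)
delta = [0.77, 0.85, -0.45, -0.26, 0.30, 0.86,
         0.60, 0.62, 0.31, 0.72, 0.09, 0.16]

n = len(delta)
mean_d = statistics.mean(delta)
sd_d = statistics.stdev(delta)       # sample SD (n - 1 in the denominator)
median_d = statistics.median(delta)
print(n, round(mean_d, 4), round(sd_d, 4),
      round(median_d, 3), min(delta), max(delta))
```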
95% Confidence Interval for µd

• A t procedure directed toward the DELTA variable calculates the confidence interval for the mean difference.

$$(1 - \alpha)100\%\ \text{CI for } \mu_d = \bar{x}_d \pm t_{n-1,\,1-\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$$

• “Oat bran” data: for 95% confidence use $t_{12-1,\,1-0.05/2} = t_{11,\,0.975} = 2.201$ (from Table C)

$$95\%\ \text{CI for } \mu_d = 0.3808 \pm 2.201 \cdot \frac{0.4335}{\sqrt{12}} = 0.3808 \pm 0.2754 = (0.105 \text{ to } 0.656)$$
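The interval can be checked from the summary statistics with a short sketch like the one below. The variable names are illustrative, and the critical value 2.201 is typed in from Table C as quoted above. The final check, whether the interval excludes 0, is an added illustration rather than something taken from the slide.

```python
import math

# Summary statistics for DELTA
n, mean_d, sd_d = 12, 0.3808, 0.4335
t_crit = 2.201                       # t(11, 0.975), from Table C

margin = t_crit * sd_d / math.sqrt(n)
lower, upper = mean_d - margin, mean_d + margin
print(f"{mean_d} ± {margin:.4f} = ({lower:.3f} to {upper:.3f})")

# A 95% interval that excludes 0 points the same way as a two-sided
# test of "no mean difference" that is significant at alpha = 0.05
excludes_zero = lower > 0 or upper < 0
print(excludes_zero)
```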
Paired t Test

• Similar to one-sample t test
• µ0 is usually set to 0, representing “no mean difference”, i.e., H0: µd = 0
• Test statistic:

$$t_{\text{stat}} = \frac{\bar{x}_d - \mu_0}{s_d / \sqrt{n}}, \quad df = n - 1$$
Paired t Test: Example, “Oat bran” data

A. Hypotheses. H0: µd = 0 vs. Ha: µd ≠ 0
B. Test statistic.

$$t_{\text{stat}} = \frac{\bar{x}_d - \mu_0}{s_d / \sqrt{n}} = \frac{0.38083 - 0}{0.4335 / \sqrt{12}} = 3.043$$

df = n − 1 = 12 − 1 = 11
C. P-value. P = 0.011 (via computer). The evidence against H0 is statistically significant.
D. Significance level (optional). The evidence against H0 is significant at α = 0.05 but is not significant at α = 0.01
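For readers who want to trace the test from the raw paired data, here is a minimal sketch using only the Python standard library. Variable names are illustrative. Rather than computing the P-value, it compares |tstat| with the Table C critical value 2.201 for df = 11, which only addresses significance at α = 0.05.

```python
import math
import statistics

# LDL cholesterol for subjects 1-12 on each diet (from the data table)
cornflk = [4.61, 6.42, 5.40, 4.54, 3.98, 3.82, 5.01, 4.34, 3.80, 4.56, 5.35, 3.89]
oatbran = [3.84, 5.57, 5.85, 4.80, 3.68, 2.96, 4.41, 3.72, 3.49, 3.84, 5.26, 3.73]

delta = [c - o for c, o in zip(cornflk, oatbran)]
n = len(delta)
se_d = statistics.stdev(delta) / math.sqrt(n)
t_stat = (statistics.mean(delta) - 0) / se_d   # mu0 = 0: "no mean difference"
df = n - 1

t_crit_05 = 2.201                              # t(11, 0.975), from Table C
print(round(t_stat, 3), df, abs(t_stat) > t_crit_05)
```

A paired t test is just a one-sample t test applied to the differences, which is why no two-sample machinery appears here.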
SPSS Output: Oat Bran data
● USE SAS TO PRODUCE THIS EXAMPLE LIVE

Conditions for Inference

t procedures require these conditions:

• SRS (individual observations or DELTAs)
• Valid information (no information bias)
• Normal population or large sample (central limit theorem)

The Normality Condition

• The Normality condition applies to the sampling distribution of the mean, not the population.
• Therefore, it is OK to use t procedures when:
– The population is Normal
– The population is not Normal but is symmetrical and n is at least 5 to 10
– The population is skewed and n is at least 30 to 100 (depending on the extent of the skew)

Can t procedures be used?

• If the dataset is skewed and small: avoid t procedures
• If the dataset has a mild skew and is moderate in size: use t procedures
• If the dataset is highly skewed and small: avoid t procedures

Thank You for Viewing

Kevin Brooks MSc., PhD.


MPH Program,
Division of Public Health
College of Human Medicine
Michigan State University
brooks52@msu.edu