Geert Verbeke
Biostatistical Centre, K.U.Leuven
geert.verbeke@med.kuleuven.be
http://perswww.kuleuven.be/geert verbeke
Introduction to Biostatistics
Contents
Part I: Introduction, motivation and example
  Chapter 1: Introductory material
  Chapter 2: Homeopathy: The test
Part II: Basic principles of statistical methodology
  Chapter 3: What is statistics ?
  Chapter 4: Population versus sample
  Chapter 5: Causality and randomization
Part III: Describing and summarizing data
  Chapter 6: Types of outcomes
  Chapter 7: Graphical presentation of data
  Chapter 8: Summary statistics
Part IV: Basic concepts of statistical inference
  Chapter 9: Describing the population
  Chapter 10: From the population to the sample, and back to the population
Bibliography
Part I
Introduction, motivation and example
Chapter 1
Introductory material
. Motivation
. Course material
. Evaluation system
1.1 Motivation
Master thesis
Statistics in the (bio-)medical literature
Correct analysis of collected data
Correct interpretation of results
1.2 Course material
. ...
. Online:
http://ucs.kuleuven.be/links/index.htm
. Local installation:
http://ucs.kuleuven.be/java/download/download.html
and follow instructions
1.3 Evaluation system
Part A:
Chapter 2
Homeopathy: The test
. The controversy
. The movie
. Blinding
. Placebo
. The ultimate experiment
. The statistics
. Errors in statistics
2.1 The controversy
2.2 The movie
2.3 Blinding
(J.Randy)
Bias can be introduced if the scientist knows what samples are being investigated.
This can be avoided by blinding
Blinding is obtained by randomly assigning codes to the samples/treatments.
The codes are broken after all data have been collected.
The less objective the measurements are, the more important blinding is:
. Survival of the patient is an objective measure
. Tumour reduction is a semi-objective measure
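The random assignment of codes can be sketched in a few lines of Python. This is only an illustration: the function name `assign_blinding_codes`, the code format `S01`, `S02`, ... and the seed are assumptions, not part of the course material.

```python
import random

def assign_blinding_codes(samples, seed=None):
    """Attach a random code to every sample.

    Returns the list of codes (what the investigator sees) and the key
    (code -> true content), which stays sealed until all data are collected.
    """
    rng = random.Random(seed)
    codes = [f"S{i:02d}" for i in range(1, len(samples) + 1)]
    rng.shuffle(codes)                 # random link between codes and samples
    key = dict(zip(codes, samples))    # the secret key, broken only at the end
    return sorted(key), key

# Hypothetical example: five active samples and five placebo samples.
samples = ["active"] * 5 + ["placebo"] * 5
codes, key = assign_blinding_codes(samples, seed=2024)
```

The investigator only works with `codes`; the dictionary `key` is consulted after the measurements are complete.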
2.4 Placebo
The fact that treated patients improve is no evidence for the efficacy of the
treatment:
. Natural improvement can occur
. Improvement can be the result of the attention given to the patient
Hence, showing efficacy of a treatment requires comparison to placebo
This explains the popularity of placebo-controlled trials in the biomedical sciences
Is the use of placebo ethical ?
In cases where the use of placebo is considered unethical, the new treatment is
often compared to a standard one.
The aim of the study is then to show that the new treatment is at least as
good as the standard treatment.
2.5 The ultimate experiment
2 × 5 tubes are prepared, with 5C dilution: the first five starting from the active
substance, the second five starting from pure water
These 10 tubes are given a random label: blinding
The tubes are diluted further to obtain 2 × 20 dilutions of 18C
New labels are assigned, in order to rule out any form of fraud.
A sample of living human cells is added to a drop of each tube.
How many cells have been activated by the different test solutions ?
Measurements are performed by two different labs in parallel.
Labs were told there were 20 active solutions and 20 placebo solutions. This
was done to prevent researchers from classifying all solutions as non-active.
2.6 The statistics
                        Reality
Decision        Homeopathy   Placebo   Total
Homeopathy          11           9       20
Placebo              9          11       20
Total               20          20       40
                        Reality
Decision        Homeopathy   Placebo   Total
Homeopathy          10          10       20
Placebo             10          10       20
Total               20          20       40
This random variability is also reflected in the results of the first lab:
                        Reality
Decision        Homeopathy   Placebo   Total
Homeopathy          11           9       20
Placebo              9          11       20
Total               20          20       40
In general, repeating an experiment would rarely lead to exactly the same results
By random chance, one may obtain results which are slightly different from 10/20
How much is slightly ? 11/20 ? 12/20 ? 13/20 ? ...
What number x of correct positive test results should have been obtained in order
to consider this sufficient evidence in favour of H ?
The answer should be based on the probability of having at least x correct positive
test results by pure chance
Such probabilities can be calculated using probability theory:
x                                             11       12       13       14       15
P(at least x correct, by pure chance)       0.3762   0.1715   0.0564   0.0128   0.0019
Note how unlikely it would be to observe, e.g., x = 15 correct positive test results
if H and P were equivalent (p = 0.0019)
Therefore, observing x = 15 could be considered strong evidence in favour of H
On the other hand, if there is no difference between H and P, one will observe at
least 11 correct positive test results in 37.62% of the cases, by pure random
chance.
Our experiment therefore does not provide any evidence in favour of H
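The probabilities quoted above can be reproduced with a short calculation. The sketch below assumes that a lab labels exactly 20 of the 40 tubes as active (it was told there were 20 of each) and that, under pure chance, every such choice is equally likely. This gives a hypergeometric model; that model is an assumption of this sketch, although the resulting values agree with the figures reported on the slides. The function name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(x, n_active=20, n_placebo=20, n_called_active=20):
    """P(at least x truly active tubes among those called active), by pure chance."""
    total = comb(n_active + n_placebo, n_called_active)
    upper = min(n_active, n_called_active)
    favourable = sum(
        comb(n_active, k) * comb(n_placebo, n_called_active - k)
        for k in range(x, upper + 1)
    )
    return favourable / total

for x in range(11, 16):
    print(x, round(prob_at_least(x), 4))
```

For x = 11 this yields 0.3762, the 37.62% mentioned in the text, and for x = 15 it yields 0.0019.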
2.7 Errors in statistics
Conclusion:
Part II
Basic principles of statistical methodology
Chapter 3
What is statistics ?
. Examples
. Conclusion
3.1
               Sickness absence
Gender         No      Yes     Total
female         245     184      429
male            98      58      156
Total          343     242      585
Research question:
Is there a relation between absence and gender ?
If this would be very unlikely, then the data provide evidence for a relation
between gender and absence
If this would not be unlikely, then the data provide no evidence for such a relation.
3.2
                     Disease status
Age           Cervical cancer   Control   Total
≤ 25                 42           203      245
> 25                  7           114      121
Total                49           317      366
Research question:
Is there a relation between cancer and age ?
Note that 7/49 = 14.3% of the cancer cases had their first pregnancy after the
age of 25 years, while this is 114/317 = 35.96% in the control group
This suggests that cancer is more likely to occur when the first pregnancy was
before the age of 25 years.
How likely are the observed differences to occur by pure chance, if there is no
relation at all between cancer and age at first pregnancy ?
If the chance of observing this would be extremely small, then the observed data
provide evidence that there indeed is a relationship
If this chance is high, then the above data do not provide evidence for any relation
at all.
3.3
Dataset Weightgain:
High protein   Low protein
    134             70
    146            118
    104            101
    119             85
    124            107
    161            132
    107             94
     83
    113
    129
     97
    123
Average (g)    High protein: 120    Low protein: 101
It would be of interest to know how likely such differences are to occur by pure
chance, i.e., if weight gains would be completely unrelated to protein intake.
If this is very unlikely, the above data provide evidence that weight gains really
depend on the diet.
If such differences are likely to occur by pure chance, then the above data do not
provide evidence that weight gains show any relation with protein intake.
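One way to make "how likely by pure chance" concrete is a permutation argument: if diet were unrelated to weight gain, the labels "high" and "low" could be shuffled freely over the 19 observations. The sketch below does this for the weight gains listed above (12 high-protein and 7 low-protein values, which reproduce the reported averages of 120 and 101). The permutation approach, the number of shuffles and the seed are assumptions of this illustration, not necessarily the method used later in the course.

```python
import random
from statistics import mean

high = [134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123]
low = [70, 118, 101, 85, 107, 132, 94]

observed = mean(high) - mean(low)   # observed difference in average weight gain

def permutation_p_value(a, b, n_perm=10_000, seed=1):
    """Share of random relabellings with a difference at least as large as observed."""
    rng = random.Random(seed)
    pooled = a + b
    obs = abs(mean(a) - mean(b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= obs:
            hits += 1
    return hits / n_perm

p_value = permutation_p_value(high, low)
```

A small `p_value` would indicate that differences of this size rarely arise by chance alone; a large one would indicate that they are quite common.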
3.4
Dataset Cancer:
Stomach   Bronchus   Colon   Ovary   Breast
  124        81       248    1234     1235
   42       461       377      89       24
   25        20       189     201     1581
   45       450      1843     356     1166
  412       246       180    2970       40
   51       166       537     456      727
 1112        63       519             3808
   46        64       455              791
  103       155       406             1804
  876       859       365             3460
  146       151       942              719
  340       166       776
  396        37       372
            223       163
            138       101
             72        20
            245       283
Average (days)
Stomach: 286
Bronchus: 211.6
Colon: 457.4
Ovary: 884.3
Breast: 1395.9
3.5
Dataset Captopril
               Before          After
Patient      SBP    DBP      SBP    DBP
   1         210    130      201    125
   2         169    122      165    121
   3         187    124      166    121
   4         160    104      157    106
   5         167    112      147    101
   6         176    101      145     85
   7         185    121      168     98
   8         206    124      180    105
   9         173    115      147    103
  10         146    102      136     98
  11         174     98      151     90
  12         201    119      168     98
  13         198    106      179    110
  14         148    107      129    103
  15         154    100      131     82
Average:
Diastolic before: 112.3
Diastolic after: 103.1
Systolic before: 176.9
Systolic after: 158.0
It would be of interest to know how likely the observed changes in BP are to occur
by pure chance.
If this is very unlikely, the above data provide evidence that BP indeed decreases
after treatment with Captopril. Otherwise, the above data do not provide evidence
for efficacy of Captopril.
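Because every patient is measured twice, the natural quantity is the change within each patient. The sketch below computes these changes for the systolic pressures and, as a rough illustration only, the probability that all patients would show a decrease if decreases and increases were equally likely. This sign-based argument is an assumption of the sketch, not necessarily the analysis used in the course.

```python
from statistics import mean

sbp_before = [210, 169, 187, 160, 167, 176, 185, 206, 173, 146, 174, 201, 198, 148, 154]
sbp_after = [201, 165, 166, 157, 147, 145, 168, 180, 147, 136, 151, 168, 179, 129, 131]

# Change per patient: negative values mean that the blood pressure went down.
changes = [after - before for before, after in zip(sbp_before, sbp_after)]

mean_change = mean(changes)
n_decreased = sum(c < 0 for c in changes)

# If each patient were equally likely to go up or down, independently of the others,
# the chance that every single patient goes down would be (1/2)^n.
p_all_decrease = 0.5 ** len(changes)
```

Here all 15 patients show a lower systolic pressure after treatment, and the average change equals the difference between the two reported averages (158.0 versus 176.9).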
3.6
                    Severe colds at age 14
Severe colds
at age 12           Yes      No      Total
Yes                 212     144       356
No                  256     707       963
Total               468     851      1319
Research question:
Is the prevalence of severe colds different at the two ages ?
At age 12, 356/1319 = 27% of the children reported severe colds.
At age 14, this percentage equals 468/1319 = 35%
These data suggest that the prevalence of severe colds increases with age.
It would be of interest to know how likely the observed change in prevalence is to
occur by pure chance.
If this is very unlikely, the above data provide evidence that the prevalence indeed
changes with age. Otherwise, the above data do not provide evidence for such a
change.
Note that the data structure is similar to the one in the Captopril data, in the
sense that subjects are measured twice at different time points:
3.7
Available measurements:
Note that the potential relation between BP and log(dose) makes it difficult to
disentangle their respective relations to the recovery time.
Dataset Surgery
Minor non-thoracic        Major non-thoracic        Thoracic
log(dose)   BP   Time     log(dose)   BP   Time     log(dose)   BP   Time
  2.26      66     7        1.74      68    26        2.45      84    15
  1.81      52    10        1.60      63    16        1.72      66     8
  1.78      72    18        2.15      65    23        2.37      68    46
  1.54      67     4        2.26      72     7        2.23      65    24
  2.06      69    10        1.65      58    11        1.92      69    12
  1.74      71    13        1.63      69     8        1.99      72    25
  2.56      88    21        2.40      70    14        1.99      63    45
  2.29      68    12        2.70      73    39        2.35      56    72
  1.80      59     9        1.90      56    28        1.80      70    25
  2.32      73    65        2.78      83    12        2.36      69    28
  2.04      68    20        2.27      67    60        1.59      60    10
  1.88      58    31        1.74      84    10        2.10      51    25
  1.18      61    23        2.62      68    60        1.80      61    44
  2.08      68    22        1.80      64    22
  1.70      69    13        1.81      60    21
  1.74      55     9        1.58      62    14
  1.90      67    50        2.41      76     4
  1.79      67    12        1.65      60    27
  2.11      68    11        2.24      60    26
  1.72      59     8        1.70      59    28
3.8 Conclusion
Chapter 4
Population versus sample
. The population
. The (random) sample
. Statistics versus probability theory
. Types of studies
. Random samples ⇒ variability ⇒ uncertainty
4.1 Introduction
4.2 The population
4.3 The (random) sample
4.3.1
Consider a study on risk factors for the prevalence of low back pain in nurses
Suppose that interest is in the population of all nurses in all Belgian hospitals
Data sets with the following characteristics would be problematic:
. Only female nurses
4.3.2
Example:
4.3.3
Obviously, a data set in which all subjects satisfy the in- and exclusion criteria of
the population will not necessarily allow generalizations of observed effects/trends
to the total population.
Ideally, the sample should be a perfect reflection of the total population of interest
This can only be realized by taking a completely random selection from the total
population
Imbalances for some variables then only occur in small samples, and by pure
chance.
[Figure: a RANDOM SAMPLE drawn from the POPULATION]
4.4 Statistics versus probability theory
4.4.1 Probability theory
Question:
If 100 patients are given the treatment,
what is the probability that fewer than 60 of them
will experience an improvement ?
Probability theory aims at predicting the outcome of an experiment, knowing the
population
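Once the population is assumed known, such a question is a pure calculation. The sketch below assumes, purely for illustration, that each patient independently improves with probability 0.70. This value is hypothetical (the population proportion is not given in this part of the notes), as is the function name `prob_fewer_than`.

```python
from math import comb

def prob_fewer_than(k, n, p):
    """P(X < k) when X counts improvements among n independent patients,
    each improving with probability p (binomial model)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))

p_improve = 0.70   # hypothetical population proportion, assumed known
answer = prob_fewer_than(60, 100, p_improve)
```

Changing `p_improve` shows how strongly the answer depends on what is assumed about the population.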
4.4.2 Statistics
4.4.3 Conclusion
[Figure: a RANDOM SAMPLE is drawn from the POPULATION; STATISTICS leads from the sample back to the population]
4.5 Types of studies
4.5.1 Introduction
4.5.2
. Treated patients are examined 1 month after the start of the treatment
. Cancer patients are followed after chemotherapy and the outcome of interest is
the time until disease progression.
. Datasets weightgain, cancer, captopril
Retrospective: Subjects having a particular outcome or endpoint are identified
and studied. Often, measurements from the past are of interest
Examples:
4.5.3
. Cancer patients are followed after chemotherapy and the outcome of interest is
the time until disease progression.
. Datasets Captopril, weightgain, cancer
Observational: The data are collected on a routine basis, and no new
experiment is set up
Examples:
It is often not clear from which population observational data can be believed to
be sampled.
For example, consider observational data collected on a routine basis on all
patients treated in a university hospital
For example, consider the data collected by an occupational health service, on a
routine basis, during a specific year
1. Agriculture / fisheries              1.1     2.7
2. Energy / water                       0.2     0.5
3. Minerals / chemistry                 0.7     2.2
4.                                      2.4     4.8
5. Other industry                      10.7    10.2
6. Construction                         0.6     1.0
7.                                     16.4    21.5
8. Transportation / communication       1.0     3.6
9.                                      6.0     8.3
10.                                    60.9    45.2
Missing                                 6.1
4.5.4
. Study in which the heights of children at the age of 12 years are related to the
heights of their parents
. Datasets on sickness absence, or cervical cancer
Longitudinal: The outcome of interest is measured repeatedly over time
Example: Captopril and Severe cold data
4.5.5 Clinical trial
A rigorously designed experiment aiming at finding the best treatment for future
patients in a specific condition
All aspects are pre-specified in the study protocol
Typically, a group of patients is randomly allocated to one of a number of treatments, after
which the outcome(s) of interest are measured
These are the only studies that are accepted by regulatory agencies that approve marketing of treatments
The random allocation allows causal interpretation of observed treatment effects (see later)
4.5.6 Cohort study
4.5.7 Case-control study
Suppose we want to investigate the relation between smoking and lung cancer
One may select a group of smokers and a group of non-smokers, and follow them
for a (long) period of time
The outcome of interest is the incidence of lung cancer.
A potential dataset is:
                      Lung cancer
Smoking status     Yes      No     Total
Yes                 42     203      245
No                   7     114      121
Total               49     317      366
Such a study would take a very long time to conduct, as one has to wait until a
sufficiently high number of cancer cases has been observed.
One therefore often conducts a case-control study, in which a number of cases
(cancers) and controls (non-cancers) are selected, which are questioned about
their smoking behaviour in the past.
This potentially may lead to the same data.
Note that, since the number of sampled cases and controls is pre-defined, such a
study design does not allow the estimation of the prevalence of lung cancer.
However, the case-control study does allow one to study the relation between risk
factors and the prevalence of some disease (see later).
4.5.8
For example, matching for age and gender can be done as follows:
. Sample the required number of cases
. For each case, select a control with the same age and gender as the case
In some situations, one may want to match multiple controls to each case, or
multiple cases to each control.
Ideally, one would like to match for as many factors as possible.
However, matching on too many factors complicates the search for appropriate
controls.
4.6 Random samples ⇒ variability ⇒ uncertainty
A sample needs to be taken randomly such that it represents the total
population well. Only then can valid conclusions be drawn
Note however, that different random samples will include different subjects, with
different observations
Hence, each new random sample or, equivalently, each new experiment will lead to
(slightly) different conclusions, implying that, sometimes, wrong conclusions will
be drawn
Note that absolute certainty cannot be expected as conclusions are based on only
a small part (the sample) from the total, infinitely large, population.
Conclusion:
Chapter 5
Causality and randomization
. Causal effects
. Methods of randomization
. Randomization not always possible
5.1 Causal effects
What if:
Any difference between both groups, other than the treatment, may explain the
observed difference in efficacy.
In such cases a difference between H and P should not automatically be ascribed
to the treatment.
In general, an observed effect is not necessarily a causal effect in the sense that
the difference in treatment can be interpreted as the cause of the observed
difference in response.
The only way to ensure that treatment groups are comparable with respect to all
known and unknown factors is to assign treatments to subjects in a completely
random way.
This is randomization
Groups then only differ with respect to the treatments they received
Small imbalances can occur by pure chance, in small studies
Cause ⇒ Effect
5.2 Methods of randomization
5.2.1 Simple randomization
spinning
5.2.2 Block randomization
5.2.3 Stratified randomization
With relatively small studies, (serious) imbalance can be obtained by pure chance.
Stratified randomization can be used to ensure complete balance, at least with
respect to some measured important prognostic factors.
For example, suppose gender and age are believed to be strongly related to the
outcome of interest.
Ideally, the two treatment groups would have exactly the same age and gender
distribution.
This can be realized by using separate (block) randomization for each combination
of age with gender.
Hence, separate randomization lists are to be constructed for each combination of
age with gender.
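The construction of separate randomization lists per stratum can be sketched as follows. This is a minimal illustration under stated assumptions: two treatments labelled "A" and "B", blocks of size 4 in which each treatment appears twice, and two hypothetical age categories. None of these choices, nor the function names, come from the course material.

```python
import random

def permuted_block_list(n_blocks, rng, treatments=("A", "B"), per_block=2):
    """Randomization list made of blocks; within each block every treatment
    occurs exactly per_block times, in random order."""
    allocation = []
    for _ in range(n_blocks):
        block = list(treatments) * per_block
        rng.shuffle(block)
        allocation.extend(block)
    return allocation

def stratified_randomization(strata, n_blocks, seed=None):
    """One independent block-randomization list for every stratum."""
    rng = random.Random(seed)
    return {stratum: permuted_block_list(n_blocks, rng) for stratum in strata}

# Hypothetical strata: every combination of gender with an age category.
strata = [(gender, age) for gender in ("female", "male") for age in ("younger", "older")]
lists = stratified_randomization(strata, n_blocks=3, seed=7)
```

A new subject is assigned the next unused treatment on the list of his or her own stratum, so that after every completed block the two treatments are balanced within that stratum.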
In practice, one often would like to stratify for as many factors as possible.
However, stratification on too many factors may lead to many incomplete blocks
implying that the balance hoped for cannot be realized
Some extreme versions of stratified randomization are:
. Twin studies: both subjects are assigned randomly to the two treatments
5.3 Randomization not always possible
5.3.1 Example 1
Suppose one wants to study the effect of chemotherapy in women, on the unborn
baby and its evolution after birth.
Ideally, one would randomize pregnant women into two groups
. Group 1: receives chemotherapy
5.3.2 Example 2
Suppose one wants to study the relation between smoking and lung cancer.
Ideally, one would randomly subdivide subjects into two groups:
. Group 1: subjects have to smoke many cigarettes, daily, during many years.
In such studies, one will select a group of cancer cases, and a comparable group of
non-cancer cases.
All subjects are questioned about their smoking behaviour in the past.
This will still allow one to study the association between smoking and the occurrence
of lung cancer
This is an example of a case-control study
Note that detected associations should not be interpreted as causal
5.3.3 Implications
Imbalance with respect to some important prognostic factors cannot be ruled out
Imbalances with respect to measured known factors can be corrected for by
appropriate statistical techniques
However, as one cannot correct for the imbalance with respect to unknown or
unmeasured factors, causality can still not be concluded from such analyses.
For example, one will never be able to show any causality in the relation between
smoking and lung cancer.
Part III
Describing and summarizing data
Chapter 6
Types of outcomes
. Qualitative data
. Quantitative data
6.1 Qualitative data
6.2 Quantitative data
Chapter 7
Graphical presentation of data
. One variable
. Multiple variables
7.1
7.2
Histogram:
7.3
7.4
Scatterplot:
7.5
Categorized histograms:
Bubble plot:
Chapter 8
Summary statistics
. Introduction
. Measures of location
. Measures of spread
. Percentages
. Geometric mean and standard deviation
. Missing data
. Graphical representation
. Examples from the biomedical literature
8.1 Introduction
8.2 Measures of location
Location measures:
Where are the observations more or less located ?
As an example, consider the small sample:
1, 3, 3, 4, 5, 14
Average (mean):
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1 + 3 + 3 + 4 + 5 + 14}{6} = 5$
Median: the middle value of the ordered sample
$1,\ 3,\ \underbrace{3,\ 4}_{\frac{3+4}{2} = 3.5},\ 5,\ 14$
Mode
For symmetric data, the average and the median are the same. In general, they
are not:
[Figure: two distributions. Symmetric: Median = Mean. Skewed: Median and Mean differ.]
With skewed data, the mean can be heavily influenced by the random presence of
a/some extreme observation(s).
In order to still get a good idea about the location of the data, one then prefers
the use of the median over the mean:
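The sensitivity of the mean to a single extreme observation is easy to check numerically. The sketch below uses the small sample from this section and a second, hypothetical sample in which the largest value 14 is replaced by 140.

```python
from statistics import mean, median

sample = [1, 3, 3, 4, 5, 14]
extreme = [1, 3, 3, 4, 5, 140]   # hypothetical: the largest value is made ten times larger

summary = {
    "sample": (mean(sample), median(sample)),
    "extreme": (mean(extreme), median(extreme)),
}
```

The mean moves from 5 to 26, while the median stays at 3.5 in both samples.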
8.3 Measures of spread
Measures of spread:
How similar are the observations ?
[Figure: two possible configurations of the observations x1, x2, ..., xn]
1, 3, 3, 4, 5, 14
$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}) = \frac{-4 - 2 - 2 - 1 + 0 + 9}{6} = \frac{0}{6} = 0$
Mean quadratic deviation:
$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{(-4)^2 + (-2)^2 + (-2)^2 + (-1)^2 + 0^2 + 9^2}{6} = \frac{106}{6} = 17.67$
Sample variance:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{106}{5} = 21.2$
Sample standard deviation:
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{21.2} = 4.60$
Sample range:
$R = \max_i x_i - \min_i x_i = 14 - 1 = 13$
Note that the range strongly depends on the sample size n: Larger samples are
more likely to contain extreme observations, hence are more likely to have a larger
range
Since we hope that our measure of spread reflects the amount of variation in the
population, we prefer a measure that does not depend on the sample size.
The sample interquartile range is the range obtained after deletion of the 25%
highest and 25% lowest values in the sample:
1, 3, 3, 4, 5, 14 → 3, 3, 4, 5 → IQR = 5 − 3 = 2
The interquartile range does not depend on the sample size n, since a larger
number of observations is deleted in larger samples.
The variance (hence also mean quadratic deviation and standard deviation), and
the range are very sensitive to outliers:
1, 3, 3, 4, 5, 14:  $s^2 = 21.2$,  $R = 13$
1, 3, 3, 4, 5, 20:  $s^2 = 48.8$,  $R = 19$
1, 3, 3, 4, 5, 26:  $s^2 = 88.4$,  $R = 25$
With skewed data, the standard deviation can be heavily influenced by the random
presence of a/some extreme observation(s).
In order to still get a good idea about the variation in the data, one then prefers
the use of the interquartile range over the standard deviation:
Symmetric data ⇒ Standard deviation
Skewed data ⇒ IQR
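The measures of spread from this section can be recomputed for the three small samples used above. The sketch relies on Python's `statistics` module for the sample variance (divisor n − 1). The helper `trimmed_iqr` is a hypothetical implementation of the description given in the text (drop the 25% lowest and 25% highest values, rounded down to whole observations); statistical software usually computes quartiles by interpolation and may return slightly different values.

```python
from statistics import stdev, variance

def sample_range(xs):
    """Largest minus smallest observation."""
    return max(xs) - min(xs)

def trimmed_iqr(xs):
    """Range of what remains after dropping the lowest and highest quarter."""
    xs = sorted(xs)
    k = len(xs) // 4                 # number of observations dropped at each end
    kept = xs[k:len(xs) - k]
    return kept[-1] - kept[0]

results = {}
for largest in (14, 20, 26):
    sample = [1, 3, 3, 4, 5, largest]
    results[largest] = (
        variance(sample),            # s^2, divisor n - 1
        stdev(sample),               # s
        sample_range(sample),        # R
        trimmed_iqr(sample),         # IQR
    )
```

The variances are 21.2, 48.8 and 88.4 and the ranges 13, 19 and 25, while the interquartile range equals 2 for all three samples: only the first two measures react to the extreme value.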
8.4 Percentages
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \ldots + x_n}{n} = \frac{\text{Number of people with sickness absence}}{n}$
Hence, the average equals the observed proportion (percentage) of people with
sickness absence
Note that, once the average is known, the original observations are known, hence
also the variability:
[Figure: three samples of n = 6 binary (0/1) observations x1, ..., x6, with averages x̄ = 0.5, x̄ = 0.16 and x̄ = 0.84]
For such 0/1 data, the sample variance equals
$s^2 = \frac{n}{n-1}\,\bar{x}(1 - \bar{x})$
Since the variance directly follows from the average, only the mean is reported, and no
measure of spread
In general, measures of location and spread are only used for quantitative
(continuous) variables.
Other variables are described by observed frequencies and percentages.
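That the variance of a 0/1 variable follows directly from its average can be verified numerically. The sketch below codes sickness absence as 1 for "Yes" and 0 for "No", using the counts 103 and 153 reported for this variable in this chapter; the function name `binary_variance` is hypothetical.

```python
from statistics import mean, variance

def binary_variance(x_bar, n):
    """Sample variance of n observations coded 0/1 with average x_bar."""
    return n / (n - 1) * x_bar * (1 - x_bar)

# 103 people with sickness absence (1) and 153 without (0).
absence = [1] * 103 + [0] * 153

x_bar = mean(absence)                          # observed proportion
from_data = variance(absence)                  # computed from the raw 0/1 values
from_mean = binary_variance(x_bar, len(absence))
```

Both routes give the same value, so reporting the percentage (here about 40.23%) already conveys the variability.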
For example, the variables sickness absence and cancer type could be
summarized as follows:
Variable                      (n = 256)
Sickness:       Yes           103 (40.23%)
                No            153 (59.77%)
Cancer type:    Breast         79 (30.86%)
                Stomach        26 (10.16%)
                Bronchus       83 (32.42%)
                Colon          58 (22.66%)
                Ovary          10 (3.90%)
8.5 Geometric mean and standard deviation
The mean and standard deviation are used to describe symmetric data
In case of skewness, alternatives such as median and IQR are used
An alternative is to transform the original data such that the transformed
observations are symmetric
A special, frequently occurring, case is when symmetry is obtained using a
logarithmic transformation
As an example, we consider the survival times of cancer patients, and we restrict
to the patients with stomach cancer
Summary statistics:
Often, skewness in the direction of the large values can be solved with a
logarithmic transformation:
X = survival time → Y = ln(X) = ln(survival time)
Stomach
    X     Y = ln(X)
   124      4.82
    42      3.74
    25      3.22
    45      3.81
   412      6.02
    51      3.93
  1112      7.01
    46      3.83
   103      4.63
   876      6.78
   146      4.98
   340      5.83
   396      5.98
Outcome                  Stomach cancer
                         mean (stand.dev.)
Survival time (days)     144.03 (3.49)
with 144.03 = exp(4.97) and 3.49 = exp(1.25),
which is very different from the arithmetic mean and standard deviation
that were reported before:
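The geometric mean and geometric standard deviation are obtained by summarizing the log-transformed data and transforming back. A minimal sketch for the stomach cancer survival times:

```python
from math import exp, log
from statistics import mean, stdev

stomach = [124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396]

logs = [log(x) for x in stomach]     # Y = ln(X)
mean_log = mean(logs)                # mean on the log scale
sd_log = stdev(logs)                 # standard deviation on the log scale

geometric_mean = exp(mean_log)       # back-transformed mean
geometric_sd = exp(sd_log)           # back-transformed standard deviation

arithmetic_mean = mean(stomach)      # for comparison
```

The log-scale summaries round to 4.97 and 1.25. The back-transformed values can differ slightly in the last digits from those in the table, because the table exponentiates the rounded values. The arithmetic mean of the same data is 286 days.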
8.6 Missing data
Variable                  (n = 203)
Sickness:     Yes         103 (50.74%)
              No          100 (49.26%)

Variable                  (n = 256)
Sickness:     Yes         103 (40.23%)
              No          100 (39.07%)
              Missing      53 (20.70%)
8.7 Graphical representation
8.8 Examples from the biomedical literature
. Bar plot
. Categorized for 3 groups
. Scatterplot
. Jittered to avoid overlapping symbols
. Categorized for 42 groups
. Lines are probably means
. Categorization of continuous variables
. Explicit acknowledgement of missing values
Wu [13], Figure 2:
. Geometric means
. IQR instead of geometric standard deviations
Part IV
Basic concepts of statistical inference
Chapter 9
Describing the population
. Stochastic variable
. Discrete probability distribution
. Continuous probability distribution
. Summary characteristics for probability distributions
. The normal distribution
9.1 Stochastic variable
At most, one can say that, e.g., it is more likely that this subject will have a BMI
between 20 and 25 than a BMI larger than 35
So, the realized value of X depends on random variability
Our sample x1, x2, . . . , x321 can be considered as n = 321 realizations of the same
random variable X, for subject i, i = 1, 2, . . . , 321.
Drawing the sample can be viewed as performing n = 321 small experiments, each
time selecting one subject and measuring this subject's BMI, leading to the
realized value xi of X
How likely it is to observe certain values or certain ranges of values is described by
the probability distribution
Similar to the classification of observations, one can classify random variables as
qualitative, quantitative, discrete, continuous,. . .
Other examples:
Experiment               Random variable          Type of variable
Selecting one Belgian    . Weight                 . Quantitative, continuous
                         . Height                 . Quantitative, continuous
                         . Gender                 . Qualitative, dichotomous
Throwing a die                                    . Quantitative, discrete
                                                  . Quantitative, discrete
                         . Percentage of women    . Quantitative, disc./cont. ?
                         . Average age            . Quantitative, continuous
                                                  . Quantitative, disc./cont. ?
9.2 Discrete probability distribution
X =
These probabilities are the percentages of 0s and 1s one would observe if the
experiment were repeated over and over again.
Hence, we need to describe the observations one would observe in an experiment
of size n = +∞
We will do this in exactly the same way as how discrete observations were
described before, i.e., using the Bar plot:
[Figure: bar plot of the probabilities $\pi_0$ and $\pi_1$ for the outcomes 0 (No) and 1 (Yes), with $\pi_0 + \pi_1 = 1$]
For example, if the experiment is to throw a die, and X is the result of one throw,
then the probability distribution of X is given by:
$x_i$:                       1     2     3     4     5     6
$\pi_i = P(X = x_i)$:       1/6   1/6   1/6   1/6   1/6   1/6
Graphically:
[Figure: bar plot with $\pi_1 = \pi_2 = \pi_3 = \pi_4 = \pi_5 = \pi_6 = \frac{1}{6}$]
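The interpretation of these probabilities as long-run percentages can be illustrated by simulation: as the number of throws grows, the observed relative frequency of each face settles near 1/6. The sketch below is only an illustration; the function name `relative_frequencies`, the numbers of throws and the seed are arbitrary choices.

```python
import random
from collections import Counter

def relative_frequencies(n_throws, seed=0):
    """Relative frequency of each face in n_throws simulated throws of a fair die."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_throws))
    return {face: counts[face] / n_throws for face in range(1, 7)}

for n_throws in (60, 600, 60_000):
    freqs = relative_frequencies(n_throws)
    print(n_throws, [round(freqs[face], 3) for face in range(1, 7)])
```

With few throws the frequencies can differ noticeably from 1/6; with many throws they become nearly equal, mimicking an experiment of infinite size.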
$\pi_i = P(X = x_i)$:
Graphically:
[Figure: bar plot of the probabilities $\pi_i$]
9.3 Continuous probability distribution
To study this, we draw samples from our population, and study the behaviour of
the histogram of BMI, when the sample size increases
Six samples will be drawn, with sample sizes:
. n = 10
. n = 20
. n = 50
. n = 100
. n = 500
. n = 5000
For each sample, the histogram of the observed BMI values is constructed
We will use histograms with interval width equal to 1
Obviously, the obtained histogram becomes smoother as the sample size increases
Eventually, the histogram becomes a smooth function, f (x), called the density
function of the random variable X:
Since we started from histograms with interval width equal to 1, all histograms had
the property that the area of a bar represented the proportion of
observations in the corresponding interval
This property is now carried over to the density function:
[Figure: density function $f(x)$; the area under the curve between a and b equals $P(a \le X \le b)$]
9.4
The probability distribution can be viewed as an extension of the bar plot and the
histogram to the total population, or equivalently, an infinite sample
It describes how likely specific values are to be observed when randomly drawing
from the population
Similarly, we can now define measures of location and spread for the total
population.
These are the measures of location and spread one would observe if the total
population were measured, i.e., in an infinite sample
. population average: $\mu$
. population variance: $\sigma^2$
Note that, similarly to the probability distribution, the population versions of the
measures of location and spread are theoretical concepts, as one will never
observe or measure them.
Indeed, in practice, one only observes a finite sample, from which it is possible to
calculate the sample-based versions, such as the sample average $\bar{x}$ and the
sample variance $s^2$:
             Population            Sample
             (never observable)    (observable)
Location:    $\mu$                 $\bar{x}$
Spread:      $\sigma^2$            $s^2$
9.5
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
$X \sim N(0, 1)$
$$X \sim N(0, 1) \;\Longrightarrow\; \mu + \sigma X \sim N(\mu, \sigma^2)$$
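$$P(\mu - \sigma \le X \le \mu + \sigma) = P\left(-1 \le \frac{X-\mu}{\sigma} \le 1\right) = 68.27\%$$
$$P(\mu - 1.96\sigma \le X \le \mu + 1.96\sigma) = P\left(-1.96 \le \frac{X-\mu}{\sigma} \le 1.96\right) = 95\%$$
$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = P\left(-2 \le \frac{X-\mu}{\sigma} \le 2\right) = 95.45\%$$
$$P(\mu - 3\sigma \le X \le \mu + 3\sigma) = P\left(-3 \le \frac{X-\mu}{\sigma} \le 3\right) = 99.73\%$$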
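These percentages can be checked numerically. The sketch below is only an illustration in Python (not part of the course software); it uses the standard normal distribution, because after standardizing the probabilities no longer depend on $\mu$ and $\sigma$. The function name `central_probability` is introduced here for illustration.

```python
from statistics import NormalDist

# Standard normal distribution N(0, 1); any N(mu, sigma^2) gives the same
# probabilities once distances are expressed in units of sigma.
Z = NormalDist(mu=0.0, sigma=1.0)

def central_probability(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normally distributed X."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"k = {k}: {100 * central_probability(k):.2f}%")
```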
Chapter 10
From the population to the sample, and back to the
population
10.1
The probability distribution describes how likely specific values are to be observed
when randomly drawing from the population
Also, the probability distribution summarizes how the data in an infinitely large
sample would be distributed.
Hence, when a sufficiently large random sample is drawn from that population,
one expects the observed histogram to be close to the probability distribution.
This is probability theory
[Figure: a RANDOM SAMPLE drawn from the POPULATION; samples can be RANDOM or NOT RANDOM]
10.2
In statistics, the observations in the sample are used to learn about the
population.
Obviously, in order for the sample to teach us something about the population, the
sample needs to be drawn randomly
This procedure, in which information from the sample is used to draw conclusions
about the population, is called statistical inference or estimation
[Figure: POPULATION and RANDOM SAMPLE; the distribution of X in the population is unknown (?)]
10.3
Example: BMI
[Figures: histogram of BMI; possible transformations]
From now on, Y = ln(BMI) will be assumed normally distributed, with mean 3.21
and variance $0.15^2$, and probabilities of interest can be calculated
For example, the proportion of males in the population who are overweight
(BMI > 25) equals:
$$P(\text{BMI} > 25) = P(Y > \ln 25)$$
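This probability can be evaluated as follows. The sketch assumes exactly the rounded values quoted above (mean 3.21, standard deviation 0.15 for Y = ln(BMI)); with these rounded inputs the result is close to, but not identical with, the 47.34% quoted later in the course, which was presumably obtained from unrounded estimates.

```python
from math import log
from statistics import NormalDist

# Assumed model from the text: Y = ln(BMI) ~ N(3.21, 0.15^2) (rounded values).
Y = NormalDist(mu=3.21, sigma=0.15)

# BMI > 25 is the same event as Y > ln(25).
p_overweight = 1.0 - Y.cdf(log(25))
print(f"P(BMI > 25) = {100 * p_overweight:.1f}%")
```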
[Figure: POPULATION with distribution of BMI; RANDOM SAMPLE]
10.4
Normal, or reference, values are often used in the reporting of clinical test results
95% normal values are the values c1 and c2 such that 95% of the total population
falls in between those values:
[Figure: distribution with 95% of the area between c1 and c2]
With clinical test results, the normal values are with respect to the normal
(healthy) population
Example:
The probability that a randomly selected, healthy, patient has a value within the
95% normal values is by definition 95%.
When two independent parameters are measured, the probability that a randomly
selected, healthy, patient has both parameters within the respective 95% normal
value ranges equals
P (Both parameters within the 95% normal range) = 0.95 × 0.95 = 0.9025
Hence, combining two sets of 95% normal values leads to a region which contains
only 90.25% of the total population.
In general, one has:
P (k parameters within the 95% normal range) = $0.95^k$
Some values:
k        $0.95^k$
1        0.9500
2        0.9025
5        0.7738
10       0.5987
20       0.3585
50       0.0769
100      0.0059
Hence, for 100 tests, we have almost certainty that at least one parameter will
take a value outside its 95% normal range.
Obviously, one can use higher percentages (e.g., 99% instead of 95%), but the
problem of multiple testing remains.
Note that the above calculations assume the tested parameters to be independent.
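The table can be reproduced with a few lines of Python. This is only a sketch; it relies on the same independence assumption, and the function name `all_within_range` is introduced here for illustration.

```python
# Probability that all k independent parameters fall within their normal range.
def all_within_range(k: int, coverage: float = 0.95) -> float:
    return coverage ** k

for k in (1, 2, 5, 10, 20, 50, 100):
    p_all = all_within_range(k)
    print(f"k = {k:3d}: all within = {p_all:.4f}, at least one outside = {1 - p_all:.4f}")
```

Passing `coverage=0.99` shows that higher percentages slow the decrease but do not remove it.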
For example, suppose that a normal value for parameter 1 always leads to a
normal value for parameter 2, and vice versa, we would have that
P (Both parameters within the 95% normal range)
= P (The first parameter within the 95% normal range) = 0.95
Alternatively, suppose that a normal value for parameter 1 always leads to a
value for parameter 2 which is outside its normal range, we would have that
P (Both parameters within the 95% normal range) = 0
Conclusion:
Normal values need to be interpreted with extreme caution
Chapter 11
Estimation, sampling variability, bias, and precision
. Estimation
. Example
. Sampling variability
. Bias and precision
. Sampling distribution of the sample average
. Standard error of the mean
11.1
Estimation
One does not always have to estimate the complete distribution of a random
variable X
Often, interest is in specific characteristics of the distribution, such as the
population average $\mu$
One can then try to draw conclusions about $\mu$, based on the observed data in the
sample, without having to specify the distribution of X
Since the population characteristics (mean, median, variance, . . . ) are summary
statistics in an infinitely large sample, it is natural to estimate them using the
sample versions.
Summarized:
Population parameter      Estimate
$\mu$                     $\hat{\mu} = \bar{x}$
$\sigma^2$                $\hat{\sigma}^2 = s^2$
population median         sample median
population IQR            sample IQR
...                       ...
Note that it is very unlikely that the estimate is identical to the parameter it is
estimating
How close the estimate will be to the true value depends on various aspects.
Some key results will be explained in the next sections
[Figure: POPULATION with distribution of X and characteristics $\mu$, $\sigma^2$ of $f(x)$; RANDOM SAMPLE with estimates $\hat{\mu} = \bar{x}$ for $\mu$ and $\hat{\sigma}^2 = s^2$ for $\sigma^2$]
11.2
Example: BMI
Re-consider the random sample of n = 2605 Belgian males, with the outcome X
of interest being the body mass index (BMI):
We previously described the distribution of BMI with a normal distribution for the
log-transformed values, and we estimated the percentage of people in the
population who are overweight (BMI > 25) to be 47.34%
Note that this percentage equals $\pi = P(X > 25)$, which is a characteristic of the
BMI distribution in the Belgian male population
We can estimate this by the observed proportion $\hat{\pi}$ of overweight males in the
sample:
$$\hat{\pi} = \frac{\text{number of males with } x_i > 25}{2605} = 46.99\%$$
Note that $\hat{\pi}$ is a new estimate for $\pi = P(X > 25)$, which does not require
estimating the whole distribution of BMI in the total population.
If our new estimate had been very different from the previous one
(47.34%), this would have been some indication that our estimation of the BMI
distribution was not accurate.
[Figure: POPULATION with distribution of BMI; characteristic of $f(x)$: $\pi = P(X > 25)$; RANDOM SAMPLE with estimate for $\pi$: $\hat{\pi}$ = observed proportion of males with BMI > 25]
11.3
Sampling variability
Mean:        $\theta = \mu$         $\hat{\theta} = \bar{x}$
Variance:    $\theta = \sigma^2$    $\hat{\theta} = s^2$
In general:  $\theta$               $\hat{\theta}$
The estimate $\hat{\theta}$ is calculated from the observed data, hence the resulting value for
$\hat{\theta}$ completely depends on the sample that was drawn from the population.
Repeating the experiment would lead to another sample, other observations, thus
also to another estimate $\hat{\theta}$ for the parameter $\theta$.
The estimate $\hat{\theta}$ can therefore be interpreted as one realized value of a random
variable $\hat{\Theta}$
The distribution of $\hat{\Theta}$ is called the sampling distribution of $\hat{\Theta}$.
It describes what values of $\hat{\theta}$ are to be expected should the experiment be
repeated many times.
In general, the sampling distribution of $\hat{\Theta}$ depends on:
. The statistic $\hat{\Theta}$:
different for mean, median, variance, . . .
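Sampling variability can be made visible by repeating the experiment on a computer. The sketch below is an illustration only: the population is hypothetical (100,000 values drawn from a normal distribution with mean 25 and standard deviation 4, chosen here purely for convenience), and the helper `draw_estimates` is a name introduced for this example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population (an assumption made for illustration only;
# it is not the BMI population from the text).
population = [random.gauss(25.0, 4.0) for _ in range(100_000)]

def draw_estimates(n: int) -> tuple[float, float, float]:
    """Draw one random sample of size n and return (mean, median, variance)."""
    sample = random.sample(population, n)
    return (statistics.mean(sample), statistics.median(sample),
            statistics.variance(sample))

# Repeating the experiment: every new sample gives new estimates.
repeats = [draw_estimates(50) for _ in range(200)]
means = [r[0] for r in repeats]
print("first five sample means:", [round(m, 2) for m in means[:5]])
print("st. dev. of the 200 sample means:", round(statistics.stdev(means), 3))
```

A histogram of `means` would approximate the sampling distribution of the sample average for n = 50; the medians and variances in `repeats` can be inspected in the same way.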
11.4
Bias and precision
Example:
Distribution of $\hat{\Theta}$
. Asymmetric
. Unlikely to have serious underestimation
. Likely to have serious overestimation
. On average, our estimate will be correct
Example:
Distribution of $\hat{\Theta}$
. Symmetric
Example:
Distribution of $\hat{\Theta}$
. Symmetric
Example:
Distribution of $\hat{\Theta}$
. Symmetric
Example:
Distribution of $\hat{\Theta}$
. Symmetric
. On average, our estimate is not correct
. We have a biased estimator $\hat{\theta}$ for the parameter $\theta$
11.5
Sampling distribution of the sample average
Simulation steps:
Sample 1    $\bar{x}^{(1)}$
Sample 2    $\bar{x}^{(2)}$
Sample 3    $\bar{x}^{(3)}$
Sample 4    $\bar{x}^{(4)}$
Sample 5    $\bar{x}^{(5)}$
...         ...
[Figures: simulation results]
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
This is the Central Limit Theorem (CLT), which will be the basis for most
calculations from now on
This implies that, for sufficiently large samples, $\bar{x}$ is an unbiased estimate for $\mu$,
which is more precise as the sample gets larger
What is sufficiently large? The simulation results have shown that this entirely
depends on the distribution of the original data X. Hence, no generally valid
answer can be given.
One can also use similar simulation studies to investigate the sampling distribution
of other statistics such as the median, the variance, . . . However, no general
results can be derived as in the CLT
For the variance, such simulations (Vestac Java Applet → basics → distribution
of variance) show that . . .
. the sample variance $s^2$ is unbiased for $\sigma^2$
. the precision of $s^2$ increases with n
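A simulation of this kind can also be sketched in a few lines of Python, as an alternative to the applet. The population used here is an assumption made for illustration: an exponential distribution with rate 0.2, so that $\mu = 5$ and $\sigma^2 = 25$. The function `simulate` is a name introduced for this example.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Assumed skewed population: exponential with rate 0.2 (mu = 5, sigma^2 = 25).
MU, SIGMA2, RATE = 5.0, 25.0, 0.2

def simulate(n: int, repetitions: int = 2000) -> tuple[float, float, float]:
    """Return (mean of x-bar, variance of x-bar, mean of s^2) over many samples."""
    averages, variances = [], []
    for _ in range(repetitions):
        sample = [random.expovariate(RATE) for _ in range(n)]
        averages.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))  # divisor n - 1
    return (statistics.fmean(averages), statistics.variance(averages),
            statistics.fmean(variances))

for n in (5, 30):
    mean_xbar, var_xbar, mean_s2 = simulate(n)
    print(f"n = {n:2d}: mean of x-bar = {mean_xbar:.2f} (mu = {MU}), "
          f"var of x-bar = {var_xbar:.2f} (sigma^2/n = {SIGMA2 / n:.2f}), "
          f"mean of s^2 = {mean_s2:.1f} (sigma^2 = {SIGMA2})")
```

The averages of $\bar{x}$ and $s^2$ should be close to $\mu$ and $\sigma^2$, and the variance of $\bar{x}$ close to $\sigma^2/n$; whether the histogram of the averages already looks normal for a given n is a separate question that depends on the population.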
These results (and the CLT) are the key motivation for conducting large studies,
since collecting additional information (more observations, larger sample) will lead
to increased precision in the estimation:
11.6
Standard error of the mean
Average s.e.m.
Chapter 12
Confidence intervals
. Example
. The confidence interval
. Interpretation
. Properties of confidence intervals
. Example
. Example from the biomedical literature
12.1
Example
Consider the Captopril data, where blood pressure was taken in 15 hypertensive
patients, before and after administration of the drug Captopril:
Patient   Before DBP   After DBP   Change $x_i$
   1         130          125           5
   2         122          121           1
   3         124          121           3
   4         104          106          -2
   5         112          101          11
   6         101           85          16
   7         121           98          23
   8         124          105          19
   9         115          103          12
  10         102           98           4
  11          98           90           8
  12         119           98          21
  13         106          110          -4
  14         107          103           4
  15         100           82          18
Note that, in relatively small samples, the histogram can be difficult to interpret.
One therefore prefers not to estimate the complete distribution of X
On the other hand, there does not seem to be strong evidence for severe skewness.
Focus will be on the estimation of the average $\mu$ of X. As before, our estimate
will be the sample average:
$\hat{\mu} = \bar{x} = 9.27$
Since every other sample would have led to another estimate $\hat{\mu}$, it is of interest
to know how likely it is that our estimate is far from the true value
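The estimate can be recomputed from the data table. The sketch below, in Python, takes the change as before minus after, so that a positive value is a decrease in DBP; it also computes the sample variance $s^2$, which is used in the calculations that follow.

```python
import statistics

# Diastolic blood pressure (DBP) before and after Captopril, patients 1..15,
# copied from the data table.
before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]

# Change x_i = before - after (positive values mean the DBP went down).
change = [b - a for b, a in zip(before, after)]

x_bar = statistics.mean(change)      # sample average, the estimate for mu
s2 = statistics.variance(change)     # sample variance (divisor n - 1)
print(f"changes: {change}")
print(f"x-bar = {x_bar:.2f}, s^2 = {s2:.2f}")
```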
[Figure: POPULATION with distribution of X; characteristic of $f(x)$: $\mu$, the average change in diastolic BP; RANDOM SAMPLE with estimate for $\mu$: $\hat{\mu} = \bar{x} = 9.27$]
12.2
The confidence interval
The CLT describes what values for $\bar{x}$ are to be expected if one would repeatedly
draw new samples. If n is sufficiently large, we have that:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
So, thanks to the CLT, we can calculate how likely it is to have an estimate far
from the correct value, or close to the correct value
$$P(-1 \le \bar{X} - \mu \le 1) = P\left(\frac{-1}{\sqrt{\sigma^2/n}} \le \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \le \frac{1}{\sqrt{\sigma^2/n}}\right)$$
The above calculations can be repeated for other distances between $\bar{x}$ and $\mu$:
Distance $|\bar{x} - \mu|$    Probability
1                             35%
2                             63%
3                             82%
4.36                          95%
6.25                          99%
For example, 99% of the random samples would yield a sample average that is not
further away from $\mu$ than 6.25 units
So, there is 99% chance that the interval $[\bar{x} - 6.25; \bar{x} + 6.25]$ contains $\mu$.
The interval $[\bar{x} - 6.25; \bar{x} + 6.25]$ is called the 99% confidence interval (C.I.)
for $\mu$.
Confidence level    Confidence interval
35%                 $[\bar{x} - 1; \bar{x} + 1]$          [8.27; 10.27]
63%                 $[\bar{x} - 2; \bar{x} + 2]$          [7.27; 11.27]
82%                 $[\bar{x} - 3; \bar{x} + 3]$          [6.27; 12.27]
95%                 $[\bar{x} - 4.36; \bar{x} + 4.36]$    [4.91; 13.63]
99%                 $[\bar{x} - 6.25; \bar{x} + 6.25]$    [3.02; 15.52]
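These probabilities follow from the CLT. The sketch below is a Python illustration that assumes $\sigma^2$ may be replaced by the sample variance $s^2 = 74.21$ of the 15 observed changes, which is only an approximation for such a small sample. Under this approximation the half-width 6.25 corresponds to a coverage slightly above 99%.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the Captopril example; sigma^2 is replaced by s^2 = 74.21,
# which is an approximation when n is as small as 15.
x_bar, s2, n = 9.27, 74.21, 15
se = sqrt(s2 / n)  # standard deviation of the sample average under the CLT
Z = NormalDist()

def coverage(distance: float) -> float:
    """P(|X-bar - mu| <= distance) under the normal approximation."""
    return Z.cdf(distance / se) - Z.cdf(-distance / se)

for d in (1, 2, 3, 4.36, 6.25):
    print(f"[{x_bar - d:5.2f}; {x_bar + d:5.2f}]  coverage = {100 * coverage(d):.1f}%")
```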
12.3
Interpretation
Let us focus on the 95% confidence interval. For other confidence levels, the
interpretation is similar.
We derived that 95% of the random samples would yield a sample average $\bar{x}$ that
is not further away from $\mu$ than 4.36 units
So, one can expect that approximately 95 out of 100 samples would lead to an
interval $[\bar{x} - 4.36; \bar{x} + 4.36]$ that contains $\mu$.
For a specific data set, such as the Captopril data, the obtained confidence interval
[4.91; 13.63] may or may not contain $\mu$. However it is very likely to contain $\mu$,
since only 5 out of 100 data sets would lead to an interval not containing $\mu$.
Illustration: Vestac Java Applet → statistical tests → confidence interval for mean
12.4
Properties of confidence intervals
Ideally, C.I.s are short, as this reflects a very precise estimation of the unknown
population parameter
Hence, a C.I. can be used as an indication of the precision of the estimation:
. short C.I.: precise estimation
Confidence level    Confidence interval
95%                 [4.91; 13.63]
99%                 [3.02; 15.52]
Intuitively: larger intervals are more likely to contain the unknown population
parameter
The length of the C.I. decreases with the sample size n
Illustration: Vestac Java Applet → statistical tests → confidence interval for mean
[Figure: precise estimation of $\mu$ versus imprecise estimation of $\mu$]
12.5
Example: BMI
The concept of C.I. has been explained in the context of the estimation of a
population average
However, C.I.s can be constructed for any characteristic of the distribution of
the random variable X of interest (variance, mean, median, proportion)
As an example, we re-consider the BMI data on n = 2605 Belgian males, for which we
estimated the proportion $\pi$ of males in the population who are overweight (BMI > 25)
by the observed proportion $\hat{\pi} = 46.99\%$
As an indication of the precision of this estimate, we can calculate, e.g., a 95%
C.I. for $\pi$: [0.45; 0.49]
The interval [0.45; 0.49] contains the unknown proportion $\pi$ with 95% probability
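The text does not state how this interval was obtained. One common approach, assumed in the Python sketch below, treats the observed proportion as the sample average of a 0/1 variable and applies the normal approximation; with the numbers from the text it reproduces the interval to two decimals.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.4699, 2605          # observed proportion and sample size from the text
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

# Assumed method: normal approximation, with p_hat treated as a sample average
# of a 0/1 variable, so its variance is estimated by p_hat * (1 - p_hat) / n.
se = sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - z * se, p_hat + z * se
print(f"95% C.I. for the proportion: [{lower:.2f}; {upper:.2f}]")
```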
12.6
Example from the biomedical literature
Chapter 13
Hypothesis testing
. Example
. Null and alternative hypothesis
. The p-value and level of significance
. Possible errors in decision making
. Hypothesis testing versus confidence intervals
. Example
. Example from the biomedical literature
13.1
Example
We continue the example with the Captopril data, where blood pressure was taken
in 15 hypertensive patients, before and after administration of the drug Captopril:
Patient   Before DBP   After DBP   Change $x_i$
   1         130          125           5
   2         122          121           1
   3         124          121           3
   4         104          106          -2
   5         112          101          11
   6         101           85          16
   7         121           98          23
   8         124          105          19
   9         115          103          12
  10         102           98           4
  11          98           90           8
  12         119           98          21
  13         106          110          -4
  14         107          103           4
  15         100           82          18
Focus will be on finding evidence that the treatment affected the BP.
In case the treatment would have no effect, the average $\mu$ of X would be zero.
So, if one can show that there is (strong) evidence that $\mu \ne 0$, then this can be
considered as evidence for a treatment effect.
Based on our sample, the estimate for $\mu$ is $\hat{\mu} = \bar{x} = 9.27$
Obviously, this estimate is relatively far away from 0, suggesting that the
treatment might have affected BP
On the other hand, the observed effect $\hat{\mu} = 9.27$ could have occurred by pure
chance, even if there would be no treatment effect at all.
In general, one will decide that there is evidence that $\mu \ne 0$ if our estimate $\bar{x}$ of
$\mu$ is far away from 0, i.e., if $|\bar{x} - 0|$ is large.
13.2
Null and alternative hypothesis
H0 : $\mu = 0$
versus
HA : $\mu \ne 0$
[Figure: POPULATION with distribution of X; characteristic of $f(x)$: H0 : $\mu = 0$ versus HA : $\mu \ne 0$; RANDOM SAMPLE with estimate for $\mu$: $\hat{\mu} = \bar{x} = 9.27$]
13.3
The p-value and level of significance
The CLT will help us in deciding, as it describes what values for $\bar{x}$ are to be
expected if one would repeatedly draw new samples. If n is sufficiently large, we
have that:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
If $\mu = 0$, and with $\sigma^2$ replaced by the sample variance $s^2 = 74.21$ of the n = 15
observed changes, this becomes:
$$\bar{X} \sim N\left(0, \frac{74.21}{15}\right)$$
So, if $\mu = 0$, we expect that random samples generate averages that behave
according to the above distribution
Hence, if we observe a random sample, with an average that is very extreme
according to this distribution, we should question the validity of the null
hypothesis $\mu = 0$
If $\mu = 0$, the probability of observing a $\bar{x}$ less than 1 unit away from 0 is:
$$P(-1 \le \bar{X} \le 1) = P\left(\frac{-1 - 0}{\sqrt{74.21/15}} \le \frac{\bar{X} - 0}{\sqrt{74.21/15}} \le \frac{1 - 0}{\sqrt{74.21/15}}\right)$$
Hence, if there is no treatment effect, i.e., if $\mu = 0$, then there is only 35% chance
of having a random sample with average $\bar{x}$ within 1 unit away from 0.
So, there would be 65% chance of observing a sample with an average more than
1 unit away from 0.
Observing $\bar{x} = 1$ cannot really be considered a lot of evidence against H0 : $\mu = 0$
Similar calculations can be used for other values, such as 2, 3, . . .
[Figure: density of $\bar{X} \sim N(0, 74.21/15)$, with the distances 1, 2 and 3 marked on both sides of 0]
Distance $|\bar{x} - 0|$    Probability
1                           65%
2                           37%
3                           18%
This probability can also be calculated for the distance $|\bar{x} - 0| = 9.27$ that was
observed in our experiment:
[Figure: density of $\bar{X} \sim N(0, 74.21/15)$, with -9.27 and 9.27 marked]
Distance $|\bar{x} - 0|$    Probability
1                           65%
2                           37%
3                           18%
9.27                        0.1%
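The first three rows of this table can be reproduced with the normal approximation from the CLT. The Python sketch below is an illustration under that assumption; the function name `prob_further_than` is introduced here. The last row is left out on purpose: for a distance as large as 9.27 the tail probability is so small that it depends noticeably on whether a normal or a t-distribution is used.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the sample average if H0 (mu = 0) is true,
# using s^2 = 74.21 and n = 15 from the Captopril data.
null_dist = NormalDist(mu=0.0, sigma=sqrt(74.21 / 15))

def prob_further_than(distance: float) -> float:
    """P(|X-bar - 0| > distance) under H0; by symmetry, twice the upper tail."""
    return 2.0 * (1.0 - null_dist.cdf(distance))

for d in (1, 2, 3):
    print(f"distance {d}: {100 * prob_further_than(d):.0f}%")
```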
The probability 0.1% that expresses how extreme our observations are in case the
null hypothesis would be true, is denoted by p, and is called the p-value.
A small p-value indicates that the observed results would be extreme if H0 were
true. One then rejects the null hypothesis
A large p-value indicates that the observed results are perfectly in line with
what can be expected if H0 is true. One then does not reject the
null hypothesis, which is equivalent to accepting the null hypothesis
In practice, one has to decide how small p should get before the null hypothesis is
rejected.
One therefore specifies the so-called level of significance $\alpha$:
$p < \alpha \Longrightarrow$ reject H0
$p \ge \alpha \Longrightarrow$ accept H0
13.4
Possible errors in decision making
This should not have been considered as formal proof that any treatment effect
would be absent.
Maybe, the treatment effect is not 0, but very close to 0. The data one then
would observe would look very similar to data that would be observed if $\mu = 0$,
such that the data do not allow us to detect that $\mu \ne 0$
Conclusion:
Statistics can prove everything
13.5
Hypothesis testing versus confidence intervals
For the Captopril data, we have drawn conclusions about the average treatment
effect in the population, through 2 different statistical procedures:
. 95% confidence interval: [4.91; 13.63]
. Significance of treatment effect, p = 0.001
We know from the C.I. that the average treatment effect is likely to be between
4.91 and 13.63, excluding 0
The significance test has rejected the value 0 as possible value for $\mu$
So, both procedures agree
Question:
Do both procedures always agree ?
Answer:
Yes, provided the levels of significance and
confidence are complementary to each other:
Level of significance $\alpha$    Confidence level $(1 - \alpha)100\%$
0.05                              95%
0.10                              90%
0.01                              99%
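The agreement between both procedures can be illustrated numerically. The Python sketch below assumes that the test and the interval are both based on the same normal approximation (with $s^2 = 74.21$ in place of $\sigma^2$); the function names are introduced here for illustration. It checks that H0 : $\mu = \mu_0$ is rejected at level $\alpha$ exactly when $\mu_0$ lies outside the $(1 - \alpha)100\%$ C.I.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s2, n = 9.27, 74.21, 15   # Captopril example
se = sqrt(s2 / n)
Z = NormalDist()

def two_sided_p(mu0: float) -> float:
    """p-value for H0: mu = mu0 under the normal approximation."""
    return 2.0 * (1.0 - Z.cdf(abs(x_bar - mu0) / se))

def confidence_interval(alpha: float) -> tuple[float, float]:
    """(1 - alpha)100% confidence interval for mu under the same approximation."""
    half_width = Z.inv_cdf(1.0 - alpha / 2.0) * se
    return x_bar - half_width, x_bar + half_width

def procedures_agree(mu0: float, alpha: float) -> bool:
    """True if 'H0 rejected' and 'mu0 outside the C.I.' coincide."""
    lower, upper = confidence_interval(alpha)
    rejected = two_sided_p(mu0) < alpha
    outside = not (lower <= mu0 <= upper)
    return rejected == outside

for alpha in (0.05, 0.10, 0.01):
    print(alpha, confidence_interval(alpha), procedures_agree(0.0, alpha))
```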
[Figure: 95% C.I. and the value under H0]
In case of rejecting H0 ($p < \alpha = 0.05$):
[Figure: 95% C.I. that does not contain the value under H0]
13.6
Example: BMI
The concept of statistical tests has been explained in the context of the
estimation of a population average
However, tests can be constructed for any characteristic of the distribution of
the random variable X of interest (variance, mean, median, proportion)
As an example, we re-consider the BMI data on n = 2605 Belgian males, for which we
estimated the proportion $\pi$ of males in the population who are overweight (BMI > 25)
by the observed proportion $\hat{\pi} = 46.99\%$
Suppose it would be known that 10 years before our sample was taken, only 40%
of the Belgian males were overweight
H0 : $\pi = 0.40$
versus
HA : $\pi > 0.40$
H0 : $\mu = 0$
versus
HA : $\mu \ne 0$
13.7
Example from the biomedical literature
. Two-sided tests
. 5% level of significance
. Table 2:
Part V
Some frequently used tests
Chapter 14
The comparison of two means: Unpaired data
. Example
. Confidence interval for the difference of two means
. The unpaired t-test
. Assumptions
. Example: Survival times of cancer patients
. Example from the biomedical literature
14.1
Example
Re-consider the example on the weight gain in rats, where interest is in the
comparison between rats fed on a high or low protein diet
Group-specific histograms:
Note that, strictly speaking, we have two populations, with a sample randomly
drawn from each:
. High protein rats: The hypothetical population of all rats that are given a
high protein diet
. Low protein rats: The hypothetical population of all rats that are given a
low protein diet
From the first population, a random sample of n1 = 12 rats was taken. From the
second one, a random sample of n2 = 7 rats was drawn.
The corresponding observed means are $\bar{x}_1 = 120$ and $\bar{x}_2 = 101$ respectively.
Because there is no relation between the observations taken from the first
population and those taken from the second, we have unpaired data.
14.2
Confidence interval for the difference of two means
Let $\mu_1$ and $\mu_2$ be the (unknown) mean weight gain in the high and low protein
population, respectively:
[Figure: distributions of weight gain in the low protein and high protein populations, with means $\mu_2$ and $\mu_1$]
14.3
The unpaired t-test
Often, it is of interest to test whether two populations have the same mean.
This is translated in a set of hypotheses of the form:
H0 : $\mu_1 = \mu_2$
versus
HA : $\mu_1 \ne \mu_2$
We will reject the null hypothesis if the observed data show too much deviation
from what one would expect to see if the null hypothesis were correct
Hence, we will reject H0 if $\bar{x}_1$ is much larger than $\bar{x}_2$, or vice versa
This is equivalent to rejecting H0 if $|\bar{x}_1 - \bar{x}_2|$ is too large
Question:
How large is too large ?
Answer:
If the observed difference $|\bar{x}_1 - \bar{x}_2|$
So, even if there is no relation at all between the protein content of the diet and
weight gain, then one can still expect to observe a difference of at least 19g in
7.6% of the future similar experiments.
Since p = 0.0757 > 0.05 = $\alpha$, we consider this insufficient evidence to conclude
that the protein level would indeed affect the weight gain
Conclusion:
There is no significant difference (p = 0.0757) in weight gain
between rats on a high protein level diet,
and rats on a low protein level diet
The above testing procedure is called the unpaired t-test since unpaired data are
analysed, and since the calculation of the p-value is based on the t-distribution.
14.4
Assumptions
The calculation of the C.I., as well as the computation of the p-value are based on
the sampling distribution of $\bar{X}_1 - \bar{X}_2$, which describes what values for $\bar{x}_1 - \bar{x}_2$
can be expected in case the experiment would be repeated many times.
The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is completely determined by the
sampling distributions of $\bar{X}_1$ and $\bar{X}_2$
In case of large samples, those distributions are known to be normal (CLT)
In small samples, this normality of $\bar{X}_1$ and $\bar{X}_2$ is only valid in cases where the
original data are (approximately) normally distributed.
[Figure: low protein and high protein distributions, with means $\mu_2$ and $\mu_1$]
Conclusion:
[Figures: low protein and high protein distributions, with means $\mu_2$ and $\mu_1$]
Note that the samples in our group were small (n1 = 12 and n2 = 7). Hence the
histograms should be explored for any evidence against symmetry
Note that, given the small sample sizes, assessment of symmetry is difficult
This illustrates another drawback of small samples: Assumptions are often needed,
which are very hard to check based on the observed data.
H0 : $\sigma_1^2 = \sigma_2^2$
versus
HA : $\sigma_1^2 \ne \sigma_2^2$
Most software packages automatically report the results from such a test, and
even provide a corrected unpaired t-test, which corrects for the unequal variances:
The variances are not significantly different from each other (p = 0.9788), such
that our original result remains valid.
Note that, since the variances are so similar, the corrected and uncorrected t-tests
yield very similar results (p-values).
Often, non-equality of the variances is associated with non-normality of the data
14.5
Example: Survival times of cancer patients
Based on the data on survival times of cancer patients, we want to compare the
survival times of stomach cancer patients with the survival times of colon cancer
patients
Summary statistics:
We observe a large difference of 457.4 − 286 = 171.4 days in average survival time
between both groups.
On the other hand, there is a lot of variability between the subjects in both groups.
Hence, it is not clear whether the observed difference of 171 days is sufficient
evidence to conclude that survival times are indeed different for colon cancer
patients and stomach cancer patients
Results of the unpaired t-test:
We do not find a significant difference between both groups, with respect to the
survival time (p = 0.2483).
However, the histograms suggest skewness in the data, such that the underlying
assumption of normality becomes questionable:
The skewness in the direction of the large values suggests that a logarithmic (or
similar) transformation might be useful:
X = survival time  →  Y = ln(X) = ln(survival time)
[Figures: histogram; possible transformations]
Stomach                  Colon
X       Y = ln(X)        X       Y = ln(X)
124     4.82             248     5.51
42      3.74             377     5.93
25      3.22             189     5.24
45      3.81             1843    7.52
412     6.02             180     5.19
51      3.93             537     6.29
1112    7.01             519     6.25
46      3.83             455     6.12
103     4.63             406     6.01
876     6.78             365     5.90
146     4.98             942     6.85
340     5.83             776     6.65
396     5.98             372     5.92
                         163     5.09
                         101     4.62
                         20      3.00
                         283     5.65
The observed difference between both groups is still not significant (p = 0.0671),
but the p-value is very different from what we obtained before the transformation
(p = 0.2483).
This illustrates that:
Note that this is another example where geometric means and standard
deviations would be useful to describe the location and spread in survival times in
the two cancer groups separately:
Outcome                 Stomach cancer          Colon cancer
                        mean (stand.dev.)       mean (stand.dev.)
Survival time (days)    144.03 (3.49)           314.19 (2.72)
                        = exp(4.97)             = exp(5.75)
                        (= exp(1.25))           (= exp(1.00))
which is very different from the arithmetic means and standard deviations that
were reported before:
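Both sets of summary statistics can be recomputed from the survival times listed in the data table. The Python sketch below is an illustration; the function name `summarize` is introduced here. Because it works with unrounded logarithms, the geometric means it prints differ slightly from the values in the text, which back-transform the rounded means 4.97 and 5.75.

```python
from math import exp, log
from statistics import mean, stdev

# Survival times in days, copied from the data table.
stomach = [124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396]
colon = [248, 377, 189, 1843, 180, 537, 519, 455, 406, 365, 942, 776, 372,
         163, 101, 20, 283]

def summarize(times: list[int]) -> dict[str, float]:
    """Arithmetic and geometric summaries of a list of survival times."""
    logs = [log(t) for t in times]
    return {
        "arithmetic mean": mean(times),
        "mean of ln": mean(logs),
        "sd of ln": stdev(logs),
        "geometric mean": exp(mean(logs)),   # back-transformed mean of ln(X)
        "geometric sd": exp(stdev(logs)),    # back-transformed sd of ln(X)
    }

for name, times in (("stomach", stomach), ("colon", colon)):
    print(name, {k: round(v, 2) for k, v in summarize(times).items()})
```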
The fact that the formal test has been performed on the log-transformed survival
times does not change the interpretation of the result
If the log-transformed survival times are different for the two groups, then so are the
untransformed survival times
Hence, although the conclusion, strictly speaking, should be that
there is no significant difference in log survival times,
it will often be formulated as
there is no significant difference in survival times.
14.6
Example from the biomedical literature
. Large samples
. Similar variability in both groups
. p < 0.001 rather than p = 0.000
Chapter 15
The comparison of two proportions: Unpaired data
. Example
. The chi-squared test
. Assumptions: The Fisher Exact test
. Rows versus columns
. Example: Case-control data
. Example from the biomedical literature
15.1
Example
               Sickness absence
Gender         No       Yes      Total
female         245      184      429
male            98       58      156
Total          343      242      585
Research question:
Is there a relation between absence and gender ?
184/429 = 42.9% of the females, and 58/156 = 37.2% of the males have been
absent
This suggests that females are more often absent than males
However, even if absence due to sickness is equally frequent amongst males and
females, the above results could have occurred by pure chance.
It therefore would be of interest to calculate how likely it would be to observe such
differences, by pure chance
Note that we have again two populations, with a sample randomly drawn from
each:
. Males: The hypothetical population of all male employees with similar job
conditions
. Females: The hypothetical population of all female employees with similar
job conditions
From the first population, a random sample of n1 = 156 males was taken. From
the second one, a random sample of n2 = 429 females was drawn.
Let $\pi_1$ and $\pi_2$ denote the proportions of males and females in the total
populations who have been absent
Then $\pi_1$ and $\pi_2$ can be estimated based on their sample versions $\hat{\pi}_1 = 0.372$ and
$\hat{\pi}_2 = 0.429$
Because there is no relation between the observations taken from the first
population and those taken from the second, we have unpaired data.
Introduction to Biostatistics
300
15.2
Often, it is of interest to test whether two populations have the same percentage
of people with absence due to sickness.
This is translated in a set of hypotheses of the form:
$H_0: \pi_1 = \pi_2$
versus
$H_A: \pi_1 \neq \pi_2$
We will reject the null hypothesis if the observed data deviate too much
from what one would expect to see if the null hypothesis were correct
Hence, we will reject $H_0$ if $\hat{\pi}_1$ is much larger than $\hat{\pi}_2$, or vice versa
This is equivalent to rejecting $H_0$ if $|\hat{\pi}_1 - \hat{\pi}_2|$ is too large
Question:
How large is too large ?
Answer:
If the observed difference $|\hat{\pi}_1 - \hat{\pi}_2|$
So, even if there is no relation at all between gender and absence, one can
still expect to observe a difference of at least 5.7% in 21.5% of future similar
experiments.
Since p = 0.215 > 0.05 = $\alpha$, we consider this insufficient evidence to conclude
that the occurrence of sickness absence is related to gender
Conclusion:
There is no significant difference (p = 0.215) in prevalence
of sickness absence
between males and females
The testing procedure needed for the comparison of proportions in unpaired data
is called the chi-squared test since the calculation of the p-value is based on the
chi-squared ($\chi^2$) distribution.
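As an illustration, the chi-squared p-value for the sickness absence table can be sketched in a few lines of Python. This is a minimal sketch, not the software used in the course: it assumes the Pearson statistic without continuity correction, and the function name `chi_squared_2x2` is hypothetical.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: female, male; columns: no absence, absence.
stat, p_value = chi_squared_2x2(245, 184, 98, 58)
print(f"chi-squared = {stat:.3f}, p = {p_value:.3f}")
```

Under these assumptions the result should agree with the reported p = 0.215.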
15.3
The calculation of the p-value is based on the sampling distribution of
$\hat{\pi}_1 - \hat{\pi}_2$, which describes what values for $\hat{\pi}_1 - \hat{\pi}_2$ can be expected in case
the experiment would be repeated many times.
Note that $\hat{\pi}_1$ and $\hat{\pi}_2$ are the sample averages $\bar{X}_1$ and $\bar{X}_2$ of the binary variable
sickness absence.
Hence, for large samples, the sampling distribution of $\hat{\pi}_1 - \hat{\pi}_2$ directly follows
from the CLT
In small samples, the normality of $\hat{\pi}_1$ and $\hat{\pi}_2$ can be problematic, and an
alternative calculation of the p-value is needed.
The Fisher Exact test provides an alternative way to calculate the p-value,
without relying on the CLT, nor on the assumption of large samples.
As an example, we consider again data on sickness absence, but from a second,
much smaller, company:
                Sickness absence
Gender          No      Yes     Total
female           1       1        2
male            10       2       12
Total           11       3       14
The results based on the chi-squared as well as on the Fisher Exact test are:
                              p-value
Males        Females       Chi-squared   Fisher Exact
58/156       184/429       0.215         0.219
2/12         1/2           0.287         0.396
107/330      405/1079      0.091         0.102
37/97        40/122        0.409         0.477
3/10         48/150        0.895         1.000
56/156       1/11          0.070         0.100
1/12         0/1           0.764         1.000
53/170       0/1           0.501         1.000
378/1089     117/269       0.007         0.009
The Fisher Exact test is very time-consuming, and cannot be calculated for large
samples, except with special software.
However, note that, for large samples, the chi-squared test remains possible, and
yields results very similar to the ones that would have been obtained with the
Fisher Exact test
In practice, it is often standard to use Fisher Exact, unless computational
restrictions require the use of chi-squared.
Conclusion:
Large samples: Chi-squared test
Small samples: Fisher Exact test
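The Fisher Exact calculation can be sketched as follows for the small-company comparison (2/12 males versus 1/2 females absent). This is a minimal sketch under one common convention for the two-sided p-value (summing all tables that are no more likely than the observed one); the function name `fisher_exact_2x2` is hypothetical and other conventions exist.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher Exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_obs * (1 + 1e-9))

# Rows: female, male; columns: no absence, absence.
p_small = fisher_exact_2x2(1, 1, 10, 2)
print(f"Fisher Exact p = {p_small:.3f}")
```

No approximation is involved, so the function remains valid for very small tables; the cost is that the number of terms grows with the sample size.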
15.4
When comparing two unpaired proportions, the data can always be summarized by
a $2 \times 2$ table:
                Sickness absence
Gender          No      Yes     Total
female          A       B       A+B
male            C       D       C+D
Total           A+C     B+D     A+B+C+D
One can show that comparing the percentage of sickness absence between females
and males,
$$\frac{B}{A+B} = \frac{D}{C+D},$$
is equivalent to comparing the percentage of males (females) between the
employees with and without sickness absence:
$$\frac{C}{A+C} = \frac{D}{B+D}$$
Proof:
$$\frac{B}{A+B} = \frac{D}{C+D} \iff B(C+D) = D(A+B) \iff BC = AD \iff C(B+D) = D(A+C) \iff \frac{C}{A+C} = \frac{D}{B+D}$$
This implies that, for the analysis of a $2 \times 2$ table, rows and columns can be
interchanged.
This is of interest for the analysis of case-control data
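A quick numerical check of this property, as a minimal sketch: the Pearson chi-squared statistic (assumed here without continuity correction; `chi_squared_statistic` is a hypothetical helper) is computed for the sickness absence table and for the same table with rows and columns swapped.

```python
def chi_squared_statistic(a, b, c, d):
    """Pearson chi-squared statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows = gender (female, male), columns = absence (no, yes).
original = chi_squared_statistic(245, 184, 98, 58)
# Rows = absence (no, yes), columns = gender (female, male).
transposed = chi_squared_statistic(245, 98, 184, 58)
print(original, transposed)
```

Both calls return the same statistic, and hence the same p-value.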
15.5
Case-control data
We consider the data on cervical cancer, where the relationship between the
occurrence of cervical cancer and the age at first pregnancy is studied.
Data were collected on 49 cancer cases and 317 non-cancer cases (controls). All
women were asked about their age at first pregnancy, and the data are
summarized as:
                Disease status
Age             Cervical cancer   Control   Total
≤ 25            42                203       245
> 25             7                114       121
Total           49                317       366
Research question:
Is there a relation between cancer and age ?
Such a design only allows correct estimation of the percentage of women with first
pregnancy before the age of 25, for cases and controls separately.
However, since rows and columns can be interchanged, this is sufficient to answer
our research question of interest:
For testing purposes, rows and columns can be interchanged, implying that the
analysis of case-control data still answers the research question of interest
For descriptive purposes, however, the choice between row and column
percentages entirely depends on the design of the study.
In the above example on cervical cancer, the row-percentages (i.e., percentage of
women with first pregnancy before the age of 25), for cancer cases and controls
separately, are the only ones that reflect the case-control nature of the experiment.
15.6
It is not clear when chi-squared is used, and when Fisher Exact is used
Chapter 16
The comparison of two means: Paired data
. Example
. Confidence interval for the difference of two means
. The paired t-test
. The paired versus unpaired t-test
. Example
. Assumptions
. Example from the biomedical literature
16.1
Example
We re-consider the example with the Captopril data, where blood pressure was
taken in 15 hypertensive patients, before and after administration of the drug
Captopril:
[Figure: BP distributions with and without treatment, with population means $\mu_2$ and $\mu_1$]
In the case of paired data, $\mu_1 - \mu_2$ is estimated by the average of all subject-specific
differences between BPs before and after treatment. More specifically, the
variable of interest becomes the difference X in BP before and after treatment:
X = BPbefore − BPafter
As before, the observed values xi for X can be calculated from the observed
values of the BP in our sample:
           Before   After    Change
Patient    DBP      DBP      xi
 1         130      125        5
 2         122      121        1
 3         124      121        3
 4         104      106       −2
 5         112      101       11
 6         101       85       16
 7         121       98       23
 8         124      105       19
 9         115      103       12
10         102       98        4
11          98       90        8
12         119       98       21
13         106      110       −4
14         107      103        4
15         100       82       18
$\mu$ is the population mean of the variable X, and inference for $\mu$ can be based on
the within-subject differences $x_i$, rather than on the original BP measurements.
Note that this situation has been explained in full detail in Chapters 12 and 13.
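The reduction of the paired problem to a one-sample problem on the differences can be sketched as follows. This is a minimal sketch using only the Python standard library; the helper `t_two_sided_p` is hypothetical and obtains the p-value by numerical (Simpson) integration of the Student t density, which is an implementation choice made here, not the method of any particular package.

```python
import math

before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]

def t_two_sided_p(t, df, steps=20000):
    """Two-sided tail probability of Student's t with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    area = pdf(0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # Simpson estimate of the integral of the density from 0 to |t|
    return 1 - 2 * area

# Within-subject differences: before minus after.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean = sum(diffs) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in diffs) / (n - 1))
t_stat = mean / (sd / math.sqrt(n))  # tests H0: mu = 0
p_value = t_two_sided_p(t_stat, n - 1)
print(f"mean = {mean:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```

Only the 15 differences enter the test statistic; the 30 original measurements are never treated as 30 independent observations.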
16.2
16.3
$H_0: \mu_1 = \mu_2$
versus
$H_A: \mu_1 \neq \mu_2$
This is equivalent to the following test about the mean of the difference X in
blood pressure:
$H_0: \mu = 0$
versus
$H_A: \mu \neq 0$
16.4
. Paired:
Conclusion:
15 × 2 measurements ≠ 30 × 1 measurement
In general, the analysis of an outcome, measured multiple times per subject
(repeated measures), requires different statistical procedures than when the
outcome is measured only once for each subject.
16.5
Example
Obviously, it is important to correctly account for the paired nature of the data
In practice, this requires knowledge about the design of the study and the way
data have been collected
As an example, suppose interest is in testing for differences in BMI between males
and females
Suppose that BMI measurements are available for 100 males and 100 females.
The unpaired t-test is the obvious choice for the analysis, provided all assumptions
are satisfied.
Suppose now that the 100 males and females are taken from 100 married couples,
would this change the preferred method for analysis ?
YES !
16.6
Assumptions
It has been shown in Chapters 12 and 13 that the calculation of both the C.I.
and the p-value entirely depends on the sampling distribution of $\bar{X}$, the sample
average of the differences in BP before and after treatment.
In large samples, this sampling distribution is normal (CLT)
In small samples, this normality is only valid in cases where the difference in BP is
(approximately) normally distributed.
Therefore, in case of small samples, one assumes the difference X to be normally
distributed.
Note that, in this context, the sample size refers to the number of pairs, not the
number of observations in the data set
Conclusion:
[Figure: two sketches of the distribution of the difference X, each asking whether $\mu = 0$]
In our Captopril example, the sample size was small (n = 15). Hence the
histogram of the observed differences should be explored for any evidence against
symmetry
Assessment of symmetry is again difficult due to the small sample size, but there
is no strong evidence for severe skewness.
Note that the normality assumption is with respect to the difference X, not the
original measurements.
In our example, the original BP measurements (before and after treatment) are
allowed to be skewed, as long as their differences are symmetrically distributed:
[Figure: distributions of BP after treatment (mean $\mu_2$) and before treatment (mean $\mu_1$), and the distribution of the difference X, asking whether $\mu = 0$]
Note that, in case of skewness, it is often difficult and/or not helpful to transform
the observed differences xi:
. Since negative differences are often observed, several standard transformations
such as $\ln(\cdot)$ or $\sqrt{\cdot}$ are not possible
. Even if a transformation such as, e.g., yi = ln(xi + 10) would yield symmetric
observations yi, it is not clear what null hypothesis should be tested.
. Obviously, one can no longer test whether the mean of Y is equal to zero.
In case of skewness, one therefore usually transforms the original data in such way
that the differences become symmetric. This has the advantage that:
. Simple, standard, transformations can often be used
. One can still test for mean zero.
For example, a potential transformation for the Captopril data would be:
BPbefore → ln(BPbefore)
BPafter → ln(BPafter)
X = ln(BPbefore) − ln(BPafter)
instead of:
BPbefore, BPafter → X = BPbefore − BPafter → Y = ln(X + 5)
16.7
Chapter 17
The comparison of two proportions: Paired data
. Example
. Mc Nemar test
. Assumptions
. Remark
. Mc Nemar versus chi-squared
. Example from biomedical literature
17.1
Example
Consider the data on the prevalence of severe colds in 1319 children, measured at
the ages of 12 and 14.
The response of interest is whether the child had severe colds during the last 12
months
                          Severe colds at age 14
Severe colds at age 12    Yes     No      Total
Yes                       212     144     356
No                        256     707     963
Total                     468     851     1319
Research question:
Is the prevalence of severe colds different at the two ages ?
At age 12, 356/1319 = 27% of the children reported severe colds.
At age 14, this percentage equals 468/1319 = 35%
These data suggest that the prevalence of severe colds increases with age.
It would be of interest to know how likely the observed change in prevalence is to
occur by pure chance.
If this is very unlikely, the above data provide evidence that the prevalence indeed
changes with age. Otherwise, the above data do not provide evidence for such a
change.
Note that the data structure is similar to the one in the Captopril data, in the
sense that subjects are measured twice at different time points:
17.2
Mc Nemar test
Let $\pi_1$ and $\pi_2$ be the percentage of children in the total population with a severe
cold at the ages 12 and 14 respectively.
Interest is in testing whether $\pi_1$ and $\pi_2$ are equal, which would reflect no change
over time in the percentage of children with a severe cold.
The hypothesis of interest is
$H_0: \pi_1 = \pi_2$
versus
$H_A: \pi_1 \neq \pi_2$
Note that a change over time in the percentage of severe colds can only occur if
children change their status:
. No severe cold at 12yrs → severe cold at 14yrs
. Severe cold at 12yrs → no severe cold at 14yrs
Moreover, in order to have a change over time, more children should change in
one direction than in the other
Our test will therefore reject H0 if the number of changers in one direction is
much larger than the number of changers in the other direction.
In our example, we will reject $H_0$ if $|256 - 144|$ is too large
Question:
How large is too large ?
Answer:
If the observed difference $|256 - 144|$
This p-value !
So, if severe colds would occur equally frequently at both ages, it would be very
unlikely to observe what has been observed in this particular experiment
We therefore conclude that our data provide evidence that the probability of
having a severe cold at the age of 12 is not the same as the probability of having a
severe cold at the age of 14.
Conclusion:
There is a significant difference (p < 0.0001) in the
occurrence of severe colds between the ages 12 and 14
The testing procedure needed for the comparison of proportions in paired data is
called the Mc Nemar test.
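Since the test only uses the numbers of children who change status, the large-sample version can be sketched in a few lines. This is a minimal sketch, assuming the usual chi-squared approximation with 1 degree of freedom; the function name `mcnemar` and the optional continuity correction (switched on by default) are assumptions made here, not necessarily the settings used for the results reported in the course.

```python
import math

def mcnemar(n12, n21, continuity=True):
    """Large-sample Mc Nemar test from the two off-diagonal counts."""
    diff = abs(n12 - n21)
    if continuity:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (n12 + n21)
    # With 1 degree of freedom, P(chi2 > stat) equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# 144 children changed from "cold" to "no cold", 256 from "no cold" to "cold".
stat_colds, p_colds = mcnemar(144, 256)
print(f"statistic = {stat_colds:.2f}, p = {p_colds:.2e}")
```

With or without the correction, the p-value for the severe colds data is far below 0.0001.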
17.3
Assumptions
Similarly to the chi-squared test, the calculation of the p-value is based on the
assumption of a large sample
In case of small samples, the p-value can be calculated without approximations
based on CLT
The exact calculation is similar to the Fisher Exact test for unpaired data.
Many statistical packages only support the large-sample calculations.
17.4
Remark
As discussed before, the Mc Nemar test rejects H0 if the off-diagonal elements are
too different from each other, i.e., if there are many more changes in one direction
than in the other direction.
This implies that the testing procedure is independent of the observed diagonal
elements
Examples:
Table (rows)          McNemar comparison        Result
[20 20; 40 50]        60/130 vs. 40/130         p = 0.0142
[200 20; 40 500]      240/760 vs. 220/760       p = 0.0142
17.5
There seems to be a lot of confusion about when the Mc Nemar test and when the
chi-squared test should be used.
As an example, consider the results from a survey in which 75 people were
questioned about their intended vote in the US presidential elections, before and
after a debate on the national television:
                After TV debate
Before
TV debate       Reagan  Carter  Total
Reagan          27       7      34
Carter          13      28      41
Total           40      35      75
Depending on the research question, this table can be analysed in two different
ways:
. Chi-squared: test for relation between vote before and after debate
. Mc Nemar: test for equal proportion Reagan voters before and after debate
Hence, even when data are paired, the chi-squared test can be used
Note that, in case of continuous data, there is no such choice:
. Unpaired data ⟹ Unpaired t-test
. Paired data ⟹ Paired t-test
17.5.1
Mc Nemar test
                After TV debate
Before
TV debate       Reagan  Carter  Total
Reagan          27       7      34
Carter          13      28      41
Total           40      35      75
Research question:
Is the proportion Reagan voters the same
before and after the debate ?
Hence the observed difference of 45.3% versus 53.3% would happen in 26.36% of
the cases, even if the percentage of voters for Reagan is the same before and after
the debate.
Conclusion:
The debate has not significantly changed the voting
behaviour (p = 0.2636).
17.5.2
Chi-squared test
                After TV debate
Before
TV debate       Reagan  Carter  Total
Reagan          27       7      34
Carter          13      28      41
Total           40      35      75
Research question:
Is there a relation between voting behaviour before and
after the debate ?
Or equivalently:
Is the proportion of Reagan voters after the debate the same
amongst those who were in favour of Reagan before the debate as
amongst those who were in favour of Carter before the debate ?
The observed difference of 79.4% versus 31.7% is very unlikely to happen if there
would be no relation between the voting behaviour before and after the debate.
Conclusion:
There is a significant relation between the voting behaviour
before and after the debate (p < 0.0001).
17.5.3
General conclusion
The survey results can be analysed in two different ways, leading to two different
conclusions:
. Mc Nemar: There is no evidence that a TV debate would change the results
of an election (p = 0.2636)
. Chi-squared: There is a strong relation between voting behaviour before and
after the debate (p < 0.0001).
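The two analyses of the same table can be put side by side in a short sketch. Assumptions made here: the cell for voters who moved from Reagan to Carter is taken as 7 (the row total 34 minus 27), the chi-squared test is the Pearson statistic without continuity correction, and the Mc Nemar statistic uses a continuity correction; all variable names are hypothetical.

```python
import math

def p_from_chi2(stat):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Rows: vote before the debate (Reagan, Carter); columns: vote after (Reagan, Carter).
a, b, c, d = 27, 7, 13, 28
n = a + b + c + d

# Chi-squared: is the vote after the debate related to the vote before?
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p_relation = p_from_chi2(chi2)

# Mc Nemar: did the proportion of Reagan voters change? Only the changers b and c count.
mcn = (abs(b - c) - 1) ** 2 / (b + c)
p_change = p_from_chi2(mcn)

print(f"chi-squared: p = {p_relation:.5f}; Mc Nemar: p = {p_change:.4f}")
```

The two p-values differ by orders of magnitude because they answer different research questions, not because one of the tests is wrong.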
Note that the proportion of Reagan voters before and after a TV debate could
also be compared based on unpaired data.
One then would question 75 people before the debate, and one would question 75
other people after the debate.
                Reagan  Carter  Total
Before          34      41       75
After           40      35       75
Total           74      76      150
The chi-squared test would compare the observed proportions 34/75 = 45.3% and
40/75 = 53.3%, which are the same ones as those compared before with the
Mc Nemar test for the experiment with paired observations
17.5.4
Table (rows)       $\chi^2$ comparison    $\chi^2$ result    McNemar comparison     McNemar result
[25 25; 25 25]     25/50 vs. 25/50      p = 1.0000       50/100 vs. 50/100      p = 1.0000
[10 10; 40 40]     10/50 vs. 10/50      p = 1.0000       50/100 vs. 20/100      p < 0.0001
[40 10; 10 40]     40/50 vs. 10/50      p < 0.0001       50/100 vs. 50/100      p = 1.0000
[5 20; 45 30]      5/50 vs. 20/50       p = 0.0291       50/100 vs. 25/100      p = 0.0098
17.6
Mc Nemar test to compare the presence of symptoms before and after surgery.
Part VI
Further topics on statistical inference
Chapter 18
Errors in statistics: Basic concepts
. Introduction
. Two types of errors
. Power
. Sample size calculation
. Examples
. Remarks
. Example from the biomedical literature
18.1
Introduction
Re-consider the example on the weight gain in rats, where interest is in the
comparison between rats fed on a high or low protein diet
Group-specific histograms:
Conclusion:
There is no significant difference (p = 0.0757) in weight gain
between rats on a high protein level diet,
and rats on a low protein level diet
Alternatively, if the t-test had led to p = 0.001, this would still not
formally prove that there is a difference between both populations.
After all, p = 0.001 would only indicate that a difference at least as large as the
observed 19g occurs once in every 1000 experiments, even if there is no difference
at all between both populations.
Maybe, our sample was indeed the extreme one that happens once every thousand
experiments.
Hence, whenever statistical tests are used, one has to be aware that errors in the
conclusions can occur.
It is therefore important to quantify the errors, and to keep them under
control
18.2
                H0 correct        H0 not correct
Accept H0       No error          Type II error
Reject H0       Type I error      No error
18.3
Type I error
A type I error occurs if H0 is correct but the test leads to a significant result.
Question:
How likely is such an error to occur ?
The probability of making a type I error is therefore equal to the chosen level of
significance.
In practice, the probability of making a type I error is kept under control by
choosing $\alpha$ sufficiently small
In biomedical sciences $\alpha = 5\%$ is often used, thereby allowing a type I
error in 5% of the cases.
                Reality
Test result     H0 correct        H0 not correct
Accept H0       $1 - \alpha$
Reject H0       $\alpha$
If H0 is correct, then the probability of making a type I error is $\alpha$, while the
probability of correctly accepting H0 is $1 - \alpha$.
18.4
Type II error
A type II error occurs if H0 is incorrect but the test has not detected this, i.e., a
non-significant result is obtained
Question:
How likely is such an error to occur ?
In contrast to the type I error, the probability of making a type II error is not easily
controlled, and depends on various aspects of the sample(s) and population(s)
Reality
Test result
H0 correct
H0 not correct
Accept H0
Reject H0
18.5
Power
As before, let $\mu_1$ and $\mu_2$ represent the average weight gain in the total population,
under high and low protein diets, respectively.
The null and alternative hypotheses are given by
$H_0: \mu_1 = \mu_2$
versus
$H_A: \mu_1 \neq \mu_2$
Graphically:
[Figure: distributions of weight gain under the low protein diet (mean $\mu_2$) and the high protein diet (mean $\mu_1$)]
18.5.1
Power as a function of $\alpha$
Intuitively: Type I errors are less likely if the null hypothesis is rejected less
often. However, in cases where H0 is truly wrong, it will then also be rejected less often.
An extreme case is obtained for $\alpha = 0$:
18.5.2
Intuitively: Large deviations from the null hypothesis are easier to detect
[Figure: two panels, each showing the low protein and high protein distributions with means $\mu_2$ and $\mu_1$]
18.5.3
[Figure: two panels, each showing the low protein and high protein distributions with means $\mu_2$ and $\mu_1$]
18.5.4
18.5.5
Conclusion
18.6
. We are not concerned about small powers for detecting smaller differences, as
such differences are not relevant anyway.
One can then calculate the number(s) of observations needed to reach a desired
level of power.
18.7
In the weight gain data, the observed difference of 19g was found not to be
significant (p = 0.0757)
We can calculate the power that a real difference of 19g would be found
significant if a new experiment were to be conducted, again with 12 and 7
observations in the high and low protein diet groups, respectively.
Group-specific summary statistics, from the current experiment:
Summary:
Real difference     Power
0g                  5.00%   (equal to $\alpha$)
10g                 15.70%
19g                 43.45%
30g                 80.80%
40g                 96.49%
18.8
                Sickness absence
Gender          No      Yes     Total
female          245     184     429
male             98      58     156
Total           343     242     585
The observed difference between the absence rate 42.9% in females and 37.2% in
males was found not significant (chi-squared test, p = 0.215).
In case the percentages of sickness absence would be 42% in the total female
population, and 37% in the total male population, and in case a random sample of
429 females and 156 males would be taken, there would be 19.01% chance to
reach a significant effect.
So, if the population proportions are indeed 42% and 37%, an experiment with
429 females and 156 males would detect this difference only 19 times out of 100 experiments.
If a difference of 5% is considered clinically relevant, then the current experiment
was clearly too small, since it is very likely that such a difference would remain
undetected.
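The power quoted above can be approximated with a short calculation. This is a minimal sketch based on the normal approximation for the difference of two proportions (pooled standard error under H0, unpooled under the alternative); the function name `power_two_proportions` is hypothetical, and the course software may use a slightly different formula.

```python
from statistics import NormalDist

def power_two_proportions(p1, n1, p2, n2, alpha=0.05):
    """Approximate power of the two-sided large-sample test of p1 = p2."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_null = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    se_alt = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    delta = abs(p1 - p2)
    # Probability that the observed difference falls in either rejection region.
    upper = 1 - z.cdf((z_crit * se_null - delta) / se_alt)
    lower = z.cdf((-z_crit * se_null - delta) / se_alt)
    return upper + lower

power = power_two_proportions(0.42, 429, 0.37, 156)
print(f"power = {power:.1%}")
```

Under these assumptions the result is close to the 19% reported for 429 females and 156 males.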
We can calculate how large the samples should be in order to detect a difference
between 42% and 37%, with sufficiently high power
For example, two samples of approximately 2500 observations are needed in order
to show a difference between 37% and 42%, with 95% probability
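The order of magnitude of this sample size can be checked with the standard normal-approximation formula for two equally sized groups. This is a minimal sketch under that assumption; the function name `n_per_group` is hypothetical and the exact number depends on the formula the software uses.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, power, alpha=0.05):
    """Approximate sample size per group for a two-sided test of p1 = p2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

n = n_per_group(0.42, 0.37, power=0.95)
print(n)
```

The result is just under 2500 observations per group, in line with the approximate figure given above.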
18.9
Remarks
The earlier examples of power and/or sample size calculations were in the context
of the unpaired t-test and chi-squared test.
Similar calculations can be done in any other statistical testing situation, e.g.,
Fisher Exact test, paired t-test, McNemar test, . . .
Strictly speaking, all experiments should be preceded by a realistic sample size
calculation to avoid experiments with unacceptably high type II error rates, i.e.,
with almost no chance at all to show clinically meaningful effects.
18.10
Discussion, p.664:
The difference on which the sample size calculation was based was much larger
than what actually was observed in the experiment
Therefore, the power to reject equality of the groups was (much) lower than the
expected 80%
The current study cannot tell the difference between a 9% increase and a 3%
decrease.
If such differences are considered clinically important, then the current study was
under-powered, due to the fact that the difference was overestimated at the time
of the sample size calculation.
Chapter 19
Errors in statistics: Practical implications
. Multiple testing
. Bonferroni correction
. Tests for baseline differences
. Equivalence tests
. Significance versus relevance
. Examples from biomedical literature
19.1
Multiple testing
19.1.1
On entry in the classroom, assign each student at random to be seated at the left
or at the right side of the classroom
Compare both sides with respect to 100 aspects including weight, height, age,
gender, color of hair, color of eyes,. . .
It is to be expected that for about 5 of these outcomes, a significant difference is
obtained at the 5% level of significance, by pure chance.
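The size of the problem can be quantified with a small calculation. This is a minimal sketch that assumes the 100 comparisons are independent, which will not hold exactly in practice (weight and height, for instance, are related); the variable names are hypothetical.

```python
from math import comb

alpha, k = 0.05, 100

# Expected number of significant results when no real difference exists.
expected_significant = alpha * k

# Probability of at least one significant result among the k tests.
p_at_least_one = 1 - (1 - alpha) ** k

# Probability of five or more significant results (binomial distribution).
p_at_least_five = 1 - sum(
    comb(k, i) * alpha**i * (1 - alpha) ** (k - i) for i in range(5)
)

print(expected_significant, round(p_at_least_one, 3), round(p_at_least_five, 3))
```

Under independence, at least one "significant" difference between the two sides of the classroom is almost certain.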
19.1.2
. 18 tests performed
. only 2 significant results
19.1.3
19.1.4
It was even stated that those who wake up before 7.21am have a statistically
significant higher stress level during the day than those who wake up after 7.21am.
19.1.5
Conclusion
For example, a new experiment might show no difference in stress levels between
subjects waking up early and those waking up late. Or maybe a difference would
be found only when waking up is later than 8.12am.
19.2
Bonferroni correction
For example, performing 2 tests at the 2.5% level of significance each implies that
the probability of making at least one type I error will not exceed 5%.
In general, when k tests are performed at the $\alpha/k$ level of significance, one is sure
that the overall probability of making at least one type I error will not exceed $\alpha$.
This correction of the significance level is called the Bonferroni correction.
When confidence intervals are used instead of p-values, the confidence levels can
be corrected in a similar way
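The correction itself is a one-line calculation, sketched below for an overall level of 5%; the function name `bonferroni` is hypothetical.

```python
def bonferroni(alpha, k):
    """Per-test significance level and matching confidence level for k tests."""
    per_test_alpha = alpha / k
    confidence_level = 1 - per_test_alpha
    return per_test_alpha, confidence_level

for k in (1, 2, 5, 10):
    level, confidence = bonferroni(0.05, k)
    print(f"{k:2d} tests: significance level {level:.4f}, confidence level {confidence:.2%}")
```

The per-test level shrinks quickly with the number of tests, which is the price paid for controlling the overall type I error rate.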
Some examples:
Number of tests     Significance level     Confidence level
1                   0.05                   95%
2                   0.025                  97.5%
5                   0.01                   99%
k                   0.05/k                 (1 − 0.05/k) × 100%
For example, if $CI_1, CI_2, \ldots, CI_5$ are 5 intervals with 99% confidence, for 5
unknown parameters $\theta_1, \theta_2, \ldots, \theta_5$, then there is at least 95% probability that
each of the 5 C.I.s contains its corresponding unknown parameter:
$P(CI_1 \text{ contains } \theta_1 \text{ and } \ldots \text{ and } CI_5 \text{ contains } \theta_5) \geq 95\%$
19.3
Note that the reader cannot perform the Bonferroni correction as the exact
p-values have not been reported.
19.4
In order to show causal effects, patients are often randomized into 2 or more
groups
This ensures (at least in large studies) that all treatment groups are identical,
except for the treatment the patients receive
In (relatively) small studies, imbalances can still occur by pure chance
Therefore, one often compares the various groups with respect to important
factors which are believed to be strongly related to the outcome of interest.
This is called testing for baseline differences, as one compares the
characteristics of the patients at the start of the study.
versus
$H_A: \mu_A \neq \mu_B$
Note that H0 and HA express properties of the populations, not the samples
In our example suppose that a 95% confidence interval for the average difference
in age (years) is given by [0.1; 0.3], then we believe that this difference would be
too small to explain why patients in group A show more decrease in BP than
patients in group B.
Note also that testing for baseline differences cannot be used to check whether
the randomization was done properly.
19.5
19.6
Equivalence tests
Suppose two groups A and B are to be compared, and a two-sample t-test is used
to test
$H_0: \mu_A = \mu_B$ versus $H_A: \mu_A \neq \mu_B$
In case of a non-significant test result, one often concludes that both groups are
identical or equivalent
An alternative interpretation is that the experiment did not have sufficient power
to show an effect which is present.
Conclusion:
Non-significance should not be interpreted as equivalence
This can also be seen from the fact that, if the two-sample t-test could be used to
show equivalence, it would be best to collect data on (extremely) small samples,
as this would increase the chance of obtaining a non-significant result, due to lack
of power.
Instead, one should reverse H0 and HA:
$H_0: |\mu_A - \mu_B| > \Delta$
versus
$H_A: |\mu_A - \mu_B| \leq \Delta$
[Figure: 95% C.I. for $\mu_A - \mu_B$, shown relative to 0]
Graphically, H0 would not be rejected if:
[Figure: 95% C.I. for $\mu_A - \mu_B$, shown relative to 0]
Obviously, the result of the equivalence test entirely depends on the choice of $\Delta$
Therefore, $\Delta$ needs to be specified prior to the data collection
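One common way to carry out such a test is through the confidence interval for the difference: equivalence is concluded only if the whole interval lies within the pre-specified margin. The sketch below assumes this interval-based decision rule and uses made-up interval bounds and margins; the function name `equivalent` and the parameter `delta` are hypothetical.

```python
def equivalent(ci_lower, ci_upper, delta):
    """Conclude equivalence only if the whole C.I. for the difference lies in [-delta, delta]."""
    return -delta <= ci_lower and ci_upper <= delta

# Hypothetical C.I. for the difference between groups A and B.
ci = (-0.1, 0.3)
print(equivalent(*ci, delta=0.5))  # margin wide enough: equivalence concluded
print(equivalent(*ci, delta=0.2))  # margin too narrow: equivalence not shown
```

The same interval leads to opposite conclusions for the two margins, which is why the margin has to be fixed before the data are seen.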
19.7
. Table 1:
No significant
differences !
. Study design:
. Definition of equivalence:
19.8
We discussed before that the power to detect some effect increases with the
sample size
This implies that any effect, no matter how small, will sooner or later be
detected if the sample is sufficiently large.
For example, consider the Captopril data, where the observed difference of 9.27
mmHg was found significantly different from zero (p < 0.001), based on data
from 15 patients only:
The 99% confidence interval for the average change in BP was found to be
[3.02; 15.52].
Suppose that the observed difference would have been 0.1 mmHg.
A p-value as small as 0.001 would be likely to be obtained, provided that the
sample would be sufficiently large.
Obviously, an average change in BP as small as 0.1 mmHg is not relevant from a
clinical point of view.
Conclusion:
Statistical significance ≠ Clinical relevance
[Figure: 95% C.I. for an effect with p = 0.0001]
A highly significant effect can also be a very small effect, but estimated with high
precision, due to a large sample size:
[Figure: 95% C.I. for an effect, shown relative to 0, with p = 0.0001]
Chapter 20
One-sided versus two-sided tests
. Introduction
. One-sided tests
. Example
. Example from the biomedical literature
20.1
Introduction
Re-consider the Captopril data, where the observed difference of $\hat{\mu} = 9.27$ mmHg
was found significantly different from zero (p < 0.001):
$H_0: \mu = 0$
versus
$H_A: \mu \neq 0$
This implies that an observed difference much larger or much smaller than 0
provides evidence against $H_0$.
This is also reflected in the calculation of the p-value:
p is the probability of observing an average difference at
least as far away from 0 as 9.27, if $\mu = 0$.
This is equivalent to
p is the probability of observing an average difference larger
than 9.27 or smaller than $-9.27$, if $\mu = 0$.
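In symbols, writing $\bar{d}$ for the average difference in a hypothetical repetition of the study (a notation introduced here only for this illustration), the two descriptions of the two-sided p-value read:

```latex
p \;=\; P\bigl(|\bar{d}| \geq 9.27 \,\big|\, \mu = 0\bigr)
  \;=\; P\bigl(\bar{d} \geq 9.27 \,\big|\, \mu = 0\bigr)
  \;+\; P\bigl(\bar{d} \leq -9.27 \,\big|\, \mu = 0\bigr)
```

Because the sampling distribution of the average difference is symmetric around 0 when $\mu = 0$, each of the two terms equals p/2.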
Graphically:
[Figure: sampling distribution of the average difference under $H_0$, centred at 0, with an area of p/2 in each tail, below $-9.27$ and above 9.27]
20.2
One-sided tests
$H_0: \mu \leq 0$
versus
$H_A: \mu > 0$
Graphically:
[Figure: sampling distribution of the average difference, centred at 0, with the p-value p now shown as a single tail area beyond 9.27]
However, the study objectives should never be influenced by the data that are
observed.
One-sided testing is justified only if
. it is known that an effect, if any, can only be
in one direction
. only one direction is of scientific interest
. the decision is made prior to the data collection
20.3
Example
In the context of the Captopril data, suppose that one is only interested in
treatments which yield an average decrease of at least 5 mmHg in diastolic BP.
This would lead to testing
$H_0: \mu \leq 5$
versus
$H_A: \mu > 5$
Note that only differences larger than 5 can be used as evidence against $H_0$.
The p-value is calculated as:
p is the probability of observing an average difference at
least as large as 9.27, if $\mu = 5$.
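Written as a formula, with $\bar{d}$ denoting the average difference in a hypothetical repetition of the study and $\mathrm{s.e.}$ its standard error (both notations are introduced here for illustration only), this one-sided p-value is:

```latex
p \;=\; P\bigl(\bar{d} \geq 9.27 \,\big|\, \mu = 5\bigr)
  \;=\; P\!\left(\frac{\bar{d} - 5}{\mathrm{s.e.}} \;\geq\; \frac{9.27 - 5}{\mathrm{s.e.}} \,\middle|\, \mu = 5\right)
```

The observed difference is thus compared with the hypothesised value 5 rather than with 0, so the test statistic is based on 9.27 - 5 = 4.27 mmHg.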
Graphically:
[Figure: sampling distribution of the average difference, centred at 5, with the p-value p shown as the tail area above 9.27]
20.4
Example from the biomedical literature
Results, p.8316:
Results (abstract):
Chapter 21
Describing associations
. Introduction
. Pearson correlation
. Relative risk
. Odds ratio
. Examples from biomedical literature
21.1
Introduction
All test procedures discussed so far aim at expressing to what extent an observed
relation between two variables can be ascribed to pure chance:
. Unpaired t-test: The relation between a continuous response Y (e.g., weight
gain) and a dichotomous variable X (e.g., protein level) which defines the
groups to be compared.
. Chi-squared test: The relation between a dichotomous response Y (e.g.,
sickness absence) and a dichotomous variable X (e.g., gender) which defines
the groups to be compared.
As discussed before, p-values do not express the size of a relation: A highly
significant effect does not necessarily mean that the effect is clinically relevant, i.e.,
the association between the variables is not necessarily very strong.
21.2
Pearson correlation
Let us focus on describing the association between the recovery time and the
log(dose), irrespective of the type of operation.
For each patient, we have two measurements:
. The log(dose): $x_i$ for the $i$th patient
. The recovery time: $y_i$ for the $i$th patient
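The Pearson correlation coefficient is defined as:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the sample averages of the observed x-values and y-values,
respectively:
$$\bar{x} = \frac{1}{n} \sum_i x_i, \qquad \bar{y} = \frac{1}{n} \sum_i y_i$$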
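$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$
[Figure: scatter plot of the points $(x_i, y_i)$, divided into four quadrants by the lines $x = \bar{x}$ and $y = \bar{y}$; the quadrants are labelled with the signs of $(x_i - \bar{x},\, y_i - \bar{y})$: $(-,+)$, $(+,+)$, $(-,-)$ and $(+,-)$]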
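As a minimal sketch of how the Pearson correlation follows from its definition, the code below computes r step by step. The data values are made up for illustration and are not the surgery data; the function name `pearson_r` is ours.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Each cross-product is positive for points in the (+,+) and (-,-)
    # quadrants and negative for points in the (-,+) and (+,-) quadrants.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical values, made up for illustration only.
log_dose = [0.5, 1.0, 1.5, 2.0, 2.5]
recovery_time = [12.0, 15.0, 14.0, 19.0, 22.0]

print(round(pearson_r(log_dose, recovery_time), 3))
```

Most points in this hypothetical sample fall in the $(+,+)$ and $(-,-)$ quadrants, so the positive cross-products dominate and r is positive.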
Note that the correlation r is computed from the observed values $(x_i, y_i)$, and
only describes the association that has been observed in the sample.
However, this sample correlation r can be considered an estimate for the
population correlation $\rho$, i.e., the correlation that would be obtained if the
total (infinite) population were studied.
Usually it is of interest to use the observed sample to test whether $\rho$ can be
considered different from zero.
Formally, the following hypothesis is to be tested:
$H_0: \rho = 0$,
versus
$H_A: \rho \neq 0$
[Diagram: from the POPULATION, for which $H_0: \rho = 0$ is tested against $H_A: \rho \neq 0$, a RANDOM SAMPLE is drawn, which yields the estimate for $\rho$: $\hat{\rho} = r$]
For our example, the correlation matrix for the three variables in the surgery data
set is:
Note that the normality assumption for the time variable is questionable, implying
that the reported p-values may not be correct
One way to solve this is to transform the variable logarithmically, leading to:
21.3
Relative risk
We re-consider the sickness absence example, where the following data were
observed in one of the companies studied:
              Sickness absence
Gender        Yes      No     Total
female        117     152       269
male          378     711      1089
Total         495     863      1358
The relative risk (RR) quantifies how much more sickness absence occurs in
females, compared to males:
$$RR = \frac{117/269}{378/1089} = 1.25$$
This implies that sickness absence occurs 1.25 times as often in females as in males.
Alternatively, we can conclude that the risk of sickness absence is 25% larger in
females than in males.
As for the correlation coefficient, the RR can be considered an estimate, based on
our sample, for the theoretical relative risk in the total population.
Note that a RR equal to 1 would imply that the risk is the same for both genders,
i.e., that there is no relation between sickness absence and gender.
It is therefore often of interest to test whether the relative risk in the population is
equal to 1. Alternatively, C.I.s for the relative risk can be constructed as well.
For example, a 95% C.I. for the RR in our example is given by [1.0692; 1.4686].
Since $1 \notin [1.0692; 1.4686]$, we know that the null hypothesis of no relation
between gender and sickness absence is rejected.
Note that formal testing of this hypothesis was done before using the chi-squared
and Fisher exact tests.
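The slides do not state how this interval was computed. As a sketch, the code below recomputes the RR from the sickness absence table and builds a 95% C.I. on the log scale with the usual large-sample standard error. This method is an assumption on our part, chosen because it reproduces the reported limits; the function name `relative_risk_ci` is ours.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(a, n1, c, n2, level=0.95):
    """RR of group 1 versus group 2, with a log-scale confidence interval.

    a, c   : number of events in group 1 and in group 2
    n1, n2 : sizes of group 1 and group 2
    """
    rr = (a / n1) / (c / n2)
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Sickness absence: 117 of 269 females, 378 of 1089 males.
rr, lower, upper = relative_risk_ci(117, 269, 378, 1089)
print(f"RR = {rr:.4f}, 95% C.I. = [{lower:.4f}; {upper:.4f}]")
```

Working on the log scale keeps both limits positive and makes the interval symmetric around the estimate in a multiplicative sense.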
21.4
Odds ratio
We re-consider the data on the relation between the occurrence of cervical cancer
and the age at first pregnancy:
             Disease status
Age       Cervical cancer   Control   Total
≤ 25             42           203      245
> 25              7           114      121
Total            49           317      366
It was shown before that there is a highly significant relation between age at first
pregnancy and the occurrence of cervical cancer (p = 0.002, chi-squared and
Fisher Exact).
The relative risk of interest would indicate how much more likely cervical cancer is
to occur when the first pregnancy is before the age of 25 years, compared to when
the first pregnancy is after the age of 25 years.
Hence, the relative risk of interest is
$$RR = \frac{P(\text{cancer} \mid \text{age} \leq 25)}{P(\text{cancer} \mid \text{age} > 25)}$$
As discussed before, the case-control nature of this study does not allow
estimation of the proportions needed to calculate the above RR.
This is a direct consequence of the fact that the scientist him-/herself decides how
many cancer cases and how many controls will be selected in the sample.
The effect of that decision can be seen from comparing several situations with
different numbers of selected controls:
Table:
          Cancer   Control
≤ 25        42       203
> 25         7       114
RR: $\frac{42/(42 + 203)}{7/(7 + 114)} = 2.96$

Table:
          Cancer   Control
≤ 25        42      2030
> 25         7      1140
RR: $\frac{42/(42 + 2030)}{7/(7 + 1140)} = 3.32$
This means that the RR can be influenced simply by selecting more or fewer
controls.
Therefore, the RR cannot be used to describe the strength of association in
case-control studies.
An alternative to the RR, which can be used for case-control studies, is the odds
ratio, defined as the ratio of the odds of cancer in the ≤ 25 group over the odds
of cancer in the > 25 group.
The odds of cancer in the ≤ 25 group is defined as:
$$\text{Odds}_{\leq 25} = \frac{42/(42 + 203)}{203/(42 + 203)} = \frac{42}{203} = 0.2069$$
Note that this odds is a measure for the risk of cancer in the ≤ 25 group, since it
will be large if there are many cancer cases, and small otherwise.
Similarly, the odds of cancer in the > 25 group is:
$$\text{Odds}_{> 25} = \frac{7/(7 + 114)}{114/(7 + 114)} = \frac{7}{114} = 0.0614$$
This odds is a measure for the risk of cancer in the > 25 group, since it will be
large if there are many cancer cases, and small otherwise.
The odds ratio is now defined as:
$$OR = \frac{\text{Odds}_{\leq 25}}{\text{Odds}_{> 25}} = \frac{0.2069}{0.0614} = 3.37$$
Hence, the odds of developing cervical cancer are 3.37 times as large when the first
pregnancy is at an age younger than 25 years.
The odds ratio is difficult to interpret, but it clearly gives a general indication of
how much more risk there is in one group, compared to another group.
Note that the odds ratio also equals:
$$OR = \frac{42 \times 114}{203 \times 7} = 3.37$$
In general, we have, for a general 2 × 2 table:
            Group 1   Group 2
Case           A         B
Control        C         D
$$OR = \frac{AD}{BC}$$
This shows that, in contrast to the RR, the OR does not depend on the numbers
of selected cases and controls.
This can also be seen in our earlier examples:
Table:
          Cancer   Control
≤ 25        42       203
> 25         7       114
RR: $\frac{42/(42 + 203)}{7/(7 + 114)} = 2.96$
OR: $\frac{42 \times 114}{7 \times 203} = 3.37$

Table:
          Cancer   Control
≤ 25        42      2030
> 25         7      1140
RR: $\frac{42/(42 + 2030)}{7/(7 + 1140)} = 3.32$
OR: $\frac{42 \times 1140}{7 \times 2030} = 3.37$
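The comparison can be repeated for any number of selected controls. The sketch below (function names are ours) multiplies the number of controls in both age groups by the same factor, mimicking a scientist who decides to sample more controls, and recomputes both measures.

```python
def relative_risk(cases_1, controls_1, cases_2, controls_2):
    """RR computed as if the sampled table reflected the population."""
    risk_1 = cases_1 / (cases_1 + controls_1)
    risk_2 = cases_2 / (cases_2 + controls_2)
    return risk_1 / risk_2


def odds_ratio(cases_1, controls_1, cases_2, controls_2):
    """OR as the cross-product ratio of the 2 x 2 table."""
    return (cases_1 * controls_2) / (controls_1 * cases_2)


# Group 1: first pregnancy at age <= 25; group 2: age > 25.
for factor in (1, 10, 100):
    controls_1, controls_2 = 203 * factor, 114 * factor
    rr = relative_risk(42, controls_1, 7, controls_2)
    oratio = odds_ratio(42, controls_1, 7, controls_2)
    print(f"controls x {factor}: RR = {rr:.2f}, OR = {oratio:.2f}")
```

The multiplication factor cancels in the cross-product ratio, which is why the OR stays the same while the RR changes with the number of controls.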
As for the correlation coefficient and the RR, the OR can be considered an
estimate, based on our sample, for the theoretical odds ratio in the total
population.
Note that an OR equal to 1 would imply that the risk is the same for both groups,
i.e., that there is no relation between cervical cancer and the age at first
pregnancy.
In that case, one would also have RR = 1.
It is therefore often of interest to test whether the odds ratio in the population is
equal to 1. Alternatively, C.I.s for the odds ratio can be constructed as well.
For example, a 95% C.I. for the OR in our example is given by [1.4658; 7.7457].
Since $1 \notin [1.4658; 7.7457]$, we know that the null hypothesis of no relation
between cervical cancer and age at first pregnancy is rejected.
Note that formal testing of this hypothesis was done before using the chi-squared
and Fisher exact tests.
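How this interval was obtained is not stated in the slides. As a sketch under the assumption that a log-scale interval with the usual large-sample standard error was used (an assumption that reproduces the reported limits), the calculation could look as follows. The function name `odds_ratio_ci` is ours.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """OR = (a*d)/(b*c) for a 2 x 2 table, with a log-scale confidence interval."""
    oratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(oratio) - z * se_log_or)
    upper = exp(log(oratio) + z * se_log_or)
    return oratio, lower, upper


# a, b: cases and controls with first pregnancy at age <= 25;
# c, d: cases and controls with first pregnancy at age > 25.
oratio, lower, upper = odds_ratio_ci(42, 203, 7, 114)
print(f"OR = {oratio:.2f}, 95% C.I. = [{lower:.4f}; {upper:.4f}]")
```

The interval is wide mainly because of the small cell with only 7 cancer cases, which dominates the standard error.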
21.5
Examples from biomedical literature
. Table 1, p.398:
             Classmates    Preterm
Impaired      2 (1.3%)     99 (41%)
$$OR = \frac{158 \times 99}{2 \times 142} = 55$$
$$RR = \frac{99/241}{2/160} = 33$$
Chapter 22
Non-parametric statistics
. Introduction
. The principle of ranks
. Wilcoxon test
. Example: Survival times in cancer patients
. Spearman correlation
. Example: Surgery data
. Remarks
. Examples from biomedical literature
22.1
Introduction
22.2
The principle of ranks
This suggests that the colon-cancer cases have longer survival times than the
stomach-cancer cases, i.e., that the distribution of the survival times in one
group is shifted to the right of the distribution in the other group.
This implies that, if all observations would be ranked, we expect to see more
observations from the stomach-cancer group in the lower ranks, and more from
the colon-cancer group in the higher ranks.
This suggests that it is sufficient to study the ranks of the observations, i.e.,
which observations are larger/smaller than others, to decide whether the survival
times in both groups can be assumed to be sampled from the same distribution.
The actual location of the observations is not needed; it is sufficient to know their
ranks.
Most non-parametric tests are based on replacing the observations by their ranks.
22.3
Wilcoxon test
The Wilcoxon test is the non-parametric version of the unpaired t-test. Hence, it
allows comparison of two populations, without having to assume the data to be
normally distributed in both populations
The null and alternative hypotheses are:
Hence, the alternative assumes that one distribution is just shifted from the other.
As an example of how the Wilcoxon test proceeds, consider the comparison of two
populations (A and B), on the basis of the following two samples:
A 7 4 9 17
B 11 6 21 14 18
The observations are now sorted, while keeping track of the population from
which they were sampled (group A or B):
4 6 7 9 11 14 17 18 21
A B A A B B A B B
The observed values are now replaced by their rank in the complete data set
(groups A and B together):
1 2 3 4 5 6 7 8 9
A B A A B B A B B
The sum of the ranks of all observations from one group is now calculated. For
example, for group A, this becomes:
WA = 1 + 3 + 4 + 7 = 15
Obviously, if WA is exceptionally large, this means that the observations in group
A are located more to the right, when compared to the observations in group B
Hence, even if the two samples were drawn from the same population, there would
be a 28.57% chance of observing two samples shifted from each other at least as much as in
the current experiment, by pure chance.
Hence, what has been observed in the current experiment is perfectly in line with
what is to be expected, if the two populations are identical.
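As a sketch of one way to arrive at the 28.57%, the code below enumerates every possible assignment of 4 of the 9 ranks to group A. If the two populations are identical, all these assignments are equally likely, so the p-value is the fraction of assignments whose rank sum lies at least as far from its expected value as the observed one. This exact two-sided calculation is our assumption about the method; the function name is ours and ties are not handled.

```python
from itertools import combinations


def rank_sum_exact_p(sample_a, sample_b):
    """Exact two-sided p-value of the Wilcoxon rank-sum test (no ties assumed)."""
    pooled = sorted(sample_a + sample_b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_a, n_total = len(sample_a), len(pooled)
    w_a = sum(rank[v] for v in sample_a)
    expected = n_a * (n_total + 1) / 2  # mean of the rank sum under H0
    observed_dev = abs(w_a - expected)
    # Under H0, every subset of n_a ranks is equally likely for group A.
    sums = [sum(c) for c in combinations(range(1, n_total + 1), n_a)]
    extreme = sum(1 for s in sums if abs(s - expected) >= observed_dev)
    return w_a, extreme / len(sums)


w_a, p_value = rank_sum_exact_p([7, 4, 9, 17], [11, 6, 21, 14, 18])
print(w_a, round(p_value, 4))  # 15 0.2857
```

For the samples above, 36 of the 126 possible assignments are at least as extreme as the observed rank sum of 15, which gives 36/126 = 0.2857.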
22.4
Example: Survival times in cancer patients
The survival times of colon cancer patients were compared before with those of
stomach cancer patients, using the unpaired t-test, after logarithmic
transformation of the survival times.
We can now repeat this non-parametrically, for the original as well as
log-transformed survival times:
                         t-test        Wilcoxon
Original data            p = 0.2483    p = 0.0945
Log-transformed data     p = 0.0671    p = 0.0945
Note that the Wilcoxon test yields a p-value closer to the one obtained from the
t-test based on log-transformed data than to the one obtained from the t-test
based on the original data
Since the Wilcoxon test is based on ranks rather than the original data,
transforming the data will not affect the result, as long as monotonic
transformations are used.
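The invariance under monotonic transformations can be checked directly. In the sketch below, the survival times are hypothetical values made up for illustration, and the helper `ranks` is ours (it assumes there are no ties).

```python
from math import log


def ranks(values):
    """Rank of each value within the list (1 = smallest; no ties assumed)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


# Hypothetical survival times in days, made up for illustration only.
times = [24, 406, 58, 1235, 95, 163]
log_times = [log(t) for t in times]

print(ranks(times))
print(ranks(times) == ranks(log_times))
```

Since the logarithm preserves the ordering of the observations, both lists of ranks are identical, and so is any test statistic computed from them.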
22.5
Spearman correlation
 $x_i$    $y_i$
   0     0.10
   2     1.05
   4     4.00
   6     5.30
   8     5.75
  10     6.55
  12     6.65
  13     8.17
[Figure: scatter plot of $y_i$ against $x_i$]
Each value xi is now replaced by its rank amongst all observed values for X.
Similarly, each value yi is now replaced by its rank amongst all observed values
for Y .
Graphically:
[Figure: scatter plot of rank($y_i$) against rank($x_i$)]
One now calculates a Pearson correlation as a measure of association between
the so-obtained ranks.
In the above example, the ranks show a perfect linear relation, implying that the
Spearman correlation will equal 1.
Note that the original data did not show a perfect linear fit, implying that the
Pearson correlation would be less than 1.
The Spearman correlation coefficient measures to what extent there is a
monotone relation between X and Y , and has the following properties:
. $-1 \leq r \leq 1$
. $r < 0$: negative trend between the $x_i$ and the $y_i$
. $r > 0$: positive trend between the $x_i$ and the $y_i$
. $r = -1$: there is a perfect negative monotone relation between the $x_i$ and $y_i$
. $r = 1$: there is a perfect positive monotone relation between the $x_i$ and $y_i$
. $r = 0$: there is no monotone trend between the $x_i$ and the $y_i$
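As a sketch of the procedure described above, the code below replaces the eight (x, y) pairs of the example by their ranks and then computes a Pearson correlation on those ranks. The function names are ours, and the simple ranking helper assumes there are no ties.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from its definition."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def ranks(values):
    """Rank of each value within the list (1 = smallest; no ties assumed)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_r(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


x = [0, 2, 4, 6, 8, 10, 12, 13]
y = [0.10, 1.05, 4.00, 5.30, 5.75, 6.55, 6.65, 8.17]

print("Spearman:", spearman_r(x, y))
print("Pearson: ", round(pearson_r(x, y), 3))
```

Because y increases whenever x increases, the two sets of ranks coincide and the Spearman correlation equals 1, whereas the Pearson correlation of the original values stays below 1.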
22.6
Example: Surgery data
Before, a Pearson correlation analysis was performed, and the variable Time was
log-transformed in order to satisfy the normality assumption.
We compare the previous results with those from a non-parametric Spearman
correlation analysis:
Note that Spearman correlations are not always larger/smaller than Pearson
correlations.
Since the Spearman correlation is based on ranks rather than the original data,
monotone transformations of the data will not affect the result.
22.7
Remarks
. Means and standard deviations + Parametric tests
. Medians and interquartile ranges + Non-parametric tests
In case the distributional assumptions of a specific test are satisfied, one has the
choice between the parametric and non-parametric test.
In such cases, the parametric techniques are to be preferred, as they are more
powerful to detect relevant effects.
Unfortunately, many research questions will require more complex statistical tools
for which no non-parametric alternatives are available.
22.8
Examples from biomedical literature
. Figure 3:
Bibliography
[1] S. Graham and W. Shotz. Epidemiology of cancer of the cervix in Buffalo, New York. Journal of the National Cancer Institute, 63:23-27, 1979.
[2] D.J. Hand, F. Daly, A.D. Lunn, K.J. McConway, and E. Ostrowski. A handbook of small datasets. Chapman & Hall, first edition, 1989.
[3] P. Armitage and P. Berry. Statistical methods in medical research. Blackwell Scientific Publications, 1987.
[4] E. Cameron and L. Pauling. Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Science U.S.A., 75:4538-4542, 1978.
[5] G.A. MacGregor, N.D. Markandu, J.E. Roulston, and J.C. Jones. Essential hypertension: effect of an oral inhibitor of angiotensin-converting enzyme. British Medical Journal, 2:1106-1109, 1979.
[6] M. Bland. An introduction to medical statistics. Oxford University Press, 3rd edition, 2006.
[7] J.D. Robertson and P. Armitage. Comparison of two hypotensive agents. Anaesthesia, pages 53-64, 1959.
[8] H.A. Boushey, C.A. Sorkness, T.S. King, et al. Daily versus as-needed corticosteroids for mild persistent asthma. The New England Journal of Medicine, 352:1519-1528, 2005.
[9] N. Marlow, D. Wolke, M.A. Bracewell, et al. Neurologic and developmental disability at six years of age after extremely preterm birth. The New England Journal of Medicine, 352:9-19, 2005.
[10] C.A. Wong, B.M. Scavone, A.M. Peaceman, et al. The risk of cesarean delivery with neuraxial analgesia given early versus late in labor. The New England Journal of Medicine, 352:655-665, 2005.
[11] F. Blanchon, M. Grivaux, B. Asselain, et al. 4-year mortality in patients with non-small-cell lung cancer: development and validation of a prognostic index. Lancet Oncology, 7:829-836, 2006.
[12] K.M. Kellett, D.A. Kellett, and L.A. Nordholm. Effects of an exercise program on sick leave due to back pain. Physical Therapy, 71:283-293, 1991.
[13] S.P. Wu. Maximum acceptable weight of lift by Chinese experienced male manual handlers. Applied Ergonomics, 28:237-244, 1997.
[14] T. Nawrot, M. Plusquin, J. Hogervorst, et al. Environmental exposure to cadmium and risk of cancer: a prospective population-based study. The Lancet Oncology, 7:119-126, 2006.
[15] S.E. Nissen, E.M. Tuzcu, P. Schoenhagen, et al. Statin therapy, LDL cholesterol, C-reactive protein, and coronary artery disease. The New England Journal of Medicine, 352:29-38, 2005.
[16] E. Zuskin, J. Mustajbegovic, N. Schachter, et al. Longitudinal study of respiratory findings in rubber workers. American Journal of Industrial Medicine, 30:171-179, 1996.
[17] N.H. Chen, P.C. Wang, M.J. Hsieh, et al. Impact of severe acute respiratory syndrome care on the general health status of healthcare workers in Taiwan. Infection Control and Hospital Epidemiology, 28:75-79, 2007.
[18] C.A.S. De Clercq, J.S.V. Abeloos, M.Y. Mommaerts, and L.F. Neyt. Temporomandibular joint symptoms in an orthognathic surgery population. Journal of Cranio Maxillo-Facial Surgery, 23:195-199, 1995.
[19] A.I. Amin, O. Hallbook, A.J. Lee, R. Sexton, B.J. Moran, and R.J. Heald. A 5-cm colonic J pouch colo-anal reconstruction following anterior resection for low rectal cancer results in acceptable evacuation and continence in the long term. Colorectal Disease, 5:33-37, 2003.
[20] S. Kaplan, S. Etlin, I. Novikov, and B. Modan. Occupational risks for the development of brain tumours. American Journal of Industrial Medicine, 31:15-20, 1997.
[21] Y. Baba, J.D. Putzke, N.R. Whaley, Z.K. Wszolek, and R.J. Uitti. Gender and the Parkinson's disease phenotype. Journal of Neurology, 252:1201-1205, 2005.
[22] T. Shatari, M.A. Clark, T. Yamamoto, A. Menon, C. Keh, J. Alexander-Williams, and M. Keighley. Long strictureplasty is as safe and effective as short strictureplasty in small-bowel Crohn's disease. Colorectal Disease, 6:438-441, 2004.
[23] P. Sripalakit, P. Nermhom, and S. Maphanta. Bioequivalence evaluation of two formulations of Doxazosin tablet in healthy Thai male volunteers. Drug Development and Industrial Pharmacy, 31:1035-1040, 2005.
[24] L.F. Hutchins, S.J. Green, P.M. Ravdin, D. Lew, S. Martino, M. Abeloff, A.P. Lyss, C. Allred, S.E. Rivkin, and C.K. Osborne. Randomized, controlled trial of Cyclophosphamide, Methotrexate, and Fluorouracil versus Cyclophosphamide, Doxorubicin, and Fluorouracil with and without Tamoxifen for high-risk, node-negative breast cancer: Treatment results of intergroup protocol INT-0102. Journal of Clinical Oncology, 23:8313-8321, 2005.
[25] T. Giantomaso, L. Makowsky, N.L. Ashworth, and R. Sankaran. The validity of patient and physician estimates of walking distance. Clinical Rehabilitation, 17:394-401, 2003.
[26] S.A. Choksy, P.L. Chong, C. Smith, M. Ireland, and J. Beard. A randomised controlled trial of the use of a tourniquet to reduce blood loss during transtibial amputation for peripheral arterial disease. European Journal of Vascular and Endovascular Surgery, 31:646-650, 2006.
[27] C.-C.J. Huang, C.-M. Li, C.-F. Wu, S.-P. Jao, and K.-Y. Wu. Analysis of urinary N-acetyl-S-(propionamide)-cysteine as a biomarker for the assessment of acrylamide exposure in smokers. Environmental Research, 104:346-351, 2007.