(AS04)
EPM304 Advanced Statistical Methods in Epidemiology
This document contains a copy of the study material located within the computer
assisted learning (CAL) session.
If you have any questions regarding this document or your course, please contact
DLsupport via DLsupport@lshtm.ac.uk.
Important note: this document does not replace the CAL material found on your
module CDROM. When studying this session, please ensure you work through the
CDROM material first. This document can then be used for revision purposes to
refer back to specific sessions.
These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of
the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale
or further copying.
London School of Hygiene & Tropical Medicine September 2013 v2.0
SM02
SM11
Section 3: Introduction
In a cohort study a group of people is followed over a period of time to
study the occurrence of disease.
Interaction: Hyperlink: cohort
Output (appears in separate window):
A study in which subsets of a defined population can be identified that are, have
been, or may in the future be exposed, or not exposed, to a factor that is thought to
influence the probability of occurrence of a given disease or other outcome.
Cohort studies can also be called follow-up, longitudinal, or prospective studies.
3.1: Introduction
Typically, we are interested in the rate of occurrence of disease in the group and how
this rate varies between sub-groups with different patterns of exposure.
In the examples that we have looked at previously, the exposure has always been
the same through time for each individual and so the sub-groups have been fixed
through the duration of the cohort.
Can you think of any examples of exposures that are fixed through time?
Interaction: Button: Show
Output:
Gender, age at entry to the cohort and country of origin will all be fixed during the
duration of the cohort.
3.2: Introduction
We will now consider exposures that can change during the follow-up period. We
can use the same analysis to calculate the usual estimates of crude, stratum-specific
and adjusted rate ratios, but first some data manipulation is necessary.
Let's first think about the types of variable that change over time. Can you think
of any? Click the button below for examples.
Interaction: Button: Example (1):
Output:
What is a time changing variable?
Interaction: Tabs: Fixed :
Output:
Some characteristics of a person are fixed and never change, for example 'date of
birth', 'colour of eyes', 'place of birth'.
Interaction: Tabs: Random :
Output:
Other variables can change randomly.
For example, a person may be a smoker, give up smoking, and start smoking
again six months later. Such details can be difficult to obtain but it is sometimes
possible.
3.3: Introduction
In the diagram below, subject 3 spends the first half of the follow-up period in the
unexposed group and the second half of the follow-up period in the exposed
group. To deal with this we can split the follow-up period into 2 parts.
In the same way it is possible to split the follow up time of an individual into
many parts.
The method that splits the individual follow-up times, for example into 5-year age
intervals, is called Lexis expansion.
Interaction: Hyperlink: Lexis expansion.:
Output:
Lexis Expansion
A method that splits the follow-up time of individuals in a cohort study. Using this
manipulation of cohort data we can examine the effect of variables which change
over time.
Imagine that the exposure of interest is being aged 50 years or more, so that the
unexposed group is those aged under 50 years. Consider the 4 subjects shown in the
diagram. During a 10-year follow-up period, subject 3 changes from unexposed to exposed.
3.4: Introduction
Example
Imagine an individual followed for 12 years from the age of 21 years to 32 years
inclusive.
If we split the follow up time for this individual into the age bands 20 to 24 years,
25 to 29 years and 30 to 34 years, the follow-up time in each interval would be:
(a) 4 years in the ageband 20 to 24 years
(b) 5 years in the ageband 25 to 29 years
(c) 3 years in the ageband 30 to 34 years
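The split in this example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `split_by_ageband` and the representation of follow-up as exact ages at entry and exit are assumptions made for illustration, and "to 32 years inclusive" is read as follow-up ending on the 33rd birthday.

```python
def split_by_ageband(age_in, age_out, cuts):
    """Split follow-up from age_in to age_out at the given age cut points.

    Returns (band_start, band_end, years) tuples for bands with follow-up.
    """
    pieces = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        start, stop = max(age_in, lo), min(age_out, hi)
        if stop > start:
            pieces.append((lo, hi, stop - start))
    return pieces


# Followed from age 21 until the 33rd birthday (assumed exact ages).
pieces = split_by_ageband(21, 33, cuts=[20, 25, 30, 35])
for lo, hi, years in pieces:
    print(f"{lo} to {hi - 1} years: {years} years of follow-up")
```

The three pieces sum to the 12 years of total follow-up, which is the property that makes the split safe: no person-time is created or lost.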
3.5: Introduction
To study the effect of changing age on cohort mortality rates, the total
observation time for each individual is split into age specific intervals. Once this is
done, the separate age-specific records for all subjects are treated as
independent and age-specific rates estimated.
3.6: Introduction
To illustrate the basic Lexis principle we will use 3 subjects from a cohort which was
followed from time of entry into the study until 01/01/1981. The event of interest is
death. For simplicity all dates fall on the first day of the month. Click on Swap below
to view these data as a diagram.
Subject  Birth date  Entered     Age at entry  End of follow-up  Age at exit  Outcome
1        01/03/1927  01/07/1966  39.3          01/09/1977        50.5         Alive
2        01/04/1935  01/11/1961  26.6          01/12/1973        38.7         Death
3        01/11/1942  01/02/1970  27.3          01/01/1981        38.2         Alive
The follow-up periods for the three subjects are shown in the diagram above,
plotted on an age scale (rather than calendar date).
Now, click Split below to split the follow-up time into 5-year intervals.
3.7: Introduction
Let's first focus on Subject 1. You can now see the observation time and number
of outcomes in each interval for this subject.
The total observation time for Subject 1 is 11.2 years (from 01/07/1966 to
01/09/1977). This is equal to the sum of the separate times spent in the different
age groups.
For each age interval we want the follow-up time and the outcome for each
individual. Use the drop-down menu below to show the observation time and
outcomes in each interval, for each of the subjects.
Interaction: Pulldown: Subject 1:
Output:
Use Swap to see the original table of data for the subjects.
Interaction: Button: Swap:
Output:
Subject  Birth date  Entered     Age at entry  End of follow-up  Age at exit  Outcome
1        01/03/1927  01/07/1966  39.3          01/09/1977        50.5         Alive
2        01/04/1935  01/11/1961  26.6          01/12/1973        38.7         Death
3        01/11/1942  01/02/1970  27.3          01/01/1981        38.2         Alive
3.8: Introduction
If we add up the follow-up times for each subject within each ageband, we will get
the total observation time for each ageband.
If we add up the number of events (deaths) for each subject within each ageband,
we will get the total number of events for each ageband.
What is the total observation time (in years) for the age-interval 25 to 29 years?
Interaction: Calculation: Total Y =
Incorrect answer :
Output:
No, you should have added the time each subject spent in the age-interval 25 to 29
years.
0.0 (Subject 1) + 3.4 (Subject 2) + 2.7 (Subject 3) = 6.1 years
Correct answer :
Correct
Yes, the total observation for the age-interval 25 to 29 years is the sum of the time
each subject spent in that interval.
0.0 (Subject 1) + 3.4 (Subject 2) + 2.7 (Subject 3) = 6.1 years
Correct answer :
Output:
That's correct, none of the subjects died during the 25 to 29 years age interval, so
the number of events is zero.
Interaction: Pulldown: Subject 1:
Output:
3.9: Introduction
In the table below you can now see the values you just calculated for the 25 - 29
years interval. Click below to do this for all age groups.
Interaction: Button: Show:
Output:
Now we have Y and D for each age interval. Using these values we can now compute
the overall ageband specific rates. Click below to do this.
Interaction: Button: Show:
Output:
In this example we have used only
3 subjects as a simple illustration of the Lexis expansion process. In practice this is
done on many subjects.
Rates within age bands for three subjects
Age     Y      D   Rate
25-29   6.1    0   0/6.1
30-34   10.0   0   0/10.0
35-39   7.6    1   1/7.6
40-44   5.0    0   0/5.0
45-49   5.0    0   0/5.0
50-54   0.5    0   0/0.5
Total   34.2   1   1/34.2
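The Y, D and rate values for the three subjects can be reproduced from their ages at entry and exit. The sketch below is illustrative only: it uses the ages rounded to one decimal place, as quoted in the session, and the variable names are assumptions rather than anything defined in the course material.

```python
from collections import defaultdict

# (age at entry, age at exit, died at exit) for the three subjects.
subjects = [
    (39.3, 50.5, False),  # Subject 1
    (26.6, 38.7, True),   # Subject 2
    (27.3, 38.2, False),  # Subject 3
]
cuts = [25, 30, 35, 40, 45, 50, 55]

person_years = defaultdict(float)  # Y per age band, keyed by band start
deaths = defaultdict(int)          # D per age band, keyed by band start

for age_in, age_out, died in subjects:
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        start, stop = max(age_in, lo), min(age_out, hi)
        if stop <= start:
            continue  # no follow-up in this band
        person_years[lo] += stop - start
        # A death is counted in the band in which follow-up ends.
        if died and stop == age_out:
            deaths[lo] += 1

for lo, hi in zip(cuts[:-1], cuts[1:]):
    y, d = person_years[lo], deaths[lo]
    print(f"{lo}-{hi - 1}: Y = {y:.1f}, D = {d}, rate = {d}/{y:.1f}")

total_y = sum(person_years.values())
total_d = sum(deaths.values())
print(f"Total: Y = {total_y:.1f}, D = {total_d}")
```

Each split record contributes to exactly one age band, so the band totals add up to the overall person-years and deaths for the cohort.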
3.10: Introduction
Once a dataset has been split with a Lexis expansion, age-specific rates can be
estimated from the new records using each person's current age rather than their
age at entry.
Applying a Lexis expansion to follow-up data updates age throughout the follow-up
period. It is important to use Lexis expansion when the follow-up is long term.
Click below to apply the Lexis expansion to records for all individuals in the Whitehall
dataset.
Interaction: Button: Show:
Output(appears on RHS):
This table shows the Lexis expansion for all individuals in the Whitehall dataset.
The original 1677 individual records were split into 5243 age-specific records.
Estimated rates (per 1000) and lower/upper bounds of 95% confidence intervals
Ageband   D    Y (1000 py)   Rate     Lower    Upper
40 - 49   6    3.14          1.91     0.86     4.26
50 - 54   18   4.43          4.07     2.56     6.45
55 - 59   39   6.00          6.50     4.75     8.89
60 - 64   89   6.17          14.43    11.73    17.77
65 - 69   94   4.40          21.37    17.46    26.16
70 - 74   93   2.44          38.11    31.07    46.66
75 - 79   46   0.91          50.55    37.77    67.33
80 - 89   18   0.12          150.00   95.76    241.25
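The rates and confidence limits for the Whitehall age bands can be approximated from the deaths and person-years alone. The session does not state how the intervals were calculated, so the sketch below assumes the usual log-scale error factor, exp(1.96/√D). Because the person-years are quoted to only two decimal places, small differences from the published limits are to be expected, particularly in bands with little person-time.

```python
import math

# Deaths (D) and person-years in thousands (Y) for each age band.
agebands = ["40-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-89"]
deaths = [6, 18, 39, 89, 94, 93, 46, 18]
kpy = [3.14, 4.43, 6.00, 6.17, 4.40, 2.44, 0.91, 0.12]


def rate_with_ci(d, y, z=1.96):
    """Rate d/y with an approximate 95% CI from the error factor exp(z / sqrt(d))."""
    rate = d / y
    error_factor = math.exp(z / math.sqrt(d))
    return rate, rate / error_factor, rate * error_factor


results = [rate_with_ci(d, y) for d, y in zip(deaths, kpy)]
for band, (rate, lower, upper) in zip(agebands, results):
    print(f"{band}: {rate:6.2f} ({lower:6.2f} to {upper:6.2f}) per 1000 person-years")
```

The interval is symmetric on the log scale, which is why the upper limit lies further from the rate than the lower limit, most visibly in the bands with few deaths.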
We now know how to update age data throughout a long-term cohort. This is
useful because age may be a potential confounder, effect modifier or a risk factor.
Once we have split the data using a Lexis expansion, we will have one line in the
dataset for each person in each age category that they were in during the duration
of the cohort. Hence, we will typically have many more lines than before we used a
Lexis expansion.
We can now use current age as if it were any of the other time fixed variables
that we have looked at before, because for each line in the dataset, the person and
age category is fixed.
Although it may seem as though we must adjust for the fact that several lines (or
observations) are actually from the same person, this is not necessary (see Clayton
and Hills for an explanation of why this is, if you are interested).
If we now analyse the effect of another exposure and do not include current age
in the model, we will get the same results as if we had not done the Lexis
expansion. This is because the summed person time and number of events in each
category remains the same.
The 95% confidence interval is narrow and does not include 1. We can be 95%
confident that low-grade workers have higher mortality than high-grade workers
in the population of civil servants working in Whitehall. This is after
adjusting for the effect of age, including the ageing of the men during follow-up.
Interaction: Tabs: 3 :
Output:
To assess whether the relationship between employment grade and mortality is
confounded by current age, we compare the crude and adjusted rate ratios.
RRcrude = 2.30
RRM-H = 1.52
The adjusted rate ratio is lower than the crude rate ratio. This shows some evidence
of positive confounding. There was an overestimate of the increase in mortality rate
in the low-grade workers. This is due to the age differential in the different grades.
Crude, stratum-specific and adjusted rate ratios
Ageband    Rate ratio   Lower CL   Upper CL
40 - 49    1.13         0.13       9.71
50 - 54    2.35         0.88       6.26
55 - 59    1.87         0.96       3.64
60 - 64    1.91         1.25       2.91
65 - 69    1.78         1.19       2.67
70 - 74    0.98         0.65       1.48
75 - 79    1.33         0.74       2.39
80 - 89    1.94         0.64       5.89
Crude      2.30         1.89       2.80
Adjusted   1.52         1.25       1.86
In the classical analysis we saw how to split the data (using a Lexis expansion)
for time-changing confounders such as age and then stratify by this new variable
to assess potential confounding or effect modification. We can control for such
variables in the same way in a Poisson model.
Estimates from a Poisson model adjusted for age
            Coefficient   Standard error
Grade1      0.3454        0.1681
Ageband50   1.8433        1.0541
Ageband55   2.1612        1.0291
Ageband60   2.9162        0.0134
Ageband65   3.3047        0.0128
Ageband70   3.5169        1.0184
Ageband75   3.8104        1.0350
Ageband80   4.4880        1.1214
Constant    -8.1115       1.0006
(appears on page)
What is the adjusted rate ratio for the effect of Grade, to 2 decimal places?
Interaction: Calculation: Adjusted rate ratio =
Output:
Incorrect answer:
No, in fact the adjusted rate ratio for the effect of Grade is given by the
exponential of the coefficient for Grade1.
Adjusted rate ratio = exp(0.3454)
= 1.41
So, the rate in the low-grade workers is 1.41 times greater than that in high-grade
workers, after adjusting for the effect of changing age.
Correct answer:
Correct
The adjusted rate ratio for the effect of Grade is given by:
exp(0.3454) = 1.41.
So, the rate in the low-grade workers is 1.41 times greater than that in high-grade
workers, after adjusting for the effect of changing age.
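The calculation in the feedback can be checked directly from the Grade1 coefficient. The confidence interval in the sketch below is an addition: it is a Wald interval computed on the log scale from the quoted standard error, which is a common convention but is not a result reported in the session.

```python
import math

coef, se = 0.3454, 0.1681  # Grade1 coefficient and standard error

rate_ratio = math.exp(coef)
# Wald 95% interval, computed on the log scale and then exponentiated.
lower = math.exp(coef - 1.96 * se)
upper = math.exp(coef + 1.96 * se)

print(f"Adjusted rate ratio = {rate_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```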
Log rates given by the model (constant + ageband coefficient, plus the Grade1
coefficient for low grade)
Ageband   High grade   Low grade
40-       -8.1115      -7.7661
50-       -6.2682      -5.9228
55-       -5.9503      -5.6049
60-       -5.1953      -4.8499
65-       -4.8068      -4.4614
70-       -4.5946      -4.2492
75-       -4.3011      -3.9557
80-       -3.6235      -3.2781
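The high-grade and low-grade values listed above are consistent with adding the model coefficients on the log scale. The sketch below rebuilds them from the Poisson estimates. The dictionary layout and variable names are assumptions for illustration, and the 40- band is taken as the baseline with a coefficient of zero.

```python
constant = -8.1115
grade1 = 0.3454  # low grade versus high grade
ageband_coef = {
    "40-": 0.0,  # baseline age band (assumed coefficient of zero)
    "50-": 1.8433, "55-": 2.1612, "60-": 2.9162, "65-": 3.3047,
    "70-": 3.5169, "75-": 3.8104, "80-": 4.4880,
}

# (high grade, low grade) log rate for each age band.
log_rates = {
    band: (round(constant + b, 4), round(constant + b + grade1, 4))
    for band, b in ageband_coef.items()
}
for band, (high, low) in log_rates.items():
    print(f"{band:4s} high grade {high:8.4f}   low grade {low:8.4f}")
```

In every age band the low-grade log rate exceeds the high-grade log rate by the same 0.3454, so the rate ratio for grade is the same at all ages. This is what a model without an interaction term assumes.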
(appears on page):
The tabs below show simple illustrations of no interaction and interaction.
Interaction: Tabs: Proportional :
Output:
Output:
Exit: 01/07/1966, 01/03/1967, 01/01/1970, 01/03/1972, 01/01/1975, 01/03/1977, 01/09/1977
This splits the total follow-up time for subject 1 into 6 parts, as shown on the
diagram below.
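The dates listed for subject 1 are what you would get by cutting the follow-up at each 5-year birthday and at the start of each 5-year calendar period. The sketch below shows one way to generate them. The calendar cut points of 1 January 1970 and 1 January 1975 are inferred from the dates listed rather than stated in the text, the function name `cut_dates` is hypothetical, and the birthday calculation would need extra care for anyone born on 29 February.

```python
from datetime import date


def cut_dates(birth, entry, exit_, age_cuts, period_cuts):
    """Entry date, every age or calendar cut inside follow-up, then exit date."""
    birthdays = [birth.replace(year=birth.year + age) for age in age_cuts]
    inside = {d for d in birthdays + list(period_cuts) if entry < d < exit_}
    return [entry] + sorted(inside) + [exit_]


# Subject 1: born 01/03/1927, entered 01/07/1966, end of follow-up 01/09/1977.
dates = cut_dates(
    birth=date(1927, 3, 1),
    entry=date(1966, 7, 1),
    exit_=date(1977, 9, 1),
    age_cuts=range(25, 60, 5),                         # 5-year age bands
    period_cuts=[date(1970, 1, 1), date(1975, 1, 1)],  # assumed calendar cuts
)
for start, stop in zip(dates[:-1], dates[1:]):
    print(f"{start:%d/%m/%Y} to {stop:%d/%m/%Y}")
```

Seven dates define six consecutive intervals, and within each interval both the age band and the calendar period are fixed.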
exposed to the chemical, we would need to compare the mortality rates in this
cohort to an external, reference cohort.
Can you think of any potential biases in such an analysis?
Interaction: Button: thought bubble:
Output (appears on page):
Many studies show that those in an occupational cohort have lower mortality than
the general population. This is because those who are very sick, and hence at higher
risk of death, often cannot work. This is referred to as the healthy worker effect.
Those who are in the reference cohort may not be truly unexposed. For example, a
reference cohort drawn from the communities surrounding a mine may also be
exposed to mine dust and may include ex-miners.
The exposed cohort and the unexposed reference population may differ substantially
in age and calendar period. For example, if an occupational cohort has been followed
for 30 years, it may have experienced changing mortality over three decades, which
would not be reflected in a more recent reference cohort.
            Whitehall cohort                              Reference cohort
Age group   Deaths   Person-years   Mortality rate   Mortality rate   Rate ratio
                     (1000s)        (per 1000py)     (per 1000py)
50-54       39       22.2599        1.752            3.487            0.50
55-59       87       19.3621        4.493            5.569            0.81
60-64       92       14.6177        6.294            8.751            0.72
65-69       62       5.9896         10.351           13.777           0.75
70-74       15       1.1421         13.134           19.946           0.66
Incorrect answer:
No, that's not right. The SMR is the observed divided by the expected number of
deaths.
SMR = (39+87+92+62+15) / (77.6+107.8+127.9+82.5+22.8)
= 295 / 418.7
= 0.70
Correct answer:
Yes, the SMR is the observed divided by the expected number of deaths.
SMR = (39+87+92+62+15) / (77.6+107.8+127.9+82.5+22.8)
= 295 / 418.7
= 0.70
So the Whitehall cohort had a 30% lower mortality than the general population,
controlling for the effect of age. SMRs are often quoted to base 100 i.e. this SMR
would be quoted as 70. Note this method of age standardisation is often referred to
as indirect standardisation.
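The SMR calculation can be reproduced from the Whitehall person-years and the reference rates. The sketch below is illustrative, with variable names chosen for this example. Expected deaths in each age group are the cohort's person-years (in thousands) multiplied by the reference rate per 1000 person-years.

```python
# Whitehall deaths and person-years (in thousands), with reference rates
# per 1000 person-years, for age groups 50-54 to 70-74.
observed = [39, 87, 92, 62, 15]
kpy = [22.2599, 19.3621, 14.6177, 5.9896, 1.1421]
reference_rate = [3.487, 5.569, 8.751, 13.777, 19.946]

# Expected deaths: what the cohort's person-years would yield at reference rates.
expected = [y * r for y, r in zip(kpy, reference_rate)]
smr = sum(observed) / sum(expected)

print("Expected deaths:", [round(e, 1) for e in expected])
print(f"SMR = {sum(observed)} / {sum(expected):.1f} = {smr:.2f}")
```

Summing the unrounded expected deaths gives 418.7, whereas adding the five values after rounding each to one decimal place gives 418.6. This explains the small discrepancy if the sum is checked by hand.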
            Whitehall cohort                              Reference cohort
Age group   Deaths   Person-years   Mortality rate   Mortality rate   Expected deaths
                     (1000s)        (per 1000py)     (per 1000py)
50-54       39       22.2599        1.752            3.487            77.6
55-59       87       19.3621        4.493            5.569            107.8
60-64       92       14.6177        6.294            8.751            127.9
65-69       62       5.9896         10.351           13.777           82.5
70-74       15       1.1421         13.134           19.946           22.8
Section 9: Summary
This is the end of AS04. When you are happy with the material covered here, please
move on to session AS05.
The main points of this session will appear below as you click on the relevant title.