As 04

Stratifying on time for cohort studies
(AS04)
EPM304 Advanced Statistical Methods in Epidemiology
Course: PG Diploma/ MSc Epidemiology
This document contains a copy of the study material located within the computer
assisted learning (CAL) session.
If you have any questions regarding this document or your course, please contact
DLsupport via DLsupport@lshtm.ac.uk.
Important note: this document does not replace the CAL material found on your
module CDROM. When studying this session, please ensure you work through the
CDROM material first. This document can then be used for revision purposes to
refer back to specific sessions.
These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of
the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale
or further copying.
London School of Hygiene & Tropical Medicine September 2013 v2.0
Section 1: Stratifying on time for cohort studies

Aims
To learn how to deal with variables that change systematically with time in cohort
studies.
Objectives
By the end of this session students will be able to:
- recognise variables that change systematically with time, such as current age and calendar period
- manipulate data to account for such time changing variables using Lexis expansion
- compare rates in different subgroups of a time changing variable
- assess confounding and effect modification by a time changing variable
- compare rates in two time changing variables, using the example of current age and calendar period
- compute and understand standardised mortality ratios (SMRs)
Section 2: Planning your study

The purpose of this session is to introduce methods of data manipulation to deal
with variables that change over time.
To follow this session, you need to be familiar with classical methods of analysing
cohort studies, together with the basics of Poisson regression.
If you have completed SME, then you should make sure that you understand the
following sessions before starting this one:
SM02  Cohort studies
SM11  Introduction to Poisson and Cox regression
Section 3: Introduction
In a cohort study a group of people is followed over a period of time to
study the occurrence of disease.
Interaction: Hyperlink: cohort
Output (appears in separate window):
A study in which subsets of a defined population can be identified that are, have
been, or may in the future be exposed, or not exposed, to a factor that is thought to
influence the probability of occurrence of a given disease or other outcome.
Cohort studies can also be called follow-up, longitudinal, or prospective studies.
'Disease' is used as a general term to refer to the outcome of interest,
whether this is disease onset, death or any other well-defined event.
3.1: Introduction
Typically, we are interested in the rate of occurrence of disease in the group and how
this rate varies between sub-groups with different patterns of exposure.
In the examples that we have looked at previously, the exposure has always been
the same through time for each individual and so the sub-groups have been fixed
through the duration of the cohort.
Can you think of any examples of exposures that are fixed through time?
Interaction: Button: Show
Output:
Gender, age at entry to the cohort and country of origin will all be fixed during the
duration of the cohort.
3.2: Introduction
We will now consider exposures that can change during the follow-up period. We
can use the same analysis to calculate the usual estimates of crude, stratum-specific
and adjusted rate ratios, but first some data manipulation is necessary.
Let's first think about the types of variable that change over time. Can you think
of any? Click the button below for examples.
Interaction: Button: Example (1):
Output:
What is a time changing variable?
Interaction: Tabs: Fixed :
Output:
Some characteristics of a person are fixed and never change, for example 'date of
birth', 'colour of eyes', 'place of birth'.
Interaction: Tabs: Random :
Output:
Other variables can change randomly.
For example, a person may be a smoker, give up smoking, and start smoking
again six months later. Such details can be difficult to obtain but it is sometimes
possible.
Interaction: Tabs: Deterministic:

Output:
The change in other variables is deterministic. For example, we know that over a
specific length of time a person will age by exactly that amount, so we can determine
the change in advance.
These are what we call time changing variables. Click below for an example.
Interaction: Button: Example:

Output:
Example
A person may enter a study at age 25 and be followed up for 20 years, and
therefore become 20 years older. The risk of certain diseases is known to be much
greater in older age groups. For this reason, it is important to account for
time-changing variables, especially age.
(back to main text on LHS):

If the follow-up is fairly short, it is unlikely that exposure variables will change, but
for longer studies some variables will change substantially during the follow up
period. It is necessary to account for such changes.
Interaction: Button: Example (2):

Output (appears on RHS):
Imagine that the exposure of interest is being aged 50 years or more, so the
unexposed group is those aged under 50 years. Consider the 4 subjects shown in the
diagram. During a 10-year follow-up period, subject 3 changes from unexposed to
exposed.
3.3: Introduction
In the diagram below, subject 3 spends the first half of the follow-up period in the
unexposed group and the second half of the follow-up period in the exposed
group. To deal with this we can split the follow-up period into 2 parts.
In the same way it is possible to split the follow up time of an individual into
many parts.
The method that splits the individual follow-up times, for example into 5-year age
intervals, is called Lexis expansion.
Interaction: Hyperlink: Lexis expansion.:
Output:
Lexis Expansion
A method that splits the follow-up time of individuals in a cohort study. Using this
manipulation of cohort data we can examine the effect of variables which change
over time.
3.4: Introduction
Example
Imagine an individual followed for 12 years from the age of 21 years to 32 years
inclusive.
If we split the follow up time for this individual into the age bands 20 to 24 years,
25 to 29 years and 30 to 34 years, the follow-up time in each interval would be:
(a) 4 years in the ageband 20 to 24 years
(b) 5 years in the ageband 25 to 29 years
(c) 3 years in the ageband 30 to 34 years
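The split in this example can be sketched in a few lines of Python. This is only an illustration: the function name `split_by_age` and the choice to label each band by its starting age are assumptions made here, not part of the course material. Follow-up "from 21 to 32 years inclusive" is taken to mean from the 21st birthday up to the 33rd birthday.

```python
def split_by_age(entry_age, exit_age, band_width=5):
    """Return {band_start: years of follow-up} for one subject."""
    pieces = {}
    # Start of the age band that contains the entry age.
    band_start = int(entry_age // band_width) * band_width
    while band_start < exit_age:
        band_end = band_start + band_width
        # Overlap of the follow-up interval with this age band.
        time_in_band = min(exit_age, band_end) - max(entry_age, band_start)
        if time_in_band > 0:
            pieces[band_start] = time_in_band
        band_start = band_end
    return pieces


# 12 years of follow-up, from the 21st birthday to the 33rd birthday.
example = split_by_age(21, 33)
print(example)  # {20: 4, 25: 5, 30: 3}
```

The keys 20, 25 and 30 stand for the age bands 20 to 24, 25 to 29 and 30 to 34 years.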
3.5: Introduction
To study the effect of changing age on cohort mortality rates, the total
observation time for each individual is split into age specific intervals. Once this is
done, the separate age-specific records for all subjects are treated as
independent and age-specific rates estimated.
Lexis expansion assumes that the true rate for the cohort is constant within each
age band. This is clearly an
approximation. In general, little is gained by using intervals shorter than 5 years
unless the rate is changing very rapidly with age. If this occurs, survival analysis
should be used. This was looked at in SM03 and SM11 and will be covered in more
depth in AS06 and AS07.
3.6: Introduction
To illustrate the basic Lexis principle we will use 3 subjects from a cohort which was
followed from time of entry into the study until 01/01/1981. The event of interest is
death. For simplicity all dates fall on the first day of the month. Click on Swap below
to view these data as a diagram.
Subject  Birth date  Entered     Age at entry  End of follow-up  Age at exit  Outcome
1        01/03/1927  01/07/1966  39.3          01/09/1977        50.5         Alive
2        01/04/1935  01/11/1961  26.6          01/12/1973        38.7         Death
3        01/11/1942  01/02/1970  27.3          01/01/1981        38.2         Alive
Interaction: Button: Swap:

Output:
The follow-up periods for the three subjects are shown in the diagram above,
plotted on an age scale (rather than calendar date).
Now, click Split below to split the follow-up time into 5-year intervals.
Interaction: Button: Split:

Output:
3.7: Introduction
Let's first focus on Subject 1. You can now see the observation time and number
of outcomes in each interval for this subject.
The total observation time for Subject 1 is 11.2 years (from 01/07/1966 to
01/09/1977). This is equal to the sum of the separate times spent in the different
age groups.
For each age interval we want the follow-up time and the outcome for each
individual. Use the drop-down menu below to show the observation time and
outcomes in each interval, for each of the subjects.
Interaction: Pulldown: Subject 1:
Output:

Use Swap to see the original table of data for the subjects.
Interaction: Button: Swap:
Output:
Subject  Birth date  Entered     Age at entry  End of follow-up  Age at exit  Outcome
1        01/03/1927  01/07/1966  39.3          01/09/1977        50.5         Alive
2        01/04/1935  01/11/1961  26.6          01/12/1973        38.7         Death
3        01/11/1942  01/02/1970  27.3          01/01/1981        38.2         Alive
3.8: Introduction
If we add up the follow-up times for each subject within each ageband, we will get
the total observation time for each ageband.
If we add up the number of events (deaths) for each subject within each ageband,
we will get the total number of events for each ageband.
What is the total observation time (in years) for the age-interval 25 to 29 years?
Interaction: Calculation: Total Y =
Incorrect answer :
Output:
No, you should have added the time each subject spent in the age-interval 25 to 29
years.
0.0 (Subject 1) + 3.4 (Subject 2) + 2.7 (Subject 3) = 6.1 years
Correct answer :
Correct
Yes, the total observation time for the age-interval 25 to 29 years is the sum of the
time each subject spent in that interval.
0.0 (Subject 1) + 3.4 (Subject 2) + 2.7 (Subject 3) = 6.1 years
Interaction: Calculation: Total D =

Incorrect answer :
Output:
No, none of the subjects died during the 25 to 29 years age interval, so the number
of events is zero.
Correct answer :
Output:
That's correct, none of the subjects died during the 25 to 29 years age interval, so
the number of events is zero.
3.9: Introduction
In the table below you can now see the values you just calculated for the 25 - 29
years interval. Click below to do this for all age groups.
Interaction: Button: Show:
Output:
Now we have Y and D for each age interval. Using these values we can now compute
the overall ageband specific rates. Click below to do this.
Output:
In this example we have used only 3 subjects as a simple illustration of the Lexis
expansion process. In practice this is done on many subjects.
Rates within age bands for three subjects
Age     Y      D   Rate
25-29   6.1    0
30-34
35-39
40-44
45-49
50-54
Total
(changes to the following when the first Show button is clicked)

Age     Y      D   Rate
25-29   6.1    0
30-34   10.0   0
35-39   7.6    1
40-44   5.0    0
45-49   5.0    0
50-54   0.5    0
Total   34.2   1
(changes to the following when the second Show button is clicked)

Age     Y      D   Rate
25-29   6.1    0   0/6.1
30-34   10.0   0   0/10.0
35-39   7.6    1   1/7.6
40-44   5.0    0   0/5.0
45-49   5.0    0   0/5.0
50-54   0.5    0   0/0.5
Total   34.2   1   1/34.2
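The whole calculation for the three subjects can be sketched in Python as follows. This is a minimal illustration rather than the method used by any particular package: the function name `lexis_records` is an assumption made here, and the ages at entry and exit are taken rounded to one decimal place, as in the data table.

```python
from collections import defaultdict

# (age at entry, age at exit, died at exit?) for subjects 1 to 3.
subjects = [(39.3, 50.5, False), (26.6, 38.7, True), (27.3, 38.2, False)]


def lexis_records(entry_age, exit_age, died, width=5):
    """Yield one (band_start, years, events) record per age band visited."""
    band_start = int(entry_age // width) * width
    while band_start < exit_age:
        band_end = band_start + width
        years = min(exit_age, band_end) - max(entry_age, band_start)
        # A death is counted only in the band in which follow-up ends.
        event = int(died and exit_age <= band_end)
        yield band_start, years, event
        band_start = band_end


Y = defaultdict(float)  # total observation time per age band
D = defaultdict(int)    # total deaths per age band
for entry_age, exit_age, died in subjects:
    for band, years, event in lexis_records(entry_age, exit_age, died):
        Y[band] += years
        D[band] += event

for band in sorted(Y):
    rate = D[band] / Y[band]
    print(f"{band}-{band + 4}: Y = {Y[band]:.1f}, D = {D[band]}, rate = {rate:.3f}")
```

Each subject contributes one record per age band, and the records are then simply added up, as if they came from different people.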
3.10: Introduction
Once a dataset has been split using Lexis expansion, age-specific rates can be
estimated from the expanded records, which use each subject's current age rather
than their age at entry.
Applying a Lexis expansion to follow-up data updates age throughout the follow-up
period. It is important to use Lexis expansion when the follow-up is long term.
Click below to apply the Lexis expansion to records for all individuals in the Whitehall
dataset.
Output(appears on RHS):
This table shows the Lexis expansion for all individuals in the Whitehall dataset.
The original 1677 individual records were split into 5243 age-specific records.
Estimated rates (per 1000) and lower/upper bounds of 95% confidence intervals
Ageband   D    Y (1000s)   Rate     Lower   Upper
40 - 49   6    3.14        1.91     0.86    4.26
50 - 54   18   4.43        4.07     2.56    6.45
55 - 59   39   6.00        6.50     4.75    8.89
60 - 64   89   6.17        14.43    11.73   17.77
65 - 69   94   4.40        21.37    17.46   26.16
70 - 74   93   2.44        38.11    31.07   46.66
75 - 79   46   0.91        50.55    37.77   67.33
80 - 89   18   0.12        150.00   95.76   241.25
Section 4: Adjusting for changing age
We now know how to update age data throughout a long-term cohort. This is
useful because age may be a potential confounder, effect modifier or a risk factor.
Once we have split the data using a Lexis expansion, we will have one line in the
dataset for each person in each age category that they were in during the duration
of the cohort. Hence, we will typically have many more lines than before we used a
Lexis expansion.
We can now use current age as if it were any of the other time fixed variables
that we have looked at before, because for each line in the dataset, the person and
age category is fixed.
Although it may seem as though we must adjust for the fact that several lines (or
observations) are actually from the same person, this is not necessary (see Clayton
and Hills for an explanation of why this is, if you are interested).
If we now analyse the effect of another exposure and do not include current age
in the model, we will get the same results as if we had not done the Lexis
expansion. This is because the summed person time and number of events in each
category remains the same.
4.1: Adjusting for changing age

As an example, consider whether current age acts as a confounder or effect
modifier on the relationship between employment grade and mortality rates in the
Whitehall cohort.
How do we assess confounding and effect modification by 'current' age?
Interaction: Button: thought bubble button:
Output (appears below and on RHS):
Now that we have used a Lexis expansion on the data we can assess for
confounding and interaction in the usual way.
1 First we stratify by age-band and look at the rate ratio for the employment
grade and mortality within each stratum.
2 Then we assess whether there is homogeneity across strata, that is, whether
the rate ratios are similar. Remember we can test this formally using a test for
unequal rate ratios (a test for effect modification).
3 If there is no interaction we can present an adjusted Mantel-Haenszel estimate
of the rate ratio. This is compared to the crude rate ratio to assess confounding.
4 If there is interaction we should present the stratum specific rate ratios.
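
As a rough illustration of step 3, the sketch below computes a crude and a Mantel-Haenszel rate ratio for person-time data, using the usual estimator: the sum over strata of d1 x y0 / y, divided by the sum of d0 x y1 / y, where y = y1 + y0. The two strata are invented numbers chosen only to make the arithmetic easy to follow; they are not the Whitehall data, and the function names are assumptions made here.

```python
# Hypothetical strata (NOT the Whitehall data):
# (deaths exposed, person-years exposed, deaths unexposed, person-years unexposed)
strata = [
    (10, 1000.0, 5, 1000.0),
    (30, 1500.0, 10, 1000.0),
]


def crude_rate_ratio(strata):
    """Rate ratio ignoring the stratifying variable."""
    d1 = sum(s[0] for s in strata)
    y1 = sum(s[1] for s in strata)
    d0 = sum(s[2] for s in strata)
    y0 = sum(s[3] for s in strata)
    return (d1 / y1) / (d0 / y0)


def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel summary rate ratio for person-time data."""
    numerator = sum(d1 * y0 / (y1 + y0) for d1, y1, d0, y0 in strata)
    denominator = sum(d0 * y1 / (y1 + y0) for d1, y1, d0, y0 in strata)
    return numerator / denominator


rr_crude = crude_rate_ratio(strata)
rr_mh = mantel_haenszel_rate_ratio(strata)
print(f"crude = {rr_crude:.2f}, Mantel-Haenszel = {rr_mh:.2f}")
```

In these invented data the rate ratio is 2 in both strata, so the adjusted estimate is 2, while the crude estimate is slightly larger because the exposed group has more person-time in the higher-rate stratum.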

The rate ratios for the effect of grade on mortality within each 5-year interval of
current age are shown below. The crude and adjusted estimates are also shown.
The test for unequal rate ratios within strata is given below the table.
Considering these results, is there evidence of confounding or effect modification
by current age?
Go on to the next page when you have thought about this.
Crude, stratum specific and adjusted rate ratios
Ageband    Rate ratio   Lower CL   Upper CL
40 - 49    1.13         0.13       9.71
50 - 54    2.35         0.88       6.26
55 - 59    1.87         0.96       3.64
60 - 64    1.91         1.25       2.91
65 - 69    1.78         1.19       2.67
70 - 74    0.98         0.65       1.48
75 - 79    1.33         0.74       2.39
80 - 89    1.94         0.64       5.89
Crude      2.30         1.89       2.80
Adjusted   1.52         1.25       1.86
Approximate test for unequal rate ratios (interaction):
χ² = 7.72, P = 0.3583

Interaction: Tabs: 1 :
Output:
The stratum-specific rate ratios are similar, with wide overlapping confidence limits.
This suggests no interaction.
This is confirmed by the test for interaction, P=0.36.
We can therefore use the Mantel-Haenszel estimate, which is adjusted for the effect
of changing age.
RRM-H = 1.52
Output:
After adjusting for current age, there is still a strong effect of grade on mortality,
with a 52% increase in mortality in the lower grades of employment compared to
the high grades.
The 95% confidence interval is narrow and does not include 1. We can be 95%
confident that the low grade workers have a higher mortality compared to high grade
workers in the population of civil servants working in Whitehall. This is after
adjusting for the effect of age and men ageing during the cohort.
Output:
To assess whether the relationship between employment grade and mortality is
confounded by current age, we compare the crude and adjusted rate ratios.
RRcrude = 2.30
RRM-H = 1.52
The adjusted rate ratio is lower than the crude rate ratio. This shows some evidence
of positive confounding. There was an overestimate of the increase in mortality rate
in the low-grade workers. This is due to the age differential in the different grades.
Section 5: Adjusting for time-changing confounders with Poisson regression
In the classical analysis we saw how to split the data (using a Lexis expansion)
for time-changing confounders such as age and then stratify by this new variable
to assess potential confounding or effect modification. We can control for such
variables in the same way in a Poisson model.
5.1: Adjusting for time-changing confounders with Poisson regression
The parameter estimates (and SEs) from a Poisson model for the effect of Grade
adjusted for Ageband (current age) are shown below.
The exposure of interest is grade of employment; we are not really interested in
the estimates for the different age bands. Ageband is included in the model to
adjust the effect of Grade for the effect of changing age.
Estimates from a Poisson model adjusted for age
            Coefficient   Standard error
Grade1      0.3454        0.1681
Ageband50   1.8433        1.0541
Ageband55   2.1612        1.0291
Ageband60   2.9162        0.0134
Ageband65   3.3047        0.0128
Ageband70   3.5169        1.0184
Ageband75   3.8104        1.0350
Ageband80   4.4880        1.1214
Constant    -8.1115       1.0006
Log likelihood = 831.8421

What does the coefficient (=0.3454) for Grade1 tell us?
Interaction: Button: thought bubble:
Output (appears in new window):
The value of the coefficient for Grade1 is the difference in log rates from low-grade
workers to high-grade workers, this is the log(rate ratio). This estimate is now
adjusted for the potential confounding effect of age.
(appears on page)
What is the adjusted rate ratio for the effect of Grade, to 2 decimal places?
Interaction: Calculation: Adjusted rate ratio =
Output:
Incorrect answer:
No, in fact the adjusted rate ratio for the effect of Grade is given by the
exponential of the coefficient for Grade1.
Adjusted rate ratio = exp(0.3454)
= 1.41
So, the rate in the low-grade workers is 1.41 times greater than that in high-grade
workers, after adjusting for the effect of changing age.
Correct answer:
Correct
The adjusted rate ratio for the effect of Grade is given by:
exp(0.3454) = 1.41.
So, the rate in the low-grade workers is 1.41 times greater than that in high-grade
workers, after adjusting for the effect of changing age.
Now, using the correct values from the table below, calculate a Wald test statistic
for the hypothesis of no effect of low-grade employment.
H0: log(rate ratio) = 0
(RR = 1)
Give your answer to 3 decimal places.

Interaction: Calculation: Wald test statistic, z =
Output:
Incorrect answer:
No, that's not correct. Remember that the Wald test statistic is given by:
coefficient / standard error
= 0.3454 / 0.1681
= 2.055.
Correct answer:
Yes, the Wald test statistic
= coefficient / standard error
= 0.3454 / 0.1681
= 2.055.
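These calculations can be checked with a short Python sketch. The coefficient and standard error are the Grade1 values from the table; the two-sided P-value and the 95% confidence interval use the standard normal approximation. The confidence interval is not given in the session itself and is included here only as an illustration.

```python
import math

coef, se = 0.3454, 0.1681  # Grade1 coefficient and its standard error

# Adjusted rate ratio: exponential of the log(rate ratio).
rate_ratio = math.exp(coef)

# Wald test statistic for H0: log(rate ratio) = 0.
z = coef / se

# Two-sided P-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

# Approximate 95% confidence interval on the rate ratio scale.
ci_lower = math.exp(coef - 1.96 * se)
ci_upper = math.exp(coef + 1.96 * se)

print(f"rate ratio = {rate_ratio:.2f}")
print(f"z = {z:.3f}, P = {p_value:.2f}")
print(f"95% CI: {ci_lower:.2f} to {ci_upper:.2f}")
```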
            Coefficient   Standard error
Grade1      0.3454        0.1681
Ageband50   1.8433        1.0541
Ageband55   2.1612        1.0291
Ageband60   2.9162        0.0134
Ageband65   3.3047        0.0128
Ageband70   3.5169        1.0184
Ageband75   3.8104        1.0350
Ageband80   4.4880        1.1214
Constant    -8.1115       1.0006

Referring z=2.055 to the normal distribution gives P = 0.04. Can you select the
correct words from the dropdowns in the paragraph below to give the true
interpretation of this result?
The Wald test, P = 0.04, suggests that the data are not compatible with the null
hypothesis, RR = 1. We can, therefore, say that there is evidence [dropdown] the
null hypothesis and there is a [dropdown] difference in the rate of CHD for low-grade
workers and high-grade workers, after adjusting for age.
Interaction: Pulldown: We can, therefore, say that there is evidence [dropdown] the
null hypothesis:
Incorrect Response to support (pop up box appears):

No, with a P-value = 0.04 we can reject the null hypothesis and say that the data in
this study (and inferences that we can draw from the study) do not support the null
hypothesis.
Correct Response against (pop up box appears):
That's correct, P=0.04 is evidence against the null hypothesis.
Incorrect Response describing (pop up box appears):
No, you cannot say there is evidence describing the null hypothesis. The P-value tells
us whether the evidence against the null hypothesis is weak or strong. In this case
P=0.04 is moderately strong evidence against the null hypothesis.
Interaction: Pulldown: and there is a [dropdown] difference in the rate of CHD for
low-grade workers and high-grade workers, after adjusting for age:
Correct Response significant (pop up box appears):
Yes, we can say there is a significant difference in the rate of CHD for low-grade
workers compared to high-grade workers after adjusting for the effect of changing
age.
Incorrect Response small (pop up box appears):
No, if there is evidence against the null hypothesis we can say there is a significant
difference in the rates of CHD for low-grade workers compared to high-grade
workers, but we cannot say how small or large this difference is from a P-value.
Incorrect Response variable (pop up box appears):
No, there is not a variable difference. From the P-value, we can conclude that there
is a significant difference in the rate of CHD between low grade and high-grade
workers.

The table below shows the estimated log rates by grade and age. The "40-"
ageband is the baseline group.
Click the highlighted cells to plot these estimates in the graph below.
Current age   High grade   Low grade
40-           -8.1115      -7.7661
50-           -6.2682      -5.9228
55-           -5.9503      -5.6049
60-           -5.1953      -4.8499
65-           -4.8068      -4.4614
70-           -4.5946      -4.2492
75-           -4.3011      -3.9557
80-           -3.6235      -3.2781
Interaction: Hotspot: -7.7661 (hotspot1)

Output: (changes table on RHS):







Note: parallel lines indicate assumption of proportional rates.

The Poisson model for Grade and Ageband is shown below. What is the assumption
we make in this model?
Output:
The assumption we make in this model is that the effect of grade is the same in all
age groups. We can call this the proportional rates assumption (the same as the
proportional odds assumption in logistic regression). This model does not account
for potential interaction.
Go on to the next page to consider a model with interaction.
Log rate = constant + Grade1 + Ageband50 + Ageband55 + Ageband60
+ Ageband65 + Ageband70 + Ageband75 + Ageband80
This is a Poisson model with a separate effect for each age group.
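To see how this model produces the table of estimated log rates, the sketch below adds the relevant coefficients for each combination of grade and age band. The coefficients are those from the fitted model; the variable names are assumptions made for illustration.

```python
constant = -8.1115  # log rate in the baseline group (high grade, age 40-)
grade1 = 0.3454     # log(rate ratio) for low grade versus high grade

# Age band coefficients; the baseline band contributes 0.
ageband = {
    "40-": 0.0,
    "50-": 1.8433,
    "55-": 2.1612,
    "60-": 2.9162,
    "65-": 3.3047,
    "70-": 3.5169,
    "75-": 3.8104,
    "80-": 4.4880,
}

# Log rate = constant + Grade1 (low grade only) + age band coefficient.
log_rates = {
    band: (constant + coef, constant + coef + grade1)
    for band, coef in ageband.items()
}

for band, (high, low) in log_rates.items():
    print(f"{band:4s} high = {high:.4f}  low = {low:.4f}  difference = {low - high:.4f}")
```

The difference between the low-grade and high-grade log rates is 0.3454 in every age band, which is why the two lines in the graph are parallel.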
Section 6: Testing for Interaction

So far, the model we have fitted assumes no interaction between Grade and
Ageband, i.e., proportional rates.
We can check whether this assumption is valid in a Poisson model, the same way we
do in a logistic model. How do we do this?
Output
(appears on page):
To check the assumption of proportional rates we fit a model with interaction
between Grade and Ageband and compare it to a model without interaction using a
likelihood ratio test. If there is a large difference between the two models then there
is significant interaction.
Interaction: Button: note:
Output (appears in new window):
When examining such lines they may not be exactly parallel. The likelihood ratio test
tests whether they are close enough to being parallel that we can produce a model
which assumes they are. Remember we should always try to produce the simplest
model possible.
(appears on page):
The tabs below show simple illustrations of no interaction and interaction.
Interaction: Tabs: Proportional :
Output:
Interaction: Tabs: Interaction 1 :

Output:
Interaction: Tabs: Interaction 2 :

Output:
6.1: Testing for Interaction

In the following models, because of the small number of events in the extreme
groups, the lowest age band has been combined with the second lowest and the
highest age band has been combined with the second highest. We have 6 age
groups in Ageband and thus there are 5 estimated parameters for ageband.
The log-likelihoods for the two models with and without interaction are:
Model with interaction between Grade and Ageband:
Log likelihood, L1 = -829.68055
Model without interaction between Grade and Ageband:
Log likelihood, L0 = -834.89366
Calculate the LRS to test for interaction, giving your answer to 2 decimal places:
Interaction: Calculation: LRS =
Output:
Incorrect answer:
No, that's not right. The likelihood ratio statistic is given by:
LRS = 2(L1 - L0)
= 2(-829.68055 - (-834.89366))
= 10.43
Correct answer:
Yes, the likelihood ratio statistic is
LRS = 2(L1 - L0)
= 2(-829.68055 - (-834.89366))
= 10.43
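The sketch below reproduces the LRS in Python and refers it to a chi-squared distribution. The degrees of freedom are taken as 5, on the assumption that the interaction adds one parameter for each of the 5 non-baseline age groups. The helper `chi2_sf` is written out here, using a standard recurrence for integer degrees of freedom, so that the example needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared distribution (integer df)."""
    if df % 2 == 0:
        tail, k = math.exp(-x / 2), 2                 # exact result for 2 df
    else:
        tail, k = math.erfc(math.sqrt(x / 2)), 1      # exact result for 1 df
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return tail


L1 = -829.68055  # log likelihood, model with interaction
L0 = -834.89366  # log likelihood, model without interaction

lrs = 2 * (L1 - L0)
df = 5  # assumed: one interaction parameter per non-baseline age group
p_value = chi2_sf(lrs, df)

print(f"LRS = {lrs:.2f} on {df} df, P = {p_value:.3f}")
```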
(back to main text)

Section 7: Adjusting for age and calendar period

Interaction: Tab 1
Think about studies which last 10 years or more; during such long periods rates
may vary.
For example, men aged 40-44 in Britain had a different mortality rate in 1940 than
had men aged 40-44 in 1970. In this situation it is better to divide events and
observation time by both age and calendar period.
Tab 2
The figure below shows three cohort subjects. The x-axis represents calendar period,
the y-axis represents age. Instead of only splitting the total follow-up time for an
individual when they change age group, it is also split when they change calendar
period. Click below to show this.
As before we can then calculate mortality rates using the combined age and calendar
period intervals.
Interaction: button: Show
Output:
7.1: Adjusting for age and calendar period

Example
Consider subject 1:
Date         Event
01/07/1966   Entry
01/03/1967   Changed age group
01/01/1970   Changed calendar period
01/03/1972   Changed age group
01/01/1975   Changed calendar period
01/03/1977   Changed age group
01/09/1977   Exit
This splits the total follow-up time for subject 1 into 6 parts, as shown on the
diagram below.
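The split for subject 1 can be sketched in Python as follows. This is an illustration under stated assumptions: age bands and calendar periods are both 5 years wide, calendar periods start on 1 January of years divisible by 5 (1970, 1975, ...), and the birth date is not 29 February. The function name `split_points` is hypothetical.

```python
from datetime import date


def split_points(birth, entry, end, age_width=5, period_width=5):
    """Return the sorted dates at which follow-up is cut, with the reason."""
    cuts = [(entry, "entry"), (end, "exit")]
    # Birthdays on which the subject moves into a new age band.
    for age in range(age_width, 150, age_width):
        birthday = birth.replace(year=birth.year + age)
        if entry < birthday < end:
            cuts.append((birthday, "changed age group"))
    # 1 January of each year that starts a new calendar period.
    first_year = (entry.year // period_width + 1) * period_width
    for year in range(first_year, end.year + 1, period_width):
        boundary = date(year, 1, 1)
        if entry < boundary < end:
            cuts.append((boundary, "changed calendar period"))
    return sorted(cuts)


# Subject 1: born 01/03/1927, entered 01/07/1966, exit 01/09/1977.
cuts = split_points(date(1927, 3, 1), date(1966, 7, 1), date(1977, 9, 1))
for when, reason in cuts:
    print(when.strftime("%d/%m/%Y"), reason)

# Consecutive cut dates define the separate pieces of follow-up.
intervals = list(zip(cuts[:-1], cuts[1:]))
print(len(intervals), "parts")
```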
7.2: Adjusting for age and calendar period

Once we have done a Lexis expansion on both age and calendar period, we will have
separate records for each individual for each combination of current age and
calendar period. We can then analyse these exposures as usual.
Note that if we do the Lexis expansion and then do not include age or calendar
period in the model, we will get the same result for other exposures as if we had not
done the Lexis expansion at all.
Similarly, if we do not include calendar period (or conversely age) in the model, we
will get the same result as if we had only done the Lexis expansion on age (or
conversely calendar period).
Section 8: Standardised Mortality Ratios

There are cohort studies conducted in populations that have all experienced an
exposure of interest. It can be interesting to compare their mortality rates to the
rates in a reference cohort.
For example, consider a cohort consisting entirely of the workforce of a factory that
manufactures a potentially hazardous chemical. Since the whole workforce has been
exposed to the chemical, we would need to compare the mortality rates in this
cohort to an external, reference cohort.
Can you think of any potential biases in such an analysis?
Output (appears on page):
Many studies show that those in an occupational cohort have lower mortality than
the general population. This is because those who are very sick, and hence at higher
risk of death, often cannot work. This is referred to as the healthy worker effect.
Those who are in the reference cohort may not be truly unexposed. For example, a
reference cohort drawn from the communities surrounding a mine may also be
exposed to mine dust and may include ex-miners.
The exposed cohort and the unexposed reference population may differ substantially
in age and calendar period. For example, if an occupational cohort has been followed
for 30 years, it may have experienced changing mortality over three decades, which
would not be reflected in a more recent reference cohort.
8.1: Standardised Mortality Ratios

Standardised Mortality Ratios (SMRs) are a way of dealing with these differences in
age and calendar period. In essence, an SMR is a stratified rate ratio between the
exposed cohort mortality rates and the reference cohort mortality rates, where the
strata are categories of age, calendar period and, possibly, sex.
The Standardised Incidence Ratio (SIR) has the same definition but is for comparing
disease incidence instead of mortality.

The SMR is calculated as the number of deaths observed in the cohort (D), divided
by the number of deaths expected from the rates in the reference cohort (E).
The expected number of events is calculated separately for each stratum (as defined
by all combinations of age, calendar period and possibly sex) and then summed to
give the total number of expected events.
The expected number of events in a stratum is the person time in that stratum from
the exposed cohort multiplied by the rate in that stratum from the reference cohort.
That is, we compare the observed deaths in the exposed cohort to the number of deaths
that would be expected if the reference cohort had the same age/calendar period/sex
distribution as the exposed cohort.
What are we assuming about the stratum-specific rate ratios, comparing the rate in
exposed individuals with the rate in unexposed individuals?
Output (appears on page):

As with all Mantel-Haenszel summary estimates, we assume that all the underlying
stratum-specific rate ratios for the effect of the exposure are the same, i.e. any
differences between the stratum-specific rate ratios are just random variation. So we
are assuming that the effect of the exposure (as measured by the rate ratio) is the
same in all combinations of age group, sex, and calendar period.

We will now calculate the SMR, controlling for age only, comparing the Whitehall
cohort during the period 1970-74 to the reference rates from England and Wales
over the same time period.
Calculate the rate ratio for age group 50-54 years to 2 decimal places.
Interaction: Calculation: rate ratio =
Output:
Incorrect answer:
No, that's not right. The rate ratio is the rate in the exposed group divided by the
rate in the unexposed group.
RR
= 1.752 / 3.487
= 0.50
Correct answer:
Yes, the rate ratio is the rate in the exposed group divided by the rate in the
unexposed group.
RR
= 1.752 / 3.487
= 0.50
(back to main text)
On the next page we will see all the rate ratios completed.
            Whitehall cohort                                              Reference cohort
Age group   Deaths   Person-years (1000s)   Mortality rate (per 1000py)   Mortality rate (per 1000py)
50-54       39       22.2599                1.752                         3.487
55-59       87       19.3621                4.493                         5.569
60-64       92       14.6177                6.294                         8.751
65-69       62       5.9896                 10.351                        13.777
70-74       15       1.1421                 13.134                        19.946

We can see below that the rate ratios vary between 0.50 and 0.81 with no obvious
pattern. We will assume that the true rate ratio is the same in each age stratum and
hence, calculate the SMR controlling for age.
Now calculate the expected number of deaths for age group 50-54 years to 1
decimal place.
Interaction: Calculation: expected deaths =
Output:
Incorrect answer:
No, that's not right. The expected number of deaths is the person-years from the
exposed cohort multiplied by the rate in the reference cohort.
Expected deaths = 22.2599 * 3.487 = 77.6
Correct answer:
Yes, the expected number of deaths is the person-years from the exposed cohort
multiplied by the rate in the reference cohort.
Expected deaths = 22.2599 * 3.487 = 77.6
(back to main text)
On the next page we will see all the expected deaths completed.
            Whitehall cohort                                  Reference cohort
Age group   Deaths   Person-years (1000s)   Mortality rate    Mortality rate    Rate ratio
                                            (per 1000py)      (per 1000py)
50-54       39       22.2599                1.752             3.487             0.50
55-59       87       19.3621                4.493             5.569             0.81
60-64       92       14.6177                6.294             8.751             0.72
65-69       62       5.9896                 10.351            13.777            0.75
70-74       15       1.1421                 13.134            19.946            0.66
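
The rate ratio column can be reproduced with a few lines of Python. This is only an illustrative sketch: the rates are copied from the table above, and the variable names are our own.

```python
# Mortality rates per 1000 person-years, copied from the table above.
age_groups = ["50-54", "55-59", "60-64", "65-69", "70-74"]
whitehall_rate = [1.752, 4.493, 6.294, 10.351, 13.134]
reference_rate = [3.487, 5.569, 8.751, 13.777, 19.946]

# Stratum-specific rate ratio = Whitehall rate / reference rate.
rate_ratios = [w / r for w, r in zip(whitehall_rate, reference_rate)]

for age, rr in zip(age_groups, rate_ratios):
    print(f"{age}: {rr:.2f}")
```

Printing to 2 decimal places gives the same values as the Rate ratio column: 0.50, 0.81, 0.72, 0.75 and 0.66.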

We can see below the observed and expected deaths in each age group.
Now calculate the SMR, controlling for age, to 2 decimal places.
Interaction: Calculation: SMR =
Output:
Incorrect answer:
No, that's not right. The SMR is the observed divided by the expected number of
deaths.
SMR = (39+87+92+62+15) / (77.6+107.8+127.9+82.5+22.8)
= 295 / 418.7
= 0.70
Correct answer:
Yes, the SMR is the observed divided by the expected number of deaths.
SMR = (39+87+92+62+15) / (77.6+107.8+127.9+82.5+22.8)
= 295 / 418.7
= 0.70
So the Whitehall cohort had a 30% lower mortality than the general population,
controlling for the effect of age. SMRs are often quoted to base 100, i.e. this SMR
would be quoted as 70. Note that this method of age standardisation is often
referred to as indirect standardisation.
            Whitehall cohort                                  Reference cohort
Age group   Deaths   Person-years (1000s)   Mortality rate    Mortality rate    Expected deaths
                                            (per 1000py)      (per 1000py)
50-54       39       22.2599                1.752             3.487             77.6
55-59       87       19.3621                4.493             5.569             107.8
60-64       92       14.6177                6.294             8.751             127.9
65-69       62       5.9896                 10.351            13.777            82.5
70-74       15       1.1421                 13.134            19.946            22.8
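
As a check on the arithmetic, the expected deaths and the SMR can be computed as follows. This is a minimal sketch using the figures from the table. The variable names are our own, and person-years are in thousands so that they match rates quoted per 1000 person-years.

```python
# Whitehall cohort, 1970-74, by current age group (figures from the table).
deaths = [39, 87, 92, 62, 15]
person_years = [22.2599, 19.3621, 14.6177, 5.9896, 1.1421]  # in thousands
reference_rate = [3.487, 5.569, 8.751, 13.777, 19.946]      # per 1000 py

# Expected deaths in each stratum = cohort person-years x reference rate.
expected = [py * rate for py, rate in zip(person_years, reference_rate)]

observed_total = sum(deaths)
expected_total = sum(expected)

# SMR = total observed deaths / total expected deaths.
smr = observed_total / expected_total

print([round(e, 1) for e in expected])
print(observed_total, round(expected_total, 1), round(smr, 2))
```

The stratum values round to the Expected deaths column above, and the totals give 295 observed against about 418.7 expected, an SMR of 0.70.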

To calculate a 95% confidence interval for an SMR, we divide and multiply by the
error factor, calculated as exp(1.96/√D), where D is the observed number of deaths.
To test the hypothesis that the SMR differs significantly from 100, we calculate the
test statistic U²/V, where U is (observed deaths − expected deaths) and V is the
expected deaths. This test statistic is then compared to the χ² distribution on one
degree of freedom.
Calculate the confidence interval for the Whitehall SMR, controlling for age, to 0
decimal places.
Confidence interval = Lower CL

Upper CL
Interaction: Calculation: Lower CL

Output:
Any incorrect answer:
No, that's not right. The error factor is exp(1.96/√295) = 1.12.
Lower CL
= 70/1.12 = 63
Correct answer:
That's right.
Interaction: Calculation: Upper CL
Output:
Any incorrect answer:
No, that's not right. The error factor is exp(1.96/√295) = 1.12.
Upper CL
= 70*1.12 = 78
Correct answer:
That's right.
Yes, the error factor is exp(1.96/√295) = 1.12.
Lower CL
= 70/1.12 = 63
Upper CL
= 70*1.12 = 78
Hence, we are 95% confident that the true SMR lies between 63 and 78.
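
The confidence interval and the test statistic can be sketched in Python as below. The observed and expected totals are those from the SMR calculation. The variable names are our own, and the p-value line relies on the standard identity that, for a chi-squared variable on one degree of freedom, P(X > x) = erfc(√(x/2)).

```python
import math

observed = 295      # D, total observed deaths
expected = 418.7    # E, total expected deaths

smr = 100 * observed / expected  # SMR quoted to base 100

# Error factor for the 95% confidence interval: exp(1.96 / sqrt(D)).
error_factor = math.exp(1.96 / math.sqrt(observed))
lower = smr / error_factor
upper = smr * error_factor

# Test of SMR = 100: U^2 / V with U = observed - expected, V = expected.
u = observed - expected
v = expected
chi2 = u ** 2 / v

# Upper-tail probability of chi-squared on 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"error factor = {error_factor:.2f}")
print(f"95% CI: {lower:.1f} to {upper:.1f}")
print(f"chi-squared = {chi2:.1f}, p = {p_value:.2g}")
```

Because the code keeps full precision, rather than rounding the SMR to 70 and the error factor to 1.12 first, the limits it prints can differ from the hand-calculated 63 and 78 by about one unit.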

In some cases we want to compare the rates across many groups. Instead of
calculating an SMR for each pair, it is easier to calculate a list of SMRs, one for each
comparison of a group with the reference.
By comparing two SMRs in the list, we get an indirect comparison between their two
groups.
These indirect comparisons are valid, providing that the true stratum-specific rate
ratios (between each group and the reference population) can be assumed to be
constant for each group being compared. If this assumption is valid, the ratio of the
corresponding SMRs provides an unbiased estimate of the rate ratio between the two
groups.
Hence, we can see that if it is appropriate to calculate a series of SMRs in the first
place, it is also appropriate to compare them. However, the comparison of SMRs
should only be used as a rough guide, and more detailed comparisons between
particular pairs should be made directly using the Mantel-Haenszel method.
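
The reasoning behind the indirect comparison can be written out briefly. The notation below is our own and is introduced only for this sketch. It assumes that, within each group g, the rate in every stratum is a constant multiple of the reference rate.

```latex
% Notation (assumed for this sketch, not taken from the session):
%   \lambda_i   reference rate in stratum i
%   Y_{gi}      person-years of group g in stratum i
%   D_g, E_g    observed and expected deaths in group g
%   \theta_g    constant rate ratio of group g versus the reference
\begin{align*}
E_g &= \sum_i Y_{gi}\,\lambda_i, \\
\mathbb{E}[D_g] &= \sum_i Y_{gi}\,\theta_g\,\lambda_i = \theta_g\,E_g
  \quad\Longrightarrow\quad
  \mathrm{SMR}_g = \frac{D_g}{E_g} \text{ estimates } \theta_g, \\
\frac{\mathrm{SMR}_A}{\mathrm{SMR}_B} &\text{ estimates } \frac{\theta_A}{\theta_B}
  = \frac{\theta_A\,\lambda_i}{\theta_B\,\lambda_i}
  \quad \text{for every stratum } i.
\end{align*}
```

If the rate ratio against the reference is not constant across strata for either group, the last line no longer holds, which is why the comparison of SMRs should be treated as a rough guide only.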
Section 9: Summary
This is the end of AS04. When you are happy with the material covered here, please
move on to session AS05.
The main points of this session will appear below as you click on the relevant title.
Recognising the importance of time changing variables

Some exposure variables change over time. When this change is deterministic
(for example, we know that individuals will age during the period of follow-up),
this can be taken into account during analysis. Note that some variables, such as
age, can be regarded as time fixed (for example, age at entry to the cohort) or
time changing (current age). Choosing which to use depends on the study
question and on the duration of the cohort.
Manipulating the data using Lexis expansion
We do this by first splitting the record for each individual by intervals of the exposure
variable, for example, by current age groups. This manipulation of the data is
called a Lexis expansion. It gives separate exposure-specific records, so that each
individual has one record for each age group that they passed through during
follow-up (taking current age as the example).
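
A minimal sketch of this splitting step for one individual is shown below. The function name `lexis_split`, the record layout and the example subject are all hypothetical and are not taken from the session. The sketch assumes follow-up is measured on the age scale and ignores any follow-up outside the supplied cut points.

```python
def lexis_split(entry_age, exit_age, died, cut_points):
    """Split one subject's follow-up into one record per current-age band.

    cut_points are the band boundaries, e.g. [50, 55, 60, 65] for the
    bands 50-54, 55-59 and 60-64. Follow-up outside them is dropped.
    """
    records = []
    for lower, upper in zip(cut_points[:-1], cut_points[1:]):
        start = max(entry_age, lower)
        end = min(exit_age, upper)
        if end <= start:
            continue  # the subject contributed no follow-up to this band
        # A death is counted only in the band in which follow-up ends.
        event = 1 if died and end == exit_age else 0
        records.append({
            "age_band": f"{lower}-{upper - 1}",
            "person_years": end - start,
            "died": event,
        })
    return records


# Hypothetical subject: enters at age 52.0 and dies at age 61.5.
for record in lexis_split(52.0, 61.5, True, [50, 55, 60, 65]):
    print(record)
```

For this made-up subject the function returns three records: 3.0 person-years in 50-54, 5.0 in 55-59, and 1.5 in 60-64, with the death attached to the last record only.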
Adjustment for time changing variables
We can then treat these multiple records from one individual as independent
records, and the usual methods of analysis can be applied, i.e. we can treat
current age as though it were a time fixed variable when assessing if it is a risk
factor, a confounder or an effect modifier.
Standardised mortality ratios (SMRs)
We have seen that in situations where an entire cohort has been exposed, we can
compare the cohort's mortality or incidence rates to those in a reference
population. In doing so, we normally need to standardise by age, calendar period
and possibly sex. Standardised Mortality Ratios (SMRs) provide an indirect
standardisation for this.
