Longitudinal Data Analysis - Note Set 13

Longitudinal Data Analysis - Note Set 13
Topics in the Remainder of the Course

Longitudinal analysis when the model for the outcome variable is
nonlinear
Binary outcomes (logistic regression)
Counts (Poisson regression)
Generalized estimating equations for marginal models
Nonlinear mixed effects analysis for subject-specific models
Inverse Probability of Missingness weighting
Hierarchical models for multilevel data
346
Topics for Today

Logistic Regression
Poisson Regression
The Generalized Linear Model
347
Review: Logistic and Poisson Regression

We begin with a review of logistic and Poisson regression for a single
response variable. This review will explain why we need a different
approach when the regression model for the response is nonlinear, that is,
when the expected response does not change linearly with changes in the
value of the covariates.
Logistic Regression:
So far, we have considered linear regression models for a continuous
response, Y , of the following form
Y = 0 + 1X1 + 2X2 + . . . + kXk +
In the univariate case, the response variable, Y, was assumed to have a
normal distribution with mean
E(Y) = 0 + 1X1 + 2Xz+ . . . + kXk
and with variance, 2.
348
The population intercept, 0, represents the mean value of the response

when all of the covariates have value equal to zero.
Regression coefficients for covariates, say 1, represent the expected
change in the mean response for a single-unit change in X1, given no
change in the values of the other covariates.
In many studies, however, we are interested in a response variable that is
dichotomous rather than continuous.
This situation calls for a regression model for a binary (or dichotomous)
response.
349
Let Y be a binary response, where

Y = 1 represents a `success';
Y = 0 represent a `failure'.
Then the mean of the binary response variable, denoted , is the
proportion of successes or the probability that the response equals 1.
That is,
= E(Y) = Pr(Y = 1) = Pr(`success')
With a binary response, we are usually interested in estimating the

Probability, , and modeling how it depends on the values of covariates.
The most common approach to this problem is logistic regression.
350
A naive strategy for modeling a binary response is to perform an ordinary

linear regression analysis using the regression model
= E(Y ) = 0 + 1X1 + 2Xz+ . . . + kXk
This is sometimes done but is usually not employed in the analysis of
subject level data since is a probability and is restricted to values
between 0 and 1.
Also, the usual assumption of homogeneity of variance would be violated
since the variance of a binary response depends on the mean, i.e.
var(Y ) = (1 - )
Instead, we can consider a logistic regression model where
ln [/(1 - )] = 0 + 1X1 + 2Xz+ . . . + kXk
351
Relation between log(/(1 )) and
352
This model accommodates the constraint that is restricted to values

between 0 and 1.
Recall that /(1 - ) is defined as the odds of success.
Therefore, modeling with a logistic function can be considered to be
analogous to a linear regression model where the mean of the continuous
response has been replaced by the logarithm of the odds of success.
Note that the relationship between and the covariates is non-linear. We
can use ML estimation to obtain estimates of the logistic regression
parameters, under the assumption that the binary responses are Bernoulli
random variables.
353
Given the logistic regression model

ln [(1 - )] = 0 + 1X1 + 2Xz+ . . . + kXk
the population intercept, 0, has interpretation as the log odds of success
when all of the covariates take on the value zero.
The population slope, say 1, has interpretation in terms of the change in
log odds of success for a single-unit change in X1 given that all of the other
covariates remain constant.
When one of the covariates is dichotomous, say X1, then 1 has a special
interpretation:
exp (1) is the odds ratio or ratio of odds of success for the two possible
levels of X1 (given that all of the other covariates remain constant).
354
Keep in mind that as:

increases
the odds of success increases
and the log odds of success increases
Similarly, as:
decreases
the odds of success decreases
and thus the log odds of success decreases
355
Longitudinal Assessment of Neurocognitive

Function After CABG
NEJM 2001; 344:395-402
The investigators studied predictors of cognitive decline five years after

surgery in 261 patients undergoing surgery. Decline in postoperative
function was defined as a drop of 1 SD or more in the scores on tests of
any one of four domains of cognitive function.
Factors predicting long-term cognitive decline were determined by
multivariable logistic regression.
The incidence of cognitive decline was 53 percent at discharge, 36 at six
weeks, and 42 percent at five years.
356
Univariable and Multivariable Predictors of Cognitive Decline and Change

in the Composite Cognitive Index at Five Years
Newman, M. F. et al. N Engl J Med 2001;344:395-402
357
Poisson Regression
In Poisson regression, the response variable is a count (e.g. number of
cases of a disease in a given period of time) and the Poisson distribution
provides the basis of likelihood-based inference.
Often the counts may be expressed as rates. That is, the count or
absolute number of events is often not satisfactory because any
comparison will depend on the sizes of the groups (or the
`time at risk') that generated the observations.
Like a proportion or probability, a rate provides a basis for direct
comparison.
In either case, Poisson regression relates the expected counts or rates to a
set of covariates.
358
The Poisson regression model has two components:

1. The distributional assumption: The response variable is a count and is
assumed to have a Poisson distribution.
That is, the probability a specific number of events, Y, occurs is
Pr(Y events) = e- Y/(Y!)
Note that is the expected count or number of events and the expected
rate is given by /t, where t is a relevant baseline measure (e.g. t might
be the number of persons or the number of person-years of observation).
359
2. The regression model:

ln( /t) = 0 + 1X1 + 2Xz+ . . . + kXk
Since ln( /t) = ln() - ln(t), the Poisson regression model can also be
written as
log linear
ln() = ln(t) + 0 + 1X1 + 2Xz+ . . . + kXk

where the `coefficient' associated with ln(t) is fixed as 1.
This adjustment term is known as an `offset'.
360
Therefore, modeling (or /t) with a log function can be considered to be

equivalent to a linear regression model where the mean of the continuous
response has been replaced by the logarithm of the expected count (or
rate).
Once again, the relationship between (or /t) and the covariates is
non-linear.
We can use ML estimation to obtain estimates of the Poisson regression
parameters, under the assumption that the responses are Poisson random
variables.
361
Given the Poisson regression model

ln(/t) = 0 + 1X1 + 2Xz+ . . . + kXk
the population intercept, 0, has interpretation as the log expected rate
when all the covariates take on the value zero.
The population slope, say 1, has interpretation in terms of the change in
log expected rate for a single-unit change in X1 given that all of the other
covariates remain constant.
When one of the covariates is dichotomous, say X1, then 1 has a special
interpretation:
exp (1) is the rate ratio for the two possible levels of X1 (given that all of
the other covariates remain constant).
362
Example: Prospective study of coronary heart disease (CHD).

The study observed 3154 men aged 40-50 for an average of 8 years and
recorded incidence of cases of CHD.
The risk factors considered include:
Smoking exposure:
Systolic BP:
Behavior Type:
0, 10, 20, 30 cigs per day;

0 (< 140), 1 ( 140);
0 (type B), 1 (type A).
A simple Poisson regression model is:

or
ln(/t) = ln(rate of CHD) = 0 + 1Smoke

ln() = ln(t) + 0 + 1Smoke
363
Person
Years
5268.2
2542.0
1140.7
614.6
4451.1
2243.5
1153.6
925.0
1366.8
497.0
238.1
146.3
1251.9
640.0
374.5
338.2
Smoking
0
10
20
30
0
10
20
30
0
10
20
30
0
10
20
30
Blood
Pressure
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
Behavior
0
0
0
0
1
1
1
1
0
0
0
0
1
1
1
1
CHD
20
16
13
3
41
24
27
17
8
9
3
7
29
21
7
12
364
For these data, the ML estimate of 1 is 0.0318. That is, the rate of CHD
increases by a factor of exp(0.0318) = 1.032 for every cigarette smoked.
Alternatively, the rate of CHD in smokers of one pack per day (20 cigs) is
estimated to be (1.032)20 = 1.88 times higher than the rate of CHD in
non-smokers.
We can include the additional risk factors in the following model:
ln ( /t) = 0 + 1 Smoke + 2 Type + 3BP
Effect
Intercept
Smoke
Type
BP
Estimate
Std. Error
-5.420
0.027
0.753
0.753
0.130
0.006
0.136
0.129
365
10
Now, the adjusted rate of CHD (controlling for blood pressure and behavior
type) increases by a factor of exp(0.027) = 1.028 for every cigarette smoked.
Thus, the adjusted rate of CHD in smokers of one pack per day (20 cigs) is
estimated to be (1.027)20 = 1.704 times higher than the rate of CHD in nonsmokers.
Finally, note that when a Poisson regression model is applied to data
consisting of very small rates (say, /t << 0.01), the rate is approximately
equal to the corresponding probability, p, of an event and
ln (rate) ln (p) ln [p/(1 - p)]
Therefore, logistic regression and Poisson regression analyses produce
similar regression coefficients when the event being studied is rare.
366
Extended Work Shifts and the Risk of

Motor Vehicle Crashes among Interns
NEJM 352:125-134 2005
In 2002-03, investigators conducted a prospective nationwide, web-based

survey of 2,737 residents in their first postgraduate year (interns) who
completed 17,003 monthly reports that provided detailed information about
work hours, work shifts of an extended duration, documented motor
vehicle crashes, near-miss incidents, and incidents involving involuntary
sleeping.
Residents were enrolled through the National Resident Matching Program.
The investigators used Poisson regression analysis adjusted for age and
sex to determine whether the mean monthly number of scheduled
extended shifts was associated with the occurrence of crashes. For each
participant, the time at risk for the Poisson regression was considered to
be the number of monthly surveys that the participant completed.
367
11
Weekly Hours That Interns Worked as a Percentage of Reported Weeks
Barger, L. K. et al. N Engl J Med 2005;352:125-134
368
Results
A total of 320 motor vehicle crashes were reported, including 133 that were
consequential; 131 of the 320 crashes occurred on the commute from
work.
Every extended shift (> 24 hrs) scheduled per month increased the
monthly rate of any motor vehicle crash by 9.1 percent (95 percent
confidence interval, 3.4 to 14.7 percent) and increased the monthly rate of
a crash on the commute from work by 16.2 percent (95 percent confidence
interval, 7.8 to 24.7 percent).
369
12

Longitudinal Data Analysis - Note Set 13

Transféré par

Informations du document

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Longitudinal Data Analysis - Note Set 13

Transféré par

Droits d'auteur :

Formats disponibles

Longitudinal Data Analysis - Note Set 13

Topics in the Remainder of the Course

Topics for Today