
Longitudinal Data Analysis - Note Set 13

Topics in the Remainder of the Course


Longitudinal analysis when the model for the outcome variable is
nonlinear
Binary outcomes (logistic regression)
Counts (Poisson regression)
Generalized estimating equations for marginal models
Nonlinear mixed effects analysis for subject-specific models
Inverse Probability of Missingness weighting
Hierarchical models for multilevel data

Topics for Today


Logistic Regression
Poisson Regression
The Generalized Linear Model


Review: Logistic and Poisson Regression


We begin with a review of logistic and Poisson regression for a single
response variable. This review will explain why we need a different
approach when the regression model for the response is nonlinear, that is,
when the expected response does not change linearly with changes in the
value of the covariates.
Logistic Regression:
So far, we have considered linear regression models for a continuous
response, Y, of the following form
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
In the univariate case, the response variable, Y, was assumed to have a
normal distribution with mean
$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
and with variance $\sigma^2$.

The population intercept, $\beta_0$, represents the mean value of the response
when all of the covariates have value equal to zero.
Regression coefficients for covariates, say $\beta_1$, represent the expected
change in the mean response for a single-unit change in $X_1$, given no
change in the values of the other covariates.
In many studies, however, we are interested in a response variable that is
dichotomous rather than continuous.
This situation calls for a regression model for a binary (or dichotomous)
response.


Let Y be a binary response, where
Y = 1 represents a `success';
Y = 0 represents a `failure'.
Then the mean of the binary response variable, denoted $\pi$, is the
proportion of successes or the probability that the response equals 1.
That is,

$\pi = E(Y) = \Pr(Y = 1) = \Pr(\text{`success'})$

With a binary response, we are usually interested in estimating the
probability, $\pi$, and modeling how it depends on the values of covariates.
The most common approach to this problem is logistic regression.


A naive strategy for modeling a binary response is to perform an ordinary
linear regression analysis using the regression model
$\pi = E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
This is sometimes done but is usually not employed in the analysis of
subject-level data, since $\pi$ is a probability and is restricted to values
between 0 and 1, whereas the right-hand side of the model is unrestricted.
Also, the usual assumption of homogeneity of variance would be violated
since the variance of a binary response depends on the mean, i.e.
$\mathrm{var}(Y) = \pi(1 - \pi)$
Instead, we can consider a logistic regression model where
$\ln[\pi/(1 - \pi)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
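As a rough illustration of why the logit link respects the 0 to 1 constraint, the following Python sketch compares a straight-line predictor with its inverse-logit transform. The coefficient values and the helper names `logit` and `inv_logit` are made up for illustration and do not come from the notes.

```python
import math

def logit(p):
    """Log odds ln[p / (1 - p)] for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a linear predictor eta back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = 0.2, 0.15

for x in (-5, 0, 5, 10):
    eta = beta0 + beta1 * x
    pi = inv_logit(eta)
    # Read directly as a probability, eta can fall outside [0, 1];
    # the inverse logit of the same predictor never does.
    # The Bernoulli variance pi * (1 - pi) changes with the mean.
    print(x, round(eta, 3), round(pi, 3), round(pi * (1 - pi), 3))
```

With these illustrative values the linear predictor is negative at x = -5 and exceeds 1 at x = 10, while the transformed value stays strictly between 0 and 1.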


[Figure: Relation between $\log[\pi/(1 - \pi)]$ and $\pi$]


This model accommodates the constraint that $\pi$ is restricted to values
between 0 and 1.
Recall that $\pi/(1 - \pi)$ is defined as the odds of success.
Therefore, modeling $\pi$ with a logistic function can be considered to be
analogous to a linear regression model where the mean of the continuous
response has been replaced by the logarithm of the odds of success.
Note that the relationship between $\pi$ and the covariates is non-linear. We
can use ML estimation to obtain estimates of the logistic regression
parameters, under the assumption that the binary responses are Bernoulli
random variables.


Given the logistic regression model
$\ln[\pi/(1 - \pi)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
the population intercept, $\beta_0$, has interpretation as the log odds of success
when all of the covariates take on the value zero.
The population slope, say $\beta_1$, has interpretation in terms of the change in
log odds of success for a single-unit change in $X_1$ given that all of the other
covariates remain constant.
When one of the covariates is dichotomous, say $X_1$, then $\beta_1$ has a special
interpretation:
$\exp(\beta_1)$ is the odds ratio or ratio of odds of success for the two possible
levels of $X_1$ (given that all of the other covariates remain constant).
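To see where the odds ratio comes from, one can write the model at $X_1 = 1$ and at $X_1 = 0$ with the other covariates held fixed and subtract. In this sketch $\pi_1$ and $\pi_0$ are labels introduced here for the success probabilities at the two levels of $X_1$:

```latex
\begin{aligned}
\ln\left[\frac{\pi_1}{1-\pi_1}\right] - \ln\left[\frac{\pi_0}{1-\pi_0}\right]
  &= (\beta_0 + \beta_1 \cdot 1 + \beta_2 X_2 + \dots + \beta_k X_k)
   - (\beta_0 + \beta_1 \cdot 0 + \beta_2 X_2 + \dots + \beta_k X_k)
   = \beta_1, \\
\frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} &= \exp(\beta_1).
\end{aligned}
```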


Keep in mind that as $\pi$ increases, the odds of success increase and the
log odds of success increase.
Similarly, as $\pi$ decreases, the odds of success decrease and thus the
log odds of success decrease.


Longitudinal Assessment of Neurocognitive Function After CABG
NEJM 2001; 344:395-402

The investigators studied predictors of cognitive decline five years after
surgery in 261 patients undergoing CABG. Decline in postoperative
function was defined as a drop of 1 SD or more in the scores on tests of
any one of four domains of cognitive function.
Factors predicting long-term cognitive decline were determined by
multivariable logistic regression.
The incidence of cognitive decline was 53 percent at discharge, 36 percent
at six weeks, and 42 percent at five years.


[Table: Univariable and Multivariable Predictors of Cognitive Decline and Change
in the Composite Cognitive Index at Five Years.
Newman, M. F. et al. N Engl J Med 2001;344:395-402]


Poisson Regression
In Poisson regression, the response variable is a count (e.g. number of
cases of a disease in a given period of time) and the Poisson distribution
provides the basis of likelihood-based inference.
Often the counts may be expressed as rates. That is, the count or
absolute number of events is often not satisfactory because any
comparison will depend on the sizes of the groups (or the
`time at risk') that generated the observations.
Like a proportion or probability, a rate provides a basis for direct
comparison.
In either case, Poisson regression relates the expected counts or rates to a
set of covariates.


The Poisson regression model has two components:
1. The distributional assumption: The response variable is a count and is
assumed to have a Poisson distribution.
That is, the probability that a specific number of events, Y, occurs is
$\Pr(Y \text{ events}) = e^{-\lambda} \lambda^{Y} / Y!$
Note that $\lambda$ is the expected count or number of events and the expected
rate is given by $\lambda/t$, where t is a relevant baseline measure (e.g. t might
be the number of persons or the number of person-years of observation).
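The distributional assumption can be checked numerically with a short Python sketch. The expected count and exposure used here are hypothetical values chosen for illustration, and `poisson_pmf` is a helper name introduced for this example.

```python
import math

def poisson_pmf(y, lam):
    """Pr(Y = y) = exp(-lam) * lam**y / y! for a Poisson count."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

# Hypothetical expected count, for illustration only.
lam = 2.5

# Probabilities for counts 0..49; the tail beyond that is negligible here.
probs = [poisson_pmf(y, lam) for y in range(50)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
print(round(total, 6), round(mean, 6))

# Expected rate when the count arises from t units of exposure.
t = 100.0  # e.g. person-years (hypothetical)
rate = lam / t
print(rate)
```

The probabilities sum to 1 and their mean recovers the expected count, which is the quantity the regression model relates to the covariates.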


2. The regression model:
$\ln(\lambda/t) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
Since $\ln(\lambda/t) = \ln(\lambda) - \ln(t)$, the Poisson regression model can also be
written in log-linear form as
$\ln(\lambda) = \ln(t) + \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
where the `coefficient' associated with $\ln(t)$ is fixed as 1.
This adjustment term is known as an `offset'.


Therefore, modeling $\lambda$ (or $\lambda/t$) with a log function can be considered to be
analogous to a linear regression model where the mean of the continuous
response has been replaced by the logarithm of the expected count (or
rate).
Once again, the relationship between $\lambda$ (or $\lambda/t$) and the covariates is
non-linear.
We can use ML estimation to obtain estimates of the Poisson regression
parameters, under the assumption that the responses are Poisson random
variables.


Given the Poisson regression model
$\ln(\lambda/t) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
the population intercept, $\beta_0$, has interpretation as the log expected rate
when all the covariates take on the value zero.
The population slope, say $\beta_1$, has interpretation in terms of the change in
log expected rate for a single-unit change in $X_1$ given that all of the other
covariates remain constant.
When one of the covariates is dichotomous, say $X_1$, then $\beta_1$ has a special
interpretation:
$\exp(\beta_1)$ is the rate ratio for the two possible levels of $X_1$ (given that all of
the other covariates remain constant).
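The rate ratio interpretation follows from the same subtraction argument used for the odds ratio. In this sketch $\lambda_1/t_1$ and $\lambda_0/t_0$ are labels introduced here for the expected rates at $X_1 = 1$ and $X_1 = 0$, with the other covariates held fixed:

```latex
\begin{aligned}
\ln\left(\frac{\lambda_1}{t_1}\right) - \ln\left(\frac{\lambda_0}{t_0}\right)
  &= (\beta_0 + \beta_1 \cdot 1 + \beta_2 X_2 + \dots + \beta_k X_k)
   - (\beta_0 + \beta_1 \cdot 0 + \beta_2 X_2 + \dots + \beta_k X_k)
   = \beta_1, \\
\frac{\lambda_1/t_1}{\lambda_0/t_0} &= \exp(\beta_1).
\end{aligned}
```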

Example: Prospective study of coronary heart disease (CHD).
The study observed 3154 men aged 40-50 for an average of 8 years and
recorded incidence of cases of CHD.
The risk factors considered include:
Smoking exposure: 0, 10, 20, 30 cigs per day;
Systolic BP: 0 (< 140), 1 ($\geq$ 140);
Behavior Type: 0 (type B), 1 (type A).

A simple Poisson regression model is:
$\ln(\lambda/t) = \ln(\text{rate of CHD}) = \beta_0 + \beta_1 \text{Smoke}$
or
$\ln(\lambda) = \ln(t) + \beta_0 + \beta_1 \text{Smoke}$


Person-Years   Smoking   Blood Pressure   Behavior   CHD
5268.2          0        0                0          20
2542.0         10        0                0          16
1140.7         20        0                0          13
 614.6         30        0                0           3
4451.1          0        0                1          41
2243.5         10        0                1          24
1153.6         20        0                1          27
 925.0         30        0                1          17
1366.8          0        1                0           8
 497.0         10        1                0           9
 238.1         20        1                0           3
 146.3         30        1                0           7
1251.9          0        1                1          29
 640.0         10        1                1          21
 374.5         20        1                1           7
 338.2         30        1                1          12

For these data, the ML estimate of $\beta_1$ is 0.0318. That is, the rate of CHD
increases by a factor of exp(0.0318) = 1.032 for each additional cigarette
smoked per day.
Alternatively, the rate of CHD in smokers of one pack per day (20 cigs) is
estimated to be $(1.032)^{20} = 1.88$ times the rate of CHD in
non-smokers.
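The notes do not say which software produced this estimate. As a minimal sketch, the simple model can be fitted by Newton-Raphson on the 16 rows of the table above, with Smoke as the only covariate and ln(person-years) as the offset. The function name `fit_poisson_rate` and the starting values are choices made for this example.

```python
import math

# (person-years, cigarettes per day, CHD cases) for the 16 rows of the table.
rows = [
    (5268.2, 0, 20), (2542.0, 10, 16), (1140.7, 20, 13), (614.6, 30, 3),
    (4451.1, 0, 41), (2243.5, 10, 24), (1153.6, 20, 27), (925.0, 30, 17),
    (1366.8, 0, 8), (497.0, 10, 9), (238.1, 20, 3), (146.3, 30, 7),
    (1251.9, 0, 29), (640.0, 10, 21), (374.5, 20, 7), (338.2, 30, 12),
]

def fit_poisson_rate(rows, iterations=25):
    """ML fit of ln(lambda) = ln(t) + b0 + b1 * x by Newton-Raphson."""
    total_y = sum(y for _, _, y in rows)
    total_t = sum(t for t, _, _ in rows)
    # Start from the overall log rate and no smoking effect.
    b0, b1 = math.log(total_y / total_t), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, x, y in rows:
            mu = t * math.exp(b0 + b1 * x)  # expected count; ln(t) is the offset
            g0 += y - mu                    # score for b0
            g1 += x * (y - mu)              # score for b1
            h00 += mu                       # information matrix entries
            h01 += x * mu
            h11 += x * x * mu
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_poisson_rate(rows)
print(round(b1, 4), round(math.exp(20 * b1), 2))
```

A fixed number of iterations keeps the sketch short; a production fit would normally stop on a convergence criterion and report standard errors from the inverse information matrix.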
We can include the additional risk factors in the following model:
$\ln(\lambda/t) = \beta_0 + \beta_1 \text{Smoke} + \beta_2 \text{Type} + \beta_3 \text{BP}$

Effect      Estimate   Std. Error
Intercept   -5.420     0.130
Smoke        0.027     0.006
Type         0.753     0.136
BP           0.753     0.129
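To show how the fitted coefficients translate into rates and expected counts, the following Python sketch plugs the estimates above into the model. The helper names `chd_rate` and `expected_cases` are introduced for this example; the coefficients are the rounded values from the table, so results are approximate.

```python
import math

# Rounded estimates copied from the table above.
coef = {"intercept": -5.420, "smoke": 0.027, "type": 0.753, "bp": 0.753}

def chd_rate(smoke, behavior_type, bp):
    """Fitted CHD rate per person-year: exp of the linear predictor."""
    eta = (coef["intercept"] + coef["smoke"] * smoke
           + coef["type"] * behavior_type + coef["bp"] * bp)
    return math.exp(eta)

def expected_cases(person_years, smoke, behavior_type, bp):
    """Fitted count: the offset ln(t) turns the rate into t * rate."""
    return person_years * chd_rate(smoke, behavior_type, bp)

# First row of the data: non-smokers, BP < 140, type B, 5268.2 person-years.
print(round(expected_cases(5268.2, 0, 0, 0), 1))

# Rate ratio for type A versus type B, other covariates held fixed.
print(round(chd_rate(0, 1, 0) / chd_rate(0, 0, 0), 3))
```

The fitted count for the first row can be compared with the 20 cases observed in that cell.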

Now, the adjusted rate of CHD (controlling for blood pressure and behavior
type) increases by a factor of exp(0.027) = 1.027 for each additional cigarette
smoked per day.
Thus, the adjusted rate of CHD in smokers of one pack per day (20 cigs) is
estimated to be $(1.027)^{20} = 1.704$ times the rate of CHD in non-smokers.
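The arithmetic can be reproduced directly from the estimate and its standard error. The sketch below also adds an approximate 95% Wald interval, which is a standard construction but an assumption here, since the notes do not report one; `rate_ratio` and `wald_interval` are names introduced for this example.

```python
import math

beta_smoke, se_smoke = 0.027, 0.006  # adjusted estimate and standard error

def rate_ratio(beta, units=1):
    """Rate ratio for a change of `units` in the covariate."""
    return math.exp(beta * units)

def wald_interval(beta, se, units=1, z=1.96):
    """Approximate 95% interval for the rate ratio (not from the notes)."""
    return (math.exp((beta - z * se) * units),
            math.exp((beta + z * se) * units))

per_cigarette = rate_ratio(beta_smoke)
per_pack = rate_ratio(beta_smoke, units=20)
low, high = wald_interval(beta_smoke, se_smoke, units=20)
print(round(per_cigarette, 3), round(per_pack, 3), round(low, 3), round(high, 3))
```

Exponentiating 20 x 0.027 directly gives about 1.72, slightly above the 1.704 obtained by first rounding the per-cigarette ratio to 1.027 and then raising it to the 20th power.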
Finally, note that when a Poisson regression model is applied to data
consisting of very small rates (say, $\lambda/t \ll 0.01$), the rate is approximately
equal to the corresponding probability, p, of an event and
$\ln(\text{rate}) \approx \ln(p) \approx \ln[p/(1 - p)]$
Therefore, logistic regression and Poisson regression analyses produce
similar regression coefficients when the event being studied is rare.
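A quick numerical check of this approximation, using a few illustrative probabilities that are not taken from the notes: the gap between the log odds and the log probability equals $-\ln(1 - p)$, which shrinks toward zero as p becomes small.

```python
import math

def logit(p):
    """Log odds ln[p / (1 - p)]."""
    return math.log(p / (1.0 - p))

def link_gap(p):
    """Difference between the logit and log links at probability p."""
    return logit(p) - math.log(p)

# Illustrative probabilities, from common to rare events.
for p in (0.2, 0.05, 0.01, 0.001):
    print(p, round(math.log(p), 4), round(logit(p), 4), round(link_gap(p), 4))
```

For common events the two scales differ noticeably, so coefficients from the two models are not interchangeable in that setting.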


Extended Work Shifts and the Risk of Motor Vehicle Crashes among Interns
NEJM 352:125-134 2005

In 2002-03, investigators conducted a prospective, nationwide, web-based
survey of 2,737 residents in their first postgraduate year (interns) who
completed 17,003 monthly reports that provided detailed information about
work hours, work shifts of an extended duration, documented motor
vehicle crashes, near-miss incidents, and incidents involving involuntary
sleeping.
Residents were enrolled through the National Resident Matching Program.
The investigators used Poisson regression analysis adjusted for age and
sex to determine whether the mean monthly number of scheduled
extended shifts was associated with the occurrence of crashes. For each
participant, the time at risk for the Poisson regression was considered to
be the number of monthly surveys that the participant completed.

[Figure: Weekly Hours That Interns Worked as a Percentage of Reported Weeks.
Barger, L. K. et al. N Engl J Med 2005;352:125-134]


Results
A total of 320 motor vehicle crashes were reported, including 133 that were
consequential; 131 of the 320 crashes occurred on the commute from
work.
Every extended shift (> 24 hrs) scheduled per month increased the
monthly rate of any motor vehicle crash by 9.1 percent (95 percent
confidence interval, 3.4 to 14.7 percent) and increased the monthly rate of
a crash on the commute from work by 16.2 percent (95 percent confidence
interval, 7.8 to 24.7 percent).
