Marie Davidian
Department of Statistics
North Carolina State University
http://www.stat.ncsu.edu/davidian
(a copy of these slides is available at this website)
Outline
1. Some examples and questions of interest
2. First steps
3. How do longitudinal data happen? A conceptualization
4. Statistical models: Subject-specific and population-averaged
5. Implementation
6. Discussion
[Figure: dental study, distance (mm) versus age (years); each child's measurements plotted with the symbol 0 or 1]
From web pages by Professor John B. Ludlow, UNC-Chapel Hill School of Dentistry
Observations:
All children have all 4 measurements at the same time points (ages) (balanced)
The individual pattern for most children follows a rough straight-line increase (with some jitter)
[Figure: distance (mm) versus age (years)]
300 children from six different cities examined annually at ages 9 to 12
On each child, respiratory status (1=infection, 0=no infection) and
maternal smoking in past year (1=yes, 0=no)
[Table: example records for three children (cities Portage, Kingston, Portage) at ages 9, 10, 11, 12, with 0/1 codes and "." entries]
Dosing recommendations
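[Figure: profiles plotted against Time (hr)]
$$C_i(t) = \frac{k_{ai} D_i}{V_i (k_{ai} - k_{ei})} \{\exp(-k_{ei} t) - \exp(-k_{ai} t)\}, \quad k_{ei} = Cl_i / V_i$$
[Diagram: dose $D_i$ enters the compartment $X_i(t)$ at rate $k_{ai}$ and leaves at rate $k_{ei}$]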
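The concentration formula above can be evaluated directly. The following is a minimal sketch, assuming the standard one-compartment form with first-order absorption and elimination; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def concentration(t, dose, ka, cl, v):
    """Concentration at time t for one subject.

    ka is the absorption rate constant; the elimination rate constant
    is ke = cl / v (clearance divided by volume).
    """
    ke = cl / v
    if math.isclose(ka, ke):
        raise ValueError("formula is undefined when ka equals ke")
    scale = ka * dose / (v * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))

# Hypothetical parameter values, not estimates from any study.
times = [0.0, 1.0, 2.0, 5.0, 10.0, 25.0]
profile = [concentration(t, dose=4.0, ka=1.5, cl=0.04, v=0.5) for t in times]
for t, c in zip(times, profile):
    print(f"t = {t:5.1f} hr, C(t) = {c:.3f}")
```

Giving each subject its own `ka`, `cl`, and `v` is what makes the model subject-specific: the same function produces a different curve for each individual.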
2. First steps
Dental study: 16 boys, 11 girls, distance measured at 8, 10, 12, 14
years of age, no missing observations
Focus: Is the pattern of dental distance over time different for boys
and girls?
Favorite ad hoc analysis of my clinician friends:
Cross-sectional analysis comparing means (boys vs. girls) at each
age 8, 10, 12, 14 (two-sample t-tests)
P-values: 0.08, 0.06, 0.01, 0.001
Conclusion? Multiple comparisons?
How to put this together to say something about the differences
in patterns and how they differ? What are the patterns, anyway?
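To make the multiple-comparisons concern concrete, here is a small sketch that applies a Bonferroni adjustment to the four P-values quoted above. Bonferroni is an illustrative choice, not something the slides prescribe, and it addresses only multiplicity; it still says nothing about how the patterns over time differ.

```python
# P-values reported above for the two-sample t-tests at ages 8, 10, 12, 14.
ages = [8, 10, 12, 14]
p_values = [0.08, 0.06, 0.01, 0.001]

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

adjusted = bonferroni(p_values)
for age, p, p_adj in zip(ages, p_values, adjusted):
    print(f"age {age}: raw p = {p}, Bonferroni-adjusted p = {p_adj:.3f}")
```

The adjustment also ignores the fact that the four tests use the same children, which is part of why a model for the whole data structure is needed.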
Problem: We're trying to force a familiar analysis to address questions it's not designed to answer!
In fact, what if the data weren't balanced?
Need to start with a formal statistical model for the situation that acknowledges the data structure...
Statistical model:
Informally: a description of the mechanisms by which data are thought to arise
More formally: a probability distribution that describes how the observations we see take on their values
In order to talk about analysis, we first need to identify an appropriate statistical model...
Introduction to Longitudinal Data
[Figure: distance (mm) versus age (years)]
$$Y_{ij} = \beta_{0G}(1 - G_i) + \beta_{0B} G_i + \beta_{1G}(1 - G_i) t_{ij} + \beta_{1B} G_i t_{ij} + \epsilon_{ij}$$
(here $G_i = 0$ for a girl and $G_i = 1$ for a boy)
$\epsilon_{ij}$ is a mean-zero deviation that accounts for the fact that the distance we observe at $t_{ij}$ for child $i$ is not exactly equal to the population mean
Population mean for girls at $t_{ij}$ $= \beta_{0G} + \beta_{1G} t_{ij}$
Population mean for boys at $t_{ij}$ $= \beta_{0B} + \beta_{1B} t_{ij}$
Question of interest, more formally: Assuming that each child has
his/her own underlying straight-line trajectory
[Figure: distance (mm) versus age (years), two panels]
[Figure: panels (a) and (b), response versus time]
Remarks:
Conceptualization:
[Figure: panels (a) and (b), response versus time]
Measurements on the same subject are correlated due to within-individual covariation
Population-averaged models
Subject-specific models
Depending on the questions in a particular situation, one may be more suitable than the other
Subject-specific model:
Conceptualization:
[Figure: panels (a) and (b), response versus time]
$e_{f,i}(t)$ and $e_{f,i}(t')$ for times $t$ and $t'$ close together might tend to be in the same direction relative to the inherent trend: within-subject (auto)correlation
More succinctly
$$\beta_{0i} = \beta_{0G}(1 - G_i) + \beta_{0B} G_i + b_{0i}, \quad \beta_{1i} = \beta_{1G}(1 - G_i) + \beta_{1B} G_i + b_{1i}$$
Summarizing:
$$\beta_{0i} = \beta_{0G}(1 - G_i) + \beta_{0B} G_i + b_{0i}, \quad \beta_{1i} = \beta_{1G}(1 - G_i) + \beta_{1B} G_i + b_{1i}$$
Remaining: Assumptions on $e_{f,ij}$, $e_{me,ij}$, $b_{0i}$, $b_{1i}$ that operationalize what we've said...
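The child-specific intercept and slope above can be illustrated by simulating data. This is only a sketch under stated assumptions: each response is taken to be the child's own line plus a single mean-zero deviation (realization and measurement error lumped together), $G_i$ is coded 1 for boys, the two random effects are drawn independently, and every numerical value is hypothetical.

```python
import random

AGES = [8, 10, 12, 14]

def simulate_child(is_boy, beta, sd_b0, sd_b1, sd_e, rng):
    """Draw one child's distances from the subject-specific model.

    beta holds (beta_0G, beta_1G, beta_0B, beta_1B). The random effects
    b_0i and b_1i are drawn independently (a simplifying assumption).
    """
    g = 1 if is_boy else 0
    b0 = rng.gauss(0.0, sd_b0)
    b1 = rng.gauss(0.0, sd_b1)
    beta_0i = beta[0] * (1 - g) + beta[2] * g + b0  # child-specific intercept
    beta_1i = beta[1] * (1 - g) + beta[3] * g + b1  # child-specific slope
    # Lumped within-child deviation e_ij added at each age.
    return [beta_0i + beta_1i * t + rng.gauss(0.0, sd_e) for t in AGES]

rng = random.Random(1)
beta = (17.0, 0.5, 16.0, 0.8)  # hypothetical values, not estimates
girl = simulate_child(False, beta, sd_b0=1.5, sd_b1=0.1, sd_e=1.0, rng=rng)
boy = simulate_child(True, beta, sd_b0=1.5, sd_b1=0.1, sd_e=1.0, rng=rng)
print("girl:", [round(y, 1) for y in girl])
print("boy: ", [round(y, 1) for y in boy])
```

Setting all three standard deviations to zero returns the population mean line for the chosen group, which is a quick way to check the coding of $G_i$.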
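$$Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$$
$$\beta = (\beta_{0G}, \beta_{1G}, \beta_{0B}, \beta_{1B})^T, \quad b_i = (b_{0i}, b_{1i})^T$$
$$X_i = \begin{pmatrix} (1 - G_i) & (1 - G_i) t_{i1} & G_i & G_i t_{i1} \\ \vdots & \vdots & \vdots & \vdots \\ (1 - G_i) & (1 - G_i) t_{im_i} & G_i & G_i t_{im_i} \end{pmatrix}, \quad Z_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{im_i} \end{pmatrix}$$
so that
$$Y_i | G_i \sim N(X_i \beta, V_i)$$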
Features:
Hi =
2
Depends on
2 3
1 2
$$V_i = Z_i D Z_i^T + \sigma_f^2 H_i + \sigma_e^2 I_i$$
If $H_i = I_i$, then $V_i = Z_i D Z_i^T + \underbrace{(\sigma_f^2 + \sigma_e^2)}_{\sigma^2} I_i$
A reasonable model: $V_i = Z_i D Z_i^T + \sigma^2 I_i$
If $\sigma_e^2 = 0$, then $V_i = Z_i D Z_i^T + \sigma_f^2 H_i = Z_i D Z_i^T + \sigma^2 H_i$
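The covariance structure $V_i = Z_i D Z_i^T + \sigma^2 I_i$ can be computed by hand for the dental ages. The sketch below assumes $Z_i$ has rows $(1, t_{ij})$; the entries of `D` and the value of `sigma2` are hypothetical and serve only to show how the among-subject and within-subject parts combine.

```python
def covariance_matrix(times, D, sigma2):
    """Return V = Z D Z^T + sigma2 * I for Z with rows (1, t_j)."""
    Z = [(1.0, float(t)) for t in times]
    V = []
    for j, zj in enumerate(Z):
        row = []
        for k, zk in enumerate(Z):
            # (Z D Z^T)[j][k] = sum over a, b of zj[a] * D[a][b] * zk[b]
            among = sum(zj[a] * D[a][b] * zk[b]
                        for a in range(2) for b in range(2))
            within = sigma2 if j == k else 0.0
            row.append(among + within)
        V.append(row)
    return V

# Hypothetical variance components, chosen only for illustration.
D = [[4.0, -0.2], [-0.2, 0.03]]
V = covariance_matrix([8, 10, 12, 14], D, sigma2=1.5)
for row in V:
    print([round(v, 2) for v in row])
```

Even though the within-subject part is diagonal, the off-diagonal entries of `V` are nonzero: the shared random intercept and slope alone induce correlation among a child's measurements.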
Is typical (mean) slope for girls different from that for boys?
Test $\beta_{1G} = \beta_{1B}$
Notice: $V_i = Z_i D Z_i^T + \sigma^2 I_i$
$Y_{i1}, \ldots, Y_{im_i}$ are always correlated because they all share the same inherent trajectory (i.e., $b_{0i}$, $b_{1i}$)!
Population-averaged model:
Questions are about how population means are related over time
From the previous slide, this means pick a working model that will account for among-subject heterogeneity at the least!
Conceptualization:
[Figure: panels (a) and (b), response versus time]
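$$\beta = (\beta_{0G}, \beta_{1G}, \beta_{0B}, \beta_{1B})^T, \quad X_i = \begin{pmatrix} (1 - G_i) & (1 - G_i) t_{i1} & G_i & G_i t_{i1} \\ \vdots & \vdots & \vdots & \vdots \\ (1 - G_i) & (1 - G_i) t_{im_i} & G_i & G_i t_{im_i} \end{pmatrix}$$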
Within-subject autocorrelation
series
i = 2
Result: The models for the population average are of the same form!
Thus $\beta$ in the subject-specific model and $\beta$ in the population-averaged model describe the same thing, so are really the same...
...and we can interpret them either way, e.g., "typical slope" or "slope of the population average profile"!
In this linear model, the distinction between subject-specific and population-averaged ends up not mattering, so choose the interpretation you like best!
Difference: How $\mathrm{var}(Y_i | G_i)$ is represented
$$\log \frac{P(Y_{ij} = 1 | x_i, b_i)}{1 - P(Y_{ij} = 1 | x_i, b_i)} = \beta_{0i} + \beta_{1i} X_{ij} = \beta_0 + b_i + \beta_1 X_{ij}$$
$$\beta_{0i} = \beta_0 + b_i, \quad \beta_{1i} = \beta_1, \quad b_i \sim N(0, D)$$
Subject-specific:
$$\log \frac{P(Y_{ij} = 1 | x_i, b_i)}{1 - P(Y_{ij} = 1 | x_i, b_i)} = \beta_{0i} + \beta_{1i} X_{ij} = \beta_0 + b_i + \beta_1 X_{ij}$$
Population-averaged:
$$\log \frac{P(Y_{ij} = 1 | x_i)}{1 - P(Y_{ij} = 1 | x_i)} = \beta_0 + \beta_1 X_{ij}$$
$P(Y_{ij} = 1 | x_i) = E(Y_{ij} | x_i)$ is the probability of respiratory infection at age $j$ in the population of children whose mothers' overall smoking is $x_i$
Population-averaged: $P(Y_{ij} = 1 | x_i) = E(Y_{ij} | x_i) = \frac{\exp(\beta_0 + \beta_1 X_{ij})}{1 + \exp(\beta_0 + \beta_1 X_{ij})}$
A direct model for the average over all children in the population
Subject-specific: $P(Y_{ij} = 1 | x_i, b_i) = E(Y_{ij} | x_i, b_i) = \frac{\exp(\beta_0 + b_i + \beta_1 X_{ij})}{1 + \exp(\beta_0 + b_i + \beta_1 X_{ij})}$
Averaging the subject-specific model over the population gives
$$P(Y_{ij} = 1 | x_i) = \int \frac{\exp(\beta_0 + b_i + \beta_1 X_{ij})}{1 + \exp(\beta_0 + b_i + \beta_1 X_{ij})} \, \frac{1}{\sqrt{2 \pi D}} \exp\left(-\frac{b_i^2}{2D}\right) db_i$$
This integral (over the $N(0, D)$ density) is a mess that does not have the same form as the population-averaged model above!
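The claim that the integral does not reduce to the population-averaged logistic form can be checked numerically. The sketch below approximates the integral with a plain trapezoid rule (one convenient choice among many) and compares it with the logistic curve evaluated at $b_i = 0$; the parameter values are hypothetical.

```python
import math

def expit(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))

def marginal_probability(beta0, beta1, x, D, n_grid=4001, width=10.0):
    """Approximate the integral of expit(beta0 + b + beta1 * x) against the
    N(0, D) density, using the trapezoid rule on [-width*sd, width*sd]."""
    sd = math.sqrt(D)
    lo, hi = -width * sd, width * sd
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * h
        density = math.exp(-b * b / (2.0 * D)) / math.sqrt(2.0 * math.pi * D)
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * expit(beta0 + b + beta1 * x) * density
    return total * h

# Hypothetical values, chosen only for illustration.
beta0, beta1, D = -2.0, 0.5, 4.0
for x in (0, 1):
    averaged = marginal_probability(beta0, beta1, x, D)
    plugged_in = expit(beta0 + beta1 * x)  # logistic curve at b_i = 0
    print(f"x = {x}: averaged = {averaged:.3f}, at b_i = 0: {plugged_in:.3f}")
```

With these values the averaged probabilities differ from the plug-in ones, and the difference in log odds between `x = 1` and `x = 0` on the averaged scale is smaller than `beta1`. The same symbols therefore mean different things in the two nonlinear models.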
$$b_i = (b_{ka,i}, b_{Cl,i}, b_{V,i})^T, \quad b_i \sim N(0, D)$$
5. Implementation
$$\sum_{i=1}^n D_i^T(x_i, \beta) \hat{\Sigma}_i^{-1} \{Y_i - \mu_i(x_i, \beta)\} = 0$$
E.g., for Six Cities (binary response), $\mu_i(x_i, \beta)$ has $j$th element
$$\mu_{ij} = \frac{\exp(\beta_0 + \beta_1 X_{ij})}{1 + \exp(\beta_0 + \beta_1 X_{ij})}$$
SAS proc genmod, R gee( )
Maximize in $\beta$, $D$
$$\prod_{i=1}^n p(y_i | x_i) = \prod_{i=1}^n \int p(y_i | x_i, b_i) \, n(b_i; 0, D) \, db_i, \quad n(b; 0, D) \text{ is the } N(0, D) \text{ density}$$
$p(y_i | x_i, b_i)$ is the assumed density of $Y_i$ given $(x_i, b_i)$
$$p(y_i | x_i, b_i) = \prod_{j=1}^{m_i} \mu_{ij}^{y_{ij}} (1 - \mu_{ij})^{1 - y_{ij}}, \quad \text{where now } \mu_{ij} = \frac{\exp(\beta_0 + b_i + \beta_1 X_{ij})}{1 + \exp(\beta_0 + b_i + \beta_1 X_{ij})}$$
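The likelihood that is maximized for the subject-specific model can be sketched for binary responses as follows. This is an illustration only: the integral over $b_i$ is approximated by a simple trapezoid rule rather than the methods real software uses, the function names are hypothetical, and the records in `data` are made up.

```python
import math

def conditional_density(y, x, beta0, beta1, b):
    """p(y_i | x_i, b_i): product of Bernoulli terms over the child's visits."""
    value = 1.0
    for y_ij, x_ij in zip(y, x):
        p = 1.0 / (1.0 + math.exp(-(beta0 + b + beta1 * x_ij)))
        value *= p if y_ij == 1 else 1.0 - p
    return value

def marginal_density(y, x, beta0, beta1, D, n_grid=2001, width=8.0):
    """p(y_i | x_i): integrate the conditional density against N(0, D)."""
    sd = math.sqrt(D)
    h = 2.0 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = -width * sd + k * h
        normal = math.exp(-b * b / (2.0 * D)) / math.sqrt(2.0 * math.pi * D)
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * conditional_density(y, x, beta0, beta1, b) * normal
    return total * h

def log_likelihood(data, beta0, beta1, D):
    """Sum of log p(y_i | x_i) over children; this is what gets maximized."""
    return sum(math.log(marginal_density(y, x, beta0, beta1, D))
               for y, x in data)

# Made-up records for two children: (responses, smoking indicators).
data = [([1, 0, 0, 0], [0, 0, 0, 0]), ([0, 1, 1, 0], [1, 1, 1, 1])]
value = log_likelihood(data, beta0=-1.0, beta1=0.4, D=2.0)
print(f"log-likelihood at the trial values: {value:.4f}")
```

An optimizer would then search over `beta0`, `beta1`, and `D` for the values that make this quantity largest.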
SAS proc mixed: Linear mixed effects model $Y_i = X_i \beta + Z_i b_i + e_i$
Basic syntax:
Thus, the user must be clear about exactly which model s/he wishes to fit
For example...
$$Y_{ij} = \beta_{0G}(1 - G_i) + \beta_{0B} G_i + \beta_{1G}(1 - G_i) t_{ij} + \beta_{1B} G_i t_{ij} + b_{0i} + b_{1i} t_{ij} + e_{ij}$$
$$V_i = Z_i D Z_i^T + \sigma^2 I_i$$
Because the within-subject part of $V_i$ is $\sigma^2 I_i$, a repeated statement is not required, but we show what it would be if we chose to include it
6. Discussion
What we didn't talk about: Lots!
More advanced modeling considerations
Before one can analyze longitudinal data, one must understand the
models and their interpretation
How to select the best model and diagnose how well a model fits
Details of implementation
Fitzmaurice, G.M., Laird, N.M., and Ware, J.H. (2004) Applied Longitudinal
Analysis, Wiley.
http://www.stat.ncsu.edu/davidian
(including lots of examples of using SAS and R under the ST 732 and
ST 762 course web pages!)