Académique Documents
Professionnel Documents
Culture Documents
December 2006
DOI: 10.1111/j.1541-0420.2006.00570.x
email: wang@wald.ucdavis.edu
Summary. The maximum likelihood approach to jointly model the survival time and its longitudinal
covariates has been successful to model both processes in longitudinal studies. Random eects in the longitudinal process are often used to model the survival times through a proportional hazards model, and
this invokes an EM algorithm to search for the maximum likelihood estimates (MLEs). Several intriguing
issues are examined here, including the robustness of the MLEs against departure from the normal random
eects assumption, and diculties with the prole likelihood approach to provide reliable estimates for the
standard error of the MLEs. We provide insights into the robustness property and suggest to overcome the
diculty of reliable estimates for the standard errors by using bootstrap procedures. Numerical studies and
data analysis illustrate our points.
Key words: Joint modeling; Missing information principle; Nonparametric maximum likelihood; Posterior
density; Prole likelihood.
1. Introduction
It has become increasingly common in survival studies to
record the values of key longitudinal covariates until the occurrence of survival time (or event time) of a subject. This
leads to informative missing/dropout of the longitudinal data,
which also complicates the survival analysis. Furthermore, the
longitudinal covariates may involve measurement error. All
these diculties can be circumvented by including random
eects in the longitudinal covariates, for example, through a
linear mixed eects model, and modeling the longitudinal and
survival components jointly rather than separately. Such an
approach is termed joint modeling and we refer the readers
to the insightful surveys of Tsiatis and Davidian (2004) and
Yu et al. (2004) and the references therein. Numerical studies
suggest that the joint maximum likelihood (ML) method of
Wulfsohn and Tsiatis (1997), hereafter abbreviated as WT, is
among the most satisfactory approaches to combine information. We focus on this approach in this article and address
several intriguing issues.
The approach described in WT is semiparametric in that
no parametric assumptions are imposed on the baseline hazard function in the Cox model (Cox, 1972), while the random
eects in the longitudinal component are assumed to be normally distributed. An attractive feature of this approach, as
conrmed in simulations, is its robustness against departure
from the normal random eects assumption. It is in fact as efcient as a semiparametric random eects model proposed by
Song, Davidian, and Tsiatis (2002). These authors called for
further investigation of this intriguing robustness feature. We
provide a theoretical explanation in Section 2 and demonstrate it numerically in Section 4. A second nding of this
C
j = 1, . . . , mi .
(1)
1037
1038
(2)
(3)
n
i=1
n
i 1(Vi =t)
(4)
j=1
Strictly speaking, this is not an estimate as it involves functions in the E-step, Ej , which are taken with respect to the
posterior density involving 0 (t) itself. Specically, the posterior density is
h(bi | Vi , i , Wi , ti ; )
=
g bi | Wi , ti ; , e2 f (Vi , i | bi , , 0 )
g
bi | Wi , ti ; , e2
. (5)
f (Vi , i | bi , , 0 ) dbi
i
i=1
n
log f (Vi , i | bi , , 0 ).
i=1
SIP =
Ei {S c (; 0 (t), bi )}
i=1
n
i=1
n
j=1
n
. (6)
Ej [exp{Xj (Vi )}]1(Vj Vi )
j=1
n
Ei
i=1
c
S (; 0 , bi )
Ii(,) =
n
n
Ei [S c (; 0 , bi )]
= Ei
j=1
i=1
1039
c
S (; 0 , bi )
Ei S c (; 0 , bi )S h (; 0 , bi ) ,
j=1
n
2
n
.
j=1
(7)
The above iterative procedure is implemented until a convergence criterion is met. We next examine what has been
achieved by the EM algorithm at this point.
3.2 Fisher Information
The iterative layout of the EM algorithm is designed to
achieve the MLEs and hence the consistency of parameters
(, , 2e ) and the cumulative baseline hazard function. There
is no complication on this front. However, the working formula
IW
in (7) at the last step of the EM algorithm has been suggested in the literature to provide the precision estimate for
the standard deviation (standard error) of the -estimator
via (Iw )1/2 . This is invalid. It was derived by taking the
partial derivative of the implicit prole score in equation (6)
with respect to , by treating the conditional expectation
(8)
with
S h (; 0 , bi ) =
log h(bi | Vi , i , Wi , ti ; )
= S c (; 0 , bi ) Ei S c (; 0 , bi )
(9)
n
Vari S c (; 0 , bi ) .
(10)
i=1
The implication of this relation is that under the joint modeling with random eects structure on the covariate Xi , some
information is lost and that this loss is unrecoverable. Therefore, I W
is not the true Hessian, and would be too large when
compared with the true Hessian,
(10). The actual
n I , in
amount of information loss,
Vari {S c (; 0 , bi )} in (10),
i=1
1040
Table 1
Simulated results for case (1) when the longitudinal data have normal random eects
SD for
1/2
Mean of (IW
)
Mean of
Divergence
( 14 22 , e2 )
(0.0025, 0.1)
(4 22 , 2e )
(0.04, 0.1)
( 22 , 2e )
(0.01, 0.1)
( 22 , 4 2e )
(0.01, 0.4)
(22 , 14 e2 )
(0.01, 0.025)
( 22 , 0 2e )
(0.01, 0)
0.132
0.118
1.008
0%
0.098
0.085
0.998
0%
0.112
0.104
1.004
0%
0.169
0.103
1.007
4%
0.108
0.103
0.984
0%
0.103
0.102
0.987
0%
Table 2
Simulated results for case (2) with nonsparse longitudinal data and nonnormal random eects
SD for
1/2
Mean of (IW
)
Mean of
Divergence
( 14 22 , e2 )
(0.003, 0.6)
(4 22 , 2e )
(0.048, 0.6)
( 22 , 2e )
(0.012, 0.6)
( 22 , 4 2e )
(0.012, 2.4)
(22 , 14 e2 )
(0.012, 0.15)
( 22 , 0 2e )
(0.012, 0)
0.130
0.113
0.977
0%
0.111
0.090
0.981
1%
0.119
0.100
0.995
0%
0.196
0.098
0.987
3%
0.114
0.103
0.992
0%
0.103
0.103
1.002
0%
1041
Table 3
Simulated results for case (3) with sparse longitudinal data and nonnormal random eects
Simulated value
Empirical target
Mean
SD
11
12
22
2e
1
1
1.3498
0.1647
4.173
4.5032
4.4172
0.1573
0.0103
0.0913
0.2366
0.0169
4.96
4.7950
3.4356
0.4413
0.0456
0.0175
0.0130
0.0246
0.012
0.0046
0.0067
0.0015
0.15
0.1533
3.2903
0.3574
1042
Table 4
Analysis of lifetime and log-fecundity of medies with lifetime reproduction less than 300 eggs. The bootstrap
1/2
SD and the working SE, (IW
, are reported in the last two rows.
)
EM estimate
Bootstrap mean
Bootstrap SD
1/2
(IW
)
11
12
22
2e
0.5597
0.5676
0.0921
0.0801
0.6950
0.6932
0.0736
0.0663
0.0542
0.0541
0.0156
0.0141
0.7656
0.7621
0.1056
0.0988
0.1123
0.1121
0.0207
0.0176
0.0196
0.0198
0.0043
0.0039
0.9883
0.9858
0.0606
0.0495
Table 5
Analysis of lifetime and log-fecundity of medies with lifetime reproduction between 300 and 749 eggs. The
1/2
bootstrap SD and the working SE, (IW
, are reported in the last two rows.
)
EM estimate
Bootstrap mean
Bootstrap SD
1/2
(IW
)
11
12
22
2e
0.1997
0.2036
0.0836
0.0672
1.7618
1.7659
0.0800
0.0576
0.1730
0.1739
0.0145
0.0120
1.0540
1.0487
0.0784
0.0572
0.1734
0.1766
0.0151
0.0112
0.0301
0.0309
0.0033
0.0023
1.4344
1.4333
0.0356
0.0282
Acknowledgements
The research of Jane-Ling Wang was supported in part by
the National Science Foundation and National Institutes of
Health. The authors greatly appreciate the suggestions from
a referee and an associate editor.
References
Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the
American Statistical Association 88, 925.
Breslow, N. E. and Lin, X. (1995). Bias correction in generalized linear mixed models with a single component
dispersion. Biometrika 82, 8191.
Carey, J. R., Liedo, P., M
uller, H. G., Wang, J. L., and Chiou,
J. M. (1998). Relationship of age patterns of fecundity
to mortality, longevity, and lifetime reproduction in a
large cohort of Mediterranean fruit y females. Journal
of GerontologyBiological Sciences 53, 245251.
Cox, D. R. (1972). Regression models and life tables (with
discussion). Journal of the Royal Statistical Society, Series
B 34, 187220.
Fan, J. and Wong, W. H. (2000). Comment on On prole
likelihood by Murphy and van der Vaart. Journal of the
American Statistical Association 95, 468471.
Henderson, R., Diggle, P., and Doboson, A. (2000). Joint modeling of longitudinal measurements and event time data.
Biostatistics 4, 465480.
1043
Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of innitely
many incidental parameters. Annals of Mathematical
Statistics 27, 887906.
Louis, T. A. (1982). Finding the observed Fisher information
when using the EM algorithm. Journal of the Royal Statistical Society, Series B 44, 226233.
Murphy, S. and van der Vaart, A. W. (2000). On prole likelihood (with discussion). Journal of the American Statistical Association 95, 449465.
Orchard, T. and Woodbury, M. A. (1972). A missing information principle: Theory and applications. In Proceedings of
the 6th Berkeley Symposium on Mathematical Statistics
and Probability, Volume 1, p. 697715. Berkeley: University of California Press.
Pateeld, W. M. (1977). On the maximized likelihood function. Sankhya, Series B 39, 9296.
Solomon, P. J. and Cox, D. R. (1992). Nonlinear component
of variance models. Biometrika 79, 111.
Song, X., Davidian, M., and Tsiatis, A. A. (2002). A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics 58, 742753.
Tierney, L. and Kadane, J. B. (1986). Accurate approximation for posterior moments and marginal densities. Journal of the American Statistical Association 81, 8286.
Tsiatis, A. A. and Davidian, M. (2004). Joint modelling of
longitudinal and time-to-event data: An overview. Statistica Sinica 14, 809834.
Wulfsohn, M. S. and Tsiatis, A. A. (1997). A joint model
for survival and longitudinal data measured with error.
Biometrics 53, 330339.
Yu, M., Law, N. J., Taylor, J. M. G., and Sandler, H. M.
(2004). Joint longitudinal-survival-cure models and their
application to prostate cancer. Statistica Sinica 14, 835
862.
Received July 2004. Revised December 2005.
Accepted January 2006.