
Analysis of longitudinal data - an application to chronic angle closure glaucoma
Pallavi Basu
Abhishek Pal Majumder
Anirban Basak
Priyam Biswas
May 29, 2007

Abstract
We examine longitudinal data on visual field score and IOP from patients with chronic angle closure glaucoma. A linear regression technique is used to determine the relationship between field score and IOP. Serious concerns can be raised about the normality assumption, so a Box-Cox transformation is applied. We also examine the assumption that each subfield is equally affected by glaucoma; a resampling technique is used to estimate the distribution of the test statistic. Predicting progression was not feasible due to the shortage of data.

Contents
1 Outlining the situation and framing objectives 4
1.1 Explaining the variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Inclusion criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Categorization by glaucoma stage(by AGIS system) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Brief description of methods of analysis 4

3 Using this dataset 5


3.1 Dealing with missing data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 One assumption that can’t be ignored here . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.3 Handling of visual acuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

4 Examining relationship between IOP and Visual field score 5


4.1 Selection of response and Explanatory variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.2 Independence of left and right eye . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.2.1 Preliminary Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.2.2 Formulation of hypothesis and testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4.2.3 Evaluation of cut-off . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.2.5 Interpretation of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.3 Choice of model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.4 Selection of structure of V0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.4.2 The exponential correlation model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.4.3 Justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.5 Method of analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.5.1 Restricted maximum likelihood estimation(REML) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.5.2 Box-Cox transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.5.3 Box-Cox transformation and REML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.6.1 Estimates of the parameters of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.6.2 Model adequacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.7 Hypothesis testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.8 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.9 Interpretation of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.10 Testing between non-NONE categories and interpretation of results . . . . . . . . . . . . . . . . . . . . 10
4.11 An interesting observation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

5 To evaluate characteristic visual field defect 11


5.1 Defining baseline field score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3 Category : MILD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3.1 Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3.2 Testing procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3.3 Evaluation of cut-off . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.3.5 Interpretation of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.4 Category : MODERATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.4.1 Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.4.2 Testing procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.4.3 Evaluation of cut-off . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.4.5 Interpretation of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

6 To evaluate Progression of visual field damage 14


6.1 Definition of Progression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.2 Objectives and problem faced . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.3 Future scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

7 Dealing with missing data 15


7.1 Dropouts and intermittent missing values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.2 Dealing with intermittent missing values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.3 Methodology for dropouts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

8 References 16

9 Acknowledgements 17

List of Figures
1 Scatter plot of left and right IOP at different time points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Scatter plot of residual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3 Normal probability plot of residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4 Empirical cdf for T1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5 Empirical cdf for T2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
6 Empirical cdf for T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1 Outlining the situation and framing objectives
Ninety patients, each having chronic angle closure glaucoma in one or both eyes, were examined at 4 different time points within a time span of two years. The purpose of this project is to resolve issues that are of help to medical experts.

1.1 Explaining the variables


1. Age of the patient at the first time point of visit
2. Gender
3. Visual acuity
4. Intraocular pressure(IOP)
5. Field score (0-20)¹
• Nasal (0-2)
• Superior hemifield (0-9)
• Inferior hemifield (0-9)
6. Kind of treatment provided up to that time point
• Drop
• Trab
• PI
• IOL
• Needling

1.2 Inclusion criterion


• New and follow up cases of chronic angle closure glaucoma with or without treatment
• Absence of any other major eye disease or any other kind of glaucoma
• Age group 30-70

1.3 Categorization by glaucoma stage(by AGIS system)


• None(score 0)
• Mild(score 1-5)
• Moderate(score 6-11)
• Severe(score 12-17)
• End-stage(score 18-20)

1.4 Objectives
1. To evaluate the characteristic visual field defect.
2. To assess the relationship between IOP and visual field damage.
3. To evaluate Progression of visual field damage.

2 Brief description of methods of analysis


• Under the null hypothesis it is expected that all the subfields (Nasal, Superior hemifield and Inferior hemifield) have the same effect. A suitable test statistic is used, and its distribution is estimated using a large number of resamples of equal size drawn from the original data². A 100(1 − 0.05)% confidence interval is obtained, and the null is rejected or not rejected depending on where the value under the null lies.
• A linear regression model taking IOP as the response and all other variables as explanatory is used. As the data of a particular individual are correlated over time, an autoregressive model of order 1 is selected. Treating the dataset as longitudinal data, the linear regression model is fitted; the usual methods of analysis follow thereafter.
• Due to the very small amount of data, a proper analysis of progression is not feasible; many more follow-ups of the same dataset are required.
¹ The AGIS scoring system is used throughout this work.
² Known in the literature as the bootstrap.

3 Using this dataset
3.1 Dealing with missing data
Missing data will be categorised into dropouts and other (intermittent) missing values. A separate section³ deals with this problem. However, not much missing data, especially of the dropout variety, is present in this particular dataset.

3.2 One assumption that can’t be ignored here


Due to the unavailability of data, equally spaced time intervals between subsequent visits are assumed for all patients. This may seem a crude assumption, given that a medical expert asks a patient to come for the next visit after an interval suited to that particular patient; from this viewpoint, the assumption is only partially justified.

3.3 Handling of visual acuity


This variable has a special kind of data form. A chart⁴ is used to measure visual acuity. If a person has visual acuity 20/40, then at 20 feet from the chart that person can read letters that a person with 20/20 vision could read from 40 feet away. Since a linear model is assumed, the visual acuity score is converted to a fraction (20/40 ≡ 1/2).
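
A minimal illustration of this conversion (Python; the helper name is ours and not part of the original analysis):

```python
def snellen_to_fraction(acuity: str) -> float:
    """Convert a Snellen acuity string such as '20/40' into the fraction 0.5."""
    numerator, denominator = acuity.split("/")
    return float(numerator) / float(denominator)

print(snellen_to_fraction("20/40"))   # 0.5
```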

4 Examining relationship between IOP and Visual field score


4.1 Selection of response and Explanatory variables
IOP is treated as a response and all other variables as explanatory.

4.2 Independence of left and right eye


For each time point, a nonparametric approach to testing independence is preferred. Kendall's test for independence based on signs is used⁵. To avoid the difficulty due to ties, the usual cut-off is not used; fixing one eye, the IOP values for the other eye are permuted over different patients⁶ to obtain an estimate of the cut-off.

4.2.1 Preliminary Analysis


The correlation coefficients between the left and right eyes at the various time points are examined. Notice that the absolute values decrease with time.
Time point 1: ρ = +0.3644
Time point 2: ρ = +0.3429
Time point 3: ρ = −0.1037
Time point 4: ρ = −0.0324

4.2.2 Formulation of hypothesis and testing


For each time point we test
H0 : τ = 0   versus   H1 : τ ≠ 0.
The Kendall sample correlation statistic for X and Y from n independent paired observations is

K = Σ_{1 ≤ i < j ≤ n} Q((Xi, Yi), (Xj, Yj))

where Q((a, b), (c, d)) = +1 if (d − b)(c − a) > 0, −1 if (d − b)(c − a) < 0, and 0 if (d − b)(c − a) = 0.

Reject H0 if K ≥ k_{α/2} or K ≤ k′_{α/2}, where k_{α/2} and k′_{α/2} are the upper and lower α/2 critical points of the null distribution of K.
³ Refer to Section 7.
⁴ Technically called a Snellen chart.
⁵ For detailed theory refer to Nonparametric Statistical Methods (Hollander and Wolfe).
⁶ Termed the 'permutation distribution' in the literature.

Figure 1: Scatter plot of left and right IOP at different time points

4.2.3 Evaluation of cut-off


A significant proportion of the data has tied ranks, so the usual cut-off tables cannot be used. The permutation distribution was therefore used to evaluate the cut-off: keeping the IOP values of one eye fixed, the IOP values of the other eye were permuted at random across patients and the Kendall test statistic was recalculated from the permuted data. This procedure was repeated 10000 times. From the empirical cdf of the permuted statistic, a 100(1 − 0.05)% CI was constructed. The null is rejected if the Kendall statistic from the original dataset lies outside this CI, and not rejected otherwise.
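
A minimal sketch of the statistic and the permutation cut-off just described (Python with NumPy; the IOP values shown are hypothetical placeholders, not the study data):

```python
import numpy as np

def kendall_k(x, y):
    """Kendall's K: sum over pairs i < j of sign((x_j - x_i) * (y_j - y_i))."""
    sx = np.sign(np.subtract.outer(x, x))
    sy = np.sign(np.subtract.outer(y, y))
    return int(np.triu(sx * sy, k=1).sum())

def permutation_cutoffs(x, y, n_perm=10000, alpha=0.05, seed=0):
    """Estimate the two-sided cut-offs of K by permuting one eye's IOP values
    across patients while keeping the other eye's values fixed."""
    rng = np.random.default_rng(seed)
    null_k = np.array([kendall_k(x, rng.permutation(y)) for _ in range(n_perm)])
    return np.quantile(null_k, [alpha / 2, 1 - alpha / 2])

# Hypothetical left- and right-eye IOP values at one time point (placeholders):
left = np.array([18.0, 22.0, 15.0, 30.0, 24.0, 19.0, 21.0, 17.0])
right = np.array([17.0, 25.0, 14.0, 28.0, 22.0, 20.0, 23.0, 16.0])
k_obs = kendall_k(left, right)
lo, hi = permutation_cutoffs(left, right)
print(k_obs, (lo, hi), "reject H0" if not (lo <= k_obs <= hi) else "do not reject H0")
```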

4.2.4 Results
The results for the 4 time points are as follows :

• At time point 1 the null was rejected

• At time point 2 the null was rejected

• At time point 3 the null was not rejected

• At time point 4 the null was not rejected

4.2.5 Interpretation of results


From the sample correlation coefficients at the various time points mentioned earlier, it was already observed that their absolute value decreases with time. Moreover, from the above results, at the last two time points the IOP values for the pair of eyes of an individual are found to be uncorrelated. This prompts a review of the dataset. It is then observed that at the first time point almost 50% of the patients had received no medical treatment before, which points to a treatment nonuniformity among the patients; this effect continues into the second time point. By the third time point, however, the nonuniformity of being under medication or not disappears, and hence at the third and fourth time points the IOP values for the pair of eyes are uncorrelated. Therefore, if treatment is included as an explanatory variable, the IOP values of the pair of eyes of a patient can be considered uncorrelated.

4.3 Choice of model
Y_IOP = β0 + X_age β1 + X_gender β2 + X_visualacuity β3 + X_drop β4 + X_iol β5 + X_trab β6 + X_pi β7 + X_needling β8 + X_MILD β9 + X_MODERATE β10 + X_SEVERE β11 + X_ENDSTAGE β12 + ε

(Here Y_IOP, the X's and ε are vectors over all observations.)

Since IOP is taken as the response variable and all the others as explanatory variables, and following the earlier conclusions, the left and right eyes of an individual are taken as independent experimental units. However, measurements on a unit at different time points cannot be taken as uncorrelated.
The GLM for longitudinal data treats y as a realization of a multivariate Gaussian random vector Y with

Y ∼ MVN(Xβ, σ²V)

where V is a block diagonal matrix with nonzero 4 × 4 blocks V0, each representing the variance matrix for the vector of measurements on a single experimental unit.
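
As an illustration of how such a design matrix can be assembled, the sketch below builds X with indicator variables for the treatments and the AGIS categories, taking NONE as the reference level; the rows, the gender coding and all values are hypothetical, since the report does not specify its exact coding:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format rows, one per eye (experimental unit) per visit.
df = pd.DataFrame({
    "age":           [55, 55, 62, 62],
    "gender":        [1, 1, 0, 0],          # assumed coding: 1 = male, 0 = female
    "visual_acuity": [0.5, 0.5, 1.0, 0.67],
    "drop": [1, 1, 0, 1], "iol": [0, 0, 0, 0], "trab": [0, 1, 0, 0],
    "pi":   [1, 1, 0, 0], "needling": [0, 0, 0, 0],
    "stage": ["MILD", "MILD", "NONE", "MODERATE"],
})

# Indicator variables for the AGIS stages, with NONE as the reference level.
stages = pd.get_dummies(df["stage"], dtype=float).reindex(
    columns=["MILD", "MODERATE", "SEVERE", "ENDSTAGE"], fill_value=0.0)

X = np.column_stack([np.ones(len(df)),
                     df[["age", "gender", "visual_acuity", "drop", "iol",
                         "trab", "pi", "needling"]].to_numpy(float),
                     stages.to_numpy()])
print(X.shape)   # (number of observations, 13), matching beta_0, ..., beta_12
```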

4.4 Selection of structure of V0


4.4.1 Motivation
The sample time correlation matrix is:
1.00 0.51 0.40 0.32
0.51 1.00 0.50 0.44
0.40 0.50 1.00 0.51
0.32 0.44 0.51 1.00
Notice that the correlation between the first and second time points is almost the same as that between the second and third, and as that between the third and fourth. Moreover, the correlation between the first and third time points is close to that between the second and fourth.

4.4.2 The exponential correlation model


In this model V0 has (j, k)th element v_jk = Cov(Y_ij, Y_ik) of the form v_jk = σ²ρ^|j−k|, where Y_ij denotes the observation on the ith experimental unit at the jth time point.
A justification of this model is to represent the random variables Y_ij as Y_ij = μ_ij + W_ij, i = 1, ..., m, j = 1, ..., n, where W_ij = ρW_i,j−1 + Z_ij and the Z_ij are mutually independent N(0, σ²(1 − ρ²)); here m is the number of experimental units and n the number of time points per unit.

4.4.3 Justification
In the exponential correlation model the correlation between the jth and kth time points of an individual depends on j and k only through their absolute difference. As the sample correlation matrix approximately satisfies this property, the exponential correlation model is selected.
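
A minimal sketch of the resulting covariance structure (Python with NumPy/SciPy; four equally spaced visits per unit are assumed, as in this dataset):

```python
import numpy as np
from scipy.linalg import block_diag

def exp_corr_block(rho, n_times=4):
    """V0 (up to sigma^2) for the exponential correlation model: v_jk = rho**|j - k|."""
    j = np.arange(n_times)
    return rho ** np.abs(np.subtract.outer(j, j))

def block_diag_v(rho, n_units, n_times=4):
    """Block diagonal V with one V0 block per experimental unit (eye)."""
    return block_diag(*([exp_corr_block(rho, n_times)] * n_units))

# With rho = 0.6 (the estimate reported in Section 4.6.1) the block resembles
# the sample time correlation matrix shown in Section 4.4.1:
print(np.round(exp_corr_block(0.6), 2))
```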

4.5 Method of analysis


4.5.1 Restricted maximum likelihood estimation(REML)
In the case of the GLM with dependent errors, the REML estimator is defined as a maximum likelihood estimator based on a linearly transformed set of data Y* = AY such that the distribution of Y* does not depend on β, where

Y ∼ MVN(Xβ, σ²V).

Calculation⁷ shows that the REML estimator maximises the log-likelihood

L*(σ², V) = −0.5 log det(σ²V) − 0.5 log det(σ⁻²XᵀV⁻¹X) − 0.5 (y − Xβ̂)ᵀ σ⁻²V⁻¹ (y − Xβ̂).

Substituting β̂ and σ̂² into the log-likelihood,

L*(V0) = −0.5 m (n log(RSS(V0)) + log(det V0)) − 0.5 log det(XᵀV⁻¹X)

where

RSS(V0) = (y − Xβ̂(V0))ᵀ V⁻¹ (y − Xβ̂(V0))

and

β̂(V0) = (XᵀV⁻¹X)⁻¹ XᵀV⁻¹ y.

V0 is obtained by numerical iteration; β̂ and σ̂ then follow.
⁷ Refer to Analysis of Longitudinal Data (Diggle et al.), Chapter 4.

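A sketch, under stated assumptions, of how the profiled REML log-likelihood above can be evaluated and maximised over a grid of ρ values for the exponential correlation model (Python; the simulated data at the end are only a hypothetical check, not the study data):

```python
import numpy as np
from scipy.linalg import block_diag

def exp_corr(rho, n_times=4):
    """Exponential (AR(1)) correlation block: element (j, k) equals rho**|j - k|."""
    j = np.arange(n_times)
    return rho ** np.abs(np.subtract.outer(j, j))

def reml_profile_loglik(rho, X, y, n_times=4):
    """Profiled REML log-likelihood L*(V0) as stated above, for the exponential
    correlation model; rows of X and y are grouped by unit, n_times rows per unit."""
    m = len(y) // n_times                           # number of experimental units
    V0 = exp_corr(rho, n_times)
    Vinv = block_diag(*([np.linalg.inv(V0)] * m))   # V^{-1} is block diagonal too
    XtVinvX = X.T @ Vinv @ X
    beta_hat = np.linalg.solve(XtVinvX, X.T @ Vinv @ y)
    resid = y - X @ beta_hat
    rss = float(resid @ Vinv @ resid)
    _, logdet_xvx = np.linalg.slogdet(XtVinvX)
    return (-0.5 * m * (n_times * np.log(rss) + np.log(np.linalg.det(V0)))
            - 0.5 * logdet_xvx)

def reml_rho_hat(X, y, grid=np.arange(0.05, 1.0, 0.05), n_times=4):
    """Grid search for the rho maximising the profiled REML log-likelihood."""
    return max(grid, key=lambda r: reml_profile_loglik(r, X, y, n_times))

# Hypothetical check on simulated data with true rho = 0.6 (not the study data):
rng = np.random.default_rng(0)
m, n_times = 60, 4
X = np.column_stack([np.ones(m * n_times), rng.normal(size=(m * n_times, 2))])
errors = np.concatenate([rng.multivariate_normal(np.zeros(n_times),
                                                 4 * exp_corr(0.6)) for _ in range(m)])
y = X @ np.array([20.0, -0.5, 1.0]) + errors
print(round(reml_rho_hat(X, y), 2))
```

The same grid evaluation is what is reused inside the λ search described in Section 4.5.3.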
4.5.2 Box-Cox transformation
The original data on IOP being integer valued, it is wise to apply a Box-Cox transformation⁸ to bring the response closer to normality:

y^(λ) = (y^λ − 1)/(λ ẏ^(λ−1))   if λ ≠ 0
y^(λ) = ẏ ln y                  if λ = 0

where ẏ is the geometric mean of the response variable. Applying this transformation, SS_E(λ) is calculated for different values of λ, and the value of λ for which SS_E(λ) is minimum is chosen.
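
A minimal sketch of the transformation (Python; the IOP values are hypothetical, and λ = 0.36 is the estimate reported later in Section 4.6.1):

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform scaled by the geometric mean y-dot, as defined above."""
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))            # geometric mean of the response
    if lam == 0:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))

# Hypothetical integer IOP values, transformed at a few candidate lambdas.
iop = np.array([14, 18, 22, 30, 16, 25])
for lam in (0.0, 0.36, 1.0):
    print(lam, np.round(boxcox_transform(iop, lam), 3))
```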

4.5.3 Box-Cox transformation and REML


There being no closed-form solution of the REML log-likelihood equation, and V0 being a function of ρ only, different values of ρ are used to evaluate the log-likelihood and ρ̂ is taken as the value that maximizes it. To implement the Box-Cox transformation in this set-up, a value of λ is first fixed, ρ̂ is evaluated for it and the corresponding SS_E is obtained; varying λ, λ̂ is then obtained as the value for which SS_E(λ) is minimum.

4.6 Results
4.6.1 Estimates of the parameters of the model
ρ̂ = 0.6

λ̂ = 0.36

β̂ = (29.22, −0.01, −1.42, −0.26, −3.06, 5.21, −9.16, −1.84, −7.08, 2.41, 3.27, 4.57, 3.03)ᵀ

corresponding, in the order of the model in Section 4.3, to the intercept, age, gender, visual acuity, drop, iol, trab, pi, needling, MILD, MODERATE, SEVERE and ENDSTAGE.

4.6.2 Model adequacy


• The scatter plot of the residuals (Figure 2) appears random, i.e. the residuals do not exhibit any definite pattern.

• The normal probability plot of the residuals (Figure 3) is close to a straight line, indicating that the errors are indeed approximately normal, so the normality assumption on the (transformed) response is reasonable.

4.7 Hypothesis testing

H0 : Qβ = 0,

where Q is a full-rank q × p matrix for some q ≤ p. It can be deduced that

Qβ̂_REML ∼ MVN(Qβ, Q R̂_REML Qᵀ)

where

R̂_REML = σ̂² (Xᵀ V̂_REML⁻¹ X)⁻¹.

An appropriate test statistic for testing the hypothesis Qβ = 0 is

T = β̂ᵀ_REML Qᵀ (Q R̂_REML Qᵀ)⁻¹ Q β̂_REML

and the approximate null sampling distribution of T is chi-squared on q degrees of freedom.
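
A minimal sketch of this Wald-type test (Python with SciPy; β̂ and R̂ would come from the REML fit, so the values below are placeholders only):

```python
import numpy as np
from scipy.stats import chi2

def wald_test(beta_hat, R_hat, Q):
    """T = (Q b)' (Q R Q')^{-1} (Q b); under H0: Q beta = 0, T is approximately
    chi-squared with q = number of rows of Q degrees of freedom."""
    Qb = Q @ beta_hat
    T = float(Qb @ np.linalg.solve(Q @ R_hat @ Q.T, Qb))
    return T, chi2.sf(T, df=Q.shape[0])

# Hypothetical single-coefficient test, e.g. beta_9 (MILD vs NONE); beta_hat and
# R_hat below are placeholders standing in for the REML output.
p = 13
Q = np.zeros((1, p)); Q[0, 9] = 1.0
beta_hat, R_hat = np.full(p, 1.0), np.eye(p)
print(wald_test(beta_hat, R_hat, Q))
```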


⁸ For details refer to Design and Analysis of Experiments (Montgomery).
Figure 2: Scatter plot of residual

Figure 3: Normal probability plot of residuals

4.8 Results
It is required to find out whether there is a field category effect on the IOP. Each of the p-values listed below corresponds to the test of

H0 : β_corresponding category = 0
versus
H1 : β_corresponding category ≠ 0

• p-value for β_MILD = 0.029

• p-value for β_MODERATE = 0.006

• p-value for β_SEVERE ≈ 0

• p-value for β_ENDSTAGE = 0.02

4.9 Interpretation of results


A p-value less than 0.05 here indicates that, for a unit in that category, the expected value of IOP differs from what it would be in the None category; since all four estimated coefficients are positive, the expected IOP is higher. In other words, an eye with glaucomatous field loss tends to have a higher IOP than a normal eye.

4.10 Testing between non-NONE categories and interpretation of results

Along similar lines, tests were carried out to check whether there are significant differences among the four categories: mild, moderate, severe and end stage.
The effect of severe is found to be the largest and the effect of mild the smallest among the four categories. However, the difference between the effects of moderate and end stage was not significant.
Restating, the two possible orderings of increasing effect of the visual field categories are

mild < moderate < endstage < severe

or

mild < endstage < moderate < severe

4.11 An interesting observation


The normal probability plot of the residuals showed 8 outliers. Going back to the original data, it is observed that

• in cases where the outlier has a positive residual, trabeculectomy was performed just after the time point at which the residual is an outlier;

• in cases where the outlier has a negative residual, trabeculectomy was performed just before the time point at which the residual is an outlier.

This emphasizes that trabeculectomy has a very large effect in reducing the IOP of patients with glaucoma.

5 To evaluate characteristic visual field defect
It is of interest to medical experts, given a glaucomatous eye at a certain stage (defined by the category), to know which subfield has the greater damage. The main emphasis in this analysis is on the mild and moderate categories, because in the higher categories (severe and end-stage) the scores in each subfield are already so high that it is impossible to compare the degree of damage.

5.1 Defining baseline field score


As glaucoma is a very slow damage process, the baseline field score gives an approximate idea of the field score values within a short period of time. To obtain the baseline field score, repeated measurements are taken; this has been done only at the first time point, since the clinical follow-up does not cover a long enough time span to re-fix the baseline field score.

5.2 Methodology
The visual field scoring method is the same across all the subfields. Each subfield has test locations: 6 for nasal, 23 for superior and 23 for inferior. A maximum total score of 20 is possible, with a maximum of 2 from the nasal field and a maximum of 9 from each of the superior and inferior hemifields. With this scoring methodology, if it is assumed that each of the subfields is affected equally by glaucoma, then on average the subfield scores should be 2N/20 for nasal, 9N/20 for superior and 9N/20 for inferior, where N is the total field score. Using this fact, a CI for the mean(s) is obtained from the available dataset by simulation. This gives a way to test H0 versus H1.
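
For example, under this equal-damage assumption a unit with total field score N = 5 would be expected to show, on average, a nasal score of about 2(5)/20 = 0.5 and superior and inferior scores of about 9(5)/20 = 2.25 each.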

5.3 Category : MILD


5.3.1 Hypothesis
H0 ≡ Damage of glaucoma in nasal is the same as in superior and inferior
H1 ≡ Nasal is affected the most

Consider

H0,Sup ≡ Damage of glaucoma in Nasal = Damage of glaucoma in Superior
H1,Sup ≡ Damage of glaucoma in Nasal > Damage of glaucoma in Superior

and

H0,Inf ≡ Damage of glaucoma in Nasal = Damage of glaucoma in Inferior
H1,Inf ≡ Damage of glaucoma in Nasal > Damage of glaucoma in Inferior

Clearly, testing these two hypotheses is equivalent to testing the original hypothesis.

5.3.2 Testing procedure


Under H0, if (Xi, Yi, Zi) are the ordered field scores (nasal, superior, inferior) of a unit whose total field score is N, it is expected that their mean should be close to (2N/20, 9N/20, 9N/20); in other words, the mean of (Xi − 2N/20, Yi − 9N/20, Zi − 9N/20) should be close to (0, 0, 0). Here two tests are performed, one concerning nasal and superior and the other concerning nasal and inferior. Define

S1,N = Σ_{i : Xi + Yi + Zi = N} (Xi − 2N/20 − Zi + 9N/20)

S2,N = Σ_{i : Xi + Yi + Zi = N} (Xi − 2N/20 − Yi + 9N/20)

Now

T1 = Σ_{N ∈ Mild} S1,N / #Units_MILD

T2 = Σ_{N ∈ Mild} S2,N / #Units_MILD

From these definitions it is clear that, if the dataset is in accordance with the null, T1 and T2 should be close to zero.

5.3.3 Evaluation of cut-off


Taking the available dataset as the population, a random sample of the same size is drawn from it using SRSWR. This procedure is repeated 10000 times, and the test statistics T1 and T2 are evaluated for each resample, giving empirical cdfs of the two statistics. From the empirical cdfs, 100(1 − α)% CIs for T1 and T2 are obtained. If these CIs do not contain the value 0 (the value of T1 and T2 expected under H0), H0 is rejected in favour of H1.
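
A minimal sketch of this bootstrap procedure (Python with NumPy; the MILD-category field scores shown are hypothetical placeholders, not the study data):

```python
import numpy as np

def t_statistics(scores):
    """scores: array of shape (n_units, 3) holding (nasal, superior, inferior) for
    each MILD unit. Returns (T1, T2): nasal-vs-inferior and nasal-vs-superior."""
    x, y, z = scores[:, 0], scores[:, 1], scores[:, 2]
    n_tot = scores.sum(axis=1)                       # each unit's own total score N
    t1 = np.mean((x - 2 * n_tot / 20) - (z - 9 * n_tot / 20))
    t2 = np.mean((x - 2 * n_tot / 20) - (y - 9 * n_tot / 20))
    return t1, t2

def bootstrap_ci(scores, n_boot=10000, alpha=0.05, seed=0):
    """SRSWR resampling of the units; percentile CIs for T1 and T2."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    stats = np.array([t_statistics(scores[rng.integers(0, n, n)])
                      for _ in range(n_boot)])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2], axis=0)

# Hypothetical MILD-category field scores (nasal, superior, inferior), placeholders:
mild = np.array([[2, 1, 1], [1, 0, 2], [2, 2, 1], [1, 1, 0], [2, 0, 1]], dtype=float)
print(t_statistics(mild))
print(bootstrap_ci(mild, n_boot=2000))
```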

Figure 4: Empirical cdf for T1

Figure 5: Empirical cdf for T2

5.3.4 Results
The two-sided CI for T1 is (0.3596, 0.9585).
The two-sided CI for T2 is (1.2372, 1.7734).
Note that neither CI contains the value 0, and in both cases 0 is smaller than the lower cut-off point. In both cases the p-values are almost equal to zero (indeed, the p-values obtained from the empirical cdfs were actually found to be zero).

5.3.5 Interpretation of results


Since neither CI contains the value 0, the null hypothesis that the damage in nasal is the same as in superior and inferior is rejected in favour of the alternative.
For the mild category, the Nasal subfield is affected the most.

5.4 Category : MODERATE


5.4.1 Hypothesis
H0 ≡ Damage of glaucoma is the same in the superior and inferior subfields
H1 ≡ Inferior is affected more than superior
Alternatively,
H0 ≡ Damage of glaucoma in superior = Damage of glaucoma in inferior
H1 ≡ Damage of glaucoma in inferior > Damage of glaucoma in superior

5.4.2 Testing procedure


Under H0, if (Yi, Zi) are the field scores of superior and inferior respectively for a unit whose total field score is N, it is expected that their mean should be close to (9N/20, 9N/20); in other words, the mean of (Yi − 9N/20, Zi − 9N/20) should be close to (0, 0). Define

S_N = Σ_{i : Xi + Yi + Zi = N} (Yi − Zi)

Now

T = Σ_{N ∈ Moderate} S_N / #Units_MODERATE

From these definitions it is clear that, if the dataset is in accordance with the null, T should be close to zero.

5.4.3 Evaluation of cut-off


Taking the available dataset as the population, a random sample of the same size is drawn from it using SRSWR. This procedure is repeated 10000 times, and the test statistic T is evaluated for each resample, giving an empirical cdf of the statistic. From the empirical cdf, a 100(1 − α)% CI for T is obtained. If this CI does not contain the value 0 (the value of T expected under H0), H0 is rejected in favour of H1.

5.4.4 Results
The two-sided CI for T is (1.1290, 2.8710).
Note that the CI does not contain the value zero and that zero is smaller than the lower cut-off point.

5.4.5 Interpretation of results


Since the CI does not contain the value 0, the null hypothesis that the damage in the inferior subfield is the same as in the superior subfield is rejected in favour of the alternative.
For the moderate category, the Inferior subfield is more affected than the Superior subfield.

Figure 6: Empirical cdf for T

6 To evaluate Progression of visual field damage


6.1 Definition of Progression
Progression is quantified as a field score increase of ≥ 4 in three consecutive reliable visual field tests.

6.2 Objectives and problem faced


Progression is an important measure of how the glaucoma is advancing. It would be very helpful to practitioners if it could be predicted when progression will take place in a unit.

The time span over which these data were collected is less than two years, and only about 3% of the units (4 to 5 in number) have progressed. It is therefore not worthwhile to try to predict progression from this dataset; a longer time span is necessary before any comment on progression can be made. Also, from a medical perspective, studying progression in this dataset is not of much interest.

6.3 Future scope


More follow-ups in this study can be used in future.

• As the field score is count data, taking the field score as the response with a Poisson error distribution, the joint distribution of field scores up to the future time point to be predicted can be obtained. Hence, using the conditional distribution given all the data available before that time point, a range of field scores can be given, which may be able to determine progression.

• Also, the stochastic nature of the field scores can be used: the transition probabilities may be estimated from the dataset and a prediction obtained from them.

7 Dealing with missing data
In the dataset two different types of missing data were observed.

• During the first visit of a few patients a very high IOP was observed. Medical expertise says that it is not justified to measure the visual field score of a patient while he/she has a very high IOP, because high IOP causes a great deal of variation in the visual field score, which disturbs the preparation of a baseline field score. Suitable medicines are prescribed to control the high IOP; in the subsequent follow-ups, once the patient has a reasonably lower IOP, the visual field score is measured and a baseline field score is prepared.

• In some cases patients did not come for subsequent follow-ups.

7.1 Dropouts and intermittent missing values


• Suppose it is intended to take a sequence of measurements Y1, Y2, ..., Yn on a particular unit. Missing values are said to be dropouts if, whenever Yj is missing, so are Yk for all k ≥ j.

• All other types of missing values are considered intermittent missing values.

7.2 Dealing with intermittent missing values


As discussed earlier, intermittent missing values occurred because of high IOP values, which lead to a large variation in the visual field score. It would be of little value to fill in those missing values, since the field score is bound to show temporary variation even within a very short span of time. So the only choice left was to discard those intermittent missing values for the purpose of analysis.

7.3 Methodology for dropouts


Let Y* denote the complete set of measurements which would have been obtained if there were no dropouts, and partition this set as Y* = (Y^0, Y^d), with Y^0 denoting the measurements actually obtained and Y^d the measurements which would have been available had there been no dropouts. Finally, let R denote a set of indicator random variables denoting which elements of Y* fall into Y^0 and which into Y^d. A probability model for the missing value mechanism then defines the probability distribution of R conditional on Y* = (Y^0, Y^d).

• Dropouts are said to be completely random if R is independent of both Y^0 and Y^d.

• Dropouts are said to be random if R is independent of Y^d.

• Dropouts are said to be informative if R depends on Y^d.

Different methods exist in the literature⁹ to test whether dropouts are completely random, random or informative; however, all such methods require a sufficiently large number of dropouts. Different models then follow for the different types of dropout.
In this dataset the total number of dropout cases was 3, which is not enough to test for randomness, so the analysis was done after discarding those dropouts.
⁹ Refer to Analysis of Longitudinal Data (Diggle et al.).
8 References
• Analysis of Longitudinal Data (Diggle et al.)

• Design and Analysis of Experiments (Montgomery)

• Nonparametric Statistical Methods (Hollander and Wolfe)

• Applied Linear Statistical Models (Neter et al.)

• The Elements of Statistical Learning (Hastie, Tibshirani and Friedman)

9 Acknowledgements
We are grateful to Dr. Sanchita Ray for her constant help and support. We extend sincere thanks to Prof. Arijit Chakraborty
and Prof. Saurabh Ghosh for their fruitful suggestions. Finally, we thank our batchmates and seniors.
