
ARTICLE IN PRESS

Risk Prediction for Ischemic Stroke and Transient Ischemic Attack in Patients Without Atrial Fibrillation: A Retrospective Cohort Study

Zhong Yuan, MD, PhD,* Erica A. Voss, MPH,† Frank J. DeFalco, BA,†
Guohua Pan, PhD,† Patrick B. Ryan, PhD,* Daniel Yannicelli, MD,‡,1 and
Christopher Nessel, MD†

Background: Stroke mainly occurs in patients without atrial fibrillation (AF). This
study explored risk prediction models for ischemic stroke and transient ischemic
attack (TIA) in patients without AF. Methods: Three US-based healthcare data-
bases (Truven MarketScan Commercial Claims and Encounters [CCAE], Medicare
Supplemental [MDCR], and Optum Clinformatics [Optum]) were used to estab-
lish patient cohorts without AF during the index period of 2008-2012. The performance
of 2 existing models (CHADS2 and CHA2DS2-VASc) for predicting stroke and TIA
was examined by fitting a logistic regression to a training dataset and evaluating
predictive accuracy in a validation dataset (area under the curve, AUC) using pa-
tients with complete follow-up of 1 or 3 years, separately. Results: The commercial
populations were younger and had fewer comorbidities than the Medicare-eligible
population. The incidence proportions of ischemic stroke and TIA during 1 and 3
years of follow-up were .5% and 1.9% (CCAE), .6% and 2.2% (Optum), and 4.6%
and 13.1% (MDCR), respectively. The models performed consistently across all 3
databases, with the AUC ranging from .69 to .77 and from .68 to .73 for 1- and
3-year prediction, respectively. Predictive accuracy was lower than the initial work
of CHADS2 evaluation in patients with AF (AUC: .82), but consistent with a sub-
sequent meta-analysis of CHADS2 (.60-.80) and CHA2DS2-VASc performance (.64-.79).
Conclusion: Although the existing schemes for predicting ischemic stroke and TIA
in patients with AF can be applied to patients without AF with comparable pre-
dictive accuracy, the evidence suggests that there is room for improvement in these
models’ performance. Key Words: Stroke—transient ischemic attack—risk
prediction—stroke prevention.
© 2017 The Authors. Published by Elsevier Inc. on behalf of National Stroke
Association. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

From the *Janssen Research & Development, LLC, Titusville, New Jersey; †Janssen Research & Development, LLC, Raritan, New Jersey;
and ‡Janssen Scientific Affairs, LLC, Titusville, New Jersey.
Received April 29, 2016; revision received February 9, 2017; accepted March 24, 2017.
Declaration of financial/other relationships: The authors (Z.Y., E.A.V., F.J.D., G.P., P.B.R., C.N.) are salaried employees of Janssen Research &
Development, LLC, USA.
Address correspondence to Zhong Yuan, MD, PhD, Janssen Research & Development, LLC, 1125 Trenton-Harbourton Rd, Titusville, NJ
08560. E-mail: zyuan6@its.jnj.com.
1 Dr. Yannicelli: dyannicelli@comcast.net (had left Janssen Scientific Affairs, LLC, by the time of resubmission).
1052-3057/$ - see front matter
http://dx.doi.org/10.1016/j.jstrokecerebrovasdis.2017.03.036

Journal of Stroke and Cerebrovascular Diseases, Vol. ■■, No. ■■ (■■), 2017: pp ■■–■■

Introduction

Stroke is the leading cause of adult disability, representing a significant public health problem worldwide. In the United States, stroke is the fourth leading cause of death among all diseases, with an annual incidence of 795,000, resulting in nearly 130,000 deaths a year.1 Although the incidence of stroke increases with age and with the presence of a number of comorbidities, atrial fibrillation (AF) is the most important single predictor of ischemic stroke (primarily through embolism of the left atrial appendage thrombi), which confers nearly a fivefold increase in risk of stroke based on the Framingham study.2 Given this important causal relationship, the thrombotic mechanism and AF as the most common cardiac arrhythmia (particularly in the elderly), prophylactic anticoagulation has been the cornerstone in stroke prevention in patients with AF for several decades, potentially saving many lives.3,4

Several risk prediction schemes were developed initially to characterize the risk of stroke for patients with AF, including those developed by the Atrial Fibrillation Investigators (AFI) and the Stroke Prevention in Atrial Fibrillation (SPAF) III investigators.5-7 In predicting stroke in patients with AF, the model-based C-statistic (the area under the curve [AUC] for this receiver operating characteristic [ROC] curve) was .68 (95% confidence interval [CI]: .65-.71) for AFI and .74 (95% CI: .71-.76) for SPAF. Based on data from the National Registry of Atrial Fibrillation, encompassing Medicare beneficiaries aged 65-95 years with nonrheumatic AF who were not prescribed warfarin on hospital discharge, Gage et al showed that the CHADS2 index (congestive heart failure [CHF], hypertension, age ≥75, diabetes, and prior stroke or transient ischemic attack [TIA] [double score]) had an improved performance as compared with AFI and SPAF for predicting stroke, with a C-statistic of .82 (95% CI: .80-.84).8 Because of its simplicity, the CHADS2 index became the most commonly used scoring scheme for stroke prediction in patients with AF. More recently, Lip et al developed the CHA2DS2-VASc score, consisting of CHF, hypertension, age ≥75 years (double score), diabetes mellitus, previous stroke and TIA (double score), vascular disease, age 65-74 years, and sex (female), with an accompanying C-statistic of .606 (95% CI: .513-.699).9 The European Society of Cardiology, the American College of Cardiology/American Heart Association, and the National Institute for Health and Care Excellence all now recommend the use of CHA2DS2-VASc as the preferred risk scoring method to assess stroke risk in AF patients, as it provides more accurate assessment of low-risk patients than the existing methods.10-12

Stroke prevention in patients with nonvalvular atrial fibrillation relies on an assessment of the individual risks and CHADS2 and CHA2DS2-VASc risk scores, and they are commonly used in clinical practice.13 Our previous analyses have demonstrated that the risk of stroke among patients without AF is also positively associated with those risk scores.14 Although approximately 85% of all strokes occur in people without AF, the performance of these schemes for predicting stroke risk has not been comprehensively examined in this population from a statistical perspective.15,16 Unlike patients with AF, there is no clear prophylactic strategy for stroke prevention in patients without AF; a good prediction model could promote the awareness of patients at high risk, thus allowing healthcare providers to enhance treatment planning by aggressively managing underlying diseases.

Therefore, we designed the current study and hypothesized that the commonly used schemes such as CHADS2 and CHA2DS2-VASc scores can be used to predict the risks of stroke and TIA in patients without AF. In addition, we intended to explore additional factors that might easily be identified in clinical practice, and that might improve the model performance, as measured by AUC. To accomplish these study objectives and assess the robustness and consistency of the results, we employed multiple commercially available databases, multiple end points, and different durations of follow-up for ascertaining the outcomes of interest.

Methods

Study Design, Data Sources, and Patient Selection

This was a retrospective cohort study that used commercially available claims databases, including Truven MarketScan Commercial Claims and Encounters (CCAE), Truven MarketScan Medicare Supplemental (MDCR), and Optum Clinformatics (Optum). Briefly, CCAE is an administrative health claims database for active employees, early retirees, the Consolidated Omnibus Budget Reconciliation Act beneficiaries, and their dependents insured by employer-sponsored plans (individuals in plans or product lines with fee-for-service plans and fully capitated or partially capitated plans). CCAE captures person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services. MDCR is an administrative health claims database for Medicare-eligible active and retired employees and their Medicare-eligible dependents from employer-sponsored supplemental plans (predominantly fee-for-service plans). Only plans where both the Medicare-paid amounts and the employer-paid amounts were available and evident on the claims were selected for this database. MDCR also captures person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services. Finally, Optum is an administrative health claims database for members who are fully insured in commercial plans or in administrative services only, Medicaid (prior to July 2010, 1.25 million)

Figure 1. Flowchart for the cohort definition.

and Legacy Medicare Choice (prior to January 2006, 0.36 million). Optum also captures person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services.

These 3 databases were used to establish the study cohort. Patients were considered eligible for the cohort if a medical encounter took place between the years 2008 and 2012, with the first service date set as the index date for the patient. These patients needed to be between 18 and 64 years of age for CCAE and Optum and older than or equal to 65 years of age for MDCR, and have at least 365 days of continuous observation time prior to the index date. Patients with a prior diagnosis of AF and evidence of receiving warfarin or direct oral anticoagulants were excluded from the analyses.

Once this base cohort was established, patients were evaluated to determine if they had an end point of interest during a 1-year follow-up period, that is, ischemic stroke or composite of ischemic stroke or TIA. Patients who did not experience an end point of interest were required to have at least 1-year complete observable time postindex date, whereas patients who experienced an end point of interest did not need to meet that requirement (Fig 1). As part of sensitivity analysis, we also repeated the analyses using a 3-year observable time window postindex date.

Main Outcomes Measure

The composite of ischemic stroke or TIA (at 1-year or 3-year observable time window, respectively) was the main outcome of interest because the current study is intended to assess the performance of the existing risk schemes (i.e., CHADS2 and CHA2DS2-VASc) for its prediction. The outcome postindex date was identified using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes present in any diagnosis field in the database (ischemic stroke: 433.x1, 434.1, 434.x1; TIA: 435.x). We used all diagnosis fields to ascertain the study end point, because a prior validation study in patients with AF by Thigpen et al17 showed that, although the diagnosis using the primary position had an excellent positive predictive value (97.2%), the diagnosis using the nonprimary position accounted for about 20% of all valid stroke events, with a positive predictive value of 83.7%, which is still reasonably good for secondary database studies. Furthermore, in our definition of ischemic stroke, we did not use the ICD-9-CM codes of 436 (acute, but ill-defined cerebrovascular disease) primarily for 2 reasons: (1) in the aforementioned study, this code identified fewer than 3% of all stroke events (in patients with AF), and (2) the accuracy of this code in patients without AF has not been comprehensively examined. Finally, given that TIA may be considered a soft
end point, we also assessed the end point of ischemic stroke alone to corroborate the primary analysis.17

For descriptive purposes, a number of baseline comorbidities of interest were identified using the ICD-9-CM codes, including diabetes, CHF and left ventricle dysfunction, myocardial infarction, chronic obstructive pulmonary disease, heart failure, vascular disease, hypertension, hyperlipidemia, hyperthyroidism, thromboembolism, liver disease, renal disease, cancer, ischemic stroke, and TIA. Appendix S1 presents the ICD-9-CM codes for these comorbid conditions.

Statistical Analyses

Descriptive statistics were provided for patient demographics and baseline comorbidities. Means and standard deviations were reported for continuous variables, whereas counts and frequencies were reported for categorical variables. The incidence of ischemic stroke and TIA was presented as incidence proportion for each fixed observable time period along with standard errors (SE). Because the sample size was quite large for each database, we did not present CI for the incidence proportions as the data points were nearly identical. As expected, SE converged near 0 as sample size increased, but it should be interpreted in a proper context because it did not take other factors into consideration (e.g., potential systematic error, measurement error, or misclassification).

The scores for the existing schemes of CHADS2 (CHF; hypertension; age older than or equal to 75 years; diabetes mellitus; and prior stroke, TIA, or thromboembolism) and CHA2DS2-VASc (CHADS2 + vascular disease, age between 65 and 74 years, and sex category) were calculated based on baseline comorbidities.8,9 For the CCAE and Optum, because all patients were younger than 65 years of age, the CHADS2 and CHA2DS2-VASc scores were modified and calculated without the age category. To assess the performance of these 2 schemes for predicting ischemic stroke and TIA, we fitted a logistic regression model with the risk score as the independent variable and the outcome of interest as the dependent variable. The model classifies each patient (outcome or not), and the predicted probability was plotted in an ROC curve with sensitivity against 1-specificity. The AUC for this ROC curve is also known as the C-statistic, with a range of .5 (no discrimination) to a theoretical maximum of 1 (perfect discrimination).18,19 The model performance was compared across risk schemes (CHADS2 and CHA2DS2-VASc), databases (CCAE, Optum, MDCR), and different observable time periods (1-year and 3-year complete observable time).

To explore whether additional parameters would improve the patient-level prediction (PLP), we employed a regularized logistic regression model, which included a large number of baseline covariates, including age, sex, diagnoses, procedures, comorbidities, medications, comorbidity indices, and factors associated with resource utilization (e.g., number of outpatient visits, hospitalizations).20 For these analyses, we divided each study cohort into 2 groups: one for the training dataset and one for the validation dataset. Summary statistics were reported on full cohorts created in each dataset, and prediction model statistics were reported based on the validation dataset. The AUC scores from the PLP models were compared with the AUC scores from the CHADS2 and CHA2DS2-VASc models (respectively) to show the models’ improvements, if any.

Only de-identified patient-level data were analyzed for this study and institutional review board oversight was not required. All analyses were conducted in R version 3.2.1 (Vienna, Austria) and the main package used was the PatientLevelPrediction (PLP) package generated from the Observational Health Data Sciences and Informatics open-source community.21,22

Results

A total of 12,006,960 (CCAE), 5,318,574 (Optum), and 1,371,352 (MDCR) patients were included in the final analysis from each database. The baseline characteristics are presented in Table 1. Hypertension, hyperlipidemia, and cancer were the most prevalent comorbidities among the study patients. As expected, the privately insured patient populations (CCAE and Optum) were younger and had fewer comorbidities than the Medicare-eligible population (MDCR), corresponding to CHADS2 and CHA2DS2-VASc scores (standard deviation) of .3 (.6) and .9 (.8) for CCAE, .3 (.6) and .8 (.8) for Optum, and 1.7 (1.2) and 3.2 (1.3) for MDCR. Gender distribution was generally balanced for the CCAE and Optum patients, whereas about 56.2% of patients were females for MDCR, which is consistent with the gender distribution in the overall MDCR database. These observations are generally consistent with the findings from previous investigations of patient characteristics across databases.23,24

The incidence proportions (SE) of ischemic stroke and TIA during 1 and 3 years of follow-up were .5% (.00002) and 1.9% (.00006) for CCAE, .6% (.00004) and 2.2% (.00010) for Optum, and 4.6% (.00020) and 13.1% (.00039) for MDCR, respectively. Within each database cohort, the predictive accuracy between the 2 models (CHADS2 and CHA2DS2-VASc scores) was similar regardless of follow-up time, with the AUC difference less than .01. Overall, the models performed similarly for the MDCR cohort, with the AUC ranging from .68 (95% CI: .68-.69) to .70 (95% CI: .69-.70) regardless of follow-up time period (Table 2). For the CCAE and Optum cohorts, the models generally performed slightly better for 1-year follow-up time (the AUC ranged from .72 to .74) as compared with 3-year follow-up time (the AUC ranged from .69 to .70), with the AUC difference between .03 and .04 with the same model for different observation periods.
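
The standard errors reported alongside the incidence proportions can be sanity-checked with the usual binomial approximation, SE = sqrt(p(1 − p)/n). The sketch below is an illustration only: the source does not state which SE formula or denominator was used, so the formula, the use of the full CCAE cohort size as n, and the function name `incidence_proportion_se` are all assumptions.

```python
import math


def incidence_proportion_se(p: float, n: int) -> float:
    """Binomial-approximation standard error of an incidence proportion.

    Assumption: SE = sqrt(p * (1 - p) / n); the paper does not state its formula.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive count")
    return math.sqrt(p * (1.0 - p) / n)


# Reported 1-year incidence proportion for CCAE (.5%) and the CCAE cohort size.
# Using the full cohort as the denominator is an assumption for illustration.
se_ccae_1y = incidence_proportion_se(0.005, 12_006_960)
print(f"{se_ccae_1y:.5f}")
```

Under these assumptions the 1-year CCAE value rounds to .00002, in line with the reported figure. Agreement for the other databases and for the 3-year window would depend on the number of patients with complete follow-up, which is not reported here.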
Table 1. Baseline characteristics of the study patients without atrial fibrillation

Characteristic                                                              CCAE        Optum      MDCR
Study cohort, N                                                             12,006,960  5,318,574  1,371,352
Baseline demographics
  Mean age, years                                                           43          42         76
  Standard deviation                                                        13          13         7
  Sex (female) (%)                                                          50.9        50.9       56.3
Baseline comorbidity of interest
  Diabetes (%)                                                              7.7         7.0        23.4
  Congestive heart failure and left ventricle dysfunction (%)               1.0         .9         9.8
  Myocardial infarction (%)                                                 .5          .3         3.1
  Chronic obstructive pulmonary disease (%)                                 2.0         1.7        15.2
  Heart failure (%)                                                         .7          .6         8.4
  Vascular disease (%)                                                      2.5         2.2        15.6
  Hypertension (%)                                                          22.1        21.4       64.7
  Hyperlipidemia (%)                                                        25.3        26.2       45.9
  Hyperthyroidism (%)                                                       1.3         1.1        1.4
  Thromboembolism (%)                                                       .1          .0         .4
  Liver disease (%)                                                         5.2         5.2        6.9
  Renal disease (%)                                                         .6          .6         5.4
  Cancer (%)                                                                11.5        10.8       36.0
  Ischemic stroke, TIA (%)                                                  .8          .6         8.8
  Ischemic stroke, TIA, and thromboembolism (%)                             .9          .6         9.1
  Ischemic stroke, TIA, and thromboembolism (within 60 days of index) (%)   .1          .1         .8
CHADS2, mean (SD)                                                           .3 (.6)     .3 (.6)    1.7 (1.2)
CHA2DS2-VASc, mean (SD)                                                     .9 (.8)     .8 (.8)    3.2 (1.3)

Abbreviations: CCAE, Truven MarketScan Commercial Claims and Encounters; MDCR, Truven MarketScan Medicare Supplemental; Optum, Optum Clinformatics; SD, standard deviation; TIA, transient ischemic attack.
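
The mean CHADS2 and CHA2DS2-VASc values in Table 1 come from simple additive scores. A minimal sketch of how such scores could be computed from baseline flags follows, using the point weights described in the Introduction. The `Baseline` record, its field names, and the `use_age` switch (standing in for the age-free modification applied to CCAE and Optum) are hypothetical and are not taken from the study code.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Hypothetical baseline record; field names are illustrative only."""
    age: int
    female: bool
    chf: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia_te: bool = False  # prior stroke, TIA, or thromboembolism
    vascular_disease: bool = False


def chads2(p: Baseline, use_age: bool = True) -> int:
    """CHADS2: 1 point each for CHF, hypertension, age >= 75, diabetes;
    2 points for prior stroke/TIA/thromboembolism."""
    score = int(p.chf) + int(p.hypertension) + int(p.diabetes)
    score += 2 * int(p.prior_stroke_tia_te)
    if use_age and p.age >= 75:
        score += 1
    return score


def cha2ds2_vasc(p: Baseline, use_age: bool = True) -> int:
    """CHA2DS2-VASc: adds vascular disease, age 65-74, and female sex
    (1 point each) and doubles the weight of age >= 75."""
    score = int(p.chf) + int(p.hypertension) + int(p.diabetes)
    score += 2 * int(p.prior_stroke_tia_te)
    score += int(p.vascular_disease) + int(p.female)
    if use_age:
        if p.age >= 75:
            score += 2
        elif p.age >= 65:
            score += 1
    return score


# A 78-year-old woman with hypertension and diabetes (made-up example).
patient = Baseline(age=78, female=True, hypertension=True, diabetes=True)
print(chads2(patient), cha2ds2_vasc(patient))
```

Passing `use_age=False` mimics the modified scores used for patients younger than 65, where the age components cannot contribute.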

Table 2. CHADS2 and CHA2DS2-VASc scores for predicting outcomes of interest among patients without AF: AUC

Risk prediction model: AUC (95% CI)

Model          Complete follow-up  CCAE           Optum          MDCR
Study cohort, N                    12,009,924     5,318,577      1,373,502
Outcome: composite of ischemic stroke or TIA
CHADS2         1 year              .73 (.73-.73)  .74 (.73-.74)  .70 (.69-.70)
CHADS2         3 years             .70 (.70-.70)  .71 (.70-.71)  .69 (.68-.69)
CHA2DS2-VASc   1 year              .72 (.72-.72)  .73 (.72-.73)  .69 (.69-.69)
CHA2DS2-VASc   3 years             .69 (.69-.69)  .70 (.70-.70)  .68 (.68-.69)
Outcome: ischemic stroke
CHADS2         1 year              .76 (.76-.76)  .77 (.76-.78)  .71 (.70-.71)
CHADS2         3 years             .73 (.72-.73)  .73 (.72-.74)  .70 (.69-.70)
CHA2DS2-VASc   1 year              .74 (.73-.74)  .75 (.74-.76)  .70 (.69-.70)
CHA2DS2-VASc   3 years             .71 (.70-.71)  .71 (.71-.72)  .69 (.69-.69)

Abbreviations: AF, atrial fibrillation; AUC, area under the curve; CI, confidence interval; CCAE, Truven MarketScan Commercial Claims and Encounters; Optum, Optum Clinformatics; MDCR, Truven MarketScan Medicare Supplemental; TIA, transient ischemic attack.
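
Because CHADS2 and CHA2DS2-VASc take only a few integer values, many patient pairs are tied, and the AUC in Table 2 can be read as the probability that a randomly chosen patient with the outcome has a higher score than one without, counting ties as one half. The sketch below computes that quantity directly from scores. It is an illustration on made-up data, not the study's code, which fitted a logistic regression and evaluated it on a validation dataset. For a single-predictor logistic model with a positive coefficient, predicted probabilities rank patients in the same order as the raw score, so the two approaches would be expected to give the same AUC. The function name `auc_from_scores` is hypothetical.

```python
from collections import Counter


def auc_from_scores(scores, outcomes):
    """AUC = P(case score > control score) + 0.5 * P(case score == control score)."""
    cases = Counter(s for s, y in zip(scores, outcomes) if y == 1)
    controls = Counter(s for s, y in zip(scores, outcomes) if y == 0)
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    # Compare score levels rather than individual patients; with integer
    # scores there are only a handful of distinct levels.
    for s_case, k_case in cases.items():
        for s_ctrl, k_ctrl in controls.items():
            if s_case > s_ctrl:
                concordant += k_case * k_ctrl
            elif s_case == s_ctrl:
                concordant += 0.5 * k_case * k_ctrl
    return concordant / (n_cases * n_controls)


# Made-up scores and outcomes (1 = stroke or TIA during follow-up).
toy_scores = [0, 0, 1, 1, 2, 3]
toy_outcomes = [0, 0, 0, 1, 1, 1]
print(round(auc_from_scores(toy_scores, toy_outcomes), 3))
```

Grouping by score level keeps the computation proportional to the number of distinct score values, which matters when cohorts contain millions of patients.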
Table 3. Common predictors for all 3 databases for the outcome of stroke and TIA looking back 365 days with associated betas

Beta by database, followed by the predictor description:

CCAE     MDCR     Optum    Description
−6.1026  −3.8255  −6.0192  Intercept
−.0090   .0140    .0211    Number of distinct conditions observed in 365 days on or prior to cohort index
.0038    .0125    .0076    Number of distinct drug ingredients observed in 365 days on or prior to cohort index
.0031    −.0042   −.0100   Number of distinct procedures observed in 365 days on or prior to cohort index
−.0084   .0010    −.0009   Number of visits observed in 365 days on or prior to cohort index
.0370    −.0026   −.0731   Number of ER visits observed in 365 days on or prior to cohort index
.0764    .0094    .0247    Charlson index (Romano adaptation), using conditions all time on or prior to cohort index
.0224    .0457    .0547    Diabetes Comorbidity Severity Index, using conditions all time on or prior to cohort index
.5308    .1977    .7552    CHADS2, using conditions all time on or prior to cohort index
−.4195   −.1959   −.4671   Condition era record observed during anytime on or prior to cohort index: type 2 diabetes mellitus
.5423    .2339    .2015    Condition era record observed during anytime on or prior to cohort index: transient cerebral ischemia
1.0777   .5913    .6942    Condition occurrence record observed during 365 days on or prior to cohort index: cerebral infarction due to thrombosis of cerebral arteries
.2526    .1107    1.1025   Condition era record observed during anytime on or prior to cohort index: cerebral infarction due to thrombosis of cerebral arteries
−.0112   −.0106   −.0073   Condition era record observed during anytime on or prior to cohort index: adult health examination
−.0596   .0091    −.1495   Number of ingredients within the drug group observed all time on or prior to cohort index: DRUGS USED IN DIABETES
−.0420   −.0094   .1270    Number of ingredients within the drug group observed all time on or prior to cohort index: ANALGESICS

Abbreviations: CCAE, Truven MarketScan Commercial Claims and Encounters; ER, emergency room; MDCR, Truven MarketScan Medicare Supplemental; Optum, Optum Clinformatics; TIA, transient ischemic attack.
Beta: coefficient from regularized regression (average shrinkage estimate). Condition era: a condition era is defined as a span of time when the person is assumed to have a given condition. Condition eras are chronological periods of condition occurrence records. A 30-day gap between records was used (http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:condition_era).
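
The betas in Table 3 are on the log-odds scale, so a predicted risk is obtained by adding the intercept to the weighted covariates and applying the inverse logit. The sketch below uses only three CCAE coefficients from Table 3 for illustration. The fitted CCAE model had 110 predictors, so these numbers are not the model's actual predictions, and the dictionary keys and the function name `predicted_risk` are hypothetical.

```python
import math

# Three CCAE coefficients copied from Table 3 (log-odds scale).
# The full model has many more terms; this subset is for illustration only.
CCAE_BETAS = {
    "intercept": -6.1026,
    "chads2": 0.5308,
    "prior_tia_era": 0.5423,
}


def predicted_risk(features: dict, betas: dict) -> float:
    """Inverse logit of intercept + sum(beta * value) over the supplied features."""
    linear_predictor = betas["intercept"]
    for name, value in features.items():
        linear_predictor += betas[name] * value
    return 1.0 / (1.0 + math.exp(-linear_predictor))


baseline = predicted_risk({}, CCAE_BETAS)  # all listed covariates at zero
higher = predicted_risk({"chads2": 2, "prior_tia_era": 1}, CCAE_BETAS)
print(f"{baseline:.4f} {higher:.4f}")
```

This only illustrates the arithmetic. A negative beta, such as the one for the type 2 diabetes condition era, should not be read as a protective effect on its own, because coefficients in a penalized multivariable model are conditional on correlated covariates such as CHADS2.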

The sensitivity analysis showed that the performance of the models for ischemic stroke alone generally followed a similar pattern (Table 2). In addition, the models performed slightly better for ischemic stroke alone than for the composite of ischemic stroke or TIA, with the AUC difference between .02 and .03 for the same database and same observation period, particularly for the commercial population (CCAE and Optum databases).

The regularized regression included a large number of baseline covariates in the PLP models. As compared with univariate models that included only CHADS2 and CHA2DS2-VASc scores, the regularized regression improved the prediction accuracy (AUC) by a range of .05-.10, which was generally more pronounced for patients in the CCAE and Optum databases than the MDCR database (Fig 2, A,B). The AUCs from the PLP models were statistically superior to the corresponding models based on CHADS2 and CHA2DS2-VASc scores (all P < .008).

Using the end point of ischemic stroke and TIA with 1-year complete follow-up time as an example, the PLP model ended up with 110 predictors for CCAE, 101 predictors for Optum, and 228 predictors for MDCR. Fifteen predictors were identified in all databases (Table 3), including CHADS2 scores. These additional factors can be broadly summarized into 3 categories: (1) general baseline health condition of the patient and resource utilization (e.g., number of distinct conditions observed, number of distinct drug ingredients observed, number of distinct procedures observed, number of visits); (2) risk scores, for example, Charlson index and CHADS2 scores; and (3) prior stroke and TIA. In addition, we identified several factors that were PLP model predictors for the CCAE and Optum databases, but not for MDCR (Table 4). Not surprisingly, the oldest age categories of 50-54, 55-59, and 60-64 were identified as independent predictors for the CCAE and Optum databases. However, other predictors need to be interpreted with caution, particularly because those same factors showed directionally different impact on outcomes between CCAE and Optum databases.

Figure 2. (A) Receiver operating characteristic (ROC) curves for various models across databases at 1-year complete follow-up time. (B) ROC curves for various models across databases at 3-year complete follow-up time. Panels in each row: CCAE, Optum, MDCR. The AUCs from the PLP models were statistically superior to the corresponding models based on CHADS2 and CHA2DS2-VASc scores (all P < .008). Abbreviations: AUC, area under the curve, also known as C-statistic; CCAE, Truven MarketScan Commercial Claims and Encounters; MDCR, Truven MarketScan Medicare Supplemental; Optum, Optum Clinformatics; PLP, patient-level prediction using a regularized regression; TIA, transient ischemic attack.

Discussion

Electronic medical records have been increasingly used as a valuable source for medical and scientific research. Although CHADS2 and CHA2DS2-VASc are the 2 most commonly used risk schemes for predicting stroke in patients with AF, the performance of these models has not been comprehensively evaluated in patients without AF, which motivated us to conduct the current study. Using 3 large healthcare databases in the United States, our analyses showed that these 2 models had generally modest to good performance in predicting the composite of ischemic stroke or TIA among patients without AF (with the AUC ranging from .68 to .74), with slightly better results
Table 4. Common predictors for CCAE and Optum but not MDCR for the outcome of stroke and TIA looking back 365 days with associated betas

Beta by database, followed by the predictor description (n/a: no beta reported for MDCR):

CCAE     MDCR     Optum    Description
.5335    n/a      .4087    Age group: 50-54
.7032    n/a      .4449    Age group: 55-59
.7610    n/a      .8652    Age group: 60-64
.0058    n/a      −.0009   Number of distinct observations observed in 365 days on or prior to cohort index
−.0000   n/a      .0020    Condition era record observed during anytime on or prior to cohort index: low back pain
.0247    n/a      .0288    Condition era record observed during anytime on or prior to cohort index: pure hypercholesterolemia
.1173    n/a      .1251    Procedure occurrence record observed during 365 days on or prior to cohort index: established patient office or other outpatient, visit typically 25 minutes
−.1323   n/a      −.2064   Condition era record observed during anytime on or prior to cohort index: vaccination required
.2089    n/a      .0904    Procedure occurrence record observed during 365 days on or prior to cohort index within procedure group: chem. metabolic function tests
−.0000   n/a      .0810    Procedure occurrence record observed during 365 days on or prior to cohort index within procedure group: surgical pathology procedure
.0000    n/a      .0878    Drug era record observed during 365 days on or prior to cohort index within drug group: corticosteroids acting locally
−.0230   n/a      −.0424   Number of ingredients within the drug group observed all time on or prior to cohort index: SEX HORMONES AND MODULATORS OF THE GENITAL SYSTEM
.0295    n/a      −.0016   Number of ingredients within the drug group observed all time on or prior to cohort index: ANTIBACTERIALS FOR SYSTEMIC USE
−.0980   n/a      −.1067   Number of ingredients within the drug group observed all time on or prior to cohort index: COUGH AND COLD PREPARATIONS
.0406    n/a      −.0603   Number of ingredients within the drug group observed all time on or prior to cohort index: ANTI-INFLAMMATORY AND ANTIRHEUMATIC PRODUCTS
.0645    n/a      −.0087   Number of ingredients within the drug group observed all time on or prior to cohort index: PSYCHOANALEPTICS
.0000    n/a      .0081    Procedure occurrence record observed during 365 days on or prior to cohort index within procedure group: imaging of brain

Abbreviations: CCAE, Truven MarketScan Commercial Claims and Encounters; MDCR, Truven MarketScan Medicare Supplemental; Optum, Optum Clinformatics; TIA, transient ischemic attack.
Beta: coefficient from regularized regression (average shrinkage estimate). Condition era: a condition era is defined as a span of time when the person is assumed to have a given condition. Condition eras are chronological periods of condition occurrence records. A 30-day gap between records was used (http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:condition_era).

for the end point of ischemic stroke alone (with the AUC ranging from .70 to .77 and a general improvement in the AUC of .02 to .03), particularly for younger patient cohorts at 1 year of complete follow-up time. These findings appeared clinically intuitive and not surprising. As patients aged and were followed up longer, the baseline characteristics may have lost significance, whereas other health behaviors and (postbaseline) comorbidities may have become more prominent. Those results were robust and generally consistent across different patient cohorts and with various sensitivity measures.

The risk schemes of CHADS2 and CHA2DS2-VASc were both developed for patients with AF. In essence, the CHADS2 score was the combination of 2 stroke prediction schemes developed earlier: the AFI and the SPAF III. As previously mentioned, Gage et al showed that the CHADS2 scheme (C-statistic of .82; 95% CI: .80-.84) had better prediction of stroke as compared with the AFI and SPAF schemes (C-statistic of .68 [95% CI: .65-.71] and .74 [95% CI: .71-.76], respectively).8 In later work, using the combined clinical trial datasets from SPORTIF III and SPORTIF V (Stroke Prevention using ORal Thrombin Inhibitor in atrial Fibrillation), Lip et al developed and validated the CHA2DS2-VASc scheme.9 Although the model performance was relatively modest with a C-statistic of .65 (95% CI: .61-.68), this new risk scheme identified the greatest proportion of AF patients at high risk for stroke as compared with other risk stratification schemes. Currently, the treatment guidelines generally all recommend the use of CHA2DS2-VASc as the preferred risk scoring method to assess stroke risk in AF patients.25,26

The initial work associated with CHADS2 and CHA2DS2-VASc schemes was compelling, but interestingly a recent systematic review and meta-analysis suggested that the performances of these schemes in patients with AF were heterogeneous and study population-dependent, with the
C-statistic ranging from .60 to .80 (median: .683) for CHADS2 and .64-.79 (median: .673) for CHA2DS2-VASc.27,28 Given these findings, our analyses suggested that these 2 existing risk schemes performed similarly in patients without AF and could be used in clinical practice to manage these patients, particularly given that the large majority of stroke events occur in patients without AF.15,16 Conceptually, clinicians could use these models to identify patients without AF who are at much higher risk for stroke, perhaps necessitating more aggressive therapies to manage underlying diseases (e.g., diabetes, hypertension, heart failure). In addition, clinicians can also inform high-risk patients to look out for early signs and symptoms of stroke, so as to allow potential early therapeutic intervention, if stroke occurs, to prevent the devastating consequences of an ischemic event. Based on the empirical data presented in this study, clinicians should also be aware that the models had better predictions of events at 1 year as opposed to 3 years, which is biologically intuitive. Of note, other researchers also investigated the factors that are associated with the risk of ischemic stroke. However, those studies may be limited by the use of community studies, a non-US population, or studies of a patient population with a particular condition.29-32 From that perspective, our study population tended to be more robust and representative of a general patient population without AF.

In addition to examining the performance of the existing risk schemes, we also employed a regularized regression approach to explore a large number of baseline characteristics, with intention to improve the model performance and to identify additional risk factors that are not part of the current risk scores. Several findings are worth noting. First, a number of baseline characteristics were identified as risk factors across all databases. With the addition of those factors, the model performance was substantially improved, with the C-statistic approaching or above .8, particularly for younger patient populations (CCAE and Optum). Not surprisingly, these factors generally reflected poor health conditions of patients, although unlike the individual components of the current risk schemes, most of the additional factors might not be easily included in the risk calculation quantitatively, because a more comprehensive evaluation of the patient’s medical history may be required. Nevertheless, clinicians may still take those factors into consideration when evaluating the risk of ischemic stroke and TIA for patients without AF. In addition, even with CHADS2 scores in the model, history of ischemic stroke and TIA was identified as an independent risk factor, suggesting that this parameter might need to be assigned more than 2 points in the original risk scheme. Finally, our results also suggested that the PLP models generally performed better for ischemic stroke alone than for the composite of ischemic stroke and TIA. This could be due, in part, to the differences in how stroke and TIA are coded in electronic healthcare records. For instance, stroke is a severe clinical event and often requires an intensive work-up prior to a diagnosis being recorded, for example, “ICD9 434.1-cerebral embolism,” which falls under ischemic stroke code sets. In contrast, the condition of TIA may be less specific (e.g., “ICD9 435.3-Vertebrobasilar artery syndrome”), which may or may not be associated with some thrombotic risk factors. However, from the perspective of prevention, there is value in predicting TIA early, as it might help manage underlying diseases more aggressively and prevent more severe events from occurring (e.g., cerebral infarction).

Our study has limitations and our results should be interpreted in that context. The CCAE and Optum reflect privately insured populations (mainly for patients <65 years of age), whereas MDCR reflects privately insured patients with supplemental Medicare coverage (mainly >65 years of age). These databases capture healthcare claims that are gathered primarily for reimbursement purposes rather than to answer a particular medical or scientific question. Therefore, the data suffer from limitations that are similar to those in other claims databases (e.g., coding practice, accuracy, and completeness of the data reported). However, previous studies have suggested that clinically important end points, such as stroke, myocardial infarction, and cancer, can reliably be identified through those electronic healthcare records.33,34 Although our primary analysis focused on the composite of ischemic stroke and TIA, where TIA may be considered to be a soft end point, the results from the sensitivity analysis that focused on ischemic stroke alone were largely consistent with the primary findings, demonstrating the robustness of the primary results. It is encouraging that our regularized regression has identified a number of baseline characteristics associated with thrombotic risk and improved the performance of existing risk schemes substantially. Importantly, however, the actual utility of these models has not been tested in clinical practice. Given the large sample sizes included in our study, some risk factors (although statistically significant) need to be interpreted in the context of clinical importance, and biological mechanisms for these risk factors in relation to stroke (e.g., a risk factor was a predictor for one database but not the other) warrant further investigation.

The existing schemes (CHADS2 and CHA2DS2-VASc scores) for stroke prediction in patients with AF can be applied to patients without AF with similar predictive accuracy. Our results also suggest that these models can be improved upon, but further research is required to validate our findings. Because the majority of stroke events occur in patients without AF, our findings highlight an important clinical question regarding how patients who are at high risk for ischemic stroke and TIA could be managed more effectively in clinical practice.
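For readers less familiar with the two quantities discussed throughout (the CHADS2 score and the C-statistic), the following is a minimal illustrative sketch in Python. It is not the analysis code used in this study: the `Patient` class, the function names, and the seven-patient cohort are hypothetical and invented for illustration, and the sketch ranks patients by the raw CHADS2 score rather than by a fitted logistic regression. The point weights follow the standard CHADS2 definition (1 point each for congestive heart failure, hypertension, age of 75 years or older, and diabetes; 2 points for prior stroke or TIA).

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Patient:
    chf: bool
    hypertension: bool
    age: int
    diabetes: bool
    prior_stroke_tia: bool
    had_event: bool  # ischemic stroke or TIA during follow-up (toy label)


def chads2(p: Patient) -> int:
    """Standard CHADS2 weights: 1 point per C, H, A (age >= 75), D; 2 points for S2."""
    return (int(p.chf) + int(p.hypertension) + int(p.age >= 75)
            + int(p.diabetes) + 2 * int(p.prior_stroke_tia))


def c_statistic(scores, events):
    """Share of (event, non-event) pairs in which the event patient has the
    higher score; tied pairs count as one half."""
    cases = [s for s, e in zip(scores, events) if e]
    controls = [s for s, e in zip(scores, events) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(1.0 if c > n else 0.5 if c == n else 0.0
                     for c, n in product(cases, controls))
    return concordant / (len(cases) * len(controls))


# Hypothetical toy cohort, invented for illustration only (not study data).
cohort = [
    Patient(False, True, 58, False, False, False),
    Patient(False, False, 45, False, False, False),
    Patient(True, True, 80, True, False, True),
    Patient(False, True, 77, False, True, True),
    Patient(False, True, 66, True, False, False),
    Patient(False, False, 70, False, True, False),
    Patient(False, True, 62, False, False, True),
]

scores = [chads2(p) for p in cohort]
events = [p.had_event for p in cohort]
auc = c_statistic(scores, events)
print(scores, round(auc, 3))
```

Under this definition, a C-statistic of .5 corresponds to ranking no better than chance and 1.0 to perfect separation of patients with and without events, which is the scale on which the values of .60 to .80 cited above should be read.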
Acknowledgment: The authors would like to thank Jesse Berlin, ScD, Senior Vice President and Global Head of Epidemiology, Johnson & Johnson, for his critical review of the manuscript.

Appendix: Supplementary Material

Supplementary data to this article can be found online at doi:10.1016/j.jstrokecerebrovasdis.2017.03.036.

References

1. Centers for Disease Control and Prevention. Stroke facts. Available at: http://www.cdc.gov/stroke/facts.htm. Accessed March 17, 2016.
2. Wolf PA, Abbott RD, Kannel WB. Atrial fibrillation as an independent risk factor for stroke: the Framingham Study. Stroke 1991;22:983-988.
3. Hart RG, Halperin JL. Atrial fibrillation and stroke: concepts and controversies. Stroke 2001;32:803-808.
4. Go AS, Hylek EM, Phillips KA, et al. Prevalence of diagnosed atrial fibrillation in adults: national implications for rhythm management and stroke prevention: the AnTicoagulation and Risk Factors in Atrial Fibrillation (ATRIA) Study. JAMA 2001;285:2370-2375.
5. Pearce LA, Hart RG, Halperin JL. Assessment of three schemes for stratifying stroke risk in patients with nonvalvular atrial fibrillation. Am J Med 2000;109:45-51.
6. Atrial Fibrillation Investigators. Risk factors for stroke and efficacy of antithrombotic therapy in atrial fibrillation: analysis of pooled data from five randomized clinical trials. Arch Intern Med 1994;154:1949-1957.
7. The SPAF III Writing Committee for the Stroke Prevention in Atrial Fibrillation Investigators. Patients with nonvalvular atrial fibrillation at low-risk of stroke during treatment with aspirin. JAMA 1998;279:1273-1277.
8. Gage BF, Waterman AD, Shannon W, et al. Validation of clinical classification schemes for predicting stroke: results from the National Registry of Atrial Fibrillation. JAMA 2001;285:2864-2870.
9. Lip GYH, Nieuwlaat R, Pisters R, et al. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the Euro Heart Survey on Atrial Fibrillation. Chest 2010;137:263-272.
10. Camm AJ, Lip GY, De Caterina R, et al. 2012 focused update of the ESC Guidelines for the management of atrial fibrillation: an update of the 2010 ESC Guidelines for the management of atrial fibrillation. Eur Heart J 2012;33:2719-2747.
11. January CT, Wann LS, Alpert JS, et al. 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the Heart Rhythm Society. Circulation 2014;130:2071-2104.
12. National Institute for Health and Care Excellence. Atrial fibrillation: clinical guidelines 2014. Available at: http://www.nice.org.uk/guidance/cg180/evidence/atrial-fibrillation-update-full-guideline-243739981. Accessed December 12, 2015.
13. Reiffel JA. Atrial fibrillation and stroke: epidemiology. Am J Med 2014;127:e15-e16. doi:10.1016/j.amjmed.2013.06.002.
14. Yuan Z, Makadia R, Ryan P, et al. Incidence of ischemic stroke or transient ischemic attack in patients with multiple risk factors with or without atrial fibrillation: a retrospective cohort study. Curr Med Res Opin 2015;31:1257-1266.
15. Hughes M, Lip GY. Stroke and thromboembolism in atrial fibrillation: a systematic review of stroke risk factors, risk stratification schema and cost effectiveness data. Thromb Haemost 2008;99:295-304.
16. Bunch TJ, May HT, Bair TL, et al. Atrial fibrillation ablation patients have longterm stroke rates similar to patients without atrial fibrillation regardless of CHADS2 score. Heart Rhythm 2013;10:1272-1277.
17. Thigpen JL, Dillon C, Forster KB, et al. Validity of international classification of disease codes to identify ischemic stroke and intracranial hemorrhage among individuals with associated diagnosis of atrial fibrillation. Circ Cardiovasc Qual Outcomes 2015;8:8-14.
18. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
19. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007;115:928-935.
20. Suchard MA, Simpson SE, Zorych I, et al. Massive parallelization of serial inference algorithms for a complex generalized linear model. ACM Trans Model Comput Simul 2013;23:1-17.
21. R Foundation for Statistical Computing. R: a language and environment for statistical computing. Vienna, Austria: R Core Team, 2015.
22. Schuemie MJ, Suchard MA, Ryan PB, et al. Package “PatientLevelPrediction”. 1.1.0, 2015. Available at: https://github.com/OHDSI/PatientLevelPrediction. Accessed March 17, 2016.
23. Voss EA, Ma Q, Ryan PB. The impact of standardizing the definition of visits on the consistency of multi-database observational health research. BMC Med Res Methodol 2015;15:13.
24. Voss EA, Makadia R, Matcho A, et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J Am Med Inform Assoc 2015;22:553-564.
25. Durrant J, Lip GYH, Lane DA. Stroke risk stratification scores in atrial fibrillation: current recommendations for clinical practice and future perspectives. Expert Rev Cardiovasc Ther 2013;11:77-90.
26. Chao T, Liu C, Wang K, et al. Should atrial fibrillation patients with 1 additional risk factor of the CHA2DS2-Vasc score (beyond sex) receive oral anticoagulation? J Am Coll Cardiol 2015;65:635-642.
27. Keogh C, Wallace E, Dillon C, et al. Validation of the CHADS2 clinical prediction rule to predict ischaemic stroke: a systematic review and meta-analysis. Thromb Haemost 2011;106:528-538.
28. Chen JY, Zhang AD, Lu HY, et al. CHADS2 versus CHA2DS2-VASc score in assessing the stroke and thromboembolism risk stratification in patients with atrial fibrillation: a systematic review and meta-analysis. J Geriatr Cardiol 2013;10:258-266.
29. Ohira T, Shahar E, Chambless LE, et al. Risk factors for ischemic stroke subtypes: the Atherosclerosis Risk in Communities study. Stroke 2006;37:2493-2498.
30. Lip GY, Lin HJ, Chien KL, et al. Comparative assessment of published atrial fibrillation stroke risk stratification schemes for predicting stroke, in a nonatrial fibrillation population: the Chin-Shan Community Cohort Study. Int J Cardiol 2013;168:414-419.
31. Welles CC, Whooley MA, Na B, et al. The CHADS2 score predicts ischemic stroke in the absence of atrial fibrillation among subjects with coronary heart disease: data from the Heart and Soul Study. Am Heart J 2011;162:555-561.
32. Ntaios G, Lip GY, Makaritsis K, et al. CHADS(2), CHA(2)S(2)DS(2)-VASc, and long-term stroke outcome in patients without atrial fibrillation. Neurology 2013;80:1009-1017.
33. Fisher ES, Whaley FS, Krushat WM, et al. The accuracy of Medicare’s hospital claims data: progress has been made, but problems remain. Am J Public Health 1992;82:243-248.
34. Wahl PM, Rodgers K, Schneeweiss S, et al. Validation of claims-based diagnostic and procedure codes for cardiovascular and gastrointestinal serious adverse events in a commercially-insured population. Pharmacoepidemiol Drug Saf 2010;19:596-603.
