PII: S1071-9164(18)30144-1
DOI: https://doi.org/10.1016/j.cardfail.2018.04.003
Reference: YJCAF 4125
Please cite this article as: Griffin M. Weber, Using Artificial Intelligence in an Intelligent Way to
Improve Efficiency of a Heart Failure Care Team, Journal of Cardiac Failure (2018),
https://doi.org/10.1016/j.cardfail.2018.04.003.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will
undergo copyediting, typesetting, and review of the resulting proof before it is published in its
final form. Please note that during the production process errors may be discovered which could
affect the content, and all legal disclaimers that apply to the journal pertain.
Using artificial intelligence in an intelligent way to improve efficiency of a heart failure
care team
Email: weber@hms.harvard.edu
Although the potential benefits of artificial intelligence (AI) to medicine have been discussed for
several decades, there are several reasons to think that the promised impact of AI will finally be
here soon [1,2]: the rapid adoption of electronic health records has made the large amounts of
data needed to develop and test AI algorithms much more readily available; wearable devices
and environmental sensors are providing new ways of digitally monitoring patients’ health;
to recognize patterns in data; and, consumer products, such as voice recognition in smart
phones, are leading towards greater acceptance and trust in AI. However, many also rightly
warn about the hype around AI [3,4,5]. Computers will neither be curing diseases nor
eliminating the need for human doctors in the immediate future. The goals for AI should be more
incremental. It should be viewed as a tool that can assist providers in making clinical decisions
and help them work more efficiently and with fewer medical errors.
In this issue of the Journal, Blecker et al sought to identify patients with acute decompensated
heart failure (ADHF) early in their hospitalization so that interventions to reduce readmissions,
such as patient education and involvement of multidisciplinary teams, can be coordinated before
the patients are discharged. The current approach at Tisch Hospital in New York City, where
this study was conducted, is a screening tool used by the heart failure transition team that looks
for patients with evidence for heart failure on the basis of the problem list, inpatient loop diuretic
use, or a BNP value ≥500 pg/mL. With a sensitivity of 0.98, this method correctly identifies
nearly every patient who was later discharged with a diagnosis of ADHF. However, with a low
positive predictive value (PPV) of only 0.13, most of the patients selected by the screening tool
do not actually have ADHF. Providers, therefore, must perform manual chart review, averaging
Blecker and colleagues tested whether logistic regression models can generate a better
screening tool. Although viewed as a more traditional machine learning classification technique,
logistic regression often works well for simple binary decision problems [6]. The models are
easy to interpret and it is clear how the individual variables contribute to the classification. In
contrast, newer algorithms, such as “deep learning” systems based on neural networks, are the
state-of-the-art in AI for many applications including computer vision and speech recognition [7].
However, they are more complex models and treated like a black box since it is difficult to
The authors compared three logistic regression models. The first was based on 31 structured
data elements, which included the three in the original screening tool as well as several
demographic, diagnosis, medication, and laboratory test variables. The second was based on
unstructured clinical notes. In this model, each of the 36,463 different words that appeared at
least ten times across all notes was a potential variable; however, including all of these in the
logistic regression risks overfitting the model with irrelevant words. To avoid this problem, the
authors used a common technique called L1-regularization, which adds a penalty that grows with
the absolute size of the feature coefficients [8]. As a result, all but 427 significant words dropped out of the
final model as their coefficients became zero. The third model combined both the structured
data elements and the words from clinical notes, for a final model of 432 variables after
L1-regularization. Their dataset included 37,229 hospitalizations, of which 1,294 (3.5%) had a
principal discharge diagnosis of ADHF; as is standard practice, they randomly divided the
hospitalizations into separate datasets used to train and test the models.
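To make the effect of L1-regularization concrete, the following is a minimal sketch, not the authors' implementation. It fits a logistic regression by proximal gradient descent, a solver chosen here only for illustration, on four made-up notes encoded as word-presence indicators. The vocabulary, data, penalty, and step size are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, amount):
    """Shrink value toward zero; anything within `amount` of zero becomes exactly 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(rows, labels, penalty, step=0.5, iterations=2000):
    """Proximal gradient descent on mean log-loss plus penalty * sum(|weights|)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0  # the intercept is left unpenalized
    for _ in range(iterations):
        # Prediction errors (predicted probability minus label) for every row.
        errors = [
            sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - y
            for row, y in zip(rows, labels)
        ]
        bias -= step * sum(errors) / len(rows)
        for j in range(n_features):
            gradient = sum(e * row[j] for e, row in zip(errors, rows)) / len(rows)
            # Gradient step on the log-loss, then the L1 shrinkage step.
            weights[j] = soft_threshold(weights[j] - step * gradient, step * penalty)
    return bias, weights


# Hypothetical word-presence indicators for four made-up notes (not study data).
vocabulary = ["edema", "followup", "today"]
rows = [
    [1, 1, 0],  # ADHF
    [1, 0, 1],  # ADHF
    [0, 1, 1],  # not ADHF
    [0, 0, 0],  # not ADHF
]
labels = [1, 1, 0, 0]

bias, weights = fit_l1_logistic(rows, labels, penalty=0.1)
fitted = dict(zip(vocabulary, weights))
kept = [word for word, weight in fitted.items() if weight != 0.0]
print(fitted)
print("words kept:", kept)
```

In this toy example the two words that appear equally often in both classes end with coefficients of exactly zero, while the word that separates the classes keeps a positive coefficient. This is the same mechanism by which most of the candidate words dropped out of the model described above.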
The outputs of the logistic regression models are values between 0 and 1, which estimate the
probability that the patient has ADHF. By selecting different probability thresholds, it is possible
to “tune” the model to a desired sensitivity or specificity. In the current study, Blecker et al fixed
the sensitivity to 0.98 so that all models would identify the same number of patients with ADHF
as the existing screening tool. This was key to their approach. Their intent was not to find
additional ADHF patients that are currently being missed. Instead, their objective was only to
raise the PPV and reduce the number of false positives, thereby eliminating the time being
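The idea of tuning a probability threshold to a fixed sensitivity and then reading off the PPV can be sketched as follows. This is an illustrative sketch only: the scores and labels are invented, the function names are hypothetical, and the source does not describe how the authors selected their threshold in practice.

```python
def threshold_for_sensitivity(scores, labels, target):
    """Return the highest threshold whose sensitivity is at least `target`."""
    positives = sum(labels)
    for threshold in sorted(set(scores), reverse=True):
        caught = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        if caught / positives >= target:
            return threshold
    raise ValueError("no threshold reaches the target sensitivity")


def ppv_at(scores, labels, threshold):
    """Fraction of flagged cases (score >= threshold) that are true positives."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged)


# Made-up model outputs and true labels (1 = ADHF); not study data.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

# Sensitivity target of 0.98, as fixed in the study.
threshold = threshold_for_sensitivity(scores, labels, target=0.98)
ppv = ppv_at(scores, labels, threshold)
print("threshold:", threshold)
print("PPV at that threshold:", round(ppv, 2))
```

With so few cases, a 0.98 target forces the threshold down to the lowest-scoring true case. A model that ranks true cases higher lets the threshold sit higher, which flags fewer false positives and raises the PPV at the same sensitivity.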
The logistic regression models based only on structured or unstructured data had PPVs of 0.13
and 0.30, respectively. The combined model, which performed the best, had a PPV of 0.34.
Although this might still appear low, when converted to time savings, it reduces the manual
chart review from over an hour down to only 25 minutes to find each patient with ADHF. This
corresponds to potentially hundreds of hours of provider time saved each year at their hospital,
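The link between PPV and review time can be checked with simple arithmetic. The sketch below assumes that every flagged chart takes the same time to review, so the time to find one patient with ADHF scales with 1/PPV. The per-chart time is inferred from the figures quoted above; it is not a number reported in the text.

```python
def charts_per_true_case(ppv):
    """Expected number of flagged charts reviewed to find one true ADHF case."""
    return 1.0 / ppv


PPV_SCREENING_TOOL = 0.13  # existing screening tool, from the text
PPV_COMBINED_MODEL = 0.34  # combined model, from the text
MINUTES_PER_CASE_COMBINED = 25.0  # review time per ADHF patient, from the text

# Implied minutes per chart (an inference under the equal-time assumption).
minutes_per_chart = MINUTES_PER_CASE_COMBINED / charts_per_true_case(PPV_COMBINED_MODEL)

# Implied review time per ADHF patient with the existing screening tool.
minutes_per_case_screening = minutes_per_chart * charts_per_true_case(PPV_SCREENING_TOOL)

print("charts per case, screening tool:", round(charts_per_true_case(PPV_SCREENING_TOOL), 1))
print("charts per case, combined model:", round(charts_per_true_case(PPV_COMBINED_MODEL), 1))
print("implied minutes per chart:", round(minutes_per_chart, 1))
print("implied minutes per case, screening tool:", round(minutes_per_case_screening, 1))
```

Under this assumption the existing tool works out to roughly 65 minutes per patient found, which is consistent with the "over an hour" figure quoted above.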
Perhaps the most important limitation of this study is that much of the benefit of the models
comes from the clinical notes, rather than the structured data. Due to variation in the format,
language, abbreviations, and other characteristics of clinical notes across different providers
and institutions, it is unclear how well these logistic regression models would work at another
hospital [9,10]. The models would need to be trained and tested again before they could be
used. Additionally, the reported time savings are only estimates. Blecker et al plan to use the
combined model at their hospital to generate a daily screening list for their heart failure team. It
would be interesting to learn what the actual time savings are and how the heart failure team
By focusing on time savings rather than the absolute performance of the models, Blecker et al
are using AI in an intelligent way. They are neither overstating what their models can do nor
suggesting that the models can replace manual chart review. With logistic regression, they
combine both structured and unstructured clinical data in a single, easily computable and
interpretable formula. Although this results in only a modest improvement over their existing
screening tool, over time the cumulative gain in efficiency could be significant.
References
5. Scogland B. Artificial Intelligence in Medicine: Hope or Hype? Medical Device and
Diagnostic Industry. January 2, 2018. Accessed online on March 8, 2018, at
https://www.mddionline.com/artificial-intelligence-medicine-hope-or-hype.
6. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network
classification models: a methodology review. J Biomed Inform. 2002 Oct-Dec;35(5-6):352-9.
Review. PubMed PMID: 12968784.
7. Krittanawong C, Zhang H, Wang Z, Aydar M, Kitai T. Artificial Intelligence in Precision
Cardiovascular Medicine. J Am Coll Cardiol. 2017 May 30;69(21):2657-2664. doi:
10.1016/j.jacc.2017.03.571. Review. PubMed PMID: 28545640.
8. Ng AY. Feature selection, L1 vs L2 regularization, and rotational invariance. ICML '04
Proceedings of the twenty-first international conference on Machine learning. 2004.
Page 78.
9. Carroll RJ, Thompson WK, Eyler AE, Mandelin AM, Cai T, Zink RM, Pacheco JA,
Boomershine CS, Lasko TA, Xu H, Karlson EW, Perez RG, Gainer VS, Murphy SN,
Ruderman EM, Pope RM, Plenge RM, Kho AN, Liao KP, Denny JC. Portability of an
algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform
Assoc. 2012 Jun;19(e1):e162-9. Epub 2012 Feb 28. PubMed PMID: 22374935; PubMed
Central PMCID: PMC3392871.
10. Rochefort CM, Buckeridge DL, Forster AJ. Accuracy of using automated methods for
detecting adverse events from electronic health record data: a research protocol.
Implement Sci. 2015 Jan 8;10:5. doi: 10.1186/s13012-014-0197-6. PubMed PMID:
25567422; PubMed Central PMCID: PMC4296680.