
Accepted Manuscript

Title: Using Artificial Intelligence in an Intelligent Way to Improve Efficiency of a Heart Failure Care Team

Author: Griffin M. Weber

PII: S1071-9164(18)30144-1
DOI: https://doi.org/10.1016/j.cardfail.2018.04.003
Reference: YJCAF 4125

To appear in: Journal of Cardiac Failure

Received date: 11-4-2018


Accepted date: 11-4-2018

Please cite this article as: Griffin M. Weber, Using Artificial Intelligence in an Intelligent Way to
Improve Efficiency of a Heart Failure Care Team, Journal of Cardiac Failure (2018),
https://doi.org/10.1016/j.cardfail.2018.04.003.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will
undergo copyediting, typesetting, and review of the resulting proof before it is published in its
final form. Please note that during the production process errors may be discovered which could
affect the content, and all legal disclaimers that apply to the journal pertain.
Using artificial intelligence in an intelligent way to improve efficiency of a heart failure
care team

Griffin M Weber, MD, PhD (1,2)

(1) Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
(2) Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts

Griffin M Weber, MD, PhD


Department of Biomedical Informatics
Harvard Medical School
10 Shattuck St
Boston, MA 02115

Email: weber@hms.harvard.edu

Disclosures: Dr. Weber is supported by NIH/NCATS UL1TR001102, NIH/NIGMS U01GM112623,
NIH/NCI U01CA198934, NIH/NHGRI U54HG007963, NSF/SciSIP SMA-1360042, and PCORI
CDRN1306-04608.

Although the potential benefits of artificial intelligence (AI) to medicine have been discussed for

several decades, there are several reasons to think that the promised impact of AI will finally be

here soon [1,2]: the rapid adoption of electronic health records has made the large amounts of

data needed to develop and test AI algorithms much more readily available; wearable devices

and environmental sensors are providing new ways of digitally monitoring patients’ health;

improvements in AI algorithms have resulted in significant advances in the ability of computers

to recognize patterns in data; and consumer products, such as voice recognition in smartphones,

are leading towards greater acceptance and trust in AI. However, many also rightly

warn about the hype around AI [3,4,5]. Computers will neither be curing diseases nor

eliminating the need for human doctors in the immediate future. The goals for AI should be more

incremental. It should be viewed as a tool that can assist providers in making clinical decisions

and help them work more efficiently and with fewer medical errors.

In this issue of the Journal, Blecker et al sought to identify patients with acute decompensated

heart failure (ADHF) early in their hospitalization so that interventions to reduce readmissions,

such as patient education and involvement of multidisciplinary teams, can be coordinated before

the patients are discharged. The current approach at Tisch Hospital in New York City, where

this study was conducted, is a screening tool used by the heart failure transition team that looks

for patients with evidence for heart failure on the basis of the problem list, inpatient loop diuretic

use, or a B-type natriuretic peptide (BNP) value ≥500 pg/mL. With a sensitivity of 0.98, this method correctly identifies

nearly every patient who was later discharged with a diagnosis of ADHF. However, with a low

positive predictive value (PPV) of only 0.13, most of the patients selected by the screening tool

do not actually have ADHF. Providers, therefore, must perform manual chart review, averaging

61.4 minutes of effort to find each patient with ADHF.
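
To make the two metrics concrete, the following is a minimal sketch of a rule-based screen of this kind and of how sensitivity and PPV are computed from its output. The `Admission` record, its field names, and the six toy admissions are hypothetical and are not taken from the study; only the three criteria (problem list, loop diuretic, BNP ≥500 pg/mL) come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Admission:
    # Hypothetical record layout, for illustration only.
    hf_on_problem_list: bool
    loop_diuretic: bool
    bnp_pg_ml: Optional[float]   # None when no BNP was measured
    adhf_at_discharge: bool      # reference label assigned at discharge

def rule_flags(a: Admission) -> bool:
    """Flag the admission if any one of the three screening criteria is met."""
    bnp_high = a.bnp_pg_ml is not None and a.bnp_pg_ml >= 500
    return a.hf_on_problem_list or a.loop_diuretic or bnp_high

def sensitivity_and_ppv(admissions):
    """Assumes at least one true ADHF case and at least one flagged admission."""
    tp = sum(1 for a in admissions if rule_flags(a) and a.adhf_at_discharge)
    fp = sum(1 for a in admissions if rule_flags(a) and not a.adhf_at_discharge)
    fn = sum(1 for a in admissions if not rule_flags(a) and a.adhf_at_discharge)
    sensitivity = tp / (tp + fn)  # share of true ADHF cases that were flagged
    ppv = tp / (tp + fp)          # share of flagged admissions that are ADHF
    return sensitivity, ppv

# Toy admissions, invented for this example.
toy = [
    Admission(True, False, None, True),
    Admission(False, True, 120.0, False),
    Admission(False, False, 900.0, False),
    Admission(False, False, 80.0, False),
    Admission(False, True, 650.0, True),
    Admission(False, False, None, True),
]
sens, ppv = sensitivity_and_ppv(toy)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Because the rule fires when any single criterion is met, it casts a wide net: few true cases are missed, but many flagged charts turn out not to be ADHF, which is the pattern of high sensitivity and low PPV described above.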

Blecker and colleagues tested whether logistic regression models can generate a better

screening tool. Although viewed as a more traditional machine learning classification technique,

logistic regression often works well for simple binary decision problems [6]. The models are

easy to interpret and it is clear how the individual variables contribute to the classification. In

contrast, newer algorithms, such as “deep learning” systems based on neural networks, are the

state-of-the-art in AI for many applications including computer vision and speech recognition [7].

However, they are more complex models and are often treated as black boxes because it is difficult to

understand how the input variables lead to the classification [3,6].
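
The interpretability of logistic regression can be illustrated with a short sketch. The variable names, the intercept, and the coefficients below are hypothetical values chosen for illustration and are not taken from Blecker et al; the point is only that each variable adds a fixed, inspectable amount to the log-odds.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Logistic regression: the log-odds are a weighted sum of the inputs."""
    log_odds = intercept + sum(coefficients[name] * value
                               for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical model, invented for this example.
intercept = -3.0
coefficients = {"loop_diuretic": 1.2, "bnp_ge_500": 1.8}

patient = {"loop_diuretic": 1, "bnp_ge_500": 1}

# Each variable's contribution to the log-odds can be read off directly.
contributions = {name: coefficients[name] * value for name, value in patient.items()}
# exp(coefficient) is the odds ratio associated with that variable.
odds_ratios = {name: math.exp(c) for name, c in coefficients.items()}

probability = predict_probability(intercept, coefficients, patient)
print(contributions, round(probability, 3))
```

A deep neural network offers no comparable per-variable readout, which is the sense in which it behaves like a black box.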

The authors compared three logistic regression models. The first was based on 31 structured

data elements, which included the three in the original screening tool as well as several

demographic, diagnosis, medication, and laboratory test variables. The second was based on

unstructured clinical notes. In this model, each of the 36,463 different words that appeared at

least ten times across all notes was a potential variable; however, including all of these in the

logistic regression risks overfitting the model with irrelevant words. To avoid this problem, the

authors used a common technique called L1-regularization, which adds a penalty proportional to

the sum of the absolute values of the feature coefficients [8]. This resulted in all but 427 significant words dropping out of the

final model as their coefficients became zero. The third model combined both the structured

data elements and the words from clinical notes, for a final model of 432 variables after

L1-regularization. Their dataset included 37,229 hospitalizations, of which 1,294 (3.5%) had a

principal discharge diagnosis of ADHF; and, as is standard practice, they randomly divided the

hospitalizations into separate datasets used to train and test the models.
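
The way an L1 penalty drives coefficients to exactly zero can be seen in a small sketch. This is not the authors' implementation: it is a plain-Python proximal gradient fit, with an unpenalized intercept, run on eight invented one-line "notes" with binary word-presence features. The function names, the learning rate, the step count, and the penalty strength `lam` are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, amount):
    """Shrink value toward zero; anything within `amount` of zero becomes 0.0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_l1_logistic(rows, labels, lam, lr=0.5, steps=2000):
    """Minimize mean log-loss + lam * sum(|w_j|) by proximal gradient descent.

    The intercept is not penalized. Returns (intercept, weights).
    """
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Prediction errors under the current coefficients.
        residuals = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
                     for x, y in zip(rows, labels)]
        b -= lr * sum(residuals) / n
        for j in range(d):
            grad_j = sum(r * x[j] for r, x in zip(residuals, rows)) / n
            # Gradient step, then the L1 shrinkage that produces exact zeros.
            w[j] = soft_threshold(w[j] - lr * grad_j, lr * lam)
    return b, w

# Hypothetical toy notes (1 = ADHF, 0 = not ADHF); not data from the study.
notes = [
    ("patient with edema and dyspnea", 1),
    ("edema noted on exam", 1),
    ("worsening edema and orthopnea", 1),
    ("edema with dyspnea on exertion", 1),
    ("patient with fracture after fall", 0),
    ("cough and fever noted", 0),
    ("patient with rash on exam", 0),
    ("fever and headache", 0),
]
vocab = sorted({word for text, _ in notes for word in text.split()})
rows = [[1.0 if word in text.split() else 0.0 for word in vocab] for text, _ in notes]
labels = [label for _, label in notes]

_, weights = fit_l1_logistic(rows, labels, lam=0.05)
kept = {word: round(w, 3) for word, w in zip(vocab, weights) if w != 0.0}
print(f"{len(kept)} of {len(vocab)} words kept:", kept)

# A penalty larger than any possible gradient removes every word.
_, heavy = fit_l1_logistic(rows, labels, lam=1.0)
```

In this toy setting, filler words such as "with" and "and" occur equally often in both classes and receive no lasting weight, while a word that tracks the diagnosis survives. The same mechanism, at scale, is what lets a vocabulary of tens of thousands of candidate words collapse to a few hundred.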

The outputs of the logistic regression models are values between 0 and 1, which estimate the

probability that the patient has ADHF. By selecting different probability thresholds, it is possible

to “tune” the model to a desired sensitivity or specificity. In the current study, Blecker et al fixed

the sensitivity at 0.98 so that all models would identify the same number of patients with ADHF

as the existing screening tool. This was key to their approach. Their intent was not to find

additional ADHF patients that are currently being missed. Instead, their objective was only to

raise the PPV and reduce the number of false positives, thereby eliminating the time being

spent manually reviewing those patients’ charts.
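
Tuning a model to a fixed sensitivity amounts to choosing a probability cutoff. A minimal sketch, assuming a case is flagged when its predicted probability is at or above the cutoff: pick the highest cutoff that still captures the required share of true cases, then read off the PPV at that cutoff. The scores and labels are invented, and the toy targets (0.75 and 1.0) stand in for the 0.98 used in the study because the example has only four true cases.

```python
import math

def threshold_for_sensitivity(scores, labels, target):
    """Return the highest cutoff whose sensitivity is at least `target`.

    A case is flagged when its score is >= the cutoff.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive case")
    # Number of true cases that must be flagged; the epsilon absorbs float fuzz.
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]

def sensitivity_and_ppv(scores, labels, cutoff):
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    return tp / (tp + fn), tp / (tp + fp)

# Hypothetical predicted probabilities and true labels (1 = ADHF).
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.50, 0.45, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

cutoff_75 = threshold_for_sensitivity(scores, labels, 0.75)
sens_75, ppv_75 = sensitivity_and_ppv(scores, labels, cutoff_75)

cutoff_100 = threshold_for_sensitivity(scores, labels, 1.0)
sens_100, ppv_100 = sensitivity_and_ppv(scores, labels, cutoff_100)

print(cutoff_75, sens_75, ppv_75)
print(cutoff_100, sens_100, round(ppv_100, 3))
```

Demanding higher sensitivity forces a lower cutoff and admits more false positives. Holding sensitivity fixed, as the authors did, means any gain in PPV has to come from the model ranking true cases above non-cases more reliably.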

The logistic regression models based only on structured or unstructured data had PPVs of 0.13

and 0.30, respectively. The combined model, which performed the best, had a PPV of 0.34.

Although this might still appear low, when converted to time savings, it reduces the manual

chart review from over an hour down to only 25 minutes to find each patient with ADHF. This

corresponds to potentially hundreds of hours of provider time saved each year at their hospital,

given its volume of heart failure admissions.
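
The link between PPV and review time can be checked with simple arithmetic. The sketch below assumes that every flagged chart takes the same time to review, so that the time spent per ADHF patient found equals the per-chart time divided by the PPV. That assumption, and the per-chart time it implies, are not stated in the study as summarized here.

```python
def minutes_per_case_found(minutes_per_chart, ppv):
    """On average, 1 / PPV flagged charts are reviewed per true case found."""
    return minutes_per_chart / ppv

baseline_ppv = 0.13               # existing screening tool
baseline_minutes_per_case = 61.4  # reported effort per ADHF patient found
combined_ppv = 0.34               # combined structured + notes model

# Per-chart review time implied by the baseline figures (an inference).
implied_minutes_per_chart = baseline_minutes_per_case * baseline_ppv

estimate = minutes_per_case_found(implied_minutes_per_chart, combined_ppv)
print(f"about {implied_minutes_per_chart:.1f} min per chart, "
      f"about {estimate:.1f} min per ADHF patient at PPV {combined_ppv}")
```

This back-of-the-envelope figure lands a little under the roughly 25 minutes quoted above. The PPVs used here are rounded, and the exact calculation behind the published estimate is not described in this editorial, so the sketch should be read as showing the direction and approximate size of the saving only.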

Perhaps the most important limitation of this study is that much of the benefit of the models

comes from the clinical notes, rather than the structured data. Due to variation in the format,

language, abbreviations, and other characteristics of clinical notes across different providers

and institutions, it is unclear how well these logistic regression models would work at another

hospital [9,10]. The models would need to be trained and tested again before they could be

used. Additionally, the reported time savings are only estimates. Blecker et al plan to use the

combined model at their hospital to generate a daily screening list for their heart failure team. It

would be interesting to learn what the actual time savings are and how the heart failure team

responds to this introduction of AI into their daily workflow.

By focusing on time savings rather than the absolute performance of the models, Blecker et al

are using AI in an intelligent way. They are neither overstating what their models can do nor

suggesting that the models can replace manual chart review. With logistic regression, they

combine both structured and unstructured clinical data in a single, easily computable and

interpretable formula. Although this results in only a modest improvement over their existing

screening tool, over time the cumulative gain in efficiency could be significant.

References

1. Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, Wang Y, Dong Q, Shen H, Wang Y.
Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017
Jun 21;2(4):230-243. doi: 10.1136/svn-2017-000101. eCollection 2017 Dec. Review.
PubMed PMID: 29507784; PubMed Central PMCID: PMC5829945.
2. Shameer K, Johnson KW, Glicksberg BS, Dudley JT, Sengupta PP. Machine learning in
cardiovascular medicine: are we there yet? Heart. 2018 Jan 19. pii:heartjnl-2017-
311198. doi: 10.1136/heartjnl-2017-311198. [Epub ahead of print] Review. PubMed
PMID: 29352006.
3. Cabitza F, Rasoini R, Gensini GF. Unintended Consequences of Machine Learning in
Medicine. JAMA. 2017 Aug 8;318(6):517-518. doi: 10.1001/jama.2017.7797. PubMed
PMID: 28727867.
4. Houssami N, Lee CI, Buist DSM, Tao D. Artificial intelligence for breast cancer
screening: Opportunity or hype? Breast. 2017 Dec;36:31-33.
doi:10.1016/j.breast.2017.09.003. Epub 2017 Sep 20. PubMed PMID: 28938172.

5. Scogland B. Artificial Intelligence in Medicine: Hope or Hype? Medical Device and
Diagnostic Industry. January 2, 2018. Accessed online on March 8, 2018, at
https://www.mddionline.com/artificial-intelligence-medicine-hope-or-hype.
6. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network
classification models: a methodology review. J Biomed Inform. 2002 Oct-Dec;35(5-
6):352-9. Review. PubMed PMID: 12968784.
7. Krittanawong C, Zhang H, Wang Z, Aydar M, Kitai T. Artificial Intelligence in Precision
Cardiovascular Medicine. J Am Coll Cardiol. 2017 May 30;69(21):2657-2664. doi:
10.1016/j.jacc.2017.03.571. Review. PubMed PMID: 28545640.
8. Ng AY. Feature selection, L1 vs L2 regularization, and rotational invariance. ICML '04
Proceedings of the twenty-first international conference on Machine learning. 2004.
Page 78.
9. Carroll RJ, Thompson WK, Eyler AE, Mandelin AM, Cai T, Zink RM, Pacheco JA,
Boomershine CS, Lasko TA, Xu H, Karlson EW, Perez RG, Gainer VS, Murphy SN,
Ruderman EM, Pope RM, Plenge RM, Kho AN, Liao KP, Denny JC. Portability of an
algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform
Assoc. 2012 Jun;19(e1):e162-9. Epub 2012 Feb 28. PubMed PMID: 22374935; PubMed
Central PMCID: PMC3392871.
10. Rochefort CM, Buckeridge DL, Forster AJ. Accuracy of using automated methods for
detecting adverse events from electronic health record data: a research protocol.
Implement Sci. 2015 Jan 8;10:5. doi: 10.1186/s13012-014-0197-6. PubMed PMID:
25567422; PubMed Central PMCID: PMC4296680.
