Drug safety surveillance using de-identified EMR and claims data: issues and challenges
Prakash M Nadkarni

ABSTRACT
The author discusses the challenges of pharmacovigilance using electronic medical record and claims data. Use of ICD-9 encoded data has low sensitivity for detection of adverse drug events (ADEs), because it requires that an ADE escalate to major-complaint level before it can be identified, and because clinical symptomatology is relatively under-represented in ICD-9. A more appropriate vocabulary for ADE identification, SNOMED CT, awaits wider deployment. The narrative-text record of progress notes can potentially be used for more sensitive ADE detection. More effective surveillance will require the ability to grade ADEs by severity. Finally, access to online drug information that includes both a reliable hierarchy of drug families and structured information on existing ADEs can improve the focus and predictive ability of surveillance efforts.

In this issue, Reisinger et al [1] describe the creation of a database intended to facilitate drug safety surveillance by monitoring for adverse events, using data extracted from two de-identified databases, a claims database and an electronic medical record (EMR) database, provided by a large healthcare company. The proposed data model is a subset of a more detailed model specified by the Observational Medical Outcomes Partnership [2]. That commercial enterprises engage in such work is highly laudable.

The proposed data model is fairly straightforward. A Persons (Patients) table records basic demographic elements, and related tables list the encounters, medications, procedures, and clinical conditions for each person. The latter three tables encode the concepts being recorded using standard medical vocabularies whose contents, as well as associated hierarchical relationships, are extracted from the US National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus [3].

Chronological information is essential in surveillance databases: to suspect a medication-related adverse event, a condition must follow the onset of medication, though of course a post hoc phenomenon does not by itself prove cause and effect. To create the chronological information, Reisinger et al pre-processed the raw data by coalescing consecutive records for the same patient for the same medication, clinical condition, or procedure into a single record. The resultant record represents an "era" for the therapeutic intervention or condition. Each era is tagged with start and end dates that denote an episode of continued medication administration or of ongoing care visits for a condition. The coalescing heuristic used was: if one encounters a sequence of records where the start date of intervention in a subsequent record follows the end date in the preceding record by 30 days or less, the sequence can be merged into a single record.

The resulting database is impressive in terms of its data volume: 43 million subjects and 1 billion drug exposures. However, both the data model and the vocabularies employed in the work bring with them significant limitations in terms of the inferences one can make with regard to medication safety. To be fair, some of these limitations only serve to illustrate the challenges inherent in the problem.

IDENTIFYING ADVERSE EFFECTS: DATA SOURCE AND VOCABULARY ISSUES
In the above work, the only source of adverse event data that was utilized from the EMR/claims data was clinical-condition information encoded using the International Classification of Diseases, 9th edition (ICD-9) [4]; this was converted, where possible, into equivalent codes in MedDRA (Medical Dictionary for Regulatory Activities) [5] using exact-correspondence information in UMLS. Because they are used for billing purposes, ICD-9 data are the most readily available structured data in EMRs for identifying clinical conditions. However, such data have several issues.
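As an aside on the pre-processing step described earlier, the 30-day coalescing heuristic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by Reisinger et al: the `Record` type, its field names, and the `build_eras` function are hypothetical, and overlapping records are simply treated as mergeable.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    """One raw exposure/condition record (hypothetical field names)."""
    person_id: int
    concept: str   # medication, condition, or procedure identifier
    start: date
    end: date


def build_eras(records, max_gap_days=30):
    """Merge consecutive records for the same person and concept into eras.

    A record is folded into the preceding era when its start date follows
    that era's end date by max_gap_days or less (overlaps also merge).
    """
    eras = []
    ordered = sorted(records, key=lambda r: (r.person_id, r.concept, r.start, r.end))
    for rec in ordered:
        last = eras[-1] if eras else None
        same_series = (
            last is not None
            and last.person_id == rec.person_id
            and last.concept == rec.concept
        )
        if same_series and rec.start - last.end <= timedelta(days=max_gap_days):
            # Extend the current era; keep the later of the two end dates.
            eras[-1] = Record(last.person_id, last.concept, last.start,
                              max(last.end, rec.end))
        else:
            eras.append(rec)
    return eras
```

With this sketch, two prescriptions separated by a 16-day gap collapse into one era, whereas a third that starts 47 days after the previous end date opens a new era. The choice of 30 days is the parameter reported in the text; any other persistence window would be passed as `max_gap_days`.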
Viewpoint paper
Correspondence to Prakash M Nadkarni, Center for Medical Informatics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06511, USA; prakash.nadkarni@yale.edu
Received 13 September 2010
Accepted 17 September 2010
J Am Med Inform Assoc 2010;17:671–674. doi:10.1136/jamia.2010.008607

PROBLEMS WITH THE USE OF CLAIMS DATA FOR ADVERSE EVENT DETECTION
The majority of adverse drug events (ADEs) are recorded in the narrative text associated with the initial post-event visit or progress note, if at all. Only if severe enough to constitute a chief complaint or a major finding will they be coded using ICD-9. The requirement that ADE findings must escalate to a major-complaint level to be picked up lowers the system's sensitivity. Under-recognition of a seemingly common problem (weight gain) was an issue with the antipsychotic risperidone, overuse of which is now the focus of federal concerns [6]: the prevalence of this ADE was only recognized when the problem escalated into obesity sufficient to cause type II diabetes or became pathological.

The relatively weaker coverage of ICD-9 for (non-billable) symptomatology, in comparison to other vocabularies such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) [7], is well documented [8, 9]. For example, it would be unusual to code a complaint of dry mouth due to anticholinergic medications using ICD-9.

The encoding process itself is vulnerable to inaccuracies, because it is not always performed by the care provider during the time of the clinical encounter. Depending on the healthcare organization and the specialty, a significant portion of the clinical record may be recorded in narrative text, which is then encoded a day or two later by medical records staff for billing and reporting purposes. As such, the encoding does not reflect ground truth.
For example, Stein et al [10] studied the phenomenon of post-operative pulmonary embolism as recorded in narrative text and in encoded form, and found not only significant discrepancies between the two, but also false positives and negatives in both.

Some groups now promote encoding of problem lists using SNOMED CT [11], because the latter captures symptomatology much better than ICD-9. However, there are significant hurdles to the intended widespread deployment of SNOMED CT. Critical aspects include the large size of the terminology and the significant redundancy in its content. Projects such as the construction by the National Library of Medicine (NLM) of a CORE Subset of SNOMED CT [12] aim to address both of these issues. Nevertheless, recent work by Nadkarni and Darer [13] indicates that such subsets, while undoubtedly useful, cannot provide the necessary coverage in all circumstances: access to the complete SNOMED CT content is still required.

Another concern is that encoding of fine details of the encounter, when performed by humans with software assistance, is time consuming. Consequently, busy clinicians may find this an unacceptable chore, and relegate this task to their medical records staff. Doing so would propagate the aforementioned concerns regarding accuracy. Conversely, for clinical encounters documented primarily as narrative text, a currently popular question is whether automated natural-language-processing (NLP) techniques can adequately extract all ADE-related information from the text. Wang et al [14] explored the feasibility of using NLP for ADE signal detection in a recent JAMIA paper. In their proof-of-concept study, Wang et al evaluated patients treated with bupropion, and the results were promising. However, the field must replicate such work on a much larger scale to determine where the pitfalls lie.

THE CHOICE OF MEDDRA AS AN ADVERSE EVENT TERMINOLOGY
The FDA uses MedDRA to collect and encode reports of adverse events.
Thus, mapping of ICD-9 codes to MedDRA is necessary for communication to the FDA. Using MedDRA has some advantages, notably in the area of standardized MedDRA queries. Through a knowledge base representing the findings of various syndromes using MedDRA terms, one can search for patients whose individual findings are consistent with disorders such as anaphylaxis, extrapyramidal manifestations, hemolysis, or renal failure.

However, MedDRA's design deviates significantly from modern controlled-vocabulary-design principles as articulated in Cimino's classic paper [15]; its limitations have been discussed by other authors [16–18]. Concerns about MedDRA include that it is not concept-oriented, it is non-compositional, its hierarchy is arbitrarily constrained to five levels, and, at the higher levels, it is artificially mono-hierarchical, which leads to difficulties in formulating queries.

Because the SNOMED CT concept hierarchy is significantly richer than MedDRA's, Bodenreider attempted to map, using automated approaches, MedDRA preferred terms (the equivalent of concepts) to SNOMED CT concepts [19]. He found that 58% of MedDRA's preferred terms could be mapped this way. Thus, the incorporation of additional intermediate-level concepts from SNOMED CT may make MedDRA-encoded data easier to categorize, aggregate, and analyze meaningfully.

GRADING OF ADVERSE EVENTS
Early and sensitive adverse event detection requires adverse event grading. Merely recording that a drug causes an adverse event is not enough: one must know how severe it is. In the running example of the Reisinger et al paper, acute myocardial infarction represents only the tip of the iceberg of coronary artery disease leading to occlusion. Patients with acute myocardial infarction frequently experience symptoms such as anginal pain beforehand. It is important to catch adverse events before they escalate into full-blown emergencies.
While Reisinger et al mention ischemic heart disease in passing, it is not clear how their model would represent progression of given disorders along a spectrum from mild to severe such that all intermediate states would fit as recognizable components of the same disease process.

Some adverse events, by their very nature (such as anaphylaxis or toxic epidermal necrolysis), occur in a severe form. Most ADEs, however, can occur with varying grades of severity. For example, National Cancer Institute (NCI)-sponsored clinical trials of cancer therapies utilize the Common Terminology Criteria for Adverse Events (CTC AE) [20], where adverse events are graded on a 1–5 scale (5 represents death), though, depending on the particular adverse event, not all points on the scale may be used. For example, dry mouth can occur as grade 1–2 on the scale, while secondary malignancy, if present, is automatically grade 4. One motivation for grading is to enable consistent reporting of adverse events to the local Institutional Review Board, to other collaborating sites in a multi-site study, and to the study's sponsor, for example, by requiring reporting of major adverse events of grade 3 and above.

Although originally developed for oncology, CTC AE has found application in non-cancer-related studies such as stem cell transplantation [21] and, in a modified form, rheumatology [22], because its grading is anchored. The use of CTC AE minimizes inter-rater variability. Anchoring implies that rather than simply using terms like mild, moderate, or severe without definition or qualification, CTC AE specifies a particular grade of an adverse event in unambiguous detail, often in terms of numerical ranges or the extent of functional disability.

For all its strengths, however, CTC AE is not comprehensive enough to use for all drug categories or for all types of clinical studies. For example, psychiatric findings are under-represented, as are certain physical findings such as tendon rupture.
The latter can occur with fluoroquinolone antimicrobial administration or after periarticular corticosteroid injections. With CTC AE, tendon rupture can only be encoded as "musculoskeletal, other (specify)".

Grading of adverse events is not always possible or feasible to perform in real time. While the grades of some adverse events (eg, those based on measurable physical or laboratory findings) can be readily computed algorithmically, grading of subjective findings typically requires careful inspection of the clinical record or detailed interviewing of the patient. Electronic support in the form of check-lists can facilitate its implementation.

A concern regarding the model proposed by Reisinger et al is that adverse event grade information is not easily gleaned from ICD-9 data. First, only a small proportion of clinical conditions are graded in ICD-9 as mild/minimal, moderate, or severe. More importantly, as already stated, the billing and administrative practices related to ICD-9 usage tend to leave adverse events of a low-level grade as narrative-text portions of the clinical record rather than formally encoding them.

DRUG INFORMATION: CHOICE OF REFERENCE CONTENT
For hierarchical relationships among drugs, Reisinger et al chose to use the drug hierarchy of SNOMED CT. While SNOMED CT's strengths with respect to encoding much of clinical medicine are well known, SNOMED CT is a suboptimal source for information about relationships among drugs.

In the discussion below it is important to note that the relationship between drugs and drug families/categories is poly-hierarchical (ie, one drug may belong to more than one family). A given drug may have multiple therapeutic actions (for example, aspirin is both an anti-inflammatory and an anti-platelet agent), or a drug may bind multiple receptors, as in the case of chlorpromazine.
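To make the notion of a poly-hierarchy concrete, the following minimal Python sketch stores each drug under a set of families and each family under a set of parent families, so that a query on a family returns every drug reachable through any path. The tables are a tiny hand-written fragment for illustration only, not content from SNOMED CT or any commercial source, and the function names are hypothetical.

```python
# Illustrative fragment only: a drug may have several families,
# and a family may itself have parent families.
DRUG_FAMILIES = {
    "aspirin": {"anti-inflammatory agent", "anti-platelet agent"},
    "rofecoxib": {"COX-2 inhibitor"},
}
FAMILY_PARENTS = {
    "COX-2 inhibitor": {"anti-inflammatory agent"},
}


def ancestors(family):
    """Return all families reachable upward from the given family."""
    seen, stack = set(), [family]
    while stack:
        current = stack.pop()
        for parent in FAMILY_PARENTS.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def families_of(drug):
    """Direct families of a drug plus every ancestor of those families."""
    direct = DRUG_FAMILIES.get(drug, set())
    result = set(direct)
    for family in direct:
        result |= ancestors(family)
    return result


def drugs_in_family(family):
    """All drugs that belong to the family directly or through a subfamily."""
    return sorted(d for d in DRUG_FAMILIES if family in families_of(d))
```

In this fragment, a query on "anti-inflammatory agent" returns both aspirin and rofecoxib, while a query on "anti-platelet agent" returns aspirin alone. A mono-hierarchical source would force aspirin into only one of these groups, which is the limitation at issue in this section.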
The authors' choice of rofecoxib as an exemplar was fortuitous: SNOMED CT characterizes it correctly as a COX-2 inhibitor. However, in SNOMED CT chlorpromazine falls under the single category "phenothiazine", a chemical classification that is not useful from the pharmacological or therapeutic perspectives. The SNOMED CT classification for the widely used drug acetaminophen is incorrect: "Para-aminophenol derivative anti-inflammatory agent (substance)". Acetaminophen has antipyretic and analgesic effects, but has no clinically significant anti-inflammatory effects. The antimicrobial ciprofloxacin (a fluoroquinolone) is placed in the less useful, broader category "quinolones" along with nalidixic acid, an older drug with a significantly different adverse event profile. Such classification problems can have real-world complications. For example, the fluoroquinolone drug family, which is not a distinct concept in SNOMED CT, is the focus of complaints regarding overuse from groups such as the Fluoroquinolone Toxicity Research Foundation, and the Health Research Group of Public Citizen, which has petitioned the FDA to require black-box label warnings [23].

An accurate and comprehensive drug hierarchy is important for analyses of groups of related drugs. Useful drug hierarchies have been constructed, but are not always freely available. For example, the Cerner Multum Drug Lexicon database [24] was freely available in its earlier versions, and correctly classified chlorpromazine both as a phenothiazine antipsychotic and a phenothiazine antihistamine. Unfortunately, its distribution has been constrained in its more recent versions, and one can now only obtain it by purchasing the content.

DETERMINING DOSE-RELATED EFFECTS: CHALLENGES
Reisinger et al state explicitly that their data model does not support analyses by drug strength. More concerning, the model does not record dose information.
Many adverse events occur as dose-related extensions of pharmacological actions, such as congestive heart failure with β-blockers and fluid retention with the thiazolidinedione anti-diabetic agents. The absence of dose data again limits the model's utility. There are several issues related to performing such analyses.

• Many of the standard sources of drug information, such as the NLM's RxNorm [25] and the drug hierarchy of SNOMED CT that the authors used, do not treat the numbers associated with a pharmaceutical preparation specially. Instead, the numbers are simply part of the string that describes a formulation. The UMLS reflects this design limitation as well. More advanced data models, such as the previously mentioned Multum Lexicon, explicitly separate the numeric part of the drug strength (as well as the units in which the strength is expressed) from the medication itself. The Multum data model is sophisticated enough to recognize that in many cases, both strength and units are expressed in two parts, numerator and denominator (eg, milligrams per 100 ml), and so these parts were modeled separately where necessary.
• Of course, even if one knows what strength of preparation was being prescribed for a given patient, that is not enough to reliably compute the quantity of medication that the patient is actually receiving per unit time. For ambulatory patients, one may try to rely on the quantity dispensed for a given period, but that is not the same as what is ingested. For several drugs, the dose must be continually titrated based on the values of a laboratory measure (eg, the International Normalized Ratio (INR) for warfarin), so the number of tablets taken per day or per week may change frequently.
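The two points above can be illustrated with a short Python sketch. It is not the Multum data model: the `Strength` type, the regular expression, and both function names are hypothetical, the parser handles only simple strings of the form "5 mg" or "250 mg/5 ml", and the resulting figure is the amount dispensed per day, which, as noted, is not the amount ingested.

```python
import re
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Strength:
    """Strength with numeric parts separated from units (hypothetical model)."""
    numerator: Fraction
    numerator_unit: str
    denominator: Fraction = Fraction(1)
    denominator_unit: str = ""   # empty for unit-dose forms such as tablets


# Deliberately simple pattern: "<number> <unit>" optionally followed by
# "/ [<number>] <unit>". Real formulation strings vary far more than this.
_STRENGTH_RE = re.compile(
    r"(?P<num>\d+(?:\.\d+)?)\s*(?P<num_unit>[A-Za-z]+)"
    r"(?:\s*/\s*(?P<den>\d+(?:\.\d+)?)?\s*(?P<den_unit>[A-Za-z]+))?"
)


def parse_strength(formulation):
    """Extract a Strength from a formulation string, or None if absent."""
    match = _STRENGTH_RE.search(formulation)
    if match is None:
        return None
    denominator = Fraction(match.group("den")) if match.group("den") else Fraction(1)
    return Strength(
        Fraction(match.group("num")),
        match.group("num_unit"),
        denominator,
        match.group("den_unit") or "",
    )


def dispensed_amount_per_day(strength, quantity_dispensed, days_supply):
    """Amount of drug dispensed per day, in the strength's numerator unit.

    quantity_dispensed is a count of unit doses (tablets) or, for liquids,
    a volume in the strength's denominator unit.
    """
    per_unit = strength.numerator / strength.denominator
    return per_unit * Fraction(quantity_dispensed) / Fraction(days_supply)
```

For example, a formulation string containing "5 mg" with 60 tablets dispensed for 30 days yields 10 mg per day, and "250 mg/5 ml" with 150 ml dispensed for 10 days yields 750 mg per day. For a titrated drug such as warfarin this figure is at best a crude average over the dispensing period.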
One practical issue is that many EMRs (eg, EpicCare) record the caregiver's orders for a given prescription only as narrative text, for example, "1 bid", even though it should not be particularly difficult in principle to enforce at least partial structure in the data through the use of pull-down lists and separation of the numeric part of the order from the dose frequency (although narrative text is still necessary for special instructions). Because of the considerable variation that can occur in such text, attempting to extract computable dose information can become a difficult pattern-recognition or NLP project. The full Observational Medical Outcomes Partnership data model allows recording of the number of refills, the number of days' supply, and the total quantity of drug, but does not try to address dosage issues explicitly. This illustrates the overall challenges related to determining actual administered drug dose information reliably.

UTILIZING KNOWN ADVERSE EVENT INFORMATION FOR DRUG SURVEILLANCE
Pharmacovigilance (drug surveillance) efforts can utilize existing knowledge about adverse events in several ways:

• Drug surveillance may resemble data mining with hypothesis generation. Formally designed studies must later confirm (or disprove) initial signals or trends detected in the raw data. A significant problem in data mining exercises is the over-abundance of signals. Such problems multiply if the software lacks information on what is already common knowledge, as in the apocryphal story of the software program that discovered that ovarian cancer only occurs in women. One way for software to reduce pharmacovigilance study noise levels is to post-process signals by checking against known adverse event information for the drugs under suspicion, so that only novel signals are considered for further exploration.
• Existing adverse event knowledge about closely related chemical compounds can also serve to focus the surveillance.
  For example, programs should monitor new aminoglycoside antibiotics for adverse renal or vestibulo-cochlear effects, and new statin-class drugs for hepatotoxicity and myopathy.
• If one knows a drug's pharmacological mechanisms of action, one can predict part of its potential adverse event profile before case reports appear in the literature. A new drug with anticholinergic side effects will likely cause urinary retention in elderly males with benign prostatic hypertrophy, and can potentially exacerbate glaucoma in those patients known to have the condition. Such patients are not commonly subjects in clinical trials of drugs, which often may not specifically target older populations.

While commercial drug databases store such content, they are proprietary and vary considerably in design. Since comparative descriptions of commercial content have not been published in the literature, avoiding a blanket assessment is necessary here. However, a considerable portion of the proprietary content reproduces entirely the prose in the FDA-mandated package insert, and the latter is now freely available through NLM's DailyMed [26]. The added value of proprietary sources comes in part from categorizing the textual content into functional categories (side effects, pregnancy, and lactation), organ systems, and an occasional severity indicator, but this is not sufficient for drug surveillance purposes.

The time is now appropriate for systematic efforts (preferably combining public and private resources) to extract the information that is present in the numerous primary and secondary adverse event data repositories into a single, over-arching structured representation with standard form and content.
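As a rough illustration of what even a small structured representation of known adverse events would enable, the Python sketch below post-processes candidate signals against class-level knowledge so that only novel signals are passed on for further study. The class-level effects are those named in the list above; the drug names, the event labelled "hypothetical event Z", and the function name are made up for the example.

```python
# Class-level adverse events taken from the examples in the text.
KNOWN_CLASS_ADES = {
    "aminoglycoside": {"renal toxicity", "vestibulo-cochlear toxicity"},
    "statin": {"hepatotoxicity", "myopathy"},
}

# Placeholder drug names standing in for newly marketed agents.
DRUG_CLASS = {
    "new_statin_x": "statin",
    "new_aminoglycoside_y": "aminoglycoside",
}


def split_signals(signals):
    """Partition (drug, event) signals into already-known and novel ones.

    A signal counts as known when the event is listed for the drug's class;
    signals for drugs with no recorded class are always treated as novel.
    """
    known, novel = [], []
    for drug, event in signals:
        expected = KNOWN_CLASS_ADES.get(DRUG_CLASS.get(drug, ""), set())
        if event in expected:
            known.append((drug, event))
        else:
            novel.append((drug, event))
    return known, novel
```

Given candidate signals for myopathy and for a hypothetical event Z on the placeholder statin, the first is set aside as expected for the class and only the second is retained as novel. The same lookup tables could equally be read in the other direction, as a watch list of events to monitor for each new member of a class.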
Such structuring will possibly be facilitated by the creation of a standard terminology of adverse event content that has much richer inter-relationships than are present in MedDRA, and where aspects of the same spectrum of disease (for example, angina pectoris and myocardial infarction) are correlated along a time-and-severity spectrum, as opposed to merely being related concepts. Bodenreider's pilot work using SNOMED CT represents the starting point for such efforts. A larger consortium should build upon this work.

Acknowledgments The author wishes to thank Randolph Miller for valuable feedback on the manuscript.
Competing interests None.
Provenance and peer review Not commissioned; not externally peer reviewed.

REFERENCES
1. Reisinger SJ, Ryan PB, O'Hara DJ, et al. Development and evaluation of a common data model enabling active drug safety surveillance using disparate healthcare databases. J Am Med Inform Assoc 2010;17:652–62.
2. Observational Medical Outcomes Partnership. OMOP Common Data Model Specifications, Version 2.0. 2009. http://omop.fnih.org (accessed 9 Jan 2010).
3. Lindberg DAB, Humphreys BL, McCray AT. The unified medical language system. Meth Inform Med 1993;32:281–91.
4. World Health Organization. International Classification of Diseases, 10th edn. Geneva, Switzerland, 1992.
5. MedDRA Maintenance and Support Organization. Medical Dictionary of Regulatory Activities. 2009. http://www.meddramsso.com (accessed 10 Sep 2009).
6. Harris G. Use of Antipsychotics in Children Is Criticized. The New York Times, 2008.
7. International Health Terminology Standards Development Organization. SNOMED Clinical Terms (SNOMED CT). 2009. http://www.snomed.org (accessed 2 Jan 2009).
8. Chute C, Cohn S, Campbell K, et al. The content coverage of clinical classifications. For The Computer-Based Patient Record Institute's Work Group on Codes & Structures. J Am Med Inform Assoc 1996;3:224–33.
9. Brouch K. AHIMA project offers insights into SNOMED, ICD-9-CM mapping process. J AHIMA 2003;74:52–5.
10. Stein H, Nadkarni P, Erdos J, et al. Exploring the degree of concordance of coded and textual data in answering clinical queries from a clinical data repository. J Am Med Inform Assoc 2000;7:42–54.
11. Warren J, Collins J, Sorrentino C, et al. Just-in-time coding of the problem list in a clinical environment. Proc AMIA Symp 1998; Washington DC:280–4.
12. US National Library of Medicine. The CORE problem list subset of SNOMED-CT. 2009. http://www.nlm.nih.gov/research/umls/Snomed/core_subset.html (accessed 6 Jan 2010).
13. Nadkarni P, Darer J. Migrating existing clinical content from ICD-9 to SNOMED. J Am Med Inform Assoc 2010;17:602–7.
14. Wang X, Hripcsak G, Markatou M, et al. Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc 2009;16:328–37.
15. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med 1998;37:394–403.
16. Merrill G. The MedDRA paradox. AMIA Annu Fall Symp 2008:470–4.
17. Richesson R, Fung K, Krischer J. Heterogeneous but "standard" coding systems for adverse events: Issues in achieving interoperability between apples and oranges. Contemp Clin Trials 2008;29:635–45.
18. Bousquet C, Lagier G, Lillo-Le Louët A, et al. Appraisal of the MedDRA conceptual structure for Describing and Grouping Adverse Drug Reactions. Drug Saf 2005;28:19–34.
19. Bodenreider O. Using SNOMED CT in combination with MedDRA for reporting signal detection and adverse drug reactions reporting. AMIA Annu Fall Symp Am Med Inform Assoc 2009;2009:45–9.
20. National Cancer Institute. Common Terminology Criteria for Adverse Events (CTCAE) and Common Toxicity Criteria (CTC). 2009. http://ctep.cancer.gov/protocolDevelopment/electronic_applications/ctc.htm (accessed 9 Jan 2009).
21. Daly A, Song K, Nevill T, et al. Stem cell transplantation for myelofibrosis: a report from two Canadian centers. Bone Marrow Transplant 2003;32:35–40.
22. Woodworth T, Furst DE, Alten R, et al. Standardizing assessment and reporting of adverse effects in rheumatology clinical trials II: the Rheumatology Common Toxicity Criteria v.2.0. J Rheumatol 2007;34:1401–14.
23. Landers S. FDA requires black-box warnings for fluoroquinolones. 2008. http://www.ama-assn.org/amednews/2008/07/28/hlsc0728.htm (accessed 9 Feb 2010).
24. Cerner Corporation. Multum Lexicon. 2005. http://www.multum.com/VantageRxDB.htm (accessed 6 Aug 2005).
25. National Library of Medicine. RxNorm. 2010. http://www.nlm.nih.gov/research/umls/rxnorm (accessed 9 Feb 2010).
26. National Library of Medicine. About DailyMed. 2010. http://www.dailymed.nlm.nih.gov/dailymed/about.cfm (accessed 9 Feb 2010).