REVIEW  CLINICIAN'S CORNER

Accuracy of Diagnostic Tests Read With and Without Clinical Information
A Systematic Review

Clement T. Loy; Les Irwig, MBBCh, PhD

Context Although it is common practice to read tests with clinical information, whether this improves or decreases the accuracy of test reading is uncertain.

Objective To determine whether diagnostic tests are more accurate when read with clinical information or without it.

Data Sources MEDLINE search (1966-December 2003) extended by search of reference lists and articles citing the articles retrieved (Web of Science, 1985-December 2003).

Study Selection All articles comparing the accuracy of tests read twice by the same readers, once without and once with clinical information, but otherwise under identical conditions. Only articles that reported sensitivity and specificity or receiver operating characteristic (ROC) curves were included.

Data Extraction Data were extracted by one author and reviewed independently by the other. When the data were difficult to interpret, differences were resolved by discussion.

Data Synthesis Sixteen articles met the inclusion criteria. Eleven articles compared areas under ROC curves for tests read with and without clinical information, and 5 compared only sensitivity and specificity. Ten articles used actual clinical information; 6 used constructed clinical information that was plausible. Overall, clinical information improved test reading accuracy although the effect was smaller in the articles using actual clinical information when compared with those using constructed clinical information. There were no instances in which clinical information resulted in significant reduction in test reading accuracy. In some instances, improved test reading accuracy came from improved sensitivity without loss of specificity.

Conclusions At least for the tests examined, the common practice of reading diagnostic tests with clinical information seems justified. Future studies should be designed to investigate the best way of providing clinical information. These studies should also give an estimate of the accuracy of clinical information used, display ROC curves with identified data points, and include a wider range of diagnostic tests.

JAMA. 2004;292:1602-1609  www.jama.com

CME available online at www.jama.com

Author Affiliations: Screening and Test Evaluation Program, School of Public Health, University of Sydney, Sydney, Australia.
Corresponding Author: Les Irwig, MBBCh, PhD, Screening and Test Evaluation Program, School of Public Health, Room 301, A27, Edward Ford Building, University of Sydney, NSW 2006, Australia.

©2004 American Medical Association. All rights reserved.

Whether diagnostic tests should be read with clinical information has been much disputed since Schreiber1 suggested 40 years ago that clinical information improved the accuracy of chest x-ray reading.

An argument favoring reading diagnostic tests with clinical information is that the accuracy of the read may be improved by the additional information. It may help at 2 stages: perception and interpretation. Consider a 78-year-old woman undergoing a ventilation-perfusion (V/Q) scan for left-sided chest pain that develops 10 days after a total hip replacement. Clinical information may focus the reader's perception to the relevant area, the left lung. Knowing that the patient had a hip replacement 10 days ago, the reader may also consciously or unconsciously alter his/her interpretation of subtle cues or alter his/her overall level of suspicion for pulmonary embolism.

An argument favoring reading diagnostic tests without clinical information is that it may bias the reading and that clinical information should be incorporated into decision making only after an unbiased read.2 For example, clinical information may bias the reader into seeing perfusion defects that may not be present. In addition, the clinical information would be double-counted if it influenced both the initial test reading and subsequent interpretation of the test result by the physician ordering the test. It is also conceivable that clinical information can defeat the purpose of a diagnostic test. For example, if the physician who ordered the V/Q scan thought that a pulmonary embolus was unlikely in this patient clinically and informed the radiologist, the radiologist might respond by raising the threshold for diagnosis. This is equivalent to increasing the V/Q scan's specificity and lowering its sensitivity, which may make the exclusion of pulmonary embolism on a negative test result less definite because specific tests are less useful in excluding disease than sensitive ones.3

We therefore systematically reviewed the literature to determine whether a diagnostic test is more accurate when read with clinical information or without it.

METHODS
Data Sources, Study Selection
The published literature was searched by one of the authors (C.T.L.) using MEDLINE (1966 to December 2003) and 2 strategies. For the first strategy, we searched MEDLINE through OVID's explode function (EXP) using (EXP medical history taking and EXP sensitivity and specificity) followed by an extended search which included examination of reference lists of all included articles. Additional articles citing the articles thus identified were located and examined using Web of Science (1985-December 2003).
Using Web of Science proved to be a useful strategy in identifying articles not easily retrievable on MEDLINE because the relevant studies came from a wide variety of disciplines and there were a few early seminal articles that were likely to have been quoted by subsequent authors. For the second strategy, we searched MEDLINE through PubMed using the word history and the specific version of the "Clinical Queries" filter diagnosis.

For each of the strategies, abstracts were read for all articles with relevant titles. We obtained full articles for abstracts that suggested that the articles were relevant.

We included all articles that compared the accuracy of tests against a reference standard with or without clinical information and that had been read twice by the same readers under otherwise identical conditions. We accepted all articles that measured sensitivity and specificity or that had receiver operating characteristic (ROC) curves.

Data Extraction
Extraction of data on study characteristics and test accuracy was performed by one author (C.T.L.). Difficult to interpret articles were reviewed by the other author (L.I.), and differences were resolved by discussion. Quality assessment was performed using the Quality Assessment of Diagnostic Accuracy Studies tool, as well as additional criteria specific to methodological issues relating to this research question, which included whether clinical information was actual or constructed, whether a balanced design was used, what the amount of time was between reading sessions, and whether alternate ways of providing clinical information were considered.
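The accuracy measures extracted from each article (sensitivity, specificity, and the area under the ROC curve) can be made concrete with a minimal sketch. The counts and operating points below are hypothetical, invented purely for illustration and not taken from any included study. The function names are likewise illustrative. The area is computed with the simple trapezoidal rule, which is an assumption of this sketch: the reviewed articles may have estimated their ROC areas by other methods.

```python
from typing import List, Tuple


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of individuals with the disease who test positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of individuals without the disease who test negative."""
    return tn / (tn + fp)


def roc_auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under an ROC curve.

    `points` are (1 - specificity, sensitivity) pairs, one per threshold.
    The corners (0, 0) and (1, 1) are added so the curve spans the whole axis.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical single-threshold reading of 100 diseased and 100 non-diseased cases.
sens = sensitivity(tp=80, fn=20)
spec = specificity(tn=90, fp=10)

# Hypothetical operating points at three thresholds for the same readers,
# once without and once with clinical information.
without_info = [(0.10, 0.60), (0.20, 0.75), (0.40, 0.85)]
with_info = [(0.10, 0.70), (0.20, 0.85), (0.40, 0.95)]

auc_without = roc_auc(without_info)
auc_with = roc_auc(with_info)

print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"AUC without={auc_without:.4f}, AUC with={auc_with:.4f}")
```

In this invented example every operating point read with clinical information lies above its counterpart at the same false-positive rate, so the whole curve shifts upward and the area increases. A pure threshold shift would instead move a reader along one curve and leave the area unchanged.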
Test accuracy can be quantified in terms of sensitivity and specificity and ROC curves. Measures such as percentage agreement and repeatability (eg, κ statistic) address internal consistency but not accuracy and, therefore, are not helpful in this context.

Sensitivity is the proportion of individuals with a disease who have positive test results for that particular disease while specificity is the proportion of individuals without a disease who have negative test results for that particular disease. When the threshold of a test is varied, sensitivity and specificity can be traded off, without necessarily altering the overall test accuracy. An ROC curve is formed when test accuracy estimates for a test at several thresholds are joined together. On an ROC curve, a change in threshold is represented by shifts along the curve. On the other hand, any improvement in the overall diagnostic test accuracy is represented by an upward and leftward shift of the curve. The accuracy of 2 tests can be compared by looking at the areas under their ROC curves.

We used the area under the curve in our systematic review, wherever possible, to assess whether reading a test with clinical information improved its overall accuracy. This allowed us to distinguish observed changes in sensitivity and specificity due to true changes in overall test accuracy from those due to threshold shift alone.

Whenever available, confidence intervals were calculated using SEs and variances reported in the articles. P values quoted in the articles are also listed, with levels of significance set at P<.05. Data on sensitivity and specificity were only used when data on the area under the curve were not available.

RESULTS
Using the combined Ovid-Web of Science strategy, we identified 39 articles that studied the impact of clinical information on test reading.
We excluded 23 articles: 12 because they only measured percentage change, agreement, or proportion correct; 6 because different readers read the tests with and without clinical information; 3 because they only assessed the impact of different clinical information, without comparing that against test reading without clinical information; 1 because previous films were included in the clinical information; and 1 because the study's readers received test reading training between the 2 reading sessions.

Using the specific version of the "Clinical Queries" filter diagnosis and a broad search term history in PubMed, 3503 entries were found identifying only 1 of the 16 articles found with the first strategy. These 3503 entries were also examined, finding no additional articles meeting our inclusion criteria. Using the sensitive version of the same filter, we found 88880 entries that identified only 12 of the 16 articles using our first strategy.

Of the 16 articles thus identified, 11 reported areas under the ROC curve, and 5 reported only sensitivities and specificities. Nine of these 16 reports were from 3 research groups: (1) Berbaum et al; (2) Cooperstein et al and Good et al; and (3) Rickett et al and Tudor et al. Two pairs of articles used the same sets of radiographs, (1) Berbaum et al and (2) Cooperstein et al and Good et al, but the radiographs were read either by readers from different specialties or by different readers.

Details on the topic and design of these articles can be found in TABLE 1. Diagnostic tests examined included cytology of bronchial washings; radiographs of the chest, abdomen, and bones; mammograms; and computed tomography of the head.

(Reprinted) JAMA, October 6, 2004, Vol 292, No. 13

Table 1. Details of Articles Included
Columns: Source; Diagnostic Test Assessed; Diseases Tested; Reference Standard; No. of Test Samples; No. of Readers; Clinical Information Used; Balanced Design; Time Between Reading Sessions. The first group of rows covers articles reporting areas under the ROC curve.

Assessment of these articles using the Quality Assessment of Diagnostic Accuracy Studies tool found that all 16 articles clearly described their selection criteria although only 7 of these were representative of patients receiving the test in practice. Four did not specify the reference standard used, and 1 did not provide sufficient information about the reference standard to enable quality assessment. The remaining 11 of the 16 articles all used well-described reference standards performed over the whole
