

EDITORIAL

REVIEWS AND COMMENTARY

Submissions to Radiology: Our Top 10 List of Statistical Errors¹

Deborah Levine, MD; Alexander A. Bankier, MD; Elkan F. Halpern, PhD

Published online: 10.1148/radiol.2532090759. Radiology 2009;253:288–290


¹ From the Department of Radiology, Beth Israel Deaconess Medical Center, 330 Brookline Ave, Boston, MA 02215 (D.L., A.A.B.); and Institute for Technology Assessment, Massachusetts General Hospital, Boston, Mass (E.F.H.). Received May 1, 2009; final version accepted June 1. Address correspondence to D.L. (e-mail: dlevine@bidmc.harvard.edu).

Authors stated no financial relationship to disclose. © RSNA, 2009

Basic statistical concepts are the underpinnings of appropriate study design. In 2001, Radiology introduced systematic statistical review of all published manuscripts that contain statistical content. In 2002–2004, Radiology published a series of articles on statistics and radiology (1–18). As a series, these articles can be found by clicking on "Statistical Concept Series" on the Radiology Web site at http://radiology.rsnajnls.org/misc/collect_index.shtml. The articles were designed to provide readers of Radiology with an understanding of the basic concepts of statistics, probability, and scientific methods that are used in the medical literature. As stated by Dr Anthony Proto when he introduced the Statistical Concept Series: "One of the most common comments by our statistical reviewers is that authors have selected inappropriate statistical tests for the analysis of their data. We urge authors to consult with statisticians regarding the analysis of their data. It is particularly important that a study be designed and data be collected in a manner that will allow the study hypothesis to be adequately evaluated. Statistical consultation in the study-planning stages can help ensure success in this regard" (19). Although these initiatives emphasized the importance of statistics, and despite reports on how to improve accuracy studies with use of Standards for Reporting of Diagnostic Accuracy criteria (20–22), we continue to encounter statistical issues frequently in the manuscripts submitted to our journal.

Research manuscripts published in Radiology currently undergo three levels of review prior to acceptance: peer review, editor/deputy editor review, and statistical review. In August 2007, our editorial office in Boston, Mass, began reviewing original research manuscripts. At that time, we began weekly editorial meetings at which all research manuscripts that may potentially be accepted for publication are reviewed and discussed. A consultant statistician is present at these meetings and screens the manuscripts under discussion for the validity of their statistical methods. In our editorial meetings we have noted a number of recurrent issues in the study design and statistical analysis of submitted manuscripts that serve to lessen their overall impact. We thought it would be helpful to our authors to summarize our suggestions for avoiding a number of the most common errors and problems in statistical analysis that we encounter in submissions to Radiology. Please note that these are not listed in order of frequency or importance. We present each issue together with a brief explanation and provide recommendations and resources to authors for further understanding and potentially eliminating these problems.

Top 10 list of our suggestions for avoiding statistical problems with manuscripts submitted to Radiology:

1. Consult a statistician during the study design phase to review study size, the data to be collected, and the type of analysis that will be performed on the data obtained.

2. Make sure that the size of the study group is sufficient to justify the conclusions you are reporting. Account for the statistical power (or lack thereof) in your study (5).
Explanation: Many studies are statistically underpowered. Some of this is a reflection of the desire to rapidly publish studies of cutting-edge technology before a large enough study population can be accrued. If a study has positive results (ie, a significant difference is found between two groups), statistical power is less of an issue. However, in demonstrating that there is no statistical difference between two groups, a power analysis is mandatory. Otherwise, it is impossible to determine whether the lack of difference is a consequence of the small sample size studied.
Tip: Perform a statistical power analysis before starting the study to ensure that the sample size is sufficient (1,2). In preliminary studies and technical developments that report equivalence based on lack of statistical significance, perform a post hoc power analysis to determine whether your sample size was sufficient to make a meaningful statement about the lack of significance of your results. Remember that biologic variation in a small sample may not adequately represent the spectrum of disease.
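As a rough illustration of a prospective power calculation, the following is a minimal sketch in Python (assuming SciPy is available); the effect size, alpha, and power values are illustrative and are not drawn from this editorial.

from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80):
    # Approximate sample size per group for a two-sided, two-sample
    # comparison of means, using the normal approximation:
    # n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)            # quantile corresponding to the desired power
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# Detecting a standardized difference (Cohen's d) of 0.5 with 80% power
# at a two-sided alpha of .05 requires roughly 63 subjects per group.
print(round(n_per_group(0.5)))

Anything beyond this back-of-the-envelope approximation (unequal groups, paired designs, diagnostic accuracy end points) is best handled with dedicated software or, better, with a statistician.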


3. Analyze all of the data from each step in the methods.
Explanation: Authors frequently choose which variables to report on the basis of the significance obtained (ie, good results) instead of presenting all of their data.
Tip: If you did analyze some finding, report it. Do not try to avoid adjusting for multiple comparisons by pretending you never performed those nonsignificant ancillary tests.

4. In a diagnostic performance study, be sure to account for true-negative cases in your population.
Explanation: Many studies have histologic confirmation only in those patients who had positive findings at imaging. If you have such a population and include only those with histologic confirmation of the finding, you will likely bias your study sample by excluding those who did not have tissue proof of the diagnosis; this leads to under-representation of the true-negative cases in your population. If instead you assume that all those without histologic confirmation were reference-standard negative cases, you will miss all the false-negative cases in the population. These issues compromise the calculation of specificity and sensitivity, respectively. An imperfect solution is to use correlative imaging, follow-up imaging, and clinical follow-up as reference standards; however, these may have associated limitations that should be addressed in your discussion.
Tip: In a lesion-level analysis, be sure to explain how you calculated the number of true-negative findings.
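To make the bookkeeping concrete, here is a minimal Python sketch using purely hypothetical lesion counts (none of these numbers come from the editorial); it also shows why specificity simply cannot be estimated when the true-negative count is unknown.

# Hypothetical lesion-level counts (illustrative only).
tp, fp, fn, tn = 45, 10, 5, 140

sensitivity = tp / (tp + fn)   # proportion of truly positive lesions detected
specificity = tn / (tn + fp)   # cannot be estimated without true negatives

print(f"Sensitivity: {sensitivity:.2f}")   # 0.90
print(f"Specificity: {specificity:.2f}")   # 0.93

# If only biopsy-proved (mostly imaging-positive) lesions were included,
# tn and fn would be unknown and neither estimate above could be computed reliably.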

5. Use confidence intervals to assess the extent of differences (8).
Explanation: Statistical significance and clinical significance are not synonymous. Just because a difference is statistically significant does not mean it is enough to alter clinical practice or case management (and vice versa). A P value of less than .05 for a difference between 1% and 2% of a measured value is statistically significant but probably clinically unimportant, since such a difference could not be used to differentiate a normal from an abnormal population.
Tip: A confidence interval for the difference is necessary to assess clinical significance (a minimal sketch appears after item 7).

6. Use a statistical test that considers clustering effects when a study subject has more than one lesion.
Explanation: Lesions in a subject may be similar to each other, and therefore a lesion-by-lesion analysis may not represent the same biologic variability as would be seen if only a single lesion per patient were assessed.
Tip: Use a generalized estimating equation or another method that incorporates terms associating lesions within a patient (see the sketch after item 7).

7. Use a statistical test that corrects for multiple comparisons when a large number of variables are being analyzed (11).
Explanation: The more variables analyzed, the more likely it is that one will be significant by chance. For example, if you perform 20 tests, on average at least one is likely to be significantly different at the P < .05 level by chance alone. Another example is in group comparisons: if there are five groups to compare, there are 10 possible pairwise comparisons, and at least one is likely to be significant by chance.
Tip: With more than one variable, either use a multivariate analysis or apply a correction to the univariate analyses, such as the Bonferroni method (18).
Tip: With multiple pairwise comparisons of groups, either correct with an adjustment such as the Bonferroni method or use an analysis of variance (or an analog) and perform the pairwise tests only if the analysis of variance result was significant.
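As rough illustrations of the methods named in items 5–7, the sketches below use Python with NumPy, SciPy, pandas, and statsmodels; every number and data value is hypothetical and chosen only to make the output readable. First, a 95% confidence interval for a difference in proportions (item 5):

import numpy as np
from scipy.stats import norm

# Hypothetical detection counts for two techniques (illustrative only).
x1, n1 = 88, 100   # detections and cases, technique A
x2, n2 = 80, 100   # detections and cases, technique B

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # Wald standard error
z = norm.ppf(0.975)                                      # two-sided 95% interval
low, high = diff - z * se, diff + z * se

# The interval, roughly (-0.02, 0.18) here, shows the range of plausible
# differences and is far more informative than the P value alone.
print(f"difference = {diff:.2f}, 95% CI = ({low:.2f}, {high:.2f})")

Next, a generalized estimating equation that ties lesions to the patient they came from (item 6), assuming the statsmodels formula interface:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical lesion-level data with several lesions per patient.
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    "technique":  [0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
    "detected":   [1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
})

# GEE accounts for the correlation of lesions within the same patient,
# which an ordinary lesion-by-lesion logistic regression would ignore.
gee = smf.gee(
    "detected ~ technique",
    groups="patient_id",
    data=df,
    cov_struct=sm.cov_struct.Exchangeable(),
    family=sm.families.Binomial(),
)
print(gee.fit().summary())

Finally, a Bonferroni correction of 20 univariate P values (item 7):

from statsmodels.stats.multitest import multipletests

# Hypothetical P values from 20 univariate tests (illustrative only).
pvals = [0.001, 0.04, 0.20, 0.51] + [0.60] * 16

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")

# After correction only the first test survives: the nominally "significant"
# second test becomes 0.04 * 20 = 0.80 and is no longer significant.
for p_raw, p_corr, keep in zip(pvals[:4], p_adj[:4], reject[:4]):
    print(f"raw P = {p_raw:.3f}, corrected P = {p_corr:.2f}, significant: {keep}")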

8. Understand the interpretation of a P value (9). A P value of .05 does not mean there is a 5% chance that the null hypothesis is true. Instead, it means that if the null hypothesis were true, there would be a 5% chance of seeing results as extreme as those in the data. Note that it says nothing about the chances of such results if the null hypothesis were false.
Tip: Be careful how you phrase (and interpret) your results (the simulation after item 9 illustrates the point).

9. Understand the difference between correlation and accuracy.
Explanation: Even if your new measure is highly correlated with the traditional measurement, the two may not be the same. For example, the Celsius and Kelvin temperature scales are perfectly correlated but do not have the same numerical values.
Tips: Test for bias by testing whether the averages of the two sets of data are the same; test whether the coefficient of proportionality is one; determine the typical error (ie, how far apart values tend to be); and investigate whether the error is uniform or is relative to the value. A helpful first step toward this goal is to create a Bland-Altman plot, which illustrates how far apart two sets of measurements are relative to the mean of those measurements (23). A slightly different take on this was described in the article by Kundel and Polansky: "High accuracy implies high agreement, but high agreement does not necessarily imply high accuracy" (13).
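As a small illustration of what a P value does and does not mean (item 8), this Python sketch (assuming NumPy and SciPy) repeatedly compares two samples drawn from the same population; about 5% of the comparisons come out "significant" even though the null hypothesis is true by construction.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_significant = 0
for _ in range(1000):
    a = rng.normal(0, 1, 30)   # both samples come from the same population,
    b = rng.normal(0, 1, 30)   # so the null hypothesis is true by construction
    if ttest_ind(a, b).pvalue < 0.05:
        n_significant += 1

print(f"{n_significant / 1000:.1%} of tests were 'significant' under a true null")

And for item 9, a minimal Bland-Altman sketch (assuming NumPy and matplotlib; the paired measurements are simulated, not real data):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paired measurements from two methods (illustrative only).
rng = np.random.default_rng(0)
method_a = rng.normal(50, 10, 40)
method_b = method_a + rng.normal(1.5, 2.0, 40)   # method B reads slightly higher

mean_ab = (method_a + method_b) / 2
diff_ab = method_a - method_b

bias = diff_ab.mean()              # systematic difference between the methods
loa = 1.96 * diff_ab.std(ddof=1)   # 95% limits of agreement around the bias

plt.scatter(mean_ab, diff_ab)
plt.axhline(bias, linestyle="--")
plt.axhline(bias + loa, linestyle=":")
plt.axhline(bias - loa, linestyle=":")
plt.xlabel("Mean of the two measurements")
plt.ylabel("Difference (method A - method B)")
plt.title("Bland-Altman plot")
plt.show()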

10. Report on variability in readers.
Background: Multiple independent readers are better than a single reader, since a single reader may have great experience that is difficult for others to duplicate. However, when you use multiple readers, be careful not to have them merely interpret studies in consensus, since individual variability is lost when only consensus readings are used. The value of a diagnostic tool may depend on the reader, perhaps because of experience or some other characteristic; multiple readers can reveal this. However, analyzing a consensus reading can be misleading unless clinical practice will also require consensus interpretation. This important topic will be the subject of a forthcoming editorial in Radiology.
Tip: Report the results provided by each reader and assess inter-reader variability. Avoid consensus reading whenever possible.
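As a minimal illustration of reporting inter-reader agreement rather than a consensus read, the Python sketch below computes Cohen's kappa for two readers from hypothetical binary ratings; the editorial does not prescribe this particular statistic, it is simply one common choice.

from collections import Counter

# Hypothetical binary ratings (1 = lesion present) from two independent readers.
reader1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
reader2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

n = len(reader1)
observed = sum(a == b for a, b in zip(reader1, reader2)) / n   # observed agreement

# Expected chance agreement, from each reader's marginal rating frequencies.
c1, c2 = Counter(reader1), Counter(reader2)
expected = sum(c1[k] / n * c2[k] / n for k in (0, 1))

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")

For ordinal or continuous readings, a weighted kappa or an intraclass correlation coefficient may be more appropriate; see the observer-agreement article in the Statistical Concept Series (13).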

References

1. Eng J. Sample size estimation: how many individuals should be studied? Radiology 2003;227(2):309–313.
2. Eng J. Sample size estimation: a glimpse beyond simple formulas. Radiology 2004;230(3):606–612.
3. Applegate KE, Crewson PE. An introduction to biostatistics. Radiology 2002;225(2):318–322.
4. Applegate KE, Crewson PE. Statistical literacy. Radiology 2004;230(3):613–614.
5. Sunshine JH, Applegate KE. Technology assessment for radiologists. Radiology 2004;230(2):309–314.
6. Sonnad SS. Describing data: statistical and graphical methods. Radiology 2002;225(3):622–628.
7. Halpern EF, Gazelle GS. Probability in radiology. Radiology 2003;226(1):12–15.
8. Medina LS, Zurakowski D. Measurement variability and confidence intervals in medicine: why should radiologists care? Radiology 2003;226(2):297–301.
9. Zou KH, Fielding JR, Silverman SG, Tempany CM. Hypothesis testing I: proportions. Radiology 2003;226(3):609–613.
10. Zou KH, Tuncali K, Silverman SG. Correlation and simple linear regression. Radiology 2003;227(3):617–622.
11. Tello R, Crewson PE. Hypothesis testing II: means. Radiology 2003;227(1):1–4. [Published correction appears in Radiology 2003;229(3):934.]
12. Langlotz CP. Fundamental measures of diagnostic examination performance: usefulness for clinical decision making and research. Radiology 2003;228(1):3–9.
13. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228(2):303–308.
14. Sistrom CL, Garvan CW. Proportions, odds, and risk. Radiology 2004;230(1):12–19.
15. Obuchowski NA. Special topics III: bias. Radiology 2003;229(3):617–621.
16. Obuchowski NA. Receiver operating characteristic curves and their use in radiology. Radiology 2003;229(1):3–8.
17. Gareen IF, Gatsonis C. Primer on multiple regression models for diagnostic imaging research. Radiology 2003;229(2):305–310.
18. Gonen M, Panageas KS, Larson SM. Statistical issues in analysis of diagnostic imaging experiments with multiple observations per patient. Radiology 2001;221(3):763–767.
19. Proto AV. Radiology 2002: statistical concept series [editorial]. Radiology 2002;225(2):317.
20. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD Initiative. Radiology 2003;226(1):24–28.
21. Smidt N, Rutjes AW, van der Windt DA, et al. Quality of reporting of diagnostic accuracy studies. Radiology 2005;235(2):347–353.
22. Wilczynski NL. Quality of reporting of diagnostic accuracy studies: no change since STARD statement publication, before-and-after study. Radiology 2008;248(3):817–823.
23. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1(8476):307–310.

