Available at: https://www.researchgate.net/publication/269634928 (uploaded by Vladica Veličković, University of Niš, on 17 February 2015)
Ethics
Vladica M. Veličković
Vladica M. Veličković is a Doctor of Medicine, a PhD student in public health, and a full-time teaching assistant at the Public Health Department, Faculty of Medicine, University of Niš, Serbia. His research interests are in the use of computational and mathematical models for public health insight. E-mail: vladica.velickovic@medfak.ni.ac.rs

In 2012, the New England Journal of Medicine published a paper claiming that chocolate consumption could enhance cognitive function. The basis for this conclusion was that the number of Nobel Prize laureates in each country was strongly correlated with the per capita consumption of chocolate in that country. When I read this paper I was surprised that it made it through peer review, because it was clear to me that the authors had committed two common mistakes I see in the biomedical literature when researchers perform a correlation analysis.

Correlation describes the strength of the linear relationship between two observed phenomena (to keep matters simple, I focus on the most commonly used linear relationship, or Pearson's correlation, here). For example, the increase in the value of one variable, such as chocolate consumption, may be followed by the increase in the value of the other one, such as Nobel laureates. Or the correlation can be negative: The increase in the value of one variable may be followed by the decrease in the value of the other. Because it is possible to correlate two variables whose values cannot be expressed in the same units (for example, per capita income and cholera incidence), their relationship is measured by calculating a unitless number, the correlation coefficient. The correlation coefficient ranges in value from -1 to +1. The closer the magnitude is to 1, the stronger the relationship.

The stark simplicity of a correlation coefficient hides the considerable complexity in interpreting its meaning. One error in the New England Journal of Medicine paper is that the authors fell into an ecological fallacy, in which a conclusion about individuals is reached based on group-level data. In this case, the authors calculated the correlation coefficient at the aggregate level (the country), but then erroneously used that value to reach a conclusion about the individual level (eating chocolate enhances cognitive function). Accurate data at the individual level were completely unknown: No one had collected data on how much chocolate the Nobel laureates consumed, or even if they consumed any at all. I was not the only one to notice this error. Many other scientists wrote about this case of erroneous analysis. Chemist Ashutosh Jogalekar wrote a thorough critique on his Scientific American blog The Curious Wavefunction, and Beatrice A. Golomb of the University of California, San Diego, even tested this hypothesis with a team of coauthors, pointing out that there is no link.

Regardless of the scientific community's criticism of this paper, many news agencies reported on this article's results. The paper was never retracted, and to date has been cited 23 times. Even when erroneous papers are retracted, news reports about them remain on the Internet and can continue to spread misinformation. If these faulty conclusions reflecting statistical misconceptions can appear even in the New England Journal of Medicine, I wondered, how often are they appearing in the biomedical literature generally?

The example of chocolate consumption and Nobel Prize winners brings me to another, even more common misinterpretation of correlation analysis: the idea that correlation implies causality. Calculating a correlation coefficient does not explain the nature of a quantitative agreement; it only assesses the intensity of that agreement. The two factors may show a relationship not because they are influenced by each other but because they are both influenced by the same hidden factor. In this case, perhaps a country's affluence affects access to chocolate and the availability of higher education. Correlation can certainly point to a possible existence of causality, but it is not sufficient to prove it.

An eminent statistician, George E. P. Box, wrote in his book Empirical Model Building and Response Surfaces: "Essentially, all [statistical] models are wrong, but some are useful." All statistical models are a description of a real-world phenomenon using mathematical concepts; as such, they are just a simplification of reality. If statistical analyses are carefully designed, in accordance with current good practice guidelines and a thorough understanding of the limitations of the methods used, they can be very useful. But if models are not designed in accordance with the previous two principles, they can be not only inaccurate and completely useless but also potentially dangerous, misleading medical practitioners and the public.
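The points made so far about Pearson's correlation can be sketched numerically. The example below is only an illustration under stated assumptions: every number in it is made up or simulated, the variable names (`countries`, `affluence`, `chocolate`, `laureates`) are hypothetical, and none of it reproduces the data of the paper discussed above. It computes the coefficient from its textbook definition, builds a toy data set in which the country-level correlation is +1 while the correlation inside every country is -1 (the ecological fallacy), and then simulates two variables that are linked only through a shared hidden factor.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r: covariance scaled by both spreads, hence unitless."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# 1. Ecological fallacy (made-up numbers): five "countries", five people each.
#    Inside every country, a higher x goes with a LOWER y.
countries = [
    [(10 * g + d, 10 * g - d) for d in (-2, -1, 0, 1, 2)]
    for g in range(5)
]
r_within = [pearson_r([x for x, _ in c], [y for _, y in c]) for c in countries]

#    The country averages, however, rise together.
mean_x = [sum(x for x, _ in c) / len(c) for c in countries]
mean_y = [sum(y for _, y in c) / len(c) for c in countries]
r_aggregate = pearson_r(mean_x, mean_y)

# 2. Hidden common factor (simulated): "affluence" drives both variables,
#    which never influence each other directly.
rng = random.Random(0)
affluence = [rng.gauss(0, 1) for _ in range(2000)]
chocolate = [a + rng.gauss(0, 0.5) for a in affluence]
laureates = [a + rng.gauss(0, 0.5) for a in affluence]
r_confounded = pearson_r(chocolate, laureates)

#    Subtracting the hidden factor is possible here only because we
#    simulated it; real studies rarely observe the confounder this cleanly.
r_adjusted = pearson_r(
    [c - a for c, a in zip(chocolate, affluence)],
    [v - a for v, a in zip(laureates, affluence)],
)

# 3. Unit change (for example kilograms to grams) leaves r untouched.
r_rescaled = pearson_r([1000 * c for c in chocolate], laureates)

print("within-country r:", r_within)
print("country-level r:", r_aggregate)
print("confounded r:", round(r_confounded, 3), "adjusted r:", round(r_adjusted, 3))
print("rescaled r:", round(r_rescaled, 3))
```

In this toy setup the aggregate coefficient says nothing reliable about individuals, and the strong correlation between the two simulated variables largely disappears once the shared factor is removed.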
[...] individual behavior when analyzed correctly, but that requires individual-level data. Then, modeling at the individual level must be performed in an attempt to determine the connection between individual and aggregate levels. Only then is it possible to conclude whether the correlation at the aggregate level applies to the individual level. Ecologic data alone do not allow one to determine whether ecologic bias is likely to be present for this type of data set; the only solution is to supplement the ecologic data with individual-level data. This type of modeling usually involves mixed or multilevel statistical models, which allow for individuals to be nested into aggregates.

To avoid assuming two variables are independent because their correlation equals zero, the data must be plotted to make sure it is monotonic. If not, one or both variables can be transformed to make them so. In a transformation, all values of a variable are recalculated using the same equation, so that the relationship between the variables is maintained but their distribution is changed. Different [...]

[...] research team is no longer an advantage but a necessity. Some universities offer the option for researchers to check their analysis with their statistics department before sending the article to review with a publication. Although this solution could work for some researchers, it provides little incentive for the researcher to take this extra time.

The process of scientific research requires adequate knowledge of biostatistics, a constantly changing field. To that end, biostatisticians should be involved in the research from the very beginning, not after the measurement, observations, or experiments are completed. On the other hand, basic knowledge of biostatistics is essential in the critical appraisal of published scientific papers. A critical approach must exist regardless of the journal in which the paper is published. A more careful use of statistics in biology can also help set more rigorous standards for other fields.

Aldrich, J. 1995. Correlations genuine and spurious in Pearson and Yule. Statistical Science 10:364-376.
Andrade, A. J. M., S. W. Grande, C. E. Talsness, K. Grote, and I. Chahoud. 2006. A dose-response study following in utero and lactational exposure to di-(2-ethylhexyl)-phthalate (DEHP): Non-monotonic dose response and low dose effects on rat brain aromatase activity. Toxicology 227:185-192.
Anscombe, F. J. 1973. Graphs in statistical analysis. American Statistician 27:17-21.
David, H. A. 2009. A historical note on zero correlation and independence. The American Statistician 63:185-186.
Hill, A. B. 1965. The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine 58:295-300.
King, G. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton, NJ: Princeton University Press.
Lemmens, P. 2010. U-shaped curve. In N. Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: SAGE Publications. pp. 1587-1589. doi: 10.4135/9781412961288.n485.
Pearl, J. 2009. Causal inference in statistics: An overview. Statistics Surveys 3:96-146.
Wakefield, J. 2009. Multi-level modelling, the ecologic fallacy, and hybrid study designs. International Journal of Epidemiology 38:330-336. doi: 10.1093/ije/dyp179.
Zadnik, K., et al. 2000. Myopia and ambient night-time lighting. Nature 404:143-144.
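As a rough illustration of the earlier point that a zero correlation does not establish independence, the sketch below uses made-up U-shaped data in which one variable is completely determined by the other, yet Pearson's coefficient is essentially zero. The data, the variable names, and the choice of transformation (folding x at the bottom of the curve with an absolute value) are assumptions made for this example only; in practice a suitable transformation depends on what a plot of the data reveals.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from its textbook definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical U-shaped data: y depends entirely on x, but not monotonically.
xs = [i / 10 for i in range(-50, 51)]  # symmetric around zero
ys = [x ** 2 for x in xs]

# The falling and rising halves of the curve cancel out in the coefficient.
r_raw = pearson_r(xs, ys)

# Recalculate every x with the same equation: distance from the minimum.
# The transformed relationship is monotonic, so r becomes informative.
xs_folded = [abs(x) for x in xs]
r_folded = pearson_r(xs_folded, ys)

print(f"raw r = {r_raw:.3f}, r after transformation = {r_folded:.3f}")
```

Plotting `xs` against `ys` would reveal the U shape immediately, which is why the plot should come before the coefficient.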