Article in Journal of Applied Physiology · October 1998. DOI: 10.1152/jappl.1998.85.3.775


invited review
Fundamental concepts in statistics:
elucidation and illustration
DOUGLAS CURRAN-EVERETT, SUE TAYLOR, AND KAREN KAFADAR
Departments of Pediatrics and of Preventive Medicine and Biometrics, School of Medicine,
University of Colorado Health Sciences Center, Denver, 80262; and Department of Mathematics,
University of Colorado at Denver, Denver, Colorado 80217-3364

Curran-Everett, Douglas, Sue Taylor, and Karen Kafadar. Fundamental concepts in statistics: elucidation and illustration. J. Appl. Physiol. 85(3): 775–786, 1998.—Fundamental concepts in statistics form
the cornerstone of scientific inquiry. If we fail to understand fully these
fundamental concepts, then the scientific conclusions we reach are more
likely to be wrong. This is more than supposition: for 60 years, statisti-
cians have warned that the scientific literature harbors misunderstand-
ings about basic statistical concepts. Original articles published in 1996
by the American Physiological Society’s journals fared no better in their
handling of basic statistical concepts. In this review, we summarize the
two main scientific uses of statistics: hypothesis testing and estimation.
Most scientists use statistics solely for hypothesis testing; often, however,
estimation is more useful. We also illustrate the concepts of variability
and uncertainty, and we demonstrate the essential distinction between
statistical significance and scientific importance. An understanding of
concepts such as variability, uncertainty, and significance is necessary,
but it is not sufficient; we show also that the numerical results of
statistical analyses have limitations.
confidence interval; estimation; tolerance interval; uncertainty; variability

There are very few things which we know, which are not capable of being reduc'd to a Mathematical Reasoning, . . . and where a Mathematical Reasoning can be had, it's as great folly to make use of any other, as to grope for a thing in the dark when you have a Candle standing by you.

John Arbuthnot (1692)

STATISTICS IS ONE KIND of mathematical reasoning. Its concepts and principles are ubiquitous in science: as researchers, we use them to design experiments, analyze data, report results, and interpret the published findings of others. Indeed, it is from this foundation of statistical concepts and principles that scientific knowledge is accumulated. If we fail to understand fully these fundamental statistical concepts and principles—if our statistical reasoning is faulty—then we are more likely to reach wrong scientific conclusions. Wrong conclusions based on faulty reasoning are shoddy science; they are also unethical (1, 21, 30).

Regrettably, faulty reasoning in statistics rears its head in the practice of science: for 60 years, statisticians have documented statistical errors in the scientific literature (3, 4, 17, 33, 50). In part, these errors exist because many introductory textbooks of statistics paradoxically hinder literacy in statistics: they emphasize methods rather than concepts, they contain glaring errors, or they perpetuate misconceptions (4, 11, 12).

In his editorial prelude to a series of statistical papers, Yates (51) wrote that the papers were designed to raise statistical consciousness and thereby reduce statistical errors in journals published by the American Physiological Society. Rather than reinforce concepts, these papers reviewed methods: analysis of variance (20), linear regression (37, 46), mathematical modeling (22, 29, 40), risk assessment (36), and statistical packages (34). The proper use of any statistical technique, however, requires an understanding of the fundamental statistical concepts behind the technique.

How well do physiologists understand fundamental concepts in statistics? One way to answer this question is to examine the empirical incidence of basic statistical quantities such as standard deviations, standard errors, and confidence intervals. These quantities characterize different statistical features: standard deviations characterize variability in the population, whereas standard errors and confidence intervals characterize uncertainty about the estimated values of population parameters, e.g., means. Of the original articles published in 1996 by the American Physiological Society, the overwhelming majority (69–93%, range) report standard errors, apparently not as estimates of uncertainty but as estimates of variability (Table 1). Virtually no articles (0–2%, range) report confidence intervals, recommended by statisticians (2, 5, 9, 10, 28, 39) as interval estimates of uncertainty about the values of population parameters. Moreover, few articles (4–15%, range) report precise P values, which precludes personal assessment of statistical significance.

Table 1. Manuscripts for the American Physiological Society's journals in 1996: use of statistics and statisticians

                                                 %Research Manuscripts That Report
                                           n    Standard    Standard   Confidence   P        Statistician†
                                                deviation   error      interval     value*
  Am. J. Physiol.
    Cell Physiol.                          43   21          88         0            7        0
    Endocrinol. Metab.                     28   18          86         0            4        4
    Gastrointest. Liver Physiol.           26    8          92         0            4        12
    Heart Circ. Physiol.                   60   17          87         0            10       3
    Lung Cell. Mol. Physiol.               25   20          84         0            4        4
    Regulatory Integrative Comp. Physiol.  41   17          88         0            15       12
    Renal Fluid Electrolyte Physiol.       27   15          93         0            7        4
  J. Appl. Physiol.                        62   24          79         0            6        10
  J. Neurophysiol.                         58   36          69         2            5        7

n, no. of research manuscripts reviewed. In 1996, these journals published a total of 3,693 original articles. The no. of articles reviewed represents a 10% sample (selected by systematic random sampling, fixed start) of articles published by each journal. * Precise P value: for example, P = 0.02 (rather than P < 0.05) or P = 0.13 (rather than P ≥ 0.05 or P = not significant). † We assessed collaboration with a statistician using author affiliation and acknowledgments. We recognize that a statistician may be affiliated with another department, e.g., medicine. Using our criterion, however, few articles (0–12%, range) report formal collaboration of a physiologist with a statistician, a partnership that typically reaps great rewards.

In this review, we summarize the primary scientific uses of statistics. Then, we illustrate several fundamental concepts: variability, uncertainty, and significance. Last, we illustrate that although an understanding of concepts such as variability, uncertainty, and significance is necessary, it is not sufficient: it is essential to realize also that the numerical results of statistical analyses have limitations.

Glossary

α          Critical significance level
Ave{q}     Average of the quantity q
µ          Population mean
ν          Degrees of freedom
n          Number of observations
N(µ, σ²)   Normal (Gaussian) distribution with mean µ and variance σ²
P          Achieved significance level
Pr{A}      Probability of event A
σ          Population standard deviation
σ_ȳ        Standard deviation of the sampling distribution of the sample mean
s          Sample standard deviation
σ²         Population variance
s²         Sample variance
SE{q}      Standard error of the quantity q
Var{q}     Variance of the quantity q
Y          Random variable Y
yᵢ         Sample observation i, where i = 1, 2, . . . , n
ȳ          Sample mean

SCIENTIFIC USES OF STATISTICS

In science, there are two main uses of statistics: hypothesis testing and estimation. Most researchers use statistics solely for hypothesis testing. In many situations, statisticians play down hypothesis testing and prefer estimation instead.

Hypothesis testing. To test a scientific hypothesis, a researcher must formulate the hypothesis before any data are collected, then design and execute an experiment that is relevant to it. Because the hypothesis is most often one of no difference, the hypothesis is called, by tradition, the null hypothesis.¹ Using data from the experiment, the researcher must next compute the observed value T of a test statistic. Finally, the researcher must compare the observed value T with some critical value T*, chosen from the distribution of the test statistic that is based on the null hypothesis. If T is more extreme than T*, then that is a surprising result if the null hypothesis is true, and the researcher is entitled, on statistical grounds, to become skeptical about the scientific validity of the null hypothesis.

The statistical test of a null hypothesis is useful because it assesses the strength of the evidence: it helps guard against an unwarranted conclusion, or it helps argue for a real experimental effect (19, 48). Nevertheless, a null hypothesis is often an artificial construct: before any data are recorded, the investigator knows—at least, suspects—that the null hypothesis is not exactly true. Moreover, the only question a hypothesis test can answer is a trivial one: is there anything other than random variation here?²

Statisticians have emphasized repeatedly the limited value of hypothesis testing (2, 4, 9, 18, 24, 28, 31, 38, 50). In fact, the P values that result from hypothesis tests have been described as "absurdly academic"³ (25) and as having a "strictly limited role" (19) in data analysis. Within the scientific community, unwar-

¹ The adjective "null" can be misleading: this hypothesis need not be one of no difference. The use of null persists because of historical inertia.
² Kruskal (38) reviews other drawbacks to hypothesis testing.
³ Sir Ronald Fisher, the author of this phrase, developed many statistical procedures, including the analysis of variance.
ranted focus on hypothesis testing has blurred the distinction between statistical significance and scientific importance (3, 13, 19). Most investigators appear to reach scientific conclusions that are based not on their knowledge of science but solely on the probabilities of test statistics (16); this is an untenable approach to scientific discovery.

The limited utility of hypothesis testing can be demonstrated with an example. Suppose a clinician wants to assess the impact of a placebo and the β-blockers bisoprolol and metoprolol on heart rate variability in patients with left heart failure. Suppose also that the clinician constructs the null and alternative hypotheses, H₀ and H₁, as

    H₀: treatments have identical effects on heart rate variability
    H₁: treatments have different effects on heart rate variability

The result of this hypothesis test will fail to convey any information about the direction or magnitude of the treatment effects on heart rate variability. Direction and magnitude are important: in patients with left heart failure, decreases in heart rate variability are associated with increases in the risk of sudden cardiac catastrophe (49). Direction and magnitude of an effect reflect scientific importance; they are obtained by estimation.

Estimation. Regardless of the statistical result of a hypothesis test, the crucial question concerns the scientific result: is the experimental effect big enough to be relevant? A point estimate of a population parameter⁴ and an interval estimate of the uncertainty about the value of that parameter help answer this question. For example, one point estimate of a population mean is the sample mean; one interval estimate of the uncertainty about the value of the population mean is a confidence interval. Interval estimates circumvent the drawbacks inherent to hypothesis testing, yet they provide the same statistical information as a hypothesis test (15, 18, 28, 38). More important, point and interval estimates convey information about scientific importance.

Practical considerations. Estimation focuses attention on the magnitude and uncertainty of the experimental results. We must emphasize that hypothesis testing can have value beyond assessing the strength of the experimental evidence: for example, hypothesis testing is useful if an investigator wants to evaluate the importance of between-subject variability in an experiment. In practice, estimation should be done whenever it is relevant and feasible; the precise P value from the associated hypothesis test should be reported with the point and interval estimates. When more than one hypothesis is tested in an experiment, the problem of multiple comparisons becomes relevant. Nevertheless, a discussion of the issues involved in multiple-comparison procedures is beyond the scope of this review; Refs. 2, 9, 42, and 48 summarize these issues.

For the rest of this review, we focus our attention on several aspects of estimation.

USING SAMPLES TO LEARN ABOUT POPULATIONS

As researchers, we use samples to make inferences about populations. A sample interests us not because of its own merits but because it helps us estimate selected characteristics of the underlying population: for example, the sample mean ȳ estimates the population mean µ.⁵

As an illustration, suppose the random variable Y represents the change in systolic blood pressure after some intervention. Suppose also that the distribution of Y conforms to a normal distribution. A normal distribution is specified completely by two parameters: the mean and variance. The population mean µ conveys the location of the center of the distribution; the population standard deviation σ, the square root of the population variance σ², conveys the spread of the distribution. The distribution of possible outcomes of the random variable Y is described by the normal probability density function (f), which incorporates µ and σ²

    f(y) = [1/(σ√(2π))]·exp{−(y − µ)²/(2σ²)},  for −∞ < y < +∞    (1)

In Fig. 1, the distributions for three different populations are theoretical: each depicts the distribution of population values as if we had observed the entire population.⁶

Suppose we want to estimate µ₁ = −15, the mean of population 1, in Fig. 1. To do this, we would measure the change in systolic blood pressure in a sample of n independent observations, y₁, y₂, . . . , yₙ, from the population. For simplicity, assume we limit the sample to 10 observations. One random sample is

    −33, −15, −6, 0, 18, −3, 8, −22, −22, −7

The average of these sample observations is the sample mean ȳ

    ȳ = (1/10)·Σᵢ₌₁¹⁰ yᵢ = −8.2    (2)

Because of intrinsic variability in the population, the sample mean ȳ differs from the population mean µ₁; only because this is a contrived example do we know the true magnitude of the discrepancy.⁷ Next, we review measures that estimate variability in the population.

⁴ A parameter is a numerical constant: for example, the population mean.
⁵ References 2, 9, 42, and 48 discuss other aspects of sampling.
⁶ Statistical calculations and exercises were executed by using SAS Release 6.04 (SAS Institute, Cary, NC, 1987).
⁷ We address the discrepancy between the value of the sample estimate of a population parameter and the value of the population parameter itself in ESTIMATING UNCERTAINTY ABOUT A POPULATION PARAMETER.
Fig. 1. Using samples to learn about populations: 3 normal distributions. These distributions differ in location, reflected in the mean µ, or spread, reflected in the standard deviation σ. A normal probability density function (Eq. 1) describes the distribution of each population.
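The sampling exercise above can be reproduced numerically. A minimal sketch, assuming NumPy is available (the population parameters are those of population 1 in Fig. 1; the seed is an arbitrary choice):

```python
import numpy as np

# The random sample of 10 changes in systolic blood pressure (mmHg)
# drawn from population 1, for which mu_1 = -15 and sigma**2 = 400.
sample = np.array([-33, -15, -6, 0, 18, -3, 8, -22, -22, -7])

# Eq. 2: the sample mean estimates the population mean mu_1.
y_bar = sample.mean()
print(y_bar)  # -8.2

# A fresh random sample from N(-15, 400) yields a different estimate
# of mu_1 each time, because of intrinsic variability in the population.
rng = np.random.default_rng(1)
print(round(rng.normal(-15, 20, size=10).mean(), 1))
```

Each rerun of the last two lines with a different seed mimics repeating the experiment: the point estimate of µ₁ varies from sample to sample.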

ESTIMATING VARIABILITY IN THE POPULATION

The preceding sample observations, −33, −15, . . . , −7, differ because the population from which they were drawn is distributed over a range of possible values. This intrinsic variability is more than a distraction: it is an integral part of statistics, and the careful study of variability may reveal something about underlying scientific processes (25). The most common measure of the variability among sample observations is the sample standard deviation s, the square root of the sample variance s²

    s² = [1/(n − 1)]·Σᵢ₌₁ⁿ (yᵢ − ȳ)²

(See also Refs. 2, 9, 42, and 48.) The sample standard deviation characterizes the typical distance of an observation from the distribution center; in other words, it reflects the dispersion of individual sample observations about the sample mean. The sample standard deviation s also estimates the population standard deviation σ: the standard deviation of the sample observations −33, −15, . . . , −7 is s = 15.2, which estimates σ = 20.

Most journals would publish the preceding sample mean and standard deviation as

    −8.2 mmHg ± 15.2

The ± symbol, however, is superfluous: the standard deviation is a single positive number. A standard deviation can be reported clearly with notation of this form

    −8.2 mmHg (SD 15.2)

In a table, the symbol SD can be omitted without loss of clarity as long as the table legend identifies the parenthetical value as a standard deviation.

The standard deviation is often a useful index of variability, but in many experimental situations it may be a deceptive one: even subtle departures from a normal distribution can render useless the standard deviation as an index of variability (43); often, the distribution of a biological variable differs grossly from a normal distribution. As one example, the distribution of values for plasma creatinine (26) resembles the skewed distribution depicted in Fig. 2. When the tails of a distribution are elongated, as is the right tail of this skewed distribution, the sample standard deviation will be an inflated measure of variability in the population (43, 48). There are two remedies to this misrepresentation of variability by the standard deviation: use another measure of variability, or transform the data.

Alternative measures of variability. Two measures of variability that are useful with a variety of distributions are the mean absolute deviation and the interquartile range. The mean absolute deviation (Ave{|dev|}) is the average distance of the sample observations from the sample mean

    Ave{|dev|} = (1/n)·Σᵢ₌₁ⁿ |yᵢ − ȳ|

The interquartile range (often designated as IQR) encompasses the middle 50% of a distribution and is the difference between the 75th and 25th percentiles. For 0 < w < 1, the 100wth percentile is the value below which 100w% of the distribution is found.

Data transformation. When the sample observations happen to be drawn from a population that has a skewed distribution (e.g., a constituent of blood or the growth rate of a tumor), a transformation may change the shape of their distribution so that the distribution of the transformed observations is more symmetric (14, 23, 26, 32, 48). Common transformations include the logarithmic, inverse, square root, and arc sine transformations. The APPENDIX reviews a useful family of data transformations.
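The three measures of variability just described can be computed directly for the sample above. A sketch assuming NumPy (note that NumPy's default percentile interpolation is only one of several conventions for small samples, so the IQR value depends on that choice):

```python
import numpy as np

sample = np.array([-33, -15, -6, 0, 18, -3, 8, -22, -22, -7])
y_bar = sample.mean()  # -8.2

# Sample standard deviation s: ddof=1 gives the n - 1 denominator.
s = sample.std(ddof=1)
print(round(s, 1))  # 15.2, which estimates sigma = 20

# Mean absolute deviation: average distance from the sample mean.
mad = np.abs(sample - y_bar).mean()
print(round(mad, 2))  # 11.84

# Interquartile range: 75th minus 25th percentile.
q25, q75 = np.percentile(sample, [25, 75])
print(q75 - q25)
```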
Fig. 2. Estimating variability in the population: a skewed distribution. The lognormal probability density function (Eq. A1) describes this skewed distribution, in which Pr{Y ≤ 6.1} = 0.50 and Pr{2.1 ≤ Y ≤ 16.4} = 0.68 (gray area). For a normal distribution with the same mean and variance (inset), Pr{Y ≤ 10.0} = 0.50 and Pr{−3.1 ≤ Y ≤ 23.1} = 0.68 (gray area). See APPENDIX for further explanation.

In the next section, we revisit the unknown discrepancy between the sample estimate of a population parameter and the population parameter itself.

ESTIMATING UNCERTAINTY ABOUT A POPULATION PARAMETER

In the sampling exercise from USING SAMPLES TO LEARN ABOUT POPULATIONS, the sample mean ȳ = −8.2 (Eq. 2) estimated the population mean µ₁ = −15. If we had calculated this sample mean from experimental observations, then we would be uncertain about the magnitude of the discrepancy between the sample estimate ȳ and the population parameter µ₁. The ability to estimate the level of uncertainty about the value of a population parameter by using the sample estimate of that parameter is a powerful aspect of statistics (47).

Suppose we measure the same response variable, the change in systolic blood pressure, in a second sample of 10 independent observations drawn from the same population. We know beforehand that because of random sampling the mean of the second sample, ȳ₂, will differ from the mean of the first sample, ȳ₁ = −8.2. If we measure the change in systolic blood pressure in 100 samples of 10 independent observations, then we expect 100 different estimates of the population mean µ₁; for example

    ȳ₁ = −8.2, ȳ₂ = −8.1, · · ·, ȳ₁₀₀ = −22.5

If we treat these 100 observed sample means as 100 observations, then we can calculate their mean and standard deviation, designated as Ave{ȳ} and SD{ȳ}

    Ave{ȳ} = −14.5 and SD{ȳ} = 6.07

We can generalize from this empirical distribution of sample means to a theoretical distribution of the sample mean for a sample of size n. Consider a random variable Y that is distributed normally with mean µ and variance σ², which are known; the notation for this normal distribution is Y ~ N(µ, σ²). If an infinite number of samples, each with n independent observations, is drawn from this normal distribution, then the sample means ȳ₁, ȳ₂, . . . will also be distributed normally.⁸ The average of the sample means, Ave{ȳ}, is the population mean µ, but the variance of the sample means (Var{ȳ}) is smaller than the population variance σ² by a factor of 1/n

    Ave{ȳ} = µ and Var{ȳ} = σ²_ȳ = σ²/n

(The APPENDIX derives these expressions. Figure 3 develops these expressions using empirical examples.) Therefore, the standard deviation of the theoretical distribution of the sample mean, σ_ȳ, is

    σ_ȳ = σ/√n

If the sample size n increases, then the standard deviation σ_ȳ will decrease: that is, the more sample observations we have, the more certain we will be that the point estimate ȳ is near the actual population mean µ.

The standard deviation of the theoretical distribution of the sample mean is known also as the standard error of the sample mean, SE{ȳ}; that is

    SE{ȳ} = σ/√n    (3)

In estimation, the standard error of the mean has no particular value; instead, it is useful because of its role

⁸ The Central Limit Theorem states that the theoretical distribution of the sample mean will be approximately normal, regardless of the distribution of the original observations (35, 42). If the distribution of the original observations happens to be normal, then the theoretical distribution of the sample mean will be exactly normal.
Fig. 3. Estimating uncertainty about a population parameter: empirical distributions of sample means. These distributions are based on 1,000 samples of 5 (A), 10 (B), 20 (C), or 40 (D) observations drawn at random from population 1, for which the mean µ = −15 and the variance σ² = 400. For each empirical distribution, the average of the sample means, Ave{ȳ}, happens to be −15.1. As sample size increases, however, the sample means become concentrated more closely about Ave{ȳ}. When sample size doubles, the variance of the sample means, Var{ȳ}, is approximately halved.
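The behavior summarized in Fig. 3 can be replicated with a short simulation. A sketch assuming NumPy (the seed is arbitrary, so the simulated values will differ slightly from those in the figure):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = -15, 20  # population 1: mean -15, variance 400

# Draw 1,000 samples of size n and summarize the 1,000 sample means:
# their variance should be close to the theoretical sigma**2 / n.
for n in (5, 10, 20, 40):
    means = rng.normal(mu, sigma, size=(1000, n)).mean(axis=1)
    print(n, round(means.mean(), 1), round(means.var(ddof=1), 1),
          sigma**2 / n)
# Doubling n approximately halves the observed variance of the
# sample means, as Fig. 3 illustrates.
```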

in the calculation of a confidence interval for the population mean µ.⁹

Confidence intervals. When we construct a confidence interval for the population mean, we assign numerical bounds to the expected discrepancy between the sample mean ȳ and the population mean µ. In essence, a confidence interval is a range that we expect, with some level of confidence, to include the actual value of the population mean. Below, we use the theoretical distribution of the sample mean to derive the confidence interval for the population mean µ.¹⁰

In the theoretical distribution of the sample mean, 100(1 − α)% of the possible sample means is included in the interval

    [µ − a, µ + a]    (4)

where the allowance a is

    a = z_{α/2}·SE{ȳ}    (5)

In Eq. 5, z_{α/2} is the 100[1 − (α/2)]th percentile from the standard normal distribution, i.e., a normal distribution with mean 0 and variance 1, and SE{ȳ} is defined by Eq. 3. Therefore, when the population standard deviation σ is known, 95% of the possible sample means are within 1.96·SE{ȳ} of the population mean µ.

The interval in Eq. 4 can be written as the probability expression

    Pr{µ − a ≤ ȳ ≤ µ + a} = 1 − α

which declares that the probability is 1 − α that a sample mean lies within the interval [µ − a, µ + a]. After algebraic rearrangement, this expression can be written

    Pr{ȳ − a ≤ µ ≤ ȳ + a} = 1 − α

but note that the randomness resides in the parameter estimate ȳ, not in the actual parameter µ. In this form, the interval

    [ȳ − a, ȳ + a]    (6)

is called the 100(1 − α)% confidence interval for the population mean µ.

In practice, the sample standard deviation s estimates the population standard deviation σ, which means that s/√n estimates the standard error of the mean (Eq. 3). In calculating a 100(1 − α)% confidence interval for the mean µ, this uncertainty about the actual value of σ is handled by replacing z_{α/2} in Eq. 5 with t_{α/2,ν}, the 100[1 − (α/2)]th percentile from a Student t distribution with ν = n − 1 degrees of freedom. Therefore, the allowance applied to the sample mean to obtain the 100(1 − α)% confidence interval for the population mean (Eq. 6) is

    a = t_{α/2,ν}·SE{ȳ}

where SE{ȳ} = s/√n. Note that this allowance exceeds the allowance in Eq. 5: there is greater uncertainty about the value of the population mean µ. This happens because if n < ∞, then t_{α/2,ν} > z_{α/2} for all values of α.

Suppose we want to calculate a confidence interval for the population mean µ₁ = −15 by using the observations −33, −15, . . . , −7 of the first sample. The mean and standard deviation of these 10 observations are ȳ = −8.2 and s = 15.2. Therefore, the estimated standard error of the mean is

    SE{ȳ} = s/√n = 15.2/√10 = 4.81

Because n = 10, there are ν = n − 1 = 9 degrees of freedom. If we want a 95% confidence interval, then α = 0.05, t_{α/2,ν} = 2.26, and the allowance a = 2.26 × 4.81 = 10.9. Therefore, the 95% confidence interval is

    [−19.1, +2.7]

In other words, we can declare, with 95% confidence, that the population mean is included in the interval [−19.1, +2.7].

Bear in mind that a single confidence interval either does or does not include the value of the population

⁹ References 2, 9, 42, and 48 discuss the calculation of confidence intervals for other population parameters.
¹⁰ Moses (Ref. 42, p. 113–117) illustrates further the concept of a confidence interval by using empirical examples.
Fig. 4. Estimating uncertainty about a population parameter: 95% confidence intervals for a population mean. These confidence intervals are for 100 samples of 10 observations drawn at random from population 1 in Fig. 1. It is because of the random sampling that the position and length of the confidence interval vary from sample to sample. About 95 of these intervals—the actual number will vary—are expected to cover the population mean of −15 mmHg. In this example, 98 of the confidence intervals cover the population mean µ; the 2 exceptions are highlighted (heavy black lines numbered 1 and 2).
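Both the worked confidence interval and the coverage behavior illustrated in Fig. 4 can be checked numerically. A minimal sketch, assuming NumPy and SciPy are available (the seed is arbitrary, so the coverage count will differ from the 98 in Fig. 4):

```python
import numpy as np
from scipy import stats

# 95% confidence interval for the first sample (Eqs. 3-6).
sample = np.array([-33, -15, -6, 0, 18, -3, 8, -22, -22, -7])
n = len(sample)
se = sample.std(ddof=1) / np.sqrt(n)    # SE{y_bar} = s/sqrt(n) = 4.81
t_crit = stats.t.ppf(0.975, df=n - 1)   # t_{alpha/2,nu} = 2.26
a = t_crit * se                         # allowance = 10.9
lo, hi = sample.mean() - a, sample.mean() + a
print(round(lo, 1), round(hi, 1))  # -19.1 2.7

# Coverage: draw 100 samples of 10 observations from population 1
# (mu = -15, sigma = 20) and count how many 95% intervals cover mu.
rng = np.random.default_rng(42)
covered = 0
for _ in range(100):
    y = rng.normal(-15, 20, size=n)
    a = t_crit * y.std(ddof=1) / np.sqrt(n)
    covered += (y.mean() - a) <= -15 <= (y.mean() + a)
print(covered)  # expected to be near 95; the exact count varies
```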

parameter; in experimental situations, we are uncertain about which of these outcomes has occurred. Instead, the level of confidence in a confidence interval is based on the concept of drawing a large number of samples, each with n observations, from the population. When we measured the change in systolic blood pressure in 100 random samples, we obtained 100 different sample means and 100 different sample standard deviations. As a consequence, we will calculate 100 different 100(1 − α)% confidence intervals; we expect about 100(1 − α)% of these observed confidence intervals to include the actual value of the population mean (see Fig. 4).

A confidence interval characterizes the uncertainty about the estimated value of a population parameter. Sometimes, an investigator may be interested less in the value of the population parameter and more in the distribution of individual observations. A tolerance interval characterizes the uncertainty about the estimated distribution of those individual observations (see APPENDIX).

Next, we illustrate the distinction between statistical significance and scientific importance. Last, we show that the numerical results of statistical analyses have limitations.

STATISTICAL AND SCIENTIFIC SIGNIFICANCE DIFFER

Hypothesis testing, as the primary scientific use of statistics, has a drawback: the result of a hypothesis test conveys mere statistical significance. In contrast, estimation conveys scientific significance.¹¹ This distinction is obvious if we use the results of a recent clinical trial. In this trial, the Systolic Hypertension in the Elderly Program (SHEP) Cooperative Research Group (45) evaluated the impact of antihypertensive drugs on the incidence of stroke in persons with isolated systolic hypertension. When compared with placebo, these drugs reduced by 36% (P = 0.0003) the incidence of stroke. Associated with this reduced incidence of stroke was a greater decrease in systolic blood pressure.

To appreciate the distinction between statistical significance and scientific importance, consider two populations that represent the theoretical distributions of the decreases in systolic blood pressure for the two groups. Let the decrease in systolic blood pressure of the placebo group be designated Y₁ and that of the drug treatment group be designated Y₂. Assume that Y₁ and Y₂ are distributed normally

    Y₁ ~ N(µ₁, σ₁²) and Y₂ ~ N(µ₂, σ₂²)

The normal probability density function (Eq. 1), in which approximate values for the observed sample means and variances from the SHEP trial, ȳᵢ and sᵢ², are substituted for the population means and variances, generates the population distributions depicted in Fig. 5

    ȳ₁ = −15 ⇒ µ₁, s₁² = 400 ⇒ σ₁²
    and ȳ₂ = −25 ⇒ µ₂, s₂² = 400 ⇒ σ₂²

Suppose our objective is to estimate the difference between population means

    µ₂ − µ₁ = −25 − (−15) = −10 mmHg

The SHEP group established convincingly that the difference µ₂ − µ₁, which represents the greater decrease in systolic blood pressure after drug therapy, was important. To estimate µ₂ − µ₁, we would sample at random from each population: the difference between sample means, ȳ₂ − ȳ₁, estimates the difference between population means, µ₂ − µ₁.

¹¹ The word "significance," when used to refer to scientific consequence, is ambiguous. Hereafter, we use the word "importance."
Fig. 5. Statistical and scientific significance differ: placebo (black) and drug-treatment (gray) populations. The populations represent theoretical distributions of changes in systolic blood pressure during year 5 of the Systolic Hypertension in the Elderly Program clinical trial (see Ref. 45). The distributions are described by the normal probability density function (Eq. 1) in which the sample means and variances, ȳᵢ and sᵢ², are substituted for the population means and variances. To generate samples of size n from each population, observations (Obs) were drawn at random from the placebo population; corresponding observations from the drug-treatment population were obtained by subtracting 10 from each placebo observation. The sampling procedure is illustrated for n = 2.
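The sampling procedure described in Fig. 5 can be sketched in a few lines, assuming NumPy and SciPy are available (the seed and the use of an unpaired t test are illustrative choices, so the simulated P values will not match Table 2 exactly):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (2, 4, 8, 10, 15, 20, 25, 32, 64, 128):
    placebo = rng.normal(-15, 20, size=n)  # placebo population
    drug = placebo - 10                    # force y2 - y1 = -10 mmHg
    t, p = stats.ttest_ind(drug, placebo)
    print(n, round(drug.mean() - placebo.mean(), 1), round(p, 3))
# The estimated difference stays at -10 mmHg at every sample size,
# while the P value shrinks as n grows: statistical significance
# increases even though the scientific effect is unchanged.
```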

By drawing samples of 2–128 observations from each population (Table 2) and by forcing ȳ2 − ȳ1 = −10 (see Fig. 5), the distinction between statistical significance and scientific importance becomes clear. As sample size n grows, the statistical significance increases, from P = 0.71 for n = 2 to P < 0.001 for n = 128. Regardless of sample size, one aspect of scientific importance, that reflected by the difference ȳ2 − ȳ1, remains constant. As sample size increases, uncertainty about the actual difference µ2 − µ1, another aspect of scientific importance characterized by the numerical bounds of the confidence interval, decreases.

Practical considerations. In experimental situations, the distinction between statistical significance and scientific importance can be maintained by routinely addressing two questions: how likely is it that the experimental effect is real, and is the experimental effect large enough to be relevant? The first question can be answered simply: compare the P value, obtained in the hypothesis test, with the critical significance level α, chosen before any data are collected; if P < α, then the experimental effect is likely to be real. The second question can be answered in two steps: calculate a confidence interval for the population parameter, and then assess the numerical bounds of that confidence interval for scientific importance; if either bound of the confidence interval is important from a scientific perspective, then the experimental effect may be large enough to be relevant.
Table 2. Statistical and scientific significance differ: statistical results

  n    ȳ2 − ȳ1   SE{ȳ2 − ȳ1}   95% Confidence Interval*      t†     Pr{µ2 − µ1 = 0}‡
  2     −10        23.1         −110 to +90                −0.43    0.71
  4     −10        22.6          −65 to +45                −0.44    0.67
  8     −10        12.3          −36 to +16                −0.81    0.43
 10     −10        10.1          −31 to +11                −0.99    0.34
 15     −10         7.3          −25 to  +5                −1.38    0.18
 20     −10         5.8          −22 to  +2                −1.74    0.09
 25     −10         5.3          −21 to  +1                −1.88    0.07
 32     −10         4.7          −19 to  −1                −2.13    0.04
 64     −10         3.7          −17 to  −3                −2.70   <0.01
128     −10         2.4          −15 to  −5                −4.25   <0.001

n, Sample size drawn from placebo (population 1) and drug treatment (population 2) populations (see Fig. 5). *Confidence interval for the difference between population means, µ2 − µ1 (see Eq. A2). †Test statistic used to evaluate statistical significance of the difference ȳ2 − ȳ1 (see Eq. A3). ‡Probability (2-tailed) that µ2 − µ1 = 0; this is the significance level P for the null hypothesis H0: µ2 − µ1 = 0. The difference ȳ2 − ȳ1 and the 95% confidence interval for the difference µ2 − µ1 reflect the magnitude and uncertainty of the experimental results. The test statistic t and its associated P value reflect statistical significance. An increase in the no. of observations drawn from each population decreases SE{ȳ2 − ȳ1}: as a consequence, the statistical significance increases (irregularly, because of random sampling), but the estimated difference between population means remains constant at ȳ2 − ȳ1 = −10. The APPENDIX details the statistical equations required to perform this sampling exercise.

Consider the results when 15 sample observations were drawn from the placebo and drug treatment populations: when compared with placebo, the greater decrease in systolic blood pressure after drug therapy was unconvincing from a statistical perspective (P = 0.18). Because the 95% confidence interval was [−25, +5], uncertainty about the actual impact of drug treatment on systolic blood pressure is relatively large. Note, however, that the additional decrease in systolic blood pressure gained by drug treatment may have been as pronounced as 25 mmHg. From a scientific perspective, further studies, designed with greater statistical power, are warranted.

To illustrate that a significant statistical result may have little scientific importance, imagine that systolic blood pressure had been measured in mmH2O rather than in mmHg. Consider the results when 128 sample observations were drawn from the two populations: the greater decrease in systolic blood pressure after drug therapy was compelling from a statistical perspective (P < 0.001). If the confidence interval [−15, −5] is expressed in mmHg (by dividing each bound by 13.6), then the investigator can declare, with 95% confidence, that the magnitude of the greater decrease in systolic blood pressure was 0.4–1.1 mmHg. In this example, the
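The sampling exercise behind Table 2 is simple to sketch in code. The snippet below is an illustration we added, not the authors' original program: it draws placebo observations from N(−15, 20²), forces ȳ2 − ȳ1 = −10 by subtracting 10 from each placebo value (as in Fig. 5), and shows the t statistic growing in magnitude with n while the estimated difference stays fixed.

```python
import math
import random

def two_sample_t(sample1, sample2):
    """Difference between sample means, its standard error SE{y2 - y1},
    and the t statistic for H0: mu2 - mu1 = 0 (equal sample sizes)."""
    n = len(sample1)
    mean1 = sum(sample1) / n
    mean2 = sum(sample2) / n
    var1 = sum((x - mean1) ** 2 for x in sample1) / (n - 1)
    var2 = sum((x - mean2) ** 2 for x in sample2) / (n - 1)
    se = math.sqrt((var1 + var2) / n)   # SE{y2 - y1}
    diff = mean2 - mean1
    return diff, se, diff / se

random.seed(45)
for n in (2, 8, 32, 128):
    placebo = [random.gauss(-15, 20) for _ in range(n)]  # population 1
    drug = [x - 10 for x in placebo]                     # forces y2 - y1 = -10
    diff, se, t = two_sample_t(placebo, drug)
    print(f"n = {n:3d}: y2 - y1 = {diff:6.1f}, SE = {se:5.2f}, t = {t:6.2f}")
```

Converting each t into a P value against the t distribution with 2n − 2 degrees of freedom (Eq. A3) reproduces the pattern of Table 2: the difference never changes, only our certainty about it does.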
investigator can be quite certain of a trivial experimental effect.

Whatever the statistical result of a hypothesis test, assessment of the corresponding confidence interval incorporates the scientific importance of the experimental result.

LIMITATIONS OF STATISTICS

Although the process of scientific discovery requires an understanding of fundamental concepts in statistics, the use of statistics does have limitations. For example, not many of us would accept, solely on the basis of a close temporal relationship, that solar radiation governs stock market prices (Fig. 6). The limitations of statistics are more subtle if an association is plausible.

Imagine this scenario: a neurological syndrome results from impaired production of some neurotransmitter. Drugs A and B, derivatives of the same parent compound, both stimulate production of this neurotransmitter. Just one of the drugs, however, continues to increase neurotransmitter production over its entire therapeutic range. At higher doses, the second drug becomes less effective at boosting neurotransmitter production and causes neurotoxicity. For each drug, Table 3 lists administered drug concentrations and measured increases in neurotransmitter production. If you rely on only the regression statistics in Table 3, which drug is which? If you are unfortunate and happen to have this hypothetical syndrome, then your choice assumes added importance.

Table 3. Limitations of statistics: raw data and regression statistics

  Drug A          Drug B
   x     y         x     y
  10    8.04      10    9.14
   8    6.95       8    8.14
  13    7.58      13    8.74
   9    8.81       9    8.77
  11    8.33      11    9.26
  14    9.96      14    8.10
   6    7.24       6    6.13
   4    4.26       4    3.10
  12   10.84      12    9.13
   7    4.82       7    7.26
   5    5.68       5    4.74

For each drug:
  No. of observations (n) = 11
  Average of x values (x̄) = 9.0
  Average of y values (ȳ) = 7.5
  Equation of regression line: ŷ = 3 + 0.5x
  Standard error of estimate of slope [SE{b1}] = 0.118
  t{H0: slope (β1) = 0} = 4.24
  Sum of squares of x values [Σ(x − x̄)²] = 110.0
  Regression sum of squares = 27.50
  Residual sum of squares = 13.75
  Correlation coefficient (r) = 0.82
  % Total sum of squares explained by regression (R²) = 67%

For drugs A and B, values are administered drug concentration x, measured increase in neurotransmitter production y, and statistics from regression analysis of the first-order model Y = β0 + β1X + ε, where ε is error. Additional regression analyses (23) reveal that this model is inappropriate for drug B (see Fig. 7). Data are from Anscombe (6).

From the regression statistics alone, it is impossible to differentiate the drugs. Their identities are plain, however, when the data are plotted (Fig. 7): drug A increases neurotransmitter production over the entire range of drug concentrations; the increase in neurotransmitter production begins to fall at higher concentrations of drug B.
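The identical regression statistics in Table 3 are easy to verify. The sketch below is our own illustration (the function name is ours): it fits the least-squares line to each drug's data and shows that both data sets yield ŷ ≈ 3 + 0.5x and r ≈ 0.82, even though their dose-response shapes differ.

```python
import math

# Anscombe's data sets 1 and 2, relabeled as drugs A and B (Table 3)
drug_a = [(10, 8.04), (8, 6.95), (13, 7.58), (9, 8.81), (11, 8.33), (14, 9.96),
          (6, 7.24), (4, 4.26), (12, 10.84), (7, 4.82), (5, 5.68)]
drug_b = [(10, 9.14), (8, 8.14), (13, 8.74), (9, 8.77), (11, 9.26), (14, 8.10),
          (6, 6.13), (4, 3.10), (12, 9.13), (7, 7.26), (5, 4.74)]

def least_squares(data):
    """Fit y = b0 + b1*x by ordinary least squares; return (b0, b1, r)."""
    n = len(data)
    xbar = sum(x for x, _ in data) / n
    ybar = sum(y for _, y in data) / n
    sxx = sum((x - xbar) ** 2 for x, _ in data)
    syy = sum((y - ybar) ** 2 for _, y in data)
    sxy = sum((x - xbar) * (y - ybar) for x, y in data)
    b1 = sxy / sxx                    # slope
    b0 = ybar - b1 * xbar             # intercept
    r = sxy / math.sqrt(sxx * syy)    # correlation coefficient
    return b0, b1, r

for name, data in (("A", drug_a), ("B", drug_b)):
    b0, b1, r = least_squares(data)
    print(f"drug {name}: yhat = {b0:.2f} + {b1:.3f}x, r = {r:.2f}")
```

Plotting the two data sets, as in Fig. 7, is what exposes the curvature in drug B that these summary statistics conceal.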
Practical considerations. Data graphics are essential also if the requisite assumptions behind a particular statistical technique are to be verified. For examples in regression, see chapt. 3 in Ref. 23.

Fig. 6. Limitations of statistics: solar radiation and New York stock market prices during 1929 (after Ref. 27). In general, increases in stock prices were associated with decreases in solar radiation. This nonsensical association illustrates the phenomenon of spurious correlation.

Fig. 7. Limitations of statistics: scatterplots of drug concentration x and increase in neurotransmitter production y. For each drug, the fitted first-order model ŷ = 3 + 0.5x and corresponding regression statistics are identical (see Table 3). For only drug A, however, is this first-order relationship plausible. For drug B, a second-order model of the form Y = β0 + β1X + β2X² + ε is required.

SUMMARY

It is depressing to find how much good biological work is in danger of being wasted through incompetent and misleading analysis . . .
— Frank Yates and Michael J. R. Healy (1964)

This scathing remark, written almost 35 years ago (50) but relevant even now (4), reflects the frustrations felt by statisticians over the statistical misconceptions held by scientists. These misconceptions exist in large part because of shortcomings in the cursory statistics education we received in graduate or medical school (4, 11, 12). The major defect in most introductory courses in statistics is that fundamental concepts in statistics, the cornerstone of scientific inquiry (47), are neglected rather than emphasized (4, 7, 17, 44, 50). Statisticians share responsibility with other faculty for ensuring
that introductory courses in statistics are relevant and sound (7, 44, 50).

In this review, we have reiterated the primary role of statistics within science to be one of estimation: estimation of a population parameter or estimation of the uncertainty about the value of that parameter. Moreover, we have demonstrated the essential distinction between statistical significance and scientific importance; of the two, scientific importance merits more consideration. We have shown also that without data graphics, data analysis is a game of chance. And last, that this review was written by a physiologist and two statisticians embodies one of the most basic notions in all science: collaboration.

APPENDIX

This APPENDIX reviews the lognormal distribution (a distribution that reveals limitations of the standard deviation as an estimate of variability), a versatile family of data transformations, the theoretical distribution of the sample mean, tolerance intervals, the statistical equations required to perform the significance sampling exercise, and the confidence interval for the difference between two population means.

Lognormal distribution. The lognormal distribution is a common probability distribution model for skewed data. The random variable Y is distributed lognormally if the logarithm of Y is distributed normally with mean τ and variance ξ², or ln Y ~ N(τ, ξ²). Formally, the lognormal probability density function g is

g(y) = [1/(yξ√(2π))] · exp{−ln²(y/e^τ)/(2ξ²)}, for y > 0   (A1)

The mean µg and variance σ²g of the lognormal distribution specified by Eq. A1 are

µg = e^(τ + ξ²/2)  and  σ²g = e^(2τ + ξ²) · (e^(ξ²) − 1)

For the distribution in Fig. 2, τ = 1.803 and ξ² = 1; therefore, µg = 10 and σ²g = 172.
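The moment formulas above can be checked numerically. This short snippet is our illustration of the closed-form expressions, evaluated at the parameters of Fig. 2:

```python
import math

def lognormal_moments(tau, xi_sq):
    """Mean and variance of Y when ln Y ~ N(tau, xi_sq)."""
    mean = math.exp(tau + xi_sq / 2)
    var = math.exp(2 * tau + xi_sq) * (math.exp(xi_sq) - 1)
    return mean, var

mu_g, var_g = lognormal_moments(1.803, 1.0)
print(round(mu_g, 1), round(var_g))   # prints: 10.0 172
```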
A family of data transformations. Box and Cox (14) have described a family of power transformations in which an observed variable y is transformed into the variable w by using the parameter λ:

w = (y^λ − 1)/λ  for λ ≠ 0, and
w = ln y         for λ = 0

The inverse (λ = −1) and square root (λ = 0.5) transformations are members of this family. Draper and Smith (Ref. 23, p. 225–226) summarize the steps required to estimate the parameter λ so that the distribution of w is as normal (Gaussian) as possible.
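A minimal sketch of this transformation (our illustration; the function name is ours):

```python
import math

def box_cox(y, lam):
    """Box-Cox power transformation of a positive observation y."""
    if lam == 0:
        return math.log(y)          # the limiting case lambda = 0
    return (y ** lam - 1) / lam

# lam = 0.5 is a shifted, rescaled square root; lam = -1 a shifted reciprocal
print(box_cox(16.0, 0.5))   # 2 * (sqrt(16) - 1) = 6.0
print(box_cox(math.e, 0))   # ln(e) = 1.0
```

The offset and rescaling make the family continuous in λ at 0, which is what permits λ to be estimated from the data.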
Theoretical distribution of the sample mean. Suppose some random variable X is distributed normally with mean µ and variance σ²: that is, X ~ N(µ, σ²). When a sample of n independent observations, x1, x2, . . . , xn, is drawn repeatedly from this distribution, the observed sample means can be treated as observations. These sample means will be distributed normally with mean µ and variance σ²/n, or

Ave{x̄} = µ  and  Var{x̄} = σ²x̄ = σ²/n

As you might expect, there is a mathematical foundation to these relationships. Consider the linear function L

L = k1X1 + k2X2 + ··· + kmXm

For i = 1, 2, . . . , m, each ki is a real constant, and each Xi ~ N(µi, σ²i). The mean of L, Ave{L}, is

Ave{L} = k1µ1 + k2µ2 + ··· + kmµm = Σ kiµi, summed over i = 1 to m

If X1, X2, . . . , Xm are mutually independent, then the variance of L, Var{L}, is

Var{L} = k²1σ²1 + k²2σ²2 + ··· + k²mσ²m = Σ k²iσ²i, summed over i = 1 to m

If the function L is x̄, the mean of the n sample observations x1, x2, . . . , xn, then m = n, and furthermore, for i = 1, 2, . . . , n

ki = 1/n  and  Xi ~ N(µ, σ²)

Therefore

Ave{L} = Σ kiµi = Σ µ/n = n · (µ/n) = µ = Ave{x̄}

and

Var{L} = Σ k²iσ²i = Σ σ²/n² = n · (σ²/n²) = σ²/n = Var{x̄}

Tolerance intervals. A tolerance interval identifies the bounds that are expected to contain some percentage of a population, not just a single population parameter such as the mean (41). If a normal distribution has mean µ and variance σ², which are known, then the 100w% tolerance interval is

[µ − z(1−w)/2 · σ, µ + z(1−w)/2 · σ]

where z(1−w)/2 is the 100[1 − {(1 − w)/2}]th percentile from the standard normal distribution, i.e., N(0, 1). This tolerance interval covers exactly 100w% of the distribution. If w = 0.95, then z(1−w)/2 = 1.96. For the population that represented the change in systolic blood pressure after some intervention (see USING SAMPLES TO LEARN ABOUT POPULATIONS), µ = −15 and σ = 20; therefore, the exact 95% tolerance interval is

[−54, +24]

In practice, the sample statistics ȳ and s are used to estimate the population parameters µ and σ. This element of uncertainty about the values of µ and σ is handled by replacing z(1−w)/2 with the confidence coefficient k, where k depends on w as well as the sample size n. Therefore, the estimated 100w% tolerance interval is

[ȳ − ks, ȳ + ks]

[If w = 0.95 and n = ∞, then k = z(1−w)/2 = 1.96 as above, when µ and σ were known.] The coefficient k is chosen to enable the declaration, with 100(1 − α)% confidence, that the estimated tolerance interval covers 100w% of the distribution (see Table XIV in Ref. 41).

For the observations listed in USING SAMPLES TO LEARN ABOUT POPULATIONS, ȳ = −8.2 and s = 15.2. Suppose we want to estimate with 95% confidence a 90% tolerance interval based on these results. When we use these percentages and the sample size of 10, the coefficient k = 2.839. Therefore, the
tolerance interval is

[−51, +35]

In other words, we can declare, with 95% confidence, that 90% of persons will have a change in systolic blood pressure of between −51 and +35 mmHg after the intervention. Note that this statement differs markedly from our previous assertion, made also with 95% confidence, that the population mean µ was included in the interval [−19.1, +2.7].

The tolerance intervals outlined above are appropriate only if the distribution of the underlying population is normal; other formulas exist to construct tolerance intervals when the population is distributed nonnormally.
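Both tolerance intervals above can be reproduced directly. This sketch is our illustration; the tolerance factor k = 2.839 (for w = 0.90, 95% confidence, n = 10) is the tabulated value the text cites, not something the code derives:

```python
def tolerance_interval(center, spread, factor):
    """Bounds center +/- factor * spread, rounded as in the text."""
    return round(center - factor * spread), round(center + factor * spread)

# Exact 95% tolerance interval: mu = -15, sigma = 20, z = 1.96
print(tolerance_interval(-15, 20, 1.96))      # (-54, 24)
# Estimated 90% tolerance interval with 95% confidence:
# ybar = -8.2, s = 15.2, k = 2.839 (Table XIV in Ref. 41, n = 10)
print(tolerance_interval(-8.2, 15.2, 2.839))  # (-51, 35)
```

The same helper with factor = 1.96 and the sample statistics would instead give a rough confidence interval for a single observation, which is why the choice of factor, not the arithmetic, carries the statistical meaning.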
Equations for the significance sampling exercise. For two samples of equal size n, the standard error of the difference between sample means, SE{ȳ2 − ȳ1}, is estimated as

SE{ȳ2 − ȳ1} = √[(s²2 + s²1)/n]

where s²j is sample variance.

The 100(1 − α)% confidence interval for µ2 − µ1, the difference between population means, is

[(ȳ2 − ȳ1) − a, (ȳ2 − ȳ1) + a]   (A2)

The allowance a applied to the difference ȳ2 − ȳ1 is

a = tα/2,ν · SE{ȳ2 − ȳ1}

where tα/2,ν is the 100[1 − (α/2)]th percentile from a Student t distribution with ν = 2n − 2 degrees of freedom. In this sampling exercise, we use the t distribution because we assume the standard deviations of the populations are unknown (42).

The test statistic used to evaluate statistical significance of the difference ȳ2 − ȳ1 is

t{H0: µ2 − µ1 = 0} = [(ȳ2 − ȳ1) − 0] / SE{ȳ2 − ȳ1}   (A3)
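Eqs. A2 and A3 can be combined in a few lines. The sketch below is our illustration: the critical value t(0.025, 62) ≈ 2.00 is hardcoded rather than computed, and the sample variances are back-calculated from the SE = 4.7 reported for the n = 32 row of Table 2, so the printed interval and t statistic should match that row.

```python
import math

def diff_ci_and_t(mean_diff, var1, var2, n, t_crit):
    """Eq. A2 confidence interval and Eq. A3 test statistic for two
    samples of equal size n; t_crit is t(alpha/2, 2n - 2)."""
    se = math.sqrt((var1 + var2) / n)   # SE{y2 - y1}
    a = t_crit * se                     # allowance
    return (mean_diff - a, mean_diff + a), mean_diff / se

# n = 32 row of Table 2: SE = 4.7 implies each sample variance is about
# 4.7**2 * 32 / 2 = 353; t(0.025, 62) is approximately 2.00
ci, t = diff_ci_and_t(-10, 353, 353, 32, 2.00)
print([round(b) for b in ci], round(t, 2))   # [-19, -1] -2.13
```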
Confidence interval for the difference between population means. In the significance sampling exercise (see STATISTICAL AND SCIENTIFIC SIGNIFICANCE DIFFER), we calculated a confidence interval for the difference between two population means. Rather than construct a confidence interval for this difference, a researcher could construct a confidence interval for each population mean: if the two confidence intervals fail to overlap, the researcher would conclude that the population means differ. This approach is conservative.

Consider the results when 32 sample observations were drawn from the placebo and drug treatment populations: when compared with placebo, drug therapy was associated with a greater decrease in systolic blood pressure (P = 0.04), and the 95% confidence interval for the difference between population means was [−19, −1]. That this confidence interval excludes 0 corroborates that the population means differ at the α = 0.05 level.

The observed sample means for the placebo and drug treatment groups were

ȳ1 = −9.9  and  ȳ2 = −19.9

the standard errors of these sample means were

SE{ȳ1} = SE{ȳ2} = 3.32

Because n = 32, each sample has ν = n − 1 = 31 degrees of freedom. If we want a 95% confidence interval for each population mean (Eq. 6), then α = 0.05, tα/2,ν = 2.04, and the allowance a = 2.04 × 3.32 = 6.8. Therefore, the 95% confidence interval for the mean of the placebo population is

[−17, −3]

the 95% confidence interval for the mean of the drug treatment population is

[−27, −13]

Because these individual confidence intervals overlap, we might conclude that there is insufficient evidence to declare that the two population means differ. In this example, 86% confidence intervals for the population means would just fail to overlap: that is, we could declare that the population means differ at the P = 0.14 level. Note that when we calculate a confidence interval for the difference between these population means, we are more confident that an actual difference exists.

We thank Brenda B. Rauner, Publications Manager and Executive Editor, APS Publications, for providing the information about research manuscripts published by the American Physiological Society.

This review was supported in part by the Dept. of Pediatrics (M. Douglas Jones, Jr., Chair); by a Grant-in-Aid from the American Heart Association of Colorado and Wyoming (to D. Curran-Everett); and by National Science Foundation Grant DMS 95-10435 (to K. Kafadar).

Address for reprint requests: D. Curran-Everett, Dept. of Pediatrics, B-195, Univ. of Colorado Health Sciences Center, 4200 East 9th Ave., Denver, CO 80262 (E-mail: dcurranevere@castle.cudenver.edu).

REFERENCES

1. Altman, D. G. Misuse of statistics is unethical. In: Statistics in Practice, edited by S. M. Gore and D. G. Altman. London: Br. Med. Assoc., 1982, p. 1–2.
2. Altman, D. G. Practical Statistics for Medical Research. New York: Chapman & Hall, 1991.
3. Altman, D. G. Statistics in medical journals: developments in the 1980s. Stat. Med. 10: 1897–1913, 1991.
4. Altman, D. G., and J. M. Bland. Improving doctors' understanding of statistics. J. R. Stat. Soc. Ser. A 154: 223–267, 1991.
5. Altman, D. G., S. M. Gore, M. J. Gardner, and S. J. Pocock. Statistical guidelines for contributors to medical journals. Br. Med. J. 286: 1489–1493, 1983.
6. Anscombe, F. J. Graphs in statistical analysis. Am. Statistician 27: 17–21, 1973.
7. Appleton, D. R. What statistics should we teach medical undergraduates and graduates? Stat. Med. 9: 1013–1021, 1990.
8. Arbuthnot, J. Of the Laws of Chance. London: Benj. Motte, 1692.
9. Armitage, P., and G. Berry. Statistical Methods in Medical Research (3rd ed.). Cambridge, MA: Blackwell Scientific, 1994.
10. Bailar, J. C., III, and F. Mosteller. Guidelines for statistical reporting in articles for medical journals. Ann. Intern. Med. 108: 266–273, 1988.
11. Bland, J. M., and D. G. Altman. Caveat doctor: a grim tale of medical statistics textbooks. Br. Med. J. 295: 979, 1987.
12. Bland, J. M., and D. G. Altman. Misleading statistics: errors in textbooks, software and manuals. Int. J. Epidemiol. 17: 245–247, 1988.
13. Boring, E. G. Mathematical vs. scientific significance. Psychol. Bull. 16: 335–338, 1919.
14. Box, G. E. P., and D. R. Cox. An analysis of transformations. J. R. Stat. Soc. Ser. B 26: 211–243, 1964.
15. Burkholder, D. L., and J. Pfanzagl. Estimation. In: International Encyclopedia of the Social Sciences, edited by D. L. Sills. New York: Macmillan & The Free Press, 1968, vol. 5, p. 142–157.
16. Burnand, B., W. N. Kernan, and A. R. Feinstein. Indexes and boundaries for "quantitative significance" in statistical decisions. J. Clin. Epidemiol. 43: 1273–1284, 1990.
17. Colditz, G. A., and J. D. Emerson. The statistical content of published medical research: some implications for biomedical education. Med. Educ. 19: 248–255, 1985.
18. Colton, T. Statistics in Medicine. Boston, MA: Little, Brown, 1974.
19. Cox, D. R. Statistical significance tests. Br. J. Clin. Pharmacol. 14: 325–331, 1982.
20. Denenberg, V. H. Some statistical and experimental considerations in the use of the analysis-of-variance procedure. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R403–R408, 1984.
21. Denham, M. J., A. Foster, and D. A. J. Tyrrell. Work of a district ethical committee. Br. Med. J. 2: 1042–1045, 1979.
22. DiStefano, J. J., III, and E. M. Landaw. Multiexponential, multicompartmental, and noncompartmental modeling. I. Methodological limitations and physiological interpretations. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R651–R664, 1984.
23. Draper, N. R., and H. Smith. Applied Regression Analysis (2nd ed.). New York: Wiley, 1981.
24. Evans, S. J. W., P. Mills, and J. Dawson. The end of the p value? Br. Heart J. 60: 177–180, 1988.
25. Fisher, R. A. Statistical Methods and Scientific Inference (3rd ed.). New York: Hafner, 1973.
26. Flynn, F. V., K. A. J. Piper, P. Garcia-Webb, K. McPherson, and M. J. R. Healy. The frequency distributions of commonly determined blood constituents in healthy blood donors. Clin. Chim. Acta 52: 163–171, 1974.
27. Garcia-Mata, C., and F. I. Shaffner. Solar and economic relationships: a preliminary report. Q. J. Economics 49: 1–51, 1934.
28. Gardner, M. J., and D. G. Altman. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br. Med. J. 292: 746–750, 1986.
29. Garfinkel, D., and K. A. Fegley. Fitting physiological models to data. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R641–R650, 1984.
30. Gray, B. H., R. A. Cooke, and A. S. Tannenbaum. Research involving human subjects. Science 201: 1094–1101, 1978.
31. Healy, M. J. R. Significance tests. Arch. Dis. Child. 66: 1457–1458, 1991.
32. Healy, M. J. R. Data transformations. Arch. Dis. Child. 69: 260–264, 1993.
33. Hill, A. B. Principles of medical statistics. XII—Common fallacies and difficulties. Lancet i: 706–708, 1937.
34. Hofacker, C. F. Abuse of statistical packages: the case of the general linear model. Am. J. Physiol. 245 (Regulatory Integrative Comp. Physiol. 14): R299–R302, 1983.
35. Hogg, R. V., and A. T. Craig. Introduction to Mathematical Statistics (4th ed.). New York: Macmillan, 1978.
36. Iberall, A. S. The problem of low-dose radiation toxicity. Am. J. Physiol. 244 (Regulatory Integrative Comp. Physiol. 13): R7–R13, 1983.
37. Jackson, T. E. Comparison of a class of regression equations. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R271–R276, 1984.
38. Kruskal, W. H. Tests of significance. In: International Encyclopedia of the Social Sciences, edited by D. L. Sills. New York: Macmillan & The Free Press, 1968, vol. 14, p. 238–250.
39. Lang, T. A., and M. Secic. How to Report Statistics in Medicine. Philadelphia, PA: Am. College Physicians, 1997.
40. Landaw, E. M., and J. J. DiStefano III. Multiexponential, multicompartmental, and noncompartmental modeling. II. Data analysis and statistical considerations. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R665–R677, 1984.
41. Montgomery, D. C., and G. C. Runger. Applied Statistics and Probability for Engineers. New York: Wiley, 1994, p. 361–363.
42. Moses, L. E. Think and Explain with Statistics. Reading, MA: Addison-Wesley, 1986.
43. Mosteller, F., and J. W. Tukey. Data Analysis and Regression. Reading, MA: Addison-Wesley, 1977.
44. Murray, G. D. How we should approach the future. Stat. Med. 9: 1063–1068, 1990.
45. SHEP Cooperative Research Group. Prevention of stroke by antihypertensive drug treatment in older persons with isolated systolic hypertension. Final results of the systolic hypertension in the elderly program (SHEP). JAMA 265: 3255–3264, 1991.
46. Slinker, B. K., and S. A. Glantz. Multiple regression for physiological data analysis: the problem of multicollinearity. Am. J. Physiol. 249 (Regulatory Integrative Comp. Physiol. 18): R1–R12, 1985.
47. Snedecor, G. W. The statistical part of the scientific method. Ann. NY Acad. Sci. 52: 792–799, 1950.
48. Snedecor, G. W., and W. G. Cochran. Statistical Methods (7th ed.). Ames: Iowa State Univ. Press, 1980.
49. Tuininga, Y. S., D. J. van Veldhuisen, J. Brouwer, J. Haaksma, H. J. G. M. Crijns, A. J. Man in't Veld, and K. I. Lie. Heart rate variability in left ventricular dysfunction and heart failure: effects and implications of drug treatment. Br. Heart J. 72: 509–513, 1994.
50. Yates, F., and M. J. R. Healy. How should we reform the teaching of statistics? J. R. Stat. Soc. Ser. A 127: 199–210, 1964.
51. Yates, F. E. Contribution of statistics to ethics of science. Am. J. Physiol. 244 (Regulatory Integrative Comp. Physiol. 13): R3–R5, 1983.
