Douglas Curran-Everett
National Jewish Health
There are very few things which we know, which are not capable of being reduc'd to a Mathematical Reasoning, . . . and where a Mathematical Reasoning can be had, it's as great folly to make use of any other, as to grope for a thing in the dark when you have a Candle standing by you.

John Arbuthnot (1692)

STATISTICS IS ONE KIND of mathematical reasoning. Its concepts and principles are ubiquitous in science: as researchers, we use them to design experiments, analyze data, report results, and interpret the published findings of others. Indeed, it is from this foundation of statistical concepts and principles that scientific knowledge is accumulated. If we fail to understand fully these fundamental statistical concepts and principles—if our statistical reasoning is faulty—then we are more likely to reach wrong scientific conclusions. Wrong conclusions based on faulty reasoning are shoddy science; they are also unethical (1, 21, 30).

Regrettably, faulty reasoning in statistics rears its head in the practice of science: for 60 years, statisticians have documented statistical errors in the scientific literature (3, 4, 17, 33, 50). In part, these errors exist because many introductory textbooks of statistics paradoxically hinder literacy in statistics: they emphasize methods rather than concepts, they contain glaring errors, or they perpetuate misconceptions (4, 11, 12).

In his editorial prelude to a series of statistical papers, Yates (51) wrote that the papers were designed to raise statistical consciousness and thereby reduce statistical errors in journals published by the American Physiological Society. Rather than reinforce concepts, these papers reviewed methods: analysis of variance (20), linear regression (37, 46), mathematical modeling (22, 29, 40), risk assessment (36), and statistical packages (34). The proper use of any statistical technique, however, requires an understanding of the fundamental statistical concepts behind the technique.

How well do physiologists understand fundamental concepts in statistics? One way to answer this question is to examine the empirical incidence of basic statistical quantities such as standard deviations, standard errors, and confidence intervals. These quantities characterize different statistical features: standard deviations characterize variability in the population, whereas standard errors and confidence intervals characterize uncertainty about the estimated values of population parameters, e.g., means. Of the original articles published in 1996 by the American Physiological Society, the overwhelming majority (69–93%, range) report standard errors, apparently not as estimates of uncertainty but as estimates of variability (Table 1).

http://www.jap.org 8750-7587/98 $5.00 Copyright © 1998 the American Physiological Society

INVITED REVIEW
Table 1. Manuscripts for the American Physiological Society's journals in 1996: use of statistics and statisticians

                                                   % Research Manuscripts That Report
Journal                                    n     SD    SE    Confidence Interval    Precise P Value*    Statistician†
Am. J. Physiol.
  Cell Physiol.                           43     21    88            0                     7                  0
  Endocrinol. Metab.                      28     18    86            0                     4                  4
  Gastrointest. Liver Physiol.            26      8    92            0                     4                 12
  Heart Circ. Physiol.                    60     17    87            0                    10                  3
  Lung Cell. Mol. Physiol.                25     20    84            0                     4                  4
  Regulatory Integrative Comp. Physiol.   41     17    88            0                    15                 12
  Renal Fluid Electrolyte Physiol.        27     15    93            0                     7                  4
J. Appl. Physiol.                         62     24    79            0                     6                 10
J. Neurophysiol.                          58     36    69            2                     5                  7

n, no. of research manuscripts reviewed. In 1996, these journals published a total of 3,693 original articles. The no. of articles reviewed represents a 10% sample (selected by systematic random sampling, fixed start) of articles published by each journal. *Precise P value: for example, P = 0.02 (rather than P < 0.05) or P = 0.13 (rather than P ≥ 0.05 or P = not significant). †We assessed collaboration with a statistician using author affiliation and acknowledgments. We recognize that a statistician may be affiliated with another department, e.g., medicine. Using our criterion, however, few articles (0–12%, range) report formal collaboration of a physiologist with a statistician, a partnership that typically reaps great rewards.
Virtually no articles (0–2%, range) report confidence intervals, recommended by statisticians (2, 5, 9, 10, 28, 39) as interval estimates of uncertainty about the values of population parameters. Moreover, few articles (4–15%, range) report precise P values, which precludes personal assessment of statistical significance.

In this review, we summarize the primary scientific uses of statistics. Then, we illustrate several fundamental concepts: variability, uncertainty, and significance. Last, we illustrate that although an understanding of concepts such as variability, uncertainty, and significance is necessary, it is not sufficient: it is essential to realize also that the numerical results of statistical analyses have limitations.

Glossary

α            Critical significance level
Ave {q}      Average of the quantity q
µ            Population mean
ν            Degrees of freedom
n            Number of observations
N(µ, σ²)     Normal (Gaussian) distribution with mean µ and variance σ²
P            Achieved significance level
Pr {A}       Probability of event A
σ            Population standard deviation
σ_ȳ          Standard deviation of the sampling distribution of the sample mean
s            Sample standard deviation
σ²           Population variance
s²           Sample variance
SE {q}       Standard error of the quantity q
Var {q}      Variance of the quantity q
Y            Random variable Y
yᵢ           Sample observation i, where i = 1, 2, . . . , n
ȳ            Sample mean

The primary scientific uses of statistics are hypothesis testing and estimation. Most researchers use statistics solely for hypothesis testing. In many situations, statisticians play down hypothesis testing and prefer estimation instead.

Hypothesis testing. To test a scientific hypothesis, a researcher must formulate the hypothesis before any data are collected, then design and execute an experiment that is relevant to it. Because the hypothesis is most often one of no difference, the hypothesis is called, by tradition, the null hypothesis.¹ Using data from the experiment, the researcher must next compute the observed value T of a test statistic. Finally, the researcher must compare the observed value T with some critical value T*, chosen from the distribution of the test statistic that is based on the null hypothesis. If T is more extreme than T*, then that is a surprising result if the null hypothesis is true, and the researcher is entitled, on statistical grounds, to become skeptical about the scientific validity of the null hypothesis.

The statistical test of a null hypothesis is useful because it assesses the strength of the evidence: it helps guard against an unwarranted conclusion, or it helps argue for a real experimental effect (19, 48). Nevertheless, a null hypothesis is often an artificial construct: before any data are recorded, the investigator knows—at least, suspects—that the null hypothesis is not exactly true. Moreover, the only question a hypothesis test can answer is a trivial one: is there anything other than random variation here?²

Statisticians have emphasized repeatedly the limited value of hypothesis testing (2, 4, 9, 18, 24, 28, 31, 38, 50). In fact, the P values that result from hypothesis tests have been described as "absurdly academic"³ (25) and as having a "strictly limited role" (19) in data analysis. Within the scientific community, unwarranted focus on hypothesis testing has blurred the distinction between statistical significance and scientific importance (3, 13, 19).

¹ The adjective "null" can be misleading: this hypothesis need not . . . statistical procedures, including the analysis of variance.
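The mechanics of the test described above, compute an observed value T and compare it with a critical value T*, can be sketched in a few lines of Python. This is our illustration, not part of the original review: the data are the 10 blood-pressure changes used later in the review, and T* = 2.262 is the two-tailed 5% critical value of a t distribution with n − 1 = 9 degrees of freedom.

```python
import math
from statistics import mean, stdev

def one_sample_t(observations, mu_null):
    """Observed value T of the t statistic for H0: mu = mu_null."""
    n = len(observations)
    y_bar = mean(observations)
    se = stdev(observations) / math.sqrt(n)   # standard error of the sample mean
    return (y_bar - mu_null) / se

# Sample of 10 changes in systolic blood pressure (used later in the review)
y = [-33, -15, -6, 0, 18, -3, 8, -22, -22, -7]

T = one_sample_t(y, mu_null=0)
t_star = 2.262                 # two-tailed 5% critical value, df = 9
reject = abs(T) > t_star       # is T more extreme than T*?
```

Here T ≈ −1.70, which is less extreme than T*, so on statistical grounds this sample alone gives no reason to doubt H₀: µ = 0.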
Most investigators appear to reach scientific conclusions that are based not on their knowledge of science but solely on the probabilities of test statistics (16); this is an untenable approach to scientific discovery.

The limited utility of hypothesis testing can be demonstrated with an example. Suppose a clinician wants to assess the impact of a placebo and the β-blockers bisoprolol and metoprolol on heart rate variability in patients with left heart failure. Suppose also that the clinician constructs the null and alternative hypotheses, H₀ and H₁, as

H₀: treatments have identical effects on heart rate variability
H₁: treatments have different effects on heart rate variability

The result of this hypothesis test will fail to convey any information about the direction or magnitude of the treatment effects on heart rate variability. Direction and magnitude are important: in patients with left heart failure, decreases in heart rate variability are associated with increases in the risk of sudden cardiac catastrophe (49). Direction and magnitude of an effect reflect scientific importance; they are obtained by estimation.

Estimation. Regardless of the statistical result of a hypothesis test, the crucial question concerns the scientific result: is the experimental effect big enough to be relevant? A point estimate of a population parameter⁴ and an interval estimate of the uncertainty about the value of that parameter help answer this question. For example, one point estimate of a population mean is the sample mean; one interval estimate of the uncertainty about the value of the population mean is a confidence interval. Interval estimates circumvent the drawbacks inherent to hypothesis testing, yet they provide the same statistical information as a hypothesis test (15, 18, 28, 38). More important, point and interval estimates convey information about scientific importance.

Practical considerations. Estimation focuses attention on the magnitude and uncertainty of the experimental results. We must emphasize that hypothesis testing can have value beyond assessing the strength of the experimental evidence: for example, hypothesis testing is useful if an investigator wants to evaluate the importance of between-subject variability in an experiment. In practice, estimation should be done whenever it is relevant and feasible; the precise P value from the associated hypothesis test should be reported with the point and interval estimates. When more than one hypothesis is tested in an experiment, the problem of multiple comparisons becomes relevant. Nevertheless, a discussion of the issues involved in multiple-comparison procedures is beyond the scope of this review; Refs. 2, 9, 42, and 48 summarize these issues.

For the rest of this review, we focus our attention on several aspects of estimation.

USING SAMPLES TO LEARN ABOUT POPULATIONS

As researchers, we use samples to make inferences about populations. A sample interests us not because of its own merits but because it helps us estimate selected characteristics of the underlying population: for example, the sample mean ȳ estimates the population mean µ.⁵

As an illustration, suppose the random variable Y represents the change in systolic blood pressure after some intervention. Suppose also that the distribution of Y conforms to a normal distribution. A normal distribution is specified completely by two parameters: the mean and variance. The population mean µ conveys the location of the center of the distribution; the population standard deviation σ, the square root of the population variance σ², conveys the spread of the distribution. The distribution of possible outcomes of the random variable Y is described by the normal probability density function (f), which incorporates µ and σ²

f(y) = [1/(σ√(2π))] · exp {−(y − µ)²/(2σ²)}, for −∞ < y < +∞   (1)

In Fig. 1, the distributions for three different populations are theoretical: each depicts the distribution of population values as if we had observed the entire population.⁶

Suppose we want to estimate µ₁ = −15, the mean of population 1, in Fig. 1. To do this, we would measure the change in systolic blood pressure in a sample of n independent observations, y₁, y₂, . . . , yₙ, from the population. For simplicity, assume we limit the sample to 10 observations. One random sample is

−33, −15, −6, 0, 18, −3, 8, −22, −22, −7

The average of these sample observations is the sample mean ȳ

ȳ = (1/10) · Σᵢ₌₁¹⁰ yᵢ = −8.2   (2)

Because of intrinsic variability in the population, the sample mean ȳ differs from the population mean µ₁; only because this is a contrived example do we know the true magnitude of the discrepancy.⁷ Next, we review measures that estimate variability in the population.

⁵ References 2, 9, 42, and 48 discuss other aspects of sampling.
⁶ Statistical calculations and exercises were executed by using SAS Release 6.04 (SAS Institute, Cary, NC, 1987).
⁷ We address the discrepancy between the value of the sample . . .
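This sampling exercise is easy to reproduce. The Python sketch below (ours, not the review's SAS code) first verifies Eq. 2 on the sample listed above, then draws a fresh sample of 10 observations from population 1, taken as N(−15, 20²); the seed value is an arbitrary choice for reproducibility.

```python
import random
from statistics import mean

# The review's random sample of 10 changes in systolic blood pressure
y = [-33, -15, -6, 0, 18, -3, 8, -22, -22, -7]
y_bar = mean(y)                 # Eq. 2: y_bar = -8.2, which estimates mu_1 = -15

# A second random sample from population 1 ~ N(-15, 20^2) yields a
# different estimate of mu_1, purely because of random sampling.
random.seed(1)                  # arbitrary seed, for reproducibility only
y2 = [random.gauss(-15, 20) for _ in range(10)]
y_bar2 = mean(y2)
```

The two sample means differ from each other and from µ₁ = −15: that discrepancy is the intrinsic variability the next section quantifies.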
ESTIMATING VARIABILITY IN THE POPULATION

The preceding sample observations, −33, −15, . . . , −7, differ because the population from which they were drawn is distributed over a range of possible values. This intrinsic variability is more than a distraction: it is an integral part of statistics, and the careful study of variability may reveal something about underlying scientific processes (25). The most common measure of the variability among sample observations is the sample standard deviation s, the square root of the sample variance s²

s² = [1/(n − 1)] · Σᵢ₌₁ⁿ (yᵢ − ȳ)²

(See also Refs. 2, 9, 42, and 48.) The sample standard deviation characterizes the typical distance of an observation from the distribution center; in other words, it reflects the dispersion of individual sample observations about the sample mean. The sample standard deviation s also estimates the population standard deviation σ: the standard deviation of the sample observations −33, −15, . . . , −7 is s = 15.2, which estimates σ = 20.

Most journals would publish the preceding sample mean and standard deviation as

−8.2 mmHg ± 15.2

The ± symbol, however, is superfluous: the standard deviation is a single positive number. A standard deviation can be reported clearly with notation of this form

−8.2 mmHg (SD 15.2)

In a table, the symbol SD can be omitted without loss of clarity as long as the table legend identifies the parenthetical value as a standard deviation.

The standard deviation is often a useful index of variability, but in many experimental situations it may be a deceptive one: even subtle departures from a normal distribution can render useless the standard deviation as an index of variability (43); often, the distribution of a biological variable differs grossly from a normal distribution. As one example, the distribution of values for plasma creatinine (26) resembles the skewed distribution depicted in Fig. 2. When the tails of a distribution are elongated, as is the right tail of this skewed distribution, the sample standard deviation will be an inflated measure of variability in the population (43, 48). There are two remedies to this misrepresentation of variability by the standard deviation: use another measure of variability, or transform the data.

Alternative measures of variability. Two measures of variability that are useful with a variety of distributions are the mean absolute deviation and the interquartile range. The mean absolute deviation (Ave {|dev|}) is the average distance of the sample observations from the sample mean

Ave {|dev|} = (1/n) · Σᵢ₌₁ⁿ |yᵢ − ȳ|

The interquartile range (often designated as IQR) encompasses the middle 50% of a distribution and is the difference between the 75th and 25th percentiles. For 0 < w < 1, the 100wth percentile is the value below which 100w% of the distribution is found.

Data transformation. When the sample observations happen to be drawn from a population that has a skewed distribution (e.g., a constituent of blood or the growth rate of a tumor), a transformation may change the shape of their distribution so that the distribution of the transformed observations is more symmetric (14, 23, 26, 32, 48). Common transformations include the logarithmic, inverse, square root, and arc sine transformations. The APPENDIX reviews a useful family of data transformations.
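The three measures of variability just described can be computed directly for the sample above. This is our sketch, not the review's code; note that percentile (and hence IQR) conventions differ across software, and the stdlib `statistics.quantiles` default ("exclusive") is only one of several reasonable definitions.

```python
from statistics import mean, stdev, quantiles

y = [-33, -15, -6, 0, 18, -3, 8, -22, -22, -7]
y_bar = mean(y)                                   # -8.2

# Sample standard deviation (divides by n - 1): s ≈ 15.2
s = stdev(y)

# Mean absolute deviation: average distance from the sample mean
ave_abs_dev = mean(abs(yi - y_bar) for yi in y)

# Interquartile range: 75th minus 25th percentile
q1, q2, q3 = quantiles(y, n=4)                    # default "exclusive" method
iqr = q3 - q1
```

For this sample s ≈ 15.2 and the mean absolute deviation is 11.84; the smaller value reflects that absolute deviations penalize the extreme observations (−33, 18) less heavily than squared deviations do.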
In the next section, we revisit the unknown discrepancy between the sample estimate of a population parameter and the population parameter itself.

ESTIMATING UNCERTAINTY ABOUT A POPULATION PARAMETER

In the sampling exercise from USING SAMPLES TO LEARN ABOUT POPULATIONS, the sample mean ȳ = −8.2 (Eq. 2) estimated the population mean µ₁ = −15. If we had calculated this sample mean from experimental observations, then we would be uncertain about the magnitude of the discrepancy between the sample estimate ȳ and the population parameter µ₁. The ability to estimate the level of uncertainty about the value of a population parameter by using the sample estimate of that parameter is a powerful aspect of statistics (47).

Suppose we measure the same response variable, the change in systolic blood pressure, in a second sample of 10 independent observations drawn from the same population. We know beforehand that because of random sampling the mean of the second sample, ȳ₂, will differ from the mean of the first sample, ȳ₁ = −8.2. If we measure the change in systolic blood pressure in 100 samples of 10 independent observations, then we expect 100 different estimates of the population mean µ₁; for example

ȳ₁ = −8.2, ȳ₂ = −8.1, · · ·, ȳ₁₀₀ = −22.5

Suppose the random variable Y is distributed normally with mean µ and variance σ², which are known; the notation for this normal distribution is Y ~ N(µ, σ²). If an infinite number of samples, each with n independent observations, is drawn from this normal distribution, then the sample means ȳ₁, ȳ₂, . . . , ȳ∞ will also be distributed normally.⁸ The average of the sample means, Ave {ȳ}, is the population mean µ, but the variance of the sample means (Var {ȳ}) is smaller than the population variance σ² by a factor of 1/n

Ave {ȳ} = µ and Var {ȳ} = σ²_ȳ = σ²/n

(The APPENDIX derives these expressions. Figure 3 develops these expressions using empirical examples.) Therefore, the standard deviation of the theoretical distribution of the sample mean, σ_ȳ, is

σ_ȳ = σ/√n

If the sample size n increases, then the standard deviation σ_ȳ will decrease: that is, the more sample observations we have, the more certain we will be that the point estimate ȳ is near the actual population mean µ.

The standard deviation of the theoretical distribution of the sample mean is known also as the standard error of the sample mean, SE {ȳ}; that is

SE {ȳ} = σ_ȳ = σ/√n
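The relation σ_ȳ = σ/√n can be checked empirically by simulation, in the spirit of Fig. 3. In this Python sketch (ours; the seed and the number of replicate samples are arbitrary choices), we draw many samples of n = 10 observations from N(−15, 20²) and compare the spread of their means with the theoretical value 20/√10 ≈ 6.32.

```python
import math
import random
from statistics import mean, stdev

random.seed(2)                     # arbitrary seed, for reproducibility only
mu, sigma, n = -15, 20, 10

# Draw 10,000 samples of n observations each; keep every sample mean.
sample_means = [mean(random.gauss(mu, sigma) for _ in range(n))
                for _ in range(10_000)]

ave_of_means = mean(sample_means)  # should approximate mu = -15
sd_of_means = stdev(sample_means)  # should approximate sigma/sqrt(n)
theory = sigma / math.sqrt(n)      # ≈ 6.32
```

With 10,000 replicate samples the empirical standard deviation of the sample means lands close to the theoretical 6.32, and quadrupling n would halve it.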
. . . parameter; in experimental situations, we are uncertain about which of these outcomes has occurred. Instead, the level of confidence in a confidence interval is based on the concept of drawing a large number of samples, each with n observations, from the population. When we measured the change in systolic blood pressure in 100 random samples, we obtained 100 different sample means and 100 different sample standard deviations. As a consequence, we will calculate 100 different 100(1 − α)% confidence intervals; we expect ~100(1 − α)% of these observed confidence intervals to include the actual value of the population mean (see Fig. 4).

A confidence interval characterizes the uncertainty about the estimated value of a population parameter. Sometimes, an investigator may be interested less in the value of the population parameter and more in the distribution of individual observations. A tolerance interval characterizes the uncertainty about the estimated distribution of those individual observations (see APPENDIX).

Next, we illustrate the distinction between statistical significance and scientific importance. Last, we show that the numerical results of statistical analyses have limitations.

STATISTICAL AND SCIENTIFIC SIGNIFICANCE DIFFER

Hypothesis testing, as the primary scientific use of statistics, has a drawback: the result of a hypothesis test conveys mere statistical significance. In contrast, estimation conveys scientific significance.¹¹ This distinction is obvious if we use the results of a recent clinical trial. In this trial, the Systolic Hypertension in the Elderly Program (SHEP) Cooperative Research Group (45) evaluated the impact of antihypertensive drugs on the incidence of stroke in persons with isolated systolic hypertension. When compared with placebo, these drugs reduced by 36% (P = 0.0003) the incidence of stroke. Associated with this reduced incidence of stroke was a greater decrease in systolic blood pressure.

To appreciate the distinction between statistical significance and scientific importance, consider two populations that represent the theoretical distributions of the decreases in systolic blood pressure for the two groups. Let the decrease in systolic blood pressure of the placebo group be designated Y₁ and that of the drug treatment group be designated Y₂. Assume that Y₁ and Y₂ are distributed normally

Y₁ ~ N(µ₁, σ₁²) and Y₂ ~ N(µ₂, σ₂²)

The normal probability density function (Eq. 1), in which approximate values for the observed sample means and variances from the SHEP trial, ȳᵢ and sᵢ², are substituted for the population means and variances, generates the population distributions depicted in Fig. 5

ȳ₁ = −15 ⇒ µ₁, s₁² = 400 ⇒ σ₁² and ȳ₂ = −25 ⇒ µ₂, s₂² = 400 ⇒ σ₂²

Suppose our objective is to estimate the difference between population means

µ₂ − µ₁ = −25 − (−15) = −10 mmHg

The SHEP group established convincingly that the difference µ₂ − µ₁, which represents the greater decrease in systolic blood pressure after drug therapy, was important. To estimate µ₂ − µ₁, we would sample at random from each population: the difference between sample means, ȳ₂ − ȳ₁, estimates the difference between population means, µ₂ − µ₁.

¹¹ The word "significance," when used to refer to scientific consequence, is ambiguous. Hereafter, we use the word "importance."
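The coverage property of confidence intervals described above, roughly 100(1 − α)% of the intervals capture the population mean, can also be demonstrated by simulation. This Python sketch is our illustration, not part of the review: it builds 95% intervals of the standard form ȳ ± t*·s/√n, where t* = 2.262 is the two-tailed 5% point of a t distribution with 9 degrees of freedom, and the seed and trial count are arbitrary choices.

```python
import math
import random
from statistics import mean, stdev

random.seed(3)                     # arbitrary seed, for reproducibility only
mu, sigma, n = -15, 20, 10
t_star = 2.262                     # two-tailed 5% critical value, df = n - 1 = 9

trials = 2000
covered = 0
for _ in range(trials):
    y = [random.gauss(mu, sigma) for _ in range(n)]
    half_width = t_star * stdev(y) / math.sqrt(n)
    y_bar = mean(y)
    if y_bar - half_width <= mu <= y_bar + half_width:
        covered += 1

coverage = covered / trials        # expect roughly 0.95
```

The observed coverage fluctuates around 0.95 from run to run, which is exactly the long-run interpretation of "95% confidence": the 95% attaches to the procedure, not to any single interval.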
By drawing samples of 2–128 observations from each population (Table 2) and by forcing ȳ₂ − ȳ₁ = −10 (see Fig. 5), the distinction between statistical significance and scientific importance becomes clear. As sample size n grows, the statistical significance increases, from P = 0.71 for n = 2 to P < 0.001 for n = 128. Regardless of sample size, one aspect of scientific importance, that reflected by the difference ȳ₂ − ȳ₁, remains constant. As sample size increases, uncertainty about the actual difference µ₂ − µ₁, another aspect of scientific importance characterized by the numerical bounds of the confidence interval, decreases.

Practical considerations. In experimental situations, the distinction between statistical significance and scientific importance can be maintained by routinely addressing two questions: how likely is it that the experimental effect is real, and is the experimental effect large enough to be relevant? The first question can be answered simply: compare the P value, obtained in the hypothesis test, with the critical significance level α, chosen before any data are collected; if P < α, then the experimental effect is likely to be real. The second question can be answered in two steps: calculate a confidence interval for the population parameter, and then assess the numerical bounds of that confidence interval for scientific importance; if either bound of the confidence interval is important from a scientific perspective, then the experimental effect may be large enough to be relevant.

Table 2. Statistical and scientific significance differ: statistical results

   n    ȳ₂ − ȳ₁    SE {ȳ₂ − ȳ₁}    95% Confidence Interval*       t†     Pr {µ₂ − µ₁ = 0}‡
   2      −10          23.1             −110 to 90              −0.43         0.71
   4      −10          22.6              −65 to 45              −0.44         0.67
   8      −10          12.3              −36 to 16              −0.81         0.43
  10      −10          10.1              −31 to 11              −0.99         0.34
  15      −10           7.3              −25 to 5               −1.38         0.18
  20      −10           5.8              −22 to 2               −1.74         0.09
  25      −10           5.3              −21 to 1               −1.88         0.07
  32      −10           4.7              −19 to −1              −2.13         0.04
  64      −10           3.7              −17 to −3              −2.70        <0.01
 128      −10           2.4              −15 to −5              −4.25        <0.001

n, sample size drawn from placebo (population 1) and drug treatment (population 2) populations (see Fig. 5). *Confidence interval for the difference between population means, µ₂ − µ₁ (see Eq. A2). †Test statistic used to evaluate statistical significance of the difference ȳ₂ − ȳ₁ (see Eq. A3). ‡Probability (2-tailed) that µ₂ − µ₁ = 0; this is the significance level P for the null hypothesis H₀: µ₂ − µ₁ = 0. The difference ȳ₂ − ȳ₁ and the 95% confidence interval for the difference µ₂ − µ₁ reflect the magnitude and uncertainty of the experimental results. The test statistic t and its associated P value reflect statistical significance. An increase in the no. of observations drawn from each population decreases SE {ȳ₂ − ȳ₁}: as a consequence, the statistical significance increases (irregularly, because of random sampling), but the estimated difference between population means remains constant at ȳ₂ − ȳ₁ = −10. The APPENDIX details the statistical equations required to perform this sampling exercise.

Consider the results when 15 sample observations were drawn from the placebo and drug treatment populations: when compared with placebo, the greater decrease in systolic blood pressure after drug therapy was unconvincing from a statistical perspective (P = 0.18). Because the 95% confidence interval was [−25, 5], uncertainty about the actual impact of drug treatment on systolic blood pressure is relatively large. Note, however, that the additional decrease in systolic blood pressure gained by drug treatment may have been as pronounced as 25 mmHg. From a scientific perspective, further studies, designed with greater statistical power, are warranted.

To illustrate that a significant statistical result may have little scientific importance, imagine that systolic blood pressure had been measured in mmH₂O rather than in mmHg. Consider the results when 128 sample observations were drawn from the two populations: the greater decrease in systolic blood pressure after drug therapy was compelling from a statistical perspective (P < 0.001). If the confidence interval [−15, −5] is expressed in mmHg (by dividing each bound by 13.6), then the investigator can declare, with 95% confidence, that the magnitude of the greater decrease in systolic blood pressure was 0.4–1.1 mmHg.
In this example, the investigator can be quite certain of a trivial experimental effect.

Whatever the statistical result of a hypothesis test, assessment of the corresponding confidence interval . . .

. . . that introductory courses in statistics are relevant and sound (7, 44, 50).

In this review, we have reiterated the primary role of statistics within science to be one of estimation: estimation of a population parameter or estimation of the uncertainty about the value of that parameter. Moreover, we have demonstrated the essential distinction between statistical significance and scientific importance.

Table 3. Limitations of statistics: raw data and regression statistics (columns: Drug A, Drug B)

APPENDIX

Consider the linear function L

L = k₁X₁ + k₂X₂ + · · · + kₘXₘ

For i = 1, 2, . . . , m, each kᵢ is a real constant, and each Xᵢ ~ N(µᵢ, σᵢ²). The mean of L, Ave {L}, is

Ave {L} = k₁µ₁ + k₂µ₂ + · · · + kₘµₘ = Σᵢ₌₁ᵐ kᵢµᵢ
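The expression for Ave {L} can be checked by simulation. In this Python sketch (ours; the constants kᵢ, the means, and the standard deviations are illustrative values we chose, not values from the review), we simulate L = k₁X₁ + k₂X₂ and compare the empirical mean with k₁µ₁ + k₂µ₂.

```python
import random
from statistics import mean

random.seed(4)                    # arbitrary seed, for reproducibility only

# Illustrative constants and component distributions (our choices):
# X1 ~ N(5, 1^2), X2 ~ N(-3, 2^2), L = 2.0*X1 + 0.5*X2
k1, k2 = 2.0, 0.5
mu1, mu2 = 5.0, -3.0
sd1, sd2 = 1.0, 2.0

L = [k1 * random.gauss(mu1, sd1) + k2 * random.gauss(mu2, sd2)
     for _ in range(100_000)]

theory = k1 * mu1 + k2 * mu2      # Ave{L} = 2.0*5 + 0.5*(-3) = 8.5
estimate = mean(L)                # empirical average of L, close to 8.5
```

With 100,000 draws the empirical mean agrees with the formula to about two decimal places; the same simulation scaffolding extends to checking Var {L} = Σ kᵢ²σᵢ² for independent Xᵢ.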
17. Colditz, G. A., and J. D. Emerson. The statistical content of published medical research: some implications for biomedical education. Med. Educ. 19: 248–255, 1985.
18. Colton, T. Statistics in Medicine. Boston, MA: Little, Brown, 1974.
19. Cox, D. R. Statistical significance tests. Br. J. Clin. Pharmacol. 14: 325–331, 1982.
20. Denenberg, V. H. Some statistical and experimental considerations in the use of the analysis-of-variance procedure. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R403–R408, 1984.
21. Denham, M. J., A. Foster, and D. A. J. Tyrrell. Work of a district ethical committee. Br. Med. J. 2: 1042–1045, 1979.
22. DiStefano, J. J., III, and E. M. Landaw. Multiexponential, multicompartmental, and noncompartmental modeling. I. Methodological limitations and physiological interpretations. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R651–R664, 1984.
23. Draper, N. R., and H. Smith. Applied Regression Analysis (2nd ed.). New York: Wiley, 1981.
24. Evans, S. J. W., P. Mills, and J. Dawson. The end of the p value? Br. Heart J. 60: 177–180, 1988.
25. Fisher, R. A. Statistical Methods and Scientific Inference (3rd ed.). New York: Hafner, 1973.
26. Flynn, F. V., K. A. J. Piper, P. Garcia-Webb, K. McPherson, and M. J. R. Healy. The frequency distributions of commonly determined blood constituents in healthy blood donors. Clin. Chim. Acta 52: 163–171, 1974.
27. Garcia-Mata, C., and F. I. Shaffner. Solar and economic relationships: a preliminary report. Q. J. Economics 49: 1–51, 1934.
28. Gardner, M. J., and D. G. Altman. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br. Med. J. 292: 746–750, 1986.
29. Garfinkel, D., and K. A. Fegley. Fitting physiological models to data. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R641–R650, 1984.
30. Gray, B. H., R. A. Cooke, and A. S. Tannenbaum. Research involving human subjects. Science 201: 1094–1101, 1978.
31. Healy, M. J. R. Significance tests. Arch. Dis. Child. 66: 1457–1458, 1991.
32. Healy, M. J. R. Data transformations. Arch. Dis. Child. 69: 260–264, 1993.
33. Hill, A. B. Principles of medical statistics. XII—Common fallacies and difficulties. Lancet i: 706–708, 1937.
34. Hofacker, C. F. Abuse of statistical packages: the case of the general linear model. Am. J. Physiol. 245 (Regulatory Integrative Comp. Physiol. 14): R299–R302, 1983.
35. Hogg, R. V., and A. T. Craig. Introduction to Mathematical Statistics (4th ed.). New York: Macmillan, 1978.
36. Iberall, A. S. The problem of low-dose radiation toxicity. Am. J. Physiol. 244 (Regulatory Integrative Comp. Physiol. 13): R7–R13, 1983.
37. Jackson, T. E. Comparison of a class of regression equations. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R271–R276, 1984.
38. Kruskal, W. H. Tests of significance. In: International Encyclopedia of the Social Sciences, edited by D. L. Sills. New York: Macmillan & The Free Press, 1968, vol. 14, p. 238–250.
39. Lang, T. A., and M. Secic. How to Report Statistics in Medicine. Philadelphia, PA: Am. College Physicians, 1997.
40. Landaw, E. M., and J. J. DiStefano III. Multiexponential, multicompartmental, and noncompartmental modeling. II. Data analysis and statistical considerations. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R665–R677, 1984.
41. Montgomery, D. C., and G. C. Runger. Applied Statistics and Probability for Engineers. New York: Wiley, 1994, p. 361–363.
42. Moses, L. E. Think and Explain with Statistics. Reading, MA: Addison-Wesley, 1986.
43. Mosteller, F., and J. W. Tukey. Data Analysis and Regression. Reading, MA: Addison-Wesley, 1977.
44. Murray, G. D. How we should approach the future. Stat. Med. 9: 1063–1068, 1990.
45. SHEP Cooperative Research Group. Prevention of stroke by antihypertensive drug treatment in older persons with isolated systolic hypertension. Final results of the systolic hypertension in the elderly program (SHEP). JAMA 265: 3255–3264, 1991.
46. Slinker, B. K., and S. A. Glantz. Multiple regression for physiological data analysis: the problem of multicollinearity. Am. J. Physiol. 249 (Regulatory Integrative Comp. Physiol. 18): R1–R12, 1985.
47. Snedecor, G. W. The statistical part of the scientific method. Ann. NY Acad. Sci. 52: 792–799, 1950.
48. Snedecor, G. W., and W. G. Cochran. Statistical Methods (7th ed.). Ames: Iowa State Univ. Press, 1980.
49. Tuininga, Y. S., D. J. van Veldhuisen, J. Brouwer, J. Haaksma, H. J. G. M. Crijns, A. J. Man in't Veld, and K. I. Lie. Heart rate variability in left ventricular dysfunction and heart failure: effects and implications of drug treatment. Br. Heart J. 72: 509–513, 1994.
50. Yates, F., and M. J. R. Healy. How should we reform the teaching of statistics? J. R. Stat. Soc. Ser. A 127: 199–210, 1964.
51. Yates, F. E. Contribution of statistics to ethics of science. Am. J. Physiol. 244 (Regulatory Integrative Comp. Physiol. 13): R3–R5, 1983.