
CLIN. CHEM.

39/6, 929-935 (1993)

Chemistry with Confidence: Should Clinical Chemistry Require Confidence Intervals for Analytical and Other Data?

A. Ralph Henderson

Department of Clinical Biochemistry, University Hospital (University of Western Ontario), P.O. Box 5339, London, Ontario, Canada N6A 5A5.
Received September 9, 1992; accepted January 25, 1993.

Confidence intervals are not commonly provided with analytical or other data reported in Clinical Chemistry although P values are. However, confidence intervals provide an explicit demonstration of the direction and magnitude of uncertainty and are intuitively easy to grasp, unlike P values. It is therefore argued that the Journal should adopt a policy requiring the provision of confidence intervals. Such a policy would improve the statistical rigor of Journal reports.

Indexing Terms: statistics · parametric vs nonparametric distributions · likelihood ratio · receiver-operating characteristic curve

"When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals). Avoid sole reliance on statistical hypothesis testing, such as the use of P values, which fails to convey important quantitative information."
International Committee of Medical Journal Editors (1)

Recently, a reviewer of a paper my colleagues and I submitted to Clinical Chemistry asked for the P value of a difference between sets of data. We had instead provided the confidence intervals for the data, but this was of no interest: P values were, for the reviewer, the "bottom line" in such comparisons. Or are they? It is the intent of this Opinion to make the case for more appropriate statistical descriptions of experimental data than are currently used in Clinical Chemistry but which are in accord with the most recent Uniform Requirements of the International Committee of Medical Journal Editors (1) quoted above. The Journal issues very detailed statistical guidelines for authors (2), which do include a requirement for the use of "appropriate indicators of measurement error or uncertainty," but these guidelines are often not observed in practice and frequently the wrong type of statistical information is provided. Examples are readily found on random inspection of the Journal; some are mentioned here for illustrative purposes only but are not cited in this Opinion, it being my intent to persuade rather than pillory.

For example, test A was claimed to be more "sensitive" [true positive (TP) rate or fraction = 44%] than test B (TP rate = 17%), but the 95% confidence intervals for these reported sensitivity values actually overlapped, i.e., 22% to 69% and 3.5% to 41%. Test A is not more sensitive; these tests are equivalent.² In another paper, test C was claimed to be superior to test D on the basis of a receiver-operating characteristic (ROC) curve analysis, but neither the areas under the curve (i.e., accuracy) nor the confidence intervals of these accuracies were provided. Finally, a set of tests were compared by ROC curve analyses of each test's accuracy; one test was claimed to possess less discriminating power than the others, but the confidence interval of each estimate of test accuracy was not determined and indeed visual inspection of the data suggested that all tests were equivalent. These are three examples taken from one issue of the Journal.

¹ Nonstandard abbreviations: TP, true positive; TN, true negative; ROC, receiver-operating characteristic; and LR, likelihood ratio.

² The details of the calculation of confidence intervals are outlined in a later section (see Examples of Confidence Interval Calculations). However, it is conceptually easy to grasp, even at this stage, the very obvious overlapping of the two intervals. The more conventional approaches to analyzing data on nominal scales are the proportion tests such as the z-test, χ² test, Fisher's exact test, or McNemar's test, using Yates' continuity correction when the sample sizes are small (3). All of these tests are more involved than the simple calculation of the confidence intervals; what is more, in the quoted example, they all show that there is no statistical difference, i.e., P >0.05, between the stated sensitivities.

How should experimental findings be presented? There are now many sources of good advice. For example, Clinical Biostatistics (4) and Statistics in Practice (5), both collections of articles first appearing in Clinical Pharmacology and Therapeutics and the British Medical Journal, respectively, are invaluable; the latter is particularly good on the practical aspects. These aspects are also well addressed in Altman's recent book (3), as its title suggests. Statistics deals with samples taken from populations consisting of all the possible observations that could be made; these samples are assumed to possess the same characteristics as the parent population. Altman (3) has pointed out, however, that sampling from a population with a truly gaussian distribution (synonym: normal distribution) may not always produce samples that are themselves gaussian.

Samples consist of observed data and possess empirical distributions. However, both samples and their parent populations may conform to a variety of probability distributions. These mathematical abstractions are characterized by one or more parameters; for example, the gaussian distribution is completely described by two parameters: mean and standard deviation (SD). The advantage of using appropriate probability distributions to describe empirical data is that it permits statistical analysis by powerful mathematical tools. When such distributional assumptions are made, the process is described as parametric. When no assumptions are made at all about the distribution of data, the process is nonparametric or "distribution free." Initially, attention will be paid here to parametric methods.
The most commonly used probability distribution in medicine is the gaussian, and much biological data is adequately represented by it. For example, Flynn et al. (6) reported that, of 19 analytes examined from 1000 blood donors, nearly all showed gaussian-type distributions, either with or without logarithmic transformation. However, before sampled data, either raw or after application of a transformation, are assumed to show normality, this must be validated either by visual inspection or, better still, by a variety of formal checks (3). That such formal checks, even of the simplest nature, are not always used can readily be appreciated by random review of this Journal. A recent paper gave a mean and SD of 11.9 and 8.7 kU/mg, respectively. If the data were distributed normally, 95% of the values would be found in the region -5.2 to 29 kU/mg: -5.2 kU/mg!

Let us assume the normality of a set of experimental data. Its variability is measured by the SD. The percentage of the data contained by multiples of the SD on each side of the mean is, for 1 SD, 68.3%; for 2 SDs, 95.5%; and for 3 SDs, 99.7%. Clearly, any percentage of the data may be chosen, but the most usual ones are 90%, 95%, and 99%, for which the following multiples of the SD on each side of the mean apply: 1.645, 1.96, and 2.576, respectively. Whereas SD is a descriptive index, the standard error of the mean (SE or, less commonly, SEM) is a measure of uncertainty. Feinstein (7) comments that neither "standard" nor "error" is an appropriate term for this parameter and that these terms can only serve to confuse the unwary.

Despite its inappropriate name, the SE is an extremely important index. It is calculated from the SD and sample size ($SE = SD/\sqrt{n}$). If the population is repeatedly sampled, and each sample has its mean and SD calculated, how well do these mean values estimate the true mean of the parent population? If each of these sample means is thought of as an individual value, then the standard deviation of these means is the SE. It is thus a measure of the uncertainty of a single sample mean as an estimate of the population mean (3). If the population distribution is gaussian, then the distribution of these sample means will also be gaussian. In addition, the distribution of the sample means will also approach normality, whatever the distribution of the variables in the parent population, provided the sample is sufficiently large (the Central Limit Theorem).

These remarks are a necessary prerequisite to introducing the concept of a confidence interval. This interval for a mean extends on both of its sides by a multiple of the SE. This idea is exactly analogous to that previously described for the SD. Thus 1.96 × SE defines the 95% confidence interval that will include the mean of the parent population with a probability of 0.95. Or, put another way, the 95% confidence interval will not include the true population mean 5% of the time, i.e., in 1 out of 20 times. When working with small samples, say, n <50, where n is the sample size, it is necessary to use the t distribution instead of the normal distribution. The t-value is obtained from a table of Student's t distribution, for n - 1 degrees of freedom, and the two-tailed percentage point (α) for the appropriate confidence interval (e.g., for 99%, α = 0.01; for 95%, α = 0.05; for 90%, α = 0.1). A plot of the 95% confidence interval, using the t distribution, is shown in Figure 1 for values of n between 5 and 250. Thus, for sample sizes between 25 and 50, the confidence interval is in the range of 0.25 to 0.5 SDs, whereas for sample sizes >50, the confidence interval will be <0.25 SDs.

Fig. 1. The width of the 95% confidence interval as multiples of the standard deviation in relation to the sample size
[Plot not reproduced: x-axis, sample size (0 to 250); y-axis scale, 0.00 to 1.25.]

What advantage does the knowledge of the confidence interval confer over the more traditional use of SD and SE? The latter parameters are usually used in the traditional process of stating a null, and often an alternative, hypothesis and then using a test statistic to obtain a P value for rejecting or accepting the hypotheses (3, 8). This process gives no indication at all of the magnitude of the effect being studied; it merely produces a probability value. (This aspect will be examined in more detail later: see P Values, below.) By contrast, the confidence interval demonstrates, explicitly, the magnitude of the uncertainty, and its direction, as well as being an intuitively easy concept to grasp. Both Lancet and British Medical Journal have published numerous articles on this topic, which have been gathered, by the latter journal, into a book (9) with an associated computer program, the Confidence Interval Analysis calculator (10).³ It is, of course, accepted that the SD value may also be used to calculate the SE and the confidence interval. In practice, this is rarely done, as a random inspection of this Journal will show.

³ These references may be obtained from Subscriber Services, American College of Physicians, Independence Mall West, Sixth Street at Race, Philadelphia, PA 19106-1572.

Examples of Confidence Interval Calculations

Some examples are provided to demonstrate the importance of the explicit description of uncertainty.

Confidence intervals: means. The simplest example of the value of using the confidence interval can be seen when referring to a mean value (11). Its confidence interval is obtained by calculating SE, obtaining the appropriate t-value, as explained above, and evaluating the term (t × SE). Thus, when the sample size is 15, the mean = 10.0, SD = 3.0, $SE = 3/\sqrt{15} = 0.775$, and t = 2.145, the 95% confidence interval on each side of the mean is 10 ± 0.775 × 2.145 = 10 ± 1.66 (i.e., 8.34 to 11.66); for a population of n = 100, $SE = 3/\sqrt{100} = 0.3$, t = 1.984, and the 95% confidence interval is now mean 10 ± 0.595, or 9.405 to 10.595. Such data illustrate the profound influence of population size on the confidence interval already demonstrated in Figure 1. Again, it is not hard to find articles in Clinical Chemistry that display, for example, mean ± SD values for several groups but with the sizes of the groups varying from 20 to >100! Confidence intervals would have given a much clearer understanding of the variability of the data.

The calculations described above may be completely avoided by use of the Confidence Interval Analysis program mentioned earlier (10).

Confidence intervals: proportions. Sensitivity (TP rate) and specificity (true negative, or TN, rate) data are commonly reported in Clinical Chemistry. However, it is unusual to see the confidence intervals provided with such data. The need for these can readily be appreciated by an examination of Figure 2, which shows the effect of population size on the 95% confidence intervals for a test with a sensitivity of 90%. When the population is <20, the zone of uncertainty exceeds 30%. This aspect is certainly not appreciated by many workers who appear to be seduced by the apparently satisfactory test performance indicated by a sensitivity of 90%. The confidence interval of proportions, such as sensitivity and specificity, follows a binomial distribution; therefore, unless the proportion is exactly 50% or the sample size is large, the distribution is asymmetric, as shown in Figure 2. These intervals may be calculated (12, 13) or exact values for the 90% and 95% zones may be obtained for population sizes from n = 2 to 100 from the table of binomial distributions in the Geigy Scientific Tables (14). When n >100, the simple formula given by Gardner and Altman (12) suffices. Alternatively, these limits may be obtained by use of the Confidence Interval Analysis program mentioned earlier (10).

Confidence intervals: likelihood ratios. Bayesian analysis is frequently invoked in Clinical Chemistry. The likelihood ratio is the link between the pretest and the posttest odds of disease (15). Of course, as with all such estimates, the likelihood ratio is subject to error, which the confidence interval quantifies (16). The 95% confidence interval of a likelihood ratio (LR) value is $\mathrm{LR}^{a}$ to $\mathrm{LR}^{b}$, where $a = 1 - (1.96/\sqrt{\chi^2})$ and $b = 1 + (1.96/\sqrt{\chi^2})$. $\chi^2$ is evaluated by the simplified formula for a 2 × 2 predictive value table. Beck (17) provides a detailed example of the calculations. Again, it is uncommon to see likelihood ratios associated with this essential indication of variability in Clinical Chemistry, although it is surely as important as the provision of SD or SE.

Confidence intervals: area under the ROC curve. ROC curve analysis is an important and powerful tool for evaluating a test's diagnostic accuracy. The essential index of accuracy, when using ROC curve analysis, is the area under the curve (18). Swets (18) also suggests that areas of 0.5 to 0.7 denote low accuracy, 0.7 to 0.9 moderate accuracy, and >0.9 high accuracy. However, one must actually measure the area to establish the magnitude of the accuracy. Bamber (19) has shown that the area under the curve is related to the Mann-Whitney U-statistic (a nonparametric test based on rank order). This is the basis for the Hanley-McNeil procedure for obtaining these areas (20). Nonetheless, it is rare to encounter this essential index in the pages of Clinical Chemistry, although ROC curves are frequently used. But all such estimates of accuracy also require an indication of the extent of error of this estimation, which is provided by the SE. Unfortunately, the calculations of both the area under the curve and the SE (20, 21) are tedious and prone to error, and are best performed either by spreadsheet analysis (21) or by a more comprehensive computer program (22, 23). Beck and Shultz (21) give an extended illustration of these calculations.
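Returning to the worked example for means earlier in this section, the arithmetic can be checked with a short sketch. It assumes only the figures quoted in the text (mean 10.0, SD 3.0, and the tabulated t-values 2.145 and 1.984); the function name `mean_ci` is a hypothetical label used for illustration, not part of any program cited here.

```python
import math

def mean_ci(mean, sd, n, t_value):
    """Return (lower, upper) limits for a mean, given SD, sample size and t-value."""
    se = sd / math.sqrt(n)      # standard error of the mean, SE = SD / sqrt(n)
    half_width = t_value * se   # distance from the mean to each limit
    return mean - half_width, mean + half_width

# t-values are copied from the text (two-tailed, alpha = 0.05, n - 1 degrees of freedom).
small = mean_ci(10.0, 3.0, 15, 2.145)
large = mean_ci(10.0, 3.0, 100, 1.984)

print(f"n = 15:  {small[0]:.2f} to {small[1]:.2f}")   # n = 15:  8.34 to 11.66
print(f"n = 100: {large[0]:.3f} to {large[1]:.3f}")   # n = 100: 9.405 to 10.595
```

The t-value is passed in rather than computed because the text reads it from a table; a statistics library could supply it instead.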
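The asymmetry of the binomial interval described under proportions can also be illustrated with a sketch. It computes exact (Clopper-Pearson) 95% limits by bisection on the binomial distribution. This is one common definition of "exact" limits and is assumed here; it is not taken from the Geigy tables or from the Confidence Interval Analysis program. The function names and the counts (9 of 10, 18 of 20, and so on, chosen only to give a sensitivity of 90%) are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, lo=0.0, hi=1.0, iters=60):
    """Bisection for the root of a monotonic function f on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid    # root lies in the upper half
        else:
            hi = mid    # root lies in the lower half
    return (lo + hi) / 2

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson limits for k positives out of n."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

# Hypothetical counts, each giving an observed sensitivity of 90%.
for k, n in [(9, 10), (18, 20), (45, 50), (90, 100)]:
    lower, upper = exact_ci(k, n)
    print(f"n = {n:3d}: {100 * lower:.1f}% to {100 * upper:.1f}%")
```

For each sample size the lower limit lies further from 90% than the upper limit does, and the interval narrows as n grows, which is the behaviour the text attributes to Figure 2.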
Fig. 2. Relationship between population size and the 95% confidence interval for an estimate of sensitivity
These limits were obtained from the Geigy Scientific Tables (14); the values were corroborated by using the Confidence Interval Analysis program (10).
[Plot not reproduced: x-axis, population size (0 to 100); y-axis scale, 50 to 100; sensitivity = 90%.]

Confidence intervals: regression.⁴ Clinical Chemistry (2) requires an extensive list of statistical parameters when contributors use linear-regression analysis. Although sufficient information is provided by this requirement for a reader to construct the confidence interval for the mean value of y, it would be more useful to provide that information directly on the graph. Figure 3A illustrates the usual type of plot seen in Clinical Chemistry, whereas Figure 3B shows, in addition, two zones of uncertainty (26). Thus, it is possible to see (in the inner zone) that for serum glucose concentrations of 5 and 25 mmol/L, the 95% confidence intervals for the mean blood glucose meter readings are 3.39-10.4 and 19.4-26.0 mmol/L, respectively. The outer zone shows the uncertainty in predicted values of y for an individual value of x: the prediction or tolerance interval. Clearly, graph B provides much more information about the scatter of the data than does graph A, although the latter type of graph is that commonly seen in Clinical Chemistry.

⁴ I have avoided discussion of the advantages of the Deming plot over the conventional least-squares method (24) or of the bias plot (25) for examining the relationship between two variables.

The confidence interval for the mean value of y for a given value of x is calculated as described earlier for means, but with use of the standard error of the estimate, $SE(y_{est})$, which is obtained during the linear-regression procedure, and the appropriate value of t for n - 2 degrees of freedom and the percentage point (α) for the appropriate confidence interval (e.g., for 99%, α = 0.01, etc.) as previously mentioned.

The value of y ($y_{est}$) is calculated for the chosen value of x; thus, $y_{est} = (0.79 \times 5.0) + 2.94$, and the $SE(y_{est})$ is calculated from the expression:

$$SE(y_{est}) = S_{res}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{(n - 1)S_x^2}}$$

For x = 5 mmol/L, $y_{est} = 6.89$, and with the values for $S_{res}$ and $S_x$ taken from the legend to Figure 3, $SE(y_{est})$ is evaluated thus:

$$SE(y_{est}) = 2.14\sqrt{\frac{1}{8} + \frac{(5 - 15.5)^2}{7(6.99)^2}} = 1.43$$

The value of t, for 6 degrees of freedom and α = 0.05, is 2.45; the 95% confidence interval is therefore:

$$y_{est} - t \cdot SE(y_{est}) \ \text{to} \ y_{est} + t \cdot SE(y_{est}),$$

i.e., $y_{est} - 2.45 \cdot SE(y_{est})$ to $y_{est} + 2.45 \cdot SE(y_{est})$,

or 3.39 to 10.39 mmol/L.
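The calculation of the confidence interval for the mean reading can be sketched in a few lines. The sketch assumes the summary statistics quoted in the legend to Figure 3 and the tabulated t-value of 2.45; `S_RES` is this sketch's name for the residual standard deviation (2.14 in the legend), and `mean_response_ci` is a hypothetical helper, not a function of the Confidence Interval Analysis program.

```python
import math

# Summary statistics quoted in the legend to Figure 3.
SLOPE, INTERCEPT = 0.79, 2.94
N, X_BAR, S_X = 8, 15.5, 6.99
S_RES = 2.14     # residual standard deviation (name assumed for this sketch)
T_VALUE = 2.45   # t for n - 2 = 6 degrees of freedom, alpha = 0.05, from the text

def mean_response_ci(x):
    """Return (y_est, SE, lower, upper) for the mean value of y at a given x."""
    y_est = SLOPE * x + INTERCEPT
    se = S_RES * math.sqrt(1 / N + (x - X_BAR) ** 2 / ((N - 1) * S_X ** 2))
    return y_est, se, y_est - T_VALUE * se, y_est + T_VALUE * se

for x in (5.0, 25.0):
    y_est, se, lower, upper = mean_response_ci(x)
    print(f"x = {x:4.1f}: y_est = {y_est:.2f}, SE = {se:.2f}, "
          f"95% CI = {lower:.2f} to {upper:.2f} mmol/L")
```

Because the sketch carries the unrounded SE through to the limits, the last digit can differ slightly from the text, which rounds SE to 1.43 before multiplying by t.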
The prediction interval is calculated on the same basis as the confidence interval calculated above, but with use of a different expression to obtain $SE(y_{pred})$:

$$SE(y_{pred}) = S_{res}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{(n - 1)S_x^2}}$$

For x = 5 mmol/L, $y_{pred} = (0.79 \times 5.00) + 2.94 = 6.89$, and the expression for $SE(y_{pred})$ is:

$$SE(y_{pred}) = 2.14\sqrt{1 + \frac{1}{8} + \frac{(5 - 15.5)^2}{7(6.99)^2}} = 2.57$$

As before, the value of t, for 6 degrees of freedom and α = 0.05, is 2.45; the 95% prediction interval is therefore:

$$y_{pred} - t \cdot SE(y_{pred}) \ \text{to} \ y_{pred} + t \cdot SE(y_{pred}),$$

i.e., $y_{pred} - 2.45 \cdot SE(y_{pred})$ to $y_{pred} + 2.45 \cdot SE(y_{pred})$,

or 0.598 to 13.2 mmol/L.

Fig. 3. Relationship between serum glucose and blood glucose meter estimations
The regression line is y = 0.79x + 2.94, r = 0.941, $\bar{x}$ = 15.5, $\bar{y}$ = 15.2, $S_x$ = 6.99, $S_y$ = 5.87, $S_{res}$ = 2.14, SE of the slope = 0.116, and SE of the intercept = 1.94. Graph A shows the eight points used to obtain the least-square fit; graph B shows, in addition, the 95% confidence interval for the mean blood glucose meter reading (inner zone), and the 95% prediction interval for an individual blood glucose meter reading (outer zone).
[Plots not reproduced: panels A and B; x-axis, serum glucose (mmol/L), 0 to 30; y-axis scale, 0 to 30.]

These rather tedious calculations can be completely avoided by use of the Confidence Interval Analysis program mentioned earlier (10).

Confidence intervals: correlation. The correlation coefficient, r, also has a degree of uncertainty and this can be estimated if x and y have a joint bivariate normal distribution (26). The calculation requires the transformation of r to Fisher's z, as follows:

$$z = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right)$$

which has a standard error of $1/\sqrt{n - 3}$. For the 90% confidence interval, the standard error is multiplied by 1.645, for 95% by 1.96, and for 99% by 2.576. Therefore, for the 95% confidence interval, the upper and lower limits are:

$$z_2 = z + \frac{1.96}{\sqrt{n - 3}} \quad \text{and} \quad z_1 = z - \frac{1.96}{\sqrt{n - 3}}$$

The values $z_1$ and $z_2$ need to be transformed back to the original scale to provide the 95% confidence interval for the correlation coefficient, by use of the expression:

$$\frac{e^{2z_1} - 1}{e^{2z_1} + 1} \quad \text{to} \quad \frac{e^{2z_2} - 1}{e^{2z_2} + 1}$$

For the correlation coefficient in the legend to Figure 3, r = 0.941, z = 1.7467, $z_2$ = 2.6228, $z_1$ = 0.87058, and the 95% confidence interval for the correlation coefficient is 0.702 to 0.989.

All of these equations can be evaluated by direct entry into the sets of z-transformations in the Geigy Scientific Tables (27), thus avoiding a set of awkward arithmetic manipulations. Alternatively, the Confidence Interval Analysis program may be used (10).

Confidence intervals: nonparametric analyses. When a studied population has a nonnormal distribution (a fairly common occurrence in the practice of clinical chemistry), the commonly used descriptor of the population is the median. Assume that 11 observations have been made, the results of which are listed in ascending order (a necessary first step in nonparametric statistics): 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, and 25, with a median value of 15. This data set will be used to obtain the 95% confidence interval for the median.

The approximate confidence interval for the median (28) may be calculated as follows:

$$L = \frac{n}{2} - \left(1.96\,\frac{\sqrt{n}}{2}\right) \quad \text{and} \quad U = 1 + \frac{n}{2} + \left(1.96\,\frac{\sqrt{n}}{2}\right)$$

where L and U are the lower and upper limits, respectively, and the multiplier is 1.645 for the 90% confidence interval, 1.96 for 95%, and 2.576 for 99%. For the example above, these limits evaluate to:

$$L = \frac{11}{2} - \left(1.96\,\frac{\sqrt{11}}{2}\right) = 2.25 \quad \text{and} \quad U = 1 + \frac{11}{2} + \left(1.96\,\frac{\sqrt{11}}{2}\right) = 9.75$$

or, on rounding to the nearest integer, L = 2 and U = 10. Therefore, the lower 95% limit is the 2nd observation (value 7) in the ordered set and the upper limit is the 10th observation (value 23). Alternatively, exact values for a range of α can be obtained for population sizes from n = 2 to 499 from the table of binomial distributions (P = 0.5) in the Geigy Scientific Tables (14); from these tables, L = 1 (value 5) and U = 10 (value 23). Note that the 95% confidence interval for the median is very wide: it includes 82% (9 of 11) of the population. This is a feature of confidence intervals for small samples from nonnormal distributions.

Finally, the Confidence Interval Analysis program (10) may be used to obtain either a Wilcoxon or binomial-based confidence interval. I emphasize that these calculations provide only approximate values; standard textbooks should be consulted for more detailed procedures (29, 30).

P Values

Information for Authors (2) suggests that sole reliance should not be placed on, for example, P values, but the experience quoted above suggests otherwise. As far back as 1978, Rothman (31) stated, in an editorial in the New England Journal of Medicine, that "P values . . . are not good measures of the strength of the relation between study variables. P values serve poorly as descriptive statistics." Bailar and Mosteller write (32), in an article originally prepared for the Annals of Internal Medicine, although not cited in Clinical Chemistry's Information for Authors, "Confidence intervals offer a more informative way to deal with the significance test than does a simple P value. Confidence intervals for a single mean or a proportion provide information about both magnitude and its variability." Likewise, Gardner and Altman (33), professional statisticians, comment that "even precise P values convey nothing about the sizes of the differences between study groups." However, a random search through Clinical Chemistry shows an undue reliance on P value boundaries, e.g., P <0.01, >0.05, and so on. Although, in the past, it was necessary to rely on statistical tables for the values of P, many commonly available microcomputer statistical programs can calculate an exact value for P; so why are P values still given in this manner? The undue reliance on a P value above or below 0.05 has in any case been savaged by Feinstein (34), who as a mathematician and clinical epidemiologist speaks with considerable authority, in terms that should be reprinted in all Advice to Authors-type articles:

". . . the statistical strategy proposed by Sir Ronald Fisher, who regarded 95% of the inner values [of a distribution] as common and the remaining 5% as significantly uncommon. Although the strategy is regularly used to designate the outer 5% of values as abnormal, Murphy [35] has pointed out that 'contrary to popular opinion, this demarcation of abnormality is not a recommendation of statisticians and . . . has no support from statistical theory.' Fisher's proposed boundary of uncommon occurrences was intended for inferential decisions about P values, not for descriptive decisions about normality. Nevertheless, after years of exposure to 0.05 as the magic level of stochastic significance, many clinicians have become thoroughly conditioned to accept the same boundary marker for abnormality."

The acceptance or rejection of a null hypothesis on the basis of a P value of 0.049 or 0.051 is clearly nonsensical, but this aspect dominates biomedical thinking, as Feinstein observed.

It may now be obvious that there is a close relationship between the P value and the confidence interval of the result for a sample. When the P value is "significant," i.e., <0.05, the 95% confidence interval will not include the value specified by the null hypothesis. However, the P value does not indicate the magnitude of the difference, or its direction, and the degree of associated uncertainty is unknown. By contrast, the 95% confidence interval provides some sense of all of these aspects and therefore is a more useful index for assessing the validity of the data.

A Statistical Check List for Clinical Chemistry?

Many publications have stressed the inappropriate use of statistical techniques, and in recent years several journals have started to use statistical check lists. Clinical Chemistry now uses a style check list; could a statistical check list be added? The following topics may possibly ensure statistical respectability. They have been culled from several sources (3, 32, 36):

Design of the reported study. Have the study's design and objectives been sufficiently described? Have the null and alternative hypotheses been stated? Was the sample size based on pre-study considerations of statistical power? How were the subjects in the study chosen?

Analysis and presentation of the data. Have technical terms been correctly used? Have statistical terms and abbreviations been adequately defined? Was the number of subjects/samples stated? Were the statistical analyses appropriate and were reasons given for their use? Were these statistical procedures adequately described or referenced? Were indications of measurement error or uncertainty provided? Is there undue reliance on P values? Are the tables and figures adequate?

When the major medical journals are increasingly insisting on rational statistical rigor, can Clinical Chemistry afford to ignore such a commendable trend? I believe that this brief outline of the need to employ confidence intervals in the Journal should convince others that the provision of more relevant statistical measurements will enhance the presentation of experimental data in Clinical Chemistry.

References

1. International Committee of Medical Journal Editors. Uniform requirements for manuscripts submitted to biomedical journals [Special Report]. N Engl J Med 1992;324:424-8.
2. Information for authors. Clin Chem 1992;38:1-5.
3. Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991:611pp.
4. Feinstein AR. Clinical biostatistics. St. Louis, MO: CV Mosby Co., 1977:468pp.
5. Gore SM, Altman DG, eds. Statistics in practice: comprising Statistics in question and Statistics and ethics in medical research. London: British Medical Association, 1982:100pp.
6. Flynn FV, Piper KAJ, Garcia-Webb P, McPherson K, Healy MJR. The frequency distributions of commonly determined blood constituents in healthy blood donors. Clin Chim Acta 1974;52:163-71.
7. Feinstein AR. Clinical biostatistics XXXVII: demeaned errors, confidence games, non-plussed minuses, inefficient coefficients, and other statistical disruptions of scientific communication. Clin Pharmacol Ther 1976;20:617-31.
8. Ware JH, Mosteller F, Delgado F, Donnelly C, Ingelfinger JA. P values. In: Bailar JC, Mosteller F, eds. Medical uses of statistics, 2nd ed. Boston: NEJM Books, 1992:181-200.
9. Gardner MJ, Altman DG, eds. Statistics with confidence: confidence intervals and statistical guidelines. London: British Medical Journal, 1989:140pp.
10. Gardner MJ, Gardner SB, Winter PD. Confidence interval analysis microcomputer program. London: British Medical Journal, 1989:77pp.
11. Altman DG, Gardner MJ. Calculating confidence intervals for means and their differences. In: Gardner MJ, Altman DG, eds. Statistics with confidence: confidence intervals and statistical guidelines. London: British Medical Journal, 1989:20-7.
12. Gardner MJ, Altman DG. Calculating confidence intervals for proportions and their differences. In: Gardner MJ, Altman DG, eds. Statistics with confidence: confidence intervals and statistical guidelines. London: British Medical Journal, 1989:28-33.
13. Armitage P, Berry G. Statistical methods in medical research, 2nd ed. Oxford: Blackwell Scientific Publications, 1987:115-20.
14. Diem K, Seldrup J. Geigy scientific tables, vol. 2. Introduction to statistics, statistical tables, & mathematical formulae: the binomial distribution, 8th ed. Basle: Ciba-Geigy, 1982:73-107.
15. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology. A basic science for clinical medicine, 2nd ed. Boston: Little, Brown and Co., 1991:441pp.
16. Miettinen OS. Estimability and estimation in case-referent studies. Am J Epidemiol 1976;103:226-35.
17. Beck JR. Likelihood ratios: another enhancement of sensitivity and specificity [Editorial]. Arch Pathol Lab Med 1986;110:685-6.
18. Swets JA. Measuring the accuracy of diagnostic systems. Science 1988;240:1285-93.
19. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating graph. J Math Psychol 1975;12:387-415.
20. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
21. Beck JR, Shultz BK. The use of relative operating characteristic (ROC) curves in test performance evaluation. Arch Pathol Lab Med 1986;110:13-20.
22. Pellar TG, Leung FY, Henderson AR. A computer program for rapid generation of receiver operating characteristic curves and likelihood ratios in the evaluation of diagnostic tests. Ann Clin Biochem 1988;25:411-6.
23. Pellar TG, Galbraith LV, Leung FY, Henderson AR. A computer program to determine diagnostic decision thresholds and likelihood ratios illustrated with aspartate aminotransferase activities after a myocardial infarction. Ann Clin Biochem 1989;26:533-7.
24. Cornbleet PJ, Gochman N. Incorrect least-square regression coefficients in method-comparison analysis. Clin Chem 1979;25:432-8.
25. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;i:307-10.
26. Altman DG, Gardner MJ. Calculating confidence intervals for regression and correlation. In: Gardner MJ, Altman DG, eds. Statistics with confidence: confidence intervals and statistical guidelines. London: British Medical Journal, 1989:34-49.
27. Diem K, Seldrup J. Geigy scientific tables, vol. 2. Introduction to statistics, statistical tables & mathematical formulae: z-transformation, 8th ed. Basle: Ciba-Geigy, 1982:64-7.
28. Campbell MJ, Gardner MJ. Calculating confidence intervals for some non-parametric analyses. In: Gardner MJ, Altman DG, eds. Statistics with confidence: confidence intervals and statistical guidelines. London: British Medical Journal, 1989:71-9.
29. Sprent P. Quick statistics: an introduction to non-parametric methods. Harmondsworth, UK: Penguin Books, 1981:264pp.
30. Sprent P. Applied nonparametric statistical methods. London: Chapman and Hall, 1989:259pp.
31. Rothman KJ. A show of confidence [Editorial]. N Engl J Med 1978;299:1362-3.
32. Bailar JC, Mosteller F. Guidelines for statistical reporting in articles for medical journals: amplifications and explanations. In: Bailar JC, Mosteller F, eds. Medical uses of statistics, 2nd ed. Boston: NEJM Books, 1992:313-31.
33. Gardner MJ, Altman DG. Estimation rather than hypothesis testing: confidence intervals rather than P values. In: Gardner MJ, Altman DG, eds. Statistics with confidence: confidence intervals and statistical guidelines. London: British Medical Journal, 1989:6-19.
34. Feinstein AR. Clinical epidemiology: the architecture of clinical research. Philadelphia: WB Saunders Co., 1985:812pp.
35. Murphy EA. The normal, and perils of the sylleptic argument. Perspect Biol Med 1972;15:566-82.
36. Gardner MJ, Machin D, Campbell MJ. Use of check lists in assessing the statistical content of medical studies. In: Gardner MJ, Altman DG, eds. Statistics with confidence: confidence intervals and statistical guidelines. London: British Medical Journal, 1989:101-8.

Additional Reading

1. Bland M. An introduction to medical statistics. Oxford: Oxford Medical Publications, 1987:365pp.
2. Conover WJ. Practical nonparametric statistics, 2nd ed. New York: John Wiley & Sons, 1980:493pp.
3. Fleiss JL. Statistical methods for rates and proportions, 2nd ed. New York: John Wiley & Sons, 1981:321pp.
4. Sprent P. Applied nonparametric statistical methods. London: Chapman and Hall, 1989:259pp.
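Supplementary sketch for the section on confidence intervals for correlation: the Fisher z calculation can be reproduced in a few lines. The sketch assumes the values quoted in that section (r = 0.941, n = 8, multiplier 1.96); the function name `correlation_ci` is hypothetical. It uses the identity that (e^(2z) - 1)/(e^(2z) + 1) is the hyperbolic tangent of z.

```python
import math

def correlation_ci(r, n, multiplier=1.96):
    """Return (z, lower, upper) for a correlation coefficient via Fisher's z."""
    z = 0.5 * math.log((1 + r) / (1 - r))        # Fisher's z transformation
    half_width = multiplier / math.sqrt(n - 3)   # multiplier x standard error of z
    z1, z2 = z - half_width, z + half_width
    # Back-transform: (e^(2z) - 1) / (e^(2z) + 1) equals tanh(z).
    return z, math.tanh(z1), math.tanh(z2)

z, lower, upper = correlation_ci(0.941, 8)
print(f"z = {z:.4f}, 95% CI for r = {lower:.3f} to {upper:.3f}")
```

The limits agree with the quoted 0.702 to 0.989 to within about 0.001; the small difference in the third decimal place presumably reflects rounding in the intermediate values given in the text.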

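Supplementary sketch for the section on nonparametric analyses: the approximate rank-based limits for the median can be checked against the 11 observations listed there. The function name `median_ci_ranks` is hypothetical, and the sketch implements only the approximate formula quoted in the text, not the exact binomial tables.

```python
import math

def median_ci_ranks(n, multiplier=1.96):
    """Approximate lower and upper ranks (1-based, unrounded) for the median's CI."""
    half = multiplier * math.sqrt(n) / 2
    return n / 2 - half, 1 + n / 2 + half

data = sorted([5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25])
lower, upper = median_ci_ranks(len(data))
l_rank, u_rank = round(lower), round(upper)   # round to the nearest integer rank

print(f"L = {lower:.2f}, U = {upper:.2f}")                     # L = 2.25, U = 9.75
print(f"limits: {data[l_rank - 1]} to {data[u_rank - 1]}")     # limits: 7 to 23
```

The ranks are 1-based, so the observation at rank L is `data[L - 1]`.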
