The Delphi technique as a forecasting tool: issues and analysis

Gene Rowe a,*, George Wright b

a Institute of Food Research, Norwich Research Park, Colney, Norwich NR4 7UA, UK
b Strathclyde Graduate Business School, Strathclyde University, 199 Cathedral Street, Glasgow G4 0QU, UK
Abstract
This paper systematically reviews empirical studies looking at the effectiveness of the Delphi technique, and provides a
critique of this research. Findings suggest that Delphi groups outperform statistical groups (by 12 studies to two with two
ties) and standard interacting groups (by five studies to one with two ties), although there is no consistent evidence that
the technique outperforms other structured group procedures. However, important differences exist between the typical
laboratory version of the technique and the original concept of Delphi, which make generalisations about Delphi per se
difficult. These differences derive from a lack of control of important group, task, and technique characteristics (such as the
relative level of panellist expertise and the nature of feedback used). Indeed, there are theoretical and empirical reasons to
believe that a Delphi conducted according to ideal specifications might perform better than the standard laboratory
interpretations. It is concluded that a different focus of research is required to answer questions on Delphi effectiveness,
focusing on an analysis of the process of judgment change within nominal groups. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Delphi; Interacting groups; Statistical group; Structured groups
1. Introduction
Since its design at the RAND Corporation over 40
years ago, the Delphi technique has become a widely
used tool for measuring and aiding forecasting and
decision making in a variety of disciplines. But what
do we really understand about the technique and its
workings, and indeed, how is it being employed? In
this paper we adopt the perspective of Delphi as a
judgment or forecasting or decision-aiding tool, and
we review the studies that have attempted to evaluate
it. These studies are actually sparse, and, we will
argue, their attempts at evaluating the technique have
*Corresponding author. Tel.: +44-1603-255-125.
E-mail address: gene.rowe@bbsrc.ac.uk (G. Rowe)
brief review only. The Delphi technique was developed during the 1950s by workers at the RAND Corporation while working on a U.S. Air Force-sponsored project. The aim of the project was "the application of expert opinion to the selection, from the point of view of a Soviet strategic planner, of an optimal U.S. industrial target system, with a corresponding estimation of the number of atomic bombs required to reduce munitions output by a prescribed amount". More generally, the technique is
seen as a procedure to "obtain the most reliable consensus of opinion of a group of experts . . . by a series of intensive questionnaires interspersed with controlled opinion feedback" (Dalkey & Helmer, 1963, p. 458). In particular, the structure of the
technique is intended to allow access to the positive
attributes of interacting groups (knowledge from a
variety of sources, creative synthesis, etc.), while
pre-empting their negative aspects (attributable to
social, personal and political conflicts, etc.). From a
practical perspective, the method allows input from a
larger number of participants than could feasibly be
included in a group or committee meeting, and from
members who are geographically dispersed.
Delphi is not a procedure intended to challenge
statistical or model-based procedures, against which
human judgment is generally shown to be inferior: it
is intended for use in judgment and forecasting
situations in which pure model-based statistical
methods are not practical or possible because of the
lack of appropriate historical/economic/technical data, and thus where some form of human judgmental input is necessary (e.g., Wright, Lawrence &
Collopy, 1996). Such input needs to be used as
efficiently as possible, and for this purpose the
Delphi technique might serve a role.
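The statistical (staticized) group baseline that Delphi builds on is easy to make concrete. The following sketch is our own illustration with invented panel values, not an example from the literature reviewed here: independent judgmental estimates are combined with the median, the statistical rule most commonly used in Delphi polls.

```python
from statistics import median

def staticized_estimate(estimates):
    """Aggregate independent judgments into a single group
    response using the median."""
    return median(estimates)

# Five panellists independently forecast the same quantity
# (hypothetical values).
panel = [120.0, 95.0, 150.0, 110.0, 400.0]

print(staticized_estimate(panel))  # -> 120.0
```

The median is typically preferred to the mean here because a single wildly discrepant panellist (the 400.0 above) cannot drag the group response arbitrarily far.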
Four key features may be regarded as necessary
for defining a procedure as a Delphi. These are:
anonymity, iteration, controlled feedback, and the
statistical aggregation of group response. Anonymity
is achieved through the use of questionnaires. By
allowing the individual group members the opportunity to express their opinions and judgments
privately, undue social pressures (as from dominant or dogmatic individuals, or from a majority) should be avoided. Ideally, this should allow the individual
group members to consider each idea on the basis of
merit alone, rather than on the basis of potentially
of controlled experimentation or academic investigation that has taken place . . ." (and that) "aside from some RAND studies by Dalkey, most evaluations of the technique have been secondary efforts associated with some application which was the primary interest" (p. 11). Indeed, the relative paucity
of evaluation studies noted by Linstone and Turoff is
still evident. Although a number of empirical examinations have subsequently been conducted, the bulk
of Delphi references still concern applications rather
than evaluations. That is, there appears to be a
widespread assumption that Delphi is a useful instrument that may be used to measure some kind of 'truth', and it is mainly represented in the literature in this
vein. Consideration of what the evaluation studies
actually tell us about Delphi is the focus of this
paper.
Table 1
Summary of the methodological features of Delphi in experimental studies a
[Table 1 columns: Study; Study type (application, process, or technique comparison); Delphi group size; Rounds; Nature of Delphi feedback; Nature of subjects; Task; Incentives offered? Studies tabulated include Best (1974), Brockhoff (1975), Miner (1979), Rohrbaugh (1979), Fischer (1981), Spinelli (1983), Riggs (1983), Dietz (1987), Sniezek (1989) and Sniezek (1990). Individual cell entries are not recoverable from this copy.]
G. Rowe, G. Wright / International Journal of Forecasting 15 (1999) 353 375
[Table 1 continued: rows for Dietz (1987), Sniezek (1989) and Sniezek (1990); cell entries are not recoverable from this copy.]
a * Analysis significant at the P < 0.05 level; NS, analysis not significant at the P < 0.05 level (trend only); NDA, no statistical analysis of the noted relationship was reported; >, greater/better than; =, no (statistically significant) difference between; D, Delphi; R, first-round average of Delphi polls; S, staticized group approach (average of a non-interacting set of individuals); when used, this indicates that separate staticized groups were used instead of (or in addition to) the average of the individual panellists from the first-round Delphi polls (i.e., instead of/in addition to R); I, interacting group; NGT, Nominal Group Technique; r, round; P, panellist; NA, not applicable.
Table 2
Results of the experimental Delphi studies a
[Table 2 columns: Study; Independent variables (and levels) b,c; Dependent variables (and measures); Results of technique comparisons; Other results, e.g. re processes; Additional comments. Recoverable result fragments include: Accuracy: D>R*; Satisfaction: NGT>D*, I*; Consensus: NGT=D=I=S; Accuracy: R=I; Accuracy: D>I*; Accuracy: D=NGT=I=S (Procedures Main Effect: P > 0.20); Accuracy: D=SJ, with post-group agreement SJ>D (significant); Quality: D>R*. Remaining cells are not recoverable from this copy.]
b In many of the studies, a number of different questions are presented to the subjects, so that items may be considered to be an independent variable. Usually, however, there is no specific rationale behind analysing responses to the different items, so we do not consider items an I.V. here.
c For the number of rounds (i.e. levels of the independent variable Round), see Table 1.
5. Findings
In this section, we consider the results obtained by
the evaluative studies as summarised in Table 2, to
which the reader is referred. We will return to the
details in Table 1 in our subsequent critique.
5.1. Consensus
One of the aims of using Delphi is to achieve
greater consensus amongst panellists. Empirically,
consensus has been determined by measuring the
variance in responses of Delphi panellists over
rounds, with a reduction in variance being taken to
indicate that greater consensus has been achieved.
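To make this operational definition concrete, the following sketch (hypothetical figures, not data from any study reviewed here) computes the variance of panellists' estimates on successive rounds; a fall in variance over rounds is the usual index of increased consensus.

```python
from statistics import pvariance

# Hypothetical estimates from the same six panellists on rounds 1 and 2.
round1 = [40.0, 55.0, 70.0, 30.0, 65.0, 50.0]
round2 = [48.0, 52.0, 60.0, 45.0, 58.0, 50.0]  # judgments after feedback

# Variance of the panel's responses on each round.
v1, v2 = pvariance(round1), pvariance(round2)
print(v1, v2)

# Reduced variance on the later round is taken to indicate
# that greater consensus has been achieved.
assert v2 < v1
```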
Results from empirical studies seem to suggest that
variance reduction is typical, although claims tend to
be simply reported unanalysed (e.g., Dalkey &
Helmer, 1963), rather than supported by analysis
(though see Jolson & Rossow, 1971). Indeed, the
Table 3
Comparisons of Delphi to staticized (round 1) groups a

Study | Criteria | Trend of relationship (moderating variable)
? | Accuracy | D>R (NDA)
? | Accuracy | D<S (NDA)
? | Accuracy | D<R (NDA)
Best (1974) | Accuracy | D>R*
Brockhoff (1975) | Accuracy | ?
Rohrbaugh (1979) | Accuracy | D>R (NDA)
Fischer (1981) | Accuracy | D=S
? | Accuracy | R>D*
? | Accuracy | D>R*
? | Quality | D>R*
? | Quality | D>R*
Dietz (1987) | Accuracy | D>R
Sniezek (1989) | Accuracy | D>R
Sniezek (1990) | Accuracy | No reported comparison of accuracy change over D rounds
? | Accuracy | D (reasons fback)>R*
? | Accuracy | D (stats fback)>R*
(? = entry not recoverable from this copy; the additional-comments column is likewise not recoverable)
Table 4
Comparisons of Delphi to interacting groups a

Study | Criteria | Trend of relationship (moderating variable)
? | Accuracy | I>D (NDA)
? | Number of ideas | D>I*
Brockhoff (1975) | Accuracy | ?
Fischer (1981) | Accuracy | D=I
? | Accuracy | D>I*
Riggs (1983) | Accuracy | D>I*
? | Quality | D>I*
Sniezek (1989) | Accuracy | D>I (NDA)
Sniezek (1990) | Accuracy | D=I
(? = entry not recoverable from this copy; the additional-comments column is likewise not recoverable)
Table 5
Comparisons of Delphi to other structured group procedures a

Study | Criteria | Trend of relationship (moderating variable)
? | Accuracy | NGT>D*
? | # of ideas | NGT>D
Miner (1979) | Quality | PCL>NGT>D
Rohrbaugh (1979) | Accuracy | D=SJ
Fischer (1981) | Accuracy | D=NGT
? | Accuracy | D=NGT
? | Quality | D>NGT*; D>Guided Group*
Sniezek (1989) | Accuracy | Dictator=D=Dialectic (NDA)
Sniezek (1990) | Accuracy | D>BM (one of five items)
(? = entry not recoverable from this copy; the additional-comments column is likewise not recoverable)
forms that differ from one study to the next and that often differ from the ideal of Delphi. Process studies ask: "what is it about Delphi that makes it work?", and this involves both asking and answering the question: "what is Delphi?"
We believe that the emphasis of research should
be shifted from technique-comparison studies to
process studies. The latter should focus on the way
in which an initial statistical group is transformed by
the Delphi process into a final round statistical
group. To be more precise, the transformation of
statistical group characteristics must come about
through changes in the nature of the panellists, which
are reflected by changes over rounds in their judgments (and participation), which are manifest as
changes in the individual accuracy of the group
members, their judgmental intercorrelations, and
indeed, when panellists drop out over rounds, even their group size (as per Hogarth, 1978).
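The individual-level bookkeeping this implies can be sketched as follows. This is a toy example with an invented true value and panel; it simply shows the two measures, per-panellist error and statistical-group error per round, that such process research would track.

```python
from statistics import median

true_value = 100.0

# Hypothetical judgments of four panellists over two Delphi rounds.
rounds = {
    1: {"p1": 80.0, "p2": 130.0, "p3": 95.0, "p4": 140.0},
    2: {"p1": 90.0, "p2": 120.0, "p3": 95.0, "p4": 115.0},
}

# Individual-level measure: each panellist's absolute error per round.
individual_error = {
    r: {p: abs(est - true_value) for p, est in ests.items()}
    for r, ests in rounds.items()
}

# Group-level measure: error of the statistical (median) group response.
group_error = {
    r: abs(median(ests.values()) - true_value) for r, ests in rounds.items()
}

print(individual_error)
print(group_error)
```

Analysing `individual_error` across rounds shows who changed judgment and whether the change was toward the true value; `group_error` shows how those individual changes aggregate into the nominal group's performance.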
One conceptualisation of the problem is to view
judgment change over rounds as coming about
through the interaction of internal factors related to
the individual (such as personality styles and traits),
and external factors related to technique design and
the general environment (such as the influence of
feedback type and the nature of the judgment task).
From this perspective, the role of research should be
to identify which factors are most important in
explaining how and why an individual changes his/her judgment, and which are related to change in the
desired direction (i.e., towards more accurate judgment). Only by understanding such relationships will
it be possible to explain the contingent differences in
judgment-enhancing utility of the various available
procedures, which include Delphi. Importantly, from this perspective (and unlike in technique-comparison studies) analysis should be conducted at the
level of the individual panellist, rather than at the
level of the nominal group, since the non-interacting
nature of Delphi ensures that individual responses
and changes are central and measurable and (unlike
interacting groups) the ultimate group response is
effectively related to the sum of the parts and is not
holistically greater than these (e.g., Rowe et al.,
1991). The results of those few studies that we
consider to be of the process type are collated in
subsequent sections.
8. Conclusion
This paper reviews research conducted on the
Delphi technique. In general, accuracy tends to
increase over Delphi rounds, and hence tends to be
greater than in comparative staticized groups, while
Delphi panels also tend to be more accurate than
unstructured interacting groups. The technique has
shown no clear advantages over other structured
procedures.
Various difficulties exist in research of this technique-comparison type, however. Our main concern
is with the sheer variety of technique formats that
have been used as representative of Delphi, varying
from the technique ideal (and from each other) on
aspects such as the type of feedback used and the
nature of the panellists. If one uses a technique
References
Armstrong, J. S. (1985). Long range forecasting: from crystal ball to computer, 2nd ed., Wiley, New York.
Armstrong, J. S., & Lusk, E. J. (1987). Return postage in mail surveys: a meta-analysis. Public Opinion Quarterly 51(2), 233–248.
Ashton, R. H. (1986). Combining the judgments of experts: how many and which ones? Organizational Behaviour and Human Decision Processes 38, 405–414.
Bardecki, M. J. (1984). Participants' response to the Delphi method: an attitudinal perspective. Technological Forecasting and Social Change 25, 281–292.
Best, R. J. (1974). An experiment in Delphi estimation in marketing decision making. Journal of Marketing Research 11, 448–452.
Bolger, F., & Wright, G. (1994). Assessing the quality of expert judgment: issues and analysis. Decision Support Systems 11, 1–24.
Boje, D. M., & Murnighan, J. K. (1982). Group confidence pressures in iterative decisions. Management Science 28, 1187–1196.