In Living Color: Qualitative Methods in
Educational Evaluation
LINDA MABRY
Washington State University Vancouver, Vancouver, Washington
I know a Puerto Rican girl who became the first in her family to finish high
school and who then got a nursing degree. She started at the gallery by
participating in an art program, then worked at the front desk. I know an
Hispanic girl, a participant in one of our drama programs and later a
recipient of our housing resource program, who was the first in her family
to earn a college degree. She now works in the courts with our battered
women program .... Yesterday, I had lunch at a sandwich shop and met a
young woman ... who got a full scholarship to the University of Illinois
and was the first in her family to speak English or go to college. She said,
"That art program was the best thing that ever happened to me."
(P. Murphy, personal communication, December 2, 1994, quoted in
Mabry, 1998b, p. 154)
This is interview data collected in the course of studying an educational program
in four Chicago schools partnered with neighborhood arts agencies. It is
empirical data, but it is unlike test scores, costs per student, or graduation rates.
It is vividly experiential, differently compelling. It is qualitative data.
It is evaluation data, but is it good evaluation data? It is a testimonial from a
program developer, an articulate and committed proponent with a clear bias.
Confirmation of the events described by the respondent would have helped
establish the credibility of this information, but verification of these specifics was
somewhat peripheral to the evaluation and beyond its resources, limited as is the
case for many such inquiries. Evidence that the events described by the
interviewee were typical rather than isolated would have bolstered her implicit
claim of program worthiness and effectiveness and, overall, the dataset did
indicate program merit. But a day visitor during the program's first year reported
otherwise, a reminder that such data were open to contrary interpretation. Are
qualitative data stable, credible, useful? Do they help answer evaluation
questions or complicate them? Are qualitative methods worth the evaluation
resources they consume? How well do they improve understanding of the quality
of educational programs; how much do they clutter and distract?
International Handbook of Educational Evaluation, 167-185
T. Kellaghan, D.L. Stufflebeam (eds.)
© 2003 Dordrecht: Kluwer Academic Publishers. Printed in Great Britain.
Test scores no longer reign as the sole arbiters of educational program quality
even in the U.S., but testing's unabated capacity to magnetize attention reveals a
common yearning for simple, definitive, reliable judgments of the quality of
schooling. Many want evaluation to provide lucid representation of issues and
unambiguous findings regarding quality - evaluation with bottom lines,
evaluation in black-and-white. Qualitative data, lively and colorful, represent
programs from multiple vantage points simultaneously, as cubist portraits do.
Applied to the study of education, where causal attribution for long-term results
is notoriously difficult, where contexts and variables are labyrinthine, where con-
trasting ideologies and practices uneasily coexist, where constructivist learning
theories celebrate idiosyncratic conceptions of knowledge, the expansionist
motive of qualitative methods serves better to reveal complexity than to resolve
it. In educational evaluation, qualitative methods produce detailed, experiential
accounts which promote individual understandings more readily than collective
agreement and consensual programmatic action.
Naturalistic research designs in which "a single group is studied only once" by
means of "tedious collection of specific detail, careful observation, testing, and
the like" were disapproved by Campbell and Stanley (1963, pp. 6-7), who voiced
early objections from social science traditionalists that "such studies have such a
total absence of control as to be of almost no scientific value" (pp. 6-7). Yet, initial
articulation and justification of qualitative methods in educational evaluation by
Guba (1978) and Stake (1978) arose from no less troubling doubts about the
appropriateness and effectiveness of quantitative methods which might "foster
misunderstandings" or "lead one to see phenomena more simplistically than one
should" (Stake, 1978, pp. 6-7). Ricocheting doubts and defenses culminated in
public debate among three American Evaluation Association presidents and
others in the early 1990s (see Reichardt & Rallis, 1994). Since then, the so-called
paradigm wars have repeatedly been declared over or moot (see Howe, 1992),
but issues regarding the credibility and feasibility of qualitative methods in evalu-
ation continue to vex not only the so-called "quants," evaluators more confident
of quantitative than qualitative methods, but also the "quals" (Rossi, 1994, p. 23).
Qualitative methods, ethnographic in character and interpretivist in tradition,
have earned a place in evaluation's methodological repertoire. A number of
evaluation approaches have been developed that rely heavily on qualitative
methods (see, e.g., Eisner, 1985, 1991; Fetterman, 1996; Greene, 1997; Guba &
Lincoln, 1989; Patton, 1997; Stake, 1973). Mixed-method designs in evaluation
are attractive and popular (see, e.g., Datta, 1994, 1997; Greene, Caracelli, &
Graham, 1989). It has even been claimed that qualitative methods in educational
evaluation have overshadowed quantitative (see Rossi, 1994), although questions
linger and recur.
Qualitative epistemology and strategies for data collection, interpretation, and
reporting will be sketched here. Then, under the major categories of The
Program Evaluation Standards (Joint Committee, 1994), issues regarding
qualitative methods in the evaluation of educational programs will be presented
as fundamentally irresolvable.
QUALITATIVE METHODS IN RESEARCH AND EVALUATION
Qualitative methods have been described well in the literature of educational
research by Denzin (1989, 1997); Denzin and Lincoln and their colleagues
(2000); Eisner (1991); Erickson (1986); LeCompte and Preissle (1993); Lincoln
and Guba (1985); Stake (1978, 2000); and Wolcott (1994, 1995). In evaluation,
too, qualitative methods have been described (e.g., Greene, 1994; Guba &
Lincoln, 1989; Mabry, 1998a) and accepted as legitimate (e.g., House, 1994;
Shadish, Cook, & Leviton, 1991; Worthen, Sanders, & Fitzpatrick, 1997).
Despite warnings of methodological incommensurability (see especially Lincoln
& Guba, 1985) and epistemological incommensurability (Howe, 1992),
qualitative and quantitative strategies have been justified for mixed method
designs (see Greene, Caracelli, & Graham, 1989), and quantitative strategies are
not unknown as supportive elements in predominantly qualitative evaluation
designs (see Miles & Huberman, 1994). Still, the paradigmatic differences are
stark, even if viewed congenially as points on a series of shared continua.
Qualitative add-ons to quantitative designs are sometimes disregarded in overall
derivations of program quality. Qualitative data are sometimes relegated to
secondary status in other ways as well: considered merely exploratory prepara-
tion for subsequent quantitative efforts; mined for quantifiable indications of
frequency, distribution, and magnitude of specified program aspects; neglected
as inaggregable or impenetrably diverse and ambiguous, helpful only in providing
a colorful vignette or quotation or two. Such dismissiveness demonstrates that
problematic aspects remain regarding how qualitative data are construed,
collected, interpreted, and reported.
Epistemological Orientation
Sometimes in painful contrast to clients' expectations regarding scientific and
professional inquiry, qualitative methodologists do not seek to discover,
measure, and judge programs as objects. They - we - do not believe objectivity
is possible. To recognize the program or its aspects, even to notice them, is a
subjective act filtered through prior experience and personal perspective and
values. We do not believe a program is an entity which exists outside human
perception, awaiting our yardsticks. Rather, we think it a complex creation, not
static but continuously co-created by human perceptions and actions.
The meaning and quality of the program do not inhere in its mission statement
and by-laws, policies, personnel schematics, processes, outcomes, or relationships
to standards. The program does not exist - or does not meaningfully exist -
outside the experiences of stakeholders, the meanings they attach to those
experiences, and the behaviors that flow from those meanings and keep the
program in a perpetual state of revision. Reciprocally, program change and
evolution affect experiences and perceptions. Thus, the program is pliant, not
fixed, and more subjective than objective in nature. Coming to understand a
program is less like snapping a photograph of it and more like trying to paint an
impression of it in changing natural light.
Coming to understand a program's quality requires sustained attention from
an array of vantage points, analysis more complex and contextual than can be
anticipated by prescriptive procedures. Representation of a program requires
portrayal of subtle nuances and multiple lines of perspective. Qualitative
evaluators have responded to these challenges with stakeholder-oriented
approaches which prioritize variety in viewpoints. These approaches also offer a
variety of conceptualizations of the evaluator's role and responsibility. Guba and
Lincoln (1989) take as their charge the representation of a spectrum of
stakeholder perceptions and experiences in natural settings and the construction
of an evaluative judgment of program quality by the evaluator, an irreplicable
construction because of the uniqueness of the evaluator's individual rendering
but trustworthy because of sensitive, systematic methods verified by member-
checking, audit trails, and the like. Also naturalistic and generally but not
exclusively qualitative, Stake's (1973) approach requires the evaluator to
respond to emergent understandings of program issues and reserves judgment to
stakeholders, a stance which critics feel sidesteps the primary responsibility of
rendering an evaluation conclusion (e.g., Scriven, 1998). Eisner (1991) relies on
the enlightened eye of the expert to recognize program quality in its critical and
subtle emanations, not easily discerned from the surface even by engaged stake-
holders. These three variations on the basic qualitative theme define respectively
naturalistic, responsive, and connoisseurship approaches.
Some evaluation approaches, generally but not exclusively qualitative, orient
not only to stakeholder perspectives but also to stakeholder interests. Patton's
approach prioritizes the utilization of evaluation results by "primary intended
users" (1997, p. 21) as the prime goal and merit of an evaluation. More internally
political, Greene (1997) presses for local participation in evaluation processes to
transform working relationships among stakeholder groups, especially
relationships with program managers. Fetterman (1996) takes the empowerment
of stakeholders as an explicit and primary goal of evaluation. More specifically
ideological, House (1993) urges evaluation as instrumental to social justice,
House and Howe (1999) as instrumental to deliberative democracy, and Mertens
(1999) as instrumental to the inclusion of the historically neglected, including
women, the disabled, and racial and cultural minorities.
This family of approaches, varying in their reliance on qualitative methods,
hold in common the view that a program is inseparable from subjective percep-
tions and experienc,es of it. For most qualitative practitioners, evaluation is a
process of examination of stakeholders'. subjective perceptions leading to
evaluators' subjective interpretations of program quality. As these interpretations
take shape, the design and progress of a qualitative evaluation emerges, not
preordinate but adaptive, benefiting from and responding to what is learned
about the program along the way.
The palette of qualitative approaches in evaluation reflects the varied origins
of a methodology involving critical trade-offs:
We borrowed methods from anthropology, sociology, and even journalism
and the arts. We were willing to cede some internal validity to gain
authenticity, unit generalization for analytical and naturalistic gener-
alization, objectivity for Verstehen.1 For some of us, this was a fair trade
in spite of accusations that we are numerical idiots or mere storytellers.
(Smith, 1994, p. 40)
Even as "mere storytellers," qualitative evaluators encounter difficult constraints.
Ethnographers insist upon the necessity of sustained engagement at a site of
study yielding thick description (Geertz, 1973) which recounts perspectives and
events as illustrative elements in cultural analysis, but relatively few educational
evaluations luxuriate in resources sufficient for long-term fieldwork. The
ethnographer's gradual development of themes and cultural categories from
layers of redundancy in extensive observational and interview data becomes, for
qualitative evaluators, compressed by the contract period. Consequently, special
care is needed in the selection of occasions for observation and in the selection
of interviewees, complemented by alertness to unanticipated opportunities to
learn. Still, the data collected will almost certainly be too little to satisfy
ethnographers, too much to reduce to unambiguous interpretation, and too
vulnerable for comfort to complaints about validity.
Compared to the quantitative, qualitative findings encounter disproportionate
challenge regarding validity despite common acknowledgment that "there are no
procedures that will regularly (or always) yield either sound data or true
conclusions" (PhiIlips,1987, p. 21). Qualitative researchers have responded with
conflicting arguments that their work satisfies the traditional notion of validity
(see LeCompte & Goetz, 1982) and that the traditional notion of validity is so
irrelevant to qualitative work as to be absurd (see Wolcott, 1990). Guba's (1981)
effort to conceptualize validity meaningfully for qualitative work generated the
venerable alternative term, "trustworthiness" (Lincoln & Guba, 1985, p. 218),
which has been widely accepted.
Data Collection
Three data collection methods are the hallmarks of qualitative work: observa-
tion, interview, and review and analysis of documents and artifacts. These methods
provide the empirical bases for colorful accounts highlighting occurrences and
the experiences and perceptions of participants in the program studied.
Observation is generally unstructured, based on realization that predetermined
protocols, as they direct focus, also introduce blinders and preconceptions into
the data. The intent of structured protocols may be to reduce bias, but bias can
be seen in the prescriptive categories defined for recording observational data,
categories that predict what will be seen and prescribe how it will be documented,
categories that preempt attention to the unanticipated and sometimes more
meaningful observable matters. Structured observation in qualitative work is
reserved for relatively rare program aspects regarded as requiring single-minded
attention. Interviews tend, for similar reasons, to be semi-structured, featuring
flexible use of prepared protocols to maximize both issue-driven and emergent
information gathering (see also Rubin & Rubin, 1995). Review of relevant
documents and artifacts (see Hodder, 1994) provides another and more
unobtrusive means of triangulation by both method and data source (Denzin,
1989) to strengthen data quality and descriptive validity (Maxwell, 1992).
Hybrid, innovative, and other types of methods increasingly augment these
three data collection mainstays. Bibliometrics may offer insight into scholarly
impact, as the number of citations indicates breadth of program impact (e.g.,
House, Marion, Rastelli, Aguilera, & Weston, 1996). Videotapes may document
observation, .serve as stimuli for interviews, and facilitate repeated or collective
analysis. Finding vectors into informants' thinking, read-aloud and talk-aloud
methods may attempt to convert cognition into language while activities are
observed and documented. "Report-and-respond forms" (Stronach, Allan, &
Morris, 1996, p. 497) provide data summaries and preliminary interpretations to
selected stakeholders for review and revision, offering simultaneous opportunity
for further data collection, interpretive validation, and multi-vocal analysis.
Technology opens new data collection opportunities and blurs some long-
standing distinctions: observation of asynchronous discussion, of online and
distance education classes, and interactions in virtual space; documentation of
process through capture of records; interview by electronic mail,
and so forth.
Restraining the impulse toward premature design creates possibilities for
discovery of foci and issues during data collection. Initial questions are refined
in light of incoming information and, reciprocally, refined questions focus new
data collection. The relationship between data collection and analysis is similarly
reciprocal; preliminary interpretations are drawn from data and require
verification and elaboration in further data collection. Articulated by Glaser and
Strauss (1967) as the constant comparative method, the usual goal is grounded
theory, that is, theory arising inductively from and grounded in empirical data.
For qualitative evaluators, the goal is grounded interpretations of program
quality. Evaluation by qualitative methods involves continual shaping and reshaping
through parallel dialogues involving design and data, data and interpretation,
evaluator perspectives and stakeholder perspectives, internal perceptions and
external standards. These dialogues demand efforts to confirm and disconfirm,
to search beyond indicators and facts which may only weakly reflect meaning.
Interpretation, like data collection responding to emergent foci and issues,
tends to multiply meanings, giving qualitative methods their expansionist character.
Are qualitative methods, revised on the fly in response to the unanticipated,
sufficient to satisfy the expectations of science and professionalism? Lacking the
procedural guarantees of quality presumed by quantitative inquirers, Smith
(1994) has claimed that "in assuming no connection between correct methods
and true accounts, extreme constructivists have seemingly abandoned the search
for the warrant for qualitative accounts" (p. 41). But the seeming abandonment
of warrant is actually a redirection of efforts - qualitative practitioners seek
substantive warrants rather than procedural ones. Quality in qualitative work is
more a matter of whether the account is persuasive on theoretical, logical, and
empirical grounds, less a matter of strict adherence to generalized, decon-
textualized procedures.
Validity is enhanced by triangulation of data, deliberate attempts to confirm,
elaborate, and disconfirm information by seeking out a variety of data sources,
applying additional methods, checking for similarities and dissimilarities across
time and circumstance. The data, the preliminary interpretations, and drafts of
reports may be submitted to diverse audiences, selected on the bases of expertise
and sensitivity to confidentiality, to try to ensure "getting it right" (1973,
p. 29). Critical review by evaluation colleagues and external substantive experts
may also be sought, and metaevaluation is advisable (as always) as an additional
strategy to manage subjective bias, which cannot be eliminated whatever one's
approach or method.
Interpretation
The data collected by qualitative methods are typically so diverse and ambiguous
that even dedicated practitioners often feel overwhelmed by the interpretive
task. The difficulty is exacerbated by the absence of clear prescriptive
procedures, making it necessary not only to determine the quality of a program
but also to figure out how to determine the quality of a program. The best advice
available regarding the interpretive process is rather nebulous (Erickson, 1986;
Wolcott, 1994), but some characteristics of qualitative data analysis are
foundational:
1. Qualitative interpretation is inductive. Data are not considered illustrations or
confirmations of theories or models of programs but, rather, building blocks
for conceptualizing and representing them. Theoretical triangulation
(Denzin, 1989) may spur deeper understanding of the program and may
surface rival explanations for consideration, but theories are not the a priori
impetus for study, not focal but instrumental to interpretation. Rival
explanations and different lenses for interpreting the data from a variety of
theoretical vantage points compound the expansionist tendencies of
qualitative data collection and contrast with the data reduction strategies
common to quantitative analysis.
2. Qualitative interpretation is phenomenological. The orientation is emic,2
prioritizing insiders' (i.e., immediate stakeholders') views, values, interests,
and perspectives over those of outsiders (e.g., theorists, accreditors, even
evaluators). The emphasis on stakeholders' perceptions and experiences has
sometimes been disparaged as an overemphasis leading to neglect of dis-
interested external perspectives (Howe, 1992) or of salient numerical indicators
of program quality (Reichardt & Rallis, 1994, p. 10). But determinations of
program impact necessarily demand learning about the diverse experiences of
participants in natural contexts. Because the respondents selected for
observation and interview by evaluation designers influence what can be
learned about the program, care must be taken to ensure that qualitative data
document a broad band of stakeholder views, not just the interests and
perceptions of clients.
3. Qualitative interpretation is holistic. Because the program is viewed as a
complex tapestry of interwoven, interdependent threads, too many and too
embedded to isolate meaningfully from the patterns, little attention is
devoted to identifying and correlating variables. Clarity is not dependent on
distinguishing and measuring variables but deflected and obscured by
decontextualizing and manipulating them. Indicators merely indicate,
capturing such thin slices of programs that they may distort more than reveal.
Not the isolation, correlation, and aggregation of data reduced to numerical
representations but thematic analysis, content analysis, cultural analysis, and
symbolic interactionism typify approaches to qualitative data interpretation.
The effort to understand involves macro- and micro-examination of the data
and identification of emergent patterns and themes, both broad-brush and
fine-grained.
4. Qualitative interpretation is intuitive. Personalistic interpretation is not
merely a matter of hunches, although hunches are teased out and followed up.
It is trying hard to understand complex phenomena from multiple empirical
and theoretical perspectives, searching for meaning in the baffling and outlying
data as well as in the easily comprehended. It can be as difficult to describe
and justify as to employ non-analytic analysis, reasoning without rationalism.
Qualitative findings are warranted by data and reasoned from data, but they
are not the residue of easily articulated procedures or of simple juxtapositions
of performances against preordained standards. Analysis is not an orderly
juggernaut of recording the performances of program components, comparing
performances to standards, weighting, and synthesizing (see especially Scriven,
1994; see also Stake et al., 1997). Rather, the complexity of the program, of
the dataset, and of the interpretive possibilities typically overwhelm
criteriality and linear procedures for movement from complex datasets to
findings. The effort to understand may, of course, include rationalistic and
even quantitative procedures, but more-or-less standardized formalities
generally give way to complex, situated forms of understanding, forms
sometimes unhelpfully termed irrational (see Cohen, 1981, pp. 317-331).
Qualitative interpretation sometimes borrows strategies from the literary and
visual arts, where the capacity of expressiveness to deepen understanding has
long been recognized (see Eisner, 1981). Narrative and metaphoric and artistic
renderings of datasets can open insightful lines of meaning exposition, greatly
enhancing personal comprehension and memorability (Carter, 1993; Eisner,
1981; Saito, 1999). Such interpretation can open rather than finalize discussion,
encouraging deep understanding but perhaps at the expense of definitive findings.
As beauty is in the eye of the beholder, different audiences and even different
evaluators render unique, irreplicable interpretations of program quality (see,
e.g., Brandt, 1981). The diversity of interpretive possibilities, not uncommon in
the experiences of clients and evaluators of all stripes, brings into sharp focus not
only problems of consensus and closure but also of bias, validity, and credibility.
Noting that, in evaluation, "judgments often involve multidimensional criteria
and conflicting interests," House (1994), among others, has advised, "the
evaluator should strive to reduce biases in making such judgments" (p. 15). But
bias can be difficult to recognize, much less reduce, especially in advance,
especially in oneself. Naturally occurring diversity in values, in standards of
quality, in experiential understandings, and in theoretical perspectives offers
many layers of bias. Even methodological choices inject bias, and many such
choices must be made. Bias bleeds in with the social and monetary rewards that
come with happy clients, a greater temptation where methods require
greater social interaction, and with political pressures large and small. Subjective
understanding is both the point of qualitative evaluation and its Achilles' heel.
Reporting
Consistent with attention to stakeholder perceptions and experiences in data
collection and interpretation, qualitative evaluation reporting aims for broad
audience accessibility, for vicarious experience of naturalistic events, and for
representation of stakeholder perspectives of those events. Narratives which
reveal details that matter and which promote personal and allusionary
connections are considered important to the development of understanding by
audiences, more complex understanding than is generally available from other
scientific reporting styles (Carter, 1993). Development of implicit understanding
by readers is more desirable than the development of explicit explanations (see
von Wright, 1971) because personalistic tacit knowledge is held to be more
productive of action than is abstracted propositional knowledge (Polanyi, 1958).
Consequently, qualitative representations of programs feature experiential
vignettes and interview excerpts which convey multiple perspectives through
narratives. Such reporting tends to be engaging for readers and readerly - that is,
borrowing a postmodernist term, consciously facilitative of meaning construction
by readers.
Advantageous as the benefits of experiential engagement and understanding
are, there is a significant disadvantage associated with qualitative reporting:
length. For those evaluation audiences interested only in the historically
enduring question, "What works?" and their brethren whose appetites stretch no
farther than executive summaries, the voluminousness of an experiential report
with a cornucopia of perspectives is problematic. Some clients, funders, and
primary stakeholders are eager for such informativeness, but others are irritated.
Multiple reports targeted for specific groups can help some, although the gains
in utility compete with costs regarding feasibility. Like other trade-offs in
qualitative inquiry, those involving report length and audience desires are not
easily resolved.
ISSUES IN QUALITATIVE EVALUATION
Issues introduced in the foregoing discussion of methods will be clustered for
further attention here under the categories of The Program Evaluation Standards
(Joint Committee, 1994): feasibility, accuracy, propriety, and utility.
Feasibility
Qualitative fieldwork requires the devotion of significant resources to accumu-
lating data about day-to-day events to support development and documentation
of patterns and issues illuminative for understanding program quality. Time for
the collection and interpretation of voluminous datasets, time for the emergence
of issues and findings, time for validation and interpretation, time for creation of
experiential and multi-vocal reports - time is needed at every stage of qualitative
inquiry, time that is often painfully constrained by contractual timelines and
resources.
The methodological expertise needed for each stage of a qualitative
evaluation is not generously distributed within the population, and training and
experience take even more time. Identification, preparation, and coordination of
a cadre of data collectors may strain evaluation resources. Substantive expertise,
also needed, often requires further expansion and resources. One may well
ask whether qualitative work can be done well, whether it can be done in a
timely manner, whether it can be done at all under ordinary evaluation circum-
stances.
Scarce as they may be, logistical resources and methodological expertise are
less inherently troublesome than is accurate representation of the perspectives
of multiple stakeholders. For the most part, broad professional discussion has
not progressed beyond expressions of interest in stakeholder perspectives and, in
some approaches, in the involvement of stakeholders in some or all evaluation
processes. Serious attention has not yet been devoted to the difficulty of fully
realizing and truly representing diverse stakeholders, especially since the
interests of managers, who typically commission evaluations, may contrast with
those of program personnel and beneficiaries. Nor does the evaluation literature
brim with discussion of the potential for multiple perspectives to obstruct
consensus in decision-making. Documentation of stakeholder perspectives in
order to develop understanding of the multiple realities of program quality is
significantly obstructed by the complexity and diversity of those perspectives and
by contractual and political circumstances.
Accuracy
Awareness of the complexity of even small programs, their situationality, and
their fluidity has led qualitative evaluators to doubt quantitative representations
of programs as "numbers that misrepresent social reality" (Reichardt & Rallis,
1994, p. 7). Organizational charts, logic models, budgets - if these were enough
to represent programs accurately, qualitative methods would be a superfluous
luxury, but these are not enough. Enrollment and graduation figures may say
more about the reputation or cost or catchment area of a teacher preparation
program than about its quality. Growth patterns may be silent regarding
personnel tensions and institutional stability. Balanced budgets may be
uninformative about the appropriateness of expenditures and allocations. Such
data may even deflect attention counterproductively for understanding program
quality. But the addition of qualitative methods does not guarantee a remedy for
the insufficiency of quantitative data.
In evaluation, by definition a judgment-intense enterprise,3 concerns persist
about the potential mischief of subjective judgment in practice. It is not
subjectivity per se but its associated bias and incursions into accuracy that
trouble. In distinguishing qualitative from quantitative evaluation, Datta (1994)
claims that "the differences are less sharp in practice than in theoretical
statements" (p. 67). But it is the subjectivity of the practicing qualitative
evaluator, not that of the quantitative evaluator or of the evaluation theorist,
which has particularly raised questions regarding accuracy. Qualitative
methodologists are familiar with the notion of researcher-as-instrument, familiar
with the vulnerability to challenge of interpretive findings, familiar with the
necessity of managing subjectivity through such means as triangulation,
validation, and internal and external review, but the familiar arguments and
strategies offer limited security.
Qualitative evaluation datasets - any datasets - are biased. There is bias in
decisions about which events to observe, what to notice and document, how to
interpret what is seen. There is bias in every interviewee's perspective. Every
document encapsulates a biased viewpoint. Because of the prominence of
subjective data sources and subjective data collectors and especially because of
reliance on subjective interpretation, consciousness of the potential for bias in
qualitative work is particularly strong. The skepticism associated with subjectivity
works against credibility, even when triangulation, validation, and peer review
are thoroughly exercised .
Even more challenging is the task of accurate representation of various
stakeholders. Postmodernists have raised issues that many qualitative evaluators
take to heart: whether outsiders' portrayals of insiders' perspectives necessarily
misrepresent and objectify humans and human experiences, whether authors of
reports have legitimate authority to construct through text the realities of others,
whether the power associated with authorship contributes to the intractable
social inequities of the status quo (Brodkey, 1989; Derrida, 1976; Foucault, 1979;
Lyotard, 1984). These problems may leave evaluation authors writing at an ironic
distance from their own reports as they attempt, even as they write, to facilitate
readers' deconstructions (Mabry, 1997), producing open texts which demand
participation in meaning construction from uncomfortable readers (see Abma,
1997; McLean, 1997). Presentation of unresolved complexity and preservation of
ambiguity in reports bewilders and annoys some readers, especially clients and
others desirous of clear external judgments and specific recommendations.
Tightly coupled with representation is misrepresentation (Mabry, 1999b,
1999c); with deep understanding, misunderstanding; with usefulness, misuse.
The very vividness of experiential accounts can carry unintended narrative fraud.
Even for those who wish to represent it and represent it fully, truth is a mirage.
When knowledge is individually constructed, truth is a matter of perspective.
Determining and presenting what is true about a program, when truth is
idiosyncratic, is a formidable obligation.
If truth is subjective, must reality be? Since reality is apprehended subjectively
and in no other way by human beings and since subjectivity cannot be distilled
from the apprehension of reality, it follows that reality cannot be known with
certainty. The history of scientific revolution demonstrates the fragility of facts,
just as ordinary experience demonstrates the frequent triumph of
misconceptions and preconceptions. No one's reality, no one's truth quite holds
for others, although more confidence is invested in some versions than in others.
Evaluators hope to be awarded confidence, but is it not reasonable that
evaluation should be considered less than entirely credible, given the op-art
elusiveness of truth? Can there be a truth, a bottom line, about programs in the
postmodern era? If there were, what would it reveal, and what would it obscure?
How accurate and credible must - can - an evaluation be?
Accuracy and credibility are not inseparable. An evaluation may support valid
inferences of program quality and valid actions within and about programs but
be dismissed by non-believers or opponents, while an evaluation saturated with
positive bias or simplistic superficialities may be taken as credible by happy
clients and funding agencies. Suspicion about the accuracy of an evaluation, well-
founded or not, undermines its credibility. In an era of suspicion about
representation, truth. and even reality, suspicion about accuracy is inevitable.
The qualitative commitment to multiple realities testifies against the simpler
truths of positivist science, against its comforting correspondence theory of
truth,4 and against single truths - even evaluators' truths. Alas, accuracy and
credibility are uneven within and across evaluation studies partly because truth
is more struggle than achievement.
Propriety
In addition to the difficulties noted regarding feasibility and accuracy, qualitative
evaluation, as all evaluation, is susceptible to such propriety issues as conflicts of
interest and political manipulation. Dependent as it is on persons, qualitative
fieldwork is particularly vulnerable to micropolitics, to sympathy and persuasion
at a personal and sometimes unconscious level. The close proximity between
qualitative evaluators and respondents raises special issues related to bias, ethics,
and advocacy. Given the paucity of evaluation training in ethics (Newman &
Brown, 1996) and the myriad unique circumstances which spawn unexpected
ethical problems (Mabry, 1999a), proper handling of these issues cannot be
assured.
Misuse of evaluation results by stakeholders may or may not be harmful, may
or may not be innocent, may or may not be programmatically, personally, or
politically expedient. Misuse is not limited to evaluation results - evaluations
may be commissioned to stall action, to frighten actors, to reassure, to
stimulate change, to build or demolish support. Failure to perceive stakeholder
intent to misuse and failure to prevent misuse, sometimes unavoidable, may
nevertheless raise questions regarding propriety.
Not only stakeholders but evaluators, too, may misuse evaluation.
Promotionalism of certain principles or certain stakeholders adds to the political
swirl, subtracts from credibility, and complicates propriety. Whether reports
should be advocative and whether they can avoid advocacy is an issue which has
exercised the evaluation community in recent years (Greene & Schwandt, 1995;
House & Howe, 1998; Scriven, Greene, Stake, & Mabry, 1995). The
inescapability of the evaluator's personal values, as fundamental undergirding
for reports, has been noted (Mabry, 1997), a recognition carrying over from
qualitative research (see especially Lincoln & Guba, 1985), but resisted by
objectivist evaluators focused on bias management through design elements and
criterial analysis (see especially Scriven, 1994). More explosive is the
question of whether evaluators should (or should ever) take explicit, proactive
advocative positions in support of endangered groups or principles as part of
their professional obligations (see Greene, 1995, 1997; House & Howe, 1998;
Mabry, 1997; Scriven, 1997; Stake, 1997; Stufflebeam, 1997). Advocacy by
evaluators is seen as an appropriate assumption of responsibility by some and as
a misunderstanding of responsibility by others.
Beneath the arguments for and against advocacy can be seen personal
allegiances regarding the evaluator's primary responsibility. Anti-advocacy
proponents prioritize evaluation information delivery, professionalism, and
credibility. Pro-advocacy proponents prioritize the program, some aspect of it,
its field of endeavor, such as education (Mabry, 1997), or, more broadly,
principles that underlie social endeavor, such as social justice (House, 1993),
deliberative democracy (House & Howe, 1999), or the elevation of specific or
historically underrepresented groups (Fetterman, 1996; Greene, 1997; Mertens,
1999). The focus is more directly on human and societal interests than on
information and science. At issue is whether evaluation should be proactive or
merely instrumental in advancing human, social, and educational agendas - the
nature and scope of evaluation as change agent.
Methodological approaches that pander to simplistic conceptions of reality
and of science raise a different array of propriety issues. Rossi (1994) has
observed that "the quants get the big evaluation contracts" (p. 25), that the
lopsided competition among evaluation professionals regarding approach and
scale "masks a struggle over market share" (p. 35),. and that "the dominant
discipline in most of the big firms is economics" (p. 29). This is problematic in
the evaluation of massive educational programs sponsored by organizations such
as the World Bank (Jones, 1992; Psacharopoulos & Woodhall, 1991), for
example, because education is not properly considered merely a matter of
economics. Educational evaluators should beware designs which imply simple or
simply economic realities and should beware demands to conduct evaluations
according to such designs. Wariness of this kind requires considerable alertness
to the implications of methodology and client demands and considerable ethical
fortitude.
Utility
As an applied social science, evaluation's raison d'etre is provision of grounding
for sound decisions within and about programs. Both quantitative and qualitative
evaluations have influenced public policy decisions (Datta, 1994, p. 56), although
non-use of evaluation results has been a common complaint among work-weary
evaluators, some of whom have developed strategies (Chelimsky, 1994) and
approaches (Patton, 1997) specifically intended to enhance utilization.
Qualitative evaluation raises troublesome questions for utility - questions which
again highlight the interrelatedness of feasibility, accuracy, propriety, and utility:
Are reports too long to be read, much less used? Is it possible to ensure accurate,
useful representation of diverse interests? Can reports be prepared in time to
support program decisions and actions? Are they credible enough for confident,
responsible use? At least for small-scale educational evaluations, qualitative
work has been described as more useful than quantitative to program operators
(Rossi, 1994)5 but, unsurprisingly, some quantitative practitioners hold that the
"utility is extremely limited for my setting and the credibility of its findings is too
vulnerable" (Hedrick, 1994, p. 50, referring to Guba & Lincoln, 1989).
The invitation to personal understanding that characterizes many qualitative
reports necessarily opens opportunity for interpretations different from the
evaluator's. Respect for individual experience and knowledge construction
motivates qualitative report-writers and presumes the likelihood of more-or-less
contrary interpretations. The breadth and magnitude of dissent can vary greatly
and can work not only against credibility but also against consensual
programmatic decision-making.
The qualitative characteristic of openness to interpretation highlights the
questions: Use by whom? And for what? If it is not (or not entirely) the
evaluator's interpretations that direct use, whose should it be? The too-facile
response that stakeholders' values, criteria, or interpretations should drive
decisions underestimates the gridlock of natural disagreement among competing
stakeholder groups. Prioritization of the interests of managerial decision-
makers, even in the interest of enhancing utility, reinforces anti-democratic
limitations on broad participation. Attention to the values, interests, and
perspectives of multiple stakeholders can clarify divisions and entrench
dissensus. Consideration of issues related to qualitative evaluation, such as issues
of epistemology and authority, makes it all too clear that utility and propriety, for
example, are simultaneously connected and conflicted.
REALITY, REALISM, AND BEING REALISTIC
Let's be realistic. The reality of educational programs is too complex to be
represented as dichotomously black and white. Qualitative approaches are
necessary to portray evaluands with the shades of meaning which actually
characterize the multi-hued realities of programs. But while the complex nature
of educational programs suggests the necessity of qualitative approaches to
evaluation, the dizzying variety of stakeholder perspectives as to a program's real
failures and accomplishments, the ambiguous and conflicting interpretations
which can be painted from qualitative data, and the resource limitations
common to evaluations of educational programs may render qualitative
fieldwork unrealistic.
Is educational evaluation a science, a craft, an art? Realism in art refers to
photograph-like representation in which subjects are easily recognized by
outward appearance, neglecting perhaps their deeper natures. Hyperrealism
refers to portrayals characterized by such meticulous concentration on minute
physical details - hair follicles and the seams in clothing - as to demand attention
to technique, sometimes deflecting it from message. Surrealism, on the other
hand, refers to depiction of deep subconscious reality through the fantastic and
incongruous, but this may bewilder more than enlighten. Artists from each
movement offer conflicting views of what is real - views which inform, baffle,
repel, and enthrall audiences. In evaluation, different approaches provide
different kinds of program representations (see Brandt, 1981), with a similar
array of responses from clients and other stakeholders. Whether the program is
recognizable as portrayed in evaluation reports is necessarily dependent upon
the acuity of audiences as well as the skill of evaluators. Such is our daunting
professional reality.
According to some philosophers of art, artworks are not the physical pieces
themselves but the conceptual co-creations of artists and beholders. According
to some theories of reading and literary criticism, text is co-created by authors
and readers. As analogies regarding meaning and authorship, these notions
resonate with the actual experiences of evaluators. Our reports document data
and interpretations of program quality, sometimes participatory interpretations,
but they are not the end of the brushstroke. The utility standard implies the
practical priority of interpretations of program quality by stakeholders, those who
ultimately make, influence, and implement program decisions.
In the hands of accomplished practitioners, educational evaluation may seem
an art form, but most clients expect not art but science - social science, applied
science. Programs have real consequences for real people, however multiple and
indeterminate the reality of programs may be. Such realization suggests need for
complex qualitative strategies in evaluation, with all the living color associated
with real people and all the local color associated with real contexts, and with all
the struggles and irresolutions they entail.
ENDNOTES
1 Dilthey (1883) prescribed hermeneutical or interpretive research to discover the meanings and
perspectives of people studied, a matter he referred to as Verstehen.
2 Anthropologists have distinguished etic accounts, which prioritize the meanings and
explanations of outside observers, from emic accounts, which prioritize indigenous meanings and
understandings (see Seymour-Smith, 1986, p. 92).
3 Worthen, Sanders, and Fitzpatrick note that, "among professional evaluators, there is no uniformly
agreed-upon definition of precisely what the term evaluation means. It has been used by various
evaluation theorists to refer to a great many disparate phenomena" (1997, p. 5, emphasis in the
original). However, the judgmental aspect, whether the judgment is that of the evaluator or of
someone else, is consistent across evaluators' definitions of evaluation: (1) Worthen, Sanders, &
Fitzpatrick: "Put most simply, evaluation is determining the worth or merit of an evaluation object"
(1997, p. 5). (2) Michael Scriven: "The key sense of the term 'evaluation' refers to the process of
determining the merit, worth, or value of something, or the product of that process" (1991, p. 139,
emphasis in the original). (3) Ernest House: "Evaluation is the determination of the merit or worth
of something, according to a set of criteria, with those criteria (often but not always) explicated and
justified" (1994, p. 14, emphasis added).
4 The positivist correspondence theory of truth holds that a representation is true if it corresponds
exactly to reality and is verifiable by observation.
5 Note, however, that the very helpfulness of these evaluations has led to claims that they are not
evaluations at all but rather "management consultations" (Rossi, 1994, p. 33; Scriven, 1998).
REFERENCES
Abma, T. (1997). Sharing power, facing ambiguity. In L. Mabry (Ed.), Advances in program
evaluation: Vol. 3. Evaluation and the post-modern dilemma (pp. 105-119). Greenwich, CT: JAI
Press.
Brandt, R.S. (Ed.). (1981). Applied strategies for curriculum evaluation. Alexandria, VA: ASCD.
Brodkey, L. (1989). On the subjects of class and gender in "The literacy letters." College English, 51,
125-141.
Campbell, D.T., & Stanley, J.C. (1963). Experimental and quasi-experimental designs for research.
Boston: Houghton Mifflin.
Carter, K. (1993). The place of story in the study of teaching and teacher education. Educational
Researcher, 22(1), 5-12, 18.
Chelimsky, E. (1994). Evaluation: Where we are. Evaluation Practice, 15(3), 339-345.
Cohen, L.J. (1981). Can human irrationality be experimentally demonstrated? Behavioral and Brain
Sciences, 4, 317-331.
Datta, L. (1994). Paradigm wars: A basis for peaceful coexistence and beyond. In C.S. Reichardt &
S.F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives. New Directions for Program
Evaluation, 61, 53-70.
Datta, L. (1997). Multimethod evaluations: Using case studies together with other methods. In E.
Chelimsky & W.R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 344-359).
Thousand Oaks, CA: Sage.
Denzin, N.K. (1989). The research act: A theoretical introduction to sociological methods (3rd ed.).
Englewood Cliffs, NJ: Prentice Hall.
Denzin, N.K. (1997). Interpretive ethnography: Ethnographic practices for the 21st century. Thousand
Oaks, CA: Sage.
Denzin, N.K. & Lincoln, Y.S. (2000). Handbook of qualitative reseal'ch (2nd cd.). Thousand Oaks,
CA: Sage.
Derrida, J. (1976). Of grammatology (trans. G. Spivak). Baltimore, MD: Johns Hopkins University
Press.
Dilthey, W. (1883). The development of hermeneutics. In H.P. Rickman (Ed.), W. Dilthey: Selected
writings. Cambridge: Cambridge University Press.
Eisner, E.W. (1981). On the differences between scientific and artistic approaches to qualitative
research. Educational Researcher, 10(4), 5-9.
Eisner, E.W. (1985). The art of educational evaluation: A personal view. London: Falmer.
Eisner, E.W. (1991). The enlightened eye: Qualitative inquiry and the enhancement of educational
practice. New York: Macmillan.
Erickson, F. (1986). Qualitative methods in research on teaching. In M.C. Wittrock (Ed.), Handbook
of research on teaching (3rd ed.) (pp. 119-161). New York: Macmillan.
Fetterman, D.M. (1996). Empowerment evaluation: Knowledgt and tools for selfnssessment and
accountabUity. Thousand Oaks, CA: Sage.
Foucault, M. (1979). What is an author? Screen, Spring.
GeertI, C. (1973). The interpretation of cultures: Selected essays. New York: Basic Books.
Glaser, B.G. & Strauss, A.I. (1967). The discovery of grounded theory. Chicago, IL: Aldine.
Greene, J.C. (1994). Qualitative program evaluation: Practice and promise. In N.K. Denzin & Y.S.
Lincoln (Eds.), Handbook of qualitative research (pp. 530-544). Newbury Park, CA: Sage.
Greene, J.C. (1997). Participatory evaluation. In 1.. Mabry (Ed.), Advances in program evaluation:
Evaluation and the post-modem dilemma (pp. 171-189). Greenwich, CT: JAI Press.
Greene, J.C., Caracelli, v., & Oraham, w.F. (1989). Toward a conceptual framework for multimethod
evaluation designs. Educational Evaluation and Policy Analysis, 11,255-274.
Greene, J.O. & Schwandt, T.A. (1995). Beyond qualuativt evaluation: The significance o/"positioning"
oneself. Paper presentation to the International Evaluation Conference, Vancouver, Canada.
Ouba. E.G. (1978). Toward a methodology of naturalistic inquiry in educational evaluation.
Monograph 8. Los Angeles: UCLA Center for the Study or Evaluation.
Guba, E.G. (198J). Criteria for assessing the trustworthiness of naturalistic inquiries. Educational
Communication and Technology Journal, 29, 75-92.
Guba, E.G. & Lincoln, Y.S. (1989). Fourth genef'lJtion evaluation. Thousand Oaks, CA: Sage.
Hedrick, 'IE. (1994). The quantitalive-qualitative debate: Possibilities for integration. In C.s.
Reichardt & S.F. Rallis (Eds.). The qualiladve-quantuative debate: New perspectives. New Directions
for Program Evaluation, 61,145-152.
Hodder, 1. (1994). The interpretation of documents and material culture. In Denzin, N.K.. &
Lincoln, Y.S. (Eds.), Handbook of qualitative research (pp. 403-412). Thousand Oaks, CA: Sage.
House, E.R. (1993). Professional tvaluation: Social impact and poluical consequences. Newbury Park,
CA: Sage.
House. E.R. (1994). Integrating the quantitative and qualitative. In C. S. Reichardt, & S. F. Rallis
(Eds.), The qualitativequantitative defHIte: New perspectives. New Directions for Program Evaluation,
61, 113-122-
House, E.R., & Howe, K.R. (1998). The issue of advocacy in evaluations. Anwican Journal of
Evaluation, 19(2}, 233-236.
House. E.R . & Howe, K.R. (1999). Values UI evaluation and social resealrh. Thousand Oaks, CA:
Sage.
House, E.R . Marion. S.F.. Rastelli, 1.., Aguilera, D., & Weston, T. (1996). Evaluating R&D impact.
University of Colorado at Boulder: Unpublished report.
Howe, K. (1992). Getting over the quantitative-qualitative debate. American Journal of Education,
100(2). 236-256. .
Joint Committee on Standards (or Educational Evaluation (1994). The program tvaluation stalUulffu:
How to assess evaluations of educational programs (2nd cd.). Thousand Oaks, CA: Sage.
Jones. P. (1992). World Bank fUUlllcing of education: Lending, learning and development. London:
Routledge.
leCompte, M.D. & Ooetz, J.P. (1982). Problems of reliability and validity in ethnographic research.
&view of Educational Research. 52,31-60.
LeCompte, M.D., & Preissle, J. (1993). Ethnography and qualitative design in educational research (2nd ed.). San Diego: Academic Press.
Lincoln, Y.S., & Guba, E.G. (1985). Naturalistic inquiry. Newbury Park, CA: Sage.
Lyotard, J.-F. (1984). The postmodern condition: A report on knowledge. Minneapolis: University of Minnesota Press.
Mabry, L. (Ed.). (1997). Advances in program evaluation: Vol. 3. Evaluation and the post-modern dilemma. Greenwich, CT: JAI Press.
Mabry, L. (1998a). Case study methods. In H.J. Walberg & A.J. Reynolds (Eds.), Advances in educational productivity: Vol. 7. Evaluation research for educational productivity (pp. 155-170). Greenwich, CT: JAI Press.
Mabry, L. (1998b). A forward LEAP: A study of the involvement of Beacon Street Art Gallery and Theatre in the Lake View Education and Arts Partnership. In D. Boughton & K.G. Congdon (Eds.), Advances in program evaluation: Vol. 4. Evaluating art education programs in community centers: International perspectives on problems of conception and practice. Greenwich, CT: JAI Press.
Mabry, L. (1999a). Circumstantial ethics. American Journal of Evaluation, 20(2), 199-212.
Mabry, L. (1999b, April). On representation. Paper presented at an invited symposium at the annual meeting of the American Educational Research Association, Montreal.
Mabry, L. (1999c, November). Truth and narrative representation. Paper presented at the annual meeting of the American Evaluation Association, Orlando, FL.
Maxwell, J.A. (1992). Understanding and validity in qualitative research. Harvard Educational Review, 62(3), 279-300.
McLean, L.D. (1997). It in search of truth an evaluator. In L. Mabry (Ed.), Advances in program evaluation: Vol. 3. Evaluation and the post-modern dilemma (pp. 139-153). Greenwich, CT: JAI Press.
Mertens, D.M. (1999). Inclusive evaluation: Implications of transformative theory for evaluation. American Journal of Evaluation, 20, 1-14.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.
Newman, D.L., & Brown, R.D. (1996). Applied ethics for program evaluation. Thousand Oaks, CA: Sage.
Patton, M.Q. (1997). Utilization-focused evaluation (3rd ed.). Thousand Oaks, CA: Sage.
Phillips, D.C. (1987). Validity in qualitative research: Why the worry about warrant will not wane. Education and Urban Society, 20, 9-24.
Polanyi, M. (1958). Personal knowledge: Towards a post-critical philosophy. Chicago, IL: University of Chicago Press.
Psacharopoulos, G., & Woodhall, M. (1991). Education for development: An analysis of investment choices. New York: Oxford University Press.
Reichardt, C.S., & Rallis, S.F. (Eds.). (1994). The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61.
Rossi, P.H. (1994). The war between the quals and the quants: Is a lasting peace possible? In C.S. Reichardt & S.F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61, 23-36.
Rubin, H.J., & Rubin, I.S. (1995). Qualitative interviewing: The art of hearing data. Thousand Oaks, CA: Sage.
Saito, R. (1999). A phenomenological-existential approach to instructional social computer simulation. Unpublished doctoral dissertation, Indiana University, Bloomington, IN.
Scriven, M. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage.
Scriven, M. (1994). The final synthesis. Evaluation Practice, 15(3), 367-382.
Scriven, M. (1997). Truth and objectivity in evaluation. In E. Chelimsky & W.R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 477-500). Thousand Oaks, CA: Sage.
Scriven, M. (1998, November). An evaluation dilemma: Change agent vs. analyst. Paper presented at the annual meeting of the American Evaluation Association, Chicago.
Scriven, M., Greene, J., Stake, R., & Mabry, L. (1995, November). Advocacy for our clients: The necessary evil in evaluation? Panel presentation to the International Evaluation Conference, Vancouver, BC.
Seymour-Smith, C. (1986). Dictionary of anthropology. Boston: G.K. Hall.
Shadish, W.R., Jr., Cook, T.D., & Leviton, L.C. (1991). Foundations of program evaluation: Theories of practice. Newbury Park, CA: Sage.
Smith, M.L. (1994). Qualitative plus/versus quantitative: The last word. In C.S. Reichardt & S.F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61, 37-44.
Stake, R.E. (1973). Program evaluation, particularly responsive evaluation. Paper presented at the conference on New Trends in Evaluation, Göteborg, Sweden. Reprinted in G.F. Madaus, M.S. Scriven, & D.L. Stufflebeam (Eds.) (1987), Evaluation models: Viewpoints on educational and human services evaluation (pp. 287-310). Boston: Kluwer-Nijhoff.
Stake, R.E. (1978). The case study method in social inquiry. Educational Researcher, 7(2), 5-8.
Stake, R.E. (1997). Advocacy in evaluation: A necessary evil? In E. Chelimsky & W.R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 470-476). Thousand Oaks, CA: Sage.
Stake, R.E. (2000). Case studies. In N.K. Denzin & Y.S. Lincoln (Eds.), Handbook of qualitative research (2nd ed.) (pp. 236-247). Thousand Oaks, CA: Sage.
Stake, R., Migotsky, C., Davis, R., Cisneros, E., DePaul, G., Dunbar, C., Jr., et al. (1997). The evolving synthesis of program value. Evaluation Practice, 18(2), 89-103.
Stronach, I., Allan, J., & Morris, B. (1996). Can the mothers of invention make virtue out of necessity? An optimistic deconstruction of research compromises in contract research and evaluation. British Educational Research Journal, 22(4), 493-509.
Stufflebeam, D.L. (1997). A standards-based perspective on evaluation. In L. Mabry (Ed.), Advances in program evaluation: Vol. 3. Evaluation and the post-modern dilemma (pp. 61-88). Greenwich, CT: JAI Press.
von Wright, G.H. (1971). Explanation and understanding. London: Routledge & Kegan Paul.
Wolcott, H.F. (1990). On seeking - and rejecting - validity in qualitative research. In E.W. Eisner & A. Peshkin (Eds.), Qualitative inquiry in education: The continuing debate (pp. 121-152). New York: Teachers College Press.
Wolcott, H.F. (1994). Transforming qualitative data: Description, analysis, and interpretation. Thousand Oaks, CA: Sage.
Wolcott, H.F. (1995). The art of fieldwork. Walnut Creek, CA: AltaMira.
Worthen, B.R., Sanders, J.R., & Fitzpatrick, J.L. (1997). Program evaluation: Alternative approaches and practical guidelines (2nd ed.). New York: Longman.