In Living Color: Qualitative Methods in Educational Evaluation

LINDA MABRY
Washington State University Vancouver, Vancouver, Washington

I know a Puerto Rican girl who became the first in her family to finish high school and who then got a nursing degree. She started at the gallery by participating in an art program, then worked at the front desk. I know an Hispanic girl, a participant in one of our drama programs and later a recipient of our housing resource program, who was the first in her family to earn a college degree. She now works in the courts with our battered women program .... Yesterday, I had lunch at a sandwich shop and met a young woman ... who got a full scholarship to the University of Illinois and was the first in her family to speak English or go to college. She said, "That art program was the best thing that ever happened to me." (P. Murphy, personal communication, December 2, 1994, quoted in Mabry, 1998b, p. 154)

This is interview data collected in the course of studying an educational program in four Chicago schools partnered with neighborhood arts agencies. It is empirical data, but it is unlike test scores, costs per student, or graduation rates. It is vividly experiential, differently compelling. It is qualitative data. It is evaluation data, but is it good evaluation data? It is a testimonial from a program developer, an articulate and committed proponent with a clear bias. Confirmation of the events described by the respondent would have helped establish the credibility of this information, but verification of these specifics was somewhat peripheral to the evaluation and beyond its resources, limited as is the case for many such inquiries. Evidence that the events described by the interviewee were typical rather than isolated would have bolstered her implicit claim of program worthiness and effectiveness and, overall, the dataset did indicate program merit. But a day visitor during the program's first year reported otherwise, a reminder that such data were open to contrary interpretation. Are qualitative data stable, credible, useful? Do they help answer evaluation questions or complicate them? Are qualitative methods worth the evaluation resources they consume? How well do they improve understanding of the quality of educational programs; how much do they clutter and distract?

Test scores no longer reign as the sole arbiters of educational program quality even in the U.S., but testing's unabated capacity to magnetize attention reveals a common yearning for simple, definitive, reliable judgments of the quality of schooling. Many want evaluation to provide lucid representation of issues and unambiguous findings regarding quality - evaluation with bottom lines, evaluation in black-and-white. Qualitative data, lively and colorful, represent programs from multiple vantage points simultaneously, as cubist portraits do.
Applied to the study of education, where causal attribution for long-term results is notoriously difficult, where contexts and variables are labyrinthine, where contrasting ideologies and practices uneasily coexist, where constructivist learning theories celebrate idiosyncratic conceptions of knowledge, the expansionist motive of qualitative methods serves better to reveal complexity than to resolve it. In educational evaluation, qualitative methods produce detailed, experiential accounts which promote individual understandings more readily than collective agreement and consensual programmatic action.

Naturalistic research designs in which "a single group is studied only once" by means of "tedious collection of specific detail, careful observation, testing, and the like" were disapproved by Campbell and Stanley (1963, pp. 6-7), who voiced early objections from social science traditionalists that "such studies have such a total absence of control as to be of almost no scientific value" (pp. 6-7). Yet, initial articulation and justification of qualitative methods in educational evaluation by Guba (1978) and Stake (1978) arose from no less troubling doubts about the appropriateness and effectiveness of quantitative methods, which might "foster misunderstandings" or "lead one to see phenomena more simplistically than one should" (Stake, 1978, pp. 6-7). Ricocheting doubts and defenses culminated in public debate among three American Evaluation Association presidents and others in the early 1990s (see Reichardt & Rallis, 1994). Since then, the so-called paradigm wars have repeatedly been declared over or moot (see Howe, 1992), but issues regarding the credibility and feasibility of qualitative methods in evaluation continue to vex not only the so-called "quants," evaluators more confident of quantitative than qualitative methods, but also the "quals" (Rossi, 1994, p. 23).

Qualitative methods, ethnographic in character and interpretivist in tradition, have earned a place in evaluation's methodological repertoire. A number of evaluation approaches have been developed that rely heavily on qualitative methods (see, e.g., Eisner, 1985, 1991; Fetterman, 1996; Greene, 1997; Guba & Lincoln, 1989; Patton, 1997; Stake, 1973). Mixed-method designs in evaluation are attractive and popular (see, e.g., Datta, 1994, 1997; Greene, Caracelli, & Graham, 1989). It has even been claimed that qualitative methods in educational evaluation have overshadowed quantitative (see Rossi, 1994), although questions linger and recur.

Qualitative epistemology and strategies for data collection, interpretation, and reporting will be sketched here. Then, under the major categories of The Program Evaluation Standards (Joint Committee, 1994), issues regarding qualitative methods in the evaluation of educational programs will be presented as fundamentally irresolvable.
QUALITATIVE METHODS IN RESEARCH AND EVALUATION

Qualitative methods have been described well in the literature of educational research by Denzin (1989, 1997); Denzin and Lincoln and their colleagues (2000); Eisner (1991); Erickson (1986); LeCompte and Preissle (1993); Lincoln and Guba (1985); Stake (1978, 2000); and Wolcott (1994, 1995). In evaluation, too, qualitative methods have been described (e.g., Greene, 1994; Guba & Lincoln, 1989; Mabry, 1998a) and accepted as legitimate (e.g., House, 1994; Shadish, Cook, & Leviton, 1991; Worthen, Sanders, & Fitzpatrick, 1997). Despite warnings of methodological incommensurability (see especially Lincoln & Guba, 1985) and epistemological incommensurability (Howe, 1992), qualitative and quantitative strategies have been justified for mixed-method designs (see Greene, Caracelli, & Graham, 1989), and quantitative strategies are not unknown as supportive elements in predominantly qualitative evaluation designs (see Miles & Huberman, 1994). Still, the paradigmatic differences are stark, even if viewed congenially as points on a series of shared continua.

Qualitative add-ons to quantitative designs are sometimes disregarded in overall derivations of program quality. Qualitative data are sometimes relegated to secondary status in other ways as well: considered merely exploratory preparation for subsequent quantitative efforts; mined for quantifiable indications of frequency, distribution, and magnitude of specified program aspects; neglected as inaggregable or impenetrably diverse and ambiguous, helpful only in providing a colorful vignette or quotation or two. Such dismissiveness demonstrates that problematic aspects remain regarding how qualitative data are construed, collected, interpreted, and reported.

Epistemological Orientation

Sometimes in painful contrast to clients' expectations regarding scientific and professional inquiry, qualitative methodologists do not seek to discover, measure, and judge programs as objects. They - we - do not believe objectivity is possible. To recognize the program or its aspects, even to notice them, is a subjective act filtered through prior experience and personal perspective and values. We do not believe a program is an entity which exists outside human perception, awaiting our yardsticks. Rather, we think it a complex creation, not static but continuously co-created by human perceptions and actions. The meaning and quality of the program do not inhere in its mission statement and by-laws, policies, personnel schematics, processes, outcomes, or relationships to standards. The program does not exist - or does not meaningfully exist - outside the experiences of stakeholders, the meanings they attach to those experiences, and the behaviors that flow from those meanings and keep the program in a perpetual state of revision. Reciprocally, program change and evolution affect experiences and perceptions. Thus, the program is pliant, not fixed, and more subjective than objective in nature. Coming to understand a program is less like snapping a photograph of it and more like trying to paint an impression of it in changing natural light.
Coming to understand a program's quality requires sustained attention from an array of vantage points, analysis more complex and contextual than can be anticipated by prescriptive procedures. Representation of a program requires portrayal of subtle nuances and multiple lines of perspective. Qualitative evaluators have responded to these challenges with stakeholder-oriented approaches which prioritize variety in viewpoints. These approaches also offer a variety of conceptualizations of the evaluator's role and responsibility. Guba and Lincoln (1989) take as their charge the representation of a spectrum of stakeholder perceptions and experiences in natural settings and the construction of an evaluative judgment of program quality by the evaluator, an irreplicable construction because of the uniqueness of the evaluator's individual rendering but trustworthy because of sensitive, systematic methods verified by member-checking, audit trails, and the like. Also naturalistic and generally but not exclusively qualitative, Stake's (1973) approach requires the evaluator to respond to emergent understandings of program issues and reserves judgment to stakeholders, a stance which critics feel sidesteps the primary responsibility of rendering an evaluation conclusion (e.g., Scriven, 1998). Eisner (1991) relies on the enlightened eye of the expert to recognize program quality in its critical and subtle emanations, not easily discerned from the surface even by engaged stakeholders. These three variations on the basic qualitative theme define respectively naturalistic, responsive, and connoisseurship approaches.

Some evaluation approaches, generally but not exclusively qualitative, orient not only to stakeholder perspectives but also to stakeholder interests. Patton's approach prioritizes the utilization of evaluation results by "primary intended users" (1997, p. 21) as the prime goal and merit of an evaluation. More internally political, Greene (1997) presses for local participation in evaluation processes to transform working relationships among stakeholder groups, especially relationships with program managers. Fetterman (1996) takes the empowerment of stakeholders as an explicit and primary goal of evaluation. More specifically ideological, House (1993) urges evaluation as instrumental to social justice, House and Howe (1999) as instrumental to deliberative democracy, and Mertens (1999) as instrumental to the inclusion of the historically neglected, including women, the disabled, and racial and cultural minorities.

This family of approaches, varying in their reliance on qualitative methods, holds in common the view that a program is inseparable from subjective perceptions and experiences of it. For most qualitative practitioners, evaluation is a process of examination of stakeholders' subjective perceptions leading to evaluators' subjective interpretations of program quality. As these interpretations take shape, the design and progress of a qualitative evaluation emerges, not preordinate but adaptive, benefiting from and responding to what is learned about the program along the way. The palette of qualitative approaches in evaluation reflects the varied origins of a methodology involving critical trade-offs:
We borrowed methods from anthropology, sociology, and even journalism and the arts. We were willing to cede some internal validity to gain authenticity, unit generalization for analytical and naturalistic generalization, objectivity for Verstehen.1 For some of us, this was a fair trade in spite of accusations that we are numerical idiots or mere storytellers. (Smith, 1994, p. 40)

Even as "mere storytellers," qualitative evaluators encounter difficult constraints. Ethnographers insist upon the necessity of sustained engagement at a site of study yielding thick description (Geertz, 1973) which recounts perspectives and events as illustrative elements in cultural analysis, but relatively few educational evaluations luxuriate in resources sufficient for long-term fieldwork. The ethnographer's gradual development of themes and cultural categories from layers of redundancy in extensive observational and interview data becomes, for qualitative evaluators, compressed by the contract period. Consequently, special care is needed in the selection of occasions for observation and in the selection of interviewees, complemented by alertness to unanticipated opportunities to learn. Still, the data collected will almost certainly be too little to satisfy ethnographers, too much to reduce to unambiguous interpretation, and too vulnerable for comfort to complaints about validity.

Compared to the quantitative, qualitative findings encounter disproportionate challenge regarding validity despite common acknowledgment that "there are no procedures that will regularly (or always) yield either sound data or true conclusions" (Phillips, 1987, p. 21). Qualitative researchers have responded with conflicting arguments that their work satisfies the traditional notion of validity (see LeCompte & Goetz, 1982) and that the traditional notion of validity is so irrelevant to qualitative work as to be absurd (see Wolcott, 1990). Guba's (1981) effort to conceptualize validity meaningfully for qualitative work generated the venerable alternative term, "trustworthiness" (Lincoln & Guba, 1985, p. 218), which has been widely accepted.

Data Collection

Three data collection methods are the hallmarks of qualitative work: observation, interview, and review and analysis of documents and artifacts. These methods provide the empirical bases for colorful accounts highlighting occurrences and the experiences and perceptions of participants in the program studied. Observation is generally unstructured, based on realization that predetermined protocols, as they direct focus, also introduce blinders and preconceptions into the data.
The intent of structured protocols may be to reduce bias, but bias can be seen in the prescriptive categories defined for recording observational data, categories that predict what will be seen and prescribe how it will be documented, categories that preempt attention to the unanticipated and sometimes more meaningful observable matters. Structured observation in qualitative work is reserved for relatively rare program aspects regarded as requiring single-minded attention. Interviews tend, for similar reasons, to be semi-structured, featuring flexible use of prepared protocols to maximize both issue-driven and emergent information gathering (see also Rubin & Rubin, 1995). Review of relevant documents and artifacts (see Hodder, 1994) provides another and more unobtrusive means of triangulation by both method and data source (Denzin, 1989) to strengthen data quality and descriptive validity (Maxwell, 1992).

Hybrid, innovative, and other types of methods increasingly augment these three data collection mainstays. Bibliometrics may offer insight into scholarly impact, as the number of citations indicates breadth of program impact (e.g., House, Marion, Rastelli, Aguilera, & Weston, 1996). Videotapes may document observation, serve as stimuli for interviews, and facilitate repeated or collective analysis. Finding vectors into informants' thinking, read-aloud and talk-aloud methods may attempt to convert cognition into language while activities are observed and documented. "Report-and-respond forms" (Stronach, Allan, & Morris, 1996, p. 497) provide data summaries and preliminary interpretations to selected stakeholders for review and revision, offering simultaneous opportunity for further data collection, interpretive validation, and multi-vocal analysis. Technology opens new data collection opportunities and blurs some long-standing distinctions: observation of asynchronous discussion, of online and distance education classes, and of interactions in virtual space; documentation of process through capture of records; interview by electronic mail, and so forth.

Restraining the impulse toward premature design creates possibilities for discovery of foci and issues during data collection. Initial questions are refined in light of incoming information and, reciprocally, refined questions focus new data collection. The relationship between data collection and analysis is similarly reciprocal; preliminary interpretations are drawn from data and require verification and elaboration in further data collection. Articulated by Glaser and Strauss (1967) as the constant comparative method, the usual goal is grounded theory, that is, theory arising inductively from and grounded in empirical data. For qualitative evaluators, the goal is grounded interpretations of program quality.

Evaluation by qualitative methods involves continual shaping and reshaping through parallel dialogues involving design and data, data and interpretation, evaluator perspectives and stakeholder perspectives, internal perceptions and external standards. These dialogues demand efforts to confirm and disconfirm, to search beyond indicators and facts which may only weakly reflect meaning. Interpretation, like data responding to emergent foci and issues, tends to multiply meanings, giving qualitative methods their expansionist character. Are qualitative methods, revised on the fly in response to the unanticipated, sufficient to satisfy the expectations of science and professionalism?
Lacking the procedural guarantees of quality presumed by quantitative inquirers, Smith (1994) has claimed that "in assuming no connection between correct methods and true accounts, extreme constructivists have seemingly abandoned the search for the warrant for qualitative accounts" (p. 41). But the seeming abandonment of warrant is actually a redirection of efforts - qualitative practitioners seek substantive warrants rather than procedural ones. Quality in qualitative work is more a matter of whether the account is persuasive on theoretical, logical, and empirical grounds, less a matter of strict adherence to generalized, decontextualized procedures. Validity is enhanced by triangulation of data, deliberate attempts to confirm, elaborate, and disconfirm information by seeking out a variety of data sources, applying additional methods, checking for similarities and dissimilarities across time and circumstance. The data, the preliminary interpretations, and drafts of reports may be submitted to diverse audiences, selected on the bases of expertise and sensitivity to confidentiality, to try to ensure "getting it right" (Geertz, 1973, p. 29). Critical review by evaluation colleagues and external substantive experts may also be sought, and metaevaluation is advisable (as always) as an additional strategy to manage subjective bias, which cannot be eliminated whatever one's approach or method.

Interpretation

The data collected by qualitative methods are typically so diverse and ambiguous that even dedicated practitioners often feel overwhelmed by the interpretive task. The difficulty is exacerbated by the absence of clear prescriptive procedures, making it necessary not only to determine the quality of a program but also to figure out how to determine the quality of a program. The best advice available regarding the interpretive process is rather nebulous (Erickson, 1986; Wolcott, 1994), but some characteristics of qualitative data analysis are foundational:

1. Qualitative interpretation is inductive. Data are not considered illustrations or confirmations of theories or models of programs but, rather, building blocks for conceptualizing and representing them. Theoretical triangulation (Denzin, 1989) may spur deeper understanding of the program and may surface rival explanations for consideration, but theories are not the a priori impetus for study, not focal but instrumental to interpretation.
Rival explanations and different lenses for interpreting the data from a variety of theoretical vantage points compound the expansionist tendencies of qualitative data collection and contrast with the data reduction strategies common to quantitative analysis.

2. Qualitative interpretation is phenomenological. The orientation is emic,2 prioritizing insiders' (i.e., immediate stakeholders') views, values, interests, and perspectives over those of outsiders (e.g., theorists, accreditors, even evaluators). The emphasis on stakeholders' perceptions and experiences has sometimes been disparaged as an overemphasis leading to neglect of disinterested external perspectives (Howe, 1992) or of salient numerical indicators of program quality (Reichardt & Rallis, 1994, p. 10). But determinations of program impact necessarily demand learning about the diverse experiences of participants in natural contexts. Because the respondents selected for observation and interview by evaluation designers influence what can be learned about the program, care must be taken to ensure that qualitative data document a broad band of stakeholder views, not just the interests and perceptions of clients.

3. Qualitative interpretation is holistic. Because the program is viewed as a complex tapestry of interwoven, interdependent threads, too many and too embedded to isolate meaningfully from the patterns, little attention is devoted to identifying and correlating variables. Clarity is not dependent on distinguishing and measuring variables but deflected and obscured by decontextualizing and manipulating them. Indicators merely indicate, capturing such thin slices of programs that they may distort more than reveal. Not the isolation, correlation, and aggregation of data reduced to numerical representations but thematic analysis, content analysis, cultural analysis, and symbolic interactionism typify approaches to qualitative data interpretation. The effort to understand involves macro- and micro-examination of the data and identification of emergent patterns and themes, both broad-brush and fine-grained.

4. Qualitative interpretation is intuitive. Personalistic interpretation is not merely a matter of hunches, although hunches are teased out and followed up. It is trying hard to understand complex phenomena from multiple empirical and theoretical perspectives, searching for meaning in the baffling and outlying data as well as in the easily comprehended. It can be as difficult to describe and justify as to employ non-analytic analysis, reasoning without rationalism. Qualitative findings are warranted by data and reasoned from data, but they are not the residue of easily articulated procedures or of simple juxtapositions of performances against preordained standards. Analysis is not an orderly juggernaut of recording the performances of program components, comparing performances to standards, weighting, and synthesizing (see especially Scriven, 1994; see also Stake et al., 1997). Rather, the complexity of the program, of the dataset, and of the interpretive possibilities typically overwhelms criteriality and linear procedures for movement from complex datasets to findings. The effort to understand may, of course, include rationalistic and even quantitative procedures, but more-or-less standardized formalities generally give way to complex, situated forms of understanding, forms sometimes unhelpfully termed irrational (see Cohen, 1981, pp. 317-331).
Qualitative interpretation sometimes borrows strategies from the literary and visual arts, where the capacity of expressiveness to deepen understanding has long been recognized (see Eisner, 1981). Narrative and metaphoric and artistic renderings of datasets can open insightful lines of meaning exposition, greatly enhancing personal comprehension and memorability (Carter, 1993; Eisner, 1981; Saito, 1999). Such interpretation can open rather than finalize discussion, encouraging deep understanding but perhaps at the expense of definitive findings.

As beauty is in the eye of the beholder, different audiences and even different evaluators render unique, irreplicable interpretations of program quality (see, e.g., Brandt, 1981). The diversity of interpretive possibilities, not uncommon in the experiences of clients and evaluators of all stripes, brings into sharp focus not only problems of consensus and closure but also of bias, validity, and credibility. Noting that, in evaluation, "judgments often involve multidimensional criteria and conflicting interests," House (1994), among others, has advised, "the evaluator should strive to reduce biases in making such judgments" (p. 15). But bias can be difficult to recognize, much less reduce, especially in advance, especially in oneself. Naturally occurring diversity in values, in standards of quality, in experiential understandings, and in theoretical perspectives offers many layers of bias. Even methodological choices inject bias, and many such choices must be made. Bias bleeds in with the social and monetary rewards that come with happy clients, a greater temptation where methods require greater social interaction, and with political pressures large and small. Subjective understanding is both the point of qualitative evaluation and its Achilles' heel.

Reporting

Consistent with attention to stakeholder perceptions and experiences in data collection and interpretation, qualitative evaluation reporting aims for broad audience accessibility, for vicarious experience of naturalistic events, and for representation of stakeholder perspectives of those events. Narratives which reveal details that matter and which promote personal and allusionary connections are considered important to the development of understanding by audiences, more complex understanding than is generally available from other scientific reporting styles (Carter, 1993). Development of implicit understanding by readers is more desirable than the development of explicit explanations (see von Wright, 1971) because personalistic tacit knowledge is held to be more productive of action than is abstracted propositional knowledge (Polanyi, 1958).
Consequently, qualitative representations of programs feature experiential vignettes and interview excerpts which convey multiple perspectives through narratives. Such reporting tends to be engaging for readers and readerly - that is, borrowing a postmodernist term, consciously facilitative of meaning construction by readers.

Advantageous as the benefits of experiential engagement and understanding are, there is a significant disadvantage associated with qualitative reporting: length. For those evaluation audiences interested only in the historically enduring question, "What works?" and their brethren whose appetites stretch no farther than executive summaries, the voluminousness of an experiential report with a cornucopia of perspectives is problematic. Some clients, funders, and primary stakeholders are eager for such informativeness, but others are irritated. Multiple reports targeted for specific groups can help some, although the gains in utility compete with costs regarding feasibility. Like other trade-offs in qualitative inquiry, those involving report length and audience desires are not easily resolved.

ISSUES IN QUALITATIVE EVALUATION

Issues introduced in the foregoing discussion of methods will be clustered for further attention here under the categories of The Program Evaluation Standards (Joint Committee, 1994): feasibility, accuracy, propriety, and utility.

Feasibility

Qualitative fieldwork requires the devotion of significant resources to accumulating data about day-to-day events to support development and documentation of patterns and issues illuminative for understanding program quality. Time for the collection and interpretation of voluminous datasets, time for the emergence of issues and findings, time for validation and interpretation, time for creation of experiential and multi-vocal reports - time is needed at every stage of qualitative inquiry, time that is often painfully constrained by contractual timelines and resources. The methodological expertise needed for each stage of a qualitative evaluation is not generously distributed within the population, and training and experience take even more time. Identification, preparation, and coordination of a cadre of data collectors may strain evaluation resources. Substantive expertise, also needed, often requires further expansion and resources. One may well ask whether qualitative work can be done well, whether it can be done in a timely manner, whether it can be done at all under ordinary evaluation circumstances.

Scarce as they may be, logistical resources and methodological expertise are less inherently troublesome than is accurate representation of the perspectives of multiple stakeholders. For the most part, broad professional discussion has not progressed beyond expressions of interest in stakeholder perspectives and, in some approaches, in the involvement of stakeholders in some or all evaluation processes. Serious attention has not yet been devoted to the difficulty of fully realizing and truly representing diverse stakeholders, especially since the interests of managers, who typically commission evaluations, may contrast with those of program personnel and beneficiaries. Nor does the evaluation literature brim with discussion of the potential for multiple perspectives to obstruct consensus in decision-making.
Documentation of stakeholder perspectives in order to develop understanding of the multiple realities of program quality is significantly obstructed by the complexity and diversity of those perspectives and by contractual and political circumstances.

Accuracy

Awareness of the complexity of even small programs, their situationality, and their fluidity has led qualitative evaluators to doubt quantitative representations of programs as "numbers that misrepresent social reality" (Reichardt & Rallis, 1994, p. 7). Organizational charts, logic models, budgets - if these were enough to represent programs accurately, qualitative methods would be a superfluous luxury, but these are not enough. Enrollment and graduation figures may say more about the reputation or cost or catchment area of a teacher preparation program than about its quality. Growth patterns may be silent regarding personnel tensions and institutional stability. Balanced budgets may be uninformative about the appropriateness of expenditures and allocations. Such data may even deflect attention counterproductively for understanding program quality. But the addition of qualitative methods does not guarantee a remedy for the insufficiency of quantitative data.

In evaluation, by definition a judgment-intense enterprise,3 concerns persist about the potential mischief of subjective judgment in practice. It is not subjectivity per se but its associated bias and incursions into accuracy that trouble. In distinguishing qualitative from quantitative evaluation, Datta (1994) claims that "the differences are less sharp in practice than in theoretical statements" (p. 67). But it is the subjectivity of the practicing qualitative evaluator, not that of the quantitative evaluator or of the evaluation theorist, which has particularly raised questions regarding accuracy. Qualitative methodologists are familiar with the notion of researcher-as-instrument, familiar with the vulnerability to challenge of interpretive findings, familiar with the necessity of managing subjectivity through such means as triangulation, validation, and internal and external review, but the familiar arguments and strategies offer limited security.

Qualitative evaluation datasets - any datasets - are biased. There is bias in decisions about which events to observe, what to notice and document, how to interpret what is seen. There is bias in every interviewee's perspective. Every document encapsulates a biased viewpoint. Because of the prominence of subjective data sources and subjective data collectors and especially because of reliance on subjective interpretation, consciousness of the potential for bias in qualitative work is particularly strong. The skepticism associated with subjectivity works against credibility, even when triangulation, validation, and peer review are thoroughly exercised.

Even more challenging is the task of accurate representation of various stakeholders.
Postmodernists have raised issues that many qualitative evaluators take to heart: whether outsiders' portrayals of insiders' perspectives necessarily misrepresent and objectify humans and human experiences, whether authors of reports have legitimate authority to construct through text the realities of others, whether the power associated with authorship contributes to the intractable social inequities of the status quo (Brodkey, 1989; Derrida, 1976; Foucault, 1979; Lyotard, 1984). These problems may leave evaluation authors writing at an ironic distance from their own reports as they attempt, even as they write, to facilitate readers' deconstructions (Mabry, 1997), producing open texts which demand participation in meaning construction from uncomfortable readers (see Abma, 1997; McLean, 1997). Presentation of unresolved complexity and preservation of ambiguity in reports bewilders and annoys some readers, especially clients and others desirous of clear external judgments and specific recommendations.

Tightly coupled with representation is misrepresentation (Mabry, 1999b, 1999c); with deep understanding, misunderstanding; with usefulness, misuse. The very vividness of experiential accounts can carry unintended narrative fraud. Even for those who wish to represent it and represent it fully, truth is a mirage. When knowledge is individually constructed, truth is a matter of perspective. Determining and presenting what is true about a program, when truth is idiosyncratic, is a formidable obligation.

If truth is subjective, must reality be? Since reality is apprehended subjectively and in no other way by human beings and since subjectivity cannot be distilled from the apprehension of reality, it follows that reality cannot be known with certainty. The history of scientific revolution demonstrates the fragility of facts, just as ordinary experience demonstrates the frequent triumph of misconceptions and preconceptions. No one's reality, no one's truth quite holds for others, although more confidence is invested in some versions than in others. Evaluators hope to be awarded confidence, but is it not reasonable that evaluation should be considered less than entirely credible, given the op-art elusiveness of truth? Can there be a truth, a bottom line, about programs in the postmodern era? If there were, what would it reveal, and what would it obscure?

How accurate and credible must - can - an evaluation be? Accuracy and credibility are not inseparable. An evaluation may support valid inferences of program quality and valid actions within and about programs but be dismissed by non-believers or opponents, while an evaluation saturated with positive bias or simplistic superficialities may be taken as credible by happy clients and funding agencies. Suspicion about the accuracy of an evaluation, well-founded or not, undermines its credibility. In an era of suspicion about representation, truth, and even reality, suspicion about accuracy is inevitable. The qualitative commitment to multiple realities testifies against the simpler truths of positivist science, against its comforting correspondence theory of truth,4 and against single truths - even evaluators' truths. Alas, accuracy and credibility are uneven within and across evaluation studies partly because truth is more struggle than achievement.

Propriety

In addition to the difficulties noted regarding feasibility and accuracy, qualitative evaluation, as all evaluation, is susceptible to such propriety issues as conflicts of interest and political manipulation. Dependent as it is on persons, qualitative fieldwork is particularly vulnerable to micropolitics, to sympathy and persuasion at a personal and sometimes unconscious level.
The close proximity between qualitative evaluators and respondents raises special issues related to bias, ethics, and advocacy. Given the paucity of evaluation training in ethics (Newman & Brown, 1996) and the myriad unique circumstances which spawn unexpected ethical problems (Mabry, 1999a), proper handling of these issues cannot be assured.

Misuse of evaluation results by stakeholders may or may not be harmful, may or may not be innocent, may or may not be programmatically, personally, or politically expedient. Misuse is not limited to evaluation results - evaluations may be commissioned to stall action, to frighten actors, to reassure, to stimulate change, to build or demolish support. Failure to perceive stakeholder intent to misuse and failure to prevent misuse, sometimes unavoidable, may nevertheless raise questions regarding propriety.

Not only stakeholders but evaluators, too, may misuse evaluation. Promotionalism of certain principles or certain stakeholders adds to the political swirl, subtracts from credibility, and complicates propriety. Whether reports should be advocative and whether they can avoid advocacy is an issue which has exercised the evaluation community in recent years (Greene & Schwandt, 1995; House & Howe, 1998; Scriven, Greene, Stake, & Mabry, 1995). The inescapability of the evaluator's personal values, as fundamental undergirding for reports, has been noted (Mabry, 1997), a recognition carrying over from qualitative research (see especially Lincoln & Guba, 1985) but resisted by objectivist evaluators focused on bias management through design elements and criterial analysis (see especially Scriven, 1994). More explosive is the question of whether evaluators should (or should ever) take explicit, proactive advocative positions in support of endangered groups or principles as part of their professional obligations (see Greene, 1995, 1997; House & Howe, 1998; Mabry, 1997; Scriven, 1997; Stake, 1997; Stufflebeam, 1997). Advocacy by evaluators is seen as an appropriate assumption of responsibility by some and as a misunderstanding of responsibility by others. Beneath the arguments for and against advocacy can be seen personal allegiances regarding the evaluator's primary responsibility. Anti-advocacy proponents prioritize evaluation information delivery, professionalism, and credibility. Pro-advocacy proponents prioritize the program, some aspect of it, or its field of endeavor, such as education (Mabry, 1997) - or, more broadly, principles that underlie social endeavor, such as social justice (House, 1993), deliberative democracy (House & Howe, 1999), and the elevation of specific or historically underrepresented groups (Fetterman, 1996; Greene, 1997; Mertens, 1999).
The focus is more directly on human and societal interests than on information and science. At issue is whether evaluation should be proactive or merely instrumental in advancing human, social, and educational agendas - the nature and scope of evaluation as change agent.

Methodological approaches that pander to simplistic conceptions of reality and of science raise a different array of propriety issues. Rossi (1994) has observed that "the quants get the big evaluation contracts" (p. 25), that the lopsided competition among evaluation professionals regarding approach and scale "masks a struggle over market share" (p. 35), and that "the dominant discipline in most of the big firms is economics" (p. 29). This is problematic in the evaluation of massive educational programs sponsored by organizations such as the World Bank (Jones, 1992; Psacharopoulos & Woodhall, 1991), for example, because education is not properly considered merely a matter of economics. Educational evaluators should beware designs which imply simple or simply economic realities and should beware demands to conduct evaluations according to such designs. Wariness of this kind requires considerable alertness to the implications of methodology and client demands and considerable ethical fortitude.

Utility

As an applied social science, evaluation's raison d'etre is provision of grounding for sound decisions within and about programs. Both quantitative and qualitative evaluations have influenced public policy decisions (Datta, 1994, p. 56), although non-use of evaluation results has been a common complaint among work-weary evaluators, some of whom have developed strategies (Chelimsky, 1994) and approaches (Patton, 1997) specifically intended to enhance utilization. Qualitative evaluation raises troublesome questions for utility - questions which again highlight the interrelatedness of feasibility, accuracy, propriety, and utility: Are reports too long to be read, much less used? Is it possible to ensure accurate, useful representation of diverse interests? Can reports be prepared in time to support program decisions and actions? Are they credible enough for confident, responsible use? At least for small-scale educational evaluations, qualitative work has been described as more useful than quantitative to program operators (Rossi, 1994)5 but, unsurprisingly, some quantitative practitioners hold that the "utility is extremely limited for my setting and the credibility of its findings is too vulnerable" (Hedrick, 1994, p. 50, referring to Guba & Lincoln, 1989).

The invitation to personal understanding that characterizes many qualitative reports necessarily opens opportunity for interpretations different from the evaluator's. Respect for individual experience and knowledge construction motivates qualitative report-writers and presumes the likelihood of more-or-less contrary interpretations. The breadth and magnitude of dissent can vary greatly and can work not only against credibility but also against consensual programmatic decision-making. The qualitative characteristic of openness to interpretation highlights the questions: Use by whom? And for what? If it is not (or not entirely) the evaluator's interpretations that direct use, whose should it be? The too-facile response that stakeholders' values, criteria, or interpretations should drive decisions underestimates the gridlock of natural disagreement among competing stakeholder groups. Prioritization of the interests of managerial decision-makers, even in the interest of enhancing utility, reinforces anti-democratic limitations on broad participation.
Attention to the values, interests, and perspectives of multiple stakeholders can clarify divisions and entrench dissensus. Consideration of issues related to qualitative evaluation, such as issues of epistemology and authority, makes it all too clear that utility and propriety, for example, are simultaneously connected and conflicted.

REALITY, REALISM, AND BEING REALISTIC

Let's be realistic. The reality of educational programs is too complex to be represented as dichotomously black and white. Qualitative approaches are necessary to portray evaluands with the shades of meaning which actually characterize the multi-hued realities of programs. But while the complex nature of educational programs suggests the necessity of qualitative approaches to evaluation, the dizzying variety of stakeholder perspectives as to a program's real failures and accomplishments, the ambiguous and conflicting interpretations which can be painted from qualitative data, and the resource limitations common to evaluations of educational programs may render qualitative fieldwork unrealistic.

Is educational evaluation a science, a craft, an art? Realism in art refers to photograph-like representation in which subjects are easily recognized by outward appearance, neglecting perhaps their deeper natures. Hyperrealism refers to portrayals characterized by such meticulous concentration on minute physical details - hair follicles and the seams in clothing - as to demand attention to technique, sometimes deflecting it from message. Surrealism, on the other hand, refers to depiction of deep subconscious reality through the fantastic and incongruous, but this may bewilder more than enlighten. Artists from each movement offer conflicting views of what is real - views which inform, baffle, repel, and enthrall audiences. In evaluation, different approaches provide different kinds of program representations (see Brandt, 1981), with a similar array of responses from clients and other stakeholders. Whether the program is recognizable as portrayed in evaluation reports is necessarily dependent upon the acuity of audiences as well as the skill of evaluators. Such is our daunting professional reality.

According to some philosophers of art, artworks are not the physical pieces themselves but the conceptual co-creations of artists and beholders. According to some theories of reading and literary criticism, text is co-created by authors and readers. As analogies regarding meaning and authorship, these notions resonate with the actual experiences of evaluators. Our reports document data and interpretations of program quality, sometimes participatory interpretations, but they are not the end of the brushstroke. The utility standard implies the practical priority of stakeholder interpretations of program quality, those who ultimately make, influence, and implement program decisions.

In the hands of accomplished practitioners, educational evaluation may seem an art form, but most clients expect not art but science - social science, applied science. Programs have real consequences for real people, however multiple and indeterminate the reality of programs may be.
Such realization suggests the need for complex qualitative strategies in evaluation, with all the living color associated with real people and all the local color associated with real contexts, and with all the struggles and irresolutions they entail.

ENDNOTES

1 Dilthey (1883) prescribed hermeneutical or interpretive research to discover the meanings and perspectives of people studied, a matter he referred to as Verstehen.
2 Anthropologists have distinguished etic accounts, which prioritize the meanings and explanations of outside observers, from emic accounts, which prioritize indigenous meanings and understandings (see Seymour-Smith, 1986, p. 92).
3 Worthen, Sanders, and Fitzpatrick note that, "among professional evaluators, there is no uniformly agreed-upon definition of precisely what the term evaluation means. It has been used by various evaluation theorists to refer to a great many disparate phenomena" (1997, p. 5, emphasis in the original). However, the judgmental aspect, whether the judgment is of the evaluator or someone else, is consistent across evaluators' definitions of evaluation: (1) Worthen, Sanders, & Fitzpatrick: "Put most simply, evaluation is determining the worth or merit of an evaluation object" (1997, p. 5). (2) Michael Scriven: "The key sense of the term 'evaluation' refers to the process of determining the merit, worth, or value of something, or the product of that process" (1991, p. 139, emphasis in the original). (3) Ernest House: "Evaluation is the determination of the merit or worth of something, according to a set of criteria, with those criteria (often but not always) explicated and justified" (1994, p. 14, emphasis added).
4 The positivist correspondence theory of truth holds that a representation is true if it corresponds exactly to reality and is verifiable by observation.
5 Note, however, that the very helpfulness of these evaluations has led to claims that they are not evaluations at all but rather "management consultations" (Rossi, 1994, p. 33; Scriven, 1998).

REFERENCES

Abma, T. (1997). Sharing power, facing ambiguity. In L. Mabry (Ed.), Advances in program evaluation: Vol. 3. Evaluation and the post-modern dilemma (pp. 105-119). Greenwich, CT: JAI Press.
Brandt, R.S. (Ed.). (1981). Applied strategies for curriculum evaluation. Alexandria, VA: ASCD.
Brodkey, L. (1989). On the subjects of class and gender in "The literacy letters." College English, 51, 125-141.
Campbell, D.T., & Stanley, J.C. (1963). Experimental and quasi-experimental designs for research. Boston: Houghton-Mifflin.
Carter, K. (1993). The place of story in the study of teaching and teacher education. Educational Researcher, 22(1), 5-12, 18.
Chelimsky, E. (1994). Evaluation: Where we are. Evaluation Practice, 15(3), 339-345.
Cohen, L.J. (1981). Can human irrationality be experimentally demonstrated? Behavioral and Brain Sciences, 4, 317-331.
Datta, L. (1994). Paradigm wars: A basis for peaceful coexistence and beyond. In C.S. Reichardt & S.F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61, 53-70.
Datta, L. (1997). Multimethod evaluations: Using case studies together with other methods. In E. Chelimsky & W.R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 344-359). Thousand Oaks, CA: Sage.
Denzin, N.K. (1989). The research act: A theoretical introduction to sociological methods (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Denzin, N.K. (1997). Interpretive ethnography: Ethnographic practices for the 21st century. Thousand Oaks, CA: Sage.
Denzin, N.K., & Lincoln, Y.S. (Eds.). (2000). Handbook of qualitative research (2nd ed.). Thousand Oaks, CA: Sage.
Derrida, J. (1976). Of grammatology (trans. G. Spivak). Baltimore, MD: Johns Hopkins University Press.
Dilthey, W. (1883). The development of hermeneutics. In H.P. Rickman (Ed.), W. Dilthey: Selected writings. Cambridge: Cambridge University Press.
Eisner, E.W. (1981). On the differences between scientific and artistic approaches to qualitative research. Educational Researcher, 10(4), 5-9.
Eisner, E.W. (1985). The art of educational evaluation: A personal view. London: Falmer.
Eisner, E.W. (1991). The enlightened eye: Qualitative inquiry and the enhancement of educational practice. New York: Macmillan.
Erickson, F. (1986). Qualitative methods in research on teaching. In M.C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 119-161). New York: Macmillan.
Fetterman, D.M. (1996). Empowerment evaluation: Knowledge and tools for self-assessment and accountability. Thousand Oaks, CA: Sage.
Foucault, M. (1979). What is an author? Screen, Spring.
Geertz, C. (1973). The interpretation of cultures: Selected essays. New York: Basic Books.
Glaser, B.G., & Strauss, A.L. (1967). The discovery of grounded theory. Chicago, IL: Aldine.
Greene, J.C. (1994). Qualitative program evaluation: Practice and promise. In N.K. Denzin & Y.S. Lincoln (Eds.), Handbook of qualitative research (pp. 530-544). Newbury Park, CA: Sage.
Greene, J.C. (1997). Participatory evaluation. In L. Mabry (Ed.), Advances in program evaluation: Vol. 3. Evaluation and the post-modern dilemma (pp. 171-189). Greenwich, CT: JAI Press.
Greene, J.C., Caracelli, V.J., & Graham, W.F. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11, 255-274.
Greene, J.C., & Schwandt, T.A. (1995). Beyond qualitative evaluation: The significance of "positioning" oneself. Paper presentation to the International Evaluation Conference, Vancouver, Canada.
Guba, E.G. (1978). Toward a methodology of naturalistic inquiry in educational evaluation. Monograph 8. Los Angeles: UCLA Center for the Study of Evaluation.
Guba, E.G. (1981). Criteria for assessing the trustworthiness of naturalistic inquiries. Educational Communication and Technology Journal, 29, 75-92.
Guba, E.G., & Lincoln, Y.S. (1989). Fourth generation evaluation. Thousand Oaks, CA: Sage.
Hedrick, T.E. (1994). The quantitative-qualitative debate: Possibilities for integration. In C.S. Reichardt & S.F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61, 45-52.
Hodder, I. (1994). The interpretation of documents and material culture. In N.K. Denzin & Y.S. Lincoln (Eds.), Handbook of qualitative research (pp. 403-412). Thousand Oaks, CA: Sage.
House, E.R. (1993). Professional evaluation: Social impact and political consequences. Newbury Park, CA: Sage.
House, E.R. (1994). Integrating the quantitative and qualitative. In C.S. Reichardt & S.F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61, 13-22.
House, E.R., & Howe, K.R. (1998). The issue of advocacy in evaluations. American Journal of Evaluation, 19(2), 233-236.
House, E.R., & Howe, K.R. (1999). Values in evaluation and social research. Thousand Oaks, CA: Sage.
Values in evaluation and social research. Thousand Oaks, CA: Sage.
House, E.R., Marion, S.F., Rastelli, L., Aguilera, D., & Weston, T. (1996). Evaluating R&D impact. Unpublished report, University of Colorado at Boulder.
Howe, K. (1992). Getting over the quantitative-qualitative debate. American Journal of Education, 100(2), 236-256.
Joint Committee on Standards for Educational Evaluation (1994). The program evaluation standards: How to assess evaluations of educational programs (2nd ed.). Thousand Oaks, CA: Sage.
Jones, P. (1992). World Bank financing of education: Lending, learning and development. London: Routledge.
LeCompte, M.D., & Goetz, J.P. (1982). Problems of reliability and validity in ethnographic research. Review of Educational Research, 52, 31-60.
LeCompte, M.D., & Preissle, J. (1993). Ethnography and qualitative design in educational research (2nd ed.). San Diego: Academic Press.
Lincoln, Y.S., & Guba, E.G. (1985). Naturalistic inquiry. Newbury Park, CA: Sage.
Lyotard, J.-F. (1984). The postmodern condition: A report on knowledge. Minneapolis: University of Minnesota Press.
Mabry, L. (Ed.). (1997). Advances in program evaluation: Vol. 3. Evaluation and the post-modern dilemma. Greenwich, CT: JAI Press.
Mabry, L. (1998a). Case study methods. In H.J. Walberg & A.J. Reynolds (Eds.), Advances in educational productivity: Vol. 7. Evaluation research for educational productivity (pp. 155-170). Greenwich, CT: JAI Press.
Mabry, L. (1998b). A forward LEAP: A study of the involvement of Beacon Street Art Gallery and Theatre in the Lake View Education and Arts Partnership. In D. Boughton & K.G. Congdon (Eds.), Advances in program evaluation: Vol. 4. Evaluating art education programs in community centers: International perspectives on problems of conception and practice. Greenwich, CT: JAI Press.
Mabry, L. (1999a). Circumstantial ethics. American Journal of Evaluation, 20(2), 199-212.
Mabry, L. (1999b, April). On representation. Paper presented at an invited symposium at the annual meeting of the American Educational Research Association, Montreal.
Mabry, L. (1999c, November). Truth and narrative representation. Paper presented at the annual meeting of the American Evaluation Association, Orlando, FL.
Maxwell, J.A. (1992). Understanding and validity in qualitative research. Harvard Educational Review, 62(3), 279-300.
McLean, L.D. (1997). If in search of truth, an evaluator. In L. Mabry (Ed.), Advances in program evaluation: Vol. 3. Evaluation and the post-modern dilemma (pp. 139-153). Greenwich, CT: JAI Press.
Mertens, D.M. (1999). Inclusive evaluation: Implications of transformative theory for evaluation. American Journal of Evaluation, 20, 1-14.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.
Newman, D.L., & Brown, R.D. (1996). Applied ethics for program evaluation. Thousand Oaks, CA: Sage.
Patton, M.Q. (1997). Utilization-focused evaluation (3rd ed.). Thousand Oaks, CA: Sage.
Phillips, D.C. (1987). Validity in qualitative research: Why the worry about warrant will not wane. Education and Urban Society, 20, 9-24.
Polanyi, M. (1958). Personal knowledge: Towards a post-critical philosophy. Chicago, IL: University of Chicago Press.
Psacharopoulos, G., & Woodhall, M. (1991). Education for development: An analysis of investment choices. New York: Oxford University Press.
Reichardt, C.S., & Rallis, S.F. (Eds.)
(1994). The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61.
Rossi, P.H. (1994). The war between the quals and the quants: Is a lasting peace possible? In C.S. Reichardt & S.F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61, 23-36.
Rubin, H.J., & Rubin, I.S. (1995). Qualitative interviewing: The art of hearing data. Thousand Oaks, CA: Sage.
Saito, R. (1999). A phenomenological-existential approach to instructional social computer simulation. Unpublished doctoral dissertation, Indiana University, Bloomington, IN.
Scriven, M. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage.
Scriven, M. (1994). The final synthesis. Evaluation Practice, 15(3), 367-382.
Scriven, M. (1997). Truth and objectivity in evaluation. In E. Chelimsky & W.R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 477-500). Thousand Oaks, CA: Sage.
Scriven, M. (1998, November). An evaluation dilemma: Change agent vs. analyst. Paper presented at the annual meeting of the American Evaluation Association, Chicago.
Scriven, M., Greene, J., Stake, R., & Mabry, L. (1995, November). Advocacy for our clients: The necessary evil in evaluation? Panel presentation to the International Evaluation Conference, Vancouver, BC.
Seymour-Smith, C. (1986). Dictionary of anthropology. Boston: G.K. Hall.
Shadish, W.R., Jr., Cook, T.D., & Leviton, L.C. (1991). Foundations of program evaluation: Theories of practice. Newbury Park, CA: Sage.
Smith, M.L. (1994). Qualitative plus/versus quantitative: The last word. In C.S. Reichardt & S.F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives. New Directions for Program Evaluation, 61, 37-44.
Stake, R.E. (1973). Program evaluation, particularly responsive evaluation. Paper presented at the conference on New Trends in Evaluation, Göteborg, Sweden. Reprinted in G.F. Madaus, M.S. Scriven, & D.L. Stufflebeam (Eds.) (1987), Evaluation models: Viewpoints on educational and human services evaluation (pp. 287-310). Boston: Kluwer-Nijhoff.
Stake, R.E. (1978). The case study method in social inquiry. Educational Researcher, 7(2), 5-8.
Stake, R.E. (1997). Advocacy in evaluation: A necessary evil? In E. Chelimsky & W.R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 470-476). Thousand Oaks, CA: Sage.
Stake, R.E. (2000). Case studies. In N.K. Denzin & Y.S. Lincoln (Eds.), Handbook of qualitative research (2nd ed., pp. 236-247). Thousand Oaks, CA: Sage.
Stake, R., Migotsky, C., Davis, R., Cisneros, E., DePaul, G., Dunbar, C., Jr., et al. (1997). The evolving synthesis of program value. Evaluation Practice, 18(2), 89-103.
Stronach, I., Allan, J., & Morris, B. (1996). Can the mothers of invention make virtue out of necessity? An optimistic deconstruction of research compromises in contract research and evaluation. British Educational Research Journal, 22(4), 493-509.
Stufflebeam, D.L. (1997).
A standards-based perspective on evaluation. In L. Mabry (Ed.), Advances in program evaluation: Vol. 3. Evaluation and the post-modern dilemma (pp. 61-88). Greenwich, CT: JAI Press.
von Wright, G.H. (1971). Explanation and understanding. London: Routledge & Kegan Paul.
Wolcott, H.F. (1990). On seeking - and rejecting - validity in qualitative research. In E.W. Eisner & A. Peshkin (Eds.), Qualitative inquiry in education: The continuing debate (pp. 121-152). New York: Teachers College Press.
Wolcott, H.F. (1994). Transforming qualitative data: Description, analysis, and interpretation. Thousand Oaks, CA: Sage.
Wolcott, H.F. (1995). The art of fieldwork. Walnut Creek, CA: AltaMira.
Worthen, B.R., Sanders, J.R., & Fitzpatrick, J.L. (1997). Program evaluation: Alternative approaches and practical guidelines (2nd ed.). New York: Longman.