
A True Test: Toward More Authentic and Equitable Assessment

Author(s): Grant Wiggins


Source: The Phi Delta Kappan, Vol. 70, No. 9 (May, 1989), pp. 703-713
Published by: Phi Delta Kappa International
Stable URL: http://www.jstor.org/stable/20404004
Accessed: 15/02/2011 00:41


As long as we hold simplistic monitoring tests to be adequate models of and incentives for reaching national intellectual standards, Mr. Wiggins warns, student performance, teaching, and our thinking and discussion about assessment will remain flaccid and uninspired.

BY GRANT WIGGINS

GRANT WIGGINS is a senior associate with the National Center on Education and the Economy, Rochester, N.Y., and a special consultant on assessment for the Coalition of Essential Schools.

Illustration by Jean K. Stephens

WHEN AN educational problem persists despite the well-intentioned efforts of many people to solve it, it's a safe bet that the problem hasn't been properly framed. Assessment in education has clearly become such a problem, since every state reports above-average scores on norm-referenced achievement tests and since everyone agrees (paradoxically) that such tests shouldn't drive instruction but that their number and influence should nevertheless increase.¹ More ominously, we seem unable to see any moral harm in bypassing context-sensitive human judgments of human abilities in the name of statistical accuracy and economy.

We haven't moved beyond lamenting these problems, because we have failed to stop and ask some essential questions: Just what are tests meant to do? Whose purposes do they (and should they) serve? Are large-scale testing programs necessary? When are tests that are designed to monitor accountability harmful to the educational process? Need they be so intrusive? Is there an approach to upholding and examining a school's standards that might actually aid learning?

But we won't get far in answering these questions until we ask the most basic one: What is a true test? I propose a radical answer, in the sense of a return to the roots; we have lost sight of the fact that a true test of intellectual ability requires the performance of exemplary tasks. First, authentic assessments replicate the challenges and standards of performance that typically face writers, businesspeople, scientists, community leaders, designers, or historians. These include writing essays and reports, conducting individual and group research, designing proposals and mock-ups, assembling portfolios, and so on. Second, legitimate assessments are responsive to individual students and to school contexts. Evaluation is most accurate and equitable when it entails human judgment and dialogue, so that the person tested can ask for clarification of questions and explain his or her answers.

A genuine test of intellectual achievement doesn't merely check "standardized" work in a mechanical way. It reveals achievement on the essentials, even if they are not easily quantified. In other words, an authentic test not only reveals student achievement to the examiner, but also reveals to the test-taker the actual challenges and standards of the field.

To use a medical metaphor, our confusion over the uses of standardized tests is akin to mistaking pulse rate for the total effect of a healthful regimen. Standardized tests have no more effect on a student's intellectual health than taking a pulse has on a patient's physical health. If we want standardized tests to be authentic, to help students learn about themselves and about the subject matter or field being tested, they must become more than merely indicators of one superficial symptom.

Reform begins, then, by recognizing that the test is central to instruction. Any tests and final exams inevitably cast their shadows on all prior work. Thus they not only monitor standards, but also set them. Students acknowledge this truth with their plaintive query, Is this going to be on the test? And their instincts are correct; we should not feel despair about such a view. The test always sets the de facto standards of a school despite whatever else is proclaimed. A school should "teach to the test." The catch is that the test must offer students a genuine intellectual challenge, and teachers must be involved in designing the test if it is to be an effective point of leverage.

SETTING STANDARDS

We need to recognize from the outset that the problems we face are more ecological (i.e., political, structural, and economic) than technical. For example, Norman Frederiksen, a senior researcher with the Educational Testing Service (ETS), notes that "situational tests are not widely used in testing programs because of considerations having to do with cost and efficiency."² In order to overcome the resistance to using such situational tests, we must make a powerful case to the public (and to teachers habituated to short-answer tests as an adequate measure of ability) that a standardized test of intellectual ability is a contradiction in terms. We must show that influential "monitoring" tests are so irrelevant (and even harmful) to genuine intellectual standards that their cost - to student learning and teacher professionalism - is too high, however financially efficient they may be as a means of gathering data.

The inescapable dilemma presented by mass testing is that using authentic standards and tasks to judge intellectual ability is labor-intensive and time-consuming. Examiners must be trained, and multiple, contextual tests of the students must be conducted. Genuine tests also make it more difficult to compare, rank, and sort because they rarely involve one simple, definitive test with an unambiguous result and a single residue number. Therefore, as long as tests are thought of only in terms of accountability, real reforms will be thwarted. After all, why do we need to devise more expensive tests if current data are reliable? When we factor in the self-interest of test companies and of colleges and school districts, we can see that resistance to reform is likely to be strong.

The psychometricians and the accountants are not the villains, however. As I have noted elsewhere, teachers fail to understand their own unwitting role in the growth of standardized testing.³ Mass assessment resulted from legitimate concern about the failure of the schools to set clear, justifiable, and consistent standards to which they would hold their graduates and teachers accountable. But the problem is still with us: high school transcripts tell us nothing about what a student can actually do. Grades and Carnegie units hide vast differences between courses and schools. An A in 11th-grade English may mean merely that a student was dutiful and able to fill in blanks on worksheets about juvenile novels. And it remains possible for a student to pass all of his or her courses and still remain functionally and culturally illiterate.

Cartoon: "I don't understand all the fuss about my repeating third grade. Mr. Wilkins has been there for six years."

But the solution of imposing an efficient and "standard" test has an uglier history. The tests grew out of the "school efficiency" movement in the years between 1911 and 1916, a time oddly similar to our own. The movement, spearheaded by the work of Franklin Bobbitt, was driven by crude and harmful analogies drawn from Frederick Taylor's management principles, which were used to improve factory production. Raymond Callahan notes that the reformers, then as now, were far too anxious to satisfy external critics and to reduce complex intellectual standards and teacher behaviors to simple numbers and traits.⁴ Implicitly, there were signs of hereditarian and social-class-based views of intelligence; the tests were used as sorting mechanisms at least partly in response to the increased heterogeneity of the school population as a result of the influx of immigrants.⁵

The "standards" were usually cast in terms of the increased amount of work to be demanded of teachers and students. As George Strayer, head of the National Education Association (NEA) Committee on Tests and Standards for School Efficiency, reported, "We may not hope to achieve progress except as such measuring sticks are available." A school superintendent put it more bluntly: "The results of a few well-planned tests would carry more weight with the businessman and parent than all the psychology in the world."⁶

Even with unionization and the insights gained from better education, modern teachers still fall prey to the insistent claims of noneducation interests. The wishes of college admissions officers, of employers, of budget makers, of schedulers, and even of the secretaries who enter grades on computers often take precedence over the needs of students to be properly examined and the needs of teachers to deliberate and confer about effective test design and grading.

Thus, when teachers regard tests as something to be done as quickly as possible after "teaching" has ended in order to shake out a final grade, they succumb to the same flawed logic employed by the test companies (with far less statistical justification). Such acquiescence is possible only when the essential ideas and priorities in education are unclear or have been lost. If tests serve only as administrative monitors, then short-answer, "objective" tests - an ironic misnomer⁷ - will suffice (particularly if one teaches 128 students and has only a single day in which to grade final exams). However, if a test is seen as the heart and soul of the educational enterprise, such reductionist shortcuts, such high student/teacher ratios, and such dysfunctional allocation of time and resources will be seen as intolerable.

Schools and teachers do not tolerate the same kind of thinking in athletics, the arts, and clubs. The requirements of the game, recital, play, debate, or science fair are clear, and those requirements determine the use of time, the assignment of personnel, and the allocation of money. Far more time - often one's spare time - is devoted to insuring adequate practice and success. Even in the poorest schools, the ratio of players to interscholastic coaches is about 12 to 1.⁸ The test demands such dedication of time; coaching requires one-to-one interaction. And no one complains about teaching to the test in athletic competition.

We need to begin anew, from the premise that a testing program must address questions about the inevitable impact of tests (and scoring methods) on students and their learning. We must ask different questions. What kinds of challenges would be of most educational value to students? What kinds of challenges would give teachers useful information about the abilities of their students? How will the results of a test help students know their strengths and weaknesses on essential tasks? How can a school adequately communicate its standards to interested outsiders and justify them, so that standardized tests become less necessary and less influential?

AUTHENTIC TESTS

Tests should be central experiences in learning. The problems of administration, scoring, and between-school comparisons should come only after an authentic test has been devised - a reversal of the current practice of test design.

If we wish to design an authentic test, we must first decide what are the actual performances that we want students to be good at. We must design those performances first and worry about a fair and thorough method of grading them later. Do we judge our students to be deficient in writing, speaking, listening, artistic creation, finding and citing evidence, and problem solving? Then let the tests ask them to write, speak, listen, create, do original research, and solve problems. Only then need we worry about scoring the performances, training the judges, and adapting the school calendar to insure thorough analysis and useful feedback to students about results.

This reversal in thinking will make us pay more attention to what we mean by evidence of knowing. Mastery is more than producing verbal answers on cue; it involves thoughtful understanding, as well. And thoughtful understanding implies being able to do something effective, transformative, or novel with a problem or complex situation. An authentic test enables us to watch a learner pose, tackle, and solve slightly ambiguous problems. It allows us to watch a student marshal evidence, arrange arguments, and take purposeful action to address the problems.⁹ Understanding is often best seen in the ability to criticize or extend knowledge, to explain and explore the limits and assumptions on which a theory rests. Knowledge is thus displayed as thoughtful know-how - a blend of good judgment, sound habits, responsiveness to the problem at hand, and control over the appropriate information and context. Indeed, genuine mastery usually involves even more: doing something with grace and style.

To prove that an answer was not an accident or a thoughtless (if correct) response, multiple and varied tests are required. In performance-based areas we do not assess competence on the basis of one performance. We repeatedly assess a student's work - through a portfolio or a season of games. Over time and in the context of numerous performances, we observe the patterns of success and failure and the reasons behind them. Traditional tests - as arbitrarily timed, superficial exercises (more like drills on the practice field than like a game) that
are given only once or twice - leave us with no way of gauging a student's ability to make progress over time.

We typically learn too much about a student's short-term recall and too little about what is most important: a student's habits of mind. In talking about habits of mind, I mean something more substantive than "process" skills divorced from context - the formalism decried by E. D. Hirsch and others. For example, a new concept - say, irony or the formula F = ma - can be learned as a habit or disposition of mind for effortlessly handling information that had previously been confusing.¹⁰ As the word habit implies, if we are serious about having students display thoughtful control over ideas, a single performance is inadequate. We need to observe students' repertoires, not rote catechisms coughed up in response to pat questions.

The problem is more serious than it first appears. The difficulty of learning lies in the breaking of natural but dysfunctional habits. The often-strange quality of new knowledge can cause us to unwittingly misunderstand new ideas by assimilating them into our old conceptions; this is particularly true when instruction is only verbal. That is why so many students who do well on school tests seem so thoughtless and incompetent in solving real-world problems. For example, the research done at Johns Hopkins University demonstrates how precarious and illusory "knowledge" of physics really is, when even well-trained students habitually invoke erroneous but plausible ideas about force on certain problems.¹¹

The true test is so central to instruction that it is known from the start and repeatedly taken because it is both central and complex - equivalent to the game to be played or the musical piece to be performed. The true test of ability is to perform consistently well tasks whose criteria for success are known and valued. By contrast, questions on standardized tests are usually kept "secure," hidden from students and teachers, and they thus contradict the most basic conditions required for learning.¹² (Of course, statistical validity and reliability depend on the test being secret, and, when a test is kept secret, the questions can be used again.)

Designing authentic tests should involve knowledge use that is forward-looking. We need to view tests as "assessments of enablement," to borrow Robert Glaser's term. Rather than merely judging whether students have learned what was taught, we should "assess knowledge in terms of its constructive use for further learning.... [We should assess reading ability] in a way that takes into account that the purpose of learning to read is to enable [students] to learn from reading."¹³ All tests should involve students in the actual challenges, standards, and habits needed for success in the academic disciplines or in the workplace: conducting original research, analyzing the research of others in the service of one's research, arguing critically, and synthesizing divergent viewpoints. Within reasonable and reachable limits, a real test replicates the authentic intellectual challenges facing a person in the field. (Such tests are usually also the most engaging.)

The practical problems of test design can best be overcome by thinking of academic tests as the intellectual equivalent of public "performances." To enable a student is to help him or her make progress in handling complex tasks. The novice athlete and the novice actor face the same challenges as the seasoned professional. But school tests make the complex simple by dividing it into isolated and simplistic chores - as if the student need not practice the true test of performance, the test of putting all the elements together. This apparently logical approach of breaking tasks down into their components leads to tests that assess only artificially isolated "outcomes" and provide no hope of stimulating genuine intellectual progress. As a result, teaching to such tests becomes mechanical, static, and disengaging. Coaches of musicians, actors, debaters, and athletes know better. They know that what one learns in drill is never adequate to produce mastery.

That is why most so-called "criterion-referenced" tests are inadequate: the problems are contrived, and the cues are artificial. Such tests remove what is central to intellectual competence: the use of judgment to recognize and pose complex problems as a prelude to using one's discrete knowledge to solve them. Authentic challenges - be they essays, original research, or artistic performances - are inherently ambiguous and open-ended. As Frederiksen has said:

Most of the important problems one faces are ill-structured, as are all the really important social, political, and scientific problems.... But ill-structured problems are not found in standardized achievement tests.... Efficient tests tend to drive out less efficient tests, leaving many important abilities untested and untaught.... All this reveals a problem when we consider the influence of an accountability system in education.... We need a much broader conception of what a test is.¹⁴

Put simply, what the student needs is a test with more sophisticated criteria for judging performance. In a truly authentic and criterion-referenced education, far more time would be spent teaching and testing the student's ability to understand and internalize the criteria of genuine competence. What is so harmful about current teaching and testing is that they frequently reinforce - unwittingly - the lesson that mere right answers, put forth by going through the motions, are adequate signs of ability. Again, this is a mistake rarely made by coaches, who know that their hardest and most important job is to raise the standards and expectations of their students.

EXAMPLES OF AUTHENTIC TESTS

Let us examine some tests and criteria devised by teachers working to honor the ideas I've been discussing under the heading of "exhibition of mastery" - one of the nine "Common Principles" around which members of the Coalition of Essential Schools have organized their reform efforts.¹⁵ Here are two examples of final exams that seem to replicate more accurately the challenges facing experts in the field.

An oral history project for ninth-graders.¹⁶ You must complete an oral history based on interviews and written sources and present your findings orally in class. The choice of subject matter will be up to you. Some examples of possible topics include: your family, running a small business, substance abuse, a labor union, teenage parents, or recent immigrants. You are to create three workable hypotheses based on your preliminary investigations and come up with four questions you will ask to test each hypothesis.

To meet the criteria for evaluating the oral history project described above, you must:
* investigate three hypotheses;
* describe at least one change over time;
* demonstrate that you have done background research;
* interview four appropriate people as sources;
* prepare at least four questions related to each hypothesis;
* ask questions that are not leading or biased;
* ask follow-up questions when appropriate;
* note important differences between fact and opinion in answers that you receive;
* use evidence to support your choice of the best hypothesis; and
* organize your writing and your class presentation.

A course-ending simulation/exam in economics.¹⁷ You are the chief executive officer of an established firm. Your firm has always captured a major share of the market, because of good use of technology, understanding of the natural laws of constraint, understanding of market systems, and the maintenance of a high standard for your product. However, in recent months your product has become part of a new trend in public tastes. Several new firms have entered the market and have captured part of your sales. Your product's proportional share of total aggregate demand is continuing to fall. When demand returns to normal, you will be controlling less of the market than before.

Your board of directors has given you less than a month to prepare a report that solves the problem in the short run and in the long run. In preparing the report, you should: 1) define the problem, 2) prepare data to illustrate the current situation, 3) prepare data to illustrate conditions one year in the future, 4) recommend action for today, 5) recommend action over the next year, and 6) discuss where your company will be in the market six months from today and one year from today.

The tasks that must be completed in the course of this project include:
* deriving formulas for supply, demand, elasticity, and equilibrium;
* preparing schedules for supply, demand, costs, and revenues;
* graphing all work;
* preparing a written evaluation of the current and future situation for the market in general and for your company in particular;
* preparing a written recommendation for your board of directors;
* showing aggregate demand today and predicting what it will be one year hence; and
* showing the demand for your firm's product today and predicting what it will be one year hence.

Cartoon: "Ms. Kelsor says I'd do better with a team of teachers. She thinks six or eight would be about right."

Connecticut has implemented a range of performance-based assessments in science, foreign languages, drafting, and small-engine repair, using experts in the field to help develop apt performance criteria and test protocols. Here is an excerpt from the Connecticut manual describing the performance criteria for foreign languages; these criteria have been derived from the guidelines of the American Council on the Teaching of Foreign Languages (ACTFL).¹⁸ On the written test, students are asked to draft a letter to a pen pal. The four levels used for scoring are novice, intermediate, intermediate high, and advanced; they are differentiated as follows:
* Novice. Students use high-frequency words, memorized phrases, and formulaic sentences on familiar topics. Students show little or no creativity with the language beyond the memorized patterns.
* Intermediate. Students recombine the learned vocabulary and structures into simple sentences. Sentences are choppy, with frequent errors in grammar, vocabulary, and spelling. Sentences will be very simple at the low end of the intermediate range and will often read very much like a direct translation of English.
* Intermediate high. Students can write creative sentences, sometimes fairly complex ones, but not consistently. Structural forms reflecting time, tense, or aspect are attempted, but the result is not always successful. Students show an emerging ability to describe and narrate in paragraphs, but papers often read like academic exercises.
* Advanced. Students are able to join sentences in simple discourse and have sufficient writing vocabulary to express themselves simply, although the language may not be idiomatic. Students show good control of the most frequently used syntactic structures and a sense that they are comfortable with the target language and can go beyond the academic task.

Of course, using such an approach is time-consuming, but it is not impractical or inapplicable to all subject areas on a large scale. The MAP (Monitoring Achievement in Pittsburgh) testing program offers tests of critical thinking and writing that rely on essay questions and are specifically designed to provide diagnostic information to teachers and students. Pittsburgh is also working, through its Syllabus-Driven Exam Program, to devise exemplary test items that are based more closely on the curriculum.¹⁹

On the state level, Vermont has recently announced that it will move toward a portfolio-based assessment in writing and mathematics, drawing on the work of the various affiliates of the National Writing Project and of the Assessment of Performance Unit (APU) in Great Britain.

TABLE 1.
An Item from the NAEP Science Test

Child's Name  Frisbee Toss (yds.)  Weight Lift (lbs.)  50-Yard Dash (secs.)
Joe           40                   205                 9.5
Jose          30                   170                 8.0
Kim           45                   130                 9.0
Sarah         28                   120                 7.6
Zabi          48                   140                 8.3
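The NAEP scoring manual rewards "accurate ranking of the children's performance on each event" and penalizes "misinterpreting performance on the dash event." A short calculation on the Table 1 data shows what that ranking involves. The following Python sketch is a hypothetical illustration and is not part of the NAEP materials. It assumes that each event is ranked from 1 (best) to 5 (worst) and that the three ranks are summed with equal weight, which fits the children's decision to make each event "of the same importance." The names `RESULTS`, `event_ranks`, and `all_around_winner` are invented for this example.

```python
# Hypothetical sketch (not NAEP's own procedure): rank-sum scoring of Table 1.
# Assumption: each event is ranked 1 (best) to 5 (worst), the three ranks are
# added with equal weight, and the lowest total is the all-around winner.

RESULTS = {
    # name: (frisbee toss in yds., weight lift in lbs., 50-yard dash in secs.)
    "Joe": (40, 205, 9.5),
    "Jose": (30, 170, 8.0),
    "Kim": (45, 130, 9.0),
    "Sarah": (28, 120, 7.6),
    "Zabi": (48, 140, 8.3),
}


def event_ranks(event, higher_is_better):
    """Rank every child on one event (index 0-2); rank 1 is the best result."""
    ordered = sorted(RESULTS, key=lambda name: RESULTS[name][event],
                     reverse=higher_is_better)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def all_around_winner(dash_lower_is_better=True):
    """Sum the three event ranks and return (winner, totals).

    Passing dash_lower_is_better=False reproduces the misreading in which
    a longer dash time is treated as a better performance.
    """
    # Frisbee distance and weight lifted: bigger is better.
    # Dash time: smaller is better, unless deliberately misread.
    directions = (True, True, not dash_lower_is_better)
    totals = {name: 0 for name in RESULTS}
    for event, higher_is_better in enumerate(directions):
        for name, rank in event_ranks(event, higher_is_better).items():
            totals[name] += rank
    winner = min(totals, key=totals.get)  # no ties occur with this data
    return winner, totals


if __name__ == "__main__":
    print(all_around_winner())       # winner: Zabi (total 7)
    print(all_around_winner(False))  # winner: Joe (total 5)
```

Under these assumptions, a correct reading of the dash (the lowest time is best) gives totals of Joe 9, Jose 8, Kim 10, Sarah 11, and Zabi 7. Zabi is the winner, which matches the manual's 4-point answer. If the dash is read as "more is better," Joe wins with a total of 5. That is the kind of error the manual scores at 3 points.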
California has piloted performance-based tests in science and other subjects to go with its statewide essay-writing test.

RESPONSIVENESS AND EQUITY

Daniel Resnick and Lauren Resnick have proposed a different way of making many of these points. They have argued that American students are the "most tested" but the "least examined" youngsters in the world.²⁰ As their epigram suggests, we rarely honor the original meaning of the word test. Originally a testum was a porous cup for determining the purity of metal; later it came to stand for any procedures for determining the worth of a person's effort. To prove the value or ascertain the nature of a student's understanding implies that appearances can deceive. A correct answer can disguise thoughtless recall. A student might quickly correct an error or a slip that obscures thoughtful understanding; indeed, when a student's reasoning is heard, an error might not actually be an error at all.

The root of the word assessment reminds us that an assessor should "sit with" a learner in some sense to be sure that the student's answer really means what it seems to mean. Does a correct answer mask thoughtless recall? Does a wrong answer obscure thoughtful understanding? We can know for sure by asking further questions, by seeking explanation or substantiation, by requesting a self-assessment, or by soliciting the student's response to the assessment.

The problem can be cast in broader moral terms: the standardized test is disrespectful by design. Mass testing as we know it treats students as objects - as if their education and thought processes were similar and as if the reasons for their answers were irrelevant. Test-takers are not, therefore, treated as human subjects whose feedback is essential to the accuracy of the assessment. Pilot standardized tests catch many technical defects in test questions. However, responses to higher-order questions are inherently unpredictable.

The standardized test is thus inherently inequitable. I am using the word equity in its original, philosophical meaning, as it is incorporated into the British and American legal systems. The concept is commonsensical but profound: blanket laws and policies (or standardized tests) are inherently unable to encompass the inevitable idiosyncratic cases for which we ought always to make exceptions to the rule. Aristotle put it best: "The equitable is a correction of the law where it is defective owing to its universality."²¹

In the context of testing, equity requires us to insure that human judgment is not overrun or made obsolete by an efficient, mechanical scoring system. Externally designed and externally mandated tests are dangerously immune to the possibility that a student might legitimately need to have a question rephrased or might deserve the opportunity to defend an unexpected or "incorrect" answer, even when the test questions are well-structured and the answers are multiple choice. How many times do teachers, parents, or employers have to alter an evaluation after having an answer or action explained? Sometimes, students need only a hint or a slight rephrasing to recall and use what they know. We rely on human judges in law and in athletics because complex judgments cannot be reduced to rules if they are to be truly equitable. To gauge understanding, we must explore a student's answer; there must be some possibility of dialogue between the assessor and the assessed to insure that the student is fully examined.

This concern for equity and dialogue is not idle, romantic, or esoteric. Consider the following example from the National Assessment of Educational Progress (NAEP) science test, Learning by Doing, which was piloted a few years ago.²² On one of the tasks, students were given three sets of statistics that supposedly derived from a mini-Olympics that some children had staged (see Table 1). The introductory text noted that the children "decided to make each event of the same importance." No other information that bears on the question was provided. The test presented the students with the results of three events from the competition.

The first question asked, Who would be the all-around winner? The scoring manual gives these instructions:

Score 4 points for accurate ranking of the children's performance on each event and citing Zabi as the overall winner. Score 3 points for using a ranking approach ... but misinterpreting performance on the dash event ... and therefore, citing the wrong winner. Score 2 points for a response which cites an overall winner or a tie with an explanation that demonstrates some recognition that a quantitative means of comparison is needed. Score 1 point if the student makes a selection of an overall winner with an irrelevant or non-quantitative account or without providing an explanation. Score 0 for no response.

Makes sense, right? But now ask yourself how, using the given criteria, you would score the following response given by a third-grader:

A. Who would be the all-around winner?
No one.
B. Explain how you decided who would be the all-around winner. Be sure to show your work.
No one is the all-around winner.

The NAEP scorer gave the answer a score of 1. Given the criteria, we can see why. The student failed to give an explanation or any numerical calculations to support the answer.

But could that answer somehow be apt in the mind of the student? Could it be that the 9-year-old deliberately
and correctly answered "no one," since "all-around" could mean "winner of all events"? If looked at in this way, couldn't it be that the child was more thoughtful than most by deliberately not taking the bait of part B (which presumably would have caused the child to pause and consider his or her answer)? The full sentence answer in part B - remember, this is a 9-year-old - is revealing to me. It is more emphatic than the answer to part A, as if to say, "Your question suggests I should have found one all-around winner, but I won't be fooled. I stick to my answer that no one was the all-around winner." (Note, by the way, that in the scorer's manual the word all-around has been changed to overall.) The student did not, of course, explain the answer, but it is conceivable that the instruction was confusing, given that there was no "work" needed to determine that "no one" was the all-around winner. One quick follow-up question could have settled the matter.

A moral question with intellectual ramifications is at issue here: Who is responsible for insuring that an answer has been fully explored or understood, the tester or the student? One reason to safeguard the teacher's role as primary assessor is that the most accurate and equitable evaluation depends on relationships that have developed over time between examiner and student. The teacher is the only one who knows what the student can or cannot do consistently, and the teacher can always follow up on confusing, glib, or ambiguous answers.

In this country we have been so enamored of efficient testing that we have overlooked feasible in-class alternatives to such impersonal testing, which are already in use around the world. The German abitur (containing essay and oral questions) is designed and scored by classroom teachers, who submit two possible tests to a state board for approval. The APU in Great Britain has for more than a decade developed tests that are designed for classroom use and that involve interaction between assessor and student.

What is so striking about many of the APU test protocols is that the assessor is meant to probe, prompt, and even teach, if necessary, to be sure of the student's actual ability and to enable the learner to learn from the assessment. In many of these tests the first answer (or lack of one) is not deemed a sufficient insight into the student's knowledge.23 Consider, for example, the following sections from the assessor's manual for a mathematics test for British 15-year-olds covering the ideas of perimeter, area, and circumference.

1. Ask: "What is the perimeter of a rectangle?" [Write student answer.]
2. Present sheet with rectangle ABCD. Ask: "Could you show me the perimeter of this rectangle?" If necessary, teach.
3. Ask: "How would you measure the perimeter of the rectangle?" If necessary, prompt for full procedure. If necessary, teach. . . .
10. "Estimate the length of the circumference of this circle."
11. Ask: "What would you do to check your estimate?" [String is on the table.] If no response, prompt for string.
13. Ask: "Is there any other method?" If student does not suggest using C = πd, prompt with, "Would it help to measure the diameter of the circle?"

The scoring system works as follows: 1) unaided success; 2) success following one prompt from the tester; 3) success following a series of prompts; 4) teaching by the tester, prompts unsuccessful; 5) an unsuccessful response, and tester did not prompt or teach; 6) an unsuccessful response despite prompting and teaching; 7) question not given; and 8) unaided success where student corrected an unsuccessful attempt without help. The "successful" responses were combined into two larger categories called "unaided success" and "aided success," with percentages given for each.24
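To make the collapse from eight codes into two reported percentages concrete, here is a minimal Python sketch. It assumes that codes 1 and 8 count as "unaided success," that codes 2 and 3 count as "aided success," and that percentages are taken over the students who were actually given the question (code 7 excluded). These groupings and the denominator are assumptions for illustration, not the APU's documented procedure, and the names `APU_CODES` and `summarize` are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the eight response codes described in the text.
APU_CODES = {
    1: "unaided success",
    2: "success after one prompt",
    3: "success after a series of prompts",
    4: "taught by tester, prompts unsuccessful",
    5: "unsuccessful, tester did not prompt or teach",
    6: "unsuccessful despite prompting and teaching",
    7: "question not given",
    8: "unaided success after self-correction",
}

# Assumed grouping of the "successful" codes into the two larger categories.
UNAIDED = {1, 8}  # success with no help from the tester
AIDED = {2, 3}    # success only after one or more prompts


def summarize(codes):
    """Return percent unaided and aided success, ignoring code 7 (question not given)."""
    for code in codes:
        if code not in APU_CODES:
            raise ValueError(f"unknown response code: {code}")
    given = [code for code in codes if code != 7]
    if not given:
        return {"unaided success": 0.0, "aided success": 0.0}
    tally = Counter(
        "unaided" if code in UNAIDED else "aided" if code in AIDED else "other"
        for code in given
    )
    return {
        "unaided success": round(100 * tally["unaided"] / len(given), 1),
        "aided success": round(100 * tally["aided"] / len(given), 1),
    }


# Ten hypothetical students; one was not given the question.
print(summarize([1, 1, 8, 2, 3, 4, 5, 6, 7, 1]))
# → {'unaided success': 44.4, 'aided success': 22.2}
```

Keeping the eight codes separate until the final tally preserves what a right/wrong score throws away: how much help the student needed before succeeding.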
The Australians for years have used similar tasks and similarly trained teachers to conduct district- and statewide assessments in academic subject areas (much as we do in this country with the Advanced Placement exams). Teachers give tests made up of questions drawn from banks of agreed-upon items and then mark them. Reliability is achieved through a process called "moderation," in which teachers of the same subjects gather to compare results and to set criteria for grading.

To insure that professionalization is aided, not undermined, by national testing, the process of "group moderation" has been made a central feature of the proposed new national assessment system in Great Britain. The tests will be both teacher-given and standardized. But what is so admirable - and equitable - is that
the process of group moderation requires collective judgments about any discrepancies between grade patterns in different schools and between results in a given school and on the nationally standardized criterion-referenced test. Significantly, the process of moderation can, on occasion, override the results of the nationally standardized test:

A first task of a moderation group would be to examine how well the patterns of the two matched for each group of pupils [comparing percentages of students assigned to each level]. . . . The meeting could then go on to explore discrepancies in the pattern of particular schools or groups, using samples of pupils' work and knowledge of the circumstances of schools. The group moderation would first explore any general lack of matching between the overall teacher rating distribution and the overall distribution of results on the national tests. The general aim would be to adjust the overall teacher rating results to match the overall results of the national tests; if the group were to have clear and agreed reasons for not doing this, these should be reported . . . [and] departures could be approved if the group as a whole could be convinced that they were justified in particular cases.25 (Emphasis added)

At the school-site level in the U.S., we might consider the need for an oversight process akin to group moderation to insure that students are not subject to eccentric testing and grading - a committee on testing standards, for example. In short, what group moderation can provide is the kind of on-going professional development that teachers need and desire. Both equity in testing and reform of schooling ultimately depend on a more open and consensual process of establishing and upholding schoolwide standards.

[Cartoon: "I'll have the decaffeinated; third-graders will be providing me with enough stimulation."]

A number of reasons are often cited for retaining "objective" tests (the design of which is usually quite "subjective"), among them: the unreliability of teacher-created tests and the subjectivity of human judgment. However, reliability is only a problem when judges operate in private and without shared criteria. In fact, multiple judges, when properly trained to assess actual student performance using agreed-upon criteria, display a high degree of inter-rater reliability. In the Connecticut foreign language test described above, on the thousands of student tests given, two judges using a four-point scoring system agreed on a student's score 85% of the time.26 Criticisms of Advanced Placement exams that contain essay questions usually focus on the cost of scoring, not on problems of inter-rater reliability. Inadequate testing technology is a red herring. The real problem standing in the way of developing more authentic assessment with collaborative standard-setting is the lack of will to invest the necessary time and money.

True criterion-referenced tests and diploma requirements, though difficult to frame in performance standards, are essential for establishing an effective and just education system. We must overcome the lazy habit of grading and scoring "on the curve" as a cheap way of setting and upholding standards. Such a practice is unrelated to any agreed-upon intellectual standards and can reveal only where students stand in relation to one another. It tells us nothing about where they ought to be. Moreover, students are left with only a letter or number - with nothing to learn from.

Consider, too, that the bell-shaped curve is an intended result in designing a means of scoring a test, not some coincidental statistical result of mass testing. Norm-referenced tests, be they locally or nationally normed, operate under the assumption that teachers have no effect - or only a random effect - on students.

There is nothing sacred about the normal curve. It is the distribution most appropriate to chance and random activity. Education is a purposeful activity, and we seek to have the students learn what we have to teach. . . . [W]e may even insist that our efforts are unsuccessful to the extent that the distribution of achievement approximates the normal distribution.27

In addition, such scoring insures that, by design, at least half of the student population is always made to feel inept and discouraged about their work, while the other half often has a feeling of achievement that is illusory.

Grading on a curve in the classroom is even less justifiable. There is no statistical validity to the practice, and it allows teachers to continually bypass the harder but more fruitful work of setting and teaching performance criteria from which better learning would follow.

To let students show off what they
know and are able to do is a very different business from the fatalism induced by counting errors on contrived questions. Since standardized tests are designed to highlight differences, they often end up exaggerating them (e.g., by throwing out pilot questions that everyone answers correctly in order to gain a useful "spread" of scores).28 And since the tasks are designed around hidden and often arbitrary questions, we should not be surprised if the test results end up too dependent on the native language ability or cultural background of the students, instead of on the fruit of their best efforts.

Tracking is the inevitable result of grading on a curve and thinking of standards only in terms of drawing exaggerated comparisons between students. Schools end up institutionalizing these differences, and, as the very word track implies, the standards for different tracks never converge. Students in the lower tracks are not taught and assessed in such a way that they become better enabled to close the gap between their current competence and ideal standards of performance.29 Tracking simply enables students in the lower tracks to get higher grades.

In the performance areas, by contrast, high standards and the incentives for students are clear.30 Musicians and athletes have expert performers constantly before them from which to learn. We set up different weight classes for wrestling competition, different rating classes for chess tournaments, and separate varsity and junior varsity athletic teams to nurture students' confidence as they slowly grow and develop their skills. We assume that progress toward higher levels is not only possible but is aided by such groupings.

The tangible sense of efficacy (aided by the desire to do well publicly and the power of positive peer pressure) that these extracurricular activities provide is a powerful incentive. Notice how often some students will try to sneak back into school after cutting class to submit themselves to the rigors of athletics, debate, or band practice - even when they are not the stars or when their team has an abysmal record.31

CRITERIA OF AUTHENTICITY

From the arguments and examples above, let me move to a consideration of a set of criteria by which we might distinguish authentic from inauthentic forms of testing.32

Structure and logistics. Authentic tests are more appropriately public, involving an actual audience, client, panel, and so on. The evaluation is typically based on judgment that involves multiple criteria (and sometimes multiple judges), and the judging is made reliable by agreed-upon standards and prior training.

Authentic tests do not rely on unrealistic and arbitrary time constraints, nor do they rely on secret questions or tasks. They tend to be like portfolios or a full season's schedule of games, and they emphasize student progress toward mastery.

Authentic tests require some collaboration with others. Most professional challenges faced by adults involve the capacity to balance individual and group achievement. Authentic tests recur, and they are worth practicing, rehearsing, and retaking. We become better educated by taking the test over and over. Feedback to students is central, and so authentic tests are more intimately connected with the aims, structures, schedules, and policies of schooling.

Intellectual design features. Authentic tests are not needlessly intrusive, arbitrary, or contrived merely for the sake of shaking out a single score or grade. Instead, they are "enabling" - constructed to point the student toward more sophisticated and effective ways to use knowledge. The characteristics of competent performance by which we might sort nonenabling from enabling tests might include: "The coherence of [the student's] knowledge, principled [as opposed to merely algorithmic] problem solving, usable knowledge, attention-free and efficient performance, and self-regulatory skills."33

Authentic tests are contextualized, complex intellectual challenges, not fragmented and static bits or tasks. They culminate in the student's own research or product, for which "content" is to be mastered as a means, not as an end. Authentic tests assess student habits and repertoires; they are not simply restricted to recall and do not reflect lucky or unlucky one-shot responses. The portfolio is the appropriate model; the general task is to assess longitudinal control over the essentials.34

Authentic tests are representative challenges within a given discipline. They are designed to emphasize realistic (but fair) complexity; they stress depth more than breadth. In doing so, they must necessarily involve somewhat ambiguous, ill-structured tasks or problems, and so they make student judgment central in posing, clarifying, and tackling problems.

Standards of grading and scoring. Authentic tests measure essentials, not easily counted (but relatively unimportant) errors. Thus the criteria for scoring them must be equally complex, as in the cases of the primary-trait scoring of essays or the scoring of ACTFL tests of foreign languages. Nor can authentic tests be scored on a curve. They must be scored with reference to authentic standards of performance, which students must understand to be inherent to successful performance.

Authentic tests use multifaceted scoring systems instead of a single aggregate grade. The many variables of complex performance are disaggregated in judging. Moreover, self-assessment becomes more central.35

Authentic tests exist in harmony with schoolwide aims; they embody standards to which everyone in the school can aspire. This implies the need for schoolwide policy-making bodies (other than academic departments) that cross disciplinary boundaries and safeguard the essential aims of the school. At Alverno College in Milwaukee, all faculty members are both members of disciplinary departments and of "competency groups" that span all departments.

Fairness and equity. Rather than rely on right/wrong answers, unfair "distractors," and other statistical artifices to widen the spread of scores, authentic tests ferret out and identify (perhaps hidden) strengths. The aim is to enable the students to show off what they can do. Authentic tests strike a constantly examined balance between honoring achievement, progress, native language skill, and prior fortunate training. In doing so, they can better reflect our intellectual values.

Authentic tests minimize needless, unfair, and demoralizing comparisons and do away with fatalistic thinking about results. They also allow appropriate room to accommodate students' learning styles, aptitudes, and interests. There is room for the quiet "techie" and the show-off prima donna in plays; there is room for the slow, heavy lineman and for the small, fleet pass receiver in football. In professional work, too, there is room for choice and style in tasks, topics, and methodologies. Why must all students be tested in the same way and at the same time? Why should speed of recall be so well-rewarded and slow answering be so heavily penalized in conventional testing?36

Authentic tests can be - indeed, should be - attempted by all students, with the tests "scaffolded up," not "dumbed down" as necessary to compensate for poor skill, inexperience, or weak training. Those who use authentic tests should welcome student input and feedback. The model here is the oral exam for graduate students, insuring that the student is given ample opportunity to explain his or her work and respond to criticism as integral parts of the assessment.

In authentic testing, typical procedures of test design are reversed, and accountability serves student learning. A model task is first specified. Then a fair and incentive-building plan for scoring is devised. Only then would reliability be considered. (Far greater attention is paid throughout to the test's "face" and "ecological" validity.)

As I said at the outset, we need a new philosophy of assessment in this country that never loses sight of the student. To build such an assessment, we need to return to the roots of authentic assessment, the assessment of performance of exemplary tasks. We might start by adopting the manifesto in the introduction of the new national assessment report in Great Britain, a plan that places the interests of students and teachers first:

Any system of assessment should satisfy general criteria. For the purpose of national assessment we give priority to the following four criteria:
* the assessment results should give direct information about pupils' achievement in relation to objectives: they should be criterion-referenced;
* the results should provide a basis for decisions about pupils' further learning needs: they should be formative;
* the grades should be capable of comparison across classes and schools . . . so the assessments should be calibrated or moderated;
* the ways in which criteria are set up and used should relate to expected routes of educational development, giving some continuity to a pupil's assessment at different ages: the assessments should relate to progression.37

The task is to define reliable assessment in a different way, committing or reallocating the time and money needed to obtain more authentic and equitable tests within schools. As the British proposals imply, the professionalization of teaching begins with the freedom and responsibility to set and uphold clear, appropriate standards - a feat that is impossible when tests are seen as onerous add-ons for "accountability" and are designed externally (and in secret) or administered internally in the last few days of a semester or year.

The redesign of testing is thus linked to the restructuring of schools. The restructuring must be built around intellectual standards, however, not just around issues involving governance, as has too often been the case so far. Authentic restructuring depends on continually asking a series of questions: What new methods, materials, and schedules are required to test and teach habits of mind? What structures, incentives, and policies will insure that a school's standards will be known, reflected in teaching and test design, coherent schoolwide, and high enough but still reachable by most students? Who will monitor for teachers' failure to comply? And what response to such failure is appropriate? How schools frame diploma requirements, how the schedule supports a school's aims, how job descriptions are written, how hiring is carried out, how syllabi and exams are designed, how the grading system reinforces standards, and how teachers police themselves are all inseparable from the reform of assessment.

Authentic tests must come to be seen as so essential that they justify disrupting the habits and spending practices of conventional schoolkeeping. Otherwise standards will simply be idealized, not made tangible. Nor is it "soft-hearted" to worry primarily about the interests of students and teachers: reform has little to do with pandering and everything to do with the requirements for effective learning and self-betterment. There are, of course, legitimate reasons for taking the intellectual pulse of students, schools, or school systems through standardized tests, particularly when the results are used as an "anchor" for school-based assessment (as the British propose). But testing through matrix sampling and other less intrusive methods can and should be more often used.

Only such a humane and intellectually valid approach to evaluation can help us insure progress toward national intellectual fitness. As long as we hold simplistic monitoring tests to be adequate models of and incentives for reaching our intellectual standards, student performance, teaching, and our thinking and discussion
about assessment will remain flaccid and uninspired.

[Cartoon: "You changed an F to a B and a C to an A. Nice work, son!"]

1. For an explanation of the State reports of above average test scores, see Daniel Koretz, "Arriving in Lake Wobegon: Are Standardized Tests Exaggerating Achievement and Distorting Instruction?," American Educator, Summer 1988, pp. 8-15, 46-52; and Edward Fiske, "Questioning an American Rite of Passage: How Valuable Is the SAT?," New York Times, 1 January 1989.
2. Norman Frederiksen, "The Real Test Bias: Influences of Testing on Teaching and Learning," American Psychologist, vol. 39, 1984, p. 200.
3. Grant Wiggins, "Rational Numbers: Scoring and Grading That Helps Rather Than Hurts Learning," American Educator, Winter 1988, pp. 20, 25, 45, 48.
4. Raymond Callahan, Education and the Cult of Efficiency (Chicago: University of Chicago Press, 1962), pp. 80-84.
5. David Tyack, The One Best System: A History of American Urban Education (Cambridge, Mass.: Harvard University Press, 1974), pp. 140-46.
6. Callahan, pp. 100-101.
7. Richard J. Stiggins, "Revitalizing Classroom Assessment: The Highest Instructional Priority," Phi Delta Kappan, January 1988, pp. 363-68.
8. Peter Elbow points out that in all performance-based education, the teacher goes from being the student's adversary to being the student's ally. See Peter Elbow, Embracing Contraries: Explorations in Teaching and Learning (New York: Oxford University Press, 1986).
9. For more on content as knowledge in use and on the design of curricula and tests around "essential questions," see Grant Wiggins, "Creating a Thought-Provoking Curriculum," American Educator, Winter 1987, pp. 10-17.
10. Gilbert Ryle, The Concept of Mind (London: Hutchinson Press, 1949).
11. M. McCloskey, A. Caramazza, and B. Green, "Naive Beliefs in 'Sophisticated' Subjects: Misconceptions About Trajectories of Objects," Cognition, vol. 9, 1981, pp. 117-23.
12. See also Walter Haney, "Making Testing More Educational," Educational Leadership, October 1985, pp. 4-13.
13. Robert Glaser, "Cognitive and Environmental Perspectives on Assessing Achievement," in Eileen Freeman, ed., Assessment in the Service of Learning: Proceedings of the 1987 ETS Invitational Conference (Princeton, N.J.: Educational Testing Service, 1988), pp. 40-42; and idem, "The Integration of Instruction and Testing," in Eileen Freeman, ed., The Redesign of Testing for the 21st Century: Proceedings of the 1985 ETS Invitational Conference (Princeton, N.J.: Educational Testing Service, 1986).
14. Frederiksen, p. 199.
15. For a complete account of the nine "Common Principles," see Theodore R. Sizer, Horace's Compromise: The Dilemma of the American High School, updated ed. (Boston: Houghton Mifflin, 1984), Afterword. For a summary of the idea of "exhibitions," see Grant Wiggins, "Teaching to the (Authentic) Test," Educational Leadership, April 1989.
16. I wish to thank Albin Moser of Hope High School in Providence, R.I., for this example. For an account of a performance-based history course, including the lessons used and pitfalls encountered, write to David Kobrin, Department of Education, Brown University, Providence, RI 02912.
17. I wish to thank Dick Esner of Brighton High School in Rochester, N.Y., for this example. Details on the ground rules, the information supplied for the simulation, the logistics, and the evaluation can be obtained by writing to Esner.
18. Manuals are available from the Office of Research and Evaluation, Connecticut Department of Education, P.O. Box 2219, Hartford, CT 06115. For further information on the ACTFL guidelines and their use, see ACTFL Provisional Proficiency Guidelines (Hastings-on-Hudson, N.Y.: American Council on the Teaching of Foreign Languages, 1982); and Theodore Higgs, ed., Teaching for Proficiency, the Organizing Principle (Lincolnwood, Ill.: National Textbook Co. and ACTFL, 1984).
19. See Paul LeMahieu and Richard Wallace, "Up Against the Wall: Psychometrics Meets Praxis," Educational Measurement: Issues and Practice, vol. 5, 1986, pp. 12-16; and Richard Wallace, "Redirecting a School District Based on the Measurement of Learning Through Examination," in Freeman, The Redesign of Testing . . . , pp. 59-68.
20. Daniel P. Resnick and Lauren B. Resnick, "Standards, Curriculum, and Performance: A Historical and Comparative Perspective," Educational Researcher, vol. 14, 1985, pp. 5-21.
21. Aristotle, Nicomachean Ethics 1137b25-30.
22. Learning by Doing: A Manual for Teaching and Assessing Higher-Order Thinking in Science and Mathematics (Princeton, N.J.: Educational Testing Service, Report No. 17-HOS-80, 1987).
23. Similar work on a research scale is being done in the U.S. as part of what is called "diagnostic achievement assessment." See Richard Snow, "Progress in Measurement, Cognitive Science, and Technology That Can Change the Relation Between Instruction and Assessment," in Freeman, Assessment in the Service of Learning . . . , pp. 9-25; and J. S. Brown and R. R. Burton, "Diagnostic Models for Procedural Bugs in Basic Mathematical Skills," Cognitive Science, vol. 2, 1978, pp. 155-92.
24. Mathematical Development, Secondary Survey Report No. 1 (London: Assessment of Performance Unit, Department of Education and Science, 1980), pp. 98-108.
25. Task Group on Assessment and Testing (TGAT) Report (London: Department of Education and Science, 1988), Paragraphs 73-75.
26. Personal communication from Joan Baron, director of the Connecticut Assessment of Educational Progress.
27. Benjamin Bloom, George Madaus, and J. Thomas Hastings, Evaluation to Improve Learning (New York: McGraw-Hill, 1981), pp. 52-53.
28. Jeannie Oakes, Keeping Track: How Schools Structure Inequality (New Haven, Conn.: Yale University Press, 1985), pp. 10-13.
29. Ibid.
30. On the engaging quality of "exhibitions" of mastery, see Sizer, pp. 62-68.
31. For various group testing and grading strategies, see Robert Slavin, Using Student Team Learning, 3rd ed. (Baltimore: Johns Hopkins Team Learning Project Press, 1986).
32. Credit for some of these criteria is due to Arthur Powell, Theodore Sizer, Fred Newmann, and Doug Archbald and to the writings of Peter Elbow and Robert Glaser.
33. Glaser, "Cognitive and Environmental . . . ," pp. 38-40.
34. See the work of the ARTS Propel project, headed by Howard Gardner, in which the portfolio idea is described as it is used in pilot schools in Pittsburgh. ARTS Propel is described in "Learning from the Arts," Harvard Education Letter, September/October 1988, p. 3. Many members of the Coalition of Essential Schools use portfolios to assess students' readiness to graduate.
35. Alverno College has prepared material on the hows and whys of self-assessment. See Faculty of Alverno College, Assessment at Alverno, rev. ed. (Milwaukee: Alverno College, 1985).
36. For a lively discussion of the research results on the special ETS testing conditions for dyslectics, who are given unlimited time, see "Testing, Equality, and Handicapped People," ETS Focus, no. 21, 1988.
37. Task Group on Assessment and Testing (TGAT) Report (London: Department of Education and Science, 1988), Paragraph 5.
