
The Impact of Classroom Evaluation Practices on Students

Author(s): Terence J. Crooks


Source: Review of Educational Research, Vol. 58, No. 4, (Winter, 1988), pp. 438-481
Published by: American Educational Research Association
Stable URL: http://www.jstor.org/stable/1170281
Accessed: 15/05/2008 15:42


The Impact of Classroom Evaluation Practices on Students
Terence J. Crooks
University of Otago
In most educational programs, a substantial proportion of teacher and student time is devoted to activities which involve (or lead directly to) evaluation by the teacher of student products or behavior. This review summarizes results from 14 specific fields of research that cast light on the relationships between classroom evaluation practices and student outcomes. Particular attention is given to outcomes involving learning strategies, motivation, and achievement. Where possible, mechanisms are suggested that could account for the reported effects. The conclusions derived from the individual fields are then merged to produce an integrated summary with clear implications for effective educational practice. The primary conclusion is that classroom evaluation has powerful direct and indirect impacts, which may be positive or negative, and thus deserves very thoughtful planning and implementation.

We say we want sensitive, thoughtful, analytic, independent scholars, then treat them like Belgian geese being stuffed for pâté de foie gras. We reward them for compliance, rather than independence; for giving the answers we have taught them rather than for challenging the conclusions we have reached; for admiring the brilliance of purely scientific advances rather than developing greater sensitivity to the inequities in health care we have too often ignored. As one of my associates has observed, 'We put our medical students in jail for years in order that they shall learn to become free men.' (Miller, 1978)
There has been extensive research on the impact of standardized testing on
students, and this research has been repeatedly reviewed (Goslin, 1967; Kellaghan,
Madaus, & Airasian, 1982; Kirkland, 1971; Madaus & Airasian, 1977; Madaus &
McDonagh, 1979; Rudman et al., 1980). Although standardized tests do have
important and widespread effects under some circumstances (such as when students
must reach given standards to graduate from high school, or when the funding of
school districts is affected by test results), students spend vastly greater amounts of
time engaged in classroom evaluation activities than in standardized testing. Further, surveys of teachers and students have consistently indicated that they believe
the educational and psychological effects of classroom evaluation are generally
substantially greater than the corresponding effects of standardized testing (DorrThe authoracknowledgessupportfrom the Universityof Otago,Universityof Michigan,
Universityof Iowa, and the AmericanCollegeTesting Programduringthe writingof this
paper.Some partsof the paperwere presentedat the 14th InternationalConferenceof the
InternationalAssociationforEducationalAssessment,IowaCity,IA, May23-27, 1988.Mina
Crooks,Janet Kane, Michael Kane, WilbertMcKeachieand three anonymous reviewers
providedveryhelpfulcommentson draftsof this paper.
438

Impactof ClassroomEvaluationon Students


Bremme & Herman, 1986; Haertel, 1986; Kellaghanet al., 1982; Salmon-Cox,
1981;Stiggins& Bridgeford,1985).
Because classroom evaluation activities appear to have very significant effects on students, this review will synthesize research that relates to the impact of classroom evaluation on students. Research evidence from a wide variety of research domains will be reviewed and summarized, and the conclusions from these domains will be drawn together to identify implications for effective educational practice.

For the purposes of this review, classroom evaluation is defined as evaluation based on activities that students undertake as an integral part of the educational programs in which they are enrolled. These activities may involve time spent both inside and outside the classroom. This definition includes tasks such as formal teacher-made tests, curriculum-embedded tests (including adjunct questions and other exercises intended to be an integral part of learning materials), oral questions asked of students, and a wide variety of other performance activities (cognitive and psychomotor). It also includes assessment of motivational and attitudinal variables and of learning skills.

Formal testing under carefully controlled conditions is often only a small component of the total set of evaluation activities in a course (especially in the early years of schooling), but the impact of classroom testing on students has been studied much more extensively than the impact of other forms of classroom evaluation. Thus tests and test-like activities feature prominently in this review. Other forms of classroom evaluation undoubtedly also have important effects on students. Fortunately, many of the general conclusions that can be drawn from research on testing are likely to apply also to other forms of classroom evaluation.
I have chosen to discuss research that was conducted in laboratory settings, even though it may seem to have little ecological validity for classroom evaluation. Much of the classroom-based research also has very limited ecological validity, due to artificial experimental conditions, very brief treatments, or other factors. The application of almost all educational research to new settings or conditions requires thoughtful analysis and sensitivity to factors that may affect the relevance or applicability of the findings in the new settings, or with particular categories of people. As Cronbach (1975) has put it,
Systematic inquiry can reasonably hope to make two contributions. One reasonable aspiration is to assess local events accurately, to improve short-run control. The other reasonable aspiration is to develop explanatory concepts, concepts that will help people use their heads. (p. 126)
Although generalizations must be made with extreme caution (Cronbach, 1975; Crooks, 1982; McKeachie, 1974), it seems desirable to look for parallels or convergence between the broad findings of research in different domains. Thus, for instance, the present review finds that research in three quite different domains (teacher oral questions, adjunct questions in reading passages, and teacher-made tests) that examined the effects of using questions of higher or lower cognitive levels seems to converge quite nicely. This enhances confidence in the generality of the findings.
The review is structured in four sections. The first section takes a preliminary look at the nature, role, and impact of classroom evaluation. The second section reviews research that focuses primarily on the impact of various classroom evaluation practices on student learning activities and achievement. Nine specific areas of research are included. The third section reviews research on student motivation, examining the effects of different evaluation practices on motivation, and the consequences of resulting motivational tendencies for student learning. Five areas of motivational research are included. The final section draws together the major findings from the second and third sections and indicates the implications of these findings for the effective use of classroom evaluation in education.
The Nature, Role, and Impact of Classroom Evaluation: An Overview
This section includes three subsections. The first briefly summarizes the findings of research on existing patterns of classroom evaluation in elementary and secondary schools. The second discusses and categorizes the variables that are assessed through classroom evaluation. The third lists 17 specific ways (categorized as short, medium, or long term) in which classroom evaluation affects students.

Patterns of classroom evaluation. In the past few years, several research teams and individuals have examined classroom evaluation practices in elementary, junior high, and high schools in some detail (Dorr-Bremme & Herman, 1986; Fennessy, 1982; Fleming & Chambers, 1983; Gullickson, 1984, 1985; Gullickson & Ellwein, 1985; Haertel, 1986; Stiggins, 1985; Stiggins & Bridgeford, 1985; Stiggins, Conklin, & Bridgeford, 1986). Their findings are summarized below.

A substantial proportion of student time is involved in activities that are evaluated. In two studies (Dorr-Bremme & Herman, 1986; Haertel, 1986), tests occupied students for 5 to 15% of their time on average, with the lower figure being more typical for elementary school students and the higher figure for high school students. However, this was only the time spent on taking formal written tests. Much additional time is spent on other activities that are evaluated, formally or informally. Particular emphasis is placed on these nontest approaches at the elementary level (Gullickson, 1985).

A wide range of evaluative activities takes place in classrooms, with the pattern varying markedly at different grade levels and in different subject areas (Fennessy, 1982; Gullickson, 1985; Stiggins & Bridgeford, 1985). Activities include evaluation through teacher questioning and class or group discussion, marking or commenting on performances of various kinds, checklists, informal observation of learning activities, teacher-made written tests, and written exercises of various kinds (including projects, assignments, worksheets, text-embedded questions, and tests). Affective variables (e.g., aspects of motivation) are also assessed, usually in informal ways.

Teachers judge evaluative activities to be important aspects of teaching and learning and work at them accordingly, but are often concerned about the perceived inadequacies in their efforts (Gullickson, 1984; Stiggins & Bridgeford, 1985).

A substantial proportion of teachers have little or no formal training in educational measurement techniques, and many of those who do have such training find it of little relevance to their classroom evaluation activities (Gullickson, 1984; Gullickson & Ellwein, 1985; Haertel, 1986; Stiggins, 1985). This is especially true for elementary school teachers because of their heavy reliance on observation and other nontest means of evaluation. There are strong arguments for helping teachers to improve these nontest forms of evaluation (e.g., Shulman, 1980, pp. 69-70).



What is evaluated? Bloom (1956) classified educational outcomes into three major domains: cognitive, affective, and psychomotor; and subdivided the cognitive domain into the six well known categories (knowledge, comprehension, application, analysis, synthesis, and evaluation). Other researchers have used different classification schemes. For instance, Gagne, Briggs, and Wager (1988) identified five categories of learned outcomes: intellectual skills, cognitive strategies, verbal information, attitudes, and motor skills. Their first three categories could be seen as a subdivision of Bloom's cognitive domain, although the category labeled cognitive strategies was not directly addressed in the Bloom taxonomy. Most educators would agree that objectives in all three domains are important outcomes of education, with the relative importance of the different domains varying somewhat by subject area.

This review focuses primarily on research examining effects of evaluation on the cognitive domain and certain aspects of the affective domain (test anxiety, student self-efficacy, intrinsic motivation, attributions for success and failure, and cooperation among students). This is not intended to imply the unimportance of other learning outcomes, and I would expect that most of the effects identified in this review would have close parallels in other areas. Also, increasing attention is being given to evaluation of the processes teachers and students use in the pursuit of learning outcomes. This is certainly an area that should not be neglected: Many studies reviewed here demonstrate that the learning strategies students adopt are powerful predictors of educational outcomes, so that expertise in the selection and application of learning strategies is an important educational outcome.
Many researchers and educators have found Bloom's six-level taxonomy of the cognitive domain difficult to apply in practice. Any given test question may be answered in different ways by different students, depending on their specific past experiences with the topic (see, for instance, the discussion by Haertel, 1985, p. 33, of research by Nuthall and Lee). Although these difficulties do not fully vanish when levels of the taxonomy are collapsed, many authors have found it more satisfactory to use a simplified version of the taxonomy with two or three levels. The most common approach is to retain knowledge (recall or recognition of specific information) as one category and to form two levels from the remaining categories (see, for example, Buckwalter, Schumacher, Albright, & Cooper, 1981; Crooks & Collins, 1986; Mathews, 1980; and Rinchuse & Zullo, 1986). The second level usually includes Bloom's comprehension category and the capacity to perform routine, well-practiced application of knowledge. The third level is often described as problem solving, the key feature of which is the transfer of existing knowledge and skills to situations that the student has not met before.

Of course, for some purposes specially designed taxonomies are desirable. For instance, in view of the heavy emphasis on Bloom's knowledge level in teacher-made tests in schools, when analyzing 342 of these tests, Fleming and Chambers (1983) broke the knowledge level into three sublevels. Likewise, within the specific domain of problem solving, Fredericksen (1984b) has identified three classes of problems: well structured problems, structured problems requiring productive thinking, and ill structured problems. The first of these, involving routine procedures such as calculating the area of a right triangle, would not even be classified as problem solving in the three-level taxonomy described in the last paragraph, but Fredericksen's label for this type of activity is consistent with the widespread use of description, in textbooks and courses, of routine exercises as "problems." At the other extreme, as Fredericksen and others have noted, many real problems are ill structured, but such problems are often avoided in our education systems because of their complexity and open-endedness.
Such terms as higher level questions, thinking skills, and problem solving are widely used in the research summarized here. In light of the discussion above, however, it is not surprising that there is much inconsistency in the way these terms have been defined (see, for instance, Carrier & Fautsch-Partridge, 1981, for a discussion of categories used in one research area). Careful attention to the particular definitions used in each research report is thus essential.
Several researchers have used coding schemes to analyze the cognitive levels of questions included in teacher-made tests, at grade levels ranging from elementary school to university (Ball et al., 1986; Black, 1968; Buckwalter et al., 1981; Crooks & Collins, 1986; Fleming & Chambers, 1983; Haertel, 1986; Milton, 1982; Rinchuse & Zullo, 1986; Stiggins, Griswold, Green, & associates, 1988). In general, these studies have revealed extensive use of questions at Bloom's lowest ("knowledge") level. For instance, after analyzing 8800 test questions from tests in 12 grade and subject area combinations (elementary to high school), Fleming and Chambers (1983) reported that almost 80% of all questions were at the knowledge level. Mathematics and French contributed most of the higher level items. Similarly, Haertel (1986) found that "classroom examinations often failed to reflect teachers' stated instructional objectives, frequently requiring little more than repetition of material presented in the textbook or class, or solution of problems much like those encountered during instruction" (p. 2).
This finding is not unexpected. Indeed, both proponents and critics of educational testing widely agree that teacher-made tests tend to give greater emphasis to lower cognitive levels than the teachers' stated objectives would justify. Several possible causes have been suggested. These include the difficulty of writing items (especially the widely used short answer and objective items) to assess comprehension and higher level skills, the greater ease with which teachers can defend their marking of questions involving recall or recognition and achieve tests with high reliability (Elton, 1982, pp. 115-116; Natriello, 1987, p. 158), and the belief of teachers that the use of higher level questions will result in confusion, anxiety, and significant levels of failure (Doyle, 1983, 1986). Nevertheless, this pattern is a cause for concern, both because it reduces the validity of teachers' evaluations of their students and because this review will present strong evidence that the use of higher level questions in evaluation enhances learning, retention, transfer, interest, and development of learning skills.
Other aspects of the case for reduced emphasis on testing recall and recognition of factual knowledge are presented by Broudy (1988), Cole (1986), DiSibio (1982), Ebel (1982), Glaser (1985), Linn (1983), Messick (1984a, 1984b), Quellmaltz (1985), Rothkopf (1988), and Thorndike (1969). While they differ in focus and emphasis, they tend to agree that transfer is a very important quality of learning. Thorndike puts it particularly well:

The crucial indicator of a student's understanding of a concept, a principle, or a procedure is that he is able to apply it in circumstances that are different from those under which it was taught. Transferability is the key feature of meaningful learning. So if we are to test for understanding, we must test in circumstances which are at least part new. (Thorndike, 1969, p. 2)
Clearly, educational achievement must be seen as substantially more than the accumulation of isolated pieces of information and the development of certain overlearned skills that can be reliably performed. Indeed, Broudy (1988) argues that neither the replicative nor the applicative uses of schooling are sufficient to make a good case for general education of the whole school population. Rather, he argues, one must look to what he calls the associative and interpretive uses of schooling to build such a case.
Ways in which evaluations affect students: An overview. Evaluations affect students in short, medium, and long term ways. I have classified the effects into three groups based on this time perspective. There are inevitably some parallels between effects in the different categories.

At the level of a particular lesson, topic, or assignment, the following effects seem to apply (see Gagne, 1977, for a similar list):

1. Reactivating or consolidating prerequisite skills or knowledge prior to introducing the new material;
2. Focusing attention on important aspects of the subject;
3. Encouraging active learning strategies;
4. Giving students opportunities to practice skills and consolidate learning;
5. Providing knowledge of results and corrective feedback;
6. Helping students to monitor their own progress and develop skills of self-evaluation;
7. Guiding the choice of further instructional or learning activities to increase mastery;
8. Helping students feel a sense of accomplishment.
At the level of a particular learning module, course, or extended learning experience, the following are important effects:

1. Checking that students have adequate prerequisite skills and knowledge to effectively learn the material to be covered;
2. Influencing students' motivation to study the subject and their perceptions of their capabilities in the subject;
3. Communicating and reinforcing (or in some cases undermining) the instructor's or the curriculum's broad goals for students, including the desired standards of performance;
4. Influencing students' choice of (and development of) learning strategies and study patterns;
5. Describing or certifying students' achievements in the course, thus influencing their future activities.
Finally, evaluation has longer term consequences, especially when students meet consistent patterns of evaluation year after year. These longer term effects include:

1. Influencing students' ability to retain and apply in varied contexts and ways the material learned;
2. Influencing the development of students' learning skills and styles;
3. Influencing students' continuing motivation, both in particular subjects and more generally;
4. Influencing the students' self-perceptions, such as their perceptions of their self-efficacy as learners.

These effects have been listed very concisely here, but most of them will be discussed in considerable depth in the next two sections of this paper.
The Impact of Classroom Evaluation on Student Learning Activities and Achievement
This section consists of nine subsections, arranged in two groups. Each subsection presents a brief review of a particular field of research on classroom evaluation practices. Although motivational factors help explain some of the reported findings, and some of the evaluation arrangements discussed have marked effects on motivational and affective outcomes, the prime emphasis in this section is on how the implementation of classroom evaluation affects learning strategies and cognitive outcomes. Motivational influences and outcomes are more fully discussed in the next major section of this review.
The Impact of Normal Classroom Testing Practices
Effects related to expectations of what will be tested. The studying and learning practices of college students. Intensive research on the studying and learning approaches of college students over the past 20 years has identified consistent patterns in the learning strategies adopted by university students and in the relationships between these strategies and teaching arrangements (notably the evaluation approaches used). Although this research has been conducted with college students, the findings seem to have much wider application.

The research has been characterized by extensive use of interviews with students, although later researchers have developed questionnaires to gather data more economically. This research began in the United States with the sociological work of Becker, Geer, and Hughes (1968) and the insightful psychological investigations of the intellectual development of Harvard University students conducted by Perry (1970). Most of the more recent work, however, has been carried out in Europe and Australia. Marton, Saljo, and their colleagues in Sweden gave great impetus to this field with their work in the 1970s, and were the first to identify the patterns that have been verified repeatedly since then (although it should be noted that Perry's earlier work is highly related). This work has been extensively reviewed by Entwistle and Ramsden (1983), Ford (1981), Marton, Hounsell, and Entwistle (1984), Schmeck (1983), and Wilson (1981).
Marton and Saljo (1976a) reported that students' approaches to learning tasks could be categorized into two broad categories that they labeled as deep or surface approaches. Deep approaches involved an active search for meaning, underlying principles, structures that linked different concepts or ideas together, and widely applicable techniques. Surface approaches, in contrast, relied primarily on attempts to memorize course material, treating the material as if different facts and topics were unrelated. Similar categories have been found in many later studies (see Biggs, 1978; Entwistle & Ramsden, 1983; Marton, Hounsell, & Entwistle, 1984; Ramsden, 1985; and Watkins, 1984), although some researchers have identified subcategories within the surface and deep approaches (van Rossum, Deijkers, & Hamer, 1985).

After the initial study, follow-up studies by Marton and Saljo (1976b), Svensson (1977), Dahlgren (1978), Laurillard (1979), and Ramsden and Entwistle (1981) demonstrated that most students were somewhat versatile in their choice of learning approach. Their choice depended on such factors as their interest in the topic, the nature of their academic motivation, the pressure of other demands on their time and energy, the total amount of content in the course, the way in which a task is introduced, and their perceptions of what will be demanded of them in subsequent evaluations or applications of the material (Entwistle & Ramsden, 1983; Laurillard, 1984; Ramsden, 1985).
The choice of evaluation approaches seemed to be particularly potent in its effect, leading Elton and Laurillard (1979) to conclude that perhaps "here is something approaching a law of learning behaviour for students: namely that the quickest way to change student learning is to change the assessment system" (p. 100).
The effects of evaluation on the studying and learning approaches adopted by students can be positive or negative. Fredericksen (1984a) described these effects as "the real test bias," and illustrated his case with numerous examples from the research literature. More informally, but no less powerfully, Rogers (1969), receiving an award for career contributions to the teaching of physics, described the effects of examinations on students as follows:

Examinations tell them our real aims, at least so they believe. If we stress clear understanding and aim at a growing knowledge of physics, we may completely sabotage our teaching by a final examination that asks for numbers to be put into memorized formulas. However loud our sermons, however intriguing the experiments, students will judge by that examination, and so will next year's students who hear about it (p. 956).
Some of the qualitative influences of evaluation on learning have been investigated and described in books by Becker et al. (1968), Miller and Parlett (1974), and Snyder (1971). They found that many students aimed to plan their study with the primary goal of performing well on course examinations and other evaluation tasks. Unfortunately, the students often saw this goal as conflicting with the more fundamental goal of gaining a deep and enduring grasp of the subject. At the Massachusetts Institute of Technology, Snyder (1971) found that while what he called the formal curriculum emphasized a problem-oriented approach, originality, and independence of thought, the evaluation (which he called the hidden curriculum) tended to emphasize an answer-oriented approach and rote learning. Some students with high intrinsic motivation chose not to let the evaluation system distort their learning goals (for example, the student quoted in Snyder, 1971, p. 36), but the majority were happy to focus mainly on the demands of the evaluation system.
Of course, students differ markedly in their capacity to clearly identify the nature and substance of those demands. Some (Miller & Parlett, 1974, call them cue seekers) are very adept and energetic in figuring out optimum strategies for obtaining high marks economically, while others (cue conscious) are less active, but take careful note of any cues that come their way, and a minority are cue deaf.

Even when students correctly identify this hidden curriculum, they may not be capable of adapting to its demands. Several studies (Martin & Ramsden, 1987; Marton & Saljo, 1976b; Ramsden, 1984; van Rossum & Schenk, 1984) have shown that students who generally use surface approaches have great difficulty adapting to evaluation requirements that favor deep approaches. On the other hand, these and other studies have demonstrated that students who on some occasions successfully use deep approaches can all too easily be persuaded to adopt surface approaches if evaluation or other factors suggest that these will be successful. For instance, if an examination consists entirely of detailed factual questions on lecture material, an effective strategy would be to attend all lectures, take detailed notes, and rely on last-minute cramming of the lecture notes in the days immediately before the examination (Crooks & Mahalski, 1986). Miller and Parlett (1974, p. 107) have suggested that such examinations may actually serve to clear from the student's memory the knowledge involved, rather than to strengthen it. Other research suggests this is unlikely, but certainly there is ample research to indicate that detailed factual knowledge decays rapidly unless it is used or restudied.
One interesting illustration of an apparent influence of curriculum and evaluation practices on students emerges from a study by Entwistle and Kozeki (1985). They examined the school motivation, approaches to studying, and attainment of high school students in Britain and Hungary. Using Entwistle's well-established Approaches to Studying Inventory, they identified substantial mean differences between British and Hungarian students on the deep and reproducing (surface) approach scales. Compared to the British students, the Hungarian students had higher scores on deep approach and lower scores on surface approach. They convincingly hypothesized that this reflected differences in teaching and examining in the two countries. As they interpreted it, the external examinations in Britain in the latter years of high school place a very heavy emphasis on the correct reproduction of information, and this influences the approaches adopted by both teachers and students. In Hungary, on the other hand, there has been a strong reaction against a former stress on rote learning in the schools, and the emphasis has recently been placed on attempting to foster creativity through helping students to think about relationships, with much reduced emphasis on factual knowledge or operation learning. If Entwistle and Kozeki's interpretation is correct, their findings are a vivid demonstration of the influence of what is emphasized and assessed in school on how students approach their learning.
On a smaller scale, Newble and Jaeger (1983) described the effects of a change in evaluation on students in a medical school. When ward ratings replaced an oral clinical examination, students found that ward ratings were almost always above the pass level. Given that their written theory examinations did produce failures, they started spending more time in the library and less in the wards. Instituting a different clinical examination shifted the balance back. Newble and Jaeger commented that the effect of the change was so great as to indicate that examinations may be the major factor influencing student learning in a medical school with a traditional curriculum. A number of similar examples are given by Milton (1982), in a book which critically analyzes college evaluation practices.
Ramsden, Beswick, and Bowden (1987) gave university students training intended to improve their learning skills, expecting the students to make more use of deep approaches as a result. They found, however, that the training actually led to an increase in use of surface approaches, because the training had made students more able to analyze the demands of their course evaluation procedures, which suited surface approaches (see also the comments of Schmeck, 1988, p. 180).



All these examples serve to demonstrate that evaluation approaches exert a powerful influence on how students go about their studying. Ericksen (1983), reflecting on a lifetime of research and writing on teaching and learning, left no doubt about one of his conclusions:

An examination is a revealing statement by a teacher about what is important in the course. In fact, faculty standards concerning A-grade performance may be the single most significant means by which teachers set the academic values of a college. (p. 135)

Thus far in this section, I have reported on research that has demonstrated that evaluation of students often has a major impact on how they go about their studying. However, many of the studies also looked, qualitatively or quantitatively, at the outcomes achieved. These studies have shown that the nature of students' recall of the content is highly related to the strategies used earlier in studying it (e.g., Marton & Saljo, 1976a, 1976b; van Rossum et al., 1985; van Rossum & Schenk, 1984). They have also shown that students adopting deep approaches perform well on the evaluations associated with their courses, apparently doing at least as well on lower cognitive level questions as their surface-oriented peers, and doing much better than those peers on questions at higher levels (Biggs, 1973; Martin & Ramsden, 1987; Svensson, 1977).
Stice (1987), in a vivid autobiographical account of his own academic experiences, describes how he achieved excellent grades through high school and his first 2 years of college by relying solely on surface strategies. By the first year of graduate school, however, these strategies were not producing satisfactory results. With the help of a friend, he painfully developed new and deeper strategies, and this ultimately led to success in graduate school and a distinguished academic career.
In light of the research reviewed in this section, there seems to be a strong case for encouraging the development of deep strategies from the early years of the education system. This would be facilitated by greater emphasis on higher level questions in evaluations of student progress.
Effects relating to expectations of the evaluation format. A substantial number of studies over the past 50 years have examined effects on study behavior and test performance of student expectations (sets) relating to the types of test items they expect to have to answer (e.g., d'Ydewalle, Swerts, & De Corte, 1983; Gay, 1980; Hakstian, 1971; Hunkins, 1969; Kulhavy, Dyer, & Silver, 1975; Kumar, Rabinsky, & Pandey, 1979; Meyer, 1934, 1935; Rickards & Friedman, 1978; Sax & Collet, 1968; Terry, 1933). Unfortunately, synthesis of the results of this research is severely constrained by inconsistency or inadequacy in design of the studies. In several studies, students were told to expect either essay or multiple choice test formats, but were not apparently given examples of the items or practice on similar items. Thus their expectations may have been very general or unclear, and they may have received little guidance as to the cognitive levels covered by the two types of items. Under these circumstances one would not expect strong effects. In other studies, more careful attention was given to establishing clear expectations, but the cognitive levels of practice and criterion items were not reported, leading to difficulties in interpreting the findings.
Where the cognitive level (or range of cognitive levels) of items was similar in
the two groups, with only the item format differing, differences between the groups were generally small (e.g., Hakstian, 1971, who used a mix of cognitive levels, and Kumar et al., 1979, who used only factual questions). Where statistically significant differences were found under these circumstances (e.g., d'Ydewalle et al., 1983; Meyer, 1935, 1936), they favored the group which prepared for a recall (as opposed to a recognition) task. The recall group tended to prepare more thoroughly and perform a little better.
On the whole, though, student expectations of the cognitive level and content of tasks probably exert much more influence on their study behavior and achievement than do their expectations of the task format (for given content and cognitive level). Thus I believe that there is no strong evidence from this research to support widespread adoption of any one item format or style of task. Instead, the basis for selecting item formats should be their suitability for testing the skills and content that are to be evaluated.
A few studies have examined the comparative merits of open book and closed book testing (see Boniface, 1985; Francis, 1982). These studies have shown that students tend to be less anxious about open book tests, and to prepare somewhat less thoroughly for them. Predictably, the students who rely most on using their notes and/or textbooks during the test tend to be among the lower achievers. Studies to date have demonstrated no clear benefit in levels of student achievement arising from open book tests. More research is needed, however, because most of the treatments have been very brief and thus have not allowed the students adequate opportunity to develop skills in handling the demands of open book tests. Also, more attention to the nature of the test is needed because the availability of resource materials is most likely to be meaningful and useful when tests are not speeded and consist of higher cognitive level questions.
Effects of frequency of testing. The substantial body of research on the effects on students of the frequency of classroom testing has been thoroughly reviewed in a meta-analysis by Bangert-Drowns, Kulik, and Kulik (1988). This review will draw heavily on their work.
The review by Bangert-Drowns et al. (1988) used data from 31 studies which: (a) were conducted in real classrooms, (b) had all groups receiving the same instruction except for varying frequencies of testing, (c) used conventional classroom tests, (d) did not have serious methodological flaws, and (e) used a summative end-of-course examination taken by all groups as a dependent variable. The course length ranged from 4 weeks to 18 weeks, but only 9 studies were of courses shorter than 10 weeks. Bangert-Drowns et al. reported their results in terms of effect size (difference in mean scores divided by standard deviation of the less frequently tested group).
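As a concrete illustration, the effect size metric described by Bangert-Drowns et al. can be computed as follows. The group statistics in this sketch are hypothetical, chosen only to show the arithmetic:

```python
def effect_size(mean_treatment, mean_control, sd_control):
    """Standardized mean difference: (M_treatment - M_control) / SD_control,
    where the control group is the less frequently tested group."""
    return (mean_treatment - mean_control) / sd_control

# Hypothetical summative-examination statistics for illustration:
# frequently tested group mean = 78, control mean = 73, control SD = 20.
print(round(effect_size(78, 73, 20), 2))  # 0.25
```

An effect size of 0.25 on this metric means the frequently tested group scored a quarter of a standard deviation above the less frequently tested group, matching the overall figure reported below.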
Overall, they found an effect size of 0.25 favoring the frequently tested group, representing a modest gain in examination performance associated with frequent testing. However, the actual frequencies of testing varied dramatically, so the collection of studies was very heterogeneous. In 12 studies where the low frequency group received no testing prior to the summative examination, the effect size increased to 0.43. It seems reasonable to hypothesize that, in part, this large increase may have come about because students who had at least one experience of a test from the teacher before the summative examination were able to better judge what preparation would be most valuable for the summative examination. On average, effect sizes were smaller for longer treatments, probably because most longer
treatments had included at least one intermediate test for the less frequently tested group.
Bangert-Drowns et al. were surprised to find that the number of tests per week given to the high frequency group was not significantly correlated with the effect size. Rather, the effect size was best predicted from the frequency of testing for the control group. This suggests that the prime benefit from testing during a course comes from having at least one or two such tests, but that greater frequencies do not convey much benefit. One further analysis they conducted, however, raises some doubts about this conclusion. They identified eight studies which had groups with high, intermediate, and low frequencies of testing. Compared to the low frequency groups, the high frequency groups had a mean effect size of 0.48, whereas the intermediate frequency groups had a mean effect size of 0.22. The difference between 0.48 and 0.22 was statistically significant. This finding must be treated with caution, however, because of the small proportion of the total sample included in this analysis.
Overall, the evidence suggests that a moderate frequency of testing is desirable, and more frequent testing may produce further modest benefits. Groups that received no testing during the course were clearly disadvantaged, on average. Only four studies reported student attitudes towards instruction, but all favored more frequent testing, with a mean effect size of 0.59, a large effect.
One issue not covered in the review by Bangert-Drowns et al. was whether the tests during the course were cumulative, or related only to content since the last test (Keys, 1934; Rohm, Sparzo, & Bennett, 1986). The literature on distributed practice (see, for instance, Bjork, 1979, and Dempster, 1987) suggests that the use of cumulative tests, requiring repeated review of earlier material, would be advantageous on a comprehensive end of course examination. The extent of this benefit should depend on the nature of the course and of the examination. Hierarchical courses, in which later topics draw heavily on earlier material, tend to build in distributed practice, and thus should not benefit as much from directly retesting earlier content later in the course. On the other hand, courses that consist of a collection of topics that are only modestly interrelated would seem likely to benefit more from cumulative testing practices (see, for example, Guza & McLaughlin, 1987, who studied performance on spelling tests).
Another issue that needs further investigation is the effect of frequent testing on higher cognitive level outcomes. In their review, Bangert-Drowns et al. did not distinguish among studies by the cognitive level of the tests and criterion examinations, and it seems likely that most of the studies did not use significant numbers of questions at the higher cognitive levels of Bloom's taxonomy. It can be argued that frequent testing may not help (and may actually inhibit) higher level outcomes, even when the evaluations focus heavily on these outcomes. Students may need some "breathing space" if they are to adopt the deep level approaches that lead most effectively to higher level outcomes (Entwistle & Ramsden, 1983; Ramsden, 1985).
Effects of evaluative standards. The effects of teacher evaluative standards on student effort have been examined in a recent book by Natriello and Dornbusch (1984). They found that higher standards generally led to greater student effort and to students being more likely to attend class. Students who perceived standards as unattainable, however, were more likely to become disengaged from school. As
Natriello (1987) has suggested, there may well be a curvilinear relationship between the level of standards and student effort and performance, with some optimal level for each situation. This optimal level would probably depend on other aspects of the evaluation arrangements, such as whether or not students are given opportunities to get credit for correcting the deficiencies of evaluated work, or the nature of the feedback on their efforts. The weaker students, who are most at risk in high-demand classrooms, may need considerable practical support and encouragement if they are to avoid disillusionment.
Not surprisingly, Natriello and Dornbusch found that if students thought the evaluations of their work were not important or did not accurately reflect the level of their performance and effort, they were less likely to consider them worthy of effort. This conclusion is consistent with the results of research on student attributions of the reasons for success or failure in educational tasks (discussed later in this paper).
An important issue is whether the standards adopted are to be norm-referenced, criterion-referenced, or based on the effort and improvement of individual students (Natriello, 1987). This choice appears to differentially affect the motivation and learning of different categories of students. For instance, norm-referenced evaluation tends to undermine the learning and motivation of students who regularly score near the bottom of a class, while posing much less risk to the top students. No clear consensus emerges from the literature to date, but Natriello (1987) suggests that self-referenced standards may be optimal for most students. All students can improve their knowledge, skills, and attitudes and have this verified through evaluation, but only some can score above the class median on a measure.
When student performance on achievement tests is the criterion, research has generally shown that higher standards lead to higher performance (e.g., Rosswork, 1977), although again a curvilinear relationship may be predicted. Most of the relevant classroom-based research derives from studies of mastery learning, and these will be reviewed in a later section.
The Impact of Other Instructional Practices Involving Evaluation
Effects of adjunct questions in learning from text. In contrast to the research in the previous section, much of the research reviewed in this section has been conducted in laboratory settings. The findings of this research, however, converge with findings from research on the use of conventional tests in educational programs. Thus I believe that it is appropriate and valuable to include this extensive body of research in this review.
Adjunct questions are questions inserted before, during, or after a written passage that students are to study. Some studies have allowed the students to review earlier material after they encounter an adjunct question, whereas others have not permitted such looking back. The adjunct questions may be factual or higher level questions, although definitions of higher level vary markedly (Carrier & Fautsch-Partridge, 1981). Their effects have been studied by examining the pace and intensity of students' reading of portions of the passage, and by testing students in a variety of ways and at a variety of times on the content of the passage. These tests have looked at the students' grasp of the content or skill directly covered by the adjunct questions, their grasp of closely related material not directly addressed
by adjunct questions, and their ability to answer questions on material in the passage that is unrelated to any adjunct question. Finally, researchers have examined the effects on these outcomes of providing various forms of feedback on the students' performance on the adjunct questions, immediately or at some later stage.
This body of research has been reviewed recently by Hamaker (1986) and Hamilton (1985), and the earlier review of Anderson and Biddle (1975) has been widely cited. These reviews formed the starting point for my review of this area.
Factual adjunct questions. Hamaker (1986) used meta-analytic techniques to review 50 experiments on the effects of factual adjunct questions. She found that factual adjunct questions considerably improved the performance of students on subsequent test items testing the same facts. The average effect size was about 1.0 (the adjunct question group mean averaged one standard deviation higher than the control group mean). The mean effect size was very similar for adjunct questions placed before or after the relevant portion of the reading passage (prequestions or postquestions). The effect of these same factual questions on performance on test items covering related but not identical content was also positive, but of about half the magnitude (effect size approximately 0.5). Again, the mean effect size was similar for prequestions and postquestions. When Hamaker examined the effects of factual adjunct questions on unrelated test questions, she found modest negative effects (effect size approximately -0.3) for prequestions, and negligible effects for postquestions. The negative effect for factual prequestions has been interpreted by numerous researchers (e.g., Anderson & Biddle, 1975; Hamaker, 1986; Wittrock, 1986) as resulting from selective attention to the material cued by the prequestions. Similar effects have been found when students are given lists of factual objectives before a reading assignment (Hamilton, 1985; Wittrock, 1986). The negligible effect size on unrelated questions when factual postquestions are used differs from the findings of several previous reviews (e.g., Anderson & Biddle, 1975) that claimed a general facilitative effect of factual postquestions.
Hamaker reported that the effect sizes were unrelated to subjects' age, the interval between reading task and posttest, the average distance between adjunct questions and relevant text information, and whether or not subjects were allowed to consult the text while answering the adjunct questions. This final finding is important, even if based on relatively few studies, because most adjunct question studies have not allowed their subjects to look back to earlier portions of the text, a condition that reduces ecological validity for applying the conclusions about postquestions to normal studying (Duchastel, 1979a, 1983).
The format of the adjunct questions appears to have some influence on performance on test questions (Anderson & Biddle, 1975; Foos & Fisher, 1988; Hamaker, 1986). Adjunct questions in short-answer format have produced mean effect sizes about twice as large as adjunct questions in multiple-choice format, when performance on the same information was the criterion. Smaller benefits may also occur with related questions. This effect would seem to be due to the different processing demands of short-answer and multiple-choice adjunct questions.
The effect sizes were also found to be related to text length and the density of adjunct questions. The length of text and the ratio of length to number of questions were both positively correlated with effect size in studies using postquestions, but length was negatively correlated with performance on repeated test questions in studies using prequestions. It appears that the selective attention benefits of prequestions decrease as the number of facts to be covered and the amount of searching required increases, whereas postquestions work well with long texts, especially if the number of adjunct questions is not too large.
The beneficial effects of factual adjunct questions are not due to greater study time of students receiving adjunct questions. Although it is true that the inclusion of adjunct questions tends to increase study time a little where study time is not controlled, the effect sizes from studies in which study time was controlled (identical for experimental and control groups) were generally higher than the effect sizes from studies in which study time was not controlled (Hamaker, 1986, Table IX).
Higher order adjunct questions. Studying the effects of higher order adjunct questions is more complicated. These questions can be at a variety of cognitive levels and can require the student to integrate ideas from greater or smaller sections of the passage. The nature of the criterion questions is also important because greater effects would be expected with higher order criterion questions, but the effects on performance on factual criterion questions are also of interest (Watts & Anderson, 1971).
Hamaker included in her review 21 studies that compared the relative effectiveness of higher and lower order adjunct questions, and calculated her effect sizes based on this comparison. Compared to lower order questions, higher order questions led to substantially improved test performance on the same higher order questions, and to moderately improved performance on related and unrelated higher order questions. They also improved performance a little on unrelated factual questions (Hamaker interprets this improvement as due to more thorough reading of the whole text). Compared to lower order questions, however, the higher order adjunct questions led to moderately lower performance on the content of the lower order questions, and on closely related content.
It should be emphasized that, as reported earlier, students receiving lower order adjunct questions generally performed significantly better than students who received no adjunct questions, so the comparative performance reported above does not mean that the use of higher order adjunct questions depresses performance on factual questions compared to a control group that receives no adjunct questions. Indeed, studies by Mayer (1975) and Watts and Anderson (1971), which included comparisons of groups answering higher order adjunct questions and control groups receiving no adjunct questions, found that the groups receiving higher order adjunct questions generally performed as well or better than the control groups on the factual test items. Thus it appears that the use of higher order adjunct questions is not detrimental to factual learning, but is distinctly advantageous to learning of higher order skills, whether directly covered by the adjunct questions or not. Hamaker concluded that higher order questions have a more general facilitative effect than factual questions. The results of Shavelson, Berliner, Ravitch, and Loeding (1974) suggest that this may be especially true for longer term retention.
D. W. Rowe (1986) reviewed evidence about the positioning of higher order questions, and concluded that the facilitative effects of higher order questions apply to prequestions as well as to postquestions. This conclusion may only apply, however, if students return to the questions and actively answer them.
Evaluation and the consolidation of learning. Beginning with early studies by Jones (1923) and Spitzer (1939), numerous studies have demonstrated that taking a test on a topic after studying it tends to enhance longer term retention of the
material studied, even if no feedback is given on the test performance. In many cases, the observed effect has been strong (for instance, Jones, 1923, found retention test scores for tested students double those for untested students). This effect has been described as the consolidation function of testing, and would appear to closely parallel the benefits of adjunct postquestions (Duchastel, 1979b; Duchastel & Nungester, 1982; Foos & Fisher, 1988). Indeed, where the learning is from reading, the effect is indistinguishable from the effect of adjunct postquestions in the special case where the postquestions are massed together at the end of the reading passage and students are not permitted to look back at the material they have read. Because Hamaker (1986) found no important difference between the effects of inserted and massed postquestions, the findings reported earlier for adjunct postquestions should also apply in this situation.
The benefits from testing can apparently be explained by three factors. First, the testing gets the students to attend to the content another time. This constitutes a limited form of distributed practice, and the beneficial effects of distributed practice on retention are well established. Second, the testing encourages the student to actively process content, which is known to enhance learning and retention (Brown, Bransford, Ferrara, & Campione, 1983; Levin, 1982; McKeachie, Pintrich, Lin, & Smith, 1986; Thomas & Rohwer, 1986; Wittrock, 1974, 1979, 1986). Some types of items may stimulate more active processing than others (Duchastel, 1981). Third, the test directs attention to the topics, skills, and details tested, which may focus the student's preparation for a subsequent retention test. Students are more likely to achieve goals that they clearly perceive (Anderson & Armbruster, 1984; Brown et al., 1983; Rohwer & Thomas, 1987; Thomas & Rohwer, 1986). All of these effects are predominantly associated with the content actually tested, so it is not surprising that little benefit has been shown for untested material unless it is closely related to the tested material (see, for instance, LaPorte & Voss, 1975; Nungester & Duchastel, 1982).
Effects of teacher oral questions in class. The extensive literature on the relationships between teacher classroom behaviors and student achievement (from process-product and experimental studies) has been summarized recently by Brophy and Good (1986) and Rosenshine and Stevens (1986). One aspect of teacher behavior they discuss is the use of teacher oral questions directed to students and the feedback given to student answers. Research on teacher questioning (often called recitation) has also been reviewed by Gall (1984).
These reviews report that the frequency of teacher questioning has generally been shown to be positively related to student achievement. Rosenshine and Stevens (1986) state that "the critical variable seems to be a high percentage of student responses" (p. 383). Reasons for the effectiveness of recitation have been suggested by Gall (1984). These include several factors already discussed in this review: that questions encourage more active engagement in learning; that they provide practice on the material, which helps to consolidate student learning; that they lead to feedback that clarifies understanding and corrects misconceptions; that they cue students as to the aspects the teacher regards as more important (and thus more likely to be included in tests subsequently); and that they give practice on activities similar to those in the criterion tests.
In order to obtain full benefit from classroom questioning, the reviewers suggest that questions should be directed to as many students as possible (to encourage all
toward active learning), that teachers need to practice phrasing questions in ways that communicate the task clearly, that the difficulty level should be such that the majority of questions receive satisfactory responses, and that responses to other than simple factual questions tend to be fuller and more appropriate if several seconds are allowed between question and response (see also M. B. Rowe, 1986). Feedback should include knowledge of results, but should make only limited use of praise (e.g., praise might be used mainly for correct responses from anxious or less capable students) and very little use of criticism.
Perhaps the most frequently researched aspect of teacher oral questions has been the cognitive level of the questions and the effects of different cognitive levels on student achievement. This has also been an area in which reviewers have reached markedly varied conclusions, although the reviewers have agreed that higher level questions are generally used much less than lower level questions (a ratio of 1 to 3 is typical of reported figures from research in school classrooms). Medley (1979) and Rosenshine (1979) both concluded that greater use of higher level questions led to lower student achievement. Winne (1979), in a review of relevant experimental studies, found no clear pattern of achievement change associated with greater use of higher level questions. Redfield and Rousseau (1981), however, used meta-analysis on a very similar collection of experimental studies and reported a mean effect size of 0.73 favoring use of higher cognitive level questions. More recently, Samson, Strykowski, Weinstein, and Walberg (1987) conducted another meta-analysis of experimental studies and found a mean effect size of 0.26 favoring use of higher level questions.
Several factors help to make sense of these contradictory findings (cf. Gall, 1984; Samson et al., 1987). First, studies in this area have been very inconsistent in their definitions of higher and lower level questions. Lower level has been defined to include the bottom one, two, or three categories from Bloom's taxonomy, and other taxonomies have also been used. Second, the difficulty of the questions has rarely been controlled, so that higher level questions may have been substantially more difficult on average than lower level questions, which could have reduced students' opportunity and motivation to learn effectively from these questions. Third, too little attention has been paid to the nature of the criterion achievement measures. The use of a criterion involving only factual recall or recognition could be predicted not to favor the use of higher level oral questions. For example, the studies reviewed by Medley (1979) and Rosenshine (1979) were predominantly conducted in junior elementary school classes with high proportions of disadvantaged children, where the teaching focused very much on basic knowledge and skills. These students may have had difficulty attending to and correctly interpreting the higher level questions, and the criterion measures used generally included few higher level questions. Fourth, many of the studies were of very brief duration. It could be predicted that higher cognitive level questions would be most effective when used consistently over substantial periods of time, especially if students had previously had little experience with such questions. This prediction is supported by an analysis included in the review by Samson et al. (1987). They found a mean effect size of 0.05 for 22 studies lasting 5 days or less, but a mean effect size of 0.83 for 4 studies lasting 20 days or more. Finally, it is interesting to note that the review by Samson et al. also reported markedly larger mean effect sizes in studies that were better designed (random assignment to treatments, sample size greater than
50), and in studies that more closely specified and/or monitored the degree to
which higher level questions were used.
Taking all these considerations into account, I believe it is justifiable to conclude
that the use of higher level oral questions by teachers usually fosters, or at least
does not harm, student achievement. The main exceptions are likely to be situations
in which the achievement measure consists almost entirely of factual recall or
recognition questions and situations in which the higher level questions are too
difficult or too unclear for many of the students. Careful guidance and training
may be needed before some students can respond appropriately to higher level
questions (see Dillon, 1982; Klinzing, Klinzing-Eurich, & Tisher, 1985; Mills, Rice,
Berliner, & Rousseau, 1980). Further, if higher level questions are to substantially
enhance student achievement, they will need to be used consistently over extended
periods of time.
The impact of the cognitive level of questions on student affect has not received
much attention. In particular, it seems reasonable to hypothesize that higher level
questions of appropriate difficulty would tend to enhance student interest in the
course content more than factual questions. Because of the long-term importance
of motivational factors in learning, research is needed to investigate this hypothesis.
Effects of feedback on performance. There is extensive literature on the effects of providing knowledge of results and other forms of feedback on the evaluative tasks performed by students. Factors involved include the nature and extent of the feedback, its timing, its value in relation to the student's existing level of performance, and its relationship to the summative functions of evaluation.
Research that examined the effects of feedback on learning from text was
reviewed thoroughly by Kulhavy (1977). He found that feedback generally increased
what students learned from reading assignments that included questions or tests
for them to answer.
One exception to this positive conclusion occurred if the material was too difficult
for the students to process, so that they tended to choose to try to learn the
highlights from the feedback. This exception is further supported by a recent meta-analysis of the effects of feedback in 22 studies involving programmed and computer-based instruction. In this meta-analysis, Bangert-Drowns, Kulik, and Kulik
(1987) found a correlation of -0.44 between task difficulty (control group error
rate) and benefit of feedback (effect size comparing feedback group mean with
control group mean). Where error rates are high, the task of learning from the
feedback apparently becomes daunting.
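The kind of study-level relationship reported here (a negative correlation between control-group error rate and the benefit of feedback) can be illustrated with a Pearson correlation. The data points below are invented purely for illustration and do not reproduce the Bangert-Drowns et al. sample:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical study-level data: control-group error rates vs.
# effect sizes for the benefit of feedback in each study.
error_rates = [0.10, 0.25, 0.40, 0.55, 0.70]
effect_sizes = [0.50, 0.55, 0.20, 0.15, -0.05]
print(round(pearson_r(error_rates, effect_sizes), 2))  # negative, as in the review
```

A negative value indicates that studies with more difficult tasks (higher control-group error rates) tended to show smaller benefits from feedback, which is the pattern the -0.44 correlation summarizes.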
A second exception occurred if the feedback was available too soon (as in some
programmed textbooks), thus allowing the student to avoid careful reading and
answering of the questions. This exception has also been confirmed by Bangert-Drowns et al. (1987), who found in their sample that where students had to make
responses to questions before receiving feedback, the average effect size was 0.38,
but where feedback was available without a student response, the average effect
size was -0.13.
Research on feedback on learning from classroom teaching has produced similar
findings on the effectiveness of feedback (Beeson, 1973; Bergland, 1969; Ingenkamp, 1986; Karraker, 1967; O'Neill, Razor, & Bartz, 1976; Page, 1958; Sassenrath
& Garverick, 1965; Strang & Rust, 1973; Wexley & Thornton, 1972).
Functions and form of feedback. Kulhavy reported that feedback acts to confirm

correct answers, thus helping students to "know what they know." There is very little evidence that such knowledge of correct responses acts by reinforcing the correct response, and indeed feedback on correct responses has little effect on subsequent performance, except perhaps in the special case where the student has grave doubts about the correctness of the initial answer.
The major benefit from feedback reported by Kulhavy is the identification of errors of knowledge and understanding, and assistance with correcting those errors. In most studies, such feedback clearly improved subsequent performance on similar questions. Feedback on incorrect responses has been shown to be most effective where the initial response was made with high confidence, probably because the student attends more to the feedback in such cases (due to the element of surprise and the initial desire to defend the correctness of the response).
It seems likely that the most effective form of feedbackwill depend on the
correctnessof the answer,the student'sdegreeof confidencein the answer,and the
natureof the task. If the answeris correct,simpleconfirmationof its correctnessis
sufficient.If the questionwas factualand the answeris incorrect,the most efficient
form of feedbackis probablysimplyto give the correctanswer(Phye, 1979).If the
questioninvolvescomprehensionor highercognitiveskills,however,moredetailed
feedbackis desirable.Studentswho answeredsuch questionsincorrectlywith high
confidencemay need help to identifythe sourceof their misunderstanding
(Block
& Anderson, 1975; Fredericksen,1984b), whereas students who answeredthe
question incorrectlywith low confidencemay need to be given conceptualhelp
and advisedto restudythe material.
Thereis little supportfromlaboratoryor classroomresearchfor makingpraisea
prominentpartof feedback,but Page(1958) foundthat simplepositivecomments
werebeneficial,and harshcriticismis predictablycounterproductive.
Both the age
and achievementlevel of the studentmay modifythis conclusion:youngerand less
able studentsmay benefitmost from praise.Praiseshould be reservedfor specific
achievementsthat truly representsubstantialaccomplishmentsfor the individual
student. The motivationaleffects of differenttypes of feedbackare discussedin
more detailin latersectionsof this paper.
Feedbackcan also play a very positive role in guidingstudentsin their use of
learningstrategies(Pressley,Levin, & Ghatala, 1984). Pressleyet al. found that
explicit feedbackon strategyuse was especially valuable with young children,
whereasadults who had tried severalstrategiesand been tested on their learning
weregenerallyable to identifythe most effectivestrategy.
The timingof feedback.Effectsof the timing of feedbackhave receivedconsiderableattention.Kulik and Kulik (1988) used meta-analytictechniquesto review
53 studies of the timing of feedback in verbal learning.They identified three
differentcategoriesof study,findingquite differentresultsfor the threecategories.
A key factorthat apparentlyinfluencedthese differenceswas whetheror not the
criteriontest questions were identical to the earlier feedbackquestions. Where
differentquestionswereused, most studiesfound a small advantagefor immediate
feedback(the mean effect size for 11 studieswas 0.28). Whereidenticalquestions
wereused(e.g.,Kulhavy& Anderson,1972),however,most studiesfounda modest
advantagefor delayed feedback(the mean effect size for 14 studies was -0.36).
Kulhavyand Anderson(1972) suggestedthat this effectarosebecausethe memory
of incorrectresponsesmade duringacquisitioninterferedwith the learningof the
correct responses from the immediate feedback. Such interference could be expected to decrease with delayed feedback, which would essentially serve as a second learning trial, providing distributed practice on the task.
In most classroom situations, where the tasks leading to feedback form only a sample of the desired course outcomes, these data suggest that immediate feedback will be more beneficial than delayed feedback. Because the typical effect sizes were not large, however, the precise timing of feedback does not appear to be too critical, unless it is delayed so long that students have little motivation to pay close attention and learn from it.
Are feedback and summative evaluation compatible? A final issue to be addressed here is whether the feedback and summative purposes of student evaluation are best separated. Strong arguments for such separation have been presented by McPartland (1987), Miller (1976), Sadler (1983), and Slavin (1978), among others. They argue that where evaluations count significantly toward the student's final grade, the student tends to pay less attention to the feedback, and thus to learn less from it. This effect should be reduced if students are given multiple opportunities to test and prove their achievement, with only the final evaluation counting toward their grade, as is generally the case in mastery learning procedures. Of course, one argument for counting more evaluations in grading is to improve the reliability of the grading process, but this consideration will often be less important than the benefits of evaluation for learning.
Effects of mastery testing. Kulik and Kulik (1987) conducted a meta-analysis of studies of testing in mastery learning programs, analyzing data from 49 studies. Each study took place in real classrooms, provided results for both a class taught with a mastery testing requirement and a class taught without such a requirement, and was judged free of serious experimental bias. The studies varied in length from 1 to 32 weeks, with about half shorter than 10 weeks. Effect sizes were again used to describe the findings.
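The effect size statistic used throughout these meta-analyses is the standardized mean difference: the treatment-control difference in examination means divided by a pooled standard deviation. A minimal sketch, with hypothetical class results rather than data from any of the studies cited:

```python
# Sketch of the standardized mean difference ("effect size") used in these
# meta-analyses: treatment mean minus control mean, divided by a pooled SD.
# The example numbers are hypothetical, not taken from Kulik and Kulik (1987).
import math

def effect_size(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Cohen's d using the pooled within-group standard deviation."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

# Hypothetical final-exam results: mastery-testing class vs. conventional class.
d = effect_size(mean_t=78.0, mean_c=72.0, sd_t=11.0, sd_c=11.0, n_t=30, n_c=30)
print(round(d, 2))  # → 0.55
```

On this scale, the mean effect sizes discussed below (0.54, 0.82, 0.47, and so on) are differences between group means expressed in standard deviation units.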
The mean effect size on summative, end-of-course examination performance was 0.54, a strong effect. Kulik and Kulik note that this mean effect size is substantially lower than the figure of 0.82 reported recently by Guskey and Gates (1986) in another review of studies on mastery learning. They rightly point out, however, that 9 of the 25 studies used by Guskey and Gates calculated effect sizes using combined scores from the instructional quizzes (which the mastery groups had multiple opportunities to pass) and the final examination, thus biasing the results in favor of the mastery group. The mean effect size for the 16 studies which avoided this bias was 0.47, a figure much more consistent with the Kuliks' findings.
Effect sizes varied markedly in relation to three features of the studies. Studies that had the same frequency of testing in both groups had a mean effect size of 0.48, whereas studies in which the test frequency was not controlled (usually higher in the mastery testing group) had a mean effect size of 0.65. This difference was not statistically significant, but it is worthy of note that the extra benefit for more frequent testing is similar to the 0.25 reported in the earlier section on frequency of testing.
A statistically significant difference was found between effect sizes from studies in which similar levels and types of feedback were given to students in both groups, and those from studies in which this was not the case (in these cases, the mastery testing groups could be expected to have received more feedback). The two mean
effect sizes were 0.36 and 0.67, suggesting that a major component of the effectiveness of mastery testing arises from the additional feedback that it usually provides.
The other statistically significant difference was between studies at varying levels of mastery criterion. In 17 studies where the criterion level for mastery was a score of 91% or higher on unit tests, the mean effect size was 0.73; in 15 studies with a criterion level of 81 to 90%, the mean effect size was 0.51; and in 17 studies with a criterion level below 81% the mean effect size was 0.38. This is a strong effect, demonstrating that under mastery testing conditions a higher criterion level generally produces greater learning (assessed on an end-of-course examination).
Thus the results of research on mastery testing suggest that the sizeable benefits observed largely represent the combined effects of the benefits described in earlier sections from more frequent testing, from giving detailed feedback on their progress on a regular basis, and from setting high but attainable standards. One further effect that is probably important is the benefit of allowing repeated opportunities to attain the standards set. This feature might have considerable benefits in increasing motivation and a sense of self-efficacy, while reducing the anxiety often associated with one-shot testing (Friedman, 1987). Kulik and Kulik (1987) reach a similar conclusion to Abbott and Falstrom (1977): the other features often included in courses based on mastery learning models do not appear to add significantly to the effects described above.
As in the section on frequency of testing, some caution must be expressed about the generalizability of the findings on mastery testing because the cognitive levels of the tests and examinations were not analyzed. Different effects may occur for courses and tests that heavily emphasize higher cognitive level outcomes, especially in relation to the benefits of more frequent testing. The benefits of feedback, of opportunities for extra attempts at tasks initially handled poorly, and of challenging standards seem more likely to apply to evaluation tasks at all cognitive levels.
Effects of competitive, individualistic, and cooperative learning structures. Many studies have examined the effects of different classroom learning and goal structures on students. In particular, considerable attention has been given to the effects and comparative merits of competitive, individualistic, and cooperative learning structures. In competitive structures, the success or failure of students is largely determined by their performance relative to other students. In individualistic structures, students are rewarded on the basis of their own work, independent of the work of other students. In cooperative structures, students work together in groups, and judgments of success are based on the overall achievements of each group. Ames (1984) has classified these situations according to the pattern of interdependence among students. Competitive structures involve negative interdependence because success for one student reduces the chances that other students will succeed. In individualistic structures, there is no interdependence among students. Finally, in cooperative structures, there is positive interdependence among students, since success for one student assists the success of all members of that student's group.
Effects on cognitive outcomes. Johnson, Maruyama, Johnson, Nelson, and Skon (1981) conducted a meta-analysis of 122 studies that examined the comparative effects on student achievement of two or more of these categories (for their purposes, they identified four categories, subdividing the cooperative structure category
into two subcategories: cooperation with intergroup competition, and cooperation without intergroup competition). They used three different ways of summarizing the findings (vote count, a z-score method, and effect size). Because these three approaches usually produced similar conclusions, I shall base my summary of their findings on the effect size data. They found that competitive and individualistic structures seemed equally effective, with a mean effect size between these structures of 0.03. Cooperative structures (without intergroup competition) generally produced higher achievement than competitive or individualistic structures (both mean effect sizes were 0.78). Structures that involved cooperation within groups but competition between groups also led to higher average achievement than competitive or individualistic structures (mean effect sizes of 0.37 and 0.50, respectively).
Johnson et al. (1981) also conducted regression analyses to examine the influence on these effect sizes of some 20 possible mediating or moderating variables, although small sample sizes restricted the usefulness of many of the findings. There was some evidence that the benefits of cooperative structures were greater when group sizes were small (2 or 3), when the task required more interdependence among group members (e.g., a group product was to be generated), and when the task was not a simple exercise (see Johnson, Maruyama, & Johnson, 1982, for the clearest data on these issues). Overall, Johnson et al. (1981) concluded that cooperative structures are generally superior to competitive or individualistic structures in promoting student achievement.
This conclusion was criticized by Cotton and Cook (1982) and McGlynn (1982), with a response from Johnson et al. (1982). The heart of the criticism involved concern that no such general statement could be made, given the reported interactions of the effect sizes with other variables, and further probable interactions with other variables that were not studied. Johnson et al. (1982) effectively refuted some of the more specific criticisms, but agreed that there probably are learning situations in which cooperative structures are not as effective as competitive or individualistic structures. They noted, however, that such situations appear to be much less common than those in which cooperative structures are superior.
The effects on achievement of cooperative learning structures have been further analyzed by Slavin (1983b, 1984), who focused on the value of cooperative incentives. Cooperative incentives are incentives in which the rewards for individuals are based on performance of the group as a whole (either through a group product or through the aggregated performances of the individual group members). Slavin contrasted three incentive situations for students who have been asked to work on tasks in groups: group reward for the individual performance of group members, group reward for a group product, and individual reward for performance tested individually after the group activities were completed. He reported strong evidence (based on 28 studies) that the use of group reward based on the individual performance of group members was an effective strategy for enhancing the mean achievement of the group, hypothesizing that this incentive structure encouraged group members to be concerned about improving the learning of all group members. Slavin reported that studies of the use of group reward on the basis of a group product did not demonstrate any clear superiority of cooperative learning over noncooperative approaches. Slavin gave some emphasis to this finding in his conclusions, causing some controversy because it was based on only eight studies
and should thus probably be regarded as tentative. The 10 studies in which individual rewards were given based on individual performance showed no advantage for cooperative study over noncooperative approaches.
Slavin concluded that the use of group rewards based on the individual performance of group members is essential to the effectiveness of cooperative learning methods. Such a strong conclusion may not be justified on the basis of the data he reported, but this incentive structure does appear to be beneficial to group learning (see also Lew, Mesch, Johnson, & Johnson, 1986).
Effects on social outcomes. One widely cited benefit of cooperative learning structures is that they lead to increased cohesiveness among the students involved (Johnson, Johnson, & Maruyama, 1983; Slavin, 1983a). This can be especially beneficial in classes that are diverse in ethnic composition, ability level, or because of the inclusion of mainstreamed handicapped students. Johnson et al. (1983) conducted a meta-analysis of 98 studies of cooperative learning, with interpersonal attraction as the dependent variable. They found little difference between competitive and individualistic structures, but students in cooperative structures scored substantially higher in mean interpersonal attraction. Where the cooperative groups were not competitive with each other, the effect size was 1.11 (compared both to competitive and to individualistic structures). Where there was competition between groups, the mean effect size was smaller (0.79 compared to individualistic structures, 0.55 compared to competitive structures). Clearly, structures that encourage cooperation among students can have substantial beneficial effects on social relationships among students.
Astin (1987) discussed the benefits of cooperative learning in higher education. Among other things, he emphasized that a key benefit could be an enhanced sense of mutual trust, both among students and between students and teacher. He noted that in competitive learning situations, students often work very hard to disguise their ignorance (from peers and from their teacher). This limits the availability and effectiveness of feedback, thus undermining learning. Astin sees cooperative structures helping to overcome this problem, while fostering interpersonal skills that are greatly needed in the community.
Motivational Aspects Relating to Classroom Evaluation
Research has repeatedly demonstrated that the responses of individual students to educational experiences and tasks are complex functions of their abilities and personalities, their past educational experiences, their current attitudes, self-perceptions and motivational states, together with the nature of the current experiences and tasks. Effective education requires the fusing of "skill and will" (Paris, 1988; Paris & Cross, 1983), and intrinsic interest and continuing motivation to learn are educational outcomes that should be regarded as at least as important as cognitive outcomes (Maehr, 1976; Paris, 1988). The importance of motivational factors has been vigorously stated by Howe (1987):
I have a strong feeling that motivational factors are crucial whenever a person
achieves anything of significance as a result of learning and thought, and I cannot
think of exceptions to this statement. That is not to claim that a high level of
motivation can ever be a sufficient condition for human achievements, but it is
undoubtedly a necessary one. And, conversely, negative motivational influences,
such as fear of failure, feelings of helplessness, lack of confidence, and having the
experience that one's fate is largely controlled by external factors rather than by oneself, almost certainly have effects that restrict a person's learned achievements. (p. 142)
Modern theories of achievement motivation (Dweck & Elliott, 1983; Eccles,
1983; Nicholls, 1984; Weiner, 1986) place considerable stress on the importance of student self-perceptions in determining responses to educational and evaluative tasks. Thus, for instance, the attributions (reasons) students give for their success or failure, or their perceptions of self-efficacy (capability to perform well) are highly important factors influencing their behavior. To a significant degree these variables are task- or domain-specific, so that it is more profitable to think about them in this way than as enduring general characteristics.
One important factor that should be taken into account in considering the relationship between motivational variables and achievement is the repeated finding or suggestion of curvilinear relationships (Eccles, 1983; McKeachie et al., 1986). Both very high and very low levels on motivational variables may be less desirable than intermediate levels. For instance, if perceived task importance is very low, many students may not try very hard (as reported by Natriello & Dornbusch, 1984). If perceived task importance is very high, however, anxiety may inhibit performance (Tobias, 1985). Similarly, if students have a very low level of self-efficacy for a task, they are unlikely to attack the task with much enthusiasm or persistence, but if they have a very high level of self-efficacy, they may not give the task sufficient care and attention to achieve good results (Schunk, 1984).
The following subsections briefly review five interrelated areas of research on student motivation and affect. In each area, the classroom evaluation of students appears to play a major role. Although motivational considerations are emphasized, effects on cognitive outcomes are also discussed where appropriate.
Test anxiety. The research on test anxiety has been reviewed by Hill (1984), Hill and Wigfield (1984), McKeachie (1984), McKeachie et al. (1986), Sarason (1980), and Tobias (1985). Studies have repeatedly shown substantial negative correlations between measures of test anxiety collected before tests are administered and performance on those tests. The magnitudes of the correlations appear to increase at higher grade levels, with one study finding a correlation as strong as -.60 for 11th-grade students (see Hill, 1984, p. 248). The debilitating effects for high anxiety students are greater when the student perceives good performance on the test to be particularly important, when the test is expected to be difficult, and when the testing conditions are particularly intrusive (e.g., rigid time limits and associated time pressures, special test instructions or conditions, unfamiliar test formats). Thus the effects tend to be greater on standardized tests than classroom tests.
Although failures on earlier tasks clearly influence the development of anxiety, the anxiety does not simply arise from lack of the knowledge or skills required to answer the test items. Several studies have shown that high anxiety students do much better on the same cognitive tasks administered under less stressful conditions, performing at levels much closer to those of their less anxious peers (Hill, 1984; Hill & Wigfield, 1984).
A number of different mechanisms have been suggested to explain the debilitating effects of anxiety on achievement (McKeachie et al., 1986; Tobias, 1985). These have included suggestions that high anxiety students may be weak in their use of
cognitive and metacognitive learning strategies, that they may use poor test-taking strategies, or that they may be particularly prone to distracting thoughts while taking a test (such as thoughts about failure or about difficult items yet to be completed). These proposed mechanisms are clearly not mutually exclusive. The first (weak learning strategies) would not explain the findings reported in the last paragraph, so it is not a sufficient explanation by itself. However, it does have empirical support (Naveh-Benjamin, McKeachie, & Lin, 1987). The other two mechanisms are more specific to the testing situation, and both have empirical support.
Several guidelines have been suggested for reducing the debilitating effects of test anxiety in classroom evaluation programs (Hill, 1984; Hill & Wigfield, 1984). These include: testing under "power" testing conditions (very generous time limits, so no student feels under significant time pressure); avoiding distinctive and stressful testing conditions; giving the students ample details of the nature, difficulty, and format of the test (with practice examples); setting tasks that allow each student a reasonable level of success; reducing emphasis on social comparison (Hill & Wigfield suggest avoiding the use of letter grades in elementary schools); and providing special training for students who may be victims of test anxiety.
Student self-efficacy. Self-efficacy, as defined by Bandura (1977, 1982), refers to students' perceptions of their capability to perform certain tasks or domains of tasks. Research on the role of self-efficacy in achievement behavior and classroom learning has been reviewed by Schunk (1984, 1985). Perceptions of self-efficacy in an area have been shown to correlate highly with achievement in that area. For instance, in a recent study by Thomas, Iventosch, and Rohwer (1987), self-efficacy was found to be a better predictor of school achievement than their selected measure of academic ability. They also found that students with high self-efficacy tended to make more use of deeper learning strategies (generative and selective activities) than other students did.
Perceptions of self-efficacy appear to have a strong influence on effort and persistence with difficult tasks, or after experiences of failure (Bandura, 1982; Schunk, 1984, 1985). Under such circumstances, students high in self-efficacy usually redouble their efforts, whereas students low in self-efficacy tend to make minimal efforts or avoid such tasks.
The main mechanism for building self-efficacy in a particular domain appears to be experiencing repeated success on tasks in that domain. Success at tasks perceived as difficult or challenging is more influential than success on easier tasks. On the other hand, of course, repeated failure leads to lowered self-efficacy. More than 40 years ago, E. L. Thorndike began a paper with these words:
It is a matter of common knowledge that a mind which for any reason becomes
engaged in an activity and finds itself repeatedly and persistently failing therein, is
impelled to intermit or abandon it. The person does abandon it unless this
impulsion is counterbalanced by some contrary force, such as the hope of a turn
of the tide toward success, or an inner sense of worth from maintaining the activity,
or a fear that worse will befall him if he stops. (Thorndike & Woodyard, 1934, p.
241).

To foster self-efficacy, evaluations of task performance should emphasize performance (task mastery) rather than task engagement (Schunk, 1984). Thus, for
instance, grade credit should be given for quality of work on an assignment, not merely for handing it in (Schunk, 1984). Also, the emphasis in performance feedback should be on informing students about their progress in mastery, rather than on social comparison (Schunk, 1985). This is crucial for the less able students, who might otherwise receive little positive feedback. Finally, there is strong evidence that self-efficacy is best enhanced if longer term goals are supported by a carefully sequenced series of subgoals with clear criteria that students find attainable. This is especially important if the students are young, or if they initially lack confidence or interest in the domain (Bandura & Schunk, 1981). These requirements are met by mastery learning procedures, if well implemented (Driscoll, 1986), but can also be incorporated in other approaches to teaching and learning. One concern is that teaching and evaluation arrangements be sufficiently flexible to ensure suitably challenging tasks for the most capable students, as otherwise they would have little opportunity to build their perceptions of self-efficacy (and much opportunity for boredom).
Intrinsic motivation and continuing motivation. Intrinsic motivation to learn (defined as a self-sustaining desire to learn) and continuing motivation (defined by Maehr, 1976, as a tendency to return to and continue working on tasks away from the instructional context in which they were initially confronted) are highly related concepts. Both, in turn, are closely related to interest in the material that is being studied. Few would disagree that such interest is a very desirable outcome of educational activity, and also a very important factor influencing the quality and extent of learning activities. Alfred North Whitehead, in his characteristically forthright way, went so far as to suggest that "there can be no mental development without interest. Interest is the sine qua non for attention and apprehension" (Whitehead, 1929, p. 48). Maehr (1976) argues that continuing motivation is also important because learning does not just take place in classrooms. Activities that students engage in by choice outside the classroom can complement and strengthen classroom-based learning, and can also lead to that learning being extended and updated long after the formal classroom program ends.
The research on intrinsic and continuing motivation has been reviewed in recent years by Corno and Mandinach (1983), Corno and Rohrkemper (1985), deCharms (1976), Deci (1975), Deci and Ryan (1985), Harter (1985), Maehr (1976), McCombs (1984), and Ryan, Connell, and Deci (1985), among others. Corno and her colleagues have argued that intrinsic motivation and self-regulated learning are closely linked, presenting evidence that self-regulated learning experiences foster intrinsic motivation, and that intrinsic motivation in turn encourages students to be more independent as learners. There is general agreement among the other reviewers that allowing a degree of student autonomy in choice of learning activities and objectives is a key factor in fostering intrinsic motivation. In considering the problem of passive reading failure, Johnston and Winograd (1985) drew on the work of deCharms (1983) and others to suggest that more opportunity might be given in school for students to engage in recreational reading. They point out that this would both encourage and make use of intrinsic motivation, and that it would also reduce the likelihood of normative social comparisons, because any evaluation of this activity would of necessity have to be individualized.
There is also widespread agreement among the reviewers that the use of extrinsic motivation is problematic. The problems can be illustrated by briefly examining
the findings of three studies. Lepper, Greene, and Nisbett (1973) found that students who had previously chosen to engage in an activity voluntarily, with apparent enjoyment, were less inclined to return to that activity after they had received a reward from a teacher for engaging in the activity. Maehr and Stallings (1972) studied students performing easy or hard tasks under extrinsic or intrinsic motivation conditions. They found that students who worked under the intrinsic motivation condition continued to be interested in working on difficult tasks, whereas students who worked under the extrinsic motivation condition lost interest in attempting difficult tasks, preferring to attempt only easy ones (see also Hughes, Sullivan, & Mosley, 1985). Finally, Condry and Chambers (1978) found that students in their extrinsic motivation group were more answer oriented, trying to take shortcuts to produce the desired answers, whereas students in the intrinsic motivation group tended to use deeper, more meaningful approaches to understanding the tasks.
These and other studies have repeatedly shown that where students are initially intrinsically motivated, attempting to stimulate learning through extrinsic motivation usually leads to decreased intrinsic motivation, especially on challenging tasks. Such a result is clearly not desirable. On the other hand, where students initially lack intrinsic motivation in a particular subject area, research reported in the last section suggests that a carefully planned program of positive educational experiences accompanied by extrinsic motivation can lead to the development of interest in the area, and thus to intrinsic motivation. Unfortunately, however, there is strong evidence that in most education systems such gains are usually outweighed by the losses. Many observers have commented on the contrast between the broad enthusiasm for learning demonstrated by most children in the first year or two of schooling and the jaded approach of many older students. Although some of this difference may relate to developmental factors, it is hard to escape the conclusion that for many students schooling tends to lower rather than increase interest in learning.
It is important to note that classroom evaluation procedures need not have the debilitating effects on intrinsic motivation noted above. Deci (1975) and others (Keller, 1983; Ryan, Connell, & Deci, 1985) have noted that the key factor seems to be whether students perceive the primary goal of the evaluation to be controlling their behavior or providing informative and helpful feedback on their progress in learning. Evaluation can be used as a bludgeon to make students learn, and in the short term this may produce significant learning, but the longer term consequences of such an approach appear to be most undesirable, especially for the less able students.
Attributions for success and failure. Extensive research has demonstrated that student self-perceptions of the factors influencing success or failure in learning tasks have a very significant influence on their motivation and behavior. Such attributions for success or failure are central to Weiner's theory of achievement motivation (Weiner, 1979, 1985, 1986), and many other researchers on motivation have also stressed their importance. Research on student attributions has been reviewed by Covington (1984, 1985), Dweck and Elliott (1983), Nicholls (1983, 1984), Paris and Cross (1983), and Weiner (1985, 1986), among others.
Weiner (1979) stated that success or failure could be attributed to four possible causes: ability, effort, luck, or task difficulty. The first two of these are internal to
the student, the latter two are external.Weiner also identifiedemotional consequenceswhen successor failureis attributedto these causes.For instance,he stated
that successwhich is attributedto ability or effort leads to pride and self-esteem,
that failureattributedto lack of effortleads to guilt, and that failureattributedto
stablefactors(lack of abilityor task difficultythat is consistentlytoo high) leadsto
hopelessness.
Nicholls (1984) and Dweck and Bempechat (1983) reviewed evidence that students do not share a single conception of ability. Up to about 10 years of age, students generally conceive of ability as learning through effort, so that gains in task mastery are indicative of enhanced ability. Many older children and adults, however, conceive of ability as a stable trait that is judged normatively (i.e., by comparing performances of different individuals). In this conception, normatively superior performance is indicative of ability, especially if it requires comparatively little effort.
These two conceptions of ability differentially affect the achievement behavior of students (Covington, 1984, 1985; Nicholls, 1984). Students with the task mastery concept of ability like challenging tasks that appear reasonably likely to yield success after considerable effort. Such tasks can give them a sense of achievement and thus enhance their perceived ability. Among students with the normative concept of ability, those who believe they are of high ability tend to prefer tasks that they perceive as of medium difficulty, and thus are likely to confirm their ability by again distinguishing them from students of lower ability. On the other hand, those who believe they are of low ability try to avoid tasks of medium difficulty, because these are likely to confirm their low ability by requiring substantial effort yet carrying a substantial risk of failure. Such students prefer either easy or difficult tasks because these are less likely to demonstrate their lack of ability. Students who perceive themselves as having insufficient ability to do well on most assigned classroom tasks tend to display helplessness (see Dweck & Elliott, 1983). This means that they do not expend much effort to learn because they expect their efforts to result in failure anyway, and thus to reemphasize their low ability.
Several researchers (Ames, 1984; deCharms, 1983; Maehr, 1983; Nicholls, 1983) have identified two or more categories of achievement goals. They all make a distinction between task goals and ego goals, which parallel the two conceptions of ability discussed above. With task goals, students believe they are responsible for the outcome, that there are reasonably clear mastery criteria for success, and that the outcome is not preordained. Task goals are often associated with intrinsic motivation. With ego goals, on the other hand, the key feature is that success requires doing better than someone else.
Task characteristics interact with students' personal conceptions of ability to determine whether students treat particular tasks as task goals or ego goals. For instance, many computer games are extremely challenging for the players, yet less competent players often display high levels of motivation and persistence because the task characteristics favor task goals and reduce the salience of ego goals. By contrast, many other tasks give much less criterion-referenced feedback, and under such conditions success is more likely to be judged normatively.
This research seems to have clear implications for classroom teaching and evaluation. If all students are to be encouraged to learn, conditions that favor task goals over ego goals are desirable. These conditions include challenging but attainable tasks, some individualization of tasks, use of tasks that are more intrinsically motivating or more gamelike in nature, opportunities for student autonomy in learning, little use of ability groups, use of cooperative learning approaches, provision of unambiguous performance feedback that emphasizes mastery and progress (rather than normative comparisons), and little emphasis on summative grading (Covington, 1985; Johnston & Winograd, 1985; Maehr, 1983; Nicholls, 1983; Rosenholtz & Simpson, 1984). Under such conditions, failure at a task is more likely to be constructive rather than destructive (Clifford, 1984). If such conditions could be fostered, perceived ability stratification would be reduced, with consequent reductions in the serious differential changes of self-esteem that occur from about the age of 10 (Kifer, 1977).
Motivational aspects of competitive, individualistic, and cooperative learning structures. Research on motivational aspects of competitive, individualistic, and cooperative task and incentive structures has been reviewed by Ames (1984), Johnson and Johnson (1985), and Slavin (1987). The motivational effects of competitive structures have been discussed in earlier sections, but will be briefly summarized here. Social comparison (norm referencing) is central to competitive structures. This tends to result in severe discouragement for the students who have few academic successes in competition with their peers. It discourages students from helping each other with their academic work, and also threatens peer relationships, encouraging an "us and them" mentality which tends to segregate the higher and lower achieving students (Deutsch, 1979). It does not encourage intrinsic motivation. Finally, it tends to encourage students to attribute success and failure to ability rather than to effort, which is especially harmful for the weaker students.
In individualistic structures, rewards are based on criterion-referenced evaluation. If all students are evaluated on the same tasks, using the same standards, this can simply become another type of competitive structure (Ames, 1984), but at least there is some possibility of all students meeting specified passing standards. The provision of repeated opportunities to meet the standards can be a key factor in reducing the competitiveness of such individualistic structures. If, on the other hand, students' programs of work are more individualized, and the emphasis in evaluation is placed on each student's progress in learning, competitiveness is minimized. Under these circumstances, students are more inclined to help each other, and success and failure on a task are more likely to be attributed to effort rather than to ability. This, in turn, generates conditions that support intrinsic motivation.
Cooperative structures encourage helping and within-group tutoring behaviors, especially when group rewards are based on the performance of all the individual group members. Webb (1985, 1988) has identified the giving or receiving of elaborated explanations as a key factor in student learning within groups, so conditions that favor such activities are desirable. Participation in cooperative learning tends to moderate the positive or negative influence of a student's own high or low performance, tempering both negative and positive self-perceptions resulting from performance, and reducing performance anxiety (Ames, 1984). This can help build both self-esteem and achievement for previously low-achieving students, especially if their group is successful reasonably consistently. Effort attributions are encouraged, partly because the different groups are usually comparable in their mix of abilities. Finally, Ames (1984) and Johnson and Johnson (1985) presented evidence that learning in a cooperative group is more enjoyable for most students than learning individually, and that this tends to enhance intrinsic motivation for learning.
Conclusions and Recommendations for Educational Practice
This review began with a caution about the dangers of overgeneralization in educational research. In stating the following conclusions and recommendations, therefore, I must stress that they are not likely to apply in all situations or with all students. Instead, they represent simplifications that appear likely to benefit the greatest proportion of students, and in particular to provide more favorable learning conditions for the weaker students. Many of the specific points draw support from several of the areas of research reviewed earlier, thus increasing the confidence which I have in them.
Importance of evaluation. Classroom evaluation affects students in many different ways. For instance, it guides their judgment of what is important to learn, affects their motivation and self-perceptions of competence, structures their approaches to and timing of personal study (e.g., spaced practice), consolidates learning, and affects the development of enduring learning strategies and skills. It appears to be one of the most potent forces influencing education. Accordingly, it deserves very careful planning and considerable investment of time from educators. Many of the skills and attitudes that are goals of education take years to develop, and their development can be undermined by lack of consistent support for them in the educational experiences of the students (see Howe, 1987; Meyers, 1986).
Classroom evaluation currently appears to receive less thought than most other aspects of education. Its power to affect students is not widely perceived or discussed. A more professional approach to evaluation would demand regular and thoughtful analysis by teachers of their personal evaluation practices, greater use of peer review procedures, and considerable attention to the establishment of more consistent progressions of expectations and criteria within and among educational institutions.
Importance of deep learning. All too often, classroom evaluation places heavy emphasis on the recall or recognition of comparatively isolated pieces of information to which the students have earlier been exposed. This encourages surface (memorizing) approaches to learning. Many of these details have at best only temporary relevance to the students, either because the area studied does not relate to their later activities or interests, or because the details are superseded by new information or developments. Further, it has been repeatedly demonstrated that isolated details are especially readily forgotten, and that information is remembered better and is more useable if students learn it within a broader framework of meaningful interrelationships and understanding. Finally, the knowledge that students accumulate during schooling may be less important than the learning skills and habits they develop, which can help them grow and adapt to new needs and experiences throughout their lifetime. This is increasingly true as modern technology makes factual information available very flexibly and quickly (Rothkopf, 1988, p. 279).

For all these reasons, there is a need to make deep learning a central goal of education, and to foster development of this goal through the evaluation of students (see also Bloom, 1986; Bok, 1986; Cronbach, 1988; Lowell, 1926; Whitehead, 1929). This requires that we place emphasis on understanding, transfer of learning to untaught problems or situations, and other thinking skills, evaluating the development of these skills through tasks that clearly must involve more than recognition or recall.
These skills take time to develop, however, and are particularly difficult for some students (Lohman, 1986; Thomas, Iventosch, & Rohwer, 1987), so it is important that they be given steadily increasing emphasis from the earliest years of schooling. By the time students are in the upper grade levels or in college, there is a good case for arguing that factual knowledge should be subsumed under higher level objectives, so that students are expected to use factual knowledge in solving a problem or carrying out a process, but are not tested directly on their ability to recall the information.
Evaluation to assist learning. Too much emphasis has been placed on the grading function of evaluation, and too little on its role in assisting students to learn. The integral role of evaluation in teaching and learning needs to be grasped, and its certification function placed in proper perspective. It is hard to see any justification before the final year or so of high school for placing much emphasis on using classroom evaluation for normative grading of student achievement, given the evidence reviewed here that normative grading (with the social comparison and interstudent competition that accompany it) produces undesirable consequences for most students.
These undesirable effects include reduction of intrinsic motivation, debilitating evaluation anxiety, ability attributions for success and failure that undermine student effort, lowered self-efficacy for learning in the weaker students, reduced use and effectiveness of feedback to improve learning, and poorer social relationships among the students. Grading on a fixed curve is especially inappropriate because it emphasizes particularly strongly a comparative approach to grading. Strong emphasis on the grading function of evaluation has also led to overuse of features normally associated with standardized testing, such as very formal testing conditions, speeded tests with strict time limits, a restricted range of item types, and emphasis on the overall score rather than what can be learned about strengths and weaknesses. These may be appropriate in psychological testing, but are rarely appropriate in educational testing (Wood, 1986).
Much of the evaluation activity in education might more profitably be directed solely to giving useful feedback to students, whereas the less frequent evaluations for summative purposes should focus on describing what students can or can't do (i.e., should be criterion referenced). The likely small reduction in reliability associated with counting fewer evaluations in the summative evaluation would be a modest penalty to pay for the benefits described above and the improved validity associated with greater emphasis on final competence (rather than on the mistakes made along the way).
Effective feedback. There are several ways in which the effectiveness of feedback could be enhanced. First, feedback is most effective if it focuses students' attention on their progress in mastering educational tasks. Such emphasis on personal progress enhances self-efficacy, encourages effort attributions, and reduces attention to social comparison. The approach that leads to the most valuable feedback is nicely captured by Easley and Zwoyer (1975):

If you can both listen to children and accept their answers not as things to just be judged right or wrong but as pieces of information which may reveal what the child is thinking you will have taken a giant step toward becoming a master teacher rather than merely a disseminator of information. (p. 25)
Second, feedback should take place while it is still clearly relevant. This usually implies that it should be provided soon after a task is completed, and that the student should be given opportunities subsequently to demonstrate learning from the feedback. One of the strengths of mastery learning approaches is the emphasis on feedback and subsequent opportunities to correct deficiencies without penalty for the earlier failure.
Third, feedback should be specific and related to need. Simple knowledge of results should be provided consistently (directly or implicitly), with more detailed feedback only where necessary to help the student work through misconceptions or other weaknesses in performance. Praise should be used sparingly and where used should be task specific, whereas criticism (other than simply identifying deficiencies) is usually counterproductive.
Benefits of cooperation. Cooperative learning approaches can be effective in facilitating student learning and motivation and in developing good interpersonal skills and relationships. They are particularly appropriate for more complex tasks where the different perspectives and skills of group members can complement each other.
Approaches that encourage active engagement of all individuals and that stimulate helping behaviors within groups are most desirable. Groups may work together on a group product, but it is also desirable to include some evaluation of the learning of the individual members in the overall evaluation of the achievements of the group.
One of the benefits of cooperative learning is likely to be enhanced development of valuable peer and self-evaluation skills (see Boyd & Cowan, 1985; Johnston & Winograd, 1985), because there is an incentive for groups to monitor their own progress. When normative grading is de-emphasized, cooperative learning is predictably easier to establish.
Setting standards. Research has repeatedly demonstrated that students achieve most and gain most on key motivational variables when evaluation standards are high but attainable. In many teaching situations this is not possible if all students are working simultaneously on the same tasks and trying to meet the same standards. Under such circumstances, some students will probably not be challenged, whereas others may find the standards unattainable (see Bennett, 1988, p. 26).
To optimize learning outcomes, several alternative approaches are possible. Standards and/or tasks may be set for individual students, or considerable flexibility in learning pathways provided (e.g., mastery learning approaches), or cooperative learning may be used to reduce pressure on individuals and compensate for individual strengths and weaknesses. Weaker students may benefit from identification of more attainable intermediate goals, thus making possible the pattern of repeated successes that leads to improved self-efficacy. Requirements and criteria should be made very clear before an important task is attempted (Anderson & Armbruster, 1984; Natriello, 1987), to avoid misdirected effort and increased evaluation anxiety.
Frequency of evaluation. Students should be given regular opportunities to
practice and use the skills and knowledge that are the goals of the program, and to
obtain feedback on their performance. Such evaluation fosters active learning,
consolidation of learning, and if appropriately arranged can also provide the
retention benefits associated with spaced practice. Much of this evaluation can be
quite informal, however, and certainly does not need to be conducted under testlike conditions. For higher level outcomes, in particular, it seems likely that too
much formal evaluation may be as bad as too little because conceptual understanding and skills do not develop overnight.
Selection of evaluation tasks. The nature and format of evaluation tasks should
be selected to suit the goals that are being assessed. In most courses this will lead
to substantial variety in tasks, with benefits in versatility of approach and development of transfer skills (Elton, 1982). If it is not inconsistent with program
objectives, students could be given some choice of tasks to be attempted. This
stimulates and takes advantage of intrinsic motivation, and helps provide suitable
challenges for all students.
What is evaluated. The most vital of all the messages emerging from this review
is that as educators we must ensure that we give appropriate emphasis in our
evaluations to the skills, knowledge, and attitudes that we perceive to be most
important. Some of these important outcomes may be hard to evaluate, but it is
important that we find ways to assess them. Cross (1987) sums up this point very
clearly:
It serves no useful purpose to lower our educational aspirations because we cannot
yet measure what we think is important to teach. Quite the contrary, measurement
and assessment will have to rise to the challenge of our educational aspirations. (p.
6)

Concluding remarks. This review has taken a multidimensional look at the impact of classroom evaluation on students. Although the research reviewed is
diverse both in focus and in perspective, considerable convergence emerges in the
implications of the research findings for effective use of classroom evaluation. My
hope is that this review will "help people use their heads" (Cronbach, 1975) in
thinking about classroom evaluation, thus enhancing the professionalism and the
effectiveness of this important component of teaching and learning.
References
Abbott, R. D., & Falstrom, P. (1977). Frequent testing and personalized systems of instruction. Contemporary Educational Psychology, 2, 251-257.
Ames, C. (1984). Competitive, cooperative, and individualistic goal structures: A cognitive-motivational analysis. In R. E. Ames & C. Ames (Eds.), Research on motivation in education: Vol. 1. Student motivation. New York: Academic Press.
Anderson, R. C., & Biddle, W. B. (1975). On asking people questions about what they are reading. In G. Bower (Ed.), Psychology of learning and motivation (Vol. 9, pp. 89-132). New York: Academic Press.
Anderson, T. H., & Armbruster, B. B. (1984). Studying. In P. D. Pearson (Ed.), Handbook of reading research. New York: Longman.
Astin, A. W. (1987). Competition or cooperation? Change, 19(5), 12-19.
Ball, D. W., et al. (1986). Level of teacher objectives and their classroom tests: Match or mismatch. Journal of Social Studies Research, 10(2), 27-31.

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191-215.
Bandura, A. (1982). Self-efficacy mechanism in human agency. American Psychologist, 37, 122-147.
Bandura, A., & Schunk, D. H. (1981). Cultivating competence, self-efficacy, and intrinsic interest through proximal self-motivation. Journal of Personality and Social Psychology, 41, 586-598.
Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C. C. (1987, April). The impact of peekability on feedback effects. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC.
Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C. C. (1988). Effects of frequent classroom testing. Unpublished manuscript, University of Michigan.
Becker, H. S., Geer, B., & Hughes, E. C. (1968). Making the grade: The academic side of college life. New York: Wiley.
Beeson, R. O. (1973). Immediate knowledge of results and test performance. Journal of Educational Research, 66, 224-226.
Bennett, N. (1988). The effective primary-school teacher - The search for a theory of pedagogy. Teaching and Teacher Education, 4, 19-30.
Bergland, G. W. (1969). The effect of knowledge of results on retention. Psychology in the Schools, 6, 420-421.
Biggs, J. B. (1973). Study behaviour and performance in objective and essay formats. Australian Journal of Education, 17, 157-167.
Biggs, J. B. (1978). Individual and group differences in study processes. British Journal of Educational Psychology, 48, 266-279.
Bjork, R. A. (1979). Information-processing analysis of college teaching. Educational Psychologist, 14, 15-23.
Black, P. J. (1968). University examinations. Physics Education, 3, 93-101.
Block, J. H., & Anderson, L. W. (1975). Mastery learning in classroom instruction. New York: Macmillan.
Bloom, B. S. (Ed.). (1956). A taxonomy of educational objectives: Handbook I, the cognitive domain. New York: Longman.
Bloom, B. S. (1986). Ralph Tyler's impact on evaluation theory and practice. Journal of Thought, 21, 36-46.
Bok, D. (1986). Toward higher learning. Change, 18(6), 18-27.
Boniface, D. (1985). Candidates' use of notes and textbooks during an open-book examination. Educational Research, 27, 201-209.
Boyd, H., & Cowan, J. (1985). A case for self-assessment based on recent studies of student learning. Assessment and Evaluation in Higher Education, 10, 225-235.
Brophy, J., & Good, T. L. (1986). Teacher behavior and student achievement. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 328-374). New York: Macmillan.
Broudy,H. S. (1988). The uses of schooling.New York:Routledge.
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). In P. H. Mussen (Ed.), Handbook of child psychology (Vol. 3, pp. 77-166). New York: Wiley.
Buckwalter, J. A., Schumacher, R., Albright, J. P., & Cooper, R. R. (1981). Use of an educational taxonomy for evaluation of cognitive performance. Journal of Medical Education, 56, 115-121.
Carrier, C. A., & Fautsch-Partridge, T. (1981). Levels of questions: A framework for the exploration of processing activities. Contemporary Educational Psychology, 6, 365-382.
Clifford, M. M. (1984). Thoughts on a theory of constructive failure. Educational Psychologist, 19, 108-120.
Cole, N. S. (1986). Future directions for educational achievement and ability testing. In B. S. Plake & J. C. Witt (Eds.), Buros-Nebraska symposium on measurement and testing: Vol. 2. The future of testing. Hillsdale, NJ: Erlbaum.
Condry, J. C., & Chambers, J. (1978). Intrinsic motivation and the process of learning. In M. R. Lepper & D. Greene (Eds.), The hidden costs of reward: New perspectives on the psychology of human motivation. Hillsdale, NJ: Erlbaum.
Corno, L., & Mandinach, E. B. (1983). The role of cognitive engagement in classroom learning and motivation. Educational Psychologist, 18, 88-108.
Corno, L., & Rohrkemper, M. M. (1985). The intrinsic motivation to learn in classrooms. In C. Ames & R. Ames (Eds.), Research on motivation in education: Vol. 2. The classroom milieu. New York: Academic Press.
Cotton, J. L., & Cook, M. S. (1982). Meta-analysis and the effects of various reward systems: Some different conclusions from Johnson et al. Psychological Bulletin, 92, 176-183.
Covington, M. V. (1984). The motive for self-worth. In R. E. Ames & C. Ames (Eds.), Research on motivation in education: Vol. 1. Student motivation. New York: Academic Press.
Covington, M. V. (1985). Strategic thinking and the fear of failure. In J. W. Segal, S. F. Chipman, & R. Glaser (Eds.), Thinking and learning skills: Vol. 1. Relating instruction to research. Hillsdale, NJ: Erlbaum.
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30, 116-127.
Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Erlbaum.

Crooks, T. J. (1982, March). Generalization in educational research: Through a glass darkly. Paper presented at the annual meeting of the American Educational Research Association, New York. (ERIC Document Reproduction Service No. ED 220 498)
Crooks, T. J., & Collins, E. A. G. (1986). What do first year university examinations assess? New Zealand Journal of Educational Studies, 21, 123-132.
Crooks, T. J., & Mahalski, P. A. (1986). Relationships among assessment practices, study methods, and grades obtained. In J. Jones & M. Horsburgh (Eds.), Research and development in higher education: Vol. 8. Sydney, Australia: Higher Education Research and Development Society of Australasia.
Cross, K. P. (1987). Teaching for learning. AAHE Bulletin, 39(8), 3-7.
Dahlgren, L. O. (1978). Students' conceptions of subject matter: An aspect of learning and teaching in higher education. Studies in Higher Education, 3, 25-35.
deCharms, R. (1976). Enhancing motivation: Change in the classroom. New York: Irvington.
deCharms, R. (1983). Intrinsic motivation, peer tutoring, and cooperative learning. In J. M. Levine & M. C. Wang (Eds.), Teacher and student perceptions: Implications for learning (pp. 391-398). Hillsdale, NJ: Erlbaum.
Deci, E. L. (1975). Intrinsic motivation and self-determination in human behavior. New York: Irvington.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum.
Dempster, F. N. (1987). Time and the production of classroom learning: Discerning implications from basic research. Educational Psychologist, 22, 1-21.
Deutsch, M. (1979). Education and distributive justice: Some reflections on grading systems. American Psychologist, 34, 391-401.
Dillon, J. T. (1982). Cognitive correspondence between question/statement and response. American Educational Research Journal, 19, 540-551.
DiSibio, M. (1982). Memory for connected discourse: A constructivist view. Review of Educational Research, 52, 149-174.
Dorr-Bremme, D. W., & Herman, J. (1986). Assessing school achievement: A profile of classroom practices. Los Angeles: Center for the Study of Evaluation, UCLA Graduate School of Education.

Doyle, W. (1983). Academic work. Review of Educational Research, 53, 159-199.
Doyle, W. (1986). Classroom organization and management. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 392-431). New York: Macmillan.
Driscoll, M. P. (1986). The relationship between grading standards and achievement: A new perspective. Journal of Research and Development in Education, 19(3), 13-17.
Duchastel, P. C. (1979a). Adjunct question effects and experimental constraints. Occasional Paper 1, American College, Bryn Mawr, PA. (ERIC Document Reproduction Service No. ED 216 312)
Duchastel, P. C. (1979b). Retention of prose materials: The effect of testing. Journal of Educational Research, 72, 299-300.
Duchastel, P. C. (1981). Retention of prose following testing with different types of test. Contemporary Educational Psychology, 6, 217-226.
Duchastel, P. C. (1983). Interpreting adjunct question research: Processes and ecological validity. Human Learning, 2, 1-5.
Duchastel, P. C., & Nungester, R. J. (1982). Testing effects measured with alternate test forms. Journal of Educational Research, 75, 309-313.
Dweck, C. S., & Bempechat, J. (1983). Children's theories of intelligence: Consequences for learning. In S. G. Paris, G. M. Olson, & H. W. Stevenson (Eds.), Learning and motivation in the classroom. Hillsdale, NJ: Erlbaum.
Dweck, C. S., & Elliott, E. S. (1983). Achievement motivation. In P. H. Mussen (Ed.), Handbook of child psychology (Vol. 4, pp. 643-691). New York: Holt, Rinehart and Winston.
d'Ydewalle, G., Swerts, A., & De Corte, E. (1983). Study time and test performance as a function of test expectations. Contemporary Educational Psychology, 8, 55-67.
Easley, J. A., & Zwoyer, R. E. (1975). Teaching by listening - Toward a new day in math classes. Contemporary Education, 47, 19-25.
Ebel, R. L. (1982). Proposed solution to two problems of test construction. Journal of Educational Measurement, 19, 267-278.
Eccles, J. (1983). Expectancies, values and academic behavior. In J. T. Spence (Ed.), Academic and achievement motives. San Francisco: Freeman.
Elton, L. R. B. (1982). Assessment for learning. In D. Bligh (Ed.), Professionalism and flexibility for learning. Guildford, Surrey, England: Society for Research into Higher Education.
Elton, L. R. B., & Laurillard, D. M. (1979). Trends in research on student learning. Studies in Higher Education, 4, 87-102.
Entwistle, N. J., & Kozeki, B. (1985). Relationships between school motivation, approaches to studying, and attainment, among British and Hungarian adolescents. British Journal of Educational Psychology, 55, 124-137.
Entwistle, N. J., & Ramsden, P. (1983). Understanding student learning. London: Croom Helm.
Ericksen, S. C. (1983). Private measures of good teaching. Teaching of Psychology, 10, 133-136.

Fennessy, D. (1982, July). Primary teachers' assessment practices: Some implications for teacher training. Paper presented at the 12th annual conference of the South Pacific Association for Teacher Education, Frankston, Victoria, Australia. (ERIC Document Reproduction Service No. ED 229 346)
Fleming, M., & Chambers, B. (1983). Teacher-made tests: Windows on the classroom. In W. E. Hathaway (Ed.), New directions for testing and measurement: Vol. 19. Testing in the schools. San Francisco: Jossey-Bass.
Foos, P. W., & Fisher, R. P. (1988). Using tests as learning opportunities. Journal of Educational Psychology, 80, 179-183.
Ford, N. (1981). Recent approaches to the study and teaching of "effective learning" in higher education. Review of Educational Research, 51, 345-377.
Francis, J. (1982). A case for open-book examinations. Educational Review, 34, 13-26.
Fredericksen, N. (1984a). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39, 193-202.
Fredericksen, N. (1984b). Implications of cognitive theory for instruction in problem solving. Review of Educational Research, 54, 363-407.


Friedman, H. (1987). Repeat examinations in introductory statistics. Teaching of Psychology,
14, 20-23.
Gagne, R. M. (1977). The conditions of learning (3rd ed.). New York: Holt, Rinehart and
Winston.
Gagne, R. M., Briggs, L. J., & Wager, W. W. (1988). Principles of instructional design. New
York: Holt, Rinehart and Winston.
Gall, M. (1984). Synthesis of research on teachers' questioning. Educational Leadership,
42(3), 40-47.
Gay, L. R. (1980). The comparative effects of multiple-choice versus short-answer tests on retention. Journal of Educational Measurement, 17, 45-50.


Glaser, R. (1985, November). The integration of instruction and testing. Paper presented at
the E.T.S. Invitational Conference on the Redesign of Testing for the 21st Century, New
York.
Goslin, D. A. (1967). Teachers and testing (2nd ed.). New York: Russell Sage Foundation.
Gullickson, A. R. (1984). Teacher perspectives of their instructional use of tests. Journal of Educational Research, 77, 244-248.
Gullickson, A. R. (1985). Student evaluation techniques and their relationship to grade and curriculum. Journal of Educational Research, 79, 96-100.
Gullickson, A. R., & Ellwein, M. C. (1985). Post hoc analysis of teacher-made tests: The goodness-of-fit between prescription and practice. Educational Measurement: Issues and Practice, 4(1), 15-18.
Guskey, T. R., & Gates, S. L. (1986). Synthesis of research on the effects of mastery learning
in elementary and secondary classrooms. Educational Leadership, 43(8), 73-80.
Guza, D. S., & McLaughlin, T. F. (1987). A comparison of daily and weekly testing on
student spelling performance. Journal of Educational Research, 80, 373-376.
Haertel, E. (1985). Construct validity and criterion-referenced testing. Review of Educational Research, 55, 23-46.
Haertel, E. (1986, April). Choosing and using classroom tests: Teachers' perspectives on assessment. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Hakstian, R. (1971). The effects of type of examination anticipated on test preparation and performance. Journal of Educational Research, 64, 319-324.
Hamaker, C. (1986). The effects of adjunct questions on prose learning. Review of Educational Research, 56, 212-242.
Hamilton, R. J. (1985). A framework for the evaluation of the effectiveness of adjunct questions and objectives. Review of Educational Research, 55, 47-85.
Harter, S. (1985). Competence as a dimension of self-evaluation: Toward a comprehensive
model of self-worth. In R. Leahy (Ed.), The development of the self. New York: Academic
Press.
Hill, K. T. (1984). Debilitating motivation and testing: A major educational problem – possible solutions and policy applications. In R. E. Ames & C. Ames (Eds.), Research on motivation in education: Vol. 1. Student motivation. New York: Academic Press.
Hill, K. T., & Wigfield, A. (1984). Test anxiety: A major educational problem and what can be done about it. Elementary School Journal, 85, 105-126.
Howe, M. J. A. (1987). Using cognitive psychology to help students learn how to learn. In J.
T. E. Richardson, M. W. Eysenck, & D. W. Piper (Eds.), Student learning: Research in
education and cognitive psychology. Milton Keynes, England: Open University Press &
Society for Research into Higher Education.

Hughes, B., Sullivan, H. J., & Mosley, M. L. (1985). External evaluation, task difficulty, and continuing motivation. Journal of Educational Research, 78, 210-215.
Hunkins, F. P. (1969). Effects of analysis and evaluation questions on various levels of achievement. Journal of Experimental Education, 38(2), 45-58.
Ingenkamp, K. (1986). The possible effects of various reporting methods on learning outcomes. Studies in Educational Evaluation, 12, 341-350.
Johnson, D. W., & Johnson, R. T. (1985). Motivational processes in cooperative, competitive, and individualistic learning situations. In C. Ames & R. Ames (Eds.), Research on motivation in education: Vol. 2. The classroom milieu. New York: Academic Press.
Johnson, D. W., Johnson, R. T., & Maruyama, G. (1983). Interdependence and interpersonal attraction among heterogeneous and homogeneous individuals: A theoretical formulation and a meta-analysis of the research. Review of Educational Research, 53, 5-54.
Johnson, D. W., Maruyama, G., & Johnson, R. T. (1982). Separating ideology from currently available data: A reply to Cotton and Cook and McGlynn. Psychological Bulletin, 92, 186-192.
Johnson, D. W., Maruyama, G., Johnson, R., Nelson, D., & Skon, L. (1981). Effects of cooperative, competitive, and individualistic goal structures on achievement: A meta-analysis. Psychological Bulletin, 89, 47-62.
Johnston, P. H., & Winograd, P. N. (1985). Passive failure in reading. Journal of Reading Behavior, 17, 279-301.
Jones, H. E. (1923). Experimental studies of college teaching: The effect of examination on permanence of learning. Archives of Psychology, 10, 1-70.
Karraker, R. J. (1967). Knowledge of results and incorrect recall of plausible multiple-choice alternatives. Journal of Educational Psychology, 58, 11-14.
Kellaghan, T., Madaus, G. F., & Airasian, P. W. (1982). The effects of standardized testing. Boston: Kluwer-Nijhoff.
Keller, J. M. (1983). Motivational design of instruction. In C. M. Reigeluth (Ed.), Instructional design theories and models (pp. 383-434). Hillsdale, NJ: Erlbaum.
Keys, N. (1934). The influence on learning and retention of weekly as opposed to monthly tests. Journal of Educational Psychology, 25, 427-436.
Kifer, E. (1977). The impact of success and failure on the learner. Evaluation in Education: International Progress, 1, 281-359.
Kirkland, M. C. (1971). The effects of tests on students and schools. Review of Educational Research, 41, 303-350.
Klinzing, G., Klinzing-Eurich, G., & Tisher, R. P. (1985). Higher cognitive behaviours in classroom discourse: Congruencies between teachers' questions and pupils' responses. Australian Journal of Education, 29, 63-75.
Kulhavy, R. W. (1977). Feedback in written instruction. Review of Educational Research, 47, 211-232.
Kulhavy, R. W., & Anderson, R. C. (1972). Delay-retention effect with multiple-choice tests. Journal of Educational Psychology, 63, 505-512.
Kulhavy, R. W., Dyer, J. W., & Silver, L. (1975). The effects of note-taking and test expectancy on the learning of text material. Journal of Educational Research, 68, 363-365.
Kulik, C-L. C., & Kulik, J. A. (1987). Mastery testing and student learning: A meta-analysis. Journal of Educational Technology Systems, 15, 325-345.
Kulik, J. A., & Kulik, C-L. C. (1988). Timing of feedback and verbal learning. Review of Educational Research, 58, 79-97.
Kumar, V. K., Rabinsky, L., & Pandey, T. N. (1979). Test mode, test instructions, and retention. Contemporary Educational Psychology, 4, 211-218.
LaPorte, R. E., & Voss, J. F. (1975). Retention of prose materials as a function of post-acquisition testing. Journal of Educational Psychology, 67, 259-266.
Laurillard, D. (1979). The processes of student learning. Higher Education, 8, 395-409.
Laurillard, D. (1984). Learning from problem-solving. In F. Marton, D. J. Hounsell, & N. J. Entwistle (Eds.), The experience of learning. Edinburgh: Scottish Academic Press.
Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children's intrinsic interest with extrinsic rewards: A test of the "overjustification" hypothesis. Journal of Personality and Social Psychology, 28, 129-137.
Levin, J. R. (1982). Pictures as prose-learning devices. In A. Flammer & W. Kintsch (Eds.), Advances in psychology: Vol. 8. Discourse processing. Amsterdam: North-Holland.
Lew, M., Mesch, D., Johnson, D. W., & Johnson, R. (1986). Positive interdependence, academic and collaborative-skills group contingencies, and isolated students. American Educational Research Journal, 23, 476-488.
Linn, R. L. (1983). Testing and instruction: Links and distinctions. Journal of Educational Measurement, 20, 179-189.
Lohman, D. F. (1986). Predicting mathemathantic effects in the teaching of higher-order thinking skills. Educational Psychologist, 21, 191-208.
Lowell, A. L. (1926). The art of examination. Atlantic Monthly, 137, 58-66.
Madaus, G. F., & Airasian, P. W. (1977). Issues in evaluating student outcomes in competency-based graduation programs. Journal of Research and Development in Education, 10, 79-91.
Madaus, G. F., & McDonagh, J. T. (1979). Minimum competency testing: Unexamined assumptions and unexplored negative outcomes. In R. T. Lennon (Ed.), Impactive changes in measurement (New Directions for Testing and Measurement, No. 3). San Francisco: Jossey-Bass.
Maehr, M. L. (1976). Continuing motivation: An analysis of a seldom considered educational outcome. Review of Educational Research, 46, 443-462.
Maehr, M. L. (1983). Doing well in science: Why Johnny no longer excels; why Sarah never did. In S. G. Paris, G. M. Olson, & H. W. Stevenson (Eds.), Learning and motivation in the classroom. Hillsdale, NJ: Erlbaum.
Maehr, M. L., & Stallings, W. M. (1972). Freedom from external evaluation. Child Development, 43, 177-185.
Martin, E., & Ramsden, P. (1987). Learning skills and skill in learning. In J. T. E. Richardson, M. W. Eysenck, & D. W. Piper (Eds.), Student learning: Research in education and cognitive psychology. Milton Keynes, England: Open University Press & Society for Research into Higher Education.
Marton, F., Hounsell, D. J., & Entwistle, N. J. (Eds.). (1984). The experience of learning. Edinburgh: Scottish Academic Press.
Marton, F., & Saljo, R. (1976a). On qualitative differences in learning: 1. Outcome and process. British Journal of Educational Psychology, 46, 4-11.
Marton, F., & Saljo, R. (1976b). On qualitative differences in learning: 2. Outcome as a function of the learner's conception of the task. British Journal of Educational Psychology, 46, 115-127.
Mathews, J. (1980). The uses of objective tests (Teaching in Higher Education Series, No. 9). England: Lancaster University. (ERIC Document Reproduction Service No. ED 230 106)
Mayer, R. E. (1975). Forward transfer of different reading strategies evoked by testlike events in mathematics text. Journal of Educational Psychology, 67, 165-169.
McCombs, B. L. (1984). Processes and skills underlying continuing intrinsic motivation to learn: Toward a definition of motivational skills training interventions. Educational Psychologist, 19, 199-218.
McGlynn, R. P. (1982). A comment on the meta-analysis of goal structures. Psychological Bulletin, 92, 184-185.
McKeachie, W. J. (1974). The decline and fall of the laws of learning. Educational Researcher, 3, 7-11.
McKeachie, W. J. (1984). Does anxiety disrupt information processing or does poor information processing lead to anxiety? International Review of Applied Psychology, 33, 187-203.
McKeachie, W. J., Pintrich, P. R., Lin, Y., & Smith, D. A. F. (1986). Teaching and learning in the college classroom: A review of the research literature. Ann Arbor, MI: National Center for Research to Improve Postsecondary Teaching and Learning.
McPartland, J. M. (1987, April). Changing testing and grading practices to improve student motivation and teacher-student relationships: Designs for research to evaluate new ideas for departmental exams and progress grades. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC.
Medley, D. M. (1979). The effectiveness of teachers. In P. L. Peterson & H. J. Walberg (Eds.), Research on teaching. Berkeley, CA: McCutchan.
Messick, S. (1984a). The psychology of educational measurement. Journal of Educational Measurement, 21, 215-237.
Messick, S. (1984b). Abilities and knowledge in educational achievement testing: The assessment of dynamic cognitive structures. In B. S. Plake (Ed.), Buros-Nebraska symposium on measurement and testing: Vol. 1. Social and technical issues in testing: Implications for test construction and usage. Hillsdale, NJ: Erlbaum.
Meyer, G. (1934). An experimental study of the old and new types of examination: 1. The effect of the examination set on memory. Journal of Educational Psychology, 25, 641-661.
Meyer, G. (1935). An experimental study of the old and new types of examination: 2. Methods of study. Journal of Educational Psychology, 26, 30-40.
Meyers, C. (1986). Teaching students to think critically. San Francisco: Jossey-Bass.
Miller, C. M. L., & Parlett, M. (1974). Up to the mark: A study of the examination game. London: Society for Research into Higher Education.
Miller, G. E. (1976). Continuous assessment. Medical Education, 10, 81-86.
Miller, G. E. (1978). 'Teaching and learning in medical school' revisited. Medical Education, 12, Supplement, 120-125.
Mills, S. R., Rice, C. T., Berliner, D. C., & Rousseau, E. W. (1980). The correspondence between teacher questions and student answers in classroom discourse. Journal of Experimental Education, 48, 194-204.
Milton, O. (1982). Will that be on the final? Springfield, IL: Charles C. Thomas.
Natriello, G. (1987). The impact of evaluation processes on students. Educational Psychologist, 22, 155-175.

Natriello, G., & Dornbusch, S. M. (1984). Teacher evaluative standards and student effort. New York: Longman.

Naveh-Benjamin, M., McKeachie, W. J., & Lin, Y-G. (1987). Two types of test-anxious students: Support for an information processing model. Journal of Educational Psychology, 79, 131-136.
Newble, D. I., & Jaeger, K. (1983). The effect of assessments and examinations on the learning of medical students. Medical Education, 17, 25-31.
Nicholls, J. G. (1983). Conceptions of ability and achievement motivation: A theory and its implications for education. In S. G. Paris, G. M. Olson, & H. W. Stevenson (Eds.), Learning and motivation in the classroom. Hillsdale, NJ: Erlbaum.
Nicholls, J. G. (1984). Achievement motivation: Conceptions of ability, subjective experience, task choice, and performance. Psychological Review, 91, 328-346.
Nungester, R. J., & Duchastel, P. C. (1982). Testing versus review: Effects on retention. Journal of Educational Psychology, 74, 18-22.
O'Neill, M., Razor, R. A., & Bartz, W. R. (1976). Immediate retention of objective test answers as a function of feedback complexity. Journal of Educational Research, 70, 72-75.
Page, E. B. (1958). Teacher comments and student performance: A seventy-four classroom experiment in school motivation. Journal of Educational Psychology, 49, 173-181.
Paris, S. G. (1988, April). Fusing skill and will in children's learning and schooling. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Paris, S. G., & Cross, D. R. (1983). Ordinary learning: Pragmatic connections among children's beliefs, motives, and actions. In J. Bisanz, G. Bisanz, & R. Kail (Eds.), Learning in children (pp. 137-169). New York: Springer-Verlag.
Perry, W. G. (1970). Forms of intellectual and ethical development in the college years: A scheme. New York: Holt, Rinehart and Winston.
Phye, G. D. (1979). The processing of informative feedback about multiple-choice test performance. Contemporary Educational Psychology, 4, 381-394.
Pressley, M., Levin, J. R., & Ghatala, E. S. (1984). Memory strategy monitoring in adults and children. Journal of Verbal Learning and Verbal Behavior, 23, 270-288.
Quellmalz, E. S. (1985). Needed: Better methods for testing higher-order thinking skills. Educational Leadership, 43(2), 29-35.
Ramsden, P. (1984). The context of learning. In F. Marton, D. J. Hounsell, & N. J. Entwistle (Eds.), The experience of learning. Edinburgh: Scottish Academic Press.
Ramsden, P. (1985). Student learning research: Retrospect and prospect. Higher Education Research and Development, 4, 51-69.
Ramsden, P., Beswick, D., & Bowden, J. (1987). Learning processes and learning skills. In J. T. E. Richardson, M. W. Eysenck, & D. W. Piper (Eds.), Student learning: Research in education and cognitive psychology. Milton Keynes, England: Open University Press & Society for Research into Higher Education.
Ramsden, P., & Entwistle, N. J. (1981). Effects of academic departments on students' approaches to studying. British Journal of Educational Psychology, 51, 368-383.
Redfield, D. L., & Rousseau, E. W. (1981). A meta-analysis of experimental research on teacher questioning behavior. Review of Educational Research, 51, 237-245.
Rickards, J. P., & Friedman, F. (1978). The encoding versus the external storage hypothesis in notetaking. Contemporary Educational Psychology, 3, 136-143.
Rinchuse, D. J., & Zullo, J. (1986). The cognitive level demands of a dental school's predoctoral didactic examinations. Journal of Dental Education, 50, 167-171.
Rogers, E. M. (1969). Examinations: Powerful agents for good or ill in teaching. American Journal of Physics, 37, 954-962.
Rohm, R. A., Sparzo,F. J., & Bennett,C. M. (1986). College studentperformanceunder
repeatedtesting and cumulativetesting conditions:Reports on five studies. Journal of
EducationalResearch,80, 99-104.
Rohwer, W. D., & Thomas, J. W. (1987). The role of mnemonic strategiesin study
effectiveness:Theories,individualdifferences,and applications.In M. A. McDaniel& M.
Pressley(Eds.),Imageryand relatedmnemonicprocesses.New York:Springer-Verlag.
Rosenholtz,S. J., & Simpson,C. (1984). Classroomorganizationand studentstratification.
ElementarySchoolJournal,85, 21-37.
Rosenshine,B. (1979). Content, time, and direct instruction.In P. L. Peterson& H. J.
Walberg(Eds.),Researchon teaching.Berkeley,CA:McCutchan.
Rosenshine,B., & Stevens,R. (1986).Teachingfunctions.In M. C. Wittrock(Ed.),Handbook
of researchon teaching(3rd ed., pp. 376-391). New York:Macmillan.
Rosswork,S. G. (1977).Goal setting:The effectsof an academictaskwith varyingmagnitudes
of incentive.Journalof EducationalPsychology,69, 710-715.
Rothkopf, E. Z. (1988). Perspectiveson study skills training in a realistic instructional
economy. In C. E. Weinstein,E. T. Goetz, & P. A. Alexander(Eds.),Learningand study
strategies:Issues in assessment, instruction,and evaluation.San Diego, CA: Academic
Press.
Rowe, D. W. (1986). Does researchsupport the use of "purposequestions"on reading
comprehensiontests?Journalof EducationalMeasurement,23, 43-55.
Rowe, M. B. (1986). Wait time: Slowing down may be a way of speedingup! Journalof
TeacherEducation,37(1), 43-50.
Rudman,H. E., Kelley,J. L., Wanous,D. S., Mehrens,W. A., Clark,C. M., & Porter,A. C.
A review(1922-1980) (ResearchSeriesNo.
(1980). Integratingassessmentwithinstruction:

478

Impactof ClassroomEvaluationon Students


75). East Lansing,MI: MichiganState University,Institutefor Researchon Teaching.
(ERICDocumentReproductionServiceNo. ED 189 136).
Ryan, R. M., Connell, J. P., & Deci, E. L. (1985). A motivational analysis of self-determination and self-regulation in education. In C. Ames & R. Ames (Eds.), Research on motivation in education: Vol. 2. The classroom milieu. New York: Academic Press.
Sadler, D. R. (1983). Evaluation and the improvement of academic learning. Journal of Higher Education, 54, 60-79.
Salmon-Cox, L. (1981). Teachers and standardized achievement tests: What's really happening? Phi Delta Kappan, 62, 631-634.
Samson, G. E., Strykowski, B., Weinstein, T., & Walberg, H. J. (1987). The effects of teacher questioning levels on student achievement: A quantitative synthesis. Journal of Educational Research, 80, 290-295.
Sarason, I. G. (1980). Test anxiety: Theory and applications. Hillsdale, NJ: Erlbaum.
Sassenrath, J. M., & Garverick, C. M. (1965). Effects of differential feedback from examinations on retention and transfer. Journal of Educational Psychology, 56, 259-263.
Sax, G., & Collet, L. S. (1968). An empirical comparison of the effects of recall and multiple-choice tests on student achievement. Journal of Educational Measurement, 5, 169-173.
Schmeck, R. R. (1983). Learning styles of college students. In R. Dillon & R. R. Schmeck (Eds.), Individual differences in cognition. New York: Academic Press.
Schmeck, R. R. (1988). Individual differences and learning strategies. In C. E. Weinstein, E. T. Goetz, & P. A. Alexander (Eds.), Learning and study strategies: Issues in assessment, instruction, and evaluation. San Diego, CA: Academic Press.
Schunk, D. (1984). Self-efficacy perspective on achievement behavior. Educational Psychologist, 19, 48-58.
Schunk, D. (1985). Self-efficacy and classroom learning. Psychology in the Schools, 22, 208-223.
Shavelson, R. J., Berliner, D. C., Ravitch, M. M., & Loeding, D. (1974). Effects of position and type of question on learning from prose material: Interaction of treatments with individual differences. Journal of Educational Psychology, 66, 40-48.
Shulman, L. S. (1980). Test design: A view from practice. In E. L. Baker & E. S. Quellmalz (Eds.), Educational testing and evaluation. Beverly Hills, CA: Sage.
Slavin, R. E. (1978). Separating incentives, feedback, and evaluation: Toward a more effective classroom system. Educational Psychologist, 13, 97-100.
Slavin, R. E. (1983a). Cooperative learning. New York: Longman.
Slavin, R. E. (1983b). When does cooperative learning increase student achievement? Psychological Bulletin, 94, 429-445.
Slavin, R. E. (1984). Students motivating students to excel: Cooperative incentives, cooperative tasks, and student achievement. Elementary School Journal, 85, 53-63.
Slavin, R. E. (1987). Developmental and motivational perspectives on cooperative learning: A reconciliation. Child Development, 58, 1161-1167.
Snyder, B. R. (1971). The hidden curriculum. Cambridge, MA: M.I.T. Press.
Spitzer, H. F. (1939). Studies in retention. Journal of Educational Psychology, 30, 641-656.
Stice, J. E. (1987). Learning how to think: Being earnest is important but it's not enough. In J. E. Stice (Ed.), New directions for teaching and learning: Vol. 30. Developing critical thinking and problem-solving abilities. San Francisco: Jossey-Bass.
Stiggins, R. J. (1985). Improving assessment where it means the most: In the classroom. Educational Leadership, 43(2), 69-74.
Stiggins, R. J., & Bridgeford, N. J. (1985). The ecology of classroom assessment. Journal of Educational Measurement, 22, 271-286.
Stiggins, R. J., Conklin, N. F., & Bridgeford, N. J. (1986). Classroom assessment: A key to effective education. Educational Measurement: Issues and Practice, 5(2), 5-17.
Stiggins, R. J., Griswold, M., Green, K. R., & associates (1988, April). Measuring thinking skills through classroom assessment. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans.
Strang, H. R., & Rust, J. O. (1973). The effect of immediate knowledge of results and task definition on multiple-choice answering. Journal of Experimental Education, 42, 77-80.
Svensson, L. (1977). On qualitative differences in learning: 3. Study skill and learning. British Journal of Educational Psychology, 47, 233-243.
Terry, P. W. (1933). How students review for objective and essay tests. Elementary School Journal, 33, 592-603.
Thomas, J. W., Iventosch, L., & Rohwer, W. D. (1987). Relationships among student characteristics, study activities, and achievement as a function of course characteristics. Contemporary Educational Psychology, 12, 344-364.
Thomas, J. W., & Rohwer, W. D. (1986). Academic studying: The role of learning strategies. Educational Psychologist, 21, 19-41.
Thorndike, E. L., & Woodyard, E. (1934). The influence of the relative frequency of success and frustrations upon intellectual achievement. Journal of Educational Psychology, 25, 241-250.
Thorndike, R. L. (1969). Helping teachers use tests. NCME Measurement in Education, 1(1), 1-4.
Tobias, S. (1985). Test anxiety: Interference, defective skills, and cognitive capacity. Educational Psychologist, 20, 135-142.
van Rossum, E. J., Deijkers, R., & Hamer, R. (1985). Students' learning conceptions and their interpretation of significant educational concepts. Higher Education, 14, 617-641.
van Rossum, E. J., & Schenk, S. M. (1984). The relationship between learning conception, study strategy, and learning outcome. British Journal of Educational Psychology, 54, 73-83.
Watkins, D. (1984). Students' perceptions of factors influencing tertiary learning. Higher Education Research and Development, 3, 33-50.
Watts, G., & Anderson, R. C. (1971). Effects of three types of inserted questions on learning from prose. Journal of Educational Psychology, 62, 387-394.
Webb, N. M. (1985). Student interaction and learning in small groups: A research summary. In R. E. Slavin, S. Sharan, S. Kagan, R. Hertz-Lazarowitz, C. Webb, & R. Schmuck (Eds.), Learning to cooperate, cooperating to learn (pp. 147-172). New York: Plenum.

Webb, N. M. (1988, April). Small group problem-solving: Peer interaction and learning. Invited address at the annual meeting of the American Educational Research Association, New Orleans, LA.
Weiner, B. (1979). A theory of motivation for some classroom experiences. Journal of Educational Psychology, 71, 3-25.
Weiner, B. (1985). An attributional theory of achievement motivation and emotion. Psychological Review, 92, 548-573.
Weiner, B. (1986). An attributional theory of motivation and emotion. New York: Springer-Verlag.
Wexley, K. N., & Thornton, C. L. (1972). Effect of verbal feedback of test results on learning. Journal of Educational Research, 66, 119-121.
Whitehead, A. N. (1929). The aims of education. New York: Macmillan.
Wilson, J. D. (1981). Student learning in higher education. London: Croom Helm.
Winne, P. H. (1979). Experiments relating teachers' use of higher cognitive level questions to student achievement. Review of Educational Research, 49, 13-50.
Wittrock, M. C. (1974). Learning as a generative process. Educational Psychologist, 11, 87-95.
Wittrock, M. C. (1979). The cognitive movement in instruction. Educational Psychologist, 13, 15-29.
Wittrock, M. C. (1986). Students' thought processes. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 297-314). New York: Macmillan.

Wood, R. (1986). The agenda for educational measurement. In D. L. Nuttall (Ed.), Assessing
educational achievement. London: Falmer Press.

Author
TERENCE J. CROOKS, Senior Lecturer, Director, Higher Education Development Centre,
University of Otago, P.O. Box 56, Dunedin, New Zealand. Specializations: improvement
of tertiary education, research design, measurement and evaluation.
