
‘Testing a test’ – Evaluating our Assessment Tools
Eddy White, Ph.D.
British Columbia Institute of Technology

Presented at the 2011 BC TEAL Annual Conference, May 7
Vancouver, BC, Canada
The goal of assessment is to . . .

The goal of assessment has to be, above all, to support the improvement of learning and teaching.
(Fredrickson & Collins, 1989)

Consider
• Research suggests that teachers spend from one-quarter to one-third of their professional time on assessment-related activities.
• Almost all do so without the benefit of having learned the principles of sound assessment.
(Stiggins, 2007)
Targets
1. Tests – purposes/functions
2. Your assessment practices
3. The ‘cardinal criteria’
4. Assessing test tasks
5. Conclusions

• Exploring how principles of language assessment can and should be applied to formal tests.
• These principles apply to assessment of all kinds.
• How to use these principles to design a good test.

Quiz time!

Assessing an English articles quiz
Context
• Conversation class (listening & speaking)
• High-beginner level

What is a fundamental problem with this quiz?

Answer – later

Targets
1. Tests – purposes/functions
2. Your assessment practices
3. The ‘cardinal criteria’
4. Assessing test tasks
5. Conclusions
What is a test?
A test . . .
• is a method of measuring a person’s ability, knowledge, or performance in a given domain.
• is an instrument – a set of techniques, procedures, or items – that requires performance on the part of the test-taker.
Tests – measuring function
A test must measure
• Some tests measure general ability, while others focus on very specific competencies or objectives.
• Examples:
• A multi-skill proficiency test measures general ability;
• a quiz on recognizing correct use of definite articles measures very specific knowledge.
• A test measures performance, . . .
• but the results imply the test-taker’s ability, or competence.
• Performance-based tests sample the test-takers’ actual use of language,
• but from those samples the test administrator infers general competence.
• A well-constructed test is an instrument that provides an accurate measure of a test-taker’s ability within a particular domain.
• Constructing a good test is a complex task.
Targets
1. Tests – purposes/functions
2. Your assessment practices
3. The ‘cardinal criteria’
4. Assessing test tasks
5. Conclusions
Think about what is happening in your context and your assessment practices.
Your assessment practices?
How do you assess your students?
• True–False Item
• Multiple Choice
• Completion
• Short Answer
• Essay
• Practical Exam
• Papers/Reports
• Projects
• Questionnaires
• Presentations
• Inventories
• Checklists
• Peer Rating
• Self Rating
• Journals
• Portfolios
• Observations
• Discussions
• Interviews
For you, which of the four skills are more/less challenging to test?
Targets
1. Tests – purposes/functions
2. Your assessment practices
3. The ‘cardinal criteria’
4. Assessing test tasks
5. Conclusions
Quiz time!
• What are the ‘five cardinal criteria’ that can be used to design and evaluate all types of assessment?
Five key assessment principles
• Practicality
• Reliability
• Validity
• Authenticity
• Washback
Key Assessment Principles
1. Validity – Does the assessment measure what we really want to measure?
2. Reliability – Is all work being consistently marked to the same standard?
3. Practicality – Is the procedure relatively easy to administer?
4. Washback – Does the assessment have positive effects on learning and teaching?
5. Authenticity – Are students asked to perform real-world tasks?
• These questions provide an excellent set of criteria for evaluating the tests we design and use.
1. Practicality
• Is the procedure relatively easy to administer?
An effective test is practical. This means that it:
• is not excessively expensive
• stays within appropriate time constraints
• is relatively easy to administer, and
• has a scoring/evaluation procedure that is specific and time-efficient
The value and quality of a test sometimes hinge on such nitty-gritty practical considerations.
• In classroom-based testing, _________ is almost always a crucial practical factor for busy teachers.
2. Reliability
• Is all work being consistently marked to the same standard?
• A reliable test is consistent and dependable.
• If you give the same test to the same student or matched students on two different occasions, the test should yield similar results.
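The idea that two sittings "should yield similar results" can be made concrete: one common way to estimate test-retest reliability is to correlate the scores from the two administrations. A minimal sketch in Python; the score lists below are invented for illustration, not data from this presentation:

```python
# Test-retest reliability sketch: correlate scores from two
# administrations of the same test to the same students.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

first_sitting = [72, 85, 64, 90, 78, 55]    # hypothetical scores, occasion 1
second_sitting = [70, 88, 61, 93, 80, 58]   # same students, occasion 2

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

An r close to 1.0 suggests the test ranks students consistently across occasions; values well below that point to sources of unreliability.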
What factors contribute to the unreliability of a test?
Test unreliability – contributing factors
• Student-related reliability
• Rater reliability (inter-, intra-)
• Test administration reliability
• Test reliability
Q. What is one key way to increase reliability?
A. Use rubrics
• Rubrics are scoring guidelines.
• They provide a way to make judgments fair and sound when assessing performance.
• A uniform set of precisely defined criteria or guidelines is set forth to judge student work.
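One way to see how a rubric makes judgments uniform is to treat it as a fixed set of criteria on a shared scale. A small sketch with invented criteria and weights (a real rubric would also spell out a descriptor for each score band):

```python
# A rubric as a fixed set of weighted criteria, each rated on the same
# 1-5 bands. Criteria and weights here are hypothetical examples.

RUBRIC = {
    "task fulfilment": 0.4,   # weight of each criterion in the total
    "accuracy": 0.3,
    "fluency": 0.3,
}

def score_with_rubric(ratings):
    """Combine per-criterion ratings (1-5) into one weighted score.

    Raises if a criterion is missing or a rating falls outside the bands,
    so every rater is forced onto the same scale.
    """
    total = 0.0
    for criterion, weight in RUBRIC.items():
        rating = ratings[criterion]   # KeyError if a criterion was skipped
        if not 1 <= rating <= 5:
            raise ValueError(f"{criterion}: rating {rating} outside 1-5 bands")
        total += weight * rating
    return round(total, 2)

print(score_with_rubric({"task fulfilment": 4, "accuracy": 3, "fluency": 5}))
```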
3. Validity
• Does the assessment measure what we really want to measure?
– the most complex criterion
– the most important principle
Validity – definition
• ‘The extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment.’
(Gronlund, 1998, p. 226)
• A valid test of reading ability . . . actually measures reading ability –
• not math skills,
• nor previous knowledge in a subject,
• nor writing skills,
• nor some other variable of questionable relevance.
How is the validity of a test established?
1. Content validity
2. Face validity
Content validity
• If a test requires the test-taker to perform the behavior that is being measured . . .
• it can claim content-related evidence of validity (content validity).
• e.g. A test of a person’s ability to speak an L2 requires the student to actually speak within some sort of authentic context.
• A test with paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity.
Another way of understanding content validity is to consider the difference between direct and indirect testing.
• Direct testing involves the test-taker in actually performing the target task.
• Indirect testing has students performing not the task itself, but a related task – e.g. testing oral production of syllable stress.
To achieve content validity in classroom assessment, try to test performance directly.
How is the validity of a test established?
1. Content validity
2. Face validity
Face validity
• The extent to which students view the assessment as:
1. fair
2. relevant
3. useful for improving learning
• Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure.
High face validity: the test . . .
• is well constructed, with an expected format and familiar tasks
• is clearly doable within the allotted time
• has items that are clear and uncomplicated
• has directions that are crystal clear
• has tasks related to course work (content validity)
• has a difficulty level that presents a reasonable challenge
• Validity is the most significant cardinal principle of assessment evaluation.
• If validity is not established, all other considerations may be rendered useless.
4. Authenticity
• Are students asked to perform real-world tasks?
Test task authenticity
• tasks represent, or closely approximate, real-world tasks
• the task is likely to be enacted in the “real world”
• not contrived or artificial
• The authenticity of test tasks has increased noticeably in recent years.
• Two or three decades ago, unconnected, boring, contrived items were accepted as a necessary component of testing.
5. Washback
• Does the assessment have positive effects on learning and teaching?
Washback = the effect of testing on teaching and learning
• positive washback
• negative washback
Washback
• Classroom assessment: the effects of an assessment on teaching and learning prior to the assessment itself (preparation)
• Another form of washback = the information that ‘washes back’ to students in the form of useful diagnoses of strengths and weaknesses.
• Formal tests provide no washback if students receive a simple letter grade or single overall numerical score.
Teachers’ challenge
• to create classroom tests that serve as learning tools through which washback is achieved
Discussion Question

What challenges do you face with regard to these principles and test creation/use in your context?
Targets
1. Tests – purposes/functions
2. Your assessment practices
3. The ‘cardinal criteria’
4. Assessing test tasks
5. Conclusions
Evaluating assessment scenarios
Fill in the chart with 5-4-3-2-1 scores:
- 5 indicating the principle is highly fulfilled
- 1 indicating very low or no fulfillment
- Use your best intuition.

Key Principles    | Score
Practicality      |
Rater Reliability |
Test reliability  |
Content validity  |
Face validity     |
Authenticity      |

Scenario 1: One-on-one oral interview to assess overall oral production ability. S receives one holistic score ranging between 0 and 5.
Fill in the chart with 5-4-3-2-1 scores:
- 5 indicating the principle is highly fulfilled
- 1 indicating very low or no fulfillment
- Use your best intuition.

Key Principles    | Score
Practicality      |
Rater Reliability |
Test reliability  |
Content validity  |
Face validity     |
Authenticity      |

Scenario 2: S listens to a 15-minute video lecture and takes notes. T makes individual comments on each S’s notes.
Fill in the chart with 5-4-3-2-1 scores:
- 5 indicating the principle is highly fulfilled
- 1 indicating very low or no fulfillment
- Use your best intuition.

Key Principles    | Score
Practicality      |
Rater Reliability |
Test reliability  |
Content validity  |
Face validity     |
Authenticity      |

Scenario 3: S writes a take-home (overnight) one-page essay on an assigned topic. T reads the paper, comments on organization and content only, and returns it to S for a subsequent draft.
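The rating exercise in the scenarios above can be sketched as a short script; the per-principle scores below are invented for illustration, not the presenter's intended answers:

```python
# The scenario-rating chart: each principle gets a 5 (highly fulfilled)
# down to 1 (very low / no fulfillment) score.

PRINCIPLES = [
    "Practicality", "Rater Reliability", "Test reliability",
    "Content validity", "Face validity", "Authenticity",
]

def summarize(ratings):
    """Check every principle received a 1-5 score and report the mean."""
    for p in PRINCIPLES:
        if not 1 <= ratings[p] <= 5:   # KeyError if a principle was skipped
            raise ValueError(f"{p}: score must be 1-5, got {ratings[p]}")
    return sum(ratings.values()) / len(PRINCIPLES)

# Hypothetical chart for Scenario 1 (oral interview, one holistic score):
scenario_1 = {
    "Practicality": 2, "Rater Reliability": 2, "Test reliability": 3,
    "Content validity": 5, "Face validity": 4, "Authenticity": 5,
}
print(f"mean fulfillment: {summarize(scenario_1):.1f}")
```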
Targets
1. Tests – purposes/functions
2. Your assessment practices
3. The ‘cardinal criteria’
4. Assessing test tasks
5. Conclusions
Key test questions to ask
• These principles will help you make accurate judgments about the English competence of your students.
• They provide useful guidelines for evaluating existing tests and designing your own.
“We owe it to ourselves and our students to devote at least as much energy to ensuring that our assessment practices are worthwhile as we do to ensuring that we teach well.”

Dr. David Boud, University of Technology, Sydney, Australia
Thank you for your time and participation.

Best wishes with your assessment practices.

End
