CHAPTER 12
Understanding Test Results in Context
In this chapter we introduce you to basics for constructing tests in your classroom
and background on standardized tests. Then we show you statistical methods for
organizing and, more importantly, understanding the results from both types of tests.
Traditional teacher-made tests, if properly constructed, provide more authentic
information for the teacher, parents, and school. As we saw in Chapter 11, we
can use teacher-made tests to make informed decisions on various levels about
www.ablongman.com/jordan1e
the performance of students and the success of lessons. For standardized tests,
your role as a teacher is determined by the type of test being given and the rationale behind its administration. Your role may consist of administering the test to
your individual class, using an answer sheet to grade it, analyzing the results
based on the materials supplied to you, making curricular decisions based on
school results, and/or relaying results to students or parents.
For the results of teacher-made tests to be meaningful, we have to first make
sure we design the best test possible. Next, we need to organize the results in a
way that ensures we are getting the most accurate picture. This requires understanding some aspects of test construction and the basics of measurement and
statistics.
Once you have decided why you are giving the test, this decision should
direct you to the test format. For formative information, teachers usually like
short, quick quizzes. They do not want to spend valuable class time, but do
need to keep close track of learning. For summative information, teachers usually like to give students ample time to reflect on the topic. By utilizing the
objectives for a unit or topic, the teacher can determine whether students can
recall, identify, explain, analyze, understand, and so on. The questions asked
should be directed by the objectives. For example, recall is easily assessed by a
fill-in-the-blank question, whereas understanding is perhaps better assessed by
an essay question.
Criterion-referenced test: A
teacher-made test for which the
criterion for mastery is
determined by the teacher and
the curriculum.
geometry. He had never received instruction in geometry, but managed to figure out
the concepts tested at the fifth- and sixth-grade levels. The point at which he began to
have some difficulty with other aspects of mathematics was at the ninth-grade test. I
recommended that his mathematics curriculum be modified to allow him to work at
the ninth-grade level in math and the seventh-grade level in geometry, through
mentorship.
Without testing to the limits, this student's level of ability would remain
unknown. It was clearly above grade level, but just where, exactly? Testing beyond
age expectations allowed us to achieve an optimal match between development and
curriculum (Keating, 1991).
It is always good to ask challenging questions of students that require
thought to determine where they stand in their knowledge. Under no circumstances should trick questions be used. While you may get the range of scores
you are looking for, the validity and reliability of the test will be compromised.
Also, what you have just tested was not achievement, but rather the student's
ability to detect tricks in questions. This introduces ethical issues.
Teacher-Made Tests
This test is the one we are most familiar with. It is the paper and pencil type of
quiz or test usually given during a unit of instruction or at the end of an instructional unit. The intent of a formative test is often to determine level of understanding. It provides important feedback to the teacher regarding the clarity of
the lessons and the comprehension of the topic. Teachers use formative tests to
regulate their planning and provide a basis for decision making in lesson strategies. They use summative tests to determine the overall extent of student learning and understanding compared to the initial objectives and goals of a unit or
topic. Since teachers use tests or quizzes to answer so many questions, they need
to ensure that the results are providing accurate information. This means that
tests must be not only reliable and free of error but valid to make sure we are
making the correct decisions for students and programs.
Test Validity
For a test to be valid (a meaningful representation of the student's knowledge),
the norm group should include children whose background and experience are
similar to the children who will take the test (Sattler, 1992; Wilson, 1996). For
example, tests normed on North American populations are not suitable for children who are recent immigrants. On the other hand, some children who are
recent immigrants excel on tests normed on North American children, thus
demonstrating their exceptional capacity to learn and to learn quickly. When considering whether standardized testing will help us understand a child's development, we need to consider the degree of match between the child and the test.
Stress, motivation, and fatigue also can influence the validity of standardized
tests. With standardized achievement tests, the relationship of test content to
local curricula is not perfect. The tests can provide valid and reliable indicators of
overall achievement in school subjects, but will not always test the knowledge
acquired in specific school curricula. After completing the science items on a standardized achievement test, an 11-year-old once told one of the authors, "Well,
those items were OK but you didn't ask me about ants. I've spent the whole term
studying about ants and I can tell you everything you want to know about them!"
Test Reliability
Reliability is an important consideration in standardized testing. It is an indicator
of how consistent the results of testing are likely to be (Sattler, 1992), that is, how
likely it is that a person will obtain a similar score on other testing occasions. Test
developers use a variety of methods to ensure reliability. These include giving the
same test to the same people on two occasions (test–retest reliability), giving alternative forms of the same test to two different groups in different order (i.e., one
group does Form A, then B; the other does Form B, then A) (alternative form reliability), and examining the degree of consistency among comparable items (internal consistency reliability) (Sattler). Reliability indicators should be reported in
test manuals; they range from .00 (complete absence of reliability) to 1.00 (perfect
reliability). A reliability coefficient of .80 or higher indicates acceptable reliability.
A number of factors can affect the reliability of test results. If the same test is
given to a student a short time after the first administration, reliability is likely
to be high due to a practice effect. Guessing can lower reliability, as can misunderstood (or misleading) instructions, errors in scoring, and fatigue, stress, and
degree of motivation (Sattler, 1992).
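As a rough sketch of how a test–retest reliability coefficient is obtained, the Pearson correlation between two administrations of the same test can be computed directly. The scores below are invented for illustration; real reliability studies use much larger samples.

```python
from math import sqrt

# Hypothetical raw scores for eight students on two administrations of one test
first  = [72, 85, 64, 90, 58, 77, 69, 83]
second = [70, 88, 60, 92, 61, 75, 72, 80]

def pearson_r(x, y):
    """Correlation between paired scores: the usual test-retest estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(first, second)
print(round(r, 2))  # a value above .80 would count as acceptable reliability
```

Here the two sets of scores track each other closely, so the coefficient lands well above the .80 benchmark mentioned above.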
Info Byte
12.1
Taxonomies are classification systems used to describe learning behaviors. The bottom levels usually
indicate the basic or simplest level, often related to some type of associative learning, such as naming
something. The highest levels are related to complex tasks, such as evaluation. The taxonomies are for
the cognitive, affective, and psychomotor domains (Bloom, 1956; Krathwohl, Bloom & Masia, 1964;
Harrow, 1972).
Cognitive Domain (from highest to simplest): Evaluation, Synthesis, Analysis, Application, Comprehension, Knowledge

Affective Domain: [types of learning and examples not shown]

Psychomotor Domain (from highest to simplest): Nondiscursive communication, Skilled movements, Physical activities, Perceptual, Basic fundamental movement, Reflex movement
[Figure 12.1 shows a sample testing grid: topics (Pre-WWI Europe; Leaders during WWI; US and Canada, WWI; Geography and Economics) crossed with cognitive levels (Recall: names, dates, places; Explain: roles, etc.; Compare: philosophic, etc.), with cells recording planned questions such as 5 multiple choice, 10 matching, 10 matching item/place on map, 1 short answer, 3 fill-ins, and 1 short essay.]
Since we have discussed other types of assessment tools in Chapter 11, we will continue
with teacher-made testing. The easiest way to organize any test and to make sure
you are actually writing a test that reflects what was covered in the class is to
form a testing grid (Figure 12.1). For example, if you have an objective for students to know the capitals of European countries, you would list this as recall. If
you want students to be able to understand and explain the causes of World War
I, you would list it as understanding and explain, or maybe synthesize. In this way,
you start to list the cognitive levels you hoped were attained during your lessons.
Along the vertical axis you would briefly list the general topics you worked on in
class. For example, if you spent a fair amount of time on the geography of countries before World War I, you could identify it briefly with GeoWWI. Once your
table is organized, keep track of questions as they are developed or selected from
a test bank. In this way you can ensure that you are covering all aspects of the lessons for the unit or section. It is too easy to develop a test that actually misses
some section of a lesson simply because you forgot about it. The grid will also
make you aware of how many questions are being asked on one topic.
FIGURE 12.1
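The bookkeeping behind such a grid can be sketched as a simple tally. The topics, cognitive levels, and counts below are made-up placeholders, not the entries from the figure; the point is that totals by topic and by level expose gaps before the test is assembled.

```python
# A testing grid: rows are topics, columns are cognitive levels,
# cells count the questions planned for that combination.
grid = {
    "Geo-WWI":       {"recall": 5,  "explain": 1, "compare": 0},
    "Leaders, WWI":  {"recall": 10, "explain": 1, "compare": 2},
    "Causes of WWI": {"recall": 0,  "explain": 3, "compare": 1},
}

# Totals per topic reveal sections that are over- or under-represented.
for topic, cells in grid.items():
    print(topic, "->", sum(cells.values()), "questions")

# Totals per cognitive level show whether the test leans too heavily on recall.
level_totals = {}
for cells in grid.values():
    for level, n in cells.items():
        level_totals[level] = level_totals.get(level, 0) + n
print(level_totals)
```

Reading the totals, a zero in any cell or row is exactly the "forgotten section" the grid is meant to catch.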
Info Byte
12.2
Binary-Choice Items
1. Phrase items so that a superficial analysis by the students suggests a wrong answer.
2. Rarely use negative statements, and never use double negatives.
3. Include only one concept in each statement.
4. Have an approximately equal number of items representing the two categories being tested.
5. Keep item length similar for both categories being tested. (p. 130)

Matching Items

Multiple-Choice Items

Short-Answer Items
1. Usually employ direct questions rather than incomplete statements, particularly for young students.
2. Structure the item so that a response should be concise.
3. Place blanks in the margin for direct questions or near the end of incomplete statements.
4. For incomplete statements, use only one or, at the most, two blanks.
5. Make sure blanks for all items are of equal length. (p. 153)
Essay Items
1. Convey to students a clear idea regarding the extensiveness of the response desired.
2. Construct items so that the student's task is explicitly described.
3. Provide students with the approximate time to be expended on each item, as well as each item's value.
4. Do not employ optional items.
5. Precursively judge an item's quality by composing, mentally or in writing, a possible response. (p. 157)
Source: W. James Popham, Classroom Assessment: What Teachers Need to Know, Third Edition. Published by Allyn and
Bacon, Boston, MA. Copyright 2002 by Pearson Education. Reprinted by permission of the publisher.
Points to Note
- These items are good when you want to cover a large amount of material in a short time.
- They are subject to guessing, since you generally have a 50–50 chance of getting the correct answer.
- A student may know something is wrong, but not know the correct answer.
- Sometimes teachers ask students to write the correct answer when the statement is wrong. Use caution, since the word "not" placed in a sentence can make the statement correct, even though the student doesn't really know the correct answer.
Matching Items
These items are excellent when you want to check understanding around concepts
that group together, such as names and dates, events and places, and terms and
definitions. Make sure you identify the basis for the matching and provide information in the directions to indicate whether an item can be used more than once.
Points to Note
- The column on the left should be numbered; the column on the right should be lettered.
- Entries or statements for which the student will try to find a match are called premises. Entries that contain the match are called responses.
- Do not have students draw lines from one column to another, unless this is in a primary grade where there are generally fewer items. So many crisscrossing lines are very difficult to correct.
- Have students write the answers in capital letters to avoid confusion over the letter.
- Give each column a title (even A or B).
Multiple-Choice Items
A multiple-choice item consists of two parts: a stem that states a problem and a
number of options that contain the correct answer and multiple distracters. This
type of item has a wide range of uses, from recall to evaluation. With careful
wording, such items can assess a great deal of material in a relatively
short amount of time. For most teachers, a testing period usually consists of a
maximum of one class period. Testing that goes beyond that compromises valuable classroom time. For this reason, most teachers use combinations of test
questions, including multiple-choice items.
Points to Note
Short-Answer Items
These items can be in either the form of a question or an incomplete sentence.
Either way the intent is to have a very short written response. Short-answer items
work well for elementary students who do not necessarily have the skills to write
longer essays.
Points to Note
- Write the item so that there is only one answer. Multiple answers make the question not only confusing for students but more difficult for the teacher to correct.
- Avoid verbatim material, since it encourages rote memorization.
- Avoid grammatical clues.
- Keep blanks and blank space restricted, or students will be prompted to fill in the space provided.
Essay Items
Essay items are excellent for providing students with an opportunity to provide
critical analysis or evaluation of large amounts of important material. Since
most essays require a considerable amount of time to answer, they have the disadvantage of covering only a small amount of material during a testing
period. For this reason, it is advised to limit this type of question to objectives
that require more analysis from a student. Always ask yourself, "Does this question really provide me, as a teacher, with the kind of information I want?" If you
find it doesn't do that, select another question type and save the essay question
for the higher-level analysis of the topic.
Points to Note
Standardized Testing
Standardized achievement tests can inform teachers of students' level of academic ability as compared to other students their age, the standard for comparison. This is a norm-referenced comparison; that is, a child's performance on the
test is compared to that of a norm group, other children who took the test for the
purposes of determining the ranges of performance at different age and grade
levels. Standardized intelligence tests can provide valuable information about a
child's problem-solving abilities, verbal reasoning, abstract visual–spatial abilities, and mathematical reasoning. These tests sometimes are necessary to obtain
and justify funding for special educational support. For example, in Canada,
British Columbia's special educational guidelines include standardized testing in
identification criteria for learning disabilities, intellectual disabilities, and giftedness (British Columbia Ministry of Education, 2002). From teachers and parents
12.1
Achievement tests
Aptitude tests
Authentic assessment
Cognitive assessment
Content-related validity
Construct-related validity
Criterion-related validity
Formative evaluation
Halo effect
Percentages
Portfolio assessment
Performance assessment
Reliability
Rubrics
Self-reporting inventories
FIGURE 12.2  The normal curve, showing standard deviations and common standard scores.
Central Tendency
We mentioned the clustering at the middle of the curve. This is called central tendency.
There are three parts to this description of the clustering: mean, median, and mode.
1. The mean is the arithmetic average. If you take all the raw scores, add
them up, and divide by the total number of items, it will give you what we
know as an average.
2. The median is the number that separates one-half of the items from the
other half. For example, the median of 2, 3, 6, 8, 9 would be the number 6.
There are five items, so we find the one that represents the midpoint. If
there is an even number of items, the midpoint is halfway between the
items. The median for 3, 4, 5, 6, 7, 8 would be 5.5 (halfway between).
3. The mode is the item that occurs the most times. In the previous example,
there isnt a mode since each number appears only once. But in a normal
curve the item that occurs the most is the same as the mean and the
median. This is because the curve is perfectly symmetrical, mirror images
of each side, always the same.
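All three measures can be checked with Python's standard library; the quiz scores below are invented for illustration.

```python
from statistics import mean, median, mode

scores = [72, 75, 75, 80, 88]   # hypothetical quiz scores
print(mean(scores))    # 78: the sum of the scores divided by how many there are
print(median(scores))  # 75: the middle score once the list is sorted
print(mode(scores))    # 75: the score that occurs the most times

# With an even number of scores, the median falls halfway between the middle pair.
print(median([3, 4, 5, 6, 7, 8]))  # 5.5
```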
The mean, or average, is the unit of central tendency we tend to be the most
familiar with. Teachers always refer to class averages, and students use class averages to see how well they did compared to others in the class. But sometimes a
mean can be deceiving. For example, we live in a city where there are several very
expensive neighborhoods. When realtors want to advertise the city in a general
advertisement, they often put not only the average sale price of a house, but also
the median price. If realtors sold several multimillion-dollar properties during a
quarter and they advertised the average selling price, it would discourage people
who feel they cannot afford to live here. The price of these high-end homes
would pull the average higher than the actual price of a regular home. So realtors
will mention the average (e.g., $300,000), but also include the median price (e.g.,
$100,000). In this way a buyer would know that, while the average is high, 50% of
the houses are under $100,000 and 50% are above this amount.
The meaning of this discussion for you as a teacher is that you will need far
more information about a test result than an average to make sense of the results.
A few students who get 100% on a quiz may pull the average in such a way that you
do not know how the rest of the class actually performed. The majority of students
may be clustered around 50%, but the average is closer to 70% because of a few individuals. If we use only the average grade for comparison, we may think someone
with a 60% did not do very well when he or she actually performed better than the
majority of other students in the class. Even with the mean and median, there is
still a problem. Not everyone got the mean or median scores (it is also possible no
one got the mean or median scores). It would be good to find out how far away
from the mean a particular score lies. This is where we can use the normal curve.
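A few perfect papers pull a class mean the same way expensive houses pull the average sale price. The class below is invented: most students score near 50%, and a handful of 100s drags the mean well above the median.

```python
from statistics import mean, median

# 20 students clustered near 50%, plus 5 perfect scores
scores = [45, 48, 50, 50, 52] * 4 + [100] * 5

print(round(mean(scores), 1))  # 59.2: the handful of 100s inflates the mean
print(median(scores))          # 50: the median stays with the majority
```

Reporting only the 59.2% average would make a student who scored 52% look below average, even though that student beat most of the class.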
Since the normal curve is symmetrical, we could divide the area on either
side of the midpoint into zones that capture a certain percentage of the population. These lines delineate an average distance from the midpoint. They are
called standard deviations and mark set divisions or zones at specific distances
from the mean. On the diagram (Figure 12.2) you will find that the zone
between the mean and +1 or −1 standard deviation will each encompass 34.13% of
the population of items. This graphically stands for approximately 68% of any
sample of items clustered around the mean. If you were to translate this into
our previous example of height, we might find that the average height of
21-year-old females is 5′5″, with most of the population sampled being between
5′1″ and 5′9″. The reason we can say that most 21-year-old females are somewhere
between 5′1″ and 5′9″ is because of the placement of the standard deviation lines.
This is often written as follows: the average height of 21-year-old females is 5′5″
± 4″. Someone with a height of 5′2″ is still considered within the average range,
even though they are not 5′5″ tall.
Going out further from the mean, we find that between +1 and +2 and between −1
and −2 standard deviations each zone contains 13.59% of the population. Again,
going out further, between +2 and +3 and between −2 and −3, each zone contains 2.14%
of the population. Beyond +3 and −3 there is a very small percentage of cases, as
you can see on the diagram. Previously, we described what these small percentages mean in terms of IQ. We can now try to apply this understanding of the normal curve to a classroom, as a tool for teachers.
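The zone percentages quoted above can be recomputed from the standard normal curve with nothing beyond the math module:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def pct_between(a, b):
    """Percentage of the population falling between z-scores a and b."""
    return 100 * (normal_cdf(b) - normal_cdf(a))

print(round(pct_between(0, 1), 2))   # 34.13  (mean to +1 SD)
print(round(pct_between(1, 2), 2))   # 13.59  (+1 to +2 SD)
print(round(pct_between(2, 3), 2))   # 2.14   (+2 to +3 SD)
print(round(pct_between(-1, 1), 2))  # 68.27  (about 68% within 1 SD of the mean)
```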
12.2
Assessment bias
Correction for guessing
Grade-equivalent scores
p-Value
Percentiles
Percentages
Stanines
Scaled scores
Skewed distribution
Standard error of measurement
T scores
z Scores
When teachers say they aren't getting a normal curve within a class, this
is very true. You should never expect to see a normal distribution, since
most classes have only 20 to 30 students. Remember, the normal curve is
based on large (infinite) numbers. If your test was criterion-referenced
and everyone understood the material, you will find that the grades will
cluster toward the high end. This is great. It means you attained your
objectives and students have grasped the material. We mentioned above
how to get a range of scores with a test, if that is your intention (norm-referenced test).
After taking a group reading assessment, the scores were reported to the
teacher in the form of a standard score. Since every test has a unique
mean and standard deviation (averages are always different on tests, and
so are the standard deviations), it is easier for testing facilities or publishers to convert a raw score into a standard score. Below the normal curve
on the diagram, you will find several common standard scores (Figure
12.2). It is important to realize that all the conversion does is change the
numbers on the line along the bottom of the curve. If a raw score is
placed just a touch above the mean when diagrammed with the raw
scores, it will be just a touch above the mean when diagrammed with a
standard score. Now all a teacher has to do is find the placement of an
individual score along the standard score line and move up to the normal
curve to understand what the student's score means. For example, if John
gets a T-score of 55, go along the T-score line until you get to 55. (T-scores
have a mean of 50 and a standard deviation of 10, so you know John did
better than average.) Place a ruler on the 55 and move up to the zone on
the normal curve. John places midway between the mean and +1. This is
within the average zone of students, but on the higher side. In Chapter 8
we discussed students who had WISC-III scores of 130. Looking along
the deviation IQ line, we can see why these students were considered
gifted. An IQ of 130 translates into 2 standard deviations above the
mean. The student scored in the upper 2% of the population on this particular test.
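All of these standard scores express the same idea, distance from the mean in standard deviation units, so a one-line conversion makes John's T-score and the deviation IQ comparable. The helper below is a sketch, not part of any test publisher's materials.

```python
def to_z(score, mean, sd):
    """Convert any standard score to a z-score (number of SDs from the mean)."""
    return (score - mean) / sd

# John's T-score of 55 (T-scores: mean 50, SD 10)
print(to_z(55, 50, 10))    # 0.5 -> midway between the mean and +1 SD

# A deviation IQ of 130 (WISC-III deviation IQs: mean 100, SD 15)
print(to_z(130, 100, 15))  # 2.0 -> two SDs above the mean
```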
Since calculators usually have the option of providing a mean, median,
and standard deviation, teachers should consider obtaining all these to
answer questions about tests and quizzes. If all we rely on is getting an
average for a student's grade, we may be doing the student an injustice.
For example, Bill gets a 48% on a test. If the teacher has more information
about the test, any conversation or interpretations of how well Bill performed on the test are more valuable. The average was 60%; the median
was 40% with a standard deviation of 5. We can see that there were some
high scores that pulled the average up, and Bill really did better than most
of the students on this particular test. Sixty-eight percent of the students
got between 35% and 45% on this test. Now we can go beyond our analysis
of Bill's score to ask some questions about the test, the teaching strategy,
students' understanding, motivation to learn the material, and the like.
More statistical information has allowed us to become more informed
about our assessment.
Problem-Based Scenario 12.1 (K-5, 6-8, 9-12, SpEd)

Problem-Based Scenario 12.2 (K-5, 6-8, 9-12, SpEd)
Teacher: Barbara
Barbara thought this was a great opportunity. Teaching jobs had been very hard to come by lately, so when
the school district offered her this chance she jumped
at it. Mrs. Williams was going on a maternity leave.
She had apparently worked up until the last minute,
and when Barbara took over, Mrs. Williams indicated she did not
really want to return, even after the leave was up.
Barbara knew she was a good teacher and was
well liked by her colleagues and the students. She
had three Social Studies 11 classes and two grade 8
Social Studies classes. If things continued to go well,
she knew the job would be hers permanently.
Things had gone very well until yesterday. After
only 2 weeks of teaching, it turned out the end-of-term grades were due in the office on Monday morning. Barbara had been talking to other teachers, so
she did have some insight into her students that she
felt would help when it came time to write comments after each grade. It seems Mrs. Williams was
not a very good bookkeeper, however, and Barbara
was having problems with the record of grades Mrs. Williams had
kept (see Figure 12.5 on page 348). She did not think it
would be much of a problem and, with everything else
FIGURE 12.4
Problem-Based Scenario 12.3 (K-5, 6-8, 9-12, SpEd)
Teacher: Sara
Summary
In this chapter, you learned additional ways to assess and evaluate your students, from teacher-made tests to standardized tests. Well-designed assessment
procedures require adhering to the principles of reliability and validity in testing. You also learned basic measurement principles and some fundamental
test statistics to help you interpret the results of testing. With all these tools,
you will be well equipped to plan, interpret, and report on your students'
progress.
A Metacognitive Challenge
You should now be able to reflect on the following questions:
- How can I plan and construct a test that reflects classroom objectives and activities?
- What are the appropriate conditions for the selection and use of standardized tests?
MEMO
Alice,
I got a note from the superintendent today; he wants to have some
plans for implementing creativity in place by the end of the month. He knows
it might be hard for some teachers, but the Parent Advisory Group is
becoming quite vocal about this issue.
Please see what you can do to help some of the teachers who might
be having trouble with anything.
The workshop should be a good start.
Maria
[FIGURE 12.5  Mrs. Williams's record of grades. Columns include student names (G. Boucher, A. Carlson, P. Cheng, V. Chow, D. Foulds, L. Herback, J. Hoogland, C. Ivey, J. Kadonoff, R. Lacy, R. Lo, C. McLellan, V. Mala, K. Martin, A. Miester, G. Milne, E. Opper, I. Plaxton, K. Proudlove, W. Regan, D. Richardson, L. Sawyer, P. Scott, J. Smart, A. Takahashi, A. Tong, C. Vincent), letter grades, raw scores for several tests (with entries such as AB and 0*, and some scores above 100), a departmental test with z-scores, and Mrs. W.'s comments such as "esl/social isolate," "has a tutor," "caught cheating," "hates school," "usually late," "LD," "hard worker," "discipline problem," "quiet," "helpful," "doesn't care," "very popular," "new to school," and "polite."]
[A page from a teacher's record book, with columns for Date, Activity, Mark, and Comments. Students listed: Adams, Joseph; Assam, Ali; Bigelow, Jessica; Dodge, Tony; Fenize, Lise; Fenize, Olivia; Graham, Robert; La Paz, Antonio. Comments include "ok" and "Ok - ESL?"]
[A record book page for a Sept. 18 IRI, with marks reported as percentiles: Adams, Joseph 85; Assam, Ali 64; Bigelow, Jessica 99; Dodge, Tony 23; Fenize, Lise 16; Fenize, Olivia 14; Graham, Robert 60; La Paz, Antonio (none recorded); Muhammed, Shirley 50.]
[A record book page of scores out of 10 for students Adams, Joseph through Pacino, Francesca, with entries such as "Abs." for absences.]
Subject: Math    Grade: 4    Teacher: Mr. Garcia

Name                  CTBS math, Gr. 3    Oct. 10, chap. 4 test    Comments
Adams, Joseph               81                    30
Assam, Ali                  66                    22
Bigelow, Jessica            72                    25
Dodge, Tony                 53                    18               Can't do
Fenize, Lise                26                    10
Fenize, Olivia              30                    --
Graham, Robert              95                    29
La Paz, Antonio             68                    21
Muhammed, Shirley           86                    26
Nychak, Kenneth             69                    20
O'Toole, Giles              98                    30
Pacino, Francesca           91                    30