TOPIC 1
OVERVIEW OF ASSESSMENT:
CONTEXT, ISSUES AND TRENDS

1.0 SYNOPSIS
1.1 LEARNING OUTCOMES
By the end of this topic, you will be able to:
1.
2.
3.

1.2 FRAMEWORK OF TOPICS

CONTENT
SESSION ONE (3 hours)
1.3 INTRODUCTION
1.4
1.4.1 Test
The four terms above are frequently used interchangeably in academic discussions. A test is a subset of assessment intended to measure a test-taker's language proficiency, knowledge, performance or skills. Testing is a type of assessment technique. It is a systematically prepared procedure that happens at a point in time when a test-taker gathers all his abilities to achieve ultimate performance because he knows that his responses are being evaluated and measured. A test is, first, a method of measuring a test-taker's ability, knowledge or performance in a given area; second, it must measure.
Bachman (1990), who was also quoted by Brown, defined a test as a process of quantifying a test-taker's performance according to explicit procedures or rules.
1.4.2 Assessment
Assessment is every so oftena misunderstood term. Assessment is a
comprehensive process of planning, collecting, analysing, reporting, and using
information on students over time(Gottlieb, 2006, p. 86).Mousavi (2009)is of
the opinion that assessment is appraising or estimating the level of magnitude
of some attribute of a person. Assessment is an important aspect in the fields
of language testing and educational measurement and perhaps, the most
challenging partof it. It is an ongoing process in educational practice, which
involves a multitude of methodological techniques. It can consist of tests,
projects, portfolios, anecdotal information and student self-reflection.A test
may be assessed formally or informally, subconsciously or consciously, as well
as incidental or intended by an appraiser.
1.4.3 Evaluation
Evaluation is another confusing term. Many are confused between
evaluation and testing. Evaluation does not necessary entail testing. In reality,
evaluation is involved when the results of a test (or other assessment
procedure) are used for decision-making (Bachman, 1990, pp. 22-23).
Evaluation involves the interpretation of information. If a teacher simply
records numbers or makes check marks on a chart, it does not constitute
evaluation. When a tester or marker evaluate, s/he values the results in such
a way that the worth of the performance is conveyed to the test-taker. This is
usually done with some reference to the consequences, either good or bad of
the performance.This is commonly practised in applied linguistics research,
where the focus is often on describing processes, individuals, and groups, and
the relationships among language use, the language use situation, and
language ability.
Test scores are an example of measurement, and conveying the
meaning of those scores is evaluation. However, evaluation can occur
without measurement. For example, if a teacher appraises a students correct
oral response with words like Excellent insight, Lilly!it is evaluation.
1.4.4 Measurement
Measurement is the assigning of numbers to certain attributes of
objects, events, or people according to a rule-governed system. For our
purposes of language testing, we will limit the discussion to unobservable
abilities or attributes, sometimes referred to as traits, such as grammatical
knowledge, strategic competence or language aptitude. Similar to other tyoes
of assessment, measurement must be conducted according to explicit rules
and procedures as spelled out in test specifications, criteria, and procedures
for scoring.Measurement could be interpreted as the process of quantifying the
observed performance of classroom learners. Bachman (1990) cautioned us
to distinguish between quantitative and qualitative descriptions. Simply put,
the former involves assigning numbers (including rankings and letter grades)
to observed performance, while the latter consists of written descriptions, oral
feedback, and non-quantifiable reports.
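To make the quantitative side of Bachman's distinction concrete, here is a minimal sketch of rule-governed measurement: a raw score is mapped to a letter grade by an explicit, fixed rule. The grade bands are hypothetical, not drawn from this module.

```python
def letter_grade(score):
    """Rule-governed quantitative measurement: map a 0-100 score
    to a letter grade using explicit, fixed cut-off bands."""
    bands = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    for cutoff, grade in bands:
        if score >= cutoff:
            return grade
    return "F"  # below all cut-offs

print(letter_grade(85))  # → B
```

Because the rule is explicit, any scorer applying it to the same observed performance produces the same number or grade, which is what distinguishes measurement from an informal impression.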
The relationships among test, measurement, assessment, and their
uses are illustrated in Figure 1.
a) research methodology;
b) practical advances;
c)
d)
e)
3.0
Pre-Independence

Implementation of the Razak Report (1956)
The Razak Report gave birth to the National Education Policy and the creation of the Examination Syndicate (LP). LP conducted examinations such as the Cambridge and the Malayan Secondary School Entrance Examination (MSSEE), and the Lower Certificate of Education (LCE) Examination.

Implementation of the Rahman Talib Report (1960)
The Rahman Talib Report recommended the following actions:
1. Extend schooling age to 15 years old.
2. Automatic promotion to higher classes.
3. Multi-stream education (Aneka Jurusan).
The following changes in examination were made:
- The entry of elective subjects in LCE and SRP.
- The introduction of the Standard 5 Evaluation Examination.
- The introduction of Malaysia's Vocational Education Examination.
- The introduction of the Standard 3 Diagnostic Test (UDT).

Implementation of the Cabinet Report (1979)
The implementation of the Cabinet Report resulted in the evolution of the education system to its present state, especially with KBSR and KBSM. Adjustments were made in examinations to fulfil the new curriculum's needs and to ensure it is in line with the National Education Philosophy.

Implementation of the Malaysia Education Blueprint (2013-2025)
TOPIC 2
ROLE AND PURPOSES OF ASSESSMENT IN TEACHING AND LEARNING
Tutorial question
Examine the contributing factors to the changing trends of
language assessment.
Create and present findings using graphic organisers.
o Program evaluation
o Providing research criteria
o Assessment of attitudes and socio-psychological differences
Alderson, Clapham and Wall (1995) have a different classification scheme. They sort tests into these broad categories: proficiency, achievement, diagnostic, progress, and placement. Brown (2010), however, categorised tests according to their purpose, namely achievement tests, diagnostic tests, placement tests, proficiency tests, and aptitude tests.
Proficiency Tests
Proficiency tests are not based on a particular curriculum or language
programme. They are designed to assess the overall language ability of
students at varying levels. They may also tell us how capable a person is in a particular language skill area. Their purpose is to describe what
students are capable of doing in a language.
Proficiency tests are usually developed by external bodies such as examination boards like Educational Testing Services (ETS) or Cambridge ESOL. Some proficiency tests have been standardised for international use, such as the American TOEFL test, which is used to measure the English language proficiency of foreign college students who wish to study in North American universities, or the British-Australian IELTS test designed for those who wish to study in the United Kingdom or Australia (Davies et al., 1999).
Achievement Tests
Achievement tests are similar to progress tests in that their purpose is
to see what a student has learned with regard to stated course outcomes.
However, they are usually administered at the mid-point and end-point of the semester
or academic year. The content of achievement tests is generally based on the
specific course content or on the course objectives. Achievement tests are easier to construct than proficiency tests because they cover a smaller amount of material and assess fewer objectives.
Placement Tests
These tests, on the other hand, are designed to assess students' level of language ability for placement in an appropriate course or class. This type
of test indicates the level at which a student will learn most effectively. The
main aim is to create groups, which are homogeneous in level. In designing a
placement test, the test developer may choose to base the test content either
on a theory of general language proficiency or on learning objectives of the
curriculum. In the former, institutions may choose to use a well-established
proficiency test such as the TOEFL or IELTS exam and link it to curricular
benchmarks. In the latter, tests are based on aspects of the syllabus taught at
the institution concerned.
In some contexts, students are placed according to their overall rank in
the test results. At other institutions, students are placed according to their
level in each individual skill area. Elsewhere, placement test scores are used
to determine if a student needs any further instruction in the language or could
matriculate directly into an academic programme.
Discuss and present the various types of tests and assessment tasks
that students have experienced.
Discuss the extent to which tests or assessment tasks serve their purpose.
TOPIC 3

3.0 SYNOPSIS
LEARNING OUTCOMES
By the end of this topic, you will be able to:
7.
8.
3.2 FRAMEWORK OF TOPICS
Types of Tests: Norm-Referenced and Criterion-Referenced; Formative and Summative; Objective and Subjective
CONTENT
SESSION THREE (3 hours)
3.3 Norm-Referenced Test
A norm-referenced test measures students' achievement as compared to other students in the group.

3.5 Formative Test
Formative test or assessment, as the name implies, is a kind of assessment aimed at forming students' competencies and skills during the learning process. With continual feedback, teachers may assist students to improve their performance. The teachers point out what the students have done wrong and help them to get it right. This can take place when teachers examine the results of achievement and progress tests. Based on the results of a formative test or assessment, teachers can suggest changes to the focus of the curriculum or the emphasis on some specific lesson elements. On the other hand, students may also need to change and improve. Due to the demanding nature of formative testing, numerous teachers prefer not to adopt it, although giving back assessed homework or achievement tests presents both teachers and students with healthy and valuable learning opportunities.
3.6 Summative Test
Summative test or assessment, on the other hand, refers to the kind of measurement that summarises what the student has learnt or gives a one-off measurement. In other words, summative assessment is assessment of student learning. Students are more likely to experience assessment carried out individually, where they are expected to reproduce discrete language items from memory. The results are then used to produce a school report and to determine what students know and do not know. It does not necessarily provide a clear picture of an individual's overall progress or even his/her full potential, especially if s/he is hindered by the fear of physically sitting for a test, but it may provide straightforward and invaluable results for teachers to analyse. It is given at a point in time to measure student achievement in relation to a clearly defined set of standards, but it does not necessarily show the way to future progress. It is given after learning is supposed to have occurred. End-of-year tests in a course and other general proficiency or public exams are some examples of summative tests or assessments. Table 3.1 shows formative and summative assessments that are common in schools.
Table 3.1: Common formative and summative assessments in schools
Formative Assessment: Anecdotal records; Quizzes and essays; Diagnostic tests
Summative Assessment: Final exams; National exams (UPSR, PMR, SPM, STPM); Entrance exams
3.7 Objective Test
According to BBC Teaching English, an objective test is a test that has right or wrong answers and so can be marked objectively. Common objective item formats include:
i. Multiple-choice items/questions;
ii. True-false items/questions;
iii. Matching items/questions; and
iv.
2. Stem
Every multiple-choice item consists of a stem (the body of the item that presents the problem or question to the test-taker).
3. Options or alternatives
These are the list of possible responses to a test item. There are usually between three and five options/alternatives to choose from.
4. Key
This is the correct response, which can be either the correct one or the best one. In a good item, the correct answer is not obvious compared with the distractors.
5. Distractors
A distractor is an incorrect option included to divert students from selecting the correct answer. An excellent distractor closely resembles the correct answer without being correct.
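The parts described above can be made concrete with a small sketch. The item below is a hypothetical example invented for illustration, not one taken from the module.

```python
# Anatomy of a multiple-choice item: stem, options, key, distractors.
item = {
    "stem": "She ____ to school every day.",    # body of the item
    "options": ["go", "goes", "going", "gone"],  # 3-5 alternatives
    "key": "goes",                               # the correct response
}

# The distractors are simply the options that are not the key.
distractors = [opt for opt in item["options"] if opt != item["key"]]
print(distractors)  # → ['go', 'going', 'gone']
```

Note that each distractor here is a plausible verb form, which is what makes them effective: they resemble the key without being correct.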
ii.
iii. Make certain that the intended answer is clearly the only correct one;
iv.
3.8 Subjective Test
Contrary to an objective test, a subjective test is evaluated by giving an opinion, usually based on agreed criteria. Subjective tests include essay, short-answer, vocabulary, and take-home tests. Some students become very anxious about these tests because they feel their writing skills are not up to par. In reality, a subjective test provides test-takers with more opportunity to demonstrate their understanding and/or in-depth knowledge and skills in the subject matter. In this case, test-takers might provide acceptable alternative responses that the tester, teacher or test developer did not predict. Generally, subjective tests assess the higher-order skills of analysis, synthesis, and evaluation. In short, subjective tests enable students to be more creative and critical. Table 3.2 shows various types of objective and subjective assessments.
Table 3.2: Various types of objective and subjective assessments
Objective Assessments: True/False items; Multiple-choice items; Multiple-response items; Matching items
Subjective Assessments: Extended-response items; Restricted-response items; Essay
Some have argued that the distinction between objective and subjective
assessments is neither useful nor accurate because, in reality, there is no such
thing as objective assessment. In fact, all assessments are created with
inherent biases built into decisions about relevant subject matter and content,
as well as cultural (class, ethnic, and gender) biases.
Reflection
1. Objective test items are items that have only one correct answer or response. Describe the multiple-choice test item in depth.
2.
Discussion
1. Identify at least three differences between formative and summative assessment.
2. What are the strengths of multiple-choice items compared to essay
items?
3. Informal assessments are often unreliable, yet they are still
important in classrooms. Explain why this is the case, and defend
your explanation with examples.
4. Compare and contrast Norm-Referenced Tests with Criterion-Referenced Tests.
TOPIC 4

4.0 SYNOPSIS
LEARNING OUTCOMES
By the end of this topic, you will be able to:
1.
2.
3.

4.2 FRAMEWORK OF TOPICS
Types of Tests: Reliability, Interpretability, Validity, Practicality, Authenticity, Washback Effect, Objectivity
CONTENT
SESSION FOUR (3 hours)
4.3 INTRODUCTION
Assessment is a complex, iterative process requiring skills,
4.4 RELIABILITY (consistency)
Reliability means the degree to which an assessment tool produces stable and consistent results. A student may, for instance, perform well on the first half of a test and poorly on the second half due to fatigue, and so on. Thus, a lack of reliability in the scores students receive is a threat to validity.
According to Brown (2010), a reliable test can be described as
follows:
Consistent in its conditions across two or more administrations
Gives clear directions for scoring / evaluation
Has uniform rubrics for scoring / evaluation
Lends itself to consistent application of those rubrics by the
scorer
Contains item / tasks that are unambiguous to the test-taker
If the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate. Rater reliability is assessed by having two or more independent judges score the test. The scores are then compared to determine the consistency of the raters' estimates.
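The 8-out-of-10 example above can be computed directly. This is a minimal sketch of simple percent agreement between two raters, using hypothetical scores; operational testing programmes often use chance-corrected coefficients such as Cohen's kappa instead.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters give the same score."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical pass/fail judgements by two raters on ten responses;
# they agree on 8 of the 10 items.
rater_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
rater_b = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0]
print(percent_agreement(rater_a, rater_b))  # → 0.8, i.e. 80% agreement
```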
same rater
(Clark, 1979).
4.4.2 Test Administration Reliability
There are a number of factors that influence test administration reliability. Unreliability occurs due to outside interference such as noise, variations in photocopying, temperature variations, the amount of light in various parts of the room, and even the condition of desks and chairs. Brown (2010) stated that he once witnessed the administration of a test of aural comprehension in which an audio player was used to deliver items for comprehension, but due to street noise outside the building, test-takers sitting next to open windows could not hear the stimuli clearly. According to him, that was
a clear case of unreliability caused by the conditions of the test
administration.
b.
Teacher-Student factors
In most tests, it is normal for teachers to construct and
c.
Environment factors
An examination environment certainly influences test-takers and
Because students' grades are dependent on the way tests are being
administered, test administrators should strive to provide clear and
accurate instructions, sufficient time and careful monitoring of tests to
improve the reliability of their tests. A test-retest technique can be used to determine test reliability.
e.
Marking factors
It is common that different markers award different marks for the same answer, even with a prepared mark scheme. A marker's assessment may vary from time to time and in different situations. Conversely, this does not happen with objective types of tests, since the responses are fixed. Thus, objectivity is a condition for reliability.
4.5 VALIDITY
Validity refers to the evidence base that can be provided about
Content validity: Does the assessment content cover what you want to
assess? Have satisfactory samples of language and language skills been
selected for testing?
Construct validity: Are you measuring what you think you're measuring?
Is the test based on the best available theory of language and language
use?
Concurrent (parallel) validity: Can you use the current test score to
estimate scores of other criteria? Does the test correlate with other existing
measures?
the criteria (concepts, skills and knowledge) relevant to the purpose of the
examination. The important notion here is the purpose.
juncture, (lack of) hesitations, and other elements within the construct of
fluency. Tests are, in a manner of speaking, operational definitions of
constructs in that their test tasks are the building blocks of the entity that
is being measured (see Davidson, Hudson, & Lynch, 1985; T.
McNamara, 2000).
4.5.4 Concurrent validity
Concurrent validity is the use of another, more reputable and recognised test to validate one's own test. For example, suppose you come up with your own new test and would like to determine its validity. If you choose to use concurrent validity, you would look for a reputable test and compare your students' performance on your test with their performance on the reputable and acknowledged test. In concurrent validity, a correlation coefficient is obtained and used to generate an actual numerical value. A high positive correlation of 0.7 to 1 indicates that the learners' scores are relatively similar for the two tests or measures.
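The correlation described here can be sketched as follows. The scores are hypothetical, and in practice one would typically use a statistics package (for example, scipy.stats.pearsonr) rather than hand-rolled code.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical scores of five students on a new classroom test and on
# an established, recognised test of the same ability.
new_test = [55, 62, 70, 78, 90]
established = [50, 60, 72, 75, 88]
r = pearson_r(new_test, established)
print(round(r, 2))  # a value between 0.7 and 1 suggests concurrent validity
```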
For example, in a course unit whose objective is for students to be able to orally produce voiced and unvoiced stops in all possible phonetic environments, the results of one teacher's unit test might be compared with an independent assessment such as a commercially produced test of similar phonemic proficiency. Since criterion-related evidence usually falls into one of the two categories of concurrent and predictive validity, a classroom test designed to assess mastery of a point of grammar in communicative use will have criterion validity if test scores are verified either by observed subsequent behaviour or by other communicative measures of the grammar point in question.
4.5.5 Predictive validity
Predictive validity is closely related to concurrent validity in that it
too generates a numerical value. For example, the predictive validity of
4.5.7 Objectivity
The objectivity of a test refers to the consistency of the teachers/examiners who mark the answer scripts. Objectivity refers to the extent to which an examiner examines and awards the same score to the same answer script. A test is said to have high objectivity when the examiner is able to give the same score to similar answers, guided by the mark scheme. An objective test is a test that has the highest level of objectivity because its scoring is not influenced by the examiner's skills and emotions. Meanwhile, a subjective test is said to have the lowest objectivity. Various research studies show that different examiners tend to award different scores to the same essay test. It is also possible that the same examiner would give different scores to the same essay if s/he re-checks it at different times.
4.5.8 Washback effect
The term 'washback' or backwash (Hughes, 2003, p.1)
refers to the impact that tests have on teaching and learning. Such
impact is usually seen as being negative: tests are said to force
teachers to do things they do not necessarily wish to do.However, some
34
have argued that tests are potentially also 'levers for change' in
language education: theargument being that if a bad test has negative
impact,a good test should or could have positive washback(Alderson,
1986b; Pearson, 1988).
Cheng, Watanabe, and Curtis (2004) devoted an entire anthology to the issue of washback, while Spratt (2005) challenged teachers to become agents of beneficial washback in their language classrooms. Brown (2010) discusses the factors that provide beneficial washback in a test. He mentions that such a test can positively influence what and how teachers teach and students learn; offer learners a chance to adequately prepare; give learners feedback that enhances their language development; be more formative in nature than summative; and provide conditions for peak performance by the learners.
In large-scale assessment, washback often refers to the effects that tests have on instruction in terms of how students prepare for the test. In classroom-based assessment, washback can have a number of positive manifestations, ranging from the benefit of preparing and reviewing for a test to the learning that accrues from feedback on one's performance. Teachers can provide information that washes back to students in the form of useful diagnoses of strengths and weaknesses.
The challenge for teachers is to create classroom tests that serve as learning devices through which washback is achieved. Students' incorrect responses can become a platform for further improvement. On the other hand, their correct responses need to be complimented, especially when they represent accomplishments in a student's developing competence. Teachers can use various strategies to provide guidance or coaching. Washback enhances a number of basic principles of language acquisition, namely intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others.
Washback is generally said to be either positive or negative.
TOPIC 5

5.0 SYNOPSIS
Topic 5 exposes you to the stages of test construction, the preparation of a test blueprint/test specifications, the elements in a Test Specifications Guideline, and the importance of following the guidelines for constructing test items. Then we look at the various test formats that are appropriate for language assessment.
5.1 LEARNING OUTCOMES
By the end of this topic, you will be able to:
1.
2.
3.
4.
5.
6.
7.
8.
validity
identify the elements in a Test Specifications Guidelines
demonstrate an understanding of the importance of following the
9.
5.2 FRAMEWORK OF TOPICS
CONTENT
SESSION FIVE (3 hours)
5.3 STAGES OF TEST CONSTRUCTION
i. determining
ii. planning
iii. writing
iv. preparing
v. reviewing
vi. pre-testing
vii. validating
5.3.1 Determining
The essential first step in testing is to make oneself perfectly
clear about what it is one wants to know and for what purpose. When
we start to construct a test, the following questions have to be
answered.
5.3.2 Planning
The first form that the solution takes is a set of specifications for the test. This will include information on: content, format and timing, criteria, levels of performance, and scoring procedures.
In this stage, the test constructor has to determine the content by
answering the following questions:
Describing the purpose of the test;
Describing the characteristics of the test takers, the nature of the
population of the examinees for whom the test is being designed.
Defining the nature of the ability we want to measure;
Developing a plan for evaluating the qualities of test usefulness, which is the degree to which a test is useful for teachers and students; it includes six qualities: reliability, validity, authenticity, practicality, interactiveness, and impact;
Identifying resources and developing a plan for their allocation and
management;
Determining format and timing of the test;
Determining levels of performance;
Determining scoring procedures
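The planning decisions listed above can be recorded as a simple, explicit document before any item is written. The sketch below shows one hypothetical way to capture a specification as plain data; the field names and values are illustrative assumptions, not a prescribed format.

```python
# A hypothetical test specification covering the planning questions above.
test_spec = {
    "purpose": "end-of-course grammar achievement test",
    "test_takers": "high-beginning adult learners",
    "ability_measured": "knowledge of simple past and past progressive",
    "usefulness_plan": ["reliability", "validity", "authenticity",
                        "practicality", "interactiveness", "impact"],
    "format_and_timing": {"items": 40, "minutes": 60,
                          "format": "multiple-choice"},
    "levels_of_performance": {"mastery": 0.80, "pass": 0.50},
    "scoring": "one mark per correct item; no penalty for wrong answers",
}

# Every planning question should have an answer before item writing begins.
for field, value in test_spec.items():
    print(f"{field}: {value}")
```

Writing the specification down in this explicit form makes it easy for a test-writing team to check that no planning question has been skipped.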
5.3.3 Writing
Although writing items is time-consuming, writing good items is an art.
No one can expect to be able consistently to produce perfect items.
Some items will have to be rejected, others reworked. The best way to
identify items that have to be improved or abandoned is through
teamwork. Colleagues must really try to find fault; and despite the
seemingly inevitable emotional attachment that item writers develop to
items that they have created, they must be open to, and ready to
41
accept, the criticisms that are offered to them. Good personal relations
are a desirable quality in any test writing team.
Test item writers should possess the following characteristics:
5.3.4 Preparing
One has to understand the major principles, techniques and experience
of preparing the test items. Not every teacher can make a good tester.
To construct different kinds of tests, the tester should observe some principles. In production-type tests, we have to bear in mind that no comments are necessary. Test writers should also try to avoid test items which can be answered through test-wiseness. The skills to be included are your guiding plan for designing an instrument that effectively fulfils your desired principles, especially validity.
It is vital to note that for large-scale standardised tests like the Test of English as a Foreign Language (TOEFL), the International English Language Testing System (IELTS), the Michigan English Language Assessment Battery (MELAB), and the like, which are intended to be widely distributed and thus are broadly generalised, test specifications are much more formal and detailed (Spaan, 2006). They are also usually confidential so that the institution that is designing the test can ensure the validity of subsequent forms of the test.
Many language teachers claim that it is difficult to construct an item. In reality, it is rather easy to develop an item if we are committed in planning the measuring instruments to evaluate students' achievement.
However, what exactly is an item in a test? An item is a tool, an instrument, an instruction or a question used to get feedback from test-takers, which is evidence of something that is being measured. An item is an instrument used to get feedback, which is useful information for consideration in measuring or asserting a construct measurement. Items can be classified as recall items and thinking items. A recall item requires one to recall in order to answer, while a thinking item requires test-takers to use their thinking skills to attempt it.
For instance, consider a grammar unit test that will be administered at the end of a three-week grammar course for high-beginning adult learners (Level 2). The students will be taking a test that covers verb
Taxonomy by allowing these two aspects, the noun and verb, to form separate dimensions, the noun providing the basis for the Knowledge dimension and the verb forming the basis for the Cognitive Process dimension.

The product of word knowledge
Level 1 C1
Categories & Cognitive Processes (Alternative Names): Definition
Remember: Retrieve knowledge from long-term memory
- Recognising (Identifying): Locating knowledge in long-term memory that is consistent with presented material
- Recalling (Retrieving): Retrieving relevant knowledge from long-term memory
Level 2 C2
Categories & Cognitive Processes (Alternative Names): Definition
Understand: Construct meaning from instructional messages, including oral, written, and graphic communication
- Interpreting (Clarifying, Paraphrasing, Representing, Translating)
- Exemplifying (Illustrating, Instantiating)
- Classifying (Categorising, Subsuming)
- Summarising (Abstracting, Generalising)
- Inferring (Concluding, Extrapolating, Interpolating, Predicting)
- Comparing (Contrasting, Mapping, Matching)
- Explaining (Constructing models): Constructing a cause-and-effect model of a system
Level 3 C3
Categories & Cognitive Processes (Alternative Names): Definition
Apply
- Executing (Carrying out): Applying a procedure to a familiar task
- Using: Applying a procedure to an unfamiliar task
Analyse
- Differentiating: Distinguishing relevant from irrelevant parts, or important from unimportant parts, of presented material
- Organising: Determining how elements fit or function within a structure
- Attributing: Determining a point of view, bias, values, or intent underlying presented material
Evaluate: Make judgments based on criteria and standards
- Checking (Coordinating, Detecting, Monitoring, Testing): Detecting inconsistencies or fallacies within a process or product; determining whether a process or product has internal consistency; detecting the effectiveness of a procedure as it is being implemented
- Critiquing (Judging): Detecting inconsistencies between a product and external criteria; determining whether a product has external consistency; detecting the appropriateness of a procedure for a given problem
Create: Putting elements together to form a coherent or functional whole; reorganising elements into a new pattern or structure
- Generating (Hypothesising): Coming up with alternative hypotheses based on criteria
- Planning (Designing)
- Producing: Inventing a product
The Knowledge Domain
Categories & Cognitive Processes: Definition
- Factual Knowledge: The basic elements students must know to be acquainted with a discipline or solve problems in it
- Conceptual Knowledge
- Procedural Knowledge
The most powerful model for understanding these three levels and
integrating them into learning intentions and success criteria is the
SOLO model.
However, the taxonomy is not without critics; Chick (1998:20) believes that there is potential to misjudge the level of functioning, and Chan et al. (2002:512) criticise its conceptual ambiguity, stating that the categorisation is unstable. In these two studies, the SOLO taxonomy was used primarily for assessing completed work, so use throughout the teaching process may alleviate these issues.
An additional criticism, in particular when the taxonomy is compared with that of Bloom (1956), is the SOLO taxonomy's structure. Biggs & Collis (1991) refer to the structure as a hierarchy, as does Moseley et al. (2005); naturally, there are concerns when complex processes, such as human thought, are categorised in this manner. However, Campbell et al. (1992) explained the structure of the SOLO taxonomy as consisting of a series of cycles (especially between the Unistructural, Multistructural and Relational levels), which would allow for a development of breadth of knowledge as well as depth.
clearly written questions that do not attempt to trick or confuse them into incorrect responses. The following presents the major characteristics of well-written test items.
5.6.1 Aim of the test
items can help expedite the laborious process of writing test items as well as
supply a format for asking basic questions. A format that provides an initial
starting structure to use in writing questions can be valuable for item writers.
When these formats are used, test takers can quickly read and understand the
questions, since the format is expected. For example, to measure
understanding of knowledge or facts, questions can begin with the following:
What best defines …?
What is not a characteristic of …?
What is an example of …?
5.6.5 Level of difficulty
A test has a planned number of questions at a level of difficulty and
discrimination to best determine mastery and non-mastery performance states.
Test-takers should clearly understand what is needed in education and
language assessment to prepare for the examination and how much
experience performing certain activities would help in preparation. This should
be the road map that helps item writers create test items and helps test takers
understand what will be required of them to pass an examination. In any test item construction, we must ensure that weak students can answer the easy items, students of intermediate language proficiency can answer the easy and moderate items, and students of high language proficiency can answer the easy, moderate and advanced items. A reliable and valid test instrument should encompass all three levels of difficulty.
5.6.6 International and Cultural Considerations (bias)
In standardised tests, when exams are distributed internationally, either in a single language or translated into other languages, always refrain from using slang, geographic references, historical references or dates (holidays) that may not be understood by an international examinee.
http://books.google.com.my/books/about/Constructing_Test_Items.html?id=Ia3SGDfbaV
Test format
What is the difference between test format and test type? For example, when you want to introduce a new kind of test, say a reading test that is organised a little differently from the existing test items, do you call it a test format or a test type? Test format refers to the layout of questions on a test. For example, the format of a test could be two essay questions, 50 multiple-choice questions, etc. For the sake of brevity, I will provide the outlines of some large-scale standardised tests.
UPSR
The Primary School Evaluation Test, also known as Ujian Penilaian Sekolah Rendah (commonly abbreviated as UPSR, from the Malay), is a national examination taken by all pupils in our country at the end of their sixth year in primary school, before they leave for secondary school. It is prepared and examined by the Malaysian Examinations Syndicate. This test consists of two papers, namely Paper 1 and Paper 2.
Paper 1 consists of multiple-choice questions answered on a standardised optical answer sheet that uses optical mark recognition, while Paper 2 comprises three sections, namely Sections A, B, and C.
English and administered via the Internet. There are four sections (listening,
reading, speaking and writing), which take a total of about four and a half
hours to complete.
TOPIC 6
6.0
SYNOPSIS
Topic 6 focuses on ways to assess language skills and language
content. It defines the types of test items used to assess language skills
and language content. It also provides teachers with suggestions on
ways a teacher can assess the listening, speaking, reading and writing
skills in a classroom. It also discusses the concepts of and differences between discrete-point, integrative and communicative tests.
6.1
LEARNING OUTCOMES
At the end of Topic 6, teachers will be able to:
6.2
FRAMEWORK OF TOPICS
CONTENT
SESSION SIX (6 hours)
6.2.1
b.
Speaking
In the assessment of oral production, both discrete feature
objective tests and integrative task-based tests are used. The first
type tests such skills as pronunciation, knowledge of what
language is appropriate in different situations, language required in
doing different things like describing, giving directions, giving
instructions, etc. The second type involves finding out if pupils can
perform different tasks using spoken language that is appropriate
for the purpose and the context. Task-based activities involve
describing scenes shown in a picture, participating in a discussion
about a given topic, narrating a story, etc. As in the listening
performance assessment tasks, Brown (2010) cited four categories for oral assessment.
1.
B.
C.
concern.
Intensive (controlled). Beyond the fundamentals of imitative
writing are skills in producing appropriate vocabulary within a
context, collocation and idioms, and correct grammatical features
up to the length of a sentence. Meaning and context are
important in determining correctness and appropriateness but
most assessment tasks are more concerned with a focus on form
3.
4.
It is not by accident that we find there are few, if any, test formats that are
either supply type and objective or select type and subjective. Select type
tests tend to be objective while supply type tests tend to be subjective.
In addition to the above, Brown and Hudson (1998) have also suggested three broad categories to differentiate tests according to how students are expected to respond. These categories are the selected response tests, the constructed response tests, and the personal response tests. Examples of each of these types of tests are given in Table 6.1.
Selected response | Constructed response | Personal response
True-false        | Fill-in              | Conferences
Matching          | Short answer         | Portfolios
Multiple choice   | Performance test     |
Communicative Test
As language teaching has emphasised the importance of
communication through the communicative approach, it is not surprising
that communicative tests have also been given prominence. A
communicative emphasis in testing involves many aspects, two of
which revolve around communicative elements in tests and meaningful
content. Both these aspects are briefly addressed in the following subsections:
In short, the kinds of tests that we should expect more of in the future
will be communicative tests in which candidates actually have to
produce the language in an interactive setting involving some degree of
Exercise 1
1.
2.
TOPIC 7
7.0
SYNOPSIS
Topic 7 focuses on scoring, grading and assessment criteria. It provides teachers with brief descriptions of the different approaches to scoring, namely objective, holistic and analytic.
7.1
LEARNING OUTCOMES
7.2
FRAMEWORK OF TOPICS
CONTENT
SESSION SEVEN (3 hours)
7.2.1
Objective approach
One type of scoring approach is the objective scoring approach. This approach relies on quantified methods of evaluating students' writing. A sample of how objective scoring is conducted is given by Bailey (1999) as follows:
Criteria
No response.
The 6-point scale above includes broad descriptors of what a student's essay reflects for each band. It is quite apparent that graders using this scale are expected to pay attention to vocabulary, meaning, organisation, topic development and communication. Mechanics such as punctuation are secondary to communication.
Bailey also describes another type of scoring related to the holistic approach
which she refers to as primary trait scoring. In primary trait scoring, a particular
functional focus is selected which is based on the purpose of the writing and
grading is based on how well the student is able to express that function. For
example, if the function is to persuade, scoring would be on how well the
author has been able to persuade the grader rather than how well organised
the ideas were, or how grammatical the structures in the essay were. This approach to grading emphasises functional and communicative ability rather than discrete linguistic ability and accuracy.
7.2.3 Analytic approach
Analytical scoring is a familiar approach to many teachers. In analytical scoring, raters assess students' performance on a variety of categories which are hypothesised to make up the skill of writing. Content, for example, is
often seen as an important aspect of writing i.e. is there substance to what
is written? Is the essay meaningful? Similarly, we may also want to consider
the organisation of the essay. Does the writer begin the essay with an
appropriate topic sentence?
Are there good transitions between paragraphs? Other categories that we
may want to also consider include vocabulary, language use and mechanics.
The following are some possible components used in assessing writing
ability using an analytical scoring approach and the suggested weightage
assigned to each:
Components       | Weight
Content          | 30 points
Organisation     | 20 points
Vocabulary       | 20 points
Language Use     | 25 points
Mechanics        | 5 points
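The weighted components above can be combined into a simple marking sketch. This is a minimal Python illustration, not the module's own procedure; the component names follow the table, while the sample essay scores are hypothetical.

```python
# Maximum points per component, as in the analytic scoring table (totals 100).
WEIGHTS = {
    "Content": 30,
    "Organisation": 20,
    "Vocabulary": 20,
    "Language Use": 25,
    "Mechanics": 5,
}

def analytic_total(scores: dict) -> int:
    """Sum the component scores, checking none exceeds its weight."""
    for component, score in scores.items():
        if score > WEIGHTS[component]:
            raise ValueError(f"{component} exceeds its {WEIGHTS[component]}-point weight")
    return sum(scores.values())

# Hypothetical scores awarded to one essay by a rater:
essay = {"Content": 24, "Organisation": 15, "Vocabulary": 16,
         "Language Use": 20, "Mechanics": 4}
print(analytic_total(essay))  # 79 out of a possible 100
```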
Advantages of the holistic approach:
- Quickly graded
- Provides a public standard that is understood by teachers and students alike
- Relatively higher degree of rater reliability
- Applicable to the assessment of many different topics
- Emphasises the students' strengths rather than their weaknesses

Advantages of the analytical approach:
- Provides clear guidelines for grading in the form of the various components
- Allows the graders to consciously address important aspects of writing
- Emphasises the students' strengths rather than their weaknesses
EXERCISE
1.
TOPIC 8
8.0
SYNOPSIS
Topic 8 focuses on item analysis and interpretation. It provides teachers with
brief descriptions on basic statistics terminologies such as mode, median, mean,
standard deviation, standard score and interpretation of data. It will also look at
some item analysis that deals with item difficulty and item discrimination.
Teachers will also be introduced to distractor analysis in language assessment.
FRAMEWORK OF TOPICS
CONTENT
SESSION EIGHT (6 hours)
MEDIAN
MEAN
8.2.2
Standard deviation
Standard deviation refers to how much the scores deviate from the mean. There are two methods of calculating the standard deviation, the deviation method and the raw score method, which are illustrated by the following formulae:

Deviation method: SD = √[ Σ(X − X̄)² / (n − 1) ]
Raw score method: SD = √[ (ΣX² − (ΣX)² / n) / (n − 1) ]
To illustrate this, we will use the scores 20, 25 and 30. Using the deviation method, we come up with the following table:
Table 8.1:Calculating the Standard Deviation Using the Deviation Method
Using the raw score method, we can come up with the following:
Table 8.2 : Calculating the Standard Deviation Using the Raw Score Method
Both methods result in the same final value of 5. If you are calculating
standard deviation with a calculator, it is suggested that the deviation
method be used when there are only a few scores and the raw score
method be used when there are many scores. This is because when
there are many scores, it will be tedious to calculate the square of the
deviations and their sum.
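Both methods can be verified with a short calculation. The sketch below assumes the divisor n − 1, which reproduces the final value of 5 quoted above for the scores 20, 25 and 30.

```python
import math

scores = [20, 25, 30]
n = len(scores)
mean = sum(scores) / n  # 25

# Deviation method: sum the squared deviations from the mean.
sd_deviation = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

# Raw score method: work directly from the sum of scores and sum of squares.
sd_raw = math.sqrt((sum(x * x for x in scores) - sum(scores) ** 2 / n) / (n - 1))

print(sd_deviation, sd_raw)  # both 5.0
```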
8.2.3 Standard score
Standardised scores are necessary when we want to make
comparisons across tests and measurements. Z scores and T scores
are the more common forms of standardised scores although you
may come up with your own standardised score. A standardised score
can be computed for every raw score in a set of scores for a test.
i. The Z score
The Z score is the basic standardised score. It is referred to as the
basic form as other computations of standardised scores must first
calculate the Z score. The formula used to calculate the Z score is as
follows:

Z = (X − X̄) / SD

where X is the raw score, X̄ is the mean and SD is the standard deviation. Z score values are very small and usually range only from −2 to 2. Such small values make the Z score inappropriate for score reporting, especially for those unaccustomed to the concept. Imagine what a parent may say if his child comes home with a report card with a Z score of 0.47 in English Language! Fortunately, there is another form of standardised score - the T score - with values that are more palatable to the relevant parties.
ii. The T score
The T score is a standardised score which can be computed using the
formula 10 (Z) + 50. As such, the T score for students A, B, C, and D in
the table 4.3 are 10(-1.28) + 50; 10 (-0.23) + 50; 10(0.47) + 50; and 10
(1.04) + 50 or 37.2, 47.7, 54.7, and 60.4 respectively. These values
seem perfectly appropriate compared to the Z score. The T score average or mean is always 50 (i.e. a Z score of 0), which connotes average ability and the midpoint of a 100-point scale.
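The conversion from Z to T scores can be sketched as follows. The Z scores are the four values reported above for students A, B, C and D; their raw scores are not reproduced in the text, so we start from the Z scores directly.

```python
# Z scores for students A to D, as reported in the text.
z_scores = {"A": -1.28, "B": -0.23, "C": 0.47, "D": 1.04}

def t_score(z: float) -> float:
    """Convert a Z score to a T score using T = 10Z + 50."""
    return 10 * z + 50

for student, z in z_scores.items():
    print(student, round(t_score(z), 1))
# A 37.2, B 47.7, C 54.7, D 60.4 - matching the values in the text
```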
Interpretation of data
The standardised score is actually a very important score if we want to
compare performance across tests and between students. Let us take the
following scenario as an example:
How can En. Abu solve this problem? He would have to have
standardised scores in order to decide. This would require the following
information:
Test 1: X̄ = 42, standard deviation = 7
Test 2: X̄ = 47, standard deviation = 8
Using the information above, En. Abu can find the Z score for each raw
score reported as follows:
Table 8.4: Z Score for Form 2A
Based on Table 8.4, both Ali and Chong have a negative Z score as their total score for both tests. However, Chong has a higher Z score total (i.e. −1.07 compared to −1.34) and therefore performed better when we take the performance of all the other students into consideration.
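En. Abu's comparison can be sketched in code. The pupils' raw scores from Table 8.4 are not reproduced in the text, so the scores below are hypothetical; only the means and standard deviations are those given for Test 1 and Test 2.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Standardise a raw score against the class mean and standard deviation."""
    return (raw - mean) / sd

# Hypothetical raw scores; Test 1 has mean 42, SD 7 and Test 2 has mean 47, SD 8.
ali = z_score(38, 42, 7) + z_score(42, 47, 8)
chong = z_score(40, 42, 7) + z_score(41, 47, 8)

# The pupil with the higher Z-score total performed better relative to the class.
print(max(("Ali", ali), ("Chong", chong), key=lambda p: p[1])[0])  # Chong
```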
percentage on the diagram. For example, the area between the mean (0 standard deviations) and +1 standard deviation is 34.13%. Similarly, the area between the mean and −1 standard deviation is also 34.13%. As such, the area between −1 and +1 standard deviations is 68.26%.
In using the normal curve, it is important to make a distinction between standard deviation values and standard deviation scores. A standard deviation value is a constant and is shown on the horizontal axis of the diagram above. The standard deviation score, on the other hand, is the obtained score when we use the standard deviation formula provided earlier. So, if we find the score to be 5 as in the earlier example, then the score for the standard deviation value of 1 is 5, for the value of 2 it is 5 x 2 = 10, for the value of 3 it is 15, and so on. Standard deviation values of −1, −2, and −3 will have corresponding negative scores of −5, −10, and −15.
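The 68.26% figure can be checked against the standard normal cumulative distribution function, which the Python standard library lets us build from the error function:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Area under the curve between -1 and +1 standard deviations.
area = normal_cdf(1) - normal_cdf(-1)
print(round(100 * area, 2))  # 68.27 (the text rounds 2 x 34.13% to 68.26%)
```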
8.2.5
Item analysis
a.
Item difficulty
Item difficulty refers to how easy or difficult an item is. The formula used to measure item difficulty is quite straightforward: it involves finding out how many students answered an item correctly and dividing that by the number of students who took the test. The formula is therefore:

Item difficulty = number of students who answered the item correctly ÷ number of students who took the test
Let's use the following instance as an example. Suppose you have just conducted a twenty-item test and obtained the following results:
As there are twelve students in the class, 33% of this total would be 4
students. Therefore, the upper group and lower group will each consist
of 4 students each. Based on their total scores, the upper group would
consist of students L, A, E, and G while the lower group would consist of
students J, H, D and I.
We now need to look at the performance of these students for each item
in order to find the item discrimination index of each item.
For item 1, all four students in the upper group (L, A, E, and G)
answered correctly while only student H in the lower group answered
correctly. Using the formula described earlier, we can plug in the
numbers as follows:
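Both indices can be computed with a short sketch. The upper- and lower-group counts for item 1 (four correct in the upper group, one in the lower) are those given above; the overall number answering correctly (9 of 12) is hypothetical, since the full results table is not reproduced here.

```python
def item_difficulty(correct: int, total: int) -> float:
    """Proportion of all test takers who answered the item correctly."""
    return correct / total

def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Difference between upper- and lower-group correct counts, per group size."""
    return (upper_correct - lower_correct) / group_size

# Item 1: all 4 upper-group students correct, 1 lower-group student correct.
print(discrimination_index(4, 1, 4))  # 0.75

# Hypothetical: 9 of the 12 students answered item 1 correctly.
print(round(item_difficulty(9, 12), 2))  # 0.75
```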
students who took the test in the formula is not inflexible, as it is possible to use any percentage between 27.5% and 35% as the value.
c.
Distractor analysis
Distractor analysis is an extension of item analysis, using techniques
that are similar to item difficulty and item discrimination. In distractor
analysis, however, we are no longer interested in how test takers select
the correct answer, but how the distractors were able to function
effectively by drawing the test takers away from the correct answer. The
number of times each distractor is selected is noted in order to
determine the effectiveness of the distractor. We would expect that the
distractor is selected by enough candidates for it to be a viable
distractor.
What exactly is an acceptable value? This depends to a large extent on
the difficulty of the item itself and what we consider to be an acceptable
item difficulty value for test items. If we are to assume that 0.7 is an
appropriate item difficulty value, then we should expect that the
remaining 0.3 be about evenly distributed among the distractors.
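The expected spread of wrong answers can be worked out directly from the item difficulty, as a small sketch of the example that follows:

```python
# 100 test takers, item difficulty 0.7, and three distractors (B, C and D).
test_takers = 100
difficulty = 0.7

wrong = round(test_takers * (1 - difficulty))  # pupils who chose a distractor
per_distractor = wrong / 3  # ideal share for each of the three distractors

print(wrong, per_distractor)  # 30 10.0
```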
Let us assume that 100 students took the test. If we assume that A is the
answer and the item difficulty is 0.7, then 70 students answered correctly.
What about the remaining 30 students and the effectiveness of the three
distractors? If all 30 selected D, then distractors B and C are useless in their
role as distractors. Similarly, if 15 students selected D and another 15
selected B, then C is not an effective distractor and should be replaced.
Therefore, the ideal situation would be for each of the three distractors to be
selected by an equal number of all students who did not get the answer
correct, i.e. in this case 10 students. Therefore the effectiveness of each
Distractor B
Distractor C
Distractor D
Item 1
8*
Item 2
8*
Item 3
8*
Item 4
8*
Item 5
7*
* indicates key
d.
For Item 1, the discrimination index for each distractor can be calculated using the discrimination index formula. From Table 8.5, we know that all the students in the upper group answered this item correctly and only one student from the lower group did so. If we assume that the three remaining students from the lower group all selected distractor B, then the discrimination index for item 1, distractor B will be (0 − 3) / 4 = −0.75.
This negative value indicates that more students from the lower group
selected the distractor compared to students from the upper group. This result
is to be expected of a distractor and a value of -1 to 0 is preferred.
EXERCISE
1. Calculate the mean, mode, median and range of the following set of
scores:
23, 24, 25, 23, 24, 23, 23, 26, 27, 22, 28.
2. What is a normal curve and what does this show? Does the final
result always show a normal curve and how does this relate to
standardised tests?
TOPIC 9
9.0 SYNOPSIS
Topic 9 focuses on reporting assessment data. It provides teachers with brief
descriptions on the purposes of reporting and the reporting methods.
9.1 LEARNING OUTCOMES
By the end of Topic 9, teachers will be able to:
CONTENT
SESSION NINE (3 hours)
9.2.2
Reporting methods
Student achievement progress can be reported by comparing:
i. Norm - Referenced Assessment and Reporting
TOPIC 10
10.0 SYNOPSIS
Topic 10 focuses on the issues and concerns related to assessment in the
Malaysian primary schools. It will look at how assessment is viewed and used
in Malaysia.
10.1 LEARNING OUTCOMES
By the end of Topic 10, teachers will be able to:
CONTENT
SESSION TEN (3 hours)
10.3
Exam-oriented System
10.4
Knowledge
Comprehension
Application
Analysis
Synthesis
Evaluation
Knowledge
Recalling memorized information. May involve remembering a wide range of
material from specific facts to complete theories, but all that is required is the
bringing to mind of the appropriate information. Represents the lowest level of
learning outcomes in the cognitive domain.
Learning objectives at this level: know common terms, know specific facts,
know methods and procedures, know basic concepts, know principles.
Question verbs: Define, list, state, identify, label, name, who? when? where?
what?
Comprehension
The ability to grasp the meaning of material. Translating material from one
form to another (words to numbers), interpreting material (explaining or
summarizing), estimating future trends (predicting consequences or effects).
Goes one step beyond the simple remembering of material, and represent the
lowest level of understanding.
Learning objectives at this level: understand facts and principles, interpret
verbal material, interpret charts and graphs, translate verbal material to
mathematical formulae, estimate the future consequences implied in data,
justify methods and procedures.
Question verbs: Explain, predict, interpret, infer, summarize, convert, translate, give example, account for, paraphrase x.
Application
The ability to use learned material in new and concrete situations. Applying
rules, methods, concepts, principles, laws, and theories. Learning outcomes
in this area require a higher level of understanding than those under
comprehension.
Learning objectives at this level: apply concepts and principles to new
situations, apply laws and theories to practical situations, solve mathematical
Analysis
The ability to break down material into its component parts. Identifying parts,
analysis of relationships between parts, recognition of the organizational
principles involved. Learning outcomes here represent a higher intellectual
level than comprehension and application because they require an
understanding of both the content and the structural form of the material.
Learning objectives at this level: recognize unstated assumptions, recognize logical fallacies in reasoning, distinguish between facts and inferences,
evaluate the relevancy of data, analyze the organizational structure of a work
(art, music, writing).
Question verbs: Differentiate, compare / contrast, distinguish x from y, how
does x affect or relate to y? why? how? What piece of x is missing / needed?
Synthesis
(By definition, synthesis cannot be assessed with multiple-choice questions. It
appears here to complete Bloom's taxonomy.)
The ability to put parts together to form a new whole. This may involve the
production of a unique communication (theme or speech), a plan of
operations (research proposal), or a set of abstract relations (scheme for
classifying information). Learning outcomes in this area stress creative
behaviors, with major emphasis on the formulation of new patterns or
structure.
Learning objectives at this level: write a well organized paper, give a well
organized speech, write a creative short story (or poem or music), propose a
plan for an experiment, integrate learning from different areas into a plan for
solving a problem, formulate a new scheme for classifying objects (or events,
or ideas).
Question verbs: Design, construct, develop, formulate, imagine, create,
change, write a short story and label the following elements:
Evaluation
The ability to judge the value of material (statement, novel, poem, research
report) for a given purpose. The judgments are to be based on definite
criteria, which may be internal (organization) or external (relevance to the
purpose). The student may determine the criteria or be given them. Learning
outcomes in this area are highest in the cognitive hierarchy because they contain elements of all the other categories, plus conscious value judgments based on clearly defined criteria.
Learning objectives at this level: judge the logical consistency of written
material, judge the adequacy with which conclusions are supported by data,
judge the value of a work (art, music, writing) by the use of internal criteria,
judge the value of a work (art, music, writing) by use of external standards of
excellence.
Question verbs: Justify, appraise, evaluate, judge x according to given criteria. Which option would be better/preferable to party y?
10.5
School-based Assessment
The traditional system of assessment no longer satisfies the educational
and social needs of the third millennium. In the past few decades, many
countries have made profound reforms in their assessment systems.
Several educational systems have in turn introduced school-based
assessment as part of or instead of external assessment in their
certification. While examination bodies acknowledge the immense potential of school-based assessment in terms of validity and flexibility, they have at the same time to guard against or deal with difficulties related to reliability, quality control and quality assurance. In the debate on school-based assessment, the issue of why has been widely written about, and there is general agreement on the principles of validity of this form of assessment.
Izard (2001) as well as Raivoce and Pongi (2001) explain that school-based assessment (SBA) is often perceived as the process put in place to collect evidence of what students have achieved, especially in
Academic:
Non-academic:
Centralised Assessment
Conducted and administered by teachers in schools using instruments, rubrics, guidelines, timelines and procedures prepared by LP
Monitoring and moderation conducted by the PBS Committee at School, District and State Education Department levels, and by LP
School Assessment
The emphasis is on collecting first-hand information about pupils' learning based on curriculum standards
Teachers plan the assessment, prepare the instruments and administer the assessment during the teaching and learning process
Teachers mark pupils' responses and report their progress continuously.
10.6
Alternative Assessment
Traditional tests       | Alternative assessment
One-shot tests          |
Indirect tests          | Direct tests
Inauthentic tests       | Authentic assessment
Individual projects     | Group projects
No feedback to learners |
Speeded exams           | Power exams
                        | Classroom-based tests
Summative               | Formative
Product of instruction  | Process of instruction
Intrusive               | Integrated
Judgmental              | Developmental
Teacher proof           | Teacher mediated
Physical demonstration
Pictorial products
Reading response logs
K-W-L (what I know / what I want to know / what I've learned) charts
Dialogue journals
Checklists
Teacher-pupils conferences
Interviews
Performance tasks
Portfolios
Self assessment
Peer assessment
Portfolios
A well-known and commonly used alternative assessment is the portfolio assessment. The contents of the portfolio become evidence of abilities, much like how we would use a test to measure the abilities of our students.
Bailey (1998, p. 218) describes a portfolio as containing four primary elements. First, it should have an introduction to the portfolio itself which provides an overview of its content. Bailey even suggests that this section include a reflective essay by the student in order to help express the student's thoughts and feelings about the portfolio, perhaps explaining strengths and possible weaknesses as well as why certain pieces are included in the portfolio.
Introductory Section
Overview
Reflective Essay
Personal Section
Assessment Section
Evaluation by peers
Self-evaluation
Journals
Score reports
Photographs
Personal items
4.
3.
I have difficulty with some questions, but I generally get the meaning
2.
1.
stimulate meta-cognition.
EXERCISE
In your opinion, what are the advantages of using portfolios as
a form of alternative assessment?
REFERENCES
Allen, I. J. (2011). Repriviledging reading: The negotiation of uncertainty. Pedagogy: Critical Approaches to Teaching Literature, Language, Composition, and Culture, 12(1), pp. 97-120. Available at: http://pedagogy.dukejournals.org/cgi/doi/10.1215/153142001416540 (Retrieved September 26, 2013)
Alderson, J. C. (1986b). Innovations in language testing? In M. Portal (Ed.), Innovations in language testing, pp. 93-105. Windsor: NFER/Nelson.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.
Anderson, L. W. (Ed.), Krathwohl, D. R. (Ed.), Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., Raths, J., & Wittrock, M. C. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's Taxonomy of Educational Objectives (Complete edition). New York: Longman.
Anderson, K. M., (2007). Differentiating instruction to include all
students. Preventing School Failure, 51 (3) pp. 49-54.
Bachman, L. F. (2004). Statistical analyses for language assessment, pp. 22-23. Cambridge, UK: Cambridge University Press.
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York, NY: Academic Press.
Biggs, J. B., & Collis, K. F. (1991). Multimodal learning and the quality of intelligent behaviour. In H. Rowe (Ed.), Intelligence: Reconceptualization and measurement, pp. 57-75. Hillsdale, NJ: Lawrence Erlbaum.
Moseley, D., Baumfield, V., Elliott, J., Gregson, M., Higgins, S., Miller, J., & Newton, D. (2005). Frameworks for thinking: A handbook for teaching and learning. Cambridge: Cambridge University Press.
Mousavi, S. A. (2009). An encyclopedic dictionary of language testing (4th ed.). Tehran: Rahnama Publications.
Norleha Ibrahim. (2009). Management of measurement and evaluation module. Selangor: Open University Malaysia.
Nückles, M., Hübner, S., & Renkl, A. (2009). Enhancing self-regulated learning by writing learning protocols. Learning and Instruction, 19(3), pp. 259-271. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0959475208000558 (Retrieved March 26, 2013).
Oller, J. W. (1979). Language tests at school: A pragmatic
approach. London: Longman.
Pearson, I. (1988). Tests as levers for change. In D. Chamberlain & R. Baumgardner (Eds.), ESP in the classroom: Practice and evaluation (Vol. 128, pp. 98-107). London: Modern English Publications.
Pimsleur, P. (1966). Pimsleur Language Aptitude Battery. New York, NY: Harcourt, Brace & World.
Shepard, L. A. (2000). The role of assessment in a learning culture. Paper presented at the Annual Meeting of the American Educational Research Association. Available at: http://www.aera.net/meeting/am2000/wrap/praddr01.htm (Retrieved 10.8.2013)
Smith, A. (2011). High performers: The secrets of successful schools. Carmarthen: Crown House Publishing.
Smith, T. W., & Colby, S. A. (2007). Teaching for deep learning. The Clearing House, 80(5), pp. 205-211.
Spaan, M. (2006). Test and item specifications development. Language Assessment Quarterly, 3, pp. 71-79.
Spratt, M. (2005). Washback and the classroom: The implications
for teaching and learning of studies of washback from exams.
Language Teaching Research, 19, 5-29.
Stansfield, C., & Reed, D. (2004). The story behind the Modern Language Aptitude Test: An interview with John B. Carroll (1916-2003). Language Assessment Quarterly, 1, pp. 43-56.
Websites
http://www.catforms.com/pages/Introduction-to-Test-Items.html
(Retrieved 9.8.2013)
http://myenglishpages.com/blog/summative-formativeassessment/ (Retrieved 10.8.2013)
http://www.teachingenglish.org.uk/knowledge-database/objectivetest (Retrieved 12.8.2013)
http://assessment.tki.org.nz/Using-evidence-for-learning/Concepts/Concept/Reliability-and-validity
NAME
NURLIZA BT OTHMAN
othmannurliza@yahoo.com
QUALIFICATIONS