1.0 SYNOPSIS

OVERVIEW OF ASSESSMENT: CONTEXT, ISSUES & TRENDS
• Definitions
• Differences
• Purposes of various tests

CONTENT
1.3 INTRODUCTION
1.4.1 Test
1.4.3 Evaluation
The achievements of the Malaysia Examination Syndicate:
i. Implementation of Malay Language as the National Language (1960)
ii. Pioneering the use of computer in the country (1967)
iii. Taking over the work of the Cambridge Examination Syndicate
iv. Putting in place an examination system to meet national needs
v. Recognition of Examination certificates
vi. Implementation of the Open Certificate
Exercise
Describe the stages involved in the development of
educational evaluation in Malaysia.
Read more: http://www.nst.com.my/nation/general/school-based-assessment-plan-may-need-tweaking-1.166386
Tutorial question
Examine the contributing factors to the changing trends of
language assessment.
Create and present findings using graphic organisers.
TOPIC 2 ROLE AND PURPOSES OF ASSESSMENT IN TEACHING AND LEARNING
2.0 SYNOPSIS

Role and Purposes of Assessment in Teaching and Learning
• Reasons / Purposes of Assessment
• Assessment of Learning and Assessment for Learning
• Types of Tests: Proficiency, Achievement, Diagnostic, Aptitude, and Placement Tests
CONTENT
This summative assessment, the logic goes, will provide the focus needed to improve student achievement, give everyone the information they need to do so, and apply the pressure needed to motivate teachers and students to work harder at teaching and learning.
Proficiency Tests
Achievement Tests
Aptitude Tests
This type of test no longer enjoys the widespread use it once had. An aptitude test is designed to measure general ability or capacity to learn a foreign language a priori (before taking a course) and to predict ultimate success in that undertaking. Language aptitude tests were seemingly designed to apply to the classroom learning of any language. In the United States, two common standardised language aptitude tests once used were the Modern Language Aptitude Test (MLAT; Carroll & Sapon, 1958) and the Pimsleur Language Aptitude Battery (PLAB; Pimsleur, 1966). Since there is no research to show unequivocally that these kinds of tasks predict communicative success in a language, apart from untutored language acquisition, standardised aptitude tests are seldom used today, with the exception of identifying foreign language disability (Stansfield & Reed, 2004).
Progress Tests
These tests measure the progress that students are making towards
defined course or programme goals. They are administered at various stages
throughout a language course to see what the students have learned,
perhaps after certain segments of instruction have been completed. Progress
tests are generally teacher produced and are narrower in focus than
achievement tests because they cover a smaller amount of material and
assess fewer objectives.
Placement Tests
These tests, on the other hand, are designed to assess a student's level of language ability for placement in an appropriate course or class. This type
of test indicates the level at which a student will learn most effectively. The
main aim is to create groups, which are homogeneous in level. In designing a
placement test, the test developer may choose to base the test content either
on a theory of general language proficiency or on learning objectives of the
curriculum. In the former, institutions may choose to use a well-established
proficiency test such as the TOEFL or IELTS exam and link it to curricular
benchmarks. In the latter, tests are based on aspects of the syllabus taught
at the institution concerned.
3.0 SYNOPSIS

Types of Tests
• Norm-Referenced
• Criterion-Referenced
3. Options or alternatives
These are the list of possible responses to a test item. There are usually between three and five options/alternatives to choose from.

4. Key
This is the correct response, or the best of the responses offered. In a good item, the key is not obviously distinguishable from the distractors.

5. Distractors
A distractor is an incorrect option included to draw students away from the key. An effective distractor is plausible and close to the correct answer without being correct.
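To make the anatomy of a multiple-choice item concrete, here is a minimal Python sketch; the field names (stem, options, key) and the sample item are purely illustrative, not a prescribed format:

```python
from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    stem: str       # the question or incomplete statement
    options: dict   # label -> option text (usually 3-5 alternatives)
    key: str        # label of the correct (or best) response

    def distractors(self):
        """Every option other than the key acts as a distractor."""
        return {label: text for label, text in self.options.items()
                if label != self.key}

item = MultipleChoiceItem(
    stem="She ___ to school every day.",
    options={"A": "go", "B": "goes", "C": "going", "D": "gone"},
    key="B",
)
print(item.distractors())  # {'A': 'go', 'C': 'going', 'D': 'gone'}
```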
Reflection
1. Objective test items are items that have only one answer or correct
response. Describe in-depth the multiple-choice test item.
4.0 SYNOPSIS

Types of Tests
• Reliability
• Validity
• Practicality
• Authenticity
• Washback Effect
• Objectivity
• Interpretability
CONTENT
4.3 INTRODUCTION
4.4 RELIABILITY
Factors affecting reliability:
• Test Factor
• Teacher and Student Factor
• Environment Factor
• Test Administration Factor
• Marking Factor
c. Environment factors
Because students' grades depend on the way tests are administered, test administrators should strive to provide clear and accurate instructions, sufficient time and careful monitoring in order to improve the reliability of their tests. A test-retest technique can be used to determine test reliability.
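The test-retest technique is usually quantified as the correlation between scores from two administrations of the same test to the same students. A minimal sketch using Pearson's r; the score lists are invented for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores from the same pupils on two administrations
first = [55, 62, 70, 48, 80, 66]
second = [58, 60, 72, 50, 78, 65]
print(round(pearson_r(first, second), 2))  # a value near 1.0 suggests a reliable test
```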
e. Marking factors
4.5 VALIDITY
Content validity: Does the assessment content cover what you want to
assess? Have satisfactory samples of language and language skills been
selected for testing?
Construct validity: Are you measuring what you think you're measuring? Is
the test based on the best available theory of language and language use?
Concurrent validity: Can you use the current test score to estimate scores
of other criteria? Does the test correlate with other existing measures?
b. Content Validity
d. Concurrent Validity
e. Predictive Validity
What are the different types of validity? Describe any three types and
cite examples.
http://www.2dix.com/pdf-2011/testing-and-evaluation-in-esl-pdf.php
4.5.6 Practicality
4.5.7 Objectivity
4.5.9 Authenticity
4.6.0 Interpretability
5.0 SYNOPSIS
Topic 5 exposes you to the stages of test construction, the preparation of a test blueprint/test specifications, the elements in test specification guidelines, and the importance of following the guidelines when constructing test items. Then we look at the various test formats that are appropriate for language assessment.
Stages of Test
Construction
CONTENT
i. determining
ii. planning
iii. writing
iv. preparing
v. reviewing
vi. pre-testing
vii. validating
5.3.1 Determining
The essential first step in testing is to make oneself perfectly
clear about what it is one wants to know and for what purpose. When
we start to construct a test, the following questions have to be
answered.
5.3.2 Planning
The first form that the solution takes is a set of specifications for the test. This will include information on: content, format and timing, criteria, levels of performance, and scoring procedures.
In this stage, the test constructor has to determine the content by answering the following questions:
• Describing the purpose of the test;
• Describing the characteristics of the test takers, that is, the nature of the population of examinees for whom the test is being designed;
• Defining the nature of the ability we want to measure;
• Developing a plan for evaluating the qualities of test usefulness (the degree to which a test is useful for teachers and students), which includes six qualities: reliability, validity, authenticity, practicality, interactiveness, and impact;
• Identifying resources and developing a plan for their allocation and management;
• Determining the format and timing of the test;
• Determining levels of performance;
• Determining scoring procedures.
5.3.3 Writing
Although writing items is time-consuming, writing good items is an art.
No one can expect to be able consistently to produce perfect items.
Some items will have to be rejected, others reworked. The best way to
identify items that have to be improved or abandoned is through
teamwork. Colleagues must really try to find fault; and despite the
seemingly inevitable emotional attachment that item writers develop to
items that they have created, they must be open to, and ready to
accept, the criticisms that are offered to them. Good personal relations
are a desirable quality in any test writing team.
5.3.4 Preparing
One has to understand the major principles and techniques of preparing test items, and have experience of doing so. Not every teacher makes a good tester. To construct different kinds of tests, the tester should observe certain principles. In production-type tests, we have to bear in mind that no comments are necessary. Test writers should also try to avoid test items which can be answered through test-wiseness. Test-wiseness refers to the capacity of examinees to exploit the characteristics and formats of the test to guess the correct answer.
5.3.5 Reviewing
Principles for reviewing test items:
• The test should not be reviewed immediately after its construction, but after some considerable time.
• Other teachers or testers should review it. In a language test, it is preferable if native speakers are available to review the test.
5.3.6 Pre-testing
After reviewing the test, it should be submitted to pre-testing.
• The tester should administer the newly developed test to a group of examinees similar to the target group; the purpose is to analyse every individual item as well as the whole test.
• Numerical data (test results) should be collected to check the efficiency of each item; this should include item facility and discrimination.
5.3.7 Validating
Item Facility (IF) shows to what extent an item is easy or difficult. Items should be neither too easy nor too difficult. To measure the facility (easiness) of an item, the following formula is used:

IF = number of correct responses (c) / total number of candidates (N)

And to measure item difficulty:

IF = number of wrong responses (w) / total number of candidates (N)

The results of these equations range from 0 to 1. An item with a facility index of 0 is too difficult, and one with an index of 1 is too easy. The ideal item has a value of 0.5, and the acceptability range for item facility is between 0.37 and 0.63, i.e. less than 0.37 is difficult, and above 0.63 is easy.
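A quick Python sketch of the facility computation above, applying the 0.37 to 0.63 acceptability range; the response counts are invented for illustration:

```python
def item_facility(correct, total):
    """IF = c / N : proportion of candidates answering correctly."""
    return correct / total

def judge(if_value, low=0.37, high=0.63):
    """Classify an item against the acceptability range."""
    if if_value < low:
        return "difficult"
    if if_value > high:
        return "easy"
    return "acceptable"

# Hypothetical results: item -> number of correct responses out of 40 candidates
results = {"Item 1": 35, "Item 2": 20, "Item 3": 9}
for name, c in results.items():
    if_value = item_facility(c, 40)
    print(name, round(if_value, 2), judge(if_value))
# Item 1 0.88 easy / Item 2 0.5 acceptable / Item 3 0.22 difficult
```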
Thus, tests which are too easy or too difficult for a given sample population often show low reliability. As noted in Topic 4, reliability is one of the complementary aspects of measurement.
Besides knowing the purpose of the test you are creating, you
are required to know as precisely as possible what it is you want to
test. Do not conduct a test hastily. Instead, you need to examine the
objectives for the unit you are testing carefully.
5.5 Bloom's and SOLO Taxonomies
5.5.1 Bloom's Taxonomy (Revised)
Bloom's Taxonomy is a systematic way of describing how a learner's performance develops from simple to complex levels in the affective, psychomotor and cognitive domains of learning. The original Taxonomy provided carefully developed definitions for each of the six major categories in the cognitive domain. The categories were Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. With the exception of Application, each of these was broken into subcategories. The complete structure of the original Taxonomy is shown in Figure 5.1.
[Figure 5.1: the levels of the cognitive domain, Level 1 (C1) through Level 6 (C6)]
On the other hand, the SOLO taxonomy, which stands for the Structure of the Observed Learning Outcome, is a systematic way of describing how a learner's performance develops from simple to complex levels in their learning. Biggs and Collis first introduced it in their 1982 study. There are five stages, namely Prestructural, Unistructural and Multistructural, which form a quantitative phase, and Relational and Extended Abstract, which form a qualitative phase.
The most powerful model for understanding these three levels and
integrating them into learning intentions and success criteria is the
SOLO model.
What is an example of ...?
What is the difference between test format and test type? For example, when you want to introduce a new kind of test, say a reading test, which is organised a little differently from the existing test items, what do you say: test format or test type? Test format refers to the layout of questions on a test. For example, the format of a test could be two essay questions, 50 multiple-choice questions, etc. For the sake of brevity, I will consider providing the outlines of some large-scale standardised tests.
UPSR
6.0 SYNOPSIS
Topic 6 focuses on ways to assess language skills and language
content. It defines the types of test items used to assess language
skills and language content. It also provides teachers with suggestions
on ways a teacher can assess the listening, speaking, reading and
writing skills in a classroom. It also discusses concepts of and
differences between discrete point test, integrative test and
communicative test.
ASSESSING LANGUAGE SKILLS AND LANGUAGE CONTENT
• Language Skills: Listening, Speaking, Reading and Writing
• Objective and Subjective Testing
• Language Content: Discrete Test, Integrative Test, Communicative Test
CONTENT
c. Reading
Cohen (1994) discussed various types of reading and the meanings assessed. He describes skimming and scanning as two different types of reading. In the first, a respondent is given a lengthy passage and is required to inspect it rapidly (skim) or read to locate specific information (scan) within a short period of time. He also discusses receptive or intensive reading, which refers to a form of reading aimed at discovering exactly what the author seeks "to convey" (p. 218). This is the most common form of reading, especially under test or assessment conditions. Another type of reading is responsive reading, where respondents are expected to respond to some point in a reading text through writing or by answering questions.
A reading text can also convey various kinds of meaning, and reading involves the interpretation or comprehension of these meanings. First, grammatical meaning covers meanings that are expressed through linguistic structures, such as complex and simple sentences, and the correct interpretation of those structures. A second meaning is informational meaning, which refers largely to the concepts or messages contained in the text. Respondents may be required to comprehend merely the information or content of the passage, and this may be assessed through various means such as summary and précis writing. Compared to grammatical or syntactic meaning, informational meaning requires a more general understanding of a text rather than close attention to the linguistic structure of sentences. A third meaning contained in many texts is discourse meaning. This refers to the perception of rhetorical functions conveyed by the text. One typical function is discourse marking, which adds cohesiveness to a text. These words, such as unless, however, thus, therefore etc., are crucial to the correct interpretation of a text, and students may be assessed on their ability to understand the discoursal meaning that they bring to the passage. Finally, a fourth meaning which may also be an object of assessment in a reading test is the meaning conveyed by the writer's tone. The writer's tone, whether it is cynical, sarcastic, sad or otherwise, is important in reading comprehension but may be quite difficult to identify, especially for less proficient learners. Nevertheless, there can be many situations where the reader comprehends a text completely wrongly simply because he has failed to perceive the correct tone of the author.
d. Writing
Brown (2004) identifies three different genres of writing: academic writing, job-related writing and personal writing, each of which can be expanded to include many different examples. Fiction, for example, may be considered personal writing according to Brown's taxonomy. Brown (2010) identified four categories of written performance that capture the range of written production which can be used to assess writing skill.
There are many examples of each type of test. Objective tests include the multiple-choice test, true-false items and matching items, because each of these is graded objectively. In these examples of objective tests, there is only one correct response and the grader does not need to assess the response subjectively.
Two other terms, select-type tests and supply-type tests, are related terms when we think of objective and subjective tests. In most cases, objective tests are similar to select-type tests, where students are expected to select or choose the answer from a list of options. Just as a multiple-choice test is an objective test, it can also be considered a select-type test. Similarly, tests involving essay-type questions are supply-type, as the students are expected to supply the answer through their essay. How then would you classify a fill-in-the-blank test? Definitely, for this type of test the students need to supply the answer, but what is supplied is merely a single word or a short phrase, which differs tremendously from an essay. It may therefore be helpful to once again consider a continuum with supply-type and select-type items at each end of the continuum respectively.
In addition to the above, Brown and Hudson (1998) have also suggested three broad categories to differentiate tests according to how students are expected to respond. These categories are selected-response tests, constructed-response tests, and personal-response tests. Examples of each of these types of tests are given in Table 6.1.
b. Communicative Test
As language teaching has emphasised the importance of
communication through the communicative approach, it is not
surprising that communicative tests have also been given prominence.
A communicative emphasis in testing involves many aspects, two of which revolve around communicative elements in tests and meaningful content. Both these aspects are briefly addressed in the following subsections. Communicative tests:
• involve performance;
• are authentic; and
• are scored on real-life outcomes.
In short, the kinds of tests that we should expect more of in the future will be communicative tests in which candidates actually have to produce the language in an interactive setting involving some degree of unpredictability, which is typical of any language interaction situation. These tests would also take the communicative purpose of the interaction into consideration and require the student to interact with language that is actual and unsimplified for the learner. Fulcher finally points out that in a communicative test, the only real criterion of success is the behavioural outcome, or whether the learner was able to achieve the intended communicative effect (p. 493). It is obvious from this description that the communicative test may not be so easily developed and implemented. Practical considerations may hinder some of the demands listed. Nevertheless, a solution to this problem has to be found in the near future in order to have valid language tests that are purposeful and can stimulate positive washback in teaching and learning.
Exercise 1
7.0 SYNOPSIS
Topic 7 focuses on scoring, grading and assessment criteria. It provides teachers with brief descriptions of the different approaches to scoring, namely objective, holistic and analytic.
Approaches to
scoring
CONTENT
Rating | Criteria

5-6
• Vocabulary is precise, varied, and vivid.
• Organisation is appropriate to the writing assignment and contains a clear introduction, development of ideas, and conclusion.
• Transition from one idea to another is smooth and provides the reader with a clear understanding that the topic is changing.
• Meaning is conveyed effectively.
• A few mechanical errors may be present but do not disrupt communication.
• Shows a clear understanding of writing and topic development.

4
• Vocabulary is adequate for grade level.
• Events are organised logically, but some parts of the sample may not be fully developed.
• Some transition of ideas is evident.
• Meaning is conveyed but breaks down at times.
• Mechanical errors are present but do not disrupt communication.
• Shows a good understanding of writing and topic development.

3
• Vocabulary is simple.
• Organisation may be extremely simple or there may be evidence of disorganisation.
• There are few transitional markers, or transitional markers are repetitive.
• Meaning is frequently not clear.
• Mechanical errors affect communication.
• Shows some understanding of writing and topic development.

2
• Vocabulary is limited and repetitious.
• The sample consists of only a few disjointed sentences.
• No transitional markers.
• Meaning is unclear.
• Mechanical errors cause serious disruption in communication.
• Shows little evidence of discourse understanding.

1
• Responds with a few isolated words.
• No complete sentences are written.
• No evidence of concepts of writing.

0
• No response.
The six-point scale above includes broad descriptors of what a student's essay reflects for each band. It is quite apparent that graders using this scale are expected to pay attention to vocabulary, meaning, organisation, topic development and communication. Mechanics such as punctuation are secondary to communication.
Bailey also describes another type of scoring related to the holistic approach, which she refers to as primary trait scoring. In primary trait scoring, a particular functional focus is selected based on the purpose of the writing, and grading is based on how well the student is able to express that function. For example, if the function is to persuade, scoring would reflect how well the author has been able to persuade the grader rather than how well organised the ideas were, or how grammatical the structures in the essay were. This approach to grading emphasises functional and communicative ability rather than discrete linguistic ability and accuracy.
Components      Weight
Content         30 points
Organisation    20 points
Vocabulary      20 points
Language Use    25 points
Mechanics        5 points
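Under this analytic scheme the five component marks simply sum to a score out of 100. A minimal sketch; the component names come from the table above, while the pupil's awarded marks are invented:

```python
# Maximum marks per component, as in the weighting table above
weights = {"Content": 30, "Organisation": 20, "Vocabulary": 20,
           "Language Use": 25, "Mechanics": 5}

# Hypothetical marks awarded to one essay
awarded = {"Content": 24, "Organisation": 15, "Vocabulary": 16,
           "Language Use": 19, "Mechanics": 4}

# No component may exceed its weighting
assert all(awarded[k] <= weights[k] for k in weights)
print("Total:", sum(awarded.values()), "/", sum(weights.values()))
# Total: 78 / 100
```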
Each of the three scoring approaches has its own advantages and disadvantages. These are illustrated in Table 7.2.
EXERCISE
8.0 SYNOPSIS
Topic 8 focuses on item analysis and interpretation. It provides teachers with
brief descriptions on basic statistics terminologies such as mode, median,
mean, standard deviation, standard score and interpretation of data. It will also
look at some item analysis that deals with item difficulty and item discrimination.
Teachers will also be introduced to distractor analysis in language assessment.
ITEM ANALYSIS AND INTERPRETATION
• Basic Statistics: Mean, Median, Standard Score, Interpretation of Data
• Item Analysis: Item Discrimination
CONTENT
Let us assume that you have just graded the test papers for your class. You
now have a set of scores. If a person were to ask you about the performance
of the students in your class, it would be very difficult to give all the scores in
the class. Instead, you may prefer to cite only one score.
Or perhaps you would like to report on the performance by giving some
values that would help provide a good indication of how the students in your
class performed. What values would you give? In this section, we will look at
two kinds of measures, namely measures of central tendency and measures
of dispersion. Both these types of measures are useful in score reporting.
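As a quick illustration of the two kinds of measures, Python's statistics module can report central tendency and simple dispersion values for a set of scores; the scores below are invented:

```python
import statistics

scores = [55, 62, 70, 48, 80, 66, 62]

# Measures of central tendency
print("mean  :", round(statistics.mean(scores), 2))
print("median:", statistics.median(scores))
print("mode  :", statistics.mode(scores))

# Measures of dispersion
print("range :", max(scores) - min(scores))
print("stdev :", round(statistics.stdev(scores), 2))
```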
Standard deviation refers to how much the scores deviate from the mean. There are two methods of calculating standard deviation, the deviation method and the raw score method, which are illustrated by the following formulae:
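(The formulae are stated here in their standard form; the division by N − 1 is an assumption made to match the worked value of 5 obtained below.)

Deviation method:  s = √[ Σ(x − x̄)² / (N − 1) ]

Raw score method:  s = √[ ( Σx² − (Σx)²/N ) / (N − 1) ]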
To illustrate this, we will use the scores 20, 25 and 30. Using the deviation method, we come up with the following table:
Using the raw score method, we can come up with the following:
Table 8.2: Calculating the Standard Deviation Using the Raw Score Method
Both methods result in the same final value of 5. If you are calculating
standard deviation with a calculator, it is suggested that the deviation
method be used when there are only a few scores and the raw score
method be used when there are many scores. This is because when
there are many scores, it will be tedious to calculate the square of the
deviations and their sum.
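A minimal Python check of both methods on the scores 20, 25 and 30 used above:

```python
import math

scores = [20, 25, 30]
n = len(scores)

# Deviation method: square each score's deviation from the mean
mean = sum(scores) / n
dev_sq = sum((x - mean) ** 2 for x in scores)
sd_deviation = math.sqrt(dev_sq / (n - 1))

# Raw score method: work from the raw sums only
sum_x = sum(scores)
sum_x2 = sum(x ** 2 for x in scores)
sd_raw = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(sd_deviation, sd_raw)  # 5.0 5.0, the same final value either way
```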
Z score values are very small and usually range only from −2 to +2. Such small values make them inappropriate for score reporting, especially for those unaccustomed to the concept. Imagine what a parent might say if his child came home with a report card showing a Z score of −0.47 in English Language! Fortunately, there is another form of standardised score, the T score, with values that are more palatable to the relevant parties.
How can En. Abu solve this problem? He would have to have
standardised scores in order to decide. This would require the
following information:
Using the information above, En. Abu can find the Z score for each
raw score reported as follows:
Based on Table 8.4, both Ali and Chong have a negative Z score as their total score for both tests. However, Chong has the higher Z score total (i.e. −1.07 compared to −1.34) and therefore performed better when we take the performance of all the other students into consideration.
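A sketch of the standardisation step En. Abu would carry out; the class means and standard deviations below are invented for illustration, and the T score uses the usual conversion T = 10Z + 50:

```python
def z_score(raw, mean, sd):
    """Standardise a raw score against the class mean and SD."""
    return (raw - mean) / sd

def t_score(z):
    """Rescale Z to the more palatable T scale."""
    return 10 * z + 50

# Hypothetical class statistics for two tests
tests = {"Test 1": {"mean": 60, "sd": 8}, "Test 2": {"mean": 55, "sd": 10}}
ali = {"Test 1": 52, "Test 2": 48}
chong = {"Test 1": 50, "Test 2": 51}

for pupil, raw in (("Ali", ali), ("Chong", chong)):
    zs = [z_score(raw[t], s["mean"], s["sd"]) for t, s in tests.items()]
    print(pupil, "Z total:", round(sum(zs), 2),
          "T scores:", [round(t_score(z), 1) for z in zs])
```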
THE NORMAL CURVE
a. Item difficulty
Item difficulty refers to how easy or difficult an item is. The formula used to measure item difficulty is quite straightforward: it involves finding out how many students answered an item correctly and dividing that by the number of students who took the test. The formula is therefore:

Item difficulty = number of correct responses / total number of students
b. Item discrimination
Let's use the following instance as an example. Suppose you have just conducted a twenty-item test and obtained the following results:
Table 8.5: Item Discrimination
As there are twelve students in the class, 33% of this total would be 4
students. Therefore, the upper group and lower group will each consist
of 4 students each. Based on their total scores, the upper group would
consist of students L, A, E, and G while the lower group would consist
of students J, H, D and I.
We now need to look at the performance of these students for each
item in order to find the item discrimination index of each item.
For item 1, all four students in the upper group (L, A, E, and G)
answered correctly while only student H in the lower group answered
correctly. Using the formula described earlier, we can plug in the
numbers as follows:
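Assuming the standard discrimination index formula, D = (U − L) / n, where U and L are the numbers answering correctly in the upper and lower groups and n is the size of one group (an assumption consistent with the worked values in this topic), a minimal sketch for item 1:

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """D = (U - L) / n, ranging from -1 to +1."""
    return (upper_correct - lower_correct) / group_size

# Item 1: all 4 upper-group students correct, 1 lower-group student correct
print(discrimination_index(4, 1, 4))  # (4 - 1) / 4 = 0.75
```

The positive value of 0.75 indicates that item 1 discriminates well: far more upper-group than lower-group students answered it correctly.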
c. Distractor analysis
Let us assume that 100 students took the test. If we assume that A is the
answer and the item difficulty is 0.7, then 70 students answered correctly.
What about the remaining 30 students and the effectiveness of the three
distractors? If all 30 selected D, then distractors B and C are useless in
their role as distractors. Similarly, if 15 students selected D and another 15
selected B, then C is not an effective distractor and should be replaced.
Therefore, the ideal situation would be for each of the three distractors to
be selected by an equal number of all students who did not get the answer
correct, i.e. in this case 10 students. Therefore the effectiveness of each distractor can be quantified as 10/100 or 0.1, where 10 is the number of students who selected the item and 100 is the total number of students who took the test. This technique is similar to a difficulty index, although the
result does not indicate the difficulty of each item, but rather the
effectiveness of the distractor. In the first situation described in this
paragraph, options A, B, C and D would have a difficulty index of 0.7, 0, 0,
and 0.3 respectively. If the distractors worked equally well, then the indices
would be 0.7, 0.1, 0.1, and 0.1. Unlike in determining the difficulty of an
item, the value of the difficulty index formula for the distractors must be
interpreted in relation to the indices for the other distractors.
From a different perspective, the item discrimination formula can also be
used in distractor analysis. The concept of upper groups and lower groups
would still remain, but the analysis and expectation would differ slightly
from the regular item discrimination that we have looked at earlier. Instead
of expecting a positive value, we should logically expect a negative value
as more students from the lower group should select distractors. Each
distractor can have its own item discrimination value in order to analyse
how the distractors work and ultimately refine the effectiveness of the test
item itself.
Option:   A    B    C    D
Item 1    8*   3    1    0
Item 2    2    8*   2    0
Item 3    4    8*   0    0
Item 4    1    3    8*   0
Item 5    5    0    0    7*
(* indicates the key)
For Item 1, the discrimination index for each distractor can be calculated using the discrimination index formula. From Table 8.5, we know that all the students in the upper group answered this item correctly and only one student from the lower group did so. If we assume that the three remaining students from the lower group all selected distractor B, then the discrimination index for item 1, distractor B will be:

(0 − 3) / 4 = −0.75
This negative value indicates that more students from the lower group
selected the distractor compared to students from the upper group. This
result is to be expected of a distractor and a value of -1 to 0 is preferred.
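Both distractor measures described above can be computed together. A minimal sketch using the Item 1 row of the table above (key A, 12 candidates); the split of wrong answers between upper and lower groups is the assumed one from the worked example:

```python
def distractor_effectiveness(counts, key, total):
    """Share of all candidates choosing each non-key option."""
    return {opt: n / total for opt, n in counts.items() if opt != key}

def distractor_discrimination(upper, lower, group_size):
    """D per distractor; a value between -1 and 0 is preferred."""
    return {opt: (upper.get(opt, 0) - lower.get(opt, 0)) / group_size
            for opt in set(upper) | set(lower)}

# Item 1 from the table above: key A, 12 candidates in all
counts = {"A": 8, "B": 3, "C": 1, "D": 0}
print(distractor_effectiveness(counts, key="A", total=12))

# Assumed split of the wrong answers (as in the worked example)
upper = {"B": 0}   # no upper-group student chose B
lower = {"B": 3}   # the three remaining lower-group students chose B
print(distractor_discrimination(upper, lower, group_size=4))  # {'B': -0.75}
```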
EXERCISE
1. Calculate the mean, mode, median and range of the following set of
scores:
23, 24, 25, 23, 24, 23, 23, 26, 27, 22, 28.
2. What is a normal curve and what does this show? Does the final
result always show a normal curve and how does this relate to
standardised tests?
TOPIC 9 REPORTING OF ASSESSMENT DATA
9.0 SYNOPSIS
Topic 9 focuses on reporting assessment data. It provides teachers with brief
descriptions on the purposes of reporting and the reporting methods.
REPORTING OF ASSESSMENT DATA
• Purposes of Reporting
• Reporting Methods
CONTENT
iii. An outcomes approach
Acknowledges that students, regardless of their class or grade, can be working towards syllabus outcomes anywhere along the learning continuum.
10.0 SYNOPSIS
Topic 10 focuses on the issues and concerns related to assessment in the
Malaysian primary schools. It will look at how assessment is viewed and used
in Malaysia.
Issues and Concerns in Malaysian Schools
• Exam-oriented system
• School-based assessment
• Alternative assessment
• Cognitive levels of assessment
CONTENT
SESSION TEN (3 hours)
10.3 Exam-oriented System
Educational administration in Malaysia is highly centralised, with four hierarchical levels: federal, state, district and, at the lowest level, school. Major decision- and policy-making takes place at the federal level, represented by the Ministry of Education (MoE), which consists of the Curriculum Development Centre, the school division, and the Malaysian Examination Syndicate (MES).
The current education system in Malaysia is too examination-oriented and over-emphasises rote learning, with institutions of higher learning fast becoming mere diploma mills. Like most Asian countries (e.g., Gang 1996; Lim and Tan 1999; Choi 1999), Malaysia has so far focused on public examination results as important determinants of students' progression to higher levels of education or occupational opportunities (Chiam 1984).
The Malaysian education system requires all students to sit for public examinations at the end of each level of schooling. There are four public examinations from primary to post-secondary education. These are the Primary School Achievement Test (UPSR) at the end of six years of primary education, the Lower Secondary Examination (PMR) at the end of another three years' schooling, the Malaysian Certificate of Education (SPM) at the end of 11 years of schooling, and the Malaysian Higher School Certificate Examination (STPM) or the Higher Malaysian Certificate for Religious Education (STAM) at the end of 13 years' schooling (MoE 2004).
Knowledge
Comprehension
Application
Analysis
Synthesis
Evaluation
Knowledge
Learning objectives at this level: know common terms, know specific facts,
know methods and procedures, know basic concepts, know principles.
Question verbs: Define, list, state, identify, label, name, who? when? where?
what?
Comprehension
The ability to grasp the meaning of material: translating material from one form to another (words to numbers), interpreting material (explaining or summarising), estimating future trends (predicting consequences or effects). This goes one step beyond the simple remembering of material, and represents the lowest level of understanding.
Application
The ability to use learned material in new and concrete situations. Applying
rules, methods, concepts, principles, laws, and theories. Learning outcomes
in this area require a higher level of understanding than those under
comprehension.
Question verbs: How could x be used to y? How would you show, make use
of, modify, demonstrate, solve, or apply x to conditions y?
Analysis
The ability to break down material into its component parts. Identifying parts,
analysis of relationships between parts, recognition of the organizational
principles involved. Learning outcomes here represent a higher intellectual
level than comprehension and application because they require an
understanding of both the content and the structural form of the material.
Synthesis
The ability to put parts together to form a new whole. This may involve the
production of a unique communication (theme or speech), a plan of
operations (research proposal), or a set of abstract relations (scheme for
classifying information). Learning outcomes in this area stress creative
behaviors, with major emphasis on the formulation of new patterns or
structure.
Learning objectives at this level: write a well organized paper, give a well
organized speech, write a creative short story (or poem or music), propose a
plan for an experiment, integrate learning from different areas into a plan for
solving a problem, formulate a new scheme for classifying objects (or events,
or ideas).
Evaluation
The ability to judge the value of material (statement, novel, poem, research
report) for a given purpose. The judgments are to be based on definite
criteria, which may be internal (organization) or external (relevance to the
purpose). The student may determine the criteria or be given them. Learning
outcomes in this area are highest in the cognitive hierarchy because they
contain elements of all the other categories, plus conscious value judgments
based on clearly defined criteria.
Summative vs. Formative
Intrusive vs. Integrated
Judgmental vs. Developmental
• Physical demonstration
• Pictorial products
• Reading response logs
• K-W-L (what I know / what I want to know / what I've learned) charts
• Dialogue journals
• Checklists
• Teacher-pupil conferences
• Interviews
• Performance tasks
• Portfolios
• Self-assessment
• Peer assessment
Portfolios
3. I have difficulty with some questions, but I generally get the meaning
EXERCISE
In your opinion, what are the advantages of using portfolios as
a form of alternative assessment?
REFERENCES

Biggs, J. B., & Collis, K. F. (1991). Multimodal learning and the quality of intelligent behaviour. In H. Rowe (Ed.), Intelligence: Reconceptualization and measurement (pp. 57-75). Hillsdale, NJ: Lawrence Erlbaum.

Davies, A., Brown, A., Elder, C., Hill, K., Lumley, T., & McNamara, T. (1999). Dictionary of language testing. Cambridge: University of Cambridge Local Examinations Syndicate and Cambridge University Press.

Moseley, D., Baumfield, V., Elliott, J., Gregson, M., Higgins, S., Miller, J., & Newton, D. (2005). Frameworks for thinking: A handbook for teaching and learning. Cambridge: Cambridge University Press.

Smith, T. W., & Colby, S. A. (2007). Teaching for deep learning. The Clearing House, 80(5), 205-211.

Stansfield, C., & Reed, D. (2004). The story behind the Modern Language Aptitude Test: An interview with John B. Carroll (1916-2003). Language Assessment Quarterly, 1, 43-56.
Websites

http://www.catforms.com/pages/Introduction-to-Test-Items.html (Retrieved 9.8.2013)

http://myenglishpages.com/blog/summative-formative-assessment/ (Retrieved 10.8.2013)

http://www.teachingenglish.org.uk/knowledge-database/objective-test (Retrieved 12.8.2013)

http://assessment.tki.org.nz/Using-evidence-for-learning/Concepts/Concept/Reliability-and-validity
MODULE WRITING PANEL
GRADUATE TEACHER PROGRAMME
DISTANCE LEARNING MODE
(PRIMARY EDUCATION)

NAME: NURLIZA BT OTHMAN (othmannurliza@yahoo.com)
QUALIFICATIONS:
M.A. TESL, University of North Texas, USA
B.A. (Hons) English, North Texas State University, USA
Graduate Teacher Training Certificate (Ministry of Education Malaysia)
WORK EXPERIENCE:
4 years as a secondary school teacher
21 years as a lecturer at IPG

NAME: ANG CHWEE PIN (chweepin819@yahoo.com)
QUALIFICATIONS:
M.Ed. TESL, Universiti Teknologi Malaysia
B.Ed. (Hons.) Agricultural Science/TESL, Universiti Pertanian Malaysia
WORK EXPERIENCE:
23 years as a secondary school teacher
7 years as a lecturer at IPG