Notes

NOTES: LA
PRACTICALITY
Validity and reliability are not enough to build a test. In addition, the test should be practical
in terms of time, cost, and energy. With respect to time and energy, tests should be efficient to
construct, take, and score. Tests must also be affordable. A valid and reliable test is of little
use if it cannot be administered in remote areas because it requires an expensive computer
(Heaton, 1975: 158-159; Weir, 1990: 34-35; Brown, 2004: 19-20).
PRINCIPLES OF LANGUAGE ASSESSMENT
There are five principles of language assessment: practicality, reliability, validity,
authenticity, and washback.
1. PRACTICALITY
An effective test is practical. This means that it:
is not excessively expensive.
A test that is prohibitively expensive is impractical.
stays within appropriate time constraints.
A test of language proficiency that takes a student 10 hours to complete is impractical.
is relatively easy to administer.
A test that takes a student a few minutes to complete and an examiner several hours to
evaluate is impractical for most classroom situations.
has a scoring/evaluation procedure that is specific and time efficient.
A test that can be scored only by computer if the test takes place a thousand miles away
from the nearest computer is impractical.
Furthermore, for a test to be practical:
administrative details should be clearly established before the test,
students should be able to complete the test reasonably within the set time frame,
all materials and equipment should be ready,
the cost of the test should be within budgeted limits,
the scoring/evaluation system should be feasible within the teacher's time frame.
2. RELIABILITY
A reliable test is consistent and dependable. A number of sources of unreliability may be
identified:
a. Student-related Reliability
A test may yield unreliable results because of temporary factors beyond the control of the
test-taker, such as illness, fatigue, a bad day, or no sleep the night before.
b. Rater (scorer) Reliability
Rater reliability refers to the consistency of scoring, whether by two or more scorers or by a
single scorer. Human error, subjectivity, and bias may enter into the scoring process. Inter-rater
unreliability occurs when two or more scorers yield inconsistent scores for the same test, possibly
because of lack of attention to scoring criteria, inexperience, or inattention. Intra-rater
unreliability arises within a single scorer because of unclear scoring criteria, fatigue, or bias
toward particular "good" and "bad" students.
c. Test Administration Reliability
Unreliability may result from the conditions in which the test is administered. One example
is a test of aural comprehension delivered by tape recorder: when the recorder played the items,
the students sitting next to the windows could not hear the tape accurately because of the street
noise outside the building.
d. Test Reliability
If a test is too long, test-takers may become fatigued by the time they reach the later items
and hastily respond incorrectly.
Test and test administration reliability can be achieved by making sure that all students
receive the same quality of input. Part of achieving test reliability depends on the physical
context: making sure, for example, that every student has a cleanly photocopied test sheet, that
sound amplification is clearly audible to everyone in the room, that video input is equally visible
to all, and that lighting, temperature, and other classroom conditions are equal (and optimal) for
all students.
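The inter-rater consistency described under rater reliability can be checked numerically. The sketch below is only an illustration, not a procedure given in these notes: it computes simple percent agreement and Cohen's kappa (a chance-corrected agreement statistic brought in here as an assumption) for two raters. The function names and the band scores are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of scripts that both raters scored identically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of scripts")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same band at random,
    # given how often each rater used each band.
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical band for every script
    return (observed - expected) / (1 - expected)


# Hypothetical band scores (1-5) given by two raters to the same ten essays.
scores_a = [3, 4, 2, 5, 3, 3, 4, 2, 5, 1]
scores_b = [3, 4, 3, 5, 3, 2, 4, 2, 4, 1]

agreement = percent_agreement(scores_a, scores_b)
kappa = cohens_kappa(scores_a, scores_b)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

A low value on either figure would suggest that the scoring criteria need to be clarified or that the raters need more training before the scores are used.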
3. VALIDITY
Validity is the extent to which inferences made from assessment results are appropriate,
meaningful, and useful for the purpose of the assessment. It is the most complicated yet the most
important principle. One way to gauge validity is statistical correlation with other related
measures.
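As an illustration of the correlation idea, the sketch below computes a Pearson correlation coefficient between scores on a classroom test and scores on another related measure. The function name and both score lists are hypothetical, and Pearson's r is only one of several possible correlation measures; the notes do not specify which one to use.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: six students' scores on a new test and on a related measure.
new_test = [55, 62, 70, 74, 81, 90]
related_measure = [50, 65, 68, 78, 80, 93]

r = pearson_r(new_test, related_measure)
print(f"r = {r:.2f}")
```

A coefficient close to 1 indicates that the two measures rank students similarly. It supports a validity argument but does not by itself prove that the test is valid.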
A. Content-related Validity
A test is said to have content validity when it actually samples the subject matter about which
conclusions are to be drawn and requires the test-taker to perform the behavior being measured.
For example, speaking ability is tested through speaking performance, not a pencil-and-paper test.
Content validity can be identified when we can clearly define the achievement being measured.
It can be achieved by testing performance directly. For example, to test pronunciation, the
teacher should require the students to pronounce the target words orally.
Two questions can be used to apply content validity to classroom tests:
1. Are classroom objectives identified and appropriately framed? The objective should include
a performance verb and a specific linguistic target.
2. Are lesson objectives represented in the form of test specifications? A test should have a
structure that follows logically from the lesson or unit being tested. It can be designed by
dividing the objectives into sections, offering students a variety of item types, and giving
appropriate weight to each section.
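Giving appropriate weight to each section can be made concrete with a small calculation. The sketch below is a hypothetical example only: the section names, weights, and point totals are assumptions chosen for illustration, not values taken from these notes.

```python
def weighted_score(section_results, weights):
    """Combine section results into a single percentage score.

    section_results maps a section name to (points_earned, points_possible);
    weights maps the same section names to fractions that sum to 1.
    """
    if set(section_results) != set(weights):
        raise ValueError("every section needs exactly one weight")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for name, (earned, possible) in section_results.items():
        # Each section contributes its proportion correct times its weight.
        total += weights[name] * (earned / possible)
    return 100 * total


# Hypothetical unit test: writing was the main lesson objective, so it weighs most.
weights = {"listening": 0.3, "grammar": 0.2, "writing": 0.5}
results = {"listening": (8, 10), "grammar": (15, 20), "writing": (18, 30)}

score = weighted_score(results, weights)
print(f"weighted score: {score:.1f}")
```

Weighting by objective rather than by raw point count keeps a long but minor section from dominating the final score.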
B. Criterion-related Validity
Criterion-related validity is the extent to which the criterion of the test has actually been
reached. It is best demonstrated by comparing the results of an assessment with the results of
some other measure of the same criterion.
Criterion-related validity usually falls into two categories:
1. Concurrent Validity: the test results are supported by other concurrent performance beyond
the assessment itself (e.g., a high score on an English final exam supported by actual
proficiency in English).
2. Predictive Validity: the test assesses or predicts the test-taker's likelihood of future
success (e.g., a placement test or an admission assessment).
C. Construct-related Validity
Construct validity asks, "Does the test actually tap into the theoretical construct as it has been
defined?" An informal construct validation of the use of virtually every classroom test is both
essential and feasible. For example, the scoring analysis of an interview includes pronunciation,
fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. Together these
make up the theoretical construct of oral proficiency. Construct validity is a major issue in
validating large-scale standardized tests of proficiency.
D. Consequential Validity
It includes all the consequences of a test, such as its accuracy in measuring the intended criteria,
its impact on the test-takers' preparation, its effect on the learner, and the social consequences
of the test's interpretation and use. One aspect of consequential validity that draws special
attention is the effect of test preparation courses and manuals on performance.
E. Face Validity
Face validity is the extent to which students view the assessment as fair, relevant, and useful for
improving learning. It means that students perceive the test to be valid. A test will be perceived
as valid if it samples the actual content of what the learners have achieved or expect to achieve.
Nevertheless, the psychological state of the test-takers (confidence, anxiety) is an important
factor in their peak performance.
A test with high face validity has the following characteristics:
A well-constructed, expected format with familiar tasks.
Clearly doable within the allotted time.
Clear and uncomplicated test items.
Crystal-clear directions.
Tasks that relate to the students' course work.
A difficulty level that presents a reasonable challenge.
Another phrase associated with face validity is "biased for best". Teachers can make a test that
is biased for best by offering students appropriate review and preparation for the test,
suggesting strategies that will be beneficial, or structuring the test so that the best students
will be modestly challenged and the weaker students will not be overwhelmed.
4. AUTHENTICITY
Authenticity is the degree of correspondence of the characteristics of a given language test task
to the features of a target language task. An authentic task is one that is likely to be
encountered in the real world.
Authenticity can be achieved by:
Using natural language.
Contextualizing the test items.
Giving topics that are meaningful (relevant, interesting) for the learners.
Providing thematic organization to the items (e.g., through a story line or episode).
Giving tasks that represent or closely approximate real-world tasks.
5. WASHBACK
In general terms, washback means the effect of testing on teaching and learning. In large-scale
assessment, it refers to the effects that tests have on instruction in terms of how the students
prepare for the test. In classroom assessment, washback means the beneficial information
that washes back to the students in the form of useful diagnoses of strengths and weaknesses.
In enhancing washback, the teachers should comment generously and specifically on test
performance, respond to as many details as possible, praise strengths, criticize weaknesses
constructively, and give strategic hints to improve performance.
Teachers should make classroom tests serve as learning devices through which washback is
achieved. Students' incorrect responses can become windows of insight into further work. Their
correct responses need to be praised, especially when they represent accomplishments in a
student's interlanguage.
Washback enhances a number of basic principles of language acquisition: intrinsic motivation,
autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others.
Washback also implies that students have ready access to the teacher to discuss the
feedback and evaluation he or she has given.
