
DESIGNING CLASSROOM LANGUAGE TESTS

Classroom tests play a central role in the evaluation of students' learning.

The main goal of classroom testing is to obtain valid, reliable, and useful information concerning students' achievement (Gronlund, 1981:123). An appropriate classroom test can help to show students the progress they are making. Heaton (1990:6) stated that the most useful tests in our classrooms are the tests we write ourselves. As teachers, we know our students' strengths and weaknesses, and the skills and areas of language we wish to emphasize in teaching and testing.

A. TEST TYPES

In designing a test, we first have to determine its purpose. Defining the purpose helps us choose the right kind of test and focus on its specific objectives (Brown, 2004:43).

1. Language Aptitude Tests
A language aptitude test is designed to measure capacity or general ability to learn a foreign language. It predicts a person's success prior to exposure to the second language.

2. Proficiency Tests
A proficiency test measures language ability based on what is needed for a particular purpose, e.g., English for secretaries, English for car mechanics, and so on. It aims to test global competence in a language: overall ability rather than a single skill. The TOEFL is a typical example of a standardized proficiency test.

3. Placement Tests
The purpose of a placement test is to place a student into a particular level or section of a language curriculum or school. It usually includes a sampling of the material to be covered in the various courses in a curriculum.

4. Diagnostic Tests
A diagnostic test is designed to diagnose specified aspects of a language. Heaton (1990:11) underlined that a good diagnostic test helps us check students' progress for specific weaknesses and problems they may have encountered. Simply put, it identifies learners' strengths and weaknesses and helps teachers decide what needs to be taught.

5. Achievement Tests
An achievement test aims to determine whether, by the end of a period of instruction, course objectives have been met and appropriate knowledge and skills acquired. It is usually a formal examination given at the end of the school year or at the end of the course. An effective achievement test will offer washback about the quality of a learner's performance in subsets of the unit or course (Brown, 2004:48).

B. SOME PRACTICAL STEPS TO TEST CONSTRUCTION

1. Assessing clear, unambiguous objectives

Every curriculum should have appropriately framed assessable objectives, that is, objectives stated in terms of overt performance by students. The first task in designing a test is to determine appropriate objectives. Each objective is stated in terms of the performance elicited and the target linguistic domain. Below are selected objectives for a unit in a low-intermediate integrated-skills course, numbered 1-9 across the four skill areas so that the test specifications can refer to them.

a. Form-focused objectives (listening and speaking). Students will
   1. recognize and produce tag questions, with the correct grammatical form and final intonation pattern, in simple social conversations;
   2. recognize and produce wh-information questions with correct final intonation pattern.
b. Communication skills (speaking). Students will
   3. state completed actions and events in a social conversation;
   4. ask for confirmation in a social conversation;
   5. give opinions about an event in a social conversation;
   6. produce language with contextually appropriate intonation, stress, and rhythm.
c. Reading skills (simple essay or story). Students will
   7. recognize the irregular past tense of selected verbs in a story or essay.
d. Writing skills (simple essay or short story). Students will
   8. write a one-paragraph story about a simple event in the past;
   9. use the conjunctions "so" and "because" in a statement of opinion.

2. Drawing up test specifications

Test specifications for classroom use can be a simple, practical outline of the test: what skills we will test and what the items and tasks will look like.

Test specifications:
Speaking (5 minutes per person, previous day)
  Format: oral interview, T and S
  Task: T asks questions of S (objectives 3-5; emphasis on 6)
Listening (10 minutes)
  Format: T makes an audiotape in advance, with one other voice on it
  Tasks: a. 5 minimal-pair items, multiple choice (objective 1)
         b. 5 interpretation items, multiple choice (objective 2)
Reading (10 minutes)
  Format: cloze test items (10 total) in a story line
  Task: fill in the blanks (objective 7)
Writing (10 minutes)
  Format: prompt for a topic: "Why I liked/didn't like a recent TV sitcom"
  Task: writing a short opinion paragraph (objective 9)
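For teachers who keep their specifications in machine-readable form, an outline like the one above can be captured in a simple data structure. This is a hypothetical sketch; the dictionary layout and field names are our own illustration, while the sections, times, and objective numbers follow the outline.

```python
# Hypothetical machine-readable form of the test specifications above.
# Field names ("time_minutes", "format", "tasks") are our own invention.
test_spec = {
    "Speaking": {
        "time_minutes": 5,  # per person, previous day
        "format": "oral interview, T and S",
        "tasks": [{"task": "T asks questions of S", "objectives": [3, 4, 5, 6]}],
    },
    "Listening": {
        "time_minutes": 10,
        "format": "audiotape made in advance, with one other voice on it",
        "tasks": [
            {"task": "5 minimal-pair items, multiple choice", "objectives": [1]},
            {"task": "5 interpretation items, multiple choice", "objectives": [2]},
        ],
    },
    "Reading": {
        "time_minutes": 10,
        "format": "cloze test items (10 total) in a story line",
        "tasks": [{"task": "fill in the blanks", "objectives": [7]}],
    },
    "Writing": {
        "time_minutes": 10,
        "format": "prompt: why I liked/didn't like a recent TV sitcom",
        "tasks": [{"task": "short opinion paragraph", "objectives": [9]}],
    },
}

# Total administration time and objective coverage fall out directly:
total_time = sum(s["time_minutes"] for s in test_spec.values())
covered = sorted({o for s in test_spec.values()
                    for t in s["tasks"] for o in t["objectives"]})
print(total_time)  # 35
print(covered)     # [1, 2, 3, 4, 5, 6, 7, 9]
```

A representation like this makes it easy to check at a glance which objectives a draft test covers and which it leaves out.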

Classroom-oriented specifications give you an indication of:
- the topics (objectives) you will cover,
- the implied elicitation and response formats for the items,
- the number of items in each section, and
- the time to be allocated for each.

3. Devising test tasks

In revising the draft, ask yourself some important questions:
1. Are the directions to each section absolutely clear?
2. Is there an example item for each section?
3. Does each item measure a specified objective?
4. Is each item stated in clear, simple language?
5. Does each multiple-choice item have appropriate distractors?
6. Is the difficulty of each item appropriate for your students?
7. Is the language of each item sufficiently authentic?
8. Do the sum of the items and the test as a whole adequately reflect the learning objectives?

4. Designing multiple-choice test items

According to Gronlund and Waugh (2009:91), multiple-choice items are the most widely used and highly regarded of the selection-type items. They can be designed to measure a variety of learning outcomes, from simple to complex, and can provide the highest-quality items. A multiple-choice item consists of a stem, which presents a problem situation, and several alternatives (options or choices), which provide possible solutions to the problem. The stem may be a question or an incomplete statement. The alternatives include the correct answer and several plausible wrong answers called distractors, whose function is to distract those students who are uncertain of the answer. There are two types of multiple-choice items: the question form, also called the best-answer form, and the incomplete-statement form. The question form is used for more complex achievement; the alternatives are all partially correct, but one is clearly better than the others. It is easier to write and forces the test maker to pose a clear problem, but it tends to result in a longer stem. The incomplete-statement form is more concise.
The following items illustrate the question form and the incomplete-statement form:

Example:
Which one of the following item types is an example of a supply-type test item?
a. Multiple-choice item
b. True-false item
c. Matching item
d. Short-answer item

An example of a supply-type test item is the
a. multiple-choice item
b. true-false item
c. matching item
d. short-answer item

The examples given illustrate the use of four alternatives. Multiple-choice items typically include three, four, or five choices; the larger number reduces the students' chances of obtaining the correct answer by guessing.

The strengths and limitations of multiple-choice items:

Strengths:
1. Learning outcomes from simple to complex can be measured.
2. Highly structured and clear tasks are provided.
3. A broad sample of achievement can be measured.
4. Incorrect alternatives provide diagnostic information.
5. Scores are less influenced by guessing than with true-false items.
6. Scoring is easy, objective, and reliable.

Limitations:
1. Constructing good items is time-consuming.
2. It is frequently difficult to find plausible distractors.
3. The item type is ineffective for measuring some types of problem solving and the ability to organize and express ideas.
4. Scores can be influenced by reading ability.

Rules for writing multiple-choice items

An effective multiple-choice item presents students with a task that is both important and clearly understood, and one that can be answered correctly by anyone who has achieved the intended learning outcome. The following rules are intended as guides for preparing multiple-choice items that function as intended.

1. Design each item to measure an important learning outcome. The problem situation around which an item is built should be important and related to the intended learning outcome to be measured. When writing the item, focus on the functioning content and resist the temptation to include irrelevant material or more obscure and less significant content to increase item difficulty. The purpose of each item is to call forth the type of performance that will help determine the extent to which the intended learning outcomes have been achieved.

Example: Where did George go after the party last night?
a. Yes, he did
b. Because he was tired
c. To Elaine's place for another party (C)
d. Around eleven o'clock

The specific objective being tested is comprehension of wh-questions.

2. Present a single, clearly formulated problem in the stem of the item.
The task set forth in the stem of the item should be so clear that a student can understand it without reading the alternatives.

Example:
Poor: A table of specifications
a. indicates how a test will be used to improve learning
b. provides a more balanced sampling of content (C)
c. arranges the instructional objectives in order of their importance
d. specifies the method of scoring to be used on a test

Better: What is the main advantage of using a table of specifications when preparing an achievement test?
a. It reduces the amount of time required.
b. It improves the sampling of content. (C)
c. It makes the construction of test items easier.
d. It increases the objectivity of the test.

3. State the stem of the item in simple, clear language. The problem in the stem of a multiple-choice item should be stated as precisely as possible and should be free of unnecessarily complex wording and sentence structure.

Example:
Poor: The paucity of plausible, but incorrect, statements that can be related to a central idea poses a problem when constructing which one of the following types of test items?
a. Short answer
b. True-false
c. Multiple choice (C)
d. Essay

Better: The lack of plausible, but incorrect, alternatives will cause the greatest difficulty when constructing
a. short-answer items
b. true-false items
c. multiple-choice items (C)
d. essay items

4. Put as much of the wording as possible in the stem of the item. Avoid repeating the same material in each of the alternatives.

Example:
Poor: In objective testing, the term "objective"
a. refers to the method of identifying the learning outcomes
b. refers to the method of selecting the test content
c. refers to the method of presenting the problem
d. refers to the method of scoring the answers (C)

Better: In objective testing, the term "objective" refers to the method of
a. identifying the learning outcomes
b. selecting the test content
c. presenting the problem
d. scoring the answers (C)

5. State the stem of the item in positive form, wherever possible.

A positively phrased test item tends to measure more important learning outcomes than a negatively stated one, because the best method or the most relevant argument typically has greater educational significance than the poorest method or the least relevant argument.

Example:
Item one: Which one of the following is a category in the revised taxonomy of the cognitive domain?
a. Understand (C)
b. (distractor needed)
c. (distractor needed)
d. (distractor needed)

Item two: Which one of the following is NOT a category in the revised taxonomy of the cognitive domain?
a. Understand
b. Apply
c. Analyze
d. (answer needed) (C)

6. Use item indices to accept, discard, or revise items. The appropriate selection and arrangement of suitable multiple-choice items on a test can best be accomplished by measuring the items against three indices: item facility (item difficulty), item discrimination (item differentiation), and distractor analysis.

1. Item facility (IF) is the extent to which an item is easy or difficult for the proposed group of test takers. An item has little value if it is too easy (99% of respondents get it right) or too difficult (99% get it wrong). IF simply reflects the percentage of students answering the item correctly. The formula:

   IF = (number of Ss answering the item correctly) / (total number of Ss responding to the item)

Appropriate test items will generally have IFs that range between .15 and .85. For example, if 20 students respond to an item and only 3 answer correctly, the IF index is 3/20 = .15, which means the item is too difficult for the students.

2. Item discrimination (ID) is the extent to which an item differentiates between high- and low-ability test takers. An item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power. For example, suppose 30 students in a class have taken a test. Divide them into thirds, that is, three rank-ordered ability groups: the top 10 scorers, the middle 10, and the lowest 10. Eliminate the middle group, leaving two groups whose results might look something like this on a particular item:

Item #23       High-ability Ss (top 10)    Low-ability Ss (bottom 10)
# Correct                 7                            2
# Incorrect               3                            8

The formula for calculating ID is:

   ID = (correct in high group - correct in low group) / (1/2 x total of the two comparison groups)
      = (7 - 2) / (1/2 x 20)
      = 5/10
      = .50
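Both indices can be computed in a few lines of code. The sketch below is illustrative (the function names are ours) and uses the numbers from the text: 3 correct out of 20 respondents for IF, and item #23 above for ID.

```python
def item_facility(num_correct, num_respondents):
    """IF = proportion of students answering the item correctly."""
    return num_correct / num_respondents

def item_discrimination(high_correct, low_correct, group_size):
    """ID = (high correct - low correct) / (1/2 x total of both groups)."""
    total = 2 * group_size
    return (high_correct - low_correct) / (0.5 * total)

# IF example from the text: 3 of 20 students answer correctly.
print(item_facility(3, 20))          # 0.15 -> too difficult

# ID example from the text (item #23): high group 7 correct,
# low group 2 correct, with 10 students in each comparison group.
print(item_discrimination(7, 2, 10)) # 0.5 -> moderate discriminating power
```

In practice a teacher would loop these functions over every item on the test, flagging items whose IF falls outside the .15-.85 range or whose ID is near zero for revision.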

The result tells us that this example item has a moderate level of ID. High discriminating power would approach a perfect 1.0, and no discriminating power at all would be zero.

3. Distractor efficiency is the extent to which
a. the distractors lure a sufficient number of test takers, especially lower-ability ones, and
b. those responses are somewhat evenly distributed across all distractors.

Choices                 A     B     C*    D     E
High-ability Ss (10)    0     1     7     0     2
Low-ability Ss (10)     3     5     2     0     0
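Distractor efficiency can also be checked with a short script. This sketch hard-codes the response counts from the table above; the helper name is ours, and the rule it applies is the simple one from the text: a distractor chosen by no one in either group has no utility.

```python
# Response counts from the distractor-efficiency table:
# rows are ability groups, columns are choices A-E; C is the keyed answer.
counts = {
    "high": {"A": 0, "B": 1, "C": 7, "D": 0, "E": 2},
    "low":  {"A": 3, "B": 5, "C": 2, "D": 0, "E": 0},
}

def useless_distractors(counts, key):
    """Return distractors that lure no test takers in either group."""
    return [c for c in counts["high"]
            if c != key and counts["high"][c] + counts["low"][c] == 0]

print(useless_distractors(counts, "C"))  # ['D'] -> distractor D fools no one
```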

*Note: C is the correct response. No mathematical formula is needed to tell you that this item successfully attracts seven of the ten high-ability students toward the correct response, while only two of the low-ability students get it right. The table also shows that distractor D does not fool anyone, so it has no utility.

C. SCORING, GRADING, AND GIVING FEEDBACK

1. Scoring

As we design a classroom test, we must consider how the test will be scored and graded. The scoring plan reflects the relative weight we place on each section and on the items within each section. For example, suppose we are assigning scores for an integrated-skills test that focuses on listening and speaking, with some attention to reading and writing. Our scoring decision might look like this:

Section          Percent of Total Grade   Scoring                        Possible Total Correct
Oral Interview   40%                      4 scores, 5-to-1 range, x2     40
Listening        20%                      10 items @ 2 points each       20
Reading          20%                      10 items @ 2 points each       20
Writing          20%                      2 scores, 5-to-1 range, x2     20
                                                                  Total: 100
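The arithmetic behind a weighting plan like this is easy to verify in code. The sketch below is hypothetical (the section names and point schemes come from the scoring plan in the text; the layout and the sample raw scores are ours):

```python
# Point scheme from the scoring plan:
# Oral interview: 4 scores on a 5-to-1 scale, each doubled -> max 4*5*2 = 40
# Listening/Reading: 10 items at 2 points each -> max 20 each
# Writing: 2 scores on a 5-to-1 scale, each doubled -> max 2*5*2 = 20
sections = {
    "Oral Interview": {"max": 4 * 5 * 2, "weight": 0.40},
    "Listening":      {"max": 10 * 2,    "weight": 0.20},
    "Reading":        {"max": 10 * 2,    "weight": 0.20},
    "Writing":        {"max": 2 * 5 * 2, "weight": 0.20},
}

# The maxima sum to 100, so each section's share of the total
# automatically equals its intended weight.
total_possible = sum(s["max"] for s in sections.values())
print(total_possible)  # 100

def total_score(raw):
    """Sum raw section scores; with this plan the sum is already a percentage."""
    return sum(raw[name] for name in sections)

# Hypothetical raw scores for one student:
raw = {"Oral Interview": 32, "Listening": 16, "Reading": 14, "Writing": 18}
print(total_score(raw))  # 80
```

Designing the section maxima to sum to 100 is a convenient choice: no separate weighting step is needed when converting to a percentage grade.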

2. Grading

Assigning grades to student performance on the test is not just a matter of giving an A, a B, and so on. How we assign letter grades is a product of:
- the country, culture, and context of the English classroom,
- institutional expectations (most of them unwritten),
- explicit and implicit definitions of grades that we have set forth,
- the relationship we have established with the class, and
- student expectations engendered in previous tests and quizzes in the class.

3. Giving Feedback

Scoring and grading would not be complete without giving feedback to the students, since feedback serves as beneficial washback. We might choose to return the test to the students with one of, or a combination of, the possibilities below:
1. a letter grade
2. a total score
3. four subscores (speaking, listening, reading, writing)
4. for the listening and reading sections,
   a. an indication of correct/incorrect responses
   b. marginal comments
5. for the oral interview,
   a. scores for each element being rated
   b. a checklist of areas needing work
   c. oral feedback after the interview
   d. a post-interview conference to go over the results
6. on the essay,
   a. scores for each element being rated
   b. a checklist of areas needing work
   c. marginal and end-of-essay comments and suggestions
   d. a post-test conference to go over the work
   e. a self-assessment
7. on all or selected parts of the test, peer checking of results
8. a whole-class discussion of the results of the test
9. individual conferences with each student to review the whole test

References:
Brown, H.D. 2004. Language Assessment: Principles and Classroom Practices. White Plains, NY: Pearson Education.
Gronlund, N.E. 1981. Measurement and Evaluation in Teaching. New York: Macmillan Publishing.
Gronlund, N.E., & Waugh, C.K. 2009. Assessment of Student Achievement. New Jersey: Pearson.
Heaton, J.B. 1990. Classroom Testing. New York: Longman.