CHAPTER I
INTRODUCTION
knowing the students' weaknesses, the teacher can solve the students' problems.
Arikunto holds the same opinion: a good classroom test will also help
to locate the precise areas of difficulty encountered by the class or by the
individual student.5 Therefore, it is necessary for teachers to know their
students' weaknesses and difficulties.
Because testing is important in teaching, teachers, as test
constructors, should be able to construct a good test. Teachers who construct
a good test contribute to their students' education. On the other hand,
teachers who lack skill in constructing a good test contribute less,
or might even make their students' education worse. A
test fulfills the purpose of testing if it has the characteristics of a good
test. There are many ways to judge the quality of a test.
Each of those evaluation experts mentions both validity
and reliability. It can therefore be said that validity and reliability are the
important qualities of a good test.
Tests can be classified in two ways: according to the role of the test
and according to the test maker. A standardized test is made by a
professional testing service; it is tried out first, analyzed, and revised
before being used. Examples of this kind of test are UAN and SPMB.
A standardized test is designed to be used with thousands, and
sometimes hundreds of thousands, of subjects throughout the nation or the
world, and is prepared (and perhaps administered, scored, and interpreted) by
a team of testing specialists.6
However, a teacher-made test is made by the teacher himself or herself,
or by a group of teachers, without being tried out first, analyzed, and revised.
A teacher-made test is a test made by the teacher himself or by a group of
teachers using untried, unanalyzed, and unrevised test items.7 Since the
test is prepared, administered, and scored by one teacher without being
tried out, analyzed, and revised, the reliability of the teacher-made test is
considered to be low. The teacher-made test has average or lower reliability
than the standardized test. As a result, the test may fall short of expectations.
UAS and UTS are examples of teacher-made tests.
Teachers, as test constructors, should be able to construct a good
test for their students. A good test should be valid and reliable. However,
the quality of a test made by the teacher is doubtful, because the test
is not analyzed by anyone else. It remains to be questioned whether the test is valid
and reliable, since teachers seldom analyze and revise the tests they
make. Teachers prefer to use unanalyzed and unrevised test items. This is
supported by Arikunto, who states that teachers rarely use analyzed and revised test
items. Given this fact, the validity and reliability of the teacher-made
test are doubtful; they can be low or even unknown. Teachers should
therefore analyze their tests so that they know which items can be
used and which items should be revised. Based on the facts above, this study
investigates the quality of a teacher-made test.
Some previous studies have examined content
validity, reliability, index of difficulty, and index of discrimination. This
study also analyzes these elements, but it differs from those previous
studies in three ways: it uses the English curriculum to
analyze content validity; it analyzes two forms of objective
test, multiple choice and completion; and its subjects are
first-year senior high school students.
This study focuses on analyzing the teacher-made English test items
in the UAS of semester 2, 2008/2009, for the first-year students of SMA
Muhammadiyah 2 Sidoarjo, with regard to content validity,
reliability, item difficulty, and discrimination index. The test uses
the multiple-choice and completion forms. Here, the teacher does not use
a standardized test but a teacher-made test. It means that the test is
prepared, administered, and scored by the teacher himself or herself. So,
the teacher-made English test items in the UAS of semester 2, 2008/2009, for the
first-year students of SMA Muhammadiyah 2 Sidoarjo are analyzed
to determine whether they are really constructed in the right way, following the right
principles, or not.
good test and can be used as a comparison between the item analysis
in one school and that in another.
2. Reliability
The reliability of a test is a matter of how consistently it produces
similar results on different occasions under similar circumstances.
3. Item analysis
Item analysis is an examination of the test items from the point of view of
their difficulty level and their level of discrimination.
4. Item difficulty
The index of difficulty shows how easy or difficult the
particular item proved in the test.
5. Discrimination index
The discrimination index of an item indicates the extent to which
the item discriminates between the testees, separating the more able
testees from the less able.
6. Test
The examination or trial of the quality of a person or thing;
to examine and measure the qualities or the knowledge of a person.
CHAPTER II
REVIEW OF RELATED LITERATURE
are sometimes confused with the terms evaluation,
evaluation on it.12 For example, if the students comprehend the reading text
well, we can say that their reading ability is good.
As for testing, a test is a procedure designed to elicit certain behavior
from which one can make inferences about certain characteristics of an
individual.13 A test can be considered to be a device typically used to find
out something about a person.14 In addition, Arikunto states that a test is a device
or a procedure which is used to find out or to measure something. Here, a
test is used to measure the change in an individual's behavior as the goal of
instruction. By giving a test, teachers can find out the change in their students' behavior. The
objectives of language testing are:15
1. To determine readiness for instructional programs.
2. To classify or place individuals in appropriate language classes.
3. To diagnose the individual's specific strengths and weaknesses.
4. To measure aptitude for learning.
certifying students' mastery, but also for judging the appropriateness of the
course objectives and the effectiveness of the instruction.
In line with Gronlund, Johnson and Johnson state that a summative
test is conducted at the end of an instructional unit or semester to judge the
final quality and quantity of student achievement and the success of the
instructional program.18 In the 2004 curriculum, the summative test is known as
UAS (Ujian Akhir Semester), or final term test.
1. Based on the content and the general goal for the whole schools in
the country.
2. In relation with general knowledge or capability.
3. Developed by professors, reviewers, and editors of test items.
4. Using items that are tried out, analyzed, and revised before being
used for a test.
5. Having high reliability.
6. Having norms which represent the whole performance of schools in
the country.
a test made by the teacher himself or by a group of teachers, using untried,
unanalyzed, and unrevised test items.24
The teacher-made test is used to measure the students' achievement
of the objectives after the teaching-learning process has finished. The
teacher-made test is made by the teacher based on his or her own
objectives, and it is not tried out, analyzed, and revised.25 Therefore, he also
states that the teacher-made test has average or lower reliability than
the standardized test. UTS (Ujian Tengah Semester), or mid-term test, and UAS
(Ujian Akhir Semester), or final term test, are examples of teacher-made
tests.
test. Different scorers may produce different scores. Subjective tests are those
that require an opinion, a judgment on the part of the examiner.
The opinions above lead to conclusions about the strengths and the
weaknesses of the subjective test. Here are the strengths and the weaknesses of
the subjective test:30
The strengths of the subjective test are:
a. It is easy to construct the items.
b. It encourages the students to express their ideas and construct them
in good sentences.
c. It shows how far the students have mastered the material.
The weaknesses of the subjective test are:
a. It has low validity and reliability because it is difficult to know which
knowledge has really been mastered.
b. It is not representative of all the materials on which the students will be
examined.
c. It takes a long time to score.
d. It is difficult to score because it requires the scorer's judgment.
____________________ (stem)
a. Gone
b. Went
c. Going
d. Goes
e. Go
(a to e are the options; the incorrect options are the distracters)
Present the correct part of the statement first, and vary the truth or
falsity of the second part if the statement expresses a relationship
(cause, effect--if, then)
If any one part of the sentence is false, the whole sentence is false
despite many other true statements.
[Table not recoverable: true-false item examples in simple, complex, and compound forms, built on the statement "Conflict is essential in a play".]
b. Claude Monet
c. Andy Warhol
d. Claude Debussy
Desirable Directions: On the line to the left of each art style in Column
I, write the letter of a representative artist from Column II. Use each name
only once.
1.____ Impressionist
a. Jackson Pollock
b. Claude Monet
c. Andy Warhol
d. Claude Debussy
represent the degree of the students' mastery of the language teaching
materials that have been taught.
All good tests have three qualities, namely validity, reliability,
and practicality.40 In this study, validity and reliability are discussed
because they are the most important characteristics of a good test.
A teacher who wishes to use a good test to make an important
decision about an individual or group must be sure that the test possesses
two absolutely essential characteristics: validity and reliability.41
2.4.1 Validity
Validity refers to the extent to which the results of an evaluation
procedure serve the particular uses for which they are intended.42 It means
that a valid test measures what it is supposed to measure. If the test is
able to measure what it is intended to measure, then the test has high validity. There are
2.4.2 Reliability
Reliability refers to the consistency of measurement. It
shows the consistency of the test scores or other evaluation results from
\[ r = \frac{N}{N-1}\left(1 - \frac{m(N-m)}{N x^{2}}\right) \]
Where:
r = the reliability
N = the number of items in the test
m = the mean score on the test for all the testees
x = the standard deviation of all the testees' scores
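As a rough illustration, the reliability estimate can be computed as follows. This is a minimal Python sketch and not part of the original study: it assumes the formula reads r = N/(N-1) × (1 − m(N−m)/(N·x²)), it assumes x is the population standard deviation of the total scores, and the function name `kr21_reliability` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def kr21_reliability(scores, n_items):
    """Estimate reliability as r = N/(N-1) * (1 - m(N-m) / (N * x^2)).

    scores  -- total score of each testee (hypothetical input format)
    n_items -- N, the number of items in the test
    """
    if n_items < 2:
        raise ValueError("the test needs at least two items")
    m = mean(scores)    # m: mean score of all the testees
    x = pstdev(scores)  # x: standard deviation (population form, an assumption)
    if x == 0:
        raise ValueError("all scores are identical; reliability is undefined")
    return (n_items / (n_items - 1)) * (1 - (m * (n_items - m)) / (n_items * x ** 2))


# Made-up scores of four testees on a ten-item test.
example_scores = [2, 4, 6, 8]
example_r = kr21_reliability(example_scores, n_items=10)
print(round(example_r, 2))
```

If the source formula uses the sample standard deviation instead, `pstdev` would need to be replaced with `stdev`, which changes the result slightly for small groups.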
49 Heaton, J.B. 1988. Writing English Language Test. New York: Longman, p. 164.
50 Harris, David P. 1969. Testing Language as a Second Language. USA: McGraw-Hill, p. 105.
51 Heaton, J.B. 1988. Writing English Language Test. New York: Longman, p. 178.
52 Heaton, J.B. 1988. Writing English Language Test. New York: Longman, p. 178.
answer the item correctly. In addition, Oller points out that item difficulty
is about how difficult or how easy a test item is for the students being
investigated.54 A good test item must not be too difficult or too easy for the
students.
The students' scores must be analyzed in order to know exactly the
index of difficulty of the test. The index of difficulty is calculated using
the formula below:55
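\[ \text{F.V} = \frac{\text{Correct U} + \text{Correct L}}{2n} \]
Where:
F.V = the index of difficulty
Correct U = the number of students in the upper group who answered the item correctly
Correct L = the number of students in the lower group who answered the item correctly
n = the number of students in each group
The criteria for the index of difficulty are:
0.71 - 1.00 = easy
0.31 - 0.70 = moderate
0.00 - 0.30 = difficult
53 Heaton, J.B. 1988. Writing English Language Test. New York: Longman, p. 178.
54 Oller, John W. 1979. Language Test at School. USA: Longman, p. 246.
55 Heaton, J.B. 1988. Writing English Language Test. New York: Longman, p. 182.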
The criteria above show that if the index of difficulty is 1.00, the
test is too easy, since the students can answer all the items. Such a test is not
good to give to the students. Moreover, if the index of difficulty is 0.00, the
test is too difficult, since the students cannot answer any of the items. Such a test
is also not good to give. A test that is good to give to the students
is one with an index of difficulty between 0.31 and 0.70.57
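The calculation and the criteria can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the sample counts are hypothetical, n is taken to be the number of students in each of the two groups, and the "easy" band is assumed to cover values above 0.70, which the moderate and difficult bands imply.

```python
def index_of_difficulty(correct_upper, correct_lower, group_size):
    """Return F.V = (Correct U + Correct L) / 2n for one item."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (correct_upper + correct_lower) / (2 * group_size)


def classify_difficulty(fv):
    """Map an index of difficulty to difficult / moderate / easy.

    The cut-offs at 0.30 and 0.70 follow the criteria in the text; treating
    everything above 0.70 as "easy" is an assumption.
    """
    if not 0.0 <= fv <= 1.0:
        raise ValueError("the index of difficulty must lie between 0 and 1")
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


# Made-up item: 8 of 10 upper-group and 4 of 10 lower-group students are correct.
example_fv = index_of_difficulty(correct_upper=8, correct_lower=4, group_size=10)
print(example_fv, classify_difficulty(example_fv))
```

Here the item would fall in the moderate band, which is the band the text recommends for items given to students.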
proportions of the upper and lower groups who answered the item
correctly. For example, if 30% of the upper group and 10% of the lower
group answered the item correctly, the maximum possible discrimination is
30 plus 10, or 40.
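Where:
D = the discrimination index
Correct U = the number of students in the upper group who answered the item correctly
Correct L = the number of students in the lower group who answered the item correctly
The criteria for the discrimination index are:
0.00 - 0.20 = poor
0.20 - 0.40 = satisfactory
0.40 - 0.70 = good
0.70 - 1.00 = excellent
58 Heaton, J.B. 1988. Writing English Language Test. New York: Longman, p. 182.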
the lower group can. This kind of test item is entirely wrong and must be
replaced. However, if the students in both the upper group and the lower
group can, or cannot, answer the item correctly, the index of
discrimination is 0. This kind of test item does not discriminate in any way at
all.
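The behaviour described above can be illustrated with a short Python sketch. The formula for D does not survive in this text, so the sketch assumes the commonly used form D = (Correct U − Correct L) / n, with n the number of students in each group. The function names, the sample counts, and the handling of band boundaries (a boundary value is placed in the higher band, and values below 0.20 are called "poor") are likewise assumptions.

```python
def discrimination_index(correct_upper, correct_lower, group_size):
    """Return D = (Correct U - Correct L) / n (assumed form of the formula)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (correct_upper - correct_lower) / group_size


def classify_discrimination(d):
    """Map a discrimination index to a quality band.

    Negative values mean the lower group outperformed the upper group,
    which the text treats as an item that must be replaced.
    """
    if d < 0:
        return "negative"
    if d < 0.20:
        return "poor"
    if d < 0.40:
        return "satisfactory"
    if d < 0.70:
        return "good"
    return "excellent"


# Made-up items, each with 10 students in the upper and lower groups.
for upper, lower in [(8, 4), (5, 5), (2, 6)]:
    d = discrimination_index(upper, lower, group_size=10)
    print(upper, lower, d, classify_discrimination(d))
```

The second made-up item, answered equally well by both groups, gives D = 0 and so does not discriminate; the third gives a negative D and would be replaced.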