Measuring Instrument
Whether a test is standardized or teacher-made, it should possess the qualities of a good
measuring instrument. This module discusses the qualities of a good test, which are validity,
reliability, and usability.
Validity
Validity is the most important characteristic of a good test. It refers to the extent to which the
test serves its purpose, or the efficiency with which it measures what it intends to measure.
The validity of a test concerns what the test measures and how well it does so. To judge the
validity of a test, it is therefore necessary to consider what behavior the test is supposed to
measure.
A test may yield consistent scores, but if it is not useful for its purpose, it is not valid. For
example, a test intended for grade V students is not valid when given to grade IV.
Validity is classified into four types: content validity, concurrent validity, predictive validity, and
construct validity.
Content validity is the extent to which the content of the test is truly representative of the
content of the course. A well-constructed achievement test should cover the objectives of
instruction, not just its subject matter. Three domains of behavior are included: cognitive,
affective and psychomotor.
Concurrent validity is the degree to which the test agrees or correlates with a criterion set up as
an acceptable measure, a criterion that is available at the time of testing. Concurrent validity, a
criterion-related validity, uses statistical tools to interpret and correlate test results.
For example, a teacher wants to validate an achievement test in Science (X) that he constructed.
He administers this test to his students. The results of this test can then be compared with those
of another Science test (Y) which has been proven valid. If the relationship between X and Y is
high, the achievement test in Science is valid. According to Garrett, a highly reliable test is
always a valid measure of some function.
Predictive validity is evaluated by relating the test scores to the students' later achievement,
which the test is supposed to predict. The criterion measure for this type is important because
the future outcome of the testee is being predicted; the criterion measures against which the test
scores are validated become available only after a long period.
Construct validity is the extent to which the test measures a theoretical trait. Test items must
include factors that make up a psychological construct such as intelligence, critical thinking,
reading comprehension or mathematical aptitude.
The following factors affect the validity of a test:
1. Inappropriateness of test items. Items that measure knowledge cannot measure skills.
2. Unclear directions. Directions that do not clearly indicate how the pupils should answer and
record their answers reduce the validity of test items.
3. Reading vocabulary and sentence structure that are too difficult. Overly complicated
vocabulary and sentence structure prevent the test from measuring what it intends to measure.
4. Level of difficulty of items. Test items that are too difficult or too easy cannot discriminate
between bright and slow pupils, and this lowers validity.
5. Poorly constructed test items. Items that provide clues, and items that are ambiguous, confuse
the students and will not reveal a true measure.
6. Length of the test. A test should be of sufficient length to measure what it is supposed to
measure. A test that is too short cannot adequately sample the performance we want to measure.
7. Arrangement of items. Test items should be arranged by difficulty, from the easiest to the
most difficult. Difficult items encountered early may cause a mental block and may also cause
students to spend too much time on those items.
8. Patterns of answers. When students can detect a pattern of correct answers, they are liable to
guess, and this lowers validity.
Reliability
Reliability means consistency and accuracy. It refers to the extent to which a test is dependable,
self-consistent and stable. In other words, the test agrees with itself; it is concerned with the
consistency of responses from moment to moment, so that if a person takes the same test twice,
the test yields the same results.
For example, if a student gets a score of 90 on an English achievement test on Monday and 30
on the same test given on Friday, then neither score can be relied upon.
Inconsistency of individual scores may, however, be caused by the persons scoring the test, by
limited sampling of certain areas of the subject matter, and particularly by the examinee himself.
If the examinee's mood is unstable, this may affect his score.
The following factors affect the reliability of a test:
1. Length of the test. As a general rule, the longer the test, the higher the reliability. A longer
test provides a more adequate sample of the behavior being measured and is less distorted by
chance factors such as guessing.
2. Difficulty of the test. When a test is too easy or too difficult, it cannot show the differences
among individuals; thus it is unreliable. Ideally, achievement tests should be constructed so that
the average score is 50 percent correct and the scores range from near zero to near perfect.
3. Objectivity. Objectivity eliminates the bias, opinions or judgments of the person who scores
the test. Reliability is greater when tests can be scored objectively.
4. Heterogeneity of the student group. Reliability is higher when test scores are spread over a
wide range of abilities. In a homogeneous group the spread of scores is small, so measurement
errors account for a larger share of the score differences than they do in a more heterogeneous
group.
5. Limited time. A test in which speed is a factor is more reliable than a test administered with a
more generous time limit.
The reliability of a test may be estimated by the following methods:
1. Test-retest method. The same instrument is administered twice to the same group of subjects.
The scores from the first and second administrations are then correlated using the Spearman
rank correlation coefficient (Spearman rho) or the Pearson Product-Moment Correlation
Coefficient.
Students   S1   S2   R1   R2   D   D²
(S1, S2 = scores on the two administrations; R1, R2 = their ranks; D = difference between
ranks; D² = difference squared)
…
Total ΣD² = 3.5
rs = 1 − (6ΣD²) / (N³ − N)
   = 1 − 6(3.5) / (10³ − 10)
   = 1 − 21 / 990
   = 1 − 0.0212
   = 0.98
The rs value obtained is 0.98, which indicates a very high relationship; hence the achievement
test in Biology is reliable.
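The Spearman rho computation above can be sketched in Python. The test-retest scores below are hypothetical stand-ins, not the module's Biology data; the ranking helper assigns rank 1 to the highest score and averages ranks for ties.

```python
def rank(values):
    """Rank scores from highest (rank 1) to lowest, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Find the run of tied scores starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(s1, s2):
    """rs = 1 - 6*sum(D^2) / (N^3 - N), where D is the difference in ranks."""
    r1, r2 = rank(s1), rank(s2)
    n = len(s1)
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d_squared / (n ** 3 - n)

# Hypothetical first and second administrations for five students:
first = [85, 78, 92, 70, 88]
second = [80, 83, 90, 72, 86]
print(round(spearman_rho(first, second), 2))  # → 0.9
```

Reversed rankings give rs = −1 and identical rankings give rs = +1, matching the interpretation that values near +1.00 indicate a very high relationship.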
The Pearson Product-Moment Correlation Coefficient can also be used for the test-retest
method of estimating the reliability of a test. The formula is:
r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}
Using the same data as for Spearman rho, the scores for the 1st and 2nd administrations may be
presented in this way:
X (S1)   Y (S2)   X²   Y²   XY
Could you now compute by using the formula above? Illustrate below:
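As a check on the hand computation, the raw-score Pearson formula can be sketched in Python; the paired scores below are hypothetical, not the module's data.

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson r:
    (N*ΣXY - ΣX*ΣY) / sqrt((N*ΣX² - (ΣX)²) * (N*ΣY² - (ΣY)²))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

# Hypothetical 1st- and 2nd-administration scores for five students:
x = [12, 15, 10, 14, 18]
y = [13, 16, 9, 15, 17]
print(round(pearson_r(x, y), 2))  # → 0.94
```

A perfectly linear relationship yields r = 1.00; a high positive r on a test-retest pair indicates a reliable test.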
2. Alternate-forms method. This is the second method of establishing the reliability of test
results. Two forms of a test, similar in content, type of items, difficulty, and other respects, are
given in close succession to the same group of students. To establish reliability, the correlation
technique is used (refer to the Pearson Product-Moment Correlation Coefficient formula above).
3. Split-half method. The test is administered once, but the test items are divided into two
halves; the most common procedure is to divide the test into odd-numbered and even-numbered
items. The results from the two halves are correlated, and the r obtained is the reliability
coefficient of a half test. The reliability of the whole test is then estimated with the Spearman-
Brown formula:
rt = 2rht / (1 + rht)
   = 2(0.69) / (1 + 0.69)
   = 1.38 / 1.69
   = 0.82
The split-half method is applicable only to measuring instruments that are not highly speeded.
If the instrument includes easy items and the subjects are able to answer all or nearly all items
correctly within the time limit, the scores on the two halves would be about the same and the
correlation would be close to +1.00.
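The split-half steps (correlate the two halves, then step the half-test r up with Spearman-Brown) can be sketched in Python; the odd- and even-item totals below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson correlation between two paired score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

def spearman_brown(r_half):
    """Whole-test reliability from a half-test correlation: rt = 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

# Stepping up the half-test correlation worked out in the module:
print(round(spearman_brown(0.69), 2))  # → 0.82

# Hypothetical per-student odd-item and even-item totals:
odd_half = [9, 12, 15, 8, 11, 14]
even_half = [10, 11, 14, 9, 10, 13]
rt = spearman_brown(pearson_r(odd_half, even_half))
```

Note that a half-test correlation of +1.00 steps up to a whole-test reliability of +1.00, which is why highly speeded tests inflate split-half estimates.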
4. Kuder-Richardson Formula 21 is the last method of establishing the reliability of a test. Like
the split-half method, the test is administered only once. This method assumes that all items are
of equal difficulty. The formula is:
rt = [k / (k − 1)] × [1 − X(k − X) / (kS²)]
Where:
X = the mean of the obtained scores
S = the standard deviation
k = the total number of items
Example: Mr. Marvin administered a 50-item test to 10 of his grade 5 pupils. The scores of his
pupils are presented in the table below:
Pupils   Score (X)   X − X̄   (X − X̄)²
A 32 3.2 10.24
B 36 7.2 51.84
C 36 7.2 51.84
D 22 -6.8 46.24
E 38 9.2 84.64
F 15 -13.8 190.44
G 43 14.2 201.64
H 25 -3.8 14.44
I 18 -10.8 116.64
J 23 -5.8 33.64
Total   288   801.60
X̄ = 28.8   S² = 801.60 / 9 = 89.07   k = 50
Show how the mean and the standard deviation were obtained in the box below:
Could you now compute the reliability of the test by applying Kuder-Richardson Formula 21?
Please try!
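The mean, variance, and KR-21 reliability for Mr. Marvin's data can be sketched in Python. One assumption here: the module's table reports 89.07, which equals 801.60 / 9, i.e. the variance computed with an n − 1 denominator, so the sketch follows that convention and treats the value as S².

```python
def kr21(scores, k):
    """Kuder-Richardson Formula 21:
    rt = (k / (k - 1)) * (1 - mean*(k - mean) / (k * variance))."""
    n = len(scores)
    mean = sum(scores) / n                     # X̄ = 288 / 10 = 28.8
    ss = sum((x - mean) ** 2 for x in scores)  # Σ(X - X̄)² = 801.60
    variance = ss / (n - 1)                    # 801.60 / 9 = 89.07 (as in the table)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

# Mr. Marvin's 10 pupils on a 50-item test (scores from the table above):
scores = [32, 36, 36, 22, 38, 15, 43, 25, 18, 23]
print(round(kr21(scores, k=50), 2))  # → 0.88
```

Choosing the n denominator instead (variance 80.16) would give a slightly lower estimate, so it matters which convention the textbook intends.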
Usability
Usability means the degree to which a test can be used without much expenditure of time,
money and effort. It also means practicability. The factors that determine usability are:
administrability, scorability, interpretability, economy and proper mechanical make-up of the
test.
Administrability means that the test can be administered with ease, clarity and uniformity.
Directions must be made simple, clear and concise. Time limits, oral instructions and sample
questions are specified. Provisions for preparation, distribution, and collection of test materials
must be definite.
Scorability concerns the scoring of the test. A good test is easy to score: the scoring directions
are clear, the scoring key is simple, answer sheets are available, and machine scoring is made
possible as much as possible.
Interpretability means that test results become useful only when they are interpreted after
evaluation. Correct interpretation and application of test results are essential for sound
educational decisions.
An economical test is one of low cost. One way to economize is to use answer sheets and
reusable test booklets. However, test validity and reliability should not be sacrificed for
economy.
Proper mechanical make-up of the test concerns how the test is printed, what font size is used,
and whether the illustrations fit the level of the pupils/students.
Summary
A good measuring instrument possesses three qualities: validity, reliability and usability.
Validity is the extent to which a test measures what it intends to measure. It has four types:
content, construct, concurrent and predictive. Test validity can be affected by the
inappropriateness of the test items, unclear directions, difficult vocabulary and sentence
construction, the level of difficulty, poor construction, the length of the test, the arrangement of
items, and patterns of answers.
Reliability is the consistency of scores obtained by an individual given the same test at different
times. It can be estimated using the test-retest method, alternate forms, split-half and Kuder-
Richardson Formula 21. The reliability of a test may, however, be affected by the length of the
test, the difficulty of the test items, the objectivity of scoring, the heterogeneity of the student
group, and limited time.
Usability of a test means its practicability. This quality is determined by: ease in administration
(administrability), ease in scoring (scorability), ease in interpretation and application
(interpretability), economy of materials, and the proper mechanical make-up of the test.
To be effective, a test must be valid, for a valid test is always reliable, but not every reliable test
is valid.
Learning Exercises
3. When an achievement test for grade V pupils was administered to grade VI, what is most
affected?
II. Multiple choice: Table 1 presents the scores of 10 students who were tested twice (test-retest)
to establish the reliability of the test. Complete the table and answer the questions below.
Choose the correct answer and show the computation where it is needed.
Student   S1   S2
1   68   71
2 65 65
3 70 69
4 65 68
5 70 72
6 65 63
7 62 62
8 64 66
9 58 60
10 60 60
Total =
5. Based on Garrett's interpretation of the calculated rs, what can you say about the test
constructed?
1. Mr. Gwen administered a 40-item Mathematics test to his 10 students. Their scores in the
first half and in the second half are shown below. Find the reliability of the whole test using the
split-half method. Is the test reliable? Justify.
1st half 17 18 20 11 10 13 20 19 19 15
2nd half 15 13 18 10 8 10 18 16 17 14
2. Ms. Pearl administered a 30-item Science test to her Grade 6 pupils. The scores are shown
below. What is the reliability of the whole test using Kuder-Richardson Formula 21? Is the test
reliable? Justify.
Pupils A B C D E F G H I J K L
Scores 25 22 30 22 17 15 18 24 27 18 23 26
IV. Essay:
1. In your own opinion, which is better: a valid test or a reliable test? Why?
2. Why do you think students' scores on a particular test sometimes vary?
3. Discuss what makes test items/test results invalid.
References
Calmorin, Laurentina. Educational Research, Measurement and Evaluation, 2nd Ed. Metro
Manila, Philippines: National Book Store, Inc. 1994.