
Module 2: Qualities of a Good Measuring Instrument

Whether a test is standardized or teacher-made, it should possess the qualities of a good measuring
instrument. This module discusses the qualities of a good test: validity, reliability, and
usability.

After reading this module, students are expected to:

1. define and explain the characteristics of a good measuring instrument;


2. identify the types of validity;
3. describe what conditions can affect the validity of test items;
4. discuss the factors that affect the reliability of a test;
5. estimate test reliability using different methods;
6. enumerate and discuss the factors that determine the usability of a test; and
7. point out which is the most important characteristic of a good test.

Validity

Validity is the most important characteristic of a good test. Validity refers to the extent to
which the test serves its purpose, or the efficiency with which it measures what it intends to
measure.

The validity of a test concerns what the test measures and how well it does so. For example, in
order to judge the validity of a test, it is necessary to consider what behavior the test is supposed
to measure.

A test may yield consistent scores, but if it is not useful for its purpose, then it is not valid. For
example, a test designed for Grade V students given to Grade IV students is not valid.

Validity is classified into four types: content validity, concurrent validity, predictive validity, and
construct validity.

Content validity is the extent to which the content of the test is truly representative of
the content of the course. A well-constructed achievement test should cover the objectives of
instruction, not just its subject matter. Three domains of behavior are included: cognitive,
affective, and psychomotor.

Concurrent validity is the degree to which the test agrees or correlates with a criterion that is
set up as an acceptable measure. The criterion is always available at the time of testing.
Concurrent validity, or criterion-related validity, uses statistical tools to interpret and
correlate test results.

For example, a teacher wants to validate an achievement test in Science (X) that he constructed. He
administers this test to his students. The results of this test can be compared with those of another
Science test (Y) that has been proven valid. If the correlation between X and Y is high, this
means that the achievement test in Science is valid. According to Garrett, a highly reliable test is
always a valid measure of some function.

Predictive validity is evaluated by relating the test to some later achievement of the students
which the test is supposed to predict. The criterion measure for this type is
important because the future outcome of the testee is being predicted. The criterion measures against
which the test scores are validated become available only after a long period.

Construct validity is the extent to which the test measures a theoretical trait. Test items must
include factors that make up a psychological construct such as intelligence, critical thinking, reading
comprehension, or mathematical aptitude.

Factors that influence validity are:

1. Inappropriateness of test items. Items that measure knowledge cannot measure skill.

2. Directions. Unclear directions reduce validity. Directions that do not clearly indicate how the
pupils should answer and record their answers affect the validity of test items.

3. Reading vocabulary and sentence structure. Vocabulary and sentence structures that are too
difficult or complicated will not measure what the test intends to measure.

4. Level of difficulty of items. Test items that are too difficult or too easy cannot discriminate
between bright and slow pupils, which lowers validity.

5. Poorly constructed test items. Test items that provide clues and items that are ambiguous
confuse the students and will not reveal a true measure.

6. Length of the test. A test should be of sufficient length to measure what it is supposed to measure.
A test that is too short cannot adequately sample the performance we want to measure.

7. Arrangement of items. Test items should be arranged in order of difficulty, from the easiest
items to the most difficult ones. Difficult items encountered early may cause a mental block and
may also cause students to spend too much time on those items.

8. Patterns of answers. When students can detect a pattern in the correct answers, they are liable to
guess, and this lowers validity.

Reliability

Reliability means consistency and accuracy. It refers to the extent to which a test is
dependable, self-consistent, and stable. In other words, the test agrees with itself. It is concerned
with the consistency of responses from moment to moment: even if the person takes the same
test twice, the test yields the same results.

For example, if a student gets a score of 90 on an English achievement test this Monday and gets
30 on the same test given on Friday, then neither score can be relied upon.

Inconsistency of individual scores may be caused by the persons scoring the test, by
limited sampling of certain areas of the subject matter, and particularly by the examinee himself. If
the examinee's mood is unstable, this may affect his score.

Factors that affect reliability are:

1. Length of the test. As a general rule, the longer the test, the higher the reliability. A longer test
provides a more adequate sample of the behavior being measured and is less distorted by chance
factors like guessing.

2. Difficulty of the test. When a test is too easy or too difficult, it cannot show the differences
among individuals; thus it is unreliable. Ideally, achievement tests should be constructed such
that the average score is 50 percent correct and the scores range from near zero to near perfect.

3. Objectivity. Objectivity eliminates the bias, opinions, or judgments of the person who checks
the test. Reliability is greater when tests can be scored objectively.

4. Heterogeneity of the student group. Reliability is higher when test scores are spread over a wide
range of abilities. Measurement errors are small relative to the spread of scores in a heterogeneous
group, whereas they loom larger in a homogeneous one.

5. Limited time. A test in which speed is a factor is more reliable than a test that is conducted with a
longer time allowance.

A reliable test, however, is not always valid.

Methods of Estimating the Reliability of a Test:

1. Test-retest method. The same instrument is administered twice to the same group of subjects.
The correlation between the scores of the first and second administrations is determined using the
Spearman rank correlation coefficient (Spearman rho) or the Pearson Product-Moment Correlation Coefficient.

The formula for Spearman rho is:

rs = 1 - (6ΣD²) / (N³ - N)

where: ΣD² = the sum of the squared differences between ranks
N = the total number of cases

For example, 10 students were used as samples to test the reliability of an achievement test in
Biology. After two administrations of the test, the data and the computation of Spearman rho are
presented in the table below:

Student   S1   S2   R1    R2    D     D²

1         89   90   2     1.5   0.5   0.25
2         85   85   4.5   4     0.5   0.25
3         77   76   9     9     0     0
4         80   81   7.5   8     0.5   0.25
5         83   83   6     6.5   0.5   0.25
6         87   85   3     4     1.0   1.00
7         90   90   1     1.5   0.5   0.25
8         73   72   10    10    0     0
9         85   85   4.5   4     0.5   0.25
10        80   83   7.5   6.5   1.0   1.00

(S1, S2 = scores on the first and second administrations; R1, R2 = the corresponding ranks;
D = difference between ranks; D² = squared difference)

Total: ΣD² = 3.5

rs = 1 - (6ΣD²) / (N³ - N)

= 1 - 6(3.5) / (10³ - 10)

= 1 - 21/990

= 1 - 0.0212

= 0.98 (very high relationship)

The rs value obtained is 0.98, which indicates a very high relationship; hence, the achievement test in
Biology is reliable.
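The hand computation above can be cross-checked with a short Python sketch (the helper names tie_ranks and spearman_rho are illustrative, not part of the module):

```python
def tie_ranks(scores):
    # Rank 1 goes to the highest score; tied scores share the
    # average of the rank positions they occupy.
    return [sum(t > s for t in scores) + (scores.count(s) + 1) / 2
            for s in scores]

def spearman_rho(x, y):
    # rs = 1 - 6*sum(D^2) / (N^3 - N), with D = difference in ranks
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(tie_ranks(x), tie_ranks(y)))
    return 1 - 6 * d2 / (n ** 3 - n)

s1 = [89, 85, 77, 80, 83, 87, 90, 73, 85, 80]  # first administration
s2 = [90, 85, 76, 81, 83, 85, 90, 72, 85, 83]  # second administration
print(round(spearman_rho(s1, s2), 2))  # 0.98
```

The tie-averaged ranks reproduce the R1 and R2 columns of the table, and ΣD² comes out to 3.5, as in the hand computation.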

The Pearson Product-Moment Correlation Coefficient can also be used for the test-retest method of
estimating the reliability of a test. The formula is:

r = [NΣXY - (ΣX)(ΣY)] / √{[NΣX² - (ΣX)²][NΣY² - (ΣY)²]}

Using the same data as for Spearman rho, the scores for the 1st and 2nd administrations may be
presented in this way:

X (S1)   Y (S2)   X²      Y²      XY

89       90       7921    8100    8010
85       85       7225    7225    7225
77       76       5929    5776    5852
80       81       6400    6561    6480
83       83       6889    6889    6889
87       85       7569    7225    7395
90       90       8100    8100    8100
73       72       5329    5184    5256
85       85       7225    7225    7225
80       83       6400    6889    6640

ΣX = 829   ΣY = 830   ΣX² = 68987   ΣY² = 69174   ΣXY = 69072

Could you now compute r using the formula above? Illustrate below:
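As a cross-check on the hand computation requested above, here is a minimal Python sketch of the Pearson formula applied to the same scores (the function name pearson_r is illustrative):

```python
from math import sqrt

def pearson_r(x, y):
    # r = [N*SumXY - SumX*SumY] /
    #     sqrt([N*SumX2 - (SumX)^2][N*SumY2 - (SumY)^2])
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [89, 85, 77, 80, 83, 87, 90, 73, 85, 80]  # 1st administration
y = [90, 85, 76, 81, 83, 85, 90, 72, 85, 83]  # 2nd administration
print(round(pearson_r(x, y), 2))  # 0.97
```

The result, about 0.97, agrees closely with the Spearman rho of 0.98 obtained earlier.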

2. Alternate-forms method. This is the second method of establishing the reliability of test results. In this
method, we give two forms of a test, similar in content, type of items, difficulty, and other respects, in
close succession to the same group of students. To establish reliability, the correlation technique is
used (refer to the formula for the Pearson Product-Moment Correlation Coefficient).

3. Split-half method. The test is administered once, but the test items are divided into two
halves. The most common procedure is to divide the test into odd-numbered and even-numbered items. The
results for the two halves are correlated, and the r obtained is the reliability coefficient for half
the test. The Spearman-Brown formula is then used:

rt = 2rht / (1 + rht)

where: rt = the reliability of the whole test
rht = the reliability of half of the test

For example, if rht is 0.69, what is rt?

rt = 2rht / (1 + rht)

= 2(0.69) / (1 + 0.69)

= 1.38 / 1.69

= 0.82 (very high relationship, so the test is reliable)

The split-half method is applicable only to measuring instruments that are not highly speeded. If the
measuring instrument includes easy items and the subject is able to answer all or nearly all items
correctly within the time limit of the test, the scores on the two halves would be about the same and the
correlation would be close to +1.00.
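The Spearman-Brown step-up can be sketched in a few lines of Python (the function name spearman_brown is illustrative):

```python
def spearman_brown(r_half):
    # Steps up the half-test correlation to whole-test reliability:
    # rt = 2 * rht / (1 + rht)
    return 2 * r_half / (1 + r_half)

print(round(spearman_brown(0.69), 2))  # 0.82
```

With the half-test correlation of 0.69 from the example, the whole-test reliability comes out to 0.82.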

4. Kuder-Richardson Formula 21. This is the last method of establishing the reliability of a test. Like
the split-half method, the test is administered only once. This method assumes that all items are of
equal difficulty. The formula is:

KR21 = [k / (k - 1)] × [1 - X̄(k - X̄) / (kS²)]

where:
X̄ = the mean of the obtained scores
S² = the variance of the scores
k = the total number of items

Example: Mr. Marvin administered a 50-item test to 10 of his grade 5 pupils. The scores of his
pupils are presented in the table below:

Pupil   Score (X)   X - X̄    (X - X̄)²

A       32           3.2      10.24
B       36           7.2      51.84
C       36           7.2      51.84
D       22          -6.8      46.24
E       38           9.2      84.64
F       15         -13.8     190.44
G       43          14.2     201.64
H       25          -3.8      14.44
I       18         -10.8     116.64
J       23          -5.8      33.64

Total   288                  801.60

X̄ = 288/10 = 28.8     S² = 801.60/9 = 89.07     k = 50

Show how the mean and the variance were obtained in the box below:

Could you now compute the reliability of the test by applying Kuder-Richardson
Formula 21? Please try!
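One way to check your answer is the following Python sketch of KR-21, assuming the module's n - 1 divisor for the variance (S² = 801.60/9 = 89.07); the function name kr21 is illustrative:

```python
def kr21(scores, k):
    # KR21 = [k/(k-1)] * [1 - mean*(k - mean) / (k * variance)]
    # The variance uses the n - 1 divisor, matching S^2 = 801.60/9 = 89.07.
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

scores = [32, 36, 36, 22, 38, 15, 43, 25, 18, 23]  # pupils A to J
print(round(kr21(scores, 50), 2))  # 0.88
```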

Usability

Usability means the degree to which a test can be used without much expenditure of time, money,
and effort. It also means practicability. The factors that determine usability are administrability,
scorability, interpretability, economy, and the proper mechanical make-up of the test.

Administrability means that the test can be administered with ease, clarity and uniformity.
Directions must be made simple, clear and concise. Time limits, oral instructions and sample
questions are specified. Provisions for preparation, distribution, and collection of test materials
must be definite.

Scorability concerns the scoring of the test. A good test is easy to score: the scoring directions are
clear, the scoring key is simple, an answer sheet is available, and machine scoring is made
possible as far as practicable.

Interpretability means that test results become useful only when they are interpreted after evaluation.
Correct interpretation and application of test results are essential for sound educational decisions.

An economical test is of low cost. One way to economize is to use separate answer sheets and
reusable test booklets. However, test validity and reliability should not be sacrificed for economy.

Proper mechanical make-up of the test concerns how the test is printed, what font size is used,
and whether the illustrations fit the level of the pupils/students.

Summary

A good measuring instrument possesses three qualities: validity, reliability, and
usability.

Validity is the extent to which a test measures what it intends to measure. It has four types: content,
construct, concurrent, and predictive. Test validity can be affected by inappropriateness of the
test items, unclear directions, difficult vocabulary and sentence structure, the level of difficulty,
poor construction, the length of the test, the arrangement of items, and patterns of answers.

Reliability is the consistency of scores obtained by an individual given the same test at different
times. It can be estimated using the test-retest method, alternate forms, the split-half method, and
Kuder-Richardson Formula 21. The reliability of a test may be affected by the length of the test, the
difficulty of the test items, the objectivity of scoring, the heterogeneity of the student group, and
limited time.
Usability of a test means its practicability. This quality is determined by ease of administration
(administrability), ease of scoring (scorability), ease of interpretation and application
(interpretability), economy of materials, and the proper mechanical make-up of the test.

To be effective, a test must be valid. A valid test is always reliable, but not every reliable test is
valid.

Learning Exercises

I. Multiple Choice: Encircle the correct answer.

1. Which statement concerning validity and reliability is most accurate?

a. A test cannot be reliable unless it is valid.


b. A test cannot be valid unless it is reliable.
c. A test cannot be valid and reliable unless it is objective.
d. A test cannot be valid and reliable unless it is standardized.

2. Which type of validity is appropriate for criterion-referenced measures?

a. content validity c. construct validity


b. concurrent validity d. predictive validity
3. Which is directly affected by objectivity in scoring?

a. The validity of test c. The reliability of test


b. The usability of test d. The administrability of test

4. A teacher-made test has overemphasized facts and underemphasized the other


objectives of the course for which it was designed. What can be said about the test?

a. It lacks content validity.


b. It lacks construct validity.
c. It lacks predictive validity.
d. It lacks criterion-related validity.

5. When an achievement test for Grade V pupils was administered to Grade VI pupils, what is most
affected?

a. reliability of the test c. usability of the test


b. validity of the test d. reliability and validity of the test

6. Which factor of usability is described by the wise use of testing materials?


a. scorability c. economy
b. administrability d. proper mechanical make-up

7. Clarity and uniformity in giving directions affect:


a. scorability of the test c. interpretability of the test
b. administrability of the test d. proper mechanical make-up
8. Which best describes validity?
a. consistency in test results
b. practicability of the test
c. homogeneity in the content of the test
d. objectivity in administration and scoring of the test

II. Setting and Option Multiple Choice: Table 1 presents the scores of 10 students who were
tested twice (test-retest) to determine the reliability of the test. Complete the table and answer the
questions below. Choose the correct answer and show the computation where it is
needed.

Student   S1   S2   R1   R2   D   D²

(S1, S2 = scores; R1, R2 = ranks; D = difference between ranks; D² = squared difference)

1 68 71
2 65 65
3 70 69
4 65 68
5 70 72
6 65 63
7 62 62
8 64 66
9 58 60
10 60 60

Total =

1. What is the sum of the squared differences between ranks (ΣD²)?


a. 13 b. 14 c. 15 d. 16

2. Who got the highest score in the second administration of the test?


a. student 1 c. student 5
b. student 3 d. student 7

3. What is the calculated rs?


a. 0.86 b. 0.88 c. 0.90 d. 0.92

4. What is Garrett's interpretation of the obtained rs (refer to question no. 3)?


a. negligible correlation c. marked relationship
b. low correlation d. high or very high relationship

5. Based on Garrett's interpretation of the calculated rs, what can you say about the constructed
test?

III. Simple Recall:

1. Mr. Gwen administered a 40-item Mathematics test to his 10 students. Their scores in the first half
and in the second half are shown below. Find the reliability of the whole test using the split-half
method. Is the test reliable? Justify.

1st half 17 18 20 11 10 13 20 19 19 15
2nd half 15 13 18 10 8 10 18 16 17 14

2. Ms. Pearl administered a 30-item Science test to her Grade 6 pupils. The scores are shown
below. What is the reliability of the whole test using Kuder-Richardson Formula 21? Is the test
reliable? Justify.

Pupils A B C D E F G H I J K L
Scores 25 22 30 22 17 15 18 24 27 18 23 26

IV. Essay:

1. In your own opinion, which is better: a valid test or a reliable test? Why?
2. Why do you think students' scores on a particular test sometimes vary?
3. Discuss what makes test items/test results invalid.

References

Asaad, Abubakar S. and Wilham M. Hailaya. Measurement and Evaluation: Concepts and
Principles, 1st Ed. Manila, Philippines: Rex Book Store, Inc., 2004.

Calmorin, Laurentina. Educational Research, Measurement and Evaluation, 2nd Ed. Metro
Manila, Philippines: National Book Store, Inc., 1994.

Oriondo, Leonora L. and Eleonor M. Antonio. Evaluating Educational Outcomes. Manila,
Philippines: Rex Book Store, 1984.
