Introduction
Since item analysis of test questions is often neglected because of the difficulty and
complexity of the process, there is a need for a system that will lessen, if not eliminate, that
difficulty. This project responds to that need: it has produced item-analysis software that analyzes
the examinees' answers to each item accurately and efficiently.
More specifically, this project attempted to answer these questions:
1) What is the design of the Test Checker and Item Analyzer with Statistics?
2) How does the Test Checker and Item Analyzer with Statistics differ from the traditional
method (manual encoding with software) in terms of accuracy and efficiency in the conduct of
item analysis?
3) How do teachers perceive the level of usability and acceptability of the design of the TCIAS
in terms of feasibility, functionality, accuracy and efficiency?
Review of Literature
The most difficult and demanding aspect of preparing good classroom tests is writing the test
questions. The choice of item type should be made on the basis of the objective or process to be appraised
by the item. If possible, the questions and items should be prepared well in advance of the testing
date, then reviewed and edited before they are used. According to Jacob and Chase
(1993), after instructors have written a set of test items, following the rules, they do not know if the items
will show which students have mastered the topic of instruction and which have not. The items must be
tried out on the students before the instructor can determine how well each item works.
Hopkins and Antes (1990) wrote that information gathered about item difficulty, discrimination
power of items, balance, specificity and objectivity could be used to improve future tests. Effective items
can be developed and good testing practices can be determined from what has been successfully
developed in the past. Since the individual test items determine the nature of the test and the extent that
the instrument measures what the teacher intends to measure, successful testing rests, first of all, with a
set of effective items. Improvement of test quality rests on using appraisal information to strengthen test
items through appropriate revision that reduces technical defects and the factors causing them.
Analyzing Test Questions. Thorndike and Hagen (1991) comment that after a test has been tried
out and scored, the result may be analyzed in two ways. One is from the standpoint of what the results
reveal about the pupils' learning or how successful instruction has been. The other type of analysis
aims to evaluate the test itself as a measuring instrument.
Insuring Validity and Reliability of the Test. Hopkins and Antes (1990) discuss forming
criterion groups of 25%, 27%, 30%, and 33% of the respondents for the high and low
groups. Oosterhof (1994) touches on Item Response Theory (IRT), which is used quite
extensively in educational measurement, particularly with standardized tests. Ebel and Frisbie (1986)
discuss the item-revision process on the basis of item-analysis data. Brown (1981) notes that some
distracters can be attractive, perhaps even more attractive than the correct
response.
The 27% Criterion Group. Based on Kelley's study (1939), which recommends a 27% criterion
group, Daleon (1989) also recommends 27%. On the other hand, Sax (1989) points out the desirability
of high discrimination indexes, since reliability increases as the average value of discrimination indexes
for the test items increases. The standard deviation of scores is increased when the discrimination
index increases, and if the increase measures true differences, the reliability will also increase.
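The indices discussed above can be sketched in a short program. The function below is a hypothetical illustration (the names and data are invented, not taken from the TCIAS source): it computes an item's difficulty index as the proportion of correct answers, and its discrimination index from upper and lower criterion groups formed with Kelley's 27% fraction.

```python
# Hypothetical sketch of classical item analysis; names and data are
# illustrative, not taken from the TCIAS source.

def item_indices(scores, item_correct, group_frac=0.27):
    """Difficulty and discrimination indices for one item.

    scores       -- total test score of each examinee
    item_correct -- 1 if that examinee answered this item correctly, else 0
    group_frac   -- criterion-group fraction (Kelley's 27% by default)
    """
    n = len(scores)
    k = max(1, int(n * group_frac))  # size of each criterion group
    # Rank examinees by total score, highest first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    upper = sum(item_correct[i] for i in order[:k])   # correct in top group
    lower = sum(item_correct[i] for i in order[-k:])  # correct in bottom group
    difficulty = sum(item_correct) / n                # proportion answering correctly
    discrimination = (upper - lower) / k
    return difficulty, discrimination

# Ten examinees: the higher scorers tend to get this item right.
scores = [95, 90, 88, 80, 75, 60, 55, 50, 40, 30]
correct = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
p, d = item_indices(scores, correct)
```

With these sample data, half the examinees answer the item correctly and the upper 27% group is separated cleanly from the lower group, illustrating Sax's point that a highly discriminating item spreads scores between the criterion groups.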
Frederick B. Davis (1952), in "Item Analysis in Relation to Educational and Psychological
Testing" (Psychological Bulletin, 49, 97-121), stated that the construction of solid and
reliable tests must consider quantitative information on the difficulty and discriminating power of
each test exercise, or item, proposed for use.
Methodology
The researchers utilized the developmental process of research; however, in testing the product,
descriptive research was adopted. The developmental process requires that the researchers' output be
immediately useful in the field of education.
Computer Software. In developing the system, the researchers gathered information on the
procedures of item analysis and test norms from different textbooks, web sites, and journals. The
computer software was then developed to facilitate item analysis, test norms,
and competency-level reporting.
Development of the Test Checker and Item Analyzer with Statistics
The Planning Stage. The software was developed using Visual Basic 6. It was based on the
procedure of item analysis and primarily focused on the conditions for the difficulty level,
discriminatory power, and the desirability of options. The software was designed to accept more
than 500 examinees in a test with 4 options and 120 items, although this also depends on the
memory and capacity of the computer.
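As a rough illustration of the kind of data such a checker works with, the sketch below (the key and responses are invented, and the structures are not the actual VB6 ones) stores each examinee's answers as one character per item, with '*' marking a void answer and a blank marking no answer, and scores them against a single key:

```python
# Illustrative data layout only; the actual TCIAS internals are not published.
KEY = "ABDC"  # hypothetical 4-item answer key; a real test may hold up to 120 items

responses = [
    "ABDC",   # all four items correct
    "ABCC",   # item 3 wrong
    "A*D ",   # item 2 void, item 4 unanswered
]

def score(resp, key=KEY):
    """Number of items answered correctly; void and blank never count."""
    return sum(1 for r, k in zip(resp, key) if r == k)

totals = [score(r) for r in responses]  # one total score per examinee
```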
Testing and Re-testing. Since it is very important that the results of the computer output for a
certain statistical function match the long hand computations given the same set of data, this process of
testing and re-testing was done repeatedly. Whenever the results differed, even at the hundredths
place, adjustments in the program were made until all the conditions were satisfied.
Feasibility. Feasibility was determined from the results of the four half-day seminar-workshops
conducted in four districts of Camarines Sur, where the teacher participants were asked to rate
the system on a five-point scale: 1 - Least Feasible, 2 - Less Feasible, 3 - Feasible, 4 - More
Feasible, and 5 - Most Feasible. The results were validated through interviews conducted by the
researchers with most of the participants.
Functionality. The functionality of the computer software was determined through the output of
the computer.
The Accuracy. The accuracy of the software computation was determined through the
comparison of the output of the software and the longhand computation.
The Efficiency. The efficiency was determined in terms of the percentage of time saved by
the Test Checker and Item Analyzer over the manual/traditional method.
The z-test. This test was used to determine the accuracy of the output generated by the two
methods used in this study on item analysis.
The t-test. The t-test is used in statistics for small-sample tests concerning the difference
between two means; in this project, it was used to test the difference in efficiency, measured in
terms of the average time consumed by the TCIAS versus the traditional procedure.
The Rate of Time Saved was used to measure the degree of efficiency of the TCIAS.
The significance level for the tests conducted in this study was set at one percent (1%).
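The statistical treatment above can be sketched as follows. The timing figures are invented for illustration; only the procedure — a pooled two-sample t-test on the times consumed by each method, plus the rate of time saved — mirrors what the text describes.

```python
# Hedged sketch of the efficiency comparison; the timing data are made up.
import math

manual = [120.0, 135.0, 128.0, 140.0, 132.0]   # minutes, traditional method
tcias  = [12.0, 14.0, 11.0, 13.0, 15.0]        # minutes, TCIAS

def t_statistic(a, b):
    """Two-sample t with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

t = t_statistic(manual, tcias)

# Rate of time saved, as used to express efficiency:
rate_saved = (sum(manual) - sum(tcias)) / sum(manual) * 100
```

With these invented figures the computed t far exceeds a 1%-level critical value, and the rate of time saved comes out near 90%, which is the kind of result the study reports.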
The system accepts the key to correction (answer key) and the actual responses of the
examinees. It then reports the frequency per option per item, including void answers and
no-answer items. The summary shows the class performance per item in comparison with other
classes and the overall performance. It shows the correlation between any two groups and with
respect to the overall performance. It also shows the mean performance (skill) of each class and
the overall performance of the school or division. The competency level (Figure 4) shows the
comparative skills per sub-subject and per subject for the individual, the class, and the school.
The user can also rearrange the item numbers per skill, correlate skills, and change the index for
percent correct. This is done by entering the number of subjects and pressing the Enter key, then
the number of sub-subjects per subject and pressing Enter, then the number of items per skill or
sub-subject.
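The per-option frequency output described above, including void answers and no-answer items, can be sketched like this (the responses and option letters are invented for the example):

```python
# Illustrative per-item option tally; data and names are invented.
from collections import Counter

OPTIONS = "ABCD"

responses = [
    "ABDC",
    "ABCC",
    "A*D ",   # '*' = void answer, ' ' = no answer
    "CBDC",
]

def option_frequencies(responses, n_items):
    """Return, per item, a Counter over A-D plus 'void' and 'no answer'."""
    freq = []
    for i in range(n_items):
        c = Counter()
        for resp in responses:
            ch = resp[i]
            if ch in OPTIONS:
                c[ch] += 1
            elif ch == '*':
                c['void'] += 1
            else:
                c['no answer'] += 1
        freq.append(c)
    return freq

freqs = option_frequencies(responses, 4)
```

A tally of this shape is also what distracter appraisal works from: an option chosen more often than the keyed answer by the upper group signals an item worth revising.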
Summary and Conclusions
The main purpose of this project is the development of computer software named
TEST CHECKER and ITEM ANALYZER with STATISTICS (TCIAS). The product
developed is accurate and efficient, and hence faster than the traditional method.
Problem 1
What is the design of the Test Checker and Item Analyzer with Statistics?
Findings
The program was designed using Visual Basic 6, a programming language. The design of the
software is user-friendly, like most widely used programs designed by programmers throughout
the world. It contains the layout of the title and the menu for the different procedures. It is in
color to attract users. Since the software is user-friendly, anyone can operate it. With one look
at the tabulated results, a user can easily deduce the analysis of each item, each student's
performance, the totality of the test results, and skills competency.
Problem 2
What is the significant difference between the traditional method and the Test
Checker and Item Analyzer with Statistics in terms of accuracy and efficiency in the conduct
of item analysis?
Hypothesis:
The traditional method and the Test Checker and Item Analyzer with Statistics method
are significantly different in terms of accuracy and efficiency in the conduct of item analysis
and test norms.
Findings
The z-test for differences in accuracy of the methods used for item analysis, which was
based on the discrepancy in the decisions made between methods, was found to be zero. This is
less than the critical value of 2.33 at the 1% level; hence, the decisions made by both methods are
exactly the same. The t-test for efficiency yields a value of 12.90 between methods, which is
greater than the critical value of 3.74 at the 1% level under the t-curve. The rate of time saved by
the TCIAS over the traditional method was 90%. This shows that the time consumed, as a
measure of efficiency in item analysis, is highly significantly different, which leads to the
rejection of the null hypothesis.
Conclusion
On efficiency, use of the Test Checker and Item Analyzer with Statistics takes
significantly less time than the other methods; hence, the system is more efficient in
performing item analysis and test norms.
Problem 3
How do teachers rate the level of usability and acceptability of the design of Test
Checker and Item Analyzer with Statistics in terms of feasibility, functionality, accuracy, and
efficiency?
Findings
The Test Checker and Item Analyzer with Statistics is most useful, as perceived by the
elementary and secondary school teachers, as shown by the general weighted mean of 4.56. In
terms of the level of acceptability, the teachers in elementary and high school perceived the
TCIAS as most acceptable with respect to efficiency (4.71), accuracy (4.39), and functionality
(4.35). The technology is more acceptable (3.92) with respect to feasibility (its affordability and
user-friendliness).
Conclusion
The TCIAS is perceived by the elementary, secondary and tertiary school teachers to be
most useful and most acceptable in the conduct of item analysis, test norms, and competency
level. This implies that the teachers acknowledge that with the use of the system, they will be
more efficient and accurate in doing the job. They likewise acknowledge that the system is
functional, affordable by their school, and can be used without much problem.
Recommendations
It is recommended that institutions, most specially the schools considered centers of
excellence, acquire the TCIAS to help facilitate the task of item analysis and thus alleviate the
workload of their teachers. The DepEd must, in fact, set the example by acquiring one for the National
Educational Testing Center. Likewise, it is further recommended that every school under the
supervision of the DepEd, the CHED, DOST, and the TESDA, be enjoined to develop a data
bank of questions by disciplines, consisting of the questions accepted by the system. Item cards
or an electronic data bank for permanent record for each item should be made.
Literature Cited
Anastasi, Anne, "Psychological Testing", 6th Edition, Macmillan Inc., U.S.A., c1988, p. 117
Brown, Frederick G., "Measuring Classroom Achievement", Holt, Rinehart and Winston, U.S.A.,
c1981, pp. 101-110
Ebel, Robert L. and Frisbie, David A., "Essentials of Educational Measurement", 4th Edition,
Prentice-Hall Inc., N.J., c1986, pp. 226-240
Hopkins, Charles D. and Antes, Richard L., "Classroom Measurement and Evaluation", 3rd Edition,
F.E. Peacock Publishers, Inc., U.S.A., c1990, pp. 537-554
Hopkins, Kenneth D. and Stanley, Julian C., "Educational and Psychological Measurement and
Evaluation", 6th Edition, Prentice-Hall Inc., N.J., c1981, pp. 269-288
Jacob, Lucy C. and Chase, Clinton I., "Developing and Using Tests Effectively", Jossey-Bass
Publishers, San Francisco, c1993
Kelley, T. L., "The selection of upper and lower groups for the validation of test items", Journal
of Educational Psychology, 30, 1939, pp. 17-24
Linn, Robert L. and Gronlund, Norman E., "Measurement and Assessment in Teaching", 7th Edition,
Macmillan Publishing Co., U.S.A., c1995, pp. 318-320
Noll, Victor H., "Introduction to Educational Measurement", Cambridge, Massachusetts, c1957,
p. 148
Oosterhof, Albert, "Classroom Applications of Educational Measurement", 2nd Edition, Macmillan
Publishing Co., N.Y., c1994, pp. 196-208
Sax, G., "Principles of Educational and Psychological Measurement and Evaluation", 3rd Edition,
Wadsworth, Belmont, CA, c1989, pp. 227-253
Thorndike, Robert L. and Hagen, Elizabeth, "Measurement and Evaluation in Psychology and
Education", 5th Edition, New York, c1991, pp. 124-128