Introduction
Since item analysis of test questions is often neglected because of the difficulty and
complexity of the process, there is a need for a system that will lessen, if not eliminate, that
difficulty. This project responds to that need: it has produced item-analysis software that analyzes
the examinees' answers to each item accurately and efficiently.
More specifically, this project attempted to answer these questions:
1) What is the design of the Test Checker and Item Analyzer with Statistics?
2) How does the Test Checker and Item Analyzer with Statistics differ from the traditional
method (manual encoding with software) in terms of accuracy and efficiency in the conduct of
item analysis?
3) How do teachers perceive the level of usability and acceptability of the design of the TCIAS
in terms of feasibility, functionality, accuracy and efficiency?
Review of Literature
The most difficult and demanding aspect of preparing good classroom tests is writing the test
questions. The choice of item type should be made on the basis of the objective or process to be appraised
by the item. If possible, the questions and items should be prepared well in advance of the testing
date, then reviewed and edited before they are used. According to Jacob and Chase
(1993), after instructors have written a set of test items, following the rules, they do not know if the items
will show which students have mastered the topic of instruction and which have not. The items must be
tried out on the students before the instructor can determine how well each item works.
Hopkins and Antes (1990) wrote that information gathered about item difficulty, discrimination
power of items, balance, specificity and objectivity could be used to improve future tests. Effective items
can be developed and good testing practices can be determined from what has been successfully
developed in the past. Since the individual test items determine the nature of the test and the extent that
the instrument measures what the teacher intends to measure, successful testing rests, first of all, with a
set of effective items. Improvement of test quality rests on using appraisal information to strengthen test
items through appropriate revision that reduces technical defects and the factors causing them.
Analyzing Test Questions. Thorndike and Hagen (1991) comment that after a test has been tried
out and scored, the result may be analyzed in two ways. One is from the standpoint of what the results
reveal about the pupils' learning or how successful instruction has been. The other type of analysis
aims to evaluate the test itself as a measuring instrument.
Insuring Validity and Reliability of the Test. Hopkins and Antes (1990) discuss forming
criterion groups of 25%, 27%, 30%, and 33% of the respondents for the high and low
groups. Oosterhof (1994) touches on Item Response Theory (IRT), which is used quite
extensively in educational measurement, particularly with standardized tests. Ebel and Frisbie (1986)
discuss the item-revision process on the basis of item-analysis data. Brown (1981) notes that some
distracters can be attractive, perhaps even more attractive than the correct
response.
The 27% Criterion Group. Based on Kelley's study (1939), which recommends a 27% criterion
group, Daleon (1989) also recommends 27%. On the other hand, Sax (1989) points out the desirability
of high discrimination indexes, since reliability increases as the average value of discrimination indexes
for the test items increases. The standard deviation of scores is increased when the discrimination
index increases, and if the increase measures true differences, the reliability will also increase.
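The indices discussed above can be sketched in a short program. The function below is a hypothetical illustration (the names and data are invented, not taken from the TCIAS source): it computes an item's difficulty index as the proportion of correct answers, and its discrimination index from upper and lower criterion groups formed with Kelley's 27% fraction.

```python
# Hypothetical sketch of classical item analysis; names and data are
# illustrative, not taken from the TCIAS source.

def item_indices(scores, item_correct, group_frac=0.27):
    """Difficulty and discrimination indices for one item.

    scores       -- total test score of each examinee
    item_correct -- 1 if that examinee answered this item correctly, else 0
    group_frac   -- criterion-group fraction (Kelley's 27% by default)
    """
    n = len(scores)
    k = max(1, int(n * group_frac))  # size of each criterion group
    # Rank examinees by total score, highest first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    upper = sum(item_correct[i] for i in order[:k])   # correct in top group
    lower = sum(item_correct[i] for i in order[-k:])  # correct in bottom group
    difficulty = sum(item_correct) / n                # proportion answering correctly
    discrimination = (upper - lower) / k
    return difficulty, discrimination

# Ten examinees: the higher scorers tend to get this item right.
scores = [95, 90, 88, 80, 75, 60, 55, 50, 40, 30]
correct = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
p, d = item_indices(scores, correct)
```

With these sample data, half the examinees answer the item correctly and the upper 27% group is separated cleanly from the lower group, illustrating Sax's point that a highly discriminating item spreads scores between the criterion groups.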
Frederick B. Davis (1952), in "Item Analysis in Relation to Educational and Psychological
Testing" (Psychological Bulletin, 49, 97-121), stated that the construction of solid and
reliable tests must consider quantitative information on the difficulty and discriminating power of
each test exercise, or item, proposed for use.
Methodology
The researchers utilized the developmental process of research; however, in testing the product,
descriptive research was adopted. The developmental process requires that the researchers' output be
immediately useful in the field of education.
Computer Software. In developing the system, the researchers gathered information on the
procedures of item analysis and test norms from different textbooks, web sites, and journals. The
computer software was then developed to facilitate item analysis, test norms,
and competency-level reporting.
Development of the Test Checker and Item Analyzer with Statistics
The Planning Stage. The software was developed using Visual Basic 6. It was based on the
procedure of item analysis and primarily focused on the conditions for the difficulty level,
discriminatory power, and the desirability of options. The software was designed to accept more
than 500 examinees in a test with 4 options and 120 items, although this also depends on the
memory and capacity of the computer.
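As a rough illustration of the kind of data such a checker works with, the sketch below (the key and responses are invented, and the structures are not the actual VB6 ones) stores each examinee's answers as one character per item, with '*' marking a void answer and a blank marking no answer, and scores them against a single key:

```python
# Illustrative data layout only; the actual TCIAS internals are not published.
KEY = "ABDC"  # hypothetical 4-item answer key; a real test may hold up to 120 items

responses = [
    "ABDC",   # all four items correct
    "ABCC",   # item 3 wrong
    "A*D ",   # item 2 void, item 4 unanswered
]

def score(resp, key=KEY):
    """Number of items answered correctly; void and blank never count."""
    return sum(1 for r, k in zip(resp, key) if r == k)

totals = [score(r) for r in responses]  # one total score per examinee
```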
Testing and Re-testing. Since it is very important that the results of the computer output for a
certain statistical function match the long hand computations given the same set of data, this process of
testing and re-testing was done repeatedly. Whenever the results differed, even at the hundredths
place, adjustments in the program were made until all the conditions were satisfied.
Feasibility. Feasibility was determined from the results of the four half-day seminar-workshops
conducted in four districts of Camarines Sur, where the teacher participants were asked to rate
the system on a five-point scale: 1 - Least Feasible, 2 - Less Feasible, 3 - Feasible, 4 - More
Feasible, and 5 - Most Feasible. The results were validated through interviews conducted by the
researchers with most of the participants.
Functionality. The functionality of the computer software was determined through the output of
the computer.
The Accuracy. The accuracy of the software computation was determined through the
comparison of the output of the software and the longhand computation.
The Efficiency. The efficiency was determined in terms of the percentage of time saved by
the Test Checker and Item Analyzer over the manual/traditional method.
The z-test. This test was used to determine the accuracy of the output generated by the two
methods used in this study on item analysis.
The t-test. The t-test is used in statistics for small-sample tests concerning the difference
between two means; in this project, it was used to test the difference in efficiency, measured in
terms of the average time consumed by the TCIAS versus the traditional procedure.
The Rate of Time Saved was used to measure the degree of efficiency of the TCIAS.
The significance level for the tests conducted in this study was set at one percent (1%).
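The statistical treatment above can be sketched as follows. The timing figures are invented for illustration; only the procedure — a pooled two-sample t-test on the times consumed by each method, plus the rate of time saved — mirrors what the text describes.

```python
# Hedged sketch of the efficiency comparison; the timing data are made up.
import math

manual = [120.0, 135.0, 128.0, 140.0, 132.0]   # minutes, traditional method
tcias  = [12.0, 14.0, 11.0, 13.0, 15.0]        # minutes, TCIAS

def t_statistic(a, b):
    """Two-sample t with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

t = t_statistic(manual, tcias)

# Rate of time saved, as used to express efficiency:
rate_saved = (sum(manual) - sum(tcias)) / sum(manual) * 100
```

With these invented figures the computed t far exceeds a 1%-level critical value, and the rate of time saved comes out near 90%, which is the kind of result the study reports.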
The system accepts the key to correction (answer key) and the actual responses of the
examinees. It then reports the frequency per option per item, including void answers and
no-answer items. The summary shows the class performance per item in comparison with other
classes and the overall performance. It shows the correlation between any two groups and with
respect to the overall performance. It also shows the mean performance (skill) of each class and
the overall performance of the school or division. The competency level (Figure 4) shows the
comparative skills per sub-subject and per subject for the individual, the class, and the school.
The user can also rearrange the item numbers per skill, correlate skills, and change the index for
percent correct. This is done by entering the number of subjects and pressing the Enter key, then
the number of sub-subjects per subject and pressing Enter, then the number of items per skill or
sub-subject.
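The per-option frequency output described above, including void answers and no-answer items, can be sketched like this (the responses and option letters are invented for the example):

```python
# Illustrative per-item option tally; data and names are invented.
from collections import Counter

OPTIONS = "ABCD"

responses = [
    "ABDC",
    "ABCC",
    "A*D ",   # '*' = void answer, ' ' = no answer
    "CBDC",
]

def option_frequencies(responses, n_items):
    """Return, per item, a Counter over A-D plus 'void' and 'no answer'."""
    freq = []
    for i in range(n_items):
        c = Counter()
        for resp in responses:
            ch = resp[i]
            if ch in OPTIONS:
                c[ch] += 1
            elif ch == '*':
                c['void'] += 1
            else:
                c['no answer'] += 1
        freq.append(c)
    return freq

freqs = option_frequencies(responses, 4)
```

A tally of this shape is also what distracter appraisal works from: an option chosen more often than the keyed answer by the upper group signals an item worth revising.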
Summary and Conclusions
The main purpose of this project is the development of computer software named
TEST CHECKER and ITEM ANALYZER with STATISTICS (TCIAS). The product
developed is accurate and efficient, and hence faster than the traditional method.
Problem 1
What is the design of the Test Checker and Item Analyzer with Statistics?
Findings
The program was designed using Visual Basic 6, a programming language. The design of the
software is user-friendly, like most widely used programs designed by programmers throughout
the world. It contains the layout of the title and the menu for the different procedures. It is in
color to attract users. Since the software is user-friendly, anyone can operate it. With one look
at the tabulated results, a user can easily deduce the analysis of each item, each student's
performance, the totality of the test results, and skills competency.
Problem 2
What is the significant difference between the traditional method and the Test
Checker and Item Analyzer with Statistics in terms of accuracy and efficiency in the conduct
of item analysis?
Hypothesis:
The traditional method and the Test Checker and Item Analyzer with Statistics method
are significantly different in terms of accuracy and efficiency in the conduct of item analysis
and test norms.
Findings
The z-test for differences in accuracy of the methods used for item analysis, which was
based on the discrepancy in the decisions made between methods, was found to be zero. This is
less than the critical value of 2.33 at the 1% level; hence, the decisions made by both methods are
exactly the same. The t-test for efficiency yields a value of 12.90 between methods, which is
greater than the critical value of 3.74 at the 1% level under the t-curve. The rate of time saved by
the TCIAS over the traditional method was 90%. This shows that the time consumed, as a
measure of efficiency in item analysis, is highly significantly different, which leads to the
rejection of the null hypothesis.
Conclusion
On efficiency, use of the Test Checker and Item Analyzer with Statistics takes
significantly less time than the other methods; hence, the system is more efficient in
performing item analysis and test norms.
Problem 3
How do teachers rate the level of usability and acceptability of the design of Test
Checker and Item Analyzer with Statistics in terms of feasibility, functionality, accuracy, and
efficiency?
Findings
The Test Checker and Item Analyzer with Statistics is most useful, as perceived by the
elementary and secondary school teachers, as shown by the general weighted mean of 4.56. In
terms of the level of acceptability, the teachers in elementary and high school perceived the
TCIAS as most acceptable with respect to efficiency (4.71), accuracy (4.39), and functionality
(4.35). The technology is more acceptable (3.92) with respect to feasibility (its affordability and
user-friendliness).
Conclusion
The TCIAS is perceived by the elementary, secondary and tertiary school teachers to be
most useful and most acceptable in the conduct of item analysis, test norms, and competency
level. This implies that the teachers acknowledge that with the use of the system, they will be
more efficient and accurate in doing the job. They likewise acknowledge that the system is
functional, affordable by their school, and can be used without much problem.
Recommendations
It is recommended that institutions, most specially the schools considered centers of
excellence, acquire the TCIAS to help facilitate the task of item analysis and thus alleviate the
workload of their teachers. The DepEd must, in fact, set the example by acquiring one for the National
Educational Testing Center. Likewise, it is further recommended that every school under the
supervision of the DepEd, the CHED, DOST, and the TESDA, be enjoined to develop a data
bank of questions by disciplines, consisting of the questions accepted by the system. Item cards
or an electronic data bank for permanent record for each item should be made.
Literature Cited
Anastasi, Anne, "Psychological Testing", 6th Edition, Macmillan Inc., U.S.A., c1988, p. 117
Brown, Frederick G., "Measuring Classroom Achievement", Holt, Rinehart and Winston, U.S.A.,
c1981, pp. 101-110
Ebel, Robert L. and Frisbie, David A., "Essentials of Educational Measurement", 4th Edition,
Prentice-Hall Inc., N.J., c1986, pp. 226-240
Hopkins, Charles D. and Antes, Richard L., "Classroom Measurement and Evaluation", 3rd Edition,
F.E. Peacock Publishers, Inc., U.S.A., c1990, pp. 537-554
Hopkins, Kenneth D. and Stanley, Julian C., "Educational and Psychological Measurement and
Evaluation", 6th Edition, Prentice-Hall Inc., N.J., c1981, pp. 269-288
Jacob, Lucy C. and Chase, Clinton I., "Developing and Using Tests Effectively", Jossey-Bass
Publishers, San Francisco, c1993
Kelley, T. L., "The selection of upper and lower groups for the validation of test items", Journal
of Educational Psychology, 30, 1939, pp. 17-24
Linn, Robert L. and Gronlund, Norman E., "Measurement and Assessment in Teaching", 7th Edition,
Macmillan Publishing Co., U.S.A., c1995, pp. 318-320
Noll, Victor H., "Introduction to Educational Measurement", Cambridge, Massachusetts, c1957,
p. 148
Oosterhof, Albert, "Classroom Applications of Educational Measurement", 2nd Edition, Macmillan
Publishing Co., N.Y., c1994, pp. 196-208
Sax, G., "Principles of Educational and Psychological Measurement and Evaluation", 3rd Edition,
Wadsworth, Belmont, CA, c1989, pp. 227-253
Thorndike, Robert L. and Hagen, Elizabeth, "Measurement and Evaluation in Psychology and
Education", 5th Edition, New York, c1991, pp. 124-128