1.0 Introduction
The increasing number of students in programming classes (Hidekatsu, Kiyoshi, Hiko, & Katsunori, 2006; Hussein, 2008) warrants the use of computer-supported systems to relieve lecturers' academic tasks. Student assessment has become an issue because of the large number of students involved. For a fundamental course such as computer programming, it has become necessary to oversee students' weekly progress to make sure that every student keeps pace with the rest. Recognizing the difficulties involved, Computer-Assisted Assessment (CAA), as explained in (Janet et al., 2003), might help reduce the marking chores and results management. As a consequence, the number of assessments can be increased gradually and student performance can be monitored very closely.
We are also interested in investigating the feasibility and scalability of the system if it were to be widely implemented. The performance and accuracy of the system output will also be evaluated. While some measurements can be obtained directly from the running system, other more subjective measures will be evaluated through user perceptions.
2.0 Related Work
According to (Christopher, David, & James, 2005), the generations of assessment systems can be divided into Early Assessment Systems, Tool-Oriented Systems and Web-Oriented Systems. We classify our CAA system as a Tool-Oriented System, developed using pre-existing tool sets and utilities supplied with the operating system and programming environment. An example of a Tool-Oriented System can be seen in the work of (David & Michelle, 1997), which introduces a scheme that analyses submissions across several criteria. The system, named ASSYST, can analyse the correctness, efficiency, complexity and style of a program. The BOSS system (Mike, Nathan, & Russell, 2005), which is similar to ASSYST, runs on the Unix operating system and is used for C programming assessment. The latest version of BOSS provides a Java GUI application for tutor grading and assignment management. Michael (Michael, Steven, Ann, & Vallipuram, 2004) details a system called GAME for grading a variety of programming languages by comparing program outputs against a marking scheme written in XML. The system can examine both the structure of a program and the correctness of its output. It has also been tested on a number of student programming exercises and assignments, and an analysis comparing human marking with the GAME system provides encouraging results.
According to (James, 2008), the student chapter of the ACM at the College of Charleston hosts programming competitions for high school students every year. Current assessment methods in programming competitions stress completing and running a program rather than its structure or other basic elements; according to him, these other evaluation aspects merit consideration as well. The concept of black-box testing has been used extensively in the judging process of many programming competitions. Black-box testing takes an external perspective of the test object, while white-box testing evaluates the object's internal structure. James (2008) introduces a new paradigm for programming contests that incorporates both technical and artistic criteria in the assessment process.
3.0 Approach
3.1 System architecture
a. Laboratory setup
The architecture of the physical laboratory setup is depicted in figure 1. The setup consists of a number of computers linked together with a network switch. A server is also needed to host a File Transfer Protocol (FTP) application server. Each user or team is allocated an account on the server, and students or contestants submit their programs to be stored on the server upon completing them. We have also made the necessary configuration to isolate the participants from each other even though they are connected to the same network. Access to outside networks such as the Internet has also been disabled.
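A submission step of this kind can be sketched with Python's standard `ftplib`. The host name, credentials and file names below are illustrative placeholders, not the actual lab configuration; the sketch only shows the shape of the upload the students would perform.

```python
# Hypothetical submission helper: uploads a finished program into the
# user's account on the contest FTP server. Because the server restricts
# each account to its own dedicated folder, a plain STOR into the login
# directory is sufficient.
import ftplib
import os


def submit_program(host: str, user: str, password: str, local_path: str) -> None:
    """Upload local_path to the user's folder on the FTP server."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        with open(local_path, "rb") as fh:
            ftp.storbinary("STOR " + os.path.basename(local_path), fh)
```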
b. Automated processing
Several tools support the automated processing, such as a data-processing interface, program testing and, optionally, program benchmarking. The overall automated processing is depicted in figure 2.
Figure 2: Flowchart of the automated processing (start, compilation, optional benchmark, acceptance, end).
c. Functional test
The main purpose of functional testing is to verify the program output submitted by the students. The verification can be done by manually viewing the output files or by performing a similarity test, as illustrated in figure 3. The tested program is accepted if it passes the similarity test; acceptance is determined by comparing the similarity value with a set threshold. In our experiments the threshold value has been set between 50% and 80%. The higher the threshold value, the more rigid the similarity test becomes.
Figure 3: Similarity test — the user's program output is compared with the answer scheme against a threshold value, leading to acceptance or rejection.
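The threshold-based acceptance described above can be sketched in a few lines. The similarity measure here is `difflib.SequenceMatcher` from the Python standard library, chosen purely for illustration; the paper does not specify which similarity function the actual system uses.

```python
# Sketch of the similarity test in figure 3: the submitted program's
# output is compared with the instructor's answer scheme, and the
# program is accepted only when the similarity ratio meets the
# threshold (set between 0.5 and 0.8 in the experiments).
import difflib


def similarity(output: str, answer_scheme: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, output, answer_scheme).ratio()


def accept(output: str, answer_scheme: str, threshold: float = 0.8) -> bool:
    """Accept the program only if its output is similar enough."""
    return similarity(output, answer_scheme) >= threshold
```

Raising the threshold makes the test more rigid: an exact match always passes, while small formatting deviations fail at high thresholds.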
d. Performance test
Performance measures are collected through the testing process shown in figure 4. It is, however, not always necessary to run performance tests unless the problem given to the students requires complex functions, in which case the judges will be interested in evaluating the efficiency of the implemented functions. It is impractical to compute the Big-O complexity (Arefin, 2006) of each program given the limited assessment time; the performance measure is therefore the closest yardstick to gauge the elegance and programming style of the contestants (or students).
Figure 4: Performance test — the compiled program is run under a profiler, and the profiling data is summarized into a performance report.
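A minimal stand-in for the performance test is to time a submission against a test input. The actual system relies on profiler reports rather than wall-clock timing, so the function name, timeout and command below are illustrative assumptions only.

```python
# Illustrative performance measurement: run a compiled submission with
# the given stdin text and record its wall-clock execution time.
import subprocess
import time


def run_and_time(command, stdin_text="", timeout=10.0):
    """Run the program; return (elapsed_seconds, stdout)."""
    start = time.perf_counter()
    result = subprocess.run(command, input=stdin_text, capture_output=True,
                            text=True, timeout=timeout)
    elapsed = time.perf_counter() - start
    return elapsed, result.stdout
```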
In line with the practice in any programming competition, we only allow a one-time submission for each programming problem. After the submission of a particular program, the system locks the file, preventing any further modification; students therefore have to test and evaluate their own programs thoroughly before submitting the answer. Whenever a program is submitted, the system records its submission time. This mechanism allows the calculation of the time taken to complete (TTC) the program. The assumption we make is that every student works on one problem at a time; the submission of one solved problem therefore marks the beginning of work on the following program to be submitted. The calculation of time taken is still relevant even if the students are given a number of questions at a time, since their decision on which problem to solve first does not affect the calculation of TTC. Security mechanisms employed in the FTP server limit each student's access to their dedicated folder. This prevents any attempt to sabotage other competing teams or to plagiarize other teams' work. The networking equipment is also configured to allow access only from clients to servers. Client-to-client and client-to-Internet communication are blocked in order to maintain a totally isolated environment.
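The TTC calculation described above can be sketched directly from the recorded submission timestamps: each program's TTC is the gap between its submission and the previous one (or the session start, for the first program). The function below is a sketch under that assumption, not the system's actual code.

```python
# Sketch of the time-to-complete (TTC) calculation: under the
# one-problem-at-a-time assumption, each submission's TTC is the time
# elapsed since the previous submission (or since the session start).
from datetime import datetime


def time_to_complete(session_start, submission_times):
    """Return the TTC (in minutes) of each submission, in time order."""
    ttcs = []
    previous = session_start
    for t in sorted(submission_times):
        ttcs.append((t - previous).total_seconds() / 60.0)
        previous = t
    return ttcs
```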
The questions are designed in such a way that weekly progress can be evaluated easily. Each question is tailored to specific learning objectives/topics; to distinguish between good and average students, however, there are questions that combine several topics at once. These questions are stored in a dedicated question bank and are not provided to students except during the administration of the tests. The simple reason behind this is to make sure that future students will not have already been exposed to the same questions, which would render further experiments invalid or biased.
Three criteria have been selected for the evaluation of the student programs: functionality, time to complete and program performance. Functionality is measured simply by conducting black-box testing of the program: if the program conforms to the set requirements it is accepted, otherwise it is rejected (Hussein, 2008). Time to complete is the duration taken by a student to complete a functional/acceptable program, recorded once the student submits the program to the FTP server.
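The black-box functionality check can be sketched as running the program on a test input and comparing only its observable output, never its source. Command lines and expected outputs below are placeholders for whatever the instructor's answer scheme prescribes.

```python
# Minimal black-box acceptance check: the program is judged solely by
# whether it runs cleanly and its output matches the expected output
# for the given test input.
import subprocess


def blackbox_accept(command, test_input: str, expected_output: str) -> bool:
    """Run the program on test_input; accept on a matching output."""
    result = subprocess.run(command, input=test_input,
                            capture_output=True, text=True, timeout=10)
    return (result.returncode == 0
            and result.stdout.strip() == expected_output.strip())
```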
c. Student performance
Since the system is adapted to a teaching environment, we have found it necessary to introduce two derived measures: competency and proficiency.
We define competency as a measure that links learning objectives/topics to the validity of the programs that students write. A program is considered valid only if it successfully compiles, runs and produces output whose similarity value is greater than the set threshold. When a program passes the validity check, we record the mark in the respective student's record. Proficiency is defined in this context as the ability to produce the correct program within a reasonable time; in certain cases we also link proficiency with the ability to produce elegant solutions within a reasonable time. Reasonable time is in turn defined by the average time-to-complete (for any specific program) of the whole class or student batch. While there may be more sophisticated measures of elegance in programming, we limit ourselves in this work to relying on the performance report of each program.
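A competency mark of the kind recorded per question could be computed as sketched below. The weights (1 mark for running, up to 4 weighted marks for similarity, 5 in total) mirror the column headings of table 1, but the exact scheme used in the lab is not spelled out in the text, so this is an illustrative assumption.

```python
# Hypothetical per-question competency score: a program that fails to
# run scores nothing; otherwise it earns a run mark plus a similarity
# mark weighted by its similarity ratio (0.0..1.0).
def competency_marks(runs: bool, similarity: float,
                     run_weight: int = 1, sim_weight: int = 4) -> float:
    """Total marks for one lab question (0 .. run_weight + sim_weight)."""
    if not runs:
        return 0.0
    return run_weight + sim_weight * similarity
```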
Table 1: Student's competency by topic

Student Name:
Lab | Topic            | Test run (0/1) | Similarity (4*) | Total marks (5)
 4  | Repetition For   |       1        |        1        |        2
 5  | Repetition While |       0        |        0        |        0
 6  | Functions        |       0        |        0        |        0
*weighted score
Table 2: Student's proficiency by topic

Student Name:
Lab | Topic            | Test run (0/1) | Performance (4*) | Time to complete (5*) | Total marks (10)
 4  | Repetition For   |       1        |                  |                       |
 5  | Repetition While |       0        |                  |                       |
 6  | Functions        |       1        |                  |                       |
*weighted score
We have successfully implemented the system and have had our students use it for a period of four months. At the end of the semester, we conducted a study of the students' perception of the incorporation of the new tool in their learning environment, using some questions similar to those in (Hussein, 2008). The purpose of this survey is to gather students' views on several aspects such as the fun element, self-awareness, discipline, motivation, accuracy of the system in marking and the overall impact on the students' learning experience.
The subjects of our study were a class of 30 students undertaking a computer programming class. All of them were Part 2 Quantitative Science (CS113) students, and the subject taught was Introduction to Computer and Problem Solving (using C++). The total coursework mark allocated for the subject is 40%, and 20% of it comes from the lab test conducted using the new system.
4.0 Results and Analysis
Thirty students were randomly divided into ten programming teams. They were given ten questions of medium difficulty and asked to answer as many as they could within a period of two hours. At the end of the lab session they were given the report generated by the system and asked to review their answers and compare the results obtained with those of the other teams. Finally, each of them was given a questionnaire to complete, as shown in table 3.
Table 3: Survey questions and responses (N = no answer; scale 1 to 4)

Question                                                                           | N | 1 | 2 | 3  | 4
1. Do you find the system enjoyable to use in a learning environment?              | 2 | 1 | 1 | 18 | 8
2. Rate how the system increases your motivation in learning computer programming. | 0 | 0 | 1 | 19 | 10
3. Do you find the marks given by the system fair?                                 | 0 | 2 | 2 | 15 | 11
4. Rate how the system helps you identify your weaknesses in programming.          | 0 | 0 | 0 | 12 | 18
5. Rate how the system helps increase your team spirit.                            | 3 | 0 | 0 | 8  | 19
6. Rate your overall experience.                                                   | 0 | 0 | 4 | 19 | 7
The survey results show that 86.6% of the students found the experience enjoyable; one student did not like the experience at all, while two other students did not answer. 96.6% of respondents concluded that the system helps increase their level of motivation in learning computer programming. All except one student agreed that the marks awarded by the system were fair. The results also show that 100% of the respondents think the system could help them pinpoint their weaknesses in computer programming. All except three students found that the exercise helps them increase their teamwork. Finally, 86.6% of the respondents agreed that the overall experience was good.
We were also interested in how the system would perform in an actual teaching environment. We therefore selected 10 programming questions of average difficulty and instructed our students to answer as many as they could, correctly and as fast as they could. Below are the details of the setup used in the performance test.
The test conducted has shown that an instructor typically requires less than four minutes to complete the assessment process for a class of thirty students. Even if the number increases to forty students, only about six minutes are required. This is of course a big leap compared with the old way of manual marking.
We have not considered the delay taken to transfer the programs to the server, since this happens before the assessment process and therefore does not affect the performance of the assessment system.
Given the setup and resource requirements to implement the system, we can conclude that it is
indeed a scalable system. The same setup can be replicated in other labs to achieve the same level of
productivity.
Through this experience we must also acknowledge that the system is only capable of assessing the functional aspects of a program, given that the verification method employed here is a black-box test. We are thus unable to measure aspects such as the elegance of a program, except insofar as it is tied to its performance. We are also unable to test a program's behaviour when it comes to handling illegal input and exceptions.
Future work can be directed towards improving the similarity-checking function to allow more flexibility when comparing program outputs with the answer scheme provided by the instructors. A better similarity-checking function could also be used to detect plagiarism in the computer programs submitted to the system.
References
Arefin, A. S. (2006). The Art of Programming Contest (2nd ed., Special Online Edition). Gyankosh Prokashoni.
Christopher, D., David, L., & James, O. (2005). Automatic test-based assessment of
programming: A review. J. Educ. Resour. Comput., 5(3), 4.
David, J., & Michelle, U. (1997). Grading student programs using ASSYST. Paper presented at
the Proceedings of the twenty-eighth SIGCSE technical symposium on Computer science
education.
Hidekatsu, K., Kiyoshi, A., Hiko, M., & Katsunori, M. (2006). Using an automatic marking system
for programming courses. Paper presented at the Proceedings of the 34th annual ACM
SIGUCCS conference on User services.
Hussein, S. (2008). Automatic marking with Sakai. Paper presented at the Proceedings of the
2008 annual research conference of the South African Institute of Computer Scientists and
Information Technologists on IT research in developing countries: riding the wave of
technology.
James, F. B. (2008). A new paradigm for programming competitions. Paper presented at the
Proceedings of the 39th SIGCSE technical symposium on Computer science education.
Janet, C., Kirsti, A.-M., Ursula, F., Martin, D., John, E., William, F., et al. (2003). How shall we
assess this? Paper presented at the Working group reports from ITiCSE on Innovation and
technology in computer science education.
Michael, B., Steven, G., Ann, N., & Vallipuram, M. (2004). An experimental analysis of GAME: a
generic automated marking environment. SIGCSE Bull., 36(3), 67-71.
Mike, J., Nathan, G., & Russell, B. (2005). The boss online submission and assessment system. J.
Educ. Resour. Comput., 5(3), 2.