
Synopsis Report On

Novel Approach to evaluate student
performance using Data Mining
Submitted in partial fulfillment for the award of the degree
Of
BACHELOR OF ENGINEERING
In
INFORMATION TECHNOLOGY
By

Rahul Raghavan
Manas Saxena
Sagar Wahal

Under the guidance of
Mr. Anil Vasoya
Designation
Assistant Professor (IT)

Academic Year 2013-2014













ACKNOWLEDGEMENT
We are foremost thankful to the Principal of our college, Dr. B.K. Mishra, who has taken
strenuous efforts to provide us with excellent lab facilities.
We are greatly indebted to our internal project guide, Prof. Anil Vasoya, for his guidance
and enlightening comments, which have helped us understand our project work better.
We would like to thank him for his helpful suggestions and for the numerous discussions
through which he has guided us.
We are also thankful to our Head of Department, Dr. Kamal Shah, and our Project
Coordinator, Dr. Vinayak Bharadi, who gave us constant motivation, guidance and
encouragement for the project.
We are also grateful to our classmates and friends who have given us feedback and
encouragement.
Finally, we wish to thank our college, Thakur College of Engineering and
Technology, for providing us with a platform and the necessary facilities to make this
project possible.

Name of Students:


Rahul Raghavan


Manas Saxena


Sagar Wahal




ABSTRACT
Data mining is the process of extracting hidden information from large volumes of data. The
data mining techniques commonly used are classification, clustering and association mining.
All of these techniques can be applied to educational data to predict a student's academic
performance and to determine the areas in which he is currently lacking.
The student can evaluate his performance and find the areas he needs to improve in order to
increase his percentage. While calculating a student's performance we take into consideration
the student's marks in previous semesters, his term test marks, attendance, viva marks and
other factors.
Here we use the One R algorithm and frequency tables to compute a score which determines
how important a particular area is. The accuracy of this approach can be measured by
comparing the predicted score with the actual score.
Teachers can forward the resulting report to the students. They can also determine which
students are currently lacking, based on their marks and other factors. Using this data a
teacher can motivate a student to improve his performance in a particular area. Students
can also view the report themselves and make improvements in the areas in which
they are lacking.



CERTIFICATE
This is to certify that Rahul Raghavan, Manas Saxena and Sagar Wahal are bona fide
students of Thakur College of Engineering and Technology, Mumbai. They have
satisfactorily completed the requirements of PROJECT-I as prescribed by the University of
Mumbai while working on "Novel Approach to evaluate student performance using Data
Mining".



(Signature) (Signature) (Signature)
Name: Name: Name:
(Internal Guide) (Internal Examiner) (External Examiner)











Thakur College of Engineering and Technology
Kandivali (E), Mumbai 400101

Place:
Date:
(College Round Seal)
(Signature)
Name:
(Head of department)


(Signature)
Name:
(Principal)
CONTENTS

Chapter 1 Overview
1.1 Importance of the Project
1.2 Literature Survey
1.3 Motivation
1.4 Scope of the Project

Chapter 2 Proposed Work
2.1 Problem Definition
2.2 Methodology
2.3 Data Flow Diagram
2.4 As per guide's instructions

Chapter 3 Analysis & Planning
3.1 Feasibility Study
3.2 Project Planning
3.3 Gantt Chart

Chapter 4 Results & Discussion

Chapter 5 Conclusion

References


Chapter 1: Overview
1.1 Importance of the project

Evaluation is a systematic process of collecting, analyzing and interpreting evidence of
students' progress and achievement, in both cognitive and non-cognitive areas of learning, for the
purpose of taking a variety of decisions. Evaluation thus involves gathering and processing
information and decision-making.

The present system of evaluation at the school stage suffers from a number of imperfections. The
first and foremost shortcoming of the evaluation system is that it focuses only on cognitive
learning outcomes and completely ignores the non-cognitive aspects, which are a vital component
of human personality.

Another shortcoming of the present examination system is that results are declared in terms
of raw marks, which also depend on the subjectivity of the examiner.

In our project we try to extract useful knowledge from graduate students' data collected from
Thakur College of Engineering & Technology, Mumbai. We use various data mining
algorithms to evaluate students' performance. Using these algorithms we extract knowledge
that describes students' performance in the end-of-semester examination. This also helps in the
early identification of dropouts and of students who need special attention, and allows the
professor to provide appropriate advising and counseling.

This project attempts to correct the shortcomings of the current system of student evaluation. The
project intends to extract knowledge from the raw data available. This information can give the
college management an insight into the strengths and weaknesses of a student. Armed with
this information, the college management can help students work on their personal weaknesses.
The project also contains a tool that can predict the performance of students in future
examinations.
This project provides an innovative approach to the evaluation of student performance. It
extends the reach of the current system, which helps students grow as individuals.












1.2 Literature Survey
1.2.1 Use of Data Mining Techniques for the Evaluation of Student
Performance: A Case Study
ABSTRACT:
In this paper the author introduces the concept of extracting information from the large
database of Sri Sai University, Palampur. The author uses the marks obtained by students in their
postgraduate exams, along with other factors. The author also introduces various techniques to
improve postgraduate students' performance and to identify students with low grades. The data
cover a period of one and a half years. Clustering, decision trees and neural networks are
used to evaluate students' performance. The approach also helps in identifying dropouts and
students who need special counseling.
The drawbacks of this system are:
- The system only takes into consideration the marks of the students. It completely ignores
  the non-cognitive factors. We believe that those factors have a lasting impact on the
  performance of the student.
- The system does not provide any suggestions about future options for the student.
- The system does not evaluate the strengths and weaknesses of the student. [1]
Author: Er. Rimmy Chuchra

1.2.2 Predicting students' performance using ID3 and C4.5 Classification
algorithms

ABSTRACT:
- This paper introduces the concept of predicting a student's marks based on previous
  performance.
- The authors take into consideration a number of factors, such as scores in the board
  examinations of classes X and XII.
- The system uses data mining algorithms such as ID3 and C4.5 to predict the
  marks accurately.
However, the drawbacks of this system are:
- The system does not take into consideration a student's family background, socio-economic
  factors and friend circle.
- The system does not suggest ways in which a person can improve his marks.
- The system does not give proper results in the case of missing data. [2]
Author(s): Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao



















1.3 Motivation
One set of existing student evaluation systems analyses newly generated
data from separate examinations conducted solely for the system.
In this project we perform data mining on data which is already available. This allows easy
integration of our system with the current system. In addition, we also include non-cognitive data
such as family background and friend circle.
Another set of systems which we studied evaluated the student based on the existing system, but
did not include an analysis of the strengths and weaknesses of the student. Another important
aspect missing from the systems we studied was the inclusion of future options.
Thus, broadly speaking, our motivation for the project is to provide a student evaluation
system that can be easily integrated with the existing system and that, in addition, provides
mathematically calculated recommendations on ways to improve performance in the
coming semester.













1.4 Scope of the Project
The system will take into account a number of factors by gathering data about a student's
semester marks, term test marks, attendance, background and various other factors
from the IT Department of Thakur College of Engineering and Technology (Mumbai), and will use
them to predict the student's marks. All these factors will be taken into consideration while
designing the final project, which a student can use in the decision-making process. A
student can use this information to monitor his progress and to determine his academic
strengths.
The system will analyze the performance of the student and highlight those parameters on which
the student needs to work in order to improve his performance in the near future.

















CHAPTER 2: PROPOSED WORK
2.1 Problem Definition
The primary purpose of our project is to provide a novel approach to evaluating student
performance using data mining. Data mining is a technique used
to extract hidden patterns from databases. Its major advantages are the fast
retrieval of information, knowledge discovery from databases, detection of hidden
patterns, reduced complexity and time saving.
The main objective of an educational institution is to provide high quality education to its
students and to improve the quality of managerial decisions. One way to achieve high quality
education is by discovering knowledge from the educational database and using it to create an
environment that helps students grow.
The application has the following objectives:
- To predict student performance on the basis of both cognitive and non-cognitive
  parameters. The parameters are as follows:
  - Aggregate marks
  - Term work
  - Term test marks
  - Viva marks
  - Practical marks
  - 10th marks
  - 12th marks
  - Attendance
  - Family background
  - Hostelite or day scholar
  - Friend circle
  - Educational background of father and mother
  - MH-CET marks
  - Any live KTs or not
  - Educational background of siblings
  - Current posting of siblings
  - Current posting of father
  - Income of family
  - Mother's job profile
- To analyse, after the prediction, the different parameters used to make it.
- To identify the strengths and weaknesses of the student on the basis of this prediction.
- To communicate to the student the parameters on which they need to work in order to
  improve their overall performance.
- A flexible design strategy which allows future updates and improvements.
- Round-the-clock availability.
- An easy and understandable graphical user interface.
- Strong security measures to protect the student database against attacks.
We expect this application to be used by college professors and administrators for evaluating
student performance and taking important managerial decisions. The primary objective of
this application is to provide professors with a detailed evaluation of a student so that they
can adapt their teaching style to suit the needs of the student. The information provided
by the application can also be used by visiting companies to shortlist those candidates that
suit their requirements.



















2.2 Methodology
1.2.1: One R Algorithm:
Description: One R, short for "One Rule", is a simple yet accurate classification algorithm that
generates one rule for each predictor in the data, and then selects the rule with the smallest total
error as its "one rule". The basic idea behind this algorithm is to test every single attribute and
branch on every value of that attribute. In our case we are predicting how a particular student
can improve his performance using attributes such as term test marks, viva marks and other
factors.
Algorithm:
The One R algorithm is used to calculate the weightage to be given to each parameter:
1. Classify each parameter into the ranges High, Medium and Low.
2. Classify the target attribute into High, Medium and Low.
3. Calculate the success percentage of each parameter.
We then calculate the total error for each frequency table and find the frequency table with the
lowest total error. A low total error means that the parameter contributes more to the accuracy
of the model.
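The textbook form of One R described above can be sketched in Python as follows. This is a minimal illustration and not the project's implementation: the function name `one_r`, the field names and the four toy records are assumptions made only for the example.

```python
from collections import Counter, defaultdict

def one_r(rows, predictors, target):
    """Return (predictor, rule, errors) for the predictor with the fewest errors."""
    best = None
    for predictor in predictors:
        # Frequency table: predictor value -> counts of each target class.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[predictor]][row[target]] += 1
        # One rule per predictor: each value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Total error: samples that do not belong to the majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (predictor, rule, errors)
    return best

# Hypothetical toy records, not taken from the report's sample data.
rows = [
    {"attendance": "High", "viva": "High", "result": "High"},
    {"attendance": "High", "viva": "Low", "result": "High"},
    {"attendance": "Low", "viva": "Low", "result": "Low"},
    {"attendance": "Low", "viva": "High", "result": "Low"},
]
predictor, rule, errors = one_r(rows, ["attendance", "viva"], "result")
print(predictor, rule, errors)
```

In this toy data attendance separates the classes perfectly, so it is selected as the single rule; viva is wrong for two of the four records.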
One Rule Algorithm on a chosen data set:
To illustrate, we have collected sample data on students' performance. We already know the
marks the students obtained in semester 4. We use this information to calculate the impact
each input parameter has on the final result.

Figure 1.2.1.1 Sample data of students
H: High, M: Medium, L: Low

We have classified our input parameters into the following categories:
Aggregate marks in previous semester:
Above 70: High
Between 60 and 70 (inclusive): Medium
Below 60: Low
Attendance in this semester:
Above 80: High
Between 60 and 80 (inclusive): Medium
Below 60: Low
Term work in this semester:
Above 85: High
Between 75 and 85 (inclusive): Medium
Below 75: Low
Viva marks in this semester:
Above 85: High
Between 80 and 85 (inclusive): Medium
Below 80: Low
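The ranges above can be expressed as a small lookup, sketched here in Python. The names `THRESHOLDS` and `categorize` are hypothetical, and the sketch assumes that both boundary values of every Medium range are inclusive, which the text states explicitly only for aggregate marks and attendance.

```python
# (low_cutoff, high_cutoff) per parameter: values below low_cutoff are Low,
# values above high_cutoff are High, everything in between is Medium.
THRESHOLDS = {
    "aggregate": (60, 70),
    "attendance": (60, 80),
    "term_work": (75, 85),
    "viva": (80, 85),
}

def categorize(parameter, value):
    """Map a numeric value to High, Medium or Low for the given parameter."""
    low_cutoff, high_cutoff = THRESHOLDS[parameter]
    if value > high_cutoff:
        return "High"
    if value < low_cutoff:
        return "Low"
    return "Medium"

print(categorize("attendance", 80))  # boundary value falls in Medium
print(categorize("viva", 92))
```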
Our class level attribute here is the semester 4 marks. We take each parameter and attempt to
match it with the class level attribute. For example, if the input parameter Attendance is High
and the class level attribute (Semester 4 marks) is also High, then we have a match. Similarly,
we count all such matches in our sample data. We finally get the following results:
Frequency Table:
For input parameter: 3rd semester marks

              Class level attribute: Semester 4
Semester 3    High         Medium       Low
High          2 (Match)    1            0
Medium        1            1 (Match)    0
Low           0            0            2 (Match)

Table 1.2.1.1 Frequency Table generated for the parameter 3rd semester marks
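A frequency table of this kind can be built by counting (input level, class level) pairs, as in the sketch below. The individual student records are not reproduced in this report, so the `pairs` list is reconstructed from the counts in Table 1.2.1.1 and is an assumption for illustration; the function name `frequency_table` is also hypothetical.

```python
from collections import Counter

LEVELS = ("High", "Medium", "Low")

# (Semester 3 level, Semester 4 level) pairs reconstructed from the counts in
# Table 1.2.1.1; these are not the original student records.
pairs = (
    [("High", "High")] * 2
    + [("High", "Medium")]
    + [("Medium", "High")]
    + [("Medium", "Medium")]
    + [("Low", "Low")] * 2
)

def frequency_table(pairs):
    """Count how often each (input level, class level) combination occurs."""
    counts = Counter(pairs)
    return {row: {col: counts[(row, col)] for col in LEVELS} for row in LEVELS}

table = frequency_table(pairs)
for row in LEVELS:
    print(row, [table[row][col] for col in LEVELS])
```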

Now we calculate the success rate from the total count of matches with the following formula:
Success Rate = (Number of successful matches / Total number of samples) * 100
Example: Success rate for input parameter 3rd semester marks = ((2+1+2)/7)*100 = 71.42%
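The success rate is the share of samples that lie on the diagonal of the frequency table. The sketch below applies the formula to the counts of Table 1.2.1.1; the function name `success_rate` is hypothetical. The exact value is 71.4285...%, which this report writes truncated as 71.42%.

```python
def success_rate(table):
    """Percentage of samples whose input level equals the class level."""
    total = sum(sum(row.values()) for row in table.values())
    matches = sum(table[level][level] for level in table)
    return 100.0 * matches / total

# Counts taken from Table 1.2.1.1 (rows: Semester 3, columns: Semester 4).
semester3 = {
    "High": {"High": 2, "Medium": 1, "Low": 0},
    "Medium": {"High": 1, "Medium": 1, "Low": 0},
    "Low": {"High": 0, "Medium": 0, "Low": 2},
}
rate = success_rate(semester3)
print(f"{rate:.4f}")
```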
For input parameter: Attendance

              Class level attribute: Semester 4
Attendance    High    Medium    Low
High          3       1         0
Medium        0       1         0
Low           0       0         2

Table 1.2.1.2 Frequency Table generated for the parameter Attendance
Success rate = 85.71%
For input parameter: Term work

              Class level attribute: Semester 4
Term work     High    Medium    Low
High          2       1         0
Medium        1       1         0
Low           0       0         2

Table 1.2.1.3 Frequency Table generated for the parameter Term Work
Success rate = 71.42%







For input parameter: Viva

              Class level attribute: Semester 4
Viva          High    Medium    Low
High          3       0         0
Medium        0       1         0
Low           0       1         2

Table 1.2.1.4 Frequency Table generated for the parameter Viva
Success rate = 85.71%
We have now calculated the percentage success rate of each parameter individually. Next, we
calculate the impact each parameter has on the final aggregate score of the student, that is,
the overall weight of each factor in predicting the final result:
Parameter Impact Rate (PIR) = Success rate of the parameter / Sum of the success rates of all parameters
Example: the Parameter Impact Rate for the input parameter Attendance is calculated as follows:
Parameter Impact Rate for Attendance = 85.71 / (71.42+85.71+71.42+85.71) = 0.27

Input Parameter    Success rate (%)    Parameter Impact Rate (PIR)
Semester 3         71.42               0.22
Attendance         85.71               0.27
Term Work          71.42               0.22
Viva               85.71               0.27

Table 1.2.1.5 Table for Parameter Impact Rate
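The normalisation behind the PIR can be sketched in Python as below, using the success rates from Table 1.2.1.5. The function name `parameter_impact_rates` is hypothetical. Note that 71.42/314.26 is about 0.227, which the table lists truncated as 0.22.

```python
def parameter_impact_rates(success_rates):
    """Divide each success rate by the sum of all success rates."""
    total = sum(success_rates.values())
    return {name: rate / total for name, rate in success_rates.items()}

# Success rates (%) taken from Table 1.2.1.5.
success_rates = {
    "Semester 3": 71.42,
    "Attendance": 85.71,
    "Term Work": 71.42,
    "Viva": 85.71,
}
pir = parameter_impact_rates(success_rates)
for name, value in pir.items():
    print(name, f"{value:.4f}")
```

Because each rate is divided by the same total, the PIR values always sum to 1 and can be read as relative weights.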
Once we have obtained the PIR for all the parameters, we use this information to predict the
marks of the student in the current semester. We have taken the example of the student Shivam
Thakur and predicted his marks for the 5th semester on the basis of the PIR we obtained.
To predict his marks for the current semester we first calculate his Parameter Impact
Score (PIS) for the previous semester. We then calculate the PIS for the current
semester and, by the unitary method, predict the marks for the current semester. The PIS is
obtained by the formula given below:
Parameter Impact Score (PIS) = Parameter Impact Rate * Parameter Value

Example: for the input parameter 3rd semester marks, the PIS will be:
PIS = 0.22*73 = 16.06
Now we calculate the PIS of Shivam Thakur for Semester 4:

Name             PIS for 3rd       PIS for Attendance    PIS for Term Work    PIS for Viva
                 semester marks    (4th semester)        (4th semester)       (4th semester)
Shivam Thakur    16.06             24.84                 19.8                 24.84

Table 1.2.1.6 Table for Parameter Impact Score
Mean PIS for Semester 4 = (16.06+24.84+19.8+24.84)/4 = 21.39
We now know that when Shivam Thakur obtained a mean PIS of 21.39, his aggregate score was
78%.
Similarly, we calculate the PIS of the individual input parameters of Shivam Thakur for
Semester 5:

Name             4th semester    Attendance        Term Work         Viva
                 marks           (5th semester)    (5th semester)    (5th semester)
Shivam Thakur    78              80                85                78
PIS              17.16           21.6              18.7              21.06

Table 1.2.1.7 Table for Parameter Impact Score
Mean PIS for 5th semester = (17.16+21.6+18.7+21.06)/4 = 19.63
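The PIS values and their mean in Table 1.2.1.7 can be reproduced with the short sketch below. It uses the PIR values as listed in Table 1.2.1.5 (0.22 and 0.27) and the parameter values from Table 1.2.1.7; the function names `parameter_impact_scores` and `mean_pis` are hypothetical.

```python
def parameter_impact_scores(pir, values):
    """PIS = Parameter Impact Rate * parameter value, for every parameter."""
    return {name: pir[name] * values[name] for name in pir}

def mean_pis(scores):
    """Arithmetic mean of the individual PIS values."""
    return sum(scores.values()) / len(scores)

# PIR as listed in Table 1.2.1.5; values for Semester 5 from Table 1.2.1.7.
pir = {"Semester marks": 0.22, "Attendance": 0.27, "Term Work": 0.22, "Viva": 0.27}
values = {"Semester marks": 78, "Attendance": 80, "Term Work": 85, "Viva": 78}

scores = parameter_impact_scores(pir, values)
mean_score = mean_pis(scores)
print({name: round(score, 2) for name, score in scores.items()})
print(round(mean_score, 2))
```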
So, when the mean PIS of Shivam Thakur was 21.39, he obtained 78% marks. Therefore, when the
mean PIS value is 19.63, the marks he obtains will be:
Predicted percentage for Semester 5 = (78/21.39)*19.63 = 71.58
His actual 5th semester marks = 73.2%
Hence we were able to make an approximation of the marks he would obtain in semester 5.
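The unitary-method step can be checked with a few lines of Python. The sketch uses the mean PIS of 21.39 for Semester 4, the mean PIS of 19.63 for Semester 5 from Table 1.2.1.7, and the actual result of 73.2%; the function name `predict_marks` is hypothetical.

```python
def predict_marks(previous_marks, previous_mean_pis, current_mean_pis):
    """Scale the previous marks by the ratio of current to previous mean PIS."""
    return previous_marks / previous_mean_pis * current_mean_pis

predicted = predict_marks(78, 21.39, 19.63)
actual = 73.2
error = abs(actual - predicted)
print(f"predicted={predicted:.2f} actual={actual} error={error:.2f}")
```

For this student the prediction is about 1.6 percentage points below the actual result. A single example does not establish the accuracy of the method; comparing predicted and actual marks over many students would be needed for that.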




2.3 DATA FLOW DIAGRAM
LEVEL 0 DFD

[Diagram not reproduced. Elements shown: Login, Faculty, Administrator, Database]
Fig 2.3.1: Data flow diagram Level 0

LEVEL 1 DFD

[Diagram not reproduced. Elements shown: Administrator, Verify Student, Student, Add/delete
records, Update Data, Faculty Database, Verify Faculty, User Database, Login, User, Register,
Student Database, Faculty]
Fig 2.3.2: Data flow diagram Level 1

LEVEL 2 DFD Students

[Diagram not reproduced. Elements shown: Faculty, Login, Edit Personal Details, Generate
Reports, Faculty Database, User, User Database, Register, Student Search, Report errors,
Analysis]
Fig 2.3.3: Data flow diagram detailing the Student's flow

LEVEL 2 DFD Faculty

[Diagram not reproduced. Elements shown: Faculty, View Student Data, Mail Report, Generate
Report, Login, Placement, Generate Reports, Feedback, Student Database]
Fig 2.3.4: Data flow diagram detailing the Faculty's flow

2.4 As per guide's instructions
The following modifications were suggested by our guide during the design phase of the project:
Prediction:
Our guide suggested that we increase the number of parameters used to predict
the performance of the students. After further research and evaluation we came up with the
following additional parameters:
- Hostelite or day scholar
- Educational background of father and mother
- MH-CET marks
- Any live KTs or not
- Educational background of siblings
- Current posting of siblings
- Current posting of father
- Income of family
- Mother's job profile











CHAPTER 3: Analysis and Planning
3.1 Feasibility Study:
As per our project "Novel Approach to evaluate student performance", the total requirement for
setting up the project is given below:
Time Feasibility: Our project requires only software, so the time required for the project is
6-8 months.
We are designing our software according to the Software Development Life Cycle (SDLC).

Sr. No.    Criteria                       Time period
1.         Feasibility Study              0.5 months
2.         Analysis and Data Gathering    1.5 months
3.         Design of project              1.5 months
4.         Implementation (Coding)        2 months
5.         Testing and Finalization       1 month
6.         Maintenance                    1.5 months
           TOTAL                          8 months

Table 3.1.1: Software development lifecycle

Software required: The whole project is designed using Java technology.

Project Schedule:
VII Semester Timeline:

Name                                      Duration    Start         Finish
1. Requirement Analysis                   25 days     2/08/2013     27/08/2013
1.1 Software specification                4 days      2/08/2013     6/08/2013
1.2 Presentation                          7 days      6/08/2013     13/08/2013
1.3 In-house requirement specification    2 days      13/08/2013    15/08/2013
1.4 SRS                                   8 days      15/08/2013    23/08/2013
1.5 Requirement Gathering                 4 days      23/08/2013    27/08/2013
2. Analysis                               12 days     27/08/2013    8/09/2013
2.1 User Requirements                     3 days      27/08/2013    30/08/2013
2.2 Functional Requirements               5 days      30/08/2013    4/09/2013
2.3 Non-functional Requirements           4 days      4/09/2013     8/09/2013
3. Design                                 21 days     8/09/2013     29/09/2013
3.1 Architecture Design                   6 days      8/09/2013     14/09/2013
3.2 Database Schema                       7 days      14/09/2013    21/09/2013
3.3 Graphical User Interface              8 days      18/09/2013    29/09/2013

Table 3.1.2: VII Semester Time Line




VIII Semester Timeline:

Name                                 Duration    Start         Finish
4. Coding / Implementation           60 days     25/01/2014    25/03/2014
4.1 Database Creation                3 days      25/01/2014    28/01/2014
4.2 Software Development             12 days     28/01/2014    10/02/2014
4.3 Database Integration             4 days      10/02/2014    14/02/2014
4.4 Coding and Implementation        20 days     14/02/2014    6/03/2014
4.5 Integration                      6 days      6/03/2014     12/03/2014
4.6 Implementation of Application    20 days     12/03/2014    25/03/2014
5. Verification and Testing          30 days     25/03/2014    25/04/2014
5.1 Unit Testing                     5 days      25/03/2014    30/03/2014
5.2 Stress Testing                   5 days      30/03/2014    05/04/2014
5.3 Alpha/Beta Testing               6 days      5/04/2014     11/04/2014
5.4 Acceptance Testing               5 days      11/04/2014    16/04/2014
5.5 Performance Testing              5 days      16/04/2014    21/04/2014
5.6 Modification                     4 days      21/04/2014    25/04/2014

Table 3.1.3: VIII Semester Time Line











3.2 Project Planning
The goal of our system is to evaluate, predict and improve students' performance. A student can
also monitor his progress. The system will also help students as well as companies to improve
their placement process.
The key stakeholders of this system are:
1. Students
2. Faculty
3. Administration
4. Development Team
Student:
The reports can be mailed to the student so that he/she can analyze and improve
his/her performance.
Faculty:
Faculty will use the system the most. Faculty will be able to view a student's weaknesses and
strengths and help the student improve accordingly.
Administrator:
The administrator will be responsible for maintaining the system. The administrator will update
students' records, report errors made by the faculty and forward them to the college, and
add, delete or edit student information.
Development Team:
The development team will be responsible for checking the system for bugs. The
development team will also report system-critical errors, provide patches for the system
and add new functionality to the system.
Project Deliverables:
- Platform for interaction between students and faculty
- Customized generation of student reports for faculty
- Administrator can edit/delete/add student information



3.3 Scheduling (Timeline chart):

Fig 3.3.1: Gantt Scheduling chart created using Microsoft Visio



CHAPTER 4: RESULTS & DISCUSSION
Our application will use the One R algorithm to evaluate a student's performance. The system uses a
number of parameters to calculate a Parameter Impact Score for the student and evaluates the
student on the basis of this score. Students will be able to see the areas in which they are
currently lacking. Faculty will mail the analysis report to the student. The student can then work
on those areas to improve his overall score. For example, if a student
notices that his attendance parameter is not up to the mark, he can work on improving his
attendance. This would ultimately help him understand his weak areas and focus
more on them.














Chapter 5: Conclusion
The educational system is the backbone of the progress and development of any society. The
greater the ability of the education system to improve the performance of its students, the
better the chance of the society producing successful citizens. Keeping this in mind, it is
necessary to work constantly towards a more sophisticated education system.
Data mining is a powerful concept which uncovers hidden information in voluminous
databases. Data mining can provide many solutions for building a stronger
education system. Our project is a stepping stone towards the integration of technology and the
education system.














References:
[1] Er. Rimmy Chuchra, "Use of Data Mining Techniques for the Evaluation of Student
Performance: A Case Study", IJCSMR
[2] Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao,
"Predicting Students' Performance Using ID3 and C4.5 Classification Algorithms", IJDKP

Books:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques

Websites:
Java: www.oracle.com
Wikipedia: www.wikipedia.com
One Rule Algorithm: www.soc.napier.ac.uk/~peter/vldb/dm/node8.html
http://www.saedsayad.com/oner.htm