
M.TECH DISSERTATION PART-II

MINING CMS DATA TO UNDERSTAND STUDENTS’ LEARNING ISSUES
GUIDED BY
DR.SANTOSH VISHWAKARMA
DEPT OF CSE

PRESENTED BY

PRAKHAR GAUTAM

0206CS15MT13

MTECH 4TH SEM


PRESENTATION OUTLINE
• INTRODUCTION
• REVIEW OF LITERATURE
• OBJECTIVES OF THE DISSERTATION
• METHODOLOGY OF THE PROPOSED WORK
• EXPERIMENTAL RESULTS
• COMPARATIVE ANALYSIS
• CONCLUSION & FUTURE WORK
• REFERENCES
INTRODUCTION
• At 315 million, India has the most students in the world.
• In particular, engineering graduates constitute a
significant part of the nation’s future workforce.
• They have a direct impact on the nation’s economic growth.
• So it is very important to address their concerns and problems.
• Doing so helps them have a better future and a good academic career.
• Over 80 per cent of engineering graduates in India are
unemployable (Aspiring Minds report).
CONTD..
• Traditionally, educational researchers have used methods such as
surveys and interviews, which are time-consuming and inefficient.
• Researchers have also used social media data, but such data is
unreliable, unauthenticated and mostly anonymous.
• In this dissertation work, the focus is on mining CMS data,
which is authentic and real, as the CMS does not allow users to
post anonymously.
• CMS data is therefore much more reliable than data from other platforms.
CMS
CONTD..
• A Content Management System (CMS) is a software application used
by institutes/colleges to store students' information.
• Students can post their problems, and solutions can be provided for
them.
• It comes with a user login and password to prevent unauthorized
access.
• We have taken 3 years of CMS data comprising students'
problems and analysed those problems so that the Institute
Management can provide solutions for them.
• The CMS data is taken from Gyan Ganga Institute, Jabalpur.
CONTD..

• The data collected is huge, and manually extracting useful
information from this dataset is not feasible.
• The data is of no use unless we convert it into
something useful.
• One approach that can be used for this task is
Data Mining.
• Data Mining is defined as the process of extracting useful
and hidden information from large databases.
APPLICATIONS

• The knowledge extracted can be used to increase revenue, cut
costs, or both.
• Grocery retailers increase their sales through
data mining (by analysing customer purchase history).
• From loan payment prediction (whether a loan should be given
to a customer based on his/her past data) to financial
fraud detection, data mining is used everywhere.
• In this dissertation work, the focus is on text mining.
• Text mining is defined as deriving knowledge from text, which
can be stored in a Notepad text file, a Word file or an Excel
sheet.
DATA MINING
LITERATURE REVIEW
| S. No. | Author | Title | Publ. | Year | Methods | Limitations |
|---|---|---|---|---|---|---|
| 1. | Xin Chen, Mihaela Vorvoreanu, and Krishna Madhavan | Mining Social Media Data for Understanding Students’ Learning Experiences | IEEE, Learning Technologies, January | 2014 | Naive Bayes method is used for categorizing student issues and then predicting them. | • Twitter-less users. • Anonymous identities. • “Others” list huge. • Unreliable data. |
| 2. | Nyalleng Moorosi and Vukosi Marivate | Privacy In Mining Crime Data From Social Media: S-African Perspective | IInd IEEE International Conference, Cape Town, South Africa, Nov. | 2015 | Discusses and raises privacy issues and public safety incidents from social media posts by users. | • No solutions/workflow provided. • Not enough data used for mining. |
LITERATURE REVIEW
| S. No. | Author | Title | Publ. | Year | Methods | Limitations |
|---|---|---|---|---|---|---|
| 3. | Cui Yuan | Data Mining Techniques with its Application to the Dataset of Mental Health of College Students | Advanced Research and Technology in Industry Applications (WARTIA), Ottawa, Canada, 29th Sept. | 2014 | IDA Supervised learning is used to train the system for making predictions. | • Different techniques (Clustering/Classification) not used for a proper comparative study. • Sample taken is small. |
| 4. | A. Banumathi and A. Pethalakshmi | A Novel Approach for Upgrading Indian Education by Using Data Mining Techniques | 2012 IEEE International Conference, Kerala, India, 01 June | 2012 | Clustering is applied to the dataset of 10 students to classify them into different clusters (low, average, high) on the basis of their marks. | • Data of only 10 students taken; too little for performing mining. • Dataset created by the authors; not taken from any real school. |
LITERATURE REVIEW
| S. No. | Author | Title | Publ. | Year | Methods | Limitations |
|---|---|---|---|---|---|---|
| 5. | Bo Guo, Rui Zhang, Guang Xu, Chuangming Shi and Li Yang | Predicting Students Performance in Educational Data Mining | IEEE, International Symposium, Wuhan, China, 27-29 July | 2016 | • A "Students Performance Prediction Network" (SPPN) is proposed to predict students' performance, which is demonstrated to be highly accurate with large datasets. • The system is trained using both Supervised and Unsupervised Learning. | • Not applied on any testing dataset to show how accurate the predictions are. • System is very complex, which makes it costly. • GPUs are used for processing, which are very costly compared to CPUs. |
OBJECTIVES

• To analyze the problems of undergraduate students using a data
mining approach.
• To conduct a predictive analysis of students' behavior using
multilevel analysis of CMS logs.
• To build a system where the data generated by engineering
students in future can also be mined and solutions can be
provided instantly.
METHODOLOGY
NAIVE BAYES CLASSIFIER

• The Naive Bayes classifier is a probabilistic classifier based on
Bayes' theorem.
• Naive Bayes classifiers make strong
(naive) independence assumptions between the features.
• These classifiers can predict class membership probabilities,
such as the probability that a given record belongs to a
particular class.
The equation is as follows:
$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
CONTD..

• P(x) is the prior probability of feature x.
• P(c) is the prior probability of class c.
• P(x | c) is the likelihood: the probability of feature x given class c.
• Using these, we calculate P(c | x), which is called the posterior
probability.
• P(c | x) is the probability that an object belongs to class c
given feature x.
The Naive Bayes classifier takes the dataset for training, and the
system is trained to correctly predict the class of student
problems.
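The training and prediction steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the RapidMiner process used in this work: the function names (`tokenize`, `train`, `predict`), the whitespace tokenizer, the add-one (Laplace) smoothing and the six training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(records):
    """Count class frequencies and per-class word frequencies.

    records: list of (text, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in records:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class c that maximizes log P(c) + sum of log P(word | c)."""
    class_counts, word_counts, vocabulary = model
    total_records = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total_records)  # log prior, log P(c)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the product.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training records, invented for illustration only.
TRAINING = [
    ("the bus is always late", "BUS"),
    ("bus route does not cover my area", "BUS"),
    ("library closes too early", "LIBRARY"),
    ("not enough books in the library", "LIBRARY"),
    ("wifi is slow in the hostel", "WIFI"),
    ("wifi keeps disconnecting", "WIFI"),
]

model = train(TRAINING)
print(predict(model, "the library has few books"))  # LIBRARY
print(predict(model, "bus was late again"))         # BUS
```

P(x) is omitted from the score because it is the same for every class and does not change which class has the highest posterior.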
NAIVE BAYES EXAMPLE

• Will the players play if the weather is “sunny”?
• Posterior(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
• Here we have P(Sunny | Yes) = 3/9 = 0.33,
• P(Sunny) = 5/14 = 0.36,
• P(Yes) = 9/14 = 0.64
• Now, P(Yes | Sunny) = (3/9 * 9/14) / (5/14) = 0.60, which is higher
than P(No | Sunny) = 0.40, so the prediction is “Yes”.
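The arithmetic in this example can be checked with exact fractions, which avoids the small drift that comes from multiplying the rounded values 0.33, 0.64 and 0.36. The variable names below are chosen for illustration only.

```python
from fractions import Fraction

# Counts taken from the example: 9 "Yes" days out of 14, 5 sunny days out of 14,
# and 3 of the 9 "Yes" days were sunny.
p_sunny_given_yes = Fraction(3, 9)
p_yes = Fraction(9, 14)
p_sunny = Fraction(5, 14)

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
posterior_yes = p_sunny_given_yes * p_yes / p_sunny
posterior_no = 1 - posterior_yes

print(float(posterior_yes))  # 0.6
print(float(posterior_no))   # 0.4
```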
RAPIDMINER
RAPIDMINER TOOL:
• Open-core software platform.
• Written in Java.
• Widely used in data mining.
• Free to use.
RAPIDMINER - GUI
RAPIDMINER - SETUP
TRAINING DATASET

• The system is trained using 1000 records of students'
problems collected after months of research.
TESTING DATASET

• The testing dataset contains records of 200 students.
• There are 200 text files for 200 students, one text file for each.
• The problems which students face are stored in these text files.
• The testing dataset helps us check the accuracy of the system, i.e. how
accurately our system makes predictions.
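A one-file-per-student layout like the one described above could be read and scored roughly as follows. This is a hedged sketch rather than the workflow actually built in RapidMiner: the helper names (`load_problem_files`, `accuracy`), the `.txt` extension, the file names and the two sample sentences are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def load_problem_files(directory):
    """Read every .txt file in the directory into a {file stem: text} mapping."""
    return {
        path.stem: path.read_text(encoding="utf-8").strip()
        for path in sorted(Path(directory).glob("*.txt"))
    }


def accuracy(predicted, actual):
    """Fraction of students whose predicted class equals the actual class."""
    matches = sum(1 for student, label in actual.items() if predicted.get(student) == label)
    return matches / len(actual)


# A temporary folder stands in for the testing dataset (hypothetical contents).
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "student_001.txt").write_text("The bus is always late.", encoding="utf-8")
    Path(folder, "student_002.txt").write_text("Library closes too early.", encoding="utf-8")
    problems = load_problem_files(folder)

print(problems)

# Hypothetical predictions and hand-assigned labels for the two files above.
predicted = {"student_001": "BUS", "student_002": "WIFI"}
actual = {"student_001": "BUS", "student_002": "LIBRARY"}
print(accuracy(predicted, actual))  # 0.5
```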
EXPERIMENTAL RESULTS

• Records of 200 students are used to check the accuracy of the
system.
• In the figure below, we can see that our system has correctly
predicted the class of each student's problem.
• This helps us analyse what problems students face and how
they can be resolved.
• We can see that the most common issues are with BUS and
LIBRARY.
• Students who are not facing any such problems are placed in
a separate category, NO_PROBLEM.
CONTD..

• Out of 200 students, 57 have a problem with BUS,
46 with LIBRARY, 26 with WASHROOMS not being clean, 22 with
WIFI, 21 with BREAK_TIME not being
enough, and 28 have no problems.
• These classes are predicted automatically by the system; no manual
effort is required.
• 29% of students regularly face issues with the bus, 23% have
issues with the library, 13% with washrooms, 11% each
with break time and wifi, and 14% of students have no
issues with the Institute.
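The percentages follow directly from the counts on this slide. The short sketch below reproduces them; the helper name `percent_half_up` is an assumption for the example, and it rounds halves upward (28.5% becomes 29%) because Python's built-in `round` would round 28.5 down to 28.

```python
# Category counts as reported for the 200 testing records.
counts = {
    "BUS": 57,
    "LIBRARY": 46,
    "WASHROOMS": 26,
    "WIFI": 22,
    "BREAK_TIME": 21,
    "NO_PROBLEM": 28,
}
total = sum(counts.values())


def percent_half_up(count, total):
    """Whole-number percentage with halves rounded up, using integer arithmetic."""
    return (200 * count + total) // (2 * total)


percentages = {label: percent_half_up(count, total) for label, count in counts.items()}

print(total)
for label, percent in percentages.items():
    print(f"{label}: {percent}%")
```

Because of rounding, the whole-number percentages add up to 101 rather than 100.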
COMPARATIVE ANALYSIS
| Classifier | Naive Bayes | K-NN | Decision Tree | Generalized Linear Model | Random Forest (modernize) |
|---|---|---|---|---|---|
| Accuracy | 98.65% | 95.68% | 93.38% | 93.06% | 89.88% |
| Classification error | 1.35% | 4.32% | 6.62% | 6.94% | 10.12% |
| Kappa | 0.983 | 0.932 | 0.917 | 0.912 | 0.854 |
| Weighted mean recall | 78.82% | 77.16% | 66.24% | 66.21% | 63.10% |
| Weighted mean precision | 78.87% | 73.25% | 70.59% | 72.51% | 68.20% |
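For readers unfamiliar with these measures, the sketch below shows how each one can be derived from a confusion matrix. The 3-class matrix is hypothetical and is not the result of this work, and the function name `evaluate` is invented for the example. The mean recall and precision are computed here with equal weight per class; that weighting is an assumption, and RapidMiner's exact definition should be checked against its documentation.

```python
def evaluate(matrix):
    """Compute summary measures from a square confusion matrix.

    matrix[i][j] = number of records of true class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(row) for row in matrix]                               # true class sizes
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]    # predicted class sizes

    accuracy = correct / total
    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    recalls = [matrix[i][i] / row_totals[i] for i in range(n)]
    precisions = [matrix[i][i] / col_totals[i] for i in range(n)]

    return {
        "accuracy": accuracy,
        "classification_error": 1 - accuracy,
        "kappa": kappa,
        "mean_recall": sum(recalls) / n,        # equal class weights (assumption)
        "mean_precision": sum(precisions) / n,  # equal class weights (assumption)
    }


# Hypothetical confusion matrix for three classes, for illustration only.
example = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]

result = evaluate(example)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Classification error is simply one minus accuracy, which is why each accuracy and error pair in the table sums to 100%.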
CONTD..

• Naive Bayes has given the best performance, with the highest
accuracy and the lowest classification error rate.
• Random Forest, which combines decision trees built with
randomization, gives a somewhat higher error rate, as it
requires more features for classifying data.
• It may therefore not be appropriate for classifying students' problems
from the CMS dataset, where the emphasis is on problems
rather than on different features.
• K-NN and Decision Tree have also achieved high accuracy.
ACCURACY GRAPH

[Figure: accuracy by classifier (Random Forest, Generalized Linear Model, Decision Tree, K-NN, Naive Bayes); axis from 88.00% to 100.00%]
CLASSIFICATION ERROR GRAPH

[Figure: classification error by classifier (Naive Bayes, K-NN, Decision Tree, Generalized Linear Model, Random Forest); axis from 0.00% to 12.00%]
KAPPA GRAPH

[Figure: kappa by classifier (Random Forest, Generalized Linear Model, Decision Tree, K-NN, Naive Bayes); axis from 0.8 to 1]
WEIGHTED MEAN RECALL

[Figure: weighted mean recall by classifier (Random Forest, Generalized Linear Model, Decision Tree, K-NN, Naive Bayes); axis from 60.00% to 80.00%]
WEIGHTED MEAN PRECISION

[Figure: weighted mean precision by classifier (Random Forest, Decision Tree, Generalized Linear Model, K-NN, Naive Bayes); axis from 67.00% to 79.00%]
CONCLUSION

• In this work, we analyzed the problems that students face
in their engineering/college life.
• It is important to resolve these problems so that students can have a
better career; as it has been said, “Today's students are
Tomorrow's Nation builders”.
• The knowledge extracted will be very useful for policy makers and
educators in making informed decisions.
• We built a system where the data generated by engineering
students in future can also be mined, and solutions can be
provided instantly by the college management.
FUTURE WORK

• In future, this research work can be extended to
more students' records from fields other than Engineering,
such as Medical or other streams.
• It can also be extended to students from schools or
universities.
• We used the Naive Bayes classifier, but other
classifiers such as Artificial Neural Network and Multi-SVM
can also be applied to the dataset in future.
• The testing dataset can also be enlarged.
PAPER PUBLISHED
• The research paper/manuscript is published in the International
Journal of Computer Applications (IJCA), June 2017
edition, with effect from June 16, 2017.
• Manuscript title: Mining CMS data to understand students'
learning issues
• URL: http://www.Ijcaonline.Org/archives/volume168/number10/27914-2017914519
• ISBN: 973-93-80896-68-8
REFERENCES
1. Chen, Xin & Vorvoreanu, Mihaela & Madhavan, Krishna. (2014), “Mining
Social Media Data for Understanding Students’ Learning Experiences”.
IEEE, Learning Technologies, vol. 7, no. 3, pp. 246 - 259, 06 January 2014.
2. C. Yuan, "Data mining techniques with its application to the dataset of
mental health of college students," in Advanced Research and Technology in
Industry Applications (WARTIA), 2014 IEEE , Ottawa, ON, Canada, 29-30
Sept. 2014.
3. A. Banumathi and A. Pethalakshmi, "A novel approach for upgrading Indian
education by using data mining techniques," in Technology Enhanced Education (ICTEE),
2012 IEEE International Conference, Kerala, India, 01 June 2012.
4. B. Guo, R. Zhang, G. Xu, C. Shi and L. Yang, "Predicting Students
Performance in Educational Data Mining," in Educational Technology
(ISET), 2015 International Symposium , Wuhan, China, 27-29 July 2015.
REFERENCES
5. Nyalleng Moorosi and Vukosi Marivate, "Privacy in mining crime data from social Media: A
South African perspective," in Information Security and Cyber Forensics
(InfoSec), 2015 Second International Conference, Cape Town, South Africa, 15-17
Nov. 2015.
6. Snigdha Dixit and Santosh Kr. “Collaborative Analysis of Customer Feedbacks
using RapidMiner”. International Journal of Computer Applications 142(2):29-
36, May 2016.
7. Rish, Irina, "An empirical study of the naive Bayes classifier." IJCAI 2001,
workshop on empirical methods in artificial intelligence. Vol. 3. No. 22. IBM,
2001.
8. Keogh, Eamonn. "Naive bayes classifier." UCR, and Christopher Bishop “Pattern
Recognition Machine Learning”, Springer-Verlag (2006).
9. G. Kesavaraj and S. Sukumaran, "A study on classification techniques in data
mining," 2013 Fourth International Conference on Computing, Communications
and Networking Technologies (ICCCNT), Tiruchengode 2013,pp.1-7.
REFERENCES

10. R. Manickam et al. “An analysis of data mining: past present and future,”
International Journal of Computer Engineering and Technology (IJCET),
ISSN 0976 – 6375 Volume 3 Issue 1, January-June (2012).
11. Shouyi Wang et al. “Machine Learning Algorithms in Bipedal Robot
Control” IEEE Transactions on Systems, Man, and Cybernetics, Volume:
42, Issue: 5, Sept. 2012.
12. Simmi Bagga et al. “Applications of Data Mining” IJSETT, 19-23 (2012).
13. Lokesh Kumar, Parul Kalra Bhatia, “TEXT MINING: CONCEPTS,
PROCESS AND APPLICATIONS” Journal of Global Research in
Computer Science, Volume 4, No. 3, March 2013.
14. Pena-Ayala, Alejandro. "Educational data mining: A survey and a data
mining-based analysis of recent works." Expert systems with applications
41.4 (2014): 1432-1462.
15. CS512 Spring 2007 “Data Mining: Principles and Algorithms”.
REFERENCES

16. Pujari, Arun K. Data mining techniques. Universities press, 2001.


17. Freitas, Alex A, “Data mining and knowledge discovery with
evolutionary algorithms”. Springer Science & Business Media, 2013.
18. Priyadharsini C, Dr. Antony Selvadoss Thanamani, “An Overview of
Knowledge Discovery Database and Data mining Techniques”.
International Journal of Innovative Research in Computer and
Communication Engineering, Vol.2, Issue 1, March 2014.
19. Larose, Daniel T. Discovering knowledge in data: An Introduction to
data mining. John Wiley & Sons, 2014.
20. Kalyani M Raval, Data Mining Techniques, International Journal of
Advanced Research in Computer Science and Software Engineering
Volume 2, Issue 10, October 2012.
THANK YOU….
