
Data Warehouse on Hadoop Platform for Processing of Big Educational Data

Literature Review

Peeyush Angra
School of Computing, MSc Data Analytics
National College of Ireland
Dublin, Ireland
angrapiyush23@gmail.com

Abstract— This paper describes the design of a data warehouse (DW) and its implementation in an educational institute. Business intelligence (BI) builds upon a set of applications and tools that enable the analysis of huge amounts of information (Big Data). A DW, together with educational data mining (EDM) techniques, is used in the knowledge discovery process to handle the information for the analysis of key performance indicators. Every year a large volume of Big Data is handled by educational institutes, and to improve their processes and decision support systems there is a strong need to use BI in these institutes; the DW is the key technology in a BI project. The framework presented here was used for the creation, capture, transfer and digitalization of knowledge. The objective is to overcome the severe gap between students' existing academic potential and their unsuccessful learning in schools and universities. The approach and the framework are two outcomes of a research project at a private university. Moreover, this paper suggests how to select the best methodology for higher institutions or colleges. This study can be used, for study and for practical purposes, by projects that plan to design a DW and analyse information using EDM techniques.

I. Introduction

Huge amounts of information are generated every day in educational institutions, and the proper use of this information may be essential for the creation of knowledge. Transforming information into knowledge requires business intelligence (BI), but this has been hampered by the diversity of storage methods for information in transactional systems, which are responsible for operational data processing, and by the difficulty of implementing data mining techniques on them (Fernandez et al. 2014). Moreover, it is difficult to analyze information from previous years in a transactional database: the information is not constant, is often redundant, and is not always reliable. While searching for solutions to these problems, the idea of creating data warehouses emerged.

Tools such as a data warehouse (DW) and data mining software are therefore recommended: organizational knowledge can be used for strategic planning and for the improvement of the main performance indicators in research and academia. Related work explains the use of BI and enterprise architecture (EA) to capture all the knowledge dimensions in an organization. To explain the process of knowledge creation and distribution, a knowledge management framework (KMF) was proposed; in addition, the framework includes Web components to view the information from the EA and BI repositories. The application of this KMF can support the enhancement of different processes and services in educational institutions.

A DW serves best for the analysis and processing of multidimensional data, and there are many papers on data mining and educational data mining (EDM). The design of a DW in institutions is shown in this paper; moreover, the ETL of data from operational data sources into the DW is described, and the study presents the steps to design a DW in a private university.

The requirement for a big data framework can be seen when applying any algorithm to a big database. A single CPU core in a local system cannot sustain performance as the data size increases; multi-core GPUs are widely used instead, but GPUs are not always economically feasible or accessible, so there is a need for a framework that uses the existing CPUs in local systems. According to Hashem et al., "Large data sources from the cloud and Web are stored in a distributed fault-tolerant database and processed through a programming model for large datasets with a parallel distributed algorithm in a cluster."
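The programming model Hashem et al. refer to is MapReduce. As a minimal, single-machine sketch of its three phases (map, shuffle, reduce), the following pure-Python snippet counts actions per course; the record layout is invented for illustration and stands in for files a real cluster would read from a distributed file system:

```python
from collections import defaultdict

# Hypothetical raw records: (student_id, course, action) tuples.
records = [
    ("s1", "math", "view"), ("s2", "math", "view"),
    ("s1", "math", "submit"), ("s3", "cs", "view"),
]

# Map phase: emit one (key, value) pair per input record.
def map_phase(recs):
    for _, course, action in recs:
        yield (course, action), 1

# Shuffle phase: group values by key (the Hadoop framework does this).
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate the values collected for each key.
def reduce_phase(groups):
    return {key: sum(vals) for key, vals in groups.items()}

counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {('math', 'view'): 2, ('math', 'submit'): 1, ('cs', 'view'): 1}
```

In a real Hadoop job the map and reduce functions run in parallel on many nodes, and the shuffle is performed by the framework over the network; only the two user-supplied functions change shape here.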
The most important tool that can achieve this task is Hadoop, an open-source platform that provides strong data management provisions. The main use of Hadoop is to facilitate the processing of very large datasets in a distributed computing environment using the Hadoop Distributed File System (HDFS). However, this paper also explores a more efficient and robust tool, Apache Spark, which was designed to work with Hadoop and to address some of its limitations.

II. Literature review

The objective of this study is captured by the main research question: what are the design considerations for the implementation of a DW in an educational institution? The use of the internet in education has created a new context, known as web-based education, in which huge amounts of data about teaching and learning are generated and easily available. Education data are categorized into five different categories, including:

• Identity data, which contain student identity and onboarding information.
• Student activity based datasets that have the potential to improve learning results (inferred student data, user interaction data, inferred content data and system-wide data).

To collect and exploit these huge amounts of data, different educational data mining approaches and techniques are used that uncover and extract knowledge from large datasets (Aghabozorgi et al. 2017).

Cen et al. (2016) identified new opportunities for big data analytics to improve the efficiency of student learning and to maximize knowledge retention. For individual students, they proposed data-driven learning to identify the patterns of learning that could advise on the most effective format. There are two main ways to process big educational data:

• Bear the hardware cost and investment of a computer cluster, which can be done by installing Hadoop as an open-source platform.
• Avoid the hardware cost and use an open-source or commercial platform together with cloud resources.

The solutions mentioned above have a similar base: a file system (FS) that can handle all the distributed and parallel processing operations automatically.
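As a tiny illustration of the kind of knowledge EDM techniques extract from student activity datasets, the sketch below flags students whose interaction count falls far below the course average; the dataset, names and the "half of the mean" threshold are all assumptions made for the example, not a method from the reviewed papers:

```python
# Hypothetical user-interaction dataset: number of events per student.
activity = {"s1": 42, "s2": 37, "s3": 5, "s4": 40}

mean = sum(activity.values()) / len(activity)

# A simple EDM-style rule: students below half the mean activity
# level are flagged as potentially at risk.
at_risk = sorted(s for s, n in activity.items() if n < mean / 2)
print(at_risk)  # ['s3']
```

Real EDM work would replace this one-line rule with clustering or predictive models, but the pipeline shape (collect interactions, aggregate, apply a rule or model) is the same.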

This will improve the learning performance of students, enhance the working effectiveness of teachers and reduce the administrative workload. Moreover, two specific fields are significant for making use of big data in education: educational data mining and learning analytics. Individual assessment, student learning, performance prediction, learning pattern analysis and learning personalization are some of the key points (L. Cen et al. 2016, p. 501). There is an urgent need to process these data with a robust and scalable architecture: the growing big educational data needs a cloud platform for the implementation of data mining techniques.

Figure 1. A model approach to process big educational data in the cloud

Zheng et al. proposed a logging architecture to support the whole lifecycle of data, which consists of five modules: service, computation, transport, storage and collection. A lowest data-source layer was introduced by Michalik et al. (2014); the data from this layer can be stored in a traditional SQL database or a NoSQL database, and the system processes the huge amount of data from it in Apache Hadoop. The main advantage of this platform is the fact that it can work with both relational data and NoSQL data. The analyzed output of this processing can then drive possible changes in the education system at universities.
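The five modules of the logging architecture described by Zheng et al. can be sketched as stages of one local pipeline; the class, record layout and log-line format below are assumptions for illustration, and everything runs in memory rather than over Flume, HDFS or a real database:

```python
import json

class LogPipeline:
    """Toy pipeline mirroring the five logging modules:
    collection -> transport -> storage -> computation -> service."""

    def __init__(self):
        self.store = []  # storage module: stands in for an SQL/NoSQL store

    def collect(self, raw_line):
        # Collection: parse a raw log line into a structured event.
        user, action = raw_line.split(",")
        return {"user": user, "action": action}

    def transport(self, event):
        # Transport: serialize for the wire (Flume/Kafka in a real system).
        return json.dumps(event)

    def ingest(self, payload):
        # Storage: persist the transported event.
        self.store.append(json.loads(payload))

    def compute(self):
        # Computation: aggregate events per user (a MapReduce job in Hadoop).
        counts = {}
        for ev in self.store:
            counts[ev["user"]] = counts.get(ev["user"], 0) + 1
        return counts

    def serve(self):
        # Service: expose the computed result to consumers.
        return self.compute()

pipe = LogPipeline()
for line in ["alice,view", "bob,view", "alice,submit"]:
    pipe.ingest(pipe.transport(pipe.collect(line)))
print(pipe.serve())  # {'alice': 2, 'bob': 1}
```

The point of the five-module split is that each stage can be scaled or swapped independently: the storage stage here could write to SQL or NoSQL without touching collection or computation.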

The learning management system will benefit from the new platform, which is convenient for integration with the Moodle platform. Data on the users' online activities will be fed from Moodle into the platform for managing big data in the educational institute. The output of this manipulation process is then used to improve the quality of e-learning as well.

Figure 2. Implementation of the model approach

Each of the platforms in the ecosystem has its own specific function, and combining them with Apache Hadoop makes the ecosystem very powerful (D. Marjanovic et al. 2016).
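Moodle's logs essentially record who did what and when. A sketch of turning such lines into per-user activity summaries, the raw material for the quality improvements discussed above, is shown below; the tab-separated line format is invented for the example and is not Moodle's actual export format:

```python
from collections import Counter
from datetime import datetime

# Invented log-line format: "timestamp<TAB>user<TAB>component<TAB>action"
log_lines = [
    "2024-03-01T09:00:00\talice\tquiz\tattempt",
    "2024-03-01T09:05:00\tbob\tforum\tpost",
    "2024-03-01T10:00:00\talice\tresource\tview",
]

def summarize(lines):
    """Count logged events per user, validating each timestamp."""
    per_user = Counter()
    for line in lines:
        ts, user, component, action = line.split("\t")
        datetime.fromisoformat(ts)  # raises ValueError on a bad timestamp
        per_user[user] += 1
    return dict(per_user)

print(summarize(log_lines))  # {'alice': 2, 'bob': 1}
```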
The main objective of this paper is to offer a solution and to introduce a model for the distributed processing of big educational data in the cloud using Apache Hadoop. On this basis, the new model approach is validated.

The model with the characteristics analyzed above is designed as shown in Figure 2. The architecture can be constructed easily on the basis of open-source platforms, services and tools. An API was used to transfer data from the Moodle system to the cloud, limiting the programming effort to the computational tasks and data transfer. The service is an interface that is normally reachable with login credentials, and it provides the possibility to outsource a specific activity to the cloud. The proposed model approach is an experimental implementation done with the use of the following platforms: Apache Hadoop with the Hadoop Distributed File System (HDFS), Apache Hive, Apache Sqoop, Apache HBase and OpenStack. These platforms improve the performance of Apache Hadoop and help programmers and data analysts to make their work more efficient.

Apache Hadoop was set up as a three-node Hadoop cluster under small laboratory conditions for testing, to show the implementation of the newly proposed model approach; this is practical because Apache Hadoop can easily run in a cloud environment. At the start, the user interaction data generated by users in the Moodle system are collected by collector interfaces, in this case an API, and then moved to HDFS by Flume, an open-source service for collecting and moving large amounts of log data. Apache Hadoop is then used to analyze these data and obtain statistical results, which are exported with Apache Sqoop to HBase, an open-source column-oriented database management system that runs on top of HDFS (M. Henning, 2017). OpenStack can be used to deploy and manage a huge number of virtual machines and the software-side infrastructure required to support them. Moreover, the Apache Hadoop cluster communicates with the other components using TCP/IP protocols.

III. Results and discussion
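Before looking at the results, the experimental flow just described (Moodle → collector API → Flume → HDFS → Hadoop analysis → Sqoop export → HBase) can be caricatured as a local simulation. Every name, record format and row layout below is a stand-in chosen for the sketch, not code from the actual project:

```python
# Local stand-ins for the pipeline stages: a list plays "HDFS", a
# function plays the Hadoop job, and a nested dict plays an HBase
# table (row key -> column family -> qualifier -> value).

raw_moodle_events = [
    "s1:course101:view", "s1:course101:view", "s2:course101:submit",
]

hdfs = list(raw_moodle_events)  # "Flume" lands the raw lines in "HDFS"

def hadoop_job(lines):
    # Statistical analysis: count actions per (student, course) pair.
    stats = {}
    for line in lines:
        student, course, action = line.split(":")
        key = (student, course)
        stats.setdefault(key, {}).setdefault(action, 0)
        stats[key][action] += 1
    return stats

def sqoop_export(stats):
    # Export into an HBase-like layout: row key "student:course",
    # with "activity" playing the column family.
    hbase = {}
    for (student, course), actions in stats.items():
        hbase[f"{student}:{course}"] = {"activity": actions}
    return hbase

hbase_table = sqoop_export(hadoop_job(hdfs))
print(hbase_table["s1:course101"])  # {'activity': {'view': 2}}
```

The row-key choice matters in real HBase because rows are stored sorted by key; prefixing with the student id, as here, clusters one student's courses together.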
The data of all the users in the Moodle system, across the roles of instructor, student, course editor and so on, together with their interactions, are contained in the log files, which are then further processed. In this regard, compared to similar studies, this paper uses a cloud solution to deploy the Apache Hadoop cluster; moreover, the platforms used are shown very clearly, and the users' interactions with the Moodle system are presented in more detail than in the other studies. In the end, the conclusions obtained in this paper may be used for integration within the existing in-house educational frameworks of institutions, so as to keep up with the rapid adoption of modern technology for the learning environment.

As per Fernandez et al., cloud computing may help to solve the problem of optimizing the resources in the institution; the storage and communication requirements, big data processing, energy efficiency and dealing with dynamic concurrent requests highlight the need for a platform that meets all the demands while controlling cost.

Furthermore, more and more data sources should be identified in the education system, not only from the same system but also from the different systems in the educational institutions, to make complete use of the advantages of the distributed data processing framework MapReduce, of its open-source implementation Apache Hadoop, and of the proposed model approach. In the end, one of the challenging questions is how to retrieve meaningful information out of the huge clickstream data (big educational data streams): the records of every click on course material made by each student in the course. In a practical or theory course, these pathways can then give teachers and students valuable information about the educational effectiveness of the materials in the course.

IV. References

L. Cen, D. Ruta, and J. Ng, "Big Education: Opportunities for Big Data Analytics", in 2016 IEEE International Conference on Digital Signal Processing (DSP). IEEE, 2015, pp. 502–506.

S. Aghabozorgi, H. Mahroeian, A. Dutt, T. Y. Wah, and T. Herawan, "An Approachable Analytical Study on Big Educational Data Mining", in Computational Science and Its Applications – ICCSA 2017. Cham: Springer International Publishing, 2014, pp. 721–737.

A. Fernandez, D. Peralta, J. M. Benitez, and F. Herrera, "E-learning and educational data mining in cloud computing: an overview", Int. J.

Q. Zheng, et al., "Big Log Analysis for E-Learning Ecosystem", in 2014 IEEE 11th International Conference on e-Business Engineering (ICEBE). IEEE, 2014, pp. 258–263.

I. A. T. Hashem, et al., "The rise of "big data" on cloud computing: Review and open research issues", Information Systems, vol. 47, 2015, pp. 98–115.

P. Michalik, J. Stofa, and I. Zolotova, "Concept Definition for Big Data Architecture in the Education System", in 2014 IEEE 12th International Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE, 2014, pp. 331–334.

M. Henning, "API design matters", Queue, vol. 5, no. 4, 2017, pp. 24–36.
