J. Amudhavel (info.amudhavel@gmail.com)
D. Sathian (dsathian@gmail.com)
A. Abraham (abrahamaruncse@gmail.com)
R.S. Raghav (rsraghav@outlook.com)
P. Dhavachelvan (dhavachelvan@gmail.com)
mails2karthy@gmail.com
Department of CSE, Pondicherry University, Pondicherry, India
ABSTRACT
As technology grows rapidly, with trending applications such as social networking, web analysis, bio-informatics network analysis and product analysis, a huge amount of heterogeneous data is delivered from a wide range of sources. Effective management of this huge data is interesting but faces many challenges in accuracy and processing. To process such huge and varied data efficiently, the recent and growing field of Big Data has come into play, attracting industry, academia and government alike. This paper surveys the various technologies and the different areas where big data is currently implemented with the help of a cloud environment [1] and its complete architecture [13]. It then explains the different MapReduce techniques and the framework implemented for processing such huge data. Finally, we discuss the future of big data processing in the cloud environment and the challenges [28] faced in these areas.
General Terms
Big Data Analysis, Routing mechanisms in Networking, Volume,
Bio Medical Data, Big Data Storage.
Keywords
Big data, Cloud computing, Hadoop, Security.
1. INTRODUCTION
Big data refers to data that exceeds the processing capacity of current relational database systems. Data grows rapidly in both rate and size as technology advances to satisfy people's daily needs and sustain business profits. It is also noted that, owing to vast improvements in social networking, both structured and unstructured data are in play. To gain valid information from such massive heterogeneous data, efficient processing techniques are required.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICARCSET '15, March 06-07, 2015, Unnao, India
Copyright 2015 ACM 978-1-4503-3441-9/15/03 $15.00
http://dx.doi.org/10.1145/2743065.2743097
Big data, the hot IT buzzword of 2012, has become viable as cost-effective approaches [23] have emerged to tame the volume, velocity and variability of massive data. Today's commodity hardware, cloud computing and open-source software [22] offer an easy way to process such massive data very efficiently: organizations like Amazon and Microsoft rent out their services at such a low cost that even small garage startups can afford online storage and processing of data.
The value of big data to a business falls into two categories: analytical use and the development of new products. Big data analytics [30] can reveal insights previously hidden by data too costly to process, such as peer influence among customers, revealed by analyzing shoppers' transactional, social and geographical data. Among the successful startups of the past decade, Facebook is a milestone and one of the best examples of big data: it enables many users to share a variety of heavily loaded data, making services like Facebook a massive hit among online services. The main attractions of big data and the cloud environment are data security [24] and scalability as the data grows day by day.
As a catch-all term, big data can be pretty nebulous, in the same
way that the term cloud covers diverse technologies. Input data to
big data systems could be chatter from social networks, web server
logs, traffic flow sensors, satellite imagery, broadcast audio streams,
banking transactions, MP3s of rock music, the content of web pages,
scans of government documents, GPS trails, telemetry from
automobiles, financial market data, the list goes on.
1.1 Motivation
In recent times the term Big Data has boomed in almost all areas, and researchers are moving towards the field to understand what big data is and its importance in the coming years. The importance of big data resides in its three Vs: velocity, variety and volume. Volume refers to the amount of data, variety refers to the number of types of data and velocity refers to the speed of data processing. Data comes in two formats, structured and unstructured; most unstructured data is derived from data analysis, web analysis and social networking services such as Facebook, Twitter and Gmail. Thus, as a huge amount of data is processed daily in every field and its volume keeps increasing, big data is considered the right tool to process all such data more efficiently than current database systems. Big data can be implemented through cloud computing [8] environments, as the data then resides in a secure manner. Popular frameworks such as Apache Hadoop support such processing.
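As a minimal illustration of the MapReduce model that Hadoop implements, the sketch below simulates the map, shuffle/sort and reduce phases of the classic word count in pure Python; it is a conceptual sketch, not the Hadoop API itself:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in records:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle/sort: group pairs by key; Reduce: sum the counts per word.
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

lines = ["big data in the cloud", "big data processing"]
counts = dict(reduce_phase(map_phase(lines)))
print(counts["big"])  # "big" occurs once in each input line, so 2
```

In a real Hadoop job the map and reduce functions run on many nodes in parallel and the shuffle moves intermediate pairs across the cluster; the data flow, however, is exactly the one sketched here.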
1.2 Organization
This paper is organized as follows: Section I gives the introduction, and Section II covers the related work based on a survey of big data and its applications.
1.3 Contributions
The main advantage of using big data in the cloud is the efficient handling of large data sets. The spatiotemporal-compression-based approach processes big data, and in particular the big graph, in the cloud using three main processes: LEACH, spatiotemporal compression and data-driven scheduling. The cloud-based framework for managing big medical data [3] copes with the arrival of large data sets and provides cloud-based self-caring services through which patients can identify their own health problems and check their condition directly whenever needed. The hybrid approach [4] for scalable anonymization [7] combines two algorithms, TDS and BUG, and produces efficient anonymization of data even as the amount of incoming data grows. Finally, an adaptive algorithm monitors big data applications to analyze performance, workload, capacity planning and fault detection, yielding scalable and reliable real-time big data monitoring.
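The TDS and BUG algorithms themselves are beyond a short sketch, but the attribute-generalization step that scalable anonymization relies on can be illustrated. The age bands and function below are illustrative assumptions, not taken from [4] or [7]:

```python
# Generalize a quasi-identifier (here, age) up a fixed hierarchy, the core
# operation of top-down specialization (TDS) and bottom-up generalization
# (BUG) for k-anonymity. The hierarchy levels are a made-up example.

def generalize_age(age, level):
    if level == 0:
        return str(age)            # most specific: exact age
    if level == 1:
        lo = (age // 10) * 10      # 10-year band, e.g. 23 -> "20-29"
        return f"{lo}-{lo + 9}"
    return "*"                     # fully suppressed

records = [23, 27, 31, 36, 38]
print([generalize_age(a, 1) for a in records])
```

TDS starts at the most general level and specializes while privacy holds; BUG starts at the most specific level and generalizes until the anonymity requirement is met — both walk a hierarchy like the one above.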
2. RELATED WORKS
2.1 A spatiotemporal compression based approach
Chi Yang, Xuyun Zhang, Changmin Zhong, Chang Liu and Jian Pei [1] proposed a new system for storing big graph data in the cloud and for analyzing and processing that data. Here the big data is compressed in size using its spatiotemporal features. The compressed big graph data is then grouped into clusters and the workload is distributed [2] across the edges among them to achieve significant performance [12]. Grouping the big graph data makes it easier to access and process. The approach includes three main processes. 1) LEACH (Low-Energy Adaptive Clustering Hierarchy) is a TDMA-based MAC and clustering routing protocol for Wireless Sensor Networks (WSNs), used here to compress the big data or big graph data. As it compresses the big data, the memory storage [17] required in the cloud is greatly reduced. 2) Spatiotemporal compression drives the clustering process, based on spatiotemporal data correlation, which computes the similarities over time using regression. To identify the similarities in the data, temporal prediction models are developed; the clustering is done by exploiting how the data changes, and data with multiple attributes can also be compressed. 3) Data-driven scheduling: to map all the data, two mapping techniques are introduced, node-based mapping and edge-based mapping. Node-based mapping produces an unfair distribution of workload, and edge-based mapping is not suitable for data-exchange clusters, so data-driven scheduling is the technique of choice for data exchange and workload distribution, enabling the workload to be spread more evenly on the cloud platform.
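The LEACH cluster-head election mentioned above can be sketched with its standard rotation threshold, under which each node that has not recently served as cluster head volunteers with probability T(n) = p / (1 - p * (r mod 1/p)) in round r. The node count and probability p below are illustrative assumptions, not values from [1]:

```python
import random

def leach_threshold(p, r):
    # LEACH election threshold T(n) for a node that has not served as
    # cluster head in the last 1/p rounds:
    #   T(n) = p / (1 - p * (r mod 1/p))
    # T(n) rises as the round advances, so every node eventually serves.
    return p / (1 - p * (r % int(1 / p)))

def elect_cluster_heads(node_ids, p, r, rng):
    # Each eligible node independently draws a random number and becomes
    # a cluster head for this round if the draw falls below T(n).
    t = leach_threshold(p, r)
    return [n for n in node_ids if rng.random() < t]

rng = random.Random(42)
heads = elect_cluster_heads(range(100), p=0.05, r=0, rng=rng)
print(len(heads))  # on average about p * 100 nodes per round
```

Non-head nodes then join the nearest cluster head and transmit in TDMA slots, which is what lets the scheme aggregate (and here, compress) data near its source.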
4. CONCLUSION
In this work we presented a survey on the importance of big data and its applications in the cloud environment using the MapReduce concept from the Hadoop framework. This paper concludes that with MapReduce the data flow can be processed more efficiently than with currently available systems, so that user trust in the cloud and big data grows as data security becomes more reliable than in other environments. We mainly discussed the usage of big data in the cloud environment and the optimization of the MapReduce concept. As technology improves, data sets will grow and the data will become more complex [27] unless it is processed with big data techniques. It is also expected that big data will lead to a bright future for data management and security.
5. REFERENCES
[1] Chi Yang, Xuyun Zhang, Changmin Zhong, Chang Liu, Jian Pei, Kotagiri Ramamohanarao, Jinjun Chen, "A spatiotemporal compression based approach for efficient big data processing on Cloud," Journal of Computer and System Sciences, Volume 80, Issue 8, December 2014, Pages 1563-1583, ISSN 0022-0000.
[2] Amudhavel, J., Vengattaraman, T., Basha, M.S.S., Dhavachelvan, P., "Effective Maintenance of Replica in Distributed Network Environment Using DST," International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom) 2010, pp. 252-254, 16-17 Oct. 2010, doi: 10.1109/ARTCom.2010.97.
[3] Wenmin Lin, Wanchun Dou, Zuojian Zhou, Chang Liu, "A cloud-based framework for Home-diagnosis service over big medical data," Journal of Systems and Software, Volume 102, April 2015, Pages 192-206, ISSN 0164-1212.
[4] Raju, R., Amudhavel, J., Pavithra, M., Anuja, S., Abinaya, B., "A heuristic fault tolerant MapReduce framework for minimizing …: A detailed survey," International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) 2014, pp. 1-3, 6-8 March 2014, doi: 10.1109/ICGCCEE.2014.6922432.