Abstract
Big data analysis can reveal trends in various societal aspects and the preferences behind individuals' everyday
behavior. This provides a new opportunity to explore fundamental questions about the complex world. It is therefore
important to provide efficient techniques and tools for big data analysis. Efficient range-aggregate queries are important
tools in decision management, online suggestion, trend estimation, and so on. Existing methods for handling range-
aggregate queries are insufficient to quickly obtain accurate results in big data. In this paper, we propose effective
methods for handling range-aggregate queries in big data. The proposed system uses the Hadoop Distributed File
System (HDFS), which provides a framework for the analysis and transformation of very large data sets using the
MapReduce paradigm. The interface to the Hadoop file system is the Linux file system, which in turn improves
application performance. In the proposed system, big data is divided into independent partitions using the MapReduce
paradigm, and an estimation sketch is then generated for each partition. When a range-aggregate query request
arrives, the system obtains the result directly by summarizing the estimates from all partitions. Big data involves
a major increase in data volume, and the selected tuples may be located in different file formats, i.e., the data may be
structured, semi-structured, or unstructured. The proposed system aims to provide a fast approach to range-aggregate
queries, fetching results in the least possible time over a heterogeneous context of structured and semi-structured
files.
Keywords: Big data, MapReduce, RAQ (Range Aggregate Query), MongoDB
1. INTRODUCTION
Big data is described as a huge amount of data that requires new techniques for capture and analysis before knowledge
can be extracted from it [2]. Because of this size, it becomes very difficult to perform effective analysis using
existing conventional techniques. The properties of big data, namely volume, velocity, variety, variability, value, and
complexity, pose many challenges [4]. Social sites and media are also closely linked to big data. Services such as
Gmail, Facebook, and WhatsApp are used every day by billions of people around the world. The fundamental challenge
for big data applications is to explore large volumes of data and extract useful information or knowledge for future
actions [2].
One application of big data analysis is the distributed intrusion detection system (DIDS), which monitors and
reports anomalous activities or unusual patterns at the network level. A DIDS detects anomalies using statistics that
summarize traffic features from diverse sensors, in order to improve the false-alarm rate when detecting coordinated
attacks. Such a scenario motivates a typical range-aggregate query problem: summarizing aggregated features from all
tuples within given queried ranges [1]. A range-aggregate query applies an aggregate function to the records that fall
within a given query range. Such queries work efficiently on small datasets, but when big data comes into the picture
the huge number of records cannot be processed efficiently [5].
In the existing system, range-aggregate queries are executed in a big data environment, which gives better efficiency
than linear execution. The proposed work contributes the use of the MongoDB database to obtain better results than the
preceding system. MongoDB is built specifically to handle semi-structured and unstructured data. The proposed work
divides big data into multiple partitions using the MapReduce algorithm and then generates a local estimation sketch for
each partition. When a range-aggregate query request arrives, the system obtains the result directly by summarizing the
estimates from all partitions. The proposed work also focuses on data security techniques that enhance the privacy of
sensitive data [8].
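The partition-and-sketch flow described above can be illustrated with a minimal Python sketch. The paper does not state which kind of estimation sketch is built, so an equi-width histogram per partition is assumed here purely for illustration. The names HistogramSketch, partition, and range_count are hypothetical, and the round-robin split only stands in for the MapReduce partitioning step.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HistogramSketch:
    """Equi-width histogram over one partition (an assumed sketch type)."""
    lo: float
    width: float
    counts: List[int]

    @classmethod
    def build(cls, values: Sequence[float], lo: float, hi: float,
              buckets: int) -> "HistogramSketch":
        width = (hi - lo) / buckets
        counts = [0] * buckets
        for v in values:
            # Clamp so that a value equal to hi lands in the last bucket.
            idx = min(int((v - lo) / width), buckets - 1)
            counts[idx] += 1
        return cls(lo, width, counts)

    def estimate_count(self, q_lo: float, q_hi: float) -> float:
        """Estimate the number of values in [q_lo, q_hi), assuming values
        are spread uniformly inside each bucket."""
        total = 0.0
        for i, c in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(q_hi, b_hi) - max(q_lo, b_lo))
            total += c * overlap / self.width
        return total


def partition(values: Sequence[float], n: int) -> List[Sequence[float]]:
    """Split values into n independent partitions (round-robin stand-in
    for the map phase)."""
    return [values[i::n] for i in range(n)]


def range_count(sketches: List[HistogramSketch], q_lo: float,
                q_hi: float) -> float:
    """Answer a range COUNT query by summing the per-partition estimates."""
    return sum(s.estimate_count(q_lo, q_hi) for s in sketches)


ages = list(range(20, 60)) * 5  # 200 synthetic ages between 20 and 59
sketches = [HistogramSketch.build(p, 20, 60, 4) for p in partition(ages, 4)]
estimate = range_count(sketches, 30, 50)
exact = sum(1 for a in ages if 30 <= a < 50)
print(f"estimate={estimate:.1f} exact={exact}")  # estimate=100.0 exact=100
```

The query is answered from the four small sketches alone, without rescanning the raw tuples. The estimate matches the exact count here only because the synthetic data is uniform; on skewed data the uniform-spread assumption would introduce error.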
2. PROPOSED WORK
2.1 SCOPE
Existing methods for handling range-aggregate queries are insufficient to quickly obtain accurate results in big data
environments. In this paper, we propose effective methods for handling range-aggregate queries in big data
environments. The proposed system also addresses a heterogeneous file context, i.e., both structured and semi-structured
files can be stored in a database and accessed to answer range-aggregate queries. A heterogeneous file context normally
requires data cleaning and preprocessing, and semi-structured files must be converted to structured form before an
aggregate function can be applied to the structured database. The proposed system overcomes this problem using the
MongoDB database. MongoDB stores semi-structured data, e.g. an XML file, in the form of a tree structure and executes
queries on this tree-structured dataset. The proposed system also provides data security techniques that give privacy to
sensitive data which needs to be hidden from specific users. This is done by displaying sensitive data in a generalized
form [8].
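As a rough illustration of querying semi-structured data in tree form, the Python sketch below converts a small XML file into nested documents and runs a range-aggregate SUM over them. It is only a sketch under stated assumptions: the employee records and the helper names element_to_document and range_sum are hypothetical, plain Python dictionaries stand in for database documents, and no MongoDB call is made.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input; the field names are assumptions.
XML_DATA = """
<employees>
  <employee><name>A</name><age>24</age><salary>30000</salary></employee>
  <employee><name>B</name><age>31</age><salary>42000</salary></employee>
  <employee><name>C</name><age>38</age><salary>51000</salary></employee>
  <employee><name>D</name><age>45</age><salary>60000</salary></employee>
</employees>
"""


def element_to_document(elem):
    """Convert an XML element into a nested dict that keeps the tree shape.

    Leaf elements become strings; repeated child tags are not handled.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_document(child) for child in children}


def range_sum(documents, range_field, lo, hi, value_field):
    """SUM(value_field) over documents with lo <= range_field <= hi."""
    return sum(
        int(doc[value_field])
        for doc in documents
        if lo <= int(doc[range_field]) <= hi
    )


root = ET.fromstring(XML_DATA.strip())
documents = [element_to_document(e) for e in root.findall("employee")]
total = range_sum(documents, "age", 30, 40, "salary")
print(total)  # 93000
```

Only employees B and C fall in the age range 30 to 40, so the sum is 42000 + 51000. The records are queried in their nested form, with no separate conversion to relational tables.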
3. MODULES
MODULE 1: PREPROCESSING PHASE
The preprocessing phase includes the following parts:
Big Data Setup, User Profile, Account Registration:
The user must first set up the big data environment and then create an account; only then is access to the system
allowed. Once users have created an account, they are able to log in through it. The system then processes each user
request and responds to it.
MODULE 4: SECURITY
Security can be improved by providing data security techniques that enhance the privacy of sensitive data. Query
results can be delivered securely by hiding sensitive data from users, which improves the security of query processing
in a big data environment.
Example: Display the age attribute of employees in generalized form
The age attribute can take any numerical value, and a user could identify an employee's data from his or her exact age.
We therefore use a generalization algorithm so that the exact age of an employee is hidden and is reported only as a
range, e.g. 20-30, 30-40, etc. [8].
Generalization Algorithm:
1. Retrieve the age from the MongoDB database
2. Calculate the units digit of the age
3. Subtract the units digit from the age; this gives the start of the age range
4. Add 10 to the start of the range to get the end of the range
5. Concatenate the start and end of the age range to obtain the generalized result
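Steps 2 to 5 above can be written as a short Python function. This is a minimal sketch: the function name generalize_age and the hyphen-separated output format are assumptions, and step 1 (retrieving the age from the database) is left out so that the example stays self-contained.

```python
def generalize_age(age: int) -> str:
    """Map an exact age to a 10-year range string such as '20-30'."""
    units = age % 10   # step 2: units digit of the age
    start = age - units  # step 3: start of the range
    end = start + 10     # step 4: end of the range
    return f"{start}-{end}"  # step 5: concatenate start and end


for age in (27, 30, 39):
    print(age, "->", generalize_age(age))
# 27 -> 20-30
# 30 -> 30-40
# 39 -> 30-40
```

Under these steps an age on a boundary, such as 30, is reported in the range that starts at that value (30-40), not in the range that ends at it.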
4. CONCLUSION
In this paper, we proposed an approach to range-aggregate queries: a new approximate answering method that obtains
accurate estimations quickly in big data environments. We believe that our system provides a better starting point for
building real-time answering methods for big data analysis. The proposed system overcomes the problems present in the
existing system, which makes it more proficient at providing accurate results quickly for range-aggregate queries in a
big data environment.
REFERENCES
[1] Elumalai R, Mathankumar G, Gunaseelan V, Aravind raj S, Gnanavel S, Black Money Check: Integration of Big
Data & Cloud Computing To Detect Black Money Rotation with Range Aggregate Queries, International Research
Journal in Advanced Engg. and Technology, Vol. 2, Issue 2 (2016), pages 767-772. Received: 25/03/2016.
Published: 01/04/2016
[2] Prasadkumar Kale, Arti Mohanpurkar, Efficient Query Handling On Big Data in Network Using Pattern
Matching Algorithm: A Review, Volume 3, Issue 11, November 2014
[3] Ching-Tien Ho, Rakesh Agrawal, Nimrod Megiddo, Ramakrishnan Srikant, Range Queries in OLAP Data Cubes,
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120
[4] Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin, Fast Data in the Era of Big Data: Twitter's
Real-Time Related Query Suggestion Architecture, arXiv:1210.7350v1, 27 Oct 2012
[5] Weifa Liang, Hui Wang, Maria E. Orlowska, Range queries in dynamic OLAP data cubes, St. Lucia Qld 4072,
Australia. Received 12 August 1999; received in revised form 14 February 2000
[6] Dilpreet Singh, Chandan K Reddy, A survey on platforms for big data analytics, Journal of Big Data 2014 1:8,
doi:10.1186/s40537-014-0008-6
[7] Jeffrey Dean, Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters,
Communications of the ACM, January 2008, Vol. 51, No. 1
[8] Varsha P Gaikwad, Nikita R. Khare, Chaitanya N. Kalantri, Collaborative Data Publishing Technique with
Enhanced Security, International Journal of Latest Trends in Engineering and Technology, Vol. 6, Issue 4, March
2016, ISSN: 2278-621X.
Anisa I. Tamboli received a B.Tech degree in Information Technology from Walchand College of
Engineering, Sangli, and is pursuing an M.E. in Computer Science and Engineering at Annasaheb Dange College
of Engineering, Ashta, with 2.9 years of industrial experience and 2 years of teaching experience.
Sandeep G. Sutar, Assistant Professor in the Computer Science and Engineering department, has 12 years of
teaching experience, received B.E. and M.E. degrees in Computer Science and Engineering from Shivaji
University, Kolhapur, and is a renowned teacher of subjects such as Grid, Cloud Computing, and Big Data.