IEEE
International Conference on Information Technology (InCITe) - The Next Generation IT Summit
Data can be stored in numerous formats, for instance in a database, in Excel or Access, or, for that matter, in a simple text file. Sometimes the data is not even in the traditional format we assume; it may take the form of video, SMS, PDF or something we have not thought of. The organization needs to arrange it and make it meaningful. The real world holds data in a wide range of formats, and that is the challenge we have to overcome with Big Data. This variety of data characterizes Big Data [2].

Velocity

The problems which occur in Variety testing are: (1) validation of semi-structured and unstructured data requires human intervention; (2) the lack of properly defined formats causes validation issues for unstructured data; (3) scripting is a major issue when processing semi-structured and unstructured data; (4) the sampling problem.

The solutions to the above challenges are as follows: (1) use compare tools to compare the data and identify inconsistencies; (2) convert semi-structured data into a structured format; (3) parse unstructured text data and compare the result with the output data.
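The Variety-testing solutions listed above (convert semi-structured data to a structured format, then compare it with the output data) can be illustrated with a minimal Python sketch. The record layout, the `flatten` and `compare_rows` helpers and the sample values are assumptions made for illustration only; they do not come from any specific compare tool.

```python
import json

def flatten(record, prefix=""):
    """Turn a nested dict into a flat {dotted.key: value} row."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row

def compare_rows(source_rows, target_rows, key):
    """Match rows on `key`; return (keys missing in target, keys whose rows differ)."""
    target_by_key = {row[key]: row for row in target_rows}
    missing, mismatched = [], []
    for row in source_rows:
        other = target_by_key.get(row[key])
        if other is None:
            missing.append(row[key])
        elif other != row:
            mismatched.append(row[key])
    return missing, mismatched

# Hypothetical semi-structured source: one JSON document per line.
source_lines = [
    '{"id": 1, "user": {"name": "a", "city": "x"}}',
    '{"id": 2, "user": {"name": "b", "city": "y"}}',
    '{"id": 3, "user": {"name": "c", "city": "z"}}',
]
source_rows = [flatten(json.loads(line)) for line in source_lines]

# Hypothetical structured output of the system under test.
target_rows = [
    {"id": 1, "user.name": "a", "user.city": "x"},
    {"id": 2, "user.name": "b", "user.city": "WRONG"},
]

missing, mismatched = compare_rows(source_rows, target_rows, key="id")
print("missing in output:", missing)    # record 3 never reached the output
print("inconsistent rows:", mismatched)  # record 2 differs in user.city
```

Flattening first keeps the comparison itself trivial: once both sides are plain rows, inconsistencies reduce to missing keys and unequal rows.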
A. Genetic algorithm
As the genetic algorithm is sequential in nature, it does not support parallelism; the work focuses on parallelizing the genetic algorithm using extended Hadoop MapReduce for clustering, as shown in Figure 2. Clustering is carried out in two phases. In the first phase, the input data is split and each split is passed on to a mapper. The results are then passed to the second phase, where a single reducer is used. In other words, the genetic algorithm is implemented with numerous mappers and a single reducer.

The work compared K-means, PSO and a hybrid clustering algorithm on the basis of quality. The two PSO techniques, i.e. the standard PSO algorithm and the one combined with the K-means algorithm, were compared, and it was concluded that the hybrid version shows improved convergence with smaller quantization errors [11].

Various PSO-based clustering algorithms were used, where the major evaluation parameters were quantization error, objective function value, fitness value, inter- and intra-cluster distance, mean and standard deviation, execution time, error rate and mean square error [15].
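The two-phase scheme described under "A. Genetic algorithm" (numerous mappers, a single reducer) can be sketched in plain Python without Hadoop. This is only an illustration under stated assumptions: the mappers run sequentially here, the genetic operators (truncation selection, one-point crossover, mutation by swapping in a data point) are simple stand-ins, and quantization error, one of the evaluation parameters mentioned above, is used as the fitness function. None of these choices are taken from the cited work, and all function names are hypothetical.

```python
import random
from math import dist  # Python 3.8+

def quantization_error(points, centroids):
    """Mean, over non-empty clusters, of the mean point-to-centroid distance."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        clusters[nearest].append(dist(p, centroids[nearest]))
    per_cluster = [sum(c) / len(c) for c in clusters if c]
    return sum(per_cluster) / len(per_cluster)

def genetic_clustering(points, k, rng, generations=40, pop_size=20):
    """Tiny GA: an individual is a list of k centroids drawn from the data."""
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: quantization_error(points, ind))
        parents = population[: pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                        # mutation
                child[rng.randrange(k)] = rng.choice(points)
            children.append(child)
        population = parents + children
    return min(population, key=lambda ind: quantization_error(points, ind))

def mapper(split, k, seed):
    """Phase 1: cluster one input split and emit its k centroids."""
    return genetic_clustering(split, k, random.Random(seed))

def reducer(all_centroids, k, seed):
    """Phase 2: a single reducer clusters the centroids emitted by all mappers."""
    return genetic_clustering(all_centroids, k, random.Random(seed))

def two_phase_clustering(points, k, n_splits, seed=0):
    splits = [points[i::n_splits] for i in range(n_splits)]
    mapped = [c for i, s in enumerate(splits) for c in mapper(s, k, seed + i)]
    return reducer(mapped, k, seed + n_splits)

# Two well-separated groups of points on a line (illustrative data only).
points = [(i * 0.1, 0.0) for i in range(30)] + [(100 + i * 0.1, 0.0) for i in range(30)]
final = two_phase_clustering(points, k=2, n_splits=3)
print(final)
```

In this simplification the reducer scores candidates only against the centroids it receives, not against the full data set, which keeps the second phase small regardless of input size.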
C. Performance Testing
characterization which would help in the maintenance life cycle of commercial big data systems.

Comparison of techniques based on 4V’s:
Since the 4 V’s play an important role in the testing of big data, the testing techniques have been compared on the basis of the 4 V’s, as shown in Table II.

TABLE II. COMPARING THE VARIOUS APPROACHES

S No.  Approach          Variety  Velocity  Volume  Veracity
1.     GA                YES      YES       YES     NO
2.     PSO               YES      YES       YES     NO
3.     Performance Test  YES      YES       YES     YES

REFERENCES
[1] Shilpa and Manjit Kaur, “BIG Data and Methodology-A review”, IJARCSSE, vol. 3, issue 10, pp. 991-995, October 2013.
[2] “Big Data – What is Big Data – 3 Vs of Big Data – Volume, Velocity and Variety – Day 2 of 21”, SQL AUTHORITY.COM, October 2013.
[3] “The 3Vs that define Big Data”, Data Science Central, July 2012.
[4] Tom Davenport, “Three big benefits of big data analytics”, sascom magazine.
[5] Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja, “Big Data: Testing Approach to Overcome Quality Challenges”, Infosys Lab Briefings, vol. 11, no. 1, pp. 65-72, 2013.
[6] Nivranshu Hans, Sana Mahajan and SN Omkar, “Big Data Clustering Using Genetic Algorithm On Hadoop Mapreduce”, International Journal of Scientific & Technology Research, vol. 4, issue 04, pp. 58-62, April 2015.
[7] Dian Palupi Rini, Siti Mariyam Shamsuddin and Siti Sophiyati Yuhaniz, “Particle Swarm Optimization: Technique, System and Challenge”, International Journal of Computer Applications, vol. 14, no. 1, pp. 19-27, January 2011.
[8] “Five big data challenges And how to overcome them with visual analytics”, sas THE POWER TO KNOW.
[9] DW van der Merwe and AP Engelbrecht, “Data Clustering using Particle Swarm Optimization”, Department of Computer Science, University of Pretoria.
[10] Mustafa Batterywala and Shirish Bhale, “Performance Testing of Big Data Applications”, Impetus Technologies, STC 2013.
[11] Alexander Alexandrov, Christoph Brucke and Volker Markl, “Issues in Big Data Testing and Benchmarking”, Technische Universität Berlin, Einsteinufer
[12] Bhasker Allene and Marco Righini, “Better Performance for Big Data”, Intel Corporation, 2013.
[13] Bhagyashree Bhoyar, Pramod Patil and Priyanka Abhang, “A Survey of Accelerated PSO Swarm Search Feature Selection for Data Stream Mining Big Data”, IJIET, vol. 6, issue 3, pp. 53-58, February 2016.