Efficient Sorting For Repeated Data

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 8, AUGUST 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

Bhavesh Patel, Nishant Doshi
Abstract: In repeated data, some values occur more than once. During sorting, the algorithm may therefore be handed a part of the array that is already sorted. Well-known algorithms such as quick sort and merge sort, which take O(n log n) time for n data items (on average, in the case of quick sort), do not take advantage of this, so sorting repeated data takes more time than necessary. In this paper we consider only repeated data and modify quick sort to obtain better efficiency than previously proposed sorting algorithms. The proposed algorithms were implemented in the C language.

Index Terms: Algorithm, Comparisons, Quick sort, Merge sort, Repeated data, Sorting.
1 INTRODUCTION
Sorting has long played an important role in database applications, and many sorting algorithms have been designed. Quick sort, developed by C. A. R. Hoare, makes O(n log n) comparisons on average and O(n^2) comparisons in the worst case. Merge sort always divides the data into two parts of equal length and sorts each part recursively. Quick sort sorts the data in place within the array; the standard merge sort additionally needs an auxiliary array for merging. In this paper we modify the original quick sort algorithm to obtain better running time and fewer comparisons.

Merge sort was discussed in [3,4]. The main idea is to divide the whole array into two sub-parts of equal length and solve each part recursively. If at some point a resulting sub-part is already sorted, merge sort still divides and solves it, wasting time on unnecessary comparisons. In quick sort a sub-part may become sorted during the divide step because of the exchanges; merge sort performs no exchanges while dividing, so there we can only rely on a sub-part that is already sorted in the input. In [20] the authors try to reduce the computational complexity of merge sort by reducing swapping and memory usage.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 gives the algorithms and the result analysis. Section 4 gives the conclusion and future work. References are at the end.
2 RELATED WORK
Quick sort was introduced in [1,2]. The main idea is to divide the data using the partition algorithm and then sort the lower half, which contains the elements less than the pivot, and the upper half, which contains the elements greater than the pivot. So far the literature has considered distinct and randomized data, as in [3]-[17], and different variations of quick sort have been proposed, such as multikey [19]. None of this literature raises the issue of repeated data. In [18] the authors present an improved heap sort algorithm, compare it with quick sort and the existing heap sort algorithm, and give test conditions for different scenarios.

Duplicate data arise in many places. Suppose an insurance agent wants to see monthly or yearly data by city or by name; the same values can then be repeated many times. At a railway station the list of passengers is displayed as a chart, and if the chart is ordered by passenger name, names will often be repeated. In these applications, previously sorted data give no advantage for later queries. Suppose we search for all insured people in 2009 and then in 2010: the two searches are independent, so even insertion sort is not useful in this scenario. Repetition also occurs in bank databases and in other organizations. This paper therefore raises the issue and the importance of duplicate data in various fields and modifies the sorting algorithms accordingly.
Mr. Bhavesh Patel is currently working as . Mr. Nishant Doshi is currently working as a Ph.D. research scholar at S V National Institute of Technology, India.
3 PROPOSED WORK
3.1 Efficient quick sort for repeated data

The main disadvantage of quick sort is its worst case, which (for the usual first- or last-element pivot) occurs when the data are already sorted, whether in increasing or non-increasing order. The same holds for any sub-half of a given half. The modification is therefore as follows: if we can detect that a given sub-half is already sorted, we do not need to select a pivot element and go further. Quick sort also has a useful property: if the data are sorted in non-increasing order and the middle element is always selected as the pivot, then a single pass of the Partition algorithm leaves the data sorted in increasing order. For this reason our modified algorithm uses the middle element as the pivot. Relative to the original algorithm, only the partition step changes: a module added at the start of the partition algorithm takes care of sorted data.
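The one-pass property described above can be checked with a small sketch. The paper does not list its Partition routine, so this assumes a Hoare-style partition with the middle element as pivot; the names partition_middle and is_sorted are hypothetical and used only for illustration.

```c
#include <assert.h>

/* Exchange two array elements. */
static void swap_int(int *x, int *y) {
    int t = *x;
    *x = *y;
    *y = t;
}

/*
 * Hoare-style partition of a[low..high] (inclusive) around the middle element.
 * Assumption: one common variant, not necessarily the paper's Partition.
 * Returns a split index j with low <= j < high such that every element of
 * a[low..j] is <= every element of a[j+1..high].
 */
int partition_middle(int a[], int low, int high) {
    assert(low >= 0 && low < high);
    int pivot = a[low + (high - low) / 2];
    int i = low - 1;
    int j = high + 1;
    for (;;) {
        do { i++; } while (a[i] < pivot);   /* first element >= pivot from the left  */
        do { j--; } while (a[j] > pivot);   /* first element <= pivot from the right */
        if (i >= j) return j;               /* scans have met or crossed             */
        swap_int(&a[i], &a[j]);
    }
}

/* Returns 1 if a[0..n-1] is in non-decreasing order, 0 otherwise. */
int is_sorted(const int a[], int n) {
    for (int k = 0; k + 1 < n; k++) {
        if (a[k] > a[k + 1]) return 0;
    }
    return 1;
}
```

Under this assumption, calling partition_middle on a non-increasing array such as {9, 7, 7, 5, 4, 4, 2, 1} leaves it in non-decreasing order, because every exchange swaps a pair that is symmetric about the middle and the pass therefore reverses the array.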
Algorithm check_order (low, high, A [1...n])
{
    i1 = low
    i2 = high
    while (i1 < i2)
    {
        if (A[i1] > A[i1+1] OR A[i2] < A[i2-1])
            break
        i1 = i1 + 1
        i2 = i2 - 1
    }
    if (i1 == i2)
        return -1
    if (i2 == i1 - 1 AND A[i2] <= A[i1])
        return -1
    return Partition(low, high, A [1...n])   // call the original partition algorithm
}

The check_order function above is called from the main quick sort algorithm. It checks whether the given sub-array is sorted by scanning inward from both ends. If the sub-array is found to be sorted in increasing order, the algorithm returns -1; otherwise it calls the usual partition algorithm, which returns the partition index of the pivot element in the array. Table 1 compares the modified quick sort with other sorting algorithms. It shows that the more repeated the data become, the better the algorithm works. Times in the table are in microseconds.
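A minimal C sketch of how check_order might be wired into quick sort is given here. It is not the authors' implementation: the Hoare-style partition_middle helper, the 0-based inclusive indices, and the driver name quick_sort_modified are assumptions. In this variant the partition returns a split index rather than the final position of the pivot, and indices are required to be non-negative so that -1 is unambiguous as the "already sorted" signal.

```c
#include <assert.h>

/* Exchange two array elements. */
static void swap_int(int *x, int *y) {
    int t = *x;
    *x = *y;
    *y = t;
}

/* Assumed Hoare-style partition with middle pivot; returns split index j,
 * low <= j < high, with a[low..j] <= a[j+1..high]. */
static int partition_middle(int a[], int low, int high) {
    int pivot = a[low + (high - low) / 2];
    int i = low - 1;
    int j = high + 1;
    for (;;) {
        do { i++; } while (a[i] < pivot);
        do { j--; } while (a[j] > pivot);
        if (i >= j) return j;
        swap_int(&a[i], &a[j]);
    }
}

/*
 * Returns -1 when a[low..high] is already in non-decreasing order;
 * otherwise partitions the range and returns the split index.
 */
int check_order(int a[], int low, int high) {
    assert(low >= 0 && low <= high);
    int i1 = low;
    int i2 = high;
    while (i1 < i2) {
        /* Test one adjacent pair from each end per iteration. */
        if (a[i1] > a[i1 + 1] || a[i2] < a[i2 - 1]) break;
        i1++;
        i2--;
    }
    if (i1 == i2) return -1;                          /* scans met on one element  */
    if (i2 == i1 - 1 && a[i2] <= a[i1]) return -1;    /* scans crossed, pair in order */
    return partition_middle(a, low, high);
}

/* Sorts a[low..high]; skips any sub-array that check_order reports as sorted. */
void quick_sort_modified(int a[], int low, int high) {
    if (low >= high) return;
    int p = check_order(a, low, high);
    if (p == -1) return;
    quick_sort_modified(a, low, p);
    quick_sort_modified(a, p + 1, high);
}
```

A call such as quick_sort_modified(a, 0, n - 1) sorts an array of n items. The order check costs at most one extra pass over each sub-array, which is the price paid for stopping early on sorted runs.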
Table 1. Comparison of improved quick sort (time in microseconds)

No. of   Range of                 Heap sort           Quick sort  Modified quick sort
data     data             Time  Comparisons    Time  Comparisons    Time  Comparisons
10       1...10              1           66       1           44       1           38
100      1...100            26         1423      14          839      14          859
100      1...10             24         1224      11          728      10          651
1000     1...1000          216        21849     109        12812     111        13036
1000     1...100           238        20007     124        11670      98        11128
1000     1...10            284        16011     108        10334      72         7013
10000    1...10000        2774       297857    1296       174037    1350       177307
10000    1...1000         2794       278502    1208       162981    1166       157397
10000    1...100          2691       240354    1030       149127     794       118404
10000    1...10           2432       192720     900       136893     461        70057
100000   1...100000      35000      3778441   15447      2209438   16009      2250480
100000   1...10000       34869      3586407   14360      2104808   14023      2051159
100000   1...1000        34096      3172089   12759      1936460   10478      1629009
100000   1...100         31864      2722607   11543      1828365    7391      1169213
100000   1...10          29599      2268002   10468      1703178    4485       699500
1000000  1...1000000    189727     17862589   72681     10698542   75013     10770300
1000000  1...100000     190226     17181448   68297     10194710   67230     10080939
1000000  1...10000      180733     15690704   62140      9679211   53273      8490511
1000000  1...1000       171987     13778451   56969      9135140   40533      6490555
1000000  1...100        162597     12009359   52607      8594749   29260      4611405
1000000  1...10         142791     10305271   48518      8123870   18030      2805949
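Table 1 varies both the number of items and the range the values are drawn from. The paper does not say how the inputs were generated, so the following is only a sketch of one plausible setup, assuming values drawn with the C library's rand(); the function names fill_repeated and count_distinct are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Fill a[0..n-1] with pseudo-random values in 1..range.
 * Assumption: rand() % range is good enough for a rough experiment;
 * it has a slight modulo bias and is not the paper's stated method.
 */
void fill_repeated(int a[], int n, int range, unsigned seed) {
    assert(n >= 0 && range >= 1);
    srand(seed);
    for (int i = 0; i < n; i++) {
        a[i] = 1 + rand() % range;
    }
}

/*
 * Count how many different values occur in a[0..n-1], where every value
 * lies in 1..range. Returns -1 if the scratch table cannot be allocated.
 */
int count_distinct(const int a[], int n, int range) {
    assert(n >= 0 && range >= 1);
    unsigned char *seen = calloc((size_t)range + 1, 1);
    if (seen == NULL) return -1;
    int distinct = 0;
    for (int i = 0; i < n; i++) {
        assert(a[i] >= 1 && a[i] <= range);
        if (!seen[a[i]]) {
            seen[a[i]] = 1;
            distinct++;
        }
    }
    free(seen);
    return distinct;
}
```

With n = 1000 and range = 10, at most 10 distinct values can occur, so at least 990 items are repeats; shrinking the range for a fixed n is what makes the data "more repeated" in the sense of the table.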
3.2 Efficient merge sort for repeated data

The initial steps are the same as in the previous section, but for clarity we give the complete algorithm. It is straightforward that the modified merge sort works better whenever an already sorted sub-array is given, so for brevity we do not include a comparison for this proposed algorithm.
Algorithm merge_sort_modified (p, q, A [1...n])
{
    i1 = p
    i2 = q
    while (i1 < i2)
    {
        if (A[i1] > A[i1+1] OR A[i2] < A[i2-1])
            break
        i1 = i1 + 1
        i2 = i2 - 1
    }
    if (i1 == i2)
        return
    if (i2 == i1 - 1 AND A[i2] <= A[i1])
        return
    r = (p + q) / 2
    merge_sort_modified(p, r, A)
    merge_sort_modified(r + 1, q, A)   // second half starts at r + 1 so the halves do not overlap
    merge(A, p, r, q)                  // call to original merge algorithm
}
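The algorithm above can be sketched in C as follows. This is an illustrative version under stated assumptions, not the authors' code: it uses 0-based inclusive indices, a scratch buffer allocated once for merging, and a wrapper signature merge_sort_modified(a, n) that differs from the (p, q, A) form in the text. The helper names merge and sort_range are likewise hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[p..r] and a[r+1..q], using tmp as scratch space. */
static void merge(int a[], int tmp[], int p, int r, int q) {
    int i = p;
    int j = r + 1;
    int k = p;
    while (i <= r && j <= q) {
        /* Take from the right run only when strictly smaller (keeps ties stable). */
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    }
    while (i <= r) tmp[k++] = a[i++];
    while (j <= q) tmp[k++] = a[j++];
    memcpy(&a[p], &tmp[p], (size_t)(q - p + 1) * sizeof a[0]);
}

/* Sort a[p..q]; return immediately if the range is already in order. */
static void sort_range(int a[], int tmp[], int p, int q) {
    int i1 = p;
    int i2 = q;
    while (i1 < i2) {
        if (a[i1] > a[i1 + 1] || a[i2] < a[i2 - 1]) break;
        i1++;
        i2--;
    }
    if (i1 == i2) return;
    if (i2 == i1 - 1 && a[i2] <= a[i1]) return;
    int r = p + (q - p) / 2;
    sort_range(a, tmp, p, r);
    sort_range(a, tmp, r + 1, q);   /* halves must not overlap */
    merge(a, tmp, p, r, q);
}

/* Sorts a[0..n-1]. Returns 0 on success, -1 if the scratch buffer
 * cannot be allocated (the array is then left unchanged). */
int merge_sort_modified(int a[], int n) {
    assert(n >= 0);
    if (n < 2) return 0;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;
    sort_range(a, tmp, 0, n - 1);
    free(tmp);
    return 0;
}
```

Starting the second recursive call at r + 1 matters: with overlapping halves (p, r) and (r, q), an unsorted two-element range would recurse on itself without end.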
4 CONCLUSION AND FUTURE WORK

This paper shows that the original algorithms are inefficient for repeated data, and that the proposed modification can improve the efficiency of the existing algorithms. Other algorithms may also need to be adapted to handle repeated data, and identifying algorithms that can be improved in this way is left as future work.
ACKNOWLEDGMENT
The authors wish to thank A, B, C. This work was supported in part by a grant from XYZ.
REFERENCES
[1] C. A. R. Hoare, Quicksort, Computer Journal, Vol. 5, No. 1, pp. 10-15 (1962).
[2] C. Hoare, FIND (Algorithm 65), Communications of the ACM, 4 (1961), pp. 321-322.
[3] D. E. Knuth, The Art of Computer Programming: Sorting and Searching, 2nd Edn., Addison Wesley, 1988, ISBN: 020103803X.
[4] T. H. Cormen et al., Introduction to Algorithms, 2nd Edn., 2001, ISBN: 0262032937.
[5] R. S. Francis and L. J. H. Pannan, A parallel partition for enhanced parallel quicksort, Parallel Computing, 18(5):543-550, 1992.
[6] H. M. Mahmoud, R. Modarres, and R. T. Smythe, Analysis of quickselect: An algorithm for order statistics, ITA Theoretical Informatics and Applications, 29(4):255-276, 1995.
[7] B. Vallee, J. Clement, J. A. Fill, and P. Flajolet, The number of symbol comparisons in quicksort and quickselect, in 36th International Colloquium on Automata, Languages and Programming (ICALP 2009), volume 5555 of Lecture Notes in Computer Science, pages 750-763, Berlin, Heidelberg, 2009, Springer.
[8] M. A. Weiss, Data Structures and Algorithm Analysis in C++, Addison-Wesley, 1998.
[9] H. Mahmoud, Average-case analysis of moves in quick select, in: Proceedings of Workshop on Analytic Algorithms and Combinatorics, ANALCO, 2009.
[10] H. Mahmoud, R. Modarres, and R. Smythe, Analysis of quickselect: An algorithm for order statistics, RAIRO, Theoretical Informatics and Applications, 29 (1995), pp. 255-276.
[11] U. Rösler, A limit theorem for Quicksort, RAIRO Inform. Théor. Appl., v. 25, pp. 85-100.
[12] P. Hennequin, Combinatorial analysis of Quick-sort algorithm, RAIRO: Theoretical Informatics and Applications, 23 (1988), pp. 317-333.
[13] P. Kirschenhofer and H. Prodinger, Comparisons in Hoare's Find algorithm, Combinatorics, Probability, and Computing, 7 (1998), pp. 111-120.
[14] R. Sedgewick and P. Flajolet, An Introduction to the Analysis of Algorithms, Addison-Wesley Publishing Company, Reading, Massachusetts, 1996.
[15] P. Kirschenhofer, H. Prodinger, and C. Martínez, Analysis of Hoare's FIND algorithm with median-of-three partition, Random Structures & Algorithms, v. 10, n. 1-2, pp. 143-156, Jan.-March 1997.
[16] H. M. Mahmoud, Average-case analysis of moves in quick select, in C. Martínez and R. Sedgewick, editors, Proc. of the 6th Workshop on Analytic Algorithmics and Combinatorics (ANALCO), pages 35-40, SIAM, 2009.
[17] H. Prodinger, Multiple Quickselect - Hoare's Find algorithm for several elements, Information Processing Letters, v. 56, n. 3, pp. 123-129, Nov. 10, 1995.
[18] V. Sharma, S. Singh, and K. S. Kahlon, Comparative Performance Study of Improved Heap Sort Algorithm on Different Hardware, Journal of Computer Science 5 (7): 476-478, 2009, ISSN 1549-3636.
[19] A. Panholzer and H. Prodinger, A generating functions approach for the analysis of grand averages for Multiple Quickselect, Random Structures and Algorithms 13 (1998), pp. 189-209.
[20] M. Zadahmad Jafarlou and P. Yousefzadeh Fard, Heuristic and pattern based Merge Sort, WCIT-2010, Elsevier, Procedia Computer Science 3 (2011), 322-324.
Mr. Bhavesh Patel received his B.E. from DDU, India in 2007. He then joined Bridgeport University, CT, USA for his M.S., which he completed in 2009. After his M.S. he worked as a developer at Generation Digital Solution, NY, USA. He has been working as a full-time Assistant Professor at Vidhya Bharti Trust Institute of Technology & Research Center, India since 2011.

Mr. Nishant Doshi received his B.E. from DDU, India in 2007. He then joined DA-IICT, India for his M.Tech., which he completed in 2009. After his M.Tech. he worked as a lecturer at V.V.P. college, India. He has been a full-time Ph.D. research scholar at S V National Institute of Technology, India since 2010. He has been a reviewer for IEEE Transactions on Computers, the Elsevier Journal of Systems and Software, and international conferences such as PDCTA, CCSIT, DPPR and ICCAIE. He has been a programme committee member for DPPR, PDCTA, CCSIT, etc.


2011 Journal of Computing Press, NY, USA, ISSN 2151-9617
