Hybrid Quick Sort + Insertion Sort: Runtime Comparison
Anirban Ray
23 February 2018

Objective
Quick Sort and Insertion Sort are two of the most common sorting algorithms. Quick sort is popular because,
in practice, it is among the fastest general-purpose sorting algorithms and has an average-case run-time of
O(n log n). Insertion sort, on the other hand, works very well when the array is partially sorted or when it
is small. In this project, we combine these two algorithms so as to benefit from both the speed of quick sort
and the effectiveness of insertion sort on small, partially sorted arrays. Afterwards, we look for the hybrid
algorithm (that is, the cut-off between insertion and quick sort) which is optimum in the sense of minimum
average run-time.

Insertion Sort
Insertion sort is an iterative sorting algorithm. At each iteration, it takes the next element, finds its
ordered position among the already sorted elements before it, and inserts it there. The algorithm can be
written as below (array indices start at 1):

INSERTIONSORT(A)
    for j = 2 to A.length
        key = A[j]
        i = j - 1
        while i > 0 and A[i] > key
            A[i + 1] = A[i]
            i = i - 1
        A[i + 1] = key
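
To make the remark about partially sorted input concrete, here is a minimal standalone sketch in C++. It uses std::vector rather than the Rcpp types used later, and the function name insertion_sort_shifts is a hypothetical helper introduced only for illustration. It counts the element shifts made by the inner loop, which equals the number of inversions in the input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts `a` in place with insertion sort (0-based version of the pseudocode)
// and returns the number of element shifts performed by the inner loop.
// Hypothetical helper, for illustration only.
long long insertion_sort_shifts(std::vector<double>& a) {
    long long shifts = 0;
    for (std::size_t j = 1; j < a.size(); ++j) {
        const double key = a[j];
        std::size_t i = j;  // a[0..j-1] is already sorted
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];  // shift the larger element one slot to the right
            --i;
            ++shifts;
        }
        a[i] = key;
    }
    assert(std::is_sorted(a.begin(), a.end()));
    return shifts;
}
```

An already sorted array needs no shifts at all, an array with one adjacent pair out of order needs one, and a reversed array of n elements needs n(n - 1)/2, which is why the cost depends so strongly on how sorted the input already is.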

Quick Sort
Quick sort is a divide-and-conquer algorithm. It first divides a large array into two sub-arrays with respect
to a pivot element, so that every element of one sub-array is no greater than the pivot element and every
element of the other is no less than it. It then does the same for the two sub-arrays, and continues until
all sub-arrays are of size 1. Since the partitioning is done in place, no merging step is needed: once every
sub-array has size 1, the whole array is sorted. The algorithm to sort the pth to rth elements of the array A
is as follows.

QUICKSORT(A, p, r)
    if p < r
        q = PARTITION(A, p, r)
        QUICKSORT(A, p, q)
        QUICKSORT(A, q + 1, r)

PARTITION(A, p, r)
    x = A[p]
    i = p - 1
    j = r + 1
    while TRUE
        repeat
            j = j - 1
        until A[j] <= x
        repeat
            i = i + 1
        until A[i] >= x
        if (i < j) exchange A[i] with A[j]
        else return j
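
As a worked illustration of PARTITION, the following standalone C++ sketch mirrors the pseudocode on a std::vector with 0-based, inclusive indices. The name hoare_partition is an assumption made for this example and is not the Rcpp function defined later.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hoare partition of a[p..r] (inclusive, 0-based) around the pivot a[p].
// Returns j with p <= j < r such that every element of a[p..j] is <= every
// element of a[j+1..r]. Requires p < r.
int hoare_partition(std::vector<double>& a, int p, int r) {
    assert(p >= 0 && p < r && r < static_cast<int>(a.size()));
    const double x = a[p];
    int i = p - 1;
    int j = r + 1;
    while (true) {
        do { --j; } while (a[j] > x);  // repeat ... until a[j] <= x
        do { ++i; } while (a[i] < x);  // repeat ... until a[i] >= x
        if (i < j) {
            std::swap(a[i], a[j]);
        } else {
            return j;
        }
    }
}
```

For the input {5, 3, 8, 1, 9, 2} with p = 0 and r = 5, the pivot is 5, the array becomes {2, 3, 1, 8, 9, 5} and the function returns 2. Note that the pivot itself ends up in the right part rather than at index 2, which is why QUICKSORT recurses on (p, q) and (q + 1, r) instead of excluding position q.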

Different choices of the pivot element are available for different types of input array. The algorithm above
uses the first element of the array. Lomuto's scheme uses the last element of the array. Sometimes a random
index is chosen and its element is swapped with the last element, after which the Lomuto partitioning method
is followed. Singleton used the median-of-three method, in which one first sorts the first, last and
middle-most elements of the array, then exchanges the middle-most element of the modified array with the first
element of the array and proceeds as before. In this project, we will always use random inputs, in which case
the choice of pivot does not matter much. So, we will continue to use the first element as pivot, following
Hoare, who first proposed the quick sort algorithm.
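
The median-of-three step described above can be sketched as follows. This is a minimal standalone C++ illustration under the description given here, not code used in this project, and median_of_three_to_front is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Median-of-three pivot selection (hypothetical helper): order a[p], a[mid]
// and a[r] among themselves, then move the median to a[p] so that a
// first-element partition uses it as the pivot. Requires p <= r.
void median_of_three_to_front(std::vector<double>& a, int p, int r) {
    assert(p >= 0 && p <= r && r < static_cast<int>(a.size()));
    const int mid = p + (r - p) / 2;
    if (a[mid] < a[p]) std::swap(a[mid], a[p]);
    if (a[r] < a[p]) std::swap(a[r], a[p]);
    if (a[r] < a[mid]) std::swap(a[r], a[mid]);
    // Now a[p] <= a[mid] <= a[r]; put the median at the front.
    std::swap(a[p], a[mid]);
}
```

For {9, 4, 7, 1, 2} the three candidates are 9, 7 and 2, so the median 7 is moved to the front and the array becomes {7, 4, 2, 1, 9}. A first-element partition can then be applied unchanged.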

Hybrid Sort
Now we come to the formulation of the new hybrid algorithm. Since insertion sort works well on small and on
partially sorted arrays, we start the sorting procedure with the partition approach of the quick sort
algorithm. But instead of continuing until we reach sub-arrays of one element each, we stop partitioning a
sub-array once its size is at most some given cut-off size k, which distinguishes between small and large
arrays. After this step is completed, the array consists of consecutive sub-arrays of size at most k. These
sub-arrays are not sorted internally, but they are in sorted order relative to one another: no element of a
sub-array is greater than any element of a later sub-array. Finally, we run insertion sort over the entire
array to get the completely sorted output. The algorithm is the following, where HYBRIDSORT performs the
partitioning phase and INSERTIONSORT(A) is called once, after the top-level HYBRIDSORT call has returned.

HYBRIDSORT(A, p, r, k)
    if (p < r)
        if (r - p + 1 > k)
            q = PARTITION(A, p, r)
            HYBRIDSORT(A, p, q, k)
            HYBRIDSORT(A, q + 1, r, k)

INSERTIONSORT(A)
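
The state of the array between the two phases can be checked directly. The sketch below is a standalone C++ illustration on std::vector. The block_ends bookkeeping and the names partition_phase, blocks_are_ordered and hybrid_sort are additions made for this example and are not part of the pseudocode. It records where each unpartitioned block ends and verifies that the blocks are ordered relative to one another before the final insertion sort pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hoare partition around a[p], as in PARTITION (0-based, inclusive bounds).
int partition_first(std::vector<double>& a, int p, int r) {
    const double x = a[p];
    int i = p - 1, j = r + 1;
    while (true) {
        do { --j; } while (a[j] > x);
        do { ++i; } while (a[i] < x);
        if (i >= j) return j;
        std::swap(a[i], a[j]);
    }
}

// Partition phase of HYBRIDSORT: recurse only while a range has more than k
// elements. The last index of every range left unpartitioned is appended to
// `block_ends`, in left-to-right order (illustration only).
void partition_phase(std::vector<double>& a, int p, int r, int k,
                     std::vector<int>& block_ends) {
    if (p > r) return;  // empty range
    if (p < r && r - p + 1 > k) {
        const int q = partition_first(a, p, r);
        partition_phase(a, p, q, k, block_ends);
        partition_phase(a, q + 1, r, k, block_ends);
    } else {
        block_ends.push_back(r);
    }
}

// True if no element of a block exceeds any element of the next block.
bool blocks_are_ordered(const std::vector<double>& a,
                        const std::vector<int>& block_ends) {
    int start = 0;
    bool have_previous = false;
    double previous_max = 0.0;
    for (int end : block_ends) {
        const auto first = a.begin() + start;
        const auto last = a.begin() + end + 1;
        if (have_previous && *std::min_element(first, last) < previous_max) {
            return false;
        }
        previous_max = *std::max_element(first, last);
        have_previous = true;
        start = end + 1;
    }
    return true;
}

// Full hybrid sort: partition phase, then one insertion sort pass.
void hybrid_sort(std::vector<double>& a, int k) {
    std::vector<int> block_ends;
    partition_phase(a, 0, static_cast<int>(a.size()) - 1, k, block_ends);
    assert(blocks_are_ordered(a, block_ends));
    for (std::size_t j = 1; j < a.size(); ++j) {
        const double key = a[j];
        std::size_t i = j;
        while (i > 0 && a[i - 1] > key) { a[i] = a[i - 1]; --i; }
        a[i] = key;
    }
}
```

Because every element is already inside its final block, the insertion sort pass never has to move an element further than the length of its block, which is what keeps the last step cheap for small k.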

Implementation of the Sorting Algorithms


We first define the sorting algorithms in C++ using the Rcpp package.

#include <Rcpp.h>
using namespace Rcpp;

// function to interchange elements of an array at two positions
// (an Rcpp NumericVector passed by value still refers to the same underlying
// data, so the caller's array is modified)
void swap(NumericVector array, int first_position, int second_position) {
    double temporary = array[first_position];
    array[first_position] = array[second_position];
    array[second_position] = temporary;
}

// partition algorithm for quick sort using first element as pivot
int partition(NumericVector array, int start, int end) {
    double pivot = array[start];
    int i = (start - 1);
    int j = (end + 1);
    while (true) {
        do {
            i = (i + 1);
        } while (array[i] < pivot);
        do {
            j = (j - 1);
        } while (array[j] > pivot);
        if (i >= j) {
            return j;
        }
        swap(array, i, j);
    }
}

// insertion sort algorithm
void insertion(NumericVector array, int start, int end) {
    if (start < end) {
        for (int i = (start + 1); i <= end; ++i) {
            double temporary = array[i];
            int j = (i - 1);
            while ((j >= start) && (array[j] > temporary)) {
                array[(j + 1)] = array[j];
                j = (j - 1);
            }
            array[(j + 1)] = temporary;
        }
    }
}

// quick sort algorithm
void quick(NumericVector array, int start, int end) {
    if (start < end) {
        int key = partition(array, start, end);
        quick(array, start, key);
        quick(array, (key + 1), end);
    }
}

// hybrid sort algorithm (partitioning phase only; the final insertion sort
// pass is applied by the caller)
void hybrid(NumericVector array, int start, int end, int cutoff) {
    if (start < end) {
        // applying partition algorithm only when array size is more than cutoff
        if ((end - start + 1) > cutoff) {
            int key = partition(array, start, end);
            hybrid(array, start, key, cutoff);
            hybrid(array, (key + 1), end, cutoff);
        }
    }
}

// [[Rcpp::export]]
NumericVector sorting_R(NumericVector array, char method, int cutoff) {
    int n = array.length();
    // making an explicit copy of the input array to keep that unchanged
    NumericVector sorted_array = clone(array);
    // applying different sorting algorithms based on method
    switch (method) {
        case 'h': {
            hybrid(sorted_array, 0, (n - 1), cutoff);
            insertion(sorted_array, 0, (n - 1));
            break;
        }
        case 'i': {
            insertion(sorted_array, 0, (n - 1));
            break;
        }
        case 'q': {
            quick(sorted_array, 0, (n - 1));
            break;
        }
        default: {
            Rcpp::stop("Permissible methods are Hybrid(h), Insertion(i) and Quick(q).");
        }
    }
    return sorted_array;
}

Finding the Optimum Cutoff Size


Now that we have defined our sorting algorithms, the next step is to find the optimum choice of cut-off by a
simulation study, since the optimum is not known in advance and the concept of “small” is pretty vague.
Therefore, we define functions in R (calling the C++ functions) to compute the average run-time of our hybrid
algorithm for a given choice of cut-off size. We run these functions over different choices of cut-off for
several array sizes, and plot the average run-times against the cut-offs for each array size, as below.

# function to calculate required time to sort a particular input array using
# a user defined cutoff
single_hybrid_runtime <- function(array_to_be_sorted, cutoff_to_be_used) {
    system.time(sorting_R(array_to_be_sorted, "h", cutoff_to_be_used))["user.self"]
}

# function to calculate the time required for sorting arrays of some
# particular size using different choices of cutoff
comparative_hybrid_runtime <- function(array_size, cutoff) {
    simulated_array <- rnorm(array_size)
    sapply(cutoff, single_hybrid_runtime, array_to_be_sorted = simulated_array)
}

# function to calculate average runtime for user defined array size for
# different choices of cutoff, average being taken over different
# replications (optionally user defined)
average_hybrid_runtime <- function(array_size, cutoff, replication = 25) {
    rowMeans(replicate(replication, comparative_hybrid_runtime(array_size, cutoff)))
}

keys <- seq(1, 1000, 1)  # choices of cutoff used for simulation study
times_1_e_5 <- average_hybrid_runtime(array_size = 1e+05, cutoff = keys)
times_4_e_5 <- average_hybrid_runtime(array_size = 4e+05, cutoff = keys)
times_7_e_5 <- average_hybrid_runtime(array_size = 7e+05, cutoff = keys)
times_1_e_6 <- average_hybrid_runtime(array_size = 1e+06, cutoff = keys)

plot(keys, times_1_e_5, type = "o", main = "For array size 1e+05", xlab = "Cutoff Used",
    ylab = "Time Taken")
plot(keys, times_4_e_5, type = "o", main = "For array size 4e+05", xlab = "Cutoff Used",
    ylab = "Time Taken")
plot(keys, times_7_e_5, type = "o", main = "For array size 7e+05", xlab = "Cutoff Used",
    ylab = "Time Taken")
plot(keys, times_1_e_6, type = "o", main = "For array size 1e+06", xlab = "Cutoff Used",
    ylab = "Time Taken")
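
For readers without an R session, the same experiment can be sketched in plain C++. This is an illustrative stand-in for the R functions above, not the code used to produce the graphs. It uses std::clock for CPU time (roughly analogous to the "user.self" component of system.time), std::normal_distribution in place of rnorm, and hypothetical function names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ctime>
#include <random>
#include <utility>
#include <vector>

// Hoare partition around the first element (same scheme as `partition`).
static int partition_first(std::vector<double>& a, int p, int r) {
    const double x = a[p];
    int i = p - 1, j = r + 1;
    while (true) {
        do { ++i; } while (a[i] < x);
        do { --j; } while (a[j] > x);
        if (i >= j) return j;
        std::swap(a[i], a[j]);
    }
}

// Partition only ranges with more than k elements.
static void partition_phase(std::vector<double>& a, int p, int r, int k) {
    if (p < r && r - p + 1 > k) {
        const int q = partition_first(a, p, r);
        partition_phase(a, p, q, k);
        partition_phase(a, q + 1, r, k);
    }
}

// One insertion sort pass over the whole array.
static void insertion_pass(std::vector<double>& a) {
    for (std::size_t j = 1; j < a.size(); ++j) {
        const double key = a[j];
        std::size_t i = j;
        while (i > 0 && a[i - 1] > key) { a[i] = a[i - 1]; --i; }
        a[i] = key;
    }
}

// CPU seconds needed to hybrid-sort a copy of `input` with cutoff k.
double hybrid_cpu_seconds(const std::vector<double>& input, int k) {
    std::vector<double> a = input;  // keep the input unchanged
    const std::clock_t t0 = std::clock();
    partition_phase(a, 0, static_cast<int>(a.size()) - 1, k);
    insertion_pass(a);
    const std::clock_t t1 = std::clock();
    assert(std::is_sorted(a.begin(), a.end()));  // checked outside the timed region
    return static_cast<double>(t1 - t0) / CLOCKS_PER_SEC;
}

// Average CPU time for each cutoff over `replications` standard-normal
// arrays of length n; every cutoff is timed on the same array per replication.
std::vector<double> average_cpu_seconds(std::size_t n,
                                        const std::vector<int>& cutoffs,
                                        int replications, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> total(cutoffs.size(), 0.0);
    for (int rep = 0; rep < replications; ++rep) {
        std::vector<double> input(n);
        for (double& v : input) v = normal(gen);
        for (std::size_t c = 0; c < cutoffs.size(); ++c) {
            total[c] += hybrid_cpu_seconds(input, cutoffs[c]);
        }
    }
    for (double& t : total) t /= replications;
    return total;
}
```

Calling average_cpu_seconds(100000, cutoffs, 25, seed) with a vector of candidate cut-offs would mirror one of the R runs above. The measured times depend on the machine and compiler, so no particular optimum should be expected to carry over.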

Observations from the Graphs
Firstly, we see that all the graphs fall sharply at the start. This demonstrates the effectiveness of the
hybrid algorithm over quick sort: with a cut-off of 1, we are essentially applying quick sort over the
entire array, and the final insertion sort pass finds the array already sorted. The steep fall therefore
lets us conclude with confidence that combining the two algorithms is worthwhile. The reason is that quick
sort is a recursive algorithm, and so it incurs a large overhead from calling itself repeatedly on small
arrays.

Secondly, we note that beyond a certain point the average run-time increases steadily, because insertion
sort is effective only for “small” arrays. As we increase the cut-off size, insertion sort has to be
applied to larger partially sorted sub-arrays, and hence sorting the entire array becomes slower.

Finally, we observe that these two opposing effects on run-time balance out in the lower part of the
skewed U-shaped pattern, which appears in all the graphs to a greater or lesser extent.
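
The U-shape can be rationalised with a rough cost model. This is a heuristic sketch that was not part of the simulation study: the constants a and b are assumed, machine-dependent costs of one partitioning step and one insertion-sort shift. Stopping the recursion at blocks of size about k leaves roughly log2(n/k) levels of partitioning, each costing time proportional to n, while the final insertion sort pass costs on the order of k per element:

```latex
\[
T(n, k) \approx a \, n \log_2\!\left(\frac{n}{k}\right) + b \, n k,
\qquad
\frac{\partial T}{\partial k} = -\frac{a \, n}{k \ln 2} + b \, n = 0
\;\Longrightarrow\;
k^{*} = \frac{a}{b \ln 2}.
\]
```

Under this model the first term falls and the second rises as k grows, and the minimising cut-off depends on the ratio a/b but not on n, which is at least compatible with the minimum appearing in a similar range for all four array sizes.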

Therefore, based on the simulation study, we conclude that the optimum choice of cut-off lies in the range
from 100 to 200. Based on our reading of the graphs, we subjectively choose 140 as the cut-off in the
following sections, without any analytical justification.

Improvement over Quick Sort


Now, a perfectly reasonable question is how much we gain from this algorithm, or whether we gain at all. We
have already seen in the previous section that the run-time of the hybrid method is substantially lower than
that of quick sort. We now wish to see whether this improvement varies with the size of the input array. For
that purpose, we define a function to calculate the percentage improvement in run-time of hybrid sort over
quick sort, and plot the results.

# function to compute improvement in runtime for a single replication for a
# user defined input size
single_improvement <- function(array_size) {
    x <- rnorm(array_size)
    hybrid_time <- system.time(sorting_R(x, "h", 140))["user.self"]
    quick_time <- system.time(sorting_R(x, "q", 140))["user.self"]
    (quick_time - hybrid_time) * 100/quick_time
}

# function to compute average improvement over multiple replications
average_improvement <- function(length_of_array, replication = 50) {
    mean(replicate(replication, single_improvement(length_of_array)))
}

sizes <- seq(1e+05, 1e+07, 1e+05)  # simulated sizes used for improvement calculation
improvement <- sapply(sizes, average_improvement)

plot(sizes, improvement, type = "o", xlab = "Array Size", ylab = "Percentage Improvement",
    main = "Improvement in Hybrid algorithm over Quick")

Explanation of Improvement Pattern
From the graph, it is evident that hybrid sort comfortably outperforms quick sort for all the array sizes
considered. But the same graph also reveals that the improvement decreases as the array size increases. One
should note, however, that the percentage improvement is still around 40%, which is very significant for
practical purposes. The unexpected decreasing trend may be explained by the slow nature of the insertion sort
algorithm. In hybrid sort, we run insertion sort over the entire array in the last step. Although the array
is partially sorted at this step, it should be kept in mind that insertion sort is really effective only for
small arrays. We use insertion sort to avoid the large overhead of quick sort's recursive calls on small
arrays, but this remedy comes with its own cost: for large arrays it remains comparatively slow, however
partially sorted the array may be. Thus, as the array size increases, the run-time of this step also increases.
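
A rough cost model gives a complementary view of the decreasing trend. This is a heuristic sketch rather than a result of the simulation: a and b are assumed, machine-dependent constants for one partitioning step and one insertion-sort shift, and k is the cut-off. Quick sort performs about log2 n levels of partitioning, while hybrid sort performs about log2(n/k) levels followed by an insertion pass costing on the order of k per element:

```latex
\[
T_{\mathrm{quick}}(n) \approx a \, n \log_2 n,
\qquad
T_{\mathrm{hybrid}}(n, k) \approx a \, n \log_2\!\left(\frac{n}{k}\right) + b \, n k,
\]
\[
100 \times \frac{T_{\mathrm{quick}} - T_{\mathrm{hybrid}}}{T_{\mathrm{quick}}}
\approx 100 \times \frac{a \log_2 k - b \, k}{a \log_2 n}.
\]
```

Under these assumptions the time saved grows linearly in n while the total run-time grows like n log n, so the percentage improvement shrinks slowly, in proportion to 1/log2 n, as the array gets larger.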

Summary
At the end of the project, we see that we have successfully improved quick sort by combining insertion sort
with it. We have also provided an interval in which the optimum choice of cut-off size should lie, and we
have verified that hybrid sort consistently outperforms quick sort on the random inputs considered. Thus, we
can use this algorithm as an alternative to the quick sort algorithm.

References
1. Introduction to Algorithms - Third Edition (https://mitpress.mit.edu/books/introduction-algorithms)
2. Wikipedia - Quick Sort (https://en.wikipedia.org/wiki/Quicksort)
3. Wikipedia - Insertion Sort (https://en.wikipedia.org/wiki/Insertion_sort)
4. Techie Delight - Hybrid QuickSort Algorithm (www.techiedelight.com/hybrid-quicksort)