
COMP1927 15s2 Sort Detective Lab (Friday 10-11am Tuba)

5075018
Victor Tse
5006582
Justin Wang (Junren Wang)

Aim
To identify the algorithm of a sorting program based on black box testing
alone.

Theory
In preparation for being able to identify the algorithm used in a program,
we first had to research the properties for all the possible algorithms, as
illustrated below:

Algorithm | Sorted | Reverse | Random | Adaptive? | Stable?
Oblivious Bubble Sort | O(n^2) | O(n^2) | O(n^2) | No | Yes
Bubble Sort with Early Exit | O(n) | O(n^2) | O(n^2) | Yes | Yes
Vanilla Insertion Sort | O(n) | O(n^2) | O(n^2) | Yes | Yes
Insertion Sort with Binary Search | O(n) | O(n^2) | O(n^2), or O(n*log(n)) if more comparisons than swaps | Yes | Yes
Vanilla Selection Sort | O(n^2) | O(n^2) | O(n^2) | No | No (swapping action)
Quadratic Selection Sort | O(n*n^(1/2)) | O(n*n^(1/2)) | O(n*n^(1/2)) | No | No
Merge Sort | O(n*log(n)) | O(n*log(n)) | O(n*log(n)) | No | Yes
Vanilla Quick Sort | O(n^2) | O(n^2) | O(n*log(n)), or O(n^2) | Yes, in a bad way | No
Quick Sort Median of Three | O(n^2) | O(n^2) | O(n*log(n)), or O(n^2) occurs less often | Yes | No
Randomized Quick Sort | O(n^2) | O(n^2) | O(n*log(n)), or O(n^2) occurs randomly (highly unlikely) | Yes | No
Shell Sort Powers of Two | O(n^k), 1<k<2 | O(n^k), 1<k<2 | O(n^k), 1<k<2 | Yes | No
Shell Sort Sedgewick | O(n^k), 1<k<2 | O(n^k), 1<k<2 | O(n^k), 1<k<2 | Yes | No
Bogo Sort (random sort) | Unbounded, generally O((n+1)!) | Unbounded, generally O((n+1)!) | Unbounded, generally O((n+1)!) | No | No

Hypothesis
On a quick run of Sort A, the program took in excess of five minutes for a
million elements. We therefore expect this program to use one of the
slower algorithms, such as an O(n^2) algorithm, or bogo sort.
On a quick run of Sort B, the program took seconds rather than minutes to
complete. Since it was substantially faster, we expect that it is one of
the faster O(n*log(n)) algorithms.

Procedure
The entire procedure was done on one machine. This keeps any error from
machine performance (due to hardware) the same for all our
measurements.
Part 1a: Time Complexity: Sorted Order (test case)
1. The following command was typed into a Linux terminal (where
amount was replaced by an integer up to 1,000,000, e.g.
initially 100):
seq amount | /usr/bin/time --format="%U seconds" ./sortA > /dev/null
2. The time measured (for the Sort A program) was recorded in a
table.
3. Steps 1-2 were repeated 3 times (for the same value of amount),
and the average time taken was found.
a. This helps to reduce error from the fluctuations in the
measured results for time taken, which come about due to
factors like variations in machine performance at any one
point in time due to background operations.
4. Steps 1-3 were repeated for different values of amount, e.g. 100,
1,000, 10,000, 100,000, and 1,000,000.
a. This sequence was initially chosen as it gives an easy means of
identifying whether an algorithm is O(n^k). For instance, for
k = 2, a tenfold increase in the number of elements results in
a hundredfold increase in the time taken for the program to
complete.
b. However, for some of the sorting algorithms it was more
appropriate to omit the 1,000,000 test case and instead insert
additional cases of 250,000, 500,000, and 750,000 elements
within this range. This decision can only be made on a
case-by-case basis during actual experimentation.
5. The results were plotted on a line graph of time taken against number
of elements.
6. Steps 1-5 were repeated for Sort B.
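The scaling argument in step 4a can be sketched in C as follows. This is only an illustration under the assumption that the running time is proportional to n^k for a whole number k; the function name expected_scale is hypothetical and is not part of the lab material.

```c
#include <assert.h>

/* Factor by which the running time should grow when the input grows
 * from n1 to n2 elements, assuming time is proportional to n^k
 * for an integer k >= 0. */
double expected_scale(double n1, double n2, int k)
{
    assert(n1 > 0 && n2 > 0 && k >= 0);
    double ratio = n2 / n1;   /* e.g. 10 for 10,000 -> 100,000 */
    double scale = 1.0;
    for (int i = 0; i < k; i++)
        scale *= ratio;       /* ratio multiplied by itself k times */
    return scale;
}
```

Comparing expected_scale(10000, 100000, 2), which is 100, with the measured ratio of the two average times indicates whether k = 2 is a plausible fit.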
Part 1b: Time Complexity: Reverse Order (test case)
1. The following command was typed into a Linux terminal (the same
definition for amount applies as in Part 1a):
seq amount | sort -r -n | /usr/bin/time --format="%U seconds" ./sortA
> /dev/null
2. Again, the time measured was recorded in a table.
3. Again, Steps 1-2 were repeated 3 times to find the average time.
4. Again, Steps 1-3 were repeated for different values of amount
(taking the same intervals as outlined in Part 1a, but deviating if
necessary).
5. The results were plotted on a line graph of time taken against number
of elements.
6. Steps 1-5 were repeated for Sort B.
Part 1c: Time Complexity: Random Order (test case)
1. The following command was typed into a Linux terminal (the same
definition for amount applies as in Part 1a):
seq amount | sort -R | /usr/bin/time --format="%U seconds" ./sortA >
/dev/null
2. Again, the time measured was recorded in a table.
3. Again, Steps 1-2 were repeated 3 times to find the average time.
4. Again, Steps 1-3 were repeated for different values of amount
(taking the same intervals as outlined in Part 1a, but deviating if
necessary).
5. The results were plotted on a line graph of time taken against number
of elements.
6. Steps 1-5 were repeated for Sort B.
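The three input orders used in Parts 1a to 1c could also be produced inside a C program rather than with seq and sort. The sketch below is an assumption about how such a generator might look, not the method used in this lab; the function names are hypothetical, and the shuffle relies on rand(), so it is only roughly uniform (and limited by RAND_MAX for very large arrays).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill a[0..n-1] with 1..n, the same sequence that `seq n` prints. */
void fill_sorted(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i + 1);
}

/* Fill a[0..n-1] with n..1, i.e. the reverse-sorted test case. */
void fill_reverse(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(n - i);
}

/* Fisher-Yates shuffle in place, giving the random-order test case. */
void shuffle(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        int tmp = a[i - 1];
        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

Calling srand() with a fixed seed before shuffle() would make a random test case repeatable across the three timing runs.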
Part 2: Stability
1. Since the gen program does not generate duplicate elements, a
short program for testing stability was created instead. The
numbers are out of order and the key 1 appears twice, while the
initial positions are denoted by characters a-e (which are in order).
stableTest.c

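#include <stdio.h>

/* Print five "key tag" lines. The key 1 appears twice (tags a and e),
   so a stable sort must output "1 a" before "1 e". */
int main(void) {
    printf("1 a\n");
    printf("5 b\n");
    printf("3 c\n");
    printf("4 d\n");
    printf("1 e\n");
    return 0;
}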
2. After compiling stableTest.c, the following command was run:
./stableTest | ./sortA
3. The order of the elements after sorting was noted qualitatively.
4. Steps 2-3 were repeated up to 20 times (or more if required) to check
for stability in the sorting program, and a conclusion was made after
20 tests.
5. Steps 2-4 were repeated for Sort B.
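The check made by eye in step 3 can be sketched in C. This is a minimal illustration, assuming the sorted output has already been read into an array of key/tag pairs; the struct record type and the function is_stable_output are hypothetical names, not part of the lab programs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct record {
    int key;    /* the number that was sorted on */
    char tag;   /* letter marking the original input position (a < b < ...) */
};

/* Return true if records with equal keys still appear in their original
 * tag order. The array is assumed to be sorted by key already. */
bool is_stable_output(const struct record *out, size_t n)
{
    assert(out != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        assert(out[i - 1].key <= out[i].key);   /* must be sorted by key */
        if (out[i - 1].key == out[i].key && out[i - 1].tag > out[i].tag)
            return false;   /* equal keys came out in swapped order */
    }
    return true;
}
```

A single passing run does not prove stability, since an unstable algorithm can preserve the order by chance, which is why the test is repeated in step 4.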

Results
Sort A: Time Complexity
[Line graph: Time (s) against Number of elements, with one series each
for Sort A: Sorted List, Sort A: Reverse List and Sort A: Random List.]

Sort B: Time Complexity
[Line graph: time taken against Number of elements, with one series each
for Sort B: Sorted List, Sort B: Reverse List and Sort B: Random List.]

Adaptability & Stability

Sort | Adaptive? | Stable?
A | No | Yes
B | No | No

Discussion
Sort A
The collected data (see Appendix), together with the shape of the graph,
show that a tenfold increase from 10,000 to 100,000 elements produces
roughly a hundredfold increase in the time taken for the program to run
(0.12 s to 11.76 s for the sorted list). This indicates that Sort A is an
O(n^2) algorithm, agreeing with our hypothesis.
Sort A is also non-adaptive, since the sorted, reverse and random lists
take nearly the same time at each size, and it is stable. Comparing this
combination of results with the theoretical expectations, we conclude
that Sort A is an Oblivious Bubble Sort.
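The non-adaptive reading can be put into numbers by comparing the three average times at one input size. The C sketch below is an illustration only; the function name relative_spread is hypothetical, and what counts as a small spread is a judgment call rather than a fixed rule.

```c
#include <assert.h>

/* Relative spread (max - min) / max of the three average timings taken
 * at the same input size. A value near 0 means the input order made
 * little difference to the running time. */
double relative_spread(double sorted, double reverse, double random_order)
{
    assert(sorted > 0 && reverse > 0 && random_order > 0);
    double lo = sorted, hi = sorted;
    if (reverse < lo) lo = reverse;
    if (reverse > hi) hi = reverse;
    if (random_order < lo) lo = random_order;
    if (random_order > hi) hi = random_order;
    return (hi - lo) / hi;
}
```

For the Sort A averages at 100,000 elements (11.76, 11.01 and 11.78 seconds), the spread is about 0.065, so the three orders differ by under 7%.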
Sort B
The graph has the shape of a function of the form y = x*log(x),
indicating that Sort B, the faster sorting program, uses an
O(n*log(n)) algorithm, which agrees with our hypothesis.
As Sort B is not stable, this rules out merge sort as a viable candidate
for consideration.
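As a rough check on this reading, the growth predicted between 100,000 and 1,000,000 elements can be compared with the Sort B averages in the Appendix. This is a sketch that assumes the running time T(n) is proportional to n*log(n); the timings involved are only a few hundredths of a second, so the measured ratios are coarse.

```latex
\[
\frac{T(10^{6})}{T(10^{5})}
  = \frac{10^{6}\,\log 10^{6}}{10^{5}\,\log 10^{5}}
  = 10 \cdot \frac{6}{5} = 12,
\qquad
\text{whereas } T(n) \propto n^{2} \text{ would give } 100.
\]
\[
\text{Measured: } \frac{0.30}{0.04} = 7.5 \ (\text{sorted}), \quad
\frac{0.31}{0.03} \approx 10.3 \ (\text{reverse}), \quad
\frac{0.37}{0.03} \approx 12.3 \ (\text{random}).
\]
```

All three measured ratios are of the order of 12 and far from 100.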
Although a vanilla quick sort is adaptive, we note from additional theory
that it is adaptive only in a bad way: when the array is sorted or
reverse sorted, every partition step creates one partition rather than
two, so the running time degrades from O(n*log(n)) to O(n^2). This
always occurs for sorted and reverse sorted arrays.
This O(n^2) behaviour occurs less often in a quick sort median of three,
and will almost never occur in a randomized quick sort (since randomizing
the array before sorting it makes it highly unlikely that the array
seen by the algorithm is completely, or nearly completely, sorted).
Since our results show no such slowdown on the sorted and reverse sorted
lists, Sort B is not a vanilla quick sort.
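The effect of the pivot choice on sorted input can be illustrated with a small C sketch that counts the comparisons made while partitioning. It assumes a Lomuto-style partition with either the first element or the median of three as pivot; this is one common textbook form, not necessarily how Sort B is implemented, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum pivot_rule { PIVOT_FIRST, PIVOT_MEDIAN_OF_THREE };

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Sort a[lo..hi] (inclusive) and return the number of element-to-pivot
 * comparisons made in the partition loops. The three comparisons used
 * to pick a median are not counted. */
static unsigned long quick(int *a, long lo, long hi, enum pivot_rule rule)
{
    if (lo >= hi)
        return 0;
    if (rule == PIVOT_MEDIAN_OF_THREE) {
        long mid = lo + (hi - lo) / 2;
        /* Order a[lo] <= a[mid] <= a[hi], then move the median to a[lo]. */
        if (a[mid] < a[lo]) swap_int(&a[mid], &a[lo]);
        if (a[hi] < a[lo]) swap_int(&a[hi], &a[lo]);
        if (a[hi] < a[mid]) swap_int(&a[hi], &a[mid]);
        swap_int(&a[lo], &a[mid]);
    }
    int pivot = a[lo];
    long last = lo;              /* a[lo+1..last] holds elements < pivot */
    unsigned long count = 0;
    for (long i = lo + 1; i <= hi; i++) {
        count++;
        if (a[i] < pivot)
            swap_int(&a[++last], &a[i]);
    }
    swap_int(&a[lo], &a[last]);  /* put the pivot in its final position */
    count += quick(a, lo, last - 1, rule);
    count += quick(a, last + 1, hi, rule);
    return count;
}

/* Sort a[0..n-1] in place; recursion depth is O(n) in the worst case,
 * so this is meant for small demonstrations only. */
unsigned long quick_sort_comparisons(int *a, size_t n, enum pivot_rule rule)
{
    assert(a != NULL || n == 0);
    return quick(a, 0, (long)n - 1, rule);
}
```

On an already sorted array of n elements the first-element rule makes n(n-1)/2 comparisons, because each partition step leaves one side empty, while the median-of-three rule splits the array near the middle and makes far fewer.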
For a randomized quick sort, we expect sorted, reverse sorted, and
randomly ordered arrays containing the same number of elements to take
the same time, since after the initial randomization all three test
cases are just randomly ordered arrays. However, the tabulated data do
not show this, particularly at 1,000,000 elements, where the random test
case averages 0.37 s against 0.30 s and 0.31 s for the sorted and
reverse sorted test cases (see Appendix for tabulated data). As a
result, we conclude that Sort B is not a randomized quick sort, and
therefore it is a Quick Sort Median of Three.

Conclusion
There is sufficient information from black box analysis alone to determine
what algorithm each sorting function is, through looking empirically at its
time complexity, and its stability.
Since Sort A is O(n^2), non-adaptive and stable, it is an Oblivious Bubble
Sort.
Since Sort B is O(n*log(n)), non-adaptive and unstable, and there is a
time discrepancy between the random and sorted array test cases for
arrays of the same size, it is a Quick Sort Median of Three.

Appendix
Data for the above line graphs supplied below:
Sort A

Sorted Order (Sort A)
Number of elements | Test 1 (s) | Test 2 (s) | Test 3 (s) | Average Time (s)
100 | 0 | 0 | 0 | 0.00
1,000 | 0 | 0 | 0 | 0.00
10,000 | 0.12 | 0.12 | 0.12 | 0.12
25,000 | 0.74 | 0.75 | 0.74 | 0.74
50,000 | 2.94 | 2.95 | 2.95 | 2.95
75,000 | 6.62 | 6.63 | 6.61 | 6.62
100,000 | 11.76 | 11.75 | 11.76 | 11.76

Reverse Order (Sort A)
Number of elements | Test 1 (s) | Test 2 (s) | Test 3 (s) | Average Time (s)
100 | 0 | 0 | 0 | 0.00
1,000 | 0 | 0 | 0 | 0.00
10,000 | 0.12 | 0.12 | 0.12 | 0.12
25,000 | 0.7 | 0.7 | 0.7 | 0.70
50,000 | 2.76 | 2.76 | 2.76 | 2.76
75,000 | 6.19 | 6.2 | 6.2 | 6.20
100,000 | 11.01 | 11.02 | 11.01 | 11.01

Random Order (Sort A)
Number of elements | Test 1 (s) | Test 2 (s) | Test 3 (s) | Average Time (s)
100 | 0 | 0 | 0 | 0.00
1,000 | 0 | 0 | 0 | 0.00
10,000 | 0.12 | 0.13 | 0.12 | 0.12
25,000 | 0.74 | 0.74 | 0.74 | 0.74
50,000 | 2.95 | 2.95 | 2.95 | 2.95
75,000 | 6.62 | 6.63 | 6.62 | 6.62
100,000 | 11.81 | 11.76 | 11.77 | 11.78
250,000 | 73.46 | 73.47 | 73.4 | 73.44

Sort B

Sorted Order (Sort B)
Number of elements | Test 1 (s) | Test 2 (s) | Test 3 (s) | Average Time (s)
100 | 0 | 0 | 0 | 0.00
1,000 | 0 | 0 | 0 | 0.00
10,000 | 0 | 0 | 0 | 0.00
100,000 | 0.04 | 0.04 | 0.04 | 0.04
250,000 | 0.08 | 0.08 | 0.07 | 0.08
500,000 | 0.15 | 0.15 | 0.15 | 0.15
750,000 | 0.24 | 0.23 | 0.23 | 0.23
1,000,000 | 0.32 | 0.29 | 0.3 | 0.30

Reverse Order (Sort B)
Number of elements | Test 1 (s) | Test 2 (s) | Test 3 (s) | Average Time (s)
100 | 0 | 0 | 0 | 0.00
1,000 | 0 | 0 | 0 | 0.00
10,000 | 0 | 0 | 0 | 0.00
100,000 | 0.04 | 0.03 | 0.03 | 0.03
250,000 | 0.07 | 0.07 | 0.08 | 0.07
500,000 | 0.16 | 0.14 | 0.15 | 0.15
750,000 | 0.23 | 0.22 | 0.23 | 0.23
1,000,000 | 0.31 | 0.32 | 0.31 | 0.31

Random Order (Sort B)
Number of elements | Test 1 (s) | Test 2 (s) | Test 3 (s) | Average Time (s)
100 | 0 | 0 | 0 | 0.00
1,000 | 0 | 0 | 0 | 0.00
10,000 | 0 | 0 | 0 | 0.00
100,000 | 0.03 | 0.02 | 0.03 | 0.03
250,000 | 0.08 | 0.08 | 0.08 | 0.08
500,000 | 0.15 | 0.18 | 0.16 | 0.16
750,000 | 0.27 | 0.27 | 0.26 | 0.27
1,000,000 | 0.37 | 0.36 | 0.37 | 0.37
