Matching Nuts and Bolts in O(n log n) Time (Extended Abstract)

János Komlós 1,4   Yuan Ma 2   Endre Szemerédi 3,4

Abstract

Given a set of n nuts of distinct widths and a set of n bolts such that each nut corresponds to a unique bolt of the same width, how should we match every nut with its corresponding bolt by comparing nuts with bolts (no comparison is allowed between two nuts or between two bolts)? The problem can be naturally viewed as a variant of the classic sorting problem as follows. Given two lists of n numbers each such that one list is a permutation of the other, how should we sort the lists by comparisons only between numbers in different lists? We give an O(n log n)-time deterministic algorithm for the problem. This is optimal up to a constant factor and answers an open question posed by Alon, Blum, Fiat, Kannan, Naor, and Ostrovsky [3]. Moreover, when copies of nuts and bolts are allowed, our algorithm runs in optimal O(log n) time on n processors in Valiant's parallel comparison tree model. Our algorithm is based on the AKS sorting algorithm with substantial modifications.

1 Introduction

Given a set of n nuts of distinct widths and a set of n bolts such that each nut corresponds to a unique bolt of the same width, how should we match every nut with

1 Department of Mathematics, Rutgers University, Piscataway, NJ 08855. Email: komlos@math.rutgers.edu.
2 Department of Computer Science, Stanford University, CA 94305. Supported by an NSF Mathematical Sciences Postdoctoral Research Fellowship. Part of the work was done while the author was visiting DIMACS, and part of the work was done while he was at MIT and supported by DARPA Contracts N00014-91-J-1698 and N00014-92-J-1799. Email: yuan@cs.stanford.edu.
3 Department of Computer Science, Rutgers University, Piscataway, NJ 08855. Part of the work was done while the author was at the University of Paderborn, Germany. Email: szemered@cs.rutgers.edu.
4 The work presented here is part of the Hypercomputing & Design (HPCD) project, and it is supported in part by ARPA under contract DABT-63-93-C-0064. The content of the information herein does not necessarily reflect the position of the Government, and official endorsement should not be inferred.

its corresponding bolt by comparing nuts with bolts (no comparison is allowed between two nuts or between two bolts)? This problem can be naturally viewed as a variant of the classic sorting problem as follows. Given two lists of n numbers each such that one list is a permutation of the other, how should we sort the lists by comparisons only between numbers in different lists? In fact, the following simple reasoning shows that the problem of matching nuts and bolts and the problem of sorting them have the same complexity, up to a constant factor. On one hand, if the nuts and bolts are sorted, then a nut and a bolt at the same position in the sorted order certainly match each other. On the other hand, if the nuts and bolts are matched, we can sort them by any optimal sorting algorithm in O(n log n) time. Hence, the complexity equivalence of sorting and matching follows from the simple information-theoretic lower bound of Ω(n log n) on the matching problem, which can be easily derived from the fact that there are n! possible ways to match the nuts and bolts. So in this paper, we will consider the problem of how to sort the nuts and bolts, instead of matching them.

The problem of sorting nuts and bolts has a simple randomized algorithm (e.g., a simple variant of the QUICKSORT algorithm) that runs in the optimal O(n log n) expected time [8]. However, finding a nontrivial (say, o(n²)-time) deterministic algorithm has turned out to be highly nontrivial.
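The randomized QUICKSORT variant alluded to above can be sketched as follows. This is our own illustration, not part of the paper's contribution; widths are compared directly to simulate the nut-versus-bolt comparison oracle, and the function name is ours:

```python
import random

def sort_nuts_and_bolts(nuts, bolts):
    """Return matched (nut, bolt) pairs in sorted order using only
    nut-vs-bolt comparisons: a random pivot nut partitions the bolts,
    and its matching bolt then partitions the remaining nuts."""
    if not nuts:
        return []
    pivot_nut = random.choice(nuts)
    # Find the bolt matching the pivot nut (the unique bolt of equal width).
    pivot_bolt = next(b for b in bolts if b == pivot_nut)
    small_bolts = [b for b in bolts if b < pivot_nut]
    large_bolts = [b for b in bolts if b > pivot_nut]
    small_nuts = [x for x in nuts if x < pivot_bolt]
    large_nuts = [x for x in nuts if x > pivot_bolt]
    return (sort_nuts_and_bolts(small_nuts, small_bolts)
            + [(pivot_nut, pivot_bolt)]
            + sort_nuts_and_bolts(large_nuts, large_bolts))
```

Since the pivot is random, the recursion balances in expectation, giving the O(n log n) expected running time mentioned above.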
Alon, Blum, Fiat, Kannan, Naor, and Ostrovsky [3] designed an O(n log⁴ n)-time deterministic algorithm based on expander graphs, and they posed the open question of designing an optimal deterministic algorithm for the problem. Recently, Bradford and Fleischer [6] improved the running time to O(n log² n), but the question remains open whether O(n log n) can be achieved.

Since the classic sorting problem has been intensively studied, it is natural to ask if any existing O(n log n)-time deterministic sorting algorithm can be easily adapted to sort nuts and bolts. In a certain sense, most of the existing O(n log n)-time sorting algorithms use a divide-and-conquer approach. In particular, they require recursive solutions to subproblems of smaller size. For the classic sorting problem, solving the subproblems is simple. However, in the context of sorting nuts and bolts, solving a subproblem can raise many problems. In particular, the fact that we can sort the nuts and bolts at all relies on the fact that there is a matching between them.¹ For example, if all of the nuts happen to be smaller than all of the bolts, then we will not be able to learn anything about the order of the nuts or the order of the bolts by comparing nuts against bolts only. As a consequence, if we want to make use of existing sorting algorithms, it is essential to make arrangements so that, when we work on a smaller set of nuts and a smaller set of bolts, we may obtain useful information in an efficient way. Unfortunately, no existing deterministic O(n log n)-time sorting algorithm seems readily adaptable to make such arrangements. Faced with this difficulty, the algorithm of Alon et al. [3] uses an O(n log³ n)-time algorithm for selecting a median nut and a median bolt, which in turn is based on expander graphs.
However, as pointed out by Alon et al. [3], that particular method cannot be adapted to select a median in O(n) time, and a possible O(n log n) algorithm would have to come by different means. Similarly, the O(n log² n)-time algorithm of Bradford and Fleischer [6] is based on an O(n log n)-time algorithm for selecting a median nut and a median bolt. In fact, we have discovered a (fairly) simple O(n (log log n)²)-time algorithm for selecting a median nut and a median bolt, thereby giving an O(n log n (log log n)²)-time algorithm for sorting nuts and bolts. We will not give any details of this algorithm, however, since it appears that we need to do something very different to achieve the optimal O(n log n) time.

The main contribution of this paper is an O(n log n)-time algorithm for sorting nuts and bolts, which is based on the AKS sorting algorithm [2] with substantial modifications.² As a by-product of our AKS-based approach, our algorithm can be executed in O(log n) time on n processors in Valiant's parallel comparison tree model [9], when copying of nuts and bolts is allowed. In Valiant's model, only comparisons are counted towards the running time, and bookkeeping is free. We remark that our algorithm is not fully constructive: some of its gadgets depend on certain random-graph properties. The existence of such graphs is easily proved by a random construction, but we do not know how to construct them explicitly. However, all other parts of our algorithm are constructive, and once explicit constructions of the desired graphs are discovered, our algorithm will be constructive as well.

The rationale for using an AKS-based approach for sorting nuts and bolts lies in some special properties of the AKS sorting algorithm. Roughly, as described by Paterson [7], the AKS sorting algorithm proceeds as follows: it arranges the numbers being sorted in a complete binary tree, which will be referred to as the AKS tree.
Each node of the AKS tree contains a set of numbers. Most of the numbers in the same node have ranks within a certain interval. At each stage of the algorithm, a certain sorting-related device (with O(1) parallel time) is used to approximately partition the numbers at each node of the AKS tree. In a way, the AKS sorting algorithm proceeds by partitioning in a weak sense: it approximately partitions numbers into almost correct halves and has an intricate error-correcting mechanism. In particular, unlike most other known O(n log n)-time deterministic sorting algorithms, the AKS sorting algorithm does not proceed in a rigorous divide-and-conquer fashion. These special properties will turn out to be advantageous in sorting nuts and bolts.

Although there are good reasons that the AKS sorting algorithm may be a good tool for sorting nuts and bolts, a direct modification of the AKS sorting algorithm does not solve our problem. For example, one naive approach is as follows: keep two AKS trees, one for the nuts and the other for the bolts; at each stage of the algorithm, compare nuts and bolts in corresponding AKS tree nodes according to an expander graph, and reallocate the nuts and bolts according to the results of the comparisons. Such an approach proceeds well at the initial few stages, but it has serious troubles at future stages. The problem arises since we cannot keep a matching between the nuts and bolts in corresponding AKS tree nodes. For example, when the roots contain only a constant number of nuts and bolts, it is possible that all of the nuts contained in the root of one AKS tree are smaller than all of the bolts contained in the root of the other AKS tree, in which case we cannot obtain any information, by comparisons between the nuts and bolts in the roots, about the order of the nuts or the order of the bolts that are located in the roots. In fact, such observations may even lead one to suspect whether the AKS-based approach is helpful at all in the context of sorting nuts and bolts. The novelty of our work in adapting the AKS sorting algorithm is to introduce certain mechanisms that allow efficient approximate partitioning at an AKS tree node even if the nuts and bolts in the corresponding AKS tree nodes do not form a matching.

1 Such a condition can be slightly relaxed, as will be discussed in §4.
2 The AKS sorting algorithm was designed to be implemented in an oblivious fashion on a comparator network, and it also has an optimal parallel running time of O(log n) on n processors. In this paper, our main focus is the sequential algorithm model, and we will refer to the work of [2] as the AKS sorting algorithm, as opposed to the AKS sorting network.

The remainder of the paper is organized into sections as follows. In §2, we present our algorithm for sorting nuts and bolts. In §3, we prove the correctness of the algorithm and analyze its running time. We conclude in §4 with discussions on some extensions and open problems. Due to space limitations, we omit most of the technical proofs in this extended abstract.

2 An O(n log n)-Time Algorithm for Sorting Nuts and Bolts

This section contains the description of our O(n log n)-time algorithm for sorting nuts and bolts. As pointed out in the introduction, our algorithm depends on some random graphs, which we do not know how to construct explicitly. Also, we will be content with an algorithm of O(n log n) running time. No attempt will be made to keep the involved constants small. In particular, a large constant (much larger than the previously best known constant for the AKS sorting algorithm) is hidden behind the O notation.

2.1 An Overview of the Algorithm

In this subsection, we give a high-level description of our AKS-based algorithm. The algorithm proceeds much like the AKS sorting algorithm, except that we use a completely different method to partition the nuts (bolts) in an AKS tree node. The partition method is fairly complicated and will be the subject of the next subsection. In this subsection, we will assume that such a partition can be done and focus on other, simpler issues.

We first need a complete understanding of the AKS sorting algorithm. However, the AKS sorting algorithm is fairly complicated, and we will not be able to include a complete description of the entire algorithm due to the limited space. In what follows, we only sketch the AKS sorting algorithm at a high level, and we refer the readers to [7] for a complete and rigorous description.

As described in [7], the behavior of the AKS sorting algorithm can be best understood by thinking of all the elements (which refer to nuts or bolts) being sorted as moving within a complete binary tree, with the root at the top. A rigorous treatment of such a tree structure can be found in [7]. We will refer to such a tree as an AKS tree and refer to a node in the tree as an AKS tree node. The elements being sorted are arranged within the nodes of the AKS tree. Each AKS tree node X has a capacity, denoted by cap(X), that specifies the maximum number of elements that can be contained in X. Let |X| denote the number of elements that are actually contained in X. X is called empty, full, or partially full if |X| = 0, |X| = cap(X), or 0 < |X| < cap(X), respectively. The AKS sorting algorithm works in stages, starting from stage one. Within each stage, there is a sorting-related device that partitions each AKS tree node X into four parts, FL, CL, CR, and FR, which stand for "far-left", "center-left", "center-right", and "far-right", respectively. (To be rigorous, we partition the list of elements in node X, as opposed to node X itself.
But we will not distinguish X from the list of elements contained in X when no confusion can arise.) By doing so, we hope to move most of the elements in X into the correct halves, FL ∪ CL and CR ∪ FR, and to move most of the extreme elements to the extreme positions FL and FR. At the end of a stage, the elements in FL and FR are sent to the parent of X, and CL and CR are sent to the left and right children of X, respectively. This has the effect of moving most of the correctly located elements downward in the AKS tree and moving most of the incorrectly located elements upward in the AKS tree. Overall, most elements in the lower part of the AKS tree are near their correct positions, and elements far away from their correct positions tend to move upwards in the AKS tree so that they will be processed further.

The AKS tree can be viewed as infinite, but we make the convention that a leaf of an AKS tree is a non-empty node in the lowest level of the AKS tree. At odd stages, all nodes at odd levels and all nodes below the leaf level are empty, and all nodes at even levels above the leaf level are full, except that nodes at the leaf level can be full or partially full.³ (The root is assumed to be at level 0.) The opposite holds at even stages. This completes our brief description of the AKS sorting algorithm.

3 To be more rigorous, when we say that a node is full or empty during a stage, we mean it is full or empty at the beginning of the stage and stays so during most of the stage. Note that the elements in a full or partially full node X will be moved to the parent or children of X at the end of the stage.

At a high level, our algorithm differs from the original AKS sorting algorithm in two ways: (1) we need to keep two separate AKS trees: TN for the set of nuts N, and TB for the set of bolts B; (2) we need a completely different method to partition elements in an AKS tree node.
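The end-of-stage routing just described (FL and FR up to the parent, CL and CR down to the children) can be sketched as follows. This is our own simplification: `Node` and `route_one_stage` are hypothetical names, the four-way partition is supplied by the caller rather than computed by a separator, and the root simply keeps its FL and FR in place:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One AKS tree node holding a list of elements."""
    elements: list = field(default_factory=list)
    parent: "Node" = None
    left: "Node" = None
    right: "Node" = None

def route_one_stage(node, fl, cl, cr, fr):
    """Route one node's four-way partition at the end of a stage:
    FL and FR move up to the parent; CL and CR move down to the
    left and right children, respectively."""
    if node.parent is not None:
        node.parent.elements.extend(fl + fr)
        node.elements = []
    else:
        # Simplification: the root has no parent, so FL and FR stay put.
        node.elements = fl + fr
    if node.left is not None:
        node.left.elements.extend(cl)
    if node.right is not None:
        node.right.elements.extend(cr)
```

In the full algorithm this routing is applied to every non-empty node in each stage, which is what alternates the full and empty levels described above.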
Other than these two differences, our algorithm for sorting nuts and bolts works exactly as the AKS sorting algorithm. In particular, the structures of the two AKS trees are identical (except that one contains nuts and the other contains bolts): they are specified by the same set of parameters. To describe explicitly how our algorithm works, we need to specify some parameters associated with the AKS sorting algorithm. For simplicity, we will follow the parameter choices of [7] whenever possible. In particular, we will use the same letters to denote the same quantities as in [7] unless specified otherwise. We choose the same parameters associated with our AKS trees as in [7]:

A = 3, ν = 43/48, and λ = 1/8.

As in [7], the choices of these parameters completely determine how the nuts and bolts move within TN and TB. In particular,

• the capacity of an AKS tree node X at level d immediately after stage t is determined by cap(X) = ν^t A^d n(1 - λ);

• at each stage, the elements at X are partitioned into four parts FL, CL, CR, and FR such that (1) |FL| = |FR| = min{λ·cap(X), |X|/2} and |CL| = |CR| = |X|/2 - |FL|, and (2) at the end of the stage, FL and FR are moved to the parent of X, and CL and CR are moved to the left and right children of X, respectively.

Also, we choose the same μ and δ as in [7]. Note that μ and δ have nothing to do with the description of the algorithm and will be used only in its analysis. Another parameter, ε, was used in [7] to specify the functionality of the so-called separator, which corresponds to the so-called near-sorting network of [2]. In [7], a separator is used to partition an AKS tree node X into four parts FL, CL, CR, and FR. In our algorithm, however, we cannot use a separator or near-sorting network, since, as we have explained in the introduction, we cannot enforce a matching between the nuts and bolts in corresponding AKS tree nodes. Nevertheless, we need a sorting-related device for such a partition.
The partition scheme is fairly intricate and will be the subject of the next subsection. In any event, following the notation of [7], we will use the parameter ε to measure the accuracy of our partition method. We do not specify how to choose ε explicitly. Instead, we will be content with proving that a sufficiently small ε suffices for our purposes. Finally, as in [7], we also need to deal with the so-called boundary conditions and integer rounding. These can be easily handled in the same way as in [7], and we will not address these particular technical problems hereafter.

2.2 Partitioning Nuts or Bolts at an AKS Tree Node

In this subsection, we describe an algorithm to partition the elements in an AKS tree node X into four parts FL, CL, CR, and FR. We will accomplish the partition of X by comparing the nuts (or bolts) in X with the bolts (or nuts) in a set S(X), which is to be defined in §2.2.2. On one hand, S(X) should be large enough so that a proper partition of X is possible, i.e., S(X) should contain enough bolts (or nuts) to separate some of the nuts (or bolts) in X from the others. On the other hand, S(X) should be small enough so that the number of comparisons between X and S(X) needed to partition X is not prohibitively large.

The remainder of this subsection is organized as follows. In §2.2.1, we prove a lemma on random graphs and describe how to use such graphs to construct a comparison algorithm. In §2.2.2, we construct S(X). In §2.2.3, we describe how to partition X by applying the comparison algorithm of §2.2.1 to X and S(X).

2.2.1 Random Graphs and a Comparison Algorithm

In this sub-subsection, we first prove a useful lemma on random bipartite graphs. Then, we describe how to use such graphs in a comparison algorithm, which is an important building block in our O(n log n)-time algorithm for sorting nuts and bolts. Although a random graph will yield a desired graph with high probability, we do not know how to construct such graphs explicitly.
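The probabilistic construction alluded to here can be sketched as follows. This is our own illustration, not the paper's proof: each right-hand vertex independently draws d neighbors in U uniformly with replacement, which yields a bipartite multigraph in which every v ∈ V has degree exactly d by construction; verifying the expansion-style properties with high probability is the substance of the omitted argument:

```python
import random

def random_bipartite_multigraph(U, V, d, rng=random):
    """Return the edge multiset of a random bipartite multigraph on
    (U, V): each v in V gets exactly d incident edges, with each
    endpoint in U chosen uniformly and independently (repeats allowed,
    so multi-edges can occur)."""
    return [(rng.choice(U), v) for v in V for _ in range(d)]
```

A usage example: `random_bipartite_multigraph(list(range(100)), list(range(10)), 5)` returns 50 edges, five per right-hand vertex.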
The graphs considered in this paper are allowed to be multigraphs, and we use e(X, Y) to denote the number of edges between X and Y for arbitrary vertex subsets X and Y. In particular, if there are m edges between a vertex u ∈ X and a vertex v ∈ Y, then each of the m multiple edges between u and v is counted in e(X, Y). Also, we use e to denote the base of the natural logarithm and use ln to denote the logarithm with base e.

Lemma 2.1 Let ε and θ be two arbitrary constants in (0, 1), and let U and V be two sets such that |U| ≤ |V|. If d ≥ 2eε^{-3} ln((e²/ε²)(|V|/|U|)), then there exists a bipartite graph G = (U, V, E), E ⊆ U × V, with the following properties: (1) deg(v) = d for all v ∈ V; (2) e(X, Y) ≥ (1 - ε)d|X||Y|/|U| for any sets X ⊆ U, Y ⊆ V such that |X| ≥ ε|U| and |Y| ≥ ε|U|; and (3) if |U| = |V| and d ≥ (8/θ) ln(16/θ), then any Y ⊆ V of size |Y| ≤ 2e^{-4}θ|U|/d is directly connected (i.e., connected by an edge, as opposed to by a path) to at least θd|Y|/2 vertices in U, even if an arbitrary set of up to (1 - θ)d edges is removed from each vertex in Y.

Proof We can prove that a random graph has the desired properties. Details are omitted. ∎

Roughly, Lemma 2.1 says that the number of edges between two sets of vertices cannot be much smaller than the average number of edges between two sets of their sizes. In a certain sense, this also means that the edges between U and V are evenly distributed, and so the number of edges between two sets of vertices cannot be much larger than the average either. Formally, we have the following corollary, whose proof is straightforward and omitted.

Corollary 2.1 In the graph of Lemma 2.1, for any sets X ⊆ U and Y ⊆ V such that |Y| ≥ ε|U|, e(X, Y) ≤ d|X||Y|/|U| + εd|Y|.

We now describe how to apply the graph of Lemma 2.1 to construct a comparison algorithm, in a way similar to that of [2] and [7].
We will use some adaptive methods, such as counting, in some future applications of the algorithm, whereas [2] and [7] deal with comparator networks and can only use oblivious methods. Given an arbitrary set of nuts (bolts) U, an arbitrary set of bolts (nuts) V with |U| ≤ |V|, and a bipartite graph G ⊆ U × V, algorithm COMPARE(U, V, G) works as follows.

Algorithm COMPARE(U, V, G)

Step 1. Set SMALL(v) = LARGE(v) = 0 for each v ∈ V.

Step 2. For each edge (u, v) in graph G, compare u and v. Then, increment SMALL(v) by 1 if v < u; increment LARGE(v) by 1 if v > u; increment SMALL(v) and LARGE(v) each by 1/2 if u = v.

In the above algorithm, SMALL(v) (resp., LARGE(v)) denotes the number of comparisons in algorithm COMPARE in which v is strictly smaller (resp., strictly larger) than its opponent, plus half of the number of comparisons in which v is equal to its opponent. In particular, we increment both LARGE(v) and SMALL(v) by 1/2 if v is equal to its opponent. Such an arrangement will simplify some of our later arguments by ensuring that the values of SMALL and LARGE are symmetric. We remark that there may be multiple edges between u and v, in which case u and v are compared more than once, and SMALL(v) or LARGE(v) is updated every time a comparison between u and v occurs.

It would be nice if algorithm COMPARE(U, V, G) always provided an approximate partition of V. However, such a partition is not always possible. For example, if every nut in V is smaller than every bolt in U, then no matter how we conduct our comparisons, the outcome will not provide any useful information for partitioning V. Nevertheless, we next show that the algorithm has a certain ranking property in a certain case. Such a ranking property will then be further exploited to provide a more sophisticated algorithm for partition.
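Algorithm COMPARE translates directly into code. The sketch below is our own: the graph G is given as a multiset of (u, v) edges, widths stand in for the nut/bolt comparison oracle, and the half-credit tie rule and repeated comparisons along multi-edges are implemented exactly as described above:

```python
def compare(V, edges):
    """COMPARE(U, V, G), with G given as a multiset of (u, v) pairs.
    Step 1 zeroes SMALL and LARGE for every v in V; Step 2 performs one
    comparison per edge (multi-edges are compared repeatedly), crediting
    1/2 to each counter on a tie so the counters stay symmetric."""
    small = {v: 0.0 for v in V}   # Step 1
    large = {v: 0.0 for v in V}
    for u, v in edges:            # Step 2
        if v < u:
            small[v] += 1
        elif v > u:
            large[v] += 1
        else:
            small[v] += 0.5
            large[v] += 0.5
    return small, large
```

Note that SMALL(v) + LARGE(v) always equals deg(v), which is the symmetry the tie rule is designed to preserve.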
In what follows, we define the rank of an element z with respect to a set Y, denoted by rank(z, Y), as the number of elements in Y that are smaller than or equal to z. Note that rank(z, Y) is well defined even if z and the elements of Y cannot be compared by a direct comparison, e.g., z is a bolt and Y is a set of nuts. When we say the rank of an element z, denoted by rank(z), without specifying a corresponding Y, we mean the rank of z with respect to B (or, equivalently, with respect to N). For any ζ, ξ ∈ [0, 1] and for any sets of elements U and V, let

V(ζ, ξ, U) = {v ∈ V | ζ|U| ≤ rank(v, U) ≤ ξ|U|}.

In the next lemma, U and V are a set of nuts and a set of bolts (or a set of bolts and a set of nuts), respectively, and G ⊆ U × V is a bipartite graph with parameters d and ε as described in Lemma 2.1. (Here, we do not need the third property of Lemma 2.1, and so we do not need the parameter θ.)

Lemma 2.2 Assume ε|U| ≥ 2 and ζ, ξ ∈ [0, 1]. If algorithm COMPARE(U, V, G) is executed, then (1) at most ε|U| elements in V(0, ξ, U) have their SMALL values less than or equal to (1 - ξ - 2ε)d; (2) at most ε|U| elements in V(ζ, 1, U) have their LARGE values less than or equal to (ζ - 2ε)d; and (3) for any X ⊆ V(0, ζ, U) and any Y ⊆ V(ξ, 1, U) where ξ - ζ ≥ 6ε, if SMALL(x) ≤ SMALL(y) for all x ∈ X and all y ∈ Y, then either |X| < ε|U| or |Y| < ε|U|.

Proof Use Lemma 2.1. Details are omitted. ∎

2.2.2 Construction of S(X)

S(X) consists of three subsets SL(X), SR(X), and SC(X). In order to partition X properly, we need to know not only S(X) but also SL(X), SR(X), and SC(X) individually. This sub-subsection is devoted to constructing these sets.

We first introduce some concepts. Some of these concepts are not directly used in the construction of S(X), but they are useful for understanding the relevant terminology and for analyzing our final algorithm, so we define these concepts here for ease of reference.
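For the same ease of reference, the rank notion and the sets V(ζ, ξ, U) used in Lemma 2.2 can be stated as code. These are our own illustrative helpers, with widths again standing in for elements that are only indirectly comparable:

```python
def rank(z, Y):
    """rank(z, Y): the number of elements of Y smaller than or equal
    to z.  Well defined even when z and Y live on different sides
    (a bolt against a set of nuts), as long as widths compare."""
    return sum(1 for y in Y if y <= z)

def V_between(V, zeta, xi, U):
    """V(zeta, xi, U): the elements of V whose rank with respect to U
    lies in the window [zeta*|U|, xi*|U|]."""
    return [v for v in V if zeta * len(U) <= rank(v, U) <= xi * len(U)]
```

For example, `V_between([1, 2, 3, 4], 0, 0.5, [2, 4])` keeps exactly the elements whose rank against {2, 4} is at most 1.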
The concepts of a natural interval and strangeness were used in [7]. The natural interval of an AKS tree node is inductively defined as follows: the natural interval of the root of an AKS tree is [1, n]; if the natural interval of an AKS tree node X is [α, β], then the natural intervals of the left and right children of X are [α, (α+β-1)/2] and [(α+β+1)/2, β], respectively. Let [α(X), β(X)] denote the natural interval of an AKS tree node X, and let m(X) = (α(X)+β(X))/2. The strangeness of an element z w.r.t. (with respect to) an AKS tree node X is defined to be the number of levels that z needs to move from X upward in X's AKS tree in order to reach the first AKS tree node whose natural interval contains rank(z). (Note that the strangeness of z w.r.t. X is well defined even if z is not located in X.) For each AKS tree node X, let h(X) denote the height of X in its AKS tree. (The height of a leaf is assumed to be 0.)

Claim 2.1 If X is an AKS tree node such that h(X) ≥ 0 (i.e., X is at or above the leaf level), then cap(X) ≤ 6·2^{-h(X)} (β(X) - α(X) + 1).

Proof Assume that X is i levels below the root, where i ≥ 0. Consider the lowest level where each AKS tree node is full. This level is at most h(X) - 2 levels below X's level, since either a leaf is full or its grandparent is full. The sum of the capacities of all the nodes at this level is at most n, since there are at most n elements in an AKS tree. Hence,

2^{i+h(X)-2} cap(X) A^{h(X)-2} ≤ n = 2^i (β(X) - α(X) + 1),

where the last equality holds since the sum of the natural-interval sizes at any level of an AKS tree is equal to n. The correctness of the claim follows immediately from the above inequality. ∎

In the next claim and the rest of the paper, we will use the parameter c to denote a certain large constant. We will not specify the explicit value of c, but we will see that a sufficiently large value of c will be good for our algorithm.

Claim 2.2 If h(Y) ≥ 0.5 h(X) + c and h(X) ≥ 0, then cap(X) ≤ β(Y) - α(Y) + 1.
Proof Note that h(Y) ≥ 0.5 h(X) + c implies h(X) - h(Y) ≤ 0.5 h(X) - c. Hence, by Claim 2.1,

cap(X) ≤ 6·2^{-h(X)} (β(X) - α(X) + 1)
       = 6·2^{-h(X)} (β(Y) - α(Y) + 1) 2^{h(X)-h(Y)}
       ≤ 6·2^{-h(X)} (β(Y) - α(Y) + 1) 2^{0.5 h(X)-c}
       < β(Y) - α(Y) + 1,

where the last inequality holds since c is sufficiently large and h(X) ≥ 0. ∎

Claim 2.3 At each level with height at least 0.5 h(X) + c in either TN or TB, there exists a unique AKS tree node whose natural interval contains [α(X), α(X) + cap(X)/36 - 1].

Proof Since natural intervals at the same level of an AKS tree cannot overlap with each other, we only need to show the existence of a desired node at each level. Moreover, since the natural interval of a node is contained in the natural interval of its parent, we only need to consider the level with height 0.5 h(X) + c. If h(X) ≤ 0.5 h(X) + c, then the ancestor of X with height 0.5 h(X) + c has the desired property, since Claim 2.1 implies that X's natural interval contains [α(X), α(X) + cap(X)/36 - 1]. If h(X) > 0.5 h(X) + c, then let Y be the unique descendant of X at the level with height 0.5 h(X) + c such that α(Y) = α(X). By Claim 2.2, Y has the desired property. ∎

By Claim 2.3, the notation X'_{L,i} (i = 0, 1) below is well-defined; similarly, we can verify that X'_{R,i} (i = 0, 1), X̄, X_{c,0}, X_{cL,1}, and X_{cR,1} are all well-defined. For an AKS tree node X in TN (resp., TB), let

• X̄ be the unique AKS tree node in TB (resp., TN) such that [α(X̄), β(X̄)] = [α(X), β(X)],

• X'_{L,i} (i = 0, 1) be the unique AKS tree node in TB (resp., TN) such that h(X'_{L,i}) = 2^{-i} h(X) + c and [α(X'_{L,i}), β(X'_{L,i})] ⊇ [α(X), α(X) + cap(X)/36 - 1],

• X'_{R,i} (i = 0, 1) be the unique AKS tree node in TB (resp., TN) such that h(X'_{R,i}) = 2^{-i} h(X) + c and [α(X'_{R,i}), β(X'_{R,i})] ⊇ [β(X) - cap(X)/36 + 1, β(X)],

• X_{c,0} be the unique AKS tree node in TB (resp., TN) such that h(X_{c,0}) = h(X) + c and [α(X_{c,0}), β(X_{c,0})] ⊇ [m(X) - cap(X)/72 + 1/2, m(X) + cap(X)/72 - 1/2],

• X_{cL,1} be the unique AKS tree node in TB (resp., TN) such that h(X_{cL,1}) = 2^{-1} h(X) + c and [α(X_{cL,1}), β(X_{cL,1})] ⊇ [m(X) - cap(X)/72 + 1/2, m(X)],

• X_{cR,1} be the unique AKS tree node in TB (resp., TN) such that h(X_{cR,1}) = 2^{-1} h(X) + c and [α(X_{cR,1}), β(X_{cR,1})] ⊇ [m(X), m(X) + cap(X)/72 - 1/2],

• PL(X) be the unique path from X'_{L,0} to X'_{L,1},

• PR(X) be the unique path from X'_{R,0} to X'_{R,1},

• PcL(X) be the unique path from X_{c,0} to X_{cL,1},

• PcR(X) be the unique path from X_{c,0} to X_{cR,1}.

In the above definition, PL(X) is assumed to contain the nodes X'_{L,0} and X'_{L,1}; similarly, each of the other three paths (PR(X), PcL(X), PcR(X)) contains its end nodes described above.

We are now ready to define SL(X), SR(X), and SC(X). Let T_X denote the subtree rooted at X in the AKS tree containing X, and let T_X(d) denote the subtree of T_X consisting of all nodes in T_X that are within d levels of X. (Note that T_X(d) contains d + 1 levels.) Let

SL(X) = ∪_{Y ∈ PL(X)} T_Y(0.5 h(X) + c),
SR(X) = ∪_{Y ∈ PR(X)} T_Y(0.5 h(X) + c), and
SC(X) = ∪_{Y ∈ PcL(X) ∪ PcR(X)} T_Y(0.5 h(X) + c).

Note that SL(X), SR(X), and SC(X) are supposed to be sets of bolts or nuts, but the above definitions define them as sets of AKS tree nodes.
Just as we have used X to denote both an AKS tree node and the list of elements in X, we use SL(X), SR(X), and SC(X) to denote both the sets of AKS tree nodes as defined above and the lists of nuts or bolts contained therein, as long as the meaning is clear from the context. Roughly, SL(X) (resp., SR(X)) looks like a tape attached to the path PL(X) (resp., PR(X)). The tape vertically extends from c levels above X all the way down to the leaf level. Similarly, SC(X) looks like two tapes of a similar shape. The intuition behind this complicated definition of SL(X), SR(X), and SC(X) will become clear in the proof of Theorem 1.

2.2.3 A Partition Algorithm

In this sub-subsection, we describe how to partition X into FL, CL, CR, and FR by comparing elements in X with elements in S(X), using algorithm COMPARE described in §2.2.1. The reason that our algorithm provides a proper partition of X is fairly lengthy and will be discussed in §3. In particular, it depends on another key property of the original AKS sorting algorithm.

Note that Lemma 2.2 only states that algorithm COMPARE(U, V, G) sometimes gives a proper partition of V, the larger set between U and V. In fact, a careful investigation of the proof of Lemma 2.2 reveals that when V is substantially larger than U, not much can be said about the ranking of U (the smaller set between U and V) by COMPARE(U, V, G). On the other hand, we will need to partition X by comparing X with S(X), which can be much larger than X in many cases. Hence, in the most interesting case (see Case 2 below), the following algorithm PARTITION(X) consists of two major phases: in the first phase, we choose subsets S'_L(X) ⊆ SL(X), S'_C(X) ⊆ SC(X), and S'_R(X) ⊆ SR(X), each of which has size comparable to |X|, and we let S'(X) = S'_L(X) ∪ S'_C(X) ∪ S'_R(X). In the second phase, we use S'(X) to partition X into FL, CL, CR, and FR.

Algorithm PARTITION(X)

Let ε be a sufficiently small constant.
There are two cases.

Case 1: ε|X| < 2. We compare all elements in X with all elements in S(X). Then, we construct a graph on the elements of X by drawing a directed edge from x₁ ∈ X to x₂ ∈ X if there exists an element z ∈ S(X) such that x₁ ≤ z ≤ x₂. Such a graph is a DAG (directed acyclic graph), and we can topologically sort X according to the DAG. According to this order, we divide X into FL, CL, CR, and FR, each with the size specified in the AKS sorting algorithm, i.e., |FL| = |FR| = min{λ·cap(X), |X|/2} and |CL| = |CR| = |X|/2 - |FL|.

Case 2: ε|X| ≥ 2. Let G be a bipartite graph as described in Lemma 2.1. (As we will see in the proof of Theorem 1, θ will be a fraction of λ.) In particular, in the first three steps of the algorithm, G ⊆ X × SL(X), G ⊆ X × SR(X), and G ⊆ X × SC(X), respectively, and in the last step of the algorithm, G ⊆ S'(X) × X.

Step 1. Apply COMPARE(X, SL(X), G). Let S'_L(X) be a set consisting of (λ/10)|X| elements in SL(X) with the smallest SMALL values among those whose SMALL values are at least d(|X| - λ·cap(X) - 2ε|X|)/|X|. (Ties are broken arbitrarily.)

Step 2. Apply COMPARE(X, SR(X), G). Let S'_R(X) be a set consisting of (λ/10)|X| elements in SR(X) with the smallest LARGE values among those whose LARGE values are at least d(|X| - λ·cap(X) - 2ε|X|)/|X|. (Ties are broken arbitrarily.)

Step 3. Apply COMPARE(X, SC(X), G). Let S'_{C,S}(X) consist of at most (1/2 - λ/10)|X| elements in SC(X) with the smallest SMALL values among those whose SMALL values are at least (1/2 - 2ε)d. (That is, (i) if there are more than (1/2 - λ/10)|X| elements in SC(X) having their SMALL values at least (1/2 - 2ε)d, then let S'_{C,S}(X) consist of (1/2 - λ/10)|X| elements in SC(X) with the smallest SMALL values among those whose SMALL values are at least (1/2 - 2ε)d; (ii) if there are at most (1/2 - λ/10)|X| elements in SC(X) having their SMALL values at least (1/2 - 2ε)d, then let S'_{C,S}(X) consist of all these elements.)
Similarly, let S'C2(X) consist of at most (1/2 − λ/10)|X| elements in SC(X) with the smallest LARGE values among those whose LARGE values are at least (1/2 − 2ε)d. (Ties are broken arbitrarily.) Include all elements in S'C1(X) and S'C2(X) into S'C(X). If S'C(X) has fewer than (1 − λ/5)|X| elements, put an additional arbitrary set of elements from SC(X) into S'C(X) so that S'C(X) contains exactly (1 − λ/5)|X| elements.

Step 4. Let S(X) = S'L(X) ∪ S'C(X) ∪ S'R(X). Apply COMPARE(S(X), X, G). Use COUNTINGSORT to sort all elements in X according to their SMALL values, with the element with the largest SMALL value listed first. (Ties are broken arbitrarily.) According to this order, we divide X (from the first to the last) into FL, CL, CR, and FR, each with the size specified in the AKS sorting algorithm.

Remark. It is not clear at all why there are always enough elements to be included in S'L(X) and S'R(X) in Steps 1 and 2. However, we will see in the proof of Theorem 1 that there are always sufficiently many elements to be included in S'L(X) and S'R(X) when we use PARTITION(X) within our final algorithm for sorting nuts and bolts.

3 An Analysis of the Algorithm

In this section, we sketch the correctness proof and the running-time analysis of our algorithm for sorting nuts and bolts.

Theorem 1 The algorithm described in the preceding section sorts n nuts and n bolts in O(n log n) time.

Proof Sketch The proof is very complicated, and we can only give a very brief sketch. We define S_r(X) to be the number of elements that are contained in X and are r or more strange w.r.t. X. Note that our definition of S_r(X) is slightly different from that of [7], in which S_r(X) is defined as the ratio of the quantity in our definition to cap(X). We will establish the correctness of the algorithm by proving that the following two properties hold throughout the execution of the algorithm. The parameter η used in Property 3.2 is formally defined as a fixed rational expression in δ and λ.
Note that our η is (slightly) different from the parameter η defined in [7, page 86], but they play a similar role in the analyses.

Property 3.1 For any AKS tree node X and for any r ≥ 1,

    S_r(X) ≤ μ δ^{r-1} cap(X).    (1)

Property 3.2 For any r ≥ 1 and any AKS tree node X such that |X| ≥ λ cap(X) when algorithm PARTITION(X) is executed, (1) at most c μ δ^{r-1} cap(X) elements in X whose strangeness w.r.t. X is r or more can be placed into CL ∪ CR; (2) at most (η + ε) cap(X) elements in X whose ranks are at most m(X) can be placed into CR; and (3) at most (η + ε) cap(X) elements whose ranks are at least m(X) can be placed into CL.

We point out that, as in [7], Property 3.1 alone is sufficient to establish the correctness of the algorithm, since, towards the end of the algorithm, when cap(X) is less than a sufficiently small constant for all nonempty nodes X, Property 3.1 implies that no item can be strange w.r.t. the AKS tree node that it resides in. In fact, inequality (1) is the key theorem proved in [7], which guarantees the correctness of the original AKS sorting algorithm, and an analogue of Property 3.2 was (relatively easily) verified by the so-called ε-halver property, which in turn depends on expander graphs. So there was no need in [7] to deal with the analogue of Property 3.2 when it came to the proof of Property 3.1. In our algorithm, however, the two properties are mutually dependent. In particular, algorithm PARTITION would not provide a reasonable partition of X without the validity of Property 3.1, because we cannot always keep a matching between the nuts in X and the corresponding bolts. Thus, in the analysis of our algorithm, we will need to prove both properties simultaneously. The following claims are key steps in establishing the correctness of Property 3.2. Details are omitted.
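As a side note, the quantitative content of inequality (1) can be illustrated numerically. The sketch below (with hypothetical placeholder values μ = 8 and δ = 0.01, chosen only for illustration and not taken from the analysis) shows how the bound on the number of r-strange elements decays geometrically in r, and drops below 1 once cap(X) < 1/μ, at which point no element can be strange w.r.t. its node:

```python
# Numeric illustration of inequality (1): S_r(X) <= mu * delta**(r-1) * cap(X).
# mu = 8 and delta = 0.01 are hypothetical values used only for this sketch.
mu, delta = 8.0, 0.01

def stranger_bound(cap, r):
    """Upper bound from inequality (1) on the number of elements in X
    whose strangeness w.r.t. X is r or more."""
    return mu * delta ** (r - 1) * cap

# While cap(X) is large, the bound permits many mildly strange elements,
assert stranger_bound(cap=1000, r=1) == 8000.0
# but the permitted count decays geometrically in the strangeness r,
assert stranger_bound(cap=1000, r=3) < 1
# and once cap(X) < 1/mu the bound is below 1 even for r = 1, i.e. no
# element can be strange w.r.t. the node that contains it.
assert stranger_bound(cap=0.1, r=1) < 1
```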
Claim 3.1 For r ≥ c + 1, where c is the constant described immediately before Claim 2.2, SL(X), SR(X), and SC(X) each contain at most O(δ^{r-1}) cap(X) elements whose strangeness w.r.t. X is at least r, where the constants hidden in the O-notation depend only on λ and δ.

In the next claim, ε1 is an arbitrarily small constant, which will be much smaller than ε. This is achieved at the cost of making the δ of Lemma 2.1 a sufficiently small constant, much smaller than ε1.

Claim 3.2 (1) SL(X) contains at least (1 − ε1) cap(X) elements whose ranks are in [α(X), α(X) + cap(X) − 1]; (2) SR(X) contains at least (1 − ε1) cap(X) elements whose ranks are in [β(X) − cap(X) + 1, β(X)]; (3) SC(X) contains at least (1 − ε1) cap(X) elements whose ranks are in [m(X) − cap(X)/2 + 1/2, m(X) + cap(X)/2 − 1/2].

To prove that our algorithm has running time O(n log n), it suffices to show that each stage of the algorithm needs O(n) time, since the entire algorithm proceeds in O(log n) stages. The key to the time analysis is to show that the time needed to partition an AKS tree node X is at most

    O( |S(X)| log( |S(X)| / cap(X) ) ),

where |S(X)| denotes the number of elements contained in S(X). Details are omitted. ∎

Corollary 3.1 When it is allowed to make copies of nuts and bolts, the algorithm can be modified to sort n nuts and n bolts in O(log n) time on n processors in Valiant's parallel comparison tree model.

Proof Sketch Given the proof of Theorem 1, the proof of the corollary is relatively simple. The key fact is that COMPARE(U, V, G) can be executed in a constant number of parallel steps in Valiant's parallel comparison tree model, even if d, the degree of a vertex in V, may not be constant: we can simply make d copies of each element in V. This modification will not affect the outcome of COMPARE(U, V, G), because within COMPARE(U, V, G) whether an element x should be compared with another element y does not depend on the outcome of any other comparisons that are made earlier during the execution of COMPARE(U, V, G).
Moreover, the modification will not increase the total number of comparisons involved in COMPARE(U, V, G). So the total number of processors needed for each of the O(log n) stages remains linear in n. Details are omitted. ∎

4 Conclusions

We have designed an optimal O(n log n)-time algorithm for sorting or matching nuts and bolts. Since our algorithm depends on some random graphs that we do not know how to construct explicitly, a natural open question is how to make our algorithm constructive.

Our algorithm can be executed in optimal O(log n) time on n processors in Valiant's parallel comparison tree model, provided that we can make copies of nuts and bolts. However, when no copies are allowed (which appears to be a reasonable assumption), we do not know if it is possible to sort the nuts and bolts in O(log n) time on n processors in Valiant's parallel comparison tree model.

Yonatan Aumann [4] has pointed out that it is still possible to sort nuts and bolts, by some algorithm, even if there is no one-to-one matching between the nuts and bolts. It is easy to see that, when all different nuts (and all different bolts) are assumed to have distinct widths, such sorting is possible if and only if for any pair of nuts (resp., bolts) there exists a bolt (resp., nut) separating the pair. It can be shown that our algorithm works even under the most relaxed assumption. That is, our algorithm sorts distinct nuts and bolts in the optimal O(n log n) sequential time (or O(log n) parallel time on n processors in Valiant's parallel comparison tree model when copying nuts and bolts is allowed) as long as such sorting is possible by any algorithm. Note that under the most relaxed assumption, even O(n log n) expected sequential time does not seem to be entirely trivial [4].

As we have mentioned in the introduction, the O(n log^4 n)-time algorithm of Alon et al. [3] (resp., the O(n log^2 n)-time algorithm of Bradford et al.
[6]) for sorting nuts and bolts is based on an O(n log^3 n)-time (resp., O(n log n)-time) algorithm for selecting a median nut and a median bolt. It is well known that the classic median selection (from a list of n numbers) can be done in O(n) time [5]. It would be interesting to study whether O(n)-time median selection is possible in the context of nuts and bolts (say, when there is a matching between nuts and bolts), since such an algorithm (if possible) would immediately yield another optimal algorithm for sorting nuts and bolts. By using the graphs of Lemma 2.1 in some interesting way and by using some technique of [1], we have found an O(n (log log n)^2)-time algorithm for selecting a median nut and a median bolt. This also gives an O(n log n (log log n)^2)-time algorithm for sorting or matching nuts and bolts. One nice property of this algorithm is that the constant factors behind the O-notations are reasonable, as opposed to the prohibitively large constant involved in our AKS-based approach. Details of our median-selection algorithm are omitted.

Acknowledgment

We thank Noga Alon for telling us the problem before [3] was published. We thank Greg Plaxton for stimulating discussions on the design of the partition scheme described in §2.2.3. We thank Yonatan Aumann, Nabil Kahale, and Tom Leighton for helpful conversations.

References

[1] M. Ajtai, J. Komlós, W. L. Steiger, and E. Szemerédi. Optimal parallel selection has complexity O(log log N). Journal of Computer and System Sciences, 38(1):125-133, 1989. The conference version appears in Proceedings of the 18th Annual ACM Symposium on the Theory of Computing, pages 188-195, 1986.

[2] M. Ajtai, J. Komlós, and E. Szemerédi. Sorting in c log n parallel steps. Combinatorica, 3(1):1-19, 1983. See also the conference version, which appears in Proceedings of the 15th Annual ACM Symposium on the Theory of Computing, pages 1-9, May 1983.

[3] N. Alon, M. Blum, A. Fiat, S. Kannan, M. Naor, and R. Ostrovsky. Matching nuts and bolts. In Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 690-696, January 1994.

[4] Y. Aumann. Personal communication, 1994.

[5] M. Blum, R. Floyd, V. Pratt, R. Rivest, and R. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7:448-461, 1973.

[6] P. Bradford and R. Fleischer. Matching nuts and bolts faster. Technical Report MPI-I-95-1-003, Max-Planck-Institut für Informatik, May 1995. An updated version appears in Proceedings of the Sixth International Symposium on Algorithms and Computation (ISAAC 95).

[7] M. S. Paterson. Improved sorting networks with O(log N) depth. Algorithmica, 5:75-92, 1990.

[8] G. J. E. Rawlins. Compared to What? An Introduction to the Analysis of Algorithms. Computer Science Press, 1991.

[9] L. G. Valiant. Parallelism in comparison problems. SIAM J. Comput., 4:348-355, 1975.