Matching Nuts and Bolts in O(n log n) Time (Extended Abstract)

János Komlós 1,4   Yuan Ma 2   Endre Szemerédi 3,4

Abstract

Given a set of n nuts of distinct widths and a set of n bolts such that each nut corresponds to a unique bolt of the same width, how should we match every nut with its corresponding bolt by comparing nuts with bolts (no comparison is allowed between two nuts or between two bolts)? The problem can be naturally viewed as a variant of the classic sorting problem as follows. Given two lists of n numbers each such that one list is a permutation of the other, how should we sort the lists by comparisons only between numbers in different lists? We give an O(n log n)-time deterministic algorithm for the problem. This is optimal up to a constant factor and answers an open question posed by Alon, Blum, Fiat, Kannan, Naor, and Ostrovsky [3]. Moreover, when copies of nuts and bolts are allowed, our algorithm runs in optimal O(log n) time on n processors in Valiant's parallel comparison tree model. Our algorithm is based on the AKS sorting algorithm with substantial modifications.

1 Introduction

Given a set of n nuts of distinct widths and a set of n bolts such that each nut corresponds to a unique bolt of the same width, how should we match every nut with

1 Department of Mathematics, Rutgers University, Piscataway, NJ 08855. Email: komlos@math.rutgers.edu.
2 Department of Computer Science, Stanford University, CA 94305. Supported by an NSF Mathematical Sciences Postdoctoral Research Fellowship. Part of the work was done while the author was visiting DIMACS, and part of the work was done while he was at MIT and supported by DARPA Contracts N00014-91-J-1698 and N00014-92-J-1799. Email: yuan@cs.stanford.edu.
3 Department of Computer Science, Rutgers University, Piscataway, NJ 08855. Part of the work was done while the author was at the University of Paderborn, Germany. Email: szemered@cs.rutgers.edu.
4 The work presented here is part of the Hypercomputing & Design (HPCD) project, and it is supported in part by ARPA under contract DABT-63-93-C-0064. The content of the information herein does not necessarily reflect the position of the Government, and official endorsement should not be inferred.

its corresponding bolt by comparing nuts with bolts (no comparison is allowed between two nuts or between two bolts)? This problem can be naturally viewed as a variant of the classic sorting problem as follows. Given two lists of n numbers each such that one list is a permutation of the other, how should we sort the lists by comparisons only between numbers in different lists? In fact, the following simple reasoning shows that the problem of matching nuts and bolts and the problem of sorting them have the same complexity, up to a constant factor. On one hand, if the nuts and bolts are sorted, then a nut and a bolt at the same position in the sorted order certainly match each other. On the other hand, if the nuts and bolts are matched, we can sort them by any optimal sorting algorithm in O(n log n) time. Hence, the complexity equivalence of sorting and matching follows from the simple information-theoretic lower bound of Ω(n log n) on the matching problem, which can be easily derived from the fact that there are n! possible ways to match the nuts and bolts. So in this paper, we will consider the problem of how to sort the nuts and bolts, instead of matching them.

The problem of sorting nuts and bolts has a simple randomized algorithm (e.g., a simple variant of the QUICKSORT algorithm) that runs in the optimal O(n log n) expected time [8]. However, finding a nontrivial (say, o(n²)-time) deterministic algorithm has turned out to be highly nontrivial.
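The randomized QUICKSORT variant alluded to above can be sketched as follows. This is our own illustration, not part of the paper's contribution; widths are compared directly to simulate the nut-versus-bolt comparison oracle, and the function name is ours:

```python
import random

def sort_nuts_and_bolts(nuts, bolts):
    """Return matched (nut, bolt) pairs in sorted order using only
    nut-vs-bolt comparisons: a random pivot nut partitions the bolts,
    and its matching bolt then partitions the remaining nuts."""
    if not nuts:
        return []
    pivot_nut = random.choice(nuts)
    # Find the bolt matching the pivot nut (the unique bolt of equal width).
    pivot_bolt = next(b for b in bolts if b == pivot_nut)
    small_bolts = [b for b in bolts if b < pivot_nut]
    large_bolts = [b for b in bolts if b > pivot_nut]
    small_nuts = [x for x in nuts if x < pivot_bolt]
    large_nuts = [x for x in nuts if x > pivot_bolt]
    return (sort_nuts_and_bolts(small_nuts, small_bolts)
            + [(pivot_nut, pivot_bolt)]
            + sort_nuts_and_bolts(large_nuts, large_bolts))
```

Since the pivot is random, the recursion balances in expectation, giving the O(n log n) expected running time mentioned above.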
Alon, Blum, Fiat, Kannan, Naor, and Ostrovsky [3] designed an O(n log⁴ n)-time deterministic algorithm based on expander graphs, and they posed the open question of designing an optimal deterministic algorithm for the problem. Recently, Bradford and Fleischer [6] improved the running time to O(n log² n), but the question remains open whether O(n log n) can be achieved.

Since the classic sorting problem has been intensively studied, it is natural to ask if any existing O(n log n)-time deterministic sorting algorithm can be easily adapted to sort nuts and bolts. In a certain sense, most of the existing O(n log n)-time sorting algorithms use a divide-and-conquer approach. In particular, they require recursive solutions to subproblems of smaller size. For the classic sorting problem, solving the subproblems is simple. However, in the context of sorting nuts and bolts, solving a subproblem can raise many problems. In particular, the fact that we can sort the nuts and bolts at all relies on the fact that there is a matching between them.¹ For example, if all of the nuts happen to be smaller than all of the bolts, then we will not be able to learn anything about the order of the nuts or the order of the bolts by comparing nuts against bolts only. As a consequence, if we want to make use of existing sorting algorithms, it is essential to make arrangements so that, when we work on a smaller set of nuts and a smaller set of bolts, we may obtain useful information in an efficient way. Unfortunately, no existing deterministic O(n log n)-time sorting algorithm seems readily adaptable to make such arrangements. Faced with this difficulty, the algorithm of Alon et al. [3] uses an O(n log³ n)-time algorithm for selecting a median nut and a median bolt, which in turn is based on expander graphs.
However, as pointed out by Alon et al. [3], that particular method cannot be adapted to select a median in O(n) time, and a possible O(n log n) algorithm would have to come by different means. Similarly, the O(n log² n)-time algorithm of Bradford and Fleischer [6] is based on an O(n log n)-time algorithm for selecting a median nut and a median bolt. In fact, we have discovered a (fairly) simple O(n (log log n)²)-time algorithm for selecting a median nut and a median bolt, thereby giving an O(n log n (log log n)²)-time algorithm for sorting nuts and bolts. We will not give any details of this algorithm, however, since it appears that we need to do something very different to achieve the optimal O(n log n) time.

The main contribution of this paper is an O(n log n)-time algorithm for sorting nuts and bolts, which is based on the AKS sorting algorithm [2] with substantial modifications.² As a by-product of our AKS-based approach, our algorithm can be executed in O(log n) time on n processors in Valiant's parallel comparison tree model [9], when copying of nuts and bolts is allowed. In Valiant's model, only comparisons are counted towards the running time, and bookkeeping is free. We remark that our algorithm is not fully constructive: some of its gadgets depend on certain random-graph properties. The existence of such graphs is easily proved by a random construction, but we do not know how to construct them explicitly. However, all other parts of our algorithm are constructive, and once explicit constructions of the desired graphs are discovered, our algorithm will be constructive as well.

The rationale for using an AKS-based approach for sorting nuts and bolts lies in some special properties of the AKS sorting algorithm. Roughly, as described by Paterson [7], the AKS sorting algorithm proceeds as follows: it arranges the numbers being sorted in a complete binary tree, which will be referred to as the AKS tree.
Each node of the AKS tree contains a set of numbers. Most of the numbers in the same node have ranks within a certain interval. At each stage of the algorithm, a certain sorting-related device (with O(1) parallel time) is used to approximately partition the numbers at each node of the AKS tree. In a way, the AKS sorting algorithm proceeds by partitioning in a weak sense: it approximately partitions numbers into almost correct halves and has an intricate error-correcting mechanism. In particular, unlike most other known O(n log n)-time deterministic sorting algorithms, the AKS sorting algorithm does not proceed in a rigorous divide-and-conquer fashion. These special properties will turn out to be advantageous in sorting nuts and bolts.

Although there are good reasons that the AKS sorting algorithm may be a good tool for sorting nuts and bolts, a direct modification of the AKS sorting algorithm does not solve our problem. For example, one naive approach is as follows: keep two AKS trees, one for the nuts and the other for the bolts; at each stage of the algorithm, compare nuts and bolts in corresponding AKS tree nodes according to an expander graph, and reallocate the nuts and bolts according to the results of the comparisons. Such an approach proceeds well at the initial few stages, but it has serious troubles at future stages. The problem arises since we cannot keep a matching between the nuts and bolts in corresponding AKS tree nodes. For example, when the roots contain only a constant number of nuts and bolts, it is possible that all of the nuts contained in the root of one AKS tree are smaller than all of the bolts contained in the root of the other AKS tree, in which case we cannot obtain any information, by comparisons between the nuts and bolts in the roots, about the order of the nuts or the order of the bolts that are located in the roots. In fact, such observations may even lead one to suspect whether the AKS-based approach is helpful at all in the context of sorting nuts and bolts. The novelty of our work in adapting the AKS sorting algorithm is to introduce certain mechanisms that allow efficient approximate partitioning at an AKS tree node even if the nuts and bolts in the corresponding AKS tree nodes do not form a matching.

1 Such a condition can be slightly relaxed, as will be discussed in §4.
2 The AKS sorting algorithm was designed to be implemented in an oblivious fashion on a comparator network, and it also has an optimal parallel running time of O(log n) on n processors. In this paper, our main focus is the sequential algorithm model, and we will refer to the work of [2] as the AKS sorting algorithm, as opposed to the AKS sorting network.

The remainder of the paper is organized into sections as follows. In §2, we present our algorithm for sorting nuts and bolts. In §3, we prove the correctness of the algorithm and analyze its running time. We conclude in §4 with discussions on some extensions and open problems. Due to space limitations, we omit most of the technical proofs in this extended abstract.

2 An O(n log n)-Time Algorithm for Sorting Nuts and Bolts

This section contains the description of our O(n log n)-time algorithm for sorting nuts and bolts. As pointed out in the introduction, our algorithm depends on some random graphs, which we do not know how to construct explicitly. Also, we will be content with an algorithm of O(n log n) running time. No attempt will be made to keep the involved constants small. In particular, a large constant (much larger than the previously best known constant for the AKS sorting algorithm) is hidden behind the O notation.

2.1 An Overview of the Algorithm

In this subsection, we give a high-level description of our AKS-based algorithm. The algorithm proceeds much like the AKS sorting algorithm, except that we use a completely different method to partition the nuts (bolts) in an AKS tree node. The partition method is fairly complicated and will be the subject of the next subsection. In this subsection, we will assume that such a partition can be done and focus on other, simpler issues.

We first need a complete understanding of the AKS sorting algorithm. However, the AKS sorting algorithm is fairly complicated, and we will not be able to include a complete description of the entire algorithm due to the limited space. In what follows, we only sketch the AKS sorting algorithm at a high level, and we refer the readers to [7] for a complete and rigorous description.

As described in [7], the behavior of the AKS sorting algorithm can be best understood by thinking of all the elements (which refer to nuts or bolts) being sorted as moving within a complete binary tree, with the root at the top. A rigorous treatment of such a tree structure can be found in [7]. We will refer to such a tree as an AKS tree and refer to a node in the tree as an AKS tree node. The elements being sorted are arranged within the nodes of the AKS tree. Each AKS tree node X has a capacity, denoted by cap(X), that specifies the maximum number of elements that can be contained in X. Let |X| denote the number of elements that are actually contained in X. X is called empty, full, or partially full if |X| = 0, |X| = cap(X), or 0 < |X| < cap(X), respectively. The AKS sorting algorithm works in stages, starting from stage one. Within each stage, there is a sorting-related device that partitions each AKS tree node X into four parts, FL, CL, CR, and FR, which stand for "far-left", "center-left", "center-right", and "far-right", respectively. (To be rigorous, we partition the list of elements in node X, as opposed to node X itself.
But we will not distinguish X from the list of elements contained in X when no confusion can arise.) By doing so, we hope to move most of the elements in X into the correct halves, FL ∪ CL and CR ∪ FR, and to move most of the extreme elements to the extreme positions FL and FR. At the end of a stage, the elements in FL and FR are sent to the parent of X, and CL and CR are sent to the left and right children of X, respectively. This has the effect of moving most of the correctly located elements downward in the AKS tree and moving most of the incorrectly located elements upward in the AKS tree. Overall, most elements in the lower part of the AKS tree are near their correct positions, and elements far away from their correct positions tend to move upwards in the AKS tree so that they will be processed further.

The AKS tree can be viewed as infinite, but we make the convention that a leaf of an AKS tree is a non-empty node in the lowest level of the AKS tree. At odd stages, all nodes at odd levels and all nodes below the leaf level are empty, and all nodes at even levels above the leaf level are full, except that nodes at the leaf level can be full or partially full.³ (The root is assumed to be at level 0.) The opposite holds at even stages. This completes our brief description of the AKS sorting algorithm.

3 To be more rigorous, when we say that a node is full or empty during a stage, we mean it is full or empty at the beginning of the stage and stays so during most of the stage. Note that the elements in a full or partially full node X will be moved to the parent or children of X at the end of the stage.

At a high level, our algorithm differs from the original AKS sorting algorithm in two ways: (1) we need to keep two separate AKS trees: TN for the set of nuts N, and TB for the set of bolts B; (2) we need a completely different method to partition elements in an AKS tree node.
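The end-of-stage routing just described (FL and FR up to the parent, CL and CR down to the children) can be sketched as follows. This is our own simplification: `Node` and `route_one_stage` are hypothetical names, the four-way partition is supplied by the caller rather than computed by a separator, and the root simply keeps its FL and FR in place:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One AKS tree node holding a list of elements."""
    elements: list = field(default_factory=list)
    parent: "Node" = None
    left: "Node" = None
    right: "Node" = None

def route_one_stage(node, fl, cl, cr, fr):
    """Route one node's four-way partition at the end of a stage:
    FL and FR move up to the parent; CL and CR move down to the
    left and right children, respectively."""
    if node.parent is not None:
        node.parent.elements.extend(fl + fr)
        node.elements = []
    else:
        # Simplification: the root has no parent, so FL and FR stay put.
        node.elements = fl + fr
    if node.left is not None:
        node.left.elements.extend(cl)
    if node.right is not None:
        node.right.elements.extend(cr)
```

In the full algorithm this routing is applied to every non-empty node in each stage, which is what alternates the full and empty levels described above.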
Other than these two differences, our algorithm for sorting nuts and bolts works exactly as the AKS sorting algorithm. In particular, the structures of the two AKS trees are identical (except that one contains nuts and the other contains bolts): they are specified by the same set of parameters. To describe explicitly how our algorithm works, we need to specify some parameters associated with the AKS sorting algorithm. For simplicity, we will follow the parameter choices of [7] whenever possible. In particular, we will use the same letters to denote the same quantities as in [7] unless specified otherwise. We choose the same parameters associated with our AKS trees as in [7]:

A = 3, ν = 43/48, and λ = 1/8.

As in [7], the choices of these parameters completely determine how the nuts and bolts move within TN and TB. In particular,

• the capacity of an AKS tree node X at level d immediately after stage t is determined by cap(X) = ν^t A^d n(1 - λ);

• at each stage, the elements at X are partitioned into four parts FL, CL, CR, and FR such that (1) |FL| = |FR| = min{λ·cap(X), |X|/2} and |CL| = |CR| = |X|/2 - |FL|, and (2) at the end of the stage, FL and FR are moved to the parent of X, and CL and CR are moved to the left and right children of X, respectively.

Also, we choose the same μ and δ as in [7]. Note that μ and δ have nothing to do with the description of the algorithm and will be used only in its analysis. Another parameter, ε, was used in [7] to specify the functionality of the so-called separator, which corresponds to the so-called near-sorting network of [2]. In [7], a separator is used to partition an AKS tree node X into four parts FL, CL, CR, and FR. In our algorithm, however, we cannot use a separator or near-sorting network, since, as we have explained in the introduction, we cannot enforce a matching between the nuts and bolts in corresponding AKS tree nodes. Nevertheless, we need a sorting-related device for such a partition.
The partition scheme is fairly intricate and will be the subject of the next subsection. In any event, following the notation of [7], we will use the parameter ε to measure the accuracy of our partition method. We do not specify how to choose ε explicitly. Instead, we will be content with proving that a sufficiently small ε suffices for our purposes. Finally, as in [7], we also need to deal with the so-called boundary conditions and integer rounding. These can be easily handled in the same way as in [7], and we will not address these particular technical problems hereafter.

2.2 Partitioning Nuts or Bolts at an AKS Tree Node

In this subsection, we describe an algorithm to partition the elements in an AKS tree node X into four parts FL, CL, CR, and FR. We will accomplish the partition of X by comparing the nuts (or bolts) in X with the bolts (or nuts) in a set S(X), which is to be defined in §2.2.2. On one hand, S(X) should be large enough so that a proper partition of X is possible, i.e., S(X) should contain enough bolts (or nuts) to separate some of the nuts (or bolts) in X from the others. On the other hand, S(X) should be small enough so that the number of comparisons between X and S(X) needed to partition X is not prohibitively large.

The remainder of this subsection is organized as follows. In §2.2.1, we prove a lemma on random graphs and describe how to use such graphs to construct a comparison algorithm. In §2.2.2, we construct S(X). In §2.2.3, we describe how to partition X by applying the comparison algorithm of §2.2.1 to X and S(X).

2.2.1 Random Graphs and a Comparison Algorithm

In this sub-subsection, we first prove a useful lemma on random bipartite graphs. Then, we describe how to use such graphs in a comparison algorithm, which is an important building block in our O(n log n)-time algorithm for sorting nuts and bolts. Although a random graph will yield a desired graph with high probability, we do not know how to construct such graphs explicitly.
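The probabilistic construction alluded to here can be sketched as follows. This is our own illustration, not the paper's proof: each right-hand vertex independently draws d neighbors in U uniformly with replacement, which yields a bipartite multigraph in which every v ∈ V has degree exactly d by construction; verifying the expansion-style properties with high probability is the substance of the omitted argument:

```python
import random

def random_bipartite_multigraph(U, V, d, rng=random):
    """Return the edge multiset of a random bipartite multigraph on
    (U, V): each v in V gets exactly d incident edges, with each
    endpoint in U chosen uniformly and independently (repeats allowed,
    so multi-edges can occur)."""
    return [(rng.choice(U), v) for v in V for _ in range(d)]
```

A usage example: `random_bipartite_multigraph(list(range(100)), list(range(10)), 5)` returns 50 edges, five per right-hand vertex.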
The graphs considered in this paper are allowed to be multigraphs, and we use e(X, Y) to denote the number of edges between X and Y for arbitrary vertex subsets X and Y. In particular, if there are m edges between a vertex u ∈ X and a vertex v ∈ Y, then each of the m multiple edges between u and v is counted in e(X, Y). Also, we use e to denote the base of the natural logarithm and use ln to denote the logarithm with base e.

Lemma 2.1 Let ε and θ be two arbitrary constants in (0, 1), and let U and V be two sets such that |U| ≤ |V|. If d ≥ 2eε^{-3} ln((e²/ε²)(|V|/|U|)), then there exists a bipartite graph G = (U, V, E), E ⊆ U × V, with the following properties: (1) deg(v) = d for all v ∈ V; (2) e(X, Y) ≥ (1 - ε)d|X||Y|/|U| for any sets X ⊆ U, Y ⊆ V such that |X| ≥ ε|U| and |Y| ≥ ε|U|; and (3) if |U| = |V| and d ≥ (8/θ) ln(16/θ), then any Y ⊆ V of size |Y| ≤ 2e^{-4}θ|U|/d is directly connected (i.e., connected by an edge, as opposed to by a path) to at least θd|Y|/2 vertices in U, even if an arbitrary set of up to (1 - θ)d edges is removed from each vertex in Y.

Proof We can prove that a random graph has the desired properties. Details are omitted. ∎

Roughly, Lemma 2.1 says that the number of edges between two sets of vertices cannot be much smaller than the average number of edges between two sets of their sizes. In a certain sense, this also means that the edges between U and V are evenly distributed, and so the number of edges between two sets of vertices cannot be much larger than the average either. Formally, we have the following corollary, whose proof is straightforward and omitted.

Corollary 2.1 In the graph of Lemma 2.1, for any sets X ⊆ U and Y ⊆ V such that |Y| ≥ ε|U|, e(X, Y) ≤ d|X||Y|/|U| + εd|Y|.

We now describe how to apply the graph of Lemma 2.1 to construct a comparison algorithm, in a way similar to that of [2] and [7].
We will use some adaptive methods, such as counting, in some future applications of the algorithm, whereas [2] and [7] deal with comparator networks and can only use oblivious methods. Given an arbitrary set of nuts (bolts) U, an arbitrary set of bolts (nuts) V with |U| ≤ |V|, and a bipartite graph G ⊆ U × V, algorithm COMPARE(U, V, G) works as follows.

Algorithm COMPARE(U, V, G)

Step 1. Set SMALL(v) = LARGE(v) = 0 for each v ∈ V.

Step 2. For each edge (u, v) in graph G, compare u and v. Then, increment SMALL(v) by 1 if v < u; increment LARGE(v) by 1 if v > u; increment SMALL(v) and LARGE(v) each by 1/2 if u = v.

In the above algorithm, SMALL(v) (resp., LARGE(v)) denotes the number of comparisons in algorithm COMPARE in which v is strictly smaller (resp., strictly larger) than its opponent, plus half of the number of comparisons in which v is equal to its opponent. In particular, we increment both LARGE(v) and SMALL(v) by 1/2 if v is equal to its opponent. Such an arrangement will simplify some of our later arguments by ensuring that the values of SMALL and LARGE are symmetric. We remark that there may be multiple edges between u and v, in which case u and v are compared more than once, and SMALL(v) or LARGE(v) is updated every time a comparison between u and v occurs.

It would be nice if algorithm COMPARE(U, V, G) always provided an approximate partition of V. However, such a partition is not always possible. For example, if every nut in V is smaller than every bolt in U, then no matter how we conduct our comparisons, the outcome will not provide any useful information for partitioning V. Nevertheless, we next show that the algorithm has a certain ranking property in a certain case. Such a ranking property will then be further exploited to provide a more sophisticated algorithm for partition.
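Algorithm COMPARE translates directly into code. The sketch below is our own: the graph G is given as a multiset of (u, v) edges, widths stand in for the nut/bolt comparison oracle, and the half-credit tie rule and repeated comparisons along multi-edges are implemented exactly as described above:

```python
def compare(V, edges):
    """COMPARE(U, V, G), with G given as a multiset of (u, v) pairs.
    Step 1 zeroes SMALL and LARGE for every v in V; Step 2 performs one
    comparison per edge (multi-edges are compared repeatedly), crediting
    1/2 to each counter on a tie so the counters stay symmetric."""
    small = {v: 0.0 for v in V}   # Step 1
    large = {v: 0.0 for v in V}
    for u, v in edges:            # Step 2
        if v < u:
            small[v] += 1
        elif v > u:
            large[v] += 1
        else:
            small[v] += 0.5
            large[v] += 0.5
    return small, large
```

Note that SMALL(v) + LARGE(v) always equals deg(v), which is the symmetry the tie rule is designed to preserve.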
In what follows, we define the rank of an element z with respect to a set Y, denoted by rank(z, Y), as the number of elements in Y that are smaller than or equal to z. Note that rank(z, Y) is well defined even if z and the elements of Y cannot be compared by a direct comparison, e.g., z is a bolt and Y is a set of nuts. When we say the rank of an element z, denoted by rank(z), without specifying a corresponding Y, we mean the rank of z with respect to B (or, equivalently, with respect to N). For any ζ, ξ ∈ [0, 1] and for any sets of elements U and V, let

V(ζ, ξ, U) = {v ∈ V | ζ|U| ≤ rank(v, U) ≤ ξ|U|}.

In the next lemma, U and V are a set of nuts and a set of bolts (or a set of bolts and a set of nuts), respectively, and G ⊆ U × V is a bipartite graph with parameters d and ε as described in Lemma 2.1. (Here, we do not need the third property of Lemma 2.1, and so we do not need the parameter θ.)

Lemma 2.2 Assume ε|U| ≥ 2 and ζ, ξ ∈ [0, 1]. If algorithm COMPARE(U, V, G) is executed, then (1) at most ε|U| elements in V(0, ξ, U) have their SMALL values less than or equal to (1 - ξ - 2ε)d; (2) at most ε|U| elements in V(ζ, 1, U) have their LARGE values less than or equal to (ζ - 2ε)d; and (3) for any X ⊆ V(0, ζ, U) and any Y ⊆ V(ξ, 1, U) where ξ - ζ ≥ 6ε, if SMALL(x) ≤ SMALL(y) for all x ∈ X and all y ∈ Y, then either |X| < ε|U| or |Y| < ε|U|.

Proof Use Lemma 2.1. Details are omitted. ∎

2.2.2 Construction of S(X)

S(X) consists of three subsets SL(X), SR(X), and SC(X). In order to partition X properly, we need to know not only S(X) but also SL(X), SR(X), and SC(X) individually. This sub-subsection is devoted to constructing these sets.

We first introduce some concepts. Some of these concepts are not directly used in the construction of S(X), but they are useful for understanding the relevant terminology and for analyzing our final algorithm, so we define these concepts here for ease of reference.
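For the same ease of reference, the rank notion and the sets V(ζ, ξ, U) used in Lemma 2.2 can be stated as code. These are our own illustrative helpers, with widths again standing in for elements that are only indirectly comparable:

```python
def rank(z, Y):
    """rank(z, Y): the number of elements of Y smaller than or equal
    to z.  Well defined even when z and Y live on different sides
    (a bolt against a set of nuts), as long as widths compare."""
    return sum(1 for y in Y if y <= z)

def V_between(V, zeta, xi, U):
    """V(zeta, xi, U): the elements of V whose rank with respect to U
    lies in the window [zeta*|U|, xi*|U|]."""
    return [v for v in V if zeta * len(U) <= rank(v, U) <= xi * len(U)]
```

For example, `V_between([1, 2, 3, 4], 0, 0.5, [2, 4])` keeps exactly the elements whose rank against {2, 4} is at most 1.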
The concepts of a natural interval and strangeness were used in [7]. The natural interval of an AKS tree node is inductively defined as follows: the natural interval of the root of an AKS tree is [1, n]; if the natural interval of an AKS tree node X is [α, β], then the natural intervals of the left and right children of X are [α, (α+β-1)/2] and [(α+β+1)/2, β], respectively. Let [α(X), β(X)] denote the natural interval of an AKS tree node X, and let m(X) = (α(X)+β(X))/2. The strangeness of an element z w.r.t. (with respect to) an AKS tree node X is defined to be the number of levels that z needs to move from X upward in X's AKS tree in order to reach the first AKS tree node whose natural interval contains rank(z). (Note that the strangeness of z w.r.t. X is well defined even if z is not located in X.) For each AKS tree node X, let h(X) denote the height of X in its AKS tree. (The height of a leaf is assumed to be 0.)

Claim 2.1 If X is an AKS tree node such that h(X) ≥ 0 (i.e., X is at or above the leaf level), then cap(X) ≤ 6·2^{-h(X)} (β(X) - α(X) + 1).

Proof Assume that X is i levels below the root, where i ≥ 0. Consider the lowest level where each AKS tree node is full. This level is at most h(X) - 2 levels below X's level, since either a leaf is full or its grandparent is full. The sum of the capacities of all the nodes at this level is at most n, since there are at most n elements in an AKS tree. Hence,

2^{i+h(X)-2} cap(X) A^{h(X)-2} ≤ n = 2^i (β(X) - α(X) + 1),

where the last equality holds since the sum of the natural-interval sizes at any level of an AKS tree is equal to n. The correctness of the claim follows immediately from the above inequality. ∎

In the next claim and the rest of the paper, we will use the parameter c to denote a certain large constant. We will not specify the explicit value of c, but we will see that a sufficiently large value of c will be good for our algorithm.

Claim 2.2 If h(Y) ≥ 0.5 h(X) + c and h(X) ≥ 0, then cap(X) ≤ β(Y) - α(Y) + 1.
Proof Note that h(Y) ≥ 0.5 h(X) + c implies h(X) - h(Y) ≤ 0.5 h(X) - c. Hence, by Claim 2.1,

cap(X) ≤ 6·2^{-h(X)} (β(X) - α(X) + 1)
       = 6·2^{-h(X)} (β(Y) - α(Y) + 1) 2^{h(X)-h(Y)}
       ≤ 6·2^{-h(X)} (β(Y) - α(Y) + 1) 2^{0.5 h(X)-c}
       < β(Y) - α(Y) + 1,

where the last inequality holds since c is sufficiently large and h(X) ≥ 0. ∎

Claim 2.3 At each level with height at least 0.5 h(X) + c in either TN or TB, there exists a unique AKS tree node whose natural interval contains [α(X), α(X) + cap(X)/36 - 1].

Proof Since natural intervals at the same level of an AKS tree cannot overlap with each other, we only need to show the existence of a desired node at each level. Moreover, since the natural interval of a node is contained in the natural interval of its parent, we only need to consider the level with height 0.5 h(X) + c. If h(X) ≤ 0.5 h(X) + c, then the ancestor of X with height 0.5 h(X) + c has the desired property, since Claim 2.1 implies that X's natural interval contains [α(X), α(X) + cap(X)/36 - 1]. If h(X) > 0.5 h(X) + c, then let Y be the unique descendant of X at the level with height 0.5 h(X) + c such that α(Y) = α(X). By Claim 2.2, Y has the desired property. ∎

By Claim 2.3, the notation X'_{L,i} (i = 0, 1) below is well-defined; similarly, we can verify that X'_{R,i} (i = 0, 1), X̄, X_{c,0}, X_{cL,1}, and X_{cR,1} are all well-defined. For an AKS tree node X in TN (resp., TB), let

• X̄ be the unique AKS tree node in TB (resp., TN) such that [α(X̄), β(X̄)] = [α(X), β(X)],

• X'_{L,i} (i = 0, 1) be the unique AKS tree node in TB (resp., TN) such that h(X'_{L,i}) = 2^{-i} h(X) + c and [α(X'_{L,i}), β(X'_{L,i})] ⊇ [α(X), α(X) + cap(X)/36 - 1],

• X'_{R,i} (i = 0, 1) be the unique AKS tree node in TB (resp., TN) such that h(X'_{R,i}) = 2^{-i} h(X) + c and [α(X'_{R,i}), β(X'_{R,i})] ⊇ [β(X) - cap(X)/36 + 1, β(X)],

• X_{c,0} be the unique AKS tree node in TB (resp., TN) such that h(X_{c,0}) = h(X) + c and [α(X_{c,0}), β(X_{c,0})] ⊇ [m(X) - cap(X)/72 + 1/2, m(X) + cap(X)/72 - 1/2],

• X_{cL,1} be the unique AKS tree node in TB (resp., TN) such that h(X_{cL,1}) = 2^{-1} h(X) + c and [α(X_{cL,1}), β(X_{cL,1})] ⊇ [m(X) - cap(X)/72 + 1/2, m(X)],

• X_{cR,1} be the unique AKS tree node in TB (resp., TN) such that h(X_{cR,1}) = 2^{-1} h(X) + c and [α(X_{cR,1}), β(X_{cR,1})] ⊇ [m(X), m(X) + cap(X)/72 - 1/2],

• PL(X) be the unique path from X'_{L,0} to X'_{L,1},

• PR(X) be the unique path from X'_{R,0} to X'_{R,1},

• PcL(X) be the unique path from X_{c,0} to X_{cL,1},

• PcR(X) be the unique path from X_{c,0} to X_{cR,1}.

In the above definition, PL(X) is assumed to contain the nodes X'_{L,0} and X'_{L,1}; similarly, each of the other three paths (PR(X), PcL(X), PcR(X)) contains its end nodes described above.

We are now ready to define SL(X), SR(X), and SC(X). Let T_X denote the subtree rooted at X in the AKS tree containing X, and let T_X(d) denote the subtree of T_X consisting of all nodes in T_X that are within d levels of X. (Note that T_X(d) contains d + 1 levels.) Let

SL(X) = ∪_{Y ∈ PL(X)} T_Y(0.5 h(X) + c),
SR(X) = ∪_{Y ∈ PR(X)} T_Y(0.5 h(X) + c), and
SC(X) = ∪_{Y ∈ PcL(X) ∪ PcR(X)} T_Y(0.5 h(X) + c).

Note that SL(X), SR(X), and SC(X) are supposed to be sets of bolts or nuts, but the above definitions define them as sets of AKS tree nodes.
Just as we have used X to denote both an AKS tree node and the list of elements in X, we use SL(X), SR(X), and SC(X) to denote both the sets of AKS tree nodes as defined above and the lists of nuts or bolts contained therein, as long as the meaning is clear from the context. Roughly, SL(X) (resp., SR(X)) looks like a tape attached to the path PL(X) (resp., PR(X)). The tape vertically extends from c levels above X all the way down to the leaf level. Similarly, SC(X) looks like two tapes of a similar shape. The intuition behind this complicated definition of SL(X), SR(X), and SC(X) will become clear in the proof of Theorem 1.

2.2.3 A Partition Algorithm

In this sub-subsection, we describe how to partition X into FL, CL, CR, and FR by comparing elements in X with elements in S(X), using algorithm COMPARE described in §2.2.1. The reason that our algorithm provides a proper partition of X is fairly lengthy and will be discussed in §3. In particular, it depends on another key property of the original AKS sorting algorithm.

Note that Lemma 2.2 only states that algorithm COMPARE(U, V, G) sometimes gives a proper partition of V, the larger set between U and V. In fact, a careful investigation of the proof of Lemma 2.2 reveals that when V is substantially larger than U, not much can be said about the ranking of U (the smaller set between U and V) by COMPARE(U, V, G). On the other hand, we will need to partition X by comparing X with S(X), which can be much larger than X in many cases. Hence, in the most interesting case (see Case 2 below), the following algorithm PARTITION(X) consists of two major phases: in the first phase, we choose subsets S'_L(X) ⊆ SL(X), S'_C(X) ⊆ SC(X), and S'_R(X) ⊆ SR(X), each of which has size comparable to |X|, and we let S'(X) = S'_L(X) ∪ S'_C(X) ∪ S'_R(X). In the second phase, we use S'(X) to partition X into FL, CL, CR, and FR.

Algorithm PARTITION(X)

Let ε be a sufficiently small constant.
There are two cases.

Case 1: ε|X| < 2. We compare all elements in X with all elements in S(X). Then, we construct a graph on the elements of X by drawing a directed edge from x₁ ∈ X to x₂ ∈ X if there exists an element z ∈ S(X) such that x₁ ≤ z ≤ x₂. Such a graph is a DAG (directed acyclic graph), and we can topologically sort X according to the DAG. According to this order, we divide X into FL, CL, CR, and FR, each with the size specified in the AKS sorting algorithm, i.e., |FL| = |FR| = min{λ·cap(X), |X|/2} and |CL| = |CR| = |X|/2 - |FL|.

Case 2: ε|X| ≥ 2. Let G be a bipartite graph as described in Lemma 2.1. (As we will see in the proof of Theorem 1, θ will be a fraction of λ.) In particular, in the first three steps of the algorithm, G ⊆ X × SL(X), G ⊆ X × SR(X), and G ⊆ X × SC(X), respectively, and in the last step of the algorithm, G ⊆ S'(X) × X.

Step 1. Apply COMPARE(X, SL(X), G). Let S'_L(X) be a set consisting of (λ/10)|X| elements in SL(X) with the smallest SMALL values among those whose SMALL values are at least d(|X| - λ·cap(X) - 2ε|X|)/|X|. (Ties are broken arbitrarily.)

Step 2. Apply COMPARE(X, SR(X), G). Let S'_R(X) be a set consisting of (λ/10)|X| elements in SR(X) with the smallest LARGE values among those whose LARGE values are at least d(|X| - λ·cap(X) - 2ε|X|)/|X|. (Ties are broken arbitrarily.)

Step 3. Apply COMPARE(X, SC(X), G). Let S'_{C,S}(X) consist of at most (1/2 - λ/10)|X| elements in SC(X) with the smallest SMALL values among those whose SMALL values are at least (1/2 - 2ε)d. (That is, (i) if there are more than (1/2 - λ/10)|X| elements in SC(X) having their SMALL values at least (1/2 - 2ε)d, then let S'_{C,S}(X) consist of (1/2 - λ/10)|X| elements in SC(X) with the smallest SMALL values among those whose SMALL values are at least (1/2 - 2ε)d; (ii) if there are at most (1/2 - λ/10)|X| elements in SC(X) having their SMALL values at least (1/2 - 2ε)d, then let S'_{C,S}(X) consist of all these elements.)
Similarly, let S'C2(X) consist of at most (1/2 − λ/10)|X| elements in SC(X) with the smallest LARGE values among those whose LARGE values are at least (1/2 − 2ε)d. (Ties are broken arbitrarily.) Include all elements in S'C1(X) and S'C2(X) into S'C(X). If S'C(X) has fewer than (1 − λ/5)|X| elements, put an additional arbitrary set of elements from SC(X) into S'C(X) so that S'C(X) contains exactly (1 − λ/5)|X| elements.

Step 4. Let S(X) = S'L(X) ∪ S'C(X) ∪ S'R(X). Apply COMPARE(S(X), X, G). Use COUNTINGSORT to sort all elements in X according to their SMALL values, with the element with the largest SMALL value listed first. (Ties are broken arbitrarily.) According to this order, we divide X (from the first to the last) into FL, CL, CR, and FR, each with the size specified in the AKS sorting algorithm.

Remark. It is not clear at all why there are always enough elements to be included in S'L(X) and S'R(X) in Steps 1 and 2. However, we will see in the proof of Theorem 1 that there are always sufficiently many elements to be included in S'L(X) and S'R(X) when we use PARTITION(X) within our final algorithm for sorting nuts and bolts.

3 An Analysis of the Algorithm

In this section, we sketch the correctness proof and the running-time analysis of our algorithm for sorting nuts and bolts.

Theorem 1 The algorithm described in the preceding section sorts n nuts and n bolts in O(n log n) time.

Proof Sketch The proof is very complicated, and we can only give a very brief sketch. We define S_r(X) to be the number of elements that are contained in X and are r or more strange w.r.t. X. Note that our definition of S_r(X) is slightly different from that of [7], in which S_r(X) is defined as the ratio of the quantity in our definition to cap(X). We will establish the correctness of the algorithm by proving that the following two properties hold throughout the execution of the algorithm. The parameter η used in Property 3.2 is formally defined as a fixed rational expression in δ and λ.
Note that our η is (slightly) different from the parameter η defined in [7, page 86], but they play a similar role in the analyses.

Property 3.1 For any AKS tree node X and for any r ≥ 1,

    S_r(X) ≤ μ δ^{r-1} cap(X).    (1)

Property 3.2 For any r ≥ 1 and any AKS tree node X such that |X| ≥ λ cap(X) when algorithm PARTITION(X) is executed, (1) at most c μ δ^{r-1} cap(X) elements in X whose strangeness w.r.t. X is r or more can be placed into CL ∪ CR; (2) at most (η + ε) cap(X) elements in X whose ranks are at most m(X) can be placed into CR; and (3) at most (η + ε) cap(X) elements whose ranks are at least m(X) can be placed into CL.

We point out that, as in [7], Property 3.1 alone is sufficient to establish the correctness of the algorithm, since, towards the end of the algorithm, when cap(X) is less than a sufficiently small constant for all nonempty nodes X, Property 3.1 implies that no item can be strange w.r.t. the AKS tree node that it resides in. In fact, inequality (1) is the key theorem proved in [7], which guarantees the correctness of the original AKS sorting algorithm, and an analogue of Property 3.2 was (relatively easily) verified by the so-called ε-halver property, which in turn depends on expander graphs. So there was no need in [7] to deal with the analogue of Property 3.2 when it came to the proof of Property 3.1. In our algorithm, however, the two properties are mutually dependent. In particular, algorithm PARTITION would not provide a reasonable partition of X without the validity of Property 3.1, because we cannot always keep a matching between the nuts in X and the corresponding bolts. Thus, in the analysis of our algorithm, we will need to prove both properties simultaneously. The following claims are key steps in establishing the correctness of Property 3.2. Details are omitted.
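As a side note, the quantitative content of inequality (1) can be illustrated numerically. The sketch below (with hypothetical placeholder values μ = 8 and δ = 0.01, chosen only for illustration and not taken from the analysis) shows how the bound on the number of r-strange elements decays geometrically in r, and drops below 1 once cap(X) < 1/μ, at which point no element can be strange w.r.t. its node:

```python
# Numeric illustration of inequality (1): S_r(X) <= mu * delta**(r-1) * cap(X).
# mu = 8 and delta = 0.01 are hypothetical values used only for this sketch.
mu, delta = 8.0, 0.01

def stranger_bound(cap, r):
    """Upper bound from inequality (1) on the number of elements in X
    whose strangeness w.r.t. X is r or more."""
    return mu * delta ** (r - 1) * cap

# While cap(X) is large, the bound permits many mildly strange elements,
assert stranger_bound(cap=1000, r=1) == 8000.0
# but the permitted count decays geometrically in the strangeness r,
assert stranger_bound(cap=1000, r=3) < 1
# and once cap(X) < 1/mu the bound is below 1 even for r = 1, i.e. no
# element can be strange w.r.t. the node that contains it.
assert stranger_bound(cap=0.1, r=1) < 1
```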
Claim 3.1 For r ≥ c + 1, where c is the constant described immediately before Claim 2.2, SL(X), SR(X), and SC(X) each contain at most O(δ^{r-1}) cap(X) elements whose strangeness w.r.t. X is at least r, where the constants hidden in the O-notation depend only on λ and δ.

In the next claim, ε1 is an arbitrarily small constant, which will be much smaller than ε. This is achieved at the cost of making the δ of Lemma 2.1 a sufficiently small constant, much smaller than ε1.

Claim 3.2 (1) SL(X) contains at least (1 − ε1) cap(X) elements whose ranks are in [α(X), α(X) + cap(X) − 1]; (2) SR(X) contains at least (1 − ε1) cap(X) elements whose ranks are in [β(X) − cap(X) + 1, β(X)]; (3) SC(X) contains at least (1 − ε1) cap(X) elements whose ranks are in [m(X) − cap(X)/2 + 1/2, m(X) + cap(X)/2 − 1/2].

To prove that our algorithm has running time O(n log n), it suffices to show that each stage of the algorithm needs O(n) time, since the entire algorithm proceeds in O(log n) stages. The key to the time analysis is to show that the time needed to partition an AKS tree node X is at most

    O( |S(X)| log( |S(X)| / cap(X) ) ),

where |S(X)| denotes the number of elements contained in S(X). Details are omitted. ∎

Corollary 3.1 When it is allowed to make copies of nuts and bolts, the algorithm can be modified to sort n nuts and n bolts in O(log n) time on n processors in Valiant's parallel comparison tree model.

Proof Sketch Given the proof of Theorem 1, the proof of the corollary is relatively simple. The key fact is that COMPARE(U, V, G) can be executed in a constant number of parallel steps in Valiant's parallel comparison tree model, even if d, the degree of a vertex in V, may not be constant: we can simply make d copies of each element in V. This modification will not affect the outcome of COMPARE(U, V, G), because within COMPARE(U, V, G) whether an element x should be compared with another element y does not depend on the outcome of any other comparisons that are made earlier during the execution of COMPARE(U, V, G).
Moreover, the modification will not increase the total number of comparisons involved in COMPARE(U, V, G). So the total number of processors needed for each of the O(log n) stages remains linear in n. Details are omitted. ∎

4 Conclusions

We have designed an optimal O(n log n)-time algorithm for sorting or matching nuts and bolts. Since our algorithm depends on some random graphs that we do not know how to construct explicitly, a natural open question is how to make our algorithm constructive.

Our algorithm can be executed in optimal O(log n) time on n processors in Valiant's parallel comparison tree model, provided that we can make copies of nuts and bolts. However, when no copies are allowed (which appears to be a reasonable assumption), we do not know if it is possible to sort the nuts and bolts in O(log n) time on n processors in Valiant's parallel comparison tree model.

Yonatan Aumann [4] has pointed out that it is still possible to sort nuts and bolts, by some algorithm, even if there is no one-to-one matching between the nuts and bolts. It is easy to see that, when all different nuts (and all different bolts) are assumed to have distinct widths, such sorting is possible if and only if for any pair of nuts (resp., bolts) there exists a bolt (resp., nut) separating the pair. It can be shown that our algorithm works even under the most relaxed assumption. That is, our algorithm sorts distinct nuts and bolts in the optimal O(n log n) sequential time (or O(log n) parallel time on n processors in Valiant's parallel comparison tree model when copying nuts and bolts is allowed) as long as such sorting is possible by any algorithm. Note that under the most relaxed assumption, even O(n log n) expected sequential time does not seem to be entirely trivial [4].

As we have mentioned in the introduction, the O(n log^4 n)-time algorithm of Alon et al. [3] (resp., the O(n log^2 n)-time algorithm of Bradford et al.
[6]) for sorting nuts and bolts is based on an O(n log^3 n)-time (resp., O(n log n)-time) algorithm for selecting a median nut and a median bolt. It is well known that the classic median selection (from a list of n numbers) can be done in O(n) time [5]. It would be interesting to study whether O(n)-time median selection is possible in the context of nuts and bolts (say, when there is a matching between nuts and bolts), since such an algorithm (if possible) would immediately yield another optimal algorithm for sorting nuts and bolts. By using the graphs of Lemma 2.1 in some interesting way and by using some technique of [1], we have found an O(n (log log n)^2)-time algorithm for selecting a median nut and a median bolt. This also gives an O(n log n (log log n)^2)-time algorithm for sorting or matching nuts and bolts. One nice property of this algorithm is that the constant factors behind the O-notations are reasonable, as opposed to the prohibitively large constant involved in our AKS-based approach. Details of our median-selection algorithm are omitted.

Acknowledgment

We thank Noga Alon for telling us the problem before [3] was published. We thank Greg Plaxton for stimulating discussions on the design of the partition scheme described in §2.2.3. We thank Yonatan Aumann, Nabil Kahale, and Tom Leighton for helpful conversations.

References

[1] M. Ajtai, J. Komlós, W. L. Steiger, and E. Szemerédi. Optimal parallel selection has complexity O(log log N). Journal of Computer and System Sciences, 38(1):125-133, 1989. The conference version appears in Proceedings of the 18th Annual ACM Symposium on the Theory of Computing, pages 188-195, 1986.

[2] M. Ajtai, J. Komlós, and E. Szemerédi. Sorting in c log n parallel steps. Combinatorica, 3(1):1-19, 1983. See also the conference version, which appears in Proceedings of the 15th Annual ACM Symposium on the Theory of Computing, pages 1-9, May 1983.

[3] N. Alon, M. Blum, A. Fiat, S. Kannan, M. Naor, and R. Ostrovsky. Matching nuts and bolts. In Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 690-696, January 1994.

[4] Y. Aumann. Personal communication, 1994.

[5] M. Blum, R. Floyd, V. Pratt, R. Rivest, and R. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7:448-461, 1973.

[6] P. Bradford and R. Fleischer. Matching nuts and bolts faster. Technical Report MPI-I-95-1-003, Max-Planck-Institut für Informatik, May 1995. An updated version appears in Proceedings of the Sixth International Symposium on Algorithms and Computation (ISAAC 95).

[7] M. S. Paterson. Improved sorting networks with O(log N) depth. Algorithmica, 5:75-92, 1990.

[8] G. J. E. Rawlins. Compared to What? An Introduction to the Analysis of Algorithms. Computer Science Press, 1991.

[9] L. G. Valiant. Parallelism in comparison problems. SIAM J. Comput., 4:348-355, 1975.