
DATA STRUCTURES

Syllabus

Unit 1:
Introduction: need for data structures - data types - abstract data types - types of data structures - algorithm analysis - problem solving - categories of problem solving - problem-solving strategies - asymptotic notation. Linked lists - types of linked lists: singly, doubly, circular - primitive operations: creation, insertion, deletion and traversal.

Unit 2:
Stacks: definition - primitive operations: push, pop - representation using arrays and linked lists - applications: well-formedness of parentheses, evaluation of postfix expressions, conversion of infix to postfix forms, recursive functions, Tower of Hanoi. Queues: definition - primitive operations - representation using arrays and linked lists - circular queues - deques.

Unit 3:
Trees: hierarchical relations - binary trees - types of binary trees: complete binary trees, skew trees - representation using arrays and linked lists - binary tree traversal: inorder, preorder, postorder - breadth-first traversal - expression trees - binary search trees (BST) - primitive operations on BSTs.

Unit 4:
Sorting: types - bubble sort - insertion sort - shell sort - merge sort - quick sort - heap sort - radix sort - complexity of sorting algorithms - comparison.

Unit 5:
Graphs: directed graphs - undirected graphs - weighted graphs - representation of graphs: adjacency matrix, adjacency list - BFS - DFS - shortest paths - spanning trees - minimum spanning trees.

UNIT 1: Introduction to Data Structures


Data structures are objects that we create to store data in; the stored data may consist of more than one variable (arrays can be considered built-in data structures), and a data structure also comes with algorithms to process the stored data.

The Need for Data Structures
Data structures organize data so that programs can be more efficient:
1. More powerful computers enable more complex applications.
2. More complex applications demand more calculations.
3. Complex computing tasks are unlike our everyday experience, so we need special organizations for our data.
Any organization for a collection of records can be searched, processed in any order, or modified. [Ex: a simple unordered array of records.] The choice of data structure and algorithm can make the difference between a program running in a few seconds and one running for many days.

Definitions
A type is a set of values. A data type is a type together with a collection of operations that manipulate the type. A data item or element is a piece of information or a record, the physical instantiation of a type; a data item is said to be a member of a data type. A simple data item contains no subparts. [Ex: an integer] An aggregate data item may contain several pieces of information. [Ex: a payroll record, a city database record]

Abstract Data Types
An Abstract Data Type (ADT) is a definition of a data type solely in terms of a set of values and a set of operations on that data type. Each ADT operation is defined by its inputs and outputs. Encapsulation means hiding the implementation details. A data structure is the physical implementation of an ADT; each operation associated with the ADT is implemented by one or more subroutines in the implementation. "Data structure" usually refers to an organization for data in main memory. A file structure is an organization for data on peripheral storage, such as a disk drive or tape. An ADT manages complexity through abstraction. [Hierarchies of labels]
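To make the distinction between an ADT and its implementation concrete, here is a small C++ sketch of our own (IntBag and ArrayIntBag are illustrative names, not from the notes): the abstract class states only the operations, while a separate class supplies one possible physical implementation.

#include <stdexcept>

class IntBag {                          // the ADT: a set of operations on values
public:
    virtual ~IntBag() {}
    virtual void insert(int x) = 0;     // defined only by inputs and outputs
    virtual bool contains(int x) const = 0;
};

class ArrayIntBag : public IntBag {     // the data structure: one physical implementation
    int items[100];
    int count = 0;
public:
    void insert(int x) override {
        if (count == 100) throw std::overflow_error("bag full");
        items[count++] = x;
    }
    bool contains(int x) const override {
        for (int i = 0; i < count; ++i)
            if (items[i] == x) return true;
        return false;
    }
};

A different implementation (say, a linked list or a hash table) could replace ArrayIntBag without changing any client code, which is exactly the encapsulation described above.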

Defining an abstract data type (ADT)


An abstract data type is defined as a mathematical model of the data objects that make up a data type, as well as the functions that operate on these objects. There are no standard conventions for defining them. A broad division may be drawn between "imperative" and "functional" definition styles.

Imperative abstract data type definitions


In the "imperative" view, which is closer to the philosophy of imperative programming languages, an abstract data structure is conceived as an entity that is mutable meaning that it may be in different states at different times. Some operations may change the state of the ADT; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times just like the instructions of a computer, or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are executed or applied, rather than evaluated. The imperative style is often used when describing abstract algorithms.[2] Abstract variable Imperative ADT definitions often depend on the concept of an abstract variable, which may be regarded as the simplest non-trivial ADT. An abstract variable V is a mutable entity that admits two operations:
store(V, x), where x is a value of unspecified nature; and
fetch(V), that yields a value;

with the constraint that

fetch(V) always returns the value x used in the most recent store(V, x) operation on the same variable V.

As in many programming languages, the operation store(V, x) is often written V ← x (or some similar notation), and fetch(V) is implied whenever a variable V is used in a context where a value is required. Thus, for example, V ← V + 1 is commonly understood to be a shorthand for store(V, fetch(V) + 1). In this definition, it is implicitly assumed that storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that if U and V are distinct variables, the sequence { store(U, x); store(V, y) } is equivalent to { store(V, y); store(U, x) }.
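As a rough C++ sketch of these semantics (our own illustration, not part of the source text), an abstract variable with store and fetch might be modeled like this:

template <typename X>                  // X is the range (type) of the variable
class AbstractVariable {
    X value{};                         // current state; unspecified until the first store
public:
    void store(const X& x) { value = x; }   // store(V, x)
    X fetch() const { return value; }       // fetch(V); undefined by the ADT before any store
};

// The shorthand V <- V + 1 then reads as store(V, fetch(V) + 1):
//     AbstractVariable<int> V;
//     V.store(5);
//     V.store(V.fetch() + 1);   // V now holds 6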

More generally, ADT definitions often assume that any operation that changes the state of one ADT instance has no effect on the state of any other instance (including other instances of the same ADT) unless the ADT axioms imply that the two instances are connected (aliased) in that sense. For example, when extending the definition of abstract variable to include abstract records, the operation that selects a field from a record variable R must yield a variable V that is aliased to that part of R. The definition of an abstract variable V may also restrict the stored values x to members of a specific set X, called the range or type of V. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve their readability. Note that this definition does not imply anything about the result of evaluating fetch(V) when V is un-initialized, that is, before performing any store operation on V. An algorithm that does so is usually considered invalid, because its effect is not defined. (However, there are some important algorithms whose efficiency strongly depends on the assumption that such a fetch is legal, and returns some arbitrary value in the variable's range.)

Instance creation
Some algorithms need to create new instances of some ADT (such as new variables, or new stacks). To describe such algorithms, one usually includes in the ADT definition a create() operation that yields an instance of the ADT, usually with axioms equivalent to "the result of create() is distinct from any instance S in use by the algorithm". This axiom may be strengthened to exclude also partial aliasing with other instances. On the other hand, this axiom still allows implementations of create() to yield a previously created instance that has become inaccessible to the program.

Preconditions, postconditions, and invariants
In imperative-style definitions, the axioms are often expressed by preconditions, which specify when an operation may be executed; postconditions, which relate the states of the ADT before and after the execution of each operation; and invariants, which specify properties of the ADT that are not changed by the operations.

Example: abstract stack (imperative)
As another example, an imperative definition of an abstract stack could specify that the state of a stack S can be modified only by the operations
push(S, x), where x is some value of unspecified nature; and
pop(S), that yields a value as a result;

with the constraint that

For any value x and any abstract variable V, the sequence of operations { push(S, x); V ← pop(S) } is equivalent to { V ← x }.

Since the assignment { V ← x }, by definition, cannot change the state of S, this condition implies that { V ← pop(S) } restores S to the state it had before the { push(S, x) }. From this condition and from the properties of abstract variables, it follows, for example, that the sequence

{ push(S, x); push(S, y); U ← pop(S); push(S, z); V ← pop(S); W ← pop(S) }

where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to

{ U ← y; V ← z; W ← x }

Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is: for any values x, y and any distinct stacks S and T, the sequence { push(S, x); push(T, y) } is equivalent to { push(T, y); push(S, x) }. A stack ADT definition usually also includes a Boolean-valued function empty(S) and a create() operation that returns a stack instance, with axioms equivalent to
create() ≠ S for any stack S (a newly created stack is distinct from all previous stacks);
empty(create()) (a newly created stack is empty);
not empty(push(S, x)) (pushing something onto a stack makes it non-empty).

Single-instance style
Sometimes an ADT is defined as if only one instance of it existed during the execution of the algorithm, and all operations were applied to that instance, which is not explicitly notated. For example, the abstract stack above could have been defined with operations push(x) and pop(), that operate on "the" only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like S in the previous example) to every operation that uses or modifies the implicit instance. On the other hand, some ADTs cannot be meaningfully defined without assuming multiple instances. This is the case when a single operation takes two distinct instances of the ADT as parameters. For an example, consider augmenting the definition of the stack ADT with an operation compare(S, T) that checks whether the stacks S and T contain the same items in the same order.
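A minimal C++ sketch of this imperative stack ADT, assuming a linked-list representation (the class and member names are ours): push and pop mutate the instance's state, matching the axioms above.

#include <stdexcept>

class LinkedStack {
    struct Node { int value; Node* next; };
    Node* topNode = nullptr;
public:
    bool empty() const { return topNode == nullptr; }    // empty(S)
    void push(int x) { topNode = new Node{x, topNode}; } // push(S, x)
    int pop() {                                          // pop(S): yields a value and
        if (empty()) throw std::underflow_error("pop on empty stack");
        Node* old = topNode;                             // restores the pre-push state
        int x = old->value;
        topNode = old->next;
        delete old;
        return x;
    }
    ~LinkedStack() { while (!empty()) pop(); }
};

// Checking the first axiom by hand:
//     LinkedStack S;        // create(): a fresh, empty stack
//     S.push(7);
//     int V = S.pop();      // { push(S,7); V <- pop(S) } behaves as { V <- 7 }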

Functional ADT definitions


Another way to define an ADT, closer to the spirit of functional programming, is to consider each state of the structure as a separate entity. In this view, any operation that modifies the ADT is modeled as a mathematical function that takes the old state as an argument and returns the new state as part of the result. Unlike the "imperative" operations, these functions have no side effects. Therefore, the order in which they are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states). In the functional view, in particular, there is no way (or need) to define an "abstract variable" with the semantics of imperative variables (namely, with fetch and store operations). Instead of storing values into variables, one passes them as arguments to functions.

Example: abstract stack (functional)
For example, a complete functional-style definition of a stack ADT could use the three operations:
push: takes a stack state and an arbitrary value, returns a stack state;
top: takes a stack state, returns a value;
pop: takes a stack state, returns a stack state;

with the following axioms:


top(push(s, x)) = x (pushing an item onto a stack leaves it at the top)
pop(push(s, x)) = s (pop undoes the effect of push)

In a functional-style definition there is no need for a create operation. Indeed, there is no notion of "stack instance". The stack states can be thought of as being potential states of a single stack structure, and two stack states that contain the same values in the same order are considered to be identical states. This view actually mirrors the behavior of some concrete implementations, such as linked lists with hash cons. Instead of create(), a functional definition of a stack ADT may assume the existence of a special stack state, the empty stack, designated by a special symbol such as Λ or "()", or define a bottom() operation that takes no arguments and returns this special stack state. Note that the axioms imply that

push(Λ, x) ≠ Λ

In a functional definition of a stack one does not need an empty predicate: instead, one can test whether a stack is empty by testing whether it is equal to Λ.

Note that these axioms do not define the effect of top(s) or pop(s), unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations are undefined (hence invalid) when s = Λ. On the other hand, the axioms (and the lack of side effects) imply that push(s, x) = push(t, y) if and only if x = y and s = t. As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In the stack ADT example above, this rule means that every stack is a finite sequence of values, that becomes the empty stack (Λ) after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be popped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of pops). In particular, they do not exclude states s such that pop(s) = s or push(s, x) = s for some x. However, since one cannot obtain such stack states with the given operations, they are assumed "not to exist".
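A hedged C++ sketch of this functional view (our own illustration, not from the text): each operation returns a new stack state and leaves its argument untouched, and shared_ptr provides the structural sharing that makes the "new state" cheap.

#include <memory>
#include <stdexcept>

struct StackState {
    int head;
    std::shared_ptr<const StackState> tail;    // immutable, shared tail
};
using Stack = std::shared_ptr<const StackState>;

Stack empty_stack() { return nullptr; }        // the empty state (the Λ above)

Stack push(Stack s, int x) {                   // no side effects: returns a new state
    return std::make_shared<const StackState>(StackState{x, s});
}

int top(Stack s) {
    if (!s) throw std::logic_error("top of the empty stack is undefined");
    return s->head;                            // axiom: top(push(s, x)) = x
}

Stack pop(Stack s) {
    if (!s) throw std::logic_error("pop of the empty stack is undefined");
    return s->tail;                            // axiom: pop(push(s, x)) = s
}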

Typical operations
Some operations that are often specified for ADTs (possibly under other names) are
compare(s, t), that tests whether two structures are equivalent in some sense;
hash(s), that computes some standard hash function from the instance's state;
print(s) or show(s), that produces a human-readable representation of the structure's state.

In imperative-style ADT definitions, one often also finds:

create(), that yields a new instance of the ADT;
initialize(s), that prepares a newly-created instance s for further operations, or resets it to some "initial state";
copy(s, t), that puts instance s in a state equivalent to that of t;
clone(t), that performs s ← new(), copy(s, t), and returns s;
free(s) or destroy(s), that reclaims the memory and other resources used by s.

The free operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by free.

Examples
Some common ADTs, which have proved useful in a great variety of applications, are:

Container, Deque, List, Map, Multimap, Multiset, Priority queue, Queue, Set, Stack, String, Tree.

Each of these ADTs may be defined in many ways and variants, not necessarily equivalent. For example, a stack ADT may or may not have a count operation that tells how many items have been pushed and not yet popped. This choice makes a difference not only for its clients but also for the implementation.

Types of Data Structures
Data items have both a logical and a physical form.
Logical form: the definition of the data item within an ADT. [Ex: integers in the mathematical sense, with operations such as + and -]
Physical form: the implementation of the data item within a data structure. [Ex: 16- or 32-bit integers]

Data type (ADT): a type together with its operations; data items in their logical form.
Data structure: storage space plus subroutines; data items in their physical form.

Problems

A problem is a task to be performed. It is best thought of as inputs and matching outputs. A problem definition should include constraints on the resources that may be consumed by any acceptable solution, but no constraints on HOW the problem is solved.

Problems can be viewed as mathematical functions:
A function is a matching between inputs (the domain) and outputs (the range).
An input to a function may be a single number, or a collection of information.
The values making up an input are called the parameters of the function.
A particular input must always result in the same output every time the function is computed.

Algorithm Analysis
What is an algorithm and why do we want to analyze one? An algorithm is "a step-by-step procedure for accomplishing some end."[9] An algorithm can be given in many ways. For example, it can be written down in English (or French, or any other "natural" language). However, we are interested in algorithms which have been precisely specified using an appropriate mathematical formalism, such as a programming language. Given such an expression of an algorithm, what can we do with it? Well, obviously we can run the program and observe its behavior. This is not likely to be very useful or informative in the general case. If we run a particular program on a particular computer with a particular set of inputs, then all we know is the behavior of the program in a single instance. Such knowledge is anecdotal and we must be careful when drawing conclusions based upon anecdotal evidence. In order to learn more about an algorithm, we can "analyze" it. By this we mean to study the specification of the algorithm and to draw conclusions about how the implementation of that algorithm (the program) will perform in general. But what can we analyze? We can:

determine the running time of a program as a function of its inputs;
determine the total or maximum memory space needed for program data;
determine the total size of the program code;
determine whether the program correctly computes the desired result;
determine the complexity of the program, e.g., how easy is it to read, understand, and modify; and
determine the robustness of the program, e.g., how well does it deal with unexpected or erroneous inputs?

In this text, we are concerned primarily with the running time. We also consider the memory space needed to execute the program. There are many factors that affect the

running time of a program. Among these are the algorithm itself, the input data, and the computer system used to run the program. The performance of a computer is determined by:

the hardware: processor used (type and speed), memory available (cache and RAM), and disk available;
the programming language in which the algorithm is specified;
the language compiler/interpreter used; and
the computer operating system software.

A detailed analysis of the performance of a program which takes all of these factors into account is a very difficult and time-consuming undertaking. Furthermore, such an analysis is not likely to have lasting significance. The rapid pace of change in the underlying technologies means that results of such analyses are not likely to be applicable to the next generation of hardware and software. In order to overcome this shortcoming, we devise a "model" of the behavior of a computer with the goals of simplifying the analysis while still producing meaningful results. The next section introduces the first in a series of such models.

Asymptotic Notation
Introduction
A problem may have numerous algorithmic solutions. In order to choose the best algorithm for a particular task, you need to be able to judge how long a particular solution will take to run. Or, more accurately, you need to be able to judge how long two solutions will take to run, and choose the better of the two. You don't need to know how many minutes and seconds they will take, but you do need some way to compare algorithms against one another. Asymptotic complexity is a way of expressing the main component of the cost of an algorithm, using idealized units of computational work. Consider, for example, the algorithm for sorting a deck of cards, which proceeds by repeatedly searching through the deck for the lowest card. The asymptotic complexity of this algorithm is the square of the number of cards in the deck. This quadratic behavior is the main term in the complexity formula; it says, e.g., that if you double the size of the deck, then the work is roughly quadrupled. The exact formula for the cost is more complex, and contains more details than are needed to understand the essential complexity of the algorithm. With our deck of cards, in the worst case, the deck would start out reverse-sorted, so our scans would have to go

all the way to the end. The first scan would involve scanning 52 cards, the next would take 51, etc. So the cost formula is 52 + 51 + ... + 2. Generally, letting N be the number of cards, the formula is 2 + 3 + ... + N, which equals (N(N + 1)/2) − 1 = ((N² + N)/2) − 1 = (1/2)N² + (1/2)N − 1. But the N² term dominates the expression, and this is what is key for comparing algorithm costs. (This is in fact an expensive algorithm; the best sorting algorithms run in sub-quadratic time.) Asymptotically speaking, in the limit as N tends towards infinity, 2 + 3 + ... + N gets closer and closer to the pure quadratic function (1/2)N². And what difference does the constant factor of 1/2 make, at this level of abstraction? So the behavior is said to be O(N²). Now let us consider how we would go about comparing the complexity of two algorithms. Let f(n) be the cost, in the worst case, of one algorithm, expressed as a function of the input size n, and g(n) be the cost function for the other algorithm. E.g., for sorting algorithms, f(10) and g(10) would be the maximum number of steps that the algorithms would take on a list of 10 items. If, for all values of n >= 0, f(n) is less than or equal to g(n), then the algorithm with complexity function f is strictly faster. But, generally speaking, our concern for computational cost is for the cases with large inputs; so the comparison of f(n) and g(n) for small values of n is less significant than the "long term" comparison of f(n) and g(n), for n larger than some threshold. Note that we have been speaking about bounds on the performance of algorithms, rather than giving exact speeds. The actual number of steps required to sort our deck of cards (with our naive quadratic algorithm) will depend upon the order in which the cards begin. The actual time to perform each of our steps will depend upon our processor speed, the condition of our processor cache, etc., etc. It's all very complicated in the concrete details, and moreover not relevant to the essence of the algorithm.
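To make the formula tangible, here is a small instrumented C++ sketch of our own (not from the text): it runs the repeated-scan sort on a reverse-sorted 52-card deck and counts the cards examined, printing 1377, which is exactly (1/2)·52² + (1/2)·52 − 1.

#include <iostream>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> deck(52);
    for (int i = 0; i < 52; ++i) deck[i] = 52 - i;       // worst case: reverse-sorted
    long examined = 0;
    for (std::size_t i = 0; i + 1 < deck.size(); ++i) {  // one scan per card placed
        std::size_t lowest = i;
        for (std::size_t j = i; j < deck.size(); ++j) {  // scan all remaining cards
            ++examined;                                   // one card looked at
            if (deck[j] < deck[lowest]) lowest = j;
        }
        std::swap(deck[i], deck[lowest]);
    }
    std::cout << examined << " cards examined\n";         // prints: 1377 cards examined
    return 0;
}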

Big-O Notation
Definition
Big-O is the formal method of expressing the upper bound of an algorithm's running time. It's a measure of the longest amount of time it could possibly take for the algorithm to complete. More formally, for non-negative functions f(n) and g(n), if there exists an integer n0 and a constant c > 0 such that for all integers n > n0, f(n) ≤ cg(n), then f(n) is Big O of g(n). This is denoted as "f(n) = O(g(n))". If graphed, g(n) serves as an upper bound to the curve you are analyzing, f(n). Note that if f can take on finite values only (as it should happen normally) then this definition implies that there exists some constant C (potentially larger than c) such that for all values of n, f(n) ≤ Cg(n). An appropriate value for C is the maximum of c and the largest value of f(n)/g(n) over n ≤ n0.

Theory Examples
So, let's take an example of Big-O. Say that f(n) = 2n + 8, and g(n) = n². Can we find a constant n0, so that 2n + 8 ≤ n²? The number 4 works here, giving us 16 ≤ 16. For any number n greater than 4, this will still work. Since we're trying to generalize this for large values of n, and small values (1, 2, 3) aren't that important, we can say that f(n) is generally faster than g(n); that is, f(n) is bound by g(n), and will always be less than it. It could then be said that f(n) runs in O(n²) time: "f-of-n runs in Big-O of n-squared time". To find the upper bound (the Big-O time), assuming we know that f(n) is equal to (exactly) 2n + 8, we can take a few shortcuts. For example, we can remove all constants from the runtime; eventually, at some value of c, they become irrelevant. This makes f(n) = 2n. Also, for convenience of comparison, we remove constant multipliers; in this case, the 2. This makes f(n) = n. It could also be said that f(n) runs in O(n) time; that lets us put a tighter (closer) upper bound onto the estimate.

Practical Examples
O(n): printing a list of n items to the screen, looking at each item once.
O(log n): taking a list of items and cutting it in half repeatedly until there's only one item left.
O(n²): taking a list of n items, and comparing every item to every other item.
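The three practical examples above correspond to loop shapes like the following (illustrative C++ of our own):

#include <cstdio>

void print_all(const int* a, int n) {           // O(n): look at each item once
    for (int i = 0; i < n; ++i)
        std::printf("%d\n", a[i]);
}

int halvings(int n) {                           // O(log n): cut the range in half repeatedly
    int steps = 0;
    while (n > 1) { n /= 2; ++steps; }
    return steps;
}

int equal_pairs(const int* a, int n) {          // O(n^2): compare every item to every other
    int pairs = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (a[i] == a[j]) ++pairs;
    return pairs;
}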

Big-Omega Notation
For non-negative functions f(n) and g(n), if there exists an integer n0 and a constant c > 0 such that for all integers n > n0, f(n) ≥ cg(n), then f(n) is omega of g(n). This is denoted as "f(n) = Ω(g(n))". This is almost the same definition as Big-O, except that here "f(n) ≥ cg(n)" makes g(n) a lower bound function, instead of an upper bound function. It describes the best that can happen for a given data size.

Theta Notation
For non-negative functions f(n) and g(n), f(n) is theta of g(n) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)). This is denoted as "f(n) = Θ(g(n))". This is basically saying that the function f(n) is bounded both from the top and from the bottom by the same function, g(n).

Little-O Notation
For non-negative functions f(n) and g(n), f(n) is little-o of g(n) if and only if f(n) = O(g(n)) but f(n) ≠ Θ(g(n)). This is denoted as "f(n) = o(g(n))". This represents a loose bounding version of Big-O: g(n) bounds from the top, but it does not bound the bottom.

Little Omega Notation
For non-negative functions f(n) and g(n), f(n) is little omega of g(n) if and only if f(n) = Ω(g(n)) but f(n) ≠ Θ(g(n)). This is denoted as "f(n) = ω(g(n))". Much like little-o, this is the equivalent for Big Omega: g(n) is a loose lower bound of the function f(n); it bounds from the bottom, but not from the top.

How asymptotic notation relates to analyzing complexity


Temporal comparison is not the only issue in algorithms. There are space issues as well. Generally, a trade off between time and space is noticed in algorithms. Asymptotic notation empowers you to make that trade off. If you think of the amount of time and space your algorithm uses as a function of your data over time or space (time and space are usually analyzed separately), you can analyze how the time and space is handled when you introduce more data to your program. This is important in data structures because you want a structure that behaves efficiently as you increase the amount of data it handles. Keep in mind though that algorithms that are efficient with large amounts of data are not always simple and efficient for small amounts of data. So if you know you are working with only a small amount of data and you have concerns for speed and code space, a trade off can be made for a function that does not behave well for large amounts of data.

A few examples of asymptotic notation


Generally, we use asymptotic notation as a convenient way to examine what can happen in a function in the worst case or in the best case. For example, if you want to write a function that searches through an array of numbers and returns the smallest one:
function find-min(array a[1..n])
  let j := ∞
  for i := 1 to n:
    j := min(j, a[i])
  repeat
  return j
end

Regardless of how big or small the array is, every time we run find-min, we have to initialize the i and j integer variables and return j at the end. Therefore, we can just think of those parts of the function as constant and ignore them. So, how can we use asymptotic notation to discuss the find-min function? If we search through an array with 87 elements, then the for loop iterates 87 times, even if the very first element we hit turns out to be the minimum. Likewise, for n elements, the for loop iterates n times. Therefore we say the function runs in time O(n). What about this function:
function find-min-plus-max(array a[1..n])
  // First, find the smallest element in the array
  let j := ∞
  for i := 1 to n:
    j := min(j, a[i])
  repeat
  let minim := j

  // Now, find the biggest element, add it to the smallest and
  let j := -∞
  for i := 1 to n:
    j := max(j, a[i])
  repeat
  let maxim := j

  // return the sum of the two
  return minim + maxim
end

What's the running time for find-min-plus-max? There are two for loops that each iterate n times, so the running time is clearly O(2n). Because 2 is a constant, we throw it away and write the running time as O(n). Why can you do this? If you recall the definition of Big-O notation, the function whose bound you're testing can be multiplied by some constant. If f(x) = 2x, we can see that if g(x) = x, then the Big-O condition holds (with c = 2). Thus O(2n) = O(n). This rule is general for the various asymptotic notations.

Linked list
In computer science, a linked list is a data structure that consists of a sequence of data records such that in each record there is a field that contains a reference (i.e., a link) to the next record in the sequence.

A linked list whose nodes contain two fields: an integer value and a link to the next node

Linked lists are among the simplest and most common data structures; they provide an easy implementation for several important abstract data structures, including stacks, queues, associative arrays, and symbolic expressions. The principal benefit of a linked list over a conventional array is that the order of the linked items may be different from the order that the data items are stored in memory or on disk. For that reason, linked lists allow insertion and removal of nodes at any point in the list, with a constant number of operations. On the other hand, linked lists by themselves do not allow random access to the data, or any form of efficient indexing. Thus, many basic operations such as obtaining the last node of the list, or finding a node that contains a given datum, or locating the place where a new node should be inserted may require scanning most of the list elements.

Linear and circular lists


In the last node of a list, the link field often contains a null reference, a special value that is interpreted by programs as meaning "there is no such node". A less common convention is to make it point to the first node of the list; in that case the list is said to be circular or circularly linked; otherwise it is said to be open or linear.

A circular linked list

Singly-, doubly-, and multiply-linked lists


Singly-linked lists contain nodes which have a data field as well as a next field, which points to the next node in the linked list.

A singly-linked list whose nodes contain two fields: an integer value and a link to the next node

In a doubly-linked list, each node contains, besides the next-node link, a second link field pointing to the previous node in the sequence. The two links may be called forward(s) and backwards, or next and prev(ious).

A doubly-linked list whose nodes contain three fields: an integer value, the link forward to the next node, and the link backward to the previous node

The technique known as XOR-linking allows a doubly-linked list to be implemented using a single link field in each node. However, this technique requires the ability to do bit operations on addresses, and therefore may not be available in some high-level languages. In a multiply-linked list, each node contains two or more link fields, each field being used to connect the same set of data records in a different order (e.g., by name, by department, by date of birth, etc.). (While doubly-linked lists can be seen as special cases of multiply-linked list, the fact that the two orders are opposite to each other leads to simpler and more efficient algorithms, so they are usually treated as a separate case.) In the case of a doubly circular linked list, the only change that occurs is the end, or "tail" of the said list is linked back to the front, "head", of the list and vice versa.
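As a hedged illustration of the XOR-linking technique mentioned above (a sketch, not a portable guarantee; the uintptr_t casts are one way to do the required address arithmetic in C++):

#include <cstdint>

struct XorNode {
    int value;
    std::uintptr_t link;   // address(prev) XOR address(next): one field instead of two
};

// Walk forward from head; the previous node starts as null (address 0).
void traverse(XorNode* head) {
    XorNode* prev = nullptr;
    XorNode* cur = head;
    while (cur != nullptr) {
        // ... do something with cur->value ...
        XorNode* next = reinterpret_cast<XorNode*>(
            cur->link ^ reinterpret_cast<std::uintptr_t>(prev));
        prev = cur;
        cur = next;
    }
}

Walking backwards uses the same recovery step with the roles of the two neighbors swapped, which is the point of the trick.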

Sentinel nodes
In some implementations, an extra sentinel or dummy node may be added before the first data record and/or after the last one. This convention simplifies and accelerates some list-handling algorithms, by ensuring that all links can be safely dereferenced and that every list (even one that contains no data elements) always has a "first" and "last" node.

Empty lists
An empty list is a list that contains no data records. This is usually the same as saying that it has zero nodes. If sentinel nodes are being used, the list is usually said to be empty when it has only sentinel nodes.

Hash linking
The link fields need not be physically part of the nodes. If the data records are stored in an array and referenced by their indices, the link field may be stored in a separate array with the same indices as the data records.

List handles
Since a reference to the first node gives access to the whole list, that reference is often called the address, pointer, or handle of the latter. Algorithms that manipulate linked lists usually get such handles to the input lists and return the handles to the resulting lists. In fact, in the context of such algorithms, the word "list" often means "list handle". In some situations, however, it may be convenient to refer to a list by a handle that consists of two links, pointing to its first and last nodes.

Combining alternatives

The alternatives listed above may be arbitrarily combined in almost every way, so one may have circular doubly-linked lists without sentinels, circular singly-linked lists with sentinels, etc.

Tradeoffs
As with most choices in computer programming and design, no method is well suited to all circumstances. A linked list data structure might work well in one case, but cause problems in another. This is a list of some of the common tradeoffs involving linked list structures.

Linked lists vs. dynamic arrays


                                  Linked list          Array   Dynamic array
Indexing                          Θ(n)                 Θ(1)    Θ(1)
Insertion/deletion at beginning   Θ(1)                 N/A     Θ(n)
Insertion/deletion at end         Θ(1)                 N/A     Θ(1)
Insertion/deletion in middle      search time + Θ(1)   N/A     Θ(n)
Wasted space (average)            Θ(n)                 0       Θ(n)

A dynamic array is a data structure that allocates all elements contiguously in memory, and keeps a count of the current number of elements. If the space reserved for the dynamic array is exceeded, it is reallocated and (possibly) copied, an expensive operation. Linked lists have several advantages over dynamic arrays. Insertion of an element at a specific point of a list is a constant-time operation, whereas insertion in a dynamic array at random locations will require moving half of the elements on average, and all the elements in the worst case. While one can "delete" an element from an array in constant time by somehow marking its slot as "vacant", this causes fragmentation that impedes the performance of iteration. Moreover, arbitrarily many elements may be inserted into a linked list, limited only by the total memory available; while a dynamic array will eventually fill up its underlying array data structure and have to reallocate (an expensive operation, one that may not even be possible if memory is fragmented). Similarly, an array from which many elements are removed may have to be resized in order to avoid wasting too much space. On the other hand, dynamic arrays (as well as fixed-size array data structures) allow constant-time random access, while linked lists allow only sequential access to elements. Singly-linked lists, in fact, can only be traversed in one direction. This makes linked lists unsuitable for applications where it's useful to look up an element by its index quickly,

such as heapsort. Sequential access on arrays and dynamic arrays is also faster than on linked lists on many machines, because they have optimal locality of reference and thus make good use of data caching. Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. In contrast, a dynamic array requires only the space for the data itself (and a very small amount of control data). It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools. Some hybrid solutions try to combine the advantages of the two representations. Unrolled linked lists store several elements in each list node, increasing cache performance while decreasing memory overhead for references. CDR coding does both these as well, by replacing references with the actual data referenced, which extends off the end of the referencing record. A good example that highlights the pros and cons of using dynamic arrays vs. linked lists is a program that resolves the Josephus problem. The Josephus problem is an election method that works by having a group of people stand in a circle. Starting at a predetermined person, you count around the circle n times. Once you reach the nth person, take them out of the circle and have the members close the circle. Then count around the circle the same n times and repeat the process, until only one person is left. That person wins the election. This shows the strengths and weaknesses of a linked list vs. a dynamic array: if you view the people as connected nodes in a circular linked list, then it shows how easily the linked list is able to delete nodes (as it only has to rearrange the links to the different nodes). However, the linked list will be poor at finding the next person to remove and will need to search through the list until it finds that person. A dynamic array, on the other hand, will be poor at deleting nodes (or elements) as it cannot remove one node without individually shifting all the elements up the list by one. However, it is exceptionally easy to find the nth person in the circle by directly referencing them by their position in the array. The list ranking problem concerns the efficient conversion of a linked list representation into an array. Although trivial for a conventional computer, solving this problem by a parallel algorithm is complicated and has been the subject of much research.
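For illustration, here is a compact C++ sketch of the Josephus election using std::list as the circle (the function name and signature are ours): erase is the constant-time unlink the text credits to linked lists, while the counting loop is the linear search it warns about.

#include <list>

int josephus_winner(int people, int n) {
    std::list<int> circle;
    for (int i = 1; i <= people; ++i) circle.push_back(i);
    auto it = circle.begin();                       // the predetermined starting person
    while (circle.size() > 1) {
        for (int count = 1; count < n; ++count) {   // count n people around the circle
            ++it;
            if (it == circle.end()) it = circle.begin();
        }
        it = circle.erase(it);                      // remove the nth person: O(1) unlink
        if (it == circle.end()) it = circle.begin();
    }
    return circle.front();                          // the winner
}

An array-based version would find the nth person by index arithmetic in O(1), but would pay O(people) to close each gap, which is the mirror-image trade-off described above.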

Singly-linked linear lists vs. other lists


While doubly-linked and/or circular lists have advantages over singly-linked linear lists, linear lists offer some advantages that make them preferable in some situations. For one thing, a singly-linked linear list is a recursive data structure, because it contains a pointer to a smaller object of the same type. For that reason, many operations on singly-linked linear lists (such as merging two lists, or enumerating the elements in reverse

order) often have very simple recursive algorithms, much simpler than any solution using iterative commands. While one can adapt those recursive solutions for doubly-linked and circularly-linked lists, the procedures generally need extra arguments and more complicated base cases. Linear singly-linked lists also allow tail-sharing, the use of a common final portion of sub-list as the terminal portion of two different lists. In particular, if a new node is added at the beginning of a list, the former list remains available as the tail of the new one: a simple example of a persistent data structure. Again, this is not true with the other variants: a node may never belong to two different circular or doubly-linked lists. In particular, end-sentinel nodes can be shared among singly-linked non-circular lists. One may even use the same end-sentinel node for every such list. In Lisp, for example, every proper list ends with a link to a special node, denoted by nil or (), whose CAR and CDR links point to itself. Thus a Lisp procedure can safely take the CAR or CDR of any list. Indeed, the advantages of the fancy variants are often limited to the complexity of the algorithms, not to their efficiency. A circular list, in particular, can usually be emulated by a linear list together with two variables that point to the first and last nodes, at no extra cost.

Doubly-linked vs. singly-linked


Doubly-linked lists require more space per node (unless one uses XOR-linking), and their elementary operations are more expensive; but they are often easier to manipulate because they allow sequential access to the list in both directions. In particular, one can insert or delete a node in a constant number of operations given only that node's address. To do the same in a singly-linked list, one must have the previous node's address. Some algorithms require access in both directions. On the other hand, doubly-linked lists do not allow tail-sharing and cannot be used as persistent data structures.

Circularly-linked vs. linearly-linked


A circularly linked list may be a natural option to represent arrays that are naturally circular, e.g. the corners of a polygon, a pool of buffers that are used and released in FIFO order, or a set of processes that should be time-shared in round-robin order. In these applications, a pointer to any node serves as a handle to the whole list. With a circular list, a pointer to the last node gives easy access also to the first node, by following one link. Thus, in applications that require access to both ends of the list (e.g., in the implementation of a queue), a circular structure allows one to handle the structure by a single pointer, instead of two. A circular list can be split into two circular lists, in constant time, by giving the addresses of the last node of each piece. The operation consists in swapping the contents of the link fields of those two nodes. Applying the same operation to any two nodes in two distinct

lists joins the two lists into one. This property greatly simplifies some algorithms and data structures, such as the quad-edge and face-edge. The simplest representation for an empty circular list (when such a thing makes sense) has no nodes, and is represented by a null pointer. With this choice, many algorithms have to test for this special case, and handle it separately. By contrast, the use of null to denote an empty linear list is more natural and often creates fewer special cases.

Using sentinel nodes


Sentinel nodes may simplify certain list operations, by ensuring that the next and/or previous nodes exist for every element, and that even empty lists have at least one node. One may also use a sentinel node at the end of the list, with an appropriate data field, to eliminate some end-of-list tests. For example, when scanning the list looking for a node with a given value x, setting the sentinel's data field to x makes it unnecessary to test for end-of-list inside the loop. Another example is the merging of two sorted lists: if their sentinels have data fields set to +∞, the choice of the next output node does not need special handling for empty lists. However, sentinel nodes use up extra space (especially in applications that use many short lists), and they may complicate other operations (such as the creation of a new empty list). However, if the circular list is used merely to simulate a linear list, one may avoid some of this complexity by adding a single sentinel node to every list, between the last and the first data nodes. With this convention, an empty list consists of the sentinel node alone, pointing to itself via the next-node link. The list handle should then be a pointer to the last data node, before the sentinel, if the list is not empty; or to the sentinel itself, if the list is empty. The same trick can be used to simplify the handling of a doubly-linked linear list, by turning it into a circular doubly-linked list with a single sentinel node. However, in this case, the handle should be a single pointer to the dummy node itself.[3]

Linked list operations


When manipulating linked lists in-place, care must be taken to not use values that you have invalidated in previous assignments. This makes algorithms for inserting or deleting linked list nodes somewhat subtle. This section gives pseudocode for adding or removing nodes from singly, doubly, and circularly linked lists in-place. Throughout we will use null to refer to an end-of-list marker or sentinel, which may be implemented in a number of ways.

Linearly-linked lists
Singly-linked lists

Our node data structure will have two fields. We also keep a variable firstNode which always points to the first node in the list, or is null for an empty list.
record Node {
    data    // The data being stored in the node
    next    // A reference to the next node, null for last node
}

record List {
    Node firstNode   // points to first node of list; null for empty list
}

Traversal of a singly-linked list is simple, beginning at the first node and following each next link until we come to the end:
node := list.firstNode
while node not null {
    (do something with node.data)
    node := node.next
}

The following code inserts a node after an existing node in a singly linked list. The diagram shows how it works. Inserting a node before an existing one cannot be done directly; instead, you have to keep track of the previous node and insert a node after it.

function insertAfter(Node node, Node newNode) {   // insert newNode after node
    newNode.next := node.next
    node.next := newNode
}

Inserting at the beginning of the list requires a separate function. This requires updating firstNode.
function insertBeginning(List list, Node newNode) {   // insert node before current first node
    newNode.next := list.firstNode
    list.firstNode := newNode
}

Similarly, we have functions for removing the node after a given node, and for removing a node from the beginning of the list. The diagram demonstrates the former. To find and remove a particular node, one must again keep track of the previous element.

function removeAfter(Node node) {   // remove node past this one
    obsoleteNode := node.next
    node.next := node.next.next
    destroy obsoleteNode
}

function removeBeginning(List list) {   // remove first node
    obsoleteNode := list.firstNode
    list.firstNode := list.firstNode.next   // point past deleted node
    destroy obsoleteNode
}

Notice that removeBeginning() sets list.firstNode to null when removing the last node in the list. Since we can't iterate backwards, efficient "insertBefore" or "removeBefore" operations are not possible. Appending one linked list to another can be inefficient unless a reference to the tail is kept as part of the List structure, because we must traverse the entire first list in order to find the tail, and then append the second list to this. Thus, if two linearly-linked lists are each of length n, list appending has asymptotic time complexity of O(n). In the Lisp family of languages, list appending is provided by the append procedure. Many of the special cases of linked list operations can be eliminated by including a dummy element at the front of the list. This ensures that there are no special cases for the beginning of the list and renders both insertBeginning() and removeBeginning() unnecessary. In this case, the first useful data in the list will be found at list.firstNode.next.
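A brief C++ rendering of the dummy-element idea from the last paragraph (a sketch under our own naming): with a permanent dummy node at the front, insertion and removal at the head become ordinary insertAfter/removeAfter calls on the dummy, so no separate insertBeginning or removeBeginning is needed.

struct Node { int data; Node* next; };

struct List {
    Node dummy{0, nullptr};           // always present, even in an empty list
    Node* head() { return &dummy; }   // first useful node is dummy.next
};

void insertAfter(Node* node, Node* newNode) {
    newNode->next = node->next;
    node->next = newNode;
}

void removeAfter(Node* node) {
    Node* obsolete = node->next;
    node->next = obsolete->next;
    delete obsolete;
}

// Inserting at the beginning is now just:
//     insertAfter(list.head(), new Node{42, nullptr});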

Circularly-linked list
In a circularly linked list, all nodes are linked in a continuous circle, without using null. For lists with a front and a back (such as a queue), one stores a reference to the last node in the list. The next node after the last node is the first node. Elements can be added to the back of the list and removed from the front in constant time. Circularly-linked lists can be either singly or doubly linked. Both types of circularly-linked lists benefit from the ability to traverse the full list beginning at any given node. This often allows us to avoid storing firstNode and

lastNode, although if the list may be empty we need a special representation for the empty list, such as a lastNode variable which points to some node in the list or is null if it's empty; we use such a lastNode here. This representation significantly simplifies adding and removing nodes with a non-empty list, but empty lists are then a special case.

Algorithms
Assuming that someNode is some node in a non-empty circular singly-linked list, this code iterates through that list starting with someNode:
function iterate(someNode)
    if someNode ≠ null
        node := someNode
        do
            do something with node.value
            node := node.next
        while node ≠ someNode

Notice that the test "while node ≠ someNode" must be at the end of the loop. If it were replaced by the test "node ≠ someNode" at the beginning of the loop, the procedure would fail whenever the list had only one node. This function inserts a node "newNode" into a circular linked list after a given node "node". If "node" is null, it assumes that the list is empty.
function insertAfter(Node node, Node newNode)
    if node = null
        newNode.next := newNode
    else
        newNode.next := node.next
        node.next := newNode

Suppose that "L" is a variable pointing to the last node of a circular linked list (or null if the list is empty). To append "newNode" to the end of the list, one may do
insertAfter(L, newNode)
L := newNode

To insert "newNode" at the beginning of the list, one may do


insertAfter(L, newNode)
if L = null
    L := newNode

Linked lists using arrays of nodes

Languages that do not support any type of reference can still create links by replacing pointers with array indices. The approach is to keep an array of records, where each record has integer fields indicating the index of the next (and possibly previous) node in the array. Not all nodes in the array need be used. If records are also not supported, parallel arrays can often be used instead. As an example, consider the following linked list record that uses arrays instead of pointers:
record Entry {
    integer next    // index of next entry in array
    integer prev    // previous entry (if doubly-linked)
    string name
    real balance
}

By creating an array of these structures, and an integer variable to store the index of the first element, a linked list can be built:
integer listHead
Entry Records[1000]

Links between elements are formed by placing the array index of the next (or previous) cell into the Next or Prev field within a given element. For example:

Index          Next   Prev   Name               Balance
0              1      4      Jones, John        123.45
1              -1     0      Smith, Joseph      234.56
2 (listHead)   4      -1     Adams, Adam        0.00
3                            Ignore, Ignatius   999.99
4              0      2      Another, Anita     876.54
5
6
7

In the above example, ListHead would be set to 2, the location of the first entry in the list. Notice that entries 3 and 5 through 7 are not part of the list. These cells are available for any additions to the list. By creating a ListFree integer variable, a free list could be created to keep track of what cells are available. If all entries are in use, the size of the array would have to be increased or some elements would have to be deleted before new entries could be stored in the list. The following code would traverse the list and display names and account balances:

i := listHead
while i >= 0 {   // loop through the list
    print i, Records[i].name, Records[i].balance   // print entry
    i := Records[i].next
}

When faced with a choice, the advantages of this approach include:

The linked list is relocatable, meaning it can be moved about in memory at will, and it can also be quickly and directly serialized for storage on disk or transfer over a network.
Especially for a small list, array indexes can occupy significantly less space than a full pointer on many architectures.
Locality of reference can be improved by keeping the nodes together in memory and by periodically rearranging them, although this can also be done in a general store.
Naïve dynamic memory allocators can produce an excessive amount of overhead storage for each node allocated; almost no allocation overhead is incurred per node in this approach.
Seizing an entry from a pre-allocated array is faster than using dynamic memory allocation for each node, since dynamic memory allocation typically requires a search for a free memory block of the desired size.

This approach has one main disadvantage, however: it creates and manages a private memory space for its nodes. This leads to the following issues:

It increases the complexity of the implementation.
Growing a large array when it is full may be difficult or impossible, whereas finding space for a new linked list node in a large, general memory pool may be easier.
Adding elements to a dynamic array will occasionally (when it is full) unexpectedly take linear (O(n)) instead of constant time (although it's still an amortized constant).
Using a general memory pool leaves more memory for other data if the list is smaller than expected or if many nodes are freed.

For these reasons, this approach is mainly used for languages that do not support dynamic memory allocation. These disadvantages are also mitigated if the maximum size of the list is known at the time the array is created.
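As an illustration of the ListFree idea mentioned above, here is a hedged C++ sketch (identifiers are ours): all unused cells are chained into a free list, so allocating an entry pops an index off that list and deleting an entry pushes its index back.

const int CAPACITY = 1000;

struct Entry {
    int next;              // index of next entry; -1 marks end of a list
    const char* name;
    double balance;
};

Entry records[CAPACITY];
int listHead = -1;         // the live list, initially empty
int listFree = 0;          // head of the free list

void initFreeList() {      // chain every cell into the free list
    for (int i = 0; i < CAPACITY - 1; ++i) records[i].next = i + 1;
    records[CAPACITY - 1].next = -1;
}

int allocateEntry() {      // returns -1 when every cell is in use
    if (listFree == -1) return -1;
    int i = listFree;
    listFree = records[i].next;
    return i;
}

void freeEntry(int i) {    // return cell i to the free list
    records[i].next = listFree;
    listFree = i;
}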

Dynamic Memory
Until now, in all our programs, we have only had as much memory available as we declared for our variables, with the size of all of them determined in the source code, before the execution of the program. But what if we need a variable amount of memory that can only be determined during runtime? For example, in the case that we need some user input to determine the necessary amount of memory space.

The answer is dynamic memory, for which C++ integrates the operators new and delete.

Operators new and new[]


In order to request dynamic memory we use the operator new. new is followed by a data type specifier and -if a sequence of more than one element is required- the number of these within brackets []. It returns a pointer to the beginning of the new block of memory allocated. Its form is:

pointer = new type
pointer = new type [number_of_elements]


The first expression is used to allocate memory to contain one single element of type type. The second one is used to assign a block (an array) of elements of type type, where number_of_elements is an integer value representing the amount of these. For example:

int * bobby;
bobby = new int [5];

In this case, the system dynamically assigns space for five elements of type int and returns a pointer to the first element of the sequence, which is assigned to bobby. Therefore, now, bobby points to a valid block of memory with space for five elements of type int.

The first element pointed to by bobby can be accessed either with the expression bobby[0] or the expression *bobby. Both are equivalent, as explained in the section about pointers. The second element can be accessed either with bobby[1] or *(bobby+1), and so on. You may be wondering what the difference is between declaring a normal array and assigning dynamic memory to a pointer, as we have just done. The most important difference is that the size of an array has to be a constant value, which limits its size to what we decide at the moment of designing the program, before its execution, whereas dynamic memory allocation allows us to assign memory during the execution of the program (runtime) using any variable or constant value as its size. The dynamic memory requested by our program is allocated by the system from the memory heap. However, computer memory is a limited resource, and it can be exhausted. Therefore, it is important to have some mechanism to check if our request to allocate memory was successful or not. C++ provides two standard methods to check if the allocation was successful. One is by handling exceptions. Using this method, an exception of type bad_alloc is thrown when the allocation fails. Exceptions are a powerful C++ feature explained later in these tutorials. But for now you should know that if this exception is thrown and it is not handled by a specific handler, the program execution is terminated.

This exception method is the default method used by new, and is the one used in a declaration like:

bobby = new int [5];   // if it fails an exception is thrown

The other method is known as nothrow, and what happens when it is used is that when a memory allocation fails, instead of throwing a bad_alloc exception or terminating the program, the pointer returned by new is a null pointer, and the program continues its execution. This method can be specified by using a special object called nothrow, declared in header <new>, as argument for new:

bobby = new (nothrow) int [5];

In this case, if the allocation of this block of memory failed, the failure could be detected by checking if bobby took a null pointer value:

int * bobby;
bobby = new (nothrow) int [5];
if (bobby == 0) {
    // error assigning memory. Take measures.
}

This nothrow method requires more work than the exception method, since the value returned has to be checked after each and every memory allocation, but I will use it in our examples due to its simplicity. Anyway this method can become tedious for larger projects, where the exception method is generally preferred. The exception method will be explained in detail later in this tutorial.

Operators delete and delete[]


Since the necessity of dynamic memory is usually limited to specific moments within a program, once it is no longer needed it should be freed so that the memory becomes available again for other requests of dynamic memory. This is the purpose of the operator delete, whose format is:

delete pointer;
delete [] pointer;

The first expression should be used to delete memory allocated for a single element, and the second one for memory allocated for arrays of elements. The value passed as argument to delete must be either a pointer to a memory block previously allocated with new, or a null pointer (in the case of a null pointer, delete produces no effect).

// rememb-o-matic
#include <iostream>
#include <new>
using namespace std;

int main ()
{
  int i, n;
  int * p;
  cout << "How many numbers would you like to type? ";
  cin >> i;
  p = new (nothrow) int[i];
  if (p == 0)
    cout << "Error: memory could not be allocated";
  else
  {
    for (n = 0; n < i; n++)
    {
      cout << "Enter number: ";
      cin >> p[n];
    }
    cout << "You have entered: ";
    for (n = 0; n < i; n++)
      cout << p[n] << ", ";
    delete[] p;
  }
  return 0;
}

Sample run:

How many numbers would you like to type? 5
Enter number: 75
Enter number: 436
Enter number: 1067
Enter number: 8
Enter number: 32
You have entered: 75, 436, 1067, 8, 32,

Notice how the value within brackets in the new statement is a variable value entered by the user (i), not a constant value:

p= new (nothrow) int[i];

But the user could have entered a value for i so big that our system could not handle it. For example, when I tried to give a value of 1 billion to the "How many numbers" question, my system could not allocate that much memory for the program, and I got the text message we prepared for this case (Error: memory could not be allocated). Remember that if we had tried to allocate the memory without specifying the nothrow parameter in the new expression, an exception would have been thrown, which, if not handled, terminates the program.

It is good practice to always check whether a dynamic memory block was successfully allocated. Therefore, if you use the nothrow method, you should always check the value of the pointer returned. Otherwise, use the exception method, even if you do not handle the exception. This way, the program will terminate at that point without causing the unexpected results of continuing to execute code that assumes a block of memory was allocated when in fact it was not.

Dynamic memory in ANSI-C


Operators new and delete are exclusive to C++; they are not available in the C language. But using the pure C language and its library, dynamic memory can also be used through the functions malloc, calloc, realloc and free, which are also available in C++ by including the <cstdlib> header file (see cstdlib for more info). The memory blocks allocated by these functions are not necessarily compatible with those returned by new, so each one should be manipulated with its own set of functions or operators.
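As a quick illustration, here is a minimal C++ sketch of the same five-integer allocation using malloc and free (variable names are illustrative); note that memory obtained with malloc must be released with free, never with delete:

// C-style dynamic allocation through <cstdlib> in a C++ program
#include <iostream>
#include <cstdlib>
using namespace std;

int main ()
{
  // malloc returns NULL (not an exception) when the request fails
  int * p = (int *) malloc (5 * sizeof (int));
  if (p == NULL)
  {
    cout << "Error: memory could not be allocated";
    return 1;
  }
  for (int n = 0; n < 5; n++) p[n] = n * 10;
  for (int n = 0; n < 5; n++) cout << p[n] << ", ";
  free (p);   // pair malloc with free, never with delete
  return 0;
}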

UNIT 2:
STACK

A stack is a last in, first out (LIFO) abstract data type and data structure. A stack can have any abstract data type as an element, but is characterized by only two fundamental operations: push and pop. The push operation adds to the top of the list, hiding any items already on the stack, or initializing the stack if it is empty. The pop operation removes an item from the top of the list, and returns this value to the caller. A pop either reveals previously concealed items, or results in an empty list.

Simple representation of a stack

A stack is a restricted data structure, because only a small number of operations are performed on it. The nature of the pop and push operations also means that stack elements have a natural order. Elements are removed from the stack in the reverse order to the order of their addition: therefore, the lower elements are typically those that have been in the list the longest.

Typical Operations of a Stack


bool IsEmpty()        Determines whether the stack is empty
bool IsFull()         Determines whether the stack is full
Push(ItemType item)   Adds an item to the top of the stack
Pop()                 Removes the top item from the stack
ItemType Top()        Returns a copy of the top item on the stack

Stack operations in pseudo-code:

STACK-EMPTY(S)
  if top[S] = 0
    return true
  else return false

PUSH(S, x)
  top[S] <- top[S] + 1
  S[top[S]] <- x

POP(S)
  if STACK-EMPTY(S)
    then error "underflow"
  else
    top[S] <- top[S] - 1
    return S[top[S] + 1]

The running time for these operations is O(1) (constant time).

Array implementation.
Representing stacks with arrays is a natural idea. The first problem that you might encounter is implementing the constructor ArrayStackOfStrings(). An instance variable a[] with an array of strings to hold the stack items is clearly needed, but how big should it be? For the moment, we will finesse this problem by having the client provide an argument for the constructor that gives the maximum stack size. We keep the items in reverse order of their arrival. This policy allows us to add and remove items at the end without moving any of the other items in the stack.

We could hardly hope for a simpler implementation of ArrayStackOfStrings.java: all of the methods are one-liners! The instance variables are an array a[] that holds the items in the stack and an integer N that counts the number of items in the stack. To remove an item, we decrement N and then return a[N]; to insert a new item, we set a[N] equal to the new item and then increment N. These operations preserve the following properties:
the items in the array are in their insertion order
the stack is empty when the value of N is 0
the top of the stack (if it is nonempty) is at a[N-1]

The primary characteristic of this implementation is that the push and pop operations take constant time. The drawback of this implementation is that it requires the client to estimate the maximum size of the stack ahead of time and always uses space proportional to that maximum, which may be unreasonable in some situations.
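The following is a minimal C++ sketch of the fixed-capacity array stack just described; the class and member names are illustrative, not taken from the referenced Java program:

#include <iostream>
#include <string>

// Fixed-capacity stack of strings; the client supplies the maximum size.
class ArrayStack
{
    std::string* a;   // a[] holds the items, bottom to top
    int N;            // number of items currently on the stack
    int capacity;
public:
    ArrayStack(int max) : a(new std::string[max]), N(0), capacity(max) {}
    ~ArrayStack() { delete[] a; }
    bool isEmpty() const { return N == 0; }
    bool isFull() const { return N == capacity; }
    void push(const std::string& item) { a[N++] = item; }  // assumes not full
    std::string pop() { return a[--N]; }                   // assumes not empty
};

int main()
{
    ArrayStack s(10);
    s.push("to"); s.push("be"); s.push("or");
    while (!s.isEmpty()) std::cout << s.pop() << ' ';   // prints: or be to
    return 0;
}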

Implementing stacks with linked lists.


Program LinkedStackOfStrings.java uses a linked list to implement a stack of strings. The implementation is based on a nested class Node like the one we have been using. Java allows us to define and use other classes within class implementations in this natural way. The class is private because clients do not need to know any of the details of the linked lists.
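A minimal C++ sketch of the same idea follows (LinkedStack and its members are illustrative names, not the Java program's API); the top of the stack is the head of the list, so push and pop never traverse the list:

#include <iostream>
#include <string>

class LinkedStack
{
    struct Node            // nested node type, as in the Java version
    {
        std::string item;
        Node* next;
    };
    Node* first = nullptr; // top of the stack
public:
    bool isEmpty() const { return first == nullptr; }
    void push(const std::string& item) { first = new Node{item, first}; }
    std::string pop()      // assumes the stack is not empty
    {
        Node* old = first;
        std::string item = old->item;
        first = old->next;
        delete old;
        return item;
    }
};

int main()
{
    LinkedStack s;
    s.push("to"); s.push("be"); s.push("or");
    while (!s.isEmpty()) std::cout << s.pop() << ' ';   // prints: or be to
    return 0;
}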

Application: Determining Well-Formed Expressions with Parenthesis


The following section outlines an algorithm for determining whether or not an expression is well formed. Later, in the lab exercise, you will be given a chance to play with and modify this program. A classic use of a stack is for determining whether a set of parentheses is "well formed". What exactly do we mean by well formed? In the case of parenthesis pairs, for an expression to be well formed, a closing parenthesis symbol must match the last unmatched opening parenthesis symbol, and all parenthesis symbols must be matched when the input is finished. Consider the following table demonstrating well-formed versus ill-formed expressions:
Well-Formed Expressions    Ill-Formed Expressions
(xxx[])                    (xxx[)
()[]{}                     ((
{(xxxxx)()}                {xxxxx)()}

Why do we care about balancing parentheses? Because the compiler looks for balanced parentheses in situations such as:
nested components (such as for loops, if-then statements, and while loops): the compiler checks for matching {} pairs to denote blocks of code; if the pairs do not match, a compiler error will occur
mathematical expressions, for example a = b - {(x + y) * z + 3};
array notation [ ]
function calls and function definitions: the parameters must be surrounded by balanced "(" and ")"

2.1 The Algorithm


Given an expression with characters and parentheses, ( ), [ ], and { }, determine whether the expression is well formed by using the following algorithm in conjunction with a stack. Read in the expression character by character. As each character is read in:
If the character is an opening symbol ("(", "{", or "["), push it onto the stack.
Else if the character is a closing symbol (")", "}", or "]"), then:
- Check if the stack is empty. If it is, then there is no matching opening symbol: the expression is not well formed.
- Else, get the character at the top of the stack, pop the stack, and compare the current closing symbol to the popped symbol. If they balance each other, the expression is still balanced; continue looping (reading the next character).
If the end of the expression is reached and it is still balanced, print "expression is well formed"; otherwise, print "expression is not well formed".

The expression: (x{x[]}x)

yields the following computation


'('  : Push (
'x'  : Ignore
'{'  : Push {
'x'  : Ignore
'['  : Push [
']'  : Get top of stack, openSymbol='[' ; Pop ; compare: '[' matches ']'
'}'  : Get top of stack, openSymbol='{' ; Pop ; compare: '{' matches '}'
'x'  : Ignore
')'  : Get top of stack, openSymbol='(' ; Pop ; compare: '(' matches ')'
'\n' : Print "expression is well-formed"

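A minimal C++ sketch of this algorithm, using the standard library stack (isWellFormed is an illustrative name; the lab program may differ):

#include <iostream>
#include <stack>
#include <string>

// Returns true if every (, [, { in expr is matched in the right order.
bool isWellFormed(const std::string& expr)
{
    std::stack<char> st;
    for (char c : expr)
    {
        if (c == '(' || c == '[' || c == '{')
            st.push(c);                       // opening symbol: push it
        else if (c == ')' || c == ']' || c == '}')
        {
            if (st.empty()) return false;     // no matching opening symbol
            char open = st.top(); st.pop();
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{'))
                return false;                 // symbols do not balance
        }                                     // all other characters ignored
    }
    return st.empty();                        // well formed only if all matched
}

int main()
{
    std::cout << isWellFormed("(x{x[]}x)") << '\n';   // 1 (well formed)
    std::cout << isWellFormed("(xxx[)") << '\n';      // 0 (ill formed)
    return 0;
}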

Arithmetic expression evaluation. An important application of stacks is in parsing. For example, a compiler must parse arithmetic expressions written using infix notation. For example the following infix expression evaluates to 212.
( 2 + ( ( 3 + 4 ) * ( 5 * 6 ) ) )

We break the problem of parsing infix expressions into two stages. First, we convert from infix to a different representation called postfix. Then we parse the postfix expression, which is a somewhat easier problem than directly parsing infix.

Evaluating a postfix expression. A postfix expression is....

2 3 4 + 5 6 * * +

First, we describe how to parse and evaluate a postfix expression. We read the tokens in one at a time. If it is an integer, push it on the stack; if it is a binary operator, pop the top two elements from the stack, apply the operator to the two elements, and push the result back on the stack. Program Postfix.java reads in and evaluates postfix expressions using this algorithm.

Converting from infix to postfix. Now, we describe how to convert from infix to postfix. We read in the tokens one at a time. If it is an operator, we push it on the stack; if it is an integer, we print it out; if it is a right parenthesis, we pop the topmost element from the stack and print it out; if it is a left parenthesis, we ignore it. Program Infix.java reads in an infix expression, and uses a stack to output an equivalent postfix expression using the algorithm described above. Relate back to the parse tree example in Section 4.3.
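As a sketch of the first stage, here is a small C++ postfix evaluator along the lines of the Postfix.java program described above (token handling is simplified to space-separated integers and the four binary operators):

#include <iostream>
#include <sstream>
#include <stack>
#include <string>

int evalPostfix(const std::string& expr)
{
    std::stack<int> st;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok)
    {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/")
        {
            int b = st.top(); st.pop();   // right operand comes off first
            int a = st.top(); st.pop();
            if (tok == "+") st.push(a + b);
            else if (tok == "-") st.push(a - b);
            else if (tok == "*") st.push(a * b);
            else st.push(a / b);
        }
        else
            st.push(std::stoi(tok));      // integer: push it on the stack
    }
    return st.top();                      // final result
}

int main()
{
    std::cout << evalPostfix("2 3 4 + 5 6 * * +") << '\n';   // prints 212
    return 0;
}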

Towers of Hanoi and the Interpreter design pattern


Towers of Hanoi is a puzzle that consists of three pegs and five disks. Figure 1 shows the starting position of the puzzle. The goal is to reposition the stack of disks from peg A to peg C by moving one disk at a time, and, never placing a larger disk on top of a smaller disk. This is a classic problem for teaching recursion in data structures courses.

Figure 1 Towers of Hanoi starting position

This article discusses the recursive solution to this puzzle, and then refactors the algorithm to demonstrate an application of the "gang of four" Interpreter design pattern.

Solving the puzzle


Divide and conquer is a common algorithm development strategy. If we could break this daunting problem down into successively smaller problems, at some point we ought to be able to identify a sufficiently obvious problem that even a troglodyte could solve. To this end, if we knew how to move the four smallest disks to peg B, it would be a simple matter to move disk five to peg C, and then we could repeat the magic recipe for moving the four smallest disks from their temporary staging area on peg B to their final destination on peg C. Voila!

But how can we move the four smallest disks to peg B? If we knew how to move the three smallest disks to peg C, then we could easily move disk four to peg B, and repeat the hidden handshake capable of moving the three smallest disks back on top of disk four. Moving those three disks is still too hard, so let's decompose that requirement into: move the two smallest disks to peg B, move disk three to peg C, and move the two smallest disks back on top. But moving two disks is still beyond our capacity to solve unassisted. So we'll decompose that task yet one more time: move disk one to peg C, move disk two to peg B, and then reposition disk one back on top of disk two.

In summary, if the original goal is to move five disks to peg C, then the sub-goals are to move: four disks to peg B, three disks to peg C, two disks to peg B, one disk to peg C. Once we have achieved each lower-level sub-goal, we retrace our steps upward and outward until the original goal is achieved. Figure 2 shows us to be one move away from having three disks on peg C. Figure 3 is two moves away from having four disks positioned on peg B. Figure 4 has the two most buried disks where they belong, and it is now a simple matter to move: disk three to peg C, disk one to peg A, disk two to peg C, and disk one to peg C.

Figure 2 Almost ready to move disk four to peg B

Figure 3 Almost ready to move disk five to peg C

Figure 4 Disks four and five in position, only 4 moves remain

The recursive solution


The magic of recursion consists in deferring (or stacking) each unsolvable decomposition while the next decomposition is launched. At some point, a solvable decomposition is reached, and then the deferred decompositions are unstacked and handled in the reverse order in which they were created. Listing 1 shows the code originally proposed by reference 1. Each decomposition consists of: check whether the problem has been made sufficiently small to solve easily; if not, then move number-1 disks to the auxiliary peg, move the remaining disk to the destination, and finally move the previous disks to the destination. Notice that the pegs find themselves playing different roles with respect to the current decomposition. These roles are: source (aka from), destination (or to), and auxiliary (spare).

Listing 1 The classical recursive solution

void move( int number, int from, int to, int spare, List instructions )
{
    if (number == 1)
    {
        instructions.add( new LeafMove( from, to ) );
        return;
    }
    move( number-1, from, spare, to, instructions );
    instructions.add( new LeafMove( from, to ) );
    move( number-1, spare, to, from, instructions );
}

The list of LeafMove objects is created in the domain layer, and then used in the presentation layer.
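For readers who want something self-contained, the following C++ sketch applies the same decomposition but simply prints each move instead of building a list of LeafMove objects:

#include <iostream>

// Move n disks from peg 'from' to peg 'to', using peg 'spare' as the staging area.
void move(int number, char from, char to, char spare)
{
    if (number == 1)
    {
        std::cout << "move disk 1 from " << from << " to " << to << '\n';
        return;
    }
    move(number - 1, from, spare, to);   // clear the way to the buried disk
    std::cout << "move disk " << number << " from " << from << " to " << to << '\n';
    move(number - 1, spare, to, from);   // restack the smaller disks on top
}

int main()
{
    move(3, 'A', 'C', 'B');   // 2^3 - 1 = 7 moves for three disks
    return 0;
}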

Queue.
A queue supports the insert and remove operations using a FIFO (first in, first out) discipline. By convention, we name the queue insert operation enqueue and the remove operation dequeue. Think of cars passing through the Lincoln Tunnel, or of a student's to-do list: tasks are put on a queue and done in the same order that they arrive.

In queues, the INSERT and DELETE operations are called ENQUEUE and DEQUEUE. Like the POP operation of the stack, the DEQUEUE operation does not take an element parameter; it just deletes the element at the head of the queue. Stacks have only one pointer (the top[S] attribute that points to the top), but queues have two: head[Q] points to the head and tail[Q] points to the end of the queue. A queue can be realized efficiently using a circular array Q that wraps around, i.e. when we are at the last index of the array the next element is at index 1. The only thing we have to take care of is updating the head[Q] and tail[Q] attributes. When head[Q] = tail[Q], the queue is empty. Initially, head[Q] = tail[Q] = 1. When head[Q] = tail[Q] + 1, the queue is full.

Queue operations in pseudo-code:

ENQUEUE(Q, x)
  Q[tail[Q]] <- x
  if tail[Q] = length[Q]
    then tail[Q] <- 1
  else tail[Q] <- tail[Q] + 1

DEQUEUE(Q)
  x <- Q[head[Q]]
  if head[Q] = length[Q]
    then head[Q] <- 1
  else head[Q] <- head[Q] + 1
  return x

These are also O(1) time operations.

Linked list implementation. Like Stack, we maintain a reference first to the least-recently added Node on the queue. For efficiency, we also maintain a reference last to the most-recently added Node on the queue.

Array implementation. Similar to the array implementation of a stack, but a little trickier since we need to wrap around. The array is dynamically resized using repeated doubling.
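A minimal C++ sketch of the circular-array queue follows (fixed capacity, no resizing; one slot is deliberately left unused so that the full and empty states are distinguishable, a common variation on the head/tail convention above):

#include <iostream>

class CircularQueue
{
    int* q;
    int head = 0, tail = 0;   // head: next to dequeue; tail: next free slot
    int N;                    // array length = capacity + 1
public:
    CircularQueue(int capacity) : q(new int[capacity + 1]), N(capacity + 1) {}
    ~CircularQueue() { delete[] q; }
    bool isEmpty() const { return head == tail; }
    bool isFull() const { return (tail + 1) % N == head; }
    void enqueue(int x) { q[tail] = x; tail = (tail + 1) % N; }   // assumes not full
    int dequeue() { int x = q[head]; head = (head + 1) % N; return x; }
};

int main()
{
    CircularQueue q(3);
    q.enqueue(1); q.enqueue(2); q.enqueue(3);
    while (!q.isEmpty()) std::cout << q.dequeue() << ' ';   // prints: 1 2 3
    return 0;
}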

UNIT 3: Trees
A tree is a non-empty set, one element of which is designated the root of the tree, while the remaining elements are partitioned into non-empty sets, each of which is a subtree of the root. Tree nodes have many useful properties. The depth of a node is the length of the path (or the number of edges) from the root to that node. The height of a node is the longest path from that node to its leaves. The height of a tree is the height of the root. A leaf node has no children; its only path is up to its parent. See the axiomatic development of trees and its consequences for more information.

Types of trees:
Binary: Each node has zero, one, or two children. This assertion makes many tree operations simple and efficient.
Binary Search: A binary tree where any left child node has a value less than its parent node and any right child node has a value greater than or equal to that of its parent node.

AVL: A balanced binary search tree according to the following specification: the heights of the two child subtrees of any node differ by at most one.
Red-Black Tree: A balanced binary search tree using a balancing algorithm based on colors assigned to a node, and the colors of nearby nodes.
AA Tree: A balanced tree, in fact a more restrictive variation of a red-black tree.

Binary tree

A simple binary tree of size 9 and height 3, with a root node whose value is 2. The above tree is balanced but not sorted.

In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to the "root" node (the ancestor of all nodes), if it exists. Any node in the data structure can be reached by starting at the root node and repeatedly following references to either the left or right child. Binary trees are commonly used to implement binary search trees and binary heaps.

Definitions for rooted trees


A directed edge refers to the link from the parent to the child (the arrows in the picture of the tree).
The root node of a tree is the node with no parents. There is at most one root node in a rooted tree.
A leaf node has no children.
The depth of a node n is the length of the path from the root to the node. The set of all nodes at a given depth is sometimes called a level of the tree. The root node is at depth zero (or one [1]).
The height of a tree is the length of the path from the root to the deepest node in the tree. A (rooted) tree with only one node (the root) has a height of zero (or one [1]).
Siblings are nodes that share the same parent node.
A node p is an ancestor of a node q if it exists on the path from q to the root. The node q is then termed a descendant of p.
The size of a node is the number of descendants it has, including itself.
The in-degree of a node is the number of edges arriving at that node.
The out-degree of a node is the number of edges leaving that node.
The root is the only node in the tree with in-degree = 0.

E.g., if the depth of a tree is 3, it has level + 1 = 4 levels (levels 0 through 3).

Types of binary trees


A rooted binary tree is a tree with a root node in which every node has at most two children.
A full binary tree (sometimes proper binary tree or 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children.
A perfect binary tree is a full binary tree in which all leaves are at the same depth or same level.[2] (This is ambiguously also called a complete binary tree.)
A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.[3]
An infinite complete binary tree is a tree with ℵ0 levels, where for each level d the number of existing nodes at level d is equal to 2^d. The cardinal number of the set of all nodes is ℵ0. The cardinal number of the set of all paths is 2^ℵ0. The infinite complete binary tree essentially describes the structure of the Cantor set; the unit interval on the real line (of cardinality 2^ℵ0) is the continuous image of the Cantor set; this tree is sometimes called the Cantor space.
A balanced binary tree is one where the depth of all the leaves differs by at most 1. Balanced trees have a predictable depth (how many nodes are traversed from the root to a leaf, the root counting as node 0 and subsequent ones as 1, 2, ..., depth). This depth is equal to the integer part of log2(n), where n is the number of nodes in the balanced tree. Example 1: a balanced tree with 1 node, log2(1) = 0 (depth = 0). Example 2: a balanced tree with 3 nodes, log2(3) = 1.59 (depth = 1). Example 3: a balanced tree with 5 nodes, log2(5) = 2.32 (depth of tree is 2 nodes).
A rooted complete binary tree can be identified with a free magma.
A degenerate tree is a tree where for each parent node there is only one associated child node. This means that in a performance measurement, the tree will behave like a linked list data structure.
Note that this terminology often varies in the literature, especially with respect to the meaning of "complete" and "full".

Properties of binary trees


The number of nodes n in a perfect binary tree can be found using this formula: n = 2^(h+1) - 1, where h is the height of the tree.
The number of nodes n in a complete binary tree is at minimum n = 2^h and at maximum n = 2^(h+1) - 1, where h is the height of the tree.
The number of leaf nodes L in a perfect binary tree can be found using this formula: L = 2^h, where h is the height of the tree.
The number of nodes n in a perfect binary tree can also be found using this formula: n = 2L - 1, where L is the number of leaf nodes in the tree.
The number of NULL links in a complete binary tree of n nodes is (n+1).
The number of leaf nodes in a complete binary tree of n nodes is UpperBound(n / 2).
For any non-empty binary tree with n0 leaf nodes and n2 nodes of degree 2, n0 = n2 + 1.[4]
If ni denotes the number of nodes with i children and B the number of branches, then n = n0 + n1 + n2 + n3 + ... + nB, B = n - 1, and n = 1 + 1*n1 + 2*n2 + 3*n3 + ... + B*nB (the sum does not include n0, since leaf nodes contribute no branches).

Proof
Proof that n0 = n2 + 1. Let
n = the total number of nodes,
B = the number of branches in T,
n0, n1, n2 = the number of nodes with no children, a single child, and two children, respectively.

Every node except the root sits at the end of exactly one branch, so B = n - 1. Every branch leaves a node with one or two children, so B = n1 + 2*n2, hence n = n1 + 2*n2 + 1. Since also n = n0 + n1 + n2, we get n1 + 2*n2 + 1 = n0 + n1 + n2 ==> n0 = n2 + 1.

Traversal
Many problems require that we visit* the nodes of a tree in a systematic way: tasks such as counting how many nodes exist or finding the maximum element. Three different methods are possible for binary trees: preorder, postorder, and in-order, which all do the same three things: recursively traverse both the left and right subtrees and visit the current node. The difference is when the algorithm visits the current node:
preorder: current node, left subtree, right subtree (DLR)
postorder: left subtree, right subtree, current node (LRD)
in-order: left subtree, current node, right subtree (LDR)
levelorder: level by level, from left to right, starting from the root node
* Visit means performing some operation involving the current node of a tree, like incrementing a counter or checking whether the value of the current node is greater than any other recorded.

Sample implementations for Tree Traversal

preorder(node)
  visit(node)
  if node.left ≠ null then preorder(node.left)
  if node.right ≠ null then preorder(node.right)

inorder(node)
  if node.left ≠ null then inorder(node.left)
  visit(node)
  if node.right ≠ null then inorder(node.right)

postorder(node)
  if node.left ≠ null then postorder(node.left)
  if node.right ≠ null then postorder(node.right)
  visit(node)

levelorder(root)
  queue<node> q
  q.push(root)
  while not q.empty do
    node = q.pop
    visit(node)
    if node.left ≠ null then q.push(node.left)
    if node.right ≠ null then q.push(node.right)

For an algorithm that is less taxing on the stack, see Threaded Trees.

Examples of Tree Traversals

preorder: 50, 30, 20, 40, 90, 100
inorder: 20, 30, 40, 50, 90, 100
postorder: 20, 40, 30, 100, 90, 50
levelorder: 50, 30, 90, 20, 40, 100
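The following C++ sketch builds the six-node tree used in these examples and runs all four traversals (the Node type and function names mirror the pseudocode above):

#include <iostream>
#include <queue>

struct Node
{
    int value;
    Node *left = nullptr, *right = nullptr;
    Node(int v) : value(v) {}
};

void preorder(Node* n)  { if (!n) return; std::cout << n->value << ' '; preorder(n->left); preorder(n->right); }
void inorder(Node* n)   { if (!n) return; inorder(n->left); std::cout << n->value << ' '; inorder(n->right); }
void postorder(Node* n) { if (!n) return; postorder(n->left); postorder(n->right); std::cout << n->value << ' '; }

void levelorder(Node* root)
{
    std::queue<Node*> q;
    if (root) q.push(root);
    while (!q.empty())
    {
        Node* n = q.front(); q.pop();
        std::cout << n->value << ' ';
        if (n->left) q.push(n->left);
        if (n->right) q.push(n->right);
    }
}

int main()
{
    Node* root = new Node(50);                                    // level 0
    root->left = new Node(30); root->right = new Node(90);        // level 1
    root->left->left = new Node(20); root->left->right = new Node(40);
    root->right->right = new Node(100);                           // level 2
    preorder(root);  std::cout << '\n';   // 50 30 20 40 90 100
    inorder(root);   std::cout << '\n';   // 20 30 40 50 90 100
    postorder(root); std::cout << '\n';   // 20 40 30 100 90 50
    levelorder(root); std::cout << '\n';  // 50 30 90 20 40 100
    return 0;
}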

Balancing
When entries that are already sorted are stored in a tree, all new records will go the same route, and the tree will look more like a list (such a tree is called a degenerate tree). Therefore the tree needs balancing routines, making sure that under all branches there is an equal number of records. This will keep searching in the tree at optimal speed. Specifically, if a tree with n nodes is a degenerate tree, the longest path through the tree will be n nodes; if it is a balanced tree, the longest path will be log n nodes.

A left rotation shows how balancing is applied to establish a priority heap invariant in a Treap, a data structure which has the queueing performance of a heap and the key lookup performance of a tree. A balancing operation can change the tree structure while maintaining another order, which is binary tree sort order. The binary tree order is left to right, with left nodes' keys less than right nodes' keys, whereas the priority order is up and down, with higher nodes' priorities greater than lower nodes' priorities. Alternatively, the priority can be viewed as another ordering key, except that finding a specific key is more involved. The balancing operation can move nodes up and down a tree without affecting the left-right ordering.

Binary Search Trees


A typical binary search tree looks like this:

Terms
Node: Any item that is stored in the tree.
Root: The top item in the tree. (50 in the tree above)
Child: Node(s) under the current node. (20 and 40 are children of 30 in the tree above)
Parent: The node directly above the current node. (90 is the parent of 100 in the tree above)
Leaf: A node which has no children. (20 is a leaf in the tree above)

Searching through a binary search tree
To search for an item in a binary tree:
1. Start at the root node.
2. If the item that you are searching for is less than the root node, move to the left child of the root node; if the item that you are searching for is greater than the root node, move to the right child of the root node; and if it is equal to the root node, then you have found the item that you are looking for.
3. Now check to see if the item that you are searching for is equal to, less than or greater than the new node that you are on. Again, if the item that you are searching for is less than the current node, move to the left child, and if it is greater than the current node, move to the right child.
4. Repeat this process until you find the item that you are looking for or until the node doesn't have a child on the correct branch, in which case the tree doesn't contain the item which you are looking for.

Example

For example, to find the node 40...
1. The root node is 50, which is greater than 40, so you go to 50's left child.
2. 50's left child is 30, which is less than 40, so you next go to 30's right child.
3. 30's right child is 40, so you have found the item that you are looking for.

Adding an item to a binary search tree
1. To add an item, you first must search through the tree to find the position that you should put it in. You do this following the steps above.
2. When you reach a node which doesn't contain a child on the correct branch, add the new node there.

For example, to add the node 25...
1. The root node is 50, which is greater than 25, so you go to 50's left child.
2. 50's left child is 30, which is greater than 25, so you go to 30's left child.
3. 30's left child is 20, which is less than 25, so you go to 20's right child.
4. 20's right child doesn't exist, so you add 25 there.
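A minimal C++ sketch of searching and adding, following exactly the walk described above (Node, search and insert are illustrative names):

#include <iostream>

struct Node
{
    int key;
    Node *left = nullptr, *right = nullptr;
    Node(int k) : key(k) {}
};

// Returns the node containing key, or nullptr if the tree doesn't contain it.
Node* search(Node* n, int key)
{
    while (n != nullptr && n->key != key)
        n = (key < n->key) ? n->left : n->right;   // go left or right
    return n;
}

// Walks down to the missing branch and attaches the new node there.
Node* insert(Node* n, int key)
{
    if (n == nullptr) return new Node(key);
    if (key < n->key) n->left = insert(n->left, key);
    else n->right = insert(n->right, key);         // equal keys go right
    return n;
}

int main()
{
    Node* root = nullptr;
    for (int k : {50, 30, 90, 20, 40, 100}) root = insert(root, k);
    root = insert(root, 25);                            // becomes 20's right child
    std::cout << (search(root, 40) != nullptr) << '\n'; // 1: found
    std::cout << (search(root, 55) != nullptr) << '\n'; // 0: not found
    return 0;
}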

Deleting an item from a binary search tree

It is assumed that you have already found the node that you want to delete, using the search technique described above.
Case 1: The node you want to delete is a leaf

For example, to delete 40... simply delete the node!


Case 2: The node you want to delete has one child

1. Directly connect the child of the node that you want to delete, to the parent of the node that you want to delete.

For example, to delete 90... Delete 90, then make 100 the child node of 50.
Case 3: The node you want to delete has two children

1. Find the left-most node in the right subtree of the node being deleted. (After you have found the node you want to delete, go to its right node, then for every node under that, go to its left node until the node has no left node.) From now on, this node will be known as the successor.

For example, to delete 30...
1. The right node of the node which is being deleted is 40.
2. (From now on, we continually go to the left node until there isn't another one...) The first left node of 40 is 35.
3. 35 has no left node, therefore 35 is the successor!
Case 1: The successor is the right child of the node being deleted

1. Directly move the child to the right of the node being deleted into the position of the node being deleted.
2. As the new node has no left children, you can connect the deleted node's left subtree's root as its left child.

For example, to delete 30...
1. Move 40 up to where 30 was.
2. 20 now becomes 40's left child.

Case 2: The successor isn't the right child of the node being deleted

This is best shown with an example

To delete 30...
1. Move the successor into the place where the deleted node was and make it inherit both of its children. So 35 moves to where 30 was, and 20 and 40 become its children.
2. Move the successor's (35's) right subtree to where the successor was. So 37 becomes a child of 40.
Node deletion

In general, remember that a node's left subtree's rightmost node is the closest node on the left, and the right subtree's leftmost node is the closest node on the right. Either one of these can be chosen to replace the deleted node. The only complication is when the replacing node has a child subtree on the same side as the replacing node's side of the deleted node; the easiest thing to do is to always return the same-side subtree of the replacing node as the child of the parent of the replacing node, even if that subtree is empty.
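A small standalone sketch of the successor-finding step used in the two-children case (with its own minimal Node type):

struct Node { int key; Node *left, *right; };

// In-order successor of a node that has a right subtree:
// one step right, then left as far as possible.
Node* successor(Node* n)
{
    Node* s = n->right;        // assumes n->right is not null (two-children case)
    while (s->left != nullptr)
        s = s->left;
    return s;
}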

B Trees
B-Trees were described originally as generalizations of binary trees, where a binary tree is a 2-node B-Tree, the 2 standing for two children, with 2-1 = 1 key separating the 2 children. Hence a 3-node has 2 values separating 3 children, and an N-node has N children separated by N-1 keys.

A classical B-Tree can have N-node internal nodes, and empty 2-nodes as leaf nodes; or, more conveniently, the children can either be a value or a pointer to the next N-node, so it is a union. The main idea with B-Trees is that one starts with a root N-node, which is able to hold N-1 entries, but on the Nth entry the number of keys for the node is exhausted, and the node can be split into two half-sized N/2-sized nodes, separated by a single key K, which is equal to the right node's leftmost key, so any entry with key K2 greater than or equal to K goes in the right node, and anything less than K goes in the left. When the root node is split, a new root node is created with one key, and a left child and a right child. Since there are N children but only N-1 entries, the leftmost child is stored as a separate pointer. If the leftmost pointer splits, then the left half becomes the new leftmost pointer, and the right half and separating key are inserted into the front of the entries.

An alternative is the B+ tree, which is the most commonly used in database systems, because only values are stored in leaf nodes, whereas internal nodes store only keys and pointers to other nodes, putting a limit on the size of the datum value as the size of a pointer. This often allows internal nodes with more entries to fit a certain block size, e.g. 4K is a common physical disc block size. Hence, if a B+ tree internal node is aligned to a physical disc block, then the main rate-limiting factor of reading a block of a large index from disc (because it isn't cached in a memory list of blocks) is reduced to one block read. A B+ tree has bigger internal nodes, so it is wider and shorter in theory than an equivalent B-Tree which must fit all nodes within a given physical block size; hence, overall, it is a faster index due to greater fan-out and less height to reach keys on average. Apparently, this fan-out is so important that compression can also be applied to the blocks to increase the number of entries fitting within a given underlying layer's block size (the underlying layer is often a filesystem block).

Most database systems use the B+ tree algorithm, including PostgreSQL, MySQL, Derby DB, Firebird, many Xbase index types, etc. Many filesystems also use a B+ tree to manage their block layout (e.g. XFS, NTFS). Transwiki has a Java implementation of a B+ tree which uses traditional arrays as the key list and value list. Below is an example of a B-Tree with test driver, and a B+ tree with a test driver. The memory/disc management is not included, but a usable hacked example can be found at Hashing Memory Checking Example.

This B+ tree implementation was written out of the B-Tree, and the difference from the Transwiki B+ tree is that it tries to use the semantics of SortedMap and SortedSet already present in the standard Java collections library. Hence, the flat leaf block list of this B+ implementation can't contain blocks that don't contain any data, because the ordering depends on the first key of the entries, so a leaf block needs to be created with its first entry.

UNIT 4:

Sorting algorithm
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions: 1. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order); 2. The output is a permutation, or reordering, of the input.

Classification
Sorting algorithms used in computer science are often classified by:
Computational complexity (worst, average and best behaviour) of element comparisons in terms of the size of the list, n. For typical sorting algorithms, good behavior is O(n log n) and bad behavior is O(n^2). (See Big O notation.) Ideal behavior for a sort is O(n), but this is not possible in the average case: comparison-based sorting algorithms, which evaluate the elements of the list via an abstract key comparison operation, need at least Ω(n log n) comparisons for most inputs.
Computational complexity of swaps (for "in place" algorithms).
Memory usage (and use of other computer resources). In particular, some sorting algorithms are "in place". This means that they need only O(1) or O(log n) memory beyond the items being sorted, and they don't need to create auxiliary locations for data to be temporarily stored, as in other sorting algorithms.

Recursion: some algorithms are either recursive or non-recursive, while others may be both (e.g., merge sort).
Stability: stable sorting algorithms maintain the relative order of records with equal keys (i.e., values). See below for more information.
Whether or not they are a comparison sort. A comparison sort examines the data only by comparing two elements with a comparison operator.
General method: insertion, exchange, selection, merging, etc. Exchange sorts include bubble sort and quicksort. Selection sorts include shaker sort and heapsort.
Adaptability: whether or not the presortedness of the input affects the running time. Algorithms that take this into account are known to be adaptive.

Stability

Stable sorting algorithms maintain the relative order of records with equal keys. If all keys are different then this distinction is not necessary. But if there are equal keys, then a sorting algorithm is stable if whenever there are two records (let's say R and S) with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list. When equal elements are indistinguishable, such as with integers, or more generally, any data where the entire element is the key, stability is not an issue. However, assume that the following pairs of numbers are to be sorted by their first component:
(4, 2) (3, 7) (3, 1) (5, 6)

In this case, two different results are possible, one which maintains the relative order of records with equal keys, and one which does not:
(3, 7) (3, 1) (4, 2) (5, 6) (order maintained)
(3, 1) (3, 7) (4, 2) (5, 6) (order changed)

Unstable sorting algorithms may change the relative order of records with equal keys, but stable sorting algorithms never do so. Unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison, so that comparisons between two objects with otherwise equal keys are decided using the order of the entries in the original data order as a tie-breaker. Remembering this order, however, often involves an additional computational cost. Sorting based on a primary, secondary, tertiary, etc. sort key can be done by any sorting method, taking all sort keys into account in comparisons (in other words, using a single composite sort key). If a sorting method is stable, it is also possible to sort multiple times, each time with one sort key. In that case the keys need to be applied in order of increasing priority. Example: sorting pairs of numbers as above by first, then second component:
(4, 2) (3, 7) (3, 1) (5, 6) (original)
(3, 1) (4, 2) (5, 6) (3, 7) (after sorting by second component)
(3, 1) (3, 7) (4, 2) (5, 6) (after sorting by first component)

On the other hand:

(3, 7) (3, 1) (4, 2) (5, 6) (after sorting by first component)
(3, 1) (4, 2) (5, 6) (3, 7) (after sorting by second component, order by first component is disrupted)

Comparison of algorithms

The complexity of different algorithms in a specific situation. In this table, n is the number of records to be sorted. The columns "Best", "Average" and "Worst" give the time complexity in each case, under the assumption that the length of each key is constant, and that therefore all comparisons, swaps, and other needed operations can proceed in constant time. "Memory" denotes the amount of auxiliary storage needed beyond that used by the list itself, under the same assumption. These are all comparison sorts. Run time and memory can be described with various asymptotic notations (theta, omega, Big-O, small-o); the entries below are conventional asymptotic bounds with constant factors omitted.

Name | Best | Average | Worst | Memory | Stable | Method | Other notes
Insertion sort | n | n^2 | n^2 | 1 | Yes | Insertion | Average case is also O(n + d), where d is the number of inversions
Shell sort | n log n | depends on gap sequence | n^2 | 1 | No | Insertion |
Binary tree sort | n log n | n log n | n log n | n | Yes | Insertion | When using a self-balancing binary search tree
Cycle sort | n^2 | n^2 | n^2 | 1 | No | Insertion | In-place with a theoretically optimal number of writes
Library sort | n | n log n | n^2 | n | Yes | Insertion |
Patience sorting | - | - | n log n | n | No | Insertion & Selection | Finds all the longest increasing subsequences within O(n log n)
Timsort | n | n log n | n log n | n | Yes | Insertion & Merging | O(n) comparisons when the data is already sorted
Selection sort | n^2 | n^2 | n^2 | 1 | No | Selection | Its stability depends on the implementation. Used to sort this table in Safari or other WebKit web browsers [2]
Heapsort | n log n | n log n | n log n | 1 | No | Selection |
Smoothsort | n | n log n | n log n | 1 | No | Selection | An adaptive sort: O(n) comparisons when the data is already sorted, and 0 swaps
Strand sort | n | n^2 | n^2 | n | Yes | Selection |
Tournament sort | - | n log n | n log n | n | No | Selection |
Bubble sort | n | n^2 | n^2 | 1 | Yes | Exchanging | Tiny code size
Cocktail sort | n | n^2 | n^2 | 1 | Yes | Exchanging |
Comb sort | - | - | n^2 | 1 | No | Exchanging | Small code size
Gnome sort | n | n^2 | n^2 | 1 | Yes | Exchanging | Tiny code size
Merge sort | n log n | n log n | n log n | n | Yes | Merging | Used to sort this table in Firefox [3]
In-place merge sort | - | - | n log^2 n | 1 | Depends | Merging | Example implementation here: [4]; can be implemented as a stable sort based on stable in-place merging: [5]
Quicksort | n log n | n log n | n^2 | log n | Depends | Partitioning | Naive variants use O(n) space; can be implemented as a stable sort depending on how the pivot is handled
Introsort | n log n | n log n | n log n | log n | No | Partitioning & Selection | Used in SGI STL implementations
Bogosort | n | n * n! | unbounded | 1 | No | Luck | Randomly shuffle the array and check if it is sorted

The following table describes sorting algorithms that are not comparison sorts. As such, they are not limited by the Ω(n log n) lower bound. Complexities below are in terms of n, the number of items to be sorted, k, the size of each key, and d, the digit size used by the implementation. Many of them are based on the assumption that the key size is large enough that all entries have unique key values, and hence that n << 2^k, where << means "much less than."

Name | Average | Worst | Memory | Stable | n << 2^k | Notes
Pigeonhole sort | n + 2^k | n + 2^k | 2^k | Yes | Yes |
Bucket sort | n + k | n^2 * k | n * k | Yes | No | Assumes uniform distribution of elements from the domain in the array
Counting sort | n + r | n + r | n + r | Yes | Yes | r is the range of numbers to be sorted; if r = O(n), then the average running time is O(n)
LSD Radix Sort | n * k/d | n * k/d | n | Yes | No | The stable version uses an external array of size n to hold all of the bins
MSD Radix Sort | n * k/d | n * k/d | n + 2^d | Yes | No |
MSD Radix Sort (in-place) | n * k/d | n * k/d | 2^d | No | No | In-place; k/d recursion levels, 2^d for the count array
Spreadsort | - | - | - | No | No | Asymptotics are based on the assumption that n << 2^k, but the algorithm does not require this

The following table describes some sorting algorithms that are impractical for real-life use due to extremely poor performance or a requirement for specialized hardware.

Name | Average | Worst | Stable | Comparison | Other notes
Bead sort | N/A | N/A | N/A | No | Requires specialized hardware
Simple pancake sort | - | - | No | Yes | Count is number of flips
Sorting networks | - | - | Yes | No | Requires a custom circuit of size O(n log n)

Additionally, theoretical computer scientists have detailed other sorting algorithms that provide better than O(n log n) time complexity with additional constraints, including:

Han's algorithm, a deterministic algorithm for sorting keys from a domain of finite size, taking O(n log log n) time and O(n) space.[2]
Thorup's algorithm, a randomized algorithm for sorting keys from a domain of finite size, taking O(n log log n) time and O(n) space.[3]
An integer sorting algorithm taking O(n sqrt(log log n)) expected time and O(n) space.[4]

Algorithms not yet compared above include: odd-even sort, Flashsort, Burstsort, Postman sort, Stooge sort, Samplesort, Bitonic sorter, and Topological sort.

Summaries of popular sorting algorithms


Bubble sort

A bubble sort, a sorting algorithm that continuously steps through a list, swapping items until they appear in the correct order.

Bubble sort is a straightforward and simplistic method of sorting data that is used in computer science education. Bubble sort is based on the principle that an air bubble in water will rise, displacing all the heavier water molecules in its way. The algorithm starts at the beginning of the data set. It compares the first two elements, and if the first is greater than the second, it swaps them. It continues doing this for each pair of adjacent elements to the end of the data set. It then starts again with the first two elements, repeating until no swaps have occurred on the last pass. This algorithm is highly inefficient, and is rarely used except as a simplistic example. For example, if we have 100 elements then the total number of comparisons will be 10000. A slightly better variant, cocktail sort, works by inverting the ordering criteria and the pass direction on alternating passes. The modified bubble sort will stop 1 shorter each time through the loop, so the total number of comparisons for 100 elements will be 4950. Bubble sort may, however, be efficiently used on a list that is already sorted except for a very small number of elements. For example, if only one element is not in order, bubble sort will only take 2n time. If two elements are not in order, bubble sort will only take at most 3n time. Bubble sort's average case and worst case are both O(n^2).

Selection sort
Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n2) complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and also has performance advantages over more complicated algorithms in certain situations. The algorithm finds the minimum value, swaps it with the value in the first position, and repeats these steps for the remainder of the list.
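A minimal C++ sketch of selection sort as just described:

#include <iostream>
#include <utility>

// Repeatedly select the minimum of the unsorted suffix a[i..n-1]
// and swap it into position i.
void selectionSort(int a[], int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min]) min = j;
        std::swap(a[i], a[min]);
    }
}

int main()
{
    int a[] = {5, 1, 12, -5, 16};
    selectionSort(a, 5);
    for (int x : a) std::cout << x << ' ';   // -5 1 5 12 16
    return 0;
}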

Insertion sort
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly-sorted lists, and often is used as part of more sophisticated algorithms. It works by taking elements from the list one by one and inserting them in their correct position into a new sorted list. In arrays, the new list and the remaining elements can share the array's space, but insertion is expensive, requiring shifting all following elements over by one. Shell sort (see below) is a variant of insertion sort that is more efficient for larger lists.

Shell sort
Shell sort was invented by Donald Shell in 1959. It improves upon bubble sort and insertion sort by moving out of order elements more than one position at a time. One implementation can be described as arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort.
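A common way to implement the same idea is with a shrinking gap rather than an explicit two-dimensional array; the sketch below uses the simple gap sequence n/2, n/4, ..., 1 (many better sequences exist):

#include <iostream>

// Each pass is an insertion sort over elements that are 'gap' apart,
// so out-of-order elements move many positions at a time.
void shellSort(int a[], int n)
{
    for (int gap = n / 2; gap > 0; gap /= 2)
    {
        for (int i = gap; i < n; i++)
        {
            int tmp = a[i], j = i;
            while (j >= gap && a[j - gap] > tmp)
            {
                a[j] = a[j - gap];   // shift the larger element gap positions right
                j -= gap;
            }
            a[j] = tmp;
        }
    }
}

int main()
{
    int a[] = {5, 1, 12, -5, 16, 2};
    shellSort(a, 6);
    for (int x : a) std::cout << x << ' ';   // -5 1 2 5 12 16
    return 0;
}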

Comb sort
Comb sort is a relatively simplistic sorting algorithm originally designed by Wlodzimierz Dobosiewicz in 1980. Later it was rediscovered and popularized by Stephen Lacey and Richard Box with a Byte Magazine article published in April 1991. Comb sort improves on bubble sort, and rivals algorithms like quicksort. The basic idea is to eliminate turtles, or small values near the end of the list, since in a bubble sort these slow the sorting down tremendously. (Rabbits, large values around the beginning of the list, do not pose a problem in bubble sort.)

Merge sort
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on; until at last two lists are merged into the final sorted list. Of the algorithms described here, this is the first that scales well to very large lists, because its worst-case running time is O(n log n). Merge sort has seen a relatively recent surge in popularity for practical implementations, being used for the standard sort routine in the programming languages Perl[5], Python (as timsort[6]), and Java (also uses timsort as of JDK7[7]), among others. Merge sort has been used in Java at least since 2000 in JDK1.3.[8][9]
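A minimal top-down C++ sketch of merge sort; using <= in the merge keeps the sort stable:

#include <iostream>
#include <vector>

// Merge the sorted halves a[lo..mid) and a[mid..hi) into one sorted run.
void merge(std::vector<int>& a, int lo, int mid, int hi)
{
    std::vector<int> tmp;
    int i = lo, j = mid;
    while (i < mid && j < hi) tmp.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i < mid) tmp.push_back(a[i++]);
    while (j < hi) tmp.push_back(a[j++]);
    for (int k = lo; k < hi; k++) a[k] = tmp[k - lo];
}

void mergeSort(std::vector<int>& a, int lo, int hi)
{
    if (hi - lo < 2) return;          // 0 or 1 element: already sorted
    int mid = lo + (hi - lo) / 2;
    mergeSort(a, lo, mid);
    mergeSort(a, mid, hi);
    merge(a, lo, mid, hi);
}

int main()
{
    std::vector<int> a = {5, 1, 12, -5, 16, 2};
    mergeSort(a, 0, a.size());
    for (int x : a) std::cout << x << ' ';   // -5 1 2 5 12 16
    return 0;
}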

Heapsort

Heapsort is a much more efficient version of selection sort. It also works by determining the largest (or smallest) element of the list, placing that at the end (or beginning) of the list, then continuing with the rest of the list, but accomplishes this task efficiently by using a data structure called a heap, a special type of binary tree. Once the data list has been made into a heap, the root node is guaranteed to be the largest (or smallest) element. When it is removed and placed at the end of the list, the heap is rearranged so the largest element remaining moves to the root. Using the heap, finding the next largest element takes O(log n) time, instead of O(n) for a linear scan as in simple selection sort. This allows Heapsort to run in O(n log n) time, and this is also the worst case complexity.
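A compact C++ sketch of heapsort: build a max-heap in the array, then repeatedly swap the root (the maximum) to the end and sift the new root down:

#include <iostream>
#include <utility>

// Sift a[i] down within a[0..n), restoring the max-heap property.
void siftDown(int a[], int n, int i)
{
    while (2 * i + 1 < n)
    {
        int child = 2 * i + 1;
        if (child + 1 < n && a[child + 1] > a[child]) child++;   // larger child
        if (a[i] >= a[child]) return;
        std::swap(a[i], a[child]);
        i = child;
    }
}

void heapSort(int a[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, n, i);   // build the heap
    for (int end = n - 1; end > 0; end--)
    {
        std::swap(a[0], a[end]);   // move the current maximum to the end
        siftDown(a, end, 0);       // restore the heap on the shrunken prefix
    }
}

int main()
{
    int a[] = {5, 1, 12, -5, 16, 2};
    heapSort(a, 6);
    for (int x : a) std::cout << x << ' ';   // -5 1 2 5 12 16
    return 0;
}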

Quicksort
Quicksort is a divide and conquer algorithm which relies on a partition operation: to partition an array, we choose an element, called a pivot, move all smaller elements before the pivot, and move all greater elements after it. This can be done efficiently in linear time and in-place. We then recursively sort the lesser and greater sublists. Efficient implementations of quicksort (with in-place partitioning) are typically unstable sorts and somewhat complex, but are among the fastest sorting algorithms in practice. Together with its modest O(log n) space usage, this makes quicksort one of the most popular sorting algorithms, available in many standard libraries. The most complex issue in quicksort is choosing a good pivot element; consistently poor choices of pivots can result in drastically slower O(n^2) performance, but if at each step we choose the median as the pivot then it works in O(n log n). Finding the median, however, is an O(n) operation on unsorted lists, and therefore exacts its own penalty.
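A minimal C++ sketch using the simple Lomuto partition (last element as pivot); production implementations choose the pivot more carefully for the reasons just given:

#include <iostream>
#include <utility>

// Partition a[lo..hi] around the pivot a[hi]; returns the pivot's final index.
int partition(int a[], int lo, int hi)
{
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) std::swap(a[i++], a[j]);   // smaller elements go left
    std::swap(a[i], a[hi]);
    return i;
}

void quickSort(int a[], int lo, int hi)
{
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    quickSort(a, lo, p - 1);   // recursively sort the lesser sublist
    quickSort(a, p + 1, hi);   // recursively sort the greater sublist
}

int main()
{
    int a[] = {5, 1, 12, -5, 16, 2};
    quickSort(a, 0, 5);
    for (int x : a) std::cout << x << ' ';   // -5 1 2 5 12 16
    return 0;
}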

Counting Sort
Counting sort is applicable when each input is known to belong to a particular set, S, of possibilities. The algorithm runs in O(|S| + n) time and O(|S|) memory, where n is the length of the input. It works by creating an integer array of size |S| and using the ith bin to count the occurrences of the ith member of S in the input. Each input is then counted by incrementing the value of its corresponding bin. Afterward, the counting array is looped through to arrange all of the inputs in order. This sorting algorithm often cannot be used because S needs to be reasonably small for it to be efficient, but it is extremely fast and demonstrates great asymptotic behavior as n increases. It also can be modified to provide stable behavior.
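A minimal C++ sketch for the common case where S is the integer range [0, maxVal]:

#include <iostream>
#include <vector>

// Counting sort for values known to lie in [0, maxVal].
void countingSort(std::vector<int>& a, int maxVal)
{
    std::vector<int> count(maxVal + 1, 0);
    for (int x : a) count[x]++;           // count occurrences of each value
    int idx = 0;
    for (int v = 0; v <= maxVal; v++)     // write the values back in order
        while (count[v]-- > 0) a[idx++] = v;
}

int main()
{
    std::vector<int> a = {3, 0, 2, 3, 1, 0};
    countingSort(a, 3);
    for (int x : a) std::cout << x << ' ';   // 0 0 1 2 3 3
    return 0;
}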

Bucket sort
Bucket sort is a divide and conquer sorting algorithm that generalizes counting sort by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. A variation of this method called the single buffered count sort is claimed to be faster than quicksort and to take about the same time to run on any set of data.

Because bucket sort must use a limited number of buckets, it is best suited to data sets of a limited scope. Bucket sort would be unsuitable for data such as social security numbers, which have a lot of variation.

Radix sort
Radix sort is an algorithm that sorts numbers by processing individual digits. n numbers consisting of k digits each are sorted in O(n k) time. Radix sort can process digits of each number starting either from the least significant digit (LSD) or from the most significant digit (MSD). The LSD algorithm first sorts the list by the least significant digit while preserving their relative order using a stable sort. Then it sorts them by the next digit, and so on from the least significant to the most significant, ending up with a sorted list. While the LSD radix sort requires the use of a stable sort, the MSD radix sort algorithm does not (unless stable sorting is desired). In-place MSD radix sort is not stable. It is common for the counting sort algorithm to be used internally by the radix sort. A hybrid sorting approach, such as using insertion sort for small bins, improves the performance of radix sort significantly.
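A minimal C++ sketch of LSD radix sort on non-negative integers, one decimal digit per pass, with a stable counting sort inside each pass:

#include <iostream>
#include <vector>

void radixSort(std::vector<int>& a)
{
    int maxVal = 0;
    for (int x : a) if (x > maxVal) maxVal = x;
    for (long exp = 1; maxVal / exp > 0; exp *= 10)   // 1s, 10s, 100s, ...
    {
        std::vector<int> out(a.size());
        int count[10] = {0};
        for (int x : a) count[(x / exp) % 10]++;
        for (int d = 1; d < 10; d++) count[d] += count[d - 1];   // prefix sums
        for (int i = (int)a.size() - 1; i >= 0; i--)             // backward pass keeps it stable
            out[--count[(a[i] / exp) % 10]] = a[i];
        a = out;
    }
}

int main()
{
    std::vector<int> a = {170, 45, 75, 90, 802, 24, 2, 66};
    radixSort(a);
    for (int x : a) std::cout << x << ' ';   // 2 24 45 66 75 90 170 802
    return 0;
}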

Distribution sort

Distribution sort refers to any sorting algorithm where data is distributed from its input to multiple intermediate structures which are then gathered and placed on the output. See Bucket sort.

Memory usage patterns and index sorting


When the size of the array to be sorted approaches or exceeds the available primary memory, so that (much slower) disk or swap space must be employed, the memory usage pattern of a sorting algorithm becomes important, and an algorithm that might have been fairly efficient when the array fit easily in RAM may become impractical. In this scenario, the total number of comparisons becomes (relatively) less important, and the number of times sections of memory must be copied or swapped to and from the disk can dominate the performance characteristics of an algorithm. Thus, the number of passes and the localization of comparisons can be more important than the raw number of comparisons, since comparisons of nearby elements happen at system bus speed (or, with caching, even at CPU speed), which, compared to disk speed, is virtually instantaneous.

For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.

One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced with one pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem. This procedure is sometimes called "tag sort".[10]

Another technique for overcoming the memory-size problem is to combine two algorithms in a way that takes advantage of the strengths of each to improve overall performance. For instance, the array might be subdivided into chunks of a size that will fit easily in RAM (say, a few thousand elements), the chunks sorted using an efficient algorithm (such as quicksort or heapsort), and the results merged as per mergesort. This is less efficient than just doing mergesort in the first place, but it requires less physical RAM (to be practical) than a full quicksort on the whole array.

Techniques can also be combined. For sorting very large sets of data that vastly exceed system memory, even the index may need to be sorted using an algorithm or combination of algorithms designed to perform reasonably with virtual memory, i.e., to reduce the amount of swapping required.

Inefficient/humorous sorts
These are algorithms that are extremely slow compared to those discussed above: Bogosort O(n * n!), Stooge sort O(n^2.7).

Bubble Sort
Bubble sort is a simple and well-known sorting algorithm. It is used in practice once in a blue moon, and its main application is as an introduction to sorting algorithms. Bubble sort belongs to the O(n^2) class of sorting algorithms, which makes it quite inefficient for sorting large data volumes. Bubble sort is stable and adaptive.

Algorithm
1. Compare each pair of adjacent elements from the beginning of an array and, if they are in reversed order, swap them.
2. If at least one swap has been done, repeat step 1.

You can imagine that on every step big bubbles float to the surface and stay there. At the step when no bubble moves, sorting stops. Let us see an example of sorting an array to make the idea of bubble sort clearer. Example: sort {5, 1, 12, -5, 16} using bubble sort (the sketch below follows exactly these steps).
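A minimal C++ sketch of these two steps, with the early-exit check that preserves the adaptive property:

#include <iostream>
#include <utility>

void bubbleSort(int a[], int n)
{
    bool swapped = true;
    while (swapped)
    {
        swapped = false;
        for (int i = 0; i + 1 < n; i++)
        {
            if (a[i] > a[i + 1])
            {
                std::swap(a[i], a[i + 1]);   // reversed pair: swap it
                swapped = true;
            }
        }
        n--;   // the largest remaining element has bubbled to the end
    }
}

int main()
{
    int a[] = {5, 1, 12, -5, 16};
    bubbleSort(a, 5);
    for (int x : a) std::cout << x << ' ';   // -5 1 5 12 16
    return 0;
}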

Complexity analysis
Average and worst case complexity of bubble sort is O(n^2). It also makes O(n^2) swaps in the worst case. Bubble sort is adaptive, meaning that for an almost-sorted array it gives an O(n) estimate. Avoid implementations which don't check on every step whether the array is already sorted (i.e., whether any swaps were made); this check is necessary in order to preserve the adaptive property.

Turtles and rabbits


One more problem of bubble sort is that its running time depends badly on the initial order of the elements. Big elements (rabbits) go up fast, while small ones (turtles) go down very slowly. This problem is solved in cocktail sort. Turtle example: though the array {2, 3, 4, 5, 1} is almost sorted, it takes O(n^2) iterations to sort it. Element {1} is a turtle.

Rabbit example: the array {6, 1, 2, 3, 4, 5} is almost sorted too, but it takes only O(n) iterations to sort it. Element {6} is a rabbit. This example demonstrates the adaptive property of bubble sort.

Insertion sort
Example of insertion sort sorting a list of random numbers.

Class: Sorting algorithm
Data structure: Array
Worst case performance: Θ(n^2)
Best case performance: O(n)
Average case performance: Θ(n^2)
Worst case space complexity: Θ(n) total, O(1) auxiliary

Insertion sort is a simple sorting algorithm, a comparison sort in which the sorted array (or list) is built one entry at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:
Simple implementation
Efficient for (quite) small data sets
Adaptive, i.e. efficient for data sets that are already substantially sorted: the time complexity is O(n + d), where d is the number of inversions
More efficient in practice than most other simple quadratic, i.e. O(n^2), algorithms such as selection sort or bubble sort; the best case (nearly sorted input) is O(n)
Stable, i.e. does not change the relative order of elements with equal keys
In-place, i.e. only requires a constant amount O(1) of additional memory space
Online, i.e. can sort a list as it receives it

Most humans, when sorting (ordering a deck of cards, for example), use a method that is similar to insertion sort.[1]

Algorithm
Every repetition of insertion sort removes an element from the input data and inserts it into the correct position in the already-sorted list, until no input elements remain. The choice of which element to remove from the input is arbitrary, and can be made using almost any choice algorithm. Sorting is typically done in-place. The resulting array after k iterations has the property that the first k + 1 entries are sorted; in each iteration the first remaining entry of the input is removed and inserted into the result at the correct position, so a sorted prefix of k + 1 entries becomes a sorted prefix of k + 2 entries, with each element greater than the inserted value x copied to the right as it is compared against x.

The most common variant of insertion sort, which operates on arrays, can be described as follows:

1. Suppose there exists a function called Insert designed to insert a value into a sorted sequence at the beginning of an array. It operates by beginning at the end of the sequence and shifting each element one place to the right until a suitable position is found for the new element. The function has the side effect of overwriting the value stored immediately after the sorted sequence in the array.
2. To perform an insertion sort, begin at the left-most element of the array and invoke Insert to insert each element encountered into its correct position. The ordered sequence into which the element is inserted is stored at the beginning of the array in the set of indices already examined. Each insertion overwrites a single value: the value being inserted.

Pseudocode of the complete algorithm follows, where the arrays are zero-based and the for-loop includes both the top and bottom limits (as in Pascal):
insertionSort(array A)
{ This procedure sorts in ascending order. }
begin
    for i := 1 to length[A] - 1 do
    begin
        value := A[i];
        j := i - 1;
        done := false;
        repeat
            { To sort in descending order simply reverse the operator, i.e. A[j] < value }
            if A[j] > value then
            begin
                A[j + 1] := A[j];
                j := j - 1;
                if j < 0 then
                    done := true;
            end
            else
                done := true;
        until done;
        A[j + 1] := value;
    end;
end;

Below is the pseudocode for insertion sort for a zero-based array (as in C):
for j ← 1 to length(A) - 1
    key ← A[j]
    (A[j] is inserted into the sorted sequence A[0 .. j-1])
    i ← j - 1
    while i >= 0 and A[i] > key
        A[i + 1] ← A[i]
        i ← i - 1
    A[i + 1] ← key
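A runnable Python transcription of this zero-based pseudocode (an illustrative sketch, not from the original text):

def insertion_sort(A):
    # In-place insertion sort for a zero-based list.
    for j in range(1, len(A)):
        key = A[j]           # element to insert into the sorted prefix A[0 .. j-1]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]  # shift larger elements one place to the right
            i -= 1
        A[i + 1] = key
    return A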

Best, worst, and average cases


The best case input is an array that is already sorted. In this case insertion sort has a linear running time, i.e. Θ(n): during each iteration, the first remaining element of the input is only compared with the right-most element of the sorted subsection of the array. The worst case input is an array sorted in reverse order: every iteration of the inner loop must scan and shift the entire sorted subsection of the array before inserting the next element, giving insertion sort a quadratic running time, i.e. O(n²).

The average case is also quadratic, which makes insertion sort impractical for sorting large arrays. However, insertion sort is one of the fastest algorithms for sorting very small arrays, even faster than quicksort; indeed, good quicksort implementations use insertion sort for arrays smaller than a certain threshold, including when such arrays arise as subproblems. The exact threshold must be determined experimentally and depends on the machine, but is commonly around ten. Example: the following table shows the steps for sorting the sequence {5, 7, 0, 3, 4, 2, 6, 1}. For each iteration, the number of positions the inserted element has moved is shown in parentheses. Altogether this amounts to 17 steps.

5 7 0 3 4 2 6 1 (0)
0 5 7 3 4 2 6 1 (2)
0 3 5 7 4 2 6 1 (2)
0 3 4 5 7 2 6 1 (2)
0 2 3 4 5 7 6 1 (4)
0 2 3 4 5 6 7 1 (1)
0 1 2 3 4 5 6 7 (6)

Comparisons to other sorting algorithms


Insertion sort is very similar to selection sort. As in selection sort, after k passes through the array, the first k elements are in sorted order. For selection sort these are the k smallest elements, while in insertion sort they are whatever the first k elements were in the unsorted array. Insertion sort's advantage is that it only scans as many elements as needed to determine the correct location of the (k+1)th element, while selection sort must scan all remaining elements to find the absolute smallest element. Calculations show that insertion sort will usually perform about half as many comparisons as selection sort. Assuming the (k+1)th element's rank is random, insertion sort will on average require shifting half of the previous k elements, while selection sort always requires scanning all unplaced elements. If the input array is reverse-sorted, insertion sort performs as many comparisons as selection sort. If the input array is already sorted, insertion sort performs as few as n-1 comparisons, which makes it more efficient when given sorted or "nearly sorted" arrays.

While insertion sort typically makes fewer comparisons than selection sort, it requires more writes because the inner loop can require shifting large sections of the sorted portion of the array. In general, insertion sort will write to the array O(n2) times, whereas selection sort will write only O(n) times. For this reason selection sort may be preferable in cases where writing to memory is significantly more expensive than reading, such as with EEPROM or flash memory. Some divide-and-conquer algorithms such as quicksort and mergesort sort by recursively dividing the list into smaller sublists which are then sorted. A useful optimization in practice for these algorithms is to use insertion sort for sorting small sublists, where insertion sort outperforms these more complex algorithms. The size of list for which insertion sort has the advantage varies by environment and implementation, but is typically between eight and twenty elements.
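A sketch of that small-sublist optimization in Python (illustrative only; the cutoff of 16 is an arbitrary placeholder, since, as noted above, the best threshold is environment-dependent):

def insertion_sort_range(a, lo, hi):
    # Insertion sort on the slice a[lo .. hi], in place.
    for j in range(lo + 1, hi + 1):
        key = a[j]
        i = j - 1
        while i >= lo and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key

def hybrid_quicksort(a, lo=0, hi=None, cutoff=16):
    # Quicksort that hands sublists of at most `cutoff` elements to insertion sort.
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= cutoff:
        insertion_sort_range(a, lo, hi)
        return a
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                       # Hoare-style partition around the pivot
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j, cutoff)
    hybrid_quicksort(a, i, hi, cutoff)
    return a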

Shell sort
Shell sort

Shell sort in action on a list of numbers.

Class: Sorting algorithm (unstable)
Data structure: Array
Worst case performance: depends on gap sequence; best known is O(n log² n)
Best case performance: O(n)
Average case performance: depends on gap sequence
Worst case space complexity: O(n)

Shell sort is a sorting algorithm, devised by Donald Shell in 1959, that is a generalization of insertion sort, which exploits the fact that insertion sort works efficiently on input that is already almost sorted. It improves on insertion sort by allowing the comparison and exchange of elements that are far apart. The last step of Shell sort is a plain insertion sort, but by then, the array of data is guaranteed to be almost sorted. The algorithm is an example of an algorithm that is simple to code but difficult to analyze theoretically.

History
The Shell sort is named after its inventor, Donald Shell, who published the algorithm in 1959.[1] Some older textbooks and references call this the "Shell-Metzner" sort after Marlene Metzner Norton, but according to Metzner, "I had nothing to do with the sort, and my name should never have been attached to it."[2]

Description
The principle of Shell sort is to rearrange the file so that looking at every hth element yields a sorted file. We call such a file h-sorted. If an h-sorted file is then k-sorted for some other integer k, the file remains h-sorted.[3] For instance, if a list was 5-sorted and then 3-sorted, the list is now not only 3-sorted, but both 5- and 3-sorted. If this were not true, the algorithm would undo work that it had done in previous iterations, and would not achieve such a low running time. The algorithm draws upon a sequence of positive integers known as the increment sequence. Any sequence will do, as long as it ends with 1, but some sequences perform better than others.[4] The algorithm begins by performing a gap insertion sort, with the gap being the first number in the increment sequence. It continues to perform a gap insertion sort for each number in the sequence, until it finishes with a gap of 1. When the increment reaches 1, the gap insertion sort is simply an ordinary insertion sort, guaranteeing that the final list is sorted. Beginning with large increments allows elements in the file to move quickly towards their final positions, and makes it easier to subsequently sort for smaller increments.[3] Although more efficient sorting algorithms exist, Shell sort remains a good choice for moderately large files because it has good running time and is easy to code.

Shell sort algorithm in pseudocode


The following is an implementation of Shell sort written in pseudocode. The increment sequence is a geometric sequence in which every term is roughly 2.2 times smaller than the previous one:
input: an array a of length n with array elements numbered 0 to n - 1

inc ← round(n / 2)
while inc > 0 do:
    for i = inc .. n - 1 do:
        temp ← a[i]
        j ← i
        while j ≥ inc and a[j - inc] > temp do:
            a[j] ← a[j - inc]
            j ← j - inc
        a[j] ← temp
    inc ← round(inc / 2.2)
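A direct Python transcription of this pseudocode (an illustrative sketch, not from the original text):

def shell_sort(a):
    # Shell sort with the gap sequence round(n/2), round(n/2.2), ..., 1.
    n = len(a)
    inc = n // 2
    while inc > 0:
        for i in range(inc, n):
            temp = a[i]
            j = i
            while j >= inc and a[j - inc] > temp:
                a[j] = a[j - inc]    # gap insertion sort: shift by inc, not by 1
                j -= inc
            a[j] = temp
        inc = int(round(inc / 2.2))  # the final pass, with inc == 1, is a plain insertion sort
    return a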

Analysis
Although Shell sort is easy to code, analyzing its performance is very difficult and depends on the choice of increment sequence. The algorithm was one of the first to break the quadratic time barrier, but this fact was not proven until some time after its discovery.[4] The initial increment sequence suggested by Donald Shell was [1, 2, 4, 8, 16, ..., 2^k], but this is a very poor choice in practice because it means that elements in odd positions are not compared with elements in even positions until the very last step. The original implementation performs O(n²) comparisons and exchanges in the worst case.[3] A simple change, replacing 2^k with 2^k - 1, improves the worst-case running time to O(n^(3/2)),[4] a bound that cannot be improved.[5] A minor change given in V. Pratt's book[5] improved the bound to O(n log² n). This is worse than the optimal comparison sorts, which are O(n log n), but lends itself to sorting networks and has the same asymptotic gate complexity as Batcher's bitonic sorter.

Consider a small value that is initially stored at the wrong end of the array. Using an O(n²) sort such as bubble sort or insertion sort, it will take roughly n comparisons and exchanges to move this value all the way to the other end of the array. Shell sort first moves values using giant step sizes, so a small value will move a long way towards its final position with just a few comparisons and exchanges. One can visualize Shell sort in the following way: arrange the list into a table and sort the columns (using an insertion sort). Repeat this process, each time with a smaller number of longer columns. At the end, the table has only one column. While transforming the list into a table makes it easier to visualize, the algorithm itself does its sorting in-place (by incrementing the index by the step size, i.e. using i += step_size instead of i++). For example, consider a list of numbers like [ 13 14 94 33 82 25 59 94 65 23 45 27 73 25 39 10 ]. If we started with a step-size of 5, we could visualize this as breaking the list of numbers into a table with 5 columns. This would look like this:
13 14 94 33 82
25 59 94 65 23
45 27 73 25 39
10

We then sort each column, which gives us


10 14 73 25 23
13 27 94 33 39
25 59 94 65 82
45

When read back as a single list of numbers, we get [ 10 14 73 25 23 13 27 94 33 39 25 59 94 65 82 45 ]. Here, the 10, which was all the way at the end, has moved all the way to the beginning. This list is then again sorted using a 3-gap sort, as shown below.
10 14 73
25 23 13
27 94 33
39 25 59
94 65 82
45

Which gives us
10 14 13
25 23 33
27 25 59
39 65 73
45 94 82
94

Now all that remains to be done is a 1-gap sort (simple insertion sort).

Gap sequence

The Shell sort algorithm in action.

The gap sequence is an integral part of the Shell sort algorithm. The gap sequence originally suggested by Donald Shell was to begin with N / 2 and to halve the number until it reaches 1. While this sequence provides significant performance enhancements over quadratic algorithms such as insertion sort, it can be changed slightly to further decrease the average and worst-case running times. Weiss' textbook[6] demonstrates that this sequence allows a worst-case O(n²) sort, if the data is initially in the array as (small_1, large_1, small_2, large_2, ...) - that is, the upper half of the numbers are placed, in sorted order, in the even index locations and the lower half of the numbers are placed similarly in the odd index locations.

Perhaps the most crucial property of Shell sort is that the elements remain k-sorted even as the gap diminishes.[5] Depending on the choice of gap sequence, Shell sort has a proven worst-case running time of O(n²) (using Shell's increments that start with 1/2 the array size and divide by 2 each time), O(n^(3/2)) (using Hibbard's increments of 2^k - 1), O(n^(4/3)) (using Sedgewick's increments), or O(n log² n) (using Pratt's increments 2^i 3^j), and possibly unproven better running times. The existence of an O(n log n) worst-case implementation of Shell sort was precluded by Poonen, Plaxton, and Suel.[7]

The best known sequence according to research by Marcin Ciura is 1, 4, 10, 23, 57, 132, 301, 701, 1750.[8] This study also concluded that "comparisons rather than moves should be considered the dominant operation in Shellsort."[9] A Shell sort using this sequence runs faster than an insertion sort, and even if it is faster than a quicksort for small arrays, it is slower for sufficiently big arrays. In his paper Ciura writes that "the greatest increments play a minor role in the overall performance of a sequence," making it reasonable to extend Ciura's sequence beyond 701 as a geometric progression with a ratio roughly that of Ciura's last two elements, 1750/701. Taking the successor of each increment g to be floor(g*5/2)+1, for example, would extend Ciura's sequence as 701, 1753, 4383, 10958, 27396, 68491, 171228, 428071, 1070178, ..., all pairs of which can be seen by inspection to be well decorrelated, most being relatively prime and none with a common divisor greater than 12.

Another sequence which performs empirically well[citation needed] on large arrays is the Fibonacci numbers (leaving out one of the starting 1's) raised to the power of twice the golden ratio, which gives the following sequence: 1, 9, 34, 182, 836, 4025, 19001, 90358, 428481, 2034035, 9651787, 45806244, 217378076, 1031612713, ....[10]

Selection sort
Selection sort

Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n²)
Best case performance: O(n²)
Average case performance: O(n²)
Worst case space complexity: O(n) total, O(1) auxiliary

Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n²) complexity, making it inefficient on large lists, and it generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and it also has performance advantages over more complicated algorithms in certain situations.

Algorithm
The algorithm works as follows:

1. Find the minimum value in the list.
2. Swap it with the value in the first position.
3. Repeat the steps above for the remainder of the list (starting at the second position and advancing each time).

Effectively, the list is divided into two parts: the sublist of items already sorted, which is built up from left to right and is found at the beginning, and the sublist of items remaining to be sorted, occupying the remainder of the array. Here is an example of this sort algorithm sorting five elements:
64 25 12 22 11
11 25 12 22 64
11 12 25 22 64
11 12 22 25 64
11 12 22 25 64

(nothing appears changed on this last line because the last 2 numbers were already in order) Selection sort can also be used on list structures that make add and remove efficient, such as a linked list. In this case it's more common to remove the minimum element from the remainder of the list, and then insert it at the end of the values sorted so far. For example:
64 25 12 22 11
11 64 25 12 22
11 12 64 25 22
11 12 22 64 25
11 12 22 25 64

/* a[0] to a[n-1] is the array to sort */
int iPos;
int iMin;
int i;   /* loop index (this declaration was missing in the original listing) */

/* advance the position through the entire array */
/* (could do iPos < n-1 because a single element is also the min element) */
for (iPos = 0; iPos < n; iPos++)
{
    /* find the min element in the unsorted a[iPos .. n-1] */
    /* assume the min is the first element */
    iMin = iPos;
    /* test against all other elements */
    for (i = iPos + 1; i < n; i++)
    {
        /* if this element is less, then it is the new minimum */
        if (a[i] < a[iMin])
        {
            /* found new minimum; remember its index */
            iMin = i;
        }
    }
    /* iMin is the index of the minimum element. Swap it with the current position */
    swap(a, iPos, iMin);
}

Mathematical definition
Let L be a non-empty set and let f be a function such that f(L) = L', where:

1. L' is a permutation of L,
2. l'_i ≤ l'_(i+1) for all adjacent elements of L' (that is, L' is in non-decreasing order),
3. f(L) = (s, f(L_s)) when |L| > 1, and f(L) = L when |L| = 1,

where s is the smallest element of L, and L_s is the set of elements of L without one instance of the smallest element of L.

Analysis
Selection sort is not difficult to analyze compared to other sorting algorithms, since none of the loops depend on the data in the array. Selecting the lowest element requires scanning all n elements (this takes n - 1 comparisons) and then swapping it into the first position. Finding the next lowest element requires scanning the remaining n - 1 elements, and so on, for (n - 1) + (n - 2) + ... + 2 + 1 = n(n - 1) / 2 ∈ Θ(n²) comparisons (see arithmetic progression). Each of these scans is followed by at most one swap, for a total of n - 1 swaps (the final element is already in place).
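For example, with n = 5 (as in the example above) the scans make 4 + 3 + 2 + 1 = 10 = 5·4/2 comparisons, and at most 4 swaps are performed.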

Comparison to other sorting algorithms


Among simple average-case Θ(n²) algorithms, selection sort almost always outperforms bubble sort and gnome sort, but is generally outperformed by insertion sort. Insertion sort is very similar in that after the kth iteration, the first k elements in the array are in sorted order. Insertion sort's advantage is that it only scans as many elements as it needs in order to place the (k+1)st element, while selection sort must scan all remaining elements to find the (k+1)st element. Simple calculation shows that insertion sort will therefore usually perform about half as many comparisons as selection sort, although it can perform just as many or far fewer depending on the order the array was in prior to sorting. It can be seen as an advantage for some real-time applications that selection sort will perform identically regardless of the order of the array, while insertion sort's running time can vary considerably.

However, this is more often an advantage for insertion sort in that it runs much more efficiently if the array is already sorted or "close to sorted." While selection sort is preferable to insertion sort in terms of number of writes (Θ(n) swaps versus O(n²) swaps), it almost always far exceeds (and never beats) the number of writes that cycle sort makes, as cycle sort is theoretically optimal in the number of writes. This can be important if writes are significantly more expensive than reads, such as with EEPROM or flash memory, where every write lessens the lifespan of the memory. Finally, selection sort is greatly outperformed on larger arrays by Θ(n log n) divide-and-conquer algorithms such as mergesort. However, insertion sort and selection sort are both typically faster for small arrays (i.e. fewer than 10-20 elements). A useful optimization in practice for the recursive algorithms is to switch to insertion sort or selection sort for "small enough" sublists.

Merge sort
Merge sort

Example of merge sort sorting a list of random dots.

Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n log n)
Best case performance: O(n log n) typical, O(n) natural variant
Average case performance: O(n log n)
Worst case space complexity: O(n) auxiliary

Merge sort is an O(n log n) comparison-based sorting algorithm. Most implementations produce a stable sort, meaning that the implementation preserves the input order of equal elements in the sorted output. It is a divide and conquer algorithm. Merge sort was invented by John von Neumann in 1945.[1]

Algorithm
Conceptually, a merge sort works as follows:

1. If the list is of length 0 or 1, then it is already sorted. Otherwise:
2. Divide the unsorted list into two sublists of about half the size.
3. Sort each sublist recursively by re-applying merge sort.
4. Merge the two sublists back into one sorted list.

Merge sort incorporates two main ideas to improve its runtime:

1. A small list will take fewer steps to sort than a large list.
2. Fewer steps are required to construct a sorted list from two sorted lists than from two unsorted lists. For example, you only have to traverse each list once if they're already sorted (see the merge function below for an example implementation).

Example: using merge sort to sort a list of integers contained in an array. Suppose we have an array A with n entries A[0] .. A[n-1]. We apply merge sort to A[0 .. c-1] and A[c .. n-1], where c is the integer part of n / 2. When the two halves are returned they will have been sorted. They can now be merged together to form a sorted array. In a simple pseudocode form, the algorithm could look something like this:

function merge_sort(m)
    if length(m) ≤ 1
        return m
    var list left, right, result
    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after middle
        add x to right
    left = merge_sort(left)
    right = merge_sort(right)
    result = merge(left, right)
    return result

Having written the merge_sort function, it remains to merge the left and right lists created above. There are several variants for the merge() function; one possibility is this:
function merge(left, right)
    var list result
    while length(left) > 0 or length(right) > 0
        if length(left) > 0 and length(right) > 0
            if first(left) ≤ first(right)
                append first(left) to result
                left = rest(left)
            else
                append first(right) to result
                right = rest(right)
        else if length(left) > 0
            append first(left) to result
            left = rest(left)
        else if length(right) > 0
            append first(right) to result
            right = rest(right)
    end while
    return result
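A compact runnable Python version of the two functions above (an illustrative sketch; it returns a new list rather than sorting in place):

def merge_sort(m):
    # Top-down merge sort; returns a new sorted list.
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = merge_sort(m[:middle])
    right = merge_sort(m[middle:])
    return merge(left, right)

def merge(left, right):
    # Merge two sorted lists; using <= keeps the sort stable.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])    # at most one of these two tails is non-empty
    result.extend(right[j:])
    return result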

Analysis

A recursive merge sort algorithm used to sort an array of 7 integer values. These are the steps a human would take to emulate merge sort (top-down).

In sorting n objects, merge sort has an average and worst-case performance of O(n log n). If the running time of merge sort for a list of length n is T(n), then the recurrence T(n) = 2T(n/2) + n follows from the definition of the algorithm (apply the algorithm to two lists of half the size of the original list, and add the n steps taken to merge the resulting two lists). The closed form follows from the master theorem. In the worst case, merge sort makes a number of comparisons equal to or slightly smaller than n ⌈lg n⌉ - 2^⌈lg n⌉ + 1, which is between (n lg n - n + 1) and (n lg n + n + O(lg n)).[2] For large n and a randomly ordered input list, merge sort's expected (average) number of comparisons approaches α·n fewer than the worst case, where α = -1 + Σ (k ≥ 0) 1/(2^k + 1) ≈ 0.2645.

In the worst case, merge sort does about 39% fewer comparisons than quicksort does in the average case; merge sort always makes fewer comparisons than quicksort, except in extremely rare cases when they tie, where merge sort's worst case is found simultaneously with quicksort's best case. In terms of moves, merge sort's worst case complexity is O(n log n), the same complexity as quicksort's best case, and merge sort's best case takes about half as many iterations as its worst case.[citation needed]

Recursive implementations of merge sort make 2n - 1 method calls in the worst case, compared to quicksort's n, so merge sort has roughly twice as much recursive overhead as quicksort. However, iterative, non-recursive implementations of merge sort, which avoid method call overhead, are not difficult to code. Merge sort's most common implementation does not sort in place; therefore, memory the size of the input must be allocated for the sorted output to be stored in (see below for versions that need only n/2 extra spaces).

Merge sort as described here also has an often overlooked, but practically important, best-case property. If the input is already sorted, its complexity falls to O(n). Specifically, n - 1 comparisons and zero moves are performed, which is the same as simply running through the input, checking whether it is pre-sorted.

Sorting in-place is possible (e.g., using lists rather than arrays) but is very complicated, and offers little performance gain in practice, even if the algorithm runs in O(n log n) time (Katajainen, Pasanen & Teuhola 1996). In these cases, algorithms like heapsort usually offer comparable speed and are far less complex. Additionally, unlike the standard merge sort, in-place merge sort is not a stable sort. In the case of linked lists the algorithm does not use more space than that already used by the list representation, apart from the O(log k) used for the recursion trace.

Merge sort is more efficient than quicksort for some types of lists if the data to be sorted can only be efficiently accessed sequentially, and is thus popular in languages such as Lisp, where sequentially accessed data structures are very common. Unlike some (efficient) implementations of quicksort, merge sort is a stable sort as long as the merge operation is implemented properly.

As can be seen from the merge sort procedure, there are some demerits. One complaint we might raise is its use of 2n locations; the additional n locations were needed because one couldn't reasonably merge two sorted sets in place. But despite the use of this space the algorithm must still work hard: the contents of m are first copied into left and right, and later into the list result, on each invocation of merge_sort (variable names as in the pseudocode above). An alternative to this copying is to associate a new field of information with each key (the elements in m are called keys). This field will be used to link the keys and any associated information together in a sorted list (a key and its related information is called a record). Then the merging of the sorted lists proceeds by changing the link values; no records need to be moved at all. A field which contains only a link will generally be smaller than an entire record, so less space will also be used.

Another alternative for reducing the space overhead to n/2 is to maintain left and right as a combined structure, copy only the left part of m into temporary space, and direct the merge routine to place the merged output into m. With this version it is better to allocate the temporary space outside the merge routine, so that only one allocation is needed. The excessive copying mentioned previously is also mitigated, since the last pair of lines before the return result statement (function merge in the pseudocode above) become superfluous.

Merge sorting tape drives

Merge sort type algorithms allowed large data sets to be sorted on early computers that had small random access memories by modern standards. Records were stored on magnetic tape and processed on banks of magnetic tape drives, such as the IBM 729.

Merge sort is so inherently sequential that it is practical to run it using slow tape drives as input and output devices. It requires very little memory, and the memory required does not depend on the number of data elements. For the same reason it is also useful for sorting data on disk that is too large to fit entirely into primary memory. On tape drives that can run both backwards and forwards, merge passes can be run in both directions, avoiding rewind time. With four tape drives, it works as follows:

1. Divide the data to be sorted in half and put half on each of two tapes.
2. Merge individual pairs of records from the two tapes; write two-record chunks alternately to each of the two output tapes.
3. Merge the two-record chunks from the two output tapes into four-record chunks; write these alternately to the original two input tapes.
4. Merge the four-record chunks into eight-record chunks; write these alternately to the original two output tapes.
5. Repeat until you have one chunk containing all the data, sorted. That is, log n passes are needed, where n is the number of records.

For almost-sorted data on tape, a bottom-up "natural merge sort" variant of this algorithm is popular. It merges whatever "chunks" of in-order records are already in the data. In the worst case (reversed data), "natural merge sort" performs the same as the above: it merges individual records into 2-record chunks, then 2-record chunks into 4-record chunks, and so on. In the best case (already mostly-sorted data), it merges large already-sorted chunks into even larger chunks, hopefully finishing in fewer than log n passes. In a simple pseudocode form, the "natural merge sort" algorithm could look something like this:

# Original data is on the input tape; the other tapes are blank
function merge_sort(input_tape, output_tape, scratch_tape_C, scratch_tape_D)
    while any records remain on the input_tape
        while any records remain on the input_tape
            merge(input_tape, output_tape, scratch_tape_C)
            merge(input_tape, output_tape, scratch_tape_D)
        while any records remain on C or D
            merge(scratch_tape_C, scratch_tape_D, output_tape)
            merge(scratch_tape_C, scratch_tape_D, input_tape)

# take the next sorted chunk from the input tapes, and merge into the single given output_tape.
# tapes are scanned linearly.
# tape[next] gives the record currently under the read head of that tape.
# tape[current] gives the record previously under the read head of that tape.
# (Generally both tape[current] and tape[previous] are buffered in RAM ...)
function merge(left[], right[], output_tape[])
    do
        if left[current] ≤ right[current]
            append left[current] to output_tape
            read next record from left tape
        else
            append right[current] to output_tape
            read next record from right tape
    while left[current] < left[next] and right[current] < right[next]
    if left[current] < left[next]
        append current_left_record to output_tape
    if right[current] < right[next]
        append current_right_record to output_tape
    return

Either form of merge sort can be generalized to any number of tapes.

Optimizing merge sort


On modern computers, locality of reference can be of paramount importance in software optimization, because multi-level memory hierarchies are used. Cache-aware versions of the merge sort algorithm, whose operations have been specifically chosen to minimize the movement of pages in and out of a machine's memory cache, have been proposed. For example, the tiled merge sort algorithm stops partitioning subarrays when subarrays of size S are reached, where S is the number of data items fitting into a single page in memory. Each of these subarrays is sorted with an in-place sorting algorithm, to discourage memory swaps, and normal merge sort is then completed in the standard recursive fashion. This algorithm has demonstrated better performance on machines that benefit from cache optimization. (LaMarca & Ladner 1997) Kronrod (1969) suggested an alternative version of merge sort that uses constant additional space. This algorithm was refined by Katajainen, Pasanen & Teuhola (1996).

Comparison with other sort algorithms


Although heapsort has the same time bounds as merge sort, it requires only Θ(1) auxiliary space instead of merge sort's Θ(n), and is often faster in practical implementations. On typical modern architectures, efficient quicksort implementations generally outperform mergesort for sorting RAM-based arrays. On the other hand, merge sort is a stable sort, parallelizes better, and is more efficient at handling slow-to-access sequential media. Merge sort is often the best choice for sorting a linked list: in this situation it is relatively easy to implement a merge sort in such a way that it requires only Θ(1) extra space, and the slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and others (such as heapsort) completely impossible. As of Perl 5.8, merge sort is its default sorting algorithm (it was quicksort in previous versions of Perl). In Java, the Arrays.sort() methods use merge sort or a tuned quicksort depending on the datatypes, and for implementation efficiency switch to insertion sort when fewer than seven array elements are being sorted.[3] Python uses timsort, another tuned hybrid of merge sort and insertion sort, which will also become the standard sort algorithm for Java SE 7.[4]

Heapsort
Heapsort

A run of the heapsort algorithm sorting an array of randomly permuted values. In the first stage of the algorithm the array elements are reordered to satisfy the heap property. Before the actual sorting takes place, the heap tree structure is shown briefly for illustration.

Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n log n)
Best case performance: Ω(n log n)[1]
Average case performance: O(n log n)
Worst case space complexity: Θ(n) total, Θ(1) auxiliary

Heapsort is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a well-implemented quicksort, it has the advantage of a more favorable worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but it is not a stable sort.

Overview
Heapsort begins by building a heap out of the data set, then removing the largest item and placing it at the end of the partially sorted array. After removing the largest item, it reconstructs the heap, removes the largest remaining item, and places it in the next open position from the end of the partially sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays: one to hold the heap and the other to hold the sorted elements.[2]

Heapsort inserts the input list elements into a binary heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) is extracted until none remain, the values having been extracted in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction. During extraction, the only space required is that needed to store the heap. To achieve constant space overhead, the heap is stored in the part of the input array not yet sorted. (The storage of heaps as arrays is diagrammed at Binary heap#Heap implementation.)

Heapsort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array. The remaining prefix of the array stores the unsorted elements.
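The extract-in-sorted-order idea can be illustrated with Python's standard heapq module, which provides a min-heap, so values come out in ascending order. Note that this sketch builds a separate output list rather than using the constant-space in-place scheme described above:

import heapq

def heap_select_sort(values):
    # Build a heap, then repeatedly extract the smallest remaining value.
    heap = list(values)
    heapq.heapify(heap)       # O(n) bottom-up heap construction
    return [heapq.heappop(heap) for _ in range(len(heap))]

# heap_select_sort([6, 2, 9, 1]) returns [1, 2, 6, 9]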

Variations
The most important variation to the simple variant is an improvement by R. W. Floyd that, in practice, gives about a 25% speed improvement by using only one comparison in each siftup run, which must then be followed by a siftdown for the original child. Moreover, it is more elegant to formulate. Heapsort's natural way of indexing works on indices from 1 up to the number of items, so the start address of the data should be shifted such that this logic can be implemented while avoiding unnecessary +/- 1 offsets in the coded algorithm.

Ternary heapsort[3] uses a ternary heap instead of a binary heap; that is, each element in the heap has three children. It is more complicated to program, but does a constant factor fewer swap and comparison operations. This is because each step in the shift operation of a ternary heap requires three comparisons and one swap, whereas in a binary heap two comparisons and one swap are required. The ternary heap does two steps in less time than the binary heap requires for three steps, which multiplies the index by a factor of 9 instead of the factor 8 of three binary steps. Ternary heapsort is about 12% faster than the simple variant of binary heapsort.[citation needed]

The smoothsort algorithm[4][5] is a variation of heapsort developed by Edsger Dijkstra in 1981. Like heapsort, smoothsort's upper bound is O(n log n). The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state. Due to its complexity, smoothsort is rarely used.

Levcopoulos and Petersson[6] describe a variation of heapsort based on a Cartesian tree that does not add an element to the heap until smaller values on both sides of it have already been included in the sorted output. As they show, this modification can allow the algorithm to sort more quickly than O(n log n) for inputs that are already nearly sorted.

Ingo Wegener[7] describes a bottom-up version of heapsort that replaces siftdown with an alternative that reduces the worst case from 2n log n to 1.5n log n and is claimed to perform better than some versions of quicksort.

Comparison with other sorts


Heapsort primarily competes with quicksort, another very efficient, general-purpose, nearly-in-place comparison-based sort algorithm. Quicksort is typically somewhat faster, due to better cache behavior and other factors, but the worst-case running time for quicksort is O(n²), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. See quicksort for a detailed discussion of this problem, and possible solutions. Because of the O(n log n) upper bound on heapsort's running time and the constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heapsort.

Heapsort also competes with merge sort, which has the same time bounds but requires Θ(n) auxiliary space, whereas heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:

- Like quicksort, merge sort on arrays has considerably better data cache performance, often outperforming heapsort on a modern desktop PC, because it accesses the elements in order.
- Merge sort is a stable sort.
- Merge sort parallelizes better; the most trivial way of parallelizing merge sort achieves close to linear speedup, while there is no obvious way to parallelize heapsort at all.
- Merge sort can be easily adapted to operate on linked lists (with O(1) extra space[8]) and very large lists stored on slow-to-access media such as disk storage or network attached storage. Heapsort relies strongly on random access, and its poor locality of reference makes it very slow on media with long access times. (Note: heapsort can also be applied to doubly linked lists with only O(1) extra space overhead.)

Introsort is an interesting alternative to heapsort that combines quicksort and heapsort to retain the advantages of both: the worst-case speed of heapsort and the average speed of quicksort.

Pseudocode
The following is the "simple" way to implement the algorithm in pseudocode. Arrays are zero-based and swap is used to exchange two elements of the array. Movement 'down' means from the root towards the leaves, or from lower indices to higher. Note that during the sort, the largest element is at the root of the heap at a[0], while at the end of the sort, the largest element is in a[end].
function heapSort(a, count) is
    input: an unordered array a of length count

    (first place a in max-heap order)
    heapify(a, count)

    end := count - 1
    // in languages with zero-based arrays the children are 2*i+1 and 2*i+2
    while end > 0 do
        (swap the root (maximum value) of the heap with the last element of the heap)
        swap(a[end], a[0])
        (put the heap back in max-heap order)
        siftDown(a, 0, end - 1)
        (decrease the size of the heap by one so that the previous max value will stay in its proper placement)
        end := end - 1

function heapify(a, count) is
    (start is assigned the index in a of the last parent node)
    start := count / 2 - 1

    while start ≥ 0 do
        (sift down the node at index start to the proper place such that all nodes below the start index are in heap order)
        siftDown(a, start, count - 1)
        start := start - 1
    (after sifting down the root all nodes/elements are in heap order)

function siftDown(a, start, end) is
    input: end represents the limit of how far down the heap to sift.

    root := start
    while root * 2 + 1 ≤ end do       (while the root has at least one child)
        child := root * 2 + 1         (root*2 + 1 points to the left child)
        swap := root                  (keeps track of the child to swap with)
        (check if root is smaller than left child)
        if a[swap] < a[child]
            swap := child
        (check if right child exists, and if it's bigger than what we're currently swapping with)
        if child < end and a[swap] < a[child + 1]
            swap := child + 1
        (check if we need to swap at all)
        if swap != root
            swap(a[root], a[swap])
            root := swap              (repeat to continue sifting down the child now)
        else
            return

The heapify function can be thought of as building a heap from the bottom up, successively shifting downward to establish the heap property. An alternative version (shown below) that builds the heap top-down and shifts upward is conceptually simpler to grasp. This "siftUp" version can be visualized as starting with an empty heap and successively inserting elements. However, it is asymptotically slower: the "siftDown" version is O(n), and the "siftUp" version is O(n log n) in the worst case. The heapsort algorithm is O(n log n) overall using either version of heapify.
function heapify(a, count) is
    (end is assigned the index of the first (left) child of the root)
    end := 1

    while end < count
        (sift up the node at index end to the proper place such that all nodes above the end index are in heap order)
        siftUp(a, 0, end)
        end := end + 1
    (after sifting up the last node all nodes are in heap order)

function siftUp(a, start, end) is
    input: start represents the limit of how far up the heap to sift.
           end is the node to sift up.

    child := end
    while child > start
        parent := floor((child - 1) / 2)
        if a[parent] < a[child] then  (out of max-heap order)
            swap(a[parent], a[child])
            child := parent           (repeat to continue sifting up the parent now)
        else
            return

Radix sort
Radix sort

Class: Sorting algorithm
Data structure: Array
Worst case performance: O(kN)
Worst case space complexity: O(kN)

In computer science, radix sort is a sorting algorithm that sorts integers by processing individual digits, by comparing individual digits sharing the same significant position. A positional notation is required, but because integers can represent strings of characters (e.g., names or dates) and specially formatted floating point numbers, radix sort is not limited to integers. Radix sort dates back as far as 1887 to the work of Herman Hollerith on tabulating machines.[1]

Most digital computers internally represent all of their data as electronic representations of binary numbers, so processing the digits of integer representations by groups of binary digit representations is most convenient. Two classifications of radix sorts are least significant digit (LSD) radix sorts and most significant digit (MSD) radix sorts. LSD radix sorts process the integer representations starting from the least significant digit and move towards the most significant digit. MSD radix sorts work the other way around. The integer representations that are processed by sorting algorithms are often called "keys", which can exist all by themselves or be associated with other data. LSD radix sorts typically use the following sorting order: short keys come before longer keys, and keys of the same length are sorted lexicographically. This coincides with the normal order of integer representations, such as the sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. MSD radix sorts use lexicographic order, which is suitable for sorting strings, such as words, or fixed-length integer representations. A sequence such as "b, c, d, e, f, g, h, i, j, ba" would be lexicographically sorted as "b, ba, c, d, e, f, g, h, i, j". If lexicographic ordering is used to sort variable-length integer representations, then the representations of the numbers from 1 to 10 would be output as 1, 10, 2, 3, 4, 5, 6, 7, 8, 9, as if the shorter keys were left-justified and padded on the right with blank characters to make the shorter keys as long as the longest key for the purpose of determining sorted order.

Least significant digit radix sorts


A least significant digit (LSD) radix sort is a fast stable sorting algorithm which can be used to sort keys in integer representation order. Keys may be a string of characters, or numerical digits in a given 'radix'. The processing of the keys begins at the least significant digit (i.e., the rightmost digit), and proceeds to the most significant digit (i.e., the leftmost digit). The sequence in which digits are processed by a least significant digit (LSD) radix sort is the opposite of the sequence in which digits are processed by a most significant digit (MSD) radix sort. A least significant digit (LSD) radix sort operates in O(nk) time, where n is the number of keys, and k is the average key length. This kind of performance for variable-length keys can be achieved by grouping all of the keys that have the same length together and separately performing an LSD radix sort on each group of keys for each length, from shortest to longest, in order to avoid processing the whole list of keys on every sorting pass. A radix sorting algorithm was originally used to sort punched cards in several passes. A computer algorithm was invented for radix sort in 1954 at MIT by Harold H. Seward. In many large applications needing speed, the computer radix sort is an improvement on (slower) comparison sorts.

LSD radix sorts have resurfaced as an alternative to high-performance comparison-based sorting algorithms (like heapsort and mergesort) that require Ω(n log n) comparisons, where n is the number of items to be sorted. Comparison sorts can do no better than Ω(n log n) execution time but offer the flexibility of being able to sort with respect to more complicated orderings than a lexicographic one; however, this ability is of little importance in many practical applications.

Definition
Each key is first figuratively dropped into one level of buckets corresponding to the value of the rightmost digit. Each bucket preserves the original order of the keys as the keys are dropped in. There is a one-to-one correspondence between the number of buckets and the number of values that can be represented by a digit. Then, the process repeats with the next neighbouring digit until there are no more digits to process. In other words:

1. Take the least significant digit (or group of bits, both being examples of radices) of each key.
2. Group the keys based on that digit, but otherwise keep the original order of keys. (This is what makes the LSD radix sort a stable sort.)
3. Repeat the grouping process with each more significant digit.

The sort in step 2 is usually done using bucket sort or counting sort, which are efficient in this case since there are usually only a small number of digits.

An example
Original, unsorted list: 170, 45, 75, 90, 802, 24, 2, 66

Sorting by least significant digit (1s place) gives: 170, 90, 802, 2, 24, 45, 75, 66
(Notice that we keep 802 before 2, because 802 occurred before 2 in the original list, and similarly for the pairs 170 & 90 and 45 & 75.)

Sorting by next digit (10s place) gives: 802, 2, 24, 45, 66, 170, 75, 90

Sorting by most significant digit (100s place) gives: 2, 24, 45, 66, 75, 90, 170, 802

It is important to realize that each of the above steps requires just a single pass over the data, since each item can be placed in its correct bucket without having to be compared with other items.

Some LSD radix sort implementations allocate space for buckets by first counting the number of keys that belong in each bucket before moving keys into those buckets. The number of times that each digit occurs is stored in an array. Consider the previous list of keys viewed in a different way: 170, 045, 075, 090, 002, 024, 802, 066.

The first counting pass starts on the least significant digit of each key, producing an array of bucket sizes:

2 (bucket size for digits of 0: 170, 090)
2 (bucket size for digits of 2: 002, 802)
1 (bucket size for digits of 4: 024)
2 (bucket size for digits of 5: 045, 075)
1 (bucket size for digits of 6: 066)

A second counting pass on the next more significant digit of each key will produce an array of bucket sizes:

2 (bucket size for digits of 0: 002, 802)
1 (bucket size for digits of 2: 024)
1 (bucket size for digits of 4: 045)
1 (bucket size for digits of 6: 066)
2 (bucket size for digits of 7: 170, 075)
1 (bucket size for digits of 9: 090)

A third and final counting pass on the most significant digit of each key will produce an array of bucket sizes:

6 (bucket size for digits of 0: 002, 024, 045, 066, 075, 090)
1 (bucket size for digits of 1: 170)
1 (bucket size for digits of 8: 802)

At least one LSD radix sort implementation now counts the number of times that each digit occurs in each column for all columns in a single counting pass. (See the external links section.) Other LSD radix sort implementations allocate space for buckets dynamically as the space is needed.

Iterative version using queues


A simple version of an LSD radix sort can be achieved using queues as buckets. The following process is repeated for a number of times equal to the length of the longest key:

1. The integers are enqueued into an array of ten separate queues based on their digits from right to left. Computers often represent integers internally as fixed-length binary digits. Here, we will do something analogous with fixed-length decimal digits. So, using the numbers from the previous example, the queues for the 1st pass would be:

0: 170, 090
1: none
2: 002, 802
3: none
4: 024
5: 045, 075
6: 066
7-9: none

2. The queues are dequeued back into an array of integers, in increasing order. Using the same numbers, the array will look like this after the first pass: 170, 090, 002, 802, 024, 045, 075, 066

3. For the second pass:

Queues:
0: 002, 802
1: none
2: 024
3: none
4: 045
5: none
6: 066
7: 170, 075
8: none
9: 090

Array: 002, 802, 024, 045, 066, 170, 075, 090 (note that at this point only 802 and 170 are out of order)

4. For the third pass:

Queues:
0: 002, 024, 045, 066, 075, 090
1: 170
2-7: none
8: 802
9: none

Array: 002, 024, 045, 066, 075, 090, 170, 802 (sorted)

While this may not be the most efficient radix sort algorithm, it is relatively simple, and still quite efficient.

Example in Python
This example written in the Python programming language will perform the radix sort for any radix (base) of 2 or greater. Simplicity of exposition is chosen over clever programming, and so the log function is used instead of bit shifting techniques.
# Python 2.6
from math import log

def getDigit(n, b, k):
    return (n // b ** k) % b          # pulls the selected digit

def mkBlanks(k):
    return [ [] for i in range(k) ]   # create a list of empty lists to hold the split by digit

def split(l, b, k):
    x = mkBlanks(b)
    for i in l:
        x[getDigit(i, b, k)].append(i)   # append the number to the list selected by the digit
    return x

def merge(x):                         # concatenate the lists back in order for the next step
    l = []
    for sublist in x:
        for item in sublist:
            l.append(item)
    return l

def maxAbs(l):
    return max(abs(x) for x in l)     # largest abs value element of a list

def radixSort(l, b):
    passes = int(log(maxAbs(l), b) + 1)  # as many passes as there are digits in the longest number
    t = list(l)
    for i in range(passes):
        t = merge(split(t, b, i))
    return t
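Assuming the definitions above, the example list used earlier in this section sorts in base 10 as follows:

>>> radixSort([170, 45, 75, 90, 802, 24, 2, 66], 10)
[2, 24, 45, 66, 75, 90, 170, 802]

(Despite the abs() in maxAbs, this sketch as written is only reliable for non-negative integers.)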

Most significant digit radix sorts


A most significant digit (MSD) radix sort can be used to sort keys in lexicographic order. Unlike a least significant digit (LSD) radix sort, a most significant digit radix sort does not necessarily preserve the original order of duplicate keys. An MSD radix sort starts processing the keys from the most significant (leftmost) digit and proceeds towards the least significant (rightmost) digit; this sequence is the opposite of that of least significant digit (LSD) radix sorts. An MSD radix sort stops rearranging the position of a key when the processing reaches a unique prefix of the key. Some MSD radix sorts use one level of buckets in which to group the keys. See the counting sort and pigeonhole sort articles. Other MSD radix sorts use multiple levels of buckets, which form a trie or a path in a trie. A postman's sort / postal sort is a kind of MSD radix sort.

Recursion
A recursively subdividing MSD radix sort algorithm works as follows (a Python sketch appears below, after the implementation notes):

1. Take the most significant digit of each key.
2. Sort the list of elements based on that digit, grouping elements with the same digit into one bucket.
3. Recursively sort each bucket, starting with the next digit to the right.
4. Concatenate the buckets together in order.

Implementation
A two-pass method can be used to first find out how big each bucket needs to be and then place each key (or a pointer to the key) into the appropriate bucket. A single-pass system can also be used, where each bucket is dynamically allocated and resized as needed, but this runs the risk of serious memory fragmentation (discontiguous allocations of memory), which may degrade performance. This memory fragmentation could be avoided if a fixed allocation of buckets is used for all possible values of a digit, but, for an 8-bit digit, this would require 256 (2^8) buckets, even if not all of the buckets were used. So, this approach might use up all available memory quickly and go into paging space, where data is stored and accessed on a hard drive or some other secondary memory device instead of main memory, which would radically degrade performance. A fixed allocation approach would only make sense if each digit was very small, such as a single bit.
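A compact recursive sketch of this in Python (illustrative only; the base-10 digit handling and bucket layout are choices made for this sketch, not taken from the text):

def msd_radix_sort(keys, digit=None):
    # Recursive MSD radix sort for non-negative integers, base 10.
    # digit counts positions from 0 at the least significant end.
    if len(keys) <= 1:
        return keys
    if digit is None:
        digit = len(str(max(keys))) - 1   # start at the most significant digit
    if digit < 0:
        return keys                       # all digits processed
    buckets = [[] for _ in range(10)]
    for k in keys:
        buckets[(k // 10 ** digit) % 10].append(k)
    # Recursively sort each bucket on the next digit to the right, then concatenate.
    result = []
    for b in buckets:
        result.extend(msd_radix_sort(b, digit - 1))
    return result

# msd_radix_sort([170, 45, 75, 90, 2, 24, 802, 66]) returns [2, 24, 45, 66, 75, 90, 170, 802]

Shorter keys are implicitly treated as zero-padded on the left, so the result is numeric order, matching the worked example below.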

Recursive forward radix sort example


Sort the list: 170, 045, 075, 090, 002, 024, 802, 066

1. Sorting by most significant digit (100s place) gives:
Zero hundreds bucket: 045, 075, 090, 002, 024, 066
One hundreds bucket: 170
Eight hundreds bucket: 802

2. Sorting by next digit (10s place) is only needed for those numbers in the zero hundreds bucket (no other buckets contain more than one item):
Zero tens bucket: 002
Twenties bucket: 024
Forties bucket: 045
Sixties bucket: 066
Seventies bucket: 075
Nineties bucket: 090

3. Sorting by least significant digit (1s place) is not needed, as there is no tens bucket with more than one number. Therefore, the now sorted zero hundreds bucket is concatenated, joined in sequence, with the one hundreds bucket and eight hundreds bucket to give: 002, 024, 045, 066, 075, 090, 170, 802

This example used base ten digits for the sake of readability, but of course binary digits or perhaps bytes might make more sense for a binary computer to process.

In-place MSD radix sort implementations


Binary MSD radix sort, also called binary quicksort, can be implemented in-place by splitting the input array into two bins: the 0's bin and the 1's bin. The 0's bin is grown from the beginning of the array, whereas the 1's bin is grown from the end of the array. The 0's bin boundary is placed before the first array element. The 1's bin boundary is placed after the last array element. The most significant bit of the first array element is examined. If this bit is a 1, then the first element is swapped with the element in front of the 1's bin boundary (the last element of the array), and the 1's bin is grown by one element by decrementing the 1's boundary array index. If this bit is a 0, then the first element remains at its current location, and the 0's bin is grown by one element. The next array element examined is the one in front of the 0's bin boundary (i.e. the first element that is not in the 0's bin or the 1's bin). This process continues until the 0's bin and the 1's bin reach each other. The 0's bin and the 1's bin are then sorted recursively based on the next bit of each array element. Recursive processing continues until the least significant bit has been used for sorting.[2][3] Handling signed integers requires treating the most significant bit with the opposite sense, followed by unsigned treatment of the rest of the bits.

In-place MSD binary-radix sort can be extended to larger radix and retain in-place capability. Counting sort is used to determine the size of each bin and their starting index. Swapping is used to place the current element into its bin, followed by expanding the bin boundary. As the array elements are scanned the bins are skipped over and only elements between bins are processed, until the entire array has been processed and all elements end up in their respective bins. The number of bins is the same as the radix used, e.g. 16 bins for 16-radix. Each pass is based on a single digit (e.g. 4 bits per digit in the case of 16-radix), starting from the most significant digit. Each bin is then processed recursively using the next digit, until all digits have been used for sorting.[4] Neither the in-place binary-radix sort nor the n-bit-radix sort discussed in the paragraphs above is a stable algorithm.
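A sketch of the binary MSD radix sort ("binary quicksort") just described, for non-negative integers in Python (illustrative only; the function name and interface are this sketch's own):

def binary_quicksort(a, lo=0, hi=None, bit=None):
    # Partition a[lo:hi] into a 0's bin (front) and a 1's bin (back) on the given
    # bit, then recurse on each bin with the next less significant bit.
    if hi is None:
        hi = len(a)
    if bit is None:
        bit = max(a).bit_length() - 1 if a else 0   # most significant bit in use
    if hi - lo <= 1 or bit < 0:
        return a
    i, j = lo, hi             # i: 0's-bin boundary, j: 1's-bin boundary
    while i < j:
        if (a[i] >> bit) & 1:
            j -= 1
            a[i], a[j] = a[j], a[i]   # bit is 1: swap to the front of the 1's bin
        else:
            i += 1                    # bit is 0: grow the 0's bin
    binary_quicksort(a, lo, i, bit - 1)
    binary_quicksort(a, i, hi, bit - 1)
    return a

# binary_quicksort([5, 3, 7, 1, 4]) leaves the list as [1, 3, 4, 5, 7]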

Stable MSD radix sort implementations

MSD radix sort can be implemented as a stable algorithm, but this requires a memory buffer of the same size as the input array. The extra memory allows the input buffer to be scanned from the first array element to the last, moving the array elements to the destination bins in the same order, so equal elements are placed in the memory buffer in the same order they appeared in the input array. The MSD-based algorithm uses the extra memory buffer as the output on the first level of recursion, but swaps the input and output on the next level of recursion, to avoid the overhead of copying the output result back to the input buffer. Each of the bins is processed recursively, as is done for the in-place MSD radix sort. After the sort by the last digit has been completed, the output buffer is checked to see whether it is the original input array; if it is not, a single copy is performed. If the digit size is chosen such that the key size divided by the digit size is an even number, the copy at the end is avoided.[5]

Hybrid approaches
Radix sort, such as the two-pass method where counting sort is used during the first pass of each level of recursion, has a large constant overhead. Thus, when the bins get small, other sorting algorithms should be used, such as insertion sort. A good implementation of insertion sort is fast for small arrays, stable, and in-place, and it can significantly speed up radix sort; a sketch of such a base case follows.
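A hedged C sketch of the small-bin base case a hybrid radix sort might use (CUTOFF is an assumed tuning constant, not from the text):

#include <stdio.h>

#define CUTOFF 32   /* assumed threshold below which insertion sort is used */

static void insertion_sort(int *a, int n)
{
    for (int i = 1; i < n; i++) {
        int key = a[i], j = i - 1;
        while (j >= 0 && a[j] > key) {  /* shift larger keys one slot right */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

int main(void)
{
    int bin[] = {75, 2, 66, 24};        /* a small bin, as in the example */
    int n = sizeof bin / sizeof bin[0];
    if (n <= CUTOFF)                    /* the hybrid's base-case test */
        insertion_sort(bin, n);
    for (int i = 0; i < n; i++)
        printf("%d ", bin[i]);          /* 2 24 66 75 */
    printf("\n");
    return 0;
}

Inside the radix sort recursion, the same test would appear before distributing keys into bins: if the bin holds at most CUTOFF keys, sort it directly and return.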

Application to parallel computing


Note that this recursive sorting algorithm has particular application to parallel computing, as each of the bins can be sorted independently. In this case, each bin is passed to the next available processor. A single processor would be used at the start (the most significant digit). By the second or third digit, all available processors would likely be engaged. Ideally, as each subdivision is fully sorted, fewer and fewer processors would be utilized. In the worst case, all of the keys will be identical or nearly identical to each other, with the result that there will be little to no advantage to using parallel computing to sort the keys.

In the top level of recursion, the opportunity for parallelism is in the counting sort portion of the algorithm. Counting is highly parallel, amenable to the parallel_reduce pattern, and splits the work well across multiple cores until reaching the memory bandwidth limit. This portion of the algorithm has data-independent parallelism. Processing each bin in subsequent recursion levels is data-dependent, however. For example, if all keys were of the same value, then there would be only a single bin with any elements in it, and no parallelism would be available. For random inputs, all bins would be nearly equally populated and a large amount of parallelism would be available.[6]

Note that there are faster parallel sorting algorithms available; for example, those of optimal complexity O(log(n)) include the sorting network of the "three Hungarians" (Ajtai, Komlos, and Szemeredi) and the algorithm of Richard Cole,[7][8] while Batcher's bitonic merge sort has an algorithmic complexity of O(log^2(n)), all of which have a lower algorithmic time complexity than radix sort on a CREW-PRAM. The fastest known PRAM sorts were described in 1991 by David Powers with a parallelized quicksort that can operate in O(log(n)) time given enough processors by performing partitioning implicitly, as well as a radix sort that operates using the same trick in O(k), where k is the maximum key length.[9] However, neither the PRAM architecture nor a single sequential processor can actually be built in a way that will scale without the number of constant fan-out gate delays per cycle increasing as O(log(n)), so that in effect a pipelined version of Batcher's bitonic merge sort and the O(log(n)) PRAM sorts are all O(log^2(n)) in terms of clock cycles, with Powers acknowledging that Batcher's would have a lower constant in terms of gate delays than his parallel quicksort and radix sort, or Cole's merge sort, for a key-length-independent sorting network of O(n log^2(n)).[10]

Incremental trie-based radix sort


Another way to proceed with an MSD radix sort is to use more memory to create a trie to represent the keys, and then traverse the trie to visit each key in order. A depth-first traversal of a trie starting from the root node visits each key in order. A depth-first traversal of a trie, or of any other kind of acyclic tree structure, is equivalent to traversing a maze via the right-hand rule.

A trie essentially represents a set of strings or numbers, so a radix sort which uses a trie structure is not necessarily stable: the original order of duplicate keys is not necessarily preserved, because a set does not contain duplicate elements. Additional information must be associated with each key to record the population count or the original order of any duplicate keys, if keeping track of that information is important for a particular application. It may even be desirable to discard duplicate strings as the trie creation proceeds, if the goal is to find only the unique strings in sorted order. Some people sort a list of strings first and then make a separate pass through the sorted list to discard duplicate strings, which can be slower than using a trie to simultaneously sort and discard duplicate strings in one pass.

One of the advantages of maintaining the trie structure is that it makes it possible to determine quickly whether a particular key is a member of the set of keys, in O(k) time, proportional to the length of the key k and independent of the total number of keys. Determining set membership in a plain list, by contrast, requires binary search (O(k log n) time), linear search (O(kn) time), or some other method whose worst-case execution time depends in some way on the total number n of keys. It is sometimes possible to determine set membership in a plain list in O(k) time, independent of the total number of keys, such as when the list is known to be an arithmetic sequence or some other computable sequence. Maintaining the trie structure also makes it possible to insert new keys into the set incrementally, or delete keys from the set incrementally, while maintaining sorted order in O(k) time, independent of the total number of keys. In contrast, other radix sorting algorithms must, in the worst case, re-sort the entire list of keys each time a new key is added or deleted from an existing list, requiring O(kn) time.
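A hedged C sketch of a trie-based sort over lowercase words (the 26-way node layout and the dwarf names are illustrative choices; duplicates collapse into one node, matching the set behaviour described above):

#include <stdio.h>
#include <stdlib.h>

/* A 26-way trie over lowercase words. A pre-order (depth-first) traversal
 * prints the stored keys in ascending order. */
struct trie {
    struct trie *child[26];
    int is_key;                 /* nonzero if a key ends at this node */
};

static struct trie *node_new(void)
{
    return calloc(1, sizeof(struct trie));
}

static void trie_insert(struct trie *root, const char *s)
{
    for (; *s; s++) {
        int c = *s - 'a';
        if (!root->child[c])
            root->child[c] = node_new();
        root = root->child[c];
    }
    root->is_key = 1;
}

/* Pre-order traversal: emit the key at a node, then descend into the
 * branches in alphabetical order. */
static void trie_walk(const struct trie *t, char *buf, int depth)
{
    if (t->is_key) {
        buf[depth] = '\0';
        printf("%s\n", buf);
    }
    for (int c = 0; c < 26; c++)
        if (t->child[c]) {
            buf[depth] = 'a' + c;
            trie_walk(t->child[c], buf, depth + 1);
        }
}

int main(void)
{
    const char *dwarfs[] = {"sleepy", "doc", "grumpy", "bashful",
                            "happy", "sneezy", "dopey"};
    struct trie *root = node_new();
    char buf[64];
    for (int i = 0; i < 7; i++)
        trie_insert(root, dwarfs[i]);
    trie_walk(root, buf, 0);    /* prints the names in sorted order */
    return 0;
}

(For brevity the sketch never frees the trie; a real implementation would add a post-order teardown.)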

Snow White analogy

If the nodes were rooms connected by hallways, then here is how Snow White might proceed to visit all of the dwarfs if the place were dark, keeping her right hand on a wall at all times:

1. She travels down hall B to find Bashful.
2. She continues moving forward with her right hand on the wall, which takes her around the room and back up hall B.
3. She moves down halls D, O, and C to find Doc.
4. Continuing to follow the wall with her right hand, she goes back up hall C, then down hall P, where she finds Dopey.
5. She continues back up halls P, O, D, and then goes down hall G to find Grumpy.
6. She goes back up hall G, with her right hand still on the wall, and goes down hall H to the room where Happy is.
7. She travels back up hall H and turns right down halls S and L, where she finds Sleepy.
8. She goes back up hall L, down hall N, where she finally finds Sneezy.
9. She travels back up halls N and S to her starting point and knows that she is done.

This series of steps illustrates the path taken in the trie by Snow White via a depth-first traversal to visit the dwarfs in the ascending order of their names: Bashful, Doc, Dopey, Grumpy, Happy, Sleepy, and Sneezy. The algorithm that performs some operation on the data associated with each node of a tree first, such as printing the data, and then moves deeper into the tree is called a pre-order traversal, which is a kind of depth-first traversal. A pre-order traversal is used to process the contents of a trie in ascending order. If Snow White wanted to visit the dwarfs in the descending order of their names, then she could walk backwards while following the wall with her right hand, or, alternatively, walk forward while following the wall with her left hand. The algorithm that moves deeper into a tree first, until no further descent to unvisited nodes is possible, and then performs some operation on the data associated with each node is called a post-order traversal, which is another kind of depth-first traversal. A post-order traversal is used to process the contents of a trie in descending order.

The root node of the trie essentially represents a null string, an empty string, which can be useful for keeping track of the number of blank lines in a list of words. The null string can be associated with a circularly linked list, with the null string initially as its only member and the forward and backward pointers both initially pointing to the null string. The circularly linked list can then be expanded as each new key is inserted into the trie. In the original diagram the circularly linked list is represented as thick, grey, horizontally linked lines (diagram not reproduced here).

If a new key, other than the null string, is inserted into a leaf node of the trie, then the computer can go to the last preceding node where there was a key or a bifurcation to perform a depth-first search to find the lexicographic successor or predecessor of the inserted key, for the purpose of splicing the new key into the circularly linked list. The last preceding node where there was a key or a bifurcation, a fork in the path, is a parent node in the type of trie shown here, where only unique string prefixes are represented as paths in the trie.

If there is already a key associated with the parent node that would have been visited during a movement away from the root during a right-hand, forward-moving, depth-first traversal, then that immediately ends the depth-first search, as that key is the predecessor of the inserted key. For example, if Bashful is inserted into the trie, then the predecessor is the null string in the parent node, which is the root node in this case. In other words, if the key that is being inserted is on the leftmost branch of the parent node, then any string contained in the parent node is the lexicographic predecessor of the key that is being inserted; otherwise, the lexicographic predecessor of the key that is being inserted exists down the parent node's branch that is immediately to the left of the branch where the new key is being inserted.

For example, if Grumpy were the last key inserted into the trie, then the computer would have a choice of trying to find either the predecessor, Dopey, or the successor, Happy, with a depth-first search starting from the parent node of Grumpy. With no additional information to indicate which path is longer, the computer might traverse the longer path, D, O, P. If Dopey were the last key inserted into the trie, then the depth-first search starting from the parent node of Dopey would soon find the predecessor, "Doc", because that would be the only choice. If a new key is inserted into an internal node, then a depth-first search can be started from the internal node to find the lexicographic successor. For example, if the literal string "DO" were inserted in the node at the end of the path D, O, then a depth-first search could be started from that internal node to find the successor, "DOC", for the purpose of splicing the new string into the circularly linked list.

Forming the circularly linked list requires more memory but allows the keys to be visited more directly in either ascending or descending order via a linear traversal of the linked list rather than a depth-first traversal of the entire trie. This concept of a circularly linked trie structure is similar to the concept of a threaded binary tree. This structure will be called a circularly threaded trie.

When a trie is used to sort numbers, the number representations must all be the same length unless one is willing to perform a breadth-first traversal. When the number representations are visited via depth-first traversal, as in the diagram above, they will always be on the leaf nodes of the trie. Note how similar in concept this particular example of a trie is to the recursive forward radix sort example, which involves the use of buckets instead of a trie. Performing a radix sort with the buckets is like creating a trie and then discarding the non-leaf nodes.

UNIT 5

Graph (data structure)

A labeled graph of 6 vertices and 7 edges.

In computer science, a graph is an abstract data structure that is meant to implement the graph concept from mathematics.

A graph data structure consists mainly of a finite (and possibly mutable) set of ordered pairs, called edges or arcs, of certain entities called nodes or vertices. As in mathematics, an edge (x,y) is said to point or go from x to y. The nodes may be part of the graph structure, or may be external entities represented by integer indices or references. A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).

Operations
The basic operations provided by a graph data structure G usually include:
adjacent(G, x, y): tests whether there is an edge from node x to node y.
neighbors(G, x): lists all nodes y such that there is an edge from x to y.
add(G, x, y): adds to G the edge from x to y, if it is not there.
delete(G, x, y): removes the edge from x to y, if it is there.
get_node_value(G, x): returns the value associated with the node x.
set_node_value(G, x, a): sets the value associated with the node x to a.

Structures that associate values to the edges usually also provide:


get_edge_value(G, x, y): returns the value associated with the edge (x, y).
set_edge_value(G, x, y, v): sets the value associated with the edge (x, y) to v.

Representations
Different data structures for the representation of graphs are used in practice:

Adjacency list - implemented as an array of lists, with one list of destination nodes for each source node.
Incidence list - a variant of the adjacency list that allows for the description of the edges, at the cost of additional edges.
Adjacency matrix - a two-dimensional Boolean matrix, in which the rows and columns represent source and destination vertices; an entry in the matrix indicates whether an edge exists between the vertices associated with that row and column.
Incidence matrix - a two-dimensional Boolean matrix, in which the rows represent the vertices and the columns represent the edges. An entry indicates whether the vertex and the edge are related, i.e. incident.

Adjacency lists are preferred for sparse graphs; otherwise, an adjacency matrix is a good choice. For graphs with some regularity in the placement of edges, a symbolic graph is a possible choice of representation.
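A hedged C sketch of several of the basic operations listed above, realized over an adjacency matrix (MAXV and the sample graph are assumptions; an adjacency-list version would support the same interface):

#include <stdbool.h>
#include <stdio.h>

#define MAXV 16     /* assumed fixed vertex capacity */

struct graph {
    int n;                      /* number of vertices in use */
    bool edge[MAXV][MAXV];      /* edge[x][y]: is there an arc x -> y? */
};

static bool adjacent(const struct graph *g, int x, int y)
{
    return g->edge[x][y];
}

static void add_edge(struct graph *g, int x, int y)
{
    g->edge[x][y] = true;
}

static void delete_edge(struct graph *g, int x, int y)
{
    g->edge[x][y] = false;
}

static void neighbors(const struct graph *g, int x)
{
    for (int y = 0; y < g->n; y++)
        if (g->edge[x][y])
            printf("%d ", y);
    printf("\n");
}

int main(void)
{
    struct graph g = { .n = 4 };
    add_edge(&g, 0, 1);
    add_edge(&g, 0, 2);
    add_edge(&g, 2, 3);
    printf("adjacent(0,2) = %d\n", adjacent(&g, 0, 2));  /* 1 */
    neighbors(&g, 0);                                    /* 1 2 */
    delete_edge(&g, 0, 2);
    printf("adjacent(0,2) = %d\n", adjacent(&g, 0, 2));  /* 0 */
    return 0;
}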

Algorithms
Graph algorithms are a significant field of interest within computer science. Typical higher-level operations associated with graphs include finding a path between two nodes (as in depth-first search and breadth-first search) and finding the shortest path from one node to another (as in Dijkstra's algorithm). A solution to finding the shortest path from each node to every other node also exists, in the form of the Floyd-Warshall algorithm. A directed graph can be seen as a flow network, where each edge has a capacity and each edge receives a flow. The Ford-Fulkerson algorithm is used to find the maximum flow from a source to a sink in a graph.

Graphs
A graph is a mathematical structure consisting of a set of vertexes (also called nodes) and a set of edges. An edge is a pair of vertexes; the two vertexes are called the edge endpoints. Graphs are ubiquitous in computer science. They are used to model real-world systems such as the Internet (each node represents a router and each edge represents a connection between routers); airline connections (each node is an airport and each edge is a flight); or a city road network (each node represents an intersection and each edge represents a block). The wireframe drawings in computer graphics are another example of graphs.

A graph may be either undirected or directed. Intuitively, an undirected edge models a "two-way" or "duplex" connection between its endpoints, while a directed edge is a one-way connection, and is typically drawn as an arrow. A directed edge is often called an arc. Mathematically, an undirected edge is an unordered pair of vertexes, and an arc is an ordered pair. For example, a road network might be modeled as a directed graph, with one-way streets indicated by an arrow between endpoints in the appropriate direction, and two-way streets shown by a pair of parallel directed edges going in both directions between the endpoints. You might ask, why not use a single undirected edge for a two-way street? There is no theoretical problem with this, but from a practical programming standpoint, it is generally simpler and less error-prone to stick with all directed or all undirected edges.

An undirected graph can have at most n(n-1)/2 edges (one for each unordered pair of distinct vertices), while a directed graph can have at most n^2 edges (one per ordered pair). A graph is called sparse if it has many fewer than this many edges (typically O(n) edges), and dense if it has closer to n^2 edges. A multigraph can have more than one edge between the same two vertices. For example, if one were modeling airline flights, there might be multiple flights between two cities, occurring at different times of the day.

A path in a graph is a sequence of vertexes such that there exists an edge or arc between consecutive vertexes. The path is called a cycle if its first and last vertexes are the same. An undirected acyclic graph is equivalent to an undirected tree. A directed acyclic graph is called a DAG; it is not necessarily a tree.

Nodes and edges often have associated information, such as labels or weights. For example, in a graph of airline flights, a node might be labeled with the name of the corresponding airport, and an edge might have a weight equal to the flight time. The popular game "Six Degrees of Kevin Bacon" can be modeled by a labeled undirected graph. Each actor becomes a node, labeled by the actor's name. Nodes are connected by an edge when the two actors appeared together in some movie. We can label this edge by the name of the movie. Deciding if an actor is separated from Kevin Bacon by six or fewer steps is equivalent to finding a path of length at most six in the graph between Bacon's vertex and the other actor's vertex. (This can be done with the breadth-first search algorithm found in the companion Algorithms book. The Oracle of Bacon at the University of Virginia has actually implemented this algorithm and can tell you the path from any actor to Kevin Bacon in a few clicks.[1])

Directed Graphs

A directed graph.

The number of edges with one endpoint on a given vertex is called that vertex's degree. In a directed graph, the number of edges that point to a given vertex is called its in-degree, and the number that point from it is called its out-degree. Often, we may want to be able to distinguish between different nodes and edges. We can associate labels with either. We call such a graph labeled.

Directed Graph Operations
make-graph(): graph

Create a new graph, initially with no nodes or edges.


make-vertex(graph G, element value): vertex

Create a new vertex, with the given value.

make-edge(vertex u, vertex v): edge

Create an edge between u and v. In a directed graph, the edge will flow from u to v.
get-edges(vertex v): edge-set

Returns the set of edges flowing from v


get-neighbors(vertex v): vertex-set

Returns the set of vertexes connected to v

Undirected Graphs

An undirected graph.

In a directed graph, the edges point from one vertex to another, while in an undirected graph, they merely connect two vertices.

Weighted Graphs
We may also want to associate some cost or weight with the traversal of an edge. When we add this information, the graph is called weighted. An example of a weighted graph would be the distance between the capitals of a set of countries. Directed and undirected graphs may both be weighted. The operations on a weighted graph are the same, with the addition of a weight parameter during edge creation:

Weighted Graph Operations (an extension of undirected/directed graph operations)
make-edge(vertex u, vertex v, weight w): edge

Create an edge between u and v with weight w. In a directed graph, the edge will flow from u to v.

Graph Representations
Adjacency Matrix Representation

An adjacency matrix is one of the two common ways to represent a graph. The adjacency matrix shows which nodes are adjacent to one another. Two nodes are adjacent if there is an edge connecting them. In the case of a directed graph, if node j is adjacent to node i, there is an edge from i to j. In other words, if j is adjacent to i, you can get from i to j by traversing one edge. For a given graph with n nodes, the adjacency matrix will have dimensions of n x n. For an unweighted graph, the adjacency matrix will be populated with boolean values.

For any given node i, you can determine its adjacent nodes by looking at row i of the adjacency matrix. A value of true at (i, j) indicates that there is an edge from node i to node j, and false indicates no edge. In an undirected graph, the values at (i, j) and (j, i) will be equal. In a weighted graph, the boolean values will be replaced by the weight of the edge connecting the two nodes, with a special value indicating the absence of an edge. The memory use of an adjacency matrix is O(n^2).

Adjacency List Representation

The adjacency list is another common representation of a graph. There are many ways to implement this representation. One way is to have the graph maintain a list of lists, in which the first list is a list of indices corresponding to each node in the graph. Each of these refers to another list that stores the index of each node adjacent to this one. It might also be useful to associate the weight of each link with the adjacent node in this list.

Example: an undirected graph contains four nodes 1, 2, 3 and 4. 1 is linked to 2 and 3. 2 is linked to 3. 3 is linked to 4.

1 - [2, 3]
2 - [1, 3]
3 - [1, 2, 4]
4 - [3]

It might be useful to store the list of all the nodes in the graph in a hash table. The keys then would correspond to the indices of each node, and the value would be a reference to the list of adjacent node indices. Another implementation might require that each node keep a list of its adjacent nodes.
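A minimal C sketch of the four-node example above as an adjacency list of singly linked neighbour lists (the names and layout are assumptions; note each list holds neighbours in reverse insertion order, so the printed lists may differ in order from the example):

#include <stdio.h>
#include <stdlib.h>

#define NVERT 5                 /* index 0 unused, vertices are 1..4 */

struct edge {
    int to;
    struct edge *next;
};

static struct edge *adj[NVERT]; /* one neighbour list per vertex */

static void add_undirected(int u, int v)
{
    /* insert v at the head of u's list, and u at the head of v's */
    struct edge *e = malloc(sizeof *e);
    e->to = v; e->next = adj[u]; adj[u] = e;
    e = malloc(sizeof *e);
    e->to = u; e->next = adj[v]; adj[v] = e;
}

int main(void)
{
    add_undirected(1, 2);
    add_undirected(1, 3);
    add_undirected(2, 3);
    add_undirected(3, 4);
    for (int u = 1; u < NVERT; u++) {
        printf("%d -", u);
        for (struct edge *e = adj[u]; e; e = e->next)
            printf(" %d", e->to);
        printf("\n");
    }
    return 0;
}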

Graph Traversals
Depth-First Search

Depth-first search starts at a vertex v, visits one of its neighbours w, then one of w's neighbours y, and keeps going until it reaches a dead end. It then backtracks and visits the nodes reachable from the second-most-recently visited vertex, applying the same principle.

// Search in the subgraph for a node matching 'criteria'. Do not re-examine
// nodes listed in 'visited' which have already been tested.
GraphNode depth_first_search(GraphNode node, Predicate criteria, VisitedSet visited) {
    // Check that we haven't already visited this part of the graph
    if (visited.contains(node)) {
        return null;
    }
    visited.insert(node);
    // Test to see if this node satisfies the criteria
    if (criteria.apply(node.value)) {
        return node;
    }
    // Search adjacent nodes for a match
    for (adjacent in node.adjacentnodes()) {
        GraphNode ret = depth_first_search(adjacent, criteria, visited);
        if (ret != null) {
            return ret;
        }
    }
    // Give up - not in this part of the graph
    return null;
}

Breadth-First Search

Breadth-first search visits a node's neighbours first, then the unvisited neighbours of those neighbours, and so on. If it starts at vertex a, it will visit all vertices that have an edge from a. If some vertices are not reachable, another BFS must be started from a new vertex.
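A minimal C sketch of breadth-first search (the sample graph, MAXV, and vertex numbering are assumptions for the demonstration); it uses an array-based FIFO queue and a visited flag per vertex, matching the description above:

#include <stdbool.h>
#include <stdio.h>

#define MAXV 8

static bool edge[MAXV][MAXV];   /* adjacency matrix */

static void bfs(int n, int start)
{
    int queue[MAXV], head = 0, tail = 0;
    bool visited[MAXV] = { false };

    queue[tail++] = start;          /* enqueue the start vertex */
    visited[start] = true;

    while (head < tail) {
        int v = queue[head++];      /* dequeue */
        printf("%d ", v);
        for (int w = 0; w < n; w++) /* enqueue unvisited neighbours */
            if (edge[v][w] && !visited[w]) {
                visited[w] = true;
                queue[tail++] = w;
            }
    }
    printf("\n");
}

int main(void)
{
    /* a small undirected graph: 0-1, 0-2, 1-3, 2-3, 3-4 */
    int pairs[][2] = {{0,1},{0,2},{1,3},{2,3},{3,4}};
    for (int i = 0; i < 5; i++) {
        edge[pairs[i][0]][pairs[i][1]] = true;
        edge[pairs[i][1]][pairs[i][0]] = true;
    }
    bfs(5, 0);   /* visits 0 1 2 3 4: neighbours first, then theirs */
    return 0;
}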

Graph (mathematics)

A drawing of a labeled graph on 6 vertices and 7 edges.

In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges. Typically, a graph is depicted in diagrammatic form as a set of dots for the vertices, joined by lines or curves for the edges. Graphs are one of the objects of study in discrete mathematics.

The edges may be directed (asymmetric) or undirected (symmetric). For example, if the vertices represent people at a party, and there is an edge between two people if they shake hands, then this is an undirected graph, because if person A shook hands with person B, then person B also shook hands with person A. On the other hand, if the vertices represent people at a party, and there is an edge from person A to person B when person A knows of person B, then this graph is directed, because knowing of someone is not necessarily a symmetric relation (that is, one person knowing of another person does not necessarily imply the reverse; for example, many fans may know of a celebrity, but the celebrity is unlikely to know of all their fans). This latter type of graph is called a directed graph and the edges are called directed edges or arcs; in contrast, a graph where the edges are not directed is called undirected.

Vertices are also called nodes or points, and edges are also called lines. Graphs are the basic subject studied by graph theory. The word "graph" was first used in this sense by James Joseph Sylvester in 1878.[1]

Definitions
Definitions in graph theory vary. The following are some of the more basic ways of defining graphs and related mathematical structures.

Graph

A general example of a graph (actually, a pseudograph) with three vertices and six edges.

In the most common sense of the term,[2] a graph is an ordered pair G = (V, E) comprising a set V of vertices or nodes together with a set E of edges or lines, which are 2-element subsets of V (i.e., an edge is associated with two vertices, and that association is represented as an unordered pair of the vertices). To avoid ambiguity, this type of graph may be described precisely as undirected and simple.

Other senses of graph stem from different conceptions of the edge set. In one more generalized notion,[3] E is a set together with a relation of incidence that associates with each edge two vertices. In another generalized notion, E is a multiset of unordered pairs of (not necessarily distinct) vertices. Many authors call this type of object a multigraph or pseudograph. All of these variants and others are described more fully below.

The vertices belonging to an edge are called the ends, endpoints, or end vertices of the edge. A vertex may exist in a graph and not belong to an edge. V and E are usually taken to be finite, and many of the well-known results are not true (or are rather different) for infinite graphs because many of the arguments fail in the infinite case.

The order of a graph is |V| (the number of vertices). A graph's size is |E|, the number of edges. The degree of a vertex is the number of edges that connect to it, where an edge that connects to the vertex at both ends (a loop) is counted twice. For an edge {u, v}, graph theorists usually use the somewhat shorter notation uv.

Adjacency relation
The edges E of an undirected graph G induce a symmetric binary relation ~ on V that is called the adjacency relation of G. Specifically, for each edge {u, v} the vertices u and v are said to be adjacent to one another, which is denoted u ~ v.

Types of graphs
Distinction in terms of the main definition
As stated above, in different contexts it may be useful to define the term graph with different degrees of generality. Whenever it is necessary to draw a strict distinction, the following terms are used. Most commonly, in modern texts in graph theory, unless stated otherwise, graph means "undirected simple finite graph" (see the definitions below).

Undirected graph

A graph in which edges have no orientation, i.e., they are not ordered pairs, but sets {u, v} (or 2-multisets) of vertices.

Directed graph

A directed graph or digraph is an ordered pair D = (V, A) with V a set whose elements are called vertices or nodes, and A a set of ordered pairs of vertices, called arcs, directed edges, or arrows. An arc a = (x, y) is considered to be directed from x to y; y is called the head and x is called the tail of the arc; y is said to be a direct successor of x, and x is said to be a direct predecessor of y. If a path leads from x to y, then y is said to be a successor of x and reachable from x, and x is said to be a predecessor of y. The arc (y, x) is called the arc (x, y) inverted.

A directed graph D is called symmetric if, for every arc in D, the corresponding inverted arc also belongs to D. A symmetric loopless directed graph D = (V, A) is equivalent to a simple undirected graph G = (V, E), where the pairs of inverse arcs in A correspond one-to-one with the edges in E; thus the edges in G number |E| = |A|/2, or half the number of arcs in D. A variation on this definition is the oriented graph, in which at most one of (x, y) and (y, x) may be an arc.

Mixed graph

A mixed graph G is a graph in which some edges may be directed and some may be undirected. It is written as an ordered triple G = (V, E, A) with V, E, and A defined as above. Directed and undirected graphs are special cases.

Multigraph

A loop is an edge (directed or undirected) which starts and ends on the same vertex; these may be permitted or not permitted according to the application. In this context, an edge with two different ends is called a link. The term "multigraph" is generally understood to mean that multiple edges (and sometimes loops) are allowed. Where graphs are defined so as to allow loops and multiple edges, a multigraph is often defined to mean a graph without loops;[4] however, where graphs are defined so as to disallow loops and multiple edges, the term is often defined to mean a "graph" which can have both multiple edges and loops,[5] although many use the term "pseudograph" for this meaning.[6]

Simple graph

A simple graph with three vertices and three edges. Each vertex has degree two, so this is also a regular graph.

As opposed to a multigraph, a simple graph is an undirected graph that has no loops and no more than one edge between any two different vertices. In a simple graph the edges form a set (rather than a multiset) and each edge is a pair of distinct vertices. In a simple graph with n vertices, every vertex has a degree that is less than n (the converse, however, is not true: there exist non-simple graphs with n vertices in which every vertex has a degree smaller than n).

Weighted graph

A graph is a weighted graph if a number (weight) is assigned to each edge. Such weights might represent, for example, costs, lengths or capacities, depending on the problem. The weight of the graph is the sum of the weights given to all edges.

Half-edges, loose edges

In exceptional situations it is even necessary to have edges with only one end, called half-edges, or no ends (loose edges); see for example signed graphs and biased graphs.

Important graph classes


Regular graph

Main article: Regular graph

A regular graph is a graph where each vertex has the same number of neighbors, i.e., every vertex has the same degree or valency. A regular graph with vertices of degree k is called a k-regular graph or regular graph of degree k.

Complete graph

A complete graph with 5 vertices. Each vertex has an edge to every other vertex.

Main article: Complete graph

Complete graphs have the feature that each pair of vertices has an edge connecting them.

Finite and infinite graphs

A finite graph is a graph G = (V, E) such that V and E are finite sets. An infinite graph is one with an infinite set of vertices or edges or both. Most commonly in graph theory it is implied that the graphs discussed are finite. If the graphs are infinite, that is usually specifically stated.

Graph classes in terms of connectivity

Main article: Connectivity (graph theory)

In an undirected graph G, two vertices u and v are called connected if G contains a path from u to v; otherwise, they are called disconnected. A graph is called connected if every pair of distinct vertices in the graph is connected; otherwise, it is called disconnected. A graph is called k-vertex-connected or k-edge-connected if no set of k-1 vertices (respectively, edges) exists that disconnects the graph. A k-vertex-connected graph is often called simply k-connected.

A directed graph is called weakly connected if replacing all of its directed edges with undirected edges produces a connected (undirected) graph. It is strongly connected or strong if it contains a directed path from u to v and a directed path from v to u for every pair of vertices u, v.

Properties of graphs
Two edges of a graph are called adjacent (sometimes coincident) if they share a common vertex. Two arrows of a directed graph are called consecutive if the head of the first one is at the nock (notch end) of the second one. Similarly, two vertices are called adjacent if they share a common edge (consecutive if they are at the notch and at the head of an arrow), in which case the common edge is said to join the two vertices. An edge and a vertex on that edge are called incident.

The graph with only one vertex and no edges is called the trivial graph. A graph with only vertices and no edges is known as an edgeless graph. The graph with no vertices and no edges is sometimes called the null graph or empty graph, but the terminology is not consistent and not all mathematicians allow this object.

In a weighted graph or digraph, each edge is associated with some value, variously called its cost, weight, length or other term depending on the application; such graphs arise in many contexts, for example in optimal routing problems such as the traveling salesman problem.

Normally, the vertices of a graph, by their nature as elements of a set, are distinguishable. This kind of graph may be called vertex-labeled. However, for many questions it is better to treat vertices as indistinguishable; then the graph may be called unlabeled. (Of course, the vertices may still be distinguishable by the properties of the graph itself, e.g., by the numbers of incident edges.) The same remarks apply to edges, so graphs which have labeled edges are called edge-labeled graphs. Graphs with labels attached to edges or vertices are more generally designated as labeled. Consequently, graphs in which vertices are indistinguishable and edges are indistinguishable are called unlabeled. (Note that in the literature the term labeled may apply to other kinds of labeling, besides that which serves only to distinguish different vertices or edges.)

Examples

A graph with six nodes.

The diagram at right is a graphic representation of the following graph:

V = {1, 2, 3, 4, 5, 6}
E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}.

In category theory a small category can be represented by a directed multigraph in which the objects of the category are represented as vertices and the morphisms as directed edges. Then, the functors between categories induce some, but not necessarily all, of the digraph morphisms of the graph.

In computer science, directed graphs are used to represent knowledge (e.g., Conceptual graph), finite state machines, and many other discrete structures. A binary relation R on a set X defines a directed graph. An element x of X is a direct predecessor of an element y of X iff xRy.

Important graphs
Basic examples are:

In a complete graph, each pair of vertices is joined by an edge; that is, the graph contains all possible edges.
In a bipartite graph, the vertex set can be partitioned into two sets, W and X, so that no two vertices in W are adjacent and no two vertices in X are adjacent. Alternatively, it is a graph with a chromatic number of 2.
In a complete bipartite graph, the vertex set is the union of two disjoint sets, W and X, so that every vertex in W is adjacent to every vertex in X but there are no edges within W or X.
In a linear graph or path graph of length n, the vertices can be listed in order, v0, v1, ..., vn, so that the edges are v_{i-1}v_i for each i = 1, 2, ..., n. If a linear graph occurs as a subgraph of another graph, it is a path in that graph.
In a cycle graph of length n >= 3, the vertices can be named v1, ..., vn so that the edges are v_{i-1}v_i for each i = 2, ..., n, in addition to vnv1. Cycle graphs can be characterized as connected 2-regular graphs. If a cycle graph occurs as a subgraph of another graph, it is a cycle or circuit in that graph.
A planar graph is a graph whose vertices and edges can be drawn in a plane such that no two of the edges intersect (i.e., embedded in a plane).
A tree is a connected graph with no cycles.
A forest is a graph with no cycles (i.e. the disjoint union of one or more trees).

More advanced kinds of graphs are:

The Petersen graph and its generalizations
Perfect graphs
Cographs
Other graphs with large automorphism groups: vertex-transitive, arc-transitive, and distance-transitive graphs.
Strongly regular graphs and their generalization, distance-regular graphs.

Operations on graphs
There are several operations that produce new graphs from old ones, which might be classified into the following categories:

Elementary operations, sometimes called "editing operations" on graphs, which create a new graph from the original one by a simple, local change, such as addition or deletion of a vertex or an edge, merging and splitting of vertices, etc.
Graph rewrite operations, replacing the occurrence of some pattern graph within the host graph by an instance of the corresponding replacement graph.
Unary operations, which create a significantly new graph from the old one. Examples:
  o Line graph
  o Dual graph
  o Complement graph
Binary operations, which create a new graph from two initial graphs. Examples:
  o Disjoint union of graphs
  o Cartesian product of graphs
  o Tensor product of graphs
  o Strong product of graphs
  o Lexicographic product of graphs

Generalizations
In a hypergraph, an edge can join more than two vertices.

An undirected graph can be seen as a simplicial complex consisting of 1-simplices (the edges) and 0-simplices (the vertices). As such, complexes are generalizations of graphs, since they allow for higher-dimensional simplices.

Every graph gives rise to a matroid.

In model theory, a graph is just a structure. But in that case, there is no limitation on the number of edges: it can be any cardinal number; see continuous graph.

In computational biology, power graph analysis introduces power graphs as an alternative representation of undirected graphs.

Adjacency matrix
In mathematics and computer science, an adjacency matrix is a means of representing which vertices of a graph are adjacent to which other vertices. Another matrix representation for a graph is the incidence matrix.

Specifically, the adjacency matrix of a finite graph G on n vertices is the n x n matrix where the non-diagonal entry aij is the number of edges from vertex i to vertex j, and the diagonal entry aii, depending on the convention, is either once or twice the number of edges (loops) from vertex i to itself. Undirected graphs often use the former convention of counting loops twice, whereas directed graphs typically use the latter convention. There exists a unique adjacency matrix for each isomorphism class of graphs (up to permuting rows and columns), and it is not the adjacency matrix of any other isomorphism class of graphs. In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is undirected, the adjacency matrix is symmetric.

The relationship between a graph and the eigenvalues and eigenvectors of its adjacency matrix is studied in spectral graph theory.

Examples
Here is an example of a labeled graph and its adjacency matrix. The convention followed here is that an adjacent edge counts 1 in the matrix for an undirected graph. (The vertices are labeled 1 to 6.)

[Figure: a labeled graph on six vertices shown alongside its adjacency matrix; not reproduced here.]

The adjacency matrix of a complete graph is all 1's except for 0's on the diagonal. The adjacency matrix of an empty graph is a zero matrix.

Adjacency matrix of a bipartite graph


The adjacency matrix A of a bipartite graph whose parts have r and s vertices has the block form

    A = | O    B |
        | B^T  O |

where B is an r x s matrix and O is an all-zero matrix. Clearly, the matrix B uniquely represents the bipartite graph, and it is commonly called its biadjacency matrix.

Formally, let G = (U, V, E) be a bipartite graph or bigraph with parts U = {u1, ..., ur} and V = {v1, ..., vs}. An r x s 0-1 matrix B is called the biadjacency matrix if Bi,j = 1 if and only if ui and vj are adjacent. If G is a bipartite multigraph or weighted graph, then the elements Bi,j are taken to be the number of edges between, or the weight of, (ui, vj) respectively.

Properties
The adjacency matrix of an undirected simple graph is symmetric, and therefore has a complete set of real eigenvalues and an orthogonal eigenvector basis. The set of eigenvalues of a graph is the spectrum of the graph.

Suppose two directed or undirected graphs G1 and G2 with adjacency matrices A1 and A2 are given. G1 and G2 are isomorphic if and only if there exists a permutation matrix P such that

P A1 P^(-1) = A2.

In particular, A1 and A2 are similar and therefore have the same minimal polynomial, characteristic polynomial, eigenvalues, determinant and trace. These can therefore serve as isomorphism invariants of graphs. However, two graphs may possess the same set of eigenvalues but not be isomorphic.

If A is the adjacency matrix of the directed or undirected graph G, then the matrix A^n (i.e., the matrix product of n copies of A) has an interesting interpretation: the entry in row i and column j gives the number of (directed or undirected) walks of length n from vertex i to vertex j. This implies, for example, that the number of triangles in an undirected graph G is exactly the trace of A^3 divided by 6.

The main diagonal of every adjacency matrix corresponding to a graph without loops has all zero entries. For d-regular graphs, d is an eigenvalue of A for the all-ones vector (1, 1, ..., 1), and G is connected if and only if the multiplicity of the eigenvalue d is 1. It can be shown that -d is also an eigenvalue of A if G is a connected bipartite graph. These are results of the Perron-Frobenius theorem.
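As a small check of the walk-counting property, the following C sketch (the triangle graph is chosen purely for illustration) computes A^3 for the complete graph on three vertices and recovers one triangle from trace(A^3)/6:

#include <stdio.h>

#define N 3

/* Multiply two N x N integer matrices: c = a * b. */
static void matmul(const int a[N][N], const int b[N][N], int c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0;
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

int main(void)
{
    int A[N][N] = {{0,1,1},{1,0,1},{1,1,0}};   /* the triangle graph */
    int A2[N][N], A3[N][N];
    matmul(A, A, A2);
    matmul(A2, A, A3);
    /* (A^3)[i][i] counts closed walks of length 3 from vertex i */
    int trace = A3[0][0] + A3[1][1] + A3[2][2];
    printf("triangles = %d\n", trace / 6);      /* prints 1 */
    return 0;
}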

Variations

The Seidel adjacency matrix or (-1,1,0)-adjacency matrix of a simple graph has zero on the diagonal and entry aij = -1 if ij is an edge and +1 if it is not. This matrix is used in studying strongly regular graphs and two-graphs.

A distance matrix is like a higher-level adjacency matrix. Instead of only providing information about whether or not two vertices are connected, it also tells the distances between them. This assumes the length of every edge is 1. A variation allows an edge to have a length other than 1.

Data structures
When used as a data structure, the main alternative to the adjacency matrix is the adjacency list. Because each entry in the adjacency matrix requires only one bit, it can be represented in a very compact way, occupying only n^2/8 bytes of contiguous space, where n is the number of vertices. Besides just avoiding wasted space, this compactness encourages locality of reference.

On the other hand, for a sparse graph, adjacency lists win out, because they do not use any space to represent edges which are not present. Using a naive array implementation on a 32-bit computer, an adjacency list for an undirected graph requires about 8e bytes of storage, where e is the number of edges. Noting that a simple graph can have at most n^2 edges (allowing loops), we can let d = e/n^2 denote the density of the graph. Then 8e > n^2/8, i.e. the adjacency list representation occupies more space, precisely when d > 1/64. Thus a graph must be sparse indeed to justify an adjacency list representation.

Besides the space tradeoff, the different data structures also facilitate different operations. Finding all vertices adjacent to a given vertex in an adjacency list is as simple as reading the list. With an adjacency matrix, an entire row must instead be scanned, which takes O(n) time. Whether there is an edge between two given vertices can be determined at once with an adjacency matrix, while requiring time proportional to the minimum degree of the two vertices with the adjacency list.
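As a worked illustration of the break-even point above (the figures are chosen for illustration): for n = 10,000 vertices, the bit-packed adjacency matrix occupies n^2/8 = 100,000,000/8 = 12,500,000 bytes (about 12.5 MB) regardless of the edge count, while the adjacency list occupies roughly 8e bytes. The two are equal when 8e = n^2/8, i.e. at e = n^2/64 = 1,562,500 edges, which is exactly the density d = 1/64. A graph with, say, 100,000 edges (d = 0.001) needs only about 800,000 bytes as lists, far less than the matrix.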

Breadth-first search
Order in which the nodes are expanded: (figure not reproduced)

Class: Search algorithm
Data structure: Graph
Worst case performance: O(|V| + |E|) = O(b^d)
Worst case space complexity: O(|V| + |E|) = O(b^d)
In graph theory, breadth-first search (BFS) is a graph search algorithm that begins at the root node and explores all the neighboring nodes. Then for each of those nearest nodes, it explores their unexplored neighbor nodes, and so on, until it finds the goal.

How it works
BFS is an uninformed search method that aims to expand and examine all nodes of a graph or combination of sequences by systematically searching through every solution. In other words, it exhaustively searches the entire graph or sequence, without considering the goal, until it finds it. It does not use a heuristic.

From the standpoint of the algorithm, all child nodes obtained by expanding a node are added to a FIFO (First In, First Out) queue. In typical implementations, nodes that have not yet been examined for their neighbors are placed in some container (such as a queue or linked list) called "open", and once examined they are placed in the container "closed".

[Figures: an example map of Germany with some connections between cities; the breadth-first tree obtained when running BFS on the map, starting in Frankfurt; an animated example of a breadth-first search. Not reproduced here.]

Algorithm (informal)
1. Enqueue the root node.
2. Dequeue a node and examine it.
   o If the element sought is found in this node, quit the search and return a result.
   o Otherwise enqueue any successors (the direct child nodes) that have not yet been discovered.
3. If the queue is empty, every node on the graph has been examined: quit the search and return "not found".
4. Repeat from step 2.

Note: using a stack instead of a queue would turn this algorithm into a depth-first search.

Pseudocode
procedure BFS(Graph, source):
    create a queue Q
    enqueue source onto Q
    mark source
    while Q is not empty:
        dequeue an item from Q into v
        for each edge e incident on v in Graph:
            let w be the other end of e
            if w is not marked:
                mark w
                enqueue w onto Q

Space complexity
Since all of the nodes of a level must be saved until their child nodes in the next level have been generated, the space complexity is proportional to the number of nodes at the deepest level. Given a branching factor b and graph depth d, the asymptotic space complexity is the number of nodes at the deepest level, O(b^d). When the number of vertices and edges in the graph are known ahead of time, the space complexity can also be expressed as O(|E| + |V|), where |E| is the cardinality of the set of edges (the number of edges), and |V| is the cardinality of the set of vertices. In the worst case the graph has a depth of 1 and all vertices must be stored. Since it is exponential in the depth of the graph, breadth-first search is often impractical for large problems on systems with bounded space.

Time complexity
Since in the worst case breadth-first search has to consider all paths to all possible nodes, the time complexity of breadth-first search is 1 + b + b^2 + ... + b^d, which is O(b^d). The time complexity can also be expressed as O(|E| + |V|), since every vertex and every edge will be explored in the worst case.

Completeness
Breadth-first search is complete. This means that if there is a solution, breadth-first search will find it regardless of the kind of graph. However, if the graph is infinite and there is no solution, breadth-first search will diverge.

Proof of completeness: if the shallowest goal node is at some finite depth d, breadth-first search will eventually find it after expanding all shallower nodes (provided that the branching factor b is finite).[1]

Optimality
For unit-step costs, breadth-first search is optimal. In general, breadth-first search is not optimal, since it always returns the result with the fewest edges between the start node and the goal node; if the graph is a weighted graph, and therefore has costs associated with each step, a goal next to the start does not have to be the cheapest goal available. This problem is solved by improving breadth-first search to uniform-cost search, which considers the path costs. Nevertheless, if the graph is not weighted, and therefore all step costs are equal, breadth-first search will find the nearest and the best solution.

Bias towards nodes of high degree


It has been empirically observed (and analytically shown for random graphs) that incomplete breadth-first search is biased towards nodes of high degree. This makes a breadth-first search sample of a graph very difficult to interpret. For example, a breadth-first sample of 1 million nodes in Facebook (less than 1% of the entire graph) overestimates the average node degree by 240%.[2]

Applications
Breadth-first search can be used to solve many problems in graph theory, for example:

Finding all nodes within one connected component
Copying collection (Cheney's algorithm)
Finding the shortest path between two nodes u and v
Testing a graph for bipartiteness
(Reverse) Cuthill-McKee mesh numbering
Ford-Fulkerson method for computing the maximum flow in a flow network
Serialization/deserialization of a binary tree (versus serialization in sorted order), allowing the tree to be reconstructed efficiently

Finding connected components


The set of nodes reached by a BFS (breadth-first search) forms the connected component containing the starting node.

Testing bipartiteness
BFS can be used to test bipartiteness, by starting the search at any vertex and giving alternating labels to the vertices visited during the search. That is, give label 0 to the starting vertex, 1 to all its neighbours, 0 to those neighbours' neighbours, and so on. If at any step a vertex has (visited) neighbours with the same label as itself, then the graph is not bipartite. If the search ends without such a situation occurring, then the graph is bipartite.
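A hedged C sketch of this two-colouring test (the adjacency matrix, the label array, and the sample 4-cycle are assumptions for the demonstration):

#include <stdbool.h>
#include <stdio.h>

#define MAXV 8

static bool edge[MAXV][MAXV];

/* Returns true if the component containing 'start' is bipartite. */
static bool bipartite(int n, int start)
{
    int label[MAXV];            /* -1 = unvisited, otherwise 0 or 1 */
    int queue[MAXV], head = 0, tail = 0;

    for (int i = 0; i < n; i++)
        label[i] = -1;
    label[start] = 0;           /* give label 0 to the starting vertex */
    queue[tail++] = start;

    while (head < tail) {
        int v = queue[head++];
        for (int w = 0; w < n; w++) {
            if (!edge[v][w]) continue;
            if (label[w] == -1) {            /* give the opposite label */
                label[w] = 1 - label[v];
                queue[tail++] = w;
            } else if (label[w] == label[v])
                return false;                /* odd cycle found */
        }
    }
    return true;
}

int main(void)
{
    /* a 4-cycle 0-1-2-3-0, which is bipartite */
    int pairs[][2] = {{0,1},{1,2},{2,3},{3,0}};
    for (int i = 0; i < 4; i++) {
        edge[pairs[i][0]][pairs[i][1]] = true;
        edge[pairs[i][1]][pairs[i][0]] = true;
    }
    printf("%s\n", bipartite(4, 0) ? "bipartite" : "not bipartite");
    return 0;
}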

Depth-first search
Order in which the nodes are expanded: (figure not reproduced)

Class: Search algorithm
Data structure: Graph
Worst case performance: O(|V| + |E|) for explicit graphs traversed without repetition; O(b^d) for implicit graphs with branching factor b searched to depth d
Worst case space complexity: O(|V|) if the entire graph is traversed without repetition; O(longest path length searched) for implicit graphs without elimination of duplicate nodes

Depth-first search (DFS) is an algorithm for traversing or searching a tree, tree structure, or graph. One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.

Formal definition
Formally, DFS is an uninformed search that progresses by expanding the first child node of the search tree that appears, going deeper and deeper until a goal node is found or until it hits a node that has no children. Then the search backtracks, returning to the most recent node it hasn't finished exploring. In a non-recursive implementation, all freshly expanded nodes are added to a stack for exploration.

The time and space analysis of DFS differs according to its application area. In theoretical computer science, DFS is typically used to traverse an entire graph, and takes time O(|V| + |E|), linear in the size of the graph. In these applications it also uses space O(|V|) in the worst case to store the stack of vertices on the current search path as well as the set of already-visited vertices. Thus, in this setting, the time and space bounds are the same as for breadth-first search, and the choice of which of these two algorithms to use depends less on their complexity and more on the different properties of the vertex orderings the two algorithms produce.

For applications of DFS to search problems in artificial intelligence, however, the graph to be searched is often either too large to visit in its entirety or even infinite, and DFS may suffer from non-termination when the length of a path in the search tree is infinite. Therefore, the search is only performed to a limited depth, and, due to limited memory availability, one typically does not use data structures that keep track of the set of all previously visited vertices. In this case, the time is still linear in the number of expanded vertices and edges (although this number is not the same as the size of the entire graph, because some vertices may be searched more than once and others not at all), but the space complexity of this variant of DFS is only proportional to the depth limit, much smaller than the space needed for searching to the same depth using breadth-first search. For such applications, DFS also lends itself much better to heuristic methods of choosing a likely-looking branch. When an appropriate depth limit is not known a priori, iterative deepening depth-first search applies DFS repeatedly with a sequence of increasing limits; in the artificial intelligence mode of analysis, with a branching factor greater than one, iterative deepening increases the running time by only a constant factor over the case in which the correct depth limit is known, due to the geometric growth of the number of nodes per level.
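A minimal C sketch of the non-recursive formulation just mentioned, with an explicit stack replacing the call stack (the adjacency matrix and sample graph are assumptions):

#include <stdbool.h>
#include <stdio.h>

#define MAXV 8

static bool edge[MAXV][MAXV];

static void dfs_iterative(int n, int start)
{
    int stack[MAXV * MAXV], top = 0;    /* generous bound for the sketch */
    bool visited[MAXV] = { false };

    stack[top++] = start;
    while (top > 0) {
        int v = stack[--top];           /* pop the most recently pushed */
        if (visited[v]) continue;
        visited[v] = true;
        printf("%d ", v);
        /* push neighbours in reverse order so the lowest-numbered
         * neighbour is explored first */
        for (int w = n - 1; w >= 0; w--)
            if (edge[v][w] && !visited[w])
                stack[top++] = w;
    }
    printf("\n");
}

int main(void)
{
    int pairs[][2] = {{0,1},{0,2},{1,3},{2,3},{3,4}};
    for (int i = 0; i < 5; i++) {
        edge[pairs[i][0]][pairs[i][1]] = true;
        edge[pairs[i][1]][pairs[i][0]] = true;
    }
    dfs_iterative(5, 0);    /* visits 0 1 3 2 4 on this graph */
    return 0;
}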

Example
For the following graph:

a depth-first search starting at A, assuming that the left edges in the shown graph are chosen before right edges, and assuming the search remembers previously visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G.

Performing the same search without remembering previously visited nodes results in visiting nodes in the order A, B, D, F, E, A, B, D, F, E, etc. forever, caught in the A, B, D, F, E cycle and never reaching C or G.

Iterative deepening prevents this loop and will reach the following nodes on the following depths, assuming it proceeds left-to-right as above:

0: A
1: A (repeated), B, C, E (Note that iterative deepening has now seen C, when a conventional depth-first search did not.)
2: A, B, D, F, C, G, E, F (Note that it still sees C, but that it came later. Also note that it sees E via a different path, and loops back to F twice.)
3: A, B, D, F, E, C, G, E, F, B

For this graph, as more depth is added, the two cycles "ABFE" and "AEFB" will simply get longer before the algorithm gives up and tries another branch.

Output of a depth-first search

[Figure: the four types of edges defined by a spanning tree; not reproduced here.]

The most natural result of a depth-first search of a graph (if it is considered as a function rather than a procedure) is a spanning tree of the vertices reached during the search. Based on this spanning tree, the edges of the original graph can be divided into three classes: forward edges, which point from a node of the tree to one of its descendants; back edges, which point from a node to one of its ancestors; and cross edges, which do neither. Sometimes tree edges, edges which belong to the spanning tree itself, are classified separately from forward edges. It can be shown that if the original graph is undirected then all of its edges are tree edges or back edges.

Vertex orderings
It is also possible to use the depth-first search to linearly order the vertices of the original graph (or tree). There are three common ways of doing this: A preordering is a list of the vertices in the order that they were first visited by the depth-first search algorithm. This is a compact and natural way of describing the progress of the search, as was done earlier in this article. A preordering of an expression tree is the expression in Polish notation. A postordering is a list of the vertices in the order that they were last visited by the algorithm. A postordering of an expression tree is the expression in reverse Polish notation. A reverse postordering is the reverse of a postordering, i.e. a list of the vertices in the opposite order of their last visit. When searching a tree, reverse postordering is the same as preordering, but in general they are different when searching a graph. For example, when searching the directed graph

beginning at node A, one visits the nodes in sequence, producing either the list A B D B A C A or the list A C D C A B A (depending upon whether the algorithm chooses to visit B or C first). Note that repeat visits in the form of backtracking to a node, to check whether it still has unvisited neighbours, are included here (even if it is found to have none). Thus the possible preorderings are A B D C and A C D B (ordering each node by its leftmost occurrence in the list above), while the possible reverse postorderings are A C B D and A B C D (ordering each node by its rightmost occurrence in the list above). Reverse postordering produces a topological sorting of any directed acyclic graph. This ordering is also useful in control-flow analysis, as it often represents a natural linearization of the control flow. The graph above might represent the flow of control in a code fragment like

if (A) then { B } else { C } D

and it is natural to consider this code in the order A B C D or A C B D, but not natural to use the order A B D C or A C D B.
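A minimal sketch of computing all three orderings with a single recursive DFS, assuming an adjacency-list dictionary as before (the function name orderings is illustrative):

    def orderings(graph, start):
        """Return (preorder, postorder); reverse postorder is postorder[::-1]."""
        pre, post, visited = [], [], set()
        def visit(v):
            visited.add(v)
            pre.append(v)            # first visit -> preordering
            for w in graph[v]:
                if w not in visited:
                    visit(w)
            post.append(v)           # last visit -> postordering
        visit(start)
        return pre, post

    # For a DAG, post[::-1] is a topological order of the reachable vertices.
    g = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
    pre, post = orderings(g, 'A')
    print(pre, post[::-1])           # ['A', 'B', 'D', 'C'] ['A', 'C', 'B', 'D']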

Applications
Algorithms that use depth-first search as a building block include:
Finding connected components.
Topological sorting.
Finding 2-(edge or vertex)-connected components.
Finding strongly connected components.
Planarity testing.
Solving puzzles with only one solution, such as mazes. (DFS can be adapted to find all solutions to a maze by only including nodes on the current path in the visited set.)
Biconnectivity.

Shortest Paths
Input: a graph with weights or costs associated with each edge. Output: the list of edges forming the shortest path. Sample problems:
Find the shortest path between two specified vertices.
Find the shortest path from vertex S to all other vertices.
Find the shortest path between all pairs of vertices.

Dijkstra's algorithm
Class: search algorithm
Data structure: graph
Worst-case performance: O(|E| + |V| log |V|)

Dijkstra's algorithm, conceived by Dutch computer scientist Edsger Dijkstra in 1956 and published in 1959,[1][2] is a graph search algorithm that solves the single-source shortest path problem for a graph with nonnegative edge path costs, producing a shortest-path tree. This algorithm is often used in routing. An equivalent algorithm was developed by Edward F. Moore in 1957.[3]

For a given source vertex (node) in the graph, the algorithm finds the path with lowest cost (i.e. the shortest path) between that vertex and every other vertex. It can also be used for finding the cost of a shortest path from a single vertex to a single destination vertex, by stopping the algorithm once the shortest path to the destination vertex has been determined. For example, if the vertices of the graph represent cities and edge path costs represent driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other cities. As a result, shortest-path-first computation is widely used in network routing protocols, most notably IS-IS and OSPF (Open Shortest Path First).

Algorithm

Let the node at which we are starting be called the initial node. Let the distance of node Y be the distance from the initial node to Y. Dijkstra's algorithm will assign some initial distance values and will try to improve them step by step.

1. Assign to every node a distance value. Set it to zero for our initial node and to infinity for all other nodes.
2. Mark all nodes as unvisited. Set the initial node as current.
3. For the current node, consider all its unvisited neighbors and calculate their tentative distance (from the initial node). For example, if the current node (A) has a distance of 6, and an edge connecting it with another node (B) has length 2, the distance to B through A will be 6+2=8. If this distance is less than the previously recorded distance (infinity in the beginning, zero for the initial node), overwrite the distance.
4. When we are done considering all neighbors of the current node, mark it as visited. A visited node will not be checked ever again; its distance recorded now is final and minimal.
5. If all nodes have been visited, finish. Otherwise, set the unvisited node with the smallest distance (from the initial node) as the next "current node" and continue from step 3.

Description
Note: for ease of understanding, this discussion uses the terms intersection, road and map; formally these terms are vertex, edge and graph, respectively.

Suppose you want to find the shortest path between two intersections on a city map, a starting point and a destination. The order is conceptually simple: to start, mark the distance to every intersection on the map with infinity. This is done not to imply there is an infinite distance, but to note that that intersection has not yet been visited. (Some variants of this method simply leave the intersection unlabeled.)

Now, at each iteration, select a current intersection. For the first iteration the current intersection will be the starting point, and the distance to it (the intersection's label) will be zero. For subsequent iterations (after the first) the current intersection will be the closest unvisited intersection to the starting point; this will be easy to find.

From the current intersection, update the distance to every unvisited intersection that is directly connected to it. This is done by determining the sum of the distance between an unvisited intersection and the value of the current intersection, and relabeling the unvisited intersection with this value if it is less than its current value. In effect, the intersection is relabeled if the path to it through the current intersection is shorter than the previously known paths. To facilitate shortest path identification, in pencil, mark the road with an arrow pointing to the relabeled intersection if you label/relabel it, and erase all others pointing to it. After you have updated the distances to each neighboring intersection, mark the current intersection as visited and select the unvisited intersection with lowest distance (from the starting point), or lowest label, as the current intersection. Nodes marked as visited are labeled with the shortest path from the starting point to them and will not be revisited or returned to.

Continue this process of updating the neighboring intersections with the shortest distances, then marking the current intersection as visited and moving on to the closest unvisited intersection, until you have marked the destination as visited. Once you have marked the destination as visited (as is the case with any visited intersection) you have determined the shortest path to it from the starting point, and can trace your way back following the arrows in reverse.

In the accompanying animated graphic (not reproduced here), the starting and destination intersections are colored in light pink and blue and labelled a and b respectively, the visited intersections are colored in red, and the current intersection in a pale blue. Of note is the fact that this algorithm makes no attempt to direct "exploration" towards the destination as one might expect. Rather, the sole consideration in determining the next "current" intersection is its distance from the starting point. In some sense, this algorithm "expands outward" from the starting point, iteratively considering every node that is closer in terms of shortest path distance, until it reaches the destination. When understood in this way, it is clear how the algorithm necessarily finds the shortest path; however, it may also reveal one of the algorithm's weaknesses: its relative slowness in some topologies.

Pseudocode
In the following algorithm, the code u := vertex in Q with smallest dist[] searches for the vertex u in the vertex set Q that has the least dist[u] value. That vertex is removed from the set Q and returned to the user. dist_between(u, v) calculates the length of the edge between the two neighbor-nodes u and v. The variable alt on line 15 is the length of the path from the root node to the neighbor node v if it were to go through u. If this path is shorter than the current shortest path recorded for v, that current path is replaced with this alt path. The previous array is populated with a pointer to the "next-hop" node on the source graph to get the shortest route to the source.
 1  function Dijkstra(Graph, source):
 2      for each vertex v in Graph:           // Initializations
 3          dist[v] := infinity ;             // Unknown distance function from source to v
 4          previous[v] := undefined ;        // Previous node in optimal path from source
 5      end for ;
 6      dist[source] := 0 ;                   // Distance from source to source
 7      Q := the set of all nodes in Graph ;  // All nodes in the graph are unoptimized - thus are in Q
 8      while Q is not empty:                 // The main loop
 9          u := vertex in Q with smallest dist[] ;
10          if dist[u] = infinity:
11              break ;                       // all remaining vertices are inaccessible from source
12          fi ;
13          remove u from Q ;
14          for each neighbor v of u:         // where v has not yet been removed from Q
15              alt := dist[u] + dist_between(u, v) ;
16              if alt < dist[v]:             // Relax (u,v,alt)
17                  dist[v] := alt ;
18                  previous[v] := u ;
19              fi ;
20          end for ;
21      end while ;
22      return dist[] ;
23  end Dijkstra.

If we are only interested in a shortest path between vertices source and target, we can terminate the search at line 13 if u = target. Now we can read the shortest path from source to target by iteration:
1  S := empty sequence
2  u := target
3  while previous[u] is defined:
4      insert u at the beginning of S
5      u := previous[u]

Now sequence S is the list of vertices constituting one of the shortest paths from source to target, or the empty sequence if no path exists.

A more general problem would be to find all the shortest paths between source and target (there might be several different ones of the same length). Then instead of storing only a single node in each entry of previous[] we would store all nodes satisfying the relaxation condition. For example, if both r and source connect to target and both of them lie on different shortest paths through target (because the edge cost is the same in both cases), then we would add both r and source to previous[target]. When the algorithm completes, the previous[] data structure will actually describe a graph that is a subset of the original graph with some edges removed. Its key property will be that if the algorithm was run with some starting node, then every path from that node to any other node in the new graph will be the shortest path between those nodes in the original graph, and all paths of that length from the original graph will be present in the new graph. Then to actually find all these shortest paths between two given nodes we would use a path-finding algorithm on the new graph, such as depth-first search.
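A runnable sketch corresponding to the pseudocode above, using Python's heapq as a binary-heap priority queue. Since heapq has no decrease-key operation, this version pushes duplicate heap entries and skips stale ones, a common substitution; graph is assumed to be a dict mapping each vertex to a list of (neighbor, weight) pairs:

    import heapq

    def dijkstra(graph, source):
        """Return (dist, previous) for all vertices, as in the pseudocode."""
        dist = {v: float('inf') for v in graph}
        previous = {v: None for v in graph}
        dist[source] = 0
        heap = [(0, source)]                 # (tentative distance, vertex)
        visited = set()
        while heap:
            d, u = heapq.heappop(heap)
            if u in visited:
                continue                     # stale entry: u was already finalized
            visited.add(u)
            for v, w in graph[u]:
                alt = d + w                  # relax edge (u, v)
                if alt < dist[v]:
                    dist[v] = alt
                    previous[v] = u
                    heapq.heappush(heap, (alt, v))
        return dist, previous

    def shortest_path(previous, target):
        """Read the path back by following previous[] pointers."""
        S, u = [], target
        while u is not None:
            S.insert(0, u)
            u = previous[u]
        return S

    g = {'a': [('b', 7), ('c', 9)], 'b': [('d', 2)], 'c': [('d', 1)], 'd': []}
    dist, prev = dijkstra(g, 'a')
    print(dist['d'], shortest_path(prev, 'd'))   # 9 ['a', 'b', 'd']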

Running time
An upper bound of the running time of Dijkstra's algorithm on a graph with edges E and vertices V can be expressed as a function of | E | and | V | using the Big-O notation.

For any implementation of the vertex set Q the running time is O(|E| * T_dk + |V| * T_em), where T_dk and T_em are the times needed to perform the decrease-key and extract-minimum operations in set Q, respectively. The simplest implementation of Dijkstra's algorithm stores the vertices of set Q in an ordinary linked list or array, and extract-minimum from Q is simply a linear search through all vertices in Q. In this case, the running time is O(|V|^2 + |E|) = O(|V|^2).

For sparse graphs, that is, graphs with far fewer than O(|V|^2) edges, Dijkstra's algorithm can be implemented more efficiently by storing the graph in the form of adjacency lists and using a binary heap, pairing heap, or Fibonacci heap as a priority queue to implement extracting the minimum efficiently. With a binary heap, the algorithm requires O((|E| + |V|) log |V|) time (which is dominated by O(|E| log |V|), assuming the graph is connected), and the Fibonacci heap improves this to O(|E| + |V| log |V|). Note that for directed acyclic graphs, it is possible to find shortest paths from a given starting vertex in linear time, by processing the vertices in a topological order and calculating the path length for each vertex to be the minimum or maximum length obtained via any of its incoming edges.[4]

Related problems and algorithms


The functionality of Dijkstra's original algorithm can be extended with a variety of modifications. For example, sometimes it is desirable to present solutions which are less than mathematically optimal. To obtain a ranked list of less-than-optimal solutions, the optimal solution is first calculated. A single edge appearing in the optimal solution is removed from the graph, and the optimum solution to this new graph is calculated. Each edge of the original solution is suppressed in turn and a new shortest path calculated. The secondary solutions are then ranked and presented after the first optimal solution.

Dijkstra's algorithm is usually the working principle behind link-state routing protocols, OSPF and IS-IS being the most common ones.

Unlike Dijkstra's algorithm, the Bellman-Ford algorithm can be used on graphs with negative edge weights, as long as the graph contains no negative cycle reachable from the source vertex s. (The presence of such cycles means there is no shortest path, since the total weight becomes lower each time the cycle is traversed.)

The A* algorithm is a generalization of Dijkstra's algorithm that cuts down on the size of the subgraph that must be explored, if additional information is available that provides a lower bound on the "distance" to the target. This approach can be viewed from the perspective of linear programming: there is a natural linear program for computing shortest paths, and solutions to its dual linear program are feasible if and only if they form a consistent heuristic (speaking roughly, since the sign conventions differ from place to place in the literature). This feasible dual / consistent heuristic defines a nonnegative reduced cost, and A* is essentially running Dijkstra's algorithm with these reduced costs. If the dual satisfies the weaker condition of admissibility, then A* is instead more akin to the Bellman-Ford algorithm.

The process that underlies Dijkstra's algorithm is similar to the greedy process used in Prim's algorithm. Prim's purpose is to find a minimum spanning tree for a graph.

Dynamic programming perspective


From a dynamic programming point of view, Dijkstra's algorithm is a successive approximation scheme that solves the dynamic programming functional equation for the shortest path problem by the Reaching method.[5][6][7] In fact, Dijkstra's explanation of the logic behind the algorithm,[8] namely

Problem 2. Find the path of minimum total length between two given nodes P and Q. We use the fact that, if R is a node on the minimal path from P to Q, knowledge of the latter implies the knowledge of the minimal path from P to R.

is a paraphrasing of Bellman's famous Principle of Optimality in the context of the shortest path problem.

Spanning tree

[Figure: a spanning tree (blue heavy edges) of a grid graph]

In the mathematical field of graph theory, a spanning tree T of a connected, undirected graph G is a tree composed of all the vertices and some (or perhaps all) of the edges of G. Informally, a spanning tree of G is a selection of edges of G that form a tree spanning every vertex. That is, every vertex lies in the tree, but no cycles (or loops) are formed. On the other hand, every bridge of G must belong to T.

A spanning tree of a connected graph G can also be defined as a maximal set of edges of G that contains no cycle, or as a minimal set of edges that connect all vertices.

In certain fields of graph theory it is often useful to find a minimum spanning tree of a weighted graph. Other optimization problems on spanning trees have also been studied, including the maximum spanning tree, the minimum tree that spans at least k vertices, the minimum spanning tree with at most k edges per vertex (degree-constrained spanning tree), the spanning tree with the largest number of leaves (closely related to the smallest connected dominating set), the spanning tree with the fewest leaves (closely related to the Hamiltonian path problem), the minimum diameter spanning tree, and the minimum dilation spanning tree.

Fundamental cycles
Adding just one edge to a spanning tree will create a cycle; such a cycle is called a fundamental cycle. There is a distinct fundamental cycle for each edge; thus, there is a one-to-one correspondence between fundamental cycles and edges not in the spanning tree. For a connected graph with V vertices, any spanning tree will have V-1 edges, and thus, a graph of E edges will have E-V+1 fundamental cycles. For any given spanning tree these cycles form a basis for the cycle space.

Dual to the notion of a fundamental cycle is the notion of a fundamental cutset. By deleting just one edge of the spanning tree, the vertices are partitioned into two disjoint sets. The fundamental cutset is defined as the set of edges that must be removed from the graph G to accomplish the same partition. Thus, there are precisely V-1 fundamental cutsets for the graph, one for each edge of the spanning tree. The duality between fundamental cutsets and fundamental cycles is established by noting that cycle edges not in the spanning tree can only appear in the cutsets of the other edges in the cycle; and vice versa: edges in a cutset can only appear in those cycles containing the edge corresponding to the cutset.

Spanning forests
A spanning forest is a type of subgraph that generalises the concept of a spanning tree. However, there are two definitions in common use. One is that a spanning forest is a subgraph that consists of a spanning tree in each connected component of a graph. (Equivalently, it is a maximal cycle-free subgraph.) This definition is common in computer science and optimisation. It is also the definition used when discussing minimum spanning forests, the generalization to disconnected graphs of minimum spanning trees. Another definition, common in graph theory, is that a spanning forest is any subgraph that is both a forest (contains no cycles) and spanning (includes every vertex).

Counting spanning trees


The number t(G) of spanning trees of a connected graph is an important invariant. In some cases, it is easy to calculate t(G) directly. For example, if G is itself a tree, then t(G) = 1, while if G is the cycle graph Cn with n vertices, then t(G) = n. For any graph G, the number t(G) can be calculated using Kirchhoff's matrix-tree theorem.

Cayley's formula is a formula for the number of spanning trees in the complete graph Kn with n vertices. The formula states that t(Kn) = n^(n-2). Another way of stating Cayley's formula is that there are exactly n^(n-2) labelled trees with n vertices. Cayley's formula can be proved using Kirchhoff's matrix-tree theorem or via the Prüfer code.

If G is the complete bipartite graph Kp,q, then t(G) = p^(q-1) * q^(p-1); there is a similar closed formula when G is the n-dimensional hypercube graph Qn. These formulae are also consequences of the matrix-tree theorem.

If G is a multigraph and e is an edge of G, then the number t(G) of spanning trees of G satisfies the deletion-contraction recurrence t(G)=t(G-e)+t(G/e), where G-e is the multigraph obtained by deleting e and G/e is the contraction of G by e, where multiple edges arising from this contraction are not deleted.
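A minimal sketch of counting spanning trees via Kirchhoff's matrix-tree theorem: t(G) is any cofactor of the Laplacian matrix L = D - A. The graph is assumed to be an undirected adjacency-list dict, numpy supplies the determinant, and the function name count_spanning_trees is illustrative:

    import numpy as np

    def count_spanning_trees(graph):
        """t(G) = determinant of the Laplacian with one row and column removed."""
        nodes = sorted(graph)
        idx = {v: i for i, v in enumerate(nodes)}
        n = len(nodes)
        L = np.zeros((n, n))
        for v in graph:
            for w in graph[v]:
                L[idx[v], idx[w]] -= 1         # adjacency part (-A)
            L[idx[v], idx[v]] = len(graph[v])  # degree part (D)
        minor = L[1:, 1:]                      # delete one row and column
        return round(np.linalg.det(minor))

    # K4: every pair of the four vertices is adjacent; Cayley gives 4^2 = 16.
    k4 = {v: [w for w in 'abcd' if w != v] for v in 'abcd'}
    print(count_spanning_trees(k4))            # 16

The K4 result agrees with the sixteen spanning trees enumerated in the example below.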

Uniform spanning trees


A spanning tree chosen randomly from among all the spanning trees with equal probability is called a uniform spanning tree (UST). This model has been extensively researched in probability and mathematical physics.

Algorithms
The classic spanning tree algorithm, depth-first search (DFS), is due to Robert Tarjan. Another important algorithm is based on breadth-first search (BFS). Parallel algorithms typically take different approaches than BFS or DFS. Halperin and Zwick designed an optimal randomized parallel algorithm that runs in O(log n) time with high probability on EREW PRAM.[1] The Shiloach-Vishkin algorithm, due to Yossi Shiloach and Uzi Vishkin, is the basis for many parallel implementations.[2] Bader and Cong's algorithm is shown to run fast in practice on a variety of graphs.[3] The most common distributed algorithm is the Spanning Tree Protocol, used by OSI link layer devices to create a spanning tree using the existing links as the source graph in order to avoid broadcast storms.

Minimum Spanning Trees


Spanning trees
A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree. A graph may have many spanning trees; for instance the complete graph on four vertices
o---o
|\ /|
| X |
|/ \|
o---o

has sixteen spanning trees:


[ASCII diagrams of the sixteen spanning trees omitted: each consists of three of K4's six edges chosen so that no cycle is formed, in agreement with Cayley's formula t(K4) = 4^2 = 16.]

Minimum spanning trees


Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of the weights of its edges. Obviously, different trees have different lengths. The problem: how to find the minimum length spanning tree?

This problem can be solved by many different algorithms. It is the topic of some very recent research. There are several "best" algorithms, depending on the assumptions you make:

A randomized algorithm can solve it in linear expected time. [Karger, Klein, and Tarjan, "A randomized linear-time algorithm to find minimum spanning trees", J. ACM, vol. 42, 1995, pp. 321-328.]

It can be solved in linear worst case time if the weights are small integers. [Fredman and Willard, "Trans-dichotomous algorithms for minimum spanning trees and shortest paths", 31st IEEE Symp. Foundations of Comp. Sci., 1990, pp. 719-725.]

Otherwise, the best solution is very close to linear but not exactly linear. The exact bound is O(m log beta(m,n)), where the beta function has a complicated definition: beta(m,n) is the smallest i such that log(log(log(...log(n)...))) is less than m/n, where the logs are nested i times. [Gabow, Galil, Spencer, and Tarjan, "Efficient algorithms for finding minimum spanning trees in undirected and directed graphs", Combinatorica, vol. 6, 1986, pp. 109-122.]

These algorithms are all quite complicated, and probably not that great in practice unless you're looking at really huge graphs. The book tries to keep things simpler, so it only describes one algorithm but (in my opinion) doesn't do a very good job of it. I'll go through three simple classical algorithms (spending not so much time on each one).

Minimum spanning trees prove important for several reasons:

They can be computed quickly and easily, and they create a sparse subgraph that reflects a lot about the original graph.

They provide a way to identify clusters in sets of points. Deleting the long edges from a minimum spanning tree leaves connected components that define natural clusters in the data set, as shown in the output figure above.

They can be used to give approximate solutions to hard problems such as Steiner tree and traveling salesman.

As an educational tool, minimum spanning tree algorithms provide graphic evidence that greedy algorithms can give provably optimal solutions.

Why minimum spanning trees?


The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn't a tree you can always remove some edges and save money.

A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once. Note that if you have a path visiting all points exactly once, it's a special kind of tree. For instance in the example above, twelve of the sixteen spanning trees are actually paths. If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is less than the TSP weight, because it's a minimization over a strictly larger set.

On the other hand, if you draw a path tracing around the minimum spanning tree, you trace each edge twice and visit all points, so the TSP weight is less than twice the MST weight; a sketch of this factor-two tour appears below. Therefore this tour is within a factor of two of optimal. There is a more complicated way (Christofides' heuristic) of using minimum spanning trees to find a tour within a factor of 1.5 of optimal; I won't describe this here but it might be covered in ICS 163 (graph algorithms) next year.
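A minimal sketch of the factor-two tour: a preorder walk of the MST visits every vertex and shortcuts the doubled edges, which (by the triangle inequality) never increases the length. The MST is assumed to be already available as a symmetric adjacency-list dict; the names mst and tsp_tour are illustrative:

    def tsp_tour(mst, start):
        """2-approximate TSP tour: preorder traversal of a minimum spanning tree."""
        tour, visited = [], set()
        def walk(v):
            visited.add(v)
            tour.append(v)           # visit on first arrival, shortcutting repeats
            for w in mst[v]:
                if w not in visited:
                    walk(w)
        walk(start)
        tour.append(start)           # return to the starting point
        return tour

    # Example tree on five points; the tour visits each vertex once and returns.
    mst = {'a': ['b', 'c'], 'b': ['a', 'd', 'e'], 'c': ['a'],
           'd': ['b'], 'e': ['b']}
    print(tsp_tour(mst, 'a'))        # ['a', 'b', 'd', 'e', 'c', 'a']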

How to find minimum spanning tree?


The stupid method is to list all spanning trees, and find minimum of list. We already know how to find minima... But there are far too many trees for this to be efficient. It's also not really an algorithm, because you'd still need to know how to list all the trees.

A better idea is to find some key property of the MST that lets us be sure that some edge is part of it, and use this property to build up the MST one edge at a time. For simplicity, we assume that there is a unique minimum spanning tree. (Problem 4.3 of Baase is related to this assumption.) You can get ideas like this to work without this assumption, but it becomes harder to state your theorems or write your algorithms precisely.

Lemma: Let X be any subset of the vertices of G, and let edge e be the smallest edge connecting X to G-X. Then e is part of the minimum spanning tree.

Proof: Suppose you have a tree T not containing e; then I want to show that T is not the MST. Let e=(u,v), with u in X and v not in X. Then because T is a spanning tree it contains a unique path from u to v, which together with e forms a cycle in G. This path has to include another edge f connecting X to G-X. T+e-f is another spanning tree (it has the same number of edges, and remains connected since you can replace any path containing f by one going the other way around the cycle). It has smaller weight than T since e has smaller weight than f. So T was not minimum, which is what we wanted to prove.

Kruskal's algorithm
Kruskal's algorithm is an algorithm in graph theory that finds a minimum spanning tree for a connected weighted graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component). Kruskal's algorithm is an example of a greedy algorithm. This algorithm first appeared in Proceedings of the American Mathematical Society, pp. 48-50, in 1956, and was written by Joseph Kruskal. Other algorithms for this problem include Prim's algorithm, the reverse-delete algorithm, and Borůvka's algorithm.

Description
create a forest F (a set of trees), where each vertex in the graph is a separate tree
create a set S containing all the edges in the graph
while S is nonempty and F is not yet spanning
 o remove an edge with minimum weight from S
 o if that edge connects two different trees, then add it to the forest, combining two trees into a single tree
 o otherwise discard that edge.

At the termination of the algorithm, the forest has only one component and forms a minimum spanning tree of the graph.

Performance
Where E is the number of edges in the graph and V is the number of vertices, Kruskal's algorithm can be shown to run in O(E log E) time, or equivalently, O(E log V) time, all with simple data structures. These running times are equivalent because:

E is at most V^2, and log V^2 = 2 log V is O(log V).
If we ignore isolated vertices, which will each be their own component of the minimum spanning forest, V <= E+1, so log V is O(log E).

We can achieve this bound as follows: first sort the edges by weight using a comparison sort in O(E log E) time; this allows the step "remove an edge with minimum weight from S" to operate in constant time. Next, we use a disjoint-set data structure (union-find) to keep track of which vertices are in which components. We need to perform O(E) operations, two 'find' operations and possibly one union for each edge. Even a simple disjoint-set data structure such as disjoint-set forests with union by rank can perform O(E) operations in O(E log V) time. Thus the total time is O(E log E) = O(E log V). Provided that the edges are either already sorted or can be sorted in linear time (for example with counting sort or radix sort), the algorithm can use a more sophisticated disjoint-set data structure to run in O(E alpha(V)) time, where alpha is the extremely slowly growing inverse of the single-valued Ackermann function.

Example
Step-by-step description (the accompanying images are omitted):

This is our original graph. The numbers near the arcs indicate their weight. None of the arcs are highlighted.

AD and CE are the shortest arcs, with length 5, and AD has been arbitrarily chosen, so it is highlighted.

CE is now the shortest arc that does not form a cycle, with length 5, so it is highlighted as the second arc.

The next arc, DF with length 6, is highlighted using much the same method.

The next-shortest arcs are AB and BE, both with length 7. AB is chosen arbitrarily, and is highlighted. The arc BD has been highlighted in red, because there already exists a path (in green) between B and D, so it would form a cycle (ABD) if it were chosen.

The process continues to highlight the next-smallest arc, BE with length 7. Many more arcs are highlighted in red at this stage: BC because it would form the loop BCE, DE because it would form the loop DEBA, and FE because it would form FEBAD.

Finally, the process finishes with the arc EG of length 9, and the minimum spanning tree is found.

Proof of correctness
The proof consists of two parts. First, it is proved that the algorithm produces a spanning tree. Second, it is proved that the constructed spanning tree is of minimal weight.

Spanning tree
Let P be a connected, weighted graph and let Y be the subgraph of P produced by the algorithm. Y cannot have a cycle, since the last edge added to that cycle would have been within one subtree and not between two different trees. Y cannot be disconnected, since the first encountered edge that joins two components of Y would have been added by the algorithm. Thus, Y is a spanning tree of P.

Minimality
We show that the following proposition P is true by induction: if F is the set of edges chosen at any stage of the algorithm, then there is some minimum spanning tree that contains F.

Clearly P is true at the beginning, when F is empty: any minimum spanning tree will do. Now assume P is true for some non-final edge set F, and let T be a minimum spanning tree that contains F. If the next chosen edge e is also in T, then P is true for F+e. Otherwise, T+e has a cycle C, and there is another edge f that is in C but not F. Then T-f+e is a tree, and its weight is not more than the weight of T, since otherwise the algorithm would have chosen f in preference to e. So T-f+e is a minimum spanning tree containing F+e, and again P holds. Therefore, by the principle of induction, P holds when F has become a spanning tree, which is only possible if F is a minimum spanning tree itself.

Pseudocode
function Kruskal(G = <N, A>: graph; length: A -> R+): set of edges
    Define an elementary cluster C(v) <- {v} for each vertex v.
    Initialize a priority queue Q to contain all edges in G, using the weights as keys.
    Define a forest T <- {}    // T will ultimately contain the edges of the MST
    // n is the total number of vertices
    while T has fewer than n-1 edges do
        // edge (u,v) is the minimum weighted route from u to v
        (u,v) <- Q.removeMin()
        // prevent cycles in T: add (u,v) only if T does not already contain a path between u and v
        Let C(v) be the cluster containing v, and let C(u) be the cluster containing u.
        if C(v) != C(u) then
            Add edge (v,u) to T.
            Merge C(v) and C(u) into one cluster, that is, union C(v) and C(u).
    return tree T
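A runnable sketch of Kruskal's algorithm using a disjoint-set forest with union by rank and path compression, as discussed under Performance. Edges are assumed to be (weight, u, v) tuples; the weights below reproduce the arcs named in the worked example above (AD=5, CE=5, DF=6, AB=7, BE=7, EG=9), while the remaining weights are illustrative:

    def kruskal(vertices, edges):
        """Return the edges of a minimum spanning forest."""
        parent = {v: v for v in vertices}
        rank = {v: 0 for v in vertices}

        def find(v):                        # find with path compression
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        def union(a, b):                    # union by rank; False if same cluster
            ra, rb = find(a), find(b)
            if ra == rb:
                return False
            if rank[ra] < rank[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            if rank[ra] == rank[rb]:
                rank[ra] += 1
            return True

        mst = []
        for w, u, v in sorted(edges):       # O(E log E) comparison sort
            if union(u, v):                 # edge connects two different trees
                mst.append((u, v, w))
        return mst

    edges = [(7, 'A', 'B'), (5, 'A', 'D'), (8, 'B', 'C'), (9, 'B', 'D'),
             (7, 'B', 'E'), (5, 'C', 'E'), (15, 'D', 'E'), (6, 'D', 'F'),
             (8, 'E', 'F'), (9, 'E', 'G'), (11, 'F', 'G')]
    print(kruskal('ABCDEFG', edges))
    # [('A', 'D', 5), ('C', 'E', 5), ('D', 'F', 6), ('A', 'B', 7),
    #  ('B', 'E', 7), ('E', 'G', 9)]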

Prim's algorithm
Rather than build a subgraph one edge at a time, Prim's algorithm builds a tree one vertex at a time.
Prim's algorithm:
    let T be a single vertex x
    while (T has fewer than n vertices) {
        find the smallest edge connecting T to G-T
        add it to T
    }

Since each edge added is the smallest connecting T to G-T, the lemma we proved shows that we only add edges that should be part of the MST. Again, it looks like the loop has a slow step in it. But again, some data structures can be used to speed this up. The idea is to use a heap to remember, for each vertex, the smallest edge connecting T with that vertex.
Prim with heaps:
    make a heap of values (vertex,edge,weight(edge)),
        initially (v,-,infinity) for each vertex
    let tree T be empty
    while (T has fewer than n vertices) {
        let (v,e,weight(e)) have the smallest weight in the heap
        remove (v,e,weight(e)) from the heap
        add v and e to T
        for each edge f=(u,v)
            if u is not already in T
                find value (u,g,weight(g)) in heap
                if weight(f) < weight(g)
                    replace (u,g,weight(g)) with (u,f,weight(f))
    }

Analysis: We perform n steps in which we remove the smallest element in the heap, and at most 2m steps in which we examine an edge f=(u,v). For each of those steps, we might replace a value on the heap, reducing its weight. (You also have to find the right value on the heap, but that can be done easily enough by keeping a pointer from the vertices to the corresponding values.) I haven't described how to reduce the weight of an element of a binary heap, but it's easy to do in O(log n) time. Alternately, by using a more complicated data structure known as a Fibonacci heap, you can reduce the weight of an element in constant time. The result is a total time bound of O(m + n log n).
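A runnable sketch of Prim's algorithm. Since Python's heapq has no operation for reducing the weight of an existing entry, this variant pushes duplicate entries and skips stale ones when popped, which preserves the O(m log n) binary-heap bound; graph is assumed to be a dict mapping each vertex to a list of (neighbor, weight) pairs:

    import heapq

    def prim(graph, x):
        """Grow a minimum spanning tree one vertex at a time, starting at x."""
        in_tree = {x}
        tree_edges = []
        heap = [(w, x, v) for v, w in graph[x]]    # candidate edges leaving T
        heapq.heapify(heap)
        while heap and len(in_tree) < len(graph):
            w, u, v = heapq.heappop(heap)          # smallest edge connecting T to G-T
            if v in in_tree:
                continue                           # stale entry: v joined T earlier
            in_tree.add(v)
            tree_edges.append((u, v, w))
            for z, wz in graph[v]:
                if z not in in_tree:
                    heapq.heappush(heap, (wz, v, z))
        return tree_edges

    g = {'a': [('b', 2), ('c', 3)], 'b': [('a', 2), ('c', 1)],
         'c': [('a', 3), ('b', 1)]}
    print(prim(g, 'a'))                            # [('a', 'b', 2), ('b', 'c', 1)]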

Boruvka's algorithm
(Properly, Borůvka is spelled with a small raised circle accent over the "u".) Although this seems a little complicated to explain, it's probably the easiest one for computer implementation, since it doesn't require any complicated data structures. The idea is to do steps like Prim's algorithm, in parallel all over the graph at the same time.
Boruvka's algorithm:
    make a list L of n trees, each a single vertex
    while (L has more than one tree)
        for each T in L, find the smallest edge connecting T to G-T
        add all those edges to the MST (causing pairs of trees in L to merge)

As we saw in Prim's algorithm, each edge you add must be part of the MST, so it must be ok to add them all at once. Analysis: This is similar to merge sort. Each pass reduces the number of trees by a factor of two, so there are O(log n) passes. Each pass takes time O(m) (first figure out which tree each vertex is in, then for each edge test whether it connects two trees and is better than the ones seen before for the trees on either endpoint) so the total is O(m log n).
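A minimal sketch of Boruvka's pass structure, reusing a simple union-find to record which tree each vertex belongs to. Edges are assumed to be (weight, u, v) tuples with a consistent tie-breaking order (tuple comparison supplies one here, which the algorithm needs to avoid cycles when weights are equal); all names are illustrative:

    def boruvka(vertices, edges):
        """Minimum spanning tree by repeated cheapest-outgoing-edge passes."""
        parent = {v: v for v in vertices}

        def find(v):                              # find with path compression
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        mst, trees = [], len(vertices)
        while trees > 1:
            # for each tree, find the smallest edge connecting it to G-T
            cheapest = {}
            for w, u, v in edges:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue                      # edge is internal to one tree
                for r in (ru, rv):
                    if r not in cheapest or (w, u, v) < cheapest[r]:
                        cheapest[r] = (w, u, v)
            if not cheapest:
                break                             # graph is disconnected
            # add all those edges at once, merging pairs of trees
            for w, u, v in set(cheapest.values()):
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    mst.append((u, v, w))
                    trees -= 1
        return mst

Each pass scans every edge once, O(m), and at least halves the number of trees, giving the O(m log n) total derived above.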

A hybrid algorithm
This isn't really a separate algorithm, but you can combine two of the classical algorithms and do better than either one alone. The idea is to do O(log log n) passes of Boruvka's algorithm, then switch to Prim's algorithm. Prim's algorithm then builds one large tree by connecting it with the small trees in the list L built by Boruvka's algorithm, keeping a heap which stores, for each tree in L, the best edge that can be used to connect it to the large tree. Alternately, you can think of collapsing the trees found by Boruvka's algorithm into "supervertices" and running Prim's algorithm on the resulting smaller graph. The point is that this reduces the number of remove-min operations in the heap used by Prim's algorithm to equal the number of trees left in L after Boruvka's algorithm, which is O(n / log n).

Analysis: O(m log log n) for the first part, O(m + (n/log n) log n) = O(m + n) for the second, so O(m log log n) total.
