In computer science, a binary search tree (BST) is a node-based binary tree data structure which has the following properties:
The left subtree of a node contains only nodes with keys less than the node's key. The right subtree of a node contains only nodes with keys greater than the node's key. Both the left and right subtrees must also be binary search trees. Each node (item in the tree) has a distinct key.
Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records. The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms, such as in-order traversal, can be very efficient. Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.
A binary search tree of size 9 and depth 3, with root 8 and leaves 1, 4, 7 and 13
A binary search tree is a binary tree in which each internal node x stores an element such that the elements stored in the left subtree of x are less than or equal to x and the elements stored in the right subtree of x are greater than or equal to x. This is called the binary-search-tree property.
The basic operations on a binary search tree take time proportional to the height of the tree. For a complete binary tree with n nodes, such operations run in Θ(lg n) worst-case time. If the tree is a linear chain of n nodes, however, the same operations take Θ(n) worst-case time.
it is a binary tree; each node contains a value; a total order is defined on these values (every two values can be compared with each other); the left subtree of a node contains only values less than the node's value; the right subtree of a node contains only values greater than the node's value.
A binary search tree can be utilized to construct a set data structure, which allows one to store an unordered collection of unique values and perform operations on such collections. The performance of a binary search tree depends on its height. In order to keep a tree balanced and minimize its height, the idea of binary search trees was advanced in balanced search trees (AVL trees, red-black trees, splay trees). Here we will discuss the basic ideas lying at the foundation of binary search trees.
Each node stores: a value (the user's data); a link to the left child (auxiliary data); a link to the right child (auxiliary data).
Depending on the size of the user data, the memory overhead may vary, but in general it is quite reasonable. In some implementations a node may also store a link to its parent, but that depends on the algorithms the programmer wants to apply to the BST. For the basic operations, like addition, removal, and search, a link to the parent is not necessary; it is needed in order to implement iterators. With the internal representation in view, the sample from the overview changes:
Leaf nodes have link fields for children, but no actual children. In a programming language this means that the corresponding links are set to NULL.
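As a sketch of this node layout (Python; the class name BSTNode follows the code-snippet discussion later in the text, the rest is illustrative, and the optional parent link is omitted as noted above):

```python
class BSTNode:
    """A single node of a binary search tree (illustrative sketch)."""

    def __init__(self, value):
        self.value = value   # the user's data
        self.left = None     # link to the left child; None plays the role of NULL
        self.right = None    # link to the right child
```

A freshly created node has both links set to None, so it is a leaf.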
At this stage the algorithm should follow the binary search tree property. If the new value is less than the current node's value, go to the left subtree; otherwise go to the right subtree. Following this simple rule, the algorithm reaches a node which has no left or right subtree. By the moment a place for insertion is found, we can say for sure that the new value has no duplicate in the tree. Initially a new node has no children, so it is a leaf. Let us see it in the picture. Gray circles indicate possible places for a new node.
Now, let's go over the algorithm itself. Here, as in almost every operation on a BST, recursion is utilized. Starting from the root,
1. Check whether the value in the current node and the new value are equal. If so, a duplicate is found. Otherwise:
2. If the new value is less than the node's value:
   o if the current node has no left child, the place for insertion has been found;
   o otherwise, handle the left child with the same algorithm.
3. If the new value is greater than the node's value:
   o if the current node has no right child, the place for insertion has been found;
   o otherwise, handle the right child with the same algorithm.
Just before the code snippets, let us have a look at an example demonstrating insertion into a binary search tree.
Example
Code snippets
The only difference between the algorithm above and the real routine is that we should first check whether a root exists. If not, just create it and don't run the common algorithm for this special case. This can be done in the BinarySearchTree class. The principal algorithm is implemented in the BSTNode class.
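The text's original code snippets do not survive in this copy; as a hedged sketch, the insertion routine might look like this in Python (the class names BinarySearchTree and BSTNode follow the text, everything else is illustrative):

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert(self, value):
        """Recursively find a place for the new value; duplicates are ignored."""
        if value == self.value:
            return                        # duplicate found, nothing to do
        if value < self.value:
            if self.left is None:
                self.left = BSTNode(value)   # place for insertion found
            else:
                self.left.insert(value)      # handle the left child the same way
        else:
            if self.right is None:
                self.right = BSTNode(value)
            else:
                self.right.insert(value)


class BinarySearchTree:
    def __init__(self):
        self.root = None

    def add(self, value):
        # Special case: an empty tree -- just create the root.
        if self.root is None:
            self.root = BSTNode(value)
        else:
            self.root.insert(value)
```

For example, adding 8, 3, 10 in that order leaves 8 at the root with 3 and 10 as its left and right children.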
Now, let's see a more detailed description of the search algorithm. Like the add operation, and almost every operation on a BST, the search algorithm utilizes recursion. Starting from the root,
1. Check whether the value in the current node and the searched value are equal. If so, the value is found. Otherwise:
2. If the searched value is less than the node's value:
   o if the current node has no left child, the searched value doesn't exist in the BST;
   o otherwise, handle the left child with the same algorithm.
3. If the searched value is greater than the node's value:
   o if the current node has no right child, the searched value doesn't exist in the BST;
   o otherwise, handle the right child with the same algorithm.
Just before the code snippets, let us have a look at an example demonstrating a search for a value in the binary search tree.
Example
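The search steps above can be sketched as follows (Python, illustrative; the function name contains is an assumption):

```python
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def contains(node, value):
    """Recursive lookup that follows the binary search tree property."""
    if node is None:
        return False                    # reached a missing child: not in the BST
    if value == node.value:
        return True                     # value found
    if value < node.value:
        return contains(node.left, value)   # go to the left subtree
    return contains(node.right, value)      # go to the right subtree
```

Each recursive step discards one subtree, so the running time is proportional to the height of the tree.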
search for the node to remove; if the node is found, run the remove algorithm.
Now, let's see a more detailed description of the remove algorithm. The first stage is identical to the lookup algorithm, except that we should track the parent of the current node. The second part is trickier. There are three cases, which are described below.
1. Node to be removed has no children.
This case is quite simple. The algorithm sets the corresponding link of the parent to NULL and disposes of the node. Example: remove -4 from the BST.
2. Node to be removed has one child. In this case, the node is cut from the tree and the algorithm links its single child (with its subtree) directly to the parent of the removed node. Example: remove 18 from the BST.
3. Node to be removed has two children. This is the most complex case. To solve it, let us first look at one useful BST property. We are going to use the idea that the same set of values may be represented as different binary search trees. For example, these BSTs:
contain the same values {5, 19, 21, 25}. To transform the first tree into the second one, we can do the following:
o choose the minimum element from the right subtree (19 in the example);
o replace 5 by 19;
o hang 5 as a left child.
The same approach can be utilized to remove a node which has two children:
o find the minimum value in the right subtree;
o replace the value of the node to be removed with the found minimum (now the right subtree contains a duplicate!);
o apply remove to the right subtree to remove the duplicate.
Notice that the node with the minimum value has no left child and, therefore, its removal may result only in the first or second case. Example: remove 12 from the BST.
Find the minimum element in the right subtree of the node to be removed. In the current example it is 19.
Replace 12 with 19. Notice that only the values are replaced, not the nodes. Now we have two nodes with the same value.
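Putting the three cases together, a removal routine can be sketched as follows (Python, illustrative; this variant returns the new subtree root rather than tracking the parent explicitly, which is a common simplification of the parent-tracking approach described above):

```python
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def remove(node, value):
    """Remove value from the subtree rooted at node; return the new subtree root."""
    if node is None:
        return None                        # value not found in this subtree
    if value < node.value:
        node.left = remove(node.left, value)
    elif value > node.value:
        node.right = remove(node.right, value)
    else:
        # Cases 1 and 2: no children or one child -- splice the node out.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Case 3: two children -- find the minimum of the right subtree,
        # copy its value here, then remove the duplicate from the right subtree.
        successor = node.right
        while successor.left is not None:
            successor = successor.left
        node.value = successor.value
        node.right = remove(node.right, successor.value)
    return node
```

Removing 12 from the tree {12, 5, 19, 15, 21}, for instance, promotes 15 (the minimum of the right subtree) into the removed node's place, just as in the worked example.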
AVL Trees
An AVL tree is a self-balancing binary search tree, and it was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; therefore, it is also said to be height-balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations. The AVL tree is named after its two inventors, G. M. Adelson-Velskii and E. M. Landis, who published it in their 1962 paper "An algorithm for the organization of information."

The balance factor of a node is the height of its right subtree minus the height of its left subtree, and a node with balance factor 1, 0, or -1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.

AVL trees are often compared with red-black trees because they support the same set of operations and because red-black trees also take O(log n) time for the basic operations. AVL trees perform better than red-black trees for lookup-intensive applications. The AVL tree balancing algorithm appears in many computer science curricula.
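As an illustration of the balance-factor definition above, a minimal sketch (Python; the names are hypothetical) that computes subtree heights and balance factors recursively:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    """Height of a subtree; by convention the empty tree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Height of the right subtree minus height of the left subtree."""
    return height(node.right) - height(node.left)
```

A real AVL implementation would store the height (or the balance factor) in each node and update it during rotations instead of recomputing it, since this recursive version takes linear time per call.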
Unlike a binary tree, each node of a b-tree may have a variable number of keys and children. The keys are stored in non-decreasing order. Each key has an associated child that is the root of a subtree containing all nodes with keys less than or equal to that key but greater than the preceding key. A node also has an additional rightmost child that is the root of a subtree containing all keys greater than any key in the node.

A b-tree has a minimum number of allowable children for each node, known as the minimization factor. If t is this minimization factor, every node must have at least t - 1 keys. Under certain circumstances, the root node is allowed to violate this property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or, equivalently, 2t children.

Since each node tends to have a large branching factor (a large number of children), it is typically necessary to traverse relatively few nodes before locating the desired key. If access to each node requires a disk access, then a b-tree will minimize the number of disk accesses required. The minimization factor is usually chosen so that the total size of each node corresponds to a multiple of the block size of the underlying storage device. This choice simplifies and optimizes disk access. Consequently, a b-tree is an ideal data structure for situations where all data cannot reside in primary storage and accesses to secondary storage are comparatively expensive (or time-consuming).
Height of B-Trees
For n greater than or equal to one, the height of an n-key b-tree T of height h with a minimum degree t greater than or equal to 2 satisfies:

    h <= log_t((n + 1) / 2)
For a proof of the above inequality, refer to Cormen, Leiserson, and Rivest pages 383-384.
The worst-case height is O(log n). Since the "branchiness" of a b-tree can be large compared to many other balanced tree structures, the base of the logarithm tends to be large; therefore, the number of nodes visited during a search tends to be smaller than required by other tree structures. Although this does not affect the asymptotic worst-case height, b-trees tend to have smaller heights than other trees with the same asymptotic worst-case height.
Operations on B-Trees
The algorithms for the search, create, and insert operations are shown below. Note that these algorithms are single-pass; in other words, they do not traverse back up the tree. Since b-trees strive to minimize disk accesses and the nodes are usually stored on disk, this single-pass approach will reduce the number of node visits and thus the number of disk accesses. Simpler double-pass approaches that move back up the tree to fix violations are possible. Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage (memory), all references to a given node must be preceded by a read operation, denoted by Disk-Read. Similarly, once a node is modified and is no longer needed, it must be written out to secondary storage with a write operation denoted by Disk-Write. The algorithms below assume that all nodes referenced in parameters have already had a corresponding Disk-Read operation. New nodes are created and assigned storage with the Allocate-Node call. The implementation details of the Disk-Read, Disk-Write, and Allocate-Node functions are operating-system and implementation dependent.
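The algorithm listings referenced above did not survive in this copy. As a hedged in-memory sketch of the standard B-Tree-Search routine (Python; the Disk-Read call is reduced to a comment since these nodes live in memory, and all names are illustrative):

```python
class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []           # keys stored in non-decreasing order
        self.children = children or []   # empty list for a leaf node
        self.leaf = not self.children


def btree_search(node, key):
    """Return (node, index) locating key, or None if key is absent."""
    i = 0
    # Find the smallest index whose key is >= the searched key.
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and key == node.keys[i]:
        return (node, i)                 # key found in this node
    if node.leaf:
        return None                      # nowhere left to descend
    # Disk-Read(node.children[i]) would happen here in the on-disk version.
    return btree_search(node.children[i], key)
```

Only one node per level is visited, so the number of (potential) disk reads is bounded by the height of the tree.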
Examples
Sample B-Tree
A binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:
The shape property: the tree is a complete binary tree; that is, all levels of the tree, except possibly the last one (deepest), are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right. The heap property: each node is greater than or equal to each of its children according to some comparison predicate which is fixed for the entire data structure.
"Greater than or equal to" means according to whatever comparison function is chosen to sort the heap, not necessarily "greater than or equal to" in the mathematical sense (since the quantities are not always numerical). Heaps where the comparison function is the mathematical "greater than or equal to" are called max-heaps; those where the comparison function is the mathematical "less than" are called min-heaps. Conventionally, min-heaps are used, since they are readily applicable for use in priority queues. Note that the ordering of siblings in a heap is not specified by the heap property, so the two children of a parent can be freely interchanged, as long as this does not violate the shape and heap properties (compare with treap). The binary heap is a special case of the d-ary heap in which d = 2. It is possible to modify the heap structure to allow extraction of both the smallest and largest element in O(log n) time.[1] To do this, the rows alternate between min heap and max heap. The algorithms are roughly the same, but, in each step, one must consider the alternating rows with alternating comparisons. The performance is roughly the same as a normal single-direction heap. This idea can be generalised to a min-max-median heap.
If we have a heap, and we add an element, we can perform an operation known as up-heap, bubble-up, percolate-up, sift-up, or heapify-up in order to restore the heap property. We can do this in O(log n) time, using a binary heap, by following this algorithm:
1. Add the element on the bottom level of the heap.
2. Compare the added element with its parent; if they are in the correct order, stop.
3. If not, swap the element with its parent and return to the previous step.
We do this at most once for each level in the tree, i.e., the height of the tree, which is O(log n). However, since approximately 50% of the elements are leaves and 75% are in the bottom two levels, it is likely that the new element to be inserted will only move a few levels upwards to maintain the heap. Thus, binary heaps support insertion in average constant time, O(1). Say we have the max-heap
and we want to add the number 15 to the heap. We first place the 15 in the position marked by the X. However, the heap property is violated since 15 is greater than 8, so we need to swap the 15 and the 8. So, we have the heap looking as follows after the first swap:
However, the heap property is still violated since 15 is greater than 11, so we need to swap again:
which is a valid max-heap. There is no need to check the children after this. Before we placed 15 on X, the heap was valid, meaning 11 is greater than 5. If 15 is greater than 11, and 11 is greater than 5, then 15 must be greater than 5.
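The insertion just traced can be sketched for an array-backed max-heap (Python, 0-indexed, unlike the 1-indexed pseudocode later in the text; the function name heap_push is an assumption):

```python
def heap_push(heap, value):
    """Insert value into an array-backed max-heap by bubbling up (0-indexed)."""
    heap.append(value)                 # step 1: add on the bottom level
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:
            break                      # step 2: correct order, stop
        # step 3: swap with the parent and repeat from its position
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
```

Starting from the heap [11, 5, 8, 3, 4] and pushing 15 reproduces the example: 15 swaps with 8, then with 11, and ends up at the root.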
Deleting the root from the heap
The procedure for deleting the root from the heap (effectively extracting the maximum element in a max-heap or the minimum element in a min-heap) starts by replacing it with the last element on the last level. So, if we have the same max-heap as before, we remove the 11 and replace it with the 4.
Now the heap property is violated, since 8 is greater than 4. The operation that restores the property is called down-heap, bubble-down, percolate-down, sift-down, or heapify-down. In this case, swapping the two elements 4 and 8 is enough to restore the heap property, and we need not swap elements further:
The downward-moving node is swapped with the larger of its children in a max-heap (in a min-heap it would be swapped with its smaller child), until it satisfies the heap property in its new position. This functionality is achieved by the Max-Heapify function as defined below in pseudocode for an array-backed heap A. Note that A is indexed starting at 1, not 0 as is common in many programming languages.

Max-Heapify[2](A, i):
    left := 2i
    right := 2i + 1
    largest := i
    if left <= heap-length[A] and A[left] > A[i] then:
        largest := left
    if right <= heap-length[A] and A[right] > A[largest] then:
        largest := right
    if largest != i then:
        swap A[i] and A[largest]
        Max-Heapify(A, largest)

Note that the down-heap operation (without the preceding swap) can be used in general to modify the value of the root, even when an element is not being deleted.
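For comparison with the 1-indexed pseudocode above, here is a runnable 0-indexed sketch of the down-heap operation and root deletion (Python; the names max_heapify and pop_max are illustrative):

```python
def max_heapify(heap, i):
    """Sift heap[i] down until the max-heap property holds (0-indexed array)."""
    n = len(heap)
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and heap[left] > heap[largest]:
        largest = left
    if right < n and heap[right] > heap[largest]:
        largest = right
    if largest != i:
        # Swap with the larger child and continue sifting down from there.
        heap[i], heap[largest] = heap[largest], heap[i]
        max_heapify(heap, largest)


def pop_max(heap):
    """Delete the root: replace it with the last element, then sift down."""
    top = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        max_heapify(heap, 0)
    return top
```

Deleting the root of [11, 5, 8, 3, 4] reproduces the example above: 4 replaces 11 at the root and is then swapped with 8.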