
Trees

A binary search tree is a simple data structure for which the running time of most operations is O(log N) on average.

*Trees are used to implement the file system of several popular operating systems.

*Trees can be used to support searching operations in O(log N) average time.

*Trees can be used to implement symbol tables.

Tree (Recursive definition): A tree is a collection of nodes. The collection can be empty; otherwise, a tree consists of a distinguished node r, called the root, and zero or more nonempty (sub)trees T1, T2, ..., Tk, each of whose roots is connected by a directed edge from r.

















(Figure: a sample tree with root A and nodes B through Q.)


*In a tree with N nodes, the number of edges is N-1.

*Root is a node without a parent.

*Leaves: Nodes without children are known as leaves.

*Siblings: Nodes with the same parent.

*Path: A path from node n1 to nk is defined as a sequence of nodes n1, n2, ..., nk such that ni is the parent of ni+1 for 1 <= i < k.

*Path Length: It is the number of edges on the path. If there are k nodes in the path, the path length is k-1.

*Depth:- Depth of a node is the path length from the root to the given node.

*Depth of the tree is the maximum path length that is existing in the tree.

*Depth of the root is zero.

*Height :- Height of a node is the maximum path length from a leaf to the given node.

Height of a tree is the maximum path length that is existing in the tree.

*Height of the leaf is Zero.

*The Height of the tree is equal to the depth of the tree.

*Ancestors : All nodes in the path from the root to the parent of a given node.

*Descendants:- All the children, grandchildren, etc. are called descendants of the node.


Generic tree:

It is a tree in which any node can have any number of children.

m-ary tree: It is a tree in which any node can have maximum m children only.

2-ary tree or Binary Tree: It is a tree in which no node can have more than two
children.

Implementation of Trees: Generic tree to Binary tree.

Since the number of children per node can vary so greatly and is not known in advance, it might be infeasible to make the children direct links in the data structure, because there would be too much wasted space. The solution is to keep the children of each node in a linked list of tree nodes.
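A minimal first-child/next-sibling node in C might look like the following (a sketch; the field names, the helper `gnew`, and the child-counting routine are assumptions for illustration, not from the text):

```c
#include <stdlib.h>

/* first-child/next-sibling representation of a generic tree */
struct gnode {
    int data;
    struct gnode *firstchild;   /* leftmost child */
    struct gnode *nextsibling;  /* next child of the same parent */
};

/* hypothetical helper: allocate an isolated node */
struct gnode *gnew(int d)
{
    struct gnode *n = malloc(sizeof *n);
    n->data = d;
    n->firstchild = n->nextsibling = NULL;
    return n;
}

/* count the children of a node by walking its sibling list */
int nchildren(struct gnode *p)
{
    int k = 0;
    struct gnode *c;
    for (c = p->firstchild; c != NULL; c = c->nextsibling)
        k++;
    return k;
}
```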










Ex: node structure:




data First Child Next Sibling


(Figure: the sample tree represented with first-child/next-sibling links; A's first child is B, whose sibling chain is C, D, E, F.)















Directory Structure:



Binary Tree:




(Figures: a binary tree drawn as a root A with subtrees T1 and T2, and a worst-case degenerate tree in which B, C, D form a chain.)







*The depth of a binary tree may vary from N-1 (worst case) to log2 N (best case).


* The average depth of a binary tree is O(sqrt(N)).


Perfectly Balanced Binary Tree:-

If the height of the left subtree is equal to the height of the right subtree at every node, the tree is called a perfectly balanced binary tree.

(Figure: a perfectly balanced tree with root A, children B and C, and leaves D, E, F, G.)

*The maximum number of nodes in a binary tree of height H is 2^(H+1) - 1. If D is the depth of the tree then N = 2^(D+1) - 1.

*Full node : A full node is a node with two children.

*The number of full nodes plus one is equal to the number of leaves in a nonempty binary tree.
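This property can be checked with two small counting routines (a sketch; the node type and the helper `mk` are assumptions for illustration):

```c
#include <stdlib.h>

struct tnode { int data; struct tnode *left, *right; };

/* hypothetical helper: allocate a node with the given children */
struct tnode *mk(int d, struct tnode *l, struct tnode *r)
{
    struct tnode *n = malloc(sizeof *n);
    n->data = d; n->left = l; n->right = r;
    return n;
}

/* number of nodes with no children */
int leaves(struct tnode *t)
{
    if (t == NULL) return 0;
    if (t->left == NULL && t->right == NULL) return 1;
    return leaves(t->left) + leaves(t->right);
}

/* number of nodes with exactly two children */
int fullnodes(struct tnode *t)
{
    if (t == NULL) return 0;
    return (t->left && t->right ? 1 : 0)
         + fullnodes(t->left) + fullnodes(t->right);
}
```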

*If the height of the left subtree is greater than the height of the right subtree the tree is
called as left heavy.

*If the height of the right is greater than the height of the left subtree it is called as right
heavy tree.

*Any node which has outdegree 0 is called a terminal node or a leaf; all other nodes are called branch nodes/internal nodes.

*Level: The level of any node is the length of its path from the root.

*Ordered tree: If in a (directed) tree an ordering of the nodes at each level is prescribed, then such a tree is called an ordered tree.

*Degree of a node: The no. of subtrees of a node is called the degree of the node.

*A set of disjoint trees is a forest.

*If the outdegree of every node is exactly equal to m or 0 and the number of nodes at level i is m^(i-1) (assuming the root is at level 1), then the tree is called a full or complete m-ary tree.

*The number of ordered trees with n nodes is (1/(n+1)) * C(2n, n).
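This count is the nth Catalan number; the formula can be evaluated with its product form (a sketch; `catalan` is an illustrative name, not from the text):

```c
/* number of ordered trees with n nodes: C(2n, n) / (n + 1),
   computed via the recurrence C(i+1) = C(i) * 2(2i+1) / (i+2) */
long catalan(int n)
{
    long c = 1;
    int i;
    for (i = 0; i < n; i++)
        c = c * 2 * (2 * i + 1) / (i + 2);  /* division is always exact here */
    return c;
}
```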



Storage Representation of Binary Trees.


1.Sequential/Array representation.
2.Linked storage representation.

3.Threaded storage representation.

Sequential Representation:-

If the root/parent is stored in the ith location, its left child must be stored in the (2i)th location and its right child in the (2i+1)th location.

C language (0-based): left child in the (2i+1)th location and right child in the (2i+2)th location.
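In the 0-based C convention this index arithmetic is just (a sketch; function names are illustrative):

```c
/* 0-based array representation of a binary tree */
int leftchild(int i)  { return 2 * i + 1; }
int rightchild(int i) { return 2 * i + 2; }
int parent(int i)     { return (i - 1) / 2; }
```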

Eg:

(Figure: a tree with root A and nodes B through K stored in an array of 22 cells indexed 0 to 21; many of the cells remain unused.)



Disadvantage :- Memory wastage

















*Linked storage Representation: In each node, two link fields (pointers) are required.


left data right





(Figure: a linked representation of a tree with nodes A through K.)



*In a binary tree with n nodes there exist (n+1) NULL links.


Binary Search Tree: It is a binary tree in which, for every node x, the values of all the keys in its left subtree are smaller than the key value in x, and the values of all the keys in its right subtree are larger than the key value in x.






eg:

(Figure: a binary search tree with root 15; keys smaller than 15, such as 10, 5 and 13, lie in the left subtree, and larger keys, such as 34, 25 and 41, lie in the right subtree.)



Implementation of BST:-

node layout: left data right

Find, insert, delete operations.

typedef struct treenode* treptr;
struct treenode
{
treptr left;
int data;
treptr right;
};


Searching for a given element (Find)

treptr find(treptr T, int x)
{
if(T == NULL)
return(NULL);
if(x < T->data)
return(find(T->left, x));
else if(x > T->data)
return(find(T->right, x));
else
return(T);
}


*Find min and Find max

treptr findmin(treptr T)
{
if(T == NULL)
return(NULL);
else if(T->left == NULL)
return(T);
else
return(findmin(T->left));
}

*Nonrecursive Implementation of Find Max.

treptr findmax(treptr T)
{
if(T != NULL)
while(T->right!=NULL)
T = T->right;
return(T);
}


*Inserting an element into Binary search tree if the element is not existing.

treptr insert(treptr T, int x)
{
if(T == NULL)
{
T = (treptr)malloc(sizeof(struct treenode));
if(T == NULL)
printf("out of space");
else
{
T->data = x;
T->left = T->right = NULL;
}
}
else if(x < T->data)
T->left = insert(T->left, x);
else if(x > T->data)
T->right = insert(T->right, x);
return(T);
}


*Duplicates can be handled by keeping an extra field in the node indicating the frequency of occurrence.

*If the key is only part of a larger structure, then we can keep all of the structures that have the same key in an auxiliary data structure, such as a list or another search tree.

Deleting an element .

The general strategy for a node with two children is to replace its data with the smallest data of the right subtree and recursively delete that node.

If the node to be deleted is having one child delete the node and return the child to its
parent.

If the node is not having any children then free the node and return a null to its parent.

*Lazy deletion: when an element is to be deleted, it is left in the tree and merely marked
as being deleted(use extra field in the node).

It is used when duplicate keys are present.

If the no of deletions is expected to be small

If the keys which are deleted are to be inserted in future.

*A small time penalty is associated with lazy deletion.

The time complexity of the insert/delete operations is O(log2 n) on average for a balanced tree.



Deletion routine for BST:-

treptr delete(treptr T, int x)
{
treptr temp;
if(T == NULL)
printf("Element is not existing");
else if(x < T->data)
T->left = delete(T->left, x);
else if(x > T->data)
T->right = delete(T->right, x);
else if(T->left && T->right)
{
temp = findmin(T->right);
T->data = temp->data;
T->right = delete(T->right, T->data);
}
else
{
temp = T;
if(T->left == NULL)
T = T->right;
else if(T->right == NULL)
T = T->left;
free(temp);
}
return(T);
}

























Expression Trees:

The leaves of an expression tree are operands , and the other nodes contain operators.

eg : (a + b * c) +( (d * e + f ) * g)



(Figure: the expression tree — the root + has a left subtree for a + b * c and a right subtree for (d * e + f) * g.)

Prefix exp: If preorder traversal is applied on an expression tree we will get prefix
expression .

Postfix exp: If postorder traversal is applied on expression tree we will get postfix
expression.

Constructing an Expression Tree:

From a given postfix expression we can construct an expression tree.

1.Repeat thru step 5 (the following ) until the end of the postfix expression.

2.Read a symbol from the expression.

3.Create a node,store the symbol in the datafield.

4.If the symbol is an operand push the node address into the stack and go to step 2.

5.If the symbol is an operator pop two nodes from the stack and attach as right & left
children to the node respectively and push it into the stack and go to step 2.

6.pop the root address from the stack.
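The steps above can be sketched in C (single-character operands, a bounded stack, and the node/function names are assumptions for illustration):

```c
#include <ctype.h>
#include <stdlib.h>

struct enode { char sym; struct enode *left, *right; };

/* Build an expression tree from a postfix string, following steps 1-6. */
struct enode *buildexpr(const char *postfix)
{
    struct enode *stack[50], *n;
    int tos = -1;
    const char *p;
    for (p = postfix; *p; p++) {
        n = malloc(sizeof *n);              /* step 3: create a node */
        n->sym = *p;
        n->left = n->right = NULL;
        if (!isalpha((unsigned char)*p)) {  /* step 5: operator */
            n->right = stack[tos--];        /* pop right child first */
            n->left  = stack[tos--];        /* then left child */
        }
        stack[++tos] = n;                   /* push the node address */
    }
    return stack[tos];                      /* step 6: root address */
}
```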
Tree Traversals


To list the names of all the files in the directory.

To swap the left & right children of all the nodes.

To prepare a copy of the existing tree.

Tree Traversal is a procedure by which each node in the tree is processed exactly once
in a systematic manner.The meaning of processed depends on the nature of the
application.

Main traversals:
Within the children if we assume that the left is to be visited first, depending upon the
position of the parent we get preorder,inorder, and postorder traversals.

If we assume that the right is to be visited first then we get
reverse(converse)preorder,reverse inorder & reverse postorder traversals.

The nodes in the tree can be visited level by level and is called as Levelorder Traversal.

Level order left to right traversal.
Level order right to left traversal.

Preorder Traversal of a binary tree is defined as follows:

1.Process the root/parent node.
2. Traverse the left subtree in preorder.
3. Traverse the right subtree in preorder.

Inorder traversal of a binary tree is given by the following steps.

1.Traverse the left subtree in inorder.
2.Process the root/parent node.
3.Traverse the right subtree in inorder.


*we define the postorder traversal of a binary tree as follows.

1.Traverse the left subtree in postorder.
2.Traverse the right subtree in postorder.
3. Process the root/parent node.

Preorder o/p:- A B C D E F G

Inorder o/p:- C B A E F D G

Postorder o/p:- C B F E G D A

(Figure: the tree used above — root A; A's left child is B with left child C; A's right child is D with left child E and right child G; E's right child is F.)
Implementation of traversals(Recursive Algorithms)

void rpreorder(treptr T)
{
if(T != NULL)
{
printf("%d ", T->data);
rpreorder(T->left);
rpreorder(T->right);
}
}

void rinorder(treptr T)
{
if(T != NULL)
{
rinorder(T->left);
printf("%d ", T->data);
rinorder(T->right);
}
}


void rpostorder(treptr T)
{
if(T != NULL)
{
rpostorder(T->left);
rpostorder(T->right);
printf("%d ", T->data);
}
}

Nonrecursive preorder routine:

General Algorithm:

1.If the tree is empty then write tree empty and return else place the pointer to the root
of the tree in the stack.

2.Repeat step3 while the stack is not empty.

3.Pop the top pointer off the stack. Repeat while the pointer value is not null: write the data associated with the node; if the right subtree is not empty then stack the pointer to the right subtree; set the pointer value to the left subtree.

*Assuming tos is global,push & pop routines are available.

void preorder(treptr T)
{
treptr p, S[20];
tos = -1;
if(T == NULL)
printf("tree is empty");
else
{
push(S, T);
while(tos != -1)
{
p = pop(S);
while(p != NULL)
{
printf("%d ", p->data);
if(p->right != NULL)
push(S, p->right);
p = p->left;
}
}
}
}



Iterative Postorder Traversal.

General Algorithm.

1.If the tree is empty then write empty tree and return else initialize the stack and
initialize the pointer value to root of tree.

2.Start an infinite loop to repeat through step 5.

3.Repeat while pointer value is not null.
Stack current pointer value.
Set pointer value to left subtree.

4.Repeat while top pointer on stack is negative.
Pop pointer off stack.
write data associated with positive value of this pointer.
If stack is empty then return.

5.Set pointer value to the right subtree of the value on top of the stack.

Stack the negative value of the pointer to the right subtree.


Assuming push & pop routines are existing & tos is global.


void postorder(treptr T)
{
treptr p, s[50];
if(T == NULL)
{
printf("Tree is empty");
return;
}
else
{
p = T;
tos = -1;
}
while(1)
{
while(p != NULL)
{
push(s, p);
p = p->left;
}
while(s[tos] < 0) /* a negated entry marks a node whose right subtree is stacked (illustrative; real C code would keep a separate flag) */
{
p = pop(s); /* take the positive value of the popped pointer */
printf("%d ", p->data);
if(tos == -1)
return;
}
p = s[tos]->right;
s[tos] = -s[tos]; /* negate to mark that the right subtree has been stacked */
}
}

* Routine to prepare a copy of the given tree T.

treptr copy(treptr T)
{
treptr temp;
if(T == NULL)
return(NULL);
temp = (treptr)malloc(sizeof(struct treenode));
temp->data = T->data;
temp->left = copy(T->left);
temp->right = copy(T->right);
return(temp);
}

* If two traversal outputs are given namely

Inorder-preorder
Or
Inorder -postorder

We can construct a tree

If preorder & postorder traversal outputs are given we cannot construct a unique tree.

ex: preorder output: a, b, d, e, g, c, f
inorder output: d, b, g, e, a, f, c

* From the preorder take the first data item, construct a node, and use it to divide the inorder output.





The first preorder item, a, becomes the root and splits the inorder output into d, b, g, e (left part) and f, c (right part).

Take the next data item, create a node, and attach it at the appropriate place: b becomes the left child of a, splitting its part into d and g, e.

Repeat the above until all the data items of the preorder output are explored.
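The whole procedure can be sketched recursively in C (distinct single-character keys and the node/function names are assumptions for illustration):

```c
#include <stdlib.h>

struct bnode { char sym; struct bnode *left, *right; };

/* Build a tree from preorder pre[0..n-1] and inorder in[0..n-1]. */
struct bnode *build(const char *pre, const char *in, int n)
{
    struct bnode *r;
    int k;
    if (n == 0)
        return NULL;
    r = malloc(sizeof *r);
    r->sym = pre[0];                 /* first preorder item is the root */
    r->left = r->right = NULL;
    for (k = 0; in[k] != pre[0]; k++)
        ;                            /* locate the root in the inorder output */
    r->left  = build(pre + 1, in, k);
    r->right = build(pre + 1 + k, in + k + 1, n - k - 1);
    return r;
}
```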

The elements of a binary search tree can be printed in ascending order by applying inorder traversal.



*Threaded storage Representation for Binary trees.

The wasted NULL links in the linked representation can be replaced by threads. A binary tree is threaded according to a particular traversal order. e.g., threads for the inorder traversal of a tree are pointers to its higher nodes.

*If the left link of a node is null it is replaced by the address of the predecessor.

*If the right link of a node is null it is replaced by the address of the successor.

*A thread can be represented using a negative address.

*A separate field can be used:

left leftthread data rightthread right


AVL (Adelson-Velskii and Landis) Trees:

*If the tree is balanced then its depth will be O(log2 N) and any operation can be implemented in O(log2 N) time.

*As the number of insertions/deletions on a BST increases, the tree may become either left heavy or right heavy, and the time complexity becomes O(N).

*To implement any operation in O(log2 N) time we insist on an extra structural condition called balance.
called balance.

*The data structures in which after every operation, a restructuring rule is applied that
tends to make future operations efficient are classified as self-adjusting: AVL,splay
tree.

An AVL tree is a binary search tree with a balance condition which ensures that the depth of the tree is O(log N).

*An AVL tree is identical to a binary search tree,except that for every node in the tree,the
height of the left and right subtrees can differ by at most 1.

*Height information is kept for each node.

*The height of an empty tree is defined to be -1.

*The height of an AVL tree is at most roughly
h = 1.44 log2(N+2) - 0.328.

*The minimum number of nodes, s(h), in an AVL tree of height h is given by
s(h) = s(h-1) + s(h-2) + 1; for h = 0, s(h) = 1; for h = 1, s(h) = 2.
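The recurrence can be evaluated directly (a sketch; the function name is illustrative):

```c
/* minimum number of nodes in an AVL tree of height h */
int s(int h)
{
    if (h == 0) return 1;
    if (h == 1) return 2;
    return s(h - 1) + s(h - 2) + 1;   /* root + minimal subtrees of heights h-1, h-2 */
}
```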

*when we do an insert operation,we need to update all the balancing information for the
nodes on the path back to the root.

*Normally lazy deletion is performed.

*When we insert a new element,a node could violate the AVL tree property ,which can
be restored with a simple modification to the tree, known as a rotation.

The violation may occur in four cases.

1. An insertion into the left subtree of the left child of T.

2. An insertion into the right subtree of the right child of T.

3. An insertion into the right subtree of the left child of T.

4. An insertion into the left subtree of the right child of T.

*If the insertion occurs on the outside (left-left or right-right), the imbalance is fixed by a single rotation of the tree.

*If the insertion occurs on the inside (left-right or right-left), it is handled by a double rotation.


Single rotation: The logic here is that if the height of the left subtree is greater due to an outside insertion, make the left child the new root and the old root the right child of the new root. Make the new links keeping the order property in mind.







Single rotation with its left child

Single rotation with its right child

(Figures: k2 with left child k1 and subtrees A, B, C becomes k1 with right child k2; the right-child case is the mirror image.)



Example: Inserting the elements 3, 2, 1, 4, 5, 6, 7

(Figures: the tree after each insertion and each rebalancing single rotation; the final tree has root 4 with children 2 (1, 3) and 6 (5, 7).)


Double rotation: Can be implemented using two single rotations.










Left-right double rotation

(Figure: k3 with left child k1, whose right child is k2 (subtrees A, B, C, D), becomes k2 with children k1 and k3.)

Right-left double rotation

(Figure: k1 with right child k3, whose left child is k2 (subtrees A, B, C, D), becomes k2 with children k1 and k3.)



Implementation:




left data height right


typedef struct avlnode * avlptr;

struct avlnode
{
avlptr left;
int data;
int height;
avlptr right;
};

int height(avlptr T)
{
if(T == NULL)
return(-1);
else
return(T->height);
}



This function can be called only if k2 has a left child. Perform a rotation between a node (k2) and its left child, update heights, and return the new root.

avlptr srotatewithleft(avlptr k2)
{
avlptr k1;
k1 = k2->left;
k2->left = k1->right;
k1->right = k2;
k2->height = max(height(k2->left), height(k2->right)) + 1;
k1->height = max(height(k1->left), k2->height) + 1;
return(k1);
}

This function can be called only if k3 has a left child and k3's left child has a right child. Do the left-right double rotation, update heights, then return the new root.


avlptr drotatewithleft(avlptr k3)
{
k3->left = srotatewithright(k3->left);
return(srotatewithleft(k3));
}

avlptr drotatewithright(avlptr k3)
{
k3->right = srotatewithleft(k3->right);
return(srotatewithright(k3));
}


Insertion into an AVL tree


avlptr insert(avlptr T, int x)
{
if(T == NULL)
{
T = (avlptr)malloc(sizeof(struct avlnode));
if(T == NULL)
printf("out of space");
else
{
T->data = x;
T->height = 0;
T->left = T->right = NULL;
}
}

else if(x < T->data)
{
T->left = insert(T->left, x);
if(height(T->left) - height(T->right) == 2)
{
if(x < T->left->data)
T = srotatewithleft(T);
else
T = drotatewithleft(T);
}
}

else if(x > T->data)
{
T->right = insert(T->right, x);
if(height(T->left) - height(T->right) == -2)
{
if(x > T->right->data)
T = srotatewithright(T);
else
T = drotatewithright(T);
}
}

T->height = max(height(T->left), height(T->right)) + 1;
return(T);
}


Splay Trees

A splay tree guarantees that any M consecutive tree operations starting from an empty tree take at most O(M log N) time.


A splay tree has an O(log N) amortized cost per operation.


The basic idea of the splay tree is that after a node is accessed,it is pushed to the root by a
series of AVL tree rotations.


This is based on the locality of reference principle.

In many applications when a node is accessed,it is likely to be accessed again
in the near future.




A simple Idea:

We rotate every node on the access path with its parent(single rotation,bottom up)

ex : Find on k1











An access on the required node will then push other nodes deep in the tree.
(Figure: the tree after rotating k1 up to the root with single rotations.)

*If we perform ordered rotations on the required node, it can become the new root and simultaneously some balancing can be achieved.

*If the node has a parent & grandparent, a double rotation (zig-zig/zig-zag) is performed; otherwise a single rotation is performed.

*If the total number of nodes in the path is odd then a single rotation is required at the end; otherwise double rotations are sufficient.















Single rotations

Zig left

(Figure: x, the left child of p, becomes the root; p becomes x's right child and x's old right subtree B becomes p's left child.)

Zig right

(Figure: the mirror image — x, the right child of p, becomes the root with p as its left child.)


Double rotations

Zig-zig left

Zig-zig right

Zig-zag left

Zig-zag right


The splay tree node & structure definition:

node layout: left data parent right

typedef struct splaynode * splptr;

struct splaynode
{
splptr left;
int data;
splptr parent;
splptr right;
};

Implementation of splay tree:

Basic splay routine

void splay(splptr current)
{
splptr father;
father = current->parent;
while(father != NULL)
{
if(father->parent == NULL)
Zig(current);
else
doublerotate(current);
father = current->parent;
}
}

Single rotate function

void Zig(splptr current)
{
if(current->parent->left == current)
Zigleft(current);
else
Zigright(current);
}

Double rotate function

void doublerotate(splptr current)
{
splptr p, g;
p = current->parent;
g = p->parent;
if(g->left == p)
{
if(p->left == current)
ZigZigleft(current);
else
ZigZagleft(current);
}
else
{
if(p->right == current)
ZigZigright(current);
else
ZigZagright(current);
}
}

Routine for Zigleft:

void Zigleft(splptr current)
{
splptr p, B;
p = current->parent;
B = current->right;
p->left = B;
if(B != NULL)
B->parent = p;
current->right = p;
p->parent = current;
current->parent = NULL;
}

(Figure: zig left — x, the left child of p with subtrees A and B, becomes the root; p keeps subtree C and receives B as its left child.)

ZigZig left routine

void ZigZigleft(splptr current)
{
splptr p, g, ggp, B, c;
p = current->parent;
g = p->parent;
ggp = g->parent;
B = current->right;
current->right = p;
p->parent = current;
p->left = B;
if(B != NULL)
B->parent = p;
c = p->right;
p->right = g;
g->parent = p;
g->left = c;
if(c != NULL)
c->parent = g;
current->parent = ggp;
if(ggp != NULL)
{
if(ggp->left == g)
ggp->left = current;
else
ggp->right = current;
}
}





B-Tree:

It is a popular search tree that is not binary.

*A B-tree of order M is a tree with the following structural properties.

- The root is either a leaf or has between 2 and M children.

-All nonleaf nodes (except the root) have between [M/2] and M children.

-All leaves are at the same depth.

It is used for indexing purposes - indexed sequential files.

All data items are stored at the leaves.

*No. of keys in a(nonroot) leaf is also between [M/2] and M.

Each interior node contains pointers p1, p2, ..., pm to its m children and values k1, k2, ..., km-1, where ki represents the smallest key found in subtree pi+1.

If a pointer pi is NULL, the corresponding ki is undefined.

The node structure



P1 k1 P2 k2 P3 .. kn-1 Pn


The leaves contain all the data ,which are either the keys themselves or pointers to
records containing the keys.


Example of a B-tree of order 4







(Figure: a B-tree of order 4 whose root holds the separator keys 21, 48, 72.)
A B-tree of order 4 is popularly known as 2-3-4 tree.

A B-tree of order 3 is known as a 2-3 tree.

Insert operation on B-trees by using the special case of 2-3 tree.

The interior nodes (nonleaves) are drawn in ellipses; leaves are drawn in boxes, which contain the keys. The keys in the leaves are ordered.








(Figure: a 2-3 tree with root separator 22:-, children with separators 16:- and 41:58, and ordered leaves 8,11,12 / 16,17 / 22,23,31 / 41,52 / 58,59,60.)


Case 1: To insert a node with key 18, we can just add it to a leaf without causing any violations of the 2-3 tree properties.

Case 2: To insert a key X, if there is no place in the leaf, split the leaf into two and attach both to its parent.

Case 3: If there is no place at its parent, then split the parent into two and attach both at its grandparent, and so on.

When we are inserting the elements using the above procedure satisfying the B-tree
properties the levels of the tree may increase.






Deletion : We can perform deletion by finding the key to be deleted and removing it.

If there is any violation of the property(no of keys) combine this leaf with its
sibling if there is any violation at its parent then merge the parent with its sibling and so
on.

When we delete an element from a B-tree the level of the B-tree may be reduced.
The worst-case running time for each of the insert/delete operations is O(M logM N).

A find operation takes O(log N).

The real use of B-trees lies in database systems.

B+-tree:

The minimum in the right subtree is maintained in the root/parent.

B*-tree:

Horizontal links are maintained; links are available from a node to its siblings.

*A node is not split until all the siblings are 70% filled. Shifting of the data from one node to its sibling is implemented.












Hashing


The process of converting a key into an address is called hashing.

The key-to-address transformation is defined as a mapping or a hashing function.

Hash Table:

The sequence of memory locations(Array of memory locations) in which the keys are to
be stored is called as a Hash Table.

Hash Table Size(m) : The no. of memory locations in the hash table.The table size m
must be a prime number for the even distribution of the keys in the table.

Load factor (λ): It is the ratio of the present number of elements in the table to the table size.

Collision: If more than one key is transformed into the same address then it is called a collision.

Collision are resolved using Collision resolution techniques.

Preconditioning: The process of converting alphanumeric keys into a form which can be more easily manipulated by a hashing function is called preconditioning.

Mainly hashing is used to implement Direct files.

Hash search is faster than other search algorithms.








Hashing Functions:

Example hash table (m = 10), holding the keys 21, 83, 15, 17:

location: 0  1  2  3  4  5  6  7  8  9
key:      -  21 -  83 -  15 -  17 -  -

λ = 4/10 = 0.4

The Division Method :

H(x) = x mod m


In this system the term x mod m has a value which is equal to the remainder of dividing
x by m.

The division method yields a hash value which belongs to the set { 0, 1,2 ..m-1 }

It is a simple and widely accepted method.

The possibility of collisions is high.
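The division method is a one-liner (a sketch; nonnegative integer keys are assumed):

```c
/* h(x) = x mod m; m should be prime for an even distribution of keys */
int hdiv(int x, int m)
{
    return x % m;
}
```

For example, with m = 10 this places 21, 83, 15, 17 at locations 1, 3, 5, 7.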





Midsquare Method:

In this method a key is multiplied by itself and the address is obtained by selecting an appropriate number of bits or digits from the middle of the square, depending upon the table size.

Ex: x = 123456

x^2 = 15241383936

Taking the 3 middle digits, h(x) = 138.

The Folded Method:

In this method the key is partitioned into no.of parts , each of which has the same length
as the required address with the possible exception of the last part.


Fold Shifting method:

All the parts are added together ,ignoring the final carry .In the case of binary Ex- OR can
be used.

Fold-boundary method:

A variation of the basic method involves the reversal of the digits in the outermost
partitions.

Folding is a hashing function which is also useful in converting multiword keys into a
single word so that other hashing functions can be used.

Digit Analysis:

This method forms addresses by selecting and shifting digits or bits of the original key.

This hashing function is in a sense distribution dependent.

For a given key set,the same positions in the key and the same rearrangement pattern
must be used consistently.

After the analysis on a sample of the key set,Digit positions having the most uniform
distributions are selected.

It is used in conjuction with static key sets.
(i.e key sets that do not change over time).


The length Dependent Method

It is commonly used in table-handling applications. In this method the length of the key is used along with some portion of the key to produce either a table address directly or an intermediate key which is used, for example, with the division method to produce a final table address.

ex: The sum of the binary equivalent of the first and last characters + 6 times the
length of the key.

Algebraic Coding:

It is a cluster-separating hashing function based on algebraic theory. An r-bit key (k1, k2, ..., kr)2 is considered as a polynomial

K(x) = Σ (i = 1 to r) ki x^(i-1)

With m = 2^t, a divisor polynomial of degree t

P(x) = x^t + Σ (i = 1 to t) pi x^(i-1)

is chosen, and the t hash bits hi are given by the remainder

K(x) mod P(x) = Σ (i = 1 to t) hi x^(i-1)

Multiplicative Hashing:

For a nonnegative integral key x and a constant c such that 0 < c < 1, the function is

H(x) = floor(m (cx mod 1))

Here cx mod 1 is the fractional part of cx.
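A sketch in C (the golden-ratio constant c ≈ 0.618 is a common choice but an assumption here, not stated in the text):

```c
/* H(x) = floor(m * (c*x mod 1)) */
int hmul(long x, int m)
{
    double c = 0.6180339887;         /* assumed constant, 0 < c < 1 */
    double t = c * (double)x;
    double frac = t - (long)t;       /* cx mod 1: the fractional part */
    return (int)(m * frac);
}
```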

Collision Resolution Techniques.

Separate Chaining:

The idea is to keep a list of all elements that hash to the same value.






To perform a find, we use the hash function to determine which list to traverse,we then
traverse the list in the normal manner,returning the position where the item is found.

To insert an element find the required list and insert the new element either in the
beginning or at the end.

Implementation:

typedef struct node * nptr;
struct node
{
int data;
nptr next;
};
Creating the hash table, assuming memory is allocated by malloc (createhead is assumed to allocate a list header node):

nptr *createht(int m)
{
nptr *HT;
int i;
HT = (nptr *)malloc(sizeof(nptr) * m);
for(i = 0; i < m; i++)
HT[i] = createhead();
return(HT);
}


Disadvantage: the extra space used for the pointers.

Open Addressing: In this scheme, if a collision occurs, alternative cells are tried until an empty cell is found. More formally, cells h0(x), h1(x), h2(x), ... are tried in succession, where hi(x) = (Hash(x) + F(i)) mod Tablesize.

F(i) = i, i = 1, 2, ..., m-1 : linear probing
F(i) = i*i : quadratic probing
F(i) = i.h2(x) : double hashing


Generally the load factor λ should be below 0.5 (50%) for open addressing hashing.
A bigger table is needed for open addressing hashing than for separate chaining hashing.

Linear Probing.

In linear probing, F is a linear function of i, typically F(i) = i. This amounts to trying cells sequentially (with wraparound) in search of an empty cell.
Ex: hi(x) = (x mod m + F(i)) mod m, F(i) = i, i = 1, 2, ..., m-1.





As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large.


Worse, even if the table is relatively empty, blocks of occupied cells start forming. This effect, known as primary clustering, means that any key that hashes into the cluster will require several attempts to resolve the collision, and it will then add to the cluster.

The expected number of probes using linear probing is roughly (1/2)(1 + 1/(1-λ)^2) for insertions and unsuccessful searches, and (1/2)(1 + 1/(1-λ)) for successful searches.

The mean insertion time is I(λ) = (1/λ) ∫0..λ 1/(1-x) dx = (1/λ) ln(1/(1-λ)).
Linear probing example, h(x) = x mod 10, inserting 89, 18, 49, 58, 69:

location | after 89 | after 18 | after 49 | after 58 | after 69
0        |          |          | 49       | 49       | 49
1        |          |          |          | 58       | 58
2        |          |          |          |          | 69
8        |          | 18       | 18       | 18       | 18
9        | 89       | 89       | 89       | 89       | 89
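A minimal linear-probing insert (a sketch; table size 10, nonnegative int keys, and -1 marking empty cells are assumptions) reproduces this example:

```c
#define M 10
#define EMPTY (-1)

/* Insert key using linear probing; returns the final index, or -1 if full. */
int lp_insert(int table[], int key)
{
    int i, pos;
    for (i = 0; i < M; i++) {           /* try at most M cells */
        pos = (key % M + i) % M;        /* h(x) = x mod M, F(i) = i */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;                          /* table full */
}
```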

Quadratic probing:

It is a collision resolution method that eliminates the primary clustering problem of
linear probing.

hi(x) = (h(x) + F(i)) mod m, where F(i) = i*i




Quadratic probing example, h(x) = x mod 10, inserting 89, 18, 49, 58, 69:

location | after 89 | after 18 | after 49 | after 58 | after 69
0        |          |          | 49       | 49       | 49
2        |          |          |          | 58       | 58
3        |          |          |          |          | 69
8        |          | 18       | 18       | 18       | 18
9        | 89       | 89       | 89       | 89       | 89


There is no guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime.
If quadratic probing is used , and the table size is prime ,then a new element can always
be inserted if the table is at least half empty.

Standard deletion cannot be performed in an open addressing hash table, because the cell might have caused a collision that went past it.
Open addressing hash tables require lazy deletion.

Secondary Clustering :
Although Quadratic probing eliminates primary clustering,elements that hash to the same
position will probe the same alternative cells. This is known as Secondary Clustering.


Double Hashing:

h1(x) + F(i)

Where F(i) = i.h2(x) and h2(x) is another hash function. A good choice is h2(x) = R - (x mod R), with R a prime number smaller than the table size.

If double hashing is correctly implemented , the expected no. of probes is about almost
the same as for a Random Collision resolution Strategy.

Quadratic probing , however , does not require the use of a second hash function and is
thus likely to be simpler and faster in practice.
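The double-hashing probe sequence can be sketched directly (the table size m and prime r as parameters, and the function name, are assumptions for illustration):

```c
/* i-th probe location for double hashing with h2(x) = r - (x mod r) */
int dh_probe(int x, int i, int m, int r)
{
    int h2 = r - (x % r);            /* step size; never zero */
    return (x % m + i * h2) % m;
}
```

For example, with m = 10 and r = 7, the key 49 probes locations 9, 6, 3, ...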

Rehashing:

If the table gets too full, the running time for the operations will start taking too long, and inserts might fail for open addressing hashing with quadratic resolution. This can happen if there are too many removals intermixed with insertions. A solution, then, is to build another table that is about twice as big (with an associated new hash function), scan down the entire original hash table, compute the new hash value for each (nondeleted) element, and insert it in the new table.

This entire operation is called rehashing. If the load factor λ exceeds 0.5, rehashing can be implemented.













Example (keys 13, 15, 6, 24, 23 with h(x) = x mod table size and linear probing):

Before rehashing, table size 7:
0: 6, 1: 15, 2: 23, 3: 24, 6: 13

After rehashing to table size 17:
6: 6, 7: 23, 8: 24, 13: 13, 15: 15

The running time of the rehashing operation is O(N).

Rehashing can also be done when an insertion fails.

Rehashing frees the programmer from worrying about the table size and is important
because hash tables cannot be made arbitrarily large in complex programs.
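The scan-and-reinsert step can be sketched like this (the function name is ours, and linear probing is used only to keep the sketch short; the sizes match the size-7 to size-17 example above):

```c
#include <assert.h>
#include <stdlib.h>

#define EMPTY (-1)

/* Rehashing sketch: allocate a table about twice as big, scan the
   entire original table, and re-insert every non-deleted element
   using the new table size. */
int *rehash(int *old, int oldsize, int newsize)
{
    int *table = malloc(newsize * sizeof(int));
    int i, j;
    for (i = 0; i < newsize; i++)
        table[i] = EMPTY;
    for (i = 0; i < oldsize; i++)
    {
        if (old[i] == EMPTY)
            continue;
        for (j = 0; j < newsize; j++)   /* linear probing, for brevity */
        {
            int pos = (old[i] % newsize + j) % newsize;
            if (table[pos] == EMPTY)
            {
                table[pos] = old[i];
                break;
            }
        }
    }
    free(old);
    return table;
}
```

Running this on the size-7 table above reproduces the size-17 table: 23 and 24 both collide with 6 at position 6 and settle in positions 7 and 8.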

Extendible Hashing:
This deals with the case where the amount of data is too large to fit in main
memory. We assume that at any point we have N records to store; the value of N changes
over time. Furthermore, at most M records (4 in this case) fit in one disk block.
Extendible hashing allows a find to be performed in two disk accesses. Insertions also
require few disk accesses.
As M increases, the depth of a B-tree decreases.







To insert an element, if the leaf is full then we have to split the leaf; when three bits are
required to identify a key, the directory address is extended accordingly.





The expected number of leaves is (N/M) log2 e. Thus the average leaf is ln 2 ≈ 0.69 full.
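A sketch of the directory lookup that makes the two-disk-access find possible (the key width and names here are illustrative, not from the text):

```c
#include <assert.h>

#define KEY_BITS 6   /* keys are 6-bit values in this sketch */

/* Extendible hashing directory lookup: with a directory of depth d,
   the leaf pointer for a key is chosen by the key's d most
   significant bits. The first access reads the directory entry; the
   second reads the leaf block it points to. */
int dir_index(unsigned key, int depth)
{
    return key >> (KEY_BITS - depth);
}
```

When a full leaf splits and the local depth exceeds the directory depth, the directory doubles and every entry is recomputed with the larger depth.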


Priority Queues(Heaps)



insert(pq, x) → [ Priority Queue ] → y = deletemax(pq)

In the case of a single server and several users, an ordinary queue is used. But if the
users have different priorities, then a priority queue is used.

A priority queue is a queue in which the elements inserted have different priorities.

A priority queue is a data structure that allows at least the following two operations.

Insert, which does the obvious thing, and

Delete, which finds, returns, and removes the minimum/maximum element in
the priority queue.



Applications: It is used in operating systems.

It is used for external sorting (replacement selection / extendable runs).

Priority queues are also important in the implementation of greedy algorithms,which
operate by repeatedly finding a minimum.

Ways of Implementation Priority Queue

1. Use a simple linked list, performing insertion at the front in O(1), and traverse the
list, which requires O(N) time, to delete the minimum.
2. Always maintain a sorted list. This makes insertions expensive (O(N)) and deleteMin
cheap (O(1)).
3. A priority queue can be implemented using a binary search tree. This gives an O(log N)
average running time for both operations.
In a BST the minimum always lies to the left, so after some deletions the tree may become
right heavy.
The basic data structure we will use will not require pointers and will support both
operations in O(log N) worst-case time. Insertion will actually take constant time on
average, and our implementation will allow building a priority queue of N items in
linear time, if no deletions intervene. This structure is known as a binary heap.

Binary Heap: (heap)

Heaps have two properties:
1.Structure property and 2.Heap order property.

Structure property: A heap is a binary tree that is completely filled, with the possible
exception of the bottom level, which is filled from left to right. Such a tree is known as a
complete binary tree.
The height of a complete binary tree is ⌊log N⌋.
Heap order property:
In a heap, for every node x, the key in the parent of x is smaller than (or equal to)
the key in x, with the exception of the root (which has no parent). Such a heap is called a
minimum heap.

Max heap order property:
For every node x, the key in the parent of x is greater than (or equal to) the key in x, with
the exception of the root (which has no parent).

A complete binary tree is so regular that it can be represented in an array, and no pointers
are necessary.

For any element in array position i, the left child is in position 2i, the right child is in the
cell after the left child (2i + 1), and the parent is in position ⌊i/2⌋.
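These index rules can be written as small helpers (a sketch for a heap stored from position 1, with position 0 reserved for the sentinel; the function names are ours):

```c
#include <assert.h>

/* Index arithmetic for a heap stored from array position 1. */
int leftchild(int i)  { return 2 * i; }
int rightchild(int i) { return 2 * i + 1; }
int parent(int i)     { return i / 2; }   /* integer division is the floor */
```

Integer division gives the floor for free, so ⌊i/2⌋ needs no special handling in C.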


Index:   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Value:       13   21   16   24   31   19   68   65   26   32

(heapsize = 10)
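The heap order property on this array layout can be checked with a short routine (the function name is ours):

```c
#include <assert.h>

/* Check the min-heap order property for elements stored in a[1..n]:
   every element must be >= its parent at index i/2. */
int is_minheap(const int a[], int n)
{
    int i;
    for (i = 2; i <= n; i++)
        if (a[i / 2] > a[i])
            return 0;
    return 1;
}
```

The example array above passes this check; lowering a leaf below its parent breaks it.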
















(Figure: an example MINHEAP and an example MAXHEAP)












After an insert/delete operation, the structure and order properties, if disturbed, must be
restored.



To (define) implement the heap ADT:

A sentinel value is stored in array position 0; the elements occupy positions 1 .. heap − 1.




struct heap
{
    int heap;      /* capacity of the array */
    int hsize;     /* current number of elements */
    int *harray;   /* the array holding the heap */
};

typedef struct heap *hptr;

In the first array location a sentinel value (minimum/maximum) is inserted.
Creation of a priority queue:

hptr createpq(int n)
{
    hptr H;
    H = (hptr) malloc(sizeof(struct heap));
    if (H == NULL)
        printf("out of space");
    else
    {
        H->harray = (int *) malloc(n * sizeof(int));
        if (H->harray == NULL)
        {
            printf("out of space");
            return(NULL);
        }
        else
        {
            H->heap = n;
            H->hsize = 0;
            H->harray[0] = MinData;   /* sentinel value */
        }
    }
    return(H);
}

Insert Operation:

To insert an element x into the heap, we create a hole in the next available location to
maintain the structure property; incrementing the present heap size gives us this location.

If x can be placed in the hole without violating heap order, insert it there; otherwise move
the parent into the hole and the hole into the parent position. Continue this (percolate up)
until we find the correct place for x, and insert x.

Insert function:
void insert(hptr H, int x)
{
    int i;
    if (H->hsize >= H->heap - 1)      /* position 0 holds the sentinel */
        printf("overflow on insert");
    else
    {
        for (i = ++H->hsize; H->harray[i/2] > x; i /= 2)
        {
            H->harray[i] = H->harray[i/2];
        }
        H->harray[i] = x;
    }
}
DeleteMin :

The minimum exists in the root. Deletion of the minimum creates a hole (a violation of
the structure property) which is percolated down toward the last level, and the last element
is inserted in the appropriate position.

int deletemin(hptr H)
{
    int i, child, min, last;
    if (H->hsize <= 0)
    {
        printf("underflow on delete");
        return(-1);
    }
    else
    {
        min = H->harray[1];
        last = H->harray[H->hsize--];
        for (i = 1; i * 2 <= H->hsize; i = child)
        {
            child = i * 2;
            if (child != H->hsize && H->harray[child + 1] < H->harray[child])
                child++;
            if (last > H->harray[child])
                H->harray[i] = H->harray[child];
            else
                break;
        }
        H->harray[i] = last;
        return(min);
    }
}

Insert/delete operations can be completed in O(log N) worst-case time.

Other Heap Operations:

The decrease_key(H, k, p) operation lowers the value of the key at position p by
a positive amount k. Since this might violate the heap order, it must be fixed by a
percolate up.
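A possible decrease_key on a bare array with the sentinel in position 0 (a sketch; the text does not give this routine):

```c
#include <assert.h>
#include <limits.h>

/* Lower the key at position p by k, then percolate up. Assumes a
   min-heap stored from position 1 with a sentinel at position 0. */
void decrease_key(int harray[], int k, int p)
{
    int x = harray[p] - k;
    while (harray[p / 2] > x)     /* the sentinel at index 0 stops the loop */
    {
        harray[p] = harray[p / 2];
        p /= 2;
    }
    harray[p] = x;
}
```

The sentinel makes the termination test uniform: no separate check for reaching the root is needed.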

This operation could be useful to system administrators: they can make their programs
run with highest priority.
The increase_key(H, k, p) operation increases the value of the key at position p by a
positive amount k. This is done with a percolate down.

Many schedulers automatically drop the priority of a process that is consuming excessive
CPU time.
The delete(H, p) operation removes the node at position p from the heap. This is
done by first performing decrease_key(H, ∞, p) and then performing deletemin(H).

When a process is terminated by a user (instead of finishing normally), it must be
removed from the priority queue.

Build Heap:
The buildheap(H) operation takes as input n keys and places them into an empty
heap. This can be done with n successive inserts. Since each insert will take O(1)
average and O(log n) worst-case time, the total running time of this algorithm would be
O(n) average but O(n log n) worst case.
With reasonable care a linear time bound can be guaranteed.

The general algorithm is to place the n keys into the tree in any order, maintaining the
structure property, and then percolate down all the elements from i = n/2 to 1.


for (i = n/2; i > 0; i--)
    percolatedown(i);

The time complexity of this algorithm is O(n).
For a perfect binary tree of height h containing 2^(h+1) − 1 nodes, the sum of the heights
of the nodes is 2^(h+1) − 1 − (h+1).
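This sum-of-heights identity can be checked directly: a perfect tree has 2^i nodes at depth i, each of height h − i, so

```latex
S = \sum_{i=0}^{h} 2^i (h - i)
  = h \sum_{i=0}^{h} 2^i \;-\; \sum_{i=0}^{h} i \, 2^i
  = h \left( 2^{h+1} - 1 \right) - \left( (h-1)\,2^{h+1} + 2 \right)
  = 2^{h+1} - 1 - (h + 1).
```

Since n ≈ 2^(h+1), the total percolate-down work is O(n), which is the linear bound claimed above.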
The heap is represented using a simple array.
The percolate down routine is given below.











(Figure: a max heap containing 97, 53, 59, 26, 41, 58, 31 and its array representation)

Index:   0    1    2    3    4    5    6
Value:  97   53   59   26   41   58   31





void percolatedown(int H[], int i, int n)
{
    int child, temp;
    /* 0-based array: the left child of i is 2*i + 1 */
    for (temp = H[i]; 2 * i + 1 < n; i = child)
    {
        child = 2 * i + 1;
        if (child != n - 1 && H[child] < H[child + 1])
            child++;
        if (temp < H[child])
            H[i] = H[child];
        else
            break;
    }
    H[i] = temp;
}


HeapSort :

Space complexity = O(1)

T(n) = O(n log n)

Basic strategy:

Build a binary heap of n elements. This takes O(n) time. Then perform n deletemin
operations, which requires n extra locations.
To reduce the space complexity, build a max heap and perform delete-maximum
operations.
Each maximum can be inserted in the last location of the heap. The same can be implemented
by swapping the first and last elements of the heap and percolating down the first element.
Repeating the above procedure n−1 times sorts the given list. The function is:
void heapsort(int a[], int n)
{
    int i;
    for (i = n/2; i >= 0; i--)        /* build max heap */
        percolatedown(a, i, n);

    for (i = n-1; i > 0; i--)
    {
        swap(&a[0], &a[i]);           /* move the maximum to the end */
        percolatedown(a, 0, i);
    }
}

The average number of comparisons used to heapsort a random permutation of n distinct items
is
2n log n − O(n log log n) = O(n log n)
