This document provides information about hashing and disjoint sets. It defines key concepts like equivalence relations, disjoint set operations and data structures, path compression, equivalence classes, and hash collisions. It also describes techniques for resolving collisions in open addressing hashing systems like linear probing and quadratic probing. Applications of disjoint sets and hashing are listed, including symbol tables, spell checkers, graphs, and games. An example of separate chaining hashing is provided with insertion and search algorithms.
1. Define Equivalence relation. A relation R is defined on a set S if, for every pair of elements (a, b) with a, b ∈ S, a R b is either true or false. R is an equivalence relation if it satisfies three properties: i. a R a, for each element a in S (Reflexive) ii. a R b if and only if b R a (Symmetric) iii. a R b and b R c implies a R c (Transitive) Example: Electrical connectivity.
2. What are the basic operations performed on the Disjoint Set ADT? Specify the data structures used to represent the Set ADT. (i) The basic operations performed on the Disjoint Set ADT are: Union (x, y): performs the union of the sets containing the two elements x and y. Find (x): returns a pointer to (the name of) the set containing the element x. (ii) The data structures used to represent the Set ADT are: Array, Linked List and Tree.
3. What is path compression? Path compression is performed during a Find ( ) operation on the Set ADT: every node on the path from x to the root is made to point directly to the root. It speeds up later Find ( ) operations without reworking the underlying data structure.
4. Define Equivalence Classes. Specify the properties of Equivalence Classes. Definition: The equivalence class of an element a (in S) is the subset of S that contains all elements related to a. Properties of Equivalence Classes (i) Each element must belong to exactly one equivalence class. (ii) All equivalence classes are mutually disjoint.
5. Define hashing. Hashing is the transformation of a given key (integer, real or string) into a shorter fixed length value (called hash value or location) that represents the original key. Hashing is used to index and retrieve items in a database because it is faster to find the item using the short hashed key than to find it using the original value.
2 | CS 2201 - Data Structures Unit 4
6. What is a hash collision? List the different collision resolution techniques. Collision: When the hash function maps two different keys to the same hash location (hash value) in the hash table, a hash collision occurs. The hash collision resolution techniques are (i) Separate chaining or External hashing (ii) Open addressing or Closed hashing
7. What do you mean by an open addressing hashing system? In an open addressing hashing system, when a collision occurs, alternative cells are tried until an empty cell is found. The cells $h_0(x), h_1(x), h_2(x), \ldots$ are tried in succession, where $h_i(x) = (\mathrm{Hash}(x) + F(i)) \bmod \mathrm{TableSize}$ with $F(0) = 0$, and F is the collision resolution function.
8. What is an extendible hashing system? Extendible hashing is a hashing system that treats a hash value as a bit string and uses a trie for bucket lookup. Because it is hierarchical in nature, rehashing is an incremental operation that can be performed one bucket at a time, as needed.
9. What is a hash table? A hash table is simply an array of some fixed size containing the keys. Each key is mapped to some number (called its hash location or hash value) in the range 0 to TableSize - 1, and the key is placed in that cell.
10. List the applications of Set and Hashing systems.
Applications of the Disjoint Set ADT:
i. Connected components algorithm
ii. Minimum spanning tree algorithm
iii. Maze construction, puzzles and games
Applications of Hashing Systems:
a. Symbol table management in compilers
b. Online spell checkers and dictionary systems
c. Graph theory
d. Error detection in computer networks
e. Puzzles and gaming
PART B
1. Define hash function. Write routines to find and insert an element in the separate chaining hash system. (8)
Hash Function: A hash function Hash (Key) is a key-to-address transformation. It maps a key value to a hash value (location), which gives the relative position of the key in an array called the hash table.
Properties of a hash function: The hash function should be simple to compute and must distribute the keys evenly. It should minimize the number of hash collisions.
Separate Chaining Hash System: Separate chaining is one of the most common collision resolution methods. It keeps a linked list of all the key values that hash to the same location, and each hash table entry acts as the HEAD of one of these lists.
Find Operation: To perform a search, the hash function is used to determine which linked list to traverse. That list is traversed in the normal manner, and the position where the element is found is returned.
Insertion: To insert an element, traverse the appropriate list to check whether the element is already present. If the element is new, it can be inserted either at the front or at the end of the list. If it is a duplicate, an extra count field is kept and incremented instead.
Case (i) Inserting a new key at the front of the list: This is easy and convenient, since no traversal to the end of the list is needed. It also often happens that recently inserted keys are the most likely to be accessed in the near future, and keys at the front are found first.
Case (ii) Inserting a new key at the end of the list: This helps to avoid redundancy, but requires the appropriate list to be traversed.
Algorithm:
void Insert (int Key, HashTable H)
begin
  /* Traverse the list to check whether the key is already present */
  Pos = Find (Key, H);
  if (Pos == NULL) /* Key is not found */
  begin
    NewCell = getnode();
    if (NewCell != NULL)
    begin
      L = H[Key % TableSize]; /* header of the appropriate list */
      NewCell->Element = Key;
      NewCell->Next = L->Next; /* insert the key at the front of the list */
      L->Next = NewCell;
    end
  end
end.
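A minimal C sketch of the Find and Insert routines above, assuming non-negative int keys, a fixed TABLE_SIZE of 10 and insertion at the front of each chain. The HashTable and Node types are hypothetical names; instead of a header cell, each table slot holds a head pointer (NULL for an empty list), and duplicate keys are simply rejected rather than counted in an extra field.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 10              /* assumed table size, as in the example */

typedef struct Node {
    int element;
    struct Node *next;
} Node;

/* Each table entry is the head pointer of one chain (NULL = empty list). */
typedef struct {
    Node *lists[TABLE_SIZE];
} HashTable;

static int hash(int key) {
    assert(key >= 0);              /* this sketch handles non-negative keys only */
    return key % TABLE_SIZE;
}

/* Find: hash to the right chain, then walk it. Returns the node or NULL. */
Node *find(int key, const HashTable *h) {
    for (Node *p = h->lists[hash(key)]; p != NULL; p = p->next)
        if (p->element == key)
            return p;
    return NULL;
}

/* Insert at the front of the chain if the key is not already present.
   Returns 1 if inserted, 0 if it was a duplicate or allocation failed. */
int insert(int key, HashTable *h) {
    if (find(key, h) != NULL)
        return 0;
    Node *cell = malloc(sizeof *cell);
    if (cell == NULL)
        return 0;
    int loc = hash(key);
    cell->element = key;
    cell->next = h->lists[loc];    /* new cell points to the old first node */
    h->lists[loc] = cell;          /* head now points to the new cell */
    return 1;
}
```

Inserting 64, 81, 0, 4, 25, 49, 36, 16 and 9 in that order leaves slot 4 holding the chain 4 -> 64 and slot 6 holding 16 -> 36, because each new key goes to the front of its list.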
Pros and Cons of the Separate Chaining Hash System
Pros: The table never becomes full; more keys than table cells can be inserted, since each cell holds a linked list.
Cons: It requires pointers, which occupy extra memory space. A search takes more effort, since it requires evaluating the hash function and then traversing the list.
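Example: Keys: 64, 81, 0, 4, 25, 49, 36, 16 and 9, with TableSize = 10 and H (Key) = Key % TableSize.
H (64) = 64 % 10 = 4
H (81) = 81 % 10 = 1
H (0) = 0 % 10 = 0
H (4) = 4 % 10 = 4
H (25) = 25 % 10 = 5
H (49) = 49 % 10 = 9
H (36) = 36 % 10 = 6
H (16) = 16 % 10 = 6
H (9) = 9 % 10 = 9
Resulting lists (each new key inserted at the front): 0: 0; 1: 81; 4: 4 -> 64; 5: 25; 6: 16 -> 36; 9: 9 -> 49. All other lists are empty.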
2. Explain how collisions are handled in the open addressing hashing system. Or: Discuss the collision resolution strategies used in the closed hashing system.
OPEN ADDRESSED HASH SYSTEM or CLOSED HASH SYSTEM
In an open addressing hashing system, when a collision occurs, alternative cells are tried until an empty cell is found. The cells $h_0(x), h_1(x), h_2(x), \ldots$ are tried in succession, where $h_i(x) = (\mathrm{Hash}(x) + F(i)) \bmod \mathrm{TableSize}$ with $F(0) = 0$, and F is the collision resolution function. There are three common collision resolution strategies: (i) Open Addressing with Linear Probing (ii) Open Addressing with Quadratic Probing (iii) Open Addressing with Double Hashing.
LINEAR PROBING
In linear probing, the collision function F is a linear function of i, typically $F(i) = i$. This amounts to trying cells sequentially (with wrap-around) in search of an empty cell: if the end of the table is reached and no empty cell has been found, the search continues from the beginning of the table. Linear probing tends to create clusters of occupied cells in the table. The i-th probe is $h_i(\mathrm{key}) = (\mathrm{Hash}(\mathrm{key}) + i) \bmod \mathrm{TableSize}$.
Limitations: As long as the table is big enough, a free cell can always be found, but the time to do so can get large.
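The linear probing rule can be sketched in C as follows, assuming non-negative int keys, a table of size 10 in which EMPTY (-1) marks a free cell, and no deletions; insert_linear is a hypothetical name.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define EMPTY (-1)   /* marks a free cell; keys are assumed non-negative */

/* Linear probing: try (Hash(key) + i) % TABLE_SIZE for i = 0, 1, 2, ...
   Returns the index where key was stored, or -1 if every cell is occupied. */
int insert_linear(int table[TABLE_SIZE], int key) {
    assert(key >= 0);
    int home = key % TABLE_SIZE;              /* Hash(key) */
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i) % TABLE_SIZE;    /* F(i) = i, wrapping around */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;                                /* all cells probed, table full */
}
```

Inserting 89, 18, 49, 58 and 69 into an empty table places them in cells 9, 8, 0, 1 and 2; 58 and 69 have to walk past the growing cluster at cells 8, 9 and 0.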
QUADRATIC PROBING
Quadratic probing is a collision resolution method that eliminates the primary clustering problem of linear probing by using a quadratic collision function, $F(i) = i^2$. On the first collision, the cell 1 position beyond the home cell Hash (key) is tried; on the second collision, the cell $2^2 = 4$ positions beyond the home cell; on the third collision, the cell $3^2 = 9$ positions beyond it, and so on (all taken modulo TableSize).
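Example: Insert the keys 89, 18, 49, 58, 69 into a table of size 10, where $h_i(\mathrm{key}) = (\mathrm{Hash}(\mathrm{key}) + i^2) \bmod 10$:
(i) Hash (89) = 89 % 10 = 9 (No Collision)
(ii) Hash (18) = 18 % 10 = 8 (No Collision)
(iii) Hash (49) = 9 (1st Collision); (9 + 1) % 10 = 0 (No Collision)
(iv) Hash (58) = 8 (1st Collision); (8 + 1) % 10 = 9 (2nd Collision); (8 + 4) % 10 = 2 (No Collision)
(v) Hash (69) = 9 (1st Collision); (9 + 1) % 10 = 0 (2nd Collision); (9 + 4) % 10 = 3 (No Collision)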
Limitations: In a quadratic probing system, the table size should be prime. If the table size is prime and the table is at least half empty, a new key can always be inserted; once the table is more than half full, an insertion may fail even though empty cells remain.
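A C sketch of quadratic probing, assuming non-negative int keys, EMPTY (-1) for free cells and a table of size 10 so that the example keys can be reproduced. Since 10 is not prime, the guarantee above does not apply, so this sketch simply gives up after TABLE_SIZE probes; insert_quadratic is a hypothetical name.

```c
#include <assert.h>

#define TABLE_SIZE 10   /* not prime: chosen only to match the example keys */
#define EMPTY (-1)      /* marks a free cell; keys are assumed non-negative */

/* Quadratic probing: try (Hash(key) + i*i) % TABLE_SIZE for i = 0, 1, 2, ...
   Returns the index where key was stored, or -1 if no free cell was reached. */
int insert_quadratic(int table[TABLE_SIZE], int key) {
    assert(key >= 0);
    int home = key % TABLE_SIZE;                  /* Hash(key) */
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * i) % TABLE_SIZE;    /* F(i) = i^2, from the home cell */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

For the keys 89, 18, 49, 58 and 69 it returns 9, 8, 0, 2 and 3. Unlike linear probing, 58 and 69 jump over the cluster instead of extending it.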
DOUBLE HASHING:
In double hashing, a second hash function, $\mathrm{hash}_2(x)$, is applied, and cells are probed at distances $\mathrm{hash}_2(x), 2\,\mathrm{hash}_2(x), 3\,\mathrm{hash}_2(x)$, and so on. The collision function is $F(i) = i \cdot \mathrm{hash}_2(\mathrm{key})$, where i = 1, 2, 3, 4, .... Here the second hash function is $\mathrm{hash}_2(\mathrm{key}) = R - (\mathrm{key} \bmod R)$, where R is a prime smaller than TableSize.
Example: Insert the keys 89, 18, 49, 58, 69 with TableSize = 10 and R = 7, where $h_i(\mathrm{key}) = (\mathrm{Hash}(\mathrm{key}) + i \cdot \mathrm{hash}_2(\mathrm{key})) \bmod 10$:
(i) Hash (89) = 89 % 10 = 9 (No Collision)
(ii) Hash (18) = 18 % 10 = 8 (No Collision)
(iii) Hash (49) = 49 % 10 = 9 (1st Collision); hash2 (49) = 7 - (49 % 7) = 7 - 0 = 7; (9 + 7) % 10 = 16 % 10 = 6 (No Collision)
(iv) Hash (58) = 58 % 10 = 8 (1st Collision); hash2 (58) = 7 - (58 % 7) = 7 - 2 = 5; (8 + 5) % 10 = 13 % 10 = 3 (No Collision)
(v) Hash (69) = 69 % 10 = 9 (1st Collision); hash2 (69) = 7 - (69 % 7) = 7 - 6 = 1; (9 + 1) % 10 = 10 % 10 = 0 (No Collision)
Note: Choosing a prime R for the second hash function is advisable. The form R - (key % R) always yields a value from 1 to R, so the probe step is never zero.
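A C sketch of double hashing with the example's parameters (TABLE_SIZE 10, R = 7), assuming non-negative int keys and EMPTY (-1) for free cells; insert_double and hash2 are hypothetical names. With a non-prime table size, some step values cannot reach every cell, so the loop gives up after TABLE_SIZE probes.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define R 7            /* prime smaller than TABLE_SIZE, as in the example */
#define EMPTY (-1)     /* marks a free cell; keys are assumed non-negative */

/* Second hash function: always in 1..R, so the step is never zero. */
static int hash2(int key) {
    return R - (key % R);
}

/* Double hashing: try (Hash(key) + i * hash2(key)) % TABLE_SIZE for i = 0, 1, 2, ...
   Returns the index where key was stored, or -1 if no free cell was reached. */
int insert_double(int table[TABLE_SIZE], int key) {
    assert(key >= 0);
    int home = key % TABLE_SIZE;                  /* Hash(key) */
    int step = hash2(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * step) % TABLE_SIZE; /* F(i) = i * hash2(key) */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

For the keys 89, 18, 49, 58 and 69 it returns 9, 8, 6, 3 and 0, because each colliding key moves by its own step (7, 5 and 1).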
3. Write short notes on Re-hashing and Extendible hashing System with suitable example. (10)
RE-HASHING SYSTEM
Rehashing increases the size of the hash table array and re-inserts all of the items into the new array using a new hash function.
When the original hash table becomes too full, build a new hash table that is about twice as big (the next prime that is at least twice the current table size), with an associated new hash function. Scan down the original hash table, compute the new hash location of each element and insert it into the new hash table. Then free the original table.
When should we rehash? Common choices are to rehash:
(i) as soon as the original table is half full,
(ii) only when an insertion fails, or
(iii) when the load factor reaches a certain level (the best option for rehashing).
Load Factor: The load factor $\lambda$ is the number of keys in the hash table divided by the table size, $\lambda = N / \mathrm{TableSize}$. For example, $\lambda = 0$ when the table is empty, $\lambda = 0.5$ when it is half full, and $\lambda = 1$ when it is full.
Example: Hash the keys 18, 15, 6 and 24 into a table of size 7, using Hash (Key) = Key % 7.
Original hash table (TableSize = 7):
0: (empty)
1: 15
2: (empty)
3: 24
4: 18
5: (empty)
6: 6
With 4 keys in 7 cells, the table is more than half full ($\lambda = 4/7 \approx 0.57$), so rehashing is initiated.
New hash table Size = 7 * 2 = 14, and the nearest PRIME greater than 14 is 17.
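Scanning down the original hash table and rehashing each key with the new hash function Hash (Key) = Key % 17:
Hash (18) = 18 % 17 = 1
Hash (15) = 15 % 17 = 15
Hash (6) = 6 % 17 = 6
Hash (24) = 24 % 17 = 7
The original hash table is then freed. New hash table (TableSize = 17): cell 1: 18, cell 6: 6, cell 7: 24, cell 15: 15; all other cells are empty.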
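A C sketch of rehashing on top of linear probing, assuming non-negative int keys and the trigger "rehash as soon as an insertion makes the load factor exceed 0.5" (one of the options listed above). The Table type and the functions init_table, place, rehash, insert and next_prime are hypothetical names, and allocation failure is only checked with assert.

```c
#include <assert.h>
#include <stdlib.h>

#define EMPTY (-1)                 /* marks a free cell */

typedef struct {
    int size;                      /* TableSize */
    int count;                     /* number of keys stored */
    int *cells;
} Table;

static int is_prime(int n) {
    if (n < 2) return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

static int next_prime(int n) {     /* smallest prime >= n */
    while (!is_prime(n)) n++;
    return n;
}

static void init_table(Table *t, int size) {
    t->size = size;
    t->count = 0;
    t->cells = malloc(size * sizeof *t->cells);
    assert(t->cells != NULL);
    for (int i = 0; i < size; i++) t->cells[i] = EMPTY;
}

/* Linear-probing placement; the load-factor rule guarantees a free cell. */
static void place(Table *t, int key) {
    int pos = key % t->size;
    while (t->cells[pos] != EMPTY)
        pos = (pos + 1) % t->size;
    t->cells[pos] = key;
    t->count++;
}

/* Build a table of size next_prime(2 * old size), re-insert every key
   with the new hash function Key % new size, then free the old array. */
void rehash(Table *t) {
    Table bigger;
    init_table(&bigger, next_prime(2 * t->size));
    for (int i = 0; i < t->size; i++)
        if (t->cells[i] != EMPTY)
            place(&bigger, t->cells[i]);
    free(t->cells);
    *t = bigger;
}

/* Insert, then rehash once the load factor exceeds 0.5. */
void insert(Table *t, int key) {
    assert(key >= 0);
    place(t, key);
    if (2 * t->count > t->size)
        rehash(t);
}
```

Starting from a size-7 table, inserting 18, 15 and 6 keeps the size at 7; inserting 24 raises the load factor to 4/7, so the table is rehashed into size 17 with 18, 6, 24 and 15 in cells 1, 6, 7 and 15.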
Pros and Cons of the Rehashing System
Pros: Rehashing frees the programmer from worrying about the table size. The same idea can be used in other data structures as well; for instance, if an array-based queue becomes full, a double-sized array can be declared, everything copied over, and the original freed.
Cons: Rehashing is time consuming, since every element must be hashed again. It is also very expensive when memory is running short.
EXTENDIBLE HASHING SYSTEM
Extendible hashing is a type of hash system that treats a hash value as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, rehashing is an incremental operation (done one bucket at a time, as needed). This means that time-sensitive applications are less affected by table growth than they would be by standard full-table rehashes.
Hash Function: The hash function Hash (Key) for the extendible hash system returns a binary number. The first i bits of each bit string are used as an index into the "directory" (hash table) to find where the key goes. In the example below, where the bucket size is 1, i is the smallest number such that the first i bits of all keys are different.
Key terms used here are: 1. Global depth: the number of key bits used to index the directory. 2. Local depth: the number of key bits that were used to map keys to a particular bucket.
Operations on the hash table (when a bucket becomes full):
1. Doubling the directory: If the bucket's local depth equals the global depth, there is only one directory pointer to the bucket and no other directory entry can map to it, so the directory must be doubled before the bucket can be split.
2. Creating a new bucket and redistributing the entries between the old and the new bucket: If the local depth is less than the global depth, more than one directory pointer refers to the bucket, so the bucket can be split.
Example: The keys k1 = 100100, k2 = 010110 and k3 = 110110 are to be inserted, and the bucket size is 1. The first two keys, k1 and k2, can be distinguished by the most significant bit and are inserted into the table. Now, if k3 were hashed into the table, one bit would not be enough to distinguish all three keys, because k3 and k1 both have 1 as their leftmost bit. Also, because the bucket size is one, the bucket would overflow. Since comparing the first two most significant bits gives each key a unique location, the directory size is doubled to 4 as:
Now k1 and k3 each have a unique location, distinguished by their first two leftmost bits (10 for k1, 11 for k3). Because k2 is in the top half of the table, both 00 and 01 point to its bucket, since there is no other key beginning with 0 to compare it with.
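A compact C sketch of extendible hashing insertion, assuming 6-bit integer keys used directly as their own bit strings (leading bits index the directory), a bucket size of 1 as in the example, and no deletions. All names are hypothetical, and buckets are never freed in this sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define KEY_BITS 6                  /* keys are 6-bit strings, as in the example */
#define BUCKET_CAP 1                /* bucket size 1, as in the example */
#define MAX_DIR (1 << KEY_BITS)     /* largest possible directory */

typedef struct {
    int local_depth;                /* bits that map keys to this bucket */
    int count;
    unsigned keys[BUCKET_CAP];
} Bucket;

static Bucket *dir[MAX_DIR];        /* directory: 2^global_depth entries in use */
static int global_depth;

/* Directory index = the leading 'depth' bits of a KEY_BITS-wide key. */
static unsigned prefix(unsigned key, int depth) {
    assert(key < (1u << KEY_BITS));
    return key >> (KEY_BITS - depth);
}

static Bucket *new_bucket(int local_depth) {
    Bucket *b = calloc(1, sizeof *b);
    assert(b != NULL);
    b->local_depth = local_depth;
    return b;
}

void init_table(void) {
    global_depth = 0;
    dir[0] = new_bucket(0);         /* one bucket, zero bits used */
}

int find(unsigned key) {
    Bucket *b = dir[prefix(key, global_depth)];
    for (int k = 0; k < b->count; k++)
        if (b->keys[k] == key)
            return 1;
    return 0;
}

/* Returns 1 on success, 0 if all key bits are used up (e.g. repeated keys). */
int insert(unsigned key) {
    for (;;) {
        Bucket *b = dir[prefix(key, global_depth)];
        if (b->count < BUCKET_CAP) {
            b->keys[b->count++] = key;
            return 1;
        }
        if (b->local_depth == global_depth) {       /* one pointer only: double */
            if (global_depth == KEY_BITS)
                return 0;
            for (int i = (1 << (global_depth + 1)) - 1; i >= 0; i--)
                dir[i] = dir[i >> 1];               /* entry i inherits old entry i/2 */
            global_depth++;
        }
        /* Split b: keys and pointers whose next bit is 1 go to a new bucket. */
        int depth = ++b->local_depth;
        Bucket *nb = new_bucket(depth);
        for (int i = 0; i < (1 << global_depth); i++)
            if (dir[i] == b && ((i >> (global_depth - depth)) & 1))
                dir[i] = nb;
        int kept = 0;
        for (int k = 0; k < b->count; k++) {
            unsigned x = b->keys[k];
            if (prefix(x, depth) & 1)
                nb->keys[nb->count++] = x;
            else
                b->keys[kept++] = x;
        }
        b->count = kept;                            /* then retry the insert */
    }
}
```

Inserting 36 (100100), 22 (010110) and 54 (110110) in that order ends with global depth 2, directory entries 00 and 01 sharing the bucket of 22, and entries 10 and 11 holding 36 and 54 in separate buckets.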
4. Consider a hash table of size 10, initially empty, with h(x) = x mod 10 as the hash function. Show the table after adding the keys 7, 84, 31, 57, 44, 19, 27, 14 and 64, assuming that the hash table uses linear probing and that rehashing occurs at the start of an add when the load factor is 0.5. (6)
Ans: Refer to the class work notebook.
5. Show the result of the following sequence of instructions on the singleton sets {1}, {2}, ..., {17}: Union (1,2); Union (3,4); Union (1,7); Union (3,6); Union (8,9); Union (1,8); Union (3,10); Union (3,11); Union (8,12); Union (9,13); Union (14,15); Union (16,17); Union (14,16); Union (1,3); Union (1,14) when unions are performed (i) arbitrarily, (ii) by height, (iii) by size; (iv) find the sets of 13, 7 and 10 using path compression. (6) Ans: Refer to the class work notebook.
6. What do you mean by disjoint sets? Discuss in detail the various representations of disjoint sets. (16)
BASIC DEFINITIONS:
(i) A set is a collection of objects.
(ii) Set A is a subset of set B if all elements of A are in B. Subsets are also sets.
(iii) The union of two sets A and B is the set C consisting of all elements that are in A or in B.
(iv) Two sets are mutually disjoint if they have no element in common; such sets are called disjoint sets.
(v) A relation R is defined on a set S if, for every pair of elements (a, b) with a, b ∈ S, a R b is either true or false. If a R b is true, we say that a is related to b.
(vi) An equivalence relation is a relation R that satisfies three properties: (reflexive) a R a, for all a ∈ S; (symmetric) a R b if and only if b R a; (transitive) a R b and b R c implies that a R c.
(vii) An equivalence relation partitions a set into distinct equivalence classes.
Operations on Disjoint Sets
1. Union (a, b): Check whether a and b are already related, i.e., in the same equivalence class. If not, merge the two equivalence classes containing a and b into a new equivalence class.
2. Find (x): Return the name (a pointer or the index of the representative) of the set containing the given element x.
Implementation / Representation of Disjoint Sets
The disjoint set can be implemented using three data structures:
1. Array implementation
2. Linked list implementation
3. Tree implementation
ARRAY IMPLEMENTATION OF DISJOINT SET
The array representation assigns one position to each element; each position stores the name (representative) of the set containing that element. Initially, each element is in its own set.
Find-Set(): To make the Find-Set operation fast, the name of each element's equivalence class is stored directly in the array, so a find takes constant time, O(1).
Union-Sets(): Assume element a belongs to set i and element b belongs to set j. To perform Union (a, b), every entry j in the array must be changed to i. Each union therefore takes $\Theta(n)$ time, so a sequence of n - 1 unions takes $\Theta(n^2)$ time.
Algorithms: /* array[1..N] is a global array; array[i] holds the name of the set containing i */
Initialize (int N)
begin
  for (int i = 1; i <= N; i++)
    array[i] = i; /* each element starts in its own set */
end
int Find (int i)
begin
  return array[i];
end
void UnionSets (int i, int j)
begin
  rooti = Find (i);
  rootj = Find (j);
  if (rooti != rootj)
    for (int k = 1; k <= N; k++)
      if (array[k] == rootj)
        array[k] = rooti;
end
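The array algorithms above can be sketched as C functions, assuming a fixed N of 10 and elements numbered 1..N. The array is named set_name here (the pseudocode's array[]), and initialize, find and union_sets are hypothetical names.

```c
#include <assert.h>

#define N 10                      /* number of elements, numbered 1..N (assumed) */

static int set_name[N + 1];       /* set_name[i] = name of the class containing i */

void initialize(void) {
    for (int i = 1; i <= N; i++)
        set_name[i] = i;          /* every element starts in its own set */
}

int find(int i) {
    assert(i >= 1 && i <= N);
    return set_name[i];           /* O(1): the name is stored directly */
}

void union_sets(int i, int j) {
    int root_i = find(i), root_j = find(j);
    if (root_i == root_j)
        return;                   /* already in the same class */
    for (int k = 1; k <= N; k++)  /* Theta(N) scan for every union */
        if (set_name[k] == root_j)
            set_name[k] = root_i;
}
```

After union_sets(1, 2), union_sets(3, 4) and union_sets(2, 4), elements 1 to 4 all report set name 1, while element 5 is still in its own set.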
Limitations: Using an array to represent the disjoint set requires more memory, and every union must scan the entire array.
LINKED LIST IMPLEMENTATION OF DISJOINT SET
Each set is represented by a linked list, and the first object in each list serves as its set's representative. Each object in the linked list contains (i) a set member, (ii) a pointer to the object containing the next set member, and (iii) a pointer back to the representative. Each list maintains two pointers: head, to the representative, and tail, to the last object in the list. Within each linked list, the objects may appear in any order, subject to the assumption that the first object in each list is the representative.
Example: Sets C and F represented as linked lists, and Union (F, C).
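A C sketch of the linked-list representation, assuming the caller supplies the storage for every member and list. Union (a, b) appends list b to list a and redirects every representative pointer in b; the Member and ListSet types and the functions make_set, find and union_sets are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Member {
    int value;
    struct Member *next;          /* next member of the same set */
    struct Member *rep;           /* pointer back to the representative */
} Member;

typedef struct {
    Member *head;                 /* first member = representative */
    Member *tail;                 /* last member, for O(1) appending */
} ListSet;

/* Make-set: a one-member list whose only member is its own representative. */
void make_set(ListSet *s, Member *m, int value) {
    m->value = value;
    m->next = NULL;
    m->rep = m;
    s->head = s->tail = m;
}

/* Find: follow the back pointer, O(1). */
Member *find(const Member *m) {
    return m->rep;
}

/* Union: append list b to list a; b's members now point to a's head. */
void union_sets(ListSet *a, ListSet *b) {
    assert(a->head != NULL);
    if (a == b || b->head == NULL)
        return;
    for (Member *p = b->head; p != NULL; p = p->next)
        p->rep = a->head;         /* cost grows with the length of list b */
    a->tail->next = b->head;
    a->tail = b->tail;
    b->head = b->tail = NULL;     /* b is now empty */
}
```

The loop over list b is why appending the shorter list to the longer one is preferred in practice: find stays O(1), but each union costs time proportional to the list being appended.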
TREE IMPLEMENTATION OF DISJOINT SET A tree data structure can be used to represent a disjoint set ADT. Each set is represented by a tree. The elements in the tree have the same root and hence the root is used to name the set. The trees do not have to be binary since we only need a parent pointer.
Operations & algorithms:
Initialize Set (int N) // parent[0..N-1] is a global array
begin
  for (int i = 0; i < N; ++i)
    parent[i] = -1;
end
If parent[i] == -1, then i is a root node. Initially, each integer is in its own set.
Find-set ( ): The Find-Set operation takes time proportional to the depth of the node in its tree.
int find-set (int i) // Iterative Find-set () algorithm
begin
  while (parent[i] != -1)
    i = parent[i];
  return i;
end
int find-set (int i) // Recursive Find-set () algorithm
begin
  if (parent[i] == -1)
    return i;
  else
    return find-set (parent[i]);
end
Union-sets ( ) operation (an arbitrary Union-Sets () algorithm):
void UnionSets (int i, int j)
begin
  i = find-set (i); // root of i
  j = find-set (j); // root of j
  if (i != j)
    parent[j] = i; // 2nd set is appended to 1st set
end
Example: Tree representation of disjoint set ADT
After union-sets (5,6)
Tree representation of disjoint set ADT after union (7,8),
Tree representation of disjoint set ADT after union (5,7) as
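The tree algorithms above, written as C, assuming a fixed N of 10 with elements 0..N-1; find_set replaces the pseudocode's find-set (a hyphen is not valid in a C identifier), and initialize and union_sets are hypothetical names.

```c
#include <assert.h>

#define N 10                      /* elements 0..N-1 (assumed size) */

static int parent[N];

void initialize(void) {
    for (int i = 0; i < N; ++i)
        parent[i] = -1;           /* -1 marks a root: every element is its own set */
}

int find_set(int i) {
    assert(i >= 0 && i < N);
    while (parent[i] != -1)       /* climb parent pointers to the root */
        i = parent[i];
    return i;
}

void union_sets(int i, int j) {   /* arbitrary union: 2nd root hangs under 1st */
    i = find_set(i);
    j = find_set(j);
    if (i != j)
        parent[j] = i;
}
```

Replaying union_sets(5, 6), union_sets(7, 8) and union_sets(5, 7) gives parent[6] = 5, parent[8] = 7 and parent[7] = 5, so 8 is now two levels below the root 5.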
7. With algorithm, discuss the effect of path compression and the various smart Union strategies in disjoint sets with suitable example. (10)
The Set union problem: The set union problem consists of performing a sequence of union and find operations, starting from a collection of n singleton sets {1}, {2}, ..., {n}. In the basic tree representation, unions were performed arbitrarily, by making the second tree a subtree of the first: Union-sets (X, Y) adds Y as a subtree of X irrespective of the depths of the trees, so a bad sequence of unions can build a tree of depth n - 1. Tree representation after Union-Set(5,7)
Arbitrary algorithm for Union-Sets(4,5)
The basic approaches to improve the Union-sets algorithm are (i) Union by Size (ii) Union by Height / Union by Rank
Union by Size(X, Y) Algorithm: Union by size makes the root of the smaller tree a child of the root of the larger tree. This requires the size of each tree to be maintained. Union by size is easy to implement and requires no extra space if each root's array entry stores the negative of its tree's size.
Example: Union-sets(3, 7)
Result of Union by Size (3, 7)
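A C sketch of union by size, assuming N = 10 elements numbered 0..N-1 and the convention that a root's entry in s[] holds the negative of its tree size while any other entry holds its parent's index; initialize, find and union_by_size are hypothetical names.

```c
#include <assert.h>

#define N 10                          /* elements 0..N-1 (assumed) */

static int s[N];                      /* root: -(size of its tree); other: parent index */

void initialize(void) {
    for (int i = 0; i < N; i++)
        s[i] = -1;                    /* N singleton trees of size 1 */
}

int find(int x) {
    assert(x >= 0 && x < N);
    while (s[x] >= 0)                 /* follow parents up to the root */
        x = s[x];
    return x;
}

void union_by_size(int a, int b) {
    int r1 = find(a), r2 = find(b);
    if (r1 == r2)
        return;
    if (s[r2] < s[r1]) {              /* r2's tree is larger (more negative) */
        s[r2] += s[r1];               /* new size = sum of both sizes */
        s[r1] = r2;                   /* smaller root becomes a child */
    } else {
        s[r1] += s[r2];
        s[r2] = r1;
    }
}
```

After union_by_size(1, 2) and union_by_size(1, 3), the call union_by_size(4, 1) still makes 4 a child of 1, because the tree rooted at 1 is larger even though 4 was passed first.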
Union by Height (X, Y) / Union by Rank (X, Y) Algorithm: Union-by-height is a trivial modification of union-by-size. It keeps track of the height, instead of the size, of each tree and performs unions by making the shallower tree a subtree of the deeper tree. It requires maintaining the height of the subtree rooted at each node (also referred to as the rank of the node). The height of a tree increases only when two equally deep trees are joined (and then the height goes up by one).
Algorithm:
Step 1: Compare the heights of the two trees.
Step 2: Link the shorter tree as a child of the root of the taller tree.
Step 3: If the heights are equal, make an arbitrary choice.
Step 4: Increment the height of the merged tree if it has changed, which happens only when two trees of equal height are merged.
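void UnionSetByRank (int root1, int root2) // s[] is the set array; each root stores the negative of its height, minus 1
{
  if (s[root1] < s[root2]) // root1 is taller
    s[root2] = root1;
  else if (s[root2] < s[root1]) // root2 is taller
    s[root1] = root2;
  else // equal heights
  {
    s[root1] = root2;
    s[root2]--; // merged tree is one level taller
  }
}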
Example: Union by rank (3,7)
Result of Union by height (3,7)
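A C sketch of union by height, assuming N = 10 elements and the convention that each root's entry in s[] holds -(height + 1), so a singleton holds -1; initialize, find and union_by_height are hypothetical names. Unlike UnionSetByRank above, this version finds the roots itself before linking them.

```c
#include <assert.h>

#define N 10                          /* elements 0..N-1 (assumed) */

static int s[N];                      /* root: -(height + 1); other: parent index */

void initialize(void) {
    for (int i = 0; i < N; i++)
        s[i] = -1;                    /* singleton trees of height 0 */
}

int find(int x) {
    assert(x >= 0 && x < N);
    while (s[x] >= 0)
        x = s[x];
    return x;
}

void union_by_height(int a, int b) {
    int r1 = find(a), r2 = find(b);
    if (r1 == r2)
        return;
    if (s[r2] < s[r1]) {              /* r2 is taller: hang r1 under it */
        s[r1] = r2;
    } else {
        if (s[r1] == s[r2])           /* equal heights: result grows by one */
            s[r1]--;
        s[r2] = r1;                   /* r1 becomes the new root */
    }
}
```

Merging {1, 2} with {3, 4} (both of height 1) produces a tree of height 2 rooted at 1; merging the singleton 5 into it afterwards leaves the height unchanged.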
APPLICATIONS OF DISJOINT SETS
1. Maze generation (using a modified Kruskal's algorithm)
2. Construction of spanning trees for graphs
3. Connected component labeling (electrical connections, network connections, etc.)
4. Online maintenance of biconnected components
5. Alias analysis in system software (compilers)
6. Construction of contour trees
Path Compression Algorithm:
After finding the root V of the tree containing U in Find-Set (U), traverse the path from U to V one more time and change the parent pointer of every vertex along the path to point directly to the root V. This process is called path compression.
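Path compression is also quite simple and very effective: during a Find-Set operation, each node on the find path is made to point directly to the root. Path compression does not change any ranks; when it is combined with union by rank, the stored rank becomes an upper bound on the height rather than the exact height.
Algorithm:
int Find (int x)
begin
  if (parent[x] < 0) // x is a root
    return x;
  else
    return parent[x] = Find (parent[x]); // point x directly at the root
end
Example: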
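A C sketch of Find with path compression, assuming N = 10 elements and the convention above that a negative parent entry marks a root; initialize and find are hypothetical names. The recursion unwinds from the root back down the path, setting each node's parent to the root on the way.

```c
#include <assert.h>

#define N 10                          /* elements 0..N-1 (assumed) */

static int parent[N];                 /* negative entry = root */

void initialize(void) {
    for (int i = 0; i < N; i++)
        parent[i] = -1;
}

/* Find with path compression: every node on the path ends up pointing at the root. */
int find(int x) {
    assert(x >= 0 && x < N);
    if (parent[x] < 0)
        return x;
    return parent[x] = find(parent[x]);
}
```

For a chain 4 -> 3 -> 2 -> 1 -> 0, a single find(4) returns 0 and leaves nodes 1 through 4 all pointing directly at 0, so any later find on them takes one step.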