UNIT-IV Graphs, Hashing & Heaps

1. What are Graphs? Explain types of graphs.
2. What are the applications of graphs?
3. How do you represent a Graph?
4. Explain Adjacency Matrix.
5. Explain adjacency lists.
6. Define the terms: adjacent vertices, incident edges, degree, in-degree and out-degree, cycle, spanning tree, loop and path.
7. Explain Graph traversals. (OR) Explain Depth First and Breadth First Search methods.
8. Differentiate Graph and Tree. Discuss various methods of representing graphs in memory.
9. Explain minimum spanning trees.
10. What is Heap? What are the Types of Heaps? How can we implement it?
11. Explain the procedure for Heap sort.
12. What are the Applications of Heap?
13. Define the terms: Hash table, Hash Function, Bucket, Probe, Synonym, Overflow, Open (or) External Hashing, Closed (or) Internal Hashing, Perfect Hash Function, Load density, Full Table, Load Factor, Rehashing.
14. What are the issues in Hashing?
15. What is Hash Function? What are the features of a Good Hash Function?
16. What are the methods of implementing Hash Function? Explain.
17. What are the collision resolution strategies available in Hashing?
18. Explain about Hash Table Overflow.
19. Explain about Extendible Hashing.

1. What are Graphs? Explain types of graphs.
A graph is a collection of vertices or nodes, which are joined in pairs by lines or edges. Formally, a graph G = (V, E) is an ordered pair of Vertices and Edges.
- Vertices are also called nodes or points.
- Edges are also called lines or arcs.
- Vertices are displayed as circles and edges are displayed as lines.
- An edge with orientation (->) is a directed edge, while an edge without orientation (-) is an undirected edge.

The types of graphs are listed below:
1. Directed Graph: A graph in which each edge is directed is called a directed graph (digraph).
2. Undirected graph: A graph in which each edge is undirected is called an undirected graph.
3. Connected graph: A graph G is connected if there is a path between every pair of vertices.
4. Sub graph: A sub graph of G is a graph whose vertex and edge sets are subsets of those of G.
5. Bipartite graph: A graph in which the vertices can be partitioned into two subsets such that each edge has its end points in those two subsets.
6. Complete Graph: A graph G is said to be complete if every vertex is adjacent to every other vertex in G. A complete graph with n vertices has n(n-1)/2 edges.
7. Weighted graph: A graph G is weighted if each edge is assigned a nonnegative numerical value (cost or weight).
8. Strongly connected graph: A directed graph G is said to be strongly connected if there is a directed path between every pair of vertices.

2. What are the applications of graphs?
Graphs are used in many applications:
- Analysis of electrical networks: Graphs and digraphs are used to analyze electrical networks; circuits can be represented by graphs.
- Study of molecular structure: The study of the molecular structure of chemical compounds uses graphs.
- The representation of airline routes: To represent an airline route system, airports are considered to be vertices and flights to be edges, and distance or ticket price to be weights.
- Networks: Computer networks or communication networks can be represented by graphs. Here, vertices are the computers and edges are communication lines. Weight could be bandwidth.
- Topological sort: Graphs are also used to sort elements using the topological sort algorithm.
- Finding shortest paths: Graphs can be used to find the shortest path between any two vertices.

3 & 4. How do you represent a Graph? Explain Adjacency Matrix.
A graph can be represented using matrices. An adjacency matrix A of a graph with n vertices is an n x n matrix in which each element of A is either zero or one.

If G is a digraph:
A(i, j) = 1 if edge (i, j) exists
        = 0 otherwise

If G is an undirected graph:
A(i, j) = 1 if edge (i, j) or (j, i) exists
        = 0 otherwise

Consider the following graph. The graph has 4 vertices and it is an undirected graph. Write the vertices 1 to 4 as row and column headers of the adjacency matrix; if an edge exists between any two nodes (row, column), write 1, otherwise 0, in the matrix.

Vertices: {1, 2, 3, 4}

(The example graph and its adjacency matrix are shown in a figure omitted here.)
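The adjacency-matrix construction described above can be sketched as follows; this is a minimal illustration (the function name is ours, and the vertices are 0-indexed here, unlike the 1-indexed example in the text):

```cpp
#include <vector>
#include <utility>

// Build an adjacency matrix for an undirected graph with n vertices.
// A[i][j] = 1 if edge (i, j) or (j, i) exists, 0 otherwise.
std::vector<std::vector<int>> adjacencyMatrix(
        int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> A(n, std::vector<int>(n, 0));
    for (auto [u, v] : edges) {
        A[u][v] = 1;
        A[v][u] = 1;  // mirror entry: the graph is undirected
    }
    return A;
}
```

For a directed graph, only the `A[u][v] = 1` assignment would be kept.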
5. Explain adjacency lists.
Packed Adjacency list:
- Here we use two single-dimensional arrays, h[0 to n+1] and l[0 to e].
- h[0 to n+1] represents vertices and l[0 to e] represents edges.

Linked adjacency list:
- A linked list is maintained for each vertex, holding that vertex's adjacency list. In addition to these linked lists, we take an array to maintain (point to) all the lists.

Example:
Vertices: {1, 2, 3, 4}
1: (1,2), (1,3), (1,4)
2: (2,1), (2,3)
3: (3,1), (3,2), (3,4)
4: (4,1), (4,3)

Packed adjacency list for the graph: l = {2, 3, 4, 1, 3, 1, 2, 4, 1, 3} lists the adjacent vertices of 1, 2, 3 and 4 in order, and h = {1, 3, 5, 8, 10} records where each vertex's portion of l begins and ends (vertex 1 occupies positions 1 to 3, vertex 2 positions 4 to 5, vertex 3 positions 6 to 8, and vertex 4 positions 9 to 10).

6. Define the terms: adjacent vertices, incident edges, degree, in-degree and out-degree, cycle, spanning tree, loop and path.
Adjacent Vertices: In a directed graph (digraph), if (i, j) represents an edge that connects vertices Vi and Vj, then Vi is adjacent to Vj and Vj is adjacent from Vi.
Incident edges: In a directed graph (digraph), if (i, j) represents an edge that connects the vertices Vi and Vj, then the edge is incident from Vi and incident to Vj.
Degree: Degree of a vertex is the number of edges that are connected to that vertex.
In-degree: In-degree of a vertex is the number of edges that are oriented towards that vertex.
Out-degree: Out-degree of a vertex is the number of edges that are oriented away from that vertex.
Cycle: A cycle is a path which starts and ends at the same vertex.
Spanning tree: A spanning tree is a sub-graph of G that contains all the vertices of G, is connected and undirected, and has no cycles.
Loop (Self Edge): An edge of the form (i, i) is called a loop or self edge, i.e., an edge that connects a node with itself.
Path: A path is a sequence of vertices connected by edges, oriented in such a way that the edges' directions are not opposed.

7. Explain Graph traversals.
There are mainly two methods for traversing a graph: Depth first and Breadth first traversals.

1. Depth first search:
- First we examine the starting node 'A'.
- Then we examine each node along a path 'P' which begins at 'A'; that is, we process one of the neighbors of 'A' and continue along the path.
- After coming to the end of 'P', we backtrack on 'P' until we can continue along another path.
- It uses a STACK to hold the nodes that are waiting to be processed.

Algorithm:
1. Initialize all nodes to ready state (status = 1).
2. Push the starting node 'A' onto the stack and change its status to 2.
3. While STACK is not empty
   Begin
4. POP the top node 'X' from the stack. Process 'X' and change its status to 3.
5. Push all the neighbors (that are still in ready state) of node 'X' onto the stack and change their status to 2.
   End
6. If the graph still contains nodes that are in ready state, repeat from step 2.
7. Return.

Function to implement DFS (Depth First Search):

void DepthFirst(Graph G)
{
    boolean visited[MAX];   /* assumed visible to traverse() */
    int v;
    for (all v in G)
        visited[v] = FALSE;
    for (all v in G)
        if (!visited[v])
            traverse(v);
}

void traverse(int v)
{
    int w;
    visited[v] = TRUE;
    visit(v);
    for (all w adjacent to v)
        if (!visited[w])
            traverse(w);
}

2. Breadth first search:
- First we examine the starting node A.
- Then we examine all the neighbors of A.
- Continue with the other nodes in the same manner. No node is processed more than once.
- It uses a QUEUE to hold nodes that are waiting to be processed.

Algorithm:
1. Initialize all nodes to the ready state.
2. Put a node 'A' in the Queue and change its status to 2.
3. While QUEUE is not empty
   Begin
4. Remove the front node 'X' from the Queue. Process 'X' and change its status to 3.
5. Add the neighbors of 'X' that are in ready state to the rear of the Queue and change their status to 2.
   End
6. If the graph still contains nodes that are in ready state, repeat from step 2.

Function to implement BFS (Breadth First Search):

void BreadthFirst(Graph G)
{
    queue q;
    boolean visited[MAX];
    int v, w;
    for (all v in G)
        visited[v] = FALSE;
    initialize(q);
    for (all v in G)
        if (!visited[v])
        {
            addqueue(v, q);
            do
            {
                deletequeue(v, q);
                if (!visited[v])
                {
                    visited[v] = TRUE;
                    visit(v);
                    for (all w adjacent to v)
                        if (!visited[w])
                            addqueue(w, q);
                }
            } while (!empty(q));
        }
}

(The worked example graph with its Depth First Search and Breadth First Search traversal results is shown in a figure omitted here.)

8. Differentiate Graph and Tree. Discuss various methods of representing graphs in memory.

Tree:
- A tree is a data structure in which each node is attached to one or more nodes as children.
- A tree is a non-linear data structure.
- All trees are graphs.
- A tree is characterized by nodes and branches.

Graph:
- A graph is a collection of vertices or nodes, which are joined in pairs by lines or edges.
- A graph is also a non-linear data structure.
- All graphs are not trees.
- A graph is characterized by vertices and edges.

A graph can be represented mainly in two ways:
1. Adjacency Matrix
2. Adjacency List
(Both methods are explained under questions 4 and 5 above.)

9. Explain minimum spanning trees.
Minimum Spanning Tree: A spanning tree is a subgraph of G (where G is an undirected connected graph), is a tree, and contains all the vertices of G. A minimum spanning tree is a spanning tree that has weights associated with the edges, and the total weight of the tree (the sum of the weights of its edges) is at a minimum.

A minimum spanning tree should satisfy the following conditions:
- It should be a sub-graph of the main graph (say 'G').
- It should have all the vertices of G.
- The number of edges should be n - 1, and there should not be any cycle.
- The total weight of the tree should be the least.

(Figures showing a graph G, three of its many possible spanning trees, a weighted graph G, and the minimum spanning tree of the weighted graph are omitted here.)

There are two different algorithms to find a minimum spanning tree:
1. Kruskal's algorithm
2. Prim's algorithm

Data Structures using C++ (Apoorva Publishers)

Kruskal's Algorithm: Take a graph with 'n' vertices and keep adding the shortest (least cost) edge, while avoiding the creation of cycles, until (n - 1) edges have been added. The order in which the edges are chosen does not matter if two or more edges have the same cost. Different minimum spanning trees may result, but they will all have the same total weight, which will always be the minimum.

Prim's Algorithm: Prim's algorithm starts with any vertex in a graph (vertex A, for example), and finds the least weight edge connecting the start vertex to another vertex (vertex B, for example). Now, from either 'A' or 'B', it finds the next least weighted vertex connection without creating a cycle (vertex C, for example). Now, from either 'A', 'B' or 'C', it finds the next least weighted vertex connection, without creating a cycle, and so on. Eventually, all the vertices will be connected without any cycles, which results in a minimum spanning tree.
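Prim's algorithm as described above can be sketched as follows. This is a minimal illustration using a min-priority queue of (weight, vertex) pairs; it assumes a connected, 0-indexed graph given as adjacency lists of (neighbour, weight) pairs, and the function name is ours:

```cpp
#include <vector>
#include <queue>
#include <utility>
#include <functional>

// Returns the total weight of a minimum spanning tree of a connected,
// undirected, weighted graph (Prim's algorithm starting from vertex 0).
int primTotalWeight(const std::vector<std::vector<std::pair<int, int>>>& adj) {
    int n = adj.size();
    std::vector<bool> inTree(n, false);
    // min-heap of (edge weight, vertex), smallest weight popped first
    std::priority_queue<std::pair<int, int>,
                        std::vector<std::pair<int, int>>,
                        std::greater<>> pq;
    pq.push({0, 0});                        // reach vertex 0 at cost 0
    int total = 0;
    while (!pq.empty()) {
        auto [w, u] = pq.top(); pq.pop();
        if (inTree[u]) continue;            // already connected, skip
        inTree[u] = true;
        total += w;                         // accept the least-weight edge
        for (auto [v, wt] : adj[u])
            if (!inTree[v]) pq.push({wt, v});
    }
    return total;
}
```

The priority queue plays the role of "find the next least weighted connection"; the `inTree` flags prevent cycles.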
10. What is Heap? What are the Types of Heaps? How can we implement it?
Heap: A Heap is a complete or nearly complete binary tree in which the value in a parent node is greater than or equal to the values of its children.

Types of Heaps: There are 2 types of Heaps. They are:
- Max-Heap
- Min-Heap

Max-Heap: If the value present at a parent node is greater than all its children, then such a tree is called a Max-Heap (or) Descending Heap.

Min-Heap: If the value present at a parent node is smaller than all its children, then such a tree is called a Min-Heap (or) Ascending Heap.

Insert operation of Heap: To insert an element into a Heap we must perform the Heap-up operation.
- Add the element to the bottom level of the Heap.
- Compare the added element with its parent. If it is found to be greater than its parent, then they are interchanged.
- This procedure is repeated till the element is placed at its appropriate place.

Example: To insert a new element 8 into the Heap shown (a Max-Heap with 10 at the root), first add it to the bottom of the Heap.
Compare 8 with its parent node, i.e., 3. 8 is greater than 3, so swap the elements.
Compare 8 with 10. 8 is smaller than its parent. Now it satisfies the Heap condition. Hence, we have inserted an element into the Heap.

Delete operation of Heap: To delete an element from a Heap we must perform the Heap-down operation. Here "delete" means deleting the node with the maximum value. This node is always the root.
Steps to delete the root node:
- Remove the root.
- Move the last node into the root.
- Compare with its child nodes and swap them until the Heap condition is satisfied.

Example: Remove the root node 10 and replace it with the last node, 1.
Compare 1 with 5. 5 is greater than 1, so swap the elements.
Compare 1 with 4. 4 is greater than 1, so swap the elements.
Now the Heap condition is satisfied. Hence, we have deleted the root node from the Heap.

(The figures illustrating each step of the insert and delete examples are omitted here.)
11. Explain the procedure for Heap sort.
The Heap Sort technique sorts a given set of values by building heaps. A heap is a binary tree in which the value in a parent node is larger than (or equal to) the values of its children.

Heap Sort Logic: In this technique, first build the heap using the given elements. We create a Max Heap to sort the elements in ascending order. Once the Heap is created, we swap the root node with the last node and delete the last node from the Heap. Continue this procedure till all the elements are eliminated.

Example: Array elements are: 4 10 3 5 1

Build the Heap, then transform it into a Max Heap. In a Max Heap, a parent node is always greater than or equal to its child nodes.
- 10 is greater than 4, so swap 4 and 10.
- 5 is greater than 4, so swap 4 and 5.
The Max Heap as an array is: 10 5 3 4 1

Swap the first and last node: 1 5 3 4 | 10. Remove the last node from the heap.
Create a Max Heap again: 5 is greater than 1, so swap 1 and 5; 4 is greater than 1, so swap 1 and 4. This gives: 5 4 3 1 | 10.
Swap the first and last node: 1 4 3 | 5 10. Remove the last node from the heap.
Create a Max Heap: 4 is greater than 1, so swap 1 and 4. This gives: 4 1 3 | 5 10.
Swap the first and last node: 3 1 | 4 5 10. Remove the last node from the heap.
It is already a Max Heap, so swap the first and last node: 1 | 3 4 5 10.
The array is now sorted: 1 3 4 5 10.

12. What are the Applications of Heap?
Heaps are commonly used in the following operations:
1. Selection problem
2. Scheduling and prioritizing (priority queue)
3. Sorting

1. Selection problem: For the solution to the problem of determining the kth element, we can create the heap and delete k - 1 elements from it, leaving the desired element at the root. So the selection of the kth element will be very easy as it is the root of the heap. For this, we can easily implement the algorithm of the selection problem using heap creation and heap deletion operations. This problem can also be solved in O(n log n) time using priority queues.

2. Scheduling and prioritizing (priority queue): The heap is usually defined so that only the largest element (that is, the root) is removed at a time.
This makes the heap useful for scheduling and prioritizing. In fact, one of the two main uses of the heap is as a priority queue, which helps systems decide what to do next. Implementing and programming this structure is not as difficult as it is with a normal BST, because the denseness and fullness allow us to conveniently represent the heap with an array. In a 0-indexed array, the first element has the index 0; a node at the index n has a parent node at (n - 1)/2, rounded down. The major advantage of using heaps here is that they are fast, efficient, and require minimal storage space.

Applications of priority queues where heaps are implemented include the following:
1. CPU scheduling
2. I/O scheduling
3. Process scheduling

3. Sorting: Other than as a priority queue, the heap has one other important usage: heap sort. Heap sort is one of the fastest sorting algorithms, achieving speed comparable to that of the quicksort and merge sort algorithms. The advantages of heap sort are that it does not use recursion and it is efficient for any data order. There is no worst-case scenario in the case of heap sort.

13. Define the terms: Hash table, Hash Function, Bucket, Probe, Synonym, Overflow, Open (or) External Hashing, Closed (or) Internal Hashing, Perfect Hash Function, Load density, Full Table, Load Factor, Rehashing.

Hash table: A hash table is an array [0 to Max - 1] of size Max.

Hash function: A hash function is one that maps a key into the range [0 to Max - 1], the result of which is used as an index (or address) in the hash table for storing and retrieving records. One more way to define a hash function is as the function that transforms a key into an address. The address generated by the hash function is called the home address. All home addresses refer to a particular area of the memory called the prime area.

Bucket: A bucket is an index position in a hash table that can store more than one record.

Probe: Each action of address calculation and check for success is called a probe.
Collision: The result of two keys hashing into the same address is called a collision.

Synonym: Keys that hash to the same address are called synonyms.

Overflow: The result of many keys hashing to a single address and a lack of room in the bucket is known as an overflow. Collision and overflow are synonymous when the bucket is of size 1.

Open or external hashing: When we allow records to be stored in potentially unlimited space (for example, in linked lists outside the table), it is called open or external hashing.

Closed or internal hashing: When all records are stored within the fixed space of the hash table itself, it is called closed or internal hashing.

Hash function: A hash function is an arithmetic function that transforms a key into an address, which is used for storing and retrieving a record.

Perfect hash function: A hash function that transforms different keys into different addresses is called a perfect hash function. The worth of a hash function depends on how well it avoids collisions.

Load density: The maximum storage capacity, that is, the maximum number of records that can be accommodated, is called the loading density.

Full table: A full table is one in which all locations are occupied. Owing to the characteristics of hash functions, there should always be some empty locations; rather, a hash function should not allow the table to get filled more than 75%.

Load factor: The load factor is the number of records stored in a table divided by the maximum capacity of the table, expressed in terms of percentage.

Rehashing: Rehashing is with respect to closed hashing. When we try to store the record with Key1 at the bucket position Hash(Key1) and find that it already holds a record, it is a collision situation. To handle the collision, we use a strategy to choose a sequence of alternative locations Hash1(Key1), Hash2(Key1), and so on within the bucket table so as to place the record with Key1. This is known as rehashing.

14. What are the issues in Hashing?
In case of collision, there are two main issues to consider:
1. We need a good hashing function that minimizes the number of collisions.
2. We want an efficient collision resolution strategy so as to store and retrieve synonyms.
To store a record in a hash table, a hash function is applied to the key of the record being stored, returning an index within the range of the hash table. The record is stored at that index position, if it is empty. With direct addressing, a record with key K is stored in slot K. With hashing, this record is stored at the location Hash(K), where Hash is the hash function. The hash function Hash(K) is used to compute the slot for the key K.

If the probability that a key 'Key' occurs in our collection is P(Key), then for M slots in our hash table, a uniform hashing function Hash(Key) should ensure that the slots 0 to M - 1 are all equiprobable, each with probability 1/M. The hash function should ensure that keys are hashed to different locations. Sometimes, this is easy to ensure. For example, if the keys are randomly distributed in [0 ... r], with 0 to M - 1 locations, then Hash(Key) = floor((M x Key)/r) will provide uniform hashing.

15. What is Hash Function? What are the features of a Good Hash Function?
Features of a Good Hashing Function:
1. Addresses generated from the keys are uniformly and randomly distributed.
2. Small variations in the value of the key will cause large variations in the record addresses, to distribute records (with similar keys) evenly.
3. The hashing function must minimize the occurrence of collisions.

16. What are the methods of implementing Hash Function? Explain.
There are many methods of implementing hash functions. They are:
- Division Method
- Multiplication Method
- Extraction Method
- Mid-square Hashing
- Folding Technique
- Rotation
- Universal Hashing

Division Method: One of the required features of the hash function is that the resultant index must be within the table index range.
One simple choice for a hash function is to use modulus division, indicated as MOD (the operator % in C/C++). The function MOD returns the remainder when the first parameter is divided by the second. A key is divided by some number M, and the remainder is used as the address. This function gives the bucket addresses in the range of 0 to M - 1, so the hash table should at least be of size M. The choice of M is critical: in this method, we usually avoid certain values of M, and for binary keys, powers of 2 are usually avoided. A good choice is that M should be a prime number greater than 20.

Multiplication Method: Another hash function that has been widely used in many applications is the multiplication method. The multiplication method works as follows:
1. Multiply the key 'Key' by a constant A in the range 0 < A < 1 and extract the fractional part of Key x A.
2. Then multiply this value by M and take the floor of the result.
Hash(Key) = floor(M x ((Key x A) MOD 1))

Extraction Method: When a portion of the key is used for address calculation, the technique is called the extraction method. In digit extraction, a few digits are selected and extracted from the key and are used as the address. For example, if a book accession number is of six digits and we require an address of 3 digits, then we can select the odd-numbered digits—first, third, and fifth—which can be used as the address for the hash table. (A table of keys with their respective hashed addresses using digit extraction is omitted here.)

Mid-square Hashing: Mid-square hashing suggests taking the square of the key and extracting the middle digits of the square as the address. The difficulty is that the entire key participates in the address calculation, and if the key is large, it is very difficult to store its square, as the square should not exceed the storage limit. So mid-square is used when the key size is less than or equal to 4 digits. If a key is a string, it has to be preprocessed to produce a number. (A table of keys and addresses using mid-square is omitted here.)
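The division and multiplication methods above can be sketched as follows. The function names are ours, and the constant A = 0.6180339887 is an illustrative choice (any constant with 0 < A < 1 fits the description):

```cpp
#include <cmath>

// Division method: the remainder of key / M is the bucket address,
// always in the range 0 to M - 1 for non-negative keys.
int hashDivision(int key, int M) {
    return key % M;
}

// Multiplication method: Hash(Key) = floor(M * ((Key * A) MOD 1)).
int hashMultiplication(int key, int M, double A = 0.6180339887) {
    double frac = key * A - std::floor(key * A);  // fractional part
    return static_cast<int>(M * frac);            // floor of M * frac
}
```

For example, with a table of size 100, the key 1325 hashes to 1325 MOD 100 = 25 under the division method.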
The difficulty of storing the squares of larger numbers can be overcome if we use fewer digits of the key (instead of the whole key) for squaring. If the key is large, we can select a portion of the key and square it. (A table giving the keys, the squares of their first three digits, and the hashed addresses is omitted here.)

Folding Technique: In this technique, the key is subdivided into subparts that are then combined or folded to form the address. For a key with 9 digits, we can subdivide the digits into three parts, add them up, and use the result as an address. Here the size of the subparts of the key is the same as that of the address.

There are two methods of folding: fold shift and fold boundary. The key is divided into parts of the size of the address; in fold boundary, the left and right parts are reversed at the fixed boundaries with the centre part. For example, if the key is 987654321, it is understood as:
Left: 987, Centre: 654, Right: 321
With fold shift, the sum is 987 + 654 + 321 = 1962. Now discard the digit 1, and the address is 962.
With fold boundary, the sum of the reversed parts is 789 + 456 + 123 = 1368. Discard the 1, and the address is 368.

Rotation: If the keys are serial, they vary only in the last digit, and this leads to the creation of synonyms. Rotating the key minimizes this problem. This method is used along with other methods. Here, the key is rotated right by one digit and then another hashing technique is used to avoid synonyms. For example, let the key be 120605; when it is rotated, we get 512060. Then the address is calculated using any other hash function.
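The fold-shift and rotation calculations above can be sketched for the 9-digit and 6-digit examples. The function names are ours, and the digit counts are assumed fixed as in the examples:

```cpp
// Fold shift for a 9-digit key split into three 3-digit parts,
// as in the 987654321 example: add the parts, discard the carry digit.
int foldShift(long key) {
    int right  = key % 1000;                 // e.g. 321
    int centre = (key / 1000) % 1000;        // e.g. 654
    int left   = key / 1000000;              // e.g. 987
    return (left + centre + right) % 1000;   // keep a 3-digit address
}

// Rotate a 6-digit key right by one digit: the last digit moves
// to the front (120605 becomes 512060).
long rotateRight(long key) {
    return (key % 10) * 100000 + key / 10;
}
```

The rotated key would then be fed to any other hash function, as the text describes.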
Universal Hashing: Sometimes wrong operations are performed deliberately, such as choosing N keys all of which hash to the same slot, yielding an average retrieval time of O(n). Any fixed hash function is helpless against this sort of worst-case behaviour. The only effective way to improve the situation is to choose the hash function randomly, in a way that is independent of the keys that are actually going to be stored. This approach is called universal hashing and yields good performance on the average, no matter what keys are chosen.

The main idea behind universal hashing is to select the hash function at random at runtime from a carefully designed set of functions. Because of the randomization, the algorithm can behave differently on each execution, even for the same input. This approach guarantees good average-case performance, no matter what keys are provided as input.

17. What are the collision resolution strategies available in Hashing?
No hash function is perfect. If Hash(Key1) = Hash(Key2), then Key1 and Key2 are synonyms, and if the bucket size is 1, we say that a collision has occurred. As a consequence, we have to store the record Key2 at some other location. A search is made for a bucket in which the record containing Key2 can be stored, using one of several collision resolution strategies.

The resolution strategies are:
1. Open Addressing
   a. Linear Probing
   b. Quadratic Probing
   c. Double Hashing
2. Separate Chaining (or Linked List)
3. Bucket hashing (defers collision but does not prevent it)

The most important factors to be taken care of to avoid collision are the table size and the choice of the hash function. As we know, no hash function is perfect, and we have a limitation on the table size too.

1. Open Addressing: In open addressing, when a collision occurs, it is resolved by finding an available empty location other than the home address. If Hash(Key) is not empty, the positions are probed in the following sequence until an empty location is found. When we reach the end of the table, the search wraps around to the start and continues till the original collision location.
N(Hash(Key) + C(1)), N(Hash(Key) + C(2)), ..., N(Hash(Key) + C(i)), ...

Here N is the normalizing function, Hash(Key) is the hashing function, and C(i) is the collision resolution (or probing) function used with the i-th probe. The normalizing function is required when the resulting index is out of range. A commonly used normalization function is MOD.

Closed hash tables use open addressing. In open addressing, all records are stored in the hash table itself; this is also said to be resolving in the prime area, which contains all home addresses. In the case of chaining, the collisions are resolved by storing the records in a separate area known as the overflow area.

In open addressing, when a collision occurs, to store the record we successively examine, or probe, the hash table until we find an empty slot. Three techniques are commonly used to compute the probe sequences required for probing: linear probing, quadratic probing, and rehashing.

Linear Probing: A hash table in which a collision is resolved by placing the item in the next empty place following the occupied place is called linear probing. We proceed to the next free location until it is found. The function for the next location is as follows:
(Hash(Key) + i) MOD Max
Initially i = 1; if the location is not empty, then i becomes 2, 3, 4, ..., and so on till an empty location is found. We simply add one to the current address when a collision occurs, till we find an empty location within the hash table limits. Alternatively, we can also add 2, subtract 2, or add 4, etc. Here Max is the table size or the nearest prime number greater than the table size. The use of MOD wraps the linear probing around to the table start if it reaches the end.

Linear probing can be done in two ways:
With replacement: If the slot is already occupied by a key, there are two possibilities: either it is the home address of that key (a collision), or the location is occupied by a key with a different home address.
If the occupying key's actual home address is different, then the new key whose home address is that slot is placed at that position, and the key with the other home address is moved to the next empty position. For example, in a hash table of size 100, suppose Key1 = 127 is stored at address 25, and a new Key2 = 1325 is to be stored. The address for Key2 (1325 MOD 100) is 25. Now, as location 25 is occupied by Key1, the with-replacement strategy places Key2 at location 25 and searches for an empty location for Key1 = 127.

Without replacement: When some data is to be stored in the hash table, if the slot is already occupied by a key, then another empty location is searched for the new record. There are two possibilities when the location is occupied: it is either the new key's home address or not. In both cases, the without-replacement strategy searches for an empty position for the key that is to be stored.

Quadratic Probing: In quadratic probing, we add the offset as the square of the collision probe number. The empty location is searched by using the following formula:
(Hash(Key) + i^2) MOD Max, where i lies between 1 and (Max - 1)/2

Here, if Max is a prime number of the form (4 x integer + 3), the probe sequence covers all the buckets in the table. If two keys have the same initial probe position, then their sequences are the same. Similar to linear probing, the initial probe determines the entire sequence, and hence only Max distinct probe sequences are used. As the offset added is not 1, quadratic probing slows down the growth of primary clusters.

Double Hashing: Double hashing uses two hash functions, one for accessing the home address of a key and the other for resolving the conflict. The sequence for probing is generated as follows:
Hash1(Key), (Hash1(Key) + 1 x Hash2(Key)), (Hash1(Key) + 2 x Hash2(Key)), ...
and each resultant address is taken modulo Max.
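The three probe sequences can be sketched as follows. The function names are ours; home stands for the home address Hash(Key), i is the probe number, and step stands for the value of the second hash function Hash2(Key), assumed to be nonzero:

```cpp
// Linear probing: (Hash(Key) + i) MOD Max.
int linearProbe(int home, int i, int Max) {
    return (home + i) % Max;
}

// Quadratic probing: (Hash(Key) + i^2) MOD Max.
int quadraticProbe(int home, int i, int Max) {
    return (home + i * i) % Max;
}

// Double hashing: (Hash1(Key) + i * Hash2(Key)) MOD Max.
int doubleHashProbe(int home, int i, int step, int Max) {
    return (home + i * step) % Max;
}
```

In each case the MOD by Max wraps the probe sequence around to the start of the table.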
Rehashing: If the table gets full, insertion using open addressing with quadratic probing might fail or it might take too much time. The solution for this problem is to build another table that is about twice as big, scan down the entire original hash table, compute the new hash value for each record, and insert the records into the new table.

Chaining: We have discussed three techniques that are used to compute probe sequences (to relocate synonyms), namely, linear probing, quadratic probing, and rehashing. We can also store linked lists inside the hash table, in the unused hash table slots. The technique used to handle synonyms is chaining: it chains together all the records that hash to the same address. Instead of relocating synonyms, a linked list of synonyms is built, whose head is at the home address of the synonyms.

18. Explain about Hash Table Overflow.
Even if a hashing algorithm (function) is very good, it is likely that collisions will occur. The identifiers that have hashed into the same bucket, as discussed earlier, are called synonyms. An overflow is said to occur when a new identifier is mapped or hashed into a full bucket. When the bucket size is one, a collision and an overflow occur simultaneously. Therefore, any hashing program must incorporate some method for dealing with records that cannot fit into their home addresses. There are a number of techniques for handling overflow of records.

Open Addressing for Overflow Handling: We shall study two ways to handle overflows—open addressing and chaining. In open addressing, we assume that the hash table is an array. When a new identifier is hashed into a full bucket, we need to find another bucket for this identifier. The simplest solution is to find the closest unfilled bucket through linear probing or linear open addressing. When linear open addressing is used to handle overflows, a hash table search for an identifier I proceeds as follows:
1. Compute Hash(I).
2. Examine the identifiers at positions Table[Hash(I)], Table[Hash(I) + 1], ..., Table[Hash(I) + j], in order, until one of the following happens:
(a) Table[Hash(I) + j] = I; in this case I is found.
(b) Table[Hash(I) + j] is NULL; then I is not in the table.
(c) We return to the start position Hash(I); then the table is full and I is not in the table.

One of the problems with linear open addressing is that it tends to create clusters of identifiers. Moreover, these clusters tend to merge as more identifiers are entered, leading to big clusters. An alternative method to retard the growth of clusters is to use a series of hash functions h1, h2, ..., hm. This method is called rehashing. Buckets hi(x), 1 <= i <= m, are examined in that order.

Overflow Handling by Chaining: Linear probing and its variations are inefficient, as the search for an identifier involves comparisons with identifiers that have different hash values. (A figure showing a hash table of 25 buckets, one slot per bucket, and the corresponding hash chains is omitted here.) In such a hash table, searching for an identifier Z may involve comparisons with the buckets Table[0] to Table[7], even though none of the identifiers in these buckets had a collision with Z's bucket and so cannot possibly be Z. Many of the comparisons can be saved if we maintain lists of identifiers, one list per bucket, each list containing all the synonyms for that bucket. If this is done, a search involves computing the hash address Hash(I) and examining only those identifiers in the list for Hash(I). Since the sizes of these lists are not known in advance, the best way to maintain them is as linked chains. In each slot, additional space is needed for a link. Each chain has a head node; the head node, however, is usually larger than the other nodes, since it has to retain additional information.
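Chaining as described above can be sketched with one linked list of synonyms per bucket. This is a minimal sketch using key MOD table-size as the hash function; the struct and member names are ours:

```cpp
#include <list>
#include <vector>
#include <algorithm>

// A hash table that resolves collisions by chaining: each bucket
// holds a linked list of all keys (synonyms) that hash to it.
struct ChainedHashTable {
    std::vector<std::list<int>> buckets;
    explicit ChainedHashTable(int m) : buckets(m) {}

    void insert(int key) {
        buckets[key % buckets.size()].push_back(key);  // chain the synonym
    }

    bool search(int key) const {
        // Only the chain at the home address is examined.
        const auto& chain = buckets[key % buckets.size()];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }
};
```

Here 12, 22 and 32 are all synonyms for bucket 2 of a 10-slot table, and a search touches only that one chain.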
19. Explain about Extendible Hashing.
Extendible Hashing: If linear probing or separate chaining is used for collision handling, then in case of collision, several blocks may have to be examined to search for a key, and when the table is full, an expensive rehash is required. For fast searching and less disk access, extendible hashing is used. It is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. For example, assume that the hash function Hash(Key) returns a binary number. The first i bits of each string will be used as indices to figure out where the keys will go in the hash table. Additionally, i is the smallest number such that the first i bits of all keys are different.

The keys to be used are as follows:
1. h(key1) = 100101
2. h(key2) = 011110
3. h(key3) = 110110

Let us assume that for this particular example the bucket size is 1. The first two keys to be inserted, key1 and key2, can be distinguished by the most significant bit, and would be inserted into the table as follows:

Directory:
0 -> Bucket A for key2
1 -> Bucket B for key1

When key3 is hashed to the table, it would not be enough to distinguish all three keys by one bit (because key3 and key1 both have 1 as their leftmost bit). Also, because the bucket size is one, the table would overflow. Because comparing the first two most significant bits would give each key a unique location, the directory size is doubled as follows:

Directory:
00 -> Bucket A for key2
01 -> Bucket A for key2
10 -> Bucket B for key1
11 -> Bucket C for key3

And so now key1 and key3 have unique locations, being distinguished by the first two leftmost bits. Since key2 is in the top half of the table, both 00 and 01 point to it, because there is no other key that begins with a 0 to compare with. The root of the tree contains four pointers determined by the leading two bits of the data. Each leaf can hold up to 4 records. The directory is represented by the number of bits used by the root.
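Computing the directory index from the leading i bits of the hash, as in the example above, can be sketched as follows (the function name is ours, and 6-bit hash values are assumed, matching the example):

```cpp
// Take the `bits` most significant bits of a 6-bit hash value as the
// extendible-hashing directory index.
unsigned dirIndex(unsigned hash6, unsigned bits) {
    return hash6 >> (6 - bits);   // drop the trailing bits
}
```

With one directory bit, key1 (100101) indexes slot 1 and key2 (011110) indexes slot 0; after doubling to two bits, key2, key1 and key3 index slots 01, 10 and 11 respectively.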
