
Searching

Searching is the process of finding the location of a given element in a linear array. The search is said to be successful if the element is found, i.e., it exists in the array; otherwise it is unsuccessful. There are two approaches to the search operation: linear search and binary search.

Linear Search
This method, which traverses the array sequentially to locate the item, is called linear search or sequential search. The algorithm one chooses generally depends on the organization of the array elements: if the elements are in random order, one has to use the linear search technique.

Algorithm
LinearSearch(a, n, item, loc)
Here a is a linear array of size n. This algorithm finds the location of the element item in the array a. If the search ends in success, it sets loc to the index of the element; otherwise it sets loc to -1.
Begin
  for i = 0 to (n-1) by 1 do
    if (a[i] = item) then
      set loc = i
      exit
    endif
  endfor
  set loc = -1
End

C implementation of algorithm
int linearsearch(int *a, int n, int item)
{
    int k;
    for (k = 0; k < n; k++) {
        if (a[k] == item)
            return k;
    }
    return -1;
}

Analysis of Linear Search


In the best possible case, the item occurs at the first position; the search then terminates in success with just one comparison. The worst case occurs when the item is either at the last position or missing from the array. In the former case the search terminates in success with n comparisons; in the latter case it terminates in failure with n comparisons. Thus, in the worst case, linear search is an O(n) operation.

Binary Search
Suppose the elements of the array are sorted in ascending order. The best search algorithm, called binary search, is used to find the location of the given element.

Example
We want to search for the element 15 in the array 3, 10, 15, 20, 35, 40, 60.

Given array

A[0]  A[1]  A[2]  A[3]  A[4]  A[5]  A[6]
 3     10    15    20    35    40    60

1. We take beg=0, end=6 and compute the location of the middle element as mid=(beg+end)/2 = (0+6)/2 = 3.

2. Compare the item with a[mid]: a[3]=20 is not equal to 15, and beg<end, so we start the next iteration.

3. As a[mid]=20>15, we take end=mid-1=3-1=2, whereas beg remains the same. Thus mid=(beg+end)/2 = (0+2)/2 = 1. Since a[mid]=a[1]=10<15, we take beg=mid+1=1+1=2, whereas end remains the same. Now beg=end.

4. Compute the middle element: mid=(beg+end)/2 = (2+2)/2 = 2. Since a[mid]=a[2]=15, the search terminates in success.

Algorithm
BinarySearch(a, n, item, loc)
Begin
  set beg = 0
  set end = n-1
  set mid = (beg+end)/2
  while ((beg <= end) and (a[mid] != item)) do
    if (item < a[mid]) then
      set end = mid-1
    else
      set beg = mid+1
    endif
    set mid = (beg+end)/2
  endwhile
  if (beg > end) then
    set loc = -1
  else
    set loc = mid
  endif
End

C Implementation
int binarysearch(int *a, int n, int item)
{
    int beg, end, mid;
    beg = 0;
    end = n - 1;
    mid = (beg + end) / 2;
    while ((beg <= end) && (a[mid] != item)) {
        if (item < a[mid])
            end = mid - 1;
        else
            beg = mid + 1;
        mid = (beg + end) / 2;
    }
    if (beg > end)
        return -1;
    else
        return mid;
}

Analysis of binary search


In each iteration or recursive call, the search is reduced to one half of the array. Therefore, for n elements in the array, there will be about log2(n) iterations or recursive calls. Thus the complexity of binary search is O(log2 n). This complexity is the same irrespective of the position of the element, even if it is not present in the array.

Hash table and Hashing


Objectives:
1. Understand the problem with direct address tables.
2. Understand the concept of hash tables.
3. Understand different hash functions.
4. Understand the different collision resolution schemes.

Introduction
In all the search algorithms considered so far, the location of an item is determined by a sequence of comparisons: a data item sought is repeatedly compared with items in certain locations of the data structure. The number of comparisons depends on the data structure and the search algorithm used. E.g., in an array or linked list, linear search requires O(n) comparisons; in a sorted array, binary search requires O(log n) comparisons; in a binary search tree, search requires O(log n) comparisons.

Contd..
However, some applications require search to be performed in constant time, i.e. O(1). Ideally this may not be possible, but we can still achieve performance very close to it, using a data structure known as a hash table.

Contd..
A hash table, in the basic sense, is a generalization of the simpler notion of an ordinary array. Directly addressing into an array makes it possible to access any data element of the array in O(1) time. For example, if a[1..100] is an ordinary array, then the nth data element, 1<=n<=100, can be directly accessed as a[n]. However, direct addressing is applicable only when we can allocate an array that has one position for every possible key. In addition, direct addressing suffers from the following problems: 1. If the number of possible keys is very large, it may not be possible to allocate an array of that size, because the memory available in the system or the application software does not permit it. 2. If the actual number of keys is very small compared to the total number of possible keys, a lot of space in the array will be wasted.

Direct address Tables


Direct addressing is a simple technique that works quite well when the universe U of keys is reasonably small. As an example, consider an application that needs a dynamic set in which each element has a key drawn from the universe U={0,1,2,...,m-1}, where m is not very large. We also assume that all elements are unique, i.e. no two elements have the same key.

Contd..
The figure on the next page shows the implementation of a dynamic set by a direct address table T, where the elements are stored in the table itself. Each key in the universe U={0,1,2,...,9} corresponds to an index in the table. The set K={1,4,7,8} of actual keys determines the slots in the table that contain elements. The empty/vacant slots are marked with a slash character (/).

Direct Addressing Tables


[Figure omitted: direct address table T with slots 0..9; the slots for the actual keys K={1,4,7,8} hold elements, the remaining slots are marked /]
Implementing a dynamic set by a direct address table T, where the elements are stored in the table itself

Contd..
The figure on the next page shows the implementation of a dynamic set where a pointer to each element is stored in the direct address table T. To represent the dynamic set, we can use an array T[0..m-1] in which each position or slot corresponds to a key in the universe U.

Direct Addressing Tables


[Figure omitted: direct address table T with slots 0..9; the slots for the actual keys K={1,4,7,8} hold pointers to elements, the remaining slots are marked /]

Implementing a dynamic set by a direct address table T, where pointers to the elements are stored in the table

Operations on Direct Address Table


Initializing a direct address table
In order to initialize a direct address table T[0..m-1], the sentinel value -1 is assigned to each slot.
void initializeDAT(int t[], int m)
{
    int i;
    for (i = 0; i < m; i++)
        t[i] = -1;
}
This initialization operation is O(m).

Operations on Direct Address Table


Searching an element in a direct address table
To search for an element x in a direct address table T[0..m-1], the element at index key[x] is returned.
int searchDAT(int t[], int x)
{
    return t[key[x]];
}

Operations on Direct Address Table


Inserting a new element in a direct address table
To insert a new element x in a direct address table T[0..m-1], the element is stored at index key[x].
void insertDAT(int t[], int x)
{
    t[key[x]] = x;
}

Operations on Direct Address Table


Deleting an element from a direct address table
To delete an element x from a direct address table T[0..m-1], the sentinel value -1 is stored at index key[x].
void deletefromDAT(int t[], int x)
{
    t[key[x]] = -1;
}

DAT
Each of these operations is fast: only O(1) time is required. However, the difficulties with the direct address table are obvious, as stated below. 1. If the universe U is large, storing a table T of size |U| may be impractical, or even impossible, given the memory available on a typical computer. 2. If the set K of actual keys is very small relative to U, most of the space allocated for T will be wasted.

Hash Table
A hash table is a data structure in which the location of a data item is determined directly as a function of the data item itself, rather than by a sequence of comparisons. Under ideal conditions, the time required to locate a data item in a hash table is O(1), i.e. it is constant and does not depend on the number of data items stored.

Hashing
Hashing is a technique by which we can compute the location of the desired record in order to retrieve it in a single access. Here, a hash function h maps the universe U of keys into the slots of a hash table T[0..m-1]. This process of mapping keys to appropriate slots in a hash table is known as hashing.

[Figure omitted: the hash function h maps the actual keys k1..k7 from U into slots of T[0..m-1]; h(k2)=h(k4)=h(k7) and h(k3)=h(k5), i.e. these keys collide; empty slots are marked /]
Implementing a dynamic set by a hash table T[0..m-1], where the elements are stored in the table itself

Hash table
The figure on the previous page shows the implementation of a dynamic set by a hash table T[0..m-1]. Each key in the dynamic set K of actual keys is mapped to a hash table slot using the hash function h. Note that the keys k2, k4 and k7 map to the same slot. Mapping of more than one key to the same slot is known as a collision; we say that the keys k2, k4 and k7 collide. We usually say that an element with key k hashes to slot h(k), and that h(k) is the hash value of key k. The purpose of the hash function is to reduce the range of array indices that need to be handled: instead of |U| values, we need to handle only m values, which leads to a reduction in the storage requirements.

What is hash function?


A hash function h is simply a mathematical formula that manipulates the key in some form to compute the index for this key in the hash table. For example, a hash function can divide the key by some number, usually size of the hash table, and return remainder as the index of the key. In general, we say that a hash function h maps the universe U of keys into the slots of a hash table T[0..m-1]. This process of mapping keys to appropriate slots in a hash table is known as hashing.

Different hash functions


There is a variety of hash functions. The main considerations while choosing a particular hash function h are: 1. It should be possible to compute it efficiently. 2. It should distribute the keys uniformly across the hash table, i.e. it should keep the number of collisions as small as possible.

Hash Functions
1. Division method: In the division method, the key k to be mapped into one of the m slots of the hash table is divided by m, and the remainder of this division is taken as the index into the hash table. That is, the hash function is h(k) = k mod m.

Division method
Consider a hash table with 9 slots, i.e. m=9. Then the hash function h(k) = k mod m will map the key 132 to slot 6, since h(132) = 132 mod 9 = 6. Since it requires only a single division operation, this form of hashing is quite fast.
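As a sketch, the division method is a one-line function in C; the function name hash_div is our own, not from the notes:

```c
#include <assert.h>

/* Division-method hash: the remainder of k / m is the slot index. */
unsigned int hash_div(unsigned int k, unsigned int m)
{
    return k % m;
}
```

With m=9 this maps key 132 to slot 6, matching the worked example above.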

Example
Suppose a company has 90 employees and 00, 01, 02, ..., 89 are the 90 two-digit memory addresses (or indices, or hash addresses) available to store the records. We take the employee code as the key, and choose m greater than 90, say m=93. Then for the following employee codes (keys k):
h(2103) = 2103 mod 93 = 57
h(6147) = 6147 mod 93 = 9
h(3750) = 3750 mod 93 = 30
A typical hash table then looks as on the next page. So if we feed the employee code to the hash function, we can retrieve the employee's details directly as table[h(k)].

Hash address   Employee code (key)   Employee name & other details
9              6147                  Anish
30             3750                  Saju
57             2103                  Rarish
(all other slots 0..89 are empty)

Midsquare method
The midsquare method operates in two steps. In the first step, the square of the key value k is taken. In the second step, the hash value is obtained by deleting digits from both ends of the squared value k². It is important to note that the same positions of k² must be used for all keys. Thus the hash function is h(k) = s, where s is obtained by deleting digits from both sides of k².

Midsquare method
Consider a hash table with 100 slots, i.e. m=100, and the key values k = 4147, 3750, 2103.
Solution:
k      k²         h(k)
4147   17197609   97
3750   14062500   62
2103   4422609    22
The hash values are obtained by taking the fourth and fifth digits, counting from the right.
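This particular mid-square variant (a table of 100 slots, hash taken from the 4th and 5th digits of k² counting from the right) can be sketched in C. The name hash_midsquare is our own, and we assume k² fits in an unsigned long long:

```c
#include <assert.h>

/* Mid-square hash for m = 100: square the key, then keep the 4th and
   5th digits from the right of the square as the hash value. */
unsigned int hash_midsquare(unsigned long long k)
{
    unsigned long long sq = k * k;          /* first step: square the key */
    return (unsigned int)((sq / 1000) % 100); /* second step: digits 4-5 */
}
```

Dividing by 1000 drops the three rightmost digits; taking the result mod 100 then keeps exactly two digits, i.e. the 4th and 5th from the right.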

Hash address   Employee code (key)   Employee name & other details
22             2103                  Giri
62             3750                  Suni
97             4147                  Rohit
(all other slots are empty)

Folding method
The folding method also operates in two steps. In the first step, the key value k is divided into a number of parts k1, k2, ..., kr, where each part has the same number of digits except the last, which may have fewer. In the second step, the parts are added together, h(k) = k1 + k2 + ... + kr, and the hash value is obtained by ignoring the last carry, if any. For example, if the hash table has 1000 slots, each part will have three digits, and the sum of the parts, after ignoring the last carry, will also be a three-digit number in the range 0 to 999.

Folding method
Here we are dealing with a hash table with indices 00 to 99, i.e. a two-digit hash table, so we divide k into parts of two digits each:

k       parts (k1, k2, k3)
2103    21, 03
7148    71, 48
12345   12, 34, 5

h(2103)  = 21 + 03 = 24
h(7148)  = 71 + 48 = 119 → 19 (ignoring the carry)
h(12345) = 12 + 34 + 5 = 51
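The fold-and-add computation can be sketched in C. The function name hash_fold is an assumption; it splits the key from the left into two-digit parts, with a shorter last part, exactly as in the worked example, and drops any carry beyond two digits:

```c
#include <assert.h>

/* Folding hash for a two-digit (00..99) table: split k from the left
   into two-digit parts (the last part may be a single digit), add the
   parts, and ignore any carry beyond two digits. */
unsigned int hash_fold(unsigned long k)
{
    unsigned long t = k, sum = 0, div;
    int i, d = 0;
    while (t > 0) { t /= 10; d++; }   /* count the digits of k */
    while (d > 1) {
        div = 1;
        for (i = 0; i < d - 2; i++)
            div *= 10;
        sum += k / div;               /* peel off the leading two digits */
        k %= div;
        d -= 2;
    }
    if (d == 1)
        sum += k;                     /* leftover single digit, if any */
    return (unsigned int)(sum % 100); /* ignore the last carry */
}
```

For 12345 this yields the parts 12, 34, 5 and the hash value 51, as above.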

Folding method
Extra milling can also be applied: the even-numbered parts k2, k4, ... are each reversed before the addition.

k       parts        after reversing k2, k4, ...
2103    21, 03       21, 30
7148    71, 48       71, 84
12345   12, 34, 5    12, 43, 5

h(2103)  = 21 + 30 = 51
h(7148)  = 71 + 84 = 155 → 55 (ignoring the carry)
h(12345) = 12 + 43 + 5 = 60

Multiplication method
The multiplication method operates in two steps. In the first step, the key value k is multiplied by a constant A in the range 0<A<1, and the fractional part of kA is extracted. In the second step, this fractional value is multiplied by m, and the floor of the result is taken as the hash value. That is, the hash function is h(k) = floor(m * (kA mod 1)), where kA mod 1 means the fractional part of kA, i.e. kA - floor(kA). Note that floor(x) represents the largest integer less than or equal to x. Although this method works with any value of A, it works better with some values than others; the best choice depends on the characteristics of the key values. Knuth has suggested that the following value of A is likely to work reasonably well: A = (sqrt(5)-1)/2 = 0.6180339887...

Multiplication method
Consider a hash table with 10000 slots, i.e. m=10000. Then the hash function h(k) = floor(m * (kA mod 1)) maps the key 123456 to slot 41, since
h(123456) = floor(10000 * (123456 * 0.6180339887... mod 1))
          = floor(10000 * (76300.0041151... mod 1))
          = floor(10000 * 0.0041151...)
          = floor(41.151...)
          = 41

Hash Collision
It is possible that two non-identical keys k1, k2 are hashed into the same hash address. This situation is called a hash collision.

Hash Collision
Location   Key
0          210
1          111
2
3          883
4          344
5
6
7
8          488
9
Hash collision
Let us consider a hash table having 10 locations, as shown in the previous figure, with the division method used to hash the key: h(k) = k mod m. Here m is chosen as 10, so the hash function produces an integer between 0 and 9, depending on the value of the key. If we want to insert a new record with key 500, then h(500) = 500 mod 10 = 0. Location 0 in the table is already filled; thus a collision has occurred. Collisions are almost impossible to avoid, but they can be minimized considerably by introducing a few techniques.

Resolving Collision
A collision is a phenomenon that occurs when more than one key maps to the same slot in the hash table. Though we can keep collisions to a minimum, we cannot eliminate them altogether. Therefore we need some mechanism to handle them.

Collision Resolution by Separate Chaining


In this scheme, all the elements whose keys hash to the same hash table slot are put in a linked list. Thus the slot i in the hash table contains a pointer to the head of the linked list of all the elements that hash to the value i. If there is no element that hashes to the value i, the slot i contains the NULL value.

[Figure omitted: each occupied slot of T points to a linked list; the slot for h(k2)=h(k4)=h(k7) holds the chain k2 → k4 → k7, the slot for h(k3)=h(k5) holds k3 → k5, the slot for h(k6) holds k6, and empty slots hold /]

Collision resolution by separate chaining: each hash table slot T[i] contains a linked list of all the keys whose hash value is i

Collision Resolution by Separate Chaining


The structure of a node of the linked list will look like:
typedef struct nodetype
{
    int info;
    struct nodetype *next;
} node;

1. Initializing a chained hash table
void initHT(node *t[], int m)
{
    int i;
    for (i = 0; i < m; i++)
        t[i] = NULL;
}

Searching an element in Chained hash table


node *searchHT(node *t[], int x)
{
    node *ptr;
    ptr = t[h(x)];
    while ((ptr != NULL) && (ptr->info != x))
        ptr = ptr->next;
    return ptr;   /* NULL if x is not present */
}

Inserting a new element in Chained hash table


void insertHT(node *t[], int x)
{
    node *ptr;
    ptr = (node *)malloc(sizeof(node));
    ptr->info = x;
    ptr->next = t[h(x)];
    t[h(x)] = ptr;
}
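The notes give initialization, search and insertion for a chained table; deletion can be sketched along the same lines. Everything concrete here (the table size M, the modulus hash h, and the repeated minimal insertHT) is an assumption made only so the example is self-contained:

```c
#include <stdlib.h>
#include <assert.h>

#define M 10                       /* assumed table size */

typedef struct nodetype {
    int info;
    struct nodetype *next;
} node;

int h(int x) { return x % M; }     /* assumed division-method hash */

void insertHT(node *t[], int x)
{
    node *ptr = malloc(sizeof(node));
    ptr->info = x;
    ptr->next = t[h(x)];           /* push onto the front of the chain */
    t[h(x)] = ptr;
}

/* Unlink and free the first node holding x; return 1 if found, 0 if not. */
int deleteHT(node *t[], int x)
{
    node **pp = &t[h(x)];          /* pointer to the link we may rewrite */
    while (*pp != NULL) {
        if ((*pp)->info == x) {
            node *victim = *pp;
            *pp = victim->next;    /* bypass the node being deleted */
            free(victim);
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}
```

Using a pointer-to-pointer avoids treating the head of the chain as a special case.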

Open addressing
In the open addressing method, when a key collides with another key, the collision is resolved by finding a nearby empty space by probing the cells. Suppose a record R with key k has the hash address h(k) = h; then we linearly search the locations h+i (where i = 0, 1, 2, ..., m-1), i.e. the hash addresses h, h+1, h+2, ..., for a free space.

Hash Collision
Location   Key
0          210
1          111
2          500
3          883
4          344
5
6
7
8          488
9

Linear probing
The main disadvantage of linear probing is that a substantial amount of time may be taken to find a free cell by sequentially (linearly) searching the table.
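Linear-probing insertion might be sketched as follows; the conventions (free cells marked -1, modulus hash, the function name insertLinear) are our own assumptions:

```c
#include <assert.h>

/* Linear probing: on a collision at h = k mod m, try h+1, h+2, ...
   (wrapping around) until a free cell (-1) is found.
   Returns the slot used, or -1 if the table is full. */
int insertLinear(int t[], int m, int k)
{
    int i, slot, h = k % m;
    for (i = 0; i < m; i++) {
        slot = (h + i) % m;        /* probe sequence h, h+1, h+2, ... */
        if (t[slot] == -1) {
            t[slot] = k;
            return slot;
        }
    }
    return -1;                     /* every cell is occupied */
}
```

With slots 0 and 1 already holding 210 and 111, inserting 500 (hash 0) probes 0, 1, then lands in slot 2, matching the table above.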

Quadratic Probing
Suppose a record R with key k has the hash address h(k) = h. Then instead of searching the locations with addresses h, h+1, h+2, ..., h+i, we search for a free cell at the addresses h, h+1, h+4, h+9, h+16, ..., h+i².
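The same idea with a quadratic probe sequence, (h + i²) mod m; again the free-cell marker -1, the modulus hash, and the name insertQuadratic are illustrative assumptions:

```c
#include <assert.h>

/* Quadratic probing: probe the cells h, h+1, h+4, h+9, ... (mod m).
   Returns the slot used, or -1 if no free cell is found. */
int insertQuadratic(int t[], int m, int k)
{
    int i, slot, h = k % m;
    for (i = 0; i < m; i++) {
        slot = (h + i * i) % m;    /* probe sequence h + i^2 */
        if (t[slot] == -1) {
            t[slot] = k;
            return slot;
        }
    }
    return -1;                     /* probe sequence found no free cell */
}
```

Note that a quadratic probe sequence does not necessarily visit every slot, so insertion can fail even when the table has free cells.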

Double Hashing
A second hash function h1 is used to resolve the collision. Suppose a record R with key k has the hash address h(k) = h and h1(k) = h1, where h1 is nonzero and not a multiple of m. Then we linearly search for a free cell at the addresses h, h+h1, h+2·h1, h+3·h1, ..., h+i·h1 (where i = 0, 1, 2, 3, ...).
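A sketch of double hashing; the particular second hash h1(k) = 1 + k mod (m-1) is an illustrative assumption that merely guarantees a nonzero step size:

```c
#include <assert.h>

/* Assumed second hash: a step size in 1..m-1, never zero. */
int h1(int k, int m) { return 1 + k % (m - 1); }

/* Double hashing: probe h, h+h1, h+2*h1, ... (mod m), where the step
   h1 depends on the key. Returns the slot used, or -1 on failure. */
int insertDouble(int t[], int m, int k)
{
    int i, slot, h = k % m, step = h1(k, m);
    for (i = 0; i < m; i++) {
        slot = (h + i * step) % m;
        if (t[slot] == -1) {
            t[slot] = k;
            return slot;
        }
    }
    return -1;
}
```

Because the step depends on the key, two keys that collide at h usually follow different probe sequences, which reduces the clustering seen with linear probing. (For the probe sequence to visit every slot, the step must be relatively prime to m.)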

Chaining
In the chaining technique, the entries in the hash table are dynamically allocated and entered into a linked list associated with each hash key. The hash table on the next page can be represented using linked lists.

Hash Collision
Location   Keys (division method, m=10)
0          210, 30
1          111, 31
2          12, 32
3          883
4          344, 14
6          546
8          488, 18

Chaining
0: 210 → 30
1: 111 → 31
2: 12 → 32
3: 883
4: 344 → 14
5: /
6: 546
7: /
8: 488 → 18
9: /

If we try to insert a new record with key 500, then h(500) = 500 mod 10 = 0 and a collision occurs in the normal way, because a record already exists at position 0. But with chaining, the corresponding linked list can simply be extended to accommodate the new record, as shown in the figure.

0: 210 → 30 → 500
1: 111 → 31
2: 12 → 32
3: 883 → 53
4: 344 → 14
5: /
6: 546
7: /
8: 488 → 18
9: /

Bucket Addressing
Another solution to the hash collision problem is to store colliding elements in the same position in the table by introducing a bucket for each hash address. A bucket is a block of memory space large enough to store multiple items. The next figure shows how hash collisions can be handled using buckets. If a bucket is full, the colliding item can be stored in a new bucket linked from the previous one.

Bucket Addressing
[Figure omitted: each hash address 0..9 owns a bucket of key/data pairs, e.g. one bucket holding (K21, D21), (K25, D25), (K28, D28); a full bucket links to an overflow bucket holding further colliding pairs]
