Hashing (Satish sir)
Contents Contd..
• Problem with Hashing
• What is collision?
• How to handle Collisions?
• 1) Separate Chaining
• 2) Open Addressing
• 2.a) Linear Probing
• 2.b) Quadratic Probing
• 2.c) Double Hashing
• What is Rehashing?
• Applications of Hash Data structure
What is Hashing
Components of Hashing
How does Hashing work?
• Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to
store it in a table.
• Let each string act as both the key and the value to be stored. How do
we decide where to store the value corresponding to a key?
• Step 1: A hash function (some mathematical formula) is used to
calculate the hash value, which acts as the index in the data structure
where the value will be stored.
• Step 2: So, let’s assign
“a” = 1, “b”=2, .. etc, to all alphabetical characters.
• Step 3: The numerical value of each string is then the sum of the
values of its characters:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
• Step 4: Now, assume that we have a table of size 7 to store these strings.
The hash function that is used here is the sum of the characters in key
mod Table size. We can compute the location of the string in the array
by taking the sum(string) mod 7.
• Step 5: So we will then store
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.
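Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name letter_sum_hash is made up for this example, and it assumes keys contain only lowercase letters a to z.

```python
def letter_sum_hash(key: str, table_size: int) -> int:
    """Sum the letter values (a=1, b=2, ...) and reduce modulo the table size."""
    total = sum(ord(ch) - ord("a") + 1 for ch in key)
    return total % table_size


# Table of size 7, as in Step 4; None marks an empty slot.
table = [None] * 7
for s in ["ab", "cd", "efg"]:
    table[letter_sum_hash(s, 7)] = s

print(table)  # ['cd', None, None, 'ab', 'efg', None, None]
```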
What is a Hash function?
• A hash function maps a key to the location where its value is stored,
using a mathematical formula. The result of the hash function is
referred to as a hash value or hash.
• Types of Hash functions
Division Method.
Mid Square Method.
Folding Method.
Multiplication Method.
Division Method
h(k) = k mod M
Here, k is the key value, and
M is the size of the hash table.
Ex: k = 12345
M = 95
h(12345) = 12345 mod 95 = 90
k = 1276
M = 11
h(1276) = 1276 mod 11 = 0
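As a quick check of the two examples, the division method can be written as a one-line function. The name division_hash is an assumption made for this sketch.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: the remainder of the key divided by the table size."""
    return k % m


print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
```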
Mid Square Method
h(k) = middle digits of (k x k)
Here, k is the key value.
• Ex: k = 60
• k x k = 60 x 60 = 3600
• h(60) = 60, the middle two digits of 3600. The hash value obtained is 60
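One possible way to code the mid square method is shown below. The function name mid_square_hash and the parameter r (how many middle digits to keep) are assumptions for this sketch; the slide's example corresponds to r = 2.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Square the key and keep r digits from the middle of the result."""
    squared = str(k * k)
    # Start index of the middle r digits; 0 if the square has fewer than r digits.
    start = max(0, (len(squared) - r) // 2)
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))  # 60, the middle two digits of 3600
```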
Digit Folding Method
Multiplication Method
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
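The worked example follows the pattern h(k) = floor[ M (k*A mod 1) ], which can be sketched as below. The function name multiplication_hash is an assumption, and floating-point arithmetic is used, so results can be off for values that land very close to an integer boundary.

```python
import math


def multiplication_hash(k: int, a: float, m: int) -> int:
    """Multiplication method: floor(M * fractional part of k*A)."""
    fractional = (k * a) % 1  # keep only the fractional part of k*A
    return math.floor(m * fractional)


print(multiplication_hash(12345, 0.357840, 100))  # 53
```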
Properties of a Good hash function
1. Efficiently computable.
2. Should uniformly distribute the keys (each table position is equally
likely for each key).
3. Should minimize collisions.
4. Should have a low load factor (number of items in the table divided by
the size of the table).
Problem with Hashing
• In the above example, the hash function is the sum of the letters.
Examined closely, the problem is easy to see: different strings can
produce the same hash value.
• For example, {“ab”, “ba”} both have the same hash value, and the strings
{“cd”, “be”} also generate the same hash value. This is known as a
collision, and it creates problems in searching, insertion, deletion,
and updating of values.
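The collisions mentioned above can be made visible by grouping strings by their letter sum. This is a small sketch; the helper name letter_sum is an assumption and it only handles lowercase letters.

```python
from collections import defaultdict


def letter_sum(key: str) -> int:
    """Sum of letter values with a=1, b=2, ... (lowercase letters only)."""
    return sum(ord(ch) - ord("a") + 1 for ch in key)


# Group the strings by hash value; any group with more than one string is a collision.
buckets = defaultdict(list)
for s in ["ab", "ba", "cd", "be"]:
    buckets[letter_sum(s)].append(s)

print(dict(buckets))  # {3: ['ab', 'ba'], 7: ['cd', 'be']}
```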
What is collision?
• The hashing process generates a small number for a big key, so there is a
possibility that two keys could produce the same value. The situation
where a newly inserted key maps to an already occupied slot is called a
collision, and it must be handled using some collision handling technique.
How to handle Collisions?
Separate Chaining
• The idea is to make each cell of the hash table point to a linked list of
records that have the same hash function value. Chaining is simple but
requires additional memory outside the table.
• Example: We are given a hash function and have to insert some
elements into the hash table using separate chaining as the
collision resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
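The example can be traced with a short sketch. For brevity it uses Python lists as the chains instead of real linked lists, and the names build_chained_table and search are assumptions made for this illustration.

```python
def build_chained_table(keys, size):
    """Insert each key into the chain of slot key % size."""
    table = [[] for _ in range(size)]  # one empty chain per slot
    for key in keys:
        table[key % size].append(key)
    return table


def search(table, key):
    """Only the chain of the key's own slot has to be scanned."""
    return key in table[key % len(table)]


table = build_chained_table([12, 15, 22, 25, 37], 5)
print(table)              # [[15, 25], [], [12, 22, 37], [], []]
print(search(table, 22))  # True
print(search(table, 7))   # False
```

Slots 0 and 2 each hold a chain of colliding keys, while the other slots stay empty.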
Open Addressing
• In open addressing, all elements are stored in the hash table itself.
Each table entry contains either a record or NIL.
• When searching for an element, we examine the table slots one by one
until the desired element is found or it is clear that the element is not
in the table.
Linear Probing
• In linear probing, the hash table is searched sequentially, starting
from the original location given by the hash. If that location is
already occupied, we check the next location.
Algorithm
1. Calculate the hash key, i.e. key = data % size
2. If hashTable[key] is empty,
• store the value directly: hashTable[key] = data
3. If the hash index already holds a value,
• check the next index using key = (key+1) % size
4. If hashTable[key] at the next index is available, store the value
there. Otherwise try the next index.
5. Repeat the above process until a free slot is found.
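The algorithm above might look like this in Python. It is a sketch under a few assumptions: the name linear_probe_insert is made up, empty slots are marked with None, and a full table raises an error (the steps above do not say what to do in that case). The demo reuses the keys 12, 15, 22, 25 and 37 with a table of size 5.

```python
def linear_probe_insert(table, data):
    """Insert data using linear probing; return the slot that was used."""
    size = len(table)
    key = data % size                 # step 1: home position
    for _ in range(size):             # visit each slot at most once
        if table[key] is None:        # steps 2 and 4: free slot found
            table[key] = data
            return key
        key = (key + 1) % size        # step 3: move to the next index
    raise RuntimeError("hash table is full")


table = [None] * 5
for x in [12, 15, 22, 25, 37]:
    linear_probe_insert(table, x)

print(table)  # [15, 25, 12, 22, 37]
```

Here 22 and 37 both hash to slot 2, so they slide forward to slots 3 and 4, and 25 moves from slot 0 to slot 1.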
Linear Probing
Example: the sequence of keys to be inserted is 50, 70, 76, 85, 93.
Quadratic Probing
• Quadratic probing is an open addressing scheme in computer
programming for resolving hash collisions in hash tables.
Method
Example: Let us consider table size = 7, hash function Hash(x) = x % 7
and collision resolution strategy f(i) = i^2. Insert 22, 30, and 50.
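The example can be traced with the sketch below, which tries slot (Hash(x) + i^2) % 7 for i = 0, 1, 2, ... until a free slot is found. The function name quadratic_probe_insert and the error raised when no slot is found are assumptions made for this illustration.

```python
def quadratic_probe_insert(table, x):
    """Insert x using quadratic probing with f(i) = i^2; return the slot used."""
    size = len(table)
    home = x % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = x
            return slot
    # Quadratic probing can miss free slots, so this may fail before the table is full.
    raise RuntimeError("no free slot found")


table = [None] * 7
for x in [22, 30, 50]:
    quadratic_probe_insert(table, x)

print(table)  # [None, 22, 30, None, None, 50, None]
```

22 goes to slot 1 and 30 to slot 2. 50 also hashes to slot 1; the probe with i = 1 hits slot 2 (occupied), and the probe with i = 2 lands on slot (1 + 4) % 7 = 5.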
Double Hashing
Double hashing is a collision resolution technique in open-addressed
hash tables. Double hashing makes use of two hash functions:
• The first hash function is h1(k), which takes the key and gives a
location in the hash table. If that location is empty, we can simply
place our key there.
• If the location is occupied (collision), we use the secondary
hash function h2(k) in combination with the first hash function h1(k) to
find a new location in the hash table.
Double Hashing
This combination of hash functions is of the form
h(k, i) = (h1(k) + i * h2(k)) % n
where
• i is a non-negative integer that indicates a collision number,
• k = element/key which is being hashed
• n = hash table size.
Double Hashing
Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7,
where the first hash function is h1(k) = k mod 7 and the second hash
function is h2(k) = 1 + (k mod 5).
Step 1: Insert 27
• 27 % 7 = 6, location 6 is empty so insert 27 into slot 6.
Step 2: Insert 43
• 43 % 7 = 1, location 1 is empty so insert 43 into slot 1.
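The whole example, including the remaining keys 692 and 72, can be traced with the sketch below. It applies h(k, i) = (h1(k) + i * h2(k)) % n directly; the placements of 692 and 72 are computed from that formula rather than copied from the slides. The name double_hash_insert and the error raised when no slot is found are assumptions.

```python
def h1(k):
    return k % 7


def h2(k):
    return 1 + (k % 5)


def double_hash_insert(table, k):
    """Try slots (h1(k) + i*h2(k)) % n for i = 0, 1, 2, ... until one is free."""
    n = len(table)
    for i in range(n):
        slot = (h1(k) + i * h2(k)) % n
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("no free slot found")


table = [None] * 7
for k in [27, 43, 692, 72]:
    double_hash_insert(table, k)

print(table)  # [None, 43, 692, None, None, 72, 27]
```

692 % 7 = 6 collides with 27, and h2(692) = 1 + 2 = 3, so it goes to (6 + 3) % 7 = 2. 72 % 7 = 2 then collides with 692, and h2(72) = 3, so it goes to (2 + 3) % 7 = 5.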
What is Rehashing?
Applications of Hash Data structure
Real-Time Applications of Hash Data structure
• Hash is used for cache mapping for fast access to the data.
• Hash can be used for password verification.
• Hash is used in cryptography as a message digest.
• Rabin-Karp algorithm for pattern matching in a string.
• Calculating the number of different substrings of a string.
Advantages of Hash Data structure
Disadvantages of Hash Data structure
Thank You