
Hashing

Dr. Satish Kumar T


Data Structures and Application
Contents
• What is Hashing
• Components of Hashing
• How does Hashing work?
• What is a Hash function?
  ◦ Types of Hash functions
  ◦ Properties of a Good hash function
• Problem with Hashing
  ◦ What is collision?
  ◦ How to handle Collisions?
    1) Separate Chaining
    2) Open Addressing
       2.a) Linear Probing
       2.b) Quadratic Probing
       2.c) Double Hashing
• What is Rehashing?
• Applications of Hash Data structure

What is Hashing

• Hashing is the process of generating a fixed-size output from an input
of variable size using mathematical formulas known as hash functions. In a
hash table, this output determines the index or location where an item is
stored in the data structure.
• Hashing maps a large chunk of data into a small table using a hash
function. In cryptography, the output of a hash function is also known as
a message digest. The hash value identifies a specific item within a
collection of similar items, although two different items can occasionally
produce the same value (a collision).

Components of Hashing
1. Key: A key can be anything, such as a string or an integer, that is fed
as input to the hash function, which determines an index or location for
storage of an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the
index of an element in an array called a hash table. The index is known
as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values
using a special function called a hash function. It stores the data in
an associative manner in an array, where each data value is located at
the index computed from its key.

How does Hashing work?
• Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to
store it in a table.
• Each string acts as both the key and the value. The question is how to
decide where in the table the value for a given key is stored.
• Step 1: A hash function (some mathematical formula) is used to calculate
the hash value, which acts as the index of the data structure where the
value will be stored.
• Step 2: Assign “a” = 1, “b” = 2, and so on, to all alphabetical
characters.
• Step 3: The numerical value of a string is the sum of the values of its
characters:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7,
“efg” = 5 + 6 + 7 = 18
• Step 4: Assume that we have a table of size 7 to store these strings.
The hash function used here is the sum of the characters in the key mod
the table size, so the location of a string in the array is
sum(string) mod 7.
• Step 5: We then store
“ab” at index 3 mod 7 = 3,
“cd” at index 7 mod 7 = 0, and
“efg” at index 18 mod 7 = 4.

[Figure: Mapping keys to indices of the array]

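
The five steps above can be sketched in Python as follows. This is only a minimal sketch, assuming keys made of lowercase letters; the function name `char_sum_hash` is a hypothetical name chosen for illustration and does not come from the slides.

```python
def char_sum_hash(key: str, table_size: int) -> int:
    """Sum the letter values (a=1, b=2, ...) and reduce modulo the table size."""
    total = sum(ord(ch) - ord("a") + 1 for ch in key)
    return total % table_size


# Store each string at the index computed from it (table of size 7).
table = [None] * 7
for s in ["ab", "cd", "efg"]:
    table[char_sum_hash(s, 7)] = s

print(table)  # ['cd', None, None, 'ab', 'efg', None, None]
```

The sketch overwrites a slot if two keys map to the same index, which is fine here because the three example strings land in different slots.
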
What is a Hash function?

• A hash function maps a key to a location in the hash table using a
mathematical formula. The result of the hash function is referred to as
a hash value or simply a hash.
• Types of Hash functions
  ◦ Division Method
  ◦ Mid Square Method
  ◦ Folding Method
  ◦ Multiplication Method

Division Method

h(k) = k mod M
Here, k is the key value, and
M is the size of the hash table.

Ex: k = 12345, M = 95
h(12345) = 12345 mod 95 = 90

Ex: k = 1276, M = 11
h(1276) = 1276 mod 11 = 0

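
As a quick check of the two examples, the division method can be written in one line of Python. The function name `division_hash` is an assumption made for this sketch.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: the hash value is the remainder of k divided by m."""
    return k % m


print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
```
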
Mid Square Method

It involves two steps to compute the hash value:
• Square the value of the key k, i.e. compute k².
• Extract the middle r digits of k² as the hash value.

h(k) = middle r digits of (k x k)
Here, k is the key value and r is the number of digits to keep.
• Ex: k = 60, r = 2
• k x k = 60 x 60 = 3600
• The middle two digits of 3600 are 60, so h(60) = 60

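
The mid square method could be sketched as below. The slides do not say how to pick the middle digits when they cannot be centred exactly, so this sketch assumes the extra digit is left on the right; the name `mid_square_hash` and the parameter `r` (number of digits kept) are illustrative.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Square k and return the middle r digits of the square."""
    squared = str(k * k).zfill(r)      # pad so at least r digits exist
    start = (len(squared) - r) // 2    # assumption: lean left if not centred
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))  # 60, since 60 * 60 = 3600
```
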
Digit Folding Method

It involves two steps to compute the hash value:
• Divide the key value k into a number of parts k1, k2, k3, …, kn, where
each part has the same number of digits except for the last part, which
can have fewer digits than the other parts.
• Add the individual parts. The hash value is obtained by ignoring the last
carry, if any.

k = k1, k2, k3, k4, …, kn
s = k1 + k2 + k3 + k4 + … + kn
h(k) = s
Here, s is obtained by adding the parts of the key k.

Ex: k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51

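
A possible Python sketch of digit folding follows. It assumes two-digit parts by default and interprets "ignoring the last carry" as keeping only the lowest `part_len` digits of the sum; both the function name `folding_hash` and that interpretation are assumptions, not something the slides state explicitly.

```python
def folding_hash(k: int, part_len: int = 2) -> int:
    """Split the decimal digits of k into parts, add them, drop any carry."""
    digits = str(k)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    s = sum(parts)
    # Assumed meaning of "ignoring the last carry": keep part_len digits.
    return s % (10 ** part_len)


print(folding_hash(12345))   # 12 + 34 + 5 = 51
print(folding_hash(987654))  # 98 + 76 + 54 = 228, carry dropped -> 28
```
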
Multiplication Method

This method involves the following steps:
1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value by A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table, M.
5. The resulting hash value is obtained by taking the floor of the result
obtained in step 4.

h(k) = floor (M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.

Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345 * 0.357840 mod 1) ]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53

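
The formula and the worked example can be reproduced with a few lines of Python. The name `multiplication_hash` is chosen for this sketch; it uses floating-point arithmetic, so results very close to an integer boundary could be affected by rounding.

```python
import math


def multiplication_hash(k: int, a: float, m: int) -> int:
    """Multiplication method: floor(m * fractional part of k * a)."""
    fractional = (k * a) % 1          # fractional part of kA
    return math.floor(m * fractional)


print(multiplication_hash(12345, 0.357840, 100))  # 53
```
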
Properties of a Good hash function

1. Efficiently computable.
2. Should uniformly distribute the keys (each table position is equally
likely for each key).
3. Should minimize collisions.
4. Should have a low load factor (number of items in the table divided by
the size of the table).

Problem with Hashing

• In the above example, the hash function is the sum of the letters. If we
examine this hash function closely, the problem is easy to see: different
strings can produce the same hash value.

• For example, “ab” and “ba” have the same hash value, and “cd” and “be”
also have the same hash value. This is known as a collision, and it
creates problems in searching, insertion, deletion, and updating of
values.

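
To see these collisions concretely, the sketch below groups a few strings by their hash value under the letter-sum function with a table of size 7. The helper names `char_sum_hash` and `find_collisions` are hypothetical, and lowercase keys are assumed.

```python
from collections import defaultdict


def char_sum_hash(key, table_size):
    """Letter-sum hash: a=1, b=2, ..., reduced modulo the table size."""
    return sum(ord(ch) - ord("a") + 1 for ch in key) % table_size


def find_collisions(keys, table_size):
    """Return only those indices that more than one key maps to."""
    groups = defaultdict(list)
    for key in keys:
        groups[char_sum_hash(key, table_size)].append(key)
    return {index: group for index, group in groups.items() if len(group) > 1}


collisions = find_collisions(["ab", "ba", "cd", "be", "efg"], 7)
print(collisions)  # {3: ['ab', 'ba'], 0: ['cd', 'be']}
```
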
What is collision?

• The hashing process generates a small number for a big key, so there is a
possibility that two keys could produce the same value. The situation
where a newly inserted key maps to an already occupied slot is called a
collision, and it must be handled using some collision handling technique.

How to handle Collisions?

• There are mainly two methods to handle collisions:
  ◦ Separate Chaining
  ◦ Open Addressing

Separate Chaining

• The idea is to make each cell of the hash table point to a linked list of
records that have the same hash function value. Chaining is simple but
requires additional memory outside the table.
• Example: Given a hash function, insert some elements into the hash table
using separate chaining as the collision resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
• 12 % 5 = 2, so 12 goes into the chain at index 2.
• 15 % 5 = 0, so 15 goes into the chain at index 0.
• 22 % 5 = 2, so 22 joins the chain at index 2, which already holds 12.
• 25 % 5 = 0, so 25 joins the chain at index 0, which already holds 15.
• 37 % 5 = 2, so 37 joins the chain at index 2, which already holds 12
and 22.

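
The separate chaining example can be sketched as follows. For brevity this sketch uses Python lists in place of linked lists and appends each new key to the end of its chain; both choices are assumptions, as is the name `build_chained_table`.

```python
def build_chained_table(keys, size):
    """Insert integer keys into a table of chains using key % size."""
    table = [[] for _ in range(size)]   # one empty chain per slot
    for key in keys:
        table[key % size].append(key)   # colliding keys share a chain
    return table


chained = build_chained_table([12, 15, 22, 25, 37], 5)
print(chained)  # [[15, 25], [], [12, 22, 37], [], []]
```
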
Open Addressing

• In open addressing, all elements are stored in the hash table itself.
Each table entry contains either a record or NIL.

• When searching for an element, we examine the table slots one by one
until the desired element is found or it is clear that the element is not
in the table.

Linear Probing
• In linear probing, the hash table is searched sequentially, starting
from the original location given by the hash. If that location is already
occupied, we check the next location, wrapping around to the start of the
table when the end is reached.

Algorithm
1. Calculate the hash index, i.e. index = data % size
2. Check if hashTable[index] is empty
• if so, store the value directly by hashTable[index] = data
3. If the hash index already has some value, then
• check the next index using index = (index + 1) % size
4. If hashTable[index] at the next index is empty, store the value there.
Otherwise, try the next index.
5. Repeat the above process until an empty slot is found.

Linear Probing

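Example: Let us consider a simple hash function “key mod 5” and a
sequence of keys to be inserted: 50, 70, 76, 85, 93.
• 50 mod 5 = 0, slot 0 is empty, so 50 is stored in slot 0.
• 70 mod 5 = 0, slot 0 is occupied, so 70 is stored in the next slot, 1.
• 76 mod 5 = 1, slot 1 is occupied, so 76 is stored in slot 2.
• 85 mod 5 = 0, slots 0, 1 and 2 are occupied, so 85 is stored in slot 3.
• 93 mod 5 = 3, slot 3 is occupied, so 93 is stored in slot 4.
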
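
A minimal Python sketch of linear probing insertion is given below, assuming integer keys and `None` as the marker for an empty slot. The function name `linear_probe_insert` is illustrative, and the limit on the number of probes is an addition so that the loop stops when the table is full.

```python
def linear_probe_insert(table, data):
    """Insert data using linear probing; return the index where it was stored."""
    size = len(table)
    index = data % size
    for _ in range(size):              # at most one probe per slot
        if table[index] is None:
            table[index] = data
            return index
        index = (index + 1) % size     # move to the next slot, wrapping around
    raise RuntimeError("hash table is full")


probe_table = [None] * 5
for key in [50, 70, 76, 85, 93]:
    linear_probe_insert(probe_table, key)

print(probe_table)  # [50, 70, 76, 85, 93]
```
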
Quadratic Probing
• Quadratic probing is an open addressing scheme in computer
programming for resolving hash collisions in hash tables.

• Quadratic probing operates by taking the original hash index and
adding successive values of an arbitrary quadratic polynomial until an
open slot is found.

• An example sequence using quadratic probing is:

H + 1², H + 2², H + 3², H + 4², …, H + k²

Method
Let hash(x) be the slot index computed using the hash function and n be
the size of the hash table.
• If the slot hash(x) % n is full, then we try (hash(x) + 1²) % n.
• If (hash(x) + 1²) % n is also full, then we try (hash(x) + 2²) % n.
• If (hash(x) + 2²) % n is also full, then we try (hash(x) + 3²) % n.
• This process is repeated for successive values of i until an empty slot
is found.

Example: Let us consider table size = 7, hash function Hash(x) = x % 7
and collision resolution strategy f(i) = i². Insert 22, 30, and 50.

Insert keys 22 and 30 in the hash table
• Hash(22) = 22 % 7 = 1, slot 1 is empty, so 22 is placed in slot 1.
• Hash(30) = 30 % 7 = 2, slot 2 is empty, so 30 is placed in slot 2.

Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied. So, we will search for
slot 1 + 1², i.e. 1 + 1 = 2.
• Again slot 2 is found occupied, so we will search for slot 1 + 2²,
i.e. 1 + 4 = 5.
• Now, slot 5 is not occupied, so we will place 50 in slot 5.

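
The quadratic probing example can be sketched in Python as follows, assuming integer keys, Hash(x) = x % n and f(i) = i². The function name `quadratic_probe_insert` is illustrative. The sketch gives up after n probes, since quadratic probing may fail to reach an empty slot even when one exists.

```python
def quadratic_probe_insert(table, x):
    """Insert x using quadratic probing; return the index where it was stored."""
    n = len(table)
    home = x % n
    for i in range(n):                 # i = 0 is the original slot
        index = (home + i * i) % n
        if table[index] is None:
            table[index] = x
            return index
    raise RuntimeError("no empty slot reached after n probes")


quad_table = [None] * 7
for key in [22, 30, 50]:
    quadratic_probe_insert(quad_table, key)

print(quad_table)  # [None, 22, 30, None, None, 50, None]
```
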
Double Hashing
Double hashing is a collision resolution technique in open-addressed
hash tables. Double hashing makes use of two hash functions.
• The first hash function, h1(k), takes the key and gives a location in
the hash table. If that location is empty, we can place the key there
directly.
• If the location is occupied (a collision), we use a secondary hash
function h2(k) in combination with the first hash function h1(k) to
find a new location in the hash table.

This combination of hash functions is of the form
h(k, i) = (h1(k) + i * h2(k)) % n
where
• i = a non-negative integer giving the probe number (0 for the first
attempt, increased by 1 after each collision),
• k = the element/key being hashed,
• n = the hash table size.

Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7,
where the first hash function is h1(k) = k mod 7 and the second hash
function is h2(k) = 1 + (k mod 5).

Step 1: Insert 27
• 27 % 7 = 6, location 6 is empty, so insert 27 into slot 6.

Step 2: Insert 43
• 43 % 7 = 1, location 1 is empty, so insert 43 into slot 1.

Step 3: Insert 692
• 692 % 7 = 6, but location 6 is already occupied, and this is a
collision.
• So we need to resolve this collision using double hashing.
hnew = [h1(692) + i * h2(692)] % 7
= [6 + 1 * (1 + 692 % 5)] % 7
= 9 % 7
= 2
Now, as 2 is an empty slot, we can insert 692 into slot 2.

Step 4: Insert 72
• 72 % 7 = 2, but location 2 is already occupied, and this is a
collision.
• So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
= 5 % 7
= 5
Now, as 5 is an empty slot, we can insert 72 into slot 5.

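
The four insertions above can be reproduced with the following sketch. It uses the h1 and h2 from the example; the function names and the use of `None` for an empty slot are assumptions made for illustration.

```python
def h1(k, n):
    """First hash function from the example: k mod n."""
    return k % n


def h2(k):
    """Second hash function from the example: 1 + (k mod 5)."""
    return 1 + (k % 5)


def double_hash_insert(table, k):
    """Insert k using h(k, i) = (h1(k) + i * h2(k)) % n; return the slot used."""
    n = len(table)
    for i in range(n):                 # i = 0 is the first attempt
        index = (h1(k, n) + i * h2(k)) % n
        if table[index] is None:
            table[index] = k
            return index
    raise RuntimeError("no empty slot reached after n probes")


dh_table = [None] * 7
for key in [27, 43, 692, 72]:
    double_hash_insert(dh_table, key)

print(dh_table)  # [None, 43, 692, None, None, 72, 27]
```
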
What is Rehashing?

• As the name suggests, rehashing means hashing again.
• When the load factor rises above a predefined threshold (for example,
the default load factor of Java's HashMap is 0.75), collisions become more
frequent and the time complexity of operations increases.
• To overcome this, the size of the array is increased (typically doubled)
and all the values are hashed again and stored in the new, larger array to
maintain a low load factor and low complexity.

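
Rehashing could look like the sketch below. It assumes a separate-chaining table of integer keys, an initial capacity of 4, and a threshold of 0.75; the class name `ChainedHashTable` and its method names are hypothetical and do not correspond to any particular library.

```python
class ChainedHashTable:
    """Separate-chaining table that doubles in size when it gets too full."""

    def __init__(self, capacity=4, max_load_factor=0.75):
        self.capacity = capacity
        self.max_load_factor = max_load_factor
        self.count = 0
        self.buckets = [[] for _ in range(capacity)]

    def load_factor(self):
        """Number of stored items divided by the number of slots."""
        return self.count / self.capacity

    def insert(self, key):
        self.buckets[key % self.capacity].append(key)
        self.count += 1
        if self.load_factor() > self.max_load_factor:
            self._rehash()

    def _rehash(self):
        """Double the capacity and hash every stored key again."""
        old_keys = [key for bucket in self.buckets for key in bucket]
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        for key in old_keys:
            self.buckets[key % self.capacity].append(key)


rehash_table = ChainedHashTable()
for key in [12, 15, 22, 25]:
    rehash_table.insert(key)

print(rehash_table.capacity)       # 8, doubled after the fourth insert
print(rehash_table.load_factor())  # 0.5
```

The fourth insert pushes the load factor to 4 / 4 = 1.0, which exceeds 0.75, so the table grows to 8 slots and every key is placed again using key % 8.
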
Applications of Hash Data structure

• Hashing is used in databases for indexing.
• Hashing is used in disk-based data structures.
• In some programming languages, such as Python and JavaScript, hash
tables are used to implement objects.

Real-Time Applications of Hash Data structure

• Hash is used for cache mapping for fast access to the data.
• Hash can be used for password verification.
• Hash is used in cryptography as a message digest.
• Rabin-Karp algorithm for pattern matching in a string.
• Calculating the number of different substrings of a string.

Advantages of Hash Data structure

• Hash provides better synchronization than other data structures.
• For lookups by key, hash tables are on average more efficient than
search trees and many other data structures.
• Hash provides constant time for searching, insertion, and deletion
operations on average.

Disadvantages of Hash Data structure

• Hash is inefficient when there are many collisions.
• Hash collisions are practically unavoidable for a large set of possible
keys.
• Some hash table implementations do not allow null values (for example,
Java's Hashtable).

Thank You
