Understanding Hashing and Hash Tables

The document provides an overview of hashing, including its definition, hash functions, and hash tables. It explains various collision resolution techniques such as separate chaining and open addressing, along with different types of hash functions like division, multiplication, and universal hashing. Additionally, it discusses the concepts of load factor, overflow, and the efficiency of hash functions in data retrieval.

Uploaded by

hp1509032014
Copyright
© All Rights Reserved

Unit-I

Hashing
Hashing
•Hashing is a process of indexing and retrieving an element (data) to/from a data structure, to provide a faster way of finding the element using the hash key.
•Here the hash key is a value which provides the index where the actual data is likely to be stored in the data structure.

Example: Students Data = [21,23,44,60,32], H(x) = (x%5), e.g. 21%5=1
Hash Table:
Index   0   1   2   3   4
Value   60  21  32  23  44

•Consider a hash function H(x) that maps the value x to the index x%5 in an
array. For example, if the list of values is [21,32,23,44,65], the values will be
stored at positions {1,2,3,4,0} in the array or hash table respectively (65%5=0).
•The efficiency of mapping depends on the efficiency of the hash
function used.
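
The mapping above can be sketched in a few lines of Python. This is only an illustration: the names h and build_table are hypothetical, and the sketch assumes that no two values hash to the same index (collisions are discussed later).

```python
def h(x, table_size=5):
    """Hash function from the example: remainder of x divided by the table size."""
    return x % table_size


def build_table(values, table_size=5):
    """Place every value at index h(value). Assumes there are no collisions."""
    table = [None] * table_size
    for v in values:
        table[h(v, table_size)] = v
    return table


students = [21, 23, 44, 60, 32]
print(build_table(students))  # [60, 21, 32, 23, 44]
```

Looking up a value then costs one hash computation and one array access, regardless of how many values are stored.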
Key → Hash f(Key) → Address

•A hash function is a mathematical function that converts a numerical input value into another compressed numerical value.
•The input to the hash function is of arbitrary length but the output is always of fixed length.
•The address generated by the hashing function is called the home address.
Hash Function
• Hash function is a function used to place data in hash table.
• Similarly hash function is used to retrieve data from hash table.
• Thus the use of Hash function is to implement hash table.
• E.g. Consider the hash function key mod 5. Consider a hash table of size 5.
• H(x)=key % 5, size of hash table=5
Step 1: Insert 23; 23 mod 5 = 3
Step 2: Insert 44; 44 mod 5 = 4
Step 3: Insert 60; 60 mod 5 = 0

Hash Table (each index position is a bucket):
0 → 60
1 → (empty)
2 → (empty)
3 → 23
4 → 44
Hash Table
• A hash table is an array-based structure used to store <key,
information> pairs.
• Hash Table is a data structure which stores data in an
associative manner.
• In a hash table, data is stored in an array format, where each
data value has its own unique index value.
• Access of data becomes very fast if we know the index of the
desired data.
• Thus, it becomes a data structure in which insertion and search
operations are very fast irrespective of the size of the data.
• Hash Table uses an array as a storage medium and uses hash
technique to generate an index where an element is to be
inserted or is to be located from.
Bucket
•The hash function H(Key) is used to map several entries in the hash
table. Each position of the hash table is called bucket.
•Bucket is an index position in hash table that can store more than one
record.
•When the same index is mapped with two keys, then both the records are
to be saved in the same bucket.
Collision
•Collision is situation in which hash function returns the same address for
more than one record.
Example with H(x) = x%5:
25%5=0, 31%5=1, 42%5=2, 63%5=3, 49%5=4, 35%5=0
Hash table: 0 → 25, 55; 1 → 31; 2 → 42; 3 → 63; 4 → 49
If we want to insert 55, then 55 mod 5 = 0. But at the 0th location 25 is already
placed and now 55 is demanding the same location. Hence we say "collision occurs".
The result of two keys hashing into the same address is a collision.
• Synonym : The set of keys that hash to the same location are called synonyms.
– That is, keys having the same address are called synonyms.
– For example, in the hash table computation shown here, 25 and 55 are synonyms.
• Overflow : When hash table becomes full and new record needs to be inserted
then it is called overflow.
–The result of more keys hashing to the same address and if there is no room in
the bucket, then it is said that overflow has occurred.
–Collision and overflow are synonymous when the bucket is of size 1

Probing : Probing is just a way of resolving a collision when hashing values into
buckets
• Load density: The maximum storage capacity of the table, that is, the
maximum number of records that can be accommodated, is called the
load density.
• Load factor: The number of records stored in table divided
by maximum capacity of table, expressed in terms of
percentage.
• Full table: Full table is the one in which all locations are
occupied
Perfect hash function
• A perfect hash function is a function that maps distinct key elements into the hash table with no collisions.
A good hash function should
• Be easy and quick to compute.
• Minimize collisions.
• Distribute key values evenly in the hash table.
• Use all the information provided in the key.


Types of Hash Functions

• Division Method
• Digit Extraction Method
• Multiplication Method
• Mid-Square Method
• Folding Method
• Universal Hashing
Division Method

Steps:
• Compute the index by dividing the key by
some value m and use the remainder as the index.
• This gives indexes in the range from 0 to
m-1, so the hash table should be of size m.
• It is a good example of a uniform hash function if
the value of m is chosen carefully.
• Generally a prime number is the best choice,
since it spreads keys evenly.
Division Method
• Example:
index = key mod table_size

1. Consider key=82, table size=10: Index=82%10, so Index=2
2. 79%10=9
3. 68%10=8

Key   Hash function   Bucket
82    82%10=2         02
79    79%10=9         09
68    68%10=8         08
(Buckets 00, 01 and 03 to 07 remain empty.)
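
The division method example can be reproduced with the short sketch below. The function name division_hash is an assumption made for illustration, and the sketch does not handle collisions.

```python
def division_hash(key, table_size):
    """Division method: the index is the remainder of key / table_size."""
    return key % table_size


buckets = [None] * 10
for key in (82, 79, 68):
    buckets[division_hash(key, 10)] = key

print(buckets)
# [None, None, 82, None, None, None, None, None, 68, 79]
```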
Multiplication Method

Steps:
• Choose any constant A such that 0<A<1
• Multiply key k by A
• Extract the fractional part of k*A (this gives us a
number between 0 and 1)
• Multiply the fractional part by m and take the floor of the
result (this transforms a number between 0
and 1 into a discrete number between 0 and m-1 that
we can map to a slot in the hash table)
Multiplication Method

Example:
1. Key k=1; m=8 slots
2. A=0.61
3. kA=1*0.61=0.61, so the fractional part f=0.61
4. Floor(fm)=Floor(0.61*8)=Floor(4.88)=4
Key 1 maps to slot 4 (slots run from 0 to m-1=8-1=7).
Multiplication Method

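Example:
1. Key k=3; m=8 slots
2. A=0.61
3. kA=3*0.61=1.83, so the fractional part f=0.83
4. Floor(fm)=Floor(0.83*8)=Floor(6.64)=6
Key 3 maps to slot 6; together with the previous example, key 1 is in slot 4 and key 3 is in slot 6 (slots run from 0 to m-1=8-1=7).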
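
The steps of the multiplication method can be sketched as follows. The function name multiplication_hash is hypothetical, A=0.61 is taken from the examples, and floating-point arithmetic is assumed to be precise enough for keys of this size.

```python
import math


def multiplication_hash(key, m, a=0.61):
    """Multiplication method: floor(m * fractional_part(key * a)), with 0 < a < 1."""
    fractional = (key * a) % 1.0   # keep only the fractional part of k*A
    return math.floor(fractional * m)


print(multiplication_hash(1, 8))  # 4
print(multiplication_hash(3, 8))  # 6
```

Unlike the division method, the quality of this method does not depend on m being prime, so m is often chosen for convenience.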
Extraction Method

Steps:
• Ignore part of the key and use the rest as the
array index
• The problem with this approach is that there
may not always be an even distribution
throughout the table
Extraction Method

Example:
1. If student IDs are the keys, such as 928324312,
then select just the last three digits as the index,
i.e. 312 is the index
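
A minimal sketch of this extraction, assuming the digits kept are always the last ones (the function name extraction_hash is illustrative only):

```python
def extraction_hash(key, digits=3):
    """Digit extraction: keep only the last `digits` decimal digits of the key."""
    return key % (10 ** digits)


print(extraction_hash(928324312))  # 312
```

Any two IDs that end in the same three digits collide, which is the uneven-distribution problem noted above.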
Mid square Method

Steps:
• The key value is squared in this method and the
middle or mid part of the result is used as the index
into the table
• If this middle part is greater than the size of the
table, then it is reduced modulo the size of the
table.
• In this method, the whole key participates in
generating the index for the table, thus, there is a
greater chance of finding a different address every
time per key value.
Mid square Method

Example:
• If the key value is 2199, the squared result is 4835601.
• The middle part of this result is 356, which will now be used
for indexing into the table.
• Say, if the size of the table is 1000, then h(2199)=356
• That is, the hash function using mid-square method will
produce 356 as the index value for key value 2199
• The advantage is that the result is not dominated by the
distribution of bottom digit or top digit of the original key
value.
• The problem with this method is choosing the middle part of
the result obtained by squaring the key value.
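
The mid-square example can be sketched as below. How the "middle" is chosen is an assumption of this sketch: it takes the r decimal digits centred in the squared value, which reproduces the 2199 example. The function name mid_square_hash is hypothetical.

```python
def mid_square_hash(key, r, table_size):
    """Mid-square method: square the key, take the middle r decimal digits,
    then reduce modulo the table size."""
    squared = str(key * key)
    start = max(0, (len(squared) - r) // 2)   # assumed rule for picking the middle
    middle = int(squared[start:start + r])
    return middle % table_size


print(mid_square_hash(2199, 3, 1000))  # 2199*2199 = 4835601, middle digits 356
```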
Mid square Method

Example:
• To resolve this issue, it is advisable to choose
the size of the table to be a power of 2, such as
2^r, so that the middle r bits are chosen.
• In such cases, it is easier to choose the middle
part of the result using masking or shift
operation.
Folding Method

Steps:
• In this method, partition the key into several
pieces and then combine it in some convenient
way.
Folding Method

Example:
• The key k is partitioned into number of parts, each
of which has the same length as the required
address with the possible exception of the last part.
• The parts are then added together, ignoring the
final carry, to form an address.
• Consider key=356942781, which is to be transformed into
a three digit address. P1=356, P2=942, P3=781 are
added: 356+942+781=2079. Ignoring the final carry
(the leading 2) gives the address 079.
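
The folding example can be checked with the sketch below. The function name fold_hash is an assumption, and the key is split from the left into parts of the address length, so only the last part may be shorter.

```python
def fold_hash(key, address_digits):
    """Folding method: split the key into parts of address_digits digits,
    add the parts, and drop any carry beyond address_digits digits."""
    digits = str(key)
    parts = [int(digits[i:i + address_digits])
             for i in range(0, len(digits), address_digits)]
    return sum(parts) % (10 ** address_digits)   # the modulo discards the final carry


print(fold_hash(356942781, 3))  # 356 + 942 + 781 = 2079, carry dropped: 79
```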
Universal Method

Steps:
• Universal hashing refers to selecting a hash function at
random from a family of hash functions with a certain
mathematical property.
• This guarantees a low number of collisions in
expectation, even if the data is chosen by an adversary.
• Many universal families are known(for hashing
integers, vectors, strings) and their evaluation is often
very efficient.
• Universal hashing has numerous uses in computer
science, for example in implementations of hash tables,
randomized algorithms and cryptography
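
One well-known universal family for integer keys is h(k) = ((a*k + b) mod p) mod m, with p a prime larger than any key and a, b chosen at random. The slides do not name a specific family, so the choice of this family, the prime p = 2^31 - 1, and the name make_universal_hash are all assumptions of this sketch.

```python
import random


def make_universal_hash(m, p=2_147_483_647, rng=random):
    """Pick one function at random from the family
    h(k) = ((a*k + b) mod p) mod m, for integer keys 0 <= k < p.
    p = 2**31 - 1 is prime; a is drawn from 1..p-1 and b from 0..p-1."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)

    def h(key):
        return ((a * key + b) % p) % m

    return h


h = make_universal_hash(8)
print([h(k) for k in (21, 23, 44, 60, 32)])  # indexes in 0..7; they vary between runs
```

Because a and b are drawn when the table is created, an adversary who does not know them cannot pick a set of keys that is guaranteed to collide.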
How to handle Collision?
Collision Resolution Techniques are the techniques used for resolving or handling the
collision.

Collision Resolution Techniques

1. Separate Chaining (Open Hashing)
2. Open Addressing (Closed Hashing)
   • Linear Probing
   • Quadratic Probing
   • Double Hashing
Separate Chaining
(Open Hashing)

To handle the collision,


• This technique creates a linked list to the slot for which
collision occurs.
• The new key is then inserted in the linked list.
• These linked lists to the slots appear like chains.
• That is why, this technique is called as separate chaining.
Separate Chaining
(Open Hashing)
• The idea is to make each cell of hash table point to a linked list of
records that have same hash function value.
• Let us consider a simple hash function as “key mod 5” and sequence
of keys as 85, 92,64,50, 73, 100,67


Initial empty table: slots 0, 1, 2, 3, 4 are all empty.
Insert 85: 0 → 85
Insert 92, 64: 0 → 85; 2 → 92; 4 → 64
Insert 50 (collision occurs, add to chain): 0 → 85 → 50; 2 → 92; 4 → 64
Insert 73, 100, 67: 0 → 85 → 50 → 100; 2 → 92 → 67; 3 → 73; 4 → 64
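
The chaining example can be sketched as follows. For brevity this sketch uses a Python list for each chain instead of a hand-written linked list, which is a substitution rather than the structure drawn on the slide. The class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Separate chaining with hash function key mod size.
    Each bucket holds a chain (here a Python list) of the keys that hash to it."""

    def __init__(self, size=5):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return key % len(self.buckets)

    def insert(self, key):
        self.buckets[self._index(key)].append(key)   # add to the end of the chain

    def search(self, key):
        return key in self.buckets[self._index(key)]  # sequential scan of one chain

    def delete(self, key):
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False


table = ChainedHashTable(5)
for key in (85, 92, 64, 50, 73, 100, 67):
    table.insert(key)
print(table.buckets)  # [[85, 50, 100], [], [92, 67], [73], [64]]
```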
Separate Chaining
(Open Hashing)
• Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements
to the chain.
3) Less sensitive to the hash function or load factors.
4) It is mostly used when it is unknown how many and how
frequently keys may be inserted or deleted.
• Disadvantages:
1) Cache performance of chaining is not good as keys are
stored using a linked list. Open addressing provides better
cache performance as everything is stored in the same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become
O(n) in the worst case.
4) Uses extra space for links.
Separate Chaining
(Open Hashing)
Time Complexity
Searching:
• In worst case, all the keys might map to the same bucket of the hash
table.
• In such a case, all the keys will be present in a single linked list.
• Sequential search will have to be performed on the linked list to
perform the search.
• So, time taken for searching in worst case is O(n).

Deletion:
• In worst case, the key might have to be searched first and then
deleted.
• In worst case, time taken for searching is O(n).
• So, time taken for deletion in worst case is O(n).
Separate Chaining
(Open Hashing)
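Load Factor (α)-

Load factor (α) is defined as-
α = (number of elements stored in the hash table) / (number of buckets in the hash table)

If Load factor (α) = constant, then the expected time complexity of
Insert, Search, Delete = Θ(1)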
Open Addressing
(Closed Hashing)
•This collision resolution technique requires a hash table with
fixed and known size.
•During insertion, if a collision is encountered, alternative
cells are tried until an empty bucket is found.
• These techniques require the size of the hash table to be
at least as large as the number of objects to be stored.
•This type of hashing can be achieved by the following
methods:
1. Linear probing
2. Quadratic probing
3. Double hashing
1. Linear Probing
•The idea of linear probing is simple, we take a fixed
sized hash table and every time if we face a hash
collision we linearly traverse the table in a cyclic
manner to find the next empty slot.
•Assume a scenario where we intend to store the
following set of numbers = {0,1,2,4,5,7} into a hash
table of size 5 with the help of the following hash
function H, such that H(x) = x%5.
•So, if we were to map the given data with the given
hash function we'll get the corresponding values
1. Linear Probing
•Set of numbers = {0,1,2,4,5,7}
•In this case we see a collision of two terms (0 & 5).
In this situation we move linearly down the table to
find the first empty slot. Note that this linear traversal
is cyclic in nature, i.e. if we pass the last
slot during the search we start again from the
beginning, until we are back at the slot where we started.
H(0)-> 0%5 = 0
H(1)-> 1%5 = 1
H(2)-> 2%5 = 2
H(4)-> 4%5 = 4
H(5)-> 5%5 = 0 (collision; slots 1 and 2 are also occupied, so 5 is placed in slot 3)
H(7)-> 7%5 = 2 (the table of size 5 is now full, so 7 cannot be placed: overflow)
1. Linear Probing
• In this case our hash function can be considered as
this:
H(x, i) = (H(x) + i)%N
where N is the size of the table and i is the probe
number, which starts from 0 (the home address) and
increases by 1 after each collision
(until an empty bucket is found).
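
A minimal sketch of insertion with linear probing, using H(x) = x%5 from the example. The function name linear_probe_insert and the use of None to mark an empty slot are assumptions; deletion markers are not handled.

```python
def linear_probe_insert(table, key):
    """Insert key using H(x, i) = (H(x) + i) % N with H(x) = x % N.
    Returns the slot used, or raises OverflowError if the table is full."""
    n = len(table)
    home = key % n
    for i in range(n):                # probe numbers 0, 1, ..., N-1
        slot = (home + i) % n         # cyclic traversal of the table
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


table = [None] * 5
for key in (0, 1, 2, 4, 5):
    linear_probe_insert(table, key)
print(table)  # [0, 1, 2, 5, 4]
```

Key 5 collides at slots 0, 1 and 2 and lands in slot 3. Trying to insert 7 afterwards raises OverflowError, because all five slots are occupied.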
1. Linear Probing
Advantage-

• It is easy to compute.

Disadvantage-

• The main problem with linear probing is clustering.


• Many consecutive elements form groups.
• Then, it takes time to search an element or to find an
empty bucket.
1. Linear Probing
Time Complexity-

• Worst time to search an element in linear probing is O(table size).
• This is because-
– Even if only one element is present and all other
elements have been deleted,
– the "deleted" markers left in the hash table can force a
search to scan the entire table.
2. Quadratic Probing
This method is a compromise between good cache
performance and the problem of clustering.
In quadratic probing,
• When a collision occurs, we probe the i^2-th bucket
(counted from the home address) in the i-th iteration.
• The general idea remains the same, the only
difference is that we look at the Q(i) increment at
each iteration when looking for an empty bucket,
where Q(i) is some quadratic expression of i.
• We keep probing until an empty bucket is found.
2. Quadratic Probing
• In this case our hash function can be considered as
this:
H(x, i) = (H(x) + i^2)%N
where N is the size of the table and i is the probe
number, which starts from 0 (the home address) and
increases by 1 after each collision
(until an empty bucket is found).
• Assume a scenario where we intend to store the
following set of numbers = {0,1,2,5} into a hash
table of size 5 with the help of the following hash
function H, such that H(x, i) = (x%5 + i^2)%5.
2. Quadratic Probing
• Set of numbers = {0,1,2,5}
• H(x, i) = (x%5 + i^2)%5.
• For 0, H(0,0)=(0%5+0*0)%5=0
• For 1, H(1,0)=(1%5+0*0)%5=1
• For 2, H(2,0)=(2%5+0*0)%5=2
• For 5, H(5,0)=(5%5+0*0)%5=0 … collision occurs
• H(5,1)=(5%5+1*1)%5=1 … collision occurs
• H(5,2)=(5%5+2*2)%5=4, so 5 is placed in slot 4
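
The quadratic probing trace can be reproduced with the sketch below. The function name quadratic_probe_insert is hypothetical, and limiting the search to N probes is a simplifying assumption, since quadratic probing does not always visit every slot.

```python
def quadratic_probe_insert(table, key):
    """Insert key using H(x, i) = (x % N + i*i) % N, trying i = 0, 1, ..., N-1.
    Returns the slot used, or raises OverflowError if none of those probes is free."""
    n = len(table)
    for i in range(n):
        slot = (key % n + i * i) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no empty slot found within N probes")


table = [None] * 5
for key in (0, 1, 2, 5):
    quadratic_probe_insert(table, key)
print(table)  # [0, 1, 2, None, 5]
```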
2. Quadratic Probing
Advantage-
• Quadratic probing reduces the problem of
clustering seen in linear probing.
Disadvantage-
• The cache performance of quadratic probing is
lower than that of linear probing.
3. Double hashing
• This method is based upon the idea that in the event
of a collision we use another hashing function,
with the key value as input, to find where in the
open addressing scheme the data should actually be
placed.
• In this case we use two hashing functions, such that
the final hashing function looks like:
H(x, i) = (H1(x) + i*H2(x))%N
3. Double hashing
• H(x, i) = (H1(x) + i*H2(x))%N
• Typically for H1(x) = x%N a good H2 is H2(x) = P -
(x%P), where P is a prime number smaller than N.
• A good H2 is a function which never evaluates to zero
and ensures that all the cells of a table are effectively
traversed.
• Assume a scenario where we intend to store the
following set of numbers = {0,1,2,5} into a hash table
of size 5 with the help of the following hash function H,
such that
H(x, i) = (H1(x) + i*H2(x))%5
H1(x) = x%5 and H2(x) = P - (x%P), where P = 3
(3 is a prime smaller than 5)
3. Double hashing
• H(x, i) = (H1(x) + i*H2(x))%N -> H(x, i) = (H1(x) +
i*H2(x))%5
H1(x) = x%5 and H2(x) = P - (x%P), where P = 3
(3 is a prime smaller than 5)
• Set of numbers = {0,1,2,5}
• For 0, ((0%5)+0*(H2))%5=0
• For 1, ((1%5)+0*(H2))%5=1
• For 2, ((2%5)+0*(H2))%5=2
• For 5, ((5%5)+0*(H2))%5=0 … collision occurs
• ((5%5)+1*(3-(5%3)))%5=1 … collision occurs
• ((5%5)+2*(3-(5%3)))%5=2 … collision occurs
• ((5%5)+3*(3-(5%3)))%5=3, so 5 is placed in slot 3
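
The double hashing trace can be checked with this sketch, which uses H1(x) = x%N and H2(x) = P - (x%P) with P = 3 as in the example. The function name double_hash_insert is an assumption.

```python
def double_hash_insert(table, key, p=3):
    """Insert key using H(x, i) = (H1(x) + i*H2(x)) % N,
    with H1(x) = x % N and H2(x) = p - (x % p), where p is a prime below N."""
    n = len(table)
    h1 = key % n
    h2 = p - (key % p)               # always in 1..p, so it is never zero
    for i in range(n):
        slot = (h1 + i * h2) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no empty slot found within N probes")


table = [None] * 5
for key in (0, 1, 2, 5):
    double_hash_insert(table, key)
print(table)  # [0, 1, 2, 5, None]
```

For key 5, H2(5) = 3 - (5%3) = 1, so the probe sequence is 0, 1, 2, 3 and the key lands in slot 3.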
Skip List
• A skip list is a probabilistic data structure. The skip list
is used to store a sorted list of elements or data with a
linked list.
• It allows the elements or data to be traversed and searched
efficiently.
• In one single step, it skips several elements of the
entire list, which is why it is known as a skip list.
• The skip list is an extended version of the linked list. It
allows the user to search, remove, and insert the
element very quickly.
• It consists of a base list that includes a set of elements
which maintains the link hierarchy of the subsequent
elements.
Structure of Skip List
• It is built in two layers: The lowest layer and
Top layer.
• The lowest layer of the skip list is a common
sorted linked list, and the top layers of the skip
list are like an "express line" where the
elements are skipped.
Working of the Skip list
• Let's take an example to understand the working of the skip
list. In this example, we have 14 nodes, such that these nodes
are divided into two layers, as shown in the diagram.
• The lower layer is a common line that links all nodes, and the
top layer is an express line that links only the main nodes, as
you can see in the diagram.
• Suppose you want to find 47 in this example. You start the
search from the first node of the express line and keep
moving along the express line until you find a node that is equal to
47 or greater than 47.
• You can see in the example that 47 does not exist in the
express line, so you stop at the last node that is less than 47, which is
40. Now, you drop down to the normal line at 40 and
search for 47 from there, as shown in the diagram.

Basic operation on Skip list
Example of Skip list
• To create a skip list, we insert the
following keys into an empty skip list.
• 6 with level 1.
• 29 with level 1.
• 22 with level 4.
• 9 with level 3.
• 17 with level 1.
• 4 with level 2.
Example of Skip list
• The keys are inserted into the empty skip list one at a time:
• Step 1: Insert 6 with level 1
• Step 2: Insert 29 with level 1
• Step 3: Insert 22 with level 4
• Step 4: Insert 9 with level 3
• Step 5: Insert 17 with level 1
Example of Skip list
• Consider this example where we want to search for
key 17.
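
The insertions and the search for key 17 can be sketched as follows. In a real skip list the level of each node is chosen at random. Here the levels are passed in explicitly so that the example keys can be reproduced. The class and method names are hypothetical, and duplicate keys are not handled.

```python
class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # forward[i] is the next node on layer i


class SkipList:
    def __init__(self, max_level=4):
        self.max_level = max_level
        self.head = Node(None, max_level)   # sentinel that starts every layer

    def _predecessors(self, key):
        """For each layer, find the last node whose key is less than `key`."""
        update = [self.head] * self.max_level
        node = self.head
        for lvl in range(self.max_level - 1, -1, -1):   # top layer first
            while node.forward[lvl] is not None and node.forward[lvl].key < key:
                node = node.forward[lvl]
            update[lvl] = node
        return update

    def insert(self, key, level):
        """Insert key into layers 0..level-1 (level given explicitly here)."""
        update = self._predecessors(key)
        new = Node(key, level)
        for lvl in range(level):
            new.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = new

    def search(self, key):
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def layer_keys(self, lvl):
        keys, node = [], self.head.forward[lvl]
        while node is not None:
            keys.append(node.key)
            node = node.forward[lvl]
        return keys


sl = SkipList(max_level=4)
for key, level in [(6, 1), (29, 1), (22, 4), (9, 3), (17, 1), (4, 2)]:
    sl.insert(key, level)
print(sl.layer_keys(0))  # [4, 6, 9, 17, 22, 29]
print(sl.search(17))     # True
```

The search for 17 starts on the top layer, moves right while the next key is smaller than 17, and drops one layer each time it cannot advance, reaching 17 on the lowest layer after node 9.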
Skip list
Advantages:
• If you want to insert a new node in the skip list, then it will insert the
node very fast because there are no rotations in the skip list.
• The skip list is simple to implement as compared to the hash table
and the binary search tree.
• It is very simple to find a node in the list because it stores the nodes
in sorted form.
• The skip list algorithm can be modified very easily in a more
specific structure, such as indexable skip lists, trees, or priority
queues.
• The skip list is a robust and reliable list.
Disadvantages:
• It requires more memory than the balanced tree.
• Reverse searching is not allowed.
• In the worst case, searching a skip list takes O(n) time, although the expected search time is O(log n).
