Hashing
• Tables
• Direct address tables
• Hash tables
• Collision and collision resolution
• Chaining
Introduction
• Many applications require a dynamic set that supports dictionary
operations.
• Example: a compiler maintaining a symbol table where keys
correspond to identifiers
• Hash table is a good data structure for implementing dictionary
operations
• Although searching can take as long as a linked list
implementation i.e. O(n) in worst case.
Introduction
• Under reasonable assumptions, searching takes O(1) average time.
• In practice hashing performs extremely well.
• A hash table is a generalization of an ordinary array, where
direct addressing takes O(1) time.
• When the number of keys actually stored is small relative to
the total number of possible keys, hashing is an effective
alternative to direct addressing.
• With direct addressing, the key is used directly as an array
index; with hashing, the array index is computed from the key.
What are Tables?
• Table is an abstract storage device that contains table
entries
• Each table entry contains a unique key k.
• Each table entry may also contain some information, I,
associated with its key.
• A table entry is an ordered pair (K, I)
Direct Addressing
• Suppose:
• The range of keys is 0..m-1
• Keys are distinct
• The idea:
• Set up an array T[0..m-1] in which
• T[i] = x if x ∈ T and key[x] = i
• T[i] = NULL otherwise
• This is called a direct-address table
• Operations take O(1) time!
Advantages with Direct Addressing
• Direct addressing is the most efficient way to access
the data.
• Any operation on a direct-address table takes only a
single step.
• It works well when the universe U of keys is
reasonably small.
Difficulty with Direct Addressing
When the universe U is very large…
• Storing a table T of size U may be impractical, given
the memory available on a typical computer.
• The set K of the keys actually stored may be so small
relative to U that most of the space allocated for T
would be wasted.
An Example
• A table for 50 students in a class.
• The key: a 9-digit SSN used to identify each student.
• Number of different 9-digit numbers = 10^9
• Fraction of the keys actually needed: 50/10^9 ≈ 0.000005%
• Percent of the memory allocated for the table wasted: 99.999995%
An ideal table needed!
• The table should be of small fixed size.
• Any key in the universe should be mappable to a slot in the
table, using some mapping function.
Hash Tables
• Definition: the ideal table data structure is merely an array of
some fixed size, containing the elements.
• Consist : an array and a mapping function (known as hash
function)
• Used for performing insertion, deletion and lookup on average in
constant time.
Hash Tables
Compared to direct addressing
• Advantage: requires much less storage while still supporting O(1) (average-case) operations.
• Comparison
                     Storage space    Storing key k
  Direct addressing  |U|              store in slot k
  Hashing            m                store in slot h(k)
Collision
• A collision occurs when two distinct keys hash to the same slot: h(k1) = h(k2) for k1 ≠ k2.
Resolving Collisions
• How can we solve the problem of collisions?
• Solution 1: Chaining
• Solution 2: Open addressing
Chaining!
• Put all the elements that hash to the same slot into a linked
list.
• Worst case : All n keys hash to the same slot resulting
in a linked list of length n, running time: O(n)
• Best and Average time: O(1)
Collision by Chaining
Analysis of Chaining
• Assume simple uniform hashing: each key is
equally likely to be hashed to any slot
• Given n keys and m slots in the table: the load factor
α = n/m = average # keys per slot
• What will be the average cost of an unsuccessful search for a key?
A: O(1 + α)
• What will be the average cost of a successful search?
A: O(1 + α/2) = O(1 + α)
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots m
in the table, what is α?
• A: α = O(1)
• In other words, we can make the expected cost of searching
constant if we make α constant
Hash Tables
• Nature of keys
• Hash functions
• Division method
• Multiplication method
• Open Addressing (Linear and Quadratic probing, Double
hashing)
Nature of Keys
• Most hash functions assume that the universe of keys
is the set N = {0, 1, 2, …} of natural numbers
• If the keys are not natural numbers, a way must be
found to interpret them as natural numbers
• A character key can be interpreted as an integer
expressed in a suitable radix notation.
Nature of Keys
• Example: The identifier pt might be interpreted as
a pair of decimal integers (112, 116), since p = 112 and
t = 116 in ASCII notation. What is the problem?
• Using a product/addition of ASCII codes is
indifferent to the order of the characters
• Solution: Using radix-128 notation this becomes
(112 × 128) + 116 = 14,452
What is a Hash function?
A hash function is a mapping between a set of input
values (Keys) and a set of integers, known as hash
values.
[Diagram: a hash function maps a set of Keys to a set of Hash values]
The properties of a good hash function
• Rule1: The hash value is fully determined by the data being hashed.
• Rule2: The hash function uses all the input data.
• Rule3: The hash function uniformly distributes the data across the entire
set of possible hash values.
• Rule4: The hash function generates very different hash values for similar
strings.
An example of a hash function
int hash(char *str, int table_size)
{
    int sum = 0;
    /* sum up all the characters in the string */
    for (; *str; str++)
        sum += *str;
    /* return sum mod table_size */
    return sum % table_size;
}
Analysis of example
• Rule1: satisfied — the hash value is fully determined
by the data being hashed; it is just the sum of all
input characters.
• Rule2: satisfied — every character is summed.
Analysis of example (contd.)
• Rule3: breaks. It is not obvious at a glance that the function fails to
distribute strings uniformly, but analyzing it on larger inputs reveals
statistical properties that are bad for a hash function.
• Rule4: breaks. Hash the string “CAT”, then hash the string “ACT”: the
values are the same. A slight variation in the string should produce a
different hash value, but with this function it often doesn’t.
Methods to create hash functions
• Division method
• Multiplication method
Division method
The division method requires two steps.
1. The key must be transformed into an integer.
2. The value must be telescoped into the range 0 to m−1.
Division method…
• We map a key k into one of the m slots by taking the
remainder of k divided by m, so the hash function is of
form
h(k) = k mod m
• For example, if m = 12 and the key is 100, then
h(k) = 100 mod 12 = 4.
• Advantage?
Restrictions on value of m
M should not be a Key Binary K mod 8
power of 2, since if 8 1000 0
m=2p then h(k) is just 7 111 7
the p lowest order bits 12 1100 4
of k. 34 100010 2
56 111000 0
Disadvantage! 78 1001110 6
90 1011010 2
23 10111 7
45 101101 5
67 1000011 3
Restrictions on value of m
• Unless it is known that the probability distribution
on keys makes all lower-order p-bit patterns
equally likely,
• it is better to make the hash function depend
on all the bits of the key.
Good value of m
• Powers of 10 should be avoided if the application deals
with decimal numbers as keys.
• Good values of m are primes not close to the exact
powers of 2 (or 10).
Multiplication method
• Use a real number f in the range [0, 1).
• The fractional part of the product f × key yields a number in
the range [0, 1).
• When this number is multiplied by m (the hash-table size), the
integer portion of the product gives a hash value in the
range 0 to m−1.
More on multiplication method
• Choose m = 2^p
• For a constant A, 0 < A < 1:
• h(k) = ⌊m (kA − ⌊kA⌋)⌋
• The value of A should not be close to 0 or 1
• Knuth suggests A = (√5 − 1)/2 ≈ 0.6180339…
• If k = 123456, m = 10000, and A as above:
h(k) = ⌊10000 · (123456·A − ⌊123456·A⌋)⌋
     = ⌊10000 · 0.0041151…⌋
     = 41
Hashing with Open Addressing
• So far we have studied hashing with chaining, using a linked
list to store the keys that hash to the same location.
• Maintaining linked lists involves pointers, which is
complex and inefficient in both storage and time
requirements.
• Another option is to store all the keys directly in the table.
This is known as open addressing, where collisions are
resolved by systematically examining other table indexes
i0, i1, i2, … until an empty slot is located.
Open addressing
• Another approach for collision resolution.
• All elements are stored in the hash table itself (so no pointers
involved as in chaining).
• To insert: if slot is full, try another slot, and another, until an
open slot is found (probing)
• To search, follow same sequence of probes as would be used
when inserting the element
Open Addressing
• The key is first mapped to a slot:
index = i0 = h1(k)
• If there is a collision, subsequent probes are performed:
i(j+1) = (ij + c) mod m   for j ≥ 0
• If the offset constant c and m are not relatively prime, we will not
examine all the cells. Ex.:
• Consider m = 4 and c = 2: only every other slot is checked.
When c = 1 the collision resolution is done as a linear search. This
is known as linear probing.
Insertion in hash table
HASH_INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return j
            else i ← i + 1
    until i = m
    error “hash table overflow”
Searching a Hash Table
HASH_SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
• The worst case for inserting a key is Θ(n)
• The worst case for searching is Θ(n)
• The algorithms assume that keys are not deleted once they are
inserted
• Deleting a key from an open-addressing table is difficult;
instead we can mark entries in the table as removed
(introducing a third class of entry: full, empty, and
removed)
Clustering
• Even with a good hash function, linear probing has its problems:
• The position of the initial mapping i0 of key k is called the home
position of k.
• When several insertions map to the same home position, they end
up placed contiguously in the table. This collection of keys with
the same home position is called a cluster.
• As clusters grow, the probability that a key will map to the middle
of a cluster increases, increasing the rate of the cluster’s growth.
This tendency of linear probing to place items together is known as
primary clustering.
• As these clusters grow, they merge with other clusters forming even
bigger clusters which grow even faster.
Quadratic probing
h(k, i) = (h′(k) + c1·i + c2·i²) mod m   for i = 0, 1, …, m − 1
• Leads to secondary clustering (a milder form of clustering)
• The clustering effect can be reduced by increasing the order of the
probing function (cubic); however, the hash function becomes more
expensive to compute
• But again, for two keys k1 and k2, h(k1, 0) = h(k2, 0) implies
h(k1, i) = h(k2, i) for all i
Double Hashing
• Recall that in open addressing the sequence of probes follows
i(j+1) = (ij + c) mod m   for j ≥ 0
• We can solve the problem of primary clustering in linear probing by having keys
that map to the same home position use differing probe sequences. In other words,
different values of c should be used for different keys.
• Double hashing refers to the scheme of using another hash function for c:
i(j+1) = (ij + h2(k)) mod m   for j ≥ 0, with 1 ≤ h2(k) ≤ m − 1
Example – Double hashing