Hashing: 15-111 Data Structures

Hashing

15-111
Data Structures

Ananda Gunawardena
Hashing
Why do we need hashing?
Many applications deal with lots of data
Search engines and web pages
There are myriad lookups.
The lookups are time critical.
Typical data structures like arrays and
lists may not be sufficient to handle
efficient lookups
In general: when lookups need to
occur in near constant time, O(1).
Why do we need hashing?
Consider the internet (2002 data):
According to the Internet Software Consortium
survey at http://www.isc.org/, in 2001
there are 125,888,197 internet hosts,
and the number is growing by 20%
every six months!
Using the best possible binary search,
it takes about 27 iterations in the worst case
(log2 of 125,888,197 is about 26.9) to
find an entry.
According to a survey by NUA at
http://www.nua.ie/ there are 513.41
million users worldwide.
Why do we need hashing?

We need something that can do
better than a binary search,
which is O(log N).
We want O(1).

Solution: Hashing
In fact hashing is used in:
Web searches
Spell checkers
Databases
Compilers
Passwords
Many others
Building an index using HashMaps

Dictionary (one entry per word, with a pointer PTR into the postings):

WORD         NDOCS
jezebel      20
jezer        3
jezerit      1
jeziah       1
jeziel       1
jezliah      1
jezoar       1
jezrahliah   1
jezreel      39

Postings (one entry per document containing the word):

DOCID   OCCUR   POS 1  POS 2  ...
34      6       1  118  2087  3922  3981  5002
44      3       215  2291  3010
56      4       5  22  134  992
566     3       203  245  287
67      1       132
...
More on this in Graphs…
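As a rough sketch of the idea, an index can be held in a HashMap that maps each word to the documents and positions where it occurs. The class and method names (`IndexSketch`, `build`) and the two sample documents are made up for illustration; they are not from the lecture. NDOCS for a word is the size of its inner map, and OCCUR for a document is the length of its position list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: word -> (document id -> positions of the word in that document).
class IndexSketch {
    static Map<String, Map<Integer, List<Integer>>> build(Map<Integer, String> docs) {
        Map<String, Map<Integer, List<Integer>>> index = new HashMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            String[] words = doc.getValue().toLowerCase().split("\\s+");
            for (int pos = 0; pos < words.length; pos++) {
                index.computeIfAbsent(words[pos], w -> new HashMap<>())
                     .computeIfAbsent(doc.getKey(), d -> new ArrayList<>())
                     .add(pos);   // record the word position inside this document
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new HashMap<>();   // invented sample documents
        docs.put(34, "jezebel met jezreel");
        docs.put(44, "jezreel and jezebel and jezebel");
        Map<String, Map<Integer, List<Integer>>> index = build(docs);
        System.out.println("NDOCS(jezebel) = " + index.get("jezebel").size());
        System.out.println("positions in doc 44 = " + index.get("jezebel").get(44));
    }
}
```

Each lookup of a word is a single expected O(1) HashMap access, which is the property the following slides build up to.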

The concept
Suppose we need a better
way to maintain a table
(example: a dictionary) that
supports insert and search in
O(1) time.
Big Idea in Hashing
Let S = {a1, a2, …, am} be a set of objects that
we need to map into a table of size N.
Find a function H : S → [1…N]
Ideally we’d like to have a 1-1 map
But it is not easy to find one
Also, the function must be easy to compute
It is a good idea to pick a prime as the table
size to have a better distribution of values
Assume ai is a 16-bit integer.
Of course there is a trivial map H(ai) = ai
But this may not be practical. Why?
Finding a hash Function

Assume that N = 5 and the values
we need to insert are: cab, bea, bad,
etc.
Let a=0, b=1, c=2, etc
Define H such that
H[data] = (∑ characters) Mod N
H[cab] = (2+0+1) Mod 5 = 3
H[bea] = (1+4+0) Mod 5 = 0
H[bad] = (1+0+3) Mod 5 = 4
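The map above can be sketched in a few lines of Java. This is a minimal illustration that assumes lowercase a-z input; the class name `LetterSumHash` is hypothetical.

```java
// Sketch of the slide's map: letters a=0, b=1, c=2, ... are summed and reduced mod N.
// Assumes lowercase a-z input; class and method names are illustrative.
class LetterSumHash {
    static int hash(String word, int tableSize) {
        int sum = 0;
        for (int i = 0; i < word.length(); i++) {
            sum += word.charAt(i) - 'a';   // a=0, b=1, c=2, ...
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"cab", "bea", "bad"}) {
            System.out.println("H[" + w + "] = " + hash(w, 5));   // 3, 0, 4
        }
    }
}
```

Because addition does not depend on order, any rearrangement of the same letters lands in the same cell.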
Collisions

What if the values we need to insert
are “abc”, “cba”, “bca”, etc.?
They all map to the same location,
(0+1+2) Mod 5 = 3, under our map H
(obviously H is not a good hash function)
This is called a “collision”
When collisions occur, we need to
“handle” them
Collisions can be reduced with a selection
of a good hash function
Choosing a Hash Function

A good hash function must

Be easy to compute
Avoid collisions
How do we find a good hash function?
A bad hash function
Let S be a string and H(S) = Σ Si where Si is the ith
character of S
Why is this bad?
Choosing a Hash Function?

Question
Think of hashing 10,000 five-letter words into a
table of size 10,000 using the map H defined as
follows.
H(a0a1a2a3a4) = Σ ai (i = 0, 1, …, 4)
If we use H, what would the key
distribution look like?
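One way to approach the question is to count how many table cells the sum can ever reach. The sketch below assumes lowercase words and the earlier convention a=0 … z=25; with raw character codes the range shifts but keeps the same width. The class name is hypothetical.

```java
// Sketch: range of H(a0a1a2a3a4) = sum of five letters with a=0 ... z=25 (an assumption).
class SumHashRange {
    static final int WORD_LENGTH = 5;
    static final int MAX_LETTER = 25;   // value of 'z' when a=0

    // Number of distinct hash values a five-letter lowercase word can produce.
    static int distinctValues() {
        int min = 0;                          // "aaaaa"
        int max = WORD_LENGTH * MAX_LETTER;   // "zzzzz" gives 125
        return max - min + 1;                 // every sum in between is reachable
    }

    public static void main(String[] args) {
        int tableSize = 10000;
        int used = distinctValues();
        System.out.println(used + " of " + tableSize + " cells can ever be used");
        System.out.println("average words per used cell: " + (double) tableSize / used);
    }
}
```

Only 126 of the 10,000 cells are reachable, so the words pile up in a small corner of the table and the rest stays empty.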
Choosing a Hash Function
Suppose we need to hash a set of strings
S = {Si} to a table of size N
H(Si) = (Σj Si[j]·d^j) mod N, where Si[j] is
the jth character of string Si and d is a fixed base
How expensive is it to compute this function?
• cost with direct calculation
• Is it always possible to do direct calculation?
Is there a cheaper way to calculate this? Hint:
use Horner's rule.
Code
// Polynomial hash of key in base 128, evaluated with Horner's rule.
public static int hash(String key, int n) {
    int value = 0;
    for (int i = 0; i < key.length(); i++) {
        // reduce mod n at every step to keep the intermediate value small
        value = (value * 128 + key.charAt(i)) % n;
    }
    return value;
}
What does this function return if “guna” is
hashed into a table of size 101?
What is the complexity of the code in terms of string
length?
What are some of the problems with this
function?
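The first question can be worked by hand, one character at a time. The sketch below repeats the same loop in a hypothetical class `HornerTrace` and lists each step in comments.

```java
// Same Horner-style loop as the slide's code, with the "guna" trace in comments.
class HornerTrace {
    static int hash(String key, int n) {
        int value = 0;
        for (int i = 0; i < key.length(); i++) {
            value = (value * 128 + key.charAt(i)) % n;
        }
        return value;
    }

    public static void main(String[] args) {
        // Character codes: 'g'=103, 'u'=117, 'n'=110, 'a'=97
        // (0*128  + 103) % 101 =   103 % 101 = 2
        // (2*128  + 117) % 101 =   373 % 101 = 70
        // (70*128 + 110) % 101 =  9070 % 101 = 81
        // (81*128 +  97) % 101 = 10465 % 101 = 62
        System.out.println(hash("guna", 101));   // prints 62
    }
}
```

The loop does one multiply, one add, and one mod per character, so the cost is linear in the string length. One problem worth noting: for table sizes above roughly 16.7 million, `value * 128` can overflow an `int` before the mod is taken.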
Collisions

Hash functions can be many-to-1
They can map different search keys to
the same hash key.
hash1('a') == 9 == hash1('w')
Must compare the search key with
the record found
If the match fails, there is a collision
Collision Resolving strategies

Separate chaining
Open addressing
  Linear probing
  Quadratic probing
  Double hashing
  Etc.
Separate Chaining

Collisions can be resolved by
creating a list of keys that map to
the same hash value
Separate Chaining

Use an array of linked lists
LinkedList[] Table;
Table = new LinkedList[N], where N is the
table size
Define the load factor of the table as
λ = number of keys / size of the table
(λ can be more than 1)
Still need a good hash function to
distribute keys evenly,
for both search and updates
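A minimal separate-chaining sketch for String keys might look like the following. The class name `ChainedTable` and its methods are illustrative, and using `String.hashCode()` as the hash function is an assumption, not something the slides prescribe.

```java
import java.util.LinkedList;

// Minimal separate-chaining sketch for String keys (names are illustrative).
class ChainedTable {
    private final LinkedList<String>[] table;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedTable(int n) {
        table = new LinkedList[n];                 // array of N chains
        for (int i = 0; i < n; i++) table[i] = new LinkedList<>();
    }

    private int index(String key) {
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), table.length);
    }

    boolean insert(String key) {
        LinkedList<String> chain = table[index(key)];
        if (chain.contains(key)) return false;     // already present
        chain.add(key);
        size++;
        return true;
    }

    boolean contains(String key) {
        return table[index(key)].contains(key);    // scan only one chain
    }

    double loadFactor() {
        return (double) size / table.length;       // may exceed 1
    }
}
```

With an even distribution, each chain holds about λ keys on average, so a search scans roughly λ entries rather than the whole table.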
Linear Probing
The idea:
Table remains a simple array of size N
On insert(x), compute f(x) mod N; if
the cell is full, find another by
sequentially searching for the next
available slot
• Go to (f(x)+1) mod N, (f(x)+2) mod N, etc.
On find(x), compute f(x) mod N; if the
cell doesn’t match, keep probing the following
cells until x or an empty cell is found.
The linear probing function can be given by
• h(x, i) = (f(x) + i) mod N (i = 1, 2, …)
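The insert and find steps above can be sketched as follows for non-negative int keys. Taking f(x) = x is an assumption made for brevity, and the class name `LinearProbingTable` is hypothetical; the loop starts at i = 0 so that the home cell is tried first.

```java
// Linear probing sketch for non-negative int keys; f(x) = x is an assumption.
class LinearProbingTable {
    private final Integer[] cells;   // null means empty

    LinearProbingTable(int n) { cells = new Integer[n]; }

    // Returns the slot used, or -1 if the table is full.
    int insert(int x) {
        int n = cells.length;
        for (int i = 0; i < n; i++) {
            int slot = (x + i) % n;               // h(x, i) = (f(x) + i) mod N
            if (cells[slot] == null || cells[slot] == x) {
                cells[slot] = x;
                return slot;
            }
        }
        return -1;
    }

    // Returns the slot holding x, or -1 if x is absent.
    int find(int x) {
        int n = cells.length;
        for (int i = 0; i < n; i++) {
            int slot = (x + i) % n;
            if (cells[slot] == null) return -1;   // an empty cell ends the search
            if (cells[slot] == x) return slot;
        }
        return -1;
    }
}
```

Note that find relies on the probe sequence being unbroken: it stops at the first empty cell, which matters once deletions are allowed.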
Figure 20.4: Linear probing hash table after each insertion
(Data Structures & Problem Solving Using Java, 2/E, Mark Allen Weiss © 2002 Addison Wesley)
Linear Probing Example
Consider H(key) = key Mod 6 (assume N=6)
H(11)=5, H(10)=4, H(17)=5, H(16)=4,H(23)=5
Draw the hash table after each insertion
[Figure: blank tables with cells 0 to 5, one per step]
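To check a hand-drawn answer, the five insertions can be replayed with a short trace. This is a sketch with a hypothetical class name; it assumes there are fewer keys than cells, so the probe loop always terminates.

```java
import java.util.Arrays;

// Trace of the example: H(key) = key mod 6 with linear probing.
class LinearProbingExample {
    static Integer[] insertAll(int[] keys, int n) {
        Integer[] table = new Integer[n];
        for (int key : keys) {
            int slot = key % n;
            while (table[slot] != null) {      // assumes fewer keys than cells
                slot = (slot + 1) % n;         // wrap around past the last cell
            }
            table[slot] = key;
            System.out.println("insert " + key + ": " + Arrays.toString(table));
        }
        return table;
    }

    public static void main(String[] args) {
        insertAll(new int[] {11, 10, 17, 16, 23}, 6);
        // final table: [17, 16, 23, null, 10, 11]
    }
}
```

Keys 17, 16 and 23 all wrap around to the front of the table, and each one has to walk past every key inserted before it.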
Clustering Problem
• Clustering is a significant problem in linear probing. Why?
• Illustration of primary clustering in linear probing (b) versus no clustering
(a) and the less significant secondary clustering in quadratic probing (c).
Long lines represent occupied cells, and the load factor is 0.7.

Deleting
How about deleting items from a hash
table?
An item in a hash table connects to others
in the table (e.g., as in a BST).
Deleting an item will affect finding the
others
“Lazy delete”: just mark the item as
inactive rather than removing it.
Lazy Delete

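Naïve removal can leave gaps!

Slot   Start   Insert f   Remove e   Find f
0      0 a     0 a        0 a        0 a
1
2      2 b     2 b        2 b        2 b
3      3 c     3 c        3 c        3 c
4      3 e     3 e
5      5 d     5 d        5 d        5 d
6              3 f        3 f        3 f
7
8      8 j     8 j        8 j        8 j
9      8 u     8 u        8 u        8 u
10     10 g    10 g       10 g       10 g
11     8 s     8 s        8 s        8 s

“3 f” means search key f and hash key 3
Find f starts at slot 3, reaches the now-empty slot 4, and stops without ever seeing f in slot 6.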

Lazy Delete

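Clever removal

Slot   Start   Insert f   Remove e   Find f
0      0 a     0 a        0 a        0 a
1
2      2 b     2 b        2 b        2 b
3      3 c     3 c        3 c        3 c
4      3 e     3 e        gone       gone
5      5 d     5 d        5 d        5 d
6              3 f        3 f        3 f
7
8      8 j     8 j        8 j        8 j
9      8 u     8 u        8 u        8 u
10     10 g    10 g       10 g       10 g
11     8 s     8 s        8 s        8 s

“3 f” means search key f and hash key 3
Find f probes past the “gone” marker in slot 4 and reaches f in slot 6.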
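The lazy-delete idea can be sketched with a tombstone marker. In the hypothetical class below, each key is passed together with its hash key so the slide's example can be replayed directly; the sketch assumes keys are unique.

```java
// Lazy deletion sketch with linear probing. Keys carry an explicit hash key
// so the slide's example ("3 f" = key f, hash key 3) can be replayed.
class LazyDeleteTable {
    private static final String GONE = new String("<gone>");   // tombstone marker
    private final String[] cells;

    LazyDeleteTable(int n) { cells = new String[n]; }

    // Assumes key is not already in the table.
    void insert(String key, int hashKey) {
        for (int i = 0; i < cells.length; i++) {
            int slot = (hashKey + i) % cells.length;
            if (cells[slot] == null || cells[slot] == GONE) {
                cells[slot] = key;             // tombstones may be reused
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    // Returns the slot of key, or -1. Tombstones do not stop the search.
    int find(String key, int hashKey) {
        for (int i = 0; i < cells.length; i++) {
            int slot = (hashKey + i) % cells.length;
            if (cells[slot] == null) return -1;
            if (cells[slot] != GONE && cells[slot].equals(key)) return slot;
        }
        return -1;
    }

    void remove(String key, int hashKey) {
        int slot = find(key, hashKey);
        if (slot >= 0) cells[slot] = GONE;     // mark inactive, keep the cell occupied
    }
}
```

Replaying the slide on a table of 12 cells (insert a, b, c, e, d, f, then remove e) leaves f findable in slot 6, because the marker in slot 4 keeps the probe sequence intact.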

Load Factor (open addressing)
Definition: the load factor λ of a probing
hash table is the fraction of the table
that is full. The load factor ranges from 0
(empty) to 1 (completely full).
It is better to keep the load factor under
0.7
Double the table size and rehash if the load
factor gets high
The cost of the hash function f(x) must be
minimized
When collisions occur, linear probing can
always find an empty cell, as long as the table is not full
But clustering can be a problem
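The doubling step can be sketched as below for a linear-probing set of non-negative ints. The starting capacity of 8, the choice f(x) = x, and all names are assumptions for illustration; only the 0.7 threshold and the doubling come from the slide.

```java
// Sketch: a linear-probing set of non-negative ints that doubles and rehashes
// before the load factor would exceed 0.7.
class ResizingTable {
    private Integer[] cells = new Integer[8];   // starting capacity is arbitrary
    private int count = 0;

    void insert(int x) {
        if (contains(x)) return;
        if ((count + 1) > 0.7 * cells.length) rehash();
        place(cells, x);
        count++;
    }

    boolean contains(int x) {
        for (int i = 0; i < cells.length; i++) {
            Integer c = cells[(x + i) % cells.length];
            if (c == null) return false;
            if (c == x) return true;
        }
        return false;
    }

    // Linear probing into a table that is known to have a free cell.
    private static void place(Integer[] table, int x) {
        int slot = x % table.length;
        while (table[slot] != null) slot = (slot + 1) % table.length;
        table[slot] = x;
    }

    private void rehash() {
        Integer[] old = cells;
        cells = new Integer[old.length * 2];
        for (Integer x : old) {
            if (x != null) place(cells, x);     // every key gets a new home cell
        }
    }

    int capacity() { return cells.length; }
    double loadFactor() { return (double) count / cells.length; }
}
```

Keys cannot simply be copied cell by cell into the larger array: the home cell depends on the table size, so each key has to be hashed again.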
Quadratic Probing
Quadratic probing
Another open addressing method
Resolve collisions by examining
cells 1, 4, 9, … away from the original
probe point
Collision policy:
Define h0(k), h1(k), h2(k), h3(k), …
where hi(k) = (hash(k) + i²) mod size
Caveat:
May not find a vacant cell!
• Table must be less than half full (λ < ½)
(Linear probing always finds a cell.)
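A minimal sketch of this collision policy for non-negative int keys follows. Taking hash(k) = k mod size is an assumption, the table size is expected to be prime, and the class name is hypothetical.

```java
// Quadratic probing sketch for non-negative int keys; hash(k) = k mod size is
// an assumption, and the table size is expected to be prime.
class QuadraticProbingTable {
    private final Integer[] cells;   // null means empty

    QuadraticProbingTable(int primeSize) { cells = new Integer[primeSize]; }

    // Returns the slot used, or -1 if no vacant cell was found.
    int insert(int k) {
        int size = cells.length;
        for (int i = 0; i < size; i++) {
            int slot = (int) ((k % size + (long) i * i) % size);   // h_i(k)
            if (cells[slot] == null || cells[slot] == k) {
                cells[slot] = k;
                return slot;
            }
        }
        return -1;
    }

    // Returns the slot holding k, or -1 if k is absent.
    int find(int k) {
        int size = cells.length;
        for (int i = 0; i < size; i++) {
            int slot = (int) ((k % size + (long) i * i) % size);
            if (cells[slot] == null) return -1;
            if (cells[slot] == k) return slot;
        }
        return -1;
    }
}
```

With size 7, the keys 3, 10 and 17 all hash to cell 3 and end up in cells 3, 4 and 0, which are offsets 0, 1 and 4 from the original probe point.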
Quadratic probing
Another issue
Suppose the table size is 16.
Probe offsets that will be tried:
1 mod 16 = 1
4 mod 16 = 4
9 mod 16 = 9
16 mod 16 = 0
25 mod 16 = 9
36 mod 16 = 4
49 mod 16 = 1
64 mod 16 = 0
81 mod 16 = 1
Only four different values (0, 1, 4, 9)!
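The same count can be reproduced for any table size. The sketch below compares size 16 with the prime 17, which is chosen here only for illustration; the class name is hypothetical.

```java
import java.util.TreeSet;

// Collects the distinct probe offsets i*i mod size for i = 1 .. size.
class QuadraticOffsets {
    static TreeSet<Integer> offsets(int size) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int i = 1; i <= size; i++) {
            seen.add((i * i) % size);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("size 16: " + offsets(16));   // [0, 1, 4, 9]
        System.out.println("size 17: " + offsets(17));   // 9 distinct offsets
    }
}
```

A table of size 16 can only ever reach 4 cells from a given starting point, while the prime size 17 reaches 9, more than half the table.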
Figure 20.6: A quadratic probing hash table after each insertion (note that the table size was poorly chosen because it is not a prime number).
