Hashing: 15-111 Data Structures
15-111
Data Structures
Ananda Gunawardena
Hashing
Why do we need hashing?
Many applications deal with lots of data
Search engines and web pages
There are myriad lookups.
The lookups are time-critical.
Typical data structures like arrays and
lists may not be sufficient to handle
efficient lookups
In general: when lookups need to
occur in near-constant time, O(1)
Why do we need hashing?
Consider the Internet (2002 data):
According to the Internet Software Consortium
survey at http://www.isc.org/ in 2001,
there are 125,888,197 Internet hosts,
and the number is growing by 20%
every six months!
Even with the best possible binary search,
it takes up to 27 iterations (log2 of
125,888,197 is about 26.9) to find an entry.
According to a survey by NUA at
http://www.nua.ie/, there are 513.41
million users worldwide.
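The binary-search figure above can be checked with a few lines of Java. This is a minimal sketch, not part of the original slides; the class and method names are hypothetical. It counts how many times a search range of n entries can be halved before it is empty, which is the worst-case number of binary search iterations.

```java
class BinarySearchSteps {
    // Worst-case iterations of binary search over n sorted entries:
    // each iteration halves the remaining range until nothing is left.
    static int maxSteps(long n) {
        int steps = 0;
        while (n > 0) {
            n /= 2;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        // Host count quoted on the slide.
        System.out.println(maxSteps(125_888_197L)); // prints 27
    }
}
```

A hash table aims to replace those 27 steps with a near-constant number of probes.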
Why do we need hashing?
Solution: Hashing
In fact hashing is used in:
[Figure: excerpt of a word-count table with entries "jezoar 1", ..., "jezrahliah 1", "jezreel 39", and the lookup key "jezoar"]
Question
Think of hashing 10,000 five-letter words into a
table of size 10,000 using the map H defined as
follows.
$H(a_0a_1a_2a_3a_4) = \sum_{i=0}^{4} a_i$
If we use H, what would the key
distribution look like?
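One way to explore the question is to compute the smallest and largest values H can take. The sketch below assumes each a_i is the character code of a lowercase ASCII letter ('a' = 97 through 'z' = 122); that encoding is an assumption, and the class and method names are hypothetical.

```java
class SumHashDemo {
    // H(word) = sum of the character codes of the word (assumed ASCII lowercase).
    static int sumHash(String word) {
        int sum = 0;
        for (int i = 0; i < word.length(); i++) {
            sum += word.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int min = sumHash("aaaaa"); // smallest possible sum
        int max = sumHash("zzzzz"); // largest possible sum
        // prints: 485 610 126
        System.out.println(min + " " + max + " " + (max - min + 1));
    }
}
```

Under this assumption H can produce only 126 distinct values, so at most 126 of the 10,000 slots are ever used and every anagram pair collides.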
Choosing a Hash Function
Suppose we need to hash a set of strings
$S = \{S_i\}$ to a table of size $N$
$H(S_i) = \left(\sum_j S_i[j] \cdot d^j\right) \bmod N$, where $S_i[j]$ is
the $j$th character of string $S_i$
How expensive is it to compute this function?
• cost with direct calculation
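The cost question can be made concrete with a short Java sketch. It reads d as a fixed integer base, which is an assumption because the slide does not define d. The direct version recomputes each power of d from scratch; the second version uses Horner's rule, a standard technique that is not named in the text above and is shown here only for comparison. All names are hypothetical.

```java
class PolyHashDemo {
    // Direct calculation: recompute d^j for every term, so a string of
    // length L costs on the order of L^2 multiplications.
    static int directHash(String s, int d, int n) {
        long sum = 0;
        for (int j = 0; j < s.length(); j++) {
            long power = 1;
            for (int k = 0; k < j; k++) {
                power = (power * d) % n;
            }
            sum = (sum + s.charAt(j) * power) % n;
        }
        return (int) sum;
    }

    // Horner's rule: work from the last character back to the first,
    // using one multiplication per character.
    static int hornerHash(String s, int d, int n) {
        long h = 0;
        for (int j = s.length() - 1; j >= 0; j--) {
            h = (h * d + s.charAt(j)) % n;
        }
        return (int) h;
    }

    public static void main(String[] args) {
        // "ab" with d = 10, N = 1000: (97 + 98 * 10) mod 1000 = 77
        System.out.println(directHash("ab", 10, 1000)); // prints 77
        System.out.println(hornerHash("ab", 10, 1000)); // prints 77
    }
}
```

Both methods return the same value; only the number of multiplications differs.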
Separate chaining
Open addressing
Linear Probing
Quadratic Probing
Double Hashing (double probing)
Etc.
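The open addressing variants listed above differ only in which slots they try after a collision. The sketch below prints the first few slots of each probe sequence using the usual textbook formulas: (h + i) mod N for linear probing, (h + i^2) mod N for quadratic probing, and (h + i * step) mod N for double hashing, where step stands in for the value of a second hash function. The class, method names, and sample values are hypothetical.

```java
import java.util.Arrays;

class ProbeSequences {
    // Linear probing: try h, h+1, h+2, ... (mod n).
    static int[] linear(int h, int n, int count) {
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (h + i) % n;
        }
        return seq;
    }

    // Quadratic probing: try h, h+1, h+4, h+9, ... (mod n).
    static int[] quadratic(int h, int n, int count) {
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (h + i * i) % n;
        }
        return seq;
    }

    // Double hashing: try h, h+step, h+2*step, ... (mod n),
    // where step is assumed to come from a second hash function.
    static int[] doubleHashing(int h, int step, int n, int count) {
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (h + i * step) % n;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linear(5, 11, 5)));           // [5, 6, 7, 8, 9]
        System.out.println(Arrays.toString(quadratic(5, 11, 5)));        // [5, 6, 9, 3, 10]
        System.out.println(Arrays.toString(doubleHashing(5, 3, 11, 5))); // [5, 8, 0, 3, 6]
    }
}
```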
Separate Chaining
Data Structures & Problem Solving using JAVA/2E Mark Allen Weiss © 2002 Addison Wesley
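As a rough illustration of separate chaining, the sketch below keeps one linked list per table slot and appends colliding keys to the same list. It is a minimal sketch for integer keys with H(key) = key mod N, not the textbook's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedHashSet(int n) {
        for (int i = 0; i < n; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // H(key) = key mod N; floorMod keeps the index non-negative.
    private int indexOf(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Appends the key to its chain unless it is already there.
    boolean insert(int key) {
        LinkedList<Integer> chain = buckets.get(indexOf(key));
        if (chain.contains(key)) {
            return false;
        }
        chain.add(key);
        return true;
    }

    boolean contains(int key) {
        return buckets.get(indexOf(key)).contains(key);
    }

    boolean remove(int key) {
        return buckets.get(indexOf(key)).remove(Integer.valueOf(key));
    }

    int chainLength(int index) {
        return buckets.get(index).size();
    }

    public static void main(String[] args) {
        ChainedHashSet set = new ChainedHashSet(6);
        for (int key : new int[]{11, 10, 17, 16, 23}) {
            set.insert(key);
        }
        // 11, 17, 23 share slot 5; 10 and 16 share slot 4.
        System.out.println(set.chainLength(5) + " " + set.chainLength(4)); // prints 3 2
    }
}
```

Collisions never displace other keys here; they only lengthen a chain, so lookup cost grows with chain length rather than with cluster size.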
Linear Probing Example
Consider H(key) = key mod 6 (assume N = 6)
H(11) = 5, H(10) = 4, H(17) = 5, H(16) = 4, H(23) = 5
Draw the hash table
[Figure: six blank hash tables, each with slots numbered 0 to 5]
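The exercise can be checked by simulating the insertions. The sketch below assumes the keys are inserted in the order listed (11, 10, 17, 16, 23) and that a collision moves to the next slot, wrapping around at the end of the table; the class and method names are hypothetical.

```java
import java.util.Arrays;

class LinearProbingExample {
    // Inserts keys into a table of size n with H(key) = key mod n and
    // linear probing. Assumes non-negative keys and at most n of them,
    // otherwise the probe loop would never find a free slot.
    static Integer[] insertAll(int[] keys, int n) {
        Integer[] table = new Integer[n]; // null marks an empty slot
        for (int key : keys) {
            int i = key % n;
            while (table[i] != null) {
                i = (i + 1) % n; // step to the next slot, wrapping around
            }
            table[i] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(new int[]{11, 10, 17, 16, 23}, 6);
        System.out.println(Arrays.toString(table)); // prints [17, 16, 23, null, 10, 11]
    }
}
```

Key 23 hashes to slot 5 but ends up in slot 2 after probing slots 5, 0, and 1, which already hints at how occupied runs grow.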
Clustering Problem
• Clustering is a significant problem in linear probing. Why?
• Illustration of primary clustering in linear probing (b) versus no clustering
(a) and the less significant secondary clustering in quadratic probing (c).
Long lines represent occupied cells, and the load factor is 0.7.
Linear Probing
2b    2b    2b    2b
3c    3c    3c    3c
3e    3e    -     -
5d    5d    5d    5d
-     3f    3f    3f
8j    8j    8j    8j
8u    8u    8u    8u
10g   10g   10g   10g
8s    8s    8s    8s
Clever removal
Initial   Insert f   Remove e   Find f
0a        0a         0a         0a
2b        2b         2b         2b
3c        3c         3c         3c
3e        3e         gone       gone
5d        5d         5d         5d
-         3f         3f         3f
8j        8j         8j         8j
8u        8u         8u         8u
10g       10g        10g        10g
8s        8s         8s         8s
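The "gone" marker in the table above reads like lazy deletion: a removed cell is flagged rather than emptied, so a later search keeps probing past it. The sketch below illustrates that idea for integer keys with H(key) = key mod N. It is an interpretation of the slide, not the textbook's code, and all names are hypothetical. Keys 3, 14, and 25 play the roles of c, e, and f, since all three hash to slot 3 when N = 11.

```java
class LazyDeletionTable {
    private static final int EMPTY = 0, OCCUPIED = 1, DELETED = 2;
    private final int[] keys;
    private final int[] state; // EMPTY, OCCUPIED, or DELETED ("gone")

    LazyDeletionTable(int n) {
        keys = new int[n];
        state = new int[n];
    }

    // Returns the slot holding key, or -1. Probing skips DELETED cells
    // and stops only at an EMPTY cell or after a full pass.
    int find(int key) {
        int n = keys.length;
        int i = Math.floorMod(key, n);
        for (int probes = 0; probes < n; probes++) {
            if (state[i] == EMPTY) {
                return -1;
            }
            if (state[i] == OCCUPIED && keys[i] == key) {
                return i;
            }
            i = (i + 1) % n;
        }
        return -1;
    }

    // Places key in the first cell that is not OCCUPIED (EMPTY or DELETED).
    boolean insert(int key) {
        if (find(key) >= 0) {
            return false; // already present
        }
        int n = keys.length;
        int i = Math.floorMod(key, n);
        for (int probes = 0; probes < n; probes++) {
            if (state[i] != OCCUPIED) {
                keys[i] = key;
                state[i] = OCCUPIED;
                return true;
            }
            i = (i + 1) % n;
        }
        return false; // table is full
    }

    // Marks the cell as DELETED instead of EMPTY so later searches continue past it.
    boolean remove(int key) {
        int i = find(key);
        if (i < 0) {
            return false;
        }
        state[i] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        LazyDeletionTable t = new LazyDeletionTable(11);
        t.insert(3);   // lands in slot 3
        t.insert(14);  // collides at 3, lands in slot 4
        t.insert(25);  // collides at 3 and 4, lands in slot 5
        t.remove(14);  // slot 4 becomes "gone"
        System.out.println(t.find(25)); // prints 5: the search walks past slot 4
    }
}
```

If remove set the cell back to EMPTY instead, find(25) would stop at slot 4 and wrongly report that 25 is missing, which is what the first table shows for key f.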