15 HashTables
Hash Tables – Main Idea
• Main idea: Use the key (string or number) to
index directly into an array – O(1) time to
access records
// Returns the index of
// the key in the table
int HashFunction(int key){
  return key%N;
} //end-HashFunction
• Hash Table, N = 11 (slot: keys)
0:
1: 45
2:
3: 3
4:
5: 16
6:
7:
8:
9: 20, 42
10:
• Keys 20 & 42 both map
to the same slot – 9
return hashCode % N;
} //end-HashFunction
Hash Table Size
• We need to make sure that the hash table is big
enough for all keys and that its size helps the
hash function distribute the keys evenly
– What if TableSize is 10 and all keys end in 0?
• All keys would map to the same slot!
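The effect of the table size can be checked with a small sketch. It assumes non-negative integer keys and the key % TableSize hash from the earlier slide; the helper name CountUsedSlots and the 64-slot limit are hypothetical and only for illustration.

```c
#include <stddef.h>

// Counts how many distinct slots the keys occupy when hashed
// with key % tableSize. Hypothetical helper for illustration.
// Assumes non-negative keys and 0 < tableSize <= 64.
int CountUsedSlots(const int *keys, size_t n, int tableSize){
  int used[64] = {0};
  int count = 0;
  for (size_t i = 0; i < n; i++){
    int slot = keys[i] % tableSize;
    if (!used[slot]){
      used[slot] = 1;
      count++;
    }
  }
  return count;
} //end-CountUsedSlots
```

With the keys {10, 20, 30, 40, 50, 60}, a table of size 10 uses a single slot, while a table of size 11 uses six different slots.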
Properties of Good Hash Functions
• Should be efficiently computable – O(1) time
• Should hash evenly throughout hash table
• Should utilize all slots in the table
• Should minimize collisions
Collisions and their Resolution
• A collision occurs when two different keys hash to the
same value
– E.g. For TableSize = 17, the keys 18 and 35 hash to the same
value
– 18 mod 17 = 1 and 35 mod 17 = 1
• To find an item: compute the hash value,
then do Find on the linked list
[Figure: slot j holds the chain k2 → k3 → k4 → nil; Hash(k5)=Hash(k6)=k, so k5 and k6 share the chain at slot k]
Example
• In-Class Example: Insert {18, 19, 20, 29, 30, 31}
into empty hash table with TableSize = 10 using
separate chaining
• 18%10=8
• 19%10=9
• 20%10=0
• 29%10=9
• 30%10=0
• 31%10=1
• Resulting table (slot: chain)
0: 20 → 30
1: 31
2:
3:
4:
5:
6:
7:
8: 18
9: 19 → 29
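The example above can be reproduced with a minimal separate chaining sketch. It assumes non-negative integer keys and Hash(key) = key % TableSize; the names Node, ChainInsert and ChainFind are hypothetical. New keys are appended at the tail of a chain so that the chain order matches the table in the example; inserting at the head is an equally valid choice.

```c
#include <stdlib.h>

#define TABLE_SIZE 10

typedef struct Node {
  int key;
  struct Node *next;
} Node;

// Appends key to the end of the chain at slot key % TABLE_SIZE.
// Returns 1 on success, 0 if memory allocation fails.
int ChainInsert(Node *table[], int key){
  Node *node = malloc(sizeof *node);
  if (node == NULL) return 0;
  node->key = key;
  node->next = NULL;

  // Walk to the last link of the chain and attach the new node
  Node **link = &table[key % TABLE_SIZE];
  while (*link != NULL) link = &(*link)->next;
  *link = node;
  return 1;
} //end-ChainInsert

// Computes the hash value, then does Find on the linked list.
// Returns the node holding key, or NULL if key is not stored.
Node *ChainFind(Node *table[], int key){
  for (Node *p = table[key % TABLE_SIZE]; p != NULL; p = p->next)
    if (p->key == key) return p;
  return NULL;
} //end-ChainFind
```

The table must start out with every slot set to NULL, for example `Node *table[TABLE_SIZE] = {0};`.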
Load Factor of a Hash Table
• Let N = number of items to be stored
• Load factor LF = N/TableSize
• Suppose TableSize = 2 and number of items N = 10
– LF = 5
• Suppose TableSize = 10 and number of items N = 2
– LF = 0.2
• Average length of chained list = LF
• Average time for accessing an item = O(1) + O(LF)
– Want LF to be close to 1 (i.e. TableSize ~N)
– But chaining continues to work for LF > 1
Collision Resolution by Open Addressing
• Linked lists can take up a lot of space…
• Open addressing (or probing): When collision occurs,
try alternative cells in the array until an empty cell is
found
• Given an item X, try cells h0(X), h1(X), h2(X), …, hi(X)
• hi(X) = (Hash(X) + F(i)) mod TableSize
• Define F(0) = 0
• F is the collision resolution function. Three
possibilities:
– Linear: F(i) = i
– Quadratic: F(i) = i^2
– Double Hashing: F(i) = i*Hash2(X)
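The three choices of F can be written as one probe function. This is only a sketch: it assumes non-negative integer keys, Hash(X) = X % TableSize, and the second hash Hash2(X) = R - (X % R) that the double hashing example later in these slides uses. The names ProbeKind and Probe are hypothetical.

```c
typedef enum { LINEAR, QUADRATIC, DOUBLE_HASHING } ProbeKind;

// Second hash function, assumed to be R - (key % R) with R > 0
int Hash2(int key, int R){
  return R - (key % R);
} //end-Hash2

// Returns h_i(X) = (Hash(X) + F(i)) mod tableSize.
// R is only used for double hashing.
int Probe(int key, int i, int tableSize, ProbeKind kind, int R){
  int f;
  switch (kind){
    case LINEAR:    f = i;                break;  // F(i) = i
    case QUADRATIC: f = i * i;            break;  // F(i) = i^2
    default:        f = i * Hash2(key, R); break; // F(i) = i*Hash2(X)
  }
  return (key % tableSize + f) % tableSize;
} //end-Probe
```

In all three cases F(0) = 0, so the first cell tried is always Hash(X).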
Open Addressing I: Linear Probing
• Main Idea: When collision occurs, scan down the
array one cell at a time looking for an empty cell
Example
• In-Class Example: Insert {18, 19, 20, 29, 30, 31}
into empty hash table with TableSize = 10 using
linear probing (the worked table then also inserts 28):
• hi(X) = (Hash(X) + i) mod TableSize (i = 0, 1, 2, …)
• 18, 19 and 20 go straight to the empty slots 8, 9 and 0
• 29:
– (29+0)%10=9 i=0
– (29+1)%10=0 i=1
– (29+2)%10=1 i=2
• 30:
– (30+0)%10=0 i=0
– (30+1)%10=1 i=1
– (30+2)%10=2 i=2
• 31:
– (31+0)%10=1 i=0
– (31+1)%10=2 i=1
– (31+2)%10=3 i=2
• 28:
– (28+0)%10=8 i=0
– (28+1)%10=9 i=1
– (28+2)%10=0 i=2
– (28+3)%10=1 i=3
– (28+4)%10=2 i=4
– (28+5)%10=3 i=5
– (28+6)%10=4 i=6
• Resulting table (slot: key)
0: 20
1: 29
2: 30
3: 31
4: 28
5:
6:
7:
8: 18
9: 19
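The linear probing example can be replayed with a short sketch. It assumes non-negative integer keys, Hash(X) = X % TableSize, and a sentinel value EMPTY for free cells; the names EMPTY and LinearInsert are hypothetical.

```c
#define EMPTY (-1)  // marks a free cell; assumes keys are non-negative

// Inserts key with linear probing: tries Hash(key), Hash(key)+1, ...
// (mod tableSize) until an empty cell is found.
// Returns the slot used, or -1 if the table is full.
int LinearInsert(int table[], int tableSize, int key){
  for (int i = 0; i < tableSize; i++){
    int slot = (key % tableSize + i) % tableSize;
    if (table[slot] == EMPTY){
      table[slot] = key;
      return slot;
    }
  }
  return -1;
} //end-LinearInsert
```

Starting from a table of 10 cells all set to EMPTY, inserting 18, 19, 20, 29, 30, 31 and then 28 places the keys in slots 8, 9, 0, 1, 2, 3 and 4. Every key after 20 lands in the same growing run of occupied cells, which is the clustering the next slides discuss.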
Load Factor Analysis of Linear Probing
• Recall: Load factor LF = N/TableSize
Drawbacks of Linear Probing
• Works until array is full, but as number of items N
approaches TableSize (LF ~ 1), access time approaches
O(N)
Open Addressing II: Quadratic Probing
• Main Idea: Spread out the search for an empty slot:
increment by i^2 instead of i
• hi(X) = (Hash(X) + i^2) mod TableSize (i = 0, 1, 2, …)
– No primary clustering but secondary clustering possible
• Example 1: Insert {18, 19, 20, 29, 30, 31} into empty
hash table with TableSize = 10
• Example 2: Insert {1, 2, 5, 10, 17} with TableSize = 16
Example
• In-Class Example: Insert {18, 19, 20, 29, 30, 31}
into empty hash table with TableSize = 10 using
quadratic probing:
• hi(X) = (Hash(X) + i^2) mod TableSize (i = 0, 1, 2, …)
• 29:
– (29+0)%10=9 i=0
– (29+1)%10=0 i=1
– (29+4)%10=3 i=2
• 30:
– (30+0)%10=0 i=0
– (30+1)%10=1 i=1
• 31:
– (31+0)%10=1 i=0
– (31+1)%10=2 i=1
• Resulting table (slot: key)
0: 20
1: 30
2: 31
3: 29
4:
5:
6:
7:
8: 18
9: 19
Example
• In-Class Example: Insert {1, 2, 5, 10, 17} with
TableSize = 16 using quadratic probing:
• hi(X) = (Hash(X) + i^2) mod TableSize (i = 0, 1, 2, …)
• (1+0)%16=1 i=0
• (2+0)%16=2 i=0
• (5+0)%16=5 i=0
• (10+0)%16=10 i=0
• (17+0)%16=1 i=0
• (17+1)%16=2 i=1
• (17+4)%16=5 i=2
• (17+9)%16=10 i=3
• (17+16)%16=1 i=4
• (17+25)%16=10 i=5
• 17 only ever probes slots 1, 2, 5 and 10 (i^2 mod 16 is always 0, 1, 4 or 9), and all four are occupied, so it is never placed even though most of the table is empty
• Table (slot: key); all other slots are empty
1: 1
2: 2
5: 5
10: 10
• Theorem: If TableSize is prime and LF < 0.5, quadratic
probing will always find an empty slot
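Both quadratic probing examples can be checked with a short sketch. It assumes non-negative integer keys, Hash(X) = X % TableSize, and a sentinel EMPTY for free cells; the names EMPTY and QuadraticInsert are hypothetical. The loop stops after TableSize probes because i^2 mod TableSize repeats with period TableSize, so later probes cannot reach any new slot.

```c
#define EMPTY (-1)  // marks a free cell; assumes keys are non-negative

// Inserts key with quadratic probing: tries (Hash(key) + i^2) mod tableSize
// for i = 0, 1, 2, ... Returns the slot used, or -1 if none of the
// reachable cells is empty.
int QuadraticInsert(int table[], int tableSize, int key){
  for (int i = 0; i < tableSize; i++){
    int slot = (key % tableSize + i * i) % tableSize;
    if (table[slot] == EMPTY){
      table[slot] = key;
      return slot;
    }
  }
  return -1;
} //end-QuadraticInsert
```

With TableSize = 16 and the keys 1, 2, 5, 10 already stored, inserting 17 returns -1 although 12 cells are free. TableSize = 16 is not prime, so the theorem's guarantee does not apply.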
Open Addressing III: Double Hashing
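• Idea: Spread out the search for an empty slot by using
a second hash function
– No primary or secondary clustering
• Try this example: Insert {18, 19, 20, 29, 30, 31} into
empty hash table with TableSize = 10 and R = 7
(the worked table then also inserts 28)
• Here Hash2(X) = R - (X%R)
• 29:
– (29+0*hash2(29))%10=9 i=0
– Hash2(29)=7-(29%7)=7-1=6
– (29+1*hash2(29))%10=5 i=1
• 31:
– (31+0*hash2(31))%10=1 i=0
• 28:
– (28+0*hash2(28))%10=8 i=0
– Hash2(28)=7-(28%7)=7-0=7
– (28+1*hash2(28))%10=5 i=1
– (28+2*hash2(28))%10=2 i=2
• 30 does not appear in the table: Hash2(30)=7-(30%7)=5, so its probes alternate between slots 0 and 5, which are both occupied
• Resulting table (slot: key)
0: 20
1: 31
2: 28
3:
4:
5: 29
6:
7:
8: 18
9: 19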
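The double hashing example can be replayed with a short sketch. It assumes non-negative integer keys, Hash(X) = X % TableSize and Hash2(X) = R - (X % R), the form used for Hash2(29) above; the names EMPTY and DoubleHashInsert are hypothetical.

```c
#define EMPTY (-1)  // marks a free cell; assumes keys are non-negative

// Inserts key with double hashing: tries
// (Hash(key) + i*Hash2(key)) mod tableSize for i = 0, 1, 2, ...
// Returns the slot used, or -1 if no empty cell is reached
// within tableSize probes.
int DoubleHashInsert(int table[], int tableSize, int R, int key){
  int step = R - (key % R);  // Hash2(key), always between 1 and R
  for (int i = 0; i < tableSize; i++){
    int slot = (key % tableSize + i * step) % tableSize;
    if (table[slot] == EMPTY){
      table[slot] = key;
      return slot;
    }
  }
  return -1;
} //end-DoubleHashInsert
```

With TableSize = 10 and R = 7, inserting 18, 19, 20, 29, 30, 31, 28 in that order gives slots 8, 9, 0, 5, -1, 1, 2. Key 30 fails because its step of 5 shares a factor with the table size of 10, so only slots 0 and 5 are ever tried. A prime TableSize avoids this, since every step then eventually visits all slots.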
Lazy Deletion with Probing
• Need to use lazy deletion if we use probing
(why?)
– Think about how Find(X) would work…
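One way to see why: Find(X) stops at the first empty cell, so physically emptying a cell would cut off every key that was placed after probing past it. The sketch below marks deleted cells instead. It assumes linear probing, non-negative distinct integer keys and Hash(X) = X % TableSize; the names Cell, CellState, LazyInsert, LazyFind and LazyDelete are hypothetical.

```c
typedef enum { CELL_EMPTY, CELL_OCCUPIED, CELL_DELETED } CellState;

typedef struct {
  int key;
  CellState state;
} Cell;

// Returns the slot holding key, or -1 if key is not in the table.
// A deleted cell does not stop the search; only an empty cell does.
int LazyFind(const Cell table[], int tableSize, int key){
  for (int i = 0; i < tableSize; i++){
    int slot = (key % tableSize + i) % tableSize;
    if (table[slot].state == CELL_EMPTY) return -1;
    if (table[slot].state == CELL_OCCUPIED && table[slot].key == key)
      return slot;
  }
  return -1;
} //end-LazyFind

// Stores key in the first empty or deleted cell of its probe sequence.
// Assumes key is not already in the table. Returns the slot, or -1 if full.
int LazyInsert(Cell table[], int tableSize, int key){
  for (int i = 0; i < tableSize; i++){
    int slot = (key % tableSize + i) % tableSize;
    if (table[slot].state != CELL_OCCUPIED){
      table[slot].key = key;
      table[slot].state = CELL_OCCUPIED;
      return slot;
    }
  }
  return -1;
} //end-LazyInsert

// Marks the cell holding key as deleted. Returns 1 if key was found.
int LazyDelete(Cell table[], int tableSize, int key){
  int slot = LazyFind(table, tableSize, key);
  if (slot < 0) return 0;
  table[slot].state = CELL_DELETED;
  return 1;
} //end-LazyDelete
```

For example, with TableSize = 10, insert 18, 19 and 29 (29 collides at slot 9 and lands in slot 0), then delete 19. Find(29) still succeeds because the search continues past the deleted cell at slot 9.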
Rehashing
• Rehashing – Allocate a larger hash table (of
size 2*TableSize) whenever LF exceeds a
particular value
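Rehashing can be sketched as follows. The sketch assumes linear probing, non-negative integer keys, Hash(X) = X % TableSize and a load factor threshold maxLF below 1; the names HashTable, Place, Rehash and Insert are hypothetical. Every key has to be re-inserted, because its slot depends on the table size.

```c
#include <stdlib.h>

#define EMPTY (-1)  // marks a free cell; assumes keys are non-negative

typedef struct {
  int *cells;  // array of size cells
  int size;    // TableSize
  int count;   // number of keys stored
} HashTable;

// Places key with linear probing. Assumes at least one cell is free.
static void Place(int *cells, int size, int key){
  int slot = key % size;
  while (cells[slot] != EMPTY) slot = (slot + 1) % size;
  cells[slot] = key;
} //end-Place

// Allocates a table of size 2*TableSize and re-inserts every key.
// Returns 1 on success, 0 if memory allocation fails.
int Rehash(HashTable *t){
  int newSize = 2 * t->size;
  int *cells = malloc(newSize * sizeof *cells);
  if (cells == NULL) return 0;
  for (int i = 0; i < newSize; i++) cells[i] = EMPTY;
  for (int i = 0; i < t->size; i++)
    if (t->cells[i] != EMPTY) Place(cells, newSize, t->cells[i]);
  free(t->cells);
  t->cells = cells;
  t->size = newSize;
  return 1;
} //end-Rehash

// Inserts key, rehashing first if the insert would push LF above maxLF.
// Assumes 0 < maxLF < 1. Returns 1 on success, 0 on allocation failure.
int Insert(HashTable *t, int key, double maxLF){
  if ((double)(t->count + 1) / t->size > maxLF && !Rehash(t)) return 0;
  Place(t->cells, t->size, key);
  t->count++;
  return 1;
} //end-Insert
```

A single rehash costs O(N), but it happens rarely enough that inserts stay cheap on average.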
Hash Tables vs Search Trees
• Hash Tables are good if you would like to
perform ONLY Insert/Delete/Find