Hash Table
Data Structures & Algorithms
CONTENT
• Introduction
• Static hashing
• Dynamic hashing
For searching purposes, it is best to store the key and the entry separately
(even though the key’s value may be inside the entry)
key        entry (element)
“Smith”    “Smith”, “124 Hawkers Lane”, “9675846”
“Yeo”      “Yeo”, “1 Apple Crescent”, “0044 1970 622455”
• Implementation 4 - BST
• A BST, ordered by key
• Hash tables are not so good if there are many insertions and
deletions, or if table traversals are needed
• Types of hashing
• Static hashing
• Tables with a fixed size
• Dynamic hashing
• Table sizes may vary
• Introduction
• Static hashing
• Hash table
• Hash methods
• Collision resolution
• Dynamic hashing
• Key-value pairs are stored in a fixed size table called a hash table
• A hash table is partitioned into many buckets
• Each bucket has many slots
• Each slot holds one record
• A hash function f(x) transforms the identifier (i.e. key) into an address in the
hash table
[Figure: a hash table drawn as a grid of b buckets (rows 0 to b-1), each with s slots (columns 0 to s-1)]
…Static Hashing - Hash table
• Uses an array hash_table[0..b-1]
• Each position of this array is a bucket
• A bucket can normally hold only one dictionary pair
• Uses a hash function f
• that converts each key k into an index in the range [0, b-1]
• Every dictionary pair (key, element) is stored in its home bucket hash_table[f(key)]
• Data Structure for Hash Table
#define MAX_CHAR 10
#define TABLE_SIZE 13
typedef struct {
    char key[MAX_CHAR];
    /* other fields */
} element;
element hash_table[TABLE_SIZE];
• Hash Function - Example
• Key k = 11, hash function f(k) = k % 5
• Hashed value f(11) = 1, so key 11 is stored in hash_table[1] (table indices 0 to 4)
• Overflow Example
bucket   Slot 0   Slot 1
0        acos     atan     (synonyms)
1
2        char     ceil     (synonyms)
3        define
4        exp
• Synonyms: char, ceil, clock, ctime
• Truncation method
• Ignore part of the key and use the rest as the array index
(converting non-numeric parts)
• Example
• If students have a 9-digit identification number, take the last 3 digits as the
table position
• e.g. 925371622 becomes 622
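The truncation example above can be sketched in C as follows. The function name truncate_last3 is hypothetical, and the sketch assumes the key is already numeric and that the table has 1000 positions (one per 3-digit suffix).

```c
/* Truncation: ignore all but the last three decimal digits of the key.
   truncate_last3 is an illustrative name, not taken from the slides. */
unsigned truncate_last3(unsigned long id)
{
    return (unsigned)(id % 1000UL);   /* 925371622 -> 622 */
}
```

Keys that share the same last three digits land in the same position, so this only spreads keys well when those digits vary.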
• Division method
• Hash function f(k) = k % b
• Requires only a single division operation (quite fast)
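As a minimal sketch, the division method is a single remainder operation. The name hash_div is hypothetical, and unsigned keys are assumed so that the result always lies in [0, b-1].

```c
/* Division method: f(k) = k % b, where b is the number of buckets.
   hash_div is an illustrative name; b must be nonzero. */
unsigned hash_div(unsigned k, unsigned b)
{
    return k % b;
}
```

With b = 5, the key 11 hashes to bucket 1.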
• Mid-square method
• Middle of square method
• This method squares the key value, and then takes out the number of
bits from the middle of the square
• The number of bits to be used to obtain the bucket address depends
on the table size
• If r bits are used, the range of values is 0 to 2^r - 1
• This works well because most or all bits of the key value contribute to
the result
• Mid-square method
• Example
• consider records whose keys are 4-digit numbers in base 10
• The goal is to hash these key values to a table of size 100
• This range is equivalent to two digits in base 10, that is, r = 2
• If the input is the number 4567, squaring yields an 8-digit number, 20857489
• The middle two digits of this result are 57
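The decimal example above can be sketched as follows. The name mid_square is hypothetical, and the sketch assumes 4-digit keys whose square is treated as an 8-digit number (padded with leading zeros when it is shorter), so the middle two digits are the 4th and 5th.

```c
/* Mid-square for 4-digit decimal keys and a table of size 100.
   The square has at most 8 digits; dropping the last 3 and keeping
   the next 2 yields the middle two digits. */
unsigned mid_square(unsigned key)
{
    unsigned long sq = (unsigned long)key * key;   /* 4567 -> 20857489 */
    return (unsigned)((sq / 1000UL) % 100UL);      /* 20857489 -> 57   */
}
```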
• Folding method
• Partition the key into several
parts of the same length
except for the last
• These parts are then added
together to obtain the hash
address for the key
• Two ways of carrying out this
addition
• shift folding
• folding and reverse
Example
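A minimal sketch of both additions, under stated assumptions: the key is given as a string of decimal digits, it is split from the left into parts of `width` digits, and "folding and reverse" is taken to mean that every second part is digit-reversed before adding. The key used in the note below is hypothetical, as are all function names.

```c
#include <string.h>

/* Value of len decimal digits starting at p, read left to right. */
static unsigned forward_value(const char *p, size_t len)
{
    unsigned v = 0;
    for (size_t i = 0; i < len; i++)
        v = v * 10 + (unsigned)(p[i] - '0');
    return v;
}

/* Value of the same digits read right to left, e.g. "203" -> 302. */
static unsigned reversed_value(const char *p, size_t len)
{
    unsigned v = 0;
    for (size_t i = len; i > 0; i--)
        v = v * 10 + (unsigned)(p[i - 1] - '0');
    return v;
}

/* Add the parts of the key. With reverse_alternate == 0 this is shift
   folding; otherwise the 2nd, 4th, ... parts are reversed first. */
unsigned fold(const char *digits, size_t width, int reverse_alternate)
{
    size_t n = strlen(digits);
    unsigned sum = 0;
    int part_no = 0;
    for (size_t pos = 0; pos < n; pos += width, part_no++) {
        size_t len = (n - pos < width) ? n - pos : width;
        if (reverse_alternate && part_no % 2 == 1)
            sum += reversed_value(digits + pos, len);
        else
            sum += forward_value(digits + pos, len);
    }
    return sum;
}
```

For the hypothetical key "12320324111220" with 3-digit parts (123, 203, 241, 112, 20), shift folding gives 699 and folding with reversal gives 123 + 302 + 241 + 211 + 20 = 897. The sum would still need to be reduced to the table range, for example with % b.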
…Static Hashing - Hash Methods
New integer
key = 87629426
Example:
key = “PRATIVA”, HT_SIZE = 31
hash value = ( P + R + A + T + I + V + A ) % 31
hash value = ( 80 + 82 + 65 + 84 + 73 + 86 + 65 ) % 31
hash value = 8
return (hash);
}
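The "PRATIVA" example above adds the character codes of the key and reduces the sum modulo the table size. A minimal sketch of such a function, assuming ASCII character codes; the name hash_string is hypothetical and this is not necessarily the original function.

```c
#define HT_SIZE 31

/* Additive string hash: sum the character codes, then take % HT_SIZE. */
unsigned hash_string(const char *key)
{
    unsigned hash = 0;
    for (; *key != '\0'; key++)
        hash += (unsigned char)*key;
    return hash % HT_SIZE;
}
```

For "PRATIVA" the sum is 535 and 535 % 31 = 8, matching the example. Note that any rearrangement of the same letters produces the same hash value.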
• Open addressing
• relocate the key k to be inserted if it collides with an existing key. That is,
we store k at an entry different from hash_table[f(k)].
• Open Addressing
• To insert a key k, compute f0(k). If hash_table[f0(k)] is empty, insert it there.
• If a collision occurs, probe alternative cells f1(k), f2(k), ... until an empty cell is found
• Linear Probing
• The probe offset is g(i) = i
• Cells are probed sequentially (with wraparound)
• fi(k) = (f(k) + g(i)) % b = (f(k) + i) % b
• Insertion
• Let k be the new key to be inserted. We compute f(k)
• For i = 0 to b-1
– compute L = ( f(k) + i ) % b
– if hash_table[L] is empty, then we put k there and stop
• If we cannot find an empty entry to put k, it means that the table is full
and we should report an error
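The insertion steps above can be sketched as follows. This is a minimal sketch that assumes non-negative integer keys, b = 17 buckets, f(k) = k % b, and a sentinel value EMPTY for unused buckets; the names B, EMPTY, table, table_init, and insert_linear are all hypothetical.

```c
#define B 17          /* number of buckets (assumed)        */
#define EMPTY (-1)    /* marks an unused bucket (assumed)   */

static int table[B];

/* Mark every bucket as unused. */
void table_init(void)
{
    for (int i = 0; i < B; i++)
        table[i] = EMPTY;
}

static int f(int k) { return k % B; }

/* Insert k with linear probing. Returns the bucket used,
   or -1 if all b buckets were probed and none was empty. */
int insert_linear(int k)
{
    for (int i = 0; i < B; i++) {
        int L = (f(k) + i) % B;       /* i-th probe, with wraparound */
        if (table[L] == EMPTY) {
            table[L] = k;
            return L;
        }
    }
    return -1;                        /* table is full */
}
```

Inserting 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 in that order places 45 in bucket 2: its home bucket 11 and buckets 12 to 16, 0, and 1 are already occupied, so the probe wraps around.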
• Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 (b = 17, f(k) = k % 17)
bucket:  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
key:     34  0   45  -   -   -   6   23  7   -   -   28  12  29  11  30  33
• Delete(0)
[Figure: successive table states for Delete(0)]
…Static Hashing - Collision Resolution
• Linear Probing – Delete
• Delete(29)
[Figure: successive table states for Delete(29)]
• Random Probing
• Random probing uses a table of random numbers to choose the probe sequence
• f(x) = (f’(x) + S[i]) % b
– S is a table of size b-1
– S is a random permutation of the integers [1, b-1]
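A minimal sketch of this scheme, under stated assumptions: b = 17, the permutation is produced with a Fisher-Yates shuffle driven by the standard rand(), and an extra entry S[0] = 0 is added so that probe 0 is the home bucket. The names B, S, init_offsets, and probe_random are hypothetical.

```c
#include <stdlib.h>

#define B 17              /* number of buckets (assumed) */

static int S[B];          /* S[1..B-1]: permutation of 1..B-1; S[0] = 0 */

/* Fill S with a random permutation of 1..B-1 (Fisher-Yates shuffle). */
void init_offsets(unsigned seed)
{
    srand(seed);
    S[0] = 0;
    for (int i = 1; i < B; i++)
        S[i] = i;
    for (int i = B - 1; i > 1; i--) {
        int j = 1 + rand() % i;       /* j is in 1..i */
        int tmp = S[i];
        S[i] = S[j];
        S[j] = tmp;
    }
}

/* i-th probe position for a key whose home bucket is `home`. */
int probe_random(int home, int i)
{
    return (home + S[i]) % B;
}
```

Because S holds each offset from 1 to b-1 exactly once, the b probes for any key visit every bucket exactly once. The same S must be used for insertion and search.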
• Double hashing
• Double hashing is one of the best methods for dealing with collisions
• If the slot is full, then a second hash function (which is different
from the first one) is calculated and combined with the first hash
function
• f(k, i) = (f1(k) + i * f2(k) ) % b
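A minimal sketch of the probe formula above. The slides do not specify f1 and f2, so the choices here are assumptions: f1(k) = k % b and f2(k) = 1 + k % (b-1), with b = 17. This f2 is never zero, and because b is prime the probe sequence reaches every bucket. The names B, f1, f2, and probe_double are hypothetical.

```c
#define B 17    /* number of buckets; assumed prime */

static unsigned f1(unsigned k) { return k % B; }            /* home bucket    */
static unsigned f2(unsigned k) { return 1 + k % (B - 1); }  /* step, in 1..16 */

/* i-th probe position for key k: (f1(k) + i * f2(k)) % b. */
unsigned probe_double(unsigned k, unsigned i)
{
    return (f1(k) + i * f2(k)) % B;
}
```

For k = 45, f1 gives 11 and f2 gives 14, so the first three probes are buckets 11, 8, and 5. Keys that share a home bucket usually get different step sizes, which avoids the long clusters that linear probing builds.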
• Rehashing
• Enlarging the Table
• To rehash:
• Create a new table of double the size (adjusting until it is again prime)
• Transfer the entries in the old table to the new table, by recomputing their
positions (using the hash function)
• Rehashing when the table is completely full
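The rehashing steps above can be sketched as follows, assuming non-negative integer keys, linear probing, f(k) = k % size, and a sentinel EMPTY for unused slots. The type table_t and the functions tbl_init, tbl_put, and rehash are hypothetical names.

```c
#include <stdlib.h>

#define EMPTY (-1)

typedef struct {
    int *slots;
    int size;
} table_t;

static int is_prime(int n)
{
    if (n < 2) return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* Allocate a table with `size` empty slots. Returns 1 on success. */
int tbl_init(table_t *t, int size)
{
    t->slots = malloc((size_t)size * sizeof *t->slots);
    if (t->slots == NULL) return 0;
    t->size = size;
    for (int i = 0; i < size; i++)
        t->slots[i] = EMPTY;
    return 1;
}

/* Linear-probing insert; the caller guarantees at least one empty slot. */
void tbl_put(table_t *t, int k)
{
    int L = k % t->size;
    while (t->slots[L] != EMPTY)
        L = (L + 1) % t->size;
    t->slots[L] = k;
}

/* Move every key into a table of the next prime size >= 2 * old size. */
int rehash(table_t *t)
{
    table_t bigger;
    int new_size = 2 * t->size;
    while (!is_prime(new_size))
        new_size++;
    if (!tbl_init(&bigger, new_size)) return 0;
    for (int i = 0; i < t->size; i++)
        if (t->slots[i] != EMPTY)
            tbl_put(&bigger, t->slots[i]);   /* position is recomputed */
    free(t->slots);
    *t = bigger;
    return 1;
}
```

A full table of size 5 grows to size 11 (the first prime at or above 10). Keys that collided in the small table, such as 1, 6, and 11, then land in separate home buckets.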
• Separate Chaining
• Instead of a hash table, we use a table of linked lists
• keep a linked list of keys that hash to the same value
f(k) = k mod 10
• Separate Chaining
• To insert a key k
• Compute f(k) to determine which list to traverse
• If hash_table[f(k)] contains a null pointer, initialize this entry to point to a
linked list that contains k alone
• If hash_table[f(k)] is a non-empty list, we add k at the beginning of this list
• To delete a key k
• compute f(k), then search for k within the list at hash_table[f(k)].
Delete k if it is found.
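The insert and delete steps above can be sketched as follows, assuming non-negative integer keys, 10 buckets, and f(k) = k % 10. The names B, node, chain_insert, and chain_delete are hypothetical, and the sketch does not check for duplicate keys.

```c
#include <stdlib.h>

#define B 10    /* number of lists, matching f(k) = k mod 10 */

typedef struct node {
    int key;
    struct node *next;
} node;

static node *hash_table[B];   /* static storage: every list starts as NULL */

static int f(int k) { return k % B; }

/* Insert k at the beginning of its list. Returns 1 on success, 0 if
   memory allocation fails. An empty list needs no special case. */
int chain_insert(int k)
{
    node *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->key = k;
    n->next = hash_table[f(k)];
    hash_table[f(k)] = n;
    return 1;
}

/* Search the list at hash_table[f(k)] and unlink k if it is found.
   Returns 1 if a node was deleted, 0 otherwise. */
int chain_delete(int k)
{
    node **link = &hash_table[f(k)];
    while (*link != NULL && (*link)->key != k)
        link = &(*link)->next;
    if (*link == NULL) return 0;
    node *dead = *link;
    *link = dead->next;
    free(dead);
    return 1;
}
```

Walking the list through a pointer to the link (node **) lets the same code remove the first node and any later node.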
• Separate Chaining
• If the hash function works well, the number of keys in each linked list
will be a small constant
• Therefore, we expect that each search, insertion, and deletion can be
done in constant time
• Disadvantage
• Memory allocation in linked list manipulation will slow down the program
• Advantage
• Deletion is easy
• Array size is not a limitation
Sorted chains
• Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
• Bucket = key % 17
[0]   0 -> 34
[6]   6 -> 23
[7]   7
[11]  11 -> 28 -> 45
[12]  12 -> 29
[13]  30
[16]  33
• Dynamic hashing
• The number of identifiers in a hash table may vary
• Use a small table initially; when a lot of identifiers are inserted into
the table, we may increase the table size
• When a lot of identifiers are deleted from the table, we may reduce
the table size
• This is called dynamic hashing or extendible hashing
• Dynamic hashing usually involves databases and buckets may also
be called pages
• Space utilization = NumberOfRecords / (NumberOfPages * PageCapacity)