Lecture 4 Hash Table Stu
HASH TABLE
Lê Viết Tuấn
Email: [email protected]

Hash tables
Hash functions
Collision Resolution
• Chaining
• Open addressing
Hash tables
With direct addressing, an element with key k is stored in slot k.
With hashing, we use a hash function h to compute the slot number from the key k, so that the element goes into slot h(k).
The hash function h maps the universe U of keys into the slots of a hash table $T[0 : m-1]$:
$$h: U \to \{0, 1, \ldots, m-1\}$$
where the size m of the hash table is typically much less than $|U|$.
The hash function reduces the range of array indices and hence the size of the array.
Division
$h(K) = K \bmod TSize$
if K is a number, where TSize is the size of the table (its number of positions).
Boundary folding
The key is seen as being written on a piece of paper that is folded on the borders between different parts of the key. In this way, every other part will be put in the reverse order.
Consider the same three parts of the SSN: 123, 456, and 789.
• The first part, 123, is taken in the same order.
• Then the piece of paper with the second part is folded underneath it so that 123 is aligned with 654, which is the second part, 456, in reverse order.
• When the folding continues, 789 is aligned with the two previous parts.
• The result is 123 + 654 + 789 = 1,566.
https://blacksquareprintmedia.co.uk/7-types-of-paper-folds-for-leaflets-and-flyers/
For example:
• The key is 3,121
• 3,121² = 9,740,641
• For the 1,000-cell table, h(3,121) = 406, which is the middle part of 3,121².
If we assume that the size of the table is 1,024, then, in this example, the binary representation of 3,121² is the bit string 100101001010000101100001; its middle part is marked here with brackets: 1001010[0101000010]1100001.
This middle part, the binary number 0101000010, is equal to 322. This part can easily be extracted by using a mask and a shift operation.
COLLISION RESOLUTION

Collision Resolution
For almost all hash functions, more than one key can be assigned to the same position.
• For example, if the hash function h1 applied to names returns the ASCII value of the first letter of each name (i.e., h1(name) = name[0]), then all names starting with the same letter are hashed to the same position.
This problem can be reduced by finding a function that distributes names more uniformly in the table.
• For example, the function h2 could add the first two letters (i.e., h2(name) = name[0] + name[1]), which is better than h1.
Collision Resolution
But even if all the letters are considered (i.e., h3(name) = name[0] + ··· + name[strlen(name) − 1]), the possibility of hashing different names to the same location still exists.
Increasing the size of the table may lead to better hashing, but not always!
There are scores of strategies that attempt to avoid hashing multiple keys to the same location.

COLLISION RESOLUTION BY CHAINING
Separate chaining
Each nonempty slot points to a linked list, and all the elements that hash to the same slot go into that slot's linked list.
Slot j contains a pointer to the head of the list of all stored elements with hash value j. If there are no such elements, then slot j contains NIL.
Separate chaining
int ChainedHashTable::remove(int const& k)
{
    unsigned int bin = hash(k);        // slot whose list may hold k
    if (table[bin].deleteNode(k))      // unlink k from that slot's list
    {
        size--;                        // one fewer element stored
        return 1;
    }
    return 0;                          // k was not in the table
}

bool ChainedHashTable::search(int const& k) const
{
    unsigned int bin = hash(k);
    return table[bin].isInList(k);     // scan only the list stored in slot bin
}

Coalesced chaining
In this method, the first available position is found for a key colliding with another key, and the index of this position is stored with the key already in the table.
In this way, a sequential search down the table can be avoided by directly accessing the next element on the linked list.
Each position pos of the table stores an object with two members: info for a key and next with the index of the next key that is hashed to pos.
Available positions can be marked by, say, −2 in next; −1 can be used to indicate the end of a chain.
Coalesced chaining
OPEN ADDRESSING
Linear probing
Consider a hash table with M = 16 bins.
Given a 3-digit hexadecimal number:
• The least-significant digit is the primary hash function (bin)
• Example: for 6B72A₁₆, the initial bin is A and the jump size is 3

Linear probing
Insert these numbers into this initially empty hash table:
19A, 207, 3AD, 488, 5BA, 680, 74C, 826, 946, ACD, B32, C8B, DBE, E9C
Bins 0 1 2 3 4 5 6 7 8 9 A B C D E F are all empty at the start.
LINEAR PROBING - SEARCHING

Linear probing
Searching: start at the appropriate bin, and search forward until
• 1. The item is found,
• 2. An empty bin is found, or
• 3. We have traversed the entire array
Bin  0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
Key  680  D59  B32  E93  -    -    826  207  488  946  19A  5BA  74C  3AD  ACD  C8B
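Linear probing
Searching for C8B: start at bin B and probe B (5BA), C (74C), D (3AD), and E (ACD) before finding C8B in bin F.
Bin  0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
Key  680  D59  B32  E93  -    -    826  207  488  946  19A  5BA  74C  3AD  ACD  C8B

ERASING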
Erasing
We cannot simply remove elements from the hash table.
• For example, consider erasing 3AD
Bin  0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
Key  680  D59  B32  E93  -    -    826  207  488  946  19A  5BA  74C  3AD  ACD  C8B
If bin D were simply emptied, a search for ACD (which starts at bin D) or for C8B (which starts at bin B) would stop at the empty bin D and wrongly report that the key is absent.
Erasing
In general, assume:
• The currently removed object has created a hole at index hole
• The object we are checking is located at the position index and has a hash value of hash
The first possibility is that hole < index
• In this case, for the object at index to be moved into the hole, its hash value must either
  • be equal to or less than hole, or
  • be greater than index
Otherwise its hash value lies after hole and at or before index, and moving it into the hole would place it before its own bin, where a search would never find it.
Erasing
The other possibility is that we wrapped around the end of the array, that is, hole > index
• In this case, for the object at index to be moved into the hole, its hash value must be both
  • greater than index, and
  • less than or equal to hole
In either case, if the move is successful, the position index becomes the new hole to be filled.

Quadratic probing
Quadratic function: $p(i) = i^2$, so that
$$h(k, i) = (h(k) + i^2) \bmod m$$
Another quadratic function:
$$p(i) = h(k) + (-1)^{i+1} \left\lfloor \frac{i+1}{2} \right\rfloor^2 \quad \text{for } i = 1, 2, \ldots, m - 1$$
This formula can be expressed in a simpler form as the probe sequence
$$h(k) + i^2,\; h(k) - i^2 \quad \text{for } i = 1, 2, \ldots, (m - 1)/2$$
with every position taken mod m.
Double hashing
In order for the entire hash table to be searched, the value $h_2(k)$ must be relatively prime to the hash-table size $m$.
One way is to let $m$ be prime and to design $h_2$ so that it always returns a positive integer less than $m$. For example:
• We could choose $m$ prime and let
$$h_1(k) = k \bmod m$$
$$h_2(k) = 1 + (k \bmod m')$$
where $m'$ is chosen to be slightly less than $m$ (say, $m - 1$).