Lecture 4 Hash Table Stu

HASH TABLE
Lê Viết Tuấn
Email: [email protected]
Homepage: vt-le.github.io

Table of Contents
Hash tables
Hash functions
Collision Resolution
• Chaining
• Open addressing
Applications of Hash Table

Direct-address tables
Direct addressing is a simple technique that works well when the universe U of keys is reasonably small.
Suppose that an application needs a dynamic set in which each element has a distinct key drawn from the universe U = {0, 1, …, m − 1}, where m is not too large.
To represent the dynamic set, you can use an array, or direct-address table, denoted by T[0 : m − 1], in which each position, or slot, corresponds to a key in the universe U.
Slot k points to an element in the set with key k.
If the set contains no element with key k, then T[k] = NULL.
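A minimal C++ sketch of a direct-address table, assuming keys are integers in 0 … m − 1. The names Element and DirectAddressTable are hypothetical and do not come from the lecture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type: a key plus some satellite data.
struct Element {
    std::size_t key;
    int data;
};

// Direct-address table T[0 : m - 1]. Slot k holds a pointer to the
// element with key k, or nullptr (NULL in the slides) if there is none.
class DirectAddressTable {
public:
    explicit DirectAddressTable(std::size_t m) : slots(m, nullptr) {}

    // Each operation indexes the array directly, so it takes O(1) time.
    // at() throws std::out_of_range if the key is outside 0 .. m - 1.
    void insert(Element* x) { slots.at(x->key) = x; }
    void remove(const Element* x) { slots.at(x->key) = nullptr; }
    Element* search(std::size_t k) const { return slots.at(k); }

private:
    std::vector<Element*> slots;
};
```

The table stores pointers only, so the caller keeps ownership of the elements in this sketch.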
Hash tables
With direct addressing, an element with key k is stored in slot k.
With hashing, we use a hash function h to compute the slot number from the key k, so that the element goes into slot h(k).
The hash function h maps the universe U of keys into the slots of a hash table T[0 : m − 1]:
h : U → {0, 1, …, m − 1}
where the size m of the hash table is typically much less than |U|.
The hash function reduces the range of array indices and hence the size of the array.

Hash tables
We will refer to the table size as TSize.
The common convention is to have the table run from 0 to TSize − 1.
Each key is mapped into some number in the range 0 to TSize − 1 and placed in the appropriate cell.
The mapping is called a hash function, which ideally should be simple to compute and should ensure that any two distinct keys get different cells.

Hash tables
We seek a hash function that distributes the keys evenly among the cells.
For example
• john hashes to 3
• phil hashes to 4
• dave hashes to 6
• mary hashes to 7

Cell  Contents
0
1
2
3     john 25000
4     phil 31250
5
6     dave 27500
7     mary 28200
8
HASH FUNCTIONS

Division
h(K) = K mod TSize
if K is a number, where TSize = sizeof(table).
It is best if TSize is a prime number; otherwise, h(K) = (K mod p) mod TSize for some prime p > TSize can be used.
However, nonprime divisors may work equally well as prime divisors provided they do not have prime factors less than 20 (Lum et al. 1971).
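The two division formulas can be sketched in C++ as follows. The function names are hypothetical, keys are assumed to be non-negative integers, and the caller is assumed to pass a genuine prime p > TSize (the code does not check this).

```cpp
#include <cassert>

// h(K) = K mod TSize, for a non-negative integer key K.
unsigned divisionHash(unsigned long key, unsigned tsize) {
    return static_cast<unsigned>(key % tsize);
}

// h(K) = (K mod p) mod TSize, intended for a prime p > TSize.
// Primality of p is the caller's responsibility in this sketch.
unsigned divisionHashWithPrime(unsigned long key, unsigned long p, unsigned tsize) {
    return static_cast<unsigned>((key % p) % tsize);
}
```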

Folding
The key is divided into several parts.
These parts are combined or folded together and are often transformed in a certain way to create the target address.
There are two types of folding:
• Shift folding.
• Boundary folding.

Shift folding
In shift folding, the parts are put underneath one another and then processed. For example:
• A social security number (SSN) 123-45-6789 can be divided into three parts, 123, 456, 789, and then these parts can be added.
  • The resulting number, 1,368, can be divided modulo TSize or, if the size of the table is 1,000, the first three digits can be used for the address.
• Another possibility is to divide the same number 123-45-6789 into five parts (say, 12, 34, 56, 78, and 9), add them, and divide the result modulo TSize.
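A small C++ sketch of shift folding on the SSN example. It assumes the key is passed as a string of decimal digits with the hyphens already removed and that the part width is at least 1; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shift folding: cut the digit string into consecutive parts of `width`
// digits (the last part may be shorter) and add the parts.
unsigned long shiftFold(const std::string& digits, std::size_t width) {
    unsigned long sum = 0;
    for (std::size_t i = 0; i < digits.size(); i += width) {
        sum += std::stoul(digits.substr(i, width));
    }
    return sum;
}

// Reduce the folded sum modulo the table size to obtain an address.
unsigned long shiftFoldHash(const std::string& digits, std::size_t width,
                            unsigned long tsize) {
    return shiftFold(digits, width) % tsize;
}
```

With width 3 the key "123456789" folds to 123 + 456 + 789, and with width 2 it folds to 12 + 34 + 56 + 78 + 9.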
Boundary folding
The key is seen as being written on a piece of paper that is folded on the borders between different parts of the key.
In this way, every other part will be put in the reverse order.
https://blacksquareprintmedia.co.uk/7-types-of-paper-folds-for-leaflets-and-flyers/

Boundary folding
Consider the same three parts of the SSN: 123, 456, and 789.
• The first part, 123, is taken in the same order.
• Then the piece of paper with the second part is folded underneath it so that 123 is aligned with 654, which is the second part, 456, in reverse order.
• When the folding continues, 789 is aligned with the two previous parts.
• The result is 123 + 654 + 789 = 1,566.
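Boundary folding differs from shift folding only in that every other part is reversed before the parts are added. A hedged C++ sketch, again assuming a hyphen-free digit string and a part width of at least 1 (boundaryFold is a hypothetical name):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Boundary folding: cut the digit string into parts of `width` digits,
// reverse every second part (the 2nd, 4th, ...), and add the parts.
unsigned long boundaryFold(const std::string& digits, std::size_t width) {
    unsigned long sum = 0;
    bool reversePart = false;  // the first part is taken in its original order
    for (std::size_t i = 0; i < digits.size(); i += width) {
        std::string part = digits.substr(i, width);
        if (reversePart) {
            std::reverse(part.begin(), part.end());
        }
        sum += std::stoul(part);
        reversePart = !reversePart;
    }
    return sum;
}
```

For "123456789" with width 3 this adds 123, 654, and 789; the result can then be reduced modulo TSize.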

Mid-Square Function
In the mid-square method, the key is squared and the middle or mid part of the result is used as the address.
For example:
• The key is 3,121
• 3,121² = 9,740,641
• For the 1,000-cell table, h(3,121) = 406, which is the middle part of 3,121².

Mid-Square Function
In practice, it is more efficient to choose a power of 2 for the size of the table and extract the middle part of the bit representation of the square of a key.
If we assume that the size of the table is 1,024, then, in this example, the binary representation of 3,121² is the bit string 100101001010000101100001, whose middle ten bits are 0101000010.
This middle part, the binary number 0101000010, is equal to 322. This part can easily be extracted by using a mask and a shift operation.
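The mask-and-shift extraction can be sketched in C++ as follows. The function names are hypothetical. Taking the "middle" relative to the significant bits of each individual square is an assumption chosen to match the worked example; an implementation might instead fix the shift once for the key width.

```cpp
#include <cassert>
#include <cstdint>

// Number of significant bits in x (0 for x == 0).
int bitLength(std::uint64_t x) {
    int n = 0;
    while (x != 0) {
        ++n;
        x >>= 1;
    }
    return n;
}

// Mid-square hash for a table of size 2^r (1 <= r <= 63 assumed):
// square the key and return the middle r bits of the square.
std::uint64_t midSquare(std::uint32_t key, int r) {
    std::uint64_t sq = static_cast<std::uint64_t>(key) * key;
    int bits = bitLength(sq);
    if (bits <= r) {
        return sq;  // the square already fits in r bits
    }
    int shift = (bits - r) / 2;                        // low bits to discard
    std::uint64_t mask = (std::uint64_t{1} << r) - 1;  // r one-bits
    return (sq >> shift) & mask;
}
```

For the key 3,121 and r = 10, the square has 24 significant bits, so 7 low bits are shifted out and the next 10 bits are kept.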
COLLISION RESOLUTION

Collision Resolution
For almost all hash functions, more than one key can be assigned to the same position.
• For example, if the hash function h1 applied to names returns the ASCII value of the first letter of each name (i.e., h1(name) = name[0]), then all names starting with the same letter are hashed to the same position.
This problem can be solved by finding a function that distributes names more uniformly in the table.
• For example, the function h2 could add the first two letters (i.e., h2(name) = name[0] + name[1]), which is better than h1.
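The name-based functions can be tried out with a short C++ sketch. It includes h3, the sum of all letters that the lecture considers next. The functions return the raw sums; reducing them modulo the table size is left out. Names are assumed to have at least two characters for h2.

```cpp
#include <cassert>
#include <string>

// h1(name) = name[0]: every name with the same first letter collides.
unsigned h1(const std::string& name) {
    return static_cast<unsigned char>(name[0]);
}

// h2(name) = name[0] + name[1]; assumes the name has at least two letters.
unsigned h2(const std::string& name) {
    return static_cast<unsigned char>(name[0]) +
           static_cast<unsigned char>(name[1]);
}

// h3(name) = name[0] + ... + name[strlen(name) - 1].
unsigned h3(const std::string& name) {
    unsigned sum = 0;
    for (unsigned char c : name) {
        sum += c;
    }
    return sum;
}
```

For instance, "dave" and "dina" collide under h1 but not under h2, while any two anagrams such as "abc" and "cba" still collide under h3.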

Collision Resolution
But even if all the letters are considered (i.e., h3(name) = name[0] + · · · + name[strlen(name) − 1]), the possibility of hashing different names to the same location still exists.
Increasing the table size may lead to better hashing, but not always!
There are scores of strategies that attempt to avoid hashing multiple keys to the same location.

COLLISION RESOLUTION BY CHAINING
Separate chaining
Each nonempty slot points to a linked list, and all the elements that hash to the same slot go into that slot’s linked list.
Slot j contains a pointer to the head of the list of all stored elements with hash value j. If there are no such elements, then slot j contains NIL.

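Separate chaining
struct ChainedHashTable
{
private:
    IntSLL* table;   // array of singly linked lists, one per slot
    int capacity;    // number of slots
    int size;        // number of stored keys
    unsigned int hash(int const& key) const;
public:
    ChainedHashTable(int cap = 101)
    {
        capacity = cap;
        table = new IntSLL[capacity];
        size = 0;
    }
    ~ChainedHashTable() { . . . }
    // Declarations for the member functions defined outside the struct.
    void insert(int const& k);
    int remove(int const& k);
    bool search(int const& k) const;
};

Separate chaining
unsigned int ChainedHashTable::hash(int const& key) const
{
    // Adding capacity keeps the result in 0 .. capacity - 1 for negative keys.
    return ((key % capacity) + capacity) % capacity;
}

void ChainedHashTable::insert(int const& k)
{
    unsigned int bin = hash(k);
    if (!table[bin].isInList(k))   // ignore duplicates
    {
        table[bin].addToHead(k);
        size++;
    }
}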
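The list type IntSLL is used by ChainedHashTable but is not defined in these slides. The following is a hypothetical minimal stand-in, a sketch only: the three member names (isInList, addToHead, deleteNode) are taken from the calls in the lecture's code, and everything else, including deleteNode returning bool, is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the lecture's IntSLL: a singly linked list of ints.
class IntSLL {
    struct Node {
        int info;
        Node* next;
    };
    Node* head = nullptr;

public:
    IntSLL() = default;
    IntSLL(const IntSLL&) = delete;             // copying is not needed here
    IntSLL& operator=(const IntSLL&) = delete;

    ~IntSLL() {
        while (head != nullptr) {
            Node* dead = head;
            head = head->next;
            delete dead;
        }
    }

    // True if k is stored somewhere in the list.
    bool isInList(int k) const {
        for (const Node* p = head; p != nullptr; p = p->next) {
            if (p->info == k) return true;
        }
        return false;
    }

    // Insert k in front of the current head.
    void addToHead(int k) { head = new Node{k, head}; }

    // Remove the first node holding k; returns true if a node was removed.
    bool deleteNode(int k) {
        for (Node** link = &head; *link != nullptr; link = &(*link)->next) {
            if ((*link)->info == k) {
                Node* dead = *link;
                *link = dead->next;
                delete dead;
                return true;
            }
        }
        return false;
    }
};
```

Returning bool from deleteNode lets the table's remove function test the result directly, which is how the lecture's code calls it.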
Separate chaining
int ChainedHashTable::remove(int const& k)
{
    unsigned int bin = hash(k);
    if (table[bin].deleteNode(k))
    {
        size--;
        return 1;   // k was found and removed
    }
    return 0;       // k was not in the table
}

bool ChainedHashTable::search(int const& k) const
{
    unsigned int bin = hash(k);
    return (table[bin].isInList(k));
}

Coalesced chaining
In this method, the first available position is found for a key colliding with another key, and the index of this position is stored with the key already in the table.
In this way, a sequential search down the table can be avoided by directly accessing the next element on the linked list.
Each position pos of the table stores an object with two members: info for a key and next with the index of the next key that is hashed to pos.
Available positions can be marked by, say, –2 in next; –1 can be used to indicate the end of a chain.

Coalesced chaining
Coalesced hashing puts a colliding key in the last available position of the table.

Coalesced chaining
Coalesced hashing that uses a cellar.
• Noncolliding keys are stored in their home positions.
• Colliding keys are put in the last available slot of the cellar and added to the list starting from their home position.
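A minimal C++ sketch of coalesced hashing without a cellar, using the info/next layout and the –2 / –1 markers described in the lecture. The class and member names are hypothetical, keys are assumed to be non-negative, and the home position is assumed to be key mod table size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kAvailable = -2;  // next == -2: the position is free
constexpr int kEnd = -1;        // next == -1: end of a chain

struct Cell {
    int info = 0;           // the stored key
    int next = kAvailable;  // index of the next key in the chain
};

class CoalescedHashTable {
public:
    explicit CoalescedHashTable(int size) : cells(static_cast<std::size_t>(size)) {}

    // Returns the position where key is stored, or -1 if the table is full.
    int insert(int key) {
        int pos = home(key);
        if (cells[pos].next == kAvailable) {  // home position is free
            cells[pos].info = key;
            cells[pos].next = kEnd;
            return pos;
        }
        // Walk the chain that passes through the home position to its end.
        while (true) {
            if (cells[pos].info == key) return pos;  // already present
            if (cells[pos].next == kEnd) break;
            pos = cells[pos].next;
        }
        // Put the colliding key in the last available position and link it.
        for (int free = static_cast<int>(cells.size()) - 1; free >= 0; --free) {
            if (cells[free].next == kAvailable) {
                cells[free].info = key;
                cells[free].next = kEnd;
                cells[pos].next = free;
                return free;
            }
        }
        return -1;
    }

    // Returns the position of key, or -1 if it is not stored.
    int find(int key) const {
        int pos = home(key);
        if (cells[pos].next == kAvailable) return -1;
        while (pos != kEnd) {
            if (cells[pos].info == key) return pos;
            pos = cells[pos].next;  // jump directly to the next chain element
        }
        return -1;
    }

private:
    int home(int key) const { return key % static_cast<int>(cells.size()); }
    std::vector<Cell> cells;
};
```

In a 10-cell table, inserting 12, 22, and 32 places them at positions 2, 9, and 8. A later insert of 9 finds its home position 9 already taken by 22, so the two chains coalesce and 9 ends up at position 7.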
OPEN ADDRESSING

Open Addressing
When a key collides with another key, the collision is resolved by finding an available table entry other than the position (address) to which the colliding key is originally hashed.
If position h(K) is occupied, then the positions in the probing sequence
norm(h(K) + p(1)), norm(h(K) + p(2)), . . . , norm(h(K) + p(i)), . . .
are tried, where p is a probing function and norm is a normalization function, most likely division modulo the size of the table.

Linear probing
Assume we are inserting into bin k:
• If bin k is empty, we occupy it
• Otherwise, check bin k + 1, k + 2, and so on, until an empty bin is found
  • If we reach the end of the array, we start at the front (bin 0)
Linear probing
Consider a hash table with M = 16 bins
Given a 3-digit hexadecimal number:
• The least-significant digit is the primary hash function (bin)
• Example: for 6B72A₁₆, the initial bin is A and the jump size is 3

Linear probing
Insert these numbers into this initially empty hash table: 19A, 207, 3AD, 488, 5BA, 680, 74C, 826, 946, ACD, B32, C8B, DBE, E9C
Bin: 0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F

Linear probing
Having completed these insertions:
• The load factor is λ = 14/16 = 0.875
• The average number of probes is 38/14 ≈ 2.71
Bin: 0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
Key: 680  D59  B32  E93  -    -    826  207  488  946  19A  5BA  74C  3AD  ACD  C8B

Linear probing
The simplest method is linear probing, for which p(i) = i.
h(k, i) = (h(k) + i) mod m
for i = 0, 1, …, m − 1.
The value of h(k) determines the entire probe sequence, and so assuming that h(k) can take on any value in {0, 1, …, m − 1}, linear probing allows only m distinct probe sequences.
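Linear probing insertion and search can be sketched in C++ as follows. The sketch assumes non-negative integer keys, takes the home bin to be k mod m (the least-significant hexadecimal digit when m = 16), and uses –1 as an "empty" marker; the function names and the marker are assumptions, not part of the lecture.

```cpp
#include <cassert>
#include <vector>

constexpr int kEmpty = -1;  // assumed marker for an empty bin

// i-th probe for key k in a table of m bins: (home bin + i) mod m.
int probe(int key, int i, int m) {
    return (key % m + i) % m;
}

// Returns the bin where key ends up, or -1 if the table is full.
int insertLinear(std::vector<int>& table, int key) {
    const int m = static_cast<int>(table.size());
    for (int i = 0; i < m; ++i) {
        int bin = probe(key, i, m);
        if (table[bin] == kEmpty || table[bin] == key) {
            table[bin] = key;
            return bin;
        }
    }
    return -1;
}

// Returns the bin holding key, or -1 if key is not in the table.
int searchLinear(const std::vector<int>& table, int key) {
    const int m = static_cast<int>(table.size());
    for (int i = 0; i < m; ++i) {
        int bin = probe(key, i, m);
        if (table[bin] == key) return bin;      // 1. the item is found
        if (table[bin] == kEmpty) return -1;    // 2. an empty bin is found
    }
    return -1;                                  // 3. the whole array was traversed
}
```

With 16 bins, inserting 19A and then 5BA puts the first key in bin A and, after one collision, the second in bin B.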
LINEAR PROBING - SEARCHING

Linear probing
Searching: start at the appropriate bin, and search forward until
1. The item is found,
2. An empty bin is found, or
3. We have traversed the entire array
Bin: 0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
Key: 680  D59  B32  E93  -    -    826  207  488  946  19A  5BA  74C  3AD  ACD  C8B

Linear probing
Searching for C8B
Bin: 0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
Key: 680  D59  B32  E93  -    -    826  207  488  946  19A  5BA  74C  3AD  ACD  C8B
C8B hashes to bin B; bins B, C, D, and E hold other keys, and C8B is found in bin F.

ERASING
Erasing
We cannot simply remove elements from the hash table
• For example, consider erasing 3AD
Bin: 0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
Key: 680  D59  B32  E93  -    -    826  207  488  946  19A  5BA  74C  3AD  ACD  C8B
If bin D were simply emptied, a search for ACD (which hashes to D but is stored in bin E) would stop at the empty bin and wrongly report that ACD is absent.

Erasing
In general, assume:
• The currently removed object has created a hole at index hole
• The object we are checking is located at the position index and has a hash value of hash
• Remember: if we are checking the object ? at location index, this means that all entries between hole and index are both occupied and could not have been copied into the hole

Erasing
The first possibility is that hole < index
• In this case, for the object at index to be moved into the hole, its hash value must either
  • be equal to or less than the hole, or
  • be greater than the index of the potential candidate
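The hole-filling rule for erasing under linear probing, including the wrap-around case hole > index, can be sketched in C++ as follows. The sketch assumes non-negative keys, a home bin of k mod m, and –1 as the empty marker; eraseLinear and kEmpty are hypothetical names.

```cpp
#include <cassert>
#include <vector>

constexpr int kEmpty = -1;  // assumed marker for an empty bin

// Erase key from a linear-probing table, shifting later entries back
// into the hole when that keeps them reachable. Returns false if the
// key is not present.
bool eraseLinear(std::vector<int>& table, int key) {
    const int m = static_cast<int>(table.size());

    // Locate the key by ordinary linear probing.
    int hole = -1;
    for (int i = 0; i < m; ++i) {
        int bin = (key % m + i) % m;
        if (table[bin] == kEmpty) return false;
        if (table[bin] == key) {
            hole = bin;
            break;
        }
    }
    if (hole < 0) return false;

    // Scan forward from the hole until an empty bin is reached.
    const int start = hole;
    for (int step = 1; step < m; ++step) {
        int index = (start + step) % m;
        if (table[index] == kEmpty) break;
        int hash = table[index] % m;  // home bin of the object at index
        bool movable = (hole < index)
                           ? (hash <= hole || hash > index)   // no wrap-around
                           : (hash <= hole && hash > index);  // wrapped around
        if (movable) {
            table[hole] = table[index];
            hole = index;  // the vacated bin becomes the new hole
        }
    }
    table[hole] = kEmpty;
    return true;
}
```

Applied to the 16-bin table in these slides, erasing 3AD moves ACD into bin D, C8B into bin E, and D59 into bin F, and leaves bin 1 empty; 680 stays in bin 0 because that is its home bin.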
Erasing
The other possibility is we wrapped around the end of the array, that is, hole > index
• In this case, the hash value of the object at index must be both
  • greater than the index of the potential candidate and
  • less than or equal to the hole
In either case, if the move is successful, the ? now becomes the new hole to be filled

Quadratic probing
Quadratic function
p(i) = i²
h(k, i) = (h(k) + i²) mod m
Another quadratic function
p(i) = h(k) + (−1)^(i−1) ((i + 1)/2)² (with integer division)
for i = 1, 2, …, m
This formula can be expressed in a simple form
h(k) + i², h(k) − i²
for i = 1, 2, …, m
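The alternating probe sequence h(k), h(k) + 1, h(k) − 1, h(k) + 4, h(k) − 4, … can be generated with a short C++ sketch. The function name is hypothetical, and the loop bound (m − 1)/2 is an assumption of this sketch: since (m − i)² ≡ i² (mod m), larger values of i only repeat offsets that were already produced.

```cpp
#include <cassert>
#include <vector>

// Positions tried for a key whose home position is h (0 <= h < m assumed)
// in a table of size m: h, h + 1, h - 1, h + 4, h - 4, ..., modulo m.
std::vector<int> quadraticProbes(int h, int m) {
    std::vector<int> seq;
    seq.push_back(h % m);
    for (int i = 1; i <= (m - 1) / 2; ++i) {
        int sq = (i * i) % m;
        seq.push_back((h + sq) % m);
        seq.push_back(((h - sq) % m + m) % m);  // keep the result non-negative
    }
    return seq;
}
```

For m = 7 and home position 3, the sequence is 3, 4, 2, 0, 6, 5, 1, which visits every position of the table once.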

Double hashing
Double hashing uses a hash function of the form
h(k, i) = (h1(k) + i·h2(k)) mod m
where both h1 and h2 are auxiliary hash functions.
The initial probe goes to position T[h1(k)], and successive probe positions are offset from previous positions by the amount h2(k), modulo m.

Double hashing
Insertion by double hashing. The hash table has size 13 with h1(k) = k mod 13 and h2(k) = 1 + (k mod 11).
Since 14 ≡ 1 (mod 13) and 14 ≡ 3 (mod 11), we have h1(14) = 1 and h2(14) = 4, so the key 14 goes into empty slot 9, after slots 1 and 5 are examined and found to be occupied.
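The example with table size 13 can be reproduced with a small C++ sketch. The constants 13 and 11 come from the lecture's example; the function names, the –1 empty marker, and the restriction to non-negative keys are assumptions.

```cpp
#include <cassert>
#include <vector>

constexpr int kEmpty = -1;  // assumed marker for an empty slot
constexpr int kM = 13;      // table size m from the example
constexpr int kM2 = 11;     // modulus used by h2 in the example

int h1(int k) { return k % kM; }
int h2(int k) { return 1 + k % kM2; }

// i-th probe: h(k, i) = (h1(k) + i * h2(k)) mod m.
int probe(int k, int i) { return (h1(k) + i * h2(k)) % kM; }

// Returns the slot where k is stored, or -1 if no free slot is found.
// The table is assumed to have exactly kM slots.
int insertDouble(std::vector<int>& table, int k) {
    for (int i = 0; i < kM; ++i) {
        int slot = probe(k, i);
        if (table[slot] == kEmpty) {
            table[slot] = k;
            return slot;
        }
    }
    return -1;
}
```

For the key 14 the first three probes are slots 1, 5, and 9, so with slots 1 and 5 occupied the key lands in slot 9.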
Double hashing
In order for the entire hash table to be searched, the value h2(k) must be relatively prime to the hash-table size m.
One way to ensure this is to let m be prime and to design h2 so that it always returns a positive integer less than m. For example
• We could choose m prime and let
  h1(k) = k mod m
  h2(k) = 1 + (k mod m′)
  where m′ is chosen to be slightly less than m (say, m − 1).

Q&A
