Hash Table

Uploaded by

Mạnh Hậu Cao
Copyright © All Rights Reserved

Humanism – Service – Liberal education (Nhân bản – Phụng sự – Khai phóng)

Hash Table
Data Structures & Algorithms
CONTENT

• Introduction

• Static hashing

• Dynamic hashing

Data Structures & Algorithms 2




Introduction
• A table has several fields (types of information)
• A telephone book may have fields name, address, phone number
• A user account table may have fields user id, password, home folder
⇒ To find an entry in the table, you only need to know the contents of
one of the fields (not all of them).
• The key field
• In a telephone book, the key is usually the name
• In a user account table, the key is usually the user id
⇒ The key uniquely identifies an entry
• If the key is the name and no two entries in the telephone book have the same name,
the key uniquely identifies the entries



…Introduction

• How should we implement a table?

• How often are entries inserted and removed?

• How many of the possible key values are likely to be used?

• What is the likely pattern of searching for keys?


• e.g. Will most of the accesses be to just one or two key values?

• Is the table small enough to fit into memory?

• How long will the table exist?



…Introduction

• Element: a key and its entry

For searching purposes, it is best to store the key and the entry separately
(even though the key’s value may be inside the entry)

key        entry
“Smith”    “Smith”, “124 Hawkers Lane”, “9675846”
“Yeo”      “Yeo”, “1 Apple Crescent”, “0044 1970 622455”
(each row, a key together with its entry, is one element)


…Introduction
• Implementation 1 - Unsorted Sequential Array
• An array in which elements are stored consecutively in any order
• insert: add to back of array; O(1)
• find: search through the keys one at a time, potentially all of the keys; O(n)
• remove: find + replace removed node with last node; O(n)

index  key  entry
0      14   <data>
1      45   <data>
2      22   <data>
3      67   <data>
4      17   <data>
and so on…
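As a rough illustration of Implementation 1, here is a minimal C sketch. It assumes integer keys, an `int` field standing in for `<data>`, and a fixed capacity; the names `CAPACITY`, `UnsortedTable`, `ut_insert`, `ut_find` and `ut_remove` are hypothetical, chosen only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 100  /* hypothetical fixed capacity */

typedef struct {
    int key;
    int data;         /* stands in for <data> */
} Element;

typedef struct {
    Element items[CAPACITY];
    size_t count;     /* number of elements currently stored */
} UnsortedTable;

/* insert: add to the back of the array; O(1). Fails only when full. */
bool ut_insert(UnsortedTable *t, int key, int data) {
    if (t->count == CAPACITY) return false;
    t->items[t->count].key = key;
    t->items[t->count].data = data;
    t->count++;
    return true;
}

/* find: compare keys one at a time; O(n). Returns the index or -1. */
long ut_find(const UnsortedTable *t, int key) {
    for (size_t i = 0; i < t->count; i++)
        if (t->items[i].key == key) return (long)i;
    return -1;
}

/* remove: find, then overwrite the slot with the last element; O(n). */
bool ut_remove(UnsortedTable *t, int key) {
    long i = ut_find(t, key);
    if (i < 0) return false;
    t->items[i] = t->items[t->count - 1];
    t->count--;
    return true;
}
```

Removing a key copies the last element into the vacated slot, so removal costs only the O(n) search and no shifting. For example, after inserting 14, 45, 22, 67, 17, removing 45 moves 17 into index 1.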


…Introduction

• Implementation 2 - Sorted Sequential Array
• An array in which elements are stored consecutively, sorted by key
• insert: add in sorted order (larger keys shift one position right); O(n)
• find: binary search; O(log n)
• remove: find, then remove the element and shift the later elements left; O(n)

index  key  entry
0      15   <data>
1      17   <data>
2      22   <data>
3      45   <data>
4      67   <data>
and so on…
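A minimal C sketch of the two operations that set Implementation 2 apart, assuming plain `int` keys without entries; `binary_search` and `sorted_insert` are illustrative names. The search halves the range at every step, while the insertion has to shift every larger key one position to the right.

```c
#include <stddef.h>

/* Binary search over keys sorted in ascending order; O(log n).
   Returns the index of key, or -1 if it is absent. */
long binary_search(const int keys[], size_t n, int key) {
    size_t lo = 0, hi = n;              /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] == key) return (long)mid;
        if (keys[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return -1;
}

/* Insert in sorted order by shifting larger keys right; O(n).
   Assumes the array has room for one more element. */
void sorted_insert(int keys[], size_t *n, int key) {
    size_t i = *n;
    while (i > 0 && keys[i - 1] > key) {
        keys[i] = keys[i - 1];
        i--;
    }
    keys[i] = key;
    (*n)++;
}
```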


…Introduction

• Implementation 3 - Linked List (Unsorted or Sorted)
• Elements are again stored one after another, but linked by pointers rather than placed side by side in memory
• insert: add to front; O(1), or O(n) for a sorted list
• find: search through potentially all the keys, one at a time; O(n), still O(n) for a sorted list
• remove: find, then remove using pointer alterations; O(n)

Figure: key/entry nodes 14 → 45 → 22 → 67 → 17, each with <data>, and so on…


…Introduction

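• Implementation 4 - BST
• A BST, ordered by key
• insert: a standard insert; O(log n)
• find: a standard find (without removing, of course); O(log n)
• remove: a standard remove; O(log n)
• These bounds hold on average, or when the tree is kept balanced; a degenerate, unbalanced BST costs O(n)

Figure: root 22 <data>; children 14 <data> and 45 <data>; 17 <data> is the right child of 14 and 67 <data> the right child of 45; and so on…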


…Introduction
• Implementation 5 - Hash Table
• An array in which elements are not stored consecutively; their place of storage is calculated using the key and a hash function

Key → hash function → array index

index  key    entry
4      <key>  <data>
10     <key>  <data>
123    <key>  <data>

Hash values → mappings of the keys to the indexes in the hash table
…Introduction
• Implementation 5 - Hash Table
• Hashed key: result of applying a hash function to a key
• Keys and entries are scattered throughout the array
• Array elements are not stored consecutively; their place of storage is calculated using the key & a hash function
• insert: calculate place of storage, insert TableNode; O(1)
• find: calculate place of storage, retrieve entry; O(1)
• remove: calculate place of storage, set it to null; O(1)

All are O(1) on average, as long as the hash function spreads the keys well and collisions stay rare!

index  key    entry
4      <key>  <data>
10     <key>  <data>
123    <key>  <data>


…Introduction
• Applications of Hashing
• Compilers use hash tables to keep track of declared variables
• A hash table can be used for on-line spelling checkers — if misspelling detection
(rather than correction) is important, an entire dictionary can be hashed and
words checked in constant time
• Game playing programs use hash tables to store seen positions, thereby saving
computation time if the position is encountered again
• Hash functions can be used to quickly check for inequality — if two elements
hash to different values they must be different
• Storing sparse data



…Introduction

• When are other representations more suitable than hashing?


• Hash tables are very good if there is a need for many searches in a
reasonably stable table

• Hash tables are not so good if there are many insertions and
deletions, or if table traversals are needed



…Introduction

• Types of hashing

• Static hashing
• Tables with a fixed size

• Dynamic hashing
• Table sizes may vary



CONTENT

• Introduction

• Static hashing
• Hash table
• Hash methods
• Collision resolution

• Dynamic hashing



Static Hashing - Hash table

• Key-value pairs are stored in a fixed size table called a hash table
• A hash table is partitioned into many buckets
• Each bucket has many slots
• Each slot holds one record
• A hash function f(x) transforms the identifier (i.e. key) into an address in the hash table

Figure: a hash table of b buckets (rows 0, 1, …, b-1), each with s slots (columns 0, 1, …, s-1)
…Static Hashing - Hash table
• Uses an array hash_table[0..b-1]
• Each position of this array is a bucket
• A bucket can normally hold only one dictionary pair
• Uses a hash function f that converts each key k into an index in the range [0, b-1]
⇒ Every dictionary pair (key, element) is stored in its home bucket hash_table[f(key)]

• Data Structure for Hash Table
#define MAX_CHAR 10
#define TABLE_SIZE 13
typedef struct {
    char key[MAX_CHAR];
    /* other fields */
} element;
element hash_table[TABLE_SIZE];
…Static Hashing - Hash table
• Hash Function - Example

void init_table(element ht[]) {
    int i;
    for (i = 0; i < TABLE_SIZE; i++)
        ht[i].key[0] = '\0';       /* an empty key marks an unused bucket */
}

/* adds the character codes of the key, then reduces modulo the table size */
int hash(char *key, int table_size) {
    unsigned int hash_val = 0;
    while (*key != '\0')
        hash_val += *key++;
    return (hash_val % table_size);
}


…Static Hashing - Hash table
• Example
Key k = 11 → hash function f(k) = k % 5 → hashed value 1
hash_table[ ] has buckets 0 to 4; key 11 is stored in bucket 1


…Static Hashing - Hash table
• Collision
• In a hash table with a single array table (single-slot buckets), two different keys may be hashed to the same hash value
• Two different keys k1 and k2 with Hash(k1) = Hash(k2)
• Keys that have the same home bucket are synonyms
• This is called a collision. Example: k1 = 11, k2 = 21
Hash(11) = 11 % 5 = 1
Hash(21) = 21 % 5 = 1
• Choice of hash method
• To avoid collisions (two different pairs in the same bucket)
• Size (number of buckets) of the hash table
• Overflow handling method
• Overflow: there is no space in the bucket for the new pair
…Static Hashing - Hash table

• Overflow Example

Bucket  Slot 0  Slot 1
0       acos    atan     (synonyms)
1
2       char    ceil     (synonyms)
3       define
4       exp
5       float   floor
6
…
25

Synonyms char, ceil, clock and ctime all hash to bucket 2; with only two slots per bucket, clock and ctime cause an overflow.


…Static Hashing - Hash Methods
• Choice of Hash Method
• Requirements
• easy to compute
• minimal number of collisions
• If a hashing function groups key values together, this is called clustering of the keys
• The larger the cluster, the longer the search
• A good hashing function distributes the key values uniformly throughout the range
• For a randomly chosen key X, P(f(X) = i) = 1/b for every bucket i (b is the number of buckets)


…Static Hashing - Hash Methods
• Choice of Hash Method
• The worst hash function maps all keys to the same bucket
• The best hash function maps all keys to distinct addresses
• Ideally, distribution of keys to addresses is uniform and random

• Many hashing methods


– Truncation
– Division
– Mid-square
– Folding
– Digit analysis
– and so on



…Static Hashing - Hash Methods

• Truncation method
• Ignore part of the key and use the rest as the array index
(converting non-numeric parts)

• Example
• If students have a 9-digit identification number, take the last 3 digits as the table position
• e.g. 925371622 becomes 622
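The truncation example fits in a one-line C function, assuming the key is held as an unsigned integer; `truncation_hash` is a hypothetical name.

```c
/* Truncation: ignore all but the last three decimal digits of the key,
   giving table positions 0..999. */
unsigned truncation_hash(unsigned long key) {
    return (unsigned)(key % 1000);
}
```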


…Static Hashing - Hash Methods

• Division method
• Hash function f(k) = k % b
• Requires only a single division operation (quite fast)

• Certain values of b should be avoided
• If b = 2^p, then f(k) is just the p lowest-order bits of k, so the hash function does not depend on all the bits
• It is good practice to set the table size b to a prime number
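A small C sketch makes the warning concrete (the name `division_hash` is hypothetical). With b = 16 = 2^4, k % b equals k & 15, so keys that differ only in their higher bits, such as 16, 32 and 48, all land in bucket 0; the prime b = 17 sends the same keys to buckets 16, 15 and 14.

```c
/* Division method: f(k) = k % b.
   When b is a power of two, this keeps only the low-order bits of k. */
unsigned division_hash(unsigned k, unsigned b) {
    return k % b;
}
```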


…Static Hashing - Hash Methods

• Mid-square method
• Middle of square method
• This method squares the key value and then extracts a number of bits from the middle of the square
• The number of bits used to obtain the bucket address depends on the table size
• If r bits are used, the range of values is 0 to 2^r - 1
• This works well because most or all bits of the key value contribute to the result


…Static Hashing - Hash Methods

• Mid-square method
• Example
• consider records whose keys are 4-digit numbers in base 10
• The goal is to hash these key values to a table of size 100
• This range is equivalent to two digits in base 10, that is, r = 2
• If the input is the number 4567, squaring yields an 8-digit number, 20857489
• The middle two digits of this result are 57

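The decimal example above can be sketched in C, assuming 4-digit keys and a table of size 100. The 8-digit square is read as zero-padded on the left, and `mid_square_hash` is a hypothetical name. A binary version would extract r middle bits with shifts and masks instead.

```c
/* Mid-square for 4-digit decimal keys and a table of size 100 (r = 2 digits).
   The square (at most 9999 * 9999 = 99980001) is read as an 8-digit number,
   zero-padded on the left; its middle two digits are digits 4 and 5. */
unsigned mid_square_hash(unsigned key) {
    unsigned long sq = (unsigned long)key * key;
    return (unsigned)((sq / 1000) % 100);   /* drop 3 low digits, keep the next 2 */
}
```

For 4567 the square is 20857489; dropping the last three digits leaves 20857, and its last two digits give 57.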


…Static Hashing - Hash Methods

• Folding method
• Partition the key into several
parts of the same length
except for the last
• These parts are then added
together to obtain the hash
address for the key
• Two ways of carrying out this
addition
• shift folding
• folding and reverse
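The folding example on this slide did not survive extraction, so the C sketch below uses a hypothetical 14-digit key, 12320324111220, split from the left into 3-digit parts 123, 203, 241, 112, 20. Shift folding adds the parts as they are (699); folding and reverse reverses every second part first, giving 123 + 302 + 241 + 211 + 20 = 897. The function `fold_hash` and its parameters are illustrative, and the sketch returns the raw sum.

```c
#include <string.h>

/* Partition a decimal key (given as a digit string) into parts of `width`
   digits from the left (the last part may be shorter) and add the parts.
   If `reverse` is nonzero, every second part is reversed before adding
   (folding and reverse); otherwise the parts are added as-is (shift folding). */
unsigned long fold_hash(const char *digits, size_t width, int reverse) {
    size_t len = strlen(digits);
    unsigned long sum = 0;
    size_t part = 0;
    for (size_t start = 0; start < len; start += width, part++) {
        size_t end = start + width < len ? start + width : len;
        unsigned long value = 0;
        if (reverse && part % 2 == 1) {
            for (size_t i = end; i > start; i--)      /* read this part backwards */
                value = value * 10 + (unsigned long)(digits[i - 1] - '0');
        } else {
            for (size_t i = start; i < end; i++)
                value = value * 10 + (unsigned long)(digits[i] - '0');
        }
        sum += value;
    }
    return sum;
}
```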
…Static Hashing - Hash Methods

• Digit analysis method


• All the identifiers/keys in the table are known in advance
• The index is formed by extracting, and then manipulating specific digits from
the key
• For example, if the key is 925371622, selecting digits 5 through 7 (counting from the right) gives 537
• The manipulation can then take many forms
• Reversing the digits (735)
• Performing a circular shift to the right (753)
• Performing a circular shift to the left (375)
• Swapping each pair of digits (357)
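A C sketch of the digit-analysis example, assuming digit positions are counted from the right (which is what makes digits 5 to 7 of 925371622 equal 537) and that the selected part has exactly three digits. All function names are hypothetical; with only three digits, "swapping each pair" swaps just the first two.

```c
/* Select digits 5 through 7 of the key, counting from the right
   (position 1 = units digit). */
unsigned select_digits_5_to_7(unsigned long key) {
    return (unsigned)((key / 10000) % 1000);   /* drop 4 low digits, keep 3 */
}

/* Manipulations of a 3-digit value d1 d2 d3 */
unsigned reverse3(unsigned v)      { return (v % 10) * 100 + (v / 10 % 10) * 10 + v / 100; }
unsigned rotate_right3(unsigned v) { return (v % 10) * 100 + v / 10; }         /* d3 d1 d2 */
unsigned rotate_left3(unsigned v)  { return (v % 100) * 10 + v / 100; }        /* d2 d3 d1 */
unsigned swap_pair3(unsigned v)    { return (v / 10 % 10) * 100 + (v / 100) * 10 + v % 10; } /* d2 d1 d3 */
```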


…Static Hashing - Hash Methods

• Hash Function Implementations


• A generic hashing function does not exist
• However, there are several forms of a hash function
• Let’s discuss some specific hash function implementations

/* HT_SIZE (the number of buckets) is assumed to be #defined elsewhere */
typedef unsigned HASH_VALUE;
typedef unsigned short KEY_TYPE;

HASH_VALUE ModulusHashFunc(KEY_TYPE key) {
    return (key % HT_SIZE);
}


…Static Hashing - Hash Methods
• Folding hash function for integers
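typedef unsigned long int KEY_TYPE;

HASH_VALUE Fold_Integer_Hash(KEY_TYPE key) {
    /* key / 10000: high-order 4 digits; key % 10000: low-order 4 digits */
    return ((key / 10000 + key % 10000) % HT_SIZE);
}

The high-order and low-order digits are added to form a new integer, which is then reduced modulo HT_SIZE.
The source code above is for an 8-digit integer key.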
Example: key = 87629426, HT_SIZE = 251

hash value = (87629426 / 10000 + 87629426 % 10000) % 251
hash value = (8762 + 9426) % 251 = 18188 % 251 = 116

(8762 are the high-order digits of the key, 9426 the low-order digits)
…Static Hashing - Hash Methods

• Folding hash function for pointer-based character strings


Useful for applications involving symbols and names.
Folding hash: adds the ASCII values of each character and takes the result modulo the hash table size.
typedef char *KEY_TYPE;

HASH_VALUE Fold_String_Hash(KEY_TYPE key) {
    unsigned sum_ascii_values = 0;

    while (*key != '\0')
        sum_ascii_values += (unsigned char)*key++;   /* add each character code */

    return (sum_ascii_values % HT_SIZE);
}
…Static Hashing - Hash Methods

• Folding hash function for pointer-based character strings

Example:
key = “PRATIVA”, HT_SIZE = 31

hash value = ( P + R + A + T + I + V + A ) % 31
hash value = ( 80 + 82 + 65 + 84 + 73 + 86 + 65 ) % 31
hash value = 535 % 31 = 8


…Static Hashing - Hash Methods

• Digit Analysis-Based Folding

/* XORs the codes of the first four characters; assumes key has at least 4 characters */
static unsigned DigitFoldStringHash(char *key) {
    unsigned long hash;

    hash = ((key[0] ^ key[3]) ^ (key[1] ^ key[2])) % HT_SIZE;

    return (hash);
}


…Static Hashing - Collision Resolution

• Collision Resolution/Overflow Handling


• An overflow occurs when the home bucket for a new pair (key, element)
is full
• Methods of solving collisions/overflows
• Open Addressing
– Insert the element into the next free position in the table
• Separate Chaining
– Each table position is a linked list



…Static Hashing - Collision Resolution

• Open addressing
• relocate the key k to be inserted if it collides with an existing key. That is,
we store k at an entry different from hash_table[f(k)].

• Two issues arise


• what is the relocation scheme?
• how to search for k later?

• Common methods for resolving a collision in open addressing


• Linear probing
• Quadratic probing
• Double hashing
• Rehashing



…Static Hashing - Collision Resolution

• Open Addressing
• To insert a key k, compute f_0(k). If hash_table[f_0(k)] is empty, insert k there.

• If a collision occurs, probe alternative cells f_1(k), f_2(k), … until an empty cell is found

• f_i(k) = (f(k) + g(i)) % b, with g(0) = 0

• g: the collision resolution strategy


…Static Hashing - Collision Resolution

• Linear Probing
• g(i) = i
• Cells are probed sequentially (with wraparound)
• f_i(k) = (f(k) + i) % b

• Insertion
• Let k be the new key to be inserted. We compute f(k)
• For i = 0 to b-1
– compute L = (f(k) + i) % b
– if hash_table[L] is empty, put k there and stop
• If we cannot find an empty entry to put k, the table is full and we should report an error


…Static Hashing - Collision Resolution
• Linear Probing – Insert
• divisor = b (number of buckets) = 17
• Home bucket = key % 17
• Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
• Resulting table:

bucket  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
key     34 0  45 -  -  -  6  23 7  -  -  28 12 29 11 30 33
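The insertion sequence above can be replayed with a short C sketch that mirrors the loop "for i = 0 to b-1, try L = (f(k) + i) % b". It assumes non-negative integer keys and uses -1 as a hypothetical `EMPTY` marker; the name `lp_insert` is illustrative.

```c
#define B 17          /* number of buckets, the divisor */
#define EMPTY (-1)    /* marks an unused bucket; assumes keys are >= 0 */

/* Linear probing insert: try L = (f(k) + i) % B for i = 0 .. B-1, with f(k) = k % B.
   Returns the bucket used, or -1 if k is already present or the table is full. */
int lp_insert(int table[], int k) {
    int home = k % B;
    for (int i = 0; i < B; i++) {
        int L = (home + i) % B;
        if (table[L] == k) return -1;        /* duplicate key */
        if (table[L] == EMPTY) {             /* first free bucket: store k here */
            table[L] = k;
            return L;
        }
    }
    return -1;                               /* every bucket is occupied */
}
```

Running the twelve inserts in order reproduces the table on the slide, including 45 (home bucket 11) wrapping around to bucket 2.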


…Static Hashing - Collision Resolution
• Data Structure for Hash Table
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_CHAR 10
#define TABLE_SIZE 13
typedef struct {
    char key[MAX_CHAR];
    /* other fields */
} element;
element hash_table[TABLE_SIZE];

• Linear Probing – Insert
void linear_insert(element item, element ht[]) {
    int i, hash_value;
    i = hash_value = hash(item.key, TABLE_SIZE);
    while (strlen(ht[i].key)) {               /* bucket i is occupied */
        if (!strcmp(ht[i].key, item.key)) {
            fprintf(stderr, "Duplicate entry\n");
            exit(1);
        }
        i = (i + 1) % TABLE_SIZE;             /* try the next bucket, wrapping around */
        if (i == hash_value) {                /* back at the home bucket: no empty bucket */
            fprintf(stderr, "The table is full\n");
            exit(1);
        }
    }
    ht[i] = item;
}
…Static Hashing - Collision Resolution
• Linear Probing – Delete

bucket  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
key     34 0  45 -  -  -  6  23 7  -  -  28 12 29 11 30 33

• Delete(0)
key     34 -  45 -  -  -  6  23 7  -  -  28 12 29 11 30 33

• Search cluster for pair (if any) to fill vacated bucket.
key     34 45 -  -  -  -  6  23 7  -  -  28 12 29 11 30 33
(45, whose home bucket is 11, moves into bucket 1; bucket 3 is empty, so the search stops)


…Static Hashing - Collision Resolution
• Linear Probing – Delete
• Delete(34)
bucket  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
key     34 0  45 -  -  -  6  23 7  -  -  28 12 29 11 30 33
key     -  0  45 -  -  -  6  23 7  -  -  28 12 29 11 30 33

• Search cluster for pair (if any) to fill vacated bucket.
key     0  -  45 -  -  -  6  23 7  -  -  28 12 29 11 30 33
key     0  45 -  -  -  -  6  23 7  -  -  28 12 29 11 30 33
(0 moves back into its home bucket 0; then 45, home bucket 11, moves into bucket 1)
…Static Hashing - Collision Resolution
• Linear Probing – Delete
• Delete(29)
bucket  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
key     34 0  45 -  -  -  6  23 7  -  -  28 12 29 11 30 33
key     34 0  45 -  -  -  6  23 7  -  -  28 12 -  11 30 33

• Search cluster for pair (if any) to fill vacated bucket
key     34 0  45 -  -  -  6  23 7  -  -  28 12 11 -  30 33   (11 moves into bucket 13)
key     34 0  45 -  -  -  6  23 7  -  -  28 12 11 30 -  33   (30 moves into bucket 14)
key     34 0  -  -  -  -  6  23 7  -  -  28 12 11 30 45 33   (45 wraps around into bucket 15; 33, 34 and 0 stay in their home buckets)
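One way to implement the deletion shown on the last three slides in C, under the same assumptions as before (non-negative integer keys, a hypothetical `EMPTY` marker of -1, b = 17). After vacating bucket i, the sketch walks the rest of the cluster: a key at bucket j moves back into the hole unless its home bucket lies cyclically in (i, j], because moving it then would put it before its home bucket and later searches would miss it. The names `lp_delete` and `in_cyclic_range` are hypothetical.

```c
#define B 17
#define EMPTY (-1)

/* true if h lies in the cyclic range (i, j] of bucket indexes */
static int in_cyclic_range(int h, int i, int j) {
    if (i <= j) return i < h && h <= j;
    return i < h || h <= j;                  /* the range wraps past bucket B-1 */
}

/* Delete k, then refill the vacated bucket from the rest of its cluster.
   Returns 1 if k was found and removed, 0 otherwise. */
int lp_delete(int table[], int k) {
    int i = k % B;
    int probes = 0;
    while (table[i] != k) {                  /* locate k by linear probing */
        if (table[i] == EMPTY || ++probes == B) return 0;
        i = (i + 1) % B;
    }
    table[i] = EMPTY;                        /* vacate k's bucket */
    for (int j = (i + 1) % B; table[j] != EMPTY; j = (j + 1) % B) {
        int h = table[j] % B;                /* home bucket of the key at j */
        if (!in_cyclic_range(h, i, j)) {     /* safe to move it back into the hole */
            table[i] = table[j];
            table[j] = EMPTY;
            i = j;                           /* the hole moves forward */
        }
    }
    return 1;
}
```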
…Static Hashing - Collision Resolution
• Performance Of Linear Probing

bucket  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
key     34 0  45 -  -  -  6  23 7  -  -  28 12 29 11 30 33

• Worst-case find/insert/erase time is Θ(n), where n is the number of pairs in the table
• This happens when all pairs are in the same cluster (one contiguous run of occupied buckets)


…Static Hashing - Collision Resolution
• Quadratic Probing
• Linear probing searches buckets (f(x) + i) % b
• Quadratic probing uses a quadratic function of i as the increment
• Examine buckets f(x), (f(x) + i^2) % b, (f(x) - i^2) % b, for 1 <= i <= (b-1)/2
• b is a prime number of the form 4j+3, where j is an integer

• Random Probing
• Random probing uses random numbers to choose the next bucket
• f(x) = (f'(x) + S[i]) % b
– S is a table of size b-1
– S is a random permutation of the integers 1, …, b-1
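A C sketch of the quadratic probe order for b = 7, a prime of the form 4j+3 with j = 1. Probe number p = 0 is the home bucket; odd p adds i^2 and even p subtracts it, with i = (p + 1) / 2, so p runs over the 1 + 2·(b-1)/2 = b positions listed on the slide. The extra `+ B` keeps C's `%` non-negative when the difference is negative. The name `quad_probe` is hypothetical.

```c
#define B 7   /* prime of the form 4j + 3 (j = 1) */

/* Position of probe number p (p = 0 .. B-1) for quadratic probing:
   p = 0 -> home, p = 2i-1 -> (home + i*i) % B, p = 2i -> (home - i*i) % B. */
int quad_probe(int home, int p) {
    if (p == 0) return home;
    int i = (p + 1) / 2;
    int step = (i * i) % B;
    if (p % 2 == 1) return (home + step) % B;
    return ((home - step) % B + B) % B;   /* avoid a negative remainder */
}
```

For home bucket 0 the order is 0, 1, 6, 4, 3, 2, 5, which reaches all seven buckets.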
…Static Hashing - Collision Resolution

• Double hashing
• Double hashing is one of the best methods for dealing with collisions
• If the slot is full, a second hash function (different from the first one) is computed and combined with the first hash function
• f(k, i) = (f1(k) + i * f2(k)) % b
• f2(k) must never be 0; otherwise every probe would return to the same full bucket
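A C sketch of double-hashing insertion. The slides do not fix f1 or f2; this example assumes f1(k) = k % b and f2(k) = R - (k % R) with a prime R < b, a common textbook choice that keeps the step between 1 and R. With b prime, every such step visits all b buckets. `B`, `R`, `EMPTY` and `dh_insert` are hypothetical names.

```c
#define B 13       /* table size; prime, so every step size visits all buckets */
#define R 11       /* hypothetical second modulus: a prime smaller than B */
#define EMPTY (-1) /* marks an unused bucket; assumes keys are >= 0 */

int f1(int k) { return k % B; }
int f2(int k) { return R - (k % R); }   /* in 1..R, never 0 */

/* Double hashing insert: probe (f1(k) + i * f2(k)) % B for i = 0 .. B-1.
   Returns the bucket used, or -1 if no empty bucket was found. */
int dh_insert(int table[], int k) {
    for (int i = 0; i < B; i++) {
        int L = (f1(k) + i * f2(k)) % B;
        if (table[L] == EMPTY) {
            table[L] = k;
            return L;
        }
    }
    return -1;
}
```

With B = 13, key 18 goes to bucket 5; 44 also hashes to 5 and steps by f2(44) = 11 to bucket 3; 31 hashes to 5 as well and steps by f2(31) = 2 to bucket 7.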


…Static Hashing - Collision Resolution

• Rehashing
• Enlarging the table
• To rehash:
• Create a new table of double the size, increased until the size is again prime
• Transfer the entries in the old table to the new table, recomputing their positions with the hash function
• Rehash when the table is completely full
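A C sketch of rehashing, assuming non-negative integer keys, linear probing for placement, and the slide's trigger (rehash only when the table is completely full). `Table`, `next_prime`, `place`, `rehash` and `table_insert` are hypothetical names.

```c
#include <stdlib.h>

#define EMPTY (-1)

typedef struct {
    int *slots;
    int size;      /* number of buckets b */
    int count;     /* number of keys stored */
} Table;

static int is_prime(int n) {
    if (n < 2) return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

static int next_prime(int n) {              /* smallest prime >= n */
    while (!is_prime(n)) n++;
    return n;
}

/* Linear probing placement; assumes an empty bucket exists
   and that k is not already present. */
static void place(int *slots, int size, int k) {
    int L = k % size;
    while (slots[L] != EMPTY) L = (L + 1) % size;
    slots[L] = k;
}

/* Move every key into a table of about double the size, rounded up to a prime;
   each position is recomputed with the new size. Returns 0 if malloc fails. */
int rehash(Table *t) {
    int new_size = next_prime(2 * t->size);
    int *fresh = malloc(sizeof *fresh * (size_t)new_size);
    if (fresh == NULL) return 0;
    for (int i = 0; i < new_size; i++) fresh[i] = EMPTY;
    for (int i = 0; i < t->size; i++)
        if (t->slots[i] != EMPTY) place(fresh, new_size, t->slots[i]);
    free(t->slots);
    t->slots = fresh;
    t->size = new_size;
    return 1;
}

/* Insert k, rehashing first when the table is already full. */
int table_insert(Table *t, int k) {
    if (t->count == t->size && !rehash(t)) return 0;
    place(t->slots, t->size, k);
    t->count++;
    return 1;
}
```

Starting from 5 buckets, the sixth insert triggers a rehash to next_prime(10) = 11 buckets, and every stored key gets a new position computed with % 11.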


…Static Hashing - Collision Resolution

• Separate Chaining
• Instead of storing the keys themselves, each position of the hash table holds a linked list
• Keep a linked list of the keys that hash to the same value

Figure: separate chaining with f(k) = k mod 10


…Static Hashing - Collision Resolution

• Separate Chaining
• To insert a key k
• Compute f(k) to determine which list to traverse
• If hash_table[f(k)] contains a null pointer, initialize this entry to point to a linked list that contains k alone
• If hash_table[f(k)] is a non-empty list, we add k at the beginning of this list
• To delete a key k
• Compute f(k), then search for k within the list at hash_table[f(k)]. Delete k if it is found.


…Static Hashing - Collision Resolution

• Separate Chaining
• If the hash function works well, the number of keys in each linked list
will be a small constant
• Therefore, we expect that each search, insertion, and deletion can be
done in constant time
• Disadvantage
• Memory allocation in linked list manipulation will slow down the program
• Advantage
• Deletion is easy
• Array size is not a limitation



…Static Hashing - Collision Resolution
• Example
Hash Table

[0] 200 510 30


[1] 401 111
[2] 542 222
[3]
[4]
[5]
[6]
[7]
[8]
[9] 39
…Static Hashing - Collision Resolution
Insert a record with key = 33
Hash(k) = k % 10 = 3 → insert in chain 3

Hash Table
[0] 200 510 30
[1] 401 111
[2] 542 222
[3]
[4]
[5]
[6]
[7]
[8]
[9] 39
…Static Hashing - Collision Resolution
Insert a record with key = 33
Hash(k) = k % 10 = 3 → insert in chain 3

Hash Table
[0] 200 510 30
[1] 401 111
[2] 542 222
[3] 33
[4]
[5]
[6]
[7]
[8]
[9] 39
…Static Hashing - Collision Resolution
Insert a record with key = 33
Hash(k) = k % 10 = 3 → insert in chain 3

Hash Table
[0] 200 510 30
[1] 401 111
[2] 542 222
[3] 33
[4]
[5]
[6]
[7]
[8]
[9] 39

ERROR!
…Static Hashing - Collision Resolution
Insert a record with key = 73
Hash(k) = k % 10 = 3 → insert in chain 3

Hash Table
[0] 200 510 30
[1] 401 111
[2] 542 222
[3] 33 73
[4]
[5]
[6]
[7]
[8]
[9] 39
…Static Hashing - Collision Resolution
Insert a record with key = 13
Hash(k) = k % 10 = 3 → insert in chain 3

Hash Table
[0] 200 510 30
[1] 401 111
[2] 542 222
[3] 33 73 13
[4]
[5]
[6]
[7]
[8]
[9] 39
…Static Hashing - Collision Resolution
Insert a record with key = 43
Hash(k) = k % 10 = 3 → insert in chain 3

Hash Table
[0] 200 510 30
[1] 401 111
[2] 542 222
[3] 33 73 13 43
[4]
[5]
[6]
[7]
[8]
[9] 39
…Static Hashing - Collision Resolution
• Data Structure for Chaining
#define MAX_CHAR 10
#define TABLE_SIZE 13
#define IS_FULL(ptr) (!(ptr))      /* true when malloc returned NULL */
typedef struct {
    char key[MAX_CHAR];
    /* other fields */
} element;
typedef struct list *list_pointer;
struct list {
    element item;
    list_pointer link;
};
list_pointer hash_table[TABLE_SIZE];


…Static Hashing - Collision Resolution
• Separate Chaining – Insert

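/* element_type, HASH_TABLE, position, LIST, find, hash and fatal_error are
   assumed to be declared elsewhere; each list starts with a header node */
void insert(element_type key, HASH_TABLE H) {
    position pos, new_cell;
    LIST L;
    pos = find(key, H);                        /* insert only if key is absent */
    if (pos == NULL) {
        new_cell = (position) malloc(sizeof(struct list_node));
        if (new_cell == NULL)
            fatal_error("Out of space!!!");
        else {
            L = H->the_lists[hash(key, H->table_size)];   /* header of key's chain */
            new_cell->next = L->next;          /* link in right after the header, */
            new_cell->element = key;           /* i.e. at the front of the chain; */
            L->next = new_cell;                /* string keys would need strcpy */
        }
    }
}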
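A self-contained C sketch of separate chaining built on the `element`/`list_pointer` declarations from the data-structure slide. It assumes the sum-of-characters hash from the earlier example, inserts new keys at the front of their chain, and rejects duplicates and over-long keys; `chain_search`, `chain_insert` and `chain_delete` are hypothetical names, not the `insert` routine shown above.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CHAR 10
#define TABLE_SIZE 13

typedef struct {
    char key[MAX_CHAR];
    /* other fields */
} element;

typedef struct list *list_pointer;
struct list {
    element item;
    list_pointer link;
};

list_pointer hash_table[TABLE_SIZE];        /* static storage: all chains start NULL */

static int hash(const char *key) {          /* sum of character codes, mod table size */
    unsigned sum = 0;
    while (*key != '\0') sum += (unsigned char)*key++;
    return (int)(sum % TABLE_SIZE);
}

/* Walk the chain of key's home bucket; returns the node or NULL. */
list_pointer chain_search(const char *key) {
    for (list_pointer p = hash_table[hash(key)]; p != NULL; p = p->link)
        if (strcmp(p->item.key, key) == 0) return p;
    return NULL;
}

/* Add key at the front of its chain; returns 0 for a duplicate,
   a key too long for MAX_CHAR, or a failed malloc. */
int chain_insert(const char *key) {
    if (strlen(key) >= MAX_CHAR || chain_search(key) != NULL) return 0;
    list_pointer node = malloc(sizeof *node);
    if (node == NULL) return 0;
    strcpy(node->item.key, key);
    int h = hash(key);
    node->link = hash_table[h];
    hash_table[h] = node;
    return 1;
}

/* Unlink and free the node holding key; returns 1 if it was found. */
int chain_delete(const char *key) {
    list_pointer *pp = &hash_table[hash(key)];
    while (*pp != NULL && strcmp((*pp)->item.key, key) != 0)
        pp = &(*pp)->link;
    if (*pp == NULL) return 0;
    list_pointer dead = *pp;
    *pp = dead->link;
    free(dead);
    return 1;
}
```

Keys such as "ab" and "ba" have the same character sum, so they collide and simply share one chain.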


…Static Hashing - Collision Resolution
• Other Implementation: sorted chains
• Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
• Bucket = key % 17

[0]  0 → 34
[6]  6 → 23
[7]  7
[11] 11 → 28 → 45
[12] 12 → 29
[13] 30
[16] 33
(all other buckets are empty)


CONTENT

• Introduction

• Static hashing

• Dynamic hashing


Dynamic hashing

• Dynamic hashing
• The number of identifiers in a hash table may vary
• Use a small table initially; when a lot of identifiers are inserted into
the table, we may increase the table size
• When a lot of identifiers are deleted from the table, we may reduce
the table size
• This is called dynamic hashing or extendible hashing
• Dynamic hashing usually involves databases and buckets may also
be called pages



…Dynamic hashing

• We assume each page contains p records

• Each record is identified by a key (i.e., the identifiers in static hashing)

• Space utilization = n / (m * p)

• where n is the number of actual records

• m is the number of pages reserved

• p is the number of records per page

• That is, space utilization = NumberOfRecords / (NumberOfPages * PageCapacity)


…Dynamic hashing

• Objective: Find an extendible hashing function such that


• it minimizes the number of pages accessed
• space utilization is as high as possible

• There are two approaches


• Dynamic hashing using directories
• Dynamic hashing without directories


SUMMARY

• Introduction

• Static hashing

• Dynamic hashing


Enjoy the Course…!
