Lecture 3.2.1 Hashing
ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Applications of Hashing
• Construct a message authentication code (MAC)
• Digital signature
• Make commitments, but reveal message later
• Time stamping
• Key updating: the key is hashed at fixed intervals, and each hash value becomes the new key
Hash Tables
• We’ll discuss the hash table ADT which supports only a subset of the operations allowed by
binary search trees.
• The implementation of hash tables is called hashing.
• Hashing is a technique for performing insertions, deletions and finds in constant average
time, i.e. O(1).
• This data structure, however, is not efficient for operations that require ordering information
among the elements, such as findMin, findMax and printing the entire table in sorted order.
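As a rough illustration of the operations listed above, the sketch below uses std::unordered_map, the C++ standard library's hash table. The names SalaryTable, demoBasicOperations and findMinSalary are hypothetical helpers chosen for this example, and the salary figures are sample data only.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using SalaryTable = std::unordered_map<std::string, int>;

// Insert, find and delete: each takes constant time on average.
bool demoBasicOperations()
{
    SalaryTable salary;
    salary["john"] = 25000;                               // insert
    salary["phil"] = 31250;
    bool found = salary.find("john") != salary.end();     // find
    salary.erase("phil");                                 // delete
    bool gone = salary.find("phil") == salary.end();
    return found && gone && salary.size() == 1;
}

// findMin has no shortcut: the table keeps no order, so every
// element must be inspected, which costs O(n).
int findMinSalary(const SalaryTable& salary)
{
    assert(!salary.empty());
    int best = salary.begin()->second;
    for (const auto& entry : salary)
        if (entry.second < best) best = entry.second;
    return best;
}
```

A binary search tree would answer findMin by walking to its leftmost node; the hash table has to scan every entry.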
General Idea
• The ideal hash table structure is merely an array of some fixed size, containing the items.
• A stored item needs to have a data member, called key, that will be used in computing the index
value for the item.
• The key could be an integer, a string, etc.,
• e.g. a name or ID that is part of a larger employee record.
• The size of the array is TableSize.
• The items that are stored in the hash table are indexed by values from 0 to TableSize – 1.
• Each key is mapped into some number in the range 0 to TableSize – 1.
• The mapping is called a hash function.
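The idea above can be sketched as a fixed-size array of items plus a function that turns a key into an index. This is only a minimal sketch: the value TableSize = 10, the Item fields, and the placeholder mapping in indexOf (sum of character codes mod TableSize) are assumptions made for illustration, and collisions are ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

const std::size_t TableSize = 10;   // assumed size, for illustration only

struct Item {
    std::string key;   // e.g. an employee name
    int salary;        // the rest of the record
    bool used;         // true once the cell holds an item
};

// Placeholder hash function (an assumption): sum of character codes mod TableSize.
std::size_t indexOf(const std::string& key)
{
    std::size_t sum = 0;
    for (unsigned char ch : key) sum += ch;
    return sum % TableSize;          // always in 0 .. TableSize - 1
}

// Stores the item in the cell chosen by indexOf and returns that index.
// An item already in the cell is simply overwritten.
std::size_t store(std::array<Item, TableSize>& table, const std::string& key, int salary)
{
    std::size_t i = indexOf(key);
    assert(i < TableSize);
    table[i].key = key;
    table[i].salary = salary;
    table[i].used = true;
    return i;
}
```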
Example Hash Table
Items (key, salary): john 25000, phil 31250, dave 27500, mary 28200
The key of each item is passed to the hash function, which selects the item's cell:
0
1
2
3   john 25000
4   phil 31250
5
6   dave 27500
7   mary 28200
8
9
Hash Function
• The hash function:
• must be simple to compute.
• must distribute the keys evenly among the cells.
• If we knew in advance which keys would occur, we could write a perfect hash
function, but usually we do not.
Hash function
Problems:
• Keys may not be numeric.
• Number of possible keys is much larger than the space available in table.
• Different keys may map to the same location.
• The hash function is not one-to-one, so collisions occur.
• If there are too many collisions, the performance of the hash table will suffer
dramatically.
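A small sketch of how collisions arise. It assumes integer keys and key mod tableSize as the hash function; the name countCollisions is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Counts how many keys land in a cell that an earlier key already reached,
// using key mod tableSize as the (assumed) hash function.
int countCollisions(const std::vector<int>& keys, int tableSize)
{
    assert(tableSize > 0);
    std::vector<int> hits(tableSize, 0);
    int collisions = 0;
    for (int key : keys) {
        int cell = key % tableSize;      // assumes non-negative keys
        if (hits[cell] > 0) ++collisions;
        ++hits[cell];
    }
    return collisions;
}
```

With keys 12, 22, 35, 47 and a table of size 10, keys 12 and 22 both map to cell 2, so one collision is counted. Whenever there are more keys than cells, at least one collision is unavoidable.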
Hash Functions
• If the input keys are integers, Key mod TableSize is a reasonable
general strategy,
• unless the keys happen to have some undesirable property (e.g. all keys end
in 0 and TableSize is 10).
• If the keys are strings, the hash function needs more care.
• First convert the string into a numeric value.
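To make the "all keys end in 0" case for integer keys concrete, the sketch below counts how many different cells Key mod TableSize actually reaches. The function name distinctCells and the sample keys are assumptions for illustration.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Number of different cells reached by key mod tableSize.
// Assumes non-negative keys.
int distinctCells(const std::vector<int>& keys, int tableSize)
{
    assert(tableSize > 0);
    std::set<int> cells;
    for (int key : keys)
        cells.insert(key % tableSize);
    return static_cast<int>(cells.size());
}
```

For the keys 10, 20, 30, 40, 50, 60, 70, a table of size 10 puts every key in cell 0, while a table of size 7 spreads them over seven different cells.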
Some methods
• Truncation:
• e.g. map 123456789 to a table of 1000 addresses by picking 3 digits of the key.
• Folding:
• e.g. 123|456|789: add the parts and take the sum mod the table size.
• Key mod N:
• N is the size of the table; it is better if N is prime.
• Squaring:
• Square the key and then truncate.
• Radix conversion:
• e.g. treat the digits 1 2 3 4 as a number in base 11; truncate if necessary.
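The methods above can be sketched as follows. The slide leaves several details open, so these choices are assumptions: truncation keeps the last three digits, folding uses 3-digit groups, squaring drops the two lowest digits of the square before reducing, and all function names are hypothetical.

```cpp
#include <cassert>

// Truncation: keep only the last three decimal digits of the key.
long long truncateKey(long long key)
{
    assert(key >= 0);
    return key % 1000;
}

// Folding: split the key into 3-digit groups, add them, take mod tableSize.
long long foldKey(long long key, long long tableSize)
{
    assert(key >= 0 && tableSize > 0);
    long long sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return sum % tableSize;
}

// Squaring: square the key, drop the two lowest digits, reduce to the table.
// Assumes key * key fits in a long long.
long long squareKey(long long key, long long tableSize)
{
    assert(key >= 0 && tableSize > 0);
    return (key * key / 100) % tableSize;
}

// Radix conversion: read the decimal digits of the key as base-11 digits.
long long radixKey(long long key, long long tableSize)
{
    assert(key >= 0 && tableSize > 0);
    long long value = 0, place = 1;
    while (key > 0) {
        value += (key % 10) * place;
        place *= 11;
        key /= 10;
    }
    return value % tableSize;
}
```

For a table of 1000 addresses, truncateKey(123456789) gives 789, foldKey(123456789, 1000) gives (123 + 456 + 789) mod 1000 = 368, and radixKey(1234, 1000) gives 1610 mod 1000 = 610.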
Hash Function 1
• Add up the ASCII values of all characters of the key.
int hash(const string &key, int tableSize)
{
    int hashVal = 0;
    for (char ch : key) hashVal += ch;   // add up the character codes
    return hashVal % tableSize;
}
Hash Function 2
• Examine only the first 3 characters of the key.
int hash(const string &key, int tableSize)
{
    return (key[0] + 27 * key[1] + 729 * key[2]) % tableSize;
}
• In theory, 26 * 26 * 26 = 17,576 different three-letter combinations can be generated. However, English
is not random: only 2,851 different combinations actually occur.
• Thus this function, although easily computable, is also not appropriate if the hash table is reasonably
large.
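A short sketch of why looking at only three characters limits the function: any two keys that share their first three characters get the same hash value. The name hashFirstThree and the table size used in the note are assumptions for this example.

```cpp
#include <cassert>
#include <string>

// Hash Function 2 from the slide, under a different (hypothetical) name.
// Requires keys of at least three characters.
int hashFirstThree(const std::string& key, int tableSize)
{
    assert(key.size() >= 3 && tableSize > 0);
    return (key[0] + 27 * key[1] + 729 * key[2]) % tableSize;
}
```

For instance, hashFirstThree("string", 10007) and hashFirstThree("structure", 10007) are equal, because both keys start with "str".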
Hash Function 3
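$$\mathrm{hash}(key) = \sum_{i=0}^{KeySize-1} Key[KeySize - i - 1] \cdot 37^{i}$$
int hash(const string &key, int tableSize)
{
    int hashVal = 0;
    for (char ch : key)
        hashVal = 37 * hashVal + ch;   // Horner's rule for the sum above
    hashVal %= tableSize;
    if (hashVal < 0)   /* in case overflow occurs */
        hashVal += tableSize;
    return hashVal;
}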
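Hash function for strings:
Example: key = "ali", KeySize = 3, TableSize = 10,007
i        0     1     2
key[i]   a     l     i
ASCII    97    108   105
hash("ali") = (97 * 37^2 + 108 * 37 + 105) mod 10,007 = 136,894 mod 10,007 = 6,803
"ali" is stored in cell 6,803 of a table whose cells are numbered 0 to 10,006.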
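The polynomial hash with multiplier 37 can be checked with a short sketch. Two choices here are assumptions rather than the slide's code: the function is named hashString, and it uses unsigned arithmetic, which wraps around on overflow and so never produces a negative value that would need correcting.

```cpp
#include <cassert>
#include <string>

// Horner's rule: ((key[0] * 37) + key[1]) * 37 + key[2], and so on.
// Unsigned arithmetic wraps on overflow, so no negative-value fix-up is needed.
unsigned int hashString(const std::string& key, unsigned int tableSize)
{
    assert(tableSize > 0);
    unsigned int hashVal = 0;
    for (unsigned char ch : key)
        hashVal = 37 * hashVal + ch;
    return hashVal % tableSize;
}
```

With the ASCII codes 'a' = 97, 'l' = 108 and 'i' = 105, hashString("ali", 10007) evaluates (97 * 37^2 + 108 * 37 + 105) mod 10,007 and returns 6803.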
Books Recommended
• Lipschutz, Seymour, “Data Structures”, Schaum's Outline Series, Tata McGraw Hill.
• Gilberg and Forouzan, “Data Structure with C”, Cengage Learning.
• Augenstein, Moshe J., and Tanenbaum, Aaron M., “Data Structures using C and C++”, Prentice Hall
of India.
• Goodrich, Michael T., Tamassia, Roberto, and Mount, David M., “Data Structures and
Algorithms in C++”, Wiley Student Edition.
• Aho, Alfred V., Ullman, Jeffrey D., and Hopcroft, John E., “Data Structures and Algorithms”,
Addison Wesley.