What are Hash Functions and How to Choose a Good Hash Function?
Last Updated: 08 Feb, 2025
What is a Hash Function?
A hash function is a function that converts a key, such as a large number (a phone number, for instance) or a string, into a smaller, practical integer value. This integer is then used as an index into a hash table.
What is Meant by a Good Hash Function?
A good hash function should have the following properties:
- Efficiently Computable: The function should be fast to compute.
- Uniform Distribution of Keys: The hash function should distribute the keys evenly across the hash table (each table position should be equally likely for each key).
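For example, for phone numbers, a bad hash function would be to take the first three digits, because many numbers share the same leading digits and would land in the same slot. A better hash function would use the last three digits, which tend to vary more from one number to the next. However, this still ignores most of the key, so it may not always be the best approach, and there may be better ways to design the hash function.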
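The phone number example can be illustrated with a small sketch. The function names and the sample numbers below are made up for illustration and do not come from any library; the sample assumes ten numbers that share the same leading digits.

```python
def first_three_digits(phone: str) -> int:
    """Weak choice: hash on the leading three digits only."""
    return int(phone[:3])


def last_three_digits(phone: str) -> int:
    """Better choice for this data: hash on the trailing three digits."""
    return int(phone[-3:])


# Hypothetical sample: ten made-up numbers that share the prefix 415.
phones = [f"415555{n:04d}" for n in range(100, 110)]

# Count how many distinct hash values each function produces.
distinct_first = len({first_three_digits(p) for p in phones})
distinct_last = len({last_three_digits(p) for p in phones})

print(distinct_first)  # 1  -> every number collides
print(distinct_last)   # 10 -> every number gets its own value
```

On this sample, the leading digits put all ten keys in one slot, while the trailing digits separate them. Real data may behave differently, which is why the key distribution matters when designing the function.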
Rules for Choosing a Good Hash Function:
- Simplicity: The hash function should be simple to compute.
- Minimise Collisions: The number of collisions should be minimised when placing a record in the hash table. Ideally, no collision should occur, which would make it a perfect hash function.
- Uniform Distribution: The hash function should produce hash values that are distributed uniformly over the hash table.
- Consider All Bits of the Key: The hash function should depend on every bit of the key. A function that extracts only a portion of the key is not suitable.
In practice, heuristic techniques can often be employed to create a hash function that performs well, and information about the distribution of the keys may be helpful in designing the function. Making the hash depend on every bit of the key means that a small change in the key, even a single-bit difference, is likely to change the hash value.
Ideally, two keys that differ in just one bit, or that are permutations of each other (such as 139 and 319), should hash to different values.
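To see why permutations matter, the following sketch compares two hypothetical hash functions written only for this illustration: one that adds up the digits (so digit order is ignored) and one that weights each digit by its position. The base 31 and the table size 101 are arbitrary assumptions.

```python
def digit_sum_hash(key: int, table_size: int) -> int:
    """Order-insensitive: any permutation of the digits gives the same sum."""
    return sum(int(d) for d in str(key)) % table_size


def positional_hash(key: int, table_size: int, base: int = 31) -> int:
    """Order-sensitive: each digit is mixed in together with its position."""
    h = 0
    for d in str(key):
        h = (h * base + int(d)) % table_size
    return h


TABLE_SIZE = 101  # arbitrary prime chosen for the demonstration

print(digit_sum_hash(139, TABLE_SIZE), digit_sum_hash(319, TABLE_SIZE))    # 13 13
print(positional_hash(139, TABLE_SIZE), positional_hash(319, TABLE_SIZE))  # 53 95
```

The positional version does not guarantee that every pair of permutations is separated, since collisions remain possible in any finite table, but it no longer collides on permutations by construction.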
Heuristic Methods for Hashing
1. Hashing by Division: In this method, we map a key to one of the slots of a hash table by taking the remainder when dividing the key by the table size. The hash function can be represented as:
h(key) = key % table_size
Since it requires only a single division operation, hashing by division is quite efficient. However, some values of table_size should be avoided. For example, if table_size is a power of 2 (table_size = 2^k), then h(key) is just the k lowest-order bits of the key. The higher-order bits are ignored, so the hash function may not distribute the keys evenly.
For example: Suppose the key 37599 is mapped using a table size of 17:
h(37599) = 37599 % 17 = 12
However, for key 573:
h(573) = 573 % 17 = 12
This leads to a collision because both keys are mapped to the same hash value. Note that 17 is itself prime, so no choice of table size rules out collisions entirely. To reduce systematic collisions, the table size should ideally be a prime number, and it should not be close to an exact power of 2.
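The division method and the example above can be sketched as follows. The function name division_hash and the three sample keys in the second half are assumptions made for this illustration.

```python
def division_hash(key: int, table_size: int) -> int:
    """Hashing by division: the remainder of key divided by table_size."""
    return key % table_size


# The two keys from the example collide in a table of size 17.
print(division_hash(37599, 17))  # 12
print(division_hash(573, 17))    # 12

# With a power-of-2 table size, only the low-order bits of the key matter.
# These hypothetical keys differ only in their higher-order bits.
keys = [0b0001_0000, 0b0010_0000, 0b0011_0000]  # 16, 32, 48

power_of_two_slots = [division_hash(k, 16) for k in keys]
prime_slots = [division_hash(k, 17) for k in keys]

print(power_of_two_slots)  # [0, 0, 0]    -> all three keys collide
print(prime_slots)         # [16, 15, 14] -> the keys are spread out
```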
2. The Multiplication Method: In the multiplication method, we multiply the key k by a constant real number c in the range 0 < c < 1, and then extract the fractional part of the result. We then multiply this value by the table size m and take the floor of the result. The hash function is given by:
h(k) = floor(m * (k * c mod 1))
Alternatively, this can also be written as:
h(k) = floor(m * frac(k * c))
where floor(x) is the integer part of x, and frac(x) is the fractional part of x (i.e., frac(x) = x - floor(x)).
An advantage of the multiplication method is that the value of m is not critical. It’s typically chosen as a power of 2 (e.g., m = 2^p) for simplicity, as this is easy to implement on most computers.
The constant c is often chosen to be the fraction (sqrt(5) - 1) / 2 = 0.618033988... because this value works reasonably well.
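A minimal floating-point sketch of the formula h(k) = floor(m * frac(k * c)) is shown below. The function name multiplication_hash and the sample keys are assumptions for illustration. Floating-point arithmetic loses precision for very large keys, so this version is meant to show the idea rather than to serve as a production implementation.

```python
import math

# The constant suggested in the text: (sqrt(5) - 1) / 2 = 0.618033988...
C = (math.sqrt(5) - 1) / 2


def multiplication_hash(k: int, m: int, c: float = C) -> int:
    """Return floor(m * frac(k * c)) for a non-negative integer key k."""
    frac = (k * c) % 1.0          # fractional part of k * c
    return math.floor(m * frac)   # scale to the table size and truncate


m = 2 ** 14  # table size chosen as a power of 2

# Every hash value falls in the range 0 .. m - 1.
for key in (1, 2, 3, 1000, 99999):
    print(key, multiplication_hash(key, m))
```

Because the result depends on the fractional part of k * c rather than on a remainder modulo m, choosing m as a power of 2 does not cause the low-bits-only problem seen with the division method.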
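Example: Suppose k = 123456, m = 2^14 = 16384, and w = 32 (where w is the word size of the machine). With c = (sqrt(5) - 1) / 2, the constant is represented as the w-bit integer s = floor(c * 2^32) = 2654435769. Then we calculate:
k * s = 327706022297664 = (76300 * 2^32) + 17612864
r1 = 76300, r0 = 17612864
Here r1 is the high-order word of the product and r0 is the low-order word. The 14 most significant bits of r0 yield the hash value:
h(k) = 67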
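The worked example can be reproduced with integer arithmetic only. This sketch assumes that s is obtained as floor(c * 2^w) with c = (sqrt(5) - 1) / 2 and w = 32; the names W, S, split_product, and multiplication_hash_fixed are chosen for this illustration.

```python
import math

W = 32                                    # word size in bits
S = int((math.sqrt(5) - 1) / 2 * 2 ** W)  # floor(c * 2^32) = 2654435769


def split_product(k: int) -> tuple:
    """Split k * S into the high word r1 and the low W-bit word r0."""
    product = k * S
    r1 = product >> W                # high-order word
    r0 = product & ((1 << W) - 1)    # low-order W bits
    return r1, r0


def multiplication_hash_fixed(k: int, p: int) -> int:
    """Hash k into a table of size 2^p using the top p bits of r0."""
    _, r0 = split_product(k)
    return r0 >> (W - p)


r1, r0 = split_product(123456)
print(r1, r0)                                 # 76300 17612864
print(multiplication_hash_fixed(123456, 14))  # 67
```

The low word r0 divided by 2^32 is the fractional part of k * c, so taking its top p bits is the same as computing floor(2^p * frac(k * c)) without any floating-point operations at hash time.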