Hash Functions

A hash function is a function that takes an input (or key) of arbitrary size and converts it into a fixed-size value, called a hash value or hash code.

For example, using the modulo method:
H(x) = x % 10

This function maps any non-negative integer, however large, to a value between 0 and 9, making it suitable for indexing a hash table with 10 slots. For example, H(1234) = 4. Hashing enables efficient storage and fast retrieval of data.

Examples of Hash Functions

Here are examples showing how input keys are mapped to hash values for efficient storage and retrieval.

1. Phone Numbers as Input Keys

Use the last two digits of the phone number as the hash value.

Hash Function
h(k) = k mod 100

The hash table size is 100, so valid indices range from 0 to 99.
Taking the last two digits ensures the output always falls within this range.

Note: Taking the first two digits is not a good choice because many phone numbers share the same starting digits, leading to poor distribution and more collisions.
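
The phone number example can be sketched in Python as follows. This is a minimal illustration that assumes phone numbers are stored as integers; the function names and the sample numbers are made up for this sketch.

```python
def last_two_digits_hash(phone: int) -> int:
    """Hash a phone number to an index in a table of size 100."""
    return phone % 100


def first_two_digits_hash(phone: int) -> int:
    """Poor alternative: use the two leading digits instead."""
    return int(str(phone)[:2])


# Hypothetical 10-digit numbers that share the same prefix.
phones = [9876543210, 9876501234, 9876512399, 9876598765]

last_two = [last_two_digits_hash(p) for p in phones]
first_two = [first_two_digits_hash(p) for p in phones]

print(last_two)   # [10, 34, 99, 65]
print(first_two)  # [98, 98, 98, 98]
```

With the last two digits the four numbers land in four different slots, while the leading digits send all of them to slot 98.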

2. Lowercase English Strings as Keys

Assign values to characters (a = 1, b = 2, ..., z = 26), sum them, and take modulo.

Hash Function:
h(s) = (sum of character values) mod 100

Limitation: Different strings can produce the same hash value:

"ad": 1 + 4 = 5
"bc": 2 + 3 = 5

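This leads to collisions. More generally, any two strings made of the same letters in a different order collide, because addition ignores the position of each character.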
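
A short Python sketch of this character-sum hash, assuming the input contains only lowercase letters a to z (the function name is chosen for illustration):

```python
def char_sum_hash(s: str, table_size: int = 100) -> int:
    """Sum letter values (a = 1, ..., z = 26) and reduce modulo the table size."""
    total = sum(ord(ch) - ord("a") + 1 for ch in s)
    return total % table_size


print(char_sum_hash("ad"))  # 5
print(char_sum_hash("bc"))  # 5
# Reordering the letters never changes the sum, so these collide too.
print(char_sum_hash("abc") == char_sum_hash("cba"))  # True
```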

Improved Approach (Weighted Sum):

Use positional weights for characters to reduce collisions.

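Better Hash Function (conceptual):
h(s) = (Σ value(s[i]) × weight[i]) mod 100
The formula is conceptual because it does not fix the weights. A common choice is to use successive powers of a small constant base such as 31, which gives a polynomial hash.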

Each character contributes differently based on its position, improving distribution and reducing collisions.
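
One possible way to realize the weighted sum is a polynomial hash. The sketch below assumes the weights are powers of a base of 31 and evaluates the sum with Horner's rule; the base and the function name are assumptions made for illustration, not something the formula above prescribes.

```python
def weighted_hash(s: str, table_size: int = 100, base: int = 31) -> int:
    """Polynomial hash: each character is weighted by a power of `base`."""
    h = 0
    for ch in s:
        value = ord(ch) - ord("a") + 1          # a = 1, ..., z = 26
        h = (h * base + value) % table_size     # Horner's rule, reduced each step
    return h


print(weighted_hash("ad"))  # 35
print(weighted_hash("bc"))  # 65
print(weighted_hash("abc") == weighted_hash("cba"))  # False
```

The two strings that collided under the plain sum now map to different indices. Collisions are still possible, since only 100 output values exist.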

Properties of Hash Functions

A good hash function should satisfy certain properties to ensure efficient and reliable data storage and retrieval. Hash functions used for security must satisfy additional, stronger properties.

Deterministic: A hash function must consistently produce the same output for the same input.
Fixed Output Size: The output of a hash function should have a fixed size, regardless of the size of the input.
Efficiency: The hash function should be fast to compute, ideally in time proportional to the size of the input.
Uniformity: The hash function should distribute the hash values uniformly across the output space to avoid clustering.
Pre-image Resistance (cryptographic hashes): It should be computationally infeasible to reverse the hash function, i.e., to find an input that produces a given hash value.
Collision Resistance (cryptographic hashes): It should be computationally infeasible to find two different inputs that produce the same hash value. Collisions always exist, because the input space is larger than the output space; they should only be hard to find.
Avalanche Effect: A small change in the input, such as flipping a single bit, should change about half of the output bits.
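
Determinism, fixed output size, and the avalanche effect can be observed with Python's standard hashlib module. The sketch below uses SHA-256 purely as a convenient example; the helper names are made up for illustration.

```python
import hashlib


def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of `data` as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the two digests differ."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


# Deterministic: hashing the same input twice gives the same value.
same = sha256_bits(b"hello") == sha256_bits(b"hello")

# Fixed output size: 32 bytes regardless of input length.
sizes = {len(hashlib.sha256(b"x" * n).digest()) for n in (0, 1, 1000)}

# Avalanche: the inputs differ in only the last character.
changed = differing_bits(b"hello", b"hellp")

print(same)     # True
print(sizes)    # {32}
print(changed)  # expected to be near half of the 256 output bits
```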

Applications of Hash Functions

Hash functions are widely used across various domains due to their efficiency and versatility:

Hash Tables: The most common use of hash functions in data structures and algorithms is the hash table, which stores and retrieves key-value pairs in constant time on average.
Data Integrity: Hash functions generate checksums and digests that reveal whether data was altered or corrupted in transit or in storage.
Cryptography: Cryptographic hash functions such as SHA-256 serve as building blocks for digital signatures, message authentication codes, and password storage schemes.
Data Structures: Hash functions are utilized in various data structures such as Bloom filters and hash sets.
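
To make the hash table use case concrete, here is a minimal sketch of a table for integer keys that resolves collisions by chaining. The class name, the default size of 11, and the use of key mod size as the hash function are assumptions chosen for illustration.

```python
class ChainedHashTable:
    """Tiny hash table that resolves collisions by chaining (lists of pairs)."""

    def __init__(self, size: int = 11):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % self.size

    def put(self, key: int, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key: int, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default


table = ChainedHashTable()
table.put(15, "a")
table.put(26, "b")  # 15 % 11 == 26 % 11 == 4, so both keys share bucket 4
table.put(15, "c")  # replaces the value stored for key 15
print(table.get(15), table.get(26), table.get(37))  # c b None
```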

Types of Hash Functions

Different hashing techniques are used depending on the application requirements. Below are the commonly used types:

1. Division Method

The division method computes the hash value as the remainder when the key is divided by m, the size of the hash table.

h(k) = k mod m
Where k is the key and m is the table size, typically chosen as a prime number that is not close to a power of 2.

Advantages:

Simple to implement.
Works well when m is a prime number.

Disadvantages:

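Poor distribution if m is chosen badly. For example, if m is a power of 10 or a power of 2, the hash value depends only on the low-order digits or bits of the key.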
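
The effect of the choice of m can be seen in a small Python sketch. The keys below are hypothetical and were picked to share their last two digits.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: remainder of the key divided by the table size m."""
    return key % m


keys = [100, 200, 300, 400, 500]

power_of_ten = [division_hash(k, 100) for k in keys]
prime = [division_hash(k, 97) for k in keys]

print(power_of_ten)  # [0, 0, 0, 0, 0]
print(prime)         # [3, 6, 9, 12, 15]
```

With m = 100 every key collides in slot 0, while the prime m = 97 spreads the same keys over five different slots.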

2. Multiplication Method

In this method, the key is multiplied by a constant A (where 0 < A < 1). The fractional part of the result is then multiplied by m to obtain the hash value.

h(k)=⌊m * ((k * A) mod 1)⌋
Where ⌊ ⌋ denotes the floor function.

Unlike the division method, this technique is less dependent on the value of m, making it more flexible in practice.

Advantages:

Provides better distribution in many cases.
Less sensitive to table size.

Disadvantages:

Slightly more costly to compute than a single modulo operation, and the quality of the result depends on the choice of A.
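
A minimal Python sketch of the multiplication method follows. It assumes A = (√5 − 1) / 2 ≈ 0.618, a value suggested by Knuth, and uses floating-point arithmetic for brevity; practical implementations usually work with fixed-width integer arithmetic instead.

```python
import math

# Assumed constant: the fractional part of the golden ratio.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(key: int, m: int) -> int:
    """h(k) = floor(m * ((k * A) mod 1)) for a non-negative integer key."""
    fractional = (key * A) % 1.0   # fractional part of k * A
    return math.floor(m * fractional)


print(multiplication_hash(123456, 16384))  # 67
```

Here m = 16384 is a power of 2, which would be a poor table size for the division method but works fine with this one.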

3. Mid-Square Method

This method involves squaring the key and extracting the middle portion of the result to generate the hash value.

Steps:
Square the key.
Extract the middle digits of the squared value.

Squaring the key helps in spreading out the digits, which often results in a more uniform distribution of hash values.

Advantages:

Produces a good distribution of hash values.

Disadvantages:

Squaring large keys is more expensive than a single modulo and can overflow fixed-width integers.
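
The mid-square steps can be sketched in Python as follows. This version works on decimal digits and takes the number of digits to extract as a parameter; both choices, and the function name, are assumptions for illustration.

```python
def mid_square_hash(key: int, digits: int = 2) -> int:
    """Square the key and return the middle `digits` decimal digits."""
    squared = str(key * key)
    # Pad on the left so a slice of the requested width always exists.
    squared = squared.zfill(digits)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


print(mid_square_hash(60, 2))    # 60   (60 * 60 = 3600, middle digits "60")
print(mid_square_hash(1234, 3))  # 227  (1234 * 1234 = 1522756, middle digits "227")
```

Extracting 2 digits gives indices from 0 to 99, and extracting 3 digits gives indices from 0 to 999.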

4. Folding Method

In the folding method, the key is divided into several smaller parts, and these parts are combined (usually by addition) to generate the hash value.

Steps:
Split the key into equal or nearly equal segments.
Optionally, reverse alternate segments to improve distribution.
Add all the segments to obtain an intermediate sum.
Apply the modulo operation (if required) to fit the hash table size.

Advantages:

Simple and easy to implement.
Works well for large keys.
Flexible in partitioning strategy.

Disadvantages:

Quality depends on how the key is divided.
May not distribute values uniformly if parts are similar.
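
A minimal Python sketch of the folding method, assuming the key is a non-negative integer that is split into groups of decimal digits. The parameter names are made up for illustration.

```python
def folding_hash(key: int, part_size: int = 2, table_size: int = 100,
                 reverse_alternate: bool = False) -> int:
    """Split the key's digits into parts, add the parts, reduce modulo table size."""
    digits = str(key)
    parts = [digits[i:i + part_size] for i in range(0, len(digits), part_size)]
    if reverse_alternate:
        # Reverse every second segment before adding.
        parts = [p[::-1] if i % 2 == 1 else p for i, p in enumerate(parts)]
    return sum(int(p) for p in parts) % table_size


# 123 + 456 + 789 = 1368, and 1368 mod 1000 = 368
print(folding_hash(123456789, part_size=3, table_size=1000))  # 368

# 123 + 654 + 789 = 1566, and 1566 mod 1000 = 566
print(folding_hash(123456789, part_size=3, table_size=1000,
                   reverse_alternate=True))  # 566
```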

5. Cryptographic Hash Functions

These hash functions are designed for security rather than speed. They are used in applications where data protection is critical.

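Examples include SHA-256 and SHA-3, which are widely used in security applications such as digital signatures, file integrity verification, and blockchain. Password storage should use a deliberately slow, salted construction built for that purpose, such as bcrypt, scrypt, or Argon2, rather than a single fast hash.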

They ensure strong properties like pre-image resistance, second pre-image resistance, and collision resistance, making them suitable for secure systems.

Advantages:

Highly secure and reliable.

Disadvantages:

Slower compared to non-cryptographic hash functions.
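
As an illustration, Python's standard hashlib module provides both SHA-256 and SHA-3. The sketch below shows a simple integrity check; the helper names are hypothetical, and the message is a stand-in for real data such as a downloaded file.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Check data against a known digest using a constant-time comparison."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)


message = b"abc"
digest = sha256_hex(message)

print(digest)
print(len(hashlib.sha3_256(message).digest()))  # 32 (bytes)
print(verify(message, digest))                  # True
print(verify(b"abd", digest))                   # False
```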

6. Universal Hashing

Universal hashing uses a set of hash functions and selects one randomly at runtime. The goal is to minimize the probability of collisions regardless of the input distribution.

h(k) = ((a * k + b) mod p) mod m
Where k is the key, m is the table size, p is a prime number larger than every possible key, a is chosen at random from 1 to p - 1, and b is chosen at random from 0 to p - 1.

By introducing randomness, this method prevents attackers or worst-case inputs from degrading performance.

Advantages:

Reduces collision probability.
More robust against adversarial inputs.

Disadvantages:

Requires extra computation per hash and storage for the randomly chosen parameters.
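
A minimal Python sketch of selecting one function from this family at runtime. It assumes non-negative integer keys smaller than p and uses the prime p = 2^31 - 1; the function name and the default value of p are choices made for illustration.

```python
import random


def make_universal_hash(m: int, p: int = 2_147_483_647, rng=None):
    """Pick h(k) = ((a*k + b) mod p) mod m at random from the family.

    p = 2**31 - 1 is prime; keys are assumed to lie in the range [0, p).
    """
    if rng is None:
        rng = random.Random()
    a = rng.randrange(1, p)  # a in [1, p - 1]
    b = rng.randrange(0, p)  # b in [0, p - 1]

    def h(key: int) -> int:
        return ((a * key + b) % p) % m

    return h


h = make_universal_hash(m=10)
indices = [h(k) for k in (10, 20, 30, 40, 50)]
print(indices)  # varies from run to run, always within 0..9
```

Because a and b are drawn when the table is created, an adversary who does not know them cannot prepare a set of keys that is guaranteed to collide.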

7. Perfect Hashing

Perfect hashing is a technique used to construct a hash function for a fixed set of keys, ensuring that each key maps to a unique index with no collisions.

Perfect hashing is commonly classified into two main types:

Minimal Perfect Hashing: The range of the hash function equals the number of keys, so n keys map onto exactly n indices and no slot is wasted.
Non-minimal Perfect Hashing: The range may be larger than the number of keys, so some slots remain empty.

Advantages:

Eliminates collisions completely.

Disadvantages:

Difficult to construct.
Not suitable for dynamic data.
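
For a small fixed key set, a non-minimal perfect hash function can be found by random search. The sketch below is only an illustration of the idea, not a production construction: it assumes distinct non-negative integer keys smaller than p, uses a table of n × n slots so that a random choice of parameters is collision-free with probability above one half, and retries until it succeeds. The function name and the sample keys are made up.

```python
import random


def build_perfect_hash(keys, p: int = 2_147_483_647, seed: int = 0):
    """Search for (a, b) so that ((a*k + b) % p) % m has no collisions on `keys`.

    Uses m = n * n slots, so the result is a non-minimal perfect hash.
    Returns the hash function and the table size m.
    """
    keys = list(set(keys))
    m = max(1, len(keys) ** 2)
    rng = random.Random(seed)
    while True:
        a = rng.randrange(1, p)
        b = rng.randrange(0, p)
        slots = {((a * k + b) % p) % m for k in keys}
        if len(slots) == len(keys):  # every key received its own slot
            break

    def h(key: int) -> int:
        return ((a * key + b) % p) % m

    return h, m


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
h, m = build_perfect_hash(keys)
print(m)                                    # 81
print(len({h(k) for k in keys}) == len(keys))  # True
```

Adding a new key later may break the collision-free guarantee, which is why the function has to be rebuilt when the key set changes.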
