Unit 7
Hashing-7
Prof. Yogendra Singh, Assistant Professor
Computer Science & Engineering
Hashing
Hash Table Organizations
1 Hash Function: The hash function is at the core of a hash table. It maps
keys to indices in the hash table's array. A good hash function aims to
distribute keys evenly across the array to minimize collisions.
2 Array Size: The size of the array used in the hash table affects its
performance. A larger array can reduce the likelihood of collisions but may
consume more memory. The array size is typically chosen based on the
expected number of elements and the desired trade-off between memory
usage and performance.
3 Collision Resolution: Collisions occur when two keys are mapped to the
same index by the hash function. There are several methods for resolving
collisions:
Separate Chaining: Each bucket in the hash table array contains a
linked list (or another data structure) to handle multiple elements hashed to
the same index.
Open Addressing: In this approach, when a collision occurs, the
algorithm probes the array for an alternative location to store the collided
element. This can involve linear probing, quadratic probing, or other
techniques.
4 Load Factor: The load factor of a hash table is the ratio of the number
of elements stored in the table to the total number of slots (or buckets)
in the array. It influences the likelihood of collisions and affects the
efficiency of operations. A common practice is to resize the hash table
when the load factor exceeds a certain threshold to maintain
performance.
6 Key-Value Pairs: Hash tables often store key-value pairs, where each key
is associated with a value. The hash function is applied to the keys to
determine their storage location in the array.
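The pieces listed above (hash function, array size, collision resolution, load factor, key-value pairs) can be seen working together in a small sketch. The version below assumes open addressing with linear probing and a fixed array size; the class name `LinearProbingTable` and its method names are illustrative choices, not something defined in these notes.

```python
class LinearProbingTable:
    """Illustrative fixed-size hash table using open addressing (linear probing)."""

    def __init__(self, size=8):
        self.size = size               # array size (number of slots)
        self.slots = [None] * size     # each slot holds a (key, value) pair or None
        self.count = 0                 # number of stored pairs

    def _index(self, key):
        # Hash function: map the key to an index in the array.
        return hash(key) % self.size

    def load_factor(self):
        # Ratio of stored elements to total slots.
        return self.count / self.size

    def put(self, key, value):
        i = self._index(key)
        for _ in range(self.size):
            if self.slots[i] is None:
                self.slots[i] = (key, value)
                self.count += 1
                return
            if self.slots[i][0] == key:
                self.slots[i] = (key, value)   # same key: overwrite the value
                return
            i = (i + 1) % self.size            # collision: probe the next slot
        raise RuntimeError("table is full")

    def get(self, key):
        i = self._index(key)
        for _ in range(self.size):
            if self.slots[i] is None:
                break                          # empty slot: key is not present
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.size
        raise KeyError(key)


table = LinearProbingTable(size=8)
table.put(3, "first")     # 3 % 8 == 3, stored in slot 3
table.put(11, "second")   # 11 % 8 == 3 as well, so it is probed into slot 4
```

Separate chaining would replace the probing loop with a list per slot; this sketch also omits deletion, which needs extra care under open addressing.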
Hash functions play a crucial role in hash tables and other hashing-based
data structures. They are responsible for converting keys (such as
strings or integers) into indices within the hash table's array. Here are some
common characteristics and types of hash functions.
Characteristics of a Good Hash Function
While easy to implement, this method may lead to clustering and poor
distribution, especially if the array size is not prime.
Static Hashing:
Suppose we want to store the keys 4, 10, 15, 17, 22, 28, 31, 59, and 88 in a
hash table with minimal collisions. We decide to use static hashing with a
hash function that maps each key to itself:
hash(key) = key
In this case, the hash value of a key is the key itself.
2 Hash Table Creation: Based on the range of keys, we create a hash table
with slots corresponding to each possible key value. In this example, the
keys range from 4 to 88, so we create a hash table with slots from 4 to 88.
3 Insertion: We insert each key into its corresponding slot in the hash
table based on the hash function. Because the identity hash function gives
every distinct key its own slot, no two keys can collide, so each key is
inserted directly into its assigned slot.
Slot 4: 4
Slot 10: 10
Slot 15: 15
Slot 17: 17
Slot 22: 22
Slot 28: 28
Slot 31: 31
Slot 59: 59
Slot 88: 88
As you can see, each key is stored directly in its assigned slot without any
collisions.
For example, if we want to retrieve the key 15, we calculate the hash value:
hash(15) = 15
We access slot 15 in the hash table, which contains the key 15.
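The static hashing example can be written out as a short sketch. It assumes the table is a plain Python list whose index 0 stands for slot 4 (the offset `LOW` is an implementation detail of this sketch), and the helper names `static_hash` and `lookup` are illustrative.

```python
keys = [4, 10, 15, 17, 22, 28, 31, 59, 88]


def static_hash(key):
    # Identity hash function: hash(key) = key
    return key


LOW, HIGH = min(keys), max(keys)      # the keys range from 4 to 88
table = [None] * (HIGH - LOW + 1)     # one slot per possible key value

# Insertion: each key goes directly into its own slot.
for k in keys:
    table[static_hash(k) - LOW] = k


def lookup(key):
    """Return the stored key, or None if the slot is empty or out of range."""
    if not LOW <= key <= HIGH:
        return None
    return table[static_hash(key) - LOW]
```

Here `lookup(15)` reads slot 15 directly, while `lookup(16)` finds an empty slot. The table has 85 slots for only 9 keys, which is the memory cost of avoiding collisions this way.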
Static hashing is efficient for datasets where the keys are known in advance
and do not change frequently. With one slot per possible key, as in the
example above, it provides constant-time access without the need for
collision resolution mechanisms. However, it can waste memory when the key
range is much larger than the number of keys, and it may not be suitable
for dynamic datasets where keys are inserted or deleted frequently, as
resizing the hash table can be challenging.
Dynamic Hashing:
In dynamic hashing, the number of buckets in the hash table is not fixed.
Instead, it adjusts dynamically based on the number of items being stored
and retrieved. This helps in maintaining a good balance between space
efficiency and lookup efficiency.
Let's say we're implementing a hash table to store the names of students
along with their corresponding grades in a class. We want to efficiently
retrieve the grade of any student given their name.
4 Collision Handling: When two or more items hash to the same bucket,
we typically use techniques like chaining (maintaining a linked list of
items in each bucket) or open addressing (probing for an empty bucket
nearby) to handle collisions.
5 Expansion: As more items are inserted into the hash table, if the load
factor (the ratio of the number of items to the number of buckets)
exceeds a certain threshold, we dynamically increase the number of
buckets. For example, if the load factor exceeds 0.75 in a table with 4
buckets, we can double the number of buckets to 8.
6 Rehashing: When expanding the number of buckets, all existing items
need to be rehashed and redistributed into the new bucket structure.
This ensures that the distribution remains balanced and efficient.
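The collision handling, expansion, and rehashing steps can be sketched for the student-grade example. The version below assumes chaining, 4 initial buckets, and a 0.75 load-factor threshold; the class name `DynamicHashTable`, the student names, and the grades are all made up for illustration.

```python
class DynamicHashTable:
    """Illustrative chained hash table that doubles when the load factor is exceeded."""

    def __init__(self, num_buckets=4, max_load=0.75):
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0
        self.max_load = max_load

    def _bucket(self, key):
        # Pick the bucket for this key under the current number of buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)      # existing key: update in place
                return
        bucket.append((key, value))           # chaining: append to the bucket's list
        self.count += 1
        # Expansion: double the buckets once the load factor exceeds the threshold.
        if self.count / len(self.buckets) > self.max_load:
            self._resize(2 * len(self.buckets))

    def _resize(self, new_size):
        # Rehashing: every stored pair is redistributed into the new buckets.
        old_items = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(new_size)]
        for k, v in old_items:
            self._bucket(k).append((k, v))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


grades = DynamicHashTable()
for name, grade in [("Asha", "A"), ("Ravi", "B"), ("Meera", "A"), ("John", "C")]:
    grades.put(name, grade)
```

With these settings the fourth insertion pushes the load factor to 1.0, so the table grows from 4 to 8 buckets and all four pairs are rehashed; lookups such as `grades.get("Ravi")` work the same before and after the resize.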
Bucket Splitting: Only the overflowing bucket is split, reducing the overhead
compared to resizing the entire table.
Linear Hashing
Splitting Rule: Buckets are split one at a time in a linear order, which allows
for gradual growth.
Hash Functions: Two hash functions, h and h′, are used. When a bucket
overflows, the next bucket in the sequence is split.
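A minimal sketch of linear hashing follows, under a common formulation that these notes do not spell out: h(key) = key mod (N · 2^level) and h′(key) = key mod (N · 2^(level+1)), where N is the initial number of buckets. The class name, the bucket capacity of 2, and the use of non-negative integer keys are assumptions made for the example.

```python
class LinearHashTable:
    """Illustrative linear hashing over non-negative integer keys."""

    def __init__(self, initial_buckets=2, capacity=2):
        self.n0 = initial_buckets     # N: number of buckets at level 0
        self.capacity = capacity      # keys a bucket holds before it overflows
        self.level = 0                # current round of splitting
        self.next_split = 0           # index of the next bucket to split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _h(self, key, level):
        # h is _h(key, self.level); h' is _h(key, self.level + 1)
        return key % (self.n0 * 2 ** level)

    def _address(self, key):
        b = self._h(key, self.level)
        if b < self.next_split:
            # This bucket was already split in the current round, so use h'.
            b = self._h(key, self.level + 1)
        return b

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)
        if len(bucket) > self.capacity:
            self._split()             # splits bucket next_split, not necessarily this one

    def _split(self):
        old = self.buckets[self.next_split]
        self.buckets[self.next_split] = []
        self.buckets.append([])       # new bucket at index next_split + N * 2^level
        for key in old:
            self.buckets[self._h(key, self.level + 1)].append(key)
        self.next_split += 1
        if self.next_split == self.n0 * 2 ** self.level:
            self.level += 1           # every bucket of this round has been split
            self.next_split = 0

    def contains(self, key):
        return key in self.buckets[self._address(key)]


table = LinearHashTable()
for k in [4, 10, 15, 17, 22, 28, 31, 59, 88]:
    table.insert(k)
```

Because the bucket that is split is chosen by position rather than by which bucket overflowed, an overflowing bucket may stay over capacity until its turn comes; a real implementation would typically keep the extra keys in an overflow chain, which this sketch models simply by letting the list grow.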
Conclusion