Hashing Techniques in DBMS Explained

Hashing in Database Management Systems (DBMS) is a technique for efficient data retrieval and storage by transforming keys into fixed-size hash codes used for indexing in hash tables. It includes concepts such as hash functions, hash tables, and collision handling methods like chaining and open addressing. Types of hashing include static, dynamic, open addressing, and bucket hashing, each with its advantages and disadvantages regarding efficiency and complexity.

Hashing in DBMS (Database Management Systems) is a technique used for efficient data
retrieval and storage. It involves transforming a key (often a piece of data) into a fixed-size
value, called a hash code. This hash code is used to index data in a hash table, which allows for
quick searching, insertion, and deletion operations. Hashing is commonly used for indexing,
especially when dealing with large datasets.
• Hash Function: A function that takes an input (key) and produces a fixed-size value,
typically an integer, known as the hash value or hash code.
•​ Hash Table: A data structure that stores data in an array format, where the position of
each data item is determined by the hash code generated by the hash function.
•​ Collision: When two different keys produce the same hash value, this is called a
collision. Handling collisions is an important part of the hashing process.
How Hashing Works:
•​ Key: A piece of data, like a student ID, a name, or any other attribute you want to search
for in the database.
•​ Hash Function: The key is passed through a hash function, which computes a hash code
for that key.
•​ Hash Table: The hash code is used as an index to insert the data into a hash table (or a
similar structure like a hash map). This allows quick access to the data.
• Search Operation: To search for a key, the system computes the hash code for the key
and goes directly to the corresponding position in the hash table.
Example:
Let's say we are creating a hash table to store student records, and each student has a unique
student ID.
• Step 1: Define a Hash Function. Suppose the hash function is a simple one:
$\text{hash}(\text{key}) = \text{key} \bmod \text{table size}$
where key is the student ID and table size is the size of the hash table.
• Step 2: Hash Table Creation. Suppose we have a hash table of size 10, and the student
IDs are as follows:
• Student IDs: 123, 456, 789, 234, 567
• Step 3: Compute Hash Values. For each student ID, we apply the hash function to
compute the hash value:
•​ Hash(123) = 123 % 10 = 3
•​ Hash(456) = 456 % 10 = 6
•​ Hash(789) = 789 % 10 = 9
•​ Hash(234) = 234 % 10 = 4
•​ Hash(567) = 567 % 10 = 7
• Step 4: Insert into the Hash Table. The data will be stored in the hash table at the
corresponding index:
•​ Index 0: (empty)
•​ Index 1: (empty)
•​ Index 2: (empty)
•​ Index 3: Student ID 123
•​ Index 4: Student ID 234
•​ Index 5: (empty)
•​ Index 6: Student ID 456
•​ Index 7: Student ID 567
•​ Index 8: (empty)
•​ Index 9: Student ID 789
• Step 5: Searching. To find a student with a given ID, say 456, we calculate the hash
value:
•​ Hash(456) = 456 % 10 = 6
•​ We then look at index 6 in the hash table and find the student record.
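The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the names hash_key, build_table, and search are made up for this example, and the code assumes that no two IDs collide, which holds for the five IDs used here.

```python
TABLE_SIZE = 10  # same table size as in the example above


def hash_key(key, table_size=TABLE_SIZE):
    """Modulo hash from Step 1: key mod table size."""
    return key % table_size


def build_table(student_ids, table_size=TABLE_SIZE):
    """Place each ID at its hashed index (assumes no collisions)."""
    table = [None] * table_size
    for sid in student_ids:
        idx = hash_key(sid, table_size)
        if table[idx] is not None:
            raise ValueError(f"collision at index {idx}")
        table[idx] = sid
    return table


def search(table, key):
    """Return True if the key is stored at its hashed index."""
    return table[hash_key(key, len(table))] == key


table = build_table([123, 456, 789, 234, 567])
print(table)               # [None, None, None, 123, 234, None, 456, 567, None, 789]
print(search(table, 456))  # True
```

The search touches exactly one slot, which is where the fast average-case lookup comes from.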
Handling Collisions:
If two student IDs were to hash to the same index (a collision), there are various methods to
handle this:
•​ Chaining: Store multiple values at the same index using a linked list.
•​ Open Addressing: Search for the next available slot in the hash table.
Example of Collision:
Suppose the following IDs hash to the same value:
•​ Hash(123) = 3
•​ Hash(113) = 3
With chaining, both records would be stored at index 3:
Index 3: (113 -> 123)
This allows both student IDs to be stored at the same location without overwriting each other.
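A minimal sketch of chaining, using the same table size of 10 as the student example. As an assumption for brevity, each chain is a plain Python list rather than a true linked list, and new keys are placed at the front of the chain so the result matches the order shown above. The function names are hypothetical.

```python
def make_chained_table(size):
    """Create a table where every index holds its own (initially empty) chain."""
    return [[] for _ in range(size)]


def chained_insert(table, key):
    """Add the key to the front of the chain at its hashed index."""
    chain = table[key % len(table)]
    if key not in chain:
        chain.insert(0, key)


def chained_search(table, key):
    """Scan only the chain at the key's hashed index."""
    return key in table[key % len(table)]


chained = make_chained_table(10)
chained_insert(chained, 123)
chained_insert(chained, 113)
print(chained[3])                    # [113, 123]
print(chained_search(chained, 123))  # True
```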
Advantages of Hashing:
•​ Efficient Search: Hashing provides fast data retrieval (O(1) time complexity on
average).
•​ Efficient Insertion and Deletion: Adding or removing data can also be done quickly.
Disadvantages:
•​ Collision Handling: Managing collisions can become complex and may degrade
performance.
• Memory Usage: Hash tables may require a significant amount of memory, especially if
the table is much larger than the number of records it actually holds.
1. Static Hashing
In static hashing, a fixed-size hash table is used to store the data, and the size of the table remains
unchanged. The hash function is used to map a key to a particular index within this fixed-size
table.
•​ Example: Consider a hash table of size 10, and the hash function is Hash(Key) = Key %
10. If we have keys (e.g., 21, 32, 43, 54), the data will be stored based on the modulus of
the key:
•​ Hash(21) = 21 % 10 = 1 → Stored at index 1
•​ Hash(32) = 32 % 10 = 2 → Stored at index 2
•​ Hash(43) = 43 % 10 = 3 → Stored at index 3
•​ Hash(54) = 54 % 10 = 4 → Stored at index 4
Since the table size is fixed, if more data needs to be inserted beyond the table’s capacity, it leads
to a problem called overflow.
2. Dynamic Hashing
Dynamic hashing addresses the issue of static hashing where the size of the hash table is fixed
and may lead to overflow. In dynamic hashing, the hash table grows or shrinks dynamically
based on the number of records. This helps in reducing collisions and provides flexibility in
dealing with the overflow situation.
Types of Dynamic Hashing:
•​ Extendible Hashing
•​ Linear Hashing
Extendible Hashing
Extendible hashing uses a directory of pointers to hash buckets and grows the directory size
dynamically as needed. It allows for splitting of buckets and doubling of the directory size to
accommodate additional records.
• Example: Let's assume the directory has a global depth of 1, which means it has only
2 entries, each pointing to one bucket (for hash values 0 and 1). When a bucket
overflows, it is split into two buckets, and the directory doubles in size if it has no
spare entry for the new bucket.
If a new record (say 5) is inserted into the table, it is hashed as Hash(5) = 5 % 2 = 1.
If bucket 1 is already full, it overflows: the directory doubles to 4 entries (global
depth 2) and the records of bucket 1 are redistributed between the two resulting buckets.
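The split-and-double behaviour can be sketched as follows. This is a simplified illustration, not a production design, and several details are assumptions that the text does not specify: buckets hold one record each, the directory is indexed by the low-order bits of the key (which agrees with key % 2 at global depth 1), each bucket tracks its own "local depth", and the record already sitting in bucket 1 is taken to be 3. The class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of key bits this bucket depends on
        self.keys = []


class ExtendibleHash:
    def __init__(self, bucket_capacity=1):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]  # entries for hash values 0 and 1

    def _index(self, key):
        # Use the low-order global_depth bits of the key as the directory index.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Overflow: double the directory only if this bucket has no spare entry.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)  # retry now that the bucket has been split

    def _split(self, bucket):
        new_bit = 1 << bucket.local_depth
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        # Directory entries whose newly examined bit is 1 move to the sibling.
        for i, entry in enumerate(self.directory):
            if entry is bucket and (i & new_bit):
                self.directory[i] = sibling
        # Redistribute the old records between the bucket and its sibling.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)


eh = ExtendibleHash(bucket_capacity=1)
eh.insert(3)  # 3 % 2 = 1, goes to bucket 1
eh.insert(5)  # 5 % 2 = 1, bucket 1 is full, so it is split
print(eh.global_depth, len(eh.directory))  # 2 4
```

After the split, entries 0 and 2 of the directory still share the untouched bucket 0, while 3 and 5 sit in separate buckets at entries 3 and 1.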
Linear Hashing
Linear hashing works by gradually increasing the hash table size, one bucket at a time. When
the load on the table exceeds a certain threshold, one new bucket is added and the records of
one existing bucket are rehashed between that bucket and the new one. Buckets are split in a
fixed linear order, so the bucket being split is not necessarily the one that is full.
• Example: If buckets hold 4 records each and the table passes its load threshold, the
system adds one new bucket and rehashes the records of the next bucket in line. This
spreads the cost of growing the table over many insertions instead of one full rehash.
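A compact sketch of linear hashing follows. The starting size of 2 buckets, the load threshold of 0.75, and all names are assumptions chosen for illustration; only the bucket size of 4 comes from the example. For simplicity, a bucket that exceeds its nominal size just keeps growing, standing in for an overflow chain.

```python
class LinearHash:
    def __init__(self, initial_buckets=2, bucket_size=4, max_load=0.75):
        self.n0 = initial_buckets   # number of buckets at level 0
        self.level = 0              # how many times the table has fully doubled
        self.split_ptr = 0          # next bucket to split, advances linearly
        self.bucket_size = bucket_size
        self.max_load = max_load
        self.count = 0
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        idx = key % (self.n0 * 2 ** self.level)
        if idx < self.split_ptr:
            # This bucket was already split in the current round,
            # so use the next-level hash function instead.
            idx = key % (self.n0 * 2 ** (self.level + 1))
        return idx

    def insert(self, key):
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / (len(self.buckets) * self.bucket_size) > self.max_load:
            self._split()

    def _split(self):
        self.buckets.append([])                   # add exactly one new bucket
        old_keys = self.buckets[self.split_ptr]
        self.buckets[self.split_ptr] = []
        self.split_ptr += 1
        if self.split_ptr == self.n0 * 2 ** self.level:
            self.level += 1                       # round finished, table has doubled
            self.split_ptr = 0
        for k in old_keys:                        # rehash only the split bucket
            self.buckets[self._address(k)].append(k)

    def contains(self, key):
        return key in self.buckets[self._address(key)]


lh = LinearHash()
for key in range(1, 8):  # the 7th insert pushes the load above 0.75
    lh.insert(key)
print(lh.buckets)        # [[4], [1, 3, 5, 7], [2, 6]]
```

Note that bucket 0 was split even though bucket 1 holds more records; the split order depends only on the split pointer.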
3. Open Addressing Hashing
In open addressing, all data is stored directly in the hash table itself. When a collision occurs
(i.e., two keys hash to the same index), the system tries to find another open slot within the table
based on a probe sequence. Open addressing works best when the table is not close to full,
because probe sequences grow longer as collisions become more frequent.
Types of Open Addressing:
•​ Linear Probing
•​ Quadratic Probing
•​ Double Hashing
Linear Probing
In linear probing, when a collision occurs, the system checks the following indices one at a
time (index + 1, index + 2, etc., wrapping around to the start of the table) until an empty
slot is found.
•​ Example: If we have a hash table of size 5 and a hash function Hash(Key) = Key % 5:
•​ Hash(12) = 12 % 5 = 2 → Insert at index 2.
•​ Hash(17) = 17 % 5 = 2, but index 2 is already occupied (by 12). So, the system
checks index 3.
• Index 3 is empty, so key 17 is inserted at index 3.
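The linear probing example can be sketched as a short function. The function name is hypothetical, and the wrap-around with the modulo operator is an assumption added so that probing past the last index continues from index 0.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing; return the index where it was stored."""
    size = len(table)
    home = key % size
    for step in range(size):
        idx = (home + step) % size  # wrap around at the end of the table
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    raise OverflowError("hash table is full")


lp_table = [None] * 5
print(linear_probe_insert(lp_table, 12))  # 2
print(linear_probe_insert(lp_table, 17))  # 3
print(lp_table)                           # [None, None, 12, 17, None]
```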
Quadratic Probing
Quadratic probing works similarly to linear probing, but instead of checking the next slot, it
checks slots at quadratically increasing offsets (e.g., index + 1^2, index + 2^2, index + 3^2,
etc.), with each result taken modulo the table size.
•​ Example: Using the same hash table as before with Hash(Key) = Key % 5:
•​ Hash(12) = 12 % 5 = 2 → Insert at index 2.
• Hash(17) = 17 % 5 = 2, but index 2 is occupied, so the system checks index 2
+ 1^2 = 3. If that is also occupied, it checks (2 + 2^2) % 5 = 1, wrapping around
because the table has only 5 slots.
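A sketch of quadratic probing on the same size-5 table. The function name and the third key, 22, are assumptions added to show a second probe step; indices are taken modulo the table size so that they stay inside the table.

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing; return the index where it was stored."""
    size = len(table)
    home = key % size
    for i in range(size):
        idx = (home + i * i) % size  # offsets 0, 1, 4, 9, ... wrapped into the table
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    # Quadratic probing does not visit every slot, so this can happen
    # even when the table still has empty slots.
    raise OverflowError("no free slot found on the probe sequence")


qp_table = [None] * 5
print(quadratic_probe_insert(qp_table, 12))  # 2
print(quadratic_probe_insert(qp_table, 17))  # 3  (2 + 1^2)
print(quadratic_probe_insert(qp_table, 22))  # 1  ((2 + 2^2) % 5)
```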

Double Hashing
Double hashing uses two hash functions. The first gives the starting index; if a collision
occurs, the second gives the step size that is added repeatedly to find the next index.
•​ Example: Let’s assume two hash functions:
•​ Hash1(Key) = Key % 5
•​ Hash2(Key) = 1 + (Key % 4)
If Hash1(17) = 2 and index 2 is occupied, double hashing calculates a step size:
• Hash2(17) = 1 + (17 % 4) = 1 + 1 = 2
• The system will then try index = (2 + 2) % 5 = 4.
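The double hashing example can be sketched as below, using the two hash functions given above. The function name is hypothetical, and the probe formula (Hash1 + i * Hash2) % table size is the usual way of combining them, assumed here since the text shows only the first step.

```python
def double_hash_insert(table, key):
    """Insert key using double hashing; return the index where it was stored."""
    size = len(table)
    home = key % size       # Hash1(Key) = Key % 5 for a table of size 5
    step = 1 + (key % 4)    # Hash2(Key) = 1 + (Key % 4), never zero
    for i in range(size):
        idx = (home + i * step) % size
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    raise OverflowError("hash table is full")


dh_table = [None] * 5
print(double_hash_insert(dh_table, 12))  # 2
print(double_hash_insert(dh_table, 17))  # 4  ((2 + 2) % 5)
print(dh_table)                          # [None, None, 12, None, 17]
```

Because the step is between 1 and 4 and the table size 5 is prime, the probe sequence visits every slot before repeating.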

4. Bucket Hashing
In bucket hashing, each index of the hash table refers to a bucket that can store multiple
records with the same hash value (i.e., when collisions occur). This is similar to chaining,
except that a bucket typically has a fixed capacity.
•​ Example: Suppose we have a hash table with the hash function Hash(Key) = Key % 5.
If keys 12 and 17 both hash to index 2:
•​ At index 2, we store both keys in a bucket.
The bucket allows us to store multiple items at the same index, so a collision does not
cause a problem as long as the bucket still has free space.
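A minimal sketch of bucket hashing with the same hash function, Key % 5. The bucket capacity of 2 and the class name are assumptions made for illustration; the text does not say how large a bucket is or what happens when one fills up, so this sketch simply reports failure in that case.

```python
class BucketHashTable:
    def __init__(self, num_buckets=5, bucket_capacity=2):
        self.buckets = [[] for _ in range(num_buckets)]
        self.bucket_capacity = bucket_capacity

    def insert(self, key):
        """Store the key in its bucket; return False if the bucket is full."""
        bucket = self.buckets[key % len(self.buckets)]
        if key in bucket:
            return True
        if len(bucket) >= self.bucket_capacity:
            return False
        bucket.append(key)
        return True

    def search(self, key):
        """Only the one bucket selected by the hash function is scanned."""
        return key in self.buckets[key % len(self.buckets)]


bt = BucketHashTable()
bt.insert(12)
bt.insert(17)
print(bt.buckets[2])  # [12, 17]
print(bt.insert(22))  # False, bucket 2 is already full
```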
Summary of Hashing Types:
•​ Static Hashing: A fixed-size hash table; prone to overflow issues.
•​ Dynamic Hashing: The hash table grows/shrinks dynamically; extendible hashing and
linear hashing are common types.
•​ Open Addressing: The hash table stores elements directly in the table; uses linear
probing, quadratic probing, or double hashing to handle collisions.
•​ Bucket Hashing: Stores multiple records in a bucket to handle collisions, reducing the
impact of a high number of collisions.
Advantages and Disadvantages:
•​ Advantages:
•​ Efficient data retrieval and insertion.
•​ Reduces the search space for finding data.
•​ Disadvantages:
•​ Collisions: Can still be problematic depending on the method used.
•​ Complexity: Some methods (like dynamic hashing or double hashing) can be
complex to implement.
