Hashing Techniques in DBMS Explained

Hashing in Database Management Systems (DBMS) is a technique for efficient data retrieval and storage by transforming keys into fixed-size hash codes used for indexing in hash tables. It includes concepts such as hash functions, hash tables, and collision handling methods like chaining and open addressing. Types of hashing include static, dynamic, open addressing, and bucket hashing, each with its advantages and disadvantages regarding efficiency and complexity.

Uploaded by

nvcwrbqonznpijorhc


Hashing in DBMS (Database Management Systems) is a technique used for efficient data retrieval and storage. It involves transforming a key (usually the search-key value of a record) into a fixed-size value called a hash code. This hash code is used to index data in a hash table, which allows for quick searching, insertion, and deletion operations. Hashing is commonly used for indexing, especially when dealing with large datasets.
• Hash Function: A function that takes an input (key) and produces a fixed-size value, typically an integer, known as the hash value or hash code.
• Hash Table: A data structure that stores data in an array of slots (or buckets), where the position of each data item is determined by the hash code generated by the hash function.
• Collision: When two different keys produce the same hash value, this is called a collision. Handling collisions is an important part of the hashing process.
How Hashing Works:
• Key: A piece of data, like a student ID, a name, or any other attribute you want to search
for in the database.
• Hash Function: The key is passed through a hash function, which computes a hash code
for that key.
• Hash Table: The hash code is used as an index to insert the data into a hash table (or a
similar structure like a hash map). This allows quick access to the data.
• Search Operation: To search for a key, the system computes the hash code for the key and then directly accesses the corresponding position in the hash table.
Example:
Let's say we are creating a hash table to store student records, and each student has a unique
student ID.
• Step 1: Define a Hash Function. Suppose the hash function is a simple one:
$\text{hash}(\text{key}) = \text{key} \bmod \text{table size}$
where key is the student ID and table size is the size of the hash table.
• Step 2: Create the Hash Table. Suppose we have a hash table of size 10, and the student IDs are as follows:
• Student IDs: 123, 456, 789, 234, 567
• Step 3: Compute Hash Values. For each student ID, we apply the hash function to compute the hash value:
• Hash(123) = 123 % 10 = 3
• Hash(456) = 456 % 10 = 6
• Hash(789) = 789 % 10 = 9
• Hash(234) = 234 % 10 = 4
• Hash(567) = 567 % 10 = 7
• Step 4: Insert into the Hash Table. The data will be stored in the hash table at the corresponding index:
• Index 0: (empty)
• Index 1: (empty)
• Index 2: (empty)
• Index 3: Student ID 123
• Index 4: Student ID 234
• Index 5: (empty)
• Index 6: Student ID 456
• Index 7: Student ID 567
• Index 8: (empty)
• Index 9: Student ID 789
• Step 5: Search. To find a student with a given ID, say 456, we calculate the hash value:
• Hash(456) = 456 % 10 = 6
• We then look at index 6 in the hash table and find the student record.
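The five steps above can be sketched in Python as follows. This is a minimal sketch that assumes integer keys. Each slot stores just the student ID in place of a full record. The names `TABLE_SIZE`, `hash_key`, `build_table`, and `search` are hypothetical. The example IDs never collide, so the sketch refuses to insert into an occupied slot rather than handling collisions.

```python
TABLE_SIZE = 10  # table size from the example


def hash_key(key: int) -> int:
    """hash(key) = key mod table size."""
    return key % TABLE_SIZE


def build_table(student_ids):
    """Place each ID in the slot given by its hash value (assumes no collisions)."""
    table = [None] * TABLE_SIZE
    for sid in student_ids:
        index = hash_key(sid)
        if table[index] is not None:
            raise ValueError(f"collision at index {index}: {table[index]} vs {sid}")
        table[index] = sid
    return table


def search(table, sid):
    """Return the index holding sid, or None if it is not stored."""
    index = hash_key(sid)
    return index if table[index] == sid else None


table = build_table([123, 456, 789, 234, 567])
print(table)               # [None, None, None, 123, 234, None, 456, 567, None, 789]
print(search(table, 456))  # 6
```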
Handling Collisions:
If two student IDs were to hash to the same index (a collision), there are various methods to
handle this:
• Chaining: Store multiple values at the same index using a linked list.
• Open Addressing: Search for the next available slot in the hash table.
Example of Collision:
Suppose the following IDs hash to the same value:
• Hash(123) = 3
• Hash(113) = 3
With chaining, both records would be stored at index 3:
Index 3: (113 -> 123)
This allows both student IDs to be stored at the same location without overwriting each other.
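Here is a minimal sketch of chaining, assuming integer keys and the same `key % 10` hash. A Python list stands in for the linked list at each slot. New keys go at the front of the chain, so the result matches the `113 -> 123` order above. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table whose slots each hold a chain of keys."""

    def __init__(self, size: int = 10):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one chain per index

    def _index(self, key: int) -> int:
        return key % self.size                  # hash(key) = key mod table size

    def insert(self, key: int) -> None:
        chain = self.slots[self._index(key)]
        if key not in chain:                    # ignore duplicate keys
            chain.insert(0, key)                # add at the head of the chain

    def search(self, key: int) -> bool:
        return key in self.slots[self._index(key)]

    def delete(self, key: int) -> None:
        chain = self.slots[self._index(key)]
        if key in chain:
            chain.remove(key)


table = ChainedHashTable()
table.insert(123)
table.insert(113)
print(table.slots[3])  # [113, 123]
```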
Advantages of Hashing:
• Efficient Search: Hashing provides fast data retrieval (O(1) time complexity on average, given a hash function that spreads keys evenly).
• Efficient Insertion and Deletion: Adding or removing data also takes O(1) time on average.
Disadvantages:
• Collision Handling: Managing collisions can become complex and may degrade
performance.
• Memory Usage: Hash tables may require a significant amount of memory, especially if the table is allocated much larger than the number of records it holds, leaving many slots empty.
1. Static Hashing
In static hashing, a fixed-size hash table is used to store the data, and the size of the table remains
unchanged. The hash function is used to map a key to a particular index within this fixed-size
table.
• Example: Consider a hash table of size 10 with the hash function Hash(Key) = Key % 10. For the keys 21, 32, 43, and 54, each record is stored at the index given by the remainder of its key divided by 10:
• Hash(21) = 21 % 10 = 1 → Stored at index 1
• Hash(32) = 32 % 10 = 2 → Stored at index 2
• Hash(43) = 43 % 10 = 3 → Stored at index 3
• Hash(54) = 54 % 10 = 4 → Stored at index 4
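Since the number of buckets is fixed, inserting more records than the table can hold, or more records hashing to one index than its bucket can hold, leads to a problem called overflow. Overflow records are typically placed in chained overflow buckets, which slows down lookups as they accumulate.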
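The following sketch illustrates static hashing under two simplifying assumptions. First, the number of buckets is fixed at 10. Second, each bucket holds one record, as in the example, so even a single collision is an overflow. A real DBMS would chain overflow buckets, but this sketch simply raises `OverflowError` when a key's bucket is already full. `StaticHashTable`, `NUM_BUCKETS`, and `BUCKET_CAPACITY` are hypothetical names.

```python
NUM_BUCKETS = 10     # fixed when the table is created and never changed
BUCKET_CAPACITY = 1  # assumption: one record per bucket, as in the example


class StaticHashTable:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key: int) -> int:
        """Store key in bucket Key % 10 and return that index."""
        index = key % NUM_BUCKETS
        if len(self.buckets[index]) >= BUCKET_CAPACITY:
            # The table cannot grow, so a full bucket means overflow.
            raise OverflowError(f"bucket {index} is full; cannot insert {key}")
        self.buckets[index].append(key)
        return index


table = StaticHashTable()
positions = [table.insert(key) for key in (21, 32, 43, 54)]
print(positions)  # [1, 2, 3, 4]

try:
    table.insert(61)  # 61 % 10 = 1, but bucket 1 already holds 21
    overflowed = False
except OverflowError as err:
    overflowed = True
    print(err)        # bucket 1 is full; cannot insert 61
```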
2. Dynamic Hashing
Dynamic hashing addresses the issue of static hashing where the size of the hash table is fixed
and may lead to overflow. In dynamic hashing, the hash table grows or shrinks dynamically
based on the number of records. This helps in reducing collisions and provides flexibility in
dealing with the overflow situation.
Types of Dynamic Hashing:
• Extendible Hashing
• Linear Hashing
Extendible Hashing
Extendible hashing uses a directory of pointers to hash buckets and grows the directory size
dynamically as needed. It allows for splitting of buckets and doubling of the directory size to
accommodate additional records.
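• Example: Let's assume the hash table has a global depth of 1. This means the directory has 2 entries pointing to 2 buckets, one for hash values ending in bit 0 and one for hash values ending in bit 1. Let's also assume each bucket holds a single record. When a bucket overflows and its local depth equals the global depth, we double the directory size and split that bucket into two. If its local depth is smaller, only the bucket is split.
If a new record (say 5) is inserted into the table, it's hashed as Hash(5) = 5 % 2 = 1, but bucket 1 already has a record and overflows. The directory doubles from 2 to 4 entries (global depth 2), and bucket 1 is split into buckets for the low-order bits 01 and 11. Its records are redistributed using Key % 4, so 5 (5 % 4 = 1) goes to bucket 01. If both records still share the same low-order bits, the split repeats.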
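Below is a minimal sketch of extendible hashing under a few assumptions:

- Keys are non-negative integers and are used directly as their hash values.
- The directory is indexed by the low-order bits (`key % 2**global_depth`), matching the `5 % 2` example.
- Bucket capacity defaults to one record.

`Bucket`, `ExtendibleHashTable`, and `_split` are hypothetical names. Deletion and directory shrinking are omitted.

```python
class Bucket:
    """A bucket holding up to `capacity` keys, with its own local depth."""

    def __init__(self, local_depth: int, capacity: int):
        self.local_depth = local_depth
        self.capacity = capacity
        self.keys = []


class ExtendibleHashTable:
    """Extendible hashing on the low-order bits of a non-negative integer key."""

    def __init__(self, bucket_capacity: int = 1):
        self.global_depth = 1
        self.capacity = bucket_capacity
        # Directory entry i serves keys with key % 2**global_depth == i.
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _bucket_for(self, key: int) -> Bucket:
        return self.directory[key % (2 ** self.global_depth)]

    def search(self, key: int) -> bool:
        return key in self._bucket_for(key).keys

    def insert(self, key: int) -> None:
        bucket = self._bucket_for(key)
        if key in bucket.keys:
            return                    # ignore duplicate keys
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        self._split(bucket)
        self.insert(key)              # retry; splits again if keys still collide

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            # Double the directory: entry i + 2**old_depth points where entry i did.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, self.capacity)
        new_bit = 2 ** (bucket.local_depth - 1)  # bit that now separates the two halves
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = new_bucket
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:            # redistribute the overflowing bucket's keys
            (new_bucket if k & new_bit else bucket).keys.append(k)


table = ExtendibleHashTable(bucket_capacity=1)
for key in (4, 3, 5):  # 4 -> bucket 0, 3 -> bucket 1, then 5 overflows bucket 1
    table.insert(key)
print(table.global_depth)       # 2
print(table.directory[1].keys)  # [5]  (low-order bits 01)
print(table.directory[3].keys)  # [3]  (low-order bits 11)
```

Entries 0 and 2 of the doubled directory still share one bucket (holding 4). That bucket's local depth is 1, so it will not need another directory doubling until it overflows.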
Linear Hashing
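Linear hashing grows the hash table gradually, one bucket at a time, and needs no directory. When the table's load passes a certain threshold, the bucket at the current split pointer is split. One new bucket is added at the end of the table, and only the records of the split bucket are rehashed with a hash function that covers twice as many buckets (for example, Key % 2N instead of Key % N). The split pointer then moves on to the next bucket, so buckets are split in a fixed linear order.
• Example: Suppose each bucket holds 4 records and the table passes its load threshold. The system adds one new bucket and rehashes the records of the bucket at the split pointer between that bucket and the new one. Buckets are split in order rather than when they individually overflow. As a result, a bucket that has not been split yet may temporarily need an overflow chain, and the records become evenly distributed again over a full round of splits.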
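Here is a minimal sketch of linear hashing in Python, under these assumptions:

- Keys are integers.
- The table starts with 4 buckets, each holding 4 records.
- A split happens whenever the load factor exceeds 0.75. This threshold is an assumed value; implementations choose their own trigger.

A bucket list that grows past its capacity stands in for an overflow chain. The class and parameter names are hypothetical.

```python
class LinearHashTable:
    """Linear hashing: buckets are split one at a time in round-robin order."""

    def __init__(self, initial_buckets: int = 4, bucket_capacity: int = 4,
                 max_load: float = 0.75):
        self.n0 = initial_buckets        # number of buckets at level 0
        self.capacity = bucket_capacity  # records per primary bucket
        self.max_load = max_load         # split threshold (assumed value)
        self.level = 0
        self.split_ptr = 0               # next bucket to split
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _h(self, key: int, level: int) -> int:
        return key % (self.n0 * 2 ** level)

    def _address(self, key: int) -> int:
        i = self._h(key, self.level)
        if i < self.split_ptr:           # bucket already split in this round
            i = self._h(key, self.level + 1)
        return i

    def search(self, key: int) -> bool:
        return key in self.buckets[self._address(key)]

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._address(key)]
        if key in bucket:
            return
        bucket.append(key)               # keys beyond capacity model an overflow chain
        self.count += 1
        if self.count / (len(self.buckets) * self.capacity) > self.max_load:
            self._split()

    def _split(self) -> None:
        old = self.buckets[self.split_ptr]
        self.buckets.append([])          # the new bucket goes at the end
        self.buckets[self.split_ptr] = []
        for k in old:                    # only the split bucket is rehashed
            self.buckets[self._h(k, self.level + 1)].append(k)
        self.split_ptr += 1
        if self.split_ptr == self.n0 * 2 ** self.level:
            self.level += 1              # a full round of splits is complete
            self.split_ptr = 0


table = LinearHashTable(initial_buckets=4, bucket_capacity=4)
for key in range(20):
    table.insert(key)
print(len(table.buckets))  # 7 (buckets 0, 1 and 2 have been split)
print(table.buckets[3])    # [3, 7, 11, 15, 19] (not split yet, so it overflows)
```

After 20 inserts, bucket 3 is still waiting its turn and holds one key more than its capacity. This is the trade-off of splitting in linear order instead of splitting whichever bucket overflowed.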
3. Open Addressing Hashing
In open addressing, all data is stored directly in the hash table itself. When a collision occurs
(i.e., two keys hash to the same index), the system tries to find another open slot within the table
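based on a probe sequence. Open addressing works best when the load factor (the fraction of occupied slots) is kept low. As the table fills up, probe sequences grow longer and performance degrades.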
Types of Open Addressing:
• Linear Probing
• Quadratic Probing
• Double Hashing
Linear Probing
In linear probing, when a collision occurs, the system checks the following slots one at a time (index + 1, index + 2, etc., wrapping around to the start of the table) until an empty slot is found.
• Example: If we have a hash table of size 5 and a hash function Hash(Key) = Key % 5:
• Hash(12) = 12 % 5 = 2 → Insert at index 2.
• Hash(17) = 17 % 5 = 2, but index 2 is already occupied (by 12), so the system checks index 3.
• Index 3 is empty, so 17 is inserted at index 3.
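The linear probing example can be sketched as below. The sketch assumes integer keys and uses `None` for an empty slot. The function names are hypothetical. Search stops at the first empty slot. For this reason, a real implementation marks deleted slots with a placeholder (a "tombstone") instead of emptying them. Deletion is not shown here.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key with linear probing; return the slot index used."""
    size = len(table)
    home = key % size                 # Hash(Key) = Key % table size
    for i in range(size):
        index = (home + i) % size     # home, home + 1, home + 2, ... (wrapping)
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("hash table is full")


def linear_probe_search(table: list, key: int):
    """Return the slot index holding key, or None if it is absent."""
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i) % size
        if table[index] is None:      # an empty slot ends the probe sequence
            return None
        if table[index] == key:
            return index
    return None


table = [None] * 5
print(linear_probe_insert(table, 12))  # 2
print(linear_probe_insert(table, 17))  # 3 (index 2 is taken by 12)
print(table)                           # [None, None, 12, 17, None]
```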
Quadratic Probing
Quadratic probing works similarly to linear probing, but instead of checking the next slot, it checks slots at quadratically increasing offsets from the original index (index + 1^2, index + 2^2, index + 3^2, etc.), each taken modulo the table size.
• Example: Using the same hash table as before with Hash(Key) = Key % 5:
• Hash(12) = 12 % 5 = 2 → Insert at index 2.
• Hash(17) = 17 % 5 = 2, but index 2 is occupied, so the system checks index 2 + 1^2 = 3 (if that is also occupied, it checks (2 + 2^2) % 5 = 1).
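Here is a minimal sketch of quadratic probing, assuming integer keys and `None` for empty slots. The function name is hypothetical. With a table of size 5, the offsets i^2 mod 5 only take the values 0, 1, and 4. As a result, the last call fails even though two slots are empty. This limitation is why quadratic probing is usually paired with a prime table size and a table kept at most about half full.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key with quadratic probing; return the slot index used."""
    size = len(table)
    home = key % size                  # Hash(Key) = Key % table size
    for i in range(size):
        index = (home + i * i) % size  # home + 0^2, 1^2, 2^2, 3^2, ...
        if table[index] is None:
            table[index] = key
            return index
    # The probe sequence can revisit slots, so it may miss free ones.
    raise OverflowError(f"no free slot found for {key}")


table = [None] * 5
for key in (12, 17, 22):
    print(key, "->", quadratic_probe_insert(table, key))
# 12 -> 2
# 17 -> 3   (2 + 1^2)
# 22 -> 1   ((2 + 2^2) % 5)

try:
    quadratic_probe_insert(table, 27)  # probes 2, 3, 1, 1, 3: all occupied
    missed_free_slot = False
except OverflowError:
    missed_free_slot = True            # even though slots 0 and 4 are empty
```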

Double Hashing
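Double hashing uses two hash functions: the first gives the starting index, and if a collision occurs, the second gives the step size. The system tries (Hash1(Key) + i × Hash2(Key)) mod table size for i = 1, 2, 3, ... until it finds a free slot, so Hash2 must never return 0.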
• Example: Let’s assume two hash functions:
• Hash1(Key) = Key % 5
• Hash2(Key) = 1 + (Key % 4)
If Hash1(17) = 17 % 5 = 2 and index 2 is occupied, double hashing calculates a new index:
• Hash2(17) = 1 + (17 % 4) = 1 + 1 = 2, so the system tries index (2 + 2) % 5 = 4.
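The double hashing example can be sketched as follows. It uses the same two functions as the text, written `hash1` and `hash2`, and a hypothetical `double_hash_insert` function. The table size 5 is prime and the step is always between 1 and 4, so the probe sequence visits every slot before giving up.

```python
TABLE_SIZE = 5


def hash1(key: int) -> int:
    return key % TABLE_SIZE  # Hash1(Key) = Key % 5: the starting index


def hash2(key: int) -> int:
    return 1 + (key % 4)     # Hash2(Key) = 1 + (Key % 4): the step, never 0


def double_hash_insert(table: list, key: int) -> int:
    """Insert key with double hashing; return the slot index used."""
    start, step = hash1(key), hash2(key)
    for i in range(TABLE_SIZE):
        index = (start + i * step) % TABLE_SIZE
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError(f"no free slot found for {key}")


table = [None] * TABLE_SIZE
print(double_hash_insert(table, 12))  # 2
print(double_hash_insert(table, 17))  # 4, since index 2 is taken and the step is 2
```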

4. Bucket Hashing
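In bucket hashing, a bucket is used to store multiple records that have the same hash value (i.e., when collisions occur). This is closely related to chaining. In a DBMS, a bucket usually corresponds to a disk block that can hold several records, and overflow buckets are chained to it once it is full.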
• Example: Suppose we have a hash table with the hash function Hash(Key) = Key % 5.
If keys 12 and 17 both hash to index 2:
• At index 2, we store both keys in a bucket.
The bucket allows us to store multiple items at the same index. Collisions are absorbed within the bucket instead of displacing records to other slots, which significantly reduces their impact.
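Here is a minimal sketch of bucket hashing, assuming integer keys, the `Key % 5` hash from the example, and a hypothetical capacity of two records per bucket. When a bucket is full, a new overflow bucket is chained to it. The class names are hypothetical.

```python
BUCKET_CAPACITY = 2  # assumption: records per bucket (e.g., per disk block)


class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None  # next overflow bucket in the chain, if any


class BucketHashTable:
    def __init__(self, num_buckets: int = 5):
        self.num_buckets = num_buckets
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def insert(self, key: int) -> None:
        bucket = self.buckets[key % self.num_buckets]  # Hash(Key) = Key % 5
        while len(bucket.records) >= BUCKET_CAPACITY:
            if bucket.overflow is None:
                bucket.overflow = Bucket()             # chain a new overflow bucket
            bucket = bucket.overflow
        bucket.records.append(key)

    def search(self, key: int) -> bool:
        bucket = self.buckets[key % self.num_buckets]
        while bucket is not None:                      # scan the bucket and its chain
            if key in bucket.records:
                return True
            bucket = bucket.overflow
        return False


table = BucketHashTable()
for key in (12, 17, 22):
    table.insert(key)
print(table.buckets[2].records)           # [12, 17]
print(table.buckets[2].overflow.records)  # [22]
```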
Summary of Hashing Types:
• Static Hashing: A fixed-size hash table; prone to overflow issues.
• Dynamic Hashing: The hash table grows/shrinks dynamically; extendible hashing and
linear hashing are common types.
• Open Addressing: The hash table stores elements directly in the table; uses linear
probing, quadratic probing, or double hashing to handle collisions.
• Bucket Hashing: Stores multiple records in a bucket to handle collisions, reducing the
impact of a high number of collisions.
Advantages and Disadvantages:
• Advantages:
• Efficient data retrieval and insertion.
• Reduces the search space for finding data.
• Disadvantages:
• Collisions: Can still be problematic depending on the method used.
• Complexity: Some methods (like dynamic hashing or double hashing) can be
complex to implement.

Common questions

What impact do collisions have on the efficiency of hash tables, and how might this impact be mitigated through specific strategies?
Collisions can significantly reduce the efficiency of hash tables by increasing the time required for search operations, potentially degrading performance from O(1) to O(n) in the worst case. This impact can be mitigated in two main ways. Chaining stores colliding entries in a linked list at the same index. Open addressing techniques (linear probing, quadratic probing, and double hashing) resolve collisions by finding alternative indices for storage.

Evaluate why hashing is generally preferred for indexing in databases compared to other methods like B-trees.
Hashing is preferred for equality lookups because it offers O(1) average time complexity for search, insertion, and deletion, whereas a B-tree lookup takes O(log n) time. However, B-trees are better suited to range queries and ordered data, which is why B-tree variants remain the default index structure in many database systems. Hashing excels when exact key matches dominate the workload, making it well suited to large datasets where fast point access matters more than ordering.

What is the primary advantage of using a hash function in a database management system?
The primary advantage of using a hash function in a database management system is efficient data retrieval. Hashing allows for fast searching, insertion, and deletion operations by using a hash code to directly index data in a hash table, typically providing an average time complexity of O(1).

Explain the concept of collision in hashing and mention two methods to handle it.
A collision in hashing occurs when two different keys produce the same hash value, leading them to the same index in a hash table. Methods to handle collisions include chaining and open addressing. Chaining stores multiple values at the same index using a linked list, while open addressing searches for the next available slot within the hash table.

Illustrate how bucket hashing can effectively reduce collision impact and explain its potential drawbacks.
Bucket hashing reduces collision impact by storing multiple records that hash to the same index in a single bucket, which can hold several items and behaves like a small local array or linked list. This absorbs collisions by allowing multiple entries per index. The drawbacks are extra memory for the bucket structures (including overflow buckets) and the cost of scanning large buckets, which increases search time within a bucket.

Discuss the advantages and disadvantages of open addressing as a collision resolution technique.
Open addressing stores all data directly within the hash table itself, which avoids pointer overhead and can improve cache performance. However, it needs an effective probe sequence to resolve collisions, which complicates implementation. It can also suffer from clustering, where runs of occupied slots slow down insertion and retrieval operations.

How does quadratic probing differ from linear probing in open addressing, and what problem does it aim to solve?
Quadratic probing checks slots using a quadratic function (index + 1^2, index + 2^2, etc.) instead of simply checking the next sequential slot as in linear probing. This approach aims to reduce primary clustering, where long runs of filled slots form under linear probing and increase the time required to find an open slot. Keys with the same starting index still follow the same probe sequence, however, so quadratic probing can still suffer from secondary clustering.

How does dynamic hashing address the limitations of static hashing in terms of table size and overflow?
Dynamic hashing addresses the limitations of static hashing by allowing the hash table to grow or shrink dynamically based on the number of records. This adaptability reduces the overflow issues of a fixed-size hash table. Techniques like extendible hashing and linear hashing manage table resizing, which helps reduce collisions and handle overflow effectively.

What are the key differences between linear hashing and extendible hashing, and how do these differences affect their use cases?
Linear hashing grows the table one bucket at a time, splitting buckets in a fixed round-robin order and rehashing only the records of the split bucket; it needs no directory. Extendible hashing keeps a directory that doubles in size when a full bucket's local depth equals the global depth, and it splits only the overflowing bucket. These differences affect their use cases. Linear hashing may be preferable for environments with predictable, steady growth. Extendible hashing suits sudden increases in data volume that require quick directory adjustments.

In which scenarios would extendible hashing be more beneficial than static hashing, and why?
Extendible hashing is more beneficial than static hashing when the dataset size is unpredictable or when frequent insertions and deletions are expected. It dynamically adjusts the directory and bucket sizes, accommodating growth and reducing the overflow and collision issues that static hashing faces due to its fixed table size. This flexibility supports efficient data management and storage, especially in applications with dynamic data volumes.
