Hashing & Indexing Structures: Single Level & Multi Level Indices
1. Hashing
Definition and Purpose:
• Hashing is a technique that maps data (e.g., records or keys) to a location in
memory using a hash function. The mapping is not guaranteed to be unique: two
different keys can map to the same location, which is called a collision. Hashing is
used in situations where quick lookups and retrieval of data are needed.
Hash Functions:
• A hash function is used to convert the key into a hash value, which determines the
position (or address) where the data is stored in the hash table.
• Example: For a key k, a hash function h(k) returns a hash value that is an index in
the hash table.
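As an illustration of h(k), here is a minimal sketch. The function body (sum of character codes modulo the table size) and the table size of 7 are assumptions chosen for readability, not a recommended hash function.

```python
def h(key: str, table_size: int) -> int:
    """Toy hash function (assumed for illustration):
    sum of the character codes, modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size


TABLE_SIZE = 7                      # assumed, small on purpose
table = [None] * TABLE_SIZE         # the hash table: one slot per index

index = h("cat", TABLE_SIZE)        # 99 + 97 + 116 = 312, and 312 % 7 = 4
table[index] = ("cat", "record for cat")
```

Because this toy function ignores character order, "cat" and "act" produce the same index, which is exactly the situation collision handling has to deal with.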
Collision Handling:
• Collision occurs when two keys hash to the same address or index. There are two
primary methods for handling collisions:
1. Chaining: Involves storing multiple records that hash to the same index in a
linked list or another structure.
• Example: If two records hash to the same position, they are stored as a
linked list at that index.
2. Open Addressing: Involves finding an empty slot within the table using a
probing technique (linear probing, quadratic probing, etc.) when a collision
occurs.
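The two methods can be sketched side by side. This is a minimal illustration, not a production design: the class names, the toy hash function, and the table size are assumptions; a Python list stands in for the linked list used in chaining; and deletion is left out (open addressing would need tombstone markers for it).

```python
def h(key: str, size: int) -> int:
    # Toy hash function (assumed): character codes summed, modulo table size.
    return sum(ord(ch) for ch in key) % size


class ChainedTable:
    """Chaining: every index holds a list of all records that hash there."""

    def __init__(self, size: int = 7):
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[h(key, len(self.buckets))]
        for i, (k, _) in enumerate(bucket):
            if k == key:                # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))     # collision or empty: extend the chain

    def get(self, key):
        for k, v in self.buckets[h(key, len(self.buckets))]:
            if k == key:
                return v
        raise KeyError(key)


class LinearProbingTable:
    """Open addressing with linear probing: on a collision, try the next slot."""

    def __init__(self, size: int = 7):
        self.slots = [None] * size

    def put(self, key, value):
        size = len(self.slots)
        start = h(key, size)
        for step in range(size):
            i = (start + step) % size   # wrap around at the end of the table
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        size = len(self.slots)
        start = h(key, size)
        for step in range(size):
            i = (start + step) % size
            if self.slots[i] is None:   # empty slot: the key was never stored
                break
            if self.slots[i][0] == key:
                return self.slots[i][1]
        raise KeyError(key)


chained = ChainedTable()
chained.put("cat", 1)
chained.put("act", 2)                   # same index as "cat": joins the chain

probing = LinearProbingTable()
probing.put("cat", 1)
probing.put("act", 2)                   # same index as "cat": moves to next slot
```

With this toy hash function both keys map to index 4, so the chained table keeps them together in one bucket while the probing table places "act" in slot 5.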
Applications of Hashing:
• Primary Access Methods: Locating an individual record directly from its key, without
scanning the file.
• Distributed Databases: Used in partitioning data across multiple servers or nodes.
• In-memory Databases: Hashing is widely used in in-memory databases for fast
lookups.
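For the distributed case, hash partitioning can be sketched as follows. The function name node_for, the key format, and the node count of 3 are hypothetical. SHA-256 from the standard library is used only because it gives the same result in every process, whereas Python's built-in hash() for strings is randomized per process by default.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node for a key by hashing the key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce to a node number.
    return int.from_bytes(digest[:8], "big") % num_nodes


NUM_NODES = 3                                   # assumed cluster size
nodes = {n: [] for n in range(NUM_NODES)}

for key in ["user:1", "user:2", "user:3", "user:4"]:
    nodes[node_for(key, NUM_NODES)].append(key)  # each key lands on one node
```

A lookup repeats the same computation, so a key is always routed to the node that stored it. Note that with plain modulo partitioning, changing the number of nodes moves most keys to a different node.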
2. Indexing
Definition and Purpose:
• An index is a data structure that speeds up the retrieval of records from a
database by providing quick access to data based on some attribute (e.g., a key or
column).
• Ordered indexes support efficient searching, sorting, and range queries by storing a
mapping of keys to their corresponding records in the database.
Types of Indexes:
1. Primary Index:
• A primary index is built on the primary key of a table. It helps in uniquely
identifying each record.
• It is usually clustered with the data, meaning the data is physically ordered in
the same sequence as the index.
2. Secondary Index:
• A secondary index is built on non-primary attributes. It speeds up the search for
records based on fields that are not part of the primary key.
• Secondary indexes are typically non-clustered, meaning they store pointers to
data rather than the actual data.
3. Clustered Index:
• In clustered indexing, the data in the table is physically organized in the same
order as the index.
• Only one clustered index can exist for a table because the data can be sorted in
only one order.
4. Non-clustered Index:
• In non-clustered indexing, the data is not physically organized in the same order
as the index.
• A non-clustered index stores the index separately from the actual data and
points to the data location.
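The difference between a clustered (primary) index and a non-clustered (secondary) index can be illustrated with a small in-memory sketch. The employee rows, the field layout, and the function names are made up for the example; list positions stand in for disk locations.

```python
from bisect import bisect_left
from collections import defaultdict

# Rows are stored in primary-key order (id), i.e. the storage order and the
# primary index order are the same. This models a clustered primary index.
rows = [
    (101, "Asha", "Sales"),
    (102, "Ben", "HR"),
    (105, "Chen", "Sales"),
    (109, "Dina", "IT"),
]
ids = [row[0] for row in rows]


def find_by_id(emp_id):
    """Primary-key lookup: binary search directly over the ordered rows."""
    pos = bisect_left(ids, emp_id)
    if pos < len(rows) and ids[pos] == emp_id:
        return rows[pos]
    return None


# Secondary, non-clustered index on the department column: it is kept apart
# from the rows and stores pointers (row positions), not the rows themselves.
dept_index = defaultdict(list)
for pos, (_, _, dept) in enumerate(rows):
    dept_index[dept].append(pos)


def find_by_dept(dept):
    """Secondary lookup: follow the stored pointers to the matching rows."""
    return [rows[pos] for pos in dept_index.get(dept, [])]
```

Here find_by_id(105) locates the row through the storage order itself, while find_by_dept("Sales") first consults the separate structure and then follows its pointers to rows 0 and 2.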
3. Single-Level Index
Definition:
• A single-level index is a simple index where the index structure contains only one
level, meaning the index directly points to the data in the storage (disk).
Structure and Functionality:
• A single-level index stores a sorted list of keys, and for each key, it stores a pointer to
the actual data location.
• When querying, the system looks up the key in the index to find the data quickly.
Example:
• If a table has records with keys A, B, and C, the index would directly map A, B, and C
to their respective records.
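The A, B, C example can be sketched as a sorted list of (key, pointer) pairs searched with binary search. The record contents and the use of list positions as pointers are assumptions for illustration.

```python
from bisect import bisect_left

# The data file, in arbitrary storage order. List positions stand in for
# disk addresses (an assumption for this sketch).
data_file = ["record for C", "record for A", "record for B"]

# Single-level index: keys in sorted order, each paired with a pointer.
index = [("A", 1), ("B", 2), ("C", 0)]
keys = [key for key, _ in index]


def lookup(key):
    """Binary-search the sorted keys, then follow the pointer to the record."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        pointer = index[i][1]
        return data_file[pointer]
    return None                      # key is not in the index
```

The data itself does not need to be sorted; only the index is, which is what makes the binary search possible.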
Advantages:
• Simple to Implement: It is straightforward and efficient for small datasets.
• Faster Lookups: Because the keys are kept sorted, a key can be found with a binary
search over the index instead of a scan of the whole data file.
Limitations:
• Not Scalable for Large Datasets: As the data grows, the index grows with it, and a
binary search over the index needs more block accesses.
• Space Inefficiency: If the table is large, the index may become too large to hold in
memory, so searching the index itself requires additional disk reads.
4. Multilevel Index
Definition:
• A multilevel index is an index that has multiple levels, meaning the index itself is
indexed. This hierarchical approach reduces the portion of the index that must be
searched at each step, which makes it scalable for large datasets.
Structure and Functionality:
• The first level of the index points to the second level, which in turn points to the actual
data. This multi-level approach allows for the management of large amounts of data
more efficiently.
Example:
• Suppose we have a large table with millions of records. We create a first-level index
that maps key ranges to smaller blocks. Each block then has its own second-level index
that maps the keys in its range to the actual records.
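A two-level version of this idea can be sketched as follows. The block size of 3, the sample keys, and the variable names are assumptions; the level numbering follows the description above, with the first level on top and the second level pointing to the records.

```python
from bisect import bisect_right

BLOCK_SIZE = 3                                    # entries per block (assumed)

# Records keyed 10, 20, ..., 90. The dict stands in for the data file.
records = {k: f"record {k}" for k in range(10, 100, 10)}
sorted_keys = sorted(records)

# Second level: the sorted keys split into blocks, each covering a key range.
second_level = [
    sorted_keys[i:i + BLOCK_SIZE]
    for i in range(0, len(sorted_keys), BLOCK_SIZE)
]

# First level: one entry per second-level block, holding that block's first key.
first_level = [block[0] for block in second_level]


def lookup(key):
    # Step 1: the first level selects the single block that could hold the key.
    b = bisect_right(first_level, key) - 1
    if b < 0:
        return None                               # smaller than every key
    # Step 2: search only that block, then fetch the record.
    if key in second_level[b]:
        return records[key]
    return None
```

A lookup for key 50 touches one first-level entry list and one second-level block ([40, 50, 60]) rather than the full list of keys.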
Advantages:
• Scalable: It can handle large datasets because each higher level is much smaller than
the level below it.
• Faster for Large Databases: Since the index is hierarchical, a search reads one block
per level, and only a few levels are needed to reach the data.
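A rough count of block accesses shows why few levels suffice. The figures below (one million index entries, 100 entries per index block) and the function names are assumptions, and the cost model is the usual simplified one: a binary search over b index blocks reads about ceil(log2 b) blocks, a multilevel search reads one block per level, and each lookup ends with one data block read.

```python
def blocks_needed(entries: int, entries_per_block: int) -> int:
    """Number of blocks required to hold the given number of index entries."""
    return -(-entries // entries_per_block)       # ceiling division


def single_level_accesses(entries: int, entries_per_block: int) -> int:
    """Binary search over the index blocks, plus one data block access."""
    b = blocks_needed(entries, entries_per_block)
    return (b - 1).bit_length() + 1               # ceil(log2(b)) + 1


def multilevel_accesses(entries: int, entries_per_block: int) -> int:
    """One block read per index level, plus one data block access."""
    levels = 1
    blocks = blocks_needed(entries, entries_per_block)
    while blocks > 1:                             # index the level below
        blocks = blocks_needed(blocks, entries_per_block)
        levels += 1
    return levels + 1


ENTRIES = 1_000_000                               # assumed index entries
FAN_OUT = 100                                     # assumed entries per block

single = single_level_accesses(ENTRIES, FAN_OUT)  # 14 + 1 = 15
multi = multilevel_accesses(ENTRIES, FAN_OUT)     # 3 levels + 1 = 4
```

Under these assumptions the single-level index spans 10,000 blocks and needs about 15 block accesses per lookup, while the multilevel index has 3 levels (10,000, 100, and 1 block) and needs about 4.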
Limitations:
• More Complex: The creation and maintenance of a multilevel index are more complex
than single-level indexes.
• Increased Storage: Multiple levels require additional storage for index structures.