
Storage, Indexing & Advanced Topics Interview Questions - DBMS

Last Updated : 30 Aug, 2025

Storage in a DBMS concerns how data is physically organized and managed on disk. Indexing is a technique that speeds up data retrieval, much like a book’s index. Advanced topics include query optimization, transaction management, distributed databases, and modern systems such as NoSQL and data warehousing.

1. How can a clustered index degrade performance for certain queries despite being faster for others?

A clustered index stores table rows in the physical order of the indexed key.

  • Advantages: Great for range queries and sequential access because rows are stored contiguously.
  • Performance Degradation Cases:

Random Inserts: Require page splits and row movement to maintain order -> high I/O cost.
Frequent Updates to Key Column: Changing the indexed key means moving rows physically.
Non-Indexed Column Access: When queries filter on non-key columns, the clustering offers no benefit and can cause more page reads if data is spread.

  • Key takeaway: Clustered indexing is optimal when the indexed column is stable, frequently searched in ranges, and avoids random inserts.

2. How can bitmap indexing outperform B+ trees for certain workloads, even on large datasets?

Bitmap indexes represent data using bit arrays for each distinct value, making operations like AND, OR, NOT extremely fast.

  • Best Case: Low-cardinality columns (e.g., gender, status flags) with analytical queries involving multiple conditions.
  • Why faster: Logical bit operations are CPU-efficient and reduce disk I/O when combining conditions.
  • Limitations:

Inefficient for high-cardinality columns (bitmaps get huge).
Costly to maintain in write-heavy OLTP systems.

Thus, bitmap indexes excel in read-heavy, analytical workloads with many low-cardinality filters.
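As a sketch, the bitwise filtering described above can be modeled in a few lines of Python, using one integer bitmask per distinct value (the data and names are illustrative, not from any real engine):

```python
# A toy bitmap index: one integer bitmask per distinct value of a
# low-cardinality "status" column. Bit i is set if row i has that value.
rows = ["active", "inactive", "active", "pending", "active", "inactive"]

bitmap_index = {}
for i, value in enumerate(rows):
    bitmap_index[value] = bitmap_index.get(value, 0) | (1 << i)

def matching_rows(mask, n):
    """Decode a bitmask back into row positions."""
    return [i for i in range(n) if mask & (1 << i)]

# "status = 'active' OR status = 'pending'" becomes a single bitwise OR.
mask = bitmap_index["active"] | bitmap_index["pending"]
print(matching_rows(mask, len(rows)))  # [0, 2, 3, 4]
```

Combining predicates costs one CPU word operation per machine word of bitmap, which is why analytical filters over many low-cardinality columns are so cheap.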

3. Why can hash indexing fail to outperform sequential scans for certain queries?

Hash indexes use a hash function to map keys to buckets, which makes them great for exact-match lookups.

Weaknesses:

  • No range support: You can’t efficiently find values between two keys.
  • Poor cache locality: Hash buckets may be scattered across disk pages -> more random I/O.
  • Small table problem: For small datasets, sequential scans may be faster due to fewer I/O operations.

In short: Hash indexes excel at equality searches but are poor for range queries and small-table lookups.
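The trade-off can be sketched in Python by comparing a dict (hash index) against a sorted list (ordered index); the data is illustrative:

```python
import bisect

# Toy comparison: a hash index (dict) vs an ordered index (sorted list).
keys = [42, 7, 19, 88, 3, 56]

hash_index = {k: f"row-{k}" for k in keys}   # O(1) equality lookup
ordered_index = sorted(keys)                 # supports range scans

# Equality: the hash index wins.
print(hash_index[19])  # row-19

# Range query 10..60: the hash index has no ordering, so every
# bucket must be checked -- effectively a full scan.
via_hash = sorted(k for k in hash_index if 10 <= k <= 60)

# The ordered index finds the range boundaries in O(log n).
lo = bisect.bisect_left(ordered_index, 10)
hi = bisect.bisect_right(ordered_index, 60)
via_order = ordered_index[lo:hi]

print(via_hash, via_order)  # both [19, 42, 56]
```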

4. How does fill factor in B+ trees affect read and write performance?

Fill factor controls how much space is left empty in index pages during creation/rebuild.

Low fill factor (more empty space):

Pros: Reduces page splits during inserts -> faster writes.
Cons: More pages needed for same data -> slower reads.

High fill factor (less empty space):

Pros: Compact structure, fewer pages -> faster reads.
Cons: More frequent page splits during inserts -> slower writes.

Choosing fill factor is a trade-off based on whether the workload is read-heavy or write-heavy.

5. Can a composite index cause a query to ignore it even if all indexed columns are in the WHERE clause?

Yes, because of the leftmost prefix rule in index usage.

Example: Composite index on (A, B, C)

  • A query filtering only on B and C cannot use the index efficiently, because A is missing from the filter.
  • Even if all three columns appear in the WHERE clause, the optimizer may prefer a different plan if the predicate on A is not selective enough.

Thus, composite index order must match the most common query patterns.
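The leftmost prefix rule falls out of how a composite index is ordered, which a sorted list of tuples makes concrete (a minimal sketch with made-up data):

```python
import bisect

# A composite index on (A, B) modeled as a sorted list of tuples.
# Searching by A (the leftmost column) can use binary search;
# searching by B alone cannot, because rows are ordered by A first.
index = sorted([(1, "x"), (1, "y"), (2, "x"), (3, "y"), (3, "z")])

# Leftmost prefix A=3: binary search on the tuple prefix.
lo = bisect.bisect_left(index, (3,))
hi = bisect.bisect_left(index, (4,))
print(index[lo:hi])  # [(3, 'y'), (3, 'z')]

# B='x' alone: matches are scattered, so a full scan is required.
print([t for t in index if t[1] == "x"])  # [(1, 'x'), (2, 'x')]
```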

6. How does a covering index improve performance, and when can it backfire?

A covering index contains all columns required by a query, so no table lookup is needed (index-only scan).

  • Benefits: Fewer I/O operations, reduced table access.
  • Backfire Cases:

Large indexes due to extra columns -> more maintenance cost.
Insert/update operations become slower because index updates involve more data.
Not useful if queries often require columns outside the covering index.

Use covering indexes selectively for frequently executed, performance-critical queries.
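The index-only scan is observable in SQLite via EXPLAIN QUERY PLAN; the table and index names below are made up for the example:

```python
import sqlite3

# With an index containing every column the query needs, SQLite's
# planner reports an index-only ("covering") scan.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders(customer_id INT, order_date TEXT, total REAL)")
con.execute("CREATE INDEX idx_cov ON orders(customer_id, order_date, total)")

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT order_date, total FROM orders WHERE customer_id = 7"
).fetchall()
print(plan[0][-1])  # e.g. "SEARCH orders USING COVERING INDEX idx_cov ..."
```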

7. How can database page size impact index performance?

Database page size determines how many rows or index entries fit per page.

Large page size:

Pros: More sequential data per read -> better range query performance.
Cons: More data per page means higher lock contention and wasted reads for small fetches.

Small page size:

Pros: Better for random reads, reduces I/O waste.
Cons: More page fetches for large scans.

Optimal page size depends on access pattern and I/O characteristics.

8. What is the difference between primary, secondary, and covering indexes in terms of storage and access path?

  • Primary Index: Built on the primary key; usually clustered, data rows are ordered physically.
  • Secondary Index: Built on non-primary key attributes; stores row pointers (RID) to locate actual data.
  • Covering Index: Special case of secondary index that stores all needed columns for a query, eliminating table lookups.

Storage & Access Path Impact:

  • Primary index fetches data directly without extra lookup.
  • Secondary index needs extra fetch from table.
  • Covering index avoids table fetch entirely.

9. Why can adaptive indexing strategies like auto-tuning lead to performance regression?

Auto-tuning tools create/destroy indexes based on observed query patterns.

Risks:

  • Workload changes: New queries may not benefit from existing indexes.
  • Write overhead: Too many indexes increase insert/update/delete cost.
  • Index thrashing: Frequent creation/deletion wastes resources and can fragment storage.

Adaptive indexing must consider long-term workload stability and not just short-term query patterns.

10. Explain the concept of index intersection and when it’s beneficial.

Index intersection is when the DB engine combines results from multiple indexes to satisfy a query.

Example:

  • Index1 on A
  • Index2 on B
  • Query: WHERE A=5 AND B=10 -> Instead of a full table scan, the engine fetches row IDs from both indexes and intersects the results.

Benefits: Avoids creating expensive composite indexes.

Drawbacks: Can be slower if intersection set is large or indexes are not selective.

Useful for ad-hoc queries but not a replacement for well-designed composite indexes.
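A sketch of the mechanism in Python: each single-column index maps a value to its set of row IDs, and the engine ANDs the sets (illustrative data):

```python
# Index intersection: two single-column indexes modeled as
# value -> set-of-row-ids maps; AND becomes a set intersection.
rows = [
    {"A": 5, "B": 10},   # row 0
    {"A": 5, "B": 20},   # row 1
    {"A": 7, "B": 10},   # row 2
    {"A": 5, "B": 10},   # row 3
]

index_a, index_b = {}, {}
for rid, row in enumerate(rows):
    index_a.setdefault(row["A"], set()).add(rid)
    index_b.setdefault(row["B"], set()).add(rid)

# WHERE A=5 AND B=10: intersect the two posting sets.
result = index_a[5] & index_b[10]
print(sorted(result))  # [0, 3]
```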

11. Explain the difference between clustered and non-clustered indexes. Why can a table have only one clustered index but multiple non-clustered indexes?

  • Clustered Index: Physically sorts and stores the data rows in the table based on the index key. The leaf nodes of the index contain the actual data pages.
  • Non-Clustered Index: Maintains a separate structure from the data rows; leaf nodes contain pointers (row IDs or primary key references) to actual data pages.
  • Why only one clustered index? Because the table can be physically sorted in only one way, so only one ordering can be applied to the data storage.
  • Why multiple non-clustered indexes? They are independent of the physical data order and can be created for multiple query patterns.
  • Impact: Clustered index improves range queries, but non-clustered indexes are better for selective lookups.

12. What is the difference between a dense index and a sparse index? Which one is preferred when using a clustered index?

  • Dense Index: Contains an index entry for every search key value in the data file. Faster lookups but consumes more space.
  • Sparse Index: Contains index entries only for some search key values. Saves space but requires more page accesses during lookups.
  • With Clustered Index: Sparse indexes are preferred because data is already sorted; fewer pointers are needed to locate ranges efficiently.
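Because the data is already sorted, a sparse index only needs one entry per page; a small Python sketch (page size and keys are illustrative):

```python
import bisect

# Sparse index over sorted data: one index entry per "page"
# (here, every 3 records) instead of one per record.
data = [3, 7, 12, 18, 25, 31, 40, 44, 52]  # sorted, as under a clustered index
PAGE = 3
sparse = [data[i] for i in range(0, len(data), PAGE)]  # first key of each page

def lookup(key):
    # Find the page whose first key is <= key, then scan that page.
    p = bisect.bisect_right(sparse, key) - 1
    page = data[p * PAGE:(p + 1) * PAGE]
    return key in page

print(lookup(31), lookup(33))  # True False
```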

13. How does a B+ Tree differ from a B-Tree in database indexing, and why is B+ Tree generally preferred?

  • B-Tree: Stores keys and records at internal and leaf nodes.
  • B+ Tree: Stores keys in internal nodes and all records only in leaf nodes, which are linked sequentially.
  • Advantages of B+ Tree:

Supports efficient range queries via linked leaves.
Higher fan-out, meaning shallower trees and fewer disk I/Os.
Better sequential access performance.

Why preferred? Most DBMS prefer B+ Trees for indexes because they combine fast point lookups with optimal range query performance.
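The linked-leaf range scan can be sketched with a toy leaf chain (a real B+ tree would first descend the internal nodes to find the starting leaf; this only models the leaf level):

```python
# Toy B+ tree leaf level: sorted leaves linked left-to-right, so a
# range scan walks the chain instead of re-descending the tree.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None

# Three linked leaves, keys sorted across leaves.
a, b, c = Leaf([5, 10]), Leaf([15, 20]), Leaf([25, 30])
a.next, b.next = b, c

def range_scan(start_leaf, lo, hi):
    out, leaf = [], start_leaf
    while leaf:
        out += [k for k in leaf.keys if lo <= k <= hi]
        if leaf.keys and leaf.keys[-1] > hi:
            break  # past the range: stop early
        leaf = leaf.next
    return out

print(range_scan(a, 10, 25))  # [10, 15, 20, 25]
```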

14. Describe bitmap indexing. Why is it efficient for low-cardinality attributes but not for high-cardinality ones?

  • Bitmap Indexing: Uses bit vectors for each distinct value of an attribute, marking rows where the value occurs.
  • Low-Cardinality: Very space-efficient and allows fast bitwise operations for filtering.
  • High-Cardinality: Creates too many bitmaps, increasing storage and slowing queries.
  • Use case: Ideal for gender, boolean flags, or categorical attributes with few distinct values.

15. What is the difference between primary, secondary, and composite indexes?

  • Primary Index: Built on the primary key, unique and usually clustered.
  • Secondary Index: Built on non-primary key attributes, can be non-unique.
  • Composite Index: Built on two or more columns to speed up multi-column searches.
  • Impact: Choice affects query optimization; composite indexes are powerful but only beneficial if queries match the column order.

16. Why do databases use multi-level indexing, and how does it reduce disk I/O?

  • Problem: Large single-level indexes may not fit in memory, requiring multiple disk accesses.
  • Multi-Level Indexing: Organizes index pages into a hierarchy (like B+ Tree levels) so fewer disk I/Os are needed.
  • Effect: Search cost drops from O(n) to O(log n) in terms of disk page reads, thanks to the high fan-out of each index level.
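The arithmetic behind the shallow hierarchy: with fan-out f, the height is roughly ceil(log_f(n)). A quick calculation (the entry count and fan-out are illustrative):

```python
import math

# Each index level multiplies reach by the fan-out f,
# so height is about ceil(log_f(n)).
def index_height(n_entries, fanout):
    return math.ceil(math.log(n_entries) / math.log(fanout))

# 100 million entries with 500 pointers per page: only 3 page reads.
print(index_height(100_000_000, 500))  # 3
```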

17. Explain the concept of hash indexing. Why is it unsuitable for range queries?

  • Hash Index: Uses a hash function to map keys to bucket addresses for O(1) equality search.
  • Limitation: Keys are distributed randomly; no inherent ordering is preserved, so range queries require scanning all buckets.
  • Best Use: Equality lookups like WHERE id = 100.

18. What are covering indexes, and how do they improve performance?

  • Covering Index: Contains all columns needed to satisfy a query, avoiding access to the base table.
  • Benefit: Reduces disk I/O since only the index pages are read.
  • Example: An index on (customer_id, order_date, total_amount) can answer a query that selects order_date and total_amount for a given customer_id directly from the index, without touching the table.

19. Describe the difference between static and dynamic hashing in DBMS.

  • Static Hashing: Hash table size is fixed; overflow handled by chaining or overflow areas.
  • Dynamic Hashing: Hash table grows/shrinks dynamically (e.g., extendible or linear hashing) to handle varying data volumes without excessive overflow.
  • Trade-off: Dynamic hashing avoids performance degradation in large datasets.
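A simplified stand-in for extendible/linear hashing: a chained hash table that doubles its bucket count and rehashes when the load factor passes a threshold (all names and thresholds are illustrative):

```python
# Dynamic hashing sketch: buckets double when the load factor
# exceeds max_load, then all keys are rehashed.
class DynamicHash:
    def __init__(self, buckets=2, max_load=2.0):
        self.buckets = [[] for _ in range(buckets)]
        self.count, self.max_load = 0, max_load

    def insert(self, key):
        self.buckets[hash(key) % len(self.buckets)].append(key)
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            old = self.buckets
            self.buckets = [[] for _ in range(2 * len(old))]  # double
            for bucket in old:
                for k in bucket:
                    self.buckets[hash(k) % len(self.buckets)].append(k)

h = DynamicHash()
for k in range(10):
    h.insert(k)
print(len(h.buckets))  # grew from 2 to 8 buckets
```

Real extendible and linear hashing split only one bucket (or directory entry) at a time, avoiding this sketch's full rehash, but the grow-on-demand idea is the same.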

20. Why is index maintenance important, and what are the trade-offs of too many indexes?

Index Maintenance: Updates, inserts, and deletes must update all affected indexes, increasing overhead.

Too Many Indexes:

  • Slows write operations due to extra maintenance.
  • Uses more storage.
  • May confuse the optimizer if redundant indexes exist.
  • Balance: Create indexes for frequent query patterns but avoid unnecessary duplicates.

21. Explain the concept of a stored procedure in DBMS.

A stored procedure is a precompiled collection of one or more SQL statements stored in the database. Stored procedures allow users to execute a series of operations as a single unit, improving performance and reusability. They can accept input parameters, perform operations, and return results.

Example:

CREATE PROCEDURE GetStudentDetails(IN student_id INT)
BEGIN
    SELECT * FROM Student WHERE ID = student_id;
END;

22. What are triggers in DBMS? Provide an example.

A trigger is a special kind of stored procedure that automatically executes (or "fires") in response to certain events on a table, such as insertions, updates, or deletions. Triggers are used to enforce business rules, maintain consistency, or log changes.

Example:

CREATE TRIGGER after_student_insert
AFTER INSERT ON Student
FOR EACH ROW
BEGIN
    INSERT INTO AuditLog (Action, StudentID, ActionTime)
    VALUES ('INSERT', NEW.ID, NOW());
END;

23. Explain the concept of indexing in DBMS.

Indexing is a technique used to speed up the retrieval of records from a table by creating a data structure that allows for faster searching. An index provides a quick lookup of data based on the values of one or more columns.

Types of Indexes:

  • Single-column index: Created on one column.
  • Composite index: Created on multiple columns.
  • Unique index: Ensures that no two rows have the same values in the indexed columns.
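The three index kinds can be shown with Python's built-in sqlite3 module; the table and index names are made up for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student(id INT, name TEXT, email TEXT)")

con.execute("CREATE INDEX idx_name ON student(name)")           # single-column
con.execute("CREATE INDEX idx_name_id ON student(name, id)")    # composite
con.execute("CREATE UNIQUE INDEX idx_email ON student(email)")  # unique

con.execute("INSERT INTO student VALUES (1, 'Ana', 'ana@x.com')")
duplicate_rejected = False
try:
    con.execute("INSERT INTO student VALUES (2, 'Bo', 'ana@x.com')")
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the unique index blocks the duplicate email
print(duplicate_rejected)  # True
```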

24. What is a database cursor? How is it used?

A cursor in DBMS is a pointer to a result set of a query. It allows for row-by-row processing of query results, which is useful when dealing with large datasets.

Types of cursors:

  • Implicit cursors: Automatically created by the DBMS for SELECT, INSERT, UPDATE, DELETE operations.
  • Explicit cursors: Manually created by the programmer to process query results.

Example:

DECLARE cursor_example CURSOR FOR
SELECT * FROM Employee;
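Python's sqlite3 cursor mirrors the explicit-cursor pattern: open, fetch row by row, close (sample data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee(id INT, name TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [(1, "Ana"), (2, "Bo"), (3, "Cy")])

cur = con.cursor()                  # explicit cursor
cur.execute("SELECT id, name FROM employee ORDER BY id")
fetched = []
for row in cur:                     # row-by-row processing
    fetched.append(row)
cur.close()
print(fetched)  # [(1, 'Ana'), (2, 'Bo'), (3, 'Cy')]
```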

25. What is the difference between a clustered and non-clustered index?

  • Clustered Index: Organizes the data in the table according to the index. There can only be one clustered index per table because the data rows can only be sorted one way.
  • Non-clustered Index: Creates a separate structure from the table that holds pointers to the actual data rows. Multiple non-clustered indexes can be created on a table.

26. What is a materialized view in DBMS?

A materialized view is a database object that contains the results of a query. Unlike a regular view, which is a virtual table (it doesn’t store data), a materialized view stores data physically, improving query performance by precomputing and storing results.

Use Case: Materialized views are commonly used for performance optimization in data warehousing and reporting systems, where the same data is frequently queried.

Example:

CREATE MATERIALIZED VIEW SalesSummary AS
SELECT Product, SUM(Quantity) FROM Sales GROUP BY Product;
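SQLite has no MATERIALIZED VIEW statement, so this sketch simulates one by storing the aggregated result as a real table, which is exactly what materialization means (a manual, on-demand refresh):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales(product TEXT, quantity INT)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("pen", 3), ("pen", 2), ("book", 4)])

# "Materialize": physically store the precomputed summary.
con.execute("CREATE TABLE sales_summary AS "
            "SELECT product, SUM(quantity) AS total FROM sales GROUP BY product")

summary = con.execute("SELECT * FROM sales_summary ORDER BY product").fetchall()
print(summary)  # [('book', 4), ('pen', 5)]
```

Queries now hit sales_summary without recomputing the aggregate; the cost is that the table goes stale until the next refresh.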

27. What are stored functions in DBMS?

A stored function is a set of SQL statements that can be executed in the database. It accepts input parameters, performs some logic, and returns a value. Stored functions are similar to stored procedures but differ in that they must return a value.

Example:

CREATE FUNCTION GetEmployeeSalary(EmployeeID INT)
RETURNS DECIMAL(10,2)
BEGIN
    DECLARE salary DECIMAL(10,2);
    SELECT Salary INTO salary FROM Employee WHERE ID = EmployeeID;
    RETURN salary;
END;

28. What are the different types of backups in DBMS?

There are several types of backups in DBMS:

  • Full Backup: A full backup copies the entire database, including all data and the database structure. It is the most comprehensive but can take up a lot of storage and time.
  • Incremental Backup: An incremental backup only copies the data that has changed since the last backup (either full or incremental). This saves space and time but requires the restoration of the full backup and all incremental backups.
  • Differential Backup: A differential backup copies all changes made since the last full backup. It’s faster than a full backup and simpler to restore than incremental backups.
  • Transaction Log Backup: A transaction log backup copies the transaction log, which records all transactions performed on the database. This allows for point-in-time recovery.

29. What is the use of the "WITH CHECK OPTION" in SQL views?

The "WITH CHECK OPTION" is used when creating a view in SQL to ensure that any insert or update operation on the view must adhere to the conditions defined in the view’s WHERE clause. If the inserted or updated data violates these conditions, the operation will be rejected.

Example:

CREATE VIEW ActiveStudents AS
SELECT * FROM Students WHERE Status = 'Active'
WITH CHECK OPTION;

Here, if a user tries to insert or update a Student record with a status other than 'Active', the operation will fail.

30. Explain the concept of a B-tree and B+ tree in DBMS.

B-tree (Balanced Tree):

  • A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, insertions, and deletions in logarithmic time.
  • B-trees are used in databases and file systems to store large amounts of data. All nodes in a B-tree can have multiple children, which increases the efficiency of searching.
  • Stores data in both internal and leaf nodes.

B+ tree:

  • A B+ tree is an extension of the B-tree and is widely used in databases for indexing. It differs in that it has a linked list at the leaf level and stores all records in the leaf nodes.
  • The internal nodes of the B+ tree store only keys and pointers to the next level of the tree, while the leaf nodes contain actual data or pointers to the data.
  • Stores data only in the leaf nodes and uses the internal nodes for indexing.

31. What is a hashing technique in DBMS? How does it work?

A hashing technique in DBMS is used to map data (such as a key) to a fixed-size value or address, using a hash function. It is primarily used for quick data retrieval, particularly in hash indexes or hash tables.

Example: A hash function might map a StudentID of 123 to a bucket index of 3. The student’s record would be stored in the corresponding bucket.

How it works:

  • A hash function takes the key (e.g., a record’s ID) and calculates a hash value.
  • This hash value determines the bucket or slot where the data is stored.
  • When searching for a record, the hash function is applied again to the key to find the corresponding bucket.
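The steps above amount to a few lines of Python (bucket count and records are illustrative):

```python
# Minimal hashing sketch: key -> hash -> bucket, with chaining.
N_BUCKETS = 4
buckets = [[] for _ in range(N_BUCKETS)]

def put(key, record):
    buckets[hash(key) % N_BUCKETS].append((key, record))

def get(key):
    # Re-apply the hash function to find the right bucket, then scan it.
    for k, rec in buckets[hash(key) % N_BUCKETS]:
        if k == key:
            return rec
    return None

put(123, "student-123")  # e.g. StudentID 123 lands in bucket 123 % 4 = 3
put(205, "student-205")
print(get(123))  # student-123
```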

32. What is the difference between a trigger and a stored procedure?

Trigger:

  • A trigger is an automatic action executed by the DBMS when a specific event occurs on a table, such as an INSERT, UPDATE, or DELETE.
  • It cannot be invoked manually and is tied to a specific event.

Stored Procedure:

  • A stored procedure is a precompiled set of SQL statements that can be executed explicitly by a user or application.
  • It is invoked manually, and it can accept input parameters and return output.

33. Explain the concept of data partitioning in DBMS.

Data partitioning is the process of dividing large datasets into smaller, more manageable segments (partitions) to improve performance, scalability, and availability. Each partition can be stored and processed separately.

Types of partitioning:

  1. Horizontal Partitioning: Divides data by rows. For example, splitting data by time range (e.g., 2020 data in one partition, 2021 in another).
  2. Vertical Partitioning: Divides data by columns. For example, putting frequently accessed columns in one partition and less frequently accessed columns in another.
  3. Range Partitioning: Data is divided based on a range of values (e.g., age groups, date ranges).
  4. Hash Partitioning: Data is distributed across partitions based on a hash value derived from a key column.
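Range and hash partitioning can be sketched in Python; the rows and partition keys are made up for the example:

```python
import datetime

rows = [
    {"id": 1, "date": datetime.date(2020, 5, 1)},
    {"id": 2, "date": datetime.date(2021, 3, 9)},
    {"id": 3, "date": datetime.date(2020, 11, 20)},
    {"id": 4, "date": datetime.date(2021, 8, 2)},
]

# Range partitioning: one partition per year.
by_year = {}
for r in rows:
    by_year.setdefault(r["date"].year, []).append(r["id"])
print(by_year)  # {2020: [1, 3], 2021: [2, 4]}

# Hash partitioning: spread rows over 2 partitions by id
# (id % N stands in for a hash function on the key column).
N = 2
by_hash = {p: [r["id"] for r in rows if r["id"] % N == p] for p in range(N)}
print(by_hash)  # {0: [2, 4], 1: [1, 3]}
```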

34. What is the role of the DBMS in handling data integrity and security?

The DBMS plays a critical role in ensuring:

  • Data Integrity: Through constraints like Primary Keys, Foreign Keys, and Check Constraints, the DBMS ensures data consistency and accuracy.
  • Data Security: DBMS systems provide user authentication, access control, and encryption mechanisms to protect data from unauthorized access and breaches. It also supports role-based access control (RBAC), ensuring that only authorized users can perform certain actions on the data.

35. How is DBMS different from a file-based system?

Aspect | DBMS | File-based System
--- | --- | ---
Data Organization | Data is stored in tables (relations) with structured schemas, supporting complex queries and relationships. | Data is stored in flat files without any relationship between data.
Data Redundancy | Minimizes redundancy through normalization. | Redundancy is common, as data may be duplicated across different files.
Data Integrity | Ensures integrity through constraints (e.g., primary keys, foreign keys, and check constraints). | Integrity is hard to enforce, as files have no built-in integrity checks.
Security | Provides advanced security features like user authentication, access control, and encryption. | Security is managed at the file-system level, which is less robust.
Concurrency Control | Handles concurrent access using locking mechanisms, ensuring data consistency. | No built-in concurrency control; data corruption may occur when multiple users access the same file.
Data Access | Supports complex querying and transaction management (e.g., SQL). | Access is limited to basic file operations like reading, writing, and appending.
Backup and Recovery | Automatic backup and recovery mechanisms, often integrated into the system. | Backup and recovery are manual and may not be as robust.
Transaction Management | Supports ACID (Atomicity, Consistency, Isolation, Durability) properties for reliable transactions. | No built-in support for transactions or ACID properties, making it prone to errors during data modification.
