Advance Level Interview Preparation SQL Server Indexing
INDEXING
Indexing is a critical part of SQL Server's storage engine that enhances query performance. To understand it at an
advanced level, we need to analyze internal architecture, page structure, and calculations related to index
storage.
Indexes in SQL Server use a balanced-tree (B-tree) structure, ensuring efficient search, insert, delete, and update
operations. The tree consists of:
1. Root Level – The topmost page, which directs searches to the intermediate or leaf levels.
2. Intermediate Levels – Contain index pages that point to pages on the next lower level.
3. Leaf Level – The bottom level of the tree:
o Clustered Index: Contains the actual table data rows.
o Non-Clustered Index: Contains the indexed columns plus row locators (an RID for heaps, or the clustering key for
tables that have a clustered index).
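Each data row has a fixed overhead of about 7 bytes (a 4-byte row header plus a 3-byte null bitmap for a row with up to eight columns), plus additional space for variable-length columns. Each row also needs a 2-byte entry in the page's slot array.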
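The per-row overhead above matches Microsoft's published procedure for estimating the size of a clustered index. A sketch of those formulas, where 8096 is the 8,192-byte page minus its 96-byte header, 4 is the data-row header, and 2 is each row's slot-array entry:

```latex
\begin{aligned}
\text{Null\_Bitmap} &= 2 + \left\lfloor \frac{\text{Num\_Cols} + 7}{8} \right\rfloor \\
\text{Variable\_Data\_Size} &= 2 + 2 \times \text{Num\_Variable\_Cols} + \text{Max\_Var\_Size}
  \quad (\text{0 if there are no variable-length columns}) \\
\text{Row\_Size} &= \text{Fixed\_Data\_Size} + \text{Variable\_Data\_Size} + \text{Null\_Bitmap} + 4 \\
\text{Rows\_Per\_Page} &= \left\lfloor \frac{8096}{\text{Row\_Size} + 2} \right\rfloor \\
\text{Num\_Pages} &= \left\lceil \frac{\text{Num\_Rows}}{\text{Rows\_Per\_Page}} \right\rceil
\end{aligned}
```

With eight or fewer columns, Null_Bitmap = 3, which together with the 4-byte header gives the 7-byte overhead quoted above.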
EXAMPLE CALCULATION
A Clustered Index stores the data rows in its leaf level, so the number of data pages required is the number of rows divided by the rows per page, rounded up.
A Non-Clustered Index stores only the indexed columns and a row locator (an 8-byte RID when the table is a heap).
Suppose a non-clustered index includes:
EmployeeID (4 bytes)
Salary (5 bytes)
Row Locator (RID) (8 bytes)
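A worked estimate for this index, following Microsoft's formula for non-clustered leaf rows. It assumes that no indexed column is nullable (so no null bitmap is stored) and a hypothetical table of 1,000,000 rows. Index rows use a 1-byte row header rather than the 4-byte data-row header:

```latex
\begin{aligned}
\text{Leaf\_Row\_Size} &= \underbrace{4}_{\text{EmployeeID}} + \underbrace{5}_{\text{Salary}}
  + \underbrace{8}_{\text{RID}} + \underbrace{1}_{\text{row header}} = 18 \text{ bytes} \\
\text{Leaf\_Rows\_Per\_Page} &= \left\lfloor \frac{8096}{18 + 2} \right\rfloor = \lfloor 404.8 \rfloor = 404 \\
\text{Leaf\_Pages} &= \left\lceil \frac{1{,}000{,}000}{404} \right\rceil = 2476
  \;\Rightarrow\; 2476 \times 8\,\text{KB} \approx 19.3\,\text{MB}
\end{aligned}
```

If Salary were nullable, a null bitmap of 2 + ⌊(3 + 7) / 8⌋ = 3 bytes would be added to every leaf row.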
4. CONCLUSION
SQL Server indexing uses a B-tree structure, with root, intermediate, and leaf levels.
Pages are 8 KB, and row storage depends on column types and sizes.
Clustered Indexes store actual data, while Non-Clustered Indexes store keys + row locators.
Page calculations help estimate storage requirements and performance impact.
INDEX STORAGE
Indexes in SQL Server follow a B-tree structure where each node points to child pages. The depth of the index
tree determines index lookup efficiency.
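The depth grows with the logarithm of the number of rows, using the fan-out (the number of index entries that fit on one page) as the base.
For a non-clustered index, the computed value is usually fractional. Since tree depth must be an integer, it is rounded up, here to 3 levels (Root → Intermediate → Leaf).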
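A sketch of the depth estimate. It assumes that every page, leaf or non-leaf, holds about F = 400 entries and that the index covers a hypothetical N = 1,000,000 rows. Real fan-out differs between leaf and non-leaf levels, so treat the result as an approximation:

```latex
\begin{aligned}
\text{Depth} &= \left\lceil \log_F N \right\rceil
  = \left\lceil \frac{\ln 1{,}000{,}000}{\ln 400} \right\rceil = \lceil 2.31 \rceil = 3 \\
\text{Check: }& \frac{1{,}000{,}000}{400} = 2500 \text{ leaf pages}
  \;\rightarrow\; \left\lceil \frac{2500}{400} \right\rceil = 7 \text{ intermediate pages}
  \;\rightarrow\; 1 \text{ root page}
\end{aligned}
```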
✅ Conclusion:
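A single-row seek reads one page per level, so on this index it costs 3 logical reads (Root → Intermediate → Leaf). Any Key Lookup or RID Lookup into the base table adds further reads.
Shallower trees mean fewer page reads per lookup.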
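To measure the real depth of an index instead of estimating it, one option is the sys.dm_db_index_physical_stats function. This sketch assumes the dbo.Employee table from the example query below. 'DETAILED' mode reads every page, so run it off-peak on large tables:

```sql
-- Depth and leaf page count for every index on dbo.Employee.
SELECT i.name         AS index_name,   -- NULL for a heap
       ps.index_depth AS tree_levels,  -- 3 means Root -> Intermediate -> Leaf
       ps.page_count  AS leaf_pages
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Employee'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.index_level = 0                       -- level 0 is the leaf level
  AND ps.alloc_unit_type_desc = N'IN_ROW_DATA';
```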
A Covering Index includes all necessary columns for a query to avoid extra lookups in the table.
EXAMPLE QUERY:
SELECT EmployeeID, Salary FROM Employee WHERE Age > 30;
A normal Non-Clustered Index on Age would still require a lookup in the Clustered Index to fetch Salary.
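A minimal sketch of a covering index for this query, assuming a dbo.Employee table with the columns shown. The index name IX_Employee_Age_Covering is hypothetical:

```sql
-- Key column Age supports the WHERE Age > 30 predicate (Index Seek).
-- INCLUDE stores Salary at the leaf level only, so no Key Lookup is needed.
-- EmployeeID is listed in case it is not the clustering key; if it is, SQL Server already stores it.
CREATE NONCLUSTERED INDEX IX_Employee_Age_Covering
ON dbo.Employee (Age)
INCLUDE (Salary, EmployeeID);
```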
With a covering index (Age as the key, Salary as an included column), SQL Server can fetch the results directly from the index, reducing I/O overhead.
If your table has billions of records, SQL Server supports index partitioning to distribute data across multiple
filegroups.
✅ Benefit: Queries that filter on the partitioning column scan only the required partitions (partition elimination) instead of the entire index.
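A sketch of a yearly partitioned index, assuming a dbo.Orders table with a DATE column OrderDate. The function, scheme and index names and the boundary dates are illustrative:

```sql
-- Three boundaries create four partitions; RANGE RIGHT puts each boundary date in the partition to its right.
CREATE PARTITION FUNCTION pf_OrderYear (date)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

-- Map every partition to PRIMARY for simplicity; large systems usually spread them across filegroups.
CREATE PARTITION SCHEME ps_OrderYear
AS PARTITION pf_OrderYear ALL TO ([PRIMARY]);

-- Build the index on the partition scheme, partitioned by OrderDate.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
ON ps_OrderYear (OrderDate);

-- A filter on OrderDate lets SQL Server touch only the 2024 partition.
SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
```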
If a column has many NULL values or a query targets only specific values, a Filtered Index improves performance.
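A minimal sketch of a filtered index, assuming Salary in dbo.Employee contains many NULLs. The index name and the 50000 threshold are illustrative:

```sql
-- Only rows with a non-NULL Salary are stored, so the index is smaller and cheaper to maintain.
CREATE NONCLUSTERED INDEX IX_Employee_Salary_NotNull
ON dbo.Employee (Salary)
WHERE Salary IS NOT NULL;

-- Salary > 50000 implies Salary IS NOT NULL, so the optimizer can match this query to the filtered index.
SELECT EmployeeID, Salary
FROM dbo.Employee
WHERE Salary > 50000;
```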
✅ Best Practice: Always analyze the Actual Execution Plan for real-world performance tuning.
Operator | What it does | Verdict
Clustered Index Scan | Scans the entire table (slow) | ❌ Avoid if possible
Non-Clustered Index Scan | Scans the entire non-clustered index | ❌ Can be slow
Non-Clustered Index Seek | Uses the non-clustered index efficiently | ✅ Good for performance
Nested Loops Join | Efficient when one input is small | ✅ Best for joining a small table to an indexed one
Merge Join | Works well for sorted data | ✅ Best for sorted & indexed tables
A Clustered Index Scan (or a Table Scan on a heap) means SQL Server reads every row because no suitable index exists.
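A sketch of the first tuning step, assuming the dbo.Employee table from the earlier example query. The index name IX_Employee_Age is hypothetical. Run it with "Include Actual Execution Plan" enabled:

```sql
-- Report logical reads per statement in the Messages tab.
SET STATISTICS IO ON;

-- Before: with no index on Age, expect a Clustered Index Scan (or Table Scan on a heap).
SELECT EmployeeID, Salary FROM dbo.Employee WHERE Age > 30;

-- Add a non-clustered index on the filter column.
CREATE NONCLUSTERED INDEX IX_Employee_Age ON dbo.Employee (Age);

-- After: rerun and compare the plan and the logical reads.
SELECT EmployeeID, Salary FROM dbo.Employee WHERE Age > 30;

SET STATISTICS IO OFF;
```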
After adding a non-clustered index on Age, rerun the query and check the execution plan. You should see an Index Seek followed by a Key Lookup:
🔴 Issue: SQL Server found Age in the index but had to look up Salary in the clustered index.
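One way to remove the Key Lookup is to rebuild the existing index with Salary as an included column. This sketch assumes a non-clustered index named IX_Employee_Age (hypothetical) already exists on Age:

```sql
-- Recreate the index in one step with Salary stored at the leaf level.
-- DROP_EXISTING = ON fails if an index with this name does not already exist.
CREATE NONCLUSTERED INDEX IX_Employee_Age
ON dbo.Employee (Age)
INCLUDE (Salary)
WITH (DROP_EXISTING = ON);
```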
✔ After adding Salary as an included column, SQL Server can fetch Age and Salary directly from the index.
🔹 If SQL Server is not choosing the best index automatically, you can force it with an INDEX table hint:
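A minimal sketch of the table hint, assuming an index named IX_Employee_Age (hypothetical) exists on dbo.Employee. A hint overrides the optimizer even after the data distribution changes, so revisit it periodically:

```sql
-- Force this query, and only this query, to use IX_Employee_Age.
SELECT EmployeeID, Salary
FROM dbo.Employee WITH (INDEX (IX_Employee_Age))
WHERE Age > 30;
```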
🔎 Finds slow queries with high CPU time, logical reads, and execution time.
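One way to find such queries is the sys.dm_exec_query_stats DMV, which covers plans currently in the plan cache. A sketch listing the ten statements with the highest average CPU time:

```sql
-- total_worker_time and total_elapsed_time are reported in microseconds.
SELECT TOP (10)
    qs.total_worker_time   / qs.execution_count AS avg_cpu_us,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    qs.total_elapsed_time  / qs.execution_count AS avg_elapsed_us,
    qs.execution_count,
    -- Cut the individual statement out of the batch text (offsets are in bytes, Unicode text).
    SUBSTRING(st.text,
              (qs.statement_start_offset / 2) + 1,
              ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
                END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_cpu_us DESC;
```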
SCENARIO
A retail company has an Orders table with 100 million rows. A report query for retrieving orders by CustomerID
is very slow.
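🔴 Problem: Without an index on CustomerID, SQL Server has to scan the entire table to find one customer's orders.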
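A sketch of the first fix, assuming a dbo.Orders table with the CustomerID, OrderDate and TotalAmount columns mentioned in this scenario. The index name and the customer value 42 are illustrative:

```sql
-- Index the column the report filters on.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID);

-- Report query: one customer's orders.
SELECT OrderID, OrderDate, TotalAmount
FROM dbo.Orders
WHERE CustomerID = 42;
```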
🔹 Effect: SQL Server seeks the index instead of scanning the entire table.
🔴 Issue: SQL Server looks up OrderDate and TotalAmount in the clustered index.
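A sketch of the covering version, assuming a non-clustered index named IX_Orders_CustomerID (hypothetical) already exists on dbo.Orders (CustomerID):

```sql
-- Carry OrderDate and TotalAmount at the leaf level so the report needs no Key Lookups.
-- If OrderID is the clustering key, it is already stored in the index.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount)
WITH (DROP_EXISTING = ON);
```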
✅ With OrderDate and TotalAmount added as included columns, SQL Server fetches all required data from the index (no extra lookups).
✔ After partitioning, queries that filter on the partitioning column scan only the relevant partitions, not the full table.
Columnstore indexes are best for aggregations, OLAP, and reporting queries on large datasets.
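A sketch of a purely analytical table stored as a clustered columnstore index. The dbo.SalesHistory table and its columns are hypothetical:

```sql
-- Hypothetical reporting table.
CREATE TABLE dbo.SalesHistory (
    SaleDate  DATE           NOT NULL,
    ProductID INT            NOT NULL,
    StoreID   INT            NOT NULL,
    Quantity  INT            NOT NULL,
    Amount    DECIMAL(12, 2) NOT NULL
);

-- A clustered columnstore index stores the whole table column by column, in compressed segments.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesHistory ON dbo.SalesHistory;

-- A typical aggregation reads only the SaleDate and Amount column segments.
SELECT YEAR(SaleDate) AS SaleYear, SUM(Amount) AS Revenue
FROM dbo.SalesHistory
GROUP BY YEAR(SaleDate);
```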
REAL-WORLD COLUMNSTORE PERFORMANCE COMPARISON WITH EXECUTION PLANS & QUERY TIMES
Columnstore indexes significantly improve query performance for analytics, reporting, and large datasets. Let's
compare row-store vs. columnstore indexing with real execution plans and query times.
EXECUTION PLAN
Index Scan (Cost: 80%)
🔵 Improvement: Faster than table scan but still reads millions of rows.
EXECUTION PLAN
Columnstore Index Scan (Batch Mode Execution) – Cost: 5%
Hybrid indexing leverages both B-tree (row-store) and columnstore indexes to optimize OLTP + Analytics in
SQL Server. This approach is ideal when you need fast inserts/updates for transactions and efficient
aggregations for reporting.
Range Queries (WHERE OrderDate > '2024-01-01') ✅ Fast (Index Seek) ✅ Good
We use a Clustered Row-Store Index for OLTP and a Non-Clustered Columnstore Index for analytics.
Status VARCHAR(20)
);
✔ Primary Key uses Clustered Row-Store Index (B-tree) for fast transactions.
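A sketch of the analytics side of the hybrid design, assuming the Orders table has CustomerID, OrderDate, TotalAmount and Status columns. The index name is illustrative. On SQL Server 2016 and later, a non-clustered columnstore index on a row-store table stays updatable:

```sql
-- Columnstore copy of the reporting columns; the clustered B-tree primary key still serves OLTP lookups.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Reporting
ON dbo.Orders (CustomerID, OrderDate, TotalAmount, Status);
```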
✔ With both index types in place, SQL Server can choose the best index based on the query type.
🔍 Execution Plan:
✔ When the table is also partitioned, analytics queries that filter on the partitioning column scan only the relevant partitions (even faster performance).
In this test, we compare Hybrid Indexing (Row-Store + Columnstore) vs. Traditional Indexing (Only Row-Store)
using execution plans and query times.
QUERY
SELECT * FROM Orders WHERE OrderID = 5000000;
QUERY
SELECT CustomerID, SUM(TotalAmount)
FROM Orders
GROUP BY CustomerID;
Row-Store Only | Index Scan (Row Mode Execution) – Cost: 80% | 8 sec
QUERY
SELECT * FROM Orders WHERE OrderDate > '2024-01-01';
Lookup Query (WHERE OrderID = 5000000) | 1 ms | 1 ms | No Difference
HANDS-ON DEMO: HYBRID INDEXING VS. TRADITIONAL INDEXING IN SQL SERVER MANAGEMENT STUDIO (SSMS)
This guide walks you through executing real queries in SSMS and analyzing execution plans to compare Hybrid
Indexing vs. Traditional Indexing.
Open SSMS and execute the following SQL script to create a 100M-row Orders table.
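A sketch of such a script. OrderID, CustomerID, OrderDate, TotalAmount and Status are the columns used throughout this guide; the data types other than Status VARCHAR(20) are assumptions:

```sql
CREATE TABLE dbo.Orders (
    OrderID     INT IDENTITY(1, 1) NOT NULL,
    CustomerID  INT                NOT NULL,
    OrderDate   DATE               NOT NULL,
    TotalAmount DECIMAL(10, 2)     NOT NULL,
    Status      VARCHAR(20)        NULL,
    -- Clustered row-store (B-tree) index on the primary key for fast OLTP access.
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);
```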
✅ This table uses a Clustered Row-Store Index for fast OLTP transactions.
Run this script to insert simulated 100M rows into the Orders table.
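A sketch of a set-based data generator, assuming the Orders columns CustomerID INT, OrderDate DATE, TotalAmount DECIMAL(10, 2) and Status VARCHAR(20), with OrderID as an identity column. The value distributions are arbitrary. Loading 100M rows in one statement needs a large transaction log, so try a smaller @RowCount first:

```sql
DECLARE @RowCount BIGINT = 100000000;  -- 100M rows; lower this for a quick trial run

WITH
    L0 AS (SELECT 1 AS c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(c)), -- 10 rows
    L1 AS (SELECT 1 AS c FROM L0 AS a CROSS JOIN L0 AS b),   -- 100 rows
    L2 AS (SELECT 1 AS c FROM L1 AS a CROSS JOIN L1 AS b),   -- 10,000 rows
    L3 AS (SELECT 1 AS c FROM L2 AS a CROSS JOIN L2 AS b),   -- 100,000,000 rows
    Nums AS (SELECT TOP (@RowCount) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L3)
INSERT INTO dbo.Orders WITH (TABLOCK) (CustomerID, OrderDate, TotalAmount, Status)
SELECT
    CAST(n % 100000 + 1 AS INT),                                     -- 100,000 distinct customers
    DATEADD(DAY, CAST(n % 1826 AS INT), CAST('2020-01-01' AS DATE)), -- dates spread over about 5 years
    CAST((n % 100000) / 100.0 AS DECIMAL(10, 2)),                    -- amounts from 0.00 to 999.99
    CASE n % 3 WHEN 0 THEN 'Pending' WHEN 1 THEN 'Shipped' ELSE 'Delivered' END
FROM Nums;
```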
🔴 Issue: SQL Server scans the entire table, which is very slow.
✅ A non-clustered columnstore index on the reporting columns allows fast analytics while keeping OLTP performance intact.
Now, rerun the same aggregation query and check the execution plan.
1. Open SSMS.
2. Click "Query" → "Include Actual Execution Plan".
3. Run the same query before and after indexing.
4. Compare the two execution plans:
o Before Indexing: Clustered Index Scan (High Cost)
o After Indexing: Columnstore Index Scan (Batch Mode, Low Cost)
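To capture query times next to the plans, a sketch using SET STATISTICS TIME/IO. It assumes a non-clustered columnstore index already exists on dbo.Orders. The IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint lets you run the "traditional" version in the same session without dropping the index:

```sql
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement (Messages tab)
SET STATISTICS IO ON;    -- reads per table (Messages tab)

-- Traditional: row-store only, the optimizer ignores the columnstore index.
SELECT CustomerID, SUM(TotalAmount) AS TotalSpent
FROM dbo.Orders
GROUP BY CustomerID
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

-- Hybrid: the optimizer may choose a Columnstore Index Scan in batch mode.
SELECT CustomerID, SUM(TotalAmount) AS TotalSpent
FROM dbo.Orders
GROUP BY CustomerID;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```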
EXECUTION PLAN:
Lookup (WHERE OrderID = X) | 1 ms | 1 ms | No Difference
1. Run the aggregation query:
SELECT CustomerID, SUM(TotalAmount)
FROM Orders
GROUP BY CustomerID;
2. Wait for execution to complete (~35 sec).
3. Check the Execution Plan tab (it should show a Clustered Index Scan).
4. Take a screenshot (PrtScn or Snipping Tool in Windows).
The before-index execution plan should show a Clustered Index Scan (High Cost).
The after index execution plan should show a Columnstore Index Scan (Batch Mode, Low Cost).