
Advanced SQL and Optimization Interview Questions - SQL

Last Updated : 02 Sep, 2025

Advanced SQL and Optimization covers performance tuning and complex features that make SQL powerful. Topics include stored procedures, triggers, window functions, CTEs, indexing, and query execution plans. Interview questions often explore how to optimize queries, manage transactions and locks, and use advanced functions for analytics and scalability in real-world applications.

1. What are the differences between SQL and NoSQL databases?

SQL Databases:

  • Use structured tables with rows and columns.
  • Rely on a fixed schema.
  • Offer ACID properties.

NoSQL Databases:

  • Use flexible, schema-less structures (e.g., key-value pairs, document stores).
  • Are designed for horizontal scaling.
  • Often focus on performance and scalability over strict consistency.

2. How is data integrity maintained in SQL databases?

Data integrity refers to the accuracy, consistency, and reliability of the data stored in the database. SQL databases maintain data integrity through several mechanisms:

  • Constraints: Ensuring that certain conditions are always met. For example, NOT NULL ensures a column cannot have missing values, FOREIGN KEY ensures a valid relationship between tables, and UNIQUE ensures no duplicate values.
  • Transactions: Ensuring that a series of operations either all succeed or all fail, preserving data consistency.
  • Triggers: Automatically enforcing rules or validations before or after changes to data.
  • Normalization: Organizing data into multiple related tables to minimize redundancy and prevent anomalies.

These measures collectively ensure that the data remains reliable and meaningful over time.
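The constraint mechanisms above can be combined in a table definition. A minimal sketch, assuming hypothetical Customers and Orders tables:

```sql
-- Hypothetical tables illustrating integrity constraints
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,               -- uniqueness, no NULLs
    Email      VARCHAR(255) NOT NULL UNIQUE   -- required and non-duplicated
);

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT NOT NULL,
    Amount     DECIMAL(10, 2) CHECK (Amount >= 0),  -- domain rule
    FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)  -- referential integrity
);
```

With these constraints in place, the database itself rejects an order that references a non-existent customer or carries a negative amount, regardless of which application issues the insert.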

3. What are the advantages of using stored procedures?

  • Improved Performance: Stored procedures are precompiled and cached in the database, making their execution faster than sending multiple individual queries.
  • Reduced Network Traffic: By executing complex logic on the server, fewer round trips between the application and database are needed.
  • Enhanced Security: Stored procedures can restrict direct access to underlying tables, allowing users to execute only authorized operations.
  • Reusability and Maintenance: Once a procedure is written, it can be reused across multiple applications. If business logic changes, you only need to update the stored procedure, not every application that uses it.
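These advantages can be seen in a short sketch; the procedure and table names below are illustrative (SQL Server syntax):

```sql
-- Hypothetical procedure encapsulating an order lookup on the server
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderID, OrderDate, Amount
    FROM Orders
    WHERE CustomerID = @CustomerID;
END;

-- Applications call the procedure instead of sending raw SQL,
-- so the logic lives in one place and table access can be restricted:
EXEC dbo.GetOrdersByCustomer @CustomerID = 42;
```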

4. What is the difference between an index and a key in SQL?

1. Index

  • An index is a database object created to speed up data retrieval. It stores a sorted reference to table data, which helps the database engine find rows more quickly than scanning the entire table.
  • Example: A non-unique index on a column like LastName allows quick lookups of rows where the last name matches a specific value.

2. Key

  • A key is a logical concept that enforces rules for uniqueness or relationships in the data.
  • For instance, a PRIMARY KEY uniquely identifies each row in a table and ensures that no duplicate or NULL values exist in the key column(s).
  • A FOREIGN KEY maintains referential integrity by linking rows in one table to rows in another.
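The distinction shows up directly in DDL; a brief sketch with an assumed Employees table:

```sql
-- Key: a logical rule declared as part of the table definition
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,   -- enforces uniqueness and NOT NULL
    LastName   VARCHAR(100)
);

-- Index: a physical structure created purely to speed up lookups;
-- it enforces no rule on the data itself
CREATE INDEX IX_Employees_LastName ON Employees (LastName);
```

Note that most databases create an index automatically to enforce a PRIMARY KEY or UNIQUE constraint, which is why the two concepts are often conflated.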

5. What is the difference between a local and a global temporary table?

Local Temporary Table:

  • Prefixed with # (e.g., #TempTable).
  • Exists only within the session that created it.
  • Automatically dropped when the session ends.

Global Temporary Table:

  • Prefixed with ## (e.g., ##GlobalTempTable).
  • Visible to all sessions.
  • Dropped only when all sessions referencing it are closed.

Example:

CREATE TABLE #LocalTemp (ID INT);
CREATE TABLE ##GlobalTemp (ID INT);

6. What are partitioned tables, and when should we use them?

Partitioned tables divide data into smaller, more manageable segments based on a column’s value (e.g., date or region). Each partition is stored separately, making queries that target a specific partition more efficient. Partitioning is useful for:

  • Large tables with millions or billions of rows.
  • Queries that frequently filter on the partitioning column (e.g., year, region).
  • Maintenance operations, such as archiving older partitions without affecting the rest of the table.
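A minimal sketch of year-based partitioning using SQL Server syntax; the function, scheme, and table names are illustrative:

```sql
-- Map SaleYear values to partitions (boundary values start new partitions)
CREATE PARTITION FUNCTION pfSalesYear (INT)
    AS RANGE RIGHT FOR VALUES (2021, 2022, 2023);

-- Map each partition to a filegroup (all to PRIMARY here, for simplicity)
CREATE PARTITION SCHEME psSalesYear
    AS PARTITION pfSalesYear ALL TO ([PRIMARY]);

-- Create the table on the partition scheme, keyed by SaleYear
CREATE TABLE Sales (
    SaleID   INT,
    SaleYear INT,
    Amount   DECIMAL(10, 2)
) ON psSalesYear (SaleYear);
```

A query filtered on SaleYear can then be satisfied by scanning only the matching partition (partition elimination) rather than the whole table.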

7. What are the ACID properties of a transaction?

ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability, four key properties that ensure database transactions are processed reliably.

  • Atomicity: A transaction is treated as a single unit of work, meaning all operations must succeed or fail as a whole. If any part of the transaction fails, the entire transaction is rolled back.
  • Consistency: A transaction must take the database from one valid state to another, maintaining all defined rules and constraints. This ensures data integrity is preserved throughout the transaction process.
  • Isolation: Transactions should not interfere with each other. Even if multiple transactions occur simultaneously, each must operate as if it were the only one in the system until it is complete.
  • Durability: Once a transaction is committed, its changes must persist, even in the event of a system failure. This ensures the data remains stable after the transaction is successfully completed.
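Atomicity in particular is easy to demonstrate with a classic funds transfer; a sketch in T-SQL, assuming a hypothetical Accounts table:

```sql
-- Both updates commit together or not at all
BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

IF @@ERROR <> 0
    ROLLBACK TRANSACTION;   -- undo everything on failure
ELSE
    COMMIT TRANSACTION;     -- make the transfer durable
```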

8. What are the differences between isolation levels in SQL?

Isolation levels define the extent to which the operations in one transaction are isolated from those in other transactions. They are critical for managing concurrency and ensuring data integrity. Common isolation levels include:

1. Read Uncommitted:

  • Allows reading uncommitted changes from other transactions.
  • Can result in dirty reads, where a transaction reads data that might later be rolled back.

2. Read Committed:

  • Ensures a transaction can only read committed data.
  • Prevents dirty reads but does not protect against non-repeatable reads or phantom reads.

3. Repeatable Read:

  • Ensures that if a transaction reads a row, that row cannot change until the transaction is complete.
  • Prevents dirty reads and non-repeatable reads but not phantom reads.

4. Serializable:

  • The highest level of isolation.
  • Ensures full isolation by effectively serializing transactions, meaning no other transaction can read or modify data that another transaction is using.
  • Prevents dirty reads, non-repeatable reads, and phantom reads, but may introduce performance overhead due to locking and reduced concurrency.
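An isolation level is chosen per session or per transaction; a short sketch in SQL Server syntax with an assumed Accounts table:

```sql
-- Applies to subsequent transactions in this session
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
SELECT Balance FROM Accounts WHERE AccountID = 1;
-- Re-reading this row later in the transaction returns the same value:
-- other transactions cannot modify it until we commit.
COMMIT TRANSACTION;
```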

9. What is the purpose of the WITH (NOLOCK) hint in SQL Server?

  • The WITH (NOLOCK) hint allows a query to read data without acquiring shared locks, effectively reading uncommitted data.
  • It can improve performance by reducing contention for locks, especially on large tables that are frequently updated.
  • Results may be inconsistent or unreliable, as the data read might change or be rolled back.

Example:

SELECT *
FROM Orders WITH (NOLOCK);

This query fetches data from the Orders table without waiting for other transactions to release their locks.

10. How do you handle deadlocks in SQL databases?

Deadlocks occur when two or more transactions hold resources that the other transactions need, resulting in a cycle of dependency that prevents progress. Strategies to handle deadlocks include:

  • Deadlock detection and retry: Many database systems have mechanisms to detect deadlocks and terminate one of the transactions to break the cycle. The terminated transaction can be retried after the other transactions complete.
  • Reducing lock contention: Use indexes and optimized queries to minimize the duration and scope of locks. Break transactions into smaller steps to reduce the likelihood of conflicts.
  • Using proper isolation levels: In some cases, lower isolation levels can help reduce locking. Conversely, higher isolation levels (like Serializable) may ensure a predictable order of operations, reducing deadlock risk.
  • Consistent ordering of resource access: Ensure that transactions acquire resources in the same order to prevent cyclical dependencies.
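The detect-and-retry strategy can be sketched in T-SQL: SQL Server reports error 1205 to the transaction it chooses as the deadlock victim. Table and column names below are illustrative:

```sql
BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE Inventory SET Quantity = Quantity - 1 WHERE ProductID = 10;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Roll back if a transaction is still open
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION;
    IF ERROR_NUMBER() = 1205
        PRINT 'Chosen as deadlock victim - safe to retry the transaction';
END CATCH;
```

In practice the retry loop usually lives in the application layer, with a small delay and a bounded number of attempts.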

11. What is a database snapshot, and how is it used?

A database snapshot is a read-only, static view of a database at a specific point in time. Common uses include:

  • Reporting: Allowing users to query a consistent dataset without affecting live operations.
  • Backup and recovery: Snapshots can serve as a point-in-time recovery source if changes need to be reversed.
  • Testing: Providing a stable dataset for testing purposes without the risk of modifying the original data.

Example:

CREATE DATABASE MySnapshot ON
(
    NAME = MyDatabase_Data,
    FILENAME = 'C:\Snapshots\MyDatabase_Snapshot.ss'
)
AS SNAPSHOT OF MyDatabase;

12. What are the differences between OLTP and OLAP systems?

1. OLTP (Online Transaction Processing)

  • Handles large volumes of simple transactions (e.g., order entry, inventory updates).
  • Optimized for fast, frequent reads and writes.
  • Normalized schema to ensure data integrity and consistency.
  • Examples: e-commerce sites, banking systems.

2. OLAP (Online Analytical Processing)

  • Handles complex queries and analysis on large datasets.
  • Optimized for read-heavy workloads and data aggregation.
  • Denormalized schema (e.g., star or snowflake schemas) to support faster querying.
  • Examples: Business intelligence reporting, data warehousing.

13. What is a live lock, and how does it differ from a deadlock?

1. Live Lock

  • Occurs when two or more transactions keep responding to each other’s changes, but no progress is made.
  • Unlike a deadlock, the transactions are not blocked; they are actively running, but they cannot complete.

2. Deadlock

  • A deadlock occurs when two or more transactions are waiting on each other’s resources indefinitely, blocking all progress.
  • No progress can be made unless one of the transactions is terminated.

14. How do you implement dynamic SQL, and what are its advantages and risks?

Dynamic SQL is SQL code that is constructed and executed at runtime rather than being fully defined in advance. In SQL Server, use sp_executesql or EXEC; in other databases, concatenate query strings and execute them with the platform’s equivalent command.

Syntax:

DECLARE @TableName SYSNAME = N'Orders';
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT * FROM ' + QUOTENAME(@TableName);  -- QUOTENAME guards against injection
EXEC sp_executesql @sql;

Advantages:

  • Flexibility: Dynamic SQL can adapt to different conditions, tables, or columns that are only known at runtime.
  • Simplifies Complex Logic: Instead of writing multiple queries, a single dynamically constructed query can handle multiple scenarios.

Risks:

  • SQL Injection Vulnerabilities: If user input is not sanitized, attackers can inject malicious SQL code.
  • Performance Overhead: Because dynamic SQL is constructed at runtime, it may not benefit from cached execution plans, leading to slower performance.
  • Complexity in Debugging: Dynamic queries can be harder to read and troubleshoot.

15. What is the difference between horizontal and vertical partitioning?

Partitioning is a database technique used to divide data into smaller, more manageable pieces.

Horizontal Partitioning:

  • Divides the rows of a table into multiple partitions based on values in a specific column.
  • Example: Splitting a customer table into separate partitions by geographic region or by year.
  • Use Case: When dealing with large datasets, horizontal partitioning can improve performance by limiting the number of rows scanned for a query.

Vertical Partitioning:

  • Divides the columns of a table into multiple partitions.
  • Example: Storing infrequently accessed columns (e.g., large text or binary fields) in a separate table or partition.
  • Use Case: Helps in optimizing storage and query performance by separating commonly used columns from less frequently accessed data.

Key Difference:

  • Horizontal partitioning is row-based, focusing on distributing the dataset’s rows across partitions.
  • Vertical partitioning is column-based, aiming to separate less-used columns into different partitions or tables.

16. What are the considerations for indexing very large tables?

  • Indexing Strategy: Focus on the most frequently queried columns or those involved in JOIN and WHERE conditions. Avoid indexing every column, as it increases storage and maintenance costs.
  • Index Types: Use clustered indexes for primary key lookups and range queries. Use non-clustered indexes for filtering, ordering, and covering specific queries.
  • Partitioned Indexes: If the table is partitioned, consider creating local indexes for each partition. This improves manageability and can speed up queries targeting specific partitions.
  • Maintenance Overhead: Index rebuilding and updating can be resource-intensive. Plan for regular index maintenance during off-peak hours. Monitor index fragmentation and rebuild indexes as necessary to maintain performance.
  • Monitoring and Tuning: Continuously evaluate query performance using execution plans and statistics. Remove unused or rarely accessed indexes to reduce maintenance costs.

Indexing large tables requires a careful approach to ensure that performance gains from faster queries outweigh the costs of increased storage and maintenance effort.
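One of the most effective techniques on large tables is a covering non-clustered index; a sketch using a hypothetical Orders table:

```sql
-- The INCLUDE columns let common queries be answered from the index alone
-- (no lookup back into the base table), at the cost of a larger index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON Orders (CustomerID)
INCLUDE (OrderDate, Amount);
```

A query such as `SELECT OrderDate, Amount FROM Orders WHERE CustomerID = 42` can then be satisfied entirely from this index.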

17. What is the difference between database sharding and partitioning?

1. Sharding

  • Sharding involves splitting a database into multiple smaller, independent databases (shards). Each shard operates on a subset of the overall data and can be hosted on separate servers.
  • Sharding is a horizontal scaling strategy that distributes data across multiple databases, typically to handle massive data volumes and high traffic.
  • Purpose: Horizontal scaling to handle large volumes of data and high query loads.
  • Example: A global user database might be divided into shards by region, with separate shards for North America, Europe, and Asia.
  • Key Benefit: Each shard can be queried independently, reducing the load on any single server.

2. Partitioning

  • Partitioning splits a single table into smaller, logical pieces, usually within the same database.
  • Partitioning is a logical organization of data within a single database to optimize performance and manageability.
  • Purpose: Improve query performance by reducing the amount of data scanned, and simplify maintenance tasks such as archiving or purging old data.
  • Example: A sales table could be partitioned by year so that queries targeting recent sales do not need to scan historical data.

18. What are the best practices for writing optimized SQL queries?

  • Write Simple, Clear Queries: Avoid overly complex joins and subqueries. Use straightforward, well-structured SQL that is easy to read and maintain.
  • Filter Data Early: Apply WHERE clauses as early as possible to reduce the amount of data processed. Consider using indexed columns in WHERE clauses for faster lookups.
  • Avoid SELECT *: Retrieve only the columns needed. This reduces I/O and improves performance.
  • Use Indexes Wisely: Create indexes on columns that are frequently used in WHERE clauses, JOIN conditions, and ORDER BY clauses. Regularly review index usage and remove unused indexes.
  • Leverage Query Execution Plans: Use execution plans to identify bottlenecks, missing indexes, or inefficient query patterns.
  • Use Appropriate Join Types: Choose INNER JOIN, LEFT JOIN, or OUTER JOIN based on the data relationships and performance requirements.
  • Break Down Complex Queries: Instead of a single monolithic query, use temporary tables or CTEs to process data in stages.
  • Optimize Aggregations: Use GROUP BY and aggregate functions efficiently. Consider pre-aggregating data if queries frequently require the same computations.
  • Monitor Performance Regularly: Continuously analyze query performance and fine-tune as data volumes grow or usage patterns change.
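Several of these practices can be illustrated with a before/after pair; the Orders table and its columns are assumed for the example:

```sql
-- Avoid: retrieves every column and wraps the filtered column in a
-- function, which prevents an index on OrderDate from being used
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;

-- Prefer: selects only the needed columns and uses a sargable
-- range predicate that an index on OrderDate can satisfy
SELECT OrderID, Amount
FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
```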

19. How can you monitor query performance in a production database?

  • Use Execution Plans: Review the execution plan of queries to understand how the database is retrieving data, which indexes are being used, and where potential bottlenecks exist.
  • Analyze Wait Statistics: Identify where queries are waiting, such as on locks, I/O, or CPU, to pinpoint the cause of slowdowns.
  • Leverage Built-in Monitoring Tools:

      • SQL Server: Query Store, DMVs (Dynamic Management Views), and performance dashboards.
      • MySQL: EXPLAIN, SHOW PROFILE, and the Performance Schema.
      • PostgreSQL: EXPLAIN (ANALYZE), pg_stat_statements, and log-based monitoring.

  • Set Up Alerts and Baselines: Monitor key performance metrics (query duration, IOPS, CPU usage) and set thresholds. Establish baselines to quickly identify when performance degrades.
  • Continuous Query Tuning: Regularly revisit and tune queries as data grows or application requirements change. Remove unused or inefficient indexes and re-evaluate the indexing strategy.
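A quick sketch of the plan-inspection commands mentioned above; the query and table names are illustrative, and syntax varies by platform:

```sql
-- MySQL / PostgreSQL: show the estimated plan without running the query
EXPLAIN SELECT OrderID FROM Orders WHERE CustomerID = 42;

-- PostgreSQL: run the query and report actual row counts and timings
EXPLAIN (ANALYZE) SELECT OrderID FROM Orders WHERE CustomerID = 42;
```

Reading the output for sequential scans on large tables, or large gaps between estimated and actual row counts, is usually the fastest route to a missing index or stale statistics.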

20. What are the trade-offs of using indexing versus denormalization?

1. Indexing

Advantages:

  • Speeds up read operations and improves query performance without changing the data structure.
  • Can be applied incrementally and is reversible if not effective.
  • Consider indexing when you need faster lookups without altering the data model.

Disadvantages:

  • Slows down write operations as indexes need to be maintained.
  • Requires additional storage.

2. Denormalization

Advantages:

  • Simplifies query logic by storing pre-joined or aggregated data.
  • Can improve performance for read-heavy workloads where complex joins are frequent.
  • Consider denormalization when complex joins or repeated aggregations significantly slow down queries.

Disadvantages:

  • Introduces data redundancy, which can lead to inconsistencies.
  • Increases storage requirements.
  • Makes updates more complex, as redundant data must be synchronized.

21. How does SQL handle recursive queries?

SQL handles recursive queries using Common Table Expressions (CTEs). A recursive CTE repeatedly references itself to process hierarchical or tree-structured data.

Key Components:

  • Anchor Member: The initial query that starts the recursion.
  • Recursive Member: A query that references the CTE to continue building the result set.
  • Termination Condition: Ensures that recursion stops after a certain depth or condition is met.

Example:

WITH RecursiveCTE (ID, ParentID, Depth) AS (
    -- Anchor member: root categories start the recursion
    SELECT ID, ParentID, 1 AS Depth
    FROM Categories
    WHERE ParentID IS NULL
    UNION ALL
    -- Recursive member: join children onto the rows found so far
    SELECT c.ID, c.ParentID, r.Depth + 1
    FROM Categories c
    INNER JOIN RecursiveCTE r ON c.ParentID = r.ID
)
SELECT * FROM RecursiveCTE;

22. What are the differences between transactional and analytical queries?

1. Transactional Queries:

  • Focus on individual, short-term operations such as inserts, updates, and deletes.
  • Optimize for high-throughput and low-latency.
  • Often used in OLTP (Online Transaction Processing) systems.

2. Analytical Queries:

  • Involve complex aggregations, multi-dimensional analysis, and data transformations.
  • Typically read-heavy, processing large amounts of historical or aggregated data.
  • Often used in OLAP (Online Analytical Processing) systems.

Key Differences:

  • Transactional queries support day-to-day operations and maintain data integrity.
  • Analytical queries support decision-making by providing insights from large datasets.

23. How can you ensure data consistency across distributed databases?

  • Use Distributed Transactions: Implement two-phase commit (2PC) to ensure all participating databases commit changes simultaneously or roll back if any part fails.
  • Implement Eventual Consistency: If strong consistency isn’t required, allow data to become consistent over time. This approach is common in distributed systems where high availability is a priority.
  • Conflict Resolution Mechanisms: Use versioning, timestamps, or conflict detection rules to resolve inconsistencies.
  • Data Replication and Synchronization: Use reliable replication strategies to ensure that changes made in one database are propagated to others.
  • Regular Audits and Validation: Periodically verify that data remains consistent across databases and fix discrepancies as needed.

24. What is the purpose of the SQL PIVOT operator?

The PIVOT operator transforms rows into columns, making it easier to summarize or rearrange data for reporting.

Example: Converting a dataset that lists sales by year into a format that displays each year as a separate column.

SELECT ProductID, [2021], [2022]
FROM (
    SELECT ProductID, YEAR(SaleDate) AS SaleYear, Amount
    FROM Sales
) AS Source
PIVOT (
    SUM(Amount)
    FOR SaleYear IN ([2021], [2022])
) AS PivotTable;

