Optimizing Heavy Queries in SQL Server - Running Totals

The document analyzes six methods for calculating Running Totals in SQL Server using a test dataset from the AdventureWorks2022 database. Each method is evaluated based on SQL code, execution plans, strengths, weaknesses, and resource usage metrics. The Window Function method is highlighted as the most efficient and suitable for production environments, while other methods like INNER JOIN and Subquery are noted for their high processing costs and inefficiencies with larger datasets.

Uploaded by Anoopkumar AK

Optimizing Heavy Queries in SQL Server – Running Totals

Prerequisites

To compare different methods of calculating a Running Total in SQL Server, I needed a test dataset. I generated it from the Sales.SalesOrderDetail table of the AdventureWorks2022 database.
This table was chosen only for convenience; the tests do not depend on its data, and any other source can be used to generate similar test data.
The first 30,000 generated rows were inserted into a table named Population, which was then used for the Running Total calculations.
The structure of this table is as follows:
CREATE TABLE dbo.Population
(
    [Date] DATE NOT NULL PRIMARY KEY,
    Births INT,
    Deaths INT
);
GO

To populate the table, a CTE was used to generate simulated data (date, number of births, and number of deaths) based on the columns
of the Sales.SalesOrderDetail table. Then, the first 30,000 rows were inserted into the target table:

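;WITH cte AS
(
    -- Map each SalesOrderDetailID to one calendar day; ID 1 becomes 1900-01-01.
    SELECT DATEADD(DAY, sod.SalesOrderDetailID, '1899-12-31') AS [Date],
           (sod.ProductID * sod.OrderQty) % 200 AS Births,
           (sod.ProductID * sod.OrderQty) % 210 AS Deaths
    FROM AdventureWorks2022.Sales.SalesOrderDetail AS sod
)
INSERT dbo.Population([Date], Births, Deaths)
SELECT TOP (30000)
       c.[Date], c.Births, c.Deaths
FROM cte AS c
ORDER BY c.[Date]; -- without ORDER BY, TOP may return an arbitrary 30,000 rows
GO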
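
Method 4 below walks the table one day at a time starting from 1900-01-01, so it may be worth confirming that the generated dates are consecutive. The following is a minimal check, assuming the table was populated as described above; it is an illustration and not part of the original tests.

```sql
-- Sanity check on the generated data (illustrative only).
SELECT COUNT(*)                                    AS TotalRows,
       MIN([Date])                                 AS FirstDate,
       MAX([Date])                                 AS LastDate,
       DATEDIFF(DAY, MIN([Date]), MAX([Date])) + 1 AS DaysSpanned
FROM dbo.Population;
-- [Date] is the primary key, so the dates are gap-free
-- exactly when TotalRows equals DaysSpanned.
```
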

In this way, a suitable dataset for testing was created, with enough rows to evaluate different Running Total scenarios.
In the following sections, six methods for calculating Running Total in SQL Server are reviewed (although other methods also exist).
For each method, the SQL code, execution plan, strengths and weaknesses, as well as resource usage metrics (CPU, execution time,
Reads, and Writes) are presented.
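
The source does not say which tool was used to collect these metrics. One common way to get comparable CPU, elapsed time, and read figures for a single query is sketched below; the query in the middle is only a placeholder.

```sql
SET STATISTICS TIME ON;  -- reports CPU time and elapsed time per statement
SET STATISTICS IO ON;    -- reports logical and physical reads per table

-- Run the query under test here (placeholder):
SELECT COUNT(*) FROM dbo.Population;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

These session options do not report writes, so a trace or an Extended Events session would be needed for that column (an assumption about tooling, not something the source states).
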
Method 1: Using INNER JOIN

In this method, the table is joined with itself (Self-Join) so that for each row, all rows with a date less than or equal to the current one
are aggregated.
SELECT p1.[Date],
       p1.Births,
       SUM(p2.Births) AS RunningTotal_Births,
       p1.Deaths,
       SUM(p2.Deaths) AS RunningTotal_Deaths
FROM dbo.Population AS p1
INNER JOIN dbo.Population AS p2
    ON p2.[Date] <= p1.[Date]
GROUP BY p1.[Date], p1.Births, p1.Deaths
ORDER BY p1.[Date];
GO

🔹 Advantages
* Simple and intuitive to implement
* Suitable for very small datasets

🔻 Disadvantages
* Very high processing cost, O(n²)
* Heavy Reads
* Extremely inefficient on large datasets

📊 Results
* Rows processed: 450,015,000
* CPU: 122,484
* Duration: 80,442 ms
* Reads: 1,189,435
* Writes: 0
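
The "Rows processed" figure follows directly from the join condition: with n distinct dates, the number of (p1, p2) pairs satisfying p2.[Date] <= p1.[Date] is n(n + 1)/2. A small worked check, assuming n = 30,000 as in the test table (the variable name is illustrative):

```sql
DECLARE @n BIGINT = 30000;

SELECT @n * (@n + 1) / 2 AS PairsWithLessOrEqual,  -- 450015000
       @n * (@n - 1) / 2 AS PairsWithStrictlyLess; -- 449985000
```

The second value is the count for a strict comparison (<), which is the form used by the subquery in Method 2.
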
Method 2: Using Subquery

In this method, for each row a subquery is executed to calculate the sum of all previous rows.
SELECT [Date],
       Births,
       Births + COALESCE((SELECT SUM(Births)
                          FROM dbo.Population AS s
                          WHERE s.[Date] < o.[Date]), 0) AS RunningTotal_Births,
       Deaths,
       Deaths + COALESCE((SELECT SUM(Deaths)
                          FROM dbo.Population AS s
                          WHERE s.[Date] < o.[Date]), 0) AS RunningTotal_Deaths
FROM dbo.Population AS o
ORDER BY [Date];
GO

🔹 Advantages
* Simple and does not require a Join
* Logically more readable than Method 1

🔻 Disadvantages
* Still an O(n²) algorithm
* Each row triggers a separate subquery
* Very high CPU and Reads consumption

📊 Results
* Rows processed: 449,985,000
* CPU: 272,297
* Duration: 301,638 ms
* Reads: 4,755,824
* Writes: 0
Method 3: Using Update with Accumulating Variables

In this method, the data is first copied into a table variable, and then a single Update statement calculates the Running Total row by row, carrying the totals forward in variables.
DECLARE @tmp TABLE
(
    [Date] DATE PRIMARY KEY,
    Births INT,
    RunningTotal_Births INT,
    Deaths INT,
    RunningTotal_Deaths INT
);

DECLARE @RunningTotal_Births INT = 0;
DECLARE @RunningTotal_Deaths INT = 0;

INSERT @tmp([Date], Births, RunningTotal_Births, Deaths, RunningTotal_Deaths)
SELECT [Date], Births, RunningTotal_Births = 0,
       Deaths, RunningTotal_Deaths = 0
FROM dbo.Population
ORDER BY [Date];

-- Each row adds its value to the variable and stores the new total in the same assignment.
UPDATE @tmp SET
    @RunningTotal_Births = RunningTotal_Births = @RunningTotal_Births + Births,
    @RunningTotal_Deaths = RunningTotal_Deaths = @RunningTotal_Deaths + Deaths
FROM @tmp
OPTION (FORCE ORDER);

SELECT [Date], Births, RunningTotal_Births, Deaths, RunningTotal_Deaths
FROM @tmp
ORDER BY [Date];
GO

🔹 Advantages
* Low CPU and execution time
* Linear algorithm, O(n)
* Much better performance compared to Join and Subquery

🔻 Disadvantages
* Requires a table variable and longer code
* Less readable compared to the Window Function approach
* Correct only if the rows are updated in [Date] order, which SQL Server does not guarantee for an Update statement

📊 Results
* CPU: 265
* Duration: 574 ms
* Reads: 226,926
* Writes: 177
Method 4: Using Recursive CTE

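This method uses recursion to calculate the Running Total: the anchor member selects the first row (1900-01-01), and each recursive step joins the row for the following day and adds its values to the totals carried so far. It therefore assumes that the dates in the table are consecutive, with no gaps.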


;WITH x AS
(
    -- Anchor member: the first day in the table.
    SELECT [Date],
           Births, Births AS RunningTotal_Births,
           Deaths, Deaths AS RunningTotal_Deaths
    FROM dbo.Population
    WHERE [Date] = '1900-01-01'
    UNION ALL
    -- Recursive member: add the next day's values to the running totals.
    SELECT y.[Date],
           y.Births, x.RunningTotal_Births + y.Births,
           y.Deaths, x.RunningTotal_Deaths + y.Deaths
    FROM x
    INNER JOIN dbo.Population AS y
        ON y.[Date] = DATEADD(DAY, 1, x.[Date])
)
SELECT [Date],
       Births, RunningTotal_Births,
       Deaths, RunningTotal_Deaths
FROM x
ORDER BY [Date]
OPTION (MAXRECURSION 0);
GO

🔹 Advantages
* An interesting idea that demonstrates the power of CTEs
* Useful for teaching and specific scenarios

🔻 Disadvantages
* Slower compared to Update or Window Function
* High complexity with large datasets
* Requires the MAXRECURSION option beyond 100 rows, because the default limit is 100 recursion levels (MAXRECURSION 0 removes the limit)

📊 Results
* CPU: 357
* Duration: 660 ms
* Reads: 330,125
* Writes: 0
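
The query above depends on a hard-coded start date and on gap-free dates. A possible variation, sketched here under the assumption that a helper temporary table is acceptable, numbers the rows first and recurses on that number instead. The names #numbered, rn, and cix_numbered are hypothetical, and this variant was not part of the measurements above.

```sql
-- Number the rows once so the recursion does not depend on consecutive dates.
SELECT rn = ROW_NUMBER() OVER (ORDER BY [Date]),
       [Date], Births, Deaths
INTO #numbered
FROM dbo.Population;

CREATE UNIQUE CLUSTERED INDEX cix_numbered ON #numbered(rn);

;WITH x AS
(
    -- Anchor member: the first row by date, whatever that date is.
    SELECT rn, [Date],
           Births, Births AS RunningTotal_Births,
           Deaths, Deaths AS RunningTotal_Deaths
    FROM #numbered
    WHERE rn = 1
    UNION ALL
    -- Recursive member: move to the next row number.
    SELECT y.rn, y.[Date],
           y.Births, x.RunningTotal_Births + y.Births,
           y.Deaths, x.RunningTotal_Deaths + y.Deaths
    FROM x
    INNER JOIN #numbered AS y
        ON y.rn = x.rn + 1
)
SELECT [Date],
       Births, RunningTotal_Births,
       Deaths, RunningTotal_Deaths
FROM x
ORDER BY [Date]
OPTION (MAXRECURSION 0);

DROP TABLE #numbered;
```
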
Method 5: Using Cursor

In this method, rows are processed one by one, and the Running Total is calculated with cumulative variables.
DECLARE @tmp TABLE
(
    [Date] DATE PRIMARY KEY,
    Births INT,
    RunningTotal_Births INT,
    Deaths INT,
    RunningTotal_Deaths INT
);

DECLARE
    @Date DATE,
    @Births INT,
    @RunningTotal_Births INT = 0,
    @Deaths INT,
    @RunningTotal_Deaths INT = 0;

DECLARE c CURSOR LOCAL STATIC FORWARD_ONLY READ_ONLY FOR
    SELECT [Date], Births, Deaths
    FROM dbo.Population
    ORDER BY [Date];

OPEN c;
FETCH NEXT FROM c INTO @Date, @Births, @Deaths;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Accumulate the totals, then store the current row together with them.
    SET @RunningTotal_Births = @RunningTotal_Births + @Births;
    SET @RunningTotal_Deaths = @RunningTotal_Deaths + @Deaths;

    INSERT @tmp([Date], Births, RunningTotal_Births, Deaths, RunningTotal_Deaths)
    SELECT @Date, @Births, @RunningTotal_Births, @Deaths, @RunningTotal_Deaths;

    FETCH NEXT FROM c INTO @Date, @Births, @Deaths;
END

CLOSE c;
DEALLOCATE c;

SELECT [Date], Births, RunningTotal_Births, Deaths, RunningTotal_Deaths
FROM @tmp
ORDER BY [Date];
GO

🔹 Advantages
* Full control over each row
* High flexibility for complex scenarios

🔻 Disadvantages
* Slowest of the linear methods (2,701 ms, compared with 318 to 660 ms for Methods 3, 4, and 6)
* Highest CPU usage among the linear methods
* Not suitable for production environments

📊 Results
* CPU: 891
* Duration: 2,701 ms
* Reads: 181,934
* Writes: 109
Method 6: Using Window Function (OVER ... ORDER BY)

This was the fastest method in these tests (318 ms). The engine computes the Running Total itself while reading the rows in [Date] order, so the query needs no self-join, helper table, or explicit loop.
SELECT [Date],
       Births,
       SUM(Births) OVER (ORDER BY [Date]) AS RunningTotal_Births,
       Deaths,
       SUM(Deaths) OVER (ORDER BY [Date]) AS RunningTotal_Deaths
FROM dbo.Population
ORDER BY [Date];
GO

🔹 Advantages
* Lowest CPU and duration among all methods tested
* Very simple and readable code
* Optimized by the SQL Server engine

🔻 Disadvantages
* Not available in older versions of SQL Server (before 2012)

📊 Results
* CPU: 234
* Duration: 318 ms
* Reads: 285,771
* Writes: 32
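
When ORDER BY is given without a frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Because [Date] is unique in this table, an explicit ROWS frame returns the same totals, and it is often reported to need fewer reads. The variant below is a sketch of that option; it was not measured in the tests above, so any gain on this dataset is an assumption.

```sql
SELECT [Date],
       Births,
       SUM(Births) OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING) AS RunningTotal_Births,
       Deaths,
       SUM(Deaths) OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING) AS RunningTotal_Deaths
FROM dbo.Population
ORDER BY [Date];
```

If the ordering column can contain duplicates, the two frames differ: RANGE gives tied rows the same total, while ROWS accumulates them one at a time.
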
Final Comparison of Methods

| Method | CPU | Duration (ms) | Reads | Writes | Code Simplicity | Algorithm Complexity | Suitable For |
|---|---|---|---|---|---|---|---|
| INNER JOIN | 122,484 | 80,442 | 1,189,435 | 0 | Simple | O(n²) | Educational / Small data |
| Subquery | 272,297 | 301,638 | 4,755,824 | 0 | Simpler than Join | O(n²) | Educational / Small data |
| Update + Variables | 265 | 574 | 226,926 | 177 | Medium | O(n) | Large data / High performance |
| Recursive CTE | 357 | 660 | 330,125 | 0 | Medium | O(n) (but expensive) | Experimental / Special cases |
| Cursor | 891 | 2,701 | 181,934 | 109 | Complex | O(n) with high cost | Special scenarios, not for production |
| Window Function | 234 | 318 | 285,771 | 32 | Very simple | Optimized O(n) | Best choice for Production |
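
A timing comparison is only meaningful if the methods return the same totals. One way to check this is sketched below: it compares the subquery of Method 2 with the window function of Method 6 on the first year of data only, so the quadratic query stays cheap. The cutoff date and the CTE names sub and win are arbitrary choices made for this illustration.

```sql
;WITH sub AS
(
    -- Method 2 style: correlated subquery over all earlier dates.
    SELECT o.[Date],
           o.Births + COALESCE((SELECT SUM(s.Births)
                                FROM dbo.Population AS s
                                WHERE s.[Date] < o.[Date]), 0) AS RunningTotal_Births
    FROM dbo.Population AS o
    WHERE o.[Date] <= '1900-12-31'
),
win AS
(
    -- Method 6 style: window aggregate over the same rows.
    SELECT [Date],
           SUM(Births) OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING) AS RunningTotal_Births
    FROM dbo.Population
    WHERE [Date] <= '1900-12-31'
)
SELECT [Date], RunningTotal_Births FROM sub
EXCEPT
SELECT [Date], RunningTotal_Births FROM win;
-- Both sides contain one row per date, so an empty result means the totals agree.
```
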
