
Window Function by Pragya Rathi 1751487084 2

Window Functions allow calculations across a set of rows related to the current row without collapsing them, using the OVER() clause to define the influencing subset. They are useful for various tasks such as ranking, running totals, and comparisons, while maintaining all original rows. Key components include PARTITION BY for grouping, ORDER BY for ordering within groups, and advanced options like ROWS/RANGE for further narrowing the frame.

Uploaded by

Harsh Saxena

PRAGYA RATHI

🧠 1. What is a Window Function?


A Window Function lets you perform a calculation across a set of rows related to the
current row, without collapsing them into one.

🔍 Keyword: OVER()
It uses the OVER() clause to define the “window” — the subset of rows that influence the result
of each row.

✨ Real-Life Analogy:
Scenario: You’re viewing employee records.

●​ A GROUP BY on department + AVG(salary) → Gives one row per department.​

● A Window Function like AVG(salary) OVER (PARTITION BY department) → Gives one row per employee + their department's avg salary as a new column.

You keep the full detail and gain the group-based calculation.
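
To make the contrast concrete, here is a minimal sketch in Python that runs both queries against an in-memory SQLite database (window functions need SQLite 3.25 or newer). The table layout and sample rows are illustrative assumptions, and the variable names are just for this example.

```python
import sqlite3

# In-memory database; window functions require SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "Sales", 70000),
        (2, "Bob", "Sales", 80000),
        (3, "Charlie", "Sales", 70000),
        (4, "David", "IT", 90000),
        (5, "Eve", "IT", 95000),
        (6, "Frank", "HR", 60000),
    ],
)

# GROUP BY collapses each department into a single row.
grouped = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()

# The window version keeps one row per employee and repeats the department average.
windowed = conn.execute(
    """
    SELECT name, department, salary,
           AVG(salary) OVER (PARTITION BY department) AS avg_dept_salary
    FROM employees
    ORDER BY emp_id
    """
).fetchall()

print(len(grouped), "rows from GROUP BY")
print(len(windowed), "rows from the window function")
for row in windowed:
    print(row)
```

The GROUP BY query returns one row per department, while the window query returns every employee with the average attached.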

✅ Key Features:
●​ Keeps all original rows.​

●​ Adds calculated data per row.​

●​ Operates on a "window" of rows, not the whole table blindly.​

🛠️ 2. Why Use Window Functions?


They're great for:

Use Case          Example
Ranking           Top 3 products per region
Running totals    Cumulative monthly sales
Moving average    7-day moving avg of daily users
Comparisons       Compare this month's sales to previous month (LAG/LEAD)

📚 3. Syntax – The OVER() Clause


FUNCTION() OVER (
PARTITION BY column
ORDER BY column
ROWS/RANGE clause
)

🧩 Components:
➤ PARTITION BY (Optional)

●​ Divides data into logical groups.​

●​ Like a mini GROUP BY only for that function.​

●​ Works independently for each group.​

AVG(salary) OVER (PARTITION BY department)

➤ ORDER BY (Important!)

●​ Orders rows within each partition.​

●​ Essential for ranking, running totals, LAG/LEAD.​

ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)


➤ ROWS / RANGE (Advanced)

●​ Narrows the frame even further:​

○​ E.g., just previous row, current row, next row.​

●​ Used in moving averages, cumulative sums, etc.​

SUM(sales) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
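
A small sketch of how that frame behaves, run through Python's sqlite3 module (SQLite 3.25+ assumed). The daily_sales table, its sale_date column, and the numbers are hypothetical and chosen so the rolling sum is easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per day with that day's sales figure.
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [
        ("2024-01-01", 10),
        ("2024-01-02", 20),
        ("2024-01-03", 30),
        ("2024-01-04", 40),
        ("2024-01-05", 50),
    ],
)

# The frame covers the current row plus up to two rows before it.
rows = conn.execute(
    """
    SELECT sale_date, sales,
           SUM(sales) OVER (
               ORDER BY sale_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_sum
    FROM daily_sales
    ORDER BY sale_date
    """
).fetchall()

for row in rows:
    print(row)
```

The first two rows have fewer than two predecessors, so their frames are simply shorter (10, then 10 + 20); from the third row on, each sum covers exactly three rows.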

👩‍💼👨‍💼 4. Common Window Functions with Examples


We’ll use this employees table:

emp_id  name     department  salary
1       Alice    Sales       70000
2       Bob      Sales       80000
3       Charlie  Sales       70000
4       David    IT          90000
5       Eve      IT          95000
6       Frank    HR          60000

🧮 A. Aggregate Window Functions


➤ Use case:

Show each employee’s salary and their department’s average salary:

SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS avg_dept_salary
FROM employees;
✅ Output:
name     dept   salary  avg_dept_salary
Frank    HR     60000   60000.00
David    IT     90000   92500.00
Eve      IT     95000   92500.00
Alice    Sales  70000   73333.33
Bob      Sales  80000   73333.33
Charlie  Sales  70000   73333.33

👉 Great for comparison metrics: avg, sum, count per group – without collapsing rows.

🏁 B. Ranking Window Functions


Function        Behavior
ROW_NUMBER()    Unique number per row (no ties)
RANK()          Same rank for ties, skips numbers
DENSE_RANK()    Same rank for ties, doesn’t skip

➤ Use case:

Rank employees by salary within each department:

SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;

✅ Output (Sales rows shown; Alice and Charlie tie, so which of them gets row_num 2 or 3 is arbitrary):
name     dept   salary  row_num  rnk  dense_rnk
Bob      Sales  80000   1        1    1
Alice    Sales  70000   2        2    2
Charlie  Sales  70000   3        2    2

👉 ROW_NUMBER always unique.​


👉 RANK skips after tie (2,2, then 4).​
👉 DENSE_RANK does not skip (2,2, then 3).
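
The skip only becomes visible when another row follows the tie. The sketch below (Python's sqlite3, SQLite 3.25+ assumed) adds a hypothetical fourth Sales employee, Dana, who is not part of the original table, and uses emp_id as an assumed tie-breaker so ROW_NUMBER is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "Sales", 70000),
        (2, "Bob", "Sales", 80000),
        (3, "Charlie", "Sales", 70000),
        (7, "Dana", "Sales", 65000),  # hypothetical row, added to show what follows a tie
    ],
)

rows = conn.execute(
    """
    SELECT name, salary,
           -- emp_id breaks the tie so the numbering is repeatable
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, emp_id) AS row_num,
           RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
    FROM employees
    ORDER BY salary DESC, emp_id
    """
).fetchall()

for row in rows:
    print(row)
```

Dana comes after the two tied rows, so RANK gives her 4 while DENSE_RANK gives her 3.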

🔄 C. Value (Offset) Window Functions


Function     What it does
LAG(col)     Gets value from previous row
LEAD(col)    Gets value from next row

➤ Use case:

Compare each salary with the next-highest salary in the same department (the next row when sorted by salary, highest first):

SELECT
name,
department,
salary,
LEAD(salary, 1, 0) OVER (PARTITION BY department ORDER BY salary DESC) AS next_highest_salary
FROM employees;

✅ Output:
name     dept   salary  next_highest_salary
Eve      IT     95000   90000
David    IT     90000   0
Bob      Sales  80000   70000
Alice    Sales  70000   70000
Charlie  Sales  70000   0
Frank    HR     60000   0

👉 Use this for row-wise comparisons, difference calculations, or change tracking.
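
As a sketch of a difference calculation, the example below computes the gap between each salary and the next one down in the same department, using Python's sqlite3 (SQLite 3.25+ assumed). Two choices here are assumptions for illustration: LEAD is left with its default of NULL instead of 0, so "no next row" is not mistaken for a salary of 0, and emp_id is used as a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "Sales", 70000),
        (2, "Bob", "Sales", 80000),
        (3, "Charlie", "Sales", 70000),
        (4, "David", "IT", 90000),
        (5, "Eve", "IT", 95000),
        (6, "Frank", "HR", 60000),
    ],
)

rows = conn.execute(
    """
    SELECT name, department, salary,
           LEAD(salary) OVER (
               PARTITION BY department ORDER BY salary DESC, emp_id
           ) AS next_salary,
           -- NULL when there is no next row, because NULL propagates through subtraction
           salary - LEAD(salary) OVER (
               PARTITION BY department ORDER BY salary DESC, emp_id
           ) AS gap_to_next
    FROM employees
    ORDER BY department, salary DESC, emp_id
    """
).fetchall()

for row in rows:
    print(row)
```

Rows with no next row in their department come back with None for both next_salary and gap_to_next.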

📝 5. Summary
Feature        Description
What           Performs row-aware calculations without grouping
How            Uses OVER() clause with optional clauses
Key Clauses    PARTITION BY, ORDER BY, ROWS
Use Cases      Rankings, Running Totals, Comparisons, Aggregates

🧠 Final Tips for Mastery:


1.​ Start with OVER(PARTITION BY) for simple analytics.​

2.​ Then learn ORDER BY inside OVER() for ranking and comparisons.​

3.​ Gradually experiment with ROWS BETWEEN for moving totals.​


4.​ Practice using datasets like employees/sales/orders for real scenarios.​

5.​ Use in interview scenarios: “Find top 3 products per category”, “find users whose
transaction value increased vs last month” etc.​
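
One possible shape for the "top 3 products per category" question is sketched below with Python's sqlite3 (SQLite 3.25+ assumed). The products table, its columns, and the revenue figures are hypothetical. ROW_NUMBER is used so that exactly three rows at most come back per category; RANK or DENSE_RANK would be the alternative if ties should all be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for the interview-style question.
conn.execute("CREATE TABLE products (category TEXT, product TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Electronics", "Laptop", 1500),
        ("Electronics", "Phone", 900),
        ("Electronics", "Tablet", 600),
        ("Electronics", "Mouse", 100),
        ("Books", "Atlas", 80),
        ("Books", "Novel", 50),
    ],
)

# Number the products inside each category, then filter in the outer query,
# because the row number is not available to a WHERE clause at the same level.
rows = conn.execute(
    """
    WITH ranked AS (
        SELECT category, product, revenue,
               ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
        FROM products
    )
    SELECT category, product, revenue, rn
    FROM ranked
    WHERE rn <= 3
    ORDER BY category, rn
    """
).fetchall()

for row in rows:
    print(row)
```

Mouse is dropped as the fourth Electronics product, and Books simply returns its two rows.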
✅ 1. Practice Questions for Window Functions
Use this sales table:

order_id  user_id  product  amount  order_date
101       1        Phone    20000   2024-01-01
102       1        Charger  1500    2024-01-05
103       2        Laptop   50000   2024-01-03
104       2        Mouse    1000    2024-01-06
105       1        Cover    500     2024-01-10
106       3        Tablet   25000   2024-01-02

💡 Q1: Show each order's amount alongside that user's total spend.

SELECT
    user_id,
    product,
    amount,
    SUM(amount) OVER (PARTITION BY user_id) AS total_user_spend
FROM sales;

💡 Q2: For each order, show the cumulative spend of that user up to that
order (running total).

SELECT
    user_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY user_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales;

💡 Q3: Find orders where the user spent more than their previous order.
-- Window functions cannot be used directly in WHERE,
-- so compute LAG in a subquery and filter in the outer query.
SELECT *
FROM (
    SELECT *,
        LAG(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order_amount
    FROM sales
) AS s
WHERE amount > prev_order_amount;
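
Window functions are evaluated after WHERE, so a filter on a LAG result generally has to sit in an outer query or CTE. The sketch below checks both forms with Python's sqlite3 (SQLite 3.25+ assumed). Order 107 is a hypothetical extra row, added only because none of the six sample orders is larger than the same user's previous one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER, user_id INTEGER, product TEXT, "
    "amount INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (101, 1, "Phone", 20000, "2024-01-01"),
        (102, 1, "Charger", 1500, "2024-01-05"),
        (103, 2, "Laptop", 50000, "2024-01-03"),
        (104, 2, "Mouse", 1000, "2024-01-06"),
        (105, 1, "Cover", 500, "2024-01-10"),
        (106, 3, "Tablet", 25000, "2024-01-02"),
        (107, 2, "Monitor", 12000, "2024-01-09"),  # hypothetical extra order
    ],
)

# Attempt 1: window function directly in WHERE; SQLite refuses to run this.
try:
    conn.execute(
        "SELECT * FROM sales WHERE amount > "
        "LAG(amount) OVER (PARTITION BY user_id ORDER BY order_date)"
    ).fetchall()
    where_rejected = False
except sqlite3.Error as exc:
    where_rejected = True
    print("Rejected:", exc)

# Attempt 2: compute LAG in a subquery, then filter on the resulting column.
increased = conn.execute(
    """
    SELECT order_id, user_id, product, amount, prev_order_amount
    FROM (
        SELECT *,
               LAG(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order_amount
        FROM sales
    ) AS s
    WHERE amount > prev_order_amount
    ORDER BY order_id
    """
).fetchall()

print(increased)
```

Each user's first order has a NULL previous amount, so the comparison is not true for it and it is filtered out automatically.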

💡 Q4: Show first order date for each user in every row.
SELECT *,
    MIN(order_date) OVER (PARTITION BY user_id) AS first_order_date
FROM sales;

📊 2. Visualizing the Window Frame (Very Important)


Let’s look at a running total with:

SUM(amount) OVER (
    PARTITION BY user_id
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

For user_id = 1:

order_date  amount  Running Total
Jan 1       20000   20000
Jan 5       1500    21500
Jan 10      500     22000

🪟 The “window” starts at the first row of the partition and ends at the current row; it grows by one row as we move down the rows.
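
The running totals can be reproduced with a short sketch using Python's sqlite3 (SQLite 3.25+ assumed). The table definition and column types are assumptions; the rows are the six sample orders from the sales table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER, user_id INTEGER, product TEXT, "
    "amount INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (101, 1, "Phone", 20000, "2024-01-01"),
        (102, 1, "Charger", 1500, "2024-01-05"),
        (103, 2, "Laptop", 50000, "2024-01-03"),
        (104, 2, "Mouse", 1000, "2024-01-06"),
        (105, 1, "Cover", 500, "2024-01-10"),
        (106, 3, "Tablet", 25000, "2024-01-02"),
    ],
)

# The frame restarts for every user_id and extends from that user's
# first order up to the current one.
rows = conn.execute(
    """
    SELECT user_id, order_date, amount,
           SUM(amount) OVER (
               PARTITION BY user_id
               ORDER BY order_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY user_id, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

The total resets at each new user_id, which is the effect of PARTITION BY on the frame.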

🧠 3. Advanced Patterns Using Window Functions


🔁 A. Moving Average (e.g., 3-row window)

SELECT
    order_date,
    amount,
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM sales;

🧠 Calculates average of current + 2 previous orders = Smooths out fluctuations.
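
A runnable sketch of this moving average over the six sample orders, using Python's sqlite3 (SQLite 3.25+ assumed). The ROUND to two decimals is an addition for readability and not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER, user_id INTEGER, product TEXT, "
    "amount INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (101, 1, "Phone", 20000, "2024-01-01"),
        (102, 1, "Charger", 1500, "2024-01-05"),
        (103, 2, "Laptop", 50000, "2024-01-03"),
        (104, 2, "Mouse", 1000, "2024-01-06"),
        (105, 1, "Cover", 500, "2024-01-10"),
        (106, 3, "Tablet", 25000, "2024-01-02"),
    ],
)

# No PARTITION BY: the window runs over all orders in date order,
# averaging the current row with up to two rows before it.
rows = conn.execute(
    """
    SELECT order_date, amount,
           ROUND(AVG(amount) OVER (
               ORDER BY order_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ), 2) AS moving_avg
    FROM sales
    ORDER BY order_date
    """
).fetchall()

for row in rows:
    print(row)
```

Note that the frame counts rows, not calendar days: the orders here are not on consecutive dates, so each average spans three orders and not three days.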

⚠️ B. Detect Gaps in Dates (using LAG)


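-- DATEDIFF(later_date, earlier_date) is MySQL syntax;
-- other databases use different date-difference functions.
SELECT *,
    DATEDIFF(
        order_date,
        LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date)
    ) AS days_between_orders
FROM sales;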

👀 Useful for churn detection, checking delayed activity, etc.
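
SQLite has no DATEDIFF, so the sketch below swaps in SQLite's julianday() to get the same day gap. This is a stand-in for illustration and not the MySQL query above; it assumes dates stored as ISO 'YYYY-MM-DD' text and SQLite 3.25+ for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER, user_id INTEGER, product TEXT, "
    "amount INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (101, 1, "Phone", 20000, "2024-01-01"),
        (102, 1, "Charger", 1500, "2024-01-05"),
        (103, 2, "Laptop", 50000, "2024-01-03"),
        (104, 2, "Mouse", 1000, "2024-01-06"),
        (105, 1, "Cover", 500, "2024-01-10"),
        (106, 3, "Tablet", 25000, "2024-01-02"),
    ],
)

# julianday() turns a date into a day count, so subtracting the previous
# order's value gives the gap in days; a user's first order yields NULL.
rows = conn.execute(
    """
    SELECT user_id, order_date,
           CAST(
               julianday(order_date)
               - julianday(LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date))
               AS INTEGER
           ) AS days_between_orders
    FROM sales
    ORDER BY user_id, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

A churn check could then filter on days_between_orders in an outer query, for example to find gaps above some threshold.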

🚩 C. Flag First Order


SELECT *,
    CASE
        WHEN ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) = 1 THEN 'Yes'
        ELSE 'No'
    END AS is_first_order
FROM sales;

📌 Common in customer journey analysis: first order, first purchase in category, etc.

💼 Where Are These Used in Real Life?


Task                  Window Use
Product Analytics     Rank top 3 products per category
Retention Analysis    Compare day-0 vs day-7 activity using LAG
Finance Reports       Cumulative spend / moving averages
Anomaly Detection     Flag users whose spending suddenly drops
Marketing Funnels     Track user journey by stage using RANK

📘 Bonus Tip – Combine with CTEs!


WITH ranked_orders AS (
    SELECT *,
        RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rnk
    FROM sales
)
SELECT * FROM ranked_orders WHERE rnk = 1;

👉 Helps layer logic, e.g., “Get the highest order per user”.
