How to Select the First Row of Each GROUP BY in SQL?
Last Updated :
05 Dec, 2024
Selecting the first row of each group in SQL is a common requirement in database queries, especially when working with grouped data. SQL GROUP BY queries are highly effective for organizing data into logical groups, and additional techniques like window functions help to isolate the first row from each group
In this article, we will explore multiple methods to efficiently select the first row of each group in SQL, providing comprehensive insights through syntax, examples, and outputs. By the end of this article, we will have a clear understanding of how to use GROUP BY with aggregate functions.
Selecting the First Row of Each Group in SQL
The GROUP BY
statement in SQL is used to group rows based on the values in one or more columns. To fetch only the first row from each group, additional techniques like aggregate functions (e.g., MIN(), MAX()) or window functions (e.g., ROW_NUMBER()) are often used. Let’s explore these methods in detail.
Syntax
SELECT * FROM yourTable WHERE columnName = (SELECT MIN(columnName) FROM yourTable GROUP BY groupColumnName)
This query selects all rows from ‘ yourTable ‘ where the value in ‘ columnName ‘ is equal to the minimum value of ‘ columnName ‘ in each group defined by ‘ groupColumnName ‘.
Creating and Inserting Data into the users
Table
To demonstrate the SQL queries effectively, let's start by creating the users
table and inserting some sample data.
Step 1: Create the Table
CREATE TABLE users (
user_id INT,
username VARCHAR(50),
start_date DATE
);
Step 2: Insert Sample Data
INSERT INTO users (user_id, username, start_date)
VALUES
(1, 'Jeff', '2021-08-02'),
(2, 'Mack', '2021-01-02'),
(2, 'Jee', '2021-05-23'),
(3, 'Harry', '2020-04-03'),
(4, 'Merry', '2022-04-03'),
(5, 'Twinkle', '2022-03-09');
Step 3: View the user table
Select * FROM users;
Output
users tableMethod 1: Using GROUP BY with Aggregate Functions
To select the first row of each group in SQL, you can use the ' GROUP BY ' clause in combination with an aggregate function such as ' MIN() ' or ' MAX() '. For example, the following query selects the earliest start date for each group of records with the same ' user_id ':
Finding the Earliest Start Date for Each User
For example, if we want to find the earliest start date for each user, we can use the MIN() function to get the first row based on the start date.
Query:
SELECT MIN(start_date)
AS earliest_start_date
FROM users
GROUP BY user_id
Output
earliest_start_date |
---|
2021-08-02 |
2021-01-02 |
2020-04-03 |
2022-04-03 |
2022-03-09 |
Explanation:
This query uses the ' MIN() ' function to find the earliest start date for each group of records with the same ' user_id '. The ' GROUP BY ' clause groups the records by ' user_id ', and the ' SELECT ' statement returns the minimum start date for each group.
Method 2: Using ROW_NUMBER() Window Function
The ROW_NUMBER() function is part of the window functions in SQL, which are more flexible than simple aggregation. This method allows us to assign a unique number to each row within a group and then filter for the first row.
Selecting the First Row per User Based on Start Date
In this approach, we will use ROW_NUMBER() to rank each row within its group by the start_date
. The first row for each group will be assigned a ROW_NUMBER of 1.
Query:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id
ORDER BY start_date ASC) AS row_number
FROM users
) t
WHERE t.row_number = 1
Output
Using ROW_NUMBER() Window FunctionsExplanation:
- The ROW_NUMBER() function assigns a unique row number to each row within a group (specified by the PARTITION BY clause).
- The rows are ordered by start_date in ascending order, meaning the first record for each user will get a row number of 1.
- The outer query filters the results to include only rows where row_number = 1, ensuring that only the first row of each group is returned.
Method 3: Fetching Specific Columns
We can modify this query to return only specific columns by replacing the ' * ' in the ' SELECT ' clause with a comma-separated list of the columns we want to include in the result set. For example, the following query returns the ' user_id ' and ' start_date ' columns for the first row of each group:
Query:
SELECT user_id, start_date
FROM (
SELECT
user_id,
start_date,
ROW_NUMBER() OVER (PARTITION BY user_id
ORDER BY start_date ASC) AS row_number
FROM users
) t
WHERE t.row_number = 1
Output
user_id | start_date |
---|
1 | 2021-08-02 |
2 | 2021-01-02 |
3 | 2020-04-03 |
4 | 2022-04-03 |
5 | 2022-03-09 |
Explanation:
Here, only the user_id and start_date columns are selected from the result set, reducing the data load and improving query performance.
Comparison of Methods
Method | Pros | Cons |
---|
GROUP BY with MIN()/MAX() | Simple and easy to use. | Limited to numeric and date-based comparisons. |
ROW_NUMBER() Function | Highly flexible and works for complex queries. | Slightly more complex syntax. |
Specific Columns | Optimized for performance by fetching fewer columns. | Requires additional query modification. |
Conclusion
In this article, we explained multiple ways to select the first row of each group in SQL. Each method whether using GROUP BY with aggregate functions or the ROW_NUMBER() window function has its unique advantages depending on the use case and database environment. By Using these techniques, we can efficiently retrieve grouped data while optimizing query performance.
Similar Reads
SQL Interview Questions
Are you preparing for a SQL interview? SQL is a standard database language used for accessing and manipulating data in databases. It stands for Structured Query Language and was developed by IBM in the 1970s, SQL allows us to create, read, update, and delete data with simple yet effective commands.
15+ min read
Non-linear Components
In electrical circuits, Non-linear Components are electronic devices that need an external power source to operate actively. Non-Linear Components are those that are changed with respect to the voltage and current. Elements that do not follow ohm's law are called Non-linear Components. Non-linear Co
11 min read
SQL Tutorial
SQL is a Structured query language used to access and manipulate data in databases. SQL stands for Structured Query Language. We can create, update, delete, and retrieve data in databases like MySQL, Oracle, PostgreSQL, etc. Overall, SQL is a query language that communicates with databases.In this S
11 min read
SQL Commands | DDL, DQL, DML, DCL and TCL Commands
SQL commands are crucial for managing databases effectively. These commands are divided into categories such as Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), Data Query Language (DQL), and Transaction Control Language (TCL). In this article, we will e
7 min read
SQL Joins (Inner, Left, Right and Full Join)
SQL joins are fundamental tools for combining data from multiple tables in relational databases. Joins allow efficient data retrieval, which is essential for generating meaningful observations and solving complex business queries. Understanding SQL join types, such as INNER JOIN, LEFT JOIN, RIGHT JO
6 min read
Normal Forms in DBMS
In the world of database management, Normal Forms are important for ensuring that data is structured logically, reducing redundancy, and maintaining data integrity. When working with databases, especially relational databases, it is critical to follow normalization techniques that help to eliminate
8 min read
Class Diagram | Unified Modeling Language (UML)
A UML class diagram is a visual tool that represents the structure of a system by showing its classes, attributes, methods, and the relationships between them. It helps everyone involved in a projectâlike developers and designersâunderstand how the system is organized and how its components interact
12 min read
ACID Properties in DBMS
In the world of Database Management Systems (DBMS), transactions are fundamental operations that allow us to modify and retrieve data. However, to ensure the integrity of a database, it is important that these transactions are executed in a way that maintains consistency, correctness, and reliabilit
8 min read
Spring Boot Tutorial
Spring Boot is a Java framework that makes it easier to create and run Java applications. It simplifies the configuration and setup process, allowing developers to focus more on writing code for their applications. This Spring Boot Tutorial is a comprehensive guide that covers both basic and advance
10 min read
Backpropagation in Neural Network
Backpropagation is also known as "Backward Propagation of Errors" and it is a method used to train neural network . Its goal is to reduce the difference between the modelâs predicted output and the actual output by adjusting the weights and biases in the network. In this article we will explore what
10 min read