PostgreSQL – DISTINCT ON expression
Last Updated :
11 Oct, 2024
The DISTINCT ON
clause in PostgreSQL allows us to retrieve unique rows based on specific columns by offering more flexibility than the standard DISTINCT
clause. DISTINCT ON
allow us to specify which row to keep for each unique value based on an ORDER BY
clause.
This is particularly useful for selecting the most recent or highest values in grouped data. In this article, we’ll explore the PostgreSQL DISTINCT ON syntax, examples and so on.
What is the PostgreSQL DISTINCT ON Clause?
- The
DISTINCT ON
in PostgreSQL clause allows us to retrieve unique rows based on one or more columns in a table.
- However, unlike the standard
DISTINCT
clauses that discard all duplicate rows, DISTINCT ON
gives us more control.
- It enables us to determine which row to retain by arranging the rows in a particular order through the ORDER BY clause.
Syntax
SELECT DISTINCT ON (column1, column2, ...) column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...;
Explanation:
- DISTINCT ON (column1, column2, …): This part tells PostgreSQL to return the first row for each unique combination of the specified columns.
- ORDER BY: The
ORDER BY
clause is crucial because it determines which row from each group of duplicates will be kept. The rows are ordered based on the columns specified here.
Key Features of PostgreSQL DISTINCT ON
- Allows fetching the first unique row based on specified columns.
- Works with the
ORDER BY
clause to determine which row to keep in case of duplicates.
- Enables retrieving data in a more controlled manner compared to the standard
DISTINCT
.
Examples of Using PostgreSQL DISTINCT ON
Let’s explore some examples to understand how DISTINCT ON
works in real-world scenarios.
Example 1: Retrieve Highest Score for Each Student
First, create a table student_scores
to store students’ scores in various subjects.
CREATE TABLE student_scores (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
subject VARCHAR(50) NOT NULL,
score INTEGER NOT NULL
);
Next, insert some sample data:
INSERT INTO student_scores (name, subject, score)
VALUES
('Alice', 'Math', 90),
('Bob', 'Math', 85),
('Alice', 'Physics', 92),
('Bob', 'Physics', 88),
('Charlie', 'Math', 95),
('Charlie', 'Physics', 90);
Now, let’s retrieve the highest score for each student in any subject:
SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY name, score DESC;
Output:
name |
subject |
score |
Alice |
Physics |
92 |
Bob |
Physics |
88 |
Charlie |
Math |
95 |
Explanation: In this query, the DISTINCT ON (name)
clause ensures that we get one row for each student, and the ORDER BY
clause sorts the scores in descending order so that the highest score for each student is returned.
Example 2: Log Data – Latest Request by URL
Suppose we have a log table that records URLs and the duration of each request:
CREATE TABLE logs (
id SERIAL PRIMARY KEY,
url VARCHAR(255) NOT NULL,
request_duration INTEGER NOT NULL,
timestamp TIMESTAMP NOT NULL
);
Insert some data:
INSERT INTO logs (url, request_duration, timestamp)
VALUES
('/home', 120, '2024-01-01 10:00:00'),
('/about', 95, '2024-01-01 11:00:00'),
('/home', 110, '2024-01-01 12:00:00'),
('/contact', 105, '2024-01-01 10:30:00'),
('/about', 100, '2024-01-01 12:30:00');
To retrieve the most recent request duration for each URL, use:
SELECT DISTINCT ON (url) url, request_duration, timestamp
FROM logs
ORDER BY url, timestamp DESC;
Output:
url |
request_duration |
timestamp |
/about |
100 |
2024-01-01 12:30:00 |
/contact |
105 |
2024-01-01 10:30:00 |
/home |
110 |
2024-01-01 12:00:00 |
Explanation: Here, DISTINCT ON (url)
returns the most recent request for each URL, thanks to the ORDER BY url, timestamp DESC
clause.
Important Points about PostgreSQL DISTINCT ON expression
- The PostgreSQL
DISTINCT
ON
expression is used to return only the first row of each set of rows where the given expression has the same value, effectively removing duplicates based on the specified column.
- It is used to retain the “first row” of each group of duplicates in the result set, based on the ordering specified in the
ORDER BY
clause.
- The
DISTINCT ON
expression must always match the leftmost expression in the ORDER BY
clause to ensure predictable results.
- Unlike the
DISTINCT
clause, which removes all duplicates, DISTINCT ON
allows for more fine-grained control by specifying which duplicate row to keep.
Conlusion
Overall, the PostgreSQL DISTINCT ON clause helps you get unique rows based on specific columns while giving you control over which row to keep. By using the ORDER BY
clause, you can decide which entry, such as the highest score or the most recent log, should be shown. This makes it a useful tool for organizing and retrieving data more efficiently in PostgreSQL.
Similar Reads
PostgreSQL - Index On Expression
When working with databases, optimizing query performance is crucial, especially when dealing with large datasets. One powerful technique in PostgreSQL is leveraging indexes on expressions. This approach allows you to optimize queries that involve expressions, ensuring faster retrieval times and eff
3 min read
Group By Vs Distinct in PostgreSQL
When working with PostgreSQL, efficiently organizing and retrieving data is critical for database performance and decision-making. Two commonly used clauses are DISTINCT and GROUP BY that serve distinct purposes in data retrieval. DISTINCT is used to filter out duplicate values, while GROUP BY is em
6 min read
PostgreSQL - Common Table Expression (CTE)
In PostgreSQL, Common Table Expressions (CTEs) are a powerful feature that allows us to define temporary result sets that can be referenced within other SQL statements. This includes statements like SELECT, INSERT, UPDATE, or DELETE. CTEs make complex queries more readable and maintainable by breaki
4 min read
NOT IN vs NOT EXISTS in PostgreSQL
PostgreSQL is one of the most advanced general-purpose object-relational database management systems and is open-source. Being an open-source software, its source code is available under the PostgreSQL license, a liberal open-source license. Anyone with the right skills can use, modify, and distribu
4 min read
PostgreSQL - LAG Function
In PostgreSQL, the LAG() function is a powerful window function that allows you to access data from a previous row within the same result set. Itâs particularly useful for comparing values in the current row with values in the preceding row, making it ideal for analytical queries in PostgreSQL. For
5 min read
Difference Between EXISTS and IN in PostgreSQL
PostgreSQL is one of the most advanced general-purpose object-relational database management systems and is open-source. Being an open-source software, its source code is available under the PostgreSQL license, a liberal open-source license. In this article, we will learn about the EXISTS and IN Con
6 min read
PostgreSQL - LEAD Function
In PostgreSQL, the LEAD() function is a powerful window function used to access a row that follows the current row at a specific physical offset. This function is generally employed to compare the value of the current row with the value of the next row following the current row. Let us better unders
3 min read
PostgreSQL - IN Operator
The IN operator in PostgreSQL is a powerful and efficient tool used to filter records based on a predefined set of values. When used with the WHERE clause, it simplifies SQL queries and enhances readability, making it a key component of SQL query optimization for data retrieval and database manipula
4 min read
PostgreSQL - COUNT() Function
The COUNT() function in PostgreSQL is an aggregate function used to return the number of rows that match a specified condition in a query. This article will explore the various syntaxes of the COUNT() function and provide practical examples to help you understand its usage in PostgreSQL. SyntaxDepen
2 min read
PostgreSQL - FIRST_VALUE() Function
The FIRST_VALUE() function in PostgreSQL is a window function that retrieves the first value within an ordered set of rows, often within a specific partition. This feature is highly useful for data analysis and reporting by allowing targeted access to specific data points. In this article, we will e
5 min read