
PostgreSQL – DISTINCT ON expression

Last Updated : 11 Oct, 2024

The DISTINCT ON clause in PostgreSQL returns one row for each distinct value of one or more chosen columns, which makes it more flexible than the standard DISTINCT clause. Combined with an ORDER BY clause, DISTINCT ON allows us to specify which row to keep for each distinct value.

This is particularly useful for selecting the most recent or highest value within each group of rows. In this article, we’ll explore the PostgreSQL DISTINCT ON syntax, its key features, and worked examples.

What is the PostgreSQL DISTINCT ON Clause?

  • The DISTINCT ON clause in PostgreSQL allows us to retrieve one row for each distinct value of one or more columns in a table.
  • The standard DISTINCT clause compares every selected column and collapses rows that are identical across all of them. DISTINCT ON compares only the listed columns, which gives us more control.
  • It enables us to determine which row to retain in each group by arranging the rows in a particular order through the ORDER BY clause.

Syntax

SELECT DISTINCT ON (column1, column2, ...) column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...;

Explanation:

  • DISTINCT ON (column1, column2, …): This part tells PostgreSQL to return the first row for each unique combination of the specified columns.
  • ORDER BY: The ORDER BY clause is crucial because it determines which row from each group of duplicates will be kept. It must begin with the DISTINCT ON columns; any columns listed after them decide which row comes first within each group. Without ORDER BY, the row kept for each group is unpredictable.
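As a minimal, self-contained sketch of the syntax above, the query below runs DISTINCT ON over an inline data set. The orders data and its column names (customer, placed_on, amount) are hypothetical and exist only for illustration.

```sql
-- Hypothetical inline data: two orders for 'ann', one for 'bob'.
WITH orders (customer, placed_on, amount) AS (
    VALUES ('ann', DATE '2024-01-01', 10),
           ('ann', DATE '2024-02-01', 25),
           ('bob', DATE '2024-01-15', 40)
)
-- Keep one row per customer: the one that sorts first within its group.
SELECT DISTINCT ON (customer) customer, placed_on, amount
FROM orders
ORDER BY customer,        -- must lead with the DISTINCT ON column
         placed_on DESC;  -- newest order first inside each group
```

With this data, the query should keep the 2024-02-01 order for ann and the single order for bob, because those rows sort first within their groups.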

Key Features of PostgreSQL DISTINCT ON

  • Returns the first row of each group of rows that share the same values in the specified columns.
  • Works with the ORDER BY clause to determine which row to keep in case of duplicates.
  • Enables retrieving data in a more controlled manner compared to the standard DISTINCT.

Examples of Using PostgreSQL DISTINCT ON

Let’s explore some examples to understand how DISTINCT ON works in real-world scenarios.

Example 1: Retrieve Highest Score for Each Student

First, create a table student_scores to store students’ scores in various subjects.

CREATE TABLE student_scores (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
subject VARCHAR(50) NOT NULL,
score INTEGER NOT NULL
);

Next, insert some sample data:

INSERT INTO student_scores (name, subject, score) 
VALUES
('Alice', 'Math', 90),
('Bob', 'Math', 85),
('Alice', 'Physics', 92),
('Bob', 'Physics', 88),
('Charlie', 'Math', 95),
('Charlie', 'Physics', 90);

Now, let’s retrieve the highest score for each student in any subject:

SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY name, score DESC;

Output:

  name   | subject | score
---------+---------+-------
 Alice   | Physics |    92
 Bob     | Physics |    88
 Charlie | Math    |    95

Explanation: In this query, the DISTINCT ON (name) clause ensures that we get one row for each student. The ORDER BY clause sorts the rows by name and then by score in descending order, so the first row in each student’s group, and therefore the row returned, is the one with the highest score.
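DISTINCT ON is a PostgreSQL extension rather than standard SQL. For comparison, the same "top row per student" result can be sketched with the ROW_NUMBER() window function. This is an alternative formulation, not part of the original example; the alias names ranked and rn are arbitrary.

```sql
-- Rank each student's rows by score, highest first, then keep rank 1.
SELECT name, subject, score
FROM (
    SELECT name,
           subject,
           score,
           ROW_NUMBER() OVER (
               PARTITION BY name      -- one numbering sequence per student
               ORDER BY score DESC    -- highest score gets rn = 1
           ) AS rn
    FROM student_scores
) AS ranked
WHERE rn = 1
ORDER BY name;
```

In both forms, if a student had the same top score in two subjects, which of the tied rows is returned would not be determined. Adding a tie-breaker such as subject to the ordering (ORDER BY name, score DESC, subject in the DISTINCT ON query) makes the choice repeatable.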

Example 2: Log Data – Latest Request by URL

Suppose we have a log table that records URLs and the duration of each request:

CREATE TABLE logs (
id SERIAL PRIMARY KEY,
url VARCHAR(255) NOT NULL,
request_duration INTEGER NOT NULL,
timestamp TIMESTAMP NOT NULL
);

Insert some data:

INSERT INTO logs (url, request_duration, timestamp)
VALUES
('/home', 120, '2024-01-01 10:00:00'),
('/about', 95, '2024-01-01 11:00:00'),
('/home', 110, '2024-01-01 12:00:00'),
('/contact', 105, '2024-01-01 10:30:00'),
('/about', 100, '2024-01-01 12:30:00');

To retrieve the most recent request duration for each URL, use:

SELECT DISTINCT ON (url) url, request_duration, timestamp
FROM logs
ORDER BY url, timestamp DESC;

Output:

   url    | request_duration |      timestamp
----------+------------------+---------------------
 /about   |              100 | 2024-01-01 12:30:00
 /contact |              105 | 2024-01-01 10:30:00
 /home    |              110 | 2024-01-01 12:00:00

Explanation: Here, DISTINCT ON (url) returns the most recent request for each URL, thanks to the ORDER BY url, timestamp DESC clause.
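Because ORDER BY has to start with the DISTINCT ON column, the result above is necessarily sorted by url. One way to present the latest requests in a different order, sketched here against the same logs table, is to wrap the DISTINCT ON query in a subquery and sort again outside it. The alias latest is arbitrary.

```sql
-- Inner query: pick the most recent row for each URL.
-- Outer query: re-sort those rows so the newest request comes first.
SELECT url, request_duration, timestamp
FROM (
    SELECT DISTINCT ON (url) url, request_duration, timestamp
    FROM logs
    ORDER BY url, timestamp DESC
) AS latest
ORDER BY timestamp DESC;
```

With the sample data, this should list /about (12:30:00) first, then /home (12:00:00), then /contact (10:30:00).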

Important Points about PostgreSQL DISTINCT ON expression

  • The PostgreSQL DISTINCT ON expression is used to return only the first row of each set of rows where the given expression has the same value, effectively removing duplicates based on the specified column.
  • It is used to retain the “first row” of each group of duplicates in the result set, based on the ordering specified in the ORDER BY clause.
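  • The DISTINCT ON expressions must match the leftmost expressions in the ORDER BY clause; otherwise PostgreSQL raises an error. If ORDER BY is omitted entirely, the query still runs, but the row kept for each group is unpredictable.
  • Unlike the DISTINCT clause, which removes only rows that are duplicates across all selected columns, DISTINCT ON allows for more fine-grained control by specifying which row to keep for each value of the chosen columns.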
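The points above can be checked with a few short queries against the student_scores table from Example 1. This is a sketch for experimentation; the first statement is expected to fail.

```sql
-- Rejected: ORDER BY does not start with the DISTINCT ON expression.
-- ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY score DESC;

-- Plain DISTINCT compares the whole selected row. No two sample rows are
-- identical across name, subject, and score, so all six rows come back.
SELECT DISTINCT name, subject, score
FROM student_scores;

-- DISTINCT ON compares only name, so one row per student comes back.
SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY name, score DESC;
```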

Conclusion

Overall, the PostgreSQL DISTINCT ON clause helps us get one row for each distinct value of specific columns while giving us control over which row to keep. By using the ORDER BY clause, we can decide which entry, such as the highest score or the most recent log, should be shown. This makes it a concise tool for picking one representative row per group in PostgreSQL.


