How to Remove All Duplicate Rows Except One in PostgreSQL?
Last Updated: 15 Apr, 2024
When working with databases, we often find duplicate rows in a table. These duplicates can arise for various reasons, such as erroneous data entry or data migration processes. Removing duplicates while retaining one instance of each is a frequent requirement in database maintenance tasks.
To keep only unique records, we need to delete every duplicate record except one. PostgreSQL provides several ways to do this. In this article, we will discuss the approaches listed below for deleting duplicate rows except one.
How to Remove All Duplicate Rows Except One in PostgreSQL
Removing duplicate rows in PostgreSQL while preserving one instance of each is essential for keeping data clean. Two common methods are DELETE USING and subqueries with row_number(). Both keep exactly one row from each group of duplicates and delete the rest.
- By DELETE USING Statement
- Using Subqueries
Set Up an Environment
Creating a student table:
Let us first create a student table with the columns s_id, s_name, and dept_no using the query below:
create table student (s_id integer, s_name varchar(20), dept_no integer);
Inserting values into the student table:
After creating a table, let us insert some duplicate values in the table using the below queries:
insert into student values (1, ' Mohit ', 10);
insert into student values (2, ' Mohit ', 10);
insert into student values (3, ' Mohit ', 10);
insert into student values (4, ' Pavan ', 20);
insert into student values (5, ' Pavan ', 20);
insert into student values (6, ' Harish ', 30);
We inserted 6 records into the student table, some of which are duplicates. To delete those duplicate records while preserving one record from each group, we can use the approaches described below:
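Before deleting anything, it can help to confirm which values are actually duplicated. The query below is a minimal sketch that assumes, as the rest of this article does, that two rows count as duplicates when they share the same s_name; the alias copies is only an illustrative name.

```sql
-- List every s_name that occurs more than once, with its number of copies.
SELECT s_name, count(*) AS copies
FROM student
GROUP BY s_name
HAVING count(*) > 1
ORDER BY s_name;

-- For the sample data above, this reports two groups:
-- the Mohit rows (3 copies) and the Pavan rows (2 copies).
```

Harish does not appear in the result because that name occurs only once.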
1. By DELETE USING Statement
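The USING clause of DELETE is a PostgreSQL extension that lets a DELETE statement reference other tables in its WHERE condition. By joining a table to itself with USING, we can delete duplicate rows except one.
Syntax:
DELETE FROM table_name alias_name1 USING table_name alias_name2 WHERE conditions;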
Example: The student table that we created and populated contains some duplicate records. Let us delete the duplicates while retaining one of each by combining DELETE with USING. The student table with its duplicate records is shown below.
[Image: Student table with duplicate rows]
In the above table, there are three duplicate records with s_name = ' Mohit ' and two duplicate records with s_name = ' Pavan '. We need to delete these while retaining one record from each group, so we can use the query below:
DELETE FROM student s1 USING student s2 WHERE s1.s_id > s2.s_id AND s1.s_name = s2.s_name;
Explanation: The query uses two aliases, s1 and s2, for the same student table. A row s1 is deleted whenever there is another row s2 with the same s_name and a smaller s_id. In other words, within each group of equal s_name values, only the row with the lowest s_id survives. The rows with s_id 2 and 3 match the row with s_id 1 (same s_name, larger s_id), so both are deleted. In the same way, the row with s_id 5 matches the row with s_id 4 and is deleted. After executing the above query, our table looks like this:
Output:
[Image: Student table with unique rows after deleting duplicate rows]
Now we can see that the query deleted two duplicate records with s_name = ' Mohit ' and one duplicate record with s_name = ' Pavan ', leaving only the rows with s_id 1, 4, and 6, which was our requirement.
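Because a committed DELETE cannot be undone, one cautious way to run the statement above is inside a transaction. The sketch below adds a RETURNING clause, which PostgreSQL supports on DELETE, so the removed rows are listed before you decide whether to keep the change. It assumes the student table is still in its original state with duplicates.

```sql
BEGIN;

-- Same self-join delete as above; RETURNING lists each row that was deleted.
DELETE FROM student s1
USING student s2
WHERE s1.s_id > s2.s_id
  AND s1.s_name = s2.s_name
RETURNING s1.s_id, s1.s_name, s1.dept_no;

-- Inspect what is left before making the change permanent.
SELECT * FROM student ORDER BY s_id;

-- ROLLBACK undoes the delete; replace it with COMMIT to keep the result.
ROLLBACK;
```

If duplicates should be defined by more than one column, the WHERE clause can be extended, for example with AND s1.dept_no = s2.dept_no.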
2. Using Subqueries with row_number()
In this method, we use subqueries with the row_number() window function to delete the duplicates except one. It follows the three steps given below:
- Partitioning by duplicate values
- Assigning row numbers
- Filtering duplicate rows
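The first two steps can be previewed on their own before any row is deleted. The following sketch runs only the numbering query against the student table from the setup section, again assuming that duplicates are identified by s_name.

```sql
-- Partition the rows by s_name and number them by ascending s_id.
SELECT s_id,
       s_name,
       row_number() OVER (PARTITION BY s_name ORDER BY s_id ASC) AS row_num
FROM student
ORDER BY s_id;

-- For the sample data, the row_num values are:
--   s_id 1 -> 1, s_id 2 -> 2, s_id 3 -> 3   (Mohit)
--   s_id 4 -> 1, s_id 5 -> 2                (Pavan)
--   s_id 6 -> 1                             (Harish)
```

Every row with row_num greater than 1 is a duplicate that the third step filters out for deletion.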
Syntax:
DELETE FROM table_name WHERE col1 in
(SELECT col1 FROM
(SELECT col1,row_number() OVER(partition by col2 order by col1 asc) AS row_num FROM table_name)alias_name
WHERE conditions);
Example: Let us take the student table having some duplicate records.
[Image: Student table with duplicate records]
Query:
DELETE FROM student WHERE s_id in
(SELECT s_id FROM
(SELECT s_id,row_number() OVER(partition by s_name order by s_id asc) AS row_num FROM student)s WHERE s.row_num>1);
Output:
[Image: Student table with unique rows after removing duplicate rows]
Explanation: The innermost subquery selects s_id and assigns a row number to each row using the row_number() function, partitioning the student table by the s_name column and ordering each partition by s_id in ascending order. The first row of each partition gets row_num = 1, and every further duplicate gets a higher number. The middle subquery keeps only the s_id values with row_num > 1, that is, the duplicates except for the first record of each group. The outer DELETE FROM statement then deletes the rows with those s_id values.
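The same logic can also be written with a common table expression (CTE), which some readers find easier to follow than nested subqueries. This is an alternative sketch, not the query used above; the CTE name ranked is an illustrative choice, and the approach assumes that s_id uniquely identifies each row.

```sql
-- Number the rows within each s_name group, then delete all but the first.
WITH ranked AS (
    SELECT s_id,
           row_number() OVER (PARTITION BY s_name ORDER BY s_id ASC) AS row_num
    FROM student
)
DELETE FROM student
WHERE s_id IN (SELECT s_id FROM ranked WHERE row_num > 1);
```

Changing the ORDER BY inside the window to s_id DESC would keep the row with the highest s_id in each group instead of the lowest.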
Conclusion
Managing data often involves removing redundant records while preserving the ones that matter. In PostgreSQL, deleting all duplicate records except one per group is valuable for maintaining data integrity. This can be done with a PostgreSQL-specific feature, the combination of DELETE and USING, or with a DELETE statement that uses subqueries and row_number(). Either way, it is important to check carefully which rows a query will remove before running it, because deleting the wrong records can lead to the loss of important data.