
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
SQL Query to Delete Duplicate Rows
While working with a database to avoid duplicates, we should follow certain practices when we create the database table.
- Define primary key to identify rows cluster and non-cluster indexes.
- Use constraints for data integrity in performance.
Database tables may have duplicate rows after following best practices. These duplicate rows create issues while we retrieve data from the database.
So we must ensure unique database rows. To do so, first of all, we need to verify whether a table has duplicate rows, If duplicate rows exist, we must remove them by altering the table data. This article will discuss different ways to remove duplicate rows from the table.
- Using GroupBy & Having Clauses
- Using Common Table Expressions
- Using the RANK() Function
Create Database Table
First of all let's create a table with the name Author table in the SQL server using the CREATE TABLE statement
CREATE TABLE Author( [ID] INT identity(1,1), [FirstName] Varchar(100), [LastName] Varchar(100), [Country] Varchar(100) )
Now, lets insert values into the Author table -
INSERT INTO Author([FirstName], [LastName], [Country]) values('Mithlesh', 'Upadhyay', 'India') INSERT INTO Author([FirstName], [LastName], [Country]) values('Mithlesh', 'Upadhyay', 'India') INSERT INTO Author([FirstName], [LastName], [Country]) values('Mithlesh', 'Upadhyay', 'India') INSERT INTO Author([FirstName], [LastName], [Country]) values('Rudra', 'Upadhyay', 'India') INSERT INTO Author([FirstName], [LastName], [Country]) values('Rudra', 'Upadhyay', 'India') INSERT INTO Author([FirstName], [LastName], [Country]) values('Chandan', 'Gautam', 'India') INSERT INTO Author([FirstName], [LastName], [Country]) values('Chandan', 'Gautam', 'India') INSERT INTO Author([FirstName], [LastName], [Country]) values('Chandan', 'Gautam', 'India')
These values will be inserted into the Author table as given below. We can check this Author table and its rows using the below query -
SELECT * FROM Author
The output will be as shown below. Note that the table has duplicate rows as given below -
ID | FirstName | LastName | Country |
---|---|---|---|
1 | Mithlesh | Upadhyay | India |
2 | Mithlesh | Upadhyay | India |
3 | Mithlesh | Upadhyay | India |
4 | Rudra | Upadhyay | India |
5 | Rudra | Upadhyay | India |
6 | Chandan | Gautam | India |
7 | Chandan | Gautam | India |
8 | Chandan | Gautam | India |
Now, we will write different queries to remove duplicate rows from the above Author table.
Using GroupBy & Having Clauses
We can identify duplicate rows using the GROUP BY clause. It groups the rows based on columns. We will select FirstName, LastName, and Country to group the rows and then select only one row from each group. It will return unique rows from the Author table.
DELETE FROM Author WHERE ID NOT IN ( SELECT MIN(ID) FROM Author GROUP BY FirstName, LastName, Country HAVING COUNT(*) > 1 )
The contents of the Author table after the above query is. Note that the ID number of the rows: 2, 3, 5, 7, and 8 are deleted because of duplicate values.
ID | FirstName | LastName | Country |
---|---|---|---|
1 | Mithlesh | Upadhyay | India |
4 | Rudra | Upadhyay | India |
6 | Chandan | Gautam | India |
Using Common Table Expressions (CTE)
We can also use Common Table Expressions (CTE) to delete duplicate rows in the SQL server. CTE was introduced in SQL Server 2005. We will write the SQL ROW_NUMBER() function. It adds a unique sequential row number for the row. We use Partition by clause using FirstName, LastName, and Country to generate a row number for each row.
Note that we have the initial above table with duplicate rows, so we can apply this SQL to delete duplicate and return the original table as output.
WITH CTE AS ( SELECT ID, FirstName, LastName, Country, ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, Country ORDER BY ID) AS row_num FROM Author ) SELECT * FROM CTE
It will add another column "row_num". This column will show the count of the number of times the duplicate row is as below -
ID | FirstName | LastName | Country | row_num |
---|---|---|---|---|
6 | Chandan | Gautam | India | 1 |
1 | Mithlesh | Upadhyay | India | 1 |
4 | Rudra | Upadhyay | India | 1 |
Now, we will delete all those rows which have row_num greater than 1. So it will keep unique rows. You can perform this using the below query -
WITH CTE AS ( SELECT ID, FirstName, LastName, Country, ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, Country ORDER BY ID) AS row_num FROM Author ) DELETE FROM Author WHERE ID IN ( SELECT ID FROM CTE WHERE row_num > 1 )
It will remove all the duplicate rows from the Author table. You can check this updated table using the below query -
SELECT * FROM Author
The output will be -
ID | FirstName | LastName | Country |
1 | Mithlesh | Upadhyay | India |
4 | Rudra | Upadhyay | India |
6 | Chandan | Gautam | India |
Using RANK() Function
We can also use the RANK function to delete duplicate rows in the SQL. This method is similar to the above method method which uses ROW_NUMBER(). The RANK() function shows the value of repeated rows.
Note that we have the initial above table with duplicate rows, so we can apply this SQL to delete duplicate and return the original table as output.
WITH CTE AS ( SELECT ID, FirstName, LastName, Country, RANK() OVER (PARTITION BY FirstName, LastName, Country ORDER BY ID) AS rank_num FROM Author ) SELECT * FROM CTE
Note that this is just same as the above method but here we use RANK() in place ROW_NUMBER() function and so column name is rank_num instead of row_num. The final query to delete duplicate rows from the Author table is -
WITH CTE AS ( SELECT ID, FirstName, LastName, Country, RANK() OVER (PARTITION BY FirstName, LastName, Country ORDER BY ID) AS rank_num FROM Author ) DELETE FROM Author WHERE ID IN ( SELECT ID FROM CTE WHERE rank_num > 1 )
It will delete all the duplicate rows from the Author table. The contents of the Author table after the above query is as follows
ID | FirstName | LastName | Country |
---|---|---|---|
1 | Mithlesh | Upadhyay | India |
4 | Rudra | Upadhyay | India |
6 | Chandan | Gautam | India |
Conclusion
In this article, we have discussed three different methods to delete duplicate rows from the database table in SQL. We use group by and having class in the first method.
The first method is most easy to write in SQL. We have used ROW_NUMBER() and RANK() functions in the second and third methods respectively. The last two methods are similar to each other, so you can understand them easily.
Frequently Ask Questions (FAQs)
We remove duplicate rows from the table because duplicate rows create data integrity issues. Tables with unique rows always ensure data integrity, performance, and calculation of the queries.
Both the ROW_NUMBER() and RANK() functions assign numbers to rows within a partition. ROW_NUMBER() assigns a unique number to each row. Whereas, RANK() assigns the same rank to identical rows and skips subsequent numbers.
There is not much difference but this method is easier to write and understand.