SQL Query to Delete Duplicate Rows

While working with a database to avoid duplicates, we should follow certain practices when we create the database table.

Define primary key to identify rows cluster and non-cluster indexes.
Use constraints for data integrity in performance.

Database tables may have duplicate rows after following best practices. These duplicate rows create issues while we retrieve data from the database.

So we must ensure unique database rows. To do so, first of all, we need to verify whether a table has duplicate rows, If duplicate rows exist, we must remove them by altering the table data. This article will discuss different ways to remove duplicate rows from the table.

Using GroupBy & Having Clauses
Using Common Table Expressions
Using the RANK() Function

Create Database Table

First of all let's create a table with the name Author table in the SQL server using the CREATE TABLE statement

CREATE TABLE Author(
   [ID] INT identity(1,1),
   [FirstName] Varchar(100),
   [LastName] Varchar(100),
   [Country] Varchar(100)
)

Now, lets insert values into the Author table -

INSERT INTO Author([FirstName], [LastName], [Country]) values('Mithlesh', 'Upadhyay', 'India')
INSERT INTO Author([FirstName], [LastName], [Country]) values('Mithlesh', 'Upadhyay', 'India')
INSERT INTO Author([FirstName], [LastName], [Country]) values('Mithlesh', 'Upadhyay', 'India')
INSERT INTO Author([FirstName], [LastName], [Country]) values('Rudra', 'Upadhyay', 'India')
INSERT INTO Author([FirstName], [LastName], [Country]) values('Rudra', 'Upadhyay', 'India')
INSERT INTO Author([FirstName], [LastName], [Country]) values('Chandan', 'Gautam', 'India')
INSERT INTO Author([FirstName], [LastName], [Country]) values('Chandan', 'Gautam', 'India')
INSERT INTO Author([FirstName], [LastName], [Country]) values('Chandan', 'Gautam', 'India')

These values will be inserted into the Author table as given below. We can check this Author table and its rows using the below query -

SELECT * FROM Author

The output will be as shown below. Note that the table has duplicate rows as given below -

ID	FirstName	LastName	Country
1	Mithlesh	Upadhyay	India
2	Mithlesh	Upadhyay	India
3	Mithlesh	Upadhyay	India
4	Rudra	Upadhyay	India
5	Rudra	Upadhyay	India
6	Chandan	Gautam	India
7	Chandan	Gautam	India
8	Chandan	Gautam	India

Now, we will write different queries to remove duplicate rows from the above Author table.

Using GroupBy & Having Clauses

We can identify duplicate rows using the GROUP BY clause. It groups the rows based on columns. We will select FirstName, LastName, and Country to group the rows and then select only one row from each group. It will return unique rows from the Author table.

DELETE FROM Author
WHERE ID NOT IN (
   SELECT MIN(ID)
   FROM Author
   GROUP BY FirstName, LastName, Country
   HAVING COUNT(*) > 1
)

The contents of the Author table after the above query is. Note that the ID number of the rows: 2, 3, 5, 7, and 8 are deleted because of duplicate values.

ID	FirstName	LastName	Country
1	Mithlesh	Upadhyay	India
4	Rudra	Upadhyay	India
6	Chandan	Gautam	India

Using Common Table Expressions (CTE)

We can also use Common Table Expressions (CTE) to delete duplicate rows in the SQL server. CTE was introduced in SQL Server 2005. We will write the SQL ROW_NUMBER() function. It adds a unique sequential row number for the row. We use Partition by clause using FirstName, LastName, and Country to generate a row number for each row.

Note that we have the initial above table with duplicate rows, so we can apply this SQL to delete duplicate and return the original table as output.

WITH CTE AS (
   SELECT ID, FirstName, LastName, Country, ROW_NUMBER() 
   OVER (PARTITION BY FirstName, LastName, Country ORDER BY ID) 
   AS row_num FROM Author
)
SELECT * FROM CTE

It will add another column "row_num". This column will show the count of the number of times the duplicate row is as below -

ID	FirstName	LastName	Country	row_num
6	Chandan	Gautam	India	1
1	Mithlesh	Upadhyay	India	1
4	Rudra	Upadhyay	India	1

Now, we will delete all those rows which have row_num greater than 1. So it will keep unique rows. You can perform this using the below query -

WITH CTE AS ( 
   SELECT ID, FirstName, LastName, Country, ROW_NUMBER() 
   OVER (PARTITION BY FirstName, LastName, Country ORDER BY ID) 
   AS row_num FROM Author
)
DELETE FROM Author WHERE ID IN ( 
   SELECT ID FROM CTE WHERE row_num > 1
)

It will remove all the duplicate rows from the Author table. You can check this updated table using the below query -

SELECT * FROM Author

The output will be -

ID	FirstName	LastName	Country
1	Mithlesh	Upadhyay	India
4	Rudra	Upadhyay	India
6	Chandan	Gautam	India

Using RANK() Function

We can also use the RANK function to delete duplicate rows in the SQL. This method is similar to the above method method which uses ROW_NUMBER(). The RANK() function shows the value of repeated rows.

Note that we have the initial above table with duplicate rows, so we can apply this SQL to delete duplicate and return the original table as output.

WITH CTE AS ( 
   SELECT ID, FirstName, LastName, Country, RANK() 
   OVER (PARTITION BY FirstName, LastName, Country ORDER BY ID) 
   AS rank_num FROM Author
)
SELECT * FROM CTE

Note that this is just same as the above method but here we use RANK() in place ROW_NUMBER() function and so column name is rank_num instead of row_num. The final query to delete duplicate rows from the Author table is -

WITH CTE AS ( 
   SELECT ID, FirstName, LastName, Country, RANK() 
   OVER (PARTITION BY FirstName, LastName, Country ORDER BY ID) 
   AS rank_num FROM Author
)
DELETE FROM Author WHERE ID IN ( 
   SELECT ID FROM CTE WHERE rank_num > 1
)

It will delete all the duplicate rows from the Author table. The contents of the Author table after the above query is as follows

ID	FirstName	LastName	Country
1	Mithlesh	Upadhyay	India
4	Rudra	Upadhyay	India
6	Chandan	Gautam	India

Conclusion

In this article, we have discussed three different methods to delete duplicate rows from the database table in SQL. We use group by and having class in the first method.

The first method is most easy to write in SQL. We have used ROW_NUMBER() and RANK() functions in the second and third methods respectively. The last two methods are similar to each other, so you can understand them easily.

Frequently Ask Questions (FAQs)

Why do you remove duplicate rows the table? chevron

We remove duplicate rows from the table because duplicate rows create data integrity issues. Tables with unique rows always ensure data integrity, performance, and calculation of the queries.

What is the difference between using ROW_NUMBER() and RANK() function in the SQL? chevron

Both the ROW_NUMBER() and RANK() functions assign numbers to rows within a partition. ROW_NUMBER() assigns a unique number to each row. Whereas, RANK() assigns the same rank to identical rows and skips subsequent numbers.

When do you use GROUP BY and HAVING to delete duplicates instead of CTEs? chevron

There is not much difference but this method is easier to write and understand.

Mithlesh Upadhyay

Updated on: 2024-11-20T14:59:50+05:30

186 Views

Kickstart Your Career

Get certified by completing the course

Get Started