Identify and Remove Duplicate Data in R

Last Updated : 12 Jul, 2025

A dataset can contain duplicate values. To keep it accurate and free of redundancy, duplicate entries need to be identified and removed. This article shows how to identify and remove duplicate data in R: first we check whether duplicates are present, and if they are, we remove them.

1.1. Identifying Duplicate Data in a Vector

The duplicated() function returns a logical vector that marks each element repeating an earlier element as TRUE. Because TRUE counts as 1, passing that result to sum() gives the number of duplicate values.

R
vec <- c(1, 2, 3, 4, 4, 5)

duplicated(vec)

sum(duplicated(vec))

Output:

[1] FALSE FALSE FALSE FALSE  TRUE FALSE
[1] 1
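When the positions or values of the repeats matter, and not only their count, the logical vector from duplicated() can be used as an index. The following is a minimal sketch in base R; the variable names dup_values, dup_positions and first_dup are illustrative and not part of the original example.

```r
vec <- c(1, 2, 3, 4, 4, 5)

# Values that repeat an earlier element
dup_values <- vec[duplicated(vec)]

# Positions of those repeated elements
dup_positions <- which(duplicated(vec))

# anyDuplicated() returns the index of the first duplicate, or 0 if there is none
first_dup <- anyDuplicated(vec)

dup_values     # 4
dup_positions  # 5
first_dup      # 5
```

anyDuplicated() is a convenient quick check when you only need to know whether any duplicate exists.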

1.2. Removing Duplicate Data in a Vector

We can remove duplicates from a vector with the unique() function, which returns each distinct value once, in order of first appearance.

R
vec <- c(1, 2, 3, 4, 4, 5)

unique(vec)

Output:

[1] 1 2 3 4 5
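The same result can be obtained by negating duplicated() and using it as an index, which is useful when the logical mask is needed for something else as well. This is a small sketch in base R; deduped and n_removed are illustrative names.

```r
vec <- c(1, 2, 3, 4, 4, 5)

# Keep every element that is not a repeat of an earlier one
deduped <- vec[!duplicated(vec)]

# Check that this matches unique(vec)
identical(deduped, unique(vec))

# Number of elements that were dropped
n_removed <- length(vec) - length(deduped)
n_removed
```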

2.1. Identifying Duplicate Data in a Data Frame

For a data frame, duplicated() returns a logical vector with one entry per row, where TRUE marks a row that is identical to an earlier row. Wrapping it in sum() gives the count of duplicate rows.

Syntax: duplicated(dataframe)

Example:

R
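# Build a sample data frame; rows 6 and 7 repeat rows 2 and 4
res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

res                   # print the data frame
duplicated(res)       # TRUE for each row that repeats an earlier row
sum(duplicated(res))  # number of duplicate rows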

Output:

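    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   John     8       6       7
4   Paul     9       8       7
5 Cassie    10       9       7
6  Geeta     8       7       7
7   Paul     9       8       7
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
[1] 2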
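duplicated() flags only the later copies of a row, not the first occurrence. To inspect every row that has an exact copy somewhere in the data, one option is to combine a forward and a backward pass using the fromLast argument. This is a sketch in base R; all_copies is an illustrative name.

```r
res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

# Flag a row if it matches an earlier row OR a later row
all_copies <- duplicated(res) | duplicated(res, fromLast = TRUE)

# Show every row that has at least one exact copy (rows 2, 4, 6 and 7 here)
res[all_copies, ]
```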

2.2. Removing Duplicate Data in a Data Frame

We will look at two methods for removing duplicate rows from a data frame.

Method 1: Using unique()

unique() returns the data frame with duplicate rows removed, keeping the first occurrence of each row.

Syntax: unique(dataframe)

Example:

R
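# Sample data frame; rows 6 and 7 repeat rows 2 and 4
res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

res          # original data, 7 rows
unique(res)  # duplicate rows removed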

Output:

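    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   John     8       6       7
4   Paul     9       8       7
5 Cassie    10       9       7
6  Geeta     8       7       7
7   Paul     9       8       7
    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   John     8       6       7
4   Paul     9       8       7
5 Cassie    10       9       7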
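By default unique() keeps the first occurrence of each repeated row. If the last occurrence should be kept instead, the data frame method of unique() accepts a fromLast argument. The sketch below assumes that is the behaviour you want; last_kept is an illustrative name.

```r
res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

# Scan from the end, so the last copy of each repeated row is the one kept
last_kept <- unique(res, fromLast = TRUE)

# The surviving row names show which original rows remain: 1, 3, 5, 6, 7
rownames(last_kept)
```

The rows stay in their original order; only the choice of which copy survives changes.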

Method 2: Using distinct()

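This method requires the dplyr package, which is installed as part of the tidyverse; load it with library(dplyr) or library(tidyverse). distinct() keeps only the distinct rows of a data frame.

Syntax: distinct(dataframe, ..., .keep_all = FALSE)

Parameter:

  • dataframe: data in use
  • ...: optional columns used to decide which rows are distinct; if omitted, all columns are used
  • .keep_all: if TRUE, keeps all columns in the result; if FALSE (the default), keeps only the columns used to determine distinctness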

Example 1: Using distinct function

R
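# distinct() is provided by dplyr (library(tidyverse) also attaches it)
library(dplyr)

res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

res            # original data, 7 rows
distinct(res)  # rows that are distinct across all columns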

Output:

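    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   John     8       6       7
4   Paul     9       8       7
5 Cassie    10       9       7
6  Geeta     8       7       7
7   Paul     9       8       7
    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   John     8       6       7
4   Paul     9       8       7
5 Cassie    10       9       7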

Example 2: Printing unique rows in terms of maths column

R
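library(dplyr)

res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

res
# One row per distinct maths value, keeping all other columns
distinct(res, maths, .keep_all = TRUE)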

Output:

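    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   John     8       6       7
4   Paul     9       8       7
5 Cassie    10       9       7
6  Geeta     8       7       7
7   Paul     9       8       7
    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   Paul     9       8       7
4 Cassie    10       9       7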

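The output is a data frame with one row per distinct value in the "maths" column. Only the first occurrence of each value is kept, so John (maths = 8) is dropped because Geeta appears earlier with the same score, and .keep_all = TRUE retains the remaining columns.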
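If dplyr is not available, a similar per-column de-duplication can be sketched in base R by applying duplicated() to a single column. This is an alternative to distinct(), not what it does internally; by_maths is an illustrative name.

```r
res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

# Keep the first row for each distinct maths value
by_maths <- res[!duplicated(res$maths), ]

# Row names still refer to the original positions (1, 2, 4, 5);
# reset them to 1..n if that matters
rownames(by_maths) <- NULL
by_maths
```

To de-duplicate on several columns at once, pass a sub-data-frame such as res[, c("name", "maths")] to duplicated() instead of a single column.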

