
Find Number of Unique Values in R Data Frame Column
If a data frame column holds comma-separated strings in which some values repeat, we might want to count the unique values within each string. To do this in R, we can use the stri_extract_all_regex function of the stringi package to extract the individual values from each string, and then use sapply to apply length(unique()) to each set of extracted values.
The examples below show how it can be done.
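Before moving to data frames, the two steps can be seen on a small character vector. This is only a minimal sketch: the vector v and the names tokens and counts are made up for illustration, and it assumes the stringi package is installed.

```r
library(stringi)

# Three comma-separated strings; the first two contain repeated values
v <- c("3,2,3", "5,5,6,7", "9")

# Step 1: extract every run of digits, giving one character vector per string
tokens <- stri_extract_all_regex(v, "[0-9]+")

# Step 2: count the distinct values in each list element
counts <- sapply(tokens, function(tok) length(unique(tok)))
counts
# [1] 2 3 1
```

stri_extract_all_regex returns a list with one element per input string, which is why sapply is used to reduce each element to a single count.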
Example 1
The following snippet creates a sample data frame −
# Draw 20 comma-separated strings at random, with replacement
x <- sample(c("3,2,3,4,5,4,3", "5,5,6,7,8,6,8", "3,2", "5,9,8,0"), 20, replace = TRUE)
df1 <- data.frame(x)
df1
The following data frame is created (sample() draws at random, so your rows will differ) −
               x
1  5,5,6,7,8,6,8
2        5,9,8,0
3        5,9,8,0
4            3,2
5  5,5,6,7,8,6,8
6  3,2,3,4,5,4,3
7  3,2,3,4,5,4,3
8            3,2
9            3,2
10 5,5,6,7,8,6,8
11           3,2
12 5,5,6,7,8,6,8
13 5,5,6,7,8,6,8
14           3,2
15 5,5,6,7,8,6,8
16       5,9,8,0
17 5,5,6,7,8,6,8
18           3,2
19           3,2
20       5,9,8,0
To load the stringi package and count the unique values in each string of x, add the following code to the above snippet −
library(stringi)
# Extract the digit runs from each string, then count the distinct ones
df1$Unique_in_x <- sapply(stri_extract_all_regex(df1$x, "[0-9]+"), function(v) length(unique(v)))
df1
Output
If you execute all the above given snippets as a single program, it generates the following output −
               x Unique_in_x
1  5,5,6,7,8,6,8           4
2        5,9,8,0           4
3        5,9,8,0           4
4            3,2           2
5  5,5,6,7,8,6,8           4
6  3,2,3,4,5,4,3           4
7  3,2,3,4,5,4,3           4
8            3,2           2
9            3,2           2
10 5,5,6,7,8,6,8           4
11           3,2           2
12 5,5,6,7,8,6,8           4
13 5,5,6,7,8,6,8           4
14           3,2           2
15 5,5,6,7,8,6,8           4
16       5,9,8,0           4
17 5,5,6,7,8,6,8           4
18           3,2           2
19           3,2           2
20       5,9,8,0           4
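As an aside, the same counts can also be obtained without stringi. The sketch below swaps in base R's strsplit and lengths; it is an alternative technique, not the method this article uses, and the data frame df_base is a small made-up stand-in for df1.

```r
# One copy of each distinct string used in Example 1
x <- c("5,5,6,7,8,6,8", "5,9,8,0", "3,2", "3,2,3,4,5,4,3")
df_base <- data.frame(x)

# Split each string on literal commas (as.character guards against factor columns)
parts <- strsplit(as.character(df_base$x), ",", fixed = TRUE)

# Keep the distinct values per string and count them
df_base$Unique_in_x <- lengths(lapply(parts, unique))
df_base$Unique_in_x
# [1] 4 4 2 4
```

Splitting on the delimiter keeps every value, whatever characters it contains, whereas the regex approach keeps only what the pattern matches.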
Example 2
The following snippet creates a sample data frame −
# Draw 20 comma-separated strings of uppercase letters at random, with replacement
y <- sample(c("A,G,R,T,D", "Y,I,H,H,F,E,L", "T,W,E,E,E,D,S,R"), 20, replace = TRUE)
df2 <- data.frame(y)
df2
The following data frame is created (sample() draws at random, so your rows will differ) −
                 y
1    Y,I,H,H,F,E,L
2        A,G,R,T,D
3    Y,I,H,H,F,E,L
4    Y,I,H,H,F,E,L
5        A,G,R,T,D
6    Y,I,H,H,F,E,L
7    Y,I,H,H,F,E,L
8        A,G,R,T,D
9        A,G,R,T,D
10       A,G,R,T,D
11   Y,I,H,H,F,E,L
12   Y,I,H,H,F,E,L
13 T,W,E,E,E,D,S,R
14   Y,I,H,H,F,E,L
15       A,G,R,T,D
16 T,W,E,E,E,D,S,R
17 T,W,E,E,E,D,S,R
18       A,G,R,T,D
19       A,G,R,T,D
20   Y,I,H,H,F,E,L
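To count the unique values in each string of y, add the following code to the above snippet (the stringi package must be loaded, as in Example 1) −
library(stringi)
# Extract the runs of uppercase letters from each string, then count the distinct ones
df2$Unique_in_y <- sapply(stri_extract_all_regex(df2$y, "[A-Z]+"), function(v) length(unique(v)))
df2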
Output
If you execute all the above given snippets as a single program, it generates the following output −
                 y Unique_in_y
1    Y,I,H,H,F,E,L           6
2        A,G,R,T,D           5
3    Y,I,H,H,F,E,L           6
4    Y,I,H,H,F,E,L           6
5        A,G,R,T,D           5
6    Y,I,H,H,F,E,L           6
7    Y,I,H,H,F,E,L           6
8        A,G,R,T,D           5
9        A,G,R,T,D           5
10       A,G,R,T,D           5
11   Y,I,H,H,F,E,L           6
12   Y,I,H,H,F,E,L           6
13 T,W,E,E,E,D,S,R           6
14   Y,I,H,H,F,E,L           6
15       A,G,R,T,D           5
16 T,W,E,E,E,D,S,R           6
17 T,W,E,E,E,D,S,R           6
18       A,G,R,T,D           5
19       A,G,R,T,D           5
20   Y,I,H,H,F,E,L           6
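The patterns "[0-9]+" and "[A-Z]+" used above only match digits and uppercase letters respectively. When the values mix letters, digits and stray spaces, one possible generalisation is to match everything between commas and trim whitespace before counting. The sketch below assumes that behaviour is wanted; count_unique_tokens is a hypothetical helper name, and the vector z is invented for illustration.

```r
library(stringi)

# Hypothetical helper: count distinct comma-separated values in each string
count_unique_tokens <- function(strings) {
  # "[^,]+" matches each maximal run of characters that are not commas
  tokens <- stri_extract_all_regex(strings, "[^,]+")
  # Trim surrounding whitespace so that " x1" and "x1" count as the same value
  sapply(tokens, function(tok) length(unique(stri_trim_both(tok))))
}

z <- c("A,b,A,7", "x1, x1,y2", "T,W,E,E")
count_unique_tokens(z)
# [1] 3 2 3
```

The helper can be used in the same way as in the examples, for instance df2$Unique_in_y <- count_unique_tokens(df2$y).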