
Count Non-Missing Values in R Data Frame Groups
To find the number of non-missing values in each group of an R data frame, we can convert the data frame to a data.table object and then apply the sum function to the negation of is.na.
For example, if we have a data frame called df that contains a grouping column, say Group, and a numerical column with a few NAs, say Num, then we can find the number of non-missing values in each Group by using the following command −
setDT(df)[, sum(!is.na(Num)), by = .(Group)]
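As a minimal sketch of the command above on a small fixed data frame: the names df, Group and Num follow the text, while the values are made up for illustration. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical data: column names follow the text, values are illustrative
df <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C"),
  Num   = c(1.5, NA, 2.0, NA, 3.1, NA)
)

# is.na(Num) is TRUE for missing values; negating it and summing
# counts the non-missing values within each Group
counts <- setDT(df)[, sum(!is.na(Num)), by = .(Group)]
print(counts)
```

With these values the counts are 2 for A, 1 for B and 0 for C, which shows that a group containing only NAs is reported with a count of zero instead of being dropped.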
Example 1
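The following snippet creates a sample data frame −
Grp <- sample(LETTERS[1:3], 20, replace = TRUE)
# Shuffle five values: NA, two rounded normal draws, 20 and 1.
# data.frame() recycles these five values to fill all 20 rows.
Dep_Var <- sample(c(NA, round(rnorm(2), 2), 20, 1))
df1 <- data.frame(Grp, Dep_Var)
df1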
The following data frame is created −
   Grp Dep_Var
1    B      NA
2    A    1.00
3    A   20.00
4    B   -0.63
5    B   -1.48
6    B      NA
7    A    1.00
8    C   20.00
9    A   -0.63
10   A   -1.48
11   C      NA
12   C    1.00
13   B   20.00
14   C   -0.63
15   B   -1.48
16   A      NA
17   C    1.00
18   B   20.00
19   A   -0.63
20   B   -1.48
To load the data.table package and find the number of non-missing values in each Grp of the data frame created above, add the following code to the above snippet −
library(data.table)
# Count the non-missing Dep_Var values within each Grp
setDT(df1)[, sum(!is.na(Dep_Var)), by = .(Grp)]
Output
Running the snippets above as a single program produces output like the following (the exact counts depend on the random sample) −
   Grp V1
1:   B  6
2:   A  6
3:   C  4
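The count column in the output above is called V1 because the expression inside the brackets is unnamed. The sketch below, which uses a small fixed data frame in place of the random one, shows one way to give the column a name and cross-checks the result with base R's tapply. The column name NonMissing and the sample values are assumptions chosen for illustration.

```r
library(data.table)

# Small fixed data frame (illustrative values, not the random data above)
df1 <- data.frame(
  Grp     = c("B", "A", "A", "B", "C", "C", "B"),
  Dep_Var = c(NA, 1.00, 20.00, -0.63, NA, 1.00, -1.48)
)

# Wrapping the expression in .() lets us name the result column
named <- setDT(df1)[, .(NonMissing = sum(!is.na(Dep_Var))), by = Grp]
print(named)

# Base R cross-check: tapply() applies sum() within each group
base_counts <- tapply(!is.na(df1$Dep_Var), df1$Grp, sum)
print(base_counts)
```

Both approaches give 2 for A, 2 for B and 1 for C here. The data.table result lists groups in order of first appearance, whereas tapply sorts them alphabetically.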
Example 2
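The following snippet creates a sample data frame −
Category <- sample(c("Low", "Medium", "High"), 20, replace = TRUE)
# Shuffle five values: NA, two Poisson draws (mean 5), 20 and 1.
# data.frame() recycles these five values to fill all 20 rows.
Val <- sample(c(NA, rpois(2, 5), 20, 1))
df2 <- data.frame(Category, Val)
df2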
The following data frame is created −
   Category Val
1    Medium  20
2      High   1
3      High   8
4      High   5
5      High  NA
6    Medium  20
7      High   1
8       Low   8
9       Low   5
10   Medium  NA
11   Medium  20
12   Medium   1
13   Medium   8
14   Medium   5
15   Medium  NA
16     High  20
17   Medium   1
18   Medium   8
19      Low   5
20      Low  NA
To find the number of non-missing values in each Category of the data frame created above, add the following code to the above snippet −
library(data.table)  # needed only if the package is not already loaded
# Count the non-missing Val values within each Category
setDT(df2)[, sum(!is.na(Val)), by = .(Category)]
Output
Running the snippets above as a single program produces output like the following (the exact counts depend on the random sample) −
   Category V1
1:   Medium  8
2:     High  5
3:      Low  3
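To put the non-missing count in context, the same grouped call can also return the number of missing values and the group size. The following is a sketch on a small fixed data frame. The output column names NonMissing, Missing and Rows are assumptions chosen for illustration, and .N is data.table's built-in symbol for the number of rows in the current group.

```r
library(data.table)

# Small fixed data frame (illustrative values, not the random data above)
df2 <- data.frame(
  Category = c("Medium", "High", "High", "Low", "Low", "Medium"),
  Val      = c(20, 1, 5, 8, NA, NA)
)

# One grouped call returning several summaries per Category
summary_dt <- setDT(df2)[, .(
  NonMissing = sum(!is.na(Val)),  # values that are present
  Missing    = sum(is.na(Val)),   # values that are NA
  Rows       = .N                 # total rows in the group
), by = Category]
print(summary_dt)
```

In each group NonMissing plus Missing equals Rows, which is a quick sanity check on the counts.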