
Find Percentage of Missing Values in R Data Frame
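To find the percentage of missing values in an R data frame, we can combine the sum function with the prod function: sum(is.na(df)) counts the missing cells, and prod(dim(df)) multiplies the number of rows by the number of columns to give the total number of cells. For example, if we have a data frame called df that contains some missing values, then the percentage of missing values can be calculated by using the command: (sum(is.na(df))/prod(dim(df)))*100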
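As a minimal sketch of how the two parts of this command fit together, the snippet below uses a small hand-made data frame (the name df and its contents are hypothetical, chosen only so the counts are easy to check by eye). It also shows that taking the mean of is.na(df) gives the same proportion, since is.na returns a logical matrix for a data frame.

```r
# Hypothetical 3 x 2 data frame with two missing cells
df <- data.frame(a = c(1, NA, 3), b = c(NA, "x", "y"))

na_count   <- sum(is.na(df))   # number of NA cells: 2
cell_count <- prod(dim(df))    # rows * columns: 3 * 2 = 6

pct_missing <- na_count / cell_count * 100
pct_missing
# [1] 33.33333

# The mean of the logical matrix is the same proportion of NA cells
mean(is.na(df)) * 100
# [1] 33.33333
```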
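Example 1
Consider the data frame below. Because sample draws values at random, the data and the resulting percentage will differ from run to run unless a seed is fixed with set.seed: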
x1 <- sample(c(NA, 1, 5, 10, 15), 20, replace = TRUE)
x2 <- sample(c(NA, rnorm(5)), 20, replace = TRUE)
x3 <- sample(c(NA, 100, 200), 20, replace = TRUE)
x4 <- sample(c(NA, 0, 1), 20, replace = TRUE)
df1 <- data.frame(x1, x2, x3, x4)
df1
Output
   x1         x2  x3 x4
1  10 -0.7734719  NA  1
2  10 -1.8581538 200  0
3  10 -0.7734719 100  1
4   1         NA 200  1
5   5 -1.8581538 100 NA
6  10 -0.7734719 100  0
7   5 -1.8581538 200  0
8  10 -0.3188769  NA NA
9   5 -0.7734719  NA  1
10 NA  1.0589124 200  0
11 NA -0.7734719 200  0
12 15 -1.8581538  NA  1
13  5 -0.7734719 200 NA
14  5 -0.7734719 200  0
15 10 -0.3188769 200  1
16 15 -1.8581538  NA  1
17 NA -1.8581538 200  0
18 15  1.0589124  NA  1
19 10 -0.7734719  NA  1
20  5 -1.8581538 100 NA
Finding the percentage of missing values in df1:
(sum(is.na(df1))/prod(dim(df1)))*100
Output
[1] 18.75
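The result can be checked by hand against the printed df1. The sketch below hard-codes the NA counts per column as read off that output (3 in x1, 1 in x2, 7 in x3, 4 in x4); the names na_per_column and n_rows are introduced here for illustration only. On the actual data frame, colMeans(is.na(df1)) * 100 should give the same per-column percentages directly.

```r
# NA counts per column, counted from the printed df1 output
na_per_column <- c(x1 = 3, x2 = 1, x3 = 7, x4 = 4)
n_rows <- 20

# Overall percentage: 15 missing cells out of 20 * 4 = 80
overall <- sum(na_per_column) / (n_rows * length(na_per_column)) * 100
overall
# [1] 18.75

# Percentage of missing values within each column
per_column <- na_per_column / n_rows * 100
per_column
# x1 x2 x3 x4
# 15  5 35 20
```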
Example 2
y1 <- sample(c(NA, rpois(2, 1)), 20, replace = TRUE)
y2 <- sample(c(NA, rpois(2, 1)), 20, replace = TRUE)
y3 <- sample(c(NA, rpois(2, 5)), 20, replace = TRUE)
y4 <- sample(c(NA, rpois(2, 3)), 20, replace = TRUE)
y5 <- sample(c(NA, rpois(2, 2)), 20, replace = TRUE)
y6 <- sample(c(NA, rpois(2, 5)), 20, replace = TRUE)
df2 <- data.frame(y1, y2, y3, y4, y5, y6)
df2
Output
   y1 y2 y3 y4 y5 y6
1   2 NA NA  2 NA  5
2   1 NA  9 NA  2  5
3   1  1  9 NA  3  5
4  NA  0  6  3  3  5
5   2  1 NA  3  2 NA
6   2 NA  6 NA  3 NA
7  NA  0  6  2  3 NA
8   2 NA NA  3 NA  5
9   1  1  9 NA  3 NA
10  2 NA NA  3 NA  2
11 NA  1  6  3  3 NA
12  2 NA  6  3  2  2
13 NA NA  6  2  2 NA
14  2  0  6 NA NA NA
15  2  0  6  3 NA  5
16 NA  1  9  3  2  2
17  2 NA  6  3  2 NA
18  2  1  6  3 NA  2
19 NA  1 NA NA  3  2
20  2 NA  6  3 NA  5
Finding the percentage of missing values in df2:
(sum(is.na(df2))/prod(dim(df2)))*100
Output
[1] 34.16667
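Since the same command is repeated for every data frame, it may be convenient to wrap it in a small function. The following is only a sketch: the function name pct_missing is hypothetical and not part of base R, and returning NA for a data frame with no cells is an assumption made here to avoid dividing zero by zero. The call to set.seed fixes the random draws so that repeated runs give the same data.

```r
# Hypothetical helper; pct_missing is not a base R function
pct_missing <- function(df) {
  stopifnot(is.data.frame(df))
  n_cells <- prod(dim(df))              # rows * columns
  if (n_cells == 0) return(NA_real_)    # assumption: no cells means no defined percentage
  sum(is.na(df)) / n_cells * 100
}

set.seed(42)  # fix the random draws so the result is repeatable
y1 <- sample(c(NA, 0, 1), 20, replace = TRUE)
y2 <- sample(c(NA, 2, 3), 20, replace = TRUE)
df <- data.frame(y1, y2)

pct_missing(df)        # overall percentage of NA cells in df
pct_missing(df[0, ])   # NA for a data frame with no rows
```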