
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Combine Columns by Excluding Missing Values in R
If we have a data set that contains missing values at alternate places for each column then we might want to combine the columns by excluding those missing values, this will reduce the data set and the analysis is likely to become easier.
For this purpose, we can use na.exclude function along with apply function as shown in the below given examples.
Example 1
Following snippet creates a sample data frame −
x1<-rep(c(NA,2,10),times=c(5,10,5)) x2<-rep(c(1,3,5,NA),times=c(5,5,5,5)) x3<-rep(c(10,NA,3),times=c(5,10,5)) df1<-data.frame(x1,x2,x3) df1
The following dataframe is created −
x1 x2 x3 1 NA 1 10 2 NA 1 10 3 NA 1 10 4 NA 1 10 5 NA 1 10 6 2 3 NA 7 2 3 NA 8 2 3 NA 9 2 3 NA 10 2 3 NA 11 2 5 NA 12 2 5 NA 13 2 5 NA 14 2 5 NA 15 2 5 NA 16 10 NA 3 17 10 NA 3 18 10 NA 3 19 10 NA 3 20 10 NA 3
To exclude NA’s from df1 and combine the columns, add the following code to the above snippet −
x1<-rep(c(NA,2,10),times=c(5,10,5)) x2<-rep(c(1,3,5,NA),times=c(5,5,5,5)) x3<-rep(c(10,NA,3),times=c(5,10,5)) df1<-data.frame(x1,x2,x3) t(apply(df1,1,na.exclude))
Output
If you execute all the above given snippets as a single program, it generates the following output −
[,1][,2] [1,] 1 10 [2,] 1 10 [3,] 1 10 [4,] 1 10 [5,] 1 10 [6,] 2 3 [7,] 2 3 [8,] 2 3 [9,] 2 3 [10,] 2 3 [11,] 2 5 [12,] 2 5 [13,] 2 5 [14,] 2 5 [15,] 2 5 [16,] 10 3 [17,] 10 3 [18,] 10 3 [19,] 10 3 [20,] 10 3
Example 2
Following snippet creates a sample data frame −
y1<-rep(c(NA,rnorm(5)),times=c(5,2,3,3,3,4)) y2<-rep(c(rnorm(2),NA),times=c(10,5,5)) y3<-rep(c(rnorm(1),NA,rnorm(1)),times=c(5,10,5)) df2<-data.frame(y1,y2,y3) df2
The following dataframe is created −
y1 y2 y3 1 NA 0.1152603 -0.9838989 2 NA 0.1152603 -0.9838989 3 NA 0.1152603 -0.9838989 4 NA 0.1152603 -0.9838989 5 NA 0.1152603 -0.9838989 6 -0.74142593 0.1152603 NA 7 -0.74142593 0.1152603 NA 8 -1.88274271 0.1152603 NA 9 -1.88274271 0.1152603 NA 10 -1.88274271 0.1152603 NA 11 -0.09684216 -1.2886519 NA 12 -0.09684216 -1.2886519 NA 13 -0.09684216 -1.2886519 NA 14 -0.08528031 -1.2886519 NA 15 -0.08528031 -1.2886519 NA 16 -0.08528031 NA 0.1967864 17 -0.80126932 NA 0.1967864 18 -0.80126932 NA 0.1967864 19 -0.80126932 NA 0.1967864 20 -0.80126932 NA 0.1967864
To exclude NA’s from df2 and combine the columns, add the following code to the above snippet −
y1<-rep(c(NA,rnorm(5)),times=c(5,2,3,3,3,4)) y2<-rep(c(rnorm(2),NA),times=c(10,5,5)) y3<-rep(c(rnorm(1),NA,rnorm(1)),times=c(5,10,5)) df2<-data.frame(y1,y2,y3) t(apply(df2,1,na.exclude))
Output
If you execute all the above given snippets as a single program, it generates the following output −
[,1] [,2] [1,] 0.11526026 -0.9838989 [2,] 0.11526026 -0.9838989 [3,] 0.11526026 -0.9838989 [4,] 0.11526026 -0.9838989 [5,] 0.11526026 -0.9838989 [6,] -0.74142593 0.1152603 [7,] -0.74142593 0.1152603 [8,] -1.88274271 0.1152603 [9,] -1.88274271 0.1152603 [10,] -1.88274271 0.1152603 [11,] -0.09684216 -1.2886519 [12,] -0.09684216 -1.2886519 [13,] -0.09684216 -1.2886519 [14,] -0.08528031 -1.2886519 [15,] -0.08528031 -1.2886519 [16,] -0.08528031 0.1967864 [17,] -0.80126932 0.1967864 [18,] -0.80126932 0.1967864 [19,] -0.80126932 0.1967864 [20,] -0.80126932 0.1967864