
Find Mean of Each Variable Using dplyr in R
If a data set has NA values in several numerical variables, finding the mean (or any other statistic) of each variable per group with the mean function requires passing na.rm = TRUE once for every variable. The summarise_all function of the dplyr package avoids this repetition: it applies the function to every non-grouping variable, so the means of all numerical variables can be obtained in just two lines of code.
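To make the repetition concrete, here is a minimal sketch on a small made-up data frame. The names toy, group, x and y are hypothetical and not part of the original examples. It assumes that extra arguments given after the function in summarise_all are forwarded to that function, which is how na.rm = TRUE reaches mean here.

```r
library(dplyr)

# Hypothetical toy data: two numeric variables, each with one missing value
toy <- data.frame(
  group = c("a", "a", "b", "b"),
  x = c(1, NA, 3, 5),
  y = c(10, 20, NA, 40)
)

# Manual approach: na.rm = TRUE is repeated for every variable
manual <- toy %>%
  group_by(group) %>%
  summarise(x = mean(x, na.rm = TRUE), y = mean(y, na.rm = TRUE))

# summarise_all: na.rm = TRUE is written once and passed on to mean()
all_at_once <- toy %>%
  group_by(group) %>%
  summarise_all(mean, na.rm = TRUE)
```

Both results should hold the same per-group means; the second form does not grow as more numeric columns are added.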
Example
Loading dplyr package −
> library(dplyr)
Consider the ToothGrowth data set in base R −
> str(ToothGrowth)
'data.frame': 60 obs. of 3 variables:
 $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
 $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
 $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
> grouping_by_supp <- ToothGrowth %>% group_by(supp)
> grouping_by_supp %>% summarise_all(funs(mean(., na.rm = TRUE)))
# A tibble: 2 x 3
  supp    len  dose
  <fct> <dbl> <dbl>
1 OJ     20.7  1.17
2 VC     17.0  1.17
Consider the mtcars data set in base R, with cyl already converted to a factor with levels "four", "six" and "eight" −
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
 $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : Factor w/ 3 levels "four","six","eight": 2 2 1 2 3 2 3 1 1 2 ...
 $ disp: num 160 160 108 258 360 ...
 $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num 16.5 17 18.6 19.4 17 ...
 $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
 $ am : num 1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
> grouping_by_cyl <- mtcars %>% group_by(cyl)
> grouping_by_cyl %>% summarise_all(funs(mean(., na.rm = TRUE)))
# A tibble: 3 x 11
  cyl     mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 four   26.7  105.  82.6  4.07  2.29  19.1 0.909 0.727  4.09  1.55
2 six    19.7  183. 122.   3.59  3.12  18.0 0.571 0.429  3.86  3.43
3 eight  15.1  353. 209.   3.23  4.00  16.8 0     0.143  3.29  3.5
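The str() output above shows cyl as a factor, whereas cyl in the stock mtcars data set is numeric, so a conversion step must have been applied beforehand. The sketch below shows one way that step might look; the copy name cars and the factor() call are assumptions, chosen only to match the levels shown. It also uses across(), which recent dplyr releases (1.0.0 and later) recommend in place of funs() and the summarise_each()/summarise_all() family.

```r
library(dplyr)

# Assumed preparation step: work on a copy and relabel cyl as a factor
cars <- mtcars
cars$cyl <- factor(cars$cyl, levels = c(4, 6, 8),
                   labels = c("four", "six", "eight"))

# Mean of every non-grouping column per cyl level, written with across()
by_cyl_means <- cars %>%
  group_by(cyl) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
```

Inside a grouped summarise(), everything() skips the grouping column, so by_cyl_means should have one row per cyl level and the same columns as the tibble printed above.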
Consider the CO2 data set in base R −
> str(CO2)
Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 84 obs. of 5 variables:
 $ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
 $ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
 $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
 $ conc : num 95 175 250 350 500 675 1000 95 175 250 ...
 $ uptake : num 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
 - attr(*, "formula")=Class 'formula' language uptake ~ conc | Plant
 .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
 - attr(*, "outer")=Class 'formula' language ~Treatment * Type
 .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
 - attr(*, "labels")=List of 2
 ..$ x: chr "Ambient carbon dioxide concentration"
 ..$ y: chr "CO2 uptake rate"
 - attr(*, "units")=List of 2
 ..$ x: chr "(uL/L)"
 ..$ y: chr "(umol/m^2 s)"
> grouping_by_Type <- CO2 %>% group_by(Type)
> grouping_by_Type %>% summarise_all(funs(mean(., na.rm = TRUE)))
# A tibble: 2 x 5
  Type        Plant Treatment  conc uptake
  <fct>       <dbl>     <dbl> <dbl>  <dbl>
1 Quebec         NA        NA   435   33.5
2 Mississippi    NA        NA   435   20.9
Warning messages:
- In mean.default(Plant, na.rm = TRUE) : argument is not numeric or logical: returning NA
- In mean.default(Plant, na.rm = TRUE) : argument is not numeric or logical: returning NA
- In mean.default(Treatment, na.rm = TRUE) : argument is not numeric or logical: returning NA
- In mean.default(Treatment, na.rm = TRUE) : argument is not numeric or logical: returning NA
Here, we get warning messages because the variables Plant and Treatment are factors rather than numerical variables, so mean returns NA for them in each of the two groups.
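One way to avoid these warnings is to summarise only the numeric columns. The following is a sketch of that idea, not part of the original example: it assumes dplyr 1.0.0 or later for across() and where(), and the result name co2_means is made up for illustration.

```r
library(dplyr)

# Restrict the summary to numeric columns, so the factors Plant and
# Treatment are never passed to mean()
co2_means <- CO2 %>%
  group_by(Type) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
```

The result should contain only Type, conc and uptake. With older dplyr versions, summarise_if(is.numeric, mean, na.rm = TRUE) expresses the same selection.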