Calculate difference between dataframe rows by group in R
Last Updated: 16 Dec, 2021
In this article, we will see how to find the difference between consecutive rows within each group of a data frame in the R programming language.
Method 1: Using dplyr package
The group_by() function is used to divide data into groups based on the values in one or more columns. The columns to group by are passed as arguments to this function, and more than one column name can be given.
Syntax:
group_by(col1, col2, ...)
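As a minimal sketch of what grouping does, assuming a small hypothetical data frame `scores` (not part of the original example), the code below groups by one column and then by two columns, and inspects the result with dplyr's n_groups() and group_vars(). group_by() does not change the rows; it only records the grouping that later verbs such as mutate() respect.

```r
library(dplyr)

# hypothetical example data: two teams observed in two years
scores <- data.frame(team  = c("x", "y", "x", "y"),
                     year  = c(1, 1, 2, 2),
                     score = c(3, 5, 4, 9))

# grouping by a single column gives one group per team
by_team <- scores %>% group_by(team)
n_groups(by_team)      # 2
group_vars(by_team)    # "team"

# grouping by two columns gives one group per (team, year) combination
by_team_year <- scores %>% group_by(team, year)
n_groups(by_team_year) # 4

# the rows themselves are unchanged
nrow(by_team_year)     # 4
```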
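This is followed by the mutate() function, which adds new columns (or modifies existing ones) computed from existing columns. The name of the new column is given on the left-hand side of the = sign inside mutate(). When the data are grouped, mutate() evaluates its expressions separately within each group. The difference from the previous row can be calculated using the lag() function of this package, which returns the values of a vector shifted back by n positions, so that each element lines up with the value that precedes it.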
Syntax:
lag(x, n = 1L, default = NA)
Parameter:
- x - A vector of values
- n - Number of positions to lag by
- default - the value used to pad positions that have no previous row (NA unless specified).
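A quick sketch of lag() on a plain numeric vector (the vector `x` is a hypothetical example) shows how n and default behave: each element is replaced by the value n positions earlier, and positions with no earlier value are filled with default.

```r
library(dplyr)

x <- c(1, 4, 5, 10)

lag(x)               # NA 1 4 5   (shift by one, pad with NA)
lag(x, n = 2)        # NA NA 1 4  (shift by two)
lag(x, default = 0)  # 0 1 4 5    (pad with 0 instead of NA)

# difference from the previous element
x - lag(x)           # NA 3 1 5
```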
Inside mutate(), the new column is computed by subtracting the lagged value of the column from the column itself. In the example below, default is set to first(col3), the first value of the current group, so the first row of each group is compared with itself and gets a difference of 0 instead of NA.
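The sketch below applies the same pattern to a small data frame with fixed values (the column names `id` and `value` are hypothetical), once with default = first(value) and once with lag()'s own NA padding. With first(), the first row of each group gets a difference of 0; without it, the first row gets NA.

```r
library(dplyr)

df <- data.frame(id    = c("a", "b", "a", "b", "a"),
                 value = c(1, 10, 4, 15, 9))

result <- df %>%
  group_by(id) %>%
  mutate(
    # first row of each group is compared with itself, so its diff is 0
    diff_first = value - lag(value, default = first(value)),
    # first row of each group has no previous row, so its diff is NA
    diff_na    = value - lag(value)
  ) %>%
  ungroup()

print(result)
```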
Example:
R
# loading the required library
library("dplyr")
# creating a data frame; col1 is drawn with sample(),
# so the groups (and the output) differ from run to run
data_frame <- data.frame(col1 = sample(6:9, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))
print("Original DataFrame")
print(data_frame)
print("Modified DataFrame")
# computing the difference from the previous row within each group
data_frame %>%
  group_by(col1) %>%
  mutate(diff = col3 - lag(col3, default = first(col3)))
Output
[1] "Original DataFrame"
col1 col2 col3
1 6 a 1
2 9 b 4
3 7 c 5
4 6 a 1
5 6 b NA
6 9 c NA
7 6 a 2
8 8 b NA
9 7 c 2
[1] "Modified DataFrame"
# A tibble: 9 x 4
# Groups: col1 [4]
col1 col2 col3 diff
<int> <chr> <dbl> <dbl>
1 6 a 1 0
2 9 b 4 0
3 7 c 5 0
4 6 a 1 0
5 6 b NA NA
6 9 c NA NA
7 6 a 2 NA
8 8 b NA NA
9 7 c 2 -3
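Because mutate() on grouped data evaluates lag() separately within each group, the grouping step is what keeps one group's values from leaking into another's. The following sketch, on a small hypothetical data frame with fixed values, compares the same expression with and without group_by().

```r
library(dplyr)

df <- data.frame(id    = c("a", "b", "a", "b"),
                 value = c(1, 10, 4, 15))

# without grouping, lag() looks at the previous row of the whole column
ungrouped <- df %>% mutate(diff = value - lag(value))

# with grouping, lag() only looks at the previous row of the same id
grouped <- df %>%
  group_by(id) %>%
  mutate(diff = value - lag(value)) %>%
  ungroup()

ungrouped$diff  # NA 9 -6 11
grouped$diff    # NA NA 3 5
```

Note that lag() uses the rows in their current order; if the rows represent a sequence such as dates, sort them (for example with arrange()) before computing the differences.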
Method 2: Using data.table package
The data.table package offers the same operation through its indexing syntax, DT[i, j, by]. The by argument specifies the column to group the data by. The := operator adds a new column to the table by reference, so all the rows are retained; note that := works only on a data.table, not on a plain data.frame (one can be converted with as.data.table() or setDT()). Within each group, the difference is calculated by subtracting from each value of the specified column the previous value, which is obtained with the shift() function. shift() lags (or leads) vectors and lists; by default it shifts back by one position and pads with NA.
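To see what shift() returns on its own, here is a minimal sketch on a hypothetical plain vector; it uses only shift()'s n, fill and type arguments.

```r
library(data.table)

x <- c(1, 4, 5, 10)

shift(x)                 # NA 1 4 5   (lag by one, the default)
shift(x, n = 2)          # NA NA 1 4
shift(x, fill = 0)       # 0 1 4 5    (pad with 0 instead of NA)
shift(x, type = "lead")  # 4 5 10 NA  (look forward instead of back)

# difference from the previous element
x - shift(x)             # NA 3 1 5
```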
Syntax:
data_frame[ , new-col-name := reqd-col - shift(reqd-col), by = grouping-col]
The first row of each group has no previous row within that group, so its difference is NA.
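Applied to a small data.table with fixed values (the column names `id` and `value` are hypothetical), the syntax above gives NA for the first row of each group and the change from the previous row of the same group elsewhere. Because := modifies the table by reference, no reassignment is needed, and the original row order is kept.

```r
library(data.table)

dt <- data.table(id    = c("a", "b", "a", "b", "a"),
                 value = c(1, 10, 4, 15, 9))

# add the diff column in place, computed separately within each id
dt[, diff := value - shift(value), by = id]

print(dt)
```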
Example:
R
# loading the required library
library("data.table")
# creating a data.table; col1 is drawn with sample(),
# so the groups (and the output) differ from run to run
data_frame <- data.table(col1 = sample(6:9, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, 9, 11, 2, 7, 2))
print ("Original DataFrame")
print (data_frame)
# computing difference of each group
data_frame[ , diff := col3 - shift(col3), by = col1]
print ("Modified DataFrame")
print (data_frame)
Output
[1] "Original DataFrame"
col1 col2 col3
1: 8 a 1
2: 8 b 4
3: 7 c 5
4: 6 a 1
5: 6 b 9
6: 8 c 11
7: 8 a 2
8: 9 b 7
9: 7 c 2
[1] "Modified DataFrame"
col1 col2 col3 diff
1: 8 a 1 NA
2: 8 b 4 3
3: 7 c 5 NA
4: 6 a 1 NA
5: 6 b 9 8
6: 8 c 11 7
7: 8 a 2 -9
8: 9 b 7 NA
9: 7 c 2 -3
Method 3: Using ave() method
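The ave() function in base R applies a function to subsets of a vector defined by one or more grouping factors and returns a vector of the same length as the input. By default, the function is mean, so each element is replaced by the average of its group.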
Syntax:
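ave(x, ..., FUN = mean)
Parameters:
- x - a numeric vector, here the required data frame column
- ... - the grouping variables, typically factors of the same length as x
- FUN - the function to apply to each factor level combination; it should return a vector of the same length as its input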
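As a sketch of ave()'s basic behaviour (the vectors `x` and `g` are hypothetical), the code below uses the default FUN = mean and then cumsum. In both cases the result has the same length as x, because every element is replaced by the value FUN computes for its group.

```r
x <- c(1, 2, 3, 10)
g <- c("a", "a", "b", "b")

ave(x, g)                 # 1.5 1.5 6.5 6.5  (group means)
ave(x, g, FUN = cumsum)   # 1 3 3 13         (running total within group)
```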
Here, FUN computes the difference between each value and the previous value in the same group using diff(). Because diff() returns one element fewer than its input, NA is prepended so that the result keeps the same length; the first row of each group therefore gets NA.
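The function passed as FUN in the example, function(x) c(NA, diff(x)), can be checked on its own. diff() returns one element fewer than its input, and prepending NA restores the original length, which ave() needs in order to put the results back in place. The vectors `value` and `id` here are hypothetical fixed data.

```r
value <- c(1, 10, 4, 15, 9)
id    <- c("a", "b", "a", "b", "a")

# diff() alone is one element shorter than its input
diff(c(1, 4, 9))          # 3 5
c(NA, diff(c(1, 4, 9)))   # NA 3 5

# applied per group with ave(); the first row of each group gets NA
ave(value, id, FUN = function(x) c(NA, diff(x)))   # NA NA 3 5 5
```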
Example:
R
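# creating a data frame; col1 is drawn with sample(),
# so the groups (and the output) differ from run to run
data_frame <- data.frame(col1 = sample(6:9, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, 9, 11, 2, 7, 2))
print("Original DataFrame")
print(data_frame)
# computing the difference from the previous row within each group;
# c(NA, diff(x)) keeps each group's result the same length as its input
data_frame$diff <- ave(data_frame$col3, factor(data_frame$col1),
                       FUN = function(x) c(NA, diff(x)))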
print ("Modified DataFrame")
print (data_frame)
Output
[1] "Original DataFrame"
col1 col2 col3
1 9 a 1
2 9 b 4
3 6 c 5
4 7 a 1
5 6 b 9
6 7 c 11
7 9 a 2
8 9 b 7
9 9 c 2
[1] "Modified DataFrame"
col1 col2 col3 diff
1 9 a 1 NA
2 9 b 4 3
3 6 c 5 NA
4 7 a 1 NA
5 6 b 9 4
6 7 c 11 10
7 9 a 2 -2
8 9 b 7 5
9 9 c 2 -5