Count number of rows within each group in R DataFrame
Last Updated: 30 May, 2021
A DataFrame in the R programming language may contain columns whose values are not all unique. Rows that share the same value, or the same combination of values across several columns, can be treated as one group. The number of rows in each group can then be counted using packages such as dplyr and data.table, or with the base R aggregate() function.
Method 1 : Using dplyr package
The “dplyr” package in R provides functions for data manipulation and transformation. Several functions from this package can be used to count the rows in each group.
- Using tally() and group_by() method
The group_by() method in R categorizes data into groups based on a single column or on several columns. Every unique combination of values that occurs in the given columns forms one group.
Syntax:
group_by(args .. )
Where args is a sequence of columns to group the data by.
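To see what group_by() does on its own, the following minimal sketch groups a small, made-up dataframe and inspects the result with dplyr's n_groups() and group_keys(). The names df, by_one and by_two are illustrative and not part of the examples in this article. Only the combinations that actually occur in the data become groups.

```r
library(dplyr)

# Hypothetical data: col1 has two distinct values, col2 has two distinct values
df <- data.frame(col1 = c(1, 1, 2, 2, 2),
                 col2 = c("a", "a", "a", "b", "b"))

# Group by one column, then by two columns
by_one <- df %>% group_by(col1)
by_two <- df %>% group_by(col1, col2)

# n_groups() reports how many groups were formed
n_groups(by_one)
n_groups(by_two)

# group_keys() lists the combination of values that defines each group
group_keys(by_two)
```

Here by_two has three groups, (1, a), (2, a) and (2, b), even though two values of col1 and two values of col2 could form four combinations. Grouping alone does not change the rows; it only records how later steps such as tally() should split them.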
The tally() method in R summarizes the data by counting the number of rows in each group. Applying group_by() followed by tally() returns a table in which the grouping columns appear in the order given to group_by(), followed by a column ‘n’ containing the frequency count of each group.
On a grouped dataframe the result is a tibble, so the printed output also shows the class of each column (for example <int>) below its name.
Example:
R
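# Load dplyr for group_by(), tally() and the %>% pipe
library("dplyr")
# col2 has 3 values and is recycled to match the 9 values of col1
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3])
print("Original DataFrame")
print(data_frame)
# Count the rows in each group of col1
data_frame %>% group_by(col1) %>% tally()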
Output
[1] "Original DataFrame"
  col1 col2
1    1    a
2    1    b
3    1    c
4    2    a
5    2    b
6    2    c
7    3    a
8    3    b
9    3    c
# A tibble: 3 x 2
   col1     n
  <int> <int>
1     1     3
2     2     3
3     3     3
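As a cross-check, tally() on grouped data gives the same counts as summarise(n = n()), which can be sketched as follows. The names with_tally and with_summarise are illustrative only.

```r
library(dplyr)

# Same data as in the example above
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3])

# Count rows per group in two ways
with_tally <- data_frame %>% group_by(col1) %>% tally()
with_summarise <- data_frame %>% group_by(col1) %>% summarise(n = n())

# Both results hold one row per group with the same counts
identical(with_tally$n, with_summarise$n)
```

Writing the count as summarise(n = n()) can be convenient when other summaries, such as a mean, are needed in the same step.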
- Using dplyr::count() method
The count() method is applied to a dataframe together with one or more of its columns and returns the frequency count of each group formed by those columns. The result contains only the columns passed to count(), in the order given, followed by a column ‘n’ holding the counts; the other columns of the original dataframe are dropped.
Syntax:
count(args .. )
Where args is a sequence of columns to group the data by.
Example:
R
library("dplyr")
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))
print("Original DataFrame")
print(data_frame)
print("Modified DataFrame")
# Count the rows for each combination of col1 and col3
data_frame %>% dplyr::count(col1, col3)
Output:
[1] "Original DataFrame"
  col1 col2 col3
1    1    a    1
2    1    b    4
3    1    c    1
4    2    a    2
5    2    b    2
6    2    c    3
7    3    a    1
8    3    b    2
9    3    c    2
[1] "Modified DataFrame"
  col1 col3 n
1    1    1 2
2    1    4 1
3    2    2 2
4    2    3 1
5    3    1 1
6    3    2 2
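count() also accepts a sort argument, which orders the groups from most to least frequent, and a name argument, which sets the name of the count column. A minimal sketch on the same data, where counts and the column name rows_in_group are illustrative choices:

```r
library(dplyr)

data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))

# sort = TRUE puts the largest groups first;
# name = "rows_in_group" replaces the default column name n
counts <- data_frame %>%
  dplyr::count(col1, col3, sort = TRUE, name = "rows_in_group")

print(counts)
```

A descriptive column name helps when the dataframe already has a column called n, or when the counts are joined back to other tables.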
Method 2 : Using data.table package
The data.table package in R stores and manipulates data in an organized tabular structure. Inside the square brackets of a data.table, the special symbol .N holds the number of rows in each group, so it can be used to count how often each combination of the specified columns occurs. The grouping columns are passed to the “by” argument using the list() method in R, which plays the same role as group_by() in dplyr.
Syntax:
data_table[, .N, by = list(cols..)]
Example:
R
library(data.table)
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))
print("Original DataFrame")
print(data_frame)
print("Modified DataFrame")
# Convert the dataframe to a data.table
data_table <- data.table(data_frame)
# .N is the number of rows in each (col1, col3) group
data_table[, .N, by = list(col1, col3)]
Output
[1] "Original DataFrame"
  col1 col2 col3
1    1    a    1
2    1    b    4
3    1    c    1
4    2    a    2
5    2    b    2
6    2    c    3
7    3    a    1
8    3    b    2
9    3    c    2
[1] "Modified DataFrame"
   col1 col3 N
1:    1    1 2
2:    1    4 1
3:    2    2 2
4:    2    3 1
5:    3    1 1
6:    3    2 2
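The grouping columns can also be written with the .() shorthand for list() or as a character vector, and the count column can be given its own name. A minimal sketch on the same data, where counts_dot, counts_chr, counts_named and row_count are illustrative names:

```r
library(data.table)

data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))
data_table <- as.data.table(data_frame)

# .() is an alias for list() inside a data.table
counts_dot <- data_table[, .N, by = .(col1, col3)]

# Column names can also be passed as a character vector
counts_chr <- data_table[, .N, by = c("col1", "col3")]

# Wrapping .N in .() lets the count column be named explicitly
counts_named <- data_table[, .(row_count = .N), by = .(col1, col3)]

print(counts_named)
```

The character-vector form is handy when the grouping columns are only known at run time, for example when they are stored in a variable.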
Method 3 : Using aggregate method
The aggregate() method in the R programming language is a generic function used to compute summary statistics over subsets of both time series and dataframes.
Syntax:
aggregate(formula, data, FUN)
Parameters :
- formula : a formula such as y ~ x, where the y variables are numeric data to be split into groups according to the grouping x variables
- data : the dataframe containing the variables named in the formula
- FUN : the function to be applied to each group
The function applied here is length, which counts the number of rows in each group. aggregate() finds every combination of the grouping columns on the right-hand side of the formula that occurs in the data and returns each one with its frequency. The count column keeps the name of the left-hand variable, which is col3 in the example below.
Example:
R
# col1 is drawn at random, so its values and the counts change between runs
data_frame <- data.frame(col1 = sample(1:2, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))
print("Original DataFrame")
print(data_frame)
print("keeping a count of all groups")
# Count the rows for each combination of col1 and col2
data_mod <- aggregate(col3 ~ col1 + col2,
                      data = data_frame,
                      FUN = length)
print(data_mod)
Output
[1] "Original DataFrame"
  col1 col2 col3
1    2    a    1
2    2    b    4
3    1    c    1
4    1    a    2
5    1    b    2
6    2    c    3
7    2    a    1
8    2    b    2
9    1    c    2
[1] "keeping a count of all groups"
  col1 col2 col3
1    1    a    1
2    2    a    2
3    1    b    1
4    2    b    2
5    1    c    2
6    2    c    1
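Because aggregate() names the count column after the left-hand variable, the result can be easier to read once that column is renamed. A minimal sketch, assuming an arbitrary seed (not used in the example above) so that the random col1 is repeatable, with counts as an illustrative name:

```r
# Arbitrary seed, assumed here only so the sketch gives the same data each run
set.seed(42)
data_frame <- data.frame(col1 = sample(1:2, 9, replace = TRUE),
                         col2 = rep(letters[1:3], times = 3),
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))

# Count the rows for each combination of col1 and col2
counts <- aggregate(col3 ~ col1 + col2, data = data_frame, FUN = length)

# The count column is called col3; rename it to n to avoid confusion
names(counts)[names(counts) == "col3"] <- "n"
print(counts)

# Sanity check: the group counts add up to the number of rows
sum(counts$n) == nrow(data_frame)
```

The final check is a quick way to confirm that every row was assigned to exactly one group.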