How to Use Gather Function in R
Last Updated: 27 May, 2024
In data analysis and manipulation, it is often necessary to reshape a dataset before it can be analyzed or plotted. The gather() function, part of the tidyr package for the R Programming Language, reshapes data from wide to long format. This article explains how gather() works and walks through several examples.
What is the Gather Function?
The gather() function is used to transform wide datasets into long datasets, making it easier to work with and analyze the data. It takes multiple columns and collapses them into key-value pairs, resulting in a dataset with fewer columns and more rows.
The syntax of the gather() function is as follows:
Syntax:
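gather(data, key = "key", value = "value", ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
- data: The input data frame.
- key: The name of the new key column, which will hold the names of the gathered columns.
- value: The name of the new value column, which will hold the values from the gathered columns.
- ...: Columns to gather. List several columns separated by commas, give a range such as A:C, or exclude columns with a minus sign such as -ID.
- na.rm: Logical value indicating whether to drop rows whose gathered value is NA. Default is FALSE.
- convert: Logical value indicating whether to run type.convert() on the key column, which is useful when the gathered column names are really numbers or logical values. Default is FALSE.
- factor_key: Logical value indicating whether to store the key column as a factor, preserving the original column order, rather than as a character vector. Default is FALSE.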
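The optional arguments are easier to see on a small case. The following is a minimal sketch, assuming a hypothetical data frame named scores whose column names are years and which contains one missing value. It also uses factor_key, a further gather() argument that stores the key column as a factor.

```r
library(tidyr)

# Hypothetical data: the column names are years and one score is missing
scores <- data.frame(
  ID = 1:2,
  `2019` = c(5, NA),
  `2020` = c(7, 8),
  check.names = FALSE  # keep "2019" and "2020" as the column names
)

# na.rm = TRUE drops the rows whose gathered value is NA
no_na <- gather(scores, key = "Year", value = "Score", -ID, na.rm = TRUE)

# convert = TRUE runs type.convert() on the key column,
# so the year labels are stored as integers rather than strings
converted <- gather(scores, key = "Year", value = "Score", -ID, convert = TRUE)

# factor_key = TRUE stores the key as a factor in the original column order
as_factor <- gather(scores, key = "Year", value = "Score", -ID, factor_key = TRUE)

nrow(no_na)            # 3 rows remain out of 4
class(converted$Year)  # "integer"
levels(as_factor$Year)
```

Note that convert acts on the key column only; it does not change the type of the value column.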
Let's walk through some examples to better understand how the gather() function works.
R
# Load the tidyr package
library(tidyr)
# Create sample wide dataset
wide_data <- data.frame(
ID = 1:3,
A = c(10, 20, 30),
B = c(15, 25, 35),
C = c(12, 22, 32)
)
# Print the wide dataset
print(wide_data)
# Gather the data into long format
long_data <- gather(wide_data, key = "Variable", value = "Value", -ID)
# Print the long format data
print(long_data)
Output:
  ID  A  B  C
1  1 10 15 12
2  2 20 25 22
3  3 30 35 32
  ID Variable Value
1  1        A    10
2  2        A    20
3  3        A    30
4  1        B    15
5  2        B    25
6  3        B    35
7  1        C    12
8  2        C    22
9  3        C    32
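The gather() function has transformed the data from wide format to long format: the three columns A, B, and C have been stacked into nine rows, with the original column name in Variable and its value in Value. Many analyses and plotting functions expect data in this shape.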
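The example above selects the columns to gather by excluding ID. As a brief sketch, the same columns can also be named as a range or listed one by one; the three object names below are only illustrative.

```r
library(tidyr)

wide_data <- data.frame(
  ID = 1:3,
  A = c(10, 20, 30),
  B = c(15, 25, 35),
  C = c(12, 22, 32)
)

# Three ways of choosing the same columns to gather
by_exclusion <- gather(wide_data, key = "Variable", value = "Value", -ID)
by_range     <- gather(wide_data, key = "Variable", value = "Value", A:C)
by_name      <- gather(wide_data, key = "Variable", value = "Value", A, B, C)

# Compare the results; all three calls should produce the same long data
all.equal(by_exclusion, by_range)
all.equal(by_exclusion, by_name)
```

Excluding the identifier columns is usually the shortest form when most columns are being gathered, while naming columns explicitly is safer if new columns may be added to the data later.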
Gathering Multiple Variables
Next, we use gather() on a dataset in which the same measurement, age, is recorded in a separate column for each year.
R
# Create sample wide dataset
wide_data <- data.frame(
ID = 1:3,
Age_2019 = c(25, 30, 35),
Age_2020 = c(26, 31, 36),
Age_2021 = c(27, 32, 37)
)
wide_data
# Gather the data into long format
long_data <- gather(wide_data, key = "Year", value = "Age", -ID)
# Print the long format data
print(long_data)
Output:
  ID Age_2019 Age_2020 Age_2021
1  1       25       26       27
2  2       30       31       32
3  3       35       36       37
  ID     Year Age
1  1 Age_2019  25
2  2 Age_2019  30
3  3 Age_2019  35
4  1 Age_2020  26
5  2 Age_2020  31
6  3 Age_2020  36
7  1 Age_2021  27
8  2 Age_2021  32
9  3 Age_2021  37
This example gathers three columns (Age_2019, Age_2020, Age_2021) into long format. The "Year" column holds the original column names, so its values are strings such as "Age_2019" rather than bare years, and the "Age" column holds the corresponding ages.
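Because the key column keeps the full column names, a common follow-up step is to strip the prefix and store the year as a number. A minimal sketch using base R's sub(), assuming every gathered column name starts with "Age_":

```r
library(tidyr)

wide_data <- data.frame(
  ID = 1:3,
  Age_2019 = c(25, 30, 35),
  Age_2020 = c(26, 31, 36),
  Age_2021 = c(27, 32, 37)
)

long_data <- gather(wide_data, key = "Year", value = "Age", -ID)

# Remove the "Age_" prefix and store what is left as an integer year
long_data$Year <- as.integer(sub("^Age_", "", long_data$Year))

head(long_data, 3)
#   ID Year Age
# 1  1 2019  25
# 2  2 2019  30
# 3  3 2019  35
```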
Gathering Categorical Data
gather() also works on categorical (character) columns, as the next example shows.
R
# Create sample wide dataset
wide_data <- data.frame(
ID = 1:3,
Gender = c("Male", "Female", "Male"),
Ethnicity = c("Asian", "Caucasian", "African American")
)
wide_data
# Gather the data into long format
long_data <- gather(wide_data, key = "Category", value = "Value", -ID)
# Print the long format data
print(long_data)
Output:
  ID Gender        Ethnicity
1  1   Male            Asian
2  2 Female        Caucasian
3  3   Male African American
  ID  Category            Value
1  1    Gender             Male
2  2    Gender           Female
3  3    Gender             Male
4  1 Ethnicity            Asian
5  2 Ethnicity        Caucasian
6  3 Ethnicity African American
This example gathers two categorical variables (Gender, Ethnicity) into long format, where the "Category" column contains the variable names and the "Value" column contains the corresponding values.
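All gathered values end up in a single column, so they must share one type. The sketch below, which assumes a hypothetical data frame named mixed with one numeric and one character column, shows that the numbers are stored as text once the columns are gathered together.

```r
library(tidyr)

# Hypothetical data with one numeric and one character column
mixed <- data.frame(
  ID = 1:2,
  Height = c(170, 182),
  Gender = c("Male", "Female"),
  stringsAsFactors = FALSE  # keep Gender as character on older R versions
)

long_mixed <- gather(mixed, key = "Category", value = "Value", -ID)

# The value column is character, so the heights are now strings
class(long_mixed$Value)
long_mixed
```

If the numeric values are needed for calculations later, it is usually simpler to gather numeric and categorical columns in separate steps.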
Conclusion
The gather() function in R is a powerful tool for reshaping wide datasets into long format, making them easier to analyze and work with. By understanding its syntax and usage through examples, you can effectively manipulate your data to suit your analytical needs.
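One practical note: the tidyr documentation describes gather() as superseded and recommends pivot_longer() for new code, although gather() continues to work. As a rough sketch, the first example in this article could be written with pivot_longer() as follows.

```r
library(tidyr)

wide_data <- data.frame(
  ID = 1:3,
  A = c(10, 20, 30),
  B = c(15, 25, 35),
  C = c(12, 22, 32)
)

# cols selects the columns to reshape; names_to and values_to
# play the roles of gather()'s key and value arguments
long_pivot <- pivot_longer(
  wide_data,
  cols = -ID,
  names_to = "Variable",
  values_to = "Value"
)

long_pivot
```

The result holds the same nine ID, Variable, and Value combinations, but it is returned as a tibble and ordered row by row (all variables for ID 1, then ID 2, and so on) rather than column by column as with gather().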