How to Perform Inner Join in R
Last Updated: 23 May, 2024
When working with multiple datasets in R, it is often necessary to combine them on common keys or variables to derive meaningful insights. An inner join is one of the fundamental data manipulation operations: it merges two datasets and keeps only the rows whose key values match in both. In this article, we will explore the inner join operation in the R programming language.
There are two main ways to perform an inner join in R:
- merge() function from base R
- inner_join() function from dplyr
An inner join in R can be carried out with either the merge() function from base R or the inner_join() function from the dplyr package. Here is a detailed explanation of the syntax for both approaches:
merge() function from base R
The merge() function in base R is a powerful tool for combining two data frames by columns when they have one or more common variables (similar to SQL joins).
Syntax: merged_df <- merge(x = dataframe1,
y = dataframe2,
by = "common_key",
all = FALSE)
Where:
- x and y: The input data frames to be merged. x is the left data frame and y is the right data frame.
- by: The column(s) present in both data frames that act as the key(s) for the merge.
- all: An optional logical value. The default, FALSE, keeps only the rows whose key matches in both data frames (an inner join); TRUE keeps all rows from both data frames (a full outer join).
Let's explain the inner join operation with a practical example:
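When the key column has a different name in each data frame, merge() also accepts by.x and by.y in place of by. The following is a minimal sketch of that case; the data frames people and ages and the column name PersonID are made up for illustration and do not appear elsewhere in this article.

```r
# Hypothetical data: the key is "ID" on the left and "PersonID" on the right
people <- data.frame(ID = c(1, 2, 3),
                     Name = c("Ann", "Ben", "Cy"))
ages <- data.frame(PersonID = c(2, 3, 4),
                   Age = c(30, 25, 40))

# by.x names the key column in x, by.y names the key column in y.
# all = FALSE (the default) keeps only rows whose keys match in both inputs.
joined <- merge(x = people, y = ages,
                by.x = "ID", by.y = "PersonID",
                all = FALSE)
print(joined)
```

In the result, the key column takes its name from x, so it appears as ID.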
R
# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
Name = c("Johny", "Ali", "Boby", "Emilya"))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
Age = c(30, 25, 40, 35),
Gender = c("Male", "Female", "Male", "Female"))
df2
Output:
ID Name
1 1 Johny
2 2 Ali
3 3 Boby
4 4 Emilya
ID Age Gender
1 2 30 Male
2 3 25 Female
3 4 40 Male
4 5 35 Female
Inner Join using merge()
Here we perform an inner join on a single column, ID.
R
# Perform an inner join on the "ID" column
merged_df <- merge(df1, df2, by = "ID", all = FALSE)
# Output the merged data frame
print(merged_df)
Output:
ID Name Age Gender
1 2 Ali 30 Male
2 3 Boby 25 Female
3 4 Emilya 40 Male
This will produce a data frame containing only the rows where the "ID" column matches in both data frames.
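One way to sanity-check an inner join is to compare the joined keys with the intersection of the two ID columns. The sketch below rebuilds the same two data frames so that it runs on its own; the names merged_check and common_ids are illustrative only.

```r
# Same example data as above
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))

# Inner join; all is left at its default of FALSE
merged_check <- merge(df1, df2, by = "ID")

# IDs that occur in both inputs
common_ids <- intersect(df1$ID, df2$ID)

# TRUE when the joined IDs are exactly the common IDs
print(setequal(merged_check$ID, common_ids))
```

If the check prints FALSE on your own data, a likely cause is that the key columns differ in type or formatting (for example, numbers stored as text in one data frame).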
Inner Join on Multiple Columns
Here we perform an inner join on multiple columns, ID and Name.
R
# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
Name = c("Ali", "Boby", "Emilya", "Jhonathan"),
Age = c(30, 25, 40, 35),
Gender = c("Male", "Female", "Male", "Female"))
# Perform an inner join on the "ID" and "Name" columns
merged_df <- merge(df1, df2, by = c("ID", "Name"), all = FALSE)
# Output the merged data frame
print(merged_df)
Output:
ID Name Age Gender
1 2 Ali 30 Male
2 3 Boby 25 Female
3 4 Emilya 40 Male
This merges the data frames on both the "ID" and "Name" columns, so a row is kept only when both values match. Adjust the all parameter as needed to perform other types of joins.
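As a rough illustration of how the all family of arguments changes the join type, the sketch below runs merge() four times on the single-key example data and counts the rows returned. The variable names inner_df, left_df, right_df, full_df and row_counts are chosen here for illustration.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35))

inner_df <- merge(df1, df2, by = "ID")                # inner join (all = FALSE)
left_df  <- merge(df1, df2, by = "ID", all.x = TRUE)  # keep every row of df1
right_df <- merge(df1, df2, by = "ID", all.y = TRUE)  # keep every row of df2
full_df  <- merge(df1, df2, by = "ID", all = TRUE)    # keep rows from both

# Number of rows produced by each join type
row_counts <- sapply(list(inner = inner_df, left = left_df,
                          right = right_df, full = full_df), nrow)
print(row_counts)
```

Rows without a match on the other side are filled with NA in the left, right and full joins, which is why only the inner join is free of missing values here.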
inner_join() function from dplyr
The dplyr package provides the inner_join() function for performing an inner join in R. Here is its basic syntax:
Syntax: merged_df <- inner_join(dataframe1,
dataframe2,
by = "common_key")
Where,
- dataframe1 and dataframe2: The input data frames to be merged.
- by: The column(s) in the data frames that are used as the key(s) for the join.
Inner Join on a Single Column
R
# Load the dplyr package
library(dplyr)
# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
Name = c("John", "Alice", "Bob", "Emily"))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
Age = c(30, 25, 40, 35),
Gender = c("Male", "Female", "Male", "Female"))
df2
# Perform a simple inner join on the "ID" column using dplyr
merged_df_simple <- inner_join(df1, df2, by = "ID")
# Output the merged data frame
print(merged_df_simple)
Output:
ID Name
1 1 John
2 2 Alice
3 3 Bob
4 4 Emily
ID Age Gender
1 2 30 Male
2 3 25 Female
3 4 40 Male
4 5 35 Female
ID Name Age Gender
1 2 Alice 30 Male
2 3 Bob 25 Female
3 4 Emily 40 Male
In this example, we join the data frames on the ID column; only the rows with IDs 2, 3 and 4, which appear in both data frames, are kept.
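If the key column is named differently in the two data frames, inner_join() accepts a named character vector for by, where the name refers to the column in the first data frame and the value to the column in the second. This is a minimal sketch; the data frames people and ages and the column PersonID are hypothetical.

```r
library(dplyr)

# Hypothetical data: the key is "ID" on the left and "PersonID" on the right
people <- data.frame(ID = c(1, 2, 3),
                     Name = c("Ann", "Ben", "Cy"))
ages <- data.frame(PersonID = c(2, 3, 4),
                   Age = c(30, 25, 40))

# Match people$ID against ages$PersonID
joined <- inner_join(people, ages, by = c("ID" = "PersonID"))
print(joined)
```

The key column in the result keeps the name from the first data frame, ID.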
Inner Join on Multiple Columns
R
# Load the dplyr package
library(dplyr)
# Create two example data frames with overlapping and non-overlapping columns
df1 <- data.frame(ID = c(1, 2, 3, 4),
Name = c("John", "Alice", "Bob", "Emily"),
Age = c(30, 25, 40, 35))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
Name = c("Alice", "Bob", "Emily", "Eve"),
Age = c(25, 40, 35, 45), # Adjusted Age values for matches
Gender = c("Male", "Female", "Male", "Female"))
df2
# Perform an inner join on multiple columns using dplyr
merged_df_multiple <- inner_join(df1, df2, by = c("ID", "Name", "Age"))
# Output the merged data frame
print(merged_df_multiple)
Output:
ID Name Age
1 1 John 30
2 2 Alice 25
3 3 Bob 40
4 4 Emily 35
ID Name Age Gender
1 2 Alice 25 Male
2 3 Bob 40 Female
3 4 Emily 35 Male
4 5 Eve 45 Female
ID Name Age Gender
1 2 Alice 25 Male
2 3 Bob 40 Female
3 4 Emily 35 Male
In this example, we performed an inner join on the "ID", "Name", and "Age" columns simultaneously, so a row is kept only when all three values match. Adjust the by parameter to match the columns you want to join on.
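The choice of by matters when the data frames share columns that are not used as keys. In the sketch below, which reuses the example names but drops the Age column for brevity, both data frames contain Name yet the join is on ID only. dplyr then keeps both copies and distinguishes them with its default .x and .y suffixes. The variable name joined_on_id is illustrative.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Name = c("Alice", "Bob", "Emily", "Eve"),
                  Gender = c("Male", "Female", "Male", "Female"))

# Join on ID only; Name exists in both inputs but is not a key
joined_on_id <- inner_join(df1, df2, by = "ID")

# The shared non-key column appears twice, as Name.x and Name.y
print(names(joined_on_id))
```

Adding "Name" to by, as in the example above, avoids the duplicated column when the values are expected to agree.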
Conclusion
Inner join is a core operation in R-based analytics for combining datasets on common key fields or variables. Knowing the syntax of merge() and inner_join(), and how each one selects its key columns, lets you integrate data from different sources and analyze it together. Whether you are building predictive models, doing business intelligence, or carrying out exploratory data analysis, correct use of inner joins is important for effective data manipulation.