How to Perform Inner Join in R

Last Updated : 23 May, 2024

When working with multiple datasets in R, you often need to combine them on common keys or variables. An inner join is a fundamental data manipulation operation that merges two datasets and keeps only the rows whose key values appear in both. In this article, we will explore the inner join operation in the R programming language.

There are two main methods available:

  1. merge() function from base R
  2. inner_join() function from dplyr

An inner join in R can be carried out with either the merge() function from base R or the inner_join() function from dplyr. The syntax for both approaches is explained below.

merge() function from base R

The merge() function in base R is a powerful tool for combining two data frames by columns when they have one or more common variables (similar to SQL joins).

Syntax: merged_df <- merge(x = dataframe1, y = dataframe2, by = "common_key", all = FALSE)

Where:

  • x and y: The input data frames to be merged. x is the left data frame and y is the right data frame.
  • by: The column(s) or variable(s) in the input data frames that act as the key(s) for the merge.
  • all: An optional logical value. The default, FALSE, keeps only the rows whose keys match in both data frames, which is an inner join. TRUE keeps all rows from both data frames (a full outer join).

Let's illustrate the inner join operation with a practical example:

R
# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))
df2

Output:

  ID   Name
1  1  Johny
2  2    Ali
3  3   Boby
4  4 Emilya

  ID Age Gender
1  2  30   Male
2  3  25 Female
3  4  40   Male
4  5  35 Female

Inner Join using merge()

Here we perform an inner join on a single column, ID.

R
# Perform an inner join on the "ID" column
merged_df <- merge(df1, df2, by = "ID", all = FALSE)

# Output the merged data frame
print(merged_df)

Output:

  ID   Name Age Gender
1  2    Ali  30   Male
2  3   Boby  25 Female
3  4 Emilya  40   Male

This will produce a data frame containing only the rows where the "ID" column matches in both data frames.
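When the key column has a different name in each data frame, merge() accepts by.x and by.y in place of by. The following is a minimal sketch under that assumption; the data frames people and details and the column PersonID are hypothetical names used only for illustration.

```r
# Hypothetical data: the key is "ID" in one frame and "PersonID" in the other
people <- data.frame(ID = c(1, 2, 3, 4),
                     Name = c("Johny", "Ali", "Boby", "Emilya"))
details <- data.frame(PersonID = c(2, 3, 4, 5),
                      Age = c(30, 25, 40, 35))

# by.x names the key column in x, by.y names the key column in y
joined <- merge(people, details, by.x = "ID", by.y = "PersonID")

# The key column in the result takes its name from x ("ID")
print(joined)
```

Only IDs 2, 3, and 4 appear in both data frames, so the result has three rows.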

Inner Join on Multiple Columns

Here we perform an inner join on multiple columns, ID and Name.

R
# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Name = c("Ali", "Boby", "Emilya", "Jhonathan"),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))

# Perform an inner join on the "ID" and "Name" columns
merged_df <- merge(df1, df2, by = c("ID", "Name"), all = FALSE)

# Output the merged data frame
print(merged_df)

Output:

  ID   Name Age Gender
1  2    Ali  30   Male
2  3   Boby  25 Female
3  4 Emilya  40   Male

This will merge the data frames based on both "ID" and "Name" columns. Adjust the all parameter as needed to perform other types of joins.
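To see how the all parameter changes the join type, the sketch below compares row counts for the same pair of data frames. It uses a reduced version of the example data (the Gender column is left out for brevity), and the all.x and all.y arguments are the base R options for left and right joins.

```r
# Reduced example data: IDs 2, 3, and 4 appear in both frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35))

inner_df <- merge(df1, df2, by = "ID")                # all = FALSE is the default
left_df  <- merge(df1, df2, by = "ID", all.x = TRUE)  # keep every row of df1
right_df <- merge(df1, df2, by = "ID", all.y = TRUE)  # keep every row of df2
full_df  <- merge(df1, df2, by = "ID", all = TRUE)    # keep every row of both

# Row counts: inner 3, left 4, right 4, full 5
print(c(inner = nrow(inner_df), left = nrow(left_df),
        right = nrow(right_df), full = nrow(full_df)))
```

Rows without a match on the other side are filled with NA, which is why the non-inner joins return more rows.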

Using dplyr to Perform Inner Join in R

The dplyr package provides the inner_join() function for performing an inner join in R. Its basic syntax is:

Syntax: merged_df <- inner_join(dataframe1, dataframe2, by = "common_key")

Where,

  • dataframe1 and dataframe2: The input data frames to be joined.
  • by: The column(s) or variable(s) in the data frames that are used as keys for the join.
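The by parameter can also take a named character vector when the key columns are named differently in the two data frames. The sketch below assumes dplyr is installed; the data frames employees and records and the column PersonID are hypothetical names chosen for illustration.

```r
library(dplyr)

# Hypothetical data: the key is "ID" on the left and "PersonID" on the right
employees <- data.frame(ID = c(1, 2, 3, 4),
                        Name = c("John", "Alice", "Bob", "Emily"))
records <- data.frame(PersonID = c(2, 3, 4, 5),
                      Age = c(30, 25, 40, 35))

# The name (left of "=") is the column in the first data frame,
# the value (right of "=") is the matching column in the second
joined <- inner_join(employees, records, by = c("ID" = "PersonID"))

# The result keeps the key name from the first data frame ("ID")
print(joined)
```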

Inner Join on a Single Column

R
# Load the dplyr package
library(dplyr)

# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))
df2
# Perform a simple inner join on the "ID" column using dplyr
merged_df_simple <- inner_join(df1, df2, by = "ID")

# Output the merged data frame
print(merged_df_simple)

Output:

  ID  Name
1  1  John
2  2 Alice
3  3   Bob
4  4 Emily

  ID Age Gender
1  2  30   Male
2  3  25 Female
3  4  40   Male
4  5  35 Female

  ID  Name Age Gender
1  2 Alice  30   Male
2  3   Bob  25 Female
3  4 Emily  40   Male

The first two tables are df1 and df2, and the third is the joined result. Here we join the data frames on the ID column.
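One practical difference between the two approaches is row order: merge() sorts the result by the key columns by default, while inner_join() keeps the row order of its first argument. The sketch below illustrates this with hypothetical data frames (left_df and right_df) whose IDs are deliberately out of order; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: the IDs in left_df are deliberately unsorted
left_df <- data.frame(ID = c(4, 2, 3, 1),
                      Name = c("Emily", "Alice", "Bob", "John"))
right_df <- data.frame(ID = c(2, 3, 4, 5),
                       Age = c(30, 25, 40, 35))

base_result  <- merge(left_df, right_df, by = "ID")       # sorted by ID
dplyr_result <- inner_join(left_df, right_df, by = "ID")  # order of left_df

print(base_result$ID)
print(dplyr_result$ID)
```

Both results contain the same three matching rows; only their order differs. Passing sort = FALSE to merge() turns off the sorting, although the resulting order is then unspecified.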

Inner Join on Multiple Columns

R
# Load the dplyr package
library(dplyr)

# Create two example data frames with overlapping and non-overlapping columns
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"),
                  Age = c(30, 25, 40, 35))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Name = c("Alice", "Bob", "Emily", "Eve"),
                  Age = c(25, 40, 35, 45), # Adjusted Age values for matches
                  Gender = c("Male", "Female", "Male", "Female"))
df2
# Perform an inner join on multiple columns using dplyr
merged_df_multiple <- inner_join(df1, df2, by = c("ID", "Name", "Age"))

# Output the merged data frame
print(merged_df_multiple)

Output:

  ID  Name Age
1  1  John  30
2  2 Alice  25
3  3   Bob  40
4  4 Emily  35

  ID  Name Age Gender
1  2 Alice  25   Male
2  3   Bob  40 Female
3  4 Emily  35   Male
4  5   Eve  45 Female

  ID  Name Age Gender
1  2 Alice  25   Male
2  3   Bob  40 Female
3  4 Emily  35   Male

Here we performed an inner join on the "ID", "Name", and "Age" columns simultaneously, so a row is kept only when all three values match. Adjust the by parameter to match the columns you want to join on.
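If a column exists in both data frames but is left out of by, inner_join() keeps both copies and distinguishes them with suffixes (".x" and ".y" by default). The sketch below reuses the example data but joins only on "ID" and "Name"; the custom suffix strings "_df1" and "_df2" are arbitrary choices for illustration.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"),
                  Age = c(30, 25, 40, 35))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Name = c("Alice", "Bob", "Emily", "Eve"),
                  Age = c(25, 40, 35, 45),
                  Gender = c("Male", "Female", "Male", "Female"))

# Age is not a key here, so both Age columns are kept as Age.x and Age.y
default_suffix <- inner_join(df1, df2, by = c("ID", "Name"))
print(names(default_suffix))

# The suffix argument replaces the default ".x" / ".y" labels
custom_suffix <- inner_join(df1, df2, by = c("ID", "Name"),
                            suffix = c("_df1", "_df2"))
print(names(custom_suffix))
```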

Conclusion

Inner join is a core operation in R-based analytics for combining datasets on common key fields or variables. Knowing the syntax of merge() and inner_join() lets you bring together data from different sources, analyze it, and extract useful information. Whether you are building predictive models, doing business intelligence, or carrying out exploratory data analysis, using inner joins correctly is essential for effective data manipulation.

