Open In App

Merge, Join and Concatenate DataFrames using Pandas

Last Updated : 25 Nov, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

Dataframe is a two-dimensional data structure having multiple rows and columns. In a Pandas DataFrame, the data is aligned in the form of rows and columns only. A dataframe can perform arithmetic as well as conditional operations. It has a mutable size. This article will show how to join, concatenate, and merge in Pandas.

Python Merge, Join, and Concatenate DataFrames Using Pandas

Below are the different ways and approaches by which we can merge, join, and concatenate in Pandas in Python:

Concatenate DataFrames

concat() function does all of the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes. In this example, three DataFrames (df1, df2, and df3) with identical columns but different indices are created. Using pd.concat(), these Pandas DataFrames are vertically stacked, resulting in a combined DataFrame where rows from each original DataFrame follow one another.

Python
# Creating first dataframe
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

# Creating second dataframe
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])

# Creating third dataframe
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                   index=[8, 9, 10, 11])

# Concatenating the dataframes
pd.concat([df1, df2, df3])

Output:

      A    B    C    D
0    A0   B0   C0   D0
1    A1   B1   C1   D1
2    A2   B2   C2   D2
3    A3   B3   C3   D3
4    A4   B4   C4   D4
5    A5   B5   C5   D5
6    A6   B6   C6   D6
7    A7   B7   C7   D7
8    A8   B8   C8   D8
9    A9   B9   C9   D9
10  A10  B10  C10  D10
11  A11  B11  C11  D11

Merge DataFrames

DataFrames Merge Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects. In this example, two DataFrames (left and right) are created with a common key column ‘Key’. The pd.merge() function merges these DataFrames based on their common ‘Key’ column using an inner join, resulting in a combined DataFrame containing only the rows where the key values match in both DataFrames.

Python
# Dataframe created
left = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Merging the dataframes
pd.merge(left, right, how='inner', on='Key')

Output:

Inner Merge:
  Key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1

Left Merge

The pd.merge() function with how='left' performs a left join, merging the left DataFrame with the right DataFrame on the ‘Key’ column. All rows from the left DataFrame are retained, and matching rows from the right DataFrame are appended with their respective values.

Python
left_merged = pd.merge(left, right, how='left', on='Key')
print(left_merged)

Output:

Left Merge:
  Key   A   B    C    D
0  K0  A0  B0   C0   D0
1  K1  A1  B1   C1   D1
2  K2  A2  B2  NaN  NaN
3  K3  A3  B3  NaN  NaN

Right Merge

The pd.merge() function with how='right' performs a right join, merging the left DataFrame with the right DataFrame on the ‘Key’ column. All rows from the right DataFrame are retained, and matching rows from the left DataFrame are appended with their respective values.

Python
right_merged = pd.merge(left, right, how='right', on='Key')
print(right_merged)

Output:

Right Merge:
  Key    A    B   C   D
0  K0   A0   B0  C0  D0
1  K1   A1   B1  C1  D1
2  K4  NaN  NaN  C4  D4
3  K5  NaN  NaN  C5  D5

Outer Merge

The pd.merge() function with how='outer' performs a full outer join, merging the left DataFrame with the right DataFrame on the ‘Key’ column. This results in a DataFrame that contains all rows from both the left and right DataFrames, filling in missing values with NaN where there isn’t a match.

Python
outer_merged = pd.merge(left, right, how='outer', on='Key')
print(outer_merged)

Output:

Outer Merge:
  Key    A    B    C    D
0  K0   A0   B0   C0   D0
1  K1   A1   B1   C1   D1
2  K2   A2   B2  NaN  NaN
3  K3   A3   B3  NaN  NaN
4  K4  NaN  NaN   C4   D4
5  K5  NaN  NaN   C5   D5

Join DataFrames

In this example, both left and right DataFrames are indexed with keys ‘K0’, ‘K1’, ‘K2’, and ‘K3’. The left.join(right) operation merges the DataFrames on their indices, combining columns ‘A’ and ‘B’ from left with columns ‘C’ and ‘D’ from right based on the shared index.

Python
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']},
                    index=['K0', 'K1', 'K2', 'K3'])

right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']},
                     index=['K0', 'K1', 'K2', 'K3'])

# Joining the dataframes
left.join(right)

Output

     A   B   C   D
K0  A0  B0  C0  D0
K1  A1  B1  C1  D1
K2  A2  B2  C2  D2
K3  A3  B3  C3  D3


Next Article
Practice Tags :

Similar Reads