Edp 3

The document provides a comprehensive guide on merging and concatenating data frames in Pandas, including inner, left, right, and outer joins, as well as merging on multiple columns and different column names. It also covers concatenating data frames both vertically and horizontally, handling different indexes, and creating multi-indexes. Additionally, it discusses reshaping data using pivoting and melting techniques, along with handling missing data in pivot tables.

Uploaded by

ys304123

Merging database-style data frames

Inner Join (Default Merge)

import pandas as pd

# Creating two sample data frames

df1 = pd.DataFrame({

'ID': [1, 2, 3, 4],

'Name': ['Alice', 'Bob', 'Charlie', 'David'],

'Age': [25, 30, 35, 40]

})

df2 = pd.DataFrame({

'ID': [1, 2, 3],

'Salary': [50000, 60000, 70000]

})

# Merging the data frames on the 'ID' column

merged_df = pd.merge(df1, df2, on='ID')

print(merged_df)

Output:

   ID     Name  Age  Salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000

Left Join
# Merging the data frames with a left join

merged_df_left = pd.merge(df1, df2, on='ID', how='left')

print(merged_df_left)

Output:

   ID     Name  Age   Salary
0   1    Alice   25  50000.0
1   2      Bob   30  60000.0
2   3  Charlie   35  70000.0
3   4    David   40      NaN

Right Join
# Merging the data frames with a right join

merged_df_right = pd.merge(df1, df2, on='ID', how='right')

print(merged_df_right)

Output:

   ID     Name  Age  Salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000

Outer Join

# Merging the data frames with an outer join

merged_df_outer = pd.merge(df1, df2, on='ID', how='outer')

print(merged_df_outer)

Output:

   ID     Name  Age   Salary
0   1    Alice   25  50000.0
1   2      Bob   30  60000.0
2   3  Charlie   35  70000.0
3   4    David   40      NaN
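To see which side each row of an outer join came from, merge() accepts indicator=True. A minimal sketch, reusing the sample frames from the join examples; the names checked and unmatched are only illustrative:

```python
import pandas as pd

# Same sample frames as in the join examples
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
})
df2 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Salary': [50000, 60000, 70000]
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
checked = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

# Rows that exist only in df1 (employees without a salary record)
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['ID', 'Name']])
```

Filtering on '_merge' is a quick way to check why an outer join produced NaN values.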

Merging on Multiple Columns

# Creating two data frames with multiple common columns

df1 = pd.DataFrame({

'ID': [1, 2, 3],

'Department': ['HR', 'Finance', 'IT'],

'Employee': ['Alice', 'Bob', 'Charlie']

})

df2 = pd.DataFrame({

'ID': [1, 2, 3],

'Department': ['HR', 'Finance', 'IT'],

'Salary': [50000, 60000, 70000]

})

# Merging based on both 'ID' and 'Department'

merged_df_multi = pd.merge(df1, df2, on=['ID', 'Department'])

print(merged_df_multi)

Output:

   ID Department Employee  Salary
0   1         HR    Alice   50000
1   2    Finance      Bob   60000
2   3         IT  Charlie   70000
Merging with Different Column Names

# Creating data frames with different column names for the merge

df1 = pd.DataFrame({

'EmployeeID': [1, 2, 3],

'EmployeeName': ['Alice', 'Bob', 'Charlie']

})

df2 = pd.DataFrame({

'ID': [1, 2, 3],

'Salary': [50000, 60000, 70000]

})

# Merging based on different column names

merged_df_diff_names = pd.merge(df1, df2, left_on='EmployeeID', right_on='ID')

print(merged_df_diff_names)

Output:

   EmployeeID EmployeeName  ID  Salary
0           1        Alice   1   50000
1           2          Bob   2   60000
2           3      Charlie   3   70000
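With left_on/right_on both key columns are kept, so EmployeeID and ID carry the same values. A small sketch of dropping the redundant one afterwards; the name tidy is only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    'EmployeeID': [1, 2, 3],
    'EmployeeName': ['Alice', 'Bob', 'Charlie']
})
df2 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, left_on='EmployeeID', right_on='ID')

# 'ID' duplicates 'EmployeeID' after the merge, so it can be dropped
tidy = merged.drop(columns='ID')
print(tidy)
```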

Concatenating along an axis

Concatenating Vertically (Row-wise)

import pandas as pd

# Creating two data frames

df1 = pd.DataFrame({
'ID': [1, 2, 3],

'Name': ['Alice', 'Bob', 'Charlie']

})

df2 = pd.DataFrame({

'ID': [4, 5, 6],

'Name': ['David', 'Eve', 'Frank']

})

# Concatenating along axis=0 (row-wise)

concatenated_df = pd.concat([df1, df2], axis=0, ignore_index=True)

print(concatenated_df)

Output:

   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
3   4    David
4   5      Eve
5   6    Frank

Concatenating Horizontally (Column-wise)

# Creating two data frames with the same index but different columns

df1 = pd.DataFrame({

'ID': [1, 2, 3],

'Name': ['Alice', 'Bob', 'Charlie']

})
df2 = pd.DataFrame({

'Age': [25, 30, 35],

'Salary': [50000, 60000, 70000]

})

# Concatenating along axis=1 (column-wise)

concatenated_df_horizontal = pd.concat([df1, df2], axis=1)

print(concatenated_df_horizontal)

Output:

   ID     Name  Age  Salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000

Concatenating with Different Indexes

# Creating two data frames with different indexes

df1 = pd.DataFrame({

'ID': [1, 2],

'Name': ['Alice', 'Bob']

}, index=[0, 1])

df2 = pd.DataFrame({

'Age': [25, 30],

'Salary': [50000, 60000]

}, index=[1, 2])

# Concatenating along axis=0 (row-wise), handling different indexes

concatenated_df_diff_index = pd.concat([df1, df2], axis=0, ignore_index=True)

print(concatenated_df_diff_index)

Output:

    ID   Name   Age   Salary
0  1.0  Alice   NaN      NaN
1  2.0    Bob   NaN      NaN
2  NaN    NaN  25.0  50000.0
3  NaN    NaN  30.0  60000.0
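The differing index labels only matter when concatenating column-wise, because axis=1 aligns rows by label. A minimal sketch with the same two frames, comparing the default join='outer' with join='inner'; the names outer and inner are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2],
    'Name': ['Alice', 'Bob']
}, index=[0, 1])
df2 = pd.DataFrame({
    'Age': [25, 30],
    'Salary': [50000, 60000]
}, index=[1, 2])

# Default join='outer': union of the index labels (0, 1, 2),
# with NaN where a frame has no row for that label
outer = pd.concat([df1, df2], axis=1)

# join='inner': only labels present in both frames (here just label 1)
inner = pd.concat([df1, df2], axis=1, join='inner')

print(outer)
print(inner)
```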

Concatenating with Keys (Creating a MultiIndex)

# Concatenating with keys to create a hierarchical index

concatenated_df_keys = pd.concat([df1, df2], axis=0, keys=['df1', 'df2'])

print(concatenated_df_keys)

Output:

        ID   Name   Age   Salary
df1 0  1.0  Alice   NaN      NaN
    1  2.0    Bob   NaN      NaN
df2 1  NaN    NaN  25.0  50000.0
    2  NaN    NaN  30.0  60000.0
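The outer level created by keys can be used to pull one source frame back out. A short sketch, assuming the same df1 and df2 as in the previous example; the names combined, first_block and row are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2],
    'Name': ['Alice', 'Bob']
}, index=[0, 1])
df2 = pd.DataFrame({
    'Age': [25, 30],
    'Salary': [50000, 60000]
}, index=[1, 2])

combined = pd.concat([df1, df2], axis=0, keys=['df1', 'df2'])

# Selecting on the outer level returns the rows that came from df1
first_block = combined.loc['df1']

# A (key, label) tuple selects a single row
row = combined.loc[('df2', 2)]

print(first_block)
print(row)
```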

Concatenating with Mismatched Columns

# Creating two data frames with mismatched columns

df1 = pd.DataFrame({

'ID': [1, 2],

'Name': ['Alice', 'Bob']

})

df2 = pd.DataFrame({

'Age': [25, 30],

'Salary': [50000, 60000]

})

# Concatenating along axis=1 (column-wise) with mismatched columns

concatenated_df_mismatched = pd.concat([df1, df2], axis=1)

print(concatenated_df_mismatched)

Output:

   ID   Name  Age  Salary
0   1  Alice   25   50000
1   2    Bob   30   60000

Merging on index
Simple Merge on Index

import pandas as pd

# Creating two data frames with meaningful indexes

df1 = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({

'Salary': [50000, 60000, 70000],

'Department': ['HR', 'Finance', 'IT']

}, index=['a', 'b', 'c'])

# Merging the data frames on the index

merged_df = pd.merge(df1, df2, left_index=True, right_index=True)

print(merged_df)

Output:

      Name  Age  Salary Department
a    Alice   25   50000         HR
b      Bob   30   60000    Finance
c  Charlie   35   70000         IT

Merge on Index with Different Column Names

# Creating two data frames with different column names but same index

df1 = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({

'Salary': [50000, 60000, 70000],

'Department': ['HR', 'Finance', 'IT']

}, index=['a', 'b', 'c'])

# Merging the data frames on index

merged_df_diff_columns = pd.merge(df1, df2, left_index=True, right_index=True)

print(merged_df_diff_columns)

Output:

      Name  Age  Salary Department
a    Alice   25   50000         HR
b      Bob   30   60000    Finance
c  Charlie   35   70000         IT

Merge on Index with how Parameter

# Creating two data frames with different indexes

df1 = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({

'Salary': [50000, 60000],

'Department': ['HR', 'Finance']

}, index=['a', 'b'])

# Merging with 'left' join on the index

merged_left = pd.merge(df1, df2, left_index=True, right_index=True, how='left')

print(merged_left)

Output:

      Name  Age   Salary Department
a    Alice   25  50000.0         HR
b      Bob   30  60000.0    Finance
c  Charlie   35      NaN        NaN

Merge with outer Join on Index

# Merging with an outer join on the index

merged_outer = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')

print(merged_outer)

Output:

      Name  Age   Salary Department
a    Alice   25  50000.0         HR
b      Bob   30  60000.0    Finance
c  Charlie   35      NaN        NaN
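For index-on-index merges, DataFrame.join() is a shorter spelling: it joins on the index and defaults to how='left'. A minimal sketch with the same frames; the names joined and merged are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({
    'Salary': [50000, 60000],
    'Department': ['HR', 'Finance']
}, index=['a', 'b'])

# join() aligns on the index and performs a left join by default
joined = df1.join(df2)

# Equivalent pd.merge() call
merged = pd.merge(df1, df2, left_index=True, right_index=True, how='left')

print(joined)
```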

Merge on Index and Column (Multi-key Merge)

# Creating two data frames with different columns and indexes

df1 = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({

'Salary': [50000, 60000, 70000],

'Department': ['HR', 'Finance', 'IT'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

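# Merging on both the index and the 'Age' column

# pd.merge() raises a MergeError when on= is combined with left_index/right_index,

# so the index is moved into a column ('index'), used as a second key, then restored

merged_df = pd.merge(df1.reset_index(), df2.reset_index(), on=['index', 'Age']).set_index('index').rename_axis(None)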

print(merged_df)

Output:
      Name  Age  Salary Department
a    Alice   25   50000         HR
b      Bob   30   60000    Finance
c  Charlie   35   70000         IT

Reshaping and pivoting

Pivoting Data to Wide Format

import pandas as pd

# Creating a sample data frame

df = pd.DataFrame({

'Date': ['2025-03-01', '2025-03-01', '2025-03-02', '2025-03-02'],

'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],

'Temperature': [58, 70, 60, 72]

})

# Pivoting the data: Dates as rows, cities as columns, and Temperature as values

pivoted_df = df.pivot(index='Date', columns='City', values='Temperature')

print(pivoted_df)

Output:

City        Los Angeles  New York
Date
2025-03-01           70        58
2025-03-02           72        60
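pivot() only reshapes, so each Date/City pair must appear once. A minimal sketch, using a hypothetical frame df_dup (not part of the data above) with a repeated pair, of how pivot() fails and pivot_table() aggregates instead:

```python
import pandas as pd

# Hypothetical data: New York has two readings on the same date
df_dup = pd.DataFrame({
    'Date': ['2025-03-01', '2025-03-01', '2025-03-01'],
    'City': ['New York', 'New York', 'Los Angeles'],
    'Temperature': [58, 62, 70]
})

# pivot() cannot decide which of the two New York values to keep
try:
    df_dup.pivot(index='Date', columns='City', values='Temperature')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# pivot_table() combines the duplicates with the aggregation function
averaged = df_dup.pivot_table(index='Date', columns='City',
                              values='Temperature', aggfunc='mean')
print(averaged)
```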

Melting DataFrames with melt()

# Creating a wide-format data frame

df_wide = pd.DataFrame({

'Date': ['2025-03-01', '2025-03-02'],

'New York': [58, 60],

'Los Angeles': [70, 72]

})

# Melting the data: Convert cities from columns to a single "City" column

melted_df = pd.melt(df_wide, id_vars=['Date'], var_name='City', value_name='Temperature')

print(melted_df)

Output:

         Date         City  Temperature
0  2025-03-01     New York           58
1  2025-03-02     New York           60
2  2025-03-01  Los Angeles           70
3  2025-03-02  Los Angeles           72
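melt() and pivot() undo each other. A short sketch, assuming the same wide frame, that melts it and then pivots it back; the name restored is only illustrative:

```python
import pandas as pd

df_wide = pd.DataFrame({
    'Date': ['2025-03-01', '2025-03-02'],
    'New York': [58, 60],
    'Los Angeles': [70, 72]
})

# Wide to long
melted = pd.melt(df_wide, id_vars=['Date'], var_name='City', value_name='Temperature')

# Long back to wide: one row per Date, one column per City
restored = melted.pivot(index='Date', columns='City', values='Temperature').reset_index()
restored.columns.name = None  # drop the leftover 'City' label on the columns

# pivot() orders the city columns alphabetically, so restore the original order
restored = restored[list(df_wide.columns)]
print(restored)
```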

Stacking DataFrame

# Creating a sample DataFrame

df = pd.DataFrame({

'City': ['New York', 'Los Angeles', 'Chicago'],

'Population': [8175133, 3792621, 2695598],

'Area': [789, 503, 589]

})

# Setting 'City' as the index

df.set_index('City', inplace=True)
# Stacking the DataFrame: Converts columns into a MultiIndex (rows)

stacked_df = df.stack()

print(stacked_df)

Output:

City
New York     Population    8175133
             Area              789
Los Angeles  Population    3792621
             Area              503
Chicago      Population    2695598
             Area              589
dtype: int64

stack(): Converts columns into rows, resulting in a hierarchical index.
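The inverse operation, unstack(), takes a level argument that chooses which index level moves into the columns. A minimal sketch on the same data, moving 'City' instead of the default innermost level; the name by_city is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'Population': [8175133, 3792621, 2695598],
    'Area': [789, 503, 589]
}).set_index('City')

# Columns become the inner level of a two-level row index
stacked = df.stack()

# Moving the 'City' level to the columns gives one column per city
by_city = stacked.unstack(level='City')
print(by_city)
```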

Unstacking DataFrame

# Unstacking the stacked data: Converts rows back to columns

unstacked_df = stacked_df.unstack()

print(unstacked_df)

Output:

             Population  Area
City
New York        8175133   789
Los Angeles     3792621   503
Chicago         2695598   589

Reshaping with pivot_table()

# Creating a sample DataFrame

df = pd.DataFrame({

'Date': ['2025-03-01', '2025-03-01', '2025-03-02', '2025-03-02'],

'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],

'Temperature': [58, 70, 60, 72],

'Humidity': [60, 50, 65, 55]

})

# Using pivot_table to reshape and calculate the average Temperature per Date and City

pivot_table_df = df.pivot_table(index='Date', columns='City', values='Temperature',

aggfunc='mean')

print(pivot_table_df)

Output:

City        Los Angeles  New York
Date
2025-03-01         70.0      58.0
2025-03-02         72.0      60.0

Handling Missing Data with pivot_table()

# Creating a sample DataFrame with missing values

df_with_missing = pd.DataFrame({

'Date': ['2025-03-01', '2025-03-01', '2025-03-02'],

'City': ['New York', 'Los Angeles', 'New York'],

'Temperature': [58, 70, None],

'Humidity': [60, 50, 65]

})

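# Pivoting with missing data and using mean as aggregation function

# dropna=False keeps the 2025-03-02 row even though all of its values are NaN;

# with the default dropna=True that row is omitted from the result

pivot_table_missing = df_with_missing.pivot_table(index='Date', columns='City',

values='Temperature', aggfunc='mean', dropna=False)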

print(pivot_table_missing)

Output:

City        Los Angeles  New York
Date
2025-03-01         70.0      58.0
2025-03-02          NaN       NaN
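Missing cells in the result can be replaced with the fill_value parameter of pivot_table(). A minimal sketch with a hypothetical frame df_gap (not the data above) in which Los Angeles has no reading on 2025-03-02; the 0 is only a placeholder value:

```python
import pandas as pd

# Hypothetical data: no Los Angeles row for 2025-03-02
df_gap = pd.DataFrame({
    'Date': ['2025-03-01', '2025-03-01', '2025-03-02'],
    'City': ['New York', 'Los Angeles', 'New York'],
    'Temperature': [58, 70, 60]
})

# Without fill_value the missing Date/City cell is NaN
with_nan = df_gap.pivot_table(index='Date', columns='City',
                              values='Temperature', aggfunc='mean')

# fill_value replaces the NaN cells in the resulting table
filled = df_gap.pivot_table(index='Date', columns='City',
                            values='Temperature', aggfunc='mean', fill_value=0)

print(with_nan)
print(filled)
```

A placeholder such as 0 is easy to mistake for a real temperature, so leaving NaN in place is often the safer choice for measurements.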
