
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Merge Multiple Excel Files into a Single File with Python
Excel is one of the most popular tools for data analysis and management. Often, we need to merge multiple Excel files into a single file for analysis or sharing with others. Manually merging these files can be time?consuming and prone to errors, especially when dealing with large datasets. Luckily, Python provides an efficient and flexible way to merge multiple Excel files into a single file.
In this article, we will learn how to merge multiple Excel files using Python. We will be using the Pandas library, which is a powerful and easy?to?use data analysis library for Python to merge the different Excel files in Python.
What is Pandas library in Python?
Pandas is a Python library used for working with data sets. It consists of functions for analyzing, cleaning, exploring, and manipulating data. Apart from this, it allows us to analyze big data and make conclusions based on statistical theories. It can also clean messy data sets, and make them readable and relevant.
When merging Excel files with Pandas, we can use the read_excel() function to read the data from each file into a Pandas DataFrame object. Once we have all the data in separate DataFrames, we can use Pandas' built?in functions like merge(), join(), and concat() to combine the data frames into a single DataFrame.
Pandas' merge() function provides the ability to merge DataFrames based on specific columns, similar to the VLOOKUP function in Excel. We can specify which columns to use as keys to join the data frames and the type of join to perform (inner, outer, left, or right).
Steps to merge multiple Excel files into a single file
Step 1: Import the libraries
The first step to merge the different Excel files is to import the required libraries. We will be using Pandas to read and merge the Excel files, and OS to navigate the file system. Below is the code to import these libraries:
import pandas as pd import os
Step 2: Identifying the Files to Merge
The next step is to identify the files that we want to merge. We can either specify the exact file names or use a pattern to match multiple files. In this example, we will use the latter approach, where we will match all files in a specific directory that have the extension ".xlsx". Below is the code to get the list of files to merge:
path = "path/to/files/" files = os.listdir(path) files_xlsx = [f for f in files if f[-4:] == 'xlsx']
Step 3: Reading your Excel Files
After identifying the files to merge, we will now read the different Excel files into Pandas data frames. We can use a loop to read all files into separate data frames and then concatenate them into a single data frame.
The input Excel files are as follows:
#File1.xlsx Product Quantity Price Apple 10 0.5 Banana 5 0.25 #File2.xlsx Product Quantity Price Oranges 8 0.4 Grapes 12 0.3 #File3.xlsx Product Quantity Price Mango 15 0.6 Pineapple 7 0.8
Below is the code to read the Excel files:
df_list = [] for f in files_xlsx: data = pd.read_excel(os.path.join(path, f)) df_list.append(data) df = pd.concat(df_list)
Step 4: Create a new merged Excel file
The final step to merge the files is to write the merged data frame to a new Excel file. We can use the Pandas to_excel method to do this. Below is the code to write the merged data frame to a new Excel file:
output_path = "path/to/output_file.xlsx" df.to_excel(output_path, index=False)
Now let's see the complete examples to merge multiple excel files into a single file in Python using different approaches like pd.concat and using for loop.
Example 1: Using ?pd.concat()'
In the below example, we first import the required libraries and identified the files to merge. We then identify the files to merge by listing all files in the specified directory that end with .xlsx. Next, we read each Excel file into a data frame and store them in a list called df_list.
import pandas as pd import os # Identifying the files to merge path = "yourpath/to/files/" files = os.listdir(path) files_xlsx = [f for f in files if f.endswith('.xlsx')] # Reading the Excel files and concatenate them mydf_list = [pd.read_excel(os.path.join(path, f)) for f in files_xlsx] mydf = pd.concat(mydf_list) # Writing the merged data frame to a new Excel file myoutput_path = "path/to/myoutput_file.xlsx" mydf.to_excel(myoutput_path, index=False)
Output
The final output after merging three .xlsx files:
Product Quantity Price Apple 10 0.5 Banana 5 0.25 Orange 8 0.4 Grapes 12 0.3 Mango 15 0.6 Pineapple 7 0.8
Example 2: Using a for loop and pd.concat()
In the below example, we first import the required libraries and identified the files to merge, just like in the first example. We then initialize an empty data frame called merged_df. Next, we loop through each Excel file and read it into a data frame. We use the concat() function to concatenate the current data frame with the previously merged data frames and store the result in merged_df.
import pandas as pd import os # Identifying the files to merge path = "yourpath/to/files/" files = os.listdir(path) files_xlsx = [f for f in files if f.endswith('.xlsx')] # Initialize an empty data frame mymerged_df = pd.DataFrame() # Reading the Excel files and concatenate them for file in files_xlsx: data = pd.read_excel(os.path.join(path, file)) mymerged_df = pd.concat([mymerged_df, data], ignore_index=True) # Writing the merged data frame to a new Excel file myoutput_path = "yourpath/to/myoutput_file.xlsx"
Output
The final output after merging three .xlsx files:
Product Quantity Price Apple 10 0.5 Banana 5 0.25 Orange 8 0.4 Grapes 12 0.3 Mango 15 0.6 Pineapple 7 0.8
Example 3: Using pd.merge()
In the below example, we first import the required libraries and identified the files to merge, just like the other two examples. Now, we have read the two Excel files using the pd.read_excel() function e the pd.merge() function to merge the two data frames based on a common column that we specify with the on parameter.
import pandas as pd # read in the first Excel file mydf1 = pd.read_excel('myfile1.xlsx') # read in the second Excel file mydf2 = pd.read_excel('myfile2.xlsx') # merge the two data frames based on a common column mymerged_df = pd.merge(mydf1, mydf2, on='common_column') # write the merged data frame to a new Excel file mymerged_df.to_excel('mymerged_file.xlsx', index=False)
Output
The final output after merging three .xlsx files:
Product Quantity Price Apple 10 0.5 Banana 5 0.25 Orange 8 0.4 Grapes 12 0.3 Mango 15 0.6 Pineapple 7 0.8
Conclusion
Merging multiple Excel files into a single file using Python can save you time and reduce errors when dealing with large datasets. With the help of Pandas, it is a straightforward process. In this article, we learned how to merge multiple Excel files using Python by following a few simple steps. We provided two examples that demonstrate how to merge multiple Excel files using the concat() function and a loop. Overall, with Python's flexibility and efficiency, merging multiple Excel files has never been easier.