How to Merge All Excel Files in a Folder Using Python?
Last Updated: 16 May, 2021
In this article, we will see how to combine all Excel files present in a folder into a single file.
Modules used:
The Python libraries used are:
- pandas: a Python library for manipulating and analyzing data, widely used in data science and data analytics. Reading and writing .xlsx files with pandas also requires an Excel engine such as openpyxl to be installed.
- glob: a standard-library module that finds all pathnames matching a specified pattern according to the rules used by the Unix shell.
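According to the Python documentation, glob does this by combining os.scandir() with fnmatch.fnmatch(). The short sketch below uses hypothetical file names to show the three shell-style wildcards involved. It calls fnmatchcase() so that matching is case-sensitive on every platform, and it never touches the file system.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, as they might appear in the 'test' folder
names = ["x1.xlsx", "x2.xlsx", "x3.xlsx", "x10.xlsx", "notes.txt"]

def matching(pattern):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]

# '*' matches any run of characters
print(matching("*.xlsx"))      # → ['x1.xlsx', 'x2.xlsx', 'x3.xlsx', 'x10.xlsx']
# '?' matches exactly one character
print(matching("x?.xlsx"))     # → ['x1.xlsx', 'x2.xlsx', 'x3.xlsx']
# '[12]' matches one character from the set
print(matching("x[12].xlsx"))  # → ['x1.xlsx', 'x2.xlsx']
```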
Excel files used:
Three Excel files stored in one folder, x1.xlsx, x2.xlsx, and x3.xlsx, will be combined into a single Excel file using Python.
Stepwise Approach:
- First, import the required libraries and modules.
```python
# importing pandas library and glob module
import pandas as pd
import glob
```
- Set the path of the folder where the files are stored. Here the files are in a folder named test, relative to the current working directory.
```python
# path of the folder
path = r'test'
```
- Display the names of the files in the folder using the glob module. The glob.glob() function returns the paths of all files in the given folder that have the .xlsx extension, and print(filenames) displays them.
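```python
# find all the .xlsx files in the folder; '/' works as a path
# separator on Windows as well as on Linux and macOS.
# glob returns files in no guaranteed order, so sort them.
filenames = sorted(glob.glob(path + "/*.xlsx"))
print('File names:', filenames)
```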
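The *.xlsx pattern also matches the temporary lock files (named like ~$x1.xlsx) that Excel creates while a workbook is open. These are not real workbooks, so pd.read_excel() would likely fail on them. The hypothetical helper list_excel_files() below is a minimal sketch that skips such files, builds the pattern with os.path.join() for portability, and returns the matches in name order.

```python
import glob
import os

def list_excel_files(folder):
    """Return the .xlsx files in `folder`, sorted by name.

    Skips the '~$' lock files Excel creates while a workbook is open,
    because they also end in .xlsx but are not valid workbooks.
    """
    pattern = os.path.join(folder, "*.xlsx")  # portable path separator
    return sorted(
        f for f in glob.glob(pattern)
        if not os.path.basename(f).startswith("~$")
    )
```

With the folder used in this article, the call would be `filenames = list_excel_files(path)`.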
- Initialize an empty data frame. A DataFrame is a two-dimensional, table-like data structure in pandas for analyzing and manipulating data. This empty data frame will store the combined data from the three files.
```python
# initializing an empty data frame
finalexcelsheet = pd.DataFrame()
```
- Iterate through the files in the folder one by one with a for loop. Calling pd.read_excel() with sheet_name=None reads every worksheet of a file and returns a dictionary of data frames, and pd.concat() stacks those sheets into a single data frame, df. This handles workbooks with several sheets, such as the third Excel file in this example. Each df is then added to finalexcelsheet, also with pd.concat(), because DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0.
```python
# iterate over the Excel files in the folder one by one
for file in filenames:
    # read every worksheet (sheet_name=None returns a dict of
    # data frames) and stack them into a single data frame
    df = pd.concat(pd.read_excel(file, sheet_name=None),
                   ignore_index=True, sort=False)
    # add this file's data to the combined data frame
    finalexcelsheet = pd.concat([finalexcelsheet, df],
                                ignore_index=True)
```
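Adding each file's data to the combined data frame inside the loop copies all the accumulated rows on every iteration. A common alternative, sketched below, collects one data frame per file in a list and concatenates once at the end. To keep the sketch testable without Excel files, the hypothetical helper combine_workbooks() takes the {sheet name: DataFrame} dictionaries that pd.read_excel(file, sheet_name=None) returns, rather than file paths.

```python
import pandas as pd

def combine_workbooks(workbooks):
    """Stack a sequence of workbooks into one DataFrame.

    Each workbook is a {sheet_name: DataFrame} dict, the shape that
    pd.read_excel(path, sheet_name=None) returns.
    """
    frames = []
    for sheets in workbooks:
        # stack all sheets of one workbook, keeping the order they were read in
        frames.append(pd.concat(list(sheets.values()),
                                ignore_index=True, sort=False))
    if not frames:
        return pd.DataFrame()  # pd.concat([]) would raise ValueError
    # a single concatenation at the end instead of one per file
    return pd.concat(frames, ignore_index=True, sort=False)
```

With real files, the call would be `combine_workbooks(pd.read_excel(f, sheet_name=None) for f in filenames)`.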
- Display the combined data with print(finalexcelsheet). print() is used rather than display(), which is defined only in Jupyter and IPython sessions.
```python
# to print the combined data
print('Final Sheet:')
print(finalexcelsheet)
```
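For larger combined sheets, print() shows only a truncated view: pandas abbreviates a data frame with more rows than display.max_rows (60 by default) using "...". A minimal sketch, with the hypothetical helper name print_all(), lifts the row and column limits for a single call with pd.option_context(), which restores the previous settings when the block exits.

```python
import pandas as pd

def print_all(df):
    """Print every row and column of df, ignoring pandas' display limits."""
    # the options apply only inside the with-block
    with pd.option_context("display.max_rows", None,
                           "display.max_columns", None):
        print(df)
```

In this article's script, print_all(finalexcelsheet) would replace print(finalexcelsheet).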
- Save the combined data to a new Excel file named Final.xlsx. Passing index=False keeps the data frame's row index out of the saved file.
```python
# save the combined data to a new Excel file
finalexcelsheet.to_excel('Final.xlsx', index=False)
```
Below is the complete Python program based on the above approach:
```python
# import modules
import pandas as pd
import glob

# path of the folder
path = r'test'

# find all the Excel files in the folder, in name order
filenames = sorted(glob.glob(path + "/*.xlsx"))
print('File names:', filenames)

# initializing an empty data frame
finalexcelsheet = pd.DataFrame()

# iterate over the Excel files one by one
for file in filenames:
    # combine all worksheets of this file into a single data frame
    df = pd.concat(pd.read_excel(file, sheet_name=None),
                   ignore_index=True, sort=False)
    # add this file's data to the combined data frame
    finalexcelsheet = pd.concat([finalexcelsheet, df], ignore_index=True)

# print the combined data
print('Final Sheet:')
print(finalexcelsheet)

# save the combined data to a new Excel file
finalexcelsheet.to_excel('Final.xlsx', index=False)
```
Output:
[Figure: console output and the final combined Excel file, Final.xlsx]