Convert PDF to TXT File Using Python
Last Updated: 12 Apr, 2025
We have a PDF file and want to extract its text into a simple .txt format. The idea is to automate this process so the content can be easily read, edited, or processed later. For example, a PDF with articles or reports can be converted into plain text using just a few lines of Python. In this article, we’ll use a sample file.pdf to explore different libraries and methods to do this efficiently.
Using pdfplumber
pdfplumber is a Python library that provides advanced capabilities for extracting text, tables and metadata from PDF files. It is especially useful when working with PDFs that have a complex layout or contain structured data like tables.
Python
import pdfplumber

with pdfplumber.open("file.pdf") as pdf, open("output.txt", "w", encoding="utf-8") as f:
    for page in pdf.pages:
        t = page.extract_text()
        if t:
            f.write(t + '\n')
Explanation: This code uses pdfplumber to open "file.pdf" and "output.txt" in a single with statement, so both files are closed automatically. It iterates through each page of the PDF using pdf.pages, extracts text with extract_text(), and if text exists, writes it to the output file followed by a newline to separate the content of each page.
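Since pdfplumber can also extract tables, the same page loop can be extended to write them out as text. The sketch below assumes the tables come from page.extract_tables(), which returns each table as a list of rows whose cells are strings or None. The helper name table_to_tsv and the output file tables.txt are illustrative, not part of pdfplumber.

```python
def table_to_tsv(rows):
    """Render one table (a list of rows of cell values) as tab-separated text."""
    lines = []
    for row in rows:
        # Empty cells arrive as None; newlines inside a cell would break the row.
        cells = ["" if cell is None else str(cell).replace("\n", " ") for cell in row]
        lines.append("\t".join(cells))
    return "\n".join(lines)


# Hypothetical usage (needs pdfplumber installed and file.pdf present):
# import pdfplumber
# with pdfplumber.open("file.pdf") as pdf, open("tables.txt", "w", encoding="utf-8") as f:
#     for page in pdf.pages:
#         for table in page.extract_tables():
#             f.write(table_to_tsv(table) + "\n\n")
```

Tab-separated output keeps the column structure readable in a plain .txt file and can still be opened in a spreadsheet later.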
Using PyPDF2
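PyPDF2 is a pure-Python library used for reading and writing PDF files. It is widely used for basic PDF manipulation, including text extraction, merging, splitting, and rotating pages. However, it may not always handle complex layouts or structured data as precisely as pdfplumber. Note that PyPDF2 is no longer actively developed; the project continues under the name pypdf, which offers the same PdfReader interface used here.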
Python
from PyPDF2 import PdfReader

reader = PdfReader("file.pdf")
with open("output.txt", "w", encoding="utf-8") as f:
    for page in reader.pages:
        t = page.extract_text()
        if t:
            f.write(t + '\n')
Explanation: This code creates a PdfReader object to read "file.pdf", opens "output.txt" in write mode with UTF-8 encoding, and loops through each page to extract text using extract_text(). If text is found, it writes it to the output file with a newline for separation.
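The "if text is found" check matters because pages that contain only images (for example, scanned documents) typically yield no text at all. As a minimal sketch, the helper below reports which pages came back empty, so you know where the output file has gaps. The name find_empty_pages is hypothetical, and it works on any sequence of per-page strings regardless of which library produced them.

```python
def find_empty_pages(page_texts):
    """Return 1-based numbers of pages whose extracted text is None or only whitespace."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not (text and text.strip())
    ]


# Hypothetical usage with PyPDF2 (needs PyPDF2 installed and file.pdf present):
# from PyPDF2 import PdfReader
# reader = PdfReader("file.pdf")
# empty = find_empty_pages(page.extract_text() for page in reader.pages)
# if empty:
#     print("No text found on pages:", empty)
```

Pages flagged this way usually need OCR rather than a different text-extraction library.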
Using fitz
fitz is the import name traditionally used by the PyMuPDF library (recent releases can also be imported as pymupdf), which allows high-performance PDF and eBook manipulation. It is known for its speed and accuracy in text extraction, especially for PDFs that have a complex graphical layout or embedded fonts.
Python
import fitz  # PyMuPDF

# Open both files in one with statement so the document is closed as well
with fitz.open("file.pdf") as doc, open("output.txt", "w", encoding="utf-8") as f:
    for page in doc:
        f.write(page.get_text() + '\n')

Explanation: This code uses fitz (PyMuPDF) to open "file.pdf" and reads it page by page. For each page, it extracts the text using get_text() and writes it to "output.txt", adding a newline after each page’s content to keep them separated.
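All three methods follow the same shape: get one string per page, then write the non-empty ones to a text file. The sketch below captures that pattern in a single function that accepts the extraction step as a parameter and derives the output name from the PDF name. The names pdf_to_txt, extract_pages, and pages_with_fitz are illustrative assumptions, not part of any library.

```python
from pathlib import Path


def pdf_to_txt(pdf_path, extract_pages, txt_path=None):
    """Write the text returned by extract_pages(pdf_path) to a .txt file.

    extract_pages is a caller-supplied function (hypothetical name) that takes
    a PDF path and yields one string per page.
    """
    pdf_path = Path(pdf_path)
    # Default to the PDF's own name with a .txt extension: report.pdf -> report.txt
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        for text in extract_pages(pdf_path):
            if text:  # skip pages with no extractable text
                f.write(text + "\n")
    return txt_path


def pages_with_fitz(pdf_path):
    """Yield page text using PyMuPDF; imported lazily so pdf_to_txt works without it."""
    import fitz  # PyMuPDF
    with fitz.open(str(pdf_path)) as doc:
        for page in doc:
            yield page.get_text()


# Usage (needs PyMuPDF installed and file.pdf present):
# pdf_to_txt("file.pdf", pages_with_fitz)  # writes file.txt
```

Passing the extractor as an argument makes it easy to swap in pdfplumber or PyPDF2 without touching the file-writing code.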