Read a Particular Page from a PDF File in Python
Last Updated: 26 Apr, 2025
Document processing is one of the most common use cases for the Python programming language. Its libraries can handle many kinds of files, such as database files, multimedia files and encrypted files, to name a few. This article shows how to read a particular page from a PDF (Portable Document Format) file in Python.
Method 1: Using the PyMuPDF library to read a page in Python
This method uses the PyMuPDF library for PDF processing and PIL (Python Imaging Library, available as the Pillow package through pip install pillow) to display the result. To install the PyMuPDF library, run the following command in the command line of the operating system:
pip install pymupdf
Note: The PyMuPDF library is imported under the name fitz, using the following statement.
import fitz
Reading a page from a PDF file requires loading the file and then displaying the contents of only one of its pages. In this method, the selected page is rendered as an image, so the page is read from the PDF file and then displayed as an image.
The following example demonstrates the above process:
Python3
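import fitz  # PyMuPDF
from PIL import Image

input_file = r"test.pdf"
# Open the PDF and load its first page (pages are indexed from 0)
file_handle = fitz.open(input_file)
page = file_handle[0]
# Render the page to a pixel map and save it as a PNG image
page_img = page.get_pixmap()
page_img.save('PDF_page.png')
# Open the saved PNG with PIL and display it
img = Image.open('PDF_page.png')
img.show()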
Output: The first page of test.pdf opens as an image in the system's default image viewer.
Explanation:
First, the PDF file is opened and its file handle is stored. The first page of the PDF (at index 0) is then loaded by indexing the handle like a list. The get_pixmap function renders this page to a pixel map (pixel array), which is stored in a variable and then saved as a PNG image file. This PNG file is opened with the open function of PIL's Image module, and finally the image is displayed using the show function.
Note: The first open function (fitz.open) opens the PDF file, and the second one (Image.open) opens the PNG image file. The two functions belong to different libraries and serve different purposes.
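The example above always reads index 0. As a minimal sketch of selecting an arbitrary page, the helper names below (to_page_index and save_page_as_png) are hypothetical and not part of PyMuPDF. The sketch assumes the caller supplies a 1-based page number, as shown in most PDF viewers, and checks it against the document's page count before rendering.

```python
def to_page_index(page_number, page_count):
    """Convert a 1-based page number to the 0-based index used for lookup."""
    if not 1 <= page_number <= page_count:
        raise ValueError(
            f"page {page_number} is out of range; "
            f"the document has {page_count} page(s)"
        )
    return page_number - 1


def save_page_as_png(pdf_path, page_number, png_path):
    """Render one page of a PDF to a PNG file (hypothetical helper)."""
    # Imported here so that to_page_index() stays usable without PyMuPDF.
    import fitz

    # The document is closed automatically when the with block ends.
    with fitz.open(pdf_path) as doc:
        index = to_page_index(page_number, doc.page_count)
        doc[index].get_pixmap().save(png_path)
```

Under these assumptions, calling save_page_as_png("test.pdf", 1, "PDF_page.png") would write the same image as the example above, while an out-of-range page number raises a ValueError before any rendering is attempted.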
Method 2: Reading a particular page from a PDF using PyPDF2
The second example uses the PyPDF2 library, which can be installed by running the following command:
pip install PyPDF2
The same objective can be achieved with the PyPDF2 library. The library supports operations such as reading, writing and creating PDF files. For the task at hand, its text extraction function is used to obtain the text of one page and display it. The code for this is as follows:
Python3
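import PyPDF2

input_file = r"test.pdf"
page = 4  # zero-based index, so this is the fifth page
# Open the PDF in binary mode and create a reader object.
# PyPDF2 3.0 removed PdfFileReader, getPage and extractText;
# PdfReader, .pages and extract_text are their replacements.
pdfFileObj = open(input_file, 'rb')
pdfReader = PyPDF2.PdfReader(pdfFileObj)
# Load the requested page and extract its text
pageObj = pdfReader.pages[page]
data = pageObj.extract_text()
pdfFileObj.close()
print(data)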
Output:
He started this Journey with just one
thought- every geek should have
access to a never ending range of
academic resources and with a lot
of hardwork and determination,
GeeksforGeeks was born.
Through this platform, he has
successfully enriched the minds of
students with knowledge which has
led to a boost in their careers. But
most importantly, GeeksforGeeks
will always help students stay in
touch with their Geeky side!
EXPERT ADVICE
CEO and Founder of
GeeksforGeeks
I understand that many
students who come to us are
either fans of the sciences or
have been pushed into this
field by their parents.
And I just want you to
know that no matter
where life takes you, we
at GeeksforGeeks hope
to have made this
journey easier for
you.Mr. Sandeep Jain
3
Explanation:
First, the path to the input PDF and the page index are stored in separate variables. The PDF file is then opened in binary mode, and its file object is passed to PyPDF2's reader class, which creates a PDF reader object from it. The page at the index given by the page variable is retrieved from the reader, and the text is extracted from that page. Finally, the file object is closed and the extracted text is printed.
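The extracted text in the output above keeps the line breaks of the page layout, so sentences are split across many short lines. The following is a minimal sketch of one way to tidy such text with the standard library only. The function name unwrap_lines is hypothetical, and the sketch assumes that a blank line marks a paragraph boundary, which may not hold for every PDF.

```python
import re


def unwrap_lines(text):
    """Join hard-wrapped lines into paragraphs (hypothetical helper).

    Blank lines are treated as paragraph breaks; every other run of
    whitespace, including single newlines, becomes one space.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


# A few lines taken from the extracted output shown above
sample = (
    "He started this Journey with just one\n"
    "thought- every geek should have\n"
    "access to a never ending range of\n"
    "academic resources"
)
print(unwrap_lines(sample))
```

The string returned by the text extraction call can be passed to this function before printing. Because it only rearranges whitespace, text that the page layout ran together (such as "you.Mr." in the output above) is left unchanged.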