How to Convert a Document to PDF in Java?
Last Updated :
28 Apr, 2025
In software projects, there is quite often a requirement for conversion of a given file (HTML/TXT/etc.,) to a PDF file and similarly, any PDF file needs to get converted to HTML/TXT/etc., files. Even PDFs need to be stored as images of type PNG or GIF etc., Via a sample maven project, let us see the same. As it is the maven project, necessary dependencies need to be added in pom.xml
Essential Library is PDF2Dom:
<!-- To load the selected PDF file -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox-tools</artifactId>
<version>2.0.25</version>
</dependency>
<!-- To load the selected PDF file -->
<!-- Required for conversion -->
<dependency>
<groupId>net.sf.cssbox</groupId>
<artifactId>pdf2dom</artifactId>
<version>2.0.1</version>
</dependency>
A few more dependencies are also needed. iText is needed to extract the text from a given PDF file. POI is needed to create the .docx document.
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itextpdf</artifactId>
<version>5.5.10</version>
</dependency>
<dependency>
<groupId>com.itextpdf.tool</groupId>
<artifactId>xmlworker</artifactId>
<version>5.5.10</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.15</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>3.15</version>
</dependency>
Example Maven Project
Let us start with the project structure and pom.xml and then will look for the required source code to convert from PDF to other formats as well as from other formats to HTML
pom.xml
XML
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<artifactId>pdf</artifactId>
<name>pdf</name>
<url>http://maven.apache.org</url>
<parent>
<groupId>com.gfg</groupId>
<artifactId>parent-modules</artifactId>
<version>1.0.0-SNAPSHOT</version>
</parent>
<dependencies>
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox-tools</artifactId>
<version>${pdfbox-tools.version}</version>
<exclusions>
<exclusion>
<artifactId>commons-logging</artifactId>
<groupId>commons-logging</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>net.sf.cssbox</groupId>
<artifactId>pdf2dom</artifactId>
<version>${pdf2dom.version}</version>
<exclusions>
<exclusion>
<artifactId>commons-logging</artifactId>
<groupId>commons-logging</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itextpdf</artifactId>
<version>${itextpdf.version}</version>
</dependency>
<dependency>
<groupId>com.itextpdf.tool</groupId>
<artifactId>xmlworker</artifactId>
<version>${xmlworker.version}</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>${poi-scratchpad.version}</version>
</dependency>
<dependency>
<groupId>org.apache.xmlgraphics</groupId>
<artifactId>batik-transcoder</artifactId>
<version>${batik-transcoder.version}</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>${poi-ooxml.version}</version>
</dependency>
<dependency>
<groupId>org.thymeleaf</groupId>
<artifactId>thymeleaf</artifactId>
<version>${thymeleaf.version}</version>
</dependency>
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>flying-saucer-pdf</artifactId>
<version>${flying-saucer-pdf.version}</version>
</dependency>
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>flying-saucer-pdf-openpdf</artifactId>
<version>${flying-saucer-pdf-openpdf.version}</version>
</dependency>
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>${jsoup.version}</version>
</dependency>
<dependency>
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-core</artifactId>
<version>${open-html-pdf-core.version}</version>
</dependency>
<dependency>
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-pdfbox</artifactId>
<version>${open-html-pdfbox.version}</version>
</dependency>
</dependencies>
<build>
<finalName>pdf</finalName>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
</resource>
</resources>
</build>
<properties>
<pdfbox-tools.version>2.0.25</pdfbox-tools.version>
<pdf2dom.version>2.0.1</pdf2dom.version>
<itextpdf.version>5.5.10</itextpdf.version>
<xmlworker.version>5.5.10</xmlworker.version>
<poi-scratchpad.version>3.15</poi-scratchpad.version>
<batik-transcoder.version>1.8</batik-transcoder.version>
<poi-ooxml.version>3.15</poi-ooxml.version>
<thymeleaf.version>3.0.11.RELEASE</thymeleaf.version>
<flying-saucer-pdf.version>9.1.20</flying-saucer-pdf.version>
<open-html-pdfbox.version>1.0.6</open-html-pdfbox.version>
<open-html-pdf-core.version>1.0.6</open-html-pdf-core.version>
<flying-saucer-pdf-openpdf.version>9.1.22</flying-saucer-pdf-openpdf.version>
<jsoup.version>1.14.2</jsoup.version>
</properties>
</project>
Let us see important key files
1. PDF and HTML conversion
ConversionOfPDF2HTMLExample.java
In the below program, both methods are handled i.e.
a. generationOfHTMLFromPDF
Note: Conversion of PDF to HTML cannot be predicted 100%, pixel-to-pixel result oriented. If the complexity of the PDF file is more, accuracy varies.
b. generationOfPDFFromHTML
Note: In html file, all tags need to properly closed and then only PDF can be generated
Java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
public class ConversionOfPDF2HTMLExample {
private static final String PDF = "src/main/resources/pdf.pdf";
private static final String HTML = "src/main/resources/html.html";
public static void main(String[] args) {
try {
generationOfHTMLFromPDF(PDF);
generationOfPDFFromHTML(HTML);
} catch (IOException | ParserConfigurationException | DocumentException e) {
e.printStackTrace();
}
}
private static void generationOfHTMLFromPDF(String filename) throws ParserConfigurationException, IOException {
PDDocument pdf = PDDocument.load(new File(filename));
PDFDomTree parser = new PDFDomTree();
Writer output = new PrintWriter("src/output/pdf.html", "utf-8");
parser.writeText(pdf, output);
output.close();
if (pdf != null) {
pdf.close();
}
}
private static void generationOfPDFFromHTML(String filename) throws ParserConfigurationException, IOException, DocumentException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("src/output/html.pdf"));
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream(filename));
document.close();
}
}
2. PDF and Image Conversions
PDF can be converted to Images in many ways and one important way is Apache PDFBox again from image to PDF can be converted by using iText
ConversionOfPDF2ImageExample.java
In the below program, the following methods are handled
- generationOfPDFFromImage
- Images are of type jpeg, jpg, gif, tiff, or png and can be loaded from disk
- generationOfImageFromPDF
- Apache PDFBox is an advanced tool. Each page of PDF has to be rendered by using PDFRenderer as a BufferedImage. Then ImageIOUtil is used to write the image as of types like JPEG, GIF, PNG, etc.,
Java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;
import com.itextpdf.text.BadElementException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
public class ConversionOfPDF2ImageExample {
private static final String PDF = "src/main/resources/pdf.pdf";
private static final String JPG = "http://cdn2.gfg.netdna-cdn.com/wp-content/uploads/2016/05/gfg-rest-widget-main-1.2.0";
private static final String GIF = "https://media.giphy.com/media/l3V0x6kdXUW9M4ONq/giphy";
public static void main(String[] args) {
try {
generationOfImageFromPDF(PDF, "png");
generationOfImageFromPDF(PDF, "jpeg");
generationOfImageFromPDF(PDF, "gif");
generationOfPDFFromImage(JPG, "jpg");
generationOfPDFFromImage(GIF, "gif");
} catch (IOException | DocumentException e) {
e.printStackTrace();
}
}
private static void generationOfImageFromPDF(String filename, String extension) throws IOException {
PDDocument document = PDDocument.load(new File(filename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
for (int page = 0; page < document.getNumberOfPages(); ++page) {
BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
ImageIOUtil.writeImage(bim, String.format("src/output/pdf-%d.%s", page + 1, extension), 300);
}
document.close();
}
private static void generationOfPDFFromImage(String filename, String extension)
throws IOException, BadElementException, DocumentException {
Document document = new Document();
String input = filename + "." + extension;
String output = "src/output/" + extension + ".pdf";
FileOutputStream fos = new FileOutputStream(output);
PdfWriter writer = PdfWriter.getInstance(document, fos);
writer.open();
document.open();
document.add(Image.getInstance((new URL(input))));
document.close();
writer.close();
}
}
3. PDF and Text Conversions
For this also Apache PDFBox is needed to get the text from PDF files and iText is required for text-to-pdf conversion.
Note: cannot preserve the formatting in a plain text file as it has text only
ConversionOfPDF2TextExample.java
Java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.io.RandomAccessFile;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Font;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
public class ConversionOfPDF2TextExample {
private static final String PDF = "src/main/resources/pdf.pdf";
private static final String TXT = "src/main/resources/txt.txt";
public static void main(String[] args) {
try {
generationOfTxtFromPDF(PDF);
generationOfPDFFromTxt(TXT);
} catch (IOException | DocumentException e) {
e.printStackTrace();
}
}
private static void generationOfTxtFromPDF(String filename) throws IOException {
File f = new File(filename);
String parsedText;
PDFParser parser = new PDFParser(new RandomAccessFile(f, "r"));
parser.parse();
COSDocument cosDoc = parser.getDocument();
PDFTextStripper pdfStripper = new PDFTextStripper();
PDDocument pdDoc = new PDDocument(cosDoc);
parsedText = pdfStripper.getText(pdDoc);
if (cosDoc != null)
cosDoc.close();
if (pdDoc != null)
pdDoc.close();
PrintWriter pw = new PrintWriter("src/output/pdf.txt");
pw.print(parsedText);
pw.close();
}
private static void generationOfPDFFromTxt(String filename) throws IOException, DocumentException {
Document pdfDoc = new Document(PageSize.A4);
PdfWriter.getInstance(pdfDoc, new FileOutputStream("src/output/txt.pdf"))
.setPdfVersion(PdfWriter.PDF_VERSION_1_7);
pdfDoc.open();
Font myfont = new Font();
myfont.setStyle(Font.NORMAL);
myfont.setSize(11);
pdfDoc.add(new Paragraph("\n"));
BufferedReader br = new BufferedReader(new FileReader(filename));
String strLine;
while ((strLine = br.readLine()) != null) {
Paragraph para = new Paragraph(strLine + "\n", myfont);
para.setAlignment(Element.ALIGN_JUSTIFIED);
pdfDoc.add(para);
}
pdfDoc.close();
br.close();
}
}
4. PDF and DocX Conversions
Two libraries are needed. i.e.
- iText: Extract text from PDF
- POI: To create the .docx document
ConversionOfPDF2WordExample.java
Java
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
public class ConversionOfPDF2WordExample {
private static final String FILENAME = "src/main/resources/pdf.pdf";
public static void main(String[] args) {
try {
generationOfDocFromPDF(FILENAME);
} catch (IOException e) {
e.printStackTrace();
}
}
private static void generationOfDocFromPDF(String filename) throws IOException {
XWPFDocument doc = new XWPFDocument();
String pdf = filename;
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
String text = strategy.getResultantText();
XWPFParagraph p = doc.createParagraph();
XWPFRun run = p.createRun();
run.setText(text);
run.addBreak(BreakType.PAGE);
}
FileOutputStream out = new FileOutputStream("src/output/pdf.docx");
doc.write(out);
out.close();
reader.close();
doc.close();
}
}
Code Explanation Video:
Conclusion
In many stages of software projects, there are requirements for conversion of text, and image to PDF, and similarly conversion of data from PDF to text, image, and Docx format. The above examples help the best way to do this in Java.
Similar Reads
How to convert a PDF document to a preview image in PHP? Converting a PDF document into a set of images may not sound that fun, but it can have a few applications. As the content from images cannot be copied that easily, the conversion makes the document strictly 'read-only' and brings an extra layer of protection from plagiarism. The images may also come
6 min read
How to Convert WebView to PDF in Android? Sometimes there is a need to save some articles available on the internet in the form of PDF files. And to do so there are many ways, one can use any browser extension or any software or any website to do so. But in order to implement this feature in the android app, one can't rely on other software
6 min read
How to Convert PDF to Image in Linux Command Line? Pdftoppm is a tool that converts PDF document files into .PNG format and many other formats. We can use this tool on Linux to convert the PDF into images. It also provides the features like the cropping image, set resolution, and scale, and many more. Now let's see how to install the pdftoppm Instal
3 min read
How to Load PDF from URL in Android? Most of the apps require to include support to display PDF files in their app. So if we have to use multiple PDF files in our app it is practically not possible to add each PDF file inside our app because this approach may lead to an increase in the size of the app and no user would like to download
5 min read
How to Copy Text From a PDF? Pre-requisites: PDF Full FormPDF means Portable Document Format. it is the file format created by Adobe that gives people an easy, reliable way to present and exchange documents. opening your PDF to a compatible reader and selecting the text Right-click the selected text, and choose Copy, The conten
5 min read
How to Use AsciiDoctor with Maven Project in Java? AsciiDoc is nothing but a text document mainly used to prepare help documents, and documentation on various concepts like documenting web pages, product review pages, etc. AsciiDoc documents can be converted into formats like PDF or HTML or EPUB etc., AsciiDoctor It is a text processor involved in c
3 min read
How To Edit a PDF Editing PDFs after completion of the task includes more hassle and the more time-consuming it becomes, the more frustrating it appears. To make it quick, easy, and accessible at any time, letâs check out the different ways to edit PDFs on Mac and Android using different tools. Depending on the need
7 min read
How to Convert your Emails to PDF through Email Itself? In today's fast-paced digital world, converting emails to PDF format can be incredibly useful for archiving important information, sharing documents securely, and keeping records for future reference. But did you know that you can convert your emails to PDFs directly through your email itself? In th
6 min read
How to convert a PDF file to TIFF file using Python? This article will discover how to transform a PDF (Portable Document Format) file on your local drive into a TIFF (Tag Image File Format) file at the specified location. We'll employ Python's Aspose-Words package for this task. The aspose-words library will be used to convert a PDF file to a TIFF fi
3 min read
How to Generate a PDF file in Android App? There are many apps in which data from the app is provided to users in the downloadable PDF file format. So in this case we have to create a PDF file from the data present inside our app and represent that data properly inside our app. So by using this technique, we can easily create a new PDF accor
6 min read