A one-stop solution for converting Office documents (Word/Excel) to PDF

zhouyunjian

Last modified 2024-01-13 21:18:04

CC 4.0 BY-SA license

Tags: word excel pdf

First published 2024-01-13 21:15:27

Article link: https://blog.csdn.net/zhouyunjian/article/details/135575775

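Project background

Our company's load planning system crashed at a customer site. The customer reported that while using the system it would occasionally freeze or stall and become unreachable. After analysis, our engineers tentatively concluded that the cause was the following dependency:

<dependency>
    <groupId>e-iceblue</groupId>
    <artifactId>spire.office.free</artifactId>
</dependency>

The library's Excel-to-PDF call would hang and never return. A demo of the offending code, for reference: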

Workbook workbook = new Workbook();
workbook.loadFromFile("src\\main\\resources\\wordTemplate\\Manifest_767_LIR_E_A_SINGLE.xlsx");
// Size the converted PDF page to fit the worksheet content
workbook.getConverterSetting().setSheetFitToPage(true);
// Save the generated document to the target path
workbook.saveToFile("src\\main\\resources\\transferPdf\\transferResult22.pdf", FileFormat.PDF);
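Since the save call can block indefinitely, one way to keep a hang from freezing the whole system is to run the conversion on a worker thread with an upper time limit. The following is a minimal sketch in plain Java, not what the project actually did; the class ConversionTimeoutGuard and the method runWithTimeout are hypothetical names, and the sleeping task stands in for the real workbook.saveToFile(...) call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a blocking conversion with an upper time limit. */
final class ConversionTimeoutGuard {

    private ConversionTimeoutGuard() {
    }

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * Returns true if the task finished in time, false if it timed out
     * or the waiting thread was interrupted.
     */
    static boolean runWithTimeout(Callable<Void> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "pdf-conversion");
            thread.setDaemon(true); // a stuck conversion must not keep the JVM alive
            return thread;
        });
        Future<Void> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException timeout) {
            future.cancel(true); // interrupts the worker; code that ignores interrupts keeps running
            return false;
        } catch (InterruptedException interrupted) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException failed) {
            throw new IllegalStateException("conversion failed", failed.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real conversion call: sleeps longer than the limit.
        boolean finished = runWithTimeout(() -> {
            Thread.sleep(5_000);
            return null;
        }, 200);
        System.out.println("finished in time: " + finished);
    }
}
```

A timeout only protects the caller. If the library ignores thread interruption, the stuck worker thread stays around until the process exits, so this is a mitigation and not a fix for the underlying hang.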

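On evaluation we found that this jar is, of all things, not open source. It is also an old build (targeting JDK 1.6) and cannot be decompiled. That meant the problem could not be fixed directly and the approach had to be reworked. The incident also had serious consequences (five thousand words omitted here): in the end the lead developer left, and new faces replaced the old. So solving this legacy problem with a new approach went onto the agenda.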

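Solution analysis

In terms of business usage, the system has to turn data sent from the front end into PDF files. Because these PDFs contain images, tables, and special function charts in addition to plain data, the previous project team had settled on the following pipelines:

dynamic data + Excel template -> Excel file -> PDF file

dynamic data + Word template -> Word file -> PDF file

dynamic data -> HTML file -> PDF file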
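The first step of the third pipeline (dynamic data -> HTML file) can be as small as placeholder substitution. Here is a minimal sketch in plain Java. The class HtmlTemplateRenderer, the ${name} placeholder syntax, and the sample field names are all hypothetical and not taken from the project; a real system would more likely use a template engine.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: fills ${name} placeholders in an HTML template. */
final class HtmlTemplateRenderer {

    // Matches ${name} where name consists of letters, digits, or underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    private HtmlTemplateRenderer() {
    }

    /** Replaces every placeholder with its HTML-escaped value from data. */
    static String render(String template, Map<String, String> data) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer html = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = data.get(key);
            if (value == null) {
                throw new IllegalArgumentException("no value for placeholder: " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value from being read as group references
            matcher.appendReplacement(html, Matcher.quoteReplacement(escapeHtml(value)));
        }
        matcher.appendTail(html);
        return html.toString();
    }

    /** Escapes the characters that would otherwise break the surrounding markup. */
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("flight", "CA1234"); // sample values, purely illustrative
        data.put("note", "A<B");
        System.out.println(render("<tr><td>${flight}</td><td>${note}</td></tr>", data));
    }
}
```

The rendered string is what an HTML-to-PDF converter would then receive as input.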

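After discussing with fellow engineers and reviewing the current state of the project, we arrived at three candidate approaches:

Option 1: keep the original pipeline

Option 2: generate everything via HTML to PDF

Option 3: manually convert the templates to PDF first

Each has its pros and cons, so we assessed the feasibility of each: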

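	Original pipeline	HTML to PDF	Edit PDF directly
Pros	Fewest code changes; fairly stable approach	1. Avoids the problems caused by closed-source code 2. Relatively more flexible	Everything becomes PDF, so updates are uniform and editing is convenient
Cons	Few open-source options on the market actually fit the project	Large rework and high cost; too risky for a system already in production, better suited to newly built systems	Editing tables dynamically is hard; complete components are generally scarce
Feasibility	Feasible	Possible unknown risks, e.g. large front-end dependencies, or other unknown pitfalls in the HTML -> PDF conversion library	1. Is variable data supported? 2. Do the available components offer an API for envelope charts and balance charts?
Component analysis	Commercial: spire.office.free, aspose Personal open source: excel2Pdf, test-exceltopdf, file-convert-util-master Open source, requires installation: libreoffice+JODConverter, openoffice+JODConverter International open source: docx4j	International open source: openpdf, docx4j Personal open source: file-convert-util-master Open source, requires installation: libreoffice+JODConverter, openoffice+JODConverter	International open source: itext, openpdf, Apache PDFBox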

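Solution validation

spire.office.free

This is what the old version uses today. Usage is as follows:

1. Add the dependency

<dependency>
    <groupId>e-iceblue</groupId>
    <artifactId>spire.office.free</artifactId>
</dependency>

2. Call the API

The code below converts an Excel file to PDF: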

Workbook workbook = new Workbook();
workbook.loadFromFile("src\\main\\resources\\wordTemplate\\Manifest_767_LIR_E_A_SINGLE.xlsx");
// Size the converted PDF page to fit the worksheet content
workbook.getConverterSetting().setSheetFitToPage(true);
// Save the generated document to the target path
workbook.saveToFile("src\\main\\resources\\transferPdf\\Manifest_767_LIR_E_A_SINGLE.pdf", FileFormat.PDF);

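In practice this approach is fairly mature, but the version in use, 5.3.1, is old, and the code is closed source and cannot be maintained. The official commercial edition is expensive and not a fit for a small company.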

aspose

1. Add the jars

aspose-cells-8.5.2.jar

aspose-words-15.8.0-jdk16.jar

2. Load the license

3. Use the conversion API

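Workbook workbook = new Workbook("input.xlsx");
PdfSaveOptions saveOptions = new PdfSaveOptions();
// Render each worksheet on a single PDF page
saveOptions.setOnePagePerSheet(true);
// In the Java API, paper size, orientation and zoom are set on each sheet's PageSetup
for (int i = 0; i < workbook.getWorksheets().getCount(); i++) {
    PageSetup pageSetup = workbook.getWorksheets().get(i).getPageSetup();
    pageSetup.setPaperSize(PaperSizeType.PAPER_A_4);
    pageSetup.setOrientation(PageOrientationType.LANDSCAPE);
    pageSetup.setZoom(90);
}
workbook.save("output.pdf", saveOptions);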

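In use, aspose is similar to spire.office.free. It is likewise closed source and cannot be maintained, and the official commercial edition is too expensive for a small company.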

excel2Pdf

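A personal open-source project on Gitee. It only supports Excel to PDF, is not yet mature overall, and cannot convert complex charts. Ruled out.

test-exceltopdf

Another personal open-source project on Gitee; same verdict as excel2Pdf.

file-convert-util-master

A personal open-source project on Gitee that is just a wrapper around aspose, and its output carries a watermark. Ruled out.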

libreoffice+JODConverter

1. Install LibreOffice

Download the installer from the official site and complete the installation.

2. Add the Maven dependencies

<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-core</artifactId>
</dependency>
<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-local</artifactId>
</dependency>
<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-online</artifactId>
</dependency>
<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-spring-boot-starter</artifactId>
</dependency>

3. Test code

import com.jky.knowledge.config.JodconverterConfig;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.jodconverter.DocumentConverter;
import org.jodconverter.LocalConverter;
import org.jodconverter.office.LocalOfficeManager;
import org.jodconverter.office.OfficeException;
import org.springframework.stereotype.Component;

import java.io.File;

/**
 * Converts Office documents to PDF.
 */
@Component
public class OfficeToPdfUtil {

    static Logger logger = LogManager.getLogger(LogManager.ROOT_LOGGER_NAME);

    static LocalOfficeManager localOfficeManager = LocalOfficeManager.builder()
            .officeHome(JodconverterConfig.getOfficeHome())
            .install()
            .build();

    OfficeToPdfUtil() throws OfficeException {
        logger.info("***Starting OfficeManager***");
        if (!localOfficeManager.isRunning()) {
            localOfficeManager.start();
        }
    }

    /**
     * Converts an Office file to PDF.
     *
     * @param fromPath source file
     * @param toPath   target file
     */
    public static void convertByLocal(File fromPath, File toPath) {
        try {
            logger.info("***Starting PDF conversion***");
            DocumentConverter converter = LocalConverter.builder().officeManager(localOfficeManager).build();
            // Convert using the local office installation
            converter.convert(fromPath).to(toPath).execute();
            logger.info("***PDF conversion succeeded***");
        } catch (OfficeException officeException) {
            // Log the exception with its stack trace instead of printing it to stderr
            logger.error("File conversion failed, please check", officeException);
        }
    }

    public static void stopServer() throws OfficeException {
        // Stop the office process
        localOfficeManager.stop();
        logger.info("***Stopped OfficeManager***");
    }
}

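However, this approach depends on a locally installed office suite. It is acceptable for an online service, but not recommended elsewhere.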
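To make that dependency concrete: JODConverter drives the installed office suite for you, but the same conversion can also be launched by hand through LibreOffice's headless command line (soffice --headless --convert-to pdf --outdir ...). The sketch below builds and runs that command from plain Java. The class SofficeCliConverter, the binary name passed in, and the sample file names are assumptions for illustration; the binary's location varies by installation.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: converts a document by shelling out to LibreOffice. */
final class SofficeCliConverter {

    private SofficeCliConverter() {
    }

    /** Builds the headless conversion command; the PDF is written into outputDir. */
    static List<String> buildCommand(String sofficeBinary, Path input, Path outputDir) {
        return Arrays.asList(
                sofficeBinary,
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    /**
     * Runs the conversion and waits at most timeoutSeconds.
     * Returns true only if the process exited with code 0 in time.
     */
    static boolean convert(String sofficeBinary, Path input, Path outputDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(sofficeBinary, input, outputDir))
                .inheritIO()
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // do not leave a stuck office process behind
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) {
        // Only prints the command; calling convert(...) requires LibreOffice to be installed.
        List<String> command = buildCommand("soffice", Paths.get("Manifest.xlsx"), Paths.get("out"));
        System.out.println(String.join(" ", command));
    }
}
```

Either way, every machine that runs the conversion needs the office suite installed, which is the deployment cost noted above.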

openoffice+JODConverter

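Much the same as LibreOffice.

itext

An open-source library from abroad that can create and edit PDFs directly, and can also convert HTML to PDF.

For example, code that generates a PDF from data: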

package com.wh;

import java.io.FileOutputStream;

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.Element;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfWriter;

public class ToPDF {

    // Table header
    public static final String[] tableHeader = {"姓名", "性别", "年龄", "学院", "专业", "年级"};

    // Number of columns in the data table
    private static final int colNumber = 6;

    // Table setting: cell border width
    private static final int spacing = 2;

    // Table setting: cell padding
    private static final int padding = 2;

    // Export the PDF document
    public static void exportPdfDocument() {
        // Create the PDF document; 50, 50, 50, 50 are the left, right, top and bottom margins
        Document document = new Document(new Rectangle(1500, 2000), 50, 50, 50, 50);
        try {
            // Write the file through PdfWriter
            PdfWriter.getInstance(document, new FileOutputStream("d:\\学生信息.pdf"));
            document.open();
            // Chinese font (STSong-Light is provided by the separate itext-asian jar)
            BaseFont bfChinese = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
            Font fontChinese = new Font(bfChinese, 12, Font.NORMAL);
            // Create a table with colNumber (6) columns
            PdfPTable datatable = new PdfPTable(colNumber);
            // Relative column widths
            int[] cellsWidth = {8, 2, 2, 8, 5, 3};
            datatable.setWidths(cellsWidth);
            // Table width as a percentage of the page
            datatable.setWidthPercentage(100);
            datatable.getDefaultCell().setPadding(padding);
            datatable.getDefaultCell().setBorderWidth(spacing);
            // Background color of the table
            datatable.getDefaultCell().setBackgroundColor(BaseColor.GREEN);
            datatable.getDefaultCell().setHorizontalAlignment(Element.ALIGN_CENTER);
            // Add the header cells
            for (int i = 0; i < colNumber; i++) {
                datatable.addCell(new Paragraph(tableHeader[i], fontChinese));
            }
            // Add the data cells (the demo reuses the header values as sample data)
            for (int i = 0; i < colNumber; i++) {
                datatable.addCell(new Paragraph(tableHeader[i], fontChinese));
            }
            document.add(datatable);
        } catch (Exception e) {
            e.printStackTrace();
        }
        document.close();
    }

    public static void main(String[] args) throws Exception {
        exportPdfDocument();
    }
}

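Working with it feels similar to building an Excel sheet. However, it offers no chart editing, so it is a poor fit for complex charts in particular.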

openpdf

An open-source fork of itext. It can convert HTML to PDF and can also edit PDFs.

The conversion code:

/*
 * $Id: HelloHtml.java 3373 2008-05-12 16:21:24Z xlv $
 *
 * This code is part of the 'OpenPDF Tutorial'.
 * You can find the complete tutorial at the following address:
 * https://github.com/LibrePDF/OpenPDF/wiki/Tutorial
 *
 * This code is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 */
package com.lowagie.examples.html;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.html.HtmlParser;
import com.lowagie.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Generates a simple 'Hello World' HTML page.
 *
 * @author blowagie
 */
public class ParseHelloHtml {

    /**
     * Generates an HTML page with the text 'Hello World'
     *
     * @param args no arguments needed here
     */
    public static void main(String[] args) {
        System.out.println("Parse Hello World");
        // step 1: creation of a document-object
        try (Document document = new Document()) {
            PdfWriter.getInstance(document, new FileOutputStream("D:\\home\\parseHelloWorld.pdf"));
            // step 2: we open the document
            document.open();
            // step 3: parsing the HTML document to convert it in PDF
            HtmlParser.parse(document, ParseHelloHtml.class.getClassLoader().getResourceAsStream("com/lowagie/examples/html/parseHelloWorld.html"));
        } catch (DocumentException | IOException de) {
            System.err.println(de.getMessage());
        }
    }
}

Apache PDFBox

Positioned much like openpdf and itext, so I won't repeat the exercise here.

docx4j

1. Add the dependencies to the POM

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-documents4j-local</artifactId>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-excel</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-web</artifactId>
</dependency>

2. Set up the file conversion

package generator;

import com.documents4j.api.DocumentType;
import org.docx4j.documents4j.local.Documents4jLocalServices;
import org.springframework.web.multipart.MultipartFile;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * @author zhouyunjian
 * @date 2024/01/09 10:07
 **/
public class Docx4jOfficeFileToPDF {

    public static File transferToFile(MultipartFile multipartFile) {
        // Buffer the upload in a temporary file created by Java, filled via MultipartFile.transferTo()
        File file = null;
        try {
            String originalFilename = multipartFile.getOriginalFilename();
            String[] filename = originalFilename.split("\\.");
            // createTempFile uses the suffix verbatim, so the dot has to be added back
            file = File.createTempFile(filename[0], "." + filename[1]);
            multipartFile.transferTo(file);
            file.deleteOnExit();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return file;
    }

    public static void docx2Pdf() {
        File output = new File("D:\\home\\doc4j_007_doc.pdf");
        if (!output.exists()) {
            output.getParentFile().mkdirs();
        }
        // try-with-resources closes the stream even when the export fails
        try (FileOutputStream fos = new FileOutputStream(output)) {
            Documents4jLocalServices exporter = new Documents4jLocalServices();
            exporter.export(new File("D:\\home\\Manifest_LDM_CPM.docx"), fos, DocumentType.DOCX);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

// public static String pptx2Pdf(MultipartFile file) {

// String relativelyPath = String.format("pdf/%s.pdf", getCtm());

// String path = CommonConfig.getFileSavePath() + relativelyPath;

// File output = new File(path);

// if (!output.exists()){

// output.getParentFile().mkdirs();

// }

// FileOutputStream fos = null;

// try {

// fos = new FileOutputStream(output);

// Documents4jLocalServices exporter = new Documents4jLocalServices ();

// exporter.export(transferToFile(file), fos, DocumentType.MS_POWERPOINT);

// fos.close();

// } catch (Exception e) {

// e.printStackTrace();

// }

// return getServerBasePath() + relativelyPath;

// }

    public static void xlsx2Pdf() {
        // String relativelyPath = String.format("pdf/%s.pdf", getCtm());
        // String path = CommonConfig.getFileSavePath() + relativelyPath;
        File output = new File("D:\\home\\doc4j_006_change.pdf");
        if (!output.exists()) {
            output.getParentFile().mkdirs();
        }
        // try-with-resources closes the stream even when the export fails
        try (FileOutputStream fos = new FileOutputStream(output)) {
            Documents4jLocalServices exporter = new Documents4jLocalServices();
            exporter.export(new File("D:\\home\\Manifest_737_81B_ACARS.xlsx"), fos, DocumentType.XLSX);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void html2Pdf() {
        // String relativelyPath = String.format("pdf/%s.pdf", getCtm());
        // String path = CommonConfig.getFileSavePath() + relativelyPath;
        File output = new File("D:\\home\\doc4j_006_html.pdf");
        if (!output.exists()) {
            output.getParentFile().mkdirs();
        }
        try (FileOutputStream fos = new FileOutputStream(output)) {
            Documents4jLocalServices exporter = new Documents4jLocalServices();
            exporter.export(new File("D:\\home\\test.html"), fos, DocumentType.HTML);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        // Docx4jOfficeFileToPDF.xlsx2Pdf();
        // Docx4jOfficeFileToPDF.docx2Pdf();
        Docx4jOfficeFileToPDF.html2Pdf();
    }
}
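One detail in the transferToFile helper above deserves care: it derives the temp file's prefix and suffix by splitting the uploaded name on every dot, which misbehaves for names with several dots, no extension, or a very short base name (File.createTempFile rejects prefixes shorter than three characters). A minimal sketch of a more defensive split, in plain Java; the class TempFileNames, the ".tmp" fallback, and the underscore padding are hypothetical choices, not part of the original demo.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: derives createTempFile arguments from an uploaded file name. */
final class TempFileNames {

    private TempFileNames() {
    }

    /** Returns {prefix, suffix}: the suffix keeps its dot, the prefix has at least 3 characters. */
    static String[] split(String originalFilename) {
        String name = originalFilename == null ? "" : originalFilename;
        // Split on the last dot only, so "a.b.xlsx" keeps ".xlsx" as its extension
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String suffix = dot > 0 ? name.substring(dot) : ".tmp";
        // Pad short base names, because createTempFile rejects prefixes under three characters
        StringBuilder prefix = new StringBuilder(base);
        while (prefix.length() < 3) {
            prefix.append('_');
        }
        return new String[] {prefix.toString(), suffix};
    }

    /** Creates a temp file that keeps the original extension and is deleted on JVM exit. */
    static File createTempFileFor(String originalFilename) {
        String[] parts = split(originalFilename);
        try {
            File file = File.createTempFile(parts[0], parts[1]);
            file.deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(createTempFileFor("Manifest.v2.xlsx").getName());
    }
}
```

Keeping the real extension can matter when a downstream converter picks its input filter from the file name.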

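Comparison and evaluation

If the company has the budget, definitely buy spire.office.free or aspose: both are feature-complete and trouble-free.

If the company has no budget, libreoffice+JODConverter works, but it requires installing the software it depends on, so deployment is more of a hassle.

If you want the project to be open source and under your control, docx4j is recommended: it is feature-complete, open source, and adds no extra deployment cost.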

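Next, some of the newer personal open-source projects, such as excel2Pdf, test-exceltopdf, and file-convert-util-master, are not mature enough and have quite a few defects, so they are not recommended.

For pure PDF manipulation, consider the three siblings itext, openpdf, and Apache PDFBox.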

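Summary

Given the company's current situation, itext is the better fit: open source, feature-complete, and easy to deploy. While searching for solutions I went through Baidu, Toutiao, CSDN, WeChat official accounts, and several open-source sites, and found that about 90% of the articles online are traps. The ones that are not written out in full are especially deep pits, and I have no energy left to complain about them. What you learn on paper always stays shallow: I validated the various articles and candidate approaches no fewer than 20 times and tested essentially everything that could be tested, which is why I wrote this summary. I hope it helps.

Source code for the validation and analysis (see the demo for the core parts):

Link: https://pan.baidu.com/s/1t2CfOFtyMF4Dga7MQU2M_w?pwd=4itf

Extraction code: 4itf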