100 Python Libraries, No. 39: xhtml2pdf (HTML to PDF)

Originally published 2025-07-30 10:05:09 · CC 4.0 BY-SA license



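📚 Library Overview

xhtml2pdf is an open-source Python library for converting HTML content into PDF documents. It is built on ReportLab and supports HTML5 with CSS 2.1 plus parts of CSS 3, so the HTML structure and most common styling carry over to the PDF. It is one of the more popular HTML-to-PDF options in the Python ecosystem.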

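🎯 Key Features

✅ HTML to PDF conversion: converts an HTML string or file directly to PDF
✅ CSS support: parses and renders CSS 2.1 and parts of CSS 3
✅ Images and fonts: embeds images and custom fonts in the PDF
✅ Hyperlinks: preserves hyperlinks from the HTML
✅ External stylesheets: can load external CSS files
✅ Cross-platform: runs on Windows, Linux, and macOS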

🚀 Installation

Requirements

Python 3.8.0 or later
A virtual environment is recommended

Install commands

# Install with pip
pip install xhtml2pdf

# Or install from source
pip install git+https://github.com/xhtml2pdf/xhtml2pdf.git

💡 Basic Usage

1. Converting a simple HTML string to PDF

from xhtml2pdf import pisa

# HTML content
html_content = """
<html>
<head>
    <style>
        body { font-family: Arial, sans-serif; }
        h1 { color: #333; text-align: center; }
        p { line-height: 1.6; margin: 10px 0; }
    </style>
</head>
<body>
    <h1>Welcome to xhtml2pdf</h1>
    <p>This is a simple HTML-to-PDF example.</p>
    <p>xhtml2pdf preserves the CSS styling and HTML structure.</p>
</body>
</html>
"""

# Convert to PDF
with open("output.pdf", "w+b") as result_file:
    pisa_status = pisa.CreatePDF(
        html_content,
        dest=result_file
    )

    # pisa_status.err holds the number of errors; 0 means success
    if pisa_status.err:
        print("An error occurred during conversion!")
    else:
        print("PDF generated successfully!")

2. Converting from an HTML file

from xhtml2pdf import pisa

def convert_html_to_pdf(source_html, output_filename):
    """
    Convert an HTML file to PDF.

    Args:
        source_html (str): Path to the HTML file
        output_filename (str): Name of the output PDF file

    Returns:
        bool: True if the conversion finished without errors
    """
    # Read the HTML file
    with open(source_html, 'r', encoding='utf-8') as html_file:
        html_content = html_file.read()

    # Convert to PDF
    with open(output_filename, "w+b") as result_file:
        pisa_status = pisa.CreatePDF(
            html_content,
            dest=result_file,
            encoding='utf-8'
        )

        return not pisa_status.err

# Usage example
success = convert_html_to_pdf("template.html", "document.pdf")
if success:
    print("Conversion succeeded!")
else:
    print("Conversion failed!")

3. Building the PDF in memory with BytesIO

import io
from xhtml2pdf import pisa

def html_to_pdf_bytes(html_content):
    """
    Convert HTML to PDF bytes.

    Args:
        html_content (str): HTML content

    Returns:
        bytes: The PDF data, or None if the conversion failed
    """
    output = io.BytesIO()

    pisa_status = pisa.CreatePDF(
        html_content,
        dest=output
    )

    if not pisa_status.err:
        return output.getvalue()
    else:
        return None

# Usage example
html = "<html><body><h1>PDF in memory</h1></body></html>"
pdf_bytes = html_to_pdf_bytes(html)

if pdf_bytes:
    # Save to a file
    with open("memory_pdf.pdf", "wb") as f:
        f.write(pdf_bytes)
    print(f"PDF size: {len(pdf_bytes)} bytes")

🎨 Advanced Features

1. Adding images

from xhtml2pdf import pisa

# The image is fetched over HTTP, so this example needs network access
html_with_image = """
<html>
<head>
    <style>
        .container { text-align: center; padding: 20px; }
        img { max-width: 300px; border: 2px solid #ccc; }
    </style>
</head>
<body>
    <div class="container">
        <h1>A PDF with an image</h1>
        <img src="https://via.placeholder.com/300x200" alt="Sample image">
        <p>This is an example of a PDF document that contains an image.</p>
    </div>
</body>
</html>
"""

with open("pdf_with_image.pdf", "w+b") as result_file:
    pisa.CreatePDF(html_with_image, dest=result_file)
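
The example above loads its image over HTTP. For local files, `pisa.CreatePDF` accepts a `link_callback` argument: a function that receives each URI found in the HTML together with the base path, and returns something the converter can open. The following is a minimal sketch under stated assumptions. `ASSET_ROOT`, `resolve_asset`, and `render_with_local_assets` are illustrative names rather than part of xhtml2pdf, and the `assets` directory is hypothetical.

```python
import os

# Hypothetical directory holding local images and stylesheets (an assumption).
ASSET_ROOT = os.path.abspath("assets")


def resolve_asset(uri, rel=None, root=ASSET_ROOT):
    """Map a URI from the HTML to a local file path.

    Remote URLs and data URIs are returned unchanged so that xhtml2pdf
    handles them itself.
    """
    if uri.startswith(("http://", "https://", "data:")):
        return uri
    # Treat everything else as a path relative to the asset root.
    candidate = os.path.normpath(os.path.join(root, uri.lstrip("/")))
    # Refuse paths that escape the asset root (e.g. "../../etc/passwd").
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"URI points outside the asset root: {uri}")
    return candidate


def render_with_local_assets(html, output_path):
    """Convert HTML to PDF, resolving local URIs through resolve_asset."""
    # Imported here so resolve_asset can be used and tested without xhtml2pdf.
    from xhtml2pdf import pisa

    with open(output_path, "w+b") as result_file:
        status = pisa.CreatePDF(html, dest=result_file, link_callback=resolve_asset)
    return not status.err
```

With this in place, `<img src="img/logo.png">` would be looked up as `assets/img/logo.png`. Rejecting paths outside the root is a design choice that matters when the HTML comes from user input.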

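2. Custom page settings

from xhtml2pdf import pisa

html_with_page_settings = """
<html>
<head>
    <style>
        @page {
            size: A4;
            margin: 2cm;
            /* xhtml2pdf places repeated content through static frames */
            @frame footer {
                -pdf-frame-content: footer_content;
                bottom: 1cm;
                margin-left: 2cm;
                margin-right: 2cm;
                height: 1cm;
            }
        }
        body {
            font-family: "SimHei", sans-serif;
            font-size: 12pt;
        }
        .header {
            text-align: center;
            border-bottom: 2px solid #333;
            padding-bottom: 10px;
            margin-bottom: 20px;
        }
    </style>
</head>
<body>
    <!-- Rendered in the footer frame on every page -->
    <div id="footer_content" style="text-align: right;">Page <pdf:pagenumber></div>

    <div class="header">
        <h1>Professional Document Title</h1>
        <p>Subtitle information</p>
    </div>

    <h2>Chapter 1: Overview</h2>
    <p>The main content of the document goes here...</p>

    <div style="page-break-before: always;">
        <h2>Chapter 2: Details</h2>
        <p>This content starts on a new page...</p>
    </div>
</body>
</html>
"""

with open("professional_document.pdf", "w+b") as result_file:
    pisa.CreatePDF(html_with_page_settings, dest=result_file)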

3. Error handling and logging

from xhtml2pdf import pisa
import logging

# Enable xhtml2pdf's log output
pisa.showLogging()

def safe_html_to_pdf(html_content, output_path):
    """
    Convert HTML to PDF with error handling.

    Returns True on success and False on any failure.
    """
    try:
        with open(output_path, "w+b") as result_file:
            pisa_status = pisa.CreatePDF(
                html_content,
                dest=result_file,
                encoding='utf-8'
            )

            if pisa_status.err:
                logging.error(f"PDF conversion failed, error count: {pisa_status.err}")
                return False
            else:
                logging.info(f"PDF conversion succeeded: {output_path}")
                return True

    except Exception as e:
        logging.error(f"Exception during conversion: {e}")
        return False

# Usage example
html = "<html><body><h1>Test document</h1></body></html>"
result = safe_html_to_pdf(html, "safe_output.pdf")

🔧 Practical Use Cases

1. Generating a report

from xhtml2pdf import pisa

def generate_report(data):
    """
    Build the HTML for a data report.
    """
    html_template = f"""
    <html>
    <head>
        <style>
            table {{ border-collapse: collapse; width: 100%; }}
            th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
            th {{ background-color: #f2f2f2; }}
            .summary {{ background-color: #e7f3ff; padding: 15px; margin: 10px 0; }}
        </style>
    </head>
    <body>
        <h1>Data Analysis Report</h1>
        <div class="summary">
            <h3>Summary</h3>
            <p>Total records: {data['total_records']}</p>
            <p>Generated at: {data['generated_at']}</p>
        </div>

        <h2>Detailed Data</h2>
        <table>
            <tr><th>Item</th><th>Value</th><th>Status</th></tr>
    """

    # Append one table row per item
    for item in data['items']:
        html_template += f"""
            <tr>
                <td>{item['name']}</td>
                <td>{item['value']}</td>
                <td>{item['status']}</td>
            </tr>
        """

    html_template += """
        </table>
    </body>
    </html>
    """

    return html_template

# Usage example
report_data = {
    'total_records': 150,
    'generated_at': '2024-01-15 10:30:00',
    'items': [
        {'name': 'Item A', 'value': 100, 'status': 'OK'},
        {'name': 'Item B', 'value': 85, 'status': 'Warning'},
        {'name': 'Item C', 'value': 120, 'status': 'OK'}
    ]
}

html_report = generate_report(report_data)
with open("data_report.pdf", "w+b") as f:
    pisa.CreatePDF(html_report, dest=f)
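
`generate_report` interpolates values straight into the markup, so a value containing `<` or `&` could break the table or inject tags. One possible way to guard against that is to escape every cell with the standard library's `html.escape`. The sketch below is only an illustration, and `build_table_rows` is a hypothetical helper, not part of the original function.

```python
from html import escape


def build_table_rows(items):
    """Return <tr> markup for the report table, with every cell value escaped."""
    rows = []
    for item in items:
        cells = "".join(
            f"<td>{escape(str(item[key]))}</td>"
            for key in ("name", "value", "status")
        )
        rows.append(f"<tr>{cells}</tr>")
    return "\n".join(rows)


rows = build_table_rows([{"name": "A & B <test>", "value": 100, "status": "OK"}])
print(rows)
# <tr><td>A &amp; B &lt;test&gt;</td><td>100</td><td>OK</td></tr>
```

The returned string could replace the row loop in `generate_report`. For larger templates, a template engine with auto-escaping would likely be a better fit.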

2. Django integration example

# Django view function
import io

from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from django.template.loader import render_to_string
from xhtml2pdf import pisa

from .models import Invoice  # assumed location of the Invoice model used below

def generate_invoice_pdf(request, invoice_id):
    """
    Generate an invoice PDF.
    """
    # Fetch the invoice, or return a 404 if it does not exist
    invoice = get_object_or_404(Invoice, id=invoice_id)

    # Render the HTML template
    html_string = render_to_string('invoice_template.html', {
        'invoice': invoice,
        'items': invoice.items.all()
    })

    # Create the PDF in memory
    result = io.BytesIO()
    pdf = pisa.CreatePDF(html_string, dest=result)

    if not pdf.err:
        response = HttpResponse(
            result.getvalue(),
            content_type='application/pdf'
        )
        response['Content-Disposition'] = f'attachment; filename="invoice_{invoice_id}.pdf"'
        return response

    return HttpResponse('PDF generation failed', status=500)

⚠️ Things to Watch Out For

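1. Chinese font support

# Register a font with CJK glyphs so Chinese text renders correctly.
# simhei.ttf must be reachable from where the conversion runs.
html_with_chinese = """
<html>
<head>
    <style>
        @font-face {
            font-family: 'SimHei';
            src: url('simhei.ttf');
        }
        body {
            font-family: 'SimHei', sans-serif;
        }
    </style>
</head>
<body>
    <h1>中文标题</h1>
    <p>这是中文内容的示例。</p>
</body>
</html>
"""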

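The `@font-face` rule above only works if `simhei.ttf` can be found at conversion time. A small sketch that builds the rule from an absolute path and fails early when the file is missing might look like this. `font_face_css` is a hypothetical helper, and it assumes xhtml2pdf can open the absolute path on your platform.

```python
from pathlib import Path


def font_face_css(family, ttf_path):
    """Return an @font-face rule pointing at an existing TrueType file."""
    path = Path(ttf_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Font file not found: {path}")
    # as_posix() keeps forward slashes so the path is valid inside url(...)
    return (
        "@font-face {\n"
        f"    font-family: '{family}';\n"
        f"    src: url('{path.as_posix()}');\n"
        "}"
    )
```

The returned rule can be placed inside the `<style>` element of the HTML before it is passed to `pisa.CreatePDF`.
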
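2. Performance tips

For large amounts of data, consider splitting the content into pages or batches
Keep the CSS simple and avoid complex layouts
Compress images to reduce the size of the PDF
Cache results to avoid converting the same content twice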
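
The caching tip can be sketched as a small wrapper keyed by a hash of the HTML. This is an in-memory illustration only. `CachedConverter` is a hypothetical name, and the `convert` argument stands for any function that turns an HTML string into PDF bytes, such as `html_to_pdf_bytes` from the basic usage section.

```python
import hashlib


class CachedConverter:
    """Cache PDF bytes by the SHA-256 digest of the source HTML."""

    def __init__(self, convert):
        self._convert = convert  # callable: str -> bytes, or None on failure
        self._cache = {}

    def get_pdf(self, html):
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if key not in self._cache:
            pdf_bytes = self._convert(html)
            if pdf_bytes is None:
                return None  # failed conversions are not cached
            self._cache[key] = pdf_bytes
        return self._cache[key]
```

A caller would create `CachedConverter(html_to_pdf_bytes)` once and call `get_pdf(html)` for each request. The cache here is unbounded, so a long-running service would probably want a size limit or an external store.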

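3. Fixing common problems

# Work around encoding problems
def fix_encoding_issues(html_content):
    """
    Fix a common encoding problem: a missing charset declaration.
    """
    # Make sure the HTML declares an encoding
    if 'charset' not in html_content:
        html_content = html_content.replace(
            '<head>',
            '<head><meta charset="UTF-8">'
        )
    return html_content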
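
The helper above only matches a lowercase `<head>` tag without attributes. A slightly more tolerant variant, sketched here with regular expressions, also handles uppercase tags and documents that have no `<head>` at all. `ensure_utf8_meta` is a hypothetical name used for illustration.

```python
import re

_HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
_CHARSET_RE = re.compile(r"<meta[^>]+charset", re.IGNORECASE)
_META = '<meta charset="UTF-8">'


def ensure_utf8_meta(html_content):
    """Insert a UTF-8 <meta> tag unless the document already declares a charset."""
    if _CHARSET_RE.search(html_content):
        return html_content
    match = _HEAD_RE.search(html_content)
    if match:
        # Insert right after the opening <head ...> tag.
        end = match.end()
        return html_content[:end] + _META + html_content[end:]
    # No <head> at all: prepend the tag and let the HTML parser place it.
    return _META + html_content
```

An existing declaration is left untouched even when it names a different encoding, since rewriting it without re-encoding the text could make things worse.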

📊 Library Comparison

Feature	xhtml2pdf	WeasyPrint	ReportLab
Ease of use	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
CSS support	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐
Performance	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐
Documentation quality	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐