CPU vs. GPU efficiency when converting with pdf-craft

Article link: https://blog.csdn.net/toseekin/article/details/147040277

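This post uses pdf-craft to convert a PDF file to Markdown, converting the same file under two configurations: CPU and GPU. Two Python 3.10 environments (environment 1 and environment 2) were created with Anaconda. The CPU configuration runs in environment 1 and the GPU configuration runs in environment 2.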

Package versions in environment 1:

onnx 1.13.0
onnxruntime 1.21.0
pdf-craft 0.0.15
torch 2.6.0
torchvision 0.21.0

Package versions in environment 2:

onnx                     1.16.0
onnxruntime              1.20.0
pdf-craft                0.0.12
torch                    2.6.0+cu126
torchaudio               2.6.0+cu126
torchvision              0.21.0

Install command for pdf-craft:

pip install pdf-craft

Install command for the GPU build of onnxruntime (for version compatibility, see the "NVIDIA - CUDA | onnxruntime" page):

pip install onnxruntime-gpu==1.20.0
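Before running the GPU configuration, it can help to confirm that both torch and onnxruntime actually see a CUDA device in the active environment. The following is a minimal sketch; the helper name `check_gpu_support` is made up for illustration, and it only reports what the installed packages expose rather than anything specific to pdf-craft.

```python
def check_gpu_support():
    """Report whether torch and onnxruntime can see CUDA in this environment."""
    report = {"torch_cuda": None, "ort_providers": None}

    try:
        import torch
        # True only if a CUDA-enabled torch build finds a usable GPU
        report["torch_cuda"] = torch.cuda.is_available()
    except ImportError:
        pass  # torch is not installed in this environment

    try:
        import onnxruntime as ort
        # The GPU build lists "CUDAExecutionProvider" among its providers
        report["ort_providers"] = ort.get_available_providers()
    except ImportError:
        pass  # onnxruntime is not installed in this environment

    return report


if __name__ == "__main__":
    print(check_gpu_support())
```

If `torch_cuda` is False, or "CUDAExecutionProvider" is missing from `ort_providers`, the corresponding library will presumably run on the CPU regardless of the `device` setting.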

Code for the CPU configuration:

from pdf_craft import PDFPageExtractor, MarkDownWriter
import time
from functools import wraps

def timeit(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        elapsed_ms = (end_time - start_time) * 1000  # convert to milliseconds
        print(f"Function '{func.__name__}' executed in {elapsed_ms:.2f} ms")
        return result
    return wrapper

# Create the PDF extractor object
extractor = PDFPageExtractor(
    device="cpu",  # for GPU, change to device="cuda:0"
    model_dir_path=r"C:\ExtractionModel"  # directory where models are downloaded and stored
)

markdown_path = r'QT中的元对象系统-cpu.md'
pdf_path = r'QT中的元对象系统.pdf'

# Time the whole conversion
@timeit
def ParseMD(pdf_path, markdown_path):
    # Create the Markdown writer object
    with MarkDownWriter(markdown_path, "images", "utf-8") as md:
        for block in extractor.extract(pdf=pdf_path):
            md.write(block)

ParseMD(pdf_path,markdown_path)

Code for the GPU configuration:

from pdf_craft import PDFPageExtractor, MarkDownWriter
import time
from functools import wraps

def timeit(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        elapsed_ms = (end_time - start_time) * 1000  # convert to milliseconds
        print(f"Function '{func.__name__}' executed in {elapsed_ms:.2f} ms")
        return result
    return wrapper

# Create the PDF extractor object
extractor = PDFPageExtractor(
    device="cuda:0",  # first CUDA GPU; for the CPU run, use device="cpu"
    model_dir_path=r"C:\ExtractionModel"  # directory where models are downloaded and stored
)

markdown_path = r'QT中的元对象系统_gpu.md'
pdf_path = r'QT中的元对象系统.pdf'

@timeit
def ParseMD(pdf_path, markdown_path):
    # Create the Markdown writer object
    with MarkDownWriter(markdown_path, "images", "utf-8") as md:
        for block in extractor.extract(pdf=pdf_path):
            md.write(block)

@timeit
def example_function(n):
    time.sleep(n / 1000)  # convert milliseconds to seconds
    return "done"

ParseMD(pdf_path,markdown_path)

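On its first run, pdf-craft downloads the model weight files it needs to C:\ExtractionModel; the path can be changed as needed. The files can also be downloaded directly from the top of the article.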

Log from the CPU run:

0: 1024x736 2 titles, 6 plain texts, 8609.1ms
Speed: 24.0ms preprocess, 8609.1ms inference, 27.0ms postprocess per image at shape (1, 3, 1024, 736)

0: 1024x736 12 plain texts, 6749.0ms
Speed: 16.0ms preprocess, 6749.0ms inference, 5.0ms postprocess per image at shape (1, 3, 1024, 736)
0: 1024x736 1 title, 9 plain texts, 8267.8ms
Speed: 20.0ms preprocess, 8267.8ms inference, 5.0ms postprocess per image at shape (1, 3, 1024, 736)
0: 1024x736 1 title, 15 plain texts, 1 figure, 10182.0ms
Speed: 26.0ms preprocess, 10182.0ms inference, 6.0ms postprocess per image at shape (1, 3, 1024, 736)

0: 1024x736 4 plain texts, 7552.4ms
Speed: 23.0ms preprocess, 7552.4ms inference, 7.0ms postprocess per image at shape (1, 3, 1024, 736)
Function 'ParseMD' executed in 383678.58 ms

Log from the GPU run:

0: 1024x736 2 titles, 6 plain texts, 91.7ms
Speed: 8.0ms preprocess, 91.7ms inference, 55.3ms postprocess per image at shape (1, 3, 1024, 736)

0: 1024x736 12 plain texts, 15.9ms
Speed: 4.0ms preprocess, 15.9ms inference, 2.0ms postprocess per image at shape (1, 3, 1024, 736)

0: 1024x736 1 title, 9 plain texts, 15.9ms
Speed: 4.0ms preprocess, 15.9ms inference, 3.0ms postprocess per image at shape (1, 3, 1024, 736)

0: 1024x736 1 title, 15 plain texts, 1 figure, 16.9ms
Speed: 4.0ms preprocess, 16.9ms inference, 1.0ms postprocess per image at shape (1, 3, 1024, 736)

0: 1024x736 4 plain texts, 14.9ms
Speed: 5.0ms preprocess, 14.9ms inference, 2.0ms postprocess per image at shape (1, 3, 1024, 736)
Function 'ParseMD' executed in 471143.47 ms
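The two logs can be compared numerically by pulling the inference times out of the "Speed:" lines and setting them against the totals printed by `timeit`. This is a rough sketch that uses only the log lines excerpted above; the helper names are made up for illustration.

```python
import re

# Matches the inference figure in a "Speed:" line, e.g. "15.9ms inference"
INFERENCE_RE = re.compile(r"([\d.]+)ms inference")

CPU_LOG = """\
Speed: 24.0ms preprocess, 8609.1ms inference, 27.0ms postprocess per image at shape (1, 3, 1024, 736)
Speed: 16.0ms preprocess, 6749.0ms inference, 5.0ms postprocess per image at shape (1, 3, 1024, 736)
Speed: 20.0ms preprocess, 8267.8ms inference, 5.0ms postprocess per image at shape (1, 3, 1024, 736)
Speed: 26.0ms preprocess, 10182.0ms inference, 6.0ms postprocess per image at shape (1, 3, 1024, 736)
Speed: 23.0ms preprocess, 7552.4ms inference, 7.0ms postprocess per image at shape (1, 3, 1024, 736)
"""

GPU_LOG = """\
Speed: 8.0ms preprocess, 91.7ms inference, 55.3ms postprocess per image at shape (1, 3, 1024, 736)
Speed: 4.0ms preprocess, 15.9ms inference, 2.0ms postprocess per image at shape (1, 3, 1024, 736)
Speed: 4.0ms preprocess, 15.9ms inference, 3.0ms postprocess per image at shape (1, 3, 1024, 736)
Speed: 4.0ms preprocess, 16.9ms inference, 1.0ms postprocess per image at shape (1, 3, 1024, 736)
Speed: 5.0ms preprocess, 14.9ms inference, 2.0ms postprocess per image at shape (1, 3, 1024, 736)
"""


def inference_times(log_text):
    """Return the per-image inference times (ms) found in a log excerpt."""
    return [float(value) for value in INFERENCE_RE.findall(log_text)]


def summarize(log_text, total_ms):
    """Compare logged inference time with the total reported by timeit."""
    times = inference_times(log_text)
    return {
        "images": len(times),
        "inference_ms": round(sum(times), 1),
        "total_ms": total_ms,
    }


cpu = summarize(CPU_LOG, 383678.58)
gpu = summarize(GPU_LOG, 471143.47)
print("CPU:", cpu)
print("GPU:", gpu)
print("GPU/CPU total ratio:", round(gpu["total_ms"] / cpu["total_ms"], 2))
```

For the excerpted lines, the logged inference time sums to about 41360 ms on the CPU and about 155 ms on the GPU, while the GPU total is about 1.23 times the CPU total. The logged inference step therefore does not account for the difference in total time.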

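The GPU run turned out slower overall (471143.47 ms versus 383678.58 ms), which is odd, because the logged per-image inference time dropped from roughly 6.7 to 10.2 s on the CPU to roughly 15 to 17 ms on the GPU after the first image. The time is therefore being spent somewhere other than the logged inference step. Note also that the two environments differ in pdf-craft version (0.0.15 versus 0.0.12), so the totals are not a strictly like-for-like comparison.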
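One way to find where the remaining time goes is to profile the conversion with the standard-library cProfile module. The sketch below is an assumption about how one might investigate, not something done in the original runs. `profile_call` is a made-up helper name, and `slow_stage` is a stand-in so the example runs without pdf-craft.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the slowest call chains appear first
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_stage(delay_s):
    """Stand-in for a slow pipeline stage (hypothetical)."""
    time.sleep(delay_s)
    return "done"


if __name__ == "__main__":
    result, report = profile_call(slow_stage, 0.05)
    print(result)
    print(report)
```

In the scripts above, the equivalent call would be `profile_call(ParseMD, pdf_path, markdown_path)`. Profiling adds overhead of its own, so the report is better read for the relative ranking of functions than for absolute timings.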