TensorRT -- PTQ (implicit quantization) -- Keras

Original article, last modified 2025-01-13 10:45:49

License: CC 4.0 BY-SA

First published 2025-01-13 10:37:22

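Motivation: the model is built with Keras. If the trained model (exported as .onnx) is quantized with the TFLite tooling, the resulting .tflite file cannot be converted to .trt directly. It has to be converted back to .onnx first, so the quantization result is lost by the time the model is integrated. I therefore decided to quantize the exported .onnx with TensorRT instead.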

Method 1: PTQ -- trtexec

(1) INT8 quantization

trtexec --onnx='xx.onnx' --int8 --saveEngine='xx.trt'

(2) Mixed INT8 / FP16 quantization

trtexec --onnx='xx.onnx' --fp16 --int8 --saveEngine='xx.trt'

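Both commands caused a very severe accuracy drop. Neither one supplies calibration data, so the INT8 dynamic ranges are not derived from real inputs, which most likely explains the loss.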

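Method 2: PTQ -- calibration while serializing the engine

Workflow: export the ONNX model as usual. Before the ONNX model is serialized into a TensorRT engine, enable INT8 mode and calibrate with a calibration dataset.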
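
To make the calibration step more concrete, here is a minimal pure-Python sketch of symmetric INT8 quantization with a max-based scale. It is a simplification for illustration only, not what TensorRT does internally (its calibrators use more elaborate algorithms such as entropy calibration). The function names, sample values, and the deliberately bad scale are all made up.

```python
def calibrate_scale(batches):
    """Max calibration: map the largest magnitude seen in the data to 127."""
    amax = max(abs(x) for batch in batches for x in batch)
    return amax / 127.0


def quantize(x, scale):
    """Round to the nearest INT8 step and clamp to the symmetric range."""
    return max(-127, min(127, round(x / scale)))


def dequantize(q, scale):
    return q * scale


def mean_abs_error(values, scale):
    """Average round-trip error introduced by quantizing with this scale."""
    errors = [abs(v - dequantize(quantize(v, scale), scale)) for v in values]
    return sum(errors) / len(errors)


# Hypothetical activation values observed on the calibration set.
calibration_batches = [[0.1, -0.4, 0.25], [0.9, -1.2, 0.05]]
# Hypothetical activation values seen later at inference time.
activations = [0.3, -0.7, 1.1, 0.02]

calibrated_scale = calibrate_scale(calibration_batches)
guessed_scale = 50.0 / 127.0  # assumes a range of +/-50, far wider than the data

print("error with calibrated scale:", mean_abs_error(activations, calibrated_scale))
print("error with guessed scale:   ", mean_abs_error(activations, guessed_scale))
```

With a scale that matches the data, the round-trip error stays below half a quantization step. With a range that is far too wide, most of the 255 levels go unused and the error grows accordingly, which is the effect calibration is meant to avoid.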

Step 1: Install Polygraphy

pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com

Step 2: Write data_loader.py

Below is my data_loader.py. It is modeled on the data loading used during training and loads the calibration images listed in the annotation file.

import os
import numpy as np
from PIL import Image
from utils.utils import cvtColor, preprocess_input

# Data loader for the calibration images
class CalibrationDataset:
    def __init__(self, annotation_lines, input_shape, dataset_path):
        self.annotation_lines = annotation_lines
        self.input_shape = input_shape
        self.dataset_path = dataset_path

    def __len__(self):
        return len(self.annotation_lines)

    def __getitem__(self, index):
        annotation_line = self.annotation_lines[index]
        name = annotation_line.split()[0]

        # Load the image file
        jpg = Image.open(os.path.join(self.dataset_path, name + ".jpg"))
        jpg = self.get_random_data(jpg, self.input_shape)
        # Keep the HWC layout produced by PIL; enable the transpose below only if the ONNX input is NCHW.
        jpg = preprocess_input(np.array(jpg, np.float32))
        # jpg = np.transpose(jpg, [2, 0, 1])
        jpg = np.expand_dims(jpg, 0)  # add the batch dimension
        return jpg

    def get_random_data(self, image, input_shape):
        # Resize the image with its aspect ratio unchanged and pad with gray bars
        image = cvtColor(image)
        iw, ih = image.size
        h, w = input_shape
        scale = min(w / iw, h / ih)
        nw = int(iw * scale)
        nh = int(ih * scale)

        image = image.resize((nw, nh), Image.BICUBIC)
        new_image = Image.new('RGB', [w, h], (128, 128, 128))
        new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
        return new_image

# Data loading function used by Polygraphy
def load_data():
    # Paths and parameters
    annotation_path = "./new_data/image_list.txt"
    dataset_path = "./new_data/jpg/"
    input_shape = (416, 416)                   # replace with the model's input shape (H, W)

    # Read the annotation file, skipping blank lines
    with open(annotation_path, "r") as f:
        annotation_lines = [line for line in f if line.strip()]

    # Build the dataset
    dataset = CalibrationDataset(annotation_lines, input_shape, dataset_path)

    # Yield one feed dict per image for Polygraphy
    for i in range(len(dataset)):
        yield {"input_1": dataset[i]}  # replace "input_1" with the model's actual input name
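
The resize-and-pad arithmetic in get_random_data can be checked without loading any image. The following is a small sketch of the same geometry as a standalone function. The name letterbox_geometry and the example sizes are my own and are not part of the original script.

```python
def letterbox_geometry(image_size, input_shape):
    """Return (new_w, new_h, left, top) for an aspect-preserving resize with padding.

    image_size is (width, height) as reported by PIL;
    input_shape is (height, width) as used by the model.
    """
    iw, ih = image_size
    h, w = input_shape
    scale = min(w / iw, h / ih)          # shrink or grow until one side fits
    nw, nh = int(iw * scale), int(ih * scale)
    left, top = (w - nw) // 2, (h - nh) // 2  # center the resized image
    return nw, nh, left, top


landscape = letterbox_geometry((832, 624), (416, 416))
portrait = letterbox_geometry((208, 416), (416, 416))
print(landscape)  # (416, 312, 0, 52)
print(portrait)   # (208, 416, 104, 0)
```

A landscape image fills the full width and gets gray bars above and below; a portrait image fills the full height and gets bars on the left and right.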

Step 3: Convert with Polygraphy

polygraphy convert "./xx.onnx" --int8 --data-loader-script "./data_loader.py" --calibration-cache calibration.cache -o "./seg.engine"

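The generated .engine file is the same as a .trt file apart from the extension. Both can be used for inference with TensorRT.

Notes:

1. Make sure the tensorrt package installed in the model's conda environment has the same version as the TensorRT release downloaded for inference.

2. Keep the shapes compatible, for example by standardizing on (B, C, H, W).

3. Many guides online use the command polygraphy convert "./xx.onnx" --int8 --data-loader-script "./data_loader.py" --calibration-cache calibration.cache -o "./model.plan" --trt-min-shapes input_1:[1, 3, 416, 416] --trt-opt-shapes input_1:[1, 3, 416, 416] --trt-max-shapes input_1:[1, 3, 416, 416], but it did not run for me. The official documentation gives the form shown in Step 3, so I suggest steering clear of the longer variant.

4. All of the above is my own experience and opinion, for reference only. Thanks!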
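
Regarding note 3, one plausible cause (which I have not verified against the actual error) is the spaces inside the brackets: a POSIX shell splits an unquoted shape at each space, so the tool receives fragments instead of one shape. The sketch below uses Python's shlex module to mimic that word splitting. The variable names are illustrative.

```python
import shlex

# Shape argument as written in note 3, with spaces after the commas.
with_spaces = "--trt-min-shapes input_1:[1, 3, 416, 416]"
# The same shape written without spaces.
without_spaces = "--trt-min-shapes input_1:[1,3,416,416]"

# shlex.split follows POSIX shell word-splitting rules (it does not do globbing).
tokens_with_spaces = shlex.split(with_spaces)
tokens_without_spaces = shlex.split(without_spaces)

print(tokens_with_spaces)
print(tokens_without_spaces)
```

The first form yields five tokens, with the shape broken into four pieces; the second yields the option and a single shape token. Note also that the shape in that command is (B, C, H, W), whereas the data loader in Step 2 produces batches in (B, H, W, C) order unless the transpose is enabled, so the layout is worth checking as well.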