### Using YOLOv8 with TensorRT for Model Inference and Deployment
To run YOLOv8 inference efficiently with TensorRT in Python, you typically need to complete a few main steps: install the required packages, convert the model, and write the corresponding inference code.
#### Installing Dependencies
First, make sure the required libraries and tools are installed:
- YOLOv8 is installed via the `ultralytics` package:
```bash
pip install ultralytics
```
- TensorRT and its Python bindings are distributed through NVIDIA's official channels. For version compatibility, consult the [NVIDIA documentation](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) and install the TensorRT build that matches your operating system and CUDA version; a quick import check follows this list.
- If you plan to use ONNX as an intermediate representation, also install `onnx` and `onnxruntime-gpu`:
```bash
pip install onnx onnxruntime-gpu
```
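After installation, a quick import check (a minimal sketch; it only assumes the `tensorrt` and `pycuda` packages are importable in the current environment) confirms the bindings are visible to Python:
```python
import tensorrt
import pycuda.driver  # noqa: F401  -- imported only to verify the binding loads

print("TensorRT version:", tensorrt.__version__)
```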
#### Converting the YOLOv8 Model to a TensorRT-Compatible Format
YOLOv8 models are saved as PyTorch weight files (`.pt`) by default. To accelerate them with TensorRT, the model is first exported to ONNX and then compiled into a TensorRT engine. The manual route is shown in the two steps below; a one-line alternative using the built-in `ultralytics` exporter appears after these steps.
1. **Export the YOLOv8 model to ONNX**
Load the weights through the `ultralytics` API and pass the underlying PyTorch module to `torch.onnx.export()`. This assumes an existing pretrained weight file named `model.pt`.
```python
import torch
from ultralytics import YOLO

# Load the YOLOv8 weights and unwrap the underlying PyTorch module.
# (torch.hub.load targets the YOLOv5 repo layout and does not work for YOLOv8.)
model = YOLO('path/to/model.pt').model
model.eval().cuda()
dummy_input = torch.randn(1, 3, 640, 640).cuda()  # input shape should match your deployment needs
output_onnx = "yolov8.onnx"
input_names = ["input_0"]
output_names = ["output_0"]
dynamic_axes = {'input_0': {0: 'batch_size'}, 'output_0': {0: 'batch_size'}}
torch.onnx.export(
model,
dummy_input,
output_onnx,
export_params=True,
opset_version=11,
do_constant_folding=True,
input_names=input_names,
output_names=output_names,
dynamic_axes=dynamic_axes
)
```
2. **Build the TensorRT-optimized engine**
Next, use the TensorRT API to read the generated `.onnx` file and build a TRT engine. The exact calls differ between TensorRT versions; the example below uses the TensorRT 8.x Python interface.
```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_file_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_file_path, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
    config = builder.create_builder_config()
    # Optimization profile: only the batch dimension was exported as dynamic,
    # so height and width must stay at the fixed 640x640 used during export.
    profile = builder.create_optimization_profile()
    min_shape = (1, 3, 640, 640)
    opt_shape = (1, 3, 640, 640)
    max_shape = (8, 3, 640, 640)
    profile.set_shape("input_0", min=min_shape, opt=opt_shape, max=max_shape)
    config.add_optimization_profile(profile)
    # build_serialized_network returns a serialized plan (IHostMemory), not a live engine
    return builder.build_serialized_network(network, config)

serialized_engine = build_engine("yolov8.onnx")
with open("yolov8.trt", "wb") as f:
    f.write(serialized_engine)  # this .trt file is loaded by the inference step below
```
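As an alternative to the manual steps above, recent `ultralytics` releases can export directly to a TensorRT engine in a single call. The sketch below uses the library's documented `export()` API; the `half=True` flag (FP16) is optional:
```python
from ultralytics import YOLO

# One-line export: ultralytics runs the ONNX -> TensorRT conversion internally
model = YOLO('path/to/model.pt')
model.export(format='engine', half=True)  # writes e.g. path/to/model.engine
```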
#### Running Inference
Once you have a TensorRT-optimized engine, you can run the forward pass. Below is a simplified version of the inference pipeline:
```python
import time

import numpy as np
import pycuda.autoinit  # noqa: F401  -- initializes the CUDA context
import pycuda.driver as cuda
import tensorrt as trt
from PIL import Image

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

def allocate_buffers(engine, context):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for i in range(engine.num_bindings):
        # Use the context shape so the dynamic batch dimension is already resolved
        shape = context.get_binding_shape(i)
        size = trt.volume(shape)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        # Allocate page-locked host memory and the matching device buffer
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(i):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

with open("yolov8.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
context.set_binding_shape(0, (1, 3, 640, 640))  # fix the dynamic batch dimension
inputs, outputs, bindings, stream = allocate_buffers(engine, context)

# Preprocess: resize, channel-first layout, scale to [0, 1]
image = Image.open("test_image.jpg").convert('RGB').resize((640, 640), resample=Image.BILINEAR)
np_img = (np.array(image).transpose([2, 0, 1]).astype(np.float32) / 255.0)[None, ...]
np.copyto(inputs[0].host, np_img.ravel())  # stage the input in page-locked memory

start_time = time.time()
# Copy input data to the GPU
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
# Run inference
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
# Copy results back to the host
[cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
stream.synchronize()
end_time = time.time()
print(f"Inference took {(end_time - start_time) * 1000:.2f} ms")

results = [out.host.copy().reshape(-1) for out in outputs]
```
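The raw output is a flat array whose decoding depends on the model head. Below is a minimal sketch that assumes the standard YOLOv8 detection head for a 640x640 input, i.e. a single output of shape `(1, 84, 8400)` with 4 box coordinates plus 80 COCO class scores per candidate; the `0.25` threshold is just an example value:
```python
preds = results[0].reshape(84, -1)  # (84, 8400): 4 box coords + 80 class scores (assumed layout)
boxes_xywh = preds[:4].T            # (8400, 4): center-x, center-y, width, height
class_scores = preds[4:].T          # (8400, 80)
class_ids = class_scores.argmax(axis=1)
confidences = class_scores.max(axis=1)
keep = confidences > 0.25           # example confidence threshold
# Apply non-maximum suppression (e.g. cv2.dnn.NMSBoxes) to boxes_xywh[keep],
# then rescale the surviving boxes from 640x640 back to the original image size.
```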