YOLOv8 with TensorRT in Python
Posted: 2025-01-10 16:48:34
### Deploying and Optimizing a YOLOv8 Model with TensorRT
To deploy and optimize a YOLOv8 model with TensorRT in a Python environment, follow the workflow below. The steps cover the full path from the original model to a runnable inference engine, including the necessary environment setup.
#### Prerequisites
Make sure the required libraries and tools are installed, in particular `tensorrt` and its dependencies (plus `pycuda`, which the inference example below relies on). The official documentation has the latest installation instructions[^3].
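One possible setup is sketched below, assuming a CUDA-capable machine; exact package versions depend on your CUDA and TensorRT installation, and on older setups `tensorrt` ships as a wheel inside the TensorRT SDK rather than on PyPI:

```shell
# Python packages used in this article (versions are environment-dependent)
pip install ultralytics onnx pycuda
pip install tensorrt   # or install the wheel bundled with your TensorRT SDK

# Quick sanity check that the bindings import correctly
python -c "import tensorrt; print(tensorrt.__version__)"
```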
#### Converting the ONNX Model to a TensorRT Engine
First, export the trained YOLOv8 PyTorch model to ONNX. This is typically done with PyTorch's built-in exporter (note that YOLOv8 weights are loaded through the `ultralytics` package, not YOLOv5's `attempt_load`):

```python
import torch
from ultralytics import YOLO

model = YOLO('yolov8n.pt').model.eval()   # underlying nn.Module of the pretrained model
dummy_input = torch.randn(1, 3, 640, 640)  # dummy input tensor matching the training size
torch.onnx.export(model, dummy_input, "yolov8.onnx", export_params=True,
                  opset_version=12, do_constant_folding=True,
                  input_names=['input'], output_names=['output'])
# Alternatively, the one-liner YOLO('yolov8n.pt').export(format='onnx') does the same job.
```
Next, load the exported `.onnx` file with the TensorRT API and compile it into an efficient runtime engine. Several important parameters come into play here, such as the choice of precision mode[^2]. Note that the ONNX parser requires an explicit-batch network:
```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_file_path):
    # The ONNX parser only works with explicit-batch networks
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 30  # 1 GiB workspace (use set_memory_pool_limit on TensorRT >= 8.4)
        # config.set_flag(trt.BuilderFlag.FP16)  # optional: enable FP16 precision
        with open(onnx_file_path, 'rb') as model:
            if not parser.parse(model.read()):
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        # Returns the serialized engine, ready for Runtime.deserialize_cuda_engine
        return builder.build_serialized_network(network, config)
```
#### Running Inference
Once the engine is built, you can run actual inference. The example below shows how to create an execution context, allocate memory buffers, and feed an image through the network for prediction:
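Before an image can be copied into the input buffer, it must be converted to the layout the network expects. The `preprocess` helper below is a hypothetical, numpy-only sketch: it assumes the image is already a 640×640 RGB `uint8` array (letterbox resizing, e.g. via OpenCV, is omitted):

```python
import numpy as np

def preprocess(image):
    """Convert an HxWx3 uint8 RGB image into a (1, 3, 640, 640) float32 tensor.

    Hypothetical helper: assumes the image is already 640x640;
    letterbox resizing is intentionally left out of this sketch.
    """
    x = image.astype(np.float32) / 255.0   # scale pixel values to [0, 1]
    x = np.transpose(x, (2, 0, 1))         # HWC -> CHW
    return np.expand_dims(x, axis=0)       # add batch dimension

img_data = preprocess(np.zeros((640, 640, 3), dtype=np.uint8))
```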
```python
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # initializes the CUDA context

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

def allocate_buffers(engine):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        # With an explicit-batch network the binding shape already includes the batch dim
        size = trt.volume(engine.get_binding_shape(binding))
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate page-locked host memory and a matching device buffer
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer address to the bindings list
        bindings.append(int(device_mem))
        # Sort into the appropriate list
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

with trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(build_engine("yolov8.onnx"))
    context = engine.create_execution_context()
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    # Assume img_data already holds the preprocessed image as a numpy array
    np.copyto(inputs[0].host, img_data.ravel())
    # Host -> device copy, asynchronous inference, then device -> host copy
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    stream.synchronize()
    print(outputs[0].host.reshape(-1).tolist())
```
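The flattened output printed above still needs to be decoded into boxes and class scores and then filtered with non-maximum suppression. YOLOv8's exact output layout depends on how the model was exported, so the `nms` function below is only a generic, illustrative IoU-based NMS sketch in numpy, not the ultralytics implementation:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    """Generic non-maximum suppression on (N, 4) boxes in x1,y1,x2,y2 format.

    Illustrative helper, not YOLOv8's own post-processing.
    Returns indices of the kept boxes, highest score first.
    """
    order = np.argsort(scores)[::-1]  # candidate indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between box i and each remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]  # drop boxes overlapping box i too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```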