### Using YOLOv8 with TensorRT for Model Inference and Deployment
To run YOLOv8 inference efficiently with TensorRT in Python, you typically need to complete a few main steps: install the required packages, convert the model, and write the corresponding inference code.
#### Installing Dependencies
First, make sure the required libraries and tools are installed:
- YOLOv8 is installed via the `ultralytics` package:
```bash
pip install ultralytics
```
- TensorRT and its Python bindings are distributed through NVIDIA's official channels. For version compatibility, consult the [NVIDIA documentation](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) and install the TensorRT build that matches your operating system and CUDA version; a quick import check follows this list.
- If you plan to use ONNX as an intermediate representation, also install `onnx` and `onnxruntime-gpu`:
```bash
pip install onnx onnxruntime-gpu
```
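After installation, a quick import check (a minimal sketch; it only assumes the `tensorrt` and `pycuda` packages are importable in the current environment) confirms the bindings are visible to Python:
```python
import tensorrt
import pycuda.driver  # noqa: F401  -- imported only to verify the binding loads

print("TensorRT version:", tensorrt.__version__)
```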
#### Converting the YOLOv8 Model to a TensorRT-Compatible Format
YOLOv8 models are saved as PyTorch weight files (`.pt`) by default. To accelerate them with TensorRT, the model is first exported to ONNX and then compiled into a TensorRT engine. The manual route is shown in the two steps below; a one-line alternative using the built-in `ultralytics` exporter appears after these steps.
1. **Export the YOLOv8 model to ONNX**
Load the weights through the `ultralytics` API and pass the underlying PyTorch module to `torch.onnx.export()`. This assumes an existing pretrained weight file named `model.pt`.
```python
import torch
from ultralytics import YOLO

# Load the YOLOv8 weights and unwrap the underlying PyTorch module.
# (torch.hub.load targets the YOLOv5 repo layout and does not work for YOLOv8.)
model = YOLO('path/to/model.pt').model
model.eval().cuda()
dummy_input = torch.randn(1, 3, 640, 640).cuda()  # input shape should match your deployment needs
output_onnx = "yolov8.onnx"
input_names = ["input_0"]
output_names = ["output_0"]
dynamic_axes = {'input_0': {0: 'batch_size'}, 'output_0': {0: 'batch_size'}}
torch.onnx.export(
model,
dummy_input,
output_onnx,
export_params=True,
opset_version=11,
do_constant_folding=True,
input_names=input_names,
output_names=output_names,
dynamic_axes=dynamic_axes
)
```
2. **Build the TensorRT-optimized engine**
Next, use the TensorRT API to read the generated `.onnx` file and build a TRT engine. The exact calls differ between TensorRT versions; the example below uses the TensorRT 8.x Python interface.
```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_file_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_file_path, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
    config = builder.create_builder_config()
    # Optimization profile: only the batch dimension was exported as dynamic,
    # so height and width must stay at the fixed 640x640 used during export.
    profile = builder.create_optimization_profile()
    min_shape = (1, 3, 640, 640)
    opt_shape = (1, 3, 640, 640)
    max_shape = (8, 3, 640, 640)
    profile.set_shape("input_0", min=min_shape, opt=opt_shape, max=max_shape)
    config.add_optimization_profile(profile)
    # build_serialized_network returns a serialized plan (IHostMemory), not a live engine
    return builder.build_serialized_network(network, config)

serialized_engine = build_engine("yolov8.onnx")
with open("yolov8.trt", "wb") as f:
    f.write(serialized_engine)  # this .trt file is loaded by the inference step below
```
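As an alternative to the manual steps above, recent `ultralytics` releases can export directly to a TensorRT engine in a single call. The sketch below uses the library's documented `export()` API; the `half=True` flag (FP16) is optional:
```python
from ultralytics import YOLO

# One-line export: ultralytics runs the ONNX -> TensorRT conversion internally
model = YOLO('path/to/model.pt')
model.export(format='engine', half=True)  # writes e.g. path/to/model.engine
```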
#### Running Inference
Once you have a TensorRT-optimized engine, you can run the forward pass. Below is a simplified version of the inference pipeline:
```python
import time

import numpy as np
import pycuda.autoinit  # noqa: F401  -- initializes the CUDA context
import pycuda.driver as cuda
import tensorrt as trt
from PIL import Image

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

def allocate_buffers(engine, context):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for i in range(engine.num_bindings):
        # Use the context shape so the dynamic batch dimension is already resolved
        shape = context.get_binding_shape(i)
        size = trt.volume(shape)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        # Allocate page-locked host memory and the matching device buffer
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(i):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

with open("yolov8.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
context.set_binding_shape(0, (1, 3, 640, 640))  # fix the dynamic batch dimension
inputs, outputs, bindings, stream = allocate_buffers(engine, context)

# Preprocess: resize, channel-first layout, scale to [0, 1]
image = Image.open("test_image.jpg").convert('RGB').resize((640, 640), resample=Image.BILINEAR)
np_img = (np.array(image).transpose([2, 0, 1]).astype(np.float32) / 255.0)[None, ...]
np.copyto(inputs[0].host, np_img.ravel())  # stage the input in page-locked memory

start_time = time.time()
# Copy input data to the GPU
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
# Run inference
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
# Copy results back to the host
[cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
stream.synchronize()
end_time = time.time()
print(f"Inference took {(end_time - start_time) * 1000:.2f} ms")

results = [out.host.copy().reshape(-1) for out in outputs]
```
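The raw output is a flat array whose decoding depends on the model head. Below is a minimal sketch that assumes the standard YOLOv8 detection head for a 640x640 input, i.e. a single output of shape `(1, 84, 8400)` with 4 box coordinates plus 80 COCO class scores per candidate; the `0.25` threshold is just an example value:
```python
preds = results[0].reshape(84, -1)  # (84, 8400): 4 box coords + 80 class scores (assumed layout)
boxes_xywh = preds[:4].T            # (8400, 4): center-x, center-y, width, height
class_scores = preds[4:].T          # (8400, 80)
class_ids = class_scores.argmax(axis=1)
confidences = class_scores.max(axis=1)
keep = confidences > 0.25           # example confidence threshold
# Apply non-maximum suppression (e.g. cv2.dnn.NMSBoxes) to boxes_xywh[keep],
# then rescale the surviving boxes from 640x640 back to the original image size.
```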