YOLO Data Processing
Date: 2025-04-27 20:24:53
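### YOLO Data Preprocessing and Post-processing Techniques
#### Data Preprocessing
For the YOLO family of algorithms, data preparation usually involves preprocessing both the images and their labels. Concretely, object-detection annotations made with a tool such as LabelImg are saved in a specific format (LabelImg writes Pascal VOC XML, YOLO txt, or CreateML JSON; COCO JSON is common with other tools), and this raw data must go through a series of transformations before it can be used for model training.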
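One of those transformations is converting box coordinates between annotation formats. The sketch below assumes Pascal VOC boxes given as pixel corners `(xmin, ymin, xmax, ymax)` and converts them to the YOLO txt convention of a center point plus width and height, all normalized by the image size. The function name `voc_to_yolo` is hypothetical and only serves this illustration.

```python
def voc_to_yolo(box, img_w, img_h):
    """Convert a Pascal VOC box (xmin, ymin, xmax, ymax) in pixels
    to a YOLO box (cx, cy, w, h) normalized to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2.0 / img_w   # normalized box center, x
    cy = (ymin + ymax) / 2.0 / img_h   # normalized box center, y
    w = (xmax - xmin) / img_w          # normalized box width
    h = (ymax - ymin) / img_h          # normalized box height
    return cx, cy, w, h


# Example: a 200 x 100 pixel box inside a 640 x 480 image
cx, cy, w, h = voc_to_yolo((100, 200, 300, 300), 640, 480)
print(f"0 {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")  # one label line for class id 0
```

Each line of a YOLO label file then holds a class id followed by these four normalized values.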
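- **Image augmentation**: To improve generalization, input images are randomly cropped, flipped, color-jittered, and so on to increase sample diversity[^2]. Geometric transforms such as cropping and flipping must be applied to the bounding boxes as well.
- **Resizing**: Because the network takes fixed-size input tensors, every training image must be scaled to a common size. This is typically done by interpolating the image so that its aspect ratio is preserved and then padding the remaining area to reach the target resolution (letterboxing).
- **Normalization**: Pixel values are usually mapped to the range 0 to 1, or another specified interval, before being fed to the convolutional layers, which speeds up convergence and stabilizes gradient propagation.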
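For the augmentation step, the labels have to follow the pixels. Below is a minimal sketch of a random horizontal flip, assuming labels in normalized YOLO form `(cls, cx, cy, w, h)`. The image is represented as a plain list of rows so the example has no dependencies, and the function name `hflip` is made up for this illustration.

```python
import random


def hflip(image, labels, p=0.5, rng=random):
    """Randomly flip an image left-right together with its YOLO labels.

    image  : list of rows, each row a list of pixels (stand-in for an H x W array)
    labels : list of (cls, cx, cy, w, h), coordinates normalized to [0, 1]
    p      : probability of applying the flip
    """
    if rng.random() >= p:
        return image, labels                      # leave the sample unchanged
    flipped_image = [row[::-1] for row in image]  # mirror every row
    # Only the x center changes; width, height and y center are unaffected
    flipped_labels = [(c, 1.0 - cx, cy, w, h) for c, cx, cy, w, h in labels]
    return flipped_image, flipped_labels


image = [[1, 2, 3],
         [4, 5, 6]]
labels = [(0, 0.25, 0.5, 0.2, 0.4)]
flipped_image, flipped_labels = hflip(image, labels, p=1.0)
print(flipped_image)   # [[3, 2, 1], [6, 5, 4]]
print(flipped_labels)  # [(0, 0.75, 0.5, 0.2, 0.4)]
```

With a real array the row reversal would usually be a single slicing operation, but the label update stays the same.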
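A minimal torchvision pipeline covering the resize and normalization steps:
```python
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((640, 640)),  # resize to a fixed size (stretches the image; aspect ratio is not preserved)
    transforms.ToTensor(),          # PIL image (H x W x C, uint8) -> float tensor (C x H x W) in [0.0, 1.0]
])

# torchvision transforms take the image as a positional argument and return the result directly
image = Image.open('path_to_image').convert('RGB')
image = transform(image)
```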
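Note that `transforms.Resize((640, 640))` stretches the image to the target size. The aspect-preserving resize with padding described in the list above needs a scale factor and padding offsets, and the boxes must be mapped with the same values. The sketch below only computes those numbers and assumes centered padding; the helper names `letterbox_params` and `box_to_letterbox` are hypothetical.

```python
def letterbox_params(src_w, src_h, dst_w=640, dst_h=640):
    """Return (scale, new_w, new_h, pad_left, pad_top) for an aspect-preserving resize."""
    scale = min(dst_w / src_w, dst_h / src_h)   # largest scale that still fits the canvas
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_left = (dst_w - new_w) // 2             # center the resized image horizontally
    pad_top = (dst_h - new_h) // 2              # center the resized image vertically
    return scale, new_w, new_h, pad_left, pad_top


def box_to_letterbox(box, scale, pad_left, pad_top):
    """Map a pixel box (x1, y1, x2, y2) from the source image onto the padded canvas."""
    x1, y1, x2, y2 = box
    return (x1 * scale + pad_left, y1 * scale + pad_top,
            x2 * scale + pad_left, y2 * scale + pad_top)


scale, new_w, new_h, pad_left, pad_top = letterbox_params(1280, 720)
print(scale, new_w, new_h, pad_left, pad_top)  # 0.5 640 360 0 140
print(box_to_letterbox((100, 100, 300, 200), scale, pad_left, pad_top))  # (50.0, 190.0, 150.0, 240.0)
```

The pixel work itself could then be done with, for example, `cv2.resize` to `(new_w, new_h)` followed by `cv2.copyMakeBorder` for the padding. After inference, predicted boxes are mapped back to the original image by subtracting the padding and dividing by `scale`.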
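#### Post-processing
Once inference has produced the predicted boxes, the crucial post-processing step follows: non-maximum suppression (NMS), which removes redundant, heavily overlapping bounding boxes and keeps only the best one for each object. This logic lives mainly in the `non_max_suppression` function in `yolov5/utils/general.py`[^1]:
- **Confidence filtering**: candidate boxes whose score falls below a threshold are discarded;
- **Class score computation**: each box's per-class score is its class probability multiplied by its objectness score (`conf = obj_conf * cls_conf`); YOLOv5 produces these outputs with sigmoid activations rather than a softmax;
- **IoU comparison and removal**: within each class, the remaining boxes are compared pairwise by intersection over union (IoU); when two boxes overlap by more than the given threshold, the lower-scoring one is removed, and this repeats until no conflicts remain.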
```python
import time

import torch
import torchvision
from torchvision.ops import box_iou  # torchvision's pairwise IoU, used here in place of the YOLOv5 helper of the same name


def xywh2xyxy(x):
    """Minimal stand-in for the YOLOv5 helper: (cx, cy, w, h) -> (x1, y1, x2, y2)."""
    y = x.clone()
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # x1
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # y1
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # x2
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # y2
    return y


def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=()):
    """Runs Non-Maximum Suppression (NMS) on inference results.

    Parameters:
        prediction: Tensor of shape (batch_size, num_boxes, 5 + num_classes),
                    each row laid out as (cx, cy, w, h, obj_conf, class scores)
        ...
    Returns:
        list of detections, one (n, 6) tensor per image [xyxy, conf, cls].
    """
    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Settings
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    max_det = 300  # maximum number of detections per image
    max_nms = 30000  # maximum number of boxes passed to torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    merge = False  # use merge-NMS
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)

    t = time.time()
    output = [torch.zeros(0, 6)] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        x = x[xc[xi]]  # confidence

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class (use the class column of x, which matches x after the filtering above)
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS: shifting boxes by class index * max_wh keeps different classes from overlapping
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):
            # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            print(f'WARNING: NMS time limit {time_limit}s exceeded')
            break  # time limit exceeded

    return output
```
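The suppression itself is delegated to `torchvision.ops.nms`, so the greedy loop is not visible in the function above. As a rough illustration of what that call does, here is a dependency-free sketch of greedy NMS together with the class-offset trick used for per-class suppression. It is a simplified stand-in written for this explanation, not the library's implementation, and the names `iou`, `greedy_nms` and `class_aware_nms` are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thres=0.45):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much
        order = [k for k in order if iou(boxes[best], boxes[k]) <= iou_thres]
    return keep


def class_aware_nms(boxes, scores, classes, iou_thres=0.45, max_wh=4096):
    """Per-class NMS in one pass: shift each box by class index * max_wh
    so that boxes of different classes can never overlap."""
    shifted = [(x1 + c * max_wh, y1 + c * max_wh, x2 + c * max_wh, y2 + c * max_wh)
               for (x1, y1, x2, y2), c in zip(boxes, classes)]
    return greedy_nms(shifted, scores, iou_thres)


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))                  # [0, 2]
print(class_aware_nms(boxes, scores, [0, 1, 0]))  # [0, 1, 2]
```

In the first call the second box overlaps the first with an IoU of about 0.68 and is suppressed. In the second call the two boxes belong to different classes, so the offset keeps them apart and both survive.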