[yolov5] Understanding the loss.py source code

永力

Last modified 2022-04-24 22:02:28

License: CC 4.0 BY-SA

Categories: deep learning, python. Tags: python, deep learning

First published 2022-04-23 23:41:32

Article link: https://blog.csdn.net/sinat_38685124/article/details/124217359

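This article walks through the build_targets function in YoloV5's loss.py, which expands the set of positive samples in two steps to ease the imbalance between positive and negative samples. First, each target is matched against the anchors and kept for every anchor whose shape is close enough; then, based on the offset of the target's center within its grid cell, 2 neighboring cells are selected as additional positive samples. The author reimplements the process in their own code to deepen their understanding of the YoloV5 detection algorithm.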

The build_targets function in yolov5's loss.py expands the positive samples in two places:

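First, because each output layer has 3 anchors, targets is replicated into 3 copies, one per anchor. Suppose there are 20 target boxes in total: the targets are expanded to shape [3, 20], giving 60 candidates. The first copy of 20 targets is tested against the first anchor, the second copy against the second anchor, and the third copy against the third anchor. Some candidates fail the test because the width ratio or height ratio between the target box and the anchor exceeds the threshold, so perhaps only 30 of the 60 candidates match, and the rest are filtered out.
In this example the original 20 positive samples grow to 30, so the step does expand the positives. If the threshold (anchor_t) is set too strictly, however, a large share of the target boxes may be filtered out instead.
Second, yolov5 allows for the center-point error that downsampling can introduce, so it uses each target's offset within its grid cell to pick 2 neighboring cells (2 of the 4-neighborhood) as additional positive samples. This can expand the positives to as much as 3 times the previous count, so the 30 targets from the first step grow to at most 90.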
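
The ratio test in the first step can be restated without tensors. The following is a minimal sketch in plain Python, not the yolov5 implementation: the helper names `matches_anchor` and `match_targets` are hypothetical, the anchor sizes are made-up values in grid units, and `anchor_t=4.0` is assumed here as the threshold.

```python
def matches_anchor(box_wh, anchor_wh, anchor_t=4.0):
    """True if neither side of the box differs from the anchor by a factor of anchor_t or more."""
    rw = box_wh[0] / anchor_wh[0]
    rh = box_wh[1] / anchor_wh[1]
    # Take the worst of the four ratios so that "too big" and "too small" are both caught.
    worst = max(rw, 1 / rw, rh, 1 / rh)
    return worst < anchor_t


def match_targets(boxes_wh, anchors_wh, anchor_t=4.0):
    """Pair every box with every anchor and keep the (box, anchor) pairs that pass."""
    return [
        (b, a)
        for a, anchor in enumerate(anchors_wh)
        for b, box in enumerate(boxes_wh)
        if matches_anchor(box, anchor, anchor_t)
    ]


# Illustrative anchors and boxes (width, height), all in grid units (assumed values).
anchors = [(1.25, 1.625), (2.0, 3.75), (4.125, 2.875)]
boxes = [(2.0, 3.0), (12.0, 1.0), (0.2, 0.2)]
pairs = match_targets(boxes, anchors)
```

With these assumed values, box 0 passes for all three anchors, box 1 passes only for the last anchor, and box 2 passes for none, so 3 boxes yield 4 positives. The same mechanism can grow or shrink the positive count depending on how strict the threshold is.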

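After these two operations, the initial 20 targets have grown to as many as 90 in this example, which eases the imbalance between positive and negative samples.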
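
The neighbor selection in the second step can be sketched for a single target center as follows. This is a simplified scalar version under stated assumptions: `positive_cells` is a hypothetical helper, `(gx, gy)` is the center in grid units on a grid of `w` columns and `h` rows, and `g=0.5` is the half-cell threshold.

```python
def positive_cells(gx, gy, w, h, g=0.5):
    """Return the (column, row) cells treated as positives for a center at (gx, gy)."""
    col, row = int(gx), int(gy)
    cells = [(col, row)]                 # the cell that contains the center
    if gx % 1 < g and gx > 1:            # center in the left half, not in the first column
        cells.append((col - 1, row))
    if gy % 1 < g and gy > 1:            # center in the upper half, not in the first row
        cells.append((col, row - 1))
    ix, iy = w - gx, h - gy              # same center measured from the bottom-right corner
    if ix % 1 < g and ix > 1:            # center in the right half, not in the last column
        cells.append((col + 1, row))
    if iy % 1 < g and iy > 1:            # center in the lower half, not in the last row
        cells.append((col, row + 1))
    return cells


inner = positive_cells(3.2, 5.7, 20, 20)   # left half and lower half of cell (3, 5)
corner = positive_cells(0.3, 0.3, 20, 20)  # top-left cell, no valid left or upper neighbor
```

A center normally gains one horizontal and one vertical neighbor, which is where the factor of 3 comes from. Centers in border cells can gain fewer, so the factor is an upper bound.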

The yolov5 code is hard to follow, and it took me quite a while to understand the idea behind it, so I reimplemented it in my own code:

    import torch


    # Written as a method of the loss class: it reads self.anchors and self.hyp.
    def build_targets(self, preds, targets):
        '''
        :param preds:       list(Tensor[b, 3, h, w, 85], ...), one tensor per output layer
        :param targets:     Tensor[N, 6], columns (img_idx, cls, x, y, w, h), with x, y, w, h normalized to [0, 1]
        :return:
            tcls            list(Tensor[N1], Tensor[N2], Tensor[N3])    class of each target, per output layer
            tbox            list(Tensor[N1, 4], Tensor[N2, 4], Tensor[N3, 4])    box of each target as (x offset, y offset, w, h) in grid units, per output layer
            indices         list(tuple(Tensor[N1], Tensor[N1], Tensor[N1], Tensor[N1]),
                                 tuple(Tensor[N2], Tensor[N2], Tensor[N2], Tensor[N2]),
                                 tuple(Tensor[N3], Tensor[N3], Tensor[N3], Tensor[N3]))    (b, a, gj, gi) of each target, per output layer
            anch            list(Tensor[N1, 2], Tensor[N2, 2], Tensor[N3, 2])    anchor matched to each target, per output layer
        '''
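        nt, na = targets.shape[0], self.anchors.shape[1]   # number of targets, anchors per layer
        tcls, tbox, indices, anch = [], [], [], []
        device = targets.device
        targets = targets.repeat(na, 1, 1)   # [na, N, 6], one copy of the targets per anchor
        # Build the anchor index in the dtype of targets so that the concatenation does not mix dtypes
        anchor_idx = torch.arange(na, device=device, dtype=targets.dtype).view(na, 1).repeat(1, nt)      # [na, N]
        targets = torch.cat((targets, anchor_idx[..., None]), 2)    # [na, N, 7], last column is the anchor index
        gain = torch.ones(7, device=device)   # scale factors from normalized coordinates to grid units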
        for i, pred in enumerate(preds):
            h, w = pred.shape[2], pred.shape[3]
            anchor = self.anchors[i]        # [3, 2], assumed to be expressed in grid units of this layer
            if nt:
                # Match each target against each anchor
                gain[2:6] = torch.tensor([w, h, w, h], device=device)   # [7]
                t_pixel = targets * gain    # targets scaled to grid-cell units [3, N, 7]
                ratio = t_pixel[..., 4:6] / anchor[:, None]   # [3, N, 2]/[3, 1, 2] = [3, N, 2]
                j = torch.max(ratio, 1/ratio).max(2)[0] < self.hyp['anchor_t']        # [3, N]
                t_pixel = t_pixel[j]    # [N1, 7]

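                # Add neighboring grid cells as extra positives for each target
                g = 0.5
                gxy = t_pixel[..., 2:4]      # [N1, 2], centers measured from the top-left corner
                gxy_t = torch.tensor([w, h], device=device) - gxy     # [N1, 2], centers measured from the bottom-right corner
                # Center lies in the left/upper half of its cell and not in the first column/row
                left, up = ((gxy % 1 < g) & (gxy > 1)).T
                # Center lies in the right/lower half of its cell and not in the last column/row
                right, down = ((gxy_t % 1 < g) & (gxy_t > 1)).T

                t = torch.cat((t_pixel, t_pixel[left], t_pixel[up], t_pixel[right], t_pixel[down]), dim=0)    # [up to 3*N1, 7]

                t_left = t_pixel[left][..., 2:4] + torch.tensor([-1, 0], device=device)    # [n_left, 2]
                t_right = t_pixel[right][..., 2:4] + torch.tensor([1, 0], device=device)   # [n_right, 2]
                t_up = t_pixel[up][..., 2:4] + torch.tensor([0, -1], device=device)        # [n_up, 2]
                t_down = t_pixel[down][..., 2:4] + torch.tensor([0, 1], device=device)     # [n_down, 2]

                # Same order as t: center, left, up, right, down
                tij = torch.cat((gxy, t_left, t_up, t_right, t_down), dim=0).long()     # [up to 3*N1, 2]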
            else:
                t = targets[0]
                tij = t[:, 2:4].long()

            ai = t[:, 6].long()            # [up to 3*N1], anchor index of each positive
            gwh = t[:, 4:6]
            gxy_offset = t[:, 2:4] - tij   # center offset relative to the assigned cell

            tcls.append(t[:, 1].long())
            tbox.append(torch.cat((gxy_offset, gwh), 1))
            anch.append(anchor[ai])
            indices.append((t[:, 0].long(), ai, tij[:, 1].clamp(0, h-1), tij[:, 0].clamp(0, w-1)))   # (b, a, gj, gi)
        return tcls, tbox, indices, anch
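
One consequence of assigning a target to neighboring cells is that the stored center offset (`gxy_offset` above) is no longer confined to [0, 1). The sketch below works through one center by hand. The helper names are hypothetical, and the decode `2 * sigmoid(x) - 0.5` is my understanding of how yolov5 maps the predicted xy logits, so treat it as an assumption rather than a quotation of the source.

```python
import math


def offsets_to_cells(gx, gy, cells):
    """Offset of the center (gx, gy) from the top-left corner of each assigned cell."""
    return [(gx - col, gy - row) for col, row in cells]


def decode_xy(logit):
    """Assumed xy decode: 2 * sigmoid(logit) - 0.5, which covers the interval (-0.5, 1.5)."""
    return 2.0 / (1.0 + math.exp(-logit)) - 0.5


# Center (3.2, 5.7) assigned to its own cell, the left cell and the lower cell.
offsets = offsets_to_cells(3.2, 5.7, [(3, 5), (2, 5), (3, 6)])
# roughly (0.2, 0.7), (1.2, 0.7), (0.2, -0.3)
```

The left neighbor sees an x offset above 1 and the lower neighbor sees a negative y offset, while every value stays inside (-0.5, 1.5). A decode limited to (0, 1) could not regress these targets, which is why the wider output range matters once neighboring cells are added.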