YOLOv8 Optimization Strategy: Backbone Improvement | ResNet50 and ResNet101 Supported

This post shows how to use ResNet50 and ResNet101 as the backbone of YOLOv8. It walks through the idea behind ResNet and its code implementation, and presents the corresponding yolov8-resnet50.yaml and yolov8-resnet101.yaml configuration files. Experiments show that residual networks make deep models easier to optimize and improve accuracy while keeping complexity relatively low.


🚀🚀🚀 This post's improvement: ResNet50 and ResNet101 are introduced as the YOLOv8 backbone. The table below compares layer counts, parameters, and computational cost.

Model              layers  parameters  gradients  GFLOPs
yolov8m            295     25856899    25856883    79.1
yolov8l            365     43630611    43630595   165.4
yolov8x            365     68229648    68229632   258.5
yolov8-resnet50    350     26077907    26077891    74.3
yolov8-resnet101   554     45070035    45070019   135.2
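The figures above can be reproduced by building each config and reading the model summary; a minimal sketch, assuming the two yaml files from Section 2.2 are saved in the working directory (the stock yolov8m/l/x configs ship with the ultralytics package):

from ultralytics import YOLO

# Build each model from its yaml and print layers / parameters / gradients / GFLOPs.
for cfg in ("yolov8m.yaml", "yolov8l.yaml", "yolov8x.yaml",
            "yolov8-resnet50.yaml", "yolov8-resnet101.yaml"):
    YOLO(cfg).info()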

🚀🚀🚀 YOLOv8 improvements column: http://t.csdnimg.cn/hGhVK

Learn YOLOv8 from the basics all the way to your own innovations, and handle your research with ease.

1. How ResNet Works

Paper: https://arxiv.org/pdf/1512.03385.pdf

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still with lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won first place in the ILSVRC 2015 classification task. We also present analyses of CIFAR-10 with 100 and 1000 layers.

 

Residual structure: besides the normal convolutional output, the module has a shortcut branch that connects the input directly to the output; the shortcut and the convolutional output are summed element-wise to produce the final output. In formula form, H(x) = F(x) + x, where x is the input, F(x) is the output of the convolutional branch, and H(x) is the output of the whole block. It is easy to see that if all parameters in the F(x) branch are zero, then H(x) = x, i.e. the block reduces to an identity mapping.
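As a minimal PyTorch sketch of H(x) = F(x) + x (for illustration only, not the ultralytics implementation):

import torch
import torch.nn as nn

class SimpleResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x, where F is two 3x3 convolutions."""

    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # If every weight in self.f were zero, the block would reduce to the identity mapping H(x) = x.
        return torch.relu(self.f(x) + x)

x = torch.randn(1, 64, 80, 80)
print(SimpleResidualBlock(64)(x).shape)  # torch.Size([1, 64, 80, 80])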
 

2. Source Code Implementation

2.1 ResNetLayer is implemented in ultralytics/nn/modules/block.py

Friendly reminder: the latest version of the ultralytics source already includes ResNetLayer.
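If you are not sure whether your installed version already ships it, a quick import check (assuming a pip-installed ultralytics package) will tell you:

# Check whether the installed ultralytics package already provides ResNetLayer.
try:
    from ultralytics.nn.modules.block import ResNetLayer  # noqa: F401
    print("ResNetLayer is available, no source changes needed")
except ImportError:
    print("ResNetLayer not found: add the class below to ultralytics/nn/modules/block.py")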

# Excerpt from ultralytics/nn/modules/block.py; Conv and ResNetBlock are defined in the same module.
import torch.nn as nn


class ResNetLayer(nn.Module):
    """ResNet layer with multiple ResNet blocks."""

    def __init__(self, c1, c2, s=1, is_first=False, n=1, e=4):
        """Initializes the ResNetLayer given arguments."""
        super().__init__()
        self.is_first = is_first

        if self.is_first:
            self.layer = nn.Sequential(Conv(c1, c2, k=7, s=2, p=3, act=True),
                                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
        else:
            blocks = [ResNetBlock(c1, c2, s, e=e)]
            blocks.extend([ResNetBlock(e * c2, c2, 1, e=e) for _ in range(n - 1)])
            self.layer = nn.Sequential(*blocks)

    def forward(self, x):
        """Forward pass through the ResNet layer."""
        return self.layer(x)
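A quick way to sanity-check the layer is to push a dummy tensor through it; the sketch below assumes ResNetLayer is importable as in 2.1. With the default expansion e=4, a stage built with c2=64 outputs 256 channels, which is why the yaml files below feed 256/512/1024 into the later stages:

import torch
from ultralytics.nn.modules.block import ResNetLayer

stem  = ResNetLayer(3, 64, s=1, is_first=True)          # 7x7 conv (stride 2) + 3x3 max-pool (stride 2)
stage = ResNetLayer(64, 64, s=1, is_first=False, n=3)   # 3 bottleneck blocks, output channels = 4 * 64

x = torch.randn(1, 3, 640, 640)
y = stem(x)
print(y.shape)         # torch.Size([1, 64, 160, 160])
print(stage(y).shape)  # torch.Size([1, 256, 160, 160])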

2.2 The focus of this post: using ResNet50 and ResNet101 as the backbone in YOLOv8

2.2.1 yolov8-resnet50.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# ResNet50 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, ResNetLayer, [3, 64, 1, True, 1]]  # 0
  - [-1, 1, ResNetLayer, [64, 64, 1, False, 3]]  # 1
  - [-1, 1, ResNetLayer, [256, 128, 2, False, 4]]  # 2
  - [-1, 1, ResNetLayer, [512, 256, 2, False, 6]]  # 3
  - [-1, 1, ResNetLayer, [1024, 512, 2, False, 3]]  # 4

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 3], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 7

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 2], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 10 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 7], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 13 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 4], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 16 (P5/32-large)

  - [[10, 13, 16], 1, Detect, [nc]]  # Detect(P3, P4, P5)
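Once the file is saved (for example as yolov8-resnet50.yaml next to your training script), it can be loaded and trained with the standard ultralytics API; a minimal sketch, with coco128.yaml as a placeholder dataset:

from ultralytics import YOLO

model = YOLO("yolov8-resnet50.yaml")   # build the model from the config above
model.info()                           # verify layers / parameters / GFLOPs
model.train(data="coco128.yaml", epochs=100, imgsz=640)  # replace with your own dataset yaml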

2.2.2 yolov8-resnet101.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# ResNet101 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, ResNetLayer, [3, 64, 1, True, 1]]  # 0
  - [-1, 1, ResNetLayer, [64, 64, 1, False, 3]]  # 1
  - [-1, 1, ResNetLayer, [256, 128, 2, False, 4]]  # 2
  - [-1, 1, ResNetLayer, [512, 256, 2, False, 23]]  # 3
  - [-1, 1, ResNetLayer, [1024, 512, 2, False, 3]]  # 4

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 3], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 7

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 2], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 10 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 7], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 13 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 4], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 16 (P5/32-large)

  - [[10, 13, 16], 1, Detect, [nc]]  # Detect(P3, P4, P5)
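The only difference from yolov8-resnet50.yaml is the third backbone stage, which uses n=23 instead of n=6, matching the standard ResNet stage depths of (3, 4, 6, 3) for ResNet50 and (3, 4, 23, 3) for ResNet101. A quick check of where the names 50 and 101 come from (each bottleneck has 3 convolutions, plus the stem conv and the original classifier layer, which YOLOv8 does not use):

# Stage depths behind the two yaml backbones above.
for name, blocks in {"resnet50": (3, 4, 6, 3), "resnet101": (3, 4, 23, 3)}.items():
    depth = 1 + 3 * sum(blocks) + 1  # stem conv + 3 convs per bottleneck + classifier
    print(name, depth)               # resnet50 50, resnet101 101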