🚀🚀🚀本文改进: 将restnet50和restnet101作为backbone引入到YOLOv8,下表为参数量和计算量的对比
layers | parameters | gradients | GFLOPs | |
yolov8m | 295 | 25856899 | 25856883 | 79.1 |
yolov8l | 365 | 43630611 | 43630595 | 165.4 |
yolov8x | 365 | 68229648 | 68229632 | 258.5 |
yolov8-resnet50 | 350 | 26077907 | 26077891 | 74.3 |
yolov8-resnet101 | 554 | 45070035 | 45070019 | 135.2 |
🚀🚀🚀YOLOv8改进专栏:http://t.csdnimg.cn/hGhVK
学姐带你学习YOLOv8,从入门到创新,轻轻松松搞定科研;
1.Resnet原理介绍
论文:https://arxiv.org/pdf/1512.03385.pdf
摘要:深度神经网络更难训练。我们提出了一个残差学习框架,以简化比以前使用的网络深度大得多的网络的训练。我们明确地将层重新表述为参考层输入的学习残差函数,而不是学习未参考的函数。我们提供了全面的经验证据,表明这些残差网络更容易优化,并且可以从相当大的深度中获得精度。在ImageNet数据集上,我们评估了深度高达152层的残差网络——比VGG网络深8倍,但仍然具有较低的复杂性。这些残差网络的集合在ImageNet测试集上的误差达到3.57%。该结果在ILSVRC 2015分类任务中获得第一名。我们还介绍了100层和1000层的CIFAR-10分析。
残差结构:整个模块除了正常的卷积层输出外,还有一个分支把输入直接连到输出上,该分支输出和卷积的输出做算术相加得到最终的输出,用公式表达就是H(x)=F(x)+x,其中x是输入,F(x)是卷积分支的输出,H(x)是整个结构的输出。可以证明如果F(x)分支中所有参数都是0,H(x)=x,即H(x)与x为恒等映射。
2.源码实现
2.1 ResNetLayer实现位置在ultralytics/nn/modules/block.py
温馨提醒:最新版本源码已经自带ResNetLayer
class ResNetLayer(nn.Module):
"""ResNet layer with multiple ResNet blocks."""
def __init__(self, c1, c2, s=1, is_first=False, n=1, e=4):
"""Initializes the ResNetLayer given arguments."""
super().__init__()
self.is_first = is_first
if self.is_first:
self.layer = nn.Sequential(Conv(c1, c2, k=7, s=2, p=3, act=True),
nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
else:
blocks = [ResNetBlock(c1, c2, s, e=e)]
blocks.extend([ResNetBlock(e * c2, c2, 1, e=e) for _ in range(n - 1)])
self.layer = nn.Sequential(*blocks)
def forward(self, x):
"""Forward pass through the ResNet layer."""
return self.layer(x)
2.2本文重点,如何将resnet50、resnet101作为backbone在YOLOv8中使用
2.2.1 yolov8-resnet50.yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, ResNetLayer, [3, 64, 1, True, 1]] # 0
- [-1, 1, ResNetLayer, [64, 64, 1, False, 3]] # 1
- [-1, 1, ResNetLayer, [256, 128, 2, False, 4]] # 2
- [-1, 1, ResNetLayer, [512, 256, 2, False, 6]] # 3
- [-1, 1, ResNetLayer, [1024, 512, 2, False, 3]] # 4
# YOLOv8.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 3], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f, [512]] # 7
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 2], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f, [256]] # 10 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 7], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f, [512]] # 13 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 4], 1, Concat, [1]] # cat head P5
- [-1, 3, C2f, [1024]] # 16 (P5/32-large)
- [[10, 13, 16], 1, Detect, [nc]] # Detect(P3, P4, P5)
2.2.2 yolov8-resnet101.yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, ResNetLayer, [3, 64, 1, True, 1]] # 0
- [-1, 1, ResNetLayer, [64, 64, 1, False, 3]] # 1
- [-1, 1, ResNetLayer, [256, 128, 2, False, 4]] # 2
- [-1, 1, ResNetLayer, [512, 256, 2, False, 23]] # 3
- [-1, 1, ResNetLayer, [1024, 512, 2, False, 3]] # 4
# YOLOv8.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 3], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f, [512]] # 7
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 2], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f, [256]] # 10 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 7], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f, [512]] # 13 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 4], 1, Concat, [1]] # cat head P5
- [-1, 3, C2f, [1024]] # 16 (P5/32-large)
- [[10, 13, 16], 1, Detect, [nc]] # Detect(P3, P4, P5)