YOLOv8: Attention Mechanisms in the New Version
### Introduction to Attention Mechanisms in the New YOLOv8
YOLOv8 is a state-of-the-art real-time object-detection model, and its recent versions support a variety of attention mechanisms that strengthen feature extraction. Inserting a suitable attention module can noticeably improve recognition accuracy for targets in complex scenes.
#### SEAttention (Squeeze-and-Excitation Attention)
SEAttention is a channel-level attention mechanism: it reweights the feature maps produced by a convolutional layer so that informative channels are amplified and uninformative ones are suppressed. Concretely, it first compresses each channel to a single value via global average pooling, then passes the resulting vector through two fully connected layers to obtain per-channel scaling factors that are multiplied back onto the original input [^1].
```python
import torch.nn as nn


class SEAttention(nn.Module):
    """Squeeze-and-Excitation channel attention."""

    def __init__(self, channel=512, reduction=16):
        super().__init__()
        # Squeeze: global average pooling reduces each channel map to a scalar.
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: a bottleneck MLP maps the pooled vector to per-channel
        # scaling factors in (0, 1).
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)    # (B, C, H, W) -> (B, C)
        y = self.fc(y).view(b, c, 1, 1)    # per-channel weights
        return x * y.expand_as(x)          # rescale the input channels
```
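The SE block is shape-preserving, so it can be dropped in after any convolution whose output channel count matches the `channel` argument. Below is a minimal smoke test (the batch size, channel count, and resolution are arbitrary assumptions, not values YOLOv8 prescribes):

```python
import torch

# Hypothetical dummy feature map: batch of 2, 512 channels, 20x20 resolution.
se = SEAttention(channel=512, reduction=16)
x = torch.randn(2, 512, 20, 20)
y = se(x)
print(y.shape)  # torch.Size([2, 512, 20, 20]) -- channels rescaled, shape kept
```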
#### CBAM (Convolutional Block Attention Module)
CBAM attends over both dimensions: it models the importance distribution across spatial locations as well as the interactions between channels. The module consists of two parts, a channel attention module and a spatial attention module. The former captures cross-channel information exchange, while the latter identifies which local regions of the image matter most [^2].
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import FrozenBatchNorm2d  # BatchNorm with fixed statistics


def conv(in_planes, out_planes, kernel_size=3, stride=1, padding=1, dilation=1,
         freeze_bn=False):
    """Conv-BN-ReLU helper kept from the original listing (CBAM itself does
    not call it); freeze_bn=True swaps in a BatchNorm whose statistics and
    affine parameters stay fixed."""
    norm = FrozenBatchNorm2d(out_planes) if freeze_bn else nn.BatchNorm2d(out_planes)
    return nn.Sequential(
        nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
                  padding=padding, dilation=dilation, bias=True),
        norm,
        nn.ReLU(inplace=True))


class Flatten(nn.Module):
    """Flattens (B, C, 1, 1) pooled features to (B, C) for the gating MLP."""
    def forward(self, x):
        return x.view(x.size(0), -1)


class BasicConv(nn.Module):
    """Conv + BatchNorm with an optional ReLU, used by the spatial gate."""
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0,
                 relu=True):
        super().__init__()
        self.conv = nn.Conv2d(in_planes, out_planes, kernel_size, stride=stride,
                              padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_planes)
        self.relu = nn.ReLU(inplace=True) if relu else None

    def forward(self, x):
        x = self.bn(self.conv(x))
        return self.relu(x) if self.relu is not None else x


class ChannelPool(nn.Module):
    """Stacks channel-wise max and mean maps into a 2-channel descriptor."""
    def forward(self, x):
        return torch.cat((torch.max(x, 1)[0].unsqueeze(1),
                          torch.mean(x, 1).unsqueeze(1)), dim=1)


class ChannelGate(nn.Module):
    """Channel attention: a shared MLP over avg- and max-pooled descriptors."""
    def __init__(self, gate_channels, reduction_ratio=16, pool_types=('avg', 'max')):
        super().__init__()
        self.gate_channels = gate_channels
        self.mlp = nn.Sequential(
            Flatten(),
            nn.Linear(gate_channels, gate_channels // reduction_ratio),
            nn.ReLU(),
            nn.Linear(gate_channels // reduction_ratio, gate_channels)
        )
        self.pool_types = pool_types

    def forward(self, x):
        channel_att_sum = None
        for pool_type in self.pool_types:
            if pool_type == 'avg':
                pooled = F.avg_pool2d(x, (x.size(2), x.size(3)))
            elif pool_type == 'max':
                pooled = F.max_pool2d(x, (x.size(2), x.size(3)))
            channel_att_raw = self.mlp(pooled)
            channel_att_sum = (channel_att_raw if channel_att_sum is None
                               else channel_att_sum + channel_att_raw)
        # Sigmoid gate, broadcast back to (B, C, H, W).
        scale = torch.sigmoid(channel_att_sum).unsqueeze(2).unsqueeze(3).expand_as(x)
        return x * scale


class SpatialGate(nn.Module):
    """Spatial attention: a 7x7 convolution over the pooled channel descriptor."""
    def __init__(self):
        super().__init__()
        kernel_size = 7
        self.compress = ChannelPool()
        self.spatial = BasicConv(2, 1, kernel_size, stride=1,
                                 padding=(kernel_size - 1) // 2, relu=False)

    def forward(self, x):
        x_compress = self.compress(x)
        x_out = self.spatial(x_compress)
        scale = torch.sigmoid(x_out)  # broadcasts over the channel dimension
        return x * scale


class CBAM(nn.Module):
    """Applies channel attention first, then (optionally) spatial attention."""
    def __init__(self, gate_channels, reduction_ratio=16,
                 pool_types=('avg', 'max'), no_spatial=False):
        super().__init__()
        self.ChannelGate = ChannelGate(gate_channels, reduction_ratio, pool_types)
        self.no_spatial = no_spatial
        if not no_spatial:
            self.SpatialGate = SpatialGate()

    def forward(self, x):
        x_out = self.ChannelGate(x)
        if not self.no_spatial:
            x_out = self.SpatialGate(x_out)
        return x_out
```
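As with SE, a quick sanity check helps before wiring the module into a network. The sketch below uses arbitrary assumed input sizes; the one real constraint is that `gate_channels` must equal the channel count of the feature map the module is attached to:

```python
import torch

# Hypothetical dummy input: batch of 2, 256 channels, 40x40 resolution.
cbam = CBAM(gate_channels=256)
x = torch.randn(2, 256, 40, 40)
print(cbam(x).shape)  # torch.Size([2, 256, 40, 40]) -- shape is preserved

# no_spatial=True keeps only the channel gate, i.e. an SE-like two-pool variant.
cbam_channel_only = CBAM(gate_channels=256, no_spatial=True)
print(cbam_channel_only(x).shape)  # torch.Size([2, 256, 40, 40])
```

Because both modules preserve tensor shape, inserting them into a YOLOv8 backbone or neck comes down to registering the class and referencing it at the desired layer in the model configuration.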