CAM, SAM, and CBAM Attention Papers
### Overview of CAM, SAM, and CBAM
Attention mechanisms are widely used in computer vision to improve model performance. Among them, the Channel Attention Module (CAM)[^1], the Spatial Attention Module (SAM)[^2], and the Convolutional Block Attention Module (CBAM)[^3] are three important attention modules.
#### Channel Attention Module (CAM)
CAM focuses on the importance of features along the channel dimension. Using global average pooling (GAP) and global max pooling (GMP), it captures dependencies between channels and produces a per-channel weight vector[^1]. These weights are then used to recalibrate the response of each channel in the input feature map.
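In the CBAM formulation, the channel attention map is obtained by passing both pooled descriptors through a shared MLP and summing the results before a sigmoid:

$$
M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)
$$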
#### Spatial Attention Module (SAM)
In contrast, SAM focuses on where the informative regions lie in the spatial dimensions. It pools the feature map along the channel axis, taking the per-position maximum and mean, concatenates the two resulting maps, and fuses them with a convolution to produce a 2D spatial attention map[^2]. This map strengthens responses inside target regions while suppressing background noise.
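In CBAM, this spatial attention map is computed by applying a $7\times 7$ convolution $f^{7\times 7}$ to the concatenation of the channel-wise average- and max-pooled maps:

$$
M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F);\, \mathrm{MaxPool}(F)])\big)
$$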
#### Convolutional Block Attention Module (CBAM)
CBAM combines the two ideas above, applying channel attention first and then spatial attention in sequence[^3]. Both gates are lightweight, so the extra computation is small, and this sequential two-stage design yields noticeably better accuracy than using either sub-module on its own.
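For an input feature map $F$, the overall refinement is therefore:

$$
F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F'
$$

where $\otimes$ denotes element-wise multiplication (with broadcasting of the attention maps).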
The following is an example PyTorch implementation of CBAM:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicConv(nn.Module):
    """Convolution followed by optional BatchNorm and ReLU."""
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0,
                 dilation=1, groups=1, relu=True, bn=True, bias=False):
        super(BasicConv, self).__init__()
        self.out_channels = out_planes
        self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
                              padding=padding, dilation=dilation, groups=groups, bias=bias)
        self.bn = nn.BatchNorm2d(out_planes, eps=1e-5, momentum=0.01, affine=True) if bn else None
        self.relu = nn.ReLU() if relu else None

    def forward(self, x):
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x


class ChannelGate(nn.Module):
    """Channel attention: a shared MLP over global average- and max-pooled descriptors."""
    def __init__(self, gate_channels, reduction_ratio=16, pool_types=['avg', 'max']):
        super(ChannelGate, self).__init__()
        self.gate_channels = gate_channels
        self.mlp = nn.Sequential(
            nn.Linear(gate_channels, gate_channels // reduction_ratio),
            nn.ReLU(),
            nn.Linear(gate_channels // reduction_ratio, gate_channels)
        )
        self.pool_types = pool_types

    def forward(self, x):
        channel_att_sum = None
        for pool_type in self.pool_types:
            if pool_type == 'avg':
                avg_pool = F.avg_pool2d(x, (x.size(2), x.size(3)), stride=(x.size(2), x.size(3)))
                channel_att_raw = self.mlp(avg_pool.view(avg_pool.size(0), -1))
            elif pool_type == 'max':
                max_pool = F.max_pool2d(x, (x.size(2), x.size(3)), stride=(x.size(2), x.size(3)))
                channel_att_raw = self.mlp(max_pool.view(max_pool.size(0), -1))
            if channel_att_sum is None:
                channel_att_sum = channel_att_raw
            else:
                channel_att_sum = channel_att_sum + channel_att_raw
        # Sigmoid gives per-channel weights, broadcast back to the input shape
        scale = torch.sigmoid(channel_att_sum).unsqueeze(2).unsqueeze(3).expand_as(x)
        return x * scale


def logsumexp_2d(tensor):
    # Numerically stable log-sum-exp over the spatial dimensions
    # (only needed if an 'lse' pooling type is added to ChannelGate)
    tensor_flatten = tensor.view(tensor.size(0), tensor.size(1), -1)
    s, _ = torch.max(tensor_flatten, dim=2, keepdim=True)
    outputs = s + (tensor_flatten - s).exp().sum(dim=2, keepdim=True).log()
    return outputs


class ChannelPool(nn.Module):
    """Stack the channel-wise max and mean maps into a 2-channel tensor."""
    def forward(self, x):
        return torch.cat((torch.max(x, 1)[0].unsqueeze(1), torch.mean(x, 1).unsqueeze(1)), dim=1)


class SpatialGate(nn.Module):
    """Spatial attention: a 7x7 convolution over the pooled 2-channel map."""
    def __init__(self):
        super(SpatialGate, self).__init__()
        kernel_size = 7
        self.compress = ChannelPool()
        self.spatial = BasicConv(2, 1, kernel_size, stride=1, padding=(kernel_size - 1) // 2, relu=False)

    def forward(self, x):
        x_compress = self.compress(x)
        x_out = self.spatial(x_compress)
        scale = torch.sigmoid(x_out)
        return x * scale


class CBAM(nn.Module):
    """CBAM: channel attention followed (optionally) by spatial attention."""
    def __init__(self, gate_channels, reduction_ratio=16, pool_types=['avg', 'max'], no_spatial=False):
        super(CBAM, self).__init__()
        self.ChannelGate = ChannelGate(gate_channels, reduction_ratio, pool_types)
        self.no_spatial = no_spatial
        if not no_spatial:
            self.SpatialGate = SpatialGate()

    def forward(self, x):
        x_out = self.ChannelGate(x)
        if not self.no_spatial:
            x_out = self.SpatialGate(x_out)
        return x_out
```
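As a quick sanity check, the module can be applied to an arbitrary feature map; the tensor shape below (batch of 2, 64 channels, 32×32) is just an illustrative assumption:

```python
if __name__ == "__main__":
    # Hypothetical feature map: batch of 2, 64 channels, 32x32 spatial resolution
    feat = torch.randn(2, 64, 32, 32)
    cbam = CBAM(gate_channels=64, reduction_ratio=16)
    refined = cbam(feat)
    print(refined.shape)  # torch.Size([2, 64, 32, 32]) -- same shape, reweighted values
```

Because CBAM preserves the shape of its input, it is typically inserted after a convolutional block, for example at the end of each residual block in a ResNet.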