Object Detection: Spatial Pyramid Pooling and Its Variants


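Spatial Pyramid Pooling (SPP) and its variants are a family of deep-learning techniques used in object detection to improve a model's ability to recognize targets at different scales.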

1 SPP (Spatial Pyramid Pooling)

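SPP was introduced so that a neural network can accept inputs of arbitrary size: the feature map is pooled at several scales and the results are combined into a fixed-length feature representation, which also lets the network capture multi-scale information. The YOLO-style module listed in this section keeps the multi-scale idea but not the fixed-length output. It applies max pooling with several window sizes (5, 9 and 13 by default) at stride 1 with padding, so every pooled map keeps the spatial size of the input, and it then concatenates the input and the pooled maps along the channel dimension to form the final feature representation.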
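
The fixed-length property described above can be illustrated with a minimal sketch. It assumes PyTorch and uses `nn.AdaptiveMaxPool2d` as a stand-in for the bin computation in the original paper; the class name `ClassicSPP` and the pyramid levels `(1, 2, 4)` are hypothetical choices for illustration, not part of the YOLO code listed in this section.

```python
import torch
import torch.nn as nn


class ClassicSPP(nn.Module):
    """Hypothetical sketch of paper-style SPP: pool to fixed grids, then flatten."""

    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        # One adaptive max pool per pyramid level; level n yields an n x n grid
        # regardless of the input's height and width.
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(n) for n in levels])

    def forward(self, x):
        # Flatten each pooled map to (batch, channels * n * n) and concatenate.
        feats = [pool(x).flatten(start_dim=1) for pool in self.pools]
        return torch.cat(feats, dim=1)


spp = ClassicSPP()
small = spp(torch.randn(2, 16, 13, 17))  # 13 x 17 feature map
large = spp(torch.randn(2, 16, 40, 32))  # 40 x 32 feature map
print(small.shape, large.shape)  # both have 16 * (1 + 4 + 16) = 336 features per sample
```

Both inputs produce a vector of the same length, which is what allows a fully connected layer to follow regardless of the input resolution.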

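import warnings

import torch
import torch.nn as nn

# Conv below is YOLOv5's standard block (Conv2d + BatchNorm2d + SiLU) from models/common.py.
class SPP(nn.Module):
    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729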
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
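
To see why `cv2` expects `c_ * (len(k) + 1)` input channels, the pooling branches can be checked in isolation. This is a small sketch assuming PyTorch; the tensor `x` is a random stand-in for the output of `cv1`, and the channel count 8 and the 20 x 20 resolution are arbitrary.

```python
import torch
import torch.nn as nn

k = (5, 9, 13)
# Stride 1 with padding k // 2 keeps height and width unchanged for odd k.
pools = [nn.MaxPool2d(kernel_size=s, stride=1, padding=s // 2) for s in k]

x = torch.randn(1, 8, 20, 20)  # stand-in for cv1's output, with c_ = 8
branches = [x] + [m(x) for m in pools]
out = torch.cat(branches, dim=1)  # concatenate along the channel dimension

print([tuple(b.shape) for b in branches])  # every branch stays (1, 8, 20, 20)
print(tuple(out.shape))  # (1, 32, 20, 20), i.e. c_ * (len(k) + 1) channels
```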

2 SPPF (Spatial Pyramid Pooling - Fast)

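SPPF is an improved version of SPP proposed by Glenn Jocher, the author of YOLOv5, to reduce computation and speed up the model. Instead of running several pools with different kernel sizes in parallel, SPPF uses a single kernel size (5 by default) and applies the same pooling layer three times in series, concatenating the input with each intermediate result to fuse multi-scale features. With stride-1 max pooling, two chained 5×5 pools cover a 9×9 window and three cover a 13×13 window, so the result matches SPP(k=(5, 9, 13)) at lower cost.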

class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))
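
The `# equivalent to SPP(k=(5, 9, 13))` comment can be checked numerically. The following is a minimal sketch assuming PyTorch; the helper `pool` and the random input are illustrative only, and the check covers just the pooling stage, not the surrounding `Conv` layers.

```python
import torch
import torch.nn as nn


def pool(k):
    # Stride-1 max pool that preserves spatial size for odd k.
    return nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)


x = torch.randn(2, 4, 32, 32)
m5, m9, m13 = pool(5), pool(9), pool(13)

# SPPF style: reuse the 5x5 pool in series.
y1 = m5(x)
y2 = m5(y1)
y3 = m5(y2)

# SPP style: independent 9x9 and 13x13 pools on the same input.
same9 = torch.equal(y2, m9(x))
same13 = torch.equal(y3, m13(x))
print(same9, same13)  # True True
```

The match is exact because max pooling only selects existing values, and PyTorch pads max pooling with negative infinity, so the border positions agree as well.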

3 SimSPPF (Simplified SPPF)

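SimSPPF is a simplified version of SPPF proposed by Meituan in YOLOv6. Its main difference from SPPF lies in the activation function: the convolution blocks in SimSPPF use ReLU, whereas those in SPPF use SiLU.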
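
For reference, the two activations can be compared on a few sample values. This is a small sketch assuming PyTorch's `torch.nn.functional` API; the input values are arbitrary.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

relu_out = F.relu(x)  # max(0, x): negative inputs become exactly zero
silu_out = F.silu(x)  # x * sigmoid(x): smooth, slightly negative for x < 0

print(relu_out)
print(silu_out)
```

ReLU is a simple threshold, while SiLU needs a sigmoid per element, which is one reason a ReLU-based block can be cheaper to run.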

class SimConv(nn.Module):
    '''Normal Conv with ReLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bi