Stop the Alchemy! Give Your YOLO a "Global Brain" and Watch Its Performance Take Off

The YOLO-RD Module

Paper: "YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary"
Paper link: https://arxiv.org/abs/2410.15346
Venue: ICLR 2025
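Same name on every platform: 【大嘴带你水论文】. Detailed walkthrough videos are posted regularly on Bilibili.
The full code is given at the end of the article.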

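1. Purpose

YOLO-RD is a Retriever-Dictionary (RD) module aimed at one limitation of existing object detectors: they focus on the current input and ignore information from the dataset as a whole. The module builds a dictionary that captures dataset-level insights and lets the YOLO model retrieve features from it efficiently. The dictionary can be built from the knowledge of a vision model (VM), a large language model (LLM), or a vision-language model (VLM), which allows it to enhance tasks from the pixel level to the image level, including segmentation, detection, and classification. According to the paper's experiments, the RD module raises mean average precision on object detection by more than 3% while adding less than 1% to the parameter count. It also improves two-stage models such as Faster R-CNN and DETR-based architectures such as Deformable DETR.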

Figure 1. Overall architecture of the YOLO-RD Retriever-Dictionary module

2. Mechanism

Figure 2. Complete workflow of the RD module, showing its three core mechanisms

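1. Retriever Core:

The retriever core has two parts: a Coefficient Generator G and a Global Information Exchanger E. The coefficient generator G: ℝ^{f×W×H} → ℝ^{N×W×H} computes coarse coefficients from the input feature map X by applying a projection matrix W^G ∈ ℝ^{N×f} at every spatial position: Y_{w,h} = G(X)_{w,h} = W^G · X_{w,h}. The global information exchanger E: ℝ^{N×W×H} → ℝ^{N×W×H} uses depthwise convolution filters W^E ∈ ℝ^{N×1×k×k} to refine the coefficients and exchange information between neighboring pixels, one coefficient channel at a time: E(Y)^{(i)} = W^{E(i)} * Y^{(i)}, for i = 1, …, N.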
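
To make the shapes concrete, here is a minimal sketch of the two retriever-core operators using plain PyTorch layers. The sizes `f`, `N`, and `k` are illustrative assumptions, not values taken from the paper, and the batch-norm and activation used in the full listing are left out on purpose.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f, N, k = 64, 128, 5                 # feature channels, atoms, exchanger kernel size (illustrative)
X = torch.randn(2, f, 20, 20)        # (batch, f, H, W)

# G: the projection W^G in R^{N x f} applied at every pixel, i.e. a 1x1 convolution
G = nn.Conv2d(f, N, kernel_size=1, bias=False)

# E: one k x k filter per coefficient channel (depthwise), so W^E has shape (N, 1, k, k)
E = nn.Conv2d(N, N, kernel_size=k, padding=k // 2, groups=N, bias=False)

Y = G(X)                             # coarse coefficients, (2, N, 20, 20)
C = E(Y)                             # refined coefficients, same shape

# Cross-check: the 1x1 convolution equals the per-pixel matrix product W^G . X_{w,h}
W_G = G.weight.view(N, f)
Y_ref = torch.einsum('nf,bfhw->bnhw', W_G, X)
print(Y.shape, C.shape, torch.allclose(Y, Y_ref, atol=1e-4))
```

Setting `groups=N` is what makes E depthwise: each coefficient channel only mixes with its own spatial neighbors, which keeps the exchanger at N·k² weights.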

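2. Dictionary initialization and normalization:

The dictionary D consists of N atoms α, each a vector in ℝ^f. A chosen encoder maps the entire dataset into a high-dimensional space, and k-means clustering then selects representative vectors as the dictionary atoms. To keep the coefficient vector c from simply copying the input features, position normalization (PONO) is applied: PONO(X) = (X - μ_c)/√(σ_c + ε) · γ + β, where μ_c and σ_c are the mean and variance taken over the channel dimension at each spatial position, ε is a small constant for numerical stability, and γ and β are scale and shift parameters. Each atom is normalized to unit length during training: ‖α_i‖ = 1.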
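
The two ingredients above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's pipeline: `encoded` is random data standing in for encoder outputs over a dataset, `kmeans_atoms` is a hypothetical helper running a few plain Lloyd iterations, and the atoms are normalized once at initialization rather than through weight normalization during training.

```python
import torch
import torch.nn.functional as F

def pono(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Position normalization: standardize over channels at each (h, w)."""
    mu = x.mean(dim=1, keepdim=True)
    var = x.var(dim=1, keepdim=True, unbiased=False)
    return (x - mu) / torch.sqrt(var + eps) * gamma + beta

def kmeans_atoms(features, num_atoms, iters=10, seed=0):
    """Hypothetical helper: Lloyd's k-means over (M, f) vectors, returning unit-norm atoms."""
    gen = torch.Generator().manual_seed(seed)
    idx = torch.randperm(features.shape[0], generator=gen)[:num_atoms]
    centers = features[idx].clone()
    for _ in range(iters):
        assign = torch.cdist(features, centers).argmin(dim=1)  # nearest center per vector
        for j in range(num_atoms):
            members = features[assign == j]
            if len(members) > 0:        # keep the previous center if a cluster is empty
                centers[j] = members.mean(dim=0)
    return F.normalize(centers, dim=1)  # enforce ||alpha_i|| = 1

torch.manual_seed(0)
encoded = torch.randn(2000, 32)         # stand-in for encoder features of a dataset
D = kmeans_atoms(encoded, num_atoms=16)
coeff = pono(torch.randn(2, 16, 8, 8))
print(D.shape, D.norm(dim=1)[:3], coeff.mean(dim=1).abs().max())
```

Note that the `PONO` method in the listing at the end of the article divides by `std + 1e-5` and has no γ or β, so it differs slightly from the formula form sketched here.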

Figure 3. Conceptual diagram of the dictionary distilling knowledge from the entire dataset

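3. Feature fusion and residual connection:

The final output is a weighted sum of the dictionary atoms, combined with the input through a residual connection: Z_{h,w} = λ · X_{h,w} + (1-λ) · Σ_{i=1}^N c'_{i,h,w} · α_i, where λ is the residual weight and c' is the normalized coefficient. The whole RD process can be written as Z = RD(X) = λ · X + (1-λ) · PONO(E(G(X))) * WN(D), where WN(D) denotes the weight-normalized set of dictionary atoms.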
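
The fusion formula can be checked numerically with a short sketch. The tensors below are random stand-ins (assumptions for illustration only); the point is that the per-pixel sum over atoms is the same operation as a 1x1 convolution whose weight is the transposed dictionary, which is why a `Conv(atoms, in_channels, 1)` layer can play the role of D.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
f, N, lam = 32, 64, 0.2                       # illustrative sizes and residual weight
X = torch.randn(2, f, 8, 8)                   # input features
c = torch.randn(2, N, 8, 8)                   # stand-in for the normalized coefficients c'
D = F.normalize(torch.randn(N, f), dim=1)     # N unit-norm atoms in R^f

# Direct form: at each pixel, sum_i c'_{i,h,w} * alpha_i
recon = torch.einsum('bnhw,nf->bfhw', c, D)
Z = lam * X + (1 - lam) * recon

# Equivalent form: a 1x1 convolution with weight D^T reshaped to (f, N, 1, 1)
recon_conv = F.conv2d(c, D.t().reshape(f, N, 1, 1))
print(Z.shape, torch.allclose(recon, recon_conv, atol=1e-4))
```

In the `DConv` listing at the end of the article, `alpha` multiplies the dictionary branch and `(1 - alpha)` multiplies the input, so `alpha = 0.8` there corresponds to λ = 0.2 in the notation above.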

3. Code

The full code is available on GitCode: https://gitcode.com/2301_80107842/research

import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Optional

def auto_pad(kernel_size, padding=None, dilation=1):
    """Compute 'same'-style padding automatically, accounting for dilation."""
    if dilation > 1:
        kernel_size = dilation * (kernel_size - 1) + 1 if isinstance(kernel_size, int) else [dilation * (x - 1) + 1 for x in kernel_size]
    if padding is None:
        padding = kernel_size // 2 if isinstance(kernel_size, int) else [x // 2 for x in kernel_size]
    return padding

class Conv(nn.Module):
    """Basic convolution block: Conv2d -> BatchNorm2d -> optional SiLU."""
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=None, groups=1, activation=True):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                             auto_pad(kernel_size, padding), groups=groups, bias=False)
        self.bn = nn.BatchNorm2d(out_channels, eps=1e-3, momentum=3e-2)
        self.act = nn.SiLU() if activation else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class DConv(nn.Module):
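    """Dictionary convolution (RD) module."""
    def __init__(self, in_channels=512, alpha=0.8, atoms=512):
        super().__init__()
        # Weight on the dictionary branch; the input residual gets (1 - alpha)
        self.alpha = alpha

        # Coefficient Generator G: 1x1 conv projecting features to `atoms` coefficients
        self.CG = Conv(in_channels, atoms, 1)

        # Global Information Exchanger E: 5x5 depthwise conv over the coefficients
        self.GIE = Conv(atoms, atoms, 5, groups=atoms, activation=False)

        # Dictionary projection: 1x1 conv mapping coefficients back to feature channels
        self.D = Conv(atoms, in_channels, 1, activation=False)

    def PONO(self, x):
        """Position normalization over the channel dimension (no scale or shift here)."""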
        mean = x.mean(dim=1, keepdim=True)
        std = x.std(dim=1, keepdim=True)
        x = (x - mean) / (std + 1e-5)
        return x

    def forward(self, r):
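        """
        Forward pass.
        Args:
            r: input features (B, C, H, W)
        Returns:
            enhanced features (B, C, H, W)
        """
        # Step 1: coefficient generation G(X)
        x = self.CG(r)

        # Step 2: global information exchange E(G(X))
        x = self.GIE(x)

        # Step 3: position normalization PONO(E(G(X)))
        x = self.PONO(x)

        # Step 4: dictionary projection
        x = self.D(x)

        # Step 5: residual connection (alpha weights the dictionary branch)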
        return self.alpha * x + (1 - self.alpha) * r


class RepNCSPELAN(nn.Module):
    """Basic RepNCSPELAN block."""
    def __init__(self, in_channels, out_channels, csp_expand=0.5, repeat_num=1):
        super().__init__()
        neck_channels = int(out_channels * csp_expand)

        # Main-branch convolutions
        self.conv1 = Conv(in_channels, neck_channels, 1)
        self.conv2 = Conv(in_channels, neck_channels, 1)
        self.conv3 = Conv(2 * neck_channels, out_channels, 1)

        # Sequence of bottleneck blocks
        self.bottleneck = nn.Sequential(*[
            self._make_bottleneck(neck_channels) for _ in range(repeat_num)
        ])

    def _make_bottleneck(self, channels):
        """Create one bottleneck block (two 3x3 convolutions)."""
        return nn.Sequential(
            Conv(channels, channels, 3),
            Conv(channels, channels, 3)
        )

    def forward(self, x):
        """Forward pass."""
        x1 = self.bottleneck(self.conv1(x))
        x2 = self.conv2(x)
        return self.conv3(torch.cat((x1, x2), dim=1))


class RepNCSPELAND(RepNCSPELAN):
    """RepNCSPELAN with an RD module appended."""
    def __init__(self, in_channels, out_channels, atoms=512, alpha=0.8, **kwargs):
        super().__init__(in_channels, out_channels, **kwargs)
        # Add the DConv (RD) module
        self.dconv = DConv(in_channels=out_channels, atoms=atoms, alpha=alpha)

    def forward(self, x):
        """Forward pass."""
        # First run the base RepNCSPELAN
        x = super().forward(x)
        # Then enhance the result with the RD module
        return self.dconv(x)


class YOLORDBlock(nn.Module):
    """YOLO-RD enhanced block."""
    def __init__(self, in_channels, out_channels, atoms=512, alpha=0.8, use_rd=True):
        super().__init__()
        self.use_rd = use_rd

        # Base RepNCSPELAN block
        self.base_block = RepNCSPELAN(in_channels, out_channels)

        # RD module
        if use_rd:
            self.rd_module = DConv(in_channels=out_channels, atoms=atoms, alpha=alpha)

    def forward(self, x):
        """Forward pass."""
        # Base feature extraction
        x = self.base_block(x)

        # RD enhancement
        if self.use_rd:
            x = self.rd_module(x)

        return x


class ImplicitA(nn.Module):
    """Implicit knowledge module A (additive)."""
    def __init__(self, channel, mean=0.0, std=0.02):
        super().__init__()
        self.channel = channel
        self.mean = mean
        self.std = std

        self.implicit = nn.Parameter(torch.empty(1, channel, 1, 1))
        nn.init.normal_(self.implicit, mean=self.mean, std=self.std)

    def forward(self, x):
        return self.implicit + x


class ImplicitM(nn.Module):
    """Implicit knowledge module M (multiplicative)."""
    def __init__(self, channel, mean=1.0, std=0.02):
        super().__init__()
        self.channel = channel
        self.mean = mean
        self.std = std

        self.implicit = nn.Parameter(torch.empty(1, channel, 1, 1))
        nn.init.normal_(self.implicit, mean=self.mean, std=self.std)

    def forward(self, x):
        return self.implicit * x


# Test code
if __name__ == '__main__':
    # Create a test input: (batch size, channels, height, width)
    input_tensor = torch.rand(2, 256, 56, 56)

    # Create the YOLO-RD block
    model = YOLORDBlock(in_channels=256, out_channels=256, atoms=512, alpha=0.8, use_rd=True)

    # Forward pass
    output = model(input_tensor)

    # Print input/output shapes and model statistics
    print('Input size:', input_tensor.size())
    print('Output size:', output.size())
    print('Parameters:', sum(p.numel() for p in model.parameters()) / 1e6, 'M')
    # DConv does not store `atoms`, so read it from the coefficient generator's output channels
    print('Dictionary atoms:', model.rd_module.CG.conv.out_channels if model.use_rd else 'N/A')
    print('Residual weight alpha:', model.rd_module.alpha if model.use_rd else 'N/A')

    # Test RepNCSPELAND (full version)
    print('\n--- Testing RepNCSPELAND ---')
    model_full = RepNCSPELAND(in_channels=256, out_channels=256, atoms=512, alpha=0.8)
    output_full = model_full(input_tensor)
    print('RepNCSPELAND output size:', output_full.size())
    print('RepNCSPELAND parameters:', sum(p.numel() for p in model_full.parameters()) / 1e6, 'M')
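
As a sanity check on how much the RD module adds, the parameter count of a `DConv`-style stack can be worked out by hand and compared against PyTorch. The sketch below is standalone: `rd_param_count` and `build_rd_layers` are hypothetical helpers that mirror the layer layout of the listing above (bias-free convolutions, each followed by batch norm) rather than importing it.

```python
import torch.nn as nn

def rd_param_count(f, n_atoms, k=5):
    """Closed-form count: two 1x1 convs, one depthwise k x k conv, three batch norms."""
    convs = 2 * f * n_atoms + n_atoms * k * k
    norms = 2 * n_atoms + 2 * n_atoms + 2 * f   # BatchNorm weight and bias per channel
    return convs + norms

def build_rd_layers(f, n_atoms, k=5):
    """Plain-PyTorch stand-in for the CG -> GIE -> D stack."""
    return nn.Sequential(
        nn.Conv2d(f, n_atoms, 1, bias=False), nn.BatchNorm2d(n_atoms),
        nn.Conv2d(n_atoms, n_atoms, k, padding=k // 2, groups=n_atoms, bias=False),
        nn.BatchNorm2d(n_atoms),
        nn.Conv2d(n_atoms, f, 1, bias=False), nn.BatchNorm2d(f),
    )

layers = build_rd_layers(256, 512)
counted = sum(p.numel() for p in layers.parameters())
print(counted, rd_param_count(256, 512))   # both give 277504
```

With 256 channels and 512 atoms the module adds roughly 0.28M parameters, almost all of it in the two 1x1 projections; the depthwise exchanger contributes only 12,800 weights.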