Faster R-CNN Source Code Reading: RPN (Region Proposal Network)

Article link: https://blog.csdn.net/luodezhao123/article/details/86508370

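This article walks through the Region Proposal Network (RPN) in Faster R-CNN, from generate_anchors.py to the complete rpn flow. It covers what anchor generation, bbox_transform, anchor_target_layer, proposal_layer and proposal_target_layer do and how they are implemented, describes the inputs and outputs of each layer, and explains how the final RoIs and the loss functions are produced.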


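The code for this layer lives under lib/model/rpn.

The Region Proposal Network is mainly used to generate region proposals.

I will go through the inputs and outputs of each file and of each layer in order to trace how this layer is actually implemented.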

1. generate_anchors.py

Anchors
Anchors are simply rectangular regions, generated by generate_anchors.py.
Running the script directly

python generate_anchors.py

produces the following data

[[ -84. -40. 99. 55.]
[-176. -88. 191. 103.]
[-360. -184. 375. 199.]
[ -56. -56. 71. 71.]
[-120. -120. 135. 135.]
[-248. -248. 263. 263.]
[ -36. -80. 51. 95.]
[ -80. -168. 95. 183.]
[-168. -344. 183. 359.]]

These are the coordinates of the 9 anchors, each given as the top-left x, y and the bottom-right x, y.
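To make the table easier to read, here is a small sketch that decodes each row into width, height, aspect ratio and center. It only assumes the inclusive-pixel (+1) convention that _whctrs uses further down; the helper names width_height and center are mine, not from the repository.

```python
# The nine anchors printed above, as (x1, y1, x2, y2).
ANCHORS = [
    (-84, -40, 99, 55), (-176, -88, 191, 103), (-360, -184, 375, 199),
    (-56, -56, 71, 71), (-120, -120, 135, 135), (-248, -248, 263, 263),
    (-36, -80, 51, 95), (-80, -168, 95, 183), (-168, -344, 183, 359),
]


def width_height(box):
    """Width and height, counting both end pixels (same +1 rule as _whctrs)."""
    x1, y1, x2, y2 = box
    return x2 - x1 + 1, y2 - y1 + 1


def center(box):
    """Center of the box."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2


for box in ANCHORS:
    w, h = width_height(box)
    print(box, "w=%d h=%d h/w=%.2f center=%s" % (w, h, h / w, center(box)))
```

Every row shares the center (7.5, 7.5), which is the center of the base anchor (0, 0, 15, 15); only the shape and the size change.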

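Let's go straight to the code.

Here base_size is 16 and is used to build a base anchor.
ratios holds the three aspect ratios (height / width) of the anchors.
scales enlarges the base anchor by factors of 8, 16 and 32.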
import numpy as np

def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    # The code uses two box representations: (width, height, center x, center y)
    # and (top-left x, y, bottom-right x, y).
    # base anchor: (0, 0, 15, 15)
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    # enumerate the aspect ratios of base_anchor (gives three anchors)
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    # scale each of the three ratio anchors by (8, 16, 32), giving 9 anchors,
    # each stored as top-left x, y and bottom-right x, y
    # (range is used here in place of the Python 2 xrange)
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])
    return anchors
# return the width, height and center coordinates of an anchor
def _whctrs(anchor):
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr
# given a center and vectors of widths and heights, return the anchors as (top-left, bottom-right) coordinates
def _mkanchors(ws, hs, x_ctr, y_ctr):

    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors
# compute the anchor coordinates for each aspect ratio
def _ratio_enum(anchor, ratios):
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h
    # size / ratio, so that w = sqrt(size / ratio) and h = w * ratio keep the area roughly constant
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
# scale one anchor by every value in scales
def _scale_enum(anchor, scales):
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

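That completes the reading of generate_anchors.py.

To summarize:
1) Build a base anchor (0, 0, 15, 15).
2) From it, generate anchors with three aspect ratios (0.5, 1, 2). There are now three anchors.
3) Scale each of the three anchors by 8, 16 and 32.
With base_size 16, the largest scale gives a side of 16*32 = 512 pixels, as in the square anchor (-248, -248, 263, 263).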
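The three steps of the summary can be condensed into one loop. This is a minimal sketch in Python 3 and NumPy, not the repository code; make_anchors is a name I made up, and the rounding follows what _ratio_enum does in the listing above.

```python
import numpy as np


def make_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Rebuild the 9 anchors: base box -> 3 aspect ratios -> 3 scales each."""
    ctr = 0.5 * (base_size - 1)          # center of (0, 0, 15, 15) is 7.5
    area = float(base_size * base_size)  # 256 for the base anchor
    rows = []
    for r in ratios:
        # step 2: same area, different shape (r is height / width)
        w = np.round(np.sqrt(area / r))
        h = np.round(w * r)
        for s in scales:
            # step 3: enlarge around the same center
            ws, hs = w * s, h * s
            rows.append([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                         ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)])
    return np.array(rows)


print(make_anchors())
```

With the default arguments the result should match the table printed by generate_anchors.py row by row.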

2. bbox_transform.py

This file mainly handles coordinate conversions and computes data that the other layers need. Two of its functions are listed below.

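    # Compute the IoU between anchors and ground-truth boxes.
    # anchors: N boxes, each a 4-vector
    # gt_boxes: K boxes, each a 4-vector
    # overlaps: N x K matrix holding the IoU of every anchor with every box
    def bbox_overlaps(anchors, gt_boxes):
        ...  # body omitted in this post

    # Express each box by its center x, y, width and height,
    # then compute the offsets (dx, dy, dw, dh) from each anchor to its matched box.
    def bbox_transform(ex_rois, gt_rois):
        ...  # body omitted in this post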
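Since only the signatures are listed, here is a rough NumPy sketch of what the two functions compute. It is my own simplification under stated assumptions, not the repository code: boxes are (x1, y1, x2, y2) with the +1 pixel convention, and the offsets use the usual Faster R-CNN parameterization (center shift divided by anchor size, log of the size ratio). The names iou_matrix and box_deltas are hypothetical.

```python
import numpy as np


def iou_matrix(anchors, gt_boxes):
    """IoU of N anchors against K gt boxes, returned with shape (N, K)."""
    a = anchors[:, None, :]     # (N, 1, 4)
    g = gt_boxes[None, :, :4]   # (1, K, 4); a 5th class column is ignored
    iw = np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]) + 1
    ih = np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]) + 1
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    area_a = (a[..., 2] - a[..., 0] + 1) * (a[..., 3] - a[..., 1] + 1)
    area_g = (g[..., 2] - g[..., 0] + 1) * (g[..., 3] - g[..., 1] + 1)
    return inter / (area_a + area_g - inter)


def box_deltas(ex_rois, gt_rois):
    """Regression targets (dx, dy, dw, dh) from each ex_roi to its gt_roi."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h
    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h
    dx = (gt_cx - ex_cx) / ex_w
    dy = (gt_cy - ex_cy) / ex_h
    dw = np.log(gt_w / ex_w)
    dh = np.log(gt_h / ex_h)
    return np.stack([dx, dy, dw, dh], axis=1)


anchors = np.array([[0.0, 0.0, 9.0, 9.0]])
gts = np.array([[0.0, 0.0, 9.0, 9.0], [5.0, 0.0, 14.0, 9.0], [20.0, 20.0, 29.0, 29.0]])
print(iou_matrix(anchors, gts))
print(box_deltas(anchors, gts[1:2]))
```

The first print gives one row of IoUs (identical box, half-overlapping box, disjoint box); the second shows that a box shifted right by half the anchor width gets dx = 0.5 and zeros elsewhere.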

3. anchor_target_layer.py

class _AnchorTargetLayer(nn.Module):
    """
        Assign anchors to ground-truth targets. Produces anchor classification
        labels and bounding-box regression targets.
    """
    def __init__(self, feat_stride, scales, ratios):
        super(_AnchorTargetLayer, self).__init__()
        # _feat_stride: downsampling factor of the feature map relative to the input image
        self._feat_stride = feat_stride
        self._scales = scales
        # scales used to generate the anchors
        anchor_scales = scales
        # build the base anchors (9 of them with 3 scales x 3 ratios)
        self._anchors = torch.from_numpy(generate_anchors(scales=np.array(anchor_scales), ratios=np.array(ratios))).float()
        # number of base anchors (9 here)
        self._num_anchors = self._anchors.size(0)

        # allow boxes to sit over the edge by a small amount
        # i.e. how far an anchor may extend beyond the image boundary
        self._allowed_border = 0  # default is 0

    # forward pass: generate 9 anchors for every feature-map cell
    # input is what the previous layer passes in
    def forward(self, input):
        # Algorithm:
        #
        # for each (H, W) location i
        #   generate 9 anchor boxes centered on cell i
        #   apply predicted bbox deltas at cell i to each of the 9 anchors
        # filter out-of-image anchors
        # classification scores predicted by the network
        rpn_cls_score = input[0]
        # ground-truth boxes (annotations)
        gt_boxes = input[1]
        # image size information
        im_info = input[2]
        num_boxes = input[3]

        # map of shape (..., H, W)
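        # height and width of the feature map
        height, width = rpn_cls_score.size(2), rpn_cls_score.size(3)
        # number of images in the batch
        batch_size = gt_boxes.size(0)
        # height and width of the feature map (again)
        feat_height, feat_width = rpn_cls_score.size(2), rpn_cls_score.size(3)
        # generate the anchors below: the grid is multiplied by feat_stride,
        # so the anchors are laid out in input-image coordinates
        shift_x = np.arange(0, feat_width) * self._feat_stride
        shift_y = np.arange(0, feat_height) * self._feat_stride
        # build the grid; shift_x and shift_y hold the coordinates of the grid points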
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        shifts = torch.from_numpy(np.vstack((shift_x.ravel(), shift_y.ravel(),
                                  shift_x.ravel(), shift_y.ravel())).transpose())
        shifts = shifts.contiguous().type_as(rpn_cls_score).float()

        # A = 9
        A = self._num_anchors
        # K = number of feature-map cells (H * W)
        K = shifts.size(0)

        self._anchors = self._anchors.type_as(gt_boxes) # move to specific gpu.
        all_anchors = self._anchors.view(1, A, 4) + shifts.view(K, 1, 4)
        all_anchors = all_anchors.view(K * A, 4)
        # total number of anchors
        total_anchors = int(K * A)
        # keep only the anchors that do not cross the image boundary
        # (int() is used here in place of the Python 2 long() call)
        keep = ((all_anchors[:, 0] >= -self._allowed_border) &
                (all_anchors[:, 1] >= -self._allowed_border) &
                (all_anchors[:, 2] < int(im_info[0][1]) + self._allowed_border) &
                (all_anchors[:, 3] < int(im_info[0][0]) + self._allowed_border))

        inds_inside = torch.nonzero(keep).view(-1)

        # keep only the anchors that lie inside the image
        anchors = all_anchors[inds_inside, :]

        # label: 1 is positive, 0 is negative, -1 is dont care
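        # labels has one entry per inside anchor and is filled with -1
        labels = gt_boxes.new(batch_size, inds_inside.size(0)).fill_(-1)
        # inside weights start at 0 (these are regression loss weights, not scores)
        bbox_inside_weights = gt_boxes.new(batch_size, inds_inside.size(0)).zero_()
        # outside weights start at 0 as well
        bbox_outside_weights = gt_boxes.new(batch_size, inds_inside.size(0)).zero_()
        # IoU between every anchor and every ground-truth box
        overlaps = bbox_overlaps_batch(anchors, gt_boxes)
        # for each anchor: its largest IoU and the index of that gt_box
        max_overlaps, argmax_overlaps = torch.max(overlaps, 2)
        # for each gt_box: the largest IoU over all anchors
        gt_max_overlaps, _ = torch.max(overlaps, 1)
        if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
            # anchors whose max IoU is below RPN_NEGATIVE_OVERLAP (0.3) become background (label 0)
            labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
        # replace zero maxima with 1e-5 so that zero-overlap anchors are not matched below
        gt_max_overlaps[gt_max_overlaps==0] = 1e-5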
        keep = torch.sum(overlaps.eq(gt_max_overlaps.view(batch_size,1,-1).expand_as(overlaps)), 2)

        if torch.sum(keep) > 0:
            labels[keep>0] = 1

        # fg label: above threshold IOU
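        # anchors whose max IoU reaches RPN_POSITIVE_OVERLAP are labeled foreground (1)
        labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1

        if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
            # here the negative rule runs last, so it can overwrite positives
            labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

        # number of foreground anchors wanted in one training batch
        num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
        # count the anchors currently labeled foreground and background
        sum_fg = torch.sum((labels == 1).int(), 1)
        sum_bg = torch.sum((labels == 0).int(), 1)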

        for i in range(batch_size):
            # if there are more foreground anchors than needed, randomly disable the extras
            # subsample positive labels if we have too many
            if sum_fg[i] > num_fg:
                fg_inds = torch.nonzero(labels[i] == 1).view(-1)
                # torch.randperm seems has a bug on multi-gpu setting that cause the segfault.
                # See https://github.com/pytorch/pytorch/issues/1868 for more details.
                # use numpy instead.
                #rand_num = torch.randperm(fg_inds.size(0)).type_as(gt_boxes).long()
                rand_num = torch.from_numpy(np.random.permutation(fg_inds.size(0))).type_as(gt_boxes).long()
                disable_inds = fg_inds[rand_num[:fg_inds.size(0)-num_fg]]
                labels[i][disable_inds] = -1

#           num_bg = cfg.TRAIN.RPN_BATCHSIZE - sum_fg[i]
            num_bg = cfg.TRAIN.RPN_BATCHSIZE - torch.sum((labels == 1).int(), 1)[i]

            # subsample negative labels if we have too many
            # if there are more background anchors than needed, randomly disable the extras

            if sum_bg[i] > num_bg:
                bg_inds = torch.nonzero(labels[i] == 0).view(-1)
                #rand_num = torch.randperm(bg_inds.size(0)).type_as(gt_boxes).long()

                rand_num = torch.from_numpy(np.random.permutation(bg_inds.size(0))).type_as(gt_boxes).long()
                disable_inds = bg_inds[rand_num[:bg_inds.size(0)-num_bg]]
                labels[i][disable_inds] = -1

        offset = torch.arange(0, batch_size)*gt_boxes.size(1)

        argmax_overlaps = argmax_overlaps + offset.view(batch_size, 1).type_as(argmax_overlaps)
        # for each anchor, compute the four regression targets towards its max-IoU gt_box
        bbox_targets = _compute_targets_batch(anchors, gt_boxes.view(-1,5)[argmax_overlaps.view(-1), :].view(batch_size, -1, 5))

        # use a single value instead of 4 values for easy index.
        # set the inside weight of the foreground anchors
        bbox_inside_weights[labels==1] = cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS[0]

        if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
            num_examples = torch.sum(labels[i] >= 0)
            positive_weights = 1.0 / num_examples.item()
            negative_weights = 1.0 / num_examples.item()
        else:
            assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                    (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
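        # assign the positive and negative weights computed above
        # (in this listing only the RPN_POSITIVE_WEIGHT < 0 branch defines them)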
        bbox_outside_weights[labels == 1] = positive_weights
        bbox_outside_weights[labels == 0] = negative_weights
        # map the inside anchors back into the full set of anchors
        labels = _unmap(labels, total_anchors, inds_inside, batch_size, fill=-1)
        bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, batch_size, fill=0)
        bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, batch_size, fill=0)
        bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, batch_size, fill=0)

        outputs = []

        labels = labels.view(batch_size, height, width, A).permute(0,3,1,2).contiguous()
        # reshape the anchor labels to (batch_size, 1, A * height, width)
        labels = labels.view(batch_size, 1, A * height, width)
        outputs.append(labels)

        # reshape the regression targets to (batch_size, A * 4, height, width)
        bbox_targets = bbox_targets.view(batch_size, height, width, A*4).permute(0,3,1,2).contiguous()
        outputs.append(bbox_targets)


        # reshape bbox_inside_weights and bbox_outside_weights in the same way
        anchors_count = bbox_inside_weights.size(1)
        bbox_inside_weights = bbox_inside_weights.view(batch_size,anchors_count,1).expand(batch_size, anchors_count, 4)

        bbox_inside_weights = bbox_inside_weights.contiguous().view(batch_size, height, width, 4*A)\
                            .permute(0,3,1,2).contiguous()

        outputs.append(bbox_inside_weights)

        bbox_outside_weights = bbox_outside_weights.view(batch_size,anchors_count,1).expand(batch_size, anchors_count, 4)
        bbox_outside_weights = bbox_outside_weights.contiguous().view(batch_size, height, width, 4*A)\
                            .permute(0,3,1,2).contiguous()
        outputs.append(bbox_outside_weights)
        ## open question: what are bbox_inside_weights and bbox_outside_weights for?
        return outputs

    def backward(self, top, propagate_down, bottom):
        """This layer does not propagate gradients."""
        pass

    def reshape(self, bottom, top):
        """Reshaping happens during the call to forward."""
        pass

def _unmap(data, count, inds, batch_size, fill=0):
    """ Unmap a subset of item (data) back to the original set of items (of
    size count) """

    if data.dim() == 2:
        ret = torch.Tensor(batch_size, count).fill_(fill).type_as(data)
        ret[:, inds] = data
    else:
        ret = torch.Tensor(batch_size, count, data.size(2)).fill_(fill).type_as(data)
        ret[:, inds,:] = data
    return ret


def _compute_targets_batch(ex_rois, gt_rois):
    """Compute bounding-box regression targets for an image."""

    return bbox_transform_batch(ex_rois, gt_rois[:, :, :4])
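The part of forward that I found hardest to picture is how the 9 base anchors get copied to every cell. Below is a minimal NumPy sketch of just that step plus the inside-image filter. It mirrors the meshgrid and broadcasting logic of the listing, but grid_anchors and inside_indices are hypothetical names and the tiny 2-anchor example is only for illustration.

```python
import numpy as np


def grid_anchors(base_anchors, feat_h, feat_w, feat_stride=16):
    """Shift A base anchors to each of the K = feat_h * feat_w cells -> (K * A, 4)."""
    shift_x, shift_y = np.meshgrid(np.arange(feat_w) * feat_stride,
                                   np.arange(feat_h) * feat_stride)
    # one (dx, dy, dx, dy) row per cell, x varying fastest
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)
    A, K = base_anchors.shape[0], shifts.shape[0]
    all_anchors = base_anchors.reshape(1, A, 4) + shifts.reshape(K, 1, 4)
    return all_anchors.reshape(K * A, 4)


def inside_indices(anchors, im_h, im_w, allowed_border=0):
    """Indices of the anchors that lie completely inside the image."""
    keep = ((anchors[:, 0] >= -allowed_border) &
            (anchors[:, 1] >= -allowed_border) &
            (anchors[:, 2] < im_w + allowed_border) &
            (anchors[:, 3] < im_h + allowed_border))
    return np.nonzero(keep)[0]


base = np.array([[0.0, 0.0, 15.0, 15.0], [-8.0, -8.0, 23.0, 23.0]])
all_anchors = grid_anchors(base, feat_h=2, feat_w=3)
print(all_anchors.shape)
print(inside_indices(all_anchors, im_h=32, im_w=48))
```

Because the shifts are in units of feat_stride, the anchors live in input-image coordinates, which is why they can be compared directly with gt_boxes and im_info.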

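To summarize anchor_target_layer:

Inputs of this layer:
1. rpn_cls_score: 18 channels, a two-class (foreground or background) score for each of the 9 anchors. Judging by the slicing in proposal_layer, the first nine channels are the background scores and the last nine are the foreground scores. These are the classification scores predicted by the network; this layer only uses them to read the feature-map dimensions.
2. gt_boxes: the annotated ground-truth boxes
3. im_info: image information (height and width)
What this layer computes:
The code in forward generates 9 anchors for every cell and discards the anchors that fall outside the image. Items 1 and 2 below are intermediate values; items 3 to 6 are what forward returns.
1. gt_max_overlaps: for each gt_box, the largest IoU over all anchors
2. max_overlaps: for each anchor, the largest IoU over all gt_boxes
3. labels: the label of each anchor. 1 is foreground (IoU >= 0.7, or the best anchor for some gt_box), 0 is background (IoU < 0.3), and -1 marks anchors with IoU between 0.3 and 0.7, which are ignored.
4. bbox_targets: for each anchor, the offsets to the ground-truth box with the largest IoU (used outside this layer as the regression target for the predicted offsets).
5. bbox_outside_weights: nonzero for anchors labeled foreground or background; in the code above the value is 1 / num_examples rather than 1.
6. bbox_inside_weights: nonzero only for foreground anchors (set to RPN_BBOX_INSIDE_WEIGHTS[0]).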
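The labeling rules are easier to follow on a tiny IoU matrix. This is a single-image NumPy sketch of the non-clobber branch (RPN_CLOBBER_POSITIVES false) followed by the random subsampling. The function names and the thresholds 0.7 and 0.3 passed as defaults are my assumptions for illustration; the real layer works on batched torch tensors.

```python
import numpy as np


def label_anchors(overlaps, pos_thresh=0.7, neg_thresh=0.3):
    """overlaps: (N, K) IoU matrix. Returns labels (N,): 1 fg, 0 bg, -1 ignore."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.int64)
    max_overlaps = overlaps.max(axis=1)       # best IoU of each anchor
    gt_max_overlaps = overlaps.max(axis=0)    # best IoU of each gt box
    gt_max_overlaps[gt_max_overlaps == 0] = 1e-5
    labels[max_overlaps < neg_thresh] = 0                     # background
    labels[(overlaps == gt_max_overlaps).any(axis=1)] = 1     # best anchor per gt box
    labels[max_overlaps >= pos_thresh] = 1                    # high-IoU anchors
    return labels


def subsample(labels, value, keep_n, rng):
    """Set randomly chosen surplus entries equal to `value` back to -1."""
    idx = np.flatnonzero(labels == value)
    if idx.size > keep_n:
        drop = rng.permutation(idx)[:idx.size - keep_n]
        labels[drop] = -1
    return labels


overlaps = np.array([[0.8, 0.1],
                     [0.5, 0.2],
                     [0.1, 0.4],
                     [0.0, 0.05],
                     [0.2, 0.0]])
labels = label_anchors(overlaps)
print(labels)   # anchor 2 is foreground only because it is the best match for gt box 1
```

Anchor 1 (IoU 0.5) stays at -1 and contributes nothing to the loss, which is the "don't care" case from the summary.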

Next:

4. proposal_layer.py

Here is its forward function first.

  def forward(self, input):
        #
        # for each (H, W) location i
        #   generate A anchor boxes centered on cell i
        #   apply predicted bbox deltas at cell i to each of the A anchors
        # clip predicted boxes to image
        # remove predicted boxes with either height or width < threshold
        # sort all (proposal, score) pairs by score from highest to lowest
        # take top pre_nms_topN proposals before NMS
        # apply NMS with threshold 0.7 to remaining proposals
        # take after_nms_topN proposals after NMS
        # return the top proposals (-> RoIs top, scores top)
        # the first set of _num_anchors channels are bg probs
        # the second set are the fg probs
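        # i.e. the first 9 (_num_anchors) channels are the background scores and
        # the last 9 are the foreground scores; only the foreground half is kept
        scores = input[0][:, self._num_anchors:, :, :]
        # predicted box regression deltas for every anchor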
        bbox_deltas = input[1]
        im_info = input[2]
        cfg_key = input[3]
        pre_nms_topN  = cfg[cfg_key].RPN_PRE_NMS_TOP_N
        post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
        nms_thresh    = cfg[cfg_key].RPN_NMS_THRESH
        min_size      = cfg[cfg_key].RPN_MIN_SIZE
        batch_size = bbox_deltas.size(0)
        feat_height, feat_width = scores.size(2), scores.size(3)
        # same grid and shift construction as in anchor_target_layer
        shift_x = np.arange(0, feat_width) * self._feat_stride
        shift_y = np.arange(0, feat_height) * self._feat_stride
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        shifts = torch.from_numpy(np.vstack((shift_x.ravel(), shift_y.ravel(),
                                  shift_x.ravel(), shift_y.ravel())).transpose())
        shifts = shifts.contiguous().type_as(scores).float()
        A = self._num_anchors  # 9
        K = shifts.size(0)
        self._anchors = self._anchors.type_as(scores)
        # anchors = self._anchors.view(1, A, 4) + shifts.view(1, K, 4).permute(1, 0, 2).contiguous()
        # all anchors, before any regression is applied
        anchors = self._anchors.view(1, A, 4) + shifts.view(K, 1, 4)
        # reshape and expand to (batch_size, K * A, 4)
        anchors = anchors.view(1, K * A, 4).expand(batch_size, K * A, 4)
        # Transpose and reshape predicted bbox transformations to get them
        # into the same order as the anchors:
        # move the channel dimension last: (N, C, H, W) -> (N, H, W, C)
        bbox_deltas = bbox_deltas.permute(0, 2, 3, 1).contiguous()
        # then flatten to (batch_size, K * A, 4)
        bbox_deltas = bbox_deltas.view(batch_size, -1, 4)
        # Same story for the scores:
        scores = scores.permute(0, 2, 3, 1).contiguous()
        scores = scores.view(batch_size, -1)
        # Convert anchors into proposals via bbox transformations
        # apply the deltas to the anchors to get the proposals
        proposals = bbox_transform_inv(anchors, bbox_deltas, batch_size)
        # 2. clip predicted boxes to image
        # clip boxes that extend beyond the image boundary
        proposals = clip_boxes(proposals, im_info, batch_size)
        # proposals = clip_boxes_batch(proposals, im_info, batch_size)
        # assign the score to 0 if it's non keep.
        # keep = self._filter_boxes(proposals, min_size * im_info[:, 2])
        # trim keep index to make it euqal over batch
        # keep_idx = torch.cat(tuple(keep_idx), 0)
        # scores_keep = scores.view(-1)[keep_idx].view(batch_size, trim_size)
        # proposals_keep = proposals.view(-1, 4)[keep_idx, :].contiguous().view(batch_size, trim_size, 4)
        # _, order = torch.sort(scores_keep, 1, True)
        scores_keep = scores
        proposals_keep = proposals
        _, order = torch.sort(scores_keep, 1, True)
        output = scores.new(batch_size, post_nms_topN, 5).zero_()
        for i in range(batch_size):
            # # 3. remove predicted boxes with either height or width < threshold
            # # (NOTE: convert min_size to input image scale stored in im_info[2])
            # (the small-box filter is commented out in this listing, so nothing is removed here)
            proposals_single = proposals_keep[i]
            scores_single = scores_keep[i]
            # # 4. sort all (proposal, score) pairs by score from highest to lowest
            # # 5. take top pre_nms_topN (e.g. 6000)
            order_single = order[i]
            if pre_nms_topN > 0 and pre_nms_topN < scores_keep.numel():
                # keep only the pre_nms_topN proposals with the highest foreground scores
                order_single = order_single[:pre_nms_topN]
            proposals_single = proposals_single[order_single, :]
            scores_single = scores_single[order_single].view(-1,1)
            # 6. apply nms (e.g. threshold = 0.7)
            # 7. take after_nms_topN (e.g. 300)
            # 8. return the top proposals (-> RoIs top)
            # use non-maximum suppression to remove near-duplicate boxes
            keep_idx_i = nms(torch.cat((proposals_single, scores_single), 1), nms_thresh, force_cpu=not cfg.USE_GPU_NMS)
            keep_idx_i = keep_idx_i.long().view(-1)
            if post_nms_topN > 0:
                keep_idx_i = keep_idx_i[:post_nms_topN]
            proposals_single = proposals_single[keep_idx_i, :]
            scores_single = scores_single[keep_idx_i, :]
            # padding 0 at the end.
            num_proposal = proposals_single.size(0)
            output[i,:,0] = i
            output[i,:num_proposal,1:] = proposals_single
        return output

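To summarize this layer:

Inputs:
1. scores: the foreground and background scores of each box
2. bbox_deltas: the predicted offsets of each box
3. im_info: image information
4. cfg_key: whether we are training (it selects the matching config section)
Process:
First apply the offsets to the anchors to get corrected boxes.
Then sort by foreground score and apply non-maximum suppression.
This selects the final predicted boxes.
Output: at most post_nms_topN final boxes per image, in a zero-padded tensor of shape (batch_size, post_nms_topN, 5).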
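The process above (decode, clip, sort, NMS, truncate) can be sketched for a single image in plain NumPy. This is an illustration under assumptions, not the repository implementation: decode follows the common Faster R-CNN delta parameterization, nms is a simple greedy version, and all function names here are mine.

```python
import numpy as np


def decode(anchors, deltas):
    """Apply (dx, dy, dw, dh) deltas to anchors given as (x1, y1, x2, y2)."""
    w = anchors[:, 2] - anchors[:, 0] + 1.0
    h = anchors[:, 3] - anchors[:, 1] + 1.0
    cx = anchors[:, 0] + 0.5 * w
    cy = anchors[:, 1] + 0.5 * h
    pcx, pcy = deltas[:, 0] * w + cx, deltas[:, 1] * h + cy
    pw, ph = np.exp(deltas[:, 2]) * w, np.exp(deltas[:, 3]) * h
    return np.stack([pcx - 0.5 * pw, pcy - 0.5 * ph,
                     pcx + 0.5 * pw, pcy + 0.5 * ph], axis=1)


def clip(boxes, im_h, im_w):
    """Clamp the box corners to the image."""
    out = boxes.copy()
    out[:, 0::2] = np.clip(out[:, 0::2], 0, im_w - 1)
    out[:, 1::2] = np.clip(out[:, 1::2], 0, im_h - 1)
    return out


def nms(boxes, scores, thresh):
    """Greedy NMS: keep the best box, drop the boxes overlapping it by more than thresh."""
    order = np.argsort(-scores)
    area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        iw = np.minimum(boxes[i, 2], boxes[rest, 2]) - np.maximum(boxes[i, 0], boxes[rest, 0]) + 1
        ih = np.minimum(boxes[i, 3], boxes[rest, 3]) - np.maximum(boxes[i, 1], boxes[rest, 1]) + 1
        inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
        iou = inter / (area[i] + area[rest] - inter)
        order = rest[iou <= thresh]
    return keep


def propose(anchors, deltas, fg_scores, im_h, im_w, pre_nms_top_n, post_nms_top_n, nms_thresh):
    boxes = clip(decode(anchors, deltas), im_h, im_w)
    order = np.argsort(-fg_scores)[:pre_nms_top_n]       # best scores first
    boxes, scores = boxes[order], fg_scores[order]
    keep = nms(boxes, scores, nms_thresh)[:post_nms_top_n]
    return boxes[keep], scores[keep]


demo_boxes = np.array([[0.0, 0.0, 9.0, 9.0], [1.0, 1.0, 10.0, 10.0], [50.0, 50.0, 59.0, 59.0]])
demo_scores = np.array([0.9, 0.8, 0.7])
rois, roi_scores = propose(demo_boxes, np.zeros((3, 4)), demo_scores,
                           im_h=100, im_w=100, pre_nms_top_n=3,
                           post_nms_top_n=2, nms_thresh=0.5)
print(rois, roi_scores)
```

In the demo the second box overlaps the first too much and is suppressed, so the two survivors are the first and the third.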

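5. proposal_target_layer

The purpose of this layer is to give each box selected by the previous layer the class of its matching gt_box.

5.1 First, the _sample_rois_pytorch function

Inputs:
1. all_rois: the boxes that were selected
2. gt_boxes: the annotated ground-truth boxes
3. fg_rois_per_image: the maximum number of foreground boxes per image
4. rois_per_image: the nominal number of rois per image
Process:
For each roi, find the gt_box with the largest IoU and take its class; the roi is assigned that class.
Outputs:
1. labels_batch: for each roi, the class of the gt_box it is matched to
2. rois_batch: the sampled rois
3. bbox_targets: the offsets between each roi and its matched gt_box
4. bbox_inside_weights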
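Here is how I picture the sampling, as a single-image NumPy sketch. It is a simplification and not the repository code: the thresholds (fg_thresh, bg_hi, bg_lo), the fraction 0.25 and the name sample_rois are assumptions chosen for illustration, and unlike the real function it does not pad by sampling with replacement when there are too few candidates.

```python
import numpy as np


def sample_rois(overlaps, gt_labels, rois_per_image=128, fg_fraction=0.25,
                fg_thresh=0.5, bg_hi=0.5, bg_lo=0.0, seed=0):
    """overlaps: (N, K) IoU of rois vs gt boxes; gt_labels: (K,) class ids (> 0)."""
    rng = np.random.default_rng(seed)
    max_ov = overlaps.max(axis=1)
    labels = gt_labels[overlaps.argmax(axis=1)]   # class of the best-matching gt box

    fg = np.flatnonzero(max_ov >= fg_thresh)
    bg = np.flatnonzero((max_ov < bg_hi) & (max_ov >= bg_lo))

    n_fg = min(int(round(fg_fraction * rois_per_image)), fg.size)
    n_bg = min(rois_per_image - n_fg, bg.size)
    fg = rng.permutation(fg)[:n_fg]
    bg = rng.permutation(bg)[:n_bg]

    keep = np.concatenate([fg, bg])
    out_labels = labels[keep].copy()
    out_labels[n_fg:] = 0                          # background rois get class 0
    return keep, out_labels


demo_overlaps = np.array([[0.9, 0.1], [0.2, 0.6], [0.3, 0.1], [0.0, 0.05]])
demo_gt_labels = np.array([3, 7])
keep, roi_labels = sample_rois(demo_overlaps, demo_gt_labels,
                               rois_per_image=4, fg_fraction=0.5)
print(keep, roi_labels)
```

The foreground rois keep the class of their gt_box, and everything sampled as background is relabeled 0.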

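5.2 _compute_targets_pytorch

Inputs:
1. ex_rois: the predicted boxes
2. gt_rois: the annotated boxes
Output:
targets: the offsets between each roi and its matched gt_box

5.3 _get_bbox_regression_labels_pytorch

Inputs:
bbox_target_data: the offsets
labels_batch: the class of each roi
Outputs:
bbox_targets: the same data rearranged, gathering the coordinates for each class
bbox_inside_weights: set to 1 for the rois that have a label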
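As far as I can tell, the effect of 5.3 is that only rois with a foreground class keep their regression targets and get a nonzero inside weight. The sketch below shows that idea for one image; expand_targets is a hypothetical name and the inside weight of 1.0 per coordinate is an assumed default, so treat it as an illustration rather than the actual function.

```python
import numpy as np


def expand_targets(bbox_target_data, labels, inside_weight=(1.0, 1.0, 1.0, 1.0)):
    """bbox_target_data: (N, 4) offsets; labels: (N,) class ids, 0 = background."""
    bbox_targets = np.zeros_like(bbox_target_data)
    bbox_inside_weights = np.zeros_like(bbox_target_data)
    fg = labels > 0                        # rois that carry a real class
    bbox_targets[fg] = bbox_target_data[fg]
    bbox_inside_weights[fg] = inside_weight
    return bbox_targets, bbox_inside_weights


demo_data = np.array([[0.1, 0.2, 0.3, 0.4],
                      [1.0, 1.0, 1.0, 1.0],
                      [2.0, 2.0, 2.0, 2.0]])
demo_labels = np.array([0, 5, 2])
targets, inside_w = expand_targets(demo_data, demo_labels)
print(targets)
print(inside_w)
```

Background rois end up with all-zero targets and weights, so they cannot contribute to the box regression loss.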

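5.4 The forward of this layer:

Inputs: the boxes selected by the previous layer and all annotated boxes.
Outputs: rois, labels, bbox_targets, bbox_inside_weights and bbox_outside_weights, all of which are explained in the functions above.

Together, the two proposal layers filter the rois, compute the offsets between the rois and the gt boxes, correct the rois, and obtain the label of each roi.

Now for the outermost layer.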

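6. rpn

forward:
Inputs:
1. base_feat: the feature map
2. im_info: image information
3. gt_boxes: the annotated ground-truth boxes
4. num_boxes
Outputs:
1. rois: obtained from proposal_layer. proposal_layer receives rpn_cls_prob, the probability that each anchor is foreground or background, and rpn_bbox_pred, the offsets between anchors and boxes predicted by the network, and returns the corrected rois.
2. rpn_loss_cls: the cross-entropy between rpn_cls_score and rpn_label.
rpn_cls_score: the two-class (background and foreground) scores that the rpn network predicts for each anchor.
rpn_label: whether each anchor is foreground or background (-1, 0, 1), taken from the output of anchor_target; anchors labeled -1 are left out of the loss.
3. rpn_loss_box (box regression loss): also a loss. Computing it needs rpn_bbox_pred, the offsets predicted by the rpn network, and rpn_bbox_targets, the true offsets.
4. rpn_bbox_inside_weights
5. rpn_bbox_outside_weights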
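To see how the two losses and the two weight tensors fit together, here is a single-image NumPy sketch. It assumes a cross-entropy that skips anchors labeled -1 and a smooth L1 loss with a sigma parameter; the function names, sigma = 3 and the final sum are my assumptions for illustration, and the real module works on torch tensors.

```python
import numpy as np


def rpn_cls_loss(logits, labels):
    """Mean cross-entropy over anchors with label 0 or 1; label -1 is ignored.

    logits: (N, 2) background/foreground scores, labels: (N,) ints in {-1, 0, 1}.
    """
    keep = labels != -1
    z, y = logits[keep], labels[keep]
    z = z - z.max(axis=1, keepdims=True)                       # numerical stability
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(y.size), y].mean()


def smooth_l1(pred, target, inside_w, outside_w, sigma=3.0):
    """Weighted smooth L1: inside_w masks the difference, outside_w scales the loss."""
    sigma2 = sigma ** 2
    diff = inside_w * (pred - target)
    abs_diff = np.abs(diff)
    per_elem = np.where(abs_diff < 1.0 / sigma2,
                        0.5 * sigma2 * diff ** 2,              # quadratic near zero
                        abs_diff - 0.5 / sigma2)               # linear elsewhere
    return (outside_w * per_elem).sum()


demo_logits = np.array([[0.0, 0.0], [5.0, -5.0], [2.0, 1.0]])
demo_labels = np.array([1, 0, -1])
print(rpn_cls_loss(demo_logits, demo_labels))

demo_pred = np.array([[0.1, 2.0, 0.0, 0.0]])
demo_target = np.zeros((1, 4))
print(smooth_l1(demo_pred, demo_target, np.ones((1, 4)), np.full((1, 4), 0.5)))
```

Setting inside_w to zero for an anchor removes it from the regression loss entirely, which matches the rule that only foreground anchors are regressed; outside_w then acts as a per-anchor normalization.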

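That is it for rpn. How exactly the network predicts the offsets is something I do not know yet.

That is a rough walk through the rpn flow. One lesson from reading this code: at first I got stuck trying to interpret every single line, which I now think is unnecessary while I am still not very familiar with PyTorch. The better approach is to work out the inputs and outputs of each function, and then how the outputs are derived from the inputs. I will read the rest of the code with the same approach.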