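The code for this layer lives under lib/model/rpn.
The Region Proposal Network (RPN) is mainly used to generate region proposals.
I will walk through the implementation by looking at the inputs and outputs of each file and of each layer.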
1. generate_anchors.py
Anchors
Anchors are simply rectangular regions, generated by generate_anchors.py.
Running the file directly
python generate_anchors.py
produces the following data:
[[ -84. -40. 99. 55.]
[-176. -88. 191. 103.]
[-360. -184. 375. 199.]
[ -56. -56. 71. 71.]
[-120. -120. 135. 135.]
[-248. -248. 263. 263.]
[ -36. -80. 51. 95.]
[ -80. -168. 95. 183.]
[-168. -344. 183. 359.]]
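These are the coordinates of the 9 anchors; each row gives the top-left x, y and the bottom-right x, y.
Let's go straight to the code.
Here base_size is 16 and is used to build a reference anchor.
ratios holds the three aspect ratios of the anchors.
scales enlarges the reference anchor by factors of 8, 16 and 32.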
import numpy as np

def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    # The code represents an anchor in two ways: by its center coordinates plus
    # width and height, or by its top-left and bottom-right corners.
    # Reference anchor: this produces (0, 0, 15, 15)
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    # Apply the aspect ratios to base_anchor; returns three anchors with different ratios
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    # Enlarge each of the three ratio anchors by (8, 16, 32),
    # which gives 9 candidate boxes.
    # Each box is stored as: top-left x, y and bottom-right x, y
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])
    return anchors

# Return the width, height and center coordinates of an anchor
def _whctrs(anchor):
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr

# Given center coordinates, widths and heights, return anchors
# (top-left and bottom-right corners)
def _mkanchors(ws, hs, x_ctr, y_ctr):
    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors

# Compute the anchor coordinates for each aspect ratio
def _ratio_enum(anchor, ratios):
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h
    # area divided by each ratio, used to derive the widths
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

# Enlarge a given anchor by each of the scales
def _scale_enum(anchor, scales):
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
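That is the end of the source of generate_anchors.py.
To summarize:
1) Build a reference anchor (0, 0, 15, 15).
2) From it, build anchors with three aspect ratios (0.5, 1, 2). There are now three anchors.
3) Enlarge each of the three anchors by factors of 8, 16 and 32.
With base_size 16 and a largest scale of 32, the largest square anchor is 16 * 32 = 512 pixels on a side.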
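As a quick check on the summary above, the following sketch recomputes the nine anchors in vectorized NumPy and prints their widths and heights. `make_anchors` is a hypothetical helper written only for this illustration, not part of the repository; it assumes the same rounding and the same ratio-major ordering as generate_anchors.py.

```python
import numpy as np

def make_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Hypothetical compact re-derivation of the 9 anchors as (x1, y1, x2, y2)."""
    ratios = np.asarray(ratios, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    ctr = 0.5 * (base_size - 1)          # center of the (0, 0, 15, 15) reference anchor
    area = float(base_size * base_size)  # 256 for base_size=16
    # Step 2: three aspect ratios with roughly the same area
    ws = np.round(np.sqrt(area / ratios))
    hs = np.round(ws * ratios)
    # Step 3: every ratio is enlarged by every scale (ratio-major order)
    ws = (ws[:, None] * scales[None, :]).ravel()
    hs = (hs[:, None] * scales[None, :]).ravel()
    return np.stack([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                     ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)], axis=1)

anchors = make_anchors()
widths = anchors[:, 2] - anchors[:, 0] + 1
heights = anchors[:, 3] - anchors[:, 1] + 1
print(np.stack([widths, heights], axis=1))
```

The rows come out in the same order as the array printed by generate_anchors.py, so the sixth row is the 512 x 512 square anchor.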
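2. bbox_transform.py
This module mainly does coordinate conversions and computes data needed by the other layers. Two of its functions are listed here.
# Compute the IoU between the boxes and the anchors
# anchors: N four-dimensional vectors
# gt_boxes: K four-dimensional vectors
# overlaps: N K-dimensional vectors giving the IoU of each anchor with each box
def bbox_overlaps(anchors, gt_boxes):
# Express each box by its center x, y plus width and height, then
# compute the offsets (dx, dy, dw, dh) between each anchor and its matched box
def bbox_transform(ex_rois, gt_rois):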
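Since only the signatures are listed, here is a minimal NumPy sketch of what the two functions compute. The names `iou_matrix` and `box_deltas` are hypothetical, and the sketch assumes the inclusive "+1" pixel convention that `_whctrs` uses above and the usual (dx, dy, dw, dh) parameterization; the repository's own implementation may differ in detail.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """IoU of every anchor (N, 4) with every gt box (K, 4); returns an (N, K) array."""
    a = np.asarray(anchors, dtype=np.float64)
    g = np.asarray(gt_boxes, dtype=np.float64)
    area_a = (a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1)
    area_g = (g[:, 2] - g[:, 0] + 1) * (g[:, 3] - g[:, 1] + 1)
    # Width and height of the intersection rectangle (negative means no overlap)
    iw = np.minimum(a[:, None, 2], g[None, :, 2]) - np.maximum(a[:, None, 0], g[None, :, 0]) + 1
    ih = np.minimum(a[:, None, 3], g[None, :, 3]) - np.maximum(a[:, None, 1], g[None, :, 1]) + 1
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def box_deltas(ex_rois, gt_rois):
    """Offsets (dx, dy, dw, dh) that map each ex_roi onto its paired gt_roi."""
    ex = np.asarray(ex_rois, dtype=np.float64)
    gt = np.asarray(gt_rois, dtype=np.float64)
    ex_w = ex[:, 2] - ex[:, 0] + 1
    ex_h = ex[:, 3] - ex[:, 1] + 1
    ex_cx = ex[:, 0] + 0.5 * ex_w
    ex_cy = ex[:, 1] + 0.5 * ex_h
    gt_w = gt[:, 2] - gt[:, 0] + 1
    gt_h = gt[:, 3] - gt[:, 1] + 1
    gt_cx = gt[:, 0] + 0.5 * gt_w
    gt_cy = gt[:, 1] + 0.5 * gt_h
    # Center shifts are normalized by the anchor size; size changes are log ratios
    dx = (gt_cx - ex_cx) / ex_w
    dy = (gt_cy - ex_cy) / ex_h
    dw = np.log(gt_w / ex_w)
    dh = np.log(gt_h / ex_h)
    return np.stack([dx, dy, dw, dh], axis=1)

anchors = np.array([[0, 0, 9, 9], [5, 5, 14, 14]])
gt_boxes = np.array([[0, 0, 9, 9]])
overlaps = iou_matrix(anchors, gt_boxes)                     # shape (2, 1)
deltas = box_deltas(anchors, np.repeat(gt_boxes, 2, axis=0))  # shape (2, 4)
print(overlaps)
print(deltas)
```

The first anchor coincides with the gt box, so its IoU is 1 and its offsets are all zero; the second is shifted by half its width in both directions.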
3. anchor_target_layer.py
class _AnchorTargetLayer(nn.Module):
    """
        Assign anchors to ground-truth targets. Produces anchor classification
        labels and bounding-box regression targets.
    """
    def __init__(self, feat_stride, scales, ratios):
        super(_AnchorTargetLayer, self).__init__()
        # _feat_stride: how much smaller the feature map is than the original image
        self._feat_stride = feat_stride
        self._scales = scales
        # used to generate the anchors
        anchor_scales = scales
        # generates the 9 anchors
        self._anchors = torch.from_numpy(generate_anchors(scales=np.array(anchor_scales), ratios=np.array(ratios))).float()
        # number of anchors, 9
        self._num_anchors = self._anchors.size(0)
        # allow boxes to sit over the edge by a small amount
        # (how far an anchor may extend beyond the image border)
        self._allowed_border = 0  # default is 0
    # Forward pass: generates 9 anchors for every cell of the feature map
    # input is what the previous layer passes in
    def forward(self, input):
        # Algorithm:
        #
        # for each (H, W) location i
        #   generate 9 anchor boxes centered on cell i
        #   apply predicted bbox deltas at cell i to each of the 9 anchors
        # filter out-of-image anchors
        # classification scores predicted by the network
        rpn_cls_score = input[0]
        # ground-truth annotations
        gt_boxes = input[1]
        # image size
        im_info = input[2]
        num_boxes = input[3]
        # map of shape (..., H, W)
        # height and width of the feature map
        height, width = rpn_cls_score.size(2), rpn_cls_score.size(3)
        # number of images in the batch
        batch_size = gt_boxes.size(0)
        # height and width of the feature map
        feat_height, feat_width = rpn_cls_score.size(2), rpn_cls_score.size(3)
        # Generate the anchors below.
        # The shifts are multiplied by feat_stride,
        # so the generated anchors are in original-image coordinates.
        shift_x = np.arange(0, feat_width) * self._feat_stride
        shift_y = np.arange(0, feat_height) * self._feat_stride
        # build the grid; x, y are the coordinates of the grid points
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        shifts = torch.from_numpy(np.vstack((shift_x.ravel(), shift_y.ravel(),
                                             shift_x.ravel(), shift_y.ravel())).transpose())
        shifts = shifts.contiguous().type_as(rpn_cls_score).float()
        # A = 9
        A = self._num_anchors
        # number of cells in the feature map (H * W)
        K = shifts.size(0)
        self._anchors = self._anchors.type_as(gt_boxes) # move to specific gpu.
        all_anchors = self._anchors.view(1, A, 4) + shifts.view(K, 1, 4)
        all_anchors = all_anchors.view(K * A, 4)
        # total number of anchors
        total_anchors = int(K * A)
        # make sure the anchors do not cross the image border
        keep = ((all_anchors[:, 0] >= -self._allowed_border) &
                (all_anchors[:, 1] >= -self._allowed_border) &
                (all_anchors[:, 2] < int(im_info[0][1]) + self._allowed_border) &
                (all_anchors[:, 3] < int(im_info[0][0]) + self._allowed_border))
        inds_inside = torch.nonzero(keep).view(-1)
        # keep only inside anchors
        anchors = all_anchors[inds_inside, :]
        # label: 1 is positive, 0 is negative, -1 is dont care
        # labels has one entry per inside anchor,
        # initially filled with -1
        labels = gt_boxes.new(batch_size, inds_inside.size(0)).fill_(-1)
        # initialize the inside weights with zeros
        bbox_inside_weights = gt_boxes.new(batch_size, inds_inside.size(0)).zero_()
        # initialize the outside weights with zeros
        bbox_outside_weights = gt_boxes.new(batch_size, inds_inside.size(0)).zero_()
        # compute the overlap between the anchors and the ground-truth boxes
        overlaps = bbox_overlaps_batch(anchors, gt_boxes)
        # for each anchor, find the gt_box with the highest IoU
        max_overlaps, argmax_overlaps = torch.max(overlaps, 2)
        # for each gt_box, find the highest IoU over all anchors
        gt_max_overlaps, _ = torch.max(overlaps, 1)
        if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
            # anchors whose best IoU is below the negative threshold (0.3) get label 0
            labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
        # replace zeros with 0.00001
        gt_max_overlaps[gt_max_overlaps==0] = 1e-5
        keep = torch.sum(overlaps.eq(gt_max_overlaps.view(batch_size,1,-1).expand_as(overlaps)), 2)
        if torch.sum(keep) > 0:
            labels[keep>0] = 1
        # fg label: above threshold IOU
        # anchors whose best IoU reaches the positive threshold get label 1
        labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
        if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
            # anchors below the negative threshold get label 0
            labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
        # number of foreground anchors wanted in one training batch
        num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
        # count the anchors labeled foreground and background, per image
        sum_fg = torch.sum((labels == 1).int(), 1)
        sum_bg = torch.sum((labels == 0).int(), 1)
        for i in range(batch_size):
            # subsample positive labels if we have too many
            if sum_fg[i] > num_fg:
                fg_inds = torch.nonzero(labels[i] == 1).view(-1)
                # torch.randperm seems has a bug on multi-gpu setting that cause the segfault.
                # See https://github.com/pytorch/pytorch/issues/1868 for more details.
                # use numpy instead.
                #rand_num = torch.randperm(fg_inds.size(0)).type_as(gt_boxes).long()
                rand_num = torch.from_numpy(np.random.permutation(fg_inds.size(0))).type_as(gt_boxes).long()
                disable_inds = fg_inds[rand_num[:fg_inds.size(0)-num_fg]]
                labels[i][disable_inds] = -1
            # num_bg = cfg.TRAIN.RPN_BATCHSIZE - sum_fg[i]
            num_bg = cfg.TRAIN.RPN_BATCHSIZE - torch.sum((labels == 1).int(), 1)[i]
            # subsample negative labels if we have too many
            if sum_bg[i] > num_bg:
                bg_inds = torch.nonzero(labels[i] == 0).view(-1)
                #rand_num = torch.randperm(bg_inds.size(0)).type_as(gt_boxes).long()
                rand_num = torch.from_numpy(np.random.permutation(bg_inds.size(0))).type_as(gt_boxes).long()
                disable_inds = bg_inds[rand_num[:bg_inds.size(0)-num_bg]]
                labels[i][disable_inds] = -1
        offset = torch.arange(0, batch_size)*gt_boxes.size(1)
        argmax_overlaps = argmax_overlaps + offset.view(batch_size, 1).type_as(argmax_overlaps)
        # for each anchor, compute the four offsets to the gt_box it overlaps most
        bbox_targets = _compute_targets_batch(anchors, gt_boxes.view(-1,5)[argmax_overlaps.view(-1), :].view(batch_size, -1, 5))
        # use a single value instead of 4 values for easy index.
        # give the foreground anchors their inside weight
        bbox_inside_weights[labels==1] = cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS[0]
        if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
            # note: i here is whatever index the sampling loop ended on
            num_examples = torch.sum(labels[i] >= 0)
            positive_weights = 1.0 / num_examples.item()
            negative_weights = 1.0 / num_examples.item()
        else:
            # note: as quoted, this branch does not assign positive_weights / negative_weights
            assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                    (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
        # store the positive and negative weights computed above
        bbox_outside_weights[labels == 1] = positive_weights
        bbox_outside_weights[labels == 0] = negative_weights
        # map the inside anchors back into the full set of anchors
        labels = _unmap(labels, total_anchors, inds_inside, batch_size, fill=-1)
        bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, batch_size, fill=0)
        bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, batch_size, fill=0)
        bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, batch_size, fill=0)
        outputs = []
        labels = labels.view(batch_size, height, width, A).permute(0,3,1,2).contiguous()
        # reshape the anchor labels to (batch_size, 1, A * height, width)
        labels = labels.view(batch_size, 1, A * height, width)
        outputs.append(labels)
        # reshape the anchor regression targets
        bbox_targets = bbox_targets.view(batch_size, height, width, A*4).permute(0,3,1,2).contiguous()
        outputs.append(bbox_targets)
        # reshape the inside_weight and outside_weight arrays in the same way
        anchors_count = bbox_inside_weights.size(1)
        bbox_inside_weights = bbox_inside_weights.view(batch_size,anchors_count,1).expand(batch_size, anchors_count, 4)
        bbox_inside_weights = bbox_inside_weights.contiguous().view(batch_size, height, width, 4*A)\
                            .permute(0,3,1,2).contiguous()
        outputs.append(bbox_inside_weights)
        bbox_outside_weights = bbox_outside_weights.view(batch_size,anchors_count,1).expand(batch_size, anchors_count, 4)
        bbox_outside_weights = bbox_outside_weights.contiguous().view(batch_size, height, width, 4*A)\
                            .permute(0,3,1,2).contiguous()
        outputs.append(bbox_outside_weights)
        ## open question: what are bbox_inside_weights
        ## and bbox_outside_weights for?
        return outputs
    def backward(self, top, propagate_down, bottom):
        """This layer does not propagate gradients."""
        pass
    def reshape(self, bottom, top):
        """Reshaping happens during the call to forward."""
        pass
def _unmap(data, count, inds, batch_size, fill=0):
    """ Unmap a subset of item (data) back to the original set of items (of
    size count) """
    if data.dim() == 2:
        ret = torch.Tensor(batch_size, count).fill_(fill).type_as(data)
        ret[:, inds] = data
    else:
        ret = torch.Tensor(batch_size, count, data.size(2)).fill_(fill).type_as(data)
        ret[:, inds,:] = data
    return ret
def _compute_targets_batch(ex_rois, gt_rois):
    """Compute bounding-box regression targets for an image."""
    return bbox_transform_batch(ex_rois, gt_rois[:, :, :4])
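The grid step in forward relies on broadcasting a (1, A, 4) array of base anchors against a (K, 1, 4) array of shifts. The NumPy sketch below reproduces just that step plus the inside-image filter on a tiny 2 x 3 feature map. `shift_anchors` is a hypothetical helper, and the two base anchors and the 32 x 48 image size are made-up values chosen to keep the example small.

```python
import numpy as np

def shift_anchors(base_anchors, feat_height, feat_width, feat_stride):
    """Place every base anchor at every feature-map cell, in image coordinates."""
    shift_x = np.arange(feat_width) * feat_stride
    shift_y = np.arange(feat_height) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # One (dx, dy, dx, dy) row per cell, in row-major order
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)
    A = base_anchors.shape[0]
    K = shifts.shape[0]
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4)
    all_anchors = base_anchors.reshape(1, A, 4) + shifts.reshape(K, 1, 4)
    return all_anchors.reshape(K * A, 4)

# Two made-up base anchors instead of the real nine
base = np.array([[-8, -8, 23, 23],
                 [0, 0, 15, 15]])
all_anchors = shift_anchors(base, feat_height=2, feat_width=3, feat_stride=16)

# Keep only anchors fully inside a 32 (high) x 48 (wide) image, border allowance 0
im_h, im_w = 32, 48
inside = ((all_anchors[:, 0] >= 0) & (all_anchors[:, 1] >= 0) &
          (all_anchors[:, 2] < im_w) & (all_anchors[:, 3] < im_h))
print(all_anchors.shape, int(inside.sum()))
```

Anchors of the same cell are stored next to each other, which is why the layer can later view the result as (height, width, A).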
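Summary of anchor_target_layer:
Inputs of this layer:
1. rpn_cls_score: 18 channels, a two-way classification (background or foreground) for each of the 9 anchors. The first nine channels are the background scores and the last nine the foreground scores, as predicted by the network. This layer only uses it to read off its dimensions, i.e. the size of the feature map.
2. gt_boxes: the ground-truth annotations
3. im_info: image information (height and width)
What this layer computes:
The code in forward generates 9 anchors for every cell and discards the anchors that extend beyond the image.
1. gt_max_overlaps: for each gt_box, the highest IoU over all anchors (an intermediate value, not returned)
2. max_overlaps: for each anchor, the highest IoU over all gt_boxes (an intermediate value, not returned)
3. labels: the label of each anchor. 1 is foreground (IoU >= 0.7, or the best-matching anchor of some gt_box), 0 is background (IoU < 0.3), and -1 marks anchors with IoU between 0.3 and 0.7, or dropped by subsampling, which are ignored.
4. bbox_targets: for each anchor, the offsets to the ground-truth box it overlaps most (used outside this layer as the regression target for the predicted offsets)
5. bbox_outside_weights: nonzero for anchors labeled foreground or background (1 / num_examples with the uniform weighting above), 0 for ignored anchors
6. bbox_inside_weights: RPN_BBOX_INSIDE_WEIGHTS[0] for foreground anchors, 0 otherwise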
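The labeling rules summarized above can be sketched for a single image in NumPy. `label_anchors` and `subsample` are hypothetical helpers; the thresholds 0.7 and 0.3 are the values quoted in the summary, and the sketch follows the branch where RPN_CLOBBER_POSITIVES is off.

```python
import numpy as np

def label_anchors(overlaps, pos_thresh=0.7, neg_thresh=0.3):
    """overlaps: (N anchors, K gt boxes) IoU matrix. Returns labels in {-1, 0, 1}."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.int64)
    max_overlaps = overlaps.max(axis=1)   # best IoU of each anchor
    gt_max = overlaps.max(axis=0)         # best IoU of each gt box
    labels[max_overlaps < neg_thresh] = 0
    # A gt box that overlaps nothing must not mark every anchor as positive
    gt_max = np.where(gt_max == 0, 1e-5, gt_max)
    labels[(overlaps == gt_max[None, :]).any(axis=1)] = 1
    labels[max_overlaps >= pos_thresh] = 1
    return labels

def subsample(labels, value, max_count, rng):
    """Set randomly chosen surplus entries equal to `value` to -1 (ignored)."""
    labels = labels.copy()
    idx = np.flatnonzero(labels == value)
    if idx.size > max_count:
        drop = rng.choice(idx, size=idx.size - max_count, replace=False)
        labels[drop] = -1
    return labels

overlaps = np.array([[0.80, 0.10],
                     [0.50, 0.20],
                     [0.10, 0.40],
                     [0.05, 0.00]])
labels = label_anchors(overlaps)
rng = np.random.default_rng(0)
labels_sub = subsample(labels, value=1, max_count=1, rng=rng)
print(labels, labels_sub)
```

The third anchor has an IoU of only 0.4, yet it is labeled foreground because it is the best match for the second gt box; without that rule the second gt box would have no positive anchor at all.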
Next:
4. proposal_layer.py
Here is its forward function first.
def forward(self, input):
    #
    # for each (H, W) location i
    #   generate A anchor boxes centered on cell i
    #   apply predicted bbox deltas at cell i to each of the A anchors
    # clip predicted boxes to image
    # remove predicted boxes with either height or width < threshold
    # sort all (proposal, score) pairs by score from highest to lowest
    # take top pre_nms_topN proposals before NMS
    # apply NMS with threshold 0.7 to remaining proposals
    # take after_nms_topN proposals after NMS
    # return the top proposals (-> RoIs top, scores top)
    # the first set of _num_anchors channels are bg probs
    # the second set are the fg probs
    # i.e. the first nine (num_anchors) channels are the background scores
    # and the last nine are the foreground scores, which are the ones sliced out here
    scores = input[0][:, self._num_anchors:, :, :]
    # the predicted coordinate offsets of every box
    bbox_deltas = input[1]
    im_info = input[2]
    cfg_key = input[3]
    pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
    post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
    nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
    min_size = cfg[cfg_key].RPN_MIN_SIZE
    batch_size = bbox_deltas.size(0)
    feat_height, feat_width = scores.size(2), scores.size(3)
    # same grid construction as in anchor_target_layer
    shift_x = np.arange(0, feat_width) * self._feat_stride
    shift_y = np.arange(0, feat_height) * self._feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shifts = torch.from_numpy(np.vstack((shift_x.ravel(), shift_y.ravel(),
                                         shift_x.ravel(), shift_y.ravel())).transpose())
    shifts = shifts.contiguous().type_as(scores).float()
    A = self._num_anchors  # 9
    K = shifts.size(0)
    self._anchors = self._anchors.type_as(scores)
    # anchors = self._anchors.view(1, A, 4) + shifts.view(1, K, 4).permute(1, 0, 2).contiguous()
    # all initial boxes
    anchors = self._anchors.view(1, A, 4) + shifts.view(K, 1, 4)
    # reshape and expand over the batch
    anchors = anchors.view(1, K * A, 4).expand(batch_size, K * A, 4)
    # Transpose and reshape predicted bbox transformations to get them
    # into the same order as the anchors:
    # move the channel axis last,
    bbox_deltas = bbox_deltas.permute(0, 2, 3, 1).contiguous()
    # then flatten to one row of 4 offsets per anchor
    bbox_deltas = bbox_deltas.view(batch_size, -1, 4)
    # Same story for the scores:
    scores = scores.permute(0, 2, 3, 1).contiguous()
    scores = scores.view(batch_size, -1)
    # Convert anchors into proposals via bbox transformations
    proposals = bbox_transform_inv(anchors, bbox_deltas, batch_size)
    # 2. clip predicted boxes to image
    proposals = clip_boxes(proposals, im_info, batch_size)
    # proposals = clip_boxes_batch(proposals, im_info, batch_size)
    # assign the score to 0 if it's non keep.
    # keep = self._filter_boxes(proposals, min_size * im_info[:, 2])
    # trim keep index to make it euqal over batch
    # keep_idx = torch.cat(tuple(keep_idx), 0)
    # scores_keep = scores.view(-1)[keep_idx].view(batch_size, trim_size)
    # proposals_keep = proposals.view(-1, 4)[keep_idx, :].contiguous().view(batch_size, trim_size, 4)
    # _, order = torch.sort(scores_keep, 1, True)
    scores_keep = scores
    proposals_keep = proposals
    _, order = torch.sort(scores_keep, 1, True)
    output = scores.new(batch_size, post_nms_topN, 5).zero_()
    for i in range(batch_size):
        # # 3. remove predicted boxes with either height or width < threshold
        # # (NOTE: convert min_size to input image scale stored in im_info[2])
        # (removing boxes that are too small; the filter itself is commented out above)
        proposals_single = proposals_keep[i]
        scores_single = scores_keep[i]
        # # 4. sort all (proposal, score) pairs by score from highest to lowest
        # # 5. take top pre_nms_topN (e.g. 6000)
        order_single = order[i]
        if pre_nms_topN > 0 and pre_nms_topN < scores_keep.numel():
            # keep the pre_nms_topN boxes with the highest foreground scores
            order_single = order_single[:pre_nms_topN]
        proposals_single = proposals_single[order_single, :]
        scores_single = scores_single[order_single].view(-1,1)
        # 6. apply nms (e.g. threshold = 0.7)
        # 7. take after_nms_topN (e.g. 300)
        # 8. return the top proposals (-> RoIs top)
        keep_idx_i = nms(torch.cat((proposals_single, scores_single), 1), nms_thresh, force_cpu=not cfg.USE_GPU_NMS)
        keep_idx_i = keep_idx_i.long().view(-1)
        # non-maximum suppression has removed the duplicate boxes
        if post_nms_topN > 0:
            keep_idx_i = keep_idx_i[:post_nms_topN]
        proposals_single = proposals_single[keep_idx_i, :]
        scores_single = scores_single[keep_idx_i, :]
        # padding 0 at the end.
        num_proposal = proposals_single.size(0)
        output[i,:,0] = i
        output[i,:num_proposal,1:] = proposals_single
    return output
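Summary of this layer:
Inputs:
1. scores: the background and foreground scores of every box
2. bbox_deltas: the predicted offsets of the boxes
3. im_info: image information
4. cfg_key: selects the configuration, i.e. whether we are training ('TRAIN') or testing ('TEST')
Process:
First apply the predicted offsets to the anchors and clip the results to the image.
Then sort by foreground score and apply non-maximum suppression.
This selects the final predicted boxes.
Output: at most post_nms_topN final boxes per image (a batch_size x post_nms_topN x 5 tensor, zero-padded at the end)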
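The process above can be sketched for a single image in NumPy: decode the offsets, clip to the image, keep the best-scoring boxes, then run greedy NMS. The functions `decode`, `clip`, `nms` and `propose` are hypothetical stand-ins written for this note; they assume the usual (dx, dy, dw, dh) parameterization and the inclusive "+1" pixel convention, and are not the repository's bbox_transform_inv, clip_boxes or nms.

```python
import numpy as np

def decode(anchors, deltas):
    """Apply (dx, dy, dw, dh) offsets to anchors given as (x1, y1, x2, y2)."""
    w = anchors[:, 2] - anchors[:, 0] + 1.0
    h = anchors[:, 3] - anchors[:, 1] + 1.0
    cx = anchors[:, 0] + 0.5 * w
    cy = anchors[:, 1] + 0.5 * h
    pcx = deltas[:, 0] * w + cx
    pcy = deltas[:, 1] * h + cy
    pw = np.exp(deltas[:, 2]) * w
    ph = np.exp(deltas[:, 3]) * h
    return np.stack([pcx - 0.5 * pw, pcy - 0.5 * ph,
                     pcx + 0.5 * pw, pcy + 0.5 * ph], axis=1)

def clip(boxes, im_h, im_w):
    """Clamp box corners to the image."""
    boxes = boxes.copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, im_w - 1)
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, im_h - 1)
    return boxes

def nms(boxes, scores, thresh):
    """Greedy NMS: repeatedly keep the best box and drop boxes overlapping it."""
    order = np.argsort(-scores)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        iw = np.minimum(boxes[i, 2], boxes[rest, 2]) - np.maximum(boxes[i, 0], boxes[rest, 0]) + 1
        ih = np.minimum(boxes[i, 3], boxes[rest, 3]) - np.maximum(boxes[i, 1], boxes[rest, 1]) + 1
        inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thresh]
    return keep

def propose(anchors, deltas, scores, im_h, im_w, pre_nms_top_n, post_nms_top_n, nms_thresh):
    boxes = clip(decode(anchors, deltas), im_h, im_w)
    order = np.argsort(-scores)[:pre_nms_top_n]          # best scores first
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, nms_thresh)[:post_nms_top_n]
    return boxes[keep], scores[keep]

anchors = np.array([[0, 0, 9, 9], [0, 0, 9, 10], [20, 20, 29, 29], [40, 40, 49, 49]], dtype=np.float64)
deltas = np.zeros((4, 4))
scores = np.array([0.9, 0.8, 0.7, 0.1])
rois, kept_scores = propose(anchors, deltas, scores, im_h=64, im_w=64,
                            pre_nms_top_n=3, post_nms_top_n=2, nms_thresh=0.7)
print(rois, kept_scores)
```

In this toy input the lowest-scoring box is cut by the pre-NMS limit, the second box is suppressed because it almost coincides with the first, and two proposals remain.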
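5. proposal_target_layer
The purpose of this layer is to assign each box selected by the previous layer the class of its matching gt_box.
5.1 First, the _sample_rois_pytorch function
Inputs:
1. all_rois: the selected boxes
2. gt_boxes: the ground-truth annotations
3. fg_rois_per_image: the maximum number of foreground boxes per image
4. rois_per_image: the nominal number of rois per image
Process:
For each roi, find the gt box with the highest IoU and look up its class; the roi is assigned that class.
Outputs:
1. labels_batch: for each roi, the class of its matched gt_box
2. rois_batch: the boxes themselves
3. bbox_targets: the offsets between the rois and their matched gt boxes
4. bbox_inside_weights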
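The class assignment described in 5.1 can be sketched as follows. `assign_roi_labels` is a hypothetical helper, and the thresholds (foreground at IoU >= 0.5, background in [0.0, 0.5)) are assumed values for illustration only; the text above does not state which thresholds the repository uses.

```python
import numpy as np

def assign_roi_labels(overlaps, gt_classes, fg_thresh=0.5, bg_lo=0.0, bg_hi=0.5):
    """overlaps: (R rois, K gt boxes) IoU matrix; gt_classes: (K,) class ids > 0."""
    max_overlaps = overlaps.max(axis=1)
    assignment = overlaps.argmax(axis=1)       # index of the best gt box per roi
    labels = gt_classes[assignment].copy()     # class of that gt box
    fg = max_overlaps >= fg_thresh
    bg = (max_overlaps < bg_hi) & (max_overlaps >= bg_lo)
    labels[~fg] = 0                            # everything that is not foreground is class 0
    return labels, fg, bg

overlaps = np.array([[0.6, 0.1],
                     [0.2, 0.3],
                     [0.1, 0.9]])
gt_classes = np.array([3, 7])
labels, fg, bg = assign_roi_labels(overlaps, gt_classes)
print(labels, fg, bg)
```

The second roi overlaps its best gt box by only 0.3, so it is treated as background and gets class 0 even though that gt box has class 7.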
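5.2 _compute_targets_pytorch
Inputs:
1. ex_rois: the predicted boxes
2. gt_rois: the ground-truth boxes
Output:
targets: the offsets between the rois and their matched gt boxes
5.3 _get_bbox_regression_labels_pytorch
Inputs:
bbox_target_data: the offsets
labels_batch: the class of each roi
Outputs:
bbox_targets: the same offsets rearranged into the layout the loss expects, grouped by class
bbox_inside_weights: set to 1 for the rois that have a label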
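A minimal sketch of the idea in 5.3, under the assumption that "has a label" means a class id greater than 0 and that the targets keep a 4-column layout per roi. `regression_labels` is a hypothetical name, and the real function may arrange the data differently.

```python
import numpy as np

def regression_labels(bbox_target_data, labels, inside_weight=1.0):
    """Keep offsets only for foreground rois and build the matching inside weights."""
    targets = np.zeros_like(bbox_target_data)
    weights = np.zeros_like(bbox_target_data)
    fg = labels > 0                      # assumed meaning of "has a label"
    targets[fg] = bbox_target_data[fg]
    weights[fg] = inside_weight
    return targets, weights

bbox_target_data = np.array([[0.1, 0.2, 0.0, 0.0],
                             [0.5, 0.5, 0.3, 0.3],
                             [-0.2, 0.0, 0.1, -0.1]])
labels = np.array([3, 0, 7])
bbox_targets, bbox_inside_weights = regression_labels(bbox_target_data, labels)
print(bbox_targets)
print(bbox_inside_weights)
```

Background rois end up with zero targets and zero weights, so they contribute nothing to the box regression loss.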
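5.4 The layer's forward:
Inputs: the boxes selected by the previous layer and all the ground-truth annotations.
Outputs: rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights, all of which were explained for the functions above.
Together, the two proposal layers filter the rois, apply the predicted offsets to correct them, compute the offsets between the rois and their gt boxes, and look up the label of each roi.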
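Now for the outermost layer.
6. rpn
forward:
Inputs:
1. base_feat: the feature map
2. im_info: image information
3. gt_boxes: the ground-truth annotations
4. num_boxes
Outputs:
1. rois, obtained from proposal_layer. proposal_layer is given rpn_cls_prob, the scores for each anchor being background or foreground, and rpn_bbox_pred, the network-predicted offsets between anchors and boxes. It returns the corrected rois.
2. rpn_loss_cls: the cross-entropy between rpn_cls_score and rpn_label.
rpn_cls_score: the background/foreground scores predicted by the RPN.
rpn_label: whether each anchor is foreground, background or ignored (1, 0, -1), taken from the output of anchor_target.
3. rpn_loss_box (bounding-box loss): also a loss. Computing it needs rpn_bbox_pred, the offsets predicted by the RPN, and rpn_bbox_targets, the true offsets.
4. rpn_bbox_inside_weights
5. rpn_bbox_outside_weights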
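To make the two losses concrete, here is a NumPy sketch of one common way to compute them: cross-entropy over the anchors whose label is not -1, and a smooth L1 loss in which the inside weights mask the difference and the outside weights scale each term. `rpn_cls_loss` and `smooth_l1` are hypothetical names, and the default `sigma=3.0` and the plain sum as reduction are assumptions rather than values taken from the text above.

```python
import numpy as np

def rpn_cls_loss(scores, labels):
    """scores: (N, 2) background/foreground scores; labels: (N,) in {-1, 0, 1}."""
    keep = labels != -1                            # ignored anchors do not contribute
    s, y = scores[keep], labels[keep]
    s = s - s.max(axis=1, keepdims=True)           # numerically stable log-softmax
    log_prob = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(y.size), y].mean()

def smooth_l1(pred, target, inside_w, outside_w, sigma=3.0):
    """Weighted smooth L1: quadratic near zero, linear for large differences."""
    sigma2 = sigma ** 2
    diff = np.abs(inside_w * (pred - target))
    per_elem = np.where(diff < 1.0 / sigma2,
                        0.5 * sigma2 * diff ** 2,
                        diff - 0.5 / sigma2)
    return (outside_w * per_elem).sum()

scores = np.zeros((3, 2))
labels = np.array([1, -1, 0])
cls_loss = rpn_cls_loss(scores, labels)

pred = np.array([[0.5, 2.0, 0.0, 0.0]])
target = np.zeros((1, 4))
ones = np.ones((1, 4))
box_loss = smooth_l1(pred, target, inside_w=ones, outside_w=ones, sigma=1.0)
masked_loss = smooth_l1(pred, target, inside_w=np.zeros((1, 4)), outside_w=ones, sigma=1.0)
print(cls_loss, box_loss, masked_loss)
```

Under this reading, the inside weights decide which anchors take part in the box loss at all (only foreground ones), while the outside weights normalize each term by the number of sampled anchors.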
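That is it for rpn. How exactly the network predicts the offsets is something I do not know yet.
This was a rough pass over the flow of the RPN. One lesson from reading this code: at first I got stuck trying to interpret every single line, which I now think is unnecessary while I am still unfamiliar with PyTorch. What matters at this stage is understanding the inputs and outputs of each function and how the outputs are derived from the inputs. I will read the rest of the code with the same approach.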