1. Paper Information
Title: DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection
Authors: Liang Peng, Xiaopei Wu, Zheng Yang, Haifeng Liu, Deng Cai
Venue: 2022 European Conference on Computer Vision (ECCV)
Paper link: https://arxiv.org/abs/2207.08531
Code link: https://github.com/SPengLiang/DID-M3D
2. Introduction
Because depth information is lost during the camera projection process, instance depth estimation is the bottleneck for improving monocular 3D detection performance.
3. Main Contributions
1. Point out that instance depth is a coupled quantity, and propose decoupling it into attribute depth and visual depth;
2. Propose two uncertainties to represent the confidence of the depth estimate: one for the attribute depth (att) and one for the visual depth (vis);
3. Alleviate the limitation of using affine transformations in data augmentation (see the toy check below);
4. Set a new state of the art (SOTA).
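Contribution 3 builds on the observation that only the visual (surface) depth changes under a scale-type affine augmentation, while the attribute depth stays fixed, so only the visual part of the label needs to be rescaled. A toy pinhole-model check of that scaling relation (all numbers are made up for illustration):

# Pinhole relation: depth = focal * real_height / pixel_height.
focal, real_height = 700.0, 1.5          # illustrative focal length (px) and object height (m)
pixel_height = 30.0                      # object height in the original image (px)
depth = focal * real_height / pixel_height            # 35.0 m

s = 1.25                                 # affine augmentation scales the image by s
depth_aug = focal * real_height / (s * pixel_height)  # object looks bigger -> appears closer
assert abs(depth_aug - depth / s) < 1e-9              # visual depth scales by 1/s; attribute depth does not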
4. Network Architecture
The overall design follows GUPNet; the parts highlighted in red in the paper's figure are the newly proposed components, used to decouple instance depth.
5. Model Labels (Targets):
vis_depth = roi_depth  # vis_depth is the (surface) depth taken from the dense depth map depth_dense after the roi_align transform
att_depth = depth - vis_depth
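A rough sketch of how these targets could be assembled (assuming a dense depth map depth_dense, RoI boxes boxes2d, and per-object ground-truth instance depths ins_depth_gt; the helper and its names are illustrative, not the repository's exact code):

import torch
from torchvision.ops import roi_align

# depth_dense:  (B, 1, H, W) dense depth map projected from LiDAR
# boxes2d:      (K, 5) RoIs as [batch_idx, x1, y1, x2, y2] in feature-map coordinates
# ins_depth_gt: (K,) ground-truth instance (3D center) depth per object
def build_depth_targets(depth_dense, boxes2d, ins_depth_gt, roi_size=7):
    # visual depth: surface depth sampled inside each RoI via roi_align
    vis_depth = roi_align(depth_dense, boxes2d, [roi_size, roi_size]).squeeze(1)  # (K, 7, 7)
    # attribute depth: residual between the instance depth and the visual (surface) depth
    att_depth = ins_depth_gt.view(-1, 1, 1) - vis_depth                           # (K, 7, 7)
    return vis_depth, att_depth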
6. Model Loss:
DID_loss consists of three parts:
DID_loss = seg_loss + bbox2d_loss + bbox3d_loss
where:
seg_loss     = focal_loss_cornernet
bbox2d_loss  = offset2d_loss + size2d_loss
bbox3d_loss  = depth_loss + offset3d_loss + size3d_loss + heading_loss + test_loss
depth_loss   = vis_depth_loss + att_depth_loss + ins_depth_loss
heading_loss = cls_loss + reg_loss
The concrete loss types are:
seg_loss: focal_loss_cornernet
vis_depth_loss: laplacian_aleatoric_uncertainty_loss
att_depth_loss: laplacian_aleatoric_uncertainty_loss
ins_depth_loss: laplacian_aleatoric_uncertainty_loss
offset3d_loss: l1loss
size3d_loss: l1loss
cls_loss: cross-entropy loss
reg_loss: l1loss
test_loss: custom loss
1) laplacian_aleatoric_uncertainty_loss: Laplacian aleatoric uncertainty loss
loss = 1.4142 * torch.exp(-0.5*log_variance) * torch.abs(input - target) + 0.5*log_variance
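For completeness, a self-contained version of this loss that directly wraps the formula above (the reduction at the end is an assumption):

import torch

def laplacian_aleatoric_uncertainty_loss(input, target, log_variance, reduction='mean'):
    # Laplacian negative log-likelihood with a learned log-scale: a larger predicted
    # uncertainty down-weights the L1 error but is penalized by the 0.5*log_variance term.
    loss = 1.4142 * torch.exp(-0.5 * log_variance) * torch.abs(input - target) + 0.5 * log_variance
    return loss.mean() if reduction == 'mean' else loss.sum()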
7. Training Process:
1) Pre-training
1. ei_loss = self.compute_e0_loss()  # first compute the losses over one warm-up epoch; 11 terms in total ({dict:11}), as shown in the figure
2) Initialization of the hierarchical task learning object
2. loss_weightor = Hierarchical_Task_Learning(ei_loss)  # initialize a hierarchical task learning object from the pre-computed ei_loss; it has the attribute self.loss_graph (see figure)
3) Training loop
3. for epoch in range(start_epoch, self.cfg_train['max_epoch']):  # loop from start_epoch up to max_epoch
i. loss_weights = loss_weightor.compute_weight(ei_loss, self.epoch)
Based on the current ei_loss and self.epoch, the compute_weight method of the Hierarchical_Task_Learning object computes a weight for each loss term. It first builds eval_loss_input from the current losses:
eval_loss_input = torch.cat([_.unsqueeze(0) for _ in current_loss.values()]).unsqueeze(0)
Then, every loss term whose predecessor list in self.loss_graph is empty gets weight 1; otherwise the weight is set to 0. When the length of self.past_losses equals stat_epoch_nums (5), the weight list is updated:
if len(self.past_losses) == self.stat_epoch_nums:
ii. ei_loss = self.train_one_epoch(loss_weights, epoch)
Using the loss_weights computed in the previous step, train_one_epoch() produces a new ei_loss.
train_one_epoch():
    # iterate over the training batches
    for batch_idx, (inputs, calibs, coord_ranges, targets, info) in enumerate(self.train_loader):
        # initialize the DIDLoss object
        criterion = DIDLoss(self.epoch)
        # feed the batch into the DID model to get outputs (the DID model structure is detailed later)
        outputs = self.model(inputs, coord_ranges, calibs, targets)
        # compute the individual loss terms from outputs and targets via DIDLoss.forward()
        total_loss, loss_terms = criterion(outputs, targets)
        # total_loss: the sum of each loss term multiplied by its weight
        if loss_weights is not None:
            total_loss = torch.zeros(1).cuda()
            for key in loss_weights.keys():
                total_loss += loss_weights[key].detach() * loss_terms[key]
        # backpropagate total_loss and update the parameters
        total_loss.backward()
        self.optimizer.step()
        trained_batch = batch_idx + 1
iii. self.epoch += 1
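The intent behind Hierarchical_Task_Learning (inherited from GUPNet) is that a loss term only receives full weight once the terms it depends on have stabilized. A heavily simplified sketch of that idea, with the actual trend-based update omitted (class and variable names here are illustrative, not the repository's):

import torch

class HierarchicalWeightsSketch:
    """Simplified sketch of GUPNet-style hierarchical task weighting."""
    def __init__(self, loss_graph, stat_epoch_nums=5):
        # loss_graph maps each loss term to its prerequisite terms (empty list = no prerequisites)
        self.loss_graph = loss_graph
        self.stat_epoch_nums = stat_epoch_nums
        self.past_losses = []

    def compute_weight(self, current_loss, epoch):
        weights = {}
        for term, prereqs in self.loss_graph.items():
            # terms without prerequisites are trained at full weight from the start
            weights[term] = torch.tensor(1.0 if len(prereqs) == 0 else 0.0)
        if len(self.past_losses) == self.stat_epoch_nums:
            # once enough epochs are recorded, dependent terms would be ramped up according
            # to how much their prerequisite losses have stabilized (update omitted here)
            self.past_losses.pop(0)
        self.past_losses.append(current_loss)
        return weights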
8. Model Details
1) DID model:
① The input first goes through the backbone and then feat_up() to obtain the feature map feat (one batch at a time);
② feat is fed into the three head branches heatmap, offset_2d and size_2d, and the results are stored in the variable ret;
def forward(self, input, coord_ranges, calibs, targets=None, K=50, mode='train'):
    device_id = input.device
    feat = self.backbone(input)
    feat = self.feat_up(feat[self.first_level:])
    ret = {}
    '''
    ret = {}
    for head in self.heads:
        ret[head] = self.__getattr__(head)(feat)
    '''
    ret['heatmap'] = self.heatmap(feat)
    ret['offset_2d'] = self.offset_2d(feat)
    ret['size_2d'] = self.size_2d(feat)
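Each of these heads is typically a small convolutional sub-network on top of the shared feature map. A plausible minimal form of such a head (channel sizes and the helper name are assumptions, not the repository's exact definition):

import torch.nn as nn

def make_head(in_channels, out_channels, head_conv=256):
    # a common CenterNet-style head: 3x3 conv + ReLU + 1x1 conv to the target channel count
    return nn.Sequential(
        nn.Conv2d(in_channels, head_conv, kernel_size=3, padding=1, bias=True),
        nn.ReLU(inplace=True),
        nn.Conv2d(head_conv, out_channels, kernel_size=1, bias=True))

# e.g. heatmap has one channel per class; offset_2d and size_2d have two channels each
# heatmap   = make_head(64, num_classes)
# offset_2d = make_head(64, 2)
# size_2d   = make_head(64, 2)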
③ Obtain the RoI features: feed feat into the get_roi_feat() method
ret.update(self.get_roi_feat(feat,inds,masks,ret,calibs,coord_ranges,cls_ids))
where:
coord_map: a tensor of shape (2, 2, 96, 320) holding the per-pixel coordinate map, e.g.:
tensor([[[ 0., 1., 2., ..., 317., 318., 319.],
[ 0., 1., 2., ..., 317., 318., 319.],
[ 0., 1., 2., ..., 317., 318., 319.],
...,
[ 0., 1., 2., ..., 317., 318., 319.],
[ 0., 1., 2., ..., 317., 318., 319.],
[ 0., 1., 2., ..., 317., 318., 319.]],
[[ 0., 0., 0., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 2., 2., 2., ..., 2., 2., 2.],
...,
[ 93., 93., 93., ..., 93., 93., 93.],
[ 94., 94., 94., ..., 94., 94., 94.],
[ 95., 95., 95., ..., 95., 95., 95.]]], device='cuda:0')
ret['offset_2d'] is the per-pixel offset predicted by the network, so:
box2d_centre = coord_map + ret['offset_2d']  # 2D box centres after adding the offsets, tensor of shape (2, 2, 96, 320)
box2d_maps = torch.cat([box2d_centre-ret['size_2d']/2,box2d_centre+ret['size_2d']/2],1)
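A minimal sketch of how the coordinate map and the per-pixel 2D box map above can be built (shapes follow the example; the exact construction in the repository may differ):

import torch

def build_box2d_maps(offset_2d, size_2d):
    # offset_2d, size_2d: (B, 2, H, W) network outputs
    B, _, H, W = offset_2d.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing='ij')
    # coord_map[:, 0] holds x coordinates, coord_map[:, 1] holds y coordinates
    coord_map = torch.stack([xs, ys], dim=0).unsqueeze(0).repeat(B, 1, 1, 1).to(offset_2d.device)
    box2d_centre = coord_map + offset_2d
    # per-pixel [x1, y1, x2, y2] boxes, shape (B, 4, H, W)
    box2d_maps = torch.cat([box2d_centre - size_2d / 2, box2d_centre + size_2d / 2], dim=1)
    return box2d_maps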
④ get_roi_feat_by_mask():
def get_roi_feat_by_mask(self, feat, box2d_maps, inds, mask, calibs, coord_ranges, cls_ids):
    BATCH_SIZE, _, HEIGHT, WIDE = feat.size()
    device_id = feat.device
    # number of valid objects in the batch
    num_masked_bin = mask.sum()
    res = {}
    if num_masked_bin != 0:
        ...
        # get the RoI features first
        roi_feature_masked = roi_align(feat, scale_box2d_masked, [7, 7])
        ...
        roi_feature_masked = torch.cat([roi_feature_masked, coord_maps,
                                        cls_hots.unsqueeze(-1).unsqueeze(-1).repeat([1, 1, 7, 7])], 1)
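The roi_align used here is torchvision's RoI Align operator; a standalone usage example with illustrative tensor sizes:

import torch
from torchvision.ops import roi_align

feat = torch.randn(2, 64, 96, 320)                       # (B, C, H, W) feature map
# boxes as (K, 5): [batch_index, x1, y1, x2, y2] in feature-map coordinates
boxes = torch.tensor([[0.,  10.,  20.,  60.,  50.],
                      [1., 100.,  30., 180.,  90.]])
rois = roi_align(feat, boxes, output_size=[7, 7])        # (K, C, 7, 7)
print(rois.shape)                                        # torch.Size([2, 64, 7, 7])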
After obtaining the RoI features, the following quantities are computed:
① vis_depth: feed roi_feature_masked into the self.vis_depth branch; negate the output and take its exponential, then multiply by scale_depth;
vis_depth = self.vis_depth(roi_feature_masked).squeeze(1)
vis_depth = (-vis_depth).exp()
vis_depth = vis_depth * scale_depth.unsqueeze(-1).unsqueeze(-1)
② att_depth: feed roi_feature_masked into the self.att_depth branch;
att_depth = self.att_depth(roi_feature_masked).squeeze(1)
③ ins_depth: the sum of the visual depth and the attribute depth, vis_depth + att_depth;
ins_depth = vis_depth + att_depth
④ vis_depth_uncer: feed roi_feature_masked into the self.vis_depth_uncer branch;
vis_depth_uncer = self.vis_depth_uncer(roi_feature_masked)[:, 0, :, :]
⑤ att_depth_uncer: feed roi_feature_masked into the self.att_depth_uncer branch;
att_depth_uncer = self.att_depth_uncer(roi_feature_masked)[:, 0, :, :]
⑥ ins_depth_uncer:
i.e. ins_depth_uncer = log(exp(vis_depth_uncer) + exp(att_depth_uncer))
ins_depth_uncer = torch.logsumexp(torch.stack([vis_depth_uncer, att_depth_uncer], -1), -1)
⑦ merge_prob: exp(-exp(0.5 * ins_depth_uncer))
merge_prob = (-(0.5 * ins_depth_uncer).exp()).exp()
⑧ merge_depth:
merge_depth is the weighted average of ins_depth, where the weight of element i is merge_prob[i] / Σ(merge_prob)
merge_depth = (torch.sum((ins_depth * merge_prob).view(K, -1), dim=-1)
               / torch.sum(merge_prob.view(K, -1), dim=-1))
merge_depth = merge_depth.unsqueeze(1)
⑨ merge_conf:
merge_conf = (torch.sum(merge_prob.view(K, -1) ** 2, dim=-1) / torch.sum(merge_prob.view(K, -1), dim=-1)).unsqueeze(1)
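Putting steps ①-⑨ together, the per-RoI depth fusion can be condensed into one standalone function (a restatement of the snippets above with the head outputs passed in as tensors, not the repository's exact code):

import torch

def merge_instance_depth(vis_depth, att_depth, vis_depth_uncer, att_depth_uncer):
    # vis_depth, att_depth:             (K, 7, 7) per-pixel visual / attribute depth inside each RoI
    # vis_depth_uncer, att_depth_uncer: (K, 7, 7) their predicted log-uncertainties
    K = vis_depth.shape[0]
    ins_depth = vis_depth + att_depth
    # combined log-uncertainty: log(exp(u_vis) + exp(u_att))
    ins_depth_uncer = torch.logsumexp(torch.stack([vis_depth_uncer, att_depth_uncer], -1), -1)
    # map uncertainty to a probability in (0, 1): lower uncertainty -> higher probability
    merge_prob = (-(0.5 * ins_depth_uncer).exp()).exp()
    # probability-weighted average of the per-pixel instance depths
    merge_depth = (torch.sum((ins_depth * merge_prob).view(K, -1), dim=-1)
                   / torch.sum(merge_prob.view(K, -1), dim=-1)).unsqueeze(1)
    # confidence of the merged depth
    merge_conf = (torch.sum(merge_prob.view(K, -1) ** 2, dim=-1)
                  / torch.sum(merge_prob.view(K, -1), dim=-1)).unsqueeze(1)
    return merge_depth, merge_conf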