1. Paper Information
Title: DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection
Authors: Liang Peng, Xiaopei Wu, Zheng Yang, Haifeng Liu, Deng Cai
Venue: 2022 European Conference on Computer Vision (ECCV)
Paper link: https://arxiv.org/abs/2207.08531
Code link: https://github.com/SPengLiang/DID-M3D
2. Introduction
Because depth information is lost during the camera projection process, instance depth estimation is the bottleneck for improving monocular 3D detection performance.
3. Main Contributions
1. Point out that instance depth is a coupled quantity, and propose decoupling it into attribute depth and visual depth;
2. Propose two uncertainties to represent the confidence of the depth estimate: one for the attribute depth (att) and one for the visual depth (vis);
3. Alleviate the limitation of using affine transformations in data augmentation (see the toy check below);
4. Set a new state of the art (SOTA).
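Contribution 3 builds on the observation that only the visual (surface) depth changes under a scale-type affine augmentation, while the attribute depth stays fixed, so only the visual part of the label needs to be rescaled. A toy pinhole-model check of that scaling relation (all numbers are made up for illustration):

# Pinhole relation: depth = focal * real_height / pixel_height.
focal, real_height = 700.0, 1.5          # illustrative focal length (px) and object height (m)
pixel_height = 30.0                      # object height in the original image (px)
depth = focal * real_height / pixel_height            # 35.0 m

s = 1.25                                 # affine augmentation scales the image by s
depth_aug = focal * real_height / (s * pixel_height)  # object looks bigger -> appears closer
assert abs(depth_aug - depth / s) < 1e-9              # visual depth scales by 1/s; attribute depth does not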
4. Network Architecture
The overall design follows GUPNet; the parts highlighted in red in the paper's figure are the newly proposed components, used to decouple instance depth.
5. Model Labels (Targets):
vis_depth = roi_depth  # vis_depth is the (surface) depth taken from the dense depth map depth_dense after the roi_align transform
att_depth = depth - vis_depth
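A rough sketch of how these targets could be assembled (assuming a dense depth map depth_dense, RoI boxes boxes2d, and per-object ground-truth instance depths ins_depth_gt; the helper and its names are illustrative, not the repository's exact code):

import torch
from torchvision.ops import roi_align

# depth_dense:  (B, 1, H, W) dense depth map projected from LiDAR
# boxes2d:      (K, 5) RoIs as [batch_idx, x1, y1, x2, y2] in feature-map coordinates
# ins_depth_gt: (K,) ground-truth instance (3D center) depth per object
def build_depth_targets(depth_dense, boxes2d, ins_depth_gt, roi_size=7):
    # visual depth: surface depth sampled inside each RoI via roi_align
    vis_depth = roi_align(depth_dense, boxes2d, [roi_size, roi_size]).squeeze(1)  # (K, 7, 7)
    # attribute depth: residual between the instance depth and the visual (surface) depth
    att_depth = ins_depth_gt.view(-1, 1, 1) - vis_depth                           # (K, 7, 7)
    return vis_depth, att_depth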
6. Model Loss:
DID_loss consists of three parts:
DID_loss = seg_loss + bbox2d_loss + bbox3d_loss
where:
seg_loss     = focal_loss_cornernet
bbox2d_loss  = offset2d_loss + size2d_loss
bbox3d_loss  = depth_loss + offset3d_loss + size3d_loss + heading_loss + test_loss
depth_loss   = vis_depth_loss + att_depth_loss + ins_depth_loss
heading_loss = cls_loss + reg_loss
The concrete loss types are:
seg_loss: focal_loss_cornernet
vis_depth_loss: laplacian_aleatoric_uncertainty_loss
att_depth_loss: laplacian_aleatoric_uncertainty_loss
ins_depth_loss: laplacian_aleatoric_uncertainty_loss
offset3d_loss: l1loss
size3d_loss: l1loss
cls_loss: cross-entropy loss
reg_loss: l1loss
test_loss: custom loss
1) laplacian_aleatoric_uncertainty_loss: Laplacian aleatoric uncertainty loss
loss = 1.4142 * torch.exp(-0.5*log_variance) * torch.abs(input - target) + 0.5*log_variance
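For completeness, a self-contained version of this loss that directly wraps the formula above (the reduction at the end is an assumption):

import torch

def laplacian_aleatoric_uncertainty_loss(input, target, log_variance, reduction='mean'):
    # Laplacian negative log-likelihood with a learned log-scale: a larger predicted
    # uncertainty down-weights the L1 error but is penalized by the 0.5*log_variance term.
    loss = 1.4142 * torch.exp(-0.5 * log_variance) * torch.abs(input - target) + 0.5 * log_variance
    return loss.mean() if reduction == 'mean' else loss.sum()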
7. Training Process:
1) Pre-training
1. ei_loss = self.compute_e0_loss()  # first compute the losses over one warm-up epoch; 11 terms in total ({dict:11}), as shown in the figure
2) Initialization of the hierarchical task learning object
2. loss_weightor = Hierarchical_Task_Learning(ei_loss)  # initialize a hierarchical task learning object from the pre-computed ei_loss; it has the attribute self.loss_graph (see figure)
3) Training loop
3. for epoch in range(start_epoch, self.cfg_train['max_epoch']):  # loop from start_epoch up to max_epoch
i. loss_weights = loss_weightor.compute_weight(ei_loss, self.epoch)
Based on the current ei_loss and self.epoch, the compute_weight method of the Hierarchical_Task_Learning object computes a weight for each loss term. It first builds eval_loss_input from the current losses:
eval_loss_input = torch.cat([_.unsqueeze(0) for _ in current_loss.values()]).unsqueeze(0)
Then, every loss term whose predecessor list in self.loss_graph is empty gets weight 1; otherwise the weight is set to 0. When the length of self.past_losses equals stat_epoch_nums (5), the weight list is updated:
if len(self.past_losses) == self.stat_epoch_nums:
ii. ei_loss = self.train_one_epoch(loss_weights, epoch)
Using the loss_weights computed in the previous step, train_one_epoch() produces a new ei_loss.
train_one_epoch():
    # iterate over the training batches
    for batch_idx, (inputs, calibs, coord_ranges, targets, info) in enumerate(self.train_loader):
        # initialize the DIDLoss object
        criterion = DIDLoss(self.epoch)
        # feed the batch into the DID model to get outputs (the DID model structure is detailed later)
        outputs = self.model(inputs, coord_ranges, calibs, targets)
        # compute the individual loss terms from outputs and targets via DIDLoss.forward()
        total_loss, loss_terms = criterion(outputs, targets)
        # total_loss: the sum of each loss term multiplied by its weight
        if loss_weights is not None:
            total_loss = torch.zeros(1).cuda()
            for key in loss_weights.keys():
                total_loss += loss_weights[key].detach() * loss_terms[key]
        # backpropagate total_loss and update the parameters
        total_loss.backward()
        self.optimizer.step()
        trained_batch = batch_idx + 1
iii. self.epoch += 1
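The intent behind Hierarchical_Task_Learning (inherited from GUPNet) is that a loss term only receives full weight once the terms it depends on have stabilized. A heavily simplified sketch of that idea, with the actual trend-based update omitted (class and variable names here are illustrative, not the repository's):

import torch

class HierarchicalWeightsSketch:
    """Simplified sketch of GUPNet-style hierarchical task weighting."""
    def __init__(self, loss_graph, stat_epoch_nums=5):
        # loss_graph maps each loss term to its prerequisite terms (empty list = no prerequisites)
        self.loss_graph = loss_graph
        self.stat_epoch_nums = stat_epoch_nums
        self.past_losses = []

    def compute_weight(self, current_loss, epoch):
        weights = {}
        for term, prereqs in self.loss_graph.items():
            # terms without prerequisites are trained at full weight from the start
            weights[term] = torch.tensor(1.0 if len(prereqs) == 0 else 0.0)
        if len(self.past_losses) == self.stat_epoch_nums:
            # once enough epochs are recorded, dependent terms would be ramped up according
            # to how much their prerequisite losses have stabilized (update omitted here)
            self.past_losses.pop(0)
        self.past_losses.append(current_loss)
        return weights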
8. Model Details
1) DID model:
① The input first goes through the backbone and then feat_up() to obtain the feature map feat (one batch at a time);
② feat is fed into the three head branches heatmap, offset_2d and size_2d, and the results are stored in the variable ret;
def forward(self, input, coord_ranges, calibs, targets=None, K=50, mode='train'):
    device_id = input.device
    feat = self.backbone(input)
    feat = self.feat_up(feat[self.first_level:])
    ret = {}
    '''
    ret = {}
    for head in self.heads:
        ret[head] = self.__getattr__(head)(feat)
    '''
    ret['heatmap'] = self.heatmap(feat)
    ret['offset_2d'] = self.offset_2d(feat)
    ret['size_2d'] = self.size_2d(feat)
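Each of these heads is typically a small convolutional sub-network on top of the shared feature map. A plausible minimal form of such a head (channel sizes and the helper name are assumptions, not the repository's exact definition):

import torch.nn as nn

def make_head(in_channels, out_channels, head_conv=256):
    # a common CenterNet-style head: 3x3 conv + ReLU + 1x1 conv to the target channel count
    return nn.Sequential(
        nn.Conv2d(in_channels, head_conv, kernel_size=3, padding=1, bias=True),
        nn.ReLU(inplace=True),
        nn.Conv2d(head_conv, out_channels, kernel_size=1, bias=True))

# e.g. heatmap has one channel per class; offset_2d and size_2d have two channels each
# heatmap   = make_head(64, num_classes)
# offset_2d = make_head(64, 2)
# size_2d   = make_head(64, 2)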
③ Obtain the RoI features: feed feat into the get_roi_feat() method
ret.update(self.get_roi_feat(feat,inds,masks,ret,calibs,coord_ranges,cls_ids))
where:
coord_map: a tensor of shape (2, 2, 96, 320) holding the per-pixel coordinate map, e.g.:
tensor([[[ 0., 1., 2., ..., 317., 318., 319.],
[ 0., 1., 2., ..., 317., 318., 319.],
[ 0., 1., 2., ..., 317., 318., 319.],
...,
[ 0., 1., 2., ..., 317., 318., 319.],
[ 0., 1., 2., ..., 317., 318., 319.],
[ 0., 1., 2., ..., 317., 318., 319.]],
[[ 0., 0., 0., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 2., 2., 2., ..., 2., 2., 2.],
...,
[ 93., 93., 93., ..., 93., 93., 93.],
[ 94., 94., 94., ..., 94., 94., 94.],
[ 95., 95., 95., ..., 95., 95., 95.]]], device='cuda:0')
ret['offset_2d'] is the per-pixel offset predicted by the network, so:
box2d_centre = coord_map + ret['offset_2d']  # 2D box centres after adding the offsets, tensor of shape (2, 2, 96, 320)
box2d_maps = torch.cat([box2d_centre-ret['size_2d']/2,box2d_centre+ret['size_2d']/2],1)
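A minimal sketch of how the coordinate map and the per-pixel 2D box map above can be built (shapes follow the example; the exact construction in the repository may differ):

import torch

def build_box2d_maps(offset_2d, size_2d):
    # offset_2d, size_2d: (B, 2, H, W) network outputs
    B, _, H, W = offset_2d.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing='ij')
    # coord_map[:, 0] holds x coordinates, coord_map[:, 1] holds y coordinates
    coord_map = torch.stack([xs, ys], dim=0).unsqueeze(0).repeat(B, 1, 1, 1).to(offset_2d.device)
    box2d_centre = coord_map + offset_2d
    # per-pixel [x1, y1, x2, y2] boxes, shape (B, 4, H, W)
    box2d_maps = torch.cat([box2d_centre - size_2d / 2, box2d_centre + size_2d / 2], dim=1)
    return box2d_maps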
④ get_roi_feat_by_mask():
def get_roi_feat_by_mask(self, feat, box2d_maps, inds, mask, calibs, coord_ranges, cls_ids):
    BATCH_SIZE, _, HEIGHT, WIDE = feat.size()
    device_id = feat.device
    # number of valid objects in the batch
    num_masked_bin = mask.sum()
    res = {}
    if num_masked_bin != 0:
        ...
        # get the RoI features first
        roi_feature_masked = roi_align(feat, scale_box2d_masked, [7, 7])
        ...
        roi_feature_masked = torch.cat([roi_feature_masked, coord_maps,
                                        cls_hots.unsqueeze(-1).unsqueeze(-1).repeat([1, 1, 7, 7])], 1)
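The roi_align used here is torchvision's RoI Align operator; a standalone usage example with illustrative tensor sizes:

import torch
from torchvision.ops import roi_align

feat = torch.randn(2, 64, 96, 320)                       # (B, C, H, W) feature map
# boxes as (K, 5): [batch_index, x1, y1, x2, y2] in feature-map coordinates
boxes = torch.tensor([[0.,  10.,  20.,  60.,  50.],
                      [1., 100.,  30., 180.,  90.]])
rois = roi_align(feat, boxes, output_size=[7, 7])        # (K, C, 7, 7)
print(rois.shape)                                        # torch.Size([2, 64, 7, 7])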
After obtaining the RoI features, the following quantities are computed:
① vis_depth: feed roi_feature_masked into the self.vis_depth branch; negate the output and take its exponential, then multiply by scale_depth;
vis_depth = self.vis_depth(roi_feature_masked).squeeze(1)
vis_depth = (-vis_depth).exp()
vis_depth = vis_depth * scale_depth.unsqueeze(-1).unsqueeze(-1)
② att_depth: feed roi_feature_masked into the self.att_depth branch;
att_depth = self.att_depth(roi_feature_masked).squeeze(1)
③ ins_depth: the sum of the visual depth and the attribute depth, vis_depth + att_depth;
ins_depth = vis_depth + att_depth
④ vis_depth_uncer: feed roi_feature_masked into the self.vis_depth_uncer branch;
vis_depth_uncer = self.vis_depth_uncer(roi_feature_masked)[:, 0, :, :]
⑤ att_depth_uncer: feed roi_feature_masked into the self.att_depth_uncer branch;
att_depth_uncer = self.att_depth_uncer(roi_feature_masked)[:, 0, :, :]
⑥ ins_depth_uncer:
i.e. ins_depth_uncer = log(exp(vis_depth_uncer) + exp(att_depth_uncer))
ins_depth_uncer = torch.logsumexp(torch.stack([vis_depth_uncer, att_depth_uncer], -1), -1)
⑦ merge_prob: exp(-exp(0.5 * ins_depth_uncer))
merge_prob = (-(0.5 * ins_depth_uncer).exp()).exp()
⑧ merge_depth:
merge_depth is the weighted average of ins_depth, where the weight of element i is merge_prob[i] / Σ(merge_prob)
merge_depth = (torch.sum((ins_depth * merge_prob).view(K, -1), dim=-1)
               / torch.sum(merge_prob.view(K, -1), dim=-1))
merge_depth = merge_depth.unsqueeze(1)
⑨ merge_conf:
merge_conf = (torch.sum(merge_prob.view(K, -1) ** 2, dim=-1) / torch.sum(merge_prob.view(K, -1), dim=-1)).unsqueeze(1)
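Putting steps ①-⑨ together, the per-RoI depth fusion can be condensed into one standalone function (a restatement of the snippets above with the head outputs passed in as tensors, not the repository's exact code):

import torch

def merge_instance_depth(vis_depth, att_depth, vis_depth_uncer, att_depth_uncer):
    # vis_depth, att_depth:             (K, 7, 7) per-pixel visual / attribute depth inside each RoI
    # vis_depth_uncer, att_depth_uncer: (K, 7, 7) their predicted log-uncertainties
    K = vis_depth.shape[0]
    ins_depth = vis_depth + att_depth
    # combined log-uncertainty: log(exp(u_vis) + exp(u_att))
    ins_depth_uncer = torch.logsumexp(torch.stack([vis_depth_uncer, att_depth_uncer], -1), -1)
    # map uncertainty to a probability in (0, 1): lower uncertainty -> higher probability
    merge_prob = (-(0.5 * ins_depth_uncer).exp()).exp()
    # probability-weighted average of the per-pixel instance depths
    merge_depth = (torch.sum((ins_depth * merge_prob).view(K, -1), dim=-1)
                   / torch.sum(merge_prob.view(K, -1), dim=-1)).unsqueeze(1)
    # confidence of the merged depth
    merge_conf = (torch.sum(merge_prob.view(K, -1) ** 2, dim=-1)
                  / torch.sum(merge_prob.view(K, -1), dim=-1)).unsqueeze(1)
    return merge_depth, merge_conf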