Notes on map-related methods

Latest recommended article published on 2026-04-22 19:36:18

License: CC 4.0 BY-SA

Tags: #notes #AI #python

MapQR

MapQR contributions:

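It proposes a scatter-and-gather query mechanism that reduces the number of queries and speeds up convergence.
In MapTR, each query is at the point level. For example, to predict 20 lines with 30 points each, n_vec=20 and n_pt=30, so the total number of queries is n_vec * n_pt = 600.
Passing these 600 queries through the self-attention and deformable attention of every decoder layer is computationally expensive.
MapQR is implemented as follows:
Before self attn, MapTR computes query_ins + query_pt, and the broadcast produces n_vec * n_pt queries. MapQR keeps using query_ins and does not broadcast yet, so at this stage there are only n_vec queries.
Self attn processes the n_vec queries.
In deformable attn (InstancePointattention), the scatter step of scatter-and-gather expands the n_vec queries into n_vec * n_pt queries for the deformable attn computation. When attention finishes, the gather step
compresses the n_vec * n_pt queries back into n_vec queries. This greatly reduces the self-attn computation and speeds up convergence.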
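
To make the saving concrete, the query counts from the example above can be plugged into a rough cost estimate. This is only a back-of-the-envelope sketch: the helper `self_attention_pairs` is a hypothetical name, and it counts just the query-key pairs scored by one dense self-attention layer, ignoring heads, projections, and the deformable attention step (which still works on n_vec * n_pt queries).

```python
def self_attention_pairs(num_queries: int) -> int:
    """Number of query-key pairs scored by one dense self-attention layer."""
    return num_queries * num_queries


n_vec, n_pt = 20, 30  # numbers taken from the example in the text

point_level = n_vec * n_pt  # MapTR style: one query per point
instance_level = n_vec      # MapQR style: one query per instance

pairs_point = self_attention_pairs(point_level)
pairs_instance = self_attention_pairs(instance_level)

print(point_level, pairs_point)        # 600 360000
print(instance_level, pairs_instance)  # 20 400
print(pairs_point // pairs_instance)   # 900, i.e. n_pt ** 2
```

Under this simple count, self-attention over instance queries scores n_pt ** 2 times fewer pairs than self-attention over point queries.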
In the decoder, before each layer runs, the ref points are sine-encoded and then passed through a linear embedding to produce an embedding for each point.
In InstancePointattention:

# in InstancePointattention.__init__
self.output_proj = nn.Sequential(
    nn.Linear(embed_dims*num_pts_per_vec, embed_dims*num_pts_per_vec//2),
    nn.ReLU(),
    nn.Linear(embed_dims*num_pts_per_vec//2, embed_dims),
    nn.ReLU(),
    nn.Linear(embed_dims, embed_dims),
)
# in forward, scatter step: broadcast each instance query over its points
# by adding the per-point positional embedding
if pt_query_pos is not None:
    bs, num_iq, num_pt, _ = pt_query_pos.shape
    query = query.unsqueeze(2) + pt_query_pos       # (bs, num_iq, num_pt, embed_dims)
    query = query.reshape(bs, num_iq * num_pt, -1)  # (bs, num_iq * num_pt, embed_dims)
# ... deformable attention (dfa), then the gather step
output = self.output_proj(output)  # --> one vector per instance query; last dim is embed_dims
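
The shape bookkeeping of scatter and gather can be traced with a small standalone sketch. It is not the MapQR code: `scatter`, `gather`, and `weight` are hypothetical names, the batch dimension is dropped, the deformable attention in the middle is skipped, and a single fixed linear map stands in for the `output_proj` MLP.

```python
from typing import List

Vector = List[float]


def scatter(instance_query: Vector, pt_query_pos: List[Vector]) -> List[Vector]:
    """Broadcast one instance query over its points by adding each point's positional embedding."""
    return [[q + p for q, p in zip(instance_query, pos)] for pos in pt_query_pos]


def gather(point_outputs: List[Vector], weight: List[Vector]) -> Vector:
    """Concatenate the per-point outputs and project them back to a single vector.

    `weight` has shape (embed_dims, n_pt * embed_dims). It is a hypothetical
    single linear layer used here instead of the multi-layer output_proj.
    """
    flat = [value for point in point_outputs for value in point]
    return [sum(w * x for w, x in zip(row, flat)) for row in weight]


embed_dims, n_pt = 2, 3
instance_query = [1.0, 2.0]
pt_query_pos = [[0.0, 0.0], [0.5, 0.5], [1.0, -1.0]]

# scatter: 1 instance query -> n_pt point queries
point_queries = scatter(instance_query, pt_query_pos)
# (deformable attention would run on point_queries here; this sketch skips it)

# Illustrative weights: output dimension d averages dimension d over all points.
weight = [
    [1.0 / n_pt if j % embed_dims == d else 0.0 for j in range(n_pt * embed_dims)]
    for d in range(embed_dims)
]

# gather: n_pt point outputs -> 1 instance vector of size embed_dims
instance_out = gather(point_queries, weight)
print(len(point_queries), len(instance_out))
```

The key point is the input width of the gather projection, n_pt * embed_dims, which matches the first `nn.Linear` in `output_proj` above.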

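Novel positional encoding (following DAB-DETR)
A query is split into a content part and a position part. In MapTR, the position part is learnable; it generates the ref points and is supplied to q and k in self attn. MapQR instead sine-encodes the ref points and then applies a linear embedding to produce an embedding for each point,
which is then passed to dfa.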

for lid, layer in enumerate(self.layers):
    if self.query_pos_embedding == 'instance':
        reference_points_reshape = reference_points.view(bs, -1, self.num_pts_per_vec, self.pt_dim)
        reference_points_reshape = reference_points_reshape.view(bs, -1, self.pt_dim)
        query_sine_embed = gen_sineembed_for_position(reference_points_reshape[..., :2])
        query_sine_embed = query_sine_embed.view(bs, -1, self.num_pts_per_vec, self.embed_dims)
        point_query_pos = self.pt_pos_query_projs[lid](query_sine_embed) #(bs, n_vec, n_pt, embed_dims)
        query_pos_lid = None
        reference_points_input = reference_points_reshape[..., :2].unsqueeze(2)
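
For intuition about the sine encoding step, here is a minimal pure-Python sketch of a 2D sine/cosine embedding in the DAB-DETR style. The function name `sine_embed_2d`, the tiny `dim_per_axis`, and the temperature default are assumptions for illustration; the actual `gen_sineembed_for_position` helper operates on tensors and may differ in details.

```python
import math
from typing import List, Sequence


def sine_embed_2d(
    point: Sequence[float],
    dim_per_axis: int = 4,
    temperature: float = 10000.0,
) -> List[float]:
    """Sine/cosine embedding of a normalized (x, y) point.

    Returns 2 * dim_per_axis values, ordered as (y part, x part).
    """

    def embed_axis(value: float) -> List[float]:
        scaled = value * 2.0 * math.pi
        out = []
        for i in range(dim_per_axis):
            # Consecutive (sin, cos) pairs share one frequency.
            freq = temperature ** (2 * (i // 2) / dim_per_axis)
            angle = scaled / freq
            out.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        return out

    x, y = point[0], point[1]
    return embed_axis(y) + embed_axis(x)


origin = sine_embed_2d((0.0, 0.0))
print(origin)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

quarter = sine_embed_2d((0.25, 0.0))
print(len(quarter))  # 8
```

In the decoder excerpt above, an embedding of this kind is computed for every point of every instance and then passed through the per-layer linear projection `pt_pos_query_projs[lid]`.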

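BEV encoder: GKT with flexible height.
In the encoder, before GKT projects each grid cell into each camera, an extra height offset prediction is added: it predicts a height offset for each pillar, which is then added to the z coordinate of the ref points.
In HeightKernelAttention: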

# __init__
self.height_offsets = nn.Linear(self.embed_dims, num_points_in_pillar)
# forward:
sampled_heights = self.height_offsets(query) #linear
sampled_heights = sampled_heights / self.height_range
sampled_heights = sampled_heights.permute(0, 2, 1)

reference_points[:, :, :, 2] += sampled_heights
reference_points_cam, bev_mask = self.point_sampling(reference_points, self.pc_range, kwargs['img_metas'])
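
The effect of the height offsets on the reference points can be illustrated with plain lists. This is a sketch under assumptions: `apply_height_offsets` is a hypothetical helper, the batch dimension is dropped, the raw offsets are hard-coded instead of coming from the `nn.Linear`, and the numbers are made up.

```python
from typing import List


def apply_height_offsets(
    reference_points: List[List[List[float]]],
    raw_offsets: List[List[float]],
    height_range: float,
) -> List[List[List[float]]]:
    """Shift the z coordinate of every pillar point by a predicted offset.

    reference_points[d][q] = [x, y, z] for pillar point d of BEV query q.
    raw_offsets[q][d] is the raw prediction for query q and pillar point d;
    it is divided by height_range before being added to z.
    """
    shifted = []
    for d, layer in enumerate(reference_points):
        new_layer = []
        for q, (x, y, z) in enumerate(layer):
            # Indexing raw_offsets[q][d] mirrors the permute(0, 2, 1) in the excerpt.
            new_layer.append([x, y, z + raw_offsets[q][d] / height_range])
        shifted.append(new_layer)
    return shifted


# 2 points per pillar, 2 BEV queries (illustrative values)
reference_points = [
    [[0.1, 0.2, 0.25], [0.3, 0.4, 0.25]],
    [[0.1, 0.2, 0.75], [0.3, 0.4, 0.75]],
]
raw_offsets = [[0.8, -0.8], [0.0, 1.6]]  # one row per query
shifted = apply_height_offsets(reference_points, raw_offsets, height_range=8.0)
print([[round(p[2], 2) for p in layer] for layer in shifted])
```

Only z changes; x and y of each pillar point stay fixed, so every BEV cell still samples along its own vertical column, just at learned heights.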