Common Hand-Coded Algorithm Interview Questions, Notes (Part 1): K-Means, Bounding-Box IoU, NMS (Non-Maximum Suppression)

Gravity!

Published 2025-03-25 09:36:12

License: CC 4.0 BY-SA

Tags: deep learning, interviews, career development, kmeans, nms, algorithms

Article link: https://blog.csdn.net/xying_chloe/article/details/146469177

[If these notes help you, feel free to follow, like, and bookmark. Positive feedback speeds up the updates! Thanks for your support!]

1. K-Means

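Input: data x of shape [L, dim]; K, the number of clusters; epochs, the maximum number of iterations; tol, the tolerance used to decide whether the clustering has converged
Procedure: randomly pick K data points as the initial centroids → [ compute the distance from every point to each of the K centroids (dist) → label every point with its nearest centroid (labels) → recompute the centroids (new_centroids) → check for convergence (break if converged) ] × epochs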

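Code:

import torch

def kmeans(x, K, epochs, tol):
    L, dim = x.shape  # x: [L, dim], floating point
    centroids_idx = torch.randperm(L)[:K]  # K distinct random points as initial centroids
    centroids = x[centroids_idx]
    for epoch in range(epochs):
        # same as torch.norm(x.unsqueeze(1) - centroids.unsqueeze(0), dim=-1)
        dist = torch.cdist(x, centroids)  # dist: [L, K]
        labels = torch.argmin(dist, dim=1)  # labels: [L], index of the nearest centroid
        # mean of each cluster; an empty cluster keeps its previous centroid
        new_centroids = torch.stack([
            x[labels == k].mean(dim=0) if (labels == k).any() else centroids[k]
            for k in range(K)
        ])

        if torch.norm(new_centroids - centroids) < tol:  # centroids barely moved: converged
            return new_centroids

        centroids = new_centroids
    return centroids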
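
To make the assignment and update steps concrete, here is a minimal sketch of a single K-Means iteration on four made-up points. The data and the two starting centroids are illustrative assumptions, not values from these notes. It also checks that torch.cdist agrees with the explicit broadcasting form of the distance computation.

```python
import torch

# Four hypothetical 2-D points and two hand-picked centroids (illustrative values only)
x = torch.tensor([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])  # [L=4, dim=2]
centroids = torch.tensor([[0.0, 0.0], [10.0, 0.0]])                   # [K=2, dim=2]

# Distance from every point to every centroid, computed two equivalent ways
dist = torch.cdist(x, centroids)                                      # [4, 2]
dist_broadcast = torch.norm(x.unsqueeze(1) - centroids.unsqueeze(0), dim=-1)
assert torch.allclose(dist, dist_broadcast)

# Assignment step: each point gets the index of its nearest centroid
labels = torch.argmin(dist, dim=1)
print(labels.tolist())  # [0, 0, 1, 1]

# Update step: each centroid moves to the mean of the points assigned to it
new_centroids = torch.stack([x[labels == k].mean(dim=0) for k in range(2)])
print(new_centroids.tolist())  # [[0.0, 1.0], [10.0, 1.0]]
```

Here both clusters are non-empty, so the plain mean is safe; with random initialization a cluster can end up empty, in which case its mean would be NaN.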

2. IoU of Two Axis-Aligned Boxes

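Input: box1 = [x_l1, y_l1, x_r1, y_r1]; box2 = [x_l2, y_l2, x_r2, y_r2]
(with x_r1 > x_l1 and y_r1 > y_l1, and likewise for box2: in image coordinates, where the origin is the top-left corner and y grows downward, the bottom-right corner always has larger coordinates than the top-left corner)
IoU = intersection area / (area of box1 + area of box2 - intersection area)
Intersection region:
- Top-left corner: lt_x = max(x_l1, x_l2), lt_y = max(y_l1, y_l2)
- Bottom-right corner: rb_x = min(x_r1, x_r2), rb_y = min(y_r1, y_r2)
- Intersection area = h × w = (rb_y - lt_y) × (rb_x - lt_x)
- The boxes overlap only if h > 0 and w > 0; otherwise the intersection is empty and the IoU is 0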
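
As a quick sanity check of the formulas above, here is a small worked example on two made-up boxes. The coordinates are illustrative assumptions, not values from these notes.

```python
# Two hypothetical boxes in [x_l, y_l, x_r, y_r] format (illustrative values only)
box1 = [0, 0, 2, 2]
box2 = [1, 1, 3, 3]

# Corners of the intersection region
lt_x, lt_y = max(box1[0], box2[0]), max(box1[1], box2[1])  # (1, 1)
rb_x, rb_y = min(box1[2], box2[2]), min(box1[3], box2[3])  # (2, 2)

# Clamp at 0 so that non-overlapping boxes give an empty intersection
w, h = max(rb_x - lt_x, 0), max(rb_y - lt_y, 0)
inter_area = w * h  # 1

area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])  # 4
area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])  # 4

iou = inter_area / (area1 + area2 - inter_area)  # 1 / (4 + 4 - 1) = 1 / 7
print(iou)  # 0.14285714285714285
```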

Code:

def iou_2box(box1, box2):
    lt_x, lt_y = max(box1[0], box2[0]), max(box1[1], box2[1])  # top-left x, y of the intersection
    rb_x, rb_y = min(box1[2], box2[2]), min(box1[3], box2[3])  # bottom-right x, y of the intersection

    h, w = rb_y - lt_y, rb_x - lt_x

    if h <= 0 or w <= 0:  # the boxes do not overlap
        return 0.0

    inter_area = h * w
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])  # width × height of box1
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])  # width × height of box2

    return inter_area / (area1 + area2 - inter_area)

3. Pairwise IoU of Multiple Axis-Aligned Boxes

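Same method as for two boxes above, but expressed as tensor operations
box1 shape: [N, 4]; box2 shape: [M, 4] (box1 holds N detection boxes and box2 holds M)
To compute the intersections, expand both box1 and box2 to a common [N, M, ...] layout so that every box in box1 is paired with every box in box2 (N*M pairs in total); the code does this separately for the top-left and bottom-right corners, each of shape [N, M, 2]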

Code:

import torch

def iou_multibox(box1, box2):
    N, M = box1.shape[0], box2.shape[0]  # box1: [N, 4]  box2: [M, 4]
    # top-left and bottom-right corners of all N*M intersection regions
    lt = torch.max(box1[:, :2].unsqueeze(1).expand(N, M, 2), box2[:, :2].unsqueeze(0).expand(N, M, 2))
    rb = torch.min(box1[:, 2:].unsqueeze(1).expand(N, M, 2), box2[:, 2:].unsqueeze(0).expand(N, M, 2))

    wh = (rb - lt).clamp(min=0)  # [N, M, 2]; w <= 0 or h <= 0 means no overlap, so clamp to 0
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N, M]

    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])  # [N]
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])  # [M]
    area1 = area1.unsqueeze(1).expand(N, M)  # [N, M]
    area2 = area2.unsqueeze(0).expand(N, M)  # [N, M]

    return inter / (area1 + area2 - inter)
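
The explicit expand calls are not strictly required, because PyTorch broadcasts a [N, 1, 2] tensor against a [1, M, 2] tensor automatically. The following is a minimal sketch of the same computation written with broadcasting; the function name pairwise_iou and the test boxes are assumptions made up for illustration.

```python
import torch

def pairwise_iou(box1, box2):
    """IoU between every box in box1 [N, 4] and every box in box2 [M, 4] -> [N, M]."""
    # [N, 1, 2] against [1, M, 2] broadcasts to [N, M, 2]
    lt = torch.max(box1[:, None, :2], box2[None, :, :2])
    rb = torch.min(box1[:, None, 2:], box2[None, :, 2:])
    wh = (rb - lt).clamp(min=0)          # zero width/height when boxes do not overlap
    inter = wh[..., 0] * wh[..., 1]      # [N, M]
    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])  # [N]
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])  # [M]
    return inter / (area1[:, None] + area2[None, :] - inter)

# Hypothetical boxes: partial overlap, identical box, and no overlap
box1 = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
box2 = torch.tensor([[1.0, 1.0, 3.0, 3.0],
                     [0.0, 0.0, 2.0, 2.0],
                     [5.0, 5.0, 6.0, 6.0]])
iou = pairwise_iou(box1, box2)
print(iou.shape)  # torch.Size([1, 3])
```

For these boxes the three entries are 1/7 (partial overlap), 1 (identical boxes), and 0 (disjoint boxes).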

4. NMS (Non-Maximum Suppression)

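Purpose: in object detection, a model usually outputs many heavily overlapping candidate boxes; NMS removes the redundant ones and, within each group of overlapping boxes, keeps only the box with the highest confidence
Steps: sort (order the boxes by confidence, highest first) → [ select (add the highest-confidence remaining box to the result list) → compute IoU (between the selected box and every other remaining box) → suppress (drop the boxes whose IoU with the selected box exceeds the threshold) ] × N (until every box has been processed) → output the result list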

Code:

import torch

def nms(bboxes, scores, thresh=0.5):
    # bboxes: all detection boxes [N, 4]; scores: confidences [N]; thresh: IoU threshold
    x1, y1 = bboxes[:, 0], bboxes[:, 1]
    x2, y2 = bboxes[:, 2], bboxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)  # box areas: [N]

    _, order = scores.sort(0, descending=True)  # box indices sorted by confidence, highest first

    keep = []  # indices of the boxes that are kept

    while order.numel() > 0:
        if order.numel() == 1:  # only one unprocessed box left: keep it and stop
            i = order.item()
            keep.append(i)
            break
        else:
            i = order[0].item()  # highest-confidence box among those remaining
            keep.append(i)

        # IoU between box i and the remaining boxes order[1:]
        xx1 = x1[order[1:]].clamp(min=x1[i])  # element-wise max of x1[order[1:]] and x1[i]
        yy1 = y1[order[1:]].clamp(min=y1[i])
        xx2 = x2[order[1:]].clamp(max=x2[i])  # element-wise min of x2[order[1:]] and x2[i]
        yy2 = y2[order[1:]].clamp(max=y2[i])

        inter = (xx2 - xx1).clamp(min=0) * (yy2 - yy1).clamp(min=0)
        union = areas[i] + areas[order[1:]] - inter

        IoU = inter / union

        idx = (IoU <= thresh).nonzero().reshape(-1)  # boxes whose IoU with box i does not exceed the threshold

        if idx.numel() == 0:
            break
        order = order[idx + 1]  # +1 because IoU was computed over order[1:], while order[0] is box i

    return torch.tensor(keep, dtype=torch.long)
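
When debugging a tensor implementation, it can help to compare against a slow but obvious version. Below is a plain-Python sketch of the same greedy procedure, intended only as a cross-check; the names box_iou and nms_reference and the sample boxes are hypothetical, chosen for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms_reference(boxes, scores, thresh=0.5):
    """Greedy NMS on Python lists; returns the kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)  # highest-confidence box among those remaining
        keep.append(i)
        # keep only the boxes that do not overlap box i too much
        order = [j for j in order if box_iou(boxes[i], boxes[j]) <= thresh]
    return keep

# Hypothetical detections: boxes 0 and 1 overlap heavily (IoU = 81/119), box 2 is far away
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms_reference(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because its IoU with the higher-scoring box 0 is about 0.68, which exceeds the 0.5 threshold, while box 2 does not overlap box 0 at all.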