Paper link: [2405.11276] Visible and Clear: Finding Tiny Objects in Difference Map
The English is typed entirely by hand! It summarizes and paraphrases the original paper, so spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome. This article leans toward personal notes, so read with discretion
1. Takeaways
(1)Couldn't sleep late at night, so a short paper makes for pleasant reading
(2)Now I remember: the idea is actually quite novel to me, though this is no longer my first time seeing it
(3)Which advisors exactly want the grandson to exist before the grandfather? A related-work section should really trace the field from the very beginning up to the present; advisors who dismiss classic papers wholesale as simply too old are hard to take seriously. They should read more papers and fewer WeChat public accounts
(4)The ECCV style is unmistakable: simple and easy to follow, yet genuinely novel! Great as bedtime reading
2. Close Reading of the Paper, Paragraph by Paragraph
2.1. Abstract
①Existing works tackle tiny object detection mainly through feature enhancement, whose limitations are spurious textures and artifacts
2.2. Introduction
①Definition of object:
| Dataset | very tiny | tiny | small |
| MS COCO | - | - | ≤ 32 × 32 |
| AI-TOD | 2~8 | 8~16 | 16~32 |
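The AI-TOD size bands above can be written as a tiny helper (a sketch for illustration; the function name and the use of √(w·h) as the pixel-size measure are my assumptions):

```python
import math

def aitod_size_category(width: float, height: float) -> str:
    """Classify an object into AI-TOD-style absolute-size bands (in pixels).
    Helper name and the sqrt(w*h) size measure are assumptions, not from the paper."""
    size = math.sqrt(width * height)
    if size <= 8:
        return "very tiny"
    if size <= 16:
        return "tiny"
    if size <= 32:
        return "small"
    return "normal"

print(aitod_size_category(7.9, 7.9))   # the average DroneSwarms drone size
print(aitod_size_category(30, 30))
```

The 7.9-pixel average drone in DroneSwarms (Section 2.4.4) lands squarely in the "very tiny" band.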
②Problem: downsampling swallows tiny objects
③Feature map example of "disappeared" tiny drone:
④The novelty is exactly the figure above: subtract the map where tiny objects have been swallowed by the background (they easily vanish under downsampling) from the map where they still exist, and what remains is the tiny objects themselves. They also released a new dataset, so the contribution is fairly substantial
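The "swallowed by downsampling" problem in ② is easy to reproduce numerically. A toy numpy sketch (not the paper's backbone): a 2×2 "drone" in a 16×16 image all but disappears after one aggressive average-pooling step.

```python
import numpy as np

# A 16x16 single-channel "image": dark background with a 2x2 bright drone.
img = np.zeros((16, 16))
img[6:8, 6:8] = 1.0

# Stride-8 average pooling, standing in for a backbone's aggressive downsampling:
# reshape into 8x8 blocks and take each block's mean.
pooled = img.reshape(2, 8, 2, 8).mean(axis=(1, 3))

print(img.max())     # the object is clearly visible at full resolution
print(pooled.max())  # 4 bright pixels averaged over 64: nearly gone
```

The peak response drops from 1.0 to 0.0625, which is why a difference against a resolution where the object still "exists" carries so much signal.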
2.3. Related Work
2.3.1. Object Detection
①Lists traditional two-stage and one-stage object detectors
②After introducing them there is no criticism; the authors simply say these detectors are mature, which makes them easy to integrate. Criticism isn't always necessary. Which advisors exactly insist on "shortcoming, shortcoming, shortcoming" in prior work so that the student can single-handedly break new ground? (The authors write it this way because they really do integrate these detectors; others shouldn't copy the phrasing blindly)
2.3.2. Tiny Object Detection
①Existing works: focus on data augmentation, scale awareness, context modeling, feature imitation, label assignment
2.3.3. Anti-UAV Dataset
①Current UAV datasets: MAV-VID, Drone-vs-Bird, and DUT Anti-UAV
2.4. Method
2.4.1. Overall Architecture
①Framework of their work:
2.4.2. Difference Map
①The up block consists of a ReLU activation, a convolution (kernel size given in the figure), and a Transpose Convolution(again, transcribing vision formulas is a bit dull; everything is in the figure anyway, so I just carry the names over. I won't write the formulas out later, only the shapes)
②The kernel size of the Conv inside RH is given in the figure
③The difference map is obtained by taking the mean of each feature map along the channel dimension and then the element-wise absolute value of their difference
④Reconstruction loss: MSE
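Putting ③ and ④ together, a minimal numpy sketch of the difference map and the MSE reconstruction loss (the array names are my shorthand, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two (C, H, W) feature maps: one where the tiny object survives, and a
# reconstructed one where it was swallowed by the background.
feat_with_obj = rng.normal(size=(8, 4, 4))
feat_no_obj = feat_with_obj.copy()
feat_with_obj[:, 1, 2] += 3.0   # the tiny object's activation at position (1, 2)

# Difference map: channel-wise mean, then element-wise absolute difference.
diff_map = np.abs(feat_with_obj.mean(axis=0) - feat_no_obj.mean(axis=0))
print(np.unravel_index(diff_map.argmax(), diff_map.shape))  # peaks at the object

# Reconstruction loss: plain MSE between the two feature maps.
mse = np.mean((feat_with_obj - feat_no_obj) ** 2)
```

Everything except the object's location cancels in the subtraction, so the difference map is near-zero background with a spike exactly where the tiny object sits.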
2.4.3. Difference Map Guided Feature Enhancement
①The element-wise attention matrix in Difference Map Guided Feature Enhancement (DGFE) is computed from the difference map (formula shown in the figure)
②Filtration block: attention entries below a learnable threshold are filtered out using the Sign function (just sgn(x) ∈ {−1, 0, +1} by the sign of x, not some specific named signal-transform equation)
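If the filtration is a hard sign-based gate, it might look like the following numpy sketch. The exact formula is only in the paper's figure; the gating form `(sgn(x − θ) + 1)/2` and the function name are my guesses:

```python
import numpy as np

def filtration(attn: np.ndarray, theta: float) -> np.ndarray:
    """Suppress attention entries below a (learnable) threshold theta.
    Assumed form: gate = (sign(attn - theta) + 1) / 2, which is 1 where
    attn > theta, 0 where attn < theta (and 0.5 exactly at theta)."""
    gate = (np.sign(attn - theta) + 1.0) / 2.0
    return attn * gate

attn = np.array([0.1, 0.4, 0.8])
print(filtration(attn, theta=0.5))   # only the 0.8 entry survives
```

In training, `theta` would be a learnable scalar; the sign gate zeroes out weak (likely background) attention responses so enhancement concentrates on the difference-map peaks.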
2.4.4. DroneSwarms Dataset
①It includes 9,109 images and 242,218 annotated UAV instances, with 6,577 images used for training and 2,532 for testing. On average, each image contains 26.59 drone instances. All images are 1920 × 1080 and manually labeled with high precision.
②Environments: urban areas, mountainous terrain, and skies, among others
③The dataset contains 241,249 tiny objects with sizes of 32 pixels or below, accounting for approximately 99.60% of all instances; the average size is only about 7.9 pixels, and the drones are dispersed across the entire image.
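The quoted statistics are self-consistent, which a quick arithmetic check confirms:

```python
# Sanity check of the DroneSwarms statistics quoted above.
total_images = 6577 + 2532        # train + test splits
total_instances = 242218
tiny_instances = 241249

print(total_images)                                       # 9109
print(round(total_instances / total_images, 2))           # 26.59 drones per image
print(round(tiny_instances / total_instances * 100, 2))   # 99.6 percent tiny
```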
2.5. Experiment
2.5.1. Experimental Setting
①Datasets: DroneSwarms, VisDrone2019, and AI-TOD
(1)DroneSwarms
①Initial learning rate: 0.0025
②Optimizer: Stochastic Gradient Descent (SGD) with momentum 0.9 and weight decay 0.0001
③Epochs: 20(isn't the authors' phrasing here rather Chinese-style English... I've rarely seen papers written this way: it reads like the Chinese "两个batch size", but it should actually be "with a batch size of 2", since 2 isn't an adjective)
④Batch size: 2
⑤Anchor scale
(2)VisDrone2019 and AI-TOD
①Initial learning rate: 0.005
②Optimizer: Stochastic Gradient Descent (SGD) with momentum 0.9; the learning rate decays at the 8th and 11th epochs
2.5.2. Results on DroneSwarms
①Performance table:
2.5.3. Results on VisDrone2019 and AI-TOD
①Performance table on VisDrone2019:
②Performance on AI-TOD:
2.5.4. Ablation Study and Discussion
①Module ablation:
②Ablation study on threshold:
③Ablation of feature enhancement methods:
④Performance of different designs of difference map:
⑤Different types of difference map:
2.5.5. Visualization Analysis
①Visualization on DroneSwarms:
2.6. Conclusion
~