
1. Introduction
The YOLO series has been the most popular detection framework in industrial applications, owing to its excellent balance between speed and accuracy. The pioneering works of the series are YOLOv1-3 [32–34], which blazed a new trail for one-stage detectors that later works substantially improved. YOLOv4 [1] reorganized the detection framework into several separate parts (backbone, neck and head) and verified the bag-of-freebies and bag-of-specials of the time to design a framework suitable for training on a single GPU. At present, YOLOv5 [10], YOLOX [7], PP-YOLOE [44] and YOLOv7 [42] are all competing candidates for efficient detectors to deploy. Models at different sizes are commonly obtained through scaling techniques.
In this report, we empirically observe several important factors that motivate us to refurbish the YOLO framework: (1) Reparameterization from RepVGG [3] is a superior technique that is not yet well exploited in detection. We also notice that simple model scaling of RepVGG blocks becomes impractical, which leads us to consider that elegant consistency of network design between small and large models is unnecessary. The plain single-path architecture is a better choice for small networks, but for larger models the exponential growth of parameters and computation cost of the single-path architecture renders it infeasible. (2) Quantization of reparameterization-based detectors also requires meticulous treatment; otherwise the performance degradation caused by their heterogeneous configurations during training and inference is intractable. (3) Previous works [7, 10, 42, 44] tend to pay less attention to deployment: their latencies are commonly compared on high-cost machines like the V100, yet there is a hardware gap when it comes to real serving environments. Typically, low-power GPUs like the Tesla T4 are less costly and provide rather good inference performance. (4) Advanced domain-specific strategies like label assignment and loss function design need further verification considering the architectural variance. (5) For deployment, we can tolerate adjustments of the training strategy that improve accuracy without increasing inference cost, such as knowledge distillation.
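To make observation (1) concrete, the following is a minimal sketch of structural reparameterization in the RepVGG style: a block trains with parallel 3x3, 1x1 and identity branches, then collapses them into a single 3x3 convolution for inference. BatchNorm folding is omitted for brevity, and the class and method names are illustrative rather than taken from the official RepVGG or YOLOv6 code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepConvSketch(nn.Module):
    """Trains as 3x3 + 1x1 + identity; deploys as a single 3x3 conv."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.fused = None  # populated by reparameterize()

    def forward(self, x):
        if self.fused is not None:  # deploy mode: single path
            return F.relu(self.fused(x))
        # training mode: multi-branch sum, identity acting as a shortcut
        return F.relu(self.conv3(x) + self.conv1(x) + x)

    @torch.no_grad()
    def reparameterize(self):
        c = self.conv3.out_channels
        # Pad the 1x1 kernel to 3x3 so all branches share one kernel shape.
        kernel = self.conv3.weight + F.pad(self.conv1.weight, [1, 1, 1, 1])
        # The identity branch equals a 3x3 conv whose kernel is 1 at the
        # center of its own input channel and 0 elsewhere.
        identity = torch.zeros_like(kernel)
        identity[range(c), range(c), 1, 1] = 1.0
        self.fused = nn.Conv2d(c, c, 3, padding=1)
        self.fused.weight.copy_(kernel + identity)
        self.fused.bias.copy_(self.conv3.bias + self.conv1.bias)
```

Because convolution is linear in its input, the fused block reproduces the multi-branch outputs up to floating-point error, so inference pays for only one convolution per block.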
With the aforementioned observations in mind, we present YOLOv6, which achieves the best trade-off so far in terms of accuracy and speed. We show the comparison of YOLOv6 with other peers at a similar scale in Fig. 1. To boost inference speed without much performance degradation, we examine cutting-edge quantization methods, including post-training quantization (PTQ) and quantization-aware training (QAT), and accommodate them in YOLOv6 to achieve the goal of deployment-ready networks.
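As a rough illustration of the PTQ/QAT distinction, the sketch below uses PyTorch's eager-mode quantization API. It is not the YOLOv6 deployment pipeline: `model`, `calibration_loader` and `train_one_epoch` are placeholders, and eager mode additionally assumes the model wraps its quantized region in `QuantStub`/`DeQuantStub`.

```python
import torch
import torch.ao.quantization as tq

def ptq_int8(model: torch.nn.Module, calibration_loader):
    """Post-training quantization: observe activation ranges on a small
    calibration set, then convert weights/activations to int8."""
    model.eval()
    model.qconfig = tq.get_default_qconfig("fbgemm")  # x86 server backend
    prepared = tq.prepare(model)               # insert observers
    with torch.no_grad():
        for images, _ in calibration_loader:
            prepared(images)                   # record activation statistics
    return tq.convert(prepared)                # fold observers into int8 ops

def qat_int8(model: torch.nn.Module, train_loader, train_one_epoch):
    """Quantization-aware training: fine-tune with fake-quant nodes that
    simulate int8 rounding, then convert."""
    model.train()
    model.qconfig = tq.get_default_qat_qconfig("fbgemm")
    prepared = tq.prepare_qat(model)
    train_one_epoch(prepared, train_loader)    # adapt weights to quantization
    prepared.eval()
    return tq.convert(prepared)
```

For a reparameterization-based detector, either routine has to run on the deploy-mode (fused) graph; quantizing the multi-branch training graph while inferring with the single-path one would reintroduce the train/inference mismatch discussed above.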
We summarize the main aspects of YOLOv6 as follows:
• We refashion a line of networks of different sizes tailored for industrial applications in diverse scenarios. The architectures at different scales vary to achieve the best speed and accuracy trade-off, where small models feature a plain single-path backbone and large models are built on efficient multi-branch blocks.
• We imbue YOLOv6 with a self-distillation strategy, performed on both the classification task and the regression task. Meanwhile, we dynamically adjust the weight between the knowledge from the teacher and from the labels to help the student model learn more efficiently during all training phases (see the sketch after this list).
• We broadly verify advanced detection techniques for label assignment, loss functions and data augmentation, and adopt them selectively to further boost performance.
• We reform the quantization scheme for detection with the help of RepOptimizer [2] and channel-wise distillation [36], which leads to a fast and accurate detector with 43.3% COCO AP and a throughput of 869 FPS at a batch size of 32.
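A minimal sketch of the dynamically weighted self-distillation loss mentioned above is given below, for the classification branch only. The cosine schedule, temperature and function names are illustrative assumptions rather than the exact YOLOv6 recipe.

```python
import math
import torch.nn.functional as F

def distill_weight(epoch: int, total_epochs: int,
                   w_max: float = 1.0, w_min: float = 0.0) -> float:
    """Cosine decay of the distillation weight: rely more on the teacher's
    soft labels early in training and more on ground truth later."""
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return w_min + (w_max - w_min) * cosine

def self_distill_loss(student_logits, teacher_logits, det_loss,
                      epoch, total_epochs, tau: float = 1.0):
    """Detection loss plus a KL term between student and teacher class
    predictions, weighted by the current distillation weight."""
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits.detach() / tau, dim=-1),  # teacher frozen
        reduction="batchmean",
    ) * tau * tau
    return det_loss + distill_weight(epoch, total_epochs) * kd
```

In self-distillation the teacher is a pre-trained copy of the student itself, so the extra supervision improves accuracy without changing the deployed network or its inference cost.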
2. Method
The renovated design of YOLOv6 consists of the following components: network design, label assignment, loss function, data augmentation, industry-handy improvements, and quantization and deployment:
• Network Design: Backbone: Compared with other mainstream architectures, we find that RepVGG [3] backbones are equipped with more feature representation power in small networks at a similar inference speed, whereas they can hardly be scaled to obtain larger models due to the explosive growth of parameters and computational costs. In this regard, we take RepBlock [3] as the building block of our small networks. For large models, we revise a more efficient CSP [43] block, named the CSPStackRep Block. Neck: The neck of YOLOv6 adopts the PAN topology [24] following YOLOv4 and YOLOv5. We enhance the neck with RepBlocks or CSPStackRep Blocks to obtain Rep-PAN. Head: We simplify the decoupled head to make it more efficient, which we call the Efficient Decoupled Head.
• Label Assignment: We evaluate the recent progress of label assignment strategies [5, 7, 18, 48, 51] on YOLOv6 through numerous experiments, and the results indicate that TAL [5] is more effective and training-friendly.
• Loss Function: The loss functions of the mainstream
anchor-free object detectors contain classification loss,