
1. Introduction
The YOLO series has been the most popular detection framework in industrial applications, owing to its excellent balance between speed and accuracy. The pioneering works of the series are YOLOv1-3 [32–34], which blazed a new trail for one-stage detectors that later works substantially improved. YOLOv4 [1] reorganized the detection framework into several separate parts (backbone, neck and head) and verified the bag-of-freebies and bag-of-specials of the time to design a framework suitable for training on a single GPU. At present, YOLOv5 [10], YOLOX [7], PP-YOLOE [44] and YOLOv7 [42] are all competing candidates for efficient detectors to deploy. Models at different sizes are commonly obtained through scaling techniques.
In this report, we empirically observe several important factors that motivate us to refurbish the YOLO framework: (1) Reparameterization from RepVGG [3] is a superior technique that is not yet well exploited in detection. We also notice that simple model scaling of RepVGG blocks becomes impractical, which leads us to consider that elegant consistency of network design between small and large models is unnecessary. The plain single-path architecture is a better choice for small networks, but for larger models the exponential growth of parameters and computation cost of the single-path architecture renders it infeasible. (2) Quantization of reparameterization-based detectors also requires meticulous treatment; otherwise the performance degradation caused by their heterogeneous configurations during training and inference is intractable. (3) Previous works [7, 10, 42, 44] tend to pay less attention to deployment: their latencies are commonly compared on high-cost machines like the V100, yet there is a hardware gap when it comes to real serving environments. Typically, low-power GPUs like the Tesla T4 are less costly and provide rather good inference performance. (4) Advanced domain-specific strategies like label assignment and loss function design need further verification considering the architectural variance. (5) For deployment, we can tolerate adjustments of the training strategy that improve accuracy without increasing inference cost, such as knowledge distillation.
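To make observation (1) concrete, the following is a minimal sketch of structural reparameterization in the RepVGG style: a block trains with parallel 3x3, 1x1 and identity branches, then collapses them into a single 3x3 convolution for inference. BatchNorm folding is omitted for brevity, and the class and method names are illustrative rather than taken from the official RepVGG or YOLOv6 code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepConvSketch(nn.Module):
    """Trains as 3x3 + 1x1 + identity; deploys as a single 3x3 conv."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.fused = None  # populated by reparameterize()

    def forward(self, x):
        if self.fused is not None:  # deploy mode: single path
            return F.relu(self.fused(x))
        # training mode: multi-branch sum, identity acting as a shortcut
        return F.relu(self.conv3(x) + self.conv1(x) + x)

    @torch.no_grad()
    def reparameterize(self):
        c = self.conv3.out_channels
        # Pad the 1x1 kernel to 3x3 so all branches share one kernel shape.
        kernel = self.conv3.weight + F.pad(self.conv1.weight, [1, 1, 1, 1])
        # The identity branch equals a 3x3 conv whose kernel is 1 at the
        # center of its own input channel and 0 elsewhere.
        identity = torch.zeros_like(kernel)
        identity[range(c), range(c), 1, 1] = 1.0
        self.fused = nn.Conv2d(c, c, 3, padding=1)
        self.fused.weight.copy_(kernel + identity)
        self.fused.bias.copy_(self.conv3.bias + self.conv1.bias)
```

Because convolution is linear in its input, the fused block reproduces the multi-branch outputs up to floating-point error, so inference pays for only one convolution per block.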
With the aforementioned observations in mind, we present YOLOv6, which achieves the best trade-off so far in terms of accuracy and speed. We show the comparison of YOLOv6 with other peers at a similar scale in Fig. 1. To boost inference speed without much performance degradation, we examine cutting-edge quantization methods, including post-training quantization (PTQ) and quantization-aware training (QAT), and accommodate them in YOLOv6 to achieve the goal of deployment-ready networks.
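As a rough illustration of the PTQ/QAT distinction, the sketch below uses PyTorch's eager-mode quantization API. It is not the YOLOv6 deployment pipeline: `model`, `calibration_loader` and `train_one_epoch` are placeholders, and eager mode additionally assumes the model wraps its quantized region in `QuantStub`/`DeQuantStub`.

```python
import torch
import torch.ao.quantization as tq

def ptq_int8(model: torch.nn.Module, calibration_loader):
    """Post-training quantization: observe activation ranges on a small
    calibration set, then convert weights/activations to int8."""
    model.eval()
    model.qconfig = tq.get_default_qconfig("fbgemm")  # x86 server backend
    prepared = tq.prepare(model)               # insert observers
    with torch.no_grad():
        for images, _ in calibration_loader:
            prepared(images)                   # record activation statistics
    return tq.convert(prepared)                # fold observers into int8 ops

def qat_int8(model: torch.nn.Module, train_loader, train_one_epoch):
    """Quantization-aware training: fine-tune with fake-quant nodes that
    simulate int8 rounding, then convert."""
    model.train()
    model.qconfig = tq.get_default_qat_qconfig("fbgemm")
    prepared = tq.prepare_qat(model)
    train_one_epoch(prepared, train_loader)    # adapt weights to quantization
    prepared.eval()
    return tq.convert(prepared)
```

For a reparameterization-based detector, either routine has to run on the deploy-mode (fused) graph; quantizing the multi-branch training graph while inferring with the single-path one would reintroduce the train/inference mismatch discussed above.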
We summarize the main aspects of YOLOv6 as follows:
• We refashion a line of networks of different sizes tailored for industrial applications in diverse scenarios. The architectures at different scales vary to achieve the best speed and accuracy trade-off, where small models feature a plain single-path backbone and large models are built on efficient multi-branch blocks.
• We imbue YOLOv6 with a self-distillation strategy, performed on both the classification task and the regression task. Meanwhile, we dynamically adjust the weight between the knowledge from the teacher and from the labels to help the student model learn more efficiently during all training phases (see the sketch after this list).
• We broadly verify advanced detection techniques for label assignment, loss functions and data augmentation, and adopt them selectively to further boost performance.
• We reform the quantization scheme for detection with the help of RepOptimizer [2] and channel-wise distillation [36], which leads to a fast and accurate detector with 43.3% COCO AP and a throughput of 869 FPS at a batch size of 32.
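A minimal sketch of the dynamically weighted self-distillation loss mentioned above is given below, for the classification branch only. The cosine schedule, temperature and function names are illustrative assumptions rather than the exact YOLOv6 recipe.

```python
import math
import torch.nn.functional as F

def distill_weight(epoch: int, total_epochs: int,
                   w_max: float = 1.0, w_min: float = 0.0) -> float:
    """Cosine decay of the distillation weight: rely more on the teacher's
    soft labels early in training and more on ground truth later."""
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return w_min + (w_max - w_min) * cosine

def self_distill_loss(student_logits, teacher_logits, det_loss,
                      epoch, total_epochs, tau: float = 1.0):
    """Detection loss plus a KL term between student and teacher class
    predictions, weighted by the current distillation weight."""
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits.detach() / tau, dim=-1),  # teacher frozen
        reduction="batchmean",
    ) * tau * tau
    return det_loss + distill_weight(epoch, total_epochs) * kd
```

In self-distillation the teacher is a pre-trained copy of the student itself, so the extra supervision improves accuracy without changing the deployed network or its inference cost.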
2. Method
The renovated design of YOLOv6 consists of the following components: network design, label assignment, loss function, data augmentation, industry-handy improvements, and quantization and deployment:
• Network Design: Backbone: Compared with other mainstream architectures, we find that RepVGG [3] backbones are equipped with more feature representation power in small networks at a similar inference speed, whereas they can hardly be scaled to obtain larger models due to the explosive growth of parameters and computational costs. In this regard, we take RepBlock [3] as the building block of our small networks. For large models, we revise a more efficient CSP [43] block, named the CSPStackRep Block. Neck: The neck of YOLOv6 adopts the PAN topology [24] following YOLOv4 and YOLOv5. We enhance the neck with RepBlocks or CSPStackRep Blocks to obtain Rep-PAN. Head: We simplify the decoupled head to make it more efficient, which we call the Efficient Decoupled Head.
• Label Assignment: We evaluate the recent progress of label assignment strategies [5, 7, 18, 48, 51] on YOLOv6 through numerous experiments, and the results indicate that TAL [5] is more effective and training-friendly.
• Loss Function: The loss functions of the mainstream
anchor-free object detectors contain classification loss,