[NeurIPS 2022] Leveraging Inter-Layer Dependency for Post-Training Quantization

连理o

Published 2024-11-28 23:31:42

Tags: NeurIPS 2022

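Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.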

Article link: https://blog.csdn.net/weixin_42437114/article/details/144114901

Contents

Introduction
Method
Experiments
References

Introduction

The authors propose Network-Wise Quantization (NWQ), an end-to-end PTQ training strategy, and improve AdaRound with Annealing Softmax (ASoftmax) and Annealing Mixup (AMixup), which make training easier to converge.

Method

Activation Regularization (AR). The quantization loss of every block is optimized end-to-end across the whole network, rather than layer-wise or block-wise.
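To make the difference between network-wise and block-wise optimization concrete, here is a minimal toy sketch. It is not the paper's implementation: the scalar "blocks" (multiply, then ReLU), the squared-error loss, the unweighted sum over blocks, and the function names are all assumptions made for illustration.

```python
def forward(weights, x):
    """Run x through a chain of toy scalar blocks (multiply, then ReLU).

    Returns the output of every block, not just the last one.
    """
    acts = []
    for w in weights:
        x = max(w * x, 0.0)
        acts.append(x)
    return acts


def network_wise_loss(fp_weights, q_weights, x):
    """Assumed network-wise objective: sum of per-block squared errors,
    where the quantized activations come from ONE forward pass through the
    whole quantized network, so earlier errors reach later blocks."""
    fp_acts = forward(fp_weights, x)
    q_acts = forward(q_weights, x)
    return sum((a - b) ** 2 for a, b in zip(fp_acts, q_acts))


def block_wise_losses(fp_weights, q_weights, x):
    """Baseline for comparison: each quantized block is fed the
    full-precision input, so blocks are scored in isolation."""
    losses = []
    for w_fp, w_q in zip(fp_weights, q_weights):
        target = max(w_fp * x, 0.0)
        losses.append((target - max(w_q * x, 0.0)) ** 2)
        x = target  # the next block again sees the full-precision activation
    return losses


if __name__ == "__main__":
    fp = [1.0, 2.0]
    q = [1.5, 2.0]  # only the first block carries a quantization error
    print(network_wise_loss(fp, q, 1.0))
    print(block_wise_losses(fp, q, 1.0))
```

In this toy case the second block has no error of its own, so its block-wise loss is zero, while the network-wise loss still charges it for the error inherited from the first block. That inter-block dependency is what the end-to-end objective exposes to the optimizer.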
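Annealing Softmax (ASoftmax). Like AdaRound, it uses adaptive rounding, but with Softmax in place of Sigmoid. This extends the rounding range from 0~1 to $n$ ~ $m$ , at the cost of increasing the number of trainable parameters to $m - n + 1$ times the original. (The authors use $n = 0, m = 1$ by default, so the benefit of ASoftmax most likely comes from its second difference from AdaRound, namely faster convergence. If $m, n$ are extended, the larger number of trainable parameters makes the model prone to overfitting when calibration data is scarce.)
In addition, AdaRound uses a regularization term to push $h(\mathbf V)$ toward 0/1, whereas the authors argue that this regularizer actually conflicts with the quantization loss (which pushes $h(\mathbf V)$ toward $\frac{\mathbf w}{s}-\lfloor\frac{\mathbf w}{s}\rfloor$ ), making it hard for AdaRound to converge. Instead, they rely on the softmax temperature to help the model converge.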
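Here, $\tau^t$ is the temperature at iteration $t$ , which decays linearly from 1 to 0.01. The authors also give an initialization strategy for $\mathbf V_i$ , $\mathbf V_i=\log(\sigma'(\mathbf V)_i)$ , which keeps the initial rounding as close as possible to the original weights; see Appendix A of the paper for the proof.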
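The temperature annealing can be sketched in a few lines of plain Python for the default case $n = 0, m = 1$. This is an illustration under stated assumptions, not the authors' code: the soft rounding offset is assumed to be the softmax-weighted sum of the candidate offsets, and the initial logits are assumed to be the logs of target probabilities chosen so that the initial offset equals the fractional part of $\frac{w}{s}$. The function names and the `eps` clamp are hypothetical.

```python
import math


def linear_anneal(start, end, t, total):
    """Linearly interpolate from `start` at t=0 to `end` at t=total."""
    frac = min(max(t / total, 0.0), 1.0)
    return start + (end - start) * frac


def softmax(logits, tau):
    """Numerically stable softmax with temperature tau."""
    scaled = [v / tau for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def soft_offset(logits, tau, n=0):
    """Assumed form of h(V): expected rounding offset over candidates n..m."""
    probs = softmax(logits, tau)
    return sum((n + k) * p for k, p in enumerate(probs))


def init_logits(w, s, eps=1e-6):
    """Assumed initialization for n=0, m=1: V_i = log(target probability_i),
    so that at tau=1 the soft offset equals frac(w/s)."""
    frac = w / s - math.floor(w / s)
    frac = min(max(frac, eps), 1.0 - eps)  # avoid log(0)
    return [math.log(1.0 - frac), math.log(frac)]


if __name__ == "__main__":
    logits = init_logits(w=0.73, s=0.1)
    total_iters = 4
    for t in range(total_iters + 1):
        tau = linear_anneal(1.0, 0.01, t, total_iters)
        print(t, round(tau, 4), soft_offset(logits, tau))
```

With fixed logits, lowering the temperature alone moves the soft offset from the fractional part toward a hard 0 or 1, so no extra regularizer is needed to force a discrete rounding decision. In actual training the logits would also be updated by gradient descent, which this sketch omits.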
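Annealing Mixup (AMixup). Mixup is used to blend the full-precision model's output with the quantized model's output, and the blend serves as the optimization target $a_l$ in AR. The share of the full-precision output at iteration $t$ decays linearly from $P_s=0.5$ to $P_e=0$ , which helps the model converge.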
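The AMixup schedule described above can be sketched as follows. The convex-combination form of the mix, the use of plain Python lists in place of activation tensors, and the function names are assumptions for illustration; only the endpoints $P_s=0.5$ and $P_e=0$ and the linear decay come from the description above.

```python
def amixup_ratio(t, total, p_start=0.5, p_end=0.0):
    """Share of the full-precision output at iteration t (linear decay)."""
    frac = min(max(t / total, 0.0), 1.0)
    return p_start + (p_end - p_start) * frac


def amixup(fp_out, q_out, t, total):
    """Assumed mix: p * full-precision output + (1 - p) * quantized output."""
    p = amixup_ratio(t, total)
    return [p * f + (1.0 - p) * q for f, q in zip(fp_out, q_out)]


if __name__ == "__main__":
    fp_out = [2.0, 4.0]  # toy full-precision activations
    q_out = [0.0, 0.0]   # toy quantized activations
    for t in (0, 50, 100):
        print(t, amixup_ratio(t, 100), amixup(fp_out, q_out, t, 100))
```

Early in training the mix is half full-precision output, and by the last iteration it consists of the quantized output alone.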

Experiments

Comprehensive Comparison.
Ablation Study. (1) AR.
(2) ASoftmax.
(3) AMixup.

References

Zheng, DanDan, Yuanliu Liu, and Liang Li. “Leveraging inter-layer dependency for post-training quantization.” Advances in Neural Information Processing Systems 35 (2022): 6666-6679.
