Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

Quantized neural networks typically have a smaller memory footprint and lower computational complexity, which is crucial for efficient deployment. However, quantization inevitably makes the weight distribution diverge from that of the original network, which generally degrades performance. Considerable effort has gone into this issue, but most existing approaches lack statistical considerations and depend on several manual configurations. In this paper, we present an adaptive-mapping quantization method that learns an optimal latent sub-distribution that is inherent within models and smoothly approximated with a concrete Gaussian Mixture (GM). In particular, the network weights are projected in compliance with the GM-approximated sub-distribution. This sub-distribution evolves along with the weight update in a co-tuning scheme guided by direct optimization of the task objective. Extensive experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method. In addition, an efficient deployment flow for mobile CPUs is developed, achieving up to 7.46$\times$ inference acceleration on an octa-core ARM CPU. Code has been publicly released on GitHub (https://github.com/RunpeiDong/DGMS).
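The projection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `gm_soft_quantize`, its parameters, and the use of a temperature-scaled softmax over component responsibilities are assumptions chosen for illustration. The sketch assigns each weight to Gaussian mixture components by posterior responsibility and replaces it with a mixture of the component means, which approaches a hard assignment as the temperature shrinks.

```python
import numpy as np

def gm_soft_quantize(weights, means, stds, priors, temperature=0.01):
    """Project weights onto the means of a 1-D Gaussian mixture.

    Hypothetical helper, not taken from the DGMS code base.
    Returns (soft_w, hard_w): a smooth projection and its hard counterpart.
    """
    weights = np.asarray(weights, dtype=np.float64)
    means = np.asarray(means, dtype=np.float64)
    stds = np.asarray(stds, dtype=np.float64)
    priors = np.asarray(priors, dtype=np.float64)

    w = weights.reshape(-1, 1)  # shape (N, 1), broadcasts against (K,)

    # Log of prior * Gaussian density for every (weight, component) pair.
    log_joint = (np.log(priors) - np.log(stds) - 0.5 * np.log(2.0 * np.pi)
                 - 0.5 * ((w - means) / stds) ** 2)  # shape (N, K)

    # Normalize to posterior responsibilities (stable softmax over components).
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    resp /= resp.sum(axis=1, keepdims=True)

    # Temperature-scaled softmax sharpens the responsibilities; as the
    # temperature approaches 0 the assignment approaches one-hot.
    logits = resp / temperature
    logits -= logits.max(axis=1, keepdims=True)
    soft = np.exp(logits)
    soft /= soft.sum(axis=1, keepdims=True)

    soft_w = (soft @ means).reshape(weights.shape)
    hard_w = means[resp.argmax(axis=1)].reshape(weights.shape)
    return soft_w, hard_w

# Illustrative values only: three components, i.e. three quantization levels.
w = np.array([-0.52, -0.01, 0.03, 0.47])
soft_w, hard_w = gm_soft_quantize(
    w, means=[-0.5, 0.0, 0.5], stds=[0.1, 0.1, 0.1], priors=[1 / 3, 1 / 3, 1 / 3]
)
print(soft_w, hard_w)
```

In a training setting the mixture parameters would presumably be learnable and updated together with the weights under the task loss, with the soft projection supplying gradients; the sketch covers only the forward projection.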

