Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training

Parallel training of neural networks at scale is challenging because of significant communication overheads. Recently, deep learning researchers have developed a variety of pruning algorithms that can prune (i.e., set to zero) 80-90% of the parameters in a neural network, yielding sparse subnetworks that match the accuracy of the unpruned parent network. In this work, we propose a novel approach that exploits these sparse subnetworks to optimize memory utilization and communication in two popular algorithms for parallel deep learning, namely data parallelism and inter-layer parallelism. We integrate our approach into AxoNN, a highly scalable framework for parallel deep learning that relies on data and inter-layer parallelism, and demonstrate reductions in both communication time and memory utilization. On 512 NVIDIA V100 GPUs, our optimizations reduce the memory consumption of a 2.7-billion-parameter model by 74% and the total communication time by 40%, providing an overall speedup of 34% over AxoNN, 32% over DeepSpeed-3D, and 46% over Sputnik, a sparse matrix computation baseline.
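
The abstract does not say which pruning algorithm or which communication scheme is used, so the following is only a minimal pure-Python sketch of why a fixed sparse subnetwork can shrink data-parallel communication. It assumes global magnitude pruning (a common technique, swapped in here for illustration) produces a mask, that the mask stays fixed and identical on every data-parallel rank, and that the all-reduce can be simulated by summing lists. Under those assumptions each rank can pack its gradient down to the unpruned entries and exchange a dense buffer with no index metadata. The function names (`magnitude_mask`, `apply_mask`, `pack`, `unpack`, `simulated_allreduce`) are hypothetical and are not AxoNN's API.

```python
"""Sketch: magnitude pruning plus mask-aware gradient packing for data parallelism.

Assumptions (not taken from the paper): global magnitude pruning stands in for
"a pruning algorithm", the mask is fixed after pruning and shared by all ranks,
and the all-reduce is simulated by summing Python lists.
"""
from typing import List


def magnitude_mask(weights: List[float], sparsity: float) -> List[bool]:
    """Return True for weights to keep; the smallest-|w| `sparsity` fraction is pruned."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by |w| ascending; the first n_prune are pruned (set to zero).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(weights))]


def apply_mask(values: List[float], mask: List[bool]) -> List[float]:
    """Zero out pruned positions, yielding the sparse subnetwork's weights."""
    return [v if keep else 0.0 for v, keep in zip(values, mask)]


def pack(values: List[float], mask: List[bool]) -> List[float]:
    """Gather only unpruned entries into a dense buffer.

    Because every rank holds the same mask, no indices need to be sent.
    """
    return [v for v, keep in zip(values, mask) if keep]


def unpack(packed: List[float], mask: List[bool]) -> List[float]:
    """Scatter a packed buffer back to full length, with zeros at pruned positions."""
    it = iter(packed)
    return [next(it) if keep else 0.0 for keep in mask]


def simulated_allreduce(buffers: List[List[float]]) -> List[float]:
    """Element-wise sum across ranks, standing in for a real all-reduce."""
    return [sum(col) for col in zip(*buffers)]


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.02, -0.7, 0.01, 0.3, -0.04, 0.08, 0.6, -0.03]
    mask = magnitude_mask(weights, sparsity=0.6)
    sparse_weights = apply_mask(weights, mask)

    # Made-up per-rank gradients for a 2-way data-parallel job.
    grads = [[0.1 * (r + 1) * (i + 1) for i in range(len(weights))] for r in range(2)]

    packed = [pack(g, mask) for g in grads]
    reduced = unpack(simulated_allreduce(packed), mask)

    print("kept:", sum(mask), "of", len(mask))
    print("elements sent per rank:", len(packed[0]), "instead of", len(weights))
    print("sparse weights:", sparse_weights)
    print("reduced gradient:", reduced)
```

In this sketch the per-rank message shrinks in proportion to the pruned fraction, so at the 80-90% sparsity mentioned above it would carry only 10-20% of the dense gradient. The same packed layout could also be used to store only the unpruned parameters, which is one plausible route to lower memory use; the paper's actual mechanism may differ.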

