Lottery Jackpots Exist in Pre-trained Models

Network pruning is an effective approach to reducing network complexity with an acceptable performance compromise. Existing studies achieve sparsity in neural networks via time-consuming weight training or complex searching on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing, sparse sub-networks that require no weight training, termed "lottery jackpots", exist in pre-trained models with unexpanded width. Furthermore, we improve the efficiency of searching for lottery jackpots from two perspectives. Firstly, we observe that the sparse masks derived from many existing pruning criteria have a high overlap with the searched mask of our lottery jackpot; among them, magnitude-based pruning yields the mask most similar to ours. In compliance with this insight, we initialize our sparse mask using magnitude-based pruning, resulting in at least a 3x cost reduction in the lottery jackpot search while achieving comparable or even better performance. Secondly, we conduct an in-depth analysis of the searching process for lottery jackpots. Our theoretical result suggests that the decrease in training loss during weight searching can be disturbed by the dependency between weights in modern networks. To mitigate this, we propose a novel short restriction method that restricts mask changes that may have a negative impact on the training loss. Consequently, our searched lottery jackpot removes 90% of the weights in ResNet-50 while easily obtaining more than 70% top-1 accuracy using only 5 searching epochs on ImageNet. Our code is available at https://github.com/zyxxmu/lottery-jackpots.
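
The mask initialization and the overlap comparison described above can be illustrated with a minimal sketch. It assumes a flat list of pre-trained weights and a global magnitude threshold; the function names (`magnitude_mask`, `mask_overlap`, `apply_mask`) are hypothetical and are not taken from the authors' repository. The mask search itself and the short restriction method are not shown.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    `weights` is a flat list of floats and `sparsity` is the fraction of
    weights to remove (an illustrative interface, not the authors' API).
    """
    n_keep = len(weights) - int(round(sparsity * len(weights)))
    # Indices sorted by descending absolute value; ties broken by index.
    order = sorted(range(len(weights)), key=lambda i: (-abs(weights[i]), i))
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


def mask_overlap(mask_a, mask_b):
    """Fraction of positions kept by mask_a that are also kept by mask_b."""
    kept_a = sum(mask_a)
    if kept_a == 0:
        return 0.0
    shared = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    return shared / kept_a


def apply_mask(weights, mask):
    """Zero out pruned weights; the kept weights are left unchanged."""
    return [w * m for w, m in zip(weights, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for pre-trained weights; real weights would come from a model.
    pretrained = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    init_mask = magnitude_mask(pretrained, sparsity=0.9)
    sparse_weights = apply_mask(pretrained, init_mask)
    print("kept weights:", sum(init_mask))
    print("overlap with itself:", mask_overlap(init_mask, init_mask))
```

In this sketch the weight values are never updated; only the binary mask decides which weights stay active. A search procedure would start from `init_mask` and modify the mask, and `mask_overlap` is one simple way to compare the starting mask with the searched one.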
