Deep neural networks achieve outstanding performance in visual recognition tasks, yet their large number of parameters makes them less practical for real-world applications. Recently, one-shot pruning has emerged as an effective strategy for reducing model size without additional training. However, models trained with standard objective functions often suffer a significant drop in accuracy after aggressive pruning. Existing pruning-robust optimizers, such as SAM and CrAM, mitigate this accuracy drop by guiding the model toward flatter regions of the parameter space, but they inevitably incur non-negligible additional computation. We propose a Variance Amplifying Regularizer (VAR) that deliberately increases the variance of model parameters during training. Our study reveals an intriguing finding: parameters with higher variance exhibit greater robustness to pruning. VAR exploits this property by promoting such variance in the weight distribution, thereby mitigating the adverse effects of pruning. We further provide a theoretical analysis of its convergence behavior, supported by extensive empirical results demonstrating the superior pruning robustness of VAR.
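To make the idea concrete, below is a minimal PyTorch sketch of one plausible form of a variance-amplifying regularizer: the per-layer weight variance is subtracted from the task loss so that the optimizer is rewarded for increasing it. The coefficient `lam`, the restriction to weight matrices, and the exact penalty form are illustrative assumptions, not the paper's definitive formulation.

```python
# Sketch of a variance-amplifying regularization term (assumed form):
# loss = task_loss - lam * sum_of_per_parameter_weight_variances
import torch
import torch.nn as nn

def var_regularizer(model: nn.Module) -> torch.Tensor:
    """Sum of per-parameter weight variances (to be maximized during training)."""
    total = torch.zeros((), device=next(model.parameters()).device)
    for p in model.parameters():
        if p.dim() > 1:  # restrict to weight matrices/kernels (assumption)
            total = total + p.var()
    return total

# Illustrative training step; `model`, `x`, `y`, and `lam` are placeholders.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 1e-3

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = criterion(model(x), y) - lam * var_regularizer(model)  # amplify variance
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Unlike SAM- or CrAM-style optimizers, a penalty of this kind adds only a cheap term to the loss and requires no extra forward or backward passes per step, which is consistent with the computational motivation stated above.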