Deep neural networks (DNNs) achieve remarkable performance but often suffer from overfitting due to their high capacity. We introduce Momentum-Adaptive Gradient Dropout (MAGDrop), a novel regularization method that dynamically adjusts activation-level dropout rates based on current gradients and accumulated momentum, enhancing stability in non-convex optimization landscapes. To theoretically justify MAGDrop's effectiveness, we derive a non-asymptotic, numerically computable PAC-Bayes generalization bound that accounts for its adaptive nature; by leveraging momentum-driven perturbation control, the bound is up to 29.2\% tighter than standard approaches. Empirically, activation-based MAGDrop achieves competitive accuracy on MNIST (99.52\%) and CIFAR-10 (92.03\%), with generalization gaps of 0.48\% and 6.52\%, respectively. We provide fully reproducible code and numerical evaluations of our bound to validate the theoretical claims. Our work bridges theoretical insight and practical advancement, offering a robust framework for improving DNN generalization in high-stakes applications.
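To make the mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of how activation-level dropout rates might be modulated by a momentum-smoothed gradient signal. The class name \texttt{MAGDropSketch}, the momentum coefficient \texttt{beta}, and the rate-scaling rule are illustrative assumptions for exposition only, not the paper's exact formulation.

\begin{verbatim}
import torch
import torch.nn as nn

class MAGDropSketch(nn.Module):
    """Hypothetical sketch of momentum-adaptive gradient dropout.

    Assumption: dropout probabilities are raised for activations whose
    momentum-smoothed gradient magnitude is large, clipped to [p_min, p_max].
    The exact MAGDrop update rule may differ.
    """
    def __init__(self, base_p=0.5, beta=0.9, p_min=0.1, p_max=0.9):
        super().__init__()
        self.base_p = base_p
        self.beta = beta              # momentum coefficient (assumed value)
        self.p_min, self.p_max = p_min, p_max
        self.momentum = None          # running gradient-magnitude estimate

    def update_momentum(self, grad):
        # Exponential moving average of |grad| per activation unit,
        # averaged over the batch dimension (called e.g. from a backward hook).
        g = grad.abs().mean(dim=0).detach()
        if self.momentum is None:
            self.momentum = g
        else:
            self.momentum = self.beta * self.momentum + (1 - self.beta) * g

    def forward(self, x):
        if not self.training:
            return x
        if self.momentum is None:
            # No gradient history yet: fall back to a constant base rate.
            p = torch.full(x.shape[1:], self.base_p, device=x.device)
        else:
            # Scale the dropout rate with normalized momentum (assumed form).
            scale = self.momentum / (self.momentum.mean() + 1e-8)
            p = (self.base_p * scale).clamp(self.p_min, self.p_max)
        mask = torch.bernoulli(1.0 - p).to(x.device)
        return x * mask / (1.0 - p)   # inverted-dropout rescaling
\end{verbatim}

In this sketch, \texttt{update\_momentum} would be fed the layer's activation gradients during backpropagation, so that units with persistently large gradients receive higher drop probabilities at the next forward pass.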