Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning

PAC-Bayes is an analysis framework in which the training error is expressed as a weighted average over the hypotheses in the posterior distribution while incorporating prior knowledge. Beyond serving as a pure generalization-bound analysis tool, a PAC-Bayesian bound can also be incorporated into an objective function to train a probabilistic neural network, which makes it a powerful and relevant framework that can numerically provide a tight generalization bound for supervised learning. For simplicity, we refer to training a probabilistic neural network with objectives derived from PAC-Bayesian bounds as "PAC-Bayesian learning". Despite its empirical success, the theoretical analysis of PAC-Bayesian learning for neural networks has rarely been explored. This paper proposes a new class of convergence and generalization analysis for PAC-Bayesian learning when it is used to train over-parameterized neural networks by gradient descent. For a wide probabilistic neural network, we show that PAC-Bayesian learning converges to the solution of a kernel ridge regression whose kernel is the probabilistic neural tangent kernel (PNTK). Based on this finding, we further characterize a uniform PAC-Bayesian generalization bound that improves over the Rademacher complexity-based bound for non-probabilistic neural networks. Finally, drawing on the insight from our theoretical results, we propose a proxy measure for efficient hyperparameter selection, which is proven to be time-saving.
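
To illustrate how a PAC-Bayesian bound can serve as a training objective, here is a minimal sketch. It is not the paper's objective, which may differ. It assumes a McAllester-style bound with Maurer's constant, a loss bounded in [0, 1], and diagonal Gaussian prior and posterior distributions over the weights. The function names `kl_diag_gaussians` and `pac_bayes_objective` are hypothetical and chosen only for this example.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) between two diagonal Gaussians.

    Each argument is a list of per-weight means or standard deviations.
    Q plays the role of the posterior, P the prior (an assumption of this sketch).
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (
            math.log(sp / sq)
            + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
            - 0.5
        )
    return total


def pac_bayes_objective(empirical_risk, kl, n, delta=0.05):
    """Empirical risk plus a McAllester-style complexity term.

    empirical_risk: average loss of the posterior on n training samples, in [0, 1]
    kl:             KL(Q || P) between posterior and prior
    n:              number of training samples
    delta:          confidence parameter; the bound holds with prob. >= 1 - delta
    """
    complexity = math.sqrt(
        (kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    )
    return empirical_risk + complexity


if __name__ == "__main__":
    # Hypothetical two-weight example: standard normal prior, shifted posterior.
    prior_mu, prior_sigma = [0.0, 0.0], [1.0, 1.0]
    post_mu, post_sigma = [0.5, -0.3], [0.8, 0.9]

    kl = kl_diag_gaussians(post_mu, post_sigma, prior_mu, prior_sigma)
    bound = pac_bayes_objective(empirical_risk=0.10, kl=kl, n=10_000)
    print(f"KL(Q||P) = {kl:.4f}, objective value = {bound:.4f}")
```

Minimizing such an objective with respect to the posterior parameters trades off the fit to the data (the empirical risk) against the distance from the prior (the KL term), which is the general mechanism the abstract refers to.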
