This paper presents four theoretical contributions that improve the usability of PAC-Bayes risk certificates for neural networks. First, two bounds on the KL divergence between Bernoulli distributions yield the tightest known explicit bounds on the true risk of a classifier, each covering a different range of empirical risk. Second, the paper formalizes an efficient methodology, based on implicit differentiation, for optimizing PAC-Bayesian risk certificates directly inside the objective function used to train the network. The final contribution is a method for optimizing bounds on non-differentiable objectives such as the 0-1 loss. These theoretical contributions are complemented by an empirical evaluation on the MNIST and CIFAR-10 datasets; notably, the paper presents the first non-vacuous generalization bounds for neural networks on CIFAR-10. Code to reproduce all experiments is available at github.com/Diegogpcm/pacbayesgradients.
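For context on the first contribution, the certificates in question build on the classical PAC-Bayes-kl bound (Langford and Seeger; with Maurer's constant), sketched below; this is standard background, not a result specific to this paper. The paper's explicit bounds come from relaxing the inversion of the binary kl.

```latex
% Standard PAC-Bayes-kl bound, stated for context (not a result of this paper).
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors Q over hypotheses:
\mathrm{kl}\!\left(\hat{R}_S(Q) \,\middle\|\, R(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\left(2\sqrt{n}/\delta\right)}{n},
\qquad
\mathrm{kl}(q \,\|\, p) \;=\; q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p},
% where kl is the KL divergence between Bernoulli(q) and Bernoulli(p).
% The risk certificate follows by inverting kl in its second argument:
%   R(Q) <= kl^{-1}(\hat{R}_S(Q), \epsilon),  \epsilon = the RHS above,
% and the paper's explicit bounds upper-bound this (closed-form-free) inverse.
```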
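To illustrate the implicit-differentiation idea behind the second contribution, here is a minimal sketch, assuming PyTorch; the names binary_kl and KLInverse are illustrative, not the repository's API. The inverse kl^{-1} has no closed form, so the forward pass computes it by bisection, and the backward pass applies the implicit function theorem to F(p, q, b) = kl(q||p) - b = 0 instead of unrolling the solver loop.

```python
# Hypothetical sketch of differentiating through kl^{-1} via the implicit
# function theorem (assumes q and b are tensors of the same shape).
import torch

def binary_kl(q, p):
    """kl(q||p) between Bernoulli(q) and Bernoulli(p); eps guards log(0)."""
    eps = 1e-12
    return (q * torch.log((q + eps) / (p + eps))
            + (1 - q) * torch.log((1 - q + eps) / (1 - p + eps)))

class KLInverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, q, b, iters=50):
        # Bisection for p* in [q, 1): kl(q||p) grows from 0 as p increases.
        lo, hi = q.clone(), torch.ones_like(q) - 1e-9
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            too_far = binary_kl(q, mid) > b
            hi = torch.where(too_far, mid, hi)
            lo = torch.where(too_far, lo, mid)
        p_star = 0.5 * (lo + hi)
        ctx.save_for_backward(q, p_star)
        return p_star

    @staticmethod
    def backward(ctx, grad_out):
        q, p = ctx.saved_tensors
        eps = 1e-12
        # Implicit function theorem on F(p, q, b) = kl(q||p) - b = 0:
        dF_dp = -q / (p + eps) + (1 - q) / (1 - p + eps)
        dF_dq = (torch.log((q + eps) / (p + eps))
                 - torch.log((1 - q + eps) / (1 - p + eps)))
        dp_dq = -dF_dq / dF_dp   # dp*/dq
        dp_db = 1.0 / dF_dp      # dp*/db, since dF/db = -1
        return grad_out * dp_dq, grad_out * dp_db, None

# Hypothetical usage: certificate = KLInverse.apply(emp_risk, bound_rhs),
# which can then appear inside a training objective and be backpropagated.
```

Treating the solver output as an implicit function keeps the backward pass exact and O(1) in the number of bisection steps, which is what makes placing the certificate inside the training objective affordable.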