Learning to Generalize Provably in Learning to Optimize

Learning to optimize (L2O), which automates the design of optimizers through data-driven approaches, has gained increasing popularity. However, current L2O methods often generalize poorly in at least two respects: (i) when the L2O-learned optimizer is applied to unseen optimizees, measured by how far it lowers their loss function values (optimizer generalization, or "generalizable learning of optimizers"); and (ii) in the test performance of an optimizee (itself a machine learning model) trained by the optimizer, measured by accuracy on unseen data (optimizee generalization, or "learning to generalize"). While optimizer generalization has been studied recently, optimizee generalization has not been rigorously studied in the L2O context; that is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, and hence unify their roles in the handcrafted design of generalizable optimizers as equivalent metrics of the flatness of the loss landscape. We then propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers to learn to generalize, and theoretically show that such generalization ability can be learned during the L2O meta-training process and then transferred to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals, with substantially improved generalization on multiple sophisticated L2O models and diverse optimizees. Our code is available at: https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy.
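To make the two flatness metrics concrete, the following is a minimal sketch on a toy quadratic optimizee, not the implementation from the linked repository. All names (`quadratic_loss`, `hessian_trace`, `local_entropy`, `regularized_loss`) are hypothetical. The Hessian trace is used here as one possible scalar summary of curvature, which is an assumption, and the local entropy is estimated by Monte Carlo sampling only up to an additive constant.

```python
import math
import random


def quadratic_loss(theta, curvature):
    """Toy optimizee loss f(theta) = 0.5 * curvature * ||theta||^2 (an assumption)."""
    return 0.5 * curvature * sum(t * t for t in theta)


def hessian_trace(loss, theta, eps=1e-3):
    """Approximate the Hessian trace with central second differences per axis."""
    base = loss(theta)
    trace = 0.0
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        trace += (loss(plus) - 2.0 * base + loss(minus)) / (eps * eps)
    return trace


def local_entropy(loss, theta, gamma=1.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of log E[exp(-loss(theta'))], theta' ~ N(theta, I/gamma).

    This matches the local entropy up to an additive constant that depends
    only on gamma and the dimension, so values are comparable across losses.
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(gamma)
    neg_losses = []
    for _ in range(n_samples):
        sample = [t + rng.gauss(0.0, std) for t in theta]
        neg_losses.append(-loss(sample))
    # Log-mean-exp, shifted by the maximum for numerical stability.
    shift = max(neg_losses)
    mean_exp = sum(math.exp(v - shift) for v in neg_losses) / n_samples
    return shift + math.log(mean_exp)


def regularized_loss(loss, theta, lam=0.1):
    """Loss plus a curvature penalty: sharper minima are penalized more."""
    return loss(theta) + lam * hessian_trace(loss, theta)


def flat(theta):
    return quadratic_loss(theta, 0.5)


def sharp(theta):
    return quadratic_loss(theta, 5.0)


origin = [0.0, 0.0, 0.0]
print("Hessian trace (flat, sharp):", hessian_trace(flat, origin), hessian_trace(sharp, origin))
print("Local entropy (flat, sharp):", local_entropy(flat, origin), local_entropy(sharp, origin))
```

Both losses attain the same minimum value at the origin, yet the flatter one has the smaller Hessian trace and the larger local entropy, so the two metrics rank the minima the same way. In a regularized objective the Hessian term would be added as a penalty, whereas the local entropy would be subtracted (or its negative added), since larger local entropy indicates a flatter region.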
