A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum

This paper proposes a new algorithm, the Single-timescale double-momentum Stochastic Approximation (SUSTAIN), for tackling unconstrained bilevel optimization problems. We focus on stochastic bilevel problems where the lower-level subproblem is strongly convex and the upper-level objective function is smooth. Unlike prior works, which rely on two-timescale or double-loop techniques to track the optimal solution of the lower-level subproblem, we design a stochastic momentum-assisted gradient estimator for both the upper- and lower-level updates. This estimator allows us to gradually control the error in the stochastic gradient updates caused by inexact solutions to both subproblems. We show that if the upper-level objective function is smooth but possibly non-convex (resp. strongly convex), SUSTAIN requires $\mathcal{O}(\epsilon^{-3/2})$ (resp. $\mathcal{O}(\epsilon^{-1})$) iterations, each using a constant number of samples, to find an $\epsilon$-stationary (resp. $\epsilon$-optimal) solution. An $\epsilon$-stationary (resp. $\epsilon$-optimal) solution is a point at which the squared norm of the gradient of the upper-level (outer) function (resp. the gap between the outer function value and the optimal objective value) is at most $\epsilon$. The total number of stochastic gradient samples required for the upper- and lower-level objective functions matches the best-known sample complexity of single-level stochastic gradient descent algorithms.
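The abstract does not spell out the update rules, so the following is only a hedged sketch of the "double-momentum, single-timescale" idea, not the paper's exact SUSTAIN iteration (its iterate indexing and step-size schedules may differ). It assumes the standard formulation $\min_x F(x) = f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x, y)$, and applies a STORM-style recursive-momentum estimator to both the lower-level gradient $\nabla_y g$ and the implicit-function surrogate of $\nabla F$. The scalar quadratic toy problem, the function names `grad_y_g`, `hypergrad_f`, `double_momentum`, and the parameters `alpha`, `beta`, `eta`, `sigma` are all hypothetical choices for illustration.

```python
import random


def grad_y_g(x, y, zeta):
    # Hypothetical toy lower level: g(x, y; zeta) = 0.5*(y - x)**2 + zeta*y.
    # Its mean is 1-strongly convex in y, with minimizer y*(x) = x.
    return (y - x) + zeta


def hypergrad_f(x, y, xi):
    # Hypothetical toy upper level: f(x, y; xi) = 0.5*(y - 1)**2 + 0.5*x**2 + xi*x.
    # Implicit-gradient surrogate: grad_x f - (d2g/dxdy) * (d2g/dy2)^(-1) * grad_y f
    #                            = (x + xi) - (-1) * 1 * (y - 1).
    return (x + xi) + (y - 1.0)


def double_momentum(T=3000, alpha=0.05, beta=0.1, eta=0.1, sigma=0.1, seed=0):
    """Single-timescale loop with a momentum estimator at each level (a sketch)."""
    rng = random.Random(seed)
    x, y = 2.0, -1.0
    x_prev, y_prev = x, y
    h_g = h_f = 0.0
    for t in range(T):
        zeta = rng.gauss(0.0, sigma)  # lower-level sample
        xi = rng.gauss(0.0, sigma)    # upper-level sample
        if t == 0:
            # No previous iterate yet: start from plain stochastic gradients.
            h_g = grad_y_g(x, y, zeta)
            h_f = hypergrad_f(x, y, xi)
        else:
            # Recursive momentum: evaluate the SAME sample at the current and the
            # previous iterate; in this toy model the difference carries no noise.
            h_g = grad_y_g(x, y, zeta) + (1 - eta) * (h_g - grad_y_g(x_prev, y_prev, zeta))
            h_f = hypergrad_f(x, y, xi) + (1 - eta) * (h_f - hypergrad_f(x_prev, y_prev, xi))
        x_prev, y_prev = x, y
        # Single timescale: both levels take one step per iteration with fixed
        # step sizes; there is no inner loop solving the lower level to accuracy.
        y = y - beta * h_g
        x = x - alpha * h_f
    return x, y
```

For this toy problem $F(x) = \tfrac12 (x-1)^2 + \tfrac12 x^2$, so `double_momentum()` should drive both `x` and `y` toward 0.5. With `sigma=0.0` the momentum correction vanishes and the loop reduces to plain deterministic gradient steps on both levels.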

翻译：本文建议一种新的算法 -- -- 单一时间级双色双色软化缩略图( Sustain) -- -- 用于解决未受限制的双级优化问题。我们侧重于低级别子问题极强的双级问题, 而上级目标函数是平滑的。与以往依赖双度或双圈技术来跟踪较低级别子问题的最佳解决方案的工程不同, 我们为上级和下级更新设计一个随机性动动动助梯度梯度估计值。后者允许我们逐渐控制双级优化的梯度更新错误, 原因是对两个子问题的解决办法不准确。我们显示, 如果上级目标函数平滑, 但可能非convex( 强烈的convex), SUStain 需要 $mathcal{O} (\ epsilon%-3/2} $( respest septrial slickral $) 和 legal- sal- develop leal- sal- pal exal- exal exal exmodeal $.