Surrogate Gap Minimization Improves Sharpness-Aware Training

The recently proposed Sharpness-Aware Minimization (SAM) improves generalization by minimizing a \textit{perturbed loss}, defined as the maximum loss within a neighborhood in the parameter space. However, we show that both sharp and flat minima can have a low perturbed loss, implying that SAM does not always prefer flat minima. Instead, we define a \textit{surrogate gap}, a measure equivalent to the dominant eigenvalue of the Hessian at a local minimum when the radius of the neighborhood (used to define the perturbed loss) is small. The surrogate gap is easy to compute and feasible to minimize directly during training. Based on these observations, we propose Surrogate \textbf{G}ap Guided \textbf{S}harpness-\textbf{A}ware \textbf{M}inimization (GSAM), a novel improvement over SAM with negligible computation overhead. Conceptually, GSAM consists of two steps: 1) a gradient descent step, as in SAM, to minimize the perturbed loss, and 2) an \textit{ascent} step in the \textit{orthogonal} direction (after gradient decomposition) to minimize the surrogate gap without affecting the perturbed loss. GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities. Theoretically, we establish the convergence of GSAM and show that it provably generalizes better than SAM. Empirically, GSAM consistently improves generalization (e.g., +3.2\% over SAM and +5.4\% over AdamW on ImageNet top-1 accuracy for ViT-B/32). Code is released at \url{https://sites.google.com/view/gsam-iclr22/home}.
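
The two conceptual steps can be illustrated with a minimal sketch on a toy quadratic loss. This is not the released implementation. Several details are assumptions not spelled out in the abstract: the surrogate gap is taken to be the perturbed loss minus the original loss, the perturbed-loss gradient is approximated as in SAM by evaluating the gradient at a first-order worst-case point, and the quantity being decomposed is the gradient of the original loss, split into components parallel and orthogonal to the perturbed-loss gradient. The helper names (`decompose`, `perturb`, `surrogate_gap`, `gsam_step`) and the values of `lr`, `rho`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def decompose(g, g_p, eps=1e-12):
    """Split g into components parallel and orthogonal to g_p."""
    scale = dot(g, g_p) / (dot(g_p, g_p) + eps)
    g_par = [scale * x for x in g_p]
    g_orth = [x - y for x, y in zip(g, g_par)]
    return g_par, g_orth

def perturb(w, grad_fn, rho, eps=1e-12):
    """First-order approximation of the worst-case point within radius rho."""
    g = grad_fn(w)
    norm = math.sqrt(dot(g, g)) + eps
    return [wi + rho * gi / norm for wi, gi in zip(w, g)]

def surrogate_gap(w, loss_fn, grad_fn, rho):
    """Assumed definition: (approximate) perturbed loss minus original loss."""
    return loss_fn(perturb(w, grad_fn, rho)) - loss_fn(w)

def gsam_step(w, grad_fn, lr=0.1, rho=0.05, alpha=0.1):
    g = grad_fn(w)                           # gradient of the original loss
    g_p = grad_fn(perturb(w, grad_fn, rho))  # SAM-style perturbed-loss gradient
    _, g_orth = decompose(g, g_p)            # part of g orthogonal to g_p
    # Step 1: descend along g_p. Step 2: ascend along g_orth, which to first
    # order leaves the perturbed loss unchanged while shrinking the gap.
    direction = [gp - alpha * go for gp, go in zip(g_p, g_orth)]
    return [wi - lr * di for wi, di in zip(w, direction)]

# Toy loss with one sharp and one flat direction (illustrative only).
CURV = [10.0, 1.0]

def loss_fn(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURV, w))

def grad_fn(w):
    return [c * x for c, x in zip(CURV, w)]

w = [1.0, 1.0]
for _ in range(50):
    w = gsam_step(w, grad_fn)
print(loss_fn(w), surrogate_gap(w, loss_fn, grad_fn, rho=0.05))
```

Setting `alpha=0.0` in this sketch removes the orthogonal ascent term, so the update reduces to a plain SAM-style step along the perturbed-loss gradient.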
