We study the statistical properties of the dynamic trajectory of stochastic gradient descent (SGD). We approximate mini-batch SGD and momentum SGD as stochastic differential equations (SDEs). We exploit the continuous formulation of the SDEs and the theory of Fokker-Planck equations to develop new results on the escaping phenomenon and its relationship to large batch sizes and sharp minima. In particular, we find that the solution of the stochastic process tends to converge to flatter minima regardless of the batch size in the asymptotic regime. However, the convergence rate is rigorously proven to depend on the batch size. These results are validated empirically on various datasets and models.
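To make the SDE view of mini-batch SGD concrete, the following is a minimal sketch, not the construction used in this work. It assumes a one-dimensional quadratic loss and a constant, isotropic gradient-noise standard deviation (the actual approximation most likely uses a state-dependent noise covariance). Under those assumptions, SGD with learning rate `lr` and batch size `batch_size` is modeled by d(theta) = -L'(theta) dt + sqrt(lr * sigma^2 / B) dW and discretized with Euler-Maruyama using dt = lr. All function and parameter names (`noise_scale`, `simulate_sde`, `quadratic_grad`, `grad_noise_std`) are hypothetical and chosen for illustration.

```python
import math
import random


def noise_scale(lr, grad_noise_std, batch_size):
    """Diffusion coefficient sqrt(lr * sigma^2 / B) of the assumed SDE."""
    return math.sqrt(lr * grad_noise_std ** 2 / batch_size)


def simulate_sde(grad, theta0, lr, grad_noise_std, batch_size, n_steps, seed=0):
    """Euler-Maruyama path of d(theta) = -grad(theta) dt + s dW, with dt = lr.

    Assumption: the gradient noise is Gaussian with a constant standard
    deviation, which is a simplification made for this sketch only.
    """
    rng = random.Random(seed)
    s = noise_scale(lr, grad_noise_std, batch_size)
    dt = lr
    theta = theta0
    path = [theta]
    for _ in range(n_steps):
        # Brownian increment over one step: N(0, dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        theta = theta - grad(theta) * dt + s * dw
        path.append(theta)
    return path


def quadratic_grad(curvature):
    """Gradient of the toy loss L(theta) = 0.5 * curvature * theta^2."""
    return lambda theta: curvature * theta


if __name__ == "__main__":
    # Compare the spread of the trajectory around the minimum for several
    # batch sizes; larger batches give a smaller diffusion coefficient.
    for batch_size in (8, 64, 512):
        path = simulate_sde(quadratic_grad(1.0), 1.0, 0.1, 1.0, batch_size, 20000)
        tail = path[len(path) // 2:]  # discard the transient first half
        mean = sum(tail) / len(tail)
        var = sum((x - mean) ** 2 for x in tail) / len(tail)
        print(f"batch_size={batch_size:4d}  empirical variance={var:.5f}")
```

In this toy model the continuous-time process is an Ornstein-Uhlenbeck process, whose stationary variance is lr * sigma^2 / (2 * curvature * B), so for a small learning rate the printed variances should shrink roughly in proportion to 1 / B. This illustrates only how the batch size enters the noise term of such an SDE; it does not reproduce the escaping or convergence-rate results stated above.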