Gradient optimization algorithms using epochs, that is, those based on stochastic gradient descent without replacement (SGDo), are predominantly used to train machine learning models in practice. However, the mathematical theory of SGDo and related algorithms remains underexplored compared to that of their "with replacement" and "one-pass" counterparts. In this article, we propose a stochastic, continuous-time approximation to SGDo with additive noise, based on a Young differential equation driven by a stochastic process we call an "epoched Brownian motion". We demonstrate its usefulness by proving the almost sure convergence of the continuous-time approximation for strongly convex objectives and learning rate schedules of the form $u_t = \frac{1}{(1+t)^{\beta}}$, $\beta \in (0,1)$. Moreover, we compute an upper bound on the asymptotic rate of almost sure convergence, which is as good as or better than previous results for SGDo.
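For concreteness, the following is a minimal sketch of the discrete algorithm being approximated: without-replacement SGD over epochs with the step-size schedule $u_t = (1+t)^{-\beta}$ indexed by the global iteration count. The quadratic toy objective, the function name `sgdo`, and all parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgdo(grad, x0, n_samples, n_epochs, beta=0.5, rng=None):
    """Without-replacement SGD (SGDo): each epoch visits every sample
    exactly once in a fresh random order, with step size
    u_t = 1 / (1 + t)**beta evaluated at the global iteration count t.
    (Illustrative sketch; not the paper's continuous-time object.)"""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    t = 0  # global iteration counter across all epochs
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):  # one pass, no replacement
            u_t = 1.0 / (1.0 + t) ** beta     # schedule from the abstract
            x = x - u_t * grad(x, i)
            t += 1
    return x

# Toy usage: least-squares objective on synthetic data (assumed example).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 3)), rng.normal(size=100)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]   # per-sample gradient
x_hat = sgdo(grad_i, np.zeros(3), n_samples=100, n_epochs=50, beta=0.7)
```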