Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers that cause delays. One solution studied in the literature is to wait at each iteration for the responses of only the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive $k$, i.e., varying $k$ throughout the runtime of the algorithm. We first derive an upper bound on the error as a function of the wall-clock time, and use it to design an adaptive policy for varying $k$ that optimizes this trade-off. Then, we propose and implement an algorithm for adaptive distributed SGD that is based on a statistical heuristic. Our results show that the adaptive version of distributed SGD can reach lower error values in less time than non-adaptive implementations. Moreover, the results show that the adaptive version is communication-efficient: it requires less communication between the master and the workers than non-adaptive versions.
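The fastest-$k$ scheme described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the algorithm or policy derived in the paper: worker response times are drawn i.i.d. from an exponential distribution, the model is a one-dimensional least-squares fit, each responding worker returns the gradient over its whole data shard, and the schedule that raises $k$ in fixed steps is a placeholder chosen only for illustration. The names `fastest_k_sgd` and `step_schedule` are hypothetical.

```python
import random


def fastest_k_sgd(data, n_workers, k_schedule, n_iters, lr=0.5, seed=0):
    """Simulate fastest-k distributed SGD for 1-D least squares y ~ w * x.

    data       : list of (x, y) pairs, split round-robin across workers
    k_schedule : function mapping iteration index t to the number k of
                 responses the master waits for (hypothetical interface)
    Returns (w, wall_clock_time, messages_received).
    """
    rng = random.Random(seed)
    shards = [data[i::n_workers] for i in range(n_workers)]
    w, clock, messages = 0.0, 0.0, 0
    for t in range(n_iters):
        k = max(1, min(n_workers, k_schedule(t)))
        # Assumption: response times are i.i.d. exponential with rate 1.
        responses = sorted((rng.expovariate(1.0), i) for i in range(n_workers))
        fastest = responses[:k]
        # The master waits only until the k-th fastest worker responds.
        clock += fastest[-1][0]
        # Average the partial gradients of the k fastest workers.
        grad = 0.0
        for _, i in fastest:
            shard = shards[i]
            grad += sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * grad / k
        messages += k  # one gradient message per responding worker
    return w, clock, messages


def step_schedule(n_workers, step):
    """Illustrative schedule: start at k = 1 and add one worker every `step` iterations."""
    return lambda t: min(n_workers, 1 + t // step)


if __name__ == "__main__":
    gen = random.Random(1)
    xs = [gen.uniform(-1.0, 1.0) for _ in range(200)]
    data = [(x, 3.0 * x + gen.gauss(0.0, 0.1)) for x in xs]
    for name, sched in [("fixed k=10", lambda t: 10),
                        ("adaptive k", step_schedule(10, 10))]:
        w, clock, msgs = fastest_k_sgd(data, 10, sched, 100)
        print(f"{name}: w={w:.3f} time={clock:.1f} messages={msgs}")
```

Because both runs use the same seed, they see identical response times, so the adaptive run can be compared directly with the run that always waits for all workers: it spends less simulated wall-clock time per iteration while $k$ is small and receives fewer gradient messages overall.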
