This paper considers decentralized minimization of $N:=nm$ smooth non-convex cost functions equally divided over a directed network of $n$ nodes. Specifically, we describe a stochastic first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique and gradient tracking (GT) to address the stochastic and decentralized nature of the problem. We show that GT-SARAH, with appropriate algorithmic parameters, finds an $\epsilon$-accurate first-order stationary point with $O\big(\max\big\{N^{\frac{1}{2}},n(1-\lambda)^{-2},n^{\frac{2}{3}}m^{\frac{1}{3}}(1-\lambda)^{-1}\big\}L\epsilon^{-2}\big)$ gradient complexity, where ${(1-\lambda)\in(0,1]}$ is the spectral gap of the network weight matrix and $L$ is the smoothness parameter of the cost functions. This gradient complexity outperforms that of the existing decentralized stochastic gradient methods. In particular, in a big-data regime such that ${n = O(N^{\frac{1}{2}}(1-\lambda)^{3})}$, this gradient complexity furthers reduces to ${O(N^{\frac{1}{2}}L\epsilon^{-2})}$, independent of the network topology, and matches that of the centralized near-optimal variance-reduced methods. Moreover, in this regime GT-SARAH achieves a non-asymptotic linear speedup, in that, the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the centralized near-optimal algorithms that perform all gradient computations at a single node. To the best of our knowledge, GT-SARAH is the first algorithm that achieves this property. In addition, we show that appropriate choices of local minibatch size balance the trade-offs between the gradient and communication complexity of GT-SARAH. Over infinite time horizon, we establish that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.


翻译:本文考虑将 $N: = mum 平滑的非convex 成本函数分散化, 在一个直观的网络中平坦。 具体地说, 我们描述一种叫做 GT- SARAH 的随机一级梯度方法, 使用SAAH 型的减少差异技术和梯度跟踪( GT) 来解决问题的随机性和分散性性质。 我们显示 GT- SARAH 具有适当的算法参数, 找到一个 $\ limlon$- 准确的一阶梯度固定点, 以 $Big (max\ big) N% 2+%, n( 1-\ bda) n( 1) 平坦氏 平流度梯度 梯度方法, 以 美元( 1\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ 美元) 直立度( 美元) 直立度( =xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

1
下载
关闭预览

相关内容

专知会员服务
51+阅读 · 2020年12月14日
一份简单《图神经网络》教程,28页ppt
专知会员服务
127+阅读 · 2020年8月2日
Hierarchically Structured Meta-learning
CreateAMind
27+阅读 · 2019年5月22日
Ray RLlib: Scalable 降龙十八掌
CreateAMind
9+阅读 · 2018年12月28日
[DLdigest-8] 每日一道算法
深度学习每日摘要
4+阅读 · 2017年11月2日
【论文】图上的表示学习综述
机器学习研究会
15+阅读 · 2017年9月24日
Auto-Encoding GAN
CreateAMind
7+阅读 · 2017年8月4日
VIP会员
相关VIP内容
相关资讯
Hierarchically Structured Meta-learning
CreateAMind
27+阅读 · 2019年5月22日
Ray RLlib: Scalable 降龙十八掌
CreateAMind
9+阅读 · 2018年12月28日
[DLdigest-8] 每日一道算法
深度学习每日摘要
4+阅读 · 2017年11月2日
【论文】图上的表示学习综述
机器学习研究会
15+阅读 · 2017年9月24日
Auto-Encoding GAN
CreateAMind
7+阅读 · 2017年8月4日
Top
微信扫码咨询专知VIP会员