SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure -- the leap -- which measures how "hierarchical" target functions are. For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f),2)})$. We prove a version of this conjecture for a class of functions on Gaussian isotropic data and 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that the training sequentially learns the function support with a saddle-to-saddle dynamic. Our result departs from [Abbe et al. 2022] by going beyond leap 1 (merged-staircase functions), and by going beyond the mean-field and gradient flow approximations that prohibit the full complexity control obtained here. Finally, we note that this gives an SGD complexity for the full training trajectory that matches that of Correlational Statistical Query (CSQ) lower-bounds.
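The abstract does not spell out how the leap is computed. As a rough sketch, assume the definition used for Boolean data in the full paper: write $f$ as a sum of monomials with supports $S_1, \dots, S_m$, order the monomials, and count how many new coordinates each one adds to those already seen; the leap is the smallest achievable value of the largest such jump. The function names below (`leap`, `conjectured_exponent`) are hypothetical and only illustrate this combinatorial quantity and the conjectured exponent $\max(\mathrm{Leap}(f),2)$; the code does not model SGD, and the polylogarithmic factors hidden by $\tilde\Theta$ are ignored.

```python
from typing import FrozenSet, Iterable, List


def _coverable(supports: List[FrozenSet[int]], k: int) -> bool:
    """True if the monomials can be added one at a time, each bringing
    at most k coordinates that were not already covered."""
    covered = set()
    remaining = list(supports)
    progress = True
    while remaining and progress:
        progress = False
        for s in list(remaining):
            # Adding a monomial only enlarges `covered`, so taking any
            # admissible monomial greedily never blocks a later one.
            if len(s - covered) <= k:
                covered |= s
                remaining.remove(s)
                progress = True
    return not remaining


def leap(supports: Iterable[Iterable[int]]) -> int:
    """Smallest k such that all monomial supports are coverable with jumps <= k.

    Assumed definition (not stated in the abstract): min over orderings of
    the max number of new coordinates introduced by a single monomial.
    """
    sets = [frozenset(s) for s in supports if len(frozenset(s)) > 0]
    if not sets:
        return 0  # constant function: no coordinates to learn
    largest = max(len(s) for s in sets)
    for k in range(1, largest + 1):
        if _coverable(sets, k):
            return k
    return largest


def conjectured_exponent(supports: Iterable[Iterable[int]]) -> int:
    """Exponent in the conjectured d^{max(Leap(f), 2)} time complexity."""
    return max(leap(supports), 2)


# Staircase x1 + x1*x2 + x1*x2*x3: each monomial adds one new coordinate.
staircase = [{1}, {1, 2}, {1, 2, 3}]
# x1 + x1*x2*x3*x4: the second monomial adds three new coordinates at once.
jumpy = [{1}, {1, 2, 3, 4}]
# Pure parity x1*x2*x3: all three coordinates arrive in a single jump.
parity = [{1, 2, 3}]

for name, f in [("staircase", staircase), ("jumpy", jumpy), ("parity", parity)]:
    print(name, "leap =", leap(f), "exponent =", conjectured_exponent(f))
```

Under this assumed definition, the staircase has leap 1 (the merged-staircase case of [Abbe et al. 2022]) and conjectured exponent 2, while the other two examples have leap 3 and conjectured exponent 3.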
