On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized at zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks whose weights have a non-zero initial distribution at different scales. By exploiting a link with unbalanced optimal transport theory, we show that, despite the non-convexity of 2-layer network training, this problem admits an infinite-dimensional convex counterpart. We formulate the corresponding functional optimization problem and investigate its main properties. In particular, we show that as the scale of the initialization ranges between $0$ and $+\infty$, the associated path interpolates continuously between the so-called kernel and rich regimes. Numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly even beyond these extreme points.
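The proxy relationship in the first sentence is easiest to see in a linear toy model. This is an illustrative assumption, not the paper's 2-layer ReLU setting or its scaling path. For underdetermined least squares, gradient descent started from `w0` converges to the interpolator closest to `w0`. That interpolator is also the λ → 0 end of a ridge path whose penalty is centered at `w0`, while the λ → ∞ end stays at `w0`. Setting `w0 = 0` recovers the classical zero-initialization case. The helper names `ridge_path_point` and `gradient_descent` are hypothetical and used only for this sketch.

```python
import numpy as np


def ridge_path_point(X, y, w0, lam):
    """Return argmin_w ||X w - y||^2 + lam * ||w - w0||^2 (penalty centered at w0).

    Uses the push-through identity (X^T X + lam I)^{-1} X^T = X^T (X X^T + lam I)^{-1},
    so only an n-by-n system is solved; this stays well conditioned as lam -> 0
    when X has full row rank.
    """
    n = X.shape[0]
    residual = y - X @ w0
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), residual)
    return w0 + X.T @ alpha


def gradient_descent(X, y, w0, lr, steps):
    """Plain gradient descent on 0.5 * ||X w - y||^2, started from w0."""
    w = w0.astype(float).copy()
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y))
    return w


# Underdetermined toy problem: 5 samples, 20 parameters.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
y = rng.standard_normal(5)
w0 = 3.0 * rng.standard_normal(20)  # non-zero initialization at scale 3

lr = 1.0 / np.linalg.norm(X, 2) ** 2  # step size below 2 / sigma_max^2
w_gd = gradient_descent(X, y, w0, lr, steps=5000)

# Small lam: the path endpoint matches the limit of gradient descent from w0.
print(np.max(np.abs(w_gd - ridge_path_point(X, y, w0, 1e-8))))
# Large lam: the path stays at the initialization.
print(np.max(np.abs(w0 - ridge_path_point(X, y, w0, 1e8))))
```

If the penalty is instead centered at `np.zeros(20)` while gradient descent still starts from `w0`, the two endpoints differ by the component of `w0` in the null space of `X`. In this linear toy case, that is the kind of mismatch a path adapted to non-zero initialization has to account for.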