We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation between the input samples, to demonstrate that a network of width of order $\log(n)$ suffices for global convergence with high probability. Our analysis uses a Polyak-Łojasiewicz viewpoint along the gradient-flow trajectory, which yields exponential convergence at a rate of order $\frac{1}{n}$. When the data are exactly orthogonal, we give further refined characterizations of the convergence speed, proving that its asymptotic behaviour lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in the convergence rate, during which the rate evolves from the lower bound to the upper one over a relative time of order $\frac{1}{\log(n)}$.
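As an illustration of the Polyak-Łojasiewicz viewpoint invoked above, the following is a minimal sketch of how a PL inequality along the trajectory yields an exponential rate; the empirical loss $L$, the gradient-flow parameters $\theta(t)$, and a PL constant $\mu$ of order $\frac{1}{n}$ are notation introduced only for this sketch, chosen to be consistent with the stated rate rather than a reproduction of the precise statement in the body of the paper. If $\|\nabla L(\theta(t))\|^2 \ge 2\mu\, L(\theta(t))$ holds along the trajectory, then
\[
\frac{d}{dt}\, L(\theta(t)) \;=\; -\|\nabla L(\theta(t))\|^2 \;\le\; -2\mu\, L(\theta(t)),
\]
so Grönwall's inequality gives $L(\theta(t)) \le L(\theta(0))\, e^{-2\mu t}$, and a PL constant $\mu$ of order $\frac{1}{n}$ corresponds to the exponential convergence rate of order $\frac{1}{n}$ mentioned above.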