For overparameterized linear regression with isotropic Gaussian design and the minimum-$\ell_p$ interpolator, $p\in(1,2]$, we give a unified, high-probability characterization of how the family of parameter norms $\{ \lVert \widehat{w_p} \rVert_r \}_{r \in [1,p]}$ scales with sample size. We resolve this basic but open question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $X^\top Y$ and yields closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow") and (ii) a universal threshold $r_\star=2(p-1)$ separating the norms $\lVert \widehat{w_p} \rVert_r$ that plateau from those that continue to grow with an explicit exponent. This unified solution settles the scaling of *all* $\ell_r$ norms with $r\in [1,p]$ under $\ell_p$-biased interpolation and explains, in one picture, which norms saturate and which increase as $n$ grows. We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $\alpha$ to an effective $p_{\mathrm{eff}}(\alpha)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias. Since many generalization proxies depend on $\lVert \widehat{w_p} \rVert_r$, our results suggest that their predictive power depends sensitively on which $\ell_r$ norm is used.
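As a minimal illustration (not the paper's code), the sketch below computes the minimum-$\ell_p$ interpolator for an isotropic Gaussian design via convex optimization and tracks a few $\ell_r$ norms as $n$ grows; the single-spike target, the dimensions, and the use of `cvxpy` are illustrative assumptions, with $r_\star=2(p-1)$ the threshold predicted above.

```python
# Minimal sketch (assumptions: cvxpy available, single-spike target, illustrative sizes).
# Empirically probes how ||w_p||_r scales with n for the minimum-l_p interpolator.
import numpy as np
import cvxpy as cp

def min_lp_interpolator(X, y, p):
    """Solve  min ||w||_p  subject to  Xw = y  (convex for p >= 1)."""
    w = cp.Variable(X.shape[1])
    cp.Problem(cp.Minimize(cp.pnorm(w, p)), [X @ w == y]).solve()
    return w.value

rng = np.random.default_rng(0)
d, p = 2000, 1.5                        # ambient dimension, norm being minimized
r_star = 2 * (p - 1)                    # predicted plateau/growth threshold
w_true = np.zeros(d); w_true[0] = 1.0   # single signal spike (illustrative choice)

for n in [50, 100, 200, 400]:           # overparameterized regime: n << d
    X = rng.standard_normal((n, d))     # isotropic Gaussian design
    y = X @ w_true
    w_hat = min_lp_interpolator(X, y, p)
    norms = {r: np.sum(np.abs(w_hat) ** r) ** (1 / r) for r in (1.0, r_star, p)}
    print(n, {round(r, 2): round(v, 3) for r, v in norms.items()})
```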