Parameter-free Statistically Consistent Interpolation: Dimension-independent Convergence Rates for Hilbert kernel regression

Previously, statistical textbook wisdom has held that interpolating noisy data will generalize poorly, but recent work has shown that data interpolation schemes can generalize well. This could explain why overparameterized deep nets do not necessarily overfit. Optimal data interpolation schemes have been exhibited that achieve theoretical lower bounds for excess risk in any dimension for large data (Statistically Consistent Interpolation). These are non-parametric Nadaraya-Watson estimators with singular kernels. The recently proposed weighted interpolating nearest neighbors method (wiNN) is in this class, as is the previously studied Hilbert kernel interpolation scheme, in which the estimator has the form $\hat{f}(x)=\sum_i y_i w_i(x)$, where $w_i(x)= \|x-x_i\|^{-d}/\sum_j \|x-x_j\|^{-d}$. This estimator is unique in being completely parameter-free. While statistical consistency was previously proven, convergence rates were not established. Here, we comprehensively study the finite sample properties of Hilbert kernel regression. We prove that the excess risk is asymptotically equivalent pointwise to $\sigma^2(x)/\ln(n)$ where $\sigma^2(x)$ is the noise variance. We show that the excess risk of the plugin classifier is less than $2|f(x)-1/2|^{1-\alpha}\,(1+\varepsilon)^\alpha \sigma^\alpha(x)(\ln(n))^{-\frac{\alpha}{2}}$, for any $0<\alpha<1$, where $f$ is the regression function $x\mapsto\mathbb{E}[y|x]$. We derive asymptotic equivalents of the moments of the weight functions $w_i(x)$ for large $n$, for instance for $\beta>1$, $\mathbb{E}[w_i^{\beta}(x)]\sim_{n\rightarrow \infty}((\beta-1)n\ln(n))^{-1}$. We derive an asymptotic equivalent for the Lagrange function and exhibit the nontrivial extrapolation properties of this estimator. We present heuristic arguments for a universal $w^{-2}$ power-law behavior of the probability density of the weights in the large $n$ limit.
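The estimator defined above, $\hat{f}(x)=\sum_i y_i w_i(x)$ with $w_i(x)= \|x-x_i\|^{-d}/\sum_j \|x-x_j\|^{-d}$, can be illustrated with a minimal NumPy sketch. The function names `hilbert_weights` and `hilbert_predict` are hypothetical, and two details are assumptions of this sketch rather than anything stated in the abstract: the weights are computed in log space to avoid overflow near a sample point, and a query that coincides exactly with a sample point is assigned all of the weight (the limit of the singular kernel, which is what makes the scheme interpolating).

```python
import numpy as np


def hilbert_weights(x, X):
    """Return w_i(x) = ||x - x_i||^{-d} / sum_j ||x - x_j||^{-d}.

    x: query point, shape (d,). X: sample points, shape (n, d).
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    d = X.shape[1]
    dist = np.linalg.norm(X - x, axis=1)

    # Assumption of this sketch: at a sample point the singular kernel
    # concentrates all weight there (coincident samples share it equally).
    exact = dist == 0.0
    if exact.any():
        return exact / exact.sum()

    # Log-space evaluation: subtracting the max before exponentiating
    # leaves the normalized weights unchanged but avoids overflow.
    log_w = -d * np.log(dist)
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()


def hilbert_predict(x, X, y):
    """Hilbert kernel regression estimate f_hat(x) = sum_i y_i w_i(x)."""
    return float(hilbert_weights(x, X) @ np.asarray(y, dtype=float))


# Toy usage: three sample points in d = 2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
at_sample = hilbert_predict(X[1], X, y)            # reproduces y[1]
at_center = hilbert_predict(np.array([0.5, 0.5]), X, y)
```

Note that no bandwidth or neighbor count appears anywhere: the only exponent is the dimension $d$ of the data, which is the sense in which the scheme is parameter-free.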
