Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks - 专知论文

会员服务 ·

0

泛函 · Networking · 有偏 · 曲率 · 均方误差 ·

2023 年 5 月 28 日

Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks

翻译：暂无翻译

Hui Jin,Guido Montúfar

from arxiv, 97 pages, 14 figures. Added the discussion of SGD and implications to generalization

We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{- 1/2}$ of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution that is used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and thence the solution function is the natural cubic spline interpolation of the training data. \hj{For stochastic gradient descent we obtain the same implicit bias result.} We obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.

翻译：暂无翻译

0

相关内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

非凸稀疏正则化模型与算法的研究

国家自然科学基金

3+阅读 · 2015年12月31日

函数空间、几何和Mahler测度

国家自然科学基金

0+阅读 · 2014年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

渗流及相关随机系统的极限行为研究

国家自然科学基金

0+阅读 · 2009年12月31日

算子代数上的映射及与群SL(2,R)相关的vN代数

国家自然科学基金

0+阅读 · 2008年12月31日

RAYEN: Imposition of Hard Convex Constraints on Neural Networks

Arxiv

0+阅读 · 2023年7月17日

Gauss-Southwell type descent methods for low-rank matrix optimization

Arxiv

0+阅读 · 2023年7月16日

SGD and Weight Decay Provably Induce a Low-Rank Bias in Neural Networks

Arxiv

0+阅读 · 2023年7月15日

Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks

Arxiv

0+阅读 · 2023年7月13日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

NeurIPS 2025 | NMKE：基于神经元归因与动态稀疏掩码的终身知识编辑

前沿人工智能趋势报告（Frontier AI Trends Report）

【MIT博士论文】弱监督学习：理论、方法与应用

Andrej Karpathy：2025 年 LLM 年度回顾（2025 LLM Year in Review）

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

RAYEN: Imposition of Hard Convex Constraints on Neural Networks

Arxiv

0+阅读 · 2023年7月17日

Gauss-Southwell type descent methods for low-rank matrix optimization

Arxiv

0+阅读 · 2023年7月16日

SGD and Weight Decay Provably Induce a Low-Rank Bias in Neural Networks

Arxiv

0+阅读 · 2023年7月15日

Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks

Arxiv

0+阅读 · 2023年7月13日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

相关基金

非凸稀疏正则化模型与算法的研究

国家自然科学基金

3+阅读 · 2015年12月31日

函数空间、几何和Mahler测度

国家自然科学基金

0+阅读 · 2014年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

渗流及相关随机系统的极限行为研究

国家自然科学基金

0+阅读 · 2009年12月31日

算子代数上的映射及与群SL(2,R)相关的vN代数

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员