SVRG and Beyond via Posterior Correction

Tags: Gradient · Bayes · Bayesian Methods · Gaussian Distribution · Variance

Nico Daheim, Thomas Möllenhoff, Ming Liang Ang, Mohammad Emtiyaz Khan

Source: arXiv preprint, under review.

Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed up training by using gradient corrections, but have seen limited success in deep learning. Here, we show surprising new foundational connections of SVRG to a recently proposed Bayesian method called posterior correction. Specifically, we show that SVRG is recovered as a special case of posterior correction over the isotropic-Gaussian family, while novel extensions are automatically obtained by using more flexible exponential families. We derive two new SVRG variants by using Gaussian families: first, a Newton-like variant that employs novel Hessian corrections, and second, an Adam-like extension that improves pretraining and finetuning of Transformer language models. This is the first work to connect SVRG to Bayes and use it to boost variational training for deep networks.
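
In standard SVRG, the "gradient correction" is a control-variate estimate v = ∇f_i(w) − ∇f_i(w̃) + ∇F(w̃), where w̃ is a periodically refreshed snapshot of the weights and ∇F(w̃) is the full-batch gradient at that snapshot. The Python sketch below illustrates only this baseline estimator on a toy least-squares problem. It is not the paper's posterior-correction method, nor its Newton-like or Adam-like variants; the abstract states only that SVRG is recovered in the isotropic-Gaussian special case. The toy problem, the helper names (`grad_i`, `full_grad`, `svrg_estimate`, `svrg`), and the step size, epoch length, and snapshot rule are all illustrative assumptions.

```python
import random

# Hypothetical helpers on a toy least-squares problem, for illustration only.


def grad_i(w, x, y):
    """Gradient of the per-example loss 0.5 * (w.x - y)**2 with respect to w."""
    residual = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [residual * xj for xj in x]


def full_grad(w, data):
    """Average of the per-example gradients over the whole dataset."""
    n = len(data)
    total = [0.0] * len(w)
    for x, y in data:
        for j, gj in enumerate(grad_i(w, x, y)):
            total[j] += gj
    return [t / n for t in total]


def svrg_estimate(w, w_snap, mu, x, y):
    """SVRG gradient estimate: grad_i(w) - grad_i(w_snap) + mu.

    The correction (mu - grad_i(w_snap)) has zero mean over examples, so the
    estimate is unbiased for full_grad(w).
    """
    g_cur = grad_i(w, x, y)
    g_old = grad_i(w_snap, x, y)
    return [gc - go + m for gc, go, m in zip(g_cur, g_old, mu)]


def svrg(data, dim, lr=0.05, outer_steps=30, inner_steps=None, seed=0):
    """Plain SVRG: full-gradient snapshots plus corrected stochastic inner steps."""
    rng = random.Random(seed)
    if inner_steps is None:
        inner_steps = 2 * len(data)      # assumed epoch length of 2n inner steps
    w_snap = [0.0] * dim
    for _ in range(outer_steps):
        mu = full_grad(w_snap, data)     # anchor gradient at the snapshot
        w = list(w_snap)
        for _ in range(inner_steps):
            x, y = data[rng.randrange(len(data))]
            v = svrg_estimate(w, w_snap, mu, x, y)
            w = [wj - lr * vj for wj, vj in zip(w, v)]
        w_snap = w                       # last inner iterate becomes the next snapshot
    return w_snap


# Usage: recover w_true = [2.0, -1.0] from noiseless synthetic data.
data_rng = random.Random(1)
w_true = [2.0, -1.0]
data = []
for _ in range(50):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    data.append((x, sum(a * b for a, b in zip(w_true, x))))

w_hat = svrg(data, dim=2)
print([round(v, 4) for v in w_hat])
```

Because ∇F(w̃) − ∇f_i(w̃) averages to zero over examples, the estimate stays unbiased. As w and w̃ approach a minimizer of this smooth problem, both the difference ∇f_i(w) − ∇f_i(w̃) and the anchor ∇F(w̃) shrink toward zero, so the estimate's variance vanishes and a constant step size can be kept.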
