Mini-batch Stochastic Gradient Descent Papers - 专知 (Zhuanzhi)

Mini-batch Stochastic Gradient Descent

Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent

arXiv

0+ reads · October 16, 2024

How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD

arXiv

0+ reads · June 17, 2024

The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

arXiv

0+ reads · May 19, 2024

AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms

arXiv

0+ reads · October 31, 2023

Convergence Analysis of Decentralized ASGD

arXiv

0+ reads · September 7, 2023

Generalizing DP-SGD with Shuffling and Batch Clipping

arXiv

0+ reads · July 25, 2023

Learning from time-dependent streaming data with online stochastic algorithms

arXiv

0+ reads · July 18, 2023

A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta

arXiv

0+ reads · March 9, 2023

Generalizing DP-SGD with Shuffling and Batch Clipping

arXiv

0+ reads · March 5, 2023

A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta

arXiv

0+ reads · June 22, 2022

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

arXiv

0+ reads · June 16, 2022

On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective

arXiv

0+ reads · December 2, 2021

Image-specific Convolutional Kernel Modulation for Single Image Super-resolution

arXiv

0+ reads · November 16, 2021

AGGLIO: Global Optimization for Locally Convex Functions

arXiv

0+ reads · November 6, 2021

Unified Regularity Measures for Sample-wise Learning and Generalization

arXiv

0+ reads · August 9, 2021
