Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal Memory

A neural network model of a differential equation, namely the neural ODE, has enabled us to learn continuous-time dynamical systems and probability distributions with high accuracy. A neural ODE uses the same network repeatedly during numerical integration. Hence, the backpropagation algorithm requires a memory footprint proportional to the number of uses times the network size. This holds even if a checkpointing scheme divides the computational graph into sub-graphs. In contrast, the adjoint method obtains a gradient by numerical integration backward in time with a minimal memory footprint; however, it suffers from numerical errors. This study proposes the symplectic adjoint method, which obtains the exact gradient (up to rounding error) with a footprint proportional to the number of uses plus the network size. The experimental results demonstrate that, among competing methods, the symplectic adjoint method occupies the smallest footprint in most cases, runs faster in some cases, and is robust to rounding errors.
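The trade-off between backpropagation and the plain adjoint method can be illustrated with a small sketch. It is not the symplectic adjoint method proposed in the paper; it only contrasts the two baselines the abstract describes. Everything in it is an assumption made for illustration: the toy vector field `tanh(W x)`, the fixed weights `W`, the explicit Euler integrator, the loss `0.5 * ||x(T)||^2`, and the choice to differentiate with respect to the initial state rather than the network parameters.

```python
import numpy as np

# Hypothetical fixed weights of a tiny "network" f(x) = tanh(W x).
W = np.array([[0.0, 1.0], [-1.0, 0.5]])

def f(x):
    return np.tanh(W @ x)

def jac(x):
    # Jacobian of tanh(W x) with respect to x: diag(1 - tanh^2(W x)) @ W
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W

def loss(x_final):
    return 0.5 * np.sum(x_final ** 2)

def forward_all(x0, n_steps, h):
    # Explicit Euler, keeping every intermediate state (memory grows with n_steps).
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + h * f(xs[-1]))
    return xs

def forward_final(x0, n_steps, h):
    # Explicit Euler, keeping only the current state (constant memory).
    x = x0
    for _ in range(n_steps):
        x = x + h * f(x)
    return x

def grad_backprop(x0, n_steps, h):
    # Exact gradient of the discretized loss, using the stored states.
    xs = forward_all(x0, n_steps, h)
    lam = xs[-1].copy()                      # dL/dx_N
    for n in reversed(range(n_steps)):
        lam = lam + h * jac(xs[n]).T @ lam   # (I + h J_n)^T lam
    return lam

def grad_adjoint(x0, n_steps, h):
    # Plain adjoint method: re-integrate the state backward in time
    # instead of storing it, which introduces discretization error.
    x = forward_final(x0, n_steps, h)
    lam = x.copy()
    for _ in range(n_steps):
        lam_prev = lam + h * jac(x).T @ lam  # Euler step of d(lam)/dt = -J^T lam
        x = x - h * f(x)                     # approximate earlier state
        lam = lam_prev
    return lam

def grad_fd(x0, n_steps, h, eps=1e-6):
    # Central finite differences on the discretized loss, used as a reference.
    g = np.zeros_like(x0)
    for i in range(x0.size):
        e = np.zeros_like(x0)
        e[i] = eps
        g[i] = (loss(forward_final(x0 + e, n_steps, h))
                - loss(forward_final(x0 - e, n_steps, h))) / (2 * eps)
    return g

x0 = np.array([1.0, 0.5])
n_steps, h = 20, 0.05
g_bp = grad_backprop(x0, n_steps, h)
g_adj = grad_adjoint(x0, n_steps, h)
g_fd = grad_fd(x0, n_steps, h)
print("backprop error:", np.max(np.abs(g_bp - g_fd)))
print("adjoint error: ", np.max(np.abs(g_adj - g_fd)))
```

In this sketch, `grad_backprop` agrees with the finite-difference reference up to small numerical error but keeps all `n_steps + 1` states, whereas `grad_adjoint` keeps a single state and deviates from the reference because the backward Euler reconstruction does not retrace the forward trajectory exactly.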

