Orthogonal momentum gradient updates have emerged to overcome the limitations of vector-based optimizers such as Adam, which suffers from high memory costs and ill-conditioned momentum gradient updates. However, traditional orthogonal momentum approaches based on SVD or QR decomposition are themselves expensive in computation and memory, and they underperform well-tuned SGD with momentum. Recent advances such as Muon improve efficiency by applying momentum before orthogonalization and by approximating orthogonal matrices with Newton-Schulz iterations, which yields better GPU utilization, higher TFLOPS, and up to 3x lower memory usage. Nevertheless, vanilla Muon suffers from exploding attention logits and has cubic computational complexity. In this paper, we take a deep dive into orthogonal momentum gradient updates to identify the main properties that help Muon achieve its remarkable performance. We propose \textbf{AuON} (Alternative Unit-norm momentum updates by Normalized nonlinear scaling), a linear-time optimizer that achieves strong performance without approximating orthogonal matrices, while preserving structural alignment and reconditioning ill-posed updates. AuON has an automatic \textbf{"emergency brake"} to handle exploding attention logits. We further introduce a hybrid variant (\textbf{Hybrid-AuON}) that applies the linear transformations with Newton-Schulz iterations and outperforms Muon on language modeling tasks. Code is available at: https://github.com/ryyzn9/AuON
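To make the Muon-style baseline described in the abstract concrete, the following is a minimal NumPy sketch of "momentum first, then approximate orthogonalization". It is an illustration under stated assumptions, not the authors' implementation: the function names, the hyperparameter defaults (`steps`, `lr`, `beta`), and the use of the classic cubic Newton-Schulz iteration are all choices made here for clarity, and the cubic variant is not necessarily the polynomial Muon uses. AuON itself is not sketched, since the abstract does not give its update rule.

```python
import numpy as np


def newton_schulz_orthogonalize(matrix, steps=20, eps=1e-7):
    """Approximate the semi-orthogonal polar factor of `matrix`.

    Uses the classic cubic Newton-Schulz iteration
        X <- 1.5 * X - 0.5 * X @ X.T @ X
    (an assumption for this sketch; other polynomials are possible).
    """
    # Dividing by the Frobenius norm bounds every singular value by 1,
    # which keeps the iteration inside its convergence region.
    x = matrix / (np.linalg.norm(matrix) + eps)
    # Form the smaller of the two Gram matrices to save work.
    tall = x.shape[0] >= x.shape[1]
    for _ in range(steps):
        if tall:
            x = 1.5 * x - 0.5 * x @ (x.T @ x)
        else:
            x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x


def orthogonal_momentum_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One hypothetical update: accumulate momentum, then orthogonalize it."""
    momentum_buf = beta * momentum_buf + grad               # momentum first
    update = newton_schulz_orthogonalize(momentum_buf)      # then orthogonalize
    return weight - lr * update, momentum_buf
```

Each iteration consists of dense matrix products, which is where the cubic cost mentioned in the abstract comes from; the same products are also what make the approach GPU-friendly compared with SVD or QR.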