Universal transformers (UTs) have been widely used for complex reasoning tasks such as ARC-AGI and Sudoku, yet the specific sources of their performance gains remain underexplored. In this work, we systematically analyze UT variants and show that the improvements on ARC-AGI arise primarily from the recurrent inductive bias and the strong nonlinear components of the Transformer, rather than from elaborate architectural designs. Motivated by this finding, we propose the Universal Reasoning Model (URM), which enhances the UT with short convolution and truncated backpropagation. Our approach substantially improves reasoning performance, achieving a state-of-the-art 53.8% pass@1 on ARC-AGI 1 and 16.0% pass@1 on ARC-AGI 2. Our code is available at https://github.com/zitian-gao/URM.
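To make the two ingredients named above concrete, here is a minimal PyTorch sketch of a weight-shared (universal) Transformer step fronted by a short depthwise causal convolution, unrolled recurrently with gradients truncated to the most recent iterations. This is an illustrative assumption, not the repository's actual implementation; the names `URMBlock`, `unroll`, and `bptt_window`, and all hyperparameters, are hypothetical.

```python
import torch
import torch.nn as nn

class URMBlock(nn.Module):
    """One weight-shared (universal) Transformer step with a short
    depthwise causal convolution in front of self-attention.
    Hypothetical sketch; names and hyperparameters are assumptions."""

    def __init__(self, dim: int, heads: int = 8, conv_width: int = 4):
        super().__init__()
        # Short convolution over the sequence axis (depthwise, width ~4).
        self.conv = nn.Conv1d(dim, dim, conv_width,
                              padding=conv_width - 1, groups=dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim). Trim the right padding so the conv stays causal.
        c = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        h = x + c
        n = self.norm1(h)
        h = h + self.attn(n, n, n)[0]
        return h + self.mlp(self.norm2(h))

def unroll(block: URMBlock, x: torch.Tensor,
           steps: int = 16, bptt_window: int = 4) -> torch.Tensor:
    """Apply the shared block `steps` times, detaching the hidden state
    every `bptt_window` iterations so gradients flow only through the
    most recent window (truncated backpropagation through the recurrence)."""
    for t in range(steps):
        if t > 0 and t % bptt_window == 0:
            x = x.detach()  # truncate the gradient path here
        x = block(x)
    return x
```

Detaching the hidden state keeps the gradient-path length (and activation memory) bounded by the window size, regardless of how many recurrent steps are unrolled.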