Large reasoning models (LRMs) have garnered significant attention from researchers owing to their exceptional capability in addressing complex tasks. Motivated by the human-like behaviors observed in their reasoning processes, this paper introduces a comprehensive taxonomy to characterize atomic reasoning steps and probe the ``psyche'' of LRM intelligence. Specifically, the taxonomy comprises five groups and seventeen categories derived from human mental processes, grounding the study of LRMs in an interdisciplinary perspective. Applying the taxonomy to current LRMs yields a labeled dataset of 277,534 atomic reasoning steps. Using this resource, we analyze contemporary LRMs and distill several actionable takeaways for improving the training and post-training of reasoning models. Notably, our analysis reveals that prevailing post-answer ``double-checks'' (self-monitoring evaluations) are largely superficial and rarely yield substantive revisions. Incentivizing comprehensive multi-step reflection, rather than simple self-monitoring, may therefore offer a more effective path forward. To complement the taxonomy, we propose an automatic annotation framework, named CAPO, that leverages large language models (LLMs) to generate taxonomy-based annotations. Experimental results demonstrate that CAPO achieves higher consistency with human experts than baseline methods, enabling a scalable and comprehensive analysis of LRMs from a human cognitive perspective. Together, the taxonomy, CAPO, and the derived insights provide a principled, scalable path toward understanding and advancing LRM reasoning.