\texttt{ReMind}: Understanding Deductive Code Reasoning in LLMs

Large Language Models (LLMs) have made remarkable progress on code-related tasks. Nevertheless, empirical evidence shows that they still struggle with \emph{deductive code reasoning}, the ability to reason about how a program executes. Prior studies have recognized this limitation, but its underlying causes remain largely unexplored. In this paper, we first present a comprehensive empirical study that reveals three key challenges undermining deductive code reasoning: (1) an intrinsic gap between generation and reasoning abilities, (2) a consistent bias towards code sources, and (3) weak zero-shot generalization on complex benchmarks. To address these challenges, we propose \texttt{ReMind}, a multi-agent framework composed of a \texttt{Mutator}, an \texttt{Executor}, and an \texttt{Inspector}. The \texttt{Mutator} generates code variants to mitigate the bias towards code sources, the \texttt{Executor} traces variable states step by step to expose inconsistencies, and the \texttt{Inspector} identifies problematic reasoning steps and provides control-flow refinement to bridge the intrinsic reasoning gap. Working together, these agents let \texttt{ReMind} systematically identify and refine reasoning flaws, which yields strong performance and robust zero-shot generalization. Extensive experiments on two benchmarks with five LLMs demonstrate the advantages of \texttt{ReMind} over baseline approaches in deductive code reasoning.
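To make the three roles concrete, the following is a minimal illustrative sketch and not the implementation of \texttt{ReMind}. The agents described above are LLM-driven; here they are replaced by deterministic Python stand-ins so that the idea can be run end to end. Every name in the block (`mutate`, `execute`, `history`, `first_divergence`, the sample function `f`, and the `faulty_prediction` trace) is a hypothetical choice made for illustration. The mutation shown is a simple behavior-preserving identifier renaming, which is only one assumed kind of code variant.

```python
import ast
import sys

# Hypothetical sample program: sum of the even elements of a list.
SOURCE = """
def f(xs):
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x
    return total
"""


def mutate(source, mapping):
    """Mutator stand-in: rename identifiers without changing behavior."""

    class Rename(ast.NodeTransformer):
        def visit_Name(self, node):
            node.id = mapping.get(node.id, node.id)
            return node

        def visit_arg(self, node):
            node.arg = mapping.get(node.arg, node.arg)
            return node

    tree = Rename().visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


def execute(source, func_name, *args):
    """Executor stand-in: run the code, recording (line, locals) before each line."""
    namespace = {}
    exec(compile(source, "<snippet>", "exec"), namespace)
    steps = []

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != "<snippet>":
            return None  # do not trace frames outside the snippet
        if event == "line":
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = namespace[func_name](*args)
    finally:
        sys.settrace(None)
    return result, steps


def history(steps, name):
    """Successive distinct values that one variable takes across the trace."""
    values = []
    for _, local_vars in steps:
        if name in local_vars and (not values or values[-1] != local_vars[name]):
            values.append(local_vars[name])
    return values


def first_divergence(predicted, actual):
    """Inspector stand-in: index of the first step where two traces disagree."""
    for i, (p, a) in enumerate(zip(predicted, actual)):
        if p != a:
            return i
    if len(predicted) != len(actual):
        return min(len(predicted), len(actual))
    return None


result, steps = execute(SOURCE, "f", [1, 2, 3, 4])
mutated = mutate(SOURCE, {"xs": "items", "x": "item", "total": "acc"})
mutated_result, mutated_steps = execute(mutated, "f", [1, 2, 3, 4])

# A made-up predicted trace that ignores the parity check and sums everything.
faulty_prediction = [0, 1, 3, 6, 10]
bad_step = first_divergence(faulty_prediction, history(steps, "total"))
```

In this toy setting the original and the renamed variant produce the same result and the same value history, so any disagreement between a predicted trace and the executed one points at a reasoning error rather than at a surface feature of the code. The index returned by `first_divergence` marks the earliest step that a refinement pass would need to revisit.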
