Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation stems from their reliance on outdated API knowledge from training data and, even with access to current documentation, impedes reliable code generation in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics how human programmers adapt to API changes. Specifically, we construct a dataset of approximately 2,000 entries to train LLMs to perform version migration based on updated information. We then introduce a modified string similarity metric for code evaluation as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs' code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared with supervised fine-tuning, ReCode has less impact on LLMs' general code generation abilities. We apply ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO), all achieving consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms the 32B-parameter code instruction-tuned model and the reasoning model of the same architecture. Code is available at https://github.com/zjunlp/ReCode.
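To make the reward design concrete, here is a minimal, hypothetical sketch of a string-similarity reward for RL over code, using Python's standard difflib as a stand-in; the paper's modified metric and training loop are not specified in this abstract, so all names and choices below are illustrative assumptions.

```python
import difflib

def similarity_reward(generated_code: str, reference_code: str) -> float:
    """Illustrative reward in [0, 1]: string similarity between the model's
    generated code and the reference (migrated) code.

    Note: this uses difflib.SequenceMatcher as a placeholder; ReCode's
    actual modified similarity metric may differ.
    """
    return difflib.SequenceMatcher(None, generated_code, reference_code).ratio()

# Hypothetical usage inside a GRPO/DAPO-style loop: each sampled completion
# is scored against the ground-truth migration to produce its scalar reward.
candidate = "arr = df.to_numpy()"   # model output using the updated API
reference = "arr = df.to_numpy()"   # reference code after version migration
print(similarity_reward(candidate, reference))  # -> 1.0 for an exact match
```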