Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation stems from their reliance on outdated API knowledge in the training data and persists even with access to current documentation, impeding reliable code generation in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics how human programmers adapt to API changes. Specifically, we construct a dataset of approximately 2,000 entries to train LLMs to perform version migration based on updated information. We then introduce a modified string-similarity metric for code evaluation, which serves as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs' code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared to supervised fine-tuning, ReCode has less impact on LLMs' general code generation abilities. We apply ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO), all achieving consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms both the 32B-parameter code-instruction-tuned model and the reasoning model with the same architecture. Code is available at https://github.com/zjunlp/ReCode.
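To give intuition for the reward design, below is a minimal sketch of a string-similarity reward in Python. It uses `difflib.SequenceMatcher` as a simple character-level stand-in; the paper's metric is a modified variant, so this is only an illustrative assumption, and the pandas API names in the usage example are likewise just a hypothetical migration case.

```python
import difflib


def similarity_reward(generated_code: str, reference_code: str) -> float:
    """Illustrative reward: character-level similarity between the generated
    code and the reference solution written against the updated API.
    NOTE: a stand-in for the paper's modified string-similarity metric."""
    return difflib.SequenceMatcher(None, generated_code, reference_code).ratio()


# Usage example (hypothetical API migration): a completion that keeps the
# deprecated call earns a lower reward than one matching the updated API.
reference = "df.map(lambda x: x + 1)"       # updated API in the reference
deprecated = "df.applymap(lambda x: x + 1)" # outdated call from training data
print(similarity_reward(deprecated, reference))  # < 1.0
print(similarity_reward(reference, reference))   # 1.0
```

A continuous reward of this kind gives partial credit for near-correct migrations, which is what makes it usable as a rule-based signal for GRPO- or DAPO-style reinforcement learning.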