Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation stems from their reliance on outdated API knowledge from training data: even with access to current documentation, they struggle to generate reliable code in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics how human programmers adapt to API changes. Specifically, we construct a dataset of approximately 2,000 entries to train LLMs to perform version migration based on updated information. We then introduce a modified string-similarity metric for code evaluation as the reward signal for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs' code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared with supervised fine-tuning, ReCode has less impact on LLMs' general code generation abilities. We apply ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO), all achieving consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms the 32B-parameter code instruction-tuned model and the reasoning model with the same architecture. Code is available at https://github.com/zjunlp/ReCode.
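For intuition, the sketch below shows how a string-similarity-based reward for a generated API migration could be computed. It is only an illustration under assumptions: ReCode's actual reward is a modified similarity metric, and the normalization, weighting, and the `pandas` migration pair used here are not taken from the paper.

```python
# Illustrative sketch only: ReCode's exact modified similarity metric differs;
# the normalization step and example migration pair below are assumptions.
import difflib
import re


def normalize(code: str) -> str:
    """Strip line comments and collapse whitespace so cosmetic
    differences do not dominate the similarity score (assumed preprocessing)."""
    code = re.sub(r"#.*", "", code)
    return " ".join(code.split())


def similarity_reward(generated: str, reference: str) -> float:
    """Rule-based reward in [0, 1]: string similarity between the
    generated migration and the reference updated code."""
    gen, ref = normalize(generated), normalize(reference)
    return difflib.SequenceMatcher(None, gen, ref).ratio()


# Hypothetical migration: deprecated DataFrame.append -> pd.concat
old_style = "df = df.append(row, ignore_index=True)"
new_style = "df = pd.concat([df, row], ignore_index=True)"

print(similarity_reward(new_style, new_style))  # 1.0, correct migration
print(similarity_reward(old_style, new_style))  # lower reward, stale API usage
```

A graded reward of this kind, unlike a binary pass/fail signal, gives the policy partial credit for near-correct migrations, which is one plausible motivation for using similarity rather than exact match.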