Pre-trained or fine-tuned on large code corpora, Large Language Models (LLMs) have demonstrated strong performance in code completion. However, their embedded knowledge is constrained by the timeliness of their training data, which often includes code that uses deprecated APIs. Consequently, LLMs frequently generate calls to deprecated APIs that will no longer be supported in future versions of third-party libraries. While retraining LLMs on updated codebases could refresh their API knowledge, doing so is computationally expensive. Recently, lightweight model editing methods have emerged that efficiently correct specific knowledge in LLMs; however, it remains unclear whether these methods can effectively update deprecated API knowledge and enable edited models to generate up-to-date APIs. To address this gap, we conduct the first systematic study applying 10 state-of-the-art model editing techniques to update deprecated API knowledge in three LLMs: Qwen2.5-Coder, StarCoder2, and DeepSeek-Coder. We introduce EDAPIBench, a dedicated benchmark featuring over 70 deprecated APIs from 8 popular Python libraries, with more than 3,000 editing instances. Our results show that the parameter-efficient fine-tuning method AdaLoRA is the most effective at enabling edited models to generate correct, up-to-date APIs, but falls short in Specificity (i.e., the edit unintentionally alters untargeted knowledge). To resolve this, we propose AdaLoRA-L, which defines "Common API Layers" (layers with high importance across all APIs, which store general knowledge and are excluded from editing) and restricts edits exclusively to "Specific API Layers" (layers with high importance only for the target API, which store the API-specific knowledge). Experimental results demonstrate that AdaLoRA-L significantly improves Specificity while maintaining comparable performance on the other evaluation metrics.
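The layer-selection idea behind AdaLoRA-L can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration only: the importance scores, threshold, and API names are invented assumptions, not the paper's actual measurement procedure. It shows how layers important for every API ("Common API Layers") would be excluded, leaving only layers important solely for the target API ("Specific API Layers") as editable.

```python
# Hypothetical sketch of Common/Specific API Layer selection as described
# in the abstract. Scores and the threshold are illustrative assumptions.

def select_specific_layers(importance, target_api, threshold=0.5):
    """Return layers important for `target_api` but not for all APIs.

    `importance` maps api_name -> {layer_index: importance_score}.
    A layer is a "Common API Layer" if its score exceeds `threshold`
    for every API; such layers hold general knowledge and are excluded
    from editing. The remaining high-importance layers for the target
    API are its "Specific API Layers".
    """
    apis = list(importance)
    layers = importance[target_api].keys()
    # Common API Layers: high importance across all APIs (excluded).
    common = {
        l for l in layers
        if all(importance[a].get(l, 0.0) > threshold for a in apis)
    }
    # Specific API Layers: high importance only for the target API.
    specific = {
        l for l in layers
        if importance[target_api][l] > threshold and l not in common
    }
    return sorted(specific)

# Toy example: layer 0 is important for every API (common, excluded);
# layer 2 is important only for the target API (specific, editable).
scores = {
    "api_a": {0: 0.9, 1: 0.1, 2: 0.8},
    "api_b": {0: 0.8, 1: 0.7, 2: 0.2},
}
print(select_specific_layers(scores, "api_a"))  # -> [2]
```

Restricting the AdaLoRA update to the returned layers is what preserves Specificity: the shared-knowledge layers are never touched, so untargeted knowledge is left intact.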