Large language models (LLMs) have demonstrated impressive capabilities in code generation, where the natural language prompt plays a crucial role in conveying user intent to the model. However, prior studies have shown that LLMs are highly sensitive to prompt perturbations: minor changes in wording, syntax, or formatting can significantly reduce the functional correctness of the generated code. Because such perturbations frequently occur in real-world scenarios, improving the robustness of LLMs to prompt perturbations is essential for reliable code generation in practice. In this paper, we introduce CREME (Code Robustness Enhancement via Model Editing), a novel approach that enhances LLM robustness through targeted parameter updates. CREME first identifies robustness-sensitive layers by comparing hidden states between an original prompt and its perturbed variant, and then performs a lightweight parameter edit at the identified layers to reduce the performance degradation. We evaluate CREME on two widely used code generation benchmarks (HumanEval and MBPP) and their perturbed counterparts. Experimental results show that CREME improves Pass@1 accuracy on perturbed prompts by 63% while maintaining stable performance on clean inputs, with accuracy deviations within 1%. Further analysis reveals that robustness-sensitive layers are concentrated primarily in the middle and deeper layers of the network, and that their locations vary across model architectures. These insights provide a valuable foundation for developing future robustness-oriented editing strategies.
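To make the layer-localization step concrete, the sketch below compares per-layer hidden states for a clean prompt and a perturbed variant and flags the layers where they diverge most. This is a minimal illustration of the idea described above, not the paper's exact procedure: the model name, the cosine-distance metric, last-token pooling, and the top-3 cutoff are all illustrative assumptions.

```python
# Minimal sketch of robustness-sensitive layer identification: run the model
# on a clean prompt and its perturbed variant, collect hidden states at every
# layer, and rank layers by how much the two runs diverge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/deepseek-coder-1.3b-base"  # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def layer_states(prompt: str) -> list[torch.Tensor]:
    """Return the last-token hidden state at every transformer layer."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states is a tuple of (1, seq_len, dim) tensors,
    # one per layer (plus the embedding layer at index 0).
    return [h[0, -1, :] for h in out.hidden_states]

clean = "Write a Python function that returns the sum of a list of integers."
perturbed = "write a python function that Returns teh sum of a list of integer"

# Per-layer divergence: 1 - cosine similarity between clean and perturbed runs.
divergence = [
    1.0 - torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    for a, b in zip(layer_states(clean), layer_states(perturbed))
]

# Layers with the largest clean-vs-perturbed divergence are candidate
# robustness-sensitive layers for the subsequent editing step.
top = sorted(range(len(divergence)), key=lambda i: divergence[i], reverse=True)[:3]
print("Most divergent layers:", top)
```

In this framing, the editing step would then apply a small parameter update only at the selected layers, which is what keeps the intervention lightweight and leaves behavior on clean prompts largely untouched.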