Adversarial Attacks on Neural Models of Code via Code Difference Reduction

Deep learning has been widely used to solve code-based tasks by training deep code models on large numbers of code snippets. However, deep code models remain vulnerable to adversarial attacks. Because source code is discrete and must strictly obey grammar and semantic constraints, adversarial attack techniques from other domains are not applicable. Moreover, existing attack techniques specific to deep code models have limited effectiveness because of the enormous attack space. In this work, we propose a novel adversarial attack technique, CODA. Its key idea is to use the code differences between the target input and reference inputs (inputs that differ only slightly from the target in code but receive different prediction results) to guide the generation of adversarial examples. It considers both structure differences and identifier differences so that the original semantics are preserved. The attack space is thus reduced to the one constituted by these two kinds of code differences, and the attack process can be made far more effective by designing corresponding equivalent structure transformations and identifier renaming transformations. Our experiments on 10 deep code models (i.e., two pre-trained models across five code-based tasks) demonstrate the effectiveness and efficiency of CODA, the naturalness of its generated examples, and its capability to strengthen defense against attacks after adversarial fine-tuning. For example, CODA outperforms the state-of-the-art techniques CARROT and ALERT by 79.25% and 72.20% on average, respectively, in terms of attack success rate.
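To make the identifier-difference idea concrete, the following is a minimal sketch and not CODA's actual implementation. It assumes Python snippets parsed with the standard `ast` module (Python 3.9+ for `ast.unparse`), a single reference input, and a toy stand-in classifier. The names `bound_names`, `all_names`, `rename`, `attack`, and `toy_predict` are all hypothetical. The sketch restricts candidate replacement names to identifiers that appear in the reference input but not in the target, which is the sense in which code differences shrink the attack space. Renaming is applied globally and ignores scoping and keyword arguments, so it is semantics-preserving only for simple snippets like the one shown.

```python
import ast


def bound_names(source: str) -> list[str]:
    """Parameters and assigned variables of a snippet, in source order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.arg):
            found.append((node.lineno, node.col_offset, node.arg))
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            found.append((node.lineno, node.col_offset, node.id))
    ordered = []
    for _, _, name in sorted(found):
        if name not in ordered:
            ordered.append(name)
    return ordered


def all_names(source: str) -> set[str]:
    """Every name used in the snippet; new names must avoid these."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
    return names


class _Renamer(ast.NodeTransformer):
    """Rename one identifier everywhere (simplification: ignores scopes)."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        self.generic_visit(node)
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename(source: str, old: str, new: str) -> str:
    tree = _Renamer(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def attack(target: str, reference: str, predict):
    """Try renamings drawn only from the identifier differences
    between target and reference; return the first variant that
    changes the prediction, or None if there is none."""
    original = predict(target)
    used = all_names(target)
    candidates = [n for n in bound_names(reference) if n not in used]
    for old in bound_names(target):
        for new in candidates:
            variant = rename(target, old, new)
            if predict(variant) != original:
                return variant
    return None


def toy_predict(code: str) -> str:
    """Hypothetical brittle model that keys on one identifier token."""
    return "sum" if "total" in code else "product"


TARGET = (
    "def f(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        total += x\n"
    "    return total\n"
)
REFERENCE = (
    "def g(items):\n"
    "    acc = 1\n"
    "    for item in items:\n"
    "        acc *= item\n"
    "    return acc\n"
)

if __name__ == "__main__":
    print(attack(TARGET, REFERENCE, toy_predict))
```

In this toy setting the search considers at most three target identifiers times three reference identifiers, rather than an open vocabulary of replacement names. A real attack would query an actual deep code model in place of `toy_predict` and would also need the equivalent structure transformations that the abstract mentions, which this sketch omits.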
