Towards Class-Oriented Poisoning Attacks Against Neural Networks

Poisoning attacks on machine learning systems compromise model performance by deliberately injecting malicious samples into the training dataset to influence the training process. Prior works focus on either availability attacks (i.e., lowering the overall model accuracy) or integrity attacks (i.e., enabling specific instance-based backdoors). In this paper, we extend the adversarial objectives of availability attacks to a per-class basis, which we refer to as class-oriented poisoning attacks. We demonstrate that the proposed attack can force the corrupted model to predict in two specific ways: (i) classify unseen new images as a targeted "supplanter" class, and (ii) misclassify images from a "victim" class while maintaining classification accuracy on the other, non-victim classes. To maximize the adversarial effect and reduce the computational complexity of poisoned-data generation, we propose a gradient-based framework that crafts poisoning images with carefully manipulated feature information for each scenario. Using newly defined class-level metrics, we demonstrate the effectiveness of the proposed class-oriented poisoning attacks on various models (e.g., LeNet-5, VGG-9, and ResNet-50) over a wide range of datasets (e.g., MNIST, CIFAR-10, and ImageNet-ILSVRC2012) in an end-to-end training setting.
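
The abstract does not specify the crafting objective, so the following is only a generic illustration of gradient-based image crafting, not the authors' framework. It applies a targeted, PGD-style perturbation against a hypothetical linear softmax surrogate, pushing one image toward an assumed "supplanter" class index while staying within an L-infinity budget. The surrogate weights, the `eps`, `step`, and `iters` values, the class index, and the helper names `softmax` and `craft_poison` are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def craft_poison(x, W, b, target, eps=0.1, step=0.02, iters=50):
    """Hypothetical targeted PGD against a linear softmax surrogate.

    x: flattened image with pixels in [0, 1], shape (d,)
    W: surrogate weights, shape (k, d); b: surrogate biases, shape (k,)
    target: index of the assumed "supplanter" class
    Returns x_adv with ||x_adv - x||_inf <= eps and pixels in [0, 1].
    """
    x_adv = x.copy()
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(iters):
        p = softmax(W @ x_adv + b)
        # Input gradient of the targeted cross-entropy -log p[target]
        # for logits z = W x + b is W^T (p - onehot).
        grad = W.T @ (p - onehot)
        # Signed descent step on the targeted loss.
        x_adv = x_adv - step * np.sign(grad)
        # Project onto the L_inf ball around x, then onto valid pixels.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


# Illustrative usage with random (untrained) surrogate weights.
rng = np.random.default_rng(0)
k, d = 10, 784  # e.g. 10 classes, 28x28 grayscale images
W = rng.normal(scale=0.05, size=(k, d))
b = np.zeros(k)
x = rng.uniform(0.0, 1.0, size=d)
target = 3  # assumed supplanter class

x_poison = craft_poison(x, W, b, target)
before = softmax(W @ x + b)[target]
after = softmax(W @ x_poison + b)[target]
print(f"p(target) before: {before:.3f}, after: {after:.3f}")
print("max perturbation:", np.max(np.abs(x_poison - x)))
```

In a poisoning setting, images crafted this way would be added to the training set rather than fed to the model at test time; how the paper selects, labels, and manipulates the features of its poisons for each scenario is not described in this abstract.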