Guided Diffusion Model for Adversarial Purification

As deep neural networks (DNNs) are applied more widely across algorithms and frameworks, security threats have become a growing concern. Adversarial attacks target DNN-based image classifiers: an attacker intentionally adds imperceptible adversarial perturbations to input images in order to fool the classifier. In this paper, we propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP), to help protect classifiers from adversarial attacks. The core of our approach is to embed purification into the diffusion-denoising process of a Denoising Diffusion Probabilistic Model (DDPM), so that the diffusion process submerges the adversarial perturbations in gradually added Gaussian noise, and both kinds of noise are then removed together by a guided denoising process. In comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations introduced by adversarial attacks to a shallow range, thereby significantly improving classification accuracy. GDMP improves robust accuracy by 5%, reaching 90.1% under PGD attack on the CIFAR-10 dataset. Moreover, GDMP achieves 70.94% robustness on the challenging ImageNet dataset.
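
The diffuse-then-denoise idea described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the function names (`make_schedule`, `diffuse`, `guided_reverse_step`, `purify`), the noise predictor `eps_model`, the diffusion depth `t_star`, and the form of the guidance term (a pull toward the adversarial input scaled by `guide_scale`) are all hypothetical choices made for illustration.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption); returns betas and cumulative alphas."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def diffuse(x0, t, alpha_bars, rng):
    """Forward process: draw x_t from q(x_t | x_0) in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def guided_reverse_step(x_t, t, x_adv, eps_model, betas, alpha_bars, guide_scale, rng):
    """One DDPM reverse step plus a hypothetical guidance term toward x_adv."""
    beta_t = betas[t]
    eps_hat = eps_model(x_t, t)  # predicted noise (trained network in practice)
    # Standard DDPM posterior mean computed from the predicted noise.
    mean = (x_t - beta_t / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(1.0 - beta_t)
    # Guidance (assumption): gradient of -0.5 * ||x_t - x_adv||^2, scaled by the
    # step variance, keeps the sample close to the content of the input image.
    mean = mean + guide_scale * beta_t * (x_adv - x_t)
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(beta_t) * rng.standard_normal(x_t.shape)

def purify(x_adv, t_star, eps_model, guide_scale=1.0, seed=0):
    """Diffuse x_adv to depth t_star, then denoise back to step 0 with guidance."""
    rng = np.random.default_rng(seed)
    betas, alpha_bars = make_schedule()
    x = diffuse(x_adv, t_star, alpha_bars, rng)
    for t in range(t_star, -1, -1):
        x = guided_reverse_step(x, t, x_adv, eps_model, betas, alpha_bars, guide_scale, rng)
    return x

# Usage with a placeholder noise predictor; a real system would use a trained DDPM.
dummy_eps_model = lambda x, t: np.zeros_like(x)
x_adv = np.zeros((3, 8, 8))
x_purified = purify(x_adv, t_star=10, eps_model=dummy_eps_model)
```

The purified image would then be passed to the classifier in place of the raw input. Shallow diffusion depths keep more image content but remove less of the perturbation, so `t_star` and `guide_scale` are the two knobs this sketch exposes for that trade-off.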
