Deep Neural Networks (DNNs) are vulnerable to adversarial attacks: carefully constructed perturbations to an image can seriously impair classification accuracy while remaining imperceptible to humans. While there has been a significant amount of research on defending against such attacks, most defenses based on systematic design principles have been defeated by appropriately modified attacks. For a fixed set of data, the most effective current defense is to train the network using adversarially perturbed examples. In this paper, we investigate a radically different, neuro-inspired defense mechanism, starting from the observation that human vision is virtually unaffected by adversarial examples designed for machines. We aim to reject L∞-bounded adversarial perturbations before they reach a classifier DNN, using an encoder with characteristics commonly observed in biological vision: sparse overcomplete representations, randomness due to synaptic noise, and drastic nonlinearities. Encoder training is unsupervised, using standard dictionary learning. A CNN-based decoder restores the size of the encoder output to that of the original image, enabling the use of a standard CNN for classification. Our nominal design is to train the decoder and classifier together in standard supervised fashion, but we also consider unsupervised decoder training based on a regression objective (as in a conventional autoencoder) with separate supervised training of the classifier. Unlike adversarial training, all training is based on clean images. Our experiments on the CIFAR-10 dataset show performance competitive with state-of-the-art defenses based on adversarial training, and point to the promise of neuro-inspired techniques for the design of robust neural networks. In addition, we provide results for a subset of the ImageNet dataset to verify that our approach scales to larger images.
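The abstract outlines a pipeline of a sparse, noisy, hard-nonlinear front end followed by a CNN decoder and a standard classifier. Below is a minimal PyTorch sketch of that pipeline under stated assumptions: the dictionary is modeled as a fixed convolutional filter bank (in the paper it is learned offline with unsupervised dictionary learning), and the layer sizes, threshold, and noise level are illustrative choices, not the paper's settings.

```python
# Illustrative sketch of the encoder -> decoder -> classifier pipeline.
# All hyperparameters here (n_atoms, threshold, noise_std, layer widths)
# are assumptions for demonstration, not the paper's configuration.
import torch
import torch.nn as nn


class SparseFrontEnd(nn.Module):
    """Overcomplete encoding with synaptic noise and a hard nonlinearity.

    The dictionary is represented as a frozen convolutional filter bank;
    in the paper it is learned on clean image patches with standard
    (unsupervised) dictionary learning.
    """
    def __init__(self, in_ch=3, n_atoms=192, patch=7, threshold=0.5, noise_std=0.1):
        super().__init__()
        self.dictionary = nn.Conv2d(in_ch, n_atoms, patch, padding=patch // 2, bias=False)
        for p in self.dictionary.parameters():
            p.requires_grad = False  # dictionary learned offline, kept fixed
        self.threshold = threshold
        self.noise_std = noise_std

    def forward(self, x):
        code = self.dictionary(x)  # overcomplete representation
        if self.training:
            code = code + self.noise_std * torch.randn_like(code)  # synaptic noise
        # Drastic nonlinearity: zero out weak activations (hard thresholding).
        return code * (code.abs() > self.threshold).float()


class Decoder(nn.Module):
    """Small CNN mapping the sparse code back to an image-sized tensor."""
    def __init__(self, n_atoms=192, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_atoms, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, code):
        return self.net(code)


class Classifier(nn.Module):
    """Standard CNN classifier operating on the reconstructed image."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))


# Nominal design: decoder and classifier trained jointly, on clean images only.
frontend, decoder, clf = SparseFrontEnd(), Decoder(), Classifier()
images = torch.rand(8, 3, 32, 32)            # e.g. a CIFAR-10 sized batch
logits = clf(decoder(frontend(images)))
print(logits.shape)                          # torch.Size([8, 10])
```

In the alternative design mentioned in the abstract, the decoder would instead be trained with a regression (reconstruction) objective as in a conventional autoencoder, with the classifier trained separately on its outputs.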