Deep Neural Networks (DNNs) are vulnerable to adversarial attacks: carefully constructed perturbations to an image can seriously impair classification accuracy while remaining imperceptible to humans. While there has been a significant amount of research on defending against such attacks, most defenses based on systematic design principles have been defeated by appropriately modified attacks. For a fixed set of data, the most effective current defense is to train the network using adversarially perturbed examples. In this paper, we investigate a radically different, neuro-inspired defense mechanism, starting from the observation that human vision is virtually unaffected by adversarial examples designed for machines. We aim to reject L∞-bounded adversarial perturbations before they reach a classifier DNN, using an encoder with characteristics commonly observed in biological vision: sparse overcomplete representations, randomness due to synaptic noise, and drastic nonlinearities. Encoder training is unsupervised, using standard dictionary learning. A CNN-based decoder restores the size of the encoder output to that of the original image, enabling the use of a standard CNN for classification. Our nominal design is to train the decoder and classifier together in standard supervised fashion, but we also consider unsupervised decoder training based on a regression objective (as in a conventional autoencoder) with separate supervised training of the classifier. Unlike adversarial training, all training is based on clean images. Our experiments on the CIFAR-10 dataset show performance competitive with state-of-the-art defenses based on adversarial training, and point to the promise of neuro-inspired techniques for the design of robust neural networks. In addition, we provide results for a subset of the ImageNet dataset to verify that our approach scales to larger images.
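The abstract outlines a pipeline of a sparse, noisy, hard-nonlinear front end followed by a CNN decoder and a standard classifier. Below is a minimal PyTorch sketch of that pipeline under stated assumptions: the dictionary is modeled as a fixed convolutional filter bank (in the paper it is learned offline with unsupervised dictionary learning), and the layer sizes, threshold, and noise level are illustrative choices, not the paper's settings.

```python
# Illustrative sketch of the encoder -> decoder -> classifier pipeline.
# All hyperparameters here (n_atoms, threshold, noise_std, layer widths)
# are assumptions for demonstration, not the paper's configuration.
import torch
import torch.nn as nn


class SparseFrontEnd(nn.Module):
    """Overcomplete encoding with synaptic noise and a hard nonlinearity.

    The dictionary is represented as a frozen convolutional filter bank;
    in the paper it is learned on clean image patches with standard
    (unsupervised) dictionary learning.
    """
    def __init__(self, in_ch=3, n_atoms=192, patch=7, threshold=0.5, noise_std=0.1):
        super().__init__()
        self.dictionary = nn.Conv2d(in_ch, n_atoms, patch, padding=patch // 2, bias=False)
        for p in self.dictionary.parameters():
            p.requires_grad = False  # dictionary learned offline, kept fixed
        self.threshold = threshold
        self.noise_std = noise_std

    def forward(self, x):
        code = self.dictionary(x)  # overcomplete representation
        if self.training:
            code = code + self.noise_std * torch.randn_like(code)  # synaptic noise
        # Drastic nonlinearity: zero out weak activations (hard thresholding).
        return code * (code.abs() > self.threshold).float()


class Decoder(nn.Module):
    """Small CNN mapping the sparse code back to an image-sized tensor."""
    def __init__(self, n_atoms=192, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_atoms, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, code):
        return self.net(code)


class Classifier(nn.Module):
    """Standard CNN classifier operating on the reconstructed image."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))


# Nominal design: decoder and classifier trained jointly, on clean images only.
frontend, decoder, clf = SparseFrontEnd(), Decoder(), Classifier()
images = torch.rand(8, 3, 32, 32)            # e.g. a CIFAR-10 sized batch
logits = clf(decoder(frontend(images)))
print(logits.shape)                          # torch.Size([8, 10])
```

In the alternative design mentioned in the abstract, the decoder would instead be trained with a regression (reconstruction) objective as in a conventional autoencoder, with the classifier trained separately on its outputs.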