Making Attention Mechanisms More Robust and Interpretable with Virtual Adversarial Training

Adversarial training (AT) for attention mechanisms has successfully mitigated their limited robustness and interpretability by training with adversarial perturbations. However, this technique requires label information, so its use is limited to supervised settings. In this study, we explore incorporating virtual AT (VAT) into attention mechanisms, which allows adversarial perturbations to be computed even from unlabeled data. To realize this approach, we propose two general training techniques, VAT for attention mechanisms (Attention VAT) and "interpretable" VAT for attention mechanisms (Attention iVAT), which extend AT for attention mechanisms to a semi-supervised setting. In particular, Attention iVAT focuses on differences in attention; thus, it can efficiently learn clearer attention and improve model interpretability, even with unlabeled data. Experiments on six public datasets showed that our techniques achieve better prediction performance than conventional AT-based and VAT-based techniques, and agree more strongly with human-provided evidence when detecting important words in sentences. Moreover, our proposal offers these advantages without requiring careful selection of unlabeled data: even when a model using our VAT-based technique is trained on unlabeled data from a source other than the target task, both prediction performance and model interpretability can improve.
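To make the label-free idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical toy classifier whose attention weights are a softmax over pre-softmax attention scores, and it assumes the perturbation is added to those scores. It follows the standard VAT recipe: the model's own prediction on the clean input serves as the target, so no label is needed, and one power-iteration step on the KL divergence approximates the most damaging perturbation direction. Gradients are taken by finite differences only to keep the example dependency-light; a real implementation would use automatic differentiation. The attention-difference objective that distinguishes Attention iVAT is not modeled here.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def predict(scores, H, W):
    """Hypothetical toy classifier: attention weights = softmax(scores),
    context = weighted sum of token states H (T x D), class probabilities
    = softmax(W @ context)."""
    a = softmax(scores)
    return softmax(W @ (a @ H))


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


def num_grad(f, x, h=1e-5):
    """Central finite-difference gradient of scalar f at x (stands in for autodiff)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        g.flat[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g


def attention_vat_loss(scores, H, W, eps=1.0, xi=1e-2, n_power=1, rng=None):
    """VAT regularizer on attention scores; needs no label.

    Returns (loss, r_adv), where r_adv is the adversarial perturbation of the
    pre-softmax attention scores with L2 norm eps (assumed setup)."""
    rng = np.random.default_rng(0) if rng is None else rng
    p = predict(scores, H, W)  # clean prediction, held fixed as the target
    d = rng.normal(size=scores.shape)  # random initial direction
    for _ in range(n_power):
        r = xi * d / np.linalg.norm(d)
        # Gradient of KL(p || p(scores + r)) w.r.t. the perturbation, at r.
        d = num_grad(lambda v: kl(p, predict(scores + v, H, W)), r)
    r_adv = eps * d / np.linalg.norm(d)
    return kl(p, predict(scores + r_adv, H, W)), r_adv


# Usage on one unlabeled example with random (made-up) parameters.
rng = np.random.default_rng(42)
T, D, C = 5, 4, 3  # tokens, hidden size, classes
scores = rng.normal(size=T)
H = rng.normal(size=(T, D))
W = rng.normal(size=(C, D))
loss, r_adv = attention_vat_loss(scores, H, W, eps=1.0)
print(f"VAT loss on an unlabeled example: {loss:.4f}")
```

In semi-supervised training, a regularizer like this would presumably be computed on both labeled and unlabeled examples and added to the supervised loss. The weighting, `eps`, `xi`, and the number of power iterations are assumptions to tune, not values taken from the paper.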
