Attention mechanisms have achieved significant empirical success in multiple fields, but their underlying optimization objectives remain unclear. Moreover, the quadratic complexity of self-attention has become increasingly prohibitive. Although interpretability and efficiency are two mutually reinforcing pursuits, prior work typically investigates them separately. In this paper, we propose a unified optimization objective that derives inherently interpretable and efficient attention mechanisms through algorithm unrolling. Specifically, we construct a gradient step of the proposed objective from a set of forward-pass operations of our \emph{Contract-and-Broadcast Self-Attention} (CBSA), which compresses input tokens towards low-dimensional structures by contracting a small set of representative tokens. This novel mechanism not only scales linearly when the number of representatives is fixed, but also subsumes a variety of attention mechanisms as instantiations under different choices of representatives. We conduct extensive experiments demonstrating performance comparable to, and distinct advantages over, black-box attention mechanisms on visual tasks. Our work sheds light on the integration of interpretability and efficiency, as well as on a unified formulation of attention mechanisms.
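To make the complexity claim concrete, the following is a minimal, hypothetical sketch of attention through a fixed set of representatives: tokens are first contracted into k representative summaries and the summaries are then broadcast back to every token, so cost grows as O(nk) instead of O(n^2). The function names and the two-step softmax formulation are illustrative assumptions, not the paper's exact CBSA operations.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def contract_broadcast_attention(X, R):
    """Illustrative sketch (not the authors' exact CBSA):
    X: (n, d) input tokens; R: (k, d) fixed representatives, k << n.
    Contract step: each representative aggregates over all tokens.
    Broadcast step: each token reads from the k compressed summaries.
    Total cost is O(n*k*d) rather than the O(n^2*d) of full self-attention."""
    d = X.shape[1]
    scale = 1.0 / np.sqrt(d)
    # Contract: (k, n) attention of representatives over tokens.
    A_in = softmax(R @ X.T * scale, axis=-1)
    Z = A_in @ X                       # (k, d) compressed summaries
    # Broadcast: (n, k) attention of tokens over summaries.
    A_out = softmax(X @ Z.T * scale, axis=-1)
    return A_out @ Z                   # (n, d) updated tokens

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))   # 64 tokens, dim 8
R = rng.standard_normal((4, 8))    # 4 representatives
out = contract_broadcast_attention(X, R)
```

Fixing k while n grows is what yields the linear scaling mentioned above; choosing R equal to the tokens themselves would recover a quadratic, all-pairs mechanism.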