Moving from pure Multilayer Perceptrons (MLPs) to a learnable graph message-passing mechanism at each layer has been foundational to state-of-the-art results (e.g., GATs or Transformers), despite the computational trade-off. Going a step further, in this work we introduce N-simplicial attention, which extends pairwise token similarity to higher-order interactions, and adapt it to Rotary Position Embeddings (RoPE). To manage the increased complexity, we propose a cost-effective simplex selection that lets the model focus its computation on the most task-sensitive interactions. Beyond these core mechanisms, we study the smoothing behavior of N-simplicial attention by deriving a Lipschitz upper bound and by showing that, despite opening attention message-passing to higher-order interactions, it still suffers from over-smoothing on its own.
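To make the "pairwise to higher-order" step concrete, here is a minimal sketch (not the paper's implementation) of a 2-simplicial attention score: each query attends over pairs of key positions via a trilinear form, so attention weights live on triangles (2-simplices) rather than edges. The tensor names, the einsum layout, and the value aggregation are illustrative assumptions.

```python
import torch

def two_simplicial_attention(q, k1, k2, v1, v2):
    """q, k1, k2, v1, v2: (batch, seq, dim). Returns (batch, seq, dim)."""
    d = q.shape[-1]
    # Trilinear logits over (query i, key j, key k): one score per 2-simplex.
    logits = torch.einsum("bid,bjd,bkd->bijk", q, k1, k2) / d  # heuristic scaling
    # Softmax jointly over the pair of key positions attached to each query.
    b, n = logits.shape[0], logits.shape[1]
    attn = torch.softmax(logits.reshape(b, n, -1), dim=-1).reshape(b, n, n, n)
    # Aggregate a value per triangle, here an elementwise product of the two
    # key-side values, then average under the attention weights.
    pair_vals = torch.einsum("bjd,bkd->bjkd", v1, v2)
    return torch.einsum("bijk,bjkd->bid", attn, pair_vals)

# Tiny usage example with random tensors.
x = torch.randn(2, 8, 16)
out = two_simplicial_attention(x, x, x, x, x)
print(out.shape)  # torch.Size([2, 8, 16])
```

The O(n^3) size of the logits tensor in this sketch is exactly the cost that a simplex-selection step, as proposed above, would aim to avoid by scoring only a task-sensitive subset of triangles.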