How Smoothing is N-simplicial Attention?


Simplex · Interaction · Attention mechanism · Message passing · Higher-order


Alexandre Dussolle, Pietro Liò

arXiv preprint

Moving from a pure Multilayer Perceptron (MLP) to a learnable graph message-passing mechanism at each layer (e.g. GATs or Transformers) has been foundational to state-of-the-art results, despite the added computational cost. To go a step further, in this work we introduce N-simplicial attention, which extends pairwise token similarity to higher-order interactions, and adapt it for Rotary Position Embeddings (RoPE). To help manage the increased complexity, we propose a cost-effective simplex selection that lets the model focus its computational load on the more task-sensitive interactions. Beyond these core mechanisms, we study how smoothing N-simplicial attention is by deriving a Lipschitz upper bound and by demonstrating that, by itself, it also suffers from over-smoothing, despite opening attention message-passing to higher-order interactions.
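To make the step from pairwise similarity to higher-order interactions concrete, here is a minimal NumPy sketch of the N = 2 case, where each query attends to pairs of tokens instead of single tokens. It uses a trilinear formulation that appears in earlier work on 2-simplicial attention; the abstract does not give the authors' exact definition, so the function name `two_simplicial_attention`, the `1/sqrt(d)` scaling, the joint softmax over pairs, and the elementwise-product pair value are all assumptions made for illustration. RoPE and simplex selection are not modeled.

```python
import numpy as np

def two_simplicial_attention(q, k1, k2, v1, v2):
    """Illustrative 2-simplicial attention (assumed trilinear form).

    q, k1, k2, v1, v2: arrays of shape (n, d).
    Returns (out, weights) with shapes (n, d) and (n, n, n).
    """
    n, d = q.shape
    # Trilinear logits: logits[i, j, k] = sum_c q[i, c] * k1[j, c] * k2[k, c].
    # The 1/sqrt(d) scaling is borrowed from standard attention (assumption).
    logits = np.einsum("ic,jc,kc->ijk", q, k1, k2) / np.sqrt(d)
    # Softmax jointly over all (j, k) pairs for each query i.
    flat = logits.reshape(n, n * n)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(flat)
    weights /= weights.sum(axis=1, keepdims=True)
    weights = weights.reshape(n, n, n)
    # Value of a pair (j, k): elementwise product v1[j] * v2[k] (assumption).
    out = np.einsum("ijk,jc,kc->ic", weights, v1, v2)
    return out, weights

rng = np.random.default_rng(0)
n, d = 4, 3
q, k1, k2, v1, v2 = (rng.normal(size=(n, d)) for _ in range(5))
out, weights = two_simplicial_attention(q, k1, k2, v1, v2)
```

The weight tensor has n³ entries for n tokens, compared with n² for pairwise attention, which illustrates the increased complexity that the abstract's simplex selection is meant to manage.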

