Our paper challenges claims from prior research that transformer-based models, when learning in context, implicitly implement standard learning algorithms. We present empirical evidence inconsistent with this view and provide a mathematical analysis demonstrating that transformers cannot achieve general predictive accuracy due to inherent architectural limitations.