Agentic language models compose multi-step reasoning chains, yet intermediate steps can be corrupted by inconsistent context, retrieval errors, or adversarial inputs; this makes post hoc evaluation too late, because errors propagate before they are detected. We introduce a diagnostic that requires no additional training and uses only the forward pass to emit a binary accept-or-reject signal during agent execution. The method analyzes token graphs induced by attention and computes two spectral statistics in early layers, namely the high-frequency energy ratio and the spectral entropy. We formalize these signals, establish their invariances, and provide finite-sample estimators with uncertainty quantification. Under a two-regime mixture assumption with a monotone likelihood ratio property, we show that a single threshold on the high-frequency energy ratio is Bayes-optimal for detecting context inconsistency. Empirically, the high-frequency energy ratio exhibits robust bimodality during context verification across multiple model families, enabling gating decisions with overhead below one millisecond on our hardware and configurations. We demonstrate integration into retrieval-augmented agent pipelines and discuss deployment as an inline safety monitor. The approach detects contamination while the model is still processing the text, before errors commit to the reasoning chain.
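To make the pipeline concrete, the following is a minimal sketch of the two statistics, assuming the attention-induced token graph is the symmetrized attention matrix of a single head, the graph signal is the corresponding token hidden states, and the graph Fourier basis comes from the combinatorial Laplacian. The function names, the 0.5 frequency cutoff, and the 0.35 gating threshold are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

def spectral_stats(attn, signal, cutoff_frac=0.5):
    """High-frequency energy ratio (HFER) and spectral entropy for one head.

    attn        : (T, T) row-stochastic attention weights for one head/layer
    signal      : (T, d) token hidden states used as the graph signal
    cutoff_frac : fraction of the graph spectrum treated as "high frequency"
                  (illustrative assumption)
    """
    # Symmetrize attention to obtain an undirected weighted adjacency matrix.
    W = 0.5 * (attn + attn.T)
    deg = W.sum(axis=1)
    # Combinatorial graph Laplacian L = D - W.
    L = np.diag(deg) - W
    # Eigendecomposition gives the graph Fourier basis, frequencies ascending.
    eigvals, eigvecs = np.linalg.eigh(L)
    # Graph Fourier transform of the signal: one coefficient row per frequency.
    coeffs = eigvecs.T @ signal                    # (T, d)
    energy = (coeffs ** 2).sum(axis=1)             # energy per graph frequency
    total = energy.sum() + 1e-12
    # High-frequency energy ratio: share of energy above the cutoff frequency.
    cutoff = int(cutoff_frac * len(energy))
    hfer = energy[cutoff:].sum() / total
    # Spectral entropy of the normalized energy distribution.
    p = energy / total
    spectral_entropy = -(p * np.log(p + 1e-12)).sum()
    return hfer, spectral_entropy

def gate(attn, signal, threshold=0.35):
    """Binary accept/reject signal: reject when HFER exceeds the threshold."""
    hfer, _ = spectral_stats(attn, signal)
    return "reject" if hfer > threshold else "accept"
```

Under the two-regime reading above, a calibrated single threshold on the HFER plays the role of the Bayes-optimal gate, while spectral entropy is computed alongside it as a secondary signal; both use only quantities already produced by the forward pass.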