Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs

Carefully engineered system prompts play a critical role in guiding the behavior of LLM agents, but their considerable length introduces significant drawbacks: higher inference latency, higher computational cost, and a shorter effective context length. This raises the question of whether such lengthy prompts can be replaced by far fewer tokens while preserving their behavioral effect on downstream tasks. To this end, we propose a lightweight three-stage training framework that learns a single prompt-specific Behavior-Equivalent token ([BE]). The framework first trains [BE] to encode the natural-language content of the original system prompt via reconstruction, and then distills the prompt's downstream behavior into this single token. Importantly, our method requires no access to model internals, no auxiliary compression models, and no labeled responses. Empirical evaluations on three datasets show that a single [BE] token achieves up to a 3000× reduction in prompt length while retaining about 98% of the downstream performance of the original system prompts. This substantially reduces inference cost and leaves almost the entire context window available for user inputs.
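The abstract does not spell out the training objectives, so the following is only a rough illustration of the idea behind the distillation stage: keep the model frozen and optimize a single input embedding so that the model's output distribution with that one token matches its output distribution with the full prompt. The sketch assumes a hypothetical toy "model" (summed token embeddings followed by a fixed linear map and a softmax) and uses a KL-divergence objective. The model, the KL loss, and all names (`frozen_model`, `distill_be_token`, `avg_kl`) are illustrative assumptions, not the authors' implementation, and the reconstruction stage is omitted entirely.

```python
import math
import random

# HYPOTHETICAL toy stand-in for a frozen LLM, for illustration only:
# it sums the embeddings of every context token and maps the sum to
# next-token logits with a fixed matrix W. Only [BE] is ever updated.
DIM, VOCAB = 4, 6


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def frozen_model(W, tokens):
    """Next-token distribution for a list of embedding vectors."""
    ctx = [sum(t[d] for t in tokens) for d in range(DIM)]
    logits = [sum(W[k][d] * ctx[d] for d in range(DIM)) for k in range(VOCAB)]
    return softmax(logits)


def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def avg_kl(W, prompt, be, user_inputs):
    """Mean KL between the long-prompt teacher and the [BE] student."""
    return sum(
        kl(frozen_model(W, prompt + x), frozen_model(W, [be] + x))
        for x in user_inputs
    ) / len(user_inputs)


def distill_be_token(W, prompt, user_inputs, steps=3000, lr=0.3, seed=0):
    """Gradient descent on avg_kl with respect to a single embedding."""
    rng = random.Random(seed)
    be = [rng.gauss(0.0, 0.1) for _ in range(DIM)]
    # Teacher outputs are fixed: the model and the long prompt never change.
    teachers = [frozen_model(W, prompt + x) for x in user_inputs]
    for _ in range(steps):
        grad = [0.0] * DIM
        for x, p in zip(user_inputs, teachers):
            q = frozen_model(W, [be] + x)
            # d KL / d logits = q - p; with sum pooling, d logits / d be = W.
            for d in range(DIM):
                grad[d] += sum((q[k] - p[k]) * W[k][d] for k in range(VOCAB))
        be = [b - lr * g / len(user_inputs) for b, g in zip(be, grad)]
    return be


# Usage: a 50-token "system prompt" distilled into one embedding.
rng = random.Random(42)
W = [[rng.gauss(0.0, 0.5) for _ in range(DIM)] for _ in range(VOCAB)]
prompt = [[rng.gauss(0.0, 0.2) for _ in range(DIM)] for _ in range(50)]
user_inputs = [
    [[rng.gauss(0.0, 0.5) for _ in range(DIM)] for _ in range(rng.randint(1, 5))]
    for _ in range(8)
]
be = distill_be_token(W, prompt, user_inputs)
baseline = avg_kl(W, prompt, [0.0] * DIM, user_inputs)  # prompt simply dropped
distilled = avg_kl(W, prompt, be, user_inputs)          # 50 tokens -> 1 token
print(f"KL without prompt: {baseline:.4f}, with [BE]: {distilled:.6f}")
```

Because this toy model sums its inputs, a single embedding can reproduce the prompt's effect exactly, so the KL can be driven close to zero. A real transformer offers no such guarantee, which is why the paper reports retained downstream performance rather than exact equivalence.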
