Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned documents for each target phrase. We introduce Eyes-on-Me, a modular attack that decomposes an adversarial document into reusable Attention Attractors and Focus Regions. Attractors are optimized to direct attention to the Focus Region. Attackers can then insert semantic baits for the retriever or malicious instructions for the generator into the Focus Region, adapting to new targets at near-zero cost. The attractors achieve this by steering a small subset of attention heads that we empirically identify as strongly correlated with attack success. Across 18 end-to-end RAG settings (3 datasets $\times$ 2 retrievers $\times$ 3 generators), Eyes-on-Me raises the average attack success rate from 21.9% to 57.8% (+35.9 points, 2.6$\times$ that of prior work). A single optimized attractor transfers to unseen black-box retrievers and generators without retraining. Our findings establish a scalable paradigm for RAG data poisoning and show that modular, reusable components pose a practical threat to modern AI systems. They also reveal a strong link between attention concentration and model outputs, informing interpretability research.
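The modular structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Attractor` class, the `build_poisoned_document` and `focus_attention_share` helpers, the prefix/suffix layout, and the toy attention weights are all hypothetical and introduced here only for illustration. The sketch shows how one fixed attractor could wrap different Focus Region payloads without re-optimization, how the share of attention landing on the Focus Region might be measured, and how the reported headline numbers relate to each other.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Attractor:
    """Hypothetical container for an already-optimized, reusable attractor.

    The strings are placeholders; the optimization itself is out of scope here.
    """
    prefix: str
    suffix: str


def build_poisoned_document(attractor: Attractor, focus_region: str) -> str:
    """Wrap a swappable Focus Region payload with the fixed attractor text.

    Assumption: the attractor surrounds the Focus Region as prefix and suffix.
    """
    return f"{attractor.prefix} {focus_region} {attractor.suffix}"


def focus_attention_share(weights, focus_start, focus_end):
    """Fraction of total attention mass on tokens [focus_start, focus_end)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(weights[focus_start:focus_end]) / total


# One attractor, two different payloads: only the Focus Region changes.
attractor = Attractor(prefix="<ATTRACTOR_PREFIX>", suffix="<ATTRACTOR_SUFFIX>")
doc_a = build_poisoned_document(attractor, "<payload for target A>")
doc_b = build_poisoned_document(attractor, "<payload for target B>")

# Toy per-token attention weights (made up); token 2 is the Focus Region.
share = focus_attention_share([1, 2, 6, 1], focus_start=2, focus_end=3)

# Consistency check of the figures quoted in the abstract.
settings = list(product(range(3), range(2), range(3)))  # datasets x retrievers x generators
baseline_asr, attack_asr = 21.9, 57.8
gain_points = round(attack_asr - baseline_asr, 1)
ratio = round(attack_asr / baseline_asr, 1)
```

Under these assumptions, adapting to a new target amounts to a string substitution in the Focus Region, which is the sense in which the per-target cost is near zero. The attention-share helper is only a stand-in for whatever concentration measure the paper actually uses over its selected attention heads.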