Prompt injection attacks pose a pervasive threat to the security of Large Language Models (LLMs). State-of-the-art prevention-based defenses typically rely on fine-tuning an LLM to enhance its security, but they achieve limited effectiveness against strong attacks. In this work, we propose \emph{SecInfer}, a novel defense against prompt injection attacks built on \emph{inference-time scaling}, an emerging paradigm that boosts LLM capability by allocating more compute resources for reasoning during inference. SecInfer consists of two key steps: \emph{system-prompt-guided sampling}, which generates multiple responses for a given input by exploring diverse reasoning paths through a varied set of system prompts, and \emph{target-task-guided aggregation}, which selects the response most likely to accomplish the intended task. Extensive experiments show that, by leveraging additional compute at inference, SecInfer effectively mitigates both existing and adaptive prompt injection attacks, outperforming state-of-the-art defenses as well as existing inference-time scaling approaches.
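To make the two steps concrete, below is a minimal sketch of the SecInfer pipeline described above. It assumes a generic chat-completion callable `llm(system_prompt, user_input)` and a scoring function `task_score(response, target_task)`; both names, and the scoring mechanism itself, are hypothetical placeholders rather than the paper's actual interface.

```python
from typing import Callable, List

def secinfer(
    llm: Callable[[str, str], str],          # hypothetical LLM call: (system prompt, input) -> response
    task_score: Callable[[str, str], float], # hypothetical scorer: how well a response fits the target task
    system_prompts: List[str],               # a varied set of system prompts encoding the target task
    user_input: str,                         # input data that may contain an injected prompt
    target_task: str,                        # description of the intended (target) task
) -> str:
    # Step 1: system-prompt-guided sampling -- explore diverse reasoning
    # paths by querying the LLM once per system prompt.
    candidates = [llm(sp, user_input) for sp in system_prompts]

    # Step 2: target-task-guided aggregation -- keep the response judged
    # most likely to accomplish the intended task.
    return max(candidates, key=lambda r: task_score(r, target_task))
```

The sketch only illustrates how extra inference-time compute is spent: the number of system prompts controls the sampling budget, and aggregation reduces the candidate set to a single response aligned with the target task.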