This paper articulates short- and long-term research problems in AI agent security and privacy through the lens of computer systems security. This approach examines the end-to-end security properties of entire systems, rather than AI models in isolation. While hardening a single model is useful, it is often insufficient. By analogy, creating a model that is always helpful and harmless is akin to creating software that is always helpful and harmless; the collective experience of decades of cybersecurity research and practice shows that this alone is not enough. Rather, constructing an informed and realistic attacker model before building a system, applying hard-earned lessons from software security, and continuously improving security posture together form a tried-and-tested approach to securing real computer systems. A key goal is to examine where research challenges arise when traditional security principles are applied in the context of AI agents. A secondary goal of this report is to distill these ideas for AI and ML practitioners and researchers. We discuss the challenges of applying security principles to agentic computing, present 11 case studies of real attacks on agentic systems, and define a series of new research problems specific to the security of agentic systems.