Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a standardized interface for connecting LLMs with heterogeneous tools and services. Yet MCP's openness and multi-server workflows introduce new safety risks that existing benchmarks fail to capture, as they focus on isolated attacks or lack real-world coverage. We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides, and includes tasks requiring multi-step reasoning and cross-server coordination under uncertainty. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-source LLMs, revealing large disparities in safety performance and escalating vulnerabilities as task horizons and server interactions grow. Our results highlight the urgent need for stronger defenses and establish MCP-SafetyBench as a foundation for diagnosing and mitigating safety risks in real-world MCP deployments.