Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks, yet their integration of perception, language, and control introduces new safety vulnerabilities. Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear due to the absence of a unified evaluation framework. One major issue is that differences in action tokenizers across VLA architectures hinder reproducibility and fair comparison. More importantly, most existing attacks have not been validated in real-world scenarios. To address these challenges, we propose AttackVLA, a unified framework aligned with the VLA development lifecycle, covering data construction, model training, and inference. Within this framework, we implement a broad suite of attacks, including all existing attacks targeting VLAs as well as multiple attacks originally developed for vision-language models that we adapt to the VLA setting, and evaluate them in both simulation and real-world settings. Our analysis of existing attacks reveals a critical gap: current methods tend to induce untargeted failures or static action states, leaving targeted attacks that drive VLAs to perform precise long-horizon action sequences largely unexplored. To fill this gap, we introduce BackdoorVLA, a targeted backdoor attack that compels a VLA to execute an attacker-specified long-horizon action sequence whenever a trigger is present. We evaluate BackdoorVLA on simulated benchmarks and in real-world robotic settings, achieving an average targeted success rate of 58.4% and reaching 100% on selected tasks. Our work provides a standardized framework for evaluating VLA vulnerabilities and demonstrates the potential for precise adversarial manipulation, motivating further research on securing VLA-based embodied systems.