Large Language Models (LLMs) and their agent systems have recently demonstrated strong potential in automating code reasoning and vulnerability detection. However, when applied to large-scale firmware, their performance degrades due to the binary nature of firmware, complex dependency structures, and heterogeneous components. To address this challenge, this paper presents FIRMHIVE, a recursive agent hive that enables LLMs to act as autonomous firmware security analysts. FIRMHIVE introduces two key mechanisms: (1) transforming delegation into a per-agent, executable primitive and (2) constructing a runtime Tree of Agents (ToA) for decentralized coordination. We evaluate FIRMHIVE using real-world firmware images obtained from publicly available datasets, covering five representative security analysis tasks. Compared with existing LLM-agent baselines, FIRMHIVE performs deeper (about 16x more reasoning steps) and broader (about 2.3x more files inspected) cross-file exploration, resulting in about 5.6x more alerts per firmware. Compared to state-of-the-art (SOTA) security tools, FIRMHIVE identifies about 1.5x more vulnerabilities (1,802 total) and achieves 71% precision, representing significant improvements in both yield and fidelity.