Debugging is the dominant cost in modern hardware verification, and assertion failures are among the most frequent and most expensive failures to resolve. Although Large Language Models (LLMs) show promise, they often fail to capture the precise, reusable expertise that engineers apply, which leads to inaccurate responses. We propose GROVE, a hierarchical knowledge management framework that learns reusable debugging expertise and organizes it into an LLM-maintained knowledge tree for solving assertion failures. GROVE distills debugging knowledge from prior cases into a vertical tree of configurable depth, in which each node encodes a concise knowledge item together with explicit applicability conditions. During training, GROVE runs a parallel, gradient-free loop in which an LLM learns from the training cases and proposes tree modifications as structured JSON edits. At test time, GROVE navigates the tree with a budget-aware iterative zoom and retrieves a small set of applicable knowledge items, which guide a base LLM's hypothesis generation and fix proposals. Evaluated on a suite of assertion-failure cases, GROVE delivers consistent gains in pass@1 and pass@5, demonstrating the value of structured knowledge evolution.
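The knowledge tree, the JSON edits proposed during training, and the test-time zoom can be illustrated with a minimal Python sketch. It is not GROVE's implementation. The `Node` schema, the `add`/`update`/`delete` edit format, and the `apply_edits` and `zoom` helpers are hypothetical. Applicability is approximated by tag matching, where GROVE would rely on an LLM's judgment. The "budget" is assumed here to be a cap on the number of nodes inspected during the descent.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: one concise knowledge item plus its applicability conditions."""
    node_id: str
    knowledge: str
    conditions: set[str]  # assumption: tags that must all be present in a case
    children: list[Node] = field(default_factory=list)

    def applies_to(self, case_tags: set[str]) -> bool:
        # Stand-in for an LLM applicability check: every condition tag must appear.
        return self.conditions <= case_tags


def find(root: Node, node_id: str) -> Node:
    """Depth-first lookup by id; raises KeyError if the id is absent."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id == node_id:
            return node
        stack.extend(node.children)
    raise KeyError(node_id)


def apply_edits(root: Node, edits_json: str) -> None:
    """Apply a JSON list of tree edits (assumed 'add'/'update'/'delete' schema)."""
    for edit in json.loads(edits_json):
        op = edit["op"]
        if op == "add":
            parent = find(root, edit["parent"])
            parent.children.append(
                Node(edit["id"], edit["knowledge"], set(edit["conditions"]))
            )
        elif op == "update":
            find(root, edit["id"]).knowledge = edit["knowledge"]
        elif op == "delete":
            parent = find(root, edit["parent"])
            parent.children = [c for c in parent.children if c.node_id != edit["id"]]
        else:
            raise ValueError(f"unknown op: {op}")


def zoom(root: Node, case_tags: set[str], budget: int) -> list[str]:
    """Descend level by level, expanding only applicable children.

    Stops once `budget` child nodes have been inspected and returns the
    applicable knowledge items in the order found (general to specific).
    """
    frontier, picked, checks = [root], [], 0
    while frontier and checks < budget:
        next_frontier = []
        for node in frontier:
            for child in node.children:
                if checks >= budget:
                    break
                checks += 1
                if child.applies_to(case_tags):
                    picked.append(child.knowledge)
                    next_frontier.append(child)
        frontier = next_frontier
    return picked
```

For example, after adding an `fsm` node with a more specific `fsm_reset` child, `zoom(root, {"fsm", "reset"}, budget=10)` returns both items, while `budget=1` stops after the first applicable node. Restricting expansion to applicable children keeps retrieval small, since unrelated subtrees are never visited.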