With the rapid advancement of Large Language Models (LLMs), there is growing interest in applying them to hardware design and verification. Within the hardware design flow, design verification remains the most time-consuming and resource-intensive stage, and generating effective stimuli for the design under test (DUT) is both critical and labor-intensive. We present {\it TB or not TB}, a framework for automated stimulus generation using LLMs fine-tuned through Coverage-Driven Direct Preference Optimization (CD-DPO). To enable preference-based training, we introduce PairaNet, a dataset derived from PyraNet that pairs high- and low-quality testbenches labeled with simulation-derived coverage metrics. The proposed CD-DPO method integrates quantitative coverage feedback directly into the optimization objective, guiding the model toward generating stimuli that maximize verification coverage. Experiments on the CVDP CID12 benchmark show that {\it TB or not TB} outperforms both open-source and commercial baselines, achieving up to a 77.27\% improvement in code coverage and demonstrating the effectiveness of coverage-driven preference optimization for LLM-based hardware verification.
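To make the idea of folding coverage feedback into a preference objective concrete, one plausible form of such a loss is sketched below. This is an illustrative reconstruction, not the paper's stated objective: it assumes CD-DPO starts from the standard DPO loss and adds a margin term proportional to the coverage gap between the preferred and rejected testbenches, where $c_w$ and $c_l$ (the simulation coverage of the chosen and rejected samples) and the weight $\gamma$ are hypothetical notation introduced here.

\begin{equation}
\mathcal{L}_{\text{CD-DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \;-\; \gamma\,(c_w - c_l)\right)\right]
\end{equation}

Under this reading, a larger coverage gap $c_w - c_l$ enlarges the required log-likelihood margin between the two testbenches, so pairs with strongly differing coverage push the policy $\pi_\theta$ harder than near-ties.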