With the rapid advancement of Large Language Models (LLMs), there is growing interest in applying them to hardware design and verification. Within the hardware design flow, design verification remains the most time-consuming and resource-intensive stage, and generating effective stimuli for the design under test (DUT) is both critical and labor-intensive. We present {\it TB or not TB}, a framework for automated stimulus generation using LLMs fine-tuned through Coverage-Driven Direct Preference Optimization (CD-DPO). To enable preference-based training, we introduce PairaNet, a dataset derived from PyraNet that pairs high- and low-quality testbenches labeled with simulation-derived coverage metrics. The proposed CD-DPO method integrates quantitative coverage feedback directly into the optimization objective, guiding the model toward generating stimuli that maximize verification coverage. Experiments on the CVDP CID12 benchmark show that {\it TB or not TB} outperforms both open-source and commercial baselines, achieving up to a 77.27\% improvement in code coverage and demonstrating the effectiveness of coverage-driven preference optimization for LLM-based hardware verification.
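To make the idea of folding coverage feedback into a preference objective concrete, one plausible form of such a loss is sketched below. This is an illustrative reconstruction, not the paper's stated objective: it assumes CD-DPO starts from the standard DPO loss and adds a margin term proportional to the coverage gap between the preferred and rejected testbenches, where $c_w$ and $c_l$ (the simulation coverage of the chosen and rejected samples) and the weight $\gamma$ are hypothetical notation introduced here.

\begin{equation}
\mathcal{L}_{\text{CD-DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \;-\; \gamma\,(c_w - c_l)\right)\right]
\end{equation}

Under this reading, a larger coverage gap $c_w - c_l$ enlarges the required log-likelihood margin between the two testbenches, so pairs with strongly differing coverage push the policy $\pi_\theta$ harder than near-ties.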