ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment

We present ROBOTO2, an open-source, web-based platform for large language model (LLM)-assisted risk of bias (ROB) assessment of clinical trials. ROBOTO2 streamlines the traditionally labor-intensive ROB v2 (ROB2) annotation process through an interactive interface that combines PDF parsing, retrieval-augmented LLM prompting, and human-in-the-loop review. Users can upload clinical trial reports, receive preliminary answers and supporting evidence for ROB2 signaling questions, and provide real-time feedback or corrections to the system's suggestions. ROBOTO2 is publicly available at https://roboto2.vercel.app/, with code and data released to foster reproducibility and adoption. We construct and release a dataset of 521 pediatric clinical trial reports (8954 signaling questions with 1202 evidence passages), annotated using both manual and LLM-assisted methods, to serve as a benchmark and enable future research. Using this dataset, we benchmark the ROB2 performance of 4 LLMs and analyze current model capabilities and the remaining challenges in automating this critical aspect of systematic review.
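
To make the retrieval-augmented prompting step concrete, the following is a minimal sketch and not the ROBOTO2 implementation. It assumes the trial report has already been parsed into text passages, ranks them against a ROB2 signaling question by simple token overlap (a stand-in for whatever retriever the system actually uses), and assembles a prompt from the top-ranked evidence. All names here (`STOPWORDS`, `tokenize`, `retrieve`, `build_prompt`) and the example passages are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this illustration only.
STOPWORDS = {"a", "the", "was", "were", "at", "with", "using", "of"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def retrieve(question, passages, k=2):
    """Return the k passages sharing the most tokens with the question.

    Token overlap is an assumed placeholder for a real retriever
    (for example, embedding similarity).
    """
    q = Counter(tokenize(question))

    def score(passage):
        return sum((q & Counter(tokenize(passage))).values())

    # sorted() is stable, so ties keep their original document order.
    return sorted(passages, key=score, reverse=True)[:k]


def build_prompt(question, evidence):
    """Assemble a prompt that restricts the model to the retrieved evidence."""
    lines = [
        "Answer the ROB2 signaling question using only the evidence below.",
        "Allowed answers: Yes, Probably yes, Probably no, No, No information.",
        "",
    ]
    lines += [f"[{i}] {p}" for i, p in enumerate(evidence, 1)]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Invented example passages, not drawn from the released dataset.
passages = [
    "Participants were randomly assigned using a computer-generated allocation sequence.",
    "The primary outcome was weight gain at 12 weeks.",
    "Allocation was concealed with sealed opaque envelopes.",
]
question = "Was the allocation sequence random?"
evidence = retrieve(question, passages, k=2)
prompt = build_prompt(question, evidence)
print(prompt)
```

In a human-in-the-loop setting, the retrieved passages would be displayed next to the model's preliminary answer so that a reviewer can confirm or correct both; that review step is outside the scope of this sketch.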
