Accurate and reliable search on online healthcare platforms is critical for user safety and service efficacy. Traditional methods, however, often fail to comprehend complex and nuanced user queries, limiting their effectiveness. Large language models (LLMs) present a promising solution, offering powerful semantic understanding to bridge this gap. Despite their potential, deploying LLMs in this high-stakes domain is fraught with challenges, including factual hallucinations, specialized knowledge gaps, and high operational costs. To overcome these barriers, we introduce \textbf{AR-Med}, a novel framework for \textbf{A}utomated \textbf{R}elevance assessment for \textbf{Med}ical search that has been successfully deployed at scale on online medical delivery platforms. AR-Med grounds LLM reasoning in verified medical knowledge through a retrieval-augmented approach, improving accuracy and reliability. To enable efficient online service, we design a practical knowledge distillation scheme that compresses large teacher models into compact yet powerful student models. We also introduce LocalQSMed, a multi-expert annotated benchmark developed to guide model iteration and ensure strong alignment between offline and online performance. Extensive experiments show that AR-Med achieves an offline accuracy of over 93\%, a 24\% absolute improvement over the original online system, and delivers significant gains in online relevance and user satisfaction. Our work presents a practical and scalable blueprint for developing trustworthy, LLM-powered systems in real-world healthcare applications.