Medical question-answering (QA) systems can benefit from advances in large language models (LLMs), but directly applying LLMs to the clinical domain poses challenges such as maintaining factual accuracy and avoiding hallucinations. In this paper, we present a retrieval-augmented generation (RAG) based medical QA system that combines domain-specific knowledge retrieval with open-source LLMs to answer medical questions. We fine-tune two state-of-the-art open LLMs (LLaMA~2 and Falcon) using Low-Rank Adaptation (LoRA) for efficient domain specialization. The system retrieves relevant medical literature to ground the LLM's answers, thereby improving factual correctness and reducing hallucinations. We evaluate the approach on benchmark datasets (PubMedQA and MedMCQA) and show that retrieval augmentation yields measurable improvements in answer accuracy compared to using LLMs alone. Our fine-tuned LLaMA~2 model achieves 71.8% accuracy on PubMedQA, substantially improving over the 55.4% zero-shot baseline, while maintaining transparency by providing source references. We also detail the system design and fine-tuning methodology, demonstrating that grounding answers in retrieved evidence reduces unsupported content by approximately 60%. These results highlight the potential of RAG-augmented open-source LLMs for reliable biomedical QA, pointing toward practical clinical informatics applications.
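To make the retrieve-then-generate pipeline described above concrete, the sketch below shows a minimal version of the loop: rank stored passages against the question, prepend the top hits to the prompt, and generate with a LoRA-adapted base model. This is an illustration under stated assumptions, not the paper's actual implementation: the adapter path and the toy passage list are hypothetical placeholders, and TF-IDF retrieval stands in for whatever retriever the system actually uses over medical literature, which the abstract does not specify.

```python
# Minimal sketch of a RAG loop with a LoRA-adapted LLaMA 2 model.
# Paths, passages, and the TF-IDF retriever are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document store standing in for an index over medical literature.
passages = [
    "Metformin is a first-line therapy for type 2 diabetes mellitus.",
    "PubMedQA contains yes/no/maybe questions built from PubMed abstracts.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank stored passages by TF-IDF cosine similarity to the question."""
    vectorizer = TfidfVectorizer().fit(passages + [question])
    doc_vecs = vectorizer.transform(passages)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [passages[i] for i in top]

# Load the base LLM and apply a LoRA adapter (hypothetical adapter path).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "path/to/medical-lora-adapter")

# Ground the prompt in retrieved evidence before generating an answer.
question = "Is metformin a first-line treatment for type 2 diabetes?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the retrieved passages travel with the prompt, they double as the source references the abstract mentions: the system can surface them alongside the generated answer, which is what supports the reported reduction in unsupported content.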