As healthcare increasingly turns to AI for scalable and trustworthy clinical decision support, ensuring reliability in model reasoning remains a critical challenge. Individual large language models (LLMs) are susceptible to hallucinations and inconsistency, whereas naive ensembles of models often fail to deliver stable and credible recommendations. Building on our previous work on LLM Chemistry, which quantifies the collaborative compatibility among LLMs, we apply this framework to improve the reliability of medication recommendations generated from brief clinical vignettes. Our approach leverages multi-LLM collaboration guided by Chemistry-inspired interaction modeling, enabling ensembles that are effective (exploiting complementary strengths), stable (producing consistent quality), and calibrated (minimizing interference and error amplification). We evaluate our Chemistry-based multi-LLM collaboration strategy on real-world clinical scenarios to investigate whether such interaction-aware ensembles can generate credible, patient-specific medication recommendations. Preliminary results are encouraging, suggesting that LLM Chemistry-guided collaboration may offer a promising path toward reliable and trustworthy AI assistants in clinical practice.