Artificial intelligence (AI) is increasingly permeating healthcare, from physician-facing assistants to consumer applications. Because the opacity of AI algorithms complicates human-AI interaction, explainable AI (XAI) addresses this challenge by offering insight into AI decision-making, yet evidence suggests that XAI can paradoxically induce over-reliance or bias. We present results from two large-scale experiments (623 laypeople; 153 primary care physicians, PCPs) that combined a fairness-focused diagnostic AI model with different XAI explanation types to examine how XAI assistance, particularly from multimodal large language models (LLMs), influences diagnostic performance. AI assistance whose performance was balanced across skin tones improved accuracy and reduced diagnostic disparities. However, LLM explanations produced divergent effects: lay users showed greater automation bias, with accuracy rising when the AI was correct and falling when it erred, whereas experienced PCPs remained resilient, benefiting regardless of AI accuracy. For both groups, presenting AI suggestions first led to worse outcomes when the AI was incorrect. These findings show that XAI's impact varies with user expertise and the timing of AI input, underscoring LLMs as a "double-edged sword" in medical AI and informing the design of future human-AI collaborative systems.