Existing large language models (LLMs) occasionally generate plausible yet factually incorrect responses, known as hallucinations. Two main approaches have been proposed to mitigate hallucinations: retrieval-augmented language models (RALMs) and refusal post-training. However, current research predominantly focuses on their individual effectiveness while overlooking the evaluation of the refusal capability of RALMs. Ideally, if RALMs know when they do not know, they should refuse to answer. In this study, we ask the fundamental question: Do RALMs know when they don't know? Specifically, we investigate three questions. First, are RALMs well calibrated with respect to different internal and external knowledge states? We examine the influence of various factors. Contrary to expectations, when all retrieved documents are irrelevant, RALMs still tend to refuse questions they could otherwise have answered correctly. Next, given this pronounced \textbf{over-refusal} behavior, we raise a second question: How does a RALM's refusal ability align with its calibration quality? Our results show that the over-refusal problem can be mitigated through in-context fine-tuning. However, we observe that improved refusal behavior does not necessarily imply better calibration or higher overall accuracy. Finally, we ask: Can we combine refusal-aware RALMs with uncertainty-based answer abstention to mitigate over-refusal? We develop a simple yet effective refusal mechanism for refusal post-trained RALMs that improves their overall answer quality by balancing refusals and correct answers. Our study provides a more comprehensive understanding of the factors influencing RALM behavior. At the same time, we emphasize that uncertainty estimation for RALMs remains an open problem deserving deeper investigation.
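As a minimal sketch of what uncertainty-based answer abstention could look like in practice (this is an illustrative assumption, not the paper's actual mechanism): a refusal-aware RALM returns an answer together with per-token log-probabilities, from which a length-normalized confidence is derived; the answer is kept only if the confidence clears a tunable threshold `tau`. The names `sequence_confidence`, `abstain_or_answer`, `tau`, and the example log-probabilities below are all hypothetical.

```python
import math

# Hypothetical illustration of uncertainty-based answer abstention
# (an assumption for exposition; not the mechanism proposed in the paper).

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Length-normalized confidence: exp(mean token log-probability)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def abstain_or_answer(answer: str, token_logprobs: list[float], tau: float = 0.6) -> str:
    """Return the model's answer only if its confidence clears the threshold; otherwise refuse."""
    return answer if sequence_confidence(token_logprobs) >= tau else "I don't know."

# Usage with made-up log-probabilities:
print(abstain_or_answer("Paris", [-0.05, -0.10]))        # high confidence -> answer
print(abstain_or_answer("Berlin", [-1.2, -2.3, -0.9]))   # low confidence -> refuse
```

The threshold `tau` controls the trade-off the abstract describes: raising it trades more refusals for fewer incorrect answers, lowering it does the reverse.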