Leveraging NTPs for Efficient Hallucination Detection in VLMs

Source: arXiv. Accepted to the First Workshop on Confabulation, Hallucinations, & Overgeneration in Multilingual & Precision-critical Setting (AACL-IJCNLP 2025).

Hallucinations of vision-language models (VLMs), which are misalignments between visual content and generated text, undermine the reliability of VLMs. One common approach for detecting them employs the same VLM, or a different one, to assess generated outputs. This process is computationally intensive and increases model latency. In this paper, we explore an efficient on-the-fly method for hallucination detection by training traditional ML models over signals based on the VLM's next-token probabilities (NTPs). NTPs provide a direct quantification of model uncertainty. We hypothesize that high uncertainty (i.e., a low NTP value) is strongly associated with hallucinations. To test this, we introduce a dataset of 1,400 human-annotated statements derived from VLM-generated content, each labeled as hallucinated or not, and use it to test our NTP-based lightweight method. Our results demonstrate that NTP-based features are valuable predictors of hallucinations, enabling fast and simple ML models to achieve performance comparable to that of strong VLMs. Furthermore, augmenting these NTPs with linguistic NTPs, computed by feeding only the generated text back into the VLM, enhances hallucination detection performance. Finally, integrating hallucination prediction scores from VLMs into the NTP-based models led to better performance than using either VLMs or NTPs alone. We hope this study paves the way for simple, lightweight solutions that enhance the reliability of VLMs.
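
To make the idea in the abstract concrete, here is a minimal sketch of how per-token NTPs for one generated statement might be turned into a few features and passed to a small classifier. The abstract does not say which features or models the authors use, so everything here is an assumption for illustration: the feature names (`mean_logprob`, `min_prob`, `mean_logprob_gap`), the helper functions, and the choice of a hand-rolled logistic regression as the "traditional ML model". The optional `text_only_probs` argument stands in for the linguistic NTPs, that is, probabilities of the same tokens when only the generated text is fed back to the VLM.

```python
import math
from statistics import mean

EPS = 1e-12  # guards against log(0); an arbitrary choice for this sketch


def ntp_features(probs, text_only_probs=None):
    """Summarize the next-token probabilities of one statement.

    probs: probability the VLM assigned to each generated token (image + text).
    text_only_probs: hypothetical "linguistic NTPs" for the same tokens,
        obtained without the image. Optional.
    """
    if not probs:
        raise ValueError("need at least one token probability")
    log_probs = [math.log(max(p, EPS)) for p in probs]
    feats = {
        "mean_logprob": mean(log_probs),  # average confidence
        "min_prob": min(probs),           # least confident token
    }
    if text_only_probs is not None:
        if len(text_only_probs) != len(probs):
            raise ValueError("both probability lists must align token by token")
        # Per-token change in log-probability when the image is available.
        gaps = [lp - math.log(max(q, EPS))
                for lp, q in zip(log_probs, text_only_probs)]
        feats["mean_logprob_gap"] = mean(gaps)
    return feats


def to_vector(feats):
    """Fixed feature order for the classifier (illustrative choice)."""
    return [feats["mean_logprob"], feats["min_prob"]]


def train_logistic(X, y, lr=0.5, epochs=200):
    """Plain SGD logistic regression; label 1 means 'hallucinated'."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = predict(w, b, x) - t  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Estimated probability that the statement is hallucinated."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

In use, one would call `ntp_features` on each annotated statement, stack the vectors from `to_vector`, fit with `train_logistic`, and threshold `predict` at some value such as 0.5. Because the NTPs are already produced during generation, a detector of this kind adds almost no extra inference cost, which is the efficiency argument the abstract makes. In practice a library classifier would likely replace the hand-written training loop.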
