Large language models (LLMs) can suffer from hallucinations when generating text. These hallucinations impede various applications in society and industry by making LLMs untrustworthy. Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Thus, it has been suggested that predictive uncertainty is one of the main causes of hallucinations. We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. SDLG steers the LLM to generate semantically diverse yet likely alternatives for an initially generated text. This approach provides a precise measure of aleatoric semantic uncertainty, detecting whether the initial text is likely to be hallucinated. Experiments on question-answering tasks demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.
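To make the idea concrete, below is a minimal, illustrative sketch (not the paper's exact SDLG estimator) of how a semantic uncertainty score might be computed once a set of semantically diverse alternative generations and their likelihoods is available: alternatives are grouped into semantic clusters, and an entropy over each cluster's likelihood mass serves as the uncertainty score. The function names (`semantic_uncertainty`, `are_equivalent`) and the clustering-by-equivalence step are assumptions for illustration only.

```python
import math

def semantic_uncertainty(generations, log_likelihoods, are_equivalent):
    """Illustrative semantic-uncertainty score over alternative generations.

    generations    : list of generated answer strings
    log_likelihoods: per-generation log-likelihood under the LLM
    are_equivalent : callable(str, str) -> bool judging semantic equivalence
                     (in practice, e.g., a bidirectional-entailment NLI check)

    Returns an entropy over semantic clusters; higher values suggest the
    initial answer is more likely to be hallucinated.
    """
    # Group generations into semantic clusters by pairwise equivalence.
    clusters = []  # each cluster is a list of indices into `generations`
    for i, text in enumerate(generations):
        for cluster in clusters:
            if are_equivalent(generations[cluster[0]], text):
                cluster.append(i)
                break
        else:
            clusters.append([i])

    # Aggregate likelihood mass per semantic cluster and normalize.
    cluster_mass = [sum(math.exp(log_likelihoods[i]) for i in c) for c in clusters]
    total = sum(cluster_mass)
    probs = [m / total for m in cluster_mass]

    # Entropy over semantic clusters as a proxy for aleatoric semantic uncertainty.
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Toy usage with a trivial equivalence check (exact string match); a real
    # setup would use an NLI model to judge mutual entailment between answers.
    gens = ["Paris", "Paris", "Lyon"]
    logps = [-0.2, -0.3, -2.5]
    print(semantic_uncertainty(gens, logps, lambda a, b: a == b))
```

In this sketch, low entropy means the likely alternatives agree in meaning with the initial answer, while high entropy signals semantic disagreement, which is the situation SDLG aims to surface by steering generation toward semantically diverse yet likely alternatives.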