Large language models (LLMs) produce fluent but unsupported answers, known as hallucinations, limiting safe deployment in high-stakes domains. We propose ECLIPSE, a framework that treats hallucination as a mismatch between a model's semantic entropy and the capacity of the available evidence. We combine entropy estimation via multi-sample clustering with a novel perplexity decomposition that measures how models use retrieved evidence. We prove that, under mild conditions, the resulting entropy-capacity objective is strictly convex with a unique stable optimum. On a controlled financial question answering dataset with GPT-3.5-turbo (n=200 balanced samples with synthetic hallucinations), ECLIPSE achieves a ROC AUC of 0.89 and an average precision of 0.90, substantially outperforming a semantic-entropy-only baseline (AUC 0.50). A controlled ablation with Claude-3-Haiku, which does not expose token-level log probabilities, shows the AUC dropping to 0.59 and coefficient magnitudes shrinking by 95%, demonstrating that ECLIPSE is a logprob-native mechanism whose effectiveness depends on calibrated token-level uncertainties. The perplexity-decomposition features carry the largest learned coefficients, confirming that evidence utilization is central to hallucination detection. We position this work as a controlled mechanism study; broader validation across domains and on naturally occurring hallucinations remains future work.
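The semantic-entropy component mentioned above can be illustrated with a minimal sketch: sample several answers to the same question, cluster semantically equivalent ones, and compute the Shannon entropy of the cluster distribution. The function name and the crude string-normalization clustering below are illustrative assumptions, not the paper's implementation (which would typically use NLI-based equivalence checking over multiple model samples).

```python
import math
from collections import Counter

def semantic_entropy(answers):
    """Estimate semantic entropy from sampled answers.

    Clusters answers by a crude equivalence relation (lowercased,
    stripped text) as a stand-in for NLI-based clustering, then
    returns the Shannon entropy (in nats) of the cluster distribution.
    Low entropy suggests the model is semantically consistent;
    high entropy flags uncertainty that may signal hallucination.
    """
    clusters = Counter(a.strip().lower() for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

# A consistent model: all samples fall into one semantic cluster.
low = semantic_entropy(["Paris", "paris", " Paris "])   # 0.0
# An uncertain model: samples split across distinct answers.
high = semantic_entropy(["Paris", "Lyon", "Marseille", "Paris"])
```

A detector built this way uses the entropy as one feature alongside evidence-conditioned perplexity features; the abstract's ablation suggests the latter require token-level log probabilities from the underlying model.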