Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining

Continual Learning (CL) is essential for enabling self-evolving large language models (LLMs) to adapt and remain effective amid rapid knowledge growth. Yet, despite its importance, little attention has been given to establishing statistical reliability guarantees for LLMs under CL, particularly in the setting of continual domain pretraining (CDP). Conformal Prediction (CP) has shown promise in offering correctness guarantees for LLMs, but it faces major challenges in CDP: testing data often stems from unknown or shifting domain distributions, under which CP may no longer provide valid guarantees. Moreover, when high coverage is required, CP can yield excessively large prediction sets for unanswerable queries, reducing informativeness. To address these challenges, we introduce an adaptive rejection and non-exchangeable CP framework. Our method first estimates the distribution of questions across domains in the test set using transformer-based clustering, then reweights or resamples the calibration data accordingly. Building on this, adaptive rejection CP allows the LLM to selectively abstain from answering when its confidence or competence shifts significantly. Extensive experiments demonstrate that our framework enhances both the effectiveness and reliability of CP under CDP scenarios. Our code is available at: https://anonymous.4open.science/r/CPCL-8C12/
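The reweighting and abstention steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the test-set domain proportions are already available (for example from a clustering step, not shown), uses a standard weighted split-conformal quantile, and treats "rejection" as abstaining when the threshold is unbounded or the prediction set exceeds a size cap. The function names `domain_weights`, `weighted_conformal_threshold`, and `predict_or_abstain`, and the parameter `max_set_size`, are hypothetical.

```python
import numpy as np

def domain_weights(cal_domains, test_domain_probs):
    """Importance weight per calibration point: p_test(domain) / p_cal(domain).

    test_domain_probs is an assumed input: estimated domain proportions
    in the test set (e.g. from clustering the test questions).
    """
    cal_domains = np.asarray(cal_domains)
    domains, counts = np.unique(cal_domains, return_counts=True)
    p_cal = dict(zip(domains.tolist(), (counts / counts.sum()).tolist()))
    return np.array(
        [test_domain_probs.get(d, 0.0) / p_cal[d] for d in cal_domains.tolist()]
    )

def weighted_conformal_threshold(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores.

    The test point contributes mass test_weight at +infinity, so the
    threshold is np.inf when the calibration mass cannot reach 1 - alpha.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cum = np.cumsum(w) / (w.sum() + test_weight)
    # First sorted score whose cumulative normalized weight reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    return s[idx] if idx < len(s) else np.inf

def predict_or_abstain(candidate_scores, threshold, max_set_size):
    """Return the prediction set, or None to abstain.

    Hypothetical rejection rule: abstain if the threshold is unbounded
    or the set is larger than max_set_size.
    """
    if not np.isfinite(threshold):
        return None
    kept = [a for a, s in candidate_scores.items() if s <= threshold]
    if len(kept) > max_set_size:
        return None
    return kept

# Usage: calibration data is 75% domain "a", but the test set is estimated 50/50.
w = domain_weights(["a", "a", "a", "b"], {"a": 0.5, "b": 0.5})
tau = weighted_conformal_threshold([0.1, 0.2, 0.3, 0.9], w, test_weight=1.0, alpha=0.3)
answer_set = predict_or_abstain({"x": 0.2, "y": 0.95}, tau, max_set_size=2)
```

Upweighting the under-represented domain "b" pulls the threshold toward its higher nonconformity scores, which is the intended effect of matching the calibration data to the estimated test distribution. The size cap is one simple way to trade a large, uninformative set for an explicit abstention.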

