Ideal Attribution and Faithful Watermarks for Language Models

We introduce ideal attribution mechanisms, a formal abstraction for reasoning about attribution decisions over strings. At the core of this abstraction lies the ledger, an append-only log of the prompt-response interaction history between a model and its user. Each mechanism produces deterministic decisions based on the ledger and an explicit selection criterion, making it well-suited to serve as a ground truth for attribution. We frame the design goal of watermarking schemes as faithful representation of ideal attribution mechanisms. This novel perspective brings conceptual clarity, replacing piecemeal probabilistic statements with a unified language for stating the guarantees of each scheme. It also enables precise reasoning about desiderata for future watermarking schemes, even when no current construction achieves them, since the ideal functionalities are specified first. In this way, the framework provides a roadmap that clarifies which guarantees are attainable in an idealized setting and worth pursuing in practice.
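To make the abstraction concrete, here is a minimal Python sketch of a ledger and a deterministic attribution decision. It is an illustration only: the abstract does not give the formal definitions, so the `Ledger` class, the `attribute` function, and the two selection criteria (`exact_match`, `contains_response`) are hypothetical names and choices made for this example, not the paper's construction.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One ledger entry: a (prompt, response) pair. Hypothetical representation.
Entry = Tuple[str, str]
# A selection criterion decides whether a string matches a single entry.
Criterion = Callable[[str, Entry], bool]


@dataclass
class Ledger:
    """Append-only log of prompt-response interactions (illustrative)."""
    _entries: List[Entry] = field(default_factory=list)

    def append(self, prompt: str, response: str) -> None:
        # The only mutation offered: entries are never edited or removed.
        self._entries.append((prompt, response))

    def entries(self) -> Tuple[Entry, ...]:
        # Return an immutable snapshot so callers cannot alter history.
        return tuple(self._entries)


def exact_match(text: str, entry: Entry) -> bool:
    """Assumed criterion: the string equals a logged response."""
    return text == entry[1]


def contains_response(text: str, entry: Entry) -> bool:
    """Assumed criterion: the string contains a logged response verbatim."""
    return entry[1] in text


def attribute(ledger: Ledger, text: str, criterion: Criterion) -> bool:
    """Deterministic decision: does any ledger entry satisfy the criterion?"""
    return any(criterion(text, entry) for entry in ledger.entries())


ledger = Ledger()
ledger.append("Write a greeting", "Hello, world!")

print(attribute(ledger, "Hello, world!", exact_match))
print(attribute(ledger, "She said: Hello, world!", exact_match))
print(attribute(ledger, "She said: Hello, world!", contains_response))
```

Because the decision depends only on the ledger contents and the chosen criterion, the same inputs always yield the same answer. That determinism is what lets such a mechanism act as a ground truth against which a probabilistic watermark detector could be compared.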

