We introduce ideal attribution mechanisms, a formal abstraction for reasoning about attribution decisions over strings. At the core of this abstraction lies the ledger, an append-only log of the prompt-response interaction history between a model and its user. Each mechanism produces deterministic decisions based on the ledger and an explicit selection criterion, making it well-suited to serve as a ground truth for attribution. We frame the design goal of watermarking schemes as faithful representation of ideal attribution mechanisms. This novel perspective brings conceptual clarity, replacing piecemeal probabilistic statements with a unified language for stating the guarantees of each scheme. It also enables precise reasoning about desiderata for future watermarking schemes, even when no current construction achieves them, since the ideal functionalities are specified first. In this way, the framework provides a roadmap that clarifies which guarantees are attainable in an idealized setting and worth pursuing in practice.
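To make the abstraction concrete, the following is a minimal sketch of a ledger and a deterministic attribution decision over it. The abstract does not specify any interface, so every name here (`Entry`, `Ledger`, `attribute`) and both selection criteria (`exact_match`, `shares_substring`) are hypothetical illustrations rather than the definitions used in this work.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Entry:
    """One prompt-response interaction (hypothetical record layout)."""
    prompt: str
    response: str


class Ledger:
    """Append-only log: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, prompt: str, response: str) -> None:
        self._entries.append(Entry(prompt, response))

    def entries(self) -> Tuple[Entry, ...]:
        # Return an immutable snapshot so callers cannot mutate the log.
        return tuple(self._entries)


# A selection criterion decides whether a candidate string should be
# attributed to the model on the basis of a single ledger entry.
Criterion = Callable[[str, Entry], bool]


def exact_match(candidate: str, entry: Entry) -> bool:
    """Assumed criterion: the candidate equals a logged response verbatim."""
    return candidate == entry.response


def shares_substring(k: int) -> Criterion:
    """Assumed criterion: the candidate contains some length-k substring
    of a logged response."""
    def criterion(candidate: str, entry: Entry) -> bool:
        r = entry.response
        return any(r[i:i + k] in candidate for i in range(len(r) - k + 1))
    return criterion


def attribute(ledger: Ledger, candidate: str, criterion: Criterion) -> bool:
    """Deterministic decision: a function of the ledger, the criterion,
    and the candidate string only (no randomness, no detection key)."""
    return any(criterion(candidate, e) for e in ledger.entries())


ledger = Ledger()
ledger.append("Say hi", "hello world, this is a test")
print(attribute(ledger, "hello world, this is a test", exact_match))
print(attribute(ledger, "xx hello world yy", shares_substring(11)))
```

In this reading, a watermarking scheme would be judged by how closely its detector agrees with `attribute` for a chosen criterion, without having access to the ledger itself.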