Recent advances confirm that large language models (LLMs) can achieve state-of-the-art performance across various tasks. However, because training LLMs from scratch is resource-intensive, protecting their intellectual property against infringement is both urgent and crucial. Motivated by this, we propose a novel black-box fingerprinting technique for LLMs. We first demonstrate that the outputs of an LLM span a vector space unique to that model. We then cast fingerprint authentication as the task of evaluating the similarity between the output space of the victim model and that of the suspect model. To tackle this problem, we introduce two solutions: the first determines whether suspect outputs lie within the victim's subspace, enabling fast infringement detection; the second reconstructs a joint subspace to detect models modified via parameter-efficient fine-tuning (PEFT). Experiments show that the proposed method achieves superior performance in fingerprint verification and is robust against PEFT attacks. This work reveals inherent characteristics of LLMs and provides an efficient, general, and practical solution for protecting them.
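The first solution above, deciding whether a suspect model's outputs lie within the victim model's output subspace, can be illustrated with a minimal linear-algebra sketch. This is not the paper's implementation: the probe setup, the subspace rank, and all names (`subspace_basis`, `residual_ratio`, the synthetic data) are illustrative assumptions; it only shows the general idea of estimating a basis from the victim's output vectors and measuring how much of a suspect's outputs falls outside that span.

```python
import numpy as np

# Hypothetical sketch: treat each model output on a probe query as a vector
# (e.g. a logit or hidden-state vector); the victim's outputs are assumed to
# span a low-dimensional subspace characteristic of that model.

def subspace_basis(outputs, rank):
    """Orthonormal basis (rows) for the span of the victim's output vectors."""
    # Rows of `outputs` are output vectors; SVD yields the principal directions.
    _, _, vt = np.linalg.svd(outputs, full_matrices=False)
    return vt[:rank]                              # shape: (rank, dim)

def residual_ratio(basis, vectors):
    """Fraction of norm left outside the victim's subspace (~0 = inside)."""
    proj = vectors @ basis.T @ basis              # projection onto the subspace
    return np.linalg.norm(vectors - proj) / np.linalg.norm(vectors)

# Synthetic demonstration with made-up data (not real model outputs).
rng = np.random.default_rng(0)
basis_dirs = rng.normal(size=(4, 32))             # victim spans 4 of 32 dims
victim = rng.normal(size=(100, 4)) @ basis_dirs   # victim model outputs
derived = rng.normal(size=(50, 4)) @ basis_dirs   # e.g. a fine-tuned copy
unrelated = rng.normal(size=(50, 32))             # an independent model

B = subspace_basis(victim, rank=4)
print(residual_ratio(B, derived))    # near 0: outputs lie in the subspace
print(residual_ratio(B, unrelated))  # large: outputs leave the subspace
```

Thresholding the residual ratio then gives the fast infringement check: a derived model stays close to the victim's subspace, while an unrelated model does not.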