With the rise of large language models, service providers offer language models as a service, enabling users to fine-tune customized models on uploaded private datasets. However, this raises concerns about sensitive data leakage. Prior methods, which rely on differential privacy within device-cloud collaboration frameworks, struggle to balance privacy and utility, either exposing users to inference attacks or degrading fine-tuning performance. To address this, we propose PrivTune, an efficient and privacy-preserving fine-tuning framework based on Split Learning (SL). The key idea of PrivTune is to inject carefully crafted noise into the token representations produced by the SL bottom model, making each token's representation resemble those of its $n$-hop indirect neighbors. PrivTune formulates this as an optimization problem that computes the optimal noise vector under joint defense and utility objectives. On this basis, it adjusts the parameters (i.e., the mean) of the $d_\chi$-privacy noise distribution to align with the optimization direction and scales the noise according to token importance to minimize distortion. Experiments on five datasets (covering both classification and generation tasks) against three embedding inversion attacks and three attribute inference attacks show that, using RoBERTa on the Stanford Sentiment Treebank dataset, PrivTune reduces the attack success rate to 10% with only a 3.33% drop in utility, outperforming state-of-the-art baselines.
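To make the perturbation step concrete, the following is a minimal NumPy sketch of the mechanism as the abstract describes it. The function names (`sample_dchi_noise`, `perturb_tokens`), the Gamma-based sampler for $d_\chi$-privacy noise, the mean shift toward the $n$-hop neighbor representation, and the `(1 - importance)` scaling rule are illustrative assumptions for exposition, not the paper's exact algorithm.

```python
import numpy as np


def sample_dchi_noise(dim, eta, rng):
    """Sample d_chi-privacy noise with density proportional to exp(-eta * ||z||).

    Standard construction: a uniform direction on the unit sphere times a
    Gamma(dim, 1/eta)-distributed magnitude.
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / eta)
    return magnitude * direction


def perturb_tokens(token_reps, neighbor_reps, importance, eta, rng=None):
    """Hypothetical sketch of a PrivTune-style perturbation.

    token_reps    : (T, d) token representations from the SL bottom model
    neighbor_reps : (T, d) representations of each token's n-hop indirect
                    neighbor (the direction the optimization pushes toward)
    importance    : (T,) per-token importance scores in [0, 1]; assumed
                    convention: more important tokens get less noise
    eta           : privacy parameter of the d_chi-privacy distribution
    """
    rng = rng or np.random.default_rng()
    T, d = token_reps.shape
    perturbed = np.empty_like(token_reps)
    for t in range(T):
        # Shift the noise mean toward the n-hop neighbor's representation,
        # so the perturbed token tends to resemble that neighbor.
        mean_shift = neighbor_reps[t] - token_reps[t]
        noise = mean_shift + sample_dchi_noise(d, eta, rng)
        # Scale the noise inversely with token importance to limit distortion
        # on tokens that matter most for the downstream task.
        perturbed[t] = token_reps[t] + (1.0 - importance[t]) * noise
    return perturbed
```

In this sketch, the two levers the abstract names are visible directly: the mean of the noise distribution is moved along the optimization direction (`mean_shift`), and the noise magnitude is modulated per token by its importance score.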