FedSEA-LLaMA: A Secure, Efficient and Adaptive Federated Splitting Framework for Large Language Models

Private data holds promise for improving LLMs because of its high quality, but its scattered distribution across data silos and the high computational demands of LLMs limit the deployment of LLMs in federated environments. To address this, transformer-based federated split models have been proposed; they offload most model parameters to the server (or to distributed clients) and retain only a small portion on the client to ensure data privacy. Despite this design, they still face three challenges: 1) peer-to-peer key encryption struggles to secure transmitted vectors effectively; 2) the auto-regressive nature of LLMs means that federated split learning can only train and infer sequentially, causing high communication overhead; 3) fixed partition points lack adaptability to downstream tasks. In this paper, we introduce FedSEA-LLaMA, a Secure, Efficient, and Adaptive Federated splitting framework based on LLaMA2. First, we inject Gaussian noise into forward-pass hidden states to enable secure end-to-end vector transmission. Second, we employ attention-mask compression and KV cache collaboration to reduce communication costs, accelerating both training and inference. Third, we allow users to dynamically adjust the partition points for input/output blocks according to specific task requirements. Experiments on natural language understanding, summarization, and conversational QA tasks show that FedSEA-LLaMA maintains performance comparable to centralized LLaMA2 and achieves up to 8× speedups in training and inference. Further analysis of privacy attacks and of different partition points also demonstrates the effectiveness of FedSEA-LLaMA in security and adaptability.
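The three mechanisms named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The function names, the left-padding convention, and the reading of "attention-mask compression" as sending only a sequence length and a padding length (instead of a full mask matrix) are all assumptions made for illustration, as is the assumption that the client keeps the first and last few transformer blocks while the server holds the middle ones.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def split_layers(num_layers: int, n_input: int, n_output: int) -> Tuple[range, range, range]:
    """Hypothetical partition: the client keeps the first n_input and last
    n_output blocks, the server holds everything in between."""
    if n_input < 1 or n_output < 1 or n_input + n_output >= num_layers:
        raise ValueError("client blocks must leave at least one block for the server")
    client_in = range(0, n_input)
    server = range(n_input, num_layers - n_output)
    client_out = range(num_layers - n_output, num_layers)
    return client_in, server, client_out


def add_gaussian_noise(hidden: Matrix, sigma: float, rng: random.Random) -> Matrix:
    """Return a copy of the hidden states (tokens x features) with
    independent N(0, sigma^2) noise added to every entry before transmission."""
    return [[x + rng.gauss(0.0, sigma) for x in token] for token in hidden]


def compress_causal_mask(seq_len: int, pad_len: int) -> Tuple[int, int]:
    """Assumed compression: a left-padded causal mask is fully determined by
    two integers, so only those are sent instead of seq_len * seq_len entries."""
    return seq_len, pad_len


def expand_causal_mask(seq_len: int, pad_len: int) -> List[List[int]]:
    """Rebuild the full mask on the receiving side: position i may attend to
    position j when j is not padding and j does not lie in the future."""
    return [
        [1 if pad_len <= j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    client_in, server, client_out = split_layers(num_layers=32, n_input=2, n_output=2)
    print("client input blocks:", list(client_in))
    print("server blocks:", len(server))
    print("client output blocks:", list(client_out))

    hidden = [[0.5, -1.0, 2.0] for _ in range(4)]  # 4 tokens, 3 features
    noisy = add_gaussian_noise(hidden, sigma=0.1, rng=rng)
    print("noisy first token:", noisy[0])

    payload = compress_causal_mask(seq_len=4, pad_len=1)
    print("mask payload:", payload, "->", expand_causal_mask(*payload))
```

In this sketch, moving a partition point amounts to changing `n_input` or `n_output`, and the noise scale `sigma` is a free parameter: a larger value hides more of the hidden state but presumably costs more task accuracy.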
