Increasing concerns and regulations about data privacy necessitate the study of privacy-preserving methods for natural language processing (NLP) applications. Federated learning (FL) provides promising methods for a large number of clients (i.e., personal devices or organizations) to collaboratively learn a shared global model that benefits all clients, while allowing users to keep their data locally. To facilitate FL research in NLP, we present FedNLP, a research platform for federated learning in NLP. FedNLP supports various popular task formulations in NLP, such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling. We also implement an interface between Transformer language models (e.g., BERT) and FL methods (e.g., FedAvg, FedOpt, etc.) for distributed training. The evaluation protocol of this interface supports a comprehensive collection of non-IID partitioning strategies. Our preliminary experiments with FedNLP reveal that there exists a large performance gap between learning on decentralized and centralized datasets, opening intriguing and exciting future research directions aimed at developing FL methods suited to NLP tasks.
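To make the two core ingredients mentioned above concrete, the sketch below shows (a) a FedAvg-style sample-weighted aggregation of client model weights and (b) a Dirichlet-based label-skew partition, one common way to construct non-IID client splits. This is a minimal, illustrative sketch only: the function names and arguments are hypothetical and are not taken from the FedNLP codebase.

```python
# Minimal, illustrative sketch (not the FedNLP implementation).
from collections import OrderedDict
import numpy as np
import torch


def fedavg_aggregate(client_state_dicts, client_num_samples):
    """Sample-weighted average of client model parameters (FedAvg-style)."""
    total = float(sum(client_num_samples))
    aggregated = OrderedDict()
    for key in client_state_dicts[0]:
        aggregated[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(client_state_dicts, client_num_samples)
        )
    return aggregated


def dirichlet_label_partition(labels, num_clients, alpha=1.0, seed=0):
    """Assign example indices to clients with label-skewed (non-IID) proportions.

    Smaller `alpha` yields more skewed (more non-IID) label distributions per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, splits)):
            client_indices[client_id].extend(part.tolist())
    return client_indices
```

In a round of training, each client would fine-tune a local copy of the Transformer model on its own partition, and the aggregated weights could then be loaded back into the global model with the standard `load_state_dict` call before the next round.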