Increasing concerns and regulations about data privacy necessitate the study of privacy-preserving methods for natural language processing (NLP) applications. Federated learning (FL) provides promising methods for a large number of clients (i.e., personal devices or organizations) to collaboratively learn a shared global model that benefits all clients, while allowing users to keep their data locally. To facilitate FL research in NLP, we present FedNLP, a research platform for federated learning in NLP. FedNLP supports various popular task formulations in NLP, such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling. We also implement an interface between Transformer language models (e.g., BERT) and FL methods (e.g., FedAvg, FedOpt, etc.) for distributed training. The evaluation protocol of this interface supports a comprehensive collection of non-IID partitioning strategies. Our preliminary experiments with FedNLP reveal that there exists a large performance gap between learning on decentralized and centralized datasets, opening intriguing and exciting future research directions aimed at developing FL methods suited to NLP tasks.
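To make the two core ingredients mentioned above concrete, the sketch below shows (a) a FedAvg-style sample-weighted aggregation of client model weights and (b) a Dirichlet-based label-skew partition, one common way to construct non-IID client splits. This is a minimal, illustrative sketch only: the function names and arguments are hypothetical and are not taken from the FedNLP codebase.

```python
# Minimal, illustrative sketch (not the FedNLP implementation).
from collections import OrderedDict
import numpy as np
import torch


def fedavg_aggregate(client_state_dicts, client_num_samples):
    """Sample-weighted average of client model parameters (FedAvg-style)."""
    total = float(sum(client_num_samples))
    aggregated = OrderedDict()
    for key in client_state_dicts[0]:
        aggregated[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(client_state_dicts, client_num_samples)
        )
    return aggregated


def dirichlet_label_partition(labels, num_clients, alpha=1.0, seed=0):
    """Assign example indices to clients with label-skewed (non-IID) proportions.

    Smaller `alpha` yields more skewed (more non-IID) label distributions per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, splits)):
            client_indices[client_id].extend(part.tolist())
    return client_indices
```

In a round of training, each client would fine-tune a local copy of the Transformer model on its own partition, and the aggregated weights could then be loaded back into the global model with the standard `load_state_dict` call before the next round.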