CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu

From arXiv, 14 pages; revised CodeBLEU scores for all models on the text-to-code task

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.
