Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study


April 24, 2023


Tim van Dam,Maliheh Izadi,Arie van Deursen

From arXiv; 13 pages. To appear in the Proceedings of the 20th International Conference on Mining Software Repositories (MSR 2023).

Transformer-based pre-trained models have recently achieved strong results on many software engineering tasks, including automatic code completion, a staple of the developer's toolkit. While many have striven to improve the code-understanding abilities of such models, the opposite approach, making the code itself easier to understand, has not been properly investigated. In this study, we ask whether making code easier to understand through contextual data improves the performance of pre-trained code language models on the task of code completion. We consider type annotations and comments as two common forms of additional contextual information that often help developers understand code better. For the experiments, we study code completion at two granularity levels, token and line completion, and evaluate three recent large-scale language models for source code (UniXcoder, CodeGPT, and InCoder) with five evaluation metrics. Finally, we perform the Wilcoxon signed-rank test to gauge significance and measure the effect size. Contrary to our expectations, all models perform better when type annotations are removed, although the effect sizes are small. For comments, we find that the models perform better in the presence of multi-line comments, again with small effect sizes. Based on our observations, we recommend making proper design choices when training, fine-tuning, or simply selecting such models, given the intended data and application. Better evaluations and multi-modal techniques can also be investigated further to improve the practicality and accuracy of auto-completions.
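The paired comparison described in the abstract (the same model scored on code with and without a given kind of context, then compared with the Wilcoxon signed-rank test) can be illustrated with a minimal standard-library sketch. This is not the authors' evaluation code. The function name `wilcoxon_signed_rank`, the normal approximation for the p-value, and the matched-pairs rank-biserial correlation as the effect size are all assumptions made for illustration; the abstract does not state which effect-size measure was used. The scores in the usage example are made-up placeholders, not results from the paper.

```python
import math


def wilcoxon_signed_rank(xs, ys):
    """Paired Wilcoxon signed-rank test (illustrative sketch).

    Returns (w_plus, w_minus, z, p_two_sided, rank_biserial).
    Assumptions: zero differences are dropped, tied absolute differences
    share their average rank, and the p-value comes from a plain normal
    approximation without tie or continuity correction.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under the null hypothesis, W+ is approximately normal for larger n.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))

    # Matched-pairs rank-biserial correlation, in [-1, 1].
    rank_biserial = (w_plus - w_minus) / (w_plus + w_minus)
    return w_plus, w_minus, z, p_two_sided, rank_biserial


if __name__ == "__main__":
    # Hypothetical per-file scores (placeholders, NOT data from the paper).
    without_types = [62, 58, 71, 66, 54, 69, 60, 73]
    with_types = [60, 59, 68, 66, 51, 65, 61, 70]
    w_plus, w_minus, z, p, r = wilcoxon_signed_rank(without_types, with_types)
    print(f"W+={w_plus}, W-={w_minus}, z={z:.3f}, p={p:.4f}, r={r:.3f}")
```

For small samples an exact null distribution is preferable to the normal approximation used here, and a statistics library would normally handle that. The sketch only shows how a significance test and an effect size are read together: a difference can be statistically significant while its effect size remains small, which is the pattern the abstract reports.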
