Code-documentation inconsistencies are common and undesirable: they can lead to developer misunderstandings and software defects. This paper introduces DocPrism, a multi-language code-documentation inconsistency detection tool. DocPrism uses a standard large language model (LLM) to analyze and explain inconsistencies. Plain use of LLMs for this task yields unacceptably high false positive rates: LLMs identify natural gaps between high-level documentation and detailed code implementations as inconsistencies. We introduce and apply the Local Categorization, External Filtering (LCEF) methodology to reduce false positives. LCEF relies on the LLM's local completion skills rather than its long-term reasoning skills. In our ablation study, LCEF reduces DocPrism's inconsistency flag rate from 98% to 14% and increases accuracy from 14% to 94%. In a broad evaluation across Python, TypeScript, C++, and Java, DocPrism maintains a low flag rate of 15% and achieves a precision of 0.62 without any fine-tuning.