The increasing adoption of AI-generated code has reshaped modern software development, introducing new syntactic and semantic variations in cloned code. Unlike traditional human-written clones, AI-generated clones exhibit systematic syntactic patterns and semantic differences learned from large-scale training data. This shift presents new challenges for classical code clone detection (CCD) tools, which have historically been validated primarily on human-authored codebases and optimized to detect syntactic (Type-1 to Type-3) and limited semantic clones. Given that AI-generated code can produce both syntactic and complex semantic clones, it is essential to evaluate how effective classical CCD tools remain in this new paradigm. In this paper, we systematically evaluate nine widely used CCD tools on GPTCloneBench, a benchmark of GPT-3-generated clones. To contextualize and validate our results, we further test these detectors on established human-authored benchmarks, BigCloneBench and SemanticCloneBench, to measure how their performance differs between traditional and AI-generated clones. Our analysis demonstrates that classical CCD tools, particularly those enhanced by effective normalization techniques, remain considerably effective against AI-generated clones, while others exhibit notable performance variation relative to their results on traditional benchmarks. This paper contributes by (1) evaluating classical CCD tools against AI-generated clones, providing critical insights into their current strengths and limitations; (2) highlighting the role of normalization techniques in improving detection accuracy; and (3) delivering detailed scalability and execution-time analyses to support practical CCD tool selection.