As an increasing number of software systems reach unprecedented scale, relying solely on code-level abstractions is becoming impractical. While architectural abstractions offer a means to manage these systems, keeping them consistent with the actual code has been problematic. The Java Platform Module System (JPMS), introduced in Java 9, addresses this limitation by enabling explicit module specification at the language level. JPMS strengthens architectural implementation through improved encapsulation and by allowing ground-truth architectures to be specified directly within Java projects. Although many projects are written in Java, modularizing existing monolithic projects into JPMS modules remains an open challenge, because existing architecture recovery techniques recover modules ineffectively. To address this challenge, this paper presents ClassLAR (Class- and Language model-based Architectural Recovery), a novel, lightweight, and efficient approach that recovers Java modules from monolithic Java systems using fully-qualified class names. ClassLAR leverages language models to extract semantic information from package and class names, capturing both structural and functional intent. In evaluations across 20 popular Java projects, ClassLAR outperformed all state-of-the-art techniques in architectural-level similarity metrics while achieving execution times that were 3.99 to 10.50 times faster.
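To make the input concrete, the sketch below shows one plausible way to turn a fully-qualified class name into word tokens that a language model could consume. It is an illustration only and not ClassLAR's actual preprocessing: the class name `FqcnTokenizer`, the method `tokenize`, and the camel-case splitting rule are all assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of ClassLAR: splits a fully-qualified
// class name into lowercase word tokens (package segments + class-name words).
public class FqcnTokenizer {

    public static List<String> tokenize(String fqcn) {
        List<String> tokens = new ArrayList<>();
        // Package segments and the simple class name are separated by dots.
        for (String segment : fqcn.split("\\.")) {
            // Assumed camel-case rule: split before an uppercase letter that
            // follows a lowercase letter or digit, and between an acronym and
            // a following capitalized word ("HTTPServer" -> "HTTP", "Server").
            String[] words = segment.split(
                "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");
            for (String word : words) {
                if (!word.isEmpty()) {
                    tokens.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("org.apache.commons.io.FileUtils"));
    }
}
```

Under these assumptions, the package segments carry the structural signal (where a class sits in the hierarchy) and the class-name words carry the functional signal (what the class does), which matches the two kinds of intent the abstract says are captured.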