Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs

Software languages evolve over time for various reasons, such as the addition of new features. When a language's grammar definition evolves, textual instances that originally conformed to the grammar become outdated. For DSLs in a model-driven engineering context, many techniques exist to co-evolve models with an evolving metamodel. However, these techniques are not geared toward DSLs with a textual syntax: applying them to textual language definitions and instances may discard information from the original instances, such as comments and layout information, which is valuable for software comprehension and maintenance. This study explores the potential of Large Language Model (LLM)-based solutions for grammar and instance co-evolution, with attention to their ability to preserve auxiliary information when directly processing textual instances. We applied two advanced language models, Claude-3.5 and GPT-4o, in experiments across seven case languages to evaluate the feasibility and limitations of this approach. Our results indicate that the considered LLMs migrate textual instances well in small-scale cases with limited instance size, which are representative of a subset of cases encountered in practice. We also observe significant challenges in scaling LLM-based solutions to larger instances, which yields insights for future research.
