Towards a General Framework for HTN Modeling with LLMs

From arXiv: 10 pages, 5 figures. To be published in the Workshop on Planning in the Era of LLMs (LM4Plan, https://llmforplanning.github.io) and the Workshop on Hierarchical Planning (HPlan, https://icaps25.icaps-conference.org/program/workshops/hplan/), both at the International Conference on Automated Planning and Scheduling (ICAPS) 2025.

The use of Large Language Models (LLMs) to generate Automated Planning (AP) models has been widely explored; however, their application to Hierarchical Planning (HP) is still far from reaching the level of sophistication observed in non-hierarchical architectures. In this work, we aim to address this gap with two main contributions. First, we propose L2HP, an extension of L2P (a library for LLM-driven PDDL model generation) that supports HP model generation and follows a design philosophy of generality and extensibility. Second, we apply our framework in experiments that compare the modeling capabilities of LLMs for AP and HP. On the PlanBench dataset, results show that parsing success is limited but comparable in both settings (around 36%), while syntactic validity is substantially lower in the hierarchical case (1% vs. 20% of instances). These findings underscore the unique challenges HP presents for LLMs, highlighting the need for further research to improve the quality of generated HP models.
