Linear temporal logic (LTL) is a compelling framework for specifying complex, structured tasks for reinforcement learning (RL) agents. Recent work has shown that interpreting LTL instructions as finite automata, which can be seen as high-level programs monitoring task progress, enables learning a single generalist policy capable of executing arbitrary instructions at test time. However, existing approaches fall short in environments where multiple high-level events (i.e., atomic propositions) can be true at the same time and potentially interact in complicated ways. In this work, we propose a novel approach to learning a multi-task policy for following arbitrary LTL instructions that addresses this shortcoming. Our method conditions the policy on sequences of simple Boolean formulae, which directly align with transitions in the automaton, and are encoded via a graph neural network (GNN) to yield structured task representations. Experiments in a complex chess-based environment demonstrate the advantages of our approach.
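The core idea above — monitoring task progress with an automaton whose transitions are guarded by Boolean formulae over atomic propositions that may hold simultaneously — can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the automaton, state names, and `step` helper are all hypothetical, hand-coded for the single LTL task F(a & b) ("eventually, a and b hold at the same time").

```python
from typing import Callable, Dict, FrozenSet

# Set of atomic propositions currently true in the environment, e.g. {"a", "b"}.
Valuation = FrozenSet[str]
Guard = Callable[[Valuation], bool]

# Hypothetical finite automaton for the LTL formula F(a & b).
# Each transition is guarded by a simple Boolean formula over atomic
# propositions; several propositions can be true in the same step.
transitions: Dict[str, Dict[str, Guard]] = {
    "q0": {
        "acc": lambda v: "a" in v and "b" in v,        # guard: a & b
        "q0":  lambda v: not ("a" in v and "b" in v),  # guard: !(a & b)
    },
    "acc": {"acc": lambda v: True},  # accepting sink: task already satisfied
}

def step(state: str, valuation: Valuation) -> str:
    """Advance the task monitor one step given the current valuation."""
    for nxt, guard in transitions[state].items():
        if guard(valuation):
            return nxt
    raise ValueError("automaton is not complete for this valuation")

# Only "a" true: the guard a & b fails, so the monitor stays in q0.
print(step("q0", frozenset({"a"})))        # -> q0
# Both propositions true simultaneously: the task is satisfied.
print(step("q0", frozenset({"a", "b"})))   # -> acc
```

In the approach described above, the guard formulae on such transitions (rather than the raw LTL string) would be the objects encoded by the GNN to condition the policy.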