Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks

Realistic manipulation tasks require a robot to interact with an environment with a prolonged sequence of motor actions. While deep reinforcement learning methods have recently emerged as a promising paradigm for automating manipulation behaviors, they usually fall short in long-horizon tasks due to the exploration burden. This work introduces Manipulation Primitive-augmented reinforcement Learning (MAPLE), a learning framework that augments standard reinforcement learning algorithms with a pre-defined library of behavior primitives. These behavior primitives are robust functional modules specialized in achieving manipulation goals, such as grasping and pushing. To use these heterogeneous primitives, we develop a hierarchical policy that involves the primitives and instantiates their executions with input parameters. We demonstrate that MAPLE outperforms baseline approaches by a significant margin on a suite of simulated manipulation tasks. We also quantify the compositional structure of the learned behaviors and highlight our method's ability to transfer policies to new task variants and to physical hardware. Videos and code are available at https://ut-austin-rpl.github.io/maple
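The hierarchical policy described above can be illustrated with a minimal sketch: a high-level step picks one primitive from the library, and a low-level step supplies that primitive's input parameters. This is not the authors' implementation. The library contents, the parameter sizes, the clipping range, and all function names (`select_primitive`, `instantiate`, `hierarchical_step`) are assumptions made for illustration, and the logits and raw parameters, which a learned network would normally produce, are passed in by hand.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Primitive:
    """A behavior primitive, reduced to a name and a parameter count."""
    name: str
    param_dim: int


# Hypothetical library: names follow the abstract's examples (grasping,
# pushing); the parameter sizes are made up for this sketch.
LIBRARY: List[Primitive] = [
    Primitive("grasp", 4),
    Primitive("push", 6),
]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_primitive(logits: Sequence[float], rng: random.Random) -> int:
    """High-level step: sample a primitive index from a categorical distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def instantiate(primitive: Primitive, raw_params: Sequence[float]) -> List[float]:
    """Low-level step: keep the leading entries this primitive needs.

    Primitives take different numbers of parameters, so this sketch assumes
    a shared vector sized for the largest primitive and truncates it.
    Values are clipped to [-1, 1], an assumed normalized range.
    """
    if len(raw_params) < primitive.param_dim:
        raise ValueError("raw_params is shorter than the primitive requires")
    return [max(-1.0, min(1.0, x)) for x in raw_params[: primitive.param_dim]]


def hierarchical_step(
    logits: Sequence[float],
    raw_params: Sequence[float],
    rng: random.Random,
) -> Tuple[str, List[float]]:
    """Choose a primitive, then instantiate it with input parameters."""
    primitive = LIBRARY[select_primitive(logits, rng)]
    return primitive.name, instantiate(primitive, raw_params)
```

Calling `hierarchical_step([100.0, -100.0], [0.5, 2.0, -3.0, 0.1, 0.9, 0.9], random.Random(0))` selects the grasp primitive, because its logit dominates, and passes it the first four parameters after clipping. In a learned system the choice of primitive and its parameters would both be trained with a standard reinforcement learning algorithm.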
