Bimanual robotic manipulation is an emerging and critical topic in the robotics community. Previous works primarily rely on integrated control models that take the perceptions and states of both arms as inputs and directly predict their actions. However, we argue that bimanual manipulation involves not only coordinated tasks but also various uncoordinated tasks that require no explicit cooperation during execution, such as grasping an object with the closer hand; integrated control frameworks overlook these tasks because they enforce cooperation at the input stage. In this paper, we propose a novel decoupled interaction framework that accounts for the characteristics of different tasks in bimanual manipulation. The key insight of our framework is to assign an independent model to each arm to enhance the learning of uncoordinated tasks, while introducing a selective interaction module that adaptively learns weights from its own arm to improve the learning of coordinated tasks. Extensive experiments on seven tasks in the RoboTwin dataset demonstrate that: (1) our framework achieves outstanding performance, with a 23.5% boost over the SOTA method; (2) our framework is flexible and can be seamlessly integrated into existing methods; (3) our framework can be effectively extended to multi-agent manipulation tasks, achieving a 28% boost over the integrated-control SOTA; and (4) the performance boost stems from the decoupled design itself, surpassing the SOTA by 16.5% in success rate with only 1/6 of the model size.
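The decoupled design described above can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the paper's actual architecture: each arm gets its own independent model, and a selective interaction step computes a gating weight from that arm's own features to decide how much of the other arm's features to mix in. All class names, dimensions, and weight shapes here are illustrative assumptions.

```python
import math
import random

random.seed(0)

def mat(rows, cols):
    """Random weight matrix (stand-in for learned parameters)."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

class ArmPolicy:
    """One independent per-arm model (hypothetical minimal sketch)."""
    def __init__(self, obs_dim, feat_dim, act_dim):
        self.W_enc = mat(feat_dim, obs_dim)
        self.W_gate = mat(feat_dim, feat_dim)   # selective-interaction weights
        self.W_act = mat(act_dim, feat_dim)

    def encode(self, obs):
        return [math.tanh(x) for x in matvec(self.W_enc, obs)]

    def act(self, own_feat, other_feat):
        # The gate is computed from this arm's OWN features only, then
        # modulates how much of the other arm's features are mixed in.
        gate = [1.0 / (1.0 + math.exp(-x)) for x in matvec(self.W_gate, own_feat)]
        fused = [o + g * p for o, g, p in zip(own_feat, gate, other_feat)]
        return matvec(self.W_act, fused)

left, right = ArmPolicy(8, 16, 7), ArmPolicy(8, 16, 7)
obs_l = [random.gauss(0, 1) for _ in range(8)]
obs_r = [random.gauss(0, 1) for _ in range(8)]
f_l, f_r = left.encode(obs_l), right.encode(obs_r)
action_l = left.act(f_l, f_r)   # each arm predicts its own action
action_r = right.act(f_r, f_l)
```

For an uncoordinated task the gate can shrink toward zero, leaving each arm's prediction driven by its own observations; for a coordinated task it can open to pull in the partner arm's features. The same structure generalizes to more than two agents by gating each partner's features separately.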