Sampling-Based Optimization with Parallelized Physics Simulator for Bimanual Manipulation

In recent years, dual-arm manipulation has become an area of strong interest in robotics, with end-to-end learning emerging as the predominant strategy for solving bimanual tasks. A critical limitation of such learning-based approaches, however, is their difficulty in generalizing to novel scenarios, especially in cluttered environments. This paper presents an alternative paradigm: a sampling-based optimization framework that uses a GPU-accelerated physics simulator as its world model. We demonstrate that this approach can solve complex bimanual manipulation tasks in the presence of static obstacles. Our contribution is a customized Model Predictive Path Integral Control (MPPI) algorithm, guided by carefully designed task-specific cost functions, that uses GPU-accelerated MuJoCo to efficiently evaluate robot-object interactions. We apply this method to significantly more challenging versions of tasks from the PerAct² benchmark, for example a variant that requires the point-to-point transfer of a ball through an obstacle course. Furthermore, we show that our method achieves real-time performance on commodity GPUs and facilitates successful sim-to-real transfer by leveraging unique features within MuJoCo. The paper concludes with a statistical analysis of sample complexity and robustness that quantifies the performance of our approach. The project website is available at: https://sites.google.com/view/bimanualakslabunitartu.
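To make the sampling-based optimization idea concrete, the following is a minimal sketch of a generic MPPI loop, not the customized algorithm of the paper. It assumes a toy 2D point mass in place of GPU-accelerated MuJoCo and a squared-distance-to-goal cost in place of the task-specific cost functions. All function names and parameter values (`rollout_cost`, `mppi_step`, `run_mppi`, `sigma`, `temperature`) are hypothetical and chosen only for illustration.

```python
import math
import random


def rollout_cost(x0, controls, goal, dt=0.1):
    """Cost of one control sequence on a toy 2D point mass.

    Assumption: this stands in for a physics-simulator rollout.
    """
    x, y = x0
    cost = 0.0
    for ux, uy in controls:
        x, y = x + dt * ux, y + dt * uy
        cost += (x - goal[0]) ** 2 + (y - goal[1]) ** 2
    return cost


def mppi_step(x0, mean, goal, rng, num_samples=256, sigma=0.5, temperature=1.0):
    """One MPPI update: perturb the mean plan, score rollouts, softmin-average."""
    samples = [
        [(ux + rng.gauss(0.0, sigma), uy + rng.gauss(0.0, sigma)) for ux, uy in mean]
        for _ in range(num_samples)
    ]
    # In a parallelized simulator these rollouts would be evaluated as one batch.
    costs = [rollout_cost(x0, s, goal) for s in samples]
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [
        (
            sum(w * s[t][0] for w, s in zip(weights, samples)) / total,
            sum(w * s[t][1] for w, s in zip(weights, samples)) / total,
        )
        for t in range(len(mean))
    ]


def run_mppi(x0, goal, horizon=10, steps=60, dt=0.1, seed=0):
    """Receding-horizon loop: replan, apply the first control, shift the plan."""
    rng = random.Random(seed)
    state = tuple(x0)
    mean = [(0.0, 0.0)] * horizon
    for _ in range(steps):
        mean = mppi_step(state, mean, goal, rng)
        ux, uy = mean[0]
        state = (state[0] + dt * ux, state[1] + dt * uy)
        mean = mean[1:] + [mean[-1]]  # warm-start the next plan
    return state
```

Calling `run_mppi((0.0, 0.0), (1.0, 1.0))` drives the point mass from the origin toward the goal. The rollouts inside `mppi_step` are independent of one another, which is the property that lets a parallelized simulator evaluate them simultaneously.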
