With the explosive growth of rigid-body simulators, policy learning in simulation has become the de facto standard for most rigid morphologies. In contrast, soft robotic simulation frameworks remain scarce and are seldom adopted by the soft robotics community. This gap stems partly from the lack of easy-to-use, general-purpose frameworks and partly from the high computational cost of accurately simulating continuum mechanics, which often renders policy learning infeasible. In this work, we demonstrate that rapid soft robot policy learning is achievable via implicit time-stepping. Our simulator of choice, DisMech, is a general-purpose, fully implicit soft-body simulator capable of handling both soft dynamics and frictional contact. We further introduce delta natural curvature control, a method analogous to delta joint position control in rigid manipulators, which provides an intuitive and effective means of control for soft robot learning. To highlight the benefits of implicit time-stepping and delta curvature control, we conduct extensive comparisons across four diverse soft manipulator tasks against Elastica, one of the most widely used soft-body frameworks. With implicit time-stepping, parallel stepping of 500 environments runs up to 6x faster in non-contact cases and up to 40x faster in contact-rich scenarios. Finally, a comprehensive sim-to-sim gap evaluation (training policies in one simulator and evaluating them in another) demonstrates that implicit time-stepping provides a rare free lunch: dramatic speedups achieved without sacrificing accuracy.
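To make the analogy with delta joint position control concrete, the following is a minimal sketch of what a delta natural curvature update could look like. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the function name `apply_delta_curvature`, the scaling parameter `max_delta`, the bound `kappa_limit`, and the choice of one scalar curvature per controlled node are hypothetical and are not taken from DisMech or Elastica.

```python
import numpy as np


def apply_delta_curvature(kappa_bar, action, max_delta=0.05, kappa_limit=2.0):
    """Return updated natural curvatures after one control step.

    Hypothetical sketch: `kappa_bar` holds the current natural curvature at
    each controlled node, and `action` is a normalized policy output in
    [-1, 1] per node. The policy commands a bounded *change* in natural
    curvature rather than an absolute target, mirroring delta joint
    position control for rigid arms.
    """
    # Clamp the raw policy output to the normalized action range.
    action = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    # Scale to a per-step curvature increment and add it to the current value.
    kappa_new = np.asarray(kappa_bar, dtype=float) + max_delta * action
    # Keep the natural curvature inside assumed actuator limits.
    return np.clip(kappa_new, -kappa_limit, kappa_limit)


# Usage: start from a straight rod and apply the same action three times.
kappa_bar = np.zeros(4)
for _ in range(3):
    kappa_bar = apply_delta_curvature(kappa_bar, [1.0, -1.0, 0.5, 0.0])
```

Under these assumptions, bounding the per-step increment keeps commanded shape changes small between consecutive control steps, which is the usual motivation for delta-style action spaces in rigid manipulators as well.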