Reinforcement learning (RL) offers a powerful approach for robots to learn complex, collaborative skills by combining Dynamic Movement Primitives (DMPs) for motion generation with Variable Impedance Control (VIC) for compliant interaction. However, this model-free paradigm risks instability and unsafe exploration because the impedance gains are time-varying. This work introduces Certified Gaussian Manifold Sampling (C-GMS), a novel trajectory-centric RL framework that learns combined DMP and VIC policies while guaranteeing Lyapunov stability and actuator feasibility by construction. Our approach reframes policy exploration as sampling from a mathematically defined manifold of stable gain schedules, so every policy rollout is stable and physically realizable, eliminating the need for reward penalties or post-hoc validation. Furthermore, we prove that the approach maintains bounded tracking error even under bounded model errors and deployment-time uncertainties. We demonstrate the effectiveness of C-GMS in simulation and verify its efficacy on a real robot, paving the way for reliable autonomous interaction in complex environments.
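To make the "exploration as sampling from a feasible manifold" idea concrete, the minimal sketch below is an illustration under our own assumptions, not the paper's actual parameterization: the stiffness bounds, the sigmoid mapping, and the fixed damping-ratio coupling are all hypothetical. It shows how Gaussian exploration in an unconstrained latent space can be mapped, by construction, onto bounded positive gain schedules, so no rollout requires reward penalties or post-hoc rejection.

```python
import numpy as np

# Assumed actuator/stability limits for illustration only.
K_MIN, K_MAX = 50.0, 800.0   # stiffness bounds [N/m]
ZETA = 0.7                   # fixed damping ratio tying damping to stiffness

def to_stable_gains(z):
    """Map an unconstrained latent sample z (shape (T,)) to a stiffness
    schedule K(t) in [K_MIN, K_MAX] and a positive damping schedule D(t),
    so every sampled gain schedule is bounded and realizable by construction."""
    K = K_MIN + (K_MAX - K_MIN) / (1.0 + np.exp(-z))  # sigmoid -> bounded stiffness
    D = 2.0 * ZETA * np.sqrt(K)                       # damping coupled to stiffness
    return K, D

rng = np.random.default_rng(0)
T = 100                                  # time steps in the gain schedule
mean, std = np.zeros(T), 0.5 * np.ones(T)

# Exploration = Gaussian sampling in the latent space; every sample maps to
# a feasible gain schedule, so no rollout needs to be discarded afterwards.
z = rng.normal(mean, std)
K_sched, D_sched = to_stable_gains(z)
assert np.all((K_sched >= K_MIN) & (K_sched <= K_MAX)) and np.all(D_sched > 0)
```

In this toy version, feasibility follows from the mapping itself rather than from checking or penalizing samples, which mirrors the by-construction guarantee claimed in the abstract; the actual C-GMS manifold and stability certificate are defined in the paper and are not reproduced here.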