We address the problem of safely learning controlled stochastic dynamics from discrete-time trajectory observations, ensuring system trajectories remain within predefined safe regions during both training and deployment. Safety-critical constraints of this kind are crucial in applications such as autonomous robotics, finance, and biomedicine. We introduce a method that ensures safe exploration and efficient estimation of system dynamics by iteratively expanding an initial known safe control set using kernel-based confidence bounds. After training, the learned model enables predictions of the system's dynamics and permits safety verification of any given control. Our approach requires only mild smoothness assumptions and access to an initial safe control set, enabling broad applicability to complex real-world systems. We provide theoretical guarantees for safety and derive adaptive learning rates that improve with increasing Sobolev regularity of the true dynamics. Experimental evaluations demonstrate the practical effectiveness of our method in terms of safety, estimation accuracy, and computational efficiency.
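The idea of iteratively expanding an initial safe control set with kernel-based confidence bounds can be illustrated by a minimal sketch. This is not the method of the paper. It is a simplified toy under assumptions that do not come from the abstract: a one-dimensional control, a scalar noisy safety signal `safety_fn` (hypothetical), a Gaussian-process-style posterior with an RBF kernel, and hand-picked values for `beta`, `threshold`, the length scale, and the noise level. A control is added to the safe set only when its lower confidence bound clears the threshold, and every queried control is drawn from the current safe set.

```python
import numpy as np


def rbf(a, b, length_scale=0.2):
    """RBF kernel matrix between 1-D arrays a and b (assumed kernel choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def posterior(x_train, y_train, x_query, noise=1e-2):
    """Zero-mean kernel posterior mean and standard deviation at x_query."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_train, x_query)
    mean = k_star.T @ np.linalg.solve(K, y_train)
    # Prior variance is 1 for the RBF kernel; subtract the explained part.
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def safe_explore(safety_fn, candidates, initial_safe,
                 threshold=0.0, beta=2.0, n_iter=30, seed=0):
    """Grow a safe control set using lower confidence bounds.

    safety_fn    : hypothetical noisy safety signal, safe when >= threshold
    candidates   : 1-D grid of candidate controls
    initial_safe : boolean mask of controls known to be safe in advance
    """
    rng = np.random.default_rng(seed)
    safe = initial_safe.copy()
    std = np.ones(len(candidates))  # prior uncertainty before any data
    xs, ys = [], []
    for _ in range(n_iter):
        # Query only inside the current safe set, at the most uncertain point.
        idx = np.flatnonzero(safe)
        pick = idx[np.argmax(std[idx])]
        u = candidates[pick]
        xs.append(u)
        ys.append(safety_fn(u) + 0.01 * rng.standard_normal())
        # Update the confidence bounds and expand (never shrink) the safe set.
        mean, std = posterior(np.array(xs), np.array(ys), candidates)
        safe |= (mean - beta * std) >= threshold
    return safe, np.array(xs), np.array(ys)


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 101)
    known_safe = np.abs(grid - 0.5) <= 0.05
    # Toy safety margin, assumed for illustration only.
    toy_margin = lambda u: 1.0 - ((u - 0.5) / 0.3) ** 2
    safe, xs, ys = safe_explore(toy_margin, grid, known_safe)
    print("initial safe controls:", int(known_safe.sum()))
    print("certified safe controls:", int(safe.sum()))
```

In this sketch the fixed `beta` is a heuristic; a guarantee of the kind stated in the abstract would require confidence bounds calibrated to the regularity assumptions on the true dynamics, which the toy does not attempt.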