Learning robust and generalizable world models is crucial for enabling efficient and scalable robotic control in real-world environments. In this work, we introduce a novel framework for learning world models that accurately capture complex, partially observable, and stochastic dynamics. The proposed method employs a dual-autoregressive mechanism and self-supervised training to achieve reliable long-horizon predictions without relying on domain-specific inductive biases, ensuring adaptability across diverse robotic tasks. We further propose a policy optimization framework that leverages the learned world models for efficient training in imagined environments and seamless deployment on real-world systems. This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.