Model-based planning in robotic domains is fundamentally challenged by the hybrid nature of physical dynamics, in which continuous motion is punctuated by discrete events such as contacts and impacts. Conventional latent world models typically use a single monolithic neural network that enforces global continuity and therefore over-smooths distinct dynamic modes (e.g., sticking vs. sliding, flight vs. stance). For a planner, this smoothing causes errors to compound catastrophically over long-horizon lookaheads, making search unreliable near physical boundaries. To address this, we introduce the Prismatic World Model (PRISM-WM), a structured architecture that decomposes complex hybrid dynamics into composable primitives. PRISM-WM uses a context-aware Mixture-of-Experts (MoE) framework in which a gating mechanism implicitly identifies the current physical mode and specialized experts predict the associated transition dynamics. We further introduce a latent orthogonalization objective that keeps the experts diverse and prevents mode collapse. By accurately modeling sharp mode transitions in the system dynamics, PRISM-WM significantly reduces rollout drift. Extensive experiments on challenging continuous control benchmarks, including high-dimensional humanoids and diverse multi-task settings, demonstrate that PRISM-WM provides a higher-fidelity substrate for trajectory optimization algorithms (e.g., TD-MPC), indicating its potential as a foundation model for next-generation model-based agents.
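To make the gating-plus-experts idea concrete, the following is a minimal NumPy sketch of one way such a model could be structured. It is not the PRISM-WM implementation: the abstract does not specify the gate's inputs, the experts' form, or the exact orthogonalization loss, so the class `MoELatentDynamics`, the residual latent update, the two-layer experts, and the cosine-similarity penalty in `orthogonality_penalty` are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MoELatentDynamics:
    """Hypothetical gated mixture of experts over latent transitions.

    Assumption: the gate sees the concatenated (latent, action) vector and
    each expert is a small two-layer network predicting a latent residual.
    """

    def __init__(self, latent_dim, action_dim, num_experts, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = latent_dim + action_dim
        self.W_gate = rng.normal(0.0, 0.1, (in_dim, num_experts))
        self.W1 = rng.normal(0.0, 0.1, (num_experts, in_dim, hidden_dim))
        self.W2 = rng.normal(0.0, 0.1, (num_experts, hidden_dim, latent_dim))

    def forward(self, z, a):
        x = np.concatenate([z, a], axis=-1)                 # (B, in_dim)
        gates = softmax(x @ self.W_gate)                    # (B, K) soft mode weights
        h = np.tanh(np.einsum("bi,kih->bkh", x, self.W1))   # (B, K, H)
        deltas = np.einsum("bkh,khd->bkd", h, self.W2)      # (B, K, D) per-expert residuals
        # Next latent = current latent + gate-weighted sum of expert residuals.
        z_next = z + np.einsum("bk,bkd->bd", gates, deltas)
        return z_next, gates, deltas


def orthogonality_penalty(deltas, eps=1e-8):
    """Assumed diversity loss: mean squared cosine similarity between the
    outputs of distinct experts. Requires at least two experts.
    0 when expert outputs are mutually orthogonal, close to 1 when identical.
    """
    unit = deltas / (np.linalg.norm(deltas, axis=-1, keepdims=True) + eps)
    gram = np.einsum("bkd,bjd->bkj", unit, unit)            # (B, K, K)
    k = gram.shape[-1]
    off_diag = gram * (1.0 - np.eye(k))                     # zero out self-similarity
    return float((off_diag ** 2).sum(axis=(1, 2)).mean() / (k * (k - 1)))
```

In a training loop under these assumptions, the penalty would be added to the latent prediction loss with a small weight, and a planner such as TD-MPC would call `forward` repeatedly to roll out candidate action sequences. The soft gate shown here blends experts; a sharper gate (for example, a low-temperature softmax) would be closer to selecting a single mode per step.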