MonoMPC: Monocular Vision Based Navigation with Learned Collision Model and Risk-Aware Model Predictive Control

Navigating unknown environments with a single RGB camera is challenging, as the lack of depth information prevents reliable collision checking. While some methods use estimated depth to build collision maps, we found that depth estimates from vision foundation models are too noisy for zero-shot navigation in cluttered environments. We propose an alternative approach: instead of using noisy estimated depth for direct collision checking, we use it as a rich context input to a learned collision model. This model predicts the distribution of the minimum obstacle clearance that the robot can expect for a given control sequence. At inference, these predictions inform a risk-aware MPC planner that minimizes estimated collision risk. We propose a joint learning pipeline that co-trains the collision model and the risk metric using both safe and unsafe trajectories. Crucially, our joint training ensures well-calibrated uncertainty in the collision model, which improves navigation in highly cluttered environments. Consequently, real-world experiments show reductions in collision rate and improvements in goal reaching and speed over several strong baselines.
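To make the planning idea concrete, the following is a minimal sketch of how a predicted clearance distribution could drive a risk-aware choice among candidate control sequences. It is not the method described above: the Gaussian form of the clearance distribution, the risk definition (probability that clearance falls below a safety margin), the exhaustive search over candidates, and all names (`collision_risk`, `select_controls`, `toy_predict_clearance`, `toy_goal_cost`, `margin`, `risk_weight`) are assumptions made for illustration. In particular, `toy_predict_clearance` is a hand-written stand-in for the learned collision model.

```python
import math


def collision_risk(mean_clearance, std_clearance, margin=0.2):
    """P(clearance < margin), assuming a Gaussian clearance distribution."""
    z = (margin - mean_clearance) / (std_clearance * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def select_controls(candidates, predict_clearance, goal_cost, risk_weight=10.0):
    """Return the candidate minimizing goal cost plus weighted collision risk."""
    best, best_cost = None, float("inf")
    for controls in candidates:
        mean, std = predict_clearance(controls)
        cost = goal_cost(controls) + risk_weight * collision_risk(mean, std)
        if cost < best_cost:
            best, best_cost = controls, cost
    return best


def toy_predict_clearance(controls):
    # Hypothetical stand-in for the learned model: stronger steering
    # away from the obstacle yields a larger mean clearance (meters).
    mean = 0.1 + 0.5 * sum(controls) / len(controls)
    return mean, 0.1  # fixed standard deviation, for illustration only


def toy_goal_cost(controls):
    # Hypothetical goal term: any steering is a detour from the goal.
    return sum(abs(u) for u in controls)


candidates = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
best = select_controls(candidates, toy_predict_clearance, toy_goal_cost)
print(best)
```

In this toy setup, driving straight carries a high estimated risk and the sharpest turn pays a large detour cost, so the moderate steering sequence is selected. A practical planner would replace the exhaustive loop with a sampling-based or gradient-based optimizer and re-plan at every control step.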
