Robot control loops require causal pose estimates that depend only on past and present measurements. At each timestep, the controller computes commands from the current pose without waiting for future refinements. While traditional visual SLAM systems achieve high accuracy through retrospective loop closures, these corrections arrive after control decisions have already been executed, violating causality. Visual-inertial odometry maintains causality but accumulates unbounded drift over time. To address the distinct requirements of robot control, we propose a multi-camera, multi-map visual-inertial localization system that provides real-time, causal pose estimation with bounded localization error through continuous map constraints. Since standard trajectory metrics evaluate post-processed trajectories, we analyze the error composition of map-based localization systems and propose a set of evaluation metrics suited to measuring causal localization performance. To validate our system, we design a multi-camera IMU hardware setup and collect a challenging long-term campus dataset featuring diverse illumination and seasonal conditions. Experimental results on public benchmarks and on our own dataset demonstrate that our system achieves significantly higher real-time localization accuracy than existing methods. To benefit the community, we have made both the system and the dataset open source at https://anonymous.4open.science/r/Multi-cam-Multi-map-VILO-7993.