Robust loop closure detection is a critical component of Simultaneous Localization and Mapping (SLAM) algorithms in GNSS-denied environments such as planetary exploration. In these settings, visual place recognition often fails due to perceptual aliasing and weak textures, while LiDAR-based methods suffer from sparsity and ambiguity. This paper presents MPRF, a multimodal pipeline that leverages transformer-based foundation models for both the vision and LiDAR modalities to achieve robust loop closure in severely unstructured environments. Unlike prior work limited to retrieval, MPRF integrates a two-stage visual retrieval strategy with explicit 6-DoF pose estimation, combining DINOv2 features with SALAD aggregation for efficient candidate screening and SONATA-based LiDAR descriptors for geometric verification. Experiments on the S3LI and S3LI Vulcano datasets show that MPRF outperforms state-of-the-art retrieval methods in precision while improving pose estimation robustness in low-texture regions. By providing interpretable correspondences suitable for SLAM back-ends, MPRF achieves a favorable trade-off between accuracy, efficiency, and reliability, demonstrating the potential of foundation models to unify place recognition and pose estimation. Code and models will be released at github.com/DLR-RM/MPRF.
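The abstract describes a two-stage design: fast candidate screening with global visual descriptors, followed by geometric verification that yields a 6-DoF pose. The sketch below illustrates that pattern only at a conceptual level; it uses generic NumPy stand-ins (cosine-similarity retrieval and a Kabsch/Umeyama rigid alignment) in place of the actual DINOv2+SALAD descriptors and SONATA-based LiDAR verification, whose interfaces are not specified here.

```python
import numpy as np

def retrieve_candidates(query_desc, db_descs, k=5):
    """Stage 1 stand-in: rank database frames by cosine similarity of
    global descriptors (DINOv2+SALAD in MPRF; plain vectors here)."""
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    q = query_desc / np.linalg.norm(query_desc)
    sims = db @ q
    return np.argsort(-sims)[:k]

def verify_candidate(src_pts, dst_pts):
    """Stage 2 stand-in: closed-form rigid alignment (Kabsch) between
    matched 3-D keypoints, producing an explicit 6-DoF pose (R, t)
    such that dst ≈ R @ src + t, as a geometric-verification step
    on LiDAR correspondences would."""
    src_c = src_pts - src_pts.mean(axis=0)
    dst_c = dst_pts - dst_pts.mean(axis=0)
    H = src_c.T @ dst_c                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                       # proper rotation (det = +1)
    t = dst_pts.mean(axis=0) - R @ src_pts.mean(axis=0)
    return R, t
```

A loop-closure check would then chain the two stages: retrieve the top-k candidates for the current frame, align point correspondences against each candidate, and accept the closure only if the alignment residual is small.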