The Instance Image Goal Navigation (IIN) problem requires mobile robots deployed in unknown environments to search for specific objects or people of interest using only a single reference goal image of the target. This problem is especially challenging when: 1) the reference image is captured from an arbitrary viewpoint, and 2) the robot must operate with sparse-view scene reconstructions. In this paper, we address the IIN problem by introducing SplatSearch, a novel architecture that leverages sparse-view 3D Gaussian Splatting (3DGS) reconstructions. SplatSearch renders multiple viewpoints around candidate objects using a sparse online 3DGS map, and uses a multi-view diffusion model to complete missing regions of the rendered images, enabling robust feature matching against the goal image. We also introduce a novel frontier exploration policy that combines visual context from the synthesized viewpoints with semantic context from the goal image to evaluate frontier locations, allowing the robot to prioritize frontiers that are semantically and visually relevant to the goal image. Extensive experiments in photorealistic home environments and in real-world environments show that SplatSearch outperforms current state-of-the-art methods in terms of Success Rate and Success Path Length. An ablation study confirms the design choices of SplatSearch.
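To make the idea of ranking frontiers by visual and semantic relevance concrete, the following is a minimal illustrative sketch and not the SplatSearch implementation. The abstract does not state how the two kinds of context are combined, so the weighted sum of cosine similarities, the weights `w_visual` and `w_semantic`, and the names `Frontier`, `score_frontier`, and `rank_frontiers` are all assumptions made for illustration. Feature vectors are assumed to have been extracted beforehand.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Frontier:
    """Hypothetical frontier record with precomputed features (assumed)."""
    name: str
    visual_feature: Sequence[float]    # e.g. from views synthesized near the frontier
    semantic_feature: Sequence[float]  # e.g. from semantic cues near the frontier


def score_frontier(
    frontier: Frontier,
    goal_visual: Sequence[float],
    goal_semantic: Sequence[float],
    w_visual: float = 0.5,    # assumed weight, not taken from the paper
    w_semantic: float = 0.5,  # assumed weight, not taken from the paper
) -> float:
    """Weighted sum of visual and semantic similarity to the goal image."""
    visual = cosine_similarity(frontier.visual_feature, goal_visual)
    semantic = cosine_similarity(frontier.semantic_feature, goal_semantic)
    return w_visual * visual + w_semantic * semantic


def rank_frontiers(
    frontiers: List[Frontier],
    goal_visual: Sequence[float],
    goal_semantic: Sequence[float],
) -> List[Frontier]:
    """Return frontiers sorted from most to least relevant to the goal."""
    return sorted(
        frontiers,
        key=lambda f: score_frontier(f, goal_visual, goal_semantic),
        reverse=True,
    )


if __name__ == "__main__":
    goal_v, goal_s = [1.0, 0.0], [0.0, 1.0]
    candidates = [
        Frontier("hallway", [0.0, 1.0], [1.0, 0.0]),
        Frontier("kitchen", [2.0, 0.0], [0.0, 3.0]),
    ]
    for f in rank_frontiers(candidates, goal_v, goal_s):
        print(f.name, score_frontier(f, goal_v, goal_s))
```

Under these assumptions, the robot would navigate to the first frontier in the ranked list. A real system would likely also account for travel cost to each frontier, which this sketch omits.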