Accurate camera pose estimation from an image observation in a previously mapped environment is commonly done with structure-based methods, which find correspondences between 2D keypoints in the image and 3D structure points in the map. To make this correspondence search tractable in large scenes, existing pipelines either rely on search heuristics or perform image retrieval, comparing the current image to a database of past observations to reduce the search space. However, these approaches result in elaborate pipelines or in storage requirements that grow with the number of past observations. In this work, we propose a new paradigm for making structure-based relocalisation tractable. Instead of relying on image retrieval or search heuristics, we learn a direct mapping from image observations to the visible scene structure in a compact neural network. Given a query image, a forward pass through our novel visible structure retrieval network yields the subset of 3D structure points in the map that the image views, thus reducing the search space of 2D-3D correspondences. We show that our proposed method enables localisation with an accuracy comparable to the state of the art, with a lower computational and storage footprint.
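As a rough illustration of how a predicted visible subset narrows the 2D-3D search, the sketch below matches query descriptors only against map points whose visibility score passes a threshold. It is a minimal sketch under stated assumptions, not the method's implementation: the function name `restrict_and_match`, the `threshold` parameter, the use of per-point descriptors with brute-force nearest-neighbour matching, and the hand-made `visibility` array (standing in for the network output) are all hypothetical.

```python
import numpy as np


def restrict_and_match(query_desc, point_desc, visibility, threshold=0.5):
    """Match 2D keypoint descriptors against only the 3D points deemed visible.

    query_desc: (K, D) descriptors of the 2D keypoints in the query image.
    point_desc: (N, D) descriptors attached to the 3D points of the map.
    visibility: (N,) per-point scores; a hypothetical stand-in for the
                output of a visible structure retrieval network.
    Returns a (K,) array of indices into the full map, or -1 if no
    point passes the threshold.
    """
    # Reduced search space: indices of the points predicted visible.
    candidates = np.flatnonzero(visibility >= threshold)
    if candidates.size == 0:
        return np.full(len(query_desc), -1, dtype=int)

    # Pairwise squared Euclidean distances, shape (K, len(candidates)).
    diff = query_desc[:, None, :] - point_desc[candidates][None, :, :]
    dist = np.sum(diff ** 2, axis=-1)

    # Map the nearest candidate back to its index in the full map.
    return candidates[np.argmin(dist, axis=1)]


# Toy data: 1000 map points, of which only points 100..149 are marked visible.
rng = np.random.default_rng(0)
point_desc = rng.normal(size=(1000, 32))
visibility = np.zeros(1000)
visibility[100:150] = 0.9

# Three query descriptors: slightly perturbed copies of visible map points.
query_desc = point_desc[[110, 120, 130]] + 0.01 * rng.normal(size=(3, 32))
matches = restrict_and_match(query_desc, point_desc, visibility)
```

In this toy setup each query descriptor is compared against 50 candidates rather than all 1000 map points. In a structure-based pipeline, the resulting 2D-3D matches would typically then be passed to a robust pose solver such as PnP with RANSAC.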



