We present NeuralRecon, a novel framework for real-time 3D scene reconstruction from a monocular video. Unlike previous methods that estimate single-view depth maps separately for each key-frame and fuse them later, we propose to directly reconstruct local surfaces, represented as sparse TSDF volumes, for each video fragment sequentially with a neural network. A learning-based TSDF fusion module built on gated recurrent units (GRUs) guides the network to fuse features from previous fragments. This design allows the network to capture the local smoothness prior and the global shape prior of 3D surfaces while sequentially reconstructing them, resulting in accurate, coherent, and real-time surface reconstruction. Experiments on the ScanNet and 7-Scenes datasets show that our system outperforms state-of-the-art methods in both accuracy and speed. To the best of our knowledge, this is the first learning-based system able to reconstruct dense, coherent 3D geometry in real time.
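To make the GRU-based fusion idea concrete, below is a minimal PyTorch sketch (not the authors' code) of how per-fragment 3D feature volumes can be fused into a persistent hidden state with a convolutional GRU and then decoded to TSDF values. All names (`ConvGRUCell`, `FragmentFusion`, the feature dimensions) are illustrative assumptions; the actual system operates on sparse voxel volumes with sparse 3D convolutions, whereas dense tensors are used here for brevity.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell over a 3D feature volume (illustrative)."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.update = nn.Conv3d(feat_dim + hidden_dim, hidden_dim, 3, padding=1)
        self.reset = nn.Conv3d(feat_dim + hidden_dim, hidden_dim, 3, padding=1)
        self.cand = nn.Conv3d(feat_dim + hidden_dim, hidden_dim, 3, padding=1)

    def forward(self, x, h):
        z = torch.sigmoid(self.update(torch.cat([x, h], dim=1)))  # update gate
        r = torch.sigmoid(self.reset(torch.cat([x, h], dim=1)))   # reset gate
        q = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))   # candidate state
        return (1 - z) * h + z * q                                # fused hidden state

class FragmentFusion(nn.Module):
    """Fuses the current fragment's features with the global hidden state
    and predicts a TSDF value per voxel (a simplified stand-in for the
    learning-based TSDF fusion module)."""
    def __init__(self, feat_dim=32, hidden_dim=32):
        super().__init__()
        self.gru = ConvGRUCell(feat_dim, hidden_dim)
        self.tsdf_head = nn.Conv3d(hidden_dim, 1, 1)  # per-voxel TSDF in [-1, 1]

    def forward(self, frag_feat, hidden):
        hidden = self.gru(frag_feat, hidden)
        tsdf = torch.tanh(self.tsdf_head(hidden))
        return tsdf, hidden

# Usage: reconstruct fragments sequentially, carrying the hidden state forward
# so that each prediction is conditioned on previously observed geometry.
model = FragmentFusion()
hidden = torch.zeros(1, 32, 24, 24, 24)        # persistent scene state
for _ in range(3):                              # three video fragments
    frag_feat = torch.randn(1, 32, 24, 24, 24)  # back-projected image features
    tsdf, hidden = model(frag_feat, hidden)
print(tsdf.shape)  # torch.Size([1, 1, 24, 24, 24])
```

The key design choice this sketch reflects is that the hidden state persists across fragments, so surface predictions stay coherent over the sequence instead of being fused as independent per-frame depth maps.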