Depth completion is the task of predicting a dense depth image from a sparse depth map and an RGB image. Unsupervised depth completion methods have been proposed for datasets where ground-truth depth is unavailable and supervised methods cannot be applied. However, these models require auxiliary data to estimate depth values, which is far from real-world scenarios. Monocular depth estimation (MDE) models can produce a plausible relative depth map from a single image, but no prior work properly combines the sparse depth map with MDE for depth completion; a simple affine transformation of the relative depth map yields high error because MDE models are inaccurate at estimating depth differences between objects. We introduce StarryGazer, a domain-agnostic framework that predicts dense depth images from a single sparse depth image and an RGB image without relying on ground-truth depth, by leveraging the power of large MDE models. First, we employ a pre-trained MDE model to produce relative depth images. These images are segmented and randomly rescaled to form synthetic pairs of dense pseudo-ground truth and corresponding sparse depths. A refinement network is trained on these synthetic pairs, incorporating the relative depth maps and RGB images to improve the model's accuracy and robustness. StarryGazer outperforms existing unsupervised methods and affine-transformed MDE results on various datasets, demonstrating that our framework exploits the power of MDE models while correcting their errors using sparse depth information.
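The synthetic-pair construction described above (segment the MDE relative depth, rescale each segment by a random factor to mimic between-object depth errors, then sample sparse points) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, the uniform scale range, and the random sparse-point sampling are assumptions for exposition.

```python
import numpy as np

def make_synthetic_pair(relative_depth, segments, n_sparse=500,
                        scale_range=(0.5, 2.0), rng=None):
    """Build a (dense pseudo-ground-truth, sparse depth) training pair
    from an MDE relative-depth map.

    relative_depth : (H, W) float array from a pre-trained MDE model
    segments       : (H, W) int array of segment labels for the image
    n_sparse       : number of sparse depth samples to keep
    scale_range    : hypothetical range for per-segment random rescaling
    """
    rng = np.random.default_rng() if rng is None else rng
    pseudo_gt = relative_depth.astype(np.float64).copy()

    # Rescale each segment independently: this simulates the errors MDE
    # models make in the depth *difference* between objects, which is
    # what the refinement network must learn to correct.
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        pseudo_gt[mask] *= rng.uniform(*scale_range)

    # Sample random pixel locations from the pseudo-GT to play the role
    # of the sparse depth input (zeros denote missing measurements).
    sparse = np.zeros_like(pseudo_gt)
    idx = rng.choice(pseudo_gt.size, size=min(n_sparse, pseudo_gt.size),
                     replace=False)
    sparse.flat[idx] = pseudo_gt.flat[idx]
    return pseudo_gt, sparse
```

The refinement network would then be trained to recover `pseudo_gt` given `sparse` together with the RGB image and the original relative depth map.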