Three-dimensional reconstruction of scenes with extreme depth variation remains challenging because supervisory signals are inconsistent between near-field and far-field regions. Existing methods fail to address both inaccurate depth estimation in distant areas and structural degradation in close-range regions at the same time. This paper proposes a computational framework that integrates depth-of-field supervision and multi-view consistency supervision into 3D Gaussian Splatting. Our approach comprises two core components. (1) Depth-of-Field Supervision uses a scale-recovered monocular depth estimator (e.g., Metric3D) to generate depth priors, applies defocus convolution to synthesize physically accurate defocused images, and enforces geometric consistency through a novel depth-of-field loss, thereby improving depth fidelity in both far-field and near-field regions. (2) Multi-View Consistency Supervision uses LoFTR-based semi-dense feature matching to minimize cross-view geometric errors and enforces depth consistency through least-squares optimization over reliably matched points. By unifying defocus physics with multi-view geometric constraints, our method achieves superior depth fidelity and improves PSNR by 0.8 dB over the state-of-the-art method on the Waymo Open Dataset. The framework bridges physical imaging principles and learning-based depth regularization, offering a scalable solution for complex depth stratification in urban environments.
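The defocus-convolution idea in component (1) can be illustrated with a small NumPy sketch under a thin-lens camera model. This is not the paper's implementation: the abstract does not specify the blur kernel, the camera parameters, or the exact form of the depth-of-field loss. The thin-lens circle-of-confusion formula, the gather-style disk blur (which ignores occlusion), and the helper names `circle_of_confusion`, `disk_blur`, `defocus_render`, and `dof_loss` are all assumptions. `dof_loss` in particular is a hypothetical L1 comparison between the defocus implied by a rendered depth map and the defocus implied by a monocular depth prior. It shows only how such a loss could tie depth to image appearance.

```python
import numpy as np


def circle_of_confusion(depth, focus_dist, focal_len, f_number, pixel_size):
    """Thin-lens circle-of-confusion diameter, in pixels, for each depth value.

    All distances are in metres; depth must be > 0 and focus_dist > focal_len.
    """
    aperture = focal_len / f_number  # aperture diameter A = f / N
    coc_m = aperture * np.abs(depth - focus_dist) / depth * focal_len / (focus_dist - focal_len)
    return coc_m / pixel_size


def disk_blur(img, radius):
    """Mean of a grayscale image over a disk of integer radius (edge-padded)."""
    if radius == 0:
        return img.astype(float)
    h, w = img.shape
    padded = np.pad(img.astype(float), radius, mode="edge")
    acc = np.zeros((h, w))
    count = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy <= radius * radius:
                # Shifted copy: element (i, j) holds img[i + dy, j + dx] (clamped at edges).
                acc += padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
                count += 1
    return acc / count


def defocus_render(img, depth, focus_dist, focal_len, f_number, pixel_size, max_radius=8):
    """Gather-style defocus: each pixel averages a disk sized by its own CoC.

    Simplifying assumption: occlusion and partial coverage are ignored.
    """
    coc = circle_of_confusion(depth, focus_dist, focal_len, f_number, pixel_size)
    radius = np.clip(np.rint(coc / 2.0), 0, max_radius).astype(int)
    out = np.empty(img.shape, dtype=float)
    for r in np.unique(radius):
        mask = radius == r
        out[mask] = disk_blur(img, int(r))[mask]
    return out


def dof_loss(img, rendered_depth, prior_depth, **lens):
    """Hypothetical L1 loss: defocus from rendered depth vs. from prior depth."""
    blur_rendered = defocus_render(img, rendered_depth, **lens)
    blur_prior = defocus_render(img, prior_depth, **lens)
    return float(np.mean(np.abs(blur_rendered - blur_prior)))


if __name__ == "__main__":
    # Assumed 50 mm f/2 lens focused at 5 m; coarse pixel pitch keeps blur small.
    lens = dict(focus_dist=5.0, focal_len=0.05, f_number=2.0, pixel_size=1e-4)
    img = np.zeros((7, 7))
    img[3, 3] = 1.0
    far = np.full((7, 7), 50.0)   # far from the focal plane, so it is blurred
    sharp = np.full((7, 7), 5.0)  # on the focal plane, so it stays unchanged
    print(defocus_render(img, far, **lens)[3, 3])
    print(dof_loss(img, sharp, far, **lens))
```

The rounding to integer radii makes this sketch non-differentiable with respect to depth. A training pipeline would need a differentiable blur, for example a soft blend across radii, so that such a loss could propagate gradients into the Gaussians' depths.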