Scene reconstruction has emerged as a central challenge in computer vision, with approaches such as Neural Radiance Fields (NeRF) and Gaussian Splatting achieving remarkable progress. While Gaussian Splatting performs well on large-scale datasets, it often struggles to capture fine details or maintain realism in regions with sparse coverage, largely because of the inherent limitations of sparse 3D training data. In this work, we propose GauSSmart, a hybrid method that bridges 2D foundation models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision from foundation models such as DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D segmentation priors and high-dimensional feature embeddings, our method guides the densification and refinement of Gaussian splats, improving coverage in underrepresented areas and preserving intricate structural details. We validate our approach on three datasets, where GauSSmart outperforms existing Gaussian Splatting methods in the majority of evaluated scenes. Our results demonstrate the potential of hybrid 2D-3D approaches and show how combining 2D foundation models with 3D reconstruction pipelines can overcome limitations inherent in either approach alone.