3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle to accurately model in-the-wild scenes affected by transient objects and illumination changes, which leads to artifacts in the rendered images. We identify that the Gaussian densification process, while improving the capture of scene detail, unintentionally contributes to these artifacts by growing additional Gaussians that model transient disturbances and illumination variations. To address this, we propose RobustSplat++, a robust solution built on three key designs. First, we introduce a delayed Gaussian growth strategy that prioritizes optimizing the static scene structure before allowing Gaussian splitting/cloning, which mitigates overfitting to transient objects in early optimization. Second, we design a scale-cascaded mask bootstrapping approach that first uses lower-resolution feature-similarity supervision for reliable initial transient mask estimation, taking advantage of its stronger semantic consistency and robustness to noise, and then progresses to high-resolution supervision for more precise mask prediction. Third, we combine the delayed Gaussian growth strategy and mask bootstrapping with appearance modeling to handle in-the-wild scenes that contain both transient objects and illumination variations. Extensive experiments on multiple challenging datasets show that our method outperforms existing methods, demonstrating its robustness and effectiveness.
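The first two designs described above are, in effect, scheduling decisions over the optimization. As a minimal sketch only, the snippet below shows one way such a schedule could be expressed. The `Schedule` class, its field names, and every numeric value are hypothetical placeholders chosen for illustration; they are not taken from RobustSplat++ and the actual method may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    # All values are hypothetical placeholders, not the paper's settings.
    densify_start: int = 10_000       # first iteration at which split/clone is allowed
    densify_interval: int = 100       # densify every N iterations once enabled
    mask_highres_start: int = 10_000  # iteration at which mask supervision goes high-res
    low_res: int = 224                # assumed low feature-map resolution
    high_res: int = 504               # assumed high feature-map resolution


def densification_allowed(it: int, s: Schedule) -> bool:
    """Delayed growth: block splitting/cloning until densify_start."""
    if it < s.densify_start:
        return False
    return (it - s.densify_start) % s.densify_interval == 0


def mask_supervision_resolution(it: int, s: Schedule) -> int:
    """Scale cascade: low-resolution supervision first, high-resolution later."""
    return s.low_res if it < s.mask_highres_start else s.high_res
```

A training loop would query both functions at each iteration: early iterations optimize only the initial Gaussians under low-resolution mask supervision, and later iterations enable growth and switch the mask supervision to the higher resolution.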