Although many recent works have investigated generalizable NeRF-based novel view synthesis for unseen scenes, they seldom consider synthetic-to-real generalization, which is desirable in many practical applications. In this work, we first investigate the effects of synthetic data in synthetic-to-real novel view synthesis and observe, surprisingly, that models trained with synthetic data tend to produce sharper but less accurate volume densities. Where the volume densities are correct, such models recover fine-grained details; where they are wrong, severe artifacts appear. To retain the advantages of synthetic data while avoiding its negative effects, we propose geometry-aware contrastive learning, which learns multi-view-consistent features under geometric constraints. We further adopt cross-view attention, which enhances the geometry awareness of features by querying features across input views. Experiments demonstrate that under the synthetic-to-real setting, our method renders images with higher quality and finer details, outperforming existing generalizable novel view synthesis methods in terms of PSNR, SSIM, and LPIPS. When trained on real data, our method also achieves state-of-the-art results.
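To make the contrastive component concrete, below is a minimal sketch of a geometry-aware contrastive (InfoNCE-style) loss, assuming per-pixel features have already been sampled at geometrically matched locations across two views. The function name, the matching procedure, and the temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def geometry_aware_contrastive_loss(feat_a, feat_b, temperature=0.07):
    """InfoNCE-style loss over per-pixel features from two views.

    feat_a, feat_b: (N, C) features sampled at pixel locations that are
    geometrically matched across the two views (i.e., they project to the
    same 3D surface points). Row i of feat_a and row i of feat_b form a
    positive pair; all other rows serve as negatives.
    """
    a = F.normalize(feat_a, dim=-1)
    b = F.normalize(feat_b, dim=-1)
    logits = a @ b.t() / temperature          # (N, N) cosine-similarity logits
    targets = torch.arange(a.shape[0], device=a.device)
    # Symmetric cross-entropy: pull matched pixels together, push the rest apart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

The geometric constraint lives in how the positive pairs are constructed: features are treated as positives only if they correspond to the same 3D point under the known camera geometry.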
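Likewise, the cross-view attention idea can be sketched as each view's features querying features gathered from the other input views. The module name, the use of standard multi-head attention, and the residual/LayerNorm layout below are assumptions for illustration, not the paper's exact architecture.

```python
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Minimal cross-view attention: features of one view attend to features
    sampled from the other input views at matched locations."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feat, other_view_feats):
        # query_feat:       (B, N, C)  features of the view being refined
        # other_view_feats: (B, M, C)  features gathered from the other views
        attended, _ = self.attn(query_feat, other_view_feats, other_view_feats)
        # Residual connection preserves the original per-view features.
        return self.norm(query_feat + attended)
```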