Multi-view image generation holds significant application value in computer vision, particularly in domains such as 3D reconstruction, virtual reality, and augmented reality. Most existing methods extrapolate from a single input image and face notable computational challenges in maintaining cross-view consistency and generating high-resolution outputs. To address these issues, we propose the Geometry-guided Multi-View Diffusion Model, which incorporates mechanisms for extracting multi-view geometric information and adjusting the intensity of geometric features, producing images that are both consistent across views and rich in detail. Specifically, we design a multi-view geometry information extraction module that leverages depth maps, normal maps, and foreground segmentation masks to construct a shared geometric structure, ensuring shape and structural consistency across views. To enhance consistency and detail restoration during generation, we develop a decoupled geometry-enhanced attention mechanism that strengthens feature focus on key geometric details, improving overall image quality and detail preservation. Furthermore, we apply an adaptive learning strategy that fine-tunes the model to better capture spatial relationships and visual coherence between the generated views, ensuring realistic results. Our model also incorporates an iterative refinement process that progressively improves output quality through multiple stages of image generation. Finally, we propose a dynamic geometry information intensity adjustment mechanism that adaptively regulates the influence of geometric data, optimizing overall quality while preserving the naturalness of the generated images. More details can be found on the project page: https://sobeymil.github.io/GeoMVD.com.