Multi-view image generation holds significant application value in computer vision, particularly in domains such as 3D reconstruction, virtual reality, and augmented reality. Most existing methods, which extend a single input image to novel views, face notable computational challenges in maintaining cross-view consistency and generating high-resolution outputs. To address these issues, we propose the Geometry-guided Multi-View Diffusion Model, which incorporates mechanisms for extracting multi-view geometric information and adjusting the intensity of geometric features, producing images that are both consistent across views and rich in detail. Specifically, we design a multi-view geometry information extraction module that leverages depth maps, normal maps, and foreground segmentation masks to construct a shared geometric structure, ensuring shape and structural consistency across views. To enhance consistency and detail restoration during generation, we develop a decoupled geometry-enhanced attention mechanism that strengthens feature focus on key geometric details, thereby improving overall image quality and detail preservation. Furthermore, we apply an adaptive learning strategy that fine-tunes the model to better capture spatial relationships and visual coherence between the generated views, ensuring realistic results. Our model also incorporates an iterative refinement process that progressively improves output quality through multiple stages of image generation. Finally, we propose a dynamic geometry information intensity adjustment mechanism that adaptively regulates the influence of geometric data, optimizing overall quality while preserving the naturalness of the generated images. More details can be found on the project page: https://sobeymil.github.io/GeoMVD.com.
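The core idea described above — fusing depth maps, normal maps, and segmentation masks into a shared geometric structure, then modulating its influence on image features with an adaptive intensity — can be sketched roughly as follows. All function names, the fusion scheme, and the single-channel projection are illustrative assumptions for exposition, not the paper's actual implementation:

```python
import numpy as np

def geometry_feature(depth, normal, mask):
    """Hypothetical fusion of per-view geometric cues into one tensor.

    depth:  H x W depth map
    normal: H x W x 3 normal map
    mask:   H x W foreground segmentation mask
    Returns an H x W x 4 shared geometric structure.
    """
    depth_f = (depth * mask)[..., None]           # masked depth, H x W x 1
    normal_f = normal * mask[..., None]           # masked normals, H x W x 3
    return np.concatenate([depth_f, normal_f], axis=-1)

def apply_geometry_guidance(image_feat, geo_feat, intensity):
    """Dynamic geometry-intensity adjustment (simplified).

    Scales a crude 1-channel projection of the geometric features
    before adding it as a residual to the image features; in the model,
    `intensity` would be predicted adaptively rather than fixed.
    """
    geo_proj = geo_feat.mean(axis=-1, keepdims=True)  # H x W x 1
    return image_feat + intensity * geo_proj

# Toy example: a flat surface facing the camera.
H, W = 4, 4
depth = np.ones((H, W))
normal = np.zeros((H, W, 3))
normal[..., 2] = 1.0
mask = np.ones((H, W))

geo = geometry_feature(depth, normal, mask)       # H x W x 4
feat = np.zeros((H, W, 1))
out = apply_geometry_guidance(feat, geo, intensity=0.5)
```

Setting `intensity` near zero recovers the unguided features, while larger values push generation harder toward the shared geometry; the dynamic adjustment mechanism trades off these regimes per sample.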