
3D scenes · multi-view · multi-view images · synthesis · scene understanding

Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images


JiaKui Hu, Shanshan Zhao, Qing-Guo Chen, Xuerui Qiu, Jialun Liu, Zhao Xu, Weihua Luo, Kaifu Zhang, Yanye Lu

From arXiv, under review

This paper presents Omni-View, which extends unified multimodal understanding and generation to 3D scenes based on multiview images, exploring the principle that "generation facilitates understanding". Consisting of an understanding model, a texture module, and a geometry module, Omni-View jointly models scene understanding, novel view synthesis, and geometry estimation, enabling synergistic interaction between 3D scene understanding and generation tasks. By design, it leverages the spatiotemporal modeling capabilities of the texture module, which is responsible for appearance synthesis, alongside the explicit geometric constraints provided by the dedicated geometry module, thereby enriching the model's holistic understanding of 3D scenes. Trained with a two-stage strategy, Omni-View achieves a state-of-the-art score of 55.4 on VSI-Bench, outperforming existing specialized 3D understanding models, while also delivering strong performance in both novel view synthesis and 3D scene generation.

