Bridging the gap to real-world language-grounded visual concept learning - 专知论文

会员服务 ·

0

Learning · 可辨认的 · MoDELS · 大学 · 多样性 ·

Bridging the gap to real-world language-grounded visual concept learning

翻译：暂无翻译

Whie Jung,Semin Kim,Junee Kim,Seunghoon Hong

Human intelligence effortlessly interprets visual scenes along a rich spectrum of semantic dimensions. However, existing approaches to language-grounded visual concept learning are limited to a few predefined primitive axes, such as color and shape, and are typically explored in synthetic datasets. In this work, we propose a scalable framework that adaptively identifies image-related concept axes and grounds visual concepts along these axes in real-world scenes. Leveraging a pretrained vision-language model and our universal prompting strategy, our framework identifies a diverse image-related axes without any prior knowledge. Our universal concept encoder adaptively binds visual features to the discovered axes without introducing additional model parameters for each concept. To ground visual concepts along the discovered axes, we optimize a compositional anchoring objective, which ensures that each axis can be independently manipulated without affecting others. We demonstrate the effectiveness of our framework on subsets of ImageNet, CelebA-HQ, and AFHQ, showcasing superior editing capabilities across diverse real-world concepts that are too varied to be manually predefined. Our method also exhibits strong compositional generalization, outperforming existing visual concept learning and text-based editing methods. The code is available at https://github.com/whieya/Language-grounded-VCL.

翻译：暂无翻译

0

相关内容

Learning

【ACL2020】多模态信息抽取，365页ppt

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

开放知识图谱

28+阅读 · 2018年6月11日

基于位置注意力机制模型和带标签数据来提升槽填充（EMNLP outstanding paper）

基于位置注意力机制模型和带标签数据来提升槽填充（EMNLP outstanding paper）

科技创新与创业

17+阅读 · 2017年11月17日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

哈密尔顿系统及多体问题的周期解

国家自然科学基金

0+阅读 · 2014年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

Geometric multimodal representation learning

Arxiv

69+阅读 · 2022年9月7日

A psychological theory of explainability

Arxiv

16+阅读 · 2022年5月17日

Deep learning: a statistical viewpoint

Arxiv

18+阅读 · 2021年3月16日

Recent advances in deep learning theory

Recent advances in deep learning theory

Arxiv

50+阅读 · 2020年12月20日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

13+阅读 · 2019年2月19日

VIP会员

文章信息

相关主题

相关VIP内容

【ACL2020】多模态信息抽取，365页ppt

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

人机协同时代的军事指挥控制演进

《英国智库：瓦解俄罗斯防空系统生产，夺回制空权》最新报告

《通过仿真与开源数据提升战略决策：机遇与局限》最新报告

《战术突击工具包：军队的“边缘”操作系统》报告

相关资讯

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

开放知识图谱

28+阅读 · 2018年6月11日

基于位置注意力机制模型和带标签数据来提升槽填充（EMNLP outstanding paper）

基于位置注意力机制模型和带标签数据来提升槽填充（EMNLP outstanding paper）

科技创新与创业

17+阅读 · 2017年11月17日

相关论文

Geometric multimodal representation learning

Arxiv

69+阅读 · 2022年9月7日

A psychological theory of explainability

Arxiv

16+阅读 · 2022年5月17日

Deep learning: a statistical viewpoint

Arxiv

18+阅读 · 2021年3月16日

Recent advances in deep learning theory

Recent advances in deep learning theory

Arxiv

50+阅读 · 2020年12月20日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

13+阅读 · 2019年2月19日

相关基金

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

哈密尔顿系统及多体问题的周期解

国家自然科学基金

0+阅读 · 2014年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员