We introduce a unified framework to jointly model images, text, and human attention traces. Our work is built on top of the recent Localized Narratives annotation framework [30], where each word of a given caption is paired with a mouse trace segment. We propose two novel tasks: (1) predict a trace given an image and caption (i.e., visual grounding), and (2) predict a caption and a trace given only an image. Learning the grounding of each word is challenging, due to noise in the human-provided traces and the presence of words that cannot be meaningfully visually grounded. We present a novel model architecture that is jointly trained on dual tasks (controlled trace generation and controlled caption generation). To evaluate the quality of the generated traces, we propose a local bipartite matching (LBM) distance metric which allows the comparison of two traces of different lengths. Extensive experiments show our model is robust to imperfect training data and outperforms the baselines by a clear margin. Moreover, we demonstrate that our model pre-trained on the proposed tasks can also be beneficial to the downstream task of guided image captioning on COCO. Our code and project page are publicly available.
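To make the idea behind a local bipartite matching distance more concrete, below is a minimal sketch of one plausible way to compare two traces of different lengths: each trace is split into the same number of local segments, and segment centroids are matched under a temporal-locality constraint. This is an illustrative approximation under stated assumptions (traces as sequences of (x, y) points; the segment count, window size, and centroid representation are hypothetical choices), not the paper's exact LBM formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def lbm_distance_sketch(trace_a, trace_b, num_segments=10, window=2):
    """Rough stand-in for an LBM-style metric: split both traces into
    num_segments local chunks, then bipartite-match chunk centroids,
    only allowing matches within a local temporal window."""

    def segment_centroids(trace, k):
        # Split the (N, 2) point sequence into k roughly equal chunks
        # (assumes N >= k) and represent each chunk by its mean location.
        chunks = np.array_split(np.asarray(trace, dtype=float), k)
        return np.stack([c.mean(axis=0) for c in chunks])

    ca = segment_centroids(trace_a, num_segments)
    cb = segment_centroids(trace_b, num_segments)

    # Pairwise Euclidean costs between segment centroids.
    cost = np.linalg.norm(ca[:, None, :] - cb[None, :, :], axis=-1)

    # Forbid matches between temporally distant segments by adding a
    # large penalty outside the local window (the "local" in LBM).
    idx = np.arange(num_segments)
    too_far = np.abs(np.subtract.outer(idx, idx)) > window
    rows, cols = linear_sum_assignment(cost + 1e6 * too_far)
    return cost[rows, cols].mean()


# Usage: two synthetic traces of different lengths in [0, 1]^2.
rng = np.random.default_rng(0)
d = lbm_distance_sketch(rng.random((120, 2)), rng.random((85, 2)))
print(f"LBM-style distance: {d:.3f}")
```

The key design point this sketch tries to convey is that resampling both traces into a fixed number of local segments removes the length mismatch, while the windowed matching keeps the comparison temporally aligned rather than matching arbitrary parts of the two traces.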