文本还不够: 将视觉压缩纳入 Open- domain 对话框生成 (Text is NOT Enough: Integrating Visual Impressions intoOpen-domain Dialogue Generation) - 专知论文

会员服务 ·

0

任务对话系统 · Integration · Better · 可理解性 · Processing（编程语言） ·

2021 年 9 月 13 日

Text is NOT Enough: Integrating Visual Impressions intoOpen-domain Dialogue Generation

翻译：文本还不够: 将视觉压缩纳入 Open- domain 对话框生成

Lei Shen,Haolan Zhan,Xin Shen,Yonghao Song,Xiaofang Zhao

from arxiv, Accepted by ACM MultiMedia 2021

Open-domain dialogue generation in natural language processing (NLP) is by default a pure-language task, which aims to satisfy human need for daily communication on open-ended topics by producing related and informative responses. In this paper, we point out that hidden images, named as visual impressions (VIs), can be explored from the text-only data to enhance dialogue understanding and help generate better responses. Besides, the semantic dependency between an dialogue post and its response is complicated, e.g., few word alignments and some topic transitions. Therefore, the visual impressions of them are not shared, and it is more reasonable to integrate the response visual impressions (RVIs) into the decoder, rather than the post visual impressions (PVIs). However, both the response and its RVIs are not given directly in the test process. To handle the above issues, we propose a framework to explicitly construct VIs based on pure-language dialogue datasets and utilize them for better dialogue understanding and generation. Specifically, we obtain a group of images (PVIs) for each post based on a pre-trained word-image mapping model. These PVIs are used in a co-attention encoder to get a post representation with both visual and textual information. Since the RVIs are not provided directly during testing, we design a cascade decoder that consists of two sub-decoders. The first sub-decoder predicts the content words in response, and applies the word-image mapping model to get those RVIs. Then, the second sub-decoder generates the response based on the post and RVIs. Experimental results on two open-domain dialogue datasets show that our proposed approach achieves superior performance over competitive baselines.

翻译：自然语言处理( NLP) 的开放式对话框生成默认是一种纯语言的任务, 目的是通过生成相关和内容丰富的回复, 满足人类对开放式专题日常沟通的需求。在本文中, 我们指出, 以视觉印象命名的隐藏图像( VIs ) 可以从纯文本数据中探索, 以加强对话理解和帮助产生更好的回应。此外, 对话框及其响应之间的语义依赖性非常复杂, 例如, 字词对齐和某些主题转换。因此, 它们的视觉印象并不共享, 将响应视觉印象( RVIs ) 纳入解码器而不是后视像图像( PVIs ) 。但是, 测试过程中不会直接给出被称为视觉印象的图像( VI) 。为了处理上述问题, 我们提议一个框架, 以纯语言对话框数据集为基础明确构建VIs, 并利用它们来改进对话理解和生成。具体地说, 我们为每个邮件获取了一系列的图像( PVIs) 组合, 以预培训的文字对高级图像( RCO ) 后, 在测试期间, 这些图像将生成的图像显示为两个版本。

0

相关内容

任务对话系统

任务对话系统

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

专知会员服务

7+阅读 · 2020年4月16日

【ACL2020-浙大-微软】多轮对话推理数据集，MuTual: A Dataset for Multi-Turn Dialogue Reasoning

【ACL2020-浙大-微软】多轮对话推理数据集，MuTual: A Dataset for Multi-Turn Dialogue Reasoning

专知会员服务

37+阅读 · 2020年4月10日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

【AAAI2020】用于视觉对话中深度视觉理解的自适应双向编码模型（DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue）, 中科院信工所于静等

【AAAI2020】用于视觉对话中深度视觉理解的自适应双向编码模型（DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue）, 中科院信工所于静等

专知会员服务

29+阅读 · 2019年11月23日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

已删除

将门创投

5+阅读 · 2020年3月2日

【论文】Awesome Relation Classification Paper（关系分类）（PART II）

【论文】Awesome Relation Classification Paper（关系分类）（PART II）

AINLP

15+阅读 · 2019年8月12日

【论文】Awesome Relation Classification Paper（关系分类）（PART I）

【论文】Awesome Relation Classification Paper（关系分类）（PART I）

AINLP

5+阅读 · 2019年8月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

论文浅尝 | Global Relation Embedding for Relation Extraction

论文浅尝 | Global Relation Embedding for Relation Extraction

开放知识图谱

12+阅读 · 2019年3月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Visual Spatio-temporal Relation-enhanced Network for Cross-modal Text-Video Retrieval

Arxiv

0+阅读 · 2021年10月29日

Transformation Driven Visual Reasoning

Arxiv

3+阅读 · 2020年11月26日

Few-shot Natural Language Generation for Task-Oriented Dialog

Few-shot Natural Language Generation for Task-Oriented Dialog

Arxiv

30+阅读 · 2020年2月27日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

Generating Question Relevant Captions to Aid Visual Question Answering

Generating Question Relevant Captions to Aid Visual Question Answering

Arxiv

5+阅读 · 2019年9月9日

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Arxiv

5+阅读 · 2018年12月26日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Multi-turn Dialogue Response Generation in an Adversarial Learning Framework

Arxiv

4+阅读 · 2018年6月11日

Crossing Generative Adversarial Networks for Cross-View Person Re-identification

Arxiv

10+阅读 · 2018年1月4日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

VIP会员

文章信息

相关主题

任务对话系统

Processing（编程语言）

相关VIP内容

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

专知会员服务

7+阅读 · 2020年4月16日

【ACL2020-浙大-微软】多轮对话推理数据集，MuTual: A Dataset for Multi-Turn Dialogue Reasoning

【ACL2020-浙大-微软】多轮对话推理数据集，MuTual: A Dataset for Multi-Turn Dialogue Reasoning

专知会员服务

37+阅读 · 2020年4月10日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

【AAAI2020】用于视觉对话中深度视觉理解的自适应双向编码模型（DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue）, 中科院信工所于静等

【AAAI2020】用于视觉对话中深度视觉理解的自适应双向编码模型（DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue）, 中科院信工所于静等

专知会员服务

29+阅读 · 2019年11月23日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《美空军条令出版物：战略打击》最新条令

《高能激光武器》22页slides

军事前沿模型

《面向小型无人机或无人飞行器的创新雷达探测与人工智能分类技术》263页

相关资讯

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

已删除

将门创投

5+阅读 · 2020年3月2日

【论文】Awesome Relation Classification Paper（关系分类）（PART II）

【论文】Awesome Relation Classification Paper（关系分类）（PART II）

AINLP

15+阅读 · 2019年8月12日

【论文】Awesome Relation Classification Paper（关系分类）（PART I）

【论文】Awesome Relation Classification Paper（关系分类）（PART I）

AINLP

5+阅读 · 2019年8月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

论文浅尝 | Global Relation Embedding for Relation Extraction

论文浅尝 | Global Relation Embedding for Relation Extraction

开放知识图谱

12+阅读 · 2019年3月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Visual Spatio-temporal Relation-enhanced Network for Cross-modal Text-Video Retrieval

Arxiv

0+阅读 · 2021年10月29日

Transformation Driven Visual Reasoning

Arxiv

3+阅读 · 2020年11月26日

Few-shot Natural Language Generation for Task-Oriented Dialog

Few-shot Natural Language Generation for Task-Oriented Dialog

Arxiv

30+阅读 · 2020年2月27日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

Generating Question Relevant Captions to Aid Visual Question Answering

Generating Question Relevant Captions to Aid Visual Question Answering

Arxiv

5+阅读 · 2019年9月9日

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Arxiv

5+阅读 · 2018年12月26日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Multi-turn Dialogue Response Generation in an Adversarial Learning Framework

Arxiv

4+阅读 · 2018年6月11日

Crossing Generative Adversarial Networks for Cross-View Person Re-identification

Arxiv

10+阅读 · 2018年1月4日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

微信扫码咨询专知VIP会员