CLIP2TV: Align, Match and Distill for Video-Text Retrieval

July 21, 2022

Zijian Gao, Jingyu Liu, Weiqi Sun, Sheng Chen, Dedan Chang, Lili Zhao

Modern video-text retrieval frameworks consist of three parts: a video encoder, a text encoder, and a similarity head. With the success of both visual and textual representation learning, transformer-based encoders and fusion methods have also been adopted in video-text retrieval. In this report, we present CLIP2TV, which aims to explore where the critical elements lie in transformer-based methods. To achieve this, we first revisit some recent works on multi-modal learning, then introduce some of their techniques into video-text retrieval, and finally evaluate them through extensive experiments in different configurations. Notably, CLIP2TV achieves 52.9 R@1 on the MSR-VTT dataset, outperforming the previous SOTA result by 4.1%.
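To make the reported metric concrete, the following is a minimal sketch of how a similarity head and R@1 might be computed once the two encoders have produced embeddings. It is not CLIP2TV's implementation: the abstract does not specify the similarity head, so plain cosine similarity is assumed here, the function names are hypothetical, the toy embeddings are made up, and the i-th text is assumed to be paired with the i-th video.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_matrix(text_embs, video_embs):
    """Assumed similarity head: cosine similarity between every text and video."""
    texts = [l2_normalize(t) for t in text_embs]
    videos = [l2_normalize(v) for v in video_embs]
    return [[sum(a * b for a, b in zip(t, v)) for v in videos] for t in texts]


def recall_at_k(sim, k=1):
    """Percentage of text queries whose paired video ranks in the top k.

    Assumes row i of `sim` is the query whose ground-truth video is column i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank video indices from most to least similar to this query.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy embeddings standing in for encoder outputs (illustrative values only).
text_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_embs = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]

sim = similarity_matrix(text_embs, video_embs)
r1 = recall_at_k(sim, k=1)
r3 = recall_at_k(sim, k=3)
print(f"R@1 = {r1:.1f}, R@3 = {r3:.1f}")
```

Under this reading, an R@1 of 52.9 means that for 52.9% of the text queries the correct video is ranked first among all candidates.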
