The rapid growth and large market of micro-videos open new e-commerce channels for merchants. A growing number of micro-video publishers embed relevant ads in their micro-videos, which both earns them business income and helps audiences discover products of interest. However, micro-videos are recorded with unprofessional equipment, cover various topics, and include multiple modalities, so it is challenging to locate the products related to a micro-video efficiently, appropriately, and accurately. We formulate the microvideo-product retrieval task, which is the first attempt to explore retrieval between multi-modal and multi-modal instances. We propose a novel approach for bidirectional retrieval, the Multi-Queue Momentum Contrast (MQMC) network, which consists of uni-modal feature learning and multi-modal instance representation learning. Moreover, a discriminative selection strategy with multiple queues distinguishes the importance of different negatives based on their categories. We collect two large-scale microvideo-product datasets (MVS and MVS-large) for evaluation and manually construct a hierarchical category ontology that covers a wide range of everyday products. Extensive experiments show that MQMC outperforms the state-of-the-art baselines. Our replication package (including code, dataset, etc.) is publicly available at https://github.com/duyali2000/MQMC.
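To make the idea of category-aware negatives with momentum contrast concrete, the following is a minimal NumPy sketch and not the MQMC implementation from the replication package. It assumes one FIFO queue of key embeddings per category, a MoCo-style exponential moving average for the key encoder, and an InfoNCE loss in which negatives from the query's own category receive a larger weight. The names `CategoryQueues`, `weighted_info_nce`, and `same_cat_weight`, and the weighting rule itself, are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np


def momentum_update(key_params, query_params, m=0.999):
    """MoCo-style EMA update: key <- m * key + (1 - m) * query."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class CategoryQueues:
    """One fixed-size FIFO queue of key embeddings per category (assumed layout)."""

    def __init__(self, num_categories, dim, size):
        self.queues = np.zeros((num_categories, size, dim))
        self.filled = np.zeros(num_categories, dtype=int)
        self.ptr = np.zeros(num_categories, dtype=int)
        self.size = size

    def enqueue(self, key, category):
        # Overwrite the oldest slot of this category's queue.
        self.queues[category, self.ptr[category]] = key
        self.ptr[category] = (self.ptr[category] + 1) % self.size
        self.filled[category] = min(self.filled[category] + 1, self.size)

    def negatives(self):
        """Return all stored keys together with their category labels."""
        keys, cats = [], []
        for c in range(len(self.queues)):
            n = self.filled[c]
            keys.append(self.queues[c, :n])
            cats.append(np.full(n, c))
        return np.concatenate(keys), np.concatenate(cats)


def weighted_info_nce(query, positive, neg_keys, neg_cats, query_cat,
                      same_cat_weight=2.0, temperature=0.07):
    """InfoNCE loss with per-negative weights.

    Assumption: negatives sharing the query's category are treated as harder
    and weighted by `same_cat_weight`; all other negatives get weight 1.
    """
    pos_logit = query @ positive / temperature
    neg_logits = neg_keys @ query / temperature
    weights = np.where(neg_cats == query_cat, same_cat_weight, 1.0)
    # Subtract the largest logit before exponentiating for numerical stability.
    max_logit = max(pos_logit, neg_logits.max()) if len(neg_logits) else pos_logit
    denom = np.exp(pos_logit - max_logit) + np.sum(
        weights * np.exp(neg_logits - max_logit))
    return -(pos_logit - max_logit - np.log(denom))
```

In a bidirectional setting, the same loss would be computed once with micro-video embeddings as queries and product embeddings as keys, and once with the roles swapped. Embeddings are assumed to be L2-normalized before they are enqueued.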