CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries

We introduce CASTELLA, a human-annotated audio benchmark for the task of audio moment retrieval (AMR). Although AMR has many potential applications, there is still no established benchmark built on real-world data. The earlier AMR study trained its model solely on synthetic datasets, and its evaluation was based on an annotated dataset of fewer than 100 samples, which made the reported performance less reliable. To ensure performance in real-world applications, we present CASTELLA, a large-scale manually annotated AMR dataset. CASTELLA consists of 1,009, 213, and 640 audio recordings for the training, validation, and test splits, respectively, which is 24 times larger than the previous dataset. We also establish a baseline model for AMR using CASTELLA. Our experiments demonstrate that a model fine-tuned on CASTELLA after pre-training on the synthetic data outperformed a model trained solely on the synthetic data by 10.4 points in Recall1@0.7. CASTELLA is publicly available at https://h-munakata.github.io/CASTELLA-demo/.
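
The abstract reports results in Recall1@0.7 without spelling out the metric. As a rough illustration only, the sketch below follows the convention common in moment retrieval: a query counts as a hit when its top-ranked predicted span overlaps the ground-truth span with a temporal IoU of at least 0.7. The function names, the one-span-per-query simplification, and the percentage scaling are assumptions made for this example, not details taken from the CASTELLA evaluation code.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) pair. This representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time spans on a 1-D timeline."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall1_at_iou(top1_preds: List[Span], gts: List[Span],
                   threshold: float = 0.7) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Hypothetical helper: assumes exactly one ground-truth span per query.
    """
    if len(top1_preds) != len(gts):
        raise ValueError("predictions and ground truths must align one-to-one")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (0.0, 10.0)]
    truths = [(0.0, 10.0), (5.0, 15.0)]
    print(recall1_at_iou(preds, truths))
```

Under this reading, a 10.4-point gain means that roughly 10 more queries out of every 100 have a top-ranked span that meets the 0.7 IoU threshold.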

