Efficient Audio Captioning Transformer with Patchout and Text Guidance

Thodoris Kouzelis, Grigoris Bastas, Athanasios Katsamanis, Alexandros Potamianos

From arXiv, 5 pages, 1 figure

Automated Audio Captioning (AAC) is a multi-modal translation task that aims to generate a textual description for a given audio clip. In this paper we propose a full Transformer architecture that utilizes Patchout, as proposed in [1], significantly reducing the computational complexity and avoiding overfitting. Caption generation is partly conditioned on textual AudioSet tags extracted by a pre-trained classification model, which is fine-tuned to maximize the semantic similarity between AudioSet labels and ground-truth captions. To mitigate the data scarcity problem in AAC, we introduce transfer learning from an upstream audio-related task and an enlarged in-domain dataset. Moreover, we propose a method for applying Mixup augmentation to AAC. Ablation studies are carried out to investigate how Patchout and text guidance contribute to the final performance. The results show that the proposed techniques improve the performance of our system while reducing its computational complexity. Our proposed method received the Judges Award in Task 6A of the DCASE Challenge 2022.
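To make the complexity claim concrete, here is a minimal sketch of the general idea behind Patchout: during training, a random subset of the input patch tokens is discarded before the Transformer sees them, so self-attention runs over a shorter sequence. This is an illustration only, not the authors' implementation. It assumes the unstructured variant (dropping individual patches at random), and the names `patchout`, `num_drop`, and `attention_pairs` are hypothetical; the abstract does not state which variant or drop rate is used.

```python
import random


def patchout(tokens, num_drop, rng=None, training=True):
    """Randomly discard `num_drop` patch tokens, keeping the rest in order.

    Assumption: unstructured Patchout on a flat list of patch tokens.
    At inference time (training=False) the full sequence is returned.
    """
    if not training or num_drop <= 0:
        return list(tokens)
    if num_drop >= len(tokens):
        raise ValueError("num_drop must be smaller than the number of tokens")
    rng = rng or random.Random()
    # Sample the indices to keep, then sort them so the original
    # ordering of the surviving patches is preserved.
    keep = sorted(rng.sample(range(len(tokens)), len(tokens) - num_drop))
    return [tokens[i] for i in keep]


def attention_pairs(seq_len):
    """Number of query-key pairs in full self-attention (quadratic in length)."""
    return seq_len * seq_len


if __name__ == "__main__":
    patches = list(range(100))  # stand-in for 100 spectrogram patch embeddings
    kept = patchout(patches, num_drop=40, rng=random.Random(0))
    print(len(kept), attention_pairs(len(patches)), attention_pairs(len(kept)))
```

Because full self-attention scales quadratically with sequence length, dropping 40 of 100 patches in this toy setting cuts the number of query-key pairs from 10,000 to 3,600. The randomness of the dropped subset also acts as a form of regularization, which is consistent with the abstract's claim about overfitting.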
