Automated detection and classification of marine mammal vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy for addressing this limitation, increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but relatively simple transformations, leaving open the question of whether deep generative models can provide additional benefits. In this study, we evaluate the potential of deep generative models, including Variational Autoencoders, Generative Adversarial Networks, and Denoising Diffusion Probabilistic Models, for data augmentation in marine mammal call detection. Using Southern Resident Killer Whale (Orcinus orca) vocalizations from two long-term hydrophone deployments in the Salish Sea, we compare these approaches against traditional augmentation methods such as time-shifting and vocalization masking. While all generative approaches improved classification performance relative to the baseline, diffusion-based augmentation yielded the highest recall (0.87) and overall F1-score (0.75). A hybrid strategy combining generative synthesis with traditional methods achieved the best overall performance, with an F1-score of 0.81. We hope this study encourages further exploration of deep generative models as complementary augmentation strategies to advance acoustic monitoring of threatened marine mammal populations.
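To make the traditional baselines named above concrete, the following is a minimal sketch of time-shifting and masking applied to a spectrogram. It is an illustration under stated assumptions, not the pipeline used in this study: the function names (`time_shift`, `mask_time`, `augment`), the circular shift, the zero fill value, and the parameter ranges are all hypothetical, and masking is assumed here to mean blanking a contiguous band of time frames.

```python
import numpy as np

def time_shift(spec, shift):
    """Circularly shift a (freq, time) spectrogram by `shift` frames along time.

    Assumption: wrap-around shifting; a real pipeline might pad instead.
    """
    return np.roll(spec, shift, axis=1)

def mask_time(spec, start, width, fill=0.0):
    """Return a copy with time frames [start, start + width) set to `fill`."""
    out = spec.copy()
    out[:, start:start + width] = fill
    return out

def augment(spec, rng, max_shift=10, max_width=5):
    """Apply a random time shift followed by a random time mask.

    `max_shift` and `max_width` are illustrative values, not from the study.
    """
    shift = int(rng.integers(-max_shift, max_shift + 1))
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, spec.shape[1] - width + 1))
    return mask_time(time_shift(spec, shift), start, width)

# Usage on a synthetic stand-in for a spectrogram (64 frequency bins, 100 frames).
rng = np.random.default_rng(0)
spec = rng.random((64, 100))
aug = augment(spec, rng)
```

Both transformations preserve the spectrogram shape, so augmented examples can be mixed directly with the originals in a training set.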