Temporal envelope morphing, the process of interpolating between the amplitude dynamics of two audio signals, is an emerging problem in generative audio systems, and one that lacks sufficient perceptual grounding. Morphing temporal envelopes in a perceptually intuitive manner should enable new methods for sound blending in creative media and for probing perceptual organization in psychoacoustics. However, existing audio morphing techniques often fail to produce intermediate temporal envelopes when the input sounds have distinct temporal structures; many morphers effectively overlay both temporal structures, leading to perceptually unnatural results. In this paper, we introduce a novel workflow for learning envelope morphing with perceptual guidance: we first derive perceptually grounded morphing principles through human listening studies, then synthesize large-scale datasets encoding these principles, and finally train machine learning models to create perceptually intermediate morphs. Specifically, we present: (1) perceptual principles that guide envelope morphing, derived from our listening studies, (2) a supervised framework to learn these principles, (3) an autoencoder that learns to compress temporal envelope structures into latent representations, and (4) benchmarks for evaluating audio envelope morphs, using both synthetic and naturalistic data. We show that our approach outperforms existing methods in producing temporally intermediate morphs. All code, models, and checkpoints are available at https://github.com/TemporalMorphing/EnvelopeMorphing.
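The overlay failure described above can be illustrated with a toy example. The following is a minimal sketch, not the method proposed in this paper: the Gaussian envelopes, the function names (`gaussian_envelope`, `crossfade`, `count_peaks`), and the peak-counting threshold are all assumptions chosen for illustration. It shows that a plain linear crossfade of two envelopes whose peaks occur at different times keeps both peaks, whereas one hypothetical notion of a temporally intermediate envelope has a single peak at an intermediate time. The perceptual principles derived from the listening studies are not specified here and may differ from this simple time-shift picture.

```python
import numpy as np

def gaussian_envelope(t, center, width=0.05):
    """Toy amplitude envelope: a single Gaussian bump (illustrative assumption)."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def crossfade(env_a, env_b, alpha):
    """Naive morph: pointwise linear interpolation of the two envelopes."""
    return (1.0 - alpha) * env_a + alpha * env_b

def count_peaks(env, threshold=0.1):
    """Count strict local maxima above a threshold (hypothetical helper)."""
    interior = env[1:-1]
    is_peak = (interior > env[:-2]) & (interior > env[2:]) & (interior > threshold)
    return int(np.sum(is_peak))

t = np.linspace(0.0, 1.0, 1001)
env_a = gaussian_envelope(t, center=0.2)  # early onset
env_b = gaussian_envelope(t, center=0.8)  # late onset

# The crossfade superimposes both temporal structures.
mix = crossfade(env_a, env_b, alpha=0.5)

# One possible "intermediate" envelope: a single bump at the midpoint in time.
shifted = gaussian_envelope(t, center=0.5)

print("peaks in crossfade:", count_peaks(mix))
print("peaks in time-shifted envelope:", count_peaks(shifted))
```

At `alpha=0.5` the crossfade retains two half-height bumps at the original onset times and is nearly silent at the midpoint, which is the overlay behavior the abstract attributes to many existing morphers.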