This paper introduces The Spheres dataset, a collection of multitrack orchestral recordings designed to advance machine learning research in music source separation and related music information retrieval (MIR) tasks within the classical music domain. The dataset comprises over one hour of recordings performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works, Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40, along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, which enables the creation of realistic stereo mixes with controlled bleeding and provides isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering an acoustic characterization of the recording space. We present the dataset structure, an acoustic analysis, and baseline evaluations using X-UMX-based models for orchestral family separation and microphone debleeding. The results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.