Recent advances in dance generation have enabled the automatic synthesis of 3D dance motions. However, existing methods still struggle to achieve high realism, precise dance-music synchronization, diverse motion expression, and physical plausibility at the same time. To address these limitations, we propose a novel approach that uses a generative masked text-to-motion model as a distribution prior to learn a probabilistic mapping from diverse guidance signals, including music, genre, and pose, to high-quality dance motion sequences. Our framework also supports semantic motion editing, such as motion inpainting and body-part modification. Specifically, we introduce a multi-tower masked motion model that integrates a text-conditioned masked motion backbone with two parallel, modality-specific branches: a music-guidance tower and a pose-guidance tower. The model is trained with synchronized and progressive masked training, which infuses the pretrained text-to-motion prior into the dance synthesis process while letting each guidance branch optimize independently through its own loss function, mitigating gradient interference. During inference, we introduce classifier-free logits guidance and pose-guided token optimization to strengthen the influence of the music, genre, and pose signals. Extensive experiments demonstrate that our method sets a new state of the art in dance generation, significantly improving both quality and editability over existing approaches. The project page is available at https://foram-s1.github.io/DanceMosaic/
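The abstract names classifier-free logits guidance without giving its formula. As a rough illustration only, the sketch below applies the commonly used classifier-free guidance rule in logit space to a toy distribution over three motion tokens. The function name `cfg_logits`, the `scale` parameter, and the toy values are assumptions made for this example and are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np


def cfg_logits(cond_logits, uncond_logits, scale):
    """Generic classifier-free guidance in logit space (assumed form, not the paper's).

    scale = 1 returns the conditional logits unchanged; scale > 1 pushes the
    prediction further in the direction the condition favors.
    """
    return uncond_logits + scale * (cond_logits - uncond_logits)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


# Toy logits over three hypothetical motion tokens for one masked position.
cond = np.array([[2.0, 1.0, 0.0]])    # with music/genre/pose conditioning
uncond = np.array([[1.0, 1.0, 1.0]])  # with the condition dropped

guided = cfg_logits(cond, uncond, scale=3.0)
p_cond = softmax(cond)
p_guided = softmax(guided)
```

With these toy values, the guided distribution places more probability on the first token than the purely conditional one, which is the sense in which guidance strengthens the influence of the conditioning signal.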