Training a team of agents from scratch in multi-agent reinforcement learning (MARL) is highly inefficient, much like asking beginners to play a symphony together without first practicing solo. Existing methods, such as offline or transferable MARL, can ease this burden, but they still rely on costly multi-agent data, which often becomes the bottleneck. In contrast, solo experiences are far easier to obtain in many important scenarios, e.g., collaborative coding, household cooperation, and search-and-rescue. To unlock their potential, we propose Solo-to-Collaborative RL (SoCo), a framework that transfers solo knowledge into cooperative learning. SoCo first pretrains a shared solo policy from solo demonstrations, then adapts it for cooperation during multi-agent training through a policy fusion mechanism that combines an MoE-like gating selector and an action editor. Experiments across diverse cooperative tasks show that SoCo significantly boosts the training efficiency and performance of backbone algorithms. These results demonstrate that solo demonstrations provide a scalable and effective complement to multi-agent data, making cooperative learning more practical and broadly applicable.
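To make the fusion idea concrete, below is a minimal, hypothetical sketch of how a SoCo-style policy fusion module might be wired up. The abstract does not specify the architecture, so every name here (`PolicyFusion`, `coop_head`, `editor`, the frozen solo policy, the two-way soft gate) is an illustrative assumption, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PolicyFusion(nn.Module):
    """Hypothetical sketch: a pretrained solo policy combined with an
    MoE-like gating selector and an action editor, learned during MARL."""

    def __init__(self, solo_policy: nn.Module, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.solo_policy = solo_policy              # pretrained on solo demonstrations
        for p in self.solo_policy.parameters():     # assumption: solo knowledge stays frozen
            p.requires_grad_(False)
        # Gating selector: soft weights over {edited solo action, cooperative action}.
        self.gate = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))
        # Cooperative expert trained from scratch with the MARL backbone.
        self.coop_head = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))
        # Action editor: learns a correction applied on top of the solo action.
        self.editor = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        solo_act = self.solo_policy(obs)                                     # what the agent would do alone
        edited = solo_act + self.editor(torch.cat([obs, solo_act], dim=-1))  # solo action nudged toward cooperation
        coop = self.coop_head(obs)                                           # fully cooperative alternative
        w = torch.softmax(self.gate(obs), dim=-1)                            # MoE-like soft selection
        return w[..., :1] * edited + w[..., 1:] * coop
```

In this reading, the gate and editor are the only components updated by the multi-agent objective, so cooperative training only has to learn when to trust the solo skill and how to adjust it, rather than relearning behavior from scratch.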