Mixture of Routers (MoR)

Supervised fine-tuning (SFT) is a key step in aligning large language models with human instructions and adapting them to downstream tasks. Among SFT techniques, Low-Rank Adaptation (LoRA) has gained widespread attention for its parameter efficiency. However, the performance gains it delivers on large models remain limited. Recent studies suggest that combining LoRA with Mixture-of-Experts (MoE) can significantly enhance fine-tuning performance. MoE adapts to the diversity and complexity of datasets by dynamically selecting the most suitable experts for each input, thereby improving task accuracy and efficiency. Despite these impressive results, recent studies also reveal issues in the MoE routing mechanism, such as incorrect assignments and imbalanced expert allocation. Inspired by the principles of redundancy and fault tolerance theory, we integrate the concept of Mixture of Experts into the routing mechanism itself and propose an efficient fine-tuning method called Mixture of Routers (MoR). MoR employs multiple sub-routers for joint expert selection and uses a learnable main router to determine the weights of the sub-routers. The results show that MoR outperforms baseline models on most tasks, achieving an average performance improvement of 1%. MoR can serve as a plug-and-play, parameter-efficient fine-tuning method suitable for a wide range of applications. Our code is available here: https://anonymous.4open.science/r/MoR-DFC6.
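To make the routing idea concrete, the following is a minimal NumPy sketch of how several sub-routers and a main router could jointly select experts. It is an illustration under stated assumptions, not the released implementation: the abstract does not specify the router architecture, so the linear routers, the softmax normalization, the weighted-sum combination rule, the top-k selection, and every name here (`mor_route`, `sub_routers`, `main_router`, `top_k`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mor_route(h, sub_routers, main_router, top_k=2):
    """Hypothetical MoR-style routing for a batch of token representations.

    h:           (T, d)    token hidden states
    sub_routers: (R, d, E) one linear router per sub-router (assumed form)
    main_router: (d, R)    linear main router that weights the sub-routers
    Returns the selected expert indices (T, top_k), their renormalized
    gates (T, top_k), and the joint expert distribution (T, E).
    """
    # Each sub-router proposes its own distribution over the E experts.
    sub_scores = softmax(np.einsum("td,rde->tre", h, sub_routers), axis=-1)
    # The main router assigns a per-token weight to each sub-router.
    sub_weights = softmax(h @ main_router, axis=-1)
    # Joint selection: weighted average of the sub-router distributions
    # (an assumed combination rule).
    joint = np.einsum("tr,tre->te", sub_weights, sub_scores)
    # Keep the top_k experts per token and renormalize their gates.
    expert_idx = np.argsort(-joint, axis=-1)[:, :top_k]
    top = np.take_along_axis(joint, expert_idx, axis=-1)
    gates = top / top.sum(axis=-1, keepdims=True)
    return expert_idx, gates, joint


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(4, 8))               # 4 tokens, hidden size 8
    sub_routers = rng.normal(size=(3, 8, 5))  # 3 sub-routers, 5 experts
    main_router = rng.normal(size=(8, 3))
    idx, gates, _ = mor_route(h, sub_routers, main_router)
    print(idx.shape, gates.sum(axis=-1))
```

In this sketch, a single sub-router that misjudges a token is outvoted when the main router places more weight on the others, which is one way to read the redundancy and fault tolerance motivation. In a LoRA-MoE layer, the returned gates would then scale the outputs of the selected LoRA experts.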
