Large language models (LLMs) exhibit complementary strengths across domains and come with varying inference costs, motivating the design of multi-agent LLM systems in which specialized models collaborate efficiently. Existing approaches predominantly rely on decentralized frameworks, which invoke multiple LLMs for every input and thus incur substantial and uncontrolled inference costs. In this work, we introduce a centralized multi-LLM framework in which a controller LLM selectively coordinates a pool of expert models in a cost-efficient and cost-controllable manner. We formulate this coordination problem as reinforcement learning with dual objectives: maximizing task performance while minimizing the overall inference cost. In addition, we expect the multi-agent system to adapt its behavior to different budget conditions at inference time. To this end, we propose CoRL, a reinforcement learning framework that optimizes the performance-cost trade-off in a controllable multi-budget setting. Experiments on four diverse benchmarks demonstrate that CoRL enables a single system to surpass the best expert LLM under high-budget settings while maintaining strong performance in more economical low-budget modes, highlighting the effectiveness of centralized coordination for scalable and cost-efficient multi-agent LLM systems.
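As a rough sketch of the dual objective described above (the abstract does not state the exact reward, so the budget-conditioned weight \(\lambda_b\) is an assumption introduced here purely for illustration), one can picture a reward of the form
\[
R_b \;=\; \mathrm{performance} \;-\; \lambda_b \cdot \mathrm{cost},
\]
where \(\lambda_b\) increases as the allowed budget \(b\) decreases; conditioning the controller on \(b\) is what would let a single trained system behave economically under low budgets while spending more freely under high budgets.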