Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Each agent specializes in reasoning, experimentation, and analysis, collaborating through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning process. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.