Artificial Intelligence (AI) is beginning to transform the research process by automating the discovery of new solutions. This shift depends on the availability of reliable verifiers, which AI-driven approaches require to validate candidate solutions. Research focused on improving systems performance is especially well-suited to this paradigm because system performance problems naturally admit such verifiers: candidates can be implemented in real systems or simulators and evaluated against predefined workloads. We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems (ADRS). Using several open-source ADRS instances (i.e., OpenEvolve, GEPA, and ShinkaEvolve), we demonstrate across ten case studies (e.g., multi-region cloud scheduling, mixture-of-experts load balancing, LLM-based SQL, transaction scheduling) that ADRS-generated solutions can match or even outperform human state-of-the-art designs. Based on these findings, we outline best practices (e.g., level of prompt specification, amount of feedback, robust evaluation) for effectively using ADRS, and we discuss future research directions and their implications. Although we do not yet have a universal recipe for applying ADRS across all of systems research, we hope our preliminary findings, together with the challenges we identify, offer meaningful guidance for future work as researcher effort shifts increasingly toward problem formulation and strategic oversight. Note: This paper is an extension of our prior work [14]. It adds extensive evaluation across multiple ADRS frameworks and provides deeper analysis and insights into best practices.