AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.
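To make the formalization concrete, the sketch below illustrates what a greedy search policy over candidate solutions might look like. It is a minimal, hypothetical example, not the paper's implementation: the `Candidate`, `draft`, and `greedy_search` names and the scoring logic are placeholders, whereas in the actual agents the operators are LLM-driven code edits and evaluation runs the candidate on competition validation data.

```python
import random
from dataclasses import dataclass

# Hypothetical stand-ins for the paper's components: in practice, operators
# are LLM-driven edits to an ML pipeline and 'score' comes from evaluating
# the candidate on held-out validation data (e.g., a Kaggle split).

@dataclass
class Candidate:
    solution: str   # e.g., source code of an ML pipeline
    score: float    # validation metric (higher is better)

def draft(parent: Candidate) -> Candidate:
    """Placeholder operator: produces a modified child solution."""
    new_solution = parent.solution + " <edit>"
    return Candidate(new_solution, parent.score + random.uniform(-0.05, 0.1))

def greedy_search(initial: Candidate, operators, budget: int) -> Candidate:
    """Greedy policy: always expand the best candidate found so far."""
    best = initial
    for _ in range(budget):
        op = random.choice(operators)
        child = op(best)
        if child.score > best.score:   # keep the child only if it improves
            best = child
    return best

if __name__ == "__main__":
    result = greedy_search(Candidate("baseline pipeline", 0.50), [draft], budget=20)
    print(f"best score after search: {result.score:.3f}")
```

The same loop generalizes to the other policies studied in the paper by changing which node is expanded: MCTS balances exploration and exploitation across a tree of candidates, while an evolutionary policy maintains and mutates a population rather than a single best solution.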