Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as a case where such efforts failed to produce performance at the level of top humans. This work establishes a step change in both performance and cost for Stratego, showing that it is now possible not only to reach the level of top humans, but to achieve a vastly superhuman level of play -- and that doing so requires not an industrial budget, but merely a few thousand dollars. We achieved this result by developing general approaches for self-play reinforcement learning and test-time search under imperfect information.