AI researchers have long used poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold'em, the multi-player dynamics of that game are subdued: most hands converge quickly, with only two players engaged through multiple rounds of betting. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar's Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level, as measured by win rate (winning over 50% of hands) and equity (money won), in both heads-up and multi-player Liar's Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized its play effectively, and was not easily exploitable by world-class human players.