Counterfactual regret minimization (CFR) is a family of algorithms for solving imperfect-information games. To scale CFR to large games, researchers use neural networks to approximate its behavior. However, existing methods build mainly on vanilla CFR and struggle to incorporate more advanced CFR variants. In this work, we propose an efficient model-free neural CFR algorithm that overcomes the limitations of existing methods in approximating advanced CFR variants. At each iteration, the algorithm collects variance-reduced sampled advantages using a value network, fits cumulative advantages by bootstrapping, and applies discounting and clipping operations to simulate the update mechanisms of advanced CFR variants. Experimental results show that, compared with other model-free neural algorithms, it converges faster in typical imperfect-information games and achieves stronger adversarial performance in a large poker game.
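To make the discounting and clipping operations concrete, the following is a minimal tabular sketch, not the neural algorithm proposed in this work. It assumes the discounting follows the style of discounted CFR (positive and negative cumulative values scaled by t^alpha / (t^alpha + 1) and t^beta / (t^beta + 1), respectively) and that clipping follows the style of CFR+ (cumulative values floored at zero). The function names, the default values of `alpha` and `beta`, and the iteration indexing are illustrative assumptions rather than details taken from the abstract.

```python
def regret_matching(cum_adv):
    """Turn cumulative advantages into a policy via regret matching."""
    positive = [max(a, 0.0) for a in cum_adv]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive cumulative advantage: play uniformly.
        return [1.0 / len(cum_adv)] * len(cum_adv)
    return [p / total for p in positive]


def update_cumulative(cum_adv, inst_adv, t, alpha=1.5, beta=0.0, clip=False):
    """One tabular update of the cumulative advantages at iteration t (1-based).

    Assumed discounting (discounted-CFR style): positive entries are scaled by
    t^alpha / (t^alpha + 1), non-positive entries by t^beta / (t^beta + 1).
    Assumed clipping (CFR+ style): the result is floored at zero.
    """
    pos_w = (t ** alpha) / (t ** alpha + 1.0)
    neg_w = (t ** beta) / (t ** beta + 1.0)
    updated = []
    for c, a in zip(cum_adv, inst_adv):
        new_value = c * (pos_w if c > 0 else neg_w) + a
        if clip:
            new_value = max(new_value, 0.0)
        updated.append(new_value)
    return updated


# Usage: discount, then clip, then derive the next policy.
cum = update_cumulative([2.0, -2.0], [0.0, 0.0], t=1, clip=True)
policy = regret_matching(cum)
```

In a neural variant, the table `cum_adv` would be replaced by a network fitted to bootstrapped targets; the sketch only shows the shape of the update that such targets would imitate.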