This paper studies the optimistic variant of Fictitious Play for learning in two-player zero-sum games. While it is known that Optimistic FTRL -- a regularized algorithm with a bounded stepsize parameter -- obtains constant regret in this setting, we show for the first time that similar, optimal rates are achievable without regularization: we prove that in two-strategy games, Optimistic Fictitious Play (under any tiebreaking rule) incurs only constant regret, providing surprising new evidence of the ability of non-no-regret algorithms to learn fast in games. Our proof technique leverages a geometric view of Optimistic Fictitious Play in the dual space of payoff vectors, where we show that a certain energy function of the iterates remains bounded over time. In addition, we prove a regret lower bound of $\Omega(\sqrt{T})$ for Alternating Fictitious Play. In the unregularized regime, this separates the power of optimism from that of alternation in achieving $o(\sqrt{T})$ regret.
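To make the dynamics concrete, below is a minimal Python sketch of Optimistic Fictitious Play on a $2 \times 2$ zero-sum game, tracking the row player's external regret. The function name, the matching-pennies payoff matrix, and the lowest-index tiebreak are illustrative assumptions (the paper's result holds for any tiebreaking rule); this is not the paper's code, only a sketch of the update rule it analyzes.

```python
import numpy as np

def optimistic_fictitious_play(A, T):
    """Sketch of Optimistic Fictitious Play on a two-player zero-sum game
    with payoff matrix A (row player maximizes x^T A y, column player
    minimizes it).

    Each round, every player best-responds to the opponent's cumulative
    payoff vector plus one extra copy of the most recent payoff (the
    optimistic prediction). Ties are broken toward the lowest index here.
    Returns the row player's external regret after T rounds.
    """
    n, m = A.shape
    u_row = np.zeros(n)      # cumulative payoffs A[:, y_s] to row pure strategies
    u_col = np.zeros(m)      # cumulative payoffs -A[x_s, :] to column pure strategies
    last_row = np.zeros(n)   # most recent payoff vector (the optimistic term)
    last_col = np.zeros(m)
    earned = 0.0             # row player's realized total payoff
    for _ in range(T):
        i = int(np.argmax(u_row + last_row))   # row player's optimistic best response
        j = int(np.argmax(u_col + last_col))   # column player's optimistic best response
        last_row = A[:, j]
        last_col = -A[i, :]
        u_row += last_row
        u_col += last_col
        earned += A[i, j]
    # external regret: best fixed pure strategy in hindsight minus realized payoff
    return np.max(u_row) - earned


# Example: matching pennies, a two-strategy game where the paper's result
# says the regret should stay bounded by a constant as T grows.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
for T in (100, 1_000, 10_000):
    print(T, optimistic_fictitious_play(A, T))
```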