Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Élie, Olivier Pietquin, Matthieu Geist

Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One factor limiting further scaling with RL is that existing algorithms for solving MFGs require mixing approximated quantities such as strategies or $q$-values. This is far from trivial with non-linear function approximators that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first learns a mixed strategy by distilling historical data into a neural network and is applied to the Fictitious Play algorithm. The second is an online mixing method based on regularization that requires memorizing neither historical data nor previous estimates; it is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform state-of-the-art (SotA) baselines from the literature.
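
To make the "mixing" step concrete, here is a minimal tabular sketch, not the authors' deep RL method. It assumes a hypothetical one-step crowd-aversion MFG with three actions, where action a pays BASE[a] minus the fraction mu[a] of the population choosing it; BASE, lr=0.5, the iteration counts and all function names are illustrative assumptions. In a stateless game the mean field is just the population's action distribution and q(a, mu) = reward(a, mu). With tables, Fictitious Play mixes by averaging past best responses, and Online Mirror Descent mixes by summing q-values before a softmax; each is one line.

```python
import math

# Hypothetical toy MFG (assumption, for illustration only): three actions,
# payoff of action a is BASE[a] minus the share mu[a] of agents picking it.
BASE = [1.0, 0.5, 0.0]


def reward(mu):
    # Stateless game, so q(a, mu) equals the immediate reward.
    return [b - m for b, m in zip(BASE, mu)]


def exploitability(pi):
    # Gain from deviating to a best response while the population plays pi.
    r = reward(pi)
    return max(r) - sum(p * q for p, q in zip(pi, r))


def fictitious_play(iterations):
    n = len(BASE)
    avg = [1.0 / n] * n  # average policy, initialised uniform
    for k in range(1, iterations + 1):
        r = reward(avg)  # mean field induced by the current average policy
        best = max(range(n), key=lambda a: r[a])
        br = [1.0 if a == best else 0.0 for a in range(n)]
        # Mixing step: running uniform average of all best responses so far.
        avg = [(k * p + b) / (k + 1) for p, b in zip(avg, br)]
    return avg


def softmax(y):
    top = max(y)  # shift by the max for numerical stability
    e = [math.exp(v - top) for v in y]
    s = sum(e)
    return [v / s for v in e]


def online_mirror_descent(iterations, lr=0.5):
    n = len(BASE)
    y = [0.0] * n  # running sum of scaled q-values
    pi = softmax(y)
    for _ in range(iterations):
        q = reward(pi)  # q-values against the current mean field mu = pi
        # Mixing step: accumulate q-values, then take a softmax.
        y = [s + lr * v for s, v in zip(y, q)]
        pi = softmax(y)
    return pi


if __name__ == "__main__":
    for name, pi in [("FP", fictitious_play(1000)),
                     ("OMD", online_mirror_descent(1000))]:
        print(name, [round(p, 3) for p in pi], exploitability(pi))
```

For this toy game the equilibrium is (0.75, 0.25, 0): actions 0 and 1 both pay 0.25 and action 2 pays 0, and both loops should approach it. With neural networks neither mixing line has a direct analogue, since averaging the weights of two policy networks does not in general produce the average of their policies; that is the gap the distillation and regularization approaches described above are meant to close.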
