In an infinitely repeated pricing game, pricing algorithms based on artificial intelligence (Q-learning) may consistently learn to charge supra-competitive prices even without communication. Although concerns about algorithmic collusion have arisen, little is known about its underlying factors. In this work, we experimentally analyze the dynamics of Q-learning algorithms under three variants of experience replay. Algorithmic collusion still has roots in human preferences: randomizing experience yields prices close to the static Bertrand equilibrium, whereas supra-competitive prices are easily restored by favoring the latest experience. Moreover, relative performance concerns also stabilize the collusion. Finally, we investigate scenarios with heterogeneous agents and test the robustness of our findings against various factors.
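To make the contrast between replay schemes concrete, below is a minimal Python sketch of a tabular Q-learning pricing agent whose state is the rival's last posted price. The class name, hyperparameters, and the two replay modes shown (uniform random sampling versus replaying only the most recent transitions) are illustrative assumptions, not the paper's exact specification, and only two of the three replay variants are sketched.

import random
from collections import deque

import numpy as np

class ReplayQLearner:
    """Tabular Q-learning pricing agent; the rival's last price is the state."""

    def __init__(self, n_prices, alpha=0.15, gamma=0.95, eps=0.1,
                 buffer_size=1000, mode="uniform"):
        self.q = np.zeros((n_prices, n_prices))   # Q[state, own price]
        self.buffer = deque(maxlen=buffer_size)   # stores (s, a, r, s') tuples
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.mode = mode                          # "uniform" or "recent"
        self.n_prices = n_prices

    def act(self, state):
        # Epsilon-greedy price choice over the discrete price grid.
        if random.random() < self.eps:
            return random.randrange(self.n_prices)
        return int(np.argmax(self.q[state]))

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def replay(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        if self.mode == "uniform":
            # Randomized experience: sample past transitions uniformly.
            batch = random.sample(list(self.buffer), batch_size)
        else:
            # Favor the latest experience: replay only the newest transitions.
            batch = list(self.buffer)[-batch_size:]
        for s, a, r, s2 in batch:
            target = r + self.gamma * np.max(self.q[s2])
            self.q[s, a] += self.alpha * (target - self.q[s, a])

In this framing, the abstract's finding corresponds to mode="uniform" driving prices toward the static Bertrand equilibrium, while mode="recent" lets supra-competitive prices re-emerge; the rewards would come from a standard duopoly demand and profit function, which is omitted here.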