We study distributionally robust Markov games (DR-MGs) with the average-reward criterion, a crucial framework for multi-agent decision-making under uncertainty over extended horizons. We first establish a connection between each agent's best-response policies and the optimal policies of the induced single-agent robust problem. Under a standard irreducibility assumption, we derive a correspondence between the optimal policies and the solutions of the robust Bellman equation, and use these results to establish the existence of a stationary Nash equilibrium (NE). We also study the more general weakly communicating setting: we construct a set-valued map whose values are convex subsets of the best-response policies and show that it is upper hemicontinuous, which implies the existence of an NE via a Kakutani-type fixed-point argument. We then introduce Robust Nash-Iteration and provide convergence guarantees. Finally, we connect average-reward NE to discounted robust equilibria, showing that the latter approximate the former as the discount factor approaches one. Our results provide a comprehensive theoretical and algorithmic foundation for decision-making in complex, uncertain, and long-running multi-player environments.
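For orientation, the robust Bellman equation referenced above is, for the single-agent robust average-reward problem induced by fixing the other players' policies, typically written in the following form. This is a sketch using standard notation (gain $g$, bias $h$, reward $r$, transition uncertainty set $\mathcal{P}_{s,a}$), which is an assumed convention and not taken verbatim from the paper.

```latex
% Sketch of the average-reward robust Bellman (optimality) equation for the
% induced single-agent problem; notation is assumed standard, not the paper's:
% g is the optimal robust gain, h the bias function, r the reward, and
% \mathcal{P}_{s,a} the uncertainty set of transition kernels at (s,a).
\begin{equation*}
  g + h(s)
  \;=\;
  \max_{a \in \mathcal{A}}
  \Bigl\{
    r(s,a)
    + \min_{p \in \mathcal{P}_{s,a}}
      \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, h(s')
  \Bigr\},
  \qquad s \in \mathcal{S}.
\end{equation*}
```

In this reading, a stationary policy attaining the maximum in every state is optimal for the induced problem, which is the sense in which solutions of this equation correspond to best responses.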