遍历盲随机博弈中的一致值与可判定性 (Uniform Value and Decidability in Ergodic Blind Stochastic Games)

We study a class of two-player zero-sum stochastic games known as \textit{blind stochastic games}, where players neither observe the state nor receive any information about it during the game. A central concept for analyzing long-duration stochastic games is the \textit{uniform value}. A game has a uniform value $v$ if for every $\varepsilon>0$, Player 1 (resp., Player 2) has a strategy such that, for all sufficiently large $n$, his average payoff over $n$ stages is at least $v-\varepsilon$ (resp., at most $v+\varepsilon$). Prior work has shown that the uniform value may not exist in general blind stochastic games. To address this, we introduce a subclass called \textit{ergodic blind stochastic games}, defined by imposing an ergodicity condition on the state transitions. For this subclass, we prove the existence of the uniform value and provide an algorithm to approximate it, establishing the \textit{decidability} of the approximation problem. Notably, this decidability result is novel even in the single-player setting of Partially Observable Markov Decision Processes (POMDPs). Furthermore, we show that no algorithm can compute the uniform value exactly, emphasizing the tightness of our result. Finally, we establish that the uniform value is independent of the initial belief.

翻译：我们研究一类称为\\textit{盲随机博弈}的两人零和随机博弈，其中玩家在博弈过程中既无法观察状态，也无法获得任何关于状态的信息。分析长期随机博弈的一个核心概念是\\textit{一致值}。若博弈存在一致值$v$，则对于任意$\\varepsilon>0$，玩家1（相应地，玩家2）存在一种策略，使得对于所有足够大的$n$，其在$n$个阶段内的平均收益至少为$v-\\varepsilon$（相应地，至多为$v+\\varepsilon$）。先前研究表明，一般盲随机博弈中一致值可能不存在。为解决此问题，我们引入一个称为\\textit{遍历盲随机博弈}的子类，其定义为对状态转移施加遍历性条件。针对该子类，我们证明了一致值的存在性，并提供了近似计算该值的算法，从而确立了近似问题的\\textit{可判定性}。值得注意的是，即使在单玩家部分可观测马尔可夫决策过程（POMDPs）的设定下，该可判定性结果亦属首次提出。此外，我们证明不存在能够精确计算一致值的算法，这凸显了我们结果的紧致性。最后，我们确立了一致值与初始信念无关的性质。