Convergence of Fast Policy Iteration in Markov Games and Robust MDPs

Keith Badger, Jefferson Huang, Marek Petrik

Markov games and robust MDPs are closely related models that involve computing a pair of saddle point policies. As part of the long-standing effort to develop efficient algorithms for these models, the Filar-Tolwinski (FT) algorithm has shown considerable promise. As our first contribution, we demonstrate that FT may fail to converge to a saddle point and may loop indefinitely, even in small games. This observation contradicts the proof of FT's convergence to a saddle point in the original paper. As our second contribution, we propose Residual Conditioned Policy Iteration (RCPI). RCPI builds on FT, but is guaranteed to converge to a saddle point. Our numerical results show that RCPI outperforms other convergent algorithms by several orders of magnitude.
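To make the max-min (saddle point) structure concrete, here is a minimal sketch of plain robust value iteration on a tiny robust MDP. It is neither FT nor RCPI, whose mechanics the abstract does not spell out; it is only a standard convergent baseline. The function name `robust_value_iteration`, the finite (s,a)-rectangular uncertainty set given as a stack of candidate kernels, and the two-state example are all assumptions made for illustration.

```python
import numpy as np


def robust_value_iteration(rewards, kernels, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Plain robust value iteration (illustrative baseline, not FT or RCPI).

    rewards: array of shape (S, A).
    kernels: array of shape (K, S, A, S); K candidate transition kernels.
             Assumption: the adversary picks a kernel independently for
             every (state, action) pair, i.e. (s,a)-rectangularity.
    """
    values = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        # q[k, s, a] = r(s, a) + gamma * sum_s' P_k(s' | s, a) * v(s')
        q = rewards[None, :, :] + gamma * (kernels @ values)
        # Inner min over the adversary's kernels, outer max over actions.
        new_values = q.min(axis=0).max(axis=1)
        converged = np.max(np.abs(new_values - values)) < tol
        values = new_values
        if converged:
            break
    worst_case_q = (rewards[None, :, :] + gamma * (kernels @ values)).min(axis=0)
    return values, worst_case_q.argmax(axis=1)


# Hypothetical two-state example: state 1 is absorbing with zero reward.
# In state 0, action 0 pays 1.0 but one candidate kernel sends it to state 1;
# action 1 pays 0.5 and always stays in state 0.
rewards = np.array([[1.0, 0.5],
                    [0.0, 0.0]])
kernels = np.zeros((2, 2, 2, 2))
kernels[0, 0, 0] = [1.0, 0.0]   # kernel 0: action 0 stays in state 0
kernels[1, 0, 0] = [0.0, 1.0]   # kernel 1: action 0 falls into state 1
kernels[:, 0, 1] = [1.0, 0.0]   # action 1 stays in state 0 under both kernels
kernels[:, 1, :] = [0.0, 1.0]   # state 1 is absorbing

values, policy = robust_value_iteration(rewards, kernels)
print(values, policy)
```

In this example the robust policy prefers the safer action 1 in state 0, with worst-case value 0.5 / (1 - 0.9) = 5, even though action 0 would be better if only the first kernel were possible. Value iteration of this kind converges but can be slow, which is the kind of cost that policy-iteration-style methods such as FT and RCPI aim to reduce.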
