Primal-dual regression approach for Markov decision processes with general state and action space

October 1, 2022

Denis Belomestny, John Schoenmakers

We develop a regression-based primal-dual martingale approach for solving finite time horizon MDPs with general state and action space. As a result, our method allows for the construction of tight upper and lower biased approximations of the value functions, and provides tight approximations to the optimal policy. In particular, we prove tight error bounds for the estimated duality gap featuring polynomial dependence on the time horizon, and sublinear dependence on the cardinality/dimension of the possibly infinite state and action space. From a computational point of view the proposed method is efficient since, in contrast to usual duality-based methods for optimal control problems in the literature, the Monte Carlo procedures involved here do not require nested simulations.
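To make the idea of upper and lower biased approximations concrete, the following is a minimal sketch on a small finite MDP. It is not the authors' regression method. It assumes a known transition tensor `P` and reward matrix `r`, so conditional expectations are computed exactly rather than estimated by regression. All function names (`make_toy_mdp`, `backward_induction`, `greedy_policy_value`, `pathwise_dual`) and the uniform-noise representation of the transitions are hypothetical choices made for illustration. Given an approximate value function `W`, the greedy policy yields a lower bound, and a martingale penalty built from `W` yields a pathwise quantity whose expectation is an upper bound.

```python
import numpy as np


def make_toy_mdp(n_states=4, n_actions=2, seed=0):
    """Random finite MDP (illustrative only): P[s, a, s'] and r[s, a]."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # each P[s, a, :] is a distribution
    r = rng.random((n_states, n_actions))
    return P, r


def backward_induction(P, r, H):
    """Exact optimal values V[h, s] for horizon H, with V[H] = 0."""
    V = np.zeros((H + 1, r.shape[0]))
    for h in range(H - 1, -1, -1):
        Q = r + P @ V[h + 1]  # shape (S, A)
        V[h] = Q.max(axis=1)
    return V


def greedy_policy_value(P, r, W):
    """Value of the policy that is greedy w.r.t. the approximation W.

    Any policy's value is a lower bound on the optimal value.
    """
    H, S = W.shape[0] - 1, r.shape[0]
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        a = (r + P @ W[h + 1]).argmax(axis=1)  # greedy action under W
        V[h] = (r + P @ V[h + 1])[np.arange(S), a]  # true value of that action
    return V


def pathwise_dual(P, r, W, u):
    """Pathwise dual value for one noise scenario u[0..H-1] in [0, 1).

    The next state is the inverse-CDF image of u[h] under P[s, a, :].
    The penalty W(s') - E[W(s') | s, a] has zero conditional mean, so the
    expectation of the result over u is an upper bound on the optimal value.
    """
    H, S = len(u), r.shape[0]
    cdf = np.cumsum(P, axis=2)
    U = np.zeros(S)
    for h in range(H - 1, -1, -1):
        nxt = (u[h] >= cdf[:, :, :-1]).sum(axis=2)  # next state, shape (S, A)
        penalty = W[h + 1][nxt] - P @ W[h + 1]  # martingale increment
        U = (r - penalty + U[nxt]).max(axis=1)
    return U
```

Averaging `pathwise_dual(P, r, W, u)` over independent draws of `u` estimates the upper bound, while `greedy_policy_value(P, r, W)[0]` gives the lower bound; the difference between the two is the duality gap. When `W` equals the exact value function, the pathwise dual value coincides with the optimal value for every scenario, which is why a good approximation `W` leads to a tight, low-variance upper bound.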
