Concentration bounds for SSP Q-learning for average cost MDPs

We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
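For readers unfamiliar with the scheme the abstract refers to, the following is a minimal tabular sketch of an SSP-style Q-learning update for an average cost MDP. It is not taken from the paper. It follows the commonly described two-time-scale form: a fast Q update in which the continuation value is zeroed at a reference state, and a slow, projected update of the average cost estimate. The function name `ssp_q_learning`, the uniform sampling of state-action pairs, the step-size schedules, and the projection bound `lam_bound` are all illustrative assumptions.

```python
import random


def ssp_q_learning(P, cost, ref_state=0, steps=20000, lam_bound=10.0, seed=0):
    """Illustrative SSP-style Q-learning for a small average cost MDP.

    P[s][a] is a list of next-state probabilities, cost[s][a] the one-step
    cost. Returns (Q, lam), where lam is the running average cost estimate.
    All hyperparameters here are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    lam = 0.0
    for n in range(1, steps + 1):
        # Assumed exploration scheme: sample (state, action) uniformly.
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)
        nxt = rng.choices(range(n_states), weights=P[s][a])[0]
        visits[s][a] += 1
        fast = visits[s][a] ** -0.6  # fast step size
        slow = 1.0 / n               # slow step size, o(fast)
        # Reaching the reference state ends the shortest-path "episode",
        # so the continuation value is taken to be zero there.
        cont = 0.0 if nxt == ref_state else min(Q[nxt])
        Q[s][a] += fast * (cost[s][a] + cont - lam - Q[s][a])
        # Slow update of the average cost estimate, projected onto
        # the interval [-lam_bound, lam_bound].
        lam += slow * min(Q[ref_state])
        lam = max(-lam_bound, min(lam_bound, lam))
    return Q, lam


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP used only for illustration.
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
    cost = [[1.0, 2.0], [0.5, 3.0]]
    Q, lam = ssp_q_learning(P, cost)
    print("average cost estimate:", lam)
```

The alternative scheme mentioned in the abstract, based on relative value iteration, would instead subtract a function of the current Q-table (for example its value at a fixed state-action pair) in place of the separately updated estimate `lam`.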
