Lecture Notes on Partially Known MDPs - Zhuanzhi Papers



June 20, 2022

Lecture Notes on Partially Known MDPs


Guillermo A. Perez

In these notes we tackle the problem of finding optimal policies for Markov decision processes (MDPs) that are not fully known to us. Our intention is to transition gradually from an offline setting to an online (learning) setting; in other words, we are moving towards reinforcement learning.
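This summary does not describe the notes' own algorithms. The sketch below shows one common offline baseline called certainty equivalence, which is not necessarily the approach the notes take. It assumes that the unknown part of the MDP is its transition function and that a batch of sampled transitions is available. The baseline estimates the transition probabilities empirically and then runs discounted value iteration on the estimate to obtain a greedy policy. The function names, the toy two-state MDP, the reward function and the discount factor γ = 0.9 are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def estimate_model(samples):
    """Empirical transition model P_hat(t | s, a) from (s, a, t) samples."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, t in samples:
        counts[(s, a)][t] += 1
    model = {}
    for sa, successors in counts.items():
        total = sum(successors.values())
        model[sa] = {t: c / total for t, c in successors.items()}
    return model


def q_value(model, reward, values, s, a, gamma):
    """One-step lookahead value of action a in state s under the estimated model."""
    return reward(s, a) + gamma * sum(p * values[t] for t, p in model[(s, a)].items())


def value_iteration(states, actions, model, reward, gamma=0.9, eps=1e-9):
    """Discounted value iteration on the estimated (not the true) MDP.

    Only (state, action) pairs that occur in the samples are considered;
    a state with no observed action keeps value 0 and gets no policy entry.
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            qs = [q_value(model, reward, values, s, a, gamma)
                  for a in actions if (s, a) in model]
            new_v = max(qs) if qs else 0.0
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < eps:
            break
    # Extract the policy that is greedy with respect to the converged values.
    policy = {}
    for s in states:
        observed = [a for a in actions if (s, a) in model]
        if observed:
            policy[s] = max(observed,
                            key=lambda a: q_value(model, reward, values, s, a, gamma))
    return values, policy


# Hypothetical toy data: "go" from s0 reaches s1 in 3 of 4 samples.
samples = ([("s0", "go", "s1")] * 3 + [("s0", "go", "s0")]
           + [("s0", "stay", "s0")] * 2 + [("s1", "stay", "s1")] * 2
           + [("s1", "go", "s0")] * 2)
model = estimate_model(samples)
values, policy = value_iteration(["s0", "s1"], ["stay", "go"], model,
                                 reward=lambda s, a: 1.0 if s == "s1" else 0.0)
print(policy)  # {'s0': 'go', 's1': 'stay'}
```

The resulting policy is optimal only for the estimated model. How far it can be from optimal on the true MDP depends on how many samples were collected. Moving from this offline batch setting to one where the agent chooses which transitions to sample is the step towards reinforcement learning that the abstract describes.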

