一般政策优化分析更新规则 (An Analytical Update Rule for General Policy Optimization) - 专知论文

会员服务 ·

0

随机性策略 · 优化器 · 策略搜索 · Performer · 相互独立的 ·

2022 年 1 月 19 日

An Analytical Update Rule for General Policy Optimization

翻译：一般政策优化分析更新规则

Hepeng Li,Nicholas Clavette,Haibo He

We present an analytical policy update rule that is independent of parameterized function approximators. The update rule is suitable for general stochastic policies with monotonic improvement guarantee. The update rule is derived from a closed-form trust-region solution using calculus of variation, following a new theoretical result that tightens existing bounds for policy search using trust-region methods. An explanation building a connection between the policy update rule and value-function methods is provided. Based on a recursive form of the update rule, an off-policy algorithm is derived naturally, and the monotonic improvement guarantee remains. Furthermore, the update rule extends immediately to multi-agent systems when updates are performed by one agent at a time.

翻译：我们提出了一个独立于参数化功能近似器的分析性政策更新规则。更新规则适合于具有单声道改进保证的一般随机政策。更新规则源于一种使用变异计算法的封闭式信任区域解决办法,该计算法采用新的理论结果,用信任区域方法收紧了政策搜索的现有界限。提供了在政策更新规则和价值功能方法之间建立联系的解释。根据更新规则的循环形式,自然产生一种非政策性算法,单声道改进保证仍然存在。此外,更新规则在由一个代理人一次进行更新时立即扩展到多剂系统。

0

相关内容

随机性策略

随机性策略

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

基于Markov方法的大规模多阶段任务系统可靠性建模与分析

国家自然科学基金

1+阅读 · 2013年12月31日

基于混合Petri网的电力CPS协同建模与分析

国家自然科学基金

2+阅读 · 2013年12月31日

X2MnZ基Heusler合金磁和相稳定性及力学性能的理论研究

国家自然科学基金

0+阅读 · 2013年12月31日

含多微电网的配电网分层分布式协调控制研究

国家自然科学基金

1+阅读 · 2011年12月31日

Memory-Constrained Policy Optimization

Arxiv

0+阅读 · 2022年4月20日

A Concise Function Representation for Faster Exact MPE and Constrained Optimisation in Graphical Models

Arxiv

0+阅读 · 2022年4月17日

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

Arxiv

0+阅读 · 2022年4月16日

Towards a Stronger Theory for Permutation-based Evolutionary Algorithms

Arxiv

0+阅读 · 2022年4月15日

Relating Graph Neural Networks to Structural Causal Models

Arxiv

44+阅读 · 2021年9月9日

VIP会员

文章信息

相关主题

随机性策略

相互独立的

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】面向真实世界音视联合语音识别的可扩展框架

《通过仿真与开源数据提升战略决策：机遇与局限》最新报告

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

评估大语言模型在科学发现中的作用

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Memory-Constrained Policy Optimization

Arxiv

0+阅读 · 2022年4月20日

A Concise Function Representation for Faster Exact MPE and Constrained Optimisation in Graphical Models

Arxiv

0+阅读 · 2022年4月17日

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

Arxiv

0+阅读 · 2022年4月16日

Towards a Stronger Theory for Permutation-based Evolutionary Algorithms

Arxiv

0+阅读 · 2022年4月15日

Relating Graph Neural Networks to Structural Causal Models

Arxiv

44+阅读 · 2021年9月9日

相关基金

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

基于Markov方法的大规模多阶段任务系统可靠性建模与分析

国家自然科学基金

1+阅读 · 2013年12月31日

基于混合Petri网的电力CPS协同建模与分析

国家自然科学基金

2+阅读 · 2013年12月31日

X2MnZ基Heusler合金磁和相稳定性及力学性能的理论研究

国家自然科学基金

0+阅读 · 2013年12月31日

含多微电网的配电网分层分布式协调控制研究

国家自然科学基金

1+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员