近似牛顿策略梯度算法 (Approximate Newton policy gradient algorithms) - 专知论文

会员服务 ·

0

近似 · 香农熵 · Markov · state-of-the-art · ENJOY ·

2023 年 3 月 20 日

Approximate Newton policy gradient algorithms

翻译：近似牛顿策略梯度算法

Haoya Li,Samarth Gupta,Hsiangfu Yu,Lexing Ying,Inderjit Dhillon

from arxiv, 22 pages, 15 figures

Policy gradient algorithms have been widely applied to Markov decision processes and reinforcement learning problems in recent years. Regularization with various entropy functions is often used to encourage exploration and improve stability. This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization. In the case of Shannon entropy, the resulting algorithm reproduces the natural policy gradient algorithm. For other entropy functions, this method results in brand-new policy gradient algorithms. We prove that all these algorithms enjoy Newton-type quadratic convergence and that the corresponding gradient flow converges globally to the optimal solution. We use synthetic and industrial-scale examples to demonstrate that the proposed approximate Newton method typically converges in single-digit iterations, often orders of magnitude faster than other state-of-the-art algorithms.

翻译：策略梯度算法近年来已经广泛应用于马尔可夫决策过程和强化学习问题。采用各种熵函数的正则化通常用于鼓励探索和提高稳定性。本文提出了一个牛顿近似方法用于带熵正则化的策略梯度算法。在香农熵的情况下，得出的算法重现了自然策略梯度算法。对于其他熵函数，这种方法得出了全新的策略梯度算法。我们证明了所有这些算法都享受Newton类型的二次收敛性，相应的梯度流全局地收敛至最优解。我们使用综合和工业规模的实例来证明了所提出的近似牛顿方法通常在单数次迭代中收敛，往往比其他最先进的算法快数个数量级。

1

相关内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

专知会员服务

34+阅读 · 2020年8月14日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

组合测试用例优先排序算法及选择策略研究

国家自然科学基金

8+阅读 · 2015年12月31日

求解非线性方程的加速迭代算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

反常扩散的广义积分方程构造理论及其数值应用

国家自然科学基金

0+阅读 · 2013年12月31日

谱范数下矩阵的广义最小秩逼近问题及应用

国家自然科学基金

0+阅读 · 2013年12月31日

Policy Gradient Algorithms Implicitly Optimize by Continuation

Arxiv

0+阅读 · 2023年5月11日

A Simple and Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax Optimization

Arxiv

0+阅读 · 2023年5月10日

Policy Gradient Methods in the Presence of Symmetries and State Abstractions

Arxiv

0+阅读 · 2023年5月9日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions

Arxiv

20+阅读 · 2021年8月30日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

专知会员服务

34+阅读 · 2020年8月14日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《利用人工智能对军事行动进行建模》

《利用人工智能学习、优化与推演美国海军作战部队的战略布局与分散（续文）》

机器人、无人机与实时影像：应对城市爆炸威胁的三大技术方案

《指挥官意图消息中关键概念自动提取》最新47页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Policy Gradient Algorithms Implicitly Optimize by Continuation

Arxiv

0+阅读 · 2023年5月11日

A Simple and Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax Optimization

Arxiv

0+阅读 · 2023年5月10日

Policy Gradient Methods in the Presence of Symmetries and State Abstractions

Arxiv

0+阅读 · 2023年5月9日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions

Arxiv

20+阅读 · 2021年8月30日

相关基金

组合测试用例优先排序算法及选择策略研究

国家自然科学基金

8+阅读 · 2015年12月31日

求解非线性方程的加速迭代算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

反常扩散的广义积分方程构造理论及其数值应用

国家自然科学基金

0+阅读 · 2013年12月31日

谱范数下矩阵的广义最小秩逼近问题及应用

国家自然科学基金

0+阅读 · 2013年12月31日

微信扫码咨询专知VIP会员