通过奖励-抽奖政策优化不断发现新奇战略 (Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization) - 专知论文

会员服务 ·

0

优化器 · Continuity · 一致性收敛 · 多样性 · 讲稿 ·

2022 年 4 月 24 日

Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization

翻译：通过奖励-抽奖政策优化不断发现新奇战略

Zihan Zhou,Wei Fu,Bingliang Zhang,Yi Wu

from arxiv, 30 pages, 15 figures, published as a conference paper at ICLR 2022

We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones. To encourage the learning policy to consistently converge towards a previously undiscovered local optimum, RSPO switches between extrinsic and intrinsic rewards via a trajectory-based novelty measurement during the optimization process. When a sampled trajectory is sufficiently distinct, RSPO performs standard policy optimization with extrinsic rewards. For trajectories with high likelihood under existing policies, RSPO utilizes an intrinsic diversity reward to promote exploration. Experiments show that RSPO is able to discover a wide spectrum of strategies in a variety of domains, ranging from single-agent particle-world tasks and MuJoCo continuous control to multi-agent stag-hunt games and StarCraftII challenges.

翻译：我们提出了一种范式,通过迭接地寻找既地方最佳又与现有政策充分不同的新政策,在复杂的RL环境中发现不同的战略。鼓励学习政策通过在优化过程中以轨迹为基础的新测量方法,将原先尚未发现的地方最佳的RSPO开关与外部和内在回报开关一致。当取样的轨迹足够不同时,RSPO用外观奖励来进行标准的政策优化。对于在现有政策下极有可能的轨迹,RSPO利用内在的多样性奖励来促进探索。实验显示,RSPO能够在各个领域发现广泛的战略,从单剂粒子世界任务和MujoCo持续控制到多剂静态狩猎游戏和StarCraftII挑战等。

0

相关内容

优化器

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Zakharov系统的解的动力学行为研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于类betatron振荡的高亮飞秒伽马射线源研究

国家自然科学基金

0+阅读 · 2014年12月31日

分段光滑Filippov系统的动力学研究

国家自然科学基金

0+阅读 · 2013年12月31日

钩端螺旋体对不同宿主与细胞致病性差异及其播散与排菌分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

双组分系统在长双歧杆菌抗胆盐胁迫反应中的调控机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Arxiv

0+阅读 · 2022年6月9日

Anarchic Federated Learning

Arxiv

0+阅读 · 2022年6月8日

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

Arxiv

1+阅读 · 2022年6月7日

Recent Advances of Continual Learning in Computer Vision: An Overview

Recent Advances of Continual Learning in Computer Vision: An Overview

Arxiv

23+阅读 · 2021年9月23日

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Arxiv

17+阅读 · 2020年3月12日

VIP会员

文章信息

相关主题

一致性收敛

相关VIP内容

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型中的事件抽取：方法、模态与未来展望的全面综述

美海军作战管理系统：变革战场空间的二十年

【MIT博士论文】以语言为中心的医学影像理解

俄罗斯“沙希德”/“天竺葵”攻击无人机

相关资讯

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

相关论文

Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Arxiv

0+阅读 · 2022年6月9日

Anarchic Federated Learning

Arxiv

0+阅读 · 2022年6月8日

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

Arxiv

1+阅读 · 2022年6月7日

Recent Advances of Continual Learning in Computer Vision: An Overview

Recent Advances of Continual Learning in Computer Vision: An Overview

Arxiv

23+阅读 · 2021年9月23日

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Arxiv

17+阅读 · 2020年3月12日

相关基金

Zakharov系统的解的动力学行为研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于类betatron振荡的高亮飞秒伽马射线源研究

国家自然科学基金

0+阅读 · 2014年12月31日

分段光滑Filippov系统的动力学研究

国家自然科学基金

0+阅读 · 2013年12月31日

钩端螺旋体对不同宿主与细胞致病性差异及其播散与排菌分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

双组分系统在长双歧杆菌抗胆盐胁迫反应中的调控机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员