通过反向环境生成方法进行分布式强力政策学习 (Distributionally Robust Policy Learning via Adversarial Environment Generation) - 专知论文

会员服务 ·

0

Learning · 回合 · 稳健性 · Performer · 潜变量/隐变量 ·

2022 年 7 月 7 日

Distributionally Robust Policy Learning via Adversarial Environment Generation

翻译：通过反向环境生成方法进行分布式强力政策学习

Allen Z. Ren,Anirudha Majumdar

from arxiv, IEEE Robotics and Automation Letters, 2022. Presented at ICRA 2022

Our goal is to train control policies that generalize well to unseen environments. Inspired by the Distributionally Robust Optimization (DRO) framework, we propose DRAGEN - Distributionally Robust policy learning via Adversarial Generation of ENvironments - for iteratively improving robustness of policies to realistic distribution shifts by generating adversarial environments. The key idea is to learn a generative model for environments whose latent variables capture cost-predictive and realistic variations in environments. We perform DRO with respect to a Wasserstein ball around the empirical distribution of environments by generating realistic adversarial environments via gradient ascent on the latent space. We demonstrate strong Out-of-Distribution (OoD) generalization in simulation for (i) swinging up a pendulum with onboard vision and (ii) grasping realistic 3D objects. Grasping experiments on hardware demonstrate better sim2real performance compared to domain randomization.

翻译：我们的目标是,在分布式强力优化(DRO)框架的启发下,我们建议DRAGEN - 通过环境的反向生成来进行分布式强力政策学习,通过创造对抗环境,迭代地提高政策的稳健性,使分配变化符合现实。关键思想是学习一个潜在变量捕捉环境成本预测和现实变化的环境的基因模型。我们用瓦塞斯坦球围绕环境的经验性分布进行DRO,通过潜在空间的梯度生成现实的对抗性环境。我们在模拟中展示了强烈的外向扩散(OOD)一般化,以便(一) 移动机上视觉的弹孔和(二) 掌握现实的 3D 对象。在硬件上进行实验表明比域随机化更好的模拟性功能。

0

相关内容

Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

调控IRF3介导小胶质细胞向M2方向极化改善放射性脑损伤的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

PGC-1α在糖尿病肾病发病中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Tecto调节非洲爪蛙胚层决定与分化的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

一种时空白噪声驱动的Navier-Stokes方程的隐格式

国家自然科学基金

0+阅读 · 2011年12月31日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

Generative Adversarial Autoencoder Networks

Arxiv

11+阅读 · 2018年3月23日

VIP会员

文章信息

相关主题

潜变量/隐变量

相关VIP内容

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型中的事件抽取：方法、模态与未来展望的全面综述

美海军作战管理系统：变革战场空间的二十年

【MIT博士论文】以语言为中心的医学影像理解

俄罗斯“沙希德”/“天竺葵”攻击无人机

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

相关论文

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

Generative Adversarial Autoencoder Networks

Arxiv

11+阅读 · 2018年3月23日

相关基金

调控IRF3介导小胶质细胞向M2方向极化改善放射性脑损伤的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

PGC-1α在糖尿病肾病发病中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Tecto调节非洲爪蛙胚层决定与分化的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

一种时空白噪声驱动的Navier-Stokes方程的隐格式

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员