End-to-End Policy Gradient Method for POMDPs and Explainable Agents - 专知 (Zhuanzhi) Papers


End-to-end · Agents · Self-driving cars · State-transition graphs · Algorithms

April 19, 2023

End-to-End Policy Gradient Method for POMDPs and Explainable Agents


Soichiro Nishimori,Sotetsu Koyamada,Shin Ishii

From arXiv, 10 pages, 6 figures

Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When reinforcement learning (RL) algorithms are applied to a POMDP, a reasonable estimate of the hidden states can help solve the problem. Explainable decision-making is also preferable, given the application of RL to real-world tasks such as autonomous driving. We propose an RL algorithm that estimates the hidden states through end-to-end training and visualizes the estimates as a state-transition graph. Experimental results demonstrate that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
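To make the idea of a state-transition graph concrete, here is a minimal sketch of how such a graph might be assembled from logged episodes. It is not the authors' implementation: it assumes, purely for illustration, that the agent's hidden-state estimates are already available as discrete labels, and the function name `transition_graph` and the toy `episodes` data are hypothetical.

```python
from collections import Counter, defaultdict


def transition_graph(episodes):
    """Build empirical transition probabilities between estimated states.

    Each episode is a list of (state_label, action) pairs. The state labels
    are assumed (hypothetically) to be discrete hidden-state estimates.
    """
    counts = defaultdict(Counter)
    for episode in episodes:
        # Pair each step with its successor to get (s, a) -> s_next edges.
        for (s, a), (s_next, _) in zip(episode, episode[1:]):
            counts[(s, a)][s_next] += 1

    # Normalize the counts of each (state, action) pair into probabilities.
    graph = {}
    for key, nexts in counts.items():
        total = sum(nexts.values())
        graph[key] = {s_next: n / total for s_next, n in nexts.items()}
    return graph


# Toy data for illustration only; not taken from the paper.
episodes = [
    [(0, "left"), (1, "right"), (0, "left"), (1, "stay")],
    [(0, "left"), (2, "right"), (0, "stay")],
]
graph = transition_graph(episodes)
for (s, a), nexts in sorted(graph.items()):
    for s_next, p in sorted(nexts.items()):
        print(f"{s} --{a}--> {s_next}  p={p:.2f}")
```

Each printed edge gives an estimated state, the action taken, the next estimated state, and the empirical probability of that transition. The resulting dictionary could be passed to a graph-drawing tool to obtain a picture of the agent's behavior.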


