Learning When and What to Ask: A Hierarchical Reinforcement Learning Framework

Reliable AI agents should be mindful of the limits of their knowledge and consult humans when they sense that they lack sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information to request. Our framework extends partially observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant and leverage the assistant's knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided by an interaction policy learned with our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks on its own. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze the benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.
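The two-level structure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corridor world, the single request type `ASK_GOAL`, and all class and function names are hypothetical and do not come from the paper. In particular, the interaction policy here is a hand-written rule, whereas the framework learns it with reinforcement learning.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical action names; the abstract does not list the request types.
ASK_GOAL = "ask_goal"  # request information from the assistant
DO_TASK = "do_task"    # defer to the task (navigation) policy


@dataclass
class Corridor:
    """Toy 1-D world: the agent observes its position but not the goal."""
    position: int
    goal: int
    length: int = 10

    def step(self, move: int) -> None:
        # Clamp the new position to the corridor bounds.
        self.position = max(0, min(self.length - 1, self.position + move))


class Assistant:
    """Stands in for the human: answers requests using the hidden state."""

    def __init__(self, env: Corridor) -> None:
        self.env = env

    def answer(self, request: str) -> int:
        if request == ASK_GOAL:
            return self.env.goal
        raise ValueError(f"unknown request: {request}")


def interaction_policy(belief_goal: Optional[int]) -> str:
    """High level (hand-written here, learned in the paper): ask only
    when the agent lacks the knowledge it needs to act."""
    return ASK_GOAL if belief_goal is None else DO_TASK


def task_policy(position: int, belief_goal: int) -> int:
    """Low level: move one cell toward the believed goal."""
    if belief_goal > position:
        return 1
    if belief_goal < position:
        return -1
    return 0


def run_episode(env: Corridor, max_steps: int = 20) -> Tuple[bool, int, int]:
    """Return (success, number of requests, total actions taken)."""
    assistant = Assistant(env)
    belief_goal: Optional[int] = None
    requests = 0
    actions = 0
    while actions < max_steps and env.position != env.goal:
        if interaction_policy(belief_goal) == ASK_GOAL:
            belief_goal = assistant.answer(ASK_GOAL)
            requests += 1
        else:
            env.step(task_policy(env.position, belief_goal))
        actions += 1
    return env.position == env.goal, requests, actions
```

Calling `run_episode(Corridor(position=2, goal=6))` runs one episode in which requests count as actions, so the share of requests among all actions can be read directly from the returned counts. The numbers this toy produces say nothing about the results reported above.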
