The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning

Decision-making AI agents often face two important challenges: the depth of the planning horizon and the branching factor that results from having many choices. Hierarchical reinforcement learning methods aim to solve the first problem by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we model affordances through an attention mechanism that limits the available choices of temporally extended options. We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. We identify and empirically illustrate the settings in which the paradox of choice arises, i.e., when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.
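To make the hard versus soft distinction concrete, here is a minimal sketch of how an attention vector over options might restrict or reweight an agent's choices in a single state. It is an illustration under stated assumptions, not the algorithm from the paper: the function names, the `threshold` parameter, and the use of a softmax over option values are all hypothetical, and the attention values are taken as given rather than learned.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_attention_policy(option_values, attention, threshold=0.5):
    """Hard attention (assumed form): options whose attention falls below
    `threshold` are removed from the choice set entirely."""
    allowed = [a >= threshold for a in attention]
    if not any(allowed):
        # Assumption: if nothing is attended to, fall back to all options.
        allowed = [True] * len(option_values)
    # Masked options get -inf logits, so their probability is exactly zero.
    logits = [v if ok else -math.inf for v, ok in zip(option_values, allowed)]
    return softmax(logits)


def soft_attention_policy(option_values, attention):
    """Soft attention (assumed form): every option stays available, but its
    probability is rescaled by its attention weight and renormalized."""
    base = softmax(option_values)
    weighted = [p * a for p, a in zip(base, attention)]
    total = sum(weighted)
    if total == 0.0:
        # Assumption: with zero total attention, keep the unweighted policy.
        return base
    return [w / total for w in weighted]


# Hypothetical values for four options in one state.
option_values = [1.0, 2.0, 0.5, 3.0]
attention = [0.9, 0.1, 0.8, 0.2]

hard_probs = hard_attention_policy(option_values, attention)
soft_probs = soft_attention_policy(option_values, attention)
```

With these inputs, the hard variant assigns zero probability to the second and fourth options, so the agent never samples them, while the soft variant keeps every option reachable with reduced probability. That difference is what matters for the training data an agent collects: hard attention shrinks the branching factor outright, whereas soft attention only biases exploration.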
