There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This setting poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to both problems is a self-attention bottleneck, which provides a simple and effective framework for learning high-performing policies even in the presence of distractions. However, due to the poor scalability of attention architectures, these methods do not scale beyond low-resolution visual inputs, relying on large patches (and thus small attention matrices). In this paper we make use of new efficient attention algorithms, recently shown to be highly effective for Transformers, and demonstrate that these techniques can be applied in the RL setting. This allows our attention-based controllers to scale to larger visual inputs and facilitates the use of smaller patches, even individual pixels, improving generalization. In addition, we propose a new efficient algorithm that approximates softmax attention with what we call hybrid random features, leveraging the theory of angular kernels. We show theoretically and empirically that hybrid random features are a promising approach when using attention for vision-based RL.
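To give a sense of how random-feature attention achieves linear rather than quadratic scaling, below is a minimal sketch using standard positive random features (in the style of Performer's FAVOR+), not the hybrid random features proposed here; all function names and parameter choices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def positive_random_features(x, W):
    # Positive features phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m),
    # whose dot products approximate the softmax (Gaussian) kernel.
    m = W.shape[0]
    proj = x @ W.T  # (L, m)
    return np.exp(proj - np.sum(x**2, axis=-1, keepdims=True) / 2) / np.sqrt(m)

def linear_attention(Q, K, V, num_features=256, seed=0):
    # Approximate softmax attention in O(L * m * d) instead of O(L^2 * d)
    # by never materializing the L x L attention matrix.
    rng = np.random.default_rng(seed)
    d = Q.shape[-1]
    W = rng.normal(size=(num_features, d))  # shared random projections
    Qp = positive_random_features(Q / d**0.25, W)  # (L, m)
    Kp = positive_random_features(K / d**0.25, W)  # (L, m)
    KV = Kp.T @ V                 # (m, d_v): aggregate keys with values first
    Z = Qp @ Kp.sum(axis=0)       # (L,): per-query normalizer
    return (Qp @ KV) / Z[:, None]
```

Because the features are positive, each output row is a convex combination of the value rows, mirroring the behavior of exact softmax attention while keeping cost linear in sequence length, which is what makes per-pixel patches tractable.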