There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This setting poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to both problems is a self-attention bottleneck, which provides a simple and effective framework for learning high-performing policies even in the presence of distractions. However, due to the poor scalability of attention architectures, these methods do not scale beyond low-resolution visual inputs, relying on large patches (and thus small attention matrices). In this paper we make use of new efficient attention algorithms, recently shown to be highly effective for Transformers, and demonstrate that these techniques can be applied in the RL setting. This allows our attention-based controllers to scale to larger visual inputs and facilitates the use of smaller patches, even individual pixels, improving generalization. In addition, we propose a new efficient algorithm that approximates softmax attention with what we call hybrid random features, leveraging the theory of angular kernels. We show theoretically and empirically that hybrid random features are a promising approach when using attention for vision-based RL.
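To give a sense of how random-feature attention achieves linear rather than quadratic scaling, below is a minimal sketch using standard positive random features (in the style of Performer's FAVOR+), not the hybrid random features proposed here; all function names and parameter choices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def positive_random_features(x, W):
    # Positive features phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m),
    # whose dot products approximate the softmax (Gaussian) kernel.
    m = W.shape[0]
    proj = x @ W.T  # (L, m)
    return np.exp(proj - np.sum(x**2, axis=-1, keepdims=True) / 2) / np.sqrt(m)

def linear_attention(Q, K, V, num_features=256, seed=0):
    # Approximate softmax attention in O(L * m * d) instead of O(L^2 * d)
    # by never materializing the L x L attention matrix.
    rng = np.random.default_rng(seed)
    d = Q.shape[-1]
    W = rng.normal(size=(num_features, d))  # shared random projections
    Qp = positive_random_features(Q / d**0.25, W)  # (L, m)
    Kp = positive_random_features(K / d**0.25, W)  # (L, m)
    KV = Kp.T @ V                 # (m, d_v): aggregate keys with values first
    Z = Qp @ Kp.sum(axis=0)       # (L,): per-query normalizer
    return (Qp @ KV) / Z[:, None]
```

Because the features are positive, each output row is a convex combination of the value rows, mirroring the behavior of exact softmax attention while keeping cost linear in sequence length, which is what makes per-pixel patches tractable.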