Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning

In this work, we investigate the properties of data that cause popular representation learning approaches to fail. In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features. We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning. However, metric learning requires supervision in the form of a distance function, which is absent in reinforcement learning. To overcome this, we leverage the sequential nature of states in a replay buffer to approximate a distance metric and provide a weak supervision signal, under the assumption that temporally close states are also semantically similar. We modify a VAE with triplet loss and demonstrate that this approach is able to learn useful features for downstream tasks, without additional supervision, in environments where standard VAEs fail.
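The weak supervision signal described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes states are stored in trajectory order, treats states within a few steps of the anchor as positives and states many steps away as negatives, and applies a standard squared-Euclidean triplet loss to latent codes. The function names and the `pos_window`, `neg_gap`, and `margin` parameters are hypothetical. In a full model, this term would presumably be added to the usual VAE objective with some weighting.

```python
import numpy as np


def sample_triplets(num_states, num_triplets, pos_window=2, neg_gap=10, rng=None):
    """Sample (anchor, positive, negative) indices from one ordered trajectory.

    Positives lie within `pos_window` steps of the anchor and negatives lie
    at least `neg_gap` steps away. Both thresholds are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = np.arange(num_states)
    triplets = []
    for _ in range(num_triplets):
        a = int(rng.integers(num_states))
        dist = np.abs(idx - a)  # temporal distance of every state to the anchor
        pos_candidates = idx[(dist > 0) & (dist <= pos_window)]
        neg_candidates = idx[dist >= neg_gap]
        if len(pos_candidates) == 0 or len(neg_candidates) == 0:
            continue  # trajectory too short to form a triplet for this anchor
        p = int(rng.choice(pos_candidates))
        n = int(rng.choice(neg_candidates))
        triplets.append((a, p, n))
    return np.array(triplets, dtype=int).reshape(-1, 3)


def triplet_loss(z_anchor, z_pos, z_neg, margin=1.0):
    """Mean hinge loss pulling positives closer to the anchor than negatives."""
    d_pos = np.sum((z_anchor - z_pos) ** 2, axis=1)
    d_neg = np.sum((z_anchor - z_neg) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))
```

Here `z_anchor`, `z_pos`, and `z_neg` would be the encoder's latent means for the sampled states, for example `z[t[:, 0]]`, `z[t[:, 1]]`, and `z[t[:, 2]]` for a triplet array `t`. The loss is zero once every negative is farther from its anchor than the positive by at least the margin.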
