For deep reinforcement learning (RL) from pixels, learning effective state representations is crucial for achieving high performance. In practice, however, limited experience and high-dimensional inputs hinder effective representation learning. To address this, motivated by the success of masked modeling in other research fields, we introduce mask-based reconstruction to promote state representation learning in RL. Specifically, we propose a simple yet effective self-supervised method, Mask-based Latent Reconstruction (MLR), which predicts complete state representations in the latent space from observations with spatially and temporally masked pixels. MLR enables better use of contextual information when learning state representations, making them more informative and thereby facilitating the training of RL agents. Extensive experiments show that MLR significantly improves sample efficiency in RL and outperforms state-of-the-art sample-efficient RL methods on multiple continuous and discrete control benchmarks. The code will be released soon.
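To make the objective concrete, below is a minimal conceptual sketch in PyTorch of the core idea: mask pixels across space and time, encode the masked sequence, and regress the resulting latents onto the latents of the unmasked observations produced by a separate target branch. All names (`random_cuboid_mask`, `online_encoder`, `target_encoder`, `predictor`, the mask ratio and patch size) are illustrative assumptions, not the authors' released implementation; in particular, the mask here reuses one spatial pattern across all timesteps for brevity, whereas the full method masks spatio-temporal regions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_cuboid_mask(obs, mask_ratio=0.5, patch=12):
    """Zero out randomly chosen spatial patches of a frame sequence.

    obs: (B, T, C, H, W). The same spatial mask is shared across time here
    purely for simplicity; a spatio-temporal mask would sample per-cuboid.
    """
    B, T, C, H, W = obs.shape
    gh, gw = H // patch, W // patch
    keep = torch.rand(B, 1, 1, gh, gw, device=obs.device) > mask_ratio
    mask = keep.float().repeat_interleave(patch, -2).repeat_interleave(patch, -1)
    return obs * mask  # broadcasts over the T and C dimensions

def mlr_loss(obs_seq, online_encoder, target_encoder, predictor):
    """Reconstruct latents of the full observations from the masked ones."""
    masked = random_cuboid_mask(obs_seq)
    flat_masked = masked.flatten(0, 1)            # (B*T, C, H, W)
    pred = predictor(online_encoder(flat_masked)) # predicted latents, (B*T, D)
    with torch.no_grad():                         # target branch gets no gradient
        target = target_encoder(obs_seq.flatten(0, 1))
    # Negative cosine similarity between predicted and target latents.
    return -F.cosine_similarity(pred, target, dim=-1).mean()
```

The reconstruction target lives in the latent space rather than pixel space, so the encoder is pushed to capture task-relevant structure instead of low-level appearance; the stop-gradient on the target branch mirrors the common momentum-encoder design in self-supervised RL.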