Experience replay is a key component in reinforcement learning for stabilizing learning and improving sample efficiency. Its typical implementation samples transitions with replacement from a replay buffer. In contrast, in supervised learning with a fixed dataset, it is common practice to shuffle the dataset every epoch and consume the data sequentially, a strategy known as random reshuffling (RR). RR enjoys better theoretical convergence properties and has been shown empirically to outperform with-replacement sampling. To bring the benefits of RR to reinforcement learning, we propose sampling methods that extend RR to experience replay, in both the uniform and the prioritized setting, and study their properties theoretically and through simulations. We evaluate our sampling methods on Atari benchmarks, demonstrating their effectiveness in deep reinforcement learning. Code is available at https://github.com/pfnet-research/errr.
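To make the core idea concrete, here is a minimal sketch of random reshuffling applied to a replay buffer in the uniform (non-prioritized) case. It is an illustrative assumption, not the authors' implementation (see the repository above for that): the buffer keeps a shuffled permutation of its indices and consumes it sequentially, reshuffling once the permutation is exhausted, i.e. once per "epoch" over the buffer contents. The class and method names are hypothetical.

```python
# Illustrative sketch only: RR-style sampling from a replay buffer.
import random
from collections import deque


class ReshuffledReplay:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # stored transitions
        self.order = []                       # shuffled permutation of buffer indices
        self.cursor = 0                       # next position within the permutation

    def add(self, transition):
        # For simplicity, newly added transitions only enter the
        # permutation at the next reshuffle.
        self.buffer.append(transition)

    def sample(self, batch_size):
        assert len(self.buffer) > 0, "cannot sample from an empty buffer"
        batch = []
        while len(batch) < batch_size:
            if self.cursor >= len(self.order):
                # Start a new epoch: reshuffle the indices of the current buffer.
                self.order = list(range(len(self.buffer)))
                random.shuffle(self.order)
                self.cursor = 0
            batch.append(self.buffer[self.order[self.cursor]])
            self.cursor += 1
        return batch
```

Within one epoch every stored transition is drawn at most once, which is the property that distinguishes RR from with-replacement sampling; a prioritized variant would additionally have to account for per-transition priorities when forming the permutation, which this sketch does not attempt.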