In this paper, we learn dynamics models for parametrized families of dynamical systems with varying properties. The dynamics models are formulated as stochastic processes conditioned on a latent context variable, which is inferred from observed transitions of the respective system. The probabilistic formulation allows us to compute an action sequence that, for a limited number of environment interactions, optimally explores the given system within the parametrized family. This is achieved by steering the system through the transitions that are most informative about the context variable. We demonstrate the effectiveness of our method for exploration on a non-linear toy problem and two well-known reinforcement learning environments.
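To make the idea concrete, the following is a minimal sketch, not the paper's actual method: it replaces the learned stochastic-process dynamics model with a hypothetical one-dimensional toy system x' = x + c * u + noise, whose unknown context c admits a closed-form Gaussian posterior, and selects each action by the expected entropy reduction of that posterior. All names (OBS_NOISE_STD, posterior_update, expected_info_gain, choose_action) and the toy system itself are assumptions introduced here for illustration.

```python
import numpy as np

# Hypothetical toy system: x' = x + c * u + noise, with unknown context c.
# The Gaussian belief over c can be updated in closed form (Bayesian linear
# regression), mirroring the idea of inferring a latent context variable
# from observed transitions.

OBS_NOISE_STD = 0.1  # assumed transition noise level


def posterior_update(mu, var, x, u, x_next):
    """Update the Gaussian belief over the context c from one transition."""
    # Residual explained by c: x_next - x = c * u + noise, so the feature is u.
    phi = u
    y = x_next - x
    noise_var = OBS_NOISE_STD ** 2
    precision = 1.0 / var + phi ** 2 / noise_var
    new_var = 1.0 / precision
    new_mu = new_var * (mu / var + phi * y / noise_var)
    return new_mu, new_var


def expected_info_gain(var, u):
    """Expected reduction in entropy of the context posterior for action u."""
    noise_var = OBS_NOISE_STD ** 2
    new_var = 1.0 / (1.0 / var + u ** 2 / noise_var)
    return 0.5 * np.log(var / new_var)


def choose_action(var, candidate_actions):
    """Pick the action whose transition is most informative about c."""
    gains = [expected_info_gain(var, u) for u in candidate_actions]
    return candidate_actions[int(np.argmax(gains))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_c = 1.7          # ground-truth context of the sampled system
    mu, var = 0.0, 4.0    # prior belief over c
    x = 0.0
    actions = np.linspace(-1.0, 1.0, 21)

    for t in range(10):
        u = choose_action(var, actions)
        x_next = x + true_c * u + rng.normal(0.0, OBS_NOISE_STD)
        mu, var = posterior_update(mu, var, x, u, x_next)
        x = x_next

    print(f"inferred context: {mu:.3f} +/- {np.sqrt(var):.3f} (true: {true_c})")
```

In this linear-Gaussian toy, the information gain is independent of the observed outcome, so the greedy choice simply favors the most exciting actions; with a learned non-linear model, as in the paper, the gain would have to be estimated, e.g., by sampling from the predictive distribution.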