Accelerating the Convergence of Human-in-the-Loop Reinforcement Learning with Counterfactual Explanations

The ability to learn interactively from human feedback would enable robots to operate in new social settings. For example, novice users could train service robots on new tasks naturally and interactively. Human-in-the-loop Reinforcement Learning (HRL) addresses this need by combining human feedback with reinforcement learning (RL) techniques. However, state-of-the-art interactive learning techniques converge slowly, which makes training a frustrating experience for the human. This work tackles the problem by extending the existing TAMER framework so that human feedback can be enhanced with two different types of counterfactual explanations. We show that our extensions improve convergence, especially in the crucial early phases of training.
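To make the idea of enhancing TAMER's feedback more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a tabular estimate H(s, a) of the human's feedback, whereas TAMER itself uses function approximation. The `update_with_counterfactual` method, its `better_action` argument, and the fixed `cf_reward` label are hypothetical stand-ins for one possible kind of counterfactual explanation ("action X would have been better here"). The abstract does not specify what the two explanation types are.

```python
from collections import defaultdict


class TamerSketch:
    """Tabular sketch of a TAMER-style learner: it regresses a model H(s, a)
    of the human's scalar feedback and acts greedily with respect to it."""

    def __init__(self, actions, lr=0.5, cf_reward=1.0):
        self.actions = list(actions)
        self.lr = lr                # step size of the regression update
        self.cf_reward = cf_reward  # hypothetical label for a "better" action
        self.h = defaultdict(float)  # estimated human feedback H(s, a), default 0

    def act(self, state):
        # Greedy w.r.t. predicted human feedback; ties go to the first action listed.
        return max(self.actions, key=lambda a: self.h[(state, a)])

    def update(self, state, action, feedback):
        # Plain TAMER-style step: move H(s, a) toward the observed feedback
        # (gradient step on the squared prediction error).
        key = (state, action)
        self.h[key] += self.lr * (feedback - self.h[key])

    def update_with_counterfactual(self, state, action, feedback, better_action=None):
        # Hypothetical extension (not from the paper): in addition to feedback on
        # the taken action, the human names an action that would have been better,
        # which becomes one extra positive training sample for the same state.
        self.update(state, action, feedback)
        if better_action is not None and better_action != action:
            self.update(state, better_action, self.cf_reward)


if __name__ == "__main__":
    actions = ["left", "stay", "right"]

    plain = TamerSketch(actions)
    plain.update("s0", "left", -1.0)
    print(plain.act("s0"))  # stay: only "left" was ruled out, the rest is a tie

    cf = TamerSketch(actions)
    cf.update_with_counterfactual("s0", "left", -1.0, better_action="right")
    print(cf.act("s0"))  # right: the counterfactual supplied a positive signal
```

With plain feedback, a single negative signal on "left" only rules that action out, so the agent has to break a tie between the remaining actions. The hypothetical counterfactual gives it a direct positive signal from the same interaction. This illustrates why richer feedback per interaction could plausibly speed up early learning.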
