Supervised Learning and Reinforcement Learning of Feedback Models for Reactive Behaviors: Tactile Feedback Testbed

From arXiv. Accepted for publication in the International Journal of Robotics Research (IJRR). The paper is 22 pages long (including references) and has 12 figures. A video overview of the reinforcement learning experiment on the real robot is available at https://www.youtube.com/watch?v=yu5v-ZXo4-E. arXiv admin note: text overlap with arXiv:1710.08555

Robots need to be able to adapt to unexpected changes in the environment such that they can autonomously succeed in their tasks. However, hand-designing feedback models for adaptation is tedious, if at all possible, making data-driven methods a promising alternative. In this paper we introduce a full framework for learning feedback models for reactive motion planning. Our pipeline starts by segmenting demonstrations of a complete task into motion primitives via a semi-automated segmentation algorithm. Then, given additional demonstrations of successful adaptation behaviors, we learn initial feedback models through learning from demonstrations. In the final phase, a sample-efficient reinforcement learning algorithm fine-tunes these feedback models for novel task settings through few real system interactions. We evaluate our approach on a real anthropomorphic robot in learning a tactile feedback task.
