Understanding and following directions provided by humans can enable robots to navigate effectively in unknown situations. We present FollowNet, an end-to-end differentiable neural architecture for learning multi-modal navigation policies. FollowNet maps natural language instructions, together with visual and depth inputs, to locomotion primitives. It processes instructions using an attention mechanism conditioned on its visual and depth input, so that it focuses on the relevant parts of the command while performing the navigation task. Deep reinforcement learning (RL) with a sparse reward simultaneously learns the state representation, the attention function, and the control policies. We evaluate our agent on a dataset of complex natural language directions that guide it through rich, realistic simulated homes. We show that the FollowNet agent learns to execute previously unseen instructions described with a similar vocabulary, and successfully navigates along paths not encountered during training. The agent shows a 30% improvement over a baseline model without the attention mechanism and achieves a 52% success rate on novel instructions.
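As a rough illustration of the attention step described above, the following Python/NumPy sketch scores each instruction word against a query built from the agent's current visual and depth features, then returns an attention-weighted summary of the instruction. This is not FollowNet's actual implementation: the dot-product scoring, the feature sizes, and the names `attend_to_instruction` and `W_q` are hypothetical choices made only for illustration.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def attend_to_instruction(word_feats, visual_feat, depth_feat, W_q):
    """Hypothetical attention step conditioned on visual and depth input.

    word_feats:  (T, d) per-word instruction encodings (assumed precomputed)
    visual_feat: (v,)   embedding of the current visual observation (assumed)
    depth_feat:  (p,)   embedding of the current depth observation (assumed)
    W_q:         (v + p, d) projection from observation to query space (assumed)
    """
    # Build a query from the current observation (visual + depth).
    query = np.concatenate([visual_feat, depth_feat]) @ W_q  # (d,)
    # Dot-product score of each instruction word against the query.
    scores = word_feats @ query                               # (T,)
    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)                                 # (T,)
    # Weighted summary of the instruction, fed to the policy downstream.
    context = weights @ word_feats                            # (d,)
    return context, weights

rng = np.random.default_rng(0)
T, d, v, p = 5, 8, 6, 4
ctx, w = attend_to_instruction(
    rng.normal(size=(T, d)), rng.normal(size=v), rng.normal(size=p),
    rng.normal(size=(v + p, d)),
)
print(w.round(3), ctx.shape)
```

Because the query changes as the observation changes, the weights shift over the instruction during an episode; in a full agent the context vector would be combined with the observation features and passed to a policy head trained with RL.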