Real-world tasks of interest are generally poorly defined by human-readable descriptions and have no pre-defined reward signal unless one is specified by a human designer. Conversely, data-driven algorithms are often designed to solve a specific, narrowly defined task with performance metrics that drive the agent's learning. In this work, we present the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge: Learning from Human Feedback in Minecraft, which challenged participants to use human data to solve four tasks defined only by a natural language description, with no reward function. Our approach uses the available human demonstration data to train an imitation learning policy for navigation, and additional human feedback to train an image classifier. These modules, together with an estimated odometry map, are then combined into a state machine designed from human knowledge of the tasks. The state machine breaks each task down into a natural hierarchy and controls which macro behavior the learning agent should follow at any instant. We compare this hybrid intelligence approach to both end-to-end machine learning and purely engineered solutions, all of which are judged by human evaluators. The codebase is available at https://github.com/viniciusguigo/kairos_minerl_basalt.
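To make the described architecture more concrete, the following is a minimal sketch of how a learned navigation policy, an image classifier, and a dead-reckoning odometry estimate might be combined under a hand-designed state machine. It is an illustration under stated assumptions, not the competition implementation: the class `HybridAgent`, the `Macro` states, the action names, the `MOVES` table, and the 0.5 threshold are all hypothetical, and the policy and classifier are stand-in callables rather than trained networks.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List, Tuple


class Macro(Enum):
    """Macro behaviors selected by the state machine (hypothetical names)."""
    NAVIGATE = auto()  # the learned imitation policy is in control
    EXECUTE = auto()   # a scripted, task-specific routine is in control
    DONE = auto()      # the task is considered finished


# Assumed, heading-free displacement per action; a crude stand-in for odometry.
MOVES: Dict[str, Tuple[int, int]] = {
    "forward": (0, 1), "back": (0, -1), "left": (-1, 0), "right": (1, 0),
}


@dataclass
class HybridAgent:
    policy: Callable[[object], str]        # observation -> action name
    classifier: Callable[[object], float]  # observation -> P(goal is visible)
    script: List[str]                      # engineered action sequence
    threshold: float = 0.5                 # assumed decision threshold
    state: Macro = Macro.NAVIGATE
    pose: Tuple[int, int] = (0, 0)
    visited: List[Tuple[int, int]] = field(default_factory=lambda: [(0, 0)])
    _step: int = 0

    def _update_odometry(self, action: str) -> None:
        # Integrate the assumed displacement and record it on the map.
        dx, dy = MOVES.get(action, (0, 0))
        self.pose = (self.pose[0] + dx, self.pose[1] + dy)
        self.visited.append(self.pose)

    def act(self, obs: object) -> str:
        if self.state is Macro.NAVIGATE:
            if self.classifier(obs) >= self.threshold:
                # Goal detected: hand control to the scripted routine.
                self.state = Macro.EXECUTE
            else:
                action = self.policy(obs)
                self._update_odometry(action)
                return action
        if self.state is Macro.EXECUTE:
            if self._step < len(self.script):
                action = self.script[self._step]
                self._step += 1
                if self._step == len(self.script):
                    self.state = Macro.DONE
                return action
            self.state = Macro.DONE
        return "noop"
```

In this sketch, calling `act` once per environment step keeps the learned policy in control until the classifier score crosses the threshold, after which the scripted sequence runs to completion. Keeping the switching logic outside the learned modules is what lets human task knowledge decide when each module is trusted.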