DIAMBRA Arena: a New Reinforcement Learning Platform for Research and Experimentation

Recent advances in reinforcement learning have produced effective methods that achieve above-human-level performance in very complex environments. However, once solved, these environments become less valuable, and new challenges with different or more complex scenarios are needed to support further research. This work presents DIAMBRA Arena, a new platform for reinforcement learning research and experimentation, featuring a collection of high-quality environments that expose a Python API fully compliant with the OpenAI Gym standard. The environments are episodic tasks with discrete actions and observations composed of raw pixels plus additional numerical values. All of them support both single-player and two-player modes, making it possible to work on standard reinforcement learning, competitive multi-agent settings, human-agent competition, self-play, human-in-the-loop training, and imitation learning. Software capabilities are demonstrated by successfully training multiple deep reinforcement learning agents with proximal policy optimization, obtaining human-like behavior. The results confirm the utility of DIAMBRA Arena as a reinforcement learning research tool, providing environments designed to study some of the most challenging topics in the field.
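
To make the interface described above concrete, the following is a minimal sketch of an episodic interaction loop in the classic OpenAI Gym style (`reset()` returning an observation, `step()` returning observation, reward, done flag, and info). It does not use DIAMBRA Arena itself: `ToyArenaEnv`, its action count, frame size, and the `health` value are hypothetical stand-ins, chosen only to mirror the abstract's description of discrete actions and observations made of raw pixels plus numerical values.

```python
import random


class ToyArenaEnv:
    """Hypothetical stand-in with a classic Gym-style interface.

    This is NOT DIAMBRA Arena code; every name and constant here is an
    assumption made for illustration.
    """

    N_ACTIONS = 4          # size of the discrete action space (illustrative)
    FRAME_SHAPE = (8, 8)   # tiny grayscale "raw pixel" frame (illustrative)
    MAX_STEPS = 20         # step cap, so every episode terminates

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._steps = 0
        self._health = 100

    def _observation(self):
        # Observation = raw pixels plus an additional numerical value.
        rows, cols = self.FRAME_SHAPE
        frame = [[self._rng.randrange(256) for _ in range(cols)]
                 for _ in range(rows)]
        return {"frame": frame, "health": self._health}

    def reset(self):
        # Start a new episode and return the first observation.
        self._steps = 0
        self._health = 100
        return self._observation()

    def step(self, action):
        if not 0 <= action < self.N_ACTIONS:
            raise ValueError(f"invalid action: {action}")
        self._steps += 1
        # Toy dynamics: the action is ignored and random damage is taken.
        damage = self._rng.randrange(15)
        self._health = max(0, self._health - damage)
        reward = -float(damage)
        done = self._health == 0 or self._steps >= self.MAX_STEPS
        return self._observation(), reward, done, {"steps": self._steps}


def run_episode(env, policy_rng):
    """Play one episode with a uniformly random policy."""
    obs = env.reset()
    total_reward, n_steps, done = 0.0, 0, False
    while not done:
        action = policy_rng.randrange(env.N_ACTIONS)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        n_steps += 1
    return total_reward, n_steps, obs


if __name__ == "__main__":
    total, steps, last_obs = run_episode(ToyArenaEnv(seed=1), random.Random(2))
    print(f"episode finished after {steps} steps, return {total}")
```

An agent trained with a method such as proximal policy optimization would replace the random action choice in `run_episode`; the loop structure stays the same for any environment that follows this interface.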
