Reward Maximisation through Discrete Active Inference

Active inference is a probabilistic framework for modelling the behaviour of biological and artificial agents, derived from the principle of minimising free energy. In recent years, this framework has been applied successfully to a variety of situations where the goal was to maximise reward, offering comparable and sometimes superior performance to alternative approaches. In this paper, we clarify the connection between reward maximisation and active inference by demonstrating how and when active inference agents perform actions that are optimal for maximising reward. Specifically, we show the conditions under which active inference produces the optimal solution to the Bellman equation, a formulation that underlies several approaches to model-based reinforcement learning and control. On partially observed Markov decision processes, the standard active inference scheme can produce Bellman optimal actions for planning horizons of 1, but not beyond. In contrast, a recently developed recursive active inference scheme (sophisticated inference) can produce Bellman optimal actions on any finite temporal horizon. We supplement the analysis with a discussion of the broader relationship between active inference and reinforcement learning.
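For readers unfamiliar with the term, the following is a minimal sketch of what "Bellman optimal on a finite temporal horizon" means, computed by backward induction. It is not the active inference scheme studied in the paper. The three-state problem, the names `backward_induction`, `P`, and `R`, and the simplification to a fully observed Markov decision process (rather than a partially observed one) are all assumptions made for illustration.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon Bellman recursion on a small, fully observed MDP.

    P[a][s][s2] is the probability of moving from state s to s2 under action a.
    R[s2] is the reward received on arriving in state s2 (an illustrative choice).
    Returns (V, policy): V[s] is the optimal expected total reward from s with
    `horizon` steps to go; policy[t][s] is the optimal action at time step t.
    """
    n_actions, n_states = len(P), len(R)
    V = [0.0] * n_states  # with zero steps remaining, no reward can be collected
    policy = []
    for _ in range(horizon):
        # Q[s][a]: expected reward of taking a in s, then acting optimally afterwards
        Q = [[sum(P[a][s][s2] * (R[s2] + V[s2]) for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        greedy = [max(range(n_actions), key=lambda a: Q[s][a])
                  for s in range(n_states)]
        V = [Q[s][greedy[s]] for s in range(n_states)]
        policy.insert(0, greedy)  # earlier time steps go to the front
    return V, policy


# Hypothetical toy problem: action 0 retreats or stays, action 1 advances.
# State 2 is absorbing and rewarding; passing through state 1 costs 1.
R = [0.0, -1.0, 10.0]
P = [
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # action 0
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 1
]

V1, pi1 = backward_induction(P, R, horizon=1)
V2, pi2 = backward_induction(P, R, horizon=2)
print(pi1[0][0], pi2[0][0])  # first action chosen in state 0 for each horizon
```

In this toy problem the optimal first action in state 0 depends on the horizon: with one step to go, staying avoids the cost of state 1, while with two steps to go, paying that cost is worthwhile because it leads to the rewarding state. This horizon dependence is why the planning depth appears in the abstract's claims.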
