How do we formalize the challenge of credit assignment in reinforcement learning? Common intuition points to reward sparsity as a key contributor to difficult credit assignment, and traditional heuristics look to temporal recency for the solution, calling upon the classic eligibility trace. We posit that it is not the sparsity of the reward itself that makes credit assignment difficult, but rather the \emph{information sparsity}. We propose to use information theory to define this notion, which we then use to characterize when credit assignment is an obstacle to efficient learning. With this perspective, we outline several information-theoretic mechanisms for measuring credit under a fixed behavior policy, highlighting the potential of information theory as a key tool toward provably efficient credit assignment.