Deep reinforcement learning (RL) is one of the state-of-the-art methods for producing near-optimal system controllers. However, deep RL algorithms train a deep neural network that lacks transparency, which poses challenges when the controller has to meet regulations or foster trust. To alleviate this, one could use knowledge distillation to transfer the learned behaviour into a model that is human-readable by design. Often this is done with a single model, which mimics the original model on average but can struggle in more dynamic situations. A key challenge is that this simpler model should strike the right balance between flexibility and complexity, or between bias and accuracy. We propose a new model-agnostic method that divides the state space into regions in which a simplified, human-understandable model can operate. In this paper, we use Voronoi partitioning to find regions where linear models can achieve performance similar to that of the original controller. We evaluate our approach on a gridworld environment and a classic control task. We observe that our proposed distillation to locally specialized linear models produces policies that are explainable, and we show that the distilled policies match or even slightly outperform the black-box policy they are distilled from.
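To make the idea of locally specialized linear models concrete, the following is a minimal sketch, not the method as implemented in the paper. It assumes a hypothetical `teacher_policy` standing in for the trained black-box controller, picks Voronoi centers as a random subset of sampled states (an illustrative choice only), and fits one least-squares linear model per Voronoi cell. All function names and parameters are assumptions made for this example.

```python
import numpy as np

def teacher_policy(states):
    # Hypothetical stand-in for a trained deep RL controller (assumption).
    return np.tanh(3.0 * states[:, 0]) + 0.5 * states[:, 1] ** 2

def add_bias(states):
    # Append a constant column so each linear model has an intercept.
    return np.hstack([states, np.ones((len(states), 1))])

def assign_cells(states, centers):
    # Voronoi assignment: each state belongs to its nearest center.
    dists = np.linalg.norm(states[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fit_linear(states, actions):
    # Ordinary least-squares fit of action = w . [state, 1].
    w, *_ = np.linalg.lstsq(add_bias(states), actions, rcond=None)
    return w

def fit_local_models(states, actions, centers):
    # One linear model per Voronoi cell, fitted on the states in that cell.
    cells = assign_cells(states, centers)
    return [fit_linear(states[cells == k], actions[cells == k])
            for k in range(len(centers))]

def predict(states, centers, models):
    # Route each state to the linear model of its Voronoi cell.
    cells = assign_cells(states, centers)
    weights = np.stack(models)[cells]          # shape (n, d + 1)
    return np.einsum("nd,nd->n", add_bias(states), weights)

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(2000, 2))
actions = teacher_policy(states)

# Illustrative center selection: a random subset of the sampled states,
# which guarantees that every cell contains at least one state.
centers = states[rng.choice(len(states), size=8, replace=False)]
models = fit_local_models(states, actions, centers)

# Compare imitation error against a single global linear model.
local_mse = np.mean((predict(states, centers, models) - actions) ** 2)
global_w = fit_linear(states, actions)
global_mse = np.mean((add_bias(states) @ global_w - actions) ** 2)
print(f"local MSE: {local_mse:.4f}, global MSE: {global_mse:.4f}")
```

On the data used for fitting, the per-cell models can never imitate the teacher worse than the single global linear model, because the global model restricted to each cell is one of the candidates the per-cell least-squares fit chooses from. How well such a partition preserves task performance, rather than imitation error, is a separate question that this sketch does not address.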