Adaptive Streaming Perception using Deep Reinforcement Learning

Executing computer vision models on streaming visual data, or streaming perception, is an emerging problem with applications in self-driving, embodied agents, and augmented/virtual reality. The development of such systems is largely governed by the accuracy and latency of the processing pipeline. While past work has proposed numerous approximate execution frameworks, their decision functions focus solely on optimizing a single objective such as latency, accuracy, or energy. This results in suboptimal decisions that degrade overall system performance. We argue that streaming perception systems should holistically maximize overall system performance (i.e., consider accuracy and latency simultaneously). To this end, we describe a new approach based on deep reinforcement learning that learns these tradeoffs at runtime for streaming perception. We formulate this tradeoff optimization as a novel deep contextual bandit problem and design a new reward function that holistically integrates latency and accuracy into a single metric. We show that our agent can learn a competitive policy across multiple decision dimensions that outperforms state-of-the-art policies on public datasets.
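The abstract does not specify the reward function or the network architecture. The Python below is a minimal sketch of what a contextual-bandit formulation of the accuracy/latency tradeoff can look like. It rests on several illustrative assumptions that are not taken from the paper: the reward is accuracy minus a linear latency penalty, a linear model per arm stands in for the deep network, and exploration is epsilon-greedy. The agent chooses between two hypothetical pipeline configurations based on one context feature, scene motion. The names `combined_reward`, `EpsilonGreedyLinearBandit`, `simulate_pipeline`, and `train`, the 0.5 latency weight, and every simulated accuracy and latency value are hypothetical.

```python
import random
from typing import List, Sequence


def combined_reward(accuracy: float, latency_s: float,
                    latency_weight: float = 0.5) -> float:
    """Hypothetical scalar reward: accuracy minus a linear latency penalty.

    Assumption: the paper's actual reward is not given in the abstract;
    this linear tradeoff and the latency_weight value are illustrative only.
    """
    return accuracy - latency_weight * latency_s


class EpsilonGreedyLinearBandit:
    """Contextual bandit with one linear reward model per arm.

    Assumption: a per-arm linear model stands in for the deep network, and
    epsilon-greedy stands in for whatever exploration the paper uses.
    """

    def __init__(self, n_arms: int, n_features: int, epsilon: float = 0.2,
                 lr: float = 0.1, seed: int = 0) -> None:
        self.weights: List[List[float]] = [[0.0] * n_features for _ in range(n_arms)]
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def predict(self, arm: int, context: Sequence[float]) -> float:
        # Estimated reward of `arm` for this context (a dot product).
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def greedy(self, context: Sequence[float]) -> int:
        # Arm with the highest estimated reward (ties go to the lowest index).
        n_arms = len(self.weights)
        return max(range(n_arms), key=lambda a: self.predict(a, context))

    def select(self, context: Sequence[float]) -> int:
        # With probability epsilon, explore a uniformly random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        return self.greedy(context)

    def update(self, arm: int, context: Sequence[float], reward: float) -> None:
        # One SGD step on squared error. Only the chosen arm is updated,
        # because bandit feedback reveals only that arm's reward.
        error = reward - self.predict(arm, context)
        for i, x in enumerate(context):
            self.weights[arm][i] += self.lr * error * x


def simulate_pipeline(arm: int, motion: float) -> float:
    """Made-up environment with two hypothetical pipeline configurations.

    Arm 0: light model, fixed accuracy 0.6, latency 0.05 s.
    Arm 1: heavy model, accuracy 0.9 that drops as scene motion grows
    (its slower output is staler), latency 0.2 s.
    """
    if arm == 0:
        return combined_reward(0.6, 0.05)
    return combined_reward(0.9 - 0.6 * motion, 0.2)


def train(steps: int = 20000, seed: int = 0) -> EpsilonGreedyLinearBandit:
    rng = random.Random(seed)
    agent = EpsilonGreedyLinearBandit(n_arms=2, n_features=2, seed=seed)
    for _ in range(steps):
        motion = rng.random()          # hypothetical context feature in [0, 1)
        context = [1.0, motion]        # bias term + motion
        arm = agent.select(context)
        agent.update(arm, context, simulate_pipeline(arm, motion))
    return agent


if __name__ == "__main__":
    agent = train()
    # Greedy choice for a static scene and for a fast-moving scene.
    print(agent.greedy([1.0, 0.0]), agent.greedy([1.0, 1.0]))  # → 1 0
```

With these made-up numbers, the heavy configuration wins in static scenes and the light one wins in fast-moving scenes. Because the reward also charges for latency, the learned switch point falls at a motion of 0.375. An accuracy-only criterion would switch at 0.5. This illustrates, in miniature, why a single metric that combines both quantities can lead to different decisions than optimizing either one alone.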

翻译：执行关于视觉数据流或流动感知的计算机视觉模型是一个正在出现的问题,在自我驱动、内装剂和增强/虚拟现实的应用中,这些系统的开发在很大程度上取决于处理管道的准确性和长期性。虽然过去的工作提出了许多近似执行框架,但其决定功能完全侧重于优化延时度、准确性或能量等。这导致了次优化决定,影响到整个系统的业绩。我们主张流动感系统应全面最大限度地实现整个系统的业绩(即既考虑精确性又考虑延时性)。为此,我们描述了一种基于深度强化学习的新方法,以在流动感知的运行时间学习这些权衡。这种权衡优化是作为一个全新的深层背景宽度问题制定的,我们设计一种新的奖励功能,将延时度和准确性整体整合到一个单一的尺度中。我们表明,我们的代理人可以学习一种跨越多个决定层面的竞争性政策,这超越了公共数据集的状态政策。