This paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. We explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers, both with respect to their inputs and with respect to the layer parameters; we develop a highly efficient solver for these layers that exploits fast GPU-based batch solves within a primal-dual interior point method, and which provides backpropagation gradients at virtually no additional cost on top of the solve; and we highlight the application of these approaches to several problems. In one notable example, the method learns to play mini-Sudoku (4x4) given just input and output games, with no a priori information about the rules of the game; this highlights the ability of OptNet to learn hard constraints better than other neural architectures.
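To make the implicit-differentiation step concrete, the following is a minimal sketch in the standard QP notation such layers use; the exact form and symbols below are illustrative assumptions rather than quotations from the paper. Consider a layer that outputs

\[
z^\star = \operatorname*{argmin}_{z} \; \tfrac{1}{2} z^\top Q z + q^\top z
\quad \text{subject to} \quad A z = b, \;\; G z \le h,
\]

with KKT optimality conditions (writing \(D(\cdot)\) for the diagonal matrix built from a vector)

\[
Q z^\star + q + A^\top \nu^\star + G^\top \lambda^\star = 0, \qquad
A z^\star - b = 0, \qquad
D(\lambda^\star)\,(G z^\star - h) = 0.
\]

Totally differentiating these conditions yields a linear system relating perturbations of the parameters \((Q, q, A, b, G, h)\) to perturbations of the solution:

\[
\begin{bmatrix}
Q & G^\top & A^\top \\
D(\lambda^\star)\, G & D(G z^\star - h) & 0 \\
A & 0 & 0
\end{bmatrix}
\begin{bmatrix} \mathrm{d}z \\ \mathrm{d}\lambda \\ \mathrm{d}\nu \end{bmatrix}
= -
\begin{bmatrix}
\mathrm{d}Q\, z^\star + \mathrm{d}q + \mathrm{d}G^\top \lambda^\star + \mathrm{d}A^\top \nu^\star \\
D(\lambda^\star)\, \mathrm{d}G\, z^\star - D(\lambda^\star)\, \mathrm{d}h \\
\mathrm{d}A\, z^\star - \mathrm{d}b
\end{bmatrix}.
\]

Backpropagation then amounts to a single solve with the transpose of this KKT matrix applied to the incoming gradient \(\partial \ell / \partial z^\star\); because the interior point method has already factorized a matrix of essentially this form during the forward solve, the gradients come at virtually no additional cost.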