Accelerating BERT Inference for Sequence Labeling via Early-Exit

Both performance and efficiency are crucial for sequence labeling tasks in many real-world scenarios. Although pre-trained models (PTMs) have significantly improved performance on various sequence labeling tasks, they are computationally expensive. To alleviate this problem, we extend the recently successful early-exit mechanism to accelerate PTM inference for sequence labeling. Existing early-exit mechanisms, however, are designed for sequence-level tasks rather than sequence labeling. In this paper, we first propose a simple extension of sentence-level early-exit to sequence labeling tasks. To further reduce the computational cost, we also propose a token-level early-exit mechanism that allows some tokens to exit early at different layers. Considering the local dependency inherent in sequence labeling, we employ a window-based criterion to decide whether a token should exit. Token-level early-exit introduces a gap between training and inference, so we add an extra self-sampling fine-tuning stage to alleviate it. Extensive experiments on three popular sequence labeling tasks show that our approach can save up to 66%-75% of the inference cost with minimal performance degradation. Compared with competitive compressed models such as DistilBERT, our approach achieves better performance at the same speed-up ratios of 2X, 3X, and 4X.
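The window-based exit decision can be illustrated with a minimal sketch. The abstract does not specify how uncertainty is measured or aggregated, so everything below is an assumption made for illustration: normalized entropy as the per-token uncertainty, the maximum over a window of radius `k` as the aggregate, a fixed `threshold`, and the hypothetical function names `normalized_entropy`, `window_uncertainty`, and `token_exit_mask`. It is not the authors' implementation.

```python
import math
from typing import List


def normalized_entropy(probs: List[float]) -> float:
    """Shannon entropy of a label distribution, scaled to [0, 1].

    Assumption: entropy is used as the uncertainty measure.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def window_uncertainty(token_unc: List[float], k: int) -> List[float]:
    """For each token, take the maximum uncertainty among the tokens
    within k positions on either side (assumed aggregation rule)."""
    n = len(token_unc)
    return [max(token_unc[max(0, i - k): min(n, i + k + 1)]) for i in range(n)]


def token_exit_mask(layer_probs: List[List[float]], k: int,
                    threshold: float) -> List[bool]:
    """Return True for tokens that may exit at the current layer.

    layer_probs[i] is the label distribution predicted for token i by the
    classifier attached to the current layer.
    """
    unc = [normalized_entropy(p) for p in layer_probs]
    return [u < threshold for u in window_uncertainty(unc, k)]


# Four tokens, three labels; the third token is ambiguous at this layer.
probs = [
    [0.98, 0.01, 0.01],
    [0.97, 0.02, 0.01],
    [0.40, 0.30, 0.30],
    [0.99, 0.005, 0.005],
]
mask_no_window = token_exit_mask(probs, k=0, threshold=0.2)
mask_window = token_exit_mask(probs, k=1, threshold=0.2)
```

With `k=0` each token is judged in isolation, so only the ambiguous third token stays in the network. With `k=1` its neighbours are held back as well, which reflects the local dependency the abstract refers to: a confident token next to an uncertain one is not allowed to exit yet.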
