Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-temporal Sparsity

Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data, such as speech recognition. However, it is difficult to deploy these networks on hardware with high throughput and low latency, because their fully-connected structure makes LSTM inference a memory-bound workload. Previous LSTM accelerators exploited either weight spatial sparsity or temporal sparsity. In this paper, we present a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low-latency inference. Spatial sparsity is induced by our proposed pruning method, Column-Balanced Targeted Dropout (CBTD), which produces structured sparse weight matrices that benefit workload balance. CBTD achieved up to 96% weight sparsity with a negligible accuracy difference for an LSTM network trained on the TIMIT phone recognition task. To induce temporal sparsity in LSTMs, we create DeltaLSTM by extending the previous DeltaGRU method to LSTM networks. The combined sparsity simultaneously saves weight memory accesses and the associated arithmetic operations. Spartus was implemented on a Xilinx Zynq-7100 FPGA. The per-sample latency of a single 1024-neuron DeltaLSTM layer running on Spartus is 1 µs. Spartus achieved 9.4 TOp/s effective batch-1 throughput and 1.1 TOp/J energy efficiency, which are 4× and 7× higher, respectively, than the previous state of the art.
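The following minimal Python sketch shows, under stated assumptions, how the two kinds of sparsity described above can combine in one matrix-vector product. It is not the Spartus hardware design or the authors' training code. `column_balanced_prune` is a hypothetical one-shot stand-in for CBTD: we assume "column-balanced" means every weight column keeps the same number of largest-magnitude entries, whereas CBTD itself induces this structure during training through targeted dropout. `DeltaMatVec` and the threshold `theta` are likewise hypothetical names illustrating the delta-network update behind DeltaGRU/DeltaLSTM. An input element is propagated only when it has changed by at least `theta` since it was last propagated, so a skipped element saves an entire weight-column fetch, and pruned weights inside a fired column are skipped too. A full DeltaLSTM would apply this update to both the input and the previous hidden state for all four gates; only the core product is shown.

```python
# Hypothetical sketch: the names and the pruning rule are illustrative
# assumptions, not the Spartus implementation or the exact CBTD algorithm.


def column_balanced_prune(w, keep_per_col):
    """Keep the `keep_per_col` largest-magnitude weights in each column of `w`
    (a list of rows) and zero the rest, so every column has the same number
    of nonzeros. One-shot stand-in for CBTD, which is applied during training."""
    n_rows, n_cols = len(w), len(w[0])
    pruned = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        ranked = sorted(range(n_rows), key=lambda i: abs(w[i][j]), reverse=True)
        for i in ranked[:keep_per_col]:
            pruned[i][j] = w[i][j]
    return pruned


class DeltaMatVec:
    """Approximates y_t = W x_t incrementally as y_t = y_{t-1} + W (x_t - x_hat),
    propagating only input elements whose change reaches `theta`."""

    def __init__(self, w, theta):
        self.w = w
        self.theta = theta
        self.x_hat = [0.0] * len(w[0])  # last propagated value of each input
        self.y = [0.0] * len(w)         # running output accumulator
        self.macs = 0                   # multiply-accumulates actually executed

    def step(self, x):
        for j, xj in enumerate(x):
            delta = xj - self.x_hat[j]
            if abs(delta) < self.theta:
                continue                # temporal sparsity: skip column j entirely
            self.x_hat[j] = xj          # remember the value that was propagated
            for i, row in enumerate(self.w):
                if row[j] != 0.0:       # spatial sparsity: skip pruned weights
                    self.y[i] += row[j] * delta
                    self.macs += 1
        return list(self.y)


w = [[0.9, -0.1, 0.4],
     [0.05, 0.8, -0.7],
     [-0.6, 0.02, 0.3]]
w_sparse = column_balanced_prune(w, keep_per_col=2)  # 2 nonzeros per column
mv = DeltaMatVec(w_sparse, theta=0.1)
y1 = mv.step([1.0, 0.5, 0.0])   # third input unchanged: column 2 skipped
y2 = mv.step([1.05, 0.5, 0.2])  # only the third input changes by >= theta
```

In this toy run the two steps cost 6 multiply-accumulates, compared with 18 for the dense, unpruned matrix. In the second step the first input moves by only 0.05, below `theta`, so its column is skipped and the first output is 0.93 instead of the exact 0.975. That small approximation error is the price a delta network pays for skipped work, and `theta` controls the trade-off.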
