The increasing diversity and complexity of transformer workloads at the edge make it difficult to balance performance, energy efficiency, and architectural flexibility. This paper introduces NX-CGRA, a programmable hardware accelerator designed to support a range of transformer inference operations, including both linear and non-linear functions. Unlike fixed-function accelerators optimized for narrow use cases, NX-CGRA employs a coarse-grained reconfigurable array (CGRA) architecture with software-driven programmability, enabling efficient execution across varied kernel patterns. We evaluate the architecture on representative benchmarks derived from real-world transformer models; the results show high overall efficiency and favorable energy-area tradeoffs across different classes of operations. These findings indicate the potential of NX-CGRA as a scalable and adaptable hardware solution for deploying transformers at the edge under constrained power and silicon budgets.