Compute-in-memory (CIM) has emerged as a pivotal direction for accelerating machine learning workloads such as Deep Neural Networks (DNNs). However, effectively exploiting sparsity in CIM systems presents numerous challenges, owing to the inherent rigidity of their array structures. Designing sparse DNN dataflows and developing efficient mapping strategies also becomes more complex when accounting for diverse sparsity patterns and the flexibility of multi-macro CIM structures. Despite these complexities, there is still no unified, systematic view or modeling approach for diverse sparse DNN workloads on CIM systems. In this paper, we propose CIMinus, a framework dedicated to cost modeling of sparse DNN workloads on CIM architectures. It provides in-depth energy consumption analysis at the level of individual components and an assessment of overall workload latency. We validate CIMinus against contemporary CIM architectures and demonstrate its applicability in two use cases. These cases provide valuable insights into both the impact of sparsity patterns and the effectiveness of mapping strategies, bridging the gap between theoretical design and practical implementation.