This paper presents the first implementation and in-depth evaluation of the primary computational kernels from the stable-diffusion.cpp image generation framework on IMAX3, a general-purpose Coarse-Grained Reconfigurable Array (CGRA) accelerator. We designed IMAX3 as a versatile computational platform, and this work assesses its capabilities by executing a demanding image generation workload. We evaluate its performance on a current Field-Programmable Gate Array (FPGA) prototype to establish a baseline and project its potential for a future Application-Specific Integrated Circuit (ASIC) implementation. Our results demonstrate that, despite its general-purpose architecture, IMAX3 achieves promising performance and power efficiency, particularly in its projected ASIC form. This work provides concrete guidelines for future IMAX architectural designs and establishes a foundation for developing next-generation, AI-specialized Coarse-Grained Linear Array (CGLA) accelerators by refining this versatile platform. Ultimately, this achievement contributes to the realization of energy-efficient, on-device, multi-modal AI platforms.