Controllable generative models have been widely used to improve the realism of synthetic visual content. However, such models must bear the computational cost of processing control conditions on top of content generation, which generally leads to low generation efficiency. To address this issue, we propose a Hybrid-Grained Cache (HGC) approach that reduces computational overhead by adopting cache strategies of different granularities at different computational stages. Specifically, (1) we use a coarse-grained, block-level cache based on feature reuse to dynamically bypass redundant computations in encoder-decoder blocks across consecutive inference steps, and (2) we design a fine-grained, prompt-level cache that acts within a module, reusing cross-attention maps computed at one inference step and extending them to the corresponding module computations of adjacent steps. These caches of different granularities integrate seamlessly into every computational stage of the controllable generation pipeline. We verify the effectiveness of HGC on four benchmark datasets, with particular attention to its advantage in balancing generation efficiency and visual quality. For example, on the COCO-Stuff segmentation benchmark, HGC reduces computational cost (MACs) by 63% (from 18.22T to 6.70T) while keeping the loss of semantic fidelity (measured as quantified performance degradation) within 1.5%.
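To make the two cache granularities concrete, the following is a minimal sketch of how block-level feature reuse and prompt-level cross-attention reuse could interleave across denoising steps. All names (run_block, cross_attention, BLOCK_CACHE_INTERVAL, ATTN_CACHE_INTERVAL) and the reuse schedule are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of hybrid-grained caching across inference steps.
# The reuse-every-other-step schedule and all helper names are assumptions
# made for illustration; the paper's HGC decides reuse dynamically.

import numpy as np

BLOCK_CACHE_INTERVAL = 2   # assumed: recompute block output every 2nd step
ATTN_CACHE_INTERVAL = 2    # assumed: recompute cross-attention every 2nd step

def run_block(x, step):
    """Stand-in for one encoder/decoder block of the controllable model."""
    return x * 0.9 + 0.1 * step  # placeholder computation

def cross_attention(h, prompt_emb):
    """Stand-in for a prompt-conditioned cross-attention map."""
    return np.outer(h.ravel()[:4], prompt_emb[:4])  # placeholder map

def denoise(x, prompt_emb, num_steps=8):
    block_cache = None   # coarse-grained cache: block output
    attn_cache = None    # fine-grained cache: cross-attention map

    for step in range(num_steps):
        # Coarse-grained (block-level): bypass the block on cached steps.
        if block_cache is not None and step % BLOCK_CACHE_INTERVAL != 0:
            h = block_cache
        else:
            h = run_block(x, step)
            block_cache = h

        # Fine-grained (prompt-level): reuse the cross-attention map
        # computed at an adjacent step instead of recomputing it.
        if attn_cache is not None and step % ATTN_CACHE_INTERVAL != 0:
            attn = attn_cache
        else:
            attn = cross_attention(h, prompt_emb)
            attn_cache = attn

        # Placeholder for the real conditioned update rule.
        x = h + 0.01 * attn.mean()
    return x

if __name__ == "__main__":
    x0 = np.ones((2, 2))
    prompt = np.linspace(0.0, 1.0, 8)
    print(denoise(x0, prompt))
```

On cached steps the sketch skips both the block forward pass and the cross-attention computation, which is the source of the MAC reduction; the quality cost depends on how aggressively adjacent steps are assumed to share features.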