Text-to-image (T2I) diffusion models have emerged as one of the most widely adopted classes of generative models. However, serving diffusion models at the granularity of entire images introduces significant challenges, particularly under multi-resolution workloads. First, image-level serving hinders batching across requests. Second, different resolutions exhibit distinct locality characteristics, which makes it difficult for a single, uniform cache policy to be effective. To address these challenges, we present PatchedServe, a Patch Management Framework for SLO-Optimized Hybrid-Resolution Diffusion Serving. PatchedServe is the first SLO-optimized T2I diffusion serving framework designed to handle heterogeneous resolutions. Specifically, it introduces a novel patch-based processing workflow that substantially improves throughput for hybrid-resolution inputs. Moreover, PatchedServe devises a patch-level cache reuse policy that fully exploits redundancies in the diffusion process, and it integrates an SLO-aware scheduling algorithm with lightweight online latency prediction to improve responsiveness. Our evaluation shows that PatchedServe achieves 30.1% higher SLO satisfaction than the state-of-the-art diffusion serving system while preserving image quality.
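To make the batching argument concrete, the following is a minimal sketch of how tiling latents into fixed-size patches lets requests of different resolutions share one batch. It is not PatchedServe's implementation. The patch size `PATCH`, the `Patch` record, and the helpers `split_into_patches` and `form_batches` are hypothetical. The sketch assumes every resolution is a multiple of the patch size, and it ignores the cross-patch boundary interactions that a real patch-based diffusion pipeline must handle.

```python
from dataclasses import dataclass
from typing import List, Tuple

PATCH = 64  # hypothetical patch side length, in latent pixels


@dataclass(frozen=True)
class Patch:
    request_id: int
    row: int  # top-left row offset within the request's latent
    col: int  # top-left column offset within the request's latent


def split_into_patches(request_id: int, height: int, width: int,
                       patch: int = PATCH) -> List[Patch]:
    """Tile an H x W latent into non-overlapping patch x patch tiles.

    Assumption: height and width are exact multiples of the patch size.
    """
    if height % patch or width % patch:
        raise ValueError("resolution must be a multiple of the patch size")
    return [Patch(request_id, r, c)
            for r in range(0, height, patch)
            for c in range(0, width, patch)]


def form_batches(requests: List[Tuple[int, int, int]],
                 batch_size: int) -> List[List[Patch]]:
    """Pool patches from all (request_id, height, width) requests and cut the
    pool into batches of at most batch_size patches, so requests of different
    resolutions can land in the same batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pool: List[Patch] = []
    for rid, h, w in requests:
        pool.extend(split_into_patches(rid, h, w))
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
```

For example, `form_batches([(0, 128, 128), (1, 64, 192)], batch_size=5)` yields two batches: the first holds all four patches of request 0 plus one patch of request 1, and the second holds the remaining two patches of request 1. Image-level batching could not combine these two requests directly, because their latents have different shapes.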