Diffusion models have become a leading method for generative modeling of both image and scientific data. Because these models are costly to train and \emph{evaluate}, reducing their inference cost remains a major goal. Inspired by the recent empirical success of accelerating diffusion models via parallel sampling~\cite{shih2024parallel}, we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\mathrm{poly}\log d)$ overall time complexity, marking \emph{the first implementation with provable sub-linear complexity w.r.t.\ the data dimension $d$}. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on rapidly evolving, large-memory modern GPU clusters.
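To make the blockwise scheme concrete, the display below is a minimal sketch of the Picard iteration within one block, written for the probability flow ODE implementation; the block endpoints $t_k, t_{k+1}$, the reverse-time drift $b_s$, and the iterates $x^{(m)}$ are illustrative notation introduced here and need not match the exact discretization analyzed in the paper:
\begin{equation*}
    x^{(m+1)}_t \;=\; x_{t_k} \;+\; \int_{t_k}^{t} b_s\bigl(x^{(m)}_s\bigr)\,\mathrm{d}s,
    \qquad t \in [t_k, t_{k+1}], \quad m = 0, 1, \dots,
\end{equation*}
where each iteration evaluates $b_s$ at all time points of the block in parallel, so the sequential depth within a block is the number of Picard iterations required for convergence rather than the number of discretization steps.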