Diffusion models have quickly become some of the most popular and powerful generative models for high-dimensional data. The key insight that enabled their development was the realization that access to the score -- the gradient of the log-density at different noise levels -- enables sampling from the data distribution by solving a reverse-time stochastic differential equation (SDE) via forward discretization, and that popular denoisers provide unbiased estimators of this score. In this paper, we demonstrate that an alternative, backward discretization of these SDEs, using proximal maps in place of the score, leads to theoretical and practical benefits. We leverage recent results in proximal matching to learn proximal operators of the log-density and, with them, develop Proximal Diffusion Models (ProxDM). Theoretically, we prove that $\widetilde{O}(d/\sqrt{\varepsilon})$ steps suffice for the resulting discretization to generate an $\varepsilon$-accurate distribution with respect to the KL divergence. Empirically, we show that two variants of ProxDM achieve significantly faster convergence within just a few sampling steps compared to conventional score-matching methods.
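To make the discretization contrast concrete, here is a minimal sketch (our illustration, not the paper's algorithm) of an explicit step, which evaluates the score at the current iterate, versus an implicit step, which evaluates it at the next iterate and therefore reduces to a proximal map of the negative log-density. As a stand-in for the time-dependent reverse SDE we use overdamped Langevin dynamics toward a fixed standard-Gaussian target, where both the score and the proximal operator have closed forms; the step size `h`, the toy target, and all function names are illustrative assumptions.

```python
import numpy as np

# Toy target: p(x) = N(0, I), so in closed form
#   score(x)              = -x
#   prox_{h(-log p)}(v)   = argmin_x ||x||^2/2 + ||x - v||^2/(2h) = v / (1 + h)
# (real diffusion models learn these maps with neural networks).

rng = np.random.default_rng(0)
d, h = 2, 0.1  # dimension and step size (illustrative choices)

def score(x):
    return -x  # exact score of N(0, I)

def prox_neg_log_p(v, lam):
    return v / (1.0 + lam)  # exact proximal map of -log p for N(0, I)

def explicit_step(x):
    """Forward (Euler-Maruyama) discretization: score at the CURRENT iterate."""
    noise = rng.standard_normal(d)
    return x + h * score(x) + np.sqrt(2.0 * h) * noise

def implicit_step(x):
    """Backward discretization: score at the NEXT iterate, i.e. a proximal map.
    Solves x_new = x + h * score(x_new) + noise  <=>  x_new = prox(x + noise)."""
    noise = rng.standard_normal(d)
    return prox_neg_log_p(x + np.sqrt(2.0 * h) * noise, h)

x_exp = x_imp = rng.standard_normal(d) * 5.0  # start far from the target
for _ in range(50):
    x_exp, x_imp = explicit_step(x_exp), implicit_step(x_imp)
print(x_exp, x_imp)  # both drift toward samples from N(0, I)
```

In this toy setting the implicit update remains stable for larger step sizes because the proximal map contracts toward the mode, which mirrors the stability advantage that motivates replacing the score with a proximal operator.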