通过自推测实现并行采样 (Parallel Sampling via Autospeculation)

We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses a target distribution $μ$ on $[q]^n$ through an oracle that provides conditional marginals, while a denoising diffusion model accesses a target distribution $μ$ on $\mathbb{R}^n$ through an oracle that provides conditional means under Gaussian noise. Standard sequential sampling algorithms require $\widetilde{O}(n)$ time to produce a sample from $μ$ in either setting. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. This improves the previous $\widetilde{O}(n^{2/3})$ bound for any-order autoregressive models and yields the first parallel speedup for diffusion models in the high-accuracy regime, under the relatively mild assumption that the support of $μ$ is bounded. We introduce a novel technique to obtain our results: speculative rejection sampling. This technique leverages an auxiliary ``speculative'' distribution~$ν$ that approximates~$μ$ to accelerate sampling. Our technique is inspired by the well-studied ``speculative decoding'' techniques popular in large language models, but differs in key ways. Firstly, we use ``autospeculation,'' namely we build the speculation $ν$ out of the same oracle that defines~$μ$. In contrast, speculative decoding typically requires a separate, faster, but potentially less accurate ``draft'' model $ν$. Secondly, the key differentiating factor in our technique is that we make and accept speculations at a ``sequence'' level rather than at the level of single (or a few) steps. This last fact is key to unlocking our parallel runtime of $\widetilde{O}(n^{1/2})$.

翻译：我们提出了两种设置下通过计数加速采样的并行算法：任意阶自回归模型和去噪扩散模型。任意阶自回归模型通过提供条件边缘分布的神谕访问定义在$[q]^n$上的目标分布$μ$，而去噪扩散模型则通过提供高斯噪声下条件均值的神谕访问定义在$\mathbb{R}^n$上的目标分布$μ$。标准的顺序采样算法在任一设置下都需要$\widetilde{O}(n)$的时间来从$μ$中生成一个样本。我们证明，通过并行调用神谕，期望采样时间可以降低至$\widetilde{O}(n^{1/2})$。这改进了先前针对任意阶自回归模型的$\widetilde{O}(n^{2/3})$界限，并在$μ$的支撑集有界的相对温和假设下，为高精度区域的扩散模型实现了首次并行加速。我们引入了一种新技术来获得这些结果：推测性拒绝采样。该技术利用一个近似于$μ$的辅助“推测”分布~$ν$来加速采样。我们的技术灵感来源于大型语言模型中广泛研究的“推测解码”技术，但在关键方面有所不同。首先，我们使用“自推测”，即我们利用定义~$μ$的同一个神谕来构建推测分布$ν$。相比之下，推测解码通常需要一个独立的、更快速但可能精度较低的“草稿”模型$ν$。其次，我们技术的关键区别在于，我们在“序列”级别而非单个（或少数几个）步骤级别进行并接受推测。最后这一点是实现$\widetilde{O}(n^{1/2})$并行运行时间的关键。