While diffusion models excel at generating continuous data such as images, adapting them to discrete tasks has relied on indirect approaches that either operate in continuous embedding spaces or use token masking. Both depart from modeling the true discrete data distribution, the quantity for which Tweedie's formula provides theoretical guarantees. We propose in-situ Tweedie Discrete Diffusion (TDD), a framework that performs diffusion directly within the discrete one-hot space, with guarantees from Tweedie's formula, hence "in-situ." Unlike prior methods that diffuse continuous embeddings or mask tokens, TDD corrupts one-hot vectors directly with Gaussian noise and denoises iteratively using a timestep-conditioned cross-entropy objective instead of a mean-squared-error reconstruction loss. At each denoising step, the model predicts class probabilities, takes the argmax to obtain a discrete prediction, converts it to a one-hot vector, and feeds that vector into the next iteration at a lower noise level. This process naturally unifies discriminative classification and generative modeling in a single framework. Experiments show that TDD achieves strong performance on both image classification and text generation, and extensive ablation studies confirm the effectiveness of each design component. Our work establishes a principled approach to discrete diffusion that preserves the core characteristics of diffusion models while operating natively in discrete space.
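The training objective and the argmax-and-re-noise sampling loop described above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation. Several details are assumptions not stated in the abstract: the variance-preserving form of the corruption, the cosine schedule `alpha_bar`, uniform timestep sampling during training, and the reading of "feeds them into the next iteration" as re-corrupting the one-hot prediction at the next, lower noise level. `denoiser(x_t, t)` is a hypothetical callable standing in for the timestep-conditioned network that returns class logits.

```python
import math
import random

def one_hot(k, num_classes):
    v = [0.0] * num_classes
    v[k] = 1.0
    return v

def alpha_bar(t, num_steps):
    # Assumed cosine schedule: 1.0 at t=0 (clean), ~0.0 at t=num_steps (pure noise).
    return math.cos(0.5 * math.pi * t / num_steps) ** 2

def corrupt(x0, t, num_steps, rng):
    # Assumed variance-preserving corruption of a one-hot vector:
    # x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with eps ~ N(0, I).
    a = alpha_bar(t, num_steps)
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x0]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # -log p(target | x_t, t): a classification loss in place of MSE reconstruction.
    return -math.log(softmax(logits)[target])

def training_loss(denoiser, label, num_classes, num_steps, rng):
    # One training example: draw a timestep, corrupt the one-hot label, score it.
    t = rng.randint(1, num_steps)
    x_t = corrupt(one_hot(label, num_classes), t, num_steps, rng)
    return cross_entropy(denoiser(x_t, t), label)

def sample(denoiser, num_classes, num_steps, rng):
    # Start from pure Gaussian noise in the one-hot space.
    x = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]
    pred = None
    for t in range(num_steps, 0, -1):
        probs = softmax(denoiser(x, t))  # predicted class probabilities
        pred = max(range(num_classes), key=probs.__getitem__)  # argmax
        # Convert to one-hot and re-noise at the next, lower noise level.
        # At t - 1 == 0 the schedule gives alpha_bar = 1, so no noise is added.
        x = corrupt(one_hot(pred, num_classes), t - 1, num_steps, rng)
    return pred
```

For a one-hot target, the expected value of x_0 given x_t is exactly the vector of posterior class probabilities, which is what a cross-entropy-trained softmax estimates; this is one way to read the abstract's appeal to Tweedie's formula. For classification, the hypothetical `denoiser` would additionally be conditioned on the input (for example an image), and the final `pred` serves as the predicted label. For text, the same loop would run independently per token position.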