Adapting pretrained diffusion-based generative models for text-driven image editing with negligible tuning overhead has demonstrated remarkable potential. The classical adaptation paradigm followed by these methods first infers the generative trajectory of a given source image via image inversion, then performs editing along the inferred trajectory guided by the target text prompt. However, editing performance is heavily limited by the approximation errors that diffusion models introduce during image inversion, which arise from the absence of exact supervision at the intermediate generative steps. To circumvent this issue, we investigate the parameter-efficient adaptation of binary-quantized generative models for image editing, leveraging their inherent property that the exact intermediate quantized representations of a source image are attainable, which enables more effective supervision for precise image inversion. Specifically, we propose EditInfinity, which adapts \emph{Infinity}, a binary-quantized generative model, for image editing. We introduce an efficient yet effective image inversion mechanism that integrates text prompting rectification with image style preservation, enabling precise image inversion. Furthermore, we devise a holistic smoothing strategy that allows EditInfinity to perform edits with high fidelity to the source image and precise semantic alignment with the target text prompts. Extensive experiments on the PIE-Bench benchmark across `add', `change', and `delete' editing operations demonstrate the superior performance of our model compared to state-of-the-art diffusion-based baselines. Code is available at: https://github.com/yx-chen-ust/EditInfinity.
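The key property exploited above, that a binary-quantized model's intermediate discrete representations of a source image can be recovered exactly rather than approximated, can be illustrated with a minimal sketch. This is a hedged toy example, not code from the Infinity or EditInfinity implementations; the quantizer and variable names are illustrative assumptions.

```python
# Hedged sketch (not the actual Infinity code): a toy binary quantizer
# illustrating why a binary-quantized model yields exact intermediate
# representations. Quantization is deterministic, so the discrete codes
# of a source image can be recomputed exactly and used as supervision
# for inversion, unlike the approximate latents of diffusion inversion.

def binary_quantize(latent):
    """Map each latent value to the binary codebook {-1.0, +1.0} by sign."""
    return [1.0 if v >= 0 else -1.0 for v in latent]

latent = [0.37, -1.25, 0.04, -0.61]  # toy continuous latent of a "source image"
codes = binary_quantize(latent)      # exact discrete representation

# Idempotence: re-quantizing the codes reproduces them exactly, so the
# inversion target is available without any approximation error.
assert binary_quantize(codes) == codes
print(codes)  # -> [1.0, -1.0, 1.0, -1.0]
```

In a diffusion model, by contrast, the inverted intermediate latents are only estimates of the true trajectory, so errors accumulate across steps; the deterministic quantization above is what makes exact step-wise supervision possible.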