Image restoration (IR) models are typically trained to recover high-quality images using L1 or LPIPS loss. To handle diverse unknown degradations, zero-shot IR methods have also been introduced. However, existing pre-trained and zero-shot IR approaches often fail to align with human preferences, producing restored images that humans may not favor. This highlights the critical need to enhance restoration quality and adapt flexibly to various IR tasks and backbones without retraining the model and, ideally, without labor-intensive preference data collection. In this paper, we propose the first Test-Time Preference Optimization (TTPO) paradigm for image restoration, which enhances perceptual quality, generates preference data on the fly, and is compatible with any IR backbone. Specifically, we design a training-free, three-stage pipeline: (i) generate candidate preference images online using diffusion inversion and denoising, starting from the initially restored image; (ii) select preferred and dispreferred images using automated preference-aligned metrics or human feedback; and (iii) use the selected preference images as reward signals to guide the diffusion denoising process, optimizing the restored image to better align with human preferences. Extensive experiments across diverse image restoration tasks and models demonstrate the effectiveness and flexibility of the proposed pipeline.
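To make the three-stage pipeline concrete, the following is a minimal, illustrative sketch of how test-time preference optimization could be wired around an arbitrary IR backbone. It is not the paper's implementation: the callables `invert`, `denoise`, and `score`, the noise scale, the number of candidates, and the form of the reward guidance are all hypothetical placeholders standing in for a diffusion model's inversion/denoising routines and a preference-aligned no-reference quality metric.

```python
import torch


def test_time_preference_optimization(
    restored,           # initial output of any pre-trained or zero-shot IR model, (1, 3, H, W) in [0, 1]
    invert,             # hypothetical: image -> diffusion latent (e.g. via DDIM inversion)
    denoise,            # hypothetical: (latent, guidance=None) -> image via the reverse diffusion process
    score,              # hypothetical: preference-aligned metric (or human feedback), higher is better
    num_candidates=4,   # assumed candidate count
    guidance_weight=0.1,
):
    """Sketch of the three-stage, training-free pipeline described in the abstract."""
    # Stage (i): candidate generation. Invert the initial restoration into the diffusion
    # latent space, then re-denoise with small perturbations to obtain diverse candidates.
    latent = invert(restored)
    candidates = [
        denoise(latent + 0.05 * torch.randn_like(latent))
        for _ in range(num_candidates)
    ]

    # Stage (ii): preference selection with an automated metric (human feedback could
    # replace `score` here without changing the rest of the pipeline).
    scores = [score(c) for c in candidates]
    preferred = candidates[max(range(num_candidates), key=scores.__getitem__)]
    dispreferred = candidates[min(range(num_candidates), key=scores.__getitem__)]

    # Stage (iii): use the selected pair as a reward signal during a final guided
    # denoising pass. The contrastive pull/push below is only one plausible choice of
    # guidance term, assumed to operate in image space; the paper's formulation may differ.
    def reward_guidance(x):
        return guidance_weight * ((preferred - x) - 0.5 * (dispreferred - x))

    return denoise(latent, guidance=reward_guidance)
```

Because every backbone-specific operation is passed in as a callable, a sketch like this stays agnostic to the underlying IR model and diffusion prior, which is the flexibility the abstract emphasizes.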