Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution

Compressed video super-resolution (VSR) aims to restore high-resolution frames from compressed low-resolution counterparts. Most recent VSR approaches enhance an input frame by borrowing relevant textures from neighboring frames. Despite this progress, effectively extracting and transferring high-quality textures from compressed videos remains a major challenge, because most of their frames are heavily degraded. In this paper, we propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. First, we divide a video frame into patches and transform each patch into DCT spectral maps, in which each channel represents a frequency band. This design enables fine-grained self-attention on each frequency band, so that real visual textures can be distinguished from compression artifacts and then used for video frame restoration. Second, we study different self-attention schemes and find that divided attention, which conducts joint space-frequency attention before applying temporal attention on each frequency band, yields the best video enhancement quality. Experimental results on two widely used video super-resolution benchmarks show that FTVSR outperforms state-of-the-art approaches on both uncompressed and compressed videos by clear visual margins. Code is available at https://github.com/researchmm/FTVSR.
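The two ideas above (per-patch DCT spectral maps and divided space-frequency/temporal attention) can be illustrated with a minimal pure-Python sketch. It is not the FTVSR implementation, and several simplifications are assumptions: the input is treated as a stack of D feature maps per frame; each p x p patch is DCT-transformed so that coefficient (u, v) of every patch lands in frequency band k = u*p + v; attention uses identity Q/K/V projections with a single head and no positional embeddings; and the temporal step attends only across tokens at the same band and patch position. All function names (`to_spectral_tokens`, `divided_attention`, etc.) are hypothetical.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II basis; row u is the u-th cosine basis vector."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                     for x in range(n)])
    return rows


def dct2(block):
    """2-D DCT of a square block, computed as D @ block @ D^T."""
    n = len(block)
    d = dct_matrix(n)
    tmp = [[sum(d[u][x] * block[x][y] for x in range(n)) for y in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][y] * d[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]


def to_spectral_tokens(feat, p):
    """Split D feature maps (each H x W) into p x p patches and DCT each patch.

    Returns tokens[k][s]: a D-dim vector for frequency band k = u * p + v
    at patch position s (row-major). H and W must be multiples of p.
    """
    D, H, W = len(feat), len(feat[0]), len(feat[0][0])
    tokens = [[] for _ in range(p * p)]
    for i in range(0, H, p):
        for j in range(0, W, p):
            coeffs = [dct2([row[j:j + p] for row in ch[i:i + p]]) for ch in feat]
            for u in range(p):
                for v in range(p):
                    tokens[u * p + v].append([coeffs[c][u][v] for c in range(D)])
    return tokens


def attend(xs):
    """Scaled dot-product self-attention with identity Q/K/V (no learned weights)."""
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(sc - m) for sc in scores]
        z = sum(w)
        out.append([sum(wi * v[c] for wi, v in zip(w, xs)) / z for c in range(d)])
    return out


def divided_attention(video_tokens):
    """video_tokens[t][k][s] is a D-dim vector (frame t, band k, position s)."""
    T, K, S = len(video_tokens), len(video_tokens[0]), len(video_tokens[0][0])
    # Step 1: joint space-frequency attention over all (k, s) tokens of one frame.
    spatial = []
    for frame in video_tokens:
        out = attend([frame[k][s] for k in range(K) for s in range(S)])
        spatial.append([[out[k * S + s] for s in range(S)] for k in range(K)])
    # Step 2: temporal attention within each frequency band (same position s).
    result = [[[None] * S for _ in range(K)] for _ in range(T)]
    for k in range(K):
        for s in range(S):
            seq = attend([spatial[t][k][s] for t in range(T)])
            for t in range(T):
                result[t][k][s] = seq[t]
    return result


random.seed(0)
T, D, H, W, P = 3, 2, 4, 4, 2  # frames, feature channels, height, width, patch size
video = [[[[random.random() for _ in range(W)] for _ in range(H)]
          for _ in range(D)] for _ in range(T)]
tokens = [to_spectral_tokens(frame, P) for frame in video]
out = divided_attention(tokens)
print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))  # 3 4 4 2
```

Factorizing attention this way means each token attends to K*S tokens in the first step and to T tokens in the second, instead of T*K*S tokens in a single joint step, which is the usual cost argument for divided attention.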
