Inter- and Intra-image Refinement for Few Shot Segmentation

Deep neural networks for semantic segmentation rely on large-scale annotated datasets. The resulting annotation bottleneck motivates few-shot semantic segmentation (FSS), which aims to generalize to novel classes from only a few labeled exemplars. Most existing FSS methods adopt a prototype-based paradigm: they extract masked-area features from the support images to generate a prior map for the query image, and then make predictions guided by that prior map. However, these methods suffer from two critical limitations induced by inter- and intra-image discrepancies: 1) the intra-class gap between support and query images, caused by single-prototype representation, results in scattered and noisy prior maps; 2) inter-class interference from visually similar but semantically distinct regions leads to inconsistent support-query feature matching and erroneous predictions. To address these issues, we propose the Inter- and Intra-image Refinement (IIR) model. For inter-image refinement, the model contains a method based on class activation mapping that generates two prototypes for class-consistent region matching, one capturing core discriminative features and the other capturing local specific features, and that yields an accurate and robust prior map. For intra-image refinement, a directional dropout mechanism masks inconsistent support-query feature pairs in cross attention, thereby enhancing decoder performance. Extensive experiments demonstrate that IIR achieves state-of-the-art performance on 9 benchmarks, covering standard FSS, part FSS, and cross-domain FSS. Our source code is available at \href{https://github.com/forypipi/IIR}{https://github.com/forypipi/IIR}.
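To make the prototype-based paradigm mentioned above concrete, the following is a minimal sketch of the generic single-prototype baseline, not of the IIR model itself. It assumes masked average pooling to build the prototype and min-max normalized cosine similarity to build the prior map; both are common choices but are assumptions here, since the abstract does not specify them. The function names `masked_average_pooling` and `cosine_prior_map` are hypothetical.

```python
import numpy as np


def masked_average_pooling(features, mask, eps=1e-8):
    """Average the support features over the masked (foreground) area.

    features: float array of shape (C, H, W)
    mask:     binary array of shape (H, W), 1 = target class
    returns:  prototype vector of shape (C,)
    """
    mask = mask.astype(features.dtype)
    # Broadcast the mask over channels, sum over space, divide by area.
    return (features * mask[None]).sum(axis=(1, 2)) / (mask.sum() + eps)


def cosine_prior_map(query_features, prototype, eps=1e-8):
    """Score every query location by cosine similarity to the prototype.

    query_features: float array of shape (C, H, W)
    prototype:      float array of shape (C,)
    returns:        prior map of shape (H, W), min-max scaled to [0, 1]
    """
    c, h, w = query_features.shape
    flat = query_features.reshape(c, -1)  # (C, H*W)
    norms = np.linalg.norm(prototype) * np.linalg.norm(flat, axis=0)
    sims = (prototype @ flat) / (norms + eps)  # (H*W,)
    sims = sims.reshape(h, w)
    # Min-max normalization is an assumed post-processing step.
    return (sims - sims.min()) / (sims.max() - sims.min() + eps)
```

Calling `cosine_prior_map(query_features, masked_average_pooling(support_features, support_mask))` gives one score per query location. Because a single vector has to summarize the whole support object, query regions that differ in appearance from the averaged support features can receive low or noisy scores, which is the first limitation the abstract describes.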
