Generative models have advanced markedly with the emergence of flow matching (FM), a simulation-free, flow-based framework that can learn exact data densities and has demonstrated strong generative capabilities. Motivated by these advances, we propose LatentFM, a flow-based model that operates in the latent space for medical image segmentation. To model the data distribution, we first design two variational autoencoders (VAEs) that encode medical images and their corresponding masks into a lower-dimensional latent space. We then estimate a conditional velocity field that guides the flow, conditioned on the input image. By sampling multiple latent representations, our method synthesizes diverse segmentation outputs whose pixel-wise variance captures the underlying data distribution, enabling predictions that are both accurate and uncertainty-aware. Furthermore, we generate confidence maps that quantify model certainty, giving clinicians richer information for analysis. We conduct experiments on two datasets, ISIC-2018 and CVC-Clinic, and compare our method with several prior baselines, including both deterministic and generative models. Both qualitative and quantitative results show that our approach achieves superior segmentation accuracy while remaining highly efficient by operating in the latent space.
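The sampling and uncertainty steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the LatentFM implementation: `toy_velocity` is a hypothetical closed-form stand-in for the learned conditional velocity field, `toy_decode` is a hypothetical stand-in for the mask VAE decoder, `cond` plays the role of the encoded image, and the confidence formula (one minus four times the pixel-wise variance) is an assumption chosen only because values in [0, 1] have variance at most 0.25.

```python
import numpy as np

def toy_velocity(z, t, cond, sigma=0.5):
    """Hypothetical stand-in for a learned conditional velocity field.

    Closed-form velocity of the straight-line path that carries
    z0 ~ N(0, I) to z1 = cond + sigma * z0.
    """
    z0 = (z - t * cond) / (1.0 - t + t * sigma)  # recover the starting noise
    return cond + (sigma - 1.0) * z0

def sample_latent(cond, rng, steps=10):
    """Integrate dz/dt = v(z, t | cond) from t=0 to t=1 with Euler steps."""
    z = rng.standard_normal(cond.shape)  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        z = z + dt * toy_velocity(z, k * dt, cond)
    return z

def toy_decode(z):
    """Hypothetical stand-in for the mask decoder: sigmoid onto [0, 1]."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_with_uncertainty(cond, n_samples=64, seed=0):
    """Draw several masks and summarize them pixel by pixel."""
    rng = np.random.default_rng(seed)
    masks = np.stack(
        [toy_decode(sample_latent(cond, rng)) for _ in range(n_samples)]
    )
    mean_mask = masks.mean(axis=0)     # averaged segmentation
    variance = masks.var(axis=0)       # pixel-wise disagreement across samples
    confidence = 1.0 - 4.0 * variance  # assumed mapping onto [0, 1]
    return mean_mask, variance, confidence

# Toy 2x2 "image latent": strongly background, two ambiguous pixels, strongly foreground.
cond = np.array([[-4.0, 0.0], [0.0, 4.0]])
mean_mask, variance, confidence = predict_with_uncertainty(cond)
```

In this toy setup the ambiguous pixels (where `cond` is 0) show the largest variance across samples and therefore the lowest confidence, which is the qualitative behavior a confidence map is meant to expose.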