Effective Sample Pair Generation for Ultrasound Video Contrastive Representation Learning

Most ultrasound (US) medical image analysis models based on deep neural networks (DNNs) use backbones pretrained on natural images (e.g., on ImageNet) for better model generalization. However, the domain gap between natural and medical images causes an inevitable performance bottleneck when such backbones are applied to US image analysis. Our idea is to pretrain DNNs on US images directly to avoid this bottleneck. Because annotated large-scale US image datasets are lacking, we first construct a new large-scale US video-based image dataset named US-4, containing over 23,000 high-resolution images from four US video sub-datasets, two of which were newly collected by experienced local doctors. To make full use of this dataset, we then propose a US semi-supervised contrastive learning (USCL) method to effectively learn feature representations of US images, with a new sample pair generation (SPG) scheme to tackle the problem that US images extracted from videos are highly similar to one another. Moreover, USCL treats the contrastive loss as a consistency regularization, which boosts the performance of pretrained backbones by combining it with the supervised loss in a mutually reinforcing way. Extensive fine-tuning experiments on downstream tasks show the superiority of our approach over ImageNet pretraining and over pretraining with previous state-of-the-art semi-supervised learning approaches. In particular, on the widely used POCUS dataset our pretrained backbone reaches a fine-tuning accuracy of over 94%, about 9 percentage points higher than the 85% of the ImageNet-pretrained model. The constructed US-4 dataset and the source code of this work will be made public.
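
The combination of a contrastive loss with a supervised loss described above can be illustrated with a minimal sketch. The abstract does not give the exact form of either loss, so everything here is an assumption made for illustration: the contrastive term is taken to be an NT-Xent-style loss over two views of each sample, the supervised term is taken to be cross-entropy over labeled samples only, and the names `nt_xent`, `cross_entropy`, `joint_loss`, `weight`, and `temperature` are hypothetical. The SPG scheme itself is not modeled; the sketch simply accepts two already-generated views.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent-style contrastive loss (an assumed choice, not from the abstract).

    view_a[i] and view_b[i] are embeddings of a positive pair; every other
    embedding in the batch acts as a negative.
    """
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the positive partner of sample i
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over all non-self similarities.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw class logits."""
    m = max(logits)
    log_partition = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_partition - logits[label]


def joint_loss(view_a, view_b, class_logits, labels,
               weight=1.0, temperature=0.5):
    """Contrastive loss plus weighted supervised loss.

    A label of None marks an unlabeled sample, which contributes only to the
    contrastive term (the assumed semi-supervised setting).
    """
    contrastive = nt_xent(view_a, view_b, temperature)
    labeled = [(l, y) for l, y in zip(class_logits, labels) if y is not None]
    if labeled:
        supervised = sum(cross_entropy(l, y) for l, y in labeled) / len(labeled)
    else:
        supervised = 0.0
    return contrastive + weight * supervised
```

In this sketch the contrastive term acts on every sample in the batch, while the supervised term acts only where labels exist, so the hypothetical `weight` parameter controls how strongly the labels steer the representation.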
