Dual-Branch Center-Surrounding Contrast: Rethinking Contrastive Learning for 3D Point Clouds

Existing self-supervised learning (SSL) approaches for 3D point clouds are dominated by generative methods based on Masked Autoencoders (MAE). However, these generative methods have been shown to struggle to capture high-level discriminative features, which leads to poor performance on linear probing and other downstream tasks. In contrast, contrastive methods excel at discriminative feature representation and generalization on image data. Despite this, work on contrastive learning (CL) for 3D data remains scarce. Moreover, simply applying CL methods designed for 2D data to 3D data fails to learn 3D local details effectively. To address these challenges, we propose a novel Dual-Branch Center-Surrounding Contrast (CSCon) framework. Specifically, we apply masking to the center and surrounding parts separately, constructing dual-branch inputs with center-biased and surrounding-biased representations to better capture rich geometric information. In addition, we introduce a patch-level contrastive loss to further enhance both high-level information and local sensitivity. Under the FULL and ALL protocols, CSCon achieves performance comparable to generative methods; under the MLP-LINEAR, MLP-3, and ONLY-NEW protocols, our method attains state-of-the-art results, even surpassing cross-modal approaches. In particular, under the MLP-LINEAR protocol, our method outperforms the baseline (Point-MAE) by 7.9%, 6.7%, and 10.3% on the three variants of ScanObjectNN, respectively. The code will be made publicly available.

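The two ideas named in the abstract, complementary center/surrounding masking and a patch-level contrastive loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not define how "center" and "surrounding" parts are chosen, so the sketch assumes, purely for illustration, that patches are ranked by the distance of their centers to the cloud centroid. The function names `center_surrounding_masks` and `patch_info_nce`, the `center_ratio` and `temperature` parameters, and the use of an InfoNCE-style loss over row-aligned per-patch embeddings are all hypothetical choices.

```python
import numpy as np


def center_surrounding_masks(patch_centers, center_ratio=0.5):
    """Split patches into a center group and a surrounding group.

    Assumption (not from the paper): "center" patches are those whose
    patch centers lie closest to the centroid of all patch centers.
    Returns two boolean arrays of shape (G,); True means "masked".
    """
    centroid = patch_centers.mean(axis=0)
    dist = np.linalg.norm(patch_centers - centroid, axis=1)
    n_center = int(round(center_ratio * len(patch_centers)))
    is_center = np.zeros(len(patch_centers), dtype=bool)
    is_center[np.argsort(dist)[:n_center]] = True
    # One branch hides the center patches (it sees only the surrounding),
    # the other hides the surrounding patches (it sees only the center).
    return is_center, ~is_center


def patch_info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss over per-patch embeddings of two branches.

    Assumption: z_a and z_b have shape (G, D) and row i of each describes
    the same patch, so (i, i) pairs are positives and all other pairs in
    the same row act as negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
centers = rng.normal(size=(64, 3))      # 64 hypothetical patch centers
mask_center, mask_surround = center_surrounding_masks(centers)
z = rng.normal(size=(64, 32))           # stand-in per-patch embeddings
loss_aligned = patch_info_nce(z, z)         # matching patches on the diagonal
loss_shuffled = patch_info_nce(z, z[::-1])  # positives deliberately misaligned
```

In this toy setup the two masks are complementary by construction, and the loss is lower when row `i` of both branches refers to the same patch than when the rows are misaligned, which is the behavior a patch-level contrastive objective is meant to reward.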