The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize the KL divergence between a data-derived distribution and a learned distribution, each encoding similarities between data points. However, a KL-based loss may be misaligned with the true objective, and properties of the KL divergence such as asymmetry and unboundedness may create optimization challenges. We present Beyond I-Con, a framework that enables the systematic discovery of novel loss functions by exploring alternative statistical divergences. Key findings: (1) on unsupervised clustering of DINO-ViT embeddings, we achieve state-of-the-art results by modifying the PMI algorithm to use the total variation (TV) distance; (2) supervised contrastive learning with Euclidean distance as the feature-space metric is improved by replacing the standard loss function with one based on the Jensen-Shannon divergence (JSD); (3) on dimensionality reduction, we achieve superior qualitative results and better downstream-task performance than SNE by replacing KL with a bounded $f$-divergence. Our results highlight the importance of divergence choice in representation learning optimization.
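As a rough illustration of the substitution described above (the notation here is illustrative, not taken from the paper): write $p(\cdot \mid i)$ for the data-derived neighborhood distribution of point $i$ and $q_\theta(\cdot \mid i)$ for its learned counterpart. I-Con-style methods minimize an averaged KL divergence,
$$\mathcal{L}_{\mathrm{KL}}(\theta) = \frac{1}{n}\sum_{i=1}^{n} D_{\mathrm{KL}}\bigl(p(\cdot \mid i)\,\|\,q_\theta(\cdot \mid i)\bigr), \qquad D_{\mathrm{KL}}(p\,\|\,q) = \sum_j p_j \log \frac{p_j}{q_j},$$
and Beyond I-Con swaps $D_{\mathrm{KL}}$ for alternative divergences, such as the bounded, symmetric total variation distance $D_{\mathrm{TV}}(p, q) = \tfrac{1}{2}\sum_j |p_j - q_j|$ or the Jensen-Shannon divergence $D_{\mathrm{JS}}(p\,\|\,q) = \tfrac{1}{2} D_{\mathrm{KL}}(p\,\|\,m) + \tfrac{1}{2} D_{\mathrm{KL}}(q\,\|\,m)$ with $m = \tfrac{1}{2}(p + q)$.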