Clustering is a fundamental unsupervised learning task for uncovering patterns in data. While Gaussian Blurring Mean Shift (GBMS) has proven effective for identifying arbitrarily shaped clusters in Euclidean space, it struggles with datasets exhibiting hierarchical or tree-like structures. In this work, we introduce HypeGBMS, a novel extension of GBMS to hyperbolic space. Our method replaces Euclidean computations with hyperbolic distances and employs Möbius-weighted means to ensure that all updates remain consistent with the geometry of the space. HypeGBMS effectively captures latent hierarchies while retaining the density-seeking behavior of GBMS. We provide theoretical insights into convergence and computational complexity. Extensive experimental evaluations on $11$ real-world datasets demonstrate that HypeGBMS improves clustering quality on hierarchical data and significantly outperforms conventional mean-shift clustering methods in non-Euclidean settings, underscoring its robustness and effectiveness. This work bridges classical mean-shift clustering and hyperbolic representation learning, offering a principled approach to density-based clustering in curved spaces.
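To make the update rule described in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the Poincaré ball model with curvature $-1$, a Gaussian kernel applied to hyperbolic distances, and the weighted Möbius gyromidpoint as the "Möbius-weighted mean"; the exact formulation used by HypeGBMS may differ. The function names and the bandwidth parameter `sigma` are hypothetical.

```python
import numpy as np


def poincare_dist(X, Y):
    """Pairwise hyperbolic distances between rows of X and rows of Y.

    Assumes all points lie strictly inside the unit ball (curvature -1).
    """
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    gx = 1.0 - np.sum(X ** 2, axis=1)
    gy = 1.0 - np.sum(Y ** 2, axis=1)
    arg = 1.0 + 2.0 * sq / (gx[:, None] * gy[None, :])
    # Guard against arguments slightly below 1 caused by rounding.
    return np.arccosh(np.maximum(arg, 1.0))


def mobius_weighted_mean(W, X):
    """Weighted Moebius gyromidpoints (assumed form of the mean).

    Row i of the result is the gyromidpoint of the rows of X
    under the non-negative weights W[i].
    """
    lam = 2.0 / (1.0 - np.sum(X ** 2, axis=1))  # conformal factors
    num = (W * lam) @ X                          # sum_j W[i, j] * lam[j] * X[j]
    den = W @ (lam - 1.0)                        # sum_j W[i, j] * (lam[j] - 1)
    V = num / den[:, None]
    # Moebius scalar multiplication by 1/2: v / (1 + sqrt(1 - |v|^2)).
    n2 = np.clip(np.sum(V ** 2, axis=1), 0.0, 1.0 - 1e-12)
    return V / (1.0 + np.sqrt(1.0 - n2))[:, None]


def hyperbolic_gbms(X, sigma=0.5, n_iter=20):
    """Blurring mean shift in the Poincare ball (illustrative sketch).

    Every point is replaced by the kernel-weighted mean of the whole
    data set, and the data set itself is updated at each iteration.
    """
    X = np.array(X, dtype=float)
    for _ in range(n_iter):
        D = poincare_dist(X, X)
        W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))  # Gaussian kernel weights
        X = mobius_weighted_mean(W, X)
    return X
```

Because the gyromidpoint of points inside the unit ball is itself inside the ball, the iterates stay on the manifold without a separate projection step. Each iteration forms a full pairwise distance matrix, so this sketch costs quadratic time and memory in the number of points.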