Approximate nearest neighbor (ANN) search in high-dimensional metric spaces is a fundamental problem with many applications. Over the past decade, proximity graph (PG)-based indexes have demonstrated superior empirical performance over alternatives. However, these methods often lack theoretical guarantees regarding the quality of query results, especially in the worst-case scenarios. In this paper, we introduce the α-convergent graph (α-CG), a new PG structure that employs a carefully designed edge pruning rule. This rule eliminates candidate neighbors for each data point p by applying the shifted-scaled triangle inequalities among p, its existing out-neighbors, and new candidates. If the distance between the query point q and its exact nearest neighbor v* is at most τ for some constant τ > 0, our α-CG finds the exact nearest neighbor in poly-logarithmic time, assuming bounded intrinsic dimensionality for the dataset; otherwise, it can find an ANN in the same time. To enhance scalability, we develop the α-convergent neighborhood graph (α-CNG), a practical variant that applies the pruning rule locally within each point's neighbors. We also introduce optimizations to reduce the index construction time. Experimental results show that our α-CNG outperforms existing PGs on real-world datasets. For most datasets, α-CNG can reduce the number of distance computations and search steps by over 15% and 45%, respectively, when compared with the best-performing baseline.
翻译:高维度量空间中的近似最近邻(ANN)搜索是一个具有广泛应用的基础问题。过去十年中,基于邻近图(PG)的索引方法在实证性能上展现出优于其他方法的优势。然而,这些方法通常缺乏关于查询结果质量的理论保证,尤其是在最坏情况下。本文提出了一种新的PG结构——α收敛图(α-CG),它采用了一种精心设计的边剪枝规则。该规则通过应用数据点p、其现有出边邻居以及新候选点之间的平移缩放三角不等式,为每个数据点p剔除候选邻居。假设数据集的固有维度有界,若查询点q与其精确最近邻v*之间的距离不超过某个常数τ(τ > 0),则我们的α-CG可在多对数时间内找到精确最近邻;否则,它能在相同时间内找到一个ANN。为提高可扩展性,我们开发了α收敛邻域图(α-CNG),这是一种实用变体,它在每个点的邻居局部范围内应用剪枝规则。我们还引入了优化措施以减少索引构建时间。实验结果表明,我们的α-CNG在真实数据集上优于现有PG方法。对于大多数数据集,与性能最佳的基线方法相比,α-CNG能将距离计算次数和搜索步骤分别减少超过15%和45%。