Dense retrieval models have become standard in state-of-the-art information retrieval. However, their high-dimensional, high-precision (float32) vector embeddings create significant storage and memory challenges for real-world deployment. To address this, we conduct a rigorous empirical study on the BEIR SciFact benchmark, evaluating the trade-offs between two primary compression strategies: (1) dimensionality reduction via deep autoencoders (AE), compressing the original 384-dimensional vectors to latent spaces ranging from 384 down to 12 dimensions, and (2) precision reduction via quantization (float16, int8, and binary). We systematically compare each method by measuring the "performance loss" (or gain) relative to a float32 baseline across a full suite of retrieval metrics (nDCG, MAP, MRR, Recall, Precision) at various k cutoffs. Our results show that int8 scalar quantization provides the most effective "sweet spot," achieving 4x compression with a negligible (~1-2%) drop in nDCG@10. In contrast, autoencoders degrade gracefully but suffer a more significant performance loss at the equivalent 4x compression ratio (AE-96). Binary quantization proved unsuitable for this task due to catastrophic performance drops. This work provides a practical guide for deploying efficient, high-performance retrieval systems.
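To make the int8 strategy concrete, the following is a minimal illustrative sketch (not the paper's exact implementation) of per-dimension min/max scalar quantization of float32 embeddings, assuming NumPy and a 384-dimensional embedding matrix such as the one described above; the calibration choice (per-dimension min/max) and all function names are assumptions for illustration only.

```python
# Minimal sketch of int8 scalar quantization for dense embeddings.
# Assumptions: NumPy only; per-dimension min/max calibration; embeddings
# are a float32 matrix of shape (n_docs, 384). Not the paper's exact code.
import numpy as np

def quantize_int8(embeddings: np.ndarray):
    """Map float32 embeddings to int8 using per-dimension min/max calibration."""
    mins = embeddings.min(axis=0)
    maxs = embeddings.max(axis=0)
    scales = (maxs - mins) / 255.0          # one scale per dimension
    scales[scales == 0] = 1e-8              # guard against constant dimensions
    q = np.round((embeddings - mins) / scales) - 128
    return q.astype(np.int8), mins, scales

def dequantize_int8(q: np.ndarray, mins: np.ndarray, scales: np.ndarray):
    """Approximately reconstruct float32 embeddings for similarity scoring."""
    return (q.astype(np.float32) + 128) * scales + mins

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 384)).astype(np.float32)  # stand-in for corpus embeddings
    q, mins, scales = quantize_int8(X)
    X_hat = dequantize_int8(q, mins, scales)
    print("compression ratio:", X.nbytes / q.nbytes)      # 4x (float32 -> int8)
    print("max abs reconstruction error:", np.abs(X - X_hat).max())
```

The 4x figure quoted in the abstract follows directly from the storage arithmetic: replacing 4-byte float32 components with 1-byte int8 components shrinks the index by a factor of four, with the small per-dimension calibration vectors adding negligible overhead.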