The retrievability of a document is a collection-based statistic that measures the document's expected (reciprocal) rank when it is retrieved within a specific rank cut-off. Uniformly distributed retrievability scores across the documents of a collection indicate fair document exposure. While retrievability scores have been used to quantify the fairness of exposure for a collection, in our work we use the distribution of retrievability scores to measure the exposure bias of retrieval models. We hypothesise that an uneven distribution of retrievability scores across the entire collection may not accurately reflect exposure bias, but may instead indicate variations in topical relevance. As a solution, we propose a topic-focused, localised retrievability measure called \textit{T-Retrievability} (topic-retrievability). It first computes retrievability scores over multiple groups of topically related documents and then aggregates these localised values to obtain collection-level statistics. Our analysis using the proposed T-Retrievability measure uncovers new insights into the exposure characteristics of various neural ranking models. The findings suggest that this localised measure provides a more nuanced understanding of exposure fairness and a more reliable approach for assessing document accessibility in IR systems.
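The two-stage idea described above (localised scores first, aggregation second) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: inequality is summarised with the Gini coefficient, the topical groups are supplied as a precomputed mapping, retrievability is accumulated over one shared set of query rankings, and the per-group values are aggregated by an unweighted mean. All function and parameter names (`retrievability`, `gini`, `t_retrievability_gini`, `topic_groups`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def retrievability(rankings, cutoff, reciprocal=True):
    """Accumulate a retrievability score per document.

    rankings: list of ranked doc-id lists, one per query (assumed input).
    cutoff:   rank cut-off c; documents ranked below it contribute nothing.
    reciprocal=True adds 1/rank per retrieval, otherwise adds 1.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc in enumerate(ranked[:cutoff], start=1):
            scores[doc] += 1.0 / rank if reciprocal else 1.0
    return dict(scores)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def t_retrievability_gini(rankings, topic_groups, cutoff):
    """Mean of per-topic Gini values (the aggregation rule is an assumption).

    topic_groups: dict mapping a topic label to a list of doc ids
    (hypothetical; how groups are formed is not specified here).
    """
    scores = retrievability(rankings, cutoff)
    per_group = [
        gini([scores.get(doc, 0.0) for doc in docs])
        for docs in topic_groups.values()
        if docs
    ]
    return sum(per_group) / len(per_group) if per_group else 0.0


# Toy data: document "a" dominates globally, yet each topic is internally even.
rankings = [["a", "b", "c"], ["a", "c", "b"]]
topic_groups = {"t1": ["a"], "t2": ["b", "c"]}
scores = retrievability(rankings, cutoff=2)
global_gini = gini(scores.values())
local_gini = t_retrievability_gini(rankings, topic_groups, cutoff=2)
```

In this toy case the collection-wide Gini is non-zero while the topic-level aggregate is zero, which is the kind of discrepancy the hypothesis above is concerned with. The sketch is not the measure as defined in the paper.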