A common method of comparing items is to collect numerical ratings on a linear scale and compare the average rating for each item. However, averaging ratings does not account for people rating according to differing personal rating scales. With this in mind, we investigate the problem of calculating aggregate numerical ratings from individual numerical ratings and propose a new, non-parametric model for the problem. We show that, with minimal modeling assumptions, the standard average is inconsistent for estimating the quality of items. Analyzing the problem of heterogeneous personal rating scales from the perspective of optimal transport, we derive an alternative rating estimator, which we show is asymptotically consistent almost surely and in L^p for estimating quality, with an optimal rate of convergence. Further, we generalize Kendall's W, a non-parametric coefficient of preference concordance between raters, from the special case of rankings to the more general case of arbitrary numerical ratings. Along the way, we prove Glivenko--Cantelli-type theorems for uniform convergence of the cumulative distribution functions and quantile functions for Wasserstein-2 barycenters on [0,1].
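To make the quantile-function view concrete, the following is a minimal sketch and not the estimator derived in the paper. It relies on one standard fact: for distributions on the real line, the quantile function of the equal-weight Wasserstein-2 barycenter is the pointwise average of the individual quantile functions. Everything else is an illustrative assumption: the function names `barycenter_quantiles` and `rescale_ratings`, the use of mid-rank empirical CDF levels, the grid size, and the final step of averaging the rescaled ratings per item.

```python
import numpy as np


def barycenter_quantiles(samples_per_rater, grid):
    """Quantile function of the equal-weight W2 barycenter on `grid`.

    In one dimension this is the pointwise mean of the raters'
    empirical quantile functions.
    """
    q = np.stack([
        np.quantile(np.asarray(s, dtype=float), grid)
        for s in samples_per_rater
    ])
    return q.mean(axis=0)


def rescale_ratings(ratings, grid_size=101):
    """Map every rater onto a common scale (hypothetical helper).

    ratings: 2-D array, rows = raters, columns = items, values in [0, 1].
    Returns (mapped, aggregate): the rescaled ratings and their
    per-item mean. Averaging the rescaled ratings is an assumption
    made for illustration only.
    """
    ratings = np.asarray(ratings, dtype=float)
    n_items = ratings.shape[1]
    grid = np.linspace(0.0, 1.0, grid_size)
    bary_q = barycenter_quantiles(ratings, grid)

    mapped = np.empty_like(ratings)
    for i, row in enumerate(ratings):
        ordered = np.sort(row)
        # Mid-rank empirical CDF level of each rating within this
        # rater's own ratings; ties share the same level.
        lo = np.searchsorted(ordered, row, side="left")
        hi = np.searchsorted(ordered, row, side="right")
        levels = (lo + hi) / (2.0 * n_items)
        # Push each level through the barycenter quantile function.
        mapped[i] = np.interp(levels, grid, bary_q)

    return mapped, mapped.mean(axis=0)
```

Under these assumptions, two raters who order the items identically but use different parts of the scale, for example `[0.1, 0.2, 0.3]` and `[0.5, 0.7, 0.9]`, receive identical rescaled ratings, whereas a plain average of the raw ratings would be pulled toward whichever rater uses the wider range.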