Two-alternative forced choice (2AFC) experiments are widely used in the visual perception literature to understand how human observers perceive distances within triplets made of a reference image and two distorted versions of it. Previously, such experiments were conducted in controlled environments, with images shared between triplets, making it possible to rank the perceived quality and evaluate perceptual distance models against that ranking. Recently, crowd-sourced perceptual datasets have emerged in which no images are shared between triplets, making ranking infeasible. Evaluations using these data reduce the judgements on a triplet to a binary decision, namely, whether the distance model agrees with the human decision, which is suboptimal and prone to misleading conclusions. Instead, we statistically model the underlying decision-making process in 2AFC experiments using a binomial distribution. For each distance model, we estimate a smooth and consistent distribution of the judgements on the reference-distorted distance plane. We estimate the parameter of the local binomial distribution by maximum likelihood, and obtain a global measure of the expected log-likelihood of the judgements. This yields meaningful and well-founded metrics beyond mere prediction accuracy measured as percentage agreement, which we compare with a neural network counterpart that is also optimised to maximise likelihood under a binomial model.
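
As a rough illustration of the binomial view described above, the following minimal sketch computes the maximum-likelihood estimate of the binomial parameter for a single triplet and compares its log-likelihood with that of a chance-level model. The function names (`binomial_mle`, `binomial_log_likelihood`) and the judgement counts are hypothetical and chosen only for illustration. The sketch pools the judgements of one triplet, and does not reproduce the smooth local estimate on the reference-distorted distance plane.

```python
import math


def binomial_mle(k, n):
    """Maximum-likelihood estimate of the binomial parameter: k successes in n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    return k / n


def binomial_log_likelihood(k, n, p, eps=1e-12):
    """Log-probability of observing k successes in n trials under Binomial(n, p)."""
    # Clamp p away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, eps), 1.0 - eps)
    # log of the binomial coefficient C(n, k), computed via log-gamma.
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


# Hypothetical triplet: 5 observers, 4 of whom judged the first
# distorted image to be closer to the reference.
k, n = 4, 5
p_hat = binomial_mle(k, n)
ll_mle = binomial_log_likelihood(k, n, p_hat)
ll_chance = binomial_log_likelihood(k, n, 0.5)

print(f"p_hat = {p_hat}")
print(f"log-likelihood at p_hat: {ll_mle:.4f}")
print(f"log-likelihood at p = 0.5: {ll_chance:.4f}")
```

Unlike a binary agree/disagree score, the log-likelihood distinguishes a 4-to-1 split from a unanimous one, which is the kind of information that percentage agreement discards.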