Predictive modelling is often reduced to finding the single best model that optimizes a selected performance measure. But what if the second-best model describes the data equally well, yet in a completely different way? What about the third? Could the most effective models learn completely different relationships in the data? Inspired by Anscombe's quartet, this paper introduces Rashomon's quartet, a synthetic dataset for which four models from different classes have practically identical predictive performance. Their visualizations, however, reveal drastically different ways of understanding the correlation structure in the data. This simple illustrative example aims to promote visualization as a mandatory tool for comparing predictive models beyond their performance. We need to develop insightful techniques for the explanatory analysis of model sets.
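The central claim, that models with near-identical performance can rely on the data in very different ways, can be illustrated with a minimal sketch. This is not the paper's dataset or its four model classes: the toy data, the three hand-written "models", and the names `x1`, `x2`, `r2`, and `models` are all hypothetical, chosen only to show how strongly correlated features let different explanations score almost the same.

```python
import random

random.seed(0)

# Hypothetical toy data (not the paper's quartet): x2 is a noisy copy of x1,
# so the two features are strongly correlated.
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.05) for a in x1]
y = [a + b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]


def r2(pred, target):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1 - ss_res / ss_tot


# Three hand-written "models" that attribute the response to different features.
models = {
    "only_x1": lambda a, b: 2 * a,   # ignores x2 entirely
    "only_x2": lambda a, b: 2 * b,   # ignores x1 entirely
    "both": lambda a, b: a + b,      # splits the effect between x1 and x2
}

scores = {
    name: r2([f(a, b) for a, b in zip(x1, x2)], y)
    for name, f in models.items()
}

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Under these assumptions the three scores should be nearly indistinguishable, even though each model tells a different story about which feature matters. A performance table alone cannot separate them, while plotting each model's response to `x1` and `x2` would.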