Local surrogate approaches for explaining machine learning model predictions have appealing properties, such as being model-agnostic and flexible in their modelling. Several methods exist that fit this description and share this goal. However, despite their shared overall procedure, they set out different objectives, extract different information from the black-box, and consequently produce diverse explanations that are, in general, incomparable. In this work we review the similarities and differences amongst multiple methods, with a particular focus on what information they extract from the model, as this has a large impact on the output: the explanation. We discuss the implications that this lack of agreement, and clarity, amongst the methods' objectives has for the research and practice of explainability.