Interpreto: An Explainability Library for Transformers

Antonin Poché, Thomas Mullor, Gabriele Sarti, Frédéric Boisnard, Corentin Friedrich, Charlotte Claye, François Hoofd, Raphael Bernas, Céline Hudelot, Fanny Jourdan

From arXiv. Equal contribution: Poché and Jourdan.

Interpreto is a Python library for post-hoc explainability of HuggingFace text models, from early BERT variants to LLMs. It provides two complementary families of methods: attributions and concept-based explanations. The library connects recent research to practical tooling for data scientists and aims to make explanations accessible to end users. It ships with documentation, examples, and tutorials. Interpreto supports both classification and generation models through a unified API. A key differentiator is its concept-based functionality, which goes beyond feature-level attributions and is uncommon in existing libraries. The library is open source and can be installed with pip install interpreto. Code and documentation are available at https://github.com/FOR-sight-ai/interpreto.
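
To make the term "feature-level attribution" concrete, here is a minimal sketch of one generic attribution technique, occlusion: each token is scored by how much the model output drops when that token is removed. This is not Interpreto's API. The scorer toy_score, its weights, and the helper occlusion_attribution are hypothetical stand-ins chosen so the example runs without any model download; a real workflow would replace the toy scorer with a HuggingFace model's output.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": a bag-of-words sentiment scorer.
# It stands in for a real classifier and is an assumption of this sketch.
TOY_WEIGHTS = {"great": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}


def toy_score(tokens: List[str]) -> float:
    """Sum the per-word weights; unknown words contribute 0."""
    return sum(TOY_WEIGHTS.get(tok.lower(), 0.0) for tok in tokens)


def occlusion_attribution(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Attribute the score to each token as the score drop when it is removed."""
    baseline = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        # Remove token i and measure how far the score moves from the baseline.
        occluded = tokens[:i] + tokens[i + 1:]
        attributions.append((tok, baseline - score_fn(occluded)))
    return attributions


if __name__ == "__main__":
    sentence = "The plot was great but the ending was bad".split()
    for tok, attr in occlusion_attribution(sentence, toy_score):
        print(f"{tok:>8s} {attr:+.1f}")
```

Positive values mark tokens that push the score up and negative values mark tokens that push it down. Concept-based explanations, the second family mentioned above, work at the level of learned directions in a model's internal representations instead of individual input tokens, so they are not covered by this sketch.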
