An Interpretability Illusion for BERT - 专知 Papers

BERT · Taxonomy · linear combination · SimPLe · trace

April 14, 2021

An Interpretability Illusion for BERT

Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg

We describe an "interpretability illusion" that arises when analyzing the BERT model. Activations of individual neurons in the network may spuriously appear to encode a single, simple concept, when in fact they are encoding something far more complex. The same effect holds for linear combinations of activations. We trace the source of this illusion to geometric properties of BERT's embedding space as well as the fact that common text corpora represent only narrow slices of possible English sentences. We provide a taxonomy of model-learned concepts and discuss methodological implications for interpretability research, especially the importance of testing hypotheses on multiple data sets.
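The effect described in the abstract can be illustrated with a toy sketch. This is not the authors' experiment: the 2-D "embeddings", the concept labels, the two corpora, and the helper names `top_activating` and `apparent_concept` are all hypothetical, chosen only to show how the top-activating examples along one fixed direction can suggest a different "concept" depending on which data set is inspected.

```python
from collections import Counter


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def top_activating(dataset, direction, k=3):
    """Return concept labels of the k examples that project most strongly onto `direction`."""
    ranked = sorted(dataset, key=lambda ex: dot(ex["embedding"], direction), reverse=True)
    return [ex["concept"] for ex in ranked[:k]]


def apparent_concept(dataset, direction, k=3):
    """Most common label among the top-k activating examples in this data set."""
    labels = top_activating(dataset, direction, k)
    return Counter(labels).most_common(1)[0][0]


# Hypothetical direction (a neuron, or a linear combination of activations).
direction = (1.0, 1.0)

# Hypothetical corpora: each one covers only a narrow slice of the space.
corpus_a = [
    {"embedding": (3.0, 0.1), "concept": "dates"},
    {"embedding": (2.8, 0.0), "concept": "dates"},
    {"embedding": (2.5, 0.2), "concept": "dates"},
    {"embedding": (0.1, 0.2), "concept": "other"},
    {"embedding": (-1.0, 0.0), "concept": "other"},
]
corpus_b = [
    {"embedding": (0.0, 3.1), "concept": "sports"},
    {"embedding": (0.2, 2.9), "concept": "sports"},
    {"embedding": (0.1, 2.6), "concept": "sports"},
    {"embedding": (0.3, 0.0), "concept": "other"},
    {"embedding": (0.0, -1.0), "concept": "other"},
]

concept_a = apparent_concept(corpus_a, direction)
concept_b = apparent_concept(corpus_b, direction)
print(concept_a, concept_b)
```

Inspecting either corpus alone would suggest that the direction encodes a single, simple concept; only the comparison across both shows that it does not, which is the motivation for testing hypotheses on multiple data sets.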

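Related content

BERT

BERT, short for Bidirectional Encoder Representations from Transformers, is a method for pre-training language representations: a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia) and then used for downstream NLP tasks such as machine translation and question answering.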
