Recent studies indicate that natural language understanding (NLU) models tend to rely on shortcut features for prediction. As a result, these models may fail to generalize to real-world out-of-distribution (OOD) scenarios. In this work, we show that shortcut learning behavior can be explained by the long-tailed phenomenon. We report two findings: 1) trained NLU models have a strong preference for features located at the head of the long-tailed distribution, and 2) shortcut features are picked up during the first few iterations of model training. These two observations are then used to formulate a measurement that quantifies the shortcut degree of each training sample. Based on this shortcut measurement, we propose a shortcut mitigation framework that discourages the model from making overconfident predictions for samples with a large shortcut degree. Experimental results on three NLU benchmarks demonstrate that our long-tailed distribution explanation accurately reflects the shortcut learning behavior of NLU models. Experimental analysis further indicates that our method can improve generalization accuracy on OOD data while preserving accuracy on in-distribution test data.
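
As a rough illustration of what "suppressing overconfident predictions for samples with a large shortcut degree" could look like, the sketch below flattens a predicted class distribution in proportion to a per-sample shortcut degree. This is an assumption made for illustration only: the abstract does not specify the mitigation mechanism, and the function name `smooth_probs`, the exponent rule `1 - shortcut_degree`, and the requirement that the degree lies in [0, 1] are all hypothetical.

```python
def smooth_probs(probs, shortcut_degree):
    """Flatten a class-probability distribution for a shortcut-heavy sample.

    Hypothetical sketch: each probability is raised to the power
    (1 - shortcut_degree) and the result is renormalized. A degree of 0
    leaves the distribution unchanged; a degree of 1 makes it uniform.
    """
    if not 0.0 <= shortcut_degree <= 1.0:
        raise ValueError("shortcut_degree is assumed to lie in [0, 1]")
    exponent = 1.0 - shortcut_degree
    powered = [p ** exponent for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# A confident prediction on a sample with a moderate shortcut degree.
softened = smooth_probs([0.9, 0.1], shortcut_degree=0.5)
```

In a training setup, softened distributions of this kind could serve as targets in place of confident ones, so that samples judged to be shortcut-heavy contribute a weaker signal. Whether this matches the framework proposed in the paper cannot be determined from the abstract alone.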


