Shortcuts, spurious rules that perform well during training but fail to generalize, pose a major challenge to the reliability of deep networks (Geirhos et al., 2020). However, the impact of shortcuts on feature representations remains understudied, obstructing the design of principled shortcut-mitigation methods. To close this gap, we investigate the layer-wise localization of shortcuts in deep models. Our novel experimental design quantifies each layer's contribution to the accuracy degradation caused by a shortcut-inducing skew, using counterfactual training on clean and skewed datasets. We apply this design to study shortcuts on the CIFAR-10, Waterbirds, and CelebA datasets across VGG, ResNet, DeiT, and ConvNeXt architectures. We find that shortcut learning is not localized in specific layers but distributed throughout the network. Different parts of the network play different roles in this process: shallow layers predominantly encode spurious features, while deeper layers predominantly forget core features that are predictive on clean data. We also analyze how localization differs across settings and describe its principal axes of variation. Finally, our analysis of layer-wise shortcut-mitigation strategies suggests that general-purpose methods are hard to design, supporting dataset- and architecture-specific approaches instead.
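To make the layer-wise attribution concrete, the sketch below illustrates one simplified probe in the spirit of the design described above: given a model trained on clean data and a counterpart trained on skewed data, transplant one block of weights at a time from the skew-trained model into the clean-trained model and measure the accuracy drop on clean test data. This is a minimal PyTorch sketch under our own assumptions (swap-and-evaluate rather than the paper's full counterfactual-training protocol); `clean_model`, `skewed_model`, and the grouping by top-level child modules are illustrative, not the authors' exact setup.

```python
# Hypothetical layer-wise shortcut probe: swap each block of weights from a
# skew-trained model into a clean-trained model and record the accuracy drop
# on clean test data. A simplification of the counterfactual design above.
import copy
import torch


@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of `model` on the examples in `loader`."""
    model.eval().to(device)
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total


def layerwise_degradation(clean_model, skewed_model, clean_test_loader, device="cpu"):
    """For each top-level block, overwrite its parameters in a copy of the
    clean-trained model with the skew-trained weights and measure how much
    clean-test accuracy drops. Larger drops attribute more degradation to
    that block."""
    base_acc = accuracy(clean_model, clean_test_loader, device)
    skewed_state = skewed_model.state_dict()
    drops = {}
    for block_name, _ in clean_model.named_children():
        hybrid = copy.deepcopy(clean_model)
        # Keep only the state-dict entries belonging to this block.
        patch = {k: v for k, v in skewed_state.items()
                 if k.startswith(block_name + ".")}
        hybrid.load_state_dict({**hybrid.state_dict(), **patch})
        drops[block_name] = base_acc - accuracy(hybrid, clean_test_loader, device)
    return drops
```

A finer-grained variant could iterate over individual layers instead of top-level blocks, or retrain the transplanted block on the other dataset rather than copying weights; the choice of granularity is an assumption of this sketch.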