Can in-context learning (ICL) override pre-trained label semantics, or does it merely refine an existing semantic backbone? We address this question by treating LLMs as prompt-induced classifiers and contrasting their behavior under \emph{natural} demonstrations (with correct labels) and \emph{inverted} demonstrations (with systematically flipped label meanings). We decompose ICL behavior into three alignment metrics (truth, prior, and prompt alignment) and introduce a semantic override rate, defined as correctness under flipped semantics. Across eight classification tasks and eight open-source LLMs (1--12B parameters), we find consistent evidence for a semantic-anchor view. With natural demonstrations, ICL improves accuracy while maintaining strong prior alignment; most correct predictions coincide with zero-shot behavior, even when the prior is weak. With inverted demonstrations, models cannot learn coherent anti-semantic classifiers: prompt alignment increases only at the cost of accuracy, and semantic override rates remain exactly zero in our few-shot 1--12B setting. Rather than flexibly remapping label meanings, ICL primarily adjusts how inputs project onto stable semantic directions learned during pre-training. These findings clarify fundamental limits of few-shot prompting and suggest that overriding label semantics at these scales requires interventions beyond ICL. All code is available at: https://github.com/AnanthaPadmanaban-KrishnaKumar/semantic-anchors-icl.
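To make the evaluation protocol concrete, the sketch below shows one way the three alignment metrics and the semantic override rate could be computed from per-example predictions. The function name, signature, and in particular the override criterion (following the flipped label mapping while departing from the zero-shot prior) are illustrative assumptions, not the paper's reference implementation; see the linked repository for the actual code.

```python
# Minimal sketch (assumed definitions, not the authors' reference code) of
# the alignment metrics described in the abstract.
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlignmentReport:
    truth_alignment: float   # P(prediction == gold label)
    prior_alignment: float   # P(prediction == zero-shot prediction)
    prompt_alignment: float  # P(prediction == label implied by the demos)
    override_rate: float     # correctness under flipped semantics (assumed reading)


def alignment_report(
    preds: Sequence[str],          # predictions under the demonstration condition
    gold: Sequence[str],           # true labels
    zero_shot: Sequence[str],      # zero-shot predictions (the semantic prior)
    prompt_labels: Sequence[str],  # labels the demonstrations imply: flipped in
                                   # the inverted condition, gold in the natural one
) -> AlignmentReport:
    n = len(preds)
    assert n == len(gold) == len(zero_shot) == len(prompt_labels)

    def rate(pairs) -> float:
        return sum(a == b for a, b in pairs) / n

    truth = rate(zip(preds, gold))
    prior = rate(zip(preds, zero_shot))
    prompt = rate(zip(preds, prompt_labels))
    # Assumed per-example reading of "correctness under flipped semantics":
    # the model follows the flipped mapping AND departs from its own
    # zero-shot prior, so mere prior-copying never counts as an override.
    override = sum(
        p == q and p != z
        for p, q, z in zip(preds, prompt_labels, zero_shot)
    ) / n
    return AlignmentReport(truth, prior, prompt, override)
```

Note that in the natural condition prompt_labels equals gold, so prompt alignment collapses to truth alignment; conditioning the override rate on disagreement with the zero-shot prior is one way to separate a genuinely remapped classifier from accidental agreement with the flipped labels.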