We investigate whether Large Language Models (LLMs) exhibit altruistic tendencies, and, critically, whether their implicit associations and self-reports predict actual altruistic behavior. Using a multi-method approach inspired by human social psychology, we tested 24 frontier LLMs across three paradigms: (1) an Implicit Association Test (IAT) measuring implicit altruism bias, (2) a forced binary choice task measuring behavioral altruism, and (3) a self-assessment scale measuring explicit altruism beliefs. Our key findings are: (1) All models show a strong implicit pro-altruism bias (mean IAT = 0.87, p < .0001), confirming that models "know" altruism is good. (2) Models behave more altruistically than chance (65.6% vs. 50%, p < .0001), but with substantial variation across models (48–85%). (3) Implicit associations do not predict behavior (r = .22, p = .29). (4) Most critically, models systematically overestimate their own altruism, claiming 77.5% altruism while acting at 65.6% (p < .0001, Cohen's d = 1.08). This "virtue signaling gap" affects 75% of models tested. Based on these findings, we recommend the Calibration Gap (the discrepancy between self-reported and behavioral values) as a standardized alignment metric. Well-calibrated models are more predictable and behaviorally consistent; only 12.5% of models achieve the ideal combination of high prosocial behavior and accurate self-knowledge.
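The proposed Calibration Gap metric can be sketched as a simple per-model computation. The code below is a minimal illustration, not the study's implementation: the function names and the per-model rates are hypothetical, and the one-sample Cohen's d is one common way to standardize the mean gap.

```python
from statistics import mean, stdev

def calibration_gap(self_reported: float, behavioral: float) -> float:
    """Calibration Gap: self-reported altruism rate minus behavioral rate.
    Positive values indicate overclaiming (the "virtue signaling gap")."""
    return self_reported - behavioral

def cohens_d(gaps: list[float]) -> float:
    """One-sample Cohen's d of the gaps against zero (mean / SD)."""
    return mean(gaps) / stdev(gaps)

# Hypothetical per-model rates for illustration (not the paper's data).
self_reports = [0.80, 0.75, 0.78, 0.70]
behaviors = [0.66, 0.60, 0.72, 0.64]

gaps = [calibration_gap(s, b) for s, b in zip(self_reports, behaviors)]
overclaiming_share = sum(g > 0 for g in gaps) / len(gaps)
```

A model is "well calibrated" when its gap is near zero; the paper's ideal quadrant additionally requires a high behavioral altruism rate.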