From Stimuli to Minds: Enhancing Psychological Reasoning in LLMs via Bilateral Reinforcement Learning

Large language models (LLMs) show promise in emotion understanding, social reasoning, and empathy, yet they struggle with psychologically grounded tasks that require inferring implicit mental states in context-rich, ambiguous settings. These limitations arise from the absence of theory-aligned supervision and the difficulty of capturing nuanced mental processes in real-world narratives. To address this gap, we leverage expert-labeled, psychologically rich scenarios and propose a trajectory-aware reinforcement learning framework that explicitly imitates expert psychological thought patterns. By integrating real-world stimuli with structured reasoning guidance, our approach enables compact models to internalize social-cognitive principles, perform nuanced psychological inference, and support continual self-improvement. Comprehensive experiments across multiple benchmarks demonstrate that our models achieve expert-level interpretive capabilities, with strong out-of-distribution generalization and robust continual learning across diverse, challenging, psychologically grounded tasks.
