From Stimuli to Minds: Enhancing Psychological Reasoning in LLMs via Bilateral Reinforcement Learning

Large language models (LLMs) show promise in emotion understanding, social reasoning, and empathy, yet they struggle with psychologically grounded tasks that require inferring implicit mental states in context-rich, ambiguous settings. These limitations stem from the absence of theory-aligned supervision and from the difficulty of capturing nuanced mental processes in real-world narratives. To address this gap, we leverage expert-labeled, psychologically rich scenarios and propose a trajectory-aware reinforcement learning framework that explicitly imitates the thought patterns of expert psychologists. By integrating real-world stimuli with structured reasoning guidance, our approach enables compact models to internalize social-cognitive principles, perform nuanced psychological inference, and continually improve themselves. Comprehensive experiments across multiple benchmarks show that our models achieve expert-level interpretive capabilities, with strong out-of-distribution generalization and robust continual learning across diverse, challenging, psychologically grounded tasks.
