RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services

As a key medium for human interaction and information exchange, social networking services (SNS) pose unique challenges for large language models (LLMs): heterogeneous workloads, fast-shifting norms and slang, and multilingual, culturally diverse corpora that induce sharp distribution shift. Supervised fine-tuning (SFT) can specialize models but often triggers a "seesaw" between in-distribution gains and out-of-distribution robustness, especially for smaller models. To address these challenges, we introduce RedOne 2.0, an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm designed for rapid and stable adaptation. The pipeline consists of three stages: (1) Exploratory Learning on curated SNS corpora to establish initial alignment and identify systematic weaknesses; (2) Targeted Fine-Tuning that selectively applies SFT to the diagnosed gaps while mixing in a small fraction of general data to mitigate forgetting; and (3) Refinement Learning that re-applies RL with SNS-centric signals to consolidate improvements and harmonize trade-offs across tasks. Across various tasks spanning three categories, our 4B-scale model delivers an average improvement of about 2.41 over the 7B sub-optimal baseline. Additionally, RedOne 2.0 achieves an average performance lift of about 8.74 over the base model with less than half the data required by the SFT-centric method RedOne, evidencing superior data efficiency and stability at compact scales. Overall, RedOne 2.0 establishes a competitive, cost-effective baseline for domain-specific LLMs in SNS scenarios, advancing capability without sacrificing robustness.
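As a rough illustration of the data selection implied by stage (2), the sketch below picks SNS examples for tasks that scored poorly after stage (1) and mixes in a small share of general data. The abstract does not say how gaps are diagnosed or what mixing ratio is used, so the function name, the `weak_threshold` and `general_fraction` parameters, and the example record layout are all hypothetical; this is not the authors' implementation.

```python
import random


def build_targeted_sft_mix(task_scores, sns_examples, general_examples,
                           weak_threshold=0.6, general_fraction=0.1, seed=0):
    """Hypothetical sketch: build an SFT set from weak tasks plus a little general data.

    task_scores: dict mapping task name -> score measured after stage 1
    sns_examples / general_examples: lists of dicts; SNS examples carry a "task" key
    weak_threshold, general_fraction: illustrative values, not taken from the paper
    """
    if not 0.0 <= general_fraction < 1.0:
        raise ValueError("general_fraction must be in [0, 1)")
    rng = random.Random(seed)

    # Tasks whose stage-1 score falls below the assumed threshold count as gaps.
    weak_tasks = {task for task, score in task_scores.items() if score < weak_threshold}
    targeted = [ex for ex in sns_examples if ex["task"] in weak_tasks]

    # Choose enough general examples to make up about general_fraction of the mix.
    n_general = round(len(targeted) * general_fraction / (1.0 - general_fraction))
    n_general = min(n_general, len(general_examples))

    mix = targeted + rng.sample(general_examples, n_general)
    rng.shuffle(mix)
    return weak_tasks, mix
```

With nine targeted examples and `general_fraction=0.1`, the function adds one general example, so general data is about a tenth of the resulting set.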

