As a key medium for human interaction and information exchange, social networking services (SNS) pose unique challenges for large language models (LLMs): heterogeneous workloads, fast-shifting norms and slang, and multilingual, culturally diverse corpora that induce sharp distribution shift. Supervised fine-tuning (SFT) can specialize models but often triggers a ``seesaw'' between in-distribution gains and out-of-distribution robustness, especially for smaller models. To address these challenges, we introduce RedOne 2.0, an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm designed for rapid and stable adaptation. The pipeline consists of three stages: (1) Exploratory Learning on curated SNS corpora to establish initial alignment and identify systematic weaknesses; (2) Targeted Fine-Tuning, which selectively applies SFT to the diagnosed gaps while mixing in a small fraction of general data to mitigate forgetting; and (3) Refinement Learning, which re-applies RL with SNS-centric signals to consolidate improvements and harmonize trade-offs across tasks. Across various tasks spanning three categories, our 4B-scale model delivers an average improvement of about 2.41 points over the 7B sub-optimal baseline. Additionally, RedOne 2.0 achieves an average performance lift of about 8.74 points over the base model with less than half the data required by the SFT-centric method RedOne, evidencing superior data efficiency and stability at compact scales. Overall, RedOne 2.0 establishes a competitive, cost-effective baseline for domain-specific LLMs in SNS scenarios, advancing capability without sacrificing robustness.
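The three-stage pipeline above can be sketched schematically. This is a toy illustration, not the authors' implementation: the "model" is just a dict of per-task scores, and all function names, update rules, and constants (e.g. `GENERAL_MIX`) are hypothetical placeholders chosen to mirror the stage structure (exploratory RL with gap diagnosis, targeted SFT with general-data replay, refinement RL).

```python
# Toy sketch of the progressive, RL-prioritized post-training pipeline:
# (1) exploratory learning, (2) targeted fine-tuning, (3) refinement.
# All names and update rules are hypothetical placeholders.

GENERAL_MIX = 0.1  # small fraction of general data mixed into stage 2


def exploratory_learning(model, sns_tasks):
    """Stage 1: a coarse RL-style pass over SNS tasks; returns the model
    and the tasks diagnosed as systematic weaknesses (lowest scores)."""
    for task in sns_tasks:
        model[task] = model.get(task, 0.0) + 1.0  # initial alignment gain
    threshold = sorted(model[t] for t in sns_tasks)[len(sns_tasks) // 2]
    gaps = [t for t in sns_tasks if model[t] <= threshold]
    return model, gaps


def targeted_fine_tuning(model, gaps, general_tasks):
    """Stage 2: SFT focused on diagnosed gaps, mixing in a small share
    of general data to mitigate forgetting."""
    for task in gaps:
        model[task] += 2.0  # larger, targeted SFT gain on weak tasks
    n_general = max(1, int(GENERAL_MIX * len(gaps)))
    for task in general_tasks[:n_general]:
        model[task] = model.get(task, 0.0) + 0.5  # replay keeps generality
    return model


def refinement_learning(model, sns_tasks):
    """Stage 3: re-apply RL with SNS-centric signals to consolidate
    gains and harmonize trade-offs (here: pull scores toward balance)."""
    mean = sum(model[t] for t in sns_tasks) / len(sns_tasks)
    for task in sns_tasks:
        model[task] += 0.5 * (mean - model[task])
    return model


def post_train(sns_tasks, general_tasks):
    # Placeholder base-model scores, varied so gap diagnosis is non-trivial.
    model = {t: float(i) for i, t in enumerate(sns_tasks)}
    model, gaps = exploratory_learning(model, sns_tasks)
    model = targeted_fine_tuning(model, gaps, general_tasks)
    return refinement_learning(model, sns_tasks)
```

For example, `post_train(["query", "dialogue", "translation"], ["math"])` boosts the two lowest-scoring SNS tasks in stage 2, touches one general task via replay, and then narrows the spread across SNS tasks in stage 3.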