g-DPO: Scalable Preference Optimization for Protein Language Models

Source: arXiv. Accepted at two workshops: FM4LS at NeurIPS 2025 (https://nips2025fm4ls.github.io/pages/accepted-paper.html) and MLSB at EurIPS 2025 in Copenhagen.

Direct Preference Optimization (DPO) is an effective approach for aligning protein language models with experimental design goals. However, DPO faces a scalability bottleneck: the number of possible training pairs grows quadratically with the number of labeled sequences, leading to prohibitive training times even for modestly sized datasets. We introduce g-DPO, a framework that (i) uses sequence space clustering to prune redundant pairs while preserving training signal, and (ii) amortizes likelihood computations with group-based approximations. Across three protein engineering tasks, g-DPO maintains in silico and in vitro performance that is statistically indistinguishable from standard DPO, while converging 1.7x to 5.4x faster, with speedups that scale with dataset size and the structure of the underlying mutational landscape.
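To make the quadratic bottleneck concrete, here is a minimal Python sketch. It counts the unordered pairs available from n labeled sequences, n(n - 1)/2, and then applies a toy pruning rule that keeps only pairs whose members share a cluster. The function names, the toy labels, the cluster assignments, and the within-cluster rule itself are illustrative assumptions; the abstract does not specify how g-DPO selects pairs, so this should not be read as the authors' method.

```python
from itertools import combinations


def count_all_pairs(n: int) -> int:
    """Number of unordered pairs among n labeled sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def pairs_within_clusters(labels, cluster_ids):
    """Toy pruning rule (an assumption, not the paper's rule).

    Keeps a (preferred, dispreferred) index pair only when both sequences
    fall in the same cluster and their labels differ.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if cluster_ids[i] != cluster_ids[j] or labels[i] == labels[j]:
            continue
        # Order the pair so the higher-scoring sequence comes first.
        winner, loser = (i, j) if labels[i] > labels[j] else (j, i)
        pairs.append((winner, loser))
    return pairs


# Hypothetical fitness labels and cluster assignments for six sequences.
labels = [0.9, 0.4, 0.7, 0.1, 0.5, 0.3]
cluster_ids = [0, 0, 0, 1, 1, 1]

kept = pairs_within_clusters(labels, cluster_ids)
print(count_all_pairs(len(labels)), len(kept))
```

With these toy inputs, 6 of the 15 possible pairs survive the pruning. At 1,000 labeled sequences the unpruned count is already 499,500 pairs, which is why pair selection dominates training cost as datasets grow.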
