Diffusion and flow matching models have recently emerged as promising approaches for peptide binder design, but they still face two major challenges. First, categorical sampling of discrete residue types collapses their continuous parameters into one-hot assignments, whereas continuous variables (e.g., atom positions) evolve smoothly throughout the generation process. This mismatch disrupts the update dynamics and results in suboptimal performance. Second, current models assume unimodal distributions for side-chain torsion angles, which conflicts with the inherently multimodal nature of side-chain rotameric states and limits prediction accuracy. To address these limitations, we introduce PepBFN, the first Bayesian flow network for full-atom peptide design, which directly models parameter distributions in a fully continuous space. Specifically, PepBFN models discrete residue types by learning their continuous parameter distributions, enabling joint and smooth Bayesian updates with the other continuous structural parameters. It further employs a novel Gaussian-mixture-based Bayesian flow to capture the multimodal side-chain rotameric states and a matrix-Fisher-based Riemannian flow to directly model residue orientations on the $\mathrm{SO}(3)$ manifold. Together, these parameter distributions are progressively refined via Bayesian updates, yielding smooth and coherent peptide generation. Experiments on side-chain packing, reverse folding, and binder design tasks demonstrate the strong potential of PepBFN in computational peptide design.
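To illustrate what a smooth Bayesian update over residue-type parameters can look like, the following is a minimal sketch of the discrete-data update from the general Bayesian flow network formulation, not PepBFN's actual implementation. The function names, the constant accuracy `alpha`, the step count, and the choice of 20 residue types are all illustrative assumptions. The point is only that the category probabilities move gradually toward the target instead of being collapsed to a one-hot vector by sampling.

```python
import math
import random


def noisy_observation(x, K, alpha, rng):
    """Sample y ~ N(alpha * (K * onehot(x) - 1), alpha * K * I).

    x is the index of the true category; alpha is the accuracy of this step
    (a constant here, which is an assumption made for illustration).
    """
    std = math.sqrt(alpha * K)
    return [
        rng.gauss(alpha * (K * (1.0 if k == x else 0.0) - 1.0), std)
        for k in range(K)
    ]


def bayesian_update(theta, y):
    """Reweight the current probabilities by exp(y) and renormalise."""
    m = max(y)  # subtract the max before exponentiating, for stability
    weights = [t * math.exp(v - m) for t, v in zip(theta, y)]
    total = sum(weights)
    return [w / total for w in weights]


def run_flow(x, K=20, steps=50, alpha=0.2, seed=0):
    """Return the trajectory of probability vectors, starting from uniform."""
    rng = random.Random(seed)
    theta = [1.0 / K] * K  # uniform prior over the K residue types
    trajectory = [theta]
    for _ in range(steps):
        y = noisy_observation(x, K, alpha, rng)
        theta = bayesian_update(theta, y)
        trajectory.append(theta)
    return trajectory


if __name__ == "__main__":
    traj = run_flow(x=3)
    for step in (0, 1, 5, 50):
        print(step, round(traj[step][3], 4))
```

Every intermediate vector in the trajectory remains a valid probability distribution, so it can be updated jointly with continuous structural parameters at each step. In a trained model the noisy observation would be produced from the network's predicted distribution rather than from a known target `x`.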