The convergence of statistical learning and molecular physics is transforming how biomolecular systems are modeled. Physics-informed machine learning (PIML) offers a systematic framework that integrates data-driven inference with physical constraints, yielding models that are accurate, mechanistically grounded, generalizable, and able to extrapolate beyond the domains in which they were trained. This review surveys recent advances in physics-informed neural networks and operator learning, differentiable molecular simulation, and hybrid physics-ML potentials, with emphasis on long-timescale kinetics, rare events, and free-energy estimation. We frame these approaches as solutions to the "biomolecular closure problem": recovering unresolved interactions beyond classical force fields while preserving thermodynamic consistency and mechanistic interpretability. We examine theoretical foundations, tools and frameworks, computational trade-offs, and open issues, including model expressiveness and stability. Finally, we outline prospective research directions at the intersection of machine learning, statistical physics, and computational chemistry, and argue that future progress will depend on mechanistic inductive biases and on integrated, differentiable physical learning frameworks for biomolecular simulation and discovery.