Branch-and-Bound (B\&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILPs), yet its exponential worst-case time complexity poses significant challenges for large-scale instances. The growing capabilities of machine learning have spurred efforts to improve B\&B by learning data-driven branching policies. However, most existing approaches rely on Imitation Learning (IL), which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen instances. In this work, we propose Tree-Gate Proximal Policy Optimization (TGPPO), a novel framework that employs Proximal Policy Optimization (PPO), a Reinforcement Learning (RL) algorithm, to train a branching policy aimed at improving generalization across heterogeneous MILP instances. Our approach builds on a parameterized state-space representation that dynamically captures the evolving context of the search tree. Empirical evaluations show that TGPPO often outperforms existing learning-based policies in terms of the number of nodes explored and the p-Primal-Dual Integral (PDI), particularly on out-of-distribution instances. These results highlight the potential of RL for developing robust and adaptable branching strategies for MILP solvers.
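For reference, a minimal sketch of the two quantities named above in their standard textbook forms (the paper's exact p-PDI variant and PPO configuration may differ): PPO maximizes the clipped surrogate objective
\[
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\]
where $\hat{A}_t$ is an advantage estimate and $\epsilon$ the clipping parameter, while the primal-dual integral over a time limit $T$ accumulates the gap between the incumbent (primal) bound $z^{\mathrm{primal}}(t)$ and the best dual bound $z^{\mathrm{dual}}(t)$:
\[
\mathrm{PDI}(T) = \int_{0}^{T} \big(z^{\mathrm{primal}}(t) - z^{\mathrm{dual}}(t)\big)\, dt .
\]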