Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on solving college-level mathematics problems. To train REAL-Prover-v1, we developed HERALD-AF, a data extraction pipeline that converts natural language math problems into formal statements, and a new open-source Lean 4 interactive environment (Jixia-interactive) to facilitate synthesis data collection. In our experiments, our prover using only supervised fine-tune achieves competitive results with a 23.7% success rate (Pass@64) on the ProofNet dataset-comparable to state-of-the-art (SOTA) models. To further evaluate our approach, we introduce FATE-M, a new benchmark focused on algebraic problems, where our prover achieves a SOTA success rate of 56.7% (Pass@64).
翻译:当前,形式化定理证明器在中学及竞赛级数学问题上已取得显著进展,但鲜有能推广至更高等数学领域的系统。本文提出REAL-Prover,一种基于Lean 4的新型开源逐步定理证明器,旨在突破此局限。该证明器基于我们微调的大型语言模型(REAL-Prover-v1)并结合检索系统(Leansearch-PS),显著提升了解决大学级数学问题的性能。为训练REAL-Prover-v1,我们开发了HERALD-AF数据提取流程,可将自然语言数学问题转化为形式化陈述,并构建了新的开源Lean 4交互环境(Jixia-interactive)以促进合成数据收集。实验表明,仅使用监督微调的证明器在ProofNet数据集上取得了23.7%的成功率(Pass@64),与当前最优模型性能相当。为进一步评估方法,我们提出了专注于代数问题的新基准FATE-M,在该基准上我们的证明器达到了56.7%的成功率(Pass@64),刷新了当前最优水平。