Large language models (LLMs) struggle with formal domains that require rigorous logical deduction and symbolic reasoning, such as mathematical proof generation. We propose a neuro-symbolic approach that combines LLMs' generative strengths with structured components to overcome this challenge. As a proof of concept, we focus on geometry problems. Our approach is twofold: (1) we retrieve analogous problems and use their proofs to guide the LLM, and (2) a formal verifier evaluates the generated proofs and provides feedback, helping the model fix incorrect proofs. We demonstrate that our method significantly improves proof accuracy for OpenAI's o1 model (a 58%–70% improvement); both the analogous problems and the verifier's feedback contribute to these gains. More broadly, shifting to LLMs that generate provably correct conclusions could dramatically improve their reliability, accuracy, and consistency, unlocking complex tasks and critical real-world applications that require trustworthiness.
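To make the two-part loop concrete, the sketch below shows one way retrieval-guided generation and verifier feedback could fit together. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`solve_with_feedback`, `generate`, `verify`, `retrieve`), the prompt formats, and the round budget are all hypothetical placeholders.

```python
# Minimal sketch (assumed interfaces, not the paper's actual code) of the
# two-part loop: (1) retrieval-guided proof generation, (2) verifier feedback.
from typing import Callable, Optional

def solve_with_feedback(
    problem: str,
    generate: Callable[[str], str],                      # LLM call (e.g., o1 via an API)
    verify: Callable[[str, str], tuple[bool, str]],      # formal verifier: (ok, feedback)
    retrieve: Callable[[str, int], list[tuple[str, str]]],  # analogous (problem, proof) pairs
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a formally verified proof, or None if the round budget runs out."""
    # (1) Guide the LLM with proofs of analogous problems as in-context exemplars.
    exemplars = retrieve(problem, 3)
    shots = "\n\n".join(f"Problem: {p}\nProof: {pf}" for p, pf in exemplars)
    proof = generate(f"{shots}\n\nProblem: {problem}\nProof:")

    # (2) Check the proof; on failure, feed the verifier's error back for repair.
    for _ in range(max_rounds):
        ok, feedback = verify(problem, proof)
        if ok:
            return proof
        proof = generate(
            f"Problem: {problem}\nIncorrect proof: {proof}\n"
            f"Verifier feedback: {feedback}\nCorrected proof:"
        )
    return None  # no verified proof found within the budget
```

One design point this sketch makes explicit: the verifier's feedback is passed back into the prompt on each round, so the model revises a specific flagged error rather than regenerating the proof from scratch.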