Can Large Language Models Solve Path Constraints in Symbolic Execution?

Symbolic execution is an important software analysis technique that benefits downstream tasks such as software testing and debugging. However, several limitations hinder its application to real-world software. One such limitation is the inability to solve diverse execution path constraints: traditional symbolic execution based on SMT solvers struggles to handle execution paths that involve complex data structures or external API calls. In this paper, we investigate whether large language models (LLMs) can replace traditional solver-based techniques for path constraint solving in symbolic execution. We conduct an empirical study to evaluate the ability of LLMs in two types of path constraint solving: generating test inputs that drive execution along a given path, and determining whether a given execution path can be satisfied without triggering any bugs. We build new evaluation pipelines and benchmarks for the two corresponding tasks, test case generation and path classification, drawing data from both competition-level programs and real-world repositories. Our experimental results show that state-of-the-art LLMs are able to solve path constraints in both the generation and classification tasks, with 60% of generated test cases accurately covering the given execution path. Moreover, LLMs can improve test coverage by covering execution paths in real-world repositories where traditional symbolic execution tools cannot be applied. These findings highlight the possibility of extending symbolic execution techniques with LLMs in the future to improve the capability and generalizability of symbolic execution.
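To make the test case generation task concrete, the following is a minimal sketch of how one might check whether a candidate input covers a target execution path. It is not the paper's evaluation pipeline. The toy program, the `covers_path` helper, the branch labels, and the candidate inputs (which stand in for LLM-generated test inputs) are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# A path is modeled as the ordered list of (branch label, outcome) pairs.
Trace = List[Tuple[str, bool]]


def program_under_test(text: str, trace: Trace) -> str:
    """Hypothetical toy program whose branches depend on string operations."""
    is_long = len(text) > 3
    trace.append(("len(text) > 3", is_long))
    if is_long:
        starts_ab = text.lower().startswith("ab")
        trace.append(("lower().startswith('ab')", starts_ab))
        if starts_ab:
            return "long-ab"
        return "long-other"
    return "short"


def covers_path(program: Callable[[str, Trace], str],
                candidate: str,
                target: Trace) -> bool:
    """Run the program on a candidate input and compare the recorded
    branch outcomes against the target path."""
    trace: Trace = []
    program(candidate, trace)
    return trace == target


# Target path: both branch conditions evaluate to True.
target_path: Trace = [
    ("len(text) > 3", True),
    ("lower().startswith('ab')", True),
]

# Hypothetical candidate inputs standing in for LLM-generated test cases.
candidates = ["ABcd", "abc", "xyzw"]
results = {c: covers_path(program_under_test, c, target_path) for c in candidates}

# Fraction of candidates that exactly cover the target path.
coverage_rate = sum(results.values()) / len(candidates)
print(results, coverage_rate)
```

In this sketch a candidate counts as covering the path only if every recorded branch outcome matches the target in order, which is one plausible reading of "accurately cover the given execution path"; the paper's exact criterion may differ.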
