Text-to-SQL generation bridges the gap between natural language and databases, enabling users to query data without SQL expertise. Although large language models (LLMs) have significantly advanced the field, complex queries involving multi-table joins, nested conditions, and intricate operations remain challenging. Existing methods often rely on multi-step pipelines that incur high computational cost, increase latency, and are prone to error propagation. To address these limitations, we propose HI-SQL, a pipeline that incorporates a novel hint generation mechanism that uses historical query logs to guide SQL generation. By analyzing prior queries, our method generates contextual hints targeted at the complexities of multi-table and nested operations. These hints are integrated directly into the SQL generation process, eliminating the need for costly multi-step approaches and reducing reliance on human-crafted prompts. Experimental evaluations on multiple benchmark datasets demonstrate that our approach significantly improves the accuracy of LLM-generated queries while remaining efficient in terms of LLM calls and latency, offering a robust and practical solution for enhancing Text-to-SQL systems.
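As a rough illustration of the idea of mining historical query logs for hints, the sketch below counts join conditions and nested subqueries in a small log and folds the resulting hints into a single prompt. It is not the HI-SQL implementation: the log contents, the regex-based pattern extraction, the hint wording, and the names `QUERY_LOG`, `generate_hints`, and `build_prompt` are all assumptions made for this example, and the abstract does not specify how hints are actually derived or formatted.

```python
import re
from collections import Counter

# Hypothetical query log; the real log format is not described in the abstract.
QUERY_LOG = [
    "SELECT o.id FROM orders o JOIN customers c ON o.customer_id = c.id",
    "SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id "
    "WHERE o.total > 100",
    "SELECT name FROM customers WHERE id IN (SELECT customer_id FROM orders)",
]

# Words that can follow a table name but are not aliases.
_KEYWORDS = {"WHERE", "JOIN", "ON", "GROUP", "ORDER", "INNER", "LEFT",
             "RIGHT", "FULL", "CROSS", "LIMIT", "HAVING", "UNION"}
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(\w+))?",
                        re.IGNORECASE)
_JOIN_COND = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)",
                        re.IGNORECASE)
_SUBQUERY = re.compile(r"\(\s*SELECT\b", re.IGNORECASE)


def _alias_map(sql):
    """Map each table name and alias in one query to its table name."""
    aliases = {}
    for table, alias in _TABLE_REF.findall(sql):
        aliases[table] = table
        if alias and alias.upper() not in _KEYWORDS:
            aliases[alias] = table
    return aliases


def generate_hints(log, top_k=3):
    """Derive plain-text hints from join and nesting patterns in the log."""
    join_counts = Counter()
    nested = 0
    for sql in log:
        aliases = _alias_map(sql)
        for lt, lc, rt, rc in _JOIN_COND.findall(sql):
            left = f"{aliases.get(lt, lt)}.{lc}"
            right = f"{aliases.get(rt, rt)}.{rc}"
            # Sort the two sides so a = b and b = a count as the same join.
            join_counts[tuple(sorted((left, right)))] += 1
        if _SUBQUERY.search(sql):
            nested += 1
    hints = [f"Join {a} = {b} (seen {n}x in the log)."
             for (a, b), n in join_counts.most_common(top_k)]
    if nested:
        hints.append(f"Nested subqueries appear in {nested} of {len(log)} "
                     "logged queries.")
    return hints


def build_prompt(question, hints):
    """Assemble one prompt so SQL generation needs a single LLM call."""
    hint_block = "\n".join(f"- {h}" for h in hints)
    return f"Hints:\n{hint_block}\n\nQuestion: {question}\nSQL:"
```

Calling `build_prompt("Which customers placed orders over 100?", generate_hints(QUERY_LOG))` yields one prompt string that could be sent to a model in a single call. Regexes are used here only to keep the sketch dependency-free; a production system would more plausibly use a proper SQL parser.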