Join ordering is the NP-hard problem of selecting the most efficient sequence in which to evaluate joins (conjunctive, binary operators) in a database query. As the performance of query execution critically depends on this choice, join ordering lies at the core of query optimization. Traditional approaches cast this problem as a discrete combinatorial search over binary trees guided by a cost model, but they often suffer from high computational complexity and limited scalability. We show that, when the cost model is differentiable, query plans can be continuously relaxed into a soft adjacency matrix representing a superposition of plans. This continuous relaxation, together with a Gumbel-Softmax parameterization of the adjacency matrix and differentiable constraints enforcing plan validity, enables gradient-based search for plans within the relaxed space. Using a learned Graph Neural Network as the cost model, we demonstrate that this gradient-based approach finds plans of comparable, and in some cases lower, cost than traditional discrete local-search methods on two different graph datasets. Furthermore, we empirically show that the runtime of this approach scales linearly with query size, in contrast to the quadratic or exponential runtimes of classical approaches. We believe this first step towards gradient-based join ordering can lead to more effective and efficient query optimizers in the future.
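As a rough illustration (not the paper's implementation), the Gumbel-Softmax relaxation mentioned above can be sketched as follows: each row of a learnable logit matrix is turned into a soft one-hot distribution over join partners, yielding a differentiable "superposition" of adjacency matrices. All names and shapes here are hypothetical.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Row-wise Gumbel-Softmax relaxation of a discrete choice.

    Adds Gumbel noise to the logits and applies a temperature-scaled
    softmax over each row, producing a soft adjacency matrix whose rows
    sum to 1. Lower tau pushes rows toward hard one-hot choices.
    """
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))          # Gumbel(0, 1) noise
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical example: learnable logits over a 4-relation query.
n = 4
logits = np.zeros((n, n))                 # would be trained by gradient descent
soft_adj = gumbel_softmax(logits, tau=0.5)
```

Because the relaxation is differentiable, a gradient signal from a differentiable cost model can flow back through `soft_adj` to the logits; annealing `tau` toward 0 recovers discrete plan choices.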