We present the data model, design choices, and performance of ProvSQL, a general and easy-to-deploy provenance tracking and probabilistic database system implemented as a PostgreSQL extension. ProvSQL's data and query models closely reflect those of a large core of SQL, including multiset semantics, the full relational algebra, and aggregation. A key part of its implementation relies on generic provenance circuits stored in memory-mapped files. We propose benchmarks to measure the overhead of provenance and probabilistic evaluation, and demonstrate ProvSQL's scalability and competitiveness with respect to other state-of-the-art systems.