Our goal is to find combinations of facts that optimally summarize data sets. We consider this problem in the context of voice query interfaces for simple, exploratory data analysis. Here, the system answers voice queries with a short summary of relevant data. Finding optimal voice data summaries is computationally expensive. Prior work in this domain has exploited sampling and incremental processing. Instead, we rely on a pre-processing stage that generates summaries of data subsets in a batch operation. This step reduces run-time overhead by orders of magnitude. We present multiple algorithms for the pre-processing stage, realizing different tradeoffs between optimality and data-processing overhead. We analyze our algorithms formally and compare them experimentally with prior methods for generating voice data summaries. We report on multiple user studies with a prototype system implementing our approach. Furthermore, we report on insights gained from a public deployment of our system on the Google Assistant Platform.