分布式可恢复草图（扩展版） (Distributed Recoverable Sketches (Extended Version))

Sketches are commonly used in computer systems and network monitoring tools to provide efficient query executions while maintaining a compact data representation. Switches and routers maintain sketches to track statistical characteristics of network traffic. The availability of such data is essential for the network analysis as a whole. Consequently, being able to recover sketches is critical after a switch crash. In this work, we explore how nodes in a network environment can cooperate to recover sketch data whenever any subset of them crashes. In particular, we focus on frequency estimation linear sketches, such as the Count-Min Sketch. We consider various approaches to ensure data reliability and explore the trade-offs between space consumption, runtime overheads, and traffic during recovery, which we point out as design guidelines. Besides different aspects of efficacy, we design a modular system for ease of maintenance and further scaling. A key aspect we examine is how the nodes update each other regarding their sketch content as it evolves over time. In particular, we compare periodic full updates vs incremental updates. We also examine several data structures to economically represent and encode a batch of latest changes. Our framework is generic, and other data structures can be plugged-in via an abstract API as long as they implement the corresponding API methods.

翻译：草图在计算机系统和网络监控工具中被广泛使用，以在保持紧凑数据表示的同时提供高效的查询执行。交换机和路由器维护草图以追踪网络流量的统计特性。此类数据的可用性对整个网络分析至关重要。因此，在交换机发生故障后能够恢复草图数据极为关键。在本研究中，我们探讨了网络环境中的节点如何协作，以便在任何节点子集发生故障时恢复草图数据。我们特别关注频率估计线性草图，例如Count-Min草图。我们考虑了多种确保数据可靠性的方法，并深入分析了存储空间消耗、运行时开销以及恢复期间通信流量之间的权衡，这些权衡点被我们总结为设计准则。除了效能的不同方面外，我们设计了一个模块化系统，以简化维护并支持进一步扩展。我们研究的一个关键方面是节点如何随着草图内容随时间演变而相互更新信息。具体而言，我们比较了周期性完整更新与增量更新两种策略。我们还研究了多种数据结构，以经济高效地表示和编码一批最新变更。我们的框架具有通用性，只要其他数据结构实现了相应的API方法，即可通过抽象API进行集成。