Raft is a leading consensus algorithm for replicating writes in distributed databases. However, distributed databases also require consistent reads. To guarantee read consistency, a Raft-based system must either accept the high communication overhead of a safety check for each read or implement leader leases. Prior lease protocols are vaguely specified and hurt availability, so most Raft systems implement them incorrectly or not at all. We introduce LeaseGuard, a novel lease algorithm that relies on guarantees specific to Raft elections. LeaseGuard is simple, rigorously specified in TLA+, and includes two novel optimizations that maximize availability during leader failover: the first restores write throughput quickly, and the second improves read availability. We evaluate LeaseGuard with a simulation in Python and an implementation in LogCabin, the C++ reference implementation of Raft. By replacing LogCabin's default consistency mechanism (quorum checks), LeaseGuard reduces the overhead of consistent reads from one network round trip to zero. It also improves write throughput from ~1000 to ~10,000 writes per second by eliminating contention between writes and quorum reads. Whereas traditional leases ban all reads on a new leader while it waits for a lease, in our LeaseGuard experiment the new leader instantly allows 99% of reads to succeed.