One of the primary objectives of a distributed storage system is to reliably store large amounts of source data for long durations using a large number $N$ of unreliable storage nodes, each with $c$ bits of storage capacity. Storage nodes fail randomly over time and are replaced with nodes of equal capacity initialized to zeroes, and thus bits are erased at some rate $e$. To maintain recoverability of the source data, a repairer continually reads data over a network from nodes at an average rate $r$, and generates and writes data to nodes based on the read data. The distributed storage source capacity is the maximum amount of source data that can be reliably stored for long periods of time. Previous research shows that the distributed storage source capacity is at most $\left(1-\frac{e}{2 \cdot r}\right) \cdot N \cdot c$ asymptotically as $N$ and $r$ grow. In this work we introduce and analyze algorithms showing that the distributed storage source capacity is asymptotically at least this same expression. Thus, this expression captures a fundamental trade-off between network traffic and storage overhead when reliably storing source data.
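To make the bound concrete, here is a minimal sketch (not part of the paper) that evaluates the expression $\left(1-\frac{e}{2 \cdot r}\right) \cdot N \cdot c$ for sample parameter values; the function name and the example numbers are hypothetical, chosen only for illustration.

```python
def source_capacity_bound(e: float, r: float, n: int, c: int) -> float:
    """Asymptotic upper bound on reliably storable source data, in bits.

    e: rate at which bits are erased (via node failures and replacement)
    r: average rate at which the repairer reads data over the network
    n: number of storage nodes
    c: per-node storage capacity in bits

    Implements (1 - e/(2*r)) * n * c from the bound stated above.
    """
    return (1 - e / (2 * r)) * n * c


# Example (hypothetical numbers): 1000 nodes of 10**12 bits each.
# If the repairer reads twice as fast as bits are erased (r = 2e),
# the bound says at most 3/4 of the raw capacity can hold source data.
print(source_capacity_bound(e=1.0, r=2.0, n=1000, c=10**12))
```

Note how the bound degrades as the repair rate $r$ approaches $e/2$: the storage overhead needed for repairability consumes an ever larger fraction of the raw capacity $N \cdot c$.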