We study the amount of reliable information that can be stored in a DNA-based storage system composed of short DNA molecules. In this regime, Shomorony and Heckel (2022) put forward a conjecture on the scaling of the number of information bits that can be reliably stored. In this paper, we complete the proof of this conjecture. We analyze a random-coding scheme in which each codeword is obtained by quantizing a randomly generated probability mass function drawn from the probability simplex. By analyzing the optimal maximum-likelihood decoder, we derive an achievability bound that matches a recently established converse bound across the entire short-molecule regime. We also propose a second coding scheme, which operates with significantly lower computational complexity but achieves the optimal scaling, except for a specific range of very short molecules.
翻译:本研究探讨了基于短DNA分子的存储系统能够可靠存储的信息量。在此体系下,Shomorony与Heckel(2022)提出了关于可可靠存储信息比特数缩放规律的猜想。本文完成了该猜想的完整证明。我们分析了一种随机编码方案:每个码字通过对概率单纯形中随机生成的概率质量函数进行量化获得。通过分析最优最大似然解码器,我们推导出与近期建立的短分子全域逆界相匹配的可达性界。同时,我们提出了第二种编码方案,该方案在计算复杂度显著降低的情况下,除特定极短分子区间外,仍能实现最优缩放性能。