SRAM-based cache memory faces several scalability limitations in deep nanoscale technologies, e.g., high leakage current, low cell stability, and low density. Emerging Non-Volatile Memory (NVM) technologies have received considerable attention in recent years, and Racetrack Memory (RTM) is among the most promising of them. RTM has the highest density among all NVMs, and its access performance is comparable to that of SRAM. RTM is therefore a suitable alternative to SRAM in Last-Level Caches (LLCs). Despite these benefits, RTM faces several reliability challenges due to the stochastic behavior of its storage element and its highly error-prone data shifting, which lead to a high probability of multiple-bit errors. Conventional Error-Correcting Codes (ECCs) are either incapable of tolerating multiple-bit errors or require a large amount of extra storage for check bits. This paper proposes exploiting value locality to compress data blocks, freeing up space in a large fraction of cache blocks to store the redundancy of strong ECCs. With the proposed scheme, a large majority of cache blocks are protected by strong ECCs that tolerate multiple-bit errors without any storage overhead. Evaluation using the gem5 full-system simulator demonstrates that the proposed scheme improves the mean time to failure (MTTF) of the cache by an average of 11.3x with less than 1% hardware and performance overhead.
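The core idea, compressing a block and spending the freed bits on ECC check bits, can be illustrated with a minimal sketch. The abstract does not name the compressor or the ECC family, so everything specific here is an assumption for illustration only: a 64-byte block, a simple base+delta size estimate standing in for the value-locality compressor, and the standard BCH bound that a t-error-correcting code over GF(2^m) needs at most m·t check bits. The names `compressed_size_bits` and `max_correctable_errors` are hypothetical.

```python
BLOCK_BYTES = 64   # assumed cache block size
WORD_BYTES = 4     # assumed word granularity for the compressor
BCH_M = 10         # assumed field degree: 2**10 - 1 = 1023 >= 512-bit codeword


def compressed_size_bits(block: bytes) -> int:
    """Estimate the base+delta compressed size of a block, in bits.

    Stand-in compressor (an assumption, not the paper's): keep the first
    word as a base and store every word as a fixed-width signed delta
    from it, using the narrowest width of 0, 1, or 2 bytes that fits.
    """
    assert len(block) == BLOCK_BYTES
    words = [
        int.from_bytes(block[i:i + WORD_BYTES], "little")
        for i in range(0, BLOCK_BYTES, WORD_BYTES)
    ]
    base = words[0]
    deltas = [w - base for w in words]
    for width in (0, 1, 2):  # delta width in bytes
        if width == 0:
            lo, hi = 0, 0
        else:
            lo = -(1 << (8 * width - 1))
            hi = (1 << (8 * width - 1)) - 1
        if all(lo <= d <= hi for d in deltas):
            return 8 * (WORD_BYTES + width * len(words))
    return 8 * BLOCK_BYTES  # incompressible: stored as-is


def max_correctable_errors(block: bytes) -> int:
    """Largest t such that BCH_M * t check bits fit in the freed space."""
    freed_bits = 8 * BLOCK_BYTES - compressed_size_bits(block)
    return freed_bits // BCH_M


def make_block(words):
    """Pack 16 unsigned 32-bit words into a 64-byte block."""
    return b"".join(w.to_bytes(WORD_BYTES, "little") for w in words)


zero_block = make_block([0] * 16)
near_block = make_block([1000 + i for i in range(16)])
wide_block = make_block([i * 0x01000000 for i in range(16)])

for name, blk in [("zero", zero_block), ("near", near_block), ("wide", wide_block)]:
    print(name, compressed_size_bits(blk), max_correctable_errors(blk))
```

Under these assumptions, a block with strong value locality leaves room for a code that corrects many bit errors, while an incompressible block leaves none and would have to fall back to whatever baseline protection the cache provides.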