Large language models (LLMs) rely on Key-Value (KV) cache reuse to reduce time-to-first-token (TTFT) latency, but existing disk-based KV cache systems that use a file-per-object layout suffer from severe scalability bottlenecks due to file system metadata overhead, I/O inefficiency, and poor spatial locality. This paper presents SGLANG-LSM, a database-inspired system that leverages Log-Structured Merge-tree (LSM-tree) architectures for scalable KV cache management. SGLANG-LSM adopts a layered design with three coordinated components: (1) a prefix-preserving storage engine that maintains token-sequence locality while efficiently storing large KV cache tensors through key-value separation, (2) an adaptive controller that dynamically tunes LSM-tree configurations as workload characteristics shift, and (3) runtime services, including batch operations and automatic resource management, for production deployment. Evaluation on large-scale dynamic workloads shows that SGLANG-LSM improves cache hit rates by up to 143% and reduces TTFT by up to 24% over state-of-the-art systems. To our knowledge, this is the first systematic application of database storage architectures to large-scale LLM cache management.
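The abstract names two storage techniques: prefix-preserving keys and key-value separation. A minimal sketch of both ideas follows, assuming a hypothetical encoding and value log; none of these names come from SGLANG-LSM itself, and a real implementation would sit on an actual LSM-tree engine rather than an in-memory dict.

```python
# Hypothetical sketch, not SGLANG-LSM's actual API:
# (1) prefix-preserving keys: encode token sequences so that lexicographic
#     key order in an LSM-tree keeps shared prefixes adjacent on disk;
# (2) key-value separation: the tree stores only a small pointer record,
#     while the large KV-cache tensor bytes live in a separate value log.
import struct

def encode_prefix_key(token_ids):
    """Encode a token sequence as bytes such that key(p) is a byte-prefix
    of key(p + suffix) -- fixed-width big-endian preserves ordering."""
    return b"".join(struct.pack(">I", t) for t in token_ids)

class ValueLog:
    """Append-only log holding large tensor blobs (key-value separation)."""
    def __init__(self):
        self.buf = bytearray()

    def append(self, blob: bytes):
        # Return a (offset, length) pointer; this small record is what
        # the LSM-tree stores, keeping the tree itself compact.
        off = len(self.buf)
        self.buf += blob
        return off, len(blob)

    def read(self, off, length):
        return bytes(self.buf[off:off + length])

# Usage: in-tree records stay tiny; tensor bytes go to the log.
log = ValueLog()
index = {}  # stand-in for the LSM-tree's sorted key space
for tokens, tensor_bytes in [((1, 2), b"A" * 8), ((1, 2, 3), b"B" * 8)]:
    index[encode_prefix_key(tokens)] = log.append(tensor_bytes)

# Prefix locality: the key of (1, 2) is a byte-prefix of the key of
# (1, 2, 3), so a range scan over a prefix hits contiguous keys.
assert encode_prefix_key((1, 2, 3)).startswith(encode_prefix_key((1, 2)))
assert log.read(*index[encode_prefix_key((1, 2))]) == b"A" * 8
```

Keeping values out of the tree is what makes multi-megabyte KV-cache tensors tractable in an LSM-tree: compaction then rewrites only small pointer records instead of repeatedly copying the tensors themselves.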