Large language models (LLMs) deployed in user-facing applications require long-horizon consistency: the ability to remember prior interactions, respect user preferences, and ground reasoning in past events. However, contemporary memory systems often adopt complex architectures such as knowledge graphs, multi-stage retrieval pipelines, and OS-style schedulers, which add engineering complexity and make results harder to reproduce. We present ENGRAM, a lightweight memory system that organizes conversation into three canonical memory types (episodic, semantic, and procedural) through a single router and retriever. Each user turn is converted into typed memory records, each carrying a normalized schema and an embedding, which are stored in a database. At query time, the system retrieves the top-k dense neighbors for each type, merges the results with simple set operations, and passes the most relevant evidence to the model as context. ENGRAM attains state-of-the-art results on LoCoMo, a multi-session conversational QA benchmark for long-horizon memory, and exceeds the full-context baseline by 15 points on LongMemEval while using only about 1% of the tokens. These results show that careful memory typing and straightforward dense retrieval can enable effective long-term memory management in language models without complex architectures.
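To make the query-time path concrete, here is a minimal sketch, not ENGRAM's actual implementation. It assumes cosine similarity over precomputed embeddings, an in-memory list standing in for the database, and a merge that takes the union of the per-type top-k sets (deduplicated by record id) and then re-ranks by similarity. The names `MemoryRecord`, `retrieve`, `k`, and `max_evidence` are hypothetical, and the toy three-dimensional vectors stand in for a real embedding model.

```python
import math
from dataclasses import dataclass

# The three canonical memory types named in the abstract.
MEMORY_TYPES = ("episodic", "semantic", "procedural")


@dataclass(frozen=True)
class MemoryRecord:
    """Hypothetical typed memory record: an id, a type, text, and an embedding."""
    record_id: str
    mem_type: str                  # one of MEMORY_TYPES
    text: str
    embedding: tuple[float, ...]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(store, query_emb, k=2, max_evidence=4):
    """Top-k dense neighbors per memory type, merged by set union, then re-ranked."""
    merged = {}  # record_id -> (score, record); dict keys act as the union set
    for mem_type in MEMORY_TYPES:
        candidates = [r for r in store if r.mem_type == mem_type]
        scored = sorted(
            ((cosine(query_emb, r.embedding), r) for r in candidates),
            key=lambda pair: pair[0],
            reverse=True,
        )
        for score, record in scored[:k]:
            merged[record.record_id] = (score, record)
    # Keep only the highest-scoring evidence to pass to the model as context.
    ranked = sorted(merged.values(), key=lambda pair: pair[0], reverse=True)
    return [record for _, record in ranked[:max_evidence]]


# Usage with toy embeddings (illustrative data only).
store = [
    MemoryRecord("e1", "episodic", "User visited Lisbon in May.", (1.0, 0.0, 0.0)),
    MemoryRecord("e2", "episodic", "User adopted a cat.", (0.0, 1.0, 0.0)),
    MemoryRecord("s1", "semantic", "User is vegetarian.", (0.9, 0.1, 0.0)),
    MemoryRecord("p1", "procedural", "Reply in short bullet points.", (0.0, 0.0, 1.0)),
]
query = (1.0, 0.05, 0.0)
evidence = retrieve(store, query, k=1, max_evidence=2)
print([r.record_id for r in evidence])  # ['e1', 's1']
```

Retrieving per type before merging guarantees that each memory type can contribute candidates, even when one type dominates raw similarity; the final cut to `max_evidence` is what keeps the context small. The abstract does not specify which set operations ENGRAM uses, so union followed by re-ranking is an assumption here.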