Memory plays a foundational role in augmenting the reasoning, adaptability, and contextual fidelity of modern Large Language Models (LLMs) and Multi-Modal LLMs (MLLMs). As these models transition from static predictors to interactive systems capable of continual learning and personalized inference, the incorporation of memory mechanisms has emerged as a central theme in their architectural and functional evolution. This survey presents a comprehensive and structured synthesis of memory in LLMs and MLLMs, organizing the literature into a cohesive taxonomy comprising implicit, explicit, and agentic memory paradigms.
The architecture of memory in modern (M)LLMs is increasingly analogous to the synergistic relationship between different human brain systems, particularly the neocortex, the hippocampus, and the prefrontal cortex. This brain-inspired framework, which echoes the principles of Complementary Learning Systems theory, provides a powerful lens through which to understand the different memory paradigms evolving in AI.
- **Implicit Memory: The Neocortex.** We conceptualize the model's internal parameters as its digital neocortex. In the brain, the neocortex is the primary repository for long-term semantic knowledge, skills, and consolidated memories, which are learned slowly and stored in a distributed manner. Similarly, a transformer's weights embody the implicit memory of the model, the foundational "world knowledge" acquired during pre-training. This parametric knowledge represents the model's stable, generalized understanding of language, patterns, and facts.
- **Explicit Memory: The Hippocampal System.** To access specific, real-time, or episodic information, an AI system requires a mechanism analogous to the hippocampus. The hippocampus is critical for the rapid encoding of new episodic memories (i.e., specific events and their context) and acts as an index that binds together disparate elements of an experience stored across the neocortex. Explicit memory systems in AI, such as Retrieval-Augmented Generation (RAG), mimic this function. They serve as an "AI Hippocampus" by providing an on-demand, queryable index to external information (vector embeddings, knowledge graphs). This allows the model to ground its responses in specific, up-to-date facts without the need for slow, resource-intensive retraining of its entire parametric base (the "neocortex").
- **Agentic Memory: The Prefrontal Cortex.** The functionality of agentic memory is best analogized to the prefrontal cortex (PFC), the brain's executive control center. The PFC is responsible for working memory, goal-directed planning, and integrating information from both long-term stores (neocortex) and recent episodic memories (hippocampus) to guide behavior. Agentic memory systems similarly maintain a persistent state across interactions, manage working memory (e.g., a scratchpad), and orchestrate the strategic retrieval and use of both implicit and explicit memory to formulate plans and execute complex tasks. Furthermore, as we explore later in this survey, this executive function extends to integrating information from specialized memory modules for spatial, temporal, and embodied intelligence, akin to how the PFC coordinates inputs from various sensory cortices.
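To make the implicit-memory analogy concrete, the key-value reading of a transformer feed-forward layer (the view taken in "Transformer Feed-Forward Layers Are Key-Value Memories", listed below) can be sketched in a few lines of NumPy. This is a toy with random weights rather than a trained model; the dimensions and matrices are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 16, 64  # toy dimensions

# In the key-value view of an FFN, the first weight matrix holds "keys"
# (input-pattern detectors) and the second holds "values" (vectors the
# layer writes back to the residual stream).
W_keys = rng.normal(size=(d_ff, d_model))    # one key per memory slot
W_values = rng.normal(size=(d_model, d_ff))  # one value per memory slot

def ffn(x):
    """A standard transformer FFN, read as an associative memory lookup."""
    slot_activations = np.maximum(W_keys @ x, 0.0)  # ReLU: how well each key matches x
    return W_values @ slot_activations              # weighted sum of value vectors

x = rng.normal(size=d_model)  # a residual-stream hidden state
out = ffn(x)

# Only a subset of slots fires for a given input, so each slot behaves
# like one distributed "memory" consulted during the forward pass.
active = int((np.maximum(W_keys @ x, 0.0) > 0).sum())
print(f"{active}/{d_ff} memory slots active, output dim {out.shape[0]}")
```

Roughly this reading is what the interpretability and knowledge-editing work listed below builds on: locating a fact amounts to finding which key-value slots encode it.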
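The explicit and agentic paradigms can likewise be sketched together: an external store queried by similarity (the "hippocampus") plus a per-task working-memory scratchpad that orchestrates retrieval (the "PFC"). The bag-of-words embedding, the `AgentMemory` class, and the example memories are hypothetical stand-ins; real systems use learned embeddings, a vector database, and an actual LLM in the loop.

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use learned encoders."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class AgentMemory:
    """Minimal sketch: explicit long-term store + working-memory scratchpad."""
    def __init__(self):
        self.store = []        # long-term explicit memory: (text, embedding)
        self.scratchpad = []   # working memory for the current task

    def write(self, text):
        self.store.append((text, embed(text)))

    def retrieve(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.store, key=lambda m: cosine(q, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def step(self, query):
        # Executive control: pull relevant episodes into working memory,
        # then hand the combined context to the LLM (omitted in this sketch).
        self.scratchpad.extend(self.retrieve(query))
        return self.scratchpad

mem = AgentMemory()
mem.write("User prefers concise answers.")
mem.write("The deployment uses region eu-west-1.")
print(mem.step("which region is the deployment in?"))
```

Grounding the response in the retrieved episode, rather than in retrained weights, is exactly the division of labor the neocortex/hippocampus analogy above describes.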
🌟🌟🌟 Please feel free to make a PR if I missed something. 🌟🌟🌟
Table of Contents
- 1 Implicit Memory: Unveiling Knowledge Inside Transformers
- 2 Explicit Memory: When (M)LLMs Meet Retrieval
- 3 Agentic Memory: Consolidating Memories into Human-like Agents
- 4 Memory-Augmented Multi-Modal Large Language Models
- Dissecting recall of factual associations in auto-regressive language models
- Knowledge circuits in pretrained transformers
- Interpreting key mechanisms of factual recall in transformer-based language models
- Confidence regulation neurons in language models
- Revealing the dark secrets of BERT
- Transformer Feed-Forward Layers Are Key-Value Memories
- Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space
- Knowledge Neurons in Pretrained Transformers
- What does BERT look at? An analysis of BERT’s attention
- exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models
- Characterizing mechanisms for factual recall in language models
- Transformer Feed-Forward Layers Are Key-Value Memories
- Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space
- Empirical study on updating key-value memories in transformer feed-forward layers
- Understanding transformer from the perspective of associative memory
- Knowledge Neurons in Pretrained Transformers
- MoEfication: Transformer feed-forward layers are mixtures of experts
- Emergent modularity in pre-trained transformers
- Depn: Detecting and editing privacy neurons in pretrained language models
- Confidence regulation neurons in language models
- What does BERT look at? An analysis of BERT’s attention
- Characterizing mechanisms for factual recall in language models
- Inference-time intervention: Eliciting truthful answers from a language model
- Stanford alpaca: An instruction-following llama model
- Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
- Knowledge circuits in pretrained transformers
- Dissecting recall of factual associations in auto-regressive language models
- Interpreting key mechanisms of factual recall in transformer-based language models
- Scaling Laws for Neural Language Models
- Scaling Laws for Fact Memorization of Large Language Models
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
- Physics of language models: Part 3.1, knowledge storage and extraction
- Neural networks and physical systems with emergent collective computational abilities
- Hopfield networks is all you need
- Birth of a transformer: a memory viewpoint
- Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
- Scaling Laws for Associative Memories
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
- CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory
- Hierarchical Associative Memory
- Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models
- Modifying memories in transformer models
- Propagating Knowledge Updates to LMs Through Distillation
- Training Language Models with Memory Augmentation
- RECKONING: reasoning through dynamic knowledge encoding
- LoRA: Low-Rank Adaptation of Large Language Models
- K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
- DEMix Layers: Disentangling Domains for Modular Language Modeling
- Locating and editing factual associations in GPT
- Mass-editing memory in a transformer
- Calibrating Factual Knowledge in Pretrained Language Models
- Editing Factual Knowledge in Language Models
- Methods for measuring, updating, and visualizing factual beliefs in language models
- Large Scale Knowledge Washing
- A comprehensive study of knowledge editing for large language models
- To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
- Towards Safer Large Language Models through Machine Unlearning
- LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs
- Knowledge editing for large language models: A survey
- Evaluating the Ripple Effects of Knowledge Editing in Language Models
- Cross-Lingual Multi-Hop Knowledge Editing--Benchmarks, Analysis and a Simple Contrastive Learning based Approach
- A comprehensive study of knowledge editing for large language models
- MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
- Eva-kellm: A new benchmark for evaluating knowledge editing of llms
- PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter
- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research
- Memory-assisted prompt editing to improve GPT-3 after deployment
- Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks
- Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation
- Understanding Retrieval Augmentation for Long-Form Question Answering
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
- IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions
- Retrieval meets Long Context Large Language Models
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
- ISEEQ: Information Seeking Question Generation using Dynamic Meta-Information Retrieval and Knowledge Graphs
- G-Retriever: Retrieval-augmented generation for textual graph understanding and question answering
- KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases
- FABULA: Intelligence Report Generation Using Retrieval-Augmented Narrative Construction
- HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses
- Knowledge-Augmented Language Model Verification
- PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
- Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
- Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning
- ProAI: Proactive Multi-Agent Conversational AI with Structured Knowledge Base for Psychiatric Diagnosis
- LlamaIndex
- LangChain
- Large Language Models Prompting With Episodic Memory
- Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation
- Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation
- Evaluating Very Long-Term Conversational Memory of LLM Agents
- REALM: Retrieval-Augmented Language Model Pre-Training
- Improving language models by retrieving from trillions of tokens
- Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
- InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
- Memory^3: Language Modeling with Explicit Memory
- Atlas: Few-shot Learning with Retrieval Augmented Language Models
- Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data
- Copy Is All You Need
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
- REPLUG: Retrieval-Augmented Black-Box Language Models
- Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
- Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
- Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
- Memorizing Transformers
- Unlimiformer: Long-Range Transformers with Unlimited Length Input
- Focused Transformer: Contrastive Training for Context Scaling
- EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts
- R^3Mem: Bridging Memory Retention and Retrieval via Reversible Compression
- Augmenting Language Models with Long-Term Memory
- Training language models with memory augmentation
- Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory
- Faithful reasoning using large language models
- Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation
- Towards Cross-Modality Modeling for Time Series Analytics: A Survey in the LLM Era
- Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey
- Human memory: A proposed system and its control processes
- A survey on the memory mechanism of large language model based agents
- LLM as a robotic brain: Unifying egocentric memory and control
- Evaluating very long-term conversational memory of LLM agents
- Memory^3: Language modeling with explicit memory
- Chain-of-thought prompting elicits reasoning in large language models
- Self-consistency improves chain of thought reasoning in language models
- Tree of thoughts: Deliberate problem solving with large language models
- Graph of thoughts: Solving elaborate problems with large language models
- ReAct: Synergizing reasoning and acting in language models
- Reflexion: Language agents with verbal reinforcement learning
- ReflectEvo: Improving meta introspection of small LLMs by learning self-reflection
- Mars: Situated inductive reasoning in an open-world environment
- RAM: Towards an ever-improving memory system by learning from communications
- Absolute zero: Reinforced self-play reasoning with zero data
- Metareflection: Learning instructions for language agents using past reflections
- Learn beyond the answer: Training language models with reflection for mathematical reasoning
- RoT: Enhancing large language models with reflection on search trees
- Mirror: A multiple-perspective self-reflection method for knowledge-rich reasoning
- Textgrad: Automatic differentiation via text
- Langsuite: Planning, controlling and interacting with large language models in embodied text environments
- Benchmarking the spectrum of agent capabilities
- The arcade learning environment: An evaluation platform for general agents
- Understanding the planning of LLM agents: A survey
- Generative agents: Interactive simulacra of human behavior
- RecMind: Large language model powered agent for recommendation
- Large language models are semi-parametric reinforcement learning agents
- How do large language models capture the ever-changing world knowledge? a review of recent advances
- From static to dynamic: A continual learning framework for large language models
- Ret-LLM: Towards a general read-write memory for large language models
- Empowering working memory for large language model agents
- Think-in-memory: Recalling and post-thinking enable LLMs with long-term memory
- RAP: Retrieval-augmented planning with contextual memory for multimodal LLM agents
- Learn-by-interact: A data-centric framework for self-adaptive agents in realistic environments
- Memp: Exploring agent procedural memory
- Contextual experience replay for self-improvement of language agents
- Sage: Self-evolving agents with reflective and memory-augmented abilities
- ReasoningBank: Scaling agent self-evolving with reasoning memory
- Memory-assisted prompt editing to improve GPT-3 after deployment
- Learning to repair: Repairing model output errors after deployment using a dynamic memory of feedback
- Towards teachable reasoning systems: Using a dynamic memory of user feedback for continual system improvement
- Compress to impress: Unleashing the potential of compressive memory in real-world long-term conversations
- KnowledGPT: Enhancing large language models with retrieval and storage access on knowledge bases
- LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination
- MemoryBank: Enhancing large language models with long-term memory
- A-Mem: Agentic memory for LLM agents
- Internet of agents: Weaving a web of heterogeneous agents for collaborative intelligence
- LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning
- Memory Sharing for Large Language Model based Agents
- Zep: A Temporal Knowledge Graph Architecture for Agent Memory
- The NarrativeQA Reading Comprehension Challenge
- Quality: Question answering with long input texts, yes!
- LooGle: Can long-context language models understand long contexts?
- RetrievalQA: Assessing adaptive retrieval-augmented generation for short-form open-domain question answering
- Needle In A Haystack
- ALFWorld: Aligning text and embodied environments for interactive learning
- WebArena: A realistic web environment for building autonomous agents
- Gentopia: A collaborative platform for tool-augmented LLMs
- AgentBench: Evaluating LLMs as agents
- LongMemEval: Benchmarking chat assistants on long-term interactive memory
- The Llama 3 Herd of Models
- Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
- A Scalable Communication Protocol for Networks of Large Language Models
- A survey on the memory mechanism of large language model based agents
- DroidSpeak: Enhancing Cross-LLM Communication
- IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems
- RAM: Towards an Ever-Improving Memory System by Learning from Communications
- Rethinking memory in AI: Taxonomy, operations, topics, and future directions
- Memory-augmented conformer for improved end-to-end long-form ASR
- Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
- MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
- Memory Consolidation Enables Long-Context Video Understanding
- MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
- MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
- Streaming Long Video Understanding with Large Language Models
- Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
- VideoLLaMB: Long-context video understanding with recurrent memory bridges
- OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning
- A Simple LLM Framework for Long-Range Video Question-Answering
- LifelongMemory: Leveraging LLMs for Answering Queries in Egocentric Videos
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent
- VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
- ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
- DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
- Language Repository for Long Video Understanding
- DrVideo: Document Retrieval Based Long Video Understanding
- Attend to You: Personalized Image Captioning with Context Sequence Memory Networks
- Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection
- Memory-Guided Unsupervised Image-to-Image Translation
- Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks
- Universal Face Restoration With Memorized Modulation
- Learning Meta-class Memory for Few-Shot Semantic Segmentation
- Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer
- Memory Oriented Transfer Learning for Semi-Supervised Image Deraining
- TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
- Learning Video Object Segmentation with Visual Memory
- Recurrent Filter Learning for Visual Tracking
- Learning Recurrent Memory Activation Networks for Visual Tracking
- Learning Dynamic Memory Networks for Object Tracking
- Video Object Segmentation Using Space-Time Memory Networks
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
- STMTrack: Template-Free Visual Tracking With Space-Time Memory Networks
- Video Object Segmentation with Episodic Graph Memory Networks
- Dual Temporal Memory Network for Efficient Video Object Segmentation
- Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation
- Associating Objects with Transformers for Video Object Segmentation
- Local Memory Attention for Fast Video Semantic Segmentation
- TrackFormer: Multi-Object Tracking with Transformers
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
- Streaming Video Model
- Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation
- Reading Relevant Feature from Global Representation Memory for Visual Object Tracking
- RMem: Restricted Memory Banks Improve Video Object Segmentation
- Efficient Video Object Segmentation via Modulated Cross-Attention Memory
- SAM 2: Segment Anything in Images and Videos
- Long Short-Term Transformer for Online Action Detection
- MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
- Recurring the Transformer for Video Action Recognition
- Memory-and-Anticipation Transformer for Online Action Understanding
- Online Temporal Action Localization with Memory-Augmented Transformer
- Video Summarization with Long Short-Term Memory
- Extractive Video Summarizer with Memory Augmented Neural Networks
- Stacked Memory Network for Video Summarization
- M3: Multimodal Memory Modelling for Video Captioning
- Memory-Attended Recurrent Network for Video Captioning
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
- AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description
- Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering
- A Read-Write Memory Network for Movie Story Understanding
- Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
- MemFlow: Optical Flow Estimation and Prediction with Memory
- MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation
- Multi-Scale Memory-Based Video Deblurring
- MENet: A Memory-Based Network with Dual-Branch for Efficient Event Stream Processing
- CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition
- Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
- Video Prediction Recalling Long-Term Motion Context via Memory Alignment Learning
- MV-TON: Memory-based Video Virtual Try-on network
- Audio-driven Talking Face Video Generation with Natural Head Pose
- SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
- EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation
- STAM: A SpatioTemporal Attention Based Memory for Video Prediction
- Memories are One-to-Many Mapping Alleviators in Talking Face Generation
- JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
- Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation
- AppAgent: Multimodal Agents as Smartphone Users
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
- Multimodal Embodied Interactive Agent for Cafe Scene
- Visual Memory for Robust Path Following
- Structured Scene Memory for Vision-Language Navigation
- Visual Graph Memory with Unsupervised Representation for Visual Navigation
- MemoNav: Working Memory Model for Visual Navigation
- Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
- MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
- Deep Visual Odometry With Adaptive Memory
- Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects with Video Tracking Enabled Memory Models
