Give your OpenClaw agent real memory — three-layer, self-maintaining, locally searchable.
Problem
OpenClaw's compaction loses detail. The tech preferences you told your agent last week, the decisions you made, the names you mentioned — gone after compression. The native MEMORY.md is plain text with keyword-only search that mostly misses. Embeddings require remote API calls that cost money and leak data.
Solution
An OpenClaw plugin that stores memories in local chDB (embedded ClickHouse), generates vectors with Qwen3-Embedding-0.6B on-device, and runs semantic search via ClickHouse's native HNSW index. Fully local, zero API cost, zero data leakage.
Three-Layer Memory Model
This is the core of the entire system. The three layers are not tag categories — they are storage tiers with fundamentally different lifecycles, each with its own write rules, injection strategy, decay mechanics, and capacity constraints.
┌─────────────────────────────────────────────────────────────────┐
│ │
│ L0 Working Memory (current focus) │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ "User is building memory system Phase 2, last discussed │ │
│ │ HNSW index config" │ │
│ │ "Follow-up: user said demo due by Friday" │ │
│ └────────────────────────────────────────────────────────────┘ │
│ Always injected · Overwritten after every conversation · ≤500t │
│ │
├──────────────────────── ▲ overwrite ────────────────────────────┤
│ │
│ L1 Episodic Memory (event stream) │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ 03-04 14:30 Decided on Python core + JS thin-shell arch │ │
│ │ 03-04 10:15 Aligned API design with Alice, adopted gRPC │ │
│ │ 03-03 16:00 Researched chDB HNSW index, confirmed 25.8 GA│ │
│ │ 03-01 09:00 Project kickoff, goal: replace sqlite-vec │ │
│ │ ... │ │
│ └────────────────────────────────────────────────────────────┘ │
│ Retrieved on demand · Time-decayed · Compressed after 30d │
│ · 500 entries/month cap │
│ │
├──────── ▲ promote (recurring patterns) ▼ compress (old → sum) ─┤
│ │
│ L2 Semantic Memory (durable knowledge) │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ [preference] Prefers SwiftUI over UIKit │ │
│ │ [preference] Keep answers concise, skip basic explanations│ │
│ │ [knowledge] iOS developer, based in Singapore │ │
│ │ [person] Alice is the backend lead │ │
│ │ [project] Memory system project, goal: replace native │ │
│ │ [todo] Demo due by April 15 │ │
│ └────────────────────────────────────────────────────────────┘ │
│ Always injected · Rarely changed · Never auto-deleted │
│ · Overwritten only on contradiction │
│ │
└─────────────────────────────────────────────────────────────────┘
Layer Specifications
L0 Working Memory — Current Focus
| Property | Spec |
|---|---|
| Nature | The agent's "scratchpad" — what it's doing right now |
| Capacity | Strictly ≤ 500 tokens (~3–5 sentences) |
| Injection | Always injected in full at the top of the system prompt |
| Write timing | LLM rewrites (overwrites, not appends) at the end of every conversation |
| Decay | None; each rewrite naturally keeps it fresh |
| Persistence | Stored in chDB, but only the latest entry retained (per agent) |
| Typical content | Current task focus, where the last conversation left off, pending follow-ups, interaction style notes |
L2 Semantic Memory — Durable Knowledge

| Property | Spec |
|---|---|
| Nature | Persistent facts about the user — "who the user is" |
| Capacity | No hard limit, but recommended ≤ 200 entries (keep it lean) |
| Injection | Always injected in full into the system prompt (after working memory) |
| Injection budget | ≤ 2000 tokens |
| Write timing | Promoted from episodic / new persistent facts discovered in conversation / manual user input |
| Decay | No automatic decay; the LLM reviews once a month for stale information |
| Deletion | Only on contradiction (delete old + add new) |
| Typical content | User profile, tech preferences, people, project overviews, long-term todos |
Inter-Layer Flow
Conversation input
│
├── direct write ──▶ L0 (overwritten at end of every conversation)
│
├── extract ──▶ L1 (specific events, decisions)
│
└── extract ──▶ L2 (newly discovered persistent facts — less frequent)
L1 ── compress ──▶ L1 summary entry (30+ day old entries merged into monthly summary)
L1 ── promote ──▶ L2 (tag recurs ≥ 3 times, LLM decides, then distills)
L2 ── contradiction override ──▶ L2 (new info conflicts with old → delete old + add new)
L1 ── decay cleanup ──▶ deleted (120 days without access + access_count=0)
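The cleanup and promotion rules above can be sketched as plain predicates. Field names here are assumptions for illustration, not the plugin's actual schema:

```python
from datetime import datetime, timedelta

# Hypothetical predicates for the inter-layer flow rules above;
# field names are illustrative, not the plugin's actual schema.
def should_decay(last_accessed: datetime, access_count: int,
                 now: datetime, decay_days: int = 120) -> bool:
    """L1 decay cleanup: 120 days without access AND never accessed."""
    return access_count == 0 and now - last_accessed > timedelta(days=decay_days)

def should_promote(tag_occurrences: int, threshold: int = 3) -> bool:
    """L1 → L2 promotion candidate: a tag recurs at least 3 times.
    The LLM still makes the final call and distills the entries."""
    return tag_occurrences >= threshold
```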
Context Injection Example
What the agent sees at the start of every conversation:
[Working Memory — Current Focus]
User is building memory system Phase 2, last discussed HNSW index config.
Follow-up: user said demo due by Friday.
[Semantic Memory — User Profile]
- [preference] Prefers SwiftUI over UIKit
- [preference] Keep answers concise, skip basic explanations
- [knowledge] iOS developer, based in Singapore
- [person] Alice is the backend lead
- [project] Memory system project, goal: replace OpenClaw native memory
- [todo] Demo due by April 15
[Relevant Episodic — Events Related to This Conversation]
- 03-04 Decided on Python core + JS thin-shell architecture (score=0.85)
- 03-03 Researched chDB HNSW index, confirmed 25.8 GA (score=0.78)
- 03-01 Project kickoff, goal: replace sqlite-vec (score=0.71)
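Assembling that context block is plain string composition: L0 and L2 are injected wholesale, and retrieved L1 entries are appended only when present. A minimal sketch (the function and its signature are illustrative, not the plugin's API):

```python
# Illustrative assembly of the injected context shown above;
# this helper is NOT the plugin's actual API.
def build_context(working: str, semantic: list[str], episodic: list[str]) -> str:
    parts = ["[Working Memory — Current Focus]", working]
    parts += ["", "[Semantic Memory — User Profile]"]
    parts += [f"- {fact}" for fact in semantic]
    if episodic:  # L1 is retrieval-based, so it may be empty
        parts += ["", "[Relevant Episodic — Events Related to This Conversation]"]
        parts += [f"- {event}" for event in episodic]
    return "\n".join(parts)
```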
Schema

Single memories table; the layer column distinguishes the three tiers. Vectors are stored in an embedding column, and ClickHouse's native HNSW index accelerates semantic search.

| Field | Description |
|---|---|
| id | UUID |
| layer | working / episodic / semantic — the core field that determines lifecycle |
| category | decision / preference / event / person / project / knowledge / todo / insight |
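A plausible DDL for such a table, assuming ClickHouse's `vector_similarity` (HNSW) index and 1024-dimensional Qwen3-Embedding-0.6B vectors. Column names beyond those listed above are assumptions:

```python
# Hypothetical chDB DDL for the memories table; columns other than
# id/layer/category are assumptions, not the plugin's actual schema.
MEMORIES_DDL = """
CREATE TABLE IF NOT EXISTS memories (
    id         UUID DEFAULT generateUUIDv4(),
    layer      Enum('working', 'episodic', 'semantic'),
    category   LowCardinality(String),
    content    String,
    embedding  Array(Float32),
    created_at DateTime DEFAULT now(),
    INDEX idx_vec embedding TYPE vector_similarity('hnsw', 'cosineDistance', 1024)
) ENGINE = MergeTree ORDER BY (layer, created_at)
"""
# Would be executed through chDB, e.g. via a chdb session's query() method.
```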
Retrieval

| Condition | Strategy |
|---|---|
| Embedding available + memories < 50K | Single SQL hybrid (brute-force cosineDistance + keywords) |
| Embedding available + memories ≥ 50K | Two-stage (HNSW recall top-100 → rerank) |
| No embedding | Keywords only + tag matching + time decay |
Note: Retrieval only applies to L1 Episodic. L0 and L2 are injected in full — they bypass retrieval entirely.
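The brute-force hybrid path can be sketched as a single scoring pass in pure Python (the real plugin does this in SQL via `cosineDistance`; the weights and half-life below are illustrative assumptions, not the plugin's defaults):

```python
import math

# Illustrative hybrid scoring for L1 retrieval: cosine similarity
# blended with keyword overlap and exponential time decay.
# Weights and half-life are assumptions, not the plugin's defaults.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(q_vec: list[float], m_vec: list[float],
                 q_words: set, m_words: set,
                 age_days: float, half_life: float = 30.0) -> float:
    sim = cosine(q_vec, m_vec)                      # semantic similarity
    kw = len(q_words & m_words) / max(len(q_words), 1)  # keyword overlap
    decay = 0.5 ** (age_days / half_life)           # recency bonus
    return 0.6 * sim + 0.2 * kw + 0.2 * decay
```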
Extraction

```python
from memory_core import MemoryExtractor

extractor = MemoryExtractor(db, emb)

# End of conversation: auto-extract and write to the appropriate layers
new_ids = extractor.extract(
    messages=[...],
    llm_complete=my_llm_function,
    session_id="sess-001",
)
# The LLM automatically determines which layer (L0/L1/L2) each memory belongs to

# Before compaction: emergency extract → write to L1
new_ids = extractor.emergency_flush(
    context="...", llm_complete=my_llm_function,
)
```
Maintenance

```python
from memory_core import maintenance

maintenance.run_all(db, llm_complete=my_llm, emb=emb)

# Or run individually:
maintenance.cleanup_stale(db, decay_days=120)     # L1: clean long-unaccessed entries
maintenance.purge_deleted(db, days=7)             # Physical deletion
maintenance.compress_episodic(db, my_llm, emb,    # L1: monthly compression
                              month="2026-01")
maintenance.promote_to_semantic(db, my_llm, emb)  # L1 → L2: pattern promotion
maintenance.review_semantic(db, my_llm)           # L2: review stale information
```