Commit 5d9d9a1

ihwooclaude authored and committed

Add automatic conversation logging and sleep cycle memory extraction v0.6.0

All conversation turns are now auto-logged to SQLite, and high-value turns are instantly extracted to ChromaDB. The sleep cycle batch-processes missed memories using a progressive RL extraction pipeline (heuristic → RL).

New files: conversation_log.py, extraction.py
Modified: bridge.py, sleep_cycle.py, graph_store.py, config.py, server.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 794c56a commit 5d9d9a1

File tree: 11 files changed (+844 −12 lines)

CHANGELOG.md — 23 additions, 0 deletions

@@ -5,6 +5,29 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.6.0] - 2026-03-03
+
+### Added
+
+- **Automatic conversation logging** — All conversation turns are recorded to a SQLite append-only log (`conversation_log.db`) with WAL mode for concurrent safety
+- **Dual-saving in `auto_search`** — Every turn is logged to SQLite for batch processing, and high-value turns (matching personal/preference/tech/emotion patterns) are instantly extracted to ChromaDB
+- **Memory extraction pipeline** — New `extraction.py` module with three extractors:
+  - `HeuristicMemoryExtractor` — Pattern-matching based (reuses `extract_keywords()` / `classify_category()`)
+  - `RLMemoryExtractor` — MLP bandit for EXTRACT/SKIP binary decisions with imitation learning
+  - `ProgressiveExtraction` — Manages the transition: `heuristic_only` → `rl_assisted` → `rl_primary`
+- **Sleep cycle extraction task** (Task 0) — Batch-processes unprocessed conversation logs to catch memories missed by real-time heuristics, with deduplication (similarity ≥ 0.90)
+- **Sleep cycle log cleanup** (Task 5) — Deletes processed logs older than 30 days (configurable)
+- **Auto category classification** — `memory_save` default category changed to `"auto"`, which auto-classifies content using pattern matching
+- **`extraction_source` metadata** — Tracks how each memory was created: `"heuristic"`, `"rl"`, `"auto"`, or `""` (manual)
+- **6 new extraction config options** in `SleepCycleConfig`: `enable_memory_extraction`, `extraction_max_turns`, `extraction_dedup_threshold`, `extraction_min_info_density`, `extraction_rl_confidence_threshold`, `log_retention_days`
+
+### Changed
+
+- `SleepCycleRunner` now accepts an optional `conversation_log` parameter
+- `SleepCycleReport` includes `extraction` and `log_cleanup_deleted` fields
+- `MemoryNode` includes an `extraction_source` field
+- `GraphMemoryStore.add_memory()` accepts an `extraction_source` parameter
+
 ## [0.3.0] - 2026-03-01
 
 ### Changed
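The 0.6.0 notes describe a progressive handoff between extractors (`heuristic_only` → `rl_assisted` → `rl_primary`) but `extraction.py` itself is not shown in this commit. A minimal, self-contained sketch of such a staged transition, where the stage advances as the RL bandit accumulates decisions — the `assist_after`/`primary_after` thresholds are hypothetical placeholders, not values from the codebase:

```python
def extraction_stage(rl_decisions: int,
                     assist_after: int = 100,
                     primary_after: int = 500) -> str:
    """Return which extractor leads, given how many RL decisions have been seen.

    Hypothetical thresholds: the heuristic runs alone at first, the RL bandit
    assists once it has some history, and eventually takes the lead.
    """
    if rl_decisions >= primary_after:
        return "rl_primary"
    if rl_decisions >= assist_after:
        return "rl_assisted"
    return "heuristic_only"


# A fresh pipeline starts heuristic-only and graduates with experience.
assert extraction_stage(0) == "heuristic_only"
assert extraction_stage(250) == "rl_assisted"
assert extraction_stage(1000) == "rl_primary"
```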

README.md — 13 additions, 9 deletions

@@ -20,6 +20,7 @@ Current AI memory tools have two critical problems:
 | Problem | How we solve it |
 |---------|----------------|
 | **Manual retrieval** — you must ask "do you remember X?" | `auto_search` runs every turn, injecting relevant memories automatically |
+| **Missed memories** — AI decides what to save, so experiences/stories get lost | Every turn is auto-logged; sleep cycle extracts what the AI missed |
 | **Token waste** — entire memory dump inserted into context | Multi-resolution composer selects top-K memories within a token budget |
 
 ## Key Features

@@ -30,8 +31,11 @@ Current AI memory tools have two critical problems:
 - **GraphRAG hybrid retrieval** — Vector similarity + graph traversal, fused and re-ranked by an RL re-ranker
 - **Auto-linking** — New memories automatically link to similar existing ones (similarity ≥ 0.92)
 - **Multi-resolution text** — Full text → summary → entity triples, composed within token budget
+- **Automatic conversation logging** — All turns recorded to SQLite; high-value turns instantly extracted to ChromaDB
+- **Sleep cycle memory extraction** — Batch-processes missed memories from conversation logs using progressive RL extraction
+- **Auto category classification** — `memory_save` auto-classifies content category from patterns
 - **Forgetting pipeline** — Decay-based aging with consolidation, pinning, and immutable protection
-- **Sleep cycle** — Periodic maintenance: dedup, compress, forget, checkpoint
+- **Sleep cycle** — Periodic maintenance: extraction, dedup, compress, forget, checkpoint
 - **Live graph** — Real-time WebSocket visualization of the memory graph
 - **Multilingual** — Korean and English pattern support out of the box

@@ -215,7 +219,7 @@ Open `http://127.0.0.1:8765` in a browser. Requires the `[live]` extra (`pip ins
 | `memory_pin` / `memory_unpin` | Protect memories from forgetting |
 | `memory_stats` | Total count and category breakdown |
 | `memory_visualize` | Generate interactive graph HTML |
-| `sleep_cycle_run` | Trigger maintenance (consolidation + forgetting + checkpoint) |
+| `sleep_cycle_run` | Trigger maintenance (extraction + consolidation + forgetting + checkpoint) |
 | `policy_status` | RL policy state (epsilon, action distribution, updates) |
 | `policy_decide` | Ask the RL policy for a SAVE/SKIP/RETRIEVE decision with reasoning |

@@ -243,7 +247,7 @@ All settings via environment variables:
 ```
 ┌─────────────────────────────────────────────────┐
 │                   MCP Client                    │
-│        (Claude Desktop / Claude Code)           │
+│   (Claude Desktop / Claude Code / OpenClaw)     │
 └────────────────────┬────────────────────────────┘
                      │ stdio (JSON-RPC)
 ┌────────────────────▼────────────────────────────┐

@@ -254,12 +258,12 @@ All settings via environment variables:
 │ RL Policy│ Retrieval│ Storage  │ Maintenance    │
 │          │          │          │                │
 │ Rule-    │ ChromaDB │ Graph    │ Sleep Cycle    │
-│ Based +  │ vector + │ Memory   │ (consolidation,│
-│ MLP      │ Knowledge│ Store    │  forgetting,   │
-│ Bandit   │ Graph    │          │  checkpoints)  │
-│          │(GraphRAG)│          │                │
-│ Re-ranker│          │          │                │
-│ (11d MLP)│          │          │                │
+│ Based +  │ vector + │ Memory   │ (extraction,   │
+│ MLP      │ Knowledge│ Store    │  consolidation,│
+│ Bandit   │ Graph    │          │  forgetting,   │
+│          │(GraphRAG)│          │  checkpoints)  │
+│ Re-ranker│          │ SQLite   │                │
+│ (11d MLP)│          │ Conv Log │ Extraction RL  │
 └──────────┴──────────┴──────────┴────────────────┘
                      ↕ WebSocket (cross-process)
 ┌──────────────────────────────────────────────────┐

pyproject.toml — 1 addition, 1 deletion

@@ -1,6 +1,6 @@
 [project]
 name = "long-term-memory"
-version = "0.5.1"
+version = "0.6.0"
 description = "Long-term memory system for AI assistants — persistent, searchable, self-organizing memory powered by semantic search, knowledge graphs, and reinforcement learning"
 requires-python = ">=3.11,<3.14"
 license = "MIT"

src/aimemory/__init__.py — 1 addition, 1 deletion

@@ -1,3 +1,3 @@
 """Long-Term Memory System for AI assistants."""
 
-__version__ = "0.4.2"
+__version__ = "0.6.0"

src/aimemory/config.py — 8 additions, 0 deletions

@@ -183,6 +183,14 @@ class SleepCycleConfig(BaseModel):
     checkpoint_dir: str = "checkpoints/sleep_cycle"
     report_dir: str = "data/reports/sleep_cycle"
 
+    # Memory extraction from conversation logs
+    enable_memory_extraction: bool = True
+    extraction_max_turns: int = 500
+    extraction_dedup_threshold: float = 0.90
+    extraction_min_info_density: float = 0.1
+    extraction_rl_confidence_threshold: int = 50
+    log_retention_days: int = 30
+
 
 class ComposerConfig(BaseModel):
     """Context composer configuration."""
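A sketch of how the new `SleepCycleConfig` knobs could gate batch extraction. The field names and defaults are taken from the diff; the `should_extract` gate itself is hypothetical (the real logic lives in `extraction.py` and `sleep_cycle.py`, not shown here), and a plain dataclass stands in for the pydantic model:

```python
from dataclasses import dataclass


@dataclass
class ExtractionConfig:
    # Defaults mirror the SleepCycleConfig additions above.
    enable_memory_extraction: bool = True
    extraction_max_turns: int = 500
    extraction_dedup_threshold: float = 0.90
    extraction_min_info_density: float = 0.1
    extraction_rl_confidence_threshold: int = 50
    log_retention_days: int = 30


def should_extract(cfg: ExtractionConfig,
                   info_density: float,
                   best_similarity: float) -> bool:
    """Skip extraction when disabled, too low-information, or a near-duplicate."""
    if not cfg.enable_memory_extraction:
        return False
    if info_density < cfg.extraction_min_info_density:
        return False
    # A match at or above the dedup threshold counts as already stored.
    return best_similarity < cfg.extraction_dedup_threshold


cfg = ExtractionConfig()
assert should_extract(cfg, info_density=0.5, best_similarity=0.3)      # worth keeping
assert not should_extract(cfg, info_density=0.05, best_similarity=0.3)  # too sparse
assert not should_extract(cfg, info_density=0.5, best_similarity=0.95)  # near-duplicate
```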

src/aimemory/mcp/bridge.py — 56 additions, 0 deletions

@@ -5,11 +5,14 @@
 import logging
 import math
 import os
+import uuid as _uuid
+from pathlib import Path
 from typing import Any
 
 from aimemory.config import MCPServerConfig
 from aimemory.live_graph.notify import notify_live_graph
 from aimemory.memory.composer import ContextComposer
+from aimemory.memory.conversation_log import ConversationLog
 from aimemory.memory.graph_store import GraphMemoryStore, ImmutableMemoryError, MemoryNode
 from aimemory.memory.sleep_cycle import SleepCycleRunner
 from aimemory.online.policy import MemoryPolicyAgent, OnlinePolicy, StateEncoder

@@ -173,9 +176,21 @@ def __init__(
             reranker=self._reranker,
         )
 
+        # Conversation log for automatic turn recording
+        log_db_path = Path(self._persist_directory) / "conversation_log.db"
+        self._conversation_log = ConversationLog(log_db_path)
+        self._conversation_id = _uuid.uuid4().hex[:16]
+        self._turn_counter = 0
+
+        # Heuristic extractor for real-time extraction in auto_search
+        from aimemory.memory.extraction import HeuristicMemoryExtractor
+
+        self._heuristic_extractor = HeuristicMemoryExtractor()
+
         self._sleep_runner = SleepCycleRunner(
             store=self._store,
             policy=self._policy,
+            conversation_log=self._conversation_log,
         )
 
         # Track recent policy actions for status reporting

@@ -301,9 +316,50 @@ def auto_search(
         """Search for relevant memories and compose a context string.
 
         Returns dict with context string, memory count, token count, and details.
+        Also performs dual-saving: SQLite log + heuristic instant extraction.
         """
         import random
 
+        # ── Dual saving: record turn + heuristic instant extraction ──
+        try:
+            self._turn_counter += 1
+            # 1. Always log to SQLite (for sleep cycle batch processing)
+            self._conversation_log.append_turn(
+                conversation_id=self._conversation_id,
+                turn_index=self._turn_counter,
+                role="user",
+                content=user_message,
+            )
+
+            # 2. Heuristic instant filter: extract high-value turns to ChromaDB immediately
+            if len(user_message.strip()) > 20:
+                candidate = self._heuristic_extractor.evaluate(user_message, role="user")
+                if candidate.should_extract:
+                    # Dedup check before saving
+                    existing = self._store.search(user_message, top_k=1, track_access=False)
+                    is_dup = (
+                        existing
+                        and existing[0].similarity_score is not None
+                        and existing[0].similarity_score >= 0.90
+                    )
+                    if not is_dup:
+                        content = user_message[:300].strip()
+                        self._store.add_memory(
+                            content=content,
+                            keywords=candidate.keywords,
+                            category=candidate.category,
+                            conversation_id=self._conversation_id,
+                            extraction_source="auto",
+                        )
+                        logger.debug(
+                            "Auto-extracted memory from turn %d (category=%s)",
+                            self._turn_counter,
+                            candidate.category,
+                        )
+        except Exception:
+            # Logging/extraction failure must never block search
+            logger.debug("Conversation logging/extraction failed", exc_info=True)
+
         budget = token_budget or self._token_budget
         effective_top_k = top_k or self._top_k

src/aimemory/mcp/server.py — 19 additions, 1 deletion

@@ -67,7 +67,7 @@ def _get_bridge() -> MemoryBridge:
 async def memory_save(
     content: str,
     keywords: list[str] | None = None,
-    category: str = "fact",
+    category: str = "auto",
     related_ids: list[str] | None = None,
     immutable: bool = False,
     pinned: bool = False,

@@ -79,11 +79,29 @@ async def memory_save(
         keywords: Optional list of keywords. Auto-extracted if not provided.
         category: Memory category. One of: fact, preference,
             experience, emotion, technical, core_principle.
+            Defaults to "auto" which auto-classifies from content.
         related_ids: Optional list of memory IDs to link as related.
         immutable: If True, memory cannot be updated or deleted.
         pinned: If True, memory is protected from the forgetting pipeline.
     """
     try:
+        # Auto-classify category if "auto"
+        if category == "auto":
+            from aimemory.selfplay.memory_agent import classify_category, extract_keywords
+
+            auto_keywords = keywords or extract_keywords(content)
+            raw = classify_category(content, auto_keywords)
+            cat_map = {
+                "general": "fact",
+                "personal": "fact",
+                "technical": "technical",
+                "preference": "preference",
+            }
+            category = cat_map.get(raw, "fact")
+            # Auto-extract keywords if not provided
+            if keywords is None:
+                keywords = auto_keywords
+
         result = _get_bridge().save_memory(
             content=content,
             keywords=keywords,
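One behavior of the mapping above is easy to miss: any classifier label not present in `cat_map` silently becomes `"fact"` via the `.get(raw, "fact")` fallback. A quick demonstration (the `"emotion"` example is illustrative — whether `classify_category()` ever emits that label is not shown in this diff):

```python
# Same mapping as in memory_save's auto-classification branch.
cat_map = {
    "general": "fact",
    "personal": "fact",
    "technical": "technical",
    "preference": "preference",
}

# Mapped labels pass through to their target category.
assert cat_map.get("technical", "fact") == "technical"
assert cat_map.get("preference", "fact") == "preference"
assert cat_map.get("personal", "fact") == "fact"

# Any label outside the map collapses to "fact" via the .get() default.
assert cat_map.get("emotion", "fact") == "fact"
```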
Lines changed: 149 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,149 @@
1+
"""SQLite-based append-only conversation log.
2+
3+
Stores all conversation turns for batch processing during sleep cycles.
4+
Uses WAL mode for concurrent read/write safety.
5+
"""
6+
7+
from __future__ import annotations
8+
9+
import logging
10+
import sqlite3
11+
from datetime import datetime, timedelta
12+
from pathlib import Path
13+
14+
logger = logging.getLogger(__name__)
15+
16+
_SCHEMA = """
17+
CREATE TABLE IF NOT EXISTS conversation_turns (
18+
id INTEGER PRIMARY KEY AUTOINCREMENT,
19+
conversation_id TEXT NOT NULL,
20+
turn_index INTEGER NOT NULL,
21+
role TEXT NOT NULL,
22+
content TEXT NOT NULL,
23+
timestamp TEXT NOT NULL,
24+
processed INTEGER NOT NULL DEFAULT 0
25+
);
26+
27+
CREATE INDEX IF NOT EXISTS idx_conv_id ON conversation_turns(conversation_id);
28+
CREATE INDEX IF NOT EXISTS idx_processed ON conversation_turns(processed);
29+
CREATE INDEX IF NOT EXISTS idx_timestamp ON conversation_turns(timestamp);
30+
"""
31+
32+
33+
class ConversationLog:
34+
"""Append-only conversation log backed by SQLite.
35+
36+
All conversation turns are recorded for later batch processing
37+
by the sleep cycle memory extraction pipeline.
38+
"""
39+
40+
def __init__(self, db_path: str | Path) -> None:
41+
self._db_path = str(db_path)
42+
self._conn = sqlite3.connect(self._db_path, check_same_thread=False)
43+
self._conn.execute("PRAGMA journal_mode=WAL")
44+
self._conn.executescript(_SCHEMA)
45+
self._conn.commit()
46+
47+
def append_turn(
48+
self,
49+
conversation_id: str,
50+
turn_index: int,
51+
role: str,
52+
content: str,
53+
) -> int:
54+
"""Append a conversation turn. Returns the row id."""
55+
now = datetime.now().isoformat()
56+
cursor = self._conn.execute(
57+
"INSERT INTO conversation_turns (conversation_id, turn_index, role, content, timestamp) "
58+
"VALUES (?, ?, ?, ?, ?)",
59+
(conversation_id, turn_index, role, content, now),
60+
)
61+
self._conn.commit()
62+
return cursor.lastrowid or 0
63+
64+
def get_unprocessed_turns(self, limit: int = 500) -> list[dict]:
65+
"""Get unprocessed turns ordered by conversation and turn index.
66+
67+
Returns list of dicts with keys: id, conversation_id, turn_index, role, content, timestamp.
68+
"""
69+
cursor = self._conn.execute(
70+
"SELECT id, conversation_id, turn_index, role, content, timestamp "
71+
"FROM conversation_turns "
72+
"WHERE processed = 0 "
73+
"ORDER BY conversation_id, turn_index "
74+
"LIMIT ?",
75+
(limit,),
76+
)
77+
return [
78+
{
79+
"id": row[0],
80+
"conversation_id": row[1],
81+
"turn_index": row[2],
82+
"role": row[3],
83+
"content": row[4],
84+
"timestamp": row[5],
85+
}
86+
for row in cursor.fetchall()
87+
]
88+
89+
def get_conversation(self, conversation_id: str) -> list[dict]:
90+
"""Get all turns for a specific conversation, ordered by turn index."""
91+
cursor = self._conn.execute(
92+
"SELECT id, conversation_id, turn_index, role, content, timestamp, processed "
93+
"FROM conversation_turns "
94+
"WHERE conversation_id = ? "
95+
"ORDER BY turn_index",
96+
(conversation_id,),
97+
)
98+
return [
99+
{
100+
"id": row[0],
101+
"conversation_id": row[1],
102+
"turn_index": row[2],
103+
"role": row[3],
104+
"content": row[4],
105+
"timestamp": row[5],
106+
"processed": bool(row[6]),
107+
}
108+
for row in cursor.fetchall()
109+
]
110+
111+
def mark_processed(self, turn_ids: list[int]) -> int:
112+
"""Mark turns as processed. Returns number of rows updated."""
113+
if not turn_ids:
114+
return 0
115+
placeholders = ",".join("?" for _ in turn_ids)
116+
cursor = self._conn.execute(
117+
f"UPDATE conversation_turns SET processed = 1 WHERE id IN ({placeholders})",
118+
turn_ids,
119+
)
120+
self._conn.commit()
121+
return cursor.rowcount
122+
123+
def cleanup_old(self, days: int = 30) -> int:
124+
"""Delete processed turns older than `days`. Returns number of rows deleted."""
125+
cutoff = (datetime.now() - timedelta(days=days)).isoformat()
126+
cursor = self._conn.execute(
127+
"DELETE FROM conversation_turns WHERE processed = 1 AND timestamp < ?",
128+
(cutoff,),
129+
)
130+
self._conn.commit()
131+
deleted = cursor.rowcount
132+
if deleted > 0:
133+
logger.info("Cleaned up %d old conversation turns (older than %d days)", deleted, days)
134+
return deleted
135+
136+
def count(self, processed: bool | None = None) -> int:
137+
"""Count turns, optionally filtered by processed status."""
138+
if processed is None:
139+
cursor = self._conn.execute("SELECT COUNT(*) FROM conversation_turns")
140+
else:
141+
cursor = self._conn.execute(
142+
"SELECT COUNT(*) FROM conversation_turns WHERE processed = ?",
143+
(1 if processed else 0,),
144+
)
145+
return cursor.fetchone()[0]
146+
147+
def close(self) -> None:
148+
"""Close the database connection."""
149+
self._conn.close()
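The log's lifecycle (append → fetch unprocessed → mark processed) is plain `sqlite3` underneath, so it can be exercised without the package at all. A self-contained sketch against the same schema, using an in-memory database instead of `conversation_log.db` (the WAL pragma is omitted since it does not apply to `:memory:` databases; the sample content is made up):

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversation_turns (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id TEXT NOT NULL,
    turn_index INTEGER NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);
""")


def append_turn(conv_id: str, idx: int, role: str, content: str) -> int:
    """Insert one turn, as ConversationLog.append_turn does, returning the row id."""
    cur = conn.execute(
        "INSERT INTO conversation_turns "
        "(conversation_id, turn_index, role, content, timestamp) VALUES (?, ?, ?, ?, ?)",
        (conv_id, idx, role, content, datetime.now().isoformat()),
    )
    conn.commit()
    return cur.lastrowid or 0


# Append one turn; it starts unprocessed.
rid = append_turn("abc123", 1, "user", "I prefer dark roast coffee")
unprocessed = conn.execute(
    "SELECT id FROM conversation_turns WHERE processed = 0"
).fetchall()

# The sleep cycle would extract from it, then mark it processed.
conn.execute("UPDATE conversation_turns SET processed = 1 WHERE id = ?", (rid,))
conn.commit()
remaining = conn.execute(
    "SELECT COUNT(*) FROM conversation_turns WHERE processed = 0"
).fetchone()[0]
```

After marking, no unprocessed rows remain, which is exactly the state `cleanup_old()` relies on before deleting aged rows.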
