feat(memory): persisted go-vector episode index + featurization quality by jkyberneees · Pull Request #12 · BackendStack21/odek

jkyberneees · 2026-06-06T09:39:04Z

Summary

Fixes weaknesses #3 and #4 from the memory design review using only the existing go-vector library — no new embedding dependency.

Problem

#4 — LLM call on every turn. FormatEpisodeContext fires on every agent loop turn. With the default llm_search:true it called episodes.Search → NewLLMRanker → one llm.SimpleCall per turn. The RP fallback was no better at scale: NewRPRanker re-instantiated RandomProjections and re-fit + re-embedded every episode on every call — no caching at all.

#3 — RP quality ceiling. Episodes were embedded from the 120-char truncated index summary (not the full ~1 KB on-disk summary), at 64 dims while everything else used 256. Text was fed raw to go-vector, so "Postgres," and "postgres" were different tokens.

Solution

Persisted go-vector episode index (episode_index.go)

Mirrors internal/session/vector_index.go:

- vector.Store (brute-force k-NN) + RandomProjections (256 dims) persisted as gob files.
- Dirty-flag + full rebuild design. RP must be Fit on the full corpus to produce a valid vocabulary; incremental embedding with a stale Fit yields degenerate vectors. Since writes happen at session-end (infrequent) and recall fires every turn (frequent), the right trade-off is: markDirty() on write (no I/O), then rebuild from all on-disk summaries on the next search() call. One O(n) rebuild per new episode, then µs cosine per turn.
- Process-wide singleton per memory dir (sharedEpisodeIndex, mirroring factsDirLock) so the per-connection MemoryManagers that odek serve builds all share one index and don't race on the gob files.
- Reads full on-disk summaries for the rebuild corpus (not the 120-char index truncations).
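The dirty-flag design above can be sketched as follows. This is a minimal illustration, not the code in episode_index.go: `lazyIndex`, `newLazyIndex`, and the `load` callback are hypothetical names, and the go-vector Fit/Embed calls and gob persistence are replaced by a plain slice copy.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyIndex is a hypothetical stand-in for the episode index: writes only
// flip a flag, and the expensive rebuild is deferred to the next search.
type lazyIndex struct {
	mu       sync.Mutex
	dirty    bool
	docs     []string        // corpus cached by the last rebuild
	load     func() []string // reads every summary from "disk"
	rebuilds int             // rebuild counter, for illustration only
}

// newLazyIndex starts dirty so that a cold start triggers one rebuild.
func newLazyIndex(load func() []string) *lazyIndex {
	return &lazyIndex{dirty: true, load: load}
}

// markDirty records that the corpus changed. It performs no I/O.
func (ix *lazyIndex) markDirty() {
	ix.mu.Lock()
	ix.dirty = true
	ix.mu.Unlock()
}

// search rebuilds at most once per write, then serves from the cache.
func (ix *lazyIndex) search() []string {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if ix.dirty {
		ix.docs = ix.load() // the real index would re-Fit and re-embed here
		ix.rebuilds++
		ix.dirty = false
	}
	return ix.docs
}

func main() {
	disk := []string{"episode one"}
	ix := newLazyIndex(func() []string { return append([]string(nil), disk...) })

	ix.search()
	ix.search() // served from the cache, no rebuild

	disk = append(disk, "episode two")
	ix.markDirty()
	fmt.Println(len(ix.search()), ix.rebuilds) // 2 2
}
```

The point of the sketch is the cost profile: a write costs one flag flip, the first search after it pays the O(n) rebuild, and every later search is a cache hit until the next write.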

Per-turn recall is now zero-LLM

FormatEpisodeContext → episodes.recallByVector → sharedEpisodeIndex.search → embed query + N cached cosines. No LLM call on the hot path, verified by test.
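To make "embed query + N cached cosines" concrete, here is a rough sketch of a brute-force cosine top-k over already-embedded vectors. It uses plain `[]float64` slices rather than the go-vector Store API, and `hit`, `cosine`, and `topK` are illustrative names that do not come from the PR.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// hit pairs an episode ID with its similarity to the query.
type hit struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when either vector is all zeros.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores the query against every cached vector and keeps the k best.
// Ties are broken by ID so the result is deterministic.
func topK(query []float64, cached map[string][]float64, k int) []hit {
	hits := make([]hit, 0, len(cached))
	for id, v := range cached {
		hits = append(hits, hit{ID: id, Score: cosine(query, v)})
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].ID < hits[j].ID
	})
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	cached := map[string][]float64{"ep1": {1, 0}, "ep2": {0, 1}, "ep3": {1, 1}}
	for _, h := range topK([]float64{1, 0}, cached, 2) {
		fmt.Printf("%s %.2f\n", h.ID, h.Score) // ep1 1.00, then ep3 0.71
	}
}
```

Nothing on this path needs a network call: the cached vectors are fixed until the next rebuild, so each turn costs one embedding plus N dot products.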

Explicit search: bounded LLM rerank

SearchEpisodes (the memory tool) now fetches candidates from the vector index first, then LLM-reranks only those candidates — fixing both the O(n)·LLM scaling problem and keeping relevance quality. llm_search now gates explicit-search reranking only; per-turn recall ignores it.
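The bounded rerank can be pictured with the sketch below. `searchBounded` and `errRankFailed` are hypothetical names, the `rankFn` type only borrows the name used in the test descriptions, and keeping the vector order when the ranker fails is an assumption about the fallback rather than confirmed behaviour. Handling of an empty candidate list is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errRankFailed simulates a failing ranker in the demo below.
var errRankFailed = errors.New("rank failed")

// rankFn reorders candidates for a query. In the PR this role is played by
// an LLM call; here it is any function with this shape.
type rankFn func(query string, candidates []string) ([]string, error)

// searchBounded reranks only the candidates handed over by the vector index,
// so ranker cost depends on len(candidates) and not on the corpus size.
func searchBounded(query string, candidates []string, useLLM bool, rank rankFn, limit int) []string {
	out := candidates
	if useLLM && rank != nil && len(candidates) > 0 {
		if ranked, err := rank(query, candidates); err == nil {
			out = ranked
		}
		// Assumption: on a ranker error, keep the vector order.
	}
	if limit > 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	reverse := func(_ string, c []string) ([]string, error) {
		out := make([]string, len(c))
		for i, v := range c {
			out[len(c)-1-i] = v
		}
		return out, nil
	}
	failing := func(string, []string) ([]string, error) { return nil, errRankFailed }

	cands := []string{"ep1", "ep2", "ep3"}
	fmt.Println(searchBounded("q", cands, true, reverse, 2))  // [ep3 ep2]
	fmt.Println(searchBounded("q", cands, true, failing, 2))  // [ep1 ep2]
	fmt.Println(searchBounded("q", cands, false, reverse, 2)) // [ep1 ep2]
}
```

The third call corresponds to llm_search=false: the ranker is never invoked and the vector order is returned as-is.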

Featurization quality (featurize.go)

- normalizeForEmbedding: lowercase + alphanumeric-only tokens (strips punctuation so "Postgres," == "postgres").
- featurizeForEmbedding: normalise + append bigrams ("w1_w2") for local word-order signal. "uses postgres" and "uses mysql" share the "uses" unigram but differ on bigrams; RP now discriminates them.
- Applied at the go-vector boundary in the episode index and MergeDetector (facts). Raw strings kept in m.corpus; only RP sees featurized text.
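The two functions named above might look roughly like this. The bodies are a guess reconstructed from the description (lowercase, alphanumeric-only tokens, "w1_w2" bigrams); the actual featurize.go may tokenise differently, for example around hyphens or non-ASCII text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeForEmbedding lowercases the text and keeps only alphanumeric
// tokens, so punctuation never becomes part of a token.
func normalizeForEmbedding(s string) string {
	tokens := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	return strings.Join(tokens, " ")
}

// featurizeForEmbedding appends "w1_w2" bigrams to the normalized unigrams
// to add a light local word-order signal.
func featurizeForEmbedding(s string) string {
	tokens := strings.Fields(normalizeForEmbedding(s))
	out := append([]string(nil), tokens...)
	for i := 0; i+1 < len(tokens); i++ {
		out = append(out, tokens[i]+"_"+tokens[i+1])
	}
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(normalizeForEmbedding("Postgres,") == normalizeForEmbedding("postgres")) // true
	fmt.Println(featurizeForEmbedding("Uses Postgres.")) // uses postgres uses_postgres
	fmt.Println(featurizeForEmbedding("uses mysql"))     // uses mysql uses_mysql
}
```

In this sketch the two example strings share only the "uses" token after featurization, which is the extra separation the bigrams are meant to give the embedder.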

Verification

gofmt/build/vet clean.
go test -race ./internal/memory/... passes — all prior tests green + new tests:
- Zero LLM calls on per-turn recall (TestEpisodeIndex_FormatEpisodeContextNoLLM)
- Cold start: index rebuilds from disk, returns relevant episodes
- Persistence: second Init loads gobs, same results
- Dirty rebuild: new episode picked up after markDirty
- Provenance filter: untrusted/unapproved excluded from recallByVector
- Concurrent safety under -race: N goroutines sharing one dir
- Featurization discrimination: "uses postgres" vs "migrated to mysql" → add; vs "database is postgres" → merge/judge
- Featurize unit tests, absolute-path singleton identity
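The absolute-path singleton check in the list above could be exercised against something like the following. It is a simplified sketch: `sharedIndex`, `indexMu`, and `indexes` are hypothetical names, a single mutex stands in for the double-checked locking the review discusses, and symlinks are not resolved.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

// episodeIndex stands in for the real index type; only identity matters here.
type episodeIndex struct{ dir string }

var (
	indexMu sync.Mutex
	indexes = map[string]*episodeIndex{}
)

// sharedIndex returns one index per directory, keyed by absolute path so that
// "memory" and "./memory" resolve to the same instance.
func sharedIndex(dir string) (*episodeIndex, error) {
	abs, err := filepath.Abs(dir)
	if err != nil {
		return nil, err
	}
	indexMu.Lock()
	defer indexMu.Unlock()
	if ix, ok := indexes[abs]; ok {
		return ix, nil
	}
	ix := &episodeIndex{dir: abs}
	indexes[abs] = ix
	return ix, nil
}

func main() {
	var wg sync.WaitGroup
	got := make([]*episodeIndex, 8)
	for i := range got {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			got[i], _ = sharedIndex("./memory")
		}(i)
	}
	wg.Wait()

	same := true
	for _, ix := range got {
		if ix != got[0] {
			same = false
		}
	}
	fmt.Println(same) // true
}
```

A registry like this only serializes callers inside one process; two separate processes pointed at the same directory would each hold their own instance.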

🤖 Generated with Claude Code

Fixes the two remaining memory weaknesses using only the existing go-vector library, with no new embedding dependency.

**#4: per-turn LLM call eliminated.** FormatEpisodeContext previously called episodes.Search → NewLLMRanker → one llm.SimpleCall per turn. The RP fallback was no better at scale: NewRPRanker re-instantiated RandomProjections and re-fit + re-embedded every episode on every Search call (no caching).

New episodeVectorIndex (mirroring session/vector_index.go) persists a go-vector Store + RandomProjections embedder to gob files. FormatEpisodeContext now calls episodes.recallByVector → sharedEpisodeIndex.search → embed query + N cached cosines. Zero LLM calls on the per-turn path.

Design: dirty-flag + full rebuild. RP must be Fit on the full corpus to produce a valid vocabulary; incremental embedding after a stale Fit yields degenerate vectors. Episodes are written at session-end (infrequent); rebuild is triggered on the next search after any write, then cached for all subsequent turns until the next write. One O(n) rebuild per session-end, then µs cosine per turn.

Process-wide singleton per memory directory (sharedEpisodeIndex, mirroring factsDirLock) prevents concurrent serve.go per-connection managers from racing on the gob files.

SearchEpisodes (explicit memory tool) now fetches candidates from the vector index and LLM-reranks only those candidates (bounded), fixing the O(n)·LLM cost in the explicit search path too. llm_search now gates explicit search reranking only; per-turn recall always uses RP.

**#3: featurization quality.** New featurize.go: normalizeForEmbedding (lowercase + alphanumeric tokens, strips punctuation so "Postgres," == "postgres") + featurizeForEmbedding (normalise + bigrams "w1_w2" for light local word order). Applied at the go-vector boundary in both the episode index (full ~1KB on-disk summaries, up from 120-char truncated; 256 dims, up from 64) and MergeDetector.Fit/Classify/AppendEntry/ReplaceEntry. Raw strings are preserved in m.corpus; only the RP boundary uses featurized text.

Verified: FormatEpisodeContext fires zero LLM calls; postgres vs mysql episodes rank distinctly; postgres vs "database is postgres" ranks similar; persistence round-trips; dirty rebuild picks up new episodes; provenance filter holds; concurrent safety under -race; all existing tests green.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…4/D-05/D-06)

Remediates the AI Verification Protocol findings on PR #12.

D-05 (OOV zero-score bypass): when a query has no vocabulary overlap with the episode corpus, go-vector Embed returns a zero vector and Store.Search returns k results, all with cosine similarity 0. recallByVector was returning those as non-empty candidates, so SearchEpisodes accepted the noise and skipped the LLM fallback. Fix: filter zero-score results in recallByVector before returning; an all-OOV query now returns nil, correctly triggering the fallback path in SearchEpisodes.

D-06 (SearchEpisodes at 52.9% coverage): four branches untested. Added tests for: llm_search=false (no LLM), nil LLM client, limit truncation, rankFn error fallback, and an OOV query triggering the nil-then-fallback path. Coverage is now 76.5%.

D-04 (multi-process caveat undocumented): the in-process singleton serializes concurrent MemoryManagers within one process, but two separate odek processes sharing ~/.odek/memory are not serialized. Added explicit documentation to the singleton var block explaining the limitation and its bounded impact.

D-01/D-02/D-03/D-07/D-08/D-09 confirmed held (no fix needed): double-checked locking is correct; per-TempDir tests prevent singleton bleed; corrupted emb gob falls back to rebuild; featurization is symmetric across all Fit/Embed paths.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

jkyberneees · 2026-06-06T09:58:18Z

Verification Protocol (v5.2.7) — Remediation pass (commit `f1e7309`)

All certificate findings closed.

| Finding | Sev | Status |
|---|---|---|
| D-05 OOV query returns zero-score noise → LLM fallback bypassed | WARN | ✅ Fixed: `recallByVector` now strips scores ≤ 0 before returning; all-OOV query returns nil, correctly triggering fallback in `SearchEpisodes`. |
| D-06 `SearchEpisodes` at 52.9% coverage (4 branches untested) | WARN | ✅ Fixed: added `llm_search=false`, nil-LLM, limit-truncation, `rankFn`-error-fallback, and OOV-triggers-nil-then-fallback tests. Coverage: 52.9% → 76.5%. |
| D-04 Multi-process gob contention undocumented | WARN | ✅ Documented: added explicit note to the singleton var block explaining the in-process vs cross-process serialization boundary and its bounded impact. |
| D-01 TOCTOU double-checked locking | info | ✅ Confirmed correct: second waiter always hits double-check and returns. |
| D-02 Singleton bleed across tests | info | ✅ Confirmed safe: each test uses unique `t.TempDir()`. |
| D-03 Mismatched gobs on partial write | info | ✅ Self-heals: `tryLoadLocked` falls through to rebuild if emb gob is corrupt; verified with injected corruption. |
| D-07/08/09 | info | ✅ All held. |
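The D-05 fix in the table above amounts to a small filter on the search results. The sketch below shows the idea with hypothetical names (`scored`, `dropNonPositive`); the real `recallByVector` works on go-vector result types, which are not reproduced here.

```go
package main

import "fmt"

// scored is a hypothetical result row: an episode ID and its cosine score.
type scored struct {
	ID    string
	Score float64
}

// dropNonPositive removes results whose score is <= 0. When nothing survives
// it returns nil, so a caller can tell "no signal" apart from "weak matches".
func dropNonPositive(results []scored) []scored {
	var kept []scored
	for _, r := range results {
		if r.Score > 0 {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// An out-of-vocabulary query: every candidate comes back with score 0.
	oov := []scored{{ID: "ep1", Score: 0}, {ID: "ep2", Score: 0}}
	fmt.Println(dropNonPositive(oov) == nil) // true

	mixed := []scored{{ID: "ep1", Score: 0.8}, {ID: "ep2", Score: 0}}
	fmt.Println(len(dropNonPositive(mixed))) // 1
}
```

Returning nil instead of an empty-but-non-nil slice is a choice made for this sketch; what matters for the fix is that an all-zero result set no longer looks like a usable candidate list.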

Re-scored axes

2.1 ✅ (OOV fix) · 2.2 ✅ (branches covered) · 2.3 ✅ · 2.4 ✅ (documented) · 2.5 ✅ · 2.6 ✅ · 2.7 ⚠️ (monoculture) · 2.8 ✅ · 2.9 ✅

Verdict: 🟡 HumanReviewRequired — monoculture only. No axis is 🔴 and no substantive finding remains. The HumanReviewRequired floor is the monoculture ρ penalty (same model authored and verified). All content findings are fixed.

jkyberneees and others added 2 commits June 6, 2026 11:38

jkyberneees merged commit 686fb6f into main Jun 6, 2026
6 checks passed

jkyberneees deleted the feat/memory-vector-recall branch June 6, 2026 10:24

jkyberneees mentioned this pull request Jun 6, 2026

feat(memory): deterministic buffer turn-summaries (no LLM) #16

Merged
