Skip to content

feat(memory): recall_memory tool — deliberate full-store semantic lookup (#47)#49

Merged
mattmezza merged 5 commits into
mainfrom
feat/recall-memory
Jun 28, 2026
Merged

feat(memory): recall_memory tool — deliberate full-store semantic lookup (#47)#49
mattmezza merged 5 commits into
mainfrom
feat/recall-memory

Conversation

@mattmezza

Copy link
Copy Markdown
Owner

Closes #47. Phase 2 of #41.

Why

Memory injection (PR #43) always injects a small relevance-ranked top-k into each turn. That covers the obvious facts, but as the store grows top-k can't surface everything — and the agent can't ask for a fact it doesn't suspect it has. The robust shape is hybrid: always-inject top-k plus a deliberate recall tool for the long tail.

What

A recall_memory tool the agent calls on demand to semantically search its full long-term store.

  • MemoryStore.recall(query, limit, scope) — embeds the query and ranks the entire long-term store (archived rows included) by pure semantic relevance, returning the best matches above a small floor. Falls back to lexical token-overlap when embeddings are disabled, so it always works.
  • Scope-aware (Two-tier scoped memory — shared owner pool + per-persona private memory #42) — filtered to scope IN ('', <active persona>) exactly like the injection readers, so a persona only recalls shared facts + its own private ones, never another persona's. The scope is the per-turn persona_name; the default identity recalls shared only.
  • Recall revives matches — reinforces them (access_count / last_accessed) and un-archives any that were archived (a fact the agent looked up and used is warm again).
  • Tool wiring — read-only (ALWAYS permission, no approval prompt), always retained per persona like load_skill (memory injection is always-on too, so recall exposes nothing extra). Nudged in the memory prompt instruction and the memory skill.
  • recall_top_k config knob (default 10), beside injection_top_k, hot-reloaded in patch_config.

Notes

Tests

uv run pytest471 passed. New coverage: semantic ranking, relevance-floor exclusion, archived-row search + un-archive/reinforce, limit cap, lexical fallback, per-persona scope isolation, tool dispatch with scope plumbing, blank-query guard. Docs (docs/memory.mdx, skills/memory.md) updated.

…kup (#47)

Phase 2 of #41. Always-on top-k injection covers the obvious facts but
can't surface everything as the store grows, and the agent can't ask for
a fact it doesn't suspect it has. Pair it with a recall_memory tool the
agent calls on demand:

- MemoryStore.recall(query, limit, scope): semantic search over the agent's
  ENTIRE long-term store (archived included), ranked purely by relevance
  above a small floor; lexical-overlap fallback when embeddings are off.
- Scope-aware (#42): filtered to scope IN ('', <persona>) exactly like the
  injection readers, so a persona only recalls shared + its own private
  memories, never another persona's. Scope is the per-turn persona_name.
- Recalling revives matches — reinforces them and un-archives any that
  were archived (looked up + matched = warm again).
- recall_memory tool: read-only (ALWAYS permission, no prompt), always
  retained per persona like load_skill (memory injection is always-on too,
  so recall exposes nothing extra); nudged in the memory prompt.
- recall_top_k config knob (default 10), hot-reloaded in patch_config.
…ch (#47)

Covers semantic ranking, relevance-floor exclusion, archived-row search +
un-archive/reinforce, limit cap, lexical fallback without an embedder,
per-persona scope isolation (a persona never recalls another's private
rows), tool dispatch with scope plumbing, and the blank-query guard.
#47)

The floor-exclusion assertion lived on the embedding path, where the test
_HashEmbedder buckets tokens via salted hash() into 64 dims — collisions
pushed the unrelated row above the floor for ~20% of PYTHONHASHSEEDs
(e.g. seed 42), failing intermittently. Move the exclusion check to the
lexical-fallback path, where zero token overlap is deterministic and still
guards _RECALL_MIN_RELEVANCE against being lowered to 0. Verified green
across seeds 0/1/2/7/42/123/999.

Also drop a placeholder (#) markdown link in the memory docs.
Sibling of injection_top_k: surfaces memory.embedding.recall_top_k via the
config GET (emb_recall_top_k) and a number input next to the existing
top-k field, capped at 25 (the recall hard limit) so a set value is never
silently clamped. Saved through the same embedding-settings PATCH, which
already hot-applies it to the running store — no restart.
@mattmezza mattmezza merged commit 9b26415 into main Jun 28, 2026
1 check passed
@mattmezza mattmezza deleted the feat/recall-memory branch June 28, 2026 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add recall_memory tool for deliberate full-store semantic lookup (phase 2 of #41)

1 participant