feat(memory): recall_memory tool - deliberate full-store semantic lookup (#47) #49
Merged
…kup (#47)
Phase 2 of #41. Always-on top-k injection covers the obvious facts but can't surface everything as the store grows, and the agent can't ask for a fact it doesn't suspect it has. Pair it with a recall_memory tool the agent calls on demand:
- MemoryStore.recall(query, limit, scope): semantic search over the agent's ENTIRE long-term store (archived included), ranked purely by relevance above a small floor; lexical-overlap fallback when embeddings are off.
- Scope-aware (#42): filtered to scope IN ('', <persona>) exactly like the injection readers, so a persona only recalls shared + its own private memories, never another persona's. Scope is the per-turn persona_name.
- Recalling revives matches: reinforces them and un-archives any that were archived (looked up + matched = warm again).
- recall_memory tool: read-only (ALWAYS permission, no prompt), always retained per persona like load_skill (memory injection is always-on too, so recall exposes nothing extra); nudged in the memory prompt.
- recall_top_k config knob (default 10), hot-reloaded in patch_config.
…ch (#47) Covers semantic ranking, relevance-floor exclusion, archived-row search + un-archive/reinforce, limit cap, lexical fallback without an embedder, per-persona scope isolation (a persona never recalls another's private rows), tool dispatch with scope plumbing, and the blank-query guard.
#47) The floor-exclusion assertion lived on the embedding path, where the test _HashEmbedder buckets tokens via salted hash() into 64 dims — collisions pushed the unrelated row above the floor for ~20% of PYTHONHASHSEEDs (e.g. seed 42), failing intermittently. Move the exclusion check to the lexical-fallback path, where zero token overlap is deterministic and still guards _RECALL_MIN_RELEVANCE against being lowered to 0. Verified green across seeds 0/1/2/7/42/123/999. Also drop a placeholder (#) markdown link in the memory docs.
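The flakiness described above comes from Python randomizing `str` hashes per process unless `PYTHONHASHSEED` is fixed. As a point of comparison only, here is a minimal sketch of a bucketing embedder whose buckets do not depend on the seed. It is not the test suite's `_HashEmbedder`, and the PR did not take this route (it moved the assertion to the lexical path instead); `stable_bucket`, `embed`, and `cosine` are hypothetical names.

```python
import hashlib
import math

DIMS = 64  # same dimensionality the commit message mentions for the test embedder


def stable_bucket(token: str, dims: int = DIMS) -> int:
    """Map a token to a bucket via SHA-256, which is not affected by PYTHONHASHSEED."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dims


def embed(text: str, dims: int = DIMS) -> list[float]:
    """Bag-of-tokens vector: count tokens per bucket, then L2-normalize."""
    vec = [0.0] * dims
    for token in text.lower().split():
        vec[stable_bucket(token, dims)] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))
```

With only 64 buckets, two unrelated tokens can still collide, but the same collisions would occur on every run, so an assertion either passes or fails consistently rather than for a fraction of seeds.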
Sibling of injection_top_k: surfaces memory.embedding.recall_top_k via the config GET (emb_recall_top_k) and a number input next to the existing top-k field, capped at 25 (the recall hard limit) so a set value is never silently clamped. Saved through the same embedding-settings PATCH, which already hot-applies it to the running store — no restart.
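The "never silently clamped" idea above can be illustrated with a small validator that rejects out-of-range values instead of quietly adjusting them. This is a sketch under assumptions: `parse_recall_top_k` is a hypothetical helper, not code from this PR, and the lower bound of 1 is assumed (the PR only states the cap of 25 and the default of 10).

```python
RECALL_HARD_LIMIT = 25     # stated in the PR: the recall hard limit
DEFAULT_RECALL_TOP_K = 10  # stated in the PR: default value of recall_top_k


def parse_recall_top_k(raw) -> int:
    """Return a validated recall_top_k, or raise instead of clamping."""
    if raw is None or str(raw).strip() == "":
        return DEFAULT_RECALL_TOP_K
    value = int(raw)  # raises ValueError for non-numeric input
    # Assumed lower bound of 1; the upper bound mirrors the hard limit.
    if not 1 <= value <= RECALL_HARD_LIMIT:
        raise ValueError(
            f"recall_top_k must be between 1 and {RECALL_HARD_LIMIT}, got {value}"
        )
    return value
```

Rejecting the value at the edge means the number a user sees in the settings form is the number the store actually uses.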
Closes #47. Phase 2 of #41.
Why
Memory injection (PR #43) always injects a small relevance-ranked top-k into each turn. That covers the obvious facts, but as the store grows top-k can't surface everything — and the agent can't ask for a fact it doesn't suspect it has. The robust shape is hybrid: always-inject top-k plus a deliberate recall tool for the long tail.
What
A `recall_memory` tool the agent calls on demand to semantically search its full long-term store.
- `MemoryStore.recall(query, limit, scope)`: embeds the query and ranks the entire long-term store (archived rows included) by pure semantic relevance, returning the best matches above a small floor. Falls back to lexical token-overlap when embeddings are disabled, so it always works.
- Scope-aware (#42): filtered to `scope IN ('', <active persona>)` exactly like the injection readers, so a persona only recalls shared facts + its own private ones, never another persona's. The scope is the per-turn `persona_name`; the default identity recalls shared only.
- Recalling revives matches: reinforces them (`access_count`/`last_accessed`) and un-archives any that were archived (a fact the agent looked up and used is warm again).
- Read-only tool (`ALWAYS` permission, no approval prompt), always retained per persona like `load_skill` (memory injection is always-on too, so recall exposes nothing extra). Nudged in the memory prompt instruction and the memory skill.
- `recall_top_k` config knob (default 10), beside `injection_top_k`, hot-reloaded in `patch_config`.
Notes
… `main` and rebased onto current `main` (which now carries scoped memory #42 + per-turn injection #43). An adversarial multi-agent review flagged that an unscoped recall query would have silently become a cross-persona leak on the scope-aware base. This was fixed by making `recall()` scope-aware and threading the persona's scope through the tool, and it is covered by a new scope-isolation test.
Tests
`uv run pytest` reports 471 passed. New coverage: semantic ranking, relevance-floor exclusion, archived-row search + un-archive/reinforce, limit cap, lexical fallback, per-persona scope isolation, tool dispatch with scope plumbing, blank-query guard. Docs (`docs/memory.mdx`, `skills/memory.md`) updated.
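The recall behaviour described under "What" can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the PR's `MemoryStore`: the `Memory` dataclass, the `MIN_RELEVANCE` value, the `embed` parameter, and the token-overlap formula are all hypothetical stand-ins for the real schema, `_RECALL_MIN_RELEVANCE`, the configured embedder, and the actual fallback scoring.

```python
import math
import time
from dataclasses import dataclass

MIN_RELEVANCE = 0.1  # assumed value; the PR only says "a small floor"
HARD_LIMIT = 25      # the recall hard limit mentioned in the PR


@dataclass
class Memory:
    text: str
    scope: str = ""          # "" = shared, otherwise the owning persona's name
    archived: bool = False
    access_count: int = 0
    last_accessed: float = 0.0


def lexical_score(query: str, text: str) -> float:
    """Fallback scoring: fraction of query tokens that appear in the text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall(store, query, limit=10, scope="", embed=None):
    """Rank every visible memory (archived included) and revive the matches."""
    if not query.strip():  # blank-query guard
        return []
    # Scope filter: shared rows plus the active persona's own private rows.
    visible = [m for m in store if m.scope in ("", scope)]
    if embed is not None:
        qv = embed(query)
        scored = [(cosine(qv, embed(m.text)), m) for m in visible]
    else:
        scored = [(lexical_score(query, m.text), m) for m in visible]
    hits = [(s, m) for s, m in scored if s > MIN_RELEVANCE]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    top = [m for _, m in hits[: min(limit, HARD_LIMIT)]]
    for m in top:  # reinforce and un-archive whatever was recalled
        m.access_count += 1
        m.last_accessed = time.time()
        m.archived = False
    return top
```

Called as `recall(store, "coffee order", scope="alice")`, the sketch would consider shared rows and rows scoped to `alice`, and would never return a row scoped to another persona, which is the isolation property the new test covers.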