Skip to content

fix(agent): inject memory + reflections per-turn, not in frozen snapshot (#41)#43

Merged
mattmezza merged 9 commits into
mainfrom
feat/per-turn-memory
Jun 28, 2026
Merged

fix(agent): inject memory + reflections per-turn, not in frozen snapshot (#41)#43
mattmezza merged 9 commits into
mainfrom
feat/per-turn-memory

Conversation

@mattmezza

Copy link
Copy Markdown
Owner

Closes #41.

Problem

In session mode the static system prompt is snapshotted once per session so the cacheable prefix stays stable and the provider's prompt cache hits. But memories and task reflections were baked into that snapshot, so anything extracted mid-session was invisible to the model until /new rebuilt the snapshot. It bites hardest:

  1. After compaction — a fact extracted early, then compacted out of raw context and absent from the frozen snapshot, becomes genuinely invisible.
  2. Cross-chat — a fact taught in chat B lands in the global store but chat A's open session can't see it until A resets.
  3. Reflections — a lesson from a tool failure is most useful on the next similar task, often in the same session.

Fix (the cheap one)

Move memory + reflections out of the frozen snapshot and into the existing per-turn, never-cached preamble (_turn_preamble), which already injects live date/time + the execution plan onto the current user message.

  • Static snapshot keeps only truly static content (persona / skills / instructions) — still snapshotted once, cache intact.
  • Memory is fetched fresh each turn via memory.format_for_prompt(query=message) — already query-aware — so it is both fresh and per-turn relevance-ranked.
  • The preamble rides on the new user turn, which is uncached anyway, so there is no extra cache miss on the static prefix or prior messages. Cost is only the (bounded, top-k) memory block's own tokens, paid on a turn you'd pay for regardless — far cheaper than rebuilding the whole prompt.
  • Resolved the issue's open question as preamble for both modes → one code path; injection mode also gets query-relevant-per-turn memory.

Reflections are injected fresh too (still query-less / most-recent-N — making them query-aware is the noted follow-up). The recall_memory tool is the issue's explicitly-deferred phase 2 and is not in this PR.

Self-check

test_mid_session_memory_visible_next_turn_without_new asserts a long-term memory written mid-session appears in the next turn's preamble — without /new — while the static snapshot stays frozen.

Docs

Updated architecture.mdx and memory.mdx to describe per-turn injection (dropped the stale "snapshot holds memories" / "first message is the query" wording).

…hot (#41)

Session mode snapshots the static system prompt once per session so the
cacheable prefix stays stable. But memories and task reflections were baked
into that snapshot, so anything extracted mid-session stayed invisible to the
model until /new — hurting most after compaction, across chats, and for
reflections meant for the next similar task in the same session.

Move both into the per-turn preamble (already an uncached seam carrying the
live date/time + execution plan). The static snapshot now holds only truly
static content (persona/skills/instructions); memory is fetched fresh each
turn via format_for_prompt(query=message), so it is both current and
relevance-ranked per turn. Cost is only the block's own (bounded, top-k)
tokens on the new turn — far cheaper than rebuilding the whole prefix.
#41)

Asserts a long-term memory written mid-session appears in the next turn's
preamble — without /new and without rebuilding the frozen session snapshot.
…n cost (#41)

Adversarial review: extend the mid-session self-check to insert a task
reflection too and assert <task_reflections> appears in the next preamble —
covering the third staleness case the issue names. Add a comment flagging that
session mode now retrieves per turn (intended; phase-2 recall tool if the store
grows huge).
@mattmezza mattmezza force-pushed the feat/per-turn-memory branch from cd73b75 to efd61da Compare June 28, 2026 20:29
…ate (#42)

Memory was global to the owner: every persona read and wrote one pool, so a
fitness coach could surface facts only ever told to the finance assistant, and
group multi-agent (#30) would silently share private memory.

Add a `scope` column to long_term + short_term ('' = shared owner-level,
'<persona>' = private to that persona), mirroring the secrets vault's two-tier
shared/scoped model (#19). Additive ALTER TABLE migration — existing rows
become shared, the correct default.

- Retrieval (format_for_prompt / get_relevant_long_term / get_short_term)
  filters scope IN ('', <active persona>); the default identity sees shared only.
- Extraction is tagged: the active persona is plumbed through; the extractor may
  mark a fact private to that persona, defaulting to shared when unsure.
- Dedup/UPDATE/DELETE candidates are bounded to shared + own scope, so a private
  fact can never merge into or delete another persona's memory.
- Consolidation promotes per scope; hygiene clusters within a scope only — no
  cross-persona merge.

Composes with the per-turn memory seam from the preceding change (#41): scope is
resolved from the active persona and passed into the same preamble injection.
…42)

Show each memory's scope (shared vs the owning persona) in the admin Memory
tables and the /memory/long-term + /memory/short-term JSON. Migrate-on-read so
a legacy DB gains the column even when no agent is running.
#42)

Covers the hard invariant: a persona's private memory is invisible to other
personas and the default identity across retrieval, dedup candidates, and the
hygiene pass (no cross-scope merge); legacy rows default to shared.
feat(memory): two-tier scoped memory — shared pool + per-persona private (#42)
@mattmezza mattmezza merged commit 634b16e into main Jun 28, 2026
1 check passed
@mattmezza mattmezza deleted the feat/per-turn-memory branch June 28, 2026 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session-mode memory/reflection staleness — inject fresh memories per-turn (+ recall tool)

1 participant