fix: agent loses context and halts after first session compaction #3042

Merged
rumpl merged 2 commits into docker:main from rumpl:fix/compaction-context-loss on Jun 9, 2026

@rumpl rumpl commented Jun 9, 2026

Member

Fixes #2871

Problem

After the first session compaction in a multi-agent run, the agent halts mid-task and replies as if it has no conversation history ("I understand you're looking for a session summary, but ... no previous conversation history visible").

Root cause

Two compounding bugs, surfaced by ead9745 / 8dba51f (2026-05-18) which expanded when compaction activates — four days before the issue was filed:

  1. Phantom trigger in multi-agent runs: compactIfNeeded estimated newly-added tokens via sess.GetAllMessages(), which recurses into sub-sessions. The content produced by a transfer_task child was attributed to the parent session even though it never enters the parent's prompt (GetMessages skips sub-session items). The phantom tokens triggered compaction of a parent conversation that was actually tiny; with everything fitting the keep budget, the split resolved to the "compact everything, keep nothing" sentinel — wiping the user's task and the in-flight tool exchange. The agent's next prompt was literally just Session Summary: ..., which models read as the user asking for a summary. This also explains the "first compaction only" symptom: the first compaction fires while the parent history is still tiny; after re-prompting, later compactions keep a real tail.

  2. Fixed budgets break small context windows: MaxSummaryTokens (16k) and maxKeepTokens (20k) are absolute constants. For models whose window resolves from provider_opts.context_size and is ≤ ~16k, the summarizer's input budget went to zero — it received only its own prompts, fabricated a "no history" non-summary, and that text replaced the entire session history.
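The phantom trigger in item 1 comes down to which accessor feeds the token estimate. A minimal sketch, assuming simplified stand-in types: `Session`, `Item`, `Message`, and the 4-characters-per-token `estimateTokens` heuristic are illustrative only, not the project's real definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// Message, Item and Session are hypothetical stand-ins for the real types.
type Message struct {
	Content string
}

// Item is either a direct message or a sub-session (e.g. a transfer_task child).
type Item struct {
	Message *Message
	Sub     *Session
}

type Session struct {
	Items []Item
}

// OwnMessages returns only the messages stored directly on this session.
func (s *Session) OwnMessages() []Message {
	var out []Message
	for _, it := range s.Items {
		if it.Message != nil {
			out = append(out, *it.Message)
		}
	}
	return out
}

// GetAllMessages also recurses into sub-sessions, so child output is included.
func (s *Session) GetAllMessages() []Message {
	var out []Message
	for _, it := range s.Items {
		switch {
		case it.Message != nil:
			out = append(out, *it.Message)
		case it.Sub != nil:
			out = append(out, it.Sub.GetAllMessages()...)
		}
	}
	return out
}

// estimateTokens is a crude estimate (about 4 characters per token); an assumption.
func estimateTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += len(m.Content) / 4
	}
	return total
}

func main() {
	child := &Session{Items: []Item{
		{Message: &Message{Content: strings.Repeat("x", 400_000)}},
	}}
	parent := &Session{Items: []Item{
		{Message: &Message{Content: "please refactor the parser"}},
		{Sub: child},
	}}
	fmt.Println("recursive estimate:", estimateTokens(parent.GetAllMessages()))
	fmt.Println("own estimate:      ", estimateTokens(parent.OwnMessages()))
}
```

In this sketch the recursive estimate is dominated by the child's output, while the own-messages estimate reflects only the tiny parent conversation that is actually sent in the parent's prompt.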

Fix

  • Session.OwnMessages() (no sub-session recursion) now drives the compaction trigger's token accounting, so sub-agent work no longer causes phantom parent compactions.
  • Summary/keep budgets scale with the window: min(16k, limit/4) for the summary cap and min(20k, limit/5) for the kept tail. The scaled summary cap is also used for the summary call's max_tokens.
  • Safety net: RunLLM no-ops when no conversation message fits the summarization budget, instead of running the summarizer on an empty conversation and wiping history with the result.
  • ComputeFirstKeptEntry gains a contextLimit parameter so hook-supplied summaries share the same kept-tail policy.
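To make the budget arithmetic concrete, here is a minimal sketch of the scaling rule. The function names are hypothetical, the constants assume "16k" and "20k" mean 16,000 and 20,000 tokens, and prompt overhead is ignored. The fallback for a non-positive limit follows the commit message for the kept tail; applying it to the summary cap as well is an assumption of this sketch.

```go
package main

import "fmt"

// Assumed absolute caps; the description only says "16k" and "20k".
const (
	maxSummaryTokens = 16_000
	maxKeepTokens    = 20_000
)

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// scaled returns min(capTokens, contextLimit/divisor).
// A non-positive limit falls back to the unscaled cap.
func scaled(capTokens, contextLimit, divisor int) int {
	if contextLimit <= 0 {
		return capTokens
	}
	return minInt(capTokens, contextLimit/divisor)
}

// summaryCap is min(16k, limit/4).
func summaryCap(contextLimit int) int {
	return scaled(maxSummaryTokens, contextLimit, 4)
}

// keepBudget is min(20k, limit/5).
func keepBudget(contextLimit int) int {
	return scaled(maxKeepTokens, contextLimit, 5)
}

// oldSummarizerInput mirrors the pre-fix sizing: the fixed summary cap is
// subtracted from the window and the result is clamped at zero.
func oldSummarizerInput(contextLimit int) int {
	return maxInt(0, contextLimit-maxSummaryTokens)
}

// newSummarizerInput subtracts the scaled cap instead.
func newSummarizerInput(contextLimit int) int {
	return maxInt(0, contextLimit-summaryCap(contextLimit))
}

func main() {
	for _, limit := range []int{8_000, 200_000} {
		fmt.Printf("window=%d summaryCap=%d keepBudget=%d oldInput=%d newInput=%d\n",
			limit, summaryCap(limit), keepBudget(limit),
			oldSummarizerInput(limit), newSummarizerInput(limit))
	}
}
```

Under these assumptions, an 8,000-token window gets a 2,000-token summary cap and a 1,600-token kept tail, and the summarizer's input budget rises from 0 to 6,000 tokens. A 200,000-token window is unaffected because both minimums resolve to the absolute caps.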

Tests

  • TestCompactIfNeeded_IgnoresSubSessionTokens — regression test, verified to fail against the old trigger code.
  • TestCompactIfNeeded_TriggersOnOwnMessages — large own tool results still trigger.
  • TestRunLLM_SmallContextWindow — summarizer receives real conversation on an 8k window and a tail is kept.
  • TestRunLLM_NoConversationFits_NoOps — empty summarizer input no-ops instead of wiping history.

task build, task test, and task lint all pass, apart from the pre-existing, environment-dependent pkg/sandbox.TestExtraWorkspace failure, which also fails on a clean main.

Note for reviewers

One residual hazard left untouched (documented contract defended in 1e9512e): a legitimately triggered compaction whose whole conversation fits the keep budget (possible with image-heavy histories — token estimates ignore images) still drops the tail via the "compact everything" sentinel. Happy to follow up with a "keep the last user turn on threshold/overflow compaction" change if desired.
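For reviewers who want to picture the residual hazard, here is a rough sketch of a backward-walking kept-tail split with a keep-nothing sentinel. The walk, the `firstKept` name, and the `-1` sentinel value are all assumptions made for illustration. Only the observable behavior (a conversation that fits the keep budget entirely ends up with no kept tail) comes from the description above.

```go
package main

import "fmt"

// keepNothing is a hypothetical sentinel: summarize everything, keep no tail.
const keepNothing = -1

// firstKept walks backward from the newest message, accumulating token
// estimates, and returns the index of the oldest message kept verbatim.
// If the newest message alone exceeds the budget it returns len(tokens),
// i.e. an empty tail. If the whole conversation fits the budget, this
// sketch mirrors the described behavior and returns keepNothing.
func firstKept(tokens []int, keepBudget int) int {
	used := 0
	for i := len(tokens) - 1; i >= 0; i-- {
		if used+tokens[i] > keepBudget {
			return i + 1
		}
		used += tokens[i]
	}
	return keepNothing
}

func main() {
	// A long history: only the newest messages fit the keep budget.
	fmt.Println(firstKept([]int{5_000, 9_000, 12_000, 4_000}, 20_000))
	// A tiny (or image-heavy, hence under-estimated) history: everything fits.
	fmt.Println(firstKept([]int{200, 300}, 20_000))
}
```

With these inputs the first call returns 2 (the last two messages are kept) and the second returns the sentinel, which is the case a "keep the last user turn" follow-up would change.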

Assisted-By: docker-agent

rumpl added 2 commits June 9, 2026 21:30

compactIfNeeded estimated the token impact of newly added messages via
sess.GetAllMessages(), which recurses into sub-sessions. In multi-agent
runs the content produced by a transfer_task child was therefore
attributed to the parent session even though it never enters the
parent's prompt (GetMessages skips sub-session items).

The phantom tokens triggered a compaction of a parent conversation that
was actually tiny; with everything fitting the keep budget the split
resolved to the 'compact everything, keep nothing' sentinel, so the
user's task and the in-flight tool exchange were wiped. The agent's next
prompt was literally just 'Session Summary: ...', which models read as
the user asking for a summary and answer with a confused 'I see no
conversation history' reply, halting mid-task.

Add Session.OwnMessages() (direct messages only, no sub-session
recursion) and use it for the trigger's before/after counts so the
estimate matches what the session actually sends.

Fixes docker#2871

Assisted-By: docker-agent
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

The compactor used fixed absolute budgets: MaxSummaryTokens (16k) was
subtracted from the window when sizing the summarizer's input, and
maxKeepTokens (20k) sized the verbatim-kept tail. Since ead9745 made
compaction activate for models whose window resolves from
provider_opts.context_size, both constants can exceed the entire
window: contextAvailable went to zero, FirstIndexInBudget dropped every
conversation message, and the summarizer received only its own prompts.
It then fabricated an 'I see no conversation history' non-summary that
replaced the real session history.

Scale both budgets to the window (min(16k, limit/4) for the summary
cap, min(20k, limit/5) for the kept tail) so the kept tail plus the
summary always land well under the compaction threshold, and use the
scaled cap for the summary call's max_tokens so small-window providers
don't reject the request.

As a safety net, RunLLM now no-ops when not a single conversation
message fits the summarization budget (e.g. one giant tool result)
instead of running the summarizer on an empty conversation and wiping
the history with the result.

ComputeFirstKeptEntry gains a contextLimit parameter so hook-supplied
summaries share the same kept-tail policy; a non-positive limit falls
back to the unscaled budget.

Related to docker#2871

Assisted-By: docker-agent
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
@rumpl rumpl requested a review from a team as a code owner June 9, 2026 19:31

@docker-agent docker-agent left a comment

Assessment: 🟢 APPROVE

This PR correctly addresses two compounding compaction bugs:

  1. Phantom token trigger — switching from GetAllMessages() (which recurses into sub-sessions) to OwnMessages() (which does not) ensures sub-agent token counts no longer falsely trigger parent-session compaction.
  2. Fixed budget overflow — scaling MaxSummaryTokens and maxKeepTokens proportionally to the context window (limit/4 and limit/5) prevents the summarizer from consuming the entire budget on small-window models, and the len(messages) <= 2 no-op guard correctly prevents a fabricated non-summary from replacing real session history.
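The len(messages) <= 2 guard in item 2 can be pictured as follows. This is a minimal sketch, assuming the summarizer input consists of one system prompt, the conversation messages that fit the budget, and one final instruction. The `Message` type, `buildSummarizerInput`, `summarize`, and the prompt strings are hypothetical stand-ins, not the actual RunLLM code.

```go
package main

import "fmt"

// Message is a hypothetical stand-in; Tokens is a pre-computed estimate.
type Message struct {
	Role    string
	Content string
	Tokens  int
}

// buildSummarizerInput wraps the newest conversation messages that fit the
// budget between the summarizer's own system prompt and final instruction.
func buildSummarizerInput(conv []Message, budget int) []Message {
	input := []Message{{Role: "system", Content: "You summarize conversations."}}
	used := 0
	start := len(conv)
	for i := len(conv) - 1; i >= 0; i-- {
		if used+conv[i].Tokens > budget {
			break
		}
		used += conv[i].Tokens
		start = i
	}
	input = append(input, conv[start:]...)
	return append(input, Message{Role: "user", Content: "Summarize the conversation above."})
}

// summarize reports false (a no-op) when the input holds only the two
// summarizer prompts, so an empty "summary" never replaces real history.
func summarize(conv []Message, budget int, llm func([]Message) string) (string, bool) {
	input := buildSummarizerInput(conv, budget)
	if len(input) <= 2 {
		return "", false
	}
	return llm(input), true
}

func main() {
	fakeLLM := func(in []Message) string {
		return fmt.Sprintf("summary of %d messages", len(in)-2)
	}

	// One giant tool result that cannot fit: the call no-ops.
	giant := []Message{{Role: "tool", Content: "...", Tokens: 50_000}}
	_, ok := summarize(giant, 6_000, fakeLLM)
	fmt.Println("giant tool result summarized:", ok)

	// A normal conversation: the summarizer runs on real content.
	conv := []Message{
		{Role: "user", Content: "fix the bug", Tokens: 10},
		{Role: "assistant", Content: "on it", Tokens: 8},
	}
	s, ok := summarize(conv, 6_000, fakeLLM)
	fmt.Println(s, ok)
}
```

The design point is that the guard checks the assembled input rather than the raw conversation length, so it also covers the case where messages exist but none fit the budget.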

Verification summary:

  • The ApplyCompaction path only appends to s.Messages, so the sess.OwnMessages()[messageCountBefore:] slice in compactIfNeeded cannot panic (length is monotonically non-decreasing in a single-goroutine call chain).
  • OwnMessages() excluding system-role items is intentional and consistent with GetAllMessages(); the invariant system messages in GetMessages() are built dynamically and were never stored in session items.
  • All four new tests (TestCompactIfNeeded_IgnoresSubSessionTokens, TestCompactIfNeeded_TriggersOnOwnMessages, TestRunLLM_SmallContextWindow, TestRunLLM_NoConversationFits_NoOps) directly target the described regression scenarios.

No confirmed or likely bugs found in the changed code.

@rumpl rumpl merged commit 4af658c into docker:main Jun 9, 2026
8 checks passed
aheritier added a commit that referenced this pull request Jun 10, 2026
…windows

After the fix in #3042, the summary and keep-tail token budgets used during
session compaction scale proportionally to provider_opts.context_size instead
of using absolute 16k/20k constants. Small-context-window models (≤ ~16k)
no longer have their history wiped during compaction.

Ref: #3042
Development

Successfully merging this pull request may close these issues.

Agent loses context and halts after first session compaction

3 participants