fix: agent loses context and halts after first session compaction by rumpl · Pull Request #3042 · docker/docker-agent

rumpl · 2026-06-09T19:31:28Z

Problem

After the first session compaction in a multi-agent run, the agent halts mid-task and replies as if it has no conversation history ("I understand you're looking for a session summary, but ... no previous conversation history visible").

Root cause

Two compounding bugs, surfaced by ead9745 / 8dba51f (2026-05-18) which expanded when compaction activates — four days before the issue was filed:

Phantom trigger in multi-agent runs: compactIfNeeded estimated newly-added tokens via sess.GetAllMessages(), which recurses into sub-sessions. The content produced by a transfer_task child was attributed to the parent session even though it never enters the parent's prompt (GetMessages skips sub-session items). The phantom tokens triggered compaction of a parent conversation that was actually tiny; with everything fitting the keep budget, the split resolved to the "compact everything, keep nothing" sentinel — wiping the user's task and the in-flight tool exchange. The agent's next prompt was literally just Session Summary: ..., which models read as the user asking for a summary. This also explains the "first compaction only" symptom: the first compaction fires while the parent history is still tiny; after re-prompting, later compactions keep a real tail.
Fixed budgets break small context windows: MaxSummaryTokens (16k) and maxKeepTokens (20k) are absolute constants. For models whose window resolves from provider_opts.context_size and is ≤ ~16k, the summarizer's input budget went to zero — it received only its own prompts, fabricated a "no history" non-summary, and that text replaced the entire session history.
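To make the phantom trigger concrete, here is a minimal sketch of the two traversals. The `Message`, `Item`, and `Session` types, the `AllMessages` name, `demoSession`, and the 4-characters-per-token `estimateTokens` heuristic are simplified assumptions for illustration, not the project's actual definitions; only the idea (a recursive walk counts sub-session content, an own-messages walk does not) comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a simplified stand-in for a chat message (assumption).
type Message struct{ Role, Content string }

// Item is either a direct message or a nested sub-session,
// such as the child created by a transfer_task call (assumption).
type Item struct {
	Message    *Message
	SubSession *Session
}

// Session holds an ordered list of items (assumption).
type Session struct{ Items []Item }

// AllMessages recurses into sub-sessions, mirroring the behaviour
// the PR attributes to GetAllMessages.
func (s *Session) AllMessages() []Message {
	var out []Message
	for _, it := range s.Items {
		switch {
		case it.Message != nil:
			out = append(out, *it.Message)
		case it.SubSession != nil:
			out = append(out, it.SubSession.AllMessages()...)
		}
	}
	return out
}

// OwnMessages returns only the session's direct messages.
func (s *Session) OwnMessages() []Message {
	var out []Message
	for _, it := range s.Items {
		if it.Message != nil {
			out = append(out, *it.Message)
		}
	}
	return out
}

// estimateTokens is a crude hypothetical estimator (about 4 chars per token).
func estimateTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += len(m.Content) / 4
	}
	return total
}

// demoSession builds a tiny parent conversation with one large sub-agent child.
func demoSession() *Session {
	child := &Session{Items: []Item{
		{Message: &Message{Role: "assistant", Content: strings.Repeat("x", 40000)}},
	}}
	return &Session{Items: []Item{
		{Message: &Message{Role: "user", Content: "fix the failing test"}},
		{SubSession: child},
		{Message: &Message{Role: "assistant", Content: "delegated to sub-agent"}},
	}}
}

func main() {
	s := demoSession()
	fmt.Println("recursive estimate:", estimateTokens(s.AllMessages()))
	fmt.Println("own-only estimate: ", estimateTokens(s.OwnMessages()))
}
```

In this toy session the recursive estimate is dominated by the child's output, although none of that content would be sent in the parent's prompt. A trigger fed by the recursive number would therefore compact a parent history that is only two short messages long.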

Fix

Session.OwnMessages() (no sub-session recursion) now drives the compaction trigger's token accounting, so sub-agent work no longer causes phantom parent compactions.
Summary and keep budgets scale with the window: the summary cap is min(16k, limit/4) and the kept tail is min(20k, limit/5). The scaled summary cap is also used for the summary call's max_tokens.
Safety net: RunLLM no-ops when no conversation message fits the summarization budget, instead of running the summarizer on an empty conversation and wiping history with the result.
ComputeFirstKeptEntry gains a contextLimit parameter so hook-supplied summaries share the same kept-tail policy.
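The scaling rule can be sketched as follows. The two formulas and the 16k/20k caps come from the list above; the function names, the `scaled` helper, and the way the summarizer's input budget is derived (window minus summary cap, clamped at zero) are assumptions made for illustration. The fallback for a non-positive limit is described in this PR for the kept tail only; applying it to the summary cap as well is also an assumption.

```go
package main

import "fmt"

const (
	maxSummaryTokens = 16_000 // absolute cap on summary output
	maxKeepTokens    = 20_000 // absolute cap on the verbatim-kept tail
)

// scaled returns min(ceiling, limit/divisor), falling back to the
// unscaled ceiling when the context limit is unknown (non-positive).
func scaled(ceiling, limit, divisor int) int {
	if limit <= 0 {
		return ceiling
	}
	if s := limit / divisor; s < ceiling {
		return s
	}
	return ceiling
}

// summaryBudget implements min(16k, limit/4).
func summaryBudget(limit int) int { return scaled(maxSummaryTokens, limit, 4) }

// keepBudget implements min(20k, limit/5).
func keepBudget(limit int) int { return scaled(maxKeepTokens, limit, 5) }

// summarizerInputBudget is a hypothetical model of how much of the window
// is left for conversation once the summary output is reserved.
func summarizerInputBudget(limit, summaryCap int) int {
	if left := limit - summaryCap; left > 0 {
		return left
	}
	return 0
}

func main() {
	for _, limit := range []int{8_192, 16_000, 200_000} {
		fmt.Printf("window=%d summary=%d keep=%d input(old)=%d input(new)=%d\n",
			limit,
			summaryBudget(limit),
			keepBudget(limit),
			summarizerInputBudget(limit, maxSummaryTokens),
			summarizerInputBudget(limit, summaryBudget(limit)),
		)
	}
}
```

Under these assumptions a 16,000-token window leaves zero input tokens with the fixed 16k cap but 12,000 with the scaled cap of 4,000, while a 200,000-token window is unaffected because both caps still apply.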

Tests

TestCompactIfNeeded_IgnoresSubSessionTokens — regression test, verified to fail against the old trigger code.
TestCompactIfNeeded_TriggersOnOwnMessages — large own tool results still trigger.
TestRunLLM_SmallContextWindow — summarizer receives real conversation on an 8k window and a tail is kept.
TestRunLLM_NoConversationFits_NoOps — empty summarizer input no-ops instead of wiping history.

task build, task test, and task lint all pass, apart from the pre-existing, environment-dependent pkg/sandbox.TestExtraWorkspace failure, which also fails on a clean main.

Note for reviewers

One residual hazard left untouched (documented contract defended in 1e9512e): a legitimately triggered compaction whose whole conversation fits the keep budget (possible with image-heavy histories — token estimates ignore images) still drops the tail via the "compact everything" sentinel. Happy to follow up with a "keep the last user turn on threshold/overflow compaction" change if desired.

Assisted-By: docker-agent

compactIfNeeded estimated the token impact of newly added messages via sess.GetAllMessages(), which recurses into sub-sessions. In multi-agent runs the content produced by a transfer_task child was therefore attributed to the parent session even though it never enters the parent's prompt (GetMessages skips sub-session items).

The phantom tokens triggered a compaction of a parent conversation that was actually tiny; with everything fitting the keep budget the split resolved to the 'compact everything, keep nothing' sentinel, so the user's task and the in-flight tool exchange were wiped. The agent's next prompt was literally just 'Session Summary: ...', which models read as the user asking for a summary and answer with a confused 'I see no conversation history' reply, halting mid-task.

Add Session.OwnMessages() (direct messages only, no sub-session recursion) and use it for the trigger's before/after counts so the estimate matches what the session actually sends.

Fixes docker#2871
Assisted-By: docker-agent
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

The compactor used fixed absolute budgets: MaxSummaryTokens (16k) was subtracted from the window when sizing the summarizer's input, and maxKeepTokens (20k) sized the verbatim-kept tail. Since ead9745 made compaction activate for models whose window resolves from provider_opts.context_size, both constants can exceed the entire window: contextAvailable went to zero, FirstIndexInBudget dropped every conversation message, and the summarizer received only its own prompts. It then fabricated an 'I see no conversation history' non-summary that replaced the real session history.

Scale both budgets to the window (min(16k, limit/4) for the summary cap, min(20k, limit/5) for the kept tail) so the kept tail plus the summary always land well under the compaction threshold, and use the scaled cap for the summary call's max_tokens so small-window providers don't reject the request.

As a safety net, RunLLM now no-ops when not a single conversation message fits the summarization budget (e.g. one giant tool result) instead of running the summarizer on an empty conversation and wiping the history with the result.

ComputeFirstKeptEntry gains a contextLimit parameter so hook-supplied summaries share the same kept-tail policy; a non-positive limit falls back to the unscaled budget.

Related to docker#2871
Assisted-By: docker-agent
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

docker-agent

Assessment: 🟢 APPROVE

This PR correctly addresses two compounding compaction bugs:

Phantom token trigger — switching from GetAllMessages() (which recurses into sub-sessions) to OwnMessages() (which does not) ensures sub-agent token counts no longer falsely trigger parent-session compaction.
Fixed budget overflow — scaling MaxSummaryTokens and maxKeepTokens proportionally to the context window (limit/4 and limit/5) prevents the summarizer from consuming the entire budget on small-window models, and the len(messages) <= 2 no-op guard correctly prevents a fabricated non-summary from replacing real session history.

Verification summary:

The ApplyCompaction path only appends to s.Messages, so the sess.OwnMessages()[messageCountBefore:] slice in compactIfNeeded cannot panic (length is monotonically non-decreasing in a single-goroutine call chain).
OwnMessages() excluding system-role items is intentional and consistent with GetAllMessages(); the invariant system messages in GetMessages() are built dynamically and were never stored in session items.
All four new tests (TestCompactIfNeeded_IgnoresSubSessionTokens, TestCompactIfNeeded_TriggersOnOwnMessages, TestRunLLM_SmallContextWindow, TestRunLLM_NoConversationFits_NoOps) directly target the described regression scenarios.

No confirmed or likely bugs found in the changed code.

…windows

After the fix in #3042, the summary and keep-tail token budgets used during session compaction scale proportionally to provider_opts.context_size instead of using absolute 16k/20k constants. Small-context-window models (≤ ~16k) no longer have their history wiped during compaction.

Ref: #3042

rumpl added 2 commits June 9, 2026 21:30

rumpl requested a review from a team as a code owner June 9, 2026 19:31

dgageot approved these changes Jun 9, 2026

docker-agent reviewed Jun 9, 2026

rumpl merged commit 4af658c into docker:main Jun 9, 2026
8 checks passed

aheritier mentioned this pull request Jun 10, 2026

docs: update evaluation and compaction documentation #3044

Merged

rumpl merged 2 commits into docker:main from rumpl:fix/compaction-context-loss
