Fix non-terminating stash retrieve loop on explicit working-memory reads by rockfordlhotka · Pull Request #467 · MarimerLLC/rockbot

rockfordlhotka · 2026-06-10T18:25:43Z

Problem

The 2026-06-10 10 a.m. communications-briefing cron job ran for its full 15 minutes and then timed out. Investigation of the live k8s cluster showed calendar-mcp was healthy and fast (yesterday's IMAP fix #69 / image 1.4.1 works: email/calendar calls complete in 1–6s). The timeout was caused by a non-terminating stash retrieve loop in the agent host.

When a tool result is too large, the host trims it (head + elision marker + tail) and stashes the full original in working memory under stash/{session}/{callId}, telling the model to fetch it via GetFromWorkingMemory. But the retrieval result was itself oversized, so the per-call cap (CapToolResultAsync) and the watermark trimmer (ToolResultTrimmer) re-stashed it under the retrieval call's new id and advertised that new key back to the model. The model fetched that key and got back a larger reference, which was re-stashed in turn. The loop ran at ~35s per iteration until the budget killed it:

GetFromWorkingMemory(stash/.../call_iz7u7s2) -> big -> re-stashed as call_V5QImO2
GetFromWorkingMemory(stash/.../call_V5QImO2) -> big -> re-stashed as call_Qm6TZL
GetFromWorkingMemory(stash/.../call_Qm6TZL)  -> ... (until 15-min timeout)
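
The trace above can be reproduced with a small toy model. This is a minimal Python sketch, not the host's actual C# code: BuggyHost, run_model, the ListEmails tool name, the marker text, and the 200-character cap are all hypothetical stand-ins, and the loop is bounded by an iteration budget so the example terminates.

```python
import itertools
import re

CAP = 200  # hypothetical per-call character cap (the real limit is not stated)


class BuggyHost:
    """Toy host that caps every tool result, including explicit retrievals."""

    def __init__(self):
        self.memory = {}                # stands in for working memory
        self._ids = itertools.count(1)  # every tool call gets a fresh call id

    def call_tool(self, name, produce):
        call_id = f"call_{next(self._ids)}"
        result = produce()
        if len(result) <= CAP:
            return result
        key = f"stash/session/{call_id}"
        self.memory[key] = result       # stash the full original
        return f"{result[:40]} [elided; full result at {key}] {result[-40:]}"

    def get_from_working_memory(self, key):
        # Bug: the retrieval goes through the same cap as any other tool,
        # so the oversized payload is re-stashed under the new call id.
        return self.call_tool("GetFromWorkingMemory", lambda: self.memory[key])


def run_model(host, first_result, max_iterations=10):
    """Follow each advertised stash key; return (keys fetched, finished?)."""
    fetched = []
    result = first_result
    for _ in range(max_iterations):
        match = re.search(r"full result at (\S+)\]", result)
        if match is None:
            return fetched, True        # the model finally saw the full payload
        key = match.group(1)
        fetched.append(key)
        result = host.get_from_working_memory(key)
    return fetched, False               # budget exhausted without progress


host = BuggyHost()
first = host.call_tool("ListEmails", lambda: "x" * 1000)  # hypothetical tool
keys, finished = run_model(host, first)
```

In this sketch every retrieval yields a new stash key rather than the payload, so the model exhausts its iteration budget without ever seeing the full result.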

The earlier llm-high-tier-cost-guard subagent hit the identical loop.

Fix

ChunkingAIFunction already exempted the working-memory read tools (GetFromWorkingMemory/SearchWorkingMemory/ListWorkingMemory) from re-chunking for exactly this reason. This PR centralizes that exemption in a shared StashExemptTools set and honors it in all three paths:

StashExemptTools (new) — single source of truth.
ChunkingAIFunction — uses the shared set (removed private duplicate).
CapToolResultAsync — returns explicit-retrieval results unchanged.
ToolResultTrimmer.TrimAsync — skips exempt results when picking the largest result to trim.

An explicit retrieval is now always returned in full and never re-stashed.
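
The shape of the fix can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the merged C# change: the three tool names and the stash/{session}/{callId} key format come from the description above, while cap_tool_result, pick_result_to_trim, the ListEmails tool name, the marker text, and the 200-character cap are hypothetical.

```python
# Tool names come from the PR text; everything else here is a hypothetical sketch.
STASH_EXEMPT_TOOLS = frozenset(
    {"GetFromWorkingMemory", "SearchWorkingMemory", "ListWorkingMemory"}
)

CAP = 200  # hypothetical per-call character cap


def cap_tool_result(tool_name, call_id, result, memory):
    """Return the result unchanged if exempt or small; otherwise trim and stash it."""
    if tool_name in STASH_EXEMPT_TOOLS or len(result) <= CAP:
        return result
    key = f"stash/session/{call_id}"
    memory[key] = result  # keep the full original for a later explicit read
    return f"{result[:40]} [elided; full result at {key}] {result[-40:]}"


def pick_result_to_trim(results):
    """Pick the largest non-exempt (tool_name, call_id, text) tuple, or None."""
    candidates = [r for r in results if r[0] not in STASH_EXEMPT_TOOLS]
    return max(candidates, key=lambda r: len(r[2]), default=None)


memory = {}
trimmed = cap_tool_result("ListEmails", "call_1", "x" * 1000, memory)
full = cap_tool_result(
    "GetFromWorkingMemory", "call_2", memory["stash/session/call_1"], memory
)
```

Because both the cap and the trimmer consult the same set, a tool cannot be exempt in one path and re-stashed in another. Here the second call returns the 1000-character payload as-is and adds no new stash entry.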

Tests

Added 4 regression tests (with [Timeout] guards mirroring the real loop). RockBot.Host.Tests: 1061 passed, 0 failed.

Deployment

Version bumped 0.12.29 -> 0.12.30. Image rockylhotka/rockbot-agent:0.12.30 built, pushed, and deployed to the live rockbot namespace via kubectl set image for testing; calendar-mcp confirms Client (RockBot.Agent 0.12.30.0) is live.

The per-call tool-result cap (CapToolResultAsync) and the watermark trimmer (ToolResultTrimmer) re-stashed the result of an explicit GetFromWorkingMemory retrieval under the retrieval call's new id, then advertised that new key back to the model. The model fetched it, got a slightly larger reference, which was re-stashed again -- a retrieve->re-stash->retrieve loop that made no progress until the iteration/timeout budget killed it.

Observed 2026-06-10: a communications-briefing subagent burned its full 15-minute budget this way after pulling a ~15k-char multi-account email payload.

ChunkingAIFunction already exempted these working-memory read tools from re-chunking for the same reason. Centralize that exemption in a shared StashExemptTools set and honor it in all three paths (chunk, cap, trim) so an explicit retrieval is always returned in full and never re-stashed.

Bump version to 0.12.30.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rockfordlhotka merged commit 2447dd4 into main Jun 10, 2026
2 checks passed

rockfordlhotka deleted the fix/stash-retrieval-loop branch June 10, 2026 18:40
