proposal: make Max Iteration Limit visible to the LLM and have a Graceful Termination Message #2406

@VascoSch92

Description

@VascoSch92

Summary

The max_iterations parameter acts as a hard server-side ceiling on the number of LLM calls per conversation, but:

  1. The LLM is never informed of the limit. The system prompt contains zero mention of iteration budgets, step limits, or remaining turns. The agent has no way to plan its work within the budget.
  2. When the limit is (or would be) hit, there is no graceful degradation. The agent doesn't receive a warning as it approaches the ceiling, and there is no mechanism to prompt it to wrap up and provide a best-effort answer before being cut off.

What the code does when the limit is hit

From local_conversation.py:

  1. max_iteration_per_run is set on ConversationState (default 500) and tracked as a local iteration counter at line 594.
  2. The run loop (lines 596-689) increments iteration after each agent.step() call and checks if iteration >= self.max_iteration_per_run.
  3. When the limit is reached, it hard-stops with an ERROR status and emits a ConversationErrorEvent with a MaxIterationsReached error.
  4. The iteration count is never passed to the agent — not in agent.step(), not in the system prompt, not in the context. The agent has zero visibility into how many steps it has left.
  5. There is no warning injection as the agent approaches the limit — no "you have N iterations remaining" message.

The result: the conversation abruptly terminates with an error event. Any work the agent has done but not yet synthesized into a final answer is lost. The instance scores as failed.
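The behavior described above can be condensed into a hedged sketch. FakeAgent, run(), and the return values are illustrative stand-ins, not the real API of local_conversation.py:

```python
# Illustrative sketch of the run loop described above; the real loop in
# local_conversation.py has more machinery, but the shape is the same.
class FakeAgent:
    """Stand-in agent whose step() never signals completion on its own."""
    def __init__(self):
        self.calls = 0

    def step(self) -> bool:
        self.calls += 1
        return False  # never finishes

def run(agent, max_iteration_per_run: int = 500):
    iteration = 0
    while True:
        done = agent.step()  # note: `iteration` is never passed to the agent
        iteration += 1
        if done:
            return "FINISHED", iteration
        if iteration >= max_iteration_per_run:
            # hard stop: ERROR status, MaxIterationsReached, no final turn
            return "ERROR", iteration
```

Running this with a small limit shows the failure mode: the agent is mid-task, the counter trips, and the loop returns an error with no final turn.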

The Problem

Without budget awareness, the agent can't prioritize

If the agent knew it had N steps remaining, it could:

  • Plan ahead: tackle the most critical parts first
  • Wrap up early: provide a best-effort answer when time is running low
  • Avoid rabbit holes: skip expensive exploration paths when budget is tight
  • Manage subagent delegation: avoid spawning subagents late in the budget (subagent calls with max_iteration_per_run: null have no limit either, risking budget exhaustion)

Without this information, an agent at step 498/500 will happily start a 50-step research task and be killed midway through.

Without a graceful termination message, answers are lost

When the limit is hit:

  • The run loop in local_conversation.py fires a ConversationErrorEvent with MaxIterationsReached and sets status to ERROR
  • The LLM never gets a chance to synthesize its findings into a final answer
  • All the work done up to that point is wasted — the instance scores as a hard error (not even "incorrect" — just failed)
  • This is especially costly for tasks where the agent has gathered the right data but hasn't yet formatted the answer

The error type is indistinguishable from real errors

MaxIterationsReached produces the same ERROR status as infrastructure failures, OOM crashes, or API errors. Downstream reporting (e.g., output_errors.jsonl, ERROR_LOGS.txt) treats iteration exhaustion identically to a crash. This makes it hard to:

  • Distinguish "ran out of budget" from "something broke"
  • Measure how often the limit is the bottleneck
  • Tune max_iterations based on data

Proposed Solutions

Option A: Communicate the budget to the LLM (recommended)

Add iteration budget info to the system prompt and/or inject warnings as the agent approaches the limit.

In the system prompt:

You have a budget of {max_iterations} steps for this task.
Plan your approach to complete the task within this budget.

Warning injection at ~80% and ~95% budget:

[SYSTEM] You have used {current}/{max_iterations} steps.
{remaining} steps remaining. Begin wrapping up and provide your best answer.
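A minimal sketch of the warning injection, assuming the 80%/95% thresholds and the message wording proposed above (neither exists in the codebase today):

```python
# Sketch of Option A's warning injection; thresholds and message format
# come from this proposal, not from existing code.
def budget_warning(current: int, max_iterations: int):
    """Return a warning string when `current` crosses 80% or 95% of the
    budget, else None. Exact-match thresholds fire each message once."""
    remaining = max_iterations - current
    for threshold in (0.95, 0.80):
        if current == int(max_iterations * threshold):
            return (
                f"[SYSTEM] You have used {current}/{max_iterations} steps.\n"
                f"{remaining} steps remaining. Begin wrapping up and "
                f"provide your best answer."
            )
    return None
```

The run loop would call this once per iteration and, when it returns a message, append it to the conversation before the next agent.step().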

Option B: Graceful termination message (minimum fix)

If we don't want to expose the budget to the LLM (to avoid gaming or conservative behavior), at least inject a final message when the limit is about to be hit:

[SYSTEM] You are about to reach the maximum number of steps allowed.
This is your FINAL step. Provide your best answer NOW based on everything
you have gathered so far.

This gives the agent one last turn to produce an answer instead of being silently killed.
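A sketch of where this hook could live, assuming the final message is injected one step before the hard stop (the hook point and function name are illustrative):

```python
# Minimal sketch of Option B; FINAL_MESSAGE mirrors the text above.
FINAL_MESSAGE = (
    "[SYSTEM] You are about to reach the maximum number of steps allowed.\n"
    "This is your FINAL step. Provide your best answer NOW based on "
    "everything you have gathered so far."
)

def maybe_final_message(iteration: int, max_iteration_per_run: int):
    """Return the final-chance message on the last allowed step, else None."""
    if iteration == max_iteration_per_run - 1:
        return FINAL_MESSAGE
    return None
```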

Option C: Return a distinct status instead of ERROR (minimum observability fix)

Instead of emitting a generic ConversationErrorEvent with ERROR status, return a dedicated status like MAX_ITERATIONS_REACHED or BUDGET_EXHAUSTED. This would:

  • Let eval harnesses distinguish "ran out of steps" from real crashes
  • Enable data-driven tuning of max_iterations
  • Allow retry logic to treat budget exhaustion differently (e.g., retry with higher limit vs. retry with same config)

In local_conversation.py, this could be as simple as:

# Instead of:
raise ConversationError("MaxIterationsReached")

# Use a distinct status:
self.state.status = ConversationStatus.ITERATION_LIMIT
return ConversationIterationLimitEvent(
    iteration=iteration,
    max_iterations=self.max_iteration_per_run,
    last_agent_action=last_action,  # preserve what the agent was doing
)

Option D: All of the above (ideal)

  • Light budget awareness in the system prompt (without exact numbers): "You have a limited number of steps. Work efficiently and be prepared to provide a best-effort answer if prompted."
  • Hard warning injection at 90-95% budget with exact remaining steps
  • Final-chance message at the last step
  • Distinct ITERATION_LIMIT status for observability
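One way to combine the escalating messages into a single dispatcher, with the 90% threshold and message texts taken from the bullets above (all of it a sketch, not existing behavior):

```python
# Possible per-step dispatcher for Option D's escalating messages.
def budget_message(iteration: int, max_iterations: int):
    remaining = max_iterations - iteration
    if remaining <= 0:
        return None  # the run loop returns the distinct ITERATION_LIMIT status
    if remaining == 1:
        return "[SYSTEM] This is your FINAL step. Provide your best answer NOW."
    if iteration / max_iterations >= 0.90:
        return f"[SYSTEM] {remaining} steps remaining. Begin wrapping up."
    return None
```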

Additional Consideration: Subagent Iteration Limits

The max_iteration_per_run field for subagents is currently null (unlimited). This means a subagent could theoretically consume the entire remaining budget of the parent conversation. Consider:

  • Setting a default max_iteration_per_run for subagents (e.g., 20-50 steps)
  • Or deducting subagent steps from the parent's budget
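The deduction idea can be sketched with a small helper; Budget is hypothetical and not a class in the codebase:

```python
# Sketch of deducting subagent steps from the parent's budget.
from dataclasses import dataclass

@dataclass
class Budget:
    limit: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def child(self, cap: int = 50) -> "Budget":
        # a subagent never gets more steps than the parent has left,
        # and never more than the default subagent cap
        return Budget(limit=min(cap, self.remaining))

    def charge(self, steps: int) -> None:
        self.used += steps
```

With this shape, a subagent spawned late in the run (say at step 480 of 500) would be capped at the parent's 20 remaining steps instead of running unlimited.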

Impact

Helpful for:

  • Harder benchmarks (SWE-bench, complex GAIA Level 3 tasks)
  • Lower max_iterations settings used for cost control
  • Tasks that involve extensive web research or multi-step reasoning
  • Heavy subagent usage patterns

The fix is low-effort (a few lines in the system prompt + a budget check before each LLM call) but high-impact — it turns a silent failure into a recoverable situation.

Labels: proposal (proposal for discussion)