Summary
The `max_iterations` parameter acts as a hard server-side ceiling on the number of LLM calls per conversation, but:
- The LLM is never informed of the limit. The system prompt contains zero mention of iteration budgets, step limits, or remaining turns. The agent has no way to plan its work within the budget.
- When the limit is (or would be) hit, there is no graceful degradation. The agent doesn't receive a warning as it approaches the ceiling, and there is no mechanism to prompt it to wrap up and provide a best-effort answer before being cut off.
What the code does when the limit is hit
From local_conversation.py:
- `max_iteration_per_run` is set on `ConversationState` (default 500) and tracked at line 594 as a local iteration counter.
- The run loop (lines 596-689) increments `iteration` after each `agent.step()` call and checks `if iteration >= self.max_iteration_per_run`.
- When the limit is reached, it hard-stops with an `ERROR` status and emits a `ConversationErrorEvent` with a `MaxIterationsReached` error.
- The iteration count is never passed to the agent: not in `agent.step()`, not in the system prompt, not in the context. The agent has zero visibility into how many steps it has left.
- There is no warning injection as the agent approaches the limit — no "you have N iterations remaining" message.
The result: the conversation abruptly terminates with an error event. Any work the agent has done but not yet synthesized into a final answer is lost. The instance scores as failed.
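The hard-stop behavior described above can be sketched as follows. This is a simplified stand-in, not the actual `local_conversation.py` code: the loop body, the `state` dict, and the agent interface are assumptions for illustration; only the names `agent.step()`, `max_iteration_per_run`, the `ERROR` status, and `MaxIterationsReached` come from the description above.

```python
class MaxIterationsReached(Exception):
    """Raised when the conversation exhausts its iteration budget."""


def run_loop(agent, state, max_iteration_per_run=500):
    """Simplified stand-in for the run loop: step until done or the budget is gone."""
    iteration = 0
    while not state.get("done"):
        agent.step(state)  # the LLM call; `iteration` is never passed to the agent
        iteration += 1
        if iteration >= max_iteration_per_run:
            # Hard stop: no warning was injected beforehand, and the agent
            # gets no final turn to synthesize an answer.
            state["status"] = "ERROR"
            raise MaxIterationsReached(
                f"hit limit of {max_iteration_per_run} steps"
            )
    return state
```

Nothing in this loop ever tells the agent how many steps remain, which is the gap the proposals below target.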
The Problem
Without budget awareness, the agent can't prioritize
If the agent knew it had N steps remaining, it could:
- Plan ahead: tackle the most critical parts first
- Wrap up early: provide a best-effort answer when time is running low
- Avoid rabbit holes: skip expensive exploration paths when budget is tight
- Manage subagent delegation: avoid spawning subagents late in the budget (subagent calls with `max_iteration_per_run: null` have no limit either, risking budget exhaustion)
Without this information, an agent at step 498/500 will happily start a 50-step research task and get killed mid-way.
Without a graceful termination message, answers are lost
When the limit is hit:
- The run loop in `local_conversation.py` fires a `ConversationErrorEvent` with `MaxIterationsReached` and sets status to `ERROR`
- The LLM never gets a chance to synthesize its findings into a final answer
- All the work done up to that point is wasted — the instance scores as a hard error (not even "incorrect" — just failed)
- This is especially costly for tasks where the agent has gathered the right data but hasn't yet formatted the answer
The error type is indistinguishable from real errors
`MaxIterationsReached` produces the same `ERROR` status as infrastructure failures, OOM crashes, or API errors. Downstream reporting (e.g., `output_errors.jsonl`, `ERROR_LOGS.txt`) treats iteration exhaustion identically to a crash. This makes it hard to:
- Distinguish "ran out of budget" from "something broke"
- Measure how often the limit is the bottleneck
- Tune `max_iterations` based on data
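To illustrate the measurement gap: with today's undifferentiated errors, a harness can only count budget exhaustion by string-matching error records. The snippet below assumes a hypothetical `output_errors.jsonl` format in which each line is a JSON object with an `error` field; the real file format may differ.

```python
import json


def count_budget_exhaustion(jsonl_lines):
    """Split recorded errors into iteration-limit hits vs. everything else.

    Assumes each line is a JSON object with an ``error`` field whose value
    contains the error name -- a hypothetical format for illustration.
    """
    budget, other = 0, 0
    for line in jsonl_lines:
        record = json.loads(line)
        if "MaxIterationsReached" in record.get("error", ""):
            budget += 1
        else:
            other += 1
    return budget, other
```

A distinct status (Option C below) would make this counting trivial and robust instead of a string match.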
Proposed Solutions
Option A: Communicate the budget to the LLM (recommended)
Add iteration budget info to the system prompt and/or inject warnings as the agent approaches the limit.
In the system prompt:
You have a budget of {max_iterations} steps for this task.
Plan your approach to complete the task within this budget.
Warning injection at ~80% and ~95% budget:
[SYSTEM] You have used {current}/{max_iterations} steps.
{remaining} steps remaining. Begin wrapping up and provide your best answer.
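One way to implement the threshold check is a small helper consulted before each LLM call. This is a sketch: the function name, the `thresholds` default, and the exact message wording are illustrative choices, not an existing API.

```python
def budget_warning(iteration, max_iterations, thresholds=(0.8, 0.95)):
    """Return a warning message when a budget threshold is first crossed, else None.

    Sketch of Option A's warning injection. Fires exactly once per threshold,
    at the step where the iteration count equals the threshold point.
    """
    remaining = max_iterations - iteration
    for threshold in sorted(thresholds, reverse=True):
        crossing = int(max_iterations * threshold)
        if iteration == crossing:
            return (
                f"[SYSTEM] You have used {iteration}/{max_iterations} steps. "
                f"{remaining} steps remaining. Begin wrapping up and provide "
                f"your best answer."
            )
    return None
```

The run loop would call this after incrementing the counter and, when it returns a message, append it to the conversation before the next `agent.step()`.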
Option B: Graceful termination message (minimum fix)
If we don't want to expose the budget to the LLM (to avoid gaming or conservative behavior), at least inject a final message when the limit is about to be hit:
[SYSTEM] You are about to reach the maximum number of steps allowed.
This is your FINAL step. Provide your best answer NOW based on everything
you have gathered so far.
This gives the agent one last turn to produce an answer instead of being silently killed.
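A minimal sketch of this final-chance injection, assuming the conversation history is a list of role/content dicts (the real message representation in the codebase may differ):

```python
FINAL_STEP_MESSAGE = (
    "[SYSTEM] You are about to reach the maximum number of steps allowed. "
    "This is your FINAL step. Provide your best answer NOW based on everything "
    "you have gathered so far."
)


def maybe_inject_final_notice(iteration, max_iterations, messages):
    """Append the final-chance notice exactly once, one step before the ceiling.

    Sketch of Option B; `messages` stands in for the conversation history as a
    list of role/content dicts, which is an assumption about the format.
    """
    already_sent = any(m.get("content") == FINAL_STEP_MESSAGE for m in messages)
    if iteration == max_iterations - 1 and not already_sent:
        messages.append({"role": "system", "content": FINAL_STEP_MESSAGE})
        return True
    return False
```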
Option C: Return a distinct status instead of ERROR (minimum observability fix)
Instead of emitting a generic `ConversationErrorEvent` with `ERROR` status, return a dedicated status like `MAX_ITERATIONS_REACHED` or `BUDGET_EXHAUSTED`. This would:
- Let eval harnesses distinguish "ran out of steps" from real crashes
- Enable data-driven tuning of `max_iterations`
- Allow retry logic to treat budget exhaustion differently (e.g., retry with higher limit vs. retry with same config)
In local_conversation.py, this could be as simple as:
# Instead of:
raise ConversationError("MaxIterationsReached")

# Use a distinct status:
self.state.status = ConversationStatus.ITERATION_LIMIT
return ConversationIterationLimitEvent(
    iteration=iteration,
    max_iterations=self.max_iteration_per_run,
    last_agent_action=last_action,  # preserve what the agent was doing
)
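To make that snippet concrete, here is one possible shape for the new status and event, plus the downstream triage logic it enables. All of these names are proposals for illustration, not existing types in the codebase:

```python
import enum
from dataclasses import dataclass
from typing import Any, Optional


class ConversationStatus(enum.Enum):
    """Proposed status enum with a dedicated budget-exhaustion state."""
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"
    ITERATION_LIMIT = "iteration_limit"  # proposed: distinct from ERROR


@dataclass
class ConversationIterationLimitEvent:
    """Proposed event carrying enough context for downstream tooling."""
    iteration: int
    max_iterations: int
    last_agent_action: Optional[Any] = None  # preserve what the agent was doing


def triage(status):
    """Example harness logic: budget exhaustion is retryable, a crash is not."""
    if status is ConversationStatus.ITERATION_LIMIT:
        return "retry_with_higher_limit"
    if status is ConversationStatus.ERROR:
        return "investigate_crash"
    return "ok"
```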
Option D: All of the above (ideal)
- Light budget awareness in the system prompt (without exact numbers): "You have a limited number of steps. Work efficiently and be prepared to provide a best-effort answer if prompted."
- Hard warning injection at 90-95% budget with exact remaining steps
- Final-chance message at the last step
- Distinct `ITERATION_LIMIT` status for observability
Additional Consideration: Subagent Iteration Limits
The `max_iteration_per_run` field for subagents is currently `null` (unlimited). This means a subagent could theoretically consume the entire remaining budget of the parent conversation. Consider:
- Setting a default `max_iteration_per_run` for subagents (e.g., 20-50 steps)
- Or deducting subagent steps from the parent's budget
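Either variant reduces to capping what a subagent may consume relative to what the parent has left. A sketch, where `default_cap=30` is just an illustrative value within the 20-50 range suggested above:

```python
def spawn_subagent_budget(parent_remaining, default_cap=30):
    """Compute an iteration cap for a subagent so it cannot exhaust the parent.

    Sketch of the 'deduct from parent' idea. `default_cap` is an illustrative
    default, not an existing setting.
    """
    if parent_remaining <= 0:
        raise ValueError("no budget left to delegate")
    # Never hand the subagent more than the parent has left, and reserve at
    # least one step for the parent to synthesize the subagent's result.
    return min(default_cap, max(1, parent_remaining - 1))
```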
Impact
Helpful for:
- Harder benchmarks (SWE-bench, complex GAIA Level 3 tasks)
- Lower `max_iterations` settings used for cost control
- Tasks that involve extensive web research or multi-step reasoning
- Heavy subagent usage patterns
The fix is low-effort (a few lines in the system prompt + a budget check before each LLM call) but high-impact — it turns a silent failure into a recoverable situation.