proposal: make Max Iteration Limit visible to the LLM and have a Graceful Termination Message #2406

@VascoSch92

Description

@VascoSch92

Summary

The max_iterations parameter acts as a hard server-side ceiling on the number of LLM calls per conversation, but:

  1. The LLM is never informed of the limit. The system prompt contains zero mention of iteration budgets, step limits, or remaining turns. The agent has no way to plan its work within the budget.
  2. When the limit is (or would be) hit, there is no graceful degradation. The agent doesn't receive a warning as it approaches the ceiling, and there is no mechanism to prompt it to wrap up and provide a best-effort answer before being cut off.

What the code does when the limit is hit

From local_conversation.py:

  1. max_iteration_per_run is set on ConversationState (default 500) and tracked as a local iteration counter at line 594.
  2. The run loop (lines 596-689) increments iteration after each agent.step() call and checks if iteration >= self.max_iteration_per_run.
  3. When the limit is reached, it hard-stops with an ERROR status and emits a ConversationErrorEvent with a MaxIterationsReached error.
  4. The iteration count is never passed to the agent — not in agent.step(), not in the system prompt, not in the context. The agent has zero visibility into how many steps it has left.
  5. There is no warning injection as the agent approaches the limit — no "you have N iterations remaining" message.

The result: the conversation abruptly terminates with an error event. Any work the agent has done but not yet synthesized into a final answer is lost. The instance scores as failed.
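The behavior described above can be condensed into a hedged sketch. FakeAgent, run(), and the return values are illustrative stand-ins, not the real API of local_conversation.py:

```python
# Illustrative sketch of the run loop described above; the real loop in
# local_conversation.py has more machinery, but the shape is the same.
class FakeAgent:
    """Stand-in agent whose step() never signals completion on its own."""
    def __init__(self):
        self.calls = 0

    def step(self) -> bool:
        self.calls += 1
        return False  # never finishes

def run(agent, max_iteration_per_run: int = 500):
    iteration = 0
    while True:
        done = agent.step()  # note: `iteration` is never passed to the agent
        iteration += 1
        if done:
            return "FINISHED", iteration
        if iteration >= max_iteration_per_run:
            # hard stop: ERROR status, MaxIterationsReached, no final turn
            return "ERROR", iteration
```

Running this with a small limit shows the failure mode: the agent is mid-task, the counter trips, and the loop returns an error with no final turn.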

The Problem

Without budget awareness, the agent can't prioritize

If the agent knew it had N steps remaining, it could:

  • Plan ahead: tackle the most critical parts first
  • Wrap up early: provide a best-effort answer when time is running low
  • Avoid rabbit holes: skip expensive exploration paths when budget is tight
  • Manage subagent delegation: avoid spawning subagents late in the budget (subagent calls with max_iteration_per_run: null have no limit either, risking budget exhaustion)

Without this information, an agent at step 498/500 will happily start a 50-step research task and be killed midway through.

Without a graceful termination message, answers are lost

When the limit is hit:

  • The run loop in local_conversation.py fires a ConversationErrorEvent with MaxIterationsReached and sets status to ERROR
  • The LLM never gets a chance to synthesize its findings into a final answer
  • All the work done up to that point is wasted — the instance scores as a hard error (not even "incorrect" — just failed)
  • This is especially costly for tasks where the agent has gathered the right data but hasn't yet formatted the answer

The error type is indistinguishable from real errors

MaxIterationsReached produces the same ERROR status as infrastructure failures, OOM crashes, or API errors. Downstream reporting (e.g., output_errors.jsonl, ERROR_LOGS.txt) treats iteration exhaustion identically to a crash. This makes it hard to:

  • Distinguish "ran out of budget" from "something broke"
  • Measure how often the limit is the bottleneck
  • Tune max_iterations based on data

Proposed Solutions

Option A: Communicate the budget to the LLM (recommended)

Add iteration budget info to the system prompt and/or inject warnings as the agent approaches the limit.

In the system prompt:

You have a budget of {max_iterations} steps for this task.
Plan your approach to complete the task within this budget.

Warning injection at ~80% and ~95% budget:

[SYSTEM] You have used {current}/{max_iterations} steps.
{remaining} steps remaining. Begin wrapping up and provide your best answer.
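A minimal sketch of the warning injection, assuming the 80%/95% thresholds and the message wording proposed above (neither exists in the codebase today):

```python
# Sketch of Option A's warning injection; thresholds and message format
# come from this proposal, not from existing code.
def budget_warning(current: int, max_iterations: int):
    """Return a warning string when `current` crosses 80% or 95% of the
    budget, else None. Exact-match thresholds fire each message once."""
    remaining = max_iterations - current
    for threshold in (0.95, 0.80):
        if current == int(max_iterations * threshold):
            return (
                f"[SYSTEM] You have used {current}/{max_iterations} steps.\n"
                f"{remaining} steps remaining. Begin wrapping up and "
                f"provide your best answer."
            )
    return None
```

The run loop would call this once per iteration and, when it returns a message, append it to the conversation before the next agent.step().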

Option B: Graceful termination message (minimum fix)

If we don't want to expose the budget to the LLM (to avoid gaming or conservative behavior), at least inject a final message when the limit is about to be hit:

[SYSTEM] You are about to reach the maximum number of steps allowed.
This is your FINAL step. Provide your best answer NOW based on everything
you have gathered so far.

This gives the agent one last turn to produce an answer instead of being silently killed.
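A sketch of where this hook could live, assuming the final message is injected one step before the hard stop (the hook point and function name are illustrative):

```python
# Minimal sketch of Option B; FINAL_MESSAGE mirrors the text above.
FINAL_MESSAGE = (
    "[SYSTEM] You are about to reach the maximum number of steps allowed.\n"
    "This is your FINAL step. Provide your best answer NOW based on "
    "everything you have gathered so far."
)

def maybe_final_message(iteration: int, max_iteration_per_run: int):
    """Return the final-chance message on the last allowed step, else None."""
    if iteration == max_iteration_per_run - 1:
        return FINAL_MESSAGE
    return None
```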

Option C: Return a distinct status instead of ERROR (minimum observability fix)

Instead of emitting a generic ConversationErrorEvent with ERROR status, return a dedicated status like MAX_ITERATIONS_REACHED or BUDGET_EXHAUSTED. This would:

  • Let eval harnesses distinguish "ran out of steps" from real crashes
  • Enable data-driven tuning of max_iterations
  • Allow retry logic to treat budget exhaustion differently (e.g., retry with higher limit vs. retry with same config)

In local_conversation.py, this could be as simple as:

# Instead of:
raise ConversationError("MaxIterationsReached")

# Use a distinct status:
self.state.status = ConversationStatus.ITERATION_LIMIT
return ConversationIterationLimitEvent(
    iteration=iteration,
    max_iterations=self.max_iteration_per_run,
    last_agent_action=last_action,  # preserve what the agent was doing
)

Option D: All of the above (ideal)

  • Light budget awareness in the system prompt (without exact numbers): "You have a limited number of steps. Work efficiently and be prepared to provide a best-effort answer if prompted."
  • Hard warning injection at 90-95% budget with exact remaining steps
  • Final-chance message at the last step
  • Distinct ITERATION_LIMIT status for observability
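One way to combine the escalating messages into a single dispatcher, with the 90% threshold and message texts taken from the bullets above (all of it a sketch, not existing behavior):

```python
# Possible per-step dispatcher for Option D's escalating messages.
def budget_message(iteration: int, max_iterations: int):
    remaining = max_iterations - iteration
    if remaining <= 0:
        return None  # the run loop returns the distinct ITERATION_LIMIT status
    if remaining == 1:
        return "[SYSTEM] This is your FINAL step. Provide your best answer NOW."
    if iteration / max_iterations >= 0.90:
        return f"[SYSTEM] {remaining} steps remaining. Begin wrapping up."
    return None
```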

Additional Consideration: Subagent Iteration Limits

The max_iteration_per_run field for subagents is currently null (unlimited). This means a subagent could theoretically consume the entire remaining budget of the parent conversation. Consider:

  • Setting a default max_iteration_per_run for subagents (e.g., 20-50 steps)
  • Or deducting subagent steps from the parent's budget
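The deduction idea can be sketched with a small helper; Budget is hypothetical and not a class in the codebase:

```python
# Sketch of deducting subagent steps from the parent's budget.
from dataclasses import dataclass

@dataclass
class Budget:
    limit: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def child(self, cap: int = 50) -> "Budget":
        # a subagent never gets more steps than the parent has left,
        # and never more than the default subagent cap
        return Budget(limit=min(cap, self.remaining))

    def charge(self, steps: int) -> None:
        self.used += steps
```

With this shape, a subagent spawned late in the run (say at step 480 of 500) would be capped at the parent's 20 remaining steps instead of running unlimited.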

Impact

Helpful for:

  • Harder benchmarks (SWE-bench, complex GAIA Level 3 tasks)
  • Lower max_iterations settings used for cost control
  • Tasks that involve extensive web research or multi-step reasoning
  • Heavy subagent usage patterns

The fix is low-effort (a few lines in the system prompt + a budget check before each LLM call) but high-impact — it turns a silent failure into a recoverable situation.

Labels: proposal (proposal for discussion)